An Empirical AI Study of 100 Trillion Tokens: The Paradigm Shift from Interaction to Reasoning (English)
State of AI: An Empirical 100 Trillion Token Study with OpenRouter

Malika Aubakirova∗†, Alex Atallah‡, Chris Clark‡, Justin Summerville‡, and Anjney Midha†
‡OpenRouter Inc. †a16z (Andreessen Horowitz)
December, 2025

Abstract

The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberation inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, which is an AI inference provider across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of creative roleplay (beyond just the productivity tasks many assume dominate) and coding assistance categories, plus the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than later cohorts. We term this phenomenon the Cinderella “Glass Slipper” effect. These findings underscore that the way developers and end-users engage with LLMs “in the wild” is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.

1 Introduction

Just a year ago, the landscape of large language models looked fundamentally different. Prior to late 2024, state-of-the-art systems were dominated by single-pass, autoregressive predictors optimized to continue text sequences.
Several precursor efforts attempted to approximate reasoning through advanced instruction following and tool use. For instance, Anthropic’s Sonnet 2.1 & 3 models excelled at sophisticated tool use and Retrieval-Augmented Generation (RAG), and Cohere’s Command R models incorporated structured tool-planning tokens. Separately, open source projects like those done by Reflection explored supervised chain-of-thought and self-critique loops during training. Although these advanced techniques produced reasoning-like outputs and superior instruction following, the fundamental inference procedure remained based on a single forward pass, emitting a surface-level trace learned from data rather than performing iterative, internal computation.

This paradigm evolved on December 5, 2024, when OpenAI released the first full version of its o1 reasoning model (codenamed Strawberry) [4]. The preview released on September 12, 2024 had already indicated a departure from conventional autoregressive inference. Unlike prior systems, o1 employed an expanded inference-time com
Report format: PDF; size: 11.43M; length: 36 pages.



