100 Trillion Token Empirical AI Study: The Paradigm Shift from Interaction to Reasoning (English)

State of AI: An Empirical 100 Trillion Token Study with OpenRouter

Malika Aubakirova∗†, Alex Atallah‡, Chris Clark‡, Justin Summerville‡, and Anjney Midha†
‡OpenRouter Inc. †a16z (Andreessen Horowitz)
December 2025

Abstract

The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberation inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, which is an AI inference provider across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of creative roleplay (beyond just the productivity tasks many assume dominate) and coding assistance categories, plus the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than later cohorts. We term this phenomenon the Cinderella “Glass Slipper” effect. These findings underscore that the way developers and end-users engage with LLMs “in the wild” is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.

1 Introduction

Just a year ago, the landscape of large language models looked fundamentally different. Prior to late 2024, state-of-the-art systems were dominated by single-pass, autoregressive predictors optimized to continue text sequences. Several precursor efforts attempted to approximate reasoning through advanced instruction following and tool use. For instance, Anthropic’s Sonnet 2.1 & 3 models excelled at sophisticated tool use and Retrieval-Augmented Generation (RAG), and Cohere’s Command R models incorporated structured tool-planning tokens. Separately, open source projects like those done by Reflection explored supervised chain-of-thought and self-critique loops during training. Although these advanced techniques produced reasoning-like outputs and superior instruction following, the fundamental inference procedure remained based on a single forward pass, emitting a surface-level trace learned from data rather than performing iterative, internal computation.

This paradigm evolved on December 5, 2024, when OpenAI released the first full version of its o1 reasoning model (codenamed Strawberry) [4]. The preview released on September 12, 2024 had already indicated a departure from conventional autoregressive inference. Unlike prior systems, o1 employed an expanded inference-time com

2025-12-18 · PDF · 36 pages · 11.43 MB
