Federal Reserve: Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models (English)

Finance and Economics Discussion Series
Federal Reserve Board, Washington, D.C.
ISSN 1936-2854 (Print)
ISSN 2767-3898 (Online)

Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models

Leland D. Crane, Akhil Karra, Paul E. Soto

2025-044

Please cite this paper as:
Crane, Leland D., Akhil Karra, Paul E. Soto (2025). “Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models,” Finance and Economics Discussion Series 2025-044. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2025.044.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models*

Leland D. Crane†, Akhil Karra‡, Paul E. Soto†

June 24, 2025

Abstract

We evaluate the ability of large language models (LLMs) to estimate historical macroeconomic variables and data release dates. We find that LLMs have precise knowledge of some recent statistics, but performance degrades as we go farther back in history. We highlight two particularly important kinds of recall errors: mixing together first print data with subsequent revisions (i.e., smoothing across vintages) and mixing data for past and future reference periods (i.e., smoothing within vintages). We also find that LLMs can often recall individual data release dates accurately, but aggregating across series shows that on any given day the LLM is likely to believe it has data in hand which has not been released. Our results indicate that while LLMs have impressively accurate recall, their errors point to some limitations when used for historical analysis or to mimic real time forecasters.

*We thank Gary Cornwall, Anne Hansen, participants in Board brownbags, and participants at the 2025 SGE conference for useful comments. We thank Betsy Vrankovich for her technical expertise. Opinions expressed herein are those of the authors alone and do not necessarily reflect the views of the Federal Reserve System or the Board of Governors.
†Board of Governors of the Federal Reserve System
‡Carnegie Mellon University

1 Introduction

The rise of large language models (LLMs) has generated interest in how they can be used for economic analysis and forecasting (e.g., Korinek 2023). The utility of LLMs depends on their understanding of economics-related facts and their ability to follow instructions precisely. We evaluate LLMs on several dimensions related to these capabilities. First, how well do LLMs estimate important macroeconomic variables from the past? Second, to what extent are LLMs’ estimates contami

Finance
2025-07-07
48 pages
1.48 MB

Federal Reserve: Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models (English). The report is a PDF of 48 pages, 1.48 MB in size.

This report has 48 pages; only the first 10 pages are available for preview. The clear, complete version is available by download.