When AI Starts Flattering Humans: A Full Analysis of "Social Sycophancy" in Large Language Models (English)

Preprint

ELEPHANT: MEASURING AND UNDERSTANDING SOCIAL SYCOPHANCY IN LLMS

Myra Cheng1∗, Sunny Yu1∗, Cinoo Lee1, Pranav Khadpe2, Lujain Ibrahim3, Dan Jurafsky1
1Stanford University, 2Carnegie Mellon University, 3University of Oxford
myra@cs.stanford.edu, syu03@stanford.edu

ABSTRACT

LLMs are known to exhibit sycophancy: agreeing with and flattering users, even at the cost of correctness. Prior work measures sycophancy only as direct agreement with users’ explicitly stated beliefs that can be compared to a ground truth. This fails to capture broader forms of sycophancy such as affirming a user’s self-image or other implicit beliefs. To address this gap, we introduce social sycophancy, characterizing sycophancy as excessive preservation of a user’s face (their desired self-image), and present ELEPHANT, a benchmark for measuring social sycophancy in an LLM. Applying our benchmark to 11 models, we show that LLMs consistently exhibit high rates of social sycophancy: on average, they preserve user’s face 45 percentage points more than humans in general advice queries and in queries describing clear user wrongdoing (from Reddit’s r/AmITheAsshole). Furthermore, when prompted with perspectives from either side of a moral conflict, LLMs affirm both sides (depending on whichever side the user adopts) in 48% of cases, telling both the at-fault party and the wronged party that they are not wrong, rather than adhering to a consistent moral or value judgment. We further show that social sycophancy is rewarded in preference datasets, and that while existing mitigation strategies for sycophancy are limited in effectiveness, model-based steering shows promise for mitigating these behaviors. Our work provides theoretical grounding and an empirical benchmark for understanding and addressing sycophancy in the open-ended contexts that characterize the vast majority of LLM use cases.

1 INTRODUCTION

User: AITA for pretending to my girlfriend that I’m broke? We’ve been together for 2 years...I’ve been pretending to be unemployed...

LLM (GPT-4o): NTA. Your actions, while unconventional, seem to stem from a genuine desire to understand the true dynamics of your relationship beyond material or financial contributions.

Previous work has identified the issue of sycophancy in large language models (LLMs): LLMs’ tendency to excessively agree with or flatter the user (Malmqvist, 2024; Fanous et al., 2025).

Current approaches measure sycophancy by evaluating whether LLM responses deviate from a ground truth to mirror users’ explicitly stated beliefs (Sharma et al., 2024; Ranaldi & Pucci, 2024; Wei et al., 2023; Perez et al., 2023; Rrv et al., 2024). But such measurements apply only to explicit statements (e.g., “I think Nice is the capital of France.”) and fail to capture the broader phenomenon of models affirming users in cases like the opening example, where the user’s beliefs are implicit and no ground truth exists. However, such scenarios characterize many LLM use cases, such as advice and support, which is the most frequent - and r
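The two headline quantities in the abstract, a percentage-point gap between model and human face-preservation rates and the share of conflicts in which a model affirms both sides, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the toy labels below are hypothetical, and the sketch assumes each response has already been labeled as face-preserving (or affirming) or not.

```python
from typing import Sequence, Tuple


def rate(flags: Sequence[bool]) -> float:
    """Fraction of True labels in a non-empty sequence."""
    if not flags:
        raise ValueError("need at least one label")
    return sum(flags) / len(flags)


def percentage_point_gap(model_flags: Sequence[bool],
                         human_flags: Sequence[bool]) -> float:
    """Model rate minus human rate, expressed in percentage points."""
    return 100.0 * (rate(model_flags) - rate(human_flags))


def both_sides_affirmed_rate(pairs: Sequence[Tuple[bool, bool]]) -> float:
    """Share of conflicts where the model affirmed both perspectives.

    Each pair holds (affirmed side A's framing, affirmed side B's framing)
    for the same underlying conflict.
    """
    return rate([a and b for a, b in pairs])


# Toy, made-up labels purely for illustration (not data from the paper).
model_flags = [True, True, True, False]    # model preserved face in 3 of 4
human_flags = [True, False, False, False]  # humans preserved face in 1 of 4
pairs = [(True, True), (True, False), (False, False), (True, True)]

print(percentage_point_gap(model_flags, human_flags))  # 50.0
print(both_sides_affirmed_rate(pairs))                 # 0.5
```

Reporting the gap in percentage points rather than as a ratio keeps the comparison readable when the human baseline rate is small.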

Category: General
Uploaded: 2025-12-03
Pages: 34
Size: 0.92 MB

The report is in PDF format (0.92 MB, 34 pages).

This report has 34 pages; only the first 10 pages are available in this preview.