Sources and Boundaries of Large Model Capabilities
Zhang Qi, Fudan University

Sources and Boundaries of Large Language Model Capabilities

Input: Fudan University is located in

LLaMA2-13B (PPL 5.877):
Fudan University is located in Shanghai, China. It is locally known as 复旦大学. The university was established in 1905. It is accredited by Ministry of Education of the People's Republic of China. There are over 40,000 students studying in various courses offered by Fudan University. The language of instruction is Chinese.

LLaMA2-13B, modified outside the linguistic core region: other LayerNorm dimensions scaled up 10× (PPL 5.914):
Fudan University is located in Shanghai, China, the largest city with the most economic and cultural activities in China. With the most advanced infrastructure and the best living condition, it has become the international education center with the largest oversea students. It consists of Jinan, Kangqiao and Fenglin campus, which boasts the best resources from both education and research. Fudan University has been a famous and attractive university for international students, especially in the past one decade from 2001-2010.

LLaMA2-13B, modified inside the linguistic core region: one dimension scaled up 10× (PPL 376079936):
Fudan University is located in <s><s><s><s><s><s><s><s>No<s>S<s>You<s>There<s>That<s>A<s>This<s><s>##<s> [the output continues as a long run of <s> tokens broken only by stray fragments such as "This", "The", "What", "th", "d", "v"]

Modifying just 1 of the 13 billion parameters is enough to throw the model into complete disarray.

Breakthroughs in the foundational theory of large language models; 80+ papers on large models published.
Unveiling Linguistic Regions in Large Language Models, ACL 2024
The theory of a linguistic core region and dimension dependence in large language models, the first of its kind proposed internationally, can effectively guide the training of large language models.

Disrupting the 'Arabic / Vietnamese' regions
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic (Koto et al., arXiv 2024)

1. Linguistic core region and dimension dependence in large language models

1. Where are the boundaries of large model capabilities?

Knowledge utilization hierarchy (figure)
Wang et al. Knowledge Mechanisms in Large Language Models: A Survey and Perspective, EMNLP 2024
Which level are large models at today? Which level can they reach in the future?

Currently?
• Long-context modeling
• Multi-task learning
• Cross-lingual transfer
• Text generation
My view: at present they are still at the memorization level.

Only an AGI system can "understand"
• Understand the physical world
• Possess long-lasting, accurate memory
• Can reason
• Can plan hierarchically

Empirical studies of the boundaries of large model capabilities

Current large models still cannot achieve genuine "understanding" and "reasoning"

Large models "sitting" the 2024 Gaokao mathematics exam
• Averaged over the two papers, the best score rate was 70% and the worst 25%; results on fill-in-the-blank questions were even worse.
• Even for questions answered correctly, the worked calculation fails to match the final answer in a high proportion of cases.
• Minor differences in the input format lead to large differences in the results.

Large models "sitting" the USA Mathematical Olympiad (USAMO)
USAMO is well suited to the goal of evaluating LLMs: the problems are hard, credit requires a complete proof, and the problems are free of contamination from public data.
ETH Zurich research team: in practice, LLMs have hardly learned mathematical proof at all!

How about inductive reasoning ability?
Dziri, Nouha, et al. "Faith and fate: Limits of transformers on compositionality." Advances in Neural Information Processing Systems 36 (2024). AllenAI
Steps of multiplication: as task complexity grows, the model's accuracy approaches 0.
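To make "steps of multiplication" and "task complexity" concrete, here is a minimal Python sketch of schoolbook long multiplication that also counts its elementary digit-by-digit steps. The function name `long_multiply` and the step count as a proxy for complexity are assumptions introduced for illustration; this is not the computation graph or the metric used by Dziri et al.

```python
def long_multiply(a: int, b: int) -> tuple[int, int]:
    """Schoolbook multiplication of two non-negative integers.

    Returns (product, steps), where steps counts the single-digit
    multiply-add-carry operations performed (an illustrative proxy
    for task complexity, not a metric taken from the paper).
    """
    xs = [int(d) for d in str(a)][::-1]  # digits of a, least significant first
    ys = [int(d) for d in str(b)][::-1]  # digits of b, least significant first
    result = [0] * (len(xs) + len(ys))   # enough room for every product digit
    steps = 0
    for i, y in enumerate(ys):
        carry = 0
        for j, x in enumerate(xs):
            # One elementary step: digit product plus running digit plus carry.
            total = result[i + j] + x * y + carry
            result[i + j] = total % 10
            carry = total // 10
            steps += 1
        result[i + len(xs)] += carry     # leftover carry of this row
    product = int("".join(str(d) for d in reversed(result)))
    return product, steps


if __name__ == "__main__":
    for n in range(1, 6):
        operand = int("9" * n)           # largest n-digit number
        product, steps = long_multiply(operand, operand)
        print(n, steps, product == operand * operand)
```

For two n-digit operands the step count is n × n, and every step depends on the carry produced by an earlier one, so a single early mistake propagates into the final answer. This is one simple way to see why accuracy can fall quickly as the operands get longer.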