Research and Practice on Large-Small Model Collaborative Algorithms for Embodied Intelligence (2025.8)

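Research and Practice on Cerebrum-Cerebellum Model Collaboration Algorithms for Embodied Intelligence
盛律 (Sheng Lu) | School of Software
2025-08-23

Basic concept of embodied intelligence
Embodied intelligence: an intelligent system that perceives and acts through a physical carrier. Through the agent's interaction with its environment, it acquires information, understands problems, makes decisions, and carries out actions, thereby producing intelligent behavior and adaptability.
Traditional intelligence: observes only from a distance and receives passively. Someone else tells me "this is a box; it can be opened and can hold things." Reception is passive and abstract.
Embodied intelligence: I actively experience what a box is. Experience is active and concrete.
Significance: because embodied intelligence can autonomously produce intelligent behavior and adaptability, it is a possible starting point for artificial general intelligence.

Key tasks of embodied intelligence
Navigation, question answering, manipulation.

Core goal of embodied intelligence
[Figure]

Core elements of embodied intelligence
Embodied carrier (Agent): the physical carrier.
Embodied model (Model): the intelligent algorithm.
Current status: embodied carriers are maturing steadily, whereas algorithm research on embodied models is still in its early stages and faces many challenges.

What capabilities should an embodied model consider?
Skill generalization, real-world interaction, and embodiment extension.
Skill (skill generalization), Reality (real-world interaction), Embodiment (embodiment extension).
Adapted from Jim Fan's talk.

Types of embodied models
Cerebrum-cerebellum collaboration; end-to-end.

Latest progress in embodied models: representative new work
[Figure labels: end-to-end VLA (2024.10); cerebrum-cerebellum, Hi Robot (2025.02); hybrid (2025.04); cerebrum-cerebellum; end-to-end VLA; on-device SDK (2025.03); embodied brain; end-to-end VLA]

The cerebrum-cerebellum collaboration route still has room to grow
End-to-end models make decisions efficiently, but their generalization and extensibility are limited. Constrained by environment interaction and hardware adaptation, they struggle to adapt to diverse scenarios. The modular cerebrum-cerebellum collaboration framework, with its strong generalization and interpretability, is becoming a research focus in both academia and industry.
Modular: the framework gives embodied agents the advantages of modularity, namely an extensible architecture, efficient development, and strong adaptability.
Generalizable: a cerebrum built on a VLM has rich multimodal cognitive ability and is not affected by the cerebellum model.
Interpretable: the decision process is more transparent, which improves the efficiency of human-robot collaboration.
The cerebrum-cerebellum collaboration framework is currently the technical route that makes embodied agents easier to deploy.

Can a traditional multimodal large model serve as the "cerebrum"?
Traditional VLMs face severe challenges in embodied scenarios such as long-horizon closed-loop manipulation and spatiotemporal intelligence.
Take "put the pot in the drawer" as an example. The task involves multi-step, long-duration interaction, including moving, grasping, and placing, and requires continuous interaction with objects such as the pot and the drawer.
GPT-4o performs poorly on embodied tasks.

Recap: what capabilities should an embodied model consider?
Skill generalization, real-world interaction, and embodiment extension.
Adapted from Jim Fan's talk.

Skill generalization: multi-agent solving of long-horizon open-world embodied tasks
Long-horizon open-world embodied tasks.
Task: Gather wood from the forest, craft a stone sword on the plains, and then use it to kill a pig during the daytime near water and grass.
[Figure labels: Day, Forest, Plains, Stone, Water, Wood, Pig, Grass]
Process [figure labels]: O1: log; O5: wooden pickaxe; O6: stone; O7: stone sword; O8: pig.
Long-horizon embodied tasks: context dependency + process dependency.

Skill generalization: multi-agent solving of long-horizon open-world embodied tasks
MP5 (CVPR 2024): 5 (M)LLMs with different roles, communicating for different purposes.
[Figure labels: Parser; Planner; Performer; Patroller; Percipient; Active Perception; Knowledge Memory; Performer Memory; Sub-Objective; Obtain Env. Info. for Planning; Obtain Env. Info. for Performer; Multi-round; Single-round; Error Feedback; Re-plan; actions: Equip, Find, Move, Fight, Craft, Mine]
Task: Kill a pig with a wooden sword during the daytime near the water with grass next to it.
Planner: Can you tell me what important environmental information I need to know?
Patroller: I conduct Active Perception with Percipient with your current observation, there is no pig based on the scene.
Planner: 1. Equip( ) 2. Find() 3. Move() 4. Fight()
Performer: Start executing "Equip".
Performer: Having completed a move in "Find" action, based on my current view, tell me if I should continue this action or if the next action is ready to execute.
Patroller: I conduct Active Perception with Percipient with your current observation, you must continue with the current action since there is no river near the pig.
Performer: Continue executing "Find".
Performer: Having completed a move in "Find" action, based on my current view, tell me if I should continue this action or if the next action is ready to execute.
Patroller: I conduct Active Perception with Percipient with your current observation, you can execute the next action since all conditions are sa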
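The Performer-Patroller exchange in the MP5 dialogue can be pictured as a simple check-after-every-move loop. The following is a minimal sketch under stated assumptions, not MP5's implementation: the names `patroller_check` and `perform`, the representation of observations as sets of strings, and the `max_moves` budget are all hypothetical, and a plain subset test stands in for the MLLM-based active perception described on the slide.

```python
from typing import Callable, List, Set, Tuple

Observation = Set[str]
# (action name, conditions that must be observed before moving to the next action)
Step = Tuple[str, Set[str]]


def patroller_check(required: Set[str], observation: Observation) -> bool:
    """Hypothetical stand-in for active perception: are all conditions observed?"""
    return required.issubset(observation)


def perform(
    plan: List[Step],
    observe: Callable[[], Observation],
    max_moves: int = 5,  # assumed budget per action; not taken from the source
) -> Tuple[bool, List[str]]:
    """Run each planned action, asking the patroller after every move."""
    log: List[str] = []
    for action, required in plan:
        log.append(f"start {action}")
        for _ in range(max_moves):
            # One move has been made; check the new view with the patroller.
            if patroller_check(required, observe()):
                log.append(f"next after {action}")
                break
            log.append(f"continue {action}")
        else:
            # Budget exhausted: report the error so the planner can re-plan.
            log.append(f"error in {action}: request re-plan")
            return False, log
    return True, log


# Scripted observations, one per move, loosely mirroring the slide's example.
frames = iter([
    {"sword"},
    {"sword", "pig"},                     # pig found, but no water or grass yet
    {"sword", "pig", "water", "grass"},
    {"pig_dead"},
])
plan = [
    ("Equip", {"sword"}),
    ("Find", {"pig", "water", "grass"}),
    ("Fight", {"pig_dead"}),
]
ok, log = perform(plan, lambda: next(frames))
print(ok, log)
```

In this sketch the second frame keeps the Performer in "Find" because water and grass are not yet observed, which corresponds to the Patroller's "you must continue with the current action" reply. When the per-action budget runs out, the function returns `False` with an error entry in the log, a simplified placeholder for the Error Feedback and Re-plan path in the figure.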

Report format: PDF; size: 4.59 MB; length: 37 pages.

This report has 37 pages; only the first 10 pages are available in this preview.
