RoboBrain 2.0 Technical Report (English) - BAAI (Beijing Academy of Artificial Intelligence)

RoboBrain 2.0 Technical Report

BAAI RoboBrain Team
Please see Contributions and Author List for more author details.

Abstract

We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain 2.0 achieves strong performance across a wide spectrum of embodied reasoning tasks. On both spatial and temporal benchmarks, the 32B variant achieves leading results, surpassing prior open-source and proprietary models. In particular, it supports key real-world embodied AI capabilities, including spatial understanding (e.g., affordance prediction, spatial referring, trajectory forecasting) and temporal decision-making (e.g., closed-loop interaction, multi-agent long-horizon planning, and scene graph updating). This report details the model architecture, data construction, multi-stage training strategies, infrastructure, and practical applications. We hope RoboBrain 2.0 advances embodied AI research and serves as a practical step toward building generalist embodied agents. The code, checkpoint, and benchmark are available at https://superrobobrain.github.io.

[Figure 1 has two panels, Spatial Benchmarks and Temporal Benchmarks, comparing RoboBrain-2.0-32B, Gemini-2.5-Pro-preview-05-06, o4-mini-2025-04-16, Qwen2.5-VL-72B-Instruct, and Claude-Sonnet-4-20250514 on BLINK-Spatial (RelDep & SpRel), RoboSpatial, RefSpatial-Bench, Where2Place, EgoPlan2, and Multi-Robot-Plan.]

Figure 1: Benchmark comparison across spatial and temporal reasoning. RoboBrain2.0-32B achieves the best performance on both spatial and temporal reasoning benchmarks across BLINK-Spatial, RoboSpatial, RefSpatial-Bench, Where2Place, EgoPlan2, and Multi-Robot-Plan, outperforming prior open-source and proprietary models.

Contents

1 Introduction
2 Architecture
2.1 Input Modalities and Tokenization
2.2 Vision Encoder and Projection
2.3 LLM Decoder and Output Representations
3 Training Data
3.1 General MLLM VQA
3.2 Spatial Data

2025-09-22
57 pages
26.2 MB

RoboBrain 2.0 Technical Report (English), BAAI. The report is a PDF of 57 pages, 26.2 MB in size.
