RoboBrain 2.0 Technical Report (English) - BAAI (Beijing Academy of Artificial Intelligence)

RoboBrain 2.0 Technical Report

BAAI RoboBrain Team
Please see Contributions and Author List for more author details.

Abstract

We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain 2.0 achieves strong performance across a wide spectrum of embodied reasoning tasks. On both spatial and temporal benchmarks, the 32B variant achieves leading results, surpassing prior open-source and proprietary models. In particular, it supports key real-world embodied AI capabilities, including spatial understanding (e.g., affordance prediction, spatial referring, trajectory forecasting) and temporal decision-making (e.g., closed-loop interaction, multi-agent long-horizon planning, and scene graph updating). This report details the model architecture, data construction, multi-stage training strategies, infrastructure and practical applications. We hope RoboBrain 2.0 advances embodied AI research and serves as a practical step toward building generalist embodied agents. The code, checkpoint and benchmark are available at https://superrobobrain.github.io.

[Figure 1: bar charts. Panel titles: Spatial Benchmarks, Temporal Benchmarks. Benchmarks shown: BLINK-Spatial (RelDep & SpRel), RoboSpatial, RefSpatial-Bench, Where2Place, EgoPlan2, Multi-Robot-Plan. Models compared: RoboBrain-2.0-32B, Gemini-2.5-Pro-preview-05-06, o4-mini-2025-04-16, Qwen2.5-VL-72B-Instruct, Claude-Sonnet-4-20250514.]

Figure 1. Benchmark comparison across spatial and temporal reasoning. RoboBrain2.0-32B achieves best performance on both spatial and temporal reasoning benchmarks across BLINK-Spatial, RoboSpatial, RefSpatial-Bench, Where2Place, EgoPlan2 and Multi-Robot-Plan, outperforming prior open-source models and proprietary models.

Contents

1 Introduction
2 Architecture
  2.1 Input Modalities and Tokenization
  2.2 Vision Encoder and Projection
  2.3 LLM Decoder and Output Representations
3 Training Data
  3.1 General MLLM VQA
  3.2 Spatial Data


General

2025-09-22

57 pages

26.2 MB

RoboBrain 2.0 Technical Report (English), BAAI. The report is a PDF file of 26.2 MB and 57 pages.

The full report has 57 pages; only the first 10 pages are available in this preview.


