
Xiaomi Releases and Fully Open-Sources MiMo-Embodied, the First Model to Bridge Autonomous Driving and Embodied Intelligence
Xiaomi’s MiMo-Embodied becomes the first open-source model to unify embodied intelligence and autonomous driving, setting new benchmark records across 29 industry tests.
Xiaomi announced today that its embodied foundation model MiMo-Embodied has officially launched and is now fully open-sourced.
As embodied intelligence moves into home robotics and autonomous driving accelerates into scaled deployment, a major industry challenge has become increasingly apparent: How can robots and vehicles share perception, reasoning, and decision-making capabilities? And can indoor robotic intelligence meaningfully enhance outdoor driving intelligence—and vice versa? MiMo-Embodied is Xiaomi’s answer to this cross-domain convergence.
According to Xiaomi, MiMo-Embodied is the industry’s first embodied foundation model that unifies autonomous driving and embodied AI, bringing the two domains under a single modeling framework. The release marks a significant step forward from vertical, task-specific models toward cross-domain, synergistic general embodied intelligence.

Three Core Technical Highlights
1. Cross-Domain Capability Coverage: MiMo-Embodied simultaneously supports the three core tasks of embodied AI (affordance reasoning, task planning, and spatial understanding) and the three key tasks of autonomous driving (environment perception, state prediction, and driving planning), forming a unified intelligence backbone for full-scenario applications.
2. Two-Way Knowledge Transfer: The model validates the synergy between indoor manipulation intelligence and road-level decision-making, demonstrating that capabilities learned in one domain can enhance performance in the other.
3. End-to-End Reliability Across the Full Stack: Through a multi-stage training pipeline (embodied and driving skill learning → chain-of-thought (CoT) inference enhancement → fine-grained RL optimization; see the sketch after this list), MiMo-Embodied significantly improves deployment reliability in real-world environments.
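
To make the staged pipeline in point 3 concrete, the sketch below expresses it as a sequence of training stages in Python. The stage names, dataset labels, and the train_stage callback are illustrative assumptions inferred only from the description above, not Xiaomi's published implementation.

```python
# Hypothetical sketch of the three-stage training pipeline described
# above. Stage names, dataset labels, and the train_stage callback are
# assumptions for illustration, not Xiaomi's actual implementation.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    datasets: list[str]
    objective: str  # "supervised" or "rl"

PIPELINE = [
    # Stage 1: supervised learning of embodied and driving skills.
    Stage("skill_learning",
          ["embodied_affordance", "task_planning", "spatial_understanding",
           "driving_perception", "state_prediction", "driving_planning"],
          objective="supervised"),
    # Stage 2: chain-of-thought traces to strengthen step-by-step reasoning.
    Stage("cot_enhancement", ["cot_reasoning_traces"], objective="supervised"),
    # Stage 3: fine-grained RL optimization targeting deployment reliability.
    Stage("rl_optimization", ["reward_labeled_rollouts"], objective="rl"),
]

def run_pipeline(model, train_stage: Callable):
    """Run the stages in order, passing the updated model forward."""
    for stage in PIPELINE:
        model = train_stage(model, stage)
    return model
```

The design point the article emphasizes is that every stage fine-tunes one shared backbone, so embodied and driving data flow through a single model rather than two vertical, task-specific ones.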
Performance Benchmarking
Across 29 benchmarks spanning perception, decision-making, and planning, MiMo-Embodied sets a new performance bar among open-source foundation models, surpassing existing open- and closed-source systems alike:
Embodied AI: Achieved SOTA results on 17 benchmarks, pushing the frontier in task planning, affordance prediction, and spatial understanding.
Autonomous Driving: Delivered breakthroughs across 12 benchmarks, covering the full chain of perception, prediction, and planning.
Vision-Language General Intelligence: Demonstrated stronger generalization and major gains across key multimodal benchmarks.

MiMo-Embodied is now fully open-sourced: the model and source code are available on Hugging Face, and the accompanying technical report has been published on arXiv.
