
Xiaomi Releases and Fully Open-Sources MiMo-Embodied, the First Model to Bridge Autonomous Driving and Embodied Intelligence
Xiaomi’s MiMo-Embodied becomes the first open-source model to unify embodied intelligence and autonomous driving, setting new benchmark records across 29 industry tests.
Xiaomi announced today that its embodied foundation model MiMo-Embodied has officially launched and is now fully open-sourced.
As embodied intelligence moves into home robotics and autonomous driving accelerates into scaled deployment, a major industry challenge has become increasingly apparent: How can robots and vehicles share perception, reasoning, and decision-making capabilities? And can indoor robotic intelligence meaningfully enhance outdoor driving intelligence—and vice versa? MiMo-Embodied is Xiaomi’s answer to this cross-domain convergence.
According to Xiaomi, MiMo-Embodied is the industry’s first embodied foundation model that unifies autonomous driving and embodied AI, bringing the two domains under a single modeling framework. The release marks a significant step forward from vertical, task-specific models toward cross-domain, synergistic general embodied intelligence.

Three Core Technical Highlights
1. Cross-Domain Capability Coverage: MiMo-Embodied simultaneously supports the three core tasks of embodied AI (affordance reasoning, task planning, and spatial understanding) and the three key tasks of autonomous driving (environment perception, state prediction, and driving planning), forming a unified intelligence backbone for full-scenario applications.
2. Two-Way Knowledge Transfer: The model validates the synergy between indoor manipulation intelligence and road-level decision-making, demonstrating that capabilities learned in one domain can enhance performance in the other.
3. End-to-End Reliability Across the Full Stack: Through a multi-stage training pipeline (embodied and driving skill learning → chain-of-thought (CoT) inference enhancement → fine-grained RL optimization; see the sketch after this list), MiMo-Embodied significantly improves deployment reliability in real-world environments.
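
To make the staged pipeline in point 3 concrete, the sketch below expresses it as a sequence of training stages in Python. The stage names, dataset labels, and the train_stage callback are illustrative assumptions inferred only from the description above, not Xiaomi's published implementation.

```python
# Hypothetical sketch of the three-stage training pipeline described
# above. Stage names, dataset labels, and the train_stage callback are
# assumptions for illustration, not Xiaomi's actual implementation.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    datasets: list[str]
    objective: str  # "supervised" or "rl"

PIPELINE = [
    # Stage 1: supervised learning of embodied and driving skills.
    Stage("skill_learning",
          ["embodied_affordance", "task_planning", "spatial_understanding",
           "driving_perception", "state_prediction", "driving_planning"],
          objective="supervised"),
    # Stage 2: chain-of-thought traces to strengthen step-by-step reasoning.
    Stage("cot_enhancement", ["cot_reasoning_traces"], objective="supervised"),
    # Stage 3: fine-grained RL optimization targeting deployment reliability.
    Stage("rl_optimization", ["reward_labeled_rollouts"], objective="rl"),
]

def run_pipeline(model, train_stage: Callable):
    """Run the stages in order, passing the updated model forward."""
    for stage in PIPELINE:
        model = train_stage(model, stage)
    return model
```

The design point the article emphasizes is that every stage fine-tunes one shared backbone, so embodied and driving data flow through a single model rather than two vertical, task-specific ones.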
Performance Benchmarking
Across 29 benchmarks spanning perception, decision-making, and planning, MiMo-Embodied sets a new performance bar among open-source foundation models, surpassing existing open- and closed-source systems alike:
Embodied AI: Achieved SOTA results on 17 benchmarks, pushing the frontier in task planning, affordance prediction, and spatial understanding.
Autonomous Driving: Delivered breakthroughs across 12 benchmarks, covering the full chain of perception, prediction, and planning.
Vision-Language General Intelligence: Demonstrated stronger generalization and major gains across key multimodal benchmarks.

MiMo-Embodied is now fully open-sourced: the model and source code are available on Hugging Face, and the accompanying technical report has been published on arXiv.
