Xiaomi is no longer relying on phones and smart-home hardware alone. Over the past year and a half, the company has built an AI stack that now spans large language models, voice, vision, and agentic tools designed to act inside a phone.
The scale of that move matters because Xiaomi is not just adding AI features to existing products. It is building a full pipeline that reaches from open-source models for developers to consumer tools inside HyperOS, plus systems connected to smart homes and vehicles.
From compact models to giant architectures
Xiaomi’s first major step in the LLM race came in April 2025 with MiMo-7B. The model, short for Xiaomi Model, was built with reasoning and coding as its main focus rather than general chat.
Despite having only 7 billion parameters, MiMo-7B performed far above its size class, Xiaomi said. On the MATH-500 mathematics benchmark, the reinforcement learning version reportedly reached 95.8%, and it was said to outperform OpenAI o1-mini and Alibaba QwQ-32B-Preview on AIME 2024 and 2025.
The model was trained on 200 billion carefully curated reasoning tokens and saw 25 trillion tokens in total across three training phases. Xiaomi released it under the MIT license and made it available on Hugging Face.
The project was led by Luo Fuli, who joined Xiaomi from DeepSeek. That hiring move showed the company was treating AI as a serious technical bet, not a side experiment.
Speed, efficiency, and open-source scale
By December 2025, Xiaomi had introduced MiMo-V2-Flash. The model carries 309 billion parameters, but only about 15 billion are active at a time thanks to a Mixture-of-Experts design.
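The gap between total and active parameters comes from how Mixture-of-Experts routing works: a router scores every expert for each token, but only the top few actually run. The following is a minimal sketch of that idea in Python. The expert count, the value of k, and the toy "experts" are illustrative assumptions and do not describe MiMo-V2-Flash's actual configuration.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy layer: 8 experts, each just scales its input; 2 run per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]
routes = top_k_route(logits, k=2)
output = moe_forward(3.0, experts, logits, k=2)

# Share of parameters active per token, using the figures quoted above.
active_share = 15 / 309  # a little under 5%
```

The other six experts in this toy layer are never evaluated for that token, which is why compute per token tracks the active parameter count rather than the total.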
Xiaomi said the model combined strong performance with fast output. It described MiMo-V2-Flash as one of the top two open-source models for reasoning, matching GPT-5 and Claude Sonnet 4.5 on SWE-Bench Verified while reaching 150 tokens per second.
The company also claimed inference costs were just 2.5% of Claude’s pricing. Its API was priced at $0.10 per million input tokens, with free access offered during an initial launch window.
MiMo-V2-Flash also uses Multi-Token Prediction, or MTP. That approach allows the model to generate and verify multiple tokens at once to improve efficiency.
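To make "generate and verify" concrete, here is a generic greedy draft-and-verify loop in Python. It is a sketch of the general technique only, not Xiaomi's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the main model and a cheap prediction head, and the verification is written as a plain loop where a real system would check all drafted tokens in one batched pass.

```python
def target_next(prefix):
    """Stand-in for the main model: next token is the last token plus one, mod 10."""
    return (prefix[-1] + 1) % 10

def draft_next(prefix):
    """Stand-in for a cheap draft head: agrees with the target except after a 4."""
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10

def draft_and_verify(prefix, n_new, k):
    """Generate n_new tokens, drafting k at a time and keeping the verified part."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < n_new:
        steps += 1
        # Draft k tokens cheaply, each conditioned on the previous drafts.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # Verify: accept drafted tokens until the first disagreement, and
        # use the target model's own token at the position that disagreed.
        for tok in draft:
            expected = target_next(out)
            out.append(expected)
            if tok != expected:
                break
    return out[len(prefix):len(prefix) + n_new], steps

tokens, steps = draft_and_verify([0], n_new=8, k=3)
```

In this toy run the eight tokens are produced in three verification steps instead of eight single-token steps, and the output is identical to what the target function alone would have generated.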
The shift toward agentic and multimodal AI
Xiaomi’s ambitions became even clearer in March 2026 with MiMo-V2-Pro. The model has more than 1 trillion total parameters, 42 billion active parameters per pass, and a 1 million token context window.
Xiaomi said MiMo-V2-Pro was designed specifically for agentic tasks. In practical terms, that means it is aimed at complex, multi-step work that can continue without constant human prompting.
Before the official launch, the model appeared anonymously on OpenRouter under the name Hunter Alpha. It quickly climbed the leaderboard and processed more than 1.5 trillion tokens before Xiaomi acknowledged it.
Alongside it, Xiaomi also released MiMo-V2-Omni and MiMo-V2-TTS. Those models extend Xiaomi’s AI reach into text, images, audio, video, and synthetic speech for agent pipelines.
By late April 2026, Xiaomi unified its strongest V2 capabilities in MiMo-V2.5 and MiMo-V2.5-Pro. The Pro version carries 1.02 trillion parameters and handles text, images, audio, and video within a single architecture.
For complex tasks, MiMo-V2.5-Pro runs at roughly 60 to 80 tokens per second. The lighter MiMo-V2.5 is aimed at daily use and reaches around 100 to 150 tokens per second.
According to Artificial Analysis, MiMo-V2.5-Pro briefly became the world’s top open-source model for agentic capability at launch. Xiaomi also removed extra charges for the full 1 million token context and reset user credits to make the model easier for developers to try.
At the start of June 2026, Xiaomi introduced MiMo Code. The terminal-based coding agent is built on MiMo-V2.5 and includes persistent memory that keeps decisions traceable across long projects.
Voice, vision, and the smart-home layer
On the audio side, Xiaomi released MiDashengLM-7B in August 2025. The model was trained on a 38,662-hour dataset and uses a general audio caption approach, so it understands music, environmental sounds, speaker emotion, and acoustic context, not just spoken words.
MiDashengLM-7B is built on Alibaba’s Qwen2.5-Omni-7B. Xiaomi has already embedded the model in electric vehicles and smart-home devices, and it was released under the Apache 2.0 license for commercial use.
Xiaomi also published MiMo-Audio. Its audio encoder was later integrated into MiMo-V2.5 to support an omnimodal experience.
For vision, Xiaomi released MiMo-VL and the home-focused MiMo-VL-Miloco-7B. The Miloco model was designed to understand home environments, including gestures such as thumbs-up, OK, peace sign, and an open palm.
It can also identify common household activities like watching TV, exercising, or reading. Xiaomi trained it with a mix of supervised fine-tuning and reinforcement learning so it would remain useful in the home without losing general capability.
Voice cloning and phone-level features
In May 2026, the Next-gen Kaldi team at Xiaomi AI Lab released OmniVoice as open source. The model is a zero-shot text-to-speech system with voice cloning that supports 646 languages, including many with limited training data.
OmniVoice can clone a voice from only a few seconds of reference audio. It then generates natural speech across languages while keeping the original vocal character intact.
Technically, OmniVoice uses a simpler single-transformer architecture and maps text directly to acoustic tokens. Xiaomi said the design allows the model to be trained on 100,000 hours of audio data in a single day and enables inference up to 40 times faster than real time using PyTorch.
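As a rough illustration of what those figures imply, the arithmetic below converts them into wall-clock terms. It takes the quoted numbers at face value and treats the 40x figure as a best case; the function name is made up for this example.

```python
def synthesis_seconds(audio_seconds, speedup):
    """Wall-clock seconds needed to synthesize audio at `speedup` times real time."""
    return audio_seconds / speedup

# One hour of speech at the quoted upper bound of 40x real time.
one_hour = synthesis_seconds(3600, 40)

# Training throughput implied by 100,000 audio hours per day.
audio_hours_per_wall_hour = 100_000 / 24
```

At the quoted peak speed, an hour of speech would take about a minute and a half to generate, and the training claim works out to roughly 4,167 hours of audio processed per hour of wall-clock time.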
With the V2.5 launch, Xiaomi also added MiMo-V2.5-TTS and ASR. The TTS system supports voice cloning, while ASR handles bilingual recognition to build end-to-end voice products.
For consumers, Xiaomi continues to rely on Xiao AI and HyperAI. Xiao AI, already present in phones, smart speakers, and wearables, was upgraded through HyperOS 2 into Super Xiao AI with better context memory, more capable smart-home control, and text-to-image generation.
HyperAI was introduced globally at MWC 2025 and began appearing on the Xiaomi 15 series. The package includes real-time translation, writing assistance, smart voice recognition that summarizes recordings, and AI photo editing, with Google Gemini serving as the backend for global devices.
miclaw and Xiaomi’s broader strategy
The most ambitious piece of the strategy is miclaw. Announced in March 2026 and still in closed beta, it is an autonomous AI agent built on MiMo that goes beyond text answers.
miclaw can open apps, navigate interfaces, fill out forms, interact with system tools, and complete multi-step tasks on a phone. Xiaomi describes its mechanism as an inference-execution loop, where the AI plans actions, executes them, checks the result, and continues until the task is finished.
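The plan, execute, check, and continue cycle can be sketched in a few lines of Python. Everything here is a hypothetical toy: `ToyPhone`, `plan`, and `run_agent` are invented for illustration and say nothing about how miclaw is actually built. The "phone" is a form whose email field fails on the first attempt, so the loop has something to recover from.

```python
from dataclasses import dataclass, field

@dataclass
class ToyPhone:
    """Hypothetical stand-in for a device: a form with fields to fill."""
    form: dict = field(default_factory=dict)
    flaky: set = field(default_factory=lambda: {"email"})

    def fill(self, name, value):
        # Simulate a UI action that silently fails once for flaky fields.
        if name in self.flaky:
            self.flaky.discard(name)
            return
        self.form[name] = value

def plan(goal, phone):
    """Plan the next action: the first required field that is still wrong."""
    for name, value in goal.items():
        if phone.form.get(name) != value:
            return name, value
    return None

def run_agent(goal, phone, max_steps=10):
    """Plan, execute, check, and repeat until the goal is met or steps run out."""
    log = []
    for _ in range(max_steps):
        action = plan(goal, phone)
        if action is None:
            return True, log
        name, value = action
        phone.fill(name, value)              # execute the planned action
        ok = phone.form.get(name) == value   # check the observed result
        log.append((name, ok))
    return plan(goal, phone) is None, log

goal = {"name": "Ada", "email": "ada@example.com"}
done, log = run_agent(goal, ToyPhone())
```

Because the check happens after every action, the failed email attempt is simply re-planned on the next pass rather than ending the task. The step limit is a common safeguard against loops that never converge.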
The agent also has contextual memory that compresses older interactions while preserving the original goal. It can connect to Xiaomi’s smart-home and car ecosystem, and the current beta supports the Xiaomi 17 series.
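One simple way to picture memory that compresses older interactions while preserving the original goal is shown below. This is an assumption-laden sketch, not Xiaomi's design: the function name and parameters are invented, and plain truncation stands in for whatever summarization a real agent would use.

```python
def compress_memory(goal, turns, keep_recent=3, summary_chars=20):
    """Keep the goal verbatim, shorten older turns, and keep recent turns whole."""
    split = max(len(turns) - keep_recent, 0)
    older, recent = turns[:split], turns[split:]
    # Assumption: truncation is a placeholder for a real summarizer.
    summary = [t[:summary_chars] for t in older]
    return {"goal": goal, "summary": summary, "recent": recent}

turns = [
    "Opened the travel app and searched for flights to Shanghai",
    "Filtered results to morning departures under 900 yuan",
    "Selected the 08:15 flight and opened the passenger form",
    "Filled in the passenger name and ID number",
    "Reached the payment screen and paused for confirmation",
]
memory = compress_memory("Book a morning flight to Shanghai", turns, keep_recent=2)
```

The goal string is never shortened, so however long the session runs, the agent can still check each new plan against what the user originally asked for.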
For privacy, Xiaomi says user interactions with miclaw are not used to train the AI model. Personal data is processed in real time to carry out commands, while sensitive information is handled locally through what it calls edge-cloud privacy computing.
Lei Jun said in March 2026 that Xiaomi will invest at least $8.7 billion in AI over three years. With annual R&D spending projected to reach around 40 billion yuan in 2026, the company is also aiming for a “grand convergence” that brings its chips, operating system, and AI models into one device.
The early impact is already visible on OpenRouter. In early April 2026, Xiaomi’s models were said to have captured about 21% of all traffic on the AI routing platform.
Source: www.gizmochina.com






