
Codesign Pattern Spreads: Xiaomi's Trillion-Parameter Model Optimized at Architecture Stage, Not After
Xiaomi and TileRT developed MiMo-V2.5-Pro-UltraSpeed, a trillion-parameter model that reaches 1,000 tokens per second on standard GPUs through joint hardware-software optimization rather than post-hoc inference tuning. Kernel fusion, attention restructuring, and scheduler decisions were made during model design, which signals a structural shift across AI infrastructure: inference efficiency is becoming a first-class design objective, not an afterthought.