Xiaomi's Trillion-Parameter Model Optimizes Speed at Design Stage, Not After

Xiaomi and TileRT built a trillion-parameter AI model that runs at 1,000 tokens per second on standard GPUs. They did it by building efficiency into the design itself, through kernel fusion, attention restructuring, and scheduler choices, rather than adding it after the fact. The approach signals a shift: speed is becoming a core design goal from the start, not a tuning problem to solve afterward.

Published about 2 months ago