OpenRouter's Agent Tournament Exposes Limits of Static Benchmarks

OpenRouter ran a 30-game tournament across eleven language models on 17 June 2026, tracking $482 in inference costs. The elimination format, in which agents must reason, adapt, and survive successive rounds, surfaces failure modes that static benchmarks such as MMLU cannot show, because agents face multi-round competitive pressure rather than isolated questions. The transparent cost figures (about $16 per game, from $482 across 30 games) make the methodology easier to reproduce for teams evaluating production agentic workloads.
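The per-game figure follows from the two totals quoted above. A minimal sketch of that arithmetic, assuming the cost was spread evenly across games (the article does not say how cost varied from game to game); the function name `cost_per_game` is hypothetical and used only for illustration:

```python
# Totals as reported in the article.
TOTAL_COST_USD = 482
GAMES = 30


def cost_per_game(total_usd: float, games: int) -> float:
    """Average inference cost per game, assuming an even split (an assumption)."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total_usd / games


per_game = cost_per_game(TOTAL_COST_USD, GAMES)
print(f"Average cost per game: ${per_game:.2f}")  # prints "Average cost per game: $16.07"
```

A team budgeting a similar evaluation could swap in its own totals. The average says nothing about how much individual games differed in cost.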
