MiniMax: MiniMax M2.5

Survived 10 out of 15 breakers

Resilience

67%

MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds on the coding expertise of M2.1 and extends into general office work: it is fluent in generating and operating Word, Excel, and PowerPoint files, switching context between diverse software environments, and working across different agent and human teams. It scores 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp. M2.5 is also more token-efficient than previous generations, having been trained to optimize its actions and output through planning.

Context

196,608 tokens

Cost (Input)

$0.29 /1M tokens

Cost (Output)

$1.20 /1M tokens

Max completion tokens

196,608
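
The listed prices can be turned into a rough per-request estimate. The sketch below is only an illustration: it assumes simple linear per-token billing with no caching discounts, minimum charges, or tiered pricing, and the function name `estimate_cost` is hypothetical rather than part of any MiniMax or provider API.

```python
# Prices as listed on this page, in US dollars per one million tokens.
INPUT_PRICE_PER_M = 0.29
OUTPUT_PRICE_PER_M = 1.20
TOKENS_PER_M = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of one request.

    Assumes plain linear billing per token (an assumption, not a
    documented billing rule).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = output_tokens / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 20,000-token completion.
    print(f"${estimate_cost(100_000, 20_000):.4f}")
```

Under these assumptions, output tokens cost roughly four times as much as input tokens, so long completions dominate the bill well before long prompts do.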

Toughest Breakers

Car Wash Dilemma

Logic Reasoning

Pass rate: 0%

The Missing A

Pattern Matching

Pass rate: 0%

Horse Race Logic

Logic Reasoning

Pass rate: 0%

Breaker Results

Test	Category	Success Rate
Car Wash Dilemma	Logic Reasoning	0%
The Missing A	Pattern Matching	0%
Horse Race Logic	Logic Reasoning	0%
Self-Reference Count	Self Reference	13%
10-Step Instructions	Instruction Following	29%
Contradictory Premises	Logic Reasoning	38%
Broken Mug	Lateral Thinking	67%
Silence Protocol	Instruction Following	75%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Bullshit Detector	Epistemic Humility	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%
Coin Flip Paradox	Logic Reasoning	100%