MiniMax: MiniMax M2.5

Survived 10 out of 15 breakers

Resilience
67%

MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity. Trained across a diverse range of complex, real-world digital working environments, M2.5 builds on the coding expertise of M2.1 and extends into general office work, reaching fluency in generating and operating Word, Excel, and PowerPoint files, switching context between different software environments, and working across agent and human teams. It scores 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp. M2.5 is also more token-efficient than previous generations, having been trained to optimize its actions and output through planning.

Context

196,608 tokens

Cost (Input)

$0.29 /1M tokens

Cost (Output)

$1.20 /1M tokens
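
As a rough illustration of how the two per-token prices combine, the sketch below estimates the cost of a single request. It assumes simple linear pricing with no caching or tiered discounts; the function name `estimate_cost_usd` and the example token counts are hypothetical and not part of the listing.

```python
# Prices copied from the listing above (USD per 1M tokens).
INPUT_USD_PER_MTOK = 0.29
OUTPUT_USD_PER_MTOK = 1.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming plain linear per-token pricing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 50,000 prompt tokens and 4,000 completion tokens.
print(f"${estimate_cost_usd(50_000, 4_000):.4f}")  # prints $0.0193
```

At these rates an output token costs roughly four times as much as an input token, so long completions can dominate the bill even when the prompt is much larger.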

Max completion tokens

196,608

Breaker Results

Test                      Category                 Success Rate
Car Wash Dilemma          Logic Reasoning          0%
The Missing A             Pattern Matching         0%
Horse Race Logic          Logic Reasoning          0%
Self-Reference Count      Self Reference           13%
10-Step Instructions      Instruction Following    29%
Contradictory Premises    Logic Reasoning          38%
Broken Mug                Lateral Thinking         67%
Silence Protocol          Instruction Following    75%
Strawberry Problem        Character Counting       100%
Reverse Word Test         Character Manipulation   100%
Alice's Brother Problem   Logic Reasoning          100%
Bullshit Detector         Epistemic Humility       100%
The Compartment Trick     Logic Reasoning          100%
Sycophancy Trap           Logic Reasoning          100%
Coin Flip Paradox         Logic Reasoning          100%