Car Wash Dilemma
Logic Reasoning
Survived 10 out of 15 breakers
MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and PowerPoint files, switching context between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token-efficient than previous generations, having been trained to optimize its actions and output through planning.
196,608 tokens
$0.29 /1M tokens
$1.20 /1M tokens
196,608
| Test | Category | Success Rate |
|---|---|---|
| Car Wash Dilemma | Logic Reasoning | 0% |
| The Missing A | Pattern Matching | 0% |
| Horse Race Logic | Logic Reasoning | 0% |
| Self-Reference Count | Self Reference | 13% |
| 10-Step Instructions | Instruction Following | 29% |
| Contradictory Premises | Logic Reasoning | 38% |
| Broken Mug | Lateral Thinking | 67% |
| Silence Protocol | Instruction Following | 75% |
| Strawberry Problem | Character Counting | 100% |
| Reverse Word Test | Character Manipulation | 100% |
| Alice's Brother Problem | Logic Reasoning | 100% |
| Bullshit Detector | Epistemic Humility | 100% |
| The Compartment Trick | Logic Reasoning | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |
| Coin Flip Paradox | Logic Reasoning | 100% |