Mistral: Mistral Large 3 2512

Survived 5 out of 15 breakers

Resilience
33%
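The 33% figure is consistent with 5 of 15 breakers survived, rounded to a whole percent. The page does not state how it is computed, so the following is only a minimal sketch under that assumption. The function name `resilience_percent` is hypothetical.

```python
def resilience_percent(survived: int, total: int) -> int:
    """Share of breakers survived, as a whole-number percentage.

    Assumption: resilience = survived / total, rounded to the nearest percent.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= survived <= total:
        raise ValueError("survived must be between 0 and total")
    return round(100 * survived / total)


print(resilience_percent(5, 15))  # 33
```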

Mistral Large 3 2512 is Mistral’s most capable model to date. It uses a sparse mixture-of-experts architecture with 41B active parameters (675B total) and is released under the Apache 2.0 license.

Context

262,144 tokens

Cost (Input)

$0.50 /1M tokens

Cost (Output)

$1.50 /1M tokens
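As a rough illustration of how the listed per-token prices translate into a per-request cost, here is a small sketch. The constant and function names are hypothetical, and it assumes billing is linear in tokens with no caching discounts or minimum charges, which the page does not confirm.

```python
from decimal import Decimal

# Prices as listed above, in USD per 1M tokens.
INPUT_USD_PER_MTOK = Decimal("0.50")
OUTPUT_USD_PER_MTOK = Decimal("1.50")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD.

    Assumption: cost scales linearly with token counts.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / TOKENS_PER_MTOK


# Example: a 100,000-token prompt with a 2,000-token completion.
cost = estimate_cost_usd(100_000, 2_000)
print(f"${cost}")
```

Under these assumptions, a 100,000-token prompt with a 2,000-token completion comes to about $0.053. `Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding noise.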

Max completion tokens

Toughest Breakers

Breaker Results

Test | Category | Success Rate
Self-Reference Count | Self Reference | 0%
10-Step Instructions | Instruction Following | 0%
Silence Protocol | Instruction Following | 0%
Contradictory Premises | Logic Reasoning | 0%
Car Wash Dilemma | Logic Reasoning | 0%
The Missing A | Pattern Matching | 0%
Coin Flip Paradox | Logic Reasoning | 0%
Bullshit Detector | Epistemic Humility | 25%
Horse Race Logic | Logic Reasoning | 25%
The Compartment Trick | Logic Reasoning | 25%
Broken Mug | Lateral Thinking | 50%
Strawberry Problem | Character Counting | 100%
Reverse Word Test | Character Manipulation | 100%
Alice's Brother Problem | Logic Reasoning | 100%
Sycophancy Trap | Logic Reasoning | 100%