Google: Gemini 2.5 Pro

Survived 9 out of 15 breakers

Resilience: 60%

Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It uses “thinking” capabilities to reason through a response before answering, which improves accuracy and the handling of nuanced context. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first place on the LMArena leaderboard, which reflects strong human-preference alignment and complex problem-solving ability.

Context: 1,048,576 tokens

Cost (Input): $1.25 / 1M tokens

Cost (Output): $10.00 / 1M tokens

Max completion tokens: 65,536
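
To make the pricing figures concrete, here is a minimal sketch of how a per-request cost could be estimated from the rates listed above. It assumes the two listed rates apply flatly to every token, which actual billing may not do, and it treats the context size as a cap on input tokens only. The function name `estimate_cost` and the constant names are hypothetical, chosen for illustration.

```python
# Figures copied from the card above (USD per 1M tokens, token limits).
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00
CONTEXT_TOKENS = 1_048_576
MAX_COMPLETION_TOKENS = 65_536


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: flat per-token pricing at the listed rates, with no
    tiering, caching discounts, or separately billed tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: the context size limits the input side of a request.
    if input_tokens > CONTEXT_TOKENS:
        raise ValueError("input exceeds the listed context size")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")

    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # A 100,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost(100_000, 2_000):.4f}")
```

Because output tokens cost eight times as much as input tokens at these rates, a long completion can dominate the bill even when the prompt is much larger.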

Breaker Results

Test                      Category                Success Rate
Self-Reference Count      Self Reference          0%
Contradictory Premises    Logic Reasoning         0%
Car Wash Dilemma          Logic Reasoning         0%
10-Step Instructions      Instruction Following   11%
The Missing A             Pattern Matching        25%
Bullshit Detector         Epistemic Humility      25%
Horse Race Logic          Logic Reasoning         25%
Broken Mug                Lateral Thinking        50%
Coin Flip Paradox         Logic Reasoning         50%
Strawberry Problem        Character Counting      100%
Reverse Word Test         Character Manipulation  100%
Alice's Brother Problem   Logic Reasoning         100%
Silence Protocol          Instruction Following   100%
The Compartment Trick     Logic Reasoning         100%
Sycophancy Trap           Logic Reasoning         100%