Google: Gemini 2.5 Pro

Survived 9 out of 15 breakers

Resilience: 60%
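The two figures above are consistent if resilience is simply the share of breakers survived. That reading is an assumption, since the page does not define the score, and the `resilience` helper below is hypothetical. A minimal sketch of the arithmetic:

```python
def resilience(survived: int, total: int) -> float:
    """Return the percentage of breakers survived (assumed definition)."""
    if total <= 0 or not 0 <= survived <= total:
        raise ValueError("need 0 <= survived <= total and total > 0")
    return 100 * survived / total


# Figures taken from the listing: 9 of 15 breakers survived.
print(f"{resilience(9, 15):.0f}%")  # prints "60%"
```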

Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It uses “thinking” capabilities: it reasons through a response before answering, which improves accuracy and the handling of nuanced context. Gemini 2.5 Pro achieves top-tier results on multiple benchmarks, including first place on the LMArena leaderboard, which ranks models by human preference.

Context: 1,048,576 tokens

Cost (Input): $1.25 / 1M tokens

Cost (Output): $10.00 / 1M tokens

Max completion tokens: 65,536
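As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. It assumes flat per-token rates exactly as listed, with no tiering, caching discounts, or separately billed reasoning tokens. It also assumes that the context figure bounds the input and the max-completion figure bounds the output. The `estimate_cost` helper and the constant names are hypothetical.

```python
# Figures copied from the listing; the helper itself is hypothetical.
INPUT_USD_PER_MTOK = 1.25       # USD per 1M input tokens
OUTPUT_USD_PER_MTOK = 10.00     # USD per 1M output tokens
CONTEXT_WINDOW = 1_048_576      # tokens (assumed to bound the input)
MAX_COMPLETION_TOKENS = 65_536  # tokens (assumed to bound the output)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed flat rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the listed context window")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# Example: a 100,000-token prompt with a 2,000-token reply.
print(f"${estimate_cost(100_000, 2_000):.3f}")  # prints "$0.145"
```

At these rates an output token costs eight times as much as an input token, so long completions dominate the bill even when the prompt is large.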

Toughest Breakers

Self-Reference Count (Self Reference): pass rate 0%

Contradictory Premises (Logic Reasoning): pass rate 0%

Car Wash Dilemma (Logic Reasoning): pass rate 0%

Breaker Results

Test	Category	Success Rate
Self-Reference Count	Self Reference	0%
Contradictory Premises	Logic Reasoning	0%
Car Wash Dilemma	Logic Reasoning	0%
10-Step Instructions	Instruction Following	11%
The Missing A	Pattern Matching	25%
Bullshit Detector	Epistemic Humility	25%
Horse Race Logic	Logic Reasoning	25%
Broken Mug	Lateral Thinking	50%
Coin Flip Paradox	Logic Reasoning	50%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Silence Protocol	Instruction Following	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%