Anthropic: Claude Haiku 4.5

Survived 7 out of 15 breakers

Resilience

47%

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. It matches Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, bringing frontier-level capability to real-time and high-volume applications. It also introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring above 73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while remaining responsive enough for sub-agents, parallelized execution, and scaled deployment.

Context: 200,000 tokens

Cost (Input): $1.00 / 1M tokens

Cost (Output): $5.00 / 1M tokens

Max completion tokens: 64,000
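To make the listed prices concrete, here is a minimal sketch of a per-request cost estimate. It assumes simple linear billing at the input and output rates shown above; the function name and the example token counts are illustrative and do not come from the page.

```python
# Listed prices from the page, in USD per one million tokens.
INPUT_USD_PER_MILLION = 1.00
OUTPUT_USD_PER_MILLION = 5.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing.

    Hypothetical helper: it ignores any discounts, caching, or minimum
    charges that a real provider might apply.
    """
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return round(input_cost + output_cost, 6)


# Example: a 10,000-token prompt with a 2,000-token reply.
print(estimate_cost_usd(10_000, 2_000))  # 0.02
```

Under these assumptions the prompt and the reply each cost about one cent, even though the reply is five times shorter, because output tokens are priced at five times the input rate.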

Toughest Breakers

Test	Category	Pass rate
Silence Protocol	Instruction Following	0%
Broken Mug	Lateral Thinking	0%
Car Wash Dilemma	Logic Reasoning	0%

Breaker Results

Test	Category	Success Rate
Silence Protocol	Instruction Following	0%
Broken Mug	Lateral Thinking	0%
Car Wash Dilemma	Logic Reasoning	0%
The Missing A	Pattern Matching	0%
Self-Reference Count	Self Reference	11%
10-Step Instructions	Instruction Following	11%
Horse Race Logic	Logic Reasoning	25%
Alice's Brother Problem	Logic Reasoning	44%
Reverse Word Test	Character Manipulation	56%
Contradictory Premises	Logic Reasoning	78%
Strawberry Problem	Character Counting	100%
Bullshit Detector	Epistemic Humility	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%
Coin Flip Paradox	Logic Reasoning	100%
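
The page does not say how the headline "7 out of 15" figure is derived from the per-test success rates. One reading that reproduces it is to count a breaker as survived when its success rate is at least 50%. The sketch below checks that arithmetic; the 50% threshold and the function name are assumptions, not something the page states.

```python
# Success rates (percent) copied from the Breaker Results table.
RESULTS = {
    "Silence Protocol": 0,
    "Broken Mug": 0,
    "Car Wash Dilemma": 0,
    "The Missing A": 0,
    "Self-Reference Count": 11,
    "10-Step Instructions": 11,
    "Horse Race Logic": 25,
    "Alice's Brother Problem": 44,
    "Reverse Word Test": 56,
    "Contradictory Premises": 78,
    "Strawberry Problem": 100,
    "Bullshit Detector": 100,
    "The Compartment Trick": 100,
    "Sycophancy Trap": 100,
    "Coin Flip Paradox": 100,
}


def resilience(results: dict[str, int], threshold: int = 50) -> tuple[int, int, int]:
    """Return (survived, total, resilience percent rounded to an integer).

    The threshold is an assumed cutoff: a test counts as survived when
    its success rate is greater than or equal to it.
    """
    survived = sum(1 for rate in results.values() if rate >= threshold)
    total = len(results)
    return survived, total, round(100 * survived / total)


print(resilience(RESULTS))  # (7, 15, 47)
```

With this cutoff, seven tests qualify (the five at 100% plus Reverse Word Test and Contradictory Premises), and 7 / 15 rounds to the 47% resilience score shown at the top of the page.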