Anthropic: Claude Sonnet 4.6

Survived 10 out of 15 breakers

Resilience
67%

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.

Context

1,000,000 tokens

Cost (Input)

$3.00 per 1M tokens

Cost (Output)

$15.00 per 1M tokens

Max completion tokens

128,000
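
As a rough illustration of how the two listed prices combine, the sketch below estimates the cost of a single request. It assumes flat per-token billing at the listed rates and ignores any discounts, caching, or tiered pricing that may apply in practice. The helper name `estimate_cost_usd` and the example token counts are hypothetical and not taken from the listing.

```python
# Prices taken from the listing above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated list-price cost of one request.

    Assumes flat per-token billing; real billing may differ.
    """
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Example token counts are made up for illustration.
example_cost = estimate_cost_usd(200_000, 10_000)
print(f"${example_cost:.2f}")  # prints $0.75
```

At these rates an output token costs five times as much as an input token, so long completions can dominate the bill even when the prompt is much larger.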

Breaker Results

Test                    | Category               | Latest Result | Success Rate
Silence Protocol        | Instruction Following  |               | 0%
Contradictory Premises  | Logic Reasoning        |               | 11%
Self-Reference Count    | Self Reference         |               | 22%
10-Step Instructions    | Instruction Following  |               | 22%
Car Wash Dilemma        | Logic Reasoning        |               | 50%
The Missing A           | Pattern Matching       |               | 75%
Horse Race Logic        | Logic Reasoning        |               | 75%
Strawberry Problem      | Character Counting     |               | 100%
Reverse Word Test       | Character Manipulation |               | 100%
Alice's Brother Problem | Logic Reasoning        |               | 100%
Broken Mug              | Lateral Thinking       |               | 100%
Bullshit Detector       | Epistemic Humility     |               | 100%
The Compartment Trick   | Logic Reasoning        |               | 100%
Sycophancy Trap         | Logic Reasoning        |               | 100%
Coin Flip Paradox       | Logic Reasoning        |               | 100%
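
The headline figures (10 of 15 breakers survived, 67% resilience) can be checked against the per-test success rates. The page does not state what counts as "survived"; the sketch below assumes a breaker is survived when its success rate is above 50%, a hypothetical cutoff chosen only because it is consistent with the numbers shown.

```python
# Success rates (percent) copied from the breaker results table.
success_rates = {
    "Silence Protocol": 0,
    "Contradictory Premises": 11,
    "Self-Reference Count": 22,
    "10-Step Instructions": 22,
    "Car Wash Dilemma": 50,
    "The Missing A": 75,
    "Horse Race Logic": 75,
    "Strawberry Problem": 100,
    "Reverse Word Test": 100,
    "Alice's Brother Problem": 100,
    "Broken Mug": 100,
    "Bullshit Detector": 100,
    "The Compartment Trick": 100,
    "Sycophancy Trap": 100,
    "Coin Flip Paradox": 100,
}

# ASSUMPTION: the survival cutoff is not given in the source; "above 50%"
# is a guess that happens to reproduce the published count.
SURVIVAL_THRESHOLD = 50

survived = [name for name, rate in success_rates.items()
            if rate > SURVIVAL_THRESHOLD]

# Resilience as the share of breakers survived, rounded to a whole percent.
resilience = round(100 * len(survived) / len(success_rates))

print(len(survived), len(success_rates), resilience)  # prints 10 15 67
```

Under this assumed cutoff, Car Wash Dilemma (50%) falls just on the failing side, which is what brings the count to 10 rather than 11.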