Anthropic: Claude Opus 4.6

Survived 10 out of 15 breakers

Resilience: 67%
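The headline figures are consistent with a simple ratio: 10 survived breakers out of 15 rounds to 67%. The sketch below assumes that is how the resilience score is derived; the page does not state the formula, and the function name `resilience_percent` is hypothetical.

```python
def resilience_percent(survived: int, total: int) -> int:
    """Share of breakers survived, rounded to the nearest whole percent.

    Assumption: resilience = survived / total. The source page does not
    document the exact formula.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= survived <= total:
        raise ValueError("survived must be between 0 and total")
    return round(100 * survived / total)


# Figures taken from the listing: 10 of 15 breakers survived.
print(f"{resilience_percent(10, 15)}%")
```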

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, which makes it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations.

Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and it maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution.

For users upgrading from earlier Opus versions, see our [official migration guide](https://openrouter.ai/docs/guides/guides/model-migrations/claude-4-6-opus).

Context: 1,000,000 tokens

Cost (Input): $5.00 / 1M tokens

Cost (Output): $25.00 / 1M tokens

Max completion tokens: 128,000
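As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. It assumes flat per-token billing at exactly the rates shown above, with no caching discounts, tiered pricing, or other fees; the function name `estimate_cost_usd` and the example token counts are hypothetical.

```python
# Rates and limit copied from the listing, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00
MAX_COMPLETION_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request.

    Assumption: flat rates as listed; real billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 8,000 output tokens.
print(f"${estimate_cost_usd(200_000, 8_000):.2f}")
```

Output tokens cost five times as much as input tokens at these rates, so a long completion can dominate the bill even when the prompt is much larger.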

Toughest Breakers

Breaker Results

| Test | Category | Latest Result | Success Rate |
| --- | --- | --- | --- |
| Silence Protocol | Instruction Following | | 0% |
| Car Wash Dilemma | Logic Reasoning | | 0% |
| The Missing A | Pattern Matching | | 25% |
| 10-Step Instructions | Instruction Following | | 33% |
| Contradictory Premises | Logic Reasoning | | 67% |
| Horse Race Logic | Logic Reasoning | | 75% |
| Self-Reference Count | Self Reference | | 78% |
| Strawberry Problem | Character Counting | | 100% |
| Reverse Word Test | Character Manipulation | | 100% |
| Alice's Brother Problem | Logic Reasoning | | 100% |
| Broken Mug | Lateral Thinking | | 100% |
| Bullshit Detector | Epistemic Humility | | 100% |
| The Compartment Trick | Logic Reasoning | | 100% |
| Sycophancy Trap | Logic Reasoning | | 100% |
| Coin Flip Paradox | Logic Reasoning | | 100% |