Anthropic: Claude Opus 4.6

Survived 10 out of 15 breakers

Resilience

67%

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations.

Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution.

For users upgrading from earlier Opus versions, see our [official migration guide here](https://openrouter.ai/docs/guides/guides/model-migrations/claude-4-6-opus).

Context

1,000,000 tokens

Cost (Input)

$5.00 /1M tokens

Cost (Output)

$25.00 /1M tokens

Max completion tokens

128,000
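
As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. It assumes the rates above apply as flat per-token prices with no tiering, caching discounts, or other adjustments, which this page does not confirm. The function name `estimate_cost_usd` and the limit checks are hypothetical, introduced only for this example.

```python
# Figures copied from the listing above.
INPUT_USD_PER_MILLION = 5.00
OUTPUT_USD_PER_MILLION = 25.00
CONTEXT_TOKENS = 1_000_000
MAX_COMPLETION_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars.

    Assumption: flat per-token pricing at the listed rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: the listed limits are treated as independent caps
    # on the prompt and on the completion.
    if input_tokens > CONTEXT_TOKENS:
        raise ValueError("input exceeds the listed context size")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_micro_usd / 1_000_000


# Example: a 200,000-token prompt with a 10,000-token reply.
print(f"${estimate_cost_usd(200_000, 10_000):.2f}")  # prints $1.25
```

At these rates an output token costs five times as much as an input token, so long completions dominate the bill faster than long prompts do.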

Toughest Breakers

Silence Protocol

Instruction Following

Pass rate

0%

Car Wash Dilemma

Logic Reasoning

Pass rate

0%

The Missing A

Pattern Matching

Pass rate

25%

Breaker Results

Test	Category	Success Rate
Silence Protocol	Instruction Following	0%
Car Wash Dilemma	Logic Reasoning	0%
The Missing A	Pattern Matching	25%
10-Step Instructions	Instruction Following	33%
Contradictory Premises	Logic Reasoning	67%
Horse Race Logic	Logic Reasoning	75%
Self-Reference Count	Self Reference	78%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Broken Mug	Lateral Thinking	100%
Bullshit Detector	Epistemic Humility	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%
Coin Flip Paradox	Logic Reasoning	100%
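
The page does not say how a breaker counts as "survived". The sketch below assumes a cutoff of 75% success rate, a hypothetical value chosen only because it reproduces the headline figures (10 out of 15, 67%) from the table above. The names `SURVIVAL_CUTOFF` and `resilience` are illustrative and not part of the site.

```python
# Success rates (percent) copied from the Breaker Results table.
RESULTS = {
    "Silence Protocol": 0,
    "Car Wash Dilemma": 0,
    "The Missing A": 25,
    "10-Step Instructions": 33,
    "Contradictory Premises": 67,
    "Horse Race Logic": 75,
    "Self-Reference Count": 78,
    "Strawberry Problem": 100,
    "Reverse Word Test": 100,
    "Alice's Brother Problem": 100,
    "Broken Mug": 100,
    "Bullshit Detector": 100,
    "The Compartment Trick": 100,
    "Sycophancy Trap": 100,
    "Coin Flip Paradox": 100,
}

# Assumption: a breaker is "survived" at or above this success rate.
# The actual rule used by the site is not stated.
SURVIVAL_CUTOFF = 75


def resilience(results: dict[str, int], cutoff: int = SURVIVAL_CUTOFF):
    """Return (survived, total, resilience percent rounded to an integer)."""
    survived = sum(1 for rate in results.values() if rate >= cutoff)
    total = len(results)
    return survived, total, round(100 * survived / total)


survived, total, percent = resilience(RESULTS)
print(f"Survived {survived} out of {total} breakers ({percent}%)")
# prints: Survived 10 out of 15 breakers (67%)
```

Any cutoff above 67% and up to 75% gives the same count, so the table alone cannot pin down the exact rule.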