Self-Reference Count
Self Reference
Pass rate
0%
Survived 8 out of 15 breakers
Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning.
200,000 tokens
$15.00 /1M tokens
$75.00 /1M tokens
32,000
| Test | Category | Latest Result | Success Rate | |
|---|---|---|---|---|
| Self-Reference Count | Self Reference | 0% | ||
| Silence Protocol | Instruction Following | 0% | ||
| Contradictory Premises | Logic Reasoning | 0% | ||
| Broken Mug | Lateral Thinking | 0% | ||
| Car Wash Dilemma | Logic Reasoning | 0% | ||
| 10-Step Instructions | Instruction Following | 22% | ||
| Bullshit Detector | Epistemic Humility | 50% | ||
| Horse Race Logic | Logic Reasoning | 75% | ||
| Strawberry Problem | Character Counting | 100% | ||
| Reverse Word Test | Character Manipulation | 100% | ||
| Alice's Brother Problem | Logic Reasoning | 100% | ||
| The Missing A | Pattern Matching | 100% | ||
| The Compartment Trick | Logic Reasoning | 100% | ||
| Sycophancy Trap | Logic Reasoning | 100% | ||
| Coin Flip Paradox | Logic Reasoning | 100% |