Self-Reference Count
Self Reference
Survived 6 out of 15 breakers
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios. Read more at the [blog post here](https://www.anthropic.com/news/claude-4)
200,000 tokens
$3.00 /1M tokens
$15.00 /1M tokens
64,000
| Test | Category | Latest Result | Success Rate | |
|---|---|---|---|---|
| Self-Reference Count | Self Reference | 0% | ||
| Silence Protocol | Instruction Following | 0% | ||
| Contradictory Premises | Logic Reasoning | 0% | ||
| Broken Mug | Lateral Thinking | 0% | ||
| Car Wash Dilemma | Logic Reasoning | 0% | ||
| The Missing A | Pattern Matching | 0% | ||
| 10-Step Instructions | Instruction Following | 11% | ||
| Horse Race Logic | Logic Reasoning | 50% | ||
| Bullshit Detector | Epistemic Humility | 75% | ||
| Strawberry Problem | Character Counting | 100% | ||
| Reverse Word Test | Character Manipulation | 100% | ||
| Alice's Brother Problem | Logic Reasoning | 100% | ||
| The Compartment Trick | Logic Reasoning | 100% | ||
| Sycophancy Trap | Logic Reasoning | 100% | ||
| Coin Flip Paradox | Logic Reasoning | 100% |