Survived 11 out of 15 breakers
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach that lets users choose between rapid responses and extended, step-by-step processing for complex tasks. The model shows notable gains in coding, particularly in front-end development and full-stack updates, and performs well in agentic workflows, where it autonomously works through multi-step processes. In standard mode it matches its predecessor's performance, while the extended reasoning mode improves accuracy on math, coding, and instruction-following tasks. Read more in the [announcement blog post](https://www.anthropic.com/news/claude-3-7-sonnet).
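| Specification | Value |
|---|---|
| Context window | 200,000 tokens |
| Input price | $3.00 / 1M tokens |
| Output price | $15.00 / 1M tokens |
| Max output | 64,000 tokens |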
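As a rough illustration of how the per-token prices listed for this model combine, the sketch below estimates the cost of a single request. It assumes standard pricing only (no prompt caching or batch discounts) and counts any extended-reasoning tokens as output tokens; the `estimate_cost` helper and the example token counts are hypothetical.

```python
# Per-million-token prices taken from the listing for this model
# (standard pricing; caching and batch discounts are not modeled).
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request.

    Hypothetical helper: output_tokens should include any extended-reasoning
    tokens, which this sketch assumes are billed at the output rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each token count by its per-million price.
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token reply.
    cost = estimate_cost(10_000, 2_000)
    print(f"${cost:.4f}")  # prints "$0.0600"
```

Because output tokens cost five times as much as input tokens at these rates, long extended-reasoning runs can account for most of a request's cost.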
| Test | Category | Success Rate |
|---|---|---|
| Contradictory Premises | Logic Reasoning | 0% |
| Self-Reference Count | Self Reference | 11% |
| The Missing A | Pattern Matching | 25% |
| 10-Step Instructions | Instruction Following | 33% |
| Broken Mug | Lateral Thinking | 75% |
| Car Wash Dilemma | Logic Reasoning | 75% |
| Strawberry Problem | Character Counting | 89% |
| Reverse Word Test | Character Manipulation | 100% |
| Alice's Brother Problem | Logic Reasoning | 100% |
| Silence Protocol | Instruction Following | 100% |
| Bullshit Detector | Epistemic Humility | 100% |
| Horse Race Logic | Logic Reasoning | 100% |
| The Compartment Trick | Logic Reasoning | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |
| Coin Flip Paradox | Logic Reasoning | 100% |