10-Step Instructions (Instruction Following): pass rate 11%
Survived 13 out of 15 breakers
GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing and advanced prompt understanding, including user-specified intent such as "think hard about this." Improvements include reduced hallucination and sycophancy, and better performance in coding, writing, and health-related tasks.
Context window: 400,000 tokens
Input price: $1.25 /1M tokens
Output price: $10.00 /1M tokens
Max output tokens: 128,000
| Test | Category | Success Rate |
|---|---|---|
| 10-Step Instructions | Instruction Following | 11% |
| Contradictory Premises | Logic Reasoning | 11% |
| Silence Protocol | Instruction Following | 22% |
| Self-Reference Count | Self Reference | 33% |
| The Missing A | Pattern Matching | 50% |
| Bullshit Detector | Epistemic Humility | 75% |
| Coin Flip Paradox | Logic Reasoning | 75% |
| Strawberry Problem | Character Counting | 100% |
| Reverse Word Test | Character Manipulation | 100% |
| Alice's Brother Problem | Logic Reasoning | 100% |
| Broken Mug | Lateral Thinking | 100% |
| Car Wash Dilemma | Logic Reasoning | 100% |
| Horse Race Logic | Logic Reasoning | 100% |
| The Compartment Trick | Logic Reasoning | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |