
OpenAI: GPT-5

Survived 13 out of 15 breakers

Resilience

87%
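
The 87% resilience figure is consistent with the survival count above, assuming (the page does not say so) that resilience is simply the share of breakers survived, rounded to the nearest percent. A quick check in Python:

```python
# Assumption: resilience = survived breakers / total breakers,
# expressed as a percentage rounded to the nearest whole number.
survived = 13
total = 15

resilience_percent = round(100 * survived / total)
print(f"{resilience_percent}%")  # 13/15 = 0.8666..., which rounds to 87
```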

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing and advanced prompt understanding, including user-specified intent such as "think hard about this." Improvements include reduced hallucination and sycophancy, and better performance in coding, writing, and health-related tasks.

Context	400,000 tokens
Cost (Input)	$1.25 / 1M tokens
Cost (Output)	$10.00 / 1M tokens
Max completion tokens	128,000
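
To make the per-token prices concrete, here is a minimal sketch that estimates the cost of one request from the figures listed above. The function name `estimate_cost_usd` is hypothetical, and the check that input plus output must fit inside the 400,000-token context window is an assumption about how the limits combine, not something the page states.

```python
# Figures taken from the listing above.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00
CONTEXT_WINDOW_TOKENS = 400_000
MAX_COMPLETION_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")
    # Assumption: the context window covers input and output together.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the listed context window")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Example: a 100,000-token prompt with a 10,000-token completion.
print(round(estimate_cost_usd(100_000, 10_000), 4))
```

At these rates an output token costs eight times as much as an input token, so long completions tend to dominate the bill.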

Toughest Breakers

Test	Category	Pass rate
10-Step Instructions	Instruction Following	11%
Contradictory Premises	Logic Reasoning	11%
Silence Protocol	Instruction Following	22%

Breaker Results

Test	Category	Success Rate
10-Step Instructions	Instruction Following	11%
Contradictory Premises	Logic Reasoning	11%
Silence Protocol	Instruction Following	22%
Self-Reference Count	Self Reference	33%
The Missing A	Pattern Matching	50%
Bullshit Detector	Epistemic Humility	75%
Coin Flip Paradox	Logic Reasoning	75%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Broken Mug	Lateral Thinking	100%
Car Wash Dilemma	Logic Reasoning	100%
Horse Race Logic	Logic Reasoning	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%
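
The per-test rates above can be rolled up by category to show where the model struggles. The sketch below copies the table rows verbatim and averages the success rate within each category. The helper name `mean_by_category` is hypothetical, and an unweighted mean is an assumption, since the page does not give the number of attempts per test.

```python
from collections import defaultdict

# (test, category, success rate in percent), copied from the table above.
RESULTS = [
    ("10-Step Instructions", "Instruction Following", 11),
    ("Contradictory Premises", "Logic Reasoning", 11),
    ("Silence Protocol", "Instruction Following", 22),
    ("Self-Reference Count", "Self Reference", 33),
    ("The Missing A", "Pattern Matching", 50),
    ("Bullshit Detector", "Epistemic Humility", 75),
    ("Coin Flip Paradox", "Logic Reasoning", 75),
    ("Strawberry Problem", "Character Counting", 100),
    ("Reverse Word Test", "Character Manipulation", 100),
    ("Alice's Brother Problem", "Logic Reasoning", 100),
    ("Broken Mug", "Lateral Thinking", 100),
    ("Car Wash Dilemma", "Logic Reasoning", 100),
    ("Horse Race Logic", "Logic Reasoning", 100),
    ("The Compartment Trick", "Logic Reasoning", 100),
    ("Sycophancy Trap", "Logic Reasoning", 100),
]


def mean_by_category(results):
    """Return the unweighted mean success rate (percent) per category."""
    buckets = defaultdict(list)
    for _test, category, rate in results:
        buckets[category].append(rate)
    return {category: sum(rates) / len(rates) for category, rates in buckets.items()}


# Print categories from weakest to strongest.
for category, mean_rate in sorted(mean_by_category(RESULTS).items(), key=lambda kv: kv[1]):
    print(f"{category}: {mean_rate:.1f}%")
```

On these rows, Instruction Following has the lowest mean (two tests, at 11% and 22%), while eight of the fifteen tests sit at 100%.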