OpenAI: GPT-5.2

Survived 12 out of 15 breakers

Resilience

80%

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance than GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while reasoning more deeply on complex tasks. Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, science, and tool-calling workloads, with more coherent long-form answers and improved tool-use reliability.

Context

400,000 tokens

Cost (Input)

$1.75 /1M tokens

Cost (Output)

$14.00 /1M tokens

Max completion tokens

128,000
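
As a rough illustration of what the listed prices imply, the sketch below estimates the cost of a single request. The function name `estimate_cost` and the example token counts are hypothetical. It also assumes that the 400,000-token context window covers input and output together, which this page does not state.

```python
# Figures copied from this page (USD per 1M tokens, and token limits).
INPUT_USD_PER_M = 1.75
OUTPUT_USD_PER_M = 14.00
CONTEXT_TOKENS = 400_000
MAX_COMPLETION_TOKENS = 128_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_TOKENS:
        raise ValueError("request exceeds the listed context size")
    input_cost = input_tokens * INPUT_USD_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 10,000-token answer.
    print(f"${estimate_cost(100_000, 10_000):.3f}")
```

At these rates an output token costs eight times as much as an input token, so long completions dominate the bill even when the prompt is much larger.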

Toughest Breakers

The Missing A

Pattern Matching

Pass rate

0%

Self-Reference Count

Self Reference

Pass rate

13%

10-Step Instructions

Instruction Following

Pass rate

13%

Breaker Results

Test	Category	Success Rate
The Missing A	Pattern Matching	0%
Self-Reference Count	Self Reference	13%
10-Step Instructions	Instruction Following	13%
Contradictory Premises	Logic Reasoning	25%
Horse Race Logic	Logic Reasoning	33%
Car Wash Dilemma	Logic Reasoning	67%
Strawberry Problem	Character Counting	75%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Silence Protocol	Instruction Following	100%
Broken Mug	Lateral Thinking	100%
Bullshit Detector	Epistemic Humility	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%
Coin Flip Paradox	Logic Reasoning	100%
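
The page does not say how a breaker counts as "survived". As a minimal sketch, the snippet below copies the table into a list and applies a hypothetical cutoff of 25%, which happens to reproduce the headline figures of 12 out of 15 and 80% resilience. The cutoff, the names `ASSUMED_THRESHOLD` and `survived`, and the definition of resilience as survived divided by total are all assumptions, not the site's documented method.

```python
# (test, category, success rate in percent), copied from the table above.
RESULTS = [
    ("The Missing A", "Pattern Matching", 0),
    ("Self-Reference Count", "Self Reference", 13),
    ("10-Step Instructions", "Instruction Following", 13),
    ("Contradictory Premises", "Logic Reasoning", 25),
    ("Horse Race Logic", "Logic Reasoning", 33),
    ("Car Wash Dilemma", "Logic Reasoning", 67),
    ("Strawberry Problem", "Character Counting", 75),
    ("Reverse Word Test", "Character Manipulation", 100),
    ("Alice's Brother Problem", "Logic Reasoning", 100),
    ("Silence Protocol", "Instruction Following", 100),
    ("Broken Mug", "Lateral Thinking", 100),
    ("Bullshit Detector", "Epistemic Humility", 100),
    ("The Compartment Trick", "Logic Reasoning", 100),
    ("Sycophancy Trap", "Logic Reasoning", 100),
    ("Coin Flip Paradox", "Logic Reasoning", 100),
]

# Assumption: a breaker is "survived" when its success rate is at least 25%.
ASSUMED_THRESHOLD = 25


def survived(results, threshold):
    """Return the names of tests whose success rate meets the threshold."""
    return [name for name, _category, rate in results if rate >= threshold]


if __name__ == "__main__":
    passed = survived(RESULTS, ASSUMED_THRESHOLD)
    resilience = len(passed) / len(RESULTS)
    print(f"Survived {len(passed)} out of {len(RESULTS)} breakers")
    print(f"Resilience {resilience:.0%}")
```

Any cutoff above 13% and up to 25% gives the same count, since only the three lowest-scoring breakers fall below it.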