DeepSeek: DeepSeek V3.2

Survived 6 out of 15 breakers

Resilience: 40%

DeepSeek-V3.2 is a large language model designed to combine high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, improving compliance and generalization in interactive environments. Users can control reasoning behavior with the `enabled` boolean of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
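
Reasoning is toggled per request. Below is a minimal sketch of a chat completions request against OpenRouter with reasoning enabled; the model slug `deepseek/deepseek-v3.2` is an assumption here, so check the model page for the exact identifier.

```python
# A minimal sketch of toggling reasoning via OpenRouter's chat completions API.
# The model slug "deepseek/deepseek-v3.2" is assumed; verify it on the model page.
import os
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-v3.2",  # assumed slug
        "messages": [
            {"role": "user", "content": "How many r's are in 'strawberry'?"}
        ],
        "reasoning": {"enabled": True},  # set False to skip reasoning tokens
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```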

| Spec | Value |
|------|-------|
| Context | 163,840 tokens |
| Cost (Input) | $0.25 / 1M tokens |
| Cost (Output) | $0.40 / 1M tokens |
| Max completion tokens | 65,536 |
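
For a rough sense of what these rates mean per request, the sketch below multiplies token counts by the per-million rates above. `estimate_cost` is a hypothetical helper, not an OpenRouter API, and actual billing may also count reasoning tokens as output.

```python
# Back-of-the-envelope per-request cost from the per-1M-token rates above.
# estimate_cost is a hypothetical helper, not an official billing formula.
INPUT_RATE = 0.25 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.40 / 1_000_000  # $ per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A 10,000-token prompt with a 2,000-token completion:
# 10,000 * $0.25/1M + 2,000 * $0.40/1M = $0.0025 + $0.0008 = $0.0033
print(f"${estimate_cost(10_000, 2_000):.4f}")  # $0.0033
```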

Breaker Results

| Test | Category | Success Rate |
|------|----------|--------------|
| Silence Protocol | Instruction Following | 0% |
| Car Wash Dilemma | Logic Reasoning | 0% |
| The Missing A | Pattern Matching | 0% |
| Horse Race Logic | Logic Reasoning | 0% |
| Self-Reference Count | Self Reference | 9% |
| 10-Step Instructions | Instruction Following | 9% |
| Contradictory Premises | Logic Reasoning | 18% |
| Broken Mug | Lateral Thinking | 50% |
| Reverse Word Test | Character Manipulation | 55% |
| Bullshit Detector | Epistemic Humility | 75% |
| Strawberry Problem | Character Counting | 100% |
| Alice's Brother Problem | Logic Reasoning | 100% |
| The Compartment Trick | Logic Reasoning | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |
| Coin Flip Paradox | Logic Reasoning | 100% |