Models Challenges Benchmarks About Submit Challenge

Google: Gemini 3.1 Pro Preview

Survived 8 out of 15 breakers

Resilience

53%

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.

Context

1,048,576 tokens

Cost (Input)

$2.00 /1M tokens

Cost (Output)

$12.00 /1M tokens

Max completion tokens

65,536

Toughest Breakers

Self-Reference Count

Self Reference

Pass rate

The Missing A

Pattern Matching

Pass rate

Bullshit Detector

Epistemic Humility

Pass rate

Breaker Results

Test	Category	Success Rate
Self-Reference Count	Self Reference	0%
The Missing A	Pattern Matching	0%
Bullshit Detector	Epistemic Humility	0%
10-Step Instructions	Instruction Following	11%
Contradictory Premises	Logic Reasoning	33%
Car Wash Dilemma	Logic Reasoning	50%
Coin Flip Paradox	Logic Reasoning	50%
Broken Mug	Lateral Thinking	75%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Silence Protocol	Instruction Following	100%
Horse Race Logic	Logic Reasoning	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%