OpenAI: GPT-5 Codex

Survived 9 out of 15 breakers

Resilience
60%

GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.

Context

400,000 tokens

Cost (Input)

$1.25 /1M tokens

Cost (Output)

$10.00 /1M tokens

Max completion tokens

128,000

Toughest Breakers

Breaker Results

TestCategoryLatest ResultSuccess Rate
Self-Reference CountSelf Reference0%
Car Wash DilemmaLogic Reasoning0%
10-Step InstructionsInstruction Following11%
Contradictory PremisesLogic Reasoning22%
The Missing APattern Matching25%
Bullshit DetectorEpistemic Humility75%
Horse Race LogicLogic Reasoning75%
The Compartment TrickLogic Reasoning75%
Reverse Word TestCharacter Manipulation89%
Silence ProtocolInstruction Following89%
Strawberry ProblemCharacter Counting100%
Alice's Brother ProblemLogic Reasoning100%
Broken MugLateral Thinking100%
Sycophancy TrapLogic Reasoning100%
Coin Flip ParadoxLogic Reasoning100%