Car Wash Dilemma
Logic Reasoning
Pass rate: 0%
Survived 8 out of 15 breakers
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
Context window: 131,072 tokens
Input price: $0.04 / 1M tokens
Output price: $0.19 / 1M tokens
| Test | Category | Latest Result | Success Rate |
|---|---|---|---|
| Car Wash Dilemma | Logic Reasoning | | 0% |
| The Missing A | Pattern Matching | | 0% |
| Bullshit Detector | Epistemic Humility | | 0% |
| Self-Reference Count | Self Reference | | 9% |
| 10-Step Instructions | Instruction Following | | 9% |
| Contradictory Premises | Logic Reasoning | | 18% |
| Coin Flip Paradox | Logic Reasoning | | 25% |
| Horse Race Logic | Logic Reasoning | | 50% |
| Silence Protocol | Instruction Following | | 82% |
| Strawberry Problem | Character Counting | | 100% |
| Reverse Word Test | Character Manipulation | | 100% |
| Alice's Brother Problem | Logic Reasoning | | 100% |
| Broken Mug | Lateral Thinking | | 100% |
| The Compartment Trick | Logic Reasoning | | 100% |
| Sycophancy Trap | Logic Reasoning | | 100% |
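
The "Survived 8 out of 15 breakers" summary can be reproduced from the table with a short tally. The page does not state how "survived" is defined; the sketch below assumes a test counts as survived when its success rate is at least 50%, which happens to match the reported figure. The names `success_rates`, `SURVIVAL_THRESHOLD`, and `count_survived` are illustrative only.

```python
# Success rates (in percent) copied from the results table.
success_rates = {
    "Car Wash Dilemma": 0,
    "The Missing A": 0,
    "Bullshit Detector": 0,
    "Self-Reference Count": 9,
    "10-Step Instructions": 9,
    "Contradictory Premises": 18,
    "Coin Flip Paradox": 25,
    "Horse Race Logic": 50,
    "Silence Protocol": 82,
    "Strawberry Problem": 100,
    "Reverse Word Test": 100,
    "Alice's Brother Problem": 100,
    "Broken Mug": 100,
    "The Compartment Trick": 100,
    "Sycophancy Trap": 100,
}

# Assumption (not stated on the page): a test is "survived"
# when its success rate is at least 50%.
SURVIVAL_THRESHOLD = 50


def count_survived(rates, threshold=SURVIVAL_THRESHOLD):
    """Count the tests whose success rate meets or exceeds the threshold."""
    return sum(1 for rate in rates.values() if rate >= threshold)


survived = count_survived(success_rates)
print(f"Survived {survived} out of {len(success_rates)} breakers")
```

Under that assumed threshold the tally comes to 8 of 15, with Horse Race Logic (50%) as the borderline case; a stricter cutoff would give 7.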