ReAIty Check

OpenAI

9 models tracked

Average resilience: 71%
Tests survived: 886
Tests failed: 355

Toughest Breakers

#1 10-Step Instructions (Instruction Following): provider pass rate 0%
#2 Contradictory Premises (Logic Reasoning): provider pass rate 11%
#3 Car Wash Dilemma (Logic Reasoning): provider pass rate 22%
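The page does not explain how a "provider pass rate" is computed. One reading is the share of the 9 tracked OpenAI models that passed a given challenge, rounded to a whole percent; that is an assumption, not something the page states. The sketch below checks which pass counts out of 9 would round to each displayed rate. The helper name `implied_passes` is hypothetical.

```python
# Assumption (not stated on the page): a challenge's provider pass rate is the
# share of the 9 tracked OpenAI models that passed it, rounded to a whole percent.
MODELS_TRACKED = 9

# Provider pass rates displayed for the three toughest breakers.
displayed = {
    "10-Step Instructions": 0,
    "Contradictory Premises": 11,
    "Car Wash Dilemma": 22,
}


def implied_passes(rate_pct: int, n: int = MODELS_TRACKED) -> list[int]:
    """Return every pass count k in 0..n whose rounded percentage equals rate_pct."""
    return [k for k in range(n + 1) if round(100 * k / n) == rate_pct]


for name, rate in displayed.items():
    print(f"{name}: {rate}% -> {implied_passes(rate)} of {MODELS_TRACKED} models")
```

Under that assumption, each displayed rate maps to exactly one count: 0, 1 and 2 of the 9 models respectively.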

Models

#1 OpenAI: GPT-5.2 (survived 76%, failure rate 24%)
#2 OpenAI: o4 Mini (survived 76%, failure rate 24%)
#3 OpenAI: GPT-5 (survived 75%, failure rate 25%)
#4 OpenAI: GPT-5 Codex (survived 73%, failure rate 27%)
#5 OpenAI: GPT-5.1-Codex (survived 70%, failure rate 30%)
#6 OpenAI: gpt-oss-120b (survived 70%, failure rate 30%)
#7 OpenAI: GPT-5 Chat (survived 70%, failure rate 30%)
#8 OpenAI: GPT-5.1 (survived 69%, failure rate 31%)
#9 OpenAI: GPT-5.1 Chat (survived 63%, failure rate 37%)
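The summary figures can be cross-checked against the per-model list. The page does not say how "Average resilience" is computed, so this minimal sketch tries two assumed readings: the pooled survival rate over all tests (886 survived, 355 failed), and the unweighted mean of the nine per-model survival rates. It also checks that each failure rate is the complement of its survival rate. The per-model figures are already rounded on the page, so the mean is only approximate.

```python
# Figures copied from the page. Per-model rates are already rounded to whole percents.
TESTS_SURVIVED = 886
TESTS_FAILED = 355

survived_pct = {
    "GPT-5.2": 76, "o4 Mini": 76, "GPT-5": 75, "GPT-5 Codex": 73,
    "GPT-5.1-Codex": 70, "gpt-oss-120b": 70, "GPT-5 Chat": 70,
    "GPT-5.1": 69, "GPT-5.1 Chat": 63,
}
failure_pct = {
    "GPT-5.2": 24, "o4 Mini": 24, "GPT-5": 25, "GPT-5 Codex": 27,
    "GPT-5.1-Codex": 30, "gpt-oss-120b": 30, "GPT-5 Chat": 30,
    "GPT-5.1": 31, "GPT-5.1 Chat": 37,
}

# Reading 1 (assumed): pooled survival rate over every test run.
pooled = 100 * TESTS_SURVIVED / (TESTS_SURVIVED + TESTS_FAILED)

# Reading 2 (assumed): unweighted mean of the per-model survival rates.
mean_of_models = sum(survived_pct.values()) / len(survived_pct)

# Each model's failure rate should equal 100% minus its survival rate.
complements_ok = all(survived_pct[m] + failure_pct[m] == 100 for m in survived_pct)

print(f"pooled: {pooled:.1f}%")                  # prints "pooled: 71.4%"
print(f"mean of models: {mean_of_models:.1f}%")  # prints "mean of models: 71.3%"
print(f"failure = 100 - survived for all models: {complements_ok}")
```

Both readings round to the 71% shown on the page, so the headline figure is consistent with either method; the page itself does not say which one it uses.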

© 2026 ReAIty Check v0.5.27-beta by Eugene Tusmenko