ReAIty Check
Anthropic

13 models tracked

Average resilience: 63%
Tests Survived: 1286
Tests Failed: 702

Toughest Breakers

Rank  Challenge               Category               Pass rate (provider)
#1    10-Step Instructions    Instruction Following  8%
#2    Contradictory Premises  Logic Reasoning        8%
#3    Car Wash Dilemma        Logic Reasoning        8%

Models

Rank  Model                                    Provider   Survived  Failure Rate
#1    Anthropic: Claude Opus 4.5               anthropic  81%       19%
#2    Anthropic: Claude Opus 4.6               anthropic  80%       20%
#3    Anthropic: Claude 3.7 Sonnet (thinking)  anthropic  77%       23%
#4    Anthropic: Claude Sonnet 4.6             anthropic  74%       26%
#5    Anthropic: Claude Opus 4                 anthropic  64%       36%
#6    Anthropic: Claude Sonnet 4.5             anthropic  64%       36%
#7    Anthropic: Claude Opus 4.1               anthropic  63%       37%
#8    Anthropic: Claude Haiku 4.5              anthropic  62%       38%
#9    Anthropic: Claude 3.7 Sonnet             anthropic  59%       41%
#10   Anthropic: Claude Sonnet 4               anthropic  59%       41%
#11   Anthropic: Claude 3.5 Sonnet             anthropic  53%       47%
#12   Anthropic: Claude 3.5 Haiku              anthropic  47%       53%
#13   Anthropic: Claude 3 Haiku                anthropic  39%       61%
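
The page does not say how the 63% average resilience is computed. One reading that matches the numbers is an unweighted mean of the 13 per-model survival rates listed above. The sketch below checks that reading and compares it with a pooled rate from the Tests Survived and Tests Failed counts. The unweighted-mean interpretation and the variable names are assumptions, and the inputs are the rounded percentages shown on the page, so the result is approximate.

```python
# Survival rates (percent) copied from the Models list, rank #1 to #13.
survived = [81, 80, 77, 74, 64, 64, 63, 62, 59, 59, 53, 47, 39]

# Assumption: "Average resilience" is the unweighted mean across models.
mean_rate = sum(survived) / len(survived)

# Alternative reading: pooled rate from the provider-wide test counts.
tests_survived, tests_failed = 1286, 702
pooled_rate = 100 * tests_survived / (tests_survived + tests_failed)

print(f"mean of per-model rates: {mean_rate:.1f}%")
print(f"pooled rate from test counts: {pooled_rate:.1f}%")
```

Under these assumptions the unweighted mean rounds to the 63% shown, while the pooled rate comes out slightly higher, which would be expected if models were run on different numbers of tests.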

© 2026 ReAIty Check v0.5.27-beta by Eugene Tusmenko