ReAIty Check
Google
9 models tracked
Average resilience: 64%
Tests survived: 950
Tests failed: 577
Toughest Breakers
#1 Self-Reference Count (Self Reference): provider pass rate 0%
#2 10-Step Instructions (Instruction Following): provider pass rate 11%
#3 Contradictory Premises (Logic Reasoning): provider pass rate 11%
Models
#1 Google: Gemini 3 Pro Preview (survived 78%, failure rate 22%)
#2 Google: Gemini 3.1 Pro Preview (survived 74%, failure rate 26%)
#3 Google: Gemini 3 Flash Preview (survived 72%, failure rate 28%)
#4 Google: Gemini 2.5 Pro (survived 66%, failure rate 34%)
#5 Google: Gemini 2.5 Flash (survived 63%, failure rate 37%)
#6 Google: Gemini 2.0 Flash (survived 57%, failure rate 43%)
#7 Google: Gemma 3 27B (free) (survived 57%, failure rate 43%)
#8 Google: Gemini 2.5 Flash Lite (survived 57%, failure rate 43%)
#9 Google: Gemma 3 27B (survived 51%, failure rate 49%)
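
The page does not say how "Average resilience" is computed. As a rough check, the sketch below compares two plausible readings using only the figures shown on this page: an unweighted mean of the nine per-model survival rates, and a pooled rate from the provider-wide survived and failed counts. The variable names are illustrative, and the per-model percentages are already rounded, so the result is approximate.

```python
# Per-model survival rates (percent), in ranked order as listed on the page.
survived_pct = [78, 74, 72, 66, 63, 57, 57, 57, 51]

# Provider-wide test counts as listed on the page.
tests_survived = 950
tests_failed = 577

# Reading 1 (assumption): unweighted mean of the per-model rates.
mean_of_models = sum(survived_pct) / len(survived_pct)

# Reading 2 (assumption): pooled pass rate over all tests.
pooled_rate = 100 * tests_survived / (tests_survived + tests_failed)

print(f"Mean of per-model rates: {mean_of_models:.1f}%")
print(f"Pooled pass rate:        {pooled_rate:.1f}%")
```

Under these assumptions the mean of the per-model rates rounds to 64%, which matches the displayed figure, while the pooled rate rounds to 62%. That is consistent with the headline number being a per-model average, though the site's actual method is not stated here.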