Self-Reference Count
Self Reference
Pass rate: 0%
Survived 9 out of 15 breakers
Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It uses “thinking” capabilities: it reasons through a problem before responding, which improves accuracy and the handling of nuanced context. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first place on the LMArena leaderboard, which reflects strong alignment with human preferences and strong complex problem-solving ability.
Context window: 1,048,576 tokens
Input price: $1.25 / 1M tokens
Output price: $10.00 / 1M tokens
Max output tokens: 65,536
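
As a rough illustration of what the per-token prices above mean for a single request, here is a minimal sketch. It assumes the $1.25 figure is the input rate and the $10.00 figure is the output rate, both per million tokens. The helper name `estimate_cost_usd` is hypothetical, and the sketch ignores any tiered or cached-token pricing that may apply.

```python
# Assumed rates, taken from the figures listed above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 1.25    # assumption: the $1.25 figure is the input rate
OUTPUT_USD_PER_MILLION = 10.00  # assumption: the $10.00 figure is the output rate


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: flat per-token cost, with no tiering or caching."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token prompt that produces a 5,000-token reply.
    print(f"${estimate_cost_usd(100_000, 5_000):.4f}")
```

Under these assumptions, output tokens cost eight times as much as input tokens, so long replies dominate the bill even when the prompt is large.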
| Test | Category | Success Rate |
|---|---|---|
| Self-Reference Count | Self Reference | 0% |
| Contradictory Premises | Logic Reasoning | 0% |
| Car Wash Dilemma | Logic Reasoning | 0% |
| 10-Step Instructions | Instruction Following | 11% |
| The Missing A | Pattern Matching | 25% |
| Bullshit Detector | Epistemic Humility | 25% |
| Horse Race Logic | Logic Reasoning | 25% |
| Broken Mug | Lateral Thinking | 50% |
| Coin Flip Paradox | Logic Reasoning | 50% |
| Strawberry Problem | Character Counting | 100% |
| Reverse Word Test | Character Manipulation | 100% |
| Alice's Brother Problem | Logic Reasoning | 100% |
| Silence Protocol | Instruction Following | 100% |
| The Compartment Trick | Logic Reasoning | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |