Google: Gemini 2.0 Flash

Survived 4 out of 15 breakers

Resilience: 27%
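
The resilience figure appears to be the share of breakers survived, rounded to a whole percent. The page does not state the formula, so the sketch below is an assumption that happens to reproduce the 27% shown; the function name `resilience_percent` is hypothetical.

```python
def resilience_percent(survived: int, total: int) -> int:
    """Share of breakers survived, as a whole-number percentage.

    Assumption: resilience = survived / total, rounded to the nearest percent.
    """
    if total <= 0 or not 0 <= survived <= total:
        raise ValueError("expected 0 <= survived <= total and total > 0")
    return round(100 * survived / total)


print(resilience_percent(4, 15))  # 27
```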

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) than [Gemini Flash 1.5](/google/gemini-flash-1.5) while maintaining quality on par with larger models such as [Gemini Pro 1.5](/google/gemini-pro-1.5). It also improves multimodal understanding, coding, complex instruction following, and function calling, which together support more seamless and robust agentic use.

Context: 1,048,576 tokens

Cost (Input): $0.10 / 1M tokens

Cost (Output): $0.40 / 1M tokens

Max completion tokens: 8,192
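
As a rough illustration of how the listed prices and limits combine, the sketch below estimates the cost of a single request. The constants come from the figures above; the function name `estimate_cost_usd` is hypothetical, and the check that prompt plus completion must fit in the context window is an assumption, not something the page states.

```python
# Figures taken from the listing above.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.40
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_COMPLETION_TOKENS = 8_192


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the max completion tokens")
    # Assumption: the context window covers prompt and completion together.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request does not fit in the context window")
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


# A 1M-token prompt with a full-length completion costs roughly ten cents.
print(f"${estimate_cost_usd(1_000_000, 8_192):.4f}")
```

Because output tokens cost four times as much as input tokens but are capped at 8,192 per completion, the prompt dominates the cost of long-context requests.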

Breaker Results

| Test | Category | Success Rate |
| --- | --- | --- |
| Self-Reference Count | Self Reference | 0% |
| Contradictory Premises | Logic Reasoning | 0% |
| Broken Mug | Lateral Thinking | 0% |
| Car Wash Dilemma | Logic Reasoning | 0% |
| The Missing A | Pattern Matching | 0% |
| Horse Race Logic | Logic Reasoning | 0% |
| The Compartment Trick | Logic Reasoning | 0% |
| 10-Step Instructions | Instruction Following | 11% |
| Reverse Word Test | Character Manipulation | 22% |
| Alice's Brother Problem | Logic Reasoning | 33% |
| Coin Flip Paradox | Logic Reasoning | 33% |
| Strawberry Problem | Character Counting | 89% |
| Silence Protocol | Instruction Following | 100% |
| Bullshit Detector | Epistemic Humility | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |