DeepSeek: R1 0528 (free)

Survived 2 out of 15 breakers

Resilience
13%
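
The page does not say how Resilience is computed. The figures above fit a simple ratio of breakers survived to breakers attempted, rounded to a whole percent. The sketch below shows that arithmetic; the function name `resilience` and the rounding rule are assumptions, not something the page confirms.

```python
def resilience(survived: int, total: int) -> int:
    """Share of breakers survived, as a whole-number percentage.

    Assumption: the score is survived / total, rounded to the nearest percent.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * survived / total)


# 2 of 15 breakers survived: 2 / 15 = 0.1333..., which rounds to 13
print(resilience(2, 15))
```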

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but the model is open-source, with fully open reasoning tokens. It has 671B parameters, of which 37B are active in an inference pass.

Context

163,840 tokens

Cost (Input)

$0.00 /1M tokens

Cost (Output)

$0.00 /1M tokens

Max completion tokens

163,840

Breaker Results

| Test | Category | Latest Result | Success Rate |
| --- | --- | --- | --- |
| Self-Reference Count | Self Reference | | 0% |
| Alice's Brother Problem | Logic Reasoning | | 0% |
| Silence Protocol | Instruction Following | | 0% |
| Contradictory Premises | Logic Reasoning | | 0% |
| Broken Mug | Lateral Thinking | | 0% |
| Car Wash Dilemma | Logic Reasoning | | 0% |
| The Missing A | Pattern Matching | | 0% |
| Bullshit Detector | Epistemic Humility | | 0% |
| Horse Race Logic | Logic Reasoning | | 0% |
| The Compartment Trick | Logic Reasoning | | 0% |
| Sycophancy Trap | Logic Reasoning | | 0% |
| Coin Flip Paradox | Logic Reasoning | | 0% |
| 10-Step Instructions | Instruction Following | | 64% |
| Reverse Word Test | Character Manipulation | | 67% |
| Strawberry Problem | Character Counting | | 93% |
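
To see how the results above break down by category, the success rates can be averaged per category. This is a minimal sketch: the rows are copied from the results above, while the helper name `mean_by_category` and the choice of an unweighted mean are illustrative assumptions, not a metric the page reports.

```python
from collections import defaultdict

# (test, category, success rate in percent), copied from the breaker results
RESULTS = [
    ("Self-Reference Count", "Self Reference", 0),
    ("Alice's Brother Problem", "Logic Reasoning", 0),
    ("Silence Protocol", "Instruction Following", 0),
    ("Contradictory Premises", "Logic Reasoning", 0),
    ("Broken Mug", "Lateral Thinking", 0),
    ("Car Wash Dilemma", "Logic Reasoning", 0),
    ("The Missing A", "Pattern Matching", 0),
    ("Bullshit Detector", "Epistemic Humility", 0),
    ("Horse Race Logic", "Logic Reasoning", 0),
    ("The Compartment Trick", "Logic Reasoning", 0),
    ("Sycophancy Trap", "Logic Reasoning", 0),
    ("Coin Flip Paradox", "Logic Reasoning", 0),
    ("10-Step Instructions", "Instruction Following", 64),
    ("Reverse Word Test", "Character Manipulation", 67),
    ("Strawberry Problem", "Character Counting", 93),
]


def mean_by_category(rows):
    """Return the unweighted mean success rate (percent) for each category."""
    grouped = defaultdict(list)
    for _test, category, rate in rows:
        grouped[category].append(rate)
    return {category: sum(rates) / len(rates) for category, rates in grouped.items()}


# Print categories from highest to lowest mean success rate
for category, mean in sorted(mean_by_category(RESULTS).items(), key=lambda kv: -kv[1]):
    print(f"{category}: {mean:.0f}%")
```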