Self-Reference Count
Self Reference
Pass rate: 11%
Survived 9 out of 15 breakers
Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens, it delivers strong performance in general reasoning, visual coding, and agentic tool-calling.
Context window: 262,144 tokens
Input price: $0.45 /1M tokens
Output price: $2.20 /1M tokens
Max output tokens: 65,535
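As a rough illustration of how the per-million-token prices translate into the cost of a single request, here is a minimal sketch. It assumes the $0.45 figure is the input rate and the $2.20 figure is the output rate; the function name `estimate_cost_usd` and the constants are hypothetical and not part of any Moonshot AI API.

```python
from decimal import Decimal

# Assumption: $0.45 is the input price and $2.20 is the output price,
# both quoted per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.45")
OUTPUT_USD_PER_MILLION = Decimal("2.20")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in USD of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
    print(estimate_cost_usd(10_000, 2_000))
```

`Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding noise when summed over many requests.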
| Test | Category | Success Rate |
|---|---|---|
| Self-Reference Count | Self Reference | 11% |
| 10-Step Instructions | Instruction Following | 11% |
| Contradictory Premises | Logic Reasoning | 22% |
| Horse Race Logic | Logic Reasoning | 50% |
| Silence Protocol | Instruction Following | 56% |
| Car Wash Dilemma | Logic Reasoning | 75% |
| Bullshit Detector | Epistemic Humility | 75% |
| Coin Flip Paradox | Logic Reasoning | 75% |
| Strawberry Problem | Character Counting | 100% |
| Reverse Word Test | Character Manipulation | 100% |
| Alice's Brother Problem | Logic Reasoning | 100% |
| Broken Mug | Lateral Thinking | 100% |
| The Missing A | Pattern Matching | 100% |
| The Compartment Trick | Logic Reasoning | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |