Qwen: Qwen3.5 397B A17B

Survived 11 out of 15 breakers

Resilience: 73%

Qwen3.5-397B-A17B is a native vision-language model in the Qwen3.5 series. It uses a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) design to improve inference efficiency. It delivers state-of-the-art performance, comparable to leading frontier models, across a wide range of tasks: language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interactions. With its strong code-generation and agent capabilities, the model generalizes well across diverse agent tasks.
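The "397B-A17B" suffix follows Qwen's naming convention of total versus activated parameters: roughly 397B parameters in total, of which about 17B (17 / 397 ≈ 4.3%) are used for each token. The sketch below shows generic top-k mixture-of-experts routing, the kind of mechanism that makes this possible. It is not Qwen's implementation. The expert count, the top-k value, and the toy scalar "experts" are hypothetical stand-ins. Renormalizing the softmax over only the selected experts is one common variant; other designs apply the softmax over all experts before selecting.

```python
import math

# Hypothetical sizes for illustration only; the model's real expert count and top-k are not given here.
NUM_EXPERTS = 8
TOP_K = 2


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits, k=TOP_K):
    """Select the k experts with the highest gate logits and renormalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, gate_logits, experts, k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts are never called."""
    used = []
    out = 0.0
    for idx, w in route(gate_logits, k):
        used.append(idx)
        out += w * experts[idx](x)
    return out, used


if __name__ == "__main__":
    # Each toy "expert" just scales its input; real experts are feed-forward networks.
    experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
    gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    y, used = moe_forward(1.0, gate_logits, experts)
    print("experts used:", used, "output:", round(y, 4))
    print("fraction of experts active per token:", TOP_K / NUM_EXPERTS)
```

Only 2 of the 8 toy experts run for this input, so compute per token scales with the active subset rather than with the total parameter count.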

Context: 262,144 tokens
Cost (input): $0.39 / 1M tokens
Cost (output): $2.34 / 1M tokens
Max completion tokens: 65,536
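At the listed rates, one request costs input_tokens × $0.39 / 1M plus output_tokens × $2.34 / 1M. Output tokens are priced at 6× the input rate. The helper below is a minimal sketch of that arithmetic. It assumes that prompt and completion together must fit in the 262,144-token context window and that the completion alone is capped at 65,536 tokens. Providers commonly apply limits this way, but behavior may differ. The function name `estimate_cost_usd` is hypothetical.

```python
# Prices and limits as listed above; how the limits are enforced is an assumption.
CONTEXT_TOKENS = 262_144
MAX_COMPLETION_TOKENS = 65_536
INPUT_USD_PER_M = 0.39
OUTPUT_USD_PER_M = 2.34


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (hypothetical helper)."""
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("completion exceeds the max completion tokens")
    # Assumption: prompt and completion share the context window.
    if input_tokens + output_tokens > CONTEXT_TOKENS:
        raise ValueError("prompt + completion exceed the context window")
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 10,000-token answer.
    print(f"${estimate_cost_usd(100_000, 10_000):.4f}")
```

For that example, the cost is 0.1 × $0.39 + 0.01 × $2.34 = $0.039 + $0.0234 = $0.0624.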


Breaker Results

Test                     Category                Success rate
Self-Reference Count     Self Reference          13%
10-Step Instructions     Instruction Following   13%
The Missing A            Pattern Matching        33%
Bullshit Detector        Epistemic Humility      33%
Contradictory Premises   Logic Reasoning         75%
Strawberry Problem       Character Counting      88%
Reverse Word Test        Character Manipulation  100%
Alice's Brother Problem  Logic Reasoning         100%
Silence Protocol         Instruction Following   100%
Broken Mug               Lateral Thinking        100%
Car Wash Dilemma         Logic Reasoning         100%
Horse Race Logic         Logic Reasoning         100%
The Compartment Trick    Logic Reasoning         100%
Sycophancy Trap          Logic Reasoning         100%
Coin Flip Paradox        Logic Reasoning         100%
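The headline figures can be checked against this table. 11 of 15 breakers survived, and 11 / 15 ≈ 0.733, which rounds to 73%. The sketch below recomputes both numbers from the success-rate column. The page does not define what counts as "survived". Treating a success rate of at least 50% as survival is an assumption that reproduces the 11 / 15 figure; any threshold from 34% to 75% gives the same count. The names `RESULTS`, `SURVIVAL_THRESHOLD`, and `resilience` are hypothetical.

```python
# Success rates (%) copied from the Breaker Results table.
RESULTS = {
    "Self-Reference Count": 13,
    "10-Step Instructions": 13,
    "The Missing A": 33,
    "Bullshit Detector": 33,
    "Contradictory Premises": 75,
    "Strawberry Problem": 88,
    "Reverse Word Test": 100,
    "Alice's Brother Problem": 100,
    "Silence Protocol": 100,
    "Broken Mug": 100,
    "Car Wash Dilemma": 100,
    "Horse Race Logic": 100,
    "The Compartment Trick": 100,
    "Sycophancy Trap": 100,
    "Coin Flip Paradox": 100,
}

# Assumption: a breaker counts as "survived" at >= 50% success; the page does not say.
SURVIVAL_THRESHOLD = 50


def resilience(results, threshold=SURVIVAL_THRESHOLD):
    """Return (survived, total, resilience percent rounded to the nearest integer)."""
    survived = sum(1 for rate in results.values() if rate >= threshold)
    total = len(results)
    return survived, total, round(100 * survived / total)


if __name__ == "__main__":
    s, t, pct = resilience(RESULTS)
    print(f"Survived {s} out of {t} breakers, resilience {pct}%")
```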