Countdown

Released: 2026-02-05

Models Tested

Solved

LLM Providers

Score Threshold:

Within 7

google/gemini-2.0-flash-001

google

Target: 529

(75 * 6) + 50 - 5

= 495

Failed

mistralai/mistral-medium-3

mistral

Target: 529

((75 + 50 - 6) * (3 + 25)) - 5

= 3327

Failed

anthropic/claude-3.7-sonnet

anthropic

Target: 529

((75 + 6) * (3 + 5)) + 25

= 673

Failed

openai/gpt-4o

openai

Target: 529

(75 * 6) + (50 - 25) + 3

= 478

Failed

deepseek/deepseek-chat-v3-0324

deepseek

Target: 529

(75 * (6 + (50 / 25))) + 5 + 3

= 608

Failed

qwen/qwen3-235b-a22b

qwen

Target: 529

75*(5+6/3)+(50-25)

= 550

Failed

x-ai/grok-3-mini-beta

x-ai

Target: 529

((75 * ((50 /5) -3)) + 6)

= 531

Close; 2 away

Methodology Note

Each model receives the same prompt listing the numbers to use. The task is to build an expression, using only arithmetic operations, that reaches the target number. Each number may be used at most once, and not every number has to be used. Answers are evaluated as submitted, with no feedback or retries.
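As a rough illustration of how a submitted expression could be checked against these rules, here is a minimal Python sketch. It is not the benchmark's actual grader. The number pool (75, 50, 25, 6, 5, 3) is an assumption inferred from the expressions shown above, the four operators + - * / are assumed to be the allowed arithmetic, and the "Invalid" label and the use of the "Within 7" threshold to decide "Close" are hypothetical choices made for this example.

```python
import ast
from collections import Counter
from fractions import Fraction

# Assumed pool, inferred from the expressions on this page (not stated by the source).
NUMBERS = [75, 50, 25, 6, 5, 3]
TARGET = 529
THRESHOLD = 7  # taken from the "Within 7" score threshold; its exact role is assumed

# Assumed operator set: the page only says "arithmetic operations".
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def evaluate(expr):
    """Return (value, numbers_used) for an expression built from + - * / and parentheses."""
    used = []

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            used.append(node.value)
            return Fraction(node.value)  # exact arithmetic, no float rounding
        raise ValueError("unsupported syntax in expression")

    value = walk(ast.parse(expr.strip(), mode="eval").body)
    return value, used


def score(expr, numbers=NUMBERS, target=TARGET, threshold=THRESHOLD):
    """Classify an answer as Solved, Close, Failed, or (hypothetically) Invalid."""
    value, used = evaluate(expr)
    # Multiset difference: anything left over was reused or is not in the pool.
    if Counter(used) - Counter(numbers):
        return "Invalid"
    distance = abs(value - target)
    if distance == 0:
        return "Solved"
    if distance <= threshold:
        return f"Close; {distance} away"
    return "Failed"


if __name__ == "__main__":
    print(score("((75 * ((50 /5) -3)) + 6)"))
    print(score("(75 * 6) + 50 - 5"))
```

Parsing with `ast` instead of calling `eval` keeps the check limited to integer literals and the four operators, and `Fraction` keeps division exact, so an answer is never marked solved or close because of floating-point rounding.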

Leaderboard