Countdown

Released: 2026-02-09

Score Threshold: within 7 of the target

mistralai/mistral-medium-3

mistral

Target: 204

(75 * (8 + 4)) - (50 + 25 + 1)

= 824

Failed

openai/gpt-4o

openai

Target: 204

(75 * 4) - (50 / 25) + 8 - 1

= 305

Failed

deepseek/deepseek-chat-v3-0324

deepseek

Target: 204

(75 - 25) * 4 + 8 - 1

= 207

Close; 3 away

google/gemini-2.0-flash-001

google

Target: 204

(75 * 4) - (50 + 25 - 8 - 1)

= 234

Failed

x-ai/grok-3-mini-beta

x-ai

Target: 204

(25 * 8) + 4

= 204

Perfect Solution!

qwen/qwen3-235b-a22b

qwen

Target: 204

4 * (50 + 1)

= 204

Perfect Solution!

anthropic/claude-3.7-sonnet

anthropic

Target: 204

(75 - 50) * (4 + 1) * 8

= 1000

Failed

Methodology Note

Each model receives the same prompt listing the numbers to use. Models must write an expression that reaches the target number using only arithmetic operations. Each number may be used at most once, and not every number has to be used. Answers are evaluated as submitted, with no feedback or retries.
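A minimal sketch of how an answer could be checked against these rules, written in Python. It is not the benchmark's actual grader. Assumptions: the available numbers are 75, 50, 25, 8, 4 and 1 (inferred from the expressions above, since the page does not list them), the allowed operations are +, -, * and /, and the "within 7" threshold is inclusive. The names `evaluate`, `grade`, `NUMBERS`, `TARGET` and `CLOSE_THRESHOLD` are hypothetical.

```python
import ast
from collections import Counter

# Assumptions (not stated on the page): numbers inferred from the expressions
# above, and "within 7" treated as inclusive.
NUMBERS = [75, 50, 25, 8, 4, 1]
TARGET = 204
CLOSE_THRESHOLD = 7

# Only the four binary arithmetic operators are accepted.
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,  # true division; integer results are not enforced
}


def evaluate(expr: str, numbers: list[int]) -> float:
    """Evaluate expr, allowing only + - * / and each number at most once."""
    tree = ast.parse(expr, mode="eval")
    used: list[int] = []

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            used.append(node.value)
            return node.value
        raise ValueError(f"disallowed syntax: {ast.dump(node)}")

    value = walk(tree.body)
    # Multiset check: any count left over means a number was unavailable or overused.
    if Counter(used) - Counter(numbers):
        raise ValueError(f"numbers not available or used too often: {used}")
    return value


def grade(expr: str, numbers: list[int] = NUMBERS, target: int = TARGET) -> str:
    """Map an expression to the result labels used on this page."""
    distance = abs(evaluate(expr, numbers) - target)
    if distance == 0:
        return "Perfect Solution!"
    if distance <= CLOSE_THRESHOLD:
        return f"Close; {distance:g} away"
    return "Failed"


for expr in ["(25 * 8) + 4", "(75 - 25) * 4 + 8 - 1", "(75 * 4) - (50 / 25) + 8 - 1"]:
    print(expr, "->", grade(expr))
# (25 * 8) + 4 -> Perfect Solution!
# (75 - 25) * 4 + 8 - 1 -> Close; 3 away
# (75 * 4) - (50 / 25) + 8 - 1 -> Failed
```

Parsing with `ast` instead of calling `eval` restricts the check to integer literals and the four binary operators, so any other syntax is rejected rather than executed.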

Leaderboard