Countdown

Released: 2026-01-31


Score Threshold: Within 7

google/gemini-2.0-flash-001

google

Target: 788

(75 * (6 + 7)) + (25 - 3) + 3

= 1000

Failed

mistralai/mistral-medium-3

mistral

Target: 788

((75 + 25) * (6 * 7)) + (3 / 3)

= 4201

Failed

anthropic/claude-3.7-sonnet

anthropic

Target: 788

75 * (3 + 7) + 25 * 3 + 6

= 831

Failed

deepseek/deepseek-chat-v3-0324

deepseek

Target: 788

((75 * (7 + 3)) + (25 * 3)) + 6

= 831

Failed

x-ai/grok-3-mini-beta

x-ai

Target: 788

7 * (75 + 25 + 6 + 3 + 3)

= 784

Close; 4 away

openai/gpt-4o

openai

Target: 788

((75 + 7) * (6 + 3)) - 3

= 735

Failed

qwen/qwen3-235b-a22b

qwen

Target: 788

((75+6)*(7+3))-(25-3)

= 788

Perfect Solution!

Methodology Note

Each model receives the same prompt with the numbers to use. The task is to write an expression that reaches the target number using only arithmetic operations. Each number may be used at most once, and not every number has to be used. Answers are evaluated as submitted, without feedback or retries.
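The rules above can be checked mechanically. The following is a minimal sketch of one way to do that, not the evaluator this page actually uses. It makes several assumptions: the available numbers are 75, 25, 6, 7, 3 and 3 (inferred from the expressions shown, not stated on the page), "arithmetic operations" means +, -, * and /, division is evaluated exactly, and the "Within 7" threshold decides between "Close" and "Failed". The names `score`, `NUMBERS`, `TARGET` and `CLOSE_WITHIN` are hypothetical.

```python
import ast
from collections import Counter
from fractions import Fraction

# Assumption: the six available numbers, inferred from the expressions shown.
NUMBERS = [75, 25, 6, 7, 3, 3]
TARGET = 788
CLOSE_WITHIN = 7  # assumed meaning of "Score Threshold: Within 7"

# Only the four basic arithmetic operators are accepted.
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _walk(node, used):
    """Evaluate an AST node exactly, appending each integer literal to `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _walk(node.left, used)
        right = _walk(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("only +, -, *, / and integer literals are allowed")


def score(expression, numbers=NUMBERS, target=TARGET, close_within=CLOSE_WITHIN):
    """Return (value, verdict) for a single answer, with no retries."""
    used = []
    value = _walk(ast.parse(expression, mode="eval").body, used)
    # Each number may be used at most once; unused numbers are fine.
    overused = Counter(used) - Counter(numbers)
    if overused:
        raise ValueError(f"numbers not available: {sorted(overused.elements())}")
    distance = abs(value - target)
    if distance == 0:
        verdict = "Perfect Solution!"
    elif distance <= close_within:
        verdict = f"Close; {distance} away"
    else:
        verdict = "Failed"
    return value, verdict
```

Under these assumptions, `score("((75+6)*(7+3))-(25-3)")` yields a value of 788 with the verdict "Perfect Solution!", and `score("7 * (75 + 25 + 6 + 3 + 3)")` yields 784 with "Close; 4 away", matching the results listed above. Parsing with `ast` rather than calling `eval` keeps model output from running arbitrary code, and `Fraction` avoids floating-point rounding when an answer includes division.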

Leaderboard