Released: 2026-03-30
| Model | Provider | Expression | Result |
|---|---|---|---|
| anthropic/claude-sonnet-4.6 | anthropic | `50 * (9 - 4) + 25 * 6` | 400 |
| google/gemini-2.0-flash-001 | google | `(50 * 9) - (25 + 9 + 6)` | 410 |
| minimax/minimax-m2.5 | minimax | `50 * 6 + 25 * 4` | 400 |
| deepseek/deepseek-chat-v3-0324 | deepseek | `(50 * (9 - (9 / (6 + 4)))) + 25` | 430 |
| x-ai/grok-3-mini-beta | x-ai | `((50 - 6) * 9) + 4` | 400 |
| z-ai/glm-5 | z-ai | `50 * ((25 - 9) / (6 - 4))` | 400 |
| moonshotai/kimi-k2.5 | moonshot | `(6 * 50) + (25 * 4)` | 400 |
| google/gemini-3-flash-preview | google | `(50 + 25) * 6 - (4 * 9 + 9)` | 405 |
| anthropic/claude-3.7-sonnet | anthropic | `50 * (9 - 4) + 25 * 6` | 400 |
| mistralai/mistral-medium-3 | mistral | `(50 - (9 / 9)) * (6 + 4)` | 490 |
| openai/gpt-4o | openai | `(50 * 9) - (25 + 6 + 9)` | 410 |
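Each listed result can be re-derived from its expression. The following is a minimal Python sketch, not the benchmark's own scorer. It assumes expressions are plain infix arithmetic with `+`, `-`, `*`, `/` and parentheses, parses them with the standard `ast` module, and evaluates them exactly with `fractions.Fraction`, so an intermediate such as `9 / (6 + 4)` stays 9/10 rather than becoming a float. The names `evaluate` and `ENTRIES` are hypothetical.

```python
import ast
import operator
from fractions import Fraction

# Only the four basic binary operators are supported (assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,  # Fraction / Fraction stays an exact rational
}


def evaluate(expr: str) -> Fraction:
    """Evaluate an infix arithmetic expression exactly (hypothetical helper)."""

    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval").body)


# (expression, reported result) pairs copied from the listing above.
ENTRIES = [
    ("50 * (9 - 4) + 25 * 6", 400),            # anthropic/claude-sonnet-4.6
    ("(50 * 9) - (25 + 9 + 6)", 410),          # google/gemini-2.0-flash-001
    ("50 * 6 + 25 * 4", 400),                  # minimax/minimax-m2.5
    ("(50 * (9 - (9 / (6 + 4)))) + 25", 430),  # deepseek/deepseek-chat-v3-0324
    ("((50 - 6) * 9) + 4", 400),               # x-ai/grok-3-mini-beta
    ("50 * ((25 - 9) / (6 - 4))", 400),        # z-ai/glm-5
    ("(6 * 50) + (25 * 4)", 400),              # moonshotai/kimi-k2.5
    ("(50 + 25) * 6 - (4 * 9 + 9)", 405),      # google/gemini-3-flash-preview
    ("50 * (9 - 4) + 25 * 6", 400),            # anthropic/claude-3.7-sonnet
    ("(50 - (9 / 9)) * (6 + 4)", 490),         # mistralai/mistral-medium-3
    ("(50 * 9) - (25 + 6 + 9)", 410),          # openai/gpt-4o
]

if __name__ == "__main__":
    for expr, reported in ENTRIES:
        value = evaluate(expr)
        status = "matches" if value == reported else "MISMATCH"
        print(f"{expr:<36} = {value}  ({status})")
```

Run as a script, it prints each expression with its exact value, and all eleven entries match their listed results. Exact rationals are used because floating-point division can introduce rounding error, which would make an exact-equality check against a target unreliable.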
Each model receives the same prompt, which lists the numbers available. The model must write an expression that uses only arithmetic operations to reach the target number. Each number may be used at most once, and not every number has to be used. Answers are evaluated as submitted, with no feedback or retries.
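The usage rules can also be checked mechanically. The sketch below is a hypothetical helper, not part of the benchmark. It accepts only integer literals, `+`, `-`, `*`, `/` and parentheses; rejecting unary minus and exponentiation is an assumption about what "arithmetic operations" covers. It treats the number pool as a multiset, so a number may appear at most as many times as it is provided. The pool `ASSUMED_POOL = [50, 25, 9, 9, 6, 4]` is inferred from the expressions in the listing, since the page states neither the numbers nor the target; reaching the target has to be checked separately.

```python
import ast
from collections import Counter

# Operator node types treated as "arithmetic operations" (assumption:
# unary minus, exponentiation, and function calls are not allowed).
ALLOWED_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)

# Hypothetical pool, inferred from the expressions in the listing above.
ASSUMED_POOL = [50, 25, 9, 9, 6, 4]


def numbers_used(expr: str) -> Counter:
    """Return the multiset of integer literals in expr.

    Raises ValueError if expr contains anything other than integer
    literals, the four binary operators, and parentheses.
    """
    tree = ast.parse(expr, mode="eval")
    used = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            used[node.value] += 1
        elif not isinstance(node, (ast.Expression, ast.BinOp, *ALLOWED_OPS)):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return used


def follows_rules(expr: str, pool: list[int]) -> bool:
    """True if expr uses allowed syntax and no number more often than pool provides.

    Unused pool numbers are fine; reaching the target is not checked here.
    """
    try:
        used = numbers_used(expr)
    except (SyntaxError, ValueError):
        return False
    # Counter subtraction keeps only positive counts, i.e. over-used numbers.
    return not (used - Counter(pool))


if __name__ == "__main__":
    print(follows_rules("(50 * 9) - (25 + 9 + 6)", ASSUMED_POOL))  # True: two 9s available
    print(follows_rules("50 * 50", ASSUMED_POOL))  # False: 50 used twice
    print(follows_rules("50 ** 2", ASSUMED_POOL))  # False: exponentiation rejected
```

Every expression in the listing passes this check against the assumed pool, which is consistent with that pool but does not confirm it is the one in the prompt.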