A systematic evaluation framework testing large language models against structured cognitive challenges. Wordle. Sudoku. And beyond.
Each puzzle type tests distinct cognitive capabilities: deduction, pattern recognition, semantic understanding, and constraint satisfaction.
Wordle
Five-letter word deduction with positional feedback constraints.
1687 puzzles, 8 models
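The positional feedback can be made concrete with a small scoring function. This is a minimal sketch of the standard Wordle rules, not necessarily how this framework scores guesses; the function name `wordle_feedback` and the `G`/`Y`/`-` output symbols are assumptions chosen for illustration.

```python
from collections import Counter


def wordle_feedback(guess: str, answer: str) -> str:
    """Return one symbol per letter (symbols are an illustrative choice):
    G = correct letter in the correct position,
    Y = letter is in the answer but in a different position,
    - = letter is not in the answer (or all its copies are already accounted for).
    """
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same length")

    result = ["-"] * len(guess)

    # Answer letters that were not matched exactly are still available
    # to be reported as "present elsewhere". Counting them handles
    # repeated letters correctly.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        elif remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1

    return "".join(result)
```

For example, guessing `speed` against the answer `abide` marks only the first `e` and the `d` as present, because the answer contains a single `e`.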
Connections
Categorical grouping of 16 words into four thematic clusters.
965 puzzles, 8 models
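One way to check a proposed grouping is to compare it against the reference clusters while ignoring order. The sketch below assumes a puzzle is represented as four lists of four words; the function name `count_correct_groups` and this representation are hypothetical, not taken from the framework.

```python
def count_correct_groups(proposed, solution):
    """Count how many proposed groups exactly match a solution group.

    Both arguments are sequences of four groups of four words each.
    Order of groups and order of words within a group are ignored.
    """
    solution_words = sorted(word for group in solution for word in group)
    proposed_words = sorted(word for group in proposed for word in group)

    # The proposal must use exactly the puzzle's 16 words, four per group.
    if proposed_words != solution_words or any(len(g) != 4 for g in proposed):
        raise ValueError("proposal must split the puzzle's words into groups of four")

    solution_sets = {frozenset(group) for group in solution}
    return sum(frozenset(group) in solution_sets for group in proposed)
```

A proposal that swaps one word between two clusters scores 2 under this check, since both affected groups are wrong.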
Countdown
Using only the provided numbers and arithmetic operations, find an expression that evaluates to the target number.
99 puzzles, 8 models
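A model's answer to a Countdown puzzle can be verified mechanically. The following is a minimal sketch, assuming answers arrive as infix strings using `+`, `-`, `*`, `/` and parentheses, and that each provided number may be used at most once; the name `check_countdown` is hypothetical. It does not enforce the stricter game-show rule that intermediate results must be positive integers.

```python
import ast
from collections import Counter
from fractions import Fraction

# Supported binary operators, applied to exact fractions.
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def check_countdown(expr: str, numbers: list[int], target: int) -> bool:
    """Return True if expr uses only the given numbers and hits the target."""
    used = []

    def evaluate(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            used.append(node.value)
            return Fraction(node.value)
        # Anything else (names, calls, unary minus, floats) is rejected.
        raise ValueError("only integers and + - * / are allowed")

    try:
        value = evaluate(ast.parse(expr, mode="eval").body)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False

    # Counter subtraction keeps only numbers used more often than provided.
    if Counter(used) - Counter(numbers):
        return False
    return value == target
```

Parsing with `ast` rather than calling `eval` keeps arbitrary code in a model's output from being executed, and `Fraction` avoids floating-point error when an expression divides.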
Sudoku
Fill a 9x9 grid with digits so that each row, column, and 3x3 subgrid contains every digit from 1 to 9 exactly once.
182 puzzles, 8 models
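The Sudoku constraint translates directly into a validity check on a completed grid. This is a minimal sketch, assuming the grid is a list of nine rows of nine integers; the name `is_valid_solution` is hypothetical, and the check does not confirm that the solution agrees with a puzzle's given clues.

```python
def is_valid_solution(grid: list[list[int]]) -> bool:
    """Return True if every row, column, and 3x3 subgrid holds 1-9 exactly once."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False

    digits = set(range(1, 10))

    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    # Each subgrid starts at a row and column offset of 0, 3, or 6.
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]

    # A unit of nine cells equals {1..9} only if each digit appears exactly once.
    return all(unit == digits for unit in rows + cols + boxes)
```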
Geoguessr
Given a street view image, guess the exact location on Earth where it was taken.
91 puzzles, 6 models
Domino
Identify the missing domino from an image of a grid of dominoes.
91 puzzles, 6 models
If there are any LLMs or puzzles you'd like to see added, or if you'd like to contribute directly, please get in touch.