Simon Henniger, Gabriel Poesia

Articles by Simon Henniger, Gabriel Poesia

Academic · 1 min

The Token Games: Evaluating Language Model Reasoning with Puzzle Duels

arXiv:2602.17831v1. Abstract: Evaluating the reasoning capabilities of Large Language Models is increasingly challenging as models improve. Human curation of hard questions is …

14 views Mar 7
