When AI Shows Its Work, Is It Actually Working? Step-Level Evaluation Reveals Frontier Language Models …
arXiv:2603.22816v1
Abstract: Language models increasingly "show their work" by writing step-by-step reasoning before answering. But are these reasoning steps genuinely used, or …