Learning from Partial Chain-of-Thought via Truncated-Reasoning Self-Distillation
arXiv:2603.13274v1
Abstract: Reasoning-oriented language models achieve strong performance by generating long chain-of-thought traces at inference time. However, this capability comes with substantial …