Learning from Partial Chain-of-Thought via Truncated-Reasoning Self-Distillation
arXiv:2603.13274v1 Announce Type: new Abstract: Reasoning-oriented language models achieve strong performance by generating long chain-of-thought traces at inference time. However, this capability comes with substantial and often excessive computational cost, which can materialize in redundant or inefficient reasoning. We study this setting and introduce Truncated-Reasoning Self-Distillation (TRSD), a lightweight post-training procedure that encourages models to produce correct predictions from partial reasoning traces. In TRSD, a frozen teacher model first generates a full reasoning trace and evaluates the corresponding answer distribution conditioned on the prompt and the complete reasoning to construct a synthetic training target. A student model with the same architecture is then trained to match the teacher's answer distribution while being conditioned only on a truncated prefix of its reasoning trace. Across multiple reasoning benchmarks and token budgets, we demonstrate that TRSD improves robustness to truncated inference, with substantially smaller accuracy tradeoffs when applied to a diverse set of reasoning models. Moreover, although never explicitly regularized for shorter generation during training, we also find that TRSD-trained models inherently produce shorter reasoning traces even without truncation, significantly reducing inference-time costs without artificial interventions.
Executive Summary
The article introduces Truncated-Reasoning Self-Distillation (TRSD), a post-training procedure that improves the efficiency of reasoning-oriented language models. Using a teacher-student framework, TRSD trains models to produce accurate predictions from partial reasoning traces, which reduces computational cost and improves robustness when inference is cut short. The approach shows consistent gains across multiple benchmarks and token budgets, and TRSD-trained models generate shorter reasoning traces even when no truncation is applied at inference time.
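The core mechanism described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a toy, hedged reconstruction of the stated training target: a frozen teacher's answer distribution, computed with the complete reasoning trace, is matched by a student conditioned only on a truncated prefix, via a KL-divergence loss. All function names and the toy logits are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Stable softmax over a vector of answer logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the candidate-answer distribution."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def trsd_loss(teacher_logits_full, student_logits_truncated):
    """TRSD-style target matching (illustrative, not the paper's code):
    the frozen teacher's answer distribution is conditioned on the prompt
    plus the COMPLETE reasoning trace; the student's is conditioned on
    the prompt plus only a TRUNCATED prefix of that trace. The student
    is trained to minimize the divergence between the two."""
    p_teacher = softmax(teacher_logits_full)       # synthetic training target
    q_student = softmax(student_logits_truncated)  # prediction from partial reasoning
    return kl_divergence(p_teacher, q_student)

# Toy example with 4 candidate answers (hypothetical logits).
teacher_logits = np.array([2.0, 0.5, -1.0, 0.1])  # full reasoning available
student_logits = np.array([1.5, 0.8, -0.5, 0.0])  # only a prefix available
loss = trsd_loss(teacher_logits, student_logits)
print(loss)
```

A perfectly matched student incurs zero loss, so minimizing this objective pushes the student's partial-reasoning predictions toward the teacher's full-reasoning answer distribution.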
Key Points
- ▸ TRSD is a lightweight post-training procedure for improving reasoning-oriented language models
- ▸ The approach leverages a teacher-student framework to encourage accurate predictions from partial reasoning traces
- ▸ TRSD-trained models demonstrate improved robustness to truncated inference and reduced inference-time costs
Merits
Improved Efficiency
TRSD reduces computational costs by enabling models to produce accurate predictions from partial reasoning traces
Enhanced Robustness
TRSD-trained models demonstrate improved robustness to truncated inference, making them more reliable in real-world applications
Demerits
Limited Generalizability
The effectiveness of TRSD may be limited to specific types of reasoning tasks or models, requiring further research to fully understand its applicability
Expert Commentary
The introduction of TRSD marks a meaningful step in the development of reasoning-oriented language models. By addressing the inefficiency of long chain-of-thought generation, TRSD offers a promising route to cheaper and more robust inference. However, further research is needed to establish how broadly the method applies, particularly in real-world settings and across model families beyond those evaluated. If the results hold up, techniques like TRSD are likely to influence how efficient reasoning models are trained.
Recommendations
- ✓ Further research should be conducted to explore the applicability of TRSD to diverse types of reasoning tasks and models
- ✓ TRSD should be evaluated in combination with other techniques, such as explainability and transparency methods, to build more comprehensive and reliable language models