Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
arXiv:2603.11137v1 Announce Type: new Abstract: On-policy distillation is pivotal for transferring reasoning capabilities to capacity-constrained models, yet remains prone to instability and negative transfer. We …
Jongwoo Ko, Sara Abdali, Young Jin Kim, Tianyi Chen, Pashmina Cameron