Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
arXiv:2603.13260v1 Announce Type: new Abstract: Knowledge Distillation (KD) can transfer the reasoning abilities of large models to smaller ones, which can reduce the costs to …
Minsang Kim, Seung Jun Baek
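The abstract is cut off before it describes the method, so the sketch below is only a rough illustration of what a token-selective distillation objective could look like, not the paper's actual formulation. The function name `token_selective_kd_loss`, the divergence-based token-selection criterion, and the mixing weight `alpha` are all assumptions for illustration; the paper's real selection rule and its "dual" distillation components are not specified in this listing.

```python
import torch
import torch.nn.functional as F

def token_selective_kd_loss(student_logits, teacher_logits, labels,
                            select_ratio=0.5, temperature=2.0, alpha=0.5):
    """Illustrative token-selective KD loss (hypothetical, not the paper's method).

    student_logits, teacher_logits: (batch, seq_len, vocab)
    labels: (batch, seq_len), with -100 marking prompt/padding tokens.
    """
    B, T, V = student_logits.shape
    valid = labels.ne(-100)  # (B, T) mask of supervised tokens

    # Per-token KL(teacher || student) at the given temperature.
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    kl_per_token = (t_probs * (t_probs.clamp_min(1e-9).log() - s_logp)).sum(-1)

    # Token selection (one plausible criterion): keep only the fraction of
    # valid tokens where teacher and student diverge the most.
    kl_flat = kl_per_token[valid]
    k = max(1, int(select_ratio * kl_flat.numel()))
    threshold = torch.topk(kl_flat, k).values.min()
    selected = valid & (kl_per_token >= threshold)

    kd_loss = (kl_per_token * selected).sum() / selected.sum().clamp_min(1)
    kd_loss = kd_loss * (temperature ** 2)

    # Standard next-token cross-entropy on ground-truth labels.
    ce_loss = F.cross_entropy(student_logits.reshape(-1, V), labels.reshape(-1),
                              ignore_index=-100)

    return alpha * kd_loss + (1 - alpha) * ce_loss
```

In this sketch, a student would be trained with `token_selective_kd_loss(student_out.logits, teacher_out.logits, labels)`; restricting the distillation term to a subset of tokens is one way to focus the student's capacity on the positions that matter most for reasoning, which is the general idea the title points to.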