Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion
arXiv:2603.19266v1 Announce Type: cross Abstract: Distilling robust reasoning capabilities from large language models (LLMs) into smaller, computationally efficient student models remains an unresolved challenge. Despite …
Zhen Tan, Chengshuai Zhao, Song Wang, Jundong Li, Tianlong Chen, Huan Liu
15 views