Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design
arXiv:2603.12826v1 Announce Type: new
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) significantly enhances the reasoning capabilities of Large Language Models. When applied to RLVR, Multiple-Choice …
Xu Guo, Qiming Ge, Jian Tong, Kedi Chen, Jin Zhang, Xiaogui Yang, Xuan Gao, Haijun Lv, Zhihui Lu, Yicheng Zou, Qipeng Guo