Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards
arXiv:2603.16140v1
Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven recent capability advances of large language models across various domains. Recent studies …
Yuxuan Zhu, Daniel Kang