To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit reasoning capability of Large Language Models …