Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral …
arXiv:2603.10588v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet whether large language model (LLM) …