Alternating Reinforcement Learning with Contextual Rubric Rewards
arXiv:2603.15646v1
Abstract: Reinforcement Learning with Rubric Rewards (RLRR) is a framework that extends conventional reinforcement learning from human feedback (RLHF) and verifiable …
Guangchen Lan