REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge
arXiv:2603.17145v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known …
Yasi Zhang, Tianyu Chen, Mingyuan Zhou, Oscar Leong, Ying Nian Wu, Michal Lukasik
8 views