Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation
arXiv:2603.13045v1
Abstract: Large Language Models (LLMs) have demonstrated remarkable capability in machine translation on high-resource language pairs, yet their performance on low-resource …
Yifeng Liu, Siqi Ouyang, Yatish Hosmane Revanasiddappa, Lei Li