Articles

Academic · 1 min

SteerRM: Debiasing Reward Models via Sparse Autoencoders

arXiv:2603.12795v1. Abstract: Reward models (RMs) are critical components of alignment pipelines, yet they exhibit biases toward superficial stylistic cues, preferring better-presented responses …

Mengyuan Sun, Zhuohao Yu, Weizheng Gu, Shikun Zhang, Wei Ye
9 views

Academic · 1 min

Adaptive Vision-Language Model Routing for Computer Use Agents

arXiv:2603.12823v1. Abstract: Computer Use Agents (CUAs) translate natural-language instructions into Graphical User Interface (GUI) actions such as clicks, keystrokes, and scrolls by …

Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen
81 views

Academic · 1 min

Long-form RewardBench: Evaluating Reward Models for Long-form Generation

arXiv:2603.12963v1. Abstract: The widespread adoption of reinforcement learning-based alignment highlights the growing importance of reward models. Various benchmarks have been built to …

Hui Huang, Yancheng He, Wei Liu, Muyun Yang, Jiaheng Liu, Kehai Chen, Bing Xu, Conghui Zhu, Hailong Cao, Tiejun Zhao
9 views