Academic

Academic · 1 min

Near-Optimal Sample Complexity for Online Constrained MDPs

arXiv:2602.15076v1. Abstract: Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. …

Chang Liu, Yunfan Li, Lin F. Yang
Academic · 1 min

Automatically Finding Reward Model Biases

arXiv:2602.15222v1. Abstract: Reward models are central to large language model (LLM) post-training. However, past work has shown that they can reward spurious …

Atticus Wang, Iván Arcuschin, Arthur Conmy
Academic · 1 min

Closing the Distribution Gap in Adversarial Training for LLMs

arXiv:2602.15238v1. Abstract: Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant …

Chengzhi Hu, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn