Academic

Academic · 1 min

Partial Policy Gradients for RL in LLMs

arXiv:2603.06138v1. Abstract: Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for …

Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai
Academic · 1 min

DC-Merge: Improving Model Merging with Directional Consistency

arXiv:2603.06242v1. Abstract: Model merging aims to integrate multiple task-adapted models into a unified model that preserves the knowledge of each task. In …

Han-Chen Zhang, Zi-Hao Zhou, Mao-Lin Luo, Shimin Di, Min-Ling Zhang, Tong Wei