Category

Academic

Academic · 1 min

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

arXiv:2603.10101v1. Announce Type: new. Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capacity of Large Language Models (LLMs). However, RLVR solely …

Sijia Cui, Pengyu Cheng, Jiajun Song, Yongbo Gai, Guojun Zhang, Zhechao Yu, Jianhe Lin, Xiaoxi Jiang, Guanjun Jiang
Academic · 1 min

Mashup Learning: Faster Finetuning by Remixing Past Checkpoints

arXiv:2603.10156v1. Announce Type: new. Abstract: Finetuning on domain-specific data is a well-established method for enhancing LLM performance on downstream tasks. Training on each dataset produces …

Sofia Maria Lo Cicero Vaina, Artem Chumachenko, Max Ryabinin
Academic · 1 min

Rethinking the Harmonic Loss via Non-Euclidean Distance Layers

arXiv:2603.10225v1. Announce Type: new. Abstract: Cross-entropy loss has long been the standard choice for training deep neural networks, yet it suffers from interpretability limitations, unbounded …

Maxwell Miller-Golub, Kamil Faber, Marcin Pietron, Panpan Zheng, Pasquale Minervini, Roberto Corizzo