Articles

Academic · 1 min

Do We Need Frontier Models to Verify Mathematical Proofs?

arXiv:2604.02450v1 (announce type: new). Abstract: Advances in training, post-training, and inference-time methods have enabled frontier reasoning models to win gold medals in math competitions and …

Aaditya Naik, Guruprerana Shabadi, Rajeev Alur, Mayur Naik
Academic · 1 min

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

arXiv:2604.02349v1 (announce type: cross). Abstract: Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in …

Yiqin Yang, Hao Hu, Yihuan Mao, Jin Zhang, Chengjie Wu, Yuhua Jiang, Xu Yang, Runpeng Xie, Yi Fan, Bo Liu, Yang Gao, Bo Xu, Chongjie Zhang