Academic

Academic · 1 min

Deriving Hyperparameter Scaling Laws via Modern Optimization Theory

arXiv:2603.15958v1 (announce type: new). Abstract: Hyperparameter transfer has become an important component of modern large-scale training recipes. Existing methods, such as muP, primarily focus on …

Egor Shulgin, Dimitri von Rütte, Tianyue H. Zhang, Niccolò Ajroldi, Bernhard Schölkopf, Antonio Orvieto
Academic · 1 min

W2T: LoRA Weights Already Know What They Can Do

arXiv:2603.15990v1 (announce type: new). Abstract: Each LoRA checkpoint compactly stores task-specific updates in low-rank weight matrices, offering an efficient way to adapt large language models …

Xiaolong Han, Ferrante Neri, Zijian Jiang, Fang Wu, Yanfang Ye, Lu Yin, Zehong Wang
Academic · 1 min

The Importance of Being Smoothly Calibrated

arXiv:2603.16015v1 (announce type: new). Abstract: Recent work has highlighted the centrality of smooth calibration [Kakade and Foster, 2008] as a robust measure of calibration error. …

Parikshit Gopalan, Konstantinos Stavropoulos, Kunal Talwar, Pranay Tankala