Deriving Hyperparameter Scaling Laws via Modern Optimization Theory
arXiv:2603.15958v1 Announce Type: new Abstract: Hyperparameter transfer has become an important component of modern large-scale training recipes. Existing methods, such as muP, primarily focus on …
Egor Shulgin, Dimitri von R\"utte, Tianyue H. Zhang, Niccol\`o Ajroldi, Bernhard Sch\"olkopf, Antonio Orvieto
9 views