Deriving Hyperparameter Scaling Laws via Modern Optimization Theory
arXiv:2603.15958v1
Abstract: Hyperparameter transfer has become an important component of modern large-scale training recipes. Existing methods, such as muP, primarily focus on …