Large Spikes in Stochastic Gradient Descent: A Large-Deviations View
arXiv:2603.10079v1
Abstract: We analyse SGD training of a shallow, fully connected network in the NTK scaling and provide a quantitative theory of …
Benjamin Gess, Daniel Heydecker