Spectral Edge Dynamics Reveal Functional Modes of Learning
arXiv:2604.06256v1 Announce Type: new Abstract: Training dynamics during grokking concentrate along a small number of dominant update directions -- the spectral edge -- which reliably …
Quality follows upgrading
arXiv:2603.15678v1 Announce Type: new Abstract: Despite hundreds of millions of parameters, transformer training trajectories evolve within only a few coherent directions. We introduce Spectral Edge …
arXiv:2602.23696v1 Announce Type: new Abstract: We study the geometry of training trajectories in small transformer models and find that parameter updates organize into a dominant …
arXiv:2602.18523v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization long after near-zero training loss -- has been studied mainly in …
arXiv:2602.18649v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of …
arXiv:2602.16746v1 Announce Type: new Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a …
arXiv:2602.16967v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after prolonged training -- has been linked to confinement on low-dimensional …