Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic
arXiv:2603.23784v1 Announce Type: new Abstract: Grokking-the phenomenon where validation accuracy of neural networks on modular addition of two integers rises long after training data has …
Anand Swaroop
4 views