Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse
arXiv:2603.18056v1
Abstract: Extreme neural network sparsification (90% activation reduction) presents a critical challenge for mechanistic interpretability: understanding whether interpretable features survive aggressive …
Dip Roy, Rajiv Misra, Sanjay Kumar Singh