Navigating the Concept Space of Language Models
arXiv:2603.23524v1. Abstract: Sparse autoencoders (SAEs) trained on large language model activations output thousands of features that enable mapping to human-interpretable concepts. The …
Wilson E. Marcílio-Jr, Danilo M. Eler