
Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics

arXiv:2604.00443v1 Announce Type: new Abstract: If the same neuron activates for both "lender" and "riverside," standard metrics attribute the overlap to superposition: the neuron must be compressing two unrelated concepts. This work explores how much of the overlap is due to a lexical confound: neurons fire for a shared word form (such as "bank") rather than for two compressed concepts. A 2x2 factorial decomposition reveals that the lexical-only condition (same word, different meaning) consistently exceeds the semantic-only condition (different word, same meaning) across models spanning 110M-70B parameters. The confound carries into sparse autoencoders (18-36% of features blend senses), sits in <=1% of activation dimensions, and hurts downstream tasks: filtering it out improves word sense disambiguation and makes knowledge edits more selective (p = 0.002).

Iyad Ait Hou, Rebecca Hwa

Executive Summary

This article investigates how lexical identity confounds superposition metrics in neural networks. Using a 2x2 factorial decomposition, the authors show that a substantial share of the measured overlap between neurons comes from neurons firing for a shared word form (e.g., "bank") rather than from compressing two unrelated concepts: the lexical-only condition consistently exceeds the semantic-only condition across models spanning 110M-70B parameters. The confound carries into sparse autoencoders, where 18-36% of features blend senses, yet it is concentrated in at most 1% of activation dimensions. Filtering it out improves word sense disambiguation and makes knowledge edits more selective, underscoring the importance of accounting for lexical identity when interpreting neural network behavior.

Key Points

  • Lexical identity confound affects superposition metrics in neural networks
  • Neurons fire for shared word forms rather than compressed concepts
  • Confound impacts downstream tasks, such as word sense disambiguation and knowledge editing

Merits

Strength in Experimental Design

The study employs a 2x2 factorial decomposition that varies word form and word sense independently, giving a controlled way to separate lexical overlap (same word, different meaning) from semantic overlap (different word, same meaning) when measuring superposition.
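The logic of such a 2x2 design can be illustrated with a small sketch. This is not the paper's code: the toy activation vectors, the condition labels, and the use of cosine similarity as the overlap measure are all illustrative assumptions, chosen only to show how the four cells of the grid (shared word form x shared word sense) would each receive their own overlap score.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy activation vectors standing in for a model's hidden states.
# Each cell of the 2x2 grid pairs contexts that do or do not share
# the surface word and the word sense (labels are hypothetical).
rng = np.random.default_rng(0)
dim = 16
base = rng.normal(size=dim)

pairs = {
    # same word, same sense: e.g. "bank" (money) vs. "bank" (money)
    "lexical+semantic": (base, base + 0.1 * rng.normal(size=dim)),
    # same word, different sense: e.g. "bank" (money) vs. "bank" (river)
    "lexical-only": (base, 0.6 * base + rng.normal(size=dim)),
    # different word, same sense: e.g. "bank" (money) vs. "lender"
    "semantic-only": (base, 0.3 * base + rng.normal(size=dim)),
    # different word, different sense: unrelated control pair
    "neither": (rng.normal(size=dim), rng.normal(size=dim)),
}

for condition, (u, v) in pairs.items():
    print(f"{condition:18s} overlap = {cosine(u, v):+.2f}")
```

Comparing the "lexical-only" cell against the "semantic-only" cell is what lets the analysis attribute overlap to word form rather than to compressed meaning; the paper's finding is that the former consistently dominates.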

Insight into Neural Network Behavior

The study sheds light on the complex behavior of neural networks and highlights the importance of considering lexical identity in natural language processing tasks.

Demerits

Limitation in Generalizability

The study's findings are limited to a specific range of neural network models (110M-70B parameters), which may not be generalizable to other models or architectures.

Need for Further Investigation

The study's focus on a specific confound may overlook other potential factors that contribute to superposition metrics.

Expert Commentary

The article provides a rigorous investigation of how lexical identity confounds superposition metrics in neural networks. Its central finding matters for interpretability research: overlap driven by a shared word form can masquerade as conceptual superposition, so metrics that ignore the confound may overstate how much models compress unrelated concepts. The practical payoff is clear, since filtering out the confound improves word sense disambiguation and makes knowledge edits more selective. The limitations are equally worth noting: the analysis targets one specific confound, and other factors contributing to superposition metrics remain open questions. Overall, the article is a valuable contribution to natural language processing that argues persuasively for taking lexical identity into account in both the evaluation and the design of neural language models.

Recommendations

  • Future studies should investigate the impact of lexical identity on other natural language processing tasks.
  • Researchers should design neural networks that can effectively handle lexical identity to improve task performance.

Sources

Original: arXiv - cs.CL