In the LLM era, Word Sense Induction remains unsolved

Anna Mosolova, Marie Candito, Carlos Ramisch

arXiv:2603.11686v1 Announce Type: new Abstract: In the absence of sense-annotated data, word sense induction (WSI) is a compelling alternative to word sense disambiguation, particularly in low-resource or domain-specific settings. In this paper, we emphasize methodological problems in current WSI evaluation. We propose an evaluation on a SemCor-derived dataset, respecting the original corpus polysemy and frequency distributions. We assess pre-trained embeddings and clustering algorithms across parts of speech, and propose and evaluate an LLM-based WSI method for English. We evaluate data augmentation sources (LLM-generated, corpus and lexicon), and semi-supervised scenarios using Wiktionary for data augmentation, must-link constraints, and the number of clusters per lemma. We find that no unsupervised method (whether ours or previous) surpasses the strong "one cluster per lemma" heuristic (1cpl). We also show that (i) results and best systems may vary across POS, (ii) LLMs have trouble performing this task, (iii) data augmentation is beneficial, and (iv) capitalizing on Wiktionary does help: it surpasses the previous SOTA system on our test set by 3.3%. WSI is not solved, and calls for a better articulation of lexicons and LLMs' lexical semantics capabilities.

Executive Summary

This paper critically examines the persistent challenges of word sense induction (WSI) in the LLM era, addressing methodological gaps in current evaluation frameworks. The authors propose a new evaluation on a SemCor-derived dataset that preserves the original corpus polysemy and frequency distributions, offering a more realistic assessment of WSI methods. They assess pre-trained embeddings and clustering algorithms across parts of speech, and introduce an LLM-based WSI approach for English. The study also compares data augmentation sources (LLM-generated, corpus, and lexicon) and semi-supervised scenarios that use Wiktionary for data augmentation, must-link constraints, and the number of clusters per lemma. The findings reveal that no unsupervised method, including the authors' own, outperforms the strong 'one cluster per lemma' (1cpl) heuristic, indicating persistent limitations in unsupervised WSI. Additionally, LLMs struggle with the task, data augmentation improves performance, and Wiktionary-based augmentation yields measurable gains, surpassing the previous SOTA system on the test set by 3.3%. Overall, the work confirms that WSI remains unsolved and calls for a better articulation of lexicons and LLMs' lexical semantic capabilities.
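The embedding-plus-clustering setup the authors evaluate can be illustrated with a toy sketch. Everything below is invented for illustration (the function name, the similarity threshold, and the two-dimensional "embeddings"); real WSI systems cluster high-dimensional contextual embeddings of each lemma's occurrences:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def induce_senses(vectors, threshold=0.8):
    """Greedy agglomerative WSI sketch: attach each occurrence to the
    most similar cluster centroid if similarity >= threshold,
    otherwise open a new sense cluster."""
    clusters = []  # each cluster is a list of member vectors
    labels = []
    for v in vectors:
        best, best_sim = None, threshold
        for i, members in enumerate(clusters):
            centroid = [sum(xs) / len(members) for xs in zip(*members)]
            sim = cosine(v, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            clusters.append([v])
            labels.append(len(clusters) - 1)
        else:
            clusters[best].append(v)
            labels.append(best)
    return labels

# Toy "contextual embeddings" for four occurrences of one lemma,
# falling into two clearly separated sense regions.
occ = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(induce_senses(occ, threshold=0.8))  # → [0, 0, 1, 1]
```

The threshold implicitly controls the number of induced senses per lemma, which is exactly the parameter the paper's semi-supervised scenarios try to inform with Wiktionary.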

Key Points

  • WSI evaluation methodology is flawed in current literature
  • Authors propose a SemCor-derived dataset for more accurate WSI assessment
  • LLMs struggle with WSI tasks despite general linguistic capabilities

Merits

Methodological Innovation

The authors introduce a more realistic evaluation framework by preserving corpus polysemy and frequency distributions, enhancing validity of WSI evaluations.

Demerits

Persistent Limitations

Unsupervised WSI methods, including the authors’, cannot surpass the 'one cluster per lemma' heuristic, indicating a ceiling on current unsupervised approaches.
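Why 1cpl is so hard to beat becomes concrete under a per-item clustering metric such as B-Cubed F1 (commonly used in WSI evaluation). The skewed sense distribution below is invented for illustration: when one sense dominates a lemma's occurrences, lumping everything into a single cluster already scores very high, while over-splitting is punished heavily on recall.

```python
def b_cubed_f1(gold, pred):
    """B-Cubed F1: average per-item precision/recall over items that
    share a predicted (resp. gold) cluster with item i."""
    n = len(gold)
    p_sum = r_sum = 0.0
    for i in range(n):
        same_pred = {j for j in range(n) if pred[j] == pred[i]}
        same_gold = {j for j in range(n) if gold[j] == gold[i]}
        correct = len(same_pred & same_gold)
        p_sum += correct / len(same_pred)
        r_sum += correct / len(same_gold)
    p, r = p_sum / n, r_sum / n
    return 2 * p * r / (p + r)

# Skewed gold senses: 9 occurrences of a dominant sense, 1 rare sense.
gold = [0] * 9 + [1]
one_cluster = [0] * 10          # 1cpl: everything in one cluster
over_split = list(range(10))    # one cluster per occurrence
print(round(b_cubed_f1(gold, one_cluster), 3))  # → 0.901
print(round(b_cubed_f1(gold, over_split), 3))   # → 0.333
```

With 90% of occurrences in one sense, 1cpl reaches 0.901 B-Cubed F1 out of the box; an induction system must separate the rare sense without fragmenting the dominant one just to break even.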

Expert Commentary

The paper makes a valuable contribution to the field by confronting the persistent myth that WSI is solved. While LLMs have revolutionized many NLP domains, their application to WSI reveals significant gaps that cannot be bridged without deeper lexical integration. The authors’ empirical findings—particularly the inability of any unsupervised method to outperform the one-cluster-per-lemma heuristic—are both surprising and instructive. This is a wake-up call for the NLP community: WSI remains a problem of semantic alignment, not computational capacity. The use of a SemCor-derived dataset is a methodological boon, offering a more authentic testbed. Moreover, the empirical validation of Wiktionary’s utility as an augmentation source is timely and practical. The paper’s call for better articulation between lexicons and LLMs is not merely academic; it is a necessary step toward more coherent semantic modeling in large-scale AI systems. Future work should extend this analysis to multilingual WSI and explore hybrid models that combine lexical resources with LLM embeddings.

Recommendations

  • Adopt SemCor-derived or corpus-distribution-aware evaluation protocols for WSI research
  • Integrate Wiktionary and other lexical augmentation sources systematically into WSI pipeline development
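One common way to realize the must-link constraints mentioned above (a generic sketch, not the authors' implementation; all names are illustrative) is to collapse constrained occurrences into groups with union-find before clustering, so that, e.g., Wiktionary examples tagged with the same sense can never be separated:

```python
def find(parent, x):
    """Find the root of x with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def apply_must_links(n_items, must_links):
    """Union-find over occurrence indices: every must-linked pair ends
    up in the same group; returns a group label per occurrence."""
    parent = list(range(n_items))
    for a, b in must_links:
        ra, rb = find(parent, a), find(parent, b)
        parent[ra] = rb
    groups, labels = {}, []
    for i in range(n_items):
        root = find(parent, i)
        if root not in groups:
            groups[root] = len(groups)
        labels.append(groups[root])
    return labels

# Occurrences 0-2 share one Wiktionary sense tag, 4-5 share another.
links = [(0, 1), (1, 2), (4, 5)]
print(apply_must_links(6, links))  # → [0, 0, 0, 1, 2, 2]
```

The resulting groups can then be clustered as atomic units, which is how must-link constraints are typically folded into an otherwise unconstrained clustering pipeline.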
