POLAR: A Per-User Association Test in Embedding Space
arXiv:2603.15950v1 (announce type: new). Abstract: Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of a lightly adapted masked language model. Authors are represented by private deterministic tokens; POLAR projects these vectors onto curated lexical axes and reports standardized effects with permutation p-values and Benjamini-Hochberg control. On a balanced bot-human Twitter benchmark, POLAR cleanly separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and provides concise, per-author diagnostics for computational social science. All code is publicly available at https://github.com/pedroaugtb/POLAR-A-Per-User-Association-Test-in-Embedding-Space.
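The abstract mentions standardized effects with permutation p-values but does not give the exact statistic, so the following is only a minimal sketch under stated assumptions. It uses a WEAT-style effect size (difference in mean cosine similarity to two pole lexicons, divided by the standard deviation of all similarities) and a two-sided permutation test that shuffles pole membership. The function names, the toy 3-dimensional vectors, and the choice of statistic are illustrative assumptions, not the authors' implementation.

```python
import math
import random
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def effect_size(author_vec, pole_a, pole_b):
    """Assumed WEAT-style standardized effect for one author.

    Difference in mean similarity to the two poles, divided by the
    sample standard deviation of all similarities.
    """
    sims_a = [cosine(author_vec, w) for w in pole_a]
    sims_b = [cosine(author_vec, w) for w in pole_b]
    spread = statistics.stdev(sims_a + sims_b)
    return (statistics.mean(sims_a) - statistics.mean(sims_b)) / spread


def permutation_p_value(author_vec, pole_a, pole_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value obtained by shuffling pole membership."""
    rng = random.Random(seed)
    observed = effect_size(author_vec, pole_a, pole_b)
    words = list(pole_a) + list(pole_b)
    k = len(pole_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(words)
        permuted = effect_size(author_vec, words[:k], words[k:])
        if abs(permuted) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): one author vector and two tiny pole lexicons.
author = [0.7, 0.2, 0.1]
pole_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]
pole_b = [[-1.0, 0.0, 0.0], [-0.9, 0.1, 0.0], [-0.8, 0.2, 0.0]]

effect, p_value = permutation_p_value(author, pole_a, pole_b)
print(f"effect={effect:.3f}, p={p_value:.3f}")
```

With lexicons this small, only a handful of distinct splits exist, so the p-value cannot get very small; real attribute sets would need to be considerably larger for the test to have power.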
Executive Summary
This article introduces POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that operates in the embedding space of a lightly adapted masked language model. On a balanced bot-human Twitter benchmark, POLAR separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and reports concise per-author diagnostics. Its results may depend on the quality of the underlying language model and of the curated lexical axes. The publicly available code facilitates replication and extension of the research. As a tool for computational social science, POLAR could support the analysis of online behavior and author-level variation in language use.
Key Points
- ▸ POLAR operates in the embedding space of a masked language model, offering a novel approach to per-user association testing.
- ▸ The method separates LLM-driven bots from organic accounts on a balanced Twitter benchmark and, on an extremist forum, quantifies alignment with slur lexicons and rightward drift over time.
- ▸ POLAR is modular and provides concise diagnostics for computational social science applications.
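Because a per-user test yields one p-value per author (and per axis), the abstract's Benjamini-Hochberg control matters in practice. The sketch below shows the standard step-up procedure in plain Python; the function name and the example p-values are illustrative assumptions and do not come from the POLAR codebase.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans aligned with p_values: True where the
    hypothesis is rejected at false discovery rate alpha.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-author p-values for a single axis.
p_per_author = [0.001, 0.045, 0.035, 0.2, 0.012]
flags = benjamini_hochberg(p_per_author, alpha=0.05)
print(flags)
```

In this toy input, only the two smallest p-values fall under their rank-scaled thresholds, so only those two authors are flagged.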
Merits
Strength in Addressing Author-Level Variation
POLAR's per-user association testing approach offers a nuanced understanding of author-level variation in language use, complementing existing methods that operate at the word, sentence, or corpus level.
Modularity and Flexibility
The method's modularity allows for easy adaptation to new attribute sets and applications, increasing its potential impact in computational social science.
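One way to picture this modularity is a registry of named axes, each built from a pair of lexicons, so that adding an attribute set means adding one entry. The sketch below assumes an axis is the normalized difference between the centroids of two word-vector lists and that an author's score is the cosine between the author vector and that axis. The axis names, the 2-dimensional toy vectors, and the helper functions are hypothetical and are not taken from the POLAR repository.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_axis(pos_vectors, neg_vectors):
    """Assumed axis definition: normalized difference of pole centroids."""
    pos_c = centroid(pos_vectors)
    neg_c = centroid(neg_vectors)
    return unit([p - q for p, q in zip(pos_c, neg_c)])


def project(author_vec, axis):
    """Cosine between the author vector and a unit-length axis."""
    return sum(a * b for a, b in zip(unit(author_vec), axis))


# Hypothetical registry: adding an attribute set is one more entry here.
AXES = {
    "toy_axis_1": build_axis([[1.0, 0.0], [0.9, 0.1]], [[-1.0, 0.0], [-0.9, 0.1]]),
    "toy_axis_2": build_axis([[0.0, 1.0], [0.1, 0.9]], [[0.0, -1.0], [0.1, -0.9]]),
}


def score_author(author_vec, axes=AXES):
    """Score one author on every registered axis."""
    return {name: project(author_vec, axis) for name, axis in axes.items()}


scores = score_author([0.6, 0.8])
print(scores)
```

Keeping the axes in a plain mapping means the scoring code never changes when a lexicon is swapped, which is the sense of "modular" assumed here.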
Demerits
Dependence on Language Model Quality
The performance of POLAR may be influenced by the quality of the underlying language model and curated lexical axes, which could impact its accuracy and reliability.
Limited Generalizability
The method's performance on specific datasets and applications may not generalize to other contexts, requiring further validation and adaptation.
Expert Commentary
POLAR moves association testing from the word, sentence, or corpus level down to the individual author, which is its main contribution to computational social science. The reported results are promising, but they may depend on the quality of the underlying language model and of the curated lexical axes. Further research is needed to adapt the method to other applications and to validate its generalizability. Its modularity to new attribute sets makes it an attractive tool for computational social scientists who want to analyze online behavior and author-level variation in language use.
Recommendations
- ✓ Future research should focus on adapting POLAR to applications beyond the Twitter benchmark and extremist forum studied here, including other social media platforms, online forums, and text-based communication systems.
- ✓ The method's performance should be validated on a wider range of datasets and contexts to ensure its generalizability and robustness.