POLAR: A Per-User Association Test in Embedding Space

arXiv:2603.15950v1. Abstract: Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of a lightly adapted masked language model. Authors are represented by private deterministic tokens; POLAR projects these vectors onto curated lexical axes and reports standardized effects with permutation p-values and Benjamini-Hochberg control. On a balanced bot-human Twitter benchmark, POLAR cleanly separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and provides concise, per-author diagnostics for computational social science. All code is publicly available at https://github.com/pedroaugtb/POLAR-A-Per-User-Association-Test-in-Embedding-

Pedro Bento, Arthur Buzelin, Arthur Chagas, Yan Aquino, Victoria Estanislau, Samira Malaquias, Pedro Robles Dutenhefner, Gisele L. Pappa, Virgilio Almeida, Wagner Meira Jr · March 18, 2026

#cs.CL #cs.CY #cs.SI

Executive Summary

This article introduces POLAR, a per-user lexical association test that operates in the embedding space of a lightly adapted masked language model. Each author is represented by a private deterministic token whose vector is projected onto curated lexical axes, and the test reports standardized effects with permutation p-values and Benjamini-Hochberg control. On a balanced bot-human Twitter benchmark, POLAR separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and provides concise per-author diagnostics for computational social science. Its results may depend on the quality of the underlying language model and of the curated lexical axes. The code is publicly available, which facilitates replication and extension of the research.
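The pipeline named in the abstract (projection onto a lexical axis, a standardized effect, a permutation p-value, and Benjamini-Hochberg control) can be illustrated with a minimal sketch. The abstract does not give POLAR's exact statistic, so the code below assumes a WEAT-style difference of mean cosine similarities. All function names, the toy vectors, and the parameter defaults are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def cosine_to_all(vec, mat):
    """Cosine similarity between one vector and every row of a matrix."""
    vec = vec / np.linalg.norm(vec)
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    return mat @ vec

def effect_size(sims, is_pos):
    """Assumed WEAT-style effect: difference of pole means over the pooled SD."""
    return (sims[is_pos].mean() - sims[~is_pos].mean()) / sims.std(ddof=1)

def permutation_p_value(sims, is_pos, n_perm=2000, rng=None):
    """Two-sided p-value obtained by shuffling which words belong to which pole."""
    rng = np.random.default_rng(rng)
    observed = abs(sims[is_pos].mean() - sims[~is_pos].mean())
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_pos)
        diff = abs(sims[shuffled].mean() - sims[~shuffled].mean())
        hits += int(diff >= observed)
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected under Benjamini-Hochberg FDR control."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank that meets its threshold
        reject[order[: k + 1]] = True
    return reject

# Toy data (hypothetical): 8 words per pole in a 16-dimensional space,
# and one author vector that leans toward the positive pole.
rng = np.random.default_rng(0)
direction = np.zeros(16)
direction[0] = 1.0
pos_words = direction + 0.1 * rng.normal(size=(8, 16))
neg_words = -direction + 0.1 * rng.normal(size=(8, 16))
author_vec = direction + 0.1 * rng.normal(size=16)

sims = cosine_to_all(author_vec, np.vstack([pos_words, neg_words]))
is_pos = np.arange(16) < 8
effect = effect_size(sims, is_pos)
p_value = permutation_p_value(sims, is_pos, rng=0)
print(f"effect={effect:.2f}, p={p_value:.4f}")
```

In a per-user setting, one p-value would be computed per author (or per author-axis pair) and the whole collection passed to `benjamini_hochberg` once, so that the false discovery rate is controlled across all tests rather than per test.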

Key Points

▸ POLAR operates in the embedding space of a masked language model, offering a novel approach to per-user association testing.
▸ The method separates LLM-driven bots from organic accounts on a balanced bot-human Twitter benchmark and, on an extremist forum, quantifies alignment with slur lexicons and rightward drift over time.
▸ POLAR is modular and provides concise diagnostics for computational social science applications.

Merits

Strength in Addressing Author-Level Variation

POLAR's per-user association testing approach offers a nuanced understanding of author-level variation in language use, complementing existing methods that operate at the word, sentence, or corpus level.

Modularity and Flexibility

The method's modularity allows for easy adaptation to new attribute sets and applications, increasing its potential impact in computational social science.
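One way to picture this modularity is to treat each attribute set as a small, swappable data object. The sketch below is an assumption about how such a design could look, not the authors' implementation: the `LexicalAxis` class, the `score_author` helper, the difference-of-means axis construction, and the toy two-dimensional embeddings are all hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class LexicalAxis:
    """A named axis defined by two opposing word lists (hypothetical structure)."""
    name: str
    positive: tuple[str, ...]
    negative: tuple[str, ...]

def axis_direction(axis, embed):
    """Unit vector pointing from the negative pole's mean to the positive pole's mean."""
    pos = np.mean([embed[w] for w in axis.positive], axis=0)
    neg = np.mean([embed[w] for w in axis.negative], axis=0)
    d = pos - neg
    return d / np.linalg.norm(d)

def score_author(author_vec, axes, embed):
    """Project a unit-normalized author vector onto every registered axis."""
    unit = author_vec / np.linalg.norm(author_vec)
    return {ax.name: float(unit @ axis_direction(ax, embed)) for ax in axes}

# Toy embeddings and axes, purely illustrative.
embed = {
    "warm": np.array([1.0, 0.0]),
    "cold": np.array([-1.0, 0.0]),
    "formal": np.array([0.0, 1.0]),
    "casual": np.array([0.0, -1.0]),
}
axes = [
    LexicalAxis("temperature", ("warm",), ("cold",)),
    LexicalAxis("register", ("formal",), ("casual",)),
]
scores = score_author(np.array([2.0, 0.0]), axes, embed)
print(scores)
```

Under this design, adding a new attribute set means appending one more `LexicalAxis` to the list; the scoring code does not change.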

Demerits

Dependence on Language Model Quality

The performance of POLAR may be influenced by the quality of the underlying language model and curated lexical axes, which could impact its accuracy and reliability.

Limited Generalizability

The method's performance on specific datasets and applications may not generalize to other contexts, requiring further validation and adaptation.

Expert Commentary

POLAR adds a per-user association test to the computational social science toolkit, complementing probes that work at the word, sentence, or corpus level. The reported results are promising, but they may depend on the quality of the underlying language model and of the curated lexical axes. Further research is needed to validate the method on other datasets and platforms before its generalizability can be assessed. Its modularity to new attribute sets makes it an attractive option for analyzing online behavior and author-level variation in language use.

Recommendations

✓ Future research should focus on adapting POLAR to diverse applications, including social media platforms, online forums, and text-based communication systems.
✓ The method's performance should be validated on a wider range of datasets and contexts to ensure its generalizability and robustness.

Sources

arXiv - cs.CL

