Perturbation: A simple and efficient adversarial tracer for representation learning in language models
arXiv:2603.23821v1
Abstract: Linguistic representation learning in deep neural language models (LMs) has been studied for decades, for both practical and theoretical reasons. However, finding representations in LMs remains an unsolved problem, in part due to a dilemma between enforcing implausible constraints on representations (e.g., linearity; Arora et al., 2024) and trivializing the notion of representation altogether (Sutter et al., 2025). Here we escape this dilemma by reconceptualizing representations not as patterns of activation but as conduits for learning. Our approach is simple: we perturb an LM by fine-tuning it on a single adversarial example and measure how this perturbation "infects" other examples. Perturbation makes no geometric assumptions, and unlike other methods, it does not find representations where it should not (e.g., in untrained LMs). But in trained LMs, perturbation reveals structured transfer at multiple linguistic grain sizes, suggesting that LMs both generalize along representational lines and acquire linguistic abstractions from experience alone.
Executive Summary
This article introduces Perturbation, a novel approach to finding linguistic representations in deep neural language models (LMs). Rather than treating representations as patterns of activation, the authors reconceptualize them as conduits for learning: an LM is fine-tuned on a single adversarial example, and the method traces how that perturbation "infects" other examples. Because it makes no geometric assumptions, Perturbation sidesteps the dilemma between enforcing implausible constraints on representations and trivializing the notion of representation altogether. In trained LMs it reveals structured transfer at multiple linguistic grain sizes, suggesting that LMs both generalize along representational lines and acquire linguistic abstractions from experience alone.
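The protocol is easy to restate as code. Below is a minimal, illustrative sketch of the perturb-and-trace idea: compute baseline losses on a set of probe sentences, fine-tune briefly on one adversarial example, then measure how each probe's loss shifts. The model name (gpt2), the adversarial and probe sentences, the learning rate, and the step count are all assumptions made for illustration, not the authors' actual setup.

```python
# Minimal sketch of the perturb-and-trace protocol described in the abstract.
# Model, sentences, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def example_loss(model, tokenizer, text, device):
    """Mean token-level cross-entropy of `text` under `model`."""
    batch = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**batch, labels=batch["input_ids"])
    return out.loss.item()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)  # assumed model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# One adversarial example (hypothetical): an agreement error the LM should resist.
adversarial = "The keys to the cabinet is on the table."
# Probe examples at different linguistic grain sizes (hypothetical).
probes = [
    "The keys to the cabinet are on the table.",  # same construction
    "The books on the shelf are dusty.",          # same abstraction, new lexemes
    "She walked to the store yesterday.",         # unrelated control
]

baseline = [example_loss(model, tokenizer, p, device) for p in probes]

# Perturb: a few gradient steps on the single adversarial example.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed hyperparameters
batch = tokenizer(adversarial, return_tensors="pt").to(device)
for _ in range(10):  # assumed number of steps
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
model.eval()

# Trace the "infection": loss shift on each probe after the perturbation.
for probe, before in zip(probes, baseline):
    after = example_loss(model, tokenizer, probe, device)
    print(f"{after - before:+.4f}  {probe}")
```

A loss shift on probes that share the adversarial example's construction, but not on unrelated controls, would be the kind of structured transfer the abstract describes.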
Key Points
- ▸ Perturbation is a simple, efficient adversarial tracer for representations in LMs: fine-tune a model on a single adversarial example and measure how the perturbation "infects" other examples.
- ▸ In trained LMs, the perturbation transfers in a structured way at multiple linguistic grain sizes; in untrained LMs it does not, so the method does not find representations where none should exist.
- ▸ Because it makes no geometric assumptions, Perturbation escapes the dilemma between enforcing implausible constraints on representations and trivializing the notion of representation altogether.
Merits
Strength
Perturbation is simple and efficient: it requires only a brief fine-tuning pass on a single example, so it can be integrated into existing LM training and evaluation pipelines with little effort.
Strength
Perturbation's ability to reveal structured transfer at multiple linguistic grain sizes provides insights into the generalization and abstraction capabilities of LMs.
Demerits
Limitation
Perturbation's reliance on adversarial examples may limit its applicability to real-world scenarios, where data may not be as carefully curated.
Limitation
The article does not provide a comprehensive evaluation of Perturbation's performance on a diverse range of LMs and tasks.
Expert Commentary
The article's central contribution is its reconceptualization of representations as conduits for learning rather than patterns of activation, a fresh perspective on a long-standing problem. The finding that perturbations transfer along linguistic lines in trained LMs, but not in untrained ones, suggests the method tracks genuine abstractions rather than artifacts. That said, the reliance on adversarial examples and the absence of a broad evaluation across models and tasks mean further research is needed before Perturbation's full potential is clear. Even so, the study's findings and approach are significant contributions to representation learning in LMs.
Recommendations
- ✓ Future research should focus on evaluating Perturbation's performance on a diverse range of LMs and tasks, including those that are more representative of real-world scenarios.
- ✓ Investigating the applicability of Perturbation to other areas of deep learning, such as computer vision and reinforcement learning, could provide further insights into its potential and limitations.
Sources
Original: arXiv - cs.CL