
Transformers Can Learn Rules They've Never Seen: Proof of Computation Beyond Interpolation

Andy Gray

arXiv:2603.17019v1 Announce Type: new Abstract: A central question in the LLM debate is whether transformers can infer rules absent from training, or whether apparent generalisation reduces to similarity-based interpolation over observed examples. We test a strong interpolation-only hypothesis in two controlled settings: one where interpolation is ruled out by construction and proof, and one where success requires emitting intermediate symbolic derivations rather than only final answers. In Experiment 1, we use a cellular automaton with a pure XOR transition rule and remove specific local input patterns from training; since XOR is linearly inseparable, each held-out pattern's nearest neighbours have the opposite label, so similarity-based predictors fail on the held-out region. Yet a two-layer transformer recovers the rule (best 100%; 47/60 converged runs), and circuit extraction identifies XOR computation. Performance depends on multi-step constraint propagation: without unrolling, accuracy matches output bias (63.1%), while soft unrolling reaches 96.7%. In Experiment 2, we study symbolic operator chains over integers with one operator pair held out; the model must emit intermediate steps and a final answer in a proof-like format. Across all 49 holdout pairs, the transformer exceeds every interpolation baseline (mean 41.8%, up to 78.6%; mean KRR 4.3%; KNN and MLP score 0% on every pair), while removing intermediate-step supervision degrades performance. Together with a construction showing that a standard transformer block can implement exact local Boolean rules, these results provide an existence proof that transformers can learn rule structure not directly observed in training and express it explicitly, ruling out the strongest architectural form of interpolation-only accounts: that transformers cannot in principle discover and communicate unseen rules, while leaving open when such behaviour arises in large-scale language training.
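The linear-inseparability argument can be checked directly: under a parity (XOR) label, flipping any single bit of an input flips its label, so every Hamming-distance-1 neighbour of a held-out pattern carries the opposite label, and a nearest-neighbour predictor must misclassify it. A minimal sketch (the 3-bit neighbourhood and the particular held-out pattern are illustrative assumptions, not the paper's exact setup):

```python
from itertools import product

def xor_label(bits):
    # Parity (XOR) of the bits in a local neighbourhood.
    return sum(bits) % 2

patterns = list(product([0, 1], repeat=3))
held_out = [(1, 0, 1)]  # hypothetical held-out local pattern
train = [p for p in patterns if p not in held_out]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

for h in held_out:
    d = min(hamming(h, t) for t in train)
    nearest = [t for t in train if hamming(h, t) == d]
    # Every nearest neighbour has the opposite parity label,
    # so a 1-NN classifier predicts the wrong label here.
    assert all(xor_label(t) != xor_label(h) for t in nearest)
    print(h, "label:", xor_label(h),
          "nearest-neighbour labels:", {xor_label(t) for t in nearest})
```

Running this prints a single nearest-neighbour label set, `{1}`, opposite to the held-out pattern's label `0`, which is exactly why similarity-based predictors score 0% on the held-out region.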

Executive Summary

This study provides evidence that transformers can learn and express rules not directly observed in training data. The researchers conducted two experiments to test the interpolation-only hypothesis, which suggests that apparent generalization in transformers is simply a result of similarity-based interpolation. In Experiment 1, a two-layer transformer successfully recovered a pure XOR transition rule in a cellular automaton, even when specific local input patterns were removed from training. Experiment 2 demonstrated that the transformer could emit intermediate symbolic derivations and a final answer in a proof-like format for symbolic operator chains. These results rule out the strongest architectural form of interpolation-only accounts and provide an existence proof that transformers can learn rule structure not observed in training.
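The cellular-automaton setting in Experiment 1 can be sketched concretely. The code below implements a 1-D automaton whose transition is a pure XOR; the choice of a left/right-neighbour window (a rule-90-style update) and periodic boundaries are assumptions for illustration, as the summary does not specify the exact neighbourhood:

```python
def step(state):
    # One update of a 1-D cellular automaton: each cell becomes the XOR
    # of its left and right neighbours (rule-90-style sketch), with
    # periodic (wrap-around) boundaries.
    n = len(state)
    return [state[(i - 1) % n] ^ state[(i + 1) % n] for i in range(n)]

state = [0, 0, 0, 1, 0, 0, 0]  # single seed cell
for _ in range(3):
    print("".join(str(c) for c in state))
    state = step(state)
```

Because each output bit depends on its inputs only through XOR, a transition table of this kind is linearly inseparable in every local window, which is what lets the authors rule out interpolation by construction.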

Key Points

  • Transformers can learn and express rules not directly observed in training data.
  • Experiment 1 showed that a two-layer transformer successfully recovered a pure XOR transition rule.
  • Experiment 2 demonstrated that the transformer could emit intermediate symbolic derivations and a final answer in a proof-like format.
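The proof-like output format of Experiment 2 can be illustrated with a sketch: evaluate a left-to-right chain of integer operators, recording every intermediate value before the final answer. The chain semantics and the printed format here are assumptions for illustration, not the paper's exact specification:

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def evaluate_chain(start, chain):
    # Apply each (operator, operand) pair left to right, recording every
    # intermediate value -- the "derivation" the model must emit.
    value, steps = start, []
    for op, operand in chain:
        value = OPS[op](value, operand)
        steps.append((op, operand, value))
    return steps, value

steps, answer = evaluate_chain(3, [("+", 4), ("*", 2), ("-", 5)])
for op, operand, value in steps:
    print(f"{op} {operand} => {value}")
print("final:", answer)
# => + 4 => 7
#    * 2 => 14
#    - 5 => 9
#    final: 9
```

Supervising the intermediate `steps` as well as `answer` mirrors the paper's finding that removing intermediate-step supervision degrades performance.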

Merits

Strength in Design

The study's design is strong, with two controlled experiments that effectively test the interpolation-only hypothesis. The use of a cellular automaton and symbolic operator chains provides a clear and objective measure of the transformer's ability to learn and express rules.

Implications for AI Research

The study's findings have significant implications for AI research, particularly in the context of large-scale language training. If transformers can learn and express rules not observed in training data, it challenges the current understanding of their limitations and opens up new avenues for research.

Demerits

Limitation in Generalizability

The study's results may not generalize to all types of models or domains. The use of a specific type of transformer and a limited set of tasks may limit the applicability of the findings.

Need for Further Research

While the study provides evidence that transformers can learn and express rules not observed in training data, further research is needed to fully understand the mechanisms behind this ability and to explore its implications for AI research.

Expert Commentary

The study provides significant evidence that transformers can learn and express rules not directly observed in training data. Two controlled experiments, with interpolation ruled out by construction in the first, make the core existence claim robust. However, the results come from small models on synthetic tasks, so their generalizability is limited, and further research is needed to understand the mechanisms behind this ability. The implications are nonetheless far-reaching, with potential applications in areas including explainability in AI, transfer learning, and AI policy.

Recommendations

  • Further research is needed to fully understand the mechanisms behind the transformer's ability to learn and express rules not observed in training data.
  • The study's findings should be replicated and extended to other types of models and domains to further establish their generalizability.
