
Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information


Konstantin Krestnikov

arXiv:2603.11749v1

Abstract: Why do language models sometimes prefer correct statements even when trained on mixed-quality data? We introduce the Compression--Consistency Principle: next-token prediction favors hypotheses that allow shorter and more internally consistent descriptions of the training data. Truth bias emerges only when false alternatives are structurally harder to compress. We test this using small GPT-2-style character-level transformers (3.5M--86M parameters) on synthetic math corpora with controlled mixtures of correct and incorrect rules. In the random-error setting, models strongly prefer correct completions in paired evaluation: 83.1% accuracy at balanced data and 67.0% even when correct rules appear in only 10% of the corpus. Replacing random errors with a coherent but mathematically incorrect rule system largely eliminates the preference (near-chance accuracy). In a more natural-language-like synthetic world, the effect is weaker but still present (57.7%). Additional experiments show that embedding verification steps can restore preference for correctness even at small scale, while increasing the number of consistent rules produces a graded improvement in accuracy. Our results suggest that what appears as a "truth bias" is largely a side effect of compression pressure and preference for internal consistency, rather than an intrinsic drive toward truth. Full code and data are available at https://github.com/Rai220/compression-drives-truth.
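
The paired evaluation the abstract describes can be sketched in a few lines. The helper below is illustrative only: `sequence_logprob` and `paired_accuracy` are hypothetical names, and the per-character log-probabilities are stand-ins for what the paper's character-level transformer would actually produce.

```python
def sequence_logprob(char_logprobs):
    """Total log-probability of a completion, summed over its characters."""
    return sum(char_logprobs)

def paired_accuracy(pairs):
    """Fraction of pairs where the correct completion outscores the incorrect one.

    Each pair is (logprobs_correct, logprobs_incorrect): per-character
    log-probabilities of the two candidate completions under one prompt.
    """
    wins = sum(
        1 for correct, incorrect in pairs
        if sequence_logprob(correct) > sequence_logprob(incorrect)
    )
    return wins / len(pairs)

# Toy example: the model prefers the correct completion in one of two pairs.
pairs = [
    ([-0.1, -0.2], [-0.5, -0.9]),   # correct completion scores higher
    ([-1.2, -1.1], [-0.3, -0.4]),   # incorrect completion scores higher
]
print(paired_accuracy(pairs))  # 0.5
```

Under this metric, chance performance is 50%, which is why the paper reports "near-chance accuracy" when the incorrect rules are coherent.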

Executive Summary

The article presents a compelling framework for understanding the apparent 'truth bias' in language models, proposing that the preference for correct statements stems from compression efficiency and internal consistency rather than an intrinsic drive toward truth. Using character-level transformers on synthetic math corpora, the authors show that models favor correct completions when those completions permit a shorter, more consistent description of the training data, most strongly in the random-error setting. The effect largely disappears when the errors themselves form a coherent, internally consistent rule system, suggesting that the observed behavior is contingent on compression pressure rather than on correctness per se. The findings hold across parameter scales (3.5M to 86M) and synthetic domains, and embedding verification steps restores the preference for correctness even at small scale, indicating the effect is modifiable. The work shifts the interpretive lens from epistemic bias to a side effect of the training objective, offering a nuanced view of model behavior.

Key Points

  • Preference for correct statements is driven by compression and consistency, not inherent truth bias.
  • Effect largely disappears when errors form a coherent, internally consistent rule system, indicating the preference depends on structure rather than truth.
  • Embedding verification steps restores the preference for correctness even at small scale, and adding more consistent rules yields a graded improvement in accuracy.
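
The controlled-mixture setup behind these points can be sketched as follows. This is a generic toy generator, not the paper's actual corpus format: the example syntax, the off-by-one wrong rule, and the function name `make_corpus` are all assumptions for illustration.

```python
import random

def make_corpus(n_examples, correct_fraction, seed=0):
    """Generate a toy arithmetic corpus mixing correct and incorrect rules.

    correct_fraction controls the share of examples following true
    arithmetic; the rest follow a single fixed wrong rule (off-by-one
    sums), loosely mirroring the 'coherent incorrect rule' condition.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a, b = rng.randint(0, 9), rng.randint(0, 9)
        if rng.random() < correct_fraction:
            lines.append(f"{a}+{b}={a + b}")        # correct rule
        else:
            lines.append(f"{a}+{b}={a + b + 1}")    # coherent wrong rule
    return lines

# E.g. correct_fraction=0.1 reproduces the paper's 10%-correct regime.
corpus = make_corpus(1000, correct_fraction=0.1)
```

Varying `correct_fraction` between 0.1 and 0.5 is how one would probe the 67.0%-to-83.1% accuracy range reported in the abstract.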

Merits

Strength

The study offers a clear, empirically grounded mechanism for a previously ambiguous phenomenon, enhancing interpretability.

Demerits

Limitation

The synthetic data context limits generalizability to real-world, unstructured text, which may involve additional confounding factors.

Expert Commentary

This article represents a significant advance in demystifying the apparent 'truth bias' phenomenon in large language models. The Compression--Consistency Principle is both theoretically elegant and empirically validated. The authors effectively disentangle the conflation of epistemic bias and computational efficiency, which has long plagued discussions around model behavior. Their controlled experiments using synthetic corpora with quantifiable consistency and compression metrics are particularly noteworthy for their rigor. Moreover, the demonstration that embedding verification can counteract the effect—even at small scales—suggests practical avenues for mitigating unintended consequences. While the synthetic data limitation is a valid caveat, the findings are robust enough to inform broader epistemological debates in AI. This work bridges a gap between cognitive models of human reasoning and computational AI behavior, offering a framework applicable to both domains. It underscores the importance of reevaluating assumptions about model motivations—shifting from anthropomorphic interpretations to computational pragmatics.

Recommendations

  • Future research should extend this framework to real-world datasets with heterogeneous quality and structure.
  • Developers should integrate compression and consistency metrics into evaluation pipelines to better predict and mitigate unintended biases.
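
The second recommendation could be operationalized with a bits-per-character measure, a standard compression proxy derived from a model's per-character log-probabilities. The helper below is a minimal sketch under that assumption, not an API from the paper's repository.

```python
import math

def bits_per_char(char_logprobs):
    """Average negative log2-probability per character.

    Lower values mean the model compresses the text better; tracking
    this alongside accuracy exposes compression-driven preferences.
    """
    nll_bits = [-lp / math.log(2) for lp in char_logprobs]  # nats -> bits
    return sum(nll_bits) / len(nll_bits)

# A model assigning probability 0.5 to every character needs 1 bit/char.
print(bits_per_char([math.log(0.5)] * 4))  # 1.0
```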
