Compression is all you need: Modeling Mathematics

arXiv:2603.20396v1. Abstract: Human mathematics (HM), the mathematics humans discover and value, is a vanishingly small subset of formal mathematics (FM), the totality of all valid deductions. We argue that HM is distinguished by its compressibility through hierarchically nested definitions, lemmas, and theorems. We model this with monoids. A mathematical deduction is a string of primitive symbols; a definition or theorem is a named substring or macro whose use compresses the string. In the free abelian monoid $A_n$, a logarithmically sparse macro set achieves exponential expansion of expressivity. In the free non-abelian monoid $F_n$, even a polynomially-dense macro set only yields linear expansion; superlinear expansion requires near-maximal density. We test these models against MathLib, a large Lean 4 library of mathematics that we take as a proxy for HM. Each element has a depth (layers of definitional nesting), a wrapped length (tokens in its definition), and an unwrapped length (primitive symbols after fully expanding all references). We find unwrapped length grows exponentially with both depth and wrapped length; wrapped length is approximately constant across all depths. These results are consistent with $A_n$ and inconsistent with $F_n$, supporting the thesis that HM occupies a polynomially-growing subset of the exponentially growing space FM. We discuss how compression, measured on the MathLib dependency graph, and a PageRank-style analysis of that graph can quantify mathematical interest and help direct automated reasoning toward the compressible regions where human mathematics lives.
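The three quantities defined in the abstract (depth, wrapped length, unwrapped length) can be made concrete with a small sketch. The library below is a hypothetical toy, not MathLib data: the names `m1` to `m4` and the primitives `a` and `b` are invented for illustration, and the entries are deliberately built so that wrapped length stays constant while unwrapped length doubles with each layer.

```python
from functools import lru_cache

# Hypothetical toy library (an assumption, not taken from MathLib).
# Each name maps to the token list of its definition; any token that is
# not a key of LIBRARY is treated as a primitive symbol.
LIBRARY = {
    "m1": ["a", "b"],
    "m2": ["m1", "m1"],
    "m3": ["m2", "m2"],
    "m4": ["m3", "m3"],
}


def wrapped_length(name):
    """Number of tokens in the definition itself, references unexpanded."""
    return len(LIBRARY[name])


@lru_cache(maxsize=None)
def unwrapped_length(token):
    """Number of primitive symbols after fully expanding all references."""
    if token not in LIBRARY:
        return 1  # a primitive symbol counts as one
    return sum(unwrapped_length(t) for t in LIBRARY[token])


@lru_cache(maxsize=None)
def depth(token):
    """Layers of definitional nesting; primitives sit at depth 0."""
    if token not in LIBRARY:
        return 0
    return 1 + max(depth(t) for t in LIBRARY[token])


for name in LIBRARY:
    print(name, depth(name), wrapped_length(name), unwrapped_length(name))
# m1 1 2 2
# m2 2 2 4
# m3 3 2 8
# m4 4 2 16
```

In this toy case the wrapped length is 2 at every depth while the unwrapped length is 2 raised to the depth, which is the qualitative pattern the abstract reports for MathLib.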

Executive Summary

The article presents a mathematical modeling framework proposing that human mathematics (HM) is distinguished by its compressibility through hierarchically nested definitions, lemmas, and theorems. The authors argue that HM occupies a polynomially-growing subset of the exponentially growing space of formal mathematics (FM). They model deductions as strings in a monoid and compare two cases: in the free abelian monoid A_n, a logarithmically sparse macro set yields exponential expansion of expressivity, while in the free non-abelian monoid F_n, even a polynomially dense macro set yields only linear expansion. Empirical analysis of MathLib, a Lean 4 library taken as a proxy for HM, finds that unwrapped length grows exponentially with both depth and wrapped length, while wrapped length stays approximately constant across depths. These observations are consistent with the A_n model and inconsistent with F_n. The findings suggest that compression metrics and a PageRank-style analysis of the dependency graph could quantify mathematical interest and help direct automated reasoning.

Key Points

  • HM is compressible via nested definitions/lemmas/theorems, modeled via monoids.
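  • In the free abelian monoid (A_n), a logarithmically sparse macro set gives exponential expansion of expressivity; in the free non-abelian monoid (F_n), even a polynomially dense macro set gives only linear expansion, and superlinear expansion requires near-maximal density.
  • Measurements on MathLib are consistent with the A_n model: unwrapped length grows exponentially with depth and wrapped length, while wrapped length is approximately constant across depths.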
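One way to see how a sparse macro set can give exponential expansion in the abelian case is binary expansion of exponents. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes the macros to be the powers g_i^(2^k) of each generator, so only about log2(L) macros per generator have unwrapped length at most L. The function name `binary_macros` is hypothetical.

```python
def binary_macros(exponents):
    """Express an element of a free abelian monoid using power-of-two macros.

    `exponents` is the exponent vector (e_1, ..., e_n) of the element
    g_1^e_1 * ... * g_n^e_n. Each returned pair (i, k) is one use of the
    assumed macro g_i^(2^k); one macro is needed per set bit of each e_i.
    """
    uses = []
    for i, e in enumerate(exponents):
        k = 0
        while e:
            if e & 1:
                uses.append((i, k))
            e >>= 1
            k += 1
    return uses


element = (1000, 37, 512)          # exponent vector in A_3
uses = binary_macros(element)

wrapped = len(uses)                # macro uses needed to write the element
unwrapped = sum(element)           # primitive symbols after full expansion
print(wrapped, unwrapped)          # 10 1549
```

Because generators commute in A_n, any word can be regrouped by generator before compressing, so roughly n * log2(N) macro uses reach elements of unwrapped length up to n * N. In F_n the order of symbols matters, so the same power macros compress only runs of a repeated generator.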

Merits

Theoretical Novelty

Introduces a monoid-based framework to reconcile human and formal mathematics, offering a novel lens on mathematical compressibility.

Empirical Support

The analysis of MathLib provides tangible data backing the theoretical model’s predictions, strengthening validity.

Demerits

Assumption Constraint

Relies on the assumption, stated by the authors, that MathLib is an adequate proxy for HM; the results could be biased if the library over-represents constructs introduced for formalization rather than for human mathematical practice.

Generalizability Concern

Results may not extend to non-Lean or non-mathematical domains without adaptation, limiting applicability.

Expert Commentary

This paper makes a significant contribution to the epistemology of mathematics by proposing a formal, quantifiable mechanism, monoid-based compression, to distinguish human from formal mathematics. The choice of monoids as an algebraic abstraction is elegant and well suited to capturing hierarchical compression, especially through the contrast between abelian and non-abelian structures. The empirical test against MathLib is compelling: the observed growth of unwrapped length is consistent with the predictions of the free abelian monoid model and inconsistent with the non-abelian one, which lends credibility to the hypothesis. Moreover, the PageRank-style analysis of the dependency graph points to a new direction for automated reasoning, with possible applications in knowledge graphs, semantic indexing, and educational tools. While the reliance on a single library is a limitation, the conceptual framework is not tied to a specific platform and could be applied to other collections of mathematical artifacts. This work addresses a long-standing gap between human intuition and formal verification, and I anticipate it will become a foundational reference in computational epistemology and AI-assisted mathematics.
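To make the PageRank-style idea concrete, here is a minimal sketch of standard PageRank by power iteration on a toy dependency graph. The summary does not give the authors' exact formulation, so the edge direction (each declaration points to what it uses), the damping factor, and the declaration names `thm_a`, `thm_b`, `lemma`, and `defn` are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank by power iteration.

    `graph` maps each node to the list of nodes it depends on, so rank
    flows from a declaration to the declarations it uses. Nodes with no
    outgoing edges spread their rank uniformly over all nodes.
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy dependency graph (not MathLib data).
deps = {
    "thm_a": ["lemma", "defn"],
    "thm_b": ["lemma"],
    "lemma": ["defn"],
    "defn": [],
}

ranks = pagerank(deps)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

In this toy graph the definition that everything else relies on receives the highest score, followed by the shared lemma, which is the sense in which such a score could serve as a rough measure of how central a declaration is.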

Recommendations

  • Integrate compression metrics into academic repositories and AI platforms to enhance discoverability of human-relevant mathematical content.
  • Extend the monoid-based framework to other domains (e.g., physics, logic) to test applicability across scientific disciplines.

Sources

Original: arXiv - cs.AI