
Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding


Faruk Alpay, Bilge Senturk

arXiv:2603.05540v1 · Announce Type: new

Abstract: We study grammar-constrained decoding (GCD) as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar (CFG). We prove an oracle invariance theorem: language-equivalent grammars induce identical admissible next-token sets for every prefix, hence identical logit masks, yet can yield provably different compiled state spaces and online ambiguity costs. We give exact control-state blowup counts for the canonical $a^n b^n$ language under redundant nonterminal delegation, and introduce a left-to-right structural ambiguity cost (SAC) measuring incremental packed-parse-forest growth per token. For two equivalent grammars over all finite strings, SAC is $O(1)$ per token under right-recursion but $\Theta(t^2)$ per token and $\Theta(n^3)$ cumulatively under concatenation. We establish engine-independent lower bounds: any sound, retrieval-efficient, parse-preserving online masking engine must incur $\Omega(t^2)$ work per token on a specific constant-size CFG family, unconditionally within this model. We define decoding-cost equivalence classes of grammars and prove existence of minimal-SAC representatives within bounded rewrite families. Finally, we characterize the true conditional sampler via a Doob $h$-transform and derive sharp one-step KL and total-variation distortion bounds for hard-masked decoding in terms of survival-probability spread among admissible next tokens. We integrate these results with Transformer and Mixture-of-Experts architectures, derive latency envelopes in terms of vocabulary size, active state sets, and beam width, and connect SAC to instrumentation-based predictive performance models and automated grammar optimization.
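The oracle invariance result concerns the logit mask itself: given the admissible next-token set for a prefix, constrained decoding assigns zero probability to every other token and renormalizes over the survivors. A minimal sketch of that hard-masking step, assuming the admissible set is supplied by an external reachability oracle (the function name and the toy logits are illustrative, not from the paper):

```python
import math

def masked_next_token_dist(logits, admissible):
    """Hard-masked decoding: drive inadmissible logits to -inf,
    then softmax so the admissible tokens renormalize."""
    masked = [z if i in admissible else float("-inf")
              for i, z in enumerate(logits)]
    m = max(masked)  # stabilize the softmax
    exps = [math.exp(z - m) if z != float("-inf") else 0.0 for z in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Tokens 0 and 2 are admissible under the (hypothetical) grammar oracle.
probs = masked_next_token_dist([2.0, 1.0, 0.5, -1.0], admissible={0, 2})
```

By the invariance theorem, any language-equivalent grammar yields the same `admissible` set at every step, so this mask is identical even when the compiled pushdown systems differ.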

Executive Summary

This article presents a rigorous mathematical analysis of grammar-constrained decoding (GCD) within autoregressive LLM frameworks, establishing an oracle invariance theorem that links language-equivalent CFGs to identical admissible next-token sets despite divergent compiled state spaces and ambiguity costs. The paper introduces a left-to-right structural ambiguity cost (SAC) metric, quantifies control-state blowup in canonical languages under redundant nonterminal delegation, and derives engine-independent lower bounds on computational overhead for sound, retrieval-efficient masking. Importantly, the work bridges theoretical computer science and NLP by connecting SAC to predictive performance models and grammar optimization, offering quantifiable tradeoffs between equivalence, efficiency, and complexity across grammar families. The integration with Transformer and MoE architectures provides actionable latency modeling for real-world deployments.

Key Points

  • Proof of oracle invariance theorem linking language-equivalent CFGs to identical admissible next-token sets.
  • Introduction of SAC as a novel metric quantifying incremental packed-parse-forest growth per token.
  • Derivation of engine-independent lower bounds requiring Ω(t²) work per token on specific CFG families.
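Beyond these points, the abstract's Doob $h$-transform characterization admits a simple numerical illustration: the true conditional sampler reweights the base distribution p by each token's survival probability h (the mass of grammar-valid completions available after that token), while hard masking only zeroes inadmissible tokens. The distribution p, the survival values h, and the helper names below are hypothetical; the sketch shows the one-step KL distortion vanishing exactly when h is constant on the admissible set:

```python
import math

def normalize(weights):
    s = sum(weights)
    return [w / s for w in weights]

def kl(p, q):
    # KL divergence D(p || q), skipping zero-mass terms of p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical base next-token distribution and survival probabilities.
p = [0.5, 0.3, 0.2]
h = [0.9, 0.1, 0.0]   # token 2 is inadmissible (h = 0)

# True conditional sampler (Doob h-transform): q ∝ p * h.
q = normalize([pi * hi for pi, hi in zip(p, h)])
# Hard-masked sampler: r ∝ p * 1[h > 0].
r = normalize([pi if hi > 0 else 0.0 for pi, hi in zip(p, h)])

distortion = kl(q, r)  # positive whenever h varies across admissible tokens
```

When h is constant on the admissible set, q and r coincide and the distortion is zero, consistent with the paper's framing of the bound in terms of survival-probability spread.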

Merits

Theoretical Significance

Establishes foundational invariance principles in GCD, advancing the intersection of formal linguistics and LLMs.

Practical Utility

Provides quantifiable SAC metrics enabling predictive latency and performance modeling for production systems.
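One way to make such latency modeling concrete is a cost envelope in the variables the abstract names: vocabulary size, active state set, and beam width. The function and its constants below are purely illustrative placeholders, not the paper's model:

```python
def per_token_latency(vocab_size, active_states, beam_width,
                      c_mask=1e-8, c_state=1e-7):
    """Illustrative per-token latency envelope: masking cost grows with
    vocabulary size, oracle cost with the active state set, and both
    are paid once per beam hypothesis. Constants are placeholders."""
    return beam_width * (c_mask * vocab_size + c_state * active_states)

# Larger vocabularies, state sets, or beams all widen the envelope.
estimate = per_token_latency(vocab_size=50_000, active_states=200, beam_width=4)
```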

Demerits

Complexity of Application

Derivations involve advanced formal methods (pushdown systems, CFG delegation) that may limit accessibility for non-specialists.

Scope Limitation

Results are primarily theoretical; empirical validation across diverse LLM architectures remains to be demonstrated.

Expert Commentary

The paper represents a watershed moment in the formalization of decoding constraints within LLMs. The oracle invariance theorem is particularly compelling: it reveals that equivalence at the language level does not imply equivalence in computational behavior, an insight that defies intuition and demands reevaluation of current decoding heuristics. The SAC metric, while mathematically elegant, introduces a new dimension to decoding evaluation beyond traditional perplexity, aligning with recent trends in performance modeling via structural metrics. Moreover, the lower-bound proofs establish fundamental constraints on any viable decoding engine, effectively setting a new standard for evaluation criteria. While the theoretical rigor is undeniable, the paper's greatest impact lies in its synthesis: by linking equivalence, efficiency, and complexity in a unified framework, it opens the door to automated grammar optimization tools and predictive instrumentation platforms that could redefine how LLMs are deployed in constrained environments. This work deserves citation as a benchmark in future research on constrained decoding architectures.
