Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models

arXiv:2604.00375v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@$1$) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@$k$), creating a fundamental quality--exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and develop a simple Independent Metropolis--Hastings sampler that approximately targets this distribution during decoding. Experiments across a range of reasoning benchmarks including MATH500, AIME24/25, HumanEval, and MBPP show that our approach yields a better exploration--quality tradeoff than both random and low-confidence remasking.

Executive Summary

This article addresses the quality-exploration dilemma in diffusion large language models (dLLMs). The authors provide a unified explanation of the dilemma, showing that low-confidence remasking improves single-sample quality but suppresses exploration. To overcome this, they develop an Independent Metropolis-Hastings sampler that balances quality and exploration, yielding a better exploration-quality tradeoff across a range of reasoning benchmarks. The study highlights the tension between local confidence and global exploration in dLLMs, shedding light on the limitations of current decoding strategies. By exploring alternative sampling methods, the authors contribute to the advancement of language models, particularly in tasks that require both high per-sample quality and diverse multi-sample generation.

Key Points

  • The quality-exploration dilemma in dLLMs is a fundamental tradeoff between localized confidence and global exploration.
  • Low-confidence remasking improves single-sample quality but suppresses exploration, leading to a suboptimal exploration-quality tradeoff.
  • The authors develop an Independent Metropolis-Hastings sampler that balances quality and exploration, achieving better performance in various reasoning benchmarks.

Merits

Theoretical Insight

The article provides a unified explanation of the quality-exploration dilemma in dLLMs, shedding light on the underlying mechanisms and limitations of current decoding strategies.

Methodological Innovation

The authors propose a novel Independent Metropolis-Hastings sampler that approximates the optimal distribution balancing quality and exploration.
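The paper's sampler is only described at a high level here. As an illustrative sketch of the underlying mechanism (not the authors' construction: the target distribution, proposal, and all names below are hypothetical stand-ins, using a toy discrete target rather than a sequence distribution over decoded tokens), a generic Independent Metropolis-Hastings step draws proposals independently of the current state and corrects toward the target via an accept/reject ratio:

```python
import math
import random

def imh_step(x, log_target, log_proposal, sample_proposal):
    """One Independent Metropolis-Hastings step.

    Proposals are drawn independently of the current state x; the
    acceptance ratio pi(x')q(x) / (pi(x)q(x')) corrects for the
    mismatch between the proposal q and the target pi.
    """
    x_new = sample_proposal()
    log_alpha = (log_target(x_new) - log_target(x)
                 + log_proposal(x) - log_proposal(x_new))
    if random.random() < math.exp(min(0.0, log_alpha)):
        return x_new  # accept the proposal
    return x          # reject; keep the current state

# Toy usage: target is a skewed distribution over {0, 1, 2},
# proposal is uniform. The chain's empirical state frequencies
# should approach the target probabilities.
target = {0: 0.7, 1: 0.2, 2: 0.1}
log_target = lambda x: math.log(target[x])
log_proposal = lambda x: math.log(1 / 3)
sample_proposal = lambda: random.randrange(3)

random.seed(0)
x = 0
counts = {0: 0, 1: 0, 2: 0}
for _ in range(20000):
    x = imh_step(x, log_target, log_proposal, sample_proposal)
    counts[x] += 1
```

In the paper's setting, the target would be the characterized quality-exploration-balanced sequence distribution and the independent proposal would come from the dLLM's own decoding, but those specifics are not reproduced in this summary.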

Demerits

Lack of Generalizability

The study focuses on a specific type of language model (dLLMs) and decoding strategy, which may limit the generalizability of the findings to other models or applications.

Experimental Limitations

The experiments are confined to math and code reasoning benchmarks (MATH500, AIME24/25, HumanEval, and MBPP), which may not fully capture the complexity and diversity of real-world language understanding tasks.

Expert Commentary

The article presents a well-structured and thorough investigation of the quality-exploration dilemma in dLLMs. The authors' analysis is rigorous and insightful, and their proposed solution is innovative and effective. However, the study could benefit from a more extensive discussion of the limitations and potential applications of the Independent Metropolis-Hastings sampler. Additionally, the authors could explore the connection between their findings and other related areas of research, such as the exploration-exploitation tradeoff and language model evaluation.

Recommendations

  • Future research should focus on generalizing the findings to other types of language models and decoding strategies.
  • The authors should explore the applicability of the Independent Metropolis-Hastings sampler to more complex language understanding tasks and real-world applications.

Sources

Original: arXiv - cs.CL