Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization

arXiv:2603.18258v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified squeezing effect (also known as likelihood displacement), where the probability of preferred responses decreases unintentionally during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise dynamics in logit space. Our analysis reveals that negative-gradient updates cause residuals to expand rapidly along high-curvature directions, which underlies the squeezing effect, whereas Sharpness-Aware Minimization (SAM) can suppress this behavior through its curvature-regularization effect. Building on this insight, we investigate logits-SAM, a computationally efficient variant that perturbs only the output layer with negligible overhead. Extensive experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT across multiple datasets and benchmarks demonstrate that logits-SAM consistently improves the effectiveness of DPO and integrates seamlessly with other DPO variants. Code is available at https://github.com/RitianLuo/logits-sam-dpo.

Executive Summary

This article presents logits-SAM, a computationally efficient variant of Sharpness-Aware Minimization (SAM) designed to mitigate the squeezing effect in Direct Preference Optimization (DPO). By modeling the coordinate-wise training dynamics in logit space, the authors show that negative-gradient updates cause residuals to expand rapidly along high-curvature directions, which produces the squeezing effect. Logits-SAM counteracts this expansion through SAM's curvature regularization while perturbing only the output layer, keeping the overhead negligible. Extensive experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT across multiple datasets and benchmarks show that it consistently improves DPO and composes with other DPO variants.
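For context, the objective the paper builds on is the standard DPO loss: the negative log-sigmoid of the scaled difference between the policy's and the reference model's log-probability margins on a preferred/rejected pair. The sketch below is the textbook formulation, not code from the paper; the function name and argument order are illustrative.

```python
import numpy as np

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    pi_logp_w / pi_logp_l:  policy log-prob of the preferred / rejected response
    ref_logp_w / ref_logp_l: reference-model log-probs of the same responses
    Returns -log sigmoid(beta * (policy margin - reference margin)).
    """
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log sigmoid(m) = log(1 + exp(-m)), written directly for clarity
    return np.log1p(np.exp(-margin))
```

When the policy matches the reference, both margins cancel and the loss sits at log 2; increasing the preferred response's log-probability (or decreasing the rejected one's) lowers the loss. The squeezing effect the paper analyzes is precisely the regime where optimizing this loss nonetheless drives `pi_logp_w` down.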

Key Points

  • The squeezing effect in DPO is caused by negative-gradient updates that expand residuals along high-curvature directions.
  • Logits-SAM suppresses this expansion via SAM's curvature regularization.
  • Logits-SAM is a computationally efficient variant of SAM with negligible overhead.
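The mechanism described above can be sketched as follows. Plain SAM perturbs every parameter toward higher loss before taking the descent step; logits-SAM restricts that adversarial perturbation to the output layer, so the extra gradient pass touches only a small fraction of the model. This is a minimal illustration on a toy quadratic loss, not the authors' implementation; `logits_sam_step`, `perturb_keys`, and the toy gradient function are all hypothetical names.

```python
import numpy as np

def logits_sam_step(params, grad_fn, lr=0.1, rho=0.05, perturb_keys=("out",)):
    """One sketched logits-SAM update over a dict of parameter blocks.

    The SAM ascent perturbation is applied only to the blocks named in
    perturb_keys (standing in for the output layer); all other blocks are
    left untouched during the perturbation.
    """
    grads = grad_fn(params)
    # Ascent direction, normalized over the perturbed subset only.
    norm = np.sqrt(sum(np.sum(grads[k] ** 2) for k in perturb_keys)) + 1e-12
    perturbed = {k: (v + (rho / norm) * grads[k]) if k in perturb_keys else v
                 for k, v in params.items()}
    # Gradient at the adversarially perturbed point drives the real update.
    sam_grads = grad_fn(perturbed)
    return {k: v - lr * sam_grads[k] for k, v in params.items()}

def toy_grads(params):
    """Gradient of the toy loss 0.5 * sum_k ||params[k]||^2 (grad = params)."""
    return {k: v.copy() for k, v in params.items()}
```

Because only the output-layer blocks enter the perturbation, the second gradient pass is the sole extra cost, which matches the paper's claim of negligible overhead relative to full-model SAM.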

Merits

Strength in addressing the squeezing effect

The article provides a clear and comprehensive analysis of the squeezing effect and its mitigation through logits-SAM, addressing a significant limitation in DPO.

Demerits

Limited evaluation on diverse models

The evaluation covers only three models (Pythia-2.8B, Mistral-7B, and Gemma-2B-IT); results on a broader range of architectures and scales would strengthen the conclusions.

Lack of comparison with other regularization techniques

The article could benefit from a comparison with other regularization techniques to demonstrate the unique value proposition of logits-SAM.

Expert Commentary

The article presents a well-motivated and carefully executed study. The logit-space analysis offers a concrete mechanism for the squeezing effect, and the experiments support the claim that logits-SAM improves DPO at negligible computational cost. To strengthen the case, evaluating logits-SAM on a broader range of models and comparing it with other regularization techniques would be beneficial. Exploring its application to other NLP tasks and real-world alignment scenarios is a valuable direction for future research.

Recommendations

  • Future research should investigate the application of logits-SAM in other NLP tasks and real-world scenarios.
  • A comparison with other regularization techniques would be beneficial to demonstrate the unique value proposition of logits-SAM.