A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models

arXiv:2603.18767v1 Announce Type: new Abstract: Concept unlearning has emerged as a promising direction for reducing the risks of harmful content generation in text-to-image diffusion models by selectively erasing undesirable concepts from a model's parameters. Existing approaches typically rely on keywords to identify the target concept to be unlearned. However, we show that this keyword-based formulation is inherently limited: a visual concept is multi-dimensional, can be expressed in diverse textual forms, and often overlaps with related concepts in the latent space. Keyword-only unlearning, which only imprecisely indicates the target concept, is therefore brittle and prone to over-forgetting. This occurs because a single keyword represents only a narrow point estimate of the concept, failing to cover its full semantic distribution and entangled variations in the latent space. To address this limitation, we propose Diversified Unlearning, a distributional framework that represents a concept through a set of contextually diverse prompts rather than a single keyword. This richer representation enables more precise and robust unlearning. Through extensive experiments across multiple benchmarks and state-of-the-art baselines, we demonstrate that integrating Diversified Unlearning as an add-on component into existing unlearning pipelines consistently achieves stronger erasure, better retention of unrelated concepts, and improved robustness against adversarial recovery attacks.

Executive Summary

This article proposes Diversified Unlearning, a new framework for concept unlearning in text-to-image diffusion models. The authors argue that existing keyword-based approaches are limited and prone to over-forgetting, as they fail to capture the full semantic distribution and entangled variations of a visual concept. Diversified Unlearning represents a concept through a set of contextually diverse prompts, enabling more precise and robust unlearning. The authors demonstrate the effectiveness of their approach through extensive experiments across multiple benchmarks and state-of-the-art baselines. The results show that Diversified Unlearning consistently achieves stronger erasure, better retention of unrelated concepts, and improved robustness against adversarial recovery attacks.

Key Points

  • Diversified Unlearning is a new framework for concept unlearning in text-to-image diffusion models
  • Existing keyword-based approaches are limited and prone to over-forgetting
  • Diversified Unlearning represents a concept through a set of contextually diverse prompts
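The core idea above can be illustrated with a minimal sketch. The snippet below contrasts a single-keyword "point estimate" of a concept with a distributional anchor pooled over contextually diverse prompts. Note the assumptions: `toy_embed` is a deterministic hash-based stand-in for a real text encoder (e.g. the CLIP text encoder used by diffusion models), and the mean-pooling in `concept_anchor` is one simple way to aggregate a prompt set; the paper's actual aggregation mechanism is not specified here.

```python
import hashlib
import math

def toy_embed(text, dim=8):
    """Deterministic toy 'text embedding' (stand-in for a real encoder):
    hash each token into a vector, sum, then L2-normalize."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = hashlib.sha256(tok.encode()).digest()
        for i in range(dim):
            vec[i] += h[i] / 255.0 - 0.5
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def concept_anchor(prompts, dim=8):
    """Mean-pooled embedding over a diverse prompt set: a distributional
    stand-in for the target concept, vs. a single-keyword point estimate."""
    embs = [toy_embed(p, dim) for p in prompts]
    return [sum(col) / len(embs) for col in zip(*embs)]

# Single keyword: a narrow point estimate of the concept.
keyword_anchor = toy_embed("van gogh")

# Diversified: contextually varied phrasings of the same visual concept
# (hypothetical prompt set, for illustration only).
diverse_prompts = [
    "a painting in the style of van gogh",
    "swirling starry night brushstrokes",
    "post-impressionist sunflowers with thick impasto",
]
dist_anchor = concept_anchor(diverse_prompts)
```

An unlearning objective steered toward `dist_anchor` rather than `keyword_anchor` would, in this framing, cover paraphrases and entangled variations that the bare keyword misses, which is the intuition behind the reported gains in erasure strength and adversarial robustness.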

Merits

Strength

The authors provide a comprehensive analysis of the limitations of existing keyword-based approaches and propose a novel solution that addresses these limitations.

Demerits

Limitation

The authors rely on extensive experiments and benchmarks to demonstrate the effectiveness of Diversified Unlearning, but the generalizability of their results to other domains remains unclear.

Expert Commentary

The article makes a significant contribution to the field of text-to-image diffusion models by addressing a critical limitation of existing approaches. The authors' proposal of Diversified Unlearning is well-motivated and demonstrates a deep understanding of the challenges associated with concept unlearning. However, the article could benefit from a more nuanced discussion of the trade-offs involved in Diversified Unlearning, particularly in terms of computational resources and model complexity. Additionally, the article's results should be further validated through more extensive evaluations and comparisons with other state-of-the-art approaches. Overall, the article is a valuable contribution to the field and has the potential to shape the future development of text-to-image diffusion models.

Recommendations

  • Future research should focus on developing more efficient and scalable versions of Diversified Unlearning
  • The authors should provide more detailed explanations of the computational resources and model complexity associated with Diversified Unlearning
