Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score
arXiv:2603.23985v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but their massive scale poses significant challenges for practical deployment. Structured pruning offers a promising solution by removing entire dimensions or layers, yet existing methods face critical trade-offs: task-agnostic approaches cannot adapt to task-specific requirements, while task-aware methods require costly training to learn task adaptability. We propose DIET (Dimension-wise global pruning of LLMs via merging Task-wise importance scores), a training-free structured pruning method that combines dimension-level granularity with task-aware selection. DIET profiles activation magnitudes across tasks using only 100 samples per task, then applies majority voting to construct a single global mask. DIET incurs no large pre-computation or training costs. Experiments on seven zero-shot benchmarks using Gemma-2 2B and 9B models demonstrate the effectiveness of DIET; for example, at 20% sparsity on Gemma-2 2B, DIET achieves nearly a 10% average accuracy improvement over previous state-of-the-art structured pruning methods. This advantage persists across various sparsity levels and model scales, positioning DIET as a practical and robust choice for structured LLM pruning.
Executive Summary
This article proposes DIET, a training-free structured pruning method for Large Language Models (LLMs). DIET adapts to task-specific requirements without costly pre-computation or training: it profiles activation magnitudes using only 100 samples per task, then merges the resulting task-wise importance scores via majority voting into a single global pruning mask. On seven zero-shot benchmarks with Gemma-2 2B and 9B, this yields significant accuracy improvements over prior structured pruning methods (nearly 10% on average at 20% sparsity on Gemma-2 2B), and the advantage holds across sparsity levels and model scales. The findings position DIET as a practical and robust choice for structured LLM pruning, with potential applications in real-world deployment, though further investigation is needed to fully understand the method's implications and limitations.
Key Points
- ▸ DIET is a training-free structured pruning method for LLMs that operates at dimension-level granularity
- ▸ The method adapts to task-specific requirements without costly pre-computation or training
- ▸ DIET achieves significant accuracy improvements in zero-shot benchmarks
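The core recipe described in the abstract — per-task activation profiling followed by majority voting over per-task keep-masks — can be sketched as follows. This is a minimal illustration based only on the abstract's description; the function names, the use of mean absolute activation as the importance score, and the tie-breaking behavior are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def task_importance(activations):
    """Per-dimension importance for one task: mean absolute
    activation magnitude over that task's profiling samples.
    activations: (num_samples, hidden_dim) array."""
    return np.abs(activations).mean(axis=0)

def diet_global_mask(task_activations, sparsity):
    """Build a single global keep-mask by majority voting over
    per-task keep-masks.
    task_activations: list of (num_samples, hidden_dim) arrays,
    one per task (e.g. 100 samples each, as in the abstract).
    sparsity: fraction of hidden dimensions to prune."""
    hidden_dim = task_activations[0].shape[1]
    n_keep = int(round(hidden_dim * (1.0 - sparsity)))
    votes = np.zeros(hidden_dim, dtype=int)
    for acts in task_activations:
        scores = task_importance(acts)
        # dimensions this individual task would keep
        top = np.argsort(scores)[-n_keep:]
        votes[top] += 1
    # keep the dimensions with the most cross-task votes
    winners = np.argsort(votes)[-n_keep:]
    mask = np.zeros(hidden_dim, dtype=bool)
    mask[winners] = True
    return mask
```

Because each task contributes one vote per dimension, a dimension survives only if it ranks highly for enough tasks, which is what lets a single global mask serve multiple tasks at once.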
Merits
Improved Efficiency
DIET eliminates the need for costly pre-computation and training, making it a more efficient pruning method.
Task-aware Selection
DIET's task-aware selection mechanism lets the pruning mask reflect the requirements of the tasks it was profiled on, leading to better pruning decisions than task-agnostic criteria.
Robustness
DIET's majority voting approach yields a stable pruning process: because every task contributes equally to the global mask, the result is less sensitive to outlier importance scores from any single task.
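Once a global keep-mask exists, dimension-wise structured pruning amounts to physically slicing weight matrices along the hidden dimension, so the pruned model is genuinely smaller and faster rather than merely sparse. A minimal sketch, assuming a generic dense layer (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def prune_linear(weight, bias, mask, prune_input=True):
    """Shrink a dense layer by dropping pruned hidden dimensions.
    weight: (out_features, in_features); bias: (out_features,).
    mask: boolean keep-mask over the pruned hidden dimension.
    Dropping columns prunes the layer's inputs; dropping rows
    prunes its outputs (and the matching bias entries)."""
    if prune_input:
        return weight[:, mask], bias          # fewer input features
    return weight[mask, :], bias[mask]        # fewer output features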
Demerits
Limited Generalizability
The reported results cover only the Gemma-2 family (2B and 9B) and seven zero-shot benchmarks, so the method's performance may not generalize to other model families or task settings.
Lack of Theoretical Guarantees
The article does not provide theoretical guarantees for DIET's performance, making it challenging to predict its behavior in different scenarios.
Expert Commentary
The article presents a significant contribution to the field of efficient model pruning, offering a novel approach to structured pruning for LLMs. The proposed method, DIET, demonstrates impressive results in zero-shot benchmarks, showcasing its potential for practical deployment. However, the lack of theoretical guarantees and limited generalizability of the method need to be addressed in future research. Nevertheless, DIET's efficiency, task-aware selection, and robustness make it a compelling choice for researchers and practitioners seeking to deploy LLMs in real-world applications.
Recommendations
- ✓ Future research should focus on extending DIET's applicability to other models and tasks, as well as providing theoretical guarantees for its performance.
- ✓ The authors should investigate the interpretability of DIET's importance scores to better understand the model's decision-making process.
Sources
Original: arXiv - cs.LG