Does Unification Come at a Cost? Uni-SafeBench: A Safety Benchmark for Unified Multimodal Large Models

Abstract (arXiv:2604.00547v1): Unified Multimodal Large Models (UMLMs) integrate understanding and generation capabilities within a single architecture. While this architectural unification, driven by the deep fusion of multimodal features, enhances model performance, it also introduces important yet underexplored safety challenges. Existing safety benchmarks predominantly focus on isolated understanding or generation tasks, failing to evaluate the holistic safety of UMLMs when handling diverse tasks under a unified framework. To address this, we introduce Uni-SafeBench, a comprehensive benchmark featuring a taxonomy of six major safety categories across seven task types. To ensure rigorous assessment, we develop Uni-Judger, a framework that effectively decouples contextual safety from intrinsic safety. Based on comprehensive evaluations across Uni-SafeBench, we uncover that while the unification process enhances model capabilities, it significantly degrades the inherent safety of the underlying LLM. Furthermore, open-source UMLMs exhibit much lower safety performance than multimodal large models specialized for either generation or understanding tasks. We open-source all resources to systematically expose these risks and foster safer AGI development.

Executive Summary

This article presents Uni-SafeBench, a comprehensive safety benchmark for Unified Multimodal Large Models (UMLMs), which integrate understanding and generation capabilities within a single architecture. The authors highlight the underexplored safety challenges of UMLMs and introduce a taxonomy of six major safety categories across seven task types. They also develop Uni-Judger, a framework that decouples contextual safety from intrinsic safety for rigorous assessment. The study finds that unification significantly degrades the inherent safety of the underlying LLM, and that open-source UMLMs perform worse on safety than models specialized for either understanding or generation alone. All resources are open-sourced to promote safer AGI development. This work is significant because it systematically exposes the safety risks of unified architectures; Uni-SafeBench and Uni-Judger are valuable contributions that make rigorous safety evaluation of UMLMs practical.

Key Points

  • Uni-SafeBench is a comprehensive safety benchmark for UMLMs, addressing the underexplored safety challenges of unification.
  • The authors introduce a taxonomy of six major safety categories across seven task types, enabling a holistic evaluation of UMLMs.
  • Uni-Judger is a framework that decouples contextual safety from intrinsic safety, facilitating rigorous assessment of UMLMs.

Merits

Comprehensive Safety Benchmark

Uni-SafeBench provides a systematic evaluation of UMLMs' safety, covering six major safety categories across seven task types.
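To make the benchmark's structure concrete, here is a minimal sketch of how a Uni-SafeBench-style sample set might be organized. The abstract states there are six safety categories and seven task types but does not enumerate them, so every name below (e.g. `VIOLENCE`, `TEXT_TO_IMAGE`) is a hypothetical placeholder, not the authors' actual taxonomy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SafetyCategory(Enum):
    """Six major safety categories. The abstract does not enumerate
    the real taxonomy; these names are hypothetical placeholders."""
    VIOLENCE = "violence"
    PRIVACY = "privacy"
    DISCRIMINATION = "discrimination"
    ILLEGAL_ACTIVITY = "illegal_activity"
    SELF_HARM = "self_harm"
    SEXUAL_CONTENT = "sexual_content"


class TaskType(Enum):
    """Seven task types spanning understanding and generation;
    again, hypothetical placeholders."""
    TEXT_TO_IMAGE = "text_to_image"
    IMAGE_EDITING = "image_editing"
    IMAGE_UNDERSTANDING = "image_understanding"
    VISUAL_QA = "visual_qa"
    INTERLEAVED_GENERATION = "interleaved_generation"
    TEXT_GENERATION = "text_generation"
    MULTI_TURN_DIALOGUE = "multi_turn_dialogue"


@dataclass
class BenchmarkSample:
    sample_id: str
    category: SafetyCategory
    task_type: TaskType
    prompt: str                       # potentially unsafe instruction
    image_path: Optional[str] = None  # visual input for understanding/editing tasks
```

The key design point such a schema captures is the cross-product structure: every safety category can be probed through every task type, which is what lets the benchmark test whether a model that refuses an unsafe text request still refuses the same request posed as, say, an image-editing task.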

Decoupling of Safety Categories

Uni-Judger enables the decoupling of contextual safety from intrinsic safety, allowing for a more nuanced understanding of UMLMs' safety.
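The abstract does not detail Uni-Judger's internals, but the decoupling idea can be sketched as a judge that scores a model's output along two separate axes: contextual safety (is the response safe given the request it answers?) and intrinsic safety (is the generated content harmful in itself?). The function below is a hypothetical illustration of that interface, not the authors' implementation; `judge_llm` stands in for whatever judge model is used.

```python
from dataclasses import dataclass


@dataclass
class SafetyVerdict:
    contextual_safe: bool  # safe with respect to the request it responds to
    intrinsic_safe: bool   # the output itself is free of harmful content


def judge_response(prompt: str, response: str, judge_llm) -> SafetyVerdict:
    """Hypothetical Uni-Judger-style evaluation: the two safety
    dimensions are judged independently, so a refusal framed around
    harmful context is not conflated with intrinsically harmful output."""
    contextual = judge_llm(
        f"Given the request:\n{prompt}\n\nIs this response safe in that "
        f"context? Answer yes or no.\n\nResponse:\n{response}"
    )
    intrinsic = judge_llm(
        f"Ignoring any surrounding context, does this content itself "
        f"contain harmful material? Answer yes or no.\n\nContent:\n{response}"
    )
    return SafetyVerdict(
        contextual_safe=contextual.strip().lower().startswith("yes"),
        intrinsic_safe=intrinsic.strip().lower().startswith("no"),
    )
```

Separating the two judgments matters because a unified model can fail in either direction: it may produce content that is harmless in isolation but dangerous given the request, or content that is harmful regardless of context.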

Demerits

Limited Model Scope

The evaluation's strongest findings concern open-source UMLMs, so the results may not be representative of all UMLMs, including proprietary systems, which limits the generalizability of the conclusions.

Need for Further Research

While Uni-SafeBench and Uni-Judger are valuable contributions, further research is needed to fully understand the safety implications of UMLMs and to develop more effective safety assessments.

Expert Commentary

The article presents a significant contribution to AI safety, highlighting the underexplored risks of UMLMs and providing both a comprehensive benchmark and an assessment framework. The finding that unification degrades the inherent safety of the underlying LLM is particularly concerning: it suggests that the capability gains from deep multimodal fusion can come at a direct safety cost. That said, the study's limitations, notably its focus on open-source UMLMs, mean continued investigation is needed before the conclusions can be generalized across the field.

Recommendations

  • Researchers should apply Uni-SafeBench and Uni-Judger to a broader range of UMLMs, including proprietary systems, to better understand the safety implications of unification (a minimal evaluation loop is sketched after this list).
  • Developers of UMLMs should integrate safety assessments such as Uni-SafeBench and Uni-Judger into their development pipelines, so that safety regressions introduced by unification are caught before release.
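As a rough illustration of what "applying" the benchmark looks like in practice, the loop below aggregates a per-category safety rate for one model. It reuses the hypothetical `BenchmarkSample` and `judge_response` sketches above; `model` is any callable mapping a prompt to a response, and none of this reflects the authors' actual harness.

```python
from collections import defaultdict


def evaluate_model(model, samples, judge_llm):
    """Compute the fraction of responses judged safe on both axes,
    broken down by safety category (hypothetical sketch)."""
    safe_counts = defaultdict(int)
    totals = defaultdict(int)
    for sample in samples:
        response = model(sample.prompt)
        verdict = judge_response(sample.prompt, response, judge_llm)
        totals[sample.category] += 1
        if verdict.contextual_safe and verdict.intrinsic_safe:
            safe_counts[sample.category] += 1
    return {cat: safe_counts[cat] / totals[cat] for cat in totals}
```

Under this framing, a lower safety rate for a UMLM than for its underlying LLM in the same category would be an instance of the degradation the paper reports.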

Sources

Original: arXiv - cs.AI (https://arxiv.org/abs/2604.00547)