Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
arXiv:2603.13292v1
Abstract: Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as jailbreaking but also to inadvertently generating harmful content for benign users. While internal safety alignment via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is a primary mitigation strategy, current methods often face a safety-utility trade-off: they either refuse benign queries out of excessive caution or overlook latent risks in cross-modal interactions. To resolve this, we introduce Pragma-VL, an end-to-end alignment algorithm that enables MLLMs to pragmatically arbitrate between safety and helpfulness. First, we enhance visual risk perception with a novel cold-start SFT stage. This is achieved by applying risk-aware clustering to the visual encoder and using an interleaved dataset of risk descriptions and high-quality data. Second, we introduce a theoretically-guaranteed reward model that leverages synergistic learning. We train it with a novel data augmentation method that assigns dynamic weights based on the queries, enabling contextual arbitration between safety and helpfulness. Extensive experiments show that Pragma-VL effectively balances safety and helpfulness, outperforming baselines by 5% to 20% on most multimodal safety benchmarks while preserving its general capabilities in areas such as mathematics and knowledge reasoning.
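The abstract does not spell out how the query-dependent weights are computed or combined, so the Python sketch below only illustrates the general idea of contextual arbitration: a single reward obtained by blending a safety score and a helpfulness score with a weight driven by an estimated query risk. The function name, the sigmoid gate, and the temperature are illustrative assumptions, not the authors' formulation.

```python
import math

def arbitrated_reward(safety_score: float,
                      helpfulness_score: float,
                      query_risk: float,
                      temperature: float = 0.1) -> float:
    """Blend safety and helpfulness rewards with a query-dependent weight.

    query_risk is assumed to be a scalar in [0, 1] estimated from the
    (image, text) query; the paper's actual weighting scheme is not
    detailed in the abstract, so a sigmoid gate is used here purely
    for illustration.
    """
    # Push the weight toward 1 for clearly risky queries and toward 0
    # for clearly benign ones, so safety dominates only when needed.
    w_safety = 1.0 / (1.0 + math.exp(-(query_risk - 0.5) / temperature))
    return w_safety * safety_score + (1.0 - w_safety) * helpfulness_score


# A benign query (low risk) keeps most of the helpfulness signal,
# while a risky query is scored almost entirely by safety.
print(arbitrated_reward(0.9, 0.2, query_risk=0.1))  # ~0.21, helpfulness-driven
print(arbitrated_reward(0.9, 0.2, query_risk=0.9))  # ~0.89, safety-driven
```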
Executive Summary
The paper introduces Pragma-VL, an end-to-end alignment algorithm for Multimodal Large Language Models (MLLMs) that balances safety and helpfulness. Pragma-VL enhances visual risk perception through a novel cold-start Supervised Fine-Tuning (SFT) stage and trains a theoretically-guaranteed reward model via synergistic learning. The algorithm outperforms baselines by 5% to 20% on most multimodal safety benchmarks while preserving general capabilities such as mathematics and knowledge reasoning. This addresses the safety-utility trade-off in MLLMs, providing a pragmatic way to arbitrate between safety and helpfulness.
Key Points
- ▸ Pragma-VL introduces a novel cold-start SFT stage, built on risk-aware clustering of visual-encoder features, for enhanced visual risk perception (see the clustering sketch after this list)
- ▸ The algorithm uses a theoretically-guaranteed reward model with synergistic learning
- ▸ Pragma-VL outperforms baselines by 5% to 20% on most multimodal safety benchmarks while preserving general capabilities
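The abstract likewise leaves the risk-aware clustering step underspecified. As a minimal sketch, assuming the visual encoder yields one embedding per image and that a subset of images carries binary risk annotations, one plausible reading is to cluster those embeddings and score each cluster's risk so that risky clusters can be emphasized when assembling the interleaved cold-start SFT data. The function below, its arguments, and the choice of k-means are hypothetical, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def score_visual_risk_clusters(image_embeddings: np.ndarray,
                               risk_labels: np.ndarray,
                               n_clusters: int = 8):
    """Cluster visual-encoder features and estimate per-cluster risk.

    image_embeddings: (N, D) features taken from the MLLM's visual encoder.
    risk_labels:      (N,) binary annotations (1 = risky image, 0 = benign),
                      used only to characterize the clusters.
    Returns the cluster assignment per image and a risk score per cluster.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    assignments = km.fit_predict(image_embeddings)

    # Fraction of annotated-risky images in each cluster; high-risk clusters
    # could be over-sampled when building the interleaved cold-start SFT set
    # of risk descriptions and high-quality general data.
    cluster_risk = np.array([
        risk_labels[assignments == c].mean() if np.any(assignments == c) else 0.0
        for c in range(n_clusters)
    ])
    return assignments, cluster_risk
```

How these cluster scores would actually feed the dataset construction or the visual encoder itself is not recoverable from the abstract.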
Merits
Effective Safety-Utility Balance
Pragma-VL balances safety and helpfulness, mitigating the trade-off in current methods between over-cautious refusal of benign queries and overlooked risks in cross-modal interactions.
Demerits
Limited Generalizability
The reported gains are demonstrated on a specific set of multimodal safety benchmarks; evaluation across more diverse domains and datasets is needed to establish broader applicability.
Expert Commentary
The introduction of Pragma-VL marks a significant advancement in MLLM safety research. By pragmatically arbitrating between safety and helpfulness, Pragma-VL addresses a critical challenge in the development of reliable and trustworthy AI systems. The algorithm's ability to balance competing objectives while preserving general capabilities is a notable achievement. However, further research is needed to fully explore the potential of Pragma-VL and its applications in various domains.
Recommendations
- ✓ Further testing and evaluation of Pragma-VL on diverse datasets and domains
- ✓ Integration of Pragma-VL with existing MLLM architectures to enhance safety and performance