Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models
arXiv:2603.17044v1 (Announce Type: new)
Abstract: Unified multimodal models share a single language model backbone for both image understanding and image generation. Can DPO align both capabilities simultaneously? …
Abinav Rao, Sujan Rachuri