Tag: cs.MM

#cs.MM

Academic · 1 min

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

arXiv:2603.08936v1 Announce Type: cross Abstract: Speech Large Language Models (LLMs) show great promise for speech emotion recognition (SER) via generative interfaces. However, shifting from closed-set …

Hezhao Zhang, Huang-Cheng Chou, Shrikanth Narayanan, Thomas Hain
19 views
Academic · 1 min

VDCook:DIY video data cook your MLLMs

arXiv:2603.05539v1 Announce Type: cross Abstract: We introduce VDCook: a self-evolving video data operating system, a configurable video data construction platform for researchers and vertical domain …

Chengwei Wu
15 views
Academic · 1 min

Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities

arXiv:2603.05542v1 Announce Type: cross Abstract: The rapid advancement of AI is transforming human-centered systems, with profound implications for human-AI interaction, human-data interaction, and visual analytics. …

Jean-Daniel Fekete, Yifan Hu, Dominik Moritz, Arnab Nandi, Senjuti Basu Roy, Eugene Wu, Nikos Bikakis, George Papastefanatos, Panos K. Chrysanthis, Guoliang Li, Lingyun Yu
27 views
Academic · 1 min

OmniGAIA: Towards Native Omni-Modal AI Agents

arXiv:2602.22897v1 Announce Type: new Abstract: Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to …

Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Shijian Wang, Guanting Dong, Jiajie Jin, Hao Wang, Yinuo Wang, Ji-Rong Wen, Yuan Lu, Zhicheng Dou
20 views
Academic · 1 min

ModalImmune: Immunity Driven Unlearning via Self Destructive Training

arXiv:2602.16197v1 Announce Type: new Abstract: Multimodal systems are vulnerable to partial or complete loss of input channels at deployment, which undermines reliability in real-world settings. …

Rong Fu, Jia Yee Tan, Wenxin Zhang, Zijian Zhang, Ziming Wang, Zhaolu Kang, Muge Qi, Shuning Zhang, Simon Fong
19 views