Tag: cs.CV

Academic · 1 min

HippoCamp: Benchmarking Contextual Agents on Personal Computers

arXiv:2604.01221v1 Announce Type: new Abstract: We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management. Unlike existing agent benchmarks that …

Zhe Yang, Shulin Tian, Kairui Hu, Shuai Liu, Hoang-Nhat Nguyen, Yichi Zhang, Zujin Guo, Mengying Yu, Zinan Zhang, Jingkang Yang, Chen Change Loy, Ziwei Liu
Academic · 1 min

Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

arXiv:2603.23521v1 Announce Type: new Abstract: Multimodal research has predominantly focused on single-image reasoning, with limited exploration of multi-image scenarios. Recent models have sought to enhance …

Shaharukh Khan, Ali Faraz, Abhinav Ravi, Mohd Nauman, Mohd Sarfraz, Akshat Patidar, Raja Kolla, Chandra Khatri, Shubham Agarwal
Academic · 1 min

DISCO: Document Intelligence Suite for COmparative Evaluation

arXiv:2603.23511v1 Announce Type: new Abstract: Document intelligence requires accurate text extraction and reliable reasoning over document content. We introduce DISCO, a Document Intelligence Suite for …

Kenza Benkirane, Dan Goldwater, Martin Asenov, Aneiss Ghodsi
Academic · 1 min

Three Creates All: You Only Sample 3 Steps

arXiv:2603.22375v1 Announce Type: new Abstract: Diffusion models deliver high-fidelity generation but remain slow at inference time due to many sequential network evaluations. We find that …

Yuren Cai, Guangyi Wang, Zongqing Li, Li Li, Zhihui Liu, Songzhi Su
Academic · 1 min

Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning

arXiv:2603.20662v1 Announce Type: new Abstract: Despite remarkable advances in large Vision-Language Models (VLMs), spatial reasoning remains a persistent challenge. In this work, we investigate how …

Xueqi Ma, Shuo Yang, Yanbei Jiang, Shu Liu, Zhenzhen Liu, Jiayang Ao, Xingjun Ma, Sarah Monazam Erfani, James Bailey