VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought
arXiv:2603.11631v1. Abstract: Large vision-language models (LVLMs) struggle to reliably detect visual primitives in charts and align them with semantic representations, which severely …
Eunsoo Lee, Jeongwoo Lee, Minki Hong, Jangho Choi, Jihie Kim