See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay
arXiv:2603.11601v1 Announce Type: new Abstract: Vision-Language Models (VLMs) excel at describing visual scenes, yet struggle to translate perception into precise, grounded actions. We investigate whether …
Ashish Baghel, Paras Chopra
11 views