RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models
arXiv:2603.21341v1 Announce Type: new Abstract: Improving embodied reasoning in multimodal-large-language models (MLLMs) is essential for building vision-language-action models (VLAs) on top of them to readily …
Dongyoung Kim, Sumin Park, Woomin Song, Seungku Kim, Taeyoung Kim, Huiwon Jang, Jinwoo Shin, Jaehyung Kim, Younggyo Seo
8 views