On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
arXiv:2602.12506v1
Abstract: Reinforcement learning (RL) fine-tuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its …