Can VLMs Reason Robustly? A Neuro-Symbolic Investigation
arXiv:2603.23867v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have been applied to a wide range of reasoning tasks, yet it remains unclear whether they can reason robustly under distribution shifts. In this paper, we study covariate shifts in which the perceptual input distribution changes while the underlying prediction rules do not. To investigate this question, we consider visual deductive reasoning tasks, where a model is required to answer a query given an image and logical rules defined over the object concepts in the image. Empirically, we find that VLMs fine-tuned through gradient-based end-to-end training can achieve high in-distribution accuracy but fail to generalize under such shifts, suggesting that fine-tuning does not reliably induce the underlying reasoning function. This motivates a neuro-symbolic perspective that decouples perception from reasoning. However, we further observe that recent neuro-symbolic approaches that rely on black-box components for reasoning can still exhibit inconsistent robustness across tasks. To address this issue, we propose VLC, a neuro-symbolic method that combines VLM-based concept recognition with circuit-based symbolic reasoning. In particular, task rules are compiled into a symbolic program, specifically a circuit, which executes the rules exactly over the object concepts recognized by the VLM. Experiments on three visual deductive reasoning tasks with distinct rule sets show that VLC consistently achieves strong performance under covariate shifts, highlighting its ability to support robust reasoning.
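The decoupling the abstract describes can be illustrated with a minimal sketch: a perception component maps an image to symbolic object concepts, and a fixed Boolean "circuit" compiled from the task rules evaluates the query exactly over those concepts. All names here (`recognize_concepts`, the example rule, the stubbed concept dictionary) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the neuro-symbolic split described in the abstract:
# perception (VLM) and reasoning (symbolic circuit) are separate stages.
# The concept vocabulary and rule are invented for illustration.

def recognize_concepts(image):
    """Stand-in for VLM-based concept recognition.

    A real system would query a fine-tuned VLM; here we return a fixed
    dictionary of object concepts so the sketch is runnable.
    """
    return {"circle": True, "red": True, "large": False}

def rule_circuit(concepts):
    """Task rules compiled into a fixed Boolean circuit.

    The circuit executes the rules exactly over the recognized concepts;
    no learned parameters are involved in this stage, which is what makes
    the reasoning step invariant under covariate shift. Example rule:
    the answer is "yes" iff the object is a red circle.
    """
    return concepts["circle"] and concepts["red"]

def answer_query(image):
    # Perception and reasoning composed: only perception sees the image.
    return "yes" if rule_circuit(recognize_concepts(image)) else "no"

print(answer_query(None))  # -> yes (for the stubbed concepts above)
```

Because only `recognize_concepts` depends on the input distribution, a shift in image style leaves the reasoning stage untouched; this is the intuition behind the robustness claims, though the paper's actual circuit compilation is more general than this two-concept conjunction.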
Executive Summary
This article investigates whether Vision-Language Models (VLMs) can reason robustly on visual deductive reasoning tasks. The authors find that VLMs fine-tuned through gradient-based end-to-end training achieve high in-distribution accuracy but fail to generalize under covariate shifts, where the perceptual input distribution changes while the prediction rules stay fixed. To address this, they propose VLC, a neuro-symbolic method that combines VLM-based concept recognition with circuit-based symbolic reasoning: task rules are compiled into a circuit that executes them exactly over the concepts the VLM recognizes. Experiments on three tasks show that VLC consistently performs well under covariate shifts. The study offers useful evidence on the limits of end-to-end fine-tuning and the benefits of decoupling perception from reasoning.
Key Points
- ▸ VLMs fine-tuned through gradient-based end-to-end training achieve high in-distribution accuracy but fail to generalize under distribution shifts.
- ▸ VLC, a neuro-symbolic method, combines VLM-based concept recognition with circuit-based symbolic reasoning and achieves strong performance under covariate shifts.
- ▸ The study highlights the importance of decoupling perception from reasoning in VLMs and the potential benefits of neuro-symbolic approaches.
Merits
Strength in Addressing Limitations of VLMs
The study provides a comprehensive analysis of the limitations of VLMs in robust reasoning and proposes a novel neuro-symbolic approach to address these limitations.
Methodological Rigor
The authors employ a rigorous methodology, including experiments on three visual deductive reasoning tasks, to evaluate the performance of VLC and VLMs.
Demerits
Limited Generalizability
The study focuses on visual deductive reasoning tasks and may not generalize to other domains or tasks.
Technical Complexity
The neuro-symbolic approach proposed in the study may be technically complex and challenging to implement in practice.
Expert Commentary
The study offers a timely analysis of why end-to-end fine-tuned VLMs fail to reason robustly under covariate shift, and proposes a neuro-symbolic remedy that decouples perception from reasoning. Evaluating on three visual deductive reasoning tasks with distinct rule sets adds credibility, and the circuit-based reasoning component has clear potential for AI systems that must reason reliably. That said, the focus on visual deductive reasoning limits how far the findings generalize to other domains, and compiling task rules into symbolic circuits adds engineering complexity that may be challenging to adopt in practice.
Recommendations
- ✓ Future studies should aim to generalize the findings of this study to other domains or tasks to further validate the potential benefits of neuro-symbolic approaches.
- ✓ Developers should consider incorporating neuro-symbolic approaches into AI systems that require robust reasoning capabilities to improve their performance and reliability.
Sources
Original: arXiv - cs.LG