Cross-Domain Demo-to-Code via Neurosymbolic Counterfactual Reasoning

Jooyoung Kim, Wonje Choi, Younguk Song, Honguk Woo

arXiv:2603.18495v1 Abstract: Recent advances in Vision-Language Models (VLMs) have enabled video-instructed robotic programming, allowing agents to interpret video demonstrations and generate executable control code. We formulate video-instructed robotic programming as a cross-domain adaptation problem, where perceptual and physical differences between demonstration and deployment induce procedural mismatches. However, current VLMs lack the procedural understanding needed to reformulate causal dependencies and achieve task-compatible behavior under such domain shifts. We introduce NeSyCR, a neurosymbolic counterfactual reasoning framework that enables verifiable adaptation of task procedures, providing a reliable synthesis of code policies. NeSyCR abstracts video demonstrations into symbolic trajectories that capture the underlying task procedure. Given deployment observations, it derives counterfactual states that reveal cross-domain incompatibilities. By exploring the symbolic state space with verifiable checks, NeSyCR proposes procedural revisions that restore compatibility with the demonstrated procedure. NeSyCR achieves a 31.14% improvement in task success over the strongest baseline Statler, showing robust cross-domain adaptation across both simulated and real-world manipulation tasks.

Executive Summary

This article introduces NeSyCR, a neurosymbolic counterfactual reasoning framework that enables verifiable adaptation of task procedures in video-instructed robotic programming. By abstracting video demonstrations into symbolic trajectories and deriving counterfactual states from deployment observations, NeSyCR proposes procedural revisions to restore compatibility with the demonstrated procedure. The framework achieves a 31.14% improvement in task success over the strongest baseline, Statler, and demonstrates robust cross-domain adaptation across simulated and real-world manipulation tasks. The authors' innovative approach addresses a critical limitation of current Vision-Language Models (VLMs) in reformulating causal dependencies and achieving task-compatible behavior under domain shifts.

Key Points

  • NeSyCR is a neurosymbolic counterfactual reasoning framework that enables verifiable adaptation of task procedures.
  • The framework abstracts video demonstrations into symbolic trajectories and derives counterfactual states from deployment observations.
  • NeSyCR achieves a 31.14% improvement in task success over the strongest baseline, Statler.
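To make the symbolic-trajectory idea above concrete, here is a minimal, hypothetical sketch of how a demonstrated procedure might be checked against a deployment's symbolic state. All names, predicates, and data structures below are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch: checking a demonstrated symbolic trajectory against a
# deployment state to surface cross-domain incompatibilities (NeSyCR-style).
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    """One abstracted step of a demonstration trajectory."""
    action: str
    preconditions: frozenset  # predicates that must hold before the action
    effects: frozenset        # predicates made true by the action

def find_incompatibilities(trajectory, deployment_state):
    """Simulate the demonstrated procedure from the deployment's initial
    symbolic state; report every step whose preconditions fail. These gaps
    mark where a procedural revision would be needed."""
    state = set(deployment_state)
    gaps = []
    for i, step in enumerate(trajectory):
        missing = step.preconditions - state
        if missing:
            gaps.append((i, step.action, missing))
        state |= step.effects  # continue optimistically to collect all gaps
    return gaps

# Demonstration: pick a mug from a table, then place it on a shelf.
demo = [
    Step("pick(mug)",
         frozenset({"on(mug, table)", "gripper_empty"}),
         frozenset({"holding(mug)"})),
    Step("place(mug, shelf)",
         frozenset({"holding(mug)", "clear(shelf)"}),
         frozenset({"on(mug, shelf)"})),
]

# Deployment differs: the mug starts inside a drawer and the shelf is occupied.
deployment = {"in(mug, drawer)", "gripper_empty"}

for idx, action, missing in find_incompatibilities(demo, deployment):
    print(f"step {idx}: {action} blocked, missing {sorted(missing)}")
```

In this toy run, both steps are flagged (the mug is not on the table, and the shelf is not clear), which is exactly the kind of counterfactual mismatch the framework would then repair by inserting or rewriting steps.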

Merits

Strength

The framework's ability to address cross-domain adaptation in video-instructed robotic programming is a significant strength, as it enables verifiable adaptation of task procedures and achieves robust results across both simulated and real-world manipulation tasks.

Demerits

Limitation

The framework's reliance on symbolic representations of video demonstrations may limit its applicability to tasks with complex, dynamic, or uncertain environments.

Expert Commentary

The article makes a significant contribution to the field of robotics by introducing a novel framework for cross-domain adaptation in video-instructed robotic programming. The authors' innovative approach to abstracting video demonstrations into symbolic trajectories and deriving counterfactual states from deployment observations is a key strength of the framework. However, the reliance on symbolic representations of video demonstrations may limit the framework's applicability to tasks with complex, dynamic, or uncertain environments. Nevertheless, the results demonstrate the potential of NeSyCR to achieve robust cross-domain adaptation across both simulated and real-world manipulation tasks.

Recommendations

  • Future research should focus on extending the framework to handle tasks with complex, dynamic, or uncertain environments.
  • The authors should investigate applying NeSyCR to other areas of robotics, such as mobile manipulation and navigation, to further probe its potential and limitations.