
KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph

Ye Tian, Jingyi Zhang, Zihao Wang, Xiaoyuan Ren, Xiaofan Yu, Onat Gungor, Tajana Rosing · March 24, 2026


arXiv:2603.21029v1

Abstract: Autonomous driving requires reliable reasoning over fine-grained 3D scene facts. Fine-grained question answering over multi-modal driving observations provides a natural way to evaluate this capability, yet existing perception pipelines and driving-oriented large language model (LLM) methods still suffer from unreliable scene facts, hallucinations, opaque reasoning, and heavy reliance on task-specific training. We present KLDrive, the first knowledge-graph-augmented LLM reasoning framework for fine-grained question answering in autonomous driving. KLDrive addresses this problem through designing two tightly coupled components: an energy-based scene fact construction module that consolidates multi-source evidence into a reliable scene knowledge graph, and an LLM agent that performs fact-grounded reasoning over a constrained action space under explicit structural constraints. By combining structured prompting with few-shot in-context exemplars, the framework adapts to diverse reasoning tasks without heavy task-specific fine-tuning. Experiments on two large-scale autonomous-driving QA benchmarks show that KLDrive outperforms prior state-of-the-art methods, achieving the best overall accuracy of 65.04% on NuScenes-QA and the best SPICE score of 42.45 on GVQA. On counting, the most challenging factual reasoning task, it improves over the strongest baseline by 46.01 percentage points, demonstrating substantially reduced hallucinations and the benefit of coupling reliable scene fact construction with explicit reasoning.

Executive Summary

This article presents KLDrive, a knowledge-graph-augmented large language model (LLM) framework for fine-grained question answering over 3D driving scenes. KLDrive targets the weaknesses of existing perception pipelines and driving-oriented LLM methods (unreliable scene facts, hallucinations, opaque reasoning, and heavy task-specific training) with two tightly coupled components: an energy-based scene fact construction module that consolidates multi-source evidence into a scene knowledge graph, and an LLM agent that reasons over those facts within a constrained action space. On two large-scale autonomous-driving QA benchmarks it reports the best overall accuracy on NuScenes-QA (65.04%) and the best SPICE score on GVQA (42.45), and on counting it improves over the strongest baseline by 46.01 percentage points. Structured prompting with few-shot in-context exemplars lets the framework adapt to diverse reasoning tasks without heavy task-specific fine-tuning. The results underline the benefit of coupling reliable scene fact construction with explicit reasoning.

Key Points

▸ KLDrive is a knowledge-graph-augmented LLM framework for fine-grained 3D scene reasoning in autonomous driving.
▸ The framework consists of two tightly coupled components: an energy-based scene fact construction module and an LLM agent.
▸ KLDrive outperforms prior state-of-the-art methods on two large-scale autonomous-driving QA benchmarks, with 65.04% overall accuracy on NuScenes-QA and a SPICE score of 42.45 on GVQA.
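
To make the second component more concrete, here is a minimal sketch of fact-grounded reasoning over a constrained action space. It is not KLDrive's implementation: the triple format, the predicate names (`is_a`, `status`, `in_front_of`), the two actions, and the `run_plan` helper are all assumptions made for illustration. The point is only that the agent may answer a counting question by composing whitelisted operations over stored facts, rather than by free-form generation.

```python
# Hypothetical scene knowledge graph stored as (subject, predicate, object) triples.
# Object ids and predicate names are illustrative, not taken from the paper.
GRAPH = {
    ("o1", "is_a", "car"), ("o1", "status", "moving"), ("o1", "in_front_of", "ego"),
    ("o2", "is_a", "car"), ("o2", "status", "parked"), ("o2", "in_front_of", "ego"),
    ("o3", "is_a", "car"), ("o3", "status", "moving"), ("o3", "behind", "ego"),
    ("o4", "is_a", "pedestrian"), ("o4", "status", "moving"), ("o4", "in_front_of", "ego"),
}


def select(graph, candidates, predicate, value):
    """Keep only candidates for which the fact (candidate, predicate, value) exists."""
    return {c for c in candidates if (c, predicate, value) in graph}


def count(graph, candidates):
    """Return how many candidates are left."""
    return len(candidates)


# The constrained action space: the agent may only use these names.
ACTIONS = {"select": select, "count": count}


def run_plan(graph, plan):
    """Execute a list of (action, *args) steps, rejecting any action not whitelisted."""
    state = {subject for subject, _, _ in graph}  # start from every described object
    for name, *args in plan:
        if name not in ACTIONS:
            raise ValueError(f"action not allowed: {name}")
        state = ACTIONS[name](graph, state, *args)
    return state


# "How many moving cars are in front of the ego vehicle?"
PLAN = [
    ("select", "is_a", "car"),
    ("select", "status", "moving"),
    ("select", "in_front_of", "ego"),
    ("count",),
]
ANSWER = run_plan(GRAPH, PLAN)
```

Because every step reads from the graph, the answer can be traced back to specific facts, and a step outside the whitelist fails loudly instead of producing an ungrounded guess.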

Merits

Strength in Addressing Limitations

KLDrive targets the main weaknesses of existing perception pipelines and driving-oriented LLM methods: unreliable scene facts, hallucinations, opaque reasoning, and heavy reliance on task-specific training.

Improved Adaptability

The framework adapts to diverse reasoning tasks through structured prompting and few-shot in-context exemplars rather than heavy task-specific fine-tuning.
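
As a rough illustration of what structured prompting with few-shot exemplars can look like, the sketch below assembles a prompt from scene facts, a list of allowed actions, and two worked examples. Everything in it is hypothetical: the prompt wording, the action names, the exemplars, and the `build_prompt` helper are assumptions for illustration and are not taken from the paper. No model is called.

```python
import json

# Hypothetical action names the model is allowed to emit.
ALLOWED_ACTIONS = ["select", "count", "exists"]

# Hypothetical few-shot exemplars: a question paired with a structured plan.
EXEMPLARS = [
    {
        "question": "How many parked cars are there?",
        "plan": [["select", "is_a", "car"], ["select", "status", "parked"], ["count"]],
    },
    {
        "question": "Is there a pedestrian in front of the ego vehicle?",
        "plan": [["select", "is_a", "pedestrian"], ["select", "in_front_of", "ego"], ["exists"]],
    },
]


def build_prompt(question, facts, exemplars=EXEMPLARS, actions=ALLOWED_ACTIONS):
    """Assemble instructions, scene facts, exemplars, and the new question into one prompt."""
    lines = [
        "You answer driving-scene questions using ONLY the facts listed below.",
        "Respond with a JSON list of actions. Allowed actions: " + ", ".join(actions) + ".",
        "",
        "Scene facts:",
    ]
    # Sort the facts so the prompt is deterministic for a given scene.
    lines += [f"- {s} {p} {o}" for s, p, o in sorted(facts)]
    lines.append("")
    for ex in exemplars:
        lines.append("Q: " + ex["question"])
        lines.append("Plan: " + json.dumps(ex["plan"]))
        lines.append("")
    lines.append("Q: " + question)
    lines.append("Plan:")
    return "\n".join(lines)


FACTS = {("o1", "is_a", "car"), ("o1", "status", "moving")}
PROMPT = build_prompt("How many moving cars are there?", FACTS)
```

Switching to a new task type in this setup means swapping the exemplars, not retraining the model, which is the kind of adaptability the abstract describes.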

State-of-the-Art Performance

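KLDrive achieves the best overall accuracy on NuScenes-QA (65.04%) and the best SPICE score on GVQA (42.45). On counting, which the abstract calls the most challenging factual reasoning task, it improves over the strongest baseline by 46.01 percentage points, a gain the authors attribute to substantially reduced hallucinations.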

Demerits

Limited Evaluation Datasets

The evaluation is limited to two QA benchmarks, NuScenes-QA and GVQA, which may not fully capture the complexities of real-world driving scenarios.

Lack of Explanation for Scene Fact Construction

The abstract gives little detail on the energy-based scene fact construction module, so it is difficult to judge how multi-source evidence is weighed and consolidated into the knowledge graph.
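
For orientation only, here is one generic way an energy-based consolidation step could work. This is not the paper's formulation, which the abstract does not give. The sketch assumes that each source (a hypothetical `camera` and `lidar` detector) outputs a probability per label, that the energy of a label is the weighted sum of negative log-probabilities, and that a fact is admitted only if its energy falls below a threshold. The names `energy`, `consolidate`, `weights`, and `max_energy` are all illustrative.

```python
import math


def energy(label, evidence, weights, eps=1e-6):
    """Weighted sum of negative log-probabilities; lower means the sources agree more.

    evidence maps source name -> {label: probability}. A label a source never
    proposed gets probability eps, which makes its energy very high.
    """
    total = 0.0
    for source, dist in evidence.items():
        p = max(dist.get(label, 0.0), eps)
        total += -weights[source] * math.log(p)
    return total


def consolidate(evidence, weights, max_energy):
    """Pick the lowest-energy label; return (None, energy) if even that one is too uncertain."""
    labels = {label for dist in evidence.values() for label in dist}
    best = min(labels, key=lambda label: energy(label, evidence, weights))
    e = energy(best, evidence, weights)
    return (best, e) if e <= max_energy else (None, e)


WEIGHTS = {"camera": 1.0, "lidar": 1.0}  # assumed equal trust in both sources

# Sources agree: the fact is admitted to the graph.
AGREE = {"camera": {"car": 0.7, "truck": 0.3}, "lidar": {"car": 0.9, "truck": 0.1}}
LABEL_OK, ENERGY_OK = consolidate(AGREE, WEIGHTS, max_energy=1.0)

# Sources conflict: every label has high energy, so no fact is admitted.
CONFLICT = {"camera": {"car": 0.5, "truck": 0.5}, "lidar": {"bus": 0.9}}
LABEL_BAD, ENERGY_BAD = consolidate(CONFLICT, WEIGHTS, max_energy=1.0)
```

Under these assumptions, a fact enters the graph only when independent sources support it, which is one plausible reading of "consolidates multi-source evidence into a reliable scene knowledge graph".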

Expert Commentary

KLDrive is a promising step for autonomous driving, but its limitations and range of application warrant further investigation. Its focus on fine-grained 3D scene reasoning highlights the value of structured, fact-grounded reasoning in driving systems. The thin description of the scene fact construction module leaves open how the method works internally and how well it generalizes to real-world scenarios. Even so, the reported benchmark gains suggest that KLDrive could improve the accuracy and reliability of scene question answering for autonomous driving, making it a valuable contribution to the field.

Recommendations

✓ Future research should focus on exploring the application of KLDrive in real-world driving scenarios and evaluating its performance in diverse environments.
✓ The development of more comprehensive evaluation datasets that capture the complexities of real-world driving scenarios is essential for the advancement of scene understanding and autonomous driving.

Sources

Original: arXiv - cs.AI


