KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph
arXiv:2603.21029v1. Abstract: Autonomous driving requires reliable reasoning over fine-grained 3D scene facts. Fine-grained question answering over multi-modal driving observations provides a natural way to evaluate this capability, yet existing perception pipelines and driving-oriented large language model (LLM) methods still suffer from unreliable scene facts, hallucinations, opaque reasoning, and heavy reliance on task-specific training. We present KLDrive, the first knowledge-graph-augmented LLM reasoning framework for fine-grained question answering in autonomous driving. KLDrive addresses this problem through designing two tightly coupled components: an energy-based scene fact construction module that consolidates multi-source evidence into a reliable scene knowledge graph, and an LLM agent that performs fact-grounded reasoning over a constrained action space under explicit structural constraints. By combining structured prompting with few-shot in-context exemplars, the framework adapts to diverse reasoning tasks without heavy task-specific fine-tuning. Experiments on two large-scale autonomous-driving QA benchmarks show that KLDrive outperforms prior state-of-the-art methods, achieving the best overall accuracy of 65.04% on NuScenes-QA and the best SPICE score of 42.45 on GVQA. On counting, the most challenging factual reasoning task, it improves over the strongest baseline by 46.01 percentage points, demonstrating substantially reduced hallucinations and the benefit of coupling reliable scene fact construction with explicit reasoning.
Executive Summary
This article presents KLDrive, a knowledge-graph-augmented large language model (LLM) framework for fine-grained 3D scene reasoning in autonomous driving. KLDrive targets the weaknesses of existing perception pipelines and driving-oriented LLM methods with two tightly coupled components: an energy-based scene fact construction module that consolidates multi-source evidence into a scene knowledge graph, and an LLM agent that reasons over those facts within a constrained action space. On two large-scale autonomous-driving QA benchmarks, it reports the best overall accuracy on NuScenes-QA (65.04%) and the best SPICE score on GVQA (42.45), and it improves on the strongest baseline by 46.01 percentage points on counting, which the authors read as evidence of reduced hallucination. Because it relies on structured prompting with few-shot in-context exemplars, the framework adapts to diverse reasoning tasks without heavy task-specific fine-tuning. Overall, the results underline the value of coupling reliable scene fact construction with explicit reasoning.
Key Points
- ▸ KLDrive is a knowledge-graph-augmented LLM framework for fine-grained 3D scene reasoning in autonomous driving.
- ▸ The framework consists of two tightly coupled components: an energy-based scene fact construction module and an LLM agent.
- ▸ KLDrive outperforms prior state-of-the-art methods on two large-scale autonomous-driving QA benchmarks, reaching 65.04% overall accuracy on NuScenes-QA and a SPICE score of 42.45 on GVQA.
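To make the idea of fact-grounded reasoning over a constrained action space more concrete, here is a minimal sketch. It is not KLDrive's implementation: the object schema, the action names (`filter_category`, `filter_status`, `count`), and the sample scene are all hypothetical, chosen only to show how an agent restricted to a fixed set of graph operations derives a count from stored facts rather than generating a number freely.

```python
from dataclasses import dataclass


# Hypothetical scene knowledge graph: one record per detected object.
@dataclass(frozen=True)
class SceneObject:
    obj_id: str
    category: str
    status: str


# Illustrative facts only; not taken from any benchmark.
SCENE = [
    SceneObject("o1", "car", "moving"),
    SceneObject("o2", "car", "parked"),
    SceneObject("o3", "pedestrian", "moving"),
    SceneObject("o4", "car", "moving"),
]


def filter_category(objs, value):
    """Keep only objects whose category matches."""
    return [o for o in objs if o.category == value]


def filter_status(objs, value):
    """Keep only objects whose status matches."""
    return [o for o in objs if o.status == value]


def count(objs, _value=None):
    """Terminal action: return how many objects remain."""
    return len(objs)


# The constrained action space: the agent may only emit these names.
ACTIONS = {
    "filter_category": filter_category,
    "filter_status": filter_status,
    "count": count,
}


def run_plan(plan, scene):
    """Execute a list of (action, argument) steps against the scene facts."""
    state = list(scene)
    for name, arg in plan:
        if name not in ACTIONS:
            raise ValueError(f"action {name!r} is outside the allowed action space")
        state = ACTIONS[name](state, arg)
    return state


# A plan an LLM agent might produce for "How many moving cars are there?"
plan = [("filter_category", "car"), ("filter_status", "moving"), ("count", None)]
answer = run_plan(plan, SCENE)
print(answer)
```

In this toy setup the answer is computed from the graph, so any error can be traced to either a wrong fact or a wrong plan step, and a plan that uses an action outside the allowed set is rejected instead of being executed.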
Merits
Strength in Addressing Limitations
KLDrive targets four weaknesses that the abstract attributes to existing perception pipelines and driving-oriented LLM methods: unreliable scene facts, hallucinations, opaque reasoning, and heavy reliance on task-specific training.
Improved Adaptability
The framework adapts to diverse reasoning tasks through structured prompting with few-shot in-context exemplars, without heavy task-specific fine-tuning.
State-of-the-Art Performance
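KLDrive reports the best overall accuracy on NuScenes-QA (65.04%) and the best SPICE score on GVQA (42.45). On counting, it improves over the strongest baseline by 46.01 percentage points, which the authors attribute to substantially reduced hallucinations.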
Demerits
Limited Evaluation Datasets
The evaluation is limited to two QA benchmarks, NuScenes-QA and GVQA, which may not fully capture the complexities of real-world driving scenarios.
Lack of Explanation for Scene Fact Construction
The abstract does not explain how the energy-based scene fact construction module scores or consolidates evidence, which makes it difficult to assess how the scene knowledge graph is built.
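For orientation, a generic energy-based consolidation step might look like the sketch below. This is an assumption-laden illustration, not the paper's module: the energy function (negative log-confidence for each agreeing source plus a fixed penalty for each disagreeing source), the `PENALTY` value, the source names, and the sample observations are all hypothetical. It shows only the general pattern of scoring each candidate fact against all sources and keeping the lowest-energy one.

```python
import math

# Hypothetical cost added for every source that disagrees with a candidate.
PENALTY = 2.0


def energy(label, observations, penalty=PENALTY):
    """Lower energy means the candidate label is better supported.

    Each observation is (source, observed_label, confidence), with
    confidence in (0, 1]. Agreeing sources contribute -log(confidence);
    disagreeing sources contribute a fixed penalty.
    """
    total = 0.0
    for _source, observed_label, confidence in observations:
        if observed_label == label:
            total += -math.log(confidence)
        else:
            total += penalty
    return total


def consolidate(evidence):
    """Pick the lowest-energy label for each object id."""
    facts = {}
    for obj_id, observations in evidence.items():
        candidates = sorted({lbl for _, lbl, _ in observations})
        facts[obj_id] = min(candidates, key=lambda lbl: energy(lbl, observations))
    return facts


# Illustrative multi-source evidence for one object; values are made up.
evidence = {
    "o1": [
        ("camera", "car", 0.9),
        ("lidar", "car", 0.8),
        ("radar", "truck", 0.6),
    ],
}
facts = consolidate(evidence)
print(facts)
```

Here two confident sources agreeing on "car" yield a lower energy than the single source reporting "truck", so "car" is the fact written to the graph. A real system would likely use richer terms (geometry, temporal consistency), which this sketch does not attempt to model.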
Expert Commentary
KLDrive is a promising step for scene reasoning in autonomous driving, but its limitations and practical applicability warrant further investigation. Its focus on fine-grained 3D scene reasoning highlights the value of structured, fact-grounded reasoning for scene understanding. The limited explanation of the scene fact construction module leaves open questions about how facts are consolidated and how well the approach generalizes to real-world scenarios. Even so, the reported gains suggest that KLDrive could improve the accuracy and reliability of scene question answering in autonomous driving systems, making it a valuable contribution to the field.
Recommendations
- ✓ Future research should apply KLDrive to real-world driving scenarios and evaluate its performance in diverse environments.
- ✓ More comprehensive evaluation datasets that capture the complexities of real-world driving are needed to advance scene understanding and autonomous driving.
Sources
Original: arXiv - cs.AI