
Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework


Sanup S. Araballi, Simon Khan, Chilukuri K. Mohan

arXiv:2603.13257v1 Announce Type: new. Abstract: Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ over-simplified surrogates failing to capture continuous dynamics (decision trees). This work proposes a Hierarchical Takagi-Sugeno-Kang (TSK) Fuzzy Classifier System (FCS) distilling neural policies into human-readable IF-THEN rules through K-Means clustering for state partitioning and Ridge Regression for local action inference. Three quantifiable metrics are introduced: Fuzzy Rule Activation Density (FRAD) measuring explanation focus, Fuzzy Set Coverage (FSC) validating vocabulary completeness, and Action Space Granularity (ASG) assessing control mode diversity. Dynamic Time Warping (DTW) validates temporal behavioral fidelity. Empirical evaluation on Lunar Lander (Continuous) shows the Triangular membership function variant achieves 81.48% ± 0.43% fidelity, outperforming Decision Trees by 21 percentage points. The framework exhibits statistically superior interpretability (FRAD = 0.814 vs. 0.723 for Gaussian, p < 0.001) with low MSE (0.0053) and DTW distance (1.05). Extracted rules such as "IF lander drifting left at high altitude THEN apply upward thrust with rightward correction" enable human verification, establishing a pathway toward trustworthy autonomous systems.

Executive Summary

This article presents a novel framework for making Deep Reinforcement Learning (DRL) interpretable by distilling neural policies into human-readable Takagi-Sugeno-Kang (TSK) fuzzy rules via K-Means clustering and Ridge Regression. The framework introduces three metrics (FRAD, FSC, and ASG) to quantify interpretability and control diversity, validated through DTW on Lunar Lander (Continuous). Empirical results demonstrate superior fidelity (81.48%) and interpretability compared to Decision Trees, with concrete, verifiable rules enabling human oversight. The work addresses a critical gap in safety-critical AI by offering both performance and transparency.
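The two-stage distillation the summary describes can be sketched in a few lines: cluster the visited states with K-Means to form rule antecedents, then fit a Ridge Regression consequent per cluster. This is a minimal illustration only, using synthetic data in place of real DRL rollouts; the dimensions, cluster count, and `surrogate_action` helper are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Synthetic stand-in for (state, action) pairs logged from a DRL policy rollout.
states = rng.normal(size=(500, 8))          # e.g. an 8-D Lunar Lander observation
actions = states @ rng.normal(size=(8, 2))  # e.g. a 2-D continuous action

# Stage 1: partition the state space into rule antecedents via K-Means.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(states)

# Stage 2: fit a local linear consequent (Ridge) for each cluster's data.
consequents = []
for k in range(5):
    mask = kmeans.labels_ == k
    consequents.append(Ridge(alpha=1.0).fit(states[mask], actions[mask]))

def surrogate_action(x):
    """Route a state to its cluster and apply that cluster's linear model."""
    k = kmeans.predict(x.reshape(1, -1))[0]
    return consequents[k].predict(x.reshape(1, -1))[0]

print(surrogate_action(states[0]).shape)  # (2,)
```

A full TSK system would blend the local models by fuzzy membership rather than hard-assigning each state to one cluster, but the crisp version above shows the basic partition-then-regress structure.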

Key Points

  • Introduction of TSK fuzzy classifier system for DRL interpretability
  • Use of K-Means and Ridge Regression for state partitioning and action inference
  • Introduction of novel metrics (FRAD, FSC, ASG) and validation via DTW
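To make the TSK mechanics behind these points concrete, here is a minimal one-dimensional inference sketch: triangular membership functions fire each rule to a degree, and the output is the firing-strength-weighted average of the rules' linear consequents. The specific breakpoints, labels, and coefficients are illustrative assumptions, not values from the paper.

```python
import numpy as np

def tri_mf(x, a, b, c):
    """Triangular membership: peaks at b, zero outside [a, c]."""
    return np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b)))

# Two illustrative rules over a normalized "altitude" input,
# each with a first-order linear consequent y = p*x + q.
rules = [
    {"mf": (0.0, 0.25, 0.5), "p": -1.0, "q": 0.8},  # "low altitude"
    {"mf": (0.3, 0.7, 1.0),  "p": 0.5,  "q": 0.1},  # "high altitude"
]

def tsk_infer(x):
    # Firing strengths, then the normalized weighted sum of consequents.
    w = np.array([tri_mf(x, *r["mf"]) for r in rules])
    y = np.array([r["p"] * x + r["q"] for r in rules])
    return float((w @ y) / w.sum()) if w.sum() > 0 else 0.0

print(tsk_infer(0.4))
```

Because the consequents are linear and the memberships are human-nameable ("low altitude", "high altitude"), each prediction decomposes into readable IF-THEN contributions, which is the interpretability property the paper's metrics quantify.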

Merits

Interpretability Advance

The framework successfully translates opaque DRL into human-readable rules using established fuzzy logic constructs, offering a tangible bridge between black-box performance and regulatory compliance.

Empirical Validation

Quantitative results (81.48% fidelity, a 21-percentage-point improvement over Decision Trees) and statistical significance (p < 0.001) substantiate the claims, enhancing credibility.
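The DTW distance used to validate temporal fidelity can be sketched with the classic dynamic-programming recurrence: align two action traces by allowing local time stretching, accumulating the cheapest pointwise cost. A simple sketch, assuming 1-D traces and absolute-difference cost (the paper's exact cost function and implementation are not specified here):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW between two 1-D trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Identical trajectories have zero DTW distance; a slightly lagged copy stays small.
t = np.linspace(0, 2 * np.pi, 50)
teacher = np.sin(t)        # e.g. the DRL policy's action trace
student = np.sin(t - 0.1)  # e.g. the surrogate's slightly lagged trace
print(dtw_distance(teacher, teacher))  # 0.0
print(dtw_distance(teacher, student))
```

A low DTW distance between the neural policy's and the surrogate's action sequences (the paper reports 1.05) indicates the fuzzy rules reproduce not just individual actions but the temporal shape of the behavior.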

Demerits

Generalizability Concern

Evaluation is limited to a single continuous control environment (Lunar Lander); applicability to other domains—e.g., robotics, autonomous vehicles—remains unproven.

Expert Commentary

This work represents a significant methodological step in explainable AI. By leveraging TSK fuzzy systems, a well-established framework in control theory, the authors bridge the divide between neural networks and symbolic reasoning. The use of K-Means for state clustering is particularly apt: it aligns with human heuristics for spatial partitioning, enhancing interpretability without sacrificing fidelity. The introduction of FRAD as a measure of explanation density is a genuine contribution; it shifts evaluation from the qualitative "is it explainable?" to the quantitative "how much is explained?" The empirical comparison against Decision Trees is more than a benchmark, demonstrating that fuzzy surrogates can outperform traditional surrogate models in both accuracy and cognitive accessibility. The single-domain validation limits generalizability, but this is a forgivable limitation for a first presentation of the approach. The authors have opened a door to a class of hybrid AI systems that are performant, transparent, and verifiable. Future work should extend the framework to multi-agent systems and adversarial environments, where fuzzy logic's robustness may yield additional advantages.

Recommendations

  1. Extend evaluation to diverse domains, including autonomous navigation and medical diagnostics.
  2. Integrate the framework into open-source DRL libraries (e.g., Stable Baselines) to promote adoption.
  3. Develop automated rule validation tools using formal verification methods to strengthen safety guarantees.
