
When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making


Abhinaba Basu, Pavan Chakraborty

arXiv:2603.18530v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for high-stakes decisions, yet their susceptibility to spurious features remains poorly characterized. We introduce ICE-Guard, a framework applying intervention consistency testing to detect three types of spurious feature reliance: demographic (name/race swaps), authority (credential/prestige swaps), and framing (positive/negative restatements). Across 3,000 vignettes spanning 10 high-stakes domains, we evaluate 11 LLMs from 8 families and find that (1) authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%), challenging the field's narrow focus on demographics; (2) bias concentrates in specific domains -- finance shows 22.6% authority bias while criminal justice shows only 2.8%; (3) structured decomposition, where the LLM extracts features and a deterministic rubric decides, reduces flip rates by up to 100% (median 49% across 9 models). We demonstrate an ICE-guided detect-diagnose-mitigate-verify loop achieving cumulative 78% bias reduction via iterative prompt patching. Validation against real COMPAS recidivism data shows COMPAS-derived flip rates exceed pooled synthetic rates, suggesting our benchmark provides a conservative estimate of real-world bias. Code and data are publicly available.
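The core idea of intervention consistency testing can be illustrated with a small sketch: swap a spurious feature (such as a name) in otherwise identical vignettes and measure how often the verdict flips. The `get_verdict` and `biased_verdict` functions below are hypothetical stand-ins, not the paper's code; in ICE-Guard the verdict would come from an LLM call.

```python
def apply_intervention(vignette: str, original: str, replacement: str) -> str:
    """Swap a spurious feature (e.g., a name) while holding all facts fixed."""
    return vignette.replace(original, replacement)

def flip_rate(vignettes, get_verdict, original, replacement):
    """Fraction of vignettes whose verdict changes under the intervention."""
    flips = 0
    for v in vignettes:
        before = get_verdict(v)
        after = get_verdict(apply_intervention(v, original, replacement))
        flips += before != after
    return flips / len(vignettes)

# Toy decision rule that leaks on the applicant's name, simulating
# the demographic bias the framework is designed to detect.
def biased_verdict(vignette: str) -> str:
    return "approve" if "Emily" in vignette else "deny"

vignettes = [f"Applicant Emily, income ${x}k, requests a loan." for x in (40, 60, 80)]
rate = flip_rate(vignettes, biased_verdict, "Emily", "Jamal")
# Every verdict flips under the name swap, so rate == 1.0
```

An unbiased decision rule would yield a flip rate of 0.0 under the same intervention, which is what makes the flip rate a direct measure of spurious feature reliance.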

Executive Summary

This study introduces ICE-Guard, a framework that uses intervention consistency testing to detect spurious feature reliance in large language models (LLMs). The researchers evaluated 11 LLMs from 8 families across 10 high-stakes domains and found that authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%). They demonstrate a detect-diagnose-mitigate-verify loop that achieves a cumulative 78% bias reduction via iterative prompt patching and validate their results against real COMPAS recidivism data. The study highlights the importance of testing for multiple types of bias in LLM decision-making, not just demographic bias, and provides a practical tool for mitigating them. The findings have significant implications for the development and deployment of AI systems in high-stakes applications.
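The structured decomposition described in the abstract — the LLM extracts features and a deterministic rubric decides — can be sketched as follows. Here `extract_features` is a hypothetical stand-in for the LLM extraction step (implemented as simple string parsing for illustration), and the rubric thresholds are invented, not taken from the paper.

```python
def extract_features(vignette: str) -> dict:
    # In ICE-Guard this step would be an LLM call returning structured
    # fields; here we parse the income figure directly for illustration.
    income = int(vignette.split("$")[1].split("k")[0])
    return {"income_k": income}

def rubric_decide(features: dict) -> str:
    # Deterministic rubric: the decision depends only on extracted
    # features, never on names, credentials, or framing in the raw text.
    return "approve" if features["income_k"] >= 50 else "deny"

# The verdict is identical under a name swap, since the name never
# reaches the rubric.
for name in ("Emily", "Jamal"):
    v = f"Applicant {name}, income $60k, requests a loan."
    assert rubric_decide(extract_features(v)) == "approve"
```

This separation is what drives the reported flip-rate reduction: spurious cues can still distort feature extraction, but they can no longer influence the decision rule itself.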

Key Points

  • ICE-Guard applies intervention consistency testing to detect spurious feature reliance in LLMs
  • Authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%)
  • A detect-diagnose-mitigate-verify loop achieves a cumulative 78% bias reduction via iterative prompt patching
  • Validation against real COMPAS recidivism data suggests the synthetic benchmark gives a conservative estimate of real-world bias

Merits

Comprehensive evaluation of LLM biases

The study evaluates 11 LLMs from 8 families across 10 high-stakes domains, providing a comprehensive understanding of the biases present in these models.

Development of a practical mitigation framework

The detect-diagnose-mitigate-verify loop provides a practical solution for mitigating biases in LLMs, making it a valuable tool for developers and deployers of AI systems.
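The loop's control flow can be sketched as below. The `measure_bias` and `patch_prompt` callbacks, the 5% threshold, and the toy halving model are all assumptions for illustration; the paper's actual loop measures flip rates with ICE-Guard and patches prompts iteratively.

```python
def ice_loop(prompt, measure_bias, patch_prompt, threshold=0.05, max_iters=5):
    """Repeat detect-diagnose-mitigate-verify until bias is below tolerance."""
    for _ in range(max_iters):
        rate = measure_bias(prompt)          # detect + diagnose: flip rate
        if rate <= threshold:                # verify: bias below tolerance
            break
        prompt = patch_prompt(prompt, rate)  # mitigate: patch the prompt
    return prompt, rate

# Toy model: each appended instruction halves the measured flip rate.
def toy_measure(prompt):
    patches = prompt.count("[ignore names]")
    return 0.20 / (2 ** patches)

def toy_patch(prompt, rate):
    return prompt + " [ignore names]"

final_prompt, final_rate = ice_loop("Decide the case.", toy_measure, toy_patch)
# Two patches bring the toy flip rate from 0.20 down to 0.05
```

The verify step is what distinguishes this from one-shot prompt engineering: each patch is re-tested against the same interventions before being accepted.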

Demerits

Limited generalization to real-world applications

While validation against COMPAS recidivism data suggests the benchmark gives a conservative estimate of real-world bias, the vignettes are largely synthetic, and the results may not generalize to all high-stakes applications.

Requires significant computational resources

The evaluation of multiple LLMs across 10 domains requires significant computational resources, which may be a limitation for some researchers or organizations.

Expert Commentary

The study's findings on LLM biases have significant implications for the development and deployment of AI systems in high-stakes applications. The ICE-Guard framework offers a practical way to detect and mitigate these biases, but applying it in real-world scenarios requires careful consideration of computational and data requirements. The study also underscores the need for policymakers to develop fairness and accountability standards for AI systems, including LLMs. Finally, the use of intervention consistency testing to detect spurious feature reliance gives practitioners a principled way to probe LLM behavior and improve its explainability and interpretability.

Recommendations

  • Developers and deployers of AI systems should prioritize the use of tools like ICE-Guard to detect and mitigate biases in LLMs.
  • Policymakers should prioritize the development of fairness and accountability standards for AI systems, including LLMs, to ensure their safe and responsible deployment in high-stakes applications.
