
Widespread Gender and Pronoun Bias in Moral Judgments Across LLMs

arXiv:2603.13636v1 Abstract: Large language models (LLMs) are increasingly used to assess moral or ethical statements, yet their judgments may reflect social and linguistic biases. This work presents a controlled, sentence-level study of how grammatical person, number, and gender markers influence LLM moral classifications of fairness. Starting from 550 balanced base sentences from the ETHICS dataset, we generated 26 counterfactual variants per item, systematically varying pronouns and demographic markers to yield 14,850 semantically equivalent sentences. We evaluated six model families (Grok, GPT, LLaMA, Gemma, DeepSeek, and Mistral) and measured fairness judgments and inter-group disparities using Statistical Parity Difference (SPD). Results show statistically significant biases: sentences written in the singular form and third person are more often judged as "fair", while those in the second person are penalized. Gender markers produce the strongest effects, with non-binary subjects consistently favored and male subjects disfavored. We conjecture that these patterns reflect distributional and alignment biases learned during training, emphasizing the need for targeted fairness interventions in moral LLM applications.
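To make the counterfactual design concrete, below is a minimal Python sketch of slot-based variant generation. The template, subject set, and tags are illustrative assumptions for the sketch, not the authors' published generation code, which produces 26 variants per item spanning person, number, gender, and demographic markers.

    # Hypothetical subject markers; the paper's actual variant set is larger (26 per item).
    SUBJECTS = {
        "first_singular": "I",
        "second_singular": "you",
        "third_male": "he",
        "third_female": "she",
        "third_nonbinary": "they",
        "first_plural": "we",
    }

    def make_variants(template):
        """Fill a {subj}-slotted template with each subject marker,
        capitalizing the first character of the resulting sentence."""
        variants = {}
        for tag, subj in SUBJECTS.items():
            sentence = template.format(subj=subj)
            variants[tag] = sentence[0].upper() + sentence[1:]
        return variants

    if __name__ == "__main__":
        # Past-tense verb keeps the sentence grammatical across all persons.
        base = "{subj} split the tip evenly with the waiter."
        for tag, sentence in make_variants(base).items():
            print(tag, "->", sentence)

Holding the predicate fixed while swapping only the subject marker is what lets any shift in the model's fairness judgment be attributed to the pronoun or demographic cue rather than to sentence content.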

Executive Summary

This study reveals statistically significant gender and pronoun biases in moral judgments across major large language models (LLMs), demonstrating that grammatical person, number, and gender markers influence fairness assessments. Using a controlled counterfactual design across 14,850 variants of 550 base sentences, the researchers found that sentences in singular or third-person forms are disproportionately deemed "fair", while second-person constructions are penalized. Gender bias is particularly pronounced, with non-binary subjects favored and male subjects disfavored. These findings suggest that LLM moral evaluation systems inherit distributional and alignment biases from training data. The work underscores a critical gap between intended ethical neutrality and actual biased outcomes in AI-mediated moral reasoning.

Key Points

  • Pronoun and grammatical person bias affects fairness judgments in LLMs
  • Gender markers produce the strongest bias effect, favoring non-binary and penalizing male subjects
  • Significant disparities emerge across model families despite semantic equivalence

Merits

Methodological Rigor

The controlled counterfactual design, with 26 variants per sentence, and the use of SPD as a disparity metric provide strong experimental control and quantifiable bias detection.
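For reference, SPD is the standard group-fairness metric defined as the difference in positive-outcome rates between two groups: SPD = P(judged "fair" | group A) − P(judged "fair" | group B). A minimal Python sketch, assuming binary fairness labels and per-sentence group tags (the data layout is illustrative, not taken from the paper):

    def statistical_parity_difference(labels, groups, group_a, group_b):
        """SPD: P(judged fair | group_a) - P(judged fair | group_b).

        labels: iterable of 0/1 fairness judgments from a model;
        groups: parallel iterable of group tags, one per sentence."""
        def positive_rate(g):
            ys = [y for y, grp in zip(labels, groups) if grp == g]
            return sum(ys) / len(ys)
        return positive_rate(group_a) - positive_rate(group_b)

    # Toy example: judgments over male vs. non-binary subject variants.
    labels = [1, 0, 1, 1, 1, 0]
    groups = ["male", "male", "nonbinary", "nonbinary", "nonbinary", "male"]
    print(statistical_parity_difference(labels, groups, "nonbinary", "male"))
    # 1.0 - 1/3 = 0.67: non-binary variants judged "fair" more often.

An SPD of zero indicates parity; the paper's reported disparities correspond to nonzero SPD values across pronoun and gender groups.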

Demerits

Generalizability Constraint

The study focuses on specific syntactic variations and may not capture broader linguistic or cultural nuances beyond the ETHICS dataset or tested model families.

Expert Commentary

The findings represent a pivotal moment in the ethical evaluation of LLMs. While prior studies have documented bias in factual or descriptive contexts, this work extends the analysis to moral judgment—a domain where normative expectations are heightened. The statistical significance of gender bias, particularly the disfavoring of male subjects, aligns with broader societal patterns of implicit bias that have been documented in human-led decision-making. Importantly, the authors' use of counterfactual manipulation to isolate linguistic effects is commendable; it demonstrates a high level of experimental control. However, the absence of longitudinal or evolutionary analysis of bias propagation from training to inference leaves a gap. Future work should integrate causal inference frameworks to trace bias origins and evaluate mitigation strategies—such as post-processing filters or diversity-augmented training—that could reduce these effects without compromising accuracy. This study should catalyze a broader conversation on ethical AI governance, particularly as LLMs are increasingly integrated into legal decision-support systems.

Recommendations

  • Incorporate bias detection protocols in the development of LLM-based moral evaluation tools, particularly for legal or judicial applications
  • Develop and publish transparency reports on bias metrics for LLMs used in ethical decision-making platforms

Sources