
Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback


Mei Tan, Lena Phalen, Dorottya Demszky

arXiv:2603.12471v1

Abstract: Effective personalized feedback is critical to students' literacy development. Though LLM-powered tools now promise to automate such feedback at scale, LLMs are not language-neutral: they privilege standard academic English and reproduce social stereotypes, raising concerns about how "personalization" shapes the feedback students receive. We examine how four widely used LLMs (GPT-4o, GPT-3.5-turbo, Llama-3.3 70B, Llama-3.1 8B) adapt written feedback in response to student attributes. Using 600 eighth-grade persuasive essays from the PERSUADE dataset, we generated feedback under prompt conditions embedding gender, race/ethnicity, learning needs, achievement, and motivation. We analyze lexical shifts across model outputs by adapting the Marked Words framework. Our results reveal systematic, stereotype-aligned shifts in feedback conditioned on presumed student attributes, even when essay content was identical. Feedback for students marked by race, language, or disability often exhibited positive feedback bias and feedback withholding bias: overuse of praise, less substantive critique, and assumptions of limited ability. Across attributes, models tailored not only what content was emphasized but also how writing was judged and how students were addressed. We term these instructional orientations Marked Pedagogies and highlight the need for transparency and accountability in automated feedback tools.
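The experimental setup is easiest to picture in code. Below is a minimal sketch of the attribute-conditioned prompting the abstract describes: the same essay is wrapped in prompts that differ only in a short student-attribute statement, so any shift in the feedback is attributable to the condition rather than the writing. The condition texts, the `build_prompt` function, and the prompt wording here are hypothetical illustrations, not the paper's actual templates.

```python
# Illustrative sketch of attribute-conditioned feedback prompts.
# ATTRIBUTE_CONDITIONS and build_prompt are hypothetical names; the exact
# wording the paper used is not reproduced here.

ATTRIBUTE_CONDITIONS = {
    "control": "",  # no student information: the unmarked baseline
    "gender": "The student identifies as a girl.",
    "race_ethnicity": "The student is Black.",
    "learning_needs": "The student has a learning disability.",
    "achievement": "The student is a low-achieving writer.",
    "motivation": "The student is not motivated to write.",
}

def build_prompt(essay: str, condition: str) -> str:
    """Wrap one fixed essay in a feedback request that varies only in the
    attribute statement embedded in the instructions."""
    context = ATTRIBUTE_CONDITIONS[condition]
    return (
        "You are a writing tutor giving feedback on an eighth-grade "
        f"persuasive essay. {context}\n\n"
        f"Essay:\n{essay}\n\n"
        "Provide personalized feedback on this essay."
    )

essay = "Schools should start later because students need more sleep ..."
for condition in ATTRIBUTE_CONDITIONS:
    prompt = build_prompt(essay, condition)
    # feedback = call_llm(prompt)  # repeated per model under study, e.g. GPT-4o
```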

Executive Summary

The paper examines linguistic biases in personalized automated writing feedback generated by large language models (LLMs). Holding essay content fixed, the study finds systematic, stereotype-aligned shifts in feedback based on presumed student attributes such as gender, race/ethnicity, and learning needs, with models exhibiting positive feedback bias (overuse of praise) and feedback withholding bias (less substantive critique) toward certain groups. The findings underscore the need for transparency and accountability in automated feedback tools so that literacy development is supported fairly for all students.

Key Points

  • LLMs are not language-neutral: they privilege standard academic English and reproduce social stereotypes
  • Feedback shifts systematically with presumed student attributes, even when the essay content is identical
  • Models show positive feedback bias (overuse of praise) and feedback withholding bias (less substantive critique) toward students marked by race, language, or disability; a toy sketch of how such rates could be measured follows this list
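
As a rough illustration of how the two biases in the last point could be quantified, one can compare the rate of praise terms and substantive-critique terms in feedback generated under a marked versus an unmarked condition. This is a toy lexicon-based operationalization for illustration, not the paper's actual measure, and the word lists and toy texts are assumptions.

```python
import re

# Toy lexicons; these cue words are illustrative guesses, not the paper's coding scheme.
PRAISE = {"great", "wonderful", "amazing", "excellent", "fantastic", "effort"}
CRITIQUE = {"revise", "evidence", "unclear", "weak", "argument", "thesis"}

def term_rate(texts: list[str], lexicon: set[str]) -> float:
    """Fraction of tokens that fall in the given lexicon."""
    tokens = [t for doc in texts for t in re.findall(r"[a-z']+", doc.lower())]
    return sum(t in lexicon for t in tokens) / max(len(tokens), 1)

# Toy stand-ins for feedback generated under two prompt conditions.
feedback_marked = ["Great effort! Your ideas are wonderful and so creative."]
feedback_unmarked = ["Your thesis is clear, but revise the weak second paragraph with stronger evidence."]

# Positive feedback bias: higher praise rate under the marked condition.
# Feedback withholding bias: lower critique rate under the marked condition.
print(term_rate(feedback_marked, PRAISE) - term_rate(feedback_unmarked, PRAISE))
print(term_rate(feedback_marked, CRITIQUE) - term_rate(feedback_unmarked, CRITIQUE))
```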

Merits

Methodological Rigor

The study pairs a sizable dataset (600 eighth-grade persuasive essays from PERSUADE) with four widely used LLMs and adapts the Marked Words framework to detect statistically meaningful lexical shifts in model outputs; a sketch of this style of analysis follows.
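
For readers unfamiliar with the framework, Marked Words-style analyses typically build on a weighted log-odds comparison of word frequencies between "marked" and "unmarked" texts (in the style of Monroe et al.'s Fightin' Words). The sketch below uses a symmetric Dirichlet prior and toy feedback corpora; both are assumptions for illustration, not the paper's exact configuration.

```python
from collections import Counter
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def weighted_log_odds(marked_texts, unmarked_texts, alpha=0.01):
    """Weighted log-odds with a symmetric Dirichlet prior (Monroe et al., 2008).
    Positive z-scores flag words overrepresented in the marked-group feedback."""
    y_i = Counter(t for doc in marked_texts for t in tokenize(doc))
    y_j = Counter(t for doc in unmarked_texts for t in tokenize(doc))
    vocab = set(y_i) | set(y_j)
    n_i, n_j = sum(y_i.values()), sum(y_j.values())
    a0 = alpha * len(vocab)  # total prior mass
    z_scores = {}
    for w in vocab:
        # smoothed log-odds of w in each corpus
        l_i = math.log((y_i[w] + alpha) / (n_i + a0 - y_i[w] - alpha))
        l_j = math.log((y_j[w] + alpha) / (n_j + a0 - y_j[w] - alpha))
        delta = l_i - l_j
        var = 1.0 / (y_i[w] + alpha) + 1.0 / (y_j[w] + alpha)  # approximate variance
        z_scores[w] = delta / math.sqrt(var)
    return z_scores

# Toy corpora standing in for feedback generated under a marked vs. an
# unmarked prompt condition (illustrative, not real model outputs).
feedback_marked = [
    "Great effort! Your ideas show wonderful potential and creativity.",
    "Amazing work, keep trying, your voice is so special.",
]
feedback_unmarked = [
    "Your thesis is clear, but the second paragraph needs stronger evidence.",
    "Revise the counterargument; the transition between claims is abrupt.",
]

z = weighted_log_odds(feedback_marked, feedback_unmarked)
print(sorted(z, key=z.get, reverse=True)[:10])  # words most associated with the marked condition
```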

Demerits

Limited Generalizability

The study examines a single essay dataset (eighth-grade persuasive writing from PERSUADE) and four specific LLMs, which may limit how far the findings generalize to other grade levels, writing genres, and model families.

Expert Commentary

The findings have significant implications for the development and deployment of automated feedback tools in educational settings. The discovery of systematic, stereotype-aligned biases in LLM-generated feedback shows that "personalization" is not neutral: fairness and equity must be considered explicitly during AI development. The study's adaptation of the Marked Words framework also provides a reusable method for auditing linguistic biases in other AI systems and contexts. Ultimately, the work underscores the importance of transparency and accountability so that these systems promote fair and effective learning outcomes for all students.

Recommendations

  • Developers of automated feedback tools should prioritize transparency and accountability in their systems
  • Educational institutions should carefully evaluate the potential biases in automated feedback tools before implementing them in their curricula

Sources

  • Mei Tan, Lena Phalen, Dorottya Demszky. "Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback." arXiv:2603.12471v1. https://arxiv.org/abs/2603.12471