
Is AI Catching Up to Human Expression? Exploring Emotion, Personality, Authorship, and Linguistic Style in English and Arabic with Six Large Language Models


Nasser A Alsadhan

Abstract (arXiv:2603.23251v1): The advancing fluency of LLMs raises important questions about their ability to emulate complex human traits, including emotional expression and personality, across diverse linguistic and cultural contexts. This study investigates whether LLMs can convincingly mimic emotional nuance in English and personality markers in Arabic, a critical under-resourced language with unique linguistic and cultural characteristics. We conduct two tasks across six models: Jais, Mistral, LLaMA, GPT-4o, Gemini, and DeepSeek. First, we evaluate whether machine classifiers can reliably distinguish between human-authored and AI-generated texts. Second, we assess the extent to which LLM-generated texts exhibit emotional or personality traits comparable to those of humans. Our results demonstrate that AI-generated texts are distinguishable from human-authored ones (F1 > 0.95), though classification performance deteriorates on paraphrased samples, indicating a reliance on superficial stylistic cues. Emotion and personality classification experiments reveal significant generalization gaps: classifiers trained on human data perform poorly on AI-generated texts and vice versa, suggesting LLMs encode affective signals differently from humans. Importantly, augmenting training with AI-generated data enhances performance in the Arabic personality classification task, highlighting the potential of synthetic data to address challenges in under-resourced languages. Model-specific analyses show that GPT-4o and Gemini exhibit superior affective coherence. Linguistic and psycholinguistic analyses reveal measurable divergences in tone, authenticity, and textual complexity between human and AI texts. These findings have implications for affective computing, authorship attribution, and responsible AI deployment, particularly within under-resourced language contexts where generative AI detection and alignment pose unique challenges.

Executive Summary

This study investigates the ability of six large language models (LLMs) to emulate human emotional expression, personality, and linguistic style in English and Arabic. The researchers conducted two tasks: distinguishing between human-authored and AI-generated texts, and assessing whether LLM-generated texts exhibit emotional and personality traits comparable to those of humans. The results show that AI-generated texts can be distinguished from human-authored ones with high accuracy (F1 > 0.95), although performance drops on paraphrased samples. The study also found significant generalization gaps between classifiers trained on human data and those trained on AI data, suggesting that LLMs encode affective signals differently from humans, while augmenting training data with AI-generated text improved Arabic personality classification. The findings have important implications for affective computing, authorship attribution, and responsible AI deployment, particularly in under-resourced language contexts.
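To make the first task concrete, the sketch below sets up a toy human-vs-AI detection experiment. It assumes TF-IDF n-gram features and a logistic-regression classifier purely for illustration, with placeholder corpora; the paper does not specify these particular features, classifiers, or datasets.

```python
# Minimal sketch of the human-vs-AI detection task. Assumptions (not from
# the paper): TF-IDF n-gram features, logistic regression, and placeholder
# corpora standing in for the study's English/Arabic datasets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

human_texts = ["Example human-written text."] * 10    # placeholder corpus
ai_texts = ["Example model-generated text."] * 10      # placeholder corpus

texts = human_texts + ai_texts
labels = [0] * len(human_texts) + [1] * len(ai_texts)  # 0 = human, 1 = AI

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Word n-grams capture the surface stylistic cues that the paper suggests
# such detectors rely on.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print("F1:", f1_score(y_test, preds))   # the paper reports F1 > 0.95 here
```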

Key Points

  • Machine classifiers can reliably distinguish human-authored from AI-generated texts (F1 > 0.95), though performance drops on paraphrased samples
  • Significant generalization gaps exist between human and AI data for emotion and personality classification
  • Synthetic (AI-generated) data can enhance performance in under-resourced languages such as Arabic (a sketch of this augmentation experiment follows this list)
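
To make the generalization-gap and augmentation findings concrete, here is a minimal sketch assuming a generic TF-IDF plus linear-SVM pipeline and small hypothetical emotion-labelled corpora; none of these modelling choices or labels come from the paper.

```python
# Sketch: cross-domain generalization gap and synthetic-data augmentation.
# All corpora and labels below are hypothetical placeholders; the paper's
# actual features, classifiers, and Arabic personality labels are not
# reproduced here.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

def train_eval(train_texts, train_labels, test_texts, test_labels):
    """Train a simple text classifier and return macro-F1 on the test set."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(train_texts, train_labels)
    return f1_score(test_labels, model.predict(test_texts), average="macro")

# Hypothetical emotion-labelled corpora (placeholders):
human_texts = ["I feel hopeful about tomorrow."] * 5 + ["This makes me angry."] * 5
human_labels = ["joy"] * 5 + ["anger"] * 5
ai_texts = ["Optimism pervades my outlook."] * 5 + ["Frustration colors my thoughts."] * 5
ai_labels = ["joy"] * 5 + ["anger"] * 5

# 1) Generalization gap: train on human-authored data, test on AI-generated
#    data, and vice versa. Low cross-domain scores suggest humans and LLMs
#    encode affective signals differently.
human_to_ai = train_eval(human_texts, human_labels, ai_texts, ai_labels)
ai_to_human = train_eval(ai_texts, ai_labels, human_texts, human_labels)

# 2) Augmentation: add AI-generated samples to the human training set and
#    re-evaluate on held-out human data (here the same placeholder set).
#    The paper reports gains for Arabic personality classification.
augmented = train_eval(human_texts + ai_texts, human_labels + ai_labels,
                       human_texts, human_labels)

print(f"human-to-AI: {human_to_ai:.2f}, AI-to-human: {ai_to_human:.2f}, "
      f"augmented: {augmented:.2f}")
```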

Merits

Advancing Our Understanding of LLMs

This study provides valuable insights into the capabilities and limitations of LLMs, shedding light on their ability to emulate human traits and linguistic styles.

Methodological Rigor

The researchers employed a robust methodology, involving multiple tasks and datasets, to evaluate the performance of LLMs and identify areas for improvement.

Demerits

Limited Contextual Understanding

The detection experiments show that classification performance deteriorates on paraphrased samples, indicating that detectors rely on superficial stylistic cues; the generalization gaps likewise suggest that LLMs do not capture affective and contextual nuance the way humans do (a small probe of the paraphrase effect is sketched below).
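
This paraphrase sensitivity can be probed directly. The sketch below is a toy version, with a placeholder `paraphrase` helper standing in for an LLM prompt or a back-translation step; the paper's actual paraphrasing procedure is not described here.

```python
# Sketch: paraphrase-robustness probe for an AI-text detector. `paraphrase`
# is a hypothetical helper standing in for an LLM call or back-translation;
# the corpora are placeholders.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def paraphrase(text: str) -> str:
    """Placeholder paraphraser; swap in an LLM prompt or back-translation."""
    return text.replace("model-generated", "machine-written")

# Placeholder corpora (hypothetical):
human = ["Example human-written text."] * 10
ai = ["Example model-generated text."] * 10

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(human + ai, [0] * 10 + [1] * 10)   # 0 = human, 1 = AI

ai_test = ["Example model-generated text."] * 4
y_true = [1] * 4
f1_orig = f1_score(y_true, detector.predict(ai_test))
f1_para = f1_score(y_true, detector.predict([paraphrase(t) for t in ai_test]))

# A large drop from f1_orig to f1_para indicates the detector keys on
# superficial stylistic cues that paraphrasing disrupts.
print(f1_orig, f1_para)
```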

Data Quality and Availability

The results highlight the challenges of working with under-resourced languages, where high-quality training data may be scarce or difficult to obtain.

Expert Commentary

The study's findings provide a nuanced understanding of the capabilities and limitations of LLMs, highlighting the need for more sophisticated affective signal encoding and contextual understanding. The results also underscore the importance of high-quality training data, particularly in under-resourced languages. As AI continues to advance, it is essential that researchers and policymakers prioritize the development of explainable, transparent, and responsible AI systems that can reliably and accurately capture human emotional expression and personality.
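
The abstract also reports measurable divergences in tone, authenticity, and textual complexity between human and AI texts. As a very rough illustration of how such a comparison can be quantified, the sketch below computes a few hand-rolled complexity proxies (type-token ratio, mean sentence length, mean word length); these are illustrative stand-ins, not the psycholinguistic instruments used in the paper.

```python
# Sketch: crude textual-complexity proxies for comparing human and AI
# corpora. Hand-rolled measures only; the paper's psycholinguistic analysis
# is not reproduced here.
import re
from statistics import mean

def complexity_profile(text: str) -> dict:
    """Crude textual-complexity proxies for a single text."""
    words = re.findall(r"\w+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        "mean_sentence_len": mean(len(re.findall(r"\w+", s)) for s in sentences) if sentences else 0.0,
        "mean_word_len": mean(len(w) for w in words) if words else 0.0,
    }

def corpus_profile(texts):
    """Average each complexity proxy over a corpus."""
    profiles = [complexity_profile(t) for t in texts]
    return {key: mean(p[key] for p in profiles) for key in profiles[0]}

# Hypothetical mini-corpora (placeholders):
human_texts = ["The quick brown fox jumps over the lazy dog."]
ai_texts = ["A fox, swift and brown, vaults gracefully above a resting dog."]

print("human:", corpus_profile(human_texts))
print("AI:   ", corpus_profile(ai_texts))
```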

Recommendations

  • Develop more sophisticated affective signal encoding techniques to improve LLM performance in under-resourced languages
  • Prioritize the development and deployment of responsible AI systems, particularly in contexts where affective computing and authorship attribution are critical

Sources

Original: arXiv - cs.CL