Academic

Academic · 1 min

QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models

arXiv:2603.13691v1. Abstract: While Large Language Models (LLMs) excel on standardized medical exams, high scores often fail to translate to high-quality responses for …

Yao Wu, Kangping Yin, Liang Dong, Zhenxin Ma, Shuting Xu, Xuehai Wang, Yuxuan Jiang, Tingting Yu, Yunqing Hong, Jiayi Liu, Rianzhe Huang, Shuxin Zhao, Haiping Hu, Wen Shang, Jian Xu, Guanjun Jiang
Academic · 1 min

Widespread Gender and Pronoun Bias in Moral Judgments Across LLMs

arXiv:2603.13636v1. Abstract: Large language models (LLMs) are increasingly used to assess moral or ethical statements, yet their judgments may reflect social and …

Gustavo Lúcius Fernandes, Jeiverson C. V. M. Santos, Pedro O. S. Vaz-de-Melo