Articles

Academic · 1 min

QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models

arXiv:2603.13691v1. Abstract: While Large Language Models (LLMs) excel on standardized medical exams, high scores often fail to translate to high-quality responses for …

Yao Wu, Kangping Yin, Liang Dong, Zhenxin Ma, Shuting Xu, Xuehai Wang, Yuxuan Jiang, Tingting Yu, Yunqing Hong, Jiayi Liu, Rianzhe Huang, Shuxin Zhao, Haiping Hu, Wen Shang, Jian Xu, Guanjun Jiang
17 views
Academic · 1 min

LiveWeb-IE: A Benchmark For Online Web Information Extraction

arXiv:2603.13773v1. Abstract: Web information extraction (WIE) is the task of automatically extracting data from web pages, offering high utility for various applications. …

Seungbin Yang, Jihwan Kim, Jaemin Choi, Dongjin Kim, Soyoung Yang, ChaeHun Park, Jaegul Choo
12 views
Academic · 1 min

The Phenomenology of Hallucinations

arXiv:2603.13911v1. Abstract: We show that language models hallucinate not because they fail to detect uncertainty, but because of a failure to integrate …

Valeria Ruscio, Keiran Thompson
4 views