Qworld: Question-Specific Evaluation Criteria for LLMs
arXiv:2603.23522v1 Announce Type: new Abstract: Evaluating large language models (LLMs) on open-ended questions is difficult because response quality depends on the question's context. Binary scores …
Shanghua Gao, Yuchang Su, Pengwei Sui, Curtis Ginder, Marinka Zitnik
23 views