Permutation-Consensus Listwise Judging for Robust Factuality Evaluation
arXiv:2603.20562v1 (announce type: new)
Abstract: Large language models (LLMs) are now widely used as judges, yet their decisions can change under presentation choices that should …
Tianyi Huang, Nathan Huang, Justin Tang, Wenqian Chen, Elsa Fan