Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking
arXiv:2603.23506v1 Announce Type: new Abstract: The rapid proliferation of large language models (LLMs) in healthcare creates an urgent need for scalable and psychometrically sound evaluation …
Tianpeng Zheng, Zhehan Jiang, Jiayi Liu, Shicong Feng
26 views