WHBench: Evaluating Frontier LLMs with Expert-in-the-Loop Validation on Women's Health Topics
arXiv:2604.00024v1. Abstract: Large language models are increasingly used for medical guidance, but women's health remains under-evaluated in benchmark design. We present the …
Sneha Maurya, Pragya Saboo, Girish Kumar