Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs
arXiv:2603.12458v1 Announce Type: cross Abstract: While Large Language Models (LLMs) achieve expert-level performance on standard medical benchmarks through single-hop factual recall, they severely struggle with …
Xing Zi, Xinying Zhou, Jinghao Xiao, Catarina Moreira, Mukesh Prasad
9 views