Category

Academic

Academic · 1 min

BenchBench: Benchmarking Automated Benchmark Generation

arXiv:2603.20807v1 Announce Type: new Abstract: Benchmarks are the de facto standard for tracking progress in large language models (LLMs), yet static test sets can rapidly …

Yandan Zheng, Haoran Luo, Zhenghong Lin, Wenjin Liu, Luu Anh Tuan
5 views
Academic · 1 min

Can ChatGPT Really Understand Modern Chinese Poetry?

arXiv:2603.20851v1 Announce Type: new Abstract: ChatGPT has demonstrated remarkable capabilities on both poetry generation and translation, yet its ability to truly understand poetry remains unexplored. …

Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao
2 views
Academic · 1 min

LLM Router: Prefill is All You Need

arXiv:2603.20895v1 Announce Type: new Abstract: LLMs often share comparable benchmark accuracies, but their complementary performance across task subsets suggests that an Oracle router--a theoretical selector …

Tanay Varshney, Annie Surla, Michelle Xu, Gomathy Venkata Krishnan, Maximilian Jeblick, David Austin, Neal Vaidya, Davide Onofrio
3 views