Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic
arXiv:2603.16406v1. Abstract: This paper evaluates current Large Language Model (LLM) benchmarking for Icelandic, identifies problems, and calls for improved evaluation methods in …
Finnur Ágúst Ingimundarson, Steinunn Rut Friðriksdóttir, Bjarki Ármannsson, Iris Edda Nowenstein, Steinþór Steingrímsson