Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure
arXiv:2603.19426v1 (Announce Type: new)
Abstract: Prior work uses linear probes on benchmark prompts as evidence of evaluation awareness in large language models. Because evaluation context …
Viliana Devbunova