All Articles

Articles

Academic · 1 min

Brittlebench: Quantifying LLM robustness via prompt sensitivity

arXiv:2603.13285v1 Announce Type: new Abstract: Existing evaluation methods largely rely on clean, static benchmarks, which can overestimate true model performance by failing to capture the …

Angelika Romanou, Mark Ibrahim, Candace Ross, Chantal Shaib, Kerem Okta, Sam Bell, Elia Ovalle, Jesse Dodge, Antoine Bosselut, Koustuv Sinha, Adina Williams
10 views
Academic · 1 min

From Stochastic Answers to Verifiable Reasoning: Interpretable Decision-Making with LLM-Generated Code

arXiv:2603.13287v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for high-stakes decision-making, yet existing approaches struggle to reconcile scalability, interpretability, and reproducibility. …

Anirudh Jaidev Mahesh, Ben Griffin, Fuat Alican, Joseph Ternasky, Zakari Salifu, Kelvin Amoaba, Yagiz Ihlamur, Aaron Ontoyin Yin, Aikins Laryea, Afriyie Samuel, Yigit Ihlamur
8 views
Academic · 1 min

ICPRL: Acquiring Physical Intuition from Interactive Control

arXiv:2603.13295v1 Announce Type: new Abstract: VLMs excel at static perception but falter in interactive reasoning in dynamic physical environments, which demands planning and adaptation to …

Xinrun Xu, Pi Bu, Ye Wang, B\"orje F. Karlsson, Ziming Wang, Tengtao Song, Qi Zhu, Jun Song, Shuo Zhang, Zhiming Ding, Bo Zheng
8 views
Academic · 1 min

DreamReader: An Interpretability Toolkit for Text-to-Image Models

arXiv:2603.13299v1 Announce Type: new Abstract: Despite the rapid adoption of text-to-image (T2I) diffusion models, causal and representation-level analysis remains fragmented and largely limited to isolated …

Nirmalendu Prakash, Narmeen Oozeer, Michael Lan, Luka Samkharadze, Phillip Howard, Roy Ka-Wei Lee, Dhruv Nathawani, Shivam Raval, Amirali Abdullah
15 views