Towards a Science of AI Agent Reliability
arXiv:2602.16666v1 Announce Type: new Abstract: AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many …
Quality follows upgrading
Academic
arXiv:2602.16666v1 Announce Type: new Abstract: AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many …
arXiv:2602.15832v1 Announce Type: cross Abstract: Existing user simulations, where models generate user-like responses in dialogue, often lack verification that sufficient user personas are provided, questioning …
arXiv:2602.15836v1 Announce Type: cross Abstract: Large Action Models (LAMs) have shown immense potential in autonomous navigation by bridging high-level reasoning with low-level control. However, deploying …
arXiv:2602.15843v1 Announce Type: cross Abstract: In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought …
arXiv:2602.15844v1 Announce Type: cross Abstract: The Web is a rich source of structured data in the form of tables, from product catalogs and knowledge bases …
arXiv:2602.15847v1 Announce Type: cross Abstract: Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can …
Artificial Intelligence (AI) plays a crucial role in the legal field today, carrying out processes such as predictive analysis, data interpretation, and decision making. AI …
Abstract Aim To develop a consensus paper on the central points of an international invitational think‐tank on nursing and artificial intelligence (AI). Methods We established …
Artificial intelligence (AI) has emerged as a crucial tool in healthcare with the primary aim of improving patient outcomes and optimizing healthcare delivery. By harnessing …