Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
arXiv:2603.11214v1 Announce Type: new Abstract: We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges-a 32-step corporate network attack and …