Tag: cs.CR

#cs.CR

Academic · 1 min

UK AISI Alignment Evaluation Case-Study

arXiv:2604.00788v1 Announce Type: new Abstract: This technical report presents methods developed by the UK AI Security Institute for assessing whether advanced AI systems reliably follow …

Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz, Xander Davies
16 views
Academic · 1 min

Internal Safety Collapse in Frontier Large Language Models

arXiv:2603.23509v1 Announce Type: new Abstract: This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal Safety Collapse (ISC): …

Yutao Wu, Xiao Liu, Yifeng Gao, Xiang Zheng, Hanxun Huang, Yige Li, Cong Wang, Bo Li, Xingjun Ma, Yu-Gang Jiang
45 views
Academic · 1 min

RedacBench: Can AI Erase Your Secrets?

arXiv:2603.20208v1 Announce Type: new Abstract: Modern language models can readily extract sensitive information from unstructured text, making redaction -- the selective removal of such information …

Hyunjun Jeon, Kyuyoung Kim, Jinwoo Shin
46 views
Academic · 1 min

MAPLE: Metadata Augmented Private Language Evolution

arXiv:2603.19258v1 Announce Type: cross Abstract: While differentially private (DP) fine-tuning of large language models (LLMs) is a powerful tool, it is often computationally prohibitive or …

Eli Chien, Yuzheng Hu, Ryan McKenna, Shanshan Wu, Zheng Xu, Peter Kairouz
24 views