All Articles

Articles

Academic · 1 min

Large Language Models in the Abuse Detection Pipeline

arXiv:2604.00323v1 Announce Type: new Abstract: Online abuse has grown increasingly complex, spanning toxic language, harassment, manipulation, and fraudulent behavior. Traditional machine-learning approaches dependent on static …

Suraj Kath, Sanket Badhe, Preet Shah, Ashwin Sampathkumar, Shivani Gupta
3 views
Academic · 1 min

UK AISI Alignment Evaluation Case-Study

arXiv:2604.00788v1 Announce Type: new Abstract: This technical report presents methods developed by the UK AI Security Institute for assessing whether advanced AI systems reliably follow …

Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz, Xander Davies
10 views