Prompt Injection as Role Confusion
arXiv:2603.12277v1 Announce Type: cross Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models …
Charles Ye, Jasmine Cui, Dylan Hadfield-Menell
12 views