Tanay Gondil

Articles by Tanay Gondil

Academic · 1 min

Do Language Models Know When They'll Refuse? Probing Introspective Awareness of Safety Boundaries

arXiv:2604.00228v1 (announce type: new)

Abstract: Large language models are trained to refuse harmful requests, but can they accurately predict when they will refuse before responding? …

Tanay Gondil

3 views Apr 3

JCG, PC

HSOLLC Co., Ltd.