How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing
arXiv:2603.13259v1 Announce Type: new
Abstract: When a language model is fed a wrong answer, what happens inside the network? Current understanding treats truthfulness as a …
Javier Marín