SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy
arXiv:2604.02423v1. Abstract: Large language models exhibit sycophancy: the tendency to shift outputs toward user-expressed stances, regardless of correctness or consistency. While prior …
Joy Bhalla, Kristina Gligorić