Learning the Signature of Memorization in Autoregressive Language Models
arXiv:2604.03199v1. Abstract: All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded …
David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko