Replaying pre-training data improves fine-tuning
arXiv:2603.04964v1 Abstract: To obtain a language model for a target domain (e.g., math), the current paradigm is to pre-train on a vast …
Suhas Kotha, Percy Liang