Data-efficient pre-training by scaling synthetic megadocs
arXiv:2603.18534v1 Announce Type: new Abstract: Synthetic data augmentation has emerged as a promising solution when pre-training is constrained by data rather than compute. We study …
Konwoo Kim, Suhas Kotha, Yejin Choi, Tatsunori Hashimoto, Nick Haber, Percy Liang
5 views