Academic

Academic · 1 min

Khatri-Rao Clustering for Data Summarization

arXiv:2603.06602v1 · Announce Type: new · Abstract: As datasets continue to grow in size and complexity, finding succinct yet accurate data summaries poses a key challenge. Centroid-based …

Martino Ciaperoni, Collin Leiber, Aristides Gionis, Heikki Mannila
Academic · 1 min

Scale Dependent Data Duplication

arXiv:2603.06603v1 · Announce Type: new · Abstract: Data duplication during pretraining can degrade generalization and lead to memorization, motivating aggressive deduplication pipelines. However, at web scale, it …

Joshua Kazdan, Noam Levi, Rylan Schaeffer, Jessica Chudnovsky, Abhay Puri, Bo He, Mehmet Donmez, Sanmi Koyejo, David Donoho
Academic · 1 min

CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training

arXiv:2603.06610v1 · Announce Type: new · Abstract: Large language model (LLM) post-training enhances latent skills, unlocks value alignment, improves performance, and enables domain adaptation. Unfortunately, post-training is …

Lukas Thede, Stefan Winzeck, Zeynep Akata, Jonathan Richard Schwarz
Academic · 1 min

Correlation Analysis of Generative Models

arXiv:2603.06614v1 · Announce Type: new · Abstract: Based on literature review about existing diffusion models and flow matching with a neural network to predict a predefined target …

Zhengguo Li, Chaobing Zheng, Wei Wang