Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
arXiv:2603.18112v1
Abstract: Distributed training increases the number of batches processed per iteration either by scaling-out (adding more nodes) or scaling-up (increasing the …
Sahil Tyagi, Feiyi Wang