14 Mar 2019 • Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney
In stochastic optimization, training with large batch sizes can exploit parallel resources to reduce the wall-clock time per training epoch.