no code implementations • 18 Oct 2019 • Scott Sievert, Shrey Shah
This work presents a method to adapt the batch size to the model's training loss.
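The one-sentence summary above can be illustrated with a minimal sketch. The rule below (batch size growing inversely with the loss, capped at a maximum) is a hypothetical example of loss-adaptive batching; the function name, the exact growth rule, and the cap are assumptions for illustration, not the paper's method.

```python
import math

def adaptive_batch_size(initial_bs, initial_loss, current_loss, max_bs=1024):
    """Grow the batch size inversely with the current training loss.

    Illustrative rule only: as the loss shrinks, gradient noise becomes
    relatively larger, so a bigger batch is used to reduce it.
    """
    if current_loss <= 0:
        return max_bs
    bs = math.ceil(initial_bs * initial_loss / current_loss)
    return min(bs, max_bs)

# Halving the loss doubles the batch size under this rule.
print(adaptive_batch_size(32, 2.0, 1.0))  # 64
```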
1 code implementation • NeurIPS 2018 • Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos
We present ATOMO, a general framework for atomic sparsification of stochastic gradients.
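A core idea behind atomic sparsification is to drop "atoms" of the gradient randomly but rescale the survivors so the result is unbiased in expectation. The sketch below uses individual coordinates as the atoms, which is one simple instance of an atomic decomposition; the function name and per-entry probabilities are illustrative assumptions, not the ATOMO implementation.

```python
import random

def sparsify_unbiased(grad, probs):
    """Entrywise unbiased sparsification.

    Keep grad[i] with probability probs[i] and rescale it by 1/probs[i];
    dropped entries become 0. The expectation of the output equals grad.
    Coordinates-as-atoms is just one example of an atomic decomposition.
    """
    out = []
    for g, p in zip(grad, probs):
        if random.random() < p:
            out.append(g / p)  # rescale so E[out[i]] = grad[i]
        else:
            out.append(0.0)
    return out
```

With all probabilities set to 1.0 the gradient passes through unchanged; lowering a probability sparsifies that entry more aggressively while keeping the estimator unbiased.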