1 code implementation • 13 Feb 2020 • Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
Large-scale distributed training of deep neural networks yields models with worse generalization performance because of the increase in the effective mini-batch size.
3 code implementations • CVPR 2019 • Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka
Large-scale distributed training of deep neural networks suffers from the generalization gap caused by the increase in the effective mini-batch size.