Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

16 Jul 2020 Yang Xiang

As a crucial scheme for accelerating deep neural network (DNN) training, distributed stochastic gradient descent (DSGD) is widely adopted in many real-world applications. In most distributed deep learning (DL) frameworks, DSGD is implemented with the Ring-AllReduce architecture (Ring-SGD) and uses a computation-communication overlap strategy to address the overhead of the massive communication required by DSGD...
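To make the ideas in the abstract concrete, the sketch below simulates the two phases of Ring-AllReduce (reduce-scatter, then all-gather) over per-worker gradient vectors in NumPy on a single process. The worker count, chunking, and function names are illustrative assumptions for exposition, not code from the paper.

```python
# Minimal single-process simulation of the Ring-AllReduce pattern (sketch,
# not the paper's implementation). Each "worker" holds a gradient vector;
# after the two phases every worker holds the averaged gradient.
import numpy as np

def ring_allreduce(grads):
    """Average a list of equal-length 1-D gradient arrays via ring all-reduce."""
    n = len(grads)
    # Split each worker's gradient into n chunks with identical boundaries.
    chunks = [np.array_split(g.astype(np.float64), n) for g in grads]

    # Reduce-scatter: in step s, worker w sends chunk (w - s) mod n to its
    # right neighbour, which accumulates it. After n-1 steps, worker w holds
    # the fully summed chunk (w + 1) mod n.
    for s in range(n - 1):
        sends = [chunks[w][(w - s) % n].copy() for w in range(n)]
        for w in range(n):
            chunks[(w + 1) % n][(w - s) % n] += sends[w]

    # All-gather: circulate the fully reduced chunks around the ring so that
    # every worker ends up with the complete summed gradient.
    for s in range(n - 1):
        sends = [chunks[w][(w + 1 - s) % n].copy() for w in range(n)]
        for w in range(n):
            chunks[(w + 1) % n][(w + 1 - s) % n] = sends[w]

    # Divide by the worker count to turn the sum into an average.
    return [np.concatenate(c) / n for c in chunks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    workers = [rng.standard_normal(8) for _ in range(4)]  # 4 hypothetical workers
    reduced = ring_allreduce(workers)
    expected = np.mean(workers, axis=0)
    assert all(np.allclose(r, expected) for r in reduced)
    print("ring all-reduce matches the mean:", np.round(expected, 3))
```

In a real Ring-SGD setup these send/receive steps run concurrently with backpropagation of later layers (the computation-communication overlap the abstract mentions); the simulation above only shows the data movement pattern.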

No code implementations yet.

Categories
  • DISTRIBUTED, PARALLEL, AND CLUSTER COMPUTING