no code implementations • 23 Jan 2023 • Lu Xia, Michiel E. Hochstenbach, Stefano Massei
When training neural networks with low-precision computation, rounding errors can cause the optimizer to stagnate or otherwise harm its convergence. In this paper, we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-Łojasiewicz inequality.
no code implementations • 24 Feb 2022 • Lu Xia, Stefano Massei, Michiel E. Hochstenbach, Barry Koren
When implementing the gradient descent method in low precision, employing stochastic rounding schemes helps prevent the stagnation of convergence caused by the vanishing-gradient effect.
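The stagnation mechanism both abstracts refer to can be illustrated with a small sketch (not taken from either paper): on a hypothetical low-precision grid with spacing `eps`, a gradient step smaller than half the grid spacing is absorbed entirely by round-to-nearest, while stochastic rounding is unbiased and still moves the iterate in expectation.

```python
import random

def round_to_nearest(x, eps):
    # Deterministic round-to-nearest on a grid with spacing eps.
    return eps * round(x / eps)

def stochastic_round(x, eps):
    # Round down or up to the grid, with probability proportional to the
    # distance to the opposite grid point, so E[result] == x (unbiased).
    lo = eps * (x // eps)          # floor to the grid
    p_up = (x - lo) / eps          # probability of rounding up
    return lo + eps if random.random() < p_up else lo

eps = 2.0 ** -10                   # hypothetical spacing of a low-precision format
w = 1.0
update = eps / 16                  # a gradient step much smaller than eps

# Round-to-nearest absorbs the tiny update entirely: the iterate stagnates.
assert round_to_nearest(w - update, eps) == w

# Stochastic rounding moves w down with probability update/eps (= 1/16 here),
# so the expected applied update equals the true update.
random.seed(0)
trials = 100_000
moves = sum(stochastic_round(w - update, eps) < w for _ in range(trials))
print(f"fraction of steps that moved: {moves / trials:.4f}")  # near 1/16
```

This is only a toy model of one rounding step; the papers analyze the accumulated effect of such errors over the full gradient descent iteration.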