13 Oct 2018 • Drew Mitchell, Nan Ye, Hans De Sterck
Nesterov acceleration turns gradient descent into an optimal first-order method for convex problems by adding a momentum term with a specific weight sequence. However, applying this method and weight sequence directly to alternating least squares (ALS) results in erratic convergence behaviour.
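The following minimal sketch makes the "momentum term with a specific weight sequence" concrete for plain gradient descent on a smooth convex function. It assumes the classical sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2 with momentum weight beta_k = (t_k - 1) / t_{k+1}, as used in Beck and Teboulle's FISTA without the proximal step, and a fixed step size 1/L. The function name `nesterov_gd` and the toy quadratic are hypothetical illustrations. This shows only the gradient descent case and is not the authors' accelerated ALS method.

```python
import math
from typing import Callable, List

Vector = List[float]


def nesterov_gd(grad: Callable[[Vector], Vector], x0: Vector,
                lipschitz: float, iters: int) -> Vector:
    """Hypothetical sketch: Nesterov-accelerated gradient descent for a convex,
    L-smooth function, using the classical t_k weight sequence."""
    step = 1.0 / lipschitz
    x_prev = list(x0)   # x_k
    y = list(x0)        # extrapolated point y_k (y_0 = x_0)
    t = 1.0             # t_0 = 1, so the first momentum weight is 0
    for _ in range(iters):
        # Gradient step taken from the extrapolated point y_k.
        g = grad(y)
        x = [yi - step * gi for yi, gi in zip(y, g)]
        # Weight sequence: t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next  # momentum weight; increases toward 1
        # Momentum (extrapolation) step along the last displacement.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


# Example: f(x) = 0.5*x0^2 + 50*x1^2 - x0 - x1, minimizer (1.0, 0.01), L = 100.
a, b = [1.0, 100.0], [1.0, 1.0]
x_min = nesterov_gd(lambda v: [ai * vi - bi for ai, vi, bi in zip(a, v, b)],
                    x0=[0.0, 0.0], lipschitz=100.0, iters=5000)
print(x_min)
```

The momentum weights start at 0 and grow toward 1, so later steps extrapolate aggressively along the previous displacement. For convex gradient descent this yields the optimal O(1/k^2) rate; the abstract's point is that reusing this same sequence unchanged inside ALS does not behave as well.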