1 code implementation • 30 May 2023 • Jonathan Mei, Alexander Moreno, Luke Walters
Second-order stochastic optimizers allow the parameter update step size and direction to adapt to loss curvature, but have traditionally required too much memory and compute for deep learning.
no code implementations • 15 May 2023 • Alexander Moreno, Jonathan Mei, Luke Walters
For the low-rank component, we replace the RPE MLP with linear interpolation and use asymmetric Structured Kernel Interpolation (SKI) (Wilson et.