no code implementations • 14 Jan 2024 • Chengli Tan, Jiangshe Zhang, Junmin Liu, Yicheng Wang, Yunda Hao
Recently, sharpness-aware minimization (SAM) has attracted considerable attention for its surprising effectiveness in improving generalization performance. However, training neural networks with SAM can be highly unstable: the loss does not decrease along the exact gradient at the current point, but instead follows a surrogate gradient evaluated at a nearby point.
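The two-point update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `sam_step`, the toy quadratic loss, and the ascent radius `rho` are assumptions for exposition; the structure (ascend to a nearby point, then apply the gradient from that point at the current weights) follows the standard SAM procedure.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update (illustrative sketch).

    First ascend to a nearby point w + eps that (approximately)
    maximizes the loss within a ball of radius rho, then descend
    at w using the surrogate gradient evaluated at that point.
    """
    g = grad_fn(w)
    # Ascent step: move along the normalized gradient direction.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Surrogate gradient: evaluated at the perturbed point, not at w.
    g_surrogate = grad_fn(w + eps)
    return w - lr * g_surrogate

# Toy example: quadratic loss L(w) = ||w||^2 / 2, so grad_fn(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
w_new = sam_step(w, grad_fn)
```

Because the descent direction is the gradient at `w + eps` rather than at `w`, the loss at `w` itself is not guaranteed to decrease at every step, which is the source of the instability the abstract refers to.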