no code implementations • 27 May 2024 • Haichao Sha, Yang Cao, Yong Liu, Yuncheng Wu, Ruixuan Liu, Hong Chen
However, recent studies have shown that gradients in deep learning exhibit a heavy-tailed phenomenon: the tails of the gradient distribution have infinite variance, which can cause existing DPSGD mechanisms to incur excessive clipping loss on the gradients.
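As a rough illustration (not this paper's method), the following minimal NumPy sketch shows standard DPSGD aggregation, per-sample L2 clipping, averaging, and Gaussian noise, and why a fixed clip threshold discards far more gradient mass when per-sample gradients are heavy-tailed. The Student-t draw with df < 2 is only a stand-in for infinite-variance gradient noise; all names and thresholds here are illustrative.

```python
import numpy as np

def dpsgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """One DPSGD aggregation step: per-sample L2 clipping, averaging,
    then Gaussian noise calibrated to the clipping norm."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    mean = clipped.mean(axis=0)
    sigma = noise_multiplier * clip_norm / per_sample_grads.shape[0]
    return mean + rng.normal(0.0, sigma, size=mean.shape)

rng = np.random.default_rng(0)
batch, dim = 512, 50
# Student-t with df < 2 has infinite variance -- a common stand-in
# for heavy-tailed gradient noise.
heavy = rng.standard_t(df=1.5, size=(batch, dim))
light = rng.normal(size=(batch, dim))
for name, grads in [("heavy-tailed", heavy), ("Gaussian", light)]:
    norms = np.linalg.norm(grads, axis=1)
    clip = np.median(norms)  # a "reasonable" fixed threshold per distribution
    lost = np.maximum(norms - clip, 0.0).sum() / norms.sum()
    print(f"{name}: clipping discards {lost:.0%} of total gradient norm")

update = dpsgd_step(heavy, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Even with the threshold set at each distribution's own median norm, the heavy-tailed batch loses a much larger share of its total gradient norm to clipping, which is the clipping-loss effect the abstract describes.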
no code implementations • 6 Dec 2023 • Haichao Sha, Ruixuan Liu, Yixuan Liu, Hong Chen
We prove that pre-projection enhances the convergence of DP-SGD by restricting the dependence of the clipping error and bias to a small fraction of the gradient space, namely the top gradient eigenspace, and, in theory, limits cross-client variance to improve convergence under heterogeneous federation.
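A minimal sketch of the general pre-projection idea, not necessarily the paper's exact algorithm: estimate a top-k gradient eigenspace, project each per-sample gradient into it, clip and noise the k-dimensional coefficients, then map the update back. The basis here is estimated by SVD of a gradient matrix; in a real deployment it would have to come from public data or already-privatized gradients, otherwise the projection itself leaks privacy. All names and shapes are illustrative assumptions.

```python
import numpy as np

def top_k_eigenspace(grad_history, k):
    """Estimate the top-k gradient eigenspace from a matrix of (public or
    already-privatized) gradients via SVD; returns a (dim, k) orthonormal basis."""
    _, _, vt = np.linalg.svd(grad_history, full_matrices=False)
    return vt[:k].T

def projected_dp_step(per_sample_grads, basis, clip_norm, noise_multiplier, rng):
    """Pre-projection variant: project each gradient onto the top-k subspace,
    clip and noise the k-dim coefficients, then map back to full dimension.
    Clipping error and noise are confined to the projected subspace."""
    coeffs = per_sample_grads @ basis                      # (batch, k)
    norms = np.linalg.norm(coeffs, axis=1, keepdims=True)
    clipped = coeffs * np.minimum(1.0, clip_norm / (norms + 1e-12))
    mean = clipped.mean(axis=0)
    sigma = noise_multiplier * clip_norm / per_sample_grads.shape[0]
    noisy = mean + rng.normal(0.0, sigma, size=mean.shape)  # noise in k dims only
    return basis @ noisy                                    # back to full dim

rng = np.random.default_rng(1)
batch, dim, k = 256, 100, 10
# Synthetic gradients with decaying spectrum, so a top-k subspace captures
# most of the energy.
grads = rng.normal(size=(batch, dim)) @ np.diag(np.linspace(2.0, 0.1, dim))
basis = top_k_eigenspace(grads, k)
update = projected_dp_step(grads, basis, clip_norm=1.0,
                           noise_multiplier=1.0, rng=rng)
print(update.shape)  # (100,) -- full-dimensional update from k-dim noise
```

The design intuition matches the abstract's claim: because clipping and noising happen in a k-dimensional subspace rather than the full gradient space, the clipping error and bias depend only on that top eigenspace.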