no code implementations • 13 Jul 2023 • Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa
This work also develops two metrics for the asymptotic rates of convergence and divergence, and evaluates them for several popular accelerated methods, such as Nesterov's accelerated gradient (NAG) and Nesterov's accelerated gradient with constant momentum (NCM), near strict saddle points.
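A minimal sketch of the two accelerated iterations named here, assuming a generic smooth objective; the step size, momentum constant, and toy saddle objective below are illustrative choices, not the paper's setup.

```python
# Minimal sketch of the two accelerated iterations, not the paper's code.
# NAG uses the standard variable-momentum schedule; NCM fixes momentum at beta.
import numpy as np

def nag(grad, x0, step, n_iters):
    """Nesterov's accelerated gradient with variable momentum (NAG)."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iters):
        x_next = y - step * grad(y)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

def ncm(grad, x0, step, beta, n_iters):
    """Nesterov's accelerated gradient with constant momentum (NCM)."""
    x, y = x0.copy(), x0.copy()
    for _ in range(n_iters):
        x_next = y - step * grad(y)
        y = x_next + beta * (x_next - x)
        x = x_next
    return x

# Toy strict saddle: f(x) = 0.5 * (x1**2 - x2**2) has a strict saddle at the origin.
grad = lambda x: np.array([x[0], -x[1]])
x0 = np.array([1e-3, 1e-3])
print(nag(grad, x0, step=0.1, n_iters=200))
print(ncm(grad, x0, step=0.1, beta=0.9, n_iters=200))
```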
no code implementations • 13 May 2022 • Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu
To gain more explicit control over the tail index, we then consider the case where the loss at each node is quadratic, and show that the tail index can be estimated as a function of the step-size, the batch-size, and the topological properties of the network of computational nodes.
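As a rough illustration of the decentralized setting described above, the sketch below runs decentralized SGD on per-node quadratic losses with an assumed ring-topology mixing matrix; the topology, step-size, and batch-size are placeholders, not the paper's configuration.

```python
# Hypothetical sketch of decentralized SGD on per-node quadratic losses.
# W is an assumed doubly stochastic mixing matrix encoding a ring topology.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, step, batch = 4, 2, 0.05, 8

# Ring-topology mixing matrix (doubly stochastic).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

A = [rng.standard_normal((100, dim)) for _ in range(n_nodes)]  # per-node data
x = rng.standard_normal((n_nodes, dim))                        # per-node iterates

for _ in range(1000):
    grads = np.zeros_like(x)
    for i in range(n_nodes):
        idx = rng.choice(100, size=batch, replace=False)
        Ai = A[i][idx]
        grads[i] = (Ai.T @ (Ai @ x[i])) / batch   # minibatch gradient of the node's quadratic
    x = W @ x - step * grads                       # gossip averaging plus local SGD step
```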
no code implementations • 19 Feb 2022 • Bugra Can, Mert Gurbuzbalaban, Necdet Serhat Aybat
In this work, we consider strongly convex-strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}}f(x, y)$, where $f$ is $L$-smooth, $f(\cdot, y)$ is $\mu$-strongly convex for every $y$, and $f(x,\cdot)$ is $\mu$-strongly concave for every $x$.
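For concreteness, here is a toy instance of the SCSC setting, solved with plain gradient descent-ascent rather than the paper's method; the quadratic $f$, the step size, and the iteration count are assumptions.

```python
# Toy SCSC saddle point problem f(x, y) = (mu/2)||x||^2 + x^T B y - (mu/2)||y||^2,
# solved with plain gradient descent-ascent (not the paper's algorithm).
import numpy as np

mu = 1.0
B = np.array([[0.5, 0.2], [0.1, 0.3]])
x, y = np.ones(2), np.ones(2)
step = 0.1

for _ in range(500):
    gx = mu * x + B @ y                      # grad_x f(x, y)
    gy = B.T @ x - mu * y                    # grad_y f(x, y)
    x, y = x - step * gx, y + step * gy      # descent in x, ascent in y

print(x, y)                                  # approaches the saddle point at the origin
```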
no code implementations • 7 Jan 2021 • Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa
This paper concerns the convergence of discrete-time first-order methods to a local minimum of nonconvex optimization problems whose geometric landscape contains strict saddle points.
no code implementations • NeurIPS 2020 • Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu
We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD).
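A hedged sketch of Euler-type discretizations of the two diffusions named here, targeting $\exp(-f)$ for a toy quadratic $f$; the friction parameter, step size, and antisymmetric matrix $J$ are illustrative choices, not the paper's tuning.

```python
# Euler-type discretizations of the two non-reversible diffusions, targeting exp(-f).
import numpy as np

rng = np.random.default_rng(0)
grad_f = lambda x: x                      # toy target: f(x) = 0.5 * ||x||^2
dim, step, gamma, n_steps = 2, 0.01, 1.0, 10_000

# Underdamped Langevin dynamics (ULD): position x, velocity v.
x, v = np.zeros(dim), np.zeros(dim)
for _ in range(n_steps):
    v = v - step * (gamma * v + grad_f(x)) + np.sqrt(2 * gamma * step) * rng.standard_normal(dim)
    x = x + step * v

# Langevin dynamics with a non-symmetric drift (NLD): J is antisymmetric.
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
z = np.zeros(dim)
for _ in range(n_steps):
    z = z - step * (np.eye(dim) + J) @ grad_f(z) + np.sqrt(2 * step) * rng.standard_normal(dim)
```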
1 code implementation • 5 Aug 2020 • Nurdan Kuru, Ş. İlker Birbil, Mert Gurbuzbalaban, Sinan Yildirim
The first algorithm is inspired by Polyak's heavy ball method and employs a smoothing approach to decrease the accumulated noise on the gradient steps required for differential privacy.
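The sketch below shows a heavy-ball iteration with Gaussian-perturbed gradients in the spirit of this description; the noise scale `sigma` is a placeholder that is not calibrated to any privacy budget, and the smoothing here is only the implicit averaging of the momentum term.

```python
# Heavy-ball iteration with Gaussian-perturbed gradients (illustrative sketch only;
# sigma is NOT calibrated to an (epsilon, delta) differential privacy budget).
import numpy as np

rng = np.random.default_rng(0)

def noisy_heavy_ball(grad, x0, step, beta, sigma, n_iters):
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iters):
        g = grad(x) + sigma * rng.standard_normal(x.shape)   # perturbed gradient
        x_next = x - step * g + beta * (x - x_prev)          # Polyak momentum averages the noise
        x_prev, x = x, x_next
    return x

grad = lambda x: 2.0 * x                                     # toy objective ||x||^2
print(noisy_heavy_ball(grad, np.ones(3), step=0.1, beta=0.7, sigma=0.5, n_iters=500))
```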
1 code implementation • 8 Jun 2020 • Mert Gurbuzbalaban, Umut Şimşekli, Lingjiong Zhu
We claim that, depending on the structure of the Hessian of the loss at the minimum and the choices of the algorithm parameters $\eta$ (step-size) and $b$ (batch-size), the SGD iterates will converge to a \emph{heavy-tailed} stationary distribution.
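As an illustrative probe of this claim, the following simulates SGD on a one-dimensional random quadratic (a Kesten-type recursion) and applies a crude Hill estimator to the iterates; the step-size $\eta$, batch-size $b$, burn-in, and order-statistic cutoff are arbitrary choices.

```python
# Simulate SGD on a random 1-D quadratic and probe the tail of its stationary law.
import numpy as np

rng = np.random.default_rng(0)
eta, b, n_steps = 0.6, 1, 200_000
x, tail_samples = 1.0, []

for k in range(n_steps):
    h = rng.standard_normal(b) ** 2              # random curvature samples
    q = rng.standard_normal(b)                   # random linear terms
    x = x - eta * (h.mean() * x + q.mean())      # SGD step on the random quadratic
    if k > 10_000:                               # discard burn-in
        tail_samples.append(abs(x))

# Hill estimator of the tail index from the largest order statistics.
s = np.sort(tail_samples)[::-1]
k_top = 500
hill = 1.0 / np.mean(np.log(s[:k_top] / s[k_top]))
print("estimated tail index:", hill)
```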
no code implementations • 1 Jun 2020 • Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa
This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions.
no code implementations • 25 May 2020 • Mert Gurbuzbalaban, Yuanhan Hu
We prove that the logarithm of the norm of the network outputs, if properly scaled, converges to a Gaussian distribution with an explicit mean and variance that we can compute depending on the activation used, the value of $s$ chosen, and the network width.
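A quick Monte Carlo check in the spirit of this statement, assuming a fully connected ReLU network with i.i.d. Gaussian weights; the width, depth, scaling $s$, and the per-depth normalization below are assumptions, not the paper's exact setting.

```python
# Monte Carlo check: scaled log-norm of the output of a deep random ReLU network.
import numpy as np

rng = np.random.default_rng(0)
width, depth, s, n_trials = 64, 30, np.sqrt(2.0), 500
log_norms = []

for _ in range(n_trials):
    h = rng.standard_normal(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * (s / np.sqrt(width))
        h = np.maximum(W @ h, 0.0)               # ReLU activation
    log_norms.append(np.log(np.linalg.norm(h)))

log_norms = np.array(log_norms) / depth          # scale by depth
print("mean:", log_norms.mean(), "std:", log_norms.std())
```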
no code implementations • 6 Apr 2020 • Yuanhan Hu, Xiaoyu Wang, Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu
In this paper, we study the non-reversible Stochastic Gradient Langevin Dynamics (NSGLD), which is based on a discretization of the non-reversible Langevin diffusion.
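A minimal sketch of an NSGLD iteration on a toy least-squares problem: the non-reversible drift $(I + J)$ applied to a minibatch gradient plus injected Gaussian noise; $J$, the step size, and the data are illustrative assumptions.

```python
# One possible NSGLD-style iteration: non-reversible drift on a minibatch gradient.
import numpy as np

rng = np.random.default_rng(0)
A, y = rng.standard_normal((1000, 2)), rng.standard_normal(1000)   # toy least-squares data
J = np.array([[0.0, 1.0], [-1.0, 0.0]])                            # antisymmetric matrix
x, step, batch = np.zeros(2), 1e-3, 32

for _ in range(5000):
    idx = rng.choice(1000, size=batch, replace=False)
    g = A[idx].T @ (A[idx] @ x - y[idx]) / batch                    # stochastic gradient
    x = x - step * (np.eye(2) + J) @ g + np.sqrt(2 * step) * rng.standard_normal(2)
```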
no code implementations • 19 Oct 2019 • Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar, Umut Simsekli, Lingjiong Zhu
When gradients do not contain noise, we also prove that distributed accelerated methods can \emph{achieve acceleration}, requiring $\mathcal{O}(\sqrt{\kappa} \log(1/\varepsilon))$ gradient evaluations and $\mathcal{O}(\sqrt{\kappa} \log(1/\varepsilon))$ communications to converge to the same fixed point as the non-accelerated variant, where $\kappa$ is the condition number and $\varepsilon$ is the target accuracy.
no code implementations • NeurIPS 2019 • Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar
We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient.
no code implementations • 22 Jan 2019 • Bugra Can, Mert Gurbuzbalaban, Lingjiong Zhu
In the special case of strongly convex quadratic objectives, we can show accelerated linear rates in the $p$-Wasserstein metric for any $p\geq 1$ with improved sensitivity to noise for both AG and HB through a non-asymptotic analysis under some additional assumptions on the noise structure.
1 code implementation • 18 Jan 2019 • Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban
This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion.
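To make the SDE view concrete: under the Gaussian-noise assumption, an SGD step with learning rate $\eta$ matches one Euler-Maruyama step of $dX_t = -\nabla f(X_t)\,dt + \sigma\,dB_t$ with $dt = \eta$, as in the sketch below; the objective and noise scale are toy choices.

```python
# Toy comparison of an SGD step with Gaussian gradient noise and one Euler-Maruyama
# step of dX_t = -grad f(X_t) dt + sigma_sde dB_t with dt = eta; matching the two
# noise scales requires sigma_sde = sigma * sqrt(eta). All values are toy choices.
import numpy as np

rng = np.random.default_rng(0)
grad_f = lambda x: x                        # toy objective f(x) = 0.5 * ||x||^2
eta, sigma, dim = 0.1, 0.3, 2
sigma_sde = sigma * np.sqrt(eta)            # diffusion coefficient matching the SGD noise

x_sgd, x_sde = np.ones(dim), np.ones(dim)
for _ in range(1000):
    # SGD with Gaussian gradient noise (the assumption discussed above).
    x_sgd = x_sgd - eta * (grad_f(x_sgd) + sigma * rng.standard_normal(dim))
    # Euler-Maruyama discretization of the corresponding Brownian-driven SDE.
    x_sde = x_sde - eta * grad_f(x_sde) + sigma_sde * np.sqrt(eta) * rng.standard_normal(dim)
```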
no code implementations • 19 Dec 2018 • Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu
We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD).
no code implementations • 27 May 2018 • Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar
We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm.
no code implementations • NeurIPS 2017 • Mert Gurbuzbalaban, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Vanli
The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications.
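A minimal sketch of cyclic coordinate descent on a least-squares objective, where each step exactly minimizes over one coordinate; the data and the cyclic order are illustrative, and this is not the paper's specific variant.

```python
# Cyclic coordinate descent on 0.5 * ||Ax - b||^2: exact 1-D minimization per coordinate.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 5)), rng.standard_normal(50)
x = np.zeros(5)
col_sq = (A ** 2).sum(axis=0)                  # per-coordinate curvature ||a_j||^2

for _ in range(100):                           # passes over the coordinates
    for j in range(5):
        r = b - A @ x + A[:, j] * x[j]         # residual excluding coordinate j
        x[j] = A[:, j] @ r / col_sq[j]         # exact minimization over coordinate j

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0], atol=1e-6))
```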
no code implementations • 24 Oct 2017 • Saeed Soori, Aditya Devarakonda, James Demmel, Mert Gurbuzbalaban, Maryam Mehri Dehnavi
We formulate the algorithm for two different optimization methods on the Lasso problem and show that the latency cost is reduced by a factor of $k$ while the bandwidth and floating-point operation costs remain the same.