Search Results for author: Stephen Wright

Found 22 papers, 4 papers with code

Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning

2 code implementations • 17 Jan 2024 • Charles Dickens, Changyu Gao, Connor Pryor, Stephen Wright, Lise Getoor

We address a key challenge for neuro-symbolic (NeSy) systems by leveraging convex and bilevel optimization techniques to develop a general gradient-based framework for end-to-end neural and symbolic parameter learning.

Bilevel Optimization

Optimally Teaching a Linear Behavior Cloning Agent

no code implementations • 26 Nov 2023 • Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu

The goal of the teacher is to teach a realizable target policy to the learner using the minimum number of state demonstrations.

On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

no code implementations • 4 Sep 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak

When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using in total $O(\epsilon^{-3})$ accesses to first-order gradient oracles when the oracles are deterministic and $O(\epsilon^{-7})$ accesses when they are stochastic (noisy).

Bilevel Optimization
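
For context, the penalty approach replaces the nested bilevel problem with a single-level one. A schematic form of the reformulation (the paper's exact penalty function and conditions differ in details):

```latex
% Bilevel problem and a schematic penalty reformulation
\min_{x}\; f\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad y^{*}(x) \in \arg\min_{y}\, g(x, y)
\qquad \leadsto \qquad
\min_{x,\,y}\; f(x, y) + \sigma \Bigl( g(x, y) - \min_{z}\, g(x, z) \Bigr)
```

For large $\sigma$, approximate stationary points of the penalized objective correspond to approximate stationary points of the original bilevel problem, which is what makes purely first-order methods applicable.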

Cut your Losses with Squentropy

no code implementations • 8 Feb 2023 • Like Hui, Mikhail Belkin, Stephen Wright

We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy loss and the rescaled square loss in terms of classification accuracy.

Classification, Multi-class Classification
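
Per the paper's description, squentropy augments cross-entropy with the average squared logit over the incorrect classes. A minimal sketch, assuming that reading of the loss (the authors' exact scaling and implementation may differ):

```python
import numpy as np

def squentropy(logits, label):
    """Cross-entropy plus the average squared logit over incorrect classes."""
    z = logits - logits.max()                # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]                   # cross-entropy term
    wrong = np.arange(logits.size) != label
    sq = np.mean(logits[wrong] ** 2)         # square loss on incorrect classes
    return ce + sq
```

A practical attraction is that, unlike a rescaled square loss, this form introduces no extra hyperparameters over plain cross-entropy.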

A Fully First-Order Method for Stochastic Bilevel Optimization

no code implementations • 26 Jan 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak

Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $O(\epsilon^{-7/2})$, $O(\epsilon^{-5/2})$, and $O(\epsilon^{-3/2})$ iterations (each using $O(1)$ samples) when stochastic noise is present in both level objectives, only in the upper-level objective, or not at all (the deterministic setting), respectively.

Bilevel Optimization
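
A single-loop sketch in the spirit of F2SA, using only first-order oracles: one sequence tracks the lower-level minimizer, a second tracks the minimizer of a penalized objective, and the difference of their gradients drives the upper-level step. The step sizes, penalty schedule, and gradient callbacks below are illustrative placeholders, not the paper's exact choices:

```python
def f2sa_sketch(x, y, z, grad_f_x, grad_f_y, grad_g_x, grad_g_y,
                steps=1000, alpha=1e-2, beta=1e-2, lam=1.0, dlam=1e-3):
    """Fully first-order bilevel sketch: no Hessians or Jacobians anywhere."""
    for _ in range(steps):
        z = z - beta * grad_g_y(x, z)                           # track argmin_y g(x, y)
        y = y - beta * (lam * grad_g_y(x, y) + grad_f_y(x, y))  # track penalized argmin
        hyper = grad_f_x(x, y) + lam * (grad_g_x(x, y) - grad_g_x(x, z))
        x = x - alpha * hyper                                   # approximate hypergradient step
        lam = lam + dlam                                        # slowly grow the penalty
    return x, y, z
```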

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

no code implementations • 19 Sep 2022 • Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning.

Bilevel Optimization, Continual Learning +3

On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime

no code implementations • 6 Oct 2021 • Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright

Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime.

Overparameterization of deep ResNet: zero loss and mean-field analysis

no code implementations • 30 May 2021 • Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations.

Stochastic Learning for Sparse Discrete Markov Random Fields with Controlled Gradient Approximation Error

no code implementations • 12 May 2020 • Sinong Geng, Zhaobin Kuang, Jie Liu, Stephen Wright, David Page

We study the $L_1$-regularized maximum likelihood estimation (MLE) problem for discrete Markov random fields (MRFs), where efficient and scalable learning requires both sparse regularization and approximate inference.
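
Sparse regularization of this kind is usually handled with proximal (stochastic) gradient steps, in which the $L_1$ proximal operator is elementwise soft-thresholding. A generic sketch; the paper's actual contribution, controlling the error of the approximate gradient, is not modeled here:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_sgd(theta, approx_grad, steps=1000, step=1e-2, lam=0.1):
    """Proximal stochastic gradient for an L1-regularized objective.
    `approx_grad` stands in for an approximate-inference gradient estimate
    (e.g., from sampling), whose error the paper bounds explicitly."""
    for _ in range(steps):
        theta = soft_threshold(theta - step * approx_grad(theta), step * lam)
    return theta
```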

Computing Estimators of Dantzig Selector type via Column and Constraint Generation

1 code implementation • 18 Aug 2019 • Rahul Mazumder, Stephen Wright, Andrew Zheng

We consider a class of linear-programming-based estimators for reconstructing a sparse signal from linear measurements.

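For reference, the Dantzig selector is itself a linear program: $\min \|\beta\|_1$ subject to $\|X^\top(y - X\beta)\|_\infty \le \delta$. Below is a dense, off-the-shelf formulation, exactly the kind of monolithic solve whose poor scaling motivates the paper's column-and-constraint generation:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, delta):
    """Solve min ||b||_1 s.t. ||X^T (y - X b)||_inf <= delta as a dense LP,
    splitting b into positive and negative parts so the objective is linear."""
    n, p = X.shape
    A = X.T @ X
    c = np.ones(2 * p)                    # sum(b_pos + b_neg) = ||b||_1
    A_ub = np.block([[A, -A], [-A, A]])   # encodes |X^T y - A b| <= delta
    b_ub = np.concatenate([delta + X.T @ y, delta - X.T @ y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    return res.x[:p] - res.x[p:]
```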

Convergence and Margin of Adversarial Training on Separable Data

no code implementations • 22 May 2019 • Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos

Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at a $\mathcal{O}(\ln(t)^2/t)$ rate, despite non-smoothness.
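
For linear models under $\ell_\infty$-bounded perturbations, the inner maximization of adversarial training has a closed form, so the robust empirical risk can be minimized directly by gradient updates. A sketch of this setting, with logistic loss chosen for illustration (the paper's analysis covers a broader family of losses):

```python
import numpy as np

def robust_logistic_loss(w, X, y, eps):
    """Robust risk for a linear classifier with labels y in {-1, +1}:
    max over ||d||_inf <= eps of loss(y * w @ (x + d)) equals
    loss(y * w @ x - eps * ||w||_1), so the adversary only shifts margins."""
    margins = y * (X @ w) - eps * np.linalg.norm(w, 1)
    return np.mean(np.logaddexp(0.0, -margins))  # mean log(1 + e^{-margin})
```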

Bilinear Bandits with Low-rank Structure

no code implementations • 8 Jan 2019 • Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak

We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms.
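
A toy simulation of the reward model described above; the dimensions, rank, and noise level are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 8, 6, 2
Theta = rng.normal(size=(d1, r)) @ rng.normal(size=(r, d2))  # unknown low-rank matrix

def pull(x, z, noise=0.1):
    """Reward for playing the arm pair (x, z): bilinear in the known features."""
    return x @ Theta @ z + noise * rng.normal()
```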

Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs

no code implementations • ICML 2018 • Bin Hu, Stephen Wright, Laurent Lessard

Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems.
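
Schematically, the dissipativity viewpoint posits a quadratic storage function and a supply rate, and certifies a contraction inequality by checking feasibility of a small semidefinite program. In generic notation (not the paper's exact statement):

```latex
\mathbb{E}\,V(\xi_{k+1}) \;\le\; \rho^{2}\,\mathbb{E}\,V(\xi_{k}) \,+\, \mathbb{E}\,S(\xi_{k}, w_{k}),
\qquad V(\xi) = \xi^{\top} P \xi, \quad P \succeq 0
```

Whenever the expected supply $\mathbb{E}\,S$ can be shown to be nonpositive, the storage function, and hence the optimality gap, contracts linearly at rate $\rho^{2}$.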

Blended Conditional Gradients: the unconditioning of conditional gradients

2 code implementations • 18 May 2018 • Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright

We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P. It combines the Frank-Wolfe algorithm (also called conditional gradient) with gradient-based steps that differ from away steps and pairwise steps, yet still achieves linear convergence for strongly convex functions, along with good practical performance.
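
For reference, plain Frank-Wolfe touches the polytope only through a linear minimization oracle; the blended method interleaves such steps with gradient-based steps over the vertices discovered so far. A sketch of the baseline, where `lmo` is an assumed user-supplied oracle returning the vertex minimizing the linear function:

```python
import numpy as np

def frank_wolfe(x0, grad, lmo, steps=200):
    """Plain conditional gradient over a polytope P. BCG augments this loop
    with descent steps over the convex hull of previously found vertices,
    which is what recovers linear convergence for strongly convex objectives."""
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        v = lmo(grad(x))          # linear minimization oracle call
        gamma = 2.0 / (t + 2)     # standard open-loop step size
        x = x + gamma * (v - x)   # convex combination keeps x inside P
    return x
```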

k-Support and Ordered Weighted Sparsity for Overlapping Groups: Hardness and Algorithms

no code implementations • NeurIPS 2017 • Cong Han Lim, Stephen Wright

We study the norms obtained from extending the k-support norm and OWL norms to the setting in which there are overlapping groups.

Online Learning for Changing Environments using Coin Betting

no code implementations • 6 Nov 2017 • Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett

A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments.

Improved Strongly Adaptive Online Learning using Coin Betting

no code implementations • 14 Oct 2016 • Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright

This paper describes a new parameter-free online learning algorithm for changing environments.

Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems

no code implementations • NeurIPS 2014 • Cong Han Lim, Stephen Wright

Using a recent construction of Goemans (2010), we show that when optimizing over the convex hull of the permutation vectors (the permutahedron), we can reduce the number of variables and constraints to $\Theta(n \log n)$ in theory and $\Theta(n \log^2 n)$ in practice.

Forward-Backward Greedy Algorithms for Atomic Norm Regularization

no code implementations • 23 Apr 2014 • Nikhil Rao, Parikshit Shah, Stephen Wright

CoGEnT combines a greedy selection scheme based on the conditional gradient approach with a backward (or "truncation") step that exploits the quadratic nature of the objective to reduce the basis size.
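
A toy forward-backward loop in this spirit, for least squares over a finite dictionary of atoms: the forward step adds the atom most correlated with the residual (the conditional-gradient direction), and the backward step drops an atom whenever refitting without it barely increases the loss. The threshold and refitting rule are illustrative, not CoGEnT's actual tests:

```python
import numpy as np

def forward_backward_sketch(y, atoms, steps=50, tol=1e-8):
    """Greedy atom selection with a truncation step; `atoms` has one atom
    per column, and coefficients are refit by least squares on the support."""
    support, c = [], np.zeros(0)
    for _ in range(steps):
        r = y - (atoms[:, support] @ c if support else 0.0)
        support = sorted(set(support) | {int(np.argmax(np.abs(atoms.T @ r)))})
        c, *_ = np.linalg.lstsq(atoms[:, support], y, rcond=None)
        if len(support) > 1:                        # backward (truncation) step
            k = support[int(np.argmin(np.abs(c)))]  # tentatively drop weakest atom
            S = [s for s in support if s != k]
            cS, *_ = np.linalg.lstsq(atoms[:, S], y, rcond=None)
            if (np.sum((y - atoms[:, S] @ cS) ** 2)
                    - np.sum((y - atoms[:, support] @ c) ** 2)) < tol:
                support, c = S, cS
    return support, c
```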

Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

no code implementations • NeurIPS 2011 • Benjamin Recht, Christopher Re, Stephen Wright, Feng Niu

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks.
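
A minimal lock-free sketch of the idea: several threads apply sparse SGD updates to a shared parameter vector with no synchronization at all. Python threads serialize on the GIL, so this only illustrates the access pattern; Hogwild's analysis assumes atomic component-wise updates and mostly non-overlapping (sparse) writes:

```python
import numpy as np
from threading import Thread

rng = np.random.default_rng(0)
n, d = 10_000, 100
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < 0.05)  # sparse features
w_true = rng.normal(size=d)
y = X @ w_true
w = np.zeros(d)                            # shared weights, deliberately unlocked

def worker(rows, lr=0.05, epochs=5):
    for _ in range(epochs):
        for i in rows:
            nz = X[i] != 0                 # touch only the coordinates this
            err = X[i, nz] @ w[nz] - y[i]  # example actually uses
            w[nz] -= lr * err * X[i, nz]

threads = [Thread(target=worker, args=(range(k, n, 4),)) for k in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("parameter error:", np.linalg.norm(w - w_true))
```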
