Search Results for author: Stephen Wright

Found 22 papers, 4 papers with code

Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning

2 code implementations • 17 Jan 2024 • Charles Dickens, Changyu Gao, Connor Pryor, Stephen Wright, Lise Getoor

We address a key challenge for neuro-symbolic (NeSy) systems by leveraging convex and bilevel optimization techniques to develop a general gradient-based framework for end-to-end neural and symbolic parameter learning.

Bilevel Optimization

Optimally Teaching a Linear Behavior Cloning Agent

no code implementations • 26 Nov 2023 • Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu

The goal of the teacher is to teach a realizable target policy to the learner using the minimum number of state demonstrations.

On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

no code implementations • 4 Sep 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak

When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using in total $O(\epsilon^{-3})$ accesses to first-order gradient oracles when the oracles are deterministic and $O(\epsilon^{-7})$ accesses when they are stochastic (noisy).

Bilevel Optimization
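
For context, the penalty approach replaces the nested bilevel problem with a single-level one. A schematic form of the reformulation (the paper's exact penalty function and conditions differ in details):

```latex
% Bilevel problem and a schematic penalty reformulation
\min_{x}\; f\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad y^{*}(x) \in \arg\min_{y}\, g(x, y)
\qquad \leadsto \qquad
\min_{x,\,y}\; f(x, y) + \sigma \Bigl( g(x, y) - \min_{z}\, g(x, z) \Bigr)
```

For large $\sigma$, approximate stationary points of the penalized objective correspond to approximate stationary points of the original bilevel problem, which is what makes purely first-order methods applicable.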

Cut your Losses with Squentropy

no code implementations • 8 Feb 2023 • Like Hui, Mikhail Belkin, Stephen Wright

We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy loss and the rescaled square loss in terms of classification accuracy.

Classification, Multi-class Classification
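
Per the paper's description, squentropy augments cross-entropy with the average squared logit over the incorrect classes. A minimal sketch, assuming that reading of the loss (the authors' exact scaling and implementation may differ):

```python
import numpy as np

def squentropy(logits, label):
    """Cross-entropy plus the average squared logit over incorrect classes."""
    z = logits - logits.max()                # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]                   # cross-entropy term
    wrong = np.arange(logits.size) != label
    sq = np.mean(logits[wrong] ** 2)         # square loss on incorrect classes
    return ce + sq
```

A practical attraction is that, unlike a rescaled square loss, this form introduces no extra hyperparameters over plain cross-entropy.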

A Fully First-Order Method for Stochastic Bilevel Optimization

no code implementations • 26 Jan 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak

Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $O(\epsilon^{-7/2})$, $O(\epsilon^{-5/2})$, and $O(\epsilon^{-3/2})$ iterations (each using $O(1)$ samples) when stochastic noise is present in both level objectives, only in the upper-level objective, or not at all (the deterministic setting), respectively.

Bilevel Optimization
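
A single-loop sketch in the spirit of F2SA, using only first-order oracles: one sequence tracks the lower-level minimizer, a second tracks the minimizer of a penalized objective, and the difference of their gradients drives the upper-level step. The step sizes, penalty schedule, and gradient callbacks below are illustrative placeholders, not the paper's exact choices:

```python
def f2sa_sketch(x, y, z, grad_f_x, grad_f_y, grad_g_x, grad_g_y,
                steps=1000, alpha=1e-2, beta=1e-2, lam=1.0, dlam=1e-3):
    """Fully first-order bilevel sketch: no Hessians or Jacobians anywhere."""
    for _ in range(steps):
        z = z - beta * grad_g_y(x, z)                           # track argmin_y g(x, y)
        y = y - beta * (lam * grad_g_y(x, y) + grad_f_y(x, y))  # track penalized argmin
        hyper = grad_f_x(x, y) + lam * (grad_g_x(x, y) - grad_g_x(x, z))
        x = x - alpha * hyper                                   # approximate hypergradient step
        lam = lam + dlam                                        # slowly grow the penalty
    return x, y, z
```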

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

no code implementations • 19 Sep 2022 • Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning.

Bilevel Optimization, Continual Learning +3

On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime

no code implementations • 6 Oct 2021 • Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright

Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime.

Overparameterization of deep ResNet: zero loss and mean-field analysis

no code implementations • 30 May 2021 • Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations.

Stochastic Learning for Sparse Discrete Markov Random Fields with Controlled Gradient Approximation Error

no code implementations • 12 May 2020 • Sinong Geng, Zhaobin Kuang, Jie Liu, Stephen Wright, David Page

We study the $L_1$-regularized maximum likelihood estimation (MLE) problem for discrete Markov random fields (MRFs), where efficient and scalable learning requires both sparse regularization and approximate inference.
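
Sparse regularization of this kind is usually handled with proximal (stochastic) gradient steps, in which the $L_1$ proximal operator is elementwise soft-thresholding. A generic sketch; the paper's actual contribution, controlling the error of the approximate gradient, is not modeled here:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_sgd(theta, approx_grad, steps=1000, step=1e-2, lam=0.1):
    """Proximal stochastic gradient for an L1-regularized objective.
    `approx_grad` stands in for an approximate-inference gradient estimate
    (e.g., from sampling), whose error the paper bounds explicitly."""
    for _ in range(steps):
        theta = soft_threshold(theta - step * approx_grad(theta), step * lam)
    return theta
```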

Computing Estimators of Dantzig Selector type via Column and Constraint Generation

1 code implementation • 18 Aug 2019 • Rahul Mazumder, Stephen Wright, Andrew Zheng

We consider a class of linear-programming-based estimators for reconstructing a sparse signal from linear measurements.

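For reference, the Dantzig selector is itself a linear program: $\min \|\beta\|_1$ subject to $\|X^\top(y - X\beta)\|_\infty \le \delta$. Below is a dense, off-the-shelf formulation, exactly the kind of monolithic solve whose poor scaling motivates the paper's column-and-constraint generation:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, delta):
    """Solve min ||b||_1 s.t. ||X^T (y - X b)||_inf <= delta as a dense LP,
    splitting b into positive and negative parts so the objective is linear."""
    n, p = X.shape
    A = X.T @ X
    c = np.ones(2 * p)                    # sum(b_pos + b_neg) = ||b||_1
    A_ub = np.block([[A, -A], [-A, A]])   # encodes |X^T y - A b| <= delta
    b_ub = np.concatenate([delta + X.T @ y, delta - X.T @ y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    return res.x[:p] - res.x[p:]
```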

Convergence and Margin of Adversarial Training on Separable Data

no code implementations • 22 May 2019 • Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos

Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at a $\mathcal{O}(\ln(t)^2/t)$ rate, despite non-smoothness.
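
For linear models under $\ell_\infty$-bounded perturbations, the inner maximization of adversarial training has a closed form, so the robust empirical risk can be minimized directly by gradient updates. A sketch of this setting, with logistic loss chosen for illustration (the paper's analysis covers a broader family of losses):

```python
import numpy as np

def robust_logistic_loss(w, X, y, eps):
    """Robust risk for a linear classifier with labels y in {-1, +1}:
    max over ||d||_inf <= eps of loss(y * w @ (x + d)) equals
    loss(y * w @ x - eps * ||w||_1), so the adversary only shifts margins."""
    margins = y * (X @ w) - eps * np.linalg.norm(w, 1)
    return np.mean(np.logaddexp(0.0, -margins))  # mean log(1 + e^{-margin})
```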

Bilinear Bandits with Low-rank Structure

no code implementations • 8 Jan 2019 • Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak

We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms.
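
A toy simulation of the reward model described above; the dimensions, rank, and noise level are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 8, 6, 2
Theta = rng.normal(size=(d1, r)) @ rng.normal(size=(r, d2))  # unknown low-rank matrix

def pull(x, z, noise=0.1):
    """Reward for playing the arm pair (x, z): bilinear in the known features."""
    return x @ Theta @ z + noise * rng.normal()
```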

Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs

no code implementations • ICML 2018 • Bin Hu, Stephen Wright, Laurent Lessard

Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems.
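
Schematically, the dissipativity viewpoint posits a quadratic storage function and a supply rate, and certifies a contraction inequality by checking feasibility of a small semidefinite program. In generic notation (not the paper's exact statement):

```latex
\mathbb{E}\,V(\xi_{k+1}) \;\le\; \rho^{2}\,\mathbb{E}\,V(\xi_{k}) \,+\, \mathbb{E}\,S(\xi_{k}, w_{k}),
\qquad V(\xi) = \xi^{\top} P \xi, \quad P \succeq 0
```

Whenever the expected supply $\mathbb{E}\,S$ can be shown to be nonpositive, the storage function, and hence the optimality gap, contracts linearly at rate $\rho^{2}$.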

Blended Conditional Gradients: the unconditioning of conditional gradients

2 code implementations • 18 May 2018 • Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright

We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P. It combines the Frank-Wolfe algorithm (also called conditional gradient) with gradient-based steps that differ from away steps and pairwise steps, yet still achieves linear convergence for strongly convex functions, along with good practical performance.
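
For reference, plain Frank-Wolfe touches the polytope only through a linear minimization oracle; the blended method interleaves such steps with gradient-based steps over the vertices discovered so far. A sketch of the baseline, where `lmo` is an assumed user-supplied oracle returning the vertex minimizing the linear function:

```python
import numpy as np

def frank_wolfe(x0, grad, lmo, steps=200):
    """Plain conditional gradient over a polytope P. BCG augments this loop
    with descent steps over the convex hull of previously found vertices,
    which is what recovers linear convergence for strongly convex objectives."""
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        v = lmo(grad(x))          # linear minimization oracle call
        gamma = 2.0 / (t + 2)     # standard open-loop step size
        x = x + gamma * (v - x)   # convex combination keeps x inside P
    return x
```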

k-Support and Ordered Weighted Sparsity for Overlapping Groups: Hardness and Algorithms

no code implementations • NeurIPS 2017 • Cong Han Lim, Stephen Wright

We study the norms obtained from extending the k-support norm and OWL norms to the setting in which there are overlapping groups.

Online Learning for Changing Environments using Coin Betting

no code implementations • 6 Nov 2017 • Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett

A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments.

Improved Strongly Adaptive Online Learning using Coin Betting

no code implementations • 14 Oct 2016 • Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright

This paper describes a new parameter-free online learning algorithm for changing environments.

Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems

no code implementations • NeurIPS 2014 • Cong Han Lim, Stephen Wright

Using a recent construction of Goemans (2010), we show that when optimizing over the convex hull of the permutation vectors (the permutahedron), we can reduce the number of variables and constraints to $\Theta(n \log n)$ in theory and $\Theta(n \log^2 n)$ in practice.

Forward-Backward Greedy Algorithms for Atomic Norm Regularization

no code implementations • 23 Apr 2014 • Nikhil Rao, Parikshit Shah, Stephen Wright

CoGEnT combines a greedy selection scheme based on the conditional gradient approach with a backward (or "truncation") step that exploits the quadratic nature of the objective to reduce the basis size.
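
A toy forward-backward loop in this spirit, for least squares over a finite dictionary of atoms: the forward step adds the atom most correlated with the residual (the conditional-gradient direction), and the backward step drops an atom whenever refitting without it barely increases the loss. The threshold and refitting rule are illustrative, not CoGEnT's actual tests:

```python
import numpy as np

def forward_backward_sketch(y, atoms, steps=50, tol=1e-8):
    """Greedy atom selection with a truncation step; `atoms` has one atom
    per column, and coefficients are refit by least squares on the support."""
    support, c = [], np.zeros(0)
    for _ in range(steps):
        r = y - (atoms[:, support] @ c if support else 0.0)
        support = sorted(set(support) | {int(np.argmax(np.abs(atoms.T @ r)))})
        c, *_ = np.linalg.lstsq(atoms[:, support], y, rcond=None)
        if len(support) > 1:                        # backward (truncation) step
            k = support[int(np.argmin(np.abs(c)))]  # tentatively drop weakest atom
            S = [s for s in support if s != k]
            cS, *_ = np.linalg.lstsq(atoms[:, S], y, rcond=None)
            if (np.sum((y - atoms[:, S] @ cS) ** 2)
                    - np.sum((y - atoms[:, support] @ c) ** 2)) < tol:
                support, c = S, cS
    return support, c
```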

Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

no code implementations • NeurIPS 2011 • Benjamin Recht, Christopher Re, Stephen Wright, Feng Niu

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks.
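
A minimal lock-free sketch of the idea: several threads apply sparse SGD updates to a shared parameter vector with no synchronization at all. Python threads serialize on the GIL, so this only illustrates the access pattern; Hogwild's analysis assumes atomic component-wise updates and mostly non-overlapping (sparse) writes:

```python
import numpy as np
from threading import Thread

rng = np.random.default_rng(0)
n, d = 10_000, 100
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < 0.05)  # sparse features
w_true = rng.normal(size=d)
y = X @ w_true
w = np.zeros(d)                            # shared weights, deliberately unlocked

def worker(rows, lr=0.05, epochs=5):
    for _ in range(epochs):
        for i in rows:
            nz = X[i] != 0                 # touch only the coordinates this
            err = X[i, nz] @ w[nz] - y[i]  # example actually uses
            w[nz] -= lr * err * X[i, nz]

threads = [Thread(target=worker, args=(range(k, n, 4),)) for k in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("parameter error:", np.linalg.norm(w - w_true))
```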
