no code implementations • 22 Apr 2024 • Yujin Han, Difan Zou
GIC trains a spurious attribute classifier based on two key properties of spurious correlations: (1) high correlation between spurious attributes and true labels, and (2) variability in this correlation between datasets with different group distributions.
no code implementations • 18 Apr 2024 • Kun Zhai, Yifeng Gao, Xingjun Ma, Difan Zou, Guangnan Ye, Yu-Gang Jiang
In this paper, we study the convergence of FL on non-IID data and propose a novel \emph{Dog Walking Theory} to formulate and identify the missing element in existing research.
no code implementations • 2 Apr 2024 • Xingwu Chen, Difan Zou
Specifically, we design a novel set of sequence learning tasks to systematically evaluate and understand how the depth of a transformer affects its ability to perform memorization, reasoning, generalization, and contextual generalization.
no code implementations • 26 Mar 2024 • Yifan Hao, Yong Lin, Difan Zou, Tong Zhang
We demonstrate that in this scenario, further increasing the model's parameterization can significantly reduce the OOD loss.
no code implementations • 13 Mar 2024 • Junwei Su, Difan Zou, Chuan Wu
In this paper, we study the generalization performance of SGD with preconditioning for the least squares problem.
no code implementations • 10 Mar 2024 • Xunpeng Huang, Hanze Dong, Difan Zou, Tong Zhang
Along this line, Freund et al. (2022) suggest that the modified Langevin algorithm with prior diffusion converges with a dimension-independent rate for strongly log-concave target distributions.
no code implementations • 20 Feb 2024 • Junwei Su, Difan Zou, Zijun Zhang, Chuan Wu
We provide a formal formulation and analysis of the problem, and propose a novel regularization-based technique, Structural-Shift-Risk-Mitigation (SSRM), to mitigate the impact of structural shift on catastrophic forgetting in the inductive NGIL problem.
no code implementations • 6 Feb 2024 • Junwei Su, Difan Zou, Chuan Wu
Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and memorize long-term temporal dependencies, leading to superior performance compared to memory-less counterparts.
no code implementations • 12 Jan 2024 • Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, Tong Zhang
Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution into the standard Gaussian, using non-parametric score estimation.
no code implementations • 26 Oct 2023 • Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou
In view of this finding, we call such a phenomenon "benign oscillation".
no code implementations • 12 Oct 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters.
no code implementations • 5 Oct 2023 • Xu Luo, Difan Zou, Lianli Gao, Zenglin Xu, Jingkuan Song
Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model.
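Linear probing as described here has a simple form: freeze the pretrained backbone and train only a linear head on its features. The sketch below is illustrative, not from the paper; the random ReLU projection stands in for a real pretrained backbone, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(x, W):
    # stand-in for a pretrained backbone: a fixed random projection + ReLU
    return np.maximum(x @ W, 0.0)

# toy target data: two Gaussian classes in 2-D
n = 200
x = np.vstack([rng.normal(-1, 1, (n, 2)), rng.normal(1, 1, (n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

W = rng.normal(size=(2, 16))        # backbone weights stay frozen
z = frozen_features(x, W)           # "extracted" frozen features

# train only the linear head with gradient descent on the logistic loss
w, b = np.zeros(z.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
    g = p - y
    w -= 0.1 * z.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = ((z @ w + b > 0) == (y == 1)).mean()
```

Only `w` and `b` are updated; the backbone weights `W` never change, which is what makes linear probing cheap relative to full fine-tuning.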
no code implementations • 3 Oct 2023 • Xuran Meng, Difan Zou, Yuan Cao
Modern deep learning models are usually highly over-parameterized so that they can overfit the training data.
no code implementations • 20 Jun 2023 • Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu
We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
no code implementations • 31 Mar 2023 • Xuran Meng, Yuan Cao, Difan Zou
In this paper, we explore the per-example gradient regularization (PEGR) and present a theoretical analysis that demonstrates its effectiveness in improving both test error and robustness against noise perturbations.
no code implementations • 15 Mar 2023 • Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu
We consider a feature-noise data model and show that Mixup training can effectively learn the rare features (appearing in a small fraction of data) from its mixture with the common features (appearing in a large fraction of data).
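For reference, Mixup training replaces each batch with convex combinations of random example pairs, mixing labels with the same coefficient. This is a minimal sketch of the standard Mixup operation (Zhang et al.), not the paper's feature-noise analysis; the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y, alpha=0.2):
    """Mix random example pairs: x_mix = lam*x_i + (1-lam)*x_j,
    with labels mixed by the same lam drawn from Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    return x_mix, y_mix

x = rng.normal(size=(8, 4))
y = np.eye(2)[rng.integers(0, 2, size=8)]   # one-hot labels
x_mix, y_mix = mixup_batch(x, y)
```

Because each mixed label is a convex combination of one-hot vectors, every row of `y_mix` still sums to one.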
no code implementations • 3 Mar 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation.
no code implementations • 3 Aug 2022 • Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data.
no code implementations • 7 Mar 2022 • Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization.
no code implementations • 12 Oct 2021 • Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
In this paper, we provide a problem-dependent analysis on the last iterate risk bounds of SGD with decaying stepsize, for (overparameterized) linear regression problems.
no code implementations • 25 Aug 2021 • Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu
In this paper, we provide a theoretical explanation for this phenomenon: we show that in the nonconvex setting of learning over-parameterized two-layer convolutional neural networks starting from the same random initialization, for a class of data distributions (inspired from image data), Adam and gradient descent (GD) can converge to different global solutions of the training objective with provably different generalization errors, even with weight decay regularization.
no code implementations • NeurIPS 2021 • Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches.
no code implementations • 25 Jun 2021 • Spencer Frei, Difan Zou, Zixiang Chen, Quanquan Gu
We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension.
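The self-training loop in this result is easy to state concretely: pseudolabel the unlabeled data with the current direction, then refit the direction to the pseudolabels. The sketch below is a toy illustration under assumed Gaussian-mixture data, not the paper's exact algorithm or analysis; the refit step is a simple averaging update rather than the paper's iteration.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy Gaussian mixture: x ~ N(y * mu, I), labels y in {-1, +1}
d, n = 5, 2000
mu = np.ones(d) / np.sqrt(d)                # unit-norm Bayes direction
y = rng.choice([-1.0, 1.0], size=n)
x = y[:, None] * mu + rng.normal(size=(n, d))

# a weak pseudolabeler: the Bayes direction plus a fixed perturbation
beta = mu + np.array([0.8, -0.6, 0.0, 0.0, 0.3])

# iterative self-training: pseudolabel, then refit the direction
for _ in range(10):
    y_hat = np.sign(x @ beta)                    # pseudolabels sgn(<beta_t, x>)
    beta = (y_hat[:, None] * x).mean(axis=0)     # refit to pseudolabels
    beta /= np.linalg.norm(beta)

align = beta @ mu   # alignment with the Bayes-optimal direction
```

In this toy setting the iteration pulls the direction toward `mu`, mirroring the qualitative claim that a moderately accurate pseudolabeler suffices to approach the Bayes-optimal classifier.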
no code implementations • 19 Apr 2021 • Difan Zou, Spencer Frei, Quanquan Gu
To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
no code implementations • 23 Mar 2021 • Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
More specifically, for SGD with iterate averaging, we demonstrate the sharpness of the established excess risk bound by proving a matching lower bound (up to constant factors).
no code implementations • ICLR 2021 • Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
Understanding the algorithmic bias of \emph{stochastic gradient descent} (SGD) is one of the key challenges in modern machine learning and deep learning theory.
no code implementations • 19 Oct 2020 • Difan Zou, Pan Xu, Quanquan Gu
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
1 code implementation • ICLR 2020 • Yisen Wang, Difan Zou, Jin-Feng Yi, James Bailey, Xingjun Ma, Quanquan Gu
In this paper, we investigate the distinctive influence of misclassified and correctly classified examples on the final robustness of adversarial training.
no code implementations • ICLR 2020 • Difan Zou, Philip M. Long, Quanquan Gu
We further propose modified identity input and output transformations, and show that a $(d+k)$-wide neural network is sufficient to guarantee the global convergence of GD/SGD, where $d, k$ are the input and output dimensions respectively.
1 code implementation • NeurIPS 2019 • Difan Zou, Pan Xu, Quanquan Gu
Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice.
no code implementations • ICLR 2021 • Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu
A recent line of research on deep learning focuses on the extremely over-parameterized setting, and shows that when the network width is larger than a high degree polynomial of the training sample size $n$ and the inverse of the target error $\epsilon^{-1}$, deep neural networks learned by (stochastic) gradient descent enjoy nice optimization and generalization guarantees.
1 code implementation • NeurIPS 2019 • Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, Quanquan Gu
Original full-batch GCN training requires calculating the representation of all the nodes in the graph per GCN layer, which brings in high computation and memory costs.
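The per-layer cost mentioned here comes from the full-batch GCN update $H' = \sigma(\hat{A} H W)$, which touches every node in the graph at every layer. A minimal sketch of one such layer (standard GCN with symmetric normalization, not the paper's sampling method; all sizes are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# small undirected path graph
n, d_in, d_out = 6, 4, 3
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

# symmetric normalization with self-loops: A_hat = D^{-1/2} (A + I) D^{-1/2}
A_tilde = A + np.eye(n)
deg = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(deg, deg))

H = rng.normal(size=(n, d_in))              # features for *all* n nodes
W = rng.normal(size=(d_in, d_out))
H_next = np.maximum(A_hat @ H @ W, 0.0)     # ReLU(A_hat H W), full batch
```

The `A_hat @ H` product is what couples every node to its neighbors, so stacking layers multiplies this whole-graph cost, which motivates sampling-based training.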
1 code implementation • 2 Nov 2019 • Bao Wang, Difan Zou, Quanquan Gu, Stanley Osher
As an important Markov Chain Monte Carlo (MCMC) method, stochastic gradient Langevin dynamics (SGLD) algorithm has achieved great success in Bayesian learning and posterior sampling.
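The SGLD update itself is a single line: a gradient step on the log-density plus injected Gaussian noise scaled by $\sqrt{2\eta}$. A minimal sketch of the standard algorithm (not this paper's variant), targeting a 1-D standard Gaussian where the score is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x):
    # target: standard Gaussian, so grad log p(x) = -x
    return -x

eta, steps = 0.01, 20000
x = 5.0                                  # start far from the mode
trace = []
for _ in range(steps):
    # Langevin step: gradient drift + sqrt(2*eta) Gaussian noise
    x += eta * grad_log_p(x) + np.sqrt(2 * eta) * rng.normal()
    trace.append(x)

samples = np.array(trace[5000:])         # discard burn-in
```

In SGLD proper, `grad_log_p` is replaced by a stochastic minibatch estimate of the gradient, which is the source of the extra analysis the convergence results address.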
no code implementations • NeurIPS 2019 • Difan Zou, Quanquan Gu
A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks.
no code implementations • 21 Nov 2018 • Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu
In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data.
no code implementations • ICML 2018 • Difan Zou, Pan Xu, Quanquan Gu
We propose a fast stochastic Hamiltonian Monte Carlo (HMC) method for sampling from a smooth and strongly log-concave distribution.
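For context, one HMC proposal simulates Hamiltonian dynamics with a leapfrog integrator and then applies a Metropolis accept/reject on the total energy. This sketches plain HMC on a 1-D standard Gaussian (so $U(q) = q^2/2$), not the paper's stochastic variant; step size and trajectory length are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(q):              # U(q) = q^2 / 2  =>  grad U(q) = q
    return q

def leapfrog(q, p, eps, L):
    # symmetric half-step / full-step / half-step integration
    p = p - 0.5 * eps * grad_U(q)
    for _ in range(L - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

trace, q = [], 0.0
for _ in range(5000):
    p0 = rng.normal()                            # resample momentum
    q_new, p_new = leapfrog(q, p0, eps=0.2, L=10)
    # Metropolis correction on total energy H = U(q) + p^2/2
    dH = (q_new**2 + p_new**2 - q**2 - p0**2) / 2
    if rng.random() < np.exp(-dH):
        q = q_new
    trace.append(q)

samples = np.array(trace[1000:])
```

The stochastic variants replace `grad_U` with minibatch gradient estimates, trading per-step cost for extra noise that the convergence analysis must control.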
no code implementations • 11 Dec 2017 • Yaodong Yu, Difan Zou, Quanquan Gu
We propose a family of nonconvex optimization algorithms that are able to save gradient and negative curvature computations to a large extent, and are guaranteed to find an approximate local minimum with improved runtime complexity.
no code implementations • NeurIPS 2018 • Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu
Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (SVRG-LD) to the almost minimizer within $\tilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime.