no code implementations • 24 Jan 2024 • Mingyang Yi, Bohan Wang
In this paper, we aim to enrich the continuous optimization methods in the Wasserstein space by extending the gradient flow into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow.
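For orientation, here is a minimal sketch of classical SVRG in Euclidean space, the algorithm whose flow the paper lifts to the Wasserstein setting; the toy least-squares objective, step size, and epoch length are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Classical SVRG (Johnson & Zhang, 2013) on a toy least-squares problem.
# The paper studies the analogue of this flow in Wasserstein space; the
# objective, step size, and epoch length here are illustrative only.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)

def grad_i(x, i):  # gradient of the i-th summand 0.5*(a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    return A.T @ (A @ x - b) / len(b)

x = np.zeros(5)
lr, epochs, m = 0.01, 30, len(b)
for _ in range(epochs):
    x_ref, g_ref = x.copy(), full_grad(x)  # snapshot and its full gradient
    for _ in range(m):
        i = rng.integers(len(b))
        # variance-reduced stochastic gradient
        v = grad_i(x, i) - grad_i(x_ref, i) + g_ref
        x -= lr * v
print(np.linalg.norm(full_grad(x)))  # should be close to 0
```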
1 code implementation • NeurIPS 2023 • Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
Based on our analysis, we propose SA-Solver, which is an improved efficient stochastic Adams method for solving diffusion SDE to generate data with high quality.
Ranked #12 on Image Generation on ImageNet 512x512
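SA-Solver's exact stochastic multistep update is not reproduced here; as a point of reference, the sketch below is a plain Euler–Maruyama discretization of a reverse-time VP diffusion SDE, the kind of single-step stochastic solver that a stochastic Adams method improves on. The score function, linear noise schedule, and step count are placeholder assumptions.

```python
import numpy as np

# Hedged baseline: Euler-Maruyama on the reverse-time VP diffusion SDE.
# `score_fn`, the linear beta schedule, and the step count are assumed
# placeholders, not the paper's configuration.
def beta(t):
    return 0.1 + (20.0 - 0.1) * t  # linear noise schedule (assumed)

def sample(score_fn, dim, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)  # start from the prior at t = 1
    dt = 1.0 / steps
    for k in range(steps, 0, -1):
        t = k / steps
        drift = -0.5 * beta(t) * x - beta(t) * score_fn(x, t)
        # reverse-time step: deterministic drift plus injected noise
        x = x - drift * dt + np.sqrt(beta(t) * dt) * rng.normal(size=dim)
    return x

# Toy usage with the score of a standard normal, which is -x:
print(sample(lambda x, t: -x, dim=4))
```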
no code implementations • 24 May 2023 • Mingyang Yi, Jiacheng Sun, Zhenguo Li
To understand this contradiction, we empirically verify the difference between the sufficiently trained diffusion model and the empirical optima.
no code implementations • 14 Jul 2022 • Mingyang Yi, Ruoyu Wang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
The correlation shift is caused by spurious attributes that correlate with the class label, as the correlation between them may differ between training and test data.
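A toy construction of correlation shift, with assumed numbers: the spurious attribute agrees with the label 90% of the time at training but only 10% at test, so a classifier relying on it collapses under the shift.

```python
import numpy as np

# Toy correlation shift: the spurious attribute agrees with the label
# with probability p, and p differs between train and test (values assumed).
def make_split(n, p_agree, rng):
    y = rng.integers(0, 2, size=n)                   # class label
    flip = rng.random(n) > p_agree
    s = np.where(flip, 1 - y, y)                     # spurious attribute
    x = np.stack([y + 0.1 * rng.normal(size=n),      # causal feature
                  s + 0.1 * rng.normal(size=n)], axis=1)
    return x, y

rng = np.random.default_rng(0)
x_tr, y_tr = make_split(5000, p_agree=0.9, rng=rng)  # cue mostly right
x_te, y_te = make_split(5000, p_agree=0.1, rng=rng)  # correlation reversed
# A predictor thresholding only the spurious column does well on train...
acc = lambda x, y: ((x[:, 1] > 0.5) == y).mean()
print(acc(x_tr, y_tr), acc(x_te, y_te))  # ~0.9 vs ~0.1
```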
no code implementations • CVPR 2022 • Ruoyu Wang, Mingyang Yi, Zhitang Chen, Shengyu Zhu
In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature.
1 code implementation • 1 Nov 2021 • Weiran Huang, Mingyang Yi, Xuyang Zhao, Zihao Jiang
It reveals that the generalization ability of contrastive self-supervised learning is related to three key factors: alignment of positive samples, divergence of class centers, and concentration of augmented data.
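To make two of these factors concrete, the sketch below estimates alignment and class-center divergence on random toy embeddings; these are illustrative formulas, and the paper's precise definitions may differ.

```python
import numpy as np

# Illustrative estimates of two of the three factors on toy embeddings;
# the exact definitions in the paper may differ from these.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(256, 128))           # embeddings of anchors
z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = z1 + 0.1 * rng.normal(size=z1.shape)  # embeddings of their positives
z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
labels = rng.integers(0, 10, size=256)

# Alignment: positive pairs should map close together.
alignment = np.mean(np.sum((z1 - z2) ** 2, axis=1))

# Divergence: mean embeddings of different classes should stay far apart.
centers = np.stack([z1[labels == c].mean(0) for c in range(10)])
dists = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
divergence = dists[~np.eye(10, dtype=bool)].mean()

print(alignment, divergence)
```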
no code implementations • 29 Sep 2021 • Ruoyu Wang, Mingyang Yi, Shengyu Zhu, Zhitang Chen
In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature.
no code implementations • 24 May 2021 • Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.
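One hedged way to see the connection, assuming an $\infty$-Wasserstein bound between the training distribution $P$ and the test distribution $Q$ (a coupling sketch, not the paper's exact theorem):

```latex
% Sketch: if the test distribution Q stays within infinity-Wasserstein
% radius rho of the training distribution P, a coupling argument gives
\[
W_\infty(P, Q) \le \rho
\;\Longrightarrow\;
\mathbb{E}_{x \sim Q}\big[\ell(f(x))\big]
\le \mathbb{E}_{x \sim P}\Big[\max_{\|\delta\| \le \rho} \ell\big(f(x+\delta)\big)\Big],
\]
% so a model whose loss is robust to perturbations of size rho on P
% also controls its loss on every such Q.
```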
1 code implementation • ICLR 2021 • Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).
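A hedged sketch of this reweighting idea: give each of the $K$ augmented copies of an example a weight that grows with its loss. The softmax form and the temperature below are assumptions; the paper derives the exact closed-form weights.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of "pay more attention to harder augmented samples":
# weight the K augmented copies of each example by a softmax over their
# losses. The softmax form and temperature are assumptions here.
def mmel_style_loss(model, x_aug, y, temperature=1.0):
    # x_aug: (B, K, ...) with K augmented views per example; y: (B,)
    B, K = x_aug.shape[:2]
    logits = model(x_aug.flatten(0, 1))  # (B*K, num_classes)
    losses = F.cross_entropy(
        logits, y.repeat_interleave(K), reduction="none"
    ).view(B, K)
    weights = F.softmax(losses.detach() / temperature, dim=1)  # harder => heavier
    return (weights * losses).sum(dim=1).mean()
```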
no code implementations • 8 Jan 2021 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of the loss over a $\delta$ ball of parameters around minima, cannot give a consistent measurement for scale-invariant neural networks, e.g., networks with batch normalization layers.
no code implementations • 8 Jan 2021 • Mingyang Yi
A network with BN is invariant to positive linear re-scaling of its weights, which means there exist infinitely many functionally equivalent networks whose weights differ only in scale.
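A quick numerical check of this invariance (a toy sketch; the layer sizes and the rescaling constant are arbitrary): multiplying the pre-BN weights by any $c > 0$ leaves the network's function unchanged, which is exactly why the sharpness measures discussed above fail to be scale-consistent.

```python
import torch
import torch.nn as nn

# Toy check of positive re-scale invariance under batch normalization;
# the layer sizes and the constant c are arbitrary choices.
torch.manual_seed(0)
x = torch.randn(64, 10)
lin, bn = nn.Linear(10, 20), nn.BatchNorm1d(20)
bn.train()  # normalize with batch statistics

y1 = bn(lin(x))
c = 7.3
with torch.no_grad():
    lin.weight.mul_(c)  # positively rescale the pre-BN layer
    lin.bias.mul_(c)
y2 = bn(lin(x))
print(torch.allclose(y1, y2, atol=1e-4))  # True: same function, new weights
```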
1 code implementation • 4 Dec 2020 • Mingyang Yi, Ruoyu Wang, Zhi-Ming Ma
Our bounds underscore that with a locally strongly convex population risk, the models trained by any proper iterative algorithm can generalize well, even for non-convex problems and when the dimension $d$ is large.
no code implementations • 25 Sep 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.
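For reference, a minimal sketch of PGD-based adversarial training in the style of Madry et al. (2018); the perturbation budget, step size, and step count are common defaults rather than values from this paper.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of PGD adversarial training (Madry et al., 2018); the
# epsilon, step size, and step count are common defaults, not tuned.
def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(
            F.cross_entropy(model(x_adv), y), x_adv
        )[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()              # inner maximization
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def adv_train_step(model, optimizer, x, y):
    model.train()
    x_adv = pgd_attack(model, x, y)          # solve the inner max approximately
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)  # outer minimization
    loss.backward()
    optimizer.step()
    return loss.item()
```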
no code implementations • 25 Sep 2019 • Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu
We show that for the standard initialization used in practice, $\tau =1/\Omega(\sqrt{L})$ is a sharp value characterizing the stability of the forward/backward process of ResNet, where $L$ is the number of residual blocks.
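The construction at stake can be sketched as follows: each residual branch is damped by a factor $\tau$ that shrinks with the depth $L$. The block architecture and sizes below are assumptions; only the $1/\sqrt{L}$ order of the scaling reflects the abstract.

```python
import torch
import torch.nn as nn

# Sketch of the scaling idea: each residual branch is damped by tau,
# with tau on the order of 1/sqrt(L). The MLP block itself is assumed.
class ScaledResBlock(nn.Module):
    def __init__(self, dim, tau):
        super().__init__()
        self.tau = tau
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                               nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.tau * self.f(x)  # damped residual branch

L, dim = 100, 64
tau = 1.0 / L ** 0.5                     # tau ~ 1/sqrt(L)
net = nn.Sequential(*[ScaledResBlock(dim, tau) for _ in range(L)])
x = torch.randn(8, dim)
print(net(x).std())  # activations stay O(1) even at depth 100
```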
no code implementations • ICLR 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Optimization on manifolds has been widely used in machine learning to handle constrained optimization problems.
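As a generic illustration (not the specific algorithm analyzed in the paper), here is Riemannian gradient descent on the unit sphere: project the Euclidean gradient onto the tangent space, take a step, and retract back onto the manifold.

```python
import numpy as np

# Generic gradient descent on a manifold (here the unit sphere), shown
# only as an illustration of the manifold-optimization setting.
def sphere_gd(grad_f, x0, lr=0.1, iters=200):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = grad_f(x)
        g_tan = g - (g @ x) * x       # tangent-space projection
        x = x - lr * g_tan
        x = x / np.linalg.norm(x)     # retraction onto the sphere
    return x

# Example: leading eigenvector of A via minimizing -x^T A x on the sphere.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); A = A + A.T
x_star = sphere_gd(lambda x: -2 * A @ x, rng.normal(size=5))
print(x_star @ A @ x_star)  # close to the largest eigenvalue of A
```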
1 code implementation • 17 Mar 2019 • Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu
Moreover, for ResNets with normalization layers, adding such a factor $\tau$ also stabilizes training and yields significant performance gains for deep ResNets.
no code implementations • 6 Mar 2019 • Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
That is to say, a minimum with balanced values of basis paths is more likely to be flat and to generalize better.