no code implementations • 24 Jan 2024 • Mingyang Yi, Bohan Wang
In this paper, we aim to enrich the continuous optimization methods in the Wasserstein space by extending the gradient flow into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow.
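For orientation, here is a minimal sketch of classical SVRG in Euclidean space, the algorithm whose flow the paper lifts to the Wasserstein setting; the toy least-squares objective, step size, and epoch length are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Classical SVRG (Johnson & Zhang, 2013) on a toy least-squares problem.
# The paper studies the analogue of this flow in Wasserstein space; the
# objective, step size, and epoch length here are illustrative only.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)

def grad_i(x, i):  # gradient of the i-th summand 0.5*(a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    return A.T @ (A @ x - b) / len(b)

x = np.zeros(5)
lr, epochs, m = 0.01, 30, len(b)
for _ in range(epochs):
    x_ref, g_ref = x.copy(), full_grad(x)  # snapshot and its full gradient
    for _ in range(m):
        i = rng.integers(len(b))
        # variance-reduced stochastic gradient
        v = grad_i(x, i) - grad_i(x_ref, i) + g_ref
        x -= lr * v
print(np.linalg.norm(full_grad(x)))  # should be close to 0
```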
1 code implementation • NeurIPS 2023 • Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
Based on our analysis, we propose SA-Solver, which is an improved efficient stochastic Adams method for solving diffusion SDE to generate data with high quality.
Ranked #12 on Image Generation on ImageNet 512x512
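SA-Solver's exact stochastic multistep update is not reproduced here; as a point of reference, the sketch below is a plain Euler–Maruyama discretization of a reverse-time VP diffusion SDE, the kind of single-step stochastic solver that a stochastic Adams method improves on. The score function, linear noise schedule, and step count are placeholder assumptions.

```python
import numpy as np

# Hedged baseline: Euler-Maruyama on the reverse-time VP diffusion SDE.
# `score_fn`, the linear beta schedule, and the step count are assumed
# placeholders, not the paper's configuration.
def beta(t):
    return 0.1 + (20.0 - 0.1) * t  # linear noise schedule (assumed)

def sample(score_fn, dim, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)  # start from the prior at t = 1
    dt = 1.0 / steps
    for k in range(steps, 0, -1):
        t = k / steps
        drift = -0.5 * beta(t) * x - beta(t) * score_fn(x, t)
        # reverse-time step: deterministic drift plus injected noise
        x = x - drift * dt + np.sqrt(beta(t) * dt) * rng.normal(size=dim)
    return x

# Toy usage with the score of a standard normal, which is -x:
print(sample(lambda x, t: -x, dim=4))
```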
no code implementations • 24 May 2023 • Mingyang Yi, Jiacheng Sun, Zhenguo Li
To understand this contradiction, we empirically verify the difference between the sufficiently trained diffusion model and the empirical optima.
no code implementations • 14 Jul 2022 • Mingyang Yi, Ruoyu Wang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
The correlation shift is caused by spurious attributes that correlate with the class label, as the correlation between them may differ between training and test data.
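A toy construction of correlation shift, with assumed numbers: the spurious attribute agrees with the label 90% of the time at training but only 10% at test, so a classifier relying on it collapses under the shift.

```python
import numpy as np

# Toy correlation shift: the spurious attribute agrees with the label
# with probability p, and p differs between train and test (values assumed).
def make_split(n, p_agree, rng):
    y = rng.integers(0, 2, size=n)                   # class label
    flip = rng.random(n) > p_agree
    s = np.where(flip, 1 - y, y)                     # spurious attribute
    x = np.stack([y + 0.1 * rng.normal(size=n),      # causal feature
                  s + 0.1 * rng.normal(size=n)], axis=1)
    return x, y

rng = np.random.default_rng(0)
x_tr, y_tr = make_split(5000, p_agree=0.9, rng=rng)  # cue mostly right
x_te, y_te = make_split(5000, p_agree=0.1, rng=rng)  # correlation reversed
# A predictor thresholding only the spurious column does well on train...
acc = lambda x, y: ((x[:, 1] > 0.5) == y).mean()
print(acc(x_tr, y_tr), acc(x_te, y_te))  # ~0.9 vs ~0.1
```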
no code implementations • CVPR 2022 • Ruoyu Wang, Mingyang Yi, Zhitang Chen, Shengyu Zhu
In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature.
1 code implementation • 1 Nov 2021 • Weiran Huang, Mingyang Yi, Xuyang Zhao, Zihao Jiang
It reveals that the generalization ability of contrastive self-supervised learning is related to three key factors: alignment of positive samples, divergence of class centers, and concentration of augmented data.
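To make two of these factors concrete, the sketch below estimates alignment and class-center divergence on random toy embeddings; these are illustrative formulas, and the paper's precise definitions may differ.

```python
import numpy as np

# Illustrative estimates of two of the three factors on toy embeddings;
# the exact definitions in the paper may differ from these.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(256, 128))           # embeddings of anchors
z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = z1 + 0.1 * rng.normal(size=z1.shape)  # embeddings of their positives
z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
labels = rng.integers(0, 10, size=256)

# Alignment: positive pairs should map close together.
alignment = np.mean(np.sum((z1 - z2) ** 2, axis=1))

# Divergence: mean embeddings of different classes should stay far apart.
centers = np.stack([z1[labels == c].mean(0) for c in range(10)])
dists = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
divergence = dists[~np.eye(10, dtype=bool)].mean()

print(alignment, divergence)
```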
no code implementations • 29 Sep 2021 • Ruoyu Wang, Mingyang Yi, Shengyu Zhu, Zhitang Chen
In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature.
no code implementations • 24 May 2021 • Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.
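One hedged way to see the connection, assuming an $\infty$-Wasserstein bound between the training distribution $P$ and the test distribution $Q$ (a coupling sketch, not the paper's exact theorem):

```latex
% Sketch: if the test distribution Q stays within infinity-Wasserstein
% radius rho of the training distribution P, a coupling argument gives
\[
W_\infty(P, Q) \le \rho
\;\Longrightarrow\;
\mathbb{E}_{x \sim Q}\big[\ell(f(x))\big]
\le \mathbb{E}_{x \sim P}\Big[\max_{\|\delta\| \le \rho} \ell\big(f(x+\delta)\big)\Big],
\]
% so a model whose loss is robust to perturbations of size rho on P
% also controls its loss on every such Q.
```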
1 code implementation • ICLR 2021 • Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).
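A hedged sketch of this reweighting idea: give each of the $K$ augmented copies of an example a weight that grows with its loss. The softmax form and the temperature below are assumptions; the paper derives the exact closed-form weights.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of "pay more attention to harder augmented samples":
# weight the K augmented copies of each example by a softmax over their
# losses. The softmax form and temperature are assumptions here.
def mmel_style_loss(model, x_aug, y, temperature=1.0):
    # x_aug: (B, K, ...) with K augmented views per example; y: (B,)
    B, K = x_aug.shape[:2]
    logits = model(x_aug.flatten(0, 1))  # (B*K, num_classes)
    losses = F.cross_entropy(
        logits, y.repeat_interleave(K), reduction="none"
    ).view(B, K)
    weights = F.softmax(losses.detach() / temperature, dim=1)  # harder => heavier
    return (weights * losses).sum(dim=1).mean()
```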
no code implementations • 8 Jan 2021 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of the loss over a $\delta$ ball of parameters around minima, cannot give a consistent measurement for scale-invariant neural networks, e.g., networks with batch normalization layers.
no code implementations • 8 Jan 2021 • Mingyang Yi
A network with BN is invariant to positive linear re-scaling of its weights, which means there exist infinitely many functionally equivalent networks whose weights differ only in scale.
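A quick numerical check of this invariance (a toy sketch; the layer sizes and the rescaling constant are arbitrary): multiplying the pre-BN weights by any $c > 0$ leaves the network's function unchanged, which is exactly why the sharpness measures discussed above fail to be scale-consistent.

```python
import torch
import torch.nn as nn

# Toy check of positive re-scale invariance under batch normalization;
# the layer sizes and the constant c are arbitrary choices.
torch.manual_seed(0)
x = torch.randn(64, 10)
lin, bn = nn.Linear(10, 20), nn.BatchNorm1d(20)
bn.train()  # normalize with batch statistics

y1 = bn(lin(x))
c = 7.3
with torch.no_grad():
    lin.weight.mul_(c)  # positively rescale the pre-BN layer
    lin.bias.mul_(c)
y2 = bn(lin(x))
print(torch.allclose(y1, y2, atol=1e-4))  # True: same function, new weights
```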
1 code implementation • 4 Dec 2020 • Mingyang Yi, Ruoyu Wang, Zhi-Ming Ma
Our bounds underscore that with a locally strongly convex population risk, the models trained by any proper iterative algorithm can generalize well, even for non-convex problems and when the dimension $d$ is large.
no code implementations • 25 Sep 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.
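For reference, a minimal sketch of PGD-based adversarial training in the style of Madry et al. (2018); the perturbation budget, step size, and step count are common defaults rather than values from this paper.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of PGD adversarial training (Madry et al., 2018); the
# epsilon, step size, and step count are common defaults, not tuned.
def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(
            F.cross_entropy(model(x_adv), y), x_adv
        )[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()              # inner maximization
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def adv_train_step(model, optimizer, x, y):
    model.train()
    x_adv = pgd_attack(model, x, y)          # solve the inner max approximately
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)  # outer minimization
    loss.backward()
    optimizer.step()
    return loss.item()
```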
no code implementations • 25 Sep 2019 • Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu
We show that for the standard initialization used in practice, $\tau =1/\Omega(\sqrt{L})$ is a sharp value characterizing the stability of the forward/backward process of ResNet, where $L$ is the number of residual blocks.
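The construction at stake can be sketched as follows: each residual branch is damped by a factor $\tau$ that shrinks with the depth $L$. The block architecture and sizes below are assumptions; only the $1/\sqrt{L}$ order of the scaling reflects the abstract.

```python
import torch
import torch.nn as nn

# Sketch of the scaling idea: each residual branch is damped by tau,
# with tau on the order of 1/sqrt(L). The MLP block itself is assumed.
class ScaledResBlock(nn.Module):
    def __init__(self, dim, tau):
        super().__init__()
        self.tau = tau
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                               nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.tau * self.f(x)  # damped residual branch

L, dim = 100, 64
tau = 1.0 / L ** 0.5                     # tau ~ 1/sqrt(L)
net = nn.Sequential(*[ScaledResBlock(dim, tau) for _ in range(L)])
x = torch.randn(8, dim)
print(net(x).std())  # activations stay O(1) even at depth 100
```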
no code implementations • ICLR 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Optimization on manifolds has been widely used in machine learning to handle constrained optimization problems.
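As a generic illustration (not the specific algorithm analyzed in the paper), here is Riemannian gradient descent on the unit sphere: project the Euclidean gradient onto the tangent space, take a step, and retract back onto the manifold.

```python
import numpy as np

# Generic gradient descent on a manifold (here the unit sphere), shown
# only as an illustration of the manifold-optimization setting.
def sphere_gd(grad_f, x0, lr=0.1, iters=200):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = grad_f(x)
        g_tan = g - (g @ x) * x       # tangent-space projection
        x = x - lr * g_tan
        x = x / np.linalg.norm(x)     # retraction onto the sphere
    return x

# Example: leading eigenvector of A via minimizing -x^T A x on the sphere.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); A = A + A.T
x_star = sphere_gd(lambda x: -2 * A @ x, rng.normal(size=5))
print(x_star @ A @ x_star)  # close to the largest eigenvalue of A
```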
1 code implementation • 17 Mar 2019 • Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu
Moreover, for ResNets with normalization layers, adding such a factor $\tau$ also stabilizes training and yields significant performance gains for deep ResNets.
no code implementations • 6 Mar 2019 • Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
That is to say, a minimum with balanced values of basis paths is more likely to be flat and to generalize better.