no code implementations • 7 May 2024 • Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, Jun Zhu
We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation.
1 code implementation • 26 May 2023 • Min Zhao, Rongzhen Wang, Fan Bao, Chongxuan Li, Jun Zhu
This paper presents \emph{ControlVideo} for text-driven video editing -- generating a video that aligns with a given text while preserving the structure of the source video.
2 code implementations • NeurIPS 2023 • Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
In comparison, VSD works well with various CFG weights, as ancestral sampling from diffusion models does, and simultaneously improves the diversity and sample quality with a common CFG weight (i.e., $7.5$).
1 code implementation • 31 Mar 2023 • Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications, but customizing such models via fine-tuning is both memory- and time-inefficient.
3 code implementations • 12 Mar 2023 • Fan Bao, Shen Nie, Kaiwen Xue, Chongxuan Li, Shi Pu, Yaole Wang, Gang Yue, Yue Cao, Hang Su, Jun Zhu
Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model -- perturbs data in all modalities instead of a single modality, inputs individual timesteps in different modalities, and predicts the noise of all modalities instead of a single modality.
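The snippet above compresses the whole recipe into one sentence; a minimal NumPy sketch of the idea follows (the noise schedule, shapes, and the `noise_pred_net` interface are illustrative assumptions, not UniDiffuser's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, t, T=1000):
    # DDPM-style forward perturbation with an illustrative cosine schedule:
    # x_t = sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * eps
    alpha_bar = np.cos(0.5 * np.pi * t / T) ** 2
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1 - alpha_bar) * eps, eps

def unidiffuser_step(image, text_emb, noise_pred_net, T=1000):
    # Each modality is perturbed with its OWN independent timestep, and the
    # network predicts the noise of ALL modalities jointly in one pass.
    t_img = int(rng.integers(0, T))
    t_txt = int(rng.integers(0, T))
    x_t, eps_img = perturb(image, t_img, T)
    y_t, eps_txt = perturb(text_emb, t_txt, T)
    pred_img, pred_txt = noise_pred_net(x_t, t_img, y_t, t_txt)
    # At sampling time, fixing the pair of timesteps to (t, 0), (0, t) or
    # (t, t) recovers conditional, inverse-conditional, or joint generation.
    return np.mean((pred_img - eps_img) ** 2) + np.mean((pred_txt - eps_txt) ** 2)

# Toy run with a dummy network that predicts zeros everywhere.
dummy_net = lambda x, ti, y, tt: (np.zeros_like(x), np.zeros_like(y))
loss = unidiffuser_step(rng.standard_normal(16), rng.standard_normal(8), dummy_net)
```

With the zero-predicting dummy the loss is just the mean squared norm of the sampled noises, so it is strictly positive.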
2 code implementations • NeurIPS 2023 • Zebin You, Yong Zhong, Fan Bao, Jiacheng Sun, Chongxuan Li, Jun Zhu
In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called dual pseudo training (DPT), built upon strong semi-supervised learners and diffusion models.
1 code implementation • 5 Feb 2023 • Chenyu Zheng, Guoqiang Wu, Fan Bao, Yue Cao, Chongxuan Li, Jun Zhu
Theoretically, the paper considers the surrogate loss instead of the zero-one loss in analyses and generalizes the classical results from binary cases to multiclass ones.
no code implementations • 1 Dec 2022 • Fan Bao, Chongxuan Li, Jiacheng Sun, Jun Zhu
Extensive empirical evidence demonstrates that conditional generative models are easier to train and perform better than unconditional ones by exploiting the labels of data.
1 code implementation • 2 Nov 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
The commonly used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps to produce high-quality samples.
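For reference, the deterministic DDIM update the snippet refers to is simple to state; a minimal sketch in the usual $\bar\alpha$ notation (not the paper's code):

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_s):
    # Deterministic DDIM update (eta = 0) from step t to an earlier step s:
    # first recover the model's estimate of the clean sample x0, then
    # re-noise it to the level of step s with the SAME predicted noise.
    x0_hat = (x_t - np.sqrt(1 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_s) * x0_hat + np.sqrt(1 - alpha_bar_s) * eps_pred

# Sanity check: with the *true* noise, one step lands exactly on the
# forward-process marginal at step s.
rng = np.random.default_rng(1)
x0, eps = rng.standard_normal(4), rng.standard_normal(4)
a_t, a_s = 0.5, 0.9
x_t = np.sqrt(a_t) * x0 + np.sqrt(1 - a_t) * eps
x_s = ddim_step(x_t, eps, a_t, a_s)
```

Because the update is first-order, many such steps are needed when the noise prediction changes quickly along the trajectory, which is why higher-order solvers can cut the step count.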
2 code implementations • 30 Sep 2022 • Fan Bao, Min Zhao, Zhongkai Hao, Peiyao Li, Chongxuan Li, Jun Zhu
Inverse molecular design is critical in material science and drug discovery, where the generated molecules should satisfy certain desirable properties.
3 code implementations • CVPR 2023 • Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, Jun Zhu
We evaluate U-ViT in unconditional and class-conditional image generation, as well as text-to-image generation tasks, where U-ViT is comparable if not superior to a CNN-based U-Net of a similar size.
Ranked #4 on Text-to-Image Generation on MS COCO
1 code implementation • 30 Aug 2022 • Yong Zhong, Hongtao Liu, Xiaodong Liu, Fan Bao, Weiran Shen, Chongxuan Li
Deep generative models (DGMs) are data-hungry, because learning a complex model on limited data suffers from large variance and easily overfits.
1 code implementation • 14 Jul 2022 • Min Zhao, Fan Bao, Chongxuan Li, Jun Zhu
Further, we provide an alternative explanation of the EGSDE as a product of experts, where each of the three experts (corresponding to the SDE and two feature extractors) solely contributes to faithfulness or realism.
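Since log-densities of experts in a product add up, this reading implies the sampler's effective score is a weighted sum of the pretrained SDE's score and the energy gradients; a hedged sketch (function names and the weights `lam_r`, `lam_f` are illustrative, not the paper's code or hyperparameter values):

```python
import numpy as np

def egsde_drift_score(x, t, sde_score, grad_E_realism, grad_E_faithful,
                      lam_r=1.0, lam_f=1.0):
    # Product-of-experts reading: the effective score used in the reverse
    # SDE is the pretrained SDE's score (the realism prior) minus weighted
    # gradients of the two energies, one per feature extractor.
    return (sde_score(x, t)
            - lam_r * grad_E_realism(x, t)
            - lam_f * grad_E_faithful(x, t))

# With both energies switched off, sampling reduces to the plain
# pretrained SDE.
x = np.ones(3)
zero = lambda x, t: np.zeros_like(x)
plain = egsde_drift_score(x, 0.5, lambda x, t: -x, zero, zero)
```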
Ranked #1 on Image-to-Image Translation on AFHQ (Wild to Dog)
1 code implementation • 16 Jun 2022 • Cheng Lu, Kaiwen Zheng, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
To fill this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first-, second-, and third-order score matching errors; we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs.
1 code implementation • 15 Jun 2022 • Fan Bao, Chongxuan Li, Jiacheng Sun, Jun Zhu, Bo Zhang
Thus, the generation performance on a subset of timesteps is crucial, which is greatly influenced by the covariance design in DPMs.
2 code implementations • 2 Jun 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
In this work, we propose an exact formulation of the solution of diffusion ODEs.
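Applying that exact formulation once per step yields a first-order solver; a sketch in the standard alpha/sigma parameterization (variable names are mine, and the left-endpoint approximation shown is one simple instance, not the paper's full method):

```python
import numpy as np

def dpm_solver_1_step(x_s, eps_pred, alpha_s, sigma_s, alpha_t, sigma_t):
    # One first-order step derived from the exact ODE solution: the linear
    # part of the ODE is integrated exactly, while the integral of the
    # noise-prediction term over log-SNR (lambda = log(alpha/sigma)) is
    # approximated by its left endpoint, giving an exponential-integrator
    # update.
    h = np.log(alpha_t / sigma_t) - np.log(alpha_s / sigma_s)  # log-SNR step
    return (alpha_t / alpha_s) * x_s - sigma_t * np.expm1(h) * eps_pred

# A quick algebraic check: this first-order step coincides with the
# deterministic DDIM update.
rng = np.random.default_rng(2)
x_s, eps = rng.standard_normal(4), rng.standard_normal(4)
a_s, s_s = np.sqrt(0.5), np.sqrt(0.5)
a_t, s_t = np.sqrt(0.9), np.sqrt(0.1)
x_t = dpm_solver_1_step(x_s, eps, a_s, s_s, a_t, s_t)
```

Expanding `sigma_t * expm1(h)` gives `alpha_t * sigma_s / alpha_s - sigma_t`, which is exactly the DDIM coefficient on the predicted noise.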
2 code implementations • ICLR 2022 • Fan Bao, Chongxuan Li, Jun Zhu, Bo Zhang
In this work, we present a surprising result: both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms w.r.t. its score function.
1 code implementation • NeurIPS 2021 • Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang
Our results can explain some mysterious behaviours of the bilevel programming in practice, for instance, overfitting to the validation set.
1 code implementation • NeurIPS Workshop ICBINB 2020 • Fan Bao, Kun Xu, Chongxuan Li, Lanqing Hong, Jun Zhu, Bo Zhang
The learning and evaluation of energy-based latent variable models (EBLVMs) without any structural assumptions are highly challenging, because the true posteriors and the partition functions in such models are generally intractable.
1 code implementation • NeurIPS 2020 • Fan Bao, Chongxuan Li, Kun Xu, Hang Su, Jun Zhu, Bo Zhang
This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures by reformulating SM as a bi-level optimization problem.
1 code implementation • 11 May 2019 • Fan Bao, Hang Su, Jun Zhu
In addition, our framework can be extended to semi-supervised boosting, where the boosted model learns a joint distribution over data and labels.
no code implementations • 25 Jan 2019 • Yinpeng Dong, Fan Bao, Hang Su, Jun Zhu
3) We propose to improve the consistency of neurons on the adversarial example subset via an adversarial training algorithm with a consistent loss.
no code implementations • 18 Aug 2017 • Yinpeng Dong, Hang Su, Jun Zhu, Fan Bao
We find that: (1) the neurons in DNNs do not truly detect semantic objects/parts, but respond to objects/parts only as recurrent discriminative patches; (2) deep visual representations are not robust distributed codes of visual concepts, because the representations of adversarial images are largely inconsistent with those of real images despite their similar visual appearance; both findings differ from previous ones.