1 code implementation • 17 Jan 2024 • Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang
More importantly, Vlogger can generate over 5-minute vlogs from open-world descriptions, without loss of video coherence on script and actor.
2 code implementations • 5 Jan 2024 • Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.
1 code implementation • 5 Jan 2024 • DeepSeek-AI, :, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, JianZhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, Wenfeng Liang, Fangyun Lin, A. X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
The rapid development of open-source large language models (LLMs) has been truly remarkable.
1 code implementation • 19 Dec 2023 • Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao
To tackle this problem, we propose Diff-Text, which is a training-free scene text generation framework for any language.
no code implementations • 11 Dec 2023 • Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng
Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image.
1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.
1 code implementation • 23 Nov 2023 • YuFei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen
Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve comparable or even superior performance compared to both previous SOTA methods and the teacher model, in just one sampling step, resulting in a remarkable up to x10 speedup for inference.
no code implementations • 31 Oct 2023 • Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu
The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos.
1 code implementation • 11 Oct 2023 • Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao
In this work, we introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text, by leveraging the power of off-the-shelf text-to-image generation methods (e. g., Stable Diffusion).
2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu
To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.
Ranked #4 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)
no code implementations • 28 Aug 2023 • Di Yang, Yaohui Wang, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond
In this context, we propose Latent Action Composition (LAC), a novel self-supervised framework aiming at learning from synthesized composable motions for skeleton-based action segmentation.
no code implementations • 5 Aug 2023 • Beitong Tian, Kuan-Chieh Lu, Ahmadreza Eslaminia, Yaohui Wang, Chenhui Shao, Klara Nahrstedt
Ultrasonic welding machines play a critical role in the lithium battery industry, facilitating the bonding of batteries with conductors.
1 code implementation • 13 Jul 2023 • Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, LiMin Wang, Yu Qiao
Specifically, we utilize a multi-scale approach to generate video-related descriptions.
4 code implementations • 10 Jul 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai
Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.
no code implementations • 10 May 2023 • Di Yang, Yaohui Wang, Quan Kong, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond
Self-supervised video representation learning aimed at maximizing similarity between different temporal segments of one video, in order to enforce feature persistence over time.
5 code implementations • 6 May 2023 • Yaohui Wang, Xin Ma, Xinyuan Chen, Antitza Dantcheva, Bo Dai, Yu Qiao
Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.
1 code implementation • 2 May 2023 • Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao
To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms.
no code implementations • 24 Apr 2023 • Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu
To mitigate those limitations, we propose Hierarchical Diffusion Autoencoders (HDAE) that exploit the fine-grained-to-abstract and lowlevel-to-high-level feature hierarchy for the latent space of diffusion models.
1 code implementation • 2 Jan 2023 • Hao Chen, Yaohui Wang, Benoit Lagadec, Antitza Dantcheva, Francois Bremond
This work focuses on unsupervised representation learning in person re-identification (ReID).
no code implementations • ICCV 2023 • Di Yang, Yaohui Wang, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond
In this context, we propose Latent Action Composition (LAC), a novel self-supervised framework aiming at learning from synthesized composable motions for skeleton-based action segmentation.
no code implementations • 1 Dec 2022 • Zhifeng Chen, Congyu Liao, Xiaozhi Cao, Benedikt A. Poser, Zhongbiao Xu, Wei-Ching Lo, Manyi Wen, Jaejin Cho, Qiyuan Tian, Yaohui Wang, Yanqiu Feng, Ling Xia, Wufan Chen, Feng Liu, Berkin Bilgic
Purpose: This work aims to develop a novel distortion-free 3D-EPI acquisition and image reconstruction technique for fast and robust, high-resolution, whole-brain imaging as well as quantitative T2* mapping.
1 code implementation • 31 Aug 2022 • Di Yang, Yaohui Wang, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond
Current self-supervised approaches for skeleton action representation learning often focus on constrained scenarios, where videos and skeleton data are recorded in laboratory settings.
no code implementations • 17 Mar 2022 • Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva
Specifically, motion in generated video is constructed by linear displacement of codes in the latent space.
1 code implementation • ICLR 2022 • Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva
Deviating from such models, we here introduce Latent Image Animator (LIA), a self-supervised auto-encoder that evades need for structure representation.
1 code implementation • 19 Jul 2021 • Di Yang, Yaohui Wang, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond
This is achieved by learning an optimal dependency matrix from the uniform distribution based on a multi-head attention mechanism.
Ranked #1 on Skeleton Based Action Recognition on UPenn Action
no code implementations • 8 Jan 2021 • Yaohui Wang, Francois Bremond, Antitza Dantcheva
We design the architecture of InMoDeGAN-generator in accordance to proposed Linear Motion Decomposition, which carries the assumption that motion can be represented by a dictionary, with related vectors forming an orthogonal basis in the latent space.
2 code implementations • CVPR 2021 • Hao Chen, Yaohui Wang, Benoit Lagadec, Antitza Dantcheva, Francois Bremond
In this context, we propose a mesh-based view generator.
1 code implementation • 10 Nov 2020 • Di Yang, Rui Dai, Yaohui Wang, Rupayan Mallick, Luca Minciullo, Gianpiero Francesca, Francois Bremond
Taking advantage of human pose data for understanding human activities has attracted much attention these days.
1 code implementation • CVPR 2020 • Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva
Creating realistic human videos entails the challenge of being able to simultaneously generate both appearance, as well as motion.