1 code implementation • 27 Feb 2024 • Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan
AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy.
no code implementations • 18 Jan 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy
We introduce a new task, language-driven video inpainting, which uses natural language instructions to guide the inpainting process.
1 code implementation • 12 Dec 2023 • Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu
Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics.
no code implementations • 1 Dec 2023 • Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu
In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control than text prompts alone.
1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.
2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu
We propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base text-to-video (T2V) model, a temporal interpolation model, and a video super-resolution model.
Ranked #4 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)
1 code implementation • 20 Sep 2023 • Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu
In this paper, we uncover the untapped potential of the diffusion U-Net, which serves as a "free lunch" that substantially improves generation quality on the fly.
no code implementations • ICCV 2023 • Jingwen Guo, Hong Liu, Shitong Sun, Tianyu Guo, Min Zhang, Chenyang Si
Existing skeleton-based action recognition methods typically follow a centralized learning paradigm, which raises privacy concerns because it exposes human-related videos.
1 code implementation • CVPR 2023 • Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan
Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively.
7 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
By simply applying depthwise separable convolutions as the token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model, CAFormer, sets a new record on ImageNet-1K: it achieves 85.5% accuracy at 224x224 resolution under normal supervised training, without external data or distillation.
Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)
no code implementations • 29 Aug 2022 • Shitong Sun, Chenyang Si, Guile Wu, Shaogang Gong
To resolve this problem, federated learning has been introduced to transfer knowledge across multiple sources (clients) with non-shared data while optimising a globally generalised central model (server).
3 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan
Recent studies show that the Transformer has a strong capability for building long-range dependencies, yet is weak at capturing the high frequencies that predominantly convey local information.
1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan
It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local groups separately to increase discriminability.
Ranked #13 on Self-Supervised Image Classification on ImageNet
Contrastive Learning • Self-Supervised Image Classification
no code implementations • 1 Mar 2022 • Ke Han, Chenyang Si, Yan Huang, Liang Wang, Tieniu Tan
In this paper, we investigate the generalization problem of person re-identification (re-id), whose major challenge is the distribution shift on an unseen domain.
no code implementations • 22 Nov 2021 • Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, Liang Wang
Finally, in the Information Fuser, we explore various strategies for combining the Sequence Reconstructor and the Contrastive Motion Learner, and propose a knowledge-distillation-based fusion strategy that captures postures and motions simultaneously by transferring the motion learning from the Contrastive Motion Learner to the Sequence Reconstructor.
14 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
Based on this observation, we hypothesize that the general architecture of Transformers, rather than the specific token mixer module, is more essential to the model's performance.
Ranked #9 on Semantic Segmentation on DensePASS
no code implementations • 25 May 2021 • Wentao Chen, Chenyang Si, Wei Wang, Liang Wang, Zilei Wang, Tieniu Tan
Few-shot learning is a challenging task since only a few instances are given for recognizing an unseen class.
no code implementations • ECCV 2020 • Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, Jiashi Feng
Self-supervised learning (SSL) has proven very effective at learning representations from unlabeled data in the image domain.
no code implementations • 10 Jun 2019 • Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
Furthermore, the inter-class classification and the intra-class transduction can be flexibly repeated several times to progressively purify the clusters.
no code implementations • CVPR 2019 • Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
Nevertheless, how to effectively extract discriminative spatial and temporal features remains a challenging problem.
Ranked #51 on Skeleton Based Action Recognition on NTU RGB+D
no code implementations • 22 Sep 2018 • Ya Jing, Chenyang Si, Jun-Bo Wang, Wei Wang, Liang Wang, Tieniu Tan
To exploit the multilevel corresponding visual content, we propose a pose-guided multi-granularity attention network (PMA).
no code implementations • CVPR 2018 • Chenyang Si, Wei Wang, Liang Wang, Tieniu Tan
Human image synthesis has extensive practical applications, e.g., person re-identification and data augmentation for human pose estimation.
no code implementations • 22 May 2018 • Wei Wang, Jinjin Zhang, Chenyang Si, Liang Wang
Second, few pose-based methods model the action-related objects when recognizing human-object interactions, in which objects play an important role.
Action Recognition In Videos • Human-Object Interaction Detection
no code implementations • ECCV 2018 • Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan
Skeleton-based action recognition has made great progress recently, but many problems remain unsolved.
Ranked #81 on Skeleton Based Action Recognition on NTU RGB+D