Search Results for author: Chenyang Si

Found 24 papers, 10 papers with code

Scaling Supervised Local Learning with Augmented Auxiliary Networks

1 code implementation • 27 Feb 2024 • Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan

AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy.

Image Classification

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

no code implementations • 18 Jan 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process.

Video Inpainting

FreeInit: Bridging Initialization Gap in Video Diffusion Models

1 code implementation • 12 Dec 2023 • Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu

Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics.

Denoising, Text-to-Video Generation (+1 more)

434

VideoBooth: Diffusion-based Video Generation with Image Prompts

no code implementations • 1 Dec 2023 • Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts.

Video Generation

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation, Video Generation

294

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Ranked #4 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation, Video Generation (+1 more)

736

FreeU: Free Lunch in Diffusion U-Net

1 code implementation • 20 Sep 2023 • Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu

In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly.

Decoder, Denoising (+1 more)

1,467

FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation

no code implementations • ICCV 2023 • Jingwen Guo, Hong Liu, Shitong Sun, Tianyu Guo, Min Zhang, Chenyang Si

Existing skeleton-based action recognition methods typically follow a centralized learning paradigm, which can pose privacy concerns when exposing human-related videos.

Action Recognition, Federated Learning (+3 more)

Semantic Prompt for Few-Shot Image Recognition

1 code implementation • CVPR 2023 • Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan

Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively.

Few-Shot Learning

MetaFormer Baselines for Vision

7 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang

By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)

Domain Generalization, Image Classification

29,916

Exploring Semantic Attributes from A Foundation Model for Federated Learning of Disjoint Label Spaces

no code implementations • 29 Aug 2022 • Shitong Sun, Chenyang Si, Guile Wu, Shaogang Gong

To resolve this problem, federated learning has been introduced to transfer knowledge across multiple sources (clients) with non-shared data while optimising a globally generalised central model (server).

Attribute, Federated Learning (+3 more)

Inception Transformer

3 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information.

Image Classification

561

Mugs: A Multi-Granular Self-Supervised Learning Framework

1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan

It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability.

Ranked #13 on Self-Supervised Image Classification on ImageNet

Contrastive Learning, Self-Supervised Image Classification (+3 more)

Generalizable Person Re-Identification via Self-Supervised Batch Norm Test-Time Adaption

no code implementations • 1 Mar 2022 • Ke Han, Chenyang Si, Yan Huang, Liang Wang, Tieniu Tan

In this paper, we investigate the generalization problem of person re-identification (re-id), whose major challenge is the distribution shift on an unseen domain.

Generalizable Person Re-identification

Contrast-reconstruction Representation Learning for Self-supervised Skeleton-based Action Recognition

no code implementations • 22 Nov 2021 • Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, Liang Wang

Finally, in the Information Fuser, we explore varied strategies to combine the Sequence Reconstructor and Contrastive Motion Learner, and propose to capture postures and motions simultaneously via a knowledge-distillation based fusion strategy that transfers the motion learning from the Contrastive Motion Learner to the Sequence Reconstructor.

Action Recognition, Contrastive Learning (+4 more)

MetaFormer Is Actually What You Need for Vision

14 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan

Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.

Ranked #9 on Semantic Segmentation on DensePASS

Image Classification, Object Detection (+1 more)

125,796

Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images

no code implementations • 25 May 2021 • Wentao Chen, Chenyang Si, Wei Wang, Liang Wang, Zilei Wang, Tieniu Tan

Few-shot learning is a challenging task since only a few instances are given for recognizing an unseen class.

Ranked #3 on Unsupervised Few-Shot Image Classification on Tiered ImageNet 5-way (1-shot)

Few-Shot Learning, Inductive Bias (+2 more)

Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition

no code implementations • ECCV 2020 • Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, Jiashi Feng

Self-supervised learning (SSL) has been proved very effective at learning representations from unlabeled data in the image domain.

3D Action Recognition, Self-Supervised Learning

Progressive Cluster Purification for Transductive Few-shot Learning

no code implementations • 10 Jun 2019 • Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan

Furthermore, the inter-class classification and the intra-class transduction are extremely flexible to be repeated several times to progressively purify the clusters.

Few-Shot Learning, General Classification

An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

no code implementations • CVPR 2019 • Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan

Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem.

Ranked #51 on Skeleton Based Action Recognition on NTU RGB+D

Action Recognition, Skeleton Based Action Recognition (+1 more)

Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search

no code implementations • 22 Sep 2018 • Ya Jing, Chenyang Si, Jun-Bo Wang, Wei Wang, Liang Wang, Tieniu Tan

To exploit the multilevel corresponding visual contents, we propose a pose-guided multi-granularity attention network (PMA).

Person Search, Sentence (+1 more)

Multistage Adversarial Losses for Pose-Based Human Image Synthesis

no code implementations • CVPR 2018 • Chenyang Si, Wei Wang, Liang Wang, Tieniu Tan

Human image synthesis has extensive practical applications, e.g., person re-identification and data augmentation for human pose estimation.

Data Augmentation, Image Generation (+2 more)

Pose-Based Two-Stream Relational Networks for Action Recognition in Videos

no code implementations • 22 May 2018 • Wei Wang, Jinjin Zhang, Chenyang Si, Liang Wang

Second, few pose-based methods model the action-related objects in recognizing human-object interaction actions in which objects play an important role.

Action Recognition In Videos, Human-Object Interaction Detection (+2 more)

Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning

no code implementations • ECCV 2018 • Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan

Skeleton-based action recognition has made great progress recently, but many problems still remain unsolved.

Ranked #81 on Skeleton Based Action Recognition on NTU RGB+D

Action Recognition, Human-Object Interaction Detection (+2 more)
