Search Results for author: Shilei Wen

Found 39 papers, 21 papers with code

AffineQuant: Affine Transformation Quantization for Large Language Models

1 code implementation • 19 Mar 2024 • Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training.

Quantization

Paper
Code

DiffusionGPT: LLM-Driven Text-to-Image Generation System

no code implementations • 18 Jan 2024 • Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Huixia Li, Hefeng Wu, Xuefeng Xiao, Rui Wang, Shilei Wen

Diffusion models have opened up new avenues for the field of image generation, resulting in the proliferation of high-quality models shared on open-source platforms.

Model Selection Text-to-Image Generation

Paper
Add Code

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

no code implementations • CVPR 2023 • Jie Qin, Jie Wu, Pengxiang Yan, Ming Li, Ren Yuxi, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang

Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories of text-based descriptions, which popularizes the segmentation system to more general-purpose application scenarios.

Ranked #6 on Open Vocabulary Panoptic Segmentation on ADE20K

Image Segmentation Instance Segmentation +3

Paper
Add Code

MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction

no code implementations • CVPR 2023 • Congyi Wang, Feida Zhu, Shilei Wen

Existing methods proposed for hand reconstruction tasks usually parameterize a generic 3D hand model or predict hand mesh positions directly.

Vocal Bursts Valence Prediction

Paper
Add Code

Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

1 code implementation • CVPR 2023 • Yuexiao Ma, Huixia Li, Xiawu Zheng, Xuefeng Xiao, Rui Wang, Shilei Wen, Xin Pan, Fei Chao, Rongrong Ji

In particular, we first formulate the oscillation in PTQ and prove the problem is caused by the difference in module capacity.

Quantization

Paper
Code

Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones

2 code implementations • 10 Mar 2021 • Cheng Cui, Ruoyu Guo, Yuning Du, Dongliang He, Fu Li, Zewu Wu, Qiwen Liu, Shilei Wen, Jizhou Huang, Xiaoguang Hu, dianhai yu, Errui Ding, Yanjun Ma

Recently, research efforts have been concentrated on revealing how pre-trained model makes a difference in neural network performance.

Knowledge Distillation object-detection +3

5,277

Paper
Code

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

1 code implementation • 27 Oct 2020 • Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition.

Ranked #11 on Self-Supervised Action Recognition on UCF101

Representation Learning Retrieval +2

Paper
Code

Coherent Loss: A Generic Framework for Stable Video Segmentation

no code implementations • 25 Oct 2020 • Mingyang Qian, Yi Fu, Xiao Tan, YingYing Li, Jinqing Qi, Huchuan Lu, Shilei Wen, Errui Ding

Video segmentation approaches are of great importance for numerous vision tasks especially in video manipulation for entertainment.

Segmentation Semantic Segmentation +2

Paper
Add Code

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

1 code implementation • NeurIPS 2020 • Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou

First, we propose to learn robust object representations by aggregating the candidate sound localization results in the single source scenes.

Object Object Localization

Paper
Code

PP-YOLO: An Effective and Efficient Implementation of Object Detector

5 code implementations • 23 Jul 2020 • Xiang Long, Kaipeng Deng, Guanzhong Wang, Yang Zhang, Qingqing Dang, Yuan Gao, Hui Shen, Jianguo Ren, Shumin Han, Errui Ding, Shilei Wen

We mainly try to combine various existing tricks that almost not increase the number of model parameters and FLOPs, to achieve the goal of improving the accuracy of detector as much as possible while ensuring that the speed is almost unchanged.

Ranked #134 on Object Detection on COCO test-dev (using extra training data)

Object object-detection +1

12,167

Paper
Code

Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement

no code implementations • ECCV 2020 • Jian Wang, Xiang Long, Yuan Gao, Errui Ding, Shilei Wen

In the first stage, heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, are sampled.

Pose Estimation regression +1

Paper
Add Code

PointTrack++ for Effective Online Multi-Object Tracking and Segmentation

1 code implementation • 3 Jul 2020 • Zhenbo Xu, Wei zhang, Xiao Tan, Wei Yang, Xiangbo Su, Yuchen Yuan, Hongwu Zhang, Shilei Wen, Errui Ding, Liusheng Huang

In this work, we present PointTrack++, an effective on-line framework for MOTS, which remarkably extends our recently proposed PointTrack framework.

Data Augmentation Decoder +8

260

Paper
Code

Segment as Points for Efficient Online Multi-Object Tracking and Segmentation

1 code implementation • ECCV 2020 • Zhenbo Xu, Wei zhang, Xiao Tan, Wei Yang, Huan Huang, Shilei Wen, Errui Ding, Liusheng Huang

The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods including 3D tracking methods by large margins (5. 4% higher MOTSA and 18 times faster over MOTSFusion) with the near real-time speed (22 FPS).

Multi-Object Tracking Multi-Object Tracking and Segmentation +1

260

Paper
Code

Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection

no code implementations • CVPR 2020 • Liang Du, Xiaoqing Ye, Xiao Tan, Jianfeng Feng, Zhenbo Xu, Errui Ding, Shilei Wen

Object detection from 3D point clouds remains a challenging task, though recent studies pushed the envelope with the deep learning techniques.

3D Object Detection Domain Adaptation +2

Paper
Add Code

NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results

no code implementations • 5 May 2020 • Dario Fuoli, Zhiwu Huang, Martin Danelljan, Radu Timofte, Hua Wang, Longcun Jin, Dewei Su, Jing Liu, Jaehoon Lee, Michal Kudelski, Lukasz Bala, Dmitry Hrybov, Marcin Mozejko, Muchen Li, Si-Yao Li, Bo Pang, Cewu Lu, Chao Li, Dongliang He, Fu Li, Shilei Wen

For track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.

Paper
Add Code

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

no code implementations • 3 May 2020 • Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, Wenhao Wu, Yukang Ding, Chao Li, Fu Li, Shilei Wen, Jianwei Li, Fuzhi Yang, Huan Yang, Jianlong Fu, Byung-Hoon Kim, JaeHyun Baek, Jong Chul Ye, Yuchen Fan, Thomas S. Huang, Junyeop Lee, Bokyeung Lee, Jungki Min, Gwantae Kim, Kanghyu Lee, Jaihyun Park, Mykola Mykhailych, Haoyu Zhong, Yukai Shi, Xiaojun Yang, Zhijing Yang, Liang Lin, Tongtong Zhao, Jinjia Peng, Huibing Wang, Zhi Jin, Jiahao Wu, Yifu Chen, Chenming Shang, Huanrong Zhang, Jeongki Min, Hrishikesh P. S, Densen Puthussery, Jiji C. V

This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results.

Image Super-Resolution

Paper
Add Code

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

1 code implementation • 1 Mar 2020 • Zhenbo Xu, Wei zhang, Xiaoqing Ye, Xiao Tan, Wei Yang, Shilei Wen, Errui Ding, Ajin Meng, Liusheng Huang

The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.

3D Object Detection Autonomous Driving +2

Paper
Code

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition

no code implementations • 9 Feb 2020 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen

In a nutshell, we treat input frames and network depth of the computational graph as a 2-dimensional grid, and several checkpoints are placed on this grid in advance with a prediction module.

Action Recognition In Videos Temporal Action Localization

Paper
Add Code

Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification

no code implementations • 17 Dec 2019 • Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen

In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi label classification.

Ranked #8 on Multi-Label Classification on NUS-WIDE

Classification General Classification +4

Paper
Add Code

Multi-Label Classification with Label Graph Superimposing

2 code implementations • 21 Nov 2019 • Ya Wang, Dongliang He, Fu Li, Xiang Long, Zhichao Zhou, Jinwen Ma, Shilei Wen

In this paper, we propose a label graph superimposing framework to improve the conventional GCN+CNN framework developed for multi-label recognition in the following two aspects.

Ranked #28 on Multi-Label Classification on MS-COCO

Attribute Classification +3

461

Paper
Code

Dynamic Instance Normalization for Arbitrary Style Transfer

no code implementations • 16 Nov 2019 • Yongcheng Jing, Xiao Liu, Yukang Ding, Xinchao Wang, Errui Ding, Mingli Song, Shilei Wen

Prior normalization methods rely on affine transformations to produce arbitrary image style transfers, of which the parameters are computed in a pre-defined way.

Style Transfer

Paper
Add Code

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation

no code implementations • 14 Oct 2019 • Fan Yang, Xiao Liu, Dongliang He, Chuang Gan, Jian Wang, Chao Li, Fu Li, Shilei Wen

In this work, we introduce a new problem, named as {\em story-preserving long video truncation}, that requires an algorithm to automatically truncate a long-duration video into multiple short and attractive sub-videos with each one containing an unbroken story.

Highlight Detection Video Summarization

Paper
Add Code

Perspective-Guided Convolution Networks for Crowd Counting

1 code implementation • ICCV 2019 • Zhaoyi Yan, Yuchen Yuan, WangMeng Zuo, Xiao Tan, Yezhen Wang, Shilei Wen, Errui Ding

In this paper, we propose a novel perspective-guided convolution (PGC) for convolutional neural network (CNN) based crowd counting (i. e. PGCNet), which aims to overcome the dramatic intra-scene scale variations of people due to the perspective effect.

Crowd Counting

Paper
Code

Image Inpainting with Learnable Bidirectional Attention Maps

1 code implementation • ICCV 2019 • Chaohao Xie, Shaohui Liu, Chao Li, Ming-Ming Cheng, WangMeng Zuo, Xiao Liu, Shilei Wen, Errui Ding

Most convolutional network (CNN)-based inpainting methods adopt standard convolution to indistinguishably treat valid pixels and holes, making them limited in handling irregular holes and more likely to generate inpainting results with color discrepancy and blurriness.

Ranked #2 on Image Inpainting on Paris StreetView

Decoder Image Inpainting +1

Paper
Code

Deep Concept-wise Temporal Convolutional Networks for Action Localization

2 code implementations • 26 Aug 2019 • Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, WangMeng Zuo, Chao Li, Xiang Long, Dongliang He, Fu Li, Shilei Wen

In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly ascribing to that all channels of 1D feature map, which generally are highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution.

Action Classification Action Localization

6,870

Paper
Code

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition

no code implementations • ICCV 2019 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen

Video Recognition has drawn great research interest and great progress has been made.

Ranked #7 on Action Recognition on ActivityNet

Action Recognition General Classification +5

Paper
Add Code

BMN: Boundary-Matching Network for Temporal Action Proposal Generation

15 code implementations • ICCV 2019 • Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen

To address these difficulties, we introduce the Boundary-Matching (BM) mechanism to evaluate confidence scores of densely distributed proposals, which denote a proposal as a matching pair of starting and ending boundaries and combine all densely distributed BM pairs into the BM confidence map.

Ranked #1 on Action Recognition on THUMOS’14

Action Detection Action Recognition +1

6,703

Paper
Code

Adapting Image Super-Resolution State-of-the-arts and Learning Multi-model Ensemble for Video Super-Resolution

no code implementations • 7 May 2019 • Chao Li, Dongliang He, Xiao Liu, Yukang Ding, Shilei Wen

Recently, image super-resolution has been widely studied and achieved significant progress by leveraging the power of deep convolutional neural networks.

Image Super-Resolution Video Super-Resolution

Paper
Add Code

STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing

8 code implementations • CVPR 2019 • Ming Liu, Yukang Ding, Min Xia, Xiao Liu, Errui Ding, WangMeng Zuo, Shilei Wen

Arbitrary attribute editing generally can be tackled by incorporating encoder-decoder and generative adversarial networks.

Attribute Decoder +1

428

Paper
Code

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

1 code implementation • 21 Jan 2019 • Dongliang He, Xiang Zhao, Jizhou Huang, Fu Li, Xiao Liu, Shilei Wen

The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos.

Decision Making Multi-Task Learning +3

Paper
Code

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

8 code implementations • 5 Nov 2018 • Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen

In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.

Action Recognition Temporal Action Localization

335

Paper
Code

Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance

no code implementations • 15 Oct 2018 • Yuan Gao, Xingyuan Bu, Yang Hu, Hui Shen, Ti Bai, Xubin Li, Shilei Wen

This report demonstrates our solution for the Open Images 2018 Challenge.

object-detection Object Detection +1

Paper
Add Code

Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition

no code implementations • 27 Jun 2018 • Dongliang He, Fu Li, Qijie Zhao, Xiang Long, Yi Fu, Shilei Wen

In this challenge, we propose spatial-temporal network (StNet) for better joint spatial-temporal modelling and comprehensively video understanding.

Action Recognition Temporal Action Localization +1

Paper
Add Code

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

5 code implementations • CVPR 2018 • Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen

In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.

Classification General Classification +1

335

Paper
Code

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification

no code implementations • 12 Aug 2017 • Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie zhou, Shilei Wen, Yuanqing Lin

Experiment results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve existing approaches in the large-scale video recognition tasks.

Ranked #163 on Action Classification on Kinetics-400

Action Classification General Classification +2

Paper
Add Code

Deep Metric Learning with Angular Loss

1 code implementation • ICCV 2017 • Jian Wang, Feng Zhou, Shilei Wen, Xiao Liu, Yuanqing Lin

The modern image search system requires semantic understanding of image, and a key yet under-addressed problem is to learn a good metric for measuring the similarity between images.

Image Retrieval Metric Learning

Paper
Code

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

1 code implementation • 14 Jul 2017 • Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie zhou, Shilei Wen

This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge that ranked the 3rd place.

Video Recognition Video Understanding

114

Paper
Code

Dynamic Computational Time for Visual Attention

1 code implementation • 30 Mar 2017 • Zhichao Li, Yi Yang, Xiao Liu, Feng Zhou, Shilei Wen, Wei Xu

We propose a dynamic computational time model to accelerate the average processing time for recurrent visual attention (RAM).

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition

no code implementations • 20 May 2016 • Xiao Liu, Jiang Wang, Shilei Wen, Errui Ding, Yuanqing Lin

By designing a novel reward strategy, we are able to learn to locate regions that are spatially and semantically distinctive with reinforcement learning algorithm.

Attribute reinforcement-learning +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.