1 code implementation • 29 Apr 2024 • Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou
By presenting a granular classification and landscape of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advances in the field.
1 code implementation • 26 Mar 2024 • Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao
Correspondingly, a single text embedding may not be expressive enough to capture the video embedding and empower retrieval.
1 code implementation • 20 Nov 2023 • Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, Nicu Sebe
Transformers have been successfully applied in the field of video-based 3D human pose estimation.
no code implementations • 19 Oct 2023 • Lijuan Zhou, Xiang Meng, Zhihuan Liu, Mengqi Wu, Zhimin Gao, Pichao Wang
This paper presents a comprehensive survey of pose-based applications utilizing deep learning, encompassing pose estimation, pose tracking, and action recognition. Pose estimation involves the determination of human joint positions from images or image sequences.
2 code implementations • 15 Sep 2023 • Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
Experiments on 19 visual transfer learning downstream tasks demonstrate that our SCT outperforms full fine-tuning on 18 out of 19 tasks by adding only 0.11M parameters of the ViT-B, which is 780× fewer than its full fine-tuning counterpart.
1 code implementation • 23 Aug 2023 • Yujun Ma, Benjia Zhou, Ruili Wang, Pichao Wang
RGB-D action and gesture recognition remains an interesting topic in human-centered scene understanding, primarily due to the multiple granularities and large variation of human motion.
no code implementations • ICCV 2023 • Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou
Therefore, we propose path pruning and EnsembleScale techniques for improvement, which cut out the underperforming paths and re-weight the ensemble components, respectively, to optimize the path combination and let the short paths focus on providing high-quality representations for subsequent paths.
no code implementations • ICCV 2023 • Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar
Nonetheless, the objective of the text-to-video retrieval task is to capture the complementary audio and video information that is pertinent to the text query rather than simply achieving better audio and video alignment.
Ranked #10 on Video Retrieval on MSR-VTT
no code implementations • 1 Apr 2023 • Shuning Chang, Pichao Wang, Fan Wang, Jiashi Feng, Mike Zheng Shou
Specifically, one branch focuses on detection representation for actor detection, and the other one for action recognition.
2 code implementations • CVPR 2023 • Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen
However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection.
no code implementations • CVPR 2023 • Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid
To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos.
Ranked #2 on Video Classification on Breakfast
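The selective-token idea can be sketched minimally: a learned scoring head ranks tokens and only the top fraction is passed on to the sequence model. This is a loose NumPy illustration under assumed shapes, not the S5 model's actual mask generator.

```python
import numpy as np

def select_tokens(tokens, w_score, keep_ratio=0.5):
    """Score each token with a lightweight linear head and keep only the
    top-scoring fraction -- a rough sketch of mask-based token selection."""
    scores = tokens @ w_score                    # one scalar score per token
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[::-1][:k]          # highest-scoring token indices
    return tokens[np.sort(keep)]                 # preserve original temporal order

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 64))               # 16 image tokens, dim 64
w = rng.normal(size=64)
kept = select_tokens(tokens, w, keep_ratio=0.25)
print(kept.shape)                                # (4, 64)
```

Downstream, only the kept tokens would be fed to the long-range sequence model, which is where the efficiency gain comes from.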
2 code implementations • 22 Mar 2023 • Hansheng Chen, Wei Tian, Pichao Wang, Fan Wang, Lu Xiong, Hao Li
In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose with differentiable probability density on the SE(3) manifold.
Ranked #4 on 6D Pose Estimation using RGB on LineMOD
1 code implementation • CVPR 2023 • Shuning Chang, Pichao Wang, Ming Lin, Fan Wang, David Junhao Zhang, Rong Jin, Mike Zheng Shou
In this work, we propose a novel Semantic Token ViT (STViT) for efficient global and local vision transformers, which can also be revised to serve as a backbone for downstream tasks.
no code implementations • 14 Mar 2023 • Hengyuan Zhao, Hao Luo, Yuyang Zhao, Pichao Wang, Fan Wang, Mike Zheng Shou
In view of the practicality of PETL, previous works focus on tuning a small set of parameters for each downstream task in an end-to-end manner while rarely considering the task distribution shift issue between the pre-training task and the downstream task.
1 code implementation • 11 Jan 2023 • Bo Dong, Pichao Wang, Fan Wang
On the ADE20K dataset, our model achieves 41.8 mIoU at 4.6 GFLOPs, which is 4.4 mIoU higher than SegFormer with 45% fewer GFLOPs.
1 code implementation • 16 Nov 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang
Although improving motion recognition to some extent, these methods still face sub-optimal situations in the following aspects: (i) data augmentation, i.e., the scale of RGB-D datasets is still limited, and few efforts have been made to explore novel data augmentation strategies for videos; (ii) optimization mechanism, i.e., the tightly space-time-entangled network structure brings more challenges to spatiotemporal information modeling; and (iii) cross-modal knowledge fusion, i.e., the high similarity between multimodal representations leads to insufficient late fusion.
Ranked #3 on Action Recognition on NTU RGB+D
1 code implementation • NeurIPS 2022 • Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li
Although Vision transformers (ViTs) have recently dominated many vision tasks, deploying ViT models on resource-limited devices remains a challenging problem.
no code implementations • 6 Oct 2022 • Zhimin Gao, Peitao Wang, Pei Lv, Xiaoheng Jiang, Qidong Liu, Pichao Wang, Mingliang Xu, Wanqing Li
Besides, these methods directly calculate the pair-wise global self-attention equally for all the joints in both the spatial and temporal dimensions, undervaluing the effect of discriminative local joints and the short-range temporal dynamics.
no code implementations • 29 Sep 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang
To achieve these two purposes, we propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.
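A difficulty-aware sampling schedule of this general kind can be sketched in a few lines: per-sample loss stands in for "difficulty", and the sampling distribution shifts from easy to hard as training progresses. The linear schedule and softmax weighting below are illustrative choices, not the paper's actual measure.

```python
import numpy as np

def sampling_probs(losses, progress):
    """Turn per-sample losses (a stand-in 'difficulty' signal) into sampling
    probabilities: favour easy samples early (progress ~ 0) and hard samples
    late (progress ~ 1)."""
    difficulty = (losses - losses.min()) / (np.ptp(losses) + 1e-8)
    logits = (2.0 * progress - 1.0) * difficulty   # negative slope early, positive late
    e = np.exp(logits - logits.max())
    return e / e.sum()

losses = np.array([0.1, 0.5, 1.0, 3.0, 5.0])
early = sampling_probs(losses, progress=0.0)   # most mass on the easiest sample
late = sampling_probs(losses, progress=1.0)    # most mass on the hardest sample
print(early.argmax(), late.argmax())           # 0 4
```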
1 code implementation • 21 Sep 2022 • Zihui Guo, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
It has been studied either using first person vision (FPV) or third person vision (TPV).
1 code implementation • CVPR 2022 • Hansheng Chen, Pichao Wang, Fan Wang, Wei Tian, Lu Xiong, Hao Li
The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.
Ranked #6 on 6D Pose Estimation using RGB on LineMOD
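The KL-based training signal can be illustrated with a toy discrete version: EPro-PnP works with a continuous density on the SE(3) manifold, but a small set of hypothetical pose candidates keeps the sketch short, and the scores and target below are made up for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over pose hypotheses."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

scores = np.array([0.2, 2.5, 0.1, -1.0])          # hypothetical pose scores
pred = np.exp(scores) / np.exp(scores).sum()      # predicted pose distribution
target = np.array([0.0, 1.0, 0.0, 0.0])           # all mass on the true pose
loss = kl_divergence(target, pred)                # reduces to -log pred[true]
print(round(loss, 4))
```

With a delta-like target, minimizing this KL is equivalent to maximizing the predicted probability of the ground-truth pose, which is what makes the layer trainable end to end.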
no code implementations • 19 Feb 2022 • Shanshan Wang, Lei Zhang, Pichao Wang
In our work, considering the different importance of pair-wise samples for both feature learning and domain alignment, we deduce our BP-Triplet loss for effective UDA from the perspective of Bayesian learning.
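The notion of giving pair-wise samples different importance can be sketched as a triplet loss with per-pair weights that grow for harder pairs. The sigmoid weighting below is an illustrative stand-in; the paper derives its weights from a Bayesian treatment of the pairs.

```python
import numpy as np

def weighted_triplet_loss(anchor, pos, neg, margin=0.3):
    """Triplet loss whose positive/negative terms are re-weighted by how
    poorly each pair currently satisfies the constraint (harder pairs
    receive larger weights)."""
    d_pos = np.linalg.norm(anchor - pos, axis=1)
    d_neg = np.linalg.norm(anchor - neg, axis=1)
    w_pos = 1.0 / (1.0 + np.exp(-(d_pos - d_pos.mean())))  # far positives = harder
    w_neg = 1.0 / (1.0 + np.exp(d_neg - d_neg.mean()))     # near negatives = harder
    return float(np.mean(np.maximum(w_pos * d_pos - w_neg * d_neg + margin, 0.0)))

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(8, 32)) for _ in range(3))
loss = weighted_triplet_loss(a, p, n)
print(loss >= 0.0)   # True
```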
no code implementations • 21 Jan 2022 • Pichao Wang, Fan Wang, Hao Li
During the KD process, the TCL loss transfers the local structure, exploits the higher order information, and mitigates the misalignment of the heterogeneous output of teacher and student networks.
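Transferring local structure while tolerating heterogeneous teacher/student outputs can be sketched with relation-based distillation: match the pairwise similarity matrices of the two feature batches rather than the raw features. This is an illustration in the spirit of such a loss, not the exact TCL formulation.

```python
import numpy as np

def local_structure_kd_loss(teacher_feats, student_feats):
    """Match pairwise cosine-similarity matrices of teacher and student
    batches, so neighbourhood structure is transferred; the similarity
    matrices have the same batch-by-batch shape even when the feature
    dimensions of the two networks differ."""
    def cos_sim(F):
        Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
        return Fn @ Fn.T
    return float(np.mean((cos_sim(teacher_feats) - cos_sim(student_feats)) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 128))   # teacher embeddings, dim 128
student = rng.normal(size=(8, 64))    # student embeddings, dim 64 -- mismatched
loss = local_structure_kd_loss(teacher, student)
print(loss >= 0.0)   # works despite the dimension mismatch
```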
1 code implementation • 23 Dec 2021 • Jingkai Zhou, Pichao Wang, Fan Wang, Qiong Liu, Hao Li, Rong Jin
Self-attention is powerful in modeling long-range dependencies, but it is weak in local finer-level feature learning.
Ranked #46 on Semantic Segmentation on ADE20K val
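The global/local trade-off described above is often addressed by pairing attention with a cheap local operator. The sketch below adds a depthwise 1-D convolution branch to a plain single-head attention (projection weights omitted for brevity); it illustrates the general pattern, not the paper's specific module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_attention(x):
    """Plain single-head self-attention over tokens (no learned projections)."""
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))
    return attn @ x

def local_branch(x, kernel):
    """Depthwise 1-D convolution along the token axis: a cheap local
    operator that recovers finer-level features attention tends to miss."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([xp[i:i + len(kernel)].T @ kernel for i in range(len(x))])

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))
out = global_attention(tokens) + local_branch(tokens, np.array([0.25, 0.5, 0.25]))
print(out.shape)   # (8, 16)
```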
1 code implementation • CVPR 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang, Du Zhang, Zhen Lei, Hao Li, Rong Jin
Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors.
Ranked #1 on Hand Gesture Recognition on NVGesture
1 code implementation • 2 Dec 2021 • Zhaoyuan Yin, Pichao Wang, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li, Rong Jin
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
Ranked #2 on Unsupervised Semantic Segmentation on COCO-Stuff-171 (using extra training data)
1 code implementation • CVPR 2022 • Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc van Gool
Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion.
Ranked #22 on 3D Human Pose Estimation on MPI-INF-3DHP
2 code implementations • 23 Nov 2021 • Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin
We first investigate self-supervised learning (SSL) methods with Vision Transformer (ViT) pretrained on unlabelled person images (the LUPerson dataset), and empirically find it significantly surpasses ImageNet supervised pre-training models on ReID tasks.
Ranked #1 on Unsupervised Person Re-Identification on Market-1501 (using extra training data)
2 code implementations • ICLR 2022 • Tongkun Xu, Weihua Chen, Pichao Wang, Fan Wang, Hao Li, Rong Jin
Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively.
Ranked #3 on Domain Adaptation on Office-31
no code implementations • 8 Sep 2021 • Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou, Fan Wang, Hao Li, Rong Jin
In this paper, we further investigate this problem and extend the above conclusion: early convolutions alone do not help stable training; rather, the scaled ReLU operation in the convolutional stem (conv-stem) is what matters.
1 code implementation • 28 May 2021 • Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, Rong Jin
A key component in vision transformers is the fully-connected self-attention, which is more powerful than CNNs at modelling long-range dependencies.
no code implementations • 15 Apr 2021 • Zitong Yu, Xiaobai Li, Pichao Wang, Guoying Zhao
3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from emergent 3D mask attacks.
no code implementations • 30 Mar 2021 • Shuning Chang, Pichao Wang, Fan Wang, Hao Li, Jiashi Feng
Temporal action proposal generation (TAPG) is a fundamental and challenging task in video understanding, especially in temporal action detection.
1 code implementation • 26 Mar 2021 • Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, Wenming Yang
The modified VTE is termed the Strided Transformer Encoder (STE), which is built upon the outputs of VTE.
Ranked #2 on 3D Human Pose Estimation on HumanEva-I
4 code implementations • ICCV 2021 • Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, Wei Jiang
Extracting robust feature representation is one of the key challenges in object re-identification (ReID).
Ranked #1 on Person Re-Identification on Market-1501-C
2 code implementations • 1 Feb 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin
Compared with previous NAS methods, the proposed Zen-NAS is orders of magnitude faster on multiple server-side and mobile-side GPU platforms, with state-of-the-art accuracy on ImageNet.
Ranked #2 on Neural Architecture Search on ImageNet
no code implementations • 5 Jan 2021 • Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
In this paper, we propose a Transformer-based RGB-D egocentric action recognition framework, called Trear.
2 code implementations • ICCV 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin
To address this issue, instead of using an accuracy predictor, we propose a novel zero-shot index dubbed Zen-Score to rank the architectures.
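A zero-cost proxy of this general family can be sketched quickly: score a randomly initialised network by how strongly it amplifies a small input perturbation, then rank candidate architectures by that score. This is a generic illustration in the spirit of such proxies, not the exact Zen-Score formula.

```python
import numpy as np

def zero_shot_score(widths, n_probe=8, eps=1e-2, seed=0):
    """Score a random-weight ReLU MLP (layer widths given) by the log ratio of
    output perturbation to input perturbation -- no training, no labels."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(scale=np.sqrt(2.0 / widths[i]), size=(widths[i], widths[i + 1]))
          for i in range(len(widths) - 1)]

    def forward(x):
        for W in Ws[:-1]:
            x = np.maximum(x @ W, 0.0)   # ReLU hidden layers
        return x @ Ws[-1]

    x = rng.normal(size=(n_probe, widths[0]))
    dx = eps * rng.normal(size=x.shape)
    delta = forward(x + dx) - forward(x)
    return float(np.log(np.linalg.norm(delta) / np.linalg.norm(dx)))

score = zero_shot_score([32, 64, 64, 10])   # one candidate architecture
print(score)
```

Because no gradient steps are taken, thousands of candidate architectures can be ranked this way in the time a single training run would take.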
no code implementations • 8 Dec 2020 • Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
In this paper, we propose a method consisting of two camera pose estimators that deal with the information from pairwise images and a short sequence of images respectively.
no code implementations • 29 Oct 2020 • Haoyuan Zhang, Yonghong Hou, Pichao Wang, Zihui Guo, Wanqing Li
The recently developed DARTS (Differentiable Architecture Search) is adopted to search for an effective network architecture that is built upon the two types of cells.
1 code implementation • 21 Aug 2020 • Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu, Stan Z. Li, Guoying Zhao
Gesture recognition has attracted considerable attention owing to its great potential in applications.
no code implementations • 21 Feb 2020 • Jingkun Gao, Xiaomin Song, Qingsong Wen, Pichao Wang, Liang Sun, Huan Xu
It is deployed as a public online service and widely adopted in different business scenarios at Alibaba Group.
no code implementations • 17 Mar 2018 • Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, Philip Ogunbona
This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition.
no code implementations • 5 Dec 2017 • Pichao Wang, Wanqing Li, Jun Wan, Philip Ogunbona, Xinwang Liu
Unlike the conventional ConvNet, which learns deep separable features for homogeneous modality-based classification with only one softmax loss function, the c-ConvNet enhances the discriminative power of the deeply learned features and weakens the undesired modality discrepancy by jointly optimizing a ranking loss and a softmax loss for both homogeneous and heterogeneous modalities.
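Jointly optimizing a ranking loss and a softmax loss can be sketched as a single combined objective: cross-entropy for classification plus a triplet-style term over features from the two modalities. The weighting `lam` and the feature/logit values below are illustrative, not the c-ConvNet's actual configuration.

```python
import numpy as np

def softmax_ce(logits, label):
    """Standard softmax cross-entropy for a single sample."""
    z = logits - logits.max()
    return float(-z[label] + np.log(np.exp(z).sum()))

def joint_loss(feat_a, feat_match, feat_mismatch, logits, label,
               margin=0.5, lam=0.5):
    """Softmax classification loss plus a ranking term that pulls matched
    cross-modal features together and pushes a mismatched pair apart."""
    rank = max(0.0, margin + np.linalg.norm(feat_a - feat_match)
                         - np.linalg.norm(feat_a - feat_mismatch))
    return softmax_ce(logits, label) + lam * rank

rng = np.random.default_rng(0)
rgb, depth, wrong = (rng.normal(size=64) for _ in range(3))
loss = joint_loss(rgb, depth, wrong, logits=np.array([1.0, 0.2, -0.5]), label=0)
print(loss > 0.0)   # True
```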
no code implementations • 31 Oct 2017 • Pichao Wang, Wanqing Li, Philip Ogunbona, Jun Wan, Sergio Escalera
Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data.
no code implementations • 6 Jul 2017 • Chuankun Li, Pichao Wang, Shuang Wang, Yonghong Hou, Wanqing Li
Recent methods based on 3D skeleton data have achieved outstanding performance due to the conciseness, robustness, and view-independent representation of such data.
1 code implementation • 2 May 2017 • Zewei Ding, Pichao Wang, Philip O. Ogunbona, Wanqing Li
The proposed method achieved state-of-the-art performance on NTU RGB+D dataset for 3D human action analysis.
Ranked #105 on Skeleton Based Action Recognition on NTU RGB+D (Accuracy (CV) metric)
no code implementations • CVPR 2017 • Pichao Wang, Wanqing Li, Zhimin Gao, Yuyao Zhang, Chang Tang, Philip Ogunbona
Based on the scene flow vectors, we propose a new representation, namely, Scene Flow to Action Map (SFAM), that describes several long term spatio-temporal dynamics for action recognition.
Ranked #3 on Hand Gesture Recognition on ChaLearn val
no code implementations • 7 Jan 2017 • Pichao Wang, Wanqing Li, Song Liu, Zhimin Gao, Chang Tang, Philip Ogunbona
This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI).
Ranked #2 on Hand Gesture Recognition on ChaLearn val
no code implementations • 30 Dec 2016 • Pichao Wang, Wanqing Li, Chuankun Li, Yonghong Hou
Convolutional Neural Networks (ConvNets) have recently shown promising performance in many computer vision tasks, especially image-based recognition.
Ranked #1 on Skeleton Based Action Recognition on Gaming 3D (G3D)
no code implementations • 8 Nov 2016 • Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li
Recently, Convolutional Neural Networks (ConvNets) have shown promising performances in many computer vision tasks, especially image-based recognition.
no code implementations • 22 Aug 2016 • Pichao Wang, Wanqing Li, Song Liu, Yuyao Zhang, Zhimin Gao, Philip Ogunbona
This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neural networks (ConvNets).
no code implementations • 1 Feb 2016 • Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li
This paper proposes a new framework for RGB-D-based action recognition that takes advantage of hand-designed features from skeleton data and deeply learned features from depth maps, and effectively exploits both the local and global temporal information.
no code implementations • 21 Jan 2016 • Jing Zhang, Wanqing Li, Philip O. Ogunbona, Pichao Wang, Chang Tang
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010.
no code implementations • 10 Nov 2015 • Chang Tang, Pichao Wang, Wanqing Li
This paper presents a fast yet effective method to recognize actions from a stream of noisy skeleton data, and a novel weighted covariance descriptor is adopted to accumulate evidence.
no code implementations • IEEE Transactions on Human-Machine Systems 2016 • Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, Chang Tang, Philip Ogunbona
In addition, the method was evaluated on the large dataset constructed from the above datasets.
Ranked #9 on Multimodal Activity Recognition on EV-Action
no code implementations • 20 Jan 2015 • Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, Chang Tang, Philip Ogunbona
The results show that our approach can achieve state-of-the-art results on the individual datasets without dramatic performance degradation on the Combined Dataset.
no code implementations • 14 Sep 2014 • Pichao Wang, Wanqing Li, Philip Ogunbona, Zhimin Gao, Hanling Zhang
These parts are referred to as Frequent Local Parts or FLPs.