Skeleton-Based Action Recognition With Shift Graph Convolutional Network

Action recognition with skeleton data is attracting increasing attention in computer vision. Recently, graph convolutional networks (GCNs), which model human body skeletons as spatiotemporal graphs, have obtained remarkable performance. However, the computational complexity of GCN-based methods is high, typically over 15 GFLOPs for one action sample; recent works even reach about 100 GFLOPs. Another shortcoming is that the receptive fields of both the spatial graph and the temporal graph are inflexible. Although some works enhance the expressiveness of the spatial graph by introducing incremental adaptive modules, their performance is still limited by the regular GCN structure. In this paper, we propose a novel shift graph convolutional network (Shift-GCN) to overcome both shortcomings. Instead of using heavy regular graph convolutions, our Shift-GCN is composed of novel shift graph operations and lightweight point-wise convolutions, where the shift graph operations provide flexible receptive fields for both the spatial graph and the temporal graph. On three datasets for skeleton-based action recognition, the proposed Shift-GCN notably exceeds state-of-the-art methods with more than 10 times less computational complexity.
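The core idea, shift operations plus point-wise convolutions in place of heavy graph convolutions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the channel grouping, the zero-filling of vacated frames, and all function names here are illustrative assumptions, shown for the temporal variant only.

```python
import numpy as np

def temporal_shift(x, num_groups=3):
    """Shift channel groups along the temporal axis (illustrative sketch).

    x: array of shape (C, T, V) -- channels, frames, skeleton joints.
    One channel group shifts forward in time, one shifts backward, and
    the rest stay in place; vacated frames are zero-filled (an assumed
    padding choice). The shift itself costs no multiply-adds, so almost
    all compute is left to the point-wise convolution below.
    """
    C, T, V = x.shape
    g = C // num_groups
    out = np.zeros_like(x)
    out[:g, 1:, :] = x[:g, :-1, :]        # group 0: shift forward in time
    out[g:2 * g, :-1, :] = x[g:2 * g, 1:, :]  # group 1: shift backward
    out[2 * g:] = x[2 * g:]               # remaining channels unchanged
    return out

def pointwise_conv(x, weight):
    """1x1 convolution over channels: weight has shape (C_out, C_in)."""
    return np.tensordot(weight, x, axes=([1], [0]))  # -> (C_out, T, V)

# Toy usage: 6 channels, 4 frames, 5 joints, projected to 8 channels.
x = np.random.randn(6, 4, 5)
w = np.random.randn(8, 6)
y = pointwise_conv(temporal_shift(x), w)
print(y.shape)  # -> (8, 4, 5)
```

After the shift mixes information across frames, each output feature at a frame already sees several time steps, so a cheap 1x1 convolution suffices where a regular temporal convolution would be used otherwise; the paper's spatial shift applies the same principle across the joint dimension.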

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Skeleton Based Action Recognition | NTU RGB+D | 4s Shift-GCN | Accuracy (CV) | 96.5 | #22 |
| | | | Accuracy (CS) | 90.7 | #34 |
| | | | Ensembled Modalities | 4 | #2 |
| Skeleton Based Action Recognition | NTU RGB+D 120 | 4s Shift-GCN | Accuracy (Cross-Subject) | 85.9% | #32 |
| | | | Accuracy (Cross-Setup) | 87.6% | #33 |
| | | | Ensembled Modalities | 4 | #1 |
| Skeleton Based Action Recognition | UAV-Human | Shift-GCN | CSv1 (%) | 37.98 | #3 |
| | | | CSv2 (%) | 67.04 | #3 |
