MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction

Human motion prediction is a challenging task due to the stochasticity and aperiodicity of future poses. Recently, graph convolutional network has been proven to be very effective to learn dynamic relations among pose joints, which is helpful for pose prediction. On the other hand, one can abstract a human pose recursively to obtain a set of poses at multiple scales. With the increase of the abstraction level, the motion of the pose becomes more stable, which benefits pose prediction too. In this paper, we propose a novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) for human pose prediction task in the manner of end-to-end. The GCNs are used to extract features from fine to coarse scale and then from coarse to fine scale. The extracted features at each scale are then combined and decoded to obtain the residuals between the input and target poses. Intermediate supervisions are imposed on all the predicted poses, which enforces the network to learn more representative features. Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset. Experimental results demonstrate that our method outperforms the state-of-the-art approaches. Code and pre-trained models are available at https://github.com/Droliven/MSRGCN.

PDF Abstract ICCV 2021 PDF ICCV 2021 Abstract

Datasets


Task Dataset Model Metric Name Metric Value Global Rank Result Benchmark
Human Pose Forecasting Human3.6M MSR-GCN Average MPJPE (mm) @ 1000 ms 114.2 # 8
Average MPJPE (mm) @ 400ms 62.9 # 7

Methods