Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation

26 Mar 2021 · Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, Wenming Yang

Despite the great progress in 3D human pose estimation from videos, it is still an open problem to take full advantage of a redundant 2D pose sequence to learn a representative representation for generating one 3D pose. To this end, we propose an improved Transformer-based architecture, called Strided Transformer, which simply and effectively lifts a long sequence of 2D joint locations to a single 3D pose. Specifically, a Vanilla Transformer Encoder (VTE) is adopted to model long-range dependencies of 2D pose sequences. To reduce the redundancy of the sequence, fully-connected layers in the feed-forward network of VTE are replaced with strided convolutions to progressively shrink the sequence length and aggregate information from local contexts. The modified VTE is termed Strided Transformer Encoder (STE) and is built upon the outputs of VTE. STE not only effectively aggregates long-range information into a single-vector representation in a hierarchical global-and-local fashion, but also significantly reduces the computation cost. Furthermore, a full-to-single supervision scheme is designed at both the full-sequence and single-target-frame scales, applied to the outputs of VTE and STE, respectively. This scheme imposes extra temporal smoothness constraints in conjunction with the single-target-frame supervision and hence helps produce smoother and more accurate 3D poses. The proposed Strided Transformer is evaluated on two challenging benchmark datasets, Human3.6M and HumanEva-I, and achieves state-of-the-art results with fewer parameters. Code and models are available at https://github.com/Vegetebird/StridedTransformer-Pose3D.
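The following is a minimal PyTorch sketch of the core idea in the abstract: an encoder layer whose feed-forward network uses strided 1-D convolutions over time instead of position-wise fully-connected layers, so each layer shortens the pose sequence. The class name StridedEncoderLayer, the layer sizes, strides, and the max-pooled residual branch are illustrative assumptions, not the authors' exact implementation.

    # Sketch only: hyperparameters and module structure are assumptions.
    import torch
    import torch.nn as nn

    class StridedEncoderLayer(nn.Module):
        def __init__(self, d_model=256, n_heads=8, d_ff=512, stride=3):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            # Feed-forward network: strided convolutions over the temporal axis
            # replace the usual fully-connected layers and shrink the sequence.
            self.ffn = nn.Sequential(
                nn.Conv1d(d_model, d_ff, kernel_size=3, stride=stride, padding=1),
                nn.ReLU(),
                nn.Conv1d(d_ff, d_model, kernel_size=3, stride=1, padding=1),
            )
            # Residual branch is downsampled to match the shortened sequence.
            self.pool = nn.MaxPool1d(kernel_size=stride, stride=stride, ceil_mode=True)

        def forward(self, x):  # x: (batch, frames, d_model)
            x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
            y = self.ffn(x.transpose(1, 2))            # (batch, d_model, frames/stride)
            res = self.pool(x.transpose(1, 2))         # downsampled residual path
            return self.norm2((res + y).transpose(1, 2))

    # Usage: three stride-3 layers reduce a 243-frame sequence to 9 frames,
    # which can then be aggregated into the single target-frame representation.
    x = torch.randn(2, 243, 256)                       # embedded 2D pose sequence
    for layer in [StridedEncoderLayer(stride=3) for _ in range(3)]:
        x = layer(x)
    print(x.shape)                                     # torch.Size([2, 9, 256])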


Datasets

Human3.6M · HumanEva-I

| Task | Dataset | Model | Metric | Value | Global Rank | Uses 2D ground-truth joints | Setting |
|---|---|---|---|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M | StridedTransformer (T=243) | Average MPJPE (mm) | 44 | #96 | No | Monocular |
| 3D Human Pose Estimation | Human3.6M | StridedTransformer (T=351) | Average MPJPE (mm) | 43.7 | #93 | No | Monocular |
| 3D Human Pose Estimation | Human3.6M | StridedTransformer (T=81) | Average MPJPE (mm) | 45.4 | #115 | No | Monocular |
| 3D Human Pose Estimation | Human3.6M | StridedTransformer (T=27) | Average MPJPE (mm) | 46.9 | #126 | No | Monocular |
| 3D Human Pose Estimation | Human3.6M | StridedTransformer (T=243 CPN GTi) | Average MPJPE (mm) | 28.5 | #25 | Yes | Monocular |
| 3D Human Pose Estimation | HumanEva-I | StridedTransformer (T=27 MRCNN) | Mean Reconstruction Error (mm) | 18.9 | #9 | — | — |
| 3D Human Pose Estimation | HumanEva-I | StridedTransformer (T=27 GT) | Mean Reconstruction Error (mm) | 12.2 | #2 | — | — |
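For reference, a small sketch of how the table's main metric, MPJPE (Mean Per Joint Position Error, in millimetres), is typically computed; the array shapes and function name are assumptions. The HumanEva-I "Mean Reconstruction Error" is usually the same distance after a rigid (Procrustes) alignment of the prediction to the ground truth.

    # Sketch of the standard MPJPE metric; input shapes are assumptions.
    import numpy as np

    def mpjpe(pred, gt):
        """Mean Euclidean distance per joint, in the units of the inputs (mm).

        pred, gt: arrays of shape (frames, joints, 3) with 3D joint positions.
        """
        return np.linalg.norm(pred - gt, axis=-1).mean()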
