SkeleTR: Towards Skeleton-based Action Recognition in the Wild

We present SkeleTR, a new framework for skeleton-based action recognition. In contrast to prior work, which focuses mainly on controlled environments, we target in-the-wild scenarios that typically involve a variable number of people and various forms of interaction between people. SkeleTR follows a two-stage paradigm: it first models the intra-person skeleton dynamics of each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture the person interactions that are important for action recognition in the wild. To mitigate the negative impact of inaccurate skeleton associations, SkeleTR takes relatively short skeleton sequences as input and increases the number of sequences. As a unified solution, SkeleTR can be directly applied to multiple skeleton-based action tasks, including video-level action classification, instance-level action detection, and group-level activity recognition. It also enables transfer learning and joint training across different action tasks and datasets, which results in performance improvements. When evaluated on various skeleton-based action recognition benchmarks, SkeleTR achieves state-of-the-art performance.
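The two-stage paradigm described above can be sketched in a few lines: a graph convolution summarizes each short skeleton sequence into one token, and a self-attention layer then mixes the tokens across persons. This is a minimal NumPy sketch, not the paper's implementation; the chain-shaped adjacency, layer sizes, single attention head, and mean pooling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V, C_IN, C_OUT, N, D = 8, 17, 3, 16, 4, 16  # frames, joints, channels, persons, token dim

def graph_conv(x, adj, w):
    """One graph-convolution layer over a skeleton: x (T, V, C_in) -> (T, V, C_out)."""
    a_norm = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-6)  # row-normalized adjacency
    return np.maximum(np.einsum("uv,tvc,cd->tud", a_norm, x, w), 0.0)  # aggregate + ReLU

def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention across person tokens: (N, D) -> (N, D)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over person tokens
    return attn @ v

# Stage 1: intra-person dynamics, one short sequence per tracked person.
adj = np.eye(V) + np.diag(np.ones(V - 1), 1) + np.diag(np.ones(V - 1), -1)  # toy chain skeleton
w_gc = rng.standard_normal((C_IN, C_OUT)) * 0.1
sequences = rng.standard_normal((N, T, V, C_IN))  # N short skeleton sequences
tokens = np.stack([graph_conv(s, adj, w_gc).mean(axis=(0, 1)) for s in sequences])  # (N, C_OUT)

# Stage 2: inter-person interaction via a Transformer-style attention layer.
wq, wk, wv = (rng.standard_normal((C_OUT, D)) * 0.1 for _ in range(3))
mixed = self_attention(tokens, wq, wk, wv)  # (N, D)
clip_embedding = mixed.mean(axis=0)         # pool over persons for a video-level prediction
print(clip_embedding.shape)                 # (16,)
```

Pooling the attended tokens yields a video-level embedding, while keeping the per-token outputs would support instance-level action detection, matching the multi-task usage described in the abstract.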

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Human Interaction Recognition | NTU RGB+D | SkeleTR | Accuracy (Cross-Subject) | 94.9 | #2 |
| Human Interaction Recognition | NTU RGB+D | SkeleTR | Accuracy (Cross-View) | 97.7 | #2 |
| Human Interaction Recognition | NTU RGB+D 120 | SkeleTR | Accuracy (Cross-Subject) | 87.8 | #3 |
| Human Interaction Recognition | NTU RGB+D 120 | SkeleTR | Accuracy (Cross-Setup) | 88.3 | #3 |
