TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Recognition	Animal Kingdom	MSQNet	mAP	73.1	# 1
Zero-Shot Action Recognition	Charades	MSQNet	mAP	35.59	# 1
Action Recognition	Charades	MSQNet	mAP	47.57	# 1
Zero-Shot Action Recognition	HMDB51	MSQNet	Accuracy	69.43	# 1
Action Recognition	HMDB51	MSQNet	Accuracy	93.25	# 1
Action Recognition	Hockey	MSQNet	Accuracy	3.05	# 1
Action Recognition	THUMOS14	MSQNet	Accuracy	83.16	# 1
Zero-Shot Action Recognition	THUMOS14	MSQNet	Accuracy	75.33	# 1
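
The mAP rows in this table come from multi-label benchmarks, where a single clip can carry several action labels at once. As a rough illustration of how such a score is commonly computed, the sketch below takes average precision per class over the ranked clips and then averages across classes. The exact evaluation protocol behind the leaderboard numbers is not stated on this page, and the function names and toy data are hypothetical.

```python
def average_precision(scores, labels):
    """AP for one class: mean of the precision at each rank where a positive appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive clips contributes 0.0 in this sketch (an assumption).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP: average the per-class AP over all classes (columns)."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / num_classes


# Toy data (hypothetical): 4 clips, 2 action classes, multi-hot ground truth.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

Accuracy, by contrast, applies to the single-label rows, where each clip has exactly one correct class.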

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

20 Jul 2023 · Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to complex model designs and high maintenance costs. Moreover, they often focus on learning the visual modality alone and on single-label classification, whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), which leverages both visual and textual modalities to better represent the action classes. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms prior actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code is made available at https://github.com/mondalanindya/MSQNet.
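
The abstract does not spell out how the multi-modal query is built or scored, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one query per action class, formed by adding a class-name text embedding to a video embedding, and an independent sigmoid per class so that several actions can be predicted for the same clip. The fusion by addition, the dot-product scoring, the threshold, and all names and numbers are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_queries(text_embeddings, video_embedding):
    """One query per action class: class-name text embedding fused with a video embedding.

    Fusion by element-wise addition is an assumption made for this sketch.
    """
    return [
        [t + v for t, v in zip(text_vec, video_embedding)]
        for text_vec in text_embeddings
    ]


def predict_actions(queries, video_feature, threshold=0.5):
    """Score each class query independently, so several actions can be active at once."""
    probs = []
    for query in queries:
        # Dot product stands in for the decoder and classifier head (an assumption).
        logit = sum(q * f for q, f in zip(query, video_feature))
        probs.append(sigmoid(logit))
    active = [i for i, p in enumerate(probs) if p >= threshold]
    return probs, active


# Hypothetical 3-dimensional embeddings for three class names.
class_names = ["running", "eating", "swimming"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
video_embedding = [0.1, 0.1, 0.1]
video_feature = [2.0, 1.0, -1.0]  # simplified single-vector stand-in for video tokens

queries = build_queries(text_embeddings, video_embedding)
probs, active = predict_actions(queries, video_feature)
predicted = [class_names[i] for i in active]
```

Because each class is thresholded on its own rather than competing in a softmax, the toy clip is assigned two labels at once, which is the multi-label behaviour the abstract contrasts with single-label classification. Nothing in the sketch depends on a pose model, so the same code path applies whether the actor is a human or an animal.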

Code

mondalanindya/msqnet official

Tasks

Action Classification

Action Recognition

Action Recognition In Videos

Action Recognition on HMDB-51

Animal Action Recognition

Zero-Shot Action Recognition

Datasets

HMDB51

Charades

THUMOS14

Animal Kingdom

Results from the Paper

Ranked #1 on Action Recognition on HMDB51

Methods

Dense Connections • Focus • Layer Normalization • Linear Layer • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Vision Transformer
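
Several of the components listed here (Scaled Dot-Product Attention, Softmax, Multi-Head Attention) are standard transformer building blocks. As a minimal, dependency-free sketch of the first two, the code below computes softmax(QK^T / sqrt(d_k))V for plain Python lists. It illustrates the generic operation only and is not taken from the MSQNet code base; the toy inputs are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy example: one query attending over two key/value pairs.
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

The query matches the first key more closely, so the output leans toward the first value vector while still mixing in the second. Multi-head attention runs several such attentions in parallel on learned projections and concatenates the results.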
