PLAR: Prompt Learning for Action Recognition

21 May 2023 · Xijun Wang, Ruiqi Xian, Tianrui Guan, Dinesh Manocha

We present a new general learning approach, Prompt Learning for Action Recognition (PLAR), which leverages the strengths of prompt learning to guide the learning process. Our approach predicts the action label by helping the model focus on the descriptions or instructions associated with actions in the input videos. Our formulation uses various prompts, including learnable prompts, auxiliary visual information, and large vision models, to improve recognition performance. In particular, we design a learnable prompt method that dynamically generates prompts from a pool of prompt experts conditioned on the input. By sharing the same objective as the task, PLAR optimizes prompts that guide the model's predictions while explicitly learning input-invariant (prompt experts pool) and input-specific (data-dependent) prompt knowledge. We evaluate our approach on datasets consisting of both ground-camera and aerial videos, and on scenes with single-agent and multi-agent actions. In practice, we observe a 3.17-10.2% accuracy improvement on the aerial multi-agent dataset Okutama and a 1.0-3.6% improvement on the ground-camera single-agent dataset Something-Something V2. We plan to release our code publicly.
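The sketch below is a hypothetical illustration (not the authors' released code) of the prompt-expert-pool idea described above, assuming a PyTorch transformer-style backbone: a data-dependent query derived from the video tokens softly selects from a shared pool of learnable prompt experts (input-invariant), and the resulting input-specific prompts are prepended to the token sequence before classification. All module and parameter names are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptExpertPool(nn.Module):
    """Hypothetical sketch of a learnable prompt-expert pool (not the official PLAR code)."""

    def __init__(self, dim, num_experts=8, prompts_per_expert=4):
        super().__init__()
        # Input-invariant knowledge: a shared pool of learnable prompt experts.
        self.experts = nn.Parameter(torch.randn(num_experts, prompts_per_expert, dim) * 0.02)
        # Keys used to match an input-derived query against the experts.
        self.keys = nn.Parameter(torch.randn(num_experts, dim) * 0.02)
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, tokens):
        # tokens: (B, N, dim) video tokens from the backbone's patch/frame embedding.
        query = self.query_proj(tokens.mean(dim=1))             # (B, dim), data-dependent query
        scores = F.softmax(query @ self.keys.t(), dim=-1)       # (B, num_experts)
        # Input-specific prompts: a soft mixture over the expert pool.
        prompts = torch.einsum("be,epd->bpd", scores, self.experts)  # (B, P, dim)
        # Prepend prompts so the backbone attends to them alongside the video tokens.
        return torch.cat([prompts, tokens], dim=1)

# Usage sketch: the prompts are optimized jointly with the recognition objective.
pool = PromptExpertPool(dim=768)
video_tokens = torch.randn(2, 196, 768)        # dummy batch of token embeddings
augmented = pool(video_tokens)                 # shape (2, 4 + 196, 768)
```

In this reading, the expert pool and keys capture knowledge shared across all inputs, while the softmax-weighted mixture makes the final prompts depend on each video, matching the paper's split into input-invariant and input-specific prompt knowledge.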


Results from the Paper


Task                 Dataset                  Model                      Metric           Value   Global Rank
Action Recognition   Okutama-Action           PLAR with bbox (Ours)      Accuracy         75.93   # 1
Action Recognition   Okutama-Action           PLAR without bbox (Ours)   Accuracy         71.54   # 2
Action Recognition   Something-Something V2   PLAR                       Top-1 Accuracy   67.3    # 65
Action Recognition   Something-Something V2   PLAR                       Top-5 Accuracy   91      # 47
