Training a Large Video Model on a Single Machine in a Day

28 Sep 2023 · Yue Zhao, Philipp Krähenbühl

Videos are big, complex to pre-process, and slow to train on. State-of-the-art large-scale video models are trained on clusters of 32 or more GPUs for several days. As a consequence, academia largely ceded the training of large video models to industry. In this paper, we show how to still train a state-of-the-art video model on a single machine with eight consumer-grade GPUs in a day. We identify three bottlenecks, IO, CPU, and GPU computation, and optimize each. The result is a highly efficient video training pipeline. For comparable architectures, our pipeline achieves higher accuracies with $\frac{1}{8}$ of the computation compared to prior work. Code is available at https://github.com/zhaoyue-zephyrus/AVION.
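The abstract names three sequential bottlenecks — IO, CPU pre-processing, and GPU computation — without detailing the fixes. As a rough, hypothetical illustration (not the paper's actual pipeline), the sketch below shows where the three stages sit in a naive video training loop and how prefetching the next clip on background threads overlaps IO/CPU work with GPU computation; all function names here are stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def read_clip(path):
    # IO bottleneck: fetch compressed video bytes from disk (stand-in).
    return b"compressed-bytes-for-" + path.encode()


def decode_and_crop(raw):
    # CPU bottleneck: decode and augment frames (stand-in 4-frame clip).
    return [raw] * 4


def train_step(batch):
    # GPU bottleneck: forward/backward pass (stand-in "loss" value).
    return len(batch)


def train(paths, prefetch=True):
    losses = []
    if not prefetch:
        # Naive loop: the GPU idles while each clip is read and decoded.
        for p in paths:
            losses.append(train_step(decode_and_crop(read_clip(p))))
        return losses
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Overlapped loop: IO + CPU work for clip i+1 runs while the
        # "GPU" trains on clip i, hiding the first two bottlenecks.
        load = lambda p: decode_and_crop(read_clip(p))
        future = pool.submit(load, paths[0])
        for nxt in paths[1:] + [None]:
            batch = future.result()
            if nxt is not None:
                future = pool.submit(load, nxt)
            losses.append(train_step(batch))
    return losses


if __name__ == "__main__":
    print(train(["a.mp4", "b.mp4"]))
```

Both code paths produce the same losses; the prefetching variant simply keeps the compute stage busy, which is the standard way to hide data-loading latency behind GPU work.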


Datasets

EPIC-KITCHENS-100

Results

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Multi-Instance Retrieval | EPIC-KITCHENS-100 | Avion (ViT-L) | mAP (V2T) | 57.9 | #1 |
| Multi-Instance Retrieval | EPIC-KITCHENS-100 | Avion (ViT-L) | mAP (T2V) | 51.1 | #1 |
| Multi-Instance Retrieval | EPIC-KITCHENS-100 | Avion (ViT-L) | mAP (Avg) | 54.5 | #1 |
| Multi-Instance Retrieval | EPIC-KITCHENS-100 | Avion (ViT-L) | nDCG (V2T) | 70.4 | #1 |
| Multi-Instance Retrieval | EPIC-KITCHENS-100 | Avion (ViT-L) | nDCG (T2V) | 67.6 | #1 |
| Multi-Instance Retrieval | EPIC-KITCHENS-100 | Avion (ViT-L) | nDCG (Avg) | 69.0 | #1 |
| Action Recognition | EPIC-KITCHENS-100 | Avion (ViT-L) | Action@1 | 54.4 | #1 |
| Action Recognition | EPIC-KITCHENS-100 | Avion (ViT-L) | Verb@1 | 73.0 | #1 |
| Action Recognition | EPIC-KITCHENS-100 | Avion (ViT-L) | Noun@1 | 65.4 | #2 |