Learned Thresholds Token Merging and Pruning for Vision Transformers

20 Jul 2023 · Maxim Bonnaerens, Joni Dambre

Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks in recent years. However, their high computational cost remains a significant barrier to practical deployment. In particular, the complexity of transformer models is quadratic in the number of input tokens. Techniques that reduce the number of tokens to be processed have therefore been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that combines the strengths of token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results show that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp.
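The idea of a threshold masking module can be illustrated with a minimal sketch. This is not the authors' implementation: the per-token importance and similarity scores, the threshold values, the sigmoid relaxation with a temperature, the merge-before-prune order, and all function names below are assumptions made for illustration only. In a trained model the thresholds would be learned parameters; here they are fixed constants.

```python
import math


def soft_threshold_mask(scores, threshold, temperature=0.1):
    """Smooth mask in (0, 1): sigmoid((score - threshold) / temperature).

    A relaxation like this lets gradients reach the threshold (assumed here).
    """
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature)) for s in scores]


def hard_threshold_mask(scores, threshold):
    """Binary mask: 1 where the score exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


def decide(importance, similarity, prune_threshold, merge_threshold):
    """Label each token 'merge', 'prune', or 'keep' (hypothetical decision order)."""
    merge_mask = hard_threshold_mask(similarity, merge_threshold)
    keep_mask = hard_threshold_mask(importance, prune_threshold)
    actions = []
    for merge, keep in zip(merge_mask, keep_mask):
        if merge:
            actions.append("merge")   # very similar to another token
        elif not keep:
            actions.append("prune")   # importance at or below the threshold
        else:
            actions.append("keep")
    return actions


# Hypothetical scores for six tokens.
importance = [0.90, 0.05, 0.40, 0.02, 0.70, 0.10]
similarity = [0.20, 0.30, 0.95, 0.10, 0.50, 0.97]

actions = decide(importance, similarity, prune_threshold=0.08, merge_threshold=0.90)
print(actions)  # ['keep', 'prune', 'merge', 'prune', 'keep', 'merge']
```

Because the masks depend on the scores of each input, the number of tokens removed can differ from image to image, which is the sense in which the selection is dynamic.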

Code

mxbonn/ltmp official

Tasks

Efficient ViTs

Datasets

ImageNet

Results from the Paper

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Efficient ViTs	ImageNet-1K (with DeiT-S)	LTMP (80%)	Top 1 Accuracy	79.8	# 4
Efficient ViTs	ImageNet-1K (with DeiT-S)	LTMP (80%)	GFLOPs	3.8	# 37
Efficient ViTs	ImageNet-1K (with DeiT-S)	LTMP (45%)	Top 1 Accuracy	78.6	# 34
Efficient ViTs	ImageNet-1K (with DeiT-S)	LTMP (45%)	GFLOPs	2.3	# 5
Efficient ViTs	ImageNet-1K (with DeiT-S)	LTMP (60%)	Top 1 Accuracy	79.6	# 16
Efficient ViTs	ImageNet-1K (with DeiT-S)	LTMP (60%)	GFLOPs	3.0	# 23
Efficient ViTs	ImageNet-1K (with DeiT-T)	LTMP (45%)	Top 1 Accuracy	69.8	# 21
Efficient ViTs	ImageNet-1K (with DeiT-T)	LTMP (45%)	GFLOPs	0.7	# 5
Efficient ViTs	ImageNet-1K (with DeiT-T)	LTMP (60%)	Top 1 Accuracy	71.5	# 15
Efficient ViTs	ImageNet-1K (with DeiT-T)	LTMP (60%)	GFLOPs	0.8	# 8
Efficient ViTs	ImageNet-1K (with DeiT-T)	LTMP (80%)	Top 1 Accuracy	72.0	# 11
Efficient ViTs	ImageNet-1K (with DeiT-T)	LTMP (80%)	GFLOPs	1.0	# 18
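
The table lists accuracy and GFLOPs on separate rows, which makes the trade-off harder to read. As a rough sketch, the snippet below pairs the two metrics per model and compares the 80% and 45% settings for each backbone. The numbers are copied from the table; the names `results` and `trade_off` are illustrative and not part of the LTMP code.

```python
# (backbone, model label) -> (top-1 accuracy in %, GFLOPs), copied from the table.
results = {
    ("DeiT-S", "LTMP (80%)"): (79.8, 3.8),
    ("DeiT-S", "LTMP (60%)"): (79.6, 3.0),
    ("DeiT-S", "LTMP (45%)"): (78.6, 2.3),
    ("DeiT-T", "LTMP (80%)"): (72.0, 1.0),
    ("DeiT-T", "LTMP (60%)"): (71.5, 0.8),
    ("DeiT-T", "LTMP (45%)"): (69.8, 0.7),
}


def trade_off(backbone, high="LTMP (80%)", low="LTMP (45%)"):
    """Return (accuracy lost, GFLOPs saved) when moving from `high` to `low`."""
    acc_high, flops_high = results[(backbone, high)]
    acc_low, flops_low = results[(backbone, low)]
    # Round to one decimal to match the precision reported in the table.
    return round(acc_high - acc_low, 1), round(flops_high - flops_low, 1)


for backbone in ("DeiT-S", "DeiT-T"):
    acc_lost, flops_saved = trade_off(backbone)
    print(f"{backbone}: -{acc_lost} points top-1 for {flops_saved} GFLOPs saved")
```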

Methods

Pruning
