TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	GTP-ViT-B-Patch8/P20	Top 1 Accuracy	85.8%	# 188
Image Classification	ImageNet	GTP-EVA-L/P8	Top 1 Accuracy	85.4%	# 222
Image Classification	ImageNet	GTP-ViT-L/P8	Top 1 Accuracy	83.7%	# 366
Image Classification	ImageNet	GTP-LV-ViT-M/P8	Top 1 Accuracy	82.8%	# 454
Image Classification	ImageNet	GTP-LV-ViT-M/P8	GFLOPs	8	# 267
Image Classification	ImageNet	GTP-LV-ViT-S/P8	Top 1 Accuracy	81.9%	# 544
Image Classification	ImageNet	GTP-LV-ViT-S/P8	GFLOPs	4.8	# 226
Image Classification	ImageNet	GTP-DeiT-B/P8	Top 1 Accuracy	81.5%	# 578
Image Classification	ImageNet	GTP-DeiT-B/P8	GFLOPs	13.1	# 323
Image Classification	ImageNet	GTP-DeiT-S/P8	Top 1 Accuracy	79.5%	# 692
Image Classification	ImageNet	GTP-DeiT-S/P8	GFLOPs	3.4	# 178

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation

6 Nov 2023 · Xuwei Xu, Sen Wang, Yudong Chen, Yanping Zheng, Zhewei Wei, Jiajun Liu

Vision Transformers (ViTs) have revolutionized the field of computer vision, yet their deployments on resource-constrained devices remain challenging due to high computational demands. To expedite pre-trained ViTs, token pruning and token merging approaches have been developed, which aim at reducing the number of tokens involved in the computation. However, these methods still have some limitations, such as image information loss from pruned tokens and inefficiency in the token-matching process. In this paper, we introduce a novel Graph-based Token Propagation (GTP) method to resolve the challenge of balancing model efficiency and information preservation for efficient ViTs. Inspired by graph summarization algorithms, GTP meticulously propagates less significant tokens' information to spatially and semantically connected tokens that are of greater importance. Consequently, the remaining few tokens serve as a summarization of the entire token graph, allowing the method to reduce computational complexity while preserving essential information of eliminated tokens. Combined with an innovative token selection strategy, GTP can efficiently identify image tokens to be propagated. Extensive experiments have validated GTP's effectiveness, demonstrating both efficiency and performance improvements. Specifically, GTP decreases the computational complexity of both DeiT-S and DeiT-B by up to 26% with only a minimal 0.3% accuracy drop on ImageNet-1K without finetuning, and remarkably surpasses the state-of-the-art token merging method on various backbones at an even faster inference speed. The source code is available at https://github.com/Ackesnal/GTP-ViT.
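The propagation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the official repository for that). It assumes importance scores and a token graph are already given, and the function name `propagate_tokens`, the mixing weight `alpha`, and the simple weighted-sum update are all hypothetical choices made for illustration. It shows only the general pattern: keep the highest-scoring tokens and fold each eliminated token's features into the kept tokens it is connected to.

```python
def propagate_tokens(tokens, scores, adjacency, num_keep, alpha=0.5):
    """Toy token propagation (illustrative only, not the paper's exact rule).

    tokens:    list of n feature vectors (lists of floats)
    scores:    list of n importance scores (assumed to be given)
    adjacency: n x n edge weights of the token graph (assumed to be given)
    num_keep:  number of tokens that survive
    alpha:     hypothetical weight controlling how much is propagated
    """
    n = len(tokens)
    # Rank tokens by importance, highest first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:num_keep])   # indices of surviving tokens
    dropped = order[num_keep:]        # indices of eliminated tokens

    summary = []
    for k in kept:
        vec = list(tokens[k])
        # Add the features of every eliminated neighbour of token k.
        for d in dropped:
            w = adjacency[k][d]
            if w:
                for j in range(len(vec)):
                    vec[j] += alpha * w * tokens[d][j]
        summary.append(vec)
    return kept, summary


# Four 2-d tokens; token 1 is linked to token 0, token 3 to token 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
scores = [0.9, 0.1, 0.8, 0.2]
adjacency = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
kept, summary = propagate_tokens(tokens, scores, adjacency, num_keep=2)
print(kept)     # [0, 2]
print(summary)  # [[1.0, 0.5], [3.0, 2.0]]
```

In this toy run, half of the four tokens are removed, yet each surviving token carries a weighted share of its eliminated neighbour, which is the sense in which the remaining tokens act as a summary of the whole graph.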

Code

ackesnal/gtp-vit (official)

Tasks

Efficient ViTs

Image Classification

Datasets

ImageNet

Results from the Paper

Ranked #188 on Image Classification on ImageNet

Methods

Pruning
