ResT: An Efficient Transformer for Visual Recognition

NeurIPS 2021 · Qinglong Zhang, YuBin Yang

This paper presents an efficient multi-scale vision Transformer, called ResT, that capably serves as a general-purpose backbone for image recognition. Unlike existing Transformer methods, which employ standard Transformer blocks to tackle raw images at a fixed resolution, ResT has several advantages: (1) a memory-efficient multi-head self-attention is built, which compresses memory with a simple depth-wise convolution and projects interactions across the attention-head dimension while keeping the diversity of the heads; (2) position encoding is constructed as spatial attention, which is more flexible and can handle input images of arbitrary size without interpolation or fine-tuning; (3) instead of straightforward tokenization at the beginning of each stage, the patch embedding is designed as a stack of overlapping convolution operations with stride on the 2D-reshaped token map. We comprehensively validate ResT on image classification and downstream tasks. Experimental results show that ResT outperforms recent state-of-the-art backbones by a large margin, demonstrating its potential as a strong backbone. The code and models will be made publicly available at https://github.com/wofmanaf/ResT.
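The first two points of the abstract describe mechanisms that are easy to picture in code. Below is a minimal PyTorch sketch, not the authors' released implementation: class names such as EfficientMSA and PixelAttentionPE, and the reduction ratio sr_ratio, are illustrative assumptions. It shows (1) multi-head self-attention whose keys and values are spatially compressed by a strided depth-wise convolution, with a 1x1 convolution mixing the attention maps across heads, and (2) a position encoding realized as spatial attention over the 2D token map, which by construction works for any input resolution.

import torch
import torch.nn as nn


class EfficientMSA(nn.Module):
    """Sketch of a memory-efficient multi-head self-attention: keys/values
    are spatially compressed with a strided depth-wise convolution, and a
    1x1 convolution mixes information across the attention-head dimension."""

    def __init__(self, dim, num_heads=8, sr_ratio=2):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

        # Depth-wise conv with stride compresses the 2D token map for K and V.
        self.sr = (nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio, groups=dim)
                   if sr_ratio > 1 else nn.Identity())
        self.sr_norm = nn.LayerNorm(dim)

        # 1x1 conv projects interactions across the heads dimension of the attention map.
        self.head_mix = nn.Conv2d(num_heads, num_heads, kernel_size=1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Reshape tokens back to a 2D map, compress spatially, flatten again.
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
        x_ = self.sr_norm(x_)

        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]                            # each (B, heads, N', head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, N')
        attn = self.head_mix(attn)                     # interaction across heads
        attn = self.softmax(attn)

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class PixelAttentionPE(nn.Module):
    """Sketch of position encoding as spatial attention: a depth-wise 3x3
    convolution produces a per-pixel gate that modulates the token map,
    so no fixed-size positional table (and no interpolation) is needed."""

    def __init__(self, dim):
        super().__init__()
        self.pa_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B, C, H, W) 2D token map
        return x * self.sigmoid(self.pa_conv(x))


# Usage on a 56x56 token map with 64 channels (shapes are illustrative).
tokens = torch.randn(2, 56 * 56, 64)
print(EfficientMSA(dim=64, num_heads=2)(tokens, 56, 56).shape)  # torch.Size([2, 3136, 64])
fmap = torch.randn(2, 64, 56, 56)
print(PixelAttentionPE(dim=64)(fmap).shape)                     # torch.Size([2, 64, 56, 56])

Because K and V are computed from the compressed map, the attention matrix shrinks from N x N to N x N/sr_ratio^2, which is where the memory saving comes from.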


Results from the Paper


Task                  Dataset   Model       Metric            Value    Global Rank
Image Classification  ImageNet  ResT-Small  Top 1 Accuracy    79.6%    #689
Image Classification  ImageNet  ResT-Small  Number of params  13.66M   #510
Image Classification  ImageNet  ResT-Small  GFLOPs            1.9      #145
Image Classification  ImageNet  ResT-Large  Top 1 Accuracy    83.6%    #379
Image Classification  ImageNet  ResT-Large  Number of params  51.63M   #734
Image Classification  ImageNet  ResT-Large  GFLOPs            7.9      #265
