Learning Spatio-Temporal Transformer for Visual Tracking

In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component. The encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder learns a query embedding to predict the spatial positions of the target objects. Our method casts object tracking as a direct bounding box prediction problem, without using any proposals or predefined anchors. With the encoder-decoder transformer, object prediction uses only a simple fully-convolutional network, which estimates the corners of the target directly. The whole method is end-to-end and requires no post-processing steps such as cosine windowing or bounding box smoothing, thus largely simplifying existing tracking pipelines. The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks, while running at real-time speed, 6x faster than Siam R-CNN. Code and models are open-sourced at https://github.com/researchmm/Stark.
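To make the described pipeline concrete, below is a minimal PyTorch-style sketch of an encoder-decoder transformer tracker with a direct corner-regression head. It is not the authors' implementation: the module names, feature dimensions, ResNet-50 backbone, and the MLP corner regressor (standing in for the paper's fully-convolutional corner head) are all illustrative assumptions.

```python
# Sketch of an encoder-decoder transformer tracker with direct box prediction.
# NOT the official STARK code; names, shapes, and the backbone are assumptions.
import torch
import torch.nn as nn
import torchvision


class TransformerTrackerSketch(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=6):
        super().__init__()
        # Convolutional backbone shared by the template and search images.
        backbone = torchvision.models.resnet50()
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.input_proj = nn.Conv2d(2048, d_model, kernel_size=1)

        # Encoder fuses template + search-region tokens jointly; the decoder
        # attends to them through a single learned target query.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.target_query = nn.Embedding(1, d_model)

        # Small head regressing the box corners directly: no anchors,
        # proposals, or post-processing (stand-in for the FCN corner head).
        self.corner_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4),  # (x1, y1, x2, y2), normalized to [0, 1]
        )

    def _tokens(self, img):
        feat = self.input_proj(self.backbone(img))  # (B, C, H, W)
        return feat.flatten(2).transpose(1, 2)      # (B, H*W, C)

    def forward(self, template, search):
        # Concatenate template and search tokens into one sequence.
        src = torch.cat([self._tokens(template), self._tokens(search)], dim=1)
        query = self.target_query.weight.unsqueeze(0).expand(src.size(0), -1, -1)
        hs = self.transformer(src, query)                  # (B, 1, C)
        return self.corner_head(hs.squeeze(1)).sigmoid()   # normalized box


# Usage: one template crop and one search-region crop per sample.
model = TransformerTrackerSketch()
template = torch.randn(1, 3, 128, 128)
search = torch.randn(1, 3, 320, 320)
box = model(template, search)  # shape (1, 4)
```

Because the query output is mapped straight to box coordinates, tracking reduces to cropping a search region around the previous prediction and running a single forward pass per frame, which is what removes the need for anchors or window penalties.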

Task                     Dataset      Model  Metric Name            Metric Value  Global Rank
Visual Object Tracking   GOT-10k      STARK  Average Overlap        68.8          #18
Visual Object Tracking   GOT-10k      STARK  Success Rate 0.5       78.1          #13
Visual Object Tracking   LaSOT        STARK  AUC                    67.1          #21
Visual Object Tracking   LaSOT        STARK  Normalized Precision   77.0          #17
Visual Object Tracking   TrackingNet  STARK  Precision              79.1          #16
Visual Object Tracking   TrackingNet  STARK  Normalized Precision   86.9          #16
Visual Object Tracking   TrackingNet  STARK  Accuracy               82.0          #18
