TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Object Detection	COCO test-dev	RelationNet++ (ResNeXt-64x4d-101-DCN)	box mAP	52.7	# 64
Object Detection	COCO test-dev	RelationNet++ (ResNeXt-64x4d-101-DCN)	Hardware Burden	None	# 1
Object Detection	COCO test-dev	RelationNet++ (ResNeXt-64x4d-101-DCN)	Operations per network pass	None	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/relationnet-bridging-visual-representations/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=relationnet-bridging-visual-representations)`

RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

NeurIPS 2020 · Cheng Chi, Fangyun Wei, Han Hu

Existing object detection frameworks are usually built on a single format of object/part representation, i.e., anchor/proposal rectangle boxes in RetinaNet and Faster R-CNN, center points in FCOS and RepPoints, and corner points in CornerNet. While these different representations usually drive the frameworks to perform well in different aspects, e.g., better classification or finer localization, it is in general difficult to combine them in a single framework to make good use of the strengths of each, because the different representations extract features in heterogeneous or non-grid ways. This paper presents an attention-based decoder module, similar to that in the Transformer (Vaswani et al., 2017), to bridge other representations into a typical object detector built on a single representation format, in an end-to-end fashion. The other representations act as a set of key instances that strengthen the main query representation features in the vanilla detectors. Novel techniques are proposed for efficient computation of the decoder module, including a key sampling approach and a shared location embedding approach. The proposed module is named bridging visual representations (BVR). It can be applied in-place, and we demonstrate its broad effectiveness in bridging other representations into prevalent object detection frameworks, including RetinaNet, Faster R-CNN, FCOS and ATSS, where improvements of about 1.5 to 3.0 AP are achieved. In particular, we improve a state-of-the-art framework with a strong backbone by about 2.0 AP, reaching 52.7 AP on COCO test-dev. The resulting network is named RelationNet++. The code will be available at https://github.com/microsoft/RelationNet2.
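
The query/key idea in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' BVR module: it omits learned projections and the shared location embedding, and the `top_k_keys` helper is a hypothetical stand-in for key sampling that assumes each key carries a confidence score. The function names `bridge`, `top_k_keys` and `softmax` are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_keys(key_feats, key_scores, k):
    """Keep the k highest-scoring keys (assumed stand-in for key sampling)."""
    order = sorted(range(len(key_feats)), key=lambda i: key_scores[i], reverse=True)[:k]
    return [key_feats[i] for i in order]

def bridge(query_feats, key_feats):
    """Add an attention-weighted sum of key features to each query feature."""
    dim = len(query_feats[0])
    out = []
    for q in query_feats:
        # Scaled dot-product similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(q, kf)) / math.sqrt(dim) for kf in key_feats]
        weights = softmax(scores)
        # Weighted average of key features, one value per feature dimension.
        attended = [sum(w * kf[d] for w, kf in zip(weights, key_feats)) for d in range(dim)]
        # Residual connection: the query feature is strengthened, not replaced.
        out.append([qd + ad for qd, ad in zip(q, attended)])
    return out

# Toy data: two query features (e.g. from boxes) and three key features
# (e.g. from points), each key with an assumed confidence score.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
key_scores = [0.9, 0.2, 0.7]

sampled = top_k_keys(keys, key_scores, k=2)
enhanced = bridge(queries, sampled)
```

Restricting attention to a few sampled keys keeps the cost per query proportional to `k` rather than to the total number of candidate keys, which is the general motivation for sampling in this kind of decoder.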

Code

microsoft/RelationNet2 official

210

shinya7y/UniverseNet

416

MindSpore-paper-code-2/code2

MindSpore-paper-code-3/code5

Tasks

Object Detection

Datasets

MS COCO

Results from the Paper

Ranked #64 on Object Detection on COCO test-dev

Methods

1x1 Convolution • ATSS • Convolution • CornerNet • Corner Pooling • Faster R-CNN • FCOS • Focal Loss • FPN • Hourglass Module • Max Pooling • Non Maximum Suppression • ReLU • RepPoints • Residual Connection • RetinaNet • RoIPool • RPN • Softmax • Stacked Hourglass Network
