ResNeSt: Split-Attention Networks

Feature-map attention and multi-path representation are well known to be important for visual recognition. In this paper, we present a modularized architecture that applies channel-wise attention across different network branches, leveraging their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block that can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in the accuracy-latency trade-off on image classification. In addition, ResNeSt achieves superior transfer learning results on several public benchmarks when serving as the backbone, and has been adopted by the winning entries of the COCO-LVIS challenge. The source code for the complete system and the pretrained models are publicly available.
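To make "channel-wise attention on different network branches" concrete, here is a minimal pure-Python sketch of one way such a block could fuse its branches. It is an illustration under stated assumptions, not the authors' implementation: the function name `split_attention`, the weight arguments `w_reduce` and `w_expand`, the bias-free dense layers, and the choice of a softmax across branches are all hypothetical details that the abstract does not spell out. The sketch assumes at least two branches.

```python
import math

def split_attention(branches, w_reduce, w_expand):
    """Fuse R branch feature maps with per-channel attention (illustrative sketch).

    branches: list of R branches; each branch is a list of C channels;
              each channel is a flat list of spatial activations.
    w_reduce: hypothetical bottleneck weights, shape [C_hidden][C].
    w_expand: hypothetical expansion weights, shape [R][C][C_hidden].
    """
    R, C = len(branches), len(branches[0])
    n = len(branches[0][0])

    # 1. Sum the branches element-wise, then global-average-pool each channel.
    pooled = [sum(sum(branches[r][c]) for r in range(R)) / n for c in range(C)]

    # 2. Bottleneck: dense layer followed by ReLU on the pooled descriptor.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]

    # 3. One logit per (branch, channel) pair.
    logits = [
        [sum(w * h for w, h in zip(w_expand[r][c], hidden)) for c in range(C)]
        for r in range(R)
    ]

    # 4. Softmax across branches, computed separately for each channel.
    attn = [[0.0] * C for _ in range(R)]
    for c in range(C):
        peak = max(logits[r][c] for r in range(R))
        exps = [math.exp(logits[r][c] - peak) for r in range(R)]
        total = sum(exps)
        for r in range(R):
            attn[r][c] = exps[r] / total

    # 5. Attention-weighted sum of the branches, channel by channel.
    fused = [
        [sum(attn[r][c] * branches[r][c][i] for r in range(R)) for i in range(n)]
        for c in range(C)
    ]
    return fused, attn


# Toy input: R=2 branches, C=2 channels, 2 spatial positions per channel.
branches = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[3.0, 4.0], [2.0, 1.0]],
]
w_reduce = [[0.5, -0.5]]                        # C_hidden = 1
w_expand = [[[1.0], [0.0]], [[-1.0], [0.0]]]    # zero weights for channel 1
fused, attn = split_attention(branches, w_reduce, w_expand)
```

In this toy input the expansion weights for the second channel are zero, so both branches receive equal attention there and the fused channel is their plain average, while the first channel favors the first branch. Because the attention is computed per channel, each branch can dominate some channels and be suppressed in others.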


Results from the Paper


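Task | Dataset | Model | Metric | Value | Global Rank
--- | --- | --- | --- | --- | ---
Semantic Segmentation | ADE20K | ResNeSt-269 | Validation mIoU | 47.60 | #152
Semantic Segmentation | ADE20K | ResNeSt-101 | Validation mIoU | 46.91 | #160
Semantic Segmentation | ADE20K | ResNeSt-200 | Validation mIoU | 48.36 | #139
Semantic Segmentation | ADE20K val | ResNeSt-101 | mIoU | 46.91 | #64
Semantic Segmentation | ADE20K val | ResNeSt-200 | mIoU | 48.36 | #58
Semantic Segmentation | ADE20K val | ResNeSt-269 | mIoU | 47.60 | #61
Semantic Segmentation | Cityscapes test | ResNeSt200 (Mapillary) | Mean IoU (class) | 83.3% | #15
Semantic Segmentation | Cityscapes val | ResNeSt-200 | mIoU | 82.7 | #28
Instance Segmentation | COCO minival | ResNeSt-200-DCN (single-scale) | mask AP | 44.5 | #50
Panoptic Segmentation | COCO minival | PanopticFPN+ResNeSt(single-scale) | PQ | 47.9 | #20
Panoptic Segmentation | COCO minival | PanopticFPN+ResNeSt(single-scale) | PQth | 55.1 | #17
Panoptic Segmentation | COCO minival | PanopticFPN+ResNeSt(single-scale) | PQst | 37.0 | #16
Object Detection | COCO minival | ResNeSt-200-DCN (single-scale) | box AP | 50.91 | #71
Object Detection | COCO minival | ResNeSt-200-DCN (single-scale) | AP50 | 69.53 | #22
Object Detection | COCO minival | ResNeSt-200-DCN (single-scale) | AP75 | 55.40 | #15
Object Detection | COCO minival | ResNeSt-200-DCN (single-scale) | APS | 32.67 | #14
Object Detection | COCO minival | ResNeSt-200-DCN (single-scale) | APM | 54.66 | #14
Object Detection | COCO minival | ResNeSt-200-DCN (single-scale) | APL | 65.83 | #16
Object Detection | COCO minival | ResNeSt-200 (multi-scale) | box AP | 52.47 | #63
Object Detection | COCO minival | ResNeSt-200 (multi-scale) | AP50 | 71.00 | #13
Object Detection | COCO minival | ResNeSt-200 (multi-scale) | AP75 | 57.07 | #10
Object Detection | COCO minival | ResNeSt-200 (multi-scale) | APS | 36.80 | #10
Object Detection | COCO minival | ResNeSt-200 (multi-scale) | APM | 56.36 | #11
Object Detection | COCO minival | ResNeSt-200 (multi-scale) | APL | 66.29 | #15
Instance Segmentation | COCO minival | ResNeSt-200 (single-scale) | mask AP | 44.21 | #54
Instance Segmentation | COCO minival | ResNeSt-101 (single-scale) | mask AP | 41.56 | #62
Instance Segmentation | COCO minival | ResNeSt-200 (multi-scale) | mask AP | 46.25 | #41
Object Detection | COCO minival | ResNeSt-200 (single-scale) | box AP | 50.54 | #73
Object Detection | COCO minival | ResNeSt-200 (single-scale) | AP50 | 68.78 | #25
Object Detection | COCO minival | ResNeSt-200 (single-scale) | AP75 | 55.17 | #17
Object Detection | COCO minival | ResNeSt-200 (single-scale) | APM | 54.2 | #15
Object Detection | COCO minival | ResNeSt-200 (single-scale) | APL | 63.9 | #20
Instance Segmentation | COCO test-dev | ResNeSt-200 (multi-scale) | AP50 | 70.2 | #10
Instance Segmentation | COCO test-dev | ResNeSt-200 (multi-scale) | AP75 | 51.5 | #9
Instance Segmentation | COCO test-dev | ResNeSt-200 (multi-scale) | APS | 30.0 | #9
Instance Segmentation | COCO test-dev | ResNeSt-200 (multi-scale) | APM | 49.6 | #8
Instance Segmentation | COCO test-dev | ResNeSt-200 (multi-scale) | APL | 60.6 | #10
Instance Segmentation | COCO test-dev | ResNeSt101 | mask AP | 43% | #49
Object Detection | COCO test-dev | ResNeSt-200 (multi-scale) | box mAP | 53.3 | #48
Object Detection | COCO test-dev | ResNeSt-200 (multi-scale) | AP50 | 72.0 | #10
Object Detection | COCO test-dev | ResNeSt-200 (multi-scale) | AP75 | 58.0 | #17
Object Detection | COCO test-dev | ResNeSt-200 (multi-scale) | APS | 35.1 | #14
Object Detection | COCO test-dev | ResNeSt-200 (multi-scale) | APM | 56.2 | #13
Object Detection | COCO test-dev | ResNeSt-200 (multi-scale) | APL | 66.8 | #8
Semantic Segmentation | DADA-seg | ResNeSt (ResNeSt-101) | mIoU | 19.99 | #21
Image Classification | ImageNet | ResNeSt-101 | Top 1 Accuracy | 83.0% | #437
Image Classification | ImageNet | ResNeSt-101 | Number of params | 48M | #713
Image Classification | ImageNet | ResNeSt-269 | Top 1 Accuracy | 84.5% | #293
Image Classification | ImageNet | ResNeSt-269 | Number of params | 111M | #874
Image Classification | ImageNet | ResNeSt-200 | Top 1 Accuracy | 83.9% | #347
Image Classification | ImageNet | ResNeSt-200 | Number of params | 70M | #787
Image Classification | ImageNet | ResNeSt-50 | Top 1 Accuracy | 81.13% | #605
Image Classification | ImageNet | ResNeSt-50 | Number of params | 27.5M | #625
Image Classification | ImageNet | ResNeSt-50 | GFLOPs | 5.39 | #235
Image Classification | ImageNet | ResNeSt-50-fast | Top 1 Accuracy | 80.64% | #633
Image Classification | ImageNet | ResNeSt-50-fast | Number of params | 27.5M | #625
Image Classification | ImageNet | ResNeSt-50-fast | GFLOPs | 4.34 | #207
Semantic Segmentation | PASCAL Context | ResNeSt-101 | mIoU | 56.5 | #20
Semantic Segmentation | PASCAL Context | ResNeSt-269 | mIoU | 58.9 | #16
Semantic Segmentation | PASCAL Context | ResNeSt-200 | mIoU | 58.4 | #17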

Methods