RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free

10 Jan 2019 · Cheng-Yang Fu, Mykhailo Shvets, Alexander C. Berg

Recently, two-stage detectors have surged ahead of single-shot detectors in the accuracy-vs-speed trade-off. Nevertheless, single-shot detectors remain immensely popular in embedded vision applications. This paper brings single-shot detectors up to the same level as current two-stage techniques. We do this by improving training for the state-of-the-art single-shot detector, RetinaNet, in three ways: integrating instance mask prediction for the first time, making the loss function adaptive and more stable, and including additional hard examples in training. We call the resulting augmented network RetinaMask. The detection component of RetinaMask has the same computational cost as the original RetinaNet, but is more accurate. COCO test-dev results are up to 41.4 mAP for RetinaMask-101 vs 39.1 mAP for RetinaNet-101, while the runtime is the same during evaluation. Adding Group Normalization increases the performance of RetinaMask-101 to 41.7 mAP. Code is at: https://github.com/chengyangfu/retinamask
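One of the three training improvements above is an adaptive, more stable loss function: a Smooth L1 regression loss whose control point is adjusted during training. The sketch below illustrates the idea under stated assumptions — the `SelfAdjustingSmoothL1` class name, the momentum-based running statistics, and the clipping range are illustrative choices, not the paper's exact formulation (see the linked repository for that).

```python
import numpy as np

def smooth_l1(x, beta):
    """Smooth L1 loss for residual x: quadratic below beta, linear above."""
    x = np.abs(x)
    return np.where(x < beta, 0.5 * x * x / beta, x - 0.5 * beta)

class SelfAdjustingSmoothL1:
    """Sketch of an adaptive Smooth L1: the control point beta is derived
    from running statistics of the absolute regression error and clipped
    to a stable range [eps, beta_max]. Hypothetical implementation."""

    def __init__(self, beta_max=1.0, momentum=0.9):
        self.beta_max = beta_max
        self.momentum = momentum
        self.mean = 0.0
        self.var = 0.0

    def __call__(self, residuals):
        r = np.abs(np.asarray(residuals, dtype=float))
        # Update running mean and variance of the absolute error.
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * r.mean()
        self.var = m * self.var + (1 - m) * r.var()
        # Pick beta from the running statistics, kept in a stable range.
        beta = float(np.clip(self.mean - self.var, 1e-3, self.beta_max))
        return smooth_l1(r, beta).mean()
```

Because beta shrinks as the regression error distribution tightens, the loss transitions from robust (mostly linear) early in training to precise (mostly quadratic) later, without hand-tuning a fixed control point.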


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Object Detection | COCO minival | RetinaMask (ResNet-101-FPN) | box AP | 41.1 | #154 |
| | | | AP50 | 60.2 | #77 |
| | | | AP75 | 44.1 | #69 |
| Object Detection | COCO test-dev | RetinaMask (ResNet-50-FPN) | box mAP | 39.4 | #187 |
| | | | AP50 | 58.6 | #136 |
| | | | AP75 | 42.3 | #132 |
| | | | APS | 21.9 | #121 |
| | | | APM | 42.0 | #125 |
| | | | APL | 51.0 | #125 |
| | | | Hardware Burden | 9G | #1 |
| Object Detection | COCO test-dev | RetinaMask (ResNeXt-101-FPN-GN) | box mAP | 42.6 | #157 |
| | | | AP50 | 62.5 | #100 |
| | | | AP75 | 46.0 | #109 |
| | | | APS | 24.8 | #93 |
| | | | APM | 45.6 | #95 |
| | | | APL | 53.8 | #104 |
| | | | Hardware Burden | 12G | #1 |
