AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation
Most video object segmentation approaches process each object separately, which incurs high computational cost when multiple objects are present. In this paper, we propose AGSS-VOS, which segments multiple objects in a single feed-forward pass via an instance-agnostic module and an instance-specific module. Information from the two modules is fused by an attention-guided decoder so that all object instances are segmented simultaneously in one pass. The whole framework is end-to-end trainable with an instance IoU loss. Experimental results on the YouTube-VOS and DAVIS-2017 datasets demonstrate that AGSS-VOS achieves competitive results in terms of both accuracy and efficiency.
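The two core ideas of the abstract can be illustrated with a minimal sketch: a shared instance-agnostic feature map modulated by per-instance attention maps (so all instances are decoded in one pass rather than one forward pass per object), and a soft instance IoU loss. This is not the authors' implementation; the function names, shapes, and the NumPy formulation are assumptions for illustration only.

```python
import numpy as np

def attention_guided_fusion(shared_feat, inst_attn):
    """Modulate one instance-agnostic feature map with per-instance attention.

    shared_feat: (C, H, W) features computed once for the whole frame.
    inst_attn:   (N, H, W) attention maps in [0, 1], one per instance.
    Returns (N, C, H, W): per-instance features obtained from a single
    shared forward pass, rather than N separate passes.
    """
    return inst_attn[:, None, :, :] * shared_feat[None, :, :, :]

def instance_iou_loss(pred, target, eps=1e-6):
    """Soft Jaccard (IoU) loss averaged over instances.

    pred, target: (N, H, W); pred is a soft mask in [0, 1].
    Loss per instance is 1 - |pred ∩ target| / |pred ∪ target|.
    """
    inter = (pred * target).sum(axis=(1, 2))
    union = (pred + target - pred * target).sum(axis=(1, 2))
    return float(np.mean(1.0 - inter / (union + eps)))
```

Because the attention maps only reweight a shared feature map, the cost of the backbone is paid once per frame regardless of the number of instances, which is the efficiency argument the paper makes.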
Results from the Paper
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | AGSS-VOS | Jaccard (Mean) | 63.4 | # 62 |
| | | | F-measure (Mean) | 69.8 | # 62 |
| | | | J&F | 66.6 | # 64 |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | AGSS-VOS | FPS | 10.0 | # 13 |
| | | | D17 val (G) | 67.4 | # 22 |
| | | | D17 val (J) | 64.9 | # 22 |
| | | | D17 val (F) | 69.9 | # 23 |
| | | | D17 test (G) | 57.2 | # 5 |
| | | | D17 test (J) | 54.8 | # 5 |
| | | | D17 test (F) | 59.7 | # 5 |