Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection

This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM). A Pyramid Dilated Convolution (PDC) module is first designed to extract spatial features at multiple scales simultaneously. These spatial features are then concatenated and fed into an extended Deeper Bidirectional ConvLSTM (DB-ConvLSTM) to learn spatiotemporal information. Forward and backward ConvLSTM units are placed in two layers and connected in a cascaded way, encouraging information flow between the bidirectional streams and leading to deeper feature extraction. We further augment DB-ConvLSTM with a PDC-like structure, adopting several dilated DB-ConvLSTMs to extract multi-scale spatiotemporal information. Extensive experimental results show that our method outperforms previous video saliency models by a large margin, with a real-time speed of 20 fps on a single GPU. With unsupervised video object segmentation as an example application, the proposed model (with a CRF-based post-processing step) achieves state-of-the-art results on two popular benchmarks, demonstrating its superior performance and high applicability.
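To make the PDC idea concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It applies a single 3x3 kernel at several dilation rates and stacks the responses as channels. The function names, the rates (1, 2, 4), and the use of one shared kernel are assumptions made for illustration; the abstract does not state the rates, and a real PDC branch would normally learn its own weights per rate.

```python
def dilated_conv2d(image, kernel, rate):
    """Zero-padded, same-size 2D cross-correlation with a dilated k x k kernel.

    `image` is a list of rows, `kernel` is a k x k list (k odd), and `rate`
    is the spacing between sampled pixels (rate 1 is an ordinary convolution).
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spread `rate` pixels apart.
                    yy = y + (i - half) * rate
                    xx = x + (j - half) * rate
                    if 0 <= yy < height and 0 <= xx < width:
                        total += kernel[i][j] * image[yy][xx]
            out[y][x] = total
    return out


def pyramid_dilated_conv(image, kernel, rates=(1, 2, 4)):
    """Hypothetical PDC-style block: one feature map per dilation rate.

    Returning the maps as a list plays the role of channel-wise concatenation.
    Sharing one kernel across rates is a simplification for this sketch.
    """
    return [dilated_conv2d(image, kernel, rate) for rate in rates]


def effective_extent(kernel_size, rate):
    """Side length of the region covered by a dilated kernel."""
    return (kernel_size - 1) * rate + 1
```

With a 3x3 kernel, the three rates cover windows of 3, 5, and 9 pixels per side while using the same number of weights, which is how a block of this kind can gather context at several scales in a single pass.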


Results from the Paper


 Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)

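| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Video Salient Object Detection | DAVIS-2016 | PDB | S-Measure | 0.882 | #5 |
| Video Salient Object Detection | DAVIS-2016 | PDB | Max E-Measure | 0.951 | #2 |
| Video Salient Object Detection | DAVIS-2016 | PDB | Average MAE | 0.028 | #7 |
| Unsupervised Video Object Segmentation | DAVIS 2016 val | PDB | G | 75.9 | #23 |
| Unsupervised Video Object Segmentation | DAVIS 2016 val | PDB | J | 77.2 | #23 |
| Unsupervised Video Object Segmentation | DAVIS 2016 val | PDB | F | 74.5 | #22 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (test-dev) | PDB | J&F | 40.4 | #5 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (test-dev) | PDB | Jaccard (Mean) | 37.7 | #4 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (test-dev) | PDB | Jaccard (Recall) | 42.6 | #4 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (test-dev) | PDB | Jaccard (Decay) | 4.0 | #4 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (test-dev) | PDB | F-Measure (Mean) | 43.0 | #4 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (test-dev) | PDB | F-Measure (Recall) | 44.6 | #4 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (test-dev) | PDB | F-Measure (Decay) | 3.7 | #3 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (val) | PDB | J&F | 55.1 | #9 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (val) | PDB | Jaccard (Mean) | 53.2 | #9 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (val) | PDB | Jaccard (Recall) | 58.9 | #7 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (val) | PDB | F-Measure (Mean) | 57.0 | #9 |
| Unsupervised Video Object Segmentation | DAVIS 2017 (val) | PDB | F-Measure (Recall) | 60.2 | #7 |
| Video Salient Object Detection | DAVSOD-Difficult20 | PDB | S-Measure | 0.608 | #2 |
| Video Salient Object Detection | DAVSOD-Difficult20 | PDB | Max E-Measure | 0.678 | #4 |
| Video Salient Object Detection | DAVSOD-Difficult20 | PDB | Average MAE | 0.107 | #1 |
| Video Salient Object Detection | DAVSOD-easy35 | PDB | S-Measure | 0.706 | #2 |
| Video Salient Object Detection | DAVSOD-easy35 | PDB | Max F-Measure | 0.591 | #2 |
| Video Salient Object Detection | DAVSOD-easy35 | PDB | Max E-Measure | 0.749 | #3 |
| Video Salient Object Detection | DAVSOD-easy35 | PDB | Average MAE | 0.114 | #4 |
| Video Salient Object Detection | DAVSOD-Normal25 | PDB | S-Measure | 0.649 | #2 |
| Video Salient Object Detection | DAVSOD-Normal25 | PDB | Max E-Measure | 0.698 | #3 |
| Video Salient Object Detection | DAVSOD-Normal25 | PDB | Average MAE | 0.132 | #4 |
| Video Salient Object Detection | FBMS-59 | PDB | S-Measure | 0.851 | #5 |
| Video Salient Object Detection | FBMS-59 | PDB | Average MAE | 0.064 | #5 |
| Video Salient Object Detection | FBMS-59 | PDB | Max F-Measure | 0.821 | #4 |
| Unsupervised Video Object Segmentation | FBMS test | PDB | J | 74.0 | #11 |
| Video Salient Object Detection | MCL | PDB | S-Measure | 0.856 | #1 |
| Video Salient Object Detection | MCL | PDB | Max E-Measure | 0.911 | #1 |
| Video Salient Object Detection | MCL | PDB | Average MAE | 0.021 | #8 |
| Video Salient Object Detection | SegTrack v2 | PDB | S-Measure | 0.864 | #2 |
| Video Salient Object Detection | SegTrack v2 | PDB | Average MAE | 0.024 | #3 |
| Video Salient Object Detection | SegTrack v2 | PDB | Max E-Measure | 0.935 | #1 |
| Video Salient Object Detection | UVSD | PDB | S-Measure | 0.901 | #1 |
| Video Salient Object Detection | UVSD | PDB | Max E-Measure | 0.975 | #1 |
| Video Salient Object Detection | UVSD | PDB | Average MAE | 0.018 | #1 |
| Video Salient Object Detection | ViSal | PDB | S-Measure | 0.907 | #3 |
| Video Salient Object Detection | ViSal | PDB | Max E-Measure | 0.846 | #6 |
| Video Salient Object Detection | ViSal | PDB | Average MAE | 0.032 | #3 |
| Video Salient Object Detection | VOS-T | PDB | S-Measure | 0.818 | #3 |
| Video Salient Object Detection | VOS-T | PDB | Max E-Measure | 0.837 | #3 |
| Video Salient Object Detection | VOS-T | PDB | Average MAE | 0.078 | #3 |
| Unsupervised Video Object Segmentation | YouTube-Objects | PDB | J | 65.5 | #11 |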
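
Two of the metrics reported above have simple standard definitions: MAE is the mean per-pixel absolute difference between a predicted saliency map and the ground truth, and J is the Jaccard index (intersection over union) of two binary masks. The sketch below illustrates both on a single frame. It is not the official evaluation code of any benchmark listed here; the function names are hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
def mean_absolute_error(saliency, ground_truth):
    """Average |prediction - label| over all pixels of one frame.

    Both inputs are lists of rows with values in [0, 1]. Lower is better.
    """
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(saliency, ground_truth):
        for pred, label in zip(pred_row, gt_row):
            total += abs(pred - label)
            count += 1
    return total / count


def jaccard(pred_mask, gt_mask):
    """Intersection over union of two binary masks for one frame.

    Assumed convention: two empty masks count as a perfect match (1.0).
    """
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for pred, label in zip(pred_row, gt_row):
            if pred and label:
                intersection += 1
            if pred or label:
                union += 1
    return 1.0 if union == 0 else intersection / union
```

Benchmark scores are typically averages of such per-frame values over a whole dataset. The J and F values in the table appear to be on a 0 to 100 scale, whereas this sketch returns values in [0, 1].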

Methods