Fast Video Object Segmentation using the Global Context Module

ECCV 2020 · Yu Li, Zhuoran Shen, Ying Shan

We developed a real-time, high-quality semi-supervised video object segmentation algorithm. Its accuracy is on par with the most accurate but time-consuming online-learning model, while its speed matches that of the fastest template-matching methods, which have sub-optimal accuracy. The core component of the model is a novel global context module that effectively summarizes and propagates information through the entire video. Compared with previous approaches that use only one or a few frames to guide the segmentation of the current frame, the global context module uses all past frames. Unlike the previous state-of-the-art space-time memory network, which caches a memory entry at every spatio-temporal position, the global context module uses a fixed-size feature representation. It therefore requires constant memory regardless of the video length and costs substantially less memory and computation. With the novel module, our model achieves top performance on standard benchmarks at real-time speed.
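The fixed-size summary idea can be illustrated with a minimal sketch. The names (frame_context, GlobalContext), the key/value channel split, the spatial softmax normalization, and the running-average update below are illustrative assumptions, not the authors' exact design; the point is only that each frame is compressed into a (Ck x Cv) matrix, so the accumulated context stays constant in size no matter how many frames have been seen.

import torch
import torch.nn.functional as F

def frame_context(keys, values):
    # keys:   (B, Ck, H, W) per-pixel attention logits for one past frame
    # values: (B, Cv, H, W) per-pixel features to be summarized
    B, Ck, H, W = keys.shape
    Cv = values.shape[1]
    k = F.softmax(keys.view(B, Ck, H * W), dim=-1)   # normalize over spatial positions
    v = values.view(B, Cv, H * W)
    # Fixed-size summary of the frame: (B, Ck, Cv), independent of H and W
    return torch.einsum('bkn,bvn->bkv', k, v)

class GlobalContext:
    """Running, fixed-size summary of all past frames (illustrative sketch)."""
    def __init__(self):
        self.context = None   # (B, Ck, Cv)
        self.count = 0

    def update(self, keys, values):
        # Fold one more frame into the summary; memory use does not grow.
        ctx = frame_context(keys, values)
        self.context = ctx if self.context is None else self.context + ctx
        self.count += 1

    def read(self, queries):
        # queries: (B, Ck, H, W) from the current frame
        assert self.count > 0, "update() must be called at least once before read()"
        B, Ck, H, W = queries.shape
        q = queries.view(B, Ck, H * W)
        out = torch.einsum('bkv,bkn->bvn', self.context / self.count, q)
        return out.view(B, -1, H, W)   # (B, Cv, H, W) map that guides segmentation

Because the per-frame summaries are simply accumulated, reading the context for the current frame costs the same whether one frame or a thousand frames have been processed, which is what gives the method its constant memory footprint.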

Results

Task: Semi-Supervised Video Object Segmentation
Dataset: DAVIS (no YouTube-VOS training)
Model: GC
(J = region similarity, F = contour accuracy, G = mean of J and F)

Metric        Value   Global Rank
FPS           25.0    #8
D16 val (G)   86.6    #5
D16 val (J)   87.6    #2
D16 val (F)   85.7    #9
D17 val (G)   71.4    #17
D17 val (J)   69.3    #16
D17 val (F)   73.5    #17
