SLAMP: Stochastic Latent Appearance and Motion Prediction

Motion is an important cue for video prediction and is often exploited by separating video content into static and dynamic components. Most previous work that exploits motion is deterministic, but stochastic methods can model the inherent uncertainty of the future. Existing stochastic models either do not reason about motion explicitly or make limiting assumptions about the static part. In this paper, we reason about appearance and motion in video stochastically by predicting the future based on the motion history. Explicit reasoning about motion, even without history, already matches the performance of current stochastic models. The motion history further improves the results by allowing the model to predict consistent dynamics several frames into the future. Our model performs comparably to state-of-the-art models on generic video prediction datasets, but significantly outperforms them on two challenging real-world autonomous driving datasets with complex motion and dynamic backgrounds.
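The appearance/motion decomposition described above can be sketched as two parallel predictions fused per pixel: an appearance branch that predicts pixel values directly, and a motion branch that warps the previous frame with a predicted flow field, blended by a per-pixel mask. This is an illustrative numpy toy under stated assumptions, not the authors' architecture; the nearest-neighbour warp and the function names `warp` and `combine` are simplifications (real models use differentiable bilinear sampling and learn the mask).

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a 2D frame with a dense flow field.

    Nearest-neighbour sampling for illustration only; flow[..., 0] is the
    horizontal displacement, flow[..., 1] the vertical displacement.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def combine(appearance_pred, motion_pred, mask):
    """Fuse the two branches with a per-pixel blending mask in [0, 1]:
    mask -> 1 trusts the motion-warped prediction (dynamic regions),
    mask -> 0 trusts the direct appearance prediction (static regions)."""
    return mask * motion_pred + (1.0 - mask) * appearance_pred
```

In a stochastic setting, the flow, appearance prediction, and mask would each be decoded from latent variables sampled per time step; the blend itself stays a simple convex combination.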

ICCV 2021

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Video Generation | BAIR Robot Pushing | SLAMP | FVD | 245 ± 5 | #22 |
| | | | Cond | 2 | #13 |
| | | | SSIM | 0.8175 ± 0.084 | #6 |
| | | | PSNR | 19.67 ± 0.26 | #3 |
| | | | LPIPS | 0.0596 ± 0.0032 | #7 |
| | | | Pred | 28 | #20 |
| | | | Train | 10 | #23 |
| Video Prediction | Cityscapes 128x128 | SLAMP | SSIM | 0.649 ± 0.025 | #2 |
| | | | LPIPS | 0.2941 ± 0.022 | #3 |
| | | | Cond | 10 | #4 |
| | | | PSNR | 21.73 ± 0.76 | #1 |
| | | | Pred | 20 | #1 |
| Video Prediction | KTH | SLAMP | LPIPS | 0.0795 ± 0.0034 | #3 |
| | | | PSNR | 29.39 ± 0.30 | #3 |
| | | | FVD | 228 ± 5 | #7 |
| | | | SSIM | 0.8646 ± 0.0050 | #8 |
| | | | Cond | 10 | #1 |
| | | | Pred | 30 | #17 |
| | | | Train | 10 | #1 |

(Cond, Pred, and Train denote the number of conditioning, predicted, and training frames in each benchmark protocol, not quality metrics.)
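Of the quality metrics in the table, PSNR has a simple closed form (LPIPS and FVD require learned networks). A minimal numpy sketch, assuming images normalized to [0, 1] so `max_val=1.0`:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better.

    PSNR = 10 * log10(max_val^2 / MSE). Identical images give infinity.
    """
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a prediction that is uniformly off by 0.1 from the target has MSE 0.01 and thus a PSNR of 20 dB, which puts the table's values of roughly 20-29 dB in perspective.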
