Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score

Conference 2023 · Mehryar Abbasi, Parvaneh Saeedi

In this paper, we present a new process for creating video summaries in an unsupervised manner. Our approach trains a transformer encoder, in a self-supervised way, to reconstruct the missing frames of a video from a partially masked version of that video. We then introduce an algorithm that uses the trained encoder to generate an importance score for each frame, and these frame importance scores are used to create the summary of the video. We show that the model's reconstruction loss for a video with masked frames correlates with the representativeness of the frames that remain visible. We validate the effectiveness of our approach on two benchmark datasets, TVSum and SumMe, and demonstrate that it outperforms state-of-the-art (SOTA) methods. Additionally, our approach is more stable during training than SOTA techniques based on generative adversarial learning. Our source code is publicly available.
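The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the names `restorative_scores`, `reconstruction_loss`, and `mean_fill_reconstruct`, the random masking scheme, and the parameters `n_masks` and `keep_ratio` are all assumptions made for illustration, and a trivial mean-fill function stands in for the trained transformer encoder. The sketch only shows how a reconstruction loss measured on masked frames could be turned into a per-frame importance score: a frame scores higher when the video is restored with lower loss while that frame is visible.

```python
import random


def mean_fill_reconstruct(frames, visible):
    """Hypothetical stand-in for the trained encoder.

    Predicts every masked frame as the mean of the visible frames.
    """
    dim = len(frames[0])
    kept = [frames[i] for i in range(len(frames)) if visible[i]]
    mean = [sum(f[d] for f in kept) / len(kept) for d in range(dim)]
    return [frames[i] if visible[i] else mean for i in range(len(frames))]


def reconstruction_loss(frames, visible, reconstruct):
    """Mean squared error computed over the masked frames only."""
    restored = reconstruct(frames, visible)
    masked = [i for i, v in enumerate(visible) if not v]
    if not masked:
        return 0.0
    total = sum(
        (a - b) ** 2
        for i in masked
        for a, b in zip(frames[i], restored[i])
    )
    return total / (len(masked) * len(frames[0]))


def restorative_scores(frames, reconstruct, n_masks=200, keep_ratio=0.3, seed=0):
    """Assumed scoring scheme: average the loss over random masks.

    Each mask keeps a random subset of frames visible. The loss of that
    mask is credited to every visible frame, and a frame's score is the
    negated average loss, so representative frames score higher.
    """
    rng = random.Random(seed)
    n = len(frames)
    n_keep = max(1, round(keep_ratio * n))
    loss_sum = [0.0] * n
    count = [0] * n
    for _ in range(n_masks):
        kept = set(rng.sample(range(n), n_keep))
        visible = [i in kept for i in range(n)]
        loss = reconstruction_loss(frames, visible, reconstruct)
        for i in kept:
            loss_sum[i] += loss
            count[i] += 1
    return [
        -(loss_sum[i] / count[i]) if count[i] else float("-inf")
        for i in range(n)
    ]


# Toy video: four near-identical frames and one outlier frame.
toy_frames = [[0.0], [0.0], [0.0], [0.0], [10.0]]
toy_scores = restorative_scores(toy_frames, mean_fill_reconstruct, keep_ratio=0.2)
```

In this toy example the outlier frame restores the rest of the video poorly, so it receives a lower score than the four typical frames. With a real model, `mean_fill_reconstruct` would be replaced by a call to the trained encoder.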

Results from the Paper


Task                              Dataset  Model   Metric Name     Metric Value  Global Rank
Unsupervised Video Summarization  SumMe    RS-SUM  F1-score        52.0          #2
Unsupervised Video Summarization  TVSum    RS-SUM  F1-score        61.4          #1
                                                   Spearman's Rho  0.106         #2
                                                   Kendall's Tau   0.08          #2
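For readers unfamiliar with these metrics, the following is a minimal sketch of how an F1-score between two frame selections and a Kendall's Tau between two score sequences can be computed. The function names `f1_score` and `kendall_tau_a` are illustrative, and the tie-free tau-a variant is an assumption; the exact evaluation protocol behind the numbers above (for example, how multiple human annotations are aggregated, or which tau variant is used) is not stated here.

```python
def f1_score(pred, ref):
    """F1 between two binary frame-selection masks of equal length."""
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for p in pred if p)
    recall = overlap / sum(1 for r in ref if r)
    return 2 * precision * recall / (precision + recall)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Half of the predicted frames are in the reference, and vice versa.
example_f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])

# Predicted scores rank the frames exactly like the reference scores.
example_tau = kendall_tau_a([0.1, 0.5, 0.9], [1, 2, 3])
```

`f1_score` returns a fraction between 0 and 1, while `kendall_tau_a` ranges from -1 (reversed ranking) to 1 (identical ranking).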
