Human skeletons and change detection for efficient violence detection in surveillance videos

In our constantly monitored world, surveillance cameras play a crucial role in curbing crime and violence in public spaces by serving as a deterrent. To enhance their effectiveness, there is a growing need for automated tools that can detect crimes in real time. In this paper, we propose a novel deep learning architecture that accurately and efficiently detects violent crimes in surveillance videos. We rely on what we believe are the most essential pieces of information for detecting violence, namely human bodies and their interactions. To this end, we use human pose extractors and change detectors as the inputs to our model. We then combine them using a novel method that relies on additions instead of multiplications, which guarantees that information is transmitted even when one of the inputs provides a zero-valued signal; this method outperforms the combination alternatives found in the literature. Finally, to account for both spatial and temporal information, we use the ConvLSTM, a convolutional variant of the standard LSTM. Experiments on several benchmark datasets demonstrate the efficacy and efficiency of our proposal, which achieves state-of-the-art results with far fewer trainable parameters. We release the code to replicate the proposed architecture at https://github.com/atmguille/Violence-Detection-With-Human-Skeletons
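The reason for preferring additions over multiplications when combining the two inputs can be illustrated with a small numerical sketch. This is not the architecture from the paper or its repository: the function names `fuse_multiplicative` and `fuse_additive`, the 2x2 arrays, and the plain element-wise operations are all assumptions chosen only to show what happens when one input is zero-valued.

```python
import numpy as np


def fuse_multiplicative(pose_map, change_map):
    # Hypothetical multiplicative fusion: a zero in either input
    # zeroes the corresponding output element.
    return pose_map * change_map


def fuse_additive(pose_map, change_map):
    # Hypothetical additive fusion: a zero in one input leaves the
    # other input's value unchanged in the output.
    return pose_map + change_map


# Toy pose-extractor output (illustrative values, not real data).
pose_map = np.array([[0.0, 0.8],
                     [0.5, 0.0]])

# Toy change-detector output for a static scene: all zeros.
change_map = np.zeros((2, 2))

mult = fuse_multiplicative(pose_map, change_map)
add = fuse_additive(pose_map, change_map)

print("multiplicative fusion:")
print(mult)
print("additive fusion:")
print(add)
```

In this toy case the multiplicative result carries no information from the pose input, while the additive result still equals the pose input, which is the property the abstract attributes to addition-based combination.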

Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Activity Recognition | RWF-2000 | Human Skeletons + Change Detection | Accuracy | 90.25 | #3
