Audio-Visual Event Localization

9 papers with code • 1 benchmark • 1 dataset

Benchmarks

These leaderboards are used to track progress in audio-visual event localization.

Dataset	Best Model
UnAV-100	UnAV

Datasets

UnAV-100

Most implemented papers

Audio-Visual Event Localization in Unconstrained Videos

YapengTian/AVE-ECCV18 • ECCV 2018

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.

Dual-modality seq2seq network for audio-visual event localization

YapengTian/AVE-ECCV18 • 20 Feb 2019

Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).

Positive Sample Propagation along the Audio-Visual Event Line

jasongief/PSP_CVPR_2021 • CVPR 2021

To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed.

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

JustinYuu/MM_Pyramid • 24 Nov 2021

Recognizing and localizing events in videos is a fundamental task for video understanding.

Cross-Modal Background Suppression for Audio-Visual Event Localization

marmot-xy/cmbs • CVPR 2022

Audiovisual Event (AVE) localization requires the model to jointly localize an event by observing audio and visual information.

ActionFormer: Localizing Moments of Actions with Transformers

happyharrycn/actionformer_release • 16 Feb 2022

Self-attention based Transformer models have demonstrated impressive results for image classification and object detection, and more recently for video understanding.

Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization

bravo5542/vscg • 11 Oct 2022

In contrast to existing methods, we propose a novel video-level semantic consistency guidance network for the AVE localization task.

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

ttgeng233/UnAV • CVPR 2023

To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video.

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

ttgeng233/UniAV • 4 Apr 2024

Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL).
