no code implementations • 28 Oct 2023 • Ruohao Guo, Yaru Chen, Yanyu Qi, Wenzhen Yue, Dantong Niu, Xianghua Ying
In this paper, we propose a new multi-modal task, namely audio-visual instance segmentation (AVIS), in which the goal is to simultaneously identify, segment, and track individual sounding object instances in audible videos.
no code implementations • 11 Oct 2023 • Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang
Audio-visual video parsing is the task of categorizing a video at the segment level using only weak labels, and predicting the events in each segment as audible or visible.