10 Jun 2023 • Tomoya Yoshinaga, Keitaro Tanaka, Shigeo Morishima
This paper describes an audio-visual speech enhancement (AV-SE) method that, given noisy input audio, estimates a mixture of two target signals: the speech of the speaker who appears in the input video (on-screen target speech) and the speech of a selected speaker who does not appear in the video (off-screen target speech).
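The abstract specifies only what the method should output, not how the input is composed. The sketch below assumes an additive mixture model in which the interference consists of other off-screen speakers plus background noise. That composition is an assumption not stated in the abstract. It shows how the output target (on-screen plus selected off-screen speech) relates to the noisy input. The helpers `add_signals` and `make_example` are hypothetical names, and the code does not represent the authors' network or training procedure.

```python
from typing import List, Sequence, Tuple

Signal = List[float]


def add_signals(*signals: Sequence[float]) -> Signal:
    """Sample-wise sum of equal-length waveforms (assumed additive mixing)."""
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    return [sum(samples) for samples in zip(*signals)]


def make_example(
    on_screen: Sequence[float],
    off_screen_target: Sequence[float],
    off_screen_others: Sequence[Sequence[float]],
    noise: Sequence[float],
) -> Tuple[Signal, Signal]:
    """Hypothetical helper returning (noisy_input, target).

    target      = on-screen speech + selected off-screen speech
    noisy_input = target + non-selected off-screen speakers + noise
    """
    target = add_signals(on_screen, off_screen_target)
    noisy_input = add_signals(target, *off_screen_others, noise)
    return noisy_input, target


if __name__ == "__main__":
    noisy, tgt = make_example([1.0, 0.0], [0.0, 2.0], [[0.5, 0.5]], [0.1, -0.1])
    print("noisy input:", noisy)
    print("target:", tgt)
```

Under this assumed model, the enhancement system keeps the on-screen and selected off-screen speech and removes both the non-selected off-screen speakers and the noise.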