no code implementations • 4 Nov 2023 • Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki
Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents of SLURP by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively, compared to audio-only training.
1 code implementation • 1 Nov 2023 • Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico
Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.
no code implementations • 4 Aug 2023 • Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico
The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.
no code implementations • 15 Jun 2023 • Rohit Paturi, Sundararajan Srinivasan, Xiang Li
Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words.
Automatic Speech Recognition (ASR) +3
no code implementations • 10 Dec 2021 • Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero
Also, most of these models are trained with synthetic mixtures and do not generalize to real conversational data.
Automatic Speech Recognition (ASR) +2