no code implementations • 11 Oct 2023 • Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass
We study phrase structure induction from visually-grounded speech.
no code implementations • 8 Sep 2023 • Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez, Thomas Thebaud, Najim Dehak
Cascaded SpeechCLIP attempted to generate localized word-level information and utilize both the pretrained image and text encoders.
no code implementations • 12 Apr 2023 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
These representations significantly reduce the amount of labeled data needed for downstream task performance, such as automatic speech recognition.
no code implementations • 22 Oct 2022 • Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.
Automatic Speech Recognition (ASR) +2
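To make the CPC description in the entry above concrete, here is a minimal sketch of the InfoNCE-style contrastive loss commonly associated with CPC. It is not the specific variant proposed in the listed paper. The function name `info_nce_loss`, the dot-product scoring, and the toy vectors are illustrative assumptions. Minimizing this loss raises a lower bound on the mutual information between the context and the positive target.

```python
import numpy as np

def info_nce_loss(context, positive, negatives):
    """InfoNCE loss for one context vector (hypothetical helper).

    context:   (d,) context/latent vector
    positive:  (d,) the target that truly belongs with the context
    negatives: (k, d) distractor targets
    Scoring by dot product is an assumption made for this sketch.
    """
    context = np.asarray(context, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negatives = np.asarray(negatives, dtype=float)
    # Row 0 is the positive candidate; the remaining rows are negatives.
    candidates = np.vstack([positive[None, :], negatives])
    scores = candidates @ context
    # Numerically stable log-softmax over all candidates.
    scores = scores - scores.max()
    log_probs = scores - np.log(np.exp(scores).sum())
    # Cross-entropy with the positive as the correct class.
    return float(-log_probs[0])

# Toy usage: a well-aligned positive gives a lower loss than a misaligned one.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In practice the loss is averaged over many time steps and batches, and the scoring function is usually learned rather than a plain dot product.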
1 code implementation • 26 Jan 2022 • Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak
In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.
Automatic Speech Recognition (ASR) +3
no code implementations • 5 Oct 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework to model the signal structure at a higher level, e.g., phone level.
no code implementations • 3 Jun 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level.
no code implementations • 26 Jul 2020 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak
We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.
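The segmentation assumption in the entry above can be illustrated with a minimal sketch: compare each frame's feature vector to the next one and place a boundary wherever the similarity drops. The use of cosine similarity, the fixed `threshold`, and the function name `segment_boundaries` are assumptions made for illustration and are not taken from the listed paper.

```python
import numpy as np

def segment_boundaries(frames, threshold=0.5):
    """Return frame indices at which a new segment starts (hypothetical helper).

    frames: (T, d) array of frame-level feature vectors.
    A boundary is placed before frame i+1 when the cosine similarity
    between frame i and frame i+1 falls below `threshold`.
    """
    frames = np.asarray(frames, dtype=float)
    # Normalize each frame to unit length, guarding against zero vectors.
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.maximum(norms, 1e-12)
    # sim[i] is the cosine similarity between frame i and frame i+1.
    sim = np.sum(unit[:-1] * unit[1:], axis=1)
    return [int(i) + 1 for i in np.flatnonzero(sim < threshold)]

# Toy usage: two frames pointing one way, then two pointing another way.
toy_frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
boundaries = segment_boundaries(toy_frames)
```

A fixed threshold is the simplest choice; picking local minima of the similarity curve is a common alternative when feature scales vary across utterances.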