2 code implementations • 17 Nov 2021 • Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.
Ranked #1 on Language Identification on VoxLingua107 (using extra training data)
no code implementations • 14 Oct 2021 • Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf
While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.
Ranked #5 on Audio Classification on Balanced Audio Set
no code implementations • 8 Jul 2021 • Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli
Language identification greatly impacts the success of downstream tasks such as automatic speech recognition.
Automatic Speech Recognition (ASR) +3
no code implementations • 11 Feb 2021 • Leda Sari, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf
Although speaker verification has conventionally been an audio-only task, some practical applications provide both audio and visual streams of input.
no code implementations • 16 May 2020 • Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.
no code implementations • 27 Oct 2019 • Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data.
no code implementations • LREC 2020 • Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig
To develop high-performing ASR for low-resource languages, approaches that address the lack of resources include making use of data from multiple languages and augmenting the training data by creating acoustic variations.