no code implementations • 5 Jan 2024 • Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text.
no code implementations • 23 Dec 2023 • Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko
Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.
no code implementations • 10 Oct 2023 • Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko
In this study, we propose and explore several discriminative fine-tuning schemes for pre-trained LMs.
no code implementations • 13 Jul 2023 • Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko
On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10% compared with a neural rescoring baseline.
no code implementations • 27 Jun 2023 • Yile Gu, Prashanth Gurunath Shivakumar, Jari Kolehmainen, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko
We study whether this scaling property is also applicable to second-pass rescoring, which is an important component of speech recognition systems.
no code implementations • 15 Jun 2023 • Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko
We also show that the proposed distillation can reduce the WER gap between the student and the teacher by 62% to 100%.
no code implementations • 3 Sep 2021 • Prashanth Gurunath Shivakumar, Somer Bishop, Catherine Lord, Shrikanth Narayanan
In this paper, we propose features specific to children and focus on speaker's phone duration as an important biomarker of children's age.
no code implementations • 19 Feb 2021 • Prashanth Gurunath Shivakumar, Shrikanth Narayanan
A key desideratum for inclusive and accessible speech recognition technology is ensuring its robust performance on children's speech.
1 code implementation • 3 Feb 2021 • Prashanth Gurunath Shivakumar, Panayiotis Georgiou, Shrikanth Narayanan
Confusion2vec, motivated by human speech production and perception, is a word vector representation which encodes ambiguities present in human spoken language in addition to semantic and syntactic information.
no code implementations • 23 Oct 2019 • Prashanth Gurunath Shivakumar, Naveen Kumar, Panayiotis Georgiou, Shrikanth Narayanan
We introduce and analyze different recurrent neural network architectures for incremental and online processing of ASR transcripts and compare them to existing offline systems.
no code implementations • 31 Aug 2019 • Prashanth Gurunath Shivakumar, Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan
In this work we derive motivation from psycholinguistics and propose the addition of behavioral information into the context of language modeling.
1 code implementation • 7 Apr 2019 • Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou
In this paper, we address spoken language intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems.
no code implementations • 8 Nov 2018 • Prashanth Gurunath Shivakumar, Panayiotis Georgiou
In this paper, we propose a novel word vector representation, Confusion2Vec, which is motivated by human speech production and perception and encodes representational ambiguity.
no code implementations • 8 May 2018 • Prashanth Gurunath Shivakumar, Panayiotis Georgiou
Evaluations are presented on (i) comparisons of earlier GMM-HMM and newer DNN models, (ii) effectiveness of standard adaptation techniques versus transfer learning, and (iii) various adaptation configurations in tackling the variabilities present in children's speech, in terms of (a) acoustic spectral variability, and (b) pronunciation variability and linguistic constraints.
1 code implementation • 7 Feb 2018 • Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou
In this work we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert those.