no code implementations • 16 Oct 2021 • Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu
We study the problem of learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions.
no code implementations • 31 May 2021 • Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals
Our results demonstrate the strength of our approach and its ability to generalise to data from new domains.
no code implementations • EACL 2021 • Georg Rehm, Stelios Piperidis, Kalina Bontcheva, Jan Hajic, Victoria Arranz, Andrejs Vasiļjevs, Gerhard Backfried, Jose Manuel Gomez-Perez, Ulrich Germann, Rémi Calizzano, Nils Feldhus, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Julian Moreno-Schneider, Dimitris Galanis, Penny Labropoulou, Miltos Deligiannis, Katerina Gkirtzou, Athanasia Kolovou, Dimitris Gkoumas, Leon Voukoutis, Ian Roberts, Jana Hamrlova, Dusan Varis, Lukas Kacena, Khalid Choukri, Valérie Mapelli, Mickaël Rigault, Julija Melnika, Miro Janosik, Katja Prinz, Andres Garcia-Silva, Cristian Berrio, Ondrej Klejch, Steve Renals
Europe is a multilingual society, in which dozens of languages are spoken.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals
For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio.
no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.
Automatic Speech Recognition (ASR)
no code implementations • 19 Nov 2020 • Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals
We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos.
1 code implementation • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
To the best of our knowledge, we have achieved state-of-the-art performance for end-to-end Transformer-based models on Switchboard and AMI.
Automatic Speech Recognition (ASR)
no code implementations • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results.
Automatic Speech Recognition (ASR)
1 code implementation • 27 Oct 2020 • Chau Luu, Peter Bell, Steve Renals
On a test set of US Supreme Court recordings, we show that by leveraging two additional forms of speaker attribute information, derived respectively from the matched training data and the VoxCeleb corpus, we improve the performance of our deep speaker embeddings for both verification and diarization tasks, achieving relative improvements of 26.2% in DER and 6.7% in EER over baselines using speaker labels only.
1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.
1 code implementation • 8 Aug 2020 • Ahmed Ali, Steve Renals
Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive.
Automatic Speech Recognition (ASR)
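WER itself is the word-level edit distance between a reference and a hypothesis, normalised by reference length; the point of e-WER is to estimate this quantity without a reference. The standard computation being approximated can be sketched as follows (an illustrative helper, not the authors' code):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words = 0.333...
```

Producing the manual transcripts that `reference` stands for here is exactly the time-consuming step the paper tries to avoid.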
no code implementations • 28 May 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition.
no code implementations • LREC 2020 • Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, Miroslav Janosik, Katja Prinz, Christoph Prinz, Severin Stampler, Dorothea Thomas-Aniola, José Manuel Gómez Pérez, Andres Garcia Silva, Christian Berrío, Ulrich Germann, Steve Renals, Ondrej Klejch
With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs).
1 code implementation • 2 Feb 2020 • Chau Luu, Peter Bell, Steve Renals
The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks.
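The episode structure of DropClass can be sketched in NumPy: a shared feature extractor is updated through a different classification task each episode, formed by restricting the training labels and the output layer to a random subset of classes. All shapes, hyperparameters, and the toy linear model below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, feat_dim, in_dim = 10, 8, 16
W_feat = rng.normal(scale=0.1, size=(in_dim, feat_dim))    # shared feature extractor
W_out = rng.normal(scale=0.1, size=(feat_dim, n_classes))  # class output layer

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for episode in range(50):
    # DropClass idea: keep only a random subset of classes this episode.
    kept = rng.choice(n_classes, size=5, replace=False)
    x = rng.normal(size=(32, in_dim))                # toy batch of inputs
    y = rng.choice(kept, size=32)                    # labels drawn from kept classes only
    cols = {c: k for k, c in enumerate(kept)}
    t = np.eye(len(kept))[[cols[c] for c in y]]      # one-hot targets over kept classes
    h = np.tanh(x @ W_feat)
    p = softmax(h @ W_out[:, kept])                  # softmax over the kept classes only
    g = (p - t) / len(y)                             # softmax cross-entropy gradient
    W_out[:, kept] -= 0.1 * (h.T @ g)                # update only the kept output columns
    W_feat -= 0.1 * x.T @ ((g @ W_out[:, kept].T) * (1 - h**2))
```

Each episode thus presents the feature extractor with a different restricted classification task, which is the mechanism the snippet describes.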
no code implementations • 31 Oct 2019 • Joanna Rownicka, Peter Bell, Steve Renals
We propose a multi-scale octave convolution layer to learn robust speech representations efficiently.
no code implementations • 25 Oct 2019 • Chau Luu, Peter Bell, Steve Renals
Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong.
1 code implementation • 23 Oct 2019 • Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions.
no code implementations • 30 Sep 2019 • Joanna Rownicka, Peter Bell, Steve Renals
In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker.
1 code implementation • 30 Sep 2019 • Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals
Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features.
no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.
no code implementations • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production.
1 code implementation • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.
1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals
Audiovisual synchronisation is the task of determining the time offset between speech audio and a video recording of the articulators.
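The offset-estimation task can be illustrated with a classical cross-correlation baseline over two 1-D envelope signals. This is not the paper's neural approach, just a hedged sketch of what "determining the time offset" means; the envelope inputs are hypothetical:

```python
import numpy as np

def estimate_offset(audio_env, video_env):
    """Estimate the lag (in frames) of the video signal relative to the audio
    signal by maximising the cross-correlation of two 1-D envelopes."""
    a = audio_env - audio_env.mean()
    v = video_env - video_env.mean()
    corr = np.correlate(v, a, mode="full")  # lags run from -(len(a)-1) to len(v)-1
    return int(np.argmax(corr)) - (len(a) - 1)

rng = np.random.default_rng(1)
audio = rng.normal(size=200)          # stand-in for an audio energy envelope
video = np.roll(audio, 5)             # video lags audio by 5 frames
print(estimate_offset(audio, video))  # 5
```

A real system would replace the hand-made envelopes with learned representations of the audio and the articulator video, but the output, a single time offset, is the same.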
no code implementations • 27 Jun 2019 • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions.
no code implementations • 30 May 2019 • Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell
This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model.
1 code implementation • 17 Apr 2019 • Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals
This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation.
Ranked #1 on Language Modelling on Hutter Prize
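The core loop of dynamic evaluation, stripped of the Transformer, can be sketched with a toy next-symbol model: score each test symbol with the current parameters, then take a gradient step on that symbol so the model adapts to the recent sequence history. Everything below is an illustrative assumption, not the paper's setup:

```python
import numpy as np

# Toy next-symbol model: a single logit vector over a small vocabulary.
vocab = 4
logits = np.zeros(vocab)          # "trained" static parameters (uniform here)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_eval(seq, lr=0.5):
    theta = logits.copy()         # adaptation starts from the static model
    total_nll = 0.0
    for sym in seq:
        p = softmax(theta)
        total_nll += -np.log(p[sym])     # score the symbol with current params...
        grad = p.copy()
        grad[sym] -= 1.0                 # cross-entropy gradient
        theta -= lr * grad               # ...then adapt on what was just seen
    return total_nll / len(seq)

static_nll = np.log(vocab)               # uniform model costs log|V| per symbol
repetitive = [2] * 50                    # highly repetitive test sequence
print(dynamic_eval(repetitive) < static_nll)  # adaptation helps: True
```

The gain comes precisely from re-using recent history: on repetitive or topically coherent sequences the adapted parameters assign higher probability than the frozen ones.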
no code implementations • 12 Nov 2018 • Joanna Rownicka, Peter Bell, Steve Renals
We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation.
1 code implementation • ACL 2018 • Ahmed Ali, Steve Renals
Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive.
Automatic Speech Recognition (ASR)
1 code implementation • 21 Sep 2017 • Ahmed Ali, Stephan Vogel, Steve Renals
Two hours of audio per dialect were released for development and a further two hours were used for evaluation.
no code implementations • 21 Sep 2017 • Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals
We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input.
Automatic Speech Recognition (ASR)
3 code implementations • ICML 2018 • Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals
We present methodology for using dynamic evaluation to improve neural sequence models.
Ranked #10 on Language Modelling on Hutter Prize
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
Automatic Speech Recognition (ASR)
no code implementations • 18 Oct 2016 • Liang Lu, Steve Renals
Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters.
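The gating idea behind HDNNs follows the standard highway-layer form, in which a small set of gate parameters interpolates between a nonlinear transform and an identity (carry) path. A minimal sketch of one such layer (standard highway parameterisation, not necessarily the exact HDNN one):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x.
    The transform gate T interpolates between the nonlinear
    transformation H and the identity (carry) path."""
    h = np.tanh(x @ W_h)
    t = sigmoid(x @ W_t + b_t)   # the gate: the small, controllable part
    return t * h + (1.0 - t) * x

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
W_h = rng.normal(scale=0.1, size=(d, d))
W_t = rng.normal(scale=0.1, size=(d, d))

# A strongly negative gate bias closes the gate, so the layer
# passes its input through almost unchanged.
y_closed = highway_layer(x, W_h, W_t, b_t=-10.0)
print(np.allclose(y_closed, x, atol=1e-3))  # True
```

Because a single gate bias like `b_t` can switch the whole layer between "transform" and "copy" behaviour, the gates give network-level control with very few parameters, which is the controllability the snippet refers to.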
1 code implementation • 26 Sep 2016 • Ben Krause, Liang Lu, Iain Murray, Steve Renals
We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures.
Ranked #14 on Language Modelling on Hutter Prize
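A single mLSTM step can be sketched directly from this description: an input-dependent multiplicative intermediate state m replaces the previous hidden state inside the gate computations, so each input symbol selects a different recurrent transition. Weight shapes and scales below are illustrative assumptions, and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h, c, p):
    """One multiplicative LSTM step."""
    m = (p["Wmx"] @ x) * (p["Wmh"] @ h)          # multiplicative intermediate state
    i = sigmoid(p["Wix"] @ x + p["Wim"] @ m)     # input gate
    f = sigmoid(p["Wfx"] @ x + p["Wfm"] @ m)     # forget gate
    o = sigmoid(p["Wox"] @ x + p["Wom"] @ m)     # output gate
    c = f * c + i * np.tanh(p["Wcx"] @ x + p["Wcm"] @ m)
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n, d = 6, 4                                      # hidden size, input size
p = {k: rng.normal(scale=0.5, size=(n, d if k.endswith("x") else n))
     for k in ["Wmx", "Wmh", "Wix", "Wim", "Wfx", "Wfm", "Wox", "Wom", "Wcx", "Wcm"]}
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):                # run a short input sequence
    h, c = mlstm_step(x, h, c, p)
```

A plain LSTM would feed `h` into the gates directly; substituting `m` is the multiplicative-RNN ingredient the abstract describes.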
no code implementations • 19 Sep 2016 • Sameer Khurana, Ahmed Ali, Steve Renals
In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification.
no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang
For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net, covering the ten-year period 2000-2011.
no code implementations • 2 Aug 2016 • Liang Lu, Michelle Guo, Steve Renals
We have shown that HDNN-based acoustic models can achieve recognition accuracy comparable to plain deep neural network (DNN) acoustic models with a much smaller number of model parameters.
1 code implementation • LREC 2016 • Guntis Barzdins, Steve Renals, Didzis Gosko
This paper describes a novel approach to the automatic story segmentation and storyline clustering problem.
Automatic Speech Recognition (ASR)
no code implementations • 31 Mar 2016 • Pawel Swietojanski, Steve Renals
We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators.
no code implementations • 1 Mar 2016 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.
Ranked #16 on Speech Recognition on TIMIT
no code implementations • 12 Jan 2016 • Pawel Swietojanski, Jinyu Li, Steve Renals
This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) -- a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data.
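The linear re-combination in LHUC can be sketched as an element-wise, speaker-dependent rescaling of hidden units: each unit's amplitude is scaled by 2·sigmoid(r), with only the vector r adapted per speaker while the network weights stay fixed. A minimal NumPy sketch (shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lhuc_layer(x, W, r):
    """Hidden layer with LHUC: the speaker-dependent vector r rescales
    each hidden unit via the amplitude function 2*sigmoid(r), which
    ranges over (0, 2). Only r is adapted per speaker; W stays fixed."""
    h = np.tanh(x @ W)                 # speaker-independent hidden activations
    return (2.0 * sigmoid(r)) * h      # element-wise speaker-dependent rescaling

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(10, 16))
x = rng.normal(size=(4, 10))

r_neutral = np.zeros(16)               # 2*sigmoid(0) = 1: network unchanged
print(np.allclose(lhuc_layer(x, W, r_neutral), np.tanh(x @ W)))  # True
```

Because only the 16 entries of `r` are estimated per speaker, very small amounts of unsupervised adaptation data suffice, which is what makes the method practical in the setting the abstract describes.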
no code implementations • 14 Dec 2015 • Liang Lu, Steve Renals
For speech recognition, deep neural networks (DNNs) have significantly improved recognition accuracy on most benchmark datasets and application domains.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
no code implementations • 4 Nov 2014 • Liang Lu, Steve Renals
Acoustic models using probabilistic linear discriminant analysis (PLDA) capture the correlations within feature vectors using subspaces which do not vastly expand the model.