no code implementations • 16 Oct 2021 • Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu
We study the problem of learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions.
no code implementations • 31 May 2021 • Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals
Our results demonstrate the strength of our approach and its ability to generalise to data from new domains.
no code implementations • EACL 2021 • Georg Rehm, Stelios Piperidis, Kalina Bontcheva, Jan Hajic, Victoria Arranz, Andrejs Vasiļjevs, Gerhard Backfried, Jose Manuel Gomez-Perez, Ulrich Germann, Rémi Calizzano, Nils Feldhus, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Julian Moreno-Schneider, Dimitris Galanis, Penny Labropoulou, Miltos Deligiannis, Katerina Gkirtzou, Athanasia Kolovou, Dimitris Gkoumas, Leon Voukoutis, Ian Roberts, Jana Hamrlova, Dusan Varis, Lukas Kacena, Khalid Choukri, Valérie Mapelli, Mickaël Rigault, Julija Melnika, Miro Janosik, Katja Prinz, Andres Garcia-Silva, Cristian Berrio, Ondrej Klejch, Steve Renals
Europe is a multilingual society, in which dozens of languages are spoken.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals
For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio.
no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.
Automatic Speech Recognition (ASR)
no code implementations • 19 Nov 2020 • Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals
We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos.
1 code implementation • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
To the best of our knowledge, we have achieved state-of-the-art performance for end-to-end Transformer-based models on Switchboard and AMI.
Automatic Speech Recognition (ASR)
no code implementations • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results.
Automatic Speech Recognition (ASR)
1 code implementation • 27 Oct 2020 • Chau Luu, Peter Bell, Steve Renals
On a test set of US Supreme Court recordings, we show that by leveraging two additional forms of speaker attribute information, derived respectively from the matched training data and the VoxCeleb corpus, we improve the performance of our deep speaker embeddings for both verification and diarization tasks, achieving relative improvements of 26.2% in DER and 6.7% in EER over baselines using speaker labels only.
1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.
1 code implementation • 8 Aug 2020 • Ahmed Ali, Steve Renals
Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive.
Automatic Speech Recognition (ASR)
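WER itself is the word-level edit distance between a reference and a hypothesis, normalised by reference length; the point of e-WER is to estimate this quantity without a reference. The standard computation being approximated can be sketched as follows (an illustrative helper, not the authors' code):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words = 0.333...
```

Producing the manual transcripts that `reference` stands for here is exactly the time-consuming step the paper tries to avoid.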
no code implementations • 28 May 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition.
no code implementations • LREC 2020 • Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, Miroslav Janosik, Katja Prinz, Christoph Prinz, Severin Stampler, Dorothea Thomas-Aniola, José Manuel Gómez Pérez, Andres Garcia Silva, Christian Berrío, Ulrich Germann, Steve Renals, Ondrej Klejch
With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs).
1 code implementation • 2 Feb 2020 • Chau Luu, Peter Bell, Steve Renals
The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks.
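The episode structure of DropClass can be sketched in NumPy: a shared feature extractor is updated through a different classification task each episode, formed by restricting the training labels and the output layer to a random subset of classes. All shapes, hyperparameters, and the toy linear model below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, feat_dim, in_dim = 10, 8, 16
W_feat = rng.normal(scale=0.1, size=(in_dim, feat_dim))    # shared feature extractor
W_out = rng.normal(scale=0.1, size=(feat_dim, n_classes))  # class output layer

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for episode in range(50):
    # DropClass idea: keep only a random subset of classes this episode.
    kept = rng.choice(n_classes, size=5, replace=False)
    x = rng.normal(size=(32, in_dim))                # toy batch of inputs
    y = rng.choice(kept, size=32)                    # labels drawn from kept classes only
    cols = {c: k for k, c in enumerate(kept)}
    t = np.eye(len(kept))[[cols[c] for c in y]]      # one-hot targets over kept classes
    h = np.tanh(x @ W_feat)
    p = softmax(h @ W_out[:, kept])                  # softmax over the kept classes only
    g = (p - t) / len(y)                             # softmax cross-entropy gradient
    W_out[:, kept] -= 0.1 * (h.T @ g)                # update only the kept output columns
    W_feat -= 0.1 * x.T @ ((g @ W_out[:, kept].T) * (1 - h**2))
```

Each episode thus presents the feature extractor with a different restricted classification task, which is the mechanism the snippet describes.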
no code implementations • 31 Oct 2019 • Joanna Rownicka, Peter Bell, Steve Renals
We propose a multi-scale octave convolution layer to learn robust speech representations efficiently.
no code implementations • 25 Oct 2019 • Chau Luu, Peter Bell, Steve Renals
Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong.
1 code implementation • 23 Oct 2019 • Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions.
no code implementations • 30 Sep 2019 • Joanna Rownicka, Peter Bell, Steve Renals
In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker.
1 code implementation • 30 Sep 2019 • Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals
Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features.
no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.
no code implementations • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production.
1 code implementation • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.
1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals
Audiovisual synchronisation is the task of determining the time offset between speech audio and a video recording of the articulators.
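The offset-estimation task can be illustrated with a classical cross-correlation baseline over two 1-D envelope signals. This is not the paper's neural approach, just a hedged sketch of what "determining the time offset" means; the envelope inputs are hypothetical:

```python
import numpy as np

def estimate_offset(audio_env, video_env):
    """Estimate the lag (in frames) of the video signal relative to the audio
    signal by maximising the cross-correlation of two 1-D envelopes."""
    a = audio_env - audio_env.mean()
    v = video_env - video_env.mean()
    corr = np.correlate(v, a, mode="full")  # lags run from -(len(a)-1) to len(v)-1
    return int(np.argmax(corr)) - (len(a) - 1)

rng = np.random.default_rng(1)
audio = rng.normal(size=200)          # stand-in for an audio energy envelope
video = np.roll(audio, 5)             # video lags audio by 5 frames
print(estimate_offset(audio, video))  # 5
```

A real system would replace the hand-made envelopes with learned representations of the audio and the articulator video, but the output, a single time offset, is the same.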
no code implementations • 27 Jun 2019 • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions.
no code implementations • 30 May 2019 • Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell
This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model.
1 code implementation • 17 Apr 2019 • Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals
This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation.
Ranked #1 on Language Modelling on Hutter Prize
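The core loop of dynamic evaluation, stripped of the Transformer, can be sketched with a toy next-symbol model: score each test symbol with the current parameters, then take a gradient step on that symbol so the model adapts to the recent sequence history. Everything below is an illustrative assumption, not the paper's setup:

```python
import numpy as np

# Toy next-symbol model: a single logit vector over a small vocabulary.
vocab = 4
logits = np.zeros(vocab)          # "trained" static parameters (uniform here)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_eval(seq, lr=0.5):
    theta = logits.copy()         # adaptation starts from the static model
    total_nll = 0.0
    for sym in seq:
        p = softmax(theta)
        total_nll += -np.log(p[sym])     # score the symbol with current params...
        grad = p.copy()
        grad[sym] -= 1.0                 # cross-entropy gradient
        theta -= lr * grad               # ...then adapt on what was just seen
    return total_nll / len(seq)

static_nll = np.log(vocab)               # uniform model costs log|V| per symbol
repetitive = [2] * 50                    # highly repetitive test sequence
print(dynamic_eval(repetitive) < static_nll)  # adaptation helps: True
```

The gain comes precisely from re-using recent history: on repetitive or topically coherent sequences the adapted parameters assign higher probability than the frozen ones.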
no code implementations • 12 Nov 2018 • Joanna Rownicka, Peter Bell, Steve Renals
We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation.
1 code implementation • ACL 2018 • Ahmed Ali, Steve Renals
Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive.
Automatic Speech Recognition (ASR)
1 code implementation • 21 Sep 2017 • Ahmed Ali, Stephan Vogel, Steve Renals
Two hours of audio per dialect were released for development and a further two hours were used for evaluation.
no code implementations • 21 Sep 2017 • Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals
We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input.
Automatic Speech Recognition (ASR)
3 code implementations • ICML 2018 • Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals
We present methodology for using dynamic evaluation to improve neural sequence models.
Ranked #10 on Language Modelling on Hutter Prize
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
Automatic Speech Recognition (ASR)
no code implementations • 18 Oct 2016 • Liang Lu, Steve Renals
Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters.
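The gating idea behind HDNNs follows the standard highway-layer form, in which a small set of gate parameters interpolates between a nonlinear transform and an identity (carry) path. A minimal sketch of one such layer (standard highway parameterisation, not necessarily the exact HDNN one):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x.
    The transform gate T interpolates between the nonlinear
    transformation H and the identity (carry) path."""
    h = np.tanh(x @ W_h)
    t = sigmoid(x @ W_t + b_t)   # the gate: the small, controllable part
    return t * h + (1.0 - t) * x

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
W_h = rng.normal(scale=0.1, size=(d, d))
W_t = rng.normal(scale=0.1, size=(d, d))

# A strongly negative gate bias closes the gate, so the layer
# passes its input through almost unchanged.
y_closed = highway_layer(x, W_h, W_t, b_t=-10.0)
print(np.allclose(y_closed, x, atol=1e-3))  # True
```

Because a single gate bias like `b_t` can switch the whole layer between "transform" and "copy" behaviour, the gates give network-level control with very few parameters, which is the controllability the snippet refers to.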
1 code implementation • 26 Sep 2016 • Ben Krause, Liang Lu, Iain Murray, Steve Renals
We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures.
Ranked #14 on Language Modelling on Hutter Prize
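A single mLSTM step can be sketched directly from this description: an input-dependent multiplicative intermediate state m replaces the previous hidden state inside the gate computations, so each input symbol selects a different recurrent transition. Weight shapes and scales below are illustrative assumptions, and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h, c, p):
    """One multiplicative LSTM step."""
    m = (p["Wmx"] @ x) * (p["Wmh"] @ h)          # multiplicative intermediate state
    i = sigmoid(p["Wix"] @ x + p["Wim"] @ m)     # input gate
    f = sigmoid(p["Wfx"] @ x + p["Wfm"] @ m)     # forget gate
    o = sigmoid(p["Wox"] @ x + p["Wom"] @ m)     # output gate
    c = f * c + i * np.tanh(p["Wcx"] @ x + p["Wcm"] @ m)
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n, d = 6, 4                                      # hidden size, input size
p = {k: rng.normal(scale=0.5, size=(n, d if k.endswith("x") else n))
     for k in ["Wmx", "Wmh", "Wix", "Wim", "Wfx", "Wfm", "Wox", "Wom", "Wcx", "Wcm"]}
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):                # run a short input sequence
    h, c = mlstm_step(x, h, c, p)
```

A plain LSTM would feed `h` into the gates directly; substituting `m` is the multiplicative-RNN ingredient the abstract describes.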
no code implementations • 19 Sep 2016 • Sameer Khurana, Ahmed Ali, Steve Renals
In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification.
no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang
For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net, covering the ten-year period 2000-2011.
no code implementations • 2 Aug 2016 • Liang Lu, Michelle Guo, Steve Renals
We have shown that HDNN-based acoustic models can achieve recognition accuracy comparable to plain deep neural network (DNN) acoustic models with a much smaller number of model parameters.
1 code implementation • LREC 2016 • Guntis Barzdins, Steve Renals, Didzis Gosko
This paper describes a novel approach to the automatic story segmentation and storyline clustering problem.
Automatic Speech Recognition (ASR)
no code implementations • 31 Mar 2016 • Pawel Swietojanski, Steve Renals
We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators.
no code implementations • 1 Mar 2016 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.
Ranked #16 on Speech Recognition on TIMIT
no code implementations • 12 Jan 2016 • Pawel Swietojanski, Jinyu Li, Steve Renals
This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) -- a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data.
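The linear re-combination in LHUC can be sketched as an element-wise, speaker-dependent rescaling of hidden units: each unit's amplitude is scaled by 2·sigmoid(r), with only the vector r adapted per speaker while the network weights stay fixed. A minimal NumPy sketch (shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lhuc_layer(x, W, r):
    """Hidden layer with LHUC: the speaker-dependent vector r rescales
    each hidden unit via the amplitude function 2*sigmoid(r), which
    ranges over (0, 2). Only r is adapted per speaker; W stays fixed."""
    h = np.tanh(x @ W)                 # speaker-independent hidden activations
    return (2.0 * sigmoid(r)) * h      # element-wise speaker-dependent rescaling

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(10, 16))
x = rng.normal(size=(4, 10))

r_neutral = np.zeros(16)               # 2*sigmoid(0) = 1: network unchanged
print(np.allclose(lhuc_layer(x, W, r_neutral), np.tanh(x @ W)))  # True
```

Because only the 16 entries of `r` are estimated per speaker, very small amounts of unsupervised adaptation data suffice, which is what makes the method practical in the setting the abstract describes.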
no code implementations • 14 Dec 2015 • Liang Lu, Steve Renals
For speech recognition, deep neural networks (DNNs) have significantly improved recognition accuracy on most benchmark datasets and application domains.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
no code implementations • 4 Nov 2014 • Liang Lu, Steve Renals
Acoustic models using probabilistic linear discriminant analysis (PLDA) capture the correlations within feature vectors using subspaces which do not vastly expand the model.