no code implementations • COLING 2022 • Jie Chi, Peter Bell
This paper seeks to improve the performance of automatic speech recognition (ASR) systems operating on code-switched speech.
no code implementations • 30 May 2024 • Xiaoliang Wu, Chau Luu, Peter Bell, Ajitha Rajan
This paper proposes a fully explainable approach to speaker verification (SV), a task that fundamentally relies on individual speaker characteristics.
no code implementations • 26 May 2024 • Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai
ASR remains unsatisfactory in scenarios where the speaking style diverges from that used to train ASR systems, resulting in erroneous transcripts.
no code implementations • 22 Apr 2024 • Dongge Han, Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Peter Bell, Amos Storkey
We introduce LLM-Personalize, a novel framework with an optimization pipeline designed to personalize LLM planners for household robotics.
no code implementations • 7 Jul 2023 • Sarenne Wallbridge, Peter Bell, Catherine Lai
Speech is a fundamental means of communication that can be seen to provide two channels for transmitting information: the lexical channel of which words are said, and the non-lexical channel of how they are spoken.
no code implementations • 29 May 2023 • Xiaoliang Wu, Peter Bell, Ajitha Rajan
Explainable AI (XAI) techniques have been widely used to help explain and understand the output of deep learning models in fields such as image classification and Natural Language Processing.
no code implementations • 25 May 2023 • Yuanchao Li, Peter Bell, Catherine Lai
In this work, we investigate the relationship between two affective attributes: personality and emotion, from a transfer learning perspective.
no code implementations • 25 May 2023 • Yuanchao Li, Zeyu Zhao, Ondrej Klejch, Peter Bell, Catherine Lai
To overcome this challenge, we investigate how Automatic Speech Recognition (ASR) performs on emotional speech by analyzing the ASR performance on emotion corpora and examining the distribution of word errors and confidence scores in ASR transcripts to gain insight into how emotion affects ASR.
no code implementations • 23 May 2023 • Yaoting Wang, Yuanchao Li, Paul Pu Liang, Louis-Philippe Morency, Peter Bell, Catherine Lai
Fusing multiple modalities has proven effective for multimodal information processing.
no code implementations • 31 Mar 2023 • Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell
Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported on test datasets that fail to represent the diversity of English as spoken today around the globe.
no code implementations • 27 Feb 2023 • Xiaoliang Wu, Peter Bell, Ajitha Rajan
We address quality assessment for neural network based ASR by providing explanations that help increase our understanding of the system and ultimately help build trust in the system.
no code implementations • 24 Jan 2023 • Mathias Zinnen, Prathmesh Madhu, Ronak Kosti, Peter Bell, Andreas Maier, Vincent Christlein
The Odeuropa Challenge on Olfactory Object Recognition aims to foster the development of object detection in the visual arts and to promote an olfactory perspective on digital heritage.
no code implementations • 24 Jan 2023 • Mathias Zinnen, Prathmesh Madhu, Peter Bell, Andreas Maier, Vincent Christlein
We investigate the effect of style and category similarity in multiple datasets used for object detection pretraining.
no code implementations • 29 Nov 2022 • Christoph Minixhofer, Ondřej Klejch, Peter Bell
While modern Text-to-Speech (TTS) systems can produce natural-sounding speech, they remain unable to reproduce the full diversity found in natural speech data.
no code implementations • 5 Oct 2022 • Yuanchao Li, Yumnah Mohamied, Peter Bell, Catherine Lai
Self-supervised speech models have grown fast during the past few years and have proven feasible for use in various downstream tasks.
no code implementations • 22 Jun 2022 • Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Dirk Suckow, Peter Bell, Andreas Maier, Vincent Christlein
In this work, we present a novel approach called Image Composition Canvas (ICC++) to compare and retrieve images having similar compositional elements.
no code implementations • 15 Dec 2021 • Christoph Minixhofer, Ondřej Klejch, Peter Bell
In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows.
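The strategy described above combines multiple overlapping predictions for each word. As a rough illustration of that idea (not the paper's actual implementation), one can average the per-word punctuation probabilities produced by each window a word appears in, then take the highest-scoring label; the label names and probabilities below are invented:

```python
from collections import defaultdict

def combine_window_predictions(window_preds):
    """Average punctuation probabilities for each word position across
    overlapping prediction windows, then pick the argmax label.
    A simplified sketch of using multiple predictions per word."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for window in window_preds:            # each window: {position: {label: prob}}
        for pos, probs in window.items():
            counts[pos] += 1
            for label, p in probs.items():
                sums[pos][label] += p
    # divide by the number of windows covering each position, then argmax
    return {pos: max(labels, key=lambda l: labels[l] / counts[pos])
            for pos, labels in sums.items()}

# two overlapping windows; "" means no punctuation after the word
w1 = {0: {"": 0.9, ",": 0.1}, 1: {"": 0.4, ".": 0.6}}
w2 = {1: {"": 0.7, ".": 0.3}, 2: {"": 0.2, ".": 0.8}}
print(combine_window_predictions([w1, w2]))
```

Word 1 is predicted as a sentence end by the first window alone, but averaging over both windows overturns that decision, which is the kind of disagreement a multi-window strategy can exploit.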
no code implementations • 12 Nov 2021 • Ondrej Klejch, Electra Wallington, Peter Bell
We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question.
no code implementations • 29 Oct 2021 • Yuanchao Li, Peter Bell, Catherine Lai
However, due to the scarcity of emotion labelled data and the difficulty of recognizing emotional speech, it is hard to obtain reliable linguistic features and models in this research area.
no code implementations • 1 May 2021 • Sarenne Wallbridge, Peter Bell, Catherine Lai
People convey information extremely effectively through spoken interaction using multiple channels of information transmission: the lexical channel of what is said, and the non-lexical channel of how it is said.
no code implementations • EACL 2021 • David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuščáková, Elena Zotkina, Zhengping Jiang, Peter Bell, Kathleen McKeown
Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units that are expected by conventional machine translation (MT) systems for Spoken Language Translation.
no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.
1 code implementation • 10 Dec 2020 • Prathmesh Madhu, Angel Villar-Corrales, Ronak Kosti, Torsten Bendschus, Corinna Reinhardt, Peter Bell, Andreas Maier, Vincent Christlein
(2) To improve the already strong results further, we created a small dataset (ClassArch) consisting of ancient Greek vase paintings from the 6th–5th century BCE with person and pose annotations.
1 code implementation • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
To the best of our knowledge, we have achieved state-of-the-art end-to-end Transformer based model performance on Switchboard and AMI.
no code implementations • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results.
1 code implementation • 27 Oct 2020 • Chau Luu, Peter Bell, Steve Renals
On a test set of US Supreme Court recordings, we show that by leveraging two additional forms of speaker attribute information derived respectively from the matched training data, and VoxCeleb corpus, we improve the performance of our deep speaker embeddings for both verification and diarization tasks, achieving a relative improvement of 26.2% in DER and 6.7% in EER compared to baselines using speaker labels only.
no code implementations • 19 Oct 2020 • David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathleen McKeown
In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation.
1 code implementation • 8 Sep 2020 • Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Peter Bell, Andreas Maier, Vincent Christlein
These compositions are useful in analyzing the interactions in an image to study artists and their artworks.
1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.
no code implementations • 28 May 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition.
no code implementations • LREC 2020 • David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathleen McKeown
In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation.
1 code implementation • 31 Mar 2020 • Prathmesh Madhu, Ronak Kosti, Lara Mührenberg, Peter Bell, Andreas Maier, Vincent Christlein
We present experiments and analysis on three different models and show that the model trained on domain-related data gives the best performance for character recognition.
1 code implementation • 2 Feb 2020 • Chau Luu, Peter Bell, Steve Renals
The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks.
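The DropClass idea above, periodically restricting training to a random subset of classes, can be sketched as follows. This is a minimal toy illustration of one such period, not the paper's code; the example data and the `drop_fraction` parameter are invented:

```python
import random

def dropclass_round(examples, labels, num_classes, drop_fraction, rng):
    """One DropClass period: drop a random subset of classes, keep only
    the examples of the remaining classes, and remap their labels so the
    output layer shrinks accordingly. A simplified sketch of the idea."""
    n_drop = int(num_classes * drop_fraction)
    dropped = set(rng.sample(range(num_classes), n_drop))
    kept_classes = [c for c in range(num_classes) if c not in dropped]
    remap = {c: i for i, c in enumerate(kept_classes)}  # new output indices
    kept = [(x, remap[y]) for x, y in zip(examples, labels) if y not in dropped]
    return kept, len(kept_classes)

rng = random.Random(0)
xs = list(range(10))
ys = [i % 5 for i in xs]                 # 5 toy classes, 2 examples each
subset, n_out = dropclass_round(xs, ys, num_classes=5, drop_fraction=0.4, rng=rng)
print(n_out, len(subset))
```

Repeating this each period means the feature extractor is trained on many different classification tasks, which is the effect the method relies on.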
no code implementations • 31 Oct 2019 • Joanna Rownicka, Peter Bell, Steve Renals
We propose a multi-scale octave convolution layer to learn robust speech representations efficiently.
no code implementations • 25 Oct 2019 • Chau Luu, Peter Bell, Steve Renals
Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong.
1 code implementation • 23 Oct 2019 • Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions.
no code implementations • 30 Sep 2019 • Joanna Rownicka, Peter Bell, Steve Renals
In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker.
1 code implementation • 30 Sep 2019 • Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals
Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features.
no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.
no code implementations • 27 Jun 2019 • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions.
no code implementations • 30 May 2019 • Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell
This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model.
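The selection step described above, keeping only text that matches the audio according to an existing recognizer, can be illustrated roughly as follows. This is a hedged sketch, not the authors' pipeline: a real system would align hypotheses to candidates and use a proper error-rate measure, whereas here agreement is a simple per-word match rate, and all utterance ids and texts are invented:

```python
def select_lightly_supervised(triples, max_error=0.2):
    """Keep (audio_id, candidate_text) pairs whose candidate text agrees
    closely with an existing recognizer's hypothesis for that audio."""
    selected = []
    for audio_id, candidate, hypothesis in triples:
        c, h = candidate.split(), hypothesis.split()
        # crude disagreement: positional word mismatches plus length difference
        mismatches = sum(a != b for a, b in zip(c, h)) + abs(len(c) - len(h))
        if mismatches / max(len(c), 1) <= max_error:
            selected.append((audio_id, candidate))
    return selected

data = [
    ("utt1", "hello world today", "hello world today"),       # exact match: kept
    ("utt2", "good morning everyone", "good evening nobody"), # 2/3 differ: dropped
]
print(select_lightly_supervised(data))
```

Only well-matching segments survive, so the selected text can serve as approximate transcripts for training.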
no code implementations • 12 Nov 2018 • Joanna Rownicka, Peter Bell, Steve Renals
We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation.
no code implementations • 8 Nov 2018 • Bertrand Higy, Peter Bell
End-to-end approaches have recently become popular as a means of simplifying the training and deployment of speech recognition systems.
1 code implementation • 30 Aug 2018 • Ondřej Klejch, Joachim Fainberg, Peter Bell
The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers.
no code implementations • 21 Sep 2017 • Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals
We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input.
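ASR evaluation of the kind studied here is conventionally scored with word error rate (WER), the word-level edit distance between the reference and the system output, normalized by reference length. A minimal self-contained sketch of that standard metric (the example sentences are invented):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words, divided by the
    number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion, six reference words
```

Exact word matching is what makes WER problematic for dialectal speech, where several surface forms of a word may all be acceptable; that mismatch is the motivation for studying alternative evaluation here.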
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang
For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net, covering the 10-year period 2000–2011.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.