no code implementations • 31 Jul 2023 • Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba
Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distribution of the target data.
no code implementations • 31 Jul 2023 • Manuel Sam Ribeiro, Giulia Comini, Jaime Lorenzo-Trueba
The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings into a phonetic representation.
no code implementations • 31 Jul 2023 • Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba
Phonetic information and linguistic knowledge are essential components of a text-to-speech (TTS) front-end.
1 code implementation • 22 Sep 2022 • Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter
While previous work has focused on predicting listeners' ratings (mean opinion scores) of individual stimuli, we focus on the simpler task of predicting subjective preference given two speech stimuli for the same text.
no code implementations • 29 Jul 2022 • Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabryś, Jaime Lorenzo-Trueba
The availability of data in expressive styles across languages is limited, and recording sessions are costly and time-consuming.
no code implementations • 16 Feb 2022 • Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba
The approach uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system, marking a conceptual shift in the existing TTS paradigm by framing the few-shot TTS problem as a VC task.
no code implementations • 10 Feb 2022 • Manuel Sam Ribeiro, Julian Roth, Giulia Comini, Goeric Huybrechts, Adam Gabryś, Jaime Lorenzo-Trueba
The proposed approach relies on voice conversion to first generate high-quality data from the set of supporting expressive speakers.
no code implementations • 31 May 2021 • Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals
Our results demonstrate the strength of our approach and its ability to generalise to data from new domains.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals
For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We observe that silent speech recognition from imaging data underperforms modal speech recognition, likely due to a speaking-mode mismatch between training and testing.
no code implementations • 19 Nov 2020 • Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals
We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos.
1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals
Audiovisual synchronisation is the task of determining the time offset between speech audio and a video recording of the articulators.
1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench
In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data.
1 code implementation • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.
no code implementations • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production.