Search Results for author: Manuel Sam Ribeiro

Found 15 papers, 4 papers with code

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

no code implementations • 31 Jul 2023 • Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.

Acoustic Modelling • Speech Synthesis

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

no code implementations • 31 Jul 2023 • Manuel Sam Ribeiro, Giulia Comini, Jaime Lorenzo-Trueba

The grapheme-to-phoneme (G2P) model is used to train a multilingual phone recognition system, which then decodes speech recordings into a phonetic representation.

Speech Recognition

Multilingual context-based pronunciation learning for Text-to-Speech

no code implementations • 31 Jul 2023 • Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba

Phonetic information and linguistic knowledge are essential components of a text-to-speech (TTS) front-end.

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks

1 code implementation • 22 Sep 2022 • Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter

While previous work has focused on predicting listeners' ratings (mean opinion scores) of individual stimuli, we focus on the simpler task of predicting subjective preference given two speech stimuli for the same text.

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

no code implementations • 29 Jul 2022 • Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabryś, Jaime Lorenzo-Trueba

The availability of data in expressive styles across languages is limited, and recording sessions are costly and time-consuming.

Data Augmentation • Voice Conversion

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

no code implementations • 16 Feb 2022 • Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

Voice Filter uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system. It marks a conceptual shift in the existing TTS paradigm, framing the few-shot TTS problem as a VC task.

Speech Synthesis • Voice Conversion

Cross-speaker style transfer for text-to-speech using data augmentation

no code implementations • 10 Feb 2022 • Manuel Sam Ribeiro, Julian Roth, Giulia Comini, Goeric Huybrechts, Adam Gabryś, Jaime Lorenzo-Trueba

The proposed approach relies on voice conversion to first generate high-quality data from the set of supporting expressive speakers.

Data Augmentation • Style Transfer

Automatic audiovisual synchronisation for ultrasound tongue imaging

no code implementations • 31 May 2021 • Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals

Our results demonstrate the strength of our approach and its ability to generalise to data from new domains.

Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors

no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals

For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio.

Silent versus modal multi-speaker speech recognition from ultrasound and video

no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing.

Speech Recognition

TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

no code implementations • 19 Nov 2020 • Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals

We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos.

Speech Recognition

Synchronising audio and ultrasound by learning cross-modal embeddings

1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals

Audiovisual synchronisation is the task of determining the time offset between speech audio and a video recording of the articulators.

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench

In addition, UltraSuite includes a set of annotations, some manual and some automatically produced, as well as software tools to process, transform, and visualise the data.

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions

1 code implementation • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.

Speaker Diarization

Speaker-independent classification of phonetic segments from raw ultrasound in child speech

no code implementations • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production.

General Classification
