no code implementations • 10 Jan 2024 • Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima
The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately.
no code implementations • 24 Apr 2023 • Kenichi Fujita, Takanori Ashihara, Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima
This paper proposes a zero-shot text-to-speech (TTS) method conditioned on a speech-representation model acquired through self-supervised learning (SSL).
no code implementations • 2 Nov 2022 • Hiroki Kanagawa, Yusuke Ijima
Pruning time-consuming DNN modules is a promising way to realize a real-time vocoder (e.g., WaveRNN, LPCNet) on a CPU.