no code implementations • 11 Feb 2024 • Kenichi Fujita, Atsushi Ando, Yusuke Ijima
This paper proposes a speech-rhythm-based speaker-embedding method for modeling phoneme duration from a few utterances by the target speaker.
no code implementations • 31 Jan 2024 • Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima
Our analysis unveils that 1) the capacity to represent content information is somewhat unrelated to enhanced speaker representation, 2) specific layers of speech SSL models would be partly specialized in capturing linguistic information, and 3) speaker SSL models tend to disregard linguistic information but exhibit more sophisticated speaker representation.
no code implementations • 10 Jan 2024 • Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima
The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately.
no code implementations • 28 Nov 2023 • Kazuki Yamauchi, Yusuke Ijima, Yuki Saito
The experimental results demonstrate that our StyleCap, which leverages richer LLMs for the text decoder, speech self-supervised learning (SSL) features, and sentence-rephrasing augmentation, improves the accuracy and diversity of generated speaking-style captions.
1 code implementation • 14 Jun 2023 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition.
no code implementations • 24 Apr 2023 • Kenichi Fujita, Takanori Ashihara, Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima
This paper proposes a zero-shot text-to-speech (TTS) method conditioned on a speech-representation model acquired through self-supervised learning (SSL).
no code implementations • 2 Nov 2022 • Hiroki Kanagawa, Yusuke Ijima
Pruning time-consuming DNN modules is a promising way to realize a real-time vocoder on a CPU (e.g., WaveRNN, LPCNet).
no code implementations • 20 Feb 2021 • Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima
In this study, "extrapolating emotional expressions" means borrowing emotional expressions from other speakers, so collecting emotional speech uttered by the target speakers is unnecessary.
no code implementations • LREC 2020 • Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari
In this paper, we investigate the effectiveness of using rich annotations in deep neural network (DNN)-based statistical speech synthesis.
no code implementations • 5 Aug 2019 • Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari
The experimental evaluation compares voices converted by the proposed method, which does not use the target speaker's voice data, with those of standard VC, which does.