no code implementations • 22 Jan 2024 • Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita
This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers.
Automatic Speech Recognition (ASR) +4
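The layer-wise distillation above can be illustrated with a minimal sketch: a KL-divergence loss between a teacher's token distribution and a student's intermediate-layer predictions. The function name, temperature scaling, and NumPy setup are illustrative, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def intermediate_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Hypothetical distillation loss: KL(teacher || student) between a BERT
    teacher's token distribution and an ASR model's intermediate-layer
    predictions, with the usual t^2 scaling from temperature softening."""
    t = temperature
    p = softmax(teacher_logits / t)               # teacher distribution
    log_q = np.log(softmax(student_logits / t))   # student log-distribution
    kl = (p * (np.log(p) - log_q)).sum(axis=-1)   # per-token KL divergence
    return float(kl.mean()) * t * t
```

When student and teacher logits agree, the loss is zero; any mismatch yields a positive penalty that can be added to the usual ASR training objective.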
no code implementations • 15 Sep 2023 • Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions.
no code implementations • 15 Sep 2023 • Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda
Furthermore, a technique is proposed that mixes the input audio with additional audio and uses the additional audio as a reference.
1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa
Experiments on the two-speaker CALLHOME dataset show that intermediate labels with the proposed non-autoregressive intermediate attractors boost diarization performance.
no code implementations • 1 Apr 2022 • Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida
This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning.
no code implementations • 1 Apr 2022 • Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida
End-to-end automatic speech recognition directly maps input speech to characters.
Automatic Speech Recognition (ASR) +2
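The "directly maps input speech to characters" step can be made concrete with standard CTC greedy decoding, which turns per-frame predictions into a character sequence by collapsing repeats and dropping blanks. A minimal sketch (the blank index and input are illustrative):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse repeated frame-level token ids and remove blanks: the
    simplest way a CTC model's per-frame outputs become a character
    sequence in end-to-end ASR."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# e.g. frame ids [0, 3, 3, 0, 3, 5, 5, 0] decode to [3, 3, 5]:
# the repeated 3s collapse, the blank (0) separates the two distinct 3s
```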
no code implementations • 1 Apr 2022 • Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida
The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions.
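The self-conditioning mechanism referenced above can be sketched as follows: an intermediate layer's prediction is projected back into the feature stream so later layers see it. All dimensions, weights, and names here are illustrative placeholders, and the "noisy" perturbation of the predictions during training is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, vocab = 6, 8, 5            # frames, feature dim, vocabulary size

h = rng.standard_normal((T, d))          # hypothetical intermediate encoder features
w_out = rng.standard_normal((d, vocab))  # intermediate CTC output head
w_in = rng.standard_normal((vocab, d))   # projects predictions back to feature space

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# intermediate frame-wise prediction; in the setting described above this
# prediction could be perturbed ("noisy") during training for robustness
probs = softmax(h @ w_out)

# self-conditioning: add the projected prediction back into the features,
# so subsequent layers are conditioned on the intermediate hypothesis
h_conditioned = h + probs @ w_in
```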
no code implementations • 17 Feb 2022 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler
We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input.
Automatic Speech Recognition (ASR) +2
no code implementations • 17 Feb 2022 • Tatsuya Komatsu
The proposed method realizes non-autoregressive ASR with fewer parameters by folding the conventional stack of encoders into only two blocks: base encoders and folded encoders.
no code implementations • 17 Feb 2022 • Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi
In each iteration, the event's activity is estimated and used to condition the next output based on the probabilistic chain rule to form classifier chains.
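The chain-rule conditioning described above, P(y₁,…,y_K | x) = ∏ₖ P(yₖ | x, y₁,…,yₖ₋₁), can be sketched with toy per-event classifiers, each of which also sees the decisions for earlier events. The linear classifiers and dimensions are illustrative stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 16, 4                     # feature dim, number of event classes
x = rng.standard_normal(d)       # hypothetical clip-level audio feature

# one toy classifier per event; classifier k additionally takes the k-1
# earlier decisions as input, mirroring P(y_k | x, y_1..y_{k-1})
weights = [rng.standard_normal(d + k) for k in range(K)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

decisions = []
for k in range(K):
    inp = np.concatenate([x, np.array(decisions)])
    p_k = sigmoid(weights[k] @ inp)        # estimated activity of event k
    decisions.append(float(p_k > 0.5))     # condition the next classifier on it
```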
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe
Non-autoregressive (NAR) models generate multiple outputs in a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
Automatic Speech Recognition (ASR) +3
no code implementations • 29 Sep 2021 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler
We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length.
Automatic Speech Recognition (ASR) +2
no code implementations • 21 Apr 2021 • Yusuke Kida, Tatsuya Komatsu, Masahito Togami
Speech-to-text alignment is the problem of splitting long audio recordings with unaligned transcripts into utterance-wise pairs of speech and text.
Automatic Speech Recognition (ASR) +3
no code implementations • 6 Apr 2021 • Jumon Nozaki, Tatsuya Komatsu
This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models.
Automatic Speech Recognition (ASR) +1
no code implementations • 19 Jun 2020 • Tsubasa Takahashi, Shun Takagi, Hajime Ono, Tatsuya Komatsu
This paper studies how to learn variational autoencoders with a variety of divergences under differential privacy constraints.
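The differential-privacy side of the training above typically rests on a DP-SGD-style step: clip each per-sample gradient, sum, and add Gaussian noise calibrated to the clipping bound. A minimal sketch under that assumption (function and parameter names are illustrative, and this is not the paper's specific algorithm):

```python
import numpy as np

def dp_aggregate(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """DP-SGD-style gradient aggregation: bound each sample's influence by
    clipping its gradient to clip_norm, then add Gaussian noise proportional
    to that bound before averaging. Applicable to a VAE's gradients as to
    any model's."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)
```

The clipping bounds each sample's contribution, and the noise scale relative to that bound is what determines the privacy guarantee.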
no code implementations • 8 Apr 2019 • Chaitanya Narisetty, Tatsuya Komatsu, Reishi Kondo
This paper proposes a determined blind source separation method using Bayesian non-parametric modelling of sources.