no code implementations • 22 Jan 2024 • Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita
This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers.
Automatic Speech Recognition (ASR) +4
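The layer-wise distillation above can be illustrated with a minimal sketch: a KL-divergence loss between a teacher's token distribution and a student's intermediate-layer predictions. The function name, temperature scaling, and NumPy setup are illustrative, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def intermediate_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Hypothetical distillation loss: KL(teacher || student) between a BERT
    teacher's token distribution and an ASR model's intermediate-layer
    predictions, with the usual t^2 scaling from temperature softening."""
    t = temperature
    p = softmax(teacher_logits / t)               # teacher distribution
    log_q = np.log(softmax(student_logits / t))   # student log-distribution
    kl = (p * (np.log(p) - log_q)).sum(axis=-1)   # per-token KL divergence
    return float(kl.mean()) * t * t
```

When student and teacher logits agree, the loss is zero; any mismatch yields a positive penalty that can be added to the usual ASR training objective.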
no code implementations • 15 Sep 2023 • Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions.
no code implementations • 15 Sep 2023 • Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda
Furthermore, a technique is proposed that mixes the input audio with additional audio and uses the additional audio as a reference.
1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa
Experiments on the two-speaker CALLHOME dataset show that intermediate labels with the proposed non-autoregressive intermediate attractors boost diarization performance.
no code implementations • 1 Apr 2022 • Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida
This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning.
no code implementations • 1 Apr 2022 • Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida
End-to-end automatic speech recognition directly maps input speech to characters.
Automatic Speech Recognition (ASR) +2
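The "directly maps input speech to characters" step can be made concrete with standard CTC greedy decoding, which turns per-frame predictions into a character sequence by collapsing repeats and dropping blanks. A minimal sketch (the blank index and input are illustrative):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse repeated frame-level token ids and remove blanks: the
    simplest way a CTC model's per-frame outputs become a character
    sequence in end-to-end ASR."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# e.g. frame ids [0, 3, 3, 0, 3, 5, 5, 0] decode to [3, 3, 5]:
# the repeated 3s collapse, the blank (0) separates the two distinct 3s
```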
no code implementations • 1 Apr 2022 • Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida
The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions.
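The self-conditioning mechanism referenced above can be sketched as follows: an intermediate layer's prediction is projected back into the feature stream so later layers see it. All dimensions, weights, and names here are illustrative placeholders, and the "noisy" perturbation of the predictions during training is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, vocab = 6, 8, 5            # frames, feature dim, vocabulary size

h = rng.standard_normal((T, d))          # hypothetical intermediate encoder features
w_out = rng.standard_normal((d, vocab))  # intermediate CTC output head
w_in = rng.standard_normal((vocab, d))   # projects predictions back to feature space

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# intermediate frame-wise prediction; in the setting described above this
# prediction could be perturbed ("noisy") during training for robustness
probs = softmax(h @ w_out)

# self-conditioning: add the projected prediction back into the features,
# so subsequent layers are conditioned on the intermediate hypothesis
h_conditioned = h + probs @ w_in
```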
no code implementations • 17 Feb 2022 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler
We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input.
Automatic Speech Recognition (ASR) +2
no code implementations • 17 Feb 2022 • Tatsuya Komatsu
The proposed method realizes non-autoregressive ASR with fewer parameters by folding the conventional stack of encoders into only two blocks: base encoders and folded encoders.
no code implementations • 17 Feb 2022 • Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi
In each iteration, the event's activity is estimated and used to condition the next output based on the probabilistic chain rule to form classifier chains.
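The chain-rule conditioning described above, P(y₁,…,y_K | x) = ∏ₖ P(yₖ | x, y₁,…,yₖ₋₁), can be sketched with toy per-event classifiers, each of which also sees the decisions for earlier events. The linear classifiers and dimensions are illustrative stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 16, 4                     # feature dim, number of event classes
x = rng.standard_normal(d)       # hypothetical clip-level audio feature

# one toy classifier per event; classifier k additionally takes the k-1
# earlier decisions as input, mirroring P(y_k | x, y_1..y_{k-1})
weights = [rng.standard_normal(d + k) for k in range(K)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

decisions = []
for k in range(K):
    inp = np.concatenate([x, np.array(decisions)])
    p_k = sigmoid(weights[k] @ inp)        # estimated activity of event k
    decisions.append(float(p_k > 0.5))     # condition the next classifier on it
```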
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe
Non-autoregressive (NAR) models generate multiple outputs in a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
Automatic Speech Recognition (ASR) +3
no code implementations • 29 Sep 2021 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler
We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length.
Automatic Speech Recognition (ASR) +2
no code implementations • 21 Apr 2021 • Yusuke Kida, Tatsuya Komatsu, Masahito Togami
Speech-to-text alignment is the problem of splitting long audio recordings with unaligned transcripts into utterance-wise pairs of speech and text.
Automatic Speech Recognition (ASR) +3
no code implementations • 6 Apr 2021 • Jumon Nozaki, Tatsuya Komatsu
This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models.
Automatic Speech Recognition (ASR) +1
no code implementations • 19 Jun 2020 • Tsubasa Takahashi, Shun Takagi, Hajime Ono, Tatsuya Komatsu
This paper studies how to learn variational autoencoders with a variety of divergences under differential privacy constraints.
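The differential-privacy side of the training above typically rests on a DP-SGD-style step: clip each per-sample gradient, sum, and add Gaussian noise calibrated to the clipping bound. A minimal sketch under that assumption (function and parameter names are illustrative, and this is not the paper's specific algorithm):

```python
import numpy as np

def dp_aggregate(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """DP-SGD-style gradient aggregation: bound each sample's influence by
    clipping its gradient to clip_norm, then add Gaussian noise proportional
    to that bound before averaging. Applicable to a VAE's gradients as to
    any model's."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)
```

The clipping bounds each sample's contribution, and the noise scale relative to that bound is what determines the privacy guarantee.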
no code implementations • 8 Apr 2019 • Chaitanya Narisetty, Tatsuya Komatsu, Reishi Kondo
This paper proposes a determined blind source separation method using Bayesian non-parametric modelling of sources.