no code implementations • IWSLT (EMNLP) 2018 • Evgeny Matusov, Patrick Wilken, Parnia Bahar, Julian Schamper, Pavel Golik, Albert Zeyer, Joan Albert Silvestre-Cerda, Adrià Martínez-Villaronga, Hendrik Pesch, Jan-Thorsten Peter
This work describes AppTek’s speech translation pipeline that includes strong state-of-the-art automatic speech recognition (ASR) and neural machine translation (NMT) components.
Automatic Speech Recognition (ASR) +4
no code implementations • 15 Sep 2023 • Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney
We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks.
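The chunking described above can be illustrated with a minimal sketch (not the paper's implementation): the frame sequence is split into pre-defined, fixed-size windows, and each decoding step would only attend within its chunk.

```python
def split_into_chunks(frames, chunk_size):
    """Split a frame sequence into fixed-size chunks; the last chunk may be
    shorter. Illustrative only: in the streamable model, the decoder (and
    optionally the encoder) operates on such pre-defined windows."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]
```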
1 code implementation • 26 Oct 2022 • Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney
We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming.
Automatic Speech Recognition (ASR) +1
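Restricting decoder attention to segments can be pictured as an attention mask: each output segment may only look at its own frame range, so the total attention cost grows linearly with sequence length rather than quadratically. A hypothetical sketch:

```python
def segment_attention_mask(num_frames, segments):
    """Boolean mask: mask[s][t] is True iff output segment s may attend to
    frame t. `segments` is a list of (start, end) frame ranges, end exclusive.
    Illustrative only; segment boundaries in the paper come from the model."""
    return [[start <= t < end for t in range(num_frames)]
            for (start, end) in segments]
```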
1 code implementation • 31 May 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
The peaky behavior of CTC models is well known experimentally.
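"Peaky" CTC posteriors concentrate label probability on single frames, with blank dominating elsewhere; the standard CTC collapse rule (merge repeated labels, then remove blanks) maps both peaky and non-peaky frame-label paths to the same output. A minimal sketch of that rule:

```python
def ctc_collapse(frame_labels, blank=0):
    """Greedy CTC decoding rule: merge adjacent repeated labels, drop blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

Both a spread-out path such as `[0, 3, 3, 0, 0, 5, 0]` and a peaky one such as `[0, 0, 3, 0, 0, 5, 0]` collapse to `[3, 5]`.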
no code implementations • 13 Apr 2021 • Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM).
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Apr 2021 • Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
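One common way to use an estimated ILM is to subtract its score during shallow fusion with an external LM, so the external LM replaces rather than stacks on the implicit one. A hypothetical log-linear sketch (function name and scale values are illustrative, not the paper's exact setup):

```python
def fused_score(log_p_aed, log_p_ext_lm, log_p_ilm, lm_scale=0.5, ilm_scale=0.3):
    """Log-linear score combination for shallow fusion with ILM subtraction:
    add the scaled external LM score, subtract the scaled internal LM score.
    Scale values here are hypothetical; in practice they are tuned."""
    return log_p_aed + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm
```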
2 code implementations • 7 Apr 2021 • Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney
We present our transducer model on LibriSpeech.
Ranked #25 on Speech Recognition on LibriSpeech test-clean (using extra training data)
no code implementations • 30 Mar 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
We compare several monotonic latent models to our global soft attention baseline such as a hard attention model, a local windowed soft attention model, and a segmental soft attention model.
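The local windowed variant among these can be sketched as follows: soft attention weights are computed only inside a window around the current position, with zero weight outside. This is an illustrative toy, not the paper's model.

```python
import math

def windowed_soft_attention(scores, center, window):
    """Softmax over scores within [center - window, center + window];
    positions outside the window get weight 0.0."""
    lo = max(0, center - window)
    hi = min(len(scores), center + window + 1)
    exps = [math.exp(s) for s in scores[lo:hi]]
    z = sum(exps)
    weights = [0.0] * len(scores)
    for i, e in enumerate(exps):
        weights[lo + i] = e / z
    return weights
```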
1 code implementation • 19 May 2020 • Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
We compare the original training criterion with the full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training.
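The two criteria compared above differ in how alignment scores are combined: the full marginalization sums the probabilities of all alignments (a log-sum-exp in log space), while the maximum approximation keeps only the best one. A minimal numeric sketch:

```python
import math

def full_sum(alignment_log_scores):
    """Log of the sum over all alignment probabilities (full marginalization),
    computed stably via the log-sum-exp trick."""
    m = max(alignment_log_scores)
    return m + math.log(sum(math.exp(s - m) for s in alignment_log_scores))

def max_approx(alignment_log_scores):
    """Maximum (Viterbi) approximation: keep only the best alignment."""
    return max(alignment_log_scores)
```

The full sum is always at least as large as the maximum; when one alignment dominates, the two values nearly coincide, which is why the approximation often works well.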
1 code implementation • 19 May 2020 • Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney
Following the rationale of end-to-end modeling, CTC, RNN-T, or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE).
Automatic Speech Recognition (ASR) +2
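The BPE procedure mentioned above builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair in a corpus. A minimal sketch of one merge step (illustrative, not the toolkit implementation used in the paper):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs over a corpus given as
    {word_as_symbol_tuple: frequency}; return a most frequent pair."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Apply one BPE merge: replace each occurrence of `pair` in every
    word with the concatenated symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged
```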
1 code implementation • 19 Dec 2019 • Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney
We achieve improvements of up to 33% relative in word error rate (WER) over a strong baseline with data augmentation in a low-resource environment (LibriSpeech-100h), closing the gap to a comparable oracle experiment by more than 50%.
Automatic Speech Recognition (ASR) +3
no code implementations • EMNLP (IWSLT) 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation.
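In its simplest form, SpecAugment zeroes out a random block of time frames and a random band of frequency bins in the input spectrogram. A minimal sketch under that assumption (mask widths and the pure-Python representation are illustrative):

```python
import random

def spec_augment(spectrogram, max_t, max_f, seed=None):
    """Apply one time mask and one frequency mask to a spectrogram given as
    a list of frames (each frame a list of frequency bins). Masked entries
    are set to 0.0; the input is left unmodified."""
    rng = random.Random(seed)
    num_t, num_f = len(spectrogram), len(spectrogram[0])
    out = [row[:] for row in spectrogram]
    # Time mask: zero a random block of consecutive frames.
    t_width = rng.randint(0, max_t)
    t0 = rng.randint(0, num_t - t_width)
    for t in range(t0, t0 + t_width):
        out[t] = [0.0] * num_f
    # Frequency mask: zero a random band of bins in every frame.
    f_width = rng.randint(0, max_f)
    f0 = rng.randint(0, num_f - f_width)
    for row in out:
        for f in range(f0, f0 + f_width):
            row[f] = 0.0
    return out
```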
no code implementations • 20 Nov 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition.
Automatic Speech Recognition (ASR) +1
no code implementations • 10 May 2019 • Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney
We explore deep autoregressive Transformer models in language modeling for speech recognition.
2 code implementations • 8 May 2019 • Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are the best published to date, both for the hybrid DNN/HMM and the attention-based systems.
Ranked #24 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +3
3 code implementations • ACL 2018 • Albert Zeyer, Tamer Alkhouli, Hermann Ney
We demonstrate the fast training and decoding speed of RETURNN attention models for translation, enabled by fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder.
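The beam search idea behind such a decoder can be sketched in a few lines: at each step, extend every hypothesis with every token, then keep only the top-scoring candidates. This toy version scores steps independently (conditioning on the prefix is omitted for brevity) and is not RETURNN's TensorFlow implementation.

```python
import math

def beam_search(step_log_probs, beam_size):
    """Simple beam search: step_log_probs[t][v] is the log-probability of
    token v at step t. Returns the highest-scoring token sequence."""
    beams = [((), 0.0)]  # (token sequence, accumulated log-score)
    for log_probs in step_log_probs:
        candidates = [(seq + (v,), score + lp)
                      for seq, score in beams
                      for v, lp in enumerate(log_probs)]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return list(beams[0][0])
```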
14 code implementations • 8 May 2018 • Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
Ranked #43 on Speech Recognition on LibriSpeech test-clean (using extra training data)
3 code implementations • 2 Aug 2016 • Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney
In this work we release our extensible and easily configurable neural network training software.
no code implementations • 22 Jun 2016 • Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney
On this task, we get our best result with an 8 layer bidirectional LSTM and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs.
Automatic Speech Recognition (ASR) +1
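The layer-wise construction scheme mentioned above trains a shallow network first and grows it toward the final depth during pretraining. A hypothetical schedule sketch (the starting depth and growth rate here are illustrative, not the paper's exact values):

```python
def pretrain_num_layers(epoch, final_layers, epochs_per_stage=2, start_layers=2):
    """Layer-wise construction schedule: begin with `start_layers` LSTM
    layers and add one layer every `epochs_per_stage` epochs until the
    final depth is reached. Epochs are counted from 1."""
    grown = start_layers + (epoch - 1) // epochs_per_stage
    return min(final_layers, grown)
```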