no code implementations • IWSLT (EMNLP) 2018 • Evgeny Matusov, Patrick Wilken, Parnia Bahar, Julian Schamper, Pavel Golik, Albert Zeyer, Joan Albert Silvestre-Cerda, Adrià Martínez-Villaronga, Hendrik Pesch, Jan-Thorsten Peter
This work describes AppTek’s speech translation pipeline that includes strong state-of-the-art automatic speech recognition (ASR) and neural machine translation (NMT) components.
Automatic Speech Recognition (ASR) +4
no code implementations • 15 Sep 2023 • Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney
We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks.
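The chunking described above can be illustrated with a minimal sketch (not the paper's implementation): the frame sequence is split into pre-defined, fixed-size windows, and each decoding step would only attend within its chunk.

```python
def split_into_chunks(frames, chunk_size):
    """Split a frame sequence into fixed-size chunks; the last chunk may be
    shorter. Illustrative only: in the streamable model, the decoder (and
    optionally the encoder) operates on such pre-defined windows."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]
```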
1 code implementation • 26 Oct 2022 • Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney
We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming.
Automatic Speech Recognition (ASR) +1
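Restricting decoder attention to segments can be pictured as an attention mask: each output segment may only look at its own frame range, so the total attention cost grows linearly with sequence length rather than quadratically. A hypothetical sketch:

```python
def segment_attention_mask(num_frames, segments):
    """Boolean mask: mask[s][t] is True iff output segment s may attend to
    frame t. `segments` is a list of (start, end) frame ranges, end exclusive.
    Illustrative only; segment boundaries in the paper come from the model."""
    return [[start <= t < end for t in range(num_frames)]
            for (start, end) in segments]
```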
1 code implementation • 31 May 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
The peaky behavior of CTC models is well known experimentally.
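"Peaky" CTC posteriors concentrate label probability on single frames, with blank dominating elsewhere; the standard CTC collapse rule (merge repeated labels, then remove blanks) maps both peaky and non-peaky frame-label paths to the same output. A minimal sketch of that rule:

```python
def ctc_collapse(frame_labels, blank=0):
    """Greedy CTC decoding rule: merge adjacent repeated labels, drop blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

Both a spread-out path such as `[0, 3, 3, 0, 0, 5, 0]` and a peaky one such as `[0, 0, 3, 0, 0, 5, 0]` collapse to `[3, 5]`.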
no code implementations • 13 Apr 2021 • Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM).
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Apr 2021 • Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
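One common way to use an estimated ILM is to subtract its score during shallow fusion with an external LM, so the external LM replaces rather than stacks on the implicit one. A hypothetical log-linear sketch (function name and scale values are illustrative, not the paper's exact setup):

```python
def fused_score(log_p_aed, log_p_ext_lm, log_p_ilm, lm_scale=0.5, ilm_scale=0.3):
    """Log-linear score combination for shallow fusion with ILM subtraction:
    add the scaled external LM score, subtract the scaled internal LM score.
    Scale values here are hypothetical; in practice they are tuned."""
    return log_p_aed + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm
```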
2 code implementations • 7 Apr 2021 • Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney
We present our transducer model on LibriSpeech.
Ranked #25 on Speech Recognition on LibriSpeech test-clean (using extra training data)
no code implementations • 30 Mar 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
We compare several monotonic latent models to our global soft attention baseline such as a hard attention model, a local windowed soft attention model, and a segmental soft attention model.
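The local windowed variant among these can be sketched as follows: soft attention weights are computed only inside a window around the current position, with zero weight outside. This is an illustrative toy, not the paper's model.

```python
import math

def windowed_soft_attention(scores, center, window):
    """Softmax over scores within [center - window, center + window];
    positions outside the window get weight 0.0."""
    lo = max(0, center - window)
    hi = min(len(scores), center + window + 1)
    exps = [math.exp(s) for s in scores[lo:hi]]
    z = sum(exps)
    weights = [0.0] * len(scores)
    for i, e in enumerate(exps):
        weights[lo + i] = e / z
    return weights
```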
1 code implementation • 19 May 2020 • Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
We compare the original training criterion with the full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training.
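The two criteria compared above differ in how alignment scores are combined: the full marginalization sums the probabilities of all alignments (a log-sum-exp in log space), while the maximum approximation keeps only the best one. A minimal numeric sketch:

```python
import math

def full_sum(alignment_log_scores):
    """Log of the sum over all alignment probabilities (full marginalization),
    computed stably via the log-sum-exp trick."""
    m = max(alignment_log_scores)
    return m + math.log(sum(math.exp(s - m) for s in alignment_log_scores))

def max_approx(alignment_log_scores):
    """Maximum (Viterbi) approximation: keep only the best alignment."""
    return max(alignment_log_scores)
```

The full sum is always at least as large as the maximum; when one alignment dominates, the two values nearly coincide, which is why the approximation often works well.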
1 code implementation • 19 May 2020 • Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney
Following the rationale of end-to-end modeling, CTC, RNN-T, or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE).
Automatic Speech Recognition (ASR) +2
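The BPE procedure mentioned above builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair in a corpus. A minimal sketch of one merge step (illustrative, not the toolkit implementation used in the paper):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs over a corpus given as
    {word_as_symbol_tuple: frequency}; return a most frequent pair."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Apply one BPE merge: replace each occurrence of `pair` in every
    word with the concatenated symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged
```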
1 code implementation • 19 Dec 2019 • Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney
We achieve improvements of up to 33% relative in word error rate (WER) over a strong baseline with data augmentation in a low-resource environment (LibriSpeech-100h), closing the gap to a comparable oracle experiment by more than 50%.
Automatic Speech Recognition (ASR) +3
no code implementations • EMNLP (IWSLT) 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation.
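In its simplest form, SpecAugment zeroes out a random block of time frames and a random band of frequency bins in the input spectrogram. A minimal sketch under that assumption (mask widths and the pure-Python representation are illustrative):

```python
import random

def spec_augment(spectrogram, max_t, max_f, seed=None):
    """Apply one time mask and one frequency mask to a spectrogram given as
    a list of frames (each frame a list of frequency bins). Masked entries
    are set to 0.0; the input is left unmodified."""
    rng = random.Random(seed)
    num_t, num_f = len(spectrogram), len(spectrogram[0])
    out = [row[:] for row in spectrogram]
    # Time mask: zero a random block of consecutive frames.
    t_width = rng.randint(0, max_t)
    t0 = rng.randint(0, num_t - t_width)
    for t in range(t0, t0 + t_width):
        out[t] = [0.0] * num_f
    # Frequency mask: zero a random band of bins in every frame.
    f_width = rng.randint(0, max_f)
    f0 = rng.randint(0, num_f - f_width)
    for row in out:
        for f in range(f0, f0 + f_width):
            row[f] = 0.0
    return out
```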
no code implementations • 20 Nov 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition.
Automatic Speech Recognition (ASR) +1
no code implementations • 10 May 2019 • Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney
We explore deep autoregressive Transformer models in language modeling for speech recognition.
2 code implementations • 8 May 2019 • Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are the best published to date, both for the hybrid DNN/HMM and the attention-based systems.
Ranked #24 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +3
3 code implementations • ACL 2018 • Albert Zeyer, Tamer Alkhouli, Hermann Ney
We demonstrate the fast training and decoding speed of RETURNN attention models for translation, enabled by fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder.
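The beam search idea behind such a decoder can be sketched in a few lines: at each step, extend every hypothesis with every token, then keep only the top-scoring candidates. This toy version scores steps independently (conditioning on the prefix is omitted for brevity) and is not RETURNN's TensorFlow implementation.

```python
import math

def beam_search(step_log_probs, beam_size):
    """Simple beam search: step_log_probs[t][v] is the log-probability of
    token v at step t. Returns the highest-scoring token sequence."""
    beams = [((), 0.0)]  # (token sequence, accumulated log-score)
    for log_probs in step_log_probs:
        candidates = [(seq + (v,), score + lp)
                      for seq, score in beams
                      for v, lp in enumerate(log_probs)]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return list(beams[0][0])
```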
14 code implementations • 8 May 2018 • Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
Ranked #43 on Speech Recognition on LibriSpeech test-clean (using extra training data)
3 code implementations • 2 Aug 2016 • Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney
In this work we release our extensible and easily configurable neural network training software.
no code implementations • 22 Jun 2016 • Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney
On this task, we get our best result with an 8 layer bidirectional LSTM and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs.
Automatic Speech Recognition (ASR) +1
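The layer-wise construction scheme mentioned above trains a shallow network first and grows it toward the final depth during pretraining. A hypothetical schedule sketch (the starting depth and growth rate here are illustrative, not the paper's exact values):

```python
def pretrain_num_layers(epoch, final_layers, epochs_per_stage=2, start_layers=2):
    """Layer-wise construction schedule: begin with `start_layers` LSTM
    layers and add one layer every `epochs_per_stage` epochs until the
    final depth is reached. Epochs are counted from 1."""
    grown = start_layers + (epoch - 1) // epochs_per_stage
    return min(final_layers, grown)
```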