1 code implementation • IWSLT (ACL) 2022 • Ioannis Tsiamas, Gerard I. Gállego, Carlos Escolano, José Fonollosa, Marta R. Costa-jussà
We further investigate the suitability of different speech encoders (wav2vec 2.0, HuBERT) for our models and the impact of knowledge distillation from the Machine Translation model that we use for the decoder (mBART).
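Knowledge distillation of this kind typically trains the speech-translation student to match the token distributions of the MT teacher. A minimal sketch of such a distillation loss, with illustrative names and a toy vocabulary (not the paper's implementation):

```python
import numpy as np

def _softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over token distributions, averaged over positions."""
    t = temperature
    p = _softmax(teacher_logits / t)              # teacher (MT model) distribution
    log_q = np.log(_softmax(student_logits / t))  # student (ST model) log-distribution
    # scaling by t^2 keeps gradient magnitudes comparable across temperatures
    return float((p * (np.log(p) - log_q)).sum(axis=-1).mean() * t * t)

# toy example: 2 target positions over a 5-token vocabulary
student = np.array([[0.2, 0.1, 0.0, -0.1, 0.3], [0.0, 0.5, 0.1, 0.2, -0.2]])
teacher = np.array([[0.3, 0.0, 0.1, -0.2, 0.4], [0.1, 0.6, 0.0, 0.3, -0.1]])
loss = distillation_loss(student, teacher)
```

The loss is zero exactly when the student reproduces the teacher's distribution at every position.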
1 code implementation • 16 Feb 2024 • Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà
The speech encoder seamlessly integrates with the MT model at inference, enabling direct translation from speech to text, across all languages supported by the MT model.
no code implementations • 20 Sep 2023 • Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà
In doing so, we contribute to the ongoing research progress within the fields of Speech-to-Speech and Speech-to-Text translation.
no code implementations • 2 Jun 2023 • Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà
Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50).
1 code implementation • 21 May 2023 • Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà
Language Generation Models produce words based on the previous context.
1 code implementation • 13 Apr 2023 • Laia Tarrés, Gerard I. Gállego, Amanda Duarte, Jordi Torres, Xavier Giró-i-Nieto
We report a BLEU score of 8.03, and publish the first open-source implementation of its kind to promote further advances.
Ranked #1 on Sign Language Translation on How2Sign
1 code implementation • 28 Oct 2022 • Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà
Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality.
1 code implementation • 23 May 2022 • Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà
In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step).
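This conditioning on the source sentence and the target prefix can be sketched as a greedy autoregressive decoding loop. All names here are illustrative placeholders, not the paper's code; a toy model is included only so the sketch runs:

```python
# Each prediction depends on the encoded source and the target prefix so far.
def greedy_decode(model, source_tokens, bos_id=1, eos_id=2, max_len=128):
    memory = model.encode(source_tokens)            # source-side conditioning
    prefix = [bos_id]                               # target prefix so far
    for _ in range(max_len):
        logits = model.decode_step(memory, prefix)  # one decoding step
        next_id = max(range(len(logits)), key=logits.__getitem__)
        prefix.append(next_id)                      # extend the prefix
        if next_id == eos_id:                       # stop at end-of-sentence
            break
    return prefix

# Toy stand-in model: always scores EOS highest, so decoding stops immediately.
class ToyModel:
    def encode(self, src):
        return src
    def decode_step(self, memory, prefix):
        return [0.0, 0.0, 1.0]   # highest score on token id 2 (EOS)

out = greedy_decode(ToyModel(), [5, 6, 7])
```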
no code implementations • NAACL (ACL) 2022 • Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-jussà
Different approaches have been proposed to overcome these problems, such as the use of efficient attention mechanisms.
no code implementations • ACL 2022 • Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
Transformers have achieved state-of-the-art results across multiple NLP tasks.
2 code implementations • 8 Mar 2022 • Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
The Transformer architecture aggregates input information through the self-attention mechanism, but there is no clear understanding of how this information is mixed across the entire model.
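The mixing the excerpt refers to happens through the attention matrix: each output row is a convex combination of all value vectors, with weights given by a row-stochastic softmax(QKᵀ) matrix. A minimal single-head sketch with assumed shapes (4 tokens, dimension 8):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention; returns outputs and the mixing matrix."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])     # scaled dot-product scores
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)    # each row sums to 1
    return attn @ v, attn                       # outputs mix all value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 tokens, dim 8
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(x, wq, wk, wv)
```

Inspecting `attn` per layer is the starting point of such interpretability work, though attention weights alone do not account for the value transformations and residual stream.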
2 code implementations • 9 Feb 2022 • Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà
Speech translation datasets provide manual segmentations of the audio, which are not available in real-world scenarios, and existing segmentation methods usually reduce translation quality significantly at inference time.
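The general idea behind such learned segmentation can be sketched as recursively splitting at the frame least likely to contain speech until every segment fits a length budget. This mirrors the general family of approaches, not the paper's exact algorithm; all names are illustrative:

```python
# Split a sequence of frame-level speech probabilities into segments no
# longer than `max_len`, cutting at the lowest-probability (likely silent) frame.
def segment(probs, max_len):
    pending = [(0, len(probs))]
    done = []
    while pending:
        a, b = pending.pop()
        if b - a <= max_len:
            done.append((a, b))
            continue
        # cut at the frame least likely to be speech (strictly inside the span)
        cut = min(range(a + 1, b), key=lambda i: probs[i])
        pending += [(a, cut), (cut, b)]
    return sorted(done)

# toy example: a dip in speech probability at frame 2 becomes the boundary
parts = segment([0.9, 0.9, 0.1, 0.9, 0.9, 0.9], max_len=4)
```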
no code implementations • 7 Jul 2021 • Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussà
When working with speech, we face a problem: the sequence length of an audio input is not suitable for the Transformer.
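The standard workaround is to shrink the audio sequence with strided convolutions before the Transformer, since self-attention cost grows quadratically with length. A small sketch of the length arithmetic, with illustrative kernel and stride values:

```python
# Output length of stacked 1-D convolutions (no padding):
# each layer maps length t to (t - kernel) // stride + 1.
def downsampled_length(t, kernel=3, stride=2, layers=2):
    for _ in range(layers):
        t = (t - kernel) // stride + 1
    return t

# Two stride-2 layers reduce a 1000-frame input to roughly a quarter.
short = downsampled_length(1000)
```

With these assumed settings, attention then operates on ~T/4 positions, cutting its quadratic cost by roughly 16x.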
1 code implementation • ACL (IWSLT) 2021 • Gerard I. Gállego, Ioannis Tsiamas, Carlos Escolano, José A. R. Fonollosa, Marta R. Costa-jussà
Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 to identify periods of untranscribable text, which can bring improvements of 2.5 to 3 BLEU on the IWSLT 2019 test set compared to the result with the given segmentation.
Ranked #2 on Speech-to-Text Translation on MuST-C EN->DE (using extra training data)
no code implementations • LREC 2022 • Marta R. Costa-jussà, Christine Basta, Gerard I. Gállego
WinoST is the speech version of WinoMT, an MT challenge set; both follow an evaluation protocol that measures gender accuracy.