Search Results for author: Marcely Zanon Boito

Found 17 papers, 7 papers with code

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Věra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nădejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation, Translation

Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts

1 code implementation • 2 Nov 2023 • Thomas Palmeira Ferraz, Marcely Zanon Boito, Caroline Brun, Vassilina Nikoulina

Whisper is a multitask and multilingual speech model covering 99 languages.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

no code implementations • 11 Sep 2023 • Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing.

Self-Supervised Learning

NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track

no code implementations • 13 Jun 2023 • Edward Gow-Smith, Alexandre Berard, Marcely Zanon Boito, Ioan Calapodescu

This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track.

Translation

ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

no code implementations • IWSLT (ACL) 2022 • Marcely Zanon Boito, John Ortega, Hugo Riguidel, Antoine Laurent, Loïc Barrault, Fethi Bougares, Firas Chaabani, Ha Nguyen, Florentin Barbier, Souhir Gahbiche, Yannick Estève

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4 more

A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

no code implementations • 4 Apr 2022 • Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

Self-supervised speech models are pre-trained on unlabeled audio data and then used in downstream speech processing tasks such as automatic speech recognition (ASR) or speech translation (ST).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Speech Resources in the Tamasheq Language

1 code implementation • LREC 2022 • Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève

In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger.

Translation

Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings

no code implementations • SIGUL (LREC) 2022 • Marcely Zanon Boito, Bolaji Yusuf, Lucas Ondel, Aline Villavicencio, Laurent Besacier

Our results suggest that neural models for speech discretization are difficult to exploit in our setting, and that it might be necessary to adapt them to limit sequence length.

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

1 code implementation • 23 Apr 2021 • Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +6 more

Investigating Language Impact in Bilingual Approaches for Computational Language Documentation

no code implementations • LREC 2020 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

To answer this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) to create 56 bilingual pairs, which we apply to the task of low-resource unsupervised word segmentation and alignment.

Segmentation, Translation

ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

no code implementations • EMNLP (IWSLT) 2019 • Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair.

Decoder, Translation

How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages

1 code implementation • 11 Oct 2019 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

For language documentation initiatives, transcription is an expensive resource: transcribing one minute of audio is estimated to take, on average, an hour and a half of a linguist's work (Austin and Sallabank, 2013).

MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible

1 code implementation • LREC 2020 • Marcely Zanon Boito, William N. Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier

However, the fact that the source content (the Bible) is the same for all the languages has not been exploited to date. Therefore, this article proposes to add multilingual links between speech segments in different languages, and shares a large and clean dataset of 8,130 parallel spoken utterances across 8 languages (56 language pairs).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4 more

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings

1 code implementation • 29 Jun 2019 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

This task consists of aligning word sequences in a source language with phoneme sequences in a target language and inferring from this alignment a word segmentation on the target side [5].

Machine Translation

A small Griko-Italian speech translation corpus

no code implementations • 27 Jul 2018 • Marcely Zanon Boito, Antonios Anastasopoulos, Marika Lekakou, Aline Villavicencio, Laurent Besacier

This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research.

Translation

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

1 code implementation • LREC 2018 • Pierre Godard, Gilles Adda, Martine Adda-Decker, Juan Benjumea, Laurent Besacier, Jamison Cooper-Leavitt, Guy-Noel Kouarata, Lori Lamel, Hélène Maynard, Markus Mueller, Annie Rialland, Sebastian Stueker, François Yvon, Marcely Zanon Boito

Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models

no code implementations • 17 Sep 2017 • Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier

Word discovery is the task of extracting words from unsegmented text.

Decoder, Machine Translation, +1 more