Search Results for author: Didier Schwab

Found 49 papers, 14 papers with code

Visualizing Cross‐Lingual Discourse Relations in Multilingual TED Corpora

1 code implementation • CODI 2021 • Zae Myung Kim, Vassilina Nikoulina, Dongyeop Kang, Didier Schwab, Laurent Besacier

This paper presents an interactive data dashboard that provides users with an overview of the preservation of discourse relations among 28 language pairs.

Relation

Automatic Speech Recognition and Query By Example for Creole Languages Documentation

1 code implementation • Findings (ACL) 2022 • Cécile Macaire, Didier Schwab, Benjamin Lecouteux, Emmanuel Schang

We investigate the exploitation of self-supervised models for two Creole languages with few resources: Gwadloupéyen and Morisien.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

WordNet and beyond: the case of lexical access

no code implementations • GWC 2016 • Michael Zock, Didier Schwab

Next we will show under what conditions WN is suitable for word access, and finally we will present a roadmap showing the obstacles to be overcome to build a resource allowing the text producer to find the word s/he is looking for.

LEMMA

Identification de profil clinique du patient: Une approche de classification de séquences utilisant des modèles de langage français contextualisés (Identification of patient clinical profiles: A sequence classification approach using contextualised French language models)

no code implementations • JEP/TALN/RECITAL 2021 • Aidan Mannion, Thierry Chevalier, Didier Schwab, Lorraine Goeuriot

This article presents a summary of our submission for Task 1 of DEFT 2021.

Language Modelling

ON-TRAC’ systems for the IWSLT 2021 low-resource speech translation and multilingual speech translation shared tasks

no code implementations • ACL (IWSLT) 2021 • Hang Le, Florentin Barbier, Ha Nguyen, Natalia Tomashenko, Salima Mdhaffar, Souhir Gabiche Gahbiche, Benjamin Lecouteux, Didier Schwab, Yannick Estève

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2021, low-resource speech translation and multilingual speech translation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

no code implementations • 11 Sep 2023 • Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing.

Self-Supervised Learning

UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition

no code implementations • 20 Jul 2023 • Aidan Mannion, Thierry Chevalier, Didier Schwab, Lorraine Goeuriot

In the biomedical domain, significant progress has been made in adapting this paradigm to NLP tasks that require the integration of domain-specific knowledge as well as statistical modelling of language.

Document Classification named-entity-recognition +4

Pre-training for Speech Translation: CTC Meets Optimal Transport

1 code implementation • 27 Jan 2023 • Phuong-Hang Le, Hongyu Gong, Changhan Wang, Juan Pino, Benjamin Lecouteux, Didier Schwab

Nevertheless, CTC is only a partial solution and thus, in our second contribution, we propose a novel pre-training method combining CTC and optimal transport to further reduce this gap.

Multi-Task Learning Speech-to-Text Translation +1

Effect Of Personalized Calibration On Gaze Estimation Using Deep-Learning

no code implementations • 27 Sep 2021 • Nairit Bandyopadhyay, Sébastien Riou, Didier Schwab

We trained a multimodal convolutional neural network and analysed its performance with and without calibration; this evaluation provides clear insights into how calibration improved the performance of the deep learning model in estimating gaze in the wild.

Gaze Estimation

Lightweight Adapter Tuning for Multilingual Speech Translation

2 code implementations • ACL 2021 • Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP.

Ranked #1 on Speech-to-Text Translation on MuST-C EN->ES

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?

no code implementations • Findings (ACL) 2021 • Zae Myung Kim, Laurent Besacier, Vassilina Nikoulina, Didier Schwab

Recent studies on the analysis of the multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages.

Decoder Machine Translation +2

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

1 code implementation • 23 Apr 2021 • Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

1 code implementation • COLING 2020 • Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

We propose two variants of these architectures corresponding to two different levels of dependencies between the decoders, called the parallel and cross dual-decoder Transformers, respectively.

Ranked #1 on Speech-to-Text Translation on MuST-C EN->FR

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Reconnaissance de parole beatboxée à l'aide d'un système HMM-GMM inspiré de la reconnaissance automatique de la parole (Beatbox Sounds Recognition Using a Speech-Dedicated HMM-GMM Based System)

no code implementations • JEPTALNRECITAL 2020 • Solène Evain, Adrien Contesse, Antoine Pinchaud, Didier Schwab, Benjamin Lecouteux, Nathalie Henrich Bernardoni

Human beatboxing is a vocal art making use of speech organs to produce percussive sounds and imitate musical instruments. We propose a beatbox sound recognition system inspired by automatic speech recognition.

FlauBERT : des modèles de langue contextualisés pré-entraînés pour le français (FlauBERT: Unsupervised Language Model Pre-training for French)

no code implementations • JEPTALNRECITAL 2020 • Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab

Pre-trained language models are now indispensable for obtaining state-of-the-art results in many NLP tasks.

FLUE Language Modelling

Providing Semantic Knowledge to a Set of Pictograms for People with Disabilities: a Set of Links between WordNet and Arasaac: Arasaac-WN

no code implementations • LREC 2020 • Didier Schwab, Pauline Trial, Céline Vaschalde, Loïc Vial, Emmanuelle Esperanca-Rodier, Benjamin Lecouteux

In order to make it possible to use pictograms automatically in NLP applications, we propose a database that links them to semantic knowledge.

Learning Term Discrimination

no code implementations • 24 Apr 2020 • Jibril Frej, Philippe Mulhem, Didier Schwab, Jean-Pierre Chevallet

Document indexing is a key component for efficient information retrieval (IR).

Information Retrieval Retrieval

FlauBERT: Unsupervised Language Model Pre-training for French

7 code implementations • LREC 2020 • Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab

Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks.

Ranked #1 on Natural Language Inference on XNLI French

FLUE Language Modelling +4

WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset

1 code implementation • LREC 2020 • Jibril Frej, Didier Schwab, Jean-Pierre Chevallet

Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g., Robust04, ClueWeb09) have at most 250 annotated queries, the recent deep learning models for information retrieval perform poorly on these datasets.

Ad-Hoc Information Retrieval Information Retrieval +1

The LIG system for the English-Czech Text Translation Task of IWSLT 2019

no code implementations • EMNLP (IWSLT) 2019 • Loïc Vial, Benjamin Lecouteux, Didier Schwab, Hang Le, Laurent Besacier

Therefore, we implemented a Transformer-based encoder-decoder neural system which is able to use the output of a pre-trained language model as input embeddings, and we compared its performance under three configurations: 1) without any pre-trained language model (constrained), 2) using a language model trained on the monolingual parts of the allowed English-Czech data (constrained), and 3) using a language model trained on a large quantity of external monolingual data (unconstrained).

Language Modelling Machine Translation +1

ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model

no code implementations • WS 2019 • Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, Didier Schwab

Word Embeddings (WE) are getting increasingly popular and widely applied in many Natural Language Processing (NLP) applications due to their effectiveness in capturing semantic properties of words; Machine Translation (MT), Information Retrieval (IR) and Information Extraction (IE) are among such areas.

Information Retrieval Machine Translation +6

Apporter des connaissances sémantiques à un jeu de pictogrammes destiné à des personnes en situation de handicap : Un ensemble de liens entre Princeton WordNet et Arasaac, Arasaac-WN (Giving semantic knowledge to a set of pictograms for people with disabilities: a set of links between WordNet and Arasaac, Arasaac-WN)

no code implementations • JEPTALNRECITAL 2019 • Didier Schwab, Pauline Trial, Céline Vaschalde, Loïc Vial, Benjamin Lecouteux

This article presents a resource that links WordNet and Arasaac, the largest freely available pictogram database.

Compression de vocabulaire de sens grâce aux relations sémantiques pour la désambiguïsation lexicale (Sense Vocabulary Compression through Semantic Knowledge for Word Sense Disambiguation)

no code implementations • JEPTALNRECITAL 2019 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

In Word Sense Disambiguation (WSD), supervised systems largely dominate evaluation campaigns.

Word Sense Disambiguation

Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation

2 code implementations • GWC 2019 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

In this article, we tackle the issue of the limited quantity of manually sense annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database.

Ranked #1 on Word Sense Disambiguation on SemEval 2015 Task 13

Word Sense Disambiguation

Improving the Coverage and the Generalization Ability of Neural Word Sense Disambiguation through Hypernymy and Hyponymy Relationships

no code implementations • 2 Nov 2018 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

Our method leads to state of the art results on most WSD evaluation tasks, while improving the coverage of supervised systems, reducing the training time and the size of the models, without additional training data.

Ranked #2 on Word Sense Disambiguation on SemEval 2013 Task 12

Word Sense Disambiguation

UFSAC: Unification of Sense Annotated Corpora and Tools

1 code implementation • LREC 2018 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

Word Sense Disambiguation

Approche supervisée à base de cellules LSTM bidirectionnelles pour la désambiguïsation lexicale (LSTM Based Supervised Approach for Word Sense Disambiguation)

no code implementations • JEPTALNRECITAL 2018 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

In word sense disambiguation, the use of neural networks is still uncommon and very recent.

NER Word Sense Disambiguation

Traduction automatique de corpus en anglais annotés en sens pour la désambiguïsation lexicale d'une langue moins bien dotée, l'exemple de l'arabe (Automatic Translation of English Sense Annotated Corpora for Word Sense Disambiguation of a Less Well-endowed Language, the Example of Arabic)

no code implementations • JEPTALNRECITAL 2018 • Marwa Hadj Salah, Loïc Vial, Hervé Blanchon, Mounir Zrigui, Didier Schwab

We evaluate the quality of our disambiguation systems using a newly available Arabic evaluation corpus.

Word Sense Disambiguation

Un corpus en arabe annoté manuellement avec des sens WordNet (Arabic Manually Sense Annotated Corpus with WordNet Senses)

no code implementations • JEPTALNRECITAL 2018 • Marwa Hadj Salah, Hervé Blanchon, Mounir Zrigui, Didier Schwab

OntoNotes contains the only freely available manually sense-annotated corpus for Arabic.

Système de traduction automatique statistique Anglais-Arabe

no code implementations • 6 Feb 2018 • Marwa Hadj Salah, Didier Schwab, Hervé Blanchon, Mounir Zrigui

Machine translation (MT) is the process of translating text written in a source language into text in a target language.

Machine Translation Translation

LIM-LIG at SemEval-2017 Task1: Enhancing the Semantic Similarity for Arabic Sentences with Vectors Weighting

no code implementations • SEMEVAL 2017 • El Moatez Billah Nagoudi, Jérémy Ferrero, Didier Schwab

This article describes our proposed system named LIM-LIG.

Descriptive Information Retrieval +7

Amélioration de la similarité sémantique vectorielle par méthodes non-supervisées (Improved the Semantic Similarity with Weighting Vectors)

no code implementations • JEPTALNRECITAL 2017 • El-Moatez-Billah Nagoudi, Jérémy Ferrero, Didier Schwab

Représentation vectorielle de sens pour la désambiguïsation lexicale à base de connaissances (Sense Embeddings in Knowledge-Based Word Sense Disambiguation)

no code implementations • JEPTALNRECITAL 2017 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

In this article, we propose a new method for representing the senses of a dictionary as vectors.

SENTER Word Sense Disambiguation

Uniformisation de corpus anglais annotés en sens (Unification of sense annotated English corpora for word sense disambiguation)

no code implementations • JEPTALNRECITAL 2017 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

For English word sense disambiguation, there are currently about fifteen sense-annotated corpora, often in different formats and based on different versions of Princeton WordNet.

Word Sense Disambiguation

Deep Investigation of Cross-Language Plagiarism Detection Methods

1 code implementation • WS 2017 • Jeremy Ferrero, Laurent Besacier, Didier Schwab, Frederic Agnes

This paper is a deep investigation of cross-language plagiarism detection methods on a new recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts).

Comparison of Global Algorithms in Word Sense Disambiguation

no code implementations • 7 Apr 2017 • Loïc Vial, Andon Tchechmedjiev, Didier Schwab

We find that CSA, GA and SA all eventually converge to similar results (0.98 F1 score), but CSA gets there faster (in fewer scorer calls) and reaches up to 0.95 F1 before SA in fewer scorer calls.

Word Sense Disambiguation

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

1 code implementation • SEMEVAL 2017 • Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017.

Semantic Similarity of Arabic Sentences with Word Embeddings

no code implementations • WS 2017 • El Moatez Billah Nagoudi, Didier Schwab

Semantic textual similarity is the basis of countless applications and plays an important role in diverse areas, such as information retrieval, plagiarism detection, information extraction and machine translation.

Descriptive Information Retrieval +10

Using Word Embedding for Cross-Language Plagiarism Detection

no code implementations • EACL 2017 • Jérémy Ferrero, Laurent Besacier, Didier Schwab, Frédéric Agnès

This paper proposes to use distributed representation of words (word embeddings) in cross-language textual similarity detection.

Machine Translation Sentence +1

Sense Embeddings in Knowledge-Based Word Sense Disambiguation

1 code implementation • WS 2017 • Loïc Vial, Benjamin Lecouteux, Didier Schwab

Machine Translation Word Embeddings +1

Extension lexicale de définitions grâce à des corpus annotés en sens (Lexical Expansion of definitions based on sense-annotated corpus)

no code implementations • JEPTALNRECITAL 2016 • Loïc Vial, Andon Tchechmedjiev, Didier Schwab

The semantic proximity of two definitions is evaluated by counting the number of words in common in the corresponding definitions in a dictionary.

Amélioration de la traduction automatique d'un corpus annoté (Improvement of the automatic translation of an annotated corpus)

no code implementations • JEPTALNRECITAL 2016 • Marwa Hadj Salah, Hervé Blanchon, Mounir Zrigui, Didier Schwab

In this article, we present a method for improving the machine translation of an annotated corpus and transferring its annotations from English to a target language.

A Multilingual, Multi-style and Multi-granularity Dataset for Cross-language Textual Similarity Detection

1 code implementation • LREC 2016 • Jérémy Ferrero, Frédéric Agnès, Laurent Besacier, Didier Schwab

In this paper we describe our effort to create a dataset for the evaluation of cross-language textual similarity detection.

Création rapide et efficace d'un système de désambiguïsation lexicale pour une langue peu dotée

no code implementations • JEPTALNRECITAL 2015 • Mohammad Nasiruddin, Andon Tchechmedjiev, Hervé Blanchon, Didier Schwab

We present a method for quickly creating a word sense disambiguation (WSD) system for an under-resourced language L, provided that a statistical machine translation (SMT) system is available from a language rich in sense-annotated corpora (here, English) into L. It is indeed easier to obtain the resources needed to build an SMT system than the dedicated resources needed to build a WSD system for language L. Our method consists of automatically translating a sense-annotated corpus into language L, then building the disambiguation system for L using standard supervised methods.