1 code implementation • EMNLP 2021 • Lila Gravellier, Julie Hunter, Philippe Muller, Thomas Pellegrini, Isabelle Ferrané
Discourse segmentation, the first step of discourse analysis, has been shown to improve results for text summarization, translation and other NLP tasks.
2 code implementations • 25 Sep 2023 • Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
no code implementations • 29 Aug 2023 • Etienne Labbé, Thomas Pellegrini, Julien Pinquier
For ATR, we propose using the standard Cross-Entropy loss values obtained for any audio/caption pair.
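The idea of reusing captioning losses as retrieval scores can be sketched as follows — a minimal illustration assuming a matrix of cross-entropy losses has already been computed for every audio/caption pair (the `retrieval_scores` helper and the loss values are hypothetical, not the paper's code):

```python
import numpy as np

def retrieval_scores(loss_matrix):
    """Rank candidate captions by cross-entropy loss: a lower loss for
    an (audio, caption) pair means a better match, so sorting losses in
    ascending order yields a retrieval ranking."""
    # loss_matrix[i, j]: captioning CE loss of caption j given audio i
    return np.argsort(loss_matrix, axis=1)

# Illustrative losses for 2 audio clips and 3 candidate captions
losses = np.array([[0.8, 2.1, 3.0],
                   [2.5, 0.6, 1.9]])
ranking = retrieval_scores(losses)
```

Sorting by loss rather than by a separately trained similarity score is what lets a single captioning model double as a retrieval model.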
1 code implementation • 1 Jun 2023 • Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier
Dilated Convolution with Learnable Spacings (DCLS) is a recently proposed variation of the dilated convolution in which the spacings between the non-zero elements in the kernel, or equivalently their positions, are learnable.
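The key construction — making discrete kernel positions differentiable — can be sketched in one dimension. This is a simplified illustration of the interpolation idea, not the authors' implementation; the function name and arguments are hypothetical:

```python
import numpy as np

def dcls_kernel_1d(weights, positions, kernel_size):
    """Build a dense 1-D kernel from kernel elements with continuous,
    learnable positions. Each weight is spread over the two nearest
    integer taps by linear interpolation, so gradients can flow back
    to the position parameters (a simplified DCLS-style sketch)."""
    kernel = np.zeros(kernel_size)
    for w, p in zip(weights, positions):
        lo = int(np.floor(p))       # left integer tap
        frac = p - lo               # distance to the left tap
        kernel[lo] += w * (1.0 - frac)
        if lo + 1 < kernel_size:
            kernel[lo + 1] += w * frac
    return kernel

# Two elements at fractional positions inside a size-4 kernel
k = dcls_kernel_1d(np.array([1.0, -1.0]), np.array([0.25, 2.0]), 4)
```

The dense kernel can then be passed to an ordinary convolution, which is what allows the spacings to be trained jointly with the weights.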
1 code implementation • 14 Nov 2022 • Etienne Labbé, Thomas Pellegrini, Julien Pinquier
For this reason, several complementary metrics, such as BLEU, CIDEr, SPICE and SPIDEr, are used to compare a single automatic caption to one or several reference captions produced by human annotators.
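Of these metrics, SPIDEr is by definition the average of CIDEr and SPICE, which can be written directly (the function name is illustrative):

```python
def spider(cider, spice):
    """SPIDEr is defined as the mean of the CIDEr and SPICE scores,
    combining CIDEr's n-gram consensus with SPICE's semantic matching."""
    return 0.5 * (cider + spice)

score = spider(1.0, 0.4)
```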
no code implementations • 9 Jun 2022 • Lionel Pibre, Francisco Madrigal, Cyrille Equoy, Frédéric Lerasle, Thomas Pellegrini, Julien Pinquier, Isabelle Ferrané
In this paper, we propose two different types of fusion for the detection of the active speaker, combining two visual modalities and an audio modality through neural networks.
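One of the two standard fusion families, score-level (late) fusion, can be sketched as a weighted average of per-modality predictions. This is a generic illustration, not the paper's architecture; the modality names and uniform weights are assumptions:

```python
import numpy as np

def late_fusion(p_face, p_lips, p_audio, weights=(1/3, 1/3, 1/3)):
    """Score-level fusion: combine the active-speaker probabilities
    produced independently by two visual streams and one audio stream
    into a single weighted-average score (weights are illustrative)."""
    probs = np.array([p_face, p_lips, p_audio])
    return float(np.dot(weights, probs))

fused = late_fusion(0.9, 0.6, 0.3)
```

The alternative, feature-level (early) fusion, would instead concatenate the modality embeddings before a shared classifier.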
2 code implementations • 7 Dec 2021 • Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier
We call this method "Dilated Convolution with Learnable Spacings" (DCLS) and generalize it to the n-dimensional convolution case.
no code implementations • 4 Mar 2021 • Lucile Gelin, Morgane Daniel, Julien Pinquier, Thomas Pellegrini
Through transfer learning, a Transformer model complemented with a Connectionist Temporal Classification (CTC) objective function reaches a phone error rate of 28.1%, outperforming a state-of-the-art DNN-HMM model by 6.6% relative, as well as other end-to-end architectures by more than 8.5% relative.
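A CTC-trained model emits a label distribution per frame; phone sequences are recovered by collapsing repeated labels and removing blanks. A minimal greedy decoder sketch (standard CTC practice, not the paper's exact decoder; the phone inventory below is illustrative):

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy CTC decoding: take the frame-wise argmax, collapse
    consecutive repeats, then drop blank symbols."""
    best = np.argmax(log_probs, axis=1)
    out, prev = [], None
    for s in best:
        if s != prev and s != blank:
            out.append(int(s))
        prev = s
    return out

# Frame-wise log-probabilities over {blank, phone 1, phone 2} (illustrative)
log_probs = np.log(np.array([[0.1, 0.8, 0.1],
                             [0.1, 0.8, 0.1],
                             [0.9, 0.05, 0.05],
                             [0.2, 0.7, 0.1],
                             [0.1, 0.1, 0.8]]))
decoded = ctc_greedy_decode(log_probs)
```

The blank between the two occurrences of phone 1 is what lets CTC output the same phone twice in a row.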
1 code implementation • 1 Mar 2021 • Thomas Pellegrini, Timothée Masquelier
Multi-label audio tagging consists of assigning sets of tags to audio recordings.
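The standard formulation of multi-label tagging uses an independent sigmoid per tag, thresholded to produce the tag set. A minimal sketch, assuming model logits are already available (tag names and the 0.5 threshold are illustrative):

```python
import numpy as np

def tag_predictions(logits, tag_names, threshold=0.5):
    """Multi-label tagging: apply an independent sigmoid to each tag's
    logit and keep every tag whose probability clears the threshold,
    so any number of tags can be active at once."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [t for t, p in zip(tag_names, probs) if p >= threshold]

tags = tag_predictions([2.0, -1.5, 0.3], ["speech", "music", "dog_bark"])
```

Unlike single-label classification with a softmax, the sigmoids do not compete, which is what makes assigning *sets* of tags possible.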
1 code implementation • 16 Feb 2021 • Léo Cances, Etienne Labbé, Thomas Pellegrini
In all but one case, MM, RMM, and FM significantly outperformed MT and DCT, with MM and RMM being the best methods in most experiments.
1 code implementation • 13 Nov 2020 • Thomas Pellegrini, Romain Zimmer, Timothée Masquelier
Deep Neural Networks (DNNs) are the current state-of-the-art models in many speech related tasks.
no code implementations • JEPTALNRECITAL 2020 • Cedric Gendrot, Emmanuel Ferragne, Thomas Pellegrini
The results show that oral vowels affect the classification rate more than any other class, followed by oral stops.
no code implementations • JEPTALNRECITAL 2020 • Lucile Gelin, Morgane Daniel, Thomas Pellegrini, Julien Pinquier
All other conditions being equal, current speech recognition performance for children falls below that of systems for adults.
2 code implementations • 22 Nov 2019 • Romain Zimmer, Thomas Pellegrini, Srisht Fateh Singh, Timothée Masquelier
Indeed, the most commonly used spiking neuron model, the leaky integrate-and-fire neuron, obeys a differential equation which can be approximated using discrete time steps, leading to a recurrent relation for the potential.
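The recurrent relation mentioned above can be sketched directly: discretizing the leaky integrate-and-fire equation gives a membrane potential that decays by a constant factor each step, accumulates input, and resets on a spike. A minimal simulation (the time constant, threshold, and input values are illustrative):

```python
import numpy as np

def lif_simulate(inputs, tau=10.0, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron. The potential
    follows the recurrence u[t] = alpha * u[t-1] + I[t], where alpha
    is the leak factor from discretizing the differential equation;
    the neuron spikes and resets when u crosses the threshold."""
    alpha = np.exp(-1.0 / tau)   # per-step leak factor
    u, spikes = 0.0, []
    for i in inputs:
        u = alpha * u + i
        if u >= threshold:
            spikes.append(1)
            u = 0.0              # reset after the spike
        else:
            spikes.append(0)
    return spikes

spikes = lif_simulate([0.6, 0.6, 0.0, 1.2])
```

Writing the dynamics as this recurrence is what allows the spiking network to be trained like a recurrent network with backpropagation through time.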
1 code implementation • 17 Jun 2019 • Léo Cances, Patrice Guyot, Thomas Pellegrini
We compared post-processing algorithms on the temporal prediction curves of two models: one based on the challenge's baseline and a Multiple Instance Learning (MIL) model.
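A typical post-processing chain for such temporal prediction curves is smoothing followed by thresholding and segment extraction. A minimal sketch (the window length and threshold are illustrative, not the compared algorithms' settings):

```python
import numpy as np

def curve_to_events(scores, threshold=0.5, win=3):
    """Turn a frame-level prediction curve into (onset, offset) events:
    smooth with a moving average, binarize with a threshold, then
    extract each contiguous run of active frames as one event."""
    kernel = np.ones(win) / win
    smooth = np.convolve(scores, kernel, mode="same")
    active = smooth >= threshold
    events, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t                    # event onset
        elif not a and start is not None:
            events.append((start, t))    # event offset (exclusive)
            start = None
    if start is not None:
        events.append((start, len(active)))
    return events

events = curve_to_events([0, 0, 0.9, 0.9, 0.9, 0, 0, 0])
```

Choices like the smoothing window and threshold are exactly the kind of post-processing hyperparameters such comparisons evaluate.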
1 code implementation • 10 Jan 2019 • Thomas Pellegrini, Léo Cances
In this work, we address Sound Event Detection in the case where a weakly annotated dataset is available for training.
no code implementations • JEPTALNRECITAL 2016 • Thomas Pellegrini, Lionel Fontan, Halima Sahraoui
A relative performance gain of 13.4% was obtained with the CNN, with an overall accuracy of 72.6%, on an evaluation corpus recorded by 23 Japanese-speaking speakers.
no code implementations • JEPTALNRECITAL 2016 • Céline Manenti, Thomas Pellegrini, Julien Pinquier
In this article, we describe an experimental study of speech segmentation into sub-lexical acoustic units (phones) using neural networks.
no code implementations • LREC 2014 • Thomas Pellegrini, Vahid Hedayati, Angela Costa
In order to collect spontaneous speech in a situation of interaction with a machine, this interface was designed as a Wizard-of-Oz (WOZ) platform.