1 code implementation • EMNLP 2021 • Lila Gravellier, Julie Hunter, Philippe Muller, Thomas Pellegrini, Isabelle Ferrané
Discourse segmentation, the first step of discourse analysis, has been shown to improve results for text summarization, translation and other NLP tasks.
2 code implementations • 25 Sep 2023 • Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
no code implementations • 29 Aug 2023 • Etienne Labbé, Thomas Pellegrini, Julien Pinquier
For ATR, we propose using the standard Cross-Entropy loss values obtained for any audio/caption pair.
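The idea of reusing captioning losses as retrieval scores can be sketched as follows — a minimal illustration assuming a matrix of cross-entropy losses has already been computed for every audio/caption pair (the `retrieval_scores` helper and the loss values are hypothetical, not the paper's code):

```python
import numpy as np

def retrieval_scores(loss_matrix):
    """Rank candidate captions by cross-entropy loss: a lower loss for
    an (audio, caption) pair means a better match, so sorting losses in
    ascending order yields a retrieval ranking."""
    # loss_matrix[i, j]: captioning CE loss of caption j given audio i
    return np.argsort(loss_matrix, axis=1)

# Illustrative losses for 2 audio clips and 3 candidate captions
losses = np.array([[0.8, 2.1, 3.0],
                   [2.5, 0.6, 1.9]])
ranking = retrieval_scores(losses)
```

Sorting by loss rather than by a separately trained similarity score is what lets a single captioning model double as a retrieval model.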
1 code implementation • 1 Jun 2023 • Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier
Dilated Convolution with Learnable Spacings (DCLS) is a recently proposed variation of the dilated convolution in which the spacings between the non-zero elements in the kernel, or equivalently their positions, are learnable.
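The key construction — making discrete kernel positions differentiable — can be sketched in one dimension. This is a simplified illustration of the interpolation idea, not the authors' implementation; the function name and arguments are hypothetical:

```python
import numpy as np

def dcls_kernel_1d(weights, positions, kernel_size):
    """Build a dense 1-D kernel from kernel elements with continuous,
    learnable positions. Each weight is spread over the two nearest
    integer taps by linear interpolation, so gradients can flow back
    to the position parameters (a simplified DCLS-style sketch)."""
    kernel = np.zeros(kernel_size)
    for w, p in zip(weights, positions):
        lo = int(np.floor(p))       # left integer tap
        frac = p - lo               # distance to the left tap
        kernel[lo] += w * (1.0 - frac)
        if lo + 1 < kernel_size:
            kernel[lo + 1] += w * frac
    return kernel

# Two elements at fractional positions inside a size-4 kernel
k = dcls_kernel_1d(np.array([1.0, -1.0]), np.array([0.25, 2.0]), 4)
```

The dense kernel can then be passed to an ordinary convolution, which is what allows the spacings to be trained jointly with the weights.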
1 code implementation • 14 Nov 2022 • Etienne Labbé, Thomas Pellegrini, Julien Pinquier
For this reason, several complementary metrics, such as BLEU, CIDEr, SPICE and SPIDEr, are used to compare a single automatic caption to one or several reference captions produced by human annotators.
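Of these metrics, SPIDEr is by definition the average of CIDEr and SPICE, which can be written directly (the function name is illustrative):

```python
def spider(cider, spice):
    """SPIDEr is defined as the mean of the CIDEr and SPICE scores,
    combining CIDEr's n-gram consensus with SPICE's semantic matching."""
    return 0.5 * (cider + spice)

score = spider(1.0, 0.4)
```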
no code implementations • 9 Jun 2022 • Lionel Pibre, Francisco Madrigal, Cyrille Equoy, Frédéric Lerasle, Thomas Pellegrini, Julien Pinquier, Isabelle Ferrané
In this paper, we propose two different types of fusion for the detection of the active speaker, combining two visual modalities and an audio modality through neural networks.
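One of the two standard fusion families, score-level (late) fusion, can be sketched as a weighted average of per-modality predictions. This is a generic illustration, not the paper's architecture; the modality names and uniform weights are assumptions:

```python
import numpy as np

def late_fusion(p_face, p_lips, p_audio, weights=(1/3, 1/3, 1/3)):
    """Score-level fusion: combine the active-speaker probabilities
    produced independently by two visual streams and one audio stream
    into a single weighted-average score (weights are illustrative)."""
    probs = np.array([p_face, p_lips, p_audio])
    return float(np.dot(weights, probs))

fused = late_fusion(0.9, 0.6, 0.3)
```

The alternative, feature-level (early) fusion, would instead concatenate the modality embeddings before a shared classifier.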
2 code implementations • 7 Dec 2021 • Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier
We call this method "Dilated Convolution with Learnable Spacings" (DCLS) and generalize it to the n-dimensional convolution case.
no code implementations • 4 Mar 2021 • Lucile Gelin, Morgane Daniel, Julien Pinquier, Thomas Pellegrini
Through transfer learning, a Transformer model complemented with a Connectionist Temporal Classification (CTC) objective function reaches a phone error rate of 28.1%, outperforming a state-of-the-art DNN-HMM model by 6.6% relative, as well as other end-to-end architectures by more than 8.5% relative.
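A CTC-trained model emits a label distribution per frame; phone sequences are recovered by collapsing repeated labels and removing blanks. A minimal greedy decoder sketch (standard CTC practice, not the paper's exact decoder; the phone inventory below is illustrative):

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy CTC decoding: take the frame-wise argmax, collapse
    consecutive repeats, then drop blank symbols."""
    best = np.argmax(log_probs, axis=1)
    out, prev = [], None
    for s in best:
        if s != prev and s != blank:
            out.append(int(s))
        prev = s
    return out

# Frame-wise log-probabilities over {blank, phone 1, phone 2} (illustrative)
log_probs = np.log(np.array([[0.1, 0.8, 0.1],
                             [0.1, 0.8, 0.1],
                             [0.9, 0.05, 0.05],
                             [0.2, 0.7, 0.1],
                             [0.1, 0.1, 0.8]]))
decoded = ctc_greedy_decode(log_probs)
```

The blank between the two occurrences of phone 1 is what lets CTC output the same phone twice in a row.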
1 code implementation • 1 Mar 2021 • Thomas Pellegrini, Timothée Masquelier
Multi-label audio tagging consists of assigning sets of tags to audio recordings.
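The standard formulation of multi-label tagging uses an independent sigmoid per tag, thresholded to produce the tag set. A minimal sketch, assuming model logits are already available (tag names and the 0.5 threshold are illustrative):

```python
import numpy as np

def tag_predictions(logits, tag_names, threshold=0.5):
    """Multi-label tagging: apply an independent sigmoid to each tag's
    logit and keep every tag whose probability clears the threshold,
    so any number of tags can be active at once."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [t for t, p in zip(tag_names, probs) if p >= threshold]

tags = tag_predictions([2.0, -1.5, 0.3], ["speech", "music", "dog_bark"])
```

Unlike single-label classification with a softmax, the sigmoids do not compete, which is what makes assigning *sets* of tags possible.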
1 code implementation • 16 Feb 2021 • Léo Cances, Etienne Labbé, Thomas Pellegrini
In all but one case, MM, RMM, and FM significantly outperformed MT and DCT, with MM and RMM being the best methods in most experiments.
1 code implementation • 13 Nov 2020 • Thomas Pellegrini, Romain Zimmer, Timothée Masquelier
Deep Neural Networks (DNNs) are the current state-of-the-art models in many speech related tasks.
no code implementations • JEPTALNRECITAL 2020 • Cedric Gendrot, Emmanuel Ferragne, Thomas Pellegrini
The results show that oral vowels affect the classification rate more than any other class, followed by oral stops.
no code implementations • JEPTALNRECITAL 2020 • Lucile Gelin, Morgane Daniel, Thomas Pellegrini, Julien Pinquier
All other conditions being equal, current speech recognition performance for children falls below that of systems for adults.
2 code implementations • 22 Nov 2019 • Romain Zimmer, Thomas Pellegrini, Srisht Fateh Singh, Timothée Masquelier
Indeed, the most commonly used spiking neuron model, the leaky integrate-and-fire neuron, obeys a differential equation which can be approximated using discrete time steps, leading to a recurrent relation for the potential.
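The recurrent relation mentioned above can be sketched directly: discretizing the leaky integrate-and-fire equation gives a membrane potential that decays by a constant factor each step, accumulates input, and resets on a spike. A minimal simulation (the time constant, threshold, and input values are illustrative):

```python
import numpy as np

def lif_simulate(inputs, tau=10.0, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron. The potential
    follows the recurrence u[t] = alpha * u[t-1] + I[t], where alpha
    is the leak factor from discretizing the differential equation;
    the neuron spikes and resets when u crosses the threshold."""
    alpha = np.exp(-1.0 / tau)   # per-step leak factor
    u, spikes = 0.0, []
    for i in inputs:
        u = alpha * u + i
        if u >= threshold:
            spikes.append(1)
            u = 0.0              # reset after the spike
        else:
            spikes.append(0)
    return spikes

spikes = lif_simulate([0.6, 0.6, 0.0, 1.2])
```

Writing the dynamics as this recurrence is what allows the spiking network to be trained like a recurrent network with backpropagation through time.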
1 code implementation • 17 Jun 2019 • Léo Cances, Patrice Guyot, Thomas Pellegrini
We compared post-processing algorithms on the temporal prediction curves of two models: one based on the challenge's baseline and a Multiple Instance Learning (MIL) model.
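A typical post-processing chain for such temporal prediction curves is smoothing followed by thresholding and segment extraction. A minimal sketch (the window length and threshold are illustrative, not the compared algorithms' settings):

```python
import numpy as np

def curve_to_events(scores, threshold=0.5, win=3):
    """Turn a frame-level prediction curve into (onset, offset) events:
    smooth with a moving average, binarize with a threshold, then
    extract each contiguous run of active frames as one event."""
    kernel = np.ones(win) / win
    smooth = np.convolve(scores, kernel, mode="same")
    active = smooth >= threshold
    events, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t                    # event onset
        elif not a and start is not None:
            events.append((start, t))    # event offset (exclusive)
            start = None
    if start is not None:
        events.append((start, len(active)))
    return events

events = curve_to_events([0, 0, 0.9, 0.9, 0.9, 0, 0, 0])
```

Choices like the smoothing window and threshold are exactly the kind of post-processing hyperparameters such comparisons evaluate.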
1 code implementation • 10 Jan 2019 • Thomas Pellegrini, Léo Cances
In this work, we address Sound Event Detection in the case where a weakly annotated dataset is available for training.
no code implementations • JEPTALNRECITAL 2016 • Thomas Pellegrini, Lionel Fontan, Halima Sahraoui
A relative performance gain of 13.4% was obtained with the CNN, with an overall accuracy of 72.6%, on an evaluation corpus recorded by 23 Japanese-speaking speakers.
no code implementations • JEPTALNRECITAL 2016 • Céline Manenti, Thomas Pellegrini, Julien Pinquier
In this article, we describe an experimental study of speech segmentation into sub-lexical acoustic units (phones) using neural networks.
no code implementations • LREC 2014 • Thomas Pellegrini, Vahid Hedayati, Angela Costa
In order to collect spontaneous speech in a situation of interaction with a machine, this interface was designed as a Wizard-of-Oz (WOZ) platform.