1 code implementation • 17 Apr 2023 • R Gnana Praveen, Eric Granger, Patrick Cardinal
In video-based emotion recognition (ER), it is important to effectively leverage the complementary relationship among audio (A) and visual (V) modalities, while retaining the intra-modal characteristics of individual modalities.
1 code implementation • 19 Sep 2022 • R Gnana Praveen, Eric Granger, Patrick Cardinal
In this paper, we focus on dimensional ER based on the fusion of facial and vocal modalities extracted from videos, where complementary audio-visual (A-V) relationships are explored to predict an individual's emotional states in valence-arousal space.
no code implementations • 14 Jul 2022 • Mohammad Esmaeilpour, Nourhene Chaalia, Patrick Cardinal
This paper introduces a new synthesis-based defense algorithm for counteracting a variety of adversarial attacks developed to challenge the performance of cutting-edge speech-to-text transcription systems.
no code implementations • 24 May 2022 • Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, Francois-Xavier Devailly, Wissem Maazoun, Patrick Cardinal
We refer to this novel definition as the compound conditional vector and employ it for training the generator network.
no code implementations • 14 Apr 2022 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network, namely ResNet-18.
1 code implementation • 28 Mar 2022 • Gnana Praveen Rajasekar, Wheidima Carneiro de Melo, Nasib Ullah, Haseeb Aslam, Osama Zeeshan, Théo Denorme, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Patrick Cardinal, Eric Granger
Specifically, we propose a joint cross-attention model that relies on the complementary relationships to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal.
no code implementations • 12 Nov 2021 • Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, Francois-Xavier Devailly, Wissem Maazoun, Patrick Cardinal
This paper introduces a bi-discriminator GAN for synthesizing tabular datasets containing continuous, binary, and discrete columns.
1 code implementation • 9 Nov 2021 • Gnana Praveen R, Eric Granger, Patrick Cardinal
Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches.
no code implementations • 15 Mar 2021 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
no code implementations • 15 Mar 2021 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
This paper introduces a defense approach against end-to-end adversarial attacks developed for cutting-edge speech-to-text systems.
no code implementations • 25 Jan 2021 • Gnana Praveen R, Eric Granger, Patrick Cardinal
In this paper, we provide a comprehensive review of weakly supervised learning (WSL) approaches for facial behavior analysis with both categorical and dimensional labels, along with the associated challenges and potential research directions.
no code implementations • 28 Oct 2020 • Gnana Praveen R, Eric Granger, Patrick Cardinal
The WSDA-OR model enforces ordinal relationships among the intensity levels assigned to the target sequences, and associates multiple relevant frames to sequence-level labels (instead of a single frame).
no code implementations • 22 Oct 2020 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
In this paper we propose a novel defense approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo.
no code implementations • 12 Oct 2020 • Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich
In this paper we propose a conditioning trick, called difference departure from normality, applied to the generator network in response to instability issues during GAN training.
no code implementations • 26 Aug 2020 • Raymel Alfonso Sallo, Mohammad Esmaeilpour, Patrick Cardinal
In this paper, we investigate the potential effect of adversarial training on the robustness of six advanced deep neural networks against a variety of targeted and non-targeted adversarial attacks.
no code implementations • 13 Aug 2020 • Gnana Praveen R, Eric Granger, Patrick Cardinal
Estimation of pain intensity from facial expressions captured in videos has an immense potential for health care applications.
no code implementations • 12 Aug 2020 • Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich
In this paper we address the instability issue of generative adversarial networks (GANs) by proposing a new similarity metric in the unitary space of Schur decomposition for 2D representations of audio and speech signals.
no code implementations • 27 Jul 2020 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
In this paper, we investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
no code implementations • 26 Oct 2019 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
Adversarial attacks have always been a serious threat for any data-driven model.
no code implementations • 17 Oct 2019 • Gnana Praveen R, Eric Granger, Patrick Cardinal
Automatic pain assessment has an important potential diagnostic value for populations that are incapable of articulating their pain experiences.
no code implementations • 2 Oct 2019 • Masih Aminbeidokhti, Marco Pedersoli, Patrick Cardinal, Eric Granger
Video-based emotion recognition is a challenging task because it requires distinguishing the small deformations of the human face that represent emotions, while being invariant to stronger visual differences due to different identities.
1 code implementation • arXiv preprint 2019 • Sajjad Abdoli, Luiz G. Hafemann, Jerome Rony, Ismail Ben Ayed, Patrick Cardinal, Alessandro L. Koerich
We demonstrate the existence of universal adversarial perturbations, which can fool a family of audio classification architectures, for both targeted and untargeted attack scenarios.
no code implementations • 6 Jul 2019 • Juan D. S. Ortega, Mohammed Senoussaoui, Eric Granger, Marco Pedersoli, Patrick Cardinal, Alessandro L. Koerich
This paper presents a novel deep neural network (DNN) for multimodal fusion of audio, video and text modalities for emotion recognition.
no code implementations • 6 Jul 2019 • Mohammed Senoussaoui, Patrick Cardinal, Alessandro Lameiras Koerich
The conventional BoW model is based on a dictionary (codebook) built from elementary representations which are selected randomly or by using a clustering algorithm on a training dataset.
no code implementations • 25 Jun 2019 • Juan D. S. Ortega, Patrick Cardinal, Alessandro L. Koerich
In this paper we propose a fusion approach to continuous emotion recognition that combines visual and auditory modalities in their representation spaces to predict the arousal and valence levels.
no code implementations • 26 Apr 2019 • Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro Lameiras Koerich
Automatic measuring of speaker sincerity degree is a novel research problem in computational paralinguistics.
no code implementations • 24 Apr 2019 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
In this paper we first review some strong adversarial attacks that may affect both audio signals and their 2D representations, and evaluate the resiliency of the most common machine learning models, namely deep learning models and support vector machines (SVMs) trained on 2D audio representations such as short time Fourier transform (STFT), discrete wavelet transform (DWT) and cross recurrent plot (CRP), against several state-of-the-art adversarial attacks.
3 code implementations • 18 Apr 2019 • Sajjad Abdoli, Patrick Cardinal, Alessandro Lameiras Koerich
In this paper, we present an end-to-end approach for environmental sound classification based on a 1D Convolutional Neural Network (CNN) that learns a representation directly from the audio signal.
Ranked #6 on Environmental Sound Classification on UrbanSound8K (Accuracy metric, using extra training data)
no code implementations • 8 Apr 2019 • Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
In this paper we propose a novel environmental sound classification approach incorporating unsupervised feature learning from a codebook via the spherical $K$-Means++ algorithm and a new architecture for high-level data augmentation.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.