no code implementations • 9 Aug 2023 • Diep Luong, Minh Tran, Shayan Gharib, Konstantinos Drossos, Tuomas Virtanen
Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment.
1 code implementation • 29 Apr 2023 • Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen
In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings.
1 code implementation • 4 Aug 2022 • Yanxiong Li, Wenchang Cao, Konstantinos Drossos, Tuomas Virtanen
Automatic estimation of domestic activities from audio can be used to solve many problems, such as reducing the labor cost of nursing elderly people.
no code implementations • 20 Apr 2022 • Samuel Lipping, Parthasaarathy Sudarsanam, Konstantinos Drossos, Tuomas Virtanen
Audio question answering (AQA) is a multimodal translation task in which a system analyzes an audio signal and a natural language question to generate a desirable natural language answer.
no code implementations • 14 Oct 2021 • Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra
Since automated audio captioning (AAC) has attracted attention only recently, very few works on it study the performance of existing pre-trained audio and natural language processing resources.
1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen
We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.
no code implementations • 4 Oct 2021 • Andreas Triantafyllopoulos, Manuel Milling, Konstantinos Drossos, Björn W. Schuller
Although these factors play a well-understood role in the performance of acoustic scene classification (ASC) models, most works report single evaluation metrics that take into account all the different strata of a particular dataset.
1 code implementation • 16 Jul 2021 • Jan Berg, Konstantinos Drossos
In our scenario, a pre-optimized AAC method is used for some unseen general audio signals and can update its parameters in order to adapt to the new information, given a new reference caption.
no code implementations • 14 Jun 2021 • Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen
Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes.
1 code implementation • 1 Apr 2021 • Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov
Modeling various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information.
1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.
1 code implementation • 21 Oct 2020 • An Tran, Konstantinos Drossos, Tuomas Virtanen
Automated audio captioning (AAC) is a novel task, where a method takes as an input an audio sample and outputs a textual description (i.e., a caption) of its contents.
no code implementations • 10 Jul 2020 • Konstantinos Drossos, Stylianos I. Mimilakis, Tuomas Virtanen
Sound event detection (SED) is the task of identifying sound events along with their onset and offset times.
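To make "onset and offset times" concrete, here is a minimal sketch that converts a frame-wise binary activity sequence for one sound event class into (onset, offset) pairs in seconds. The function name `activity_to_events` and the fixed hop size `hop_seconds` are hypothetical. The snippet does not say how a given SED method produces or thresholds frame-wise activities.

```python
def activity_to_events(activity, hop_seconds):
    """Convert a frame-wise binary activity sequence (one event class) into
    a list of (onset, offset) pairs in seconds.

    Assumes (hypothetically) that frame i covers [i * hop, (i + 1) * hop).
    """
    events = []
    onset_frame = None
    for i, active in enumerate(activity):
        if active and onset_frame is None:
            # Inactive -> active transition: an event starts at this frame.
            onset_frame = i
        elif not active and onset_frame is not None:
            # Active -> inactive transition: the event ended at the previous frame.
            events.append((onset_frame * hop_seconds, i * hop_seconds))
            onset_frame = None
    if onset_frame is not None:
        # Event still active at the last frame: close it at the end of the sequence.
        events.append((onset_frame * hop_seconds, len(activity) * hop_seconds))
    return events


print(activity_to_events([0, 1, 1, 0, 0, 1], hop_seconds=0.5))  # [(0.5, 1.5), (2.5, 3.0)]
```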
1 code implementation • 9 Jul 2020 • Emre Çakır, Konstantinos Drossos, Tuomas Virtanen
Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio.
1 code implementation • 6 Jul 2020 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, Gerald Schuller
In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation.
no code implementations • 6 Jul 2020 • Pyry Pyykkönen, Stylianos I. Mimilakis, Konstantinos Drossos, Tuomas Virtanen
We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with depthwise separable (DWS) convolutions (DWS-CNNs).
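Counting weights gives a rough sense of why depthwise separable (DWS) convolutions are attractive. The sketch below compares a standard 2D convolution with its DWS factorization. The kernel size and channel counts are hypothetical and not taken from the paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k 2D convolution (biases ignored)."""
    return k * k * c_in * c_out


def dws_conv_params(k, c_in, c_out):
    """Weights of a depthwise separable convolution (biases ignored):
    one k x k depthwise filter per input channel, followed by a 1 x 1
    pointwise convolution that mixes channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


std = standard_conv_params(3, 64, 128)
dws = dws_conv_params(3, 64, 128)
print(std, dws, round(std / dws, 1))  # 73728 8768 8.4
```

With these illustrative sizes, the factorized layer needs roughly 8.4 times fewer weights.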
1 code implementation • 6 Jul 2020 • Khoa Nguyen, Konstantinos Drossos, Tuomas Virtanen
In this work we present an approach that focuses on explicitly taking advantage of this difference of lengths between sequences, by applying a temporal sub-sampling to the audio input sequence.
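Here is a minimal sketch of temporal sub-sampling. It assumes the sub-sampling averages non-overlapping groups of consecutive feature frames. The paper's exact operator and its placement in the network may differ (for example, it could keep every n-th frame instead), and the name `temporal_subsample` is hypothetical. What the sketch shows is that the output sequence is about `factor` times shorter, which brings its length closer to that of the caption.

```python
def temporal_subsample(frames, factor):
    """Shorten a sequence of feature frames (lists of floats) by averaging
    non-overlapping groups of `factor` consecutive frames. A trailing group
    shorter than `factor` is averaged over the frames it actually contains."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        dim = len(group[0])
        out.append([sum(frame[d] for frame in group) / len(group) for d in range(dim)])
    return out


frames = [[0.0], [2.0], [4.0], [6.0], [8.0]]
print(temporal_subsample(frames, 2))  # [[1.0], [5.0], [8.0]]
```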
2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.
1 code implementation • 3 Mar 2020 • Stylianos I. Mimilakis, Konstantinos Drossos, Gerald Schuller
In this work, we present a method for learning interpretable music signal representations directly from waveform signals.
1 code implementation • 2 Feb 2020 • Konstantinos Drossos, Stylianos I. Mimilakis, Shayan Gharib, Yanxiong Li, Tuomas Virtanen
The number of channels of the CNNs and the size of the weight matrices of the RNNs directly affect the total number of parameters of the SED method, which amounts to a couple of million.
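The standard parameter-count formulas for a 2D convolutional layer and a GRU layer make this dependence concrete. This is a sketch under two assumptions: the layer sizes are hypothetical, and the GRU count follows the convention of separate input and recurrent bias vectors used by, for example, PyTorch's `nn.GRU`.

```python
def conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Parameters of a 2D convolution: one k_h x k_w x c_in kernel per output channel."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def gru_params(input_size, hidden_size):
    """Parameters of one unidirectional GRU layer: three blocks (reset gate,
    update gate, candidate state), each with an input weight matrix, a
    recurrent weight matrix, and two bias vectors."""
    h = hidden_size
    return 3 * (h * input_size + h * h + 2 * h)


print(conv2d_params(5, 5, 256, 256))  # 1638656
print(gru_params(256, 256))           # 394752
```

Because the convolutional term grows with the product of input and output channels, halving both cuts that layer's weights by about a factor of four.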
no code implementations • 1 Nov 2019 • Niccoló Nicodemo, Gaurav Naithani, Konstantinos Drossos, Tuomas Virtanen, Roberto Saletti
The application of low-bit quantization allows a 50% reduction of the DNN memory footprint, while the STOI performance drops by only 2.7%.
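Memory footprint scales linearly with the number of bits per parameter. Going from 32-bit to 16-bit storage, for instance, halves it. The sketch below pairs that calculation with a generic uniform symmetric quantizer. The bit widths are hypothetical because the snippet does not state which ones were used. The quantizer is a common textbook scheme, not necessarily the one used in the paper.

```python
def footprint_bytes(num_params, bits_per_param):
    """Memory needed to store num_params values at bits_per_param bits each."""
    return num_params * bits_per_param / 8


def quantize_symmetric(weights, bits):
    """Uniform symmetric quantization of floats to signed `bits`-bit integers.
    Returns the integer codes and the scale; each w is approximated by q * scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


full = footprint_bytes(1_000_000, 32)  # hypothetical 32-bit baseline
half = footprint_bytes(1_000_000, 16)  # hypothetical 16-bit storage
print(1 - half / full)  # 0.5

codes, scale = quantize_symmetric([-1.0, 0.0, 1.0], bits=8)
print(codes)  # [-127, 0, 127]
```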
7 code implementations • 21 Oct 2019 • Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen
Audio captioning is the novel task of general audio content description using free text.
1 code implementation • 22 Jul 2019 • Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen
In this paper we present a three-step framework for crowdsourcing an audio captioning dataset, based on concepts and practises followed in the creation of widely used image captioning and machine translation datasets.
1 code implementation • Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 • Konstantinos Drossos, Shayan Gharib, Paul Magron, Tuomas Virtanen
On the contrary, with our method there is a decrease of 4% in F1 score and an increase of 7% in error rate (ER) for the TUT-SED Synthetic 2016 dataset.
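The sketch below computes segment-based F1 score and error rate (ER) as they are commonly defined for SED evaluation. In each segment, substitutions are S = min(FN, FP), deletions are D = max(0, FN − FP), and insertions are I = max(0, FP − FN). ER is then (ΣS + ΣD + ΣI) / N, where N is the total number of active reference events. Higher F1 and lower ER are better. The segment contents are made up for illustration.

```python
def segment_f1_er(segments):
    """Segment-based F1 and error rate for sound event detection.

    `segments` is a list of (reference, estimate) pairs, each a set of the
    event classes active in that segment.
    """
    tp = fp = fn = n_ref = 0
    subs = dels = ins = 0
    for reference, estimate in segments:
        seg_tp = len(reference & estimate)
        seg_fp = len(estimate - reference)
        seg_fn = len(reference - estimate)
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        n_ref += len(reference)
        # A missed class and a false class in the same segment count as one substitution.
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
    # Degenerate cases (no events at all) return 0.0 by convention in this sketch.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn > 0 else 0.0
    er = (subs + dels + ins) / n_ref if n_ref > 0 else 0.0
    return f1, er


segments = [
    ({"speech", "dog"}, {"speech"}),  # one deletion
    ({"speech"}, {"speech", "car"}),  # one insertion
    ({"dog"}, {"car"}),               # one substitution
]
print(segment_f1_er(segments))  # (0.5, 0.75)
```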
1 code implementation • 24 Apr 2019 • Konstantinos Drossos, Paul Magron, Tuomas Virtanen
A challenging problem in the field of deep learning-based machine listening is performance degradation when using data from unseen conditions.
no code implementations • 12 Apr 2019 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller
We examine the mapping functions of neural networks based on the denoising autoencoder (DAE) model that are conditioned on the mixture magnitude spectra.
1 code implementation • 17 Aug 2018 • Shayan Gharib, Konstantinos Drossos, Emre Çakır, Dmitriy Serdyuk, Tuomas Virtanen
A general problem in the acoustic scene classification task is the mismatch in conditions between training and testing data, which significantly reduces the classification accuracy of the developed methods.
2 code implementations • 1 Feb 2018 • Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio
Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep learning-based methods.
no code implementations • 4 Nov 2017 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, João F. Santos, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio
Singing voice separation based on deep learning relies on time-frequency masking.
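Time-frequency masking multiplies the mixture's magnitude spectrogram element-wise by a mask with values in [0, 1]. The minimal sketch below builds a ratio mask from known source magnitudes and applies it. This is an oracle setting used only for illustration; a deep learning model would instead estimate the mask, or the masked output, from the mixture. Spectrograms are plain nested lists of magnitudes indexed [frame][frequency bin], and the function names are hypothetical.

```python
def ratio_mask(voice_mag, accomp_mag, eps=1e-8):
    """Soft mask in [0, 1]: share of each time-frequency bin attributed to the voice."""
    return [
        [v / (v + a + eps) for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(voice_mag, accomp_mag)
    ]


def apply_mask(mask, mixture_mag):
    """Element-wise product of the mask and the mixture magnitude spectrogram."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


voice = [[1.0, 0.0], [3.0, 1.0]]
accomp = [[1.0, 2.0], [1.0, 0.0]]
mixture = [[2.0, 2.0], [4.0, 1.0]]  # assumes magnitudes simply add (a simplification)
estimate = apply_mask(ratio_mask(voice, accomp), mixture)
print([[round(x, 3) for x in row] for row in estimate])  # [[1.0, 0.0], [3.0, 1.0]]
```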
no code implementations • 30 Jun 2017 • Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen
The encoder is a multi-layered, bi-directional gated recurrent unit (GRU) and the decoder a multi-layered GRU with a classification layer connected to the last GRU of the decoder.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen
This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.
no code implementations • 7 Jun 2017 • Miroslav Malik, Sharath Adavanne, Konstantinos Drossos, Tuomas Virtanen, Dasa Ticha, Roman Jarina
This paper studies emotion recognition from musical tracks in the two-dimensional valence-arousal (V-A) emotional space.
no code implementations • 7 Mar 2017 • Emre Çakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen
Bird sounds possess distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.