1 code implementation • 5 Feb 2024 • Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros
In Self-Supervised Learning (SSL), Audio-Visual Correspondence (AVC) is a popular task to learn deep audio and video features from large unlabeled datasets.
no code implementations • 10 Jun 2022 • Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen
The results show that the classification performance is highly sensitive to the semantic relation between test and training classes, and that textual and image embeddings can perform on par with semantic acoustic embeddings when the seen and unseen classes are semantically similar.
no code implementations • 8 Jun 2022 • Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen
The provided baseline system is a convolutional neural network that employs post-training quantization of parameters, resulting in 46.5K parameters and 29.23 million multiply-and-accumulate operations (MMACs).
1 code implementation • 12 Jul 2021 • Annamaria Mesaros, Toni Heittola, Tuomas Virtanen, Mark D. Plumbley
The goal of automatic sound event detection (SED) methods is to recognize what is happening in an audio signal and when it is happening.
no code implementations • 28 May 2021 • Shanshan Wang, Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
More importantly, multi-modal methods using both audio and video are employed by all the top 5 teams.
1 code implementation • 28 May 2021 • Irene Martín-Morató, Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
The most used techniques among the submissions were residual networks and weight quantization, with the top systems reaching over 70% accuracy and a log loss under 0.8.
4 code implementations • 6 Sep 2020 • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen
A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training of learning-based approaches, and for evaluation of the submissions in an unlabeled subset.
no code implementations • 29 May 2020 • Toni Heittola, Annamaria Mesaros, Tuomas Virtanen
This paper presents the details of Task 1: Acoustic Scene Classification in the DCASE 2020 Challenge.
no code implementations • 12 Feb 2020 • Shuyang Zhao, Toni Heittola, Tuomas Virtanen
Training with recordings as context outperforms training with only annotated segments.
no code implementations • 2 Aug 2018 • Shayan Gharib, Honain Derrar, Daisuke Niizumi, Tuukka Senttula, Janne Tommola, Toni Heittola, Tuomas Virtanen, Heikki Huttunen
In this paper, we study the problem of acoustic scene classification, i.e., the categorization of audio sequences into mutually exclusive classes based on their spectral content.
2 code implementations • 25 Jul 2018 • Annamaria Mesaros, Toni Heittola, Tuomas Virtanen
This paper introduces the acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task.
no code implementations • 7 Jun 2017 • Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, Tuomas Virtanen
In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task.
1 code implementation • 21 Feb 2017 • Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen
Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure.