TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Audio Classification	FSD50K	Temporal Knowledge Distillation for On-device Audio Classification	mAP	54.8	#7

Temporal Knowledge Distillation for On-device Audio Classification

27 Oct 2021 · Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang

Improving the performance of on-device audio classification models remains a challenge given the computational limits of the mobile environment. Many studies leverage knowledge distillation to boost predictive performance by transferring knowledge from large models to on-device models. However, most of them either lack a mechanism to distill the temporal information that is crucial to audio classification tasks, or require the large and on-device models to share a similar architecture. In this paper, we propose a new knowledge distillation method designed to incorporate the temporal knowledge embedded in the attention weights of large transformer-based models into on-device models. Our distillation method is applicable to various types of architectures, including non-attention-based architectures such as CNNs or RNNs, while retaining the original network architecture during inference. Through extensive experiments on both an audio event detection dataset and a noisy keyword spotting dataset, we show that our proposed method improves the predictive performance across diverse on-device architectures.
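
To make the idea of distilling temporal knowledge from attention weights more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the teacher's attention weights have already been reduced to a single distribution over time frames, that the student exposes per-frame scores from a hypothetical auxiliary head used only during training, and that a KL divergence is a reasonable way to compare the two. The abstract does not specify the loss, so every function name and the `weight` parameter here are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw per-frame scores into a distribution over time frames."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def temporal_distillation_loss(teacher_attention, student_scores):
    """Assumed loss: match the student's temporal distribution to the teacher's.

    teacher_attention: distribution over time frames taken from the teacher
                       (assumed to be pre-aggregated over heads and layers).
    student_scores:    raw per-frame scores from a hypothetical auxiliary head
                       that would be dropped at inference time.
    """
    student_attention = softmax(student_scores)
    return kl_divergence(teacher_attention, student_attention)


def total_loss(task_loss, teacher_attention, student_scores, weight=1.0):
    """Combine the usual classification loss with the distillation term."""
    distill = temporal_distillation_loss(teacher_attention, student_scores)
    return task_loss + weight * distill


if __name__ == "__main__":
    teacher = softmax([0.0, 2.0, 1.0, 0.0])  # teacher focuses on frame 1
    print(temporal_distillation_loss(teacher, [0.0, 2.0, 1.0, 0.0]))
    print(temporal_distillation_loss(teacher, [0.0, 0.0, 0.0, 0.0]))
```

In this sketch the distillation term vanishes when the student attends to the same frames as the teacher and grows as the two distributions diverge. Because the per-frame scores would come from a training-only head, the student's inference-time architecture stays unchanged, which is in line with the property described in the abstract.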
