no code implementations • 21 Mar 2024 • Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command.
Automatic Speech Recognition (ASR)
1 code implementation • 19 Feb 2024 • Dominik Wagner, Basim Khajwal, C.-H. Luke Ong
It is well known that the reparameterisation gradient estimator, which exhibits low variance in practice, is biased for non-differentiable models.
no code implementations • 6 Dec 2023 • Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
We compare the proposed system to unimodal baselines and show that the multimodal approach achieves lower equal error rates (EERs) while using only a fraction of the training data.
no code implementations • 30 May 2023 • Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer
Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others.
no code implementations • 9 Jan 2023 • Basim Khajwal, C.-H. Luke Ong, Dominik Wagner
Thus we can prove stochastic gradient descent with the reparameterisation gradient estimator to be correct when applied to the smoothed problem.
no code implementations • 28 Oct 2022 • Sebastian P. Bayerl, Dominik Wagner, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer
This work explores an approach based on a modified wav2vec 2.0 system for end-to-end stuttering detection and classification as a multi-label problem.
no code implementations • 28 Oct 2022 • Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet
Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks.
no code implementations • 27 Oct 2022 • Dominik Wagner, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet
The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech.
no code implementations • 16 Jun 2022 • Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet
In this work, the task is to determine whether spoken nonwords have been uttered correctly.
1 code implementation • 7 Jun 2022 • Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer
This paper empirically investigates the influence of different data splits and splitting strategies on the performance of dysfluency detection systems.
no code implementations • 7 Apr 2022 • Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet
Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization.
no code implementations • 7 Apr 2022 • Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer
This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages.
no code implementations • 8 Apr 2020 • Carol Mak, C.-H. Luke Ong, Hugo Paquet, Dominik Wagner
We give SPCF a sampling-style operational semantics à la Borgström et al., and study the associated weight (commonly referred to as the density) function and value function on the set of possible execution traces.