no code implementations • 21 Mar 2024 • Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command.
Automatic Speech Recognition (ASR) +4
no code implementations • 6 Dec 2023 • Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
We compare the proposed system to unimodal baselines and show that the multimodal approach achieves lower equal-error-rates (EERs), while using only a fraction of the training data.
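The equal-error-rate mentioned above is the operating point where the false-accept rate equals the false-reject rate. A minimal NumPy sketch of how an EER can be estimated from detector scores (the function name and the threshold sweep are illustrative, not from the paper):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the equal-error-rate (EER): the threshold at which the
    false-accept rate (FAR) and false-reject rate (FRR) are closest,
    reporting their average at that point."""
    pos = scores[labels == 1]   # target (positive) trial scores
    neg = scores[labels == 0]   # impostor (negative) trial scores
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(neg >= t)  # negatives wrongly accepted
        frr = np.mean(pos < t)   # positives wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer
```

With perfectly separated scores the sweep finds a threshold where both error rates vanish, giving an EER of zero.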
no code implementations • 5 Apr 2022 • Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik
A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task.
no code implementations • 14 May 2021 • Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir
We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features.
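The streaming idea of processing incoming audio chunk by chunk while carrying forward context can be sketched as follows. This is a generic illustration, not the paper's architecture: the chunk and context sizes are assumed, and `encode_fn` stands in for the actual transformer encoder.

```python
import numpy as np

CHUNK = 4     # frames per incoming audio chunk (assumed)
CONTEXT = 8   # frames of left context retained across chunks (assumed)

def stream_encode(frames, encode_fn):
    """Encode audio frames chunk by chunk, prepending a rolling buffer
    of recent history so each chunk is processed with left context,
    and emitting outputs only for the newly arrived frames."""
    context = np.zeros((0, frames.shape[1]))
    outputs = []
    for start in range(0, len(frames), CHUNK):
        chunk = frames[start:start + CHUNK]
        window = np.concatenate([context, chunk])  # history + new frames
        encoded = encode_fn(window)
        outputs.append(encoded[-len(chunk):])      # keep only new frames
        context = window[-CONTEXT:]                # roll the context buffer
    return np.concatenate(outputs)
```

With an identity `encode_fn`, the streamed output matches the input exactly, which makes the bookkeeping easy to check.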
Automatic Speech Recognition (ASR) +1
no code implementations • 29 Oct 2020 • Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg
We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision.
no code implementations • 5 Aug 2020 • Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir
Our baseline is an acoustic model (AM) with BiLSTM layers, trained by minimizing the CTC loss.
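A CTC-trained model like the baseline above emits a per-frame labelling that is collapsed into an output sequence by merging repeats and removing the blank symbol. A minimal sketch of that greedy (best-path) decoding step, with an assumed blank index:

```python
BLANK = 0  # CTC blank symbol index (assumed)

def ctc_greedy_decode(frame_labels):
    """Collapse a per-frame best-path labelling into an output sequence
    using the CTC rules: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out
```

Note that a blank between two identical labels keeps them as two separate outputs, which is how CTC represents repeated symbols.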
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Pascal Clark, Rob Haynes, Hywel Richards, John Bridle
Next, we collect a much smaller dataset of examples that are challenging for the baseline system.
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John Bridle
We train the network in a supervised multi-task learning setup: the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss, while the speaker recognition branch is trained to label the input sequence with the correct speaker.
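A common way to realise such a two-branch setup is to optimise a weighted sum of the branch losses. The sketch below assumes a mixing weight `alpha` and a softmax cross-entropy loss for the speaker branch; the weighting scheme is illustrative, not taken from the paper.

```python
import numpy as np

def multitask_loss(ctc_loss, speaker_logits, speaker_id, alpha=0.5):
    """Combine the phonetic CTC loss with a cross-entropy
    speaker-classification loss using an assumed mixing weight alpha."""
    # numerically stable softmax cross-entropy for the speaker branch
    logits = speaker_logits - np.max(speaker_logits)
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    speaker_loss = -log_probs[speaker_id]
    return alpha * ctc_loss + (1 - alpha) * speaker_loss
```

With uniform speaker logits over n classes, the speaker term reduces to log(n), which gives a simple sanity check.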
no code implementations • 15 Jul 2016 • Siddharth Sigtia, Adam M. Stark, Sacha Krstulovic, Mark D. Plumbley
In the context of the Internet of Things (IoT), sound sensing applications are required to run on embedded platforms where notions of product pricing and form factor impose hard constraints on the available computing power.
2 code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley
For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features.
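The denoising auto-encoder idea above can be sketched as: corrupt the Mel-Filter Bank (MFB) features with noise, map them through a hidden layer, and use the hidden activations as new features. The weights below are random stand-ins; in the actual method they are learned by minimising reconstruction error against the clean input, and the hidden width is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def dae_features(mfb, corrupt_std=0.1, hidden=32):
    """Corrupt MFB features with Gaussian noise and project them through
    a (here untrained, randomly initialised) encoder layer, returning
    the hidden activations as data-driven features."""
    noisy = mfb + rng.normal(0.0, corrupt_std, mfb.shape)  # corruption step
    W = rng.normal(0.0, 0.1, (mfb.shape[1], hidden))       # encoder weights
    b = np.zeros(hidden)
    return np.tanh(noisy @ W + b)                          # hidden features
```

In a symmetric DAE the decoder mirrors these encoder weights; an asymmetric variant uses independently sized decoder layers.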
no code implementations • 14 Apr 2016 • Alexander W. Churchill, Siddharth Sigtia, Chrisantha Fernando
Neural networks and evolutionary computation have a rich intertwined history.
1 code implementation • 7 Aug 2015 • Siddharth Sigtia, Emmanouil Benetos, Simon Dixon
We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models.
no code implementations • 6 Nov 2014 • Siddharth Sigtia, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d'Avila Garcez, Simon Dixon
We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance.
no code implementations • 6 Apr 2014 • Alexander W. Churchill, Siddharth Sigtia, Chrisantha Fernando
An algorithm is described that adaptively learns a non-linear mutation distribution.