no code implementations • 21 Mar 2024 • Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command.
Automatic Speech Recognition (ASR) +4
no code implementations • 6 Dec 2023 • Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
We compare the proposed system to unimodal baselines and show that the multimodal approach achieves lower equal-error-rates (EERs), while using only a fraction of the training data.
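The equal-error-rate mentioned above is the operating point where the false-accept rate equals the false-reject rate. A minimal NumPy sketch of how an EER can be estimated from detector scores (the function name and the threshold sweep are illustrative, not from the paper):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the equal-error-rate (EER): the threshold at which the
    false-accept rate (FAR) and false-reject rate (FRR) are closest,
    reporting their average at that point."""
    pos = scores[labels == 1]   # target (positive) trial scores
    neg = scores[labels == 0]   # impostor (negative) trial scores
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(neg >= t)  # negatives wrongly accepted
        frr = np.mean(pos < t)   # positives wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer
```

With perfectly separated scores the sweep finds a threshold where both error rates vanish, giving an EER of zero.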
no code implementations • 5 Apr 2022 • Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik
A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task.
no code implementations • 14 May 2021 • Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir
We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features.
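The streaming idea of processing incoming audio chunk by chunk while carrying forward context can be sketched as follows. This is a generic illustration, not the paper's architecture: the chunk and context sizes are assumed, and `encode_fn` stands in for the actual transformer encoder.

```python
import numpy as np

CHUNK = 4     # frames per incoming audio chunk (assumed)
CONTEXT = 8   # frames of left context retained across chunks (assumed)

def stream_encode(frames, encode_fn):
    """Encode audio frames chunk by chunk, prepending a rolling buffer
    of recent history so each chunk is processed with left context,
    and emitting outputs only for the newly arrived frames."""
    context = np.zeros((0, frames.shape[1]))
    outputs = []
    for start in range(0, len(frames), CHUNK):
        chunk = frames[start:start + CHUNK]
        window = np.concatenate([context, chunk])  # history + new frames
        encoded = encode_fn(window)
        outputs.append(encoded[-len(chunk):])      # keep only new frames
        context = window[-CONTEXT:]                # roll the context buffer
    return np.concatenate(outputs)
```

With an identity `encode_fn`, the streamed output matches the input exactly, which makes the bookkeeping easy to check.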
Automatic Speech Recognition (ASR) +1
no code implementations • 29 Oct 2020 • Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg
We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision.
no code implementations • 5 Aug 2020 • Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir
Our baseline is an acoustic model (AM) with BiLSTM layers, trained by minimizing the CTC loss.
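A CTC-trained model like the baseline above emits a per-frame labelling that is collapsed into an output sequence by merging repeats and removing the blank symbol. A minimal sketch of that greedy (best-path) decoding step, with an assumed blank index:

```python
BLANK = 0  # CTC blank symbol index (assumed)

def ctc_greedy_decode(frame_labels):
    """Collapse a per-frame best-path labelling into an output sequence
    using the CTC rules: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out
```

Note that a blank between two identical labels keeps them as two separate outputs, which is how CTC represents repeated symbols.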
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Pascal Clark, Rob Haynes, Hywel Richards, John Bridle
Next, we collect a much smaller dataset of examples that are challenging for the baseline system.
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John Bridle
We train the network in a supervised multi-task learning setup: the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss, while the speaker recognition branch is trained to label the input sequence with the correct speaker.
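A common way to realise such a two-branch setup is to optimise a weighted sum of the branch losses. The sketch below assumes a mixing weight `alpha` and a softmax cross-entropy loss for the speaker branch; the weighting scheme is illustrative, not taken from the paper.

```python
import numpy as np

def multitask_loss(ctc_loss, speaker_logits, speaker_id, alpha=0.5):
    """Combine the phonetic CTC loss with a cross-entropy
    speaker-classification loss using an assumed mixing weight alpha."""
    # numerically stable softmax cross-entropy for the speaker branch
    logits = speaker_logits - np.max(speaker_logits)
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    speaker_loss = -log_probs[speaker_id]
    return alpha * ctc_loss + (1 - alpha) * speaker_loss
```

With uniform speaker logits over n classes, the speaker term reduces to log(n), which gives a simple sanity check.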
no code implementations • 15 Jul 2016 • Siddharth Sigtia, Adam M. Stark, Sacha Krstulovic, Mark D. Plumbley
In the context of the Internet of Things (IoT), sound sensing applications are required to run on embedded platforms where notions of product pricing and form factor impose hard constraints on the available computing power.
2 code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley
For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features.
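The denoising auto-encoder idea above can be sketched as: corrupt the Mel-Filter Bank (MFB) features with noise, map them through a hidden layer, and use the hidden activations as new features. The weights below are random stand-ins; in the actual method they are learned by minimising reconstruction error against the clean input, and the hidden width is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def dae_features(mfb, corrupt_std=0.1, hidden=32):
    """Corrupt MFB features with Gaussian noise and project them through
    a (here untrained, randomly initialised) encoder layer, returning
    the hidden activations as data-driven features."""
    noisy = mfb + rng.normal(0.0, corrupt_std, mfb.shape)  # corruption step
    W = rng.normal(0.0, 0.1, (mfb.shape[1], hidden))       # encoder weights
    b = np.zeros(hidden)
    return np.tanh(noisy @ W + b)                          # hidden features
```

In a symmetric DAE the decoder mirrors these encoder weights; an asymmetric variant uses independently sized decoder layers.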
no code implementations • 14 Apr 2016 • Alexander W. Churchill, Siddharth Sigtia, Chrisantha Fernando
Neural networks and evolutionary computation have a rich intertwined history.
1 code implementation • 7 Aug 2015 • Siddharth Sigtia, Emmanouil Benetos, Simon Dixon
We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models.
no code implementations • 6 Nov 2014 • Siddharth Sigtia, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d'Avila Garcez, Simon Dixon
We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance.
no code implementations • 6 Apr 2014 • Alexander W. Churchill, Siddharth Sigtia, Chrisantha Fernando
An algorithm is described that adaptively learns a non-linear mutation distribution.