Automated speech-unit delimitation in spoken learner English

COLING 2016 · Russell Moore, Andrew Caines, Calbert Graham, Paula Buttery ·

In order to apply computational linguistic analyses and pass information to downstream applications, transcriptions of speech obtained via automatic speech recognition (ASR) need to be divided into smaller meaningful units, in a task we refer to as {`}speech-unit (SU) delimitation{'}. We closely recreate the automatic delimitation system described by Lee and Glass (2012), {`}Sentence detection using multiple annotations{'}, Proceedings of INTERSPEECH, which combines a prosodic model, language model and speech-unit length model in log-linear fashion. Since state-of-the-art natural language processing (NLP) tools have been developed to deal with written text and its characteristic sentence-like units, SU delimitation helps bridge the gap between ASR and NLP, by normalising spoken data into a more canonical format. Previous work has focused on native speaker recordings; we test the system of Lee and Glass (2012) on non-native speaker (or {`}learner{'}) data, achieving performance above the state-of-the-art. We also consider alternative evaluation metrics which move away from the idea of a single {`}truth{'} in SU delimitation, and frame this work in the context of downstream NLP applications.