Collecting fluency corrections for spoken learner English

WS 2017 · Andrew Caines, Emma Flint, Paula Buttery

We present the crowdsourced collection of error annotations for transcriptions of spoken learner English. Our emphasis in data collection is on fluency corrections, a more complete form of correction than has traditionally been aimed for in grammatical error correction (GEC) research. Fluency corrections require improvements to the text, taking discourse- and utterance-level semantics into account: the result is a more naturalistic, holistic version of the original. We propose that this shifted emphasis be reflected in a new name for the task: 'holistic error correction' (HEC). We analyse crowdworker behaviour in HEC and conclude that the method is useful with certain amendments for future work.
