Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Active learning (AL) is often used in corpus construction (CC) to select "informative" documents for annotation. This is ideal for focusing annotation effort when not all documents can be annotated, but it has the limitation that it is carried out in a closed loop, selecting points that will improve an existing model. For phenomena-driven and exploratory CC, the lack of an existing model and of specific task(s) in which to use it makes traditional AL inapplicable. In this paper we propose a novel method for model-free AL that utilises characteristics of phenomena to select documents for annotation. The method can also supplement traditional closed-loop AL-based CC to extend the utility of the resulting corpus beyond a single task. We introduce our tool, MOVE, and show its potential with a real-world case study.
LREC 2016