Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian

LREC 2016 · Lauriane Aufrant, Guillaume Wisniewski, François Yvon

Because of the small size of Romanian corpora, the performance of a PoS tagger or a dependency parser trained with standard supervised methods falls far short of the performance achieved for most languages. We therefore apply state-of-the-art cross-lingual transfer methods to Romanian tagging and parsing, transferring from English and several Romance languages. We compare their performance with that of monolingual systems trained on sets of different sizes and establish that training on a few sentences in the target language yields better results than transferring from large datasets in other languages.