Boosting statistical tagger accuracy with simple rule-based grammars
We report on several experiments on combining a rule-based tagger and a trigram tagger for Spanish. The results show that one can boost the accuracy of the best performing n-gram taggers by quickly developing a rough rule-based grammar to complement the statistically induced one and then combining the output of the two. The specific method of combination is crucial for achieving good results. The method provides particularly large gains in accuracy when only a small amount of tagged data is available for training a HMM, as may be the case for lesser-resourced and minority languages.
PDF Abstract