Attentive Language Models

IJCNLP 2017 · Giancarlo Salton, Robert Ross, John Kelleher ·

In this paper, we extend Recurrent Neural Network Language Models (RNN-LMs) with an attention mechanism. We show that an {``}attentive{''} RNN-LM (with 11M parameters) achieves a better perplexity than larger RNN-LMs (with 66M parameters) and achieves performance comparable to an ensemble of 10 similar sized RNN-LMs. We also show that an {``}attentive{''} RNN-LM needs less contextual information to achieve similar results to the state-of-the-art on the wikitext2 dataset.