Impact of representation matching with neural machine translation

Most neural machine translation (NMT) models are implemented in a conditional language modeling framework composed of an encoder and a decoder. This framework learns complex and long-distance dependencies, but its deep structure makes training inefficient. Matching the vector representations of source and target sentences mitigates this inefficiency by shortening the path from the parameters to the cost, and it generalizes NMT models with a perspective different from the cross-entropy loss. In this paper, we propose matching methods that derive a cost from constant word embedding vectors of the source and target sentences. To find the best method, we analyze the impact of these methods under varying structures, distance metrics, and model capacities on a French-to-English translation task. The optimally configured method is then applied to translation tasks between English and French, Spanish, and German in both directions. On these tasks, the method improved performance by up to 3.23 BLEU, and by 0.71 BLEU on average. We also evaluated the robustness of the method to various embedding distributions and to different model architectures, namely conventional gated structures and the Transformer network; empirical results showed that it is likely to improve performance across these variations.
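As a concrete illustration, below is a minimal sketch of one way such a representation-matching cost could be added to the usual cross-entropy objective. It mean-pools fixed (constant) word embeddings into sentence vectors and measures their cosine distance; the function name, the mean pooling, the choice of cosine distance, and the weighting coefficient `lambda_match` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def representation_matching_loss(src_emb, tgt_emb, src_mask, tgt_mask):
    """Distance between pooled source and target sentence representations.

    src_emb, tgt_emb: (batch, seq_len, dim) constant (non-trainable) word
    embeddings of the source and target sentences.
    src_mask, tgt_mask: (batch, seq_len) floats, 1.0 for tokens, 0.0 for padding.
    """
    # Mean-pool over non-padded positions to get one vector per sentence.
    src_vec = (src_emb * src_mask.unsqueeze(-1)).sum(1) / src_mask.sum(1, keepdim=True)
    tgt_vec = (tgt_emb * tgt_mask.unsqueeze(-1)).sum(1) / tgt_mask.sum(1, keepdim=True)
    # Cosine distance is one candidate metric; L2 distance would be another.
    return (1.0 - F.cosine_similarity(src_vec, tgt_vec, dim=-1)).mean()

# Hypothetical combined objective: cross-entropy plus the weighted matching cost.
# total_loss = ce_loss + lambda_match * representation_matching_loss(
#     src_emb, tgt_emb, src_mask, tgt_mask)
```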
