1 code implementation • ICML 2020 • Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs
As Transformer models become larger and more expensive to train, recent research has focused on understanding and improving optimization in these models.
1 code implementation • 7 Jun 2022 • Sajad Norouzi, Rasa Hosseinzadeh, Felipe Perez, Maksims Volkovs
The student is optimized to predict the output of the teacher after multiple decoding steps while the teacher follows the student via a slow-moving average.
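The student–teacher loop described in this snippet can be made concrete with a short sketch. Below is a minimal PyTorch illustration of that setup, assuming a generic iterative NAR decoder that maps token IDs to vocabulary logits; all names (`distill_step`, `ema_update`, the toy model) are illustrative, not the paper's actual DiMS implementation, and the greedy re-prediction loop is a simplification of real mask-predict decoding.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def ema_update(teacher: nn.Module, student: nn.Module, decay: float = 0.999) -> None:
    """Teacher follows the student via a slow-moving (exponential) average."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)


def distill_step(student: nn.Module, teacher: nn.Module,
                 tokens: torch.Tensor, num_steps: int = 4) -> torch.Tensor:
    """Train the student to predict, in a single pass, what the teacher
    produces after `num_steps` iterative decoding steps."""
    # Teacher: several refinement steps, no gradients.
    with torch.no_grad():
        refined = tokens
        for _ in range(num_steps):
            target_logits = teacher(refined)        # one refinement step
            refined = target_logits.argmax(dim=-1)  # greedy re-prediction
    # Student: one forward pass, matched to the multi-step teacher output.
    student_logits = student(tokens)
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(target_logits, dim=-1),
        reduction="batchmean",
    )


if __name__ == "__main__":
    # Toy stand-in for an iterative NAR decoder: token IDs in, logits out.
    vocab = 100
    student = nn.Sequential(nn.Embedding(vocab, 32), nn.Linear(32, vocab))
    teacher = copy.deepcopy(student)  # teacher starts as a copy of the student
    tokens = torch.randint(0, vocab, (2, 7))
    loss = distill_step(student, teacher, tokens)
    loss.backward()
    ema_update(teacher, student)      # teacher slowly trails the student
```

In a full training loop, `distill_step` and an optimizer step would run per batch, with `ema_update` applied after each step so the teacher lags the student by a smoothed average.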
no code implementations • ICLR 2022 • Xiao Shi Huang, Felipe Perez, Maksims Volkovs
Empirically, we show that CMLMC achieves state-of-the-art non-autoregressive (NAR) performance when trained on raw data without distillation, and approaches autoregressive (AR) performance on multiple datasets.