no code implementations • 13 Oct 2021 • Arvid Frydenlund, Gagandeep Singh, Frank Rudzicz
We also develop a method that uses $N$-grams to create a non-probabilistic teacher, which generates the ranks without the need for a pre-trained LM.
Knowledge Distillation, Language Modelling
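The excerpt above does not say how the $N$-gram teacher produces its ranks. The following is a minimal sketch of one plausible reading, assuming that candidates are ordered by raw $N$-gram counts and that ties are broken alphabetically. The function names `build_ngram_counts` and `rank_next_words` and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_ngram_counts(tokens, n):
    """Map each (n-1)-token context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        counts[context][next_word] += 1
    return counts


def rank_next_words(counts, context, k=None):
    """Order the observed next words from most to least frequent.

    Assumption: ranks come from raw counts only, with no normalisation
    into probabilities. Ties are broken alphabetically so the output
    is deterministic.
    """
    followers = counts.get(tuple(context), Counter())
    ranked = sorted(followers, key=lambda w: (-followers[w], w))
    return ranked if k is None else ranked[:k]


# Toy corpus (hypothetical), using bigrams (n = 2).
corpus = "the cat sat on the mat and the cat ran".split()
bigram_counts = build_ngram_counts(corpus, n=2)
print(rank_next_words(bigram_counts, ["the"]))  # ['cat', 'mat']
```

Because only the ordering of counts is used, the sketch never builds a probability distribution, which is the sense in which such a teacher could be called non-probabilistic. A context that never appears in the corpus yields an empty ranking here. A practical version would need some fallback for that case, such as backing off to a shorter context, but the excerpt does not say how the paper handles it.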