19 Dec 2021 • Vibhas Vats, David Crandall
We argue that, for a given teacher-student pair, distillation quality can be improved by finding the sweet spot between batch size and number of epochs used to train the teacher.
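One way to read the claim above is as a search over teacher training settings, scored by how well the distilled student performs. The sketch below is only an illustration under that assumption, not the authors' procedure: `find_sweet_spot`, `student_score`, and `toy_student_score` are hypothetical names, and the toy scoring function is a placeholder with no connection to real results. In practice `student_score` would train a teacher with the given settings, distill the student, and return its validation accuracy.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def find_sweet_spot(
    batch_sizes: Iterable[int],
    epoch_counts: Iterable[int],
    student_score: Callable[[int, int], float],
) -> Tuple[Tuple[int, int], float]:
    """Exhaustively try every (batch_size, epochs) teacher setting and
    return the one whose distilled student scores highest."""
    best_setting, best_score = None, float("-inf")
    for batch_size, epochs in product(batch_sizes, epoch_counts):
        # Hypothetical callback: train teacher, distill student, evaluate.
        score = student_score(batch_size, epochs)
        if score > best_score:
            best_setting, best_score = (batch_size, epochs), score
    return best_setting, best_score


def toy_student_score(batch_size: int, epochs: int) -> float:
    # Placeholder only (not measured data): a made-up surface that
    # peaks at batch size 128 and 60 epochs so the search has a target.
    return -abs(batch_size - 128) / 128 - abs(epochs - 60) / 60


if __name__ == "__main__":
    setting, score = find_sweet_spot([32, 128, 512], [30, 60, 120], toy_student_score)
    print(setting, score)
```

A plain grid search is used here for simplicity; each evaluation involves a full teacher training run plus distillation, so a real study would likely keep the grid small.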