Search Results for author: Alexander Min

Found 2 papers, 1 paper with code

A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models

1 code implementation · 26 May 2023 · Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min

Distillation from Weak Teacher (DWT) is a method of transferring knowledge from a smaller, weaker teacher model to a larger student model in order to improve the student's performance.

Knowledge Distillation
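
To make the DWT description above concrete, here is a minimal sketch of a distillation step in which a smaller "weak" teacher provides soft targets for a larger student. This is a generic knowledge-distillation loss, not the paper's actual method; the model sizes, temperature, and loss weighting are illustrative assumptions.

```python
# Hypothetical sketch of distillation from a weak (smaller) teacher to a larger student.
# Hyperparameters and architectures are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy models: the teacher is deliberately smaller than the student.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-label KL distillation with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# One illustrative training step on random data.
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)  # weak teacher supplies soft targets
loss = distillation_loss(student(x), teacher_logits, y)
loss.backward()
```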
