Search Results for author: Alexander Min

Found 2 papers, 1 paper with code

Co-training and Co-distillation for Quality Improvement and Compression of Language Models

no code implementations • 6 Nov 2023 • Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min

2) The enhanced performance of the larger model further boosts the performance of the smaller model.

Data Augmentation, Knowledge Distillation


A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models

1 code implementation • 26 May 2023 • Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min

Distillation from Weak Teacher (DWT) is a method of transferring knowledge from a smaller, weaker teacher model to a larger student model to improve its performance.

Knowledge Distillation
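For orientation, the idea of transferring knowledge from a teacher to a student can be sketched with a generic temperature-softened distillation loss. This is a minimal illustration of standard knowledge distillation, not the DWT procedure from the paper. The function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions introduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: generic distillation loss for one example.

    Combines KL(teacher || student) on temperature-softened
    distributions with cross-entropy against the hard label.
    alpha weights the soft (teacher) term; both alpha and
    temperature are illustrative defaults, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between softened teacher and student distributions.
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student's unsoftened prediction on the label.
    ce = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps the soft-term gradient scale
    # comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Made-up logits: the teacher is deliberately less confident than the student.
weak_teacher = [1.0, 0.8, 0.1]
student = [2.5, 0.3, -1.0]
loss = distillation_loss(student, weak_teacher, label=0)
```

In a weak-teacher setting the teacher's soft targets may be less reliable than the labels, so a smaller `alpha` would be one plausible way to limit the teacher's influence. That choice is an assumption of this sketch.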

