no code implementations • Findings (EMNLP) 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one.
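Several of the entries below build on this basic setup, so here is a minimal sketch of a standard Hinton-style KD loss (not the specific method of any paper listed here); the temperature and weighting values are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend the usual cross-entropy with a soft-target term from the teacher."""
    # Soften both distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened teacher and student predictions.
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Ordinary supervised loss on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```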
1 code implementation • 31 Jul 2023 • Ehsan Kamalloo, Aref Jafari, Xinyu Zhang, Nandan Thakur, Jimmy Lin
In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations.
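As a rough illustration of the task the dataset targets, an example might pair a query with retrieved candidate quotes and an answer that cites them; the field names below are hypothetical and are not the actual HAGRID schema.

```python
# Illustrative record for attributable generative retrieval (hypothetical fields).
example = {
    "query": "When was the Eiffel Tower completed?",
    "quotes": [  # candidate quotes retrieved from the corpus
        {"id": "doc42#3", "text": "The Eiffel Tower was completed in 1889 ..."},
    ],
    # Attributed explanation: the generated answer cites the supporting quote.
    "answer": "The Eiffel Tower was completed in 1889 [doc42#3].",
}
```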
no code implementations • 27 Jan 2023 • Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi
Augmenting the training set by adding this auxiliary data improves the performance of KD significantly and leads to a closer match between the student and the teacher.
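A minimal sketch of the general idea, assuming a simple augmentation scheme (interpolating existing inputs and labelling them with the teacher's soft predictions); this is only an illustration, not the exact auxiliary-data construction used in the paper.

```python
import torch

def build_auxiliary_batch(x, teacher, lam=0.5):
    # Mix each input with a shuffled partner to create new, in-between inputs.
    perm = torch.randperm(x.size(0))
    x_aux = lam * x + (1.0 - lam) * x[perm]
    with torch.no_grad():
        # The teacher labels the auxiliary inputs with soft targets.
        soft_targets = torch.softmax(teacher(x_aux), dim=-1)
    return x_aux, soft_targets

# The augmented set (original + auxiliary) is then trained with the usual KD
# objective, so the student also matches the teacher on inputs it would not
# otherwise see.
```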
no code implementations • 12 Dec 2022 • Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher).
no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.
no code implementations • COLING 2022 • Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi
A case in point is that the best-performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD.
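One hedged way to read this observation in code: compare student dev scores after distilling from several teacher checkpoints instead of assuming the best-performing one distills best. The `distill` and `evaluate` helpers below are hypothetical stand-ins for a full training loop, not the paper's procedure.

```python
def select_teacher_checkpoint(checkpoints, student_init, train_data, dev_data):
    """Pick the teacher checkpoint whose distilled student scores highest on dev."""
    best_ckpt, best_score = None, float("-inf")
    for ckpt in checkpoints:
        student = distill(student_init, teacher=ckpt, data=train_data)  # short KD run
        score = evaluate(student, dev_data)
        if score > best_score:
            best_ckpt, best_score = ckpt, score
    return best_ckpt
```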
1 code implementation • 13 Sep 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one.
1 code implementation • EACL 2021 • Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi
Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.
no code implementations • 30 Jun 2020 • Aref Jafari, Ali Ghodsi
This is accomplished by defining an embedding method for the positions of all members of a coreference cluster in a document and resolving all of them for a given mention.
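A minimal sketch of one way such a position embedding could look, assuming a fixed maximum document length and a binary occupancy mask over the tokens covered by the cluster's mentions; this is an illustration, not the paper's exact method.

```python
import torch
import torch.nn as nn

class ClusterPositionEmbedding(nn.Module):
    """Encode the positions of all mentions in one coreference cluster as a single vector."""

    def __init__(self, max_doc_len=512, dim=128):
        super().__init__()
        self.proj = nn.Linear(max_doc_len, dim)

    def forward(self, mention_spans):
        # mention_spans: list of (start, end) token indices, one per cluster member.
        mask = torch.zeros(self.proj.in_features)
        for start, end in mention_spans:
            mask[start:end] = 1.0  # mark tokens covered by each mention
        return self.proj(mask)  # one embedding summarising the whole cluster's positions
```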