no code implementations • 3 Oct 2023 • Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla
In our comprehensive analysis, we show that MoE models with 2-bit expert weights can outperform a dense model trained on the same dataset.
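As a rough illustration of what 2-bit expert weights mean in practice, the sketch below simulates per-row symmetric 2-bit quantization of a single expert's weight matrix with NumPy. The function names, the per-row scaling scheme, and the matrix sizes are illustrative assumptions, not the quantization recipe used in the paper.

```python
# Minimal sketch: simulate 2-bit quantization of one expert weight matrix.
# Per-row symmetric quantization is an assumption for illustration only.
import numpy as np

def quantize_2bit(weight, num_bits=2):
    """Quantize each row of `weight` to signed `num_bits` integers plus a per-row scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 1 for 2-bit
    qmin = -(2 ** (num_bits - 1))             # -2 for 2-bit
    scale = np.abs(weight).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(weight / scale), qmin, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximate float matrix from the integer codes and scales."""
    return q.astype(np.float32) * scale

# Usage: quantize only the expert FFN weights; other parameters stay in full precision.
rng = np.random.default_rng(0)
expert_weight = rng.normal(size=(1024, 4096)).astype(np.float32)
q, scale = quantize_2bit(expert_weight)
approx = dequantize(q, scale)
print("mean abs error:", np.abs(expert_weight - approx).mean())
```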
no code implementations • 16 Aug 2023 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla
Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements.
no code implementations • 18 Nov 2022 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla
Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters.
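To make "conditional execution of sparsely activated layers" concrete, here is a minimal NumPy sketch of an MoE feed-forward layer with top-1 routing: each token is dispatched to a single expert, so only that expert's weights are applied to it. The class name, dimensions, and ReLU feed-forward structure are illustrative assumptions, not any specific paper's implementation.

```python
# Minimal sketch of a Mixture-of-Experts feed-forward layer with top-1 routing.
import numpy as np

class MoELayer:
    def __init__(self, d_model=16, d_hidden=32, num_experts=4, seed=0):
        rng = np.random.default_rng(seed)
        # One router plus a set of independent expert feed-forward networks.
        self.router = rng.normal(size=(d_model, num_experts))
        self.w1 = rng.normal(size=(num_experts, d_model, d_hidden))
        self.w2 = rng.normal(size=(num_experts, d_hidden, d_model))

    def __call__(self, x):
        # x: (tokens, d_model). Route each token to its highest-scoring expert.
        logits = x @ self.router
        expert_ids = logits.argmax(axis=1)
        out = np.zeros_like(x)
        for e in np.unique(expert_ids):
            idx = np.where(expert_ids == e)[0]
            h = np.maximum(x[idx] @ self.w1[e], 0.0)  # ReLU feed-forward
            out[idx] = h @ self.w2[e]                 # only expert `e` runs on these tokens
        return out

tokens = np.random.default_rng(1).normal(size=(8, 16))
print(MoELayer()(tokens).shape)  # (8, 16)
```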