Search Results for author: Raffy Fahim

Found 3 papers, 0 papers with code

Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness

no code implementations • 3 Oct 2023 • Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla

In our comprehensive analysis, we show that MoE models with 2-bit expert weights can deliver better model performance than the dense model trained on the same dataset.

Machine Translation · Quantization
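To make the "2-bit expert weights" claim concrete, below is a minimal sketch (not the paper's released code) of symmetric low-bit quantization applied to a single expert weight matrix. The per-output-channel scaling, matrix shape, and function names are illustrative assumptions, not the MoQE recipe itself.

import torch

def quantize_weight(w: torch.Tensor, bits: int = 2):
    """Symmetric per-output-channel quantization of a 2-D weight matrix (sketch)."""
    qmax = 2 ** (bits - 1) - 1                       # 1 for 2-bit symmetric weights
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize_weight(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

# Illustrative use on one hypothetical expert FFN weight matrix.
expert_w = torch.randn(4096, 1024)
q, scale = quantize_weight(expert_w, bits=2)
w_hat = dequantize_weight(q, scale)
print("mean abs reconstruction error:", (expert_w - w_hat).abs().mean().item())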

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

no code implementations • 16 Aug 2023 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements.

Quantization
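As a rough illustration of fine-grained weight-only quantization, the sketch below assigns one scale to each fixed-size group of input channels rather than one scale per tensor or per row. The group size, bit width, and function names are assumptions for illustration and not FineQuant's API.

import torch

def groupwise_quantize(w: torch.Tensor, bits: int = 4, group_size: int = 128):
    """Weight-only, group-wise symmetric quantization of a 2-D matrix (sketch)."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    qmax = 2 ** (bits - 1) - 1
    groups = w.reshape(out_features, in_features // group_size, group_size)
    scale = groups.abs().amax(dim=-1, keepdim=True) / qmax   # one scale per group
    q = torch.clamp(torch.round(groups / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def groupwise_dequantize(q: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    return (q.to(torch.float32) * scale).reshape(shape)

w = torch.randn(4096, 4096)
q, scale = groupwise_quantize(w, bits=4, group_size=128)
w_hat = groupwise_dequantize(q, scale, w.shape)
print("max abs reconstruction error:", (w - w_hat).abs().max().item())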

Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production

no code implementations • 18 Nov 2022 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters.

Machine Translation
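The sketch below illustrates the "conditional execution of sparsely activated layers" idea with a toy top-1 gated MoE layer in PyTorch: each token is routed to a single expert, so only a fraction of the parameters run per token. The layer sizes, gating scheme, and class name are illustrative assumptions, not the system described in the paper.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)
        top_p, top_idx = probs.max(dim=-1)                # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                           # tokens routed to expert e
            if mask.any():                                # conditional execution
                out[mask] = top_p[mask, None] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 512)
print(moe(tokens).shape)  # torch.Size([16, 512])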
