Search Results for author: Raffy Fahim

Found 3 papers, 0 papers with code

Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness

no code implementations • 3 Oct 2023 • Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla

In our comprehensive analysis, we show that MoE models with 2-bit expert weights can deliver better model performance than the dense model trained on the same dataset.

Machine Translation · Quantization
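To make the "2-bit expert weights" claim concrete, below is a minimal sketch (not the paper's released code) of symmetric low-bit quantization applied to a single expert weight matrix. The per-output-channel scaling, matrix shape, and function names are illustrative assumptions, not the MoQE recipe itself.

import torch

def quantize_weight(w: torch.Tensor, bits: int = 2):
    """Symmetric per-output-channel quantization of a 2-D weight matrix (sketch)."""
    qmax = 2 ** (bits - 1) - 1                       # 1 for 2-bit symmetric weights
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize_weight(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

# Illustrative use on one hypothetical expert FFN weight matrix.
expert_w = torch.randn(4096, 1024)
q, scale = quantize_weight(expert_w, bits=2)
w_hat = dequantize_weight(q, scale)
print("mean abs reconstruction error:", (expert_w - w_hat).abs().mean().item())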

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

no code implementations • 16 Aug 2023 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements.

Quantization
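As a rough illustration of fine-grained weight-only quantization, the sketch below assigns one scale to each fixed-size group of input channels rather than one scale per tensor or per row. The group size, bit width, and function names are assumptions for illustration and not FineQuant's API.

import torch

def groupwise_quantize(w: torch.Tensor, bits: int = 4, group_size: int = 128):
    """Weight-only, group-wise symmetric quantization of a 2-D matrix (sketch)."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    qmax = 2 ** (bits - 1) - 1
    groups = w.reshape(out_features, in_features // group_size, group_size)
    scale = groups.abs().amax(dim=-1, keepdim=True) / qmax   # one scale per group
    q = torch.clamp(torch.round(groups / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def groupwise_dequantize(q: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    return (q.to(torch.float32) * scale).reshape(shape)

w = torch.randn(4096, 4096)
q, scale = groupwise_quantize(w, bits=4, group_size=128)
w_hat = groupwise_dequantize(q, scale, w.shape)
print("max abs reconstruction error:", (w - w_hat).abs().max().item())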

Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production

no code implementations • 18 Nov 2022 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters.

Machine Translation
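The sketch below illustrates the "conditional execution of sparsely activated layers" idea with a toy top-1 gated MoE layer in PyTorch: each token is routed to a single expert, so only a fraction of the parameters run per token. The layer sizes, gating scheme, and class name are illustrative assumptions, not the system described in the paper.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)
        top_p, top_idx = probs.max(dim=-1)                # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                           # tokens routed to expert e
            if mask.any():                                # conditional execution
                out[mask] = top_p[mask, None] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 512)
print(moe(tokens).shape)  # torch.Size([16, 512])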
