1 code implementation • 27 Sep 2023 • Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee
Weight-only quantization can be a promising approach, but sub-4-bit quantization remains a challenge due to large-magnitude activation outliers.
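For context, the minimal sketch below shows what plain round-to-nearest weight-only quantization looks like; the group size, bit width, and function name are illustrative assumptions, not the method proposed in the paper.

```python
import numpy as np

def quantize_weights_rtn(w: np.ndarray, bits: int = 4, group_size: int = 128):
    """Per-group round-to-nearest weight-only quantization (illustrative sketch).

    Each row of `w` is split into groups of `group_size` columns, and every
    group gets its own scale mapping values onto the signed integer grid
    [-2**(bits-1), 2**(bits-1) - 1]. Activations stay in floating point.
    """
    qmax = 2 ** (bits - 1) - 1
    out_features, in_features = w.shape
    w_grouped = w.reshape(out_features, in_features // group_size, group_size)

    # One scale per (row, group) pair, set by the group's absolute maximum.
    scales = np.abs(w_grouped).max(axis=-1, keepdims=True) / qmax
    q = np.clip(np.round(w_grouped / scales), -qmax - 1, qmax)

    # The dequantized weights are what multiply the full-precision activations.
    w_deq = (q * scales).reshape(out_features, in_features)
    return q.astype(np.int8), scales, w_deq

# Round-to-nearest treats all channels alike; at sub-4 bits, the error on the
# weight channels that meet large-magnitude activations is what hurts most.
w = np.random.randn(256, 512).astype(np.float32)
_, _, w_deq = quantize_weights_rtn(w, bits=4, group_size=128)
print("mean absolute quantization error:", np.abs(w - w_deq).mean())
```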
no code implementations • 8 May 2023 • Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram
Post-training compression techniques such as pruning and quantization can help lower deployment costs.
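As a toy illustration of one such post-training technique, the sketch below applies unstructured magnitude pruning to an already-trained weight tensor; the sparsity target and names are assumptions for illustration, not the paper's approach.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude entries of `w` until `sparsity` of them are gone."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.random.randn(512, 512).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
print("achieved sparsity:", float((w_pruned == 0).mean()))
```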
1 code implementation • 4 Mar 2023 • Jung Hwan Heo, Arash Fayyazi, Mahdi Nazemi, Massoud Pedram
Token pruning has emerged as an effective solution to speed up the inference of large Transformer models.
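To make the idea concrete, here is a hedged sketch of a generic attention-based token pruning step that keeps the tokens receiving the most attention from a [CLS]-style token; the scoring rule and shapes are assumptions, not the specific criterion studied in the paper.

```python
import numpy as np

def prune_tokens(hidden: np.ndarray, attn: np.ndarray, keep_ratio: float = 0.5):
    """Keep the most-attended tokens after a Transformer layer (illustrative sketch).

    hidden: (num_tokens, dim) token representations.
    attn:   (num_heads, num_tokens, num_tokens) attention probabilities.
    Token 0 is treated as a [CLS]-style token and is always kept.
    """
    num_tokens = hidden.shape[0]
    # Score each token by the average attention it receives from the [CLS] token.
    importance = attn[:, 0, :].mean(axis=0)
    num_keep = max(1, int(num_tokens * keep_ratio))
    kept = np.array(sorted(set(np.argsort(-importance)[:num_keep].tolist()) | {0}))
    return hidden[kept], kept

# ViT-Base-like shapes; random attention stands in for a real forward pass.
tokens, heads, dim = 197, 12, 768
hidden = np.random.randn(tokens, dim).astype(np.float32)
attn = np.random.rand(heads, tokens, tokens).astype(np.float32)
attn /= attn.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax output
pruned, kept = prune_tokens(hidden, attn, keep_ratio=0.5)
print("tokens kept:", pruned.shape[0], "of", tokens)
```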
no code implementations • 30 Jun 2022 • Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram
This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state of the art in hardware accelerators for lightweight neural networks.