1 code implementation • 27 Sep 2023 • Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee
Weight-only quantization can be a promising approach, but sub-4-bit quantization remains a challenge due to large-magnitude activation outliers.
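For context, the minimal sketch below shows what plain round-to-nearest weight-only quantization looks like; the group size, bit width, and function name are illustrative assumptions, not the method proposed in the paper.

```python
import numpy as np

def quantize_weights_rtn(w: np.ndarray, bits: int = 4, group_size: int = 128):
    """Per-group round-to-nearest weight-only quantization (illustrative sketch).

    Each row of `w` is split into groups of `group_size` columns, and every
    group gets its own scale mapping values onto the signed integer grid
    [-2**(bits-1), 2**(bits-1) - 1]. Activations stay in floating point.
    """
    qmax = 2 ** (bits - 1) - 1
    out_features, in_features = w.shape
    w_grouped = w.reshape(out_features, in_features // group_size, group_size)

    # One scale per (row, group) pair, set by the group's absolute maximum.
    scales = np.abs(w_grouped).max(axis=-1, keepdims=True) / qmax
    q = np.clip(np.round(w_grouped / scales), -qmax - 1, qmax)

    # The dequantized weights are what multiply the full-precision activations.
    w_deq = (q * scales).reshape(out_features, in_features)
    return q.astype(np.int8), scales, w_deq

# Round-to-nearest treats all channels alike; at sub-4 bits, the error on the
# weight channels that meet large-magnitude activations is what hurts most.
w = np.random.randn(256, 512).astype(np.float32)
_, _, w_deq = quantize_weights_rtn(w, bits=4, group_size=128)
print("mean absolute quantization error:", np.abs(w - w_deq).mean())
```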
no code implementations • 8 May 2023 • Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram
Post-training compression techniques such as pruning and quantization can help lower deployment costs.
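As a toy illustration of one such post-training technique, the sketch below applies unstructured magnitude pruning to an already-trained weight tensor; the sparsity target and names are assumptions for illustration, not the paper's approach.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude entries of `w` until `sparsity` of them are gone."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.random.randn(512, 512).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
print("achieved sparsity:", float((w_pruned == 0).mean()))
```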
1 code implementation • 4 Mar 2023 • Jung Hwan Heo, Arash Fayyazi, Mahdi Nazemi, Massoud Pedram
Token pruning has emerged as an effective solution to speed up the inference of large Transformer models.
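To make the idea concrete, here is a hedged sketch of a generic attention-based token pruning step that keeps the tokens receiving the most attention from a [CLS]-style token; the scoring rule and shapes are assumptions, not the specific criterion studied in the paper.

```python
import numpy as np

def prune_tokens(hidden: np.ndarray, attn: np.ndarray, keep_ratio: float = 0.5):
    """Keep the most-attended tokens after a Transformer layer (illustrative sketch).

    hidden: (num_tokens, dim) token representations.
    attn:   (num_heads, num_tokens, num_tokens) attention probabilities.
    Token 0 is treated as a [CLS]-style token and is always kept.
    """
    num_tokens = hidden.shape[0]
    # Score each token by the average attention it receives from the [CLS] token.
    importance = attn[:, 0, :].mean(axis=0)
    num_keep = max(1, int(num_tokens * keep_ratio))
    kept = np.array(sorted(set(np.argsort(-importance)[:num_keep].tolist()) | {0}))
    return hidden[kept], kept

# ViT-Base-like shapes; random attention stands in for a real forward pass.
tokens, heads, dim = 197, 12, 768
hidden = np.random.randn(tokens, dim).astype(np.float32)
attn = np.random.rand(heads, tokens, tokens).astype(np.float32)
attn /= attn.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax output
pruned, kept = prune_tokens(hidden, attn, keep_ratio=0.5)
print("tokens kept:", pruned.shape[0], "of", tokens)
```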
no code implementations • 30 Jun 2022 • Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram
This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state of the art in hardware accelerators for lightweight neural networks.