1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
To maximize LLM serving throughput, we introduce Atom, a low-bit quantization method that achieves significant throughput improvements with negligible accuracy loss.
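Atom's actual kernel-level design (mixed precision, outlier handling, fused low-bit GEMMs) is not reproduced here; as a rough illustration of the low-bit quantization idea, the sketch below applies symmetric per-channel INT4 quantization to a weight matrix. Function names such as `quantize_int4` are illustrative only and not from the paper.

```python
import numpy as np

def quantize_int4(x, axis=-1):
    """Symmetric per-channel INT4 quantization: returns integer codes and scales."""
    # Assumption: a simple symmetric scheme; Atom's method additionally mixes
    # precisions and treats outlier channels specially, which is not shown here.
    max_abs = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = max_abs / 7.0  # symmetric INT4 range is [-7, 7]
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Example: quantize a weight matrix and measure the reconstruction error.
w = np.random.randn(128, 128).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean abs quantization error: {err:.4f}")
```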
no code implementations • 30 Jan 2022 • Weidong Cao, Yilong Zhao, Adith Boloor, Yinhe Han, Xuan Zhang, Li Jiang
This paper presents a new PIM architecture to efficiently accelerate deep learning tasks by minimizing the required A/D conversions with analog accumulation and neural-approximated peripheral circuits.
no code implementations • 2 Mar 2021 • Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang
However, it is challenging for crossbar architectures to exploit the sparsity in DNNs.
no code implementations • 29 Aug 2019 • Geng Yuan, Xiaolong Ma, Caiwen Ding, Sheng Lin, Tianyun Zhang, Zeinab S. Jalali, Yilong Zhao, Li Jiang, Sucheta Soundarajan, Yanzhi Wang
Memristor-based weight pruning and weight quantization have been separately investigated and proven effective in reducing area and power consumption compared to the original DNN model.
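As a rough, software-level illustration of combining magnitude-based weight pruning with uniform quantization (the paper studies these jointly under memristor crossbar constraints, which this sketch does not model; the helper `prune_and_quantize` is hypothetical):

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.5, n_bits=4):
    """Magnitude-prune a weight matrix, then uniformly quantize the surviving weights."""
    # Assumption: unstructured magnitude pruning plus symmetric uniform quantization;
    # the paper's memristor-aware joint scheme is not reproduced here.
    threshold = np.quantile(np.abs(w).ravel(), sparsity)  # drop smallest-magnitude weights
    mask = np.abs(w) > threshold
    pruned = w * mask
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(pruned)) / qmax if np.any(mask) else 1.0
    q = np.clip(np.round(pruned / scale), -qmax, qmax)
    return q * scale, mask

# Example: prune 70% of weights, quantize the rest to 4 bits.
w = np.random.randn(64, 64).astype(np.float32)
w_hat, mask = prune_and_quantize(w, sparsity=0.7, n_bits=4)
print(f"kept {mask.mean():.0%} of weights")
```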