Search Results for author: Qingyuan Li

Found 10 papers, 4 papers with code

Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs

no code implementations • 23 May 2024 • Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Yifan Lu, Yerui Sun, Lin Ma, Yuchen Xie

We introduce Integer Scale, a novel post-training quantization scheme for large language models that resolves the inference bottleneck of current fine-grained quantization approaches while maintaining comparable accuracy.

Quantization
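
The snippet doesn't spell out the mechanism, but the name suggests replacing the floating-point scales used in group-wise quantization with integer ones, so that rescaling stays in integer arithmetic inside the GEMM. A minimal sketch of that idea, combining generic group-wise int4 quantization with a hypothetical fixed-point scale (not the paper's exact formulation):

```python
import numpy as np

def quantize_groupwise_int4(w, group_size=128):
    # Generic symmetric per-group 4-bit quantization (not paper-specific).
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 symmetric range [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def integerize_scales(scale, shift=16):
    # Hypothetical step in the spirit of "integer scale": approximate each
    # float scale as an integer times 2**-shift, so rescaling becomes an
    # integer multiply followed by a shift instead of a float multiply.
    return np.round(scale * (1 << shift)).astype(np.int64), shift

w = np.random.randn(4, 128).astype(np.float32)
q, scale = quantize_groupwise_int4(w)
int_scale, shift = integerize_scales(scale)
acc = q.astype(np.int64) * int_scale   # integer-only rescale
w_hat = acc / float(1 << shift)        # single float conversion at the very end
```

The point of the integerized scale is that the per-group rescale no longer forces a float conversion inside the inner loop, which is the kind of overhead the abstract describes as the fine-grained quantization bottleneck.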

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

1 code implementation • 5 Jan 2024 • Lin Sun, Kai Zhang, Qingyuan Li, Renze Lou

Multimodal information extraction (MIE) has gained significant attention as multimedia content has grown in popularity.

Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

no code implementations • 6 Sep 2023 • Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu

On GLM-130B and OPT-66B, our method even achieves the same level of accuracy at 2-bit quantization as their floating-point counterparts.

Model Compression · Quantization
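
As a rough illustration of the idea the title suggests (adjusting normalization parameters so the quantized model's activations track the float model's), here is a closed-form moment-matching sketch. The paper's actual procedure is different (it is calibration- and loss-guided); the rescaling below is an assumption for illustration only:

```python
import numpy as np

def tweak_ln_params(gamma, beta, out_quant, out_float, eps=1e-6):
    """Rescale LayerNorm affine parameters so the quantized model's
    per-channel output statistics match the float model's.
    out_quant / out_float: calibration activations, shape (tokens, channels).
    """
    a = out_float.std(0) / (out_quant.std(0) + eps)
    b = out_float.mean(0) - a * out_quant.mean(0)
    # LayerNorm output is y = gamma * x_norm + beta, so scaling y by a and
    # shifting by b folds directly into the affine parameters:
    return a * gamma, a * beta + b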

FPTQ: Fine-grained Post-Training Quantization for Large Language Models

no code implementations • 30 Aug 2023 • Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie

In this study, we propose a novel W4A8 post-training quantization method for open-source LLMs that combines the advantages of both recipes.

Quantization
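
W4A8 denotes 4-bit weights with 8-bit activations. The snippet doesn't identify the two recipes being combined, but a generic W4A8 matmul (per-channel int4 weights, per-token int8 activations, integer GEMM, then a rescale) looks roughly like the sketch below; this is not FPTQ's specific layerwise strategy:

```python
import numpy as np

def quant_weights_int4(W):
    # Per-output-channel symmetric int4.
    s_w = np.abs(W).max(axis=1, keepdims=True) / 7.0
    return np.clip(np.round(W / s_w), -8, 7).astype(np.int8), s_w

def quant_acts_int8(X):
    # Per-token (per-row) symmetric int8.
    s_x = np.abs(X).max(axis=1, keepdims=True) / 127.0
    return np.clip(np.round(X / s_x), -127, 127).astype(np.int8), s_x

X = np.random.randn(2, 64).astype(np.float32)   # activations (tokens x features)
W = np.random.randn(16, 64).astype(np.float32)  # weight rows = output channels
Qw, s_w = quant_weights_int4(W)
Qx, s_x = quant_acts_int8(X)
# Integer GEMM, then rescale by the outer product of activation/weight scales.
Y = (Qx.astype(np.int32) @ Qw.T.astype(np.int32)) * (s_x @ s_w.T)
```

The appeal of this layout is that the expensive matmul runs entirely on int8/int4 operands, with the float scales applied only once per output element.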

EAPruning: Evolutionary Pruning for Vision Transformers and CNNs

no code implementations • 1 Oct 2022 • Qingyuan Li, Bo Zhang, Xiangxiang Chu

In this paper, we present a simple and effective pruning approach that can be readily applied to both vision transformers and convolutional neural networks; a generic sketch of such a search follows.
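
The snippet gives no algorithmic detail, but an evolutionary pruning search typically maintains a population of pruning masks and evolves it via selection, crossover, and mutation. A toy sketch with a placeholder fitness function (in a real run, each mask would be scored by the pruned network's validation accuracy):

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, pop_size, generations, keep_ratio = 64, 20, 30, 0.5

def fitness(mask):
    # Placeholder: stands in for evaluating the pruned network on a
    # validation set. Here it just rewards hitting the target sparsity.
    return -abs(mask.mean() - keep_ratio) + 0.01 * rng.standard_normal()

def mutate(mask, p=0.05):
    flip = rng.random(mask.shape) < p
    return np.where(flip, 1 - mask, mask)

def crossover(a, b):
    pick = rng.random(a.shape) < 0.5
    return np.where(pick, a, b)

pop = (rng.random((pop_size, n_channels)) < keep_ratio).astype(np.int8)
for _ in range(generations):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep the fittest half
    children = [mutate(crossover(*parents[rng.choice(len(parents), 2)]))
                for _ in range(pop_size - len(parents))]
    pop = np.concatenate([parents, np.array(children)])

best_mask = pop[np.argmax([fitness(m) for m in pop])]
```

Because the search only needs a black-box fitness signal, the same loop applies unchanged to transformer heads/channels and CNN filters, which fits the abstract's claim of covering both architectures.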
