Search Results for author: Qingyuan Li

Found 10 papers, 4 papers with code

Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs

no code implementations • 23 May 2024 • Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Yifan Lu, Yerui Sun, Lin Ma, Yuchen Xie

We introduce Integer Scale, a novel post-training quantization scheme for large language models that resolves the inference bottleneck of current fine-grained quantization approaches while maintaining comparable accuracy.

Quantization
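
The snippet doesn't spell out the mechanism, but the name suggests replacing the floating-point scales used in group-wise quantization with integer ones, so that rescaling stays in integer arithmetic inside the GEMM. A minimal sketch of that idea, combining generic group-wise int4 quantization with a hypothetical fixed-point scale (not the paper's exact formulation):

```python
import numpy as np

def quantize_groupwise_int4(w, group_size=128):
    # Generic symmetric per-group 4-bit quantization (not paper-specific).
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 symmetric range [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def integerize_scales(scale, shift=16):
    # Hypothetical step in the spirit of "integer scale": approximate each
    # float scale as an integer times 2**-shift, so rescaling becomes an
    # integer multiply followed by a shift instead of a float multiply.
    return np.round(scale * (1 << shift)).astype(np.int64), shift

w = np.random.randn(4, 128).astype(np.float32)
q, scale = quantize_groupwise_int4(w)
int_scale, shift = integerize_scales(scale)
acc = q.astype(np.int64) * int_scale   # integer-only rescale
w_hat = acc / float(1 << shift)        # single float conversion at the very end
```

The point of the integerized scale is that the per-group rescale no longer forces a float conversion inside the inner loop, which is the kind of overhead the abstract describes as the fine-grained quantization bottleneck.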

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

1 code implementation • 5 Jan 2024 • Lin Sun, Kai Zhang, Qingyuan Li, Renze Lou

Multimodal information extraction (MIE) has gained significant attention as multimedia content has grown in popularity.

Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

no code implementations • 6 Sep 2023 • Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu

On GLM-130B and OPT-66B, our method even achieves the same level of accuracy at 2-bit quantization as their floating-point counterparts.

Model Compression · Quantization
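
As a rough illustration of the idea the title suggests (adjusting normalization parameters so the quantized model's activations track the float model's), here is a closed-form moment-matching sketch. The paper's actual procedure is different (it is calibration- and loss-guided); the rescaling below is an assumption for illustration only:

```python
import numpy as np

def tweak_ln_params(gamma, beta, out_quant, out_float, eps=1e-6):
    """Rescale LayerNorm affine parameters so the quantized model's
    per-channel output statistics match the float model's.
    out_quant / out_float: calibration activations, shape (tokens, channels).
    """
    a = out_float.std(0) / (out_quant.std(0) + eps)
    b = out_float.mean(0) - a * out_quant.mean(0)
    # LayerNorm output is y = gamma * x_norm + beta, so scaling y by a and
    # shifting by b folds directly into the affine parameters:
    return a * gamma, a * beta + b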

FPTQ: Fine-grained Post-Training Quantization for Large Language Models

no code implementations • 30 Aug 2023 • Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie

In this study, we propose a novel W4A8 post-training quantization method for open-source LLMs that combines the advantages of both recipes.

Quantization
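
W4A8 denotes 4-bit weights with 8-bit activations. The snippet doesn't identify the two recipes being combined, but a generic W4A8 matmul (per-channel int4 weights, per-token int8 activations, integer GEMM, then a rescale) looks roughly like the sketch below; this is not FPTQ's specific layerwise strategy:

```python
import numpy as np

def quant_weights_int4(W):
    # Per-output-channel symmetric int4.
    s_w = np.abs(W).max(axis=1, keepdims=True) / 7.0
    return np.clip(np.round(W / s_w), -8, 7).astype(np.int8), s_w

def quant_acts_int8(X):
    # Per-token (per-row) symmetric int8.
    s_x = np.abs(X).max(axis=1, keepdims=True) / 127.0
    return np.clip(np.round(X / s_x), -127, 127).astype(np.int8), s_x

X = np.random.randn(2, 64).astype(np.float32)   # activations (tokens x features)
W = np.random.randn(16, 64).astype(np.float32)  # weight rows = output channels
Qw, s_w = quant_weights_int4(W)
Qx, s_x = quant_acts_int8(X)
# Integer GEMM, then rescale by the outer product of activation/weight scales.
Y = (Qx.astype(np.int32) @ Qw.T.astype(np.int32)) * (s_x @ s_w.T)
```

The appeal of this layout is that the expensive matmul runs entirely on int8/int4 operands, with the float scales applied only once per output element.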

EAPruning: Evolutionary Pruning for Vision Transformers and CNNs

no code implementations • 1 Oct 2022 • Qingyuan Li, Bo Zhang, Xiangxiang Chu

In this paper, we present a simple and effective pruning approach that can be readily applied to both vision transformers and convolutional neural networks; a generic sketch of such a search follows.
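
The snippet gives no algorithmic detail, but an evolutionary pruning search typically maintains a population of pruning masks and evolves it via selection, crossover, and mutation. A toy sketch with a placeholder fitness function (in a real run, each mask would be scored by the pruned network's validation accuracy):

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, pop_size, generations, keep_ratio = 64, 20, 30, 0.5

def fitness(mask):
    # Placeholder: stands in for evaluating the pruned network on a
    # validation set. Here it just rewards hitting the target sparsity.
    return -abs(mask.mean() - keep_ratio) + 0.01 * rng.standard_normal()

def mutate(mask, p=0.05):
    flip = rng.random(mask.shape) < p
    return np.where(flip, 1 - mask, mask)

def crossover(a, b):
    pick = rng.random(a.shape) < 0.5
    return np.where(pick, a, b)

pop = (rng.random((pop_size, n_channels)) < keep_ratio).astype(np.int8)
for _ in range(generations):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep the fittest half
    children = [mutate(crossover(*parents[rng.choice(len(parents), 2)]))
                for _ in range(pop_size - len(parents))]
    pop = np.concatenate([parents, np.array(children)])

best_mask = pop[np.argmax([fitness(m) for m in pop])]
```

Because the search only needs a black-box fitness signal, the same loop applies unchanged to transformer heads/channels and CNN filters, which fits the abstract's claim of covering both architectures.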
