Search Results for author: Shiyao Li

Found 7 papers, 2 papers with code

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

no code implementations · 4 Jun 2024 · Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions.

Quantization, Video Generation

Evaluating Quantized Large Language Models

1 code implementation · 28 Feb 2024 · Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Specifically, post-training quantization (PTQ) can effectively reduce both memory consumption and computational overhead in LLMs.
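To illustrate what PTQ does at its simplest, here is a minimal sketch (not the paper's actual code) of symmetric per-tensor 8-bit weight quantization, showing the 4x memory reduction over fp32:

```python
# Minimal post-training quantization sketch (assumed example, not the paper's method):
# symmetric per-tensor 8-bit quantization of a weight matrix.
import numpy as np

def quantize_sym(w: np.ndarray, n_bits: int = 8):
    """Map float weights to signed integers using a single scale factor."""
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax            # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integer codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_sym(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)                   # memory ratio: fp32 vs int8
print(float(np.abs(w - w_hat).max()))         # worst-case rounding error
```

Real PTQ pipelines for LLMs (e.g. GPTQ, as evaluated in the paper) additionally use calibration data and per-group scales, but the core idea of replacing float weights with low-bit integers plus scales is the same.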

Quantization

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

1 code implementation · 6 Feb 2024 · Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.


FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs

no code implementations · 8 Jan 2024 · Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, Hongyi Wang, Wenheng Ma, Hanbo Sun, Shiyao Li, Zixiao Huang, Yadong Dai, Jintao Li, Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, Yu Wang

However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads.

Computational Efficiency, Language Modelling, +2

Enabling Fast 2-bit LLM on GPUs: Memory Alignment and Asynchronous Dequantization

no code implementations · 28 Nov 2023 · Jinhao Li, Shiyao Li, Jiaming Xu, Shan Huang, Yaoxiu Lian, Jun Liu, Yu Wang, Guohao Dai

Weights are quantized in groups, but in some groups the range of weights is large, resulting in large quantization errors and non-negligible accuracy loss (e.g., >3% for Llama2-7b with 2-bit quantization in GPTQ and Greenbit).

Quantization

Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset

no code implementations · 8 Oct 2020 · Zhanwen Chen, Shiyao Li, Roxanne Rashedi, Xiaoman Zi, Morgan Elrod-Erickson, Bryan Hollis, Angela Maliakal, Xinyu Shen, Simeng Zhao, Maithilee Kunda

Modern social intelligence includes the ability to watch videos and answer questions about social and theory-of-mind-related content, e.g., for a scene in Harry Potter, "Is the father really upset about the boys flying the car?"

Question Answering, Visual Question Answering
