Search Results for author: Alan Zhu

Found 2 papers, 1 paper with code

Accelerating Retrieval-Augmented Language Model Serving with Speculation

no code implementations • 25 Jan 2024 • Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, LanTing LI, Phitchaya Mangpo Phothilimthana, Zhihao Jia

Retrieval-augmented language models (RaLM) have demonstrated the potential to solve knowledge-intensive natural language processing (NLP) tasks by combining a non-parametric knowledge base with a parametric language model.

Language Modelling, Retrieval

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance.

Decoder, Language Modelling
