Search Results for author: Yongwei Wu

Found 1 paper, 0 papers with code

Efficient and Economic Large Language Model Inference with Attention Offloading

No code implementations · 3 May 2024 · Shaoyuan Chen, Yutong Lin, Mingxing Zhang, Yongwei Wu

To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading.
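The paper's abstract only names the idea, not its implementation; the intuition is that during LLM decoding, attention is memory-bound (it streams the KV cache) while the rest of the model is compute-bound, so attention can be shipped to cheaper, memory-optimized hardware. A minimal pure-Python sketch of that split, with a hypothetical `attention_device` function standing in for the memory-optimized device (function names, shapes, and the device split here are illustrative assumptions, not the paper's actual system):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_device(q, K, V):
    """Hypothetical memory-optimized device: holds the KV cache and
    runs the memory-bound scaled dot-product attention step."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    weights = softmax(scores)
    # Weighted sum over the value vectors (one output per value dimension).
    return [sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))]

# Compute-optimized device: runs projections / MLP layers locally,
# but offloads the attention call (and the KV cache) elsewhere.
q = [1.0, 0.0]                      # query for the current token
K = [[1.0, 0.0], [0.0, 1.0]]        # cached keys
V = [[1.0, 2.0], [3.0, 4.0]]        # cached values
out = attention_device(q, K, V)
```

Since `q` aligns with the first key, the output is pulled toward the first value row; in a real serving system the call boundary would be a fast interconnect between heterogeneous accelerators rather than a Python function call.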

Tasks: Language Modelling, Large Language Model
