3 May 2024 • Shaoyuan Chen, Yutong Lin, Mingxing Zhang, Yongwei Wu
To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading.
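A minimal sketch of the idea, under the assumption (not stated in this excerpt) that "attention offloading" means placing the memory-bound attention-over-KV-cache step on a separate, memory-optimized device while the compute-bound projections stay on the main accelerator; all function and variable names here are illustrative, not from the paper:

```python
import numpy as np

def attention(q, K, V):
    # Memory-bound step: streams the whole KV cache for each new token.
    # In an attention-offloading design, this part would run on the
    # cheaper memory-optimized device that holds K and V.
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def decode_step(x, Wq, Wo, K, V):
    # Compute-bound projections stay on the high-end accelerator;
    # only the small query vector and the attention output would
    # cross the interconnect between the two devices.
    q = Wq @ x                  # accelerator side
    ctx = attention(q, K, V)    # offloaded side (owns the KV cache)
    return Wo @ ctx             # accelerator side
```

In this sketch the split point is deliberately the attention call: the tensors crossing it are O(d) per token, while the KV cache, which grows with sequence length, never leaves the memory device.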