Search Results for author: Gaochang Xie

Found 1 paper, 0 papers with code

Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization

no code implementations • 12 May 2024 • Xinyuan Zhang, Jiang Liu, Zehui Xiong, Yudong Huang, Gaochang Xie, Ran Zhang

Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs.

Language Modelling, Large Language Model, +2 more

