Search Results for author: Gaochang Xie

Found 1 paper, 0 papers with code

Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization

No code implementations • 12 May 2024 • Xinyuan Zhang, Jiang Liu, Zehui Xiong, Yudong Huang, Gaochang Xie, Ran Zhang

Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs.

Language Modelling, Large Language Model, +2
