Search Results for author: Wanzin Yazar

Found 2 papers, 0 papers with code

Combining multiple post-training techniques to achieve most efficient quantized LLMs

no code implementations • 12 May 2024 • Sayeh Sharify, Zifei Xu, Wanzin Yazar, Xin Wang

Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges.

Self-Selected Attention Span for Accelerating Large Language Model Inference

no code implementations • 14 Apr 2024 • Tian Jin, Wanzin Yazar, Zifei Xu, Sayeh Sharify, Xin Wang

We demonstrate that using this custom CUDA kernel improves the throughput of LLM inference by 28%.

Language Modelling • Large Language Model
