1 code implementation • 19 Feb 2024 • Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen
This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.
1 code implementation • 5 Jun 2023 • Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh
Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities.