Search Results for author: Ruslan Svirschevski

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.


Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities.

