Search Results for author: Joshua Rosenkranz

Found 1 paper, 1 paper with code

Accelerating Production LLMs with Combined Token/Embedding Speculators

1 code implementation • 29 Apr 2024 • Davis Wertheimer, Joshua Rosenkranz, Thomas Parnell, Sahil Suneja, Pavithra Ranganathan, Raghu Ganti, Mudhakar Srivatsa

This technical report describes the design and training of novel speculative decoding draft models for accelerating the inference speed of large language models in a production environment.
