Search Results for author: Jinwei Yao

Found 1 paper, 0 papers with code

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

no code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference.
