Search Results for author: Jinwei Yao

Found 1 paper, 0 papers with code

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

no code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference.
