no code implementations • 19 Dec 2023 • Hui Wu, Yi Gan, Feng Yuan, Jing Ma, Wei Zhu, Yutao Xu, Hong Zhu, Yuhua Zhu, Xiaoli Liu, Jinghui Gu
A customized Scaled-Dot-Product-Attention kernel is designed to match our fusion policy based on the segment KV cache solution.
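As a rough illustration only, and not the authors' kernel, the sketch below shows how scaled dot-product attention can be computed when keys and values sit in separate buffers instead of one contiguous cache. It assumes that a "segment" KV cache means the cache is split into independently stored pieces (for example a prompt segment and a response segment); the excerpt does not spell this out. The function names `sdpa_reference` and `sdpa_segmented` are hypothetical, and pure Python lists stand in for device tensors.

```python
import math


def sdpa_reference(q, keys, values):
    """Plain scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def sdpa_segmented(q, segments):
    """Attention over K/V held in separate segments, with no concatenation.

    `segments` is a list of (keys, values) pairs (an assumed layout).
    A running max and running sum are carried across segments, so the
    result equals attention over the concatenated cache.
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = []  # unnormalized weighted sum of value vectors
    for keys, values in segments:
        if not keys:
            continue  # an empty segment contributes nothing
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the previous maximum.
        correction = math.exp(running_max - new_max) if acc else 0.0
        weights = [math.exp(s - new_max) for s in scores]
        dim = len(values[0])
        if not acc:
            acc = [0.0] * dim
        acc = [correction * acc[j]
               + sum(w * v[j] for w, v in zip(weights, values))
               for j in range(dim)]
        running_sum = correction * running_sum + sum(weights)
        running_max = new_max
    return [a / running_sum for a in acc]
```

Calling `sdpa_segmented(q, [prompt_kv, response_kv])` should agree, up to floating-point error, with `sdpa_reference` applied to the concatenated keys and values. The point of the running max and sum is that each segment is read in place, so no merged copy of the cache is needed.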