1 code implementation • 12 Dec 2023 • Giang Do, Khiem Le, Quang Pham, TrungTin Nguyen, Thanh-Nam Doan, Bint T. Nguyen, Chenghao Liu, Savitha Ramasamy, XiaoLi Li, Steven Hoi
By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models.