no code implementations • 15 May 2024 • Chi Ma, Mincong Huang, Chao Wang, Yujie Wang, Lei Yu
In this work, we systematically investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models.
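Dynamic activation here refers broadly to skipping computation for neurons whose activations are expected to be negligible. The paper's exact mechanism is not reproduced here; the sketch below only illustrates one common variant, magnitude-threshold sparsification in a LLaMA-style gated MLP, and every name in it (`DynamicActivationMLP`, `threshold`) is hypothetical.

```python
# Hypothetical sketch of threshold-based dynamic activation in a
# LLaMA-style gated MLP; an illustration, not the paper's method.
import torch
import torch.nn as nn

class DynamicActivationMLP(nn.Module):
    """SwiGLU-style MLP with magnitude-based activation sparsity."""

    def __init__(self, hidden: int, intermediate: int, threshold: float = 0.05):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, intermediate, bias=False)
        self.up_proj = nn.Linear(hidden, intermediate, bias=False)
        self.down_proj = nn.Linear(intermediate, hidden, bias=False)
        self.threshold = threshold  # hypothetical sparsification knob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = nn.functional.silu(self.gate_proj(x))
        # Dynamic activation: zero out neurons with small gate magnitude.
        # A real kernel would skip the corresponding down_proj rows
        # entirely instead of multiplying by zero.
        mask = (gate.abs() >= self.threshold).to(gate.dtype)
        hidden = gate * mask * self.up_proj(x)
        return self.down_proj(hidden)

mlp = DynamicActivationMLP(hidden=64, intermediate=256)
out = mlp(torch.randn(2, 8, 64))
print(out.shape)  # torch.Size([2, 8, 64])
```

In practice the speedup comes from skipping the masked rows of the down projection on hardware, not from the masking itself, which is why such mechanisms are usually paired with custom sparse kernels.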
no code implementations • 4 Jan 2024 • Mincong Huang, Chao Wang, Chi Ma, Yineng Zhang, Peng Zhang, Lei Yu
Pipeline parallelism is an essential technique in the training of large-scale Transformer models.
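As a rough illustration of what a pipeline-parallel schedule looks like, the following single-process simulation runs a GPipe-style fill-and-drain forward pass over micro-batches. It is a sketch under the assumption of two toy stages on one device, not the paper's training system; names such as `pipeline_forward` and the tick-based loop are hypothetical.

```python
# Hypothetical single-process simulation of a GPipe-style
# fill-and-drain pipeline schedule; not a distributed implementation.
import torch
import torch.nn as nn

# Split a toy network into two pipeline stages. In a real system each
# stage would hold a contiguous slice of Transformer layers on its own
# device.
stages = [
    nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 32)),
]

def pipeline_forward(batch: torch.Tensor, num_microbatches: int = 4) -> torch.Tensor:
    """At clock tick t, stage s processes micro-batch t - s, so the
    stages work on different micro-batches concurrently on real
    hardware; the idle ticks at the start and end are the pipeline
    bubble."""
    micro = list(batch.chunk(num_microbatches))
    n_stages = len(stages)
    acts = [None] * n_stages  # in-flight activation produced by each stage
    outputs = []
    for t in range(num_microbatches + n_stages - 1):
        # Iterate stages in reverse so each stage reads its input from
        # the previous tick before it is overwritten.
        for s in reversed(range(n_stages)):
            mb_idx = t - s
            if 0 <= mb_idx < num_microbatches:
                inp = micro[mb_idx] if s == 0 else acts[s - 1]
                acts[s] = stages[s](inp)
                if s == n_stages - 1:
                    outputs.append(acts[s])
    return torch.cat(outputs)

out = pipeline_forward(torch.randn(16, 32))
print(out.shape)  # torch.Size([16, 32])
```

With S stages and M micro-batches the schedule takes M + S - 1 ticks instead of M * S, which is why larger micro-batch counts shrink the relative size of the bubble.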