no code implementations • 24 May 2024 • Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu
For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6\% speedup.
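The quoted speedup follows directly from the token counts: reaching the baseline's loss with 194B instead of 300B tokens is a 300/194 − 1 ≈ 54.6% improvement. A quick check:

```python
# Verify the reported speedup from the token budgets in the abstract.
baseline_tokens = 300e9   # conventionally trained 7B model
gstack_tokens = 194e9     # G_stack model reaching the same loss
speedup = baseline_tokens / gstack_tokens - 1
print(f"{speedup:.1%}")   # → 54.6%
```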
no code implementations • 1 Apr 2024 • Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu
Building on Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing, with the aim of transferring knowledge from high-resource to low-resource languages.
1 code implementation • 20 Feb 2024 • Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu
The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing.
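The routing step described above — each token dynamically dispatched to a small subset of experts — can be sketched as follows. This is a generic top-k MoE illustration, not the paper's implementation; the names `gate_w` and `experts` are hypothetical.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x: (n_tokens, d) token activations
    gate_w: (d, n_experts) router weights (hypothetical)
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                         # (n_tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                # softmax over the selected experts only
        for w, e in zip(weights, topk[t]):
            out[t] += w * experts[e](x[t])
    return out
```

Because only k experts fire per token, total capacity grows with the number of experts while per-token compute stays roughly constant.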
1 code implementation • 19 Feb 2024 • Zihan Qiu, Zeyu Huang, Youcheng Huang, Jie Fu
The feed-forward networks (FFNs) in transformers are recognized as a group of key-value neural memories that store abstract high-level knowledge.
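The key-value memory view reads the first FFN projection's rows as keys matched against the hidden state, and the second projection's columns as the corresponding value vectors. A minimal sketch of this interpretation (illustrative, not the paper's code):

```python
import numpy as np

def ffn_as_memory(h, W_in, W_out):
    """FFN forward pass written as a key-value memory lookup.

    h: (d,) hidden state; W_in: (d_ff, d) keys as rows; W_out: (d, d_ff) values as columns.
    """
    coeffs = np.maximum(W_in @ h, 0.0)  # key activations (ReLU), one per memory slot
    return W_out @ coeffs               # weighted sum of the slots' value vectors
```

The output is exactly the sum of value columns weighted by how strongly each key fires, which is what motivates treating individual FFN slots as editable knowledge entries.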
1 code implementation • 17 Oct 2023 • Zihan Qiu, Zeyu Huang, Jie Fu
Despite the benefits of modularity, most Language Models (LMs) are still treated as monolithic models in the pre-train and fine-tune paradigm, with their emergent modularity locked and underutilized.
1 code implementation • 17 Oct 2023 • Zihan Qiu, Zhen Liu, Shuicheng Yan, Shanghang Zhang, Jie Fu
It has been shown that semi-parametric methods, which combine standard neural networks with non-parametric components such as external memory modules and data retrieval, are particularly helpful in data scarcity and out-of-distribution (OOD) scenarios.
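A semi-parametric predictor in this sense blends a parametric model's output with labels retrieved from an external, non-parametric memory. The sketch below uses a simple kNN memory and a mixing weight `alpha`; all names are illustrative assumptions, not the paper's API.

```python
import numpy as np

class KNNMemory:
    """External memory of (embedding, label) pairs queried by nearest neighbor."""
    def __init__(self, keys, values):
        self.keys, self.values = keys, values

    def retrieve(self, query, k=3):
        dists = np.linalg.norm(self.keys - query, axis=1)
        idx = np.argsort(dists)[:k]
        return self.values[idx].mean(axis=0)  # average label of the k nearest entries

def semi_parametric_predict(query, parametric_fn, memory, alpha=0.5, k=3):
    """Mix the neural prediction with the retrieved memory estimate."""
    return alpha * parametric_fn(query) + (1 - alpha) * memory.retrieve(query, k=k)
```

Because the memory can be grown or swapped without retraining, such hybrids tend to help exactly where the abstract says: data-scarce and out-of-distribution regimes.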
3 code implementations • 13 Feb 2022 • Jialong Wu, Haixu Wu, Zihan Qiu, Jianmin Wang, Mingsheng Long
Policy constraint methods for offline reinforcement learning (RL) typically employ parameterization or regularization that constrains the policy to actions within the support of the behavior policy.
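The regularization flavor of such constraints can be sketched as a single objective: maximize the learned Q-value while penalizing deviation from the dataset's actions (a TD3+BC-style behavior-cloning penalty, shown here as a generic illustration rather than this paper's method; `lam` and the batch layout are assumptions).

```python
import numpy as np

def constrained_policy_loss(actions, dataset_actions, q_values, lam=2.5):
    """Loss = -mean Q(s, pi(s)) + lam * mean ||pi(s) - a_data||^2 over a batch.

    actions, dataset_actions: (batch, action_dim); q_values: (batch,)
    """
    bc_penalty = np.mean(np.sum((actions - dataset_actions) ** 2, axis=-1))
    return -np.mean(q_values) + lam * bc_penalty
```

Minimizing this keeps the policy's actions close to the behavior policy's support while still exploiting the critic, which is the trade-off the abstract describes.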