no code implementations • 24 May 2024 • Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu
For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6\% speedup.
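The quoted speedup follows directly from the token counts: reaching the baseline's loss with 194B instead of 300B tokens is a 300/194 − 1 ≈ 54.6% improvement. A quick check:

```python
# Verify the reported speedup from the token budgets in the abstract.
baseline_tokens = 300e9   # conventionally trained 7B model
gstack_tokens = 194e9     # G_stack model reaching the same loss
speedup = baseline_tokens / gstack_tokens - 1
print(f"{speedup:.1%}")   # → 54.6%
```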
no code implementations • 1 Apr 2024 • Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu
Building on Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing, with the aim of transferring knowledge from high-resource to low-resource languages.
1 code implementation • 20 Feb 2024 • Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu
The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing.
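The routing step described above — each token dynamically dispatched to a small subset of experts — can be sketched as follows. This is a generic top-k MoE illustration, not the paper's implementation; the names `gate_w` and `experts` are hypothetical.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x: (n_tokens, d) token activations
    gate_w: (d, n_experts) router weights (hypothetical)
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                         # (n_tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                # softmax over the selected experts only
        for w, e in zip(weights, topk[t]):
            out[t] += w * experts[e](x[t])
    return out
```

Because only k experts fire per token, total capacity grows with the number of experts while per-token compute stays roughly constant.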
1 code implementation • 19 Feb 2024 • Zihan Qiu, Zeyu Huang, Youcheng Huang, Jie Fu
The feed-forward networks (FFNs) in transformers are recognized as a group of key-value neural memories that store abstract high-level knowledge.
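The key-value memory view reads the first FFN projection's rows as keys matched against the hidden state, and the second projection's columns as the corresponding value vectors. A minimal sketch of this interpretation (illustrative, not the paper's code):

```python
import numpy as np

def ffn_as_memory(h, W_in, W_out):
    """FFN forward pass written as a key-value memory lookup.

    h: (d,) hidden state; W_in: (d_ff, d) keys as rows; W_out: (d, d_ff) values as columns.
    """
    coeffs = np.maximum(W_in @ h, 0.0)  # key activations (ReLU), one per memory slot
    return W_out @ coeffs               # weighted sum of the slots' value vectors
```

The output is exactly the sum of value columns weighted by how strongly each key fires, which is what motivates treating individual FFN slots as editable knowledge entries.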
1 code implementation • 17 Oct 2023 • Zihan Qiu, Zeyu Huang, Jie Fu
Despite the benefits of modularity, most Language Models (LMs) are still treated as monolithic models in the pre-train and fine-tune paradigm, with their emergent modularity locked and underutilized.
1 code implementation • 17 Oct 2023 • Zihan Qiu, Zhen Liu, Shuicheng Yan, Shanghang Zhang, Jie Fu
It has been shown that semi-parametric methods, which combine standard neural networks with non-parametric components such as external memory modules and data retrieval, are particularly helpful in data scarcity and out-of-distribution (OOD) scenarios.
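A semi-parametric predictor in this sense blends a parametric model's output with labels retrieved from an external, non-parametric memory. The sketch below uses a simple kNN memory and a mixing weight `alpha`; all names are illustrative assumptions, not the paper's API.

```python
import numpy as np

class KNNMemory:
    """External memory of (embedding, label) pairs queried by nearest neighbor."""
    def __init__(self, keys, values):
        self.keys, self.values = keys, values

    def retrieve(self, query, k=3):
        dists = np.linalg.norm(self.keys - query, axis=1)
        idx = np.argsort(dists)[:k]
        return self.values[idx].mean(axis=0)  # average label of the k nearest entries

def semi_parametric_predict(query, parametric_fn, memory, alpha=0.5, k=3):
    """Mix the neural prediction with the retrieved memory estimate."""
    return alpha * parametric_fn(query) + (1 - alpha) * memory.retrieve(query, k=k)
```

Because the memory can be grown or swapped without retraining, such hybrids tend to help exactly where the abstract says: data-scarce and out-of-distribution regimes.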
3 code implementations • 13 Feb 2022 • Jialong Wu, Haixu Wu, Zihan Qiu, Jianmin Wang, Mingsheng Long
Policy constraint methods for offline reinforcement learning (RL) typically employ parameterization or regularization that constrains the policy to actions within the support of the behavior policy.
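The regularization flavor of such constraints can be sketched as a single objective: maximize the learned Q-value while penalizing deviation from the dataset's actions (a TD3+BC-style behavior-cloning penalty, shown here as a generic illustration rather than this paper's method; `lam` and the batch layout are assumptions).

```python
import numpy as np

def constrained_policy_loss(actions, dataset_actions, q_values, lam=2.5):
    """Loss = -mean Q(s, pi(s)) + lam * mean ||pi(s) - a_data||^2 over a batch.

    actions, dataset_actions: (batch, action_dim); q_values: (batch,)
    """
    bc_penalty = np.mean(np.sum((actions - dataset_actions) ** 2, axis=-1))
    return -np.mean(q_values) + lam * bc_penalty
```

Minimizing this keeps the policy's actions close to the behavior policy's support while still exploiting the critic, which is the trade-off the abstract describes.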