no code implementations • 27 Apr 2024 • Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding
Due to their infrequent appearance in the text corpus, Scaffold Tokens pose a learning imbalance issue for language models.
no code implementations • 27 Apr 2024 • Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Jianwei Niu, Guiguang Ding
We first investigate the imbalance of loss at each token position and develop a reciprocal law that holds across model scales and training stages.
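To make "imbalance of loss at each token position" and "reciprocal law" more concrete, here is a minimal sketch in Python with NumPy. It averages per-token losses over sequences to get one mean loss per position, then fits a reciprocal curve to that profile. The functional form `L(i) = a / i + c`, the 1-based position indexing, and the helper names `mean_loss_per_position` and `fit_reciprocal` are illustrative assumptions and are not taken from the paper. The authors' actual law may include more terms or parameters that depend on model scale and training stage.

```python
import numpy as np


def mean_loss_per_position(token_losses: np.ndarray) -> np.ndarray:
    """Average per-token losses over sequences (hypothetical helper).

    token_losses has shape (num_sequences, seq_len); entry [s, t] is the
    cross-entropy of predicting the token at position t in sequence s.
    """
    return token_losses.mean(axis=0)


def fit_reciprocal(losses: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of the ASSUMED form L(i) = a / i + c.

    Positions i are 1-based. The model is linear in the features (1/i, 1),
    so ordinary least squares is enough; no nonlinear optimizer is needed.
    """
    positions = np.arange(1, len(losses) + 1, dtype=float)
    design = np.stack([1.0 / positions, np.ones_like(positions)], axis=1)
    (a, c), *_ = np.linalg.lstsq(design, losses, rcond=None)
    return float(a), float(c)


# Usage on synthetic data (not from the paper): 4 sequences of length 512
# whose per-position losses follow L(i) = 3.0 / i + 2.0 exactly.
positions = np.arange(1, 513, dtype=float)
synthetic = np.tile(3.0 / positions + 2.0, (4, 1))
a, c = fit_reciprocal(mean_loss_per_position(synthetic))
print(f"a={a:.3f}, c={c:.3f}")
```

Because the assumed law is linear in `1/i`, the fit reduces to a two-column least-squares problem. A form with an offset inside the denominator, such as `a / (i + b) + c`, would need a nonlinear fitter instead.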