no code implementations • 27 Apr 2024 • Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Jianwei Niu, Guiguang Ding
We first investigate the imbalance of loss across token positions and develop a reciprocal law that holds across model scales and training stages.