Search Results for author: Xiansheng Chen

Temporal Scaling Law for Large Language Models

We first investigate the imbalance of loss across token positions and develop a reciprocal law that holds across model scales and training stages.

