no code implementations • 25 Apr 2024 • Shi-Yu Xia, Wenxuan Zhu, Xu Yang, Xin Geng
Compared with the pre-training and fine-tuning approach, SWS achieves better results when initializing variable-sized models adapted to different resource constraints, while reducing the parameters stored to initialize these models by around 20x and pre-training costs by around 10x.