no code implementations • 27 Apr 2024 • Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe
Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations.
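For intuition, here is a minimal fast-gradient-sign (FGSM) sketch of how such an adversarial perturbation can be crafted. This is an illustrative attack in PyTorch, not the method studied in the paper, and `model` is a placeholder classifier:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Perturb x in the direction of the loss gradient's sign (inputs in [0, 1])."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient step, then clamp back to the valid input range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```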
no code implementations • 21 Feb 2024 • Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, LiWei Wang
Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size.
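For concreteness, here is a standard dynamic-programming task of the kind this analysis covers: longest common subsequence, solved by the usual O(nm) table fill (an illustrative example, not code from the paper):

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

assert lcs_length("transformer", "former") == 6
```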
no code implementations • 12 Feb 2024 • Elvis Dohmatob, Yunzhen Feng, Julia Kempe
In the era of proliferating large language and image generation models, "model collapse" refers to the phenomenon whereby a model trained recursively on data generated by previous generations of itself degrades over time, until it eventually becomes completely useless, i.e., the model collapses.
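A toy simulation of this recursive-training loop, assuming a one-dimensional Gaussian "model" (purely illustrative): each generation is fit to samples drawn from the previous generation's fit, and the estimated scale drifts downward until the distribution degenerates:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100)  # generation-0 "real" data
mu, sigma = data.mean(), data.std()

for gen in range(1, 21):
    # Train the next generation only on data sampled from the previous fit.
    data = rng.normal(mu, sigma, size=100)
    mu, sigma = data.mean(), data.std()
    print(f"generation {gen:2d}: sigma = {sigma:.3f}")  # shrinks on average
```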
no code implementations • 10 Feb 2024 • Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe
We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with the number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data.
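One way such scaling effects are measured in practice (a sketch with hypothetical loss values, not the paper's data) is to fit a power law L(N) = A·N^(−α) + C per data generation and track how the exponent α degrades:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    return a * n ** (-alpha) + c

n = np.array([1e3, 1e4, 1e5, 1e6, 1e7])          # dataset sizes
loss = np.array([2.39, 1.25, 0.80, 0.62, 0.55])  # hypothetical losses

(a, alpha, c), _ = curve_fit(power_law, n, loss, p0=(30.0, 0.4, 0.5))
print(f"fitted exponent alpha = {alpha:.3f}")
```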
no code implementations • 28 Sep 2020 • Chizhou Liu, Yunzhen Feng, Ranran Wang, Bin Dong
Moreover, SWEEN models constructed using a few small models can achieve comparable performance to a single large model with a notable reduction in training time.
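A minimal sketch of the weighted soft-ensembling idea (the member models and weights are placeholders, and the certification step via randomized smoothing that the paper builds on top is omitted):

```python
import torch

def ensemble_probs(models, weights, x):
    """Weighted average of the member models' predicted class probabilities."""
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])  # (k, B, C)
    w = torch.tensor(weights).view(-1, 1, 1)                     # (k, 1, 1)
    return (w * probs).sum(dim=0)                                # (B, C)
```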
no code implementations • 24 Jul 2020 • Yunzhen Feng, Runtian Zhai, Di He, Li-Wei Wang, Bin Dong
Our experiments show that TD can provide fine-grained information for varied downstream tasks, and that for models trained from different initializations, the learned features differ in terms of downstream-task predictions.
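A rough sketch of such a comparison (a hypothetical helper, not the paper's exact TD metric): fit the same downstream probe on each model's features and measure how often the two probes' predictions disagree on held-out data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def downstream_disagreement(feats_a, feats_b, y, feats_a_test, feats_b_test):
    """Fraction of test points on which probes over two feature sets disagree."""
    probe_a = LogisticRegression(max_iter=1000).fit(feats_a, y)
    probe_b = LogisticRegression(max_iter=1000).fit(feats_b, y)
    pred_a = probe_a.predict(feats_a_test)
    pred_b = probe_b.predict(feats_b_test)
    return np.mean(pred_a != pred_b)
```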