no code implementations • 28 Dec 2020 • Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function.
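The "local curvature" referred to here is commonly summarized by the largest eigenvalue of the loss Hessian. Below is a minimal sketch, assuming a PyTorch `model` and a scalar `loss`, of estimating that quantity with power iteration over Hessian-vector products; it is illustrative, not the paper's exact measurement protocol.

```python
# Minimal sketch: estimate the top Hessian eigenvalue of a loss w.r.t. model
# parameters via power iteration with Hessian-vector products (autograd).
import torch


def top_hessian_eigenvalue(loss, params, iters=20):
    """Approximate the largest eigenvalue of the Hessian of `loss` w.r.t. `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Start from a random direction of unit norm.
    v = [torch.randn_like(p) for p in params]
    v_norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
    v = [vi / v_norm for vi in v]
    eigenvalue = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate <grad, v> w.r.t. the parameters.
        dot = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(dot, params, retain_graph=True)
        hv_norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        eigenvalue = hv_norm.item()          # ||H v|| with unit v approximates |lambda_max|
        v = [h / (hv_norm + 1e-12) for h in hv]
    return eigenvalue
```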
1 code implementation • 28 Nov 2020 • Taro Makino, Stanislaw Jastrzebski, Witold Oleszkiewicz, Celin Chacko, Robin Ehrenpreis, Naziya Samreen, Chloe Chhor, Eric Kim, Jiyon Lee, Kristine Pysarenko, Beatriu Reig, Hildegard Toth, Divya Awal, Linda Du, Alice Kim, James Park, Daniel K. Sodickson, Laura Heacock, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras
We compare the two with respect to their robustness to Gaussian low-pass filtering, performing a subgroup analysis on microcalcifications and soft tissue lesions.
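A minimal sketch of such a robustness probe is shown below: the same inputs are passed through a model at increasing levels of Gaussian blur and the outputs are compared. The `model` callable and the NumPy image batch are assumptions for illustration, not the paper's pipeline.

```python
# Minimal sketch: collect model outputs on inputs blurred with Gaussian
# low-pass filters of increasing strength, for later comparison against
# the unfiltered predictions.
import numpy as np
from scipy.ndimage import gaussian_filter


def predictions_under_lowpass(model, images, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """Return model outputs for each Gaussian blur strength in `sigmas`."""
    outputs = {}
    for sigma in sigmas:
        if sigma == 0.0:
            blurred = images
        else:
            # Blur each image over its spatial axes; keep the batch axis intact.
            blurred = np.stack([gaussian_filter(img, sigma=sigma) for img in images])
        outputs[sigma] = model(blurred)
    return outputs
```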
no code implementations • WS 2020 • Diksha Meghwal, Katharina Kann, Iacer Calixto, Stanislaw Jastrzebski
Pretrained language models have obtained impressive results for a large set of natural language understanding tasks.
1 code implementation • 20 Jun 2020 • Tobiasz Cieplinski, Tomasz Danel, Sabina Podlewska, Stanislaw Jastrzebski
To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein.
no code implementations • ICLR 2020 • Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras
We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD.
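One of the two quantities mentioned here, noise in the gradient, can be summarized as the trace of the covariance of mini-batch gradients. The sketch below, assuming a PyTorch `model`, `loss_fn`, and data `loader`, shows one way to estimate it; it is a simplified illustration rather than the paper's exact estimator.

```python
# Minimal sketch: estimate gradient noise as the trace of the empirical
# covariance of mini-batch gradients, i.e. the mean squared deviation of
# per-batch gradients from their mean.
import torch


def gradient_noise_trace(model, loss_fn, loader, num_batches=32):
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for i, (x, y) in enumerate(loader):
        if i >= num_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        grads.append(torch.cat([p.grad.flatten() for p in params]))
    g = torch.stack(grads)                      # shape: (num_batches, num_params)
    mean_g = g.mean(dim=0, keepdim=True)
    # Trace of the covariance = expected squared distance from the mean gradient.
    return ((g - mean_g) ** 2).sum(dim=1).mean().item()
```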
1 code implementation • NeurIPS 2019 • Stanislav Fort, Stanislaw Jastrzebski
There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks.
15 code implementations • 2 Feb 2019 • Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly
On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task.
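The small per-task parameter budget comes from adapter modules: narrow bottleneck layers with a residual connection, trained while the pretrained network stays frozen. A minimal sketch of such a module is below; the layer sizes are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: a bottleneck adapter module with a residual connection,
# inserted inside a frozen pretrained network so only a few percent of
# parameters are trained per task.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, hidden_dim, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # project back up
        self.act = nn.GELU()

    def forward(self, x):
        # The residual connection keeps the adapter close to identity at initialization.
        return x + self.up(self.act(self.down(x)))
```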
no code implementations • 28 Jan 2019 • Stanislav Fort, Paweł Krzysztof Nowak, Stanislaw Jastrzebski, Srini Narayanan
In particular, we study how stiffness depends on 1) class membership, 2) distance between data points in the input space, 3) training iteration, and 4) learning rate.
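Stiffness here refers to how the gradients induced by two individual examples relate to each other, typically via the sign or cosine of their inner product. Below is a minimal sketch of that measure for a pair of examples, assuming a PyTorch `model` and a per-example `loss_fn`; it is an illustration of the definition, not the paper's full analysis code.

```python
# Minimal sketch: stiffness between two examples as the cosine similarity
# (and its sign) of their per-example loss gradients.
import torch


def stiffness(model, loss_fn, x1, y1, x2, y2):
    params = [p for p in model.parameters() if p.requires_grad]

    def example_grad(x, y):
        # Add a batch dimension so the model and loss see a batch of size 1.
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.flatten() for g in grads])

    g1, g2 = example_grad(x1, y1), example_grad(x2, y2)
    cos = torch.dot(g1, g2) / (g1.norm() * g2.norm() + 1e-12)
    return cos.item(), torch.sign(cos).item()
```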