no code implementations • 5 Feb 2024 • Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.
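For orientation, a minimal PyTorch sketch of this default setup; the toy transformer and hyperparameters below are illustrative assumptions, not the paper's experimental configuration.

```python
# Minimal sketch: AdamW as the default optimizer for a transformer-style model.
# Model size, learning rate, and weight decay are illustrative, not from the paper.
import torch
from torch import nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x = torch.randn(8, 16, 64)        # (batch, sequence, features)
loss = model(x).pow(2).mean()     # dummy objective
loss.backward()
optimizer.step()
optimizer.zero_grad()
```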
1 code implementation • 9 Dec 2023 • Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani
Second-order methods, such as KFAC, can be useful for neural network training.
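As a rough sketch (not the paper's algorithm), KFAC approximates the curvature block of one linear layer by a Kronecker product of two small matrices built from the layer's inputs and output gradients; the shapes, damping, and pseudo-data below are illustrative assumptions.

```python
# Minimal sketch of the Kronecker-factored (KFAC) curvature approximation for a
# single linear layer. All quantities below are toy placeholders.
import torch

N, d_in, d_out = 128, 20, 10
a = torch.randn(N, d_in)             # layer inputs
g = torch.randn(N, d_out)            # gradients w.r.t. layer outputs (pseudo-data)
grad_W = g.T @ a / N                 # mini-batch gradient of the weight matrix

A = a.T @ a / N                      # Kronecker factor from the inputs
G = g.T @ g / N                      # Kronecker factor from the output gradients
damping = 1e-3

# KFAC preconditioning: (G + lambda*I)^{-1} grad_W (A + lambda*I)^{-1},
# avoiding the (d_in*d_out) x (d_in*d_out) full curvature matrix.
precond = torch.linalg.solve(G + damping * torch.eye(d_out), grad_W)
precond = torch.linalg.solve(A + damping * torch.eye(d_in), precond.T).T
```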
no code implementations • 29 Sep 2023 • Jonathan Wenger, Felix Dangel, Agustinus Kristiadi
Our empirical results demonstrate that this is not the case in optimization, uncertainty quantification or continual learning.
no code implementations • 5 Jul 2023 • Felix Dangel
Despite their intuitive simplicity, convolutions are more tedious to analyze than dense layers, which complicates the generalization of theoretical and algorithmic ideas.
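One way to see the difficulty, as a minimal sketch with illustrative shapes: via im2col (torch.nn.functional.unfold), a convolution can be rewritten as a dense matrix multiply, but only after reshaping and duplicating the input, which is what makes the analysis more tedious than for a plain dense layer.

```python
# Sketch: a 2d convolution expressed as a matrix multiply via im2col/unfold.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)           # (batch, channels, height, width)
weight = torch.randn(5, 3, 3, 3)      # (out_channels, in_channels, kH, kW)

out_conv = F.conv2d(x, weight)        # direct convolution, shape (1, 5, 6, 6)

cols = F.unfold(x, kernel_size=3)     # (1, 3*3*3, 36) unfolded input patches
out_mat = weight.view(5, -1) @ cols   # ordinary dense matrix multiply
out_mat = out_mat.view(1, 5, 6, 6)

print(torch.allclose(out_conv, out_mat, atol=1e-5))  # True: same operation
```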
3 code implementations • 4 Jun 2021 • Felix Dangel, Lukas Tatzel, Philipp Hennig
Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks.
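Such curvature is typically accessed matrix-free. Below is a minimal sketch of a Hessian-vector product via double backpropagation in PyTorch; the model, data, and loss are illustrative assumptions, not the paper's setup.

```python
# Sketch: matrix-free curvature access through Hessian-vector products.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
params = [p for p in model.parameters() if p.requires_grad]
x, y = torch.randn(16, 10), torch.randn(16, 1)

loss = nn.functional.mse_loss(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)

v = [torch.randn_like(p) for p in params]         # direction vector
hvp = torch.autograd.grad(                         # Hessian-vector product
    sum((g * vi).sum() for g, vi in zip(grads, v)), params
)
```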
2 code implementations • NeurIPS 2021 • Frank Schneider, Felix Dangel, Philipp Hennig
When engineers train deep learning models, they are very much 'flying blind'.
1 code implementation • ICLR 2020 • Felix Dangel, Frederik Kunstner, Philipp Hennig
Automatic differentiation frameworks are optimized for exactly one thing: computing the average mini-batch gradient.
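A minimal sketch of that limitation, with an illustrative model and data: one backward pass returns only the averaged gradient, while recovering per-sample gradients naively takes one backward pass per example.

```python
# Sketch: averaged mini-batch gradient vs. naive per-sample gradients in PyTorch.
import torch
from torch import nn

model = nn.Linear(5, 1)
x, y = torch.randn(8, 5), torch.randn(8, 1)
loss_fn = nn.MSELoss()

# What autodiff frameworks are optimized for: one backward pass, averaged gradient.
loss_fn(model(x), y).backward()
avg_grad = model.weight.grad.clone()

# Per-sample gradients the naive way: one backward pass per example.
per_sample = []
for xi, yi in zip(x, y):
    model.zero_grad()
    loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0)).backward()
    per_sample.append(model.weight.grad.clone())

# The averaged gradient equals the mean of the individual gradients.
print(torch.allclose(avg_grad, torch.stack(per_sample).mean(0), atol=1e-6))
```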
1 code implementation • 5 Feb 2019 • Felix Dangel, Stefan Harmeling, Philipp Hennig
We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian).
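As a rough illustration of the block-diagonal view (not the paper's modular backpropagation scheme), the sketch below computes one exact Hessian block per parameter tensor with double backprop and ignores cross-layer blocks; the model and data are illustrative assumptions.

```python
# Sketch: block-diagonal curvature as one Hessian block per parameter tensor.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 3), nn.Sigmoid(), nn.Linear(3, 1))
x, y = torch.randn(10, 4), torch.randn(10, 1)
loss = nn.functional.mse_loss(model(x), y)

blocks = {}
for name, p in model.named_parameters():
    (g,) = torch.autograd.grad(loss, p, create_graph=True)
    g = g.reshape(-1)
    # Differentiating each gradient entry again gives this parameter's diagonal block.
    rows = [torch.autograd.grad(g[i], p, retain_graph=True)[0].reshape(-1)
            for i in range(g.numel())]
    blocks[name] = torch.stack(rows)   # (numel, numel) Hessian block
```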