no code implementations • 24 May 2024 • Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan
Here, we investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, and discuss how they learn pertinent features in multi-index models, that is target functions with low-dimensional relevant directions.
1 code implementation • 7 Feb 2024 • Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro
To our knowledge, our results provides the first tight description of the impact of feature learning in the generalization of two-layer neural networks in the large learning rate regime $\eta=\Theta_{d}(d)$, beyond perturbative finite width corrections of the conjugate and neural tangent kernels.
no code implementations • 5 Feb 2024 • Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala
In particular, multi-pass GD with finite stepsize is found to overcome the limitations of gradient flow and single-pass GD given by the information exponent (Ben Arous et al., 2021) and leap exponent (Abbe et al., 2023) of the target function.
1 code implementation • 29 May 2023 • Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
The picture drastically improves over multiple gradient steps: we show that a batch-size of $n = \mathcal{O}(d)$ is indeed enough to learn multiple target directions satisfying a staircase property, where more and more directions can be learned over time.
1 code implementation • 17 Feb 2023 • Luca Pesce, Florent Krzakala, Bruno Loureiro, Ludovic Stephan
Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we ask ourselves the question: "when is a single Gaussian enough to characterize the error?".
1 code implementation • 26 May 2022 • Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors.