no code implementations • 21 Feb 2024 • Dominik Schröder, Daniil Dmitriev, Hugo Cui, Bruno Loureiro
For a large class of feature maps we provide a tight asymptotic characterisation of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large.
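The setting above can be illustrated with a minimal toy sketch (not the paper's analytical characterisation): a frozen random feature map whose readout layer alone is trained by ridge regression, with dimension, width, and sample size of comparable order. All sizes, the `tanh` nonlinearity, and the linear teacher are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Proportional regime (illustrative sizes): input dim d, width p, samples n
d, p, n, n_test = 100, 150, 200, 1000

# Frozen random first layer and a linear teacher generating the labels
W = rng.standard_normal((p, d)) / np.sqrt(d)
teacher = rng.standard_normal(d) / np.sqrt(d)

def features(X):
    """Toy nonlinear random feature map phi(x) = tanh(W x)."""
    return np.tanh(X @ W.T)

X_train = rng.standard_normal((n, d))
y_train = X_train @ teacher
X_test = rng.standard_normal((n_test, d))
y_test = X_test @ teacher

# Train only the readout layer with ridge regression
lam = 1e-2
Phi = features(X_train)
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y_train)

# Empirical test error; the paper characterises its deterministic limit
test_error = np.mean((features(X_test) @ a - y_test) ** 2)
```

Repeating this over growing `d, p, n` at fixed ratios is how one would check an asymptotic prediction empirically.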
1 code implementation • 7 Feb 2024 • Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro
In this manuscript, we investigate how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step.
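A hedged sketch of this protocol (sizes, step size, and the single-index teacher are illustrative assumptions, not the paper's exact setup): take one full-batch gradient step on the first layer of a two-layer `tanh` network, then refit the readout on the updated features by ridge regression.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p, n = 50, 100, 400

# Illustrative single-index target: labels depend on one input direction
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.tanh(X @ w_star)

W = rng.standard_normal((p, d)) / np.sqrt(d)   # first layer
a = rng.standard_normal(p) / np.sqrt(p)        # initial readout

def forward(W, X):
    return np.tanh(X @ W.T)

# One full-batch gradient step on the first layer only (squared loss)
eta = 1.0
phi = forward(W, X)                            # (n, p) hidden activations
residual = phi @ a - y                         # (n,)
grad_W = ((residual[:, None] * a[None, :] * (1 - phi ** 2)).T @ X) / n
W1 = W - eta * grad_W

# Refit the readout on the updated (feature-learning) representation
lam = 1e-3
Phi = forward(W1, X)
a1 = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)
```

Comparing the refit model against the same pipeline with `W1 = W` (no step) is the natural baseline for the kernel regime.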
no code implementations • 6 Feb 2024 • Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová
We investigate how a dot-product attention layer learns a positional attention matrix (with tokens attending to each other based on their respective positions) and a semantic attention matrix (with tokens attending to each other based on their meaning).
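To make the object of study concrete, here is a minimal sketch of a single dot-product attention layer acting on tokens that superpose content and position. The identity-based positional encoding and the random query/key weights are toy assumptions; the point is only that the same attention matrix can be driven by positions, by semantics, or by a mix of the two.

```python
import numpy as np

rng = np.random.default_rng(2)
L, d = 6, 16   # sequence length, embedding dimension

semantic = rng.standard_normal((L, d))   # token content
positional = np.eye(L, d)                # toy positional encoding
tokens = semantic + positional

def attention_matrix(X, Wq, Wk):
    """Row-softmax of the dot-product scores (X Wq)(X Wk)^T / sqrt(d)."""
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)

Wq = rng.standard_normal((d, d)) / np.sqrt(d)
Wk = rng.standard_normal((d, d)) / np.sqrt(d)
A = attention_matrix(tokens, Wq, Wk)   # (L, L); each row sums to 1
```

Feeding `positional` alone or `semantic` alone through `attention_matrix` gives purely positional or purely semantic attention patterns, the two regimes contrasted above.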
1 code implementation • 5 Oct 2023 • Hugo Cui, Florent Krzakala, Eric Vanden-Eijnden, Lenka Zdeborová
We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture.
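The training problem can be sketched as follows (a toy instance, not the paper's construction): sample a two-cluster high-dimensional Gaussian mixture, form a linear interpolant between noise and data, and regress a small two-layer autoencoder-style network onto the interpolant's velocity target. The cluster geometry, noise level, and network widths are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 50, 1000

# Two-cluster high-dimensional Gaussian mixture with means ±mu
mu = rng.standard_normal(d) / np.sqrt(d)
labels = rng.integers(0, 2, size=n)
data = (2 * labels - 1)[:, None] * mu + 0.1 * rng.standard_normal((n, d))

# Linear interpolant between noise z and data x: x_t = (1 - t) z + t x,
# whose velocity-field regression target is b(x_t, t) = x - z
t = rng.uniform(0.0, 1.0, size=(n, 1))
z = rng.standard_normal((n, d))
x_t = (1 - t) * z + t * data
target = data - z

# Toy two-layer autoencoder-style parametrization b(x) = V tanh(U x)
k = 20
U = rng.standard_normal((k, d)) / np.sqrt(d)
V = rng.standard_normal((d, k)) / np.sqrt(k)
pred = np.tanh(x_t @ U.T) @ V.T
loss = np.mean((pred - target) ** 2)   # objective a trainer would minimise
```

A full flow-based sampler would minimise `loss` over `U, V` and then integrate the learned velocity field from noise to data; only the objective is sketched here.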
1 code implementation • 1 Feb 2023 • Dominik Schröder, Hugo Cui, Daniil Dmitriev, Bruno Loureiro
Establishing this result requires proving a deterministic equivalent for traces of the deep random features sample covariance matrices, which may be of independent interest.
no code implementations • 1 Feb 2023 • Hugo Cui, Florent Krzakala, Lenka Zdeborová
We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights.
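The target class in question is easy to instantiate; the sketch below (depths, widths, and the `tanh` activation are illustrative assumptions) builds a deep network with random Gaussian weights and widths proportional to the input dimension, then evaluates it as a fixed target function.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 100
widths = [100, 100, 100]   # extensive widths, of the same order as d

# Random Gaussian weights define the fixed (quenched) target network
Ws = []
prev = d
for w in widths:
    Ws.append(rng.standard_normal((w, prev)) / np.sqrt(prev))
    prev = w
v = rng.standard_normal(prev) / np.sqrt(prev)   # readout vector

def target(x):
    """Evaluate the deep, extensive-width, non-linear random target."""
    h = x
    for W in Ws:
        h = np.tanh(W @ h)
    return float(v @ h)

y = target(rng.standard_normal(d))   # one scalar label
```

A learning experiment would draw many inputs, label them with `target`, and train a student network on the resulting pairs.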
no code implementations • 29 Jan 2022 • Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
We find that our rates tightly describe the learning curves for this class of data sets, and are also observed on real data.
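Empirical learning curves of the kind referred to above can be measured with a small experiment; in this sketch the power-law feature spectrum and teacher alignment (standing in for source/capacity-type conditions) and the ridge penalty are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 300

# Power-law feature covariance and a power-law-aligned teacher
eigvals = np.arange(1, d + 1) ** -1.5
theta = np.arange(1, d + 1) ** -1.0
theta /= np.linalg.norm(theta)

def ridge_test_error(n, lam=1e-3):
    X = rng.standard_normal((n, d)) * np.sqrt(eigvals)   # Gaussian design
    y = X @ theta
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # Population test error under this Gaussian design
    return float(np.sum(eigvals * (w - theta) ** 2))

sample_sizes = (50, 100, 200)
errors = [np.mean([ridge_test_error(n) for _ in range(5)]) for n in sample_sizes]
```

Fitting a power law to `errors` versus `sample_sizes` on a log-log scale gives the empirical decay rate that an asymptotic rate prediction would be compared against.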
no code implementations • NeurIPS 2021 • Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
In this work, we unify and extend this line of work, providing a characterization of all the regimes and excess-error decay rates that can be observed in terms of the interplay between noise and regularization.
1 code implementation • NeurIPS 2021 • Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová
While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework.
no code implementations • 9 Dec 2019 • Hugo Cui, Luca Saglietti, Lenka Zdeborová
These large deviations then provide optimal achievable performance boundaries for any active learning algorithm.