no code implementations • 24 May 2024 • Leonardo Defilippis, Bruno Loureiro, Theodor Misiakiewicz
Our main contribution is a general deterministic equivalent for the test error of random features ridge regression (RFRR).
no code implementations • 13 Mar 2024 • Hong Hu, Yue M. Lu, Theodor Misiakiewicz
On the other hand, if $p = o(n)$, the number of random features $p$ is the limiting factor, and the RFRR test error matches the approximation error of the random features model class (akin to taking $n = \infty$).
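To make this concrete, here is a minimal RFRR sketch on synthetic data; the ReLU features, Gaussian inputs, and target function are illustrative choices, not taken from the paper:

```python
# Minimal sketch (not from the paper): random features ridge regression on
# synthetic Gaussian data, varying the number of features p at fixed n.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test, lam = 20, 4000, 2000, 1e-3

def target(X):
    # Illustrative nonlinear target function.
    return np.tanh(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

X, Xt = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y, yt = target(X), target(Xt)

for p in [10, 50, 200, 1000]:
    W = rng.standard_normal((p, d)) / np.sqrt(d)             # random first-layer weights
    F, Ft = np.maximum(X @ W.T, 0), np.maximum(Xt @ W.T, 0)  # ReLU features
    a = np.linalg.solve(F.T @ F + lam * np.eye(p), F.T @ y)  # ridge solution
    print(f"p = {p:4d}:  test error = {np.mean((Ft @ a - yt) ** 2):.4f}")
```

With $n$ large relative to $p$, increasing $p$ is what reduces the error, consistent with the approximation error being the limiting factor in this regime.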
no code implementations • 13 Mar 2024 • Theodor Misiakiewicz, Basil Saeed
Specifically, we establish in this setting a deterministic approximation for the test error of kernel ridge regression (KRR), with explicit non-asymptotic bounds, that depends only on the kernel eigenvalues and on the alignment of the target function with the kernel eigenvectors.
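One common way to parametrize such a deterministic approximation, sketched below, follows the "effective regularization" fixed point used throughout this literature; the paper's exact statement, normalization conventions, and error bounds may differ.

```python
import numpy as np

def krr_risk_predictor(lams, c2, n, delta, sigma2=0.0, iters=500):
    """Deterministic-equivalent test-error predictor for KRR (a sketch).

    lams   : kernel eigenvalues
    c2     : squared alignment of the target with each kernel eigenvector
    n      : sample size;  delta : ridge penalty;  sigma2 : noise variance
    """
    # Effective ridge kappa solving  sum_i lam_i/(lam_i + kappa) + delta/kappa = n,
    # via a monotone fixed-point iteration started above the solution.
    kappa = (delta + lams.sum()) / n
    for _ in range(iters):
        kappa = (delta + kappa * np.sum(lams / (lams + kappa))) / n
    L = lams / (lams + kappa)                  # per-mode "learnabilities"
    e0 = n / (n - np.sum(L ** 2))              # overfitting amplification factor
    return e0 * (np.sum(c2 * (1.0 - L) ** 2) + sigma2)

# Example: power-law spectrum, target supported on the top 10 eigenvectors.
lams = 1.0 / np.arange(1, 2001) ** 2.0
c2 = np.zeros_like(lams)
c2[:10] = 1.0
print(krr_risk_predictor(lams, c2, n=200, delta=1e-4, sigma2=0.01))
```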
no code implementations • 25 Aug 2023 • Theodor Misiakiewicz, Andrea Montanari
In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models.
no code implementations • 21 Feb 2023 • Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz
For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f), 2)})$.
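As an illustration, $\mathrm{Leap}(f)$ can be computed by brute force from the supports of the monomials of $f$. The sketch below reflects one reading of the definition (the smallest, over orderings of the monomials, of the largest number of new coordinates any monomial introduces); see the paper for the formal statement:

```python
from itertools import permutations

def leap(supports):
    """Brute-force leap: min over monomial orderings of the max number of
    previously unseen coordinates that any monomial introduces."""
    best = float("inf")
    for order in permutations(supports):
        seen, worst = set(), 0
        for S in order:
            worst = max(worst, len(set(S) - seen))
            seen |= set(S)
        best = min(best, worst)
    return best

# f(x) = x1 + x1*x2*x3 + x1*x2*x3*x4*x5: along the staircase, every monomial
# adds at most 2 new coordinates, so Leap(f) = 2.
print(leap([{1}, {1, 2, 3}, {1, 2, 3, 4, 5}]))   # -> 2
# A degree-4 parity introduces 4 coordinates at once: Leap = 4.
print(leap([{1, 2, 3, 4}]))                      # -> 4
```

Under the stated conjecture, the staircase function in the first example would be learnable in time $\tilde\Theta(d^2)$, while the degree-4 parity would require $\tilde\Theta(d^4)$.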
no code implementations • 30 May 2022 • Lechao Xiao, Hong Hu, Theodor Misiakiewicz, Yue M. Lu, Jeffrey Pennington
As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes.
no code implementations • 21 Apr 2022 • Theodor Misiakiewicz
In this regime, the kernel matrix is well approximated by its degree-$\ell$ polynomial truncation and decomposes into a low-rank spike matrix, a rescaled identity, and a 'Gegenbauer matrix' with entries $Q_\ell (\langle \textbf{x}_i , \textbf{x}_j \rangle)$, where $Q_\ell$ is the degree-$\ell$ Gegenbauer polynomial.
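A minimal construction of such a Gegenbauer matrix for points on the sphere $S^{d-1}$, assuming the convention $Q_\ell = C_\ell^{(d-2)/2}$ normalized so that $Q_\ell(1) = 1$ (the paper's normalization may differ by a constant factor):

```python
import numpy as np
from scipy.special import eval_gegenbauer

rng = np.random.default_rng(0)
n, d, ell = 500, 30, 2
alpha = (d - 2) / 2.0                    # Gegenbauer index for S^{d-1}

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)        # points on the sphere
G = X @ X.T                                          # pairwise inner products
Q = eval_gegenbauer(ell, alpha, G) / eval_gegenbauer(ell, alpha, 1.0)

# Diagonal entries equal 1, while off-diagonal entries are typically of size
# d^{-ell/2}, reflecting the near-orthogonality used in the analysis.
print(np.abs(Q - np.eye(n)).max())
```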
no code implementations • 17 Feb 2022 • Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz
It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints.
no code implementations • 16 Nov 2021 • Theodor Misiakiewicz, Song Mei
Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks.
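As a caricature of such kernels (far simpler than the hierarchical constructions studied in this line of work), here is a one-dimensional convolutional kernel that averages a nonlinear patch kernel over all pairs of patches, i.e. convolution followed by global pooling; all choices below are illustrative:

```python
import numpy as np

def conv_kernel(x, y, q=5):
    """Nonlinear patch kernel averaged over all pairs of (cyclic) patches."""
    d = len(x)
    xp = np.stack([np.roll(x, -i)[:q] for i in range(d)])   # patches of x
    yp = np.stack([np.roll(y, -i)[:q] for i in range(d)])   # patches of y
    inner = xp @ yp.T / q                  # patch-patch inner products
    return np.mean(np.exp(inner))          # exponential activation + pooling

rng = np.random.default_rng(0)
x, y = rng.standard_normal(16), rng.standard_normal(16)
print(conv_kernel(x, y))
```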
no code implementations • 30 Mar 2021 • Michael Celentano, Theodor Misiakiewicz, Andrea Montanari
We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size.
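For intuition, the $p = 2$ special case of minimum-norm interpolation in a random features model has a closed form via the pseudoinverse; general $p > 1$, as studied in the paper, would require a convex solver. A minimal sketch with illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, N = 10, 40, 400                       # overparameterized: N >> n

X = rng.standard_normal((n, d))
y = np.tanh(X[:, 0])                        # illustrative target
W = rng.standard_normal((N, d)) / np.sqrt(d)
F = np.maximum(X @ W.T, 0)                  # n x N ReLU random features

# Minimum-l2-norm interpolant: a = F^+ y, the smallest-norm solution of F a = y.
a = np.linalg.pinv(F) @ y
print("max train residual:", np.abs(F @ a - y).max())
print("||a||_2 =", np.linalg.norm(a))
```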
no code implementations • 25 Feb 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Certain neural network architectures, such as convolutional networks, are believed to owe their success to exploiting such invariance properties.
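A minimal sketch of how invariance can be built into a random features model: each feature averages its activation over all cyclic shifts of the input, making the feature map shift-invariant by construction (an illustrative construction, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 16, 200
W = rng.standard_normal((N, d)) / np.sqrt(d)

def invariant_features(x, W):
    # Average each ReLU feature over all cyclic shifts of the input.
    shifts = np.stack([np.roll(x, k) for k in range(len(x))])
    return np.maximum(shifts @ W.T, 0).mean(axis=0)

x = rng.standard_normal(d)
f1 = invariant_features(x, W)
f2 = invariant_features(np.roll(x, 3), W)
print(np.allclose(f1, f2))   # True: the feature map is shift-invariant
```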
no code implementations • 26 Jan 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.
1 code implementation • NeurIPS 2020 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Recent empirical work has shown that, for some classification tasks, reproducing kernel Hilbert space (RKHS) methods can replace neural networks without a large loss in performance.
1 code implementation • NeurIPS 2019 • Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
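The two data models can be simulated directly; the specific quadratic target and mixture covariances below are illustrative choices, since the excerpt does not pin down the scalings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 1000

# Model (1): Gaussian features, unknown quadratic target f*(x) = <x, Bx> + <b, x>.
B = rng.standard_normal((d, d)) / d
b = rng.standard_normal(d)
X1 = rng.standard_normal((n, d))
y1 = np.einsum("ni,ij,nj->n", X1, B, X1) + X1 @ b

# Model (2): mixture of two centered Gaussians differing in covariance scale
# (the 1 + 2/sqrt(d) separation is an illustrative choice), labels = component.
y2 = rng.integers(0, 2, size=n)
scales = np.where(y2 == 1, 1.0 + 2.0 / np.sqrt(d), 1.0)
X2 = scales[:, None] * rng.standard_normal((n, d))
```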
1 code implementation • 21 Jun 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
no code implementations • 27 Apr 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
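The universal approximation remark rests on averaged random features converging to a fixed kernel as $N$ grows. A quick Monte Carlo check for ReLU features, whose limiting kernel has a known closed form (the first-order arc-cosine kernel of Cho & Saul, 2009):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
x, y = rng.standard_normal(d), rng.standard_normal(d)

# Closed form of E_w[relu(<w,x>) relu(<w,y>)] for w ~ N(0, I_d):
# (|x||y| / 2pi) * (sin t + (pi - t) cos t), with t the angle between x and y.
nx, ny = np.linalg.norm(x), np.linalg.norm(y)
t = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
exact = nx * ny / (2 * np.pi) * (np.sin(t) + (np.pi - t) * np.cos(t))

for N in [100, 10_000, 1_000_000]:
    W = rng.standard_normal((N, d))
    mc = np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))
    print(f"N = {N:>9,}:  MC = {mc:.5f}   exact = {exact:.5f}")
```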
no code implementations • 16 Feb 2019 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Earlier work shows that, under some regularity assumptions, the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.
no code implementations • 25 Mar 2017 • Song Mei, Theodor Misiakiewicz, Andrea Montanari, Roberto I. Oliveira
In this paper we study the rank-constrained version of SDPs arising in MaxCut and in synchronization problems.
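A minimal sketch of the rank-constrained (Burer-Monteiro) approach to the MaxCut SDP: the PSD variable $X$ is replaced by $\sigma\sigma^\top$ with unit-norm rows $\sigma_i \in \mathbb{R}^k$, and the resulting non-convex problem is attacked by projected gradient descent. This illustrates the parameterization studied in the paper, not its results:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, steps, lr = 60, 5, 500, 0.05

A = np.triu(rng.random((n, n)) < 0.1, 1)
A = (A | A.T).astype(float)                 # random graph adjacency matrix

sigma = rng.standard_normal((n, k))
sigma /= np.linalg.norm(sigma, axis=1, keepdims=True)

for _ in range(steps):
    grad = A @ sigma                        # proportional to the gradient of
    sigma -= lr * grad                      #   sum_ij A_ij <s_i, s_j> (minimized)
    sigma /= np.linalg.norm(sigma, axis=1, keepdims=True)   # back to spheres

print("relaxed cut value:", 0.25 * np.sum(A * (1.0 - sigma @ sigma.T)))
# Hyperplane rounding (Goemans-Williamson style) recovers a genuine cut.
signs = np.sign(sigma @ rng.standard_normal(k))
print("rounded cut value:", 0.25 * np.sum(A * (1.0 - np.outer(signs, signs))))
```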
no code implementations • 23 Sep 2015 • Andrey Y. Lokhov, Theodor Misiakiewicz
A number of recent papers have introduced efficient algorithms for estimating spreading parameters by maximizing the likelihood of observed cascades, assuming that full information on all nodes in the network is available.
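A toy version of this setting, under strong simplifying assumptions (discrete-time SI dynamics with a single global transmission probability and fully observed infection times; the paper's models and estimators are more general):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha_true, T = 200, 0.3, 8
A = np.triu(rng.random((n, n)) < 0.05, 1)
A = (A | A.T).astype(int)                      # random undirected graph

events = []   # (# infected neighbors k, node became infected?) per node, step
for _ in range(30):                            # 30 independent cascades
    inf = np.zeros(n, dtype=bool)
    inf[rng.integers(n)] = True                # random seed node
    for _ in range(T):
        k = A @ inf.astype(int)                # infected neighbors of each node
        sus = ~inf
        p_inf = 1 - (1 - alpha_true) ** k      # infection probability this step
        new = sus & (rng.random(n) < p_inf)
        events += [(ki, hit) for ki, hit in zip(k[sus], new[sus]) if ki > 0]
        inf |= new

ks = np.array([k for k, _ in events], dtype=float)
hits = np.array([h for _, h in events], dtype=bool)

def neg_log_lik(a):
    # Full-information log-likelihood of the observed transitions.
    return -(np.log(1 - (1 - a) ** ks[hits]).sum()
             + (ks[~hits] * np.log(1 - a)).sum())

grid = np.linspace(0.01, 0.99, 99)             # simple grid-search MLE
print("alpha_hat =", grid[np.argmin([neg_log_lik(a) for a in grid])])
```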