no code implementations • 15 Feb 2024 • Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi
Sequential Bayesian Filtering aims to estimate the current state distribution of a Hidden Markov Model, given the past observations.
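For readers new to the topic, a minimal sketch of the textbook forward-filtering recursion for a discrete HMM (the generic algorithm, not this paper's method; the two-state matrices below are hypothetical):

```python
import numpy as np

def hmm_filter(pi0, A, B, obs):
    """Forward filtering for a discrete HMM.

    pi0 : (S,) initial state distribution
    A   : (S, S) transition matrix, A[i, j] = P(next = j | current = i)
    B   : (S, O) emission matrix,   B[i, k] = P(obs = k | state = i)
    obs : iterable of observed symbol indices
    Returns the filtering distribution P(state_t | obs_1, ..., obs_t).
    """
    belief = pi0.copy()
    for y in obs:
        belief = belief @ A        # predict: propagate through the dynamics
        belief = belief * B[:, y]  # update: reweight by the observation likelihood
        belief /= belief.sum()     # renormalize to a probability distribution
    return belief

# Hypothetical two-state, two-symbol example
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
print(hmm_filter(np.array([0.5, 0.5]), A, B, obs=[1, 1, 0]))
```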
1 code implementation • NeurIPS 2023 • Gaspard Beugnot, Julien Mairal, Alessandro Rudi
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
no code implementations • 24 May 2023 • Riccardo Bonalli, Alessandro Rudi
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of multi-dimensional non-linear stochastic differential equations, which relies upon discrete-time observations of the state.
no code implementations • 16 Jan 2023 • Pierre-Cyril Aubin-Frankowski, Alessandro Rudi
Assuming further that the functions appearing in the problem are smooth, focusing on pointwise equality constraints enables the use of scattering inequalities to mitigate the curse of dimensionality in sampling the constraints.
no code implementations • 16 Nov 2022 • Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite dimensional output.
1 code implementation • 26 May 2022 • Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi
The workhorse of machine learning is stochastic gradient descent.
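For context, a minimal self-contained sketch of plain SGD on a least-squares objective (generic textbook algorithm; the step size and data are illustrative):

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.1, epochs=10, seed=0):
    """SGD on (1/n) * sum_i (x_i @ w - y_i)^2 / 2, one sample at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):         # one pass over shuffled data
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of the i-th sample's loss
            w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)
print(np.linalg.norm(sgd_least_squares(X, y) - w_true))  # small recovery error
```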
no code implementations • 11 Apr 2022 • Blake Woodworth, Francis Bach, Alessandro Rudi
We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.
no code implementations • 28 Feb 2022 • Gaspard Beugnot, Julien Mairal, Alessandro Rudi
This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms.
1 code implementation • 11 Feb 2022 • Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi
Measures of similarity (or dissimilarity) are a key ingredient to many machine learning algorithms.
no code implementations • 31 Jan 2022 • Antoine Chatalic, Nicolas Schreuder, Alessandro Rudi, Lorenzo Rosasco
Our main result is an upper bound on the approximation error of this procedure.
no code implementations • 3 Dec 2021 • Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi
It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.
1 code implementation • 22 Nov 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi
Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics.
no code implementations • 20 Oct 2021 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.)
no code implementations • NeurIPS 2021 • Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi
Mixability has been shown to be a powerful tool to obtain algorithms with optimal regret.
no code implementations • NeurIPS 2021 • Alessandro Rudi, Carlo Ciliberto
Finding a good way to model probability densities is key to probabilistic inference.
1 code implementation • 18 Jun 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi
Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
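As a concrete illustration (a generic sketch, not the paper's algorithm): with a Gaussian kernel, the squared distance between two empirical mean embeddings, i.e. the squared MMD, reduces to averages of kernel evaluations:

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    """Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of ||mu_X - mu_Y||^2 in the RKHS."""
    return (gaussian_gram(X, X, sigma).mean()
            - 2 * gaussian_gram(X, Y, sigma).mean()
            + gaussian_gram(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (100, 2))
Y = rng.normal(0.5, 1.0, (100, 2))  # shifted sample: MMD^2 is clearly positive
print(mmd2(X, Y), mmd2(X, X))       # the second value is exactly 0
```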
no code implementations • NeurIPS 2021 • Gaspard Beugnot, Julien Mairal, Alessandro Rudi
The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels.
no code implementations • 31 May 2021 • Alex Nowak-Vila, Alessandro Rudi, Francis Bach
The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.
no code implementations • 6 Feb 2021 • Oleksandr Zadorozhnyi, Pierre Gaillard, Sebastien Gerschinovitz, Alessandro Rudi
In this work we investigate a variant of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression.
1 code implementation • 4 Feb 2021 • Vivien Cabannes, Francis Bach, Alessandro Rudi
Machine learning, when approached through supervised learning, requires expensive annotation of data.
no code implementations • 1 Feb 2021 • Vivien Cabannes, Alessandro Rudi, Francis Bach
Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression.
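A toy instance of this surrogate pipeline (a generic sketch under the least-squares surrogate, not this paper's analysis; the dataset is hypothetical): regress one-hot encodings of the labels, then decode with an arg-max.

```python
import numpy as np

def fit_surrogate(X, labels, n_classes, reg=1e-3):
    """Continuous surrogate problem: ridge-regress one-hot targets."""
    Y = np.eye(n_classes)[labels]  # one-hot encoding of discrete labels
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)

def decode(X, W):
    """Decoding step: map continuous scores back to a discrete label."""
    return (X @ W).argmax(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical linearly separable task
W = fit_surrogate(X, labels, n_classes=2)
print((decode(X, W) == labels).mean())        # training accuracy, close to 1
```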
no code implementations • 13 Jan 2021 • Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, Francois-Xavier Vialard
It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality.
no code implementations • 22 Dec 2020 • Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach
We consider the global minimization of smooth functions based solely on function evaluations.
2 code implementations • NeurIPS 2021 • Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi
As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning.
no code implementations • 29 Jul 2020 • Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc
A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then, solving a regression problem in this output space.
1 code implementation • NeurIPS 2020 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.
1 code implementation • ICML 2020 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs.
1 code implementation • NeurIPS 2020 • Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems, since naïve implementations scale poorly with data size.
no code implementations • 17 Jun 2020 • Nicolò Pagliana, Alessandro Rudi, Ernesto De Vito, Lorenzo Rosasco
We study the learning properties of nonparametric ridge-less least squares.
no code implementations • 16 Jun 2020 • Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi
We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning.
no code implementations • 18 Mar 2020 • Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi
We consider the setting of online logistic regression and study the regret with respect to the $\ell_2$-ball of radius $B$.
2 code implementations • ICML 2020 • Vivien Cabannes, Alessandro Rudi, Francis Bach
Annotating datasets is nowadays one of the main costs of supervised learning.
no code implementations • 13 Feb 2020 • Carlo Ciliberto, Lorenzo Rosasco, Alessandro Rudi
We propose and analyze a novel theoretical and algorithmic framework for structured prediction.
no code implementations • 28 Jan 2020 • Carlo Ciliberto, Andrea Rocchetto, Alessandro Rudi, Leonard Wossnig
Within the framework of statistical learning theory it is possible to bound the minimum number of samples required by a learner to reach a target accuracy.
no code implementations • 11 Jul 2019 • Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi
In this paper, we propose and study a Nyström-based approach to efficient large scale kernel principal component analysis (PCA).
1 code implementation • NeurIPS 2019 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression.
1 code implementation • NeurIPS 2019 • Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi
For $d$-dimensional inputs, we provide a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time complexity and space complexity $O((\log n)^{2d})$.
no code implementations • 8 Feb 2019 • Dmitrii Ostrovskii, Alessandro Rudi
Denoting by $\text{cond}(\mathbf{S})$ the condition number of $\mathbf{S}$, the computational cost of the novel estimator is $O(d^2 n + d^3\log(\text{cond}(\mathbf{S})))$, which is comparable to the cost of the sample covariance estimator in the statistically interesting regime $n \ge d$.
no code implementations • 8 Feb 2019 • Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi
We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels.
no code implementations • 5 Feb 2019 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g., logistic regression).
no code implementations • NeurIPS 2019 • Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed
The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference.
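For reference, a minimal sketch of the Sinkhorn iterations that define this quantity (the standard algorithm, not this paper's statistical analysis; the problem data below are hypothetical):

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iter=500):
    """Entropic OT: find P = diag(u) K diag(v), K = exp(-C/eps), with marginals a, b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale columns to match marginal b
        u = a / (K @ v)              # rescale rows to match marginal a
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return (P * C).sum()             # transport cost <P, C>

# Hypothetical 1-D example: uniform weights, squared-distance cost to a shifted copy
x = np.linspace(0.0, 1.0, 20)
C = (x[:, None] - x[None, :] - 0.3) ** 2
a = b = np.full(20, 1.0 / 20)
print(sinkhorn_cost(a, b, C))
```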
1 code implementation • NeurIPS 2018 • Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco
Leverage score sampling provides an appealing way to perform approximate computations for large matrices.
no code implementations • 16 Oct 2018 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad-hoc for each loss.
no code implementations • NeurIPS 2018 • Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco
Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms.
no code implementations • NeurIPS 2018 • Alessandro Rudi, Carlo Ciliberto, Gian Maria Marconi, Lorenzo Rosasco
Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure.
no code implementations • NeurIPS 2019 • Carlo Ciliberto, Francis Bach, Alessandro Rudi
Key to structured prediction is exploiting the problem structure to simplify the learning process.
2 code implementations • NeurIPS 2018 • Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto
Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization.
no code implementations • NeurIPS 2018 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data.
no code implementations • 6 Apr 2018 • Alessandro Rudi, Leonard Wossnig, Carlo Ciliberto, Andrea Rocchetto, Massimiliano Pontil, Simone Severini
Simulating the time-evolution of quantum mechanical systems is BQP-hard and expected to be one of the foremost applications of quantum computers.
no code implementations • 20 Jan 2018 • Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher
In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space.
no code implementations • 13 Dec 2017 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods.
4 code implementations • NeurIPS 2017 • Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco
In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points.
no code implementations • NeurIPS 2017 • Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, Massimiliano Pontil
However, in practice, assuming the tasks to be linearly related might be restrictive, and allowing for nonlinear structures is a challenge.
no code implementations • NeurIPS 2016 • Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco
We propose and analyze a regularization approach for structured prediction problems.
1 code implementation • NeurIPS 2017 • Alessandro Rudi, Lorenzo Rosasco
We study the generalization properties of ridge regression with random features in the statistical learning framework.
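A minimal sketch of the construction being analyzed, assuming a Gaussian kernel (the random Fourier feature recipe is standard; the constants and data are illustrative):

```python
import numpy as np

def random_fourier_features(X, n_features=200, sigma=1.0, seed=0):
    """phi(x) with E[phi(x) . phi(y)] ≈ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rff_ridge(Phi, y, reg=1e-2):
    """Ridge regression in the random feature space."""
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(m), Phi.T @ y)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (300, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=300)
Phi = random_fourier_features(X)
w = rff_ridge(Phi, y)
print(np.mean((Phi @ w - y) ** 2))  # training MSE, roughly at the noise level
```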
1 code implementation • 19 Oct 2015 • Tomas Angles, Raffaello Camoriano, Alessandro Rudi, Lorenzo Rosasco
Early stopping is a well known approach to reduce the time complexity for performing training and model selection of large scale learning machines.
1 code implementation • NeurIPS 2015 • Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco
We study Nyström-type subsampling approaches to large scale kernel methods, and prove learning bounds in the statistical learning setting, where random sampling and high probability estimates are considered.
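A minimal sketch of plain Nyström kernel ridge regression with uniform subsampling (the generic construction; kernel width, regularization, and jitter below are illustrative):

```python
import numpy as np

def gaussian_gram(X, Y, sigma=0.5):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def nystrom_krr(X, y, m=50, reg=1e-3, sigma=0.5, seed=0):
    """Restrict the KRR solution to m random landmarks:
    solve (K_nm^T K_nm + n * reg * K_mm) alpha = K_nm^T y."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = X[rng.choice(n, size=m, replace=False)]
    K_nm = gaussian_gram(X, landmarks, sigma)
    K_mm = gaussian_gram(landmarks, landmarks, sigma)
    A = K_nm.T @ K_nm + n * reg * K_mm + 1e-10 * np.eye(m)  # jitter for stability
    alpha = np.linalg.solve(A, K_nm.T @ y)
    return landmarks, alpha

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (1000, 1))
y = np.sin(5 * X[:, 0]) + 0.1 * rng.normal(size=1000)
landmarks, alpha = nystrom_krr(X, y)
print(np.mean((gaussian_gram(X, landmarks) @ alpha - y) ** 2))  # training MSE
```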
no code implementations • NeurIPS 2013 • Alessandro Rudi, Guille D. Canas, Lorenzo Rosasco
A large number of algorithms in machine learning, from principal component analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples.