no code implementations • 15 Feb 2024 • Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi
Sequential Bayesian Filtering aims to estimate the current state distribution of a Hidden Markov Model, given the past observations.
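For readers new to the topic, a minimal sketch of the textbook forward-filtering recursion for a discrete HMM (the generic algorithm, not this paper's method; the two-state matrices below are hypothetical):

```python
import numpy as np

def hmm_filter(pi0, A, B, obs):
    """Forward filtering for a discrete HMM.

    pi0 : (S,) initial state distribution
    A   : (S, S) transition matrix, A[i, j] = P(next = j | current = i)
    B   : (S, O) emission matrix,   B[i, k] = P(obs = k | state = i)
    obs : iterable of observed symbol indices
    Returns the filtering distribution P(state_t | obs_1, ..., obs_t).
    """
    belief = pi0.copy()
    for y in obs:
        belief = belief @ A        # predict: propagate through the dynamics
        belief = belief * B[:, y]  # update: reweight by the observation likelihood
        belief /= belief.sum()     # renormalize to a probability distribution
    return belief

# Hypothetical two-state, two-symbol example
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
print(hmm_filter(np.array([0.5, 0.5]), A, B, obs=[1, 1, 0]))
```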
1 code implementation • NeurIPS 2023 • Gaspard Beugnot, Julien Mairal, Alessandro Rudi
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
no code implementations • 24 May 2023 • Riccardo Bonalli, Alessandro Rudi
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of multi-dimensional non-linear stochastic differential equations, which relies upon discrete-time observations of the state.
no code implementations • 16 Jan 2023 • Pierre-Cyril Aubin-Frankowski, Alessandro Rudi
Assuming further that the functions appearing in the problem are smooth, focusing on pointwise equality constraints enables the use of scattering inequalities to mitigate the curse of dimensionality in sampling the constraints.
no code implementations • 16 Nov 2022 • Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite dimensional output.
1 code implementation • 26 May 2022 • Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi
The workhorse of machine learning is stochastic gradient descent.
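For context, a minimal self-contained sketch of plain SGD on a least-squares objective (generic textbook algorithm; the step size and data are illustrative):

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.1, epochs=10, seed=0):
    """SGD on (1/n) * sum_i (x_i @ w - y_i)^2 / 2, one sample at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):         # one pass over shuffled data
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of the i-th sample's loss
            w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)
print(np.linalg.norm(sgd_least_squares(X, y) - w_true))  # small recovery error
```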
no code implementations • 11 Apr 2022 • Blake Woodworth, Francis Bach, Alessandro Rudi
We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.
no code implementations • 28 Feb 2022 • Gaspard Beugnot, Julien Mairal, Alessandro Rudi
This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms.
1 code implementation • 11 Feb 2022 • Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi
Measures of similarity (or dissimilarity) are a key ingredient to many machine learning algorithms.
no code implementations • 31 Jan 2022 • Antoine Chatalic, Nicolas Schreuder, Alessandro Rudi, Lorenzo Rosasco
Our main result is an upper bound on the approximation error of this procedure.
no code implementations • 3 Dec 2021 • Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi
It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.
1 code implementation • 22 Nov 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi
Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics.
no code implementations • 20 Oct 2021 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.)
no code implementations • NeurIPS 2021 • Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi
Mixability has been shown to be a powerful tool to obtain algorithms with optimal regret.
no code implementations • NeurIPS 2021 • Alessandro Rudi, Carlo Ciliberto
Finding a good way to model probability densities is key to probabilistic inference.
1 code implementation • 18 Jun 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi
Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
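As a concrete illustration (a generic sketch, not the paper's algorithm): with a Gaussian kernel, the squared distance between two empirical mean embeddings, i.e. the squared MMD, reduces to averages of kernel evaluations:

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    """Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of ||mu_X - mu_Y||^2 in the RKHS."""
    return (gaussian_gram(X, X, sigma).mean()
            - 2 * gaussian_gram(X, Y, sigma).mean()
            + gaussian_gram(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (100, 2))
Y = rng.normal(0.5, 1.0, (100, 2))  # shifted sample: MMD^2 is clearly positive
print(mmd2(X, Y), mmd2(X, X))       # the second value is exactly 0
```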
no code implementations • NeurIPS 2021 • Gaspard Beugnot, Julien Mairal, Alessandro Rudi
The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels.
no code implementations • 31 May 2021 • Alex Nowak-Vila, Alessandro Rudi, Francis Bach
The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.
no code implementations • 6 Feb 2021 • Oleksandr Zadorozhnyi, Pierre Gaillard, Sebastien Gerschinovitz, Alessandro Rudi
In this work we investigate a variant of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression.
1 code implementation • 4 Feb 2021 • Vivien Cabannes, Francis Bach, Alessandro Rudi
Machine learning, when approached through supervised learning, requires expensive annotation of data.
no code implementations • 1 Feb 2021 • Vivien Cabannes, Alessandro Rudi, Francis Bach
Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression.
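A toy instance of this surrogate pipeline (a generic sketch under the least-squares surrogate, not this paper's analysis; the dataset is hypothetical): regress one-hot encodings of the labels, then decode with an arg-max.

```python
import numpy as np

def fit_surrogate(X, labels, n_classes, reg=1e-3):
    """Continuous surrogate problem: ridge-regress one-hot targets."""
    Y = np.eye(n_classes)[labels]  # one-hot encoding of discrete labels
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)

def decode(X, W):
    """Decoding step: map continuous scores back to a discrete label."""
    return (X @ W).argmax(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical linearly separable task
W = fit_surrogate(X, labels, n_classes=2)
print((decode(X, W) == labels).mean())        # training accuracy, close to 1
```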
no code implementations • 13 Jan 2021 • Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, Francois-Xavier Vialard
It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality.
no code implementations • 22 Dec 2020 • Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach
We consider the global minimization of smooth functions based solely on function evaluations.
2 code implementations • NeurIPS 2021 • Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi
As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning.
no code implementations • 29 Jul 2020 • Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc
A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then, solving a regression problem in this output space.
1 code implementation • NeurIPS 2020 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.
1 code implementation • ICML 2020 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs.
1 code implementation • NeurIPS 2020 • Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems, since naïve implementations scale poorly with data size.
no code implementations • 17 Jun 2020 • Nicolò Pagliana, Alessandro Rudi, Ernesto De Vito, Lorenzo Rosasco
We study the learning properties of nonparametric ridge-less least squares.
no code implementations • 16 Jun 2020 • Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi
We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning.
no code implementations • 18 Mar 2020 • Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi
We consider the setting of online logistic regression and study the regret with respect to the $\ell_2$-ball of radius $B$.
2 code implementations • ICML 2020 • Vivien Cabannes, Alessandro Rudi, Francis Bach
Annotating datasets is nowadays one of the main costs of supervised learning.
no code implementations • 13 Feb 2020 • Carlo Ciliberto, Lorenzo Rosasco, Alessandro Rudi
We propose and analyze a novel theoretical and algorithmic framework for structured prediction.
no code implementations • 28 Jan 2020 • Carlo Ciliberto, Andrea Rocchetto, Alessandro Rudi, Leonard Wossnig
Within the framework of statistical learning theory it is possible to bound the minimum number of samples required by a learner to reach a target accuracy.
no code implementations • 11 Jul 2019 • Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi
In this paper, we propose and study a Nyström-based approach to efficient large scale kernel principal component analysis (PCA).
1 code implementation • NeurIPS 2019 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression.
1 code implementation • NeurIPS 2019 • Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi
For $d$-dimensional inputs, we provide a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time complexity and space complexity $O((\log n)^{2d})$.
no code implementations • 8 Feb 2019 • Dmitrii Ostrovskii, Alessandro Rudi
Denoting by $\text{cond}(\mathbf{S})$ the condition number of $\mathbf{S}$, the computational cost of the novel estimator is $O(d^2 n + d^3\log(\text{cond}(\mathbf{S})))$, which is comparable to the cost of the sample covariance estimator in the statistically interesting regime $n \ge d$.
no code implementations • 8 Feb 2019 • Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi
We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels.
no code implementations • 5 Feb 2019 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g., logistic regression).
no code implementations • NeurIPS 2019 • Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed
The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference.
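For reference, a minimal sketch of the Sinkhorn iterations that define this quantity (the standard algorithm, not this paper's statistical analysis; the problem data below are hypothetical):

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iter=500):
    """Entropic OT: find P = diag(u) K diag(v), K = exp(-C/eps), with marginals a, b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale columns to match marginal b
        u = a / (K @ v)              # rescale rows to match marginal a
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return (P * C).sum()             # transport cost <P, C>

# Hypothetical 1-D example: uniform weights, squared-distance cost to a shifted copy
x = np.linspace(0.0, 1.0, 20)
C = (x[:, None] - x[None, :] - 0.3) ** 2
a = b = np.full(20, 1.0 / 20)
print(sinkhorn_cost(a, b, C))
```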
1 code implementation • NeurIPS 2018 • Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco
Leverage score sampling provides an appealing way to perform approximate computations for large matrices.
no code implementations • 16 Oct 2018 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad-hoc for each loss.
no code implementations • NeurIPS 2018 • Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco
Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms.
no code implementations • NeurIPS 2018 • Alessandro Rudi, Carlo Ciliberto, Gian Maria Marconi, Lorenzo Rosasco
Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure.
no code implementations • NeurIPS 2019 • Carlo Ciliberto, Francis Bach, Alessandro Rudi
Key to structured prediction is exploiting the problem structure to simplify the learning process.
2 code implementations • NeurIPS 2018 • Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto
Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization.
no code implementations • NeurIPS 2018 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data.
no code implementations • 6 Apr 2018 • Alessandro Rudi, Leonard Wossnig, Carlo Ciliberto, Andrea Rocchetto, Massimiliano Pontil, Simone Severini
Simulating the time-evolution of quantum mechanical systems is BQP-hard and expected to be one of the foremost applications of quantum computers.
no code implementations • 20 Jan 2018 • Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher
In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space.
no code implementations • 13 Dec 2017 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods.
4 code implementations • NeurIPS 2017 • Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco
In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points.
no code implementations • NeurIPS 2017 • Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, Massimiliano Pontil
However, in practice, assuming the tasks to be linearly related might be restrictive, and allowing for nonlinear structures is a challenge.
no code implementations • NeurIPS 2016 • Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco
We propose and analyze a regularization approach for structured prediction problems.
1 code implementation • NeurIPS 2017 • Alessandro Rudi, Lorenzo Rosasco
We study the generalization properties of ridge regression with random features in the statistical learning framework.
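A minimal sketch of the construction being analyzed, assuming a Gaussian kernel (the random Fourier feature recipe is standard; the constants and data are illustrative):

```python
import numpy as np

def random_fourier_features(X, n_features=200, sigma=1.0, seed=0):
    """phi(x) with E[phi(x) . phi(y)] ≈ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rff_ridge(Phi, y, reg=1e-2):
    """Ridge regression in the random feature space."""
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(m), Phi.T @ y)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (300, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=300)
Phi = random_fourier_features(X)
w = rff_ridge(Phi, y)
print(np.mean((Phi @ w - y) ** 2))  # training MSE, roughly at the noise level
```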
1 code implementation • 19 Oct 2015 • Tomas Angles, Raffaello Camoriano, Alessandro Rudi, Lorenzo Rosasco
Early stopping is a well known approach to reduce the time complexity for performing training and model selection of large scale learning machines.
1 code implementation • NeurIPS 2015 • Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco
We study Nyström-type subsampling approaches to large scale kernel methods, and prove learning bounds in the statistical learning setting, where random sampling and high probability estimates are considered.
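A minimal sketch of plain Nyström kernel ridge regression with uniform subsampling (the generic construction; kernel width, regularization, and jitter below are illustrative):

```python
import numpy as np

def gaussian_gram(X, Y, sigma=0.5):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def nystrom_krr(X, y, m=50, reg=1e-3, sigma=0.5, seed=0):
    """Restrict the KRR solution to m random landmarks:
    solve (K_nm^T K_nm + n * reg * K_mm) alpha = K_nm^T y."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = X[rng.choice(n, size=m, replace=False)]
    K_nm = gaussian_gram(X, landmarks, sigma)
    K_mm = gaussian_gram(landmarks, landmarks, sigma)
    A = K_nm.T @ K_nm + n * reg * K_mm + 1e-10 * np.eye(m)  # jitter for stability
    alpha = np.linalg.solve(A, K_nm.T @ y)
    return landmarks, alpha

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (1000, 1))
y = np.sin(5 * X[:, 0]) + 0.1 * rng.normal(size=1000)
landmarks, alpha = nystrom_krr(X, y)
print(np.mean((gaussian_gram(X, landmarks) @ alpha - y) ** 2))  # training MSE
```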
no code implementations • NeurIPS 2013 • Alessandro Rudi, Guille D. Canas, Lorenzo Rosasco
A large number of algorithms in machine learning, from principal component analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples.