no code implementations • 11 Mar 2024 • Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
no code implementations • 26 Feb 2024 • Weilin Cong, Jian Kang, Hanghang Tong, Mehrdad Mahdavi
Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications, especially in domains where data can be represented as a graph and evolves over time.
no code implementations • 17 Oct 2023 • Guneykan Ozgul, Xiantao Li, Mehrdad Mahdavi, Chunhao Wang
We also incorporate a stochastic gradient oracle that implements the quantum walk operators inexactly by only using mini-batch gradients.
1 code implementation • NeurIPS 2023 • Haobo Zhang, Junyuan Hong, Yuyang Deng, Mehrdad Mahdavi, Jiayu Zhou
Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors.
no code implementations • 23 Feb 2023 • Yuyang Deng, Nidham Gazagnadou, Junyuan Hong, Mehrdad Mahdavi, Lingjuan Lyu
Recent studies have demonstrated that adversarially robust learning under $\ell_\infty$ attacks is harder to generalize to different domains than standard domain adaptation.
no code implementations • 22 Feb 2023 • Weilin Cong, Si Zhang, Jian Kang, Baichuan Yuan, Hao Wu, Xin Zhou, Hanghang Tong, Mehrdad Mahdavi
Recurrent neural network (RNN) and self-attention mechanism (SAM) are the de facto methods to extract spatial-temporal information for temporal graph learning.
no code implementations • 17 Feb 2023 • Weilin Cong, Mehrdad Mahdavi
As privacy protection receives increasing attention, unlearning the effect of a specific node from a pre-trained graph learning model has become equally important.
no code implementations • 17 Oct 2022 • Pouria Mahdavinia, Yuyang Deng, Haochuan Li, Mehrdad Mahdavi
Despite the established convergence theory of Optimistic Gradient Descent Ascent (OGDA) and Extragradient (EG) methods for the convex-concave minimax problems, little is known about the theoretical guarantees of these methods in nonconvex settings.
no code implementations • ICLR 2022 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Amin Karbasi
To train machine learning models that are robust to distribution shifts in the data, distributionally robust optimization (DRO) has been proven very effective.
no code implementations • 19 Nov 2021 • Weilin Cong, Yanhong Wu, Yuandong Tian, Mengting Gu, Yinglong Xia, Chun-cheng Jason Chen, Mehrdad Mahdavi
To achieve efficient and scalable training, we propose temporal-union graph structure and its associated subgraph-based node sampling strategy.
no code implementations • ICLR 2022 • Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Mahmut T. Kandemir, Anand Sivasubramaniam
To address this performance degradation, we propose to apply Global Server Corrections on the server to refine the locally learned models.
1 code implementation • NeurIPS 2021 • Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases, which is usually attributed to over-smoothing.
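As a toy illustration of the over-smoothing phenomenon mentioned here (not the paper's method; the graph and feature sizes are arbitrary), repeatedly applying the normalized adjacency — the propagation step of a GCN layer, stripped of weights and nonlinearities — drives all node features toward the same value:

```python
import numpy as np

# Toy over-smoothing demo: stack many propagation "layers" P @ X and watch
# the node features collapse toward a common value.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
A_hat = A + np.eye(4)                              # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)       # D^{-1} (A + I)

X = np.random.default_rng(0).normal(size=(4, 3))   # random node features
for _ in range(50):                                # 50 propagation steps
    X = P @ X

spread = np.ptp(X, axis=0).max()   # largest feature gap between any two nodes
print(spread)                      # near zero: features smoothed together
```

Because the second eigenvalue of the propagation matrix is strictly below one on a connected graph, the feature spread decays geometrically with depth.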
2 code implementations • NeurIPS 2021 • Huaxiu Yao, Yu Wang, Ying WEI, Peilin Zhao, Mehrdad Mahdavi, Defu Lian, Chelsea Finn
In ATS, for the first time, we design a neural scheduler that decides which meta-training tasks to use next by predicting the probability of each candidate task being sampled, and train the scheduler to optimize the generalization capacity of the meta-model to unseen tasks.
no code implementations • 22 Jul 2021 • Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi
This work is the first to show the convergence of Local SGD on non-smooth functions, and will shed light on the optimization theory of federated training of deep neural networks.
no code implementations • 4 Apr 2021 • Mohammad Mahdi Kamani, Rana Forsati, James Z. Wang, Mehrdad Mahdavi
The proposed PEF notion is definition-agnostic, meaning that any well-defined notion of fairness can be reduced to the PEF notion.
1 code implementation • 3 Mar 2021 • Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget.
1 code implementation • NeurIPS 2020 • Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi
To compensate for this, we propose a Distributionally Robust Federated Averaging (DRFA) algorithm that employs a novel snapshotting scheme to approximate the accumulation of history gradients of the mixing parameter.
no code implementations • 25 Feb 2021 • Yuyang Deng, Mehrdad Mahdavi
Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes.
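A minimal sketch of the Local SGD scheme described here, on a toy least-squares problem (the number of workers, shard sizes, and step size are assumptions for illustration): each worker takes several local SGD steps between synchronizations, so communication happens once per round instead of once per step.

```python
import numpy as np

# Local SGD sketch: K workers, each holding a data shard, run tau local SGD
# steps on a least-squares objective; the models are averaged once per round.
rng = np.random.default_rng(0)
K, tau, rounds, lr = 4, 10, 25, 0.05
A = rng.normal(size=(K, 50, 5))                    # per-worker features
w_star = rng.normal(size=5)
b = A @ w_star + 0.1 * rng.normal(size=(K, 50))    # per-worker targets

def pooled_loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

w = np.zeros(5)
for _ in range(rounds):
    local = np.tile(w, (K, 1))                     # broadcast the global model
    for _ in range(tau):
        for k in range(K):                         # one local SGD step per worker
            i = rng.integers(50)
            g = (A[k, i] @ local[k] - b[k, i]) * A[k, i]
            local[k] -= lr * g
    w = local.mean(axis=0)                         # synchronize by averaging

print(pooled_loss(w) < pooled_loss(np.zeros(5)))   # loss decreased
```

The synchronization frequency is controlled by `tau`: larger values cut communication cost at the price of more drift between the workers' local models.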
no code implementations • 8 Feb 2021 • Hanlin Lu, Ting He, Shiqiang Wang, Changchang Liu, Mehrdad Mahdavi, Vijaykrishnan Narayanan, Kevin S. Chan, Stephen Pasteris
We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers.
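For reference, the underlying k-means computation is plain Lloyd's algorithm; the paper's contribution concerns how to summarize the data before offloading to an edge server, which is not shown in this sketch. The initial centers below are hand-picked guesses for the toy data.

```python
import numpy as np

# Lloyd's k-means: alternate nearest-center assignment and mean updates.
def kmeans(X, centers, iters=20):
    centers = centers.astype(float).copy()
    for _ in range(iters):
        # assign every point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # recompute each center as the mean of its cluster
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(100, 2)) for m in (-5.0, 0.0, 5.0)])
init = np.array([[-4.0, -4.0], [1.0, 1.0], [4.0, 4.0]])
centers, labels = kmeans(X, init)   # recovers the three blob means
```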
no code implementations • 1 Jan 2021 • Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget.
no code implementations • NeurIPS 2020 • Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Anand Sivasubramaniam, Mahmut Kandemir
Sampling-based methods promise scalability improvements when paired with stochastic gradient descent in training Graph Convolutional Networks (GCNs).
no code implementations • NeurIPS 2020 • Huaxiu Yao, Yingbo Zhou, Mehrdad Mahdavi, Zhenhui Li, Richard Socher, Caiming Xiong
When a new task is encountered, it constructs a meta-knowledge pathway by either utilizing the most relevant knowledge blocks or exploring new blocks.
1 code implementation • 1 Aug 2020 • Mohammad Mahdi Kamani, Sadegh Farhang, Mehrdad Mahdavi, James Z. Wang
The proposed framework, named targeted data-driven regularization (TDR), is model- and dataset-agnostic and employs a target dataset that resembles the desired nature of test data in order to guide the learning process in a coupled manner.
1 code implementation • 2 Jul 2020 • Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, Mehrdad Mahdavi
In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions.
no code implementations • 24 Jun 2020 • Weilin Cong, Rana Forsati, Mahmut Kandemir, Mehrdad Mahdavi
In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into embedding approximation variance in the forward stage and stochastic gradient variance in the backward stage, which necessitates mitigating both types of variance to obtain a faster convergence rate.
9 code implementations • 30 Mar 2020 • Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi
Investigation of the degree of personalization in federated learning algorithms has shown that only maximizing the performance of the global model will confine the capacity of the local models to personalize.
no code implementations • 12 Nov 2019 • Mohammad Mahdi Kamani, Farzin Haddadpour, Rana Forsati, Mehrdad Mahdavi
It has been shown that dimension reduction methods such as PCA may be inherently prone to unfairness, treating data from different sensitive groups (e.g., by race, color, or sex) unfairly.
no code implementations • 31 Oct 2019 • Farzin Haddadpour, Mehrdad Mahdavi
To bridge this gap, we demonstrate that, by properly analyzing the effect of unbiased gradients and sampling schema in the federated setting, under mild assumptions, the implicit variance reduction feature of local distributed methods generalizes to heterogeneous data shards and exhibits the best known convergence rates of the homogeneous setting, both in the general nonconvex case and under the Polyak-Łojasiewicz (PL) condition (a generalization of strong convexity).
2 code implementations • NeurIPS 2019 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck R. Cadambe
Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speedup, that is, an error of $O(1/(pT))$, where $T$ is the total number of model updates at each worker.
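For reference, a function $f$ with minimum value $f^\star$ satisfies the Polyak-Łojasiewicz condition with parameter $\mu > 0$ when

```latex
\frac{1}{2}\,\bigl\|\nabla f(x)\bigr\|^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
\qquad \text{for all } x ,
```

which holds for $\mu$-strongly convex functions but also for some nonconvex ones, since it does not require convexity.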
1 code implementation • International Conference on Machine Learning 2019 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck Cadambe
Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms to train large neural networks.
no code implementations • 20 May 2017 • Samet Oymak, Mehrdad Mahdavi, Jiasi Chen
Evaluations on synthetic and real datasets demonstrate that the algorithm is competitive with the current state-of-the-art and accurately learns feature nonlinearities.
no code implementations • 10 Oct 2016 • Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro
Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data.
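A minimal sketch-and-solve example of the sample-size reduction described here (a standard sketching recipe used for illustration; the problem sizes are arbitrary): compress an n-row least-squares problem to m << n rows with a Gaussian sketch, then solve the small problem exactly.

```python
import numpy as np

# Sketch-and-solve for overdetermined least squares: solve the m-row
# sketched problem (S A) x ~ (S b) instead of the full n-row problem.
rng = np.random.default_rng(0)
n, d, m = 5000, 10, 200
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

S = rng.normal(size=(m, n)) / np.sqrt(m)                  # Gaussian sketch
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)            # full solve
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # sketched solve

err = np.linalg.norm(x_sketch - x_full)
print(err)   # small: the sketched solution tracks the full solution
```

The sketched solve touches only an m x d system, yet its solution stays close to the full one, which is the statistical-power-preserving compression the snippet refers to.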
no code implementations • NeurIPS 2015 • Ofer Meshi, Mehrdad Mahdavi, Alex Schwing
Maximum a posteriori (MAP) inference is an important task in many applications.
no code implementations • 4 Nov 2015 • Ofer Meshi, Mehrdad Mahdavi, Adrian Weller, David Sontag
Structured prediction is used in areas such as computer vision and natural language processing to predict structured outputs such as segmentations or parse trees.
no code implementations • 2 Aug 2014 • Rana Forsati, Mehrdad Mahdavi, Mehrnoush Shamsfard, Mohamed Sarwat
With the advent of online social networks, recommender systems have become crucial to the success of many online applications/services due to their significant role in tailoring these applications to user-specific needs or preferences.
no code implementations • 19 Jul 2014 • Mehrdad Mahdavi
In the last several years, the intimate connection between convex optimization and learning problems, in both statistical and sequential frameworks, has shifted the focus of algorithmic machine learning to examine this interplay.
no code implementations • 7 Feb 2014 • Mehrdad Mahdavi, Lijun Zhang, Rong Jin
In statistical learning theory, convex surrogates of the 0-1 loss are highly preferred because of the computational and theoretical virtues that convexity brings in.
no code implementations • 18 Jan 2014 • Mehrdad Mahdavi, Rong Jin
The overarching goal of this paper is to derive excess risk bounds for learning from exp-concave loss functions in passive and sequential learning settings.
no code implementations • NeurIPS 2013 • Mehrdad Mahdavi, Lijun Zhang, Rong Jin
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is $O(1/\sqrt{T})$, which is the same as that for stochastic optimization of Lipschitz continuous convex functions.
no code implementations • NeurIPS 2013 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin
It leverages the theory of the Lagrangian method in constrained optimization and attains the optimal convergence rate of $O(1/\sqrt{T})$ in high probability for general Lipschitz continuous objectives.
no code implementations • NeurIPS 2013 • Lijun Zhang, Mehrdad Mahdavi, Rong Jin
For smooth and strongly convex optimization, the optimal iteration complexity of gradient-based algorithms is $O(\sqrt{\kappa}\log 1/\epsilon)$, where $\kappa$ is the condition number.
no code implementations • 19 Nov 2013 • Lijun Zhang, Mehrdad Mahdavi, Rong Jin
Under the assumption that the norm of the optimal classifier that minimizes the convex risk is available, our analysis shows that the introduction of the convex surrogate loss yields an exponential reduction in the label complexity even when the parameter $\kappa$ of the Tsybakov noise is larger than $1$.
no code implementations • 26 Jul 2013 • Mehrdad Mahdavi, Rong Jin
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is $O(1/\sqrt{T})$, which is the same as that for stochastic optimization of Lipschitz continuous convex functions.
no code implementations • 8 Feb 2013 • Mehrdad Mahdavi, Rong Jin
In this paper we consider learning in the passive setting but with a slight modification.
no code implementations • NeurIPS 2012 • Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou
Both random Fourier features and the Nyström method have been successfully applied to efficient kernel learning.
no code implementations • NeurIPS 2012 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jin-Feng Yi
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain.
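A minimal sketch of the baseline these methods improve on — projected SGD onto the unit L2 ball, with the projection applied at every single iteration (the problem data and step sizes are illustrative assumptions):

```python
import numpy as np

# Projected SGD on a least-squares objective, constrained to the unit L2
# ball: every update is followed by a projection back into the feasible set.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
b = rng.normal(size=200)

def project_ball(w, radius=1.0):
    nrm = np.linalg.norm(w)
    return w if nrm <= radius else w * (radius / nrm)

w = np.zeros(5)
for t in range(1, 1001):
    i = rng.integers(200)
    g = (A[i] @ w - b[i]) * A[i]           # stochastic gradient of 0.5*(a_i.w - b_i)^2
    w = project_ball(w - g / np.sqrt(t))   # project back at EVERY iteration

print(np.linalg.norm(w))                   # <= 1: the iterate stays feasible
```

The projection is cheap for a ball, but for complicated feasible sets it can dominate the per-iteration cost, which motivates replacing the per-step projection with a single projection at the end.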
no code implementations • 26 Nov 2012 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin
We first propose a projection based algorithm which attains an $O(T^{-1/3})$ convergence rate.
no code implementations • 13 Nov 2012 • Lijun Zhang, Mehrdad Mahdavi, Rong Jin, Tianbao Yang, Shenghuo Zhu
Random projection has been widely used in data classification.
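A minimal random-projection example of the Johnson-Lindenstrauss flavor (the dimensions here are illustrative): mapping points through a scaled Gaussian matrix approximately preserves pairwise distances.

```python
import numpy as np

# Random projection: map d-dimensional points to k << d dimensions with a
# scaled Gaussian matrix; pairwise distances are approximately preserved.
rng = np.random.default_rng(0)
n, d, k = 100, 1000, 200
X = rng.normal(size=(n, d))
R = rng.normal(size=(d, k)) / np.sqrt(k)   # projection matrix
Y = X @ R

orig = np.linalg.norm(X[0] - X[1])         # a distance before projection
proj = np.linalg.norm(Y[0] - Y[1])         # the same distance after
distortion = abs(proj / orig - 1.0)
print(distortion)                          # small relative distortion
```

Because distances survive the projection, a classifier trained on the k-dimensional data behaves much like one trained on the original d-dimensional data, at a fraction of the cost.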
no code implementations • 24 Jan 2012 • Tianbao Yang, Mehrdad Mahdavi, Rong Jin, Shenghuo Zhu
We study the non-smooth optimization problems in machine learning, where both the loss function and the regularizer are non-smooth functions.