no code implementations • 11 Mar 2024 • Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
no code implementations • 26 Feb 2024 • Weilin Cong, Jian Kang, Hanghang Tong, Mehrdad Mahdavi
Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications, especially in domains where data can be represented as a graph and evolves over time.
no code implementations • 17 Oct 2023 • Guneykan Ozgul, Xiantao Li, Mehrdad Mahdavi, Chunhao Wang
We also incorporate a stochastic gradient oracle that implements the quantum walk operators inexactly by only using mini-batch gradients.
1 code implementation • NeurIPS 2023 • Haobo Zhang, Junyuan Hong, Yuyang Deng, Mehrdad Mahdavi, Jiayu Zhou
Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors.
no code implementations • 23 Feb 2023 • Yuyang Deng, Nidham Gazagnadou, Junyuan Hong, Mehrdad Mahdavi, Lingjuan Lyu
Recent studies have demonstrated that adversarially robust learning under $\ell_\infty$ attacks is harder to generalize to different domains than standard domain adaptation.
no code implementations • 22 Feb 2023 • Weilin Cong, Si Zhang, Jian Kang, Baichuan Yuan, Hao Wu, Xin Zhou, Hanghang Tong, Mehrdad Mahdavi
Recurrent neural network (RNN) and self-attention mechanism (SAM) are the de facto methods to extract spatial-temporal information for temporal graph learning.
no code implementations • 17 Feb 2023 • Weilin Cong, Mehrdad Mahdavi
As privacy protection receives increasing attention, unlearning the effect of a specific node from a pre-trained graph learning model has become equally important.
no code implementations • 17 Oct 2022 • Pouria Mahdavinia, Yuyang Deng, Haochuan Li, Mehrdad Mahdavi
Despite the established convergence theory of Optimistic Gradient Descent Ascent (OGDA) and Extragradient (EG) methods for the convex-concave minimax problems, little is known about the theoretical guarantees of these methods in nonconvex settings.
no code implementations • ICLR 2022 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Amin Karbasi
To train machine learning models that are robust to distribution shifts in the data, distributionally robust optimization (DRO) has been proven very effective.
no code implementations • 19 Nov 2021 • Weilin Cong, Yanhong Wu, Yuandong Tian, Mengting Gu, Yinglong Xia, Chun-cheng Jason Chen, Mehrdad Mahdavi
To achieve efficient and scalable training, we propose temporal-union graph structure and its associated subgraph-based node sampling strategy.
no code implementations • ICLR 2022 • Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Mahmut T. Kandemir, Anand Sivasubramaniam
To address this performance degradation, we propose to apply Global Server Corrections on the server to refine the locally learned models.
1 code implementation • NeurIPS 2021 • Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases, which is usually attributed to over-smoothing.
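As a toy illustration of the over-smoothing phenomenon mentioned here (not the paper's method; the graph and feature sizes are arbitrary), repeatedly applying the normalized adjacency — the propagation step of a GCN layer, stripped of weights and nonlinearities — drives all node features toward the same value:

```python
import numpy as np

# Toy over-smoothing demo: stack many propagation "layers" P @ X and watch
# the node features collapse toward a common value.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
A_hat = A + np.eye(4)                              # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)       # D^{-1} (A + I)

X = np.random.default_rng(0).normal(size=(4, 3))   # random node features
for _ in range(50):                                # 50 propagation steps
    X = P @ X

spread = np.ptp(X, axis=0).max()   # largest feature gap between any two nodes
print(spread)                      # near zero: features smoothed together
```

Because the second eigenvalue of the propagation matrix is strictly below one on a connected graph, the feature spread decays geometrically with depth.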
2 code implementations • NeurIPS 2021 • Huaxiu Yao, Yu Wang, Ying WEI, Peilin Zhao, Mehrdad Mahdavi, Defu Lian, Chelsea Finn
In ATS, for the first time, we design a neural scheduler that decides which meta-training tasks to use next by predicting the probability of each candidate task being sampled, and train the scheduler to optimize the generalization capacity of the meta-model to unseen tasks.
no code implementations • 22 Jul 2021 • Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi
This work is the first to show the convergence of Local SGD on non-smooth functions, and will shed light on the optimization theory of federated training of deep neural networks.
no code implementations • 4 Apr 2021 • Mohammad Mahdi Kamani, Rana Forsati, James Z. Wang, Mehrdad Mahdavi
The proposed PEF notion is definition-agnostic, meaning that any well-defined notion of fairness can be reduced to the PEF notion.
1 code implementation • 3 Mar 2021 • Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget.
1 code implementation • NeurIPS 2020 • Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi
To compensate for this, we propose a Distributionally Robust Federated Averaging (DRFA) algorithm that employs a novel snapshotting scheme to approximate the accumulation of history gradients of the mixing parameter.
no code implementations • 25 Feb 2021 • Yuyang Deng, Mehrdad Mahdavi
Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes.
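A minimal sketch of the Local SGD scheme described here, on a toy least-squares problem (the number of workers, shard sizes, and step size are assumptions for illustration): each worker takes several local SGD steps between synchronizations, so communication happens once per round instead of once per step.

```python
import numpy as np

# Local SGD sketch: K workers, each holding a data shard, run tau local SGD
# steps on a least-squares objective; the models are averaged once per round.
rng = np.random.default_rng(0)
K, tau, rounds, lr = 4, 10, 25, 0.05
A = rng.normal(size=(K, 50, 5))                    # per-worker features
w_star = rng.normal(size=5)
b = A @ w_star + 0.1 * rng.normal(size=(K, 50))    # per-worker targets

def pooled_loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

w = np.zeros(5)
for _ in range(rounds):
    local = np.tile(w, (K, 1))                     # broadcast the global model
    for _ in range(tau):
        for k in range(K):                         # one local SGD step per worker
            i = rng.integers(50)
            g = (A[k, i] @ local[k] - b[k, i]) * A[k, i]
            local[k] -= lr * g
    w = local.mean(axis=0)                         # synchronize by averaging

print(pooled_loss(w) < pooled_loss(np.zeros(5)))   # loss decreased
```

The synchronization frequency is controlled by `tau`: larger values cut communication cost at the price of more drift between the workers' local models.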
no code implementations • 8 Feb 2021 • Hanlin Lu, Ting He, Shiqiang Wang, Changchang Liu, Mehrdad Mahdavi, Vijaykrishnan Narayanan, Kevin S. Chan, Stephen Pasteris
We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers.
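For reference, the underlying k-means computation is plain Lloyd's algorithm; the paper's contribution concerns how to summarize the data before offloading to an edge server, which is not shown in this sketch. The initial centers below are hand-picked guesses for the toy data.

```python
import numpy as np

# Lloyd's k-means: alternate nearest-center assignment and mean updates.
def kmeans(X, centers, iters=20):
    centers = centers.astype(float).copy()
    for _ in range(iters):
        # assign every point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # recompute each center as the mean of its cluster
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(100, 2)) for m in (-5.0, 0.0, 5.0)])
init = np.array([[-4.0, -4.0], [1.0, 1.0], [4.0, 4.0]])
centers, labels = kmeans(X, init)   # recovers the three blob means
```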
no code implementations • 1 Jan 2021 • Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget.
no code implementations • NeurIPS 2020 • Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Anand Sivasubramaniam, Mahmut Kandemir
Sampling-based methods promise scalability improvements when paired with stochastic gradient descent in training Graph Convolutional Networks (GCNs).
no code implementations • NeurIPS 2020 • Huaxiu Yao, Yingbo Zhou, Mehrdad Mahdavi, Zhenhui Li, Richard Socher, Caiming Xiong
When a new task is encountered, it constructs a meta-knowledge pathway by either utilizing the most relevant knowledge blocks or exploring new blocks.
1 code implementation • 1 Aug 2020 • Mohammad Mahdi Kamani, Sadegh Farhang, Mehrdad Mahdavi, James Z. Wang
The proposed framework, named targeted data-driven regularization (TDR), is model- and dataset-agnostic and employs a target dataset that resembles the desired nature of test data in order to guide the learning process in a coupled manner.
1 code implementation • 2 Jul 2020 • Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, Mehrdad Mahdavi
In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions.
no code implementations • 24 Jun 2020 • Weilin Cong, Rana Forsati, Mahmut Kandemir, Mehrdad Mahdavi
In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into embedding approximation variance in the forward stage and stochastic gradient variance in the backward stage, which necessitates mitigating both types of variance to obtain a faster convergence rate.
9 code implementations • 30 Mar 2020 • Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi
Investigation of the degree of personalization in federated learning algorithms has shown that only maximizing the performance of the global model will confine the capacity of the local models to personalize.
no code implementations • 12 Nov 2019 • Mohammad Mahdi Kamani, Farzin Haddadpour, Rana Forsati, Mehrdad Mahdavi
It has been shown that dimension reduction methods such as PCA may be inherently prone to unfairness, treating data from different sensitive groups (e.g., by race, color, or sex) unfairly.
no code implementations • 31 Oct 2019 • Farzin Haddadpour, Mehrdad Mahdavi
To bridge this gap, we demonstrate that, by properly analyzing the effect of unbiased gradients and sampling schema in the federated setting, under mild assumptions, the implicit variance reduction feature of local distributed methods generalizes to heterogeneous data shards and exhibits the best known convergence rates of the homogeneous setting, both in the general nonconvex case and under the Polyak-Łojasiewicz (PL) condition (a generalization of strong convexity).
2 code implementations • NeurIPS 2019 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck R. Cadambe
Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speedup, that is, an error of $O(1/(pT))$, where $T$ is the total number of model updates at each worker.
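For reference, a function $f$ with minimum value $f^\star$ satisfies the Polyak-Łojasiewicz condition with parameter $\mu > 0$ when

```latex
\frac{1}{2}\,\bigl\|\nabla f(x)\bigr\|^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
\qquad \text{for all } x ,
```

which holds for $\mu$-strongly convex functions but also for some nonconvex ones, since it does not require convexity.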
1 code implementation • International Conference on Machine Learning 2019 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck Cadambe
Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms to train large neural networks.
no code implementations • 20 May 2017 • Samet Oymak, Mehrdad Mahdavi, Jiasi Chen
Evaluations on synthetic and real datasets demonstrate that the algorithm is competitive with the current state-of-the-art and accurately learns feature nonlinearities.
no code implementations • 10 Oct 2016 • Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro
Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data.
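A minimal sketch-and-solve example of the sample-size reduction described here (a standard sketching recipe used for illustration; the problem sizes are arbitrary): compress an n-row least-squares problem to m << n rows with a Gaussian sketch, then solve the small problem exactly.

```python
import numpy as np

# Sketch-and-solve for overdetermined least squares: solve the m-row
# sketched problem (S A) x ~ (S b) instead of the full n-row problem.
rng = np.random.default_rng(0)
n, d, m = 5000, 10, 200
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

S = rng.normal(size=(m, n)) / np.sqrt(m)                  # Gaussian sketch
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)            # full solve
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # sketched solve

err = np.linalg.norm(x_sketch - x_full)
print(err)   # small: the sketched solution tracks the full solution
```

The sketched solve touches only an m x d system, yet its solution stays close to the full one, which is the statistical-power-preserving compression the snippet refers to.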
no code implementations • NeurIPS 2015 • Ofer Meshi, Mehrdad Mahdavi, Alex Schwing
Maximum a posteriori (MAP) inference is an important task in many applications.
no code implementations • 4 Nov 2015 • Ofer Meshi, Mehrdad Mahdavi, Adrian Weller, David Sontag
Structured prediction is used in areas such as computer vision and natural language processing to predict structured outputs such as segmentations or parse trees.
no code implementations • 2 Aug 2014 • Rana Forsati, Mehrdad Mahdavi, Mehrnoush Shamsfard, Mohamed Sarwat
With the advent of online social networks, recommender systems have become crucial to the success of many online applications/services due to their significant role in tailoring these applications to user-specific needs or preferences.
no code implementations • 19 Jul 2014 • Mehrdad Mahdavi
In the last several years, the intimate connection between convex optimization and learning problems, in both statistical and sequential frameworks, has shifted the focus of algorithmic machine learning to examine this interplay.
no code implementations • 7 Feb 2014 • Mehrdad Mahdavi, Lijun Zhang, Rong Jin
In statistical learning theory, convex surrogates of the 0-1 loss are highly preferred because of the computational and theoretical virtues that convexity brings in.
no code implementations • 18 Jan 2014 • Mehrdad Mahdavi, Rong Jin
The overarching goal of this paper is to derive excess risk bounds for learning from exp-concave loss functions in passive and sequential learning settings.
no code implementations • NeurIPS 2013 • Mehrdad Mahdavi, Lijun Zhang, Rong Jin
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is $O(1/\sqrt{T})$, which is the same as that for stochastic optimization of Lipschitz continuous convex functions.
no code implementations • NeurIPS 2013 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin
It leverages the theory of the Lagrangian method in constrained optimization and attains the optimal convergence rate of $O(1/\sqrt{T})$ in high probability for general Lipschitz continuous objectives.
no code implementations • NeurIPS 2013 • Lijun Zhang, Mehrdad Mahdavi, Rong Jin
For smooth and strongly convex optimization, the optimal iteration complexity of gradient-based algorithms is $O(\sqrt{\kappa}\log 1/\epsilon)$, where $\kappa$ is the condition number.
no code implementations • 19 Nov 2013 • Lijun Zhang, Mehrdad Mahdavi, Rong Jin
Under the assumption that the norm of the optimal classifier that minimizes the convex risk is available, our analysis shows that the introduction of the convex surrogate loss yields an exponential reduction in the label complexity even when the parameter $\kappa$ of the Tsybakov noise is larger than $1$.
no code implementations • 26 Jul 2013 • Mehrdad Mahdavi, Rong Jin
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is $O(1/\sqrt{T})$, which is the same as that for stochastic optimization of Lipschitz continuous convex functions.
no code implementations • 8 Feb 2013 • Mehrdad Mahdavi, Rong Jin
In this paper we consider learning in the passive setting but with a slight modification.
no code implementations • NeurIPS 2012 • Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou
Both random Fourier features and the Nyström method have been successfully applied to efficient kernel learning.
no code implementations • NeurIPS 2012 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jin-Feng Yi
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain.
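A minimal sketch of the baseline these methods improve on — projected SGD onto the unit L2 ball, with the projection applied at every single iteration (the problem data and step sizes are illustrative assumptions):

```python
import numpy as np

# Projected SGD on a least-squares objective, constrained to the unit L2
# ball: every update is followed by a projection back into the feasible set.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
b = rng.normal(size=200)

def project_ball(w, radius=1.0):
    nrm = np.linalg.norm(w)
    return w if nrm <= radius else w * (radius / nrm)

w = np.zeros(5)
for t in range(1, 1001):
    i = rng.integers(200)
    g = (A[i] @ w - b[i]) * A[i]           # stochastic gradient of 0.5*(a_i.w - b_i)^2
    w = project_ball(w - g / np.sqrt(t))   # project back at EVERY iteration

print(np.linalg.norm(w))                   # <= 1: the iterate stays feasible
```

The projection is cheap for a ball, but for complicated feasible sets it can dominate the per-iteration cost, which motivates replacing the per-step projection with a single projection at the end.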
no code implementations • 26 Nov 2012 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin
We first propose a projection based algorithm which attains an $O(T^{-1/3})$ convergence rate.
no code implementations • 13 Nov 2012 • Lijun Zhang, Mehrdad Mahdavi, Rong Jin, Tianbao Yang, Shenghuo Zhu
Random projection has been widely used in data classification.
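A minimal random-projection example of the Johnson-Lindenstrauss flavor (the dimensions here are illustrative): mapping points through a scaled Gaussian matrix approximately preserves pairwise distances.

```python
import numpy as np

# Random projection: map d-dimensional points to k << d dimensions with a
# scaled Gaussian matrix; pairwise distances are approximately preserved.
rng = np.random.default_rng(0)
n, d, k = 100, 1000, 200
X = rng.normal(size=(n, d))
R = rng.normal(size=(d, k)) / np.sqrt(k)   # projection matrix
Y = X @ R

orig = np.linalg.norm(X[0] - X[1])         # a distance before projection
proj = np.linalg.norm(Y[0] - Y[1])         # the same distance after
distortion = abs(proj / orig - 1.0)
print(distortion)                          # small relative distortion
```

Because distances survive the projection, a classifier trained on the k-dimensional data behaves much like one trained on the original d-dimensional data, at a fraction of the cost.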
no code implementations • 24 Jan 2012 • Tianbao Yang, Mehrdad Mahdavi, Rong Jin, Shenghuo Zhu
We study the non-smooth optimization problems in machine learning, where both the loss function and the regularizer are non-smooth functions.