no code implementations • 15 May 2024 • Sihan Zeng, Thinh T. Doan
Two-time-scale optimization is a framework introduced in Zeng et al. (2024) that abstracts a range of policy evaluation and policy optimization problems in reinforcement learning (RL).
no code implementations • 3 May 2024 • Sihan Zeng, Thinh T. Doan, Justin Romberg
Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time.
no code implementations • 23 Jan 2024 • Thinh T. Doan
This paper develops a new variant of two-time-scale stochastic approximation to find the roots of two coupled nonlinear operators, assuming that only noisy samples of these operators can be observed.
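The two-time-scale idea can be sketched on a toy coupled root-finding problem. Everything here is illustrative (the operators F and G, the constant c, and the step-size schedules are my own choices, not from the paper): the fast iterate tracks the root of one operator while the slow iterate, seeing the fast one as quasi-stationary, tracks the root of the other.

```python
import random

random.seed(0)

# Toy coupled root-finding problem: find (x, y) with
#   F(x, y) = x - y = 0   (slow variable)
#   G(x, y) = y - c = 0   (fast variable)
# where only noisy samples of c can be observed.  Both roots equal c.
c = 3.0
x, y = 0.0, 0.0

for k in range(1, 200001):
    a = 0.5 / k           # slow step size
    b = 0.5 / k ** 0.6    # fast step size; b/a -> infinity, the two-time-scale condition
    sample = c + random.gauss(0.0, 1.0)   # noisy observation used for G
    y += b * (sample - y)  # fast time scale: y tracks the root of G
    x += a * (y - x)       # slow time scale: x chases y's quasi-stationary estimate

print(x, y)  # both close to 3.0
```

The separation of step sizes is the essential point: because b decays more slowly than a, the fast variable equilibrates between slow-variable moves.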
no code implementations • 15 Jun 2022 • Dingyang Chen, Qi Zhang, Thinh T. Doan
Our focus in this paper is to study the convergence of the policy gradient method for solving MPGs under softmax policy parameterization, both tabular and parameterized with general function approximators such as neural networks.
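A minimal instance of policy gradient under softmax parameterization is the tabular, one-state case (a two-armed bandit) with exact gradients. This is only an illustrative special case, not the Markov potential game setting of the paper; the rewards and step size are my own choices.

```python
import math

# Softmax policy gradient on a one-state MDP (a two-armed bandit),
# with exact gradients.  r[a] is the reward of action a.
r = [1.0, 0.0]
theta = [0.0, 0.0]   # softmax parameters, one per action
eta = 1.0            # step size (illustrative)

for _ in range(2000):
    z = [math.exp(t) for t in theta]
    total = sum(z)
    pi = [p / total for p in z]                 # pi(a) = exp(theta_a) / sum_b exp(theta_b)
    J = sum(p * ra for p, ra in zip(pi, r))     # expected reward under pi
    # Softmax policy gradient: dJ/dtheta_a = pi(a) * (r(a) - J)
    theta = [t + eta * p * (ra - J) for t, p, ra in zip(theta, pi, r)]

z = [math.exp(t) for t in theta]
pi = [p / sum(z) for p in z]
print(pi)  # nearly all probability mass on the better action 0
```

The gradient identity used in the comment follows directly from differentiating the softmax; the same structure underlies the tabular analysis, while the parameterized case replaces the table with a function approximator.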
no code implementations • 27 May 2022 • Sihan Zeng, Thinh T. Doan, Justin Romberg
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game.
no code implementations • 17 Dec 2021 • Thinh T. Doan
Perhaps the most popular first-order method for solving min-max optimization problems is the so-called simultaneous (or single-loop) gradient descent-ascent algorithm, owing to its simplicity of implementation.
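The simultaneous update is easy to state in code: both players move at once using gradients evaluated at the current pair. A minimal sketch on a strongly-convex-strongly-concave saddle problem (the objective and step size are illustrative, not from the paper):

```python
# Simultaneous gradient descent-ascent on
#   min_x max_y f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,   saddle point (0, 0).
x, y = 1.0, 1.0
eta = 0.1   # a common step size for both players

for _ in range(500):
    gx = x + y          # df/dx
    gy = x - y          # df/dy
    # Simultaneous (single-loop) update: descent in x, ascent in y,
    # both using gradients at the current iterate (x, y).
    x, y = x - eta * gx, y + eta * gy

print(x, y)  # converges to the saddle point (0, 0)
```

Note that on a purely bilinear objective (drop the quadratic terms) this same simultaneous scheme with a constant step size diverges, which is one reason its analysis is delicate.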
no code implementations • 21 Oct 2021 • Sihan Zeng, Thinh T. Doan, Justin Romberg
To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes.
no code implementations • 29 Sep 2021 • Sihan Zeng, Thinh T. Doan, Justin Romberg
In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution.
no code implementations • 26 Aug 2021 • Nirupam Gupta, Thinh T. Doan, Nitin Vaidya
However, we do not know of any such techniques for the federated local SGD algorithm, a more commonly used method for federated machine learning.
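For context, federated local SGD alternates local gradient steps at each client with server-side averaging. A minimal sketch on per-client quadratic losses (the losses, client optima, and hyperparameters are illustrative):

```python
# Federated local SGD sketch: each client runs H local SGD steps on its own
# quadratic loss f_i(w) = 0.5*(w - c[i])**2, then the server averages the
# resulting models.
c = [1.0, 2.0, 3.0]   # per-client optima; the global optimum is their mean, 2.0
w = 0.0               # server model
eta, H = 0.1, 5       # local learning rate and number of local steps

for _ in range(100):            # communication rounds
    local_models = []
    for ci in c:                # each client starts from the current server model
        wi = w
        for _ in range(H):      # H local gradient steps on f_i
            wi -= eta * (wi - ci)
        local_models.append(wi)
    w = sum(local_models) / len(local_models)   # server averaging step

print(w)  # converges to 2.0, the minimizer of the average loss
```

On this toy problem local steps introduce no drift because the losses are quadratic in one variable; in general, heterogeneity between clients makes the analysis harder, which is part of the motivation above.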
no code implementations • 28 May 2021 • Marcos M. Vasconcelos, Thinh T. Doan, Urbashi Mitra
In particular, we show that the method converges at a rate $O(\log_2 k/\sqrt{k})$ to the optimal solution when the underlying objective function is strongly convex and smooth.
no code implementations • 4 Apr 2021 • Thinh T. Doan
Such dependent data result in biased observations of the underlying operators.
no code implementations • 26 Jan 2021 • Sajad Khodadadian, Thinh T. Doan, Justin Romberg, Siva Theja Maguluri
In this paper, we characterize the \emph{global} convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples.
no code implementations • 3 Nov 2020 • Thinh T. Doan
Under some fairly standard assumptions, we provide a formula that characterizes the rate of convergence of the main iterates to the desired solutions.
no code implementations • 28 Oct 2020 • Sihan Zeng, Thinh T. Doan, Justin Romberg
We study a decentralized variant of stochastic approximation, a data-driven approach for finding the root of an operator under noisy measurements.
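A decentralized stochastic approximation step combines local noisy operator evaluations with a consensus (mixing) step over the network. A minimal sketch with three agents and a doubly stochastic weight matrix (the operators, noise level, and weights are illustrative):

```python
import random

random.seed(1)

# Three agents each observe noisy samples of a local operator
# F_i(x) = c[i] - x and mix their iterates with a doubly stochastic
# matrix W.  The network-wide root is the mean of the c[i], here 2.0.
c = [1.0, 2.0, 3.0]
W = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
x = [0.0, 0.0, 0.0]

for k in range(1, 100001):
    a = 1.0 / k                                               # diminishing step size
    mixed = [sum(W[i][j] * x[j] for j in range(3)) for i in range(3)]
    for i in range(3):
        sample = c[i] + random.gauss(0.0, 0.5) - x[i]         # noisy F_i(x_i)
        x[i] = mixed[i] + a * sample                          # consensus + local SA step

print(x)  # all three iterates close to 2.0
```

The mixing step drives the agents toward consensus while the diminishing step size averages out the measurement noise, so every agent converges to the root of the average operator.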
no code implementations • 24 Jun 2020 • Thinh T. Doan
Motivated by broad applications in reinforcement learning and federated learning, we study local stochastic approximation over a network of agents, where their goal is to find the root of an operator composed of the local operators at the agents.
no code implementations • 24 Mar 2020 • Thinh T. Doan, Lam M. Nguyen, Nhan H. Pham, Justin Romberg
Motivated by broad applications in reinforcement learning and machine learning, this paper considers the popular stochastic gradient descent (SGD) when the gradients of the underlying objective function are sampled from Markov processes.
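The distinguishing feature of this setting is that consecutive gradient samples are correlated because they come from a Markov chain, so each sample is a biased estimate of the true gradient. A minimal sketch with a two-state chain (the chain, loss, and step sizes are illustrative):

```python
import random

random.seed(2)

# SGD where gradient samples are driven by a Markov chain rather than
# i.i.d. noise.  The chain has states 0.0 and 4.0 and flips with
# probability 0.3, so its stationary distribution is uniform and the
# minimizer of E_pi[0.5*(w - s)**2] is w* = 2.0.
w, s = 0.0, 0.0

for k in range(1, 200001):
    if random.random() < 0.3:       # correlated sample path: s is Markovian
        s = 4.0 - s
    w -= (1.0 / k) * (w - s)        # SGD step using the dependent sample

print(w)  # close to the stationary minimizer 2.0
```

Because the chain is ergodic, time averages still converge to stationary expectations; the price of the correlation shows up as an extra mixing-time factor in finite-time bounds, not in the limit point.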
no code implementations • 23 Dec 2019 • Thinh T. Doan
Motivated by broad applications in reinforcement learning, we study linear two-time-scale stochastic approximation, an iterative method that uses two different step sizes to find the solutions of a system of two equations.
no code implementations • 25 Jul 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg
Our main contribution is a finite-time analysis of the performance of this distributed {\sf TD}$(\lambda)$ algorithm for both constant and time-varying step sizes.
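A minimal sketch of the distributed TD(lambda) idea: agents observe the same Markov chain but different local rewards, run local TD(lambda) updates with eligibility traces, and average their value estimates through consensus, so each agent evaluates the value of the average reward. The chain, rewards, and step sizes below are all illustrative, not from the paper.

```python
import random

random.seed(3)

# Two agents, a two-state chain with uniform transitions, gamma = 0.9.
# Local rewards average to 1 in every state, so the true value of the
# average-reward evaluation problem is 1 / (1 - 0.9) = 10 in both states.
gamma, lam, alpha = 0.9, 0.5, 0.05
rewards = [[2.0, 0.0], [0.0, 2.0]]     # local rewards r_i(s); their mean is 1
V = [[0.0, 0.0], [0.0, 0.0]]           # one value table per agent
z = [0.0, 0.0]                         # eligibility trace (same chain for both)
s = 0

for _ in range(50000):
    s_next = random.randrange(2)       # uniform transitions
    z = [gamma * lam * zi for zi in z]
    z[s] += 1.0                        # accumulate trace for the visited state
    for i in range(2):                 # local TD(lambda) update at each agent
        delta = rewards[i][s] + gamma * V[i][s_next] - V[i][s]
        for j in range(2):
            V[i][j] += alpha * delta * z[j]
    avg = [(V[0][j] + V[1][j]) / 2 for j in range(2)]
    V = [avg[:], avg[:]]               # consensus step: average the estimates
    s = s_next

print(V)  # both agents' value tables close to [10, 10]
```

The consensus step is what lets each agent recover the value of the global (average) reward even though it only ever observes its own local reward.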
1 code implementation • 27 May 2019 • Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri
To demonstrate the generality of our theoretical results on Markovian SA, we use them to derive finite-sample bounds for the popular $Q$-learning algorithm with linear function approximation, under a condition on the behavior policy.
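For concreteness, here is Q-learning with linear function approximation on a tiny deterministic MDP, using one-hot features phi(s, a); one-hot features are the simplest linear parameterization, so this reduces to tabular Q-learning. The MDP, the uniformly random behavior policy, and the step size are my own illustrative choices, not the paper's setting.

```python
import random

random.seed(4)

gamma, alpha = 0.9, 0.1
# 2 states x 2 actions, deterministic: next_state[s][a], reward[s][a].
next_state = [[0, 1], [0, 1]]
reward = [[1.0, 0.0], [0.0, 2.0]]
theta = [0.0] * 4                       # theta[2*s + a] = Q(s, a) under one-hot features

s = 0
for _ in range(20000):
    a = random.randrange(2)             # uniformly random (exploratory) behavior policy
    s2, r = next_state[s][a], reward[s][a]
    target = r + gamma * max(theta[2 * s2], theta[2 * s2 + 1])
    # Semi-gradient update: theta += alpha * (target - Q(s,a)) * phi(s,a);
    # with one-hot phi this touches a single component of theta.
    theta[2 * s + a] += alpha * (target - theta[2 * s + a])
    s = s2

print(theta)  # approaches Q* = [17.2, 18.0, 16.2, 20.0]
```

The exploratory behavior policy plays the role of the condition mentioned above: every state-action pair must be visited often enough for the iterates to converge.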
no code implementations • 20 Feb 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg
In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted cumulative reward, which is composed of the local rewards observed by the agents.
Optimization and Control