no code implementations • 29 Jan 2024 • Johan Olsson, Runyu Zhang, Emma Tegling, Na Li
In this work, we study a special class of such problems where distributed state feedback controllers can give near-optimal performance.
no code implementations • 18 Jan 2024 • Phevos Paschalidis, Runyu Zhang, Na Li
The reward of the system is modeled as a weighted sum of the rewards the agents observe, where the weights capture a transformation of the reward that accounts for multiple agents sampling the same node at the same time.
no code implementations • 20 Jun 2023 • Runyu Zhang, Yang Hu, Na Li
This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure (Ruszczyński 2010), and establishes its equivalence with a class of regularized robust MDP (RMDP) problems, including the standard RMDP as a special case.
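The connection between risk-sensitive evaluation and regularized robust MDPs can be illustrated in the KL-regularized case, where the worst-case backup has a closed form equal to the entropic risk measure of the next-state value. A minimal sketch, assuming a KL penalty; the function name and constants are illustrative, not from the paper:

```python
import numpy as np

def kl_robust_backup(values, p0, gamma=0.95, lam=1.0):
    """Worst-case expected next-state value under a KL-regularized adversary:
        min_p  <p, gamma*values> + lam * KL(p || p0).
    By the Gibbs variational principle this equals
        -lam * log E_{p0}[exp(-gamma*values/lam)],
    i.e. the entropic risk measure of the discounted next-state value."""
    return -lam * np.log(p0 @ np.exp(-gamma * values / lam))

V = np.array([1.0, 2.0, 3.0])    # value function on three next states
p0 = np.array([0.2, 0.5, 0.3])   # nominal transition distribution
robust = kl_robust_backup(V, p0) # never exceeds the nominal expectation 0.95 * p0 @ V
```

As `lam` grows the penalty dominates, the adversary stays at `p0`, and the backup recovers the standard risk-neutral expectation; small `lam` gives a more conservative, risk-averse backup.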
no code implementations • 28 Feb 2023 • Tyler Will, Runyu Zhang, Eli Sadovnik, Mengdi Gao, Joshua Vendrow, Jamie Haddock, Denali Molitor, Deanna Needell
We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data.
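The idea of exposing hierarchy by stacking factorizations can be sketched with plain multiplicative-update NMF, factoring the coefficient matrix a second time. This is only an illustrative sketch, not the paper's Neural NMF algorithm — here the two layers are fit greedily one after the other, and the ranks and data are made up:

```python
import numpy as np

def nmf(X, rank, iters=200, eps=1e-9):
    """Basic NMF via Lee-Seung multiplicative updates: X ~ W @ H."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hierarchy: factor X into 4 topics, then factor the coefficient
# matrix H1 into 2 super-topics, so that X ~ W1 @ W2 @ H2.
X = np.abs(np.random.default_rng(1).random((30, 20)))
W1, H1 = nmf(X, rank=4)
W2, H2 = nmf(H1, rank=2)
```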
no code implementations • 6 Jun 2022 • Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, Yu Bai
Next, we show that this framework instantiated with the Optimistic Follow-The-Regularized-Leader (OFTRL) algorithm at each state (and smooth value updates) can find an $\widetilde{\mathcal{O}}(T^{-5/6})$ approximate NE in $T$ iterations, and a similar algorithm with a slightly modified value update rule achieves a faster $\widetilde{\mathcal{O}}(T^{-1})$ convergence rate.
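For intuition, here is what OFTRL with an entropy regularizer (Optimistic Hedge) looks like in the single-state case, i.e. a two-player zero-sum matrix game. This sketches only the per-state update — the paper's algorithm additionally couples such updates with smooth value updates across states — and the step size and payoff matrix are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def optimistic_hedge(A, T=5000, eta=0.05):
    """Optimistic Hedge (OFTRL with entropy regularizer) on a zero-sum
    matrix game: the prediction of the next gradient is the last one."""
    m, n = A.shape
    Gx, Gy = np.zeros(m), np.zeros(n)   # cumulative gradients
    gx, gy = np.zeros(m), np.zeros(n)   # most recent gradients (optimism term)
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x = softmax(eta * (Gx + gx))    # row player maximizes x^T A y
        y = softmax(-eta * (Gy + gy))   # column player minimizes
        gx, gy = A @ y, A.T @ x
        Gx += gx
        Gy += gy
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T

A = np.array([[3.0, -1.0], [-1.0, 1.0]])  # unique NE: both players mix (1/3, 2/3)
x_bar, y_bar = optimistic_hedge(A)
```

The average iterates approach the mixed Nash equilibrium; the optimism term (reusing the last gradient as a prediction) is what gives OFTRL its fast rates in games.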
no code implementations • 1 Jun 2021 • Runyu Zhang, Zhaolin Ren, Na Li
We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a local convergence rate around strict NEs.
1 code implementation • L4DC 2020 • Ying-Ying Li, Yujie Tang, Runyu Zhang, Na Li
We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms.
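The zero-order ingredient can be illustrated on a single scalar system: estimate the policy gradient of a linear gain from cost evaluations alone, with no model derivatives. A minimal sketch, assuming a single agent and a two-point smoothing estimator (which in one dimension reduces to a central difference); the constants are illustrative and ZODPO's distributed consensus step is omitted:

```python
def lqr_cost(k, a=0.9, b=1.0, q=1.0, r=0.1, x0=1.0, horizon=50):
    """Finite-horizon quadratic cost of x_{t+1} = a x_t + b u_t with u_t = -k x_t."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x**2 + r * u**2
        x = a * x + b * u
    return cost

def zero_order_grad(k, delta=0.05):
    """Two-point zero-order gradient estimate: uses only cost queries."""
    return (lqr_cost(k + delta) - lqr_cost(k - delta)) / (2 * delta)

# Policy gradient on the scalar gain using only cost evaluations.
k, lr = 0.0, 0.01
for _ in range(300):
    k -= lr * zero_order_grad(k)
```

With these constants the gain settles near the LQR-optimal value (about 0.82) without ever differentiating through the dynamics, which is the point of the zero-order approach.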