no code implementations • 29 May 2024 • Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai
A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF, regardless of how the preference data is collected.
no code implementations • 5 Feb 2024 • Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations.
no code implementations • 1 Nov 2023 • Tong Yang, Shicong Cen, Yuting Wei, Yuxin Chen, Yuejie Chi
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
no code implementations • 8 Oct 2023 • Shicong Cen, Yuejie Chi
Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, have become increasingly popular for sequential decision making in reinforcement learning, games, and control.
no code implementations • 16 Nov 2022 • Ruicheng Ao, Shicong Cen, Yuejie Chi
Moving beyond, we demonstrate entropy-regularized OMWU -- by adopting two-timescale learning rates in a delay-aware manner -- enjoys faster last-iterate convergence under fixed delays, and continues to converge provably even when the delays are arbitrarily bounded in an average-iterate manner.
no code implementations • 3 Oct 2022 • Shicong Cen, Yuejie Chi, Simon S. Du, Lin Xiao
Multi-Agent Reinforcement Learning (MARL) -- where multiple agents learn to interact in a shared dynamic environment -- permeates a wide range of critical applications.
no code implementations • 12 Apr 2022 • Shicong Cen, Fan Chen, Yuejie Chi
We show that the proposed method converges to the quantal response equilibrium (QRE) -- the equilibrium to the entropy-regularized game -- at a sublinear rate, which is independent of the size of the action space and grows at most sublinearly with the number of agents.
no code implementations • NeurIPS 2021 • Shicong Cen, Yuting Wei, Yuejie Chi
Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, we develop provably efficient extragradient methods to find the quantal response equilibrium (QRE) -- the solution to zero-sum two-player matrix games with entropy regularization -- at a linear rate.
no code implementations • 24 May 2021 • Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi
These can often be accounted for via regularized RL, which augments the target value function with a structure-promoting regularizer.
no code implementations • 13 Jul 2020 • Shicong Cen, Chen Cheng, Yuxin Chen, Yuting Wei, Yuejie Chi
This class of methods is often applied in conjunction with entropy regularization -- an algorithmic scheme that encourages exploration -- and is closely related to soft policy iteration and trust region policy optimization.
1 code implementation • 12 Sep 2019 • Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi
There is growing interest in large-scale machine learning and optimization over decentralized networks, e.g., in the context of multi-agent learning and federated learning.
no code implementations • 29 May 2019 • Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu
Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.
no code implementations • 9 Mar 2018 • Andre Milzarek, Xiantao Xiao, Shicong Cen, Zaiwen Wen, Michael Ulbrich
In this work, we present a globalized stochastic semismooth Newton method for solving stochastic optimization problems involving smooth nonconvex and nonsmooth convex terms in the objective function.