Search Results for author: Shicong Cen

Found 13 papers, 1 paper with code

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

no code implementations • 29 May 2024 • Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF, regardless of how the preference data is collected.

Reinforcement Learning (RL)

Beyond Expectations: Learning with Stochastic Dominance Made Practical

no code implementations • 5 Feb 2024 • Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations.

Decision Making, Portfolio Optimization

Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning

no code implementations • 1 Nov 2023 • Tong Yang, Shicong Cen, Yuting Wei, Yuxin Chen, Yuejie Chi

Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.

Decision Making, Policy Gradient Methods

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

no code implementations • 8 Oct 2023 • Shicong Cen, Yuejie Chi

Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, have become increasingly popular for sequential decision making in reinforcement learning, games, and control.

Decision Making, Policy Gradient Methods

Asynchronous Gradient Play in Zero-Sum Multi-agent Games

no code implementations • 16 Nov 2022 • Ruicheng Ao, Shicong Cen, Yuejie Chi

Moving beyond, we demonstrate that entropy-regularized OMWU -- by adopting two-timescale learning rates in a delay-aware manner -- enjoys faster last-iterate convergence under fixed delays, and continues to converge provably even when the delays are arbitrarily bounded in an average-iterate manner.

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

no code implementations • 3 Oct 2022 • Shicong Cen, Yuejie Chi, Simon S. Du, Lin Xiao

Multi-Agent Reinforcement Learning (MARL) -- where multiple agents learn to interact in a shared dynamic environment -- permeates a wide range of critical applications.

Multi-agent Reinforcement Learning

Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization

no code implementations • 12 Apr 2022 • Shicong Cen, Fan Chen, Yuejie Chi

We show that the proposed method converges to the quantal response equilibrium (QRE) -- the equilibrium to the entropy-regularized game -- at a sublinear rate, which is independent of the size of the action space and grows at most sublinearly with the number of agents.

Autonomous Vehicles, Policy Gradient Methods

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

no code implementations • NeurIPS 2021 • Shicong Cen, Yuting Wei, Yuejie Chi

Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, we develop provably efficient extragradient methods to find the quantal response equilibrium (QRE) -- which is the solution to zero-sum two-player matrix games with entropy regularization -- at a linear rate.

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

no code implementations • 24 May 2021 • Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi

These can often be accounted for via regularized RL, which augments the target value function with a structure-promoting regularizer.

Reinforcement Learning (RL)

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

no code implementations • 13 Jul 2020 • Shicong Cen, Chen Cheng, Yuxin Chen, Yuting Wei, Yuejie Chi

This class of methods is often applied in conjunction with entropy regularization -- an algorithmic scheme that encourages exploration -- and is closely related to soft policy iteration and trust region policy optimization.

Policy Gradient Methods

Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

1 code implementation • 12 Sep 2019 • Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi

There is growing interest in large-scale machine learning and optimization over decentralized networks, e.g., in the context of multi-agent learning and federated learning.

Distributed Optimization, Federated Learning

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

no code implementations • 29 May 2019 • Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu

Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.

A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization

no code implementations • 9 Mar 2018 • Andre Milzarek, Xiantao Xiao, Shicong Cen, Zaiwen Wen, Michael Ulbrich

In this work, we present a globalized stochastic semismooth Newton method for solving stochastic optimization problems involving smooth nonconvex and nonsmooth convex terms in the objective function.

Binary Classification, Stochastic Optimization
