Search Results for author: Wenjie Shi

Found 7 papers, 2 papers with code

Temporal-Spatial Causal Interpretations for Vision-Based Reinforcement Learning

no code implementations • 6 Dec 2021 • Wenjie Shi, Gao Huang, Shiji Song, Cheng Wu

The TSCI model builds on a formulation of temporal causality that reflects the temporal causal relations between sequential observations and the decisions of an RL agent.

Causal Discovery • Decision Making • +2
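The abstract above does not spell out how the temporal causal relations are extracted, so the following is only a generic perturbation-style sketch in that spirit, not the paper's TSCI model: it scores how much each past observation influences the current action by noising that observation and measuring the change in the policy output. The function name, the toy policy, and all parameters are hypothetical.

```python
# Hypothetical perturbation-based attribution over a sequence of observations
# (an assumption for illustration; not the paper's TSCI formulation).
import numpy as np

def temporal_influence(policy, obs_seq, noise_scale=0.5, n_samples=8, seed=None):
    """Return one influence score per time step in obs_seq.

    policy: callable mapping an observation sequence of shape (T, d) to an action vector.
    obs_seq: array of shape (T, d) holding the T most recent observations.
    """
    rng = np.random.default_rng(seed)
    base_action = policy(obs_seq)
    scores = np.zeros(len(obs_seq))
    for t in range(len(obs_seq)):
        diffs = []
        for _ in range(n_samples):
            perturbed = obs_seq.copy()
            perturbed[t] += noise_scale * rng.standard_normal(obs_seq.shape[1])
            diffs.append(np.linalg.norm(policy(perturbed) - base_action))
        # Larger score = perturbing observation t changes the decision more.
        scores[t] = np.mean(diffs)
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 2))
    # Toy policy that depends mostly on the latest observation.
    policy = lambda seq: np.tanh(seq[-1] @ W + 0.3 * seq[:-1].mean(axis=0) @ W)
    print(temporal_influence(policy, rng.standard_normal((6, 4)), seed=1))
```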

A Reduction Approach to Constrained Reinforcement Learning

no code implementations • 1 Jan 2021 • Tianchi Cai, Wenjie Shi, Lihong Gu, Xiaodong Zeng, Jinjie Gu

In this paper, we present a reduction approach to find sparse policies that randomize among a constant number of policies for the constrained RL problem.

reinforcement-learning • Reinforcement Learning (RL)
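As a hedged illustration of why randomizing among a small, constant number of policies can suffice, the sketch below solves a tiny linear program over hypothetical candidate policies: with m cost constraints, a basic optimal solution has support of at most m + 1 policies, i.e. it is sparse. The numbers and the LP formulation are assumptions for the example, not the paper's reduction.

```python
# Mixture over candidate policies chosen by a linear program (illustrative only).
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-policy estimates: expected reward and expected cost of each candidate.
rewards = np.array([1.0, 0.8, 0.6, 0.9, 0.4])
costs = np.array([0.9, 0.5, 0.1, 0.7, 0.2])
cost_limit = 0.4  # constraint: expected cost of the mixture must stay below this

n = len(rewards)
res = linprog(
    c=-rewards,                                # maximize reward = minimize negative reward
    A_ub=costs[None, :], b_ub=[cost_limit],    # mixture cost constraint
    A_eq=np.ones((1, n)), b_eq=[1.0],          # weights form a probability distribution
    bounds=[(0, 1)] * n,
    method="highs",
)
weights = res.x
print("mixture weights:", np.round(weights, 3))
print("support size:", int(np.sum(weights > 1e-8)))  # expected to be small (sparse)
```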

Robust Offline Reinforcement Learning from Low-Quality Data

no code implementations • 1 Jan 2021 • Wenjie Shi, Tianchi Cai, Shiji Song, Lihong Gu, Jinjie Gu, Gao Huang

We theoretically show that AdaPT produces a tight upper bound on the distributional deviation between the learned policy and the behavior policy, and that this bound is the minimum requirement for guaranteeing policy improvement at each iteration.

Continuous Control • Offline RL • +2
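The snippet below is not AdaPT itself; it is a minimal sketch of the general offline-RL idea of bounding how far the learned policy may drift from the behavior policy that generated the data, here with an assumed KL penalty over a discrete action set.

```python
# Generic behavior-regularized objective (assumed illustration, not the paper's AdaPT).
import numpy as np

def penalized_objective(q_values, learned_probs, behavior_probs, alpha=1.0):
    """Expected Q under the learned policy minus a KL(learned || behavior) penalty.

    q_values, learned_probs, behavior_probs: arrays over a discrete action set.
    alpha: penalty weight; larger alpha keeps the learned policy closer to the data.
    """
    expected_q = np.sum(learned_probs * q_values)
    kl = np.sum(learned_probs * np.log(learned_probs / behavior_probs))
    return expected_q - alpha * kl

q = np.array([1.0, 2.0, 0.5])
behavior = np.array([0.5, 0.3, 0.2])
greedy = np.array([0.01, 0.98, 0.01])       # deviates strongly from the data
conservative = np.array([0.4, 0.45, 0.15])  # stays close to the behavior policy
print(penalized_objective(q, greedy, behavior), penalized_objective(q, conservative, behavior))
```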

Self-Supervised Discovering of Interpretable Features for Reinforcement Learning

1 code implementation • 16 Mar 2020 • Wenjie Shi, Gao Huang, Shiji Song, Zhuoyuan Wang, Tingyu Lin, Cheng Wu

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks.

Atari Games • Decision Making • +2

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

no code implementations • 7 Sep 2019 • Wenjie Shi, Shiji Song, Cheng Wu, C. L. Philip Chen

Unlike existing policy gradient methods, which employ a single actor-critic and cannot achieve satisfactory tracking control accuracy or stable learning, the proposed algorithm attains high tracking control accuracy for AUVs and stable learning by applying a hybrid actors-critics architecture, in which multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively.

Policy Gradient Methods • Q-Learning
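Below is a loose sketch of one plausible hybrid actors-critics reading, assumed for illustration and not the exact MPQ-DPG architecture: several deterministic actors each propose an action, the critics' averaged value estimate scores the proposals, and the highest-valued action is executed. The toy linear actors and critics stand in for neural networks.

```python
# Assumed hybrid actors-critics sketch (illustrative, not the paper's method).
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, n_actors, n_critics = 8, 2, 4, 3

# Toy linear actors and critics standing in for trained networks.
actors = [rng.standard_normal((state_dim, action_dim)) for _ in range(n_actors)]
critics = [rng.standard_normal(state_dim + action_dim) for _ in range(n_critics)]

def q_value(state, action):
    x = np.concatenate([state, action])
    return np.mean([w @ x for w in critics])  # average over critics to reduce estimation noise

def act(state):
    proposals = [np.tanh(state @ W) for W in actors]  # each actor proposes a deterministic action
    scores = [q_value(state, a) for a in proposals]   # score the proposals with the critics
    return proposals[int(np.argmax(scores))]          # execute the highest-valued proposal

print(act(rng.standard_normal(state_dim)))
```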

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

1 code implementation • NeurIPS 2019 • Wenjie Shi, Shiji Song, Hui Wu, Ya-Chu Hsu, Cheng Wu, Gao Huang

To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms that draws on the idea underlying regularized Anderson acceleration (RAA), an effective approach to accelerating the solution of fixed-point problems with perturbations.

reinforcement-learning • Reinforcement Learning (RL)
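Since regularized Anderson acceleration is a concrete numerical technique, here is a compact sketch of it applied to a generic fixed-point iteration x = g(x); the memory size m and regularization weight lam are illustrative, and the RL-specific application from the paper is not reproduced.

```python
# Regularized Anderson acceleration for a generic fixed-point problem x = g(x).
# The Tikhonov term lam * I damps the extrapolation coefficients for stability.
import numpy as np

def raa_fixed_point(g, x0, m=5, lam=1e-3, iters=50):
    xs, gs = [np.asarray(x0, dtype=float)], []
    for k in range(iters):
        gs.append(g(xs[-1]))
        fs = [gk - xk for gk, xk in zip(gs, xs)]  # residuals f_i = g(x_i) - x_i
        mk = min(m, k)
        if mk == 0:
            x_next = gs[-1]  # plain fixed-point step until we have history
        else:
            # Differences of residuals and of g-values over the memory window.
            dF = np.stack([fs[-i] - fs[-i - 1] for i in range(1, mk + 1)], axis=1)
            dG = np.stack([gs[-i] - gs[-i - 1] for i in range(1, mk + 1)], axis=1)
            # Regularized least squares for the mixing coefficients.
            gamma = np.linalg.solve(dF.T @ dF + lam * np.eye(mk), dF.T @ fs[-1])
            x_next = gs[-1] - dG @ gamma
        xs.append(x_next)
    return xs[-1]

# Example: accelerate the contraction g(x) = cos(x), whose fixed point is near 0.739.
print(raa_fixed_point(np.cos, x0=np.array([0.0]), m=3))
```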

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

no code implementations • 7 Sep 2019 • Wenjie Shi, Shiji Song, Cheng Wu

Then, we present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG), which combines the soft policy gradient with the soft Bellman equation.

reinforcement-learning • Reinforcement Learning (RL)
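The soft Bellman equation the abstract refers to can be illustrated with a tiny tabular example: for discrete actions the soft state value is V(s) = α·log Σ_a exp(Q(s,a)/α), and iterating the soft backup yields the maximum-entropy Q-function and its softmax policy. The toy MDP below is an assumption for illustration, not the paper's continuous-control, actor-critic setting.

```python
# Tabular soft value iteration illustrating the soft Bellman backup (toy MDP assumed).
import numpy as np

n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transition probs
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # rewards

Q = np.zeros((n_states, n_actions))
for _ in range(200):                                   # iterate the soft backup to a fixed point
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))  # soft (log-sum-exp) state values
    Q = R + gamma * P @ V                              # soft Bellman backup

pi = np.exp(Q / alpha)
pi /= pi.sum(axis=1, keepdims=True)                    # the induced maximum-entropy policy
print("soft Q:\n", np.round(Q, 3), "\npolicy:\n", np.round(pi, 3))
```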
