no code implementations • 27 Feb 2024 • Zijian Guo, Weichao Zhou, Wenchao Li
Offline safe reinforcement learning (RL) aims to train a constraint-satisfying policy from a fixed dataset.
no code implementations • 2 Jun 2023 • Weichao Zhou, Wenchao Li
Many imitation learning (IL) algorithms employ inverse reinforcement learning (IRL) to infer the intrinsic reward function that an expert is implicitly optimizing for based on their demonstrated behaviors.
1 code implementation • 31 Mar 2023 • YiXuan Wang, Weichao Zhou, Jiameng Fan, Zhilu Wang, Jiajun Li, Xin Chen, Chao Huang, Wenchao Li, Qi Zhu
We also present a novel approach to propagate Taylor models (TMs) more efficiently and precisely across ReLU activation functions.
no code implementations • 20 Apr 2022 • Weichao Zhou, Wenchao Li
A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems.
no code implementations • 14 Dec 2021 • Weichao Zhou, Wenchao Li
In this paper, we propose the idea of programmatic reward design, i.e., using programs to specify the reward functions in RL environments.
2 code implementations • 25 Jun 2021 • Chao Huang, Jiameng Fan, Zhilu Wang, YiXuan Wang, Weichao Zhou, Jiajun Li, Xin Chen, Wenchao Li, Qi Zhu
We present POLAR, a polynomial arithmetic-based framework for efficient bounded-time reachability analysis of neural-network controlled systems (NNCSs).
no code implementations • 17 Aug 2020 • Weichao Zhou, Ruihan Gao, BaekGyu Kim, Eunsuk Kang, Wenchao Li
The key idea behind our approach is the formulation of a trajectory optimization problem that allows the joint reasoning of policy update and safety constraints.
1 code implementation • 22 Oct 2017 • Weichao Zhou, Wenchao Li
Apprenticeship learning (AL) is a Learning from Demonstration technique in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations.