no code implementations • 17 Apr 2024 • Ameesh Shah, Cameron Voloshin, Chenxi Yang, Abhinav Verma, Swarat Chaudhuri, Sanjit A. Seshia
In this work, we present Cycle Experience Replay (CyclER), a reward-shaping approach to this problem that supports continuous state and action spaces and the use of function approximation.
no code implementations • 3 Mar 2023 • Cameron Voloshin, Abhinav Verma, Yisong Yue
Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions.
no code implementations • 20 Jun 2022 • Cameron Voloshin, Hoang M. Le, Swarat Chaudhuri, Yisong Yue
We study the problem of policy optimization (PO) with linear temporal logic (LTL) constraints.
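A minimal sketch of the core idea behind LTL-constrained settings like this: a temporal formula such as "avoid the hazard until reaching the goal" (¬hazard U goal) can be tracked by a small automaton, and a policy satisfies the constraint when its trajectories drive the automaton to an accepting state. The automaton states and proposition labels below are illustrative, not from the paper.

```python
# Hypothetical monitor for the LTL formula (NOT hazard) UNTIL goal.
# Automaton states: 0 = pending, 1 = accepted (goal reached safely),
# 2 = rejected (hazard seen first). Accept/reject states are absorbing.

def make_monitor():
    def step(q, label):
        if q == 0:
            if label == "hazard":
                return 2
            if label == "goal":
                return 1
        return q  # absorbing once accepted or rejected
    return step

def satisfies(trace):
    """Run the monitor over a trace of atomic-proposition labels."""
    step, q = make_monitor(), 0
    for label in trace:
        q = step(q, label)
    return q == 1
```

Policy optimization under the constraint then amounts to steering trajectories of the product of the environment and this monitor toward accepting states.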
no code implementations • 2 Mar 2021 • Cameron Voloshin, Nan Jiang, Yisong Yue
We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning.
3 code implementations • 15 Nov 2019 • Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue
We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications.
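For context, one classic OPE estimator of the kind such benchmarks compare is ordinary trajectory-wise importance sampling: reweight returns collected under a behavior policy by the likelihood ratio of the evaluation policy. The function names and toy setup below are illustrative, not the benchmark's API.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Estimate the value of evaluation policy pi_e from trajectories
    collected under behavior policy pi_b, via importance sampling.
    Each trajectory is a list of (state, action, reward) tuples;
    pi_e(a, s) and pi_b(a, s) return action probabilities."""
    values = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += discount * r
            discount *= gamma
        values.append(weight * ret)
    return float(np.mean(values))
```

When pi_e equals pi_b, every weight is 1 and the estimate reduces to the average observed return; the variance of the weights is what makes long-horizon OPE hard.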
2 code implementations • 20 Mar 2019 • Hoang M. Le, Cameron Voloshin, Yisong Yue
When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints.
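One standard way to mediate between a primary objective and a cost constraint (E[cost] <= threshold) is a Lagrangian saddle-point formulation that alternates policy updates with multiplier updates. The two-arm bandit setup and all names below are an illustrative sketch of that general idea, not the paper's algorithm.

```python
# Hypothetical sketch: mix two candidate actions (arms) to maximize reward
# subject to an expected-cost constraint, via primal-dual updates.

def constrained_policy_search(reward, cost, threshold, lr=0.05, iters=2000):
    """reward, cost: per-arm values. Returns (mixture weight on arm 1,
    final Lagrange multiplier)."""
    lam, p = 0.0, 0.5  # dual multiplier and probability of arm 1
    for _ in range(iters):
        # Gradient step on the penalized objective w.r.t. the mixture weight.
        grad_p = (reward[1] - lam * cost[1]) - (reward[0] - lam * cost[0])
        p = min(1.0, max(0.0, p + lr * grad_p))
        # Dual ascent: raise the multiplier while the constraint is violated.
        mixed_cost = (1 - p) * cost[0] + p * cost[1]
        lam = max(0.0, lam + lr * (mixed_cost - threshold))
    return p, lam
```

With a slack constraint the multiplier stays at zero and the higher-reward arm wins outright; with a hard constraint the multiplier grows until the costly arm is priced out.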