no code implementations • 17 Apr 2024 • Ameesh Shah, Cameron Voloshin, Chenxi Yang, Abhinav Verma, Swarat Chaudhuri, Sanjit A. Seshia
In this work, we present Cycle Experience Replay (CyclER), a reward-shaping approach to this problem that supports continuous state and action spaces and the use of function approximation.
no code implementations • 3 Mar 2023 • Cameron Voloshin, Abhinav Verma, Yisong Yue
Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions.
no code implementations • 20 Jun 2022 • Cameron Voloshin, Hoang M. Le, Swarat Chaudhuri, Yisong Yue
We study the problem of policy optimization (PO) with linear temporal logic (LTL) constraints.
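A minimal sketch of the core idea behind LTL-constrained settings like this: a temporal formula such as "avoid the hazard until reaching the goal" (¬hazard U goal) can be tracked by a small automaton, and a policy satisfies the constraint when its trajectories drive the automaton to an accepting state. The automaton states and proposition labels below are illustrative, not from the paper.

```python
# Hypothetical monitor for the LTL formula (NOT hazard) UNTIL goal.
# Automaton states: 0 = pending, 1 = accepted (goal reached safely),
# 2 = rejected (hazard seen first). Accept/reject states are absorbing.

def make_monitor():
    def step(q, label):
        if q == 0:
            if label == "hazard":
                return 2
            if label == "goal":
                return 1
        return q  # absorbing once accepted or rejected
    return step

def satisfies(trace):
    """Run the monitor over a trace of atomic-proposition labels."""
    step, q = make_monitor(), 0
    for label in trace:
        q = step(q, label)
    return q == 1
```

Policy optimization under the constraint then amounts to steering trajectories of the product of the environment and this monitor toward accepting states.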
no code implementations • 2 Mar 2021 • Cameron Voloshin, Nan Jiang, Yisong Yue
We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning.
3 code implementations • 15 Nov 2019 • Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue
We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications.
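For context, one classic OPE estimator of the kind such benchmarks compare is ordinary trajectory-wise importance sampling: reweight returns collected under a behavior policy by the likelihood ratio of the evaluation policy. The function names and toy setup below are illustrative, not the benchmark's API.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Estimate the value of evaluation policy pi_e from trajectories
    collected under behavior policy pi_b, via importance sampling.
    Each trajectory is a list of (state, action, reward) tuples;
    pi_e(a, s) and pi_b(a, s) return action probabilities."""
    values = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += discount * r
            discount *= gamma
        values.append(weight * ret)
    return float(np.mean(values))
```

When pi_e equals pi_b, every weight is 1 and the estimate reduces to the average observed return; the variance of the weights is what makes long-horizon OPE hard.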
2 code implementations • 20 Mar 2019 • Hoang M. Le, Cameron Voloshin, Yisong Yue
When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints.
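One standard way to mediate between a primary objective and a cost constraint (E[cost] <= threshold) is a Lagrangian saddle-point formulation that alternates policy updates with multiplier updates. The two-arm bandit setup and all names below are an illustrative sketch of that general idea, not the paper's algorithm.

```python
# Hypothetical sketch: mix two candidate actions (arms) to maximize reward
# subject to an expected-cost constraint, via primal-dual updates.

def constrained_policy_search(reward, cost, threshold, lr=0.05, iters=2000):
    """reward, cost: per-arm values. Returns (mixture weight on arm 1,
    final Lagrange multiplier)."""
    lam, p = 0.0, 0.5  # dual multiplier and probability of arm 1
    for _ in range(iters):
        # Gradient step on the penalized objective w.r.t. the mixture weight.
        grad_p = (reward[1] - lam * cost[1]) - (reward[0] - lam * cost[0])
        p = min(1.0, max(0.0, p + lr * grad_p))
        # Dual ascent: raise the multiplier while the constraint is violated.
        mixed_cost = (1 - p) * cost[0] + p * cost[1]
        lam = max(0.0, lam + lr * (mixed_cost - threshold))
    return p, lam
```

With a slack constraint the multiplier stays at zero and the higher-reward arm wins outright; with a hard constraint the multiplier grows until the costly arm is priced out.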