no code implementations • 30 Apr 2024 • Calarina Muslimani, Matthew E. Taylor
To improve the feedback efficiency of HitL RL methods (i.e., to require less feedback), this paper introduces Sub-optimal Data Pre-training (SDP), an approach that leverages reward-free, sub-optimal data to improve scalar- and preference-based HitL RL algorithms.
no code implementations • 25 Apr 2022 • Alex Lewandowski, Calarina Muslimani, Dale Schuurmans, Matthew E. Taylor, Jun Luo
To effectively learn such a teaching policy, we introduce a parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior.