no code implementations • 5 Nov 2023 • Nianli Peng, Brandon Fain
We first state a distinct reward-aware version of value iteration that calculates a non-stationary policy that is approximately optimal for a given model of the environment.
1 code implementation • 30 Nov 2022 • Zimeng Fan, Nianli Peng, Muhang Tian, Brandon Fain
We study fair multi-objective reinforcement learning in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions of a vector-valued reward.