no code implementations • 6 Oct 2021 • Aishwarya Mandyam, Andrew Jones, Jiayu Yao, Krzysztof Laudanski, Barbara Engelhardt
CFQI uses a compositional $Q$-value function with separate modules for each task variant, allowing it to take advantage of shared knowledge while learning distinct policies for each variant.
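As a rough illustration of the idea in this abstract, the sketch below composes a Q-value from a module shared across all task variants plus one module per variant. The additive form, the linear modules, and every name in it (`CompositionalQ`, `q_values`, `greedy_action`) are assumptions made for illustration; the listing does not describe the authors' actual architecture or training procedure.

```python
import random


class CompositionalQ:
    """Hypothetical compositional Q-function (assumed additive form):
    Q(s, a, variant) = shared(s)[a] + variant_module[variant](s)[a].
    """

    def __init__(self, state_dim, n_actions, n_variants, seed=0):
        rng = random.Random(seed)
        # Shared linear module: knowledge common to every task variant.
        self.shared = [
            [rng.gauss(0.0, 0.1) for _ in range(n_actions)]
            for _ in range(state_dim)
        ]
        # One linear module per variant, initialised to zero so that all
        # variants start from the shared estimate.
        self.variant = [
            [[0.0] * n_actions for _ in range(state_dim)]
            for _ in range(n_variants)
        ]

    @staticmethod
    def _linear(weights, state):
        # Computes state^T W, giving one value per action.
        n_actions = len(weights[0])
        return [
            sum(state[i] * weights[i][a] for i in range(len(state)))
            for a in range(n_actions)
        ]

    def q_values(self, state, variant):
        # Sum the shared estimate and the variant-specific correction.
        shared_q = self._linear(self.shared, state)
        variant_q = self._linear(self.variant[variant], state)
        return [s + v for s, v in zip(shared_q, variant_q)]

    def greedy_action(self, state, variant):
        # Each variant gets its own policy: the argmax of its composed Q-values.
        q = self.q_values(state, variant)
        return max(range(len(q)), key=q.__getitem__)
```

Under these assumptions, updating one variant's module changes only that variant's policy, while the shared module continues to serve all variants.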
no code implementations • 29 Sep 2021 • Aishwarya Mandyam, Andrew Jones, Krzysztof Laudanski, Barbara Engelhardt
Off-policy reinforcement learning (RL) has proven to be a powerful framework for guiding agents' actions in environments with stochastic rewards and unknown or noisy state dynamics.