1 code implementation • NeurIPS 2023 • Shenzhi Wang, Qisen Yang, Jiawei Gao, Matthieu Gaetan Lin, Hao Chen, Liwei Wu, Ning Jia, Shiji Song, Gao Huang
Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning.
no code implementations • 6 Jun 2023 • Qisen Yang, Shenzhi Wang, Matthieu Gaetan Lin, Shiji Song, Gao Huang
In particular, online fine-tuning has become a commonly used method to correct the erroneous estimates of out-of-distribution data learned in the offline training phase.
1 code implementation • 13 Oct 2022 • Andrew Zhao, Matthieu Gaetan Lin, Yangguang Li, Yong-Jin Liu, Gao Huang
However, both strategies rely on a strong assumption: the entropy of the environment's dynamics is either high or low.