no code implementations • 30 Oct 2023 • Hanwen Ye, Wenzhuo Zhou, Ruoqing Zhu, Annie Qu
In particular, the proposed learning scheme builds a more general framework that includes the popular outcome weighted learning framework as a special case.
no code implementations • 28 Sep 2023 • Wenzhuo Zhou, Annie Qu
From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which show that our algorithm can learn a best-effort policy that can compete against any comparator policy covered by the batch data.
no code implementations • 23 Sep 2023 • Wenzhuo Zhou, Annie Qu, Keiland W. Cooper, Norbert Fortin, Babak Shahbaba
Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks.
no code implementations • 23 Sep 2023 • Wenzhuo Zhou, Yuhan Li, Ruoqing Zhu, Annie Qu
This task faces two primary challenges: providing a comprehensive and rigorous error quantification in confidence interval (CI) estimation, and addressing the distributional shift that results from discrepancies between the distribution induced by the target policy and the offline data-generating process.
no code implementations • 21 Jan 2023 • Yuhan Li, Wenzhuo Zhou, Ruoqing Zhu
Many real-world applications of reinforcement learning (RL) require making decisions in continuous action environments.
no code implementations • 20 Oct 2021 • Wenzhuo Zhou, Ruoqing Zhu, Annie Qu
To address these challenges, we propose a Proximal Temporal consistency Learning (pT-Learning) framework to estimate an optimal regime that is adaptively adjusted between deterministic and stochastic sparse policy models.
no code implementations • 14 Jan 2020 • Fei Xue, Yanqing Zhang, Wenzhuo Zhou, Haoda Fu, Annie Qu
An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules that maximize long-term benefits, and is applicable to chronic diseases such as HIV infection or cancer.