no code implementations • 19 Apr 2024 • Yanwei Jia
Owing to the martingale perspective in Jia and Zhou (2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, capturing the variability of the value-to-go along the trajectory.
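The martingale-plus-penalty characterization can be sketched in discrete time. The code below is an illustrative discretization of our own making, not the paper's exact scheme: it assumes no discounting, folds any entropy term into the reward `r`, and takes the candidate martingale to be the value process plus the accumulated reward-minus-q integral; `J`, `q`, and `r` are hypothetical sample-path arrays.

```python
import numpy as np

def risk_sensitive_loss(J, q, r, dt, beta):
    """Illustrative sketch (assumed discretization, not the paper's):
    M_t = J(X_t) + sum_s (r_s - q_s) dt should behave like a martingale,
    and the risk-sensitive penalty adds (beta/2) times the quadratic
    variation of the value process, approximated here by the sum of
    squared increments of J along the trajectory."""
    dJ = np.diff(J)
    increments = dJ + (r[:-1] - q[:-1]) * dt   # discrete increments of M
    martingale_term = np.sum(increments) ** 2  # ~0 if M is a martingale
    qv_penalty = 0.5 * beta * np.sum(dJ ** 2)  # quadratic-variation penalty
    return martingale_term + qv_penalty

# Toy trajectory where r cancels q exactly, so only the QV penalty remains
J = np.array([1.0, 1.1, 0.9, 1.0])
q = np.array([0.2, 0.2, 0.2, 0.2])
r = np.array([0.2, 0.2, 0.2, 0.2])
loss = risk_sensitive_loss(J, q, r, dt=0.1, beta=1.0)
print(loss)
```

A fluctuating value path is penalized even when the martingale condition holds, which is exactly the variability-of-value-to-go effect described above.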
no code implementations • 19 Dec 2023 • Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou
We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown.
no code implementations • 2 Jul 2022 • Yanwei Jia, Xun Yu Zhou
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020).
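Under the entropy-regularized, exploratory formulation, the policy implied by a q-function takes a Gibbs (Boltzmann) form, with the temperature playing the role of the exploration weight. The sketch below shows this form for a finite action set; the q-values and temperature are hypothetical inputs, and the function name is our own.

```python
import numpy as np

def gibbs_policy(q_values, gamma):
    """Boltzmann policy pi(a|x) proportional to exp(q(x, a) / gamma),
    the form suggested by the entropy-regularized formulation; gamma is
    the exploration temperature. q_values: array of q(x, a) over actions."""
    z = q_values / gamma
    z -= z.max()              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical q-values at one state, over three actions
probs = gibbs_policy(np.array([1.0, 2.0, 0.5]), gamma=1.0)
print(probs)
```

Lowering `gamma` concentrates mass on the greedy action, while raising it flattens the policy toward uniform exploration.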
no code implementations • 22 Nov 2021 • Yanwei Jia, Xun Yu Zhou
This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem.
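The martingale approach to PE rests on orthogonality conditions of the form E[∫ ξ_t dM_t] = 0 for adapted test functions ξ. A minimal Monte Carlo check of this property, using Brownian increments as a stand-in for the martingale increments (all names and the simulation setup are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, dt = 20_000, 50, 0.02

# Brownian increments serve as martingale increments dM_t
dM = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))

# Left-endpoint (adapted) values W_{t-} used as the test function xi_t
W_left = np.cumsum(dM, axis=1) - dM

# Sample average of sum_t xi_t dM_t: should be near 0 by orthogonality
moment = np.mean(np.sum(W_left * dM, axis=1))
print(moment)
```

In a PE algorithm the same moment conditions, with ξ taken as gradients of the parameterized value function, yield the estimating equations that the martingale approach solves.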
no code implementations • 15 Aug 2021 • Yanwei Jia, Xun Yu Zhou
From this perspective, we find that the mean-square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE.
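The point above can be seen numerically: for a genuine martingale the sum of squared increments does not shrink to zero as it would for a fit-error criterion; it converges to the quadratic variation. A small simulation, using scaled Brownian motion as a stand-in for the value process (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, sigma = 1.0, 100_000, 0.5
dt = T / n

# M_t = sigma * W_t is a martingale; its quadratic variation is sigma^2 * T
dM = sigma * np.sqrt(dt) * rng.standard_normal(n)

# Sum of squared increments: the quantity a mean-square TD objective targets
sum_sq = np.sum(dM ** 2)
print(sum_sq)  # concentrates near sigma^2 * T = 0.25, not near 0
```

Even for an exact value function the criterion stays bounded away from zero, which is why minimizing it does not certify a correct PE solution.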