no code implementations • 19 Apr 2024 • Yanwei Jia
Owing to the martingale perspective in Jia and Zhou (2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, capturing the variability of the value-to-go along the trajectory.
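The martingale-plus-penalty characterization can be sketched in discrete time. The code below is an illustrative discretization of our own making, not the paper's exact scheme: it assumes no discounting, folds any entropy term into the reward `r`, and takes the candidate martingale to be the value process plus the accumulated reward-minus-q integral; `J`, `q`, and `r` are hypothetical sample-path arrays.

```python
import numpy as np

def risk_sensitive_loss(J, q, r, dt, beta):
    """Illustrative sketch (assumed discretization, not the paper's):
    M_t = J(X_t) + sum_s (r_s - q_s) dt should behave like a martingale,
    and the risk-sensitive penalty adds (beta/2) times the quadratic
    variation of the value process, approximated here by the sum of
    squared increments of J along the trajectory."""
    dJ = np.diff(J)
    increments = dJ + (r[:-1] - q[:-1]) * dt   # discrete increments of M
    martingale_term = np.sum(increments) ** 2  # ~0 if M is a martingale
    qv_penalty = 0.5 * beta * np.sum(dJ ** 2)  # quadratic-variation penalty
    return martingale_term + qv_penalty

# Toy trajectory where r cancels q exactly, so only the QV penalty remains
J = np.array([1.0, 1.1, 0.9, 1.0])
q = np.array([0.2, 0.2, 0.2, 0.2])
r = np.array([0.2, 0.2, 0.2, 0.2])
loss = risk_sensitive_loss(J, q, r, dt=0.1, beta=1.0)
print(loss)
```

A fluctuating value path is penalized even when the martingale condition holds, which is exactly the variability-of-value-to-go effect described above.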
no code implementations • 19 Dec 2023 • Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou
We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown.
no code implementations • 2 Jul 2022 • Yanwei Jia, Xun Yu Zhou
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020).
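Under the entropy-regularized, exploratory formulation, the policy implied by a q-function takes a Gibbs (Boltzmann) form, with the temperature playing the role of the exploration weight. The sketch below shows this form for a finite action set; the q-values and temperature are hypothetical inputs, and the function name is our own.

```python
import numpy as np

def gibbs_policy(q_values, gamma):
    """Boltzmann policy pi(a|x) proportional to exp(q(x, a) / gamma),
    the form suggested by the entropy-regularized formulation; gamma is
    the exploration temperature. q_values: array of q(x, a) over actions."""
    z = q_values / gamma
    z -= z.max()              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical q-values at one state, over three actions
probs = gibbs_policy(np.array([1.0, 2.0, 0.5]), gamma=1.0)
print(probs)
```

Lowering `gamma` concentrates mass on the greedy action, while raising it flattens the policy toward uniform exploration.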
no code implementations • 22 Nov 2021 • Yanwei Jia, Xun Yu Zhou
This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem.
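The martingale approach to PE rests on orthogonality conditions of the form E[∫ ξ_t dM_t] = 0 for adapted test functions ξ. A minimal Monte Carlo check of this property, using Brownian increments as a stand-in for the martingale increments (all names and the simulation setup are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, dt = 20_000, 50, 0.02

# Brownian increments serve as martingale increments dM_t
dM = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))

# Left-endpoint (adapted) values W_{t-} used as the test function xi_t
W_left = np.cumsum(dM, axis=1) - dM

# Sample average of sum_t xi_t dM_t: should be near 0 by orthogonality
moment = np.mean(np.sum(W_left * dM, axis=1))
print(moment)
```

In a PE algorithm the same moment conditions, with ξ taken as gradients of the parameterized value function, yield the estimating equations that the martingale approach solves.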
no code implementations • 15 Aug 2021 • Yanwei Jia, Xun Yu Zhou
From this perspective, we find that the mean-square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE.
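The point above can be seen numerically: for a genuine martingale the sum of squared increments does not shrink to zero as it would for a fit-error criterion; it converges to the quadratic variation. A small simulation, using scaled Brownian motion as a stand-in for the value process (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, sigma = 1.0, 100_000, 0.5
dt = T / n

# M_t = sigma * W_t is a martingale; its quadratic variation is sigma^2 * T
dM = sigma * np.sqrt(dt) * rng.standard_normal(n)

# Sum of squared increments: the quantity a mean-square TD objective targets
sum_sq = np.sum(dM ** 2)
print(sum_sq)  # concentrates near sigma^2 * T = 0.25, not near 0
```

Even for an exact value function the criterion stays bounded away from zero, which is why minimizing it does not certify a correct PE solution.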