This was part of Decision Making and Uncertainty

Learning Merton’s Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration

XunYu Zhou, Columbia University

Monday, February 5, 2024


We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. We take the reinforcement learning (RL) approach to learn optimal portfolio policies directly by exploring the unknown market, without attempting to estimate the model parameters. Building on the entropy-regularization framework for general continuous-time RL formulated in Wang et al. (2020), we propose a recursive weighting scheme on exploration that endogenously discounts the current exploration reward by the past cumulative amount of exploration. Such recursive regularization restores the optimality of Gaussian exploration. However, contrary to existing results, the optimal Gaussian policy turns out to be biased in general, owing to the intertwining needs for hedging and for exploration. We present an asymptotic analysis of the resulting errors to show how the level of exploration affects the learned policies. Furthermore, we establish a policy improvement theorem and design several RL algorithms to learn Merton's optimal strategies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the efficiency and robustness of the RL algorithms compared with the conventional plug-in method.