This was part of Dynamic Assessment Indices
Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures
Andrzej Ruszczyński, Rutgers University
Friday, May 13, 2022
Abstract: We propose a novel reinforcement learning methodology in which system performance is evaluated by a Markov coherent dynamic risk measure, using linear value function approximations. We construct projected risk-averse dynamic programming equations and study their properties. We propose new risk-averse counterparts of the basic and multi-step methods of temporal differences and prove their convergence with probability one. We also perform an empirical study on a complex control problem. This is joint work with Umit Kose.