This talk was part of the Applications to Financial Engineering series.

Mean field MDP and mean field RL

Mathieu Laurière, Google Brain, Paris

Tuesday, December 7, 2021



Abstract: Multi-agent reinforcement learning has attracted considerable interest over the past decades. However, most existing methods do not scale well with the number of agents. In this talk, we study a limiting case in which there is an infinite population of cooperative agents. A mean field approach allows us to reduce the complexity of the problem and to propose efficient learning methods. The problem is first phrased as a discrete-time mean field control (MFC) problem. The model includes not only individual noise and individual action randomization at the agent level, but also common noise and common randomization at the population level. We relate this MFC problem to a lifted Mean Field Markov Decision Process (MFMDP), in which the state is the population distribution and for which we prove a dynamic programming principle. This allows us to connect closed-loop and open-loop controls for the original MFC problem. Building on this MFMDP, we propose two reinforcement learning (RL) methods: one based on tabular Q-learning, for which convergence can be proved, and one based on deep RL. Several numerical examples are provided, in discrete and continuous spaces. This is joint work with René Carmona and Zongjun Tan.
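To make the "lifted" viewpoint concrete, the sketch below illustrates (and is not the authors' implementation) tabular Q-learning on a toy mean field MDP whose state is the population distribution over a small finite individual-state space. The discretization of the simplex, the transition kernel, and the entropy-style population reward are all illustrative assumptions chosen only to keep the example self-contained and runnable.

```python
import itertools
import numpy as np

# Minimal illustrative sketch: tabular Q-learning on a lifted mean field MDP.
# The "state" of the lifted MDP is the population distribution mu over a small
# finite individual-state space; all dynamics and rewards here are assumptions.

n_states = 3   # individual agent states
n_actions = 2  # population-level action choices (applied uniformly, for simplicity)
n_bins = 5     # resolution of the discretized simplex of distributions

# Enumerate grid distributions mu = k / n_bins with integer k summing to n_bins.
simplex = [np.array(k) / n_bins
           for k in itertools.product(range(n_bins + 1), repeat=n_states)
           if sum(k) == n_bins]
mu_index = {tuple(mu): i for i, mu in enumerate(simplex)}

def project(mu):
    """Snap a distribution back onto the discretized simplex (nearest grid point)."""
    dists = [np.linalg.norm(mu - nu) for nu in simplex]
    return simplex[int(np.argmin(dists))]

# Illustrative row-stochastic transition kernels P[a] (one per action).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

def step(mu, a):
    """Mean field dynamics mu' = mu P[a], with an entropy-style population reward."""
    mu_next = project(mu @ P[a])
    reward = -np.sum(mu * np.log(mu + 1e-8))
    return mu_next, reward

# Tabular Q-learning on the lifted MDP: one Q-value per (distribution, action) pair.
Q = np.zeros((len(simplex), n_actions))
gamma, lr, eps = 0.95, 0.1, 0.1
mu = simplex[0]
for t in range(20000):
    i = mu_index[tuple(mu)]
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[i]))
    mu_next, r = step(mu, a)
    j = mu_index[tuple(mu_next)]
    Q[i, a] += lr * (r + gamma * Q[j].max() - Q[i, a])  # standard Q-learning update
    mu = mu_next

print("Greedy action per distribution state:", np.argmax(Q, axis=1))
```

The point of the sketch is only the change of viewpoint: once the population distribution is treated as the state of an ordinary (finite, after discretization) MDP, standard tabular Q-learning applies directly, which is the structure the talk's convergence result builds on.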