This was part of Machine Learning and Mean-Field Games

System Noise and Individual Exploration in Learning Large Population Games

Renyuan Xu, University of Southern California

Tuesday, May 24, 2022

In this talk, we demonstrate the importance of system noise and individual exploration in multi-agent reinforcement learning in the large-population regime. In particular, we discuss several linear-quadratic problems in which agents are assumed to have limited information about the stochastic system, and we focus on policy gradient methods, a widely used class of reinforcement learning algorithms.
In the finite-agent setting, we show that a (modified) policy gradient method can guide agents to the Nash equilibrium solution, provided there is a certain level of noise in the system. The noise can come either from the underlying dynamics or from carefully designed exploration by the agents. When the number of agents goes to infinity, we propose an exploration scheme with entropy regularization that helps each individual agent explore both the unknown system and the behavior of the other agents. The proposed scheme is shown to speed up and stabilize the learning procedure.
This talk is based on several projects with Xin Guo (UC Berkeley), Ben Hambly (U of Oxford), Huining Yang (U of Oxford), and Thaleia Zariphopoulou (UT Austin).
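
As a rough illustration of why noise matters for policy gradient in a linear-quadratic setting, the sketch below runs gradient descent on the feedback gain of a scalar, single-agent, finite-horizon problem. It is not the algorithm from the talk: the dynamics x_{t+1} = a x_t + b u_t + w_t, the linear policy u_t = -k x_t, every constant, and every function name are assumptions made for illustration, and the gradient is a finite difference of the exact expected cost rather than a sample-based, model-free estimate. With a zero initial state and no noise, the expected cost is zero for every gain, so the gradient vanishes and nothing is learned; with noise, the same iteration moves to a better gain.

```python
# Hypothetical scalar LQ problem (all constants are illustrative assumptions).
A, B = 0.9, 0.5        # dynamics: x_{t+1} = A*x_t + B*u_t + w_t
Q, R = 1.0, 0.1        # running cost: Q*x_t^2 + R*u_t^2, terminal cost Q*x_T^2
HORIZON = 20


def expected_cost(k, noise_var, init_var=0.0):
    """Exact expected cost of the linear policy u_t = -k * x_t.

    Tracks the second moment s_t = E[x_t^2], which evolves as
    s_{t+1} = (A - B*k)^2 * s_t + noise_var.
    """
    s = init_var
    total = 0.0
    for _ in range(HORIZON):
        total += (Q + R * k * k) * s
        s = (A - B * k) ** 2 * s + noise_var
    return total + Q * s


def cost_gradient(k, noise_var, eps=1e-5):
    """Central finite-difference derivative of the expected cost in k."""
    upper = expected_cost(k + eps, noise_var)
    lower = expected_cost(k - eps, noise_var)
    return (upper - lower) / (2 * eps)


def policy_gradient(noise_var, k0=0.0, step=1e-3, iters=2000):
    """Plain gradient descent on the feedback gain k."""
    k = k0
    for _ in range(iters):
        k -= step * cost_gradient(k, noise_var)
    return k


k_noisy = policy_gradient(noise_var=1.0)   # noise makes the gradient informative
k_silent = policy_gradient(noise_var=0.0)  # zero state, zero noise: gain never moves
print("gain learned with noise:   ", k_noisy)
print("gain learned without noise:", k_silent)
```

In the noiseless run the gain stays at its initial value because the cost landscape is flat, which is one simple way to see why some source of randomness, whether in the dynamics or in injected exploration, is needed for the gradient to carry information.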