The Mean-Field Limit for Shallow Neural Networks: Implications for Trainability and Generalization (Part 1)

This was part of Short Courses on the Mean Field Approach in Machine Learning and Statistics

Grant Rotskoff, Stanford University

Tuesday, October 19, 2021

Abstract:

Neural networks with large numbers of parameters have a number of remarkable empirical properties from the perspective of numerical analysis: these parametric models can be optimized reliably without regularization or guarantees of convexity, and they also accurately regress very high-dimensional data. In these lectures, I will explore one theoretical explanation of these properties. First, I will introduce the mean-field limit for neural networks and discuss a corresponding law of large numbers, which ensures convergence of the “training” dynamics to a global optimum. In addition, I will discuss fluctuations and a central-limit-theorem-type result. Subsequently, I will describe modifications of the gradient flow that improve convergence and detail some applications.