On the convergence of gradient descent for wide two-layer neural networks (Part 1)

This was part of Introduction to Distributed Solutions

Francis Bach, INRIA, École Normale Supérieure, PSL Research University

Monday, October 4, 2021

Abstract:

Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters, such as neural networks, lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Joint work with Lénaïc Chizat)
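To make the setting concrete, here is a minimal sketch of full-batch gradient descent on a two-layer ReLU network. It is not the authors' code. ReLU is positively homogeneous, so each neuron a_j * relu(<w_j, x>) is 2-homogeneous in its parameters (a_j, w_j). Several choices here are assumptions made only for illustration: the mean-field scaling f(x) = (1/m) * sum_j a_j * relu(<w_j, x>) with the step size multiplied by m, the toy target y = |t| on [-1, 1], the constant input coordinate used as a bias, and the hypothetical helper names `neuron`, `predict`, `loss`, and `train`. The sketch only shows the training loss decreasing for one finite width m. It says nothing about the infinite-width convergence guarantees the talk discusses.

```python
import random


def relu(z: float) -> float:
    """Positively 1-homogeneous activation: relu(c * z) == c * relu(z) for c > 0."""
    return z if z > 0.0 else 0.0


def neuron(a_j: float, w_j: list, x: tuple) -> float:
    """One hidden unit a_j * relu(<w_j, x>); 2-homogeneous in (a_j, w_j)."""
    return a_j * relu(w_j[0] * x[0] + w_j[1] * x[1])


def predict(a: list, w: list, x: tuple) -> float:
    """Assumed mean-field scaling: f(x) = (1/m) * sum_j a_j * relu(<w_j, x>)."""
    m = len(a)
    return sum(neuron(a[j], w[j], x) for j in range(m)) / m


def loss(a: list, w: list, data: list) -> float:
    """Half mean squared error over the training set."""
    return 0.5 * sum((predict(a, w, x) - y) ** 2 for x, y in data) / len(data)


def train(m: int = 50, steps: int = 200, lr: float = 0.1, seed: int = 0) -> list:
    """Full-batch gradient descent; returns the loss before and after every step."""
    rng = random.Random(seed)
    # Inputs are (t, 1): the constant 1 plays the role of a bias term.
    data = [((t, 1.0), abs(t)) for t in [-1.0 + 2.0 * i / 20 for i in range(21)]]
    a = [rng.gauss(0.0, 1.0) for _ in range(m)]
    w = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(m)]
    n = len(data)
    history = [loss(a, w, data)]
    for _ in range(steps):
        residuals = [predict(a, w, x) - y for x, y in data]
        # Accumulate m times the true gradient (mean-field step-size scaling).
        grad_a = [0.0] * m
        grad_w = [[0.0, 0.0] for _ in range(m)]
        for j in range(m):
            for (x, _), r in zip(data, residuals):
                pre = w[j][0] * x[0] + w[j][1] * x[1]
                if pre > 0.0:  # ReLU is active; otherwise the gradient is zero
                    grad_a[j] += r * pre / n
                    grad_w[j][0] += r * a[j] * x[0] / n
                    grad_w[j][1] += r * a[j] * x[1] / n
        # Update all neurons simultaneously using gradients from the old parameters.
        for j in range(m):
            a[j] -= lr * grad_a[j]
            w[j][0] -= lr * grad_w[j][0]
            w[j][1] -= lr * grad_w[j][1]
        history.append(loss(a, w, data))
    return history
```

For example, `history = train()` returns the training loss at every step. Increasing `m` moves the model toward the wide-network regime the talk studies, at a proportionally higher cost per step.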