Computational/Statistical Gaps for Learning Neural Networks

This was part of The Multifaceted Complexity of Machine Learning

Adam Klivans, University of Texas at Austin

Tuesday, April 13, 2021

Abstract: It has been known for decades that a polynomial-size training sample suffices for learning neural networks. Most theoretical results, however, indicate that these learning tasks are computationally intractable. Where exactly is the dividing line? In this talk we consider arguably the most well-studied scenario, namely when the marginal distribution on inputs is Gaussian, and show unconditionally that gradient descent cannot learn even one-layer neural networks. We then point to a potential way forward and describe the first fixed-parameter tractable algorithm for learning deep ReLU networks. Its running time is a fixed polynomial in the ambient dimension and exponential in only the network’s parameters. This is the first nontrivial positive result for networks of depth more than two.