Benign overfitting

This talk was part of The Multifaceted Complexity of Machine Learning.

Peter Bartlett, University of California, Berkeley

Monday, April 12, 2021

Abstract: Deep learning methodology has revealed some major surprises from the perspective of statistical complexity: even without any explicit effort to control model complexity, these methods find prediction rules that give a near-perfect fit to noisy training data and yet exhibit excellent prediction performance in practice. We investigate this phenomenon of ‘benign overfitting’ in the setting of linear prediction and give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization shows that overparameterization is essential: the number of directions in parameter space that are unimportant for prediction must be large compared to the sample size. We discuss implications for deep networks, for robustness to adversarial examples, and for the rich variety of possible behaviors of excess risk as a function of dimension. We also describe extensions to ridge regression, as well as barriers to analyzing benign overfitting with model-dependent generalization bounds.
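The following is a minimal sketch of the minimum norm interpolating rule in an overparameterized linear regression problem. It assumes a Gaussian design with a diagonal covariance that has a few high-variance directions and many low-variance ones, loosely mirroring the "many unimportant directions" condition in the abstract. The sample size n, dimension p, spectrum, true parameter, and noise level are illustrative choices, not values from the talk. When X (n by p, n < p) has full row rank, the minimum norm interpolator is theta_hat = X^T (X X^T)^{-1} y; the code computes it and reports the training fit and the excess risk (theta_hat - theta_star)^T Sigma (theta_hat - theta_star).

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Minimum-l2-norm solution of X @ theta = y, assuming rank(X) = n <= p."""
    # theta_hat = X^T (X X^T)^{-1} y; use solve() instead of an explicit inverse
    return X.T @ np.linalg.solve(X @ X.T, y)

def excess_risk(theta_hat, theta_star, eigvals):
    """(theta_hat - theta_star)^T Sigma (theta_hat - theta_star) for diagonal Sigma."""
    d = theta_hat - theta_star
    return float(np.sum(eigvals * d ** 2))

rng = np.random.default_rng(0)
n, p, k = 50, 2000, 5  # illustrative: p >> n, with k "important" directions

# Hypothetical spectrum: k unit-variance directions plus p - k small-variance ones
eigvals = np.concatenate([np.ones(k), np.full(p - k, 0.01)])

# Rows x_i ~ N(0, diag(eigvals)): scale each column by its standard deviation
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)

theta_star = np.zeros(p)
theta_star[:k] = 1.0  # signal lives only in the high-variance directions
sigma = 0.5           # noise standard deviation
y = X @ theta_star + sigma * rng.standard_normal(n)

theta_hat = min_norm_interpolator(X, y)
train_residual = float(np.max(np.abs(X @ theta_hat - y)))  # ~0: noisy data fit exactly
risk = excess_risk(theta_hat, theta_star, eigvals)

print(f"max training residual: {train_residual:.2e}")
print(f"excess risk: {risk:.3f}  (noise variance sigma^2 = {sigma ** 2})")
```

The training residual is zero up to floating-point error, so the rule interpolates the noisy labels. The printed excess risk depends on the chosen spectrum; varying the number and size of the small eigenvalues relative to n is one way to explore how the tail of the covariance affects whether the interpolation is benign.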