This talk was part of Statistical and Computational Challenges in Probabilistic Scientific Machine Learning (SciML).

On Over-Parametrized Models and Sobolev Training

Matthew Li, Massachusetts Institute of Technology (MIT)

Thursday, June 12, 2025



Slides
Abstract:

With Sobolev training, neural networks are provided data about both
the function of interest and its derivatives. This setting is
prevalent in scientific machine learning (appearing in molecular
dynamics emulators, derivative-informed neural operators, and
predictors of summary statistics of chaotic dynamical systems), as
well as in traditional machine learning tasks like teacher-student
model distillation. However, fundamental questions remain: How does
over-parameterization influence performance? What role does the
signal-to-noise ratio play? And is additional derivative data always
beneficial?
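
To make the setting concrete, here is a minimal sketch, assuming a single-hidden-layer network trained on both values and input gradients of a tanh single-index target; the loss weight lam, the widths, and all variable names are illustrative choices, not taken from the talk.

# A minimal sketch of a Sobolev-training objective in JAX; all names
# (lam, d, p, n, w_star) are illustrative assumptions, not the talk's.
import jax
import jax.numpy as jnp

def predict(params, x):
    # Single-hidden-layer network a . tanh(W x) evaluated at one input x.
    W, a = params
    return jnp.dot(a, jnp.tanh(W @ x))

# Gradient of the network output with respect to the input x.
grad_x = jax.grad(predict, argnums=1)

def sobolev_loss(params, X, y, G, lam=1.0):
    # Squared loss on function values plus lam-weighted squared loss
    # on input gradients (the "Sobolev" part of the objective).
    preds = jax.vmap(lambda x: predict(params, x))(X)   # shape (n,)
    grads = jax.vmap(lambda x: grad_x(params, x))(X)    # shape (n, d)
    value_term = jnp.mean((preds - y) ** 2)
    deriv_term = jnp.mean(jnp.sum((grads - G) ** 2, axis=-1))
    return value_term + lam * deriv_term

# Toy data from a single-index target f*(x) = tanh(<w*, x>).
key = jax.random.PRNGKey(0)
kW, ka, kX = jax.random.split(key, 3)
d, p, n = 20, 100, 50
params = (jax.random.normal(kW, (p, d)) / jnp.sqrt(d),
          jax.random.normal(ka, (p,)) / jnp.sqrt(p))
X = jax.random.normal(kX, (n, d))
w_star = jnp.ones(d) / jnp.sqrt(d)                 # single intrinsic direction
y = jnp.tanh(X @ w_star)                           # function values
G = (1.0 - y ** 2)[:, None] * w_star[None, :]      # exact gradients of f*
loss, param_grads = jax.value_and_grad(sobolev_loss)(params, X, y, G)

Setting lam = 0 recovers training on function values alone, which is one way to probe whether the additional derivative data actually helps.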

In this work, we study these questions using tools from statistical
physics and random matrix theory. In particular, we consider Sobolev
training in the proportional asymptotics regime, in which the problem
dimensionality d, the number of single-hidden-layer features p, and the
number of training points n grow to infinity at fixed ratios. We focus
on target functions
modeled as single-index models (i.e., ridge functions with a single
intrinsic dimension), providing theoretical insights into the effects
of derivative information in high-dimensional learning.
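
For reference, one generic way to write a single-index target and the proportional scaling is the following; the link function g, the direction w_star, and the ratio labels alpha_p, alpha_n are notation chosen here, not necessarily the talk's:

\[
  f_\star(x) \;=\; g\big(\langle w_\star, x\rangle\big),
  \qquad x \in \mathbb{R}^{d}, \quad \|w_\star\|_2 = 1,
\]
\[
  \frac{p}{d} \to \alpha_p, \qquad \frac{n}{d} \to \alpha_n
  \qquad \text{as } d,\, p,\, n \to \infty,
\]

where g is a scalar link function and w_star is the single intrinsic direction of the ridge function.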

Joint work with Kate Fisher, Timo Schorlepp, and Youssef Marzouk.