Toward physical generative modeling

This was part of Statistical and Computational Challenges in Probabilistic Scientific Machine Learning (SciML)

Nisha Chandramoorthy, University of Chicago

Thursday, June 12, 2025

Abstract: We explore three different ways of effectively sampling physical measures (observable invariant probability measures) associated with chaotic dynamical systems. The most direct approach is to develop a surrogate model for the underlying dynamics that preserves the physical measure of the original system. In this regard, we first ask a key statistical question: under what conditions can a neural network surrogate model of short-term dynamics accurately reproduce statistics computed with respect to the true physical measure? We prove that when a system satisfies linear response, incorporating Jacobian information (the derivative of the dynamics with respect to the state variables) into the loss function for the surrogate model provides statistical accuracy guarantees. We then explore direct generative modeling for the physical measure. In a generative model, given samples of a target distribution, an optimization problem is solved for the score or a vector field associated with measures near the target. Using these score or drift vector fields, more samples from the target are produced. When we have samples known to be from the physical measure, we can train a generative model, whether through flow-matching or score-based techniques, to produce additional samples approximately from the same measure. Given inevitable errors in learning either the underlying vector field or score function, how do we ensure the physical validity of generated samples? We introduce the concept of "robustness of support": a property whereby generated samples, despite learning errors, remain approximately within the support of the true physical measure. In practical terms, a robust generative model can sample effectively from the unstable manifold of a chaotic system, on which the physical measure is supported.
We establish that this robustness is achievable when the least stable finite-time Lyapunov vectors of the generative modeling dynamics align with the unstable manifold, and we specify sufficient conditions for this alignment across a broad class of generative models. Finally, we show how to leverage efficient computational methods for estimating the score of physical measures to improve the accuracy of score-matching-based generative models. We conclude by discussing how these precise score estimators can enable implementation of Bayesian methodologies, including parameter estimation and data assimilation, for high-dimensional chaotic systems.
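
Illustration (not from the talk): the idea of adding Jacobian information to a surrogate's loss can be sketched on a toy problem. The example below is a minimal sketch under stated assumptions. The logistic map stands in for the chaotic system, a hypothetical one-parameter family `surrogate(x, a)` stands in for the neural network, and the weight `lam` on the Jacobian term is an arbitrary choice. None of these names or choices come from the abstract, and the sketch makes no claim about the guarantees proved in the talk.

```python
import random

def true_map(x):
    """Logistic map; an assumed stand-in for the chaotic dynamics."""
    return 4.0 * x * (1.0 - x)

def true_jacobian(x):
    """Derivative of the true map with respect to the state x."""
    return 4.0 - 8.0 * x

def surrogate(x, a):
    """Hypothetical one-parameter surrogate; a neural network would go here."""
    return a * x * (1.0 - x)

def surrogate_jacobian(x, a):
    """Derivative of the surrogate with respect to the state x."""
    return a * (1.0 - 2.0 * x)

def jacobian_matching_loss(a, xs, lam=1.0):
    """Mean squared one-step error plus lam times mean squared Jacobian error."""
    n = len(xs)
    state_err = sum((surrogate(x, a) - true_map(x)) ** 2 for x in xs) / n
    jac_err = sum((surrogate_jacobian(x, a) - true_jacobian(x)) ** 2 for x in xs) / n
    return state_err + lam * jac_err

random.seed(0)
xs = [random.random() for _ in range(200)]  # training states in [0, 1)

loss_exact = jacobian_matching_loss(4.0, xs)  # surrogate equals the true map
loss_off = jacobian_matching_loss(3.5, xs)    # mis-specified surrogate
print(loss_exact, loss_off)
```

Setting `lam=0.0` recovers the usual one-step mean squared error, so the Jacobian term is the only difference between the two training objectives in this sketch.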