This was part of **Machine Learning for Climate and Weather Applications**.

## Systematically Generating Hierarchies of Machine-Learning Models, from Equation Discovery to Deep Neural Networks

**Tom Beucler, University of Lausanne**
Friday, November 4, 2022

**Abstract**: Model hierarchies help connect our fundamental understanding of the climate with operational predictions. However, deriving climate model hierarchies from dynamical models of increasing complexity is costly. Such hierarchies can also be difficult to organize consistently, since dynamical models may have entirely different outputs that were benchmarked against distinct (and sometimes intractable) data sources. Recently, machine-learning approaches have proven invaluable for developing accurate statistical models from data, notably because of their ability to capture nonlinearities and connectivities in time and space that traditional statistical models ignore. Motivated by the possibility of deriving statistical models of varying complexity, from simple analytic equations to complex neural networks, we ask: Can we leverage machine learning to systematically generate a model hierarchy from a single data source? To address this question, we choose four atmospheric science problems for which we have both physically based, analytic models with just a few tunable parameters and deep-learning algorithms whose performance was established in previous work: cloud cover parameterization, shortwave radiative transfer for numerical weather prediction, scalar flux parameterization in the planetary boundary layer, and subgrid-scale convection parameterization for climate modeling. In each case, we formalize the machine-learning-based hierarchy by working in a well-defined, two-dimensional plane: complexity versus performance. We choose the number of trainable parameters as a simple definition of complexity, while performance is systematically defined using a single regression metric (e.g., the mean-squared error) calculated for the same outputs over the same dataset on which all machine-learning models were trained.
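The two axes of the plane described above can be illustrated with a minimal sketch: complexity as the number of trainable parameters, and performance as one regression metric computed over a shared dataset. The synthetic data and the two polynomial "models" below are purely illustrative assumptions, not the problems or models from the talk.

```python
# Minimal sketch of the (complexity, performance) plane: complexity is
# the number of trainable parameters, performance a single regression
# metric (mean-squared error) on one shared dataset. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=500)
y = 0.5 * x + 0.3 * x**2 + 0.05 * rng.normal(size=x.size)  # toy "truth"


def position(predict, n_params):
    """Return a model's (complexity, performance) coordinates."""
    mse = float(np.mean((predict(x) - y) ** 2))
    return n_params, mse


# Two statistical models of increasing complexity, fit to the same data:
linear = np.polynomial.Polynomial.fit(x, y, deg=1)       # 2 parameters
quadratic = np.polynomial.Polynomial.fit(x, y, deg=2)    # 3 parameters

print(position(linear, n_params=2))
print(position(quadratic, n_params=3))
```

Because every model is scored with the same metric on the same dataset, the resulting coordinates are directly comparable, which is what makes the hierarchy well defined.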
During this presentation, we will demonstrate how to use our data-driven hierarchies for two purposes: (1) data-driven model development and (2) process understanding. First, because all models share the same performance metric, each machine-learning model in the hierarchy occupies a well-defined (complexity, performance) position. Models that maximize performance for a given complexity unambiguously define a Pareto frontier in the (complexity, performance) plane and can be deemed optimal. Second, optimal models on the Pareto frontier can be compared to reveal which added process, nonlinearity, regime, connectivity, etc. yields the largest increase in performance for a given complexity, which facilitates process understanding. For example, using a variational autoencoder to reconstruct scalar turbulent fluxes in the boundary layer, we can capture non-local vertical transport neglected by traditional eddy-diffusivity schemes. Using a specialized type of convolutional neural network (U-net++) to emulate shortwave radiative heating, we can largely overcome the biases of a one-stream model of shortwave radiation, notably the negative bias in downward shortwave fluxes at low zenith angles. To show its versatility, we will apply our framework to the data-driven discovery of analytic models, which are interpretable by construction. Combining symbolic regression with sequential feature selection, we derive progressively more complex equations for cloud area fraction and broadband shortwave radiative fluxes from the local thermodynamic environment. In the case of cloud cover parameterization, a third-order polynomial with only six terms yields a coefficient of determination as high as 0.65 (compared with 0.9 for the neural network), systematically beating the performance of the widely used Sundqvist scheme thanks to its ability to capture the effect of cloud condensate mixing ratio on cloud fraction.
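The Pareto-frontier idea above can be sketched in a few lines: a model is on the frontier if no other model has both fewer (or equally many) trainable parameters and a lower (or equal) error. The model names and (parameters, MSE) numbers below are hypothetical placeholders, not results from the talk.

```python
# Hypothetical sketch: extract Pareto-optimal models in the
# (complexity, performance) plane. A model is dominated if some other
# model is at least as simple AND at least as accurate (and differs).

def pareto_frontier(models):
    """Return names of non-dominated models, ordered by complexity."""
    frontier = []
    for name, (n_params, mse) in models.items():
        dominated = any(
            p <= n_params and e <= mse and (p, e) != (n_params, mse)
            for other, (p, e) in models.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: models[n][0])


# Illustrative (number of trainable parameters, mean-squared error):
models = {
    "Sundqvist-like scheme": (3, 0.40),
    "6-term polynomial": (24, 0.25),
    "small MLP": (5_000, 0.12),
    "deep NN": (200_000, 0.08),
    "overparam. NN": (2_000_000, 0.09),  # dominated by "deep NN"
}

print(pareto_frontier(models))
```

Comparing neighboring models along this frontier, as the abstract describes, then shows which added ingredient buys the largest performance gain per unit of complexity.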
In summary, our framework can guide the development of Pareto-optimal, data-driven models for weather and climate applications, while furthering process understanding by hierarchically unveiling system complexity.