Using autoencoders as generative models to create forecast ensembles for data assimilation

This was part of Machine Learning for Climate and Weather Applications

Ian Grooms, University of Colorado, Boulder

Monday, October 31, 2022

Abstract: The forecast covariance matrix is a crucial component of most operational data assimilation methods, and ensemble forecasts are frequently used to help estimate the forecast covariance. In the context of coupled Earth system models or high-resolution climate model components, ensemble forecasts can be extremely computationally expensive. Developing machine learning methods to reduce the cost of an ensemble forecast is an active area of research. The approach taken here runs the expensive physics-based model for only a single forecast (or a few), and uses machine learning to generate an ensemble of synthetic model states that are similar to (analogs of) that physics-based forecast. Specifically, autoencoders, including variational autoencoders, are trained to reconstruct model states from a training set. Once trained, the ensemble is generated by encoding the single forecast into the latent space, adding noise in the latent space, and then decoding each perturbed latent vector back to the model space. For very high-dimensional models, a single model state can be constructed from local 'patches' stitched together. The performance of data assimilation using these (and related) analog ensembles is explored in an idealized Lorenz model and in a high-resolution coupled quasigeostrophic model.
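A minimal sketch of the encode, perturb, decode step described above, written in Python with NumPy. It assumes a linear autoencoder (a PCA projection fitted by SVD) as a stand-in for the trained (variational) autoencoders in the talk; the function names, latent dimension, noise level, and random training data are hypothetical and chosen only for illustration. It also forms the sample covariance of the analog ensemble, which is the quantity a data assimilation scheme would use as its forecast covariance estimate.

```python
import numpy as np


def fit_linear_autoencoder(training_states, latent_dim):
    """Fit a linear (PCA) encoder/decoder pair.

    Assumption: this stands in for a trained neural autoencoder.
    training_states has shape (n_samples, n_state).
    """
    mean = training_states.mean(axis=0)
    # Rows of vt are principal directions; keep the leading latent_dim of them.
    _, _, vt = np.linalg.svd(training_states - mean, full_matrices=False)
    basis = vt[:latent_dim]  # shape (latent_dim, n_state)

    def encode(x):
        # Project centered model state(s) onto the latent basis.
        return (x - mean) @ basis.T

    def decode(z):
        # Map latent vector(s) back to model space.
        return z @ basis + mean

    return encode, decode


def analog_ensemble(forecast, encode, decode, n_members, noise_std, rng):
    """Encode one forecast, add Gaussian noise in latent space, decode each member."""
    z = encode(forecast)  # shape (latent_dim,)
    noise = noise_std * rng.standard_normal((n_members, z.size))
    return decode(z + noise)  # shape (n_members, n_state)


def sample_covariance(ensemble):
    """Unbiased sample covariance of the ensemble members (one member per row)."""
    anomalies = ensemble - ensemble.mean(axis=0)
    return anomalies.T @ anomalies / (ensemble.shape[0] - 1)


# Usage with hypothetical sizes and random "training" states.
rng = np.random.default_rng(0)
n_state, latent_dim = 40, 8
training = rng.standard_normal((500, n_state))
encode, decode = fit_linear_autoencoder(training, latent_dim)
forecast = training[0]  # stand-in for the single physics-based forecast
ens = analog_ensemble(forecast, encode, decode, n_members=20, noise_std=0.5, rng=rng)
P = sample_covariance(ens)
print(ens.shape, P.shape)  # (20, 40) (40, 40)
```

Because this linear decoder maps every member into the same `latent_dim`-dimensional subspace, the resulting covariance has rank at most `latent_dim`; a nonlinear decoder's outputs need not lie in a fixed linear subspace. The latent noise level (`noise_std` here) sets the ensemble spread and would in practice need tuning.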