This workshop explores the intersection of uncertainty quantification (UQ) and machine learning (ML) in modeling and analyzing intricate physical phenomena. Participants will examine the challenges of quantifying uncertainties in complex systems across various scientific and engineering domains. The workshop will cover advanced UQ techniques, including Bayesian inference, sensitivity analysis, and probabilistic modeling, tailored for complex physical systems. Attendees will delve into cutting-edge machine learning approaches, such as physics-informed neural networks, deep learning for differential equations, and transfer learning, applied to physical system modeling.

The workshop will emphasize the synergy between UQ and ML, exploring how these fields can complement each other to enhance prediction accuracy and reliability in complex systems. Through interactive lectures and group discussions, participants will gain insights into implementing these methods in their research or industrial applications. This workshop is designed for researchers, engineers, and data scientists working with complex physical systems in fields such as fluid dynamics, climate modeling, aerospace engineering, and beyond. Attendees will leave equipped with state-of-the-art knowledge to tackle uncertainty and complexity in their respective domains.
Funding
All funding has been allocated for this event.
In-Person Attendance
We are at capacity for in-person attendance as of May 11, 2025. Anyone registering after May 11, 2025 will be asked to attend online only.
Accelerating Gaussian Process Emulators for Computer Simulations Using Random Fourier Features
Speaker: Peter Chien (University of Wisconsin–Madison)
Computer simulations are essential for exploring input-output relationships in engineering and science but can be computationally expensive for extensive what-if analyses. Gaussian process emulators offer a powerful statistical approach to approximating simulations, but their scalability is often hindered by the costly inversion of large correlation matrices. To overcome this challenge, we introduce new methods leveraging the random Fourier feature technique from computer science to accelerate Gaussian process emulators. Our approach enhances computational efficiency while maintaining accuracy, making it suitable for a broad range of simulations, including those with gradient information, functional outputs, and stochastic outputs. Through numerical experiments, we demonstrate that our methods outperform existing ones in speed and accuracy, with theoretical results validating these improvements.
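As a rough illustration of the random Fourier feature idea (not the speaker's specific methods, which also handle gradient information, functional outputs, and stochastic outputs), the Python sketch below approximates a squared-exponential GP emulator by Bayesian linear regression on random cosine features, replacing the costly correlation matrix inversion with a solve in feature space. The toy simulator, lengthscale, and noise level are placeholders.

```python
import numpy as np

def rff_features(X, n_features=200, lengthscale=1.0, rng=None):
    """Map inputs to random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the squared-exponential kernel.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    Z = np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
    return Z, (W, b)

def fit_rff_emulator(X, y, n_features=200, lengthscale=1.0, noise=1e-6, rng=0):
    """Bayesian linear regression in feature space: cost scales with the number
    of features rather than cubically with the number of simulation runs."""
    Z, params = rff_features(X, n_features, lengthscale, rng)
    A = Z.T @ Z + noise * np.eye(n_features)
    weights = np.linalg.solve(A, Z.T @ y)
    return weights, params

def predict_rff(Xnew, weights, params):
    W, b = params
    Znew = np.sqrt(2.0 / W.shape[1]) * np.cos(Xnew @ W + b)
    return Znew @ weights

# Toy usage with a cheap stand-in for an expensive simulation.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])
w, p = fit_rff_emulator(X, y, lengthscale=0.5)
print(predict_rff(X[:5], w, p), y[:5])
```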
15:00-16:30 CDT
Coffee Break & Poster Session 1
Tuesday, May 20, 2025
9:00-9:30 CDT
Sign-in & Breakfast
9:30-10:30 CDT
FAIR Universe: Benchmarks for Systematics-Aware Machine Learning in Particle Physics and Cosmology
Speaker: Po-Wen Chang (Lawrence Berkeley National Laboratory)
Measurements and observations in particle physics fundamentally depend on one's ability to quantify their uncertainty and, thereby, their significance. As machine learning (ML) methods become more prevalent in high energy physics, determining the uncertainties of an ML method therefore becomes more important. A wide range of possible approaches has been proposed; however, there has not been a comprehensive comparison of individual methods. To address this, the FAIR Universe project organized the HiggsML Uncertainty Challenge, which took place from September 2024 to March 2025; the dataset and performance metrics of the challenge will serve as a permanent benchmark for further developments. The Challenge was also accepted as an official NeurIPS 2024 competition. The goal of the challenge was to measure the Higgs signal strength using a dataset of simulated $pp$ collision events observed at the LHC. Participants were evaluated both on their ability to precisely determine the correct signal strength and on their ability to report correct and well-calibrated uncertainty intervals. In this talk, we present an overview of the competition itself and of the infrastructure that underpins it. We also present the winners of the competition and discuss their winning uncertainty quantification approaches.
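To make the notion of "well-calibrated uncertainty intervals" concrete, the small Python sketch below scores a batch of reported intervals by empirical coverage of the true signal strength and average width. This is only an illustration of the idea; it is not the official challenge metric.

```python
import numpy as np

def evaluate_intervals(mu_true, lower, upper):
    """Toy interval evaluation: empirical coverage of the true value and
    average interval width (illustrative, not the challenge's scoring rule)."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    coverage = float(((lower <= mu_true) & (mu_true <= upper)).mean())
    avg_width = float((upper - lower).mean())
    return coverage, avg_width

# Example: 1000 pseudo-experiments reporting 68% intervals around mu = 1.0.
rng = np.random.default_rng(0)
centers = rng.normal(1.0, 0.1, size=1000)
print(evaluate_intervals(1.0, centers - 0.1, centers + 0.1))  # roughly 0.68 coverage
```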
Deep Gaussian processes for estimation of failure probabilities in complex systems
Speaker: Annie Booth (Virginia Tech)
Consider an expensive computer simulation of a complex system whose inputs are governed by a known distribution and whose output demarcates passing and failing. For example, we are motivated by a simulation of airflow around an aircraft wing with inputs specifying wing design/flight conditions and output indicating whether aerodynamic efficiency standards have been met. Our ultimate objective is to quantify the probability of system failure (which could be rare) with only several hundred evaluations of the expensive simulation. We tackle this problem in three parts. First, we develop a Bayesian deep Gaussian process (DGP) surrogate to furnish predictions and uncertainty quantification at unobserved inputs. DGPs outperform ordinary GPs when dynamics are nonstationary. Second, we propose a contour locating sequential design scheme to train the DGP to identify the failure contour in the response surface. Third, we incorporate a hybrid Monte Carlo estimator of the failure probability which combines DGP surrogate predictions with strategically allocated evaluations of the expensive model. All methods are supported by publicly available software and the “deepgp” R package on CRAN.
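The Python sketch below conveys the general recipe of combining surrogate predictions with a few strategically placed true-model runs to estimate a failure probability. It is a simplification: it uses an ordinary stationary GP from scikit-learn rather than the Bayesian deep GP, contour-locating design, and hybrid estimator of the talk, and the simulator and failure threshold are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Hypothetical "expensive" simulator: the system fails when the output is < 0.
def simulator(x):
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1]

# A few hundred simulator runs to train the surrogate.
X_train = rng.uniform(-1.0, 1.0, size=(200, 2))
y_train = simulator(X_train)

# Stationary GP as a stand-in surrogate (a Bayesian deep GP would be
# preferred when the response surface is nonstationary).
gp = GaussianProcessRegressor(RBF(0.3), alpha=1e-6).fit(X_train, y_train)

# Plain surrogate-based Monte Carlo over the known input distribution.
X_mc = rng.uniform(-1.0, 1.0, size=(50_000, 2))
mean, sd = gp.predict(X_mc, return_std=True)
p_fail = float((mean < 0.0).mean())

# Hybrid flavor: spend a handful of true simulator runs where the surrogate
# is least certain about the pass/fail boundary and use them in the estimate.
uncertain = np.argsort(np.abs(mean) / sd)[:200]
resolved = simulator(X_mc[uncertain]) < 0.0
certain = np.delete(np.arange(len(X_mc)), uncertain)
p_fail_hybrid = (resolved.sum() + (mean[certain] < 0.0).sum()) / len(X_mc)

print(p_fail, p_fail_hybrid)
```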
15:00-16:30 CDT
Coffee Break & Poster Session 2
Wednesday, May 21, 2025
9:00-9:30 CDT
Sign-in & Breakfast
9:30-10:30 CDT
Studying the Universe with Astrostatistics
Speaker: Gwen Eadie (University of Toronto)
Astrostatistics is a growing interdisciplinary field at the interface of astronomy and statistics. Astronomy is a field rich with publicly available data, but inference using these data must acknowledge selection effects, measurement uncertainty, censoring, and missingness. In the Astrostatistics Research Team (ART) at the University of Toronto --- a joint team between the David A. Dunlap Department of Astronomy & Astrophysics and the Department of Statistical Sciences --- we take an interdisciplinary approach to analysing astronomical data from a range of objects such as stars, star clusters, and galaxies. In this talk, I will cover three ART projects that employ Bayesian inference techniques to: (1) find stellar flares in time series data from stars using hidden Markov models, (2) investigate the uncertain relationship between old star cluster populations and their host galaxies using hurdle models, and (3) discover potential "dark" galaxies within an inhomogeneous Poisson Process framework using noisy data.
10:30-11:15 CDT
Coffee Break & Networking
11:15-12:15 CDT
MaLT: Machine-Learning-Guided Test Case Design and Fault Localization of Complex Software Systems
Speaker: Irene Ji (JMP)
Software testing is essential for the reliable and robust development of complex software systems. A key step in software testing is fault localization, which uses test data to pinpoint failure-inducing combinations for further diagnosis. Existing fault localization methods have two key limitations: they (i) do not incorporate domain and/or structural knowledge from test engineers, and (ii) do not provide a probabilistic assessment of risk for potential root causes. Such methods can thus fail to confidently whittle down the combinatorial number of potential root causes in complex systems, resulting in prohibitively high testing costs. To address this, we outline a holistic machine-learning-guided test case design and fault localization (MaLT) framework, which leverages recent probabilistic machine learning methods to accelerate the testing of complex software systems. MaLT consists of three steps: (i) the construction of a suite of test cases using a covering array for initial testing, (ii) the investigation of posterior root cause probabilities via a Bayesian fault localization procedure, then (iii) the use of such Bayesian analysis to guide selection of subsequent test cases via active learning. The proposed MaLT framework can thus facilitate efficient identification and subsequent diagnosis of software faults with limited test runs.
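As a toy illustration of step (ii), not the MaLT implementation itself, the sketch below computes posterior probabilities over candidate failure-inducing two-factor combinations from pass/fail test outcomes; the factor model, candidate set, and flake rate are hypothetical. Step (iii) would then pick the next test case to best discriminate among the remaining high-probability candidates.

```python
import itertools
import numpy as np

# Toy system: 4 binary configuration factors; a "root cause" is a pair of
# factor settings whose co-occurrence triggers a failure (hypothetical model).
factors = 4
candidates = [((i, a), (j, b))
              for i, j in itertools.combinations(range(factors), 2)
              for a in (0, 1) for b in (0, 1)]

def covers(test, cause):
    (i, a), (j, b) = cause
    return test[i] == a and test[j] == b

def posterior(tests, outcomes, eps=0.05):
    """Posterior over candidate root causes given pass/fail outcomes.
    eps is a flake rate: P(observed outcome disagrees with the model)."""
    log_post = np.zeros(len(candidates))          # uniform prior
    for test, failed in zip(tests, outcomes):
        for k, cause in enumerate(candidates):
            agree = (covers(test, cause) == failed)
            log_post[k] += np.log(1 - eps if agree else eps)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Example: a few test cases (rows) with observed failures.
tests = [(0, 0, 1, 1), (1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 0)]
outcomes = [True, False, False, True]             # True = test failed
post = posterior(tests, outcomes)
best = int(np.argmax(post))
print(candidates[best], round(post[best], 3))
```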
12:15-14:00 CDT
Lunch Break
14:00-15:00 CDT
Generative modeling of conditional spatial distributions via autoregressive Gaussian processes
Speaker: Matthias Katzfuss (University of Wisconsin–Madison)
In many applications, including climate-model emulation and calibration, there is a need to learn the conditional distribution of a high-dimensional spatial field given a covariate vector, based on a small number of training samples. We propose a nonparametric Bayesian method that decomposes this challenging conditional density estimation task into a large series of univariate autoregressions that we model using heteroskedastic Gaussian processes with carefully chosen prior parameterizations. We describe scalable variational inference based on stochastic gradient descent. The resulting generative model can be used to sample from the learned distribution or transform existing fields as a function of the covariate vector. We provide numerical illustrations and comparisons on simulated data and climate-model output.
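A minimal sketch of the autoregressive decomposition is given below, using homoskedastic scikit-learn GPs and exact inference in place of the heteroskedastic GPs and stochastic-gradient variational inference described in the abstract; the toy field and covariates are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy training data: n samples of a small spatial field y (length m),
# each paired with a covariate vector c (hypothetical stand-ins).
n, m, p = 60, 10, 2
C = rng.normal(size=(n, p))
Y = np.cumsum(rng.normal(size=(n, m)) + C[:, :1], axis=1)

# Fit one univariate regression per location: y_j | c, y_1, ..., y_{j-1}.
models = []
for j in range(m):
    X_j = np.hstack([C, Y[:, :j]])               # covariate + previous sites
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.1), normalize_y=True)
    gp.fit(X_j, Y[:, j])
    models.append(gp)

def sample_field(c, rng):
    """Generate a new field conditional on covariate c by sampling each
    univariate autoregression in turn."""
    y = np.zeros(m)
    for j, gp in enumerate(models):
        x = np.hstack([c, y[:j]]).reshape(1, -1)
        mu, sd = gp.predict(x, return_std=True)
        y[j] = rng.normal(mu[0], sd[0])
    return y

print(sample_field(np.array([0.5, -1.0]), rng))
```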
15:00-15:30 CDT
Coffee Break & Networking
15:30-16:30 CDT
From Matrix Interpolation to Tensorized Simulation of High-Dimensional Random Variables: with Applications to Rare Event Estimation
Speaker: Tiangang Cui (University of Sydney)
Thursday, May 22, 2025
9:00-9:30 CDT
Sign-in & Breakfast
9:30-10:30 CDT
Digital Twin Calibration with Model-Based Reinforcement Learning
Speaker: Wei Xie (Northeastern University)
This presentation focuses on a novel methodological framework, called the “Actor-Simulator,” that incorporates the calibration of digital twins into model-based reinforcement learning for more effective control of stochastic systems with complex nonlinear dynamics. Traditional model-based control often relies on restrictive structural assumptions (such as linear state transitions) and fails to account for parameter uncertainty in the model. These issues become particularly critical in industries such as biopharmaceutical manufacturing, where process dynamics are complex and not fully known, and only a limited amount of data is available. Our approach jointly calibrates the digital twin and searches for an optimal control policy, thus accounting for and reducing model error. We balance exploration and exploitation by using policy performance as a guide for data collection. This dual-component approach provably converges to the optimal policy, and outperforms existing methods in extensive numerical experiments based on the biopharmaceutical manufacturing domain.
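A stylized sketch of the joint calibrate-and-control idea on a one-dimensional linear toy system follows; it is not the Actor-Simulator algorithm, and the dynamics, prior, and certainty-equivalent policy are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic system: x' = theta * x + u + noise, with theta unknown.
theta_true = 0.8

def real_system(x, u):
    return theta_true * x + u + rng.normal(0.0, 0.05)

# Digital twin: same structure with a Gaussian posterior on theta
# (a conjugate stand-in for a full Bayesian calibration).
mu, var = 0.0, 1.0          # prior on theta
noise_var = 0.05 ** 2

def best_action(x, theta_hat, target=1.0):
    """Certainty-equivalent policy: drive the predicted next state to target."""
    return target - theta_hat * x

x = 0.0
for step in range(50):
    # Plan with the currently calibrated twin (posterior mean here; sampling
    # theta instead would add exploration, as in Thompson sampling).
    u = best_action(x, mu)
    x_next = real_system(x, u)

    # Calibrate: Bayesian update of theta from the observed transition.
    resid = x_next - u
    prec = 1.0 / var + x ** 2 / noise_var
    mu = (mu / var + x * resid / noise_var) / prec
    var = 1.0 / prec
    x = x_next

print(f"posterior on theta: mean={mu:.3f}, sd={np.sqrt(var):.3f}")
```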
10:30-11:15 CDT
Coffee Break & Networking
11:15-12:15 CDT
A Kernel-Based Approach for Modelling Gaussian Processes with Functional Information
Speaker: Andrew Brown (Clemson University)
Gaussian processes are commonly used tools for modeling continuous processes in machine learning and statistics. This is partly due to the fact that one may employ a Gaussian process as an interpolator for a finite set of known points, which can then be used for prediction and straightforward uncertainty quantification at other locations. However, it is not always the case that the available information is in the form of a finite collection of points. For example, boundary value problems contain information on the boundary of a domain, which is an uncountable collection of points that cannot be incorporated into typical Gaussian process techniques. We propose and construct Gaussian processes that unify, via reproducing kernel Hilbert space, the typical finite case with the case of having uncountable information by exploiting the equivalence of conditional expectation and orthogonal projections. We show existence of the proposed Gaussian process and that it is the limit of a conventional Gaussian process conditioned on an increasing but finite number of points. We illustrate the applicability via numerical examples.
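One way to see the limiting construction described above is to condition an ordinary GP on increasingly dense finite grids of boundary points. The sketch below does this for a zero boundary condition on the unit square; the kernel, boundary values, and interior data are illustrative, and this is a finite approximation rather than the proposed RKHS construction.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Boundary of the unit square with a known (hypothetical) boundary condition
# f = 0; approximate the uncountable boundary by an increasingly dense grid.
def boundary_points(k):
    t = np.linspace(0.0, 1.0, k)
    s = t[1:-1]                             # avoid duplicating the corners
    return np.vstack([np.c_[t, np.zeros(k)], np.c_[t, np.ones(k)],
                      np.c_[np.zeros(k - 2), s], np.c_[np.ones(k - 2), s]])

# A few interior observations of the process.
X_int = np.array([[0.3, 0.4], [0.7, 0.6], [0.5, 0.5]])
y_int = np.array([1.0, -0.5, 0.2])

for k in (5, 20, 80):                       # finer and finer boundary grids
    Xb = boundary_points(k)
    X = np.vstack([Xb, X_int])
    y = np.concatenate([np.zeros(len(Xb)), y_int])
    gp = GaussianProcessRegressor(RBF(0.2), alpha=1e-6).fit(X, y)
    mu, sd = gp.predict(np.array([[0.05, 0.5]]), return_std=True)
    # Predictions near the boundary approach the boundary value as k grows.
    print(k, round(float(mu[0]), 4), round(float(sd[0]), 4))
```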
12:15-14:00 CDT
Lunch Break
14:00-15:00 CDT
Two Tales, One Resolution: Physics-Informed Inference Time Scaling and Preconditioning
Speaker: Yiping Lu (Northwestern University)
In this talk, I will introduce a novel framework for physics-informed debiasing of machine learning estimators, which we call Simulation-Calibrated Scientific Machine Learning (SCaSML). This approach leverages the structure of physical models to achieve three key objectives:
- Unbiased Predictions: It produces unbiased predictions even when the underlying machine learning predictor is biased.
- Overcoming Dimensionality Challenges: It mitigates the curse of dimensionality that often affects high-dimensional estimators.
- Inference Time Scaling: It improves machine learning estimates by allocating additional computation at inference time.
The SCaSML paradigm integrates a (potentially) biased machine learning algorithm with a debiasing procedure that is rigorously designed using numerical analysis and stochastic simulation. We dynamically refine and debias the SciML predictions during inference by enforcing the physical laws. Our methodology aligns with recent advances in inference-time computation—similar to those seen in the large language model literature—demonstrating that additional computation can enhance ML estimates.
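To convey the flavor of pairing a biased learner with a simulation-based correction, the following is a generic control-variate style sketch under invented functions; it is not the SCaSML estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth quantity (in practice a PDE solution evaluated via stochastic
# simulation, e.g. a Feynman-Kac representation); here a toy function.
def true_solution(x):
    return np.sin(x).sum(axis=1)

# A (hypothetical) biased ML surrogate of the solution.
def ml_surrogate(x):
    return np.sin(x).sum(axis=1) * 0.9 + 0.05   # systematic bias

# Target: E[u(X)] for X ~ N(0, I) in d dimensions.
d = 10
x_cheap = rng.normal(size=(200_000, d))          # cheap: surrogate evaluations
x_sim = rng.normal(size=(500, d))                # expensive: simulation runs

# Debiasing: biased surrogate estimate plus a simulation-based estimate of
# its error, so the combined estimator is unbiased.
surrogate_part = ml_surrogate(x_cheap).mean()
correction = (true_solution(x_sim) - ml_surrogate(x_sim)).mean()
estimate = surrogate_part + correction

print(surrogate_part, estimate)                  # biased vs. debiased
```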
Furthermore, we establish a surprising equivalence between our framework and another research direction that utilizes approximate (linearized) solvers to precondition iterative methods. This connection not only bridges two distinct areas of study but also offers new insights and algorithms for improving estimation accuracy in physics-informed machine learning settings.
15:00-15:30 CDT
Coffee Break & Networking
15:30-16:30 CDT
Development of Physics-informed Spatio-temporal Models
Speaker: Youngdeok Hwang (CUNY - Bernard M. Baruch College)
IMSI is committed to making all of our programs and events inclusive and accessible. Contact [email protected] to request disability-related accommodations.