Statistical Foundations of Generative Modeling

Description

Generative models have rapidly become a central tool in modern AI and data science. At a high level, a generative model learns an underlying probability distribution from data and can sample from it to create synthetic yet realistic outputs: text, images, financial scenarios, molecules, patient records, climate simulations, and more. Recent advances such as variational autoencoders, generative adversarial networks, normalizing flows, diffusion models, and flow matching have delivered striking empirical performance. At the same time, many of the most pressing questions remain fundamentally statistical: What distribution is being learned, and under what assumptions is it identifiable? Which distributional features are easy or hard to capture (e.g., modes with complex geometry, rare events, and tail behavior)? How can we quantify uncertainty, control bias, and ensure calibration, especially in high-stakes settings where downstream decisions depend on faithful modeling of extremes?

These challenges become even sharper in domain-specific contexts. Open-ended text generation lacks a single “correct” output, making objective evaluation of quality, coherence, diversity, and fluency a critical open problem. In finance and other dependent-data regimes, correlations and selection effects in training data raise questions about generalization and downstream validity. Across application areas, statistics plays a key role in designing reliable metrics to evaluate and compare generative models, and in understanding the properties of common fine-tuning and alignment procedures. More broadly, determining when synthetic data is “good enough” for inference, prediction, or decision-making remains an open question, as do opportunities to use generative models for tasks such as anomaly and changepoint detection.

This workshop brings together statisticians, machine learning researchers, and practitioners from domains including language modeling, finance, biomedicine, and the natural sciences to develop a shared language and research agenda. The goal is to connect modern generative modeling techniques to classical statistical principles, while advancing theory, methodology, and practices that enable reliable deployment in real-world scientific and societal applications.

Some of the funding for this workshop is provided by the Stevanovich Center.

Poster Session

This workshop will include a poster session for early career researchers (including graduate students). To propose a poster, you must first register for the workshop and then submit a proposal using the form that will become available on this page after you register. The registration form should not be used to propose a poster. The organizers may invite the authors of some accepted posters to give a short lightning talk.

The deadline for poster proposals is Sunday, August 7, 2026. If your proposal is accepted, you should plan to attend the event in person.

In-Person Registration

Seats are limited at the venue, so in-person registration may be capped before the workshop start date. If capacity is reached, a waitlist will be opened and the registration form will reflect this. Early registration is strongly encouraged.

All in-person registrants must wait to receive an invitation from IMSI to attend in person before traveling. Invitations generally begin to be sent out 4-6 weeks in advance.

All registrants (online and in-person) will receive Zoom links and are welcome to attend online.

Registration Fee

A non-refundable registration fee, payable by credit card or debit card, applies to all participants invited to attend this workshop in person. In-person participants agree to pay the fee by the deadline given by IMSI. Failure to pay by the deadline may mean that the invitation to attend in person is revoked.

Current fees:

$25 for students
$50 for non-students

Organizers

Florentina Bunea, Cornell University
Li Ma, University of Chicago
Rebecca Willett, University of Chicago
Anru Zhang, Duke University

Speakers

Randall Balestriero, Brown University
Ricardo Baptista, University of Toronto
Peter Bartlett, University of California, Berkeley and Google
Xin (Mike) Bing, University of Toronto
Arnak Dalalyan, ENSAE Paris
Fred Koehler, University of Chicago
Jessica Li, Fred Hutchinson Cancer Center, Biostatistics
Xihong Lin, Harvard University, Biostatistics
Qiao Liu, Yale University, Biostatistics
Alessandro Rinaldo, University of Texas at Austin
Veronika Rockova, University of Chicago, Econometrics and Statistics
Xiaotong Shen, University of Minnesota Twin Cities, Statistics
Jeremias Sulam, Johns Hopkins University
Alex Tong, AITHYRA Research Institute for Biomedical Artificial Intelligence of the Austrian Academy of Sciences
Brian Trippe, Stanford University, Statistics
Kaizheng Wang, Columbia University
Mengdi Wang, Princeton University, Electrical & Computer Engineering
Yuexi Wang, University of Illinois Urbana-Champaign, Statistics
Zhaoran Wang, Northwestern University
Linda Zhao, University of Pennsylvania
Hongtu Zhu, University of North Carolina-Chapel Hill, Biostatistics

Registration

IMSI is committed to making all of our programs and events inclusive and accessible. Contact [email protected] to request disability-related accommodations.

In order to register for this workshop, you must have an IMSI account and be logged in.

Theory, Evaluation, Applications