Description
Diffusion models have emerged as a transformative technology in generative AI [1, 2, 3, 4]. Their exceptional ability to generate high-fidelity synthetic data has proven invaluable in content creation, data augmentation, artistic exploration, healthcare, and simulation [1, 5]. Compared to classical generative models such as Generative Adversarial Networks (GANs) [6] and Variational Autoencoders (VAEs) [7], diffusion models mitigate issues of training instability and blurry reconstructions [8, 9]. Moreover, they complement Large Language Models (LLMs) [10, 11, 12] by providing an alternative to the autoregressive paradigm, enabling holistic structured data generation. Recent breakthroughs in video generation [13, 14, 15, 16] and financial and economic data augmentation [17] further demonstrate the expanding potential of diffusion models.
Generating time series and sequential data that accurately capture the underlying data structure is crucial for high-stakes fields such as finance, economics, and healthcare [18, 19, 20]. However, this task is particularly challenging, as many applications in these domains operate in the small-data regime [21, 22].
This IRC proposal aims to bring together a selected group of researchers with diverse backgrounds in mathematics, statistics, and engineering who work on diffusion models to collaboratively tackle this problem. The goal is to take the first step toward establishing a rigorous mathematical framework for simulating sequential data and developing a principled implementation scheme to address complex societal challenges.
Challenges and Objectives
This project aims to advance the theoretical foundations and practical methodologies of diffusion models for sequential data. These advancements have broad applications in dynamical systems, financial markets, healthcare, and sequential decision-making [23, 24]. Unlike static data, such as images, sequential data consist of dependent frames, introducing significant challenges:
- Challenge 1: Curse of Dimensionality and Complex Dependencies. Sequential data can be very long, leading to high-dimensional representations when frames are naively stacked. Furthermore, spatial-temporal correlations within and across frames add complexity to modeling and analysis.
- Challenge 2: Non-anticipativity Constraints. In applications such as time series forecasting [25, 26, 27] and finance [28, 29], future data should not influence past observations. Enforcing this non-anticipativity constraint in diffusion models requires novel theory and methods.
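As a small illustration of the non-anticipativity constraint in Challenge 2, the sketch below checks whether a sequence-to-sequence map lets future values leak into earlier outputs. It is a toy example in plain Python, not the method proposed in this project: the function names (`causal_mask`, `causal_average`, `is_non_anticipative`) and the running-mean map are hypothetical choices made only for illustration.

```python
def causal_mask(n):
    """Lower-triangular mask: entry [t][s] is 1.0 when s <= t, else 0.0."""
    return [[1.0 if s <= t else 0.0 for s in range(n)] for t in range(n)]


def causal_average(x):
    """Running mean in which output t uses only x[0], ..., x[t]."""
    out = []
    for row in causal_mask(len(x)):
        # Future entries are multiplied by 0.0, so they cannot contribute.
        total = sum(w * v for w, v in zip(row, x))
        out.append(total / sum(row))
    return out


def is_non_anticipative(f, x, t, bump=100.0):
    """Return True if changing x after time t leaves f(x)[0..t] unchanged."""
    perturbed = list(x)
    for s in range(t + 1, len(x)):
        perturbed[s] += bump
    return f(x)[: t + 1] == f(perturbed)[: t + 1]


def global_mean(x):
    """Counterexample: every output depends on the whole sequence."""
    return [sum(x) / len(x)] * len(x)
```

Under these assumptions, `is_non_anticipative(causal_average, x, t)` holds for any finite sequence `x`, while `global_mean` fails the check because each output depends on values that come later in time. The same perturbation test can be applied to any candidate map as a quick sanity check, although it gives evidence rather than a proof.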
To address these challenges, we propose an interdisciplinary study integrating stochastic analysis, theoretical statistics, and applied mathematics. The project focuses on two key tasks:
- Task 1: Efficient Statistical Complexities. This task tackles Challenge 1 by connecting neural network architectural design with the capture of spatial-temporal dependencies in sequential data and by establishing statistical complexities that are free of the ambient dimension.
- Task 2: Provable Design for Non-anticipativity. This task addresses Challenge 2 by developing both new diffusion processes and network architectures to rigorously enforce the non-anticipativity constraint.
Both tasks bridge the gap between theory and practice in diffusion models for sequential data. Our theoretical contributions will provide the first rigorous justification for the efficiency and fidelity of diffusion models in this setting. Meanwhile, our principled methodologies will establish new baselines for downstream applications such as time series forecasting, imputation, and financial modeling.
Event Dates
August 3-13, 2026