In recent years, many new methods and directions have emerged in reinforcement learning and control. A particularly exciting development is the use of online optimization and statistical learning techniques in control theory, which has led to novel methods and guarantees in a variety of settings, including stochastic and adversarial environments, system identification, iterative planning, and sequence prediction. Other topics we will cover include new connections between control and both model-free and model-based reinforcement learning, as well as learning dynamical systems. We aim to bring together researchers to facilitate progress along these lines of investigation and to discuss important future directions in reinforcement learning, control, learning dynamical systems, and applications to sequence prediction.
Poster Session and Lightning Talks
This workshop will include a poster session and lightning talks for early career researchers (including graduate students). To propose a poster or a lightning talk, you must first register for the workshop and then submit a proposal using the form that will become available on this page after you register. You may propose one or both. The registration form itself should not be used to propose a poster or a lightning talk.
The deadline for proposing is Sunday, March 15, 2026. If your proposal is accepted, you should plan to attend the event in-person.
In-Person Registration
Seats at the venue are limited, so in-person registration may be capped before the workshop start date. If capacity is reached, the registration form will switch to a waitlist. Early registration is strongly encouraged.
All in-person registrants must wait to receive an invitation from IMSI before traveling; invitations generally begin going out 4-6 weeks in advance.
All registrants (online and in-person) will receive Zoom links and are welcome to attend online.
Max Raginsky
University of Illinois Urbana-Champaign (UIUC)
Daniel Russo
Columbia University
Dale Schuurmans
University of Alberta and Google DeepMind
Max Simchowitz
Carnegie Mellon University
Stephen Tu
University of Southern California
Vasileios Tzoumas
University of Michigan
Ben Van Roy
Stanford University
Schedule
Monday, May 11, 2026
8:30-8:50 CDT
Breakfast/Check-in
8:50-9:00 CDT
Welcome
9:00-9:45 CDT
TBA
Speaker: Elad Hazan (Princeton University)
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
TBA
Speaker: Babak Hassibi (Caltech)
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Understanding the foundation model pipeline through coverage
Speaker: Dylan Foster (Microsoft Research)
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
TBA
Speaker: Drew Bagnell (Carnegie Mellon University and Aurora)
14:20-14:35 CDT
Q&A
14:35-15:35 CDT
Lightning Talks
15:40-16:30 CDT
Poster Session & Social Hour
Tuesday, May 12, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
Sequences of Logits and The Low Rank Structure of Language Models
Speaker: Noah Golowich (Microsoft Research NYC)
A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model’s logits for varying sets of prompts and responses have low approximate rank. Taking a theoretical perspective, we then show that any distribution over sequences with such structure of low approximate logit rank can be provably learned using polynomially many queries to the model's logits and polynomial time. Finally, we show that insights resulting from this perspective of low-rank can be leveraged for generation— for instance, we can generate a response to a target prompt using a linear combination of the model’s outputs on unrelated, or even nonsensical prompts. Further, we show how such insights can explain phenomena observed in fine-tuning of LLMs, namely those relating to subliminal learning.
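The rank measurement the abstract describes can be illustrated numerically. The sketch below is a toy stand-in, not the authors' code: the synthetic matrix, dimensions, and tolerance are my own choices, with a planted low-rank factor playing the role of a real model's logits over prompt/response pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_responses, rank = 200, 300, 8

# Stand-in for a logit matrix logits[p, q] = logit the model assigns to
# response q under prompt p; here a rank-8 product plus small noise.
logits = rng.normal(size=(n_prompts, rank)) @ rng.normal(size=(rank, n_responses))
logits += 1e-3 * rng.normal(size=logits.shape)

# Approximate rank: count singular values above a relative tolerance.
s = np.linalg.svd(logits, compute_uv=False)
approx_rank = int(np.sum(s > 1e-2 * s[0]))
print(approx_rank)  # recovers the planted rank of 8
```

For a real model one would populate the matrix by querying the model's logits over a fixed response set for each prompt; the abstract's empirical claim is that such matrices have low approximate rank in practice.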
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
The Power of Universal Sequence Preconditioning
Speaker: Annie Marsden (Google DeepMind)
We study the problem of preconditioning in sequential prediction. From the theoretical lens of linear dynamical systems, we show that convolving the target sequence corresponds to applying a polynomial to the hidden transition matrix. Building on this insight, we propose a universal preconditioning method that convolves the target with coefficients from orthogonal polynomials such as Chebyshev or Legendre. We prove that this approach reduces regret for two distinct prediction algorithms and yields the first-ever sublinear and hidden-dimension-independent regret bounds (up to logarithmic factors) that hold for systems with marginally stable and asymmetric transition matrices. Finally, extensive synthetic and real-world experiments show that this simple preconditioning strategy improves the performance of a diverse range of algorithms, including recurrent neural networks, and generalizes to signals beyond linear dynamical systems.
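As a concrete (and hypothetical) illustration of the convolution step, the sketch below filters a target sequence with the monomial coefficients of a Chebyshev polynomial. The function name, degree, and coefficient ordering are my own assumptions, not the paper's.

```python
import numpy as np

def chebyshev_precondition(y, degree=3):
    """Filter a sequence with the monomial coefficients of T_degree, so the
    preconditioned target at time t is sum_i c[i] * y[t - degree + i]."""
    # Coefficients of the degree-n Chebyshev polynomial in the monomial
    # basis: cheb2poly maps the Chebyshev-series unit vector e_n to T_n.
    c = np.polynomial.chebyshev.cheb2poly(np.eye(degree + 1)[degree])
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    for t in range(degree, len(y)):
        out[t] = sum(c[i] * y[t - degree + i] for i in range(degree + 1))
    return out, c

# For a noiseless scalar system y_t = a^t, the filtered value at time t is
# a^(t-n) * T_n(a); choosing a = cos(pi/6), a root of T_3, annihilates it.
a = np.cos(np.pi / 6)
out, c = chebyshev_precondition([a**t for t in range(20)], degree=3)
```

The point of the construction, per the abstract, is that an online learner predicting the filtered sequence enjoys smaller regret than one predicting the raw sequence, especially for marginally stable systems.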
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Learning Pipelines for Adaptive Control
Speaker: Florian Dörfler (University of Pennsylvania)
The adjacent fields of reinforcement learning (RL) and adaptive control share the same objectives, yet they are separated by a wide cultural gap. In this presentation, I attempt to bridge this gap for the linear quadratic regulator (LQR) problem, which serves as a cornerstone and the benchmark for both fields. I begin by discussing different learning pipelines, including direct and indirect (model-based) approaches, as well as episodic and online (adaptive) approaches. Despite the extensive literature spanning several decades, numerous problems remain unsolved. For instance, RL methods are seldom concerned with closed-loop stability certificates or efficient implementations, while the adaptive control community has dedicated minimal effort to optimality. We address the data-driven LQR problem in an adaptive setting, which entails online recursive algorithms and closed-loop data, and we seek both algorithmic as well as closed-loop certificates. Our approach encompasses different variations of policy gradient methods and employs a novel covariance parameterization of the LQR problem. Finally, all our theoretical results are validated through simulations and experiments in diverse domains, demonstrating the computational and sample efficiency of our method.
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
TBA
Speaker: Vasileios Tzoumas (University of Michigan)
14:20-14:35 CDT
Q&A
14:35-15:00 CDT
Coffee Break
15:00-15:45 CDT
Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
Speaker: Daniel Russo (Columbia University)
15:45-16:00 CDT
Q&A
Wednesday, May 13, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
Learning and control in the presence of observer effects
Speaker: Sarah Dean (Cornell University)
In many modern engineering domains, the presence of "observer effects" creates interdependence between measurement and underlying state. In such settings, control actions both impact the system state and determine what information about it is observed. Accounting for this dual role is crucial for designing reliable algorithms for learning and control, for applications ranging from robotics to personalized recommendation systems. In this talk, I will discuss recent work in the setting of partially observed dynamical systems with linear state transitions and bilinear observations. Inspired by the rich line of work on learning and control for linear systems, our goal is to understand how much (and which) data is necessary for reliable decision-making.
First, I will discuss learning from observations when the dynamics are unknown and provide finite data error bounds and a sample complexity analysis for inputs chosen according to a simple random design. Second, we will consider the optimal control problem with the objective of minimizing a quadratic cost. Despite the similarity to standard linear quadratic Gaussian (LQG) control, neither does the separation principle (SP) hold, nor is the optimal policy affine in the estimated state. Under certain conditions, the SP-based controller locally maximizes the cost instead of minimizing it, and instability can result from a loss of observability. By accounting for how the actions impact state estimation, I will introduce an MPC controller based on receding horizon planning in the belief space. I will conclude with a discussion of open questions on control design and end-to-end guarantees. Based on joint work with Yahya Sattar, Sunmook Choi, Yassir Jedra, Leo Maynard-Zhang, and Maryam Fazel.
Sarah Dean is an assistant professor of computer science. She studies the interplay between optimization, machine learning, and dynamics in real-world systems. Her research focuses on understanding the fundamentals of data-driven methods for control and decision-making, inspired by applications ranging from robotics to recommendation systems. She completed her postdoctoral research at the University of Washington and earned her M.S. and Ph.D. in electrical engineering and computer science at the University of California, Berkeley. Dean received her B.S.E. in electrical engineering and mathematics from the University of Pennsylvania.
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
Controlled dynamical systems on the space of probability measures
Speaker: Max Raginsky (University of Illinois at Urbana-Champaign)
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Large Language Models and Computation
Speaker: Dale Schuurmans (University of Alberta)
The ability of large generative models to respond naturally to text, image and audio inputs has created significant excitement. Particularly interesting is the ability of these models to generate outputs that resemble coherent reasoning and computational sequences. I will discuss the inherent computational capability of large language models and show that autoregressive decoding supports universal computation, even without pre-training. The co-existence of informal and formal computational systems in the same model does not change what is computable, but does provide new means for eliciting desired behaviour. I will then discuss how post-training, in an attempt to make a model more directable, faces severe computational limits on what can be achieved, but that accounting for these limits can improve outcomes.
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
Latent Representations for Control Design with Provable Stability and Safety Guarantees
Speaker: Stephen Tu (University of Southern California (USC))
We initiate a formal study on the use of low-dimensional latent representations of dynamical systems for verifiable control synthesis. Our main goal is to enable the application of verification techniques, such as Lyapunov or barrier functions, that might otherwise be computationally prohibitive when applied directly to the full state representation. Towards this goal, we first provide dynamics-aware approximate conjugacy conditions which formalize the notion of reconstruction error necessary for systems analysis. We then utilize our conjugacy conditions to transfer the stability and invariance guarantees of a latent certificate function (e.g., a Lyapunov or barrier function) for a latent space controller back to the original system. Importantly, our analysis has several implications for learning latent spaces and dynamics: it highlights the geometric properties that the latent space must preserve, and it provides concrete loss functions for dynamics reconstruction that are directly tied to control design.
14:20-14:35 CDT
Q&A
14:35-15:00 CDT
Coffee Break
15:00-15:45 CDT
Some fundamental limitations of learning for dynamics and control
Speaker: Necmiye Ozay (University of Michigan)
Data-driven and learning-based methods have attracted considerable attention in recent years both for the analysis of dynamical systems and for control design. While there are many interesting and exciting results in this direction, our understanding of fundamental limitations of learning for control is lagging. This talk will focus on the question of when learning can be hard or impossible in the context of dynamical systems and control. In the first part of the talk, I will discuss a new observation on immersions and how it reveals some potential limitations in learning Koopman embeddings. In the second part of the talk, I will show what makes it hard to learn to stabilize linear systems from a sample-complexity perspective. While these results might seem negative, I will conclude the talk with some thoughts on how they can inspire interesting future directions.
15:45-16:00 CDT
Q&A
Thursday, May 14, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
A mathematical basis for Moravec’s paradox
Speaker: Max Simchowitz (Carnegie Mellon University)
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
TBA
Speaker: Zak Mhammedi (Google Research)
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Self-Attention for Online Decision-Making and Control
Speaker: Gautam Goel (University of California, Berkeley)
Self-attention is the key algorithmic module which powers the Transformer neural architecture. However, the softmax nonlinearity appearing in self-attention makes theoretical analysis quite challenging. We study the training dynamics of gradient descent in a softmax self-attention layer trained to perform linear regression and propose a simple first-order optimization algorithm which converges to the globally optimal self-attention parameters at a geometric rate. Our analysis proceeds in two steps. First, we show that in the infinite-data limit the regression problem solved by the self-attention layer is equivalent to a nonconvex matrix factorization problem. Second, we exploit this connection to design a novel "structure-aware" variant of gradient descent which efficiently optimizes the original finite-data regression objective. Our optimization algorithm features several innovations over standard gradient descent, including a preconditioner and regularizer which help avoid spurious stationary points, and a data-dependent spectral initialization of parameters which lie near the manifold of global minima with high probability. As an application of our results, we show that our algorithm can be used to obtain sublinear regret for the problem of online stochastic regression.
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
From Generative Models to Control: Representation-based Reinforcement Learning in Physical Systems
Speaker: Na Li (Harvard)
14:20-14:35 CDT
Q&A
14:35-15:00 CDT
Coffee Break
15:00-15:45 CDT
TBA
Speaker: Jacob Abernethy (Georgia Tech)
15:45-16:00 CDT
Q&A
Friday, May 15, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
TBA
Speaker: Nadav Cohen (Tel Aviv University)
9:45-10:00 CDT
Q&A
10:00-10:30 CDT
Coffee Break
10:30-11:15 CDT
TBA
Speaker: Ben Van Roy (Stanford University)
11:15-11:30 CDT
Q&A
11:30-12:15 CDT
Can Gradient Descent Beat Riccati?
Speaker: Alex Olshevsky (Boston University)
While it is known that gradient descent recovers the optimal LQR gain without spurious local minima, it is hard to argue against just solving the Riccati equation. We revisit this question in the dual (filtering) setting and give a concrete answer: for large-scale sparse systems, gradient-based computation of the Kalman gain can reduce per-iteration cost to linear in the state dimension.
Our starting point is a new formula expressing the gradient of the innovations loss as a product of two interpretable factors: the observability Gramian of the error dynamics and the cross-covariance between the estimation error and the innovation, a quantity that measures how far the current filter is from satisfying the Kalman orthogonality principle. This decomposition reveals that spurious stationary points arise if and only if the system loses observability, and it identifies a non-standard observability condition under which gradient descent converges geometrically to the Kalman gain. The convergence rate itself decomposes cleanly into a worst-case observability measure and a term capturing how steeply orthogonality violation penalizes the cost. Finally, we show that under natural sparsity assumptions, the gradient can be approximated using only sparse matrix-vector products, and we demonstrate order-of-magnitude speedups over standard Riccati solvers on systems with several thousand state variables.
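To make the comparison concrete, here is a hedged scalar sketch (all model values, step sizes, and names are my own, not the speaker's): gradient descent on a stationary filter's steady-state error variance recovers the same gain that Riccati iteration produces.

```python
# Scalar toy model: x_{t+1} = a x_t + w_t,  y_t = c x_t + v_t,
# with Var(w) = q and Var(v) = r (illustrative values of my choosing).
a, c, q, r = 0.9, 1.0, 1.0, 1.0

# Reference answer: steady-state Kalman (predictor) gain via Riccati iteration.
P = q
for _ in range(500):
    P = a * a * P - (a * c * P) ** 2 / (c * c * P + r) + q
K = a * c * P / (c * c * P + r)

# Steady-state error variance of the stationary filter
#   xhat_{t+1} = a * xhat_t + g * (y_t - c * xhat_t),
# valid while the error dynamics (a - g*c) are stable:
#   S(g) = (q + g^2 r) / (1 - (a - g c)^2).
def S(g):
    return (q + g * g * r) / (1.0 - (a - g * c) ** 2)

# Plain gradient descent on S via central differences.
g, step, eps = 0.1, 0.05, 1e-6
for _ in range(2000):
    grad = (S(g + eps) - S(g - eps)) / (2 * eps)
    g -= step * grad
# g now matches the Riccati-based Kalman gain K.
```

In the matrix case the analogous gradient is exactly what the abstract factorizes into an observability Gramian and an error-innovation cross-covariance, and the sparse matrix-vector approximation is what makes the per-iteration cost linear in the state dimension.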
12:15-12:30 CDT
Q&A
12:30-12:45 CDT
Workshop Survey and Closing Remarks
Registration
IMSI is committed to making all of our programs and events inclusive and accessible.
Contact [email protected] to request disability-related accommodations.
In order to register for this workshop, you must have an IMSI account and be logged in.