In recent years, many new methods and directions have emerged in reinforcement learning and control. A particularly exciting development is the use of online optimization and statistical learning techniques in control theory, which has led to novel methods and guarantees in a variety of settings, including stochastic and adversarial environments, system identification, iterative planning, and sequence prediction. Other topics we will cover include new connections between control and both model-free and model-based reinforcement learning, as well as learning dynamical systems. We aim to bring together researchers to facilitate progress along these lines of investigation and to discuss important future directions in reinforcement learning, control, learning dynamical systems, and applications to sequence prediction.
Poster Session and Lightning Talks
This workshop will include a poster session and lightning talks for early career researchers (including graduate students). To propose a poster or a lightning talk, you must first register for the workshop and then submit a proposal using the form that will become available on this page after you register. You may propose one or both. The registration form itself should not be used to propose a poster or a lightning talk.
The deadline for proposing is Sunday, March 15, 2026. If your proposal is accepted, you should plan to attend the event in-person.
In-Person Registration
Seats at the venue are limited, so in-person registration may be capped before the workshop start date. If capacity is reached, the registration form will switch to a waitlist. Early registration is strongly encouraged.
All in-person registrants must wait to receive an invitation from IMSI before traveling; invitations generally begin going out 4-6 weeks in advance.
All registrants (online and in-person) will receive Zoom links and are welcome to attend online.
Max Raginsky
University of Illinois Urbana-Champaign (UIUC)
Daniel Russo
Columbia University
Dale Schuurmans
University of Alberta and Google DeepMind
Max Simchowitz
Carnegie Mellon University
Stephen Tu
University of Southern California
Vasileios Tzoumas
University of Michigan
Ben Van Roy
Stanford University
Schedule
Monday, May 11, 2026
8:30-8:50 CDT
Breakfast/Check-in
8:50-9:00 CDT
Welcome
9:00-9:45 CDT
TBA
Speaker: Elad Hazan (Princeton University)
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
TBA
Speaker: Babak Hassibi (Caltech)
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Understanding the foundation model pipeline through coverage
Speaker: Dylan Foster (Microsoft Research)
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
TBA
Speaker: Drew Bagnell (Carnegie Mellon University and Aurora)
14:20-14:35 CDT
Q&A
14:35-15:35 CDT
Lightning Talks
15:40-16:30 CDT
Poster Session & Social Hour
Tuesday, May 12, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
Sequences of Logits and The Low Rank Structure of Language Models
Speaker: Noah Golowich (Microsoft Research NYC)
A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model’s logits for varying sets of prompts and responses have low approximate rank. Taking a theoretical perspective, we then show that any distribution over sequences with such structure of low approximate logit rank can be provably learned using polynomially many queries to the model's logits and polynomial time. Finally, we show that insights resulting from this perspective of low-rank can be leveraged for generation— for instance, we can generate a response to a target prompt using a linear combination of the model’s outputs on unrelated, or even nonsensical prompts. Further, we show how such insights can explain phenomena observed in fine-tuning of LLMs, namely those relating to subliminal learning.
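The rank measurement the abstract describes can be illustrated numerically. The sketch below is a toy stand-in, not the authors' code: the synthetic matrix, dimensions, and tolerance are my own choices, with a planted low-rank factor playing the role of a real model's logits over prompt/response pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_responses, rank = 200, 300, 8

# Stand-in for a logit matrix logits[p, q] = logit the model assigns to
# response q under prompt p; here a rank-8 product plus small noise.
logits = rng.normal(size=(n_prompts, rank)) @ rng.normal(size=(rank, n_responses))
logits += 1e-3 * rng.normal(size=logits.shape)

# Approximate rank: count singular values above a relative tolerance.
s = np.linalg.svd(logits, compute_uv=False)
approx_rank = int(np.sum(s > 1e-2 * s[0]))
print(approx_rank)  # recovers the planted rank of 8
```

For a real model one would populate the matrix by querying the model's logits over a fixed response set for each prompt; the abstract's empirical claim is that such matrices have low approximate rank in practice.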
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
The Power of Universal Sequence Preconditioning
Speaker: Annie Marsden (Google DeepMind)
We study the problem of preconditioning in sequential prediction. From the theoretical lens of linear dynamical systems, we show that convolving the target sequence corresponds to applying a polynomial to the hidden transition matrix. Building on this insight, we propose a universal preconditioning method that convolves the target with coefficients from orthogonal polynomials such as Chebyshev or Legendre. We prove that this approach reduces regret for two distinct prediction algorithms and yields the first-ever sublinear and hidden-dimension-independent regret bounds (up to logarithmic factors) that hold for systems with marginally stable and asymmetric transition matrices. Finally, extensive synthetic and real-world experiments show that this simple preconditioning strategy improves the performance of a diverse range of algorithms, including recurrent neural networks, and generalizes to signals beyond linear dynamical systems.
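As a concrete (and hypothetical) illustration of the convolution step, the sketch below filters a target sequence with the monomial coefficients of a Chebyshev polynomial. The function name, degree, and coefficient ordering are my own assumptions, not the paper's.

```python
import numpy as np

def chebyshev_precondition(y, degree=3):
    """Filter a sequence with the monomial coefficients of T_degree, so the
    preconditioned target at time t is sum_i c[i] * y[t - degree + i]."""
    # Coefficients of the degree-n Chebyshev polynomial in the monomial
    # basis: cheb2poly maps the Chebyshev-series unit vector e_n to T_n.
    c = np.polynomial.chebyshev.cheb2poly(np.eye(degree + 1)[degree])
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    for t in range(degree, len(y)):
        out[t] = sum(c[i] * y[t - degree + i] for i in range(degree + 1))
    return out, c

# For a noiseless scalar system y_t = a^t, the filtered value at time t is
# a^(t-n) * T_n(a); choosing a = cos(pi/6), a root of T_3, annihilates it.
a = np.cos(np.pi / 6)
out, c = chebyshev_precondition([a**t for t in range(20)], degree=3)
```

The point of the construction, per the abstract, is that an online learner predicting the filtered sequence enjoys smaller regret than one predicting the raw sequence, especially for marginally stable systems.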
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Learning Pipelines for Adaptive Control
Speaker: Florian Dörfler (University of Pennsylvania)
The adjacent fields of reinforcement learning (RL) and adaptive control share the same objectives, yet they are separated by a wide cultural gap. In this presentation, I attempt to bridge this gap for the linear quadratic regulator (LQR) problem, which serves as a cornerstone and the benchmark for both fields. I begin by discussing different learning pipelines, including direct and indirect (model-based) approaches, as well as episodic and online (adaptive) approaches. Despite the extensive literature spanning several decades, numerous problems remain unsolved. For instance, RL methods are seldom concerned with closed-loop stability certificates or efficient implementations, while the adaptive control community has dedicated minimal effort to optimality. We address the data-driven LQR problem in an adaptive setting, which entails online recursive algorithms and closed-loop data, and we seek both algorithmic as well as closed-loop certificates. Our approach encompasses different variations of policy gradient methods and employs a novel covariance parameterization of the LQR problem. Finally, all our theoretical results are validated through simulations and experiments in diverse domains, demonstrating the computational and sample efficiency of our method.
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
TBA
Speaker: Vasileios Tzoumas (University of Michigan)
14:20-14:35 CDT
Q&A
14:35-15:00 CDT
Coffee Break
15:00-15:45 CDT
Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
Speaker: Daniel Russo (Columbia University)
15:45-16:00 CDT
Q&A
Wednesday, May 13, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
Learning and control in the presence of observer effects
Speaker: Sarah Dean (Cornell University)
In many modern engineering domains, the presence of "observer effects" creates interdependence between measurement and underlying state. In such settings, control actions both impact the system state and determine what information about it is observed. Accounting for this dual role is crucial for designing reliable algorithms for learning and control, for applications ranging from robotics to personalized recommendation systems. In this talk, I will discuss recent work in the setting of partially observed dynamical systems with linear state transitions and bilinear observations. Inspired by the rich line of work on learning and control for linear systems, our goal is to understand how much (and which) data is necessary for reliable decision-making.
First, I will discuss learning from observations when the dynamics are unknown and provide finite data error bounds and a sample complexity analysis for inputs chosen according to a simple random design. Second, we will consider the optimal control problem with the objective of minimizing a quadratic cost. Despite the similarity to standard linear quadratic Gaussian (LQG) control, neither does the separation principle (SP) hold, nor is the optimal policy affine in the estimated state. Under certain conditions, the SP-based controller locally maximizes the cost instead of minimizing it, and instability can result from a loss of observability. By accounting for how the actions impact state estimation, I will introduce an MPC controller based on receding horizon planning in the belief space. I will conclude with a discussion of open questions on control design and end-to-end guarantees. Based on joint work with Yahya Sattar, Sunmook Choi, Yassir Jedra, Leo Maynard-Zhang, and Maryam Fazel.
Sarah Dean is an assistant professor of computer science. She studies the interplay between optimization, machine learning, and dynamics in real-world systems. Her research focuses on understanding the fundamentals of data-driven methods for control and decision-making, inspired by applications ranging from robotics to recommendation systems. She completed her postdoctoral research at the University of Washington and earned her M.S. and Ph.D. in electrical engineering and computer science at the University of California, Berkeley. Dean received her B.S.E. in electrical engineering and mathematics from the University of Pennsylvania.
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
Controlled dynamical systems on the space of probability measures
Speaker: Max Raginsky (University of Illinois at Urbana-Champaign)
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Large Language Models and Computation
Speaker: Dale Schuurmans (University of Alberta)
The ability of large generative models to respond naturally to text, image and audio inputs has created significant excitement. Particularly interesting is the ability of these models to generate outputs that resemble coherent reasoning and computational sequences. I will discuss the inherent computational capability of large language models and show that autoregressive decoding supports universal computation, even without pre-training. The co-existence of informal and formal computational systems in the same model does not change what is computable, but does provide new means for eliciting desired behaviour. I will then discuss how post-training, in an attempt to make a model more directable, faces severe computational limits on what can be achieved, but that accounting for these limits can improve outcomes.
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
Latent Representations for Control Design with Provable Stability and Safety Guarantees
Speaker: Stephen Tu (University of Southern California (USC))
We initiate a formal study on the use of low-dimensional latent representations of dynamical systems for verifiable control synthesis. Our main goal is to enable the application of verification techniques, such as Lyapunov or barrier functions, that might otherwise be computationally prohibitive when applied directly to the full state representation. Towards this goal, we first provide dynamics-aware approximate conjugacy conditions which formalize the notion of reconstruction error necessary for systems analysis. We then utilize our conjugacy conditions to transfer the stability and invariance guarantees of a latent certificate function (e.g., a Lyapunov or barrier function) for a latent space controller back to the original system. Importantly, our analysis has several implications for learning latent spaces and dynamics: it highlights the geometric properties that the latent space must preserve, and it provides concrete loss functions for dynamics reconstruction that are directly tied to control design.
14:20-14:35 CDT
Q&A
14:35-15:00 CDT
Coffee Break
15:00-15:45 CDT
Some fundamental limitations of learning for dynamics and control
Speaker: Necmiye Ozay (University of Michigan)
Data-driven and learning-based methods have attracted considerable attention in recent years both for the analysis of dynamical systems and for control design. While there are many interesting and exciting results in this direction, our understanding of fundamental limitations of learning for control is lagging. This talk will focus on the question of when learning can be hard or impossible in the context of dynamical systems and control. In the first part of the talk, I will discuss a new observation on immersions and how it reveals some potential limitations in learning Koopman embeddings. In the second part of the talk, I will show what makes it hard to learn to stabilize linear systems from a sample-complexity perspective. While these results might seem negative, I will conclude the talk with some thoughts on how they can inspire interesting future directions.
15:45-16:00 CDT
Q&A
Thursday, May 14, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
A mathematical basis for Moravec’s paradox
Speaker: Max Simchowitz (Carnegie Mellon University)
9:45-10:00 CDT
Q&A
10:00-10:05 CDT
Tech Break
10:05-10:50 CDT
TBA
Speaker: Zak Mhammedi (Google Research)
10:50-11:05 CDT
Q&A
11:05-11:35 CDT
Coffee Break
11:35-12:20 CDT
Self-Attention for Online Decision-Making and Control
Speaker: Gautam Goel (University of California, Berkeley)
Self-attention is the key algorithmic module which powers the Transformer neural architecture. However, the softmax nonlinearity appearing in self-attention makes theoretical analysis quite challenging. We study the training dynamics of gradient descent in a softmax self-attention layer trained to perform linear regression and propose a simple first-order optimization algorithm which converges to the globally optimal self-attention parameters at a geometric rate. Our analysis proceeds in two steps. First, we show that in the infinite-data limit the regression problem solved by the self-attention layer is equivalent to a nonconvex matrix factorization problem. Second, we exploit this connection to design a novel "structure-aware" variant of gradient descent which efficiently optimizes the original finite-data regression objective. Our optimization algorithm features several innovations over standard gradient descent, including a preconditioner and regularizer which help avoid spurious stationary points, and a data-dependent spectral initialization of parameters which lie near the manifold of global minima with high probability. As an application of our results, we show that our algorithm can be used to obtain sublinear regret for the problem of online stochastic regression.
12:20-12:35 CDT
Q&A
12:35-13:35 CDT
Lunch break
13:35-14:20 CDT
From Generative Models to Control: Representation-based Reinforcement Learning in Physical Systems
Speaker: Na Li (Harvard)
14:20-14:35 CDT
Q&A
14:35-15:00 CDT
Coffee Break
15:00-15:45 CDT
TBA
Speaker: Jacob Abernethy (Georgia Tech)
15:45-16:00 CDT
Q&A
Friday, May 15, 2026
8:30-9:00 CDT
Breakfast/Check-in
9:00-9:45 CDT
TBA
Speaker: Nadav Cohen (Tel Aviv University)
9:45-10:00 CDT
Q&A
10:00-10:30 CDT
Coffee Break
10:30-11:15 CDT
TBA
Speaker: Ben Van Roy (Stanford University)
11:15-11:30 CDT
Q&A
11:30-12:15 CDT
Can Gradient Descent Beat Riccati?
Speaker: Alex Olshevsky (Boston University)
While it is known that gradient descent recovers the optimal LQR gain without spurious local minima, it is hard to argue against just solving the Riccati equation. We revisit this question in the dual (filtering) setting and give a concrete answer: for large-scale sparse systems, gradient-based computation of the Kalman gain can reduce per-iteration cost to linear in the state dimension.
Our starting point is a new formula expressing the gradient of the innovations loss as a product of two interpretable factors: the observability Gramian of the error dynamics and the cross-covariance between the estimation error and the innovation, a quantity that measures how far the current filter is from satisfying the Kalman orthogonality principle. This decomposition reveals that spurious stationary points arise if and only if the system loses observability, and it identifies a non-standard observability condition under which gradient descent converges geometrically to the Kalman gain. The convergence rate itself decomposes cleanly into a worst-case observability measure and a term capturing how steeply orthogonality violation penalizes the cost. Finally, we show that under natural sparsity assumptions, the gradient can be approximated using only sparse matrix-vector products, and we demonstrate order-of-magnitude speedups over standard Riccati solvers on systems with several thousand state variables.
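To make the comparison concrete, here is a hedged scalar sketch (all model values, step sizes, and names are my own, not the speaker's): gradient descent on a stationary filter's steady-state error variance recovers the same gain that Riccati iteration produces.

```python
# Scalar toy model: x_{t+1} = a x_t + w_t,  y_t = c x_t + v_t,
# with Var(w) = q and Var(v) = r (illustrative values of my choosing).
a, c, q, r = 0.9, 1.0, 1.0, 1.0

# Reference answer: steady-state Kalman (predictor) gain via Riccati iteration.
P = q
for _ in range(500):
    P = a * a * P - (a * c * P) ** 2 / (c * c * P + r) + q
K = a * c * P / (c * c * P + r)

# Steady-state error variance of the stationary filter
#   xhat_{t+1} = a * xhat_t + g * (y_t - c * xhat_t),
# valid while the error dynamics (a - g*c) are stable:
#   S(g) = (q + g^2 r) / (1 - (a - g c)^2).
def S(g):
    return (q + g * g * r) / (1.0 - (a - g * c) ** 2)

# Plain gradient descent on S via central differences.
g, step, eps = 0.1, 0.05, 1e-6
for _ in range(2000):
    grad = (S(g + eps) - S(g - eps)) / (2 * eps)
    g -= step * grad
# g now matches the Riccati-based Kalman gain K.
```

In the matrix case the analogous gradient is exactly what the abstract factorizes into an observability Gramian and an error-innovation cross-covariance, and the sparse matrix-vector approximation is what makes the per-iteration cost linear in the state dimension.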
12:15-12:30 CDT
Q&A
12:30-12:45 CDT
Workshop Survey and Closing Remarks
Registration
IMSI is committed to making all of our programs and events inclusive and accessible.
Contact [email protected] to request disability-related accommodations.
In order to register for this workshop, you must have an IMSI account and be logged in.