This event is part of Theoretical Advances in Reinforcement Learning and Control.

Reinforcement Learning from Offline Data and Human Feedback

David Rubenstein Forum - 1201 E 60th St, Chicago, IL 60637

Description


Reinforcement Learning (RL) has seen remarkable progress in recent years, yet many of its most impressive achievements rely on extensive online interaction, curated environments, or simulated data—conditions rarely available in real-world settings. In contrast, real-world decision-making often depends on learning from limited, imperfect, or passively collected data, alongside guidance from human preferences, demonstrations, or corrections.

This workshop brings together researchers and practitioners exploring the frontiers of Offline Reinforcement Learning (Offline RL) and Reinforcement Learning from Human Feedback (RLHF)—two rapidly growing areas that aim to make RL more robust, safe, and deployable in practice.

Poster Session

This workshop will include a poster session for early-career researchers (including graduate students). To propose a poster, you must first register for the workshop and then submit a proposal using the form that will appear on this page after you register. Do not use the registration form to propose a poster.

The deadline for poster proposals is Wednesday, March 18, 2026. If your proposal is accepted, you are expected to attend the event in person.

In-Person Registration

Seats at the venue are limited, so in-person registration may be capped before the workshop start date. If capacity is reached, the registration form will switch to a waitlist. Early registration is strongly encouraged.

All in-person registrants must wait to receive an invitation to attend in person from IMSI before making travel plans; invitations are generally sent out 4-6 weeks in advance.

All registrants (online and in-person) will receive Zoom links and are welcome to attend online.

Organizers

Cong Ma (University of Chicago)
Yuxin Chen (University of Pennsylvania, The Wharton School)

Speakers

Yuejie Chi (Yale University)
Jianqing Fan (Princeton University)
Dylan Foster (Microsoft Research)
Xin Guo (UC Berkeley)
Nan Jiang (UIUC)
Ying Jin (University of Pennsylvania)
Yingbin Liang (The Ohio State University)
Wenlong Mou (University of Toronto)
Annie Qu (University of California, Santa Barbara)
Zhimei Ren (University of Pennsylvania)
Devavrat Shah (MIT)
Chengchun Shi (London School of Economics)
R. Srikant (UIUC)
Will Wei Sun (Purdue University)
Wenpin Tang (Columbia University)
Benjamin Van Roy (Stanford University)
Lan Wang (University of Miami)
Yu-Xiang Wang (University of California, San Diego)
Yuting Wei (University of Pennsylvania, The Wharton School)
Renyuan Xu (Stanford University)
Lingzhou Xue (Penn State University)
Lei Ying (University of Michigan)

Schedule

Monday, April 20, 2026
8:30-8:55 CDT
Check-in/Breakfast
8:55-9:00 CDT
Welcome Remarks
9:00-9:40 CDT
Learning to Answer from Correct Demonstrations

Speaker: Nathan Srebro (Toyota Technological Institute at Chicago)

9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Deep Transfer Offline Q-Learning under Nonstationary Environments

Speaker: Jianqing Fan (Princeton University)

10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Automated hypothesis validation with agentic sequential falsifications

Speaker: Ying Jin (University of Pennsylvania)

12:05-12:20 CDT
Q&A
12:20-13:30 CDT
Lunch Break
13:30-14:10 CDT
Reinforcement Learning For Individual Optimal Policy From Heterogeneous Data

Speaker: Annie Qu (University of California, Santa Barbara)

14:10-14:25 CDT
Q&A
14:25-14:30 CDT
Tech Break
14:30-15:10 CDT
Sampler Stochasticity in Training Diffusion Models for RLHF

Speaker: Wenpin Tang (Columbia University)

15:10-15:25 CDT
Q&A
15:25-15:40 CDT
Coffee Break
15:40-16:20 CDT
Conditional Diffusion Guidance under Hard Constraint: A Stochastic Analysis Approach

Speaker: Renyuan Xu (Stanford University)

16:20-16:35 CDT
Q&A
Tuesday, April 21, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
From Offline to Low-Adaptive Reinforcement Learning

Speaker: Yu-Xiang Wang (University of California, San Diego (UCSD))

9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Consequentialist Objectives and Catastrophe

Speaker: Benjamin Van Roy (Stanford University)

10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Demystifying Group Relative Policy Optimization: Its Policy Gradient is a U-Statistic

Speaker: Chengchun Shi (London School of Economics and Political Science)

12:05-12:20 CDT
Q&A
12:20-13:45 CDT
Lunch Break
13:45-14:25 CDT
New Results for Distributional Reinforcement Learning

Speaker: Lan Wang (University of Miami)

14:25-14:40 CDT
Q&A
14:40-15:10 CDT
Coffee Break
15:10-15:50 CDT
From Reward Learning to Leaderboards: Uncertainty Quantification for LLMs under Heterogeneous Human Feedback

Speaker: Will Wei Sun (Purdue University)

15:50-16:05 CDT
Q&A
Wednesday, April 22, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
On the Learning Dynamics of RLVR at the Edge of Competence

Speaker: Yuejie Chi (Yale University)

9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Statistical Inference under Adaptive Sampling with LinUCB

Speaker: Yuting Wei (University of Pennsylvania)

10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Non-Asymptotic CLTs and Concentration Inequalities for Stochastic Approximation Algorithms, with Applications to Reinforcement Learning

Speaker: R. Srikant (University of Illinois at Urbana-Champaign)

12:05-12:20 CDT
Q&A
12:20-13:45 CDT
Lunch Break
13:45-14:25 CDT
Off-policy Evaluation via Particle Filtering and Moment Matching

Speaker: Nan Jiang (University of Illinois at Urbana-Champaign)

14:25-14:40 CDT
Q&A
14:40-14:45 CDT
Tech Break
14:45-15:25 CDT
Model simulation using offline observations with low-rank factor model

Speaker: Devavrat Shah (Massachusetts Institute of Technology (MIT))

15:25-15:40 CDT
Q&A
15:40-16:30 CDT
Poster Session/Social Hour
Thursday, April 23, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
What structures make model-free RL possible? An elliptic theory for controlled Markov diffusions

Speaker: Wenlong Mou (University of Toronto)

9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Deterministic Policy Gradient for Reinforcement Learning with Continuous Time and Space

Speaker: Xin Guo (University of California, Berkeley (UC Berkeley))

10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Optimal offline policy learning under unknown confounding factors

Speaker: Zhimei Ren (University of Pennsylvania)

12:05-12:20 CDT
Q&A
12:20-13:45 CDT
Lunch Break
13:45-14:25 CDT
Toward efficient exploration for language models

Speaker: Dylan Foster (Microsoft Research)

14:25-14:40 CDT
Q&A
14:40-15:10 CDT
Coffee Break
15:10-15:50 CDT
Sample-Efficient and Low-Cost Model-Free Reinforcement Learning

Speaker: Lingzhou Xue (The Pennsylvania State University)

15:50-16:05 CDT
Q&A
Friday, April 24, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
PPO Fine-Tuning of Diffusion Models: Provable Convergence across Interpolated Trajectories

Speaker: Yingbin Liang (The Ohio State University)

9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Stochastic Zeroth-Order Policy Optimization for RLHF

Speaker: Lei Ying (University of Michigan)

10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Fisher Random Walk: Automatic Preference Inference for Language Models

Speaker: Junwei Lu (Harvard University)

12:05-12:20 CDT
Q&A
12:20-12:35 CDT
Workshop Survey and Closing Remarks

Poster Session


Posters submitted in advance can be viewed on this page.

Registration

IMSI is committed to making all of our programs and events inclusive and accessible. Contact [email protected] to request disability-related accommodations.

To register for this workshop, you must have an IMSI account and be logged in.