Reinforcement Learning (RL) has seen remarkable progress in recent years, yet many of its most impressive achievements rely on extensive online interaction, curated environments, or simulated data—conditions rarely available in real-world settings. In contrast, real-world decision-making often depends on learning from limited, imperfect, or passively collected data, alongside guidance from human preferences, demonstrations, or corrections.
This workshop brings together researchers and practitioners exploring the frontiers of Offline Reinforcement Learning (Offline RL) and Reinforcement Learning from Human Feedback (RLHF)—two rapidly growing areas that aim to make RL more robust, safe, and deployable in practice.
Poster Session
This workshop will include a poster session for early-career researchers (including graduate students). In order to propose a poster, you must first register for the workshop and then submit a proposal using the form that will become available on this page after you register. The registration form should not be used to propose a poster.
The deadline for proposing is Wednesday, March 18, 2026. If your proposal is accepted, you should plan to attend the event in-person.
In-Person Registration
Seats at the venue are limited, so in-person registration may be capped before the workshop start date. If capacity is reached, a waitlist will be opened and reflected on the registration form. Early registration is strongly encouraged.
All in-person registrants must wait to receive an invitation from IMSI before traveling; invitations are generally sent out 4-6 weeks in advance.
All registrants (online and in-person) will receive Zoom links and are welcome to attend online.
Organizers
Yuting Wei
University of Pennsylvania, The Wharton School
Renyuan Xu
Stanford University
Lingzhou Xue
Penn State University
Lei Ying
University of Michigan
Schedule
Monday, April 20, 2026
8:30-8:55 CDT
Check-in/Breakfast
8:55-9:00 CDT
Welcome Remarks
9:00-9:40 CDT
Learning to Answer from Correct Demonstrations
Speaker: Nathan Srebro (Toyota Technological Institute at Chicago)
We study the problem of learning to generate an answer (or completion) to a question (or prompt), where there could be multiple correct answers, any one of which is acceptable at test time. Learning is based on demonstrations of some correct answer to each training question, as in Supervised Fine-Tuning (SFT). Current standard practice focuses on maximum likelihood (i.e., log-loss minimization) approaches, but we argue that likelihood-maximization methods can fail even in simple settings. Instead, we view the problem as apprenticeship learning (i.e., imitation learning) in contextual bandits, with offline demonstrations from some expert (optimal, or very good) policy, and suggest alternative simple approaches with strong guarantees.
Joint work with Nirmit Joshi, Gene Li, Siddharth Bhandari, Shiva Kasiviswanathan, and Cong Ma
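For context, a minimal statement of the log-loss objective the abstract refers to (notation is illustrative, not the speakers' formulation):

```latex
% Standard SFT practice critiqued in the abstract: maximum likelihood, i.e.
% log-loss minimization, over demonstrated answers y_i to prompts x_i
% (illustrative notation).
\[
  \hat{\theta}_{\mathrm{MLE}} \in \arg\min_{\theta}\;
  \frac{1}{n} \sum_{i=1}^{n} -\log p_\theta\bigl(y_i \mid x_i\bigr).
\]
```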
9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Deep Transfer Offline Q-Learning under Nonstationary Environments
Speaker: Jianqing Fan (Princeton University)
In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations. While existing transfer learning methods primarily focus on linear regression settings, they lack direct applicability to reinforcement learning algorithms. This paper pioneers the study of transfer learning for dynamic decision scenarios modeled by nonstationary finite-horizon Markov decision processes, utilizing neural networks as powerful function approximators and backward inductive learning. We demonstrate that naive sample pooling strategies, effective in regression settings, fail in Markov decision processes. To address this challenge, we introduce a novel “re-weighted targeting procedure” to construct “transferable RL samples” and propose “transfer deep Q-learning”, enabling neural network approximation with theoretical guarantees. We assume that the reward functions are transferable and deal with both situations in which the transition densities are transferable or nontransferable. Our analytical techniques for transfer learning in neural network approximation and transition density transfers have broader implications, extending to supervised transfer learning with neural networks and domain shift scenarios. Empirical experiments on both synthetic and real datasets corroborate the advantages of our method. (Joint work with Jinhang Chai and Elynn Chen)
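For readers less familiar with the setup, here is a minimal sketch of backward inductive fitted Q-learning in a nonstationary finite-horizon MDP, the baseline that a transfer procedure of this kind builds on. It is a generic illustration with a plug-in regressor and a made-up data layout, not the proposed transfer deep Q-learning method.

```python
# Generic backward inductive fitted Q-learning sketch (illustrative only).
# Assumes states are tuples of floats, actions are numbers, and a finite action set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def backward_fitted_q(trajectories, horizon, actions):
    """trajectories: list of [(s_0, a_0, r_0), ..., (s_{H-1}, a_{H-1}, r_{H-1})];
    actions: the finite action set. Returns one fitted Q-model per time step."""
    q_models = [None] * horizon
    for h in reversed(range(horizon)):          # learn Q_h from Q_{h+1}
        X, y = [], []
        for traj in trajectories:
            s, a, r = traj[h]
            target = r
            if h + 1 < horizon:                 # bootstrap from the next step
                s_next = traj[h + 1][0]
                target += max(
                    q_models[h + 1].predict([[*s_next, b]])[0] for b in actions)
            X.append([*s, a])
            y.append(target)
        q_models[h] = RandomForestRegressor(n_estimators=50).fit(X, y)
    return q_models
```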
10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Automated hypothesis validation with agentic sequential falsifications
Speaker: Ying Jin (University of Pennsylvania)
Hypotheses are central to information acquisition, decision-making, and discovery. However, many real-world hypotheses are abstract, high-level statements that are difficult to validate directly. This challenge is further intensified by the rise of hypothesis generation from Large Language Models (LLMs), which are prone to hallucination and produce hypotheses in volumes that make manual validation impractical. Here we propose Popper, an agentic framework for rigorous automated validation of free-form hypotheses. Guided by Karl Popper's principle of falsification, Popper validates a hypothesis using LLM agents that design and execute falsification experiments targeting its measurable implications. A novel sequential testing framework ensures strict Type-I error control while actively gathering evidence from diverse observations, whether drawn from existing data or newly conducted procedures. We demonstrate Popper on six domains including biology, economics, and sociology. Popper delivers robust error control, high power, and scalability. Furthermore, compared to human scientists, Popper achieved comparable performance in validating complex biological hypotheses while reducing the required time tenfold, providing a scalable, rigorous solution for hypothesis validation.
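As a point of reference for the sequential testing component, here is a generic sketch of anytime-valid Type-I error control via e-values and Ville's inequality; it illustrates the style of guarantee only and is not the Popper framework itself.

```python
# Generic anytime-valid sequential test built from e-values (illustrative only).
def sequential_test(e_value_stream, alpha=0.05):
    """e_value_stream yields nonnegative statistics with expectation <= 1 under
    the null hypothesis. The running product ("wealth") is a test supermartingale,
    so rejecting once it exceeds 1/alpha keeps the Type-I error at most alpha."""
    wealth, t = 1.0, 0
    for t, e in enumerate(e_value_stream, start=1):
        wealth *= e
        if wealth >= 1.0 / alpha:
            return {"reject": True, "stopped_at": t, "wealth": wealth}
    return {"reject": False, "stopped_at": t, "wealth": wealth}
```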
12:05-12:20 CDT
Q&A
12:20-13:30 CDT
Lunch Break
13:30-14:10 CDT
Reinforcement Learning For Individual Optimal Policy From Heterogeneous Data
Speaker: Annie Qu (University of California, Santa Barbara)
Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data application demonstrate the superior numerical performance of the proposed method compared with existing methods.
14:10-14:25 CDT
Q&A
14:25-14:30 CDT
Tech Break
14:30-15:10 CDT
Sampler Stochasticity in Training Diffusion Models for RLHF
Speaker: Wenpin Tang (Columbia University)
In this talk, I will discuss the reward gap problem, which reflects a tradeoff between RL training and diffusion inference. This provides insight into choosing the level of stochasticity in diffusion generation.
15:10-15:25 CDT
Q&A
15:25-15:40 CDT
Coffee Break
15:40-16:20 CDT
Conditional Diffusion Guidance under Hard Constraint: A Stochastic Analysis Approach
Speaker: Renyuan Xu (Stanford University)
We study how to steer diffusion models under hard constraints, so that generated samples satisfy prescribed events almost surely. This problem arises naturally in safety-critical generation, constrained decision-making, and rare-event simulation, where one seeks to condition a pretrained model using only offline trajectories while guaranteeing exact constraint satisfaction.
Our approach builds on Doob’s h-transform and introduces a novel martingale-based loss to learn an additive guidance term, without retraining the full score network. We propose two off-policy objectives for estimating this guidance term from pretrained trajectories, establish non-asymptotic guarantees for the resulting sampler, and demonstrate strong performance on stress testing for financial assets and queueing networks.
This is based on joint work with Wenpin Tang and Zhengyi Guo (Columbia University).
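For orientation, a standard statement of the Doob h-transform on which such guidance terms are built (generic notation, assumed rather than taken from the talk):

```latex
% Conditioning a diffusion dX_t = b(t,X_t) dt + sigma(t,X_t) dW_t on an event A
% (the hard constraint) adds a guidance drift built from h(t,x) = P(A | X_t = x):
\[
  dX_t = \Bigl[\, b(t, X_t)
        + \sigma\sigma^{\top}(t, X_t)\,\nabla_x \log h(t, X_t) \,\Bigr]\,dt
        + \sigma(t, X_t)\, dW_t .
\]
```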
16:20-16:35 CDT
Q&A
Tuesday, April 21, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
From Offline to Low-Adaptive Reinforcement Learning
Speaker: Yu-Xiang Wang (University of California, San Diego (UCSD))
Online Reinforcement Learning requires access to the environment for trial and error. Offline Reinforcement Learning learns from existing logged trajectories (i.e., observational studies) but must either weaken the learning goals or make unrealistic assumptions. Is there any meaningful setting in between? The talk starts by discussing the statistical complexity and limitations of offline RL, then reviews the burgeoning problem of low-adaptive exploration, which addresses these limitations by providing a sweet middle ground between offline and online RL. Somewhat surprisingly, we show that only O(log log T) batches of embarrassingly parallel access to the environment are needed to solve exploration with near-optimal sqrt(T) regret (up to log factors). We also discuss the influence of function approximation, two-player games, and other settings such as pure and reward-free exploration.
9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Consequentialist Objectives and Catastrophe
Speaker: Benjamin van Roy (Stanford University)
Because human preferences are too complex to codify, AIs operate with misspecified objectives. Optimizing such objectives often produces undesirable outcomes; this phenomenon is known as reward hacking. Such outcomes are not necessarily catastrophic. Indeed, most examples of reward hacking in previous literature are benign. And typically, objectives can be modified to resolve the issue.
We study the prospect of catastrophic outcomes induced by AIs operating in complex environments. We argue that, when capabilities are sufficiently advanced, pursuing a fixed consequentialist objective tends to result in catastrophic outcomes. We formalize this by establishing conditions that provably lead to such outcomes. Under these conditions, simple or random behavior is safe. Catastrophic risk arises due to extraordinary competence rather than incompetence.
With a fixed consequentialist objective, avoiding catastrophe requires constraining AI capabilities. In fact, constraining capabilities the right amount not only averts catastrophe but yields valuable outcomes. Our results apply to any objective produced by modern industrial AI development pipelines.
10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Demystifying Group Relative Policy Optimization: Its Policy Gradient is a U-Statistic
Speaker: Chengchun Shi (London School of Economics and Political Science)
Group relative policy optimization (GRPO), a core methodological component of DeepSeekMath and DeepSeek-R1, has emerged as a cornerstone for scaling reasoning capabilities of large language models. Despite its widespread adoption and the proliferation of follow-up works, the theoretical properties of GRPO remain less studied. This paper provides a unified framework to understand GRPO through the lens of classical U-statistics. We demonstrate that the GRPO policy gradient is inherently a U-statistic, allowing us to characterize its mean squared error (MSE) and derive the finite-sample error bound and asymptotic distribution of the suboptimality gap for its learned policy. Our findings reveal that GRPO is asymptotically equivalent to an oracle policy gradient algorithm, one with access to a value function that quantifies the goodness of its current policy at each training iteration, and achieves asymptotically optimal performance within a broad class of policy gradient algorithms. Furthermore, we establish a universal scaling law that offers principled guidance for selecting the optimal group size. Empirical experiments further validate our theoretical findings, demonstrating that the optimal group size is universal, and verify the oracle property of GRPO.
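For readers unfamiliar with GRPO, a minimal sketch of the group-relative advantage and the resulting policy-gradient estimate; variable names and the normalization constant are illustrative assumptions, not the speaker's implementation. The group-mean baseline is what connects the gradient estimate to a U-statistic.

```python
# Illustrative GRPO-style gradient estimate for one prompt's sampled group.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: array of shape (G,), one reward per sampled response."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)   # normalize within the group

def grpo_gradient_estimate(logprob_grads, rewards):
    """logprob_grads: list of G gradients of log pi(response_i | prompt).
    Returns the group-averaged REINFORCE-style gradient with relative advantages."""
    adv = group_relative_advantages(rewards)
    return sum(a * g for a, g in zip(adv, logprob_grads)) / len(rewards)
```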
12:05-12:20 CDT
Q&A
12:20-13:45 CDT
Lunch Break
13:45-14:25 CDT
New Results for Distributional Reinforcement Learning
Speaker: Lan Wang (University of Miami)
Distributional reinforcement learning (RL) models the entire distribution of returns and is particularly useful for risk-sensitive decision-making. Quantile temporal difference (QTD) learning is a widely used model-free distributional RL method with strong empirical performance, yet its theoretical guarantees remain less developed. Its analysis is complicated by bias from the quantile semi-gradient, discretization error in approximating return distributions, and nonsmooth update dynamics. We provide nonasymptotic performance guarantees for QTD. We obtain finite-time bounds on the expected supremum 1-Wasserstein distance between the learned and true return distributions. These results advance the theoretical foundation of distributional RL.
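A minimal sketch of the tabular QTD update analyzed in work of this kind may help fix ideas; the quantile grid, step size, and variable names below are assumptions.

```python
# Illustrative tabular quantile TD (QTD) update (not the speaker's code).
import numpy as np

def qtd_update(theta, s, r, s_next, gamma=0.99, alpha=0.1):
    """theta: array (num_states, m) of quantile estimates of the return distribution.
    Applies one semi-gradient QTD update at state s after observing reward r, s_next."""
    m = theta.shape[1]
    taus = (2 * np.arange(m) + 1) / (2 * m)      # midpoint quantile levels
    targets = r + gamma * theta[s_next]          # m bootstrapped target samples
    # Semi-gradient: each quantile moves up with weight tau_i and down otherwise,
    # averaged over the m target samples.
    indicator = (targets[None, :] < theta[s][:, None]).mean(axis=1)
    theta[s] += alpha * (taus - indicator)
    return theta
```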
14:25-14:40 CDT
Q&A
14:40-15:10 CDT
Coffee Break
15:10-15:50 CDT
From Reward Learning to Leaderboards: Uncertainty Quantification for LLMs under Heterogeneous Human Feedback
Speaker: Will Wei Sun (Purdue University)
Pairwise human feedback is now widely used in both LLM alignment and LLM evaluation, from reward modeling in RLHF to public leaderboards based on head-to-head comparisons. However, these data are noisy, heterogeneous, and highly non-uniform, making uncertainty quantification a central statistical challenge. In this talk, I will present two recent works on this topic. The first studies reward learning under heterogeneous human feedback, jointly modeling latent rewards and annotator rationality, with asymptotic guarantees that enable valid reward comparison and uncertainty-aware best-of-N sampling. The second studies LLM evaluation as inference on a low-rank latent score tensor observed through pairwise comparisons, leading to efficient debiased inference and a score-whitening method for handling anisotropic information under non-uniform sampling. Together, these works illustrate how statistical inference can provide principled uncertainty quantification for both alignment and evaluation of large language models.
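As background for the first line of work, a common Bradley-Terry-style likelihood with annotator-specific rationality is sketched below; this is a generic model, not necessarily the exact specification in the talk.

```latex
% A generic Bradley--Terry model with annotator-specific rationality beta_k
% (illustrative; not necessarily the exact likelihood used in the talk):
\[
  \mathbb{P}\bigl(y \succ y' \mid x, \text{annotator } k\bigr)
    = \sigma\!\bigl(\beta_k \,[\, r_\theta(x, y) - r_\theta(x, y') \,]\bigr),
  \qquad \sigma(u) = \frac{1}{1 + e^{-u}},
\]
% with the reward r_theta fit by maximizing the log-likelihood of the observed
% pairwise comparisons.
```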
15:50-16:05 CDT
Q&A
Wednesday, April 22, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
On the Learning Dynamics of RLVR at the Edge of Competence
Speaker: Yuejie Chi (Yale University)
9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Statistical Inference under Adaptive Sampling with LinUCB
Speaker: Yuting Wei (University of Pennsylvania)
Adaptively collected data has become ubiquitous in modern practice. Yet even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. Focusing on the linear bandit problem, a fundamental and influential framework in reinforcement learning and the bandit literature, we characterize the performance of LinUCB, a canonical upper-confidence-bound algorithm that balances exploration and exploitation, and derive inferential procedures that remain valid despite the challenges posed by adaptive data collection. A central difficulty is to understand the behavior of the eigenvalues and eigenvectors of the random feature covariance matrix generated by LinUCB without imposing the stability assumptions that prior work relied upon. Our analysis provides this characterization and, in turn, enables us to establish a central limit theorem for LinUCB: the estimation error converges in distribution at a $T^{-1/4}$ rate and is asymptotically normal. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our theoretical findings.
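For reference, a minimal sketch of the LinUCB algorithm whose adaptively collected data the talk analyzes; the regularization and bonus schedule shown are generic choices rather than the talk's.

```python
# Illustrative LinUCB: ridge estimate plus an upper-confidence bonus.
import numpy as np

class LinUCB:
    def __init__(self, dim, lam=1.0, beta=1.0):
        self.V = lam * np.eye(dim)     # regularized feature covariance
        self.b = np.zeros(dim)
        self.beta = beta               # width of the confidence bonus

    def choose(self, features):
        """features: (num_arms, dim). Pick the arm maximizing the UCB index."""
        theta_hat = np.linalg.solve(self.V, self.b)
        V_inv = np.linalg.inv(self.V)
        ucb = features @ theta_hat + self.beta * np.sqrt(
            np.einsum("ad,dc,ac->a", features, V_inv, features))
        return int(np.argmax(ucb))

    def update(self, x, reward):
        self.V += np.outer(x, x)
        self.b += reward * x
```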
10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Non-Asymptotic CLTs and Concentration Inequalities for Stochastic Approximation Algorithms, with Applications to Reinforcement Learning
Speaker: R. Srikant (University of Illinois at Urbana-Champaign)
We present non-asymptotic CLT error bounds for stochastic approximation algorithms in the Wasserstein-p distance. To obtain explicit finite-sample guarantees for the last iterate, we develop a coupling argument that compares the discrete-time process to a limiting Ornstein-Uhlenbeck process. Our analysis applies to algorithms driven by general noise conditions, including martingale differences and functions of ergodic Markov chains. Complementing this result, we handle the convergence rate of the Polyak-Ruppert average through a direct analysis that applies under the same general setting. We demonstrate the utility of this approach by considering an application to TD learning, where we explicitly quantify the transition from heavy-tailed to Gaussian behavior of the iterates, thereby bridging the gap between recent finite-sample analyses and asymptotic theory. Based on joint work with Seo Taek Kong.
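For orientation, the generic recursion and Polyak-Ruppert average that such bounds concern (notation is illustrative):

```latex
% The generic stochastic approximation recursion and its Polyak--Ruppert average
% (illustrative notation); xi_{k+1} is the noise, e.g. a martingale difference
% or a function of an ergodic Markov chain:
\[
  \theta_{k+1} = \theta_k + \alpha_k \bigl( F(\theta_k) + \xi_{k+1} \bigr),
  \qquad
  \bar{\theta}_T = \frac{1}{T} \sum_{k=1}^{T} \theta_k .
\]
```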
12:05-12:20 CDT
Q&A
12:20-13:45 CDT
Lunch Break
13:45-14:25 CDT
Off-policy Evaluation via Particle Filtering and Moment Matching
Speaker: Nan Jiang (University of Illinois at Urbana-Champaign)
I will present a new algorithmic framework and analysis for off-policy evaluation (OPE) in finite-horizon MDPs. The algorithm learns a scalar weight for each data point via a moment-matching objective against a discriminator class F that realizes Q^π. Notably, the theoretical guarantee of the algorithm is dimension-free, in that the finite-sample error does not depend on the statistical complexity of the function class F (e.g., no log|F| dependence) and generalizes the standard error bound for linear regression with a fixed design. The algorithm is also closely connected to several existing methods, such as linear FQE, (sequential) importance sampling, and trajectory stitching, providing connections and novel perspectives to the foundational task of OPE.
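For contrast, one of the classical baselines the abstract connects to, the trajectory-wise importance sampling estimator, is written out below (generic notation, not the proposed estimator):

```latex
% Trajectory-wise importance sampling for OPE in a finite-horizon MDP
% (a classical baseline mentioned in the abstract; mu is the behavior policy,
% notation illustrative):
\[
  \hat{v}_{\mathrm{IS}}(\pi)
    = \frac{1}{n} \sum_{i=1}^{n}
      \Biggl( \prod_{t=0}^{H-1}
        \frac{\pi\bigl(a_t^{(i)} \mid s_t^{(i)}\bigr)}{\mu\bigl(a_t^{(i)} \mid s_t^{(i)}\bigr)}
      \Biggr)
      \sum_{t=0}^{H-1} r_t^{(i)} .
\]
```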
14:25-14:40 CDT
Q&A
14:40-14:45 CDT
Tech Break
14:45-15:25 CDT
Model simulation using offline observations with low-rank factor model
Speaker: Devavrat Shah (Massachusetts Institute of Technology (MIT))
We will discuss the role of low-rank factor models in developing model simulation using offline observations that are likely biased and come from potentially heterogeneous settings. We do so by positing that the transition dynamics can be represented as a latent function of latent factors associated with agents, states, and actions. This naturally leads to an approximate low-rank decomposition into separable agent, state, and action latent functions. This enables effective learning of the transition dynamics per agent, even with limited offline data. This naturally extends the literature on causal inference rooted in the panel data setting in econometrics.
I will discuss the application of this approach in developing CausalSim, a simulation platform for communication network protocols. Time permitting, I will discuss some of the ongoing theoretical inquiries suggested by the empirical success of such an approach.
15:25-15:40 CDT
Q&A
15:40-16:30 CDT
Poster Session/Social Hour
Thursday, April 23, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
What structures make model-free RL possible? An elliptic theory for controlled Markov diffusions
Speaker: Wenlong Mou (University of Toronto)
Can offline reinforcement learning with function approximation ever be as easy as supervised learning? In general, the answer is no — the Bellman operator contracts only in the sup-norm, not in the L^2-norm induced by the data distribution. This geometric mismatch makes model-free value learning with function approximation provably harder than regression. However, real-world problems often come with additional structures that may facilitate reinforcement learning. In this talk, I will discuss recent advances in understanding the structures that enable model-free offline RL. Focusing on controlled Markov diffusions—a widely used class of dynamical systems—I will provide an affirmative answer to the question above. Specifically, I will identify ellipticity as a key structure that makes model-free RL with function approximation tractable with offline data. Leveraging ellipticity, I will demonstrate desirable geometric properties of Bellman operators in an appropriate Sobolev space. Based on these insights, I will introduce a new class of algorithms for model-free RL with function approximation that achieve near-optimal oracle inequalities efficiently. Finally, I will discuss an application to fine-tuning diffusion-based generative models, where the ellipticity structure is exploited to design a PDE-based algorithm that attains fast convergence rates.
9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Deterministic Policy Gradient for Reinforcement Learning with Continuous Time and Space
Speaker: Xin Guo (University of California, Berkeley (UC Berkeley))
The theory of continuous-time reinforcement learning (RL) has progressed rapidly in recent years. While the ultimate objective of RL is typically to learn deterministic control policies, most existing continuous-time RL methods rely on stochastic policies. Such approaches often require sampling actions at very high frequencies, and involve computationally expensive expectations over continuous action spaces, resulting in high-variance gradient estimates and slow convergence. In this talk, we will introduce deterministic policy gradient (DPG) methods for continuous-time RL. We will derive a continuous-time policy gradient formula expressed as the expected gradient of an advantage rate function and establish a martingale characterization for both the value function and the advantage rate. These theoretical results provide tractable estimators for deterministic policy gradients in continuous-time RL. Building on this foundation, we propose a model-free continuous-time Deep Deterministic Policy Gradient (CT-DDPG) algorithm that enables stable learning for general reinforcement learning problems with continuous time and state. Numerical experiments show that CT-DDPG achieves superior stability and faster convergence compared to existing stochastic-policy methods, across a wide range of learning tasks with varying time discretizations and noise levels.
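As background, the classical discrete-time deterministic policy gradient formula that the continuous-time result parallels (notation is illustrative):

```latex
% The classical (discrete-time) deterministic policy gradient of Silver et al.,
% which the talk's continuous-time formula parallels (illustrative notation):
\[
  \nabla_\theta J(\mu_\theta)
    = \mathbb{E}_{s \sim \rho^{\mu_\theta}}
      \Bigl[\, \nabla_\theta \mu_\theta(s)\,
               \nabla_a Q^{\mu_\theta}(s, a)\big|_{a = \mu_\theta(s)} \,\Bigr].
\]
```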
10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Optimal offline policy learning under unknown confounding factors
Speaker: Zhimei Ren (University of Pennsylvania)
We investigate the problem of offline policy learning in the presence of unobserved confounders, which may arise in both observational studies and adaptive experiments (e.g., self-selection and noncompliance in sequential medical settings). In particular, we study this problem under the f-sensitivity model, which characterizes the confounding effect by its “average” strength. Under the f-sensitivity model, we characterize the distribution shift from the observable to the counterfactual and design a distributionally robust policy learning algorithm, f-SR(ad)L, which maximizes the expected outcome within a given policy class Π. We show that the sub-optimality gap of f-SR(ad)L learned from a sequential (i.i.d. or adaptively collected) dataset is of the order $O(\kappa(\Pi)/\sqrt{n})$, where κ(Π) is the entropy integral of Π under the Hamming distance and n is the sample size. A matching lower bound is provided to show the optimality of the rate. Finally, we assess our method on synthetic data and a real-world dataset on lung cancer treatments to demonstrate its advantage over existing benchmarks.
12:05-12:20 CDT
Q&A
12:20-13:45 CDT
Lunch Break
13:45-14:25 CDT
Toward efficient exploration for language models
Speaker: Dylan Foster (Microsoft Research)
14:25-14:40 CDT
Q&A
14:40-15:10 CDT
Coffee Break
15:10-15:50 CDT
Sample-Efficient and Low-Cost Model-Free Reinforcement Learning
Speaker: Lingzhou Xue (The Pennsylvania State University)
Reinforcement learning (RL) provides a general framework for sequential decision-making under uncertainty, and in federated reinforcement learning (FRL), multiple agents collaboratively learn under the coordination of a central server without sharing raw data. Recently, we have developed new methodological and theoretical results for model-free RL in tabular episodic Markov Decision Processes across both single-agent and federated settings. In particular, we have established the first gap-dependent regret bounds for federated RL, obtained through a novel fine-grained analytical framework. Further, we have proposed new algorithms with provable guarantees for low-cost RL.
15:50-16:05 CDT
Q&A
Friday, April 24, 2026
8:30-9:00 CDT
Check-in/Breakfast
9:00-9:40 CDT
PPO Fine-Tuning of Diffusion Models: Provable Convergence across Interpolated Trajectories
Speaker: Yingbin Liang (The Ohio State University)
Fine-tuning diffusion models is commonly carried out in practice using reinforcement learning algorithms such as Proximal Policy Optimization (PPO). Despite the remarkable empirical success of these approaches, the theoretical understanding of their convergence behavior remains rather limited. In this paper, we provide the first convergence guarantee for PPO-style algorithms for fine-tuning diffusion models. Specifically, we characterize the convergence rate of PPO in terms of two diffusion-specific factors that fundamentally govern RL-based fine-tuning: (i) the sampler stochasticity parameter $\lambda$, which controls trajectory interpolation between deterministic and stochastic denoising dynamics, and (ii) the KL-regularization coefficient $\mu$, which keeps the fine-tuned policy close to the pretrained model. Our results imply that increased sampler stochasticity $\lambda$, which corresponds to trajectories closer to DDPM-style sampling, is more favorable for RL fine-tuning, and stronger KL regularization (i.e., larger $\mu$) provably accelerates convergence. Our experiments further validate our theoretical results.
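For reference, a generic PPO-style clipped surrogate with KL regularization toward the pretrained model, written with the abstract's coefficient $\mu$; the paper's exact objective may differ.

```latex
% A generic PPO clipped surrogate with KL regularization toward the pretrained
% model pi_pre, using the abstract's coefficient mu (illustrative; the paper's
% exact objective may differ):
\[
  L(\theta) = \mathbb{E}\Bigl[
      \min\bigl( \rho_t(\theta)\,\hat{A}_t,\;
                 \operatorname{clip}\bigl(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\,\hat{A}_t \bigr)
    \Bigr]
    - \mu\, \mathrm{KL}\bigl(\pi_\theta \,\|\, \pi_{\mathrm{pre}}\bigr),
  \qquad
  \rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} .
\]
```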
9:40-9:55 CDT
Q&A
9:55-10:00 CDT
Tech Break
10:00-10:40 CDT
Stochastic Zeroth-Order Policy Optimization for RLHF
Speaker: Lei Ying (University of Michigan)
10:40-10:55 CDT
Q&A
10:55-11:25 CDT
Coffee Break
11:25-12:05 CDT
Fisher Random Walk: Automatic Preference Inference for Language Models
Speaker: Junwei Lu (Harvard University)
Human preference alignment has been shown to be effective in training large language models (LLMs), allowing them to understand human feedback and preferences. Despite the extensive literature on algorithms for aligning with ranked human preferences, uncertainty quantification for ranking estimation remains largely unexplored and is of great practical significance. For example, it is important to overcome the problem of hallucination for LLMs in the medical domain, where an inferential method for ranking LLM answers becomes necessary. In this talk, we will present a novel framework called “Fisher random walk” to conduct semiparametric efficient preference inference for language models and illustrate its application to language models for medical knowledge.
IMSI is committed to making all of our programs and events inclusive and accessible.
Contact [email protected] to request disability-related accommodations.
In order to register for this workshop, you must have an IMSI account and be logged in.