Description
This workshop showcases current developments in theoretical and computational optimal transport with a focus on applications in machine learning and statistics.
Organizers
Speakers
Schedule
Speaker: Soumik Pal (University of Washington)
Speaker: Jun Kitagawa (Michigan State University)
Speaker: Long Nguyen (University of Michigan)
Speaker: David Alvarez-Melis (Microsoft Research & Harvard University)
Speaker: Julio Backhoff-Veraguas (Universitaet Wien)
Speaker: Francis Bach (Institut National de Recherche en Informatique et Automatique (INRIA))
I will consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. In this talk, I will show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come with efficient estimation algorithms based on various oracles on the probability distributions. I will also present how these new notions of relative entropy lead to new upper bounds on log-partition functions, which can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods (based on https://arxiv.org/pdf/2202.08545.pdf).
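A minimal numpy sketch of the plug-in estimator implicit in this setup: with a kernel satisfying k(x, x) = 1 (e.g. the Gaussian kernel), the empirical kernel covariance operator has unit trace and shares its nonzero eigenvalues with the normalized Gram matrix, so its von Neumann entropy can be read off the Gram spectrum. Function names and the bandwidth choice are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel; k(x, x) = 1, so the empirical
    covariance operator below has unit trace."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def von_neumann_entropy(x, bandwidth=1.0, eps=1e-12):
    """Plug-in estimate -sum_i lam_i log lam_i, where the lam_i are the
    eigenvalues of K / n, i.e. of the empirical covariance operator."""
    n = x.shape[0]
    lam = np.linalg.eigvalsh(gaussian_gram(x, bandwidth) / n)
    lam = np.clip(lam, eps, None)
    lam = lam / lam.sum()   # guard against numerical drift; the trace is 1 in exact arithmetic
    return float(-np.sum(lam * np.log(lam)))

rng = np.random.default_rng(0)
print(von_neumann_entropy(rng.normal(size=(200, 2))))
```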
Speaker: Guillaume Carlier (Université Paris Dauphine)
Speaker: Jose Blanchet (Stanford University)
Speaker: Jonathan Niles-Weed (Courant Institute of Mathematical Sciences)
Speaker: Johannes Wiesel (Columbia University)
Speaker: Marco Cuturi (Apple ML Research and École Nationale de la Statistique et de l’Administration Économique)
In this talk I will present use cases where optimal matchings pop up in various applied areas of ML. I will in particular mention settings where the optimal matching needs to be differentiated, in one way or another, with respect to input parameters. I will then introduce two approaches to do so, either through entropic regularization or using neural solvers. I will present the implementation of these approaches in two instances: in the ott-jax toolbox that I have been actively developing, and to solve a bilevel optimization problem that appears when fitting JKO models to time series of measures.
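A minimal plain-JAX sketch (not the ott-jax implementation itself) of the first route, differentiating an entropy-regularized matching by unrolling log-domain Sinkhorn iterations; the squared-Euclidean cost, uniform marginals, and all names are illustrative assumptions.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def sinkhorn_cost(x, y, epsilon=0.1, n_iters=200):
    """Entropy-regularized OT cost between two uniform point clouds.
    Every step uses jax.numpy, so the unrolled iterations can be
    differentiated with respect to the input points."""
    n, m = x.shape[0], y.shape[0]
    cost = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)   # squared Euclidean cost
    log_a, log_b = -jnp.log(n) * jnp.ones(n), -jnp.log(m) * jnp.ones(m)
    f, g = jnp.zeros(n), jnp.zeros(m)
    for _ in range(n_iters):                                        # alternating dual (Sinkhorn) updates
        f = -epsilon * logsumexp((g[None, :] - cost) / epsilon + log_b[None, :], axis=1)
        g = -epsilon * logsumexp((f[:, None] - cost) / epsilon + log_a[:, None], axis=0)
    plan = jnp.exp((f[:, None] + g[None, :] - cost) / epsilon + log_a[:, None] + log_b[None, :])
    return jnp.sum(plan * cost)

# Gradient of the regularized matching cost with respect to the first point cloud.
grad_wrt_x = jax.grad(sinkhorn_cost, argnums=0)
```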
Speaker: Elsa Cazelles (Centre National de la Recherche Scientifique (CNRS))
Speaker: Nabarun Deb (Columbia University)
Speaker: Gonzalo Mena (University of Oxford)
In this talk I discuss some benefits of optimizing an entropic optimal transport (OT) loss instead of the log-likelihood for model-based clustering. The main drawback of maximizing the log-likelihood (e.g., through the EM algorithm) is the pervasiveness of bad local optima. By comparing the landscapes of these two losses and their stationary points, I provide a detailed analysis of two situations where the log-likelihood possesses such bad optima that are avoided by the entropic OT loss.
First, the log-likelihood exhibits ubiquitous ‘many-fit-one’ behavior at local optima, where many model components are placed on the same true data component. This leads to degeneracy, as some remaining model components are placed around averages of true model components. I show that under some structural assumptions these bad optima are avoided by the entropic OT loss.
In the second case, I study model-based clustering of two Gaussians with possibly different covariances in the high-dimensional regime, where N/D = O(1). I show that, with high probability, estimates based on ill-conditioned covariance matrices achieve higher log-likelihoods than estimates based on an oracle configuration in which class membership is known beforehand. Local optima of the entropic OT loss, however, cannot correspond to such degenerate situations.
Besides these theoretical results, I present extensive simulations and applications to neuroscience and genomics. Altogether, these results suggest that minimizing the entropic OT loss is a sensible alternative to maximizing the log-likelihood, at least when the mixture weights are known beforehand.
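One concrete way to exploit known mixture weights (an assumption about the construction rather than the talk's exact algorithm) is to replace the EM E-step by a Sinkhorn projection whose column marginals equal the known weights, keeping the usual M-step. A minimal numpy sketch, with illustrative names and a fixed regularization level:

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_responsibilities(log_lik, weights, epsilon=1.0, n_iters=100):
    """Soft assignments from a Sinkhorn projection: the coupling has uniform
    row marginals (mass 1/n per data point) and column marginals equal to the
    known mixture weights, unlike plain EM responsibilities."""
    n, _ = log_lik.shape
    log_kernel = log_lik / epsilon
    u, v = np.full(n, -np.log(n)), np.log(weights)
    for _ in range(n_iters):                           # alternating marginal-matching updates
        u = -np.log(n) - logsumexp(log_kernel + v[None, :], axis=1)
        v = np.log(weights) - logsumexp(log_kernel + u[:, None], axis=0)
    resp = np.exp(u[:, None] + v[None, :] + log_kernel)
    return resp / resp.sum(axis=1, keepdims=True)      # renormalize rows to sum to 1

def m_step(x, resp):
    """Standard weighted mean/covariance updates, reused unchanged from EM.
    log_lik[i, j] would be log N(x_i | mu_j, Sigma_j) from the previous iterate."""
    nk = resp.sum(axis=0)
    means = (resp.T @ x) / nk[:, None]
    covs = [((resp[:, j, None] * (x - means[j])).T @ (x - means[j])) / nk[j]
            for j in range(resp.shape[1])]
    return means, np.stack(covs)
```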
Speaker: Promit Ghosal (Massachusetts Institute of Technology (MIT))
Speaker: Alfred Galichon (New York University)
Motivated by problems from economics, this talk will introduce and analyze the “regularized equilibrium transport” problem, which embeds the well-known “regularized optimal transport” problem but is more general, and more natural in some applications. Unlike regularized optimal transport, the problem cannot be solved using the Hilbert metric or related techniques. Instead, the general framework of M-maps has to be invoked, and novel results in this theory are developed to obtain the existence of a solution, as well as a Jacobi-type algorithm that extends Sinkhorn’s algorithm beyond optimal transport. Several applications to economics will be sketched. This talk is based on two separate works, one joint with Eugene Choo, Liang Chen, and Simon Weber, and the other with Flavien Léger.
Speaker: Johan Segers (UCLouvain)
Speaker: Florian Gunsilius (University of Michigan)
Speaker: Lenaic Chizat (EPFL)
Speaker: Anna Korba (ENSAE)
Sampling from a probability distribution whose density is only known up to a normalisation constant is a fundamental problem in statistics and machine learning. Recently, several algorithms based on interacting particle systems were proposed for this task, as an alternative to Markov chain Monte Carlo methods or variational inference. These particle systems can be designed by adopting an optimisation point of view on the sampling problem: an optimisation objective is chosen (which typically measures the dissimilarity to the target distribution), and its Wasserstein gradient flow is approximated by an interacting particle system. The stationary states of these particle systems define an empirical measure approximating the target distribution. In this talk I will present recent work on such algorithms, in particular Stein Variational Gradient Descent and Kernel Stein Discrepancy Descent, two algorithms based on Wasserstein gradient flows and reproducing kernels. I will discuss recent results showing that these particle systems can provide a good approximation of the target distribution, as well as current issues and open questions on the empirical and theoretical side.
A Non-Asymptotic Analysis of Stein Variational Gradient Descent. Korba, A., Salim, A., Arbel, M., Luise, G., Gretton, A. Neural Information Processing Systems (NeurIPS), 2020.
Kernel Stein Discrepancy Descent. Korba, A., Aubin-Frankowski, P.C., Majewski, S., Ablin, P. International Conference on Machine Learning (ICML), 2021.
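A minimal numpy sketch of one Stein Variational Gradient Descent update, assuming an RBF kernel and a Gaussian target whose score (gradient of the log-density) is available in closed form; the names, bandwidth, and step size are illustrative.

```python
import numpy as np

def svgd_step(x, score, step=0.1, h=1.0):
    """One SVGD update on particles x (n x d), given the score grad log p of
    the (possibly unnormalised) target: a kernel-weighted attraction along the
    scores plus a repulsion term from the kernel gradient."""
    diff = x[:, None, :] - x[None, :, :]                 # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2))
    grad_K = -diff / h ** 2 * K[:, :, None]              # gradient of k(x_j, x_i) in x_j
    phi = (K.T @ score(x) + grad_K.sum(axis=0)) / x.shape[0]
    return x + step * phi

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])
score = lambda x: -(x - mu)                              # score of a unit-covariance Gaussian target
particles = rng.normal(size=(100, 2))
for _ in range(500):
    particles = svgd_step(particles, score)
print(particles.mean(axis=0))                            # should end up close to mu
```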
Speaker: Eustasio del Barrio (Universidad de Valladolid)
Videos
The Wasserstein-Martingale projection of a Brownian motion given initial and terminal marginals.
May 16, 2022