Geometric and Topological Approaches to Representation Learning in Biomedical Data

This was part of Eliciting Structure in Genomics Data

Smita Krishnaswamy, Yale University

Tuesday, August 31, 2021

Abstract: High-throughput, high-dimensional data has become ubiquitous in the biomedical, health and social sciences as a result of breakthroughs in measurement technologies and data collection. While these large datasets containing millions of observations of cells, people, or brain voxels hold great potential for understanding the generative state space of the data, as well as the drivers of differentiation, disease and progression, they also pose new challenges in terms of noise, missing data, measurement artifacts, and the so-called “curse of dimensionality.” In this talk, I will cover data-geometric and topological approaches to understanding the shape and structure of the data. First, we show how diffusion geometry and deep learning can be used to obtain useful representations of the data that enable denoising (MAGIC), dimensionality reduction (PHATE), and factor analysis (Archetypal Analysis Network). Next, we will show how to learn dynamics from static snapshot data by using a manifold-regularized neural ODE-based optimal transport (TrajectoryNet). Finally, we cover a novel approach that combines diffusion geometry with topology to extract multi-granular features from the data (Diffusion Condensation and Multiscale PHATE) to assist in differential and predictive analysis. On the flip side, we also create a manifold geometry from topological descriptors, and show its applications to neuroscience. Together, we will show a complete framework for exploratory and unsupervised analysis of big biomedical data.
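
As a rough illustration of the diffusion-geometry idea behind the denoising step, the sketch below builds a row-normalized Gaussian affinity matrix over the data points and applies a few steps of the resulting Markov operator to smooth the data. This is a minimal sketch under stated assumptions, not the MAGIC implementation: the fixed-bandwidth kernel, the function names `diffusion_operator` and `diffuse`, the parameters `sigma` and `t`, and the synthetic noisy-circle data are all chosen here for illustration only.

```python
import numpy as np


def diffusion_operator(X, sigma=1.0):
    """Row-stochastic diffusion operator from a Gaussian kernel (illustrative)."""
    # Pairwise squared Euclidean distances, clipped at 0 to absorb round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Gaussian affinities; a fixed bandwidth is an assumption of this sketch.
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Normalize each row so it sums to 1 (a Markov transition matrix).
    return K / K.sum(axis=1, keepdims=True)


def diffuse(X, t=3, sigma=1.0):
    """Smooth the data by applying t steps of the diffusion operator."""
    P = diffusion_operator(X, sigma)
    return np.linalg.matrix_power(P, t) @ X


# Synthetic example (hypothetical data): a noisy circle in two dimensions.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
clean = np.column_stack([np.cos(theta), np.sin(theta)])
noisy = clean + rng.normal(scale=0.1, size=clean.shape)
smoothed = diffuse(noisy, t=3, sigma=0.2)
```

Each smoothed point is a weighted average of nearby points, so the result stays within the range of the input data. Larger `t` averages over longer walks on the neighborhood graph and smooths more aggressively.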