Eliciting Structure in Genomics Data

Bridging the Gap between Theory, Algorithms, Implementations, and Applications

Description

Methods for dimension reduction play a critical role in a wide variety of genomic applications. Indeed, as technology develops and datasets grow in both size and complexity, the need for effective dimension reduction methods that help visualize and distill the primary structures in the data remains as essential as ever. Examples of the many practical applications in genomics include: (a) understanding (i) the structure of wild populations (particularly endangered species) from population genetic variation, (ii) human evolutionary history, also from population genetic variation, (iii) the 3-D structure of DNA from Hi-C data, and (iv) genetic factors that influence risk for different human diseases; (b) identifying (i) substructure among cell populations based on single-cell transcription patterns, and (ii) distinctive signatures of somatic mutations that distinguish different cancer subtypes; (c) estimating confounding factors and other sources of unwanted variation in gene expression studies; and (d) segmenting and annotating genomic regions based on chromatin marks and other molecular features.

The development and provision of effective methods for dimension reduction involves connecting several areas of expertise: theory, algorithms, implementations, and applications. Theory is required to help decide which methods and algorithms to focus on; algorithms are required to turn theoretical ideas into practical tools; and implementation of these algorithms is an often-overlooked step, where decisions are sometimes made that can greatly influence results. All of these steps must be carried out with close attention to the details of the practical applications and the data types to which the methods will be applied. Unfortunately, there are relatively few opportunities for experts in these different areas to come together and learn from one another. This workshop will address this problem by bringing together mathematicians and computer scientists who have a deep understanding of the theory and of the algorithmic and implementation issues with applied statistical geneticists who have invaluable experience in implementing these methods, applying them to data, and interpreting the results. The goal will be to start new conversations across disciplinary barriers. The workshop will expose theoretical experts to the many ways that these methods are used in practice and to the ongoing challenges that arise, and it will expose those familiar with applications to recent developments on the theoretical side.

Organizers

Mihai Anitescu (Argonne National Laboratory and Statistics, University of Chicago)
Anna Gilbert (Mathematics, Statistics and Data Science, Yale University)
Dan Nicolae (Statistics, University of Chicago)
Matthew Stephens (Statistics, University of Chicago)

Speakers

Alex Bloemendal (Broad Institute of MIT and Harvard)
Petros Drineas (Purdue University)
Bianca Dumitrascu (University of Cambridge)
Zhou Fan (Yale University)
Anna Gilbert (Yale University)
Kasper Hansen (Johns Hopkins University)
Tracy Ke (Harvard University)
Smita Krishnaswamy (Yale University)
Boris Landa (Yale University)
Gal Mishne (University of California, San Diego)
Ben Raphael (Princeton University)
Karl Rohe (University of Wisconsin-Madison)
Sriram Sankararaman (University of California, Los Angeles)
Soledad Villar (Johns Hopkins University)
Jingshu Wang (University of Chicago)
Miaoyan Wang (University of Wisconsin-Madison)
Tandy Warnow (University of Illinois Urbana-Champaign)

Schedule

Monday, August 30, 2021
9:25-9:30 CDT
Welcome and Opening Remarks

Speaker: Kevin Corlette, Director, IMSI

9:30-10:30 CDT
MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data

Speaker: Soledad Villar (Johns Hopkins University)

11:00-12:00 CDT
Empirical Bayes PCA in high dimensions

Speaker: Zhou Fan (Yale University)

Online only

15:00-16:00 CDT
Two persistent puzzles in multivariate statistics; “rotations” and “picking k”

Speaker: Karl Rohe (University of Wisconsin-Madison)

Online only

Tuesday, August 31, 2021
9:30-10:30 CDT
Scaling statistical models to millions of human genomes

Speaker: Sriram Sankararaman (University of California, Los Angeles)

11:00-12:00 CDT
Geometric and Topological Approaches to Representation Learning in Biomedical Data

Speaker: Smita Krishnaswamy (Yale University)

Online only

13:30-14:30 CDT
Bulk Eigenvalue Matching Analysis: A new method to estimating K in a spiked covariance matrix

Speaker: Tracy Ke (Harvard University)

Online only

15:00-16:00 CDT
Model-Based Trajectory Inference for Single-Cell RNA Sequencing Using Deep Learning with a Mixture Prior

Speaker: Jingshu Wang (University of Chicago)

Wednesday, September 1, 2021
9:30-10:30 CDT
Standardizing the spectra of count data matrices by diagonal scaling

Speaker: Boris Landa (Yale University)

11:00-12:00 CDT
LDLE: Low Distortion Local Eigenmaps

Speaker: Gal Mishne (University of California, San Diego)

Online only

13:30-14:30 CDT
Spatial transcriptomics: Alignment, integration, and inference of genomic aberrations

Speaker: Ben Raphael (Princeton University)

15:00-16:00 CDT
Metric representations: Algorithms and Geometry

Speaker: Anna Gilbert (Yale University)

Online only

Thursday, September 2, 2021
9:30-10:30 CDT
Beyond matrices: higher-order tensor methods meet computational biology

Speaker: Miaoyan Wang (University of Wisconsin-Madison)

Online only

11:00-12:00 CDT
Universal prediction of cell cycle position using transfer learning

Speaker: Kasper Hansen (Johns Hopkins University)

Online only

13:30-14:30 CDT
Theory and Practice for Large-scale Phylogeny Estimation

Speaker: Tandy Warnow (University of Illinois at Urbana-Champaign)

Online only

Friday, September 3, 2021
9:30-10:30 CDT
Genotype PCA as an estimator

Speaker: Alex Bloemendal (Broad Institute)

Online only

11:00-12:00 CDT
Machine learning for actionable, interpretable marker selection in -omics studies

Speaker: Bianca Dumitrascu (University of Cambridge)

Online only

13:30-14:30 CDT
Dimensionality reduction in the analysis of human genetics data

Speaker: Petros Drineas (Purdue University)

Online only


Videos

MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data

Soledad Villar
August 30, 2021

Empirical Bayes PCA in high dimensions

Zhou Fan
August 30, 2021

Two persistent puzzles in multivariate statistics; “rotations” and “picking k”

Karl Rohe
August 30, 2021

Scaling statistical models to millions of human genomes

Sriram Sankararaman
August 31, 2021

Geometric and Topological Approaches to Representation Learning in Biomedical Data

Smita Krishnaswamy
August 31, 2021

Bulk Eigenvalue Matching Analysis: A new method to estimating K in a spiked covariance matrix

Tracy Ke
August 31, 2021

Model-Based Trajectory Inference for Single-Cell RNA Sequencing Using Deep Learning with a Mixture Prior

Jingshu Wang
August 31, 2021

Standardizing the spectra of count data matrices by diagonal scaling

Boris Landa
September 1, 2021

LDLE: Low Distortion Local Eigenmaps

Gal Mishne
September 1, 2021

Spatial transcriptomics: Alignment, integration, and inference of genomic aberrations

Ben Raphael
September 1, 2021

Metric representations: Algorithms and Geometry

Anna Gilbert
September 1, 2021

Beyond matrices: higher-order tensor methods meet computational biology

Miaoyan Wang
September 2, 2021

Universal prediction of cell cycle position using transfer learning

Kasper Hansen
September 2, 2021

Theory and Practice for Large-scale Phylogeny Estimation

Tandy Warnow
September 2, 2021

Machine learning for actionable, interpretable marker selection in -omics studies

Bianca Dumitrascu
September 3, 2021

Dimensionality reduction in the analysis of human genetics data

Petros Drineas
September 3, 2021