Description
Microbial communities are an integral and essential part of life on Earth. They matter for the control of epidemics, for understanding the gut microbiome, for understanding beneficial and harmful microbial impacts on farming and ecological systems, and for understanding the impact of climate on microbial systems. This conference aims to bring together experts in ecology, microbial genomics, population evolution, dynamical modeling, and high-throughput data science to share the current state of knowledge and catalyze new interactions across the biological, computational, and mathematical sciences. The potential range of topics is huge. To provide focus, our organizing team has chosen to emphasize two key areas and their interplay: microbial genomics and microbial population dynamics.
Organizers
Speakers
Schedule
Moderator: Shulei Wang
Speaker: Li Ma (Duke University)
A key characteristic of microbiome compositional data is its large and complex cross-sample heterogeneity. Appropriately accounting for these “variance components” is critical for several common inference tasks, including identifying latent structures, carrying out hypothesis testing on cross-group differences, and modeling dynamics, but it is complicated by key features of microbiome compositional data, including high dimensionality and compositionality. These characteristics create a need for structural constraints on the modeling of taxa covariance that preserve the analytical and computational tractability of the resulting model or method. In this talk, I will review several recently proposed methods that use the phylogenetic tree structure to incorporate flexible covariance components while maintaining computational scalability. In particular, I will present probabilistic models for microbiome compositional data based on the Dirichlet-tree (DT) distribution and the logistic-tree normal (LTN) distribution, and demonstrate their use in a range of applications including cross-sample comparison, mixed-effects modeling, covariance estimation, clustering analysis, and subcommunity identification. Their performance will be contrasted with previous models such as Dirichlet-multinomial models and log-ratio-based models. This talk is based on joint work with my students Jialiang Mao, Zhuoqun Wang, and Patrick LeBlanc.
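As a rough illustration of the Dirichlet-tree idea mentioned in this abstract, the sketch below draws compositions by sampling an independent Dirichlet split at each internal node of a tree and multiplying the splits along each root-to-leaf path. The four-taxon tree, the node names, and the concentration parameters are all hypothetical choices made for the example; this is not the speaker's model or code.

```python
import numpy as np

# Hypothetical 4-taxon tree: root -> (left, right), left -> (t1, t2), right -> (t3, t4).
# Each internal node maps to (children, Dirichlet concentration parameters).
TREE = {
    "root": (["left", "right"], [2.0, 2.0]),
    "left": (["t1", "t2"], [1.0, 3.0]),
    "right": (["t3", "t4"], [5.0, 0.5]),
}
LEAVES = ["t1", "t2", "t3", "t4"]


def sample_dirichlet_tree(tree, leaves, rng):
    """Draw one composition over the leaves of the tree."""
    mass = {"root": 1.0}          # probability mass that reaches each node
    stack = ["root"]
    while stack:
        node = stack.pop()
        if node not in tree:      # leaf: nothing left to split
            continue
        children, alpha = tree[node]
        split = rng.dirichlet(alpha)   # how this node's mass is divided
        for child, frac in zip(children, split):
            mass[child] = mass[node] * frac
            stack.append(child)
    return np.array([mass[leaf] for leaf in leaves])


rng = np.random.default_rng(0)
samples = np.array([sample_dirichlet_tree(TREE, LEAVES, rng) for _ in range(1000)])
```

Each row of `samples` is one simulated composition over the four leaves. Because every node has its own concentration parameters, the variance of a subtree's share can differ from node to node, which a single Dirichlet over the leaves cannot express.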
Speaker: Mihai Pop (University of Maryland)
Determining the identity and functionality of an organism in a metagenomic sample is a key step in analytical pipelines used in both basic research and clinical applications. A range of computational techniques are used to perform these tasks, including database searches, machine learning, and protein structure prediction. The majority of the approaches used today aim to provide a definitive answer, perhaps with an associated confidence estimate. In my talk, I will argue that it is often valuable to report a broader set of plausible answers, an approach that can provide a more nuanced analysis of sequences that are not represented in biological databases. This view can also protect analyses from the errors that are common in public data sets.
Speaker: Tal Korem (Columbia University)
The paired analysis of the microbiome and metabolome is revolutionizing our mechanistic understanding of microbial ecosystems. I will present two recent projects. In the first, we developed predictive models of the human serum metabolome, uncovering multiple interactions between the host, its microbiome, and its metabolome. In the second, we used paired analysis of vaginal microbes and metabolites from samples collected early in pregnancy to identify novel interactions with preterm birth, and showed that the metabolome can accurately predict the risk of preterm delivery. Overall, these studies demonstrate how high-resolution multi-omic analysis can bring us closer to mechanisms, and eventually to clinical translation.
Moderator: Hongzhe Li
Speaker: Byron Smith (Gladstone Institutes)
Strain-level variation in microbial traits is widespread and biologically important in human-associated microbial communities, yet standard methods for studying the microbiome are often limited to species-level taxonomic resolution. Shotgun metagenomic data can in principle be used to characterize and quantify strains, but strain mixtures and low sequence coverage present significant challenges. Statistical deconvolution of allele frequencies into strain genotypes and relative abundances, similar to non-negative matrix factorization, is a promising approach, but existing methods are limited by computational scalability. StrainFacts is a novel method for strain deconvolution that harnesses a “fuzzy” genotype approximation to make the underlying graphical model fully differentiable. This allows parameter estimates to be optimized with gradient-based methods, speeding up model fitting and enabling its application to much larger numbers of metagenomes. Applying StrainFacts to tens of thousands of publicly available human stool metagenomes, we quantify patterns of strain diversity, biogeography, and linkage disequilibrium that agree with and expand on what is known based on existing reference genomes.
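To make the deconvolution idea in this abstract concrete, here is a minimal sketch of a relaxed, differentiable factorization fitted by plain gradient descent. It is not StrainFacts and does not use its likelihood or priors. It assumes a squared-error loss, biallelic sites, a softmax parameterization for strain abundances, and a sigmoid parameterization for "fuzzy" genotypes in (0, 1). All function names, sizes, and the synthetic data are hypothetical.

```python
import numpy as np


def softmax(a):
    """Row-wise softmax, so each sample's strain abundances sum to 1."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def sigmoid(b):
    return 1.0 / (1.0 + np.exp(-b))


def deconvolve(freq, n_strains, n_steps=3000, lr=0.02, seed=0):
    """Fit freq ~= P @ G with P on the simplex and relaxed genotypes G in (0, 1)."""
    rng = np.random.default_rng(seed)
    n_samples, n_sites = freq.shape
    a = 0.1 * rng.standard_normal((n_samples, n_strains))  # abundance logits
    b = 0.1 * rng.standard_normal((n_strains, n_sites))    # genotype logits
    losses = []
    for _ in range(n_steps):
        p, g = softmax(a), sigmoid(b)
        resid = p @ g - freq
        losses.append(0.5 * np.sum(resid ** 2))
        grad_p = resid @ g.T            # dLoss/dP
        grad_g = p.T @ resid            # dLoss/dG
        # Chain rule through the softmax and sigmoid parameterizations.
        grad_a = p * (grad_p - np.sum(grad_p * p, axis=1, keepdims=True))
        grad_b = grad_g * g * (1.0 - g)
        a -= lr * grad_a
        b -= lr * grad_b
    return softmax(a), sigmoid(b), losses


# Synthetic example: 30 samples, 40 sites, 3 strains with 0/1 genotypes.
rng = np.random.default_rng(1)
true_g = rng.integers(0, 2, size=(3, 40)).astype(float)
true_p = rng.dirichlet(np.ones(3), size=30)
freq = true_p @ true_g
p_hat, g_hat, losses = deconvolve(freq, n_strains=3)
```

Because the relaxed genotypes are continuous, the whole objective is differentiable and can be handed to any gradient-based optimizer. The recovered strains are identifiable only up to relabeling, so `g_hat` should be compared with `true_g` after matching rows.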
Speaker: Jing Ma (Fred Hutchinson Cancer Research Center)
Mechanistic understanding of the microbiome requires identifying co-regulating microbial markers (e.g. taxa and metabolites) that are associated with host health outcomes. In this talk, I will discuss a new systems biology approach for regression analysis of multi-view microbiome data (e.g. metagenomics and metabolomics). Our method identifies the multivariate association between the outcome and the latent factors common to all omics layers, and reveals the specific variables underlying that association. I will illustrate the merit of the proposed method using an analysis of metagenomic and metabolomic data from the Study of Latinos.
Speaker: Michael Wu (Fred Hutchinson Cancer Research Center)
Moderator: Hongzhe Li
Speaker: Tandy Warnow (University of Illinois Urbana-Champaign)
Phylogenies are a fundamental tool for gaining insight into microbial dynamics. Yet phylogeny estimation can be very challenging on large biological datasets, especially when attempting to infer species phylogenies (whether trees or networks). In this talk I will describe some advances in improving phylogenetic tree estimation on large and ultra-large datasets, and then identify some challenges where new ideas and techniques are needed. Some of this work is joint with my current and former students and postdocs, including Prof. Siavash Mirarab, Prof. Erin Molloy, Dr. Paul Zaharias, Dr. Vladimir Smirnov, Mr. Minhyuk Park, and Ms. Eleanor Wedell.
Speaker: Fengzhu Sun (University of Southern California)
Most current microbiome studies concentrate on prokaryotes, including bacteria and archaea. In addition to prokaryotes, there are many other microbes, such as eukaryotes, and mobile genetic elements (MGEs), including viruses and plasmids. Although the amount of genetic material from MGEs is much smaller than that from prokaryotes, the number of MGEs is over 10 times that of prokaryotes. MGEs affect microbial communities either directly through their genes or through interactions with their hosts. However, MGEs are vastly understudied in metagenomics. We develop statistical and machine learning methods for identifying MGEs and their hosts from shotgun metagenomic data, and investigate their contributions to microbial communities, including those of the human gut and marine environments.
Moderator: Pamela Martinez
Speaker: Stefano Allesina (University of Chicago)
Ecologists have devised a number of ways to infer species interactions using time-series data. Here we ask whether we can instead infer species interactions by co-culturing several subsets of a pool of (microbial) species in a set of controlled experiments. Importantly, we consider the case in which we take only one measurement per sub-community: the (relative or absolute) density of each extant species at the end of the experiment. We show that very simple and parsimonious statistical models can fit these data well, and even correctly predict the outcomes of new experiments that were not used for fitting. We discuss problems that arise when relative (instead of absolute) abundances are measured, and how to fit simplified versions of these models when the data are not sufficient to identify all parameters.
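One way to picture inference from endpoint measurements alone is the following sketch. It rests on an assumption that is not stated in the abstract: that the absolute endpoint densities x of a sub-community S satisfy a linear, Lotka-Volterra-style equilibrium condition, B[S, S] @ x[S] = 1. Under that assumption each row of the interaction matrix B can be estimated by ordinary least squares across the experiments in which the corresponding species is present. The function names and the synthetic matrix are hypothetical.

```python
import itertools
import numpy as np


def simulate_endpoints(b, subsets):
    """Noise-free endpoint densities for each sub-community (absent species get 0)."""
    n = b.shape[0]
    rows = []
    for subset in subsets:
        idx = list(subset)
        x = np.zeros(n)
        x[idx] = np.linalg.solve(b[np.ix_(idx, idx)], np.ones(len(idx)))
        rows.append(x)
    return np.array(rows)


def fit_interactions(endpoints):
    """Estimate B row by row from experiments in which species i is present."""
    n = endpoints.shape[1]
    b_hat = np.zeros((n, n))
    for i in range(n):
        x = endpoints[endpoints[:, i] > 0]
        # For these experiments the assumed model says x @ B[i] = 1.
        coef = np.linalg.lstsq(x, np.ones(x.shape[0]), rcond=None)[0]
        b_hat[i] = coef
    return b_hat


# Synthetic pool of 4 species with weak positive off-diagonal terms, so every
# sub-community has a strictly positive equilibrium.
rng = np.random.default_rng(0)
b_true = 0.2 * rng.random((4, 4))
np.fill_diagonal(b_true, 1.0)

# All non-empty subsets of the pool, one endpoint measurement each.
subsets = [s for k in range(1, 5) for s in itertools.combinations(range(4), k)]
endpoints = simulate_endpoints(b_true, subsets)
b_hat = fit_interactions(endpoints)
```

With noise-free absolute densities and all sub-communities observed, `b_hat` matches `b_true` up to numerical error. With relative abundances the overall scale of each endpoint is lost, which is one reason that case needs separate treatment.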
Speaker: Mercedes Pascual (University of Chicago)
Questions on coexistence and diversity apply at different levels of organization in microbial systems. My talk addresses the assembly of diverse pathogen populations as the result of negative frequency-dependent competition for hosts. Such competition arises from acquisition by hosts of specific immune memory which leads to negative frequency-dependent selection. I show with two examples how network structure of genetic variation in microbes can reveal underlying processes shaping strain coexistence and population dynamics in hyper-diverse systems. The first example concerns microbe-virus coevolution under CRISPR-induced immunity; the second one is the malaria parasite Plasmodium falciparum in high-transmission settings. Computational stochastic models allow us to identify “macroscopic properties” of diversity at the population level that inform us about underlying “microscopic” processes at the individual level. Similar approaches may apply at the community level.
Speaker: Ed Ionides (University of Michigan)
Microbial populations can demonstrate highly nonlinear, partially observable, stochastic dynamics. Parameter estimation and model criticism are inferential challenges which are tractable using modern methods applicable to general low-dimensional ecological models. Collections of interacting ecosystems at different locations raise additional challenges associated with high-dimensional Monte Carlo inference. Questions arise about when it is necessary to develop models for dynamic coupling between metapopulations, and if so how to carry out inference in this setting. We seek general-purpose methods that allow scientists to ask and answer questions for metapopulation dynamics in the context of scientifically useful models. We present recent progress and ongoing developments.
Speaker: Nianqiao Phyllis Ju (Purdue University)
Agent-based models have been used to characterize the complex dynamics of epidemics because they describe how individuals interact in a network and with the environment. Despite the many attractive features of agent-based models, parameter estimation remains a challenge due to intractable computations. In this talk, we present an agent-based susceptible-infectious-removed hidden Markov model that allows for heterogeneity among individuals, a social network structure, and observation noise. Next, we develop sequential Monte Carlo methods that enable scalable and efficient statistical inference in agent-based models. The guiding principle is to design informed proposals that take the observations and model structure into account by leveraging knowledge from signal processing and optimal control. Finally, our methodology is illustrated on the celebrated Abakaliki smallpox dataset.
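For orientation, the sketch below shows the simplest sequential Monte Carlo baseline for an agent-based SIR hidden Markov model: a bootstrap particle filter, which proposes from the model dynamics and ignores the next observation. It is not the informed-proposal method described in the talk. It also assumes homogeneous mixing (no network or individual heterogeneity) and binomial under-reporting of infectious counts. Every name and parameter value is hypothetical.

```python
import math
import numpy as np

S, I, R = 0, 1, 2  # agent states


def step(states, beta, gamma, rng):
    """Advance all particles one step; states has shape (particles, agents)."""
    n_agents = states.shape[1]
    frac_inf = (states == I).sum(axis=1, keepdims=True) / n_agents
    p_inf = 1.0 - np.exp(-beta * frac_inf)   # per-particle infection probability
    u = rng.random(states.shape)
    new = states.copy()
    new[(states == S) & (u < p_inf)] = I     # susceptible -> infectious
    new[(states == I) & (u < gamma)] = R     # infectious -> removed
    return new


def binom_logpmf(k, n, rho):
    """Log-probability of reporting k cases out of n infectious agents."""
    if k > n:
        return -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(rho) + (n - k) * math.log(1.0 - rho))


def particle_filter(obs, n_agents, n_init_inf, beta, gamma, rho, n_particles, rng):
    """Bootstrap particle filter estimate of the log-likelihood of obs."""
    states = np.zeros((n_particles, n_agents), dtype=int)
    states[:, :n_init_inf] = I
    loglik = 0.0
    for y in obs:
        states = step(states, beta, gamma, rng)            # propose from the model
        counts = (states == I).sum(axis=1)
        logw = np.array([binom_logpmf(int(y), int(c), rho) for c in counts])
        m = logw.max()
        if m == -math.inf:                                 # every particle is inconsistent
            return -math.inf
        w = np.exp(logw - m)
        loglik += m + math.log(w.mean())
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        states = states[idx]                               # multinomial resampling
    return loglik


# Simulate a short outbreak from the same model, then evaluate its likelihood.
rng = np.random.default_rng(1)
truth = np.zeros((1, 50), dtype=int)
truth[:, :3] = I
obs = []
for _ in range(15):
    truth = step(truth, 0.8, 0.3, rng)
    obs.append(rng.binomial((truth == I).sum(), 0.8))
loglik = particle_filter(obs, 50, 3, 0.8, 0.3, 0.8, 500, rng)
```

The bootstrap filter can lose all of its particles when none of them is consistent with an observation, which is the kind of inefficiency that proposals informed by the observations are meant to reduce.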
Videos
Microbiome analyses through tree-based models for high-dimensional compositions
February 21, 2022
Statistical Analysis of Large-Scale Microbiome Profiling Studies: Batch Effects and Robust Testing
February 21, 2022
Challenges and progress in phylogeny estimation on very large numbers of sequences
February 22, 2022
Diving into the Dark Matter of Mobile Genetic Elements in Microbial Communities
February 22, 2022
Inferring interactions in microbial communities—an alternative to time-series
February 22, 2022
Network structure and eco-evolutionary dynamics in microbial pathogen systems
February 22, 2022
Scalable Inference in Agent-based Epidemic Models Using Sequential Monte Carlo
Nianqiao Phyllis Ju
February 22, 2022