This was part of Contemporary Challenges in Large-Scale Sequence Alignments and Phylogenies

GPU-Accelerated Construction of Ultra-Large Pangenomes via Alignment-Phylogeny Co-Estimation

Yatish Turakhia, University of California, San Diego (UCSD)

Tuesday, August 12, 2025

Abstract: Pangenomics is an emerging field that allows us to study within-species genetic diversity, and its relationship to physical traits (phenotypes), accurately and comprehensively by using a collection of genomes from a species instead of a single reference genome. Future pangenomics applications will require analyzing ultra-large and ever-growing collections of genomes. While existing pangenome data formats can represent the genetic variation in a collection of genomes, they do not store the genomes' shared evolutionary and mutational histories, and they are unlikely to keep up with the speed and volume of genome sequencing data. In this talk, I will present ongoing work from my lab on a novel pangenomic data representation that significantly improves both the memory efficiency and the representative power of pangenomes. I will then discuss how we are leveraging GPUs and HPC systems to construct massive pangenomes of millions of sequences using alignment-phylogeny co-estimation techniques.