This was part of Contemporary Challenges in Large-Scale Sequence Alignments and Phylogenies

Distance calculation and phylogenetic placement using k-mers

Siavash Mirarab, University of California, San Diego (UCSD)

Wednesday, August 13, 2025



Abstract: Alignment-free methods for computing genomic distances and performing phylogenetic analyses have a long history. While these methods have generally not been as accurate as alignment-based methods, they tend to scale well and may also provide an opportunity to use more of the sequencing data. In metagenomic applications in particular, they hold the promise of freeing phylogenetic analyses from being restricted to marker genes. In this talk, we review several new methods for k-mer-based estimation of taxonomic profiles and phylogenetic placement, culminating in a method called krepp. The input to krepp is a set of reference genomes, (optionally) a reference phylogeny, and a set of query reads. For each read, it computes the distance to each of the reference genomes that are sufficiently close to it and uses these distances for phylogenetic placement. We show how krepp can scale to trees with 120,000 genomes as leaves and how it improves the accuracy of downstream metagenomic applications. We spend substantial time discussing the mathematical techniques used in designing krepp and potential avenues for further mathematical and algorithmic development.
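To make the idea of an alignment-free, k-mer-based distance concrete, here is a minimal Python sketch. It is not krepp's algorithm, which the abstract does not specify. It assumes a generic and much simpler scheme: exact k-mer sets, the Jaccard index, and the Mash-style transform D = -ln(2J / (1 + J)) / k. All function names (`kmers`, `jaccard`, `mash_style_distance`, `closest_reference`) and the toy sequences are hypothetical, chosen only for illustration.

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_style_distance(j, k):
    """Map a Jaccard index j to a distance estimate, Mash-style.

    Returns infinity when no k-mers are shared (an illustrative choice).
    """
    if j == 0.0:
        return math.inf
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


def closest_reference(read, references, k):
    """Return (name, distance) of the reference closest to the read.

    `references` is a dict mapping a name to a sequence string.
    """
    read_kmers = kmers(read, k)
    distances = {
        name: mash_style_distance(jaccard(read_kmers, kmers(seq, k)), k)
        for name, seq in references.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Toy example: one short read against two made-up references.
refs = {"A": "ACGTACGTACGG", "B": "TTTTTTTTTTTT"}
name, dist = closest_reference("ACGTACGTAC", refs, k=4)
print(name, round(dist, 4))
```

A real tool would replace the exact sets with a scalable index or sketch, and would likely use a measure better suited to comparing a short read against a whole genome than the plain Jaccard index. The sketch only shows how distances can be obtained from shared k-mers without an alignment.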