Universal prediction of cell cycle position using transfer learning
Kasper Hansen, Johns Hopkins University
The cell cycle is a highly conserved, continuous process that controls the faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle, both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach for inferring cell cycle position at high resolution from single-cell RNA-seq data. Here, we present tricycle, a method with associated software, which addresses this challenge by leveraging key features of cell cycle biology, the mathematical properties of principal component analysis of periodic functions, and the applicability of transfer learning. Tricycle first constructs a reference latent space from a fixed reference dataset, together with a projection operator that maps any new dataset into that fixed space. We show that tricycle can predict any cell's position in the cell cycle regardless of cell type, species of origin, and even sequencing assay. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically designed in vitro systems. Unlike these assays, tricycle is easily applicable to any single-cell RNA-seq dataset. We will highlight the features of the problem and of tricycle that we believe are especially important for achieving high generalizability.
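
To make the reference-and-projection idea concrete, the following is a minimal sketch in Python with NumPy. It is not the tricycle software and makes several assumptions that the abstract does not state: the function names (simulate_periodic, fit_reference, project_and_estimate) are hypothetical, the reference space is taken to be the top two principal components, new data are centered by their own gene means, and position is read off as the angle in that two-dimensional space. The synthetic data are illustrative only.

```python
import numpy as np


def simulate_periodic(n_cells, n_genes, noise=0.1, seed=0):
    """Toy data (assumption): each gene is a cosine of the cell's phase,
    peaking at an evenly spaced point on the cycle, plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_cells)
    peaks = np.linspace(0.0, 2.0 * np.pi, n_genes, endpoint=False)
    expr = np.cos(phase[:, None] - peaks[None, :])
    expr = expr + noise * rng.standard_normal(expr.shape)
    return expr, phase  # expr is cells x genes


def fit_reference(ref_expr):
    """Learn the fixed reference space: loadings of the top two
    principal components of the gene-centered reference matrix."""
    centered = ref_expr - ref_expr.mean(axis=0)
    # Rows of vt are the principal axes in gene space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:2].T  # genes x 2


def project_and_estimate(new_expr, loadings):
    """Project a new dataset with the fixed loadings (no refitting) and
    take the polar angle of each cell as its estimated position."""
    centered = new_expr - new_expr.mean(axis=0)
    embedding = centered @ loadings  # cells x 2
    theta = np.arctan2(embedding[:, 1], embedding[:, 0]) % (2.0 * np.pi)
    return embedding, theta


reference, _ = simulate_periodic(300, 50, seed=0)
new_data, true_phase = simulate_periodic(200, 50, seed=1)
loadings = fit_reference(reference)
embedding, theta = project_and_estimate(new_data, loadings)
```

In this toy setting the estimated angle matches the simulated phase only up to a rotation and a possible reflection of the circle, because principal components are defined up to sign; orienting the angle biologically would require marker genes, which this sketch does not attempt. The only point illustrated is that the loadings are learned once from the reference and then reused unchanged for every new dataset.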