MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data

This was part of Eliciting Structure in Genomics Data

Soledad Villar, Johns Hopkins University

Monday, August 30, 2021

Abstract: Comparison and alignment of large, high dimensional data sets is a challenging problem. A particular difficulty is that standard algorithms do not scale. An efficient solution that was feasible for such data sets would impact many different knowledge domains. We introduce and study MREC, a recursive decomposition procedure for computing matchings between very large data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matchings themselves can be done using black-box matching procedures that are too expensive to run on the entire data set. Given an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC is applicable to extremely large data sets. We analyze the procedure and describe the conditions under which it is expected to work well. We demonstrate the flexibility and power of MREC by applying it to a number of alignment problems arising in the analysis of single-cell molecular data.