# Topological Data Analysis

## Topological Data Analysis

### April 26-30, 2021

##### Deadline to submit posters: April 23

Organizers

• Brittany Fasy (Mathematics, Montana)
• Kathryn Hess (Mathematics, EPFL)
• Matthew Kahle (Mathematics, Ohio State)
• Sayan Mukherjee (Statistics, Duke)
• Jose Perea (Mathematics, Michigan State)

Description

Invited Speakers

• Lorin Crawford (Microsoft Research New England)
• Sara Kalisnik (Bentley University)
• Facundo Memoli (Ohio State)
• Ezra Miller (Duke University)
• Anthea Monod (Imperial College London)
• Elizabeth Munch (Michigan State University)
• Vidit Nanda (University of Oxford)
• Katharine Turner (Australian National University)
• Yusu Wang (UC San Diego)

#### Monday, April 26

All times CDT (UTC-5)
 8:50-9:00 Introductory remarks: IMSI Director and workshop organizers Morning Session Chair: Sayan Mukherjee 9:00-9:50 Katharine Turner (Australian National University)Algebraic Wasserstein distance between persistence modules 9:50-10:15 Discussion and Break 10:15-10:45 Elchanan Solomon (Duke University)What is the “right” topological invariant of a large point cloud X? Prior research has focused on estimating the full persistence diagram of X, a quantity that is very expensive to compute, unstable to outliers, and far from a sufficient statistic. We therefore propose that the correct invariant is not the persistence diagram of X, but rather the collection of persistence diagrams of many small subsets. This invariant, which we call “distributed persistence,” is trivially parallelizable, more stable to outliers, and has a rich inverse theory. The map from the space of point clouds (with the quasi-isometry metric) to the space of distributed persistence invariants (with the Hausdorff-Bottleneck distance) is a global quasi-isometry. This is a much stronger property than simply being a sufficient statistic, and is to our knowledge the only result of its kind in the TDA literature. Moreover, the quasi-isometry bounds depend on the size of the subsets taken, so that as the size of these subsets goes from small to large, the invariant interpolates between a purely geometric one and a purely topological one. Lastly, we note that our inverse results do not actually require considering all subsets of a fixed size (an enormous collection), but a relatively small collection satisfying certain covering properties, properties that arise with high probability when randomly selecting sufficiently many subsets. These theoretical results are complemented by experiments showcasing the success of distributed persistence at solving a number of morphometric challenges. This is joint work with Alex Wagner and Paul Bendich.From Geometry to Topology: Inverse Theorems for Distributed Persistence 11:00-11:30 Nicholas Berkouk (Ecole Polytechnique Fédérale de Lausanne)In this talk, I will provide a definition of ephemeral multi-persistent modules and prove that the quotient of persistent modules by the ephemeral ones is equivalent to the category of γ-sheaves. In the case of one-dimensional persistence, our definition agrees with the usual one showing that the observable category and the category of γ-sheaves are equivalent. I will also establish isometry theorems between the category of persistent modules and γ-sheaves both endowed with their interleaving distance. Finally, we compare the interleaving and convolution distances. Altogether, these results pave a new way to define dimension reduction techniques for multi-parameter persistence modules.Joint work with François Petit.Ephemeral persistence modules and distance comparison 11:30-1:00 Lunch Break Afternoon Session Chair: Matthew Kahle 1:00-1:50 Yusu Wang (University of California, San Diego)In recent years, topological and geometric data analysis (TGDA) has emerged as a new and promising field for processing, analyzing and understanding complex data. Indeed, geometry and topology form natural platforms for data analysis, with geometry describing the ”shape” and ”structure” behind data; and topology characterizing / summarizing both the domain where data are sampled from, as well as functions and maps associated to them. In this talk, I will show how the topological objects from discrete Morse theory and persistent homology can be used to reconstruct hidden geometric graphs; how such an approach can be extended to handle high-dimensional point clouds data (with sparsification); and how they can be then combined with machine learning pipelines for further data analysis tasks. This talk is based on multiple projects with multiple collaborators and references will be given during the talk.Discrete Morse-based graph reconstruction and data analysis 1:50-2:15 Discussion and Break 2:15-2:45 Celia Hacker (Ecole Polytechnique Fédérale de Lausanne)The well known node2vec algorithm has been used to explore network structures and represent the nodes of a graph in a vector space in a way that reflects the structure of the graph. Random walks in node2vec have been used to study the local structure through pairwise interactions. Our motivation for this project comes from a desire to understand higher-order relationships by a similar approach. To this end, we propose an extension of node2vec to a method for representing the k-simplices of a simplicial complex into Euclidean space.In this presentation I outline a way to do this by performing random walks on simplicial complexes, which have a greater variety of adjacency relations to take into account than in the case of graphs. The walks on simplices are then used to obtain a representation of the simplices. We will show cases in which this method can uncover the roles of higher order simplices in a network and help understand structures in graphs that cannot be seen by using just the random walks on the nodes.A simplicial extension of node2vec 3:00-3:30 Bei Wang (University of Utah)Merge trees are a type of topological descriptors that record the connectivity among the sublevel sets of scalar fields. In this paper, we are interested in sketching a set of merge trees. That is, given a set T of merge trees, we would like to find a basis set S such that each tree in T can be approximately reconstructed from a linear combination of merge trees in S. A set of high-dimensional vectors can be sketched via matrix sketching techniques such as principal component analysis and column subset selection.However, up until now, topological descriptors such as merge trees have not been known to be sketchable.We develop a framework for sketching a set of merge trees that combines the Gromov-Wasserstein framework of Chowdhury and Needham with techniques from matrix sketching.We demonstrate the applications of our framework in sketching merge trees that arise from data ensembles in scientific simulations.This is joint work with Mingzhe Li and Sourabh Palande.Sketching Merge Trees

#### Tuesday, April 27

 Morning Session Chair: Kathryn Hess 9:00-9:50 Lorin Crawford (Microsoft Research)The recent curation of large-scale databases with 3D surface scans of shapes has motivated the development of tools that better detect global-patterns in morphological variation. Studies which focus on identifying differences between shapes have been limited to simple pairwise comparisons and rely on pre-specified landmarks (that are often known). In this talk, we present SINATRA: a statistical pipeline for analyzing collections of shapes without requiring any correspondences. Our method takes in two classes of shapes and highlights the physical features that best describe the variation between them.The SINATRA pipeline implements four key steps. First, SINATRA summarizes the geometry of 3D shapes (represented as triangular meshes) by a collection of vectors (or curves) that encode changes in their topology. Second, a nonlinear Gaussian process model, with the topological summaries as input, classifies the shapes. Third, an effect size analog and corresponding association metric is computed for each topological feature used in the classification model. These quantities provide evidence that a given topological feature is associated with a particular class. Fourth, the pipeline iteratively maps the topological features back onto the original shapes (in rank order according to their association measures) via a reconstruction algorithm. This highlights the physical (spatial) locations that best explain the variation between the two groups.We use a rigorous simulation framework to assess our approach, which themselves are a novel contribution to 3D image analysis. Lastly, as a case study, we use SINATRA to analyze mandibular molars from four different suborders of primates and demonstrate its ability recover known morphometric variation across phylogenies.Statistical Frameworks for Mapping 3D Shape Variation onto Genotypic and Phenotypic Variation 9:50-10:15 Discussion and Break 10:15-10:45 Woojin Kim (Duke University)Metrics in computational topology are often either (i) themselves in the form of the interleaving distance d(F,G) between certain order-preserving maps F and G between posets or (ii) admit d(F,G) as a tractable lower bound. In this talk, assuming that the target poset of F and G admits a join-dense subset, we propose certain join representations of F and G which facilitate the computation of d(F,G). We leverage this result in order to (i) elucidate the structure and computational complexity of the interleaving distance for poset-indexed clusterings (i.e. poset-indexed subpartition-valued functors), (ii) to clarify the relationship between the erosion distance by Patel and the graded rank function by Betthauser, Bubenik, and Edwards, and (iii) to reformulate and generalize the tripod distance by the second author. This is joint work with Facundo Memoli and Anastasios Stefanou. https://arxiv.org/abs/1912.04366Interleaving by parts for persistence in a poset 11:00-11:30 Antonio Rieser (CONACYT-CIMAT, A.C.)There have been a number of attempts to extend the realm of application of algebraic topological tools to discrete spaces such as graphs, digital images, and point clouds, which one more typically encounters in computer science and data analysis. In each of these theories, one of two strategies has typically been taken. In topological data analysis, one usually replaces the original space with one or more topological spaces that one hopes will retain the relevant topological information in the original set. In various approaches to discrete or digital topology, we find instead different attempts to develop algebraic topology from scratch for some class of discrete objects of interest, proceeding largely by analogy with classical algebraic topology. In this work, we propose a third option: we generalize algebraic topology to categories which contain both the topological spaces classically treated by classical homotopy theory, but which also include as objects the more discrete and combinatorial spaces of interest in applications. The advantage here is that there are now non-trivial ‘continuous maps’ from classical topological spaces to the discrete spaces (given the appropriate structure), and one may then compare the resulting topological invariants on each side functorially. We find that there are a number of possible such categories, each with its own particular homotopy theory and associated homologies, and, additionally, that there is a generalization of the coarse category which allows finite sets to be non-trivial (i.e. not ‘coarsely’ equivalent to a point). We will give an overview of these theories and several applications, discussing the advantages and disadvantages of each.Algebraic topology in the mesoscopic regime 11:30-1:00 Lunch Break Afternoon Session Chair: Kathryn Hess 1:00-1:50 Elizabeth Munch (Michigan State University)Reeb graphs and main other related graphical signatures have extensive use in applications, but only recently has there been intense interest in finding metrics for these objects. In this talk, we focus on the interleaving distance, which is a categorical reforumlation of the epynomous metric from persistence modules. In this talk, we introduce an extension of smoothing on Reeb graphs, which we call truncated smoothing; this in turn allows us to define a new family of metrics which generalize the interleaving distance for Reeb graphs. Intuitively, we “chop off” parts near local minima and maxima during the course of smoothing. After formalizing truncation as a functor, we show that when applied after the smoothing functor, this prevents extensive expansion of the range of the function, and yields particularly nice properties. Further, for certain choices of the truncation parameter, we can construct a categorical flow for any choice of slope $m in [0,1]$, which gives a family of interleaving distances. While the resulting metrics are not stable, we show that any pair of these for $m,m’ in [0,1)$ are strongly equivalent metrics, which in turn gives stability of each metric up to a multiplicative constant.The Truncated Interleaving Distance for Reeb Graphs 1:50-2:15 Discussion and Break 2:15-3:45 Poster session in Teamflow

#### Wednesday, April 28

 Morning Session Chair: Jose Perea 9:00-9:50 Facundo Memoli (Ohio State University)We study an invariant of compact metric spaces which combines the notion of curvature sets introduced by Gromov in the 1980s together with the notion of Vietoris-Rips persistent homology. For given integers k≥0 and  n≥1 these invariants arise by considering the degree k Vietoris-Rips persistence diagrams of all subsets of a given metric space with cardinality at most  n. We call these invariants (n,k)-persistence sets. We argue that computing these invariants could be significantly easier than computing the usual Vietoris-Rips persistence diagrams. We establish stability results for these invariants and we also precisely characterize some of them in the case of spheres with geodesic and Euclidean distances. We also identify a rich family of metric graphs for which the (4,1)-persistence sets fully recover their homotopy type. Along the way we prove some useful properties of Vietoris-Rips persistence diagrams.Curvature sets over persistence diagrams 9:50-10:15 Discussion and Break 10:15-10:45 Ximena Fernández (Swansea University)Typically, persistence diagrams computed from a sample depend strongly on the distance associated to the data. When the point cloud is a sample of a Riemannian manifold embedded in a Euclidean space, an estimator of the intrinsic distance is relevant to obtain persistence diagrams from data that capture its intrinsic geometry.In this talk, we consider a computable estimator of a Riemannian metric known as Fermat distance, that accounts for both the geometry of the manifold and the density that produces the sample. We prove that the metric space defined by the sample endowed with this estimator (known as sample Fermat distance) converges a.s. in the sense of Gromov-Hausdorff to the manifold itself endowed with the (population) Fermat distance. This result is applied to obtain sample persistence diagrams that converge towards an intrinsic persistence diagram. We show that this approach outperforms more standard methods based on Euclidean norm, with theoretical results and computational experiments [1].[1] E. Borghini, X. Fernández, P. Groisman, G. Mindlin. ‘Intrinsic persistent homology via density-based metric learning’. arXiv:2012.07621 (2020)Intrinsic Persistent Homology via density-based metric learning 11:00-11:30 Luis Scoccola (Michigan State University)Synchronization problems, such as the problem of reconstructing a 3D shape from a set of 2D projections, can often be modeled by principal bundles. Similarly, the application of local PCA to a point cloud concentrated around a manifold approximates the tangent bundle of the manifold. In the first case, the characteristic classes of the bundle provide obstructions to global synchronization, while, in the second case, they provide topological information of the manifold beyond its homology, and give obstructions to dimensionality reduction. I will describe joint work with Jose Perea in which we propose notions of approximate and discrete vector bundle, study the extent to which they determine true vector bundles, and give algorithms for the stable and consistent computation of low-dimensional characteristic classes directly from these combinatorial representations.Approximate and discrete vector bundles 11:30-1:00 Lunch Break Afternoon Session Chair: Brittany Fasy 1:00-1:50 Anthea Monod (Imperial College)Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval.  This recently has been achieved by embedding the graphical structure of the database into a manifold so that the hierarchy is preserved.  Persistent homology provides a rigorous characterization for the database topology in terms of both its hierarchy and connectivity structure.  We compute persistent homology on a variety of datasets and show that some commonly used embeddings fail to preserve the connectivity.  Moreover, we show that embeddings which successfully retain the database topology coincide in persistent homology.  We introduce the dilation-invariant bottleneck distance to capture this effect, which addresses metric distortion on manifolds.  We use it to show that distances between topology-preserving embeddings of databases are small.Topological Data Analysis of Database Representations for Information Retrieval 1:50-2:15 Discussion and Break 2:15-2:45 Chul Moon (Southern Methodist University)Tumor shape and size have been used as important markers for cancer diagnosis and treatment. This paper proposes a topological feature computed by persistent homology to characterize tumor progression from digital pathology and radiology images and examines its effect on the time-to-event data. The proposed topological features are invariant to scale-preserving transformation and can summarize various tumor shape patterns. The topological features are represented in functional space and used as functional predictors in a functional Cox proportional hazards model. The proposed model enables interpretable inference about the association between topological shape features and survival risks. Two case studies are conducted using lung cancer pathology and brain tumor radiology images. The results show that the topological features predict survival prognosis after adjusting clinical variables, and the predicted high-risk groups have significantly worse survival outcomes than the low-risk groups (p-values$<$0.005 for both studies). Also, the topological shape features found to be positively associated with survival hazards are irregular and heterogeneous shape patterns, which are known to be related to tumor progression. Our study provides a new perspective for understanding tumor shape and patient prognosis.Predicting Survival Outcomes using Topological Shape Features of AI-reconstructed Medical Images

#### Friday, April 30

 Morning Session Chair: Kathryn Hess 9:00-9:50 Vidit Nanda (University of Oxford)Many interesting objects across pure and applied mathematics (including persistence modules, cellular sheaves and connection matrices) are most naturally viewed as linear algebraic data parametrized by a finite space. In this talk, I will describe a practical framework for dimensionality reduction and linear optimization over a wide class of such objects.Compatibility and Optimization for Quiver Representations 9:50-10:15 Discussion and Break 10:15-10:45 Barbara Giunti (Technische Universität Graz)The use of persistent homology in applications is justified by the validity of certain stability results. At the core of such results is a notion of distance between the invariants that one associates to data sets. While such distances are well-understood in the one-parameter case, the situation for multiparameter persistence modules is more challenging, since there is no generalisation of the barcode.Here we introduce a general framework to study stability questions in multiparameter persistence. We first introduce the (outer) amplitude, a functional on abelian categories that mimics the properties of an outer measure in measure theory, then study different ways to associate distances to such functionals. Our framework is very comprehensive, as many different invariants that have been introduced in the literature are examples of outer amplitudes, and similarly, we show that many known distances for multiparameter persistence are distances from outer amplitudes.Finally, we provide new stability results using our framework.The amplitude of an abelian category: Measures in persistence theory 11:00-11:30 Hitesh Gakhar (University of Oklahoma)Sliding window embeddings were originally used in the study of dynamical systems to reconstruct the topology of underlying attractors from generic observation functions. In 2015, a technique for recurrence detection in time series data using sliding window embeddings of periodic functions and persistent homology was developed. We study a closely related class of functions, namely quasiperiodic functions, whose constitutive frequencies are non-commensurate harmonics. The sliding window embeddings of such functions are dense in high dimensional tori, where the dimension depends on the number of incommensurate harmonics. In this talk, we will present results pertaining to the structure of sliding window embeddings and their persistent homology, along with a brief discussion on how to choose the embedding parameters.Sliding windows persistence of quasiperiodic functions 11:30-1:00 Lunch Break Afternoon Session Chair: Anna Schenfisch 1:00-1:50 Ezra Miller (Duke University)Fundamental to applications of ordinary persistent homology in one parameter is the reconstruction of a module from the perfect matching between left endpoints and right endpoints of its bar code.  Do these concepts have analogues in multiple parameters?  The answer is largely yes: endpoints can be defined, and the module can be reconstructed from them, though the correspondence is not a perfect matching but rather a more arbitrary linear map.  The algebra needed for these developments will be covered from scratch, followed by a view toward how they might be used for computational purposes.What are left and right endpoints for multiparameter persistence? 1:50-2:15 Discussion and Break 2:15-2:45 Iris Yoon (University of Delaware)We present a new method for comparing topological features using dissimilarity matrices obtained from observing activity in distinct complex systems. Our method uses the Dowker complex of a cross-dissimilarity matrix to identify all possible ways a common feature could be represented by the barcodes of activity within the individual systems. This method can be used to study both how distinct systems respond to the same stimuli and how behavior in one system drives behavior in another. Motivated by questions in neuroscience, our framework will allow researchers to investigate open problems such as how neural systems code for complex stimuli and how such coding structures propagate and evolve through different neural systems without direct reference to external correlates. The same tools can also be applied more generally to explore two-dimensional persistence and to identify which topological features are preserved after dimensionality reduction. This is joint work with Chad Giusti (University of Delaware) and Robert Ghrist (University of Pennsylvania)Identifying analogous topological features across multiple systems 3:00-3:30 Bastian Rieck (ETH Zurich)Topological data analysis emerged as an effective tool in machine learning, supporting the analysis of neural networks, but also driving the development of novel algorithms that incorporate topological characteristics. As a problem class, graph classification is of particular interest here, since graphs are inherently amenable to a topological description in terms of their connected components and cycles. This talk will briefly summarise recent advances in topology-based graph classification, focussing equally on ’shallow’ and ‘deep’ approaches. Starting from an intuitive description of persistent homology, we will discuss how to incorporate topological features into the Weisfeiler–Lehman colour refinement scheme, thus obtaining a simple feature-based graph classification algorithm. We will then build a bridge to graph neural networks and demonstrate a topological variant of ‘readout’ functions, which can be learned end-to-end. Care has been taken to make the talk accessible to an audience that may not have been exposed to machine learning or topological data analysis.Recent Advances in Topology-Based Graph Classification 3:45-? Happy Hour on Zoom (with breakout rooms)
###### Algebraic Wasserstein distance between persistence modules
Speaker: Katharine Turner (Australian National University)
Occasion: Topological Data Analysis
Date: April 26, 2021
###### From Geometry to Topology: Inverse Theorems for Distributed Persistence
Speaker: Elchanan Solomon (Duke University)
Occasion: Topological Data Analysis
Date: April 26, 2021
###### Ephemeral persistence modules and distance comparison

Speaker: Nicolas Berkouk (Ecole Polytechnique Fédérale de Lausanne)
Occasion: Topological Data Analysis
Date: April 26, 2021
###### Discrete Morse-based graph reconstruction and data analysis
Speaker: Yusu Wang (University of California, San Diego)
Occasion: Topological Data Analysis
Date: April 26, 2021
###### A simplicial extension of node2vec
Speaker: Celia Hacker (Ecole Polytechnique Fédérale de Lausanne)
Occasion: Topological Data Analysis
Date: April 26, 2021
###### Sketching Merge Trees
Speaker: Bei Wang (University of Utah)
Occasion: Topological Data Analysis
Date: April 26, 2021
###### Statistical Frameworks for Mapping 3D Shape Variation onto Genotypic and Phenotypic Variation
Speaker: Lorin Crawford (Microsoft Research)
Occasion: Topological Data Analysis
Date: April 27, 2021
###### Interleaving by parts for persistence in a poset
Speaker: Woojin Kim (Duke University)
Occasion: Topological Data Analysis
Date: April 27, 2021
###### Algebraic topology in the mesoscopic regime
Speaker: Antonio Rieser (CONACYT-CIMAT, A.C.)
Occasion: Topological Data Analysis
Date: April 27, 2021
###### The Truncated Interleaving Distance for Reeb Graphs
Speaker: Elizabeth Munch (Michigan State University)
Occasion: Topological Data Analysis
Date: April 27, 2021
###### Curvature sets over persistence diagrams
Speaker: Facundo Memoli (Ohio State University)
Occasion: Topological Data Analysis
Date: April 28, 2021
###### Intrinsic Persistent Homology via density-based metric learning
Speaker: Ximena Fernández (Swansea University)
Occasion: Topological Data Analysis
Date: April 28, 2021
###### Approximate and discrete vector bundles
Speaker: Luis Scoccola (Michigan State University)
Occasion: Topological Data Analysis
Date: April 28, 2021
###### Topological Data Analysis of Database Representations for Information Retrieval
Speaker: Anthea Monod (Imperial College)
Occasion: Topological Data Analysis
Date: April 28, 2021
###### Predicting Survival Outcomes using Topological Shape Features of AI-reconstructed Medical Images
Speaker: Chul Moon (Southern Methodist University)
Occasion: Topological Data Analysis
Date: April 28, 2021
###### Back to Basics – Topology of Simplicial Complexes for Business Optimisations
Speaker: Marc Lange (Elbformat Consulting)
Occasion: Topological Data Analysis
Date: April 29, 2021
###### Geometric and Topological Fingerprints for Periodic Crystals
Speaker: Teresa Heiss (Institute of Science and Technology Austria)
Occasion: Topological Data Analysis
Date: April 29, 2021
###### Simplicial Neural Networks
Speaker: Stefania Ebli (Ecole Polytechnique Fédérale de Lausanne)
Occasion: Topological Data Analysis
Date: April 29, 2021
###### Topological Sholl Descriptors for Neuronal Clustering and Classification
Speaker: Sadok Kallel (American University of Sharjah )
Occasion: Topological Data Analysis
Date: April 29, 2021
###### Compatibility and Optimization for Quiver Representations
Speaker: Vidit Nanda (University of Oxford)
Occasion: Topological Data Analysis
Date: April 30, 2021
###### The amplitude of an abelian category: Measures in persistence theory
Speaker: Barbara Giunti (Technische Universität Graz)
Occasion: Topological Data Analysis
Date: April 30, 2021
###### Sliding windows persistence of quasiperiodic functions
Speaker: Hitesh Gakhar (University of Oklahoma)
Occasion: Topological Data Analysis
Date: April 30, 2021
###### What are left and right endpoints for multiparameter persistence?
Speaker: Ezra Miller (Duke University)
Occasion: Topological Data Analysis
Date: April 30, 2021
###### Identifying analogous topological features across multiple systems
Speaker: Iris Yoon (University of Delaware)
Occasion: Topological Data Analysis
Date: April 30, 2021
###### Recent Advances in Topology-Based Graph Classification
Speaker: Bastian Rieck (ETH Zurich)
Occasion: Topological Data Analysis
Date: April 30, 2021
###### Learning with Approximate or Distributed Topology
Speaker: Alexander Wagner (Duke University) Occasion: Topological Data Analysis Date: April 29, 2021