Topological Data Analysis
April 26-30, 2021
This workshop will take place online.
Deadline to offer a contributed talk: March 5
Deadline to submit posters: April 23
- Brittany Fasy (Mathematics, Montana)
- Kathryn Hess (Mathematics, EPFL)
- Matthew Kahle (Mathematics, Ohio State)
- Sayan Mukherjee (Statistics, Duke)
- Jose Perea (Mathematics, Michigan State)
In this age of rapidly increasing access to ever larger data sets, it has become clear that studying the “shape” of data using the tools of combinatorial and algebraic topology can lead to much deeper insights than other standard methods when analyzing complex data sets. Topological data analysis (TDA) is the exciting and highly active new field of research that encompasses these productive developments at the interface of algebraic topology, statistics, and data science. This workshop will consist of a small number of plenary one-hour lectures by leading researchers in the field, a larger number of contributed short talks from early-career researchers, live demos of software, a problem session, and a poster session. The speakers will cover a wide range of topics, from theory to concrete applications of TDA in science and engineering. The goals of the workshop are to foster scientific interactions across the growing breadth of the applied topology community and to provide an opportunity for algebraic topologists, statisticians, and data scientists curious about this dynamic new field to learn more about it.
- Lorin Crawford (Microsoft Research New England)
- Sara Kalisnik (Bentley University)
- Facundo Memoli (Ohio State)
- Ezra Miller (Duke University)
- Anthea Monod (Imperial College London)
- Elizabeth Munch (Michigan State University)
- Vidit Nanda (University of Oxford)
- Katharine Turner (Australian National University)
- Yusu Wang (UC San Diego)
Monday, April 26All times CDT (UTC-5)
|8:50-9:00||Introductory remarks: IMSI Director and workshop organizers|
Katharine Turner (Australian National University)|
Algebraic Wasserstein distance between persistence modules
|9:50-10:15||Discussion and Break|
Elchanan Solomon (Duke University)|
What is the “right” topological invariant of a large point cloud X? Prior research has focused on estimating the full persistence diagram of X, a quantity that is very expensive to compute, unstable to outliers, and far from a sufficient statistic. We therefore propose that the correct invariant is not the persistence diagram of X, but rather the collection of persistence diagrams of many small subsets. This invariant, which we call “distributed persistence,” is trivially parallelizable, more stable to outliers, and has a rich inverse theory. The map from the space of point clouds (with the quasi-isometry metric) to the space of distributed persistence invariants (with the Hausdorff-Bottleneck distance) is a global quasi-isometry. This is a much stronger property than simply being a sufficient statistic, and is to our knowledge the only result of its kind in the TDA literature. Moreover, the quasi-isometry bounds depend on the size of the subsets taken, so that as the size of these subsets goes from small to large, the invariant interpolates between a purely geometric one and a purely topological one. Lastly, we note that our inverse results do not actually require considering all subsets of a fixed size (an enormous collection), but a relatively small collection satisfying certain covering properties, properties that arise with high probability when randomly selecting sufficiently many subsets. These theoretical results are complemented by experiments showcasing the success of distributed persistence at solving a number of morphometric challenges. This is joint work with Alex Wagner and Paul Bendich.From Geometry to Topology: Inverse Theorems for Distributed Persistence
Nicholas Berkouk (Ecole Polytechnique Fédérale de Lausanne)
In this talk, I will provide a definition of ephemeral multi-persistent modules and prove that the quotient of persistent modules by the ephemeral ones is equivalent to the category of γ-sheaves. In the case of one-dimensional persistence, our definition agrees with the usual one showing that the observable category and the category of γ-sheaves are equivalent. I will also establish isometry theorems between the category of persistent modules and γ-sheaves both endowed with their interleaving distance. Finally, we compare the interleaving and convolution distances. Altogether, these results pave a new way to define dimension reduction techniques for multi-parameter persistence modules.
Joint work with François Petit.Ephemeral persistence modules and distance comparison
Yusu Wang (University of California, San Diego)|
In recent years, topological and geometric data analysis (TGDA) has emerged as a new and promising field for processing, analyzing and understanding complex data. Indeed, geometry and topology form natural platforms for data analysis, with geometry describing the ”shape” and ”structure” behind data; and topology characterizing / summarizing both the domain where data are sampled from, as well as functions and maps associated to them. In this talk, I will show how the topological objects from discrete Morse theory and persistent homology can be used to reconstruct hidden geometric graphs; how such an approach can be extended to handle high-dimensional point clouds data (with sparsification); and how they can be then combined with machine learning pipelines for further data analysis tasks. This talk is based on multiple projects with multiple collaborators and references will be given during the talk.Discrete Morse-based graph reconstruction and data analysis
|1:50-2:15||Discussion and Break|
Celia Hacker (Ecole Polytechnique Fédérale de Lausanne)|
The well known node2vec algorithm has been used to explore network structures and represent the nodes of a graph in a vector space in a way that reflects the structure of the graph. Random walks in node2vec have been used to study the local structure through pairwise interactions. Our motivation for this project comes from a desire to understand higher-order relationships by a similar approach. To this end, we propose an extension of node2vec to a method for representing the k-simplices of a simplicial complex into Euclidean space.
In this presentation I outline a way to do this by performing random walks on simplicial complexes, which have a greater variety of adjacency relations to take into account than in the case of graphs. The walks on simplices are then used to obtain a representation of the simplices. We will show cases in which this method can uncover the roles of higher order simplices in a network and help understand structures in graphs that cannot be seen by using just the random walks on the nodes.A simplicial extension of node2vec
Bei Wang (University of Utah)|
Merge trees are a type of topological descriptors that record the connectivity among the sublevel sets of scalar fields. In this paper, we are interested in sketching a set of merge trees. That is, given a set T of merge trees, we would like to find a basis set S such that each tree in T can be approximately reconstructed from a linear combination of merge trees in S. A set of high-dimensional vectors can be sketched via matrix sketching techniques such as principal component analysis and column subset selection.
However, up until now, topological descriptors such as merge trees have not been known to be sketchable.
We develop a framework for sketching a set of merge trees that combines the Gromov-Wasserstein framework of Chowdhury and Needham with techniques from matrix sketching.We demonstrate the applications of our framework in sketching merge trees that arise from data ensembles in scientific simulations.
This is joint work with Mingzhe Li and Sourabh Palande.Sketching Merge Trees
Tuesday, April 27
Lorin Crawford (Microsoft Research)|
The recent curation of large-scale databases with 3D surface scans of shapes has motivated the development of tools that better detect global-patterns in morphological variation. Studies which focus on identifying differences between shapes have been limited to simple pairwise comparisons and rely on pre-specified landmarks (that are often known). In this talk, we present SINATRA: a statistical pipeline for analyzing collections of shapes without requiring any correspondences. Our method takes in two classes of shapes and highlights the physical features that best describe the variation between them.The SINATRA pipeline implements four key steps. First, SINATRA summarizes the geometry of 3D shapes (represented as triangular meshes) by a collection of vectors (or curves) that encode changes in their topology. Second, a nonlinear Gaussian process model, with the topological summaries as input, classifies the shapes. Third, an effect size analog and corresponding association metric is computed for each topological feature used in the classification model. These quantities provide evidence that a given topological feature is associated with a particular class. Fourth, the pipeline iteratively maps the topological features back onto the original shapes (in rank order according to their association measures) via a reconstruction algorithm. This highlights the physical (spatial) locations that best explain the variation between the two groups.We use a rigorous simulation framework to assess our approach, which themselves are a novel contribution to 3D image analysis. Lastly, as a case study, we use SINATRA to analyze mandibular molars from four different suborders of primates and demonstrate its ability recover known morphometric variation across phylogenies.Statistical Frameworks for Mapping 3D Shape Variation onto Genotypic and Phenotypic Variation
|9:50-10:15||Discussion and Break|
Woojin Kim (Duke University)|
Metrics in computational topology are often either (i) themselves in the form of the interleaving distance d(F,G) between certain order-preserving maps F and G between posets or (ii) admit d(F,G) as a tractable lower bound. In this talk, assuming that the target poset of F and G admits a join-dense subset, we propose certain join representations of F and G which facilitate the computation of d(F,G). We leverage this result in order to (i) elucidate the structure and computational complexity of the interleaving distance for poset-indexed clusterings (i.e. poset-indexed subpartition-valued functors), (ii) to clarify the relationship between the erosion distance by Patel and the graded rank function by Betthauser, Bubenik, and Edwards, and (iii) to reformulate and generalize the tripod distance by the second author. This is joint work with Facundo Memoli and Anastasios Stefanou. https://arxiv.org/abs/1912.04366Interleaving by parts for persistence in a poset
Antonio Rieser (CONACYT-CIMAT, A.C.)
There have been a number of attempts to extend the realm of application of algebraic topological tools to discrete spaces such as graphs, digital images, and point clouds, which one more typically encounters in computer science and data analysis. In each of these theories, one of two strategies has typically been taken. In topological data analysis, one usually replaces the original space with one or more topological spaces that one hopes will retain the relevant topological information in the original set. In various approaches to discrete or digital topology, we find instead different attempts to develop algebraic topology from scratch for some class of discrete objects of interest, proceeding largely by analogy with classical algebraic topology. In this work, we propose a third option: we generalize algebraic topology to categories which contain both the topological spaces classically treated by classical homotopy theory, but which also include as objects the more discrete and combinatorial spaces of interest in applications. The advantage here is that there are now non-trivial ‘continuous maps’ from classical topological spaces to the discrete spaces (given the appropriate structure), and one may then compare the resulting topological invariants on each side functorially. We find that there are a number of possible such categories, each with its own particular homotopy theory and associated homologies, and, additionally, that there is a generalization of the coarse category which allows finite sets to be non-trivial (i.e. not ‘coarsely’ equivalent to a point). We will give an overview of these theories and several applications, discussing the advantages and disadvantages of each.Algebraic topology in the mesoscopic regime
Elizabeth Munch (Michigan State University)|
Reeb graphs and main other related graphical signatures have extensive use in applications, but only recently has there been intense interest in finding metrics for these objects. In this talk, we focus on the interleaving distance, which is a categorical reforumlation of the epynomous metric from persistence modules. In this talk, we introduce an extension of smoothing on Reeb graphs, which we call truncated smoothing; this in turn allows us to define a new family of metrics which generalize the interleaving distance for Reeb graphs. Intuitively, we “chop off” parts near local minima and maxima during the course of smoothing. After formalizing truncation as a functor, we show that when applied after the smoothing functor, this prevents extensive expansion of the range of the function, and yields particularly nice properties. Further, for certain choices of the truncation parameter, we can construct a categorical flow for any choice of slope $m in [0,1]$, which gives a family of interleaving distances. While the resulting metrics are not stable, we show that any pair of these for $m,m’ in [0,1)$ are strongly equivalent metrics, which in turn gives stability of each metric up to a multiplicative constant.The Truncated Interleaving Distance for Reeb Graphs
|1:50-2:15||Discussion and Break|
|2:15-3:45||Poster session in Teamflow|
Wednesday, April 28
Facundo Memoli (Ohio State University)|
We study an invariant of compact metric spaces which combines the notion of curvature sets introduced by Gromov in the 1980s together with the notion of Vietoris-Rips persistent homology. For given integers k≥0 and n≥1 these invariants arise by considering the degree k Vietoris-Rips persistence diagrams of all subsets of a given metric space with cardinality at most n. We call these invariants (n,k)-persistence sets. We argue that computing these invariants could be significantly easier than computing the usual Vietoris-Rips persistence diagrams. We establish stability results for these invariants and we also precisely characterize some of them in the case of spheres with geodesic and Euclidean distances. We also identify a rich family of metric graphs for which the (4,1)-persistence sets fully recover their homotopy type. Along the way we prove some useful properties of Vietoris-Rips persistence diagrams.Curvature sets over persistence diagrams
|9:50-10:15||Discussion and Break|
Ximena Fernández (Swansea University)|
Typically, persistence diagrams computed from a sample depend strongly on the distance associated to the data. When the point cloud is a sample of a Riemannian manifold embedded in a Euclidean space, an estimator of the intrinsic distance is relevant to obtain persistence diagrams from data that capture its intrinsic geometry.
In this talk, we consider a computable estimator of a Riemannian metric known as Fermat distance, that accounts for both the geometry of the manifold and the density that produces the sample. We prove that the metric space defined by the sample endowed with this estimator (known as sample Fermat distance) converges a.s. in the sense of Gromov-Hausdorff to the manifold itself endowed with the (population) Fermat distance. This result is applied to obtain sample persistence diagrams that converge towards an intrinsic persistence diagram. We show that this approach outperforms more standard methods based on Euclidean norm, with theoretical results and computational experiments . E. Borghini, X. Fernández, P. Groisman, G. Mindlin. ‘Intrinsic persistent homology via density-based metric learning’. arXiv:2012.07621 (2020)Intrinsic Persistent Homology via density-based metric learning
Luis Scoccola (Michigan State University)
Synchronization problems, such as the problem of reconstructing a 3D shape from a set of 2D projections, can often be modeled by principal bundles. Similarly, the application of local PCA to a point cloud concentrated around a manifold approximates the tangent bundle of the manifold. In the first case, the characteristic classes of the bundle provide obstructions to global synchronization, while, in the second case, they provide topological information of the manifold beyond its homology, and give obstructions to dimensionality reduction. I will describe joint work with Jose Perea in which we propose notions of approximate and discrete vector bundle, study the extent to which they determine true vector bundles, and give algorithms for the stable and consistent computation of low-dimensional characteristic classes directly from these combinatorial representations.Approximate and discrete vector bundles
Anthea Monod (Imperial College)|
Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval. This recently has been achieved by embedding the graphical structure of the database into a manifold so that the hierarchy is preserved. Persistent homology provides a rigorous characterization for the database topology in terms of both its hierarchy and connectivity structure. We compute persistent homology on a variety of datasets and show that some commonly used embeddings fail to preserve the connectivity. Moreover, we show that embeddings which successfully retain the database topology coincide in persistent homology. We introduce the dilation-invariant bottleneck distance to capture this effect, which addresses metric distortion on manifolds. We use it to show that distances between topology-preserving embeddings of databases are small.Topological Data Analysis of Database Representations for Information Retrieval
|1:50-2:15||Discussion and Break|
Chul Moon (Southern Methodist University)|
Tumor shape and size have been used as important markers for cancer diagnosis and treatment. This paper proposes a topological feature computed by persistent homology to characterize tumor progression from digital pathology and radiology images and examines its effect on the time-to-event data. The proposed topological features are invariant to scale-preserving transformation and can summarize various tumor shape patterns. The topological features are represented in functional space and used as functional predictors in a functional Cox proportional hazards model. The proposed model enables interpretable inference about the association between topological shape features and survival risks. Two case studies are conducted using lung cancer pathology and brain tumor radiology images. The results show that the topological features predict survival prognosis after adjusting clinical variables, and the predicted high-risk groups have significantly worse survival outcomes than the low-risk groups (p-values$<$0.005 for both studies). Also, the topological shape features found to be positively associated with survival hazards are irregular and heterogeneous shape patterns, which are known to be related to tumor progression. Our study provides a new perspective for understanding tumor shape and patient prognosis.Predicting Survival Outcomes using Topological Shape Features of AI-reconstructed Medical Images
Thursday, April 28
Marc Lange (Elbformat Consulting)|
When our topology ancestors were playing with simplicial complexes to devise models of RP^2 or common triangulations of surfaces they could not have envisioned how ubiquitous and large graph structures are nowadays. Starting with reminders on elementary collapses, clique complexes and a touch of NP-completeness for maximal cliques, I will illustrate the relevant business examples we have found in our Data Science practice as elbformat consulting in Hamburg, Germany. Cases will include Structural Website Optimisations, User Funnel Analyses, Process Mining insights and a Recommendation Engine Prototype.Back to Basics – Topology of Simplicial Complexes for Business Optimisations
Alexander Wagner (Duke University)|
The computational cost of calculating the persistence diagram for a large input inhibits its use in a deep learning framework. The fragility of the persistence diagram to outliers and the instability of critical cells present further challenges. In this talk, I will present two distinct approaches to address these concerns. In the first approach, by replacing the original filtration with a stochastically downsampled filtration on a smaller complex, one can obtain results in topological optimization tasks that are empirically more robust and much faster to compute than their vanilla counterparts. In the second approach, we work with the set of persistence diagrams of subsets of a fixed size rather than with the diagram of the complete point cloud. The benefits of this distributed approach are a greater degree of practical robustness to outliers, faster computation due to parallelizability and scaling of the persistence algorithm, and an inverse stability theory. After outlining these benefits, I will describe a dimensionality reduction pipeline using distributed persistence. This is joint work with Elchanan Solomon and Paul Bendich.Learning with Approximate or Distributed Topology
Teresa Heiss (Institute of Science and Technology Austria)
The following application has motivated us to develop new Computational Geometry and Topology methods, involving Brillouin zones and periodic k-fold persistent homology: We model crystals by (infinite) periodic point sets, i.e. by the union of several translates of a lattice, where every point represents an atom. Two periodic point sets are equivalent if there is a rigid transformation from one to the other. A periodic point set can be represented by a finite cutout s.t. copying this cutout infinitely often in all directions yields the periodic point set. The fact that these cutouts are not unique creates problems when working with them. Therefore, material scientists would like to work with a complete, continuous invariant instead.
In this talk, I will present two continuous invariants that are at least generically complete: Firstly, the density fingerprint, computing the probability that a random ball of radius r contains exactly k points of the periodic point set, for all positive integers k and positive reals r. And secondly, the persistence fingerprint, which is the sequence of order k persistence diagrams, newly defined for infinite periodic point sets, for all positive integers k.
Joint work with Herbert Edelsbrunner, Alexey Garber, Vitaliy Kurlin, Georg Osang, Janos Pach, Morteza Saghafian, Phil Smith, Mathijs Wintraecken.Geometric and Topological Fingerprints for Periodic Crystals
Sara Kalisnik (Bentley University)|
A common problem in data science is to determine properties of a space from a sample. For instance, under certain assumptions a subspace of a Euclidean space may be homotopy equivalent to the union of balls around sample points, which is in turn homotopy equivalent to the Čech complex of the sample. This enables us to determine the unknown space up to homotopy type, in particular giving us the homology of the space. A seminal result by Niyogi, Smale and Weinberger states that if a sample of a closed smooth submanifold of a Euclidean space is dense enough (relative to the reach of the manifold), there exists an interval of radii, for which the union of closed balls around sample points deformation retracts to the manifold. A tangent space is a good local approximation of a manifold, so we can expect that an object, elongated in the tangent direction, will better approximate the manifold than a ball. We present the result that the union of ellipsoids of suitable size around sample points deformation retracts to the manifold while requiring much smaller density than in the case of union of balls. The proof requires new techniques, as unlike the case of balls, the normal projection of a union of ellipsoids is in general not a deformation retraction.Sampling smooth manifolds using ellipsoids
|1:50-2:15||Discussion and Break|
Stefania Ebli (Ecole Polytechnique Fédérale de Lausanne)|
In this talk I will present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices – allowing us to consider richer data, including vector fields and n-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes. This is joint work with M. Defferrard and G.Spreemann.Simplicial Neural Networks
Sadok Kallel (American University of Sharjah
Variations in neuronal morphology among cell classes, brain regions, and animal species are thought to underlie known heterogeneities in neuronal function. Thus, accurate quantitative descriptions and classification of large sets of neurons is essential for functional characterization. However, unbiased computational methods to classify groups of neurons are currently scarce. We introduce a novel, robust, and unbiased method to study neuronal morphologies. We develop mathematical descriptors that quantitatively characterize structural differences among neuronal cell types and thus classify them. Each descriptor that is assigned to a neuron is a function of a distance from the soma with values in real numbers or more general metric spaces. Standard clustering methods enhanced with detection and metric learning algorithms are then used to objectively cluster and classify neurons.Topological Sholl Descriptors for Neuronal Clustering and Classification
Friday, April 30
Vidit Nanda (University of Oxford)|
Many interesting objects across pure and applied mathematics (including persistence modules, cellular sheaves and connection matrices) are most naturally viewed as linear algebraic data parametrized by a finite space. In this talk, I will describe a practical framework for dimensionality reduction and linear optimization 0ver a wide class of such objects.Compatibility and Optimization for Quiver Representations
|9:50-10:15||Discussion and Break|
Barbara Giunti (Technische universität Graz)|
The use of persistent homology in applications is justified by the validity of certain stability results. At the core of such results is a notion of distance between the invariants that one associates to data sets. While such distances are well-understood in the one-parameter case, the situation for multiparameter persistence modules is more challenging, since there is no generalisation of the barcode.
Here we introduce a general framework to study stability questions in multiparameter persistence. We first introduce the (outer) amplitude, a functional on abelian categories that mimics the properties of an outer measure in measure theory, then study different ways to associate distances to such functionals. Our framework is very comprehensive, as many different invariants that have been introduced in the literature are examples of outer amplitudes, and similarly, we show that many known distances for multiparameter persistence are distances from outer amplitudes.
Finally, we provide new stability results using our framework.The amplitude of an abelian category: Measures in persistence theory
Hitesh Gakhar (University of Oklahoma)
Sliding window embeddings were originally used in the study of dynamical systems to reconstruct the topology of underlying attractors from generic observation functions. In 2015, a technique for recurrence detection in time series data using sliding window embeddings of periodic functions and persistent homology was developed. We study a closely related class of functions, namely quasiperiodic functions, whose constitutive frequencies are non-commensurate harmonics. The sliding window embeddings of such functions are dense in high dimensional tori, where the dimension depends on the number of incommensurate harmonics. In this talk, we will present results pertaining to the structure of sliding window embeddings and their persistent homology, along with a brief discussion on how to choose the embedding parameters.Sliding windows persistence of quasiperiodic functions
Ezra Miller (Duke University)|
Fundamental to applications of ordinary persistent homology in one parameter is the reconstruction of a module from the perfect matching between left endpoints and right endpoints of its bar code. Do these concepts have analogues in multiple parameters? The answer is largely yes: endpoints can be defined, and the module can be reconstructed from them, though the correspondence is not a perfect matching but rather a more arbitrary linear map. The algebra needed for these developments will be covered from scratch, followed by a view toward how they might be used for computational purposes.What are left and right endpoints for multiparameter persistence?
|1:50-2:15||Discussion and Break|
Iris Yoon (University of Delaware)|
We present a new method for comparing topological features using dissimilarity matrices obtained from observing activity in distinct complex systems. Our method uses the Dowker complex of a cross-dissimilarity matrix to identify all possible ways a common feature could be represented by the barcodes of activity within the individual systems. This method can be used to study both how distinct systems respond to the same stimuli and how behavior in one system drives behavior in another. Motivated by questions in neuroscience, our framework will allow researchers to investigate open problems such as how neural systems code for complex stimuli and how such coding structures propagate and evolve through different neural systems without direct reference to external correlates. The same tools can also be applied more generally to explore two-dimensional persistence and to identify which topological features are preserved after dimensionality reduction. This is joint work with Chad Giusti (University of Delaware) and Robert Ghrist (University of Pennsylvania)Identifying analogous topological features across multiple systems
Bastian Rieck (ETH Zurich)|
Topological data analysis emerged as an effective tool in machine learning, supporting the analysis of neural networks, but also driving the development of novel algorithms that incorporate topological characteristics. As a problem class, graph classification is of particular interest here, since graphs are inherently amenable to a topological description in terms of their connected components and cycles. This talk will briefly summarise recent advances in topology-based graph classification, focussing equally on ’shallow’ and ‘deep’ approaches. Starting from an intuitive description of persistent homology, we will discuss how to incorporate topological features into the Weisfeiler–Lehman colour refinement scheme, thus obtaining a simple feature-based graph classification algorithm. We will then build a bridge to graph neural networks and demonstrate a topological variant of ‘readout’ functions, which can be learned end-to-end. Care has been taken to make the talk accessible to an audience that may not have been exposed to machine learning or topological data analysis.Recent Advances in Topology-Based Graph Classification