This was part of Learning Collective Variables and Coarse Grained Models

Manifold coordinates with physical meaning, and applications in MDS data

Hanyu Zhang, TikTok

Thursday, April 25, 2024

Abstract: One of the goals of both linear and non-linear dimension reduction is to identify a reduced set of collective variables that capture the essence of the data manifold. Such algorithms produce abstract coordinates, often represented by spaces spanned by eigenvectors of data-dependent matrices. It is common to link these coordinates to specific data features, thereby giving them domain-specific significance. Typically, experts assign these domain-specific or physical meanings by visual inspection. In this talk, I will formulate the problem as a sparse, non-parametric, non-linear task of recovering manifold coordinates from a predefined dictionary of domain-relevant functions, a method I call ManifoldLasso. I will show that the original problem can be converted into a linear Group Lasso problem. I will also introduce a simpler variant, TSlasso, and argue that non-linear dimension reduction isn't essential: one can derive physically meaningful manifold coordinates by projecting the gradients of dictionary functions onto tangent spaces. Finally, I will present an end-to-end recovery guarantee for this approach and illustrate its effectiveness on molecular simulation data.
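To make the tangent-space idea concrete, here is a minimal NumPy sketch under stated assumptions. It is not the speaker's implementation. Several choices are assumed here rather than taken from the abstract: tangent spaces are estimated by local PCA on the k nearest neighbours, the objective is one plausible formulation that asks the projected dictionary gradients to reproduce an orthonormal tangent basis at every point, and the Group Lasso is solved by plain proximal gradient (ISTA). The function names `local_pca_tangents` and `tslasso`, the toy circle data, the dictionary, `k` and `lam` are all hypothetical. Each group collects one dictionary function's coefficients across all points, so a function is either selected everywhere or dropped everywhere.

```python
import numpy as np


def local_pca_tangents(data, d, k):
    """Hypothetical helper: estimate an orthonormal tangent basis at each point
    by local PCA on its k nearest neighbours. Returns shape (n, D, d)."""
    n, dim = data.shape
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    tangents = np.empty((n, dim, d))
    for i in range(n):
        nbrs = data[np.argsort(dists[i])[:k]]   # k nearest points, including i
        centered = nbrs - nbrs.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        tangents[i] = vt[:d].T                  # top-d principal directions
    return tangents


def tslasso(tangents, grads, lam, n_iter=5000):
    """Assumed objective, solved by proximal gradient (ISTA):
        0.5 * sum_i ||I_d - X_i B_i||_F^2 + lam * sum_j ||B[:, j, :]||_F,
    where X_i = T_i^T grad_i is the d x p matrix of projected gradients.
    tangents: (n, D, d); grads: (n, D, p). Returns B (n, p, d) and group norms."""
    X = np.einsum("nDd,nDp->ndp", tangents, grads)  # project gradients onto T_i
    n, d, p = X.shape
    eye = np.eye(d)
    # The loss separates over points, so its gradient's Lipschitz constant is
    # the largest squared spectral norm among the X_i.
    step = 1.0 / max(np.linalg.norm(X[i], 2) ** 2 for i in range(n))
    B = np.zeros((n, p, d))
    for _ in range(n_iter):
        resid = eye - np.einsum("ndp,npe->nde", X, B)   # (n, d, d)
        grad = -np.einsum("ndp,nde->npe", X, resid)     # (n, p, d)
        B = B - step * grad
        # Block soft-thresholding: group j is B[:, j, :] across all points.
        norms = np.sqrt((B ** 2).sum(axis=(0, 2)))
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = B * shrink[None, :, None]
    return B, np.sqrt((B ** 2).sum(axis=(0, 2)))


# Toy check with a hypothetical dictionary: 100 points on the unit circle (d = 1).
theta = np.linspace(0.0, 2 * np.pi, 100, endpoint=False)
pts = np.column_stack([np.cos(theta), np.sin(theta)])
x, y = pts[:, 0], pts[:, 1]
r2 = x ** 2 + y ** 2
grads = np.stack([
    np.column_stack([-y / r2, x / r2]),                     # g0 = atan2(y, x)
    np.column_stack([np.ones_like(x), np.zeros_like(x)]),   # g1 = x
    np.column_stack([2 * x, 2 * y]),                        # g2 = x^2 + y^2 (normal)
], axis=-1)                                                 # shape (n, D=2, p=3)
T = local_pca_tangents(pts, d=1, k=5)
B, group_norms = tslasso(T, grads, lam=1.0)
print(np.round(group_norms, 3))
```

In this toy setting one would expect the angle g0 to carry essentially all of the weight. The gradient of x^2 + y^2 is normal to the circle, so its tangent projection vanishes and the group is zeroed. The coordinate x is only partially aligned with the tangent direction, so the penalty drops it in favour of the angle. Note that no embedding was computed: only tangent estimates and dictionary gradients enter the fit, which is the point the abstract makes about TSlasso.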