This was part of
Learning Collective Variables and Coarse Grained Models
Manifold coordinates with physical meaning, and applications in MDS data
Hanyu Zhang, TikTok
Thursday, April 25, 2024
Abstract: One of the goals of both linear and non-linear dimension reduction is to identify a reduced set of collective variables that capture the essence of the data manifold. While such algorithms produce abstract coordinates (often represented by spaces spanned by eigenvectors of matrices that depend on the data), it is common to link these coordinates to specific data features, thereby connecting them with domain-specific significance. Typically, experts determine these domain-specific or physical meanings by visual analysis. In this talk, I will formulate the problem as a sparse, non-parametric, and non-linear task of recovering manifold coordinates using a predefined dictionary of domain-relevant functions, a method I call ManifoldLasso. I will demonstrate that the original problem can be converted into a linear Group Lasso problem. Additionally, I will introduce a simpler variant, TSlasso, and argue that non-linear dimension reduction isn't essential, as one can derive physically meaningful manifold coordinates by projecting the gradients of dictionary functions onto tangent spaces. I will further provide an end-to-end recovery guarantee for this approach and illustrate its effectiveness with molecular simulation data.
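As a rough illustration of the tangent-space idea described in the abstract, the Python sketch below estimates tangent spaces by local PCA, projects the gradients of a small dictionary onto them, and solves a group lasso by proximal gradient descent to pick out the dictionary functions that vary along the manifold. It is not the speaker's implementation. The toy data (a unit circle in R^3), the three dictionary functions, the helper names (`tangent_basis`, `projected_gradients`, `group_lasso`), the normalization of the objective, and the values of `k` and `lam` are all assumptions made for illustration.

```python
import numpy as np

def tangent_basis(X, i, k, d):
    """Orthonormal basis (D x d) of the tangent space at X[i], via local PCA."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(dists)[:k]]          # k nearest points, X[i] included
    centered = nbrs - nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:d].T                          # top-d right singular vectors

def projected_gradients(X, grads, k, d):
    """Project dictionary gradients (n, D, p) onto tangent spaces -> (n, d, p)."""
    return np.stack([tangent_basis(X, i, k, d).T @ grads[i]
                     for i in range(len(X))])

def group_lasso(P, lam, n_iter=200):
    """Proximal gradient for an assumed objective:
         (1/2n) sum_i ||I_d - P_i B_i||_F^2 + (lam/sqrt(n)) sum_j ||B[:, j, :]||_F
    Group j gathers the coefficients of dictionary function j at all points."""
    n, d, p = P.shape
    B = np.zeros((n, p, d))
    eye = np.eye(d)
    # Step size 1/L, with L the Lipschitz constant of the smooth term.
    L = max(np.linalg.norm(P[i], 2) ** 2 for i in range(n)) / n
    step = 1.0 / L
    thresh = step * lam / np.sqrt(n)
    for _ in range(n_iter):
        resid = P @ B - eye                              # (n, d, d)
        B = B - step * (np.transpose(P, (0, 2, 1)) @ resid) / n
        norms = np.sqrt((B ** 2).sum(axis=(0, 2)))       # one norm per function
        shrink = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
        B = B * shrink[None, :, None]                    # group soft-threshold
    return B

# Toy data (assumption): unit circle in the z = 0 plane of R^3, so d = 1.
n = 200
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta), np.zeros(n)], axis=1)

# Hypothetical dictionary and its analytic gradients:
#   g0 = atan2(y, x)  (varies along the circle)
#   g1 = z            (gradient normal to the circle)
#   g2 = x^2 + y^2    (gradient radial, also normal)
x, y = X[:, 0], X[:, 1]
r2 = x ** 2 + y ** 2
zeros, ones = np.zeros(n), np.ones(n)
grads = np.stack([
    np.stack([-y / r2, x / r2, zeros], axis=1),
    np.stack([zeros, zeros, ones], axis=1),
    np.stack([2 * x, 2 * y, zeros], axis=1),
], axis=2)                                               # shape (n, D, p)

P = projected_gradients(X, grads, k=11, d=1)
B = group_lasso(P, lam=0.1)
group_norms = np.sqrt((B ** 2).sum(axis=(0, 2)))
support = [j for j in range(len(group_norms)) if group_norms[j] > 1e-8]
print("selected dictionary functions:", support)
```

In this toy setting only the angle-like function has a gradient with a tangential component, so the group penalty should leave it as the sole function in `support`. On real simulation data the gradients would come from the chosen dictionary (for example, torsion angles), and `k`, `d`, and `lam` would need to be tuned.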