This was part of Eliciting Structure in Genomics Data

Two persistent puzzles in multivariate statistics; “rotations” and “picking k”

Karl Rohe, University of Wisconsin-Madison
Monday, August 30, 2021

Abstract: This talk will give new intuition and theory for the hugely popular Varimax rotation. Varimax was proposed in the 1950s, is widely cited, and is available in base R. Unfortunately, many statisticians have not yet heard about it! This talk will give a statistical theory showing that Principal Component Analysis (PCA) with the Varimax rotation is consistent for a broad class of semi-parametric models that includes Stochastic Blockmodels and Latent Dirichlet Allocation. Factor rotations have lacked such a statistical theory since Spearman and Thurstone fought about this in the 1930s and 1940s. Time permitting, the second part of this talk will discuss “cross-validated eigenvalues” for data matrices with Poisson elements (or sparse Bernoulli elements). This approach is fast to run and provides a simple CLT and p-value for sample eigenvectors, even in settings where “signal eigenvectors” might not be detectable.