Speaker
Description
Spectral clustering is a graph-based method for data partitioning, which relates a relaxed graph partitioning problem to the eigenvectors of the associated graph Laplacian; see, e.g., [Peng et al., SIAM J. Comput. 46(2):710–743, 2017]. Following the computation of these eigenvectors, a post-processing step is used to determine the final clustering; k-means is the standard choice of post-processing algorithm. Recently [Eldén, SIAM J. Matrix Anal. Appl. 45(1):112–133, 2024], an algorithm called Semi-sparse Orthogonal (SSO) Approximation has been proposed as an alternative that improves both clustering accuracy and stability. Given the eigenvector matrix $X \in \mathbb{R}^{n\times p}$, SSO seeks $\min_{Q,\Psi}\ \|XQ-\Psi\|_F,$ where $Q\in \mathbb{R}^{p\times p}$ is orthogonal and $\Psi\in \mathbb{R}^{n\times p}$ is an indicator matrix with one nonzero per row. The method alternates between updating $\Psi$ by assigning, in each row, the largest-magnitude entry of the corresponding row of $XQ$, and updating $Q$ by the polar factor of $\Psi^T X$.
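The alternating scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not Eldén's exact implementation: the identity initialization of $Q$, the fixed iteration count, and the choice to place the value of $XQ$ itself in the nonzero slot of $\Psi$ are assumptions; the $Q$-update computes the polar factor of $X^T\Psi$ (the transpose of $\Psi^T X$) via a dense SVD.

```python
import numpy as np

def sso(X, n_iter=50):
    """Sketch of the SSO alternating minimization of ||X Q - Psi||_F.

    X has orthonormal columns (eigenvectors of the graph Laplacian).
    Initialization, stopping rule, and the nonzero values placed in Psi
    are assumptions for illustration, not the published scheme.
    """
    n, p = X.shape
    Q = np.eye(p)                          # assumed initialization
    rows = np.arange(n)
    for _ in range(n_iter):
        M = X @ Q
        # Psi-update: keep only the largest-magnitude entry in each row.
        j = np.argmax(np.abs(M), axis=1)
        Psi = np.zeros_like(M)
        Psi[rows, j] = M[rows, j]
        # Q-update: orthogonal Q maximizing trace(Q^T X^T Psi), i.e. the
        # polar factor of X^T Psi, computed here via a dense SVD.
        U, _, Vt = np.linalg.svd(X.T @ Psi)
        Q = U @ Vt
    labels = np.argmax(np.abs(X @ Q), axis=1)
    return Q, labels
```

The final cluster label of each data point is the column index of the surviving nonzero in its row of $\Psi$.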
In this work, we consider clustering problems on large and dense graphs with a large number of clusters. In such settings, standard Arnoldi-type eigensolvers become computationally unattractive, and computing the polar factor via the singular value decomposition becomes infeasible at these scales. We present an ongoing investigation that circumvents these difficulties by replacing explicit eigenvector computations with the spectral projector of the graph Laplacian, and by performing the post-processing via a large-scale variant of SSO. Our approach exploits methods from matrix functions to compute the spectral projectors and the polar factors using polynomial approximations. This avoids the explicit computation of individual eigenvectors and is well suited to large-scale, dense problems.
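As a concrete example of an SVD-free, polynomial-based polar factor computation, one can use the Newton–Schulz iteration, a classical matrix-polynomial scheme built solely from matrix multiplications. This is a generic sketch of that technique, not the authors' large-scale variant; the Frobenius-norm scaling and the fixed iteration count are assumptions.

```python
import numpy as np

def polar_newton_schulz(A, n_iter=30):
    """Polar factor of a full-rank matrix A via the Newton-Schulz iteration.

    Iterates Y <- Y (3 I - Y^T Y) / 2, which converges to the orthogonal
    polar factor of A when the singular values of the starting matrix lie
    in (0, sqrt(3)).  Scaling by the Frobenius norm puts them in (0, 1].
    A fixed iteration count is used here instead of a convergence test.
    """
    p = A.shape[1]
    Y = A / np.linalg.norm(A)        # Frobenius scaling
    I = np.eye(p)
    for _ in range(n_iter):
        Y = 0.5 * Y @ (3.0 * I - Y.T @ Y)
    return Y
```

Because the iteration uses only matrix products, it is well suited to the large, dense setting described above, where forming an SVD is prohibitive.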