May 18 – 22, 2026
Virginia Tech
America/New_York timezone

Fast and explainable clustering in the Manhattan and Tanimoto distance

May 18, 2026, 11:25 AM
25m
McBryde Hall 113 (Virginia Tech)

Minisymposium Talk: Numerical Linear Algebra in Machine Learning

Speaker

Kaustubh Roy (University of Manchester)

Description

The CLASSIX algorithm is a fast and explainable approach to data clustering. In its original form, the method sorts the data points by their projections onto the first principal component of the data matrix and uses the Cauchy-Schwarz inequality to truncate the search for nearby points, with proximity measured by the Euclidean distance. In this work, we show how CLASSIX can be extended to other distance measures and demonstrate its effectiveness for the Manhattan distance and the Tanimoto distance. For both of these distances, CLASSIX uses the 1-norm of the data vectors as the sorting criterion. The triangle inequality serves as a general search truncation criterion applicable to any p-norm, and the Baldi intersection inequality serves as the search truncation criterion for the Tanimoto distance.
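The Cauchy-Schwarz step of the original method can be written out as follows. Here v denotes the first principal component, assumed normalized to unit 2-norm, alpha_i = v^T x_i is the sorting key of data point x_i, and R is a search radius; these symbols are our own notation, not taken from the abstract.

```latex
|\alpha_i - \alpha_j| = |v^{\mathsf T}(x_i - x_j)| \le \|v\|_2 \, \|x_i - x_j\|_2 = \|x_i - x_j\|_2
```

Hence, once the points are sorted by alpha, any point with |alpha_i - alpha_j| > R lies farther than R from x_i in the Euclidean distance and need not be examined.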
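A minimal sketch of a norm-sorted, truncated search for the Manhattan distance follows, written in plain Python under our own assumptions: points are tuples of numbers, and the names `aggregate_pnorm`, `p_norm`, `p_distance`, the `radius` parameter, and the greedy grouping loop are hypothetical illustrations in the spirit of CLASSIX, not the authors' implementation. Points are sorted by their p-norm (the 1-norm in the Manhattan case), and the reverse triangle inequality | ||x||_p - ||y||_p | <= ||x - y||_p lets the forward scan stop as soon as the norm gap exceeds the radius.

```python
def p_norm(v, p=1.0):
    """p-norm of a vector given as a sequence of numbers."""
    return sum(abs(a) ** p for a in v) ** (1.0 / p)


def p_distance(x, y, p=1.0):
    """Distance induced by the p-norm; p=1 gives the Manhattan distance."""
    return p_norm([a - b for a, b in zip(x, y)], p)


def aggregate_pnorm(points, radius, p=1.0):
    """Hypothetical greedy grouping with a norm-sorted, truncated search.

    Returns one integer group label per input point.
    """
    norms = [p_norm(x, p) for x in points]
    # Sorting key: the p-norm of each point (the 1-norm when p=1).
    order = sorted(range(len(points)), key=lambda k: norms[k])
    labels = [-1] * len(points)
    group = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed into an earlier group
        labels[i] = group  # point i starts a new group
        for q in range(pos + 1, len(order)):
            j = order[q]
            # Reverse triangle inequality: p_distance(x_i, x_j) >= norms[j] - norms[i].
            # norms[j] only grows along the sorted order, so once the gap exceeds
            # the radius, no later point can lie within the radius either.
            if norms[j] - norms[i] > radius:
                break
            if labels[j] == -1 and p_distance(points[i], points[j], p) <= radius:
                labels[j] = group
        group += 1
    return labels
```

For example, `aggregate_pnorm([(0, 0), (5, 5), (0.1, 0), (5.1, 5)], 0.5)` puts the two points near the origin in one group and the two near (5, 5) in another, and the scan from (0, 0) stops at the first point whose 1-norm exceeds 0.5. Because the bound holds for every p, passing another `p` reuses the same loop for other p-norms.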
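For the Tanimoto distance, the sketch below assumes binary fingerprints stored as Python sets of on-bit indices, so the 1-norm of a fingerprint is simply its number of on-bits. The truncation uses the elementary bounds |A ∩ B| <= min(|A|, |B|) and |A ∪ B| >= max(|A|, |B|), which cap the Tanimoto similarity at min(|A|, |B|) / max(|A|, |B|); we assume this is the bound the abstract calls the Baldi intersection inequality. The names `tanimoto_distance` and `aggregate_tanimoto` and the greedy grouping loop are hypothetical illustrations, not the authors' code.

```python
def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def aggregate_tanimoto(fps, radius):
    """Hypothetical greedy grouping with a size-sorted, truncated search.

    Returns one integer group label per fingerprint.
    """
    # 1-norm of a binary vector = number of on-bits = set size.
    sizes = [len(f) for f in fps]
    order = sorted(range(len(fps)), key=lambda k: sizes[k])
    labels = [-1] * len(fps)
    group = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed into an earlier group
        labels[i] = group  # fingerprint i starts a new group
        for q in range(pos + 1, len(order)):
            j = order[q]
            # Here sizes[i] <= sizes[j], so the similarity is at most
            # sizes[i] / sizes[j] and the distance is at least
            # 1 - sizes[i] / sizes[j]. This lower bound never decreases along
            # the sorted order, so once it exceeds the radius the scan can stop.
            if sizes[j] > 0 and 1.0 - sizes[i] / sizes[j] > radius:
                break
            if labels[j] == -1 and tanimoto_distance(fps[i], fps[j]) <= radius:
                labels[j] = group
        group += 1
    return labels
```

With `radius=0.3`, for instance, `{1, 2, 3}` and `{1, 2, 3, 4}` (distance 0.25) share a group, whereas the scan starting from `{1, 2}` stops immediately at `{1, 2, 3}` because the size bound 1 - 2/3 already exceeds the radius, so no intersections are computed for that starter at all.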

Authors

Kaustubh Roy (University of Manchester), Dr Stefan Guettel (University of Manchester)