Speaker
Description
CLASSIX is a fast and explainable algorithm for data clustering. In its original form, it measures proximity with the Euclidean distance and sorts the data points by the first principal component of the data matrix; the Cauchy-Schwarz inequality then allows the search for nearby points to be truncated along this sorted order. In this work, we extend CLASSIX to other distance measures and demonstrate its effectiveness with the Manhattan distance and the Tanimoto distance. For both of these distances, CLASSIX sorts the data vectors by their 1-norm. The triangle inequality serves as a general search truncation criterion applicable to any p-norm, while the Baldi intersection inequality serves as the truncation criterion for the Tanimoto distance.
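To illustrate how sorting by the 1-norm lets a neighbor search stop early, here is a minimal NumPy sketch. It is not the CLASSIX implementation, and all function names are hypothetical. The Manhattan rule uses the triangle inequality in the form |‖x‖₁ − ‖y‖₁| ≤ ‖x − y‖₁. For the Tanimoto case it assumes 0/1 vectors, where the 1-norm is the bit count, and uses the bound T(x, y) ≤ min(|x|, |y|) / max(|x|, |y|); we assume this matches the Baldi intersection inequality mentioned above, but the exact form used in the work may differ.

```python
import numpy as np


def sort_by_l1(X):
    """Sort rows of X by their 1-norm (for 0/1 rows this is the bit count)."""
    norms = np.abs(X).sum(axis=1)
    order = np.argsort(norms, kind="stable")
    return X[order], norms[order]


def manhattan_window(Xs, norms, i, radius):
    """Sorted positions j >= i with ||Xs[j] - Xs[i]||_1 <= radius."""
    found = []
    for j in range(i, len(Xs)):
        # Triangle inequality: | ||x||_1 - ||y||_1 | <= ||x - y||_1.
        # Norms are ascending, so once the gap exceeds radius, no later j can qualify.
        if norms[j] - norms[i] > radius:
            break
        if np.abs(Xs[j] - Xs[i]).sum() <= radius:
            found.append(j)
    return found


def tanimoto_distance(x, y):
    """1 - |x AND y| / |x OR y| for 0/1 vectors; taken as 0 when both are empty."""
    union = np.logical_or(x, y).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(x, y).sum() / union


def tanimoto_window(Bs, counts, i, radius):
    """Sorted positions j >= i with Tanimoto distance to Bs[i] <= radius."""
    found = []
    for j in range(i, len(Bs)):
        a, b = counts[i], counts[j]  # a <= b because counts are ascending
        # Assumed bound: similarity <= min(a, b) / max(a, b) = a / b,
        # so distance >= 1 - a / b, and this lower bound only grows for later j.
        if b > 0 and 1.0 - a / b > radius:
            break
        if tanimoto_distance(Bs[i], Bs[j]) <= radius:
            found.append(j)
    return found
```

The returned indices refer to positions in the sorted arrays. Scanning only forward (j ≥ i) mirrors a sorted, one-directional search; a full neighborhood would also scan backward with the same stopping rule.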