Speaker
Description
The CANDECOMP/PARAFAC (CP) decomposition is a powerful tool used for multiway data analysis and to break the “curse of dimensionality” associated with higher-order tensors. The most common way to compute the CP decomposition of a tensor is via a standard alternating least squares (CP-ALS) algorithm. With the CP-ALS, one must iteratively solve a set of overdetermined least squares problem which can quickly become expensive in time and memory. Recently, Battaglino et al[1] developed a randomized CP-ALS algorithm that significantly reduced the cost of the optimization procedure with little impact on the CP decomposition's accuracy. The strategy leverages the structure of the Khatri-Rao product to efficiently compute a sampling probability distribution for each least squares sub-problem. Unfortunately, on modern computer architecture sampling a tensor is a memory-bound operation, i.e. the computational performance of the randomized CP-ALS method can become bound by the bandwidth and latency of a machine. This work seeks to improve on the computational performance of the randomized, sampled CP-ALS strategy by studying the effects of sampling blocks of columns in each least squares problem. In this talk, we illustrate how block-wise sampling impacts the theoretical performance of the randomized CP-ALS algorithm and provide numerical results comparing the block and standard sampled CP-ALS decomposition.
References:
[1]. Casey Battaglino, Grey Ballard, and Tamara G. Kolda. A practical randomized CP tensor decomposition. SIAM J. Matrix Anal. Appl., 39(2):876–901, 2018.