Speaker
Description
As machine learning and AI continue to shape modern hardware design, reduced-precision arithmetic has become essential in high-performance computing. Recent hardware architectures, including AI accelerators, GPUs, and tensor cores, many of them driven by machine-learning workloads, are optimized for low-precision operations that improve performance and reduce energy consumption in computationally intensive scientific applications. The primary drawback of uniformly low-precision methods, however, is their potential numerical instability, particularly when solving linear systems. As a result, mixed-precision strategies have become widely used in numerical linear and multilinear algebra to balance performance, energy efficiency, and accuracy while still exploiting ML-optimized hardware.
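The accuracy trade-off behind mixed precision can be made concrete with a small experiment (an illustrative toy, not material from the talk): store data in half precision but accumulate a dot product in single precision, in the spirit of tensor-core multiply-accumulate units. The helper `dot_with_accumulator` is a hypothetical name introduced here for illustration.

```python
import numpy as np

def dot_with_accumulator(x, y, acc_dtype):
    # Dot product of half-precision inputs with an explicit accumulator
    # dtype, rounding after every operation -- a toy model of mixed
    # precision, not an optimized implementation.
    s = acc_dtype(0.0)
    for a, b in zip(x, y):
        s = acc_dtype(s + acc_dtype(acc_dtype(a) * acc_dtype(b)))
    return float(s)

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float16)

# Reference value computed from the same fp16 data in double precision,
# so only the accumulation precision differs between the two runs.
exact = float(np.dot(x.astype(np.float64), x.astype(np.float64)))
err_uniform = abs(dot_with_accumulator(x, x, np.float16) - exact) / exact
err_mixed = abs(dot_with_accumulator(x, x, np.float32) - exact) / exact
print(f"uniform fp16 error: {err_uniform:.2e}, "
      f"fp16 data + fp32 accumulation error: {err_mixed:.2e}")
```

Typically the uniformly half-precision sum loses several digits relative to the mixed-precision one, even though both read the same half-precision inputs; this is the kind of balance between hardware-friendly storage and accurate accumulation that mixed-precision strategies exploit.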
QR factorizations are among the core algorithms in numerical linear algebra and play an increasing role in machine-learning applications such as randomized embeddings, dimensionality reduction, and least-squares problems. Their computational cost, however, remains significant for large-scale matrices. In this talk, we first present a uniform half-precision Householder QR decomposition based on the WY representation and examine both its potential performance advantages on modern AI hardware and its significant limitations. We then show how increasing the precision of key computations can substantially improve accuracy, and how these choices interact with ML-optimized hardware. Finally, we discuss how reduced- and mixed-precision strategies can be incorporated into other QR decomposition algorithms, including shifted CholeskyQR3 and TSQR, to further leverage modern AI hardware and software ecosystems.
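One way to see the instability that motivates the talk is to run a plain Householder QR in which every intermediate is rounded to a chosen precision and then measure how the orthogonality of Q degrades. The sketch below is an illustrative toy under that assumption, not the WY-based implementation presented in the talk:

```python
import numpy as np

def householder_qr(A, dtype):
    # Unblocked Householder QR with all intermediates stored in `dtype`,
    # imitating uniformly reduced-precision arithmetic (illustrative only).
    m, n = A.shape
    R = A.astype(dtype)
    Q = np.eye(m, dtype=dtype)
    for k in range(n):
        x = R[k:, k]
        norm_x = np.sqrt(np.dot(x, x))       # rounded to dtype by NumPy
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])    # stable sign choice
        v /= np.sqrt(np.dot(v, v))           # unit Householder vector
        # Apply the reflector I - 2 v v^T to R and accumulate it into Q.
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :]).astype(dtype)
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v).astype(dtype)
    return Q, np.triu(R)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
for dtype in (np.float16, np.float32, np.float64):
    Q, R = householder_qr(A, dtype)
    Q64 = Q.astype(np.float64)
    orth = np.linalg.norm(Q64.T @ Q64 - np.eye(50))
    print(dtype.__name__, orth)
```

Running this shows the loss of orthogonality growing as the working precision drops from double to half, which is the gap that raising the precision of key computations, or switching to algorithms such as shifted CholeskyQR3 and TSQR, aims to close.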