Description
Analyzing large-scale scientific data, such as molecular dynamics simulations of $\mathrm{MoS_2}$ recrystallization, poses significant challenges for traditional methods like Nonnegative Matrix Factorization (NMF), particularly on exascale systems. In this talk, we introduce Low-Rank Approximations with Constraints at Exascale (LORACX), a scalable framework that integrates distributed, GPU-accelerated NMF into a modern, Python-based HPC stack.
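For readers new to NMF: given a nonnegative matrix $A \in \mathbb{R}^{m \times n}$, it seeks nonnegative factors $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ with $A \approx WH$. The single-node NumPy sketch below illustrates the idea using the classic multiplicative-update rule. This is a stand-in chosen for brevity, not the distributed NNLS-based solver used in LORACX; the function name `nmf_multiplicative` and all parameters are illustrative assumptions.

```python
import numpy as np

def nmf_multiplicative(A, k, iters=200, eps=1e-12, seed=0):
    """Illustrative NMF via multiplicative updates (not the LORACX solver)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random nonnegative starting factors.
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Elementwise updates keep W and H nonnegative by construction.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Small synthetic test matrix with exact nonnegative rank 4.
rng = np.random.default_rng(1)
A = rng.random((60, 4)) @ rng.random((4, 40))
W, H = nmf_multiplicative(A, k=4)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.3e}")
```

The columns of `W` play the role of basis patterns and the columns of `H` give each data point's nonnegative weights on those patterns, which is what makes the factors interpretable.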
Key innovations include communication-efficient designs that use blocked and overlapped algorithms to mitigate latency and memory constraints, as well as GPU-optimized Nonnegative Least Squares (NNLS) solvers. Performance evaluations on up to 8,192 Frontier nodes demonstrate strong scalability, processing a 16.3 × 16.3 million matrix in 3 seconds and achieving 0.67 exaflops in double precision. We present detailed weak-scaling results, including an analysis of computation versus communication costs, and show that LORACX-GPU consistently outperforms the baselines.
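The abstract does not spell out the blocked design, so the following is only a rough illustration of one common pattern, under stated assumptions. When updating $H$ with $W$ fixed, the NNLS subproblem $\min_{H \ge 0} \lVert A - WH \rVert_F$ depends on the data only through $W^\top W$ and $W^\top A$. Both are sums over row blocks, so they can be accumulated block by block with bounded working memory. In a distributed run each block would live on a different process and the sum would become a reduction across processes. The helper `blocked_normal_equations` and the block size are hypothetical, and the loop here runs serially on one node.

```python
import numpy as np

def blocked_normal_equations(A, W, block_rows):
    """Accumulate G = W^T W and B = W^T A over row blocks (serial emulation)."""
    k = W.shape[1]
    G = np.zeros((k, k))
    B = np.zeros((k, A.shape[1]))
    for start in range(0, A.shape[0], block_rows):
        stop = start + block_rows
        W_blk = W[start:stop]   # rows owned by one hypothetical process
        A_blk = A[start:stop]
        # Each block contributes a partial sum; a distributed version
        # would combine these partial sums with a reduction.
        G += W_blk.T @ W_blk
        B += W_blk.T @ A_blk
    return G, B

rng = np.random.default_rng(0)
A = rng.random((1000, 50))
W = rng.random((1000, 8))
G, B = blocked_normal_equations(A, W, block_rows=128)
print(np.allclose(G, W.T @ W), np.allclose(B, W.T @ A))
```

Only one block of `A` and `W` is touched at a time, and the accumulated `G` and `B` are small (`k × k` and `k × n`), which is the kind of memory saving that blocking is meant to provide.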
Applied to $\mathrm{MoS_2}$ molecular dynamics data, LORACX successfully identifies structural motifs and captures phase transition dynamics, highlighting its potential as a powerful tool for large-scale materials science discovery.