Description
In modern machine learning applications, data matrices are commonly modeled as having a signal-plus-noise structure. Typically, one assumes that the spectra of the signal and noise matrices are well separated and that the noise subspaces exert only a marginal influence. While these assumptions are readily verified for dense matrices via classical random matrix theory, real-world data is often sparse, posing significant theoretical challenges. We investigate the spectral properties of a matrix $X \in \mathbb{R}^{n \times m}$ with i.i.d. $\text{Bernoulli}(p)$ entries. Prior work showed that when $np \gg \log(n)$, the singular values of $X$ almost surely remain within the compact support of the Marčenko–Pastur (MP) distribution. However, we identify a critical sparsity regime $p = b \log(n) / \sqrt{mn}$ in which this classical result fails, and we provide a quantitative characterization of the emergence of outlier singular values. For explicit thresholds $b_*$ and $b^*$, given as functions of the aspect ratio $\gamma = n/m \ge 1$, we prove a three-phase transition: (1) for $b > b_*$, no outliers exist; (2) for $b^* < b < b_*$, outliers emerge only beyond the right edge of the MP law; and (3) for $b < b^*$, outliers appear on both sides of the bulk, all with high probability. The locations of these outliers are precisely determined by the largest- and smallest-degree vertices of the underlying random graph. Moreover, the behavior of the singular vectors corresponding to bulk and edge singular values can be characterized precisely. Our approach follows the framework established by Alt, Ducatez, and Knowles (2021) and can be extended to sparse matrices with general bounded entries.
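As a rough numerical illustration (not part of the results above), one can sample a Bernoulli$(p)$ matrix, center and rescale it to unit-variance entries, and compare its top singular values against the right edge of the MP bulk in a dense versus a sparse regime. The matrix sizes, random seed, and the value $b = 0.2$ below are arbitrary choices for this sketch; whether outliers actually appear at these finite sizes is random.

```python
import numpy as np

# Illustrative sketch: top singular values of a centered, rescaled
# Bernoulli(p) matrix vs. the right edge of the Marchenko-Pastur bulk.
# Sizes, seed, and b are arbitrary choices for illustration.
rng = np.random.default_rng(0)
n, m = 2000, 1000            # aspect ratio gamma = n/m = 2 >= 1
gamma = n / m

def top_singular_values(p, k=5):
    """Top-k singular values of (X - p) / sqrt(n p (1 - p)) for X ~ Bernoulli(p)."""
    X = (rng.random((n, m)) < p).astype(float)
    Y = (X - p) / np.sqrt(p * (1 - p))   # i.i.d. entries, mean 0, variance 1
    return np.linalg.svd(Y / np.sqrt(n), compute_uv=False)[:k]

# With this normalization, the MP bulk for the singular values is
# [1 - 1/sqrt(gamma), 1 + 1/sqrt(gamma)].
mp_right_edge = 1 + 1 / np.sqrt(gamma)

dense = top_singular_values(p=0.5)       # n*p >> log(n): no outliers expected
b = 0.2                                  # small b: outliers can appear
p_sparse = b * np.log(n) / np.sqrt(m * n)
sparse = top_singular_values(p=p_sparse)

print("MP right edge:", mp_right_edge)
print("dense  top-5 :", dense)
print("sparse top-5 :", sparse)
```

In the dense run the largest singular value should sit essentially at the MP edge $1 + 1/\sqrt{\gamma}$, while in the sparse run any singular values clearly above the edge correspond to high-degree rows or columns of the underlying bipartite random graph.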