Speaker
Description
An increasing number of theoretical results are available to characterize the extent to which neural networks can (i) represent scientifically relevant functions and operators, and (ii) learn these functions and operators from data. However, even with the right network architecture and the right dataset, optimization is a bottleneck. On the one hand, popular machine learning optimizers such as SGD and Adam are not designed to attain the levels of precision needed for scientific applications. On the other hand, traditional approaches to high-precision optimization, such as Newton or quasi-Newton methods, are not designed for the highly stochastic training regimes that are characteristic of machine learning. In this talk, I will argue that these limitations can be overcome by adopting randomized linear algebra as a paradigm for high-precision scientific machine learning. As an example, I will share some recent work that uses this paradigm to both explain and improve a promising class of optimizers known as subsampled natural gradient algorithms.