Speaker
Stanislav Budzinskiy
(University of Vienna)
Description
We address the floating-point computation of compositionally rich functions, concentrating on LLM inference. Based on a rounding error analysis of a composition, we provide an adaptive strategy for selecting the components of the inner function that need to be recomputed more accurately to improve numerical stability. We explain how this strategy can be applied to different compositions within a transformer neural network and illustrate its overall effect on LLM inference.
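To make the idea of selective recomputation concrete, the following is a minimal Python sketch, not the method presented in the talk. It assumes a composition of a matrix-vector product (inner function, computed in float16) with a softmax (outer function). The scoring rule, the error estimate, and all names (`adaptive_composition`, `inner_low`, the budget `k`) are hypothetical choices made for illustration: each inner component is scored by a first-order sensitivity of the softmax times a standard dot-product rounding error bound, and the `k` highest-scoring components are recomputed in float64.

```python
import numpy as np

def softmax(z):
    # Outer function: shifted softmax to avoid overflow.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def inner_low(W, x):
    # Inner function in low precision (float16), returned as float64 for later use.
    return (W.astype(np.float16) @ x.astype(np.float16)).astype(np.float64)

def adaptive_composition(W, x, k):
    """Compute softmax(W @ x), recomputing k inner components in float64.

    Illustrative assumption: component i is scored by
    (sensitivity of softmax to y_i) * (rounding error bound on y_i).
    """
    y = inner_low(W, x)

    # Standard first-order bound for an n-term dot product: n * u * (|w_i| . |x|).
    # It is evaluated in float64 here for simplicity; a practical scheme would
    # need a cheaper estimate, since this costs as much as the product itself.
    u = np.finfo(np.float16).eps / 2
    err = W.shape[1] * u * (np.abs(W) @ np.abs(x))

    # The softmax Jacobian is diag(p) - p p^T; the 1-norm of its i-th column
    # equals 2 * p_i * (1 - p_i).
    p = softmax(y)
    sens = 2.0 * p * (1.0 - p)

    # Pick the k components with the largest estimated error contribution.
    rows = np.argsort(sens * err)[::-1][:k]

    # Recompute only the selected components in higher precision.
    y[rows] = W[rows].astype(np.float64) @ x.astype(np.float64)
    return softmax(y), rows

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16))
    x = rng.standard_normal(16)
    reference = softmax(W @ x)
    for k in (0, 2, 8):
        out, rows = adaptive_composition(W, x, k)
        print(k, sorted(rows.tolist()), np.abs(out - reference).max())
```

With `k` equal to the number of rows, every component is recomputed and the result matches the float64 reference up to rounding; smaller `k` trades accuracy for less high-precision work.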
Author
Stanislav Budzinskiy
(University of Vienna)