Speaker
Description
Attention mechanisms are a central component of transformer models that capture contextual relationships between tokens in large language models. Although many of the underlying computations (e.g., query, key, and value embeddings in multi-head attention) are inherently multi-way, classical transformer models are built on matrix-based formulations.
In this talk, we discuss several ways that tensorial structure can be imposed on and exploited in attention mechanisms of transformer models.
We describe how tensor-based attention can capture higher-order contextual relationships among tokens, going beyond the pairwise interactions modeled by standard dot-product attention. We then explore how randomized algorithms from numerical linear algebra may be used to accelerate tensor-based attention computations and reduce their storage requirements.
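To make the contrast between pairwise and higher-order attention concrete, the following NumPy sketch compares standard scaled dot-product attention with one possible third-order variant in which each query scores pairs of tokens. This is an illustration only, not necessarily the formulation presented in the talk: the function names, the second key and value matrices (`K2`, `V2`), and the choice to combine values by elementwise product are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pairwise_attention(Q, K, V):
    # Standard matrix-based attention: scores[i, j] = <Q[i], K[j]> / sqrt(d).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # shape (n, n)
    return softmax(scores, axis=-1) @ V      # shape (n, d)


def trilinear_attention(Q, K1, K2, V1, V2):
    # Hypothetical third-order attention: each query i scores token PAIRS (j, k).
    # scores[i, j, k] = sum_c Q[i, c] * K1[j, c] * K2[k, c] / sqrt(d)
    n, d = Q.shape
    scores = np.einsum("ic,jc,kc->ijk", Q, K1, K2) / np.sqrt(d)  # shape (n, n, n)
    # Normalize jointly over all (j, k) pairs for each query i.
    weights = softmax(scores.reshape(n, -1), axis=-1).reshape(n, n, n)
    # Assumed value for a pair (j, k): the elementwise product V1[j] * V2[k].
    return np.einsum("ijk,jc,kc->ic", weights, V1, V2)           # shape (n, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 4
    Q, K1, K2, V1, V2 = (rng.standard_normal((n, d)) for _ in range(5))
    print(pairwise_attention(Q, K1, V1).shape)
    print(trilinear_attention(Q, K1, K2, V1, V2).shape)
```

In this naive form the third-order score tensor has n^3 entries for n tokens, compared with n^2 for pairwise attention. That growth in time and storage is the kind of cost that randomized or low-rank approximations would aim to reduce.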