Page history

Multi-head self-attention

18 March 2023

Walle
Created page with "{{see also|Machine learning terms}} ==Introduction== Multi-head self-attention is a core component of the Transformer architecture, a type of neural network introduced by Vaswani et al. (2017) in the paper "Attention Is All You Need". This mechanism allows the model to capture complex relationships between the input tokens by weighing their importance with respect to each other. The multi-head aspect refers to the parallel attention computations performed on differen..."
13:23
+3,303
