All public logs
Combined display of all available logs of AI Wiki.
- 13:23, 18 March 2023 Walle (talk | contribs) created page Multi-head self-attention (Created page with "{{see also|Machine learning terms}} ==Introduction== Multi-head self-attention is a core component of the Transformer architecture, a type of neural network introduced by Vaswani et al. (2017) in the paper "Attention Is All You Need". This mechanism allows the model to capture complex relationships between the input tokens by weighing their importance with respect to each other. The multi-head aspect refers to the parallel attention computations performed on differen...")
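
The excerpt above describes the mechanism only in prose. As a rough illustration of what it refers to, a minimal NumPy sketch of multi-head self-attention (scaled dot-product attention run in parallel heads, as in "Attention Is All You Need") could look like the following; all function and variable names here are illustrative and not taken from the wiki page itself:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Self-attention with several heads computed in parallel.

    X:             (seq_len, d_model) input token embeddings
    W_q, W_k, W_v: (d_model, d_model) query/key/value projections
    W_o:           (d_model, d_model) output projection
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Project the inputs and split the result into per-head slices.
    def split_heads(M):
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q = split_heads(X @ W_q)  # (num_heads, seq_len, d_head)
    K = split_heads(X @ W_k)
    V = split_heads(X @ W_v)

    # Each head weighs every token against every other token.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (num_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                   # (num_heads, seq_len, d_head)

    # Concatenate the heads and apply the final linear projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

# Example usage with random weights (shapes only; not trained values).
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 64, 10, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
out = multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads)
print(out.shape)  # (10, 64)
```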