LSTM

LSTM is a type of RNN whose gating mechanism (input, forget, and output gates) lets it keep information over long sequences, mitigating the vanishing-gradient problem of plain RNNs. Attention was later built on top of such recurrent encoder-decoder models, and the Transformer ("Attention Is All You Need", Vaswani et al., 2017) eventually replaced recurrence entirely with self-attention.
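The gating described above can be sketched as a single LSTM cell step. This is a minimal NumPy illustration, not a production implementation; the function name `lstm_step` and the stacked weight layout are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [x; h_prev] to four stacked gate pre-activations
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # forget old memory, write new
    h = o * np.tanh(c)                            # gated hidden state
    return h, c

# Toy usage with random weights (hypothetical sizes)
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

The forget gate `f` is what lets the cell state `c` carry information across many steps without being repeatedly squashed.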

Attention can be implemented in several ways:

  • Bahdanau Attention: uses additive attention; a dedicated mini-network (a small feed-forward layer) learns the alignment score between the decoder state and each encoder state.
  • Luong Attention: since the current decoder state and the encoder hidden states belong to the same vector space, similarity can be measured in a simplified way with a dot product, i.e. the product between the current state and each encoder state.
  • Self-attention: the variant used by Transformers, typically in the query-key-value framework. Uses scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V.
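The scaled dot-product formula above can be sketched directly in NumPy. This is a minimal illustration; the function name and the toy shapes are assumptions for the example, and a real Transformer would first project the inputs into Q, K, V with learned matrices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores: dot-product similarity between each query and each key,
    # scaled by sqrt(d_k) to keep the softmax well-conditioned
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis -> attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each query gets a weighted average of the values
    return weights @ V, weights

# Toy usage with hypothetical shapes
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries of dimension 8
K = rng.normal(size=(6, 8))    # 6 keys of the same dimension
V = rng.normal(size=(6, 16))   # 6 values of dimension 16
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1, so the output for a query is a convex combination of the value vectors; with Q = K = V derived from the same sequence, this becomes self-attention.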