LSTM

LSTM is a type of RNN whose gating mechanism (input, forget, and output gates) lets it keep information over long sequences, mitigating the vanishing-gradient problem of plain RNNs. Attention was later built on top of such recurrent encoder-decoder models, and the Transformer ("Attention Is All You Need", Vaswani et al., 2017) eventually replaced recurrence entirely with self-attention.
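The gating described above can be sketched as a single LSTM cell step. This is a minimal NumPy illustration, not a production implementation; the function name `lstm_step` and the stacked weight layout are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [x; h_prev] to four stacked gate pre-activations
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # forget old memory, write new
    h = o * np.tanh(c)                            # gated hidden state
    return h, c

# Toy usage with random weights (hypothetical sizes)
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

The forget gate `f` is what lets the cell state `c` carry information across many steps without being repeatedly squashed.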

Attention can be implemented in several ways:

  • Bahdanau Attention: uses additive attention; a dedicated mini-network (a small feed-forward layer) learns the alignment score between the decoder state and each encoder state.
  • Luong Attention: since the current decoder state and the encoder hidden states belong to the same vector space, similarity can be measured in a simplified way with a dot product, i.e. the product between the current state and each encoder state.
  • Self-attention: the variant used by Transformers, typically in the query-key-value framework. Uses scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V.
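The scaled dot-product formula above can be sketched directly in NumPy. This is a minimal illustration; the function name and the toy shapes are assumptions for the example, and a real Transformer would first project the inputs into Q, K, V with learned matrices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores: dot-product similarity between each query and each key,
    # scaled by sqrt(d_k) to keep the softmax well-conditioned
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis -> attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each query gets a weighted average of the values
    return weights @ V, weights

# Toy usage with hypothetical shapes
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries of dimension 8
K = rng.normal(size=(6, 8))    # 6 keys of the same dimension
V = rng.normal(size=(6, 16))   # 6 values of dimension 16
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1, so the output for a query is a convex combination of the value vectors; with Q = K = V derived from the same sequence, this becomes self-attention.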