LSTM
- NLP - Lecture - Sequence Models, RNN, LSTM, Encoder-Decoder in NLP
- ML2 - Lecture 15 - LSTM and other gated RNNs
LSTM is a type of RNN with gating mechanisms (input, forget, and output gates) that mitigate the vanishing-gradient problem of plain RNNs. Attention was first introduced on top of RNN/LSTM encoder-decoders, and later the Transformer replaced recurrence with attention entirely ("Attention Is All You Need", 2017).
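A minimal sketch of a single LSTM time step with numpy, showing the standard gate equations (sigmoid gates plus a tanh candidate); the function name `lstm_step` and the stacked-parameter layout are illustrative choices, not from the lectures:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters
    for the four blocks: input gate i, forget gate f, output gate o,
    candidate cell g."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # pre-activations, shape (4*n,)
    i = sigmoid(z[0*n:1*n])             # input gate: how much new info enters
    f = sigmoid(z[1*n:2*n])             # forget gate: how much old cell state is kept
    o = sigmoid(z[2*n:3*n])             # output gate: how much cell state is exposed
    g = np.tanh(z[3*n:4*n])             # candidate cell content
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c

# toy usage with random parameters
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Note that the forget gate gives the cell state an additive update path (`c = f * c_prev + i * g`), which is what lets gradients flow over long sequences.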
Attention can be implemented in several ways:
- Bahdanau Attention: additive attention; a dedicated mini-network (a small feed-forward layer) is used to learn the alignment between decoder and encoder states.
- Luong Attention: multiplicative attention; since the decoder hidden state and the encoder hidden states belong to the same vector space, similarity can be measured in a simplified way with a dot product between the current decoder state and each encoder state.
- Self-attention: the variant used by Transformers, typically in the query-key-value framework, using the scaled dot product: Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V
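The scaled dot product above can be sketched in a few lines of numpy; the function name and the toy shapes are illustrative assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Computes softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V, weights                    # weighted sum of values

# toy usage: 2 queries attending over 5 key/value pairs, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # (2, 8): one output vector per query
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

The √d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with tiny gradients.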