Section outline

  • March 19th, Wednesday (16:30-18:30)

    Transformers: short recap

    • Attention
    • Encoder
    • Decoder
    • Residual stream
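The attention mechanism recapped above can be sketched in a few lines of NumPy; this is a minimal illustration of scaled dot-product attention (the function name and toy matrices are my own, not from the course materials):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores have shape (n_queries, n_keys); dividing by sqrt(d_k)
    # keeps the softmax from saturating for large key dimensions.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax over the keys (numerically stabilised).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a convex combination of the value vectors.
    return weights @ V, weights
```

Each row of `weights` sums to 1, so every output vector is a weighted average of the values; multi-head attention repeats this with several learned projections.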

    Contextualized word embeddings

    • Static embeddings vs. contextualized embeddings
    • ELMo
    • BERT: encoder-only model
    • Masked language modeling
    • Next sentence prediction
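The masked language modeling objective listed above can be illustrated with BERT's input-corruption recipe: roughly 15% of token positions are selected as prediction targets, and of those, 80% are replaced by `[MASK]`, 10% by a random token, and 10% left unchanged. A small self-contained sketch (the toy vocabulary and function name are mine):

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]  # toy vocabulary

def mlm_corrupt(tokens, mask_prob=0.15, rng=None):
    # BERT-style corruption: sample ~mask_prob of the positions as
    # prediction targets; replace 80% with [MASK], 10% with a random
    # vocabulary token, and leave 10% unchanged.
    rng = rng or random.Random(0)
    corrupted, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets.append((i, tok))       # model must predict tok at i
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)
            # else: keep the original token (but still predict it)
    return corrupted, targets
```

The loss is computed only at the target positions, which is why an encoder-only model like BERT can condition on both left and right context.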

    References

    • Jurafsky and Martin, chapter 9
    • Jurafsky and Martin, sections 11.1, 11.2, 11.3
    • Voita, NLP Course | For You (web course): Language Modeling

    Resources