Section outline

  • March 12th, Thursday (10:30-12:30)

    Text normalization

    • Byte-pair encoding algorithm: learner, encoder and decoder (cont'd)
    • Sentence segmentation and case folding
    • Stop words, stemming and lemmatization

    Words and meaning

    • Lexical semantics
    • Distributional semantics
    • Count-based embeddings
    • Word2vec and skip-gram
    • Logistic regression
    • Training

    References

    • Jurafsky and Martin, chapter 2
    • Jurafsky and Martin, chapter 5

    Resources