Day 03
Section outline
- March 12th, Thursday (10:30-12:30)
Text normalization
- Byte-pair encoding algorithm: learner, encoder and decoder (cont'd)
- Sentence segmentation and case folding
- Stop words, stemming and lemmatization
Words and meaning
- Lexical semantics
- Distributional semantics
- Count-based embeddings
- Word2vec and skip-gram
- Logistic regression
- Training
References
- Jurafsky and Martin, chapter 2
- Jurafsky and Martin, chapter 5
Resources