In this thread I am going to post, for each individual lecture, a detailed list of all the subjects that we have presented in class and that will be subject to evaluation at the final exam.
Here is the detailed list of subjects from lecture 1 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 01: Natural language processing: An unexpected journey.
Content: What is natural language processing? A few case studies: finance, social networks, health. A very short history of natural language processing. Why is natural language processing tricky? Word distribution, ambiguity, composition, recursion and hidden structure. How does natural language processing work? Learning & knowledge. Search & learning. Market, environment and ethics.
References: Slides from the lecture; search & learning is from Eisenstein, section 1.2.2.
Here is the detailed list of subjects from lecture 2 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 02: Essentials of linguistics.
Content: What is natural language? Languages of the world. What is linguistics? Generative linguistics. Phonology and rules. Morphology: inflectional and derivational morphology. Concatenative morphology and template morphology. Syntax: Part of speech tags, phrase structure trees and dependency trees. Syntactic ambiguity. Lexical semantics: internal and external semantic structure; lexical ambiguity. General semantics: principle of compositionality; representations of meaning. Pragmatics, discourse and dialogue.
References: Slides from the lecture.
Here is the detailed list of subjects from lecture 3 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 03: Text normalization.
Content: Regular expressions and extended regular expressions, substitution and back-reference. Words, tokens, types and vocabulary. Herdan/Heaps law and Zipf/Mandelbrot law. Word-forms and lemmas; multi-element word-forms. Corpora. Text normalization: language identification, spell checker, contraction, punctuation and special characters. Text tokenization: word tokenization, character tokenization, and subword tokenization. Subword tokenization: learning algorithm, encoder algorithm and decoder algorithm. Byte-pair encoding: algorithm and examples. WordPiece. Sentence segmentation and case folding. Stop words, stemming and lemmatization.
References: Jurafsky & Martin, chapter 2; skip section 2.8. I take it for granted that you already know about regular expressions, which are presented in section 2.1.
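For those who want to see the byte-pair encoding learning step in code, here is a minimal Python sketch in the spirit of the algorithm presented in class; the toy corpus, the end-of-word marker "_" and the function name are my own choices for illustration, not taken from the textbook.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from a dict mapping words to corpus frequencies."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("_",): f for word, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every pair of adjacent symbols, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent pair
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# Toy corpus (invented counts, for illustration only).
corpus = {"low": 5, "lowest": 2, "new": 6, "newer": 3, "wider": 3}
print(learn_bpe(corpus, 5))
```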
Here is the detailed list of subjects from lecture 4 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 04: Words and meaning.
Content: Lexical semantics: word senses and word relationships. Distributional semantics. Review of vectors: vector length, vector normalization and cosine similarity. Vector semantics and term-context matrix. Pointwise mutual information (PMI) and positive pointwise mutual information (PPMI). Probability estimation using word frequency. Practical issues. Truncated singular value decomposition. Neural static word embeddings. Word2vec and skip-gram with negative sampling: target embedding, context embedding, classifier algorithm derived from logistic regression and training algorithm. Practical issues. Other kinds of static embeddings: FastText and GloVe. Visualizing word embeddings. Semantic properties of word embeddings. Bias and word embeddings. Evaluation of word embeddings. Cross-lingual word embeddings.
References: Jurafsky & Martin, chapter 6; skip sections 6.3.1, 6.3.2, 6.5, 6.7, and equations (6.35)-(6.40). Truncated singular value decomposition is taken from Eisenstein, section 14.3. Some of the topics under practical issues have been taken from the on-line course 'NLP Course | For You' by Elena Voita. Use lecture slides for cross-lingual word embeddings.
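As an illustration of the PPMI weighting discussed in class, here is a minimal Python sketch that turns a tiny term-context count matrix into PPMI values; the counts and the function name are invented for the example, not taken from the textbook.

```python
import math

def ppmi(counts):
    """Positive PMI from a term-context count matrix given as {word: {context: count}}."""
    total = sum(c for row in counts.values() for c in row.values())
    word_totals = {w: sum(row.values()) for w, row in counts.items()}
    ctx_totals = {}
    for row in counts.values():
        for c, n in row.items():
            ctx_totals[c] = ctx_totals.get(c, 0) + n
    table = {}
    for w, row in counts.items():
        for c, n in row.items():
            p_wc = n / total                        # joint probability estimate
            p_w = word_totals[w] / total            # word marginal
            p_c = ctx_totals[c] / total             # context marginal
            pmi = math.log2(p_wc / (p_w * p_c))
            table[(w, c)] = max(pmi, 0.0)           # PPMI clips negative values to zero
    return table

# Toy term-context counts (invented numbers, for illustration only).
counts = {
    "cherry":  {"pie": 442, "sugar": 25, "data": 5},
    "digital": {"pie": 5,   "sugar": 2,  "data": 3016},
}
for pair, value in ppmi(counts).items():
    print(pair, round(value, 2))
```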
Here is the detailed list of subjects from lecture 5 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 05: Language models.
Content: Language modeling (LM) and applications. Relative frequency estimation. N-gram model, N-gram probabilities and bias-variance tradeoff. Practical issues. Evaluation: perplexity measure. Sampling sentences. Sparse data: Laplace smoothing and add-k smoothing; stupid backoff and linear interpolation; out-of-vocabulary words. Limitations of N-gram model. Neural language models: general architecture. Feedforward neural LM (NLM): inference and training. Recurrent NLM: inference and training. Character level and character-aware NLM. Practical issues: weight tying, adaptive softmax, softmax temperature, contrastive evaluation.
References: Jurafsky & Martin, chapter 3; skip sections 3.7, 3.8. General architecture for NLM has been taken from the on-line course 'NLP Course | For You' by Elena Voita, section Language Modeling. I take it for granted that you already know about feedforward neural networks and recurrent neural networks, which are presented in Jurafsky & Martin, chapters 7 and 8. Feed-forward NLM is in sections 7.6 and 7.7; RNN for LM is in section 8.2.
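To make N-gram estimation and the perplexity measure concrete, here is a small Python sketch of a bigram model with add-k smoothing; the toy sentences and the choice k = 0.5 are placeholders of mine, not from the textbook.

```python
import math
from collections import Counter

def train_bigram(sentences, k=1.0):
    """Bigram model with add-k smoothing; sentences are lists of tokens."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])                 # context counts
        bigrams.update(zip(tokens, tokens[1:]))      # bigram counts
    V = len(vocab)
    def prob(w_prev, w):
        return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * V)
    return prob

def perplexity(prob, sentences):
    """Perplexity = exp of the average negative log-probability per predicted token."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w_prev, w in zip(tokens, tokens[1:]):
            log_sum += -math.log(prob(w_prev, w))
            n += 1
    return math.exp(log_sum / n)

train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
test = [["the", "cat", "sat"]]
p = train_bigram(train, k=0.5)
print(perplexity(p, test))
```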
Here is the detailed list of subjects from lecture 6a that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 06a: Contextualized word embeddings
Content: Review of transformers. Static embeddings vs. contextualized embeddings. ELMo architecture. BERT encoder-based model; masked language modeling and next sentence prediction. GPT-n decoder-based model, masked attention. Sentence BERT, training and inference.
References: I take it for granted that you already know about transformers, which are presented in Jurafsky & Martin, chapter 9. BERT is presented in Jurafsky & Martin, sections 11.1, 11.2 and 11.3. ELMo and GPT-n models have been taken from the on-line course 'NLP Course | For You' by Elena Voita, section Transfer Learning. Use lecture slides for sentence BERT.
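As a small illustration of the masked attention used by GPT-n decoder-based models, here is a numpy sketch of single-head causal self-attention; the dimensions and the random weights are placeholders, and a real model would of course add multiple heads, residual connections and layer normalization.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with a causal (masked) attention pattern:
    position i may only attend to positions <= i."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Mask out future positions before the softmax.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 4, 8                       # 4 tokens, embedding size 8 (toy values)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(X, Wq, Wk, Wv)
print(out.shape)                  # (4, 8)
```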
Here is the detailed list of subjects from lecture 6b that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 06b: Large language models
Content: Large language models (LLMs) with transformers. The language modeling head. Decoder-only architecture and text completion. Sampling for LLM generation. Pretraining for LLMs: cross-entropy loss and teacher forcing. Training corpora. Scaling laws for LLMs. Overview of LLMs. Classification of LLMs: encoder, decoder and encoder-decoder. Multi-lingual LLMs: monolingual training and training based on parallel corpora. Miscellanea: evaluation of LLMs, emergent abilities, potential harms from LLMs, hallucinations, mixture of experts.
References: Jurafsky & Martin, section 9.5 for the language modeling head, and chapter 10; skip sections 10.3.3 and 10.5.3, which will be covered later, and section 10.5.2. Use lecture slides for the following topics: overview of LLMs, classification of LLMs, multi-lingual LLMs, and miscellanea.
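To illustrate the pretraining loss with teacher forcing, here is a minimal Python sketch that computes the average cross-entropy of the gold next tokens under a model; the uniform "model" and the function names are mine, for illustration only.

```python
import math

def lm_cross_entropy(token_ids, prob_next):
    """Average cross-entropy loss for next-token prediction with teacher forcing:
    at each step the model is conditioned on the gold prefix, and the loss is
    the negative log-probability it assigns to the gold next token."""
    total = 0.0
    for t in range(1, len(token_ids)):
        gold_prefix = token_ids[:t]          # teacher forcing: gold history, not model samples
        p = prob_next(gold_prefix)           # distribution over the vocabulary
        total += -math.log(p[token_ids[t]])
    return total / (len(token_ids) - 1)

# A fake "model": a uniform distribution over a 10-word vocabulary.
vocab_size = 10
uniform = lambda prefix: [1.0 / vocab_size] * vocab_size
print(lm_cross_entropy([3, 1, 4, 1, 5], uniform))   # = log(10), about 2.303
```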
Here is the detailed list of subjects from lecture 6c that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 06c: Post-training
Content: The process of post-training or adaptation. Fine-tuning and head layer; catastrophic forgetting. Instruction tuning, also called supervised fine-tuning (SFT); datasets for SFT. Model alignment: reinforcement learning and reward function; reinforcement learning from human feedback, preference datasets. Parameter-efficient fine-tuning: adapters and LoRA. Transfer learning.
References: Jurafsky & Martin, sections 11.4, 12.2, 12.3. Use lecture slides for model alignment.
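Here is a minimal numpy sketch of the LoRA idea (a frozen weight matrix adapted by a trainable low-rank update); the sizes, the scaling factor alpha and the initialization follow common practice but are placeholders for illustration.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA: the frozen weight W is adapted by a low-rank update B @ A,
    so the effective weight is W + (alpha / r) * B @ A and only A, B are trained."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 2                     # toy sizes; r much smaller than d is the point of LoRA
W = rng.normal(size=(d_out, d_in))            # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01         # trainable, small random init
B = np.zeros((d_out, r))                      # trainable, zero init: the update starts at zero
x = rng.normal(size=(4, d_in))
print(lora_forward(x, W, A, B).shape)         # (4, 8)
```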
Here is the detailed list of subjects from lecture 6d that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 06d: ChatBots
Content: ChatBots and life cycle; datasets. Prompt and prompt engineering. Retrieval-augmented generation. Large reasoning models (hints).
References: Jurafsky & Martin, chapter 12 (sections 12.2 and 12.3 already used in lecture 06c). Use lecture slides for large reasoning models. Video by A. Karpathy (see Day 12 box).
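As a toy illustration of retrieval-augmented generation, the sketch below retrieves the most similar passages with a bag-of-words "embedding" and cosine similarity and assembles a prompt for the LLM; the embedding function and the prompt template are deliberately simplistic placeholders of mine, not a real system.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: math.sqrt(sum(x * x for x in c.values()))
    return dot / (norm(u) * norm(v) + 1e-9)

def retrieve(query, documents, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble the retrieved passages and the question into a single prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}\nAnswer:"

docs = [
    "The Colosseum is an ancient amphitheatre in Rome.",
    "Python is a programming language created by Guido van Rossum.",
    "Rome is the capital city of Italy.",
]
prompt = build_prompt("Where is the Colosseum?", retrieve("Where is the Colosseum?", docs))
print(prompt)   # this prompt would then be passed to the LLM
```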
Here is the detailed list of subjects from lecture 7 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 07: Part-of-Speech Tagging
Content: Part-of-speech (PoS) and PoS tagging task. Evaluation for PoS tagging. Hidden Markov model (HMM), emission and transition probabilities. Probability estimation. HMM as automata. The Viterbi algorithm for decoding. The forward algorithm and the trellis data structure. Conditional random fields (CRF) and global features. Linear chain CRF, local features and feature templates. Decoding for CRF using Viterbi algorithm. Training algorithm for CRF: loss function, regularization, and stochastic gradient descent (sketch only). Neural PoS taggers using local search: fixed-window feed-forward neural model, recurrent neural model, and recurrent bidirectional model. Neural PoS taggers using global search: neural model combining RNN and CRF (sketch only). Named entity recognition and other sequence labelling tasks.
References: Jurafsky & Martin, chapter 17. Use lecture slides for forward algorithm and trellis, or else look into Jurafsky & Martin appendix A (available through the textbook web page only). Training algorithm for linear chain CRF is taken from Eisenstein, section 7.5.3. Use lecture slides for the fixed-window feed-forward neural model. Recurrent bidirectional model and neural structured prediction are taken from Eisenstein, section 7.6.1.
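For the Viterbi decoding algorithm for HMM taggers, here is a small Python sketch on a hand-made three-tag model; all the probabilities are invented for the example.

```python
def viterbi(words, tags, pi, A, B):
    """Viterbi decoding for an HMM tagger.
    pi[t]     : initial probability of tag t
    A[t1][t2] : transition probability P(t2 | t1)
    B[t][w]   : emission probability P(w | t)
    Returns the most probable tag sequence for the sentence."""
    V = [{t: pi[t] * B[t].get(words[0], 1e-12) for t in tags}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        V.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda s: V[i - 1][s] * A[s][t])
            V[i][t] = V[i - 1][best_prev] * A[best_prev][t] * B[t].get(w, 1e-12)
            back[i][t] = best_prev
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Tiny hand-made model (probabilities are invented for illustration).
tags = ["DET", "NOUN", "VERB"]
pi = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
A = {"DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
     "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
     "VERB": {"DET": 0.5,  "NOUN": 0.4, "VERB": 0.1}}
B = {"DET": {"the": 0.9}, "NOUN": {"dog": 0.5, "barks": 0.1}, "VERB": {"barks": 0.6}}
print(viterbi(["the", "dog", "barks"], tags, pi, A, B))   # ['DET', 'NOUN', 'VERB']
```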
Here is the detailed list of subjects from lecture 10 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 10: Dependency Parsing
Content: Dependency trees and grammatical functions. Dependency formalisms, projective and non-projective trees. Dependency treebanks and the Universal Dependencies project. Transition-based dependency parsing. Arc-standard parser: transition definition, oracle algorithm, and generation of training data. Feature extraction: feature functions and feature templates. Alternative models for dependency parsing (very basic notions only): arc-eager parser, Attardi parser (non-projective), transition-based parsing with beam search, and graph-based dependency parsing (non-projective). Neural architectures for dependency parsing: case study, Kiperwasser & Goldberg 2016; feature extraction using bidirectional LSTM; training algorithm using hinge loss.
References: Jurafsky & Martin, chapter 19; skip section 19.3. Slides from the lecture for the part on neural dependency parsing.
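To see the arc-standard transition system in action, here is a minimal Python sketch that applies a hand-written sequence of SHIFT / LEFT-ARC / RIGHT-ARC transitions to a toy sentence; in a real parser the actions would of course be chosen by the oracle (when generating training data) or by the trained classifier (at parsing time).

```python
def arc_standard(words, actions):
    """Apply a sequence of arc-standard transitions to a sentence.
    Configuration: (stack, buffer, arcs); word index 0 is the artificial ROOT.
    SHIFT     : move the next buffer word onto the stack
    LEFT-ARC  : add arc top -> second, remove second from the stack
    RIGHT-ARC : add arc second -> top, remove top from the stack"""
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            head, dep = stack[-1], stack[-2]
            arcs.append((head, dep))
            del stack[-2]
        elif action == "RIGHT-ARC":
            head, dep = stack[-2], stack[-1]
            arcs.append((head, dep))
            stack.pop()
    return arcs

# "book me a flight": the action sequence below is written by hand for illustration.
words = ["book", "me", "a", "flight"]
actions = ["SHIFT", "SHIFT", "RIGHT-ARC",        # book -> me
           "SHIFT", "SHIFT", "LEFT-ARC",         # flight -> a
           "RIGHT-ARC",                          # book -> flight
           "RIGHT-ARC"]                          # ROOT -> book
arcs = arc_standard(words, actions)
print([("ROOT" if h == 0 else words[h - 1], words[d - 1]) for h, d in arcs])
```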
Here is the detailed list of subjects from lecture 12 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 12: Machine Translation
Content: Word ordering and classification of languages by subject, verb, and object order. Word translation and word alignment relation. Statistical machine translation (SMT) using language model and translation model. Neural machine translation (NMT). Encoder-decoder neural architecture (seq2seq). Autoregressive encoder-decoder using RNN: greedy inference algorithm; training algorithm, average cross-entropy loss, teacher forcing. Encoder-decoder RNN using attention techniques: dynamic context, dot-product attention, and bilinear attention. Transformer-based architecture for NMT: cross-attention, query, key and value. Search tree and beam search. Parallel corpora. Evaluation: BLEU metric and METEOR metric. Leaderboards.
References: Jurafsky & Martin, chapter 13; skip SentencePiece from section 13.2.1, and formula (13.10) from section 13.2.2. Use lecture slides for word alignment and statistical machine translation.
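As an illustration of beam search decoding, here is a small Python sketch that keeps the beam_size best partial hypotheses, scored by summed log-probabilities, and expands them step by step; the toy next-token table stands in for the NMT model and is invented for the example.

```python
import math

def beam_search(prob_next, start, end, beam_size=3, max_len=10):
    """Beam search decoding over an autoregressive model given by prob_next(prefix)."""
    beams = [([start], 0.0)]                       # (token sequence, log-prob score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, p in prob_next(seq).items():
                candidates.append((seq + [token], score + math.log(p)))
        # Keep only the best hypotheses; move completed ones aside.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == end else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# A toy "model": fixed next-token distributions keyed on the last token.
table = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"dog": 0.7, "</s>": 0.3},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}
prob_next = lambda seq: table[seq[-1]]
print(beam_search(prob_next, "<s>", "</s>", beam_size=2))
```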
Here is the detailed list of subjects from lecture 13 that we have presented in class and that will be subject to evaluation at the final exam.
Lecture 13: Question Answering
Content: Question answering (QA) and factoid questions. Text-based QA, information retrieval and reading comprehension. Answer span extraction using contextual embeddings: start and end vectors, fine-tuning loss, and negative examples. Answer span extraction using RNN and attention: Stanford attentive reader; bilinear product and attention. Practical issues. Datasets, evaluation measures, and leaderboard. Knowledge-based QA; entity linking.
References: For answer span extraction using contextual embeddings, see the slides from the lecture. For the Stanford attentive reader, see Eisenstein, section 17.5.2. Use lecture slides for datasets, evaluation measures, leaderboard, knowledge-based QA, and entity linking.
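For answer span extraction with start and end vectors, here is a minimal numpy sketch that scores every candidate span (i, j) as the start score of token i plus the end score of token j over stand-in contextual embeddings; the random matrix H plays the role of the BERT outputs, and all sizes are placeholders.

```python
import numpy as np

def best_span(H, s, e, max_len=15):
    """Answer span extraction with start and end vectors:
    span (i, j) is scored as s . h_i + e . h_j and the best valid span is returned."""
    start_scores = H @ s          # one score per token as span start
    end_scores = H @ e            # one score per token as span end
    best, best_score = None, -np.inf
    n = H.shape[0]
    for i in range(n):
        for j in range(i, min(i + max_len, n)):
            score = start_scores[i] + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

rng = np.random.default_rng(0)
n_tokens, d = 20, 32                         # toy passage length and hidden size
H = rng.normal(size=(n_tokens, d))           # stand-in for contextual token embeddings
s, e = rng.normal(size=d), rng.normal(size=d)
print(best_span(H, s, e))
```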