Syllabus

by Giorgio Satta -
Number of replies: 10

In this thread I am going to post, for each individual lecture, a detailed list of all the subjects that we have presented in class and that will be subject to evaluation at the final exam.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 1 that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 01: Natural language processing: An unexpected journey.

Content: What is natural language processing? A few case studies: finance, social networks, health. A very short history of natural language processing. Why is natural language processing tricky? Word distribution, ambiguity, composition, recursion and hidden structure. How does natural language processing work? Learning & knowledge. Search & learning. Market, environment and ethics.

References: Slides from the lecture; search & learning is from Eisenstein, section 1.2.2.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 2 that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 02: Essentials of linguistics.

Content: What is natural language? Languages of the world. What is linguistics? Generative linguistics. Phonology and rules. Morphology: inflectional and derivational morphology. Concatenative morphology and template morphology. Syntax: part-of-speech tags, phrase structure trees and dependency trees. Syntactic ambiguity. Lexical semantics: internal and external semantic structure; lexical ambiguity. General semantics: principle of compositionality; representations of meaning. Pragmatics, discourse and dialogue.

References: Slides from the lecture.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 3 that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 03: Text normalization.

Content: Regular expressions and extended regular expressions, substitution and back-reference. Words, tokens, types and vocabulary. Herdan/Heaps law and Zipf/Mandelbrot law. Word-forms and lemmas; multi-element word-forms. Corpora. Text normalization: language identification, spell checker, contraction, punctuation and special characters. Text tokenization: word tokenization, character tokenization, and subword tokenization. Subword tokenization: learning algorithm, encoder algorithm and decoder algorithm. Byte-pair encoding: algorithm and examples. WordPiece. Sentence segmentation and case folding. Stop words, stemming and lemmatization.

References: Jurafsky & Martin, chapter 2; skip section 2.8. I take it for granted that you already know about regular expressions, which are presented in section 2.1.
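
To make the byte-pair encoding algorithm concrete, here is a minimal Python sketch of the learning step (a toy illustration under simplified assumptions, not the exact formulation in the textbook; all names are illustrative):

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Toy byte-pair encoding learner: corpus is a list of words,
    each represented as a tuple of symbols plus an end-of-word marker."""
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # count frequencies of adjacent symbol pairs over all words
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair
        merges.append(best)
        # replace every occurrence of the best pair by the merged symbol
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# example: the merges learned from a tiny toy corpus
print(learn_bpe(["low", "low", "lower", "newest", "newest", "widest"], 5))
```

The encoder would then apply the learned merges, in order, to the symbols of new words; the decoder simply concatenates subwords and removes the end-of-word marker.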

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 4 that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 04: Words and meaning.

Content: Lexical semantics: word senses and word relationships. Distributional semantics. Review of vectors: vector length, vector normalization and cosine similarity. Vector semantics and term-context matrix. Pointwise mutual information (PMI) and positive pointwise mutual information (PPMI). Probability estimation using word frequency. Practical issues. Truncated singular value decomposition. Neural static word embeddings. Word2vec and skip-gram with negative sampling: target embedding, context embedding, classifier algorithm derived from logistic regression and training algorithm. Practical issues. Other kinds of static embeddings: FastText and GloVe. Visualizing word embeddings. Semantic properties of word embeddings. Bias and word embeddings. Evaluation of word embeddings. Cross-lingual word embeddings.

References: Jurafsky & Martin, chapter 6; skip sections 6.3.1, 6.3.2, 6.5, 6.7, and equations (6.35)-(6.40). Truncated singular value decomposition is taken from Eisenstein, section 14.3. Some of the topics under practical issues have been taken from the on-line course 'NLP Course | For You' by Elena Voita. Use lecture slides for cross-lingual word embeddings.
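
As a worked illustration of PPMI weighting and cosine similarity, here is a small numpy sketch (the counts are toy values and the smoothing constant is an assumption for illustration only):

```python
import numpy as np

def ppmi(counts, eps=1e-12):
    """Positive pointwise mutual information from a term-context count matrix.
    counts[i, j] is the co-occurrence count of word i with context j."""
    total = counts.sum()
    p_ij = counts / total                      # joint probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)      # word marginals
    p_j = p_ij.sum(axis=0, keepdims=True)      # context marginals
    pmi = np.log2((p_ij + eps) / (p_i * p_j + eps))
    return np.maximum(pmi, 0.0)                # clip negative PMI to zero

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# toy term-context matrix: rows = target words, columns = context words
counts = np.array([[2.0, 8.0, 0.0],
                   [1.0, 9.0, 0.0],
                   [8.0, 1.0, 3.0]])
W = ppmi(counts)
print(cosine(W[0], W[1]))   # similar distributions -> high similarity
print(cosine(W[0], W[2]))   # different distributions -> lower similarity
```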

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 5 that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 05: Language models.

Content: Language modeling (LM) and applications. Relative frequency estimation. N-gram model, N-gram probabilities and bias-variance tradeoff. Practical issues. Evaluation: perplexity measure. Sampling sentences. Sparse data: Laplace smoothing and add-k smoothing; stupid backoff and linear interpolation; out-of-vocabulary words. Limitations of N-gram model. Neural language models: general architecture. Feedforward neural LM (NLM): inference and training. Recurrent NLM: inference and training. Character level and character-aware NLM. Practical issues: weight tying, adaptive softmax, softmax temperature, contrastive evaluation.

References: Jurafsky & Martin, chapter 3; skip sections 3.7, 3.8. General architecture for NLM has been taken from the on-line course 'NLP Course | For You' by Elena Voita, section Language Modeling. I take it for granted that you already know about feedforward neural networks and recurrent neural networks, which are presented in Jurafsky & Martin, chapters 7 and 8. Feed-forward NLM is in sections 7.6 and 7.7; RNN for LM is in section 8.2.
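
For concreteness, here is a minimal Python sketch of a bigram model with add-k smoothing and its perplexity on a test sentence (toy data and illustrative names, not the textbook's exact formulation):

```python
import math
from collections import Counter

def train_bigram(sentences, k=1.0):
    """Bigram model with add-k smoothing; sentences are lists of tokens."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])           # history counts
        bigrams.update(zip(tokens, tokens[1:]))
    V = len(vocab)
    def prob(w_prev, w):
        return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * V)
    return prob

def perplexity(prob, sentences):
    """Perplexity = exponentiated average negative log-probability per token."""
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w_prev, w in zip(tokens, tokens[1:]):
            log_prob += math.log(prob(w_prev, w))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
prob = train_bigram(train, k=1.0)
print(perplexity(prob, [["the", "cat", "sat"]]))
```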

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 6a that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 06a: Contextualized word embeddings

Content: Review of transformers. Static embeddings vs. contextualized embeddings. ELMo architecture. BERT encoder-based model; masked language modeling and next sentence prediction. GPT-n decoder-based model, masked attention. Sentence BERT, training and inference.

References: I take it for granted that you already know about transformers, which are presented in Jurafsky & Martin, chapter 9. BERT is presented in Jurafsky & Martin, sections 11.1, 11.2 and 11.3. ELMo and GPT-n models have been taken from the on-line course 'NLP Course | For You' by Elena Voita, section Transfer Learning. Use lecture slides for sentence BERT.
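
To illustrate the masked attention used in decoder-based models such as GPT-n, here is a minimal single-head causal self-attention in numpy (the projection matrices are random toy values; this is a sketch, not any model's actual implementation):

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with a causal (lower-triangular) mask:
    position i can only attend to positions <= i."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (n, n) attention scores
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)        # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
n, d = 4, 8                      # 4 tokens, embedding size 8 (toy values)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
```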

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 6b that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 06b: Large language models

Content: Large language models (LLMs) with transformers. The language modeling head. Decoder-only architecture and text completion. Sampling for LLM generation. Pretraining for LLMs: cross-entropy loss and teacher forcing. Training corpora. Scaling laws for LLMs. Overview of LLMs. Classification of LLMs: encoder, decoder and encoder-decoder. Multi-lingual LLMs: monolingual training and training based on parallel corpora. Miscellanea: evaluation of LLMs, emergent abilities, potential harms from LLMs, hallucinations, mixture of experts.

References: Jurafsky & Martin, section 9.5 for the language modeling head, and chapter 10; skip sections 10.3.3 and 10.5.3, which will be covered later, and section 10.5.2. Use lecture slides for the following topics: overview of LLMs, classification of LLMs, multi-lingual LLMs, and miscellanea.
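
As an illustration of sampling for LLM generation, here is a small numpy sketch of temperature scaling with optional top-k truncation (toy logits and a hypothetical function name, for illustration only):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample a token id from a vector of logits, with temperature scaling
    and optional top-k truncation, as in common LLM decoding schemes."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    if top_k is not None:
        # keep only the top_k largest logits, mask out the rest
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over the (possibly truncated) logits
    return rng.choice(len(probs), p=probs)

# toy vocabulary of 5 tokens; lower temperature -> more peaked distribution
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(sample_next_token(logits, temperature=0.7, top_k=3))
```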

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 6c that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 06c: Post-training

Content: The process of post-training or adaptation. Fine-tuning and head layer; catastrophic forgetting. Instruction tuning, also called supervised fine-tuning (SFT); datasets for SFT. Model alignment: reinforcement learning and reward function; reinforcement learning with human feedback, preference datasets. Parameter-efficient fine-tuning: adapters and LoRA. Transfer learning.

References: Jurafsky & Martin, sections 11.4, 12.2, 12.3. Use lecture slides for model alignment.
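
To make the LoRA idea concrete, here is a minimal PyTorch sketch of a low-rank adapter wrapped around a frozen linear layer (the class and its hyperparameters are illustrative assumptions, not the implementation of any particular library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper around a frozen linear layer:
    output = base(x) + (alpha / r) * x A^T B^T, with only A and B trainable."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768), r=8)
x = torch.randn(2, 768)
print(layer(x).shape)                        # torch.Size([2, 768])
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                             # only the low-rank factors A and B are trained
```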

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 6d that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 06d: ChatBots

Content: ChatBots and life cycle; datasets. Prompt and prompt engineering. Retrieval-augmented generation. Large reasoning models (hints).

References: Jurafsky & Martin, chapter 12 (sections 12.2 and 12.3 already used in lecture 06c). Use lecture slides for large reasoning models. Video by A. Karpathy (see Day 12 box).
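
As a toy illustration of retrieval-augmented generation, here is a Python sketch of the retrieval and prompt-assembly steps (the helper names are hypothetical, the embeddings are placeholders; in a real system the vectors would come from a sentence encoder and the prompt would be sent to an LLM):

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose embeddings are most similar to the query
    (cosine similarity); a stand-in for the retrieval step of RAG."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = D @ q
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

def build_prompt(question, passages):
    """Assemble a prompt that grounds the model's answer in retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer the question using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\nAnswer:")

# toy document store; placeholder one-hot embeddings for illustration
docs = ["BPE merges frequent symbol pairs.", "Viterbi decodes HMMs.", "LoRA adds low-rank adapters."]
doc_vecs = np.eye(3)
query_vec = np.array([0.9, 0.1, 0.0])
prompt = build_prompt("What does BPE do?", retrieve(query_vec, doc_vecs, docs, k=1))
print(prompt)                        # this prompt would then be passed to the LLM
```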

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 7 that we have presented in class and that will be subject to evaluation at the final exam.

Lecture 07: Part-of-Speech Tagging

Content: Part-of-speech (PoS) and PoS tagging task. Evaluation for PoS tagging. Hidden Markov model (HMM), emission and transition probabilities. Probability estimation. HMM as automata. The Viterbi algorithm for decoding. The forward algorithm and the trellis data structure. Conditional random fields (CRF) and global features. Linear chain CRF, local features and feature templates. Decoding for CRF using Viterbi algorithm. Training algorithm for CRF: loss function, regularization, and stochastic gradient descent (sketch only). Neural PoS taggers using local search: fixed-window feed-forward neural model, recurrent neural model, and recurrent bidirectional model. Neural PoS taggers using global search: neural model combining RNN and CRF (sketch only). Named entity recognition and other sequence labelling tasks.

References: Jurafsky & Martin, chapter 17. Use lecture slides for forward algorithm and trellis, or else look into Jurafsky & Martin appendix A (available through the textbook web page only). Training algorithm for linear chain CRF is taken from Eisenstein, section 7.5.3. Use lecture slides for the fixed-window feed-forward neural model. Recurrent bidirectional model and neural structured prediction are taken from Eisenstein, section 7.6.1.
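
For concreteness, here is a minimal numpy sketch of Viterbi decoding for an HMM tagger (toy probabilities and illustrative names; see the textbook and appendix A for the full treatment):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Viterbi decoding for an HMM.
    pi[s]   : initial probability of state s
    A[s, t] : transition probability from state s to state t
    B[s, o] : emission probability of observation o from state s
    Returns the most probable state sequence for the observation indices obs."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))          # best path scores (log space)
    psi = np.zeros((T, n_states), dtype=int) # back-pointers
    logA, logB = np.log(A), np.log(B)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA          # (from state, to state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # follow back-pointers from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# toy tagger: 2 hidden tags, 3 observable word types
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], pi, A, B))          # prints [0, 0, 1]
```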