Syllabus

by Giorgio Satta -

In this thread I will post, for each lecture, a detailed list of the subjects that we have presented in class and that will be assessed at the final exam.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 1 that we have presented in class and that will be assessed at the final exam.

Lecture 01: Natural language processing: An unexpected journey.

Content: What is natural language processing? A very short history of natural language processing. Why is natural language processing tricky? Word distribution, ambiguity, composition, recursion and hidden structure. Language & Learning.

References: Slides from the lecture.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 3 that we have presented in class and that will be assessed at the final exam.

Lecture 03: Text normalization.

Content: Words, tokens, types and vocabulary. Herdan/Heaps law and Zipf/Mandelbrot law. Morphology: root and affixes; inflectional and derivational morphology. Word-forms and lemmas; multi-element word-forms. Corpora. Text normalization: language identification, spell checker, contraction, punctuation and special characters. Text tokenization: word tokenization, character tokenization, and subword tokenization. Subword tokenization: learning algorithm, encoder algorithm and decoder algorithm. Byte-pair encoding: algorithm and examples. Byte-level BPE and WordPiece. Sentence segmentation and case folding. Stop words, stemming and lemmatization.

References: Jurafsky & Martin, chapter 2; skip sections 2.3, 2.6 and 2.9. Slides from the lecture.
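As a study aid for the byte-pair encoding item above, here is a minimal Python sketch of the learning, encoder and decoder algorithms. It is not the code from the lecture or from the textbook: the function names (`learn_bpe`, `merge_pair`, `encode`, `decode`), the end-of-word marker `_` and the tie-breaking rule (first pair encountered) are assumptions made for illustration, and input words are assumed not to contain `_`.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(corpus_words, num_merges):
    """Learner: return the ordered list of merges found on a list of words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("_",) for w in corpus_words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties go to the first pair encountered.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def encode(word, merges):
    """Encoder: apply the learned merges, in the order they were learned."""
    symbols = tuple(word) + ("_",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

def decode(tokens):
    """Decoder: concatenate the tokens and drop the end-of-word marker."""
    return "".join(tokens).replace("_", "")

if __name__ == "__main__":
    merges = learn_bpe(["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3, 8)
    print(merges)
    print(encode("lowest", merges))
```

The encoder replays the merges in learning order rather than searching for the best segmentation, which is the greedy behaviour usually described for BPE.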

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 4 that we have presented in class and that will be assessed at the final exam.

Lecture 04: Words and meaning.

Content: Lexical semantics: word senses and word relationships. Distributional semantics. Vector semantics and term-context matrix. Cosine similarity. Neural static word embeddings. Word2vec and skip-gram with negative sampling: target embedding, context embedding, classifier algorithm derived from logistic regression and training algorithm. Practical issues. Other kinds of static embeddings: FastText and GloVe. Visualizing word embeddings. Semantic properties of word embeddings. Bias and word embeddings. Evaluation of word embeddings. Cross-lingual word embeddings.

References: Jurafsky & Martin, chapter 5; skip equations (5.22)-(5.27). Some of the topics under practical issues have been taken from the on-line course 'NLP Course | For You' by Elena Voita. Use lecture slides for cross-lingual word embeddings.
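For the cosine similarity item above, a small Python sketch may help when revising. It computes the cosine between two rows of a term-context matrix given as plain lists of numbers; the function name `cosine_similarity` and the convention of returning 0.0 for an all-zero vector are assumptions made here, not something fixed by the lecture.

```python
import math

def cosine_similarity(v, w):
    """Cosine of the angle between two equal-length vectors of counts or weights."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_v == 0.0 or norm_w == 0.0:
        return 0.0  # assumed convention for an all-zero row
    return dot / (norm_v * norm_w)

if __name__ == "__main__":
    # Hypothetical rows of a term-context matrix (made-up counts).
    word_a = [2, 0, 7, 1]
    word_b = [4, 0, 14, 2]
    word_c = [0, 9, 0, 0]
    print(cosine_similarity(word_a, word_b))
    print(cosine_similarity(word_a, word_c))
```

Because the dot product is divided by both vector lengths, a frequent word and a rare word with the same context profile receive the same score.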

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 5a that we have presented in class and that will be assessed at the final exam.

Lecture 05a: Statistical language models.

Content: Language modeling (LM) and applications. Relative frequency estimation. N-gram model, N-gram probabilities and bias-variance tradeoff. Practical issues. Evaluation: perplexity measure. Sampling sentences. Sparse data: Laplace smoothing and add-k smoothing; stupid backoff and linear interpolation; out-of-vocabulary words. Limitations of N-gram model.

References: Jurafsky & Martin, chapter 3; skip sections 3.7, 3.8.
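To make the items on relative frequency estimation, add-k smoothing and perplexity more concrete, here is a minimal Python sketch of a bigram model. The names `train_bigram`, `add_k_prob` and `perplexity`, the boundary symbols `<s>` and `</s>`, and the choice to count `</s>` but not `<s>` in the vocabulary size are assumptions of this sketch. It assumes k > 0 and does not handle out-of-vocabulary words in any special way.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect context counts, bigram counts and the vocabulary of predicted tokens."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens[1:])  # every token that can be predicted, including </s>
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def add_k_prob(prev, cur, context_counts, bigram_counts, vocab, k=1.0):
    """Add-k estimate of P(cur | prev); k = 1 gives Laplace smoothing."""
    return (bigram_counts[(prev, cur)] + k) / (context_counts[prev] + k * len(vocab))

def perplexity(sentences, context_counts, bigram_counts, vocab, k=1.0):
    """Perplexity = exp of the average negative log-probability per predicted token."""
    log_sum, n = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_sum += math.log(add_k_prob(prev, cur, context_counts, bigram_counts, vocab, k))
            n += 1
    return math.exp(-log_sum / n)

if __name__ == "__main__":
    train = [["i", "like", "tea"], ["i", "like", "coffee"]]
    model = train_bigram(train)
    print(add_k_prob("like", "tea", *model, k=1.0))
    print(perplexity([["i", "like", "tea"]], *model, k=0.5))
```

Smaller values of k move less probability mass to unseen bigrams, which is the usual motivation for add-k over Laplace smoothing.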

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 5b that we have presented in class and that will be assessed at the final exam.

Lecture 05b: Neural language models.

Content: Neural language models: general architecture. Feedforward neural LM (NLM): inference and training. Recurrent NLM: inference and training. Character-level and character-aware NLM. Practical issues: weight tying, adaptive softmax, softmax temperature, contrastive evaluation.

References: Jurafsky & Martin, section 6.5 (skip subsection 'Pooling input embeddings for sentiment') and section 13.2. General architecture for NLM has been taken from the on-line course 'NLP Course | For You' by Elena Voita, section Language Modeling. I take it for granted that you already know about feedforward neural networks and recurrent neural networks: feedforward neural networks are presented in Jurafsky & Martin sections 6.3 and 6.6; recurrent neural networks are presented in Jurafsky & Martin section 13.1.
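For the softmax temperature item under practical issues, the following Python sketch shows the basic computation: logits are divided by a temperature before the softmax is applied. The function name `softmax_with_temperature` and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; temperature must be positive."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for three vocabulary items
    for t in (0.5, 1.0, 2.0):
        print(t, softmax_with_temperature(logits, t))
```

A temperature below 1 sharpens the distribution towards the highest-scoring item, while a temperature above 1 flattens it towards uniform.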

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 6 that we have presented in class and that will be assessed at the final exam.

Lecture 06: Contextual word embeddings.

Content: Static embeddings vs. contextual embeddings. ELMo architecture. BERT encoder-based model. Masked language modeling. Next sentence prediction. Applications of BERT: sentiment analysis and named-entity recognition. GPT-n decoder-based model, masked attention. Sentence BERT.

References: I take it for granted that you already know about transformers, which are presented in Jurafsky & Martin, chapter 8. The ELMo model is taken from the on-line course 'NLP Course | For You' by Elena Voita, section Transfer Learning. BERT is presented in Jurafsky & Martin, chapter 9; skip sections 9.2.3, 9.3.1, 9.3.2, 9.4.2. Use lecture slides for GPT-n and sentence BERT.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 7 that we have presented in class and that will be assessed at the final exam.

Lecture 07: Large language models & pretraining

Content: Large language models (LLMs) with transformers. The language modeling head. Decoder-only architecture and text completion. Sampling for LLM generation. Key-Value cache. Pretraining for LLMs: cross-entropy loss and teacher forcing. Corpora for pretraining and quality filtering. Scaling laws for LLMs and new paradigms for pretraining. Overview of commercial LLMs. LLMs classification: encoder, decoder and encoder-decoder. Multi-lingual LLMs: monolingual training and training based on parallel corpora. Miscellanea: evaluation of LLMs, emergent abilities, potential harms from LLMs, hallucinations, mixture of experts.

References: Jurafsky & Martin, chapter 7; skip section 7.3 which will be presented later. Jurafsky & Martin, sections 8.5, 8.6, 8.7 and 8.8; skip section 8.8.3 which will be presented later. Use lecture slides for the following topics: overview of LLMs, LLMs classification, multi-lingual LLMs, and miscellanea.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 8 that we have presented in class and that will be assessed at the final exam.

Lecture 08: Large language models & post-training

Content: The process of post-training or adaptation. Fine-tuning and head layer; catastrophic forgetting; supervised fine-tuning (SFT). Instruction tuning; datasets for instruction tuning. Model alignment: reinforcement learning and reward function, reinforcement learning with human feedback, preference datasets. Preference-based learning: modeling preferences and the Bradley-Terry model, learning to score preferences, LLM alignment via preference learning. Direct preference optimization (general idea only). Parameter efficient fine-tuning: adapters and LoRA. Transfer learning.

References: Jurafsky & Martin, chapter 10; section 10.3 basic ideas only; skip section 10.4. See Jurafsky & Martin section 8.8.3 for LoRA. See the on-line course 'NLP Course | For You' by Elena Voita for adapters and for transfer learning.
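For the Bradley-Terry item above, a short Python sketch of the standard formulation may be useful: the probability that one response is preferred over another is the logistic sigmoid of the difference of their scores, and the training loss for a preference pair is the negative log of that probability. The function names and the example scores are hypothetical, and the scores stand in for the output of a reward model that is not shown here.

```python
import math

def preference_probability(score_preferred, score_rejected):
    """Bradley-Terry probability that the first item is preferred to the second."""
    return 1.0 / (1.0 + math.exp(-(score_preferred - score_rejected)))

def preference_loss(score_preferred, score_rejected):
    """Negative log-likelihood of one observed preference pair."""
    return -math.log(preference_probability(score_preferred, score_rejected))

if __name__ == "__main__":
    # Made-up scalar scores for two candidate responses to the same prompt.
    print(preference_probability(1.3, 0.2))
    print(preference_loss(1.3, 0.2))
```

Only the difference between the two scores matters, so adding the same constant to both scores leaves the probability unchanged.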

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 9 that we have presented in class and that will be assessed at the final exam.

Lecture 09: ChatBots

Content: ChatBots and chatBot life cycle. Datasets for natural language understanding; chatBot evaluation. Prompt and in-context learning. Prompt engineering and techniques for prompt design. Ethics.

References: Jurafsky & Martin, sections 7.3, 7.7 and 8.9. Video by A. Karpathy (see Day 16 box). Use lecture slides for prompt engineering.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 10 that we have presented in class and that will be assessed at the final exam.

Lecture 10: Retrieval augmented generation

Content: RAG general architecture. Neural information retrieval, cross-encoder and bi-encoder; ColBERT. Generation. Advanced RAG methods. Datasets and evaluation.

References: Jurafsky & Martin, chapter 11; skip sections 11.1 and 11.2. Use lecture slides for ColBERT.

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 13 that we have presented in class and that will be assessed at the final exam.

Lecture 13: Part-of-Speech Tagging

Content: Part-of-speech (PoS) and PoS tagging task. Evaluation of PoS tagging. Definition of structured prediction problem. Hidden Markov model (HMM), emission and transition probabilities. Probability estimation. HMM as automata. The Viterbi algorithm for decoding. Neural PoS taggers using local search: fixed-window feed-forward neural model, recurrent neural model, and recurrent bidirectional model. Named entity recognition and other sequence labelling tasks.

References: Jurafsky & Martin, chapter 17; skip section 17.5. Use lecture slides for the fixed-window feed-forward neural model. The recurrent bidirectional model is taken from Eisenstein, section 7.6.1.
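For the Viterbi item above, here is a minimal Python sketch of decoding with an HMM given as dictionaries of initial, transition and emission probabilities. The function name `viterbi`, the dictionary layout and the toy probabilities are assumptions made for illustration. The sketch multiplies raw probabilities, so it is only suitable for short inputs; log-probabilities would normally be used in practice.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` and its probability."""
    # best[t][tag]: probability of the best tag sequence for words[:t+1] ending in tag.
    # back[t][tag]: previous tag on that best sequence.
    best = [{tag: start_p[tag] * emit_p[tag].get(words[0], 0.0) for tag in tags}]
    back = [{tag: None for tag in tags}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] * trans_p[p][tag])
            best[t][tag] = (best[t - 1][prev] * trans_p[prev][tag]
                            * emit_p[tag].get(words[t], 0.0))
            back[t][tag] = prev
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

if __name__ == "__main__":
    # Toy HMM with made-up probabilities.
    tags = ["N", "V"]
    start_p = {"N": 0.8, "V": 0.2}
    trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
    emit_p = {"N": {"fish": 0.8, "sleep": 0.2}, "V": {"fish": 0.5, "sleep": 0.5}}
    print(viterbi(["fish", "sleep"], tags, start_p, trans_p, emit_p))
```

The table has one cell per position and tag, and each cell looks at every previous tag, which gives the usual running time quadratic in the number of tags and linear in the sentence length.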

In reply to Giorgio Satta

Re: Syllabus

by Giorgio Satta -

Here is the detailed list of subjects from lecture 14 that we have presented in class and that will be assessed at the final exam.

Lecture 14: Dependency Parsing

Content: Dependency trees and grammatical functions. Dependency grammars, projective and non-projective trees. Dependency treebanks and the Universal Dependencies project. Transition-based dependency parsing. Arc-standard parser: transition definition, oracle algorithm, and generation of training data. Feature extraction: feature functions and feature templates. Neural architectures for dependency parsing: case study, Kiperwasser & Goldberg 2016; feature extraction using bidirectional LSTM; training algorithm using hinge loss.

References: Jurafsky & Martin, chapter 19; skip section 19.3. Slides from the lecture for the part on neural dependency parsing.
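For the arc-standard items above (transition definition, oracle algorithm and generation of training data), here is a minimal Python sketch of an unlabeled static oracle. It follows the common formulation in which LEFTARC makes the top of the stack the head of the second element and RIGHTARC makes the second element the head of the top. The function name `arc_standard_oracle`, the encoding of the gold tree as a list of head indices and the example sentence are assumptions of this sketch; dependency labels and feature extraction are left out.

```python
def arc_standard_oracle(heads):
    """Derive the transition sequence for a projective gold tree.

    heads[i] is the gold head of word i (words are numbered from 1);
    heads[0] is unused and index 0 stands for ROOT.
    """
    n = len(heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    transitions, arcs = [], set()
    # pending[h]: number of dependents of h that are not attached yet.
    pending = [0] * (n + 1)
    for dep in range(1, n + 1):
        pending[heads[dep]] += 1
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, second = stack[-1], stack[-2]
            if second != 0 and heads[second] == top:
                transitions.append("LEFTARC")
                arcs.add((top, second))  # (head, dependent)
                pending[top] -= 1
                del stack[-2]
                continue
            # Attach top only once all of its own dependents have been collected.
            if heads[top] == second and pending[top] == 0:
                transitions.append("RIGHTARC")
                arcs.add((second, top))
                pending[second] -= 1
                stack.pop()
                continue
        if not buffer:
            raise ValueError("no transition applies: the gold tree is not projective")
        transitions.append("SHIFT")
        stack.append(buffer.pop(0))
    return transitions, arcs

if __name__ == "__main__":
    # "book me the flight": book <- ROOT, me <- book, the <- flight, flight <- book
    transitions, arcs = arc_standard_oracle([None, 0, 1, 4, 1])
    print(transitions)
    print(sorted(arcs))
```

Pairing each returned transition with the parser configuration it was applied to would give training examples for a classifier; that step is omitted here.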