Topic outline

  • INQ0091105 - NATURAL LANGUAGE PROCESSING 2024-2025 - PROF. GIORGIO SATTA

    Content: A detailed description of the course content and prerequisites can be found here.

    Textbook: The adopted textbook is Speech and Language Processing (3rd Edition, draft, January 12, 2025) by Dan Jurafsky and James H. Martin, available here.

    Additional resources: The following textbook can be used for consultation only: Introduction to Natural Language Processing by Jacob Eisenstein, October 2019, MIT Press, preprint version available here. The course also uses electronic forums for the discussion of technical matters and for administrative information. Video recordings of the lectures from academic year 2021/22 are available at this link.

    Logistics: Lectures are on Monday 16:30-18:30 (room Ce) and on Wednesday 16:30-18:30 (room Ce).

    Office hours: Wednesday 12:30-14:30, appointment by email required. Meetings can be face-to-face or online at this Zoom link.

    • Forum for general news and announcements. Only the lecturer can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.

    • Forum for discussion of technical matters presented during the lectures. Any student with a unipd account can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.

    • Forum for discussion of technical matters presented during the open laboratory sessions. Any student with a unipd account can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.

    • Forum for project discussion. Any student with a unipd account can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.

  • Day 01

    February 24th, Monday (16:30-18:30)

    Course administration and presentation

    • Content outline
    • Laboratory sessions
    • Course requirements
    • Textbook
    • Project
    • Coursework
    • Statistics
    • Lecturer evaluation

    Natural language processing: An unexpected journey

    • What is natural language processing?
    • A few case studies: finance, social networks, health
    • Very short history of natural language processing
    • Why is natural language processing tricky?
    • Ambiguity, composition, recursion and hidden structure
    • How does natural language processing work?
    • Learning & knowledge
    • Search & learning
    • Market, environment and ethics

    References

    • Slides from the lecture
    • Eisenstein, chapter 1 for learning & knowledge and for search & learning

    Resources

  • Day 02

    February 26th, Wednesday (16:30-18:30)

    Essentials of linguistics

    • What is linguistics?
    • Phonology
    • Morphology
    • Part of speech
    • Syntax: phrase structure and dependency structure
    • Lexical semantics and general semantics
    • Pragmatics and discourse

    Text normalization

    • Regular expressions
    • Word types and word tokens
    • Corpora
    • Language identification and spell checking
    • Text normalization: contraction, punctuation and special characters

    References

    • Slides from the lecture
    • Jurafsky and Martin, chapter 2

    Resources

  • Day 03

    March 3rd, Monday (16:30-18:30)

    Text normalization

    • Word tokenization, character tokenization, and subword tokenization
    • Byte-pair encoding algorithm
    • Sentence segmentation and case folding
    • Stop words, stemming and lemmatization
    • Research papers
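
    The byte-pair encoding algorithm above can be illustrated with a minimal sketch of the merge-learning loop; the toy corpus and the number of merges are invented for illustration and are not taken from the lecture.

    ```python
    from collections import Counter

    def learn_bpe(corpus, num_merges):
        """Learn BPE merges from a list of words (toy sketch)."""
        # Represent each word as a tuple of symbols, with an end-of-word marker.
        vocab = Counter(tuple(word) + ("_",) for word in corpus)
        merges = []
        for _ in range(num_merges):
            # Count adjacent symbol pairs over the whole vocabulary.
            pairs = Counter()
            for symbols, freq in vocab.items():
                for a, b in zip(symbols, symbols[1:]):
                    pairs[(a, b)] += freq
            if not pairs:
                break
            best = max(pairs, key=pairs.get)  # most frequent adjacent pair
            merges.append(best)
            # Replace every occurrence of the best pair with the merged symbol.
            new_vocab = Counter()
            for symbols, freq in vocab.items():
                out, i = [], 0
                while i < len(symbols):
                    if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                        out.append(symbols[i] + symbols[i + 1])
                        i += 2
                    else:
                        out.append(symbols[i])
                        i += 1
                new_vocab[tuple(out)] += freq
            vocab = new_vocab
        return merges

    print(learn_bpe(["low", "low", "lower", "newest", "newest", "widest"], 5))
    ```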

    Words and meaning

    • Lexical semantics
    • Distributional semantics
    • Review: vectors
    • Term-context matrix

    References

    • Jurafsky and Martin, chapter 2
    • Jurafsky and Martin, chapter 6
    • Eisenstein, section 14.3

    Resources

  • Day 04

    March 5th, Wednesday (16:30-18:30)

    Words and meaning

    • Pointwise mutual information
    • Probability estimation
    • Examples
    • Practical issues
    • Truncated singular value decomposition
    • Neural word embeddings
    • Word2vec and skip-gram
    • Logistic regression
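
    A minimal sketch of how positive pointwise mutual information can be computed from a term-context count matrix; the tiny matrix is invented for illustration.

    ```python
    import numpy as np

    def ppmi(counts):
        """Positive PMI from a term-context count matrix (rows: words, columns: contexts)."""
        total = counts.sum()
        p_wc = counts / total                  # joint probabilities P(w, c)
        p_w = p_wc.sum(axis=1, keepdims=True)  # marginals P(w)
        p_c = p_wc.sum(axis=0, keepdims=True)  # marginals P(c)
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log2(p_wc / (p_w * p_c))
        pmi[~np.isfinite(pmi)] = 0.0           # zero counts get PMI 0
        return np.maximum(pmi, 0.0)            # clip negative values (PPMI)

    # Toy 3x4 term-context matrix, values invented for illustration only.
    C = np.array([[2.0, 1.0, 0.0, 1.0],
                  [1.0, 3.0, 1.0, 0.0],
                  [0.0, 1.0, 2.0, 2.0]])
    print(ppmi(C))
    ```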

    References

    • Jurafsky and Martin, chapter 6
    • Voita, NLP Course | For You (web course): Word embeddings
  • Day 05

    March 10th, Monday (16:30-18:30)

    Words and meaning

    • Training
    • Practical issues
    • FastText and GloVe
    • Semantic properties of neural word embeddings
    • Evaluation
    • Cross-lingual word embeddings
    • Research papers

    Language models

    • Language modeling: word prediction and sentence distribution
    • Language modeling applications
    • Relative frequency estimation
    • N-gram model
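
    Relative frequency (maximum likelihood) estimation for the bigram case of the N-gram model above amounts to counting and normalizing; the toy corpus is invented for illustration.

    ```python
    from collections import Counter

    # Toy corpus, padded with sentence-boundary markers.
    sentences = [["<s>", "i", "like", "nlp", "</s>"],
                 ["<s>", "i", "like", "pizza", "</s>"],
                 ["<s>", "nlp", "is", "fun", "</s>"]]

    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))

    def p_bigram(w_prev, w):
        """Relative frequency estimate P(w | w_prev) = count(w_prev, w) / count(w_prev)."""
        return bigrams[(w_prev, w)] / unigrams[w_prev]

    print(p_bigram("i", "like"))    # 1.0: "like" always follows "i" in this corpus
    print(p_bigram("like", "nlp"))  # 0.5
    ```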

    References

    • Jurafsky and Martin, chapter 6
    • Jurafsky and Martin, chapter 3

    Resources

  • Day 06

    March 12th, Wednesday (16:30-18:30)

    Language models

    • N-gram probabilities and bias-variance trade-off
    • Practical issues
    • Evaluation: perplexity measure
    • Sampling sentences
    • Smoothing: Laplace and add-k smoothing
    • Stupid backoff and linear interpolation
    • Out-of-vocabulary words
    • Limitations of N-gram model
    • Research papers
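
    Perplexity and add-k smoothing, both listed above, can be combined in a short sketch for a bigram model; the vocabulary size, the value of k and the test sentence are arbitrary choices for illustration.

    ```python
    import math
    from collections import Counter

    def add_k_prob(bigrams, unigrams, vocab_size, k, w_prev, w):
        """Add-k smoothed bigram probability P(w | w_prev)."""
        return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * vocab_size)

    def perplexity(sentence, bigrams, unigrams, vocab_size, k=0.5):
        """exp of the average negative log probability over the bigrams of a sentence."""
        log_prob, n = 0.0, 0
        for w_prev, w in zip(sentence, sentence[1:]):
            log_prob += math.log(add_k_prob(bigrams, unigrams, vocab_size, k, w_prev, w))
            n += 1
        return math.exp(-log_prob / n)

    # Toy training counts, in the style of the relative frequency sketch of Day 05.
    train = [["<s>", "i", "like", "nlp", "</s>"], ["<s>", "i", "like", "pizza", "</s>"]]
    unigrams = Counter(w for s in train for w in s)
    bigrams = Counter((a, b) for s in train for a, b in zip(s, s[1:]))
    print(perplexity(["<s>", "i", "like", "nlp", "</s>"], bigrams, unigrams, len(unigrams)))
    ```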

    Exercises

    • Subword tokenization: BPE algorithm

    References

    • Jurafsky and Martin, chapter 3
  • Day 07

    March 17th, Monday (16:30-18:30)

    Neural language models (NLM)

    • General architecture for NLM
    • Feedforward NLM: inference
    • Feedforward NLM: training
    • Recurrent NLM: inference
    • Recurrent NLM: training
    • Practical issues: parameter freezing, weight tying, softmax temperature
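
    A minimal feedforward neural language model in the spirit of the inference architecture above, written as a PyTorch sketch; the vocabulary size, context length and layer dimensions are arbitrary.

    ```python
    import torch
    import torch.nn as nn

    class FeedforwardNLM(nn.Module):
        """Predict the next word from a fixed window of previous words (toy sketch)."""
        def __init__(self, vocab_size, context_size=3, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, context_ids):
            # context_ids: (batch, context_size) indices of the previous words
            e = self.embed(context_ids).flatten(start_dim=1)  # concatenate the embeddings
            h = torch.tanh(self.hidden(e))
            return self.out(h)                                # logits over the vocabulary

    model = FeedforwardNLM(vocab_size=100)
    logits = model(torch.randint(0, 100, (4, 3)))             # batch of 4 contexts
    probs = torch.softmax(logits, dim=-1)                     # next-word distributions
    print(probs.shape)                                        # torch.Size([4, 100])
    ```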

    References

    • Voita, NLP Course | For You (web course): Language Modeling
    • Jurafsky and Martin, sections 7.6, 7.7
    • Jurafsky and Martin, section 8.2

    Resources

  • Day 08

    March 19th, Wednesday (16:30-18:30)

    Transformers: short recap

    • Attention
    • Encoder
    • Decoder
    • Residual stream
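
    The attention recap above reduces to the scaled dot-product formula softmax(QK^T / sqrt(d_k)) V; a small numpy sketch with made-up dimensions follows.

    ```python
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys) similarity scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # weighted sum of the values

    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 8))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 8)
    ```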

    Contextualised word embeddings

    • Static embeddings vs. contextualized embeddings
    • ELMo
    • BERT: encoder-only model
    • Masked language modeling
    • Next sentence prediction

    References

    • Jurafsky and Martin, chapter 9
    • Jurafsky and Martin, sections 11.1, 11.2, 11.3
    • Voita, NLP Course | For You (web course): Language Modeling

    Resources

  • Day 09

    March 24th, Monday (16:30-18:30)

    Contextualised word embeddings

    • GPT-n decoder-only model
    • Sentence-BERT

    Large language models

    • Language modeling head
    • Text completion and decoder-only model
    • Casting NLP tasks as text completion
    • Sampling
    • Pretraining
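
    Sampling from the language modeling head, listed above, is typically controlled with a temperature and a top-k cutoff; a numpy sketch over invented logits follows.

    ```python
    import numpy as np

    def sample_next_token(logits, temperature=0.8, top_k=5, rng=np.random.default_rng()):
        """Sample a token id from logits with temperature scaling and top-k filtering."""
        logits = np.asarray(logits, dtype=float) / temperature
        top_ids = np.argsort(logits)[-top_k:]         # keep only the k highest-scoring tokens
        top_logits = logits[top_ids]
        probs = np.exp(top_logits - top_logits.max())
        probs /= probs.sum()
        return int(rng.choice(top_ids, p=probs))

    fake_logits = [1.2, 0.3, -0.5, 2.1, 0.0, 1.7]     # invented logits over a 6-word vocabulary
    print(sample_next_token(fake_logits))
    ```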

    References

    • Jurafsky and Martin, section 9.5
    • Jurafsky and Martin, chapter 10
    • Slides from the lecture
  • Day 10

    March 26th, Wednesday (16:30-18:30)

    Large language models

    • Pretraining
    • Training corpora
    • Scaling laws for LLMs
    • Overview of LLMs
    • Multi-lingual LLMs

    Exercises

    • Positive pointwise mutual information (PPMI)

    References

    • Jurafsky and Martin, chapter 10
    • Slides from the lecture
  • Lab Session I: word embeddings

    March 31st, Monday (8:30-10:30)

    Using pretrained word embeddings

    • Static word embeddings
    • Gensim and pre-trained embeddings
    • Embeddings visualization with PCA
    • Word embeddings evaluation: word similarity and word analogy benchmarks
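
    A possible sketch of the Gensim workflow outlined above; the pretrained model name is one of the sets distributed through gensim-data and is used here only as an example.

    ```python
    import gensim.downloader as api

    # Download a small set of pretrained static embeddings (example model name).
    wv = api.load("glove-wiki-gigaword-100")

    # Word similarity: nearest neighbours in the embedding space.
    print(wv.most_similar("university", topn=5))

    # Word analogy: king - man + woman ≈ queen.
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    # Cosine similarity between two words.
    print(wv.similarity("cat", "dog"))
    ```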

    Exercises

    • Working with pre-trained embeddings
    • Training your own embeddings

    Resources

  • Day 11

    March 31st, Monday (16:30-18:30)

    Post-training

    • Fine-tuning
    • Instruction tuning
    • Model alignment
    • Parameter efficient fine-tuning: adapters
    • Parameter efficient fine-tuning: LoRA
    • Transfer learning
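
    The LoRA idea listed above (freeze the pretrained weight matrix W and learn a low-rank update BA) can be sketched as a wrapper around a frozen linear layer; the dimensions, rank and scaling factor are arbitrary.

    ```python
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank update (toy sketch of LoRA)."""
        def __init__(self, linear, rank=4, alpha=8):
            super().__init__()
            self.linear = linear
            for p in self.linear.parameters():
                p.requires_grad = False                      # freeze the pretrained weights
            d_out, d_in = linear.weight.shape
            self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
            self.B = nn.Parameter(torch.zeros(d_out, rank))  # BA = 0 at initialization
            self.scale = alpha / rank

        def forward(self, x):
            return self.linear(x) + self.scale * (x @ self.A.T) @ self.B.T

    layer = LoRALinear(nn.Linear(16, 16))
    print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
    ```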

    References

    • Jurafsky and Martin, sections 11.4, 12.2, 12.3
    • Voita, NLP Course | For You (web course): Transfer Learning
    • Slides from the lecture
  • Day 12

    April 2nd, Wednesday (16:30-18:30)

    ChatBot

    • ChatBot
    • Datasets
    • Prompt
    • Prompt engineering
    • Retrieval-augmented generation

    References

    • Jurafsky and Martin, chapter 12 (skip sections 12.2 and 12.3, covered in the previous lecture)
    • Slides from the lecture
  • Day 13

    April 7th, Monday (16:30-18:30)

    ChatBot

    • Large reasoning models

    Part-of-speech tagging

    • Part-of-speech (PoS) and part-of-speech tagging
    • Evaluation

    Hidden Markov models

    • Definition of Hidden Markov model (HMM)
    • Probability estimation for HMM

    References

    • Jurafsky and Martin, chapter 17
    • Slides from the lecture

    Resources

  • Day 14

    April 9th, Wednesday (16:30-18:30)

    Hidden Markov models

    • HMMs as automata with output
    • Decoding via Viterbi algorithm
    • Forward algorithm
    • Trellis representation
    • Backward algorithm
    • Forward-backward algorithm: motivation
    • E-step and M-step
    • Research papers
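
    The Viterbi decoding algorithm above can be sketched over the trellis with numpy; the toy initial, transition and emission probabilities are invented for illustration.

    ```python
    import numpy as np

    def viterbi(obs, pi, A, B):
        """Most likely state sequence for an HMM (pi: initial, A: transition, B: emission)."""
        n_states, T = A.shape[0], len(obs)
        delta = np.zeros((T, n_states))           # best path probability ending in each state
        psi = np.zeros((T, n_states), dtype=int)  # backpointers
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A    # rows: previous state, columns: next state
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, obs[t]]
        # Follow the backpointers from the best final state.
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return path[::-1]

    # Toy 2-state, 3-symbol HMM with invented probabilities.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    print(viterbi([0, 1, 2], pi, A, B))
    ```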

    References

    • Jurafsky and Martin, chapter 17
    • Jurafsky and Martin, appendix A (from the book web page)
  • Day 15

    April 14th, Monday (16:30-18:30)

    Conditional random fields

    • Conditional random fields (CRF) and global features
    • Linear chain CRF, local features and feature templates
    • Inference algorithm
    • Training algorithm
    • Research papers

    Neural part-of-speech tagging

    • Local search
    • Fixed-window neural model
    • Recurrent neural model
    • Recurrent bidirectional model
    • Global search
    • Learnable transition features
    • LSTM-CRF model

    References

    • Jurafsky and Martin, chapter 17
    • Eisenstein, section 7.5.3
    • Eisenstein, section 7.6.1
  • Day 16

    April 16th, Wednesday (16:30-18:30)

    Sequence labelling

    • Named entity recognition (NER)
    • BIO labeling
    • NER evaluation
    • Other sequence labelling tasks

    Phrase-structure parsing (part I)

    • Constituents and phrase structure
    • Notions of head, argument and modifier
    • Grammatical relations
    • PP-attachment and wh-movement
    • Treebanks
    • Context-free grammar (CFG)
    • Probabilistic CFG
    • Lexicalized CFG

    References

    • Jurafsky and Martin, chapter 17
    • Jurafsky and Martin, chapter 18
    • Slides from the lecture
  • Lab Session II: introduction to transformers with Hugging Face

    April 21st, off-line

    Transformers & Hugging Face

    • Hugging Face hub
    • Transformer
    • Tokenizer
    • Datasets
    • Fine-tuning a transformer model
    • Evaluation
    • Generation
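
    A possible minimal sketch of the Hugging Face pieces listed above (hub, tokenizer, model, generation); the checkpoint name gpt2 is only a small, commonly available example.

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load a small example checkpoint from the Hugging Face hub.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Tokenize a prompt and generate a continuation.
    inputs = tokenizer("Natural language processing is", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    ```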

    Resources

  • Day 17

    April 23rd, Wednesday (16:30-18:30)

    Dependency parsing

    • Dependency trees
    • Grammatical functions
    • Projective and non-projective dependency trees
    • Dependency treebanks
    • Transition-based dependency parsing

    Exercises

    • Viterbi algorithm

    References

    • Jurafsky and Martin, chapter 19

    Resources

  • Day 18

    April 28th, Monday (16:30-18:30)

    Dependency parsing

    • Arc-standard parser
    • Transitions definition
    • Ambiguity
    • Oracle
    • Example
    • Oracle and generation of training data
    • Feature extraction, feature functions and feature templates
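
    A toy sketch of the arc-standard transition system above, with a configuration represented as (stack, buffer, arcs); the word indices and the example derivation are invented.

    ```python
    def shift(stack, buffer, arcs):
        """Move the first word of the buffer onto the stack."""
        return stack + [buffer[0]], buffer[1:], arcs

    def left_arc(stack, buffer, arcs):
        """Make the second-topmost stack element a dependent of the topmost, and pop it."""
        head, dep = stack[-1], stack[-2]
        return stack[:-2] + [head], buffer, arcs + [(head, dep)]

    def right_arc(stack, buffer, arcs):
        """Make the topmost stack element a dependent of the second-topmost, and pop it."""
        head, dep = stack[-2], stack[-1]
        return stack[:-1], buffer, arcs + [(head, dep)]

    # 0 is the artificial root; 1, 2, 3 stand for "she", "ate", "pizza".
    config = ([0], [1, 2, 3], [])
    config = shift(*config)      # stack [0, 1]
    config = shift(*config)      # stack [0, 1, 2]
    config = left_arc(*config)   # arc 2 -> 1 ("ate" -> "she")
    config = shift(*config)      # stack [0, 2, 3]
    config = right_arc(*config)  # arc 2 -> 3 ("ate" -> "pizza")
    config = right_arc(*config)  # arc 0 -> 2 (root)
    print(config)                # ([0], [], [(2, 1), (2, 3), (0, 2)])
    ```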

    References

    • Jurafsky and Martin, chapter 19
  • Day 19

    May 5th, Monday (16:30-18:30)

    Dependency parsing

    • Feature extraction, feature functions and feature templates
    • Alternative models for dependency parsing: beam search and graph-based dependency parsing

    Neural dependency parsing

    • Case study: Kiperwasser & Goldberg 2016
    • Feature extraction using BiLSTM
    • Hinge loss function

    Evaluation for dependency parsing

    • Unlabelled attachment score (UAS)
    • Labelled attachment score (LAS)
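
    UAS and LAS, listed above, reduce to counting words with the correct head (and, for LAS, the correct label); the gold and predicted arcs below are invented.

    ```python
    def attachment_scores(gold, predicted):
        """gold, predicted: lists of (head, label) pairs, one per word. Returns (UAS, LAS)."""
        n = len(gold)
        correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
        correct_labeled = sum(g == p for g, p in zip(gold, predicted))
        return correct_heads / n, correct_labeled / n

    # Invented example: 4 words with (head index, dependency label).
    gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obj")]
    print(attachment_scores(gold, pred))  # (0.75, 0.75)
    ```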

    References

    • Jurafsky and Martin, chapter 19
    • Slides from the lecture
    • Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, Kiperwasser and Goldberg, TACL, vol. 4, 2016
  • Lab Session III: natural language generation

    May 6th, Tuesday (16:30-18:30)

    Natural Language Generation

    • Bigram models
    • Large Language Models with Hugging Face
    • Customizing text generation via decoding strategies

    Exercises

    • Building a trigram model with a backoff strategy
    • Typo detection using an LLM

    Resources

  • Day 20

    May 7th, Wednesday (16:30-18:30)

    Machine translation

    • Word ordering and V,S,O language classification
    • Word translation and word alignment relation
    • Statistical machine translation (SMT)
    • Translation model + language model

    Exercises

    • Arc-standard oracle

    References

    • Jurafsky and Martin, chapter 13
    • Slides from the lecture

    Resources

  • Day 21

    May 12th, Monday (16:30-18:30)

    Neural machine translation

    • Neural machine translation (NMT) and posterior probability
    • Encoder-decoder architecture (seq2seq): general idea
    • Encoder-decoder through RNN and through transformer
    • RNN: autoregressive encoder-decoder
    • RNN: greedy inference algorithm
    • RNN: training algorithm and teacher forcing
    • RNN: attention and dynamic context vector
    • RNN: dot-product attention
    • RNN: bilinear attention
    • Transformer-based architecture
    • Cross-attention, query, key and value
    • Search tree and beam search
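
    Beam search, listed above, keeps the k best partial hypotheses at each step; in this sketch the function next_token_scores is a made-up stand-in for the decoder's next-token distribution.

    ```python
    import math

    def beam_search(next_token_scores, beam_size=3, max_len=5, eos="</s>"):
        """Keep the beam_size best partial hypotheses (sequence, log-probability)."""
        beams = [([], 0.0)]
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq and seq[-1] == eos:
                    candidates.append((seq, score))  # finished hypothesis, keep as is
                    continue
                for token, prob in next_token_scores(seq):
                    candidates.append((seq + [token], score + math.log(prob)))
            # Prune to the best beam_size hypotheses by total log-probability.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        return beams

    # Hypothetical toy "decoder": the same distribution regardless of the prefix.
    def next_token_scores(prefix):
        return [("the", 0.5), ("cat", 0.3), ("</s>", 0.2)]

    for seq, score in beam_search(next_token_scores):
        print(seq, round(score, 3))
    ```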

    References

    • Jurafsky and Martin, chapter 13
  • Lab Session IV: Local running of LLMs

    May 13th, Tuesday (16:30-18:30)

    Local running of LLMs

    • Ollama
    • OpenWebUI
    • Infrastructure issues
    • Computational resources
    • Exercise: Connecting to an LLM via a REST API in Python
    • Discussion
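
    A possible sketch of the exercise above, calling a locally running Ollama server over its REST API; the model name and the prompt are examples, and the server is assumed to listen on the default port 11434.

    ```python
    import requests

    # Assumes an Ollama server is running locally and the model has already been pulled,
    # e.g. with `ollama pull llama3`.
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "Explain byte-pair encoding in one sentence.",
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()
    print(response.json()["response"])
    ```
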
  • Day 22

    May 14th, Wednesday (16:30-18:30)

    Neural machine translation

    • Parallel corpora
    • Evaluation: BLEU and METEOR
    • NMT and leaderboard
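
    Sentence-level BLEU, listed above, can be computed with NLTK as a quick illustration; the hypothesis and reference are invented, and corpus-level BLEU is what is normally reported for MT systems.

    ```python
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["the", "cat", "is", "on", "the", "mat"]]  # list of reference translations
    hypothesis = ["the", "cat", "sat", "on", "the", "mat"]

    # Smoothing avoids zero scores when some n-gram orders have no matches.
    score = sentence_bleu(reference, hypothesis,
                          smoothing_function=SmoothingFunction().method1)
    print(round(score, 3))
    ```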

    Question answering

    • Question answering (QA) and factoid questions
    • Text-based QA: IR + machine reading

    Exercises

    • Spurious ambiguity

    References

    • Jurafsky and Martin, chapter 13
    • Slides from the lecture

    Resources

  • Day 23

    May 19th, Monday (16:30-18:30)

    Question answering

    • Machine reading based on contextual embeddings
    • Start and end probabilities
    • Candidate score and fine-tuning loss
    • Negative examples and sliding windows
    • Machine reading based on attention: Stanford attentive reader
    • Bilinear product attention
    • Practical issues
    • Research papers
    • Datasets and leaderboards
    • Answer sentence selection
    • Knowledge-based QA
    • Entity linking

    References

    • Eisenstein, section 17.5.2
    • Slides from the lecture

    Resources

  • Day 24

    May 26th, Wednesday (16:30-18:30)

    Semantic parsing

    • Referential meaning and general semantics
    • Lexical semantics resources: WordNet and word senses
    • Word sense disambiguation
    • Semantic roles and thematic grid
    • Lexical semantic resources: PropBank and FrameNet
    • Semantic role labeling (SRL)
    • Neural algorithm for SRL
    • Argument selection and selectional restrictions
    • Referential meaning and meaning representations
    • Abstract meaning representation formalism (AMR)
    • Semantic parsing and transition-based approaches
    • Research papers

    References

    • Jurafsky and Martin, chapter 21
    • Slides from the lecture

    Resources

  • Lab Session V: retrieval augmented generation

    June 3rd, Tuesday (16:30-18:30)

    Retrieval augmented generation (RAG)

    • Introduction to LangChain
    • Building a knowledge base with Chroma
    • Leveraging open-weight LM from Hugging Face
    • Designing prompt templates
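
    The retrieval side of the pipeline above can be sketched directly with the chromadb client; the lab uses LangChain on top of Chroma, so this stripped-down version, with invented documents, only illustrates the indexing and querying pattern.

    ```python
    import chromadb

    # In-memory Chroma collection holding a tiny, invented knowledge base.
    client = chromadb.Client()
    collection = client.create_collection("course_notes")
    collection.add(
        documents=["The exam consists of a written test and a project discussion.",
                   "Office hours are on Wednesday, by email appointment."],
        ids=["doc1", "doc2"],
    )

    # Retrieve the passage most similar to the user question.
    question = "When are office hours?"
    results = collection.query(query_texts=[question], n_results=1)
    context = results["documents"][0][0]

    # A RAG prompt simply prepends the retrieved context to the question.
    prompt = f"Answer using only the context below.\n\nContext: {context}\n\nQuestion: {question}"
    print(prompt)
    ```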

    Exercises

    • Developing a RAG application using domain-specific knowledge that has emerged after the training cutoff of the selected LMs

    Resources

  • Course syllabus

    The course syllabus is based on

    • the adopted textbook: Speech and Language Processing (3rd Edition, draft, January 12, 2025) by Dan Jurafsky and James H. Martin, available on the web
    • the auxiliary textbook: Introduction to Natural Language Processing by Jacob Eisenstein, October 2019, MIT Press, preprint version available on the web
    • the online course NLP Course | For You by Elena Voita, University of Edinburgh
    • the lecture slides, available on the course website
  • Final Exams