
  • Content: A detailed description of the course content and prerequisites can be found here.

    Textbook: The adopted textbook is Speech and Language Processing (3rd Edition, draft, Jan 6th 2026) by Dan Jurafsky and James H. Martin, available here.

    Logistics: Lectures are held on Thursdays and Fridays, 10:30-12:30, in room Ce.

    Office hours: Thursday 12:30-14:30, by email appointment only. Meetings can be held face-to-face or online at this Zoom link.

    • Forum for general news and announcements. Only the lecturer can post in this forum. Every student registered for this course is subscribed automatically.

    • Forum for discussion of technical matters presented during the lectures. Any student with a unipd account can post in this forum. Every student registered for this course is subscribed automatically.

    • Forum for project discussion. Any student with a unipd account can post in this forum. Every student registered for this course is subscribed automatically.

  • March 5th, Thursday (10:30-12:30)

    Course administration and presentation

    • Content outline
    • Laboratory sessions
    • Course requirements
    • Textbook
    • Project
    • Coursework

    Natural language processing: An unexpected journey

    • What is natural language processing?
    • Very short history of natural language processing
    • Why is natural language processing tricky?
    • Ambiguity, composition, recursion and hidden structure
    • Language & learning
    • Miscellanea

    References

    • Slides from the lecture

    Resources

  • March 6th, Friday (10:30-12:30)

    Text normalization

    • Word types and word tokens
    • Herdan's law and Zipf's law
    • Morphology and word-form
    • Corpora
    • Language identification and spell checking
    • Text normalization: contraction, punctuation and special characters
    • Word tokenization, character tokenization, and subword tokenization
    • Byte-pair encoding algorithm: learner, encoder and decoder

    References

    • Jurafsky and Martin, chapter 2

    Resources

  • March 12th, Thursday (10:30-12:30)

    Text normalization

    • Byte-pair encoding algorithm: learner, encoder and decoder (cont'd)
    • Sentence segmentation and case folding
    • Stop words, stemming and lemmatization

    Words and meaning

    • Lexical semantics
    • Distributional semantics
    • Count-based embeddings
    • Word2vec and skip-gram
    • Logistic regression
    • Training

    References

    • Jurafsky and Martin, chapter 2
    • Jurafsky and Martin, chapter 5

    Resources

  • March 13th, Friday (10:30-12:30)

    Words and meaning

    • Training (cont'd)
    • Practical issues
    • FastText and GloVe
    • Semantic properties of neural word embeddings
    • Evaluation
    • Cross-lingual word embeddings

    References

    • Jurafsky and Martin, chapter 5
    • Voita, NLP Course | For You (web course): Word embeddings
  • March 19th, Thursday (10:30-12:30)

    Statistical language models

    • Language modeling: prediction and generation
    • Language modeling applications
    • Relative frequency estimation
    • N-gram model
    • N-gram probabilities and bias-variance trade-off
    • Practical issues
    • Evaluation: perplexity measure
    • Sampling sentences
    • Smoothing: Laplace and add-k smoothing
    • Stupid backoff

    References

    • Jurafsky and Martin, chapter 3

    Resources

  • March 20th, Friday (10:30-12:30)

    Statistical language models

    • Linear interpolation
    • Out-of-vocabulary words
    • Limitations of N-gram model

    Neural language models (NLM)

    • General architecture for NLM
    • Feedforward NLM: inference
    • Feedforward NLM: training
    • Recurrent NLM: inference
    • Recurrent NLM: training

    Exercises

    • Subword tokenization: BPE algorithm

    References

    • Jurafsky and Martin, section 6.5
    • Jurafsky and Martin, section 13.2
    • Voita, NLP Course | For You (web course): Language Modeling

    Resources

  • March 26th, Thursday (10:30-12:30)

    Neural language models (NLM)

    • Practical issues: parameter freezing, weight tying, softmax temperature

    Contextual word embeddings

    • Static embeddings vs. contextualized embeddings
    • ELMo
    • BERT: encoder-only model
    • BERT: masked language modeling

    References

    • Jurafsky and Martin, chapter 9
    • Voita, NLP Course | For You (web course): Language Modeling
    • Slides from the lecture

    Resources

  • March 27th, Friday (10:30-12:30)

    Contextual word embeddings

    • BERT: next sentence prediction
    • Applications of BERT: sentiment analysis
    • Applications of BERT: named-entity recognition
    • GPT-n: decoder-only model
    • Sentence-BERT

    References

    • Jurafsky and Martin, chapter 9
    • Slides from the lecture
  • April 2nd, Thursday (10:30-12:30)

    Lab Session I: Static word embeddings

    • Introduction to the gensim library
    • Common operations with word embeddings: lookup, similarity, NN retrieval
    • Visualizing word embeddings: dimensionality reduction with PCA
    • Intrinsic evaluation of word embeddings: word similarity and word analogy benchmarks

    Large language models and pretraining

    • Pretraining and transfer learning
    • Large language models
    • Language modeling head
    • Text completion and decoder-only model
    • Casting NLP tasks as text completion

    References

    • Jurafsky and Martin, chapter 7
    • Jurafsky and Martin, chapter 8

    Resources

  • April 9th, Thursday (10:30-12:30)

    Large language models and pretraining

    • Sampling
    • Key-Value cache
    • Pretraining

    References

    • Jurafsky and Martin, chapter 7
    • Jurafsky and Martin, chapter 8
  • April 10th, Friday (10:30-12:30)

    Lab Session II: Hugging Face Transformers

    • General overview of the library
    • Importing and using BERT
    • Using Gemma and chat templates

    Large language models and pretraining

    • Training corpora

    References

    • Jurafsky and Martin, chapter 7
    • Jurafsky and Martin, chapter 8
    • Slides from the lecture

    Resources

  • April 16th, Thursday (10:30-12:30)

    Large language models and pretraining

    • Scaling laws for LLMs (cont'd)
    • Overview of LLMs
    • Classification for LLMs
    • Multi-lingual LLMs
    • Miscellanea

    References

    • Jurafsky and Martin, chapter 7
    • Jurafsky and Martin, chapter 8
    • Slides from the lecture
  • April 17th, Friday (10:30-12:30)

    Large language models and post-training

    • Fine-tuning
    • Instruction tuning
    • Datasets for instruction tuning
    • Model alignment

    References

    • Jurafsky and Martin, chapter 10

    Resources

  • April 23rd, Thursday (10:30-12:30)

    Large language models and post-training

    • Preference-based learning
    • Modeling preferences
    • Learning to score preferences
    • LLM alignment via preference learning
    • Direct preference optimization
    • Parameter-efficient fine-tuning: adapters
    • Parameter-efficient fine-tuning: LoRA
    • Transfer learning

    References

    • Jurafsky and Martin, chapter 10
    • Jurafsky and Martin, chapter 8
    • Voita, NLP Course | For You (web course): Transfer Learning
    • Slides from the lecture
  • April 24th, Friday (10:30-12:30)

    Lab Session III: Retrieval-Augmented Generation

    • Embedder (BERT) and tokenizer
    • Gemma
    • Loading documents, embedding them and saving them to a knowledge base
    • Retrieval: top-n similar chunks
    • Generation: answer from retrieved context

    References

    • Slides from the lecture

    Resources

  • April 30th, Thursday (10:30-12:30)

    Chatbots

    • Introduction to chatbots
    • Domain adaptation
    • Lifecycle of a chatbot
    • Datasets
    • Prompt
    • Prompt engineering

    References

    • Jurafsky and Martin, chapter 7
    • Slides from the lecture

    Resources

    • External video: Training pipeline of GPT assistants like ChatGPT by Andrej Karpathy, 2023. First part only: stop at timestamp 20:17.

  • May 7th, Thursday (10:30-12:30)

    Retrieval-Augmented Generation

    • Introduction to RAG
    • Neural information retrieval
    • Cross-encoder
    • Bi-encoder
    • ColBERT
    • Generation
    • Advanced RAG methods
    • Datasets
    • Evaluation

    Part-of-speech tagging

    • Part-of-speech (PoS) and part-of-speech tagging
    • Evaluation

    Hidden Markov models

    • Structured prediction problems

    References

    • Jurafsky and Martin, chapter 11
    • Jurafsky and Martin, chapter 17
    • Slides from the lecture

    Resources

  • May 8th, Friday (10:30-12:30)

    Hidden Markov models

    • Definition of Hidden Markov model (HMM)
    • Probability estimation for HMM
    • HMMs as automata with output
    • Decoding via Viterbi algorithm

    Neural part-of-speech tagging

    • Local search
    • Fixed-window neural model
    • Recurrent neural model
    • Recurrent bidirectional model

    Sequence labelling

    • Named entity recognition (NER)
    • BIO labeling
    • NER evaluation
    • Other sequence labelling tasks

    References

    • Jurafsky and Martin, chapter 17
    • Slides from the lecture

    Resources

  • May 14th, Thursday (10:30-12:30)

    Dependency parsing

    • Dependency trees
    • Grammatical functions
    • Projective and non-projective dependency trees
    • Dependency treebanks
    • Transition-based dependency parsing
    • Arc-standard parser
    • Transitions definition

    Exercises

    • Viterbi algorithm

    References

    • Jurafsky and Martin, chapter 19
    • Slides from the lecture

    Resources

  • May 21st, Thursday (10:30-12:30)

    Dependency parsing

    • Ambiguity
    • Oracle and generation of training data
    • Example
    • Feature extraction, feature functions and feature templates

    Neural dependency parsing

    • Case study: Kiperwasser & Goldberg 2016
    • Feature extraction using BiLSTM
    • Hinge loss function
    • Evaluation: UAS and LAS

    References

    • Jurafsky and Martin, chapter 19
    • Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, Kiperwasser and Goldberg, TACL, vol. 4, 2016
  • May 22nd, Friday (10:30-12:30)

    Lab Session IV: Named-Entity Recognition

    • BERT-based NER
    • Gemma prompting for NER
    • Evaluation
    • Exercise

    Resources

  • May 28th, Thursday (10:30-12:30)

    Machine translation

    • Word order and V, S, O language classification
    • Word translation and word alignment relation
    • Neural machine translation (NMT): general idea
    • Encoder-decoder architecture (seq2seq): general idea

    Exercises

    • Arc-standard oracle

    References

    • Jurafsky and Martin, chapter 12
    • Slides from the lecture

    Resources

  • This box contains the texts of final exams from past sessions and years.

    Be aware that the program for academic year 2025/26 has changed considerably, so several of the questions in the final exams from previous years are no longer part of the program.