INQ0091105 - NATURAL LANGUAGE PROCESSING 2023-2024
Topic outline
-
Content: A detailed description of the course content and prerequisites can be found here.
Textbook: The adopted textbook is Speech and Language Processing (3rd Edition, draft, January 7th, 2023) by Dan Jurafsky and James H. Martin, available here.
Additional resources: The following textbook can be used for consultation only: Introduction to Natural Language Processing by Jacob Eisenstein, October 2019, MIT Press, preprint version available here. The course also uses an electronic forum for discussion of technical matters and administrative information. You can also access video recordings of the lectures from academic year 2021/22 at this link.
Logistics: Lectures are on Wednesday 10:30-12:30 (room Me) and on Friday 10:30-12:30 (room De).
Office hours: Thursday 12:30-14:30, email appointment required. Meetings can be face-to-face or online at this Zoom link.
-
Forum for general news and announcements. Only the lecturer can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.
-
Forum for discussion of technical matters presented during the lectures. Any student with a unipd account can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.
-
Forum for discussion of technical matters presented during the open laboratory sessions. Any student with a unipd account can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.
-
Forum for project discussion. Any student with a unipd account can post in this forum. Subscription to this forum is automatic for every student who has registered for this course.
-
-
February 28th, Wednesday (10:30-12:30)
Course administration and presentation
- Content outline
- Laboratory sessions
- Course requirements
- Textbook
- Project
- Coursework
- Statistics
- Lecturer evaluation
Natural language processing: An unexpected journey
- What is natural language processing?
- A few case studies: finance, social networks, health
- Very short history of natural language processing
- Why is natural language processing tricky?
- Ambiguity, composition, recursion and hidden structure
- How does natural language processing work?
- Learning & knowledge
- Search & learning
- Market, environment and ethics
References
- Slides from the lecture
- Eisenstein, chapter 1 for learning & knowledge and for search & learning
Resources
-
March 1st, Friday (10:30-12:30)
Essentials of linguistics
- What is linguistics?
- Phonology
- Morphology
- Part of speech
- Syntax: phrase structure and dependency structure
- Lexical semantics and general semantics
- Pragmatics and discourse
Text normalization
- Word types and word tokens
- Corpora
- Language identification and spell checking
- Text normalization: contraction, punctuation and special characters
References
- Slides from the lecture
- Jurafsky and Martin, chapter 2
Resources
-
March 6th, Wednesday (10:30-12:30)
Text normalization
- Word tokenization, character tokenization, and subword tokenization
- Byte-pair encoding algorithm (see the sketch after this list)
- Sentence segmentation and case folding
- Stop words, stemming and lemmatization
- Research papers
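For reference, a minimal Python sketch of the byte-pair encoding merge loop is given below. The toy vocabulary, the end-of-word marker and the function names are illustrative only and are not taken from the lecture or the textbook.

```python
from collections import Counter

# Toy corpus: word -> frequency, each word pre-split into characters
# plus an end-of-word marker "_" (illustrative data).
vocab = {("l", "o", "w", "_"): 5, ("l", "o", "w", "e", "r", "_"): 2,
         ("n", "e", "w", "e", "s", "t", "_"): 6, ("w", "i", "d", "e", "s", "t", "_"): 3}

def merge_step(vocab):
    """Find the most frequent adjacent symbol pair and merge it everywhere."""
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return vocab, None
    best = max(pairs, key=pairs.get)
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged, best

for _ in range(5):                      # learn 5 merges
    vocab, pair = merge_step(vocab)
    print("merged:", pair)
```

Each learned merge becomes a rule of the subword tokenizer; at inference time the rules are applied in the order they were learned.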
Words and meaning
- Lexical semantics
- Distributional semantics
- Review: vectors
- Term-context matrix
References
- Jurafsky and Martin, chapter 2
- Jurafsky and Martin, chapter 6
- Eisenstein, section 14.3
Resources
-
March 8th, Friday (10:30-12:30)
Words and meaning
- Pointwise mutual information (see the worked sketch after this list)
- Probability estimation
- Examples
- Practical issues
- Truncated singular value decomposition
- Neural word embeddings
- Word2vec and skip-gram
- Logistic regression
- Training
- Practical issues
- FastText and GloVe
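Below is a minimal NumPy sketch of positive pointwise mutual information computed from a toy term-context count matrix. The words and counts are illustrative and not taken from the lecture.

```python
import numpy as np

# Toy term-context count matrix (rows = target words, columns = context words).
words    = ["cherry", "digital", "information"]
contexts = ["pie", "computer", "data"]
C = np.array([[4.0, 0.0, 0.0],
              [0.0, 5.0, 3.0],
              [0.0, 2.0, 6.0]])

total = C.sum()
p_wc = C / total                            # joint probabilities P(w, c)
p_w  = p_wc.sum(axis=1, keepdims=True)      # marginal P(w)
p_c  = p_wc.sum(axis=0, keepdims=True)      # marginal P(c)

with np.errstate(divide="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)  # clip negatives and -inf to 0

print(np.round(ppmi, 2))
```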
References
- Jurafsky and Martin, chapter 6
- Voita, NLP Course | For You (web course): Word embeddings
-
March 13th, Wednesday (10:30-12:30)
Words and meaning
- Semantic properties of neural word embeddings
- Evaluation
- Cross-lingual word embeddings
- Research papers
Language models
- Language modeling: prediction and generation
- Language modeling applications
- Relative frequency estimation
- N-gram model
- N-gram probabilities and bias-variance trade-off
- Practical issues
- Evaluation: perplexity measure
- Sampling sentences
- Smoothing: Laplace and add-k smoothing (see the sketch after this list)
- Stupid backoff and linear interpolation
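As a companion to the smoothing and perplexity topics, here is a minimal sketch of an add-k smoothed bigram model with per-token perplexity. The toy corpus and the value of k are illustrative only.

```python
import math
from collections import Counter

# Toy corpus of tokenized sentences with sentence boundary markers (illustrative data).
corpus = [["<s>", "i", "like", "nlp", "</s>"],
          ["<s>", "i", "like", "pizza", "</s>"],
          ["<s>", "nlp", "is", "fun", "</s>"]]

k = 0.5
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    unigrams.update(sent[:-1])                 # history counts
    bigrams.update(zip(sent, sent[1:]))        # (history, word) counts
V = len({w for sent in corpus for w in sent})  # vocabulary size

def prob(w, h):
    """Add-k smoothed bigram probability P(w | h)."""
    return (bigrams[(h, w)] + k) / (unigrams[h] + k * V)

def perplexity(sent):
    """Per-token perplexity of a sentence under the bigram model."""
    logp = sum(math.log2(prob(w, h)) for h, w in zip(sent, sent[1:]))
    return 2 ** (-logp / (len(sent) - 1))

print(perplexity(["<s>", "i", "like", "nlp", "</s>"]))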
References
- Jurafsky and Martin, chapter 6
- Jurafsky and Martin, chapter 3
Resources
-
March 15th, Friday (10:30-12:30)
Language models
- Out-of-vocabulary words
- Limitations of N-gram model
- Research papers
Neural language models (NLM)
- General architecture for NLM
- Feedforward NLM: inference
- Feedforward NLM: training
- Recurrent NLM: inference
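A compact PyTorch sketch of a recurrent neural language model (embedding layer, RNN, output projection) is given below; layer sizes, names and the random input are illustrative only, not the model discussed in class.

```python
import torch
import torch.nn as nn

# Minimal recurrent neural language model (a sketch; sizes and names are illustrative).
class RNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)    # token ids -> vectors
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # hidden state -> next-token logits

    def forward(self, token_ids):
        h, _ = self.rnn(self.embed(token_ids))
        return self.out(h)                                # logits at every position

vocab_size = 100
model = RNNLM(vocab_size)
x = torch.randint(0, vocab_size, (2, 7))                  # batch of 2 sequences, length 7
logits = model(x[:, :-1])                                 # predict token t+1 from tokens up to t
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), x[:, 1:].reshape(-1))
loss.backward()                                           # an optimizer step would follow during training
print(loss.item())
```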
Exercises
- Subword tokenization: BPE algorithm
References
- Jurafsky and Martin, chapter 3
- Voita, NLP Course | For You (web course): Language Modeling
- Jurafsky and Martin, section 7.5
- Jurafsky and Martin, section 7.7
- Jurafsky and Martin, section 9.2
-
March 20th, Wednesday (10:30-12:30)
Neural language models (NLM)
- Recurrent NLM: inference (continued)
- Recurrent NLM: training
- Practical issues: parameter freezing, weight tying, softmax temperature
Contextualised word embeddings
- Transformers: short recap
- Attention
- Static embeddings vs. contextualized embeddings
- ELMo
References
- Jurafsky and Martin, section 9.2
- Jurafsky and Martin, chapter 11
- Voita, NLP Course | For You (web course): Language Modeling
- Slides from lecture
Resources
-
March 22nd, Friday (10:30-12:30)
Large language models
- BERT: masked language modeling and next sentence prediction (see the sketch after this list)
- Other models
- The GPT-n family of large language models
- Other large language models
- Multi-lingual large language models
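To see BERT's masked-language-modeling objective in action, the Hugging Face fill-mask pipeline can be used as below; the checkpoint name and the example sentence are just illustrative choices.

```python
from transformers import pipeline

# Predict the masked token with a pretrained BERT model (illustrative example).
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK]."):
    print(round(pred["score"], 3), pred["token_str"])
```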
References
- Jurafsky and Martin, chapter 11
- Voita, NLP Course | For You (web course): Transfer Learning
- Slides from lecture
-
March 22nd, Friday (16:30-18:30)
Using pretrained word embeddings
- Introduction to the gensim library
- Common operations with word embeddings: lookup, similarity, NN retrieval
- Visualizing word embeddings: dimensionality reduction with PCA
- Intrinsic evaluation of word embeddings: word similarity and word analogy benchmarks
Pretraining word embeddings
- Using gensim to pretrain word embeddings (Word2Vec style)
- Saving and loading embeddings
Extrinsic evaluation of word embeddings
- Using word2vec representations for spam classification
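A condensed sketch of the gensim operations covered in this laboratory session follows; the pretrained model name, the toy corpus and the file name are assumptions for illustration.

```python
import gensim.downloader as api
from gensim.models import Word2Vec

# Pretrained embeddings from the gensim download hub (model name is illustrative).
wv = api.load("glove-wiki-gigaword-50")
print(wv.similarity("cat", "dog"))                   # cosine similarity between two words
print(wv.most_similar("france", topn=3))             # nearest-neighbour retrieval
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))  # analogy

# Pretraining Word2Vec-style embeddings on your own (toy) corpus.
sentences = [["natural", "language", "processing"],
             ["language", "models", "predict", "words"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=20)
model.wv.save("my_embeddings.kv")                    # save the keyed vectors for later reuse
```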
Resources
-
March 27th, Wednesday (10:30-12:30)
Large language models
- Multi-lingual large language models (continued)
- Sentence BERT
- Miscellanea: emergent abilities, hallucinations, mixture of experts
- Research papers
Fine-tuning
- Adaptation: feature extraction vs. fine-tuning; catastrophic forgetting
- Adapters
- LoRA (see the sketch after this list)
- Transfer learning
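A hedged sketch of parameter-efficient fine-tuning with LoRA via the peft library is shown below; the checkpoint name, rank and other hyperparameters are assumptions for illustration, not values from the lecture.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Wrap a pretrained model so that only the low-rank LoRA matrices are trainable.
base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(base, config)
model.print_trainable_parameters()   # reports the small fraction of trainable weights
```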
Exercises
- Positive pointwise mutual information (PPMI)
References
- Jurafsky and Martin, chapter 11
- Voita, NLP Course | For You (web course): Transfer Learning
- Slides from lecture
-
April 3rd, Wednesday (10:30-12:30)
Fine-tuning
- Prompt learning
- Retrieval augmented generation
- Large language models and ethics
- Research papers
Chatbots
- Supervised fine-tuning
- Reward modeling from human feedback
- Reinforcement learning training
References
- Jurafsky and Martin, section 10.10
- Slides from lecture
Resources
-
Slides: Training pipeline of GPT assistants like ChatGPT by Andrej Karpathy, 2023. First part only: stop at slide #30.
-
External video: Training pipeline of GPT assistants like ChatGPT by Andrej Karpathy, 2023. First part only: stop at timestamp 20:17.
-
April 5th, Friday (10:30-12:30)
Part-of-speech tagging
- Part-of-speech (PoS) and part-of-speech tagging
- Evaluation
Hidden Markov models
- Definition of Hidden Markov model (HMM)
- Probability estimation for HMM
- HMMs as automata with output
- Decoding via Viterbi algorithm (see the sketch after this list)
- Forward algorithm
- Trellis representation
- Backward algorithm
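For reference, a minimal NumPy sketch of Viterbi decoding for an HMM tagger is given below; the tag set, transition table and emission probabilities are toy values chosen for illustration.

```python
import numpy as np

states = ["DET", "NOUN", "VERB"]
pi = np.array([0.6, 0.3, 0.1])                   # initial state probabilities
A  = np.array([[0.1, 0.8, 0.1],                  # A[i, j] = P(state j | state i)
               [0.2, 0.3, 0.5],
               [0.5, 0.4, 0.1]])
# B[i, t] = P(word at position t | state i), already looked up for a 3-word toy sentence
B  = np.array([[0.7, 0.1, 0.0],
               [0.2, 0.6, 0.3],
               [0.1, 0.3, 0.7]])

T, N = B.shape[1], len(states)
delta = np.zeros((N, T))                          # best path probabilities
back  = np.zeros((N, T), dtype=int)               # backpointers
delta[:, 0] = pi * B[:, 0]
for t in range(1, T):
    for j in range(N):
        scores = delta[:, t - 1] * A[:, j] * B[j, t]
        back[j, t] = np.argmax(scores)
        delta[j, t] = np.max(scores)

path = [int(np.argmax(delta[:, T - 1]))]          # best final state, then follow backpointers
for t in range(T - 1, 0, -1):
    path.append(int(back[path[-1], t]))
print([states[s] for s in reversed(path)])
```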
References
- Jurafsky and Martin, chapter 8
- Slides from the lecture
Resources
-
April 10th, Wednesday (10:30-12:30)
Hidden Markov models
- Forward-backward algorithm: motivation
- E-step and M-step
- Research papers
Conditional random fields
- Conditional random fields (CRF) and global features
- Linear chain CRF, local features and feature templates
- Inference algorithm
- Training algorithm
- Research papers
References
- Jurafsky and Martin, chapter 8
- Jurafsky and Martin, appendix A
- Eisenstein, section 7.5.3
-
April 12th, Friday (10:30-12:30)
Neural part-of-speech tagging
- Local search
- Fixed-window neural model
- Recurrent neural model
- Recurrent bidirectional model
- Global search
- Learnable transition features
- LSTM-CRF model
Sequence labelling
- Named entity recognition (NER)
- BIO labeling
- NER evaluation
- Other sequence labelling tasks
References
- Jurafsky and Martin, chapter 8
- Eisenstein, section 7.6.1
-
April 17th, Wednesday (16:30-18:30)
Dependency parsing
- Dependency trees
- Grammatical functions
- Projective and non-projective dependency trees
- Dependency treebanks
- Transition-based dependency parsing
Exercises
- N-gram model and add-\(k\) smoothing
References
- Jurafsky and Martin, chapter 18
Resources
-
April 19th, Friday (10:30-12:30)
Dependency parsing
- Arc-standard parser
- Transitions definition
- Ambiguity
- Oracle
- Example
Exercises
- Part-of-speech tagging
- HMM supervised estimation
References
- Jurafsky and Martin, chapter 18
-
April 19th, Friday (16:30-18:30)
Transformers & Huggingface
- Huggingface hub
- Transformer
- Tokenizer
- Datasets
- Fine-tuning a transformer model
- Evaluation
- Generation
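A condensed sketch of the laboratory pipeline (dataset from the Hub, tokenization, fine-tuning with the Trainer API) is given below; the dataset, checkpoint and hyperparameters are illustrative choices, not the exact ones used in the session.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)
args = TrainingArguments(output_dir="out", per_device_train_batch_size=8,
                         num_train_epochs=1, evaluation_strategy="epoch")
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"].shuffle(seed=0).select(range(2000)),
                  eval_dataset=encoded["test"].select(range(500)))
trainer.train()
```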
Resources
-
April 24th, Wednesday (10:30-12:30)
Dependency parsing
- Arc-standard parser
- Oracle and generation of training data
- Feature extraction, feature functions and feature templates
- Alternative models for dependency parsing: beam search and graph-based dependency parsing
Neural dependency parsing
- Case study: Kiperwasser & Goldberg 2016
- Feature extraction using BiLSTM
- Hinge loss function
References
- Jurafsky and Martin, chapter 18
- Slides from the lecture
- Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, Kiperwasser and Goldberg, TACL, vol. 4, 2016
-
April 29th, Monday
Introduction to LangChain Library
- Model I/O
- Data connection
- Chains
- Agents
- Memory
- Callbacks
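A minimal prompt-to-model chain is sketched below, assuming a LangChain 0.1.x-era package layout (langchain plus langchain-openai) and an OpenAI API key; class names and imports may differ in other LangChain versions, so treat this as an assumption rather than the exact lab code.

```python
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI   # requires the langchain-openai package and an API key

# Build a simple chain: prompt template -> chat model.
prompt = PromptTemplate.from_template("Translate the following sentence into Italian:\n{sentence}")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.invoke({"sentence": "Natural language processing is fun."}))
```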
Resources
-
May 3rd, Friday (10:30-12:30)
Dependency parsing
- Alternative models for dependency parsing: graph-based dependency parsing
- Evaluation: UAS and LAS
Exercises
- Viterbi algorithm
References
- Jurafsky and Martin, chapter 18
- Slides from the lecture
-
May 8th, Wednesday (10:30-12:30)
Semantic parsing
- Referential meaning and general semantics
- Lexical semantics resources: WordNet and word senses
- Word sense disambiguation
- Semantic roles and thematic grid
- Lexical semantic resources: PropBank and FrameNet
- Semantic role labeling (SRL)
- Neural algorithm for SRL
- Argument selection and selectional restrictions
- Referential meaning and meaning representations
- Abstract meaning representation formalism (AMR)
- Semantic parsing and transition-based approaches
- Research papers
References
- Jurafsky and Martin, chapter 20
- Slides from the lecture
Resources
-
May 10th, Friday (10:30-12:30)
Machine translation
- Word ordering and V,S,O language classification
- Word translation and word alignment relation
- Statistical machine translation (SMT)
- Translation model + language model
- Neural machine translation (NMT): general idea
- Encoder-decoder architecture (seq2seq): general idea
Exercises
- Arc-standard oracle
References
- Jurafsky and Martin, chapter 13
- Slides from the lecture
Resources
-
May 10th, Friday
Arc-standard model
- Implement an arc-standard parser
- Implement an associated oracle
- Train a neural model
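To get started, a compact sketch of the arc-standard transition system (SHIFT, LEFT-ARC, RIGHT-ARC) on word indices is shown below; the toy sentence and the fixed transition sequence, which here plays the role of the oracle, are illustrative only.

```python
# Parser state: a stack, a buffer, and a set of (head, dependent) arcs.
def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs):
    dep = stack.pop(-2)                 # second-topmost stack word becomes a dependent
    arcs.append((stack[-1], dep))

def right_arc(stack, buffer, arcs):
    dep = stack.pop(-1)                 # topmost stack word becomes a dependent
    arcs.append((stack[-1], dep))

# Toy run on "ROOT she saw dogs" (indices 0..3, with 0 the artificial ROOT).
stack, buffer, arcs = [0], [1, 2, 3], []
for action in [shift, shift, left_arc, shift, right_arc, right_arc]:
    action(stack, buffer, arcs)
print(arcs)   # [(2, 1), (2, 3), (0, 2)]: "saw" heads "she" and "dogs", ROOT heads "saw"
```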
Resources
-
May 15th, Wednesday (10:30-12:30)
Machine translation
- RNN: autoregressive encoder-decoder
- RNN: greedy inference algorithm
- RNN: training algorithm and teacher forcing
- RNN: attention and dynamic context vector
- RNN: dot-product attention
- RNN: bilinear attention
- Transformer-based architecture
- Cross-attention, query, key and value (see the sketch after this list)
- Search tree and beam search
- Evaluation: BLEU and METEOR
- NMT and leaderboard
- Parallel corpora
- Research papers
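A small NumPy sketch of scaled dot-product cross-attention (decoder queries attending over encoder keys and values) follows; the matrix sizes and random values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Q = rng.normal(size=(3, d))           # 3 decoder positions (queries)
K = rng.normal(size=(5, d))           # 5 encoder positions (keys)
V = rng.normal(size=(5, d))           # values, one per encoder position

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

scores = Q @ K.T / np.sqrt(d)         # similarity of each query with each key
weights = softmax(scores, axis=-1)    # attention distribution over encoder positions
context = weights @ V                 # dynamic context vector for each decoder position
print(weights.shape, context.shape)   # (3, 5) (3, 8)
```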
References
- Jurafsky and Martin, chapter 13
- Jurafsky and Martin, chapter 9
- Jurafsky and Martin, chapter 10
-
May 17th, Friday (10:30-12:30)
Question answering
- Question answering (QA) and factoid questions
- Text-based QA: IR + machine reading
- Machine reading based on contextual embeddings (see the sketch after this list)
- Start and end probabilities
- Candidate score and fine-tuning loss
- Negative examples and sliding windows
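Extractive question answering of this kind can be tried directly with the Hugging Face question-answering pipeline, which returns the highest-scoring answer span together with its start and end offsets; the checkpoint name and the question/context pair below are illustrative.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(question="Where is the University of Padua located?",
            context="The University of Padua is located in Padua, Italy, and was founded in 1222.")
print(result["answer"], result["score"], result["start"], result["end"])
```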
Exercises
- Spurious ambiguity
References
- Jurafsky and Martin, chapter 14
Resources
-
May 17th, Friday
Summarization
- T5 model
- Dataset
- Data collator and training
- Evaluation: ROUGE measure
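A condensed sketch of the session is given below: summarization with a pretrained T5 checkpoint and ROUGE scoring via the evaluate library; the checkpoint, the input text and the reference summary are illustrative only.

```python
from transformers import pipeline
import evaluate

summarizer = pipeline("summarization", model="t5-small")
article = ("The course covers word embeddings, language models, part-of-speech tagging, "
           "dependency parsing, machine translation, question answering and dialogue systems.")
summary = summarizer(article, max_length=30, min_length=5)[0]["summary_text"]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=[summary],
                       references=["The course covers core NLP tasks from embeddings to dialogue."])
print(summary)
print(scores["rouge1"], scores["rougeL"])
```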
Resources
-
May 22nd, Wednesday (10:30-12:30)
Question answering
- Machine reading based on attention: Stanford attentive reader
- Bilinear product attention
- Practical issues
- Retrieval augmented generation
- Research papers
- Datasets and leaderboards
- Answer sentence selection
- Knowledge-based QA
- Entity linking
Dialogue
- Human conversation and turns
- Dialogue and speech acts
- Grounding
- Dialogue structure, adjacency pairs, and sub-dialogue
- Inference
Chatbots
- Rule-based systems
- Corpus-based systems
- Response by retrieval
- Response by generation
- Hybrid systems
- Research papers
Virtual assistants: frame-based
- Frame-based dialogue systems
- Slot/value pairs and question templates
- Domain classification, intent determination, and slot filling
References
- Eisenstein, section 17.5.2
- Jurafsky and Martin, chapter 14
- Slides from the lecture
- Jurafsky and Martin, chapter 15
Resources
-
May 24th, Friday (10:30-12:30)
Exercises
- Word embeddings: parallelogram model
- Dependency tree and ambiguity
Virtual assistants: dialogue-state
- General architecture
- Dialogue acts
- Natural language understanding
- Dialogue state tracker
- Dialogue policy
- Natural language generation
Dialogue systems
- Evaluation
- Ethical issues
References
- Jurafsky and Martin, chapter 15
Resources
-
May 29th, Wednesday (10:30-12:30)
Discussion & conclusions
- NLP timeline
- Open problems
- Explainability
- Grounding
- Theory vs. invention
- Ethics
- The NLP Hype
- NLU Datasets
- Generative AI project lifecycle
- FLAN Datasets
- ChatBot Arena
Wrap up
- Overview of past years final exams
- Overview of course syllabus
References
- Slides from the lecture
Resources
-
The course syllabus is based on
- the adopted textbook: 'Speech and Language Processing' by Dan Jurafsky and James H. Martin, 3rd Edition, draft from January 7th, 2023, available on the web
- the auxiliary textbook 'Introduction to Natural Language Processing' by Jacob Eisenstein, October 2019, MIT Press, preprint version available on the web
- the online course 'NLP Course | For You' by Elena Voita, University of Edinburgh
- lecture slides, available on the course website
-
Instructions for project registration, preparation and submission have already been posted in the Project forum.
In addition, I intend to run a plagiarism-detection tool on your software. To do this, please extract all of your code from the notebook for part II of your project and submit the resulting .py file to the assignment activity below associated with the date of your final exam. Only one student per group should make the .py submission.
-
Opened: Saturday, 1 June 2024, 12:00 AM
Due: Friday, 21 June 2024, 11:59 PM
Upload here a code only version of part II of your project.
-
Opened: Monday, 1 July 2024, 12:00 AM
Due: Sunday, 14 July 2024, 11:59 PM
Upload here a code only version of part II of your project.
-
Opened: Tuesday, 3 September 2024, 12:00 AM
Due: Monday, 9 September 2024, 11:59 PM
-
-
In this box I will post the text of final exams from past sessions/years.