Natural Language Processing with Deep Learning

    What is this course about?

    Natural language processing (NLP), or computational linguistics, is one of the most important technologies of the information age. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, virtual agents, medical reports, politics, etc. In the 2010s, deep learning (neural network) approaches achieved very high performance across many different NLP tasks, using single end-to-end neural models that did not require traditional, task-specific feature engineering. In the 2020s, further remarkable progress was made by scaling up Large Language Models such as ChatGPT. In this course, students will gain a thorough introduction to both the basics of Deep Learning for NLP and the latest cutting-edge research on Large Language Models (LLMs). Through lectures, assignments, and a final project, students will learn the necessary skills to design, implement, and understand their own neural network models, using the PyTorch framework.

    “Take it. CS221 taught me algorithms. CS229 taught me math. CS224N taught me how to write machine learning models.” – A CS224N student on Carta

    Previous offerings

    Below you can find archived websites and student project reports from previous years. Disclaimer: assignments change from year to year; please do not do assignments from previous years!

    CS224n Websites: Winter 2024 / Winter 2023 / Winter 2022 / Winter 2021 / Winter 2020 / Winter 2019 / Winter 2018 / Winter 2017 / Autumn 2015 / Autumn 2014 / Autumn 2013 / Autumn 2012 / Autumn 2011 / Winter 2011 / Spring 2010 / Spring 2009 / Spring 2008 / Spring 2007 / Spring 2006 / Spring 2005 / Spring 2004 / Spring 2003 / Spring 2002 / Spring 2000
    CS224n Lecture Videos: Winter 2023 / Winter 2021 / Winter 2019 / Winter 2017
    CS224n Reports: Winter 2024 / Winter 2023 / Winter 2022 / Winter 2021 / Winter 2020 / Winter 2019 / Winter 2018 / Winter 2017 / Autumn 2015 and earlier
    CS224d Reports: Spring 2016 / Spring 2015

    Prerequisites

    Reference Texts

    The following texts are useful, but none are required. All of them can be read free online.

    If you have no background in neural networks but would like to take the course anyway, you might well find one of these books helpful to give you more background:


    Schedule

    Updated lecture slides will be posted here shortly before each lecture. Other links contain last year's slides, which are mostly similar.

    Lecture notes will be uploaded a few days after most lectures. The notes (which cover approximately the first half of the course content) give supplementary detail beyond the lectures.

    Disclaimer: Assignments change; please do not do old assignments. We will give no points for doing last year's assignments.

    Date / Description / Course Materials / Events / Deadlines
    Week 1

    Tue Apr 2
    Word Vectors
    [slides] [notes]
    Suggested Readings:
    1. Efficient Estimation of Word Representations in Vector Space (original word2vec paper)
    2. Distributed Representations of Words and Phrases and their Compositionality (negative sampling paper)
    Assignment 1 out
    [code]
    [preview]
    Thu Apr 4 Word Vectors and Language Models
    [slides] [notes] [code]
    Suggested Readings:
    1. GloVe: Global Vectors for Word Representation (original GloVe paper)
    2. Improving Distributional Similarity with Lessons Learned from Word Embeddings
    3. Evaluation methods for unsupervised word embeddings
    Additional Readings:
    1. A Latent Variable Model Approach to PMI-based Word Embeddings
    2. Linear Algebraic Structure of Word Senses, with Applications to Polysemy
    3. On the Dimensionality of Word Embedding
    Fri Apr 5 Python Review Session
    [slides] [colab]
    3:30pm - 4:20pm
    Gates B01
    Week 2

    Tue Apr 9
    Backpropagation and Neural Network Basics
    [slides] [notes]
    Suggested Readings:
    1. matrix calculus notes
    2. Review of differential calculus
    3. CS231n notes on network architectures
    4. CS231n notes on backprop
    5. Derivatives, Backpropagation, and Vectorization
    6. Learning Representations by Backpropagating Errors (seminal Rumelhart et al. backpropagation paper)
    Additional Readings:
    1. Yes you should understand backprop
    2. Natural Language Processing (Almost) from Scratch
    Assignment 2 out
    [code]
    [handout]
    [latex template]
    Assignment 1 due
    Thu Apr 11 Dependency Parsing
    [slides] [notes]
    Suggested Readings:
    1. Incrementality in Deterministic Dependency Parsing
    2. A Fast and Accurate Dependency Parser using Neural Networks
    3. Dependency Parsing
    4. Globally Normalized Transition-Based Neural Networks
    5. Universal Stanford Dependencies: A cross-linguistic typology
    6. Universal Dependencies website
    7. Jurafsky & Martin Chapter 18
    Fri Apr 12 PyTorch Tutorial Session
    [colab]
    3:30pm - 4:20pm
    Gates B01
    Week 3

    Tue Apr 16
    Recurrent Neural Networks
    [slides] [notes (lectures 5 and 6)]
    Suggested Readings:
    1. N-gram Language Models (textbook chapter)
    2. The Unreasonable Effectiveness of Recurrent Neural Networks (blog post overview)
    3. Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.1 and 10.2)
    4. On Chomsky and the Two Cultures of Statistical Learning
    5. Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.3, 10.5, 10.7-10.12)
    6. Learning long-term dependencies with gradient descent is difficult (one of the original vanishing gradient papers)
    7. On the difficulty of training Recurrent Neural Networks (proof of vanishing gradient problem)
    8. Vanishing Gradients Jupyter Notebook (demo for feedforward networks)
    9. Understanding LSTM Networks (blog post overview)
    Thu Apr 18 Sequence to Sequence Models and Machine Translation
    [slides] [notes (lectures 5 and 6)]
    Suggested Readings:
    1. Statistical Machine Translation slides, CS224n 2015 (lectures 2/3/4)
    2. Statistical Machine Translation (book by Philipp Koehn)
    3. BLEU (original paper)
    4. Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
    5. Sequence Transduction with Recurrent Neural Networks (early seq2seq speech recognition paper)
    6. Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq+attention paper)
    7. Attention and Augmented Recurrent Neural Networks (blog post overview)
    8. Massive Exploration of Neural Machine Translation Architectures (practical advice for hyperparameter choices)
    9. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
    10. Revisiting Character-Based Neural Machine Translation with Capacity and Compression
    Assignment 3 out
    [code]
    [handout]
    [latex template]
    [overleaf link]
    Assignment 2 due
    Week 4

    Tue Apr 23
    Final Projects and LLM intro
    [slides]
    Suggested Readings:
    1. Practical Methodology (Deep Learning book chapter)
    Project Proposal out
    [handout]

    Default Final Project out
    [handout]
    Thu Apr 25 Transformers
    (by Anna Goldie)
    [slides] [notes]
    Suggested Readings:
    1. Attention Is All You Need
    2. The Illustrated Transformer
    3. Transformer (Google AI blog post)
    4. Layer Normalization
    5. Image Transformer
    6. Music Transformer: Generating music with long-term structure
    7. Jurafsky and Martin Chapter 10 (Transformers and Large Language Models)
    Week 5

    Tue Apr 30
    Pretraining
    [slides]
    Suggested Readings:
    1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    2. Contextual Word Representations: A Contextual Introduction
    3. The Illustrated BERT, ELMo, and co.
    4. Jurafsky and Martin Chapter 11 (Fine-Tuning and Masked Language Models)
    Assignment 4 out
    [code]
    [handout]
    [overleaf]
    [colab run script]
    Assignment 3 due
    Thu May 2 Post-training (RLHF, SFT, DPO)
    (by Archit Sharma)
    [slides]
    Suggested Readings:
    1. Aligning language models to follow instructions
    2. Scaling Instruction-Finetuned Language Models
    3. AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
    4. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
    5. Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    Fri May 3 Hugging Face Transformers Tutorial Session
    [colab]
    3:30pm - 4:20pm
    Gates B03
    Project Proposal due
    Week 6

    Tue May 7
    Benchmarking and Evaluation
    (by Yann Dubois)
    [slides]
    Suggested Readings:
    1. Challenges and Opportunities in NLP Benchmarking
    2. Measuring Massive Multitask Language Understanding
    3. Holistic Evaluation of Language Models
    4. AlpacaEval
    Thu May 9 Efficient Neural Network Training
    (by Shikhar Murty)
    [slides]
    Suggested readings:
    1. Mixed Precision Training
    2. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
    3. PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
    4. LoRA: Low-Rank Adaptation of Large Language Models
    Final Project Proposals Returned

    Project Milestone out
    [handout]
    Assignment 4 due
    Week 7

    Tue May 14
    Speech Brain-Computer Interface
    (by Chaofei Fan)
    [slides]
    Suggested readings:
    1. A high-performance speech neuroprosthesis
    2. An accurate and rapidly calibrating speech neuroprosthesis
    3. A high-performance neuroprosthesis for speech decoding and avatar control
    4. Brain-Machine Interfaces (Principles of Neural Science chapter)
    Thu May 16 Reasoning and Agents
    (by Shikhar Murty)
    [slides]
    Suggested readings:
    1. Orca: Progressive Learning from Complex Explanation Traces of GPT-4
    2. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
    3. ReAct: Synergizing Reasoning and Acting in Language Models
    4. BAGEL: Bootstrapping Agents by Guiding Exploration with Language
    5. WebArena: A Realistic Web Environment for Building Autonomous Agents
    Additional Readings:
    1. Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
    2. Response: Emergent analogical reasoning in large language models
    3. WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
    Week 8

    Tue May 21
    Life after DPO
    (by Nathan Lambert)
    [slides]
    Suggested readings:
    1. RewardBench: Evaluating Reward Models for Language Modeling
    2. D2PO: Discriminator-Guided DPO with Response Evaluation Models
    3. Social Choice for AI Alignment: Dealing with Diverse Human Feedback
    Wed May 22 Final Project Milestone due
    Thu May 23 ConvNets, Tree Recursive Neural Networks and Constituency Parsing
    [slides]
    Suggested readings (tentative):
    1. Convolutional Neural Networks for Sentence Classification
    2. Improving neural networks by preventing co-adaptation of feature detectors
    3. A Convolutional Neural Network for Modelling Sentences
    4. Parsing with Compositional Vector Grammars
    5. Constituency Parsing with a Self-Attentive Encoder
    Final Project Report Instructions out
    [Instructions]
    Fri May 24
    Course Withdrawal Deadline
    Week 9

    Tue May 28
    An Introduction to Responsible NLP
    (by Adina Williams)
    Suggested readings:
    1. Preface + Introduction chapter of the FairML book by Solon Barocas, Moritz Hardt, Arvind Narayanan
    2. Introducing v0.5 of the AI Safety Benchmark from MLCommons
    Final Project Milestones Returned
    Thu May 30 NLP, linguistics, and philosophy
    [slides]
    Suggested readings:
    Week 10

    Tue June 4
    Final Project Emergency Assistance (no lecture). Extra project office hours are available during the usual lecture time; see Ed.
    Thu June 6 No class. Final project due
    Mon June 10 Final Project Poster Session 11 am - 3 pm [More details]
    Location: McCaw Hall and Ford Gardens
    On-campus students must attend in person!
    [Printing guide]