From Static Embeddings to Transformers

Lecturers: Dr. Younes Samih and David Arps
Prerequisites: Deep Learning
Room: 23.21.00.97
Dates: 21.03.2022 – 25.03.2022
Format: Block course
Time: 10:00 – 16:00



Overview:

In recent years, pre-trained embeddings have driven significant improvements across a wide range of classification tasks in natural language processing (NLP). Representing words as vectors in a low-dimensional continuous space and then using those vectors for downstream tasks reduced the need for extensive manual feature engineering. However, these pre-trained vectors are static: every instance of a word shares the same representation regardless of context, so polysemous words are handled poorly. More recently, large neural language models (LMs) have been released that produce contextualized word representations, which cope with the context-dependent nature of word meaning.
In this course we introduce vector semantics, which instantiates the distributional hypothesis by learning representations of word meaning, called embeddings, directly from the distributions of words in texts. These representations are used in every NLP application that makes use of meaning, and the static embeddings we introduce here underlie the more powerful dynamic or contextualized embeddings such as ELMo, BERT, and GPT-2.
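
The difference between static and contextualized representations can be made concrete in a few lines of code. The sketch below is purely illustrative and not taken from the course materials; it assumes the gensim and Hugging Face transformers libraries, a pre-trained GloVe model, and the bert-base-uncased checkpoint. A static embedding assigns the polysemous word "bank" the same vector everywhere, while BERT produces a different vector for each occurrence.

```python
# Illustrative sketch (not from the course materials), assuming gensim and
# Hugging Face transformers are installed, e.g. in a Colab notebook.
import gensim.downloader as api
import torch
from transformers import AutoModel, AutoTokenizer

# Static embeddings: one vector per word type, regardless of context.
glove = api.load("glove-wiki-gigaword-100")
static_bank = glove["bank"]  # the same 100-dimensional vector in every sentence

# Contextualized embeddings: one vector per token occurrence.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["She sat on the river bank.", "He opened an account at the bank."]
bank_id = tokenizer.convert_tokens_to_ids("bank")
with torch.no_grad():
    for sentence in sentences:
        inputs = tokenizer(sentence, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state[0]        # shape: (seq_len, 768)
        position = inputs["input_ids"][0].tolist().index(bank_id)
        print(sentence, hidden[position][:5])  # the two "bank" vectors differ
```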

The class will mix theoretical and practical sessions. During the practical sessions, you will have time to become familiar with the most widely used Python libraries in this field and apply many of the discussed methods yourself. You do not need to bring your own computer, but please make sure that you have a Google account so you can use Google Colab.



Course content


Code repo: https://github.com/davidarps/2022_course_embeddings_and_transformers

Here are the topics covered by this course, in order of appearance, together with an (estimated) schedule:

  1. Introduction (Mon) [slides]
  2. Language Modeling, Vocabularies, Tokenization (Mon), illustrated in the sketch after this list
  3. Static Word Vectors (Mon, Tue)
  4. Contextualized Word Embeddings (Tue)
  5. Attention and Transformers (Tue, Wed)
  6. Transfer Learning, Pre-training and Fine-tuning (Wed, Thu)
  7. Benchmarking (Thu)
  8. Bias in LMs (Thu)
  9. Interpretability (Thu, Fri)
  10. Conclusion (Fri)
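
As a first taste of topic 2, the snippet below shows how a subword tokenizer splits rare or unseen words into pieces from a fixed vocabulary. It is a minimal sketch assuming the Hugging Face transformers library and the bert-base-uncased tokenizer; the course exercises may use different tools.

```python
# Minimal subword tokenization sketch, assuming the Hugging Face transformers
# library; the model name below is an example, not necessarily the one used in class.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Words missing from the vocabulary are split into known subword units,
# so the model never sees a truly out-of-vocabulary token.
print(tokenizer.tokenize("Tokenization handles rare words gracefully."))
# prints subword pieces, e.g. ['token', '##ization', 'handles', 'rare', 'words', ...]
```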


References


  • Download [pdf]