Computational Tools for Corpus Linguistics
WS2014/2015 Tue 12:30 - 14:00 23.21.04.87
Lecture Slides & Materials
28.10.2014
Course overview.
Corpus linguistics basics: definitions of a corpus, taxonomies of corpora, corpus linguistics as a methodology or a theory, corpus-based vs. corpus-driven approaches, main fields of application of corpus linguistics, historical roots of data-intensive linguistics, corpus design.
04.11.2014
Corpus linguistics basics (cont.): representativeness, balance & sampling, mark-up & annotation.
British National Corpus (BNC).
11.11.2014
Multilingual corpora, statistical data.
DIY corpora.
18.11.2014
Unix basics, Unix tools.
25.11.2014
Unix tools.
02.12.2014
Tokenizer.
09.12.2014
Frequency lists.
16.12.2014
N-grams.
06.01.2015
HTML-stripper.
13.01.2015
Extract information from annotated corpora.
20.01.2015
KWIC (keyword in context).
27.01.2015
AWK.
03.02.2015
Exercises.