Computational Morphology
Wednesday, 12:30-15:45, 24.21.03.61
Tentative course schedule
- 10 April -- Introduction, terminology, recalling theoretical morphology. Slides Homework: read this chapter about Tamil morphology.
There are two groups: one prepares information about nouns and adjectives, the other about verbs.
If you were not in class, write to me to be assigned to one of them. You should prepare a description that
- is interesting to the other group
- contains a preliminary analysis of the facts (e.g. which types of processes are used)
- identifies places that are interesting/challenging for the analysis.
If you are unable to come on April 17, you can write a short summary (1 page) reflecting
on the above-mentioned topics (after reading the chapter, of course).
- 17 April -- FSA and FST. Discussing Tamil properties. PDF. Some FSA and RE exercises.
Homework: have a look at one of the Tamil grammars in the library or on the Internet (24.21, spr o 700.a825, spr o 700.a575, spr o 700.r471, Lehmann)
and write a list of 20-30 pairs/triples that illustrate one phenomenon (like change of number or tense) and cover all the
morphophonologically different classes. If you are in class, bring your list; if not, send it to me by email before the class starts.
- 24 April -- Introducing xfst. Regular expressions for xfst. Homework.
- 8 May -- Working on transducing multicharacter labels into affixes and stack ordering (in particular, p. 144 in the book), as well as starting to work on Tamil. See files "past" and "tamil" in the dropbox folder.
Homework: Bambona exercise (p. 153 in the book) (don't submit code there...)
- 15 May -- Some more advanced xfst commands. Monish exercise
- 22 May -- Class canceled due to the presence of only one student
- 29 May -- Special session: Morphology with xmg (by Simon Petitjean)
- 5 June -- Exploring lexc (slides), creating a dictionary. Homework: exercises about Esperanto nouns and adjectives (4.2.8, p. 218-226 in the book, or here: nouns and adjectives)
- 12 June -- Transducers in lexc. Esperanto verbs. Homework: complete Esperanto verbs exercise + research the possibility of verbalising a noun/adjective and add it.
- 19 June -- Implementing some Tamil morphology (bring your Tamil examples!). Homework: modify the Esperanto solution using Flag Diacritics (p. 341-361) in order to control the co-occurrence of the ge- and -in affixes.
- 26 June -- Root-pattern morphology (Arabic)
- 3 July -- Presentations and discussion of AP projects
- 10 July -- Finnish Noun Inflection exercise, Finnish Consonant Gradation exercise.
Grading
For both BN and AP:
- Do your homework properly (most of the tasks with sufficient quality).
- Due dates will be announced and published here.
- You can leave your homework at the secretary's office or send it to me by email (email only for programming exercises).
- When you send me something that is related to this class by email, start the title with CompMorph18.
- Your homework assignments should be named HW-number-LastName.extension (e.g., HW3-Zinova.fst)
- Homework submitted after the due date earns no points.
- Up to 3 collaborators can submit a joint homework, indicating all names on the submission (please submit it once per group).
- Work that was obviously completed jointly without this being indicated will be marked with 0 points.
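If you want to check a file name and an email subject before submitting, the two naming rules above can be sketched as follows. This is only an illustration, not an official checker: it assumes that the number follows "HW" directly (as in the example HW3-Zinova.fst) and that the last name consists of letters only. The function names are made up for this sketch.

```python
import re

# Assumption: the number follows "HW" directly, as in HW3-Zinova.fst,
# and the last name is made of letters only (any alphabet).
HW_NAME = re.compile(r"HW(\d+)-([^\W\d_]+)\.(\w+)")

# Prefix required at the start of the email title.
SUBJECT_PREFIX = "CompMorph18"


def parse_homework_name(filename):
    """Return (number, last_name, extension), or None if the name does not fit."""
    match = HW_NAME.fullmatch(filename)
    if match is None:
        return None
    number, last_name, extension = match.groups()
    return int(number), last_name, extension


def has_course_prefix(subject):
    """Check that an email title starts with the course prefix."""
    return subject.startswith(SUBJECT_PREFIX)


print(parse_homework_name("HW3-Zinova.fst"))
print(has_course_prefix("CompMorph18 HW3"))
```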
For an AP:
- The AP takes the form of a Hausarbeit;
- you will have to describe a piece of morphology using one of the frameworks we will be working with;
- each student doing an AP should be describing a separate piece of morphology (you can work on one language and analyse different phenomena, if you want);
- the area covered by your program should be something that takes around 70 optimal rules;
- to find such a piece, go to the library and study the shelves with grammars of languages you don't know;
- you have to present the piece of morphology you have chosen at one of the seminars.
- As the result of your work I expect to receive a script, a set of test examples (with the corresponding set of outputs), and a paper.
- The script has to work for all the cases described by the piece of morphology you aim to cover.
- Your set of test examples should be representative of the data you aim to cover. Be sure to check that all the important cases are included and that you are not testing exactly the same combination of rules multiple times (unless you provide an automated testing script that checks the output).
- In the paper you should describe the facts that you are modeling, the choices you had to make while writing the program (e.g., the ordering of rules and the selection of the formalism), the testing phase, and (optionally) the material that you are aware of but that your program, for good reasons, does not cover.
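An automated testing script of the kind mentioned above could look roughly like this. It is a minimal sketch under stated assumptions: `toy_analyze` and its small lexicon are hypothetical stand-ins for your compiled transducer, and the tag format (`hund+Noun+Pl+Nom`) is invented for the illustration. In a real project the analyzer function would call your own script instead.

```python
def run_tests(analyze, cases):
    """Compare analyzer output with the expected analyses.

    `analyze` maps a surface form to an iterable of analysis strings;
    `cases` is a list of (surface_form, expected_analyses) pairs.
    Returns a list of (surface_form, expected, actual) for failing cases.
    """
    failures = []
    for surface, expected in cases:
        actual = set(analyze(surface))
        if actual != set(expected):
            failures.append((surface, set(expected), actual))
    return failures


# Hypothetical toy lexicon standing in for a real analyzer.
TOY_LEXICON = {
    "hundo": ["hund+Noun+Sg+Nom"],
    "hundoj": ["hund+Noun+Pl+Nom"],
}


def toy_analyze(surface):
    """Look the form up in the toy lexicon; unknown forms get no analysis."""
    return TOY_LEXICON.get(surface, [])


cases = [
    ("hundo", ["hund+Noun+Sg+Nom"]),
    ("hundoj", ["hund+Noun+Pl+Nom"]),
    ("hundojn", ["hund+Noun+Pl+Acc"]),  # not in the toy lexicon, so this case fails
]

failures = run_tests(toy_analyze, cases)
for surface, expected, actual in failures:
    print(f"FAIL {surface}: expected {sorted(expected)}, got {sorted(actual)}")
```

Comparing sets rather than lists means the order in which the analyzer returns analyses does not matter, while a missing or an extra analysis is still reported.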
Grades:
- The description part is worth 30 points, the script part -- 60 points, the set of testing examples -- 10 points;
- Grade/points correspondence:
- 1.0: 95 -- 100
- 1.3: 91 -- 94
- 1.7: 87 -- 90
- 2.0: 83 -- 86
- 2.3: 80 -- 82
- 2.7: 75 -- 79
- 3.0: 70 -- 74
- 3.3: 65 -- 69
- 3.7: 60 -- 64
- 4.0: 50 -- 59
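To see how points translate into a grade, the table can be read by its lower bounds, as in the sketch below. The names `GRADE_BOUNDS`, `total_points` and `grade_for` are made up for this illustration. The table lists no grade below 50 points, so the sketch returns `None` there instead of guessing one.

```python
# Lower bound of each row in the grade/points table, best grade first.
GRADE_BOUNDS = [
    (95, "1.0"), (91, "1.3"), (87, "1.7"), (83, "2.0"), (80, "2.3"),
    (75, "2.7"), (70, "3.0"), (65, "3.3"), (60, "3.7"), (50, "4.0"),
]


def total_points(description, script, tests):
    """Sum the three parts: description (max 30), script (max 60), tests (max 10)."""
    return description + script + tests


def grade_for(points):
    """Return the grade for a point total, or None below the listed range."""
    for lower, grade in GRADE_BOUNDS:
        if points >= lower:
            return grade
    return None  # assumption: the table does not list a grade below 50 points


print(grade_for(total_points(description=25, script=50, tests=8)))
```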
Materials appear here