Computational Morphology
Thursday, original time 10:30-14:00: videochat with lecture (10:30 - 12:00) + individual sessions (12:30 - 14:00). Join our Rocket Chat channel to stay up to date! Pieces of code are in the Etherpad.
Tentative course schedule
- 23 April -- Introduction, terminology, recalling theoretical morphology. Slides. YouTube: introduction to morphology
- 30 April -- FSA and FST. PDF. Some FSA and RE exercises. Homework 1: create a transducer (on paper or draw it on your computer) for English plural. Please use tags +Sg and +Pl. This photo can help.
Please download and unpack xfst before the next session; you need to accept the license agreement at the bottom of this page. From Monday on, you can get this book in the library; doing so is very useful.
If you need to refresh your knowledge about finite state automata and transducers, here are some useful links: this and the following lectures explain FSA, here is general information about transducers, and you can watch this video if you are stuck with the homework.
- 7 May -- Introducing xfst. Regular expressions for xfst. Homework 2.
- 14 May -- Working on transducing multicharacter labels into affixes and stack ordering (in particular, p. 144 in the book).
Homework 3: Bambona exercise (p. 153 in the book) (don't submit code there... Due date is May 28!)
- 28 May -- Some more advanced xfst commands. Monish exercise. Homework 4: The Monish Guesser exercise, p. 172 here (page 188 of the pdf)
- 4 June -- Exploring lexc (slides), creating a dictionary. Homework 5 (due 18.06): exercises about Esperanto nouns and adjectives (4.2.8, p. 218-226 in the book or here: nouns and adjectives)
- 18 June -- Transducers in lexc. Homework 6: Esperanto verbs + restricting overgeneration, p. 245 + p. 273 here (page 261 + page 289 of the pdf)
- 25 June -- Reduplication, Root-pattern morphology. Homework 7: Write a lexicon (using lexc) that creates appropriate forms from descriptions (see Etherpad)
- 2 July -- Flag Diacritics. Homework 8: Modify the Esperanto exercise (HW 6) to restrict overgeneration using Flag Diacritics
- 9 July -- Presentations and discussion of AP projects
- 16 July -- Homework 9: Finnish Consonant Gradation exercise, Finnish Noun Inflection exercise.
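The transducer idea behind Homework 1 (mapping a lexical form with the tags +Sg and +Pl to a surface form) can be illustrated in code. The following is only a minimal Python sketch, not xfst and not the expected paper solution. It assumes a deliberately simplified rule (add -es after s, x, or z, otherwise -s) and ignores irregular nouns and all other spelling changes. The state names and the functions `tokenize`, `step`, and `generate` are hypothetical names chosen for this illustration.

```python
import re

# Assumption: only these final letters trigger -es in this simplified sketch.
SIBILANTS = set("sxz")


def tokenize(lexical):
    """Split a lexical string into single letters and multicharacter tags."""
    return re.findall(r"\+Sg|\+Pl|[a-z]", lexical)


def step(state, symbol):
    """Return (next_state, output) for one input symbol, or None if there is no arc."""
    if state == "final":
        return None  # nothing may follow a number tag
    if symbol in ("+Sg", "+Pl"):
        if state == "start":
            return None  # a tag needs at least one stem letter before it
        if symbol == "+Sg":
            return "final", ""
        return "final", "es" if state == "sibilant" else "s"
    # Ordinary letter: copy it and remember whether it is a sibilant.
    return ("sibilant" if symbol in SIBILANTS else "plain"), symbol


def generate(lexical):
    """Map a lexical form such as 'fox+Pl' to a surface form, or None if rejected."""
    symbols = tokenize(lexical)
    if "".join(symbols) != lexical:
        return None  # the input contains symbols outside the alphabet
    state, output = "start", []
    for symbol in symbols:
        arc = step(state, symbol)
        if arc is None:
            return None
        state, out = arc
        output.append(out)
    return "".join(output) if state == "final" else None


surface_forms = [generate(w) for w in ("cat+Sg", "cat+Pl", "fox+Pl")]
```

Each arc pairs one input symbol with an output string, so the tags are consumed on the lexical side and realized as an affix (or nothing) on the surface side.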
Grading
For both BN and AP:
- Do your homework properly (most of the tasks with sufficient quality).
- Due dates will be announced and published here.
- You can leave your homework at the secretary's office or send it to me by email (email only for programming exercises)
- When you send me something that is related to this class by email, start the title with CompMorph20.
- Your homework assignments should be named HW-number-LastName.extension (e.g., HW3-Zinova.fst)
- Homework that is submitted after the due date earns no points.
- Up to 3 collaborators can submit a joint homework, indicating all names on the submission (please submit it once per group).
- Homework that was obviously completed jointly without this being indicated will be marked with 0 points.
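The naming and email conventions above can be checked mechanically before submitting. The following is a small Python sketch that takes the example HW3-Zinova.fst as the reference format. The regular expression (one last name made of letters only, an alphanumeric extension) and the function names are assumptions made for this illustration, not official rules.

```python
import re

# Assumption: the format follows the example "HW3-Zinova.fst".
NAME_PATTERN = re.compile(r"HW(\d+)-([^\W\d_]+)\.(\w+)")

# Prefix required at the start of the email title.
SUBJECT_PREFIX = "CompMorph20"


def homework_filename(number, last_name, extension):
    """Build a file name in the style of the example HW3-Zinova.fst."""
    return f"HW{number}-{last_name}.{extension}"


def is_valid_filename(name):
    """Check that the whole file name matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


def email_subject(topic):
    """Build an email title that starts with the required prefix."""
    return f"{SUBJECT_PREFIX} {topic}"


example_name = homework_filename(3, "Zinova", "fst")
```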
For an AP:
- the AP takes the form of a Hausarbeit (term paper);
- you will have to describe a piece of morphology using one of the frameworks we will be working with;
- each student doing an AP should be describing a separate piece of morphology (you can work on one language and analyse different phenomena, if you want);
- the area covered by your program should be something that takes around 70 optimal rules;
- to find such a piece, go to the library and study the shelves with grammars of languages you don't know;
- you have to present the piece of morphology you have chosen at one of the seminars.
- As a result of your work I expect to receive a script, a set of test examples (with the corresponding set of outputs), and a paper.
- The script has to work for all the cases described by the piece of morphology you aim to cover.
- Your set of test examples should be representative of the data you aim to cover. Be sure to check that all the important cases are included and that you are not testing exactly the same combination of rules multiple times (unless you provide an automated testing script that checks the output).
- In the paper you should describe the facts that you are modeling, the choices you had to make while writing the program (e.g., the ordering of rules and the selection of the formalism), the testing phase, and (optionally) the material that you are aware of but that your program, for good reasons, does not cover.
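An automated testing script of the kind mentioned above can be very small. The following Python sketch shows one possible shape, assuming the test set is a list of (input, expected output) pairs. The function `toy_generate` is a hypothetical stand-in for a real morphological generator and is deliberately too simple, so that one test fails.

```python
def run_tests(generate, cases):
    """Run every (input, expected) pair and return the failing ones.

    Each failure is reported as (input, expected, actual).
    """
    failures = []
    for lexical, expected in cases:
        actual = generate(lexical)
        if actual != expected:
            failures.append((lexical, expected, actual))
    return failures


def toy_generate(lexical):
    """Hypothetical stand-in: realizes +Pl as -s and +Sg as nothing."""
    return lexical.replace("+Pl", "s").replace("+Sg", "")


cases = [
    ("cat+Sg", "cat"),
    ("cat+Pl", "cats"),
    ("fox+Pl", "foxes"),  # not handled by the toy generator
]

failures = run_tests(toy_generate, cases)
for lexical, expected, actual in failures:
    print(f"FAIL {lexical}: expected {expected}, got {actual}")
```

Keeping the cases in one list makes it easy to see whether each important rule combination is covered once.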
Grades:
- The description part is worth 30 points, the script part -- 60 points, the set of testing examples -- 10 points;
- Grade/points correspondence:
- 1.0: 95 -- 100
- 1.3: 91 -- 94
- 1.7: 87 -- 90
- 2.0: 83 -- 86
- 2.3: 80 -- 82
- 2.7: 75 -- 79
- 3.0: 70 -- 74
- 3.3: 65 -- 69
- 3.7: 60 -- 64
- 4.0: 50 -- 59
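The point scheme and the grade table can be expressed as a short lookup. The following Python sketch assumes that each grade band is identified by its lower bound and that a total below 50 has no entry in the table (the function then returns `None`, since the page does not say what happens in that case). The names `GRADE_BANDS`, `total_points`, and `grade_for` are hypothetical.

```python
# Lower bound of each band, taken from the table, from best to worst grade.
GRADE_BANDS = [
    (95, "1.0"), (91, "1.3"), (87, "1.7"), (83, "2.0"), (80, "2.3"),
    (75, "2.7"), (70, "3.0"), (65, "3.3"), (60, "3.7"), (50, "4.0"),
]

# Maximum points per part: description, script, test examples.
MAX_POINTS = {"description": 30, "script": 60, "tests": 10}


def total_points(description, script, tests):
    """Add up the three parts after checking each against its maximum."""
    parts = {"description": description, "script": script, "tests": tests}
    for name, value in parts.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(parts.values())


def grade_for(points):
    """Return the grade of the first band whose lower bound is reached."""
    for lower_bound, grade in GRADE_BANDS:
        if points >= lower_bound:
            return grade
    return None  # assumption: totals below 50 are not covered by the table


example_total = total_points(description=25, script=50, tests=8)
example_grade = grade_for(example_total)
```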