Computational Morphology
Important information
Thursdays, 10:30-14:00, Webex meeting, in two parts:
10:30-12:00 - Q&A: you may bring individual, clarification, or homework questions (first homework submission deadline: Wednesday 10:30; homework update deadline: Thursday 12:30)
12:30-14:00 - new material, group work on learning new skills, group and mini-group discussions
Webex meetings will not be recorded! Experience shows that providing recordings greatly diminishes online participation. This is a hands-on class, so if you want to acquire the skills, you need to participate. If you cannot attend on a given day, the material is in the book, I am available for questions in RocketChat, and your homework buddy or group can answer your questions as well!
Join our RocketChat channel to stay up to date!
Homework submission schedule: you must submit your homework by 10:30 on Wednesday of the week following the homework announcement (one week later if Thursday is a day off). I will have a quick look at it, and on Thursday you can ask questions (10:30-12:00) and update your solution (until 12:30).
Tentative course schedule
- 15 April -- General session starts at 10:30. Be online so that you can easily find a homework partner or group! Introduction, terminology, recalling theoretical morphology. Slides. YouTube: introduction to morphology.
FSA and FST. PDF. Some FSA and RE exercises. Homework 1: create a transducer for English plurals. Please use the tags +Sg and +Pl. This photo can help.
Please download and unpack xfst before the next session; you need to accept the license agreement at the bottom of this page. It is very useful to get this book from the library.
If you need to refresh your knowledge about finite state automata and transducers, here are some useful links: this and the following lectures explain FSA, here is general information about transducers, and you can watch this video if you are stuck with the homework.
- 22 April -- Introducing xfst. Regular expressions for xfst. Outputs from the class: text file for bulk testing, script file. Homework 2.
- 29 April -- Transducing multicharacter labels into affixes, and stack ordering (in particular, p. 144 in the book). Script file for verbs, script file for nouns.
Homework 3: Bambona exercise (p. 153 in the book) (don't submit code there...)
- 6 May -- Some more advanced xfst commands. Monish exercise. Homework 4: The Monish Guesser exercise, p. 172 here (page 188 of the pdf)
- 20 May -- Exploring lexc (slides), creating a dictionary. Homework 5 (due June 9): exercises on Esperanto nouns and adjectives (4.2.8, pp. 218-226 in the book, or here: nouns and adjectives)
- 27 May -- No plenum Webex session. I will answer questions in RocketChat, and individual sessions during the week are possible if you have questions and obligatory if you want to do an AP (we will discuss your project).
- 10 June -- Transducers in lexc. Homework 6: Esperanto verbs + restricting overgeneration, pp. 245 and 273 here (pages 261 and 289 of the PDF)
- 17 June -- Reduplication, Root-pattern morphology. Homework 7: generate terms for creating a family tree (see the file in your dropbox)
- 24 June -- Flag Diacritics. Homework 8: add 2 verbal roots and 2 forms to the Arabic morphological analysis
- 1 July -- Plenum session: presentations and discussion of AP projects. Every author of an AP project has to pose a small problem related to their language. Homework 9: solve one of the presented problems; change the Esperanto exercise (HW 6) so that it restricts overgeneration using Flag Diacritics
- 9 July -- Discussion session: authors of AP problems must be available from 10:30 to 12:00 in case there are questions. Plenum session: other approaches to morphology. Homework 10: in groups of 2-3 people, read one of the articles (list will follow) and prepare to tell the others about it.
- 16 July -- Discussion session: you may meet with your group to decide on who tells what to the group or ask comprehension questions to me. Plenum session: group presentations of the articles, discussion. Homework 11: TBA
- 27 July -- Plenum session: discussing various approaches to computational morphology and its prospects in contemporary NLP.
Grading
For both BN and AP:
- Do your homework properly (most of the tasks with sufficient quality).
- Due dates will be announced and published here.
- You can leave your homework with the secretary or send it to me by email (email only for programming exercises).
- When you send me something related to this class by email, start the subject line with CompMorph21.
- Your homework assignments should be named HWnumber-LastName.extension (e.g., HW3-Zinova.fst).
- Homework submitted after the due date does not earn points.
- Up to 3 collaborators can submit a joint homework, indicating all names on the submission (please submit it once per group).
- Work that was obviously completed jointly without this being indicated will receive 0 points.
For an AP:
- the AP takes the form of a Hausarbeit (term paper);
- you will have to describe a piece of morphology using one of the frameworks we will be working with;
- each student doing an AP should be describing a separate piece of morphology (you can work on one language and analyse different phenomena, if you want);
- the area covered by your program should be something that takes around 70 optimal rules;
- to find such a piece, go to the library and study the shelves with grammars of languages you don't know;
- you have to present the piece of morphology you have chosen at one of the seminars.
- As a result of your work, I expect to receive a script, a set of test examples (with the corresponding set of outputs), and a paper.
- The script has to work for all the cases described by the piece of morphology you aim to cover.
- Your set of test examples should be representative of the data you aim to cover: make sure that all the important cases are included and that you are not testing exactly the same combination of rules multiple times (unless you provide an automated testing script that checks the output).
- In the paper you should describe the facts that you are modeling, the choices you had to make while writing the program (e.g., the ordering of rules and the selection of the formalism), the testing phase, and (optional) the material that you are aware of, but your program does not cover for good reasons.
Grades:
- The description part is worth 30 points, the script part -- 60 points, the set of testing examples -- 10 points;
- Grade/points correspondence:
- 1.0: 95 -- 100
- 1.3: 91 -- 94
- 1.7: 87 -- 90
- 2.0: 83 -- 86
- 2.3: 80 -- 82
- 2.7: 75 -- 79
- 3.0: 70 -- 74
- 3.3: 65 -- 69
- 3.7: 60 -- 64
- 4.0: 50 -- 59
Mindmap