The XTAG-System (1)

Tree Adjoining Grammars

29. October 2012

Laura Kallmeyer, Timm Lichte

Setting up the system on your account

The XTAG-tools are part of a virtual machine (VM) that needs to be started with VirtualBox.

Start the virtual machine called "ubuntu-4.10".
At some point a login screen appears: user name = xtaguser, password = xtaguser

Bingo: You should see an archaic GNOME desktop.

Turn off mouse integration: Machine => Disable Mouse Integration
Note: Once your mouse is captured, you can release it again with the right control key.
Open a terminal and type (where $ stands for the prompt):
```
$ cd Software
$ ls
```
You should see a list of directories and files, e.g. the directory lem-0.14.0.i686, which contains the XTAG-parser.
Alternatively, you can open a file browser: Computer => Home

The components of the XTAG-System

The Morphological database

Morphological Analyzer and Morph Database: Consists of approximately 317000 inflected items derived from over 90000 stems. Returns root form, POS, and inflectional information.

We will take a look at the maintenance interface of the morphological database. It allows you to add, edit, or view entries of the morphological database. Please type ... Unfortunately, this does not work yet.

Instead, open the file ~/Software/morph-1.5/data/morph_english.flat via the file browser, or type the following in the terminal:

$ cd /home/xtaguser/Software/morph-1.5/data
$ less morph_english.flat

and look up some interesting words:

car, car's, cars
give, gave, given, giving
flies
's
like, liked, liking
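
For a quicker lookup than paging through the file with less, the search can be scripted. The following is a minimal sketch and not part of the XTAG tools: `morph_lookup` is a hypothetical helper name, and it assumes that each line of the flat file starts with the inflected form followed by whitespace, which you should check against the file itself.

```shell
# Hypothetical helper (not an XTAG tool): print every line of a flat
# morphology file whose first whitespace-separated field is exactly one
# of the given word forms.
# Assumption: each entry starts with the inflected form.
morph_lookup() {
  file=$1
  shift
  for word in "$@"; do
    # Exact comparison on the first field, so "car" does not match "cars".
    awk -v w="$word" '$1 == w' "$file"
  done
}

# Usage on the VM (path taken from the text above):
# morph_lookup ~/Software/morph-1.5/data/morph_english.flat car "car's" cars flies
```

Comparing the first field exactly, rather than searching with a plain substring, keeps "car" from also returning "cars" or "carpet".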

POS Tagger/POS Blender

The basis for tree selection of the parser is the presence of a POS tag on every word of the input. For those lexical items that cannot be assigned a POS tag via the Morphological Analyzer, the POS tagger is used. The POS Blender makes the final decision about the POS tag a word receives. It uses the output of the POS tagger as a filter on the output of the morphological analyzer. Any words that are not found in the morphological database are assigned the POS given by the tagger.
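
The blending rule described above can be sketched as a small shell function. This only illustrates the described logic and is not the actual POS Blender: `blend_pos` and its arguments are hypothetical, and the behaviour when the tagger's tag matches none of the morphological analyses is an assumption, since the text does not specify that case.

```shell
# Hypothetical sketch of the blending rule (not the real POS Blender).
# $1: space-separated candidate tags from the morphological analyzer
#     (empty if the word is not in the morphological database)
# $2: the single tag proposed by the POS tagger
blend_pos() {
  morph_tags=$1
  tagger_tag=$2

  # Word unknown to the morphological database: take the tagger's tag.
  if [ -z "$morph_tags" ]; then
    printf '%s\n' "$tagger_tag"
    return
  fi

  # Known word: use the tagger's tag as a filter on the morphological tags.
  for tag in $morph_tags; do
    if [ "$tag" = "$tagger_tag" ]; then
      printf '%s\n' "$tag"
      return
    fi
  done

  # ASSUMPTION: if nothing survives the filter, keep all morphological tags.
  printf '%s\n' "$morph_tags"
}

blend_pos "N V" V   # word with two analyses, tagger says V
blend_pos "" N      # word missing from the morphological database
```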

Syn DB

Syntactic Database: More than 30000 entries. Each entry consists of: uninflected form of the word, POS, list of trees or tree-families associated with the word, and a list of feature equations that capture lexical idiosyncrasies.

We will take a look at the maintenance interface of the syntactic database. Please run

$ ~/Software/synedit/synedit

and load the following file via File => Open:

~/Software/english/syntax/syntax-coded.flat

The interface can now be used to search the database or to view or edit entries.

Fetch the entries for "bring" and "cover".

Trees DB

Tree Database: 1004 trees, divided into 53 tree families and 221 individual trees. The tree families represent subcategorization frames; the trees in a tree-family would be related to each other transformationally in a movement-based approach.

Various tools are available for accessing the two databases.

```
$ xtag.show english <regexp> 
```
xtag.show can be used to view individual trees from the grammar. Any regular expression can be used to match the tree names, for example:
```
$ xtag.show english ^betaN[0-9]*
```
will display all the relative clause trees.
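
To get a feeling for what such a pattern selects, you can try it on a few names with grep. This sketch assumes that xtag.show interprets patterns roughly like extended regular expressions. The tree names are illustrative examples in the grammar's naming style, and `tree_names` is a hypothetical helper.

```shell
# Illustrative tree names only; the real grammar contains many more.
tree_names() {
  printf '%s\n' betaN0nx0Vnx1 betaN1nx0Vnx1 alphanx0Vnx1 alphaW0nx0Vnx1
}

# Keep the names that start with "betaN", optionally followed by digits.
tree_names | grep -E '^betaN[0-9]*'
```

Because `[0-9]*` also matches zero digits, the pattern behaves as a prefix match on `betaN`.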
```
$ xtag.show.fam english <regexp1> <regexp2>
```
xtag.show.fam can be used to view trees in families from the grammar. Every tree family that matches <regexp1> is selected, and each tree in that family that matches <regexp2> is displayed. For example:
```
$ xtag.show.fam english ^Tnx0Vnx1$ ^alphaW[0-1]*
```
will display all wh-extraction trees from the transitive tree family.
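
The two-stage filtering can be imitated on a small listing. This is a sketch under stated assumptions, not how xtag.show.fam is implemented: `family_trees` and `filter_family_trees` are hypothetical names, and the family/tree pairs are illustrative rather than read from the grammar.

```shell
# Hypothetical "family tree" listing, one pair per line (illustrative data).
family_trees() {
  printf '%s\n' \
    'Tnx0Vnx1 alphanx0Vnx1' \
    'Tnx0Vnx1 alphaW0nx0Vnx1' \
    'Tnx0Vnx1 alphaW1nx0Vnx1' \
    'Tnx0V alphaW0nx0V'
}

# Print a tree name only if its family matches the first pattern
# AND the tree name itself matches the second pattern.
filter_family_trees() {
  awk -v fam="$1" -v tree="$2" '$1 ~ fam && $2 ~ tree { print $2 }'
}

family_trees | filter_family_trees '^Tnx0Vnx1$' '^alphaW[0-1]*'
```

Anchoring the family pattern with `^` and `$` matters here: without the `$`, a pattern such as `^Tnx0V` would also select every family whose name merely begins with that string.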
```
$ xtag.show.word english <word> "<regexp>"
```
xtag.show.word can be used to view all trees lexicalized by <word>. In addition, the list of all such trees can be filtered with the <regexp> parameter. Use ".*" if you want to see all selected trees. For example:
```
$ xtag.show.word english aim "for"
```
will show only the trees that are anchored by the word "aim" and coanchored by the preposition "for". To see all trees for "aim", run the command:
```
$ xtag.show.word english aim ".*"
```
To see a transitive and a ditransitive version of an elementary tree for "bought", run
```
$ xtag.show.word english bought "alphanx0Vnx1\[bought\]"
$ xtag.show.word english bought "alphanx0Vnx2nx1\[bought\]"
```
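
The backslashes in these patterns are needed because square brackets are special in regular expressions. The following sketch shows the difference with grep, assuming the tools treat patterns like extended regular expressions. `lexicalized` is a hypothetical helper, and the name `alphanx0Vnx1b` is made up purely to show the false match.

```shell
# Illustrative names in the "template[anchor]" style used above;
# "alphanx0Vnx1b" is invented for this demonstration.
lexicalized() {
  printf '%s\n' 'alphanx0Vnx1[bought]' 'alphanx0Vnx2nx1[bought]' 'alphanx0Vnx1b'
}

# Escaped brackets: matches the literal text "[bought]".
lexicalized | grep -E "alphanx0Vnx1\[bought\]"

# Unescaped brackets: "[bought]" is a character class (one of b, o, u, g, h, t),
# so this selects "alphanx0Vnx1b" and misses the bracketed name.
lexicalized | grep -E "alphanx0Vnx1[bought]"
```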

Using the tree display window:
right mouse button: next tree
left mouse button: previous tree
key 'f': show features
key 'q': exit

The parser

The parser works in two steps. During tree selection, tree templates are chosen from the Tree database for each word, and the anchor position is filled with the word. POS tagging is decisive during this step: the set of POS tags corresponds exactly to the set of anchor nodes. This makes it possible to establish a correspondence between words and elementary trees. During tree grafting, the actual parsing is done. The output is a parse forest from which derived trees and derivation trees can be extracted.
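
The tree selection step can be pictured with a toy lookup. This is a rough sketch under strong assumptions: `select_trees` is a hypothetical function, the POS-to-template table is made up for illustration, and the real system draws on the syntactic and tree databases described above instead of a fixed table.

```shell
# Toy tree selection (hypothetical; not the parser's actual procedure).
# $1: word, $2: POS tag assigned to the word
select_trees() {
  word=$1
  pos=$2

  # ASSUMPTION: a tiny hand-written table from POS tag to tree templates.
  case $pos in
    V) templates="alphanx0Vnx1 alphanx0Vnx2nx1" ;;
    N) templates="alphaNXN" ;;
    *) templates="" ;;
  esac

  # Fill the anchor position of each template with the word.
  for template in $templates; do
    printf '%s[%s]\n' "$template" "$word"
  done
}

select_trees bought V
```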

Running the parser

Run the parser on a set of input sentences and take a look at the resulting parse forest:
```
$ runparser ~/Software/lem-0.14.0.i686/test/sample > outfile
$ less outfile
```
Parses (derivation trees and derived trees) can be extracted from the forest with print_deriv. Try
```
$ print_deriv -p outfile > outfile.derived
$ print_deriv -d outfile > outfile.derivation
```
The first command extracts all derived trees, the second command extracts all derivation trees. Take a look at the new files. They contain trees written in a bracketed notation. You can view them graphically with showtrees. (Navigation: right mouse button: next tree, left mouse button: previous tree)
```
$ showtrees outfile.xxx
```
Of course, you can run everything at once using pipes:
```
$ runparser ~/Software/lem-0.14.0.i686/test/sample | print_deriv -p | showtrees
```

Feature Structure Unification

By default, feature structure unification is switched off. To use the parser with feature structure unification:
```
$ runparser +c ~/Software/lem-0.14.0.i686/test/sample | print_deriv -f | showtrees
```
A window should pop up with parses that have had successful unifications. Pressing 'f' in the window shows you the feature structures of each tree.

To unify feature structures after parsing:
```
$ runparser +u outfile | print_deriv -f | showtrees
```

Parse forest browser

The (experimental) parse forest browser allows you to browse the parse forest directly and to save certain derivations to disk. Run
```
$ xtag.browser
```
and load one of the parse forest files you have produced before.

Timm Lichte (Thanks to Wolfgang Maier!)

Last modified: Sun Oct 28 22:00:00 CEST 2012
