Lab 1 (due 1/12, 11:59pm)
NB: The lab assignments will typically include write-up
instructions at the end. Before you start, read the whole assignment
once, including the write-up instructions, so you know what to keep
track of along the way.
In addition: I will attempt to post the lab rubrics as each lab
is posted, so that you can see what is expected (and what would
be going above and beyond).
Choose your language and stake your claim
- Find a partner to work with for Labs 2-8. Ideally, each group
should include at least one person with solid training in linguistics.
- Choose the language you would like to work on this quarter. The languages on the languages list are generally not available. English, German, Dutch and Japanese are not allowed. But there are 7000 or so languages in the world. Many are left to be worked on in 567 :)
- Some languages that would be particularly of interest this year, if you can find suitable descriptive resources:
- Anything related to Russian, not yet done
- A Turkic language, other than those on the list
- Rukai [dru]
- Paresi-Haliti [pab]
- Yakima Sahaptin [yak] (was done once, but can be redone)
- Basque [eus]
- Korean [kor]
- Standard Arabic [arb], or some other variety of Arabic
- Archi [aqc]
- Find reference materials for the language that are available through UW libraries (you'll want these materials longer than inter-library loan allows). Alternatively, you might find good resources available as pdf:
Glottolog.org might help you identify resources, as well.
- Email me by 5pm Tuesday 1/9 (preferably earlier) with the references you have found so that I can verify that they will be sufficient.
- Edit the languages list on the Treehouse wiki to stake your claim to your language. Be sure to include the ISO code and language family. You should only add one line to this table, but put both of your names on it.
Grammar Customization: Get a small grammar for English
- Download our choices file
- Visit the LinGO Grammar Matrix customization page.
- Click on "Browse" next to "Upload choices file..." and upload the choices file you downloaded.
- Click on all of the subpages (starting with "General Information" and "Word order") to see which options have been selected in this choices file, and what other options are available.
- Click on "create grammar" and download the .tgz or .zip file.
Emacs: Getting started
I strongly recommend familiarizing yourself with emacs at
the very start of the quarter. Here's one tutorial and another. Or, you can do the tutorial
within emacs by typing "Control-h" and then "t".
LKB: Getting started
- Install the LKB (see notes under "Software" on the main course page)
- If you find that the VM is very laggy, try ssh-ing into it from your host OS:
ssh -Y -p 2225 ubuntu@127.0.0.1
Note that you can also use scp to move files back and forth:
scp -P 2225 file_to_move.txt ubuntu@127.0.0.1:/file/destination/file.txt
scp -P 2225 ubuntu@127.0.0.1:/file/to/get.txt /local/destination/file.txt
- Run emacs
- Type M-x lkb to run the LKB
- Type (lui-initialize) to run the LUI interface
- Unzip the grammar you downloaded (tar xzf 567_english.tar.gz)
- Load the starter grammar in the LKB:
- Try parsing:
Examine the file lexicon.tdl in the starter grammar, and try making up sentences to parse based on the vocabulary there.
Find four different sentences that do parse. Record these in your write up.
Find two strings (nb: these do not have to be grammatical sentences of English), using the vocabulary in lexicon.tdl, that don't parse. Record these in your write up.
Try interactive unification
These instructions assume you are using the LUI interface, which
I believe is on by default. If they don't make sense, try invoking
(lui-initialize) at the LKB prompt (the *) in emacs.
- Ask the LKB to parse each of the strings you found that it doesn't parse.
- In the LKB Top menu, choose "Parse | Show parse chart".
- Examine the parse chart to find the first point of failure in parsing. Which constituents should combine, if only some constraint weren't blocking them?
- In the LKB Top menu, choose "View | Grammar Rule" and select the rule that you think should (modulo that constraint) combine the constituents.
- Click on "phr-synsem" (value of SYNSEM) to collapse that sub-structure.
- Choose the constituent from the parse chart that you believe should be the left-hand daughter and drag it onto the first element of the ARGS list in the rule. You should get a new window, labeled "unification result".
- Shrink "phr-synsem" in the "unification result" window, and then choose the constituent you believe should be the right-hand daughter and drag it onto the second element of the ARGS list in the rule.
- You should get a new window labeled "unification failure", with the point of failure highlighted in red.
- Look in the grammar files to see where the constraints that led to that unification failure are encoded, and record this information in your write up.
- Do the same for the other non-parsing string you found.
Chain of identities
In the MRS assigned by this grammar to A cat chased me, the ARG0 value of the _cat_n_rel is identified with the ARG1 value of the _chase_v_rel (that is, the cat is doing the chasing). In this part of the assignment, you will trace the chain of identities that connects these two.
- Parse the sentence, and click on the small tree to get the larger tree.
- Click on the N node above cat to get the feature structure associated with that node.
- Explore the feature structure, to locate the feature INDEX and see what it is identified with. (You may find it useful to shrink down certain substructures, and to use the pop-up menus on the identity tags.)
- Do the same with the second N node above cat (representing the singular noun lexical rule), the NP node, the S node, the VP node, and the two V nodes.
- Now look through the .tdl files to find the types which encode the constraints responsible for the chain of identities. You'll want to start with the leaf types, but you'll need to look through supertypes, too. This can be done by using grep or the search functionality in emacs (C-s). The supertypes in a type definition are after the ":=". To find where a type is defined, search for the type name followed by ":=".
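The grep search described above can be sketched as a small shell function. This is only an illustration: find_type_def is a hypothetical helper name, not part of the LKB or the Grammar Matrix, and it assumes that each type definition starts at the beginning of a line and that the .tdl files sit directly in the directory you point it at.

```shell
# find_type_def TYPE [DIR]
# Hypothetical helper (not part of the LKB or the Matrix): print the file
# name, line number, and text of each line where TYPE is defined.
# Assumes definitions begin at the start of a line, as in "type-name := ...".
find_type_def() {
    # -n prints line numbers; -H prints the file name even for a single file.
    # Anchoring with ^ and requiring ":=" right after the name (allowing
    # whitespace) keeps "foo" from also matching "foo-super := ...".
    grep -n -H "^$1[[:space:]]*:=" "${2:-.}"/*.tdl
}
```

For example, running find_type_def basic-head-spec-phrase-super from the directory that holds the grammar's .tdl files should point you to the file and line where that type is defined, provided it is defined in one of those files.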
Note that in addition to exploring the supertypes by searching through the .tdl files, you can also look at them through the LKB. For example, think of the constraint that you expect the lexical entry for "cat" to be contributing. Then:
- From the LKB Top menu, choose "view | lex entry"
- Enter "cat" (the identifier for that lexical entry)
- Right click on the type at the top left of the tfs that pops up (common-noun-lex)
- Choose "view type definition"; this should give a non-LUI window, showing the type definition, without inherited constraints.
- If you don't see the constraint you're looking for, explore the parent type(s) in the same fashion.
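If you would rather stay at the command line, the same climb through parent types can be approximated with grep and sed. The sketch below is a convenience under stated assumptions, not an LKB feature: list_supertypes is a hypothetical name, and it only reads the supertypes that appear on the same line as the ":=", so a definition whose supertype list continues onto later lines will be reported incompletely.

```shell
# list_supertypes TYPE [DIR]
# Hypothetical helper: print, one per line, the supertypes named on the
# definition line of TYPE (the text between ":=" and any "[" or final ".").
# Assumption: the supertypes of interest are on the same line as ":=".
list_supertypes() {
    grep -h "^$1[[:space:]]*:=" "${2:-.}"/*.tdl |
        # Drop "type-name :=", any feature structure, and a trailing period.
        sed -e 's/^[^:]*:=[[:space:]]*//' -e 's/\[.*$//' -e 's/\.[[:space:]]*$//' |
        # Supertypes are conjoined with "&"; put each on its own line.
        tr '&' '\n' |
        # Trim surrounding whitespace and discard empty lines.
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

Each name it prints can then be looked up in turn, repeating until you reach the type that introduces the constraint you are after.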
Write up
Please submit write-ups as plain text files. (In future labs,
that will help me run example sentences through your grammar.)
Your write up should include:
- The four sentences you found that parse.
- The two (or more) strings you found that didn't parse.
- The names of the rules you used in interactive unification to see why they didn't parse.
- The tdl snippets that lead to the conflicting constraints for each non-parsing string, along with a prose description of what they do.
- A description of TWO STEPS in the chain of identities linking the ARG0 of _cat_n_rel to the ARG1 of _chase_v_rel in A cat chased me. Each link in the chain should say which instance is involved (e.g., lexical entry for cat), which supertype it inherits the constraint from, and show the tdl for the constraint. In addition, you should indicate which identity tag is enforcing the constraint. Your description should take the form of a numbered list.
This chain is very long and has a lot of links. I'm only asking you to document two of them here (other than the ones given below). To help you out, and to give you a sense of the format
I'm expecting, here are two of them. (I picked these two because they are
possibly the most obscure.)
n. The head-spec phrase structure rule inherits the following
constraint from basic-head-spec-phrase-super:
[ HEAD-DTR [ SYNSEM [ LOCAL [ CONT.HOOK #hdhook ] ] ],
  NON-HEAD-DTR.SYNSEM
    [ LOCAL.CAT.VAL.SPEC < [ LOCAL [ CONT.HOOK #hdhook ] ] > ] ]
identifying the CONT.HOOK of the head daughter with the CONT.HOOK value inside the non-head daughter's SPEC via #hdhook.
n+1. The head-spec phrase structure rule also inherits the
following constraints from basic-head-spec-phrase:
[ NON-HEAD-DTR.SYNSEM.LOCAL.CONT.HOOK #hook,
C-CONT.HOOK #hook ]
identifying the CONT.HOOK of the non-head daughter with the C-CONT.HOOK of the rule.
- At least three questions that this lab caused you to wonder about.
(Please indicate if you've figured out the answers, or if you would still like to see them addressed. Also note that it's fine to include here questions you posted to Canvas while doing the assignment!)
- If you were unable to complete any part of the assignment, a
description of the problems you encountered and what you think might
be going on. (You can earn partial credit for any part of the
assignment you couldn't get working by describing it in this section.)
Submit your assignment
- For this assignment, you only need to submit your write up.
- Please submit it as a plain text (.txt) file to Canvas