Lab 2 (due 4/7)


NB: As usual, the write-up instructions are at the end. Please read the whole assignment before beginning.

NB2: This is a first, incomplete draft of these lab instructions. I'm posting this now to give you more lead time, but these instructions will be edited (and expanded) by Monday.


The goal of this week's lab is to make a big start on creating the primary test suite that you will use over the quarter. In order to do this, you'll need to gain an understanding of various grammatical phenomena in your language (described below). In addition, the test suites are to be encoded eventually as XML files (best practice archive format) and contributed to the ODIN project. This means that we'll be recording some extra information along the way. We'll continue with this next week, in addition to configuring starter-grammars based on the matrix and completing a baseline test suite run.


Test suites: General guidelines

Grammatical phenomena

This section describes the grammatical phenomena we expect to cover, and the type of examples you should use to illustrate them. For Lab 2 (i.e., what you turn in on 4/7), you will need to handle at least 6 of the phenomena listed below. In addition, you should determine whether any of the phenomena are not relevant in your language (e.g., many languages have no case distinctions). Next week, you'll complete an additional 6 phenomena.

Basic Word Order

In declarative matrix clauses, what is the order of the major constituents (subject, verb, complements)? Be sure to consider both intransitive and transitive verbs, as well as ditransitive verbs if you can find any.

Do sentences in your language typically contain auxiliaries? Where does the auxiliary occur with respect to the verb?

Your ungrammatical examples in this section should explore all of the possible orders that are not allowed in your language. Likewise, your grammatical examples should illustrate all the possible orders. For a transitive verb, then, we expect six sentences (all possible orders of S, V, and O) with the number of grammatical v. ungrammatical sentences varying depending on the language type.
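If it helps to keep track of the candidates, the orders can be enumerated mechanically. This is only an illustrative Python sketch: the labels S, V, O, and I (for an indirect object) are placeholders for your actual constituents, and the function name is hypothetical. It lists each order once so that you can assign it a g or u judgment.

```python
from itertools import permutations


def constituent_orders(constituents=("S", "V", "O")):
    """List every linear order of the given constituents, e.g. 'SOV'."""
    return ["".join(order) for order in permutations(constituents)]


# Transitive clause: 3! = 6 orders, each of which needs a judgment.
print(constituent_orders())  # -> ['SVO', 'SOV', 'VSO', 'VOS', 'OSV', 'OVS']

# Ditransitive clause with a placeholder 'I' for the indirect object: 4! orders.
print(len(constituent_orders(("S", "V", "O", "I"))))  # -> 24
```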

In some languages, full NPs are arguably always adjuncts (topics, etc.) with the valence requirements of the verb being filled by affixes. If your language seems to fall into this type, you should still try to find examples with full NPs illustrating where they can occur, but please discuss this in your write up.

Another more complicated word-order pattern is "V2", where the verb (or a finite auxiliary) must be the second thing in the clause, but anything can come first (S, O, adverb, etc). If your language is described as V2, your word order examples should include ungrammatical examples where the verb is not in second position, and a variety of grammatical examples where it is.

If your language is strict about the order of S and O, be careful in your assignment of (un)grammaticality in examples where S and O are reversed. Whether the string is strictly ungrammatical or actually just means something different depends on how else your language marks subjects and objects (e.g., with case or agreement). For example, in English, "Cats chase dogs" and "Dogs chase cats" are both grammatical; they just mean different things.

Please use full NPs for the basic word order examples (i.e., something like "the cat" or "cats" instead of "Fluffy" or "them").


Pronouns

We'll be addressing agreement, and to get interesting subject-verb (or object-verb) agreement, it's useful to have non-third person forms.

Find the paradigm(s) for pronouns in your language: Do they vary by person, number, gender, case?

Do pronouns have the same distribution as full NPs in your language? (I.e., can they appear in the same places in the string?) If not, add some examples illustrating the contrast to your test suite. In either case, add some examples to your test suite illustrating the full paradigm. (Ungrammatical examples involving case and agreement will come up under those topics below.)

The rest of the NP

This section is meant to address the other obligatory elements of an NP (particularly, an NP headed by a common noun). Does the language require determiners? Require determiners only with certain nouns? Allow determiners optionally? Allow determiners even with proper nouns and pronouns? What is the order of the determiner with respect to the noun? The space of examples here (abstracting away from order) should look something like this:

Note that n1, n2, and n3 are supposed to be of different types (e.g., nouns that require determiners always, nouns that disallow determiners always, and nouns that optionally allow determiners). Not all languages will necessarily have three types.

Is there anything else that is required with NPs? One common thing is case-marking adpositions. Note that these can appear obligatorily with all nouns in all argument positions, optionally with all nouns in all argument positions, obligatorily but only in certain argument positions/with certain verbs, obligatorily but only with certain (e.g., animate) nouns, etc. Develop examples which illustrate the contrasts, including any contrasts between multiple case-marking adpositions (is one only used for subjects? etc).

Argument optionality

This section concerns whether and under what circumstances verbal arguments can be left unexpressed:

As far as I know, there are four basic patterns here:

  1. Rampant pro-drop: Any argument can be left unexpressed, with its interpretation handled by context (i.e., as if it were a pronoun), or, depending on the verb, as an unspecified object of that verb.
  2. Pronominal affixes: Any argument can be left unexpressed, but only if an "agreement marker" (arguably an incorporated pronoun) appears on the verb. Such "agreement markers" may or may not be compatible with overt objects. (Additionally, languages with this pattern may also allow the unspecified object situation.)
  3. Subject pro-drop: Subjects may be left unexpressed, but not other arguments.
  4. Pro-drop limited to constructionally or lexically licensed null instantiation. (This is the situation in English, where arguments can go unexpressed only in certain constructions [e.g., imperative] or with certain verbs [e.g., eat].)

Determine which pattern your language falls under, and then create examples for your test suite. Here a typical set of examples would be (again, abstracting away from word order, and assuming that agreement markers are optional with overt arguments but obligatory with missing ones):

If your language generally doesn't allow pro-drop (or generally doesn't allow pro-drop of objects) but does with particular verbs, you should include two sets of examples like the above (one with a verb that does allow a dropped object and one with a verb that does not).


Agreement

Agreement is covariation in form between multiple items (typically a head and a dependent) in a sentence. Sometimes, the head doesn't change form, but the dependent does, depending on properties of the head.

Languages vary greatly in how much agreement they display, from none at all to quite a lot. You can categorize agreement systems along two dimensions:

  1. Which elements are in the agreement relation (subject & verb, object & verb, determiner & noun, adjective & noun are typical)
  2. Which features are involved (person, number, case and gender are typical)

Determine whether your language has agreement, and if so, which type. Then construct examples showing both grammatical and ungrammatical possibilities. Remember, the ungrammatical examples should only have one thing wrong with them (i.e., if you have both subject-verb and object-verb agreement, there's no need to make an example where both the subject and the object disagree with the verb).

For a language with determiner-noun agreement in number and case, and subject-verb agreement in person and number, a possible example set is:

Notice the inclusion of non-third person examples (using pronouns). You could pair, e.g., every verb form with every kind of noun it doesn't agree with, but that's not strictly necessary. As long as each noun type and each verb form shows up in at least one ungrammatical example, most errors should get caught.
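The coverage idea can be made concrete with a small Python sketch. It is illustrative only: it assumes each subject and each verb form can be summarized as a (person, number) pair and that a subject agrees with a verb form exactly when the pairs are identical, which is a simplification (real paradigms often have syncretic or underspecified forms). The subjects, verb forms, and function name are hypothetical placeholders.

```python
# Hypothetical subjects and verb forms, each summarized as (person, number).
SUBJECTS = {
    "I": ("1", "SG"),
    "you": ("2", "SG"),
    "the cat": ("3", "SG"),
    "the cats": ("3", "PL"),
}
VERB_FORMS = {
    "eat-1SG": ("1", "SG"),
    "eat-2SG": ("2", "SG"),
    "eat-3SG": ("3", "SG"),
    "eat-3PL": ("3", "PL"),
}


def mismatched_pairs(subjects, verb_forms):
    """Choose disagreeing (subject, verb form) pairs so that every subject
    and every verb form appears in at least one ungrammatical example."""
    pairs, used_verbs = [], set()
    # One mismatch per subject...
    for subj, subj_feats in subjects.items():
        for verb, verb_feats in verb_forms.items():
            if subj_feats != verb_feats:
                pairs.append((subj, verb))
                used_verbs.add(verb)
                break
    # ...then one more for any verb form not used yet.
    for verb, verb_feats in verb_forms.items():
        if verb in used_verbs:
            continue
        for subj, subj_feats in subjects.items():
            if subj_feats != verb_feats:
                pairs.append((subj, verb))
                break
    return pairs


for subj, verb in mismatched_pairs(SUBJECTS, VERB_FORMS):
    print(f"* {subj} {verb}")
```

With four subjects and four verb forms, the full cross-product contains 12 mismatched combinations; this selection covers every item with 6.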


Case

A language has a case system if the nouns vary in form depending on the grammatical role they play in a sentence and/or the specific head they are a dependent of. (In some languages, it's not the nouns themselves that vary in form, but the dependents of the noun, such as determiners or adjectives. We'll analyze this as the nouns still having case and their dependents agreeing with them in case.)

Some languages have no case system. Among languages that do have case systems, they can be characterized along the following parameters:

Determine if your language has a case system, and if so, where it falls on the above parameters. (An ergative-absolutive system is one where the object of a transitive verb bears the same case marking as the subject of an intransitive, and contrasts with the subject of a transitive. A split-ergative system is one where the ergative-absolutive pattern shows up only in certain contexts: only with certain tenses/aspects; only with certain types of nouns [e.g., animates]; only with certain types of verbs.)
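The definitions above amount to checking which core roles share a case. The Python sketch below uses the common typological labels S (the sole argument of an intransitive), A (the subject of a transitive), and O (the object of a transitive); these labels, the function name, and the case names in the usage lines are assumptions for illustration, not part of the assignment. For a split-ergative language, you would apply it separately to each side of the split.

```python
def alignment(s_case, a_case, o_case):
    """Classify core-case alignment from the case marking on
    S (intransitive subject), A (transitive subject), O (transitive object)."""
    if s_case == a_case == o_case:
        return "neutral (no case distinction)"
    if s_case == a_case:
        return "nominative-accusative"  # S and A alike, O distinct
    if s_case == o_case:
        return "ergative-absolutive"  # S and O alike, A distinct
    if a_case == o_case:
        return "other (A and O alike, S distinct)"
    return "tripartite"  # all three distinct


# Hypothetical split-ergative language, checked on each side of the split:
print(alignment("NOM", "NOM", "ACC"))  # -> nominative-accusative
print(alignment("ABS", "ERG", "ABS"))  # -> ergative-absolutive
```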

Create examples showing both grammatical and ungrammatical case patterns. For the purposes of this class, you only need to consider the case of subjects, direct objects, and (if you found some ditransitives) indirect objects. If your language has quirky case, you should consider including both verbs that illustrate the major case patterns and verbs that have idiosyncratic patterns. If your language has split-ergativity, you should illustrate both sides of the split.

For a nominative-accusative language (with a distinct dative case) without quirky case, a typical example set would look like this (assuming SVO word order):

If your language also has agreement, or obligatory determiners, etc., make sure that the examples have the appropriate form along those dimensions.


Negation

Here we're interested in how your language handles sentential (NB: not constituent) negation. Again, there are a few basic strategies that should cover most cases:

  1. Inflection: An affix (prefix, suffix, infix) expressing negation, which attaches to auxiliaries only, to main verbs only, or to any (finite) verb.
  2. Independent adverb: An adverb that modifies a V, a VP, or an S, and attaches to the left, to the right, or on either side.
  3. Selected adverb: An adverb that appears as a selected complement of auxiliaries only, of main verbs only, or of any (finite) verb.

It can be subtle to distinguish between options 2 and 3, and chances are the data you'll be able to find won't be sufficient to do so. For option 1, positive test suite examples should show the negated verb/auxiliary. Negative examples should show the negation inflection on verbs that cannot take it (non-finite verbs? main verbs?). For options 2/3, positive examples should show the negative adverb in the positions where it can appear; contrasting negative examples should illustrate where it cannot appear.

Some languages have both inflection and an adverb, in which case there are the following logical possibilities regarding their co-occurrence:

  1. Both must be used together.
  2. Either one can be used separately, but they cannot be used together (complementary distribution).
  3. They can be used independently or together.
  4. The adverb is obligatory, but the inflection is optional.
  5. The inflection is obligatory, but the adverb is optional.

If your language allows both strategy types, determine (if you can) the rules of their co-occurrence, and illustrate them with appropriate positive and negative examples in your test suite.
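One way to plan these positive and negative examples is to spell out which combinations of markers each pattern permits. The Python sketch below simply restates the five numbered possibilities as data (the variable and function names are illustrative). It only covers co-occurrence, so restrictions on where the adverb or affix may appear still need their own examples. Note that under pattern 3, co-occurrence alone yields no ungrammatical combination.

```python
# Ways a negated clause can be marked when both strategies exist.
ADVERB_ONLY = "adverb only"
INFLECTION_ONLY = "inflection only"
BOTH = "both"
MARKINGS = (ADVERB_ONLY, INFLECTION_ONLY, BOTH)

# Grammatical markings under each of the five co-occurrence patterns.
ALLOWED = {
    1: {BOTH},                                # both must be used together
    2: {ADVERB_ONLY, INFLECTION_ONLY},        # complementary distribution
    3: {ADVERB_ONLY, INFLECTION_ONLY, BOTH},  # independently or together
    4: {ADVERB_ONLY, BOTH},                   # adverb obligatory, inflection optional
    5: {INFLECTION_ONLY, BOTH},               # inflection obligatory, adverb optional
}


def judgments(pattern):
    """Return the expected judgment ('g' or 'u') for each marking."""
    return {m: "g" if m in ALLOWED[pattern] else "u" for m in MARKINGS}


print(judgments(4))  # -> {'adverb only': 'g', 'inflection only': 'u', 'both': 'g'}
```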

Matrix yes-no questions

How does your language indicate matrix clause (i.e., not embedded) yes-no questions? Possible strategies include word order variations, a sentence-initial or sentence-final question particle, a special auxiliary, and intonation only.

Your test suite should include positive examples illustrating all of the strategies in your language. If you have any strategies that involve additional lexical material (e.g., a question particle), create negative examples with the question particle in the wrong place.

If your language indicates questions with word order variations, go back to your negative examples under word order and check whether anything marked as ungrammatical there is really grammatical as a question. This is where we begin to see that a finished test suite should pair strings with analyses; here, this pairing is done implicitly in the free translations.

Consider creating examples of negative questions (i.e., sentences simultaneously illustrating both negation and yes-no questions).


Imperatives

How does your language indicate imperatives? (Word order variation, a special particle, morphological marking on the verb?) What happens to the subject of imperatives? Is it obligatorily present, optional, obligatorily missing? Again, create grammatical examples for all of the strategies that you have, and ungrammatical examples illustrating the limits on the distribution of any additional lexical material and/or limits on the expression/omission of subjects in imperative clauses.

Again, consider negative imperatives (e.g., "Don't touch that!").

Embedded clauses (declarative, interrogative)

Try to find at least one verb that can embed finite declarative clauses and at least one verb that can embed finite interrogative clauses. How are the embedded clauses marked? Does the language use complementizers? Is the word order different between matrix and embedded clauses? Are there different complementizers for embedded declaratives v. interrogatives? Do the selecting verbs allow both kinds of clauses (e.g., English "know") or just one (e.g., English "ask")? Create grammatical and ungrammatical examples to illustrate any contrasts that you find. For embedded interrogatives, restrict your attention to yes-no questions.


Modals

How does your language express the meaning associated with English "can" in "I can eat glass"? The two major possibilities are an independent auxiliary, as in English, or an affix on the main verb. Alternatively, you might find only periphrastic means ("It is possible for me to eat glass."). If there's an auxiliary, you might find that it's a subject-raising verb (as in English), or that it does argument composition: in this case, the auxiliary takes the lexical verb as its complement and then adopts all of the verb's arguments (subject and complements) as its own. You can tell this is going on when the arguments of the verb are ordered with respect to the auxiliary rather than the verb.


Coordination

Explore how coordination is marked in your language. Coordination is, very informally, the sort of phrasal combination marked by "and" in English. In some languages (like English), this is simple: a single lexical item can coordinate any kind of phrase. In other languages, coordination might be marked by adding an affix, lengthening a vowel, or changing to another tense; the variety of marking strategies is surprising. Languages also vary as to how many coordinands must be marked: all of them (and A and B and C...), just one (A B and C), or none of them (A B C...). Also, some languages have different ways of marking coordination for different phrase types. If this is the case, it will be interesting to illustrate at least two different strategies.

Extracting this information from your written grammar can be challenging. Coordination is described in different sections in different grammars: in a separate section of its own, in a section that also describes subordination (often titled "Conjunctions"), or possibly spread out over the sections that describe each phrase type (i.e., nouns, verbs, adjectives, etc.). Some grammars provide very little information beyond "the word for 'and' is FOO"; some are more detailed. Collect what information you can find, especially example sentences that have the word "and" in their gloss or translation. Consider doing a Google search (or other web search) based on the spelling of the morpheme for "and" to get some naturally occurring examples to supplement what you can get from your reference material. One last thing to be aware of: some languages mark coordinated meanings using a word or inflection meaning "with", but don't seem to actually form tightly bound coordinated constituents.

Adjectives, or ...

This topic is basically here to give folks who are working on languages without case and agreement something else to do when we get there. If there's something in your language that you'd rather tackle instead of adjectives, go ahead and develop examples for that.

For adjectives, you should determine where they appear with respect to the nouns they modify (left, right, either, separated to anywhere in the sentence). You should also determine if they agree with the nouns they modify.

Relative clauses

This is a topic that I hope we will all get to towards the end of the quarter. Relative clauses are clauses that appear as nominal modifiers, such as "which I read" in "The book which I read is over there." Languages vary (at least) in (i) whether or not they have special relative words (words that mark a clause as a relative, which may or may not also be pronouns coindexed with the noun the clause modifies), (ii) whether or not there is (necessarily) a gap in the relative clause that is notionally filled by the head noun, and (iii) which roles the head noun can fill in the relative clause.

Your positive examples in this part of the test suite should show what relative clauses look like. The negative examples should illustrate any restrictions on the order of the relative clause with respect to its head noun, any requirements that there be a gap in the relative clause, and any restrictions on which roles the head noun can fill. That will probably just scratch the surface of relative clauses; if you would like to add more, please do.

Keep in mind that your examples should be complete sentences.

Here are some examples from English:


The test suites should be initially produced as plain text files (in ASCII or Unicode). We'll produce perl scripts that turn the plain text into best-practice XML on the one hand and the required format for [incr tsdb()] on the other. In order to do this, the formatting of the plain text file has to be relatively strict.

Your test suite file should consist of a header, containing information pertinent to all of the examples, followed by a list of examples. The header should contain the following information:

Language: <language name>
Language code: <Ethnologue language code>
Author: <your name>
Date: April 7, 2006
Source a: <Reference to grammar/web page>
Source b: <Reference to grammar/web page>
... (as many sources as needed)
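The header format is simple enough to read with a few lines of code. The perl scripts mentioned above are not shown here; the following is an independent Python sketch, assuming every header line has the form "Key: value" and that source lines look like "Source a: ...". The function name and the sample values are hypothetical.

```python
def parse_header(text):
    """Read 'Key: value' header lines into a dict.

    Lines of the form 'Source a: ...' are gathered under header['sources'],
    keyed by their letter code; lines without a colon (such as the
    '... (as many sources as needed)' placeholder) are ignored.
    """
    header, sources = {}, {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons (URLs).
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key.startswith("Source "):
            sources[key[len("Source "):].strip()] = value
        else:
            header[key] = value
    header["sources"] = sources
    return header


sample = """Language: Japanese
Language code: jpn
Author: A. Student
Date: April 7, 2006
Source a: Hypothetical reference grammar
Source b: http://example.org/japanese"""

print(parse_header(sample)["sources"])
# -> {'a': 'Hypothetical reference grammar', 'b': 'http://example.org/japanese'}
```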

Each example should consist of the following:

Source: {a:page, b:page, author, elicited, attested}
Vetted: {t, f, s}
Judgment: {g, u}
Phenomena: {word order, case, agreement, ...}
<Example in standard orthography> (optional)
<Example in transliteration> (optional)
<Example with morpheme boundaries noted and morpheme forms regularized>
<Morpheme-by-morpheme glosses>
<Free translation>

The source field indicates where the example came from. If it came from one of your written sources, you can refer to that source with a single letter code. If there is a page number associated with the example, it should follow the letter (with a colon in between). If you made the example up, the source should be author. If you elicited the example from a native speaker, then the source should be elicited. If you found the example in a non-linguistic text, the source should be attested.
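Because the Source field has a small, fixed syntax, it can be checked automatically. Here is a Python sketch, assuming letter codes are single lowercase letters (as in the header template) and that page numbers contain no spaces; the names are hypothetical.

```python
import re

# A letter code for a written source, optionally followed by ':page'.
REFERENCE = re.compile(r"([a-z])(?::(\S+))?")
OTHER_SOURCES = {"author", "elicited", "attested"}


def parse_source(value):
    """Split a Source value into (kind, letter code, page)."""
    value = value.strip()
    if value in OTHER_SOURCES:
        return (value, None, None)
    match = REFERENCE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized Source value: {value!r}")
    return ("reference", match.group(1), match.group(2))


print(parse_source("a:123"))     # -> ('reference', 'a', '123')
print(parse_source("elicited"))  # -> ('elicited', None, None)
```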

The vetted field indicates where the judgment on the example came from. t means the example has been vetted by a native speaker, who gave the judgment indicated. f means it has not. (In this case, the judgment is your best guess based on the grammatical materials you have.) If you are a native speaker, you can vet your own examples. If the example comes from a grammar (which indicates a grammaticality judgment for it explicitly), and you haven't had it vetted in addition, you should put s in this field. This is meant to indicate that we think the example was vetted before being included in the grammar, but since we didn't do it, we're not sure. For attested examples, you should use t if you checked it with a native speaker and f if you have not.

The judgment field indicates the grammaticality judgment assigned to the example (either by a native speaker, in a grammar, or your best guess). g is for grammatical and u is for ungrammatical.
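The Vetted and Judgment fields take only a few values, and Vetted also interacts with Source. The following Python sketch encodes the rules as read from the paragraphs above; treating s as appropriate only for lettered (written) sources is an interpretation of the text, and the function name is hypothetical.

```python
VETTED_VALUES = {"t", "f", "s"}
JUDGMENT_VALUES = {"g", "u"}
NON_GRAMMAR_SOURCES = {"author", "elicited", "attested"}


def check_vetting(source, vetted, judgment):
    """Return a list of problems with one example's Vetted/Judgment fields."""
    problems = []
    if vetted not in VETTED_VALUES:
        problems.append(f"Vetted must be t, f, or s, not {vetted!r}")
    if judgment not in JUDGMENT_VALUES:
        problems.append(f"Judgment must be g or u, not {judgment!r}")
    # Interpretation: 's' records a judgment taken over from a written
    # grammar, so it should only accompany a lettered source such as 'a:12'.
    if vetted == "s" and source in NON_GRAMMAR_SOURCES:
        problems.append(f"Vetted 's' does not fit Source {source!r}")
    return problems


print(check_vetting("a:12", "s", "g"))  # -> []
```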

The phenomena field is a list of phenomena illustrated in the example. We'll have the perl script recognize both long and short names for each phenomenon, according to the table below. A single example might illustrate multiple phenomena.
Long name                   Short name
Word order                  wo
Case-marking adpositions    adp
Argument optionality        pro-d
Matrix yes-no questions     q
Embedded declaratives       emb-d
Embedded questions          emb-q
Relative clauses            rel
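Here is a Python sketch of how the long/short name lookup might work. The dictionary repeats the rows of the table. Since the examples in this section also use names such as "case", "negation", and "coordination" that the table does not list, the sketch passes unrecognized names through unchanged rather than rejecting them; that choice, the case-insensitive matching, and the function name are all assumptions.

```python
# Rows of the phenomena table, keyed by lower-cased long name.
LONG_TO_SHORT = {
    "word order": "wo",
    "case-marking adpositions": "adp",
    "argument optionality": "pro-d",
    "matrix yes-no questions": "q",
    "embedded declaratives": "emb-d",
    "embedded questions": "emb-q",
    "relative clauses": "rel",
}


def normalize_phenomena(field):
    """Turn a Phenomena value such as '{Word order, q}' into short names.

    Short names and names missing from the table are passed through
    (lower-cased) unchanged.
    """
    names = [name.strip().lower() for name in field.strip().strip("{}").split(",")]
    return [LONG_TO_SHORT.get(name, name) for name in names if name]


print(normalize_phenomena("{Word order, q}"))  # -> ['wo', 'q']
```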

The standard orthography and transliteration lines give a canonical representation of the string. You should have at least one of these, possibly both. Whatever you choose, be consistent throughout the file.

The example should also be presented with the morpheme boundaries explicit. This will allow us to write a perl script that aligns glosses with each morpheme. For languages with particularly complex morphophonology, you might end up using this line as the example you actually parse/generate (to abstract away from the phonological rules). It is for this reason that this line should have phonologically regularized forms. In this line, morpheme boundaries should be indicated with hyphens and word boundaries with spaces. If your language has clitics, the boundary between a clitic and its host should be marked with an equals sign.

The next line is the morpheme-by-morpheme glosses. These should be in a one-to-one correspondence with the morphemes, so if the nth word in the line above has two hyphens in it, the nth word in this line has two hyphens as well. Stems should be given English glosses indicating their meaning. For formatting and a good set of grammatical abbreviations to use, please follow the Leipzig glossing rules. (You might also want to refer to the standardized set of grams from ODIN.)
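The one-to-one requirement can be checked mechanically, which is what makes automatic gloss alignment possible. Here is a Python sketch (not the course's perl script) that compares the sequence of boundary symbols, hyphens for affixes and equals signs for clitics, word by word. It assumes a single morpheme's gloss never contains a literal hyphen, which holds if multi-word glosses are joined with periods as the Leipzig rules recommend. The function name is hypothetical.

```python
import re

# '-' marks an affix boundary and '=' a clitic boundary.
BOUNDARY = re.compile(r"[-=]")


def check_alignment(morpheme_line, gloss_line):
    """Return a list of alignment problems between the segmented line and
    the gloss line (an empty list means they correspond one-to-one)."""
    words, glosses = morpheme_line.split(), gloss_line.split()
    problems = []
    if len(words) != len(glosses):
        problems.append(f"{len(words)} words but {len(glosses)} glosses")
    for i, (word, gloss) in enumerate(zip(words, glosses), start=1):
        # Same number and kind of boundaries, in the same order.
        if BOUNDARY.findall(word) != BOUNDARY.findall(gloss):
            problems.append(f"word {i}: {word!r} does not match {gloss!r}")
    return problems


print(check_alignment("Je-ai mange-é le gâteau", "I-have.1SG eat-PRF the.M.SG cake"))  # -> []
print(check_alignment("Keeki-wo tabe-nai-ta", "cake-ACC eat-NEG"))
```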

The final line for each entry should have the free translation of the example.

Here is an example sentence that I just made up in Japanese (with only a transliteration and not a standard orthography line for the moment):

Source: author
Vetted: f
Judgment: g
Phenomena: {case, negation}
Keeki-wo tabenakatta.
Keeki-wo tabe-nai-ta
cake-ACC eat-NEG-PRF
`(Someone) didn't eat cake.'

And a couple from French:

Source: author
Vetted: f
Judgment: g
Phenomena: {word order}
J'ai mangé le gâteau
Je-ai mange-é le gâteau
I-have.1SG eat-PRF the.M.SG cake
`I have eaten the cake.'

Source: author
Vetted: f
Judgment: u
Phenomena: {word order}
J'ai le gâteau mangé
Je-ai le gâteau mange-é
I-have.1SG the.M.SG cake eat-PRF
`I have eaten the cake.'

(NB: I'm going with the analysis of so-called French "clitics" as affixes.)

NB: To make things easier on the perl script, please don't include any newlines except at the end of each of the lines in the format.

Here's a long example from Japanese to illustrate. This is good:

Source: a
Vetted: s
Judgment: g
Phenomena: {coordination, serial verb}
Ima made takusan hon-wo yonde kimashita ga, kore kara mo yonde iku tsumori desu.
Ima made takusan hon-wo yom-te ki-mas-ta ga kore kara mo yom-te ik-u tsumori desu.
now until many book-ACC read-PTCP come-HON-PRF CONJ here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'

Do NOT do it this way. (I'm only including the IGT lines here to keep folks from looking at this quickly and copying.)

Ima made takusan hon-wo yonde kimashita ga,
kore kara mo yonde iku tsumori desu.
Ima made takusan hon-wo yom-te ki-mas-ta ga
kore kara mo yom-te ik-u tsumori desu.
now until many book-ACC read-PTCP come-HON-PRF CONJ
here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'

Do NOT do it this way either:

Ima made takusan hon-wo yonde kimashita ga,
Ima made takusan hon-wo yom-te ki-mas-ta ga
now until many book-ACC read-PTCP come-HON-PRF CONJ
kore kara mo yonde iku tsumori desu.
kore kara mo yom-te ik-u tsumori desu.
here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'
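The reason for the one-line-per-field rule is that a script can then locate each field by position. The Python sketch below is hypothetical (it is not the perl script mentioned above) and assumes that entries are separated by blank lines, begin with the four "Field: value" lines, and end with the segmented, gloss, and translation lines. A wrapped line, as in the two bad versions above, would shift those last three positions and silently misalign the fields.

```python
import re

FIELDS = ("Source", "Vetted", "Judgment", "Phenomena")


def parse_examples(text):
    """Split the example section of a test suite file into entries."""
    entries = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.rstrip() for line in block.splitlines() if line.strip()]
        # Four fields, at least one orthography/transliteration line,
        # then the segmented, gloss, and translation lines.
        if len(lines) < len(FIELDS) + 4:
            raise ValueError(f"entry too short: {lines!r}")
        entry = {}
        for field, line in zip(FIELDS, lines):
            key, _, value = line.partition(":")
            if key.strip() != field:
                raise ValueError(f"expected {field!r} line, got {line!r}")
            entry[field] = value.strip()
        body = lines[len(FIELDS):]
        entry["string"] = body[:-3]  # orthography and/or transliteration
        entry["morphemes"], entry["gloss"], entry["translation"] = body[-3:]
        entries.append(entry)
    return entries
```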

Write up

This write-up will be relatively long. For each of the six grammatical phenomena you handle this week, you should describe what you discovered about your language. If you have any questions about any of them (whether or not you have already asked them), please include those as well. Indicate any places where you were uncertain about what the language actually does, and describe any assumptions you made in order to create test suite examples.

In addition, your write-up should indicate whether any of the phenomena listed above (not just the 6 you do this week) are irrelevant for your language.

In general, I prefer for the write-ups to be plain text. If you want to submit something with formatting, please create a pdf.

Submit your assignment

ebender at u dot washington dot edu
Last modified: Fri Apr 7 10:15:34 PDT 2006