Lab 2 (due 1/18)

Updated 1/18/08 to include description of tense/aspect and navigation links to modals, coordination, and adjectives sections for the purposes of Lab 3.

Updated 1/22/08 to advise against copular sentences.

Updated 1/23/08 with more specific instructions for write-ups and comments on ungrammatical examples.

NB As usual, write up instructions at the end. Please read the whole assignment before beginning.

Overview

The goal of this week's lab is to make a big start on creating the primary test suite that you will use over the quarter. In order to do this, you'll need to gain an understanding of various grammatical phenomena in your language (described below). In addition, the test suites are to be encoded eventually as XML files (best practice archive format) and contributed to the ODIN project. This means that we'll be recording some extra information along the way. We'll continue with the test suite development next week (Lab 3).

Preliminaries

Test suites: General guidelines

Grammatical phenomena

This section describes the grammatical phenomena we expect to cover, and the type of examples you should use to illustrate them. For Lab 2 (i.e., what you turn in on 1/18), you will need to handle at least five of the phenomena listed below. In addition, you should determine whether any of the phenomena are not relevant in your language (e.g., many languages have no case distinctions). Next week, you'll complete an additional six phenomena.

Basic Word Order

In declarative matrix clauses, what is the order of the major constituents (subject, verb, complements)? Be sure to consider both intransitive and transitive verbs, as well as ditransitive verbs if you can find any.

Do sentences in your language typically contain auxiliaries? Where does the auxiliary occur with respect to the verb?

Your ungrammatical examples in this section should explore all of the possible orders that are not allowed in your language. Likewise, your grammatical examples should illustrate all the possible orders. For a transitive verb, then, we expect six sentences (all possible orders of S, V, and O) with the number of grammatical v. ungrammatical sentences varying depending on the language type.
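
As a small aid for making sure no order is skipped, here is a minimal sketch that lists the candidate orders for a transitive clause. The function name candidate_orders is just an illustration, not part of any course tooling; you still have to decide for each order whether it is grammatical in your language.

```python
from itertools import permutations


def candidate_orders(constituents=("S", "V", "O")):
    """Return every ordering of the given constituents as a string."""
    return ["".join(order) for order in permutations(constituents)]


# Six candidate orders for a transitive clause; each one needs a judgment.
for order in candidate_orders():
    print(order)
```

Passing ("S", "V") instead gives the two candidate orders for an intransitive clause.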

In some languages, full NPs are arguably always adjuncts (topics, etc.) with the valence requirements of the verb being filled by affixes. If your language seems to fall into this type, you should still try to find examples with full NPs illustrating where they can occur, but please discuss this in your write up.

Another more complicated word-order pattern is "V2", where the verb (or a finite auxiliary) must be the second thing in the clause, but anything can come first (S, O, adverb, etc). If your language is described as V2, your word order examples should include ungrammatical examples where the verb is not in second position, and a variety of grammatical examples where it is.

If your language is strict about the order of S and O, be careful in your assignment of (un)grammaticality in examples where S and O are reversed. Whether the string is strictly ungrammatical or actually just means something different depends on how else your language marks subjects and objects (e.g., with case or agreement). For example, in English, Cats chase dogs and Dogs chase cats are both grammatical; they just mean different things.

Please use full NPs for the basic word order examples (i.e., something like the cat or cats instead of Fluffy or them).

Pronouns

We'll be addressing agreement, and to get interesting subject-verb (or object-verb) agreement, it's useful to have non-third person forms.

Find the paradigm(s) for pronouns in your language: Do they vary by person, number, gender, case?

Do pronouns have the same distribution as full NPs in your language? (I.e., can they appear in the same places in the string?) If not, add some examples illustrating the contrast to your test suite. In either case, add some examples to your test suite illustrating the full paradigm. (Ungrammatical examples involving case and agreement will come up under those topics below.)

The rest of the NP

This section is meant to address the other obligatory elements of an NP (particularly, an NP headed by a common noun). Does the language require determiners? Require determiners only with certain nouns? Allow determiners optionally? Allow determiners even with proper nouns and pronouns? What is the order of the determiner with respect to the noun? The space of examples here (abstracting away from order) should look something like this:

Note that n1, n2, and n3 are supposed to be of different types (e.g., nouns that require determiners always, nouns that disallow determiners always, and nouns that optionally allow determiners). Not all languages will necessarily have three types.
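
One way to enumerate the candidate NPs to check is sketched below. It simply crosses each noun type with the presence or absence of a determiner; the placeholder names n1, n2, n3, and det follow the text above, and the function name np_candidates is made up for illustration. Which candidates are grammatical depends entirely on your language.

```python
from itertools import product


def np_candidates(nouns, determiner="det"):
    """Cross each noun type with and without a determiner (order abstracted away)."""
    candidates = []
    for noun, with_det in product(nouns, (True, False)):
        candidates.append(f"{determiner} {noun}" if with_det else noun)
    return candidates


for np in np_candidates(["n1", "n2", "n3"]):
    print(np)
```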

Is there anything else that is required with NPs? One common thing is case-marking adpositions. Note that these can appear obligatorily with all nouns in all argument positions, optionally with all nouns in all argument positions, obligatorily but only in certain argument positions/with certain verbs, obligatorily but only with certain (e.g., animate) nouns, etc. Develop examples which illustrate the contrasts, including any contrasts between multiple case-marking adpositions (is one only used for subjects? etc).

Argument optionality

This section concerns whether and under what circumstances verbal arguments can be left unexpressed.

As far as I know, there are four basic patterns here:

  1. Rampant pro-drop: Any argument can be left unexpressed, with its interpretation handled by context (i.e., as if it were a pronoun), or, depending on the verb, as an unspecified object of that verb.
  2. Pronominal affixes: Any argument can be left unexpressed, but only if an "agreement marker" (arguably an incorporated pronoun) appears on the verb. Such "agreement markers" may or may not be compatible with overt objects. (Additionally, languages with this pattern may also allow the unspecified object situation.)
  3. Subject pro-drop: Subjects may be left unexpressed, but objects may not.
  4. Pro-drop limited to constructionally or lexically licensed null instantiation. (This is the situation in English, where arguments can go unexpressed only in certain constructions [e.g., imperative] or with certain verbs [e.g., eat].)

Determine which pattern your language falls under, and then create examples for your test suite. Here a typical set of examples would be (again, abstracting away from word order, and assuming that agreement markers are optional with overt arguments but obligatory with missing ones):

If your language generally doesn't allow pro-drop (or generally doesn't allow pro-drop of objects) but does with particular verbs, you should include two sets of examples like the above (one with a verb that does allow a dropped object and one with a verb that does not).

Agreement

Agreement is covariation in form between multiple items (typically a head and a dependent) in a sentence. Sometimes, the head doesn't change form, but the dependent does, depending on properties of the head.

Languages vary greatly in how much agreement they display from none at all to quite a bit. You can categorize agreement systems along two dimensions:

  1. Which elements are in the agreement relation (subject & verb, object & verb, determiner & noun, adjective & noun are typical)
  2. Which features are involved (person, number, case and gender are typical)

Determine whether your language has agreement, and if so, which type. Then construct examples showing both grammatical and ungrammatical possibilities. Remember, the ungrammatical examples should only have one thing wrong with them (i.e., if you have both subject-verb and object-verb agreement, there's no need to make an example where both the subject and the object disagree with the verb).

For a language with determiner-noun agreement in number and case and subject-verb agreement in person and number, a possible example set is:

Notice the inclusion of non-third person examples (using pronouns). You could pair, e.g., every verb form with every kind of noun it doesn't agree with, but that's not strictly necessary. As long as each noun type and each verb form show up in at least one ungrammatical example, most errors should get caught.
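
A minimal sketch of one way to get that coverage without building every mismatch: pair each subject category with the verb form of the next category in the list, wrapping around at the end. The category labels and the function name mismatch_pairs are illustrative assumptions. The sketch also assumes that every category has a distinct verb form; if two categories share a form, that "mismatch" is actually grammatical and needs a different partner.

```python
def mismatch_pairs(categories):
    """Pair each subject category with the verb form of the next category.

    Every subject type and every verb form appears in exactly one pair,
    and no subject is paired with its own (agreeing) verb form.
    """
    if len(categories) < 2:
        return []
    rotated = categories[1:] + categories[:1]
    return list(zip(categories, rotated))


# (subject category, verb-form category) pairs to build ungrammatical examples from
for subject, verb_form in mismatch_pairs(["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]):
    print(f"{subject} subject + {verb_form} verb form")
```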

Case

A language has a case system if the nouns vary in form depending on the grammatical role they play in a sentence and/or the specific head they are a dependent of. (In some languages, it's not the nouns themselves that vary in form, but the dependents of the noun, such as determiners or adjectives. We'll analyze this as the nouns still having case and their dependents agreeing with them in case.)

Some languages have no case system. Among languages that do have case systems, they can be characterized along the following parameters:

Determine if your language has a case system, and if so, where it falls on the above parameters. (An ergative-absolutive system is one where the object of a transitive verb bears the same case marking as the subject of an intransitive, and contrasts with the subject of a transitive. A split-ergative system is one where the ergative-absolutive pattern shows up only in certain contexts: only with certain tenses/aspects; only with certain types of nouns [e.g., animates]; only with certain types of verbs.)
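
The definition above boils down to a comparison of three case markings. Here is a rough sketch of that comparison, using the labels S (subject of an intransitive), A (subject of a transitive), and O (object of a transitive). These labels and the function name classify_alignment are my own shorthand for illustration, and the sketch ignores splits, which you would check by running the comparison separately for each context.

```python
def classify_alignment(s_case, a_case, o_case):
    """Compare the case on S (intransitive subject), A (transitive subject),
    and O (transitive object) and name the resulting pattern."""
    if s_case == a_case and s_case != o_case:
        return "nominative-accusative"
    if s_case == o_case and s_case != a_case:
        return "ergative-absolutive"
    return "other"


print(classify_alignment("NOM", "NOM", "ACC"))
print(classify_alignment("ABS", "ERG", "ABS"))
```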

Create examples showing both grammatical and ungrammatical case patterns. For the purposes of this class, you only need to consider the case of subjects, direct objects, and (if you found some ditransitives) indirect objects. If your language has quirky case, you should consider including both verbs that illustrate the major case patterns and verbs that have idiosyncratic patterns. If your language has split-ergativity, you should illustrate both sides of the split.

For a nominative-accusative language (with a distinct dative case) without quirky case, a typical example set would look like this (assuming SVO word order):

If your language also has agreement, or obligatory determiners, etc., make sure that the examples have the appropriate form along those dimensions.

Negation

Here we're interested in how your language handles sentential (NB: not constituent) negation. Again, there are a few basic strategies that should cover most cases:

  1. Inflection: An affix (prefix, suffix, infix) expressing negation, which attaches to auxiliaries only, main verbs only, or any (finite) verb.
  2. Independent adverb: An adverb that modifies a V, a VP, or an S; and attaches to the left, right, or either side.
  3. Selected adverb: An adverb that appears as a selected complement of auxiliaries only, main verbs only, or any (finite) verb.

It can be subtle to distinguish between options 2 and 3, and chances are the data you'll be able to find won't be sufficient to do so. For option 1, positive test suite examples should show the negated verb/auxiliary. Negative examples should show the negation inflection on verbs that do not allow it (non-finite verbs? main verbs?). For options 2/3, positive examples should show the negative adverb in the positions it can appear in; contrasting negative examples should illustrate where it cannot appear.

Some languages have both inflection and an adverb, in which case there are the following logical possibilities regarding their cooccurrence:

  1. Both must be used together.
  2. Either one can be used separately, but they cannot be used together (complementary distribution).
  3. They can be used independently or together.
  4. The adverb is obligatory, but the inflection is optional.
  5. The inflection is obligatory, but the adverb is optional.

If your language allows both strategy types, determine (if you can) the rules of their cooccurrence, and illustrate with appropriate positive and negative examples in your test suite.
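
To make the five possibilities concrete, the sketch below spells out which of the three candidate negated strings (inflection alone, adverb alone, both together) each pattern predicts to be grammatical. The pattern numbers follow the list above; the names PATTERNS and expected_judgments are hypothetical, and g/u are the judgment codes used in the test suite format.

```python
INFL, ADV, BOTH = "inflection only", "adverb only", "both"

# For each pattern in the list above, the negation strategies that are allowed.
PATTERNS = {
    1: {BOTH},             # both must be used together
    2: {INFL, ADV},        # complementary distribution
    3: {INFL, ADV, BOTH},  # independently or together
    4: {ADV, BOTH},        # adverb obligatory, inflection optional
    5: {INFL, BOTH},       # inflection obligatory, adverb optional
}


def expected_judgments(pattern):
    """Map each candidate strategy to 'g' (grammatical) or 'u' (ungrammatical)."""
    allowed = PATTERNS[pattern]
    return {combo: ("g" if combo in allowed else "u") for combo in (INFL, ADV, BOTH)}


print(expected_judgments(4))
```

Under every pattern except 3, at least one of the three candidates comes out ungrammatical, so there is a negative example to add to the test suite.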

Matrix yes-no questions

How does your language indicate matrix clause (i.e., not embedded) yes-no questions? Possible strategies include word order variations, a sentence-initial or sentence-final question particle, a special auxiliary, and intonation only.

Your test suite should include positive examples illustrating all of the strategies in your language. If you have any strategies that involve additional lexical material (e.g., a question particle), create negative examples with the question particle in the wrong place.

If your language indicates questions with word order variations, go back to your negative examples under word order and check whether anything marked as ungrammatical there is really grammatical as a question. This is where we begin to see that a finished test suite should pair strings with analyses; this is done implicitly here in the free translations.

Consider creating examples of negative questions (i.e., sentences simultaneously illustrating both negation and yes-no questions).

Embedded clauses (declarative, interrogative)

Try to find at least one verb that can embed finite declarative clauses and at least one verb that can embed finite interrogative clauses. How are the embedded clauses marked? Does the language use complementizers? Is the word order different between matrix and embedded clauses? Are there different complementizers for embedded declaratives v. interrogatives? Do the selecting verbs allow both kinds of clauses (e.g., English know) or just one (e.g., English ask)? Create grammatical and ungrammatical examples to illustrate any contrasts that you find. Restrict your attention to yes-no questions. (That is, no wh- questions.)

Note that I am not looking for embedded clauses functioning as modifiers (e.g., relative clauses or clauses marked by when or because). Instead try to find examples similar to these:

Modals

How does your language express the meaning associated with English can in I can eat glass? The two major possibilities are an independent auxiliary like in English or an affix on the main verb. Alternatively, you might find only periphrastic means ("It is possible for me to eat glass.") If there's an auxiliary, you might find that it's a subject-raising verb (like in English), or that it does argument composition: in this case, the auxiliary takes the lexical verb as its complement and then adopts all of the verb's arguments (subject and complements) as its own. You can tell this is going on when the arguments of the verb are ordered with respect to the auxiliary rather than the verb.

Coordination

Explore how coordination is marked in your language. Coordination is, very informally, the sort of phrasal combination marked by "and" in English. In some languages (like English), this is simple: a single lexical item can coordinate any kind of phrase. In other languages, coordination might be marked by adding an affix, lengthening a vowel, or changing to another tense -- the variety of marking strategies is surprising. Languages also vary as to how many coordinands must be marked: all of them (and A and B and C...), just one (A B and C), or none of them (A B C...). Also, some languages have different ways of marking coordination for different phrase types. If this is the case, it will be interesting to illustrate at least two different strategies.

Extracting this information from your written grammar can be challenging. Coordination is described in different sections in different grammars: in a separate section of its own, in a section that also describes subordination (often titled "Conjunctions"), or possibly spread out over the sections that describe each phrase type (i.e. nouns, verbs, adjectives, etc). Some grammars provide very little information beyond "the word for 'and' is FOO"; some are more detailed. Collect what information you can find, especially example sentences that have the word "and" in their gloss or translation. Consider doing a Google search (or other web search) based on the spelling of the morpheme for "and" to get some naturally occurring examples to supplement what you can get from your reference material. One last thing to be aware of: some languages mark coordinated meanings using a word or inflection meaning "with", but don't seem to actually form tightly bound coordinated constituents.

Tense/aspect

What tense and aspect categories are marked in your language, and how are they marked?

Tense has to do with the time of the event in relation to the speech time, and usually involves categories like "past, present, future", though not all languages make a three-way distinction, and some languages allow more fine-grained distinction. Aspect has to do with the internal temporal structure of the event, and how it is viewed or portrayed in the utterance. Under the heading aspect, you might find categories like "progressive, habitual, perfective, durative, inceptive" and others.

Tense and aspect can be expressed by auxiliaries, affixes, particles or combinations thereof.

You should collect at least one way of expressing each tense category marked in your language, except for "perfect" tenses (e.g., English future perfect Kim will have gone by then.). If your language marks any aspect categories with auxiliaries, affixes, or particles, try to collect roughly three different aspects as well. Note that when a language doesn't have a grammaticalized means of expressing a particular tense or aspect category, it can usually still get it across by paraphrasing. Thus English Kim began to swim. doesn't count as inceptive aspect for our purposes.

Ungrammatical examples can be hard to come by in this category, but you should be able to construct some by showing illicit combinations of auxiliaries + main verb forms, or multiple inconsistent tense affixes or particles.

Adjectives, or ...

This topic is basically here to give folks who are working on languages without case and agreement something else to do when we get there. If there's something in your language that you'd rather tackle other than adjectives, go ahead and develop examples for that instead.

For adjectives, you should determine where they appear with respect to the nouns they modify (to the left, to the right, on either side, or separated from the noun anywhere in the sentence). You should also determine if they agree with the nouns they modify.

Formatting

The test suites should be initially produced as plain text files (in ascii or unicode). We'll produce perl scripts that turn the plain text into best-practice XML on the one hand and the required format for [incr tsdb()] on the other. In order to do this, the formatting of the plain text file has to be relatively strict.

Your test suite file should consist of a header, containing information pertinent to all of the examples, followed by a list of examples. The header should contain the following information:

Language: <language name>
Language code: <Ethnologue language code>
Author: <your name>
Date: April 7, 2006
Source a: <Reference to grammar/web page>
Source b: <Reference to grammar/web page>
... (as many sources as needed)

Each example should consist of the following:

#Ex number and optional comment
Source: {a:page, b:page, author, elicited, attested}
Vetted: {t, f, s}
Judgment: {g, u}
Phenomena: {word order, case, agreement, ...}
<Example in standard orthography> (optional)
<Example in transliteration> (optional)
<Example with morpheme boundaries noted and morpheme forms regularized>
<Morpheme-by-morpheme glosses>
<Free translation>

The comment character is '#', and it is good practice to number your examples in a comment line above the Source: line. For ungrammatical examples, this comment should also indicate what is wrong with the example, for your reference and for mine.

The source field indicates where the example came from. If it came from one of your written sources, you can refer to that source with a single letter code. If there is a page number associated with the example, it should follow the letter (with a colon in between). If you made the example up, the source should be author. If you elicited the example from a native speaker, then the source should be elicited. If you found the example in a non-linguistic text, the source should be attested.
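
For checking your own file, the allowed values of the source field can be captured in a short pattern. This is only a sketch: it assumes source codes are single lowercase letters (a, b, ...) matching the header, that a page reference is any non-blank text after the colon, and the name is_valid_source is made up here.

```python
import re

# A letter code with an optional ":page", or one of the three keywords.
SOURCE_RE = re.compile(r"(?:author|elicited|attested|[a-z](?::\S+)?)")


def is_valid_source(value):
    """Return True if value looks like a legal Source: field value."""
    return SOURCE_RE.fullmatch(value.strip()) is not None


for value in ["a:42", "b", "author", "elicited", "made up"]:
    print(value, is_valid_source(value))
```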

The vetted field indicates where the judgment on the example came from. t means the example has been vetted by a native speaker, who gave the judgment indicated. f means it has not. (In this case, the judgment is your best guess based on the grammatical materials you have.) If you are a native speaker, you can vet your own examples. If the example comes from a grammar (which indicates a grammaticality judgment for it explicitly), and you haven't had it vetted in addition, you should put s in this field. This is meant to indicate that we think the example was vetted before being included in the grammar, but since we didn't do it, we're not sure. For attested examples, you should use t if you checked it with a native speaker and f if you have not.
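
The rules for the vetted field amount to a short decision procedure, sketched here. The function and parameter names are illustrative only; the logic just restates the paragraph above.

```python
def vetted_code(checked_with_speaker, judgment_in_grammar=False):
    """Pick the Vetted: value for an example.

    checked_with_speaker: a native speaker (possibly you) confirmed the judgment.
    judgment_in_grammar: the example comes from a grammar that states its
    grammaticality explicitly.
    """
    if checked_with_speaker:
        return "t"
    if judgment_in_grammar:
        return "s"
    return "f"


print(vetted_code(False, judgment_in_grammar=True))
```

For attested examples the second argument is simply left as False, so the result is t or f depending on whether you checked with a speaker.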

The judgment field indicates the grammaticality judgment assigned to the example (either by a native speaker, in a grammar, or your best guess). g is for grammatical and u is for ungrammatical.

The phenomena field is a list of phenomena illustrated in the example. We'll have the perl script recognize both long and short names for each phenomenon, according to the table below. A single example might illustrate multiple phenomena. However, ungrammatical examples should have only one thing wrong with them, and be tagged only with the phenomenon tag corresponding to that problem.

Long name                  Short name
Word order                 wo
Pronouns                   pn
Determiners                det
Case-marking adpositions   adp
Argument Optionality       pro-d
Agreement                  agr
Case                       c
Coordination               crd
Matrix yes-no questions    q
Modals                     m
Negation                   neg
Imperatives                imp
Embedded declaratives      emb-d
Embedded questions         emb-q
Adjectives                 adj
Relative clauses           rel
Tense Aspect               ta
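
If you want to check your own tags before handing in, a lookup along the following lines would do. This is a sketch in Python rather than the course's perl script, whose actual behavior I am not describing here; the function name normalize_phenomena is hypothetical, and matching long names case-insensitively is an assumption.

```python
SHORT_NAMES = {
    "word order": "wo",
    "pronouns": "pn",
    "determiners": "det",
    "case-marking adpositions": "adp",
    "argument optionality": "pro-d",
    "agreement": "agr",
    "case": "c",
    "coordination": "crd",
    "matrix yes-no questions": "q",
    "modals": "m",
    "negation": "neg",
    "imperatives": "imp",
    "embedded declaratives": "emb-d",
    "embedded questions": "emb-q",
    "adjectives": "adj",
    "relative clauses": "rel",
    "tense aspect": "ta",
}


def normalize_phenomena(field):
    """Split a value like '{case, negation}' into short names and unknown tags."""
    known, unknown = [], []
    for tag in field.strip().strip("{}").split(","):
        tag = tag.strip().lower()
        if not tag:
            continue
        if tag in SHORT_NAMES:
            known.append(SHORT_NAMES[tag])
        elif tag in SHORT_NAMES.values():
            known.append(tag)
        else:
            unknown.append(tag)
    return known, unknown


print(normalize_phenomena("{case, negation}"))
```

Tags that are not in the table come back in the second list, so you can see at a glance which ones need to be renamed or discussed.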

The standard orthography and transliteration lines give a canonical representation of the string. You should have at least one of these, possibly both. Whatever you do, it should be consistent for the whole file.

The example should also be presented with the morpheme boundaries explicit. This will allow us to write a perl script that aligns glosses with each morpheme. For languages with particularly complex morphophonology, you might end up using this line as the example you actually parse/generate (to abstract away from the phonological rules). It is for this reason that this line should have phonologically regularized forms. In this line, morpheme boundaries should be indicated with hyphens and word boundaries with spaces. If your language has clitics, the boundary between a clitic and its host should be marked with an equals sign.

The next line is the morpheme-by-morpheme glosses. These should be in a one-to-one correspondence with the morphemes, so if the nth word in the line above has two hyphens in it, the nth word in this line has two hyphens as well. Stems should be given English glosses indicating their meaning. For formatting and a good set of grammatical abbreviations to use, please follow the Leipzig glossing rules. (You might also want to refer to the standardized set of grams from ODIN.)
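
The one-to-one requirement is easy to check mechanically. Here is a minimal sketch, assuming words are separated by whitespace and morphemes by hyphens or equals signs as described above; the function name check_alignment is made up for illustration.

```python
import re


def separators(word):
    """Return the sequence of morpheme/clitic boundary marks in a word."""
    return re.findall(r"[-=]", word)


def check_alignment(morpheme_line, gloss_line):
    """Return a list of problems; an empty list means the lines align."""
    problems = []
    m_words = morpheme_line.split()
    g_words = gloss_line.split()
    if len(m_words) != len(g_words):
        problems.append(f"{len(m_words)} words vs. {len(g_words)} glosses")
    for i, (m, g) in enumerate(zip(m_words, g_words), start=1):
        if separators(m) != separators(g):
            problems.append(f"word {i}: '{m}' does not match gloss '{g}'")
    return problems


print(check_alignment("Keeki-wo tabe-nai-ta", "cake-ACC eat-NEG-PRF"))
print(check_alignment("Keeki-wo tabe-nai-ta", "cake eat-NEG-PRF"))
```

Periods inside a gloss (as in have.1sg) are not boundaries, so they are deliberately ignored.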

The final line for each entry should have the free translation of the example.

Here is an example sentence that I just made up from Japanese (with only a transliteration and not a standard orthography line for the moment):

Source: author
Vetted: f
Judgment: g
Phenomena: {case, negation}
Keeki-wo tabenakatta.
Keeki-wo tabe-nai-ta
cake-ACC eat-NEG-PRF
`(Someone) didn't eat cake.'

And a couple from French:

Source: author
Vetted: f
Judgment: g
Phenomena: {word order}
J'ai mangé le gâteau
Je-ai mange-é le gâteau
I-have.1sg eat-PRF the.M.SG cake
`I have eaten the cake.'

Source: author
Vetted: f
Judgment: u
Phenomena: {word order}
J'ai le gâteau mangé
Je-ai le gâteau mange-é
I-have.1sg the.M.SG cake eat-PRF
`I have eaten the cake.'

(NB: I'm going with the analysis of so-called French "clitics" as affixes.)

Here's a long example from Japanese to illustrate that each line of an example must stay on a single line, however long it gets. This is good:

Source: a
Vetted: s
Judgment: g
Phenomena: {coordination, serial verb}
Ima made takusan hon-wo yonde kimashita ga, kore kara mo yonde iku tsumori desu.
Ima made takusan hon-wo yom-te ki-mas-ta ga kore kara mo yom-te ik-u tsumori desu.
now until many book-ACC read-PTCP come-HON-PRF CONJ here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'

Do NOT do it this way. (I'm only including the IGT lines here to keep folks from looking at this quickly and copying.)

[BAD BAD BAD]
Ima made takusan hon-wo yonde kimashita ga,
kore kara mo yonde iku tsumori desu.
Ima made takusan hon-wo yom-te ki-mas-ta ga
kore kara mo yom-te ik-u tsumori desu.
now until many book-ACC read-PTCP come-HON-PRF CONJ
here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'

Do NOT do it this way either:

[BAD BAD BAD]
Ima made takusan hon-wo yonde kimashita ga,
Ima made takusan hon-wo yom-te ki-mas-ta ga
now until many book-ACC read-PTCP come-HON-PRF CONJ
kore kara mo yonde iku tsumori desu.
kore kara mo yom-te ik-u tsumori desu.
here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'
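
To see why the wrapped versions cause trouble, consider a script that identifies the lines of an entry by position. The sketch below is not the course's perl script; it is a hypothetical Python reader (the name parse_entry is mine) that assumes the four field lines come first and that four or five lines follow, the last three being the morpheme line, the glosses, and the free translation.

```python
FIELDS = ("Source", "Vetted", "Judgment", "Phenomena")


def parse_entry(lines):
    """Parse one test suite entry given as a list of lines."""
    # Drop blank lines and '#' comment lines.
    body = [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
    entry = {}
    for name in FIELDS:
        if not body or not body[0].startswith(name + ":"):
            raise ValueError(f"expected a '{name}:' line")
        entry[name.lower()] = body.pop(0).split(":", 1)[1].strip()
    if len(body) not in (4, 5):
        raise ValueError(f"expected 4 or 5 example lines, found {len(body)}")
    entry["surface"] = body[:-3]      # orthography and/or transliteration
    entry["morphemes"] = body[-3]
    entry["gloss"] = body[-2]
    entry["translation"] = body[-1]
    return entry


entry = parse_entry([
    "Source: author",
    "Vetted: f",
    "Judgment: g",
    "Phenomena: {case, negation}",
    "Keeki-wo tabenakatta.",
    "Keeki-wo tabe-nai-ta",
    "cake-ACC eat-NEG-PRF",
    "`(Someone) didn't eat cake.'",
])
print(entry["gloss"])
```

Either of the wrapped layouts above has seven lines after the fields, so a reader like this one would reject the entry (or, worse, silently pick the wrong lines).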

Write up

This write up will be relatively long. For each grammatical phenomenon of the five you handle this week, you should describe what you discovered about your language. If you have any questions about any of them (that you haven't yet asked or even if you have), please include them as well. Indicate any places where you were uncertain about what the language actually does, and describe any assumptions you made in order to create test suite examples.

Your write up should have separate sections for each phenomenon, and include examples pasted in from the test suite to illustrate the points you are talking about. For example:

1. Basic Word Order

The basic word order in my language is AUX-S-V-O. In the example below, xua is the auxiliary. It is followed by the subject tcejbus, then the verb, then the object.

xua      tcejbus  brev  tcejbo
xua      tcejb-us brev  tcejb-o
3sg.PRES dog-NOM  chase cat-ACC
`(a/the) dog chases (a/the) cat'

Most sentences have an auxiliary, but in some tenses, there is only the main verb. In these cases, the main verb bears the tense inflection, and appears initially in the sentence. The subject still precedes the object, giving VSO order:

brevd         tcejbus  tcejbo
brev-d        tcejb-us tcejb-o
chase-PST.3sg dog-NOM  cat-ACC
`(a/the) dog chased (a/the) cat'

In addition, your write up should indicate whether any of the phenomena listed above (not just the five you do this week) are irrelevant for your language.

In general, I prefer for the write-ups to be plain text. If you want to submit something with formatting, please create a pdf.

Submit your assignment


ebender at u dot washington dot edu
Last modified: Fri Dec 29 2006