Testsuite Specifications

Preliminaries

If your language uses non-ASCII characters, you'll need to either:
- Settle on a system of transliteration (preferably both non-lossy and standard for the language, but at least the former)
- Figure out an input method for the standard orthography in Emacs, as well as how to create Unicode files.
If your language has complex morphophonology, you'll need to work out an underlying representation to use for your testsuite (and lexicon/lexical rules).
If you're not a native speaker of your language, you might try to find out whether there are native speakers you could contact for judgments on examples. (Is the language taught at UW? Is there a student organization that's likely to attract speakers of the language? Is there a cultural organization in Seattle? Is there a forum somewhere on the web?) If you are working with a field linguist, you should find out from him/her to what extent you would be able to get judgments on new examples.

Test suites: General guidelines

Your test suite should include both grammatical and ungrammatical examples, and ideally more ungrammatical examples than grammatical ones. In the description of grammatical phenomena below, I'll give examples of how this might be done.
Your test suite should include examples showing the interaction of various phenomena (e.g., negative questions), to the extent that you can find this data.
In general, you'll want to keep your vocabulary as small as possible, using distinct word types only when required to illustrate particular phenomena. On the other hand, we'll eventually want to add enough words to each grammar so that we can do the MT exercise. More importantly, to the extent that you can collect examples that have been vetted by native speakers (as the examples in your reference grammars should have been), these will be more valuable than examples that have been constructed on the basis of a grammar but not validated.
Steer clear of sentences with copulas or non-verbal predicates (e.g., The cat is old), except in the portion of the testsuite dedicated to these phenomena.
Ungrammatical examples should generally have only one thing wrong with them. These are more useful for diagnostic purposes.
Grammatical examples should be complete sentences, even if you're illustrating something to do with smaller phrases (e.g., just NPs). We won't be dealing with fragments in this class.

This section describes the grammatical phenomena we expect to cover, and the type of examples you should use to illustrate them. (NB: This document has grown over the years, and some phenomena not treated in this year's 567 are still included for posterity. You should follow the links from the lab instructions to find the phenomena for each lab.)

Basic Word Order
Pronouns
The Rest of the NP
Argument Optionality
Agreement
Case
Negation
Matrix yes-no questions
Embedded complement clauses
Adverbial clausal modifiers
Modals
Coordination
Agreement in NP Coordination
Tense/aspect
Demonstratives/definiteness
Possessives
Attributive adjectives
Adverbs
Non-verbal predicates
Information Structure
Matrix wh questions
Valence-Changing Lexical Rules
Evidentials

Basic Word Order

In declarative matrix clauses, what is the order of the major constituents (subject, verb, complements)? Be sure to consider both intransitive and transitive verbs. Note that some languages strongly prefer one order while allowing some variation, whereas others are better characterized as free or partially free, even if there is still a most frequent word order.

Do sentences in your language typically contain auxiliaries? Where does the auxiliary occur with respect to the verb?

Your ungrammatical examples in this section should explore all of the possible orders that are not allowed in your language. Likewise, your grammatical examples should illustrate all the possible orders. For a transitive verb, then, we expect six sentences (all possible orders of S, V, and O) with the number of grammatical v. ungrammatical sentences varying depending on the language type.
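
The six-way enumeration for a transitive verb can be generated mechanically. The following is a minimal sketch, assuming a hypothetical strictly SVO language; the placeholder words, the `ALLOWED_ORDERS` set, and the function name are illustrative assumptions and not part of any course tool.

```python
from itertools import permutations

# Hypothetical placeholder constituents; substitute forms from your language.
CONSTITUENTS = {"S": "the-cat", "V": "chases", "O": "the-dog"}

# Assumption for illustration only: a strictly SVO language.
ALLOWED_ORDERS = {("S", "V", "O")}

def word_order_examples(constituents, allowed_orders):
    """Return one (judgment, order, string) triple per ordering of S, V, O."""
    examples = []
    for order in permutations(["S", "V", "O"]):
        # Star every order that is not listed as allowed.
        judgment = "" if order in allowed_orders else "*"
        string = " ".join(constituents[role] for role in order)
        examples.append((judgment, "".join(order), string))
    return examples

examples = word_order_examples(CONSTITUENTS, ALLOWED_ORDERS)
for judgment, order, string in examples:
    print(f"{judgment}{string}  ({order})")
```

Treat the automatic star as a first pass only: a string that the sketch stars may in fact be grammatical with a different meaning, so each judgment still needs to be checked by hand.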

In some languages, full NPs are arguably always adjuncts (topics, etc.), with the valence requirements of the verb being filled by affixes. If your language seems to fall into this type, you should still try to find examples with full NPs illustrating where they can occur, but please discuss this in your write-up.

Another more complicated word-order pattern is "V2", where the verb (or a finite auxiliary) must be the second thing in the clause, but anything can come first (S, O, adverb, etc). If your language is described as V2, your word order examples should include ungrammatical examples where the verb is not in second position, and a variety of grammatical examples where it is.
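
The V2 constraint can be stated as a simple check over constituent sequences. This is a rough sketch under the assumption that each example has already been segmented into labeled constituents; the labels (`S`, `O`, `Adv`, `V`) and the `is_v2` helper are hypothetical.

```python
def is_v2(constituents, verb_label="V"):
    """True if the finite verb (or auxiliary) is the second constituent."""
    return len(constituents) >= 2 and constituents[1] == verb_label

# Hypothetical constituent-level skeletons for a V2 language.
candidates = [
    ["S", "V", "O", "Adv"],
    ["O", "V", "S", "Adv"],
    ["Adv", "V", "S", "O"],
    ["S", "O", "V", "Adv"],
    ["V", "S", "O", "Adv"],
]

# Star every skeleton where the verb is not in second position.
judged = [("" if is_v2(c) else "*") + " ".join(c) for c in candidates]
for line in judged:
    print(line)
```

Whether a verb-initial string is truly ungrammatical or is instead a question depends on the language, so the starred skeletons should be confirmed against your sources.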

If your language is strict about the order of S and O, be careful in your assignment of (un)grammaticality in examples where S and O are reversed. Whether the string is strictly ungrammatical or actually just means something different depends on how else your language marks subjects and objects (e.g., with case or agreement). For example, in English, Cats chase dogs and Dogs chase cats are both grammatical, they just mean different things.

Please use full NPs for the basic word order examples (i.e., something like the cat or cats instead of Fluffy or them).

Pronouns

We'll be addressing agreement, and to get interesting subject-verb (or object-verb) agreement, it's useful to have non-third person forms.

Find the paradigm(s) for pronouns in your language: Do they vary by person, number, gender, case?

Do pronouns have the same distribution as full NPs in your language? (I.e., can they appear in the same places in the string?) If not, add some examples illustrating the contrast to your test suite. In either case, add some examples to your test suite illustrating the full paradigm. (Ungrammatical examples involving case and agreement will come up under those topics below.)

Note that we are NOT concerned with reflexive pronouns, so don't worry about collecting these.

The rest of the NP

This section is meant to address the other obligatory elements of an NP (particularly, an NP headed by a common noun). Does the language require determiners? Require determiners only with certain nouns? Allow determiners optionally? Allow determiners even with proper nouns and pronouns? What is the order of the determiner with respect to the noun? The space of examples here (abstracting away from order) should look something like this:

det n1 itr-verb
n1 itr-verb
det n2 itr-verb
n2 itr-verb
det n3 itr-verb
n3 itr-verb
n1 det itr-verb (illustrating other possible order of n and det)

Note that n1, n2, and n3 are supposed to be of different types (e.g., nouns that require determiners always, nouns that disallow determiners always, and nouns that optionally allow determiners). Not all languages will necessarily have three types.
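
To see how judgments fall out of the three noun types, here is a small sketch. It assumes exactly the three classes named above (determiner required, disallowed, optional); the `NOUN_CLASSES` mapping and the function name are hypothetical, and determiner-noun order is ignored.

```python
# Hypothetical noun classes keyed by their determiner requirement.
NOUN_CLASSES = {"n1": "required", "n2": "disallowed", "n3": "optional"}

def np_examples(noun_classes):
    """Pair each noun with and without a determiner and star the illicit strings."""
    examples = []
    for noun, requirement in noun_classes.items():
        with_det_ok = requirement in ("required", "optional")
        bare_ok = requirement in ("disallowed", "optional")
        examples.append(("" if with_det_ok else "*") + f"det {noun} itr-verb")
        examples.append(("" if bare_ok else "*") + f"{noun} itr-verb")
    return examples

np_test_items = np_examples(NOUN_CLASSES)
for item in np_test_items:
    print(item)
```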

Is there anything else that is required with NPs?

(Note that one common thing is case-marking adpositions. If your language has these, you'll want to illustrate them under "case".)

Argument optionality

This section concerns whether and under what circumstances verbal arguments can be left unexpressed. Languages vary in the restrictions they place on unexpressed arguments. For subjects and objects each, find out if your language:

allows them to be dropped for any verb, or just with some verbs.
requires agreement markers on the verb when the subject/object is unexpressed.
allows them to be dropped in any context, or only in particular contexts (e.g., particular tense/aspect combinations, particular person/number combinations, etc).

Based on the answers to the questions above, create examples for your test suite. Here a typical set of examples would be (again, abstracting away from word order, and assuming that agreement markers are optional with overt arguments but obligatory with missing ones):

s tverb o
s tverb-objagr
*s tverb
tverb-subjagr o
*tverb o
tverb-subjagr-objagr
*tverb-objagr
*tverb-subjagr

If your language has lexically-licensed (or lexically-restricted) pro-drop, you should include two sets of examples like the above (one with a verb that does allow a dropped object and one with a verb that does not).
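
The assumption behind the example set above (agreement markers optional with overt arguments, obligatory with missing ones) can be written as a one-line rule. The sketch below enumerates every combination of overt arguments and agreement markers under that assumption; the function names are hypothetical, and a real language may need a different rule.

```python
from itertools import product

def render(subj_overt, obj_overt, subj_agr, obj_agr):
    """Build a schematic string such as 's tverb-objagr'."""
    verb = "tverb" + ("-subjagr" if subj_agr else "") + ("-objagr" if obj_agr else "")
    words = (["s"] if subj_overt else []) + [verb] + (["o"] if obj_overt else [])
    return " ".join(words)

def is_grammatical(subj_overt, obj_overt, subj_agr, obj_agr):
    # Assumed rule: an argument may be dropped only if the verb agrees with it.
    return (subj_overt or subj_agr) and (obj_overt or obj_agr)

def optionality_examples():
    """Return all 16 combinations, starred where the assumed rule is violated."""
    out = []
    for flags in product([True, False], repeat=4):
        star = "" if is_grammatical(*flags) else "*"
        out.append(star + render(*flags))
    return out

optionality_items = optionality_examples()
for item in optionality_items:
    print(item)
```

The full enumeration includes strings with more than one problem (for example, both arguments dropped with no agreement at all), so prune it down to examples with a single violation before adding them to your test suite.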

Agreement

Agreement is covariation in form between multiple items (typically a head and a dependent) in a sentence. Sometimes, the head doesn't change form, but the dependent does, depending on properties of the head.

Note that we are not concerned here with agreement between a pronoun and its antecedent.

Languages vary greatly in how much agreement they display from none at all to quite a bit. You can categorize agreement systems along two dimensions:

Which elements are in the agreement relation (subject & verb, object & verb, determiner & noun, adjective & noun are typical)
Which features are involved (person, number, case and gender are typical)

Determine whether your language has agreement, and if so, which type. Then construct examples showing both grammatical and ungrammatical possibilities. Remember, the ungrammatical examples should only have one thing wrong with them (i.e., if you have both subject-verb and object-verb agreement, there's no need to make an example where both the subject and the object disagree with the verb).

For a language with determiner-noun agreement in number and case, and subject-verb agreement in person and number, a possible example set is:

det-sg-nom n-sg-nom verb-3sgsubj det-sg-acc n-sg-acc
det-pl-nom n-pl-nom verb-3plsubj det-pl-acc n-pl-acc
*det-sg-nom n-sg-nom verb-3plsubj det-sg-acc n-sg-acc
*det-pl-nom n-pl-nom verb-3sgsubj det-pl-acc n-pl-acc
*det-sg-acc n-sg-nom verb-3sgsubj det-sg-acc n-sg-acc
*det-pl-nom n-sg-nom verb-3sgsubj det-sg-acc n-sg-acc
*det-sg-nom n-pl-nom verb-3plsubj det-pl-acc n-pl-acc
*det-pl-acc n-pl-nom verb-3plsubj det-pl-acc n-pl-acc
*det-sg-nom n-sg-nom verb-3sgsubj det-pl-acc n-sg-acc
*det-sg-nom n-sg-nom verb-3sgsubj det-sg-nom n-sg-acc
*det-pl-nom n-pl-nom verb-3plsubj det-sg-acc n-pl-acc
*det-pl-nom n-pl-nom verb-3plsubj det-pl-nom n-pl-acc
pronoun-1sg-nom verb-1sgsubj
pronoun-1pl-nom verb-1plsubj
pronoun-2sg-nom verb-2sgsubj
pronoun-2pl-nom verb-2plsubj
*pronoun-1sg-nom verb-2plsubj
*pronoun-1pl-nom verb-2sgsubj
*pronoun-2sg-nom verb-1plsubj
*pronoun-2pl-nom verb-2sgsubj

Notice the inclusion of non-third person examples (using pronouns). You could pair, e.g., every verb form with every kind of noun it doesn't agree with, but that's not strictly necessary. As long as each noun type and each verb form show up in at least one ungrammatical example, most errors should get caught.
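
One way to keep ungrammatical examples down to a single error is to generate subject-verb pairs and discard any pair that mismatches in more than one feature. The following is a sketch only, assuming a hypothetical language with three persons and two numbers and using the schematic labels from the example set above.

```python
from itertools import product

PERSONS = ["1", "2", "3"]
NUMBERS = ["sg", "pl"]
FORMS = list(product(PERSONS, NUMBERS))  # six person/number combinations

def subject_verb_examples(single_mismatch_only=True):
    """Pair pronoun subjects with verb forms; star pairs that disagree."""
    examples = []
    for (subj_p, subj_n), (verb_p, verb_n) in product(FORMS, repeat=2):
        # Count how many agreement features differ between subject and verb.
        mismatches = int(subj_p != verb_p) + int(subj_n != verb_n)
        if single_mismatch_only and mismatches > 1:
            continue
        star = "*" if mismatches else ""
        examples.append(
            f"{star}pronoun-{subj_p}{subj_n}-nom verb-{verb_p}{verb_n}subj"
        )
    return examples

agreement_examples = subject_verb_examples()
for item in agreement_examples:
    print(item)
```

With the default setting this yields the six agreeing pairs plus the pairs that differ in person only or in number only. You can thin the list further, as long as every pronoun and every verb form still appears in at least one starred example.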

Case

A language has a case system if the nouns vary in form depending on the grammatical role they play in a sentence and/or the specific head they are a dependent of. (In some languages, it's not the nouns themselves that vary in form, but the dependents of the noun, such as determiners or adjectives. We'll analyze this as the nouns still having case and their dependents agreeing with them in case.)

Some languages have no case system. Among languages that do have case systems, they can be characterized along the following parameters:

The number of distinct cases
Whether or not there are lexical idiosyncrasies in case assignment ("quirky case")
Whether the pattern is nominative-accusative, ergative-absolutive, or split-ergative, or one of a number of other possibilities
How the case is expressed (affixes on nouns/determiners/adjectives, adpositions)

Determine if your language has a case system, and if so, where it falls on the above parameters. If your language has a direct-inverse system, even though that isn't strictly case, please consider it here.

Create examples showing both grammatical and ungrammatical case patterns. For the purposes of this class, you only need to consider the case of subjects and direct objects. If your language has quirky case, you should consider including both verbs that illustrate the major case patterns and verbs that have idiosyncratic patterns. If your language has split-ergativity, you should illustrate both sides of the split.

For a nominative-accusative language (with a distinct dative case) without quirky case, a typical example set would look like this (assuming SVO word order):

noun-nom intrans-verb
*noun-acc intrans-verb
*noun-dat intrans-verb
noun-nom trans-verb noun-acc
*noun-acc trans-verb noun-acc
*noun-dat trans-verb noun-acc
*noun-nom trans-verb noun-nom
*noun-nom trans-verb noun-dat

If your language also has agreement, or obligatory determiners, etc., make sure that the examples have the appropriate form along those dimensions.
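
The case example set above follows a simple recipe: start from the correct case frame for each verb and swap in each wrong case, one argument at a time. A minimal sketch of that recipe, assuming the same three cases and SVO order; the `FRAMES` table and function names are hypothetical.

```python
CASES = ["nom", "acc", "dat"]

# Assumed case frames for a nominative-accusative language without quirky case.
FRAMES = {
    "intrans-verb": ["nom"],
    "trans-verb": ["nom", "acc"],
}

def render(verb, cases):
    # SVO order assumed: subject, verb, then any object.
    nouns = [f"noun-{c}" for c in cases]
    return " ".join([nouns[0], verb] + nouns[1:])

def case_examples(frames, cases):
    """One grammatical example per verb, plus one starred example per wrong case per argument."""
    examples = []
    for verb, frame in frames.items():
        examples.append(render(verb, frame))
        for i, correct in enumerate(frame):
            for wrong in cases:
                if wrong != correct:
                    variant = frame[:i] + [wrong] + frame[i + 1:]
                    examples.append("*" + render(verb, variant))
    return examples

case_items = case_examples(FRAMES, CASES)
for item in case_items:
    print(item)
```

For a language with quirky case, you would add one entry to the frame table per idiosyncratic verb; for split ergativity, one table per side of the split.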

Negation

[In this section only, we're using "grammatical examples" and "ungrammatical examples" instead of "positive examples" and "negative examples", to avoid collision with the other uses of the term "negative".]

Here we're interested in how your language handles sentential (NB: not constituent) negation. Cross-linguistically, sentential negation strategies can be classified according to the number of morphemes required for the construction (see WALS Online Ch. 112). If your language marks negation with a single morpheme, then click on "simple" at the top of the negation page, and you should then be able to fill out some more information about your negator morpheme.

For simple negation, you can select from the following types:

1. Inflection: An affix (prefix, suffix, infix) expressing negation, which attaches to auxiliaries only, main verbs only, or any (finite) verb.
2. Negative auxiliary verb: In this negation strategy, the negator morpheme is a syntactic head that selects a V, VP, or S complement (according to the properties you specified for auxiliary verbs in your language on the word order page).
3. Independent adverb: An adverb that modifies a V, a VP, or an S, and attaches to the left, right, or either side.
4. Selected adverb: An adverb that appears as a selected complement of auxiliaries only, main verbs only, or any (finite) verb.

It can be subtle to distinguish between options 3 and 4, and chances are the data you'll be able to find won't be sufficient to do so. For option 1, grammatical test suite examples should show the negated verb/auxiliary. Ungrammatical examples should show the negation inflection on verbs that do not allow it (non-finite verbs? non-auxiliary verbs?). For option 2, grammatical examples should show the negative auxiliary with its proper complement in the proper word order. Ungrammatical examples should show the negative auxiliary in the wrong position or with the wrong complement type. For options 3 and 4, grammatical examples should show the negative adverb in the positions it can appear in; contrasting ungrammatical examples should illustrate where it cannot appear.

If your language uses two morphemes together to mark negation, select the 'bipartite' option on the negation page. For languages with bipartite (sometimes called double) negation, it's often the case that one morpheme is more associated with the actual negation function and the second morpheme seems to be more resumptive or supporting. We'll call the two morphemes NEG1 and NEG2, respectively. If it's not the case that one morpheme is primary, label them NEG1 and NEG2 based on linear order.

A word of caution about determining whether your language has bipartite negation: be sure not to count 'negative concord' or 'negative agreement' items in your counts. That is, sometimes NPs under the scope of negation show agreement marking (e.g., in some Englishes: "I didn't drink no water."). Only count negative morphemes that are required by the formal construction. As an example, consider Afrikaans [afr]:

Hulle was nie betrokke nie

they were NEG1 involved NEG2

They were not involved [afr]. (de Angulo and Freeland 1931)

After you've determined that you've got bipartite negation, you'll need to specify the individual properties of NEG1 and NEG2.

As with simple negation types, bipartite negation test suites should have grammatical and ungrammatical examples. Ungrammatical examples should show required dependencies between the two morphemes (and required positions) being violated.

Matrix yes-no questions

How does your language indicate matrix clause (i.e., not embedded) yes-no questions? Possible strategies include word order variations, a sentence-initial or sentence-final question particle, a special auxiliary, and intonation only.

Your testsuite should include positive examples illustrating all of the strategies in your language. If you have any strategies that involve additional lexical material (e.g., a question particle), create negative examples with the question particle in the wrong place.

If your language indicates questions with word order variations, go back to your negative examples under word order and check whether anything marked as ungrammatical there is really grammatical as a question. This is where we begin to see that a finished testsuite should pair strings with analyses --- this is done implicitly here in the free translations.

Consider creating examples of negative questions (i.e., sentences simultaneously illustrating both negation and yes-no questions).

Embedded complement clauses (declarative, interrogative)

Try to find at least one verb that can embed finite declarative clauses and at least one verb that can embed finite interrogative clauses --- both as complements (rather than subjects). How are the embedded clauses marked? Are there complementizers, either optional or obligatory? Is there any special morphology required on the verb? How is the embedded clause positioned with respect to the matrix verb (i.e., does it show up in object position, necessarily at the end of the clause, or somewhere else)? Are embedded clauses necessarily nominalized? Is the word order different between matrix and embedded clauses? Are there different complementizers for embedded declaratives v. interrogatives? Do the selecting verbs allow both kinds of clauses (e.g., English know) or just one (e.g., English ask)? Create grammatical and ungrammatical examples to illustrate any contrasts that you find. Restrict your attention to yes-no questions. (That is, no wh- questions.)

Note that in this section I am not looking for embedded clauses functioning as modifiers (e.g., relative clauses, clauses marked by when or because, non-finite clauses expressing simultaneous action). Instead try to find examples similar to these:

The dog thinks that the cat slept.
The child asked whether birds sing.

For the purposes of our MT exercise at the end of the class, try in particular to find equivalents of think and ask that can be used with clausal complements.

Adverbial Clausal Modifiers

This section looks at clauses that serve as modifiers for other clauses. Examples from English include:

Kim slept while Sandy played.
Kim laughed because Sandy told a joke.

You'll be looking for the following information about clausal modifier strategies:

What identifies the modifier as a subordinate clause? (Could be: A free subordinator, special morphology on the verb, both?)
If you have free subordinators, are they at the edge of the subordinate clause or do they potentially appear somewhere in the middle?
Is the subordinate clause required to be nominalized?
Does the modifier attach to the matrix clause at the S level or the VP level?
Does the matrix clause require any mark to show that it is modified by a clausal modifier?
Is the subject of the subordinate clause optionally or obligatorily shared with that in the matrix clause?

Focus in the first instance on finding how your language expresses the equivalents of because and while as illustrated above. If you can't find either or both of those, fill in if possible with other clausal modifiers. As usual, create both positive (grammatical) and negative (ungrammatical) examples, illustrating the properties elicited above.

Modals

How does your language express the meaning associated with English can in I can eat glass? The two major possibilities are an independent auxiliary like in English or an affix on the main verb. Alternatively, you might find only periphrastic means ("It is possible for me to eat glass.") If there's an auxiliary, you might find that it's a subject-raising verb (like in English), or that it does argument composition: in this case, the auxiliary takes the lexical verb as its complement and then adopts all of the verb's arguments (subject and complements) as its own. You can tell this is going on when the arguments of the verb are ordered with respect to the auxiliary rather than the verb.

Coordination

Explore how coordination is marked in your language. Coordination is, very informally, the sort of phrasal combination marked by "and" in English. In some languages (like English), this is simple: a single lexical item can coordinate any kind of phrase. In other languages, coordination might be marked by adding an affix, lengthening a vowel, or changing to another tense -- the variety of marking strategies is surprising. Languages also vary as to how many coordinands must be marked: all of them (and A and B and C...), just one (A B and C), or none of them (A B C...). Also, some languages have different ways of marking coordination for different phrase types. If this is the case, it will be interesting to illustrate at least two different strategies.

Extracting this information from your written grammar can be challenging. Coordination is described in different sections in different grammars: in a separate section of its own, in a section that also describes subordination (often titled "Conjunctions"), or possibly spread out over the sections that describe each phrase type (i.e., nouns, verbs, adjectives, etc.). Some grammars provide very little information beyond "the word for 'and' is FOO"; some are more detailed. Collect what information you can find, especially example sentences that have the word "and" in their gloss or translation. Consider doing a Google search (or other web search) based on the spelling of the morpheme for "and" to get some naturally occurring examples to supplement what you can get from your reference material. One last thing to be aware of: some languages mark coordinated meanings using a word or inflection meaning "with", but don't seem to actually form tightly bound coordinated constituents.

Agreement in NP Coordination

[Phenomena code: Coordination Agreement/crdagr]

New section for 2017, so start early if possible and expect bugs!

In this section, we're interested in what happens to the agreement properties (person, number, gender) on NPs or NOMs (N's) coordinated with and, where the agreement properties of the conjuncts differ. There are three major patterns:

Feature resolution --- the value of a feature on the mother is some static function of the value of the features on the daughters. For example, in English, a coordinated NP is 1st person if any of the conjuncts are 1st person, otherwise 2nd person if any of the conjuncts are 2nd person, otherwise 3rd person.
Specific value on the mother independent of the values on the daughters. For example, in English, the mother of an NP coordinated with and is always plural.
Distinguished conjunct agreement --- agreeing elements outside the coordinated NP agree with one specific conjunct. This could be the closest conjunct to the agreeing element (e.g., rightmost in subject-verb agreement in an SVO language) or it could be rightmost or leftmost consistently, regardless of where the coordinated NP is situated with respect to the agreeing element.

For languages with agreement and and-coordination of NOMs or NPs only, collect examples showing how all of the features active in agreement behave when the conjuncts in a coordinated NP (or NOM) differ in those features.
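
The three patterns can be written down as small functions, which makes it easier to work out the predicted agreement for each test item. This is a sketch only: the English-style person resolution and fixed plural number follow the descriptions above, while the feature representation (dicts with `person` and `number` keys) and all function names are assumptions made for illustration.

```python
def resolve_person(persons):
    """Feature resolution, English-style: 1st beats 2nd beats 3rd."""
    return min(persons)

def resolve_number(numbers):
    """Fixed value on the mother: and-coordination is always plural, as in English."""
    return "pl"

def coordinate(conjuncts):
    """Mother features under resolution (person) plus a fixed value (number)."""
    return {
        "person": resolve_person([c["person"] for c in conjuncts]),
        "number": resolve_number([c["number"] for c in conjuncts]),
    }

def distinguished_conjunct(conjuncts, which="last"):
    """Distinguished conjunct agreement: copy the features of one specific conjunct."""
    return conjuncts[0] if which == "first" else conjuncts[-1]

# Hypothetical coordinated NP: a 3sg conjunct followed by a 1sg conjunct.
conjuncts = [{"person": 3, "number": "sg"}, {"person": 1, "number": "sg"}]
print(coordinate(conjuncts))
print(distinguished_conjunct(conjuncts, which="first"))
```

Comparing the two predictions for conjuncts that differ in a feature shows which verb forms to mark as grammatical and which to star in your language.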

Tense/aspect

What tense and aspect categories are marked in your language, and how are they marked?

Tense has to do with the time of the event in relation to the speech time, and usually involves categories like "past, present, future", though not all languages make a three-way distinction, and some languages allow more fine-grained distinction. Aspect has to do with the internal temporal structure of the event, and how it is viewed or portrayed in the utterance. Under the heading aspect, you might find categories like "progressive, habitual, perfective, durative, inceptive" and others.

Tense and aspect can be expressed by auxiliaries, affixes, particles or combinations thereof.

You should collect at least one way of expressing each tense category marked in your language, except for "perfect" tenses (e.g., English future perfect Kim will have gone by then.). If your language marks any aspect categories with auxiliaries, affixes, or particles, try to collect roughly three different aspects as well. Note that when a language doesn't have a grammaticalized means of expressing a particular tense or aspect category, it can usually still get it across by paraphrasing. Thus English Kim began to swim. doesn't count as inceptive aspect for our purposes.

Ungrammatical examples can be hard to come by in this category, but you should be able to construct some by showing illicit combinations of auxiliaries + main verb forms, or multiple inconsistent tense affixes or particles.

Demonstratives/definiteness

[Phenomena code: cognitive status/cogst]

Demonstratives and definiteness are two means of indicating the cognitive status (or discourse status) of the referent of a noun phrase. Demonstratives are elements like English this and that which canonically participate in a system that distinguishes degrees of distance from the speaker and can be used to draw a hearer's attention to something physically present (cf. Dryer, 2008). Depending on the language, demonstratives can be determiners, adjectives, or affixes.

Determine how demonstratives are marked in your language, and the distinctions (especially in terms of distance or related notions) that are expressed in the system. Illustrate the range of possibilities with examples. Ungrammatical examples can be constructed by putting the demonstrative in the wrong place in the string (e.g., a prefix used as a suffix). For present purposes, don't worry about demonstrative pronouns, i.e., those that can stand alone without a (separate) head noun.

Definiteness may be marked as inflection on the noun, through determiners, through choice of case particles, or some combination (and perhaps indeed other strategies as well). Nominal dependents may agree with their head nouns in terms of definiteness. In English, we mark definiteness with determiners (the vs. a).

Determine if definiteness is marked in your language, and if so how. Construct relevant positive and negative examples illustrating these possibilities. If any elements agree in definiteness, include examples of non-agreement.

Reference: Dryer, Matthew S. 2008. Order of Demonstrative and Noun. In: Haspelmath, Martin & Dryer, Matthew S. & Gil, David & Comrie, Bernard (eds.) The World Atlas of Language Structures Online. Munich: Max Planck Digital Library, chapter 88. Available online at http://wals.info/feature/88 Accessed on 2009-01-26.

Possessives

[Phenomena code: possessives/poss]

How does your language mark possession? Try to find out where your language falls along the following dimensions, and then illustrate them with positive and negative examples in your testsuite.

Is possession marked with inflection on the possessor, on the possessum, both, or by a separate word ('particle') linking the two?
Is the order of the possessive phrase possessor-first, possessum-first, or free?
Is the possessor more like a modifier or a specifier (explanation here)? If you can find evidence that points one way or the other, include that in the testsuite too.

Consider both pronominal possessors and full NP possessors.

Some languages treat inalienable possession (my hand) differently from alienable possession (my book). If there is a difference documented in your materials, include examples of both.

Note: We are not interested here in verbal constructions (e.g., possessive have), but rather marking of possessors within NPs.

Reference: Johanna Nichols, Balthasar Bickel. 2013. Locus of Marking in Possessive Noun Phrases. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/24, Accessed on 2016-01-15.)

Attributive adjectives

Does the language have attributive adjectives, i.e., adjectives that attach to nominal projections as modifiers? (Note that this section is looking at adjectives as nominal modifiers; for adjectives as predicates, see Non-verbal Predicates below.)

If so, where do the adjectives appear? If adjectives are separate words within the NP, depending on the language they may appear before or after the N, flexibly before or after, or before or after depending on the adjective itself. They may be strictly adjacent to the N or potentially separated by other (non-adjective) elements, they may be affixes to noun stems, or they may be allowed to appear outside the NP elsewhere in the sentence while still functioning as modifiers.

Do the adjectives agree in any features with the nouns they modify? (Typical candidates here are number and noun class (e.g. gender or animacy), but there may be other relevant features.)

Create grammatical and ungrammatical examples illustrating all of these properties.

Adverbs

Adverbs usually constitute a large and diverse class. For the purposes of this course, please focus on manner adverbs like quickly. Where can they appear in the sentence? Can they attach to V, VP, S? Do they attach to the left or to the right? Create grammatical and ungrammatical examples illustrating the placement possibilities.

Non-verbal Predicates

This section concerns copular or copulaless clauses. Can NPs, adjectives, or adpositional phrases function as predicates in your language? If so, do they require a copula in some or all cases? Does the form of the copula vary? Examples from English include:

NP: The winner is a doctor.
Adj: This dog is small.
PP: The cat is in the park.

Note that in some languages, the copula is required only in non-present tense, or only with NP predicates, etc. In other languages, there is not really a class of adjectives distinct from stative verbs. Or there may be two classes of adjective-like predicates, those that pattern with intransitive verbs (or at least appear without the copula) and those that require a copula or are otherwise differentiated.

Construct relevant positive and negative examples illustrating how your language handles non-verbal predicates.

Information Structure

In some languages, information structure (e.g., topic and focus) is marked by a particular construction, like clefts in English for marking focus (It was KIM who left.) or left-dislocation for marking topics (The book, Kim reads it.). In others, a particular sentence position (clause-initial, clause-final, pre-verbal, or post-verbal) is associated with a particular information-structural status. A third type of strategy is morphological marking, where a particular ending or particle is used to mark topic or focus. Many, if not all, languages also use prosody to mark information structure. Since we are working with textual representations, we won't be analyzing prosodic marking. Rather, please try to determine if there is any morphosyntactic marking of information structure. Please determine (and document in your testsuite):

What morphosyntactic forms convey focus/topic/contrast
What semantic/pragmatic contrasts are marked
If your language has cleft constructions, and if so what form they take
What the unmarked (most common) position is for topic and focus (if any)

Some descriptive grammars don't address information structure. For those who are working with field linguists or who can ask native speakers, here are some tests:

Focus: The part of the answer to a WH question that corresponds to the WH word is in focus (Lambrecht 1996, Gundel 1998):
- Q: Who read this book?
- A: KIM read this book.
- Q: What did Kim read?
- A: Kim read THIS BOOK.
Topic: If X is repeated in the answer to "Tell me about X", then X is in topic position (Choi 1999):
- A: Tell me about Kim.
- B: Kim is a student. (cf. Japanese Kim-wa/#ga gakusei desu.)

In addition to topic and focus, some languages have distinctive marking for contrast (or contrastive topic and/or contrastive focus). There are some examples of this from Arabic and Vietnamese in Song and Bender 2011 (pp. 5-6). If your language has distinctive marking for contrast, please include this in your test suite and your write-up.

For more information on information structure, please see Fery & Krifka 2008.

Matrix wh questions

We are interested here in utterances where the whole utterance is a question, and the question concerns the identity of one (or more) of the arguments of the main verb. In the simplest case, there is a single clause and one argument is questioned: Who did the child see? Your tasks are to find out and then document in your testsuite:

The shape of the wh words for core arguments. Do these vary with case, animacy, gender, something else?
The possible positions of wh words: Do they appear where an ordinary argument would? Move to the beginning of the clause? Are both of these possible?
What happens if the questioned argument belongs to a lower clause (e.g. Who did the observer think the child saw?)?
Are there any other differences between wh questions and declaratives (or yes-no questions)? (For example, English requires subject-auxiliary inversion in the main clause of a matrix wh-question.)
Are there any differences between wh questions concerning subject and non-subject arguments? (For example, English does not do subject-auxiliary inversion if the questioned element is the main clause subject.)
Optional: What happens with multiple wh elements in the same clause (e.g. Who saw what?)

Ungrammatical examples will depend on what constraints you find, but may involve wh elements in the wrong position, involve wh elements in the wrong form (e.g. case), or hinge on ancillary properties of the construction (e.g. lack of inversion where it is required).

You do not need to worry about: embedded wh questions or wh questions where the questioned element is an adjunct (how, why, when, where).

Valence Changing Lexical Rules

[Phenomena code: Valence Change/valchg]

Here we are interested in lexical rules that apply to verbs and result in valence alternations. Examples include passive (suppress the erstwhile subject, promote the first complement), applicative (add an additional complement, perhaps interpreted as an instrumental or benefactive argument), and causative (add a subject, demote the erstwhile subject to complement).

For this phenomenon, try to find one valence changing process for your language. It might be morphologically marked (like English passive) or morphologically unmarked (like the English dative alternation).

We are interested in the following, and your testsuite should illustrate each:

Restrictions on what type of verbs can 'undergo' this process.
Effects on the valence of the verb (e.g. a passivized transitive verb should no longer be able to take an NP complement).
Effects on the case frame of the verb (e.g. is the added argument instrumental?)

Evidentials

[Phenomena code: Evidentials/evid]

Some languages have grammaticized means of marking the information source for a statement. For example, a 'hearsay' evidential indicates that the speaker doesn't have direct knowledge of the truth of the proposition, but was told the information by someone else.

Grammaticized evidentials typically take the form of affixes on verbs or of auxiliaries, but other forms are possible.

If your language has evidentials, try to determine what the range of evidential terms is (i.e. the kinds of evidential meanings that are marked). If a sentence has no overt mark of evidentiality, is that interpreted as underspecified, or does it have a specific meaning within the evidential system?

Collect examples of each of the evidential markers. Try to determine whether there are constraints on combinations of evidential markers (this is one source of negative examples for this phenomenon) or constraints on where the evidentials may appear.

Formatting

The test suites should be initially produced as plain text files (in ascii or utf-8). We have a python script that turns the plain text into the required format for [incr tsdb()]. In order to do this, the formatting of the plain text file has to be relatively strict.

Your test suite file should consist of a header, containing information pertinent to all of the examples, followed by a list of examples. The header should contain the following information:

Language: <language name>
Language code: <Ethnologue language code>
Lines: <orth orth-seg gloss translit translit-seg translat>¹
Author: <your name>
Date: April 7, 2006
Source a: <Reference to grammar/web page>
Source b: <Reference to grammar/web page> ²

1. Line names give the type for each tier of the IGT as it appears in your testsuite. Names are from this list: orth, orth-seg, translit, translit-seg, gloss, translat. The make_item script checks that the number of lines is consistent across examples and also checks for proper token counts across all *-seg lines that appear.
2. (as many sources as needed)
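To make the header requirements concrete, here is a minimal sketch of how such a header could be read and sanity-checked. It is not the make_item script; the function names (parse_header, check_lines_field) and the dictionary layout are hypothetical, chosen only for illustration. The checks on the Lines: field reflect rules stated on this page: names come from the fixed list, gloss is required, and at least one *-seg line must be present.

```python
import re

# Line names allowed in the "Lines:" field, as listed in this specification.
VALID_LINE_NAMES = {"orth", "orth-seg", "translit", "translit-seg", "gloss", "translat"}


def parse_header(text):
    """Read 'Key: value' header lines into a dict (hypothetical helper)."""
    header = {"sources": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'Key: value', got {line!r}")
        key, value = key.strip(), value.strip()
        source = re.fullmatch(r"Source ([a-z])", key)
        if source:
            # "Source a: ..." is stored under its single-letter code
            header["sources"][source.group(1)] = value
        elif key == "Lines":
            header["lines"] = value.split()
        else:
            header[key.lower()] = value
    return header


def check_lines_field(names):
    """Return a list of problems found in the Lines: field (empty if none)."""
    problems = [f"unknown line name: {n}" for n in names if n not in VALID_LINE_NAMES]
    if "gloss" not in names:
        problems.append("missing required gloss line")
    if not any(n.endswith("-seg") for n in names):
        problems.append("missing orth-seg or translit-seg line")
    return problems
```

The single-letter source codes collected here are what the Source: field of each example refers back to.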

Each example should consist of the following. (The { } are optional in all lines except Judgment, where they should not be included.) Note that the example below shows a 5 line IGT, so all examples in the testsuite are required to have 5 lines and the header info at the top of the file should have Lines: orth translit orth-seg gloss translat or Lines: orth translit translit-seg gloss translat.

#Ex number and optional comment
Source: {a:page, b:page, author, elicited, attested}
Vetted: {t, f, s}
Judgment: {g, u}
Phenomena: {word order, case, agreement, ...}
<Example in standard orthography> (one of this and transliteration is required; including both is okay)
<Example in transliteration> (one of this and standard orthography is required; including both is okay)
<Example with morpheme boundaries noted and morpheme forms regularized>
<Morpheme-by-morpheme glosses>
<Free translation>

The comment character is '#', and it is good practice to number your examples in a comment line above the Source: line. For ungrammatical examples, this comment should also indicate what is wrong with the example, for your reference and for mine.

The source field indicates where the example came from. If it came from one of your written sources, you can refer to that source with a single letter code. If there is a page number associated with the example, it should follow the letter (with a colon in between). If you made the example up, the source should be author. If you elicited the example from a native speaker, then the source should be elicited. If you found the example in a non-linguistic text, the source should be attested.

The vetted field indicates where the judgment on the example came from. t means the example has been vetted by a native speaker, who gave the judgment indicated. f means it has not. (In this case, the judgment is your best guess based on the grammatical materials you have.) If you are a native speaker, you can vet your own examples. If the example comes from a grammar (which indicates a grammaticality judgment for it explicitly), and you haven't had it vetted in addition, you should put s in this field. This is meant to indicate that we think the example was vetted before being included in the grammar, but since we didn't do it, we're not sure. For attested examples, you should use t if you checked it with a native speaker and f if you have not.

The judgment field indicates the grammaticality judgment assigned to the example (either by a native speaker, in a grammar, or your best guess). g is for grammatical and u is for ungrammatical.
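The allowed values for the Source, Vetted, and Judgment fields can be summarized as a small check. This is only a sketch under stated assumptions: check_metadata and its argument names are hypothetical, and the real script may accept or reject values differently. It assumes each field value has already been stripped of the optional { } and surrounding whitespace.

```python
import re

# Values described in this specification.
SOURCE_KEYWORDS = {"author", "elicited", "attested"}
VETTED_VALUES = {"t", "f", "s"}
JUDGMENT_VALUES = {"g", "u"}


def check_metadata(source, vetted, judgment, source_letters):
    """Return a list of problems with one example's metadata fields.

    source_letters is the set of single-letter codes declared in the
    file header (e.g. {"a", "b"}).
    """
    problems = []
    if source not in SOURCE_KEYWORDS:
        # Otherwise expect a declared letter, optionally followed by ":page".
        match = re.fullmatch(r"([a-z])(?::(.+))?", source)
        if match is None or match.group(1) not in source_letters:
            problems.append(f"bad Source value: {source!r}")
    if vetted not in VETTED_VALUES:
        problems.append(f"bad Vetted value: {vetted!r}")
    if judgment not in JUDGMENT_VALUES:
        problems.append(f"bad Judgment value: {judgment!r}")
    return problems
```

For instance, check_metadata("a:42", "s", "g", {"a", "b"}) reports no problems, while a source letter that is not declared in the header is flagged.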

The phenomena field is a list of phenomena illustrated in the example. We'll have the python script recognize both long and short names for each phenomenon, according to the table below. A single example might illustrate multiple phenomena. However, ungrammatical examples should have only one thing wrong with them, and be tagged only with the phenomenon tag corresponding to that problem.

Long name	Short name
Adjectives	adj
Adverbs	adv
Agreement	agr
Argument Optionality	pro-d
Case-marking adpositions	adp
Case	c
Cognitive status	cogst
Coordination	crd
Coordination Agreement	crdagr
Determiners	det
Direct-Inverse Marking	dirinv
Embedded declaratives	emb-d
Embedded questions	emb-q
Evidentials	evid
Clausal modifiers	cl-mod
Imperatives	imp
Information structure	info
Matrix wh questions	wh
Matrix yes-no questions	q
Modals	m
Negation	neg
Non-Verbal Predicates	cop
Numeral Classifiers	numcl
Possessives	poss
Pronouns	pn
Serial Verb Constructions	svc
Tense Aspect Mood	tam
Valence Change	valchg
Wh questions	wh
Word order	wo
Corpus (for test corpora sentences)	corpus
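As an illustration of how long and short names might be treated as equivalent, the table can be expressed as a lookup. This is a sketch, not the script's actual behavior: normalize_phenomena is a hypothetical name, and matching names case-insensitively is an assumption made here so that values like {case, negation} resolve against the table.

```python
# Long name -> short name, copied from the table above.
PHENOMENA = {
    "Adjectives": "adj", "Adverbs": "adv", "Agreement": "agr",
    "Argument Optionality": "pro-d", "Case-marking adpositions": "adp",
    "Case": "c", "Cognitive status": "cogst", "Coordination": "crd",
    "Coordination Agreement": "crdagr", "Determiners": "det",
    "Direct-Inverse Marking": "dirinv", "Embedded declaratives": "emb-d",
    "Embedded questions": "emb-q", "Evidentials": "evid",
    "Clausal modifiers": "cl-mod", "Imperatives": "imp",
    "Information structure": "info", "Matrix wh questions": "wh",
    "Matrix yes-no questions": "q", "Modals": "m", "Negation": "neg",
    "Non-Verbal Predicates": "cop", "Numeral Classifiers": "numcl",
    "Possessives": "poss", "Pronouns": "pn",
    "Serial Verb Constructions": "svc", "Tense Aspect Mood": "tam",
    "Valence Change": "valchg", "Wh questions": "wh", "Word order": "wo",
    "Corpus": "corpus",
}

# Accept either form; lowercase keys make the lookup case-insensitive
# (an assumption of this sketch, not a documented rule).
_LOOKUP = {}
for _long, _short in PHENOMENA.items():
    _LOOKUP[_long.lower()] = _short
    _LOOKUP[_short.lower()] = _short


def normalize_phenomena(field):
    """Turn a Phenomena: value such as '{case, negation}' into short names."""
    inner = field.strip().strip("{}")
    tags = []
    for name in inner.split(","):
        name = name.strip()
        if not name:
            continue
        key = name.lower()
        if key not in _LOOKUP:
            raise ValueError(f"unknown phenomenon: {name!r}")
        tags.append(_LOOKUP[key])
    return tags
```

Rejecting unknown names outright is a design choice of the sketch; it surfaces misspelled tags early rather than silently dropping them.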

The standard orthography and transliteration lines give a canonical representation of the string. You should have at least one of these, possibly both. Whatever you do, it should be consistent for the whole file.

The example should also be presented with the morpheme boundaries explicit. This is equivalent to saying that you should have at least an orth-seg or translit-seg line in each example (and indicated in the file's Lines: header). This allows the python script to check the alignment of glosses with each morpheme. For languages with particularly complex morphophonology, you might end up using this line as the example you actually parse/generate (to abstract away from the phonological rules). It is for this reason that this line should have phonologically regularized forms. You can indicate which line of your testsuite examples is to be the input to your grammar in [incr tsdb()] by using the --map option of make_item. More on this in the make_item instructions. In any morpheme segmented (*-seg) line, morpheme boundaries should be indicated with hyphens and word boundaries with spaces. If your language has clitics, the boundary between a clitic and its host should be marked with an equals sign.

The line labelled gloss is required. This line contains morpheme-by-morpheme glosses in correspondence with the morpheme segmented (*-seg) lines. The tokens on these lines should be in a one-to-one correspondence with the morphemes, so if the nth word in a *-seg line has two hyphens in it, the nth word in this line has two hyphens as well. Stems should be given English glosses indicating their meaning. For formatting and a good set of grammatical abbreviations to use, please follow the Leipzig glossing rules. (You might also want to refer to the standardized set of grams from ODIN.) When multiple gloss grams are expressed on an unsegmented language form, separate the sub-morpheme glosses with a full stop.
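The one-to-one correspondence between a *-seg line and the gloss line can be checked mechanically. The following is a minimal sketch of that idea, assuming hyphens and equals signs are the only morpheme delimiters; the names morphemes and check_alignment are hypothetical and this is not the make_item implementation. Full stops inside a gloss (as in the.M.SG) are deliberately not treated as boundaries.

```python
import re


def morphemes(word):
    """Split one word at affix (-) and clitic (=) boundaries."""
    return re.split(r"[-=]", word)


def check_alignment(seg_line, gloss_line):
    """Return a list of mismatches between a *-seg line and its gloss line."""
    seg_words = seg_line.split()
    gloss_words = gloss_line.split()
    if len(seg_words) != len(gloss_words):
        return [f"word count differs: {len(seg_words)} vs {len(gloss_words)}"]
    problems = []
    for position, (seg, gloss) in enumerate(zip(seg_words, gloss_words), start=1):
        if len(morphemes(seg)) != len(morphemes(gloss)):
            problems.append(f"word {position}: {seg!r} vs {gloss!r}")
    return problems
```

Applied to the pair "Keeki-wo tabe-nai-ta" and "cake-acc eat-NEG-PRF", the check finds two words on each line with matching morpheme counts and returns an empty list.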

The line labelled translat should have the free translation of the example.

Here is an example sentence that I just made up from Japanese (with only a transliteration and not a standard orthography line for the moment, so the testsuite header would have Lines: translit translit-seg gloss translat):

Source: author
Vetted: f
Judgment: g
Phenomena: {case, negation}
Keeki-wo tabenakatta.
Keeki-wo tabe-nai-ta
cake-acc eat-NEG-PRF
`(Someone) didn't eat cake.'

And a couple from French (here the file header would say Lines: orth orth-seg gloss translat):

Source: author
Vetted: f
Judgment: g
Phenomena: {word order}
J'ai mangé le gâteau
Je-ai mange-é le gâteau
I-have.1sg eat-PRF the.M.SG cake
`I have eaten the cake.'

Source: author
Vetted: f
Judgment: u
Phenomena: {word order}
J'ai le gâteau mangé
Je-ai le gâteau mange-é
I-have.1sg the.M.SG cake eat-PRF
`I have eaten the cake.'

(NB: I'm going with the analysis of so-called French "clitics" as affixes.)

Here's a long example from Japanese to illustrate. This is good:

Source: a
Vetted: s
Judgment: g
Phenomena: {coordination, serial verb}
Ima made takusan honwo yonde kimashita ga, kore kara mo yonde iku tsumori desu.
Ima made takusan hon-wo yom-te ki-mas-ta ga kore kara mo yom-te ik-u tsumori desu.
now until many book-ACC read-PTCP come-HON-PRF CONJ here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'

Do NOT do it this way. This example has way too many lines. The author has presumably added spurious line breaks to the long example lines, creating a 7 line IGT where we only expect 4. (I'm only including the IGT lines here to keep folks from looking at this quickly and copying.)

[BAD BAD BAD]
Ima made takusan hon-wo yonde kimashita ga, kore kara mo yonde iku tsumori desu.
Ima made takusan hon-wo yom-te ki-mas-ta ga kore kara mo yom-te ik-u
tsumori desu.
now until many book-ACC read-PTCP come-HON-PRF CONJ here from also read-PTCP
go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on,
too.'

Do NOT do it this way either (same basic problem, breaking up the lines within an example):

[BAD BAD BAD]
Ima made takusan hon-wo yonde kimashita ga,
Ima made takusan hon-wo yom-te ki-mas-ta ga
now until many book-ACC read-PTCP come-HON-PRF CONJ
kore kara mo yonde iku tsumori desu.
kore kara mo yom-te ik-u tsumori desu.
here from also read-PTCP go-IPFV intention COP.HON.IPFV
`Up to now I have read quite a few books and I intend to read from now on, too.'

Finally, note that while you can add as many blank lines as you like between IGT examples, blank lines should not appear within an example.
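The rule about blank lines and the fixed number of IGT lines per example can be checked with a few lines of code. This is a rough sketch, not the make_item script: it assumes the header has already been removed from the text, that metadata lines begin with the field names shown on this page, and that lines starting with '#' are comments. The names split_examples, igt_lines, and check_line_counts are hypothetical.

```python
METADATA_PREFIXES = ("Source:", "Vetted:", "Judgment:", "Phenomena:")


def split_examples(body):
    """Group consecutive non-blank lines into examples."""
    examples, current = [], []
    for raw in body.splitlines():
        line = raw.strip()
        if line:
            current.append(line)
        elif current:
            # One or more blank lines close the current example.
            examples.append(current)
            current = []
    if current:
        examples.append(current)
    return examples


def igt_lines(example):
    """Keep only the IGT tiers, dropping comments and metadata lines."""
    return [
        line for line in example
        if not line.startswith("#") and not line.startswith(METADATA_PREFIXES)
    ]


def check_line_counts(body, expected):
    """Return (example number, lines found) for each example of the wrong size."""
    mismatches = []
    for number, example in enumerate(split_examples(body), start=1):
        found = len(igt_lines(example))
        if found != expected:
            mismatches.append((number, found))
    return mismatches
```

Here expected would be the number of names in the header's Lines: field, so an example whose long lines were wrapped shows up with too many IGT lines.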

ebender at u dot washington dot edu

Last modified: 1/6/15
