Notes on WordNet domains

Since version 2.0, WordNet has been adding domain specifiers to various synsets (indicated in Bert Bos's treebolic by 4-by blue Grids) to show the domain in which a particular synset is expected and understood. Such domain labels are common in most dictionaries. The tags come in pairs, linking a synset that names a domain with a synset that is limited to that domain. WordNet tags three different kinds of domains:

  1. domain categories: A synset whose sense is active only in a certain domain of discourse is tagged with a domain category synset (treebolic: "classification"), as when the synset {bruise (damage etc.)} is tagged with {plant, flora, plant life} (and also, BTW, a different sense {bruise (break into small pieces in food preparation)} is tagged with {cooking, cookery, preparation}). Conversely, synsets that name a domain category are tagged with a domain category term (treebolic: "class"), and clicking that link gives a list of all synsets that are limited by that term. So {cooking, cookery, preparation} gives 5 noun synsets and a bazillion verb synsets (coddle, toast, salt, shirr, microwave etc.). Synsets tagged with the {plant, flora, plant life} category domain are equally numerous. Another rich category domain is {baseball, baseball game}.

    For a second example, the second noun sense of cell, {cell} ((biology) the basic structural and functional unit of all organisms; they may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals), is flagged with a domain category, {biology, biological science} (the science that studies living organisms), which gives the limiting domain where the sense is current, and with a domain category term, which indicates that it is itself a domain category for the term totipotent. Totipotent itself is flagged with {cell (biology) the basic structural etc.}.

  2. A synset can be tagged for more than one domain category: {place kick, place-kicking} is tagged for both {soccer, association football} and {football, football game} and similarly for {net (a goal lined with netting (as in soccer or hockey))} where one domain category is {football, football game} and the other is {field hockey, hockey (a game resembling ice hockey that is played on an open field; two opposing teams use curved sticks try to drive a ball into the opponents' net)}. {stick} is tagged for three categories, {winger} for four. {Nibelung} is tagged for {Teuton} and also for {mythology}.

  3. A word may occur in two synsets, with one synset a domain category for the other—in effect, a broader sense and a narrower, more contextually restricted sense:
    {(v) dance, trip the light fantastic, trip the light fantastic toe} (move in a pattern; usually to musical accompaniment; do or perform a dance) "My husband and I like to dance at home to the radio".
    in the context of
    {(n) dancing, dance, terpsichore, saltation} (taking a series of rhythmical steps (and movements) in time to music)

    The same sort of nexus shows up with the word music, which occurs in a "broader" sense synset (n) {music} (an artistic form of auditory communication incorporating instrumental or vocal tones in a structured and continuous manner) and in a narrower synset (n) {music} (musical activity (singing or whistling etc.)) "his music was his central interest", with the broader sense as a domain category for it. In fact, music occurs in another narrower synset (n) {music} ((music) the sounds produced by singers or musical instruments (or reproductions of such sounds)), again with the broader sense as domain category.

    There is yet another music synset that is as it were the least specific:
    (n) {music, euphony} (any agreeable (pleasing and harmonious) sounds) "he fell asleep to the music of the wind chimes". Here an analysis in terms of facets seems very attractive.

  4. domain regions: An example of a region identifier is {United Kingdom, UK, U.K., Great Britain, GB, Britain, United Kingdom of Great Britain and Northern Ireland}. This region domain term is used to tag synsets limited to the British Isles ("Britishisms"). There are many, many region domain terms.

  5. domain usages: There are 29 domain usage terms. These include such synset values as:

    This last domain usage term is only used 15 times, and of these only one synset is glossed as metaphor and one as figurative:

    This looks very much like a project that got started and then froze.
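The paired tagging described for domain categories above can be sketched with a small lookup table. This is only an illustration, assuming synsets are represented as plain strings and using a handful of the examples from these notes; the names DOMAIN_PAIRS and build_indexes are made up here and are not part of WordNet or treebolic.

```python
from collections import defaultdict

# Toy (limited synset, domain category synset) pairs taken from the examples
# in these notes. Plain strings stand in for synsets; this is an assumption
# for illustration, not WordNet's own data format.
DOMAIN_PAIRS = [
    ("bruise (damage etc.)", "plant, flora, plant life"),
    ("bruise (break into small pieces in food preparation)",
     "cooking, cookery, preparation"),
    ("coddle", "cooking, cookery, preparation"),
    ("place kick, place-kicking", "soccer, association football"),
    ("place kick, place-kicking", "football, football game"),
    ("totipotent", "cell (biology)"),
    ("cell (biology)", "biology, biological science"),
]


def build_indexes(pairs):
    """Return two lookups: synset -> its categories, category -> its terms."""
    categories_of = defaultdict(list)
    members_of = defaultdict(list)
    for member, category in pairs:
        categories_of[member].append(category)   # the "classification" direction
        members_of[category].append(member)      # the "class" direction
    return dict(categories_of), dict(members_of)


categories_of, members_of = build_indexes(DOMAIN_PAIRS)

# Synsets tagged for more than one domain category.
multi_tagged = sorted(m for m, cats in categories_of.items() if len(cats) > 1)

# Synsets that name a domain and are also limited to some other domain.
both_roles = sorted(set(categories_of) & set(members_of))

print(members_of["cooking, cookery, preparation"])
print(multi_tagged)
print(both_roles)
```

Keeping both directions in separate tables mirrors the two links treebolic shows: one from a limited synset to its category, one from a category to everything it limits.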

  1. "WordNet Domains" (WND) is a project that produced (manually) a version of WordNet 1.6 with a Content Domain tag for each synset. The list of domains and subdomains is based on the Dewey Decimal Categories. The relation is spelled out in Bentivogli et al. The revised list there is close to that given online (Doctrine has been replaced in the article with Humanities). WND's 5 top categories (humanities/doctrines, free_time, applied_science, pure_science, social_science) align fairly well with 5 of the 9 Domains of the BNC. The four remaining (Written Arts, Commerce, Imaginative, and World Affairs) seem to have been tucked under various of the 5 heads in WND. The WND project is interested in sense disambiguation and secondarily in sense reduction, and they point out that their scheme allows the ten senses of bank to be consolidated into seven. The WND tree can be carried to four levels of specificity; the second has about 167 categories and is considered Basic Level by them. It is the one used to tag every synset.

  2. Although the WND work talks about taxonomy and hierarchy, WN 2 allows the same synset to be a term in two different domain categories: thus {film, cinema, celluloid} is tagged for both the {art, artistic creation, artistic production} and the {commercial enterprise, business enterprise, business} domain categories (thus capturing the art v. Mammon aspect of the movies!). WN is not interested in connecting all subtrees into a single graph. It does not unify {music} and {music, euphony} under a single head, nor {religion, faith, religious belief} and {religion, faith, organization}, again reflecting a deep tension in religion.

  3. For WN, sub-synsets of broader synsets can be indicated by tagging them with a domain category that is a subcategory of a broader one: so {medicine, medication, medicament, medicinal drug} has {medicine, medical speciality} as its domain category.

  4. A curious point: {literature}, which has several subsynsets, gives a list of domain category terms that include literary terms but also tempest, steed, and rosebud.

  5. ICE text categories number 11:

    A2.16 W1A-001 TO W1A-010: UNTIMED STUDENT ESSAYS


    A2.18 W1B-001 TO W1B-015: SOCIAL LETTERS

    A2.19 W1B-016 TO W1B-030: BUSINESS LETTERS

    A2.20 W2A-001 TO W2A-040: ACADEMIC WRITING

    A2.21 W2B-001 TO W2B-040: POPULAR WRITING

    A2.22 W2C-001 TO W2C-020: NEWSPAPER REPORTS


    A2.24 W2D-011 TO W2D-020: SKILLS AND HOBBIES

    A2.25 W2E-001 TO W2E-010: PRESS EDITORIALS

    A2.26 W2F-001 TO W2F-020: FICTION

  6. In the wordnetdomains file provided for WN 1.6, there are 115424 synsets, of which about 41000 are marked "factotum" (i.e. too widely distributed to classify).
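Two of the points above, the factotum share and the idea of sense reduction by domain, can be sketched in a few lines. This is a rough sketch under stated assumptions: it assumes each line of the domains file holds a synset identifier, a tab, and one or more space-separated domain labels; the sample identifiers and the labels attached to the "bank" senses are invented placeholders, not real WND data; and the helper names domain_counts and consolidate are hypothetical.

```python
from collections import Counter, defaultdict


def domain_counts(lines):
    """Count synsets per domain label.

    Assumed line layout (not verified against the real file):
    "<synset id>\t<label> <label> ..."
    """
    counts = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _synset_id, _, labels = line.partition("\t")
        total += 1
        counts.update(set(labels.split()))
    return total, counts


def consolidate(sense_labels):
    """Group senses that carry exactly the same set of domain labels."""
    groups = defaultdict(list)
    for sense, labels in sense_labels.items():
        groups[frozenset(labels)].append(sense)
    return list(groups.values())


# Invented placeholder lines; the identifiers are not real WordNet offsets.
SAMPLE = [
    "00000001-n\tfactotum",
    "00000002-n\tbiology",
    "00000003-v\tgastronomy",
    "00000004-n\tfactotum",
    "00000005-n\tsport play",
]

total, counts = domain_counts(SAMPLE)
sample_share = counts["factotum"] / total

# The figures quoted in the notes: about 41000 of 115424 synsets.
reported_share = 41000 / 115424

# Hypothetical labels for four senses of one word, to show the grouping only.
HYPOTHETICAL_SENSES = {
    "bank#1": ["economy"],
    "bank#2": ["geography"],
    "bank#3": ["economy"],
    "bank#4": ["factotum"],
}
groups = consolidate(HYPOTHETICAL_SENSES)

print(sample_share, round(reported_share, 3), groups)
```

With the numbers quoted in the notes, the factotum share comes out at a bit over a third of all synsets, which suggests how much of the lexicon the domain labels leave undifferentiated.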

Luisa Bentivogli, Pamela Forner, Bernardo Magnini, Emanuele Pianta. "Revising the WORDNET DOMAINS Hierarchy: semantics, coverage and balancing." In Proceedings of COLING 2004 Workshop on "Multilingual Linguistic Resources", Geneva, Switzerland, August 28, 2004, pp. 101-108.