Excellent intro on CL in relation to Chomskyan rev. Good on POS tagging and parsing, and explanation of ENGCG. No interest in literary topics that I can see.
Comprehensive and detailed about major corpora, their makeup and use; many lists and tables, many studies cited. Also covers POS tagging and parsing; some on tools. Impressed with ENGCG.
Although tied to a specific corpus query tool and corpus, this is a wonderfully detailed and clear walk-through of how to raise and answer questions with a corpus. (Copy also available from me.)
Argues that collocation is a psychological relation between words
Fine little introduction; works with corpus of business English
Excellent, lucid, non-technical introduction to the key ideas. Good examples. Often cited in Stubbs, which is in many ways an update and expansion.
Useful practical guidance; pieces by Sinclair, Leech, Burnard, and others
Papers from WebAsCorpus (WAC3) and CLEANEVAL on removing boilerplate from Web HTML pages. Pieces by Wm. Fletcher and Serge Sharoff.
All you would ever want to know about ICECUP and the grammatical markup.
More programmatic, less technical than Nelson et al. 19 articles.
An excellent guide to this very rich interface to BNC (which is available on line), including its special 'Simple' query language.
Fine explanation of why he built his indexer. Applies Gerard Steen's notion of genre as a basic level term. Not just about BNC.
Uses "PP", a field and genre-specific corpus of business writing (proposals). Chapter runs gamut of lexical issues using WordSmith, and esp. likes Keywords. A model of analysis.
on lose/find, amplifiers/downtoners in BNC
Discusses 3 ways to identify genres of web documents: external description, text language traits, and a third, best way, which involves hand culling of the texts. Concern is with top-level categories. Google's subject categories not based on coherent linguistic features; cluster analysis of features pops out four 'text-types':
Need for a corpus of on-line forums. Procedures for setting one up. Issues of locality and identity. Pilot study on interaction and stance markers and differences in dif. localities.
Much about his design and plans for WebAsCorpus
about using web as corpus; types of search options, explanation of KWiCFinder program
Using on-line regional newspapers to monitor ongoing changes and dynamics of standardizing.
Reminds us of what is not available in a ready-made web corpus.
This is a somewhat later version of The WebCorp Search Engine: a holistic approach to Web text Search (2005), but has some other examples of use.
offers 4 universal, functionally motivated tendencies affecting choices among varying alternatives. (See also collections edited by Rohdenburg and Schlüter and by Rohdenburg and Mondorf.)
NP-Poss N vs. N N (driver's license/driver license); hard to detect animacy on web
Better at discussing the pitfalls/complexities of Web as corpus than at offering practical solutions. Compared university websites in OZ, NZ, and UK and found some differences between them (as a group) and BNC in top 50 most frequent words. Parturient montes...
The URL for RDUES is now at Birmingham City University: rdues.bce.ac.uk
Interesting positions on many points besides the get-passive.
long list of points of change; different from/than/to and progressive; natl. difs. by country extension
Old and non-corpus, but quite linguistic. Several on modality (Paul Simpson, Joanna Channell, Christopher Butler)
uses parallel translated Swedish/English texts to tease out three functions of prag markers: a 'reality' use, a hedging use, and a conclusion from a number of alternatives.
Based on his own T2K-SWAL Corpus: chapters on vocabulary, grammatical variation, expression of stance, lex bundles, and multidimensional analysis
About "boosters" as used in two "soft" disciplines: history and economics. Corpora of all articles in 10 journals in each discipline for year 1999-2000. Mostly initial adverbials, esp. significantly, invariably, undoubtedly. Shows that the use of emphatics in history is much more varied and graded than in economics (52).
Extracts keywords for time setting in 2.5 M word history corpus (reference list: a similar-sized one in econ and business). Key words and phrases that emerge are frequent in history (COCA), sometimes most frequent, but never more than 2x more. Except of course those including 19th and 20th century, middle ages, history.
Comparison of most frequent 4-grams in freshman essays and in acad/convers subcorpora of Longman's corpus (Biber et al.). Shows clearly Freshmen not influenced by conversational models, but by desire to replicate academic formulae in their readings. Suggests looking for variation across disciplines and academic levels.
Review of literature
Compares introductory chapters in 10 intro textbooks on point of overlapping of evaluation and argumentation. Style emerges as a parameter.
Traces 3 key keywords in lit. review chapters of diss.s in applied lings. Finds difs. in frequency of research, study, and studies by Moves. Finds study almost always self-referential (this, the present, the current).
Compares Pakistani and British essays by economics students (Br. from BAWE) in use of conjunctive ties; finds and and but used initially more by the Pakistanis and in general more conjunctive ties used by them. Illustrates use of Rayson-Garside LL calculator to measure similarity of corpora.
Sharp rise in Acknowledgment sections of books in last 50 years creates a place where subjectivity can be indulged, esp. via hyperbole, irony, and emotivity. More of this across the board in the 'soft' sciences (lings, econ, soc). Size of networks to be acknowledged is greater in hards.
variation across disciplines
Contrasts research articles in 4 'hard' disciplines (App Lings, EE, ME, Physics) with a matching set in 4 'soft' disciplines (Soc, Marketing, Philosophy, Biology) along three parameters of contrast: citation practices, Reader-oriented features, and self-mention/acad promotion.
Very introductory—not really corpus lings. Sketches research, instructional, student, and popular discourses.
use of lexical verbs expressing epistemic modality in writing of NS/NNS
Compares uses of 'sentence-builder' "it is/has been (often) asserted/believed/noted that X" in 3 subcorpora of BNC (socsci, medicine, tech/engineering) and three major functions (topic priming, support cited, straw man, self-reference).
NNS learners' difficulties are more phraseological than word based.
Japanese scientists use past tense rather than present perfect in introduction, making their 'we' less authoritative.
Found high density of 4+ ngrams in 500K word corpus of articles on Candida albicans. Strongest in 'external' reference to materials, procedures, and the literature. Lots of 'reporting verb' ngrams ([have] been identified as).
First person pronouns in French philosophical vs. linguistic writings
Focus on a corpus of book reviews in lings, on ways of expressing negative valuation. Considers gender. Keys on words like surprising and disappointing.
Compares German and Eng Lang authors' self-reference (and some 2nd person); finds 'I-taboo' much more robust for Germans (pref. for we) and German Philosophers very heavy users (77%) of we-humans. Some 3rdp ref. Status and gender also imp. parameters.
4 subcorpora: Unified Physics, Molecular Biology, Economics, and Business Management. Comparative keyword lists of 'doing science' verbs. logic of hard/soft disciplines.
averral vs. attribution in theses in Agricultural Botany, Agricultural Economics, and Psychology.
Analyzes distribution of modals and modality markers over four genres of professional American medical writing (Research articles, Editorial articles, Textbook samples, and Handbook samples) as well as two groups of popular texts (Newspaper/magazine articles and Guidebook samples). Finds main contrast in med writing between research and the clinic (advisory). Chapter on argumentation is good.
Finds key words and phrases that characterize journalism and popular romance, using BNC and Lexis Nexis.
Based on a 258,348 word corpus of late 20th century British narrative prose.
A model paper for tackling a big issue (The Fish fork) and arguing for the whole application of machines in the humanities