14 Novels by Charles Dickens, tagged for POS and for NP and PP chunks.
IMS Corpus Workbench -- Demonstration Corpus
Size: 3.4 million tokens
This corpus is a collection of novels by Charles Dickens:
- A Christmas Carol
- David Copperfield
- Dombey and Son
- Great Expectations
- Hard Times
- Master Humphrey's Clock
- Nicholas Nickleby
- Oliver Twist
- Our Mutual Friend
- Sketches by BOZ
- A Tale of Two Cities
- The Old Curiosity Shop
- The Pickwick Papers
- Three Ghost Stories
The text is derived from several Etext editions of Project Gutenberg.
It was tokenised, part-of-speech tagged and lemmatised with Helmut
Schmid's TreeTagger, and chunk-parsed with the Gramotron PCFG for
English developed at the IMS.
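
As a rough illustration of what such annotation looks like to a program, the sketch below reads a tiny sample in a one-token-per-line layout (word, part-of-speech tag and lemma separated by tabs, with bracketed lines marking chunk boundaries). The sample text, the tag names, the chunk label `np` and the function `parse_vertical` are assumptions made up for this example; they are not taken from the corpus files themselves.

```python
from collections import Counter

# Hypothetical sample: one token per line as word<TAB>POS<TAB>lemma.
# Lines in angle brackets stand in for chunk boundaries (assumed layout).
SAMPLE = (
    "<np>\n"
    "The\tDT\tthe\n"
    "old\tJJ\told\n"
    "clock\tNN\tclock\n"
    "</np>\n"
    "struck\tVBD\tstrike\n"
    "<np>\n"
    "twelve\tCD\ttwelve\n"
    "</np>\n"
)


def parse_vertical(text):
    """Return a list of (word, pos, lemma, chunk) tuples.

    chunk is the name of the enclosing chunk, or None outside any chunk.
    Nested chunks are not handled in this sketch.
    """
    tokens = []
    chunk = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("</"):
            chunk = None                 # closing tag: leave the chunk
        elif line.startswith("<"):
            chunk = line.strip("<>")     # opening tag: remember its name
        else:
            word, pos, lemma = line.split("\t")
            tokens.append((word, pos, lemma, chunk))
    return tokens


tokens = parse_vertical(SAMPLE)
pos_counts = Counter(pos for _, pos, _, _ in tokens)
np_lemmas = [lemma for _, _, lemma, chunk in tokens if chunk == "np"]
print(len(tokens), dict(pos_counts), np_lemmas)
```

Keeping the chunk label on each token makes it easy to count tags or collect lemmas inside noun phrases with a single pass; a real reader for the corpus would need to follow whatever attribute order and tag names the actual files use.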
The corpus includes two works (Master Humphrey's Clock and Three Ghost Stories) that are not covered in Masahiro Hori, Investigating Dickens' Style: A Collocational Analysis.