NB: The current total cluster counts of quotes and non-quotes do not add up to the total cluster counts in the whole text. This is a known issue which we are working on.

This documentation is a work in progress. We are currently improving the speed of the application (mostly the concordance). All suggestions are welcome.


The keywords page allows you to compare two corpora and two subsets with each other in order to find the clusters that occur statistically significantly more frequently in one than in the other.

The keyword extraction formula is taken from:

Rayson, P. and Garside, R. (2000). Comparing corpora using frequency profiling. In Proceedings of the Workshop on Comparing Corpora, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000). 1-8 October 2000, Hong Kong, pp. 1-6.
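As a rough illustration, the log-likelihood statistic described in that paper can be sketched in Python as follows. This is not the application's own code: the function name and parameter names are hypothetical, and treating a zero frequency as contributing nothing is an assumption made here for convenience.

```python
import math


def log_likelihood(freq_a, freq_b, total_a, total_b):
    """Log-likelihood score for one cluster across two corpora (or subsets).

    freq_a, freq_b   -- observed frequency of the cluster in each corpus
    total_a, total_b -- total count in each corpus (hypothetical names)
    """
    combined = freq_a + freq_b
    # Expected frequencies if the cluster were spread evenly,
    # in proportion to the size of each corpus.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)

    score = 0.0
    # Assumption: a zero observed frequency contributes nothing to the sum.
    if freq_a > 0:
        score += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        score += freq_b * math.log(freq_b / expected_b)
    return 2 * score
```

A higher score means a larger difference in relative frequency between the two corpora. A score above 3.84 is conventionally read as significant at p < 0.05; whether the application applies this particular cut-off is not stated here.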

The total count used in this formula is the total cluster count and not the total token count. For instance, the sample sentence "This is a simple sentence." counts as a single cluster when counting 5-grams, i.e. there is one cluster of 5 tokens ['this is a simple sentence']. This differs from the individual token count, which is 5 ['this', 'is', 'a', 'simple', 'sentence'].
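The difference between the two counts can be sketched in Python. The tokenisation below (lowercasing and keeping runs of letters and apostrophes) is an assumption for illustration only, and the function name `clusters` is hypothetical; the application's own tokeniser may behave differently.

```python
import re


def clusters(text, n):
    """Return the n-token clusters (n-grams) found in a text."""
    # Simplified tokenisation, assumed for illustration only.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "This is a simple sentence."
five_grams = clusters(sentence, 5)  # clusters of 5 tokens
one_grams = clusters(sentence, 1)   # individual tokens
```

Here `len(five_grams)` is the total cluster count for 5-grams and `len(one_grams)` is the token count. It is the former kind of total that goes into the keyword formula.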


DNov - corpus contents

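Abbreviation  Book title                  Author
BH            Bleak House                 Charles Dickens
BR            Barnaby Rudge               Charles Dickens
DC            David Copperfield           Charles Dickens
DS            Dombey and Son              Charles Dickens
ED            The Mystery of Edwin Drood  Charles Dickens
GE            Great Expectations          Charles Dickens
HT            Hard Times                  Charles Dickens
LD            Little Dorrit               Charles Dickens
MC            Martin Chuzzlewit           Charles Dickens
NN            Nicholas Nickleby           Charles Dickens
OCS           The Old Curiosity Shop      Charles Dickens
OMF           Our Mutual Friend           Charles Dickens
OT            Oliver Twist                Charles Dickens
PP            Pickwick Papers             Charles Dickens
TTC           A Tale of Two Cities        Charles Dickens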

19C - corpus contents

Abbreviation  Book title                                 Author
Agnes         Agnes Grey                                 Anne Brontë
Alli          The Small House at Allington               Anthony Trollope
Anto          Antonina or, the Fall of Rome              Wilkie Collins
Arma          Armadale                                   Wilkie Collins
Audley        Lady Audley’s Secret                       Mary Elizabeth Braddon
Basker        The Hound of the Baskervilles              Sir Arthur Conan Doyle
Cran          Cranford                                   Elizabeth Gaskell
Deronda       Daniel Deronda                             George Eliot
Dorian        The Picture of Dorian Gray                 Oscar Wilde
Dracula       Dracula                                    Bram Stoker
Emma          Emma                                       Jane Austen
Frank         Frankenstein                               Mary Shelley
Jane          Jane Eyre                                  Charlotte Brontë
Jekyll        The Strange Case of Dr Jekyll and Mr Hyde  Robert Louis Stevenson
Jude          Jude the Obscure                           Thomas Hardy
Mary          Mary Barton                                Elizabeth Gaskell
Mill          The Mill on the Floss                      George Eliot
Native        The Return of the Native                   Thomas Hardy
North         North and South                            Elizabeth Gaskell
Persu         Persuasion                                 Jane Austen
Pomp          The Last Days of Pompeii                   Edward George Bulwer-Lytton
Pride         Pride and Prejudice                        Jane Austen
Prof          The Professor                              Charlotte Brontë
Sybil         Sybil, or the Two Nations                  Benjamin Disraeli
Tess          Tess of the D’Urbervilles                  Thomas Hardy
Vanity        Vanity Fair                                William Makepeace Thackeray
Vivian        Vivian Grey                                Benjamin Disraeli
Woman         The Woman in White                         Wilkie Collins
Wuth          Wuthering Heights                          Emily Brontë

We are very keen to hear about:

  • What works
  • What doesn't work
  • What you like
  • What you don't like

Please email