#corpusmooc and spatial humanities

Update, 2016: not so supplementary now; see Spatial Humanities 2016 (programme: short & full | @spatialhums & #SH_2016), lots of delights.

Amongst the supplementary materials in weeks 6 and 8 of #corpusmooc was Ian Gregory on the potential for using GIS in corpus linguistics, aka spatial humanities.

First up, Mapping the Lakes (and version 2):


Place names were coded in XML and converted to a GIS, allowing mentions to be compared. Other features mapped included emotional response (on a scale of 1-10) and physical characteristics, i.e. altitude. Photos from Flickr were also incorporated. The end result permitted close reading of the text alongside a map of the area described. Next up is a corpus of Lake District writing for the period up to 1900: over a million words from 80 texts.
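To make the "coded in XML, then compared" step concrete, here is a minimal sketch of counting tagged place names per text. The `<place>` tag, its `name` attribute, the text ids and the sample sentences are all made up for illustration; the markup the project actually used may well look different.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: tag names, attributes and sentences are assumptions,
# not the project's real encoding.
SAMPLE = """
<texts>
  <text id="tour_a">
    Left <place name="Keswick">Keswick</place> and walked towards
    <place name="Borrowdale">Borrowdale</place>, then back to
    <place name="Keswick">Keswick</place>.
  </text>
  <text id="tour_b">
    From <place name="Keswick">Keswick</place> over
    <place name="Scafell">Sca Fell</place>.
  </text>
</texts>
"""

def count_mentions(xml_string):
    """Return {text id: Counter of normalised place names} for each <text>."""
    root = ET.fromstring(xml_string.strip())
    counts = {}
    for text in root.iter("text"):
        names = (place.get("name") for place in text.iter("place"))
        counts[text.get("id")] = Counter(names)
    return counts

mentions = count_mentions(SAMPLE)

# Places mentioned in both texts: the kind of comparison a map layer would show.
shared = set(mentions["tour_a"]) & set(mentions["tour_b"])
print(mentions)
print(shared)
```

Using a normalised `name` attribute rather than the element text means variant spellings (here "Sca Fell") can still be counted as one place; joining the counts to coordinates would be the next step towards a GIS layer.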

Next, geographical text analysis:


Claire Grover (of Trading Consequences) has worked on georeferencing, i.e. identifying all the place names automatically (?), pulling them out of the text and linking them up to a gazetteer to give them a point location on a map. Applied to the Registrar General’s Reports, 1851-1911 (2 million words; Histpop), this gave 17,667 instances of places mentioned. Recall: 81%, precision: 82%, and correct with locality: 75%. Mapping the instances and smoothing gave a pretty good reflection of major population centres in England, albeit with Bedford as an outlier cluster:


Analysing ‘London’ found high z-scores relating to water supply/quality, whereas the Liverpool/Manchester cluster was more descriptive of diseases, with no discourse on water supply. Exploring causes of death and mapping collocations with place names led to the following conclusions:


Geographical text analysis can help to understand the geographies within a corpus. At the moment we only have recall and precision statistics of about 80%, but this will get better, and even if it doesn’t you still have most of the place names within a text. Bringing together statistical summaries from corpus linguistics and micro/close readings helps understand what’s going on within a text, to aid in decisions on which parts you perhaps need to close read and which parts you can ignore.
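For anyone wondering what the recall and precision figures actually measure, here is a minimal sketch. It uses plain exact-match lookup against a tiny hand-made gazetteer, which is far cruder than the georeferencing described above; the gazetteer, its approximate coordinates, the sample sentence and the hand-labelled "gold" answers are all assumptions for illustration.

```python
import re

# Hypothetical mini-gazetteer: name -> approximate (lat, lon).
GAZETTEER = {
    "London": (51.51, -0.13),
    "Liverpool": (53.41, -2.98),
    "Manchester": (53.48, -2.24),
    "Bedford": (52.14, -0.47),
}

def georeference(text, gazetteer):
    """Return (name, character offset, (lat, lon)) for each gazetteer name found."""
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), m.start(), gazetteer[m.group(1)])
            for m in pattern.finditer(text)]

def precision_recall(found, gold):
    """Precision = correct / found; recall = correct / should have been found."""
    found, gold = set(found), set(gold)
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Made-up sentence: "Salford" is a place missing from the gazetteer,
# and "Bedford" here is a surname, not the town.
text = ("Water supply in London was poor. Deaths rose in Liverpool "
        "and in Salford. Mr Bedford reported the figures.")

hits = georeference(text, GAZETTEER)
found = [(name, offset) for name, offset, _ in hits]
gold = [(name, text.index(name)) for name in ("London", "Liverpool", "Salford")]

precision, recall = precision_recall(found, gold)
print(hits)
print(round(precision, 2), round(recall, 2))
```

The surname "Bedford" is a false positive (it lowers precision) and the unlisted "Salford" is a miss (it lowers recall), which is roughly the kind of error that sits behind figures of around 80%.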

More on georeferencing place names, from Putting big data in its ‘place’…the power and value of amalgamating and querying content by ‘place’ has long been recognised through the use of place name gazetteers; however, these have limitations, as they tend to record only modern place names and lack spatial resolution. A number of initiatives aimed at extending the scope of modern gazetteers include:

Some spatial hums linkage:

See also my post on Telling stories with maps: literary geographies.


3 thoughts on “#corpusmooc and spatial humanities”

  1. Hi there. Thank you for your mention of Claire Grover’s work and Trading Consequences. I thought you might be interested to know that the Trading Consequences searches and visualisations are now live. These tools are all based on text mined data which Claire and her colleagues Bea Alex and Ewan Klein have been working on and we have also made some of the lexical resources created during this project available from GitHub.

    For more information on all of the above take a look at our launch post: http://tradingconsequences.blogs.edina.ac.uk/2014/03/21/official-launch-of-trading-consequences/

  2. Thanks! Let us know what you think – we really value any questions or comments you (or your readers) may have 🙂

    And yes, I feel very lucky to get to work on some very fun stuff here with my lovely EDINA contacts and, in the case of Trading Consequences, brilliant colleagues from several other Scottish and Canadian universities!
