#edDDI: Digital Day of Ideas 2015

2016 update: #DigScholEd was liveblogged by Nicola Osborne. Keynotes came from literary historian Ted Underwood on Predicting the past, a distant reading type approach to digital libraries; Lorna Hughes on Content, co-curation and innovation: digital humanities and cultural heritage collaboration; and Karen Gregory on Conceptualizing digital sociology.

Bumped/rewritten post – see below for brief mentions of #edDDI in 2014 and 2013 and other #digitalhss doings.

From the #digitalhss stable came Digital Day of Ideas 2015 (#EdDDI | TAGSExplorer – see graph) on 26 May, livetweeted, blogged and Storified by Lorna Campbell (@LornaMCampbell), with recordings of the talks to come.

Speakers and outputs:

Other #edDDIs:

#digitalhss in four keys: medicine, law, bibliography and crime, workshop on 12 November 2013, liveblogged by Nicola Osborne:

  • Digital articulations in medicine (Alison Crockford) – ah, the Surgeons’ Hall… the project seeks to illuminate the relationship between literature and medicine in Edinburgh through the development of a digital reader, joining together not only the literary and medical spheres but also the rapidly expanding fields of the digital and medical humanities; interesting points on the nature of digihum and public engagement issues, see Dissecting Edinburgh for more
  • Rethinking property: copyright law and digital humanities research (Zhu Chen Wei) – the entrenched idea of copyright as an exclusive property regime is ill-suited to understanding digihum research activities; how might copyright law respond to the challenges posed by digital humanities research, in particular the legality of mass digitisation of scholarly materials and the possible copyright exemption for text and data mining?
  • Building and rebuilding a digital catalogue for modern Chinese Buddhism (Gregory Scott) – the Digital Catalogue of Chinese Buddhism is a collection of data on over 2,300 published items with a web-based online interface for searching and filtering its content; can the methods and implications of working with a large number of itemised records, bibliographic or otherwise, be applied to other projects? Channelling Borges’ Library of Babel
  • Digitally mapping crime in Edinburgh, 1900-1939 (Louise Settle) – specifically an historical geography of prostitution in Edinburgh; used Edinburgh Map Builder, developed as part of the Visualising Urban Geographies project, which allows you to combine National Library of Scotland maps, Google Maps and your own data; viz helps you spot trends and patterns you may not have noticed before; for locations elsewhere in the UK, Digimap includes both contemporary and historical maps; Historypin uses historical photography to create maps (EH4, plus come in #kierkegaard); see also the Edinburgh Atlas

See also the workshop on data mining on 19 November 2013.

LitLong Edinburgh: exploring the literary city

Update: LitLong 2.0 launched at the 2017 Embra BookFest; see article

Edinburgh has just celebrated its 10th anniversary as a UNESCO city of literature (Facebook | Twitter). Edinburgh was the original city of literature: here’s its literary story, plus details of tours and trails (guided | self guided | virtual – a bit lacking in the maps department, mind). Edinburgh is also home to the Scottish Poetry Library (Facebook | Twitter), the world’s first purpose-built institution of its kind, it says here, and the Scottish Storytelling Centre (Facebook | Twitter), ditto, adjacent to John Knox House. Not forgetting the Book Festival (Facebook | Twitter), the “largest festival of its kind in the world”.

The UK has one other city of literature, Norwich (see City of stories), and further literary cities include Dublin (great writers museum), and, pleasingly, Dunedin (about). Update: Nottingham has a bid in! If But I know this city! (tweets | David Belbin | report) is anything to go by, it should be successful. And there’s even Literary Dundee (@literarydundee). Unexpected update: Literary Odessa.

Not entirely coincidentally, I suspect, 30 March saw the launch of LitLong (@litlong), the latest output from the AHRC-funded Palimpsest project (@LitPalimpsest) at the University of Edinburgh (see Nicola Osborne’s liveblog and #litlonglaunch, esp @sixfootdestiny). An “interactive resource of Edinburgh literature”, LitLong is currently based around a website, with an iOS app now launched. It grew out of the prototype Palimpsest app developed three years ago and took a multidisciplinary team 15 months to build – geolocating the literature around a city is no trivial matter! See about LitLong for some of the issues.

550 works set in Edinburgh have been mined for placenames from the Edinburgh Gazetteer, with snippets selected for “interestingness” and added to the database, resulting in more than 47,000 mentions of over 1,600 different places. The data can be searched by keyword, location or author, opening up lots of possibilities: why is Irvine Welsh’s Embra further north than Walter Scott’s Edinburgh? Do memoir writers focus on different areas than crime writers? See too Mapping the Canongate.
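For readers curious about the mechanics, gazetteer-based place-name mining can be sketched in a few lines of Python. This is not LitLong's pipeline (its code is on GitHub). The three-entry gazetteer, its approximate coordinates and the fixed snippet window are all hypothetical. Real work also has to disambiguate names and rank snippets for “interestingness”, and this sketch skips both.

```python
import re

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Coordinates are rough approximations, for illustration only.
GAZETTEER = {
    "Grassmarket": (55.9475, -3.1960),
    "Cramond": (55.9790, -3.3000),
    "Canongate": (55.9515, -3.1800),
}


def find_mentions(text, gazetteer, window=40):
    """Return (place, coords, snippet) for each gazetteer name found in text,
    in the order the mentions occur in the text."""
    found = []
    for place, coords in gazetteer.items():
        # \b word boundaries stop matches inside longer words
        pattern = r"\b" + re.escape(place) + r"\b"
        for match in re.finditer(pattern, text):
            # Keep a window of context either side as the "snippet"
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            found.append((match.start(), place, coords, text[start:end]))
    found.sort(key=lambda item: item[0])  # order by position in the text
    return [(place, coords, snippet) for _, place, coords, snippet in found]


sample = "She walked down to the Grassmarket, then out to Cramond by the sea."
for place, coords, snippet in find_mentions(sample, GAZETTEER, window=10):
    print(place, coords, repr(snippet))
```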

Part of the point of Palimpsest is to allow us to explore and compare the cityscapes of individual writers, as well as the way in which literary works cultivate the personality of the city as a whole.

On the down side, while there is a handful of contemporary writers in the mix, the majority of the content necessarily comes from copyright-free material available in a digitised corpus, ie old stuff they made you read at school. Plus search results can be rather overwhelming (339 hits for the Grassmarket) – filters for genre and time period might be an idea. However, the data is to be made available so interested parties can play around as they wish, with open source code and data resources on GitHub.

I’ve had a look at the data around Muriel Spark, who would surely be delighted to be considered contemporary. The prime of Miss Jean Brodie (1961) has a section set in Cramond, near where I grew up. Drilling down using the location visualiser quickly brings us to:

“I shouldn’t have thought there was much to explore at Cramond,” said Mr. Lloyd, smiling at her with his golden forelock falling into his eye.

Searching the database brings up three pages of Cramond results to explore, including 17 Brodie snippets. Note that here you can filter by decade or source.

A search for Cammo, even closer to home, brought up a quote from Irvine Welsh’s Skagboys, although the map shown was different depending on which tool I used:

Edinburgh is a city of trees and woods; from the magnificence of the natural woodlands at Corstorphine Hill or Cammo, to the huge variety of splendid specimens in our parks and streets, Alexander argued, a pleasing flourish to his rhetoric. — Trees and woodlands have an inherent biodiversity value, whilst providing opportunities for recreation and environmental education.


location visualiser map – quill in back gardens rather than the “natural woodlands” #picky


database search map – not Cammo!

At the other end of the scale a search for ‘Bobby’ brings up 72 snippets from Eleanor Atkinson’s book, that’s a lot to handle…TBH I don’t really want them, I want a nice map of locations mentioned in the book, or at least a list, to create my own Greyfriars Bobby trail. At the moment it’s not possible to switch between the text and the map from the location visualiser, although you can do this snippet by snippet from the database search.
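The per-book place list wished for above is, at heart, a simple aggregation over snippet records. Here is a minimal sketch. It assumes each record carries a title, a place and a character position in the text. These hypothetical fields may not match LitLong's actual data, and the records below are made up for illustration.

```python
from collections import Counter


def book_trail(snippets, title):
    """Distinct places mentioned in one book, in order of first appearance,
    each with the number of snippets that mention it."""
    counts = Counter()
    order = []
    # Sort by position so "first appearance" follows the text, not the database
    for s in sorted(snippets, key=lambda s: s["position"]):
        if s["title"] != title:
            continue
        if s["place"] not in counts:
            order.append(s["place"])
        counts[s["place"]] += 1
    return [(place, counts[place]) for place in order]


# Hypothetical snippet records, not real LitLong data
snippets = [
    {"title": "Greyfriars Bobby", "place": "Greyfriars Kirkyard", "position": 120},
    {"title": "Greyfriars Bobby", "place": "Grassmarket", "position": 50},
    {"title": "Greyfriars Bobby", "place": "Grassmarket", "position": 300},
    {"title": "Some other book", "place": "Cramond", "position": 10},
]
trail = book_trail(snippets, "Greyfriars Bobby")
print(trail)
```

A list like this, with coordinates attached, is what a self-guided trail would be built from.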

As things stand LitLong feels like an academic project rather than a user-friendly tool – some use cases might be an idea. Hopefully the same approach will be applied to other cities in due course.

#smwbigsocialdata: getting social at CBS

On 27 February the boffins at Copenhagen Business School (aka the Computational Social Science Laboratory in the Department of IT Management) opened their doors for Social Media Week with Big social data analytics: modelling, visualization and prediction. This was the second time CSSL had participated in #smwcph, following their 2014 workshop (preso) on social media analytics. See also my post on text analysis in Denmark.

Wifi access was not offered, resulting in only 19 tweets, but as many of these were photos of the slides I’m not really complaining. There was also no hands-on session this year: all in all, a bit of a lacklustre form of public engagement.

Ravi Vatrapu kicked off the workshop with a couple of definitions:

  • What is social? – involves the other; associations rather than relations, sets rather than networks
  • What is media? – time and place shifting of meanings and actions

The CSSL conceptual model:


  • social graph analytics – the structure of the relationships emerging from social media use; focusing on identifying the actors involved, the activities they undertake, the actions they perform and the artefacts they create and interact with
  • social text analytics – the substantive nature of the interactions; focusing on the topics discussed and how they are discussed

It’s a different philosophy from social network analysis, using fuzzy set logic instead of graph theory, associations instead of relations and sets instead of social networks.
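Here is a generic fuzzy set illustration to make the sets-not-networks idea a little more concrete. It is not CSSL's formal model. Each company wall is treated as a fuzzy set of actors. An actor's membership degree is assumed to be the share of their posts made on that wall, and two walls are compared via the min-based fuzzy intersection. The wall names are hypothetical.

```python
from collections import Counter, defaultdict


def fuzzy_memberships(posts):
    """posts: list of (actor, wall) pairs, one per post.
    Returns {wall: {actor: degree}}, where degree in [0, 1] is the share of
    the actor's posts that were made on that wall (an assumed definition)."""
    totals = Counter(actor for actor, _ in posts)
    per_wall = defaultdict(Counter)
    for actor, wall in posts:
        per_wall[wall][actor] += 1
    return {
        wall: {actor: n / totals[actor] for actor, n in counts.items()}
        for wall, counts in per_wall.items()
    }


def fuzzy_overlap(set_a, set_b):
    """Size (sigma-count) of the fuzzy intersection, using the standard min operator."""
    actors = set_a.keys() | set_b.keys()
    return sum(min(set_a.get(a, 0.0), set_b.get(a, 0.0)) for a in actors)


# Hypothetical walls and actors
posts = [("ann", "CompanyA"), ("ann", "CompanyA"), ("ann", "CompanyB"), ("bob", "CompanyB")]
walls = fuzzy_memberships(posts)
overlap = fuzzy_overlap(walls["CompanyA"], walls["CompanyB"])
print(overlap)
```

Here ann belongs to CompanyA to degree 2/3 and to CompanyB to degree 1/3, and bob belongs only to CompanyB, so the overlap is 1/3. A crisp graph would simply record that ann links to both walls.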

Abid Hussain then presented the SODATO tool, which offers keyword, sentiment and actor attribute analysis on Twitter and Facebook (public posts only, uses Facebook Graph API). Data from (for example) a company’s wall can be presented in dashboard style, eg post distribution by month.
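A dashboard view such as post distribution by month boils down to grouping timestamps. The sketch below is not SODATO code. It assumes posts have already been fetched as (ISO timestamp, text) pairs, which simplifies what the Facebook Graph API actually returns. The keyword filter is a plain case-insensitive substring match, and the sample posts are made up.

```python
from collections import Counter
from datetime import datetime


def monthly_distribution(posts, keyword=None):
    """Count posts per calendar month (YYYY-MM), optionally keeping only
    posts whose text contains keyword (case-insensitive substring match).
    posts: list of (ISO 8601 timestamp, text) pairs."""
    counts = Counter()
    for timestamp, text in posts:
        if keyword and keyword.lower() not in text.lower():
            continue
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        counts[month] += 1
    return dict(sorted(counts.items()))


# Hypothetical wall posts
posts = [
    ("2014-01-05T09:00:00", "New collection out now"),
    ("2014-01-20T12:30:00", "Great service today"),
    ("2014-02-02T08:15:00", "Collection delayed again"),
]
print(monthly_distribution(posts))
print(monthly_distribution(posts, keyword="collection"))
```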

Next, Raghava Rao Mukkamala explored social set analytics for #Marius and other social media crises. Predictions (of emotions, stock market prices, box office revenues, iPhone sales) can be made based on Twitter data.

Benjamin Flesch’s Social Set Visualizer (SoSeVi) is a tool for qualitative analysis. He has built a timeline of factory accidents and a corpus of Facebook walls for 11 companies, resulting in a social set analysis dashboard of 180 million+ data points around the time of the garment factory accidents in Bangladesh.

The dashboard shows an actor’s engagement before, during and after the crisis (time), which can also be analysed over space (how many walls did they post on). Tags are also listed, allowing text analysis to be undertaken.
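The before/during/after view amounts to bucketing each actor's posts against a crisis window. The space dimension is a count of the distinct walls each actor posted on. This is a sketch under stated assumptions: the (actor, wall, date) record shape and the window dates are hypothetical, and it is not SoSeVi's implementation.

```python
from collections import defaultdict
from datetime import date


def engagement_profile(posts, crisis_start, crisis_end):
    """posts: list of (actor, wall, date). For each actor, count posts made
    before, during (inclusive) and after the crisis window, plus the number
    of distinct walls they posted on."""
    profile = defaultdict(lambda: {"before": 0, "during": 0, "after": 0, "walls": set()})
    for actor, wall, day in posts:
        if day < crisis_start:
            phase = "before"
        elif day <= crisis_end:
            phase = "during"
        else:
            phase = "after"
        profile[actor][phase] += 1
        profile[actor]["walls"].add(wall)
    # Replace each set of walls with its size for the summary
    return {actor: {**p, "walls": len(p["walls"])} for actor, p in profile.items()}


# Hypothetical posts and crisis window
posts = [
    ("ann", "CompanyA", date(2013, 4, 1)),
    ("ann", "CompanyB", date(2013, 4, 30)),
    ("ann", "CompanyB", date(2013, 6, 1)),
    ("bob", "CompanyA", date(2013, 5, 1)),
]
result = engagement_profile(posts, date(2013, 4, 24), date(2013, 5, 15))
print(result)
```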

Niels Buus Lassen and Rene Madsen then outlined some of their work on predictive modelling using Twitter. You have to buy into #some activity being a proxy for real-world attention, ie Twitter as a mirror of what’s going on out in the market – a sampling issue like any other. Using a dashboard driven by SODATO, they classify tweets with ensemble classifiers, for example to predict iPhone sales from 500 million plus tweets containing the keyword “iphone” (see CBS news story | article in Science Nordic).

They also used a very cool formula I nearly understood.
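Hard (majority) voting is one common way of combining classifiers, and it makes the ensemble idea easy to illustrate. The three keyword rules below are purely hypothetical toy stand-ins. The CBS team used trained models, and their exact ensemble method is not given here.

```python
from collections import Counter


def majority_vote(classifiers, text):
    """Hard-voting ensemble: every classifier returns a label and the most
    common label wins. On a tie, Counter.most_common keeps first-seen order,
    so the label voted for earliest wins."""
    votes = Counter(classify(text) for classify in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical toy classifiers standing in for trained models
def says_buy(text):
    return "intent" if "buy" in text.lower() else "other"


def says_new_iphone(text):
    return "intent" if "new iphone" in text.lower() else "other"


def says_queue(text):
    return "intent" if "queue" in text.lower() else "other"


ENSEMBLE = [says_buy, says_new_iphone, says_queue]
print(majority_vote(ENSEMBLE, "Going to buy the new iPhone tomorrow"))
```

The appeal of an ensemble is that individual classifiers' errors tend to be outvoted, provided they do not all make the same mistakes.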

Last up, Chris Zimmerman gave an overview of CSSL’s new Facebook Feelings project, a counterpart to all those Twitter happiness studies. A classification of 143 different emotions on Facebook, based on mood mining from 12 million public posts, yikes. “Feeling excited” was the most popular feeling by far. Analysis can be done and correlations made on any number of aspects of the data, with an active | passive axis in addition to the positive | negative axis used in sentiment analysis. Analysis by place runs into the usual issue: only 5% of the data includes location information.
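With feelings placed on two axes, positive | negative and active | passive, a set of posts can be summarised as a single point. In the sketch below, the four feelings and their coordinates are hypothetical values chosen for illustration. They are not the project's classification of 143 emotions, and posts with unrecognised feelings are simply skipped.

```python
# Hypothetical coordinates: (valence, activation), each in [-1, 1]
FEELING_AXES = {
    "excited": (1.0, 1.0),    # positive, active
    "relaxed": (1.0, -1.0),   # positive, passive
    "angry": (-1.0, 1.0),     # negative, active
    "tired": (-1.0, -1.0),    # negative, passive
}


def mood_centroid(feelings):
    """Average (valence, activation) over the posts' feelings; feelings not
    in FEELING_AXES are skipped. Returns None if nothing is recognised."""
    points = [FEELING_AXES[f] for f in feelings if f in FEELING_AXES]
    if not points:
        return None
    n = len(points)
    return (sum(v for v, _ in points) / n, sum(a for _, a in points) / n)


print(mood_centroid(["excited", "excited", "tired", "feeling blessed"]))
```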


#corpusmooc (and text analysis) linkage

Latest: Journalistic representations of Jeremy Corbyn in the British press

Updates: never forget – Sentiment analysis is opinion turned into code; see Stanford Named Entity Tagger, which has three English language classification algorithms to try, and a list of 20+ Sentiment Analysis APIs. Next up: Seven ways humanists are using computers to understand text; Semantic maps at DH2015; Sensei FP7 project: making sense of human – human conversation (Gdn case study); Donald Trump’s tweets analysed; Pro-Brexit articles dominated newspaper referendum coverage; Americanisation of English.

Updates: just came across culturomics via a 2011 TEDx talk – no, stay… two researchers who helped create the Google Ngram Viewer analyse the Google Books digital library for cultural patterns in language use over time. See the Culturomics site, Science paper etc. Critique: When physicists do linguistics and Bright lights, big data. EMOTIVE, sentiment analysis project at Lboro… Laurence Anthony reviews the future of corpus tools; Sentiment and semantic analysis; analysing Twitter sentiment in the urban context and again; Wisdom of the crowd, research project from inter alia Demos and Ipsos MORI, launches with a look at Twitter’s reaction to the autumn statement; The six main arcs in storytelling, as identified by an AI.
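The core culturomics measure is how often a word appears in a given year, relative to everything published that year. Here is a simplified sketch. It assumes a tiny hypothetical corpus of (year, text) pairs and a naive letters-only tokeniser. The Ngram Viewer itself works on n-gram counts across the Google Books corpus and reports percentages, rather than the per-million rate used here.

```python
import re
from collections import Counter


def per_million_by_year(docs, word):
    """docs: list of (year, text). Returns {year: occurrences of word per
    million tokens in that year's texts}, skipping years with no tokens."""
    hits, totals = Counter(), Counter()
    for year, text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())  # naive tokeniser
        totals[year] += len(tokens)
        hits[year] += tokens.count(word.lower())
    return {
        year: hits[year] / totals[year] * 1_000_000
        for year in sorted(totals)
        if totals[year]
    }


# Hypothetical mini-corpus
docs = [(1900, "The telegraph and the post."), (1900, "Telegraph news"), (2000, "The internet")]
print(per_million_by_year(docs, "telegraph"))
```

Normalising by each year's total is what makes years with very different amounts of published text comparable.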

Aha, a links post…I’ve got links on text analysis and related all over the shop – see the category and tags for text mining and sentiment analysis on this blog for starters, in particular #ivmooc 4: what? and #ivmooc 2: burst detection, plus Word clouds for text mining. Here’s a broadly corpus related haul.




There’s no shortage of cases. Here’s a selection with particular appeal, either due to subject matter or methodology:

Blogs, Twitter…The dragonfly’s gaze looks at computational approaches to literary text analysis, with a nice post listing repositories and exploring file formats.

Telling stories with maps: literary geographies

Telling stories with maps: the geoweb, qualitative GIS and narrative mapping (programme) was a seminar (report | another one) held on 30 April as part of Hestia2, a project centred round spatial reading and visualising Herodotus’ Histories (see posts). Sessions in the morning covered narrative mapping while the afternoon focused on literary analysis and networks.

Sessions of particular note:

During the lunch break participants tried out the MapLocal app (Android only), which allows users to take photos and record audio commentaries which are geolocated and uploaded to a shared map. Echoes of the Gdn’s Google Street View Sleuth?

Time to revisit Kierkegaard in maps, although other personally related themes might prove more doable.

A recurrent theme [was] the conceptual and technical challenges associated with efforts to shift the focus away from traditional ‘Cartesian’ cartographic methods – with their focus on surfaces, images and topographies – onto the topological and networked representations contained in narrative depictions of space.

What is lost in translation from narrative to map or map to narrative form?

Great livetweeting from @muziejus:

A further event on 6 June explored digital pedagogy.

Some linkage:

Some notes:

  • literary cartography
    • an approach using a symbolic language
    • spatial elements of texts are translated into cartographic symbols
    • allows new ways of exploring and analysing the geography of literature
    • tools of interpretation – show something which hasn’t been seen before
    • not just supporting the text
  • the space of fiction – categories
    • settings – where the action takes place (house, village)
    • zones of action – several settings combined (city, region)
    • projected spaces
      • characters are not present but are thinking of, remembering, longing for or imagining a specific place
    • markers – places which are mentioned; indicate the geographical range and horizon of a fictional space
    • paths/routes – along which characters move; connections between waypoints (settings, projected spaces)
  • database support
    • data model
      • general text information, including bibliography and assigned model region
      • about the author
      • the temporal structure of the story line
      • spatial objects
    • maps created automatically from database
  • what elements of the literary space can be mapped
    • the city in literature
    • interactions/tensions between centre and periphery
    • travelling
    • crossing borders
    • imaginary places
    • literary tourism
  • what elements are unmappable
  • different representations for epochs, genres?
  • spatial models
    • maps in literature, eg Treasure Island
    • imaginary settings
    • mapping of a single text
    • mapping of groups of texts
      • where and when do cities appear on the literary map of Europe?
      • how international is the space?
    • placing literature on a map
      • simplistic
      • no theoretical foundation
    • issues and uncertainties
      • the artistic freedom of the author
      • semantic and linguistic variation in describing places and spaces
      • vague geographical concepts
      • reading variations by different readers
      • visualisation needs to make some things clearer than they actually are
      • texts do not always provide distinct or correct information
      • different interpreters can provide different viewpoints – subjective
      • mark data as direct/indirect reference
      • detail may not be provided of a journey, but a straight line gives the wrong impression
  • maps as intermediate results, sources of inspiration, generators of ideas for future research
    • makes aspects visible which were invisible before
    • creates knowledge about places, their historical layers, meanings, functions and symbolic values
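The database support described in these notes could be modelled along the following lines: texts with bibliography and model region, authors, and typed spatial objects, with direct and indirect references flagged. This is a hypothetical schema built from the categories above, not the data model presented at the seminar. For brevity it reduces the author to a name and the temporal structure to a publication year. All sample values are made up.

```python
from dataclasses import dataclass, field
from enum import Enum


class SpaceType(Enum):
    SETTING = "setting"                  # where the action takes place
    ZONE_OF_ACTION = "zone of action"    # several settings combined
    PROJECTED_SPACE = "projected space"  # remembered, longed for, imagined
    MARKER = "marker"                    # mentioned only; marks the horizon
    PATH = "path"                        # route between waypoints


@dataclass
class SpatialObject:
    name: str
    space_type: SpaceType
    direct_reference: bool = True  # False when the location is inferred
    coordinates: list = field(default_factory=list)  # (lat, lon) points; several for a path


@dataclass
class LiteraryText:
    title: str
    author: str
    year: int
    model_region: str
    spatial_objects: list = field(default_factory=list)

    def mappable(self, include_indirect=False):
        """Spatial objects that can go on a map: they need coordinates, and
        indirect references are left out unless explicitly requested."""
        return [
            obj for obj in self.spatial_objects
            if obj.coordinates and (include_indirect or obj.direct_reference)
        ]


# Hypothetical example text
novel = LiteraryText("Example novel", "A. N. Author", 1900, "Edinburgh")
novel.spatial_objects += [
    SpatialObject("Grassmarket", SpaceType.SETTING, True, [(55.9475, -3.1960)]),
    SpatialObject("a longed-for elsewhere", SpaceType.PROJECTED_SPACE),
    # Only two waypoints: drawn as a straight line, so flagged as indirect
    SpatialObject("road to Leith", SpaceType.PATH, False, [(55.953, -3.188), (55.975, -3.170)]),
]
print([obj.name for obj in novel.mappable()])
```

Keeping the direct/indirect flag in the data, rather than deciding at drawing time, lets different maps make different choices about uncertain references.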

#corpusmooc: review that journal


Updates: why take notes? The Guardian view on knowledge in an information age. What type of note taker are you?

Each week in #corpusmooc, straight after the vids, we’ve been exhorted to “update your journal”. A bit of explanation might have been a good idea for those not into Lancaster’s particular form of reflective practice, plus maybe “notes” would have worked better as a catch-all, but hey… As you can see there were 37 comments on this particular page (en passant, I think the comment count is new; maybe it wasn’t just me who queried what the number referred to – my initial thought was page views). But what’s to comment on?

Some people take handwritten notes, some use Wikipad or Evernote, and a couple use mindmapping “to keep the written record of the connections between ideas that come to my mind while learning and reflecting upon what I have learned”. One pen and paper note taker commented that “I think I’m absorbing more and retaining what I learn better”. It’s particularly fun that handwritten notes are called out for being “slow” – for me a bigger problem is that underuse has left my handwriting even more appalling than it was before the advent of computers. Docear, an ‘academic literature suite’ which offers electronic PDF highlighting as well as a reference manager and mindmapping, also got a mention and looks interesting.

Hamish Norbrook has a great approach:

Pen and paper transferred to the single file “MOOC notes”: individual units filed by unit number. I try and sift as I’m going into ‘Stuff I really need in my head and not on paper’, ‘Stuff I can come back to or refer to’ and and… ‘Stuff I’m unlikely to understand’.

Having never mastered mindmapping I’m a fan of the bullet point. I’ve made the biggest use of screen captures on this MOOC, thanks to Laurence Anthony introducing us to the Windows snipping tool, but in the past I’ve also tried out VideoNot.es – video watching and notetaking on one screen. Why take notes? An infographic on notetaking techniques offers some insights into the recording and retaining of information:

  • only 10% of a talk may last in your memory, but if you take and review notes you can recall about 80%
  • notetaking systems (who knew?) include the Cornell System with a cue column and notetaking and summaries areas, the outline system and the flow based system
  • writing vs typing – writing engages your brain as you form and connect letters, helping you retain more, while typing gives a greater quantity of notes

Here’s an article on student notetaking for recall and understanding.

The final activity on the course is to review your journal, as I suggested in week 4. A number of people have made some progress in analysing their personal or other corpora:

  • on The Waste Land: “‘you’ features as much as ‘I’, which brought home to me how much the fragments in The Waste Land are parts/one side of a conversation, though the actual ‘you’ may not be given a voice”
  • on own notes: “Besides the classic function words such as articles, pronouns, conjunctions we use to see in corpora, I just realized that I use a lot the word ‘so’ in different contexts, especially as an adverb (I have a tendency to write things like ‘this is so interesting’, ‘this subject is so important’, etc), and as a linking word that I seem to use at the beginning of almost every paragraph.”
  • on own tweets and comments on the MOOC, but it was difficult to get the data (groan)

Some people have gone the full nine yards already. Liliana Lanz Vallejo:

I loaded the notes that I took of the course and I added the comments that I wrote in all the forums. This made a total of 9,436 word tokens and 2,338 word types. Something got my attention. While in most of the English corpora that I’ve checked in this course, the pronoun “I” appears close to a rank 20, in my notes and comments corpus “I” appears in rank 2, after “the”. This is curious because the same thing happens in the corpus of tweets containing Spanish-English codeswitchings that I gathered some years ago. In it, “I” appears in rank 1 of words in English, while “the” is in rank 3. It seems that my English and the English of Tijuana’s Twitter users in my corpus is highly self-centered. We are focusing in our opinions and our actions. Of course, the new-GSL list, the LOB and Brown corpus and all the others were not made with “net-speech”. So there is a possibility of native English speakers favoring the usage of the pronoun “I” in social media or internet forums…I would need to compare my notes and comments corpus to a corpus made of forum comments, and the tweets corpus to one made of social media posts (or tweets, that would be even better).

Andrew Hardie (CQPweb guru) responds: “May this be a genre effect? Are comments/twitter posts of equivalent genre to the written data you are comparing it to? Use of 1st and 2nd person pronouns is generally considered a marker of interactivity or involvement, which is found in spoken conversation but not in most traditional formal written genres. But then, comments on here are not exactly what you would call traditional formal written genres!”. Kim Witten (mentor): “Also keep in mind that while “I” can be perceived as focused on opinions and actions, it is also often indicative of the act of sharing (e.g., “I think”, “I feel”, “I want”), which as Andrew says is a marker of interactivity or involvement. So perhaps it is inward-facing, but for the intent of being outward-connecting.”
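The token, type and rank figures quoted above come from AntConc, but the underlying calculation is easy to sketch. This sketch assumes a naive lowercase tokeniser, so its counts will not match AntConc's exactly. Lowercasing also turns “I” into “i”. Words with equal frequency keep first-seen order. The sample sentence is made up.

```python
import re
from collections import Counter


def word_stats(text):
    """Return (number of tokens, number of types, frequency list ranked
    from most to least common) using a naive lowercase tokeniser."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    freq = Counter(tokens).most_common()  # ties keep first-seen order
    return len(tokens), len(set(tokens)), freq


def rank_of(word, freq):
    """1-based rank of word in the frequency list, or None if absent."""
    for rank, (w, _) in enumerate(freq, start=1):
        if w == word:
            return rank
    return None


sample = "I think the course is good and I think the notes help. I liked the forums."
n_tokens, n_types, freq = word_stats(sample)
print(n_tokens, n_types, rank_of("i", freq))
```

Comparing the rank of “i” across corpora only makes sense when they are tokenised the same way, which is part of Andrew's genre point.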

Anita Buzzi:

I generally take notes with pen and paper, so I decide to collect all the answers I gave in the two MOOCs on Futurelearn I attended creating my own corpus delicti. I generate a word list with AntConc – word types 944 word token 2937- the results: The first token is “the” freq. 140; the second token reveals that my favourite preposition is “in” 105 freq. then the list goes on showing: “and”, “I”, “to”, “of”. I annotated the corpus in CLAWS–3016 words tagged, tagset C7 and then USAS. I generate a word list in CLAWS C7 – word types 1032- words token 5910. the results show: nn -nouns 812, jj- general adjectives 213, AT- articles 201, ii preposition 181. I look for VM modal verbs. The first modal 17 hits is “can” and the concordance shows mostly in association with “be”, The second with 15 hits is “may” : may share, provide, be, reflect, feel, represent The third is “would” 10 hits : would like, would be; followed by “could”, “should” and “will” 4 hits; “need to” just 1 hit. While the modal verbs in the London-Lund Corpus of Spoken English appear in this scale WOULD – CAN – WILL- COULD- MUST – SHOULD – MAY – MIGHT – SHALL The results I had from the corpus was: CAN- MAY – WOULD- COULD- SHOULD – WILL – MIGHT Why do I use “may” so much? Probably because I was talking about specific possibility, or making deductions.
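Anita's modal counts, and the words that follow each modal (“may share, provide, be…”), can be approximated with plain word matching. Unlike CLAWS, which tags modals as VM, this sketch cannot tell a modal “will” or “may” from a noun or a name. It also splits contractions such as “can't” into two tokens. The sample sentence is made up.

```python
import re
from collections import Counter

MODALS = ["can", "could", "may", "might", "must", "shall", "should", "will", "would"]


def modal_profile(text):
    """Rank modal verbs by frequency (ties keep first-seen order) and count
    the word immediately following each occurrence (its right collocate)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    next_words = {m: Counter() for m in MODALS}
    for i, tok in enumerate(tokens):
        if tok in next_words:
            counts[tok] += 1
            if i + 1 < len(tokens):
                next_words[tok][tokens[i + 1]] += 1
    ranking = [m for m, _ in counts.most_common()]
    return ranking, next_words


sample = "This may be useful. I can share it and may share more. You can be sure it would help."
ranking, next_words = modal_profile(sample)
print(ranking, dict(next_words["may"]))
```

Looking at the right-hand collocates is a quick first step before reading the full concordance lines.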

Amy Aisha Brown (mentor): “Did you take a look at your concordance lines? What does ‘may’ collocate with? That might give you a hint at why you use it so much. Another thought, I wonder if someone has put Tony’s lectures into a corpus. It could be that he uses ‘may’ often and that you have picked it up from him? Maybe you always use it often?” Tamara Gorozhankina:

I’ve collected a very small corpus of all my comments through the course (4,835 tokens), and saved them in 8 separated text files (each file for each week). I used POS annotation in CLAWS C5, and the keyword list showed: Nouns – 510 Verbs – 208 Adjectives – 177 Adverbs – 100 Personal pronouns – 74 Then I divided this tiny corpus into 2 subcorpora: the first one for the comments of the first 4 weeks and the second one for the comments of the last 4 weeks of the course. The number of tokens was balanced. After getting the results, I realised that there was an interesting shift in using personal pronouns, as I tend to generalise the ideas by using “we” in the comments of the first 4 weeks, while in the last weeks’ comments there’s a tendency to use “I” instead. These results are quite unexpected I should say.
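Tamara's comparison of two subcorpora can be sketched by splitting chronologically ordered weekly files into halves. Pronoun counts are then normalised per 1,000 tokens, so halves of different sizes remain comparable. This split is by number of files rather than balanced by tokens as in her study, and the weekly texts are made up.

```python
import re


def pronoun_rates(texts, pronouns=("i", "we")):
    """Split chronologically ordered texts into two halves (by number of
    texts) and return each pronoun's rate per 1,000 tokens in each half."""
    half = len(texts) // 2
    rates = {}
    for label, part in (("first", texts[:half]), ("second", texts[half:])):
        tokens = re.findall(r"[a-z]+", " ".join(part).lower())
        n = len(tokens) or 1  # guard against an empty half
        rates[label] = {p: tokens.count(p) * 1000 / n for p in pronouns}
    return rates


# Hypothetical weekly comment files, oldest first
weeks = ["we think we learn", "we see", "i think", "i see i learn"]
print(pronoun_rates(weeks))
```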

Finally, here’s a list of all the bloggers I’ve found on this MOOC:

See the #corpusMOOC tag for all my posts on this MOOC.