#corpusmooc (and text analysis) linkage

Bumped from March as so much interesting stuff…

Updates: just came across culturomics via a 2011 TEDx talk – no, stay…two researchers who helped create the Google Ngram Viewer analyse the Google Books digital library for cultural patterns in language use over time. See the Culturomics site, Science paper etc. Critique: When physicists do linguistics and Bright lights, big data. EMOTIVE, sentiment analysis project at Lboro…Laurence Anthony reviews the future of corpus tools.

Sentiment and semantic analysis

Aha, a links post…I’ve got links on text analysis and related all over the shop – see the category and tags for text mining and sentiment analysis on this blog for starters, in particular #ivmooc 4: what? and #ivmooc 2: burst detection, plus Word clouds for text mining. Here’s a broadly corpus related haul.




There’s no shortage of cases. Here’s a selection with particular appeal, either due to subject matter or methodology:

Blogs, Twitter…The dragonfly’s gaze looks at computational approaches to literary text analysis, with a nice post listing repositories and exploring file formats.

Different ways of reading

For several years I’ve found reading on a screen (and even at all) hard in that I’m programmed to scan, but what with ebooks and tablets really gaining traction and more quality ‘lean back’ content on offer it’s time to review my habits. I’m also interested in different ways of reading – and how they might relate to different ways of writing.

Reading on a screen:

What works for reading on the Web? It doesn’t have to be short, see #longform, but does the nation still shudder at large blocks of uninterrupted text? For more see Ebooks and digital literature.

Ways of reading:

I practise curated reading (I’ve just made this up). If you read book reviews, vaguely literary blogs etc, you already know a fair amount about a book before you pick it up – one of those sources may have made you pick it up in the first place. I might also have done a bit more searching around the book, looking for interviews with the author, their website, free/open bits of their writing elsewhere, online book reviews…so after I’ve read around 50 pages I might feel I’ve had enough. OTOH I might go through the curation process while I’m reading the book, or afterwards, and then put the whole thing together as a book review.

In what happens when I read non-fiction Barbara Fillip talks about connecting: “Once I’m deep into the book, my mind starts wandering and I start making connections with totally different aspects of my life…I get interrupted, read something else, and the connections between the two items I’ve been reading appear.” Think of it as an introvert appropriate approach to social reading.

The other side of this particular coin is that you can sound as if you have read the book without having ever opened it, channelling Pierre Bayard. I’ve heard Iain Sinclair bemoaning a couple of times that people can talk reasonably intelligently about his books without having made the effort to read all 400 pages.

And if you do make the effort, maybe you can write a book about it? Surely there have been loads of these (eg Susan Hill’s Howards End is on the landing: a year of reading from home), but The year of reading dangerously: how fifty great books saved my life by Andy Miller seems to be the latest in the canon. After listening to the Little Atoms podcast and scanning the sample chapter I feel like I can tick it off my to read list, especially as Andy admits his choices are “literary lad classics”. But his advice is to stick with a book, particularly in these days of instant opinions, as the value of, say, Middlemarch may be in the whole experience.

Ebooks and digital literature

Digital literature offers new forms of interaction between author, work and reader:

Why ebooks:


How tos and tools:

Mainly in HE:

Free stuff:

Publishing platforms:

A post on ebook platform accessibility addresses the what is an ebook? issue.



Singles/longreads are a thing:

I have no luddite prejudice against new technology; it’s just that books look as if they contain knowledge, while e-readers look as if they contain information.

Julian Barnes, quoted by @currybet.

Introduction to digital curation weeks 6-8

For the record…

Week 6 was on “the digital curation worldview”, looking at two theoretical models that are starting to become “if not the ‘orthodox’ view at least the reference from which all deviations are measured”:

  • the OAIS Model (Open Archival Information System Reference Model, ISO 14721:2003, recently revised as ISO 14721:2012) – developed by individuals interested in space data and information transfer systems
  • the DCC Lifecycle Model and glossary – developed more from within the archive community

The digital curation profession is made up of those whose primary role and job it is to ensure the ongoing accessibility of digital material in all its different forms, from data used in research, to records of businesses and individuals, to ebooks and ejournals, to software and computer games. It is still very much in a process of formation from and within many more established workgroups, including librarians, archivists, museum curators, researchers, computer scientists and IT professionals.

So not my worldview…

Week 7 looked at the digital curation community and its spaces, “sites of community activity, shared resources and the active participation of individuals as they strive to keep up to date with developments and learn from each other”. This need not trouble us further.

Week 8 looked at the competencies and skills deemed necessary for those working in digital curation, referencing two frameworks:

There’s a Twitter chat on 30 June, with five questions:

  • What is digital curation? (definitions should be no more than one tweet long)
  • Has (and, if so, how has) your sense of what digital curation is changed as a result of this course?
  • How do you think ‘the general public’ view digital curation?
  • How can digital curation be made more mainstream?
  • What (if anything) will you be doing to interest and inform others in and about digital curation?

Literature of the English country house

The Literature of the English country house MOOC on FutureLearn is being run by Jim and Susan Fitzmaurice, director of distance learning and head of the School of English at Sheffield respectively. Runs for eight weeks from 2 June to 20 July, with a workload of three hours per week. Twitter: @FLHouseLitSheff (posting inter alia selfies and cat spam, trying too hard) and #FLHouseLit:

A journey through the literature of English country houses from the time of Thomas More to Oscar Wilde…you’ll learn to analyse literature using a technique called ‘close reading’. It will help you to make your own connections between country house literature and its historical backgrounds.

A large component of my first degree was studying German literature, but that was a fair while ago…at the moment I’m hoovering up everything available on literature to see what sticks in taking forward literary non-fiction as a writing project. This sticks out as offering an additional angle on the literature of place. In addition, two of the team are described as literary linguists – the use of language within literature of place?

My MOOCs seem to be progressively getting more leisure oriented – well, it is the (Danish) summer! Houses to be visited shown on a map got me thinking about houses I could research to write about, such as Lauriston Castle in Edinburgh, that castle Bothwell died in, Kierkegaard’s houses…see too the Historic Houses Association.

The warm-up activity has attracted 54 comments already. Nope, can’t face it:

Have you ever visited a country house, either in England or elsewhere in the world? What was it like? If you haven’t had chance to visit a country house is there anywhere you would like to visit, and why?

Other than that, there’s lots of close reading.

What is close reading? 

Close reading describes, in literary criticism, the careful, sustained interpretation of a brief passage of text. Such a reading places great emphasis on the single particular over the general, paying close attention to individual words, syntax, and the order in which sentences and ideas unfold as they are read. (Wikipedia)

Close reading differs from general reading in that we go back to the text to reread it, to focus specifically on particular details of language, to dig deep and uncover layers of meaning in the text. Close reading allows us to create an interpretation as well as an understanding of a text…Why read a text closely? Close reading gives us a deeper understanding of what a text could mean. And it allows us to fit texts into their wider cultural and historical context.

  1. First reading – to discover the general meaning of the text, an impression of the narrative, a tone of voice, a sense of a character, and perhaps of the period the text is set in.
  2. Second reading – concentrate attention on the language and structures of the text in order to confirm or test the impressions gained in the first reading, to reach a deeper level of meaning and different layers of meaning. Details like word choice, imagery, sentence structure, and the arrangement of sounds will all provide clues to these meanings.
  3. Third reading – delve into the cultural and historical context, then by using specific words and structures link the text to other texts and its wider context/s.

What do we examine? As we read closely, a word, a passage or a scene will catch our attention. Look for what’s unexpected or surprising in a passage, the strikingly apt or especially appropriate. A repeated word may be a key word, or it may point to key words. Engage closely and intently with each word, line and sentence; watch carefully and think about each word and each phrase (the historical and contextual meanings of words), as they combine to form chains of meanings. Which words are important? Why might they be important? Make notes, look up words in the dictionary and highlight phrases.
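The hunt for repeated words can be given a quantitative nudge, in the spirit of #corpusmooc. What follows is only a minimal sketch, assuming a plain-text passage: the function name `repeated_words`, the tiny stopword list and the sample sentence are all invented for illustration and come from neither course.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; a real list would be far longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "was", "for", "with", "as", "on"}

def repeated_words(passage, min_count=2):
    """Return (word, count) pairs for non-stopwords seen at least min_count times."""
    # Lower-case the passage and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", passage.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # most_common() orders by descending count.
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

# Invented sample passage, not taken from any of the course texts.
sample = "The house stood silent. The silent house kept its silent rooms."
print(repeated_words(sample))
```

A count like this only flags candidates; deciding whether a repeated word really is a key word is still the close reader’s job.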

Close reading also came up in #corpusmooc as the qualitative angle, and in #FLfiction14 – see Read what you want to write. A useful technique IRT writing, editing and translation, but the passages put forward for close reading here aren’t for me so far (excerpts from Twelfth Night, Ben Jonson’s To Penshurst, Thomas More’s Utopia).

Week 2 focused on entertainment in the country house. Discussions allegedly explored the role and relationships between primary textual analysis and secondary information, eg historical or biographical context. Texts still a bit early for me, but I’ve bookmarked some texts of my own for attention instead.

Week 3’s historical and cultural context was attitudes to politeness in the 18th century. How important was it to be (and more importantly, to be seen to be) polite? Did everybody regard politeness in the same way? Did views of politeness change over time?

  • relationships between politeness, conversation and sociability (the ability to make people feel at ease in a variety of social situations)
  • the concept of the social house – a house where the owners prided themselves on being able to create an environment for people to be sociable and at ease with one another; offered opportunities to socialise with people like them; this could extend to the bedchamber in an effort to be sociable, to entertain and not seem rude, leading to glamorous negligees put on to entertain as if you had just got out of bed
  • the language of politeness – the 18th century notion of politeness was a model of behaviour which eased interaction and sociability among people, different from the modern day notion of minding one’s manners
  • by the end of the 18th century politeness was associated less with sociability, more with form – being recognisably polite, having taste; more about one’s interest in self expression and impact on those around you than being sociable, paying attention to other people or being cooperative
  • became a target of satire – eg particular ways of speaking which function to exclude other people from that social circle

Fun! Of interest too IRT issues of negative and positive politeness and The Danes.

Week 4 wheeled out Jane Austen, looking at free indirect discourse in Pride and prejudice, a stylistic technique used to bring the reader into the perspectives of the narrator and the characters:

Free indirect discourse is a narrative style which is used for the representation of spoken words or thoughts. It typically appears in fictional prose when a character’s words or thoughts infiltrate the third person narrative, so that the perspective shifts from that of the narrator to that of the character.

Crucially, the style is not explicitly announced, and the speech or thought is not directly attributed to the character. Instead the reader has to rely on a number of stylistic cues to determine whether the character’s point of view is present. These cues include:

  • exclamations and questions
  • subjective or evaluative language which indicates the character’s opinion
  • markers of space and time from the character’s perspective

The heroine’s thoughts are so intermingled in the narrative that it’s often very hard to tell where they stop and where the narrator comes in. We come to understand Elizabeth’s perspective well, but don’t really get into the heads of anybody else. Least of all Darcy’s! Readers’ responses can range from empathy to ironic distance.

On to week 5, and the Gothic, examining Ann Radcliffe’s Gothic novel The mysteries of Udolpho and dissecting the reclusive Miss Havisham from Great expectations – skipped. Week 6 feels like it’s dragging on a bit, with rather less about the houses than might be expected, but if you were into children’s lit, specifically Edward Lear, Lewis Carroll and nonsense verse, this week was for you. Week 7 explored the end of the century, as seen through Oscar Wilde – the idea of country house transformation through non-English ownership in The Canterville Ghost and the subsurface of polite society in The importance of being earnest. “These texts suggest the end of the English country house tradition, or its possible Anglo-American reformation.”

So just as it was getting interesting we get to week 8, “reviewing 450 years of history, many different locations and a variety of authors and texts” with a marked assessment. Also, a rather handy discussion task on making connections between authors and vice versa:

Post a comment suggesting a country house which you would like to visit then reply to another learner’s comment, suggesting a piece of literature that they should read before visiting their chosen house, giving your reasons why it is relevant.

Taking me back to my list of houses to research from week 1.

From the final farewell:

We hope that among other things, what you have taken away from this course are two new ways of reading. The first is contextual reading to place the literature in its cultural and historical context. The second is close reading which is the intense, concentrated engagement with the text which we hope has provided a whole new way of looking at literature.

Yes to both, as a refresher course in literary critique, although with a couple of exceptions the selection of texts wasn’t really for me.

Start writing fiction 7-8: reading and reflecting

Weeks 5 and 6 were on character, which I skipped – OK I could have tried to translate some of it into place as character, but this didn’t feel like a hugely productive exercise.

Week 7 was on reading as a writer. How can reading help develop the ‘habit’ of writing? See read what you want to write, a bit of a truism, but which ties in with close reading techniques. Learning to read as a writer helps you improve your own writing skills – for example, book reviewing shows how voicing appreciation of a text and learning text analysis skills can really help accelerate writing development.

Learning from reading

Your opinion about what you read is important and you now have the skills necessary to be more analytical in assessing why you prefer one story, or novel, over another. Choose one book you have read and liked, and one you have read and disliked. In 100 words, say why you think a particular book you have read works; again, in 100 words, say why you think another book does not. Note especially:

  • how effective the characterisation is in these books
  • whether these books make you want to read on – why or why not
  • how and why you consider a book or passage in a book ‘works’ or doesn’t ‘work’.

Are there any aspects in your own work that tally with elements you enjoyed reading in the published novels?
Are there any aspects that you noticed about published novels where the writing was seen to be ‘working’ that are relevant to your writing?

Reading as a writer

Noticing details about the construction of language, plot and story in what you read will help form your own writing taste and style.

  • How long is the short story or novel?
  • Are there chapters? Sections? Parts?
  • If it’s a short story, how is it structured?
  • When and where is it set, do/how do these things appear to matter, and how are they conveyed?
  • From whose point of view is the story being told? Is it the story of one, or more than one of the characters?
  • Is there dialogue? If so, what kind?
  • Is the language modern, plain, elaborate, colloquial?
  • Are there short or long sentences?
  • Are the sentences ‘properly formed’, or broken down? For example, ‘Get this. Bravery. That wasn’t even in it. Heroism? Maybe that was nearer the mark.’
  • Would you say that the story was a ‘page-turner’?
  • Is it full of ‘researched facts’?
  • Is there much ‘internal’ psychological or emotional detail, or is most of the novel or story taken up with ‘external’ events or description?
  • How do you learn of the main characters?
  • Are the minor characters sufficiently clear or too flat?
  • In your opinion, is it clearly aimed at a certain type of reader?

Identifying the techniques and methods of other writers will influence and help your own style.

Week 8 was on sharing and reflecting “on the main tools you’ve picked up during the course and how these helped you turn characters into short stories”.

The quiz:

  • What are the rules for using your writer’s notebook? – There are no rules.
  • How should you start writing? – Write every day, even if you don’t think you’ve got anything to say. Looking through your notebook is always a good idea.
  • Why should you read the work of other writers? – To help you to learn how to do it yourself.
  • Should you share your writing with fellow writers? – Yes, because they can help me to evaluate my work, and analysing their work will help me to evaluate my own.
  • Editing essentially means reflecting on what you have written and redrafting it as many times as you have to. See Waste Effect’s review of The work of revision for more on this.
  • What does ‘learn through writing’ mean? – Do as much writing as you can and learn by doing it. The only way you can learn to write is by doing it!

Not convinced the creative writing course model is for me unless I can find the ideal Venn diagram, but lessons learnt on keeping a journal/notebook (using a combination of Fargo and blogs for now), writing every day, the importance of editing/rewriting and research. Still stuck on establishing rituals, a place to write, moving from notes to narrative. New discovery: research notes!

Bumping three ideas:

  • keep ideas floating – find a possible story you might be able to draw out of your notes, research elements for this idea and develop your journal notes
  • develop ideas – imagine more detail, do research, ask some ‘what if?’ questions
  • note down a menu of your overall concerns that are likely to be your overall subject matter or material and develop this over time to include detailed descriptions; match your concerns with the ideas in your notebook

What of the MOOC itself? Still getting the impression that FutureLearn is less intellectually rigorous than it might be, and remain untempted by the discussion facilities. Activity tailed off as ever – 247 tweets in the last 30 days, 55 in the last seven, bloggage less as the weeks went on. Here’s a final thought from Clare Hooper in Experiencing a MOOC: “the sheer volume of people participating on the course made it difficult if not impossible to feel you were part of a real community”.

Introduction to Digital Curation: weeks 3-5

Why title case? Read on to find out…

Logging back in reveals that it is possible to persuade Chrome to remember the nonsense password, hurra! The login screen shows three new messages since last login, but doesn’t exactly lead you to the content. Idiosyncratic at best. The topic for weeks 3-5 is digital curation begins at home. Twitter chat on 5 June. Having just exposed personal curation as a fraud, this should be fun. In her introductory email Jenny states that “there is something very interesting going on with the emergence of what some might term personal digital archiving and others might term community archives”, and which I might term crowdsourcing or co-creation:

How is the way we do curation different in the personal sphere from the institutional sphere and what (if anything) can we learn from that?

Me in discussion forum:

Re digital curation on the professional/institutional level I’ve found it a useful approach to capture information from academic events, where a record is seldom kept of proceedings. On the personal level I find it a useful sensemaking tool. Any professional curators here? Lines between archiving and curating seem to be a bit blurry up to now, with the stress on the digital rather than the curation part of the topic. Isn’t the role of a curator to impose a narrative? For me we need to distinguish between archiving and curating. Wikipedia: Traditionally, a curator or keeper of a cultural heritage institution (eg gallery, museum, library or archive) is a content specialist responsible for an institution’s collections and involved with the interpretation of heritage material. The problem I have with ‘personal curation’ is that there is no audience.

A couple of people responded from type 1 (see below), and browsing the forums reveals discussions around technical vs interpretation.

A typology from the what is digital curation thread, with proposed new terminology:

  1. Digital data management – digital curation as understood by the e-science and data communities – narrowly defined, highly skilled and technical practices such as those of the Digital Curation Centre. This seems to be the earliest definition (2003).
  2. Digital stewardship – the utilisation of traditional practices and skills of museums, archives and art galleries as applied to digitised materials. This would involve the acquisition, selection and careful digitisation of physical texts/materials/objects – for the primary purpose of preservation – and then the contextualised exhibition of these items within an institutional ‘space’ (whether this be physically within a museum (etc) via a digital screen, or via a website that has been ‘curated’ by a professional within the sector).
  3. Digital preservation – the ‘work undertaken to hold digital culture in trust for future generations’. This would involve the management of obsolete (or soon to be obsolete) digital data (web pages, files, etc) in such a way that it would be usable by future generations using more advanced technologies. Examples of this would be the work undertaken by the Internet Archive.
  4. Digital (social) cataloguing – content curation – the (relatively) non-technical digital equivalents of the wider cultural trend for content ‘curation’. This would mean the (knowledgeable) selection/cataloguing of digital content into (more or less) logical categories. Examples of this would be activities such as creating ‘intelligent’ YouTube or Spotify playlists focused on specific themes, or the cataloguing of diffuse links to digital content under particular topic headings.

If the primary intention of the course was to cover 1, getting involved in personal curation was bound to muddy the waters rather.

Let’s see what Jenny has for us (my bolding)!

One strong message that has led to and from the emergence of digital curation is that data stored digitally is both fragile and challenging when it comes to the question of its ongoing accessibility. In recent years, a concern has emerged, particularly among elements of the library and archive community such as the US Library of Congress, with ‘personal digital archiving’ or ‘personal archiving’. This concern reflects a desire to support individuals in managing their own digital material through the provision of information and advice.

In this section we will:

  • start to gain an understanding of what it means to undertake digital curation by considering it in a personal context
  • explore our own use of different storage media and file formats and the implications that has for the ongoing accessibility of our data
  • undertake some experiments with checksums and with exporting or format shifting our own material
  • reflect on our own practice in managing our personal digital material

Three resources, one for each week:

  • 19-25 May: the challenge of obsolescence – an introduction to storage media and file formats; “increasingly though, many of us are choosing to outsource our storage to the cloud, with the result that the storage media and the way in which it is stored becomes practically invisible” – well quite…skipping this; see Oliver Burkeman again
  • 26 May – 1 June: some strategies for digital curation – an introduction to format conversion and checksums…pass
  • 2-9 June: managing your own digital material – personal digital archiving and trusted digital repositories; “there has been some debate about the difference between digital curation and digital preservation”…see Sarah Higgins’ article on Digital curation: the emergence of a new discipline. On the forum (personal) knowledge management almost got a look-in too.

Preservation implied a passive state, where material would be mothballed in an inaccessible “dark archive” [...] Over the last few years, the focus has shifted to ensuring that digital material is managed throughout its lifecycle so that it remains accessible to those who need to use it. [...] Digital material is actively preserved, used and reused for new purposes, creating new materials. This is Digital Curation.

Adding value, creating new materials as a form of interpretation – is this (digital) curation? This is going nowhere fast, but I will check in again for week 6.

Forum challenge: write a post/s outlining:

  • the results of a survey of your own digital material. What is its extent? In what formats and on what storage media is it held?
  • the results of any experiments. Have you tried to change the format of any of your material? Did you have a play with checksums?
  • your assessment of your management of your own digital material. Do you think you manage it well and, if so, why? Do you think you should manage it differently? What are the main problems you face when trying to manage your material? Have you used any specific software or services in this context and what do they do for you?
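For anyone who does fancy a play with checksums, the idea is simple: compute a fingerprint of a file when you store it, then recompute it later to see whether the file has changed or been corrupted. A minimal sketch in Python, assuming SHA-256 (the course notes above don’t say which algorithm or tool was used); `file_checksum` and `is_unchanged` are hypothetical helper names, not anything from the course.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path, recorded_checksum):
    """True if the file still matches the checksum noted down earlier."""
    return file_checksum(path) == recorded_checksum
```

In practice you would note the checksum alongside the file when it goes into storage and run `is_unchanged` whenever the file is copied to new media or pulled back from the cloud.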

There’s also a Twitter chat on 5 June around the following questions:

  1. How good are you at looking after your own digital material?
  2. What would help you (and others) to look after their digital material better?
  3. Whose job should it be to help individuals look after their own digital material?
  4. Why do we need to look after our digital material anyway?
  5. Does digital curation begin at home?

Will it be curated? Update: no. It was quiet…