#corpusmooc (and text analysis) linkage

Latest: Journalistic representations of Jeremy Corbyn in the British press

Updates: never forget – Sentiment analysis is opinion turned into code; see the Stanford Named Entity Tagger, which has three English-language classification algorithms to try, and a list of 20+ sentiment analysis APIs. Next up: Seven ways humanists are using computers to understand text; Semantic maps at DH2015; the Sensei FP7 project, making sense of human–human conversation (Gdn case study); Donald Trump’s tweets analysed; Pro-Brexit articles dominated newspaper referendum coverage; Americanisation of English.

Updates: just came across culturomics via a 2011 TEDx talk – no, stay… two researchers who helped create the Google Ngram Viewer analyse the Google Books digital library for cultural patterns in language use over time. See the Culturomics site, Science paper etc. Critique: When physicists do linguistics and Bright lights, big data. Also: EMOTIVE, a sentiment analysis project at Lboro; Laurence Anthony reviews the future of corpus tools; Sentiment and semantic analysis; analysing Twitter sentiment in the urban context, and again; Wisdom of the crowd, a research project from inter alia Demos and Ipsos MORI, launched with a look at Twitter’s reaction to the autumn statement; The six main arcs in storytelling, as identified by an AI.

Aha, a links post… I’ve got links on text analysis and related topics all over the shop – see the category and tags for text mining and sentiment analysis on this blog for starters, in particular #ivmooc 4: what? and #ivmooc 2: burst detection, plus Word clouds for text mining. Here’s a broadly corpus-related haul.

Projects:

Tools:

Corpora:

There’s no shortage of cases. Here’s a selection with particular appeal, either due to subject matter or methodology:

Blogs, Twitter…The dragonfly’s gaze looks at computational approaches to literary text analysis, with a nice post listing repositories and exploring file formats.

#corpusmooc: review that journal

[screenshot]

Updates: why take notes? The Guardian view on knowledge in an information age. What type of note taker are you?

Each week in #corpusmooc, straight after the vids, we’ve been exhorted to “update your journal”. A bit of explanation might have been an idea for those not into Lancaster’s particular form of reflective practice, and maybe “notes” would have worked better as a catch-all, but hey… As you can see there were 37 comments on this particular page (en passant, I think the comments count is new; maybe it wasn’t just me who queried what the number referred to – my initial thought was page views). But what’s to comment on?

Some people take handwritten notes, some use Wikipad or Evernote, and a couple use mindmapping “to keep the written record of the connections between ideas that come to my mind while learning and reflecting upon what I have learned”. Someone taking pen-and-paper notes commented that “I think I’m absorbing more and retaining what I learn better”. It’s particularly fun that handwritten notes are called out for being “slow” – for me a bigger problem is that underuse has left my handwriting even more appalling than it was before the advent of computers. Docear, an ‘academic literature suite’ which offers electronic PDF highlighting as well as a reference manager and mindmapping, also got a mention and looks interesting.

Hamish Norbrook has a great approach:

Pen and paper transferred to the single file “MOOC notes”: individual units filed by unit number. I try and sift as I’m going into ‘Stuff I really need in my head and not on paper’, ‘Stuff I can come back to or refer to’ and… ‘Stuff I’m unlikely to understand’.

Having never mastered mindmapping I’m a fan of the bullet point. I’ve made the biggest use of screen captures on this MOOC, thanks to Laurence Anthony introducing us to the Windows snipping tool, but in the past I’ve also tried out VideoNot.es – video watching and notetaking on one screen. Why take notes? An infographic on notetaking techniques offers some insights into the recording and retaining of information:

  • only 10% of a talk may last in your memory, but if you take and review notes you can recall about 80%
  • notetaking systems (who knew?) include the Cornell System with a cue column and notetaking and summaries areas, the outline system and the flow based system
  • writing vs typing – writing engages your brain as you form and connect letters, helping you retain more; typing gives a greater quantity of notes

Here’s an article on student notetaking for recall and understanding.

The final activity on the course is to review your journal, as I suggested in week 4. A number of people have made some progress in analysing their personal or other corpora:

  • on The Waste Land: “‘you’ features as much as ‘I’, which brought home to me how much the fragments in The Waste Land are parts/one side of a conversation, though the actual ‘you’ may not be given a voice”
  • on own notes: “Besides the classic function words such as articles, pronouns, conjunctions we use to see in corpora, I just realized that I use a lot the word ‘so’ in different contexts, especially as an adverb (I have a tendency to write things like ‘this is so interesting’, ‘this subject is so important’, etc), and as a linking word that I seem to use at the beginning of almost every paragraph.”
  • on own tweets: comments on the MOOC, but difficult to get the data (groan)

Some people have gone the whole nine yards already. Liliana Lanz Vallejo:

I loaded the notes that I took of the course and I added the comments that I wrote in all the forums. This made a total of 9,436 word tokens and 2,338 word types. Something got my attention. While in most of the English corpora that I’ve checked in this course, the pronoun “I” appears close to a rank 20, in my notes and comments corpus “I” appears in rank 2, after “the”. This is curious because the same thing happens in the corpus of tweets containing Spanish-English codeswitchings that I gathered some years ago. In it, “I” appears in rank 1 of words in English, while “the” is in rank 3. It seems that my English and the English of Tijuana’s Twitter users in my corpus is highly self-centered. We are focusing in our opinions and our actions. Of course, the new-GSL list, the LOB and Brown corpus and all the others were not made with “net-speech”. So there is a possibility of native English speakers favoring the usage of the pronoun “I” in social media or internet forums…I would need to compare my notes and comments corpus to a corpus made of forum comments, and the tweets corpus to one made of social media posts (or tweets, that would be even better).

Andrew Hardie (CQPweb guru) responds: “May this be a genre effect? Are comments/twitter posts of equivalent genre to the written data you are comparing it to? Use of 1st and 2nd person pronouns is generally considered a marker of interactivity or involvement, which is found in spoken conversation but not in most traditional formal written genres. But then, comments on here are not exactly what you would call traditional formal written genres!”. Kim Witten (mentor): “Also keep in mind that while “I” can be perceived as focused on opinions and actions, it is also often indicative of the act of sharing (e.g., “I think”, “I feel”, “I want”), which as Andrew says is a marker of interactivity or involvement. So perhaps it is inward-facing, but for the intent of being outward-connecting.”
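For anyone who fancies replicating that rank check on their own notes without firing up a corpus tool, here’s a minimal Python sketch. It assumes a hypothetical plain-text file (my_corpus.txt) and a very crude tokeniser; nothing here is from the course materials.

```python
# Minimal frequency-rank check: where do "I" and "the" land in a word list?
# my_corpus.txt is a hypothetical file of your own notes/comments.
import re
from collections import Counter

with open("my_corpus.txt", encoding="utf-8") as f:
    tokens = re.findall(r"[a-zA-Z']+", f.read().lower())

freq = Counter(tokens)
print(f"{len(tokens)} word tokens, {len(freq)} word types")

ranked = [word for word, _ in freq.most_common()]
for word in ("i", "the"):
    if word in ranked:
        print(f"'{word}' is rank {ranked.index(word) + 1} ({freq[word]} hits)")
```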

Anita Buzzi:

I generally take notes with pen and paper, so I decided to collect all the answers I gave in the two MOOCs on FutureLearn I attended, creating my own corpus delicti. I generated a word list with AntConc – word types 944, word tokens 2,937. The results: the first token is “the”, freq. 140; the second token reveals that my favourite preposition is “in”, freq. 105; then the list goes on showing “and”, “I”, “to”, “of”. I annotated the corpus in CLAWS – 3,016 words tagged, tagset C7 – and then USAS. I generated a word list in CLAWS C7 – word types 1,032, word tokens 5,910. The results show: NN (nouns) 812, JJ (general adjectives) 213, AT (articles) 201, II (prepositions) 181. I looked for VM modal verbs. The first modal, with 17 hits, is “can”, and the concordance shows it mostly in association with “be”. The second, with 15 hits, is “may”: may share, provide, be, reflect, feel, represent. The third is “would”, 10 hits: would like, would be; followed by “could”, “should” and “will” with 4 hits; “need to” just 1 hit. While the modal verbs in the London-Lund Corpus of Spoken English appear in this scale: WOULD – CAN – WILL – COULD – MUST – SHOULD – MAY – MIGHT – SHALL, the results I had from my corpus were: CAN – MAY – WOULD – COULD – SHOULD – WILL – MIGHT. Why do I use “may” so much? Probably because I was talking about specific possibility, or making deductions.

Amy Aisha Brown (mentor): “Did you take a look at your concordance lines? What does ‘may’ collocate with? That might give you a hint at why you use it so much. Another thought, I wonder if someone has put Tony’s lectures into a corpus. It could be that he uses ‘may’ often and that you have picked it up from him? Maybe you always use it often?” Tamara Gorozhankina:

I’ve collected a very small corpus of all my comments through the course (4,835 tokens), and saved them in 8 separated text files (each file for each week). I used POS annotation in CLAWS C5, and the keyword list showed: Nouns – 510 Verbs – 208 Adjectives – 177 Adverbs – 100 Personal pronouns – 74 Then I divided this tiny corpus into 2 subcorpora: the first one for the comments of the first 4 weeks and the second one for the comments of the last 4 weeks of the course. The number of tokens was balanced. After getting the results, I realised that there was an interesting shift in using personal pronouns, as I tend to generalise the ideas by using “we” in the comments of the first 4 weeks, while in the last weeks’ comments there’s a tendency to use “I” instead. These results are quite unexpected I should say.
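A rough stand-in for that kind of check, if you don’t have CLAWS to hand: NLTK’s off-the-shelf tagger (Penn Treebank tags rather than C5) run over a hypothetical file of comments, counting coarse word classes. This is a sketch only, not what Tamara actually did.

```python
# Coarse POS counts with NLTK (Penn Treebank tagset, not CLAWS C5).
# Requires NLTK data: the "punkt" tokeniser and the perceptron tagger (via nltk.download).
from collections import Counter
import nltk

with open("my_comments.txt", encoding="utf-8") as f:   # hypothetical file
    tokens = nltk.word_tokenize(f.read())

coarse = Counter(tag[:2] for _, tag in nltk.pos_tag(tokens))
for label, prefix in [("Nouns", "NN"), ("Verbs", "VB"), ("Adjectives", "JJ"),
                      ("Adverbs", "RB"), ("Personal pronouns", "PR")]:
    print(f"{label}: {coarse[prefix]}")
```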

Finally, here’s a list of all the bloggers I’ve found on this MOOC:

See the #corpusMOOC tag for all my posts on this MOOC.

#corpusmooc 8 and wrap-up

See the #corpusMOOC tag for all my posts on this MOOC. One more to come, on notetaking and blogging, plus a little text analysis…

Week 8 was on swearing, focusing on conversational English, with a disclaimer encouraging participants to “discuss and debate the topic of this step in an adult and constructive manner”. The warm-up activity asked participants to listen out for examples of bad language and make a brief note of the context, who was speaking to who [sic] and what was said, points returned to in the discussion question(s):

Did the analytical framework presented work for the data you collected? If so, which categories of bad language did you hear? If not, why not? Was the language used an issue perhaps? Were there contextual factors not present in the corpus data that seemed important to interpretation in context? Has linguistic innovation changed the use of bad language since the 1990s?

The vids looked at: what counts as ‘bad language’; developing a classification scheme for the data; whether men swear more than women (no, but they use different words); how men and women swear at the opposite gender (men swear less at women) and at their own (men use stronger words); whether different categories of swearing systematically select stronger or weaker words (quite possibly); how bad language use and age interact (the young swear more, but is age really the issue?); how bad language use and class interact (tricky); and the desirability – and viability – of looking at multiple factors at the same time, combining gender with age/class in two case studies. It’s all in Tony’s book. See also an article on Rude Britannia, and When Swedes swear, they do so in English: “often in contravention of accepted linguistic norms”. It turns out there’s a network for swearing researchers in the Nordic countries, called SwiSca, and they’ve just published a book.

No quiz; instead, “the opportunity to participate in a rigorous assessment”, similar to that in week 4, with a choice of three essay questions:

  • use the Lancaster Newsbooks Corpus to identify key themes connected with the Glencairn Uprising
  • use the Lancaster-Oslo/Bergen Corpus (LOB) to explore the use of the passive construction in different genres of written English
  • use the VU-Lancaster Advanced Writing Corpus to explore the use of linking adverbials in advanced student writing

Still not for me.

OK, so what of this MOOC as a whole, and the FutureLearn platform?

Looking first at the discussion forum that wasn’t: it felt like hard work just to find comments – click to open the list, endless scroll… and then, once out of the normal ‘workflow’, how do you get back to the comments?

I’ve had to take this screenshot down to a silly size to get everything on, which makes the point itself (click for a clearer version):

[screenshot]

The comments link at the bottom of the screen opens the list of comments for that page. You can click on poster names, but I’m not really sure why I would want to do that. To get to a broader list of comments you need to click on the square top left, which opens a window with Activity as an option (alongside To do and Progress), see the pretty graphics below.

Here you get options for everyone, following and replies, all completely out of context obv, although you can go to the thread. To find your own posts you need to look somewhere else entirely – top right, the little grey man (I didn’t add an avatar) offers My profile (plus my courses, settings, sign out), with activity, followers and following as options.

Finally, the grey block of nine in the centre at the top of the screen brings down links to courses, about and partners. It’s all a bit sparse, although the oceans of white space may in part be due to the size of my (pretty bog standard) laptop screen.

In addition I found the tone of the discussion forum offputting. Every comment was given a pat on the head, and there seemed to be little substantive discussion. Moreover, on occasion the mentors might pose a question, but the commenter may never see it as there’s no mail alert or sensible way of getting back to your comment. You’d spend more time trying to find stuff of interest than actually digesting the comments. Very disappointing, and that’s without addressing the point that participants were unable to initiate discussions outwith the defined structure of the course.

Whereas in some MOOCs the instructors are completely absent, here they were falling over each other – not a sustainable approach, and I wonder how this affected the discourse. In his final mail Tony comments: “So many of you have said that you have learned a lot from me. As always happens with corpus work, the teacher learns a lot from the students too” – all very binary. And there was no peer review – while acknowledging issues with that, it’s a further reflection of the nature of this beast.

Finally, ain’t it pretty, but what does it all mean?
[screenshot]

Under Progress, top left of screen – go me!:

[screenshot]

More broadly, while corpus linguistics is not rocket science at this level (and the conclusions often seem surprisingly subjective) it’s a technique I’m glad I know more about. For my needs there was too much on using massive corpora – some examples of smaller projects might be an idea next time out, plus less ‘pure’ linguistics. In terms of presentation it felt more like a ‘course’ aimed at a fairly traditional student cohort than something more innovative, due in part to the absence of community and curation – just a loong stream of stuff. Looking at Tony’s post on Macmillan Education, this is perhaps not altogether surprising:

Are MOOCs the future of education? Well, in my opinion, yes and no. Yes – we must use them…But then also no – MOOCs must live with, and complement, face-to-face teaching, in my view. The responsiveness and immediacy of face-to-face teaching cannot be readily provided via a MOOC. If nothing else, the scale of the enterprise defies any credible and sustained attempt at building a rapport with individual students, which is, in my experience, a key motivator for students and staff alike.

In the light of all the above it’s not surprising that Twitter never really took off, but here’s the TAGS bits n bobs: viewer | spreadsheet | spreadsheet map version | map:

[screenshot: map]

#corpusmooc and spatial humanities

Update, 2016: not so supplementary now; see Spatial Humanities 2016 (programme: short & full | @spatialhums & #SH_2016), lots of delights.

Amongst the supplementary materials in weeks 6 and 8 of #corpusmooc was Ian Gregory on the potential for using GIS in corpus linguistics, aka spatial humanities.

First up, Mapping the Lakes (and version 2):

[screenshot]

Place names were coded in XML and converted to a GIS, allowing mentions to be compared. Other features mapped included emotional response (on a scale of 1-10) and physical characteristics, ie altitude. Photos from Flickr were also incorporated. The end result permitted close reading of the text alongside a map of the area described. Next up, a corpus of Lake District writing for the period up to 1900, over a million words from 80 texts.
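The place-name-counting step is simple enough to sketch. Below is a toy Python version with a tiny hand-made gazetteer and a hypothetical text file; the real project used XML-encoded place names, a proper gazetteer and a full GIS, so treat this purely as an illustration.

```python
# Toy place-name counting: look up each gazetteer entry in the text and
# report mentions with a point location. Coordinates are illustrative only.
import re
from collections import Counter

GAZETTEER = {            # place -> (lat, lon)
    "Keswick": (54.60, -3.13),
    "Grasmere": (54.46, -3.02),
    "Ullswater": (54.59, -2.86),
}

with open("lakes_tour.txt", encoding="utf-8") as f:   # hypothetical file
    text = f.read()

mentions = Counter({p: len(re.findall(rf"\b{p}\b", text)) for p in GAZETTEER})
for place, n in mentions.most_common():
    lat, lon = GAZETTEER[place]
    print(f"{place}: {n} mentions, plotted at ({lat}, {lon})")
```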

Next, geographical text analysis:

[screenshot]

Claire Grover’s work (of Trading Consequences) on georeferencing – ie identifying place names automatically (?), pulling them out of the text and linking them up to a gazetteer to give them a point location on a map – found 17,667 instances of places mentioned in the Registrar General’s Reports, 1851-1911 (2 million words; Histpop). Recall: 81%, precision: 82%, and correct with locality: 75%. Mapping the instances and smoothing gave a pretty good reflection of major population centres in England, though with Bedford as an outlier cluster:

[screenshot]
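As a reminder of what those figures mean (rather than how they were actually calculated for this study), recall and precision come from comparing the automatic output against a hand-checked sample; the counts below are made up to reproduce the ~81%/82% headline numbers.

```python
# Recall and precision for place-name recognition, with made-up counts.
true_positives = 810   # place names found automatically and correct
false_negatives = 190  # place names in the text that the system missed
false_positives = 178  # strings tagged as place names that aren't places

recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)
print(f"recall {recall:.0%}, precision {precision:.0%}")  # recall 81%, precision 82%
```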

Analysing ‘London’ found high z-scores relating to water supply/quality, whereas the Liverpool/Manchester cluster was more descriptive of diseases, with no discourse on water supply. Exploring causes of death and mapping collocations with place names led to the following conclusions:

[screenshot]

Geographical text analysis can help us understand the geographies within a corpus. At the moment we only have recall and precision statistics of about 80%, but this will get better, and even if it doesn’t you still have most of the place names within a text. Bringing together statistical summaries from corpus linguistics and micro/close readings helps you understand what’s going on within a text, and aids decisions on which parts you perhaps need to close read and which parts you can ignore.
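For the curious, here’s one common way a collocation z-score of the kind mentioned above is computed (a Berry-Rogghe-style formulation, not necessarily the exact statistic used in this study), with made-up counts.

```python
# One common collocation z-score: observed vs expected co-occurrences of a
# collocate within a window around the node word. All counts are made up.
import math

def collocation_z(observed, node_freq, coll_freq, corpus_size, window):
    p = coll_freq / corpus_size        # background probability of the collocate
    expected = p * node_freq * window  # expected hits in the windows around the node
    return (observed - expected) / math.sqrt(expected * (1 - p))

# eg a disease term near 'London' in a 2m-word corpus, +/-5 word window (span of 10)
print(round(collocation_z(observed=120, node_freq=3000, coll_freq=400,
                          corpus_size=2_000_000, window=10), 1))
```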

More on georeferencing place names, from Putting big data in its ‘place’… the power and value of amalgamating and querying content by ‘place’ has long been recognised through the use of place name gazetteers; however, these have limitations, as they tend to record only modern place names and lack spatial resolution. Initiatives aimed at extending the scope of modern gazetteers include:

Some spatial hums linkage:

See also my post on Telling stories with maps: literary geographies.

#corpusmooc 6-7: language learning

See the #corpusMOOC tag for all my posts on this MOOC.

Weeks 6 and 7 are about language learning, with week 6 on textbook and dictionary construction and week 7 on learner corpora.

Week 6

Quiz after cursory view of vids:

  1. What new information can corpora bring to language teaching? – information about what forms are common (and therefore useful) in language
  2. What did the early corpus-based approaches to language teaching focus on? – vocabulary
  3. What is the most frequent verb form according to George (1963)? – simple past narrative
  4. What are the findings of Altenburg’s (1990) research? – frequent discourse items such as “so”, “well” and “right” are not handled well by grammars or dictionaries
  5. Which of these was NOT among the findings of Master’s (1987) research? – in academic writing, the indefinite article is more common than the definite article; Master’s research focused on generic uses of articles; besides, the definite article is more frequent than the indefinite article
  6. Which of these is the preferred order for modal verbs to be introduced in textbooks, according to Mindt’s research? – will, must;  “Will” is considerably more frequent than “must”, and Mindt’s suggestion is that the more frequent modal should be introduced first
  7. What is “the lexical syllabus”? – a syllabus for language teaching suggested by Sinclair and Renouf that is based on frequency information derived from a corpus; “the main focus of study should be on a) the commonest word forms in the language b) the central patterns of usage c) the combinations which they usually form”; 4,000-5,000 different word types account for up to 95% of written texts, 1,000 words account for 85% of written texts, and 50 high frequency function words account for up to 60% of spoken language (a rough coverage check follows below) → the 100 most common written words in Danish
  8. What are “lexical bundles”? – word clusters: sequences of two or more words that occur frequently in language:

[screenshot]
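Those coverage figures in Q7 are easy to sanity-check on any text you have lying around. A rough Python sketch, assuming a hypothetical plain-text corpus file and a crude tokeniser:

```python
# What share of all tokens do the N most frequent word types account for?
import re
from collections import Counter

with open("written_corpus.txt", encoding="utf-8") as f:   # hypothetical file
    tokens = re.findall(r"[a-zA-Z']+", f.read().lower())

counts = [n for _, n in Counter(tokens).most_common()]
total = sum(counts)
for top_n in (50, 1000, 5000):
    print(f"top {top_n:>5} types cover {sum(counts[:top_n]) / total:.0%} of tokens")
```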

Words in the Cambridge Advanced Learner’s Dictionary:

  • Essential: 4900 terms
  • Improver: 3300 terms
  • Advanced: 3700 terms

[screenshot]

Among the endless stream of supplementary materials there’s a whole chunk on corpus linguistics and GIS to make me happy – see separate post.

Week 7

The warm up activity:

I learnt French and German at school. The method that was used to teach me these languages was the grammar translation method. Armed with an introduction to a grammatical feature, a grammar of the language and a dictionary, I learnt the languages largely by translating chunks of them. What second languages have you learned? How did you learn them?

Check! The language classes on my 1980s German degree were made up almost solely of translation both ways, moving on to summarising texts in German in the second year – very hard! All classes were in English, bar the ‘oral’ class. In contrast, my 2000s Danish classes were conducted in Danish from the start, with no translation exercises at all, although with increasing amounts of essay writing.

My German has stayed with me, many years later. When I can’t think of a word in Danish, the German pops out instead. And while I may have a reasonable level of proficiency in spoken and written Danish, translation calls for a different set of skills, which I don’t have (reconsider!). Another example of either/or not working, or perhaps different goals?

Going back to the thread at the end of the week reveals 199 responses, which as a long stream is difficult to digest –  in need of curation by the mentors, or could text analysis help reveal key themes?

The discussion question goes on to reflect on the use of corpus methods, attracting 265 responses:

How did corpora impact upon your experience of language learning? In my case they did not – none of the materials I was given to use when learning French and German were corpus based. If that is the case for you, how could the materials you used have been informed by corpus data? Can you recall experiences where corpus informed materials might have helped you? Use any corpus materials you wish to add weight to the points you make.

Updates: 14 April… scanning the responses made me feel in a class of one, but “I need a verb table” on the Guardian’s language learning blog shows I am not entirely alone… 16 April: “Learning a language is not only tough but may be dull unless it involves intellectual challenges, cultural attractions, and communicative rewards.” Universities must make languages relevant, says an Oxford University professor.

Quiz:

  1. What is a learner corpus? – a corpus that contains data from people learning a particular target language
  2. Which of these is NOT an example of a learner corpus? – LOB
  3. What was the most frequent error type in the essays written by Mandarin Chinese students in Chuang and Nesi’s (2006) research? – missing definite article
  4. What are the main findings of Leńko-Szymańska’s (2006) research about the rhetorical strategies used by Polish and American students? – Americans talk about their own experience while Polish students approach the topic on a more general level
  5. How can learner corpora be used in the production of dictionaries? – learner corpora can be used to identify typical learner errors that can be pointed out in the dictionary
  6. What is data-driven learning? – direct use of corpora and corpus-generated concordances in the language classroom

Main errors:

  • grammatical: 85.9%
  • lexico-grammatical: 5%
  • lexical: 9.1%

Differences between native and non-native English usage:

[screenshots]

#corpusmooc practical 2: using CQPweb

This post summarises the CQPweb practicals from #corpusMOOC.

CQPweb is an online environment for corpus processing, giving access to a number of tools plus a whole range of corpora. Access to the British National Corpus is via a similar interface, BNCweb. Note that there are lots of CQPweb installations based on the IMS Open Corpus Workbench – this one is Lancaster’s. We’re advised to use a unique password – scaremongering or no?

Compared with Gephi etc it’s so easy… plus the pace of the vids is painfully slow, but I shall persevere – Andrew has a lovely voice to listen to. All the vids and more are available on the Corpus Workbench YouTube channel.

With the mentors etc offering one-to-one advice, is this an online course rather than a MOOC?

Week 5:

  • standard queries – can just count hits; restrict to eg academic; see language syntax link for more
  • the concordance screen – use curly brackets for lemma queries; KWIC View and Line View; filename details; POS tags when hovering over a word, click for context; show in random order to get more of an overview of results (useful when taking a subset to analyse); analysis options inc download under New query menu (a toy KWIC sketch follows this list)
  • corpus files – accessing metadata; always includes text ID label and no of words in text, may include classification/categorisation fields, or free fields (so much libship!); hover just shows classification fields
  • restricted queries – a query using a classification scheme; available schemes shown below the search box
  • query history – a record of all of the queries you have carried out on a corpus; allows you to rapidly replicate queries; stands as a useful record of how you conducted your research
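For a sense of what the concordance screen is doing under the hood, here’s a toy KWIC in Python – obviously nothing like CQPweb’s indexed implementation, and without annotation or metadata.

```python
# Toy KWIC concordance: print each hit with a few words of left/right context.
import re

def kwic(text, node, width=5):
    tokens = re.findall(r"\w+|[^\w\s]", text)
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield f"{left:>40}  [{tok}]  {right}"

sample = "The corpus shows that the word corpus appears three times in this corpus."
for line in kwic(sample, "corpus"):
    print(line)
```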

Week 6:

  • the distribution option – under New query analysis options; view as table or bar chart; shows frequency per million words, ie normalised
  • the thin function – reduce hits in a query for manual analysis; also under NQ analysis options and passim
  • sorting – by words to the left/right; can restrict to POS
  • saving a query – ie save current set of hits; can also use query history
  • using wildcards – see simple query language syntax
  • finding annotation:
    • primary – usually POS tags; search for eg strange_adj to find occurrences of strange as an adjective; need to know which tag set has been used (under corpus info); can also search for POS, use wild cards…
    • secondary – usually lemmas; ie to search for all forms of a word; syntax: {lemma}
    • tertiary – usually simplified POS; syntax: searchterm_{TAG}; can be used to disambiguate lemmas (apparently…)
  • looking for phrases – can just type in a phrase; can treat each word as a single word, ie use wild cards, combine lemma queries with word queries etc; NB issues with punctuation used within wildcards etc

Week 7:

  • collocation – need to create collocation database for query; not rocket science
  • sub-corpora – from main menu; can use for restricted queries 
  • frequency lists – can download
  • keywords and keytags – can access frequency lists from other corpora on the system (a rough keyness sketch follows this list)
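Keyness in miniature: the keywords function compares a word’s frequency in your corpus against a reference corpus. One widely used statistic is log-likelihood (corpus tools typically offer several measures); the sketch below uses made-up counts.

```python
# Log-likelihood keyness for one word: study corpus vs reference corpus.
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    expected_study = size_study * (freq_study + freq_ref) / (size_study + size_ref)
    expected_ref = size_ref * (freq_study + freq_ref) / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll

# eg 'journal' 42 times in a 10k-word notes corpus vs 120 times in a 1m-word reference
print(round(log_likelihood(42, 10_000, 120, 1_000_000), 1))
```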

Week 8:

  • categorising queries
  • downloading results – accessed from the concordance query under the new query menu

Also introduces BNC64, which allows you to manipulate and explore a subset of BNC spoken data, specifically to compare male and female speech.

#corpusmooc 5: looking at social issues

See the #corpusMOOC tag for all my posts on this MOOC.

Tony’s lecture/s this week summarised a CASS project looking at the possible reputational and other benefits of hosting the Olympics and Paralympics, published as the London 2012 media impact study.

The research questions illustrate the importance of framing the analysis – just looking at a word list may not be particularly enlightening:

[screenshot: research questions]

For RQ1, looking at the role hosting the Paralympics may have had in making the media choose preferred over dispreferred naming strategies (see the Office for Disability Issues guidelines) when talking about disabled people, the following search terms were used:

  • preferred: person with a disability, disabled, wheelchair user, wheelchair-user, uses a wheelchair
  • dispreferred: handicapped, cripple, crippled, wheelchair bound, wheelchair-bound, confined to a wheelchair
  • politically correct: differently able and handicapable (+ dispreferred…)

Exploring frequency lists over time led to some encouraging findings. Are similar trends reflected in the English language globally? Here collocates, grouped into categories (sports, society, financial…) and semantic categories (people, age indicators, extent of disability…), were explored, finding for example that there is a tendency in American English to categorise disability to a much finer degree than in British English, and to construe perceived ability on this basis (see, for instance, functionally disabled).
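A back-of-envelope version of that frequency tracking, assuming hypothetical per-year text files and only a subset of the terms above; the real study worked with properly structured newspaper corpora and the full term lists.

```python
# Count preferred vs dispreferred naming terms per year, normalised per million words.
import re

PREFERRED = ["person with a disability", "disabled", "wheelchair user", "uses a wheelchair"]
DISPREFERRED = ["handicapped", "cripple", "crippled", "wheelchair bound",
                "confined to a wheelchair"]

def per_million(terms, text, n_tokens):
    hits = sum(len(re.findall(rf"\b{re.escape(t)}\b", text, re.I)) for t in terms)
    return hits * 1_000_000 / n_tokens

for year in (2010, 2011, 2012, 2013):
    with open(f"press_{year}.txt", encoding="utf-8") as f:   # hypothetical files
        text = f.read()
    n_tokens = len(re.findall(r"\w+", text))
    print(year, "preferred:", round(per_million(PREFERRED, text, n_tokens), 1),
          "dispreferred:", round(per_million(DISPREFERRED, text, n_tokens), 1))
```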

The work went on to look at the differences in the representation of disabled people in British and American English, followed by a close look at the word ‘disabled’.

Quiz:

  1. Which of these is NOT a stage in corpus compilation as defined by Kennedy (1998: 70-85)? – corpus publication
  2. How large should a corpus be? – depends on a number of factors such as the genres we want to analyse and the frequency of the features we want to look at
  3. How large is the smallest corpus mentioned in the lecture on corpus design in week 4? – 20,000 words
  4. Which is the preferable option for storing a corpus? – storing each text as a separate file
  5. Which type of markup is compulsory for every corpus? – none
  6. What corpus data was used in the research discussed in this week’s (week 5) lecture? – both existing corpora and newspaper data gathered for the project
  7. What were the findings of the research about the effect of the Paralympics on the British newspaper discourse? – the Paralympics had a positive effect because they promoted preferred naming strategies and suppressed dispreferred ones
  8. What were the findings of the research about the differences in the representation of disabled people in British and American English? – British English is more empowering than American English because it represents disabled people more in active contexts

The quiz covers last week as well – actually quite handy to do it a week later.

I really need a quick way of getting to the full layout of the course, as on Canvas. Are you sure there’s enough supplementary material? It’s really demotivating scrolling through all the stuff I won’t do. It seems to be all vids of academics talking about their research – hmm, some well-timed articles would do that job OK. I’m reminded of #mapmooc, where the videos were optional rather than bearing the full weight of the content.

One of the readings is a critical look at software tools in corpus linguistics by Laurence Anthony; I may give that a whirl. However, after reading an article yesterday on academic writing, albeit in a US context, I’m wondering whether this style is a barrier for MOOCers outside HE. It’s an unfamiliar style for many, and may even create a barrier to learning. I tend to stick academic stuff in a folder and never go back to it, whereas I will scan a blog post. There’s a need for more engaging writing, not just for the public at large but for other professions.