#corpusmooc 6-7: language learning

See the #corpusMOOC tag for all my posts on this MOOC.

Weeks 6 and 7 are about language learning, with week 6 on textbook and dictionary construction and week 7 on learner corpora.

Week 6

Quiz after cursory view of vids:

  1. What new information can corpora bring to language teaching? – information about what forms are common (and therefore useful) in language
  2. What did the early corpus-based approaches to language teaching focus on? – vocabulary
  3. What is the most frequent verb form according to George (1963)? – simple past narrative
  4. What are the findings of Altenberg’s (1990) research? – frequent discourse items such as “so”, “well” and “right” are not handled well by grammars or dictionaries
  5. Which of these was NOT among the findings of Master’s (1987) research? – in academic writing, the indefinite article is more common than the definite article; Master’s research focused on generic uses of articles; besides, the definite article is more frequent than the indefinite article
  6. Which of these is the preferred order for modal verbs to be introduced in textbooks, according to Mindt’s research? – will, must; “will” is considerably more frequent than “must”, and Mindt’s suggestion is that the more frequent modal should be introduced first
  7. What is “the lexical syllabus”? – a syllabus for language teaching suggested by Sinclair and Renouf that is based on frequency information derived from a corpus: “the main focus of study should be on a) the commonest word forms in the language b) the central patterns of usage c) the combinations which they usually form”. Some figures: 4000-5000 different word types account for up to 95% of written texts; 1000 words account for 85% of written texts; 50 high frequency function words account for up to 60% of spoken language. See also the 100 most common written words in Danish
  8. What are “lexical bundles”? – word clusters, i.e. sequences of two or more words that occur frequently in language
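
As a rough illustration of two ideas from the quiz answers above (how much of a text the most frequent word types cover, and how lexical bundles can be pulled out as repeated word sequences), here is a minimal Python sketch. The sample text, the function names and the simple tokenizer are my own assumptions for the sake of the example, not anything from the course, and the bundle count ignores sentence boundaries for simplicity.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def coverage(tokens, top_n):
    """Share of all tokens accounted for by the top_n most frequent word types."""
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / len(tokens)

def bundles(tokens, n, min_freq=2):
    """Word sequences of length n that occur at least min_freq times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(gram): freq for gram, freq in grams.items() if freq >= min_freq}

# Invented sample text, purely for illustration (not real corpus data).
sample = ("I do not know what you mean. I do not know what to say. "
          "Do you know what I mean?")
tokens = tokenize(sample)

print(coverage(tokens, 3))   # proportion of tokens covered by the 3 commonest types
print(bundles(tokens, 3))    # three-word sequences occurring at least twice
```

On a real corpus the same two functions would give the kind of frequency information a lexical syllabus is built on, although a proper study would need better tokenization and a much larger text.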


Words in the Cambridge Advanced Learner’s Dictionary:

  • Essential: 4900 terms
  • Improver: 3300 terms
  • Advanced: 3700 terms


Of the endless stream of supplementary materials there’s a whole chunk on corpus linguistics and GIS to make me happy – see separate post.

Week 7

The warm up activity:

I learnt French and German at school. The method that was used to teach me these languages was the grammar translation method. Armed with an introduction to a grammatical feature, a grammar of the language and a dictionary, I learnt the languages largely by translating chunks of them. What second languages have you learned? How did you learn them?

Check! The language classes on my 1980s German degree were made up almost solely of translation both ways, moving on to summarising texts in German in the second year – very hard! All classes were in English, bar the ‘oral’ class. In contrast, my 2000s Danish classes were conducted in Danish from the start, with no translation exercises at all, although with increasing amounts of essay writing.

My German has stayed with me, many years later. When I can’t think of a word in Danish, the German pops out instead. And while I may have a reasonable level of proficiency in spoken and written Danish, translation calls for a different set of skills, which I don’t have (reconsider!). Another example of either/or not working, or perhaps different goals?

Going back to the thread at the end of the week reveals 199 responses, which as one long stream are difficult to digest. Is it in need of curation by the mentors, or could text analysis help reveal key themes?

The discussion question goes on to reflect on the use of corpus methods, attracting 265 responses:

How did corpora impact upon your experience of language learning? In my case they did not – none of the materials I was given to use when learning French and German were corpus based. If that is the case for you, how could the materials you used have been informed by corpus data? Can you recall experiences where corpus informed materials might have helped you? Use any corpus materials you wish to add weight to the points you make.

Updates: 14 April: scanning the responses made me feel in a class of one, but “I need a verb table” on the Guardian’s language learning blog shows I am not entirely alone. 16 April: “Learning a language is not only tough but may be dull unless it involves intellectual challenges, cultural attractions, and communicative rewards.” Universities must make languages relevant, says Oxford University Professor.


  1. What is a learner corpus? – a corpus that contains data from people learning a particular target language
  2. Which of these is NOT an example of a learner corpus? – LOB
  3. What was the most frequent error type in the essays written by Mandarin Chinese students in Chuang and Nesi’s (2006) research? – missing definite article
  4. What are the main findings of Leńko-Szymańska’s (2006) research about the rhetorical strategies used by Polish and American students? – Americans talk about their own experience while Polish students approach the topic on a more general level
  5. How can learner corpora be used in the production of dictionaries? – learner corpora can be used to identify typical learner errors that can be pointed out in the dictionary
  6. What is data-driven learning? – direct use of corpora and corpus-generated concordances in the language classroom
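
To give a feel for what a corpus-generated concordance looks like in data-driven learning, here is a minimal keyword-in-context (KWIC) sketch in Python. The function name `kwic`, the `width` parameter and the sample sentences are hypothetical choices of mine for illustration; real concordancers offer far more (sorting, lemmatisation, part-of-speech filters).

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, keyword, right context."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the keyword lines up in a single column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Invented sample sentences, purely for illustration (not learner corpus data).
sample = ("She made a decision yesterday. He made a mistake, "
          "and they made an effort to fix it.")

for line in kwic(sample, "made", width=20):
    print(line)
```

Lining up every occurrence of a word like this lets a learner see for themselves which nouns typically follow it, which is the basic idea behind using concordances directly in the classroom.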

Main errors:

  • grammatical: 85.9%
  • lexico-grammatical: 5%
  • lexical: 9.1%

Differences between native and non-native English usage:




