#corpusmooc practical 2: using CQPweb

This post summarises the CQPweb practicals from #corpusMOOC.

CQPweb is an online environment for corpus processing, giving access to a number of tools plus a whole range of corpora. Access to the British National Corpus is via a similar interface, BNCweb. Note that there are lots of CQPweb installations based on the IMS Open Corpus Workbench – this is Lancaster’s. Advised to use unique password – scaremongering or no?

Compared with Gephi etc it’s so easy… plus the pace of the vids is painfully slow. However, I shall persevere – Andrew has a lovely voice to listen to. All the vids and more are available on the Corpus Workbench YouTube channel.

With the mentors etc doing one to one advice, is this an online course rather than a MOOC?

Week 5:

  • standard queries – can just count hits; restrict to eg academic; see language syntax link for more
  • the concordance screen – use curly brackets for lemma queries; KWIC View and Line View; filename details; POS tags when hover over word, click for context stuff; show in random order to get more of an overview of results (useful when taking a subset to analyse); analysis options inc download under New query menu
  • corpus files – accessing metadata; always includes text ID label and no of words in text, may include classification/categorisation fields, or free fields (so much libship!); hover just shows classification fields
  • restricted queries – a query using a classification scheme; available schemes shown below the search box
  • query history – a record of all of the queries you have carried out on a corpus; allows you to rapidly replicate queries; stands as a useful record of how you conducted your research
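
To make the KWIC view and the random-order option a bit more concrete, here is a minimal Python sketch of what a concordance does conceptually. It is not how CQPweb works internally: the `kwic` function, the sample sentence and the whitespace tokenisation are all my own assumptions for illustration.

```python
import random

def kwic(tokens, node, width=4):
    """Return (left context, node, right context) for every hit of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Invented sample text, tokenised naively on whitespace (an assumption).
text = "The strange case was strange indeed , and nobody found it strange ."
tokens = text.split()
hits = kwic(tokens, "strange", width=3)

# Show the hits in random order; the seed only makes the shuffle repeatable.
rng = random.Random(0)
shuffled = hits[:]
rng.shuffle(shuffled)
for left, node, right in shuffled:
    print(f"{left:>25}  [{node}]  {right}")
```

Shuffling before reading is the point of the random-order option: the first screen of results is no longer dominated by whichever texts happen to come first in the corpus.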

Week 6:

  • the distribution option – under New query analysis options; view as table or bar chart; shows frequency per million words, ie normalised
  • the thin function – reduce hits in a query for manual analysis; also under NQ analysis options and passim
  • sorting – by words to the left/right; can restrict to POS
  • saving a query – ie save current set of hits; can also use query history
  • using wildcards – see simple query language syntax
  • finding annotation:
    • primary – usually POS tags; search for eg strange_adj to find occurrences of strange as an adjective; need to know which tag set has been used (under corpus info); can also search for POS, use wild cards…
    • secondary – usually lemmas; ie to search for all forms of a word; syntax: {lemma}
    • tertiary – usually simplified POS; syntax: _{}searchterm; can be used to disambiguate lemmas (apparently…)
  • looking for phrases – can just type in a phrase; can treat each word as a single word, ie use wild cards, combine lemma queries with word queries etc; NB issues with punctuation used within wildcards etc
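
Two of the ideas above, frequency per million words and thinning, boil down to simple arithmetic and random sampling. The sketch below shows both in Python. The function names, the hit counts and the section sizes are invented for illustration, and I am only assuming that thinning amounts to taking a random subset of hits; CQPweb's own implementation may differ.

```python
import random

def per_million(hits, corpus_size):
    """Normalise a raw hit count to a frequency per million words."""
    return hits * 1_000_000 / corpus_size

def thin(hit_list, n, seed=None):
    """Keep a random subset of n hits, preserving their original order."""
    if n >= len(hit_list):
        return list(hit_list)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(hit_list)), n))
    return [hit_list[i] for i in keep]

# Invented figures: (raw hits, words in that section of the corpus).
sections = {"academic": (150, 2_000_000), "fiction": (90, 500_000)}
for name, (hits, size) in sections.items():
    print(f"{name}: {hits} hits, {per_million(hits, size):.1f} per million words")

# Thin 100 pretend hits down to 10 for manual analysis.
all_hits = list(range(100))
sample = thin(all_hits, 10, seed=1)
print(sample)
```

With these made-up numbers the fiction section has fewer raw hits but a higher normalised frequency, which is why the distribution table is more useful than raw counts when sections differ in size. Passing the same seed again returns the same subset, so a thinned sample can be reproduced later.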

Week 7:

  • collocation – need to create collocation database for query; not rocket science
  • sub-corpora – from main menu; can use for restricted queries 
  • frequency lists – can download
  • keywords and keytags – can access frequency lists from other corpora on system
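
For the keywords idea, a keyword list compares a word's frequency in your corpus against its frequency in a reference corpus. Log-likelihood is one commonly used keyness statistic, so here is a minimal Python sketch of it, assuming that is the measure chosen. The `log_likelihood` function and all the counts are my own invention for illustration, not output from CQPweb.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness for one word across two corpora.

    freq_* are raw counts of the word, size_* are corpus sizes in words.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: word -> (count in my corpus, count in reference corpus).
counts = {"corpus": (50, 10), "the": (600, 610), "tea": (5, 40)}
my_size, ref_size = 10_000, 10_000

scores = {w: log_likelihood(a, my_size, b, ref_size) for w, (a, b) in counts.items()}
for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {score:.2f}")
```

A word with the same relative frequency in both corpora scores zero, and the score grows as the two frequencies diverge. Note that the statistic alone does not say which corpus the word is more frequent in; for that you still need to compare the normalised frequencies.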

Week 8:

  • categorising queries
  • downloading results – accessed from the concordance query under the new query menu

Also introduces BNC64, which allows you to manipulate and explore a subset of BNC spoken data, specifically to compare male and female speech.

