A workshop on text mining was held on 9 July at Open Repositories 2012. Below are some key points from the +/- 120 subhashtagged tweets. The session was avidly live tweeted by @criticalsteph, and proceedings will be published in. due. course.
Sessions covered:
- the MEDIE intelligent search engine
- challenges in mining historical text
- using mind maps to build a controlled vocabulary (output: ChemicalTagger)
- using SentiWordNet for analysing radical contents on web forums
- extracting keywords from mailing lists to show trends using MarkMail
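To make the last item concrete, here is a minimal sketch of counting keywords per month to show trends in a mailing list. It is not how MarkMail works and was not shown at the workshop; the sample messages, the stopword list and the `keyword_trends` helper are all made up for illustration.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample data: (month, subject line) pairs standing in
# for a real mailing-list archive.
MESSAGES = [
    ("2012-05", "Repository metadata harvesting question"),
    ("2012-05", "Metadata schema for theses"),
    ("2012-06", "Text mining workshop announcement"),
    ("2012-06", "Text mining and copyright"),
    ("2012-06", "Mining repository logs"),
]

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"for", "and", "the", "of", "a", "to", "in"}


def keyword_trends(messages):
    """Count how often each non-stopword appears in each month."""
    trends = defaultdict(Counter)
    for month, text in messages:
        words = re.findall(r"[a-z]+", text.lower())
        trends[month].update(w for w in words if w not in STOPWORDS)
    return trends


trends = keyword_trends(MESSAGES)
for month in sorted(trends):
    # Print the two most frequent keywords for each month.
    print(month, trends[month].most_common(2))
```

Comparing the counts for a word such as "mining" from month to month gives a crude trend line; anything beyond a toy would need proper tokenisation and a fuller stopword list.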
Further sessions covered legal and ethical issues, for example (largely verbatim from tweets):
- for mailing lists, is harvesting addresses legal, who owns the content?
- copyright, contracts for resources, TOS, paywall, privacy and data protection law can all be barriers
- shifting sands – law is dynamic, and changing; many see money in text/data mining, which can be a catalyst to rapid change
- UK govt says non-commercial research can be an exception, although this must be done on larger scale with EU agreements
- databases allow private order to be applied – lets publishers opt out of the text mining exception. Publishers want to keep control!
- data/text mining could maybe be treated as an index? Author needs protection – maybe? There *is* an issue of author rights.
- Is student author copyright being ignored in plagiarism software, e.g. Turnitin? Legal challenge = no in USA. Unclear.
- Privacy and Data Protection UK – sensible steps to follow, quite clear & can be used in text mining without problems BUT need to do a personalisation data minimisation risk assessment on this to show intent.
Key text mining resources:
- JISC Data and text mining page – 13 programmes, 33 projects
- Value and benefits of text mining – JISC horizon scan, March 2012
- NaCTeM – National Centre for Text Mining
- discipline specific research – bound to be lots of law stuff about
- techniques – sentiment analysis/subjectivity analysis, opinion mining, affect analysis, metaphor analysis
- approaches – metadata extraction, categorisation, summarisation
- text mining over the social web – community detection, timelines
- legal aspects