#edDDI: Digital Day of Ideas 2015

2016 update: #DigScholEd was liveblogged by Nicola Osborne. Keynotes from literary historian Ted Underwood on Predicting the past, a distant reading type approach to digital libraries, Lorna Hughes on Content, co-curation and innovation: digital humanities and cultural heritage collaboration, and Karen Gregory on Conceptualizing digital sociology.

Bumped/rewritten post – see below for brief mentions of #edDDI in 2014 and 2013 and other #digitalhss doings.

From the #digitalhss stable came Digital Day of Ideas 2015 (#EdDDI | TAGSExplorer – see graph) on 26 May, livetweeted, blogged and Storified by Lorna Campbell (@LornaMCampbell), with recordings of the talks to come.

Speakers and outputs:

Other #edDDIs:

#digitalhss in four keys: medicine, law, bibliography and crime, workshop on 12 November 2013, liveblogged by Nicola Osborne:

  • Digital articulations in medicine (Alison Crockford) – ah, the Surgeons’ Hall…seeks to illuminate the relationship between literature and medicine in Edinburgh through the development of a digital reader,  joining together not only the literary and medical spheres but also the rapidly expanding field of the digital and the medical humanities; interesting points on the nature of digihum and public engagement issues, see Dissecting Edinburgh for more
  • Rethinking property: copyright law and digital humanities research (Zhu Chen Wei) – the entrenched idea of copyright as an exclusive property regime is ill suited for understanding digihum research activities; how might copyright law respond to the challenges posed by digital humanities research, in particular the legality of mass digitisation of scholarly materials and the possible copyright exemption for text and data mining
  • Building and rebuilding a digital catalogue for modern Chinese Buddhism (Gregory Scott) – the Digital Catalogue of Chinese Buddhism is a collection of data on over 2300 published items with a web based, online interface for searching and filtering its content; can the methods and implications of working with a large number of itemised records, bibliographic or otherwise, be applied to other projects?; channelling Borges’ library of Babel 
  • Digitally mapping crime in Edinburgh, 1900-1939 (Louise Settle) – specifically an historical geography of prostitution in Edinburgh; used Edinburgh Map Builder, developed as part of the Visualising Urban Geographies project, which allows you to use National Library of Scotland maps, Google Maps and your own data; viz helps you spot trends and patterns you may not have noticed before;  for locations elsewhere in UK Digimap includes both contemporary and historical maps; Historypin uses historical photography to create maps, (EH4, plus come in #kierkegaard); see also the Edinburgh Atlas

See also the workshop on data mining on 19 November 2013.

Sagas and space (4-5): cosmography and cartography

Week 4 was entitled Cosmography: descriptions of the world in medieval texts:

This week’s main topic will be the cosmography of the North in the Middle Ages and the Early Modern Period and as such continues the discussion of pre-Christian cosmology in Week 1…The central question of all the sources is: How did people in the Middle Ages and the Early Modern Period conceive the North as a system of space and how did they represent this spatial system in texts, images, signs? Two famous works will be at the centre of our attention, the so called Itinerary by the Icelandic Abbot Nikulás, and the so called Carta Marina map by the Swede Olaus Magnus with its sea monsters.

Abbot Nikulás’ itinerary, aka Leiðarvísir og borgarskipan (Wikipedia), was published around 1157 and takes the form of a guidebook for pilgrims about routes from northern Europe to Rome and Jerusalem. Gosh. The wrap-up states that “many of the contributions you posted on cosmography and intertextuality were extremely good” – the number of contributions may have fallen off a cliff, but leaves a fully engaged hard core. Expanding on this, “a nice definition of intertextuality can be found in some of Mikhail Bakhtin’s and Julia Kristeva’s writings…Intertextuality means that a text uses another text (more or less overtly and explicitly) and thus speaks with the voice of the other text…the Bible is of course the main text which was and is re-used and re-written in the Christian tradition.”

Week 5 was entitled Cartography: mapping the North:

This week’s topic will be the cartography of the North in the Middle Ages and the Early Modern Period and as such continues the discussion of the textual cosmography from last week. This means that we will look at some of the same sources, although from different angles. The central question is still the same: How did people in the Middle Ages and the Early Modern Period conceive the North as a system of space and how did they represent this spatial system in texts, images, signs?

The videos take a closer look at some of the more prominent medieval and early modern maps in the North, in particular Olaus Magnus’ Carta Marina (Wikipedia), made in the first half of the 16th century, ie a bit on the late side, but clearly Jürg Glauser’s specialist subject.

No headache inducing theory in weeks 4-5, hence rather less interesting to a non-Vikingophile.

As it happens the latest issue of Granta has the theme of the map is not the territory, ie “the difference between the world as we see it and the world as it actually is, beyond our faulty memories and tired understanding”, with pieces that “remind us of the human cost associated with the divergence of map and territory” in, for example, Iraq, and on the present state of Russia: “Communism…made the distinction between image and reality a political art form” (source: introduction). Of the open pieces, The archive is a splendid bit of experimental writing in the form of a visualisation which provides “a means of understanding the essential aspects of a literary text, avoiding the possible confusions, or a proliferation of diverging interpretations, to which a conventional approach could give rise”. It would be interesting to tie these ideas in with the texts and maps on offer in the MOOC.

#smwbigsocialdata: getting social at CBS

On 27 February the boffins at Copenhagen Business School (aka the Computational Social Science Laboratory in the Department of IT Management) opened their doors for Social Media Week with Big social data analytics: modelling, visualization and prediction. This was the second time CSSL has participated in #smwcph, with their 2014 workshop (preso) looking at social media analytics. See also my post on text analysis in Denmark.

Wifi access was not offered, resulting in only 19 tweets, but as many of these were photos of the slides I’m not really complaining. Also no hands-on this year, all in all a bit of a lacklustre form of public engagement.

Ravi Vatrapu kicked off the workshop with a couple of definitions:

  • What is social? – involves the other; associations rather than relations, sets rather than networks
  • What is media? – time and place shifting of meanings and actions

The CSSL conceptual model:

  • social graph analytics – the structure of the relationships emerging from social media use; focusing on identifying the actors involved, the activities they undertake, the actions they perform and the artefacts they create and interact with
  • social text analytics – the substantive nature of the interactions; focusing on the topics discussed and how they are discussed

It’s a different philosophy from social network analysis, using fuzzy set logic instead of graph theory, associations instead of relations and sets instead of social networks.

Abid Hussain then presented the SODATO tool, which offers keyword, sentiment and actor attribute analysis on Twitter and Facebook (public posts only, uses Facebook Graph API). Data from (for example) a company’s wall can be presented in dashboard style, eg post distribution by month.

Next, Raghava Rao Mukkamala explored social set analytics for #Marius and other social media crises. Predictions (emotions, stock market prices, box office revenues, iPhone sales) can be made based on Twitter data.

Benjamin Flesch’s Social Set Visualizer (SoSeVi) is a tool for qualitative analysis. He has built a timeline of factory accidents and a corpus of Facebook walls for 11 companies, resulting in a social set analysis dashboard of 180 million+ data points around the time of the garment factory accidents in Bangladesh.

The dashboard shows an actor’s engagement before, during and after the crisis (time), which can also be analysed over space (how many walls did they post on). Tags are also listed, allowing text analysis to be undertaken.
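The time and space view described above can be roughed out with plain sets. This is only a minimal sketch of the general idea, not how SoSeVi itself works: the actors, walls, dates, crisis window and both helper functions are made up for illustration.

```python
from datetime import date

# Hypothetical posts: (actor, wall, date of post)
POSTS = [
    ("ann", "brand_a", date(2013, 4, 1)),
    ("bob", "brand_a", date(2013, 4, 25)),
    ("ann", "brand_b", date(2013, 4, 26)),
    ("cat", "brand_b", date(2013, 6, 1)),
    ("ann", "brand_a", date(2013, 6, 2)),
]

# Assumed crisis window, purely for illustration
CRISIS_START = date(2013, 4, 24)
CRISIS_END = date(2013, 5, 13)


def actors_by_period(posts, start, end):
    """Time dimension: group actors into sets by when they posted."""
    periods = {"before": set(), "during": set(), "after": set()}
    for actor, _wall, day in posts:
        if day < start:
            periods["before"].add(actor)
        elif day <= end:
            periods["during"].add(actor)
        else:
            periods["after"].add(actor)
    return periods


def walls_per_actor(posts):
    """Space dimension: the set of walls each actor posted on."""
    walls = {}
    for actor, wall, _day in posts:
        walls.setdefault(actor, set()).add(wall)
    return walls


periods = actors_by_period(POSTS, CRISIS_START, CRISIS_END)
# Actors engaged in all three periods: a set intersection
always_engaged = periods["before"] & periods["during"] & periods["after"]
# Actors who only turned up during the crisis: a set difference
crisis_only = periods["during"] - periods["before"] - periods["after"]
walls = walls_per_actor(POSTS)

print("always engaged:", sorted(always_engaged))
print("crisis only:", sorted(crisis_only))
print("walls per actor:", {a: len(w) for a, w in sorted(walls.items())})
```

Working with sets rather than a graph means the questions become intersections and differences (who was there throughout, who only showed up for the storm), which fits the associations-not-relations philosophy outlined earlier.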

Niels Buus Lassen and Rene Madsen then outlined some of their work with predictive modelling using Twitter. You have to buy into #some activity being a proxy for real world attention, ie Twitter as a mirror of what’s going on out in the market – a sampling issue like any other. Using a dashboard driven by SODATO they classify tweets using ensemble classifiers, for example to predict iPhone sales from 500 million plus tweets containing the keyword “iphone” (see CBS news story | article in Science Nordic).

They also used a very cool formula I nearly understood.

Last up, Chris Zimmerman gave an overview of CSSL’s new Facebook Feelings project, a counterpart to all those Twitter happiness studies. A classification of 143 different emotions on Facebook, based on mood mining from 12 million public posts, yikes. “Feeling excited” was the most popular feeling by far. Analysis can be done and correlations made on any number of aspects of the data, with an active | passive axis in addition to the positive | negative axis used in sentiment analysis. Analysis by place runs into the usual issue – only 5% of data has locality data.

Overview slides currently available from the URL below…

Art, writing and big issues

Update, 23 Nov: Copenhagen Museum hosted a panel debate on architecture, art and urban nature yesterday, no coverage traced. Speakers included Camilla Berner of the Oversete Nyheder installation at Kongens Nytorv, a simple idea which may/not have been effective in situ. The summer’s growth was cleared at the beginning of September – and it’s good to know that the square will one day be restored to its previous state. But when? I can barely remember it as a functioning square without a fence.

On Sunday some blocks of Greenlandic ice were dumped on Rådhuspladsen by go-to artist Olafur Eliasson (see comments, Classic Copenhagen), Klimakunst sees five artists installed in Østerbro’s Klimakvarter during October, while the Free Word Centre’s Weather Stations project is developing a literary response to climate change. I tend to the sceptic: like group working, it’s one of those things where the intention seems more effective than the execution, although there’s money in it, folkens…

Two recent events explored the theme. 23 October saw Pynt eller politik: kan kunst og arkitektur fremme den grønne omstilling? (Storify | YouTube). Watching the stream the debate on engagement stuck out, with participants highlighting the need for new forms of communication, perhaps reducing the dystopian angle on climate change in favour of something more positive. More idealistic was a call for more of the aesthetic, which in turn would emphasise the ethical in society and education (this works better på dansk), more solutions and positive stories, less of the victim, endless facts and figures – current discourse is too functional and economically driven. What is needed is collective action rather than passive individuals, a lifestyle and value system change away from consumption. After that the second debate, on investment, touching on the ethics of nudging, seemed old fashioned.

Kudos for the streaming and a decent Storify, but maybe the event could have tried out something a bit more innovative than people giving presos. And just wondering, are Danes really bæredygtige or bare dygtige? (Broadly, good at sustainable lifestyles or good at doing what they are told…we create society or vice versa.) I don’t have a problem finally! sorting my household waste, but I don’t really feel it’s going to make a huge difference towards CPH’s climate goals, which don’t inspire, but rather feel childishly idealistic.

28-29 October saw Environmental entanglements: art, technology and natures (spot the Rennie Mackintosh font), organised by ITU’s Energy Futures squad (new on Twitter; my bolding below):

This symposium brings together an interdisciplinary group of internationally acclaimed artists and academics in order to investigate how the arts, humanities and social sciences are responding to an increasing awareness of the complex environmental entanglements we are living in. In four themed sessions, the speakers explore alternative imaginaries and creative materializations of environmental issues. The symposium aims to foster lively cross-disciplinary conversations about the role of arts and humanities in articulating the political, scientific, social and aesthetic implications of environmental change.

It is becoming clear that a major part of the environmental problems are caused by the way our (mostly western) infrastructures are designed and that the resistance to changing existing infrastructures is often related to aesthetic issues (eg NIMBYism) and to a lack of creativity when it comes to re-imagining the very nature of these infrastructures. Therefore a growing number of artists have taken up engineering and architectural challenges as they propose ideas for spectacular and functional infrastructural constructions. In this session we will discuss what it is artists and designers can do differently from engineers and architects when it comes to re-imagining environmental infrastructures.

From the programme the following were of interest:

Incidentally, once again this wasn’t as well done as one might expect – time to revitalise event amplification – and curation?

Which is where this sort of thing comes in.

Project #marius and infostorms

(Post copied from Danegeld blog, 4 Feb 2015.)

Update, 28 Feb 2015: I gave #SMWZOOSHITSTORM a wide berth as it would just make me cross, although the CBS team commented at another Social Media Week CPH event that the story keeps on giving. The event did yield up:

Updates: 2 April 2014: story in Berlingske on the research, plus perspectives of the day from Denmark and RoW. The CPH Post, who had their own Marius fool, reported that the Jobindex spoof was pulled at around noon due to complaints, but it still seems to be there…9 April: the Zoo’s comms guy tells his story…a peer reviewed article on the saga, Marius, the giraffe: a comparative informatics case study of linguistic features of the social media discourse, was presented at the ACM’s CABS 14 conference (abstract)

A team at Copenhagen Business School has taken a look at the use of social media around Copenhagen Zoo’s recent giraffe story:

See also Tableau visualisations and the timeline of events.

Research questions:

  • how did the conversation amplitude evolve?
  • where did negative sentiment originate and how did it evolve/spread?
  • who were the main actors – for some #sna see slides 20-23; Twitter bios showed a lot of vegans, activists etc (slide 19), well organised on #some
  • what types of posts and events instigated the issue online?
  • how did CPH Zoo handle the event on social channels and how did the social media storm affect their presence? – posted both in English and Danish on its Facebook and very successful in terms of check-ins, likes etc, but commentary very negative, mainly English (slide 24-26)
  • how did other organisations deal with the crisis?

Over 80% of the data came from Twitter. Highest buzz rate: 332 posts/minute, with a second short lived spike at 20K tweets/hr re the second Marius. 50% of tweets were retweets – a reflection of sentiment?
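Figures like the buzz rate and the retweet share are simple counts over timestamped posts. Here's a minimal sketch of how they might be derived; the sample tweets and both function names are invented, and spotting retweets by an "RT @" prefix is an assumption rather than the CBS team's method.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample: (ISO timestamp, tweet text)
TWEETS = [
    ("2014-02-09T10:00:05", "RT @zoo: statement on the giraffe"),
    ("2014-02-09T10:00:40", "This is awful #marius"),
    ("2014-02-09T10:00:59", "RT @news: #marius latest"),
    ("2014-02-09T10:01:10", "Why? #marius"),
]


def peak_posts_per_minute(tweets):
    """Return (minute, count) for the busiest minute in the sample."""
    per_minute = Counter(
        # Truncate each timestamp to the start of its minute
        datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        for ts, _text in tweets
    )
    return per_minute.most_common(1)[0]


def retweet_share(tweets):
    """Fraction of tweets whose text starts with the old-style 'RT @' marker."""
    retweets = sum(1 for _ts, text in tweets if text.startswith("RT @"))
    return retweets / len(tweets)


minute, count = peak_posts_per_minute(TWEETS)
share = retweet_share(TWEETS)
print(f"peak: {count} posts in the minute starting {minute:%H:%M}")
print(f"retweets: {share:.0%}")
```

Note that a per-minute peak and a per-hour figure are not directly comparable: 332 posts/minute would only amount to 19,920 posts/hour if it were sustained for the full hour.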

Twitter offered a more direct reflection of events, in terms of volume and sentiment, and also demonstrated a more drastic reaction to network prestige factors from activists and celebs. Discourse on Facebook was different –  a more closed environment, with feelings expressed to family and friends and maybe the Zoo.

95% of the global conversation was in English, with Danish detected in only 2,220 posts. Differences in the Danish subset are particularly interesting (slide 11) – Twitter and Facebook only share 50% of the conversation – does mainstream media play a larger role in Danish society? Fewer RTs – #some used more to express oneself than to share information? But sentiment is also more neutral (slide 17), with more negative sentiment on Facebook (apart from that viral photo in support of the Zoo; ?Twitter penetration in Denmark lower, large subset of politicians, media etc).

Radian6 used for analysis, but came up short – pretty hopeless for the Danish data subset, and its automatic sentiment coding was “either super safe or super crap” (slide 16), neutral heavy, often failing to detect negative sentiment. 50 corporate communications students at CBS hand coded some data with rather different results. Much discussion over what is positive or negative in this case. Now starting to analyse YouTube comments.
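One way to put a number on "rather different results" is to compare the automatic and hand coded labels for the same posts. The sketch below uses raw agreement and Cohen's kappa, a standard chance-corrected agreement measure; the source does not say how the comparison was actually made, and the labels here are invented.

```python
from collections import Counter

# Hypothetical labels for the same six posts
AUTO = ["neutral", "neutral", "neutral", "negative", "neutral", "positive"]
HAND = ["negative", "neutral", "negative", "negative", "negative", "positive"]


def percent_agreement(a, b):
    """Share of posts given the same label by both coders."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for what two coders would match on by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each coder's label proportions, summed
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


agreement = percent_agreement(AUTO, HAND)
kappa = cohens_kappa(AUTO, HAND)
print(f"agreement: {agreement:.0%}, kappa: {kappa:.2f}")
```

A neutral heavy automatic coder, as in this toy example, can look tolerable on raw agreement while kappa shows how little it adds beyond chance.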

Was #marius an infostorm? Infostorms, a new book from two researchers in Denmark (one chairman of the Danish Nudging Network), explores whether #some “amplifies irrational social behaviour and can manipulate minds and markets” (see press release).

Denmark’s utilitarian approach towards animals is out of step with the English speaking world in particular. Some rather less robustly scientific articles have been sighted lately, and this is a topic it will be interesting to track in the future. Here’s my collection of notable #marius stories for the record:

An image in support of the Zoo’s Director went viral on my Facebook timeline at least, and an ill advised tweet from actor Pilou Asbæk, one of the hosts for Eurovision 2014 in Copenhagen, went viral on Facebook (traces of both now deleted), but it is to be hoped that organisations representing Denmark are sensitive to the issues:

Copenhagen’s visitor card

Word clouds for text mining?

Updates: July 2014 – word clouds, or maybe even wordclouds, are still with us, and making a little more sense in the big world of data. See Suprageography’s London Words, preparatory work for a Big Data and Urban Informatics workshop in Chicago. Gosh! Sep 2014: calligrams are back! July 2013: review of Textal (word cloud text analysis app).

Word clouds, who needs them.  I’ve previously considered them as a sort of cherry on the cake, but in the days of data visualisation are word clouds actually harmful or simply the ‘mullet of the Internet’?

Can you really use word clouds for serious visualisation (presentation/explanatory) or text mining (exploratory)? The second graphic for critique on the #datavis MOOC was a word cloud from the New York Times, At the National Conventions, the words they used.

For consideration:

  1. Is the graphic really ‘functional’ in the sense of facilitating basic, predictable tasks (comparing, relating variables etc)?
  2. Is it interactive enough? How could we improve its navigation?
  3. How would you improve its design? And what about its content? Should we include something else in the mix, more copy, different headlines (yes!), other related variables etc?

Summary of Alberto’s summary:

  1. What methodology was used to select the words?
  2. Navigation – confusing; does it make sense at all? Eg it is not clear  that if you click on a bubble the display will highlight the parts of the text where those words are mentioned (need to scroll to see).
  3.  The words are presented out of context. A word can mean different things depending on who says it, and on what other words surround it. Better to visualise the words as networks of relationships.
  4. Are bubbles an inadequate way of representing the data? They work for the big picture (popular words here, less popular words there), are fun and provide a nice looking first layer of information, but are not helpful to rank the words or make meaningful comparisons. Use in addition a different kind of graphic using a vertical or horizontal scale (bar graphs, slope graphs, scatter plots to see if there is a relationship between Democrat and Republican use of words).

From participants:

  • more ‘prepared data’ would be a powerful enhancement on a traditional word cloud – eg select different policy areas (health, privacy, education) to see a new set of bubbles, combine with selectable speaker sets (the candidates, their spouses, vice-presidential candidates)
  • what does it really mean if a word is used more often? the most important concepts (eg economy) are used almost equally, leaving you none the wiser; the democrats use millionaires 7 times (republicans 0), does that mean that the democrats are working for or against the rich people? the republicans use the word fail more often, what does that really tell me?
  • split the graphic into two areas – where the parties are mostly saying the same thing and where they differ significantly
  • word clouds only make sense in showing the amount of something – the Democrats say ‘family’ more times than the Republicans – what does that mean? out of context the words don’t mean much and can leave quite a bit open to misleading conjecture
  • word clouds for attention, summary and discovery – a word cloud gives an overview of the contents of a set of results
  • counting words is not analysis – it is the first step to analysis in qualitative methods
  • who is this for? too detailed for a casual viewer, and as a research tool would work better as a database driven webpage
  • in essence, the idea of displaying the frequency of words is interesting, but doesn’t give enough information to lead to objective insights
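Several of the points above (rank the words, compare the two parties on a common scale) come down to counting and normalising. A minimal sketch, with two invented snippets standing in for the speeches and hypothetical function names; the rate per 1,000 words is one possible normalisation, not the New York Times methodology.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for two sets of speeches
DEM = "family jobs economy family health economy"
REP = "economy jobs fail business economy fail fail freedom"


def rate_per_1000(text):
    """Word frequency per 1,000 words, so texts of different length compare."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: 1000 * count / total for word, count in counts.items()}


def rank_differences(text_a, text_b):
    """Words ordered from 'much more in A' down to 'much more in B'."""
    a, b = rate_per_1000(text_a), rate_per_1000(text_b)
    diffs = {word: a.get(word, 0.0) - b.get(word, 0.0) for word in set(a) | set(b)}
    # Largest difference first, ties broken alphabetically
    return sorted(diffs.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_differences(DEM, REP)
for word, diff in ranked:
    print(f"{word:10s} {diff:+8.1f}")
```

Words near zero are the ones both sides use about equally (the "none the wiser" problem), while the ends of the list show where they differ. It is still only counting, so the context objections stand.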

A couple of Wordles popped up in my feedreader last week:

  • ALT-C 2011 wordled – Sarah Horrigan wordled her own tweets from the UK’s learning technology conference and was happy to see a healthy balance between the two, plus terms such as education, listening and students sticking out more than the names of tools or technology specifics
  • Political slogans – in the run-up to Thursday’s general election in Denmark Kaas & Mulvad have wordled candidates’ campaign slogans. In the red corner emerge stem (vote), velfærd (welfare) and ansvar (responsibility), while the blues come out fighting with livet (life) and pengene (money).

Wordle is often used in event wrap-ups as a visualisation of the most tweeted words with a particular hashtag, but does a word cloud really tell us anything? I can’t really decide if the results are useful or just ‘cool’. Any creativity lies mainly in manipulating the results to suit.

Word cloud tools:

  • Textal app | review
  • Infomous – text visualisation tool, see below
  • TagCrowd (show frequencies next to words, group similar words)
  • Tagxedo  – “word clouds with styles”, create word clouds from any URL, Twitter ID etc. Update, Jan 2012: I’ve used Tagxedo to make a beagle shaped word cloud for my @beaglechat account
  • Tagul – lets you assign links to words in the cloud and other fancy things. Update, June 2012: puffin word cloud from IslandGovCamp
  • Textal – also finds pairs (words that were generally paired with the word selected) and collocates (words which the word was usually found next to), which can be exported as text files and sent to an email account; review
  • Tweet Cloud and Tweetstats both allow you to make a word cloud from your own tweets. Tweetstats also lets you Wordle your tweets.
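Collocates, in the sense of words usually found next to a chosen word, can be approximated by counting immediate neighbours. This is a rough sketch of that idea only, not Textal's own algorithm; the `collocates` function and the sample text are made up.

```python
import re
from collections import Counter


def collocates(text, target, top=3):
    """Count the words immediately before and after each use of target."""
    # Simple tokenisation: lower case, letters and apostrophes only.
    # Sentence boundaries are ignored, which a real tool might not do.
    words = re.findall(r"[a-z']+", text.lower())
    neighbours = Counter()
    for i, word in enumerate(words):
        if word != target:
            continue
        if i > 0:
            neighbours[words[i - 1]] += 1
        if i + 1 < len(words):
            neighbours[words[i + 1]] += 1
    return neighbours.most_common(top)


SAMPLE = (
    "Word clouds are fun. Word clouds are not analysis. "
    "Counting word frequencies is a first step."
)

print(collocates(SAMPLE, "word", top=1))
```

Putting a word back next to its neighbours goes a small way towards answering the "out of context" complaint about plain word clouds.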

More word clouds:

Infomous is a text visualisation tool which lets you create word clouds from a URL, RSS feed, Twitter user name or search. It’s used by The Economist for topics most commented on. Unlike most word cloud tools it’s interactive, allowing you to navigate the content on display by clicking on individual words, and offering stabs at topic importance (through word size), relationships (through connecting lines) and concepts (through groups).

Here is a link to the default cloud for this blog – WordPress.com strips out the embed code : ( There are configuration options, which I will take a look at in. due. course.

(HT: Alan Cann.)

Quantitative methods for social media research

Update: see coverage of the CCC Symposium in Copenhagen on 16 October for lots more on the perils of big data and quant methods.

The second round of #nsmnss activities took the form of a tweetchat (24 September) followed by a knowledge exchange event (26 September). Topics covered: data visualisation, populations and sampling, and big data. Notes below mainly paraphrased from Twitter. Here’s some videos too (posted 25 Jan 2013), and there’s a webinar on 8 April which I am sitting out.

Are we doing more than data mining when we analyse social media data? Which research questions is it best able to answer? What are the biggest methodological challenges when working with quantitative social media data?

  • Compared to surveys, social media data are conversations rather than responses to standardised questions.
  • Overall quality of data – social media is an expressive medium.

Social media data can often be analysed using visual methods. How can we visualise data collected by social media? How does visualisation relate to statistical analysis? What are the payoffs from using visualisations?

  • We can see patterns not visible in a table, for example can compare categories of data by visualising the size of each block of data – see Information is Beautiful.
  • Tools can explore data interactively – see Interactive Visualizations, Gapminder and Hans Rosling’s TED Talk – and track how users explore data.
  • Visualization for storytelling and illustration, or visualization for prediction? Visualization to answer research questions.
  • “Often beautiful, sometimes helpful.”  Dataviz may be adding impact and getting statistics to a wider audience, but is it adding anything methodologically? Visualisations can be misleading – remember the real purpose of the data.

What is the ‘population’ on social media platforms? How do platforms differ in population characteristics? How can we select cases or samples on social media? Is it possible to get a statistically representative sample from social media platforms? Does it matter?

  • Sampling goal: to make statements about a certain group of people or objects (webpages, tweets etc)
  • Sampling frame: list of every object in population. Internet: special issues, hard to find list of all blogs, pages etc, meaning that bias is common.
  • Huge problem with sampling – lack of demographic info on most people’s profiles makes this even harder.
  • Advantages of sampling with the Internet – cheap, fast turnaround, no interviewer effects.
  • Disadvantages – many have no or limited Internet access, so cannot generalise findings. Population characteristics unknown, meaning of behaviour may be unknown.
  • Social media users are only representative of social media users, not of any larger group – see Sampling and social media and Tortoise or the hare: social media sampling.

Social media research can involve very large datasets. What do we gain and lose with big data? How is big data changing the way we do research?

  • Are we analysing big data because it’s available rather than because it is suitable to answer our research questions? Bigger does not necessarily mean more useful.
  • Availability of big data precedes availability of suitable quant methods for analysing complex data structures – we need to understand the structure before we can analyse it.
  • cases: Waller’s study of Australian Google users | text analysis using Google books, studies by Michel et al and Heuser and Le-Khac
  •  Twitter as social network or news/broadcast medium? Example: Kwak’s study of 1.47 billion social relations.
  • Is retweeting a social relationship? Claims based on big data network graphs, eg w influences x,y & z  because w retweets them. Does an RT mean you’ve influenced someone?
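The retweet question is easier to see with a toy edge list. A minimal sketch, with invented accounts and hypothetical function names: it counts retweets made and received, which is all a retweet graph records. Whether either count means influence is exactly the open question.

```python
from collections import Counter

# Hypothetical retweet edges: (retweeter, original author)
RETWEETS = [("w", "x"), ("w", "y"), ("w", "z"), ("v", "x"), ("u", "x")]


def retweets_received(edges):
    """How often each account's tweets were retweeted (in-degree)."""
    return Counter(author for _retweeter, author in edges)


def retweets_made(edges):
    """How often each account retweeted others (out-degree)."""
    return Counter(retweeter for retweeter, _author in edges)


received = retweets_received(RETWEETS)
made = retweets_made(RETWEETS)
print("most retweeted:", received.most_common(1))
print("most active retweeter:", made.most_common(1))
```

Here w is the busiest retweeter and x the most retweeted, but the graph alone cannot say who influenced whom. The direction of any claim has to be argued separately from the counts.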