Category Archives: Uncategorised

Facteaser: a Knime workflow for parsing Factiva outputs

Most of the cool kids in communication and cultural studies these days are studying social media. Fake news on Facebook, Russian bots on Twitter, maladjusted manboys on Reddit — these are the kinds of research topics that are likely to score you a spot in one of the popular sessions at that big conference that everyone will be going to this year. And for the most part, rightly so, since these platforms have become an integral component of the networked public sphere in which popular culture and political discourse now unfold.

But lurking at the back of the conference programme, in the Friday afternoon sessions when the cool kids have already left for the pub or the airport, you might find some old-timers and young misfits who, for one reason or another, continue to study more traditional, less sexy forms of media. Like newspapers, for example. Or television news. Not so long ago, these were the go-to sources of data if you wanted to make claims about the state of public discourse or the public sphere.

If you don’t member member berries, you need to track down episode 268 of South Park.

Never one to follow the cool kids, I structured my whole PhD around a dataset comprising around 24,000 newspaper articles supplemented with texts from similarly uncool sources like media releases and web pages. One reason for choosing this kind of data is that it enabled me to construct a rich timeline of an issue (coal seam gas development in Australia) that reached back to a time before Twitter and Facebook even existed (member?). Another reason is that long-form texts provided good fodder for the computational methods I was interested in exploring. Topic models tends to work best when applied to texts that are much longer than 140 characters, or even the 280 that Twitter now allows. And even if you are interested primarily in social media, mainstream media can be hard to ignore, because it provides so much of the content that people share and react to on social media anyway.

So there are in fact plenty of reasons why you might still want to study texts from newspapers or news websites in the age of social media. But if you want to keep up with your trending colleagues who boast about their datasets of millions of tweets or Facebook posts assembled through the use of official platform APIs (member?), you might be in for some disappointment. Because while news texts also exist in their millions, sometimes even within single consolidated databases, you will rarely find them offered for download in large quantities or in formats that are amenable to computational analyses. The data is all there, but it is effectively just out of reach. Continue reading