The last two posts have updated my progress in understanding the Where and the Who of public discourse on coal seam gas, but didn’t say much about the When. Analysing the temporal dynamics of public discourse — in other words, how things change — has been one of my driving interests all along in this project, so to complete this series of stock-taking articles, I will now review where I’m up to in analysing the temporal dimension.
At least, I had hoped to complete the stock-taking process with this post. But in the course of putting this post together, I made some somewhat embarrassing discoveries about the temporal composition of my data — discoveries that have significant implications for all of my analyses. This post is dedicated mostly to dealing with this new development. I’ll present the remainder of what I planned to talk about in a second installment.
The experience I describe here contains important lessons for anyone planning to analyse data obtained from news aggregation services as Factiva.
The moving window of time
The first thing to mention — and this is untainted by the embarrassment that I will discuss shortly — is that I’ve changed the way I’m making temporal graphs. Whereas previously I was simply aggregating data into monthly or quarterly chunks, I am now using KNIME’s ‘Moving Aggregation’ node to calculate moving averages over a specified window of time. This way, I can tailor the level of aggregation to the density of the data and the purpose of the graph. And regardless of the size of the time window, the time increments by which the graph is plotted can be as short as a week or a day, so the curve is smoother than a simple monthly or quarterly plot.
One reason why this feature is so useful is that the volume of news coverage on coal seam gas over time is very peaky, as shown in Figure 1 (and even the 30-day window hides a considerable degree of peakiness). Smoothing out the peaks to see long-term trends is all well and good, but it’s important never to lose touch with the fact that the data doesn’t really look that way.
Figure 1. The number of articles in my corpus over time, aggregated to a 30-day moving window. Hovering over the image shows the same data aggregated to a 90-day window. Continue reading