
Mapping the news

Where did the last 12 months go? All I can really remember is something about being confirmed as a PhD candidate. I read a lot, and wrote a lot, but did very little of what I originally set out to do — namely, visualising and analysing text data. Now, finally, I am back in the sandpit. I’ve amassed a truckload of data in the form of news articles and blogs about coal seam gas development in Australia, and I intend to spend the next short while sifting through it and seeing what sort of sandcastles I can build before the tide of my next PhD milestone forces me to construct something more substantial.

The ultimate aim of my PhD is to explore how computational text analysis techniques such as topic modelling can assist in the analysis of public discourse. But for now, my objective is to get acquainted with my data. This data is divided into two piles, each representing a part of the discursive landscape around coal seam gas (or CSG) in Australia (if you’re American, think coalbed methane). One pile of data consists of texts published on the web by a range of actors (the sociology kind, not the Hollywood kind) including community groups, activists, lobbyists and politicians. I’ve siphoned these texts from a variety of websites using a data-crawling tool called import.io. The second, much larger, pile of data consists of news articles from hundreds of Australian mainstream media publications, from the national broadsheet right down to the local rags. I gathered these articles from the online news database Factiva with the help of a script that converts Factiva’s HTML output into tabular CSV files (the script is available from the website of the conversation analysis tool Discursis).

This post is devoted to exploring the second pile of data — the many thousands of news articles that I gathered from Factiva. Without attempting any fancy text analysis, I aim to get a first look at the overall volume, scope and diversity of the content. The focus in this post is on the overall volume and the geographic distribution of the content. In a future post, I plan to explore the specific news sources in more detail.
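As a rough illustration of this kind of first look, the sketch below tallies articles per year and per value of any other column in a CSV export. It is a minimal sketch under stated assumptions: the column names "date" and "region", and the ISO date format (YYYY-MM-DD), are hypothetical and not necessarily what the Discursis conversion script produces. The sample rows are made up purely to show the expected input shape.

```python
import csv
import io
from collections import Counter


def articles_per_year(csv_text: str, date_column: str = "date") -> Counter:
    """Count articles per year from a CSV export.

    Assumes (hypothetically) that dates are stored as ISO strings such as
    "2012-05-02"; rows whose date does not start with a four-digit year are skipped.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(date_column) or "").strip()
        if len(value) >= 4 and value[:4].isdigit():
            counts[int(value[:4])] += 1
    return counts


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count articles per distinct value of a column (e.g. a hypothetical "region")."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:  # skip rows with an empty or missing value
            counts[value] += 1
    return counts


if __name__ == "__main__":
    # Made-up rows, purely to illustrate the assumed CSV layout.
    sample = (
        "date,source,region\n"
        "2011-03-01,Paper A,NSW\n"
        "2012-05-02,Paper B,QLD\n"
        "2012-07-09,Paper A,NSW\n"
    )
    print(articles_per_year(sample))
    print(count_by_column(sample, "region"))
```

Geographic distribution would in practice need a mapping from each publication to a location; here that is simply assumed to exist as a column.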

Mapping concepts, comparing texts

In the previous post, I explored the use of function words — that is, words without semantic content, like ‘it’ and ‘the’ — as a way of fingerprinting documents and identifying sets that are composed largely of the same text. I was inspired to do this when I realised that the dataset that I was exploring — a collection of nearly 900 public submissions to an inquiry by the New South Wales parliament into coal seam gas — contained several sets of documents that were nearly identical. The function-word fingerprinting technique that I used was far from perfect, but it did assist in the process of fishing out these recycled submissions.
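The fingerprinting idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in the previous post: the function-word list is a short hypothetical one, each document is represented by the relative frequencies of those words, and cosine similarity with an arbitrary 0.95 threshold stands in for whatever comparison was actually used.

```python
import math
import re
from collections import Counter
from itertools import combinations

# A small, illustrative list of English function words. This list is an
# assumption for the example, not the list used in the original analysis.
FUNCTION_WORDS = [
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on",
    "at", "for", "with", "by", "it", "is", "was", "that", "this",
    "as", "be", "not", "we", "they",
]
FUNCTION_WORD_SET = set(FUNCTION_WORDS)


def fingerprint(text: str) -> list[float]:
    """Relative frequency of each function word among the function words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in FUNCTION_WORD_SET)
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(docs: dict[str, str], threshold: float = 0.95) -> list[tuple[str, str, float]]:
    """Return document pairs whose function-word fingerprints are at least `threshold` similar."""
    prints = {name: fingerprint(text) for name, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(prints), 2):
        score = cosine(prints[a], prints[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs
```

Because function words say little about topic, two submissions built from the same template tend to yield near-identical vectors even when a few content words differ. Short texts, however, give noisy fingerprints, so flagged pairs still need checking by eye.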

That exercise was really a diversion from the objective of analysing the semantic content of these submissions — or in other words, what they are actually talking about. Of course, at a broad level, what the submissions are talking about is obvious, since they are all responses to an inquiry into the environmental, health, economic and social impacts of coal seam gas activities. But each submission (or at least each unique one) is bound to address the terms of reference differently, focussing on particular topics and making different arguments for or against coal seam gas development. Without reading and making notes about every individual submission, I wanted to know the scope of topics that the submissions discuss. And further to that, I wanted to see how the coverage of topics varied across the submissions.

Why did I want to do this? I’ll admit that my primary motivation was not to learn about the submissions themselves, but to try my hand at some analytical techniques. Ultimately, I want to use computational methods like text analytics to answer real questions about the social world. But first I need some practice at actually doing some text analytics, and some exposure to the mechanics of how it works. That, more than anything else, was the purpose of the exercise documented below.

The bottom-right cluster. All of these documents except Submission 0655 draw on the same template.

Using junk words to find recycled text

Newton’s third law of motion — that for every action, there is an equal and opposite reaction — would appear to apply to the coal seam gas industry in Australia. The dramatic expansion of the industry in recent years has been matched by the community’s equally dramatic mobilisation against it. As my previous post showed, there are literally dozens of organisations on the web (and probably even more on Facebook) concerned in some way with the impacts of coal seam gas development. Some of these are well-established groups that have incorporated coal seam gas into their existing agendas, but many others seem to have popped up out of nowhere.

Most of these groups could be classified as community organisations insofar as they are concerned with a specific region or locality. But to think of them all as ‘grassroots’ organisations, each having emerged organically of its own accord, might be a mistake. As the website network in my last post suggests, many of these groups might better be thought of as ‘rhizomatic’ (or lateral) offshoots inspired by the Lock the Gate Alliance. Lock the Gate emerged in 2010 and quickly reconfigured the landscape of community opposition to coal seam gas. Its campaigns, strategies and symbolism provided a handy template upon which locally focussed organisations could form. You’ll be hard-pressed to find a community-based anti-CSG group without a link to Lock the Gate on its website.

The lesson here is that voices that appear to be independent may to some extent be influenced or assisted by a small handful of highly motivated (or well resourced) groups or individuals. Having observed this possibility in the network of anti-CSG websites, I recently encountered it again while sifting through a very different dataset that I am preparing for textual analysis. The dataset in question is the 893 public submissions that the Parliament of New South Wales received in response to its 2011 inquiry into the environmental, health, economic and social impacts of coal seam gas activities. The submissions came from all kinds of stakeholders, including community groups, gas companies, scientific and legal experts, government agencies, and individual citizens. Of particular interest to me were the 660 submissions from individual citizens. Here was a sizable repository of views expressed straight from the minds and hearts of individual people, undistorted by the effects of groupthink or coordinated campaigns. Or so I thought.

Seeing who’s who in the CSG zoo

As explained on the About page, I have recently commenced a PhD. Barely eight weeks in, I am still enjoying what I’ve heard described as the honeymoon phase, where all I need to do — indeed all I really can do — is immerse myself in new ideas, literature, software, and on-campus drinking outlets. You could also call it the pig-in-mud phase, since that is essentially what such activities equate to for someone like me. Milestones and progress indicators will come later (though soon enough I am sure!); right now my concern is to arm myself with knowledge and tools for the long road ahead, wherever that road may end up leading.

In broad terms, the task I have set myself is to use digital methods to explore and make sense of some social phenomenon, such as the discourse around a contentious issue. The case study that I have chosen to get me started is the recent expansion of the coal seam gas (CSG) industry in Australia. Coal seam gas (known as coalbed methane in the United States) has been used in Queensland to generate electricity for more than 20 years, but the industry has boomed since about 2008 as local gas companies race to get their gas into the global export market. This ‘gas rush’ has brought the industry into previously uncharted territories — first the prime agricultural areas of the Darling Downs in southern Queensland, and now several agricultural and pastoral regions of New South Wales.

To cut a long story short, a lot of people have become very concerned about coal seam gas. And while many of these concerns have been expressed through traditional, on-the-ground methods like marches and blockades, much of the debate around coal seam gas has unfolded, or at least been documented, on the web. A wide range of voices can be found, including those of individual citizens, grassroots action groups, seasoned lobbyists, government agencies, industry groups, research institutions, and of course, the news media. All of which makes the issue a perfect case study for my present purposes.

I should make clear, however, that I do have a personal interest in the topic. As a State Government employee, I worked for several years on a project about coal seam gas water management. I have also worked for about a year in a research centre at the University of Queensland that is concerned with both the impacts and the opportunities presented by coal seam gas. So I’ve become acquainted with a few small corners of the industry and how it is managed, and I have watched the debate around the industry unfold over a number of years. And while I have my own thoughts about the merits or otherwise of the industry, I am not going to discuss them here. I am not concerned with who is right or wrong, but with how the debate around coal seam gas has developed. Who are the participants? How do they and their discourses interact? Which ideas and beliefs have shaped the debate, and how?