An article by Steve Peters – stephen.peters1@sky.com, or tweeting at @SemanticTourist
Following my recent post, about trialling publication of statistical data as LinkedData, I wanted to update you on other things I’m doing to (hopefully) make it easier to find, understand and re-use DCLG’s data outputs.
I’ve been playing around with DCLG’s data on spending over £500, trying to find ways to:
- Make it more accessible – by amalgamating separate spreadsheets in to one long list of transactions, with the ability to search or filter that list in various ways; and
- Make it more useful – by including additional information about each transaction, to help the non-accountants amongst us begin to understand how or why the money was spent.
The result is this Google Fusion table and – to illustrate what can be done – this prototype Spending explorer application.
Making the data more accessible
The Fusion table brings together all data for the 2010-11 financial year, as published in 10 or spreadsheets on the DCLG site. Thanks to Google’s Fusion Table’s API, you can slice and dice this single list of transactions in a number of ways.
Here’s an example, which retrieves a Comma Separated list of all payments we’ve made to MITIE CATERING SERVICES LIMITED
http://tables.googlelabs.com/api/query?sql=SELECT%20*%20%20FROM%20334877%20where%20Supplier=’MITIE%20CATERING%20SERVICES%20LTD’
And another example, this time to get a CSV list of payments for Agency Staff – as recorded under the “Agency Staff” Expense Type.
http://tables.googlelabs.com/api/query?sql=SELECT%20*%20%20FROM%20334877%20where%20’Expense%20Type’=’Agency%20Staff’
So much easier that wading through separate spreadsheets, and thousands of lines of data – and a nice illustration too of how and why APIs are an important alternative to publishing flat, largely disconnected files on a website.
Making the data more useful
The fusion table incorporates two additional columns – i.e. information that isn’t available in the published spreadsheets.
First, is data on the organisational “Units” in DCLG which made each payment. The values here correspond to units in the organogram we published in June 2010 – with some additional entries, reflecting more recent changes to the Department’s structure.
Second, and with thanks to my colleagues in DCLG’s Finance Team, I’ve included a flag for each payment indicating whether it relates to: (a) money the Department spends on its own, day-to-day operations; or (b) money it has paid to external organisations, typically as government grants. You’ll see the Fusion table includes a new “FundingType” column, containing, with values of OVERHEADS or GRANT to indicate the type of payment.
So how does this make a difference?
In part as a personal learning exercise, I wanted to test whether and how offering data through an API with additional contextual information can make a difference.
I’m thinking about this from the perspectives of:
- Data owners and publishers – e.g. to begin to understand how APIs help us to remove some of the publishing burden, or “scale-up” to release more data, without getting sucked in to a file-based, content management quagmire.
- Data users and consumers – to determine how APIs can help you retrieve or data much more quickly and easily, understand it once its been retrieved, and blend/combine it with other sources.
The prototype Spending Explorer application is my first stab at demonstrating why APIs are good.
For example, its very easy to calculate headline trends in the data, then – using open free visualisation tools – compare, contrast and explore spending in various ways. The application provides treemaps for you to rummage through the data, starting at either total spend over the year under individual expense types, or – as in the example below – spending by DCLG Units.
Here, we can see that DCLG spent just over £311m running the organisation in 2010/11. The treemap shows us how this breaks down across individual DCLG Units, which are “clickable” for you to drill in for more detail.
Because we’re calling data through the API, we can also quickly examine and explore it from different perspectives. The application demonstrates this through its “By supplier” tab, where the start point is now a list of payments to individual suppliers, ordered by total cash value, with the number of payments made, and the average value per-payment.
Here, we can see that DCLG paid a total of £15.5m to Land Securities, in 35 separate payments. The smaller tables indicate how these payments break down across organisational units, and individual expense types.
Remember that, in these and all other cases, the application is simply asking the API to provide some data; which returns results in an open format; for the application to manipulate and display in any number of ways.
So what’s next then?
I’m hoping that I’m on to something here. I think I have proved, in a modest way, that APIs can and do offer real benefits over publishing flat, disconnected files, especially when we incorporate additional contextual information.
I think too that we can see through these examples clear advantages for data publishers and data consumers – e.g. efficient, scaleable publishing in searchable, open formats, can and does enable re-use and innovation.
As we move through the next year, I can see a number of opportunities to improve and extend the volume and quality of data available via APIs. Specific things I’m looking to do include:
1. Incorporating links to the excellent Open Corporates site’s information about registered companies – so you can begin to join the dots between payments made by DCLG and other public bodies to the same individual suppliers.
2. Aligning spending information with our soon-to-be-released, updated organogram – so you can see more clearly who in DCLG is accountable for different bits of departmental expenditure
3. Providing additional context – such as postcodes for supplier addresses (to show where in the UK our suppliers are based), or to indicate whether the supplier is a private sector or voluntary/charity sector organisation.
4. Joining up payment and budget information – so you can determine whether we’re on track to spend to agreed budget allocations.
I’m trying to move this forward in bite-sized chunks, as time and resource allows. This work is running alongside a significant restructuring of the Department’s responsibilities, and downsizing of its resources. In the mean time, please bear with me if things to be moving a little too slowly for your liking.
I would of course welcome any offers of help or support! You know where I live.