Thanks to a rainy bank holiday, I’ve been exploring Land Registry’s LinkedData service for its price-paid dataset, and wanted to share some thoughts on what I’ve found and done.
Firstly, congratulations and very well done, Land Registry, for providing your data in a really accessible and useful form. With my limited knowledge of LinkedData, I was able to quickly write some queries to extract information on property transactions, mash it up with other related sources, and produce this application.
Overall, this is a far better alternative to downloading and crunching large CSV files. It’s not that I have anything against CSV; it’s just that I’d rather spend less time wading through long lists of data, and focus instead on creating things with it. In my humble opinion, the Land Registry data service hits this particular spot very nicely.
The goal – understanding the housing market from a Local Authority perspective
My start point was: what can the Price Paid dataset tell us about housing markets at the local level? I was kicking around questions like:
- what’s the average price paid for detached, semi-detached and terraced properties in my local area?
- how do prices paid vary across local postcodes and towns?
- how does the price paid change over time?
- what is the local market for new build versus established properties?
Whilst you can find the answers in the monthly CSV downloads, I’ve found that making sense of the several hundred thousand (and counting) records is a time-consuming and sometimes frustrating process.
Step forward the Land Registry LinkedData Service, and its fab data querying facility, known in the trade as a SPARQL end-point. In essence, this allowed me to issue a web request which pulls back price paid information for a particular local authority, for re-use directly in my application. I chose to get the data in JSON format, but it’s available in other formats too, including CSV, if you really want to go there!
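For anyone curious what such a web request looks like, here is a minimal Python sketch. It only builds the request and doesn’t send it. The end-point URL, the `lrppi:` and `lrcommon:` property names, and the helper names `build_query` and `build_request` are my assumptions for illustration, not necessarily what the application uses, so check them against Land Registry’s own documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed end-point URL: confirm against Land Registry's documentation.
ENDPOINT = "http://landregistry.data.gov.uk/landregistry/query"


def build_query(district: str, limit: int = 100) -> str:
    """Return a SPARQL query for price paid records in one district.

    The prefixes and property names below are assumptions for illustration.
    """
    # Escape backslashes and quotes so the name is a valid SPARQL string literal.
    safe = district.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX lrppi: <http://landregistry.data.gov.uk/def/ppi/>
PREFIX lrcommon: <http://landregistry.data.gov.uk/def/common/>
SELECT ?price ?date ?postcode WHERE {{
  ?tx lrppi:pricePaid ?price ;
      lrppi:transactionDate ?date ;
      lrppi:propertyAddress ?addr .
  ?addr lrcommon:district "{safe}" ;
        lrcommon:postcode ?postcode .
}} LIMIT {int(limit)}
"""


def build_request(district: str, limit: int = 100) -> Request:
    """Build (but do not send) an HTTP GET request asking for JSON results."""
    params = urlencode({"query": build_query(district, limit)})
    return Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
```

Passing the result to `urllib.request.urlopen` would send the request. Asking for `text/csv` in the `Accept` header is the standard SPARQL way to request CSV instead, assuming the end-point supports it.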
Once I have the data, I can crunch it to display lots of different views. In the application, I’ve chosen to show a count of transactions, and the median and average price, by month and year, type of estate (Freehold or Leasehold), type of property, postcode sector, and town.
More importantly, the data contains identifiers which allow me to link it to other related sources, over the web. For example, I use the postcode to link across to the OpenDataCommunities service I’m leading in my day job at DCLG. This gives me back the postcode’s grid reference, which I then use to display transactions on a map.
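To give a flavour of the crunching, here is a rough Python sketch of one such view: count, median and average price by month. It assumes the rows arrive in the standard SPARQL JSON results layout, with `price` and `date` variables. Those variable names and the function name `summarise_by_month` are mine, chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_month(bindings):
    """Group price paid rows by YYYY-MM and summarise each group.

    `bindings` is the list found under results -> bindings in a SPARQL
    JSON response; the `price` and `date` variable names are assumed.
    """
    prices = defaultdict(list)
    for row in bindings:
        month = row["date"]["value"][:7]  # "2013-03-15" -> "2013-03"
        prices[month].append(float(row["price"]["value"]))
    return {
        month: {
            "count": len(values),
            "median": median(values),
            "mean": mean(values),
        }
        for month, values in sorted(prices.items())
    }
```

Swapping the grouping key for estate type, property type or postcode sector gives the other views in the same way.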
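Under the hood, this linking step boils down to a join on postcode. The Python sketch below assumes the grid references have already been fetched into a simple postcode-to-(easting, northing) dictionary. The field names and both function names are hypothetical, and it doesn’t show how OpenDataCommunities is actually queried.

```python
def normalise_postcode(postcode: str) -> str:
    """Strip spaces and upper-case, so 'cb1 1aa' and 'CB11AA' match."""
    return "".join(postcode.split()).upper()


def attach_grid_refs(transactions, grid_refs):
    """Add easting/northing to each transaction whose postcode is known.

    `transactions` is a list of dicts with a 'postcode' key and `grid_refs`
    maps postcode -> (easting, northing); both shapes are assumptions.
    """
    lookup = {normalise_postcode(pc): xy for pc, xy in grid_refs.items()}
    located = []
    for tx in transactions:
        xy = lookup.get(normalise_postcode(tx["postcode"]))
        if xy is not None:  # skip postcodes with no grid reference
            located.append({**tx, "easting": xy[0], "northing": xy[1]})
    return located
```

Transactions without a matching postcode are simply left off the map in this sketch. A real application might want to log them instead.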
Of course, I could equally have used Ordnance Survey’s excellent new LinkedData platform to grab the additional geographic data I need. I went with OpenDataCommunities instead because I’m interested in linking up Land Registry’s data with the additional DCLG Housing and other statistics that we’ll be releasing via our LinkedData platform in the coming weeks and months.
Just like Land Registry’s and Ordnance Survey’s services, OpenDataCommunities provides LinkedData via a SPARQL end-point, so the querying method and structure are (broadly) the same in all cases. That means I spend less time figuring out how to extract related data from different systems, and more time doing interesting things with the outputs.
I can also use these identifiers to hook into other open data platforms. In the application, I’m again using the postcode to query Geonames, and get back a list of populated places in my area of interest. Whilst it isn’t LinkedData, Geonames has done a great job providing flexible access to a rich source of geographic intelligence, and it’s free to re-use!
One small step for man…
Services like Ordnance Survey’s, Land Registry’s and (dare I say) DCLG’s are beginning to show the true potential of the web of data, founded on open, re-usable and consistent standards. Together, they show how it is now increasingly possible to bring together related data and facts over the web, without spending hours (and hours) copying, pasting and reformatting the original source.
The key task now, and the focus of my day job, is working out how to release more data in a similarly open, accessible and re-usable form. Yes, of course there are challenges, but there are some potentially big prizes too.
In the words of one of my favourite songs – “we can create”.
If you have any thoughts on this post, or are simply confused about where to start with open data, please do get in touch and I’ll see if I can help.