You’ll have seen from previous blog posts that I’ve been messing around with data from the Greater Manchester Data Synchronisation Programme, and Leeds Data Mill.
As both sites have published data on allotments, I’ve taken the opportunity to:
- Compare different approaches to providing local data on the same topic;
- Explore similarities and differences in the actual content (data variables); and
- Test the feasibility and value of using “official” data alongside “unofficial”, crowd-sourced content – in this case, allotments data published on OpenStreetMap.
I’ve approached this by developing two similar applications to map:
- Allotment sites in Salford, Trafford and Manchester Councils – sourced directly from GMDSP’s triple store;
- Allotment sites in Leeds City Council – sourced from two CSV files published on the Leeds Data Mill.
Similarities and differences?
Data publishing methods and formats
GMDSP employs quad-store technology to provide all data via a single, queryable API. Leeds Data Mill does it differently, offering data in downloadable CSV files via a data portal based on CKAN technology.
My personal preference is for API-based sources: in my view, they offer more flexibility to slice-and-dice the data we want. I also really like GMDSP’s use of a quad-store, Linked Data and a SPARQL endpoint providing data in a range of formats, including CSV. For me, the real advantage is that I can get at the URIs describing GMDSP’s data, and – therefore – link it directly, over the web, to other sources employing the same identifiers.
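To make the SPARQL route a bit more concrete, here is a minimal Python sketch of how a request for CSV results might be put together. It follows the standard SPARQL Protocol (a `query` parameter plus an `Accept` header), but the endpoint URL is a placeholder, and the query's predicates (`rdfs:label`, `geo:lat`, `geo:long`) are assumptions for illustration: the vocabulary GMDSP actually uses may well differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: substitute the real SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Illustrative query only: the predicates the publisher actually uses may differ.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?site ?label ?lat ?long WHERE {
  ?site rdfs:label ?label ;
        geo:lat ?lat ;
        geo:long ?long .
} LIMIT 10
"""

def build_request(endpoint, query, accept="text/csv"):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # The Accept header asks the endpoint for CSV-formatted results.
    return Request(url, headers={"Accept": accept})

request = build_request(ENDPOINT, QUERY)
# To run it against a real endpoint:
#   from urllib.request import urlopen
#   csv_text = urlopen(request).read().decode("utf-8")
```

The `?site` column in the results would carry the URIs, which is the part that makes linking to other sources possible.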
Now, don’t get me wrong. I’m equally comfortable working directly with CSV files. I would agree that – compared to implementing and populating a quad-store – this provides a more straightforward, arguably faster way of getting the data out there. My main issue with CSVs is that, at some point, I will want to link the data in context to other related sources. For me as the data user, doing that without URIs can create more work and increase the scope for mistakes or misrepresentation.
I’ll mention here two interesting initiatives to help unlock and link-together data published in separate CSV files:
- Open Data Institute’s CSVLint project – “CSVs looks easy, but it can be hard to make a CSV file that other people can read easily. CSVLint helps you to check that your CSV file is readable. And you can use it to check whether it contains the columns and types of values that it should”
- The W3C’s CSV on the Web Working Group – whose mission is “to provide technologies whereby data dependent applications on the Web can provide higher interoperability when working with datasets using the CSV (Comma-Separated Values) or similar formats”.
On balance, both approaches have their pros and cons, and, of course, we should congratulate GMDSP and Leeds Data Mill for their excellent work to make the data available.
Data content
There are some interesting differences between the data (attributes) about allotments in the GMDSP and Leeds outputs.
The first, and arguably most important, difference is geo-referencing. GMDSP provides grid references (as lat/long and easting/northing pairs), making it really easy for me to map the location of individual allotment sites. Leeds has instead provided an approximate address – e.g. Armley Ridge Terrace, Leeds, LS12 2QT. So, in order to map Leeds’ sites, I first needed to geo-code the location using the address string: I used this public, free Batch geo-coding service.
It would be great if local public data publishers could include map coordinates as standard – ideally, centroids for each site, as lat/long and easting/northing coordinate pairs.
GMDSP has provided some additional data about each allotment site, including the name and contact numbers for the site manager. Leeds has provided a much richer set of information, including:
- Whether the site is managed by the council or an association;
- Facilities available at each site;
- A breakdown of the number of plots on each site, by plot size; and
- Information on waiting lists.
This allowed me to incorporate additional functionality in the app – such as filtering by available facilities, and plot size.
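As a rough illustration of that kind of filtering, here is a small Python sketch. The column names (`site_name`, `managed_by`, `facilities`, `half_plots`, `full_plots`), the semicolon-separated facilities field and the sample rows are all made up for the example; the real Leeds CSV files will be laid out differently.

```python
import csv
import io

# Made-up sample data: column names and values are assumptions for illustration.
SAMPLE = """site_name,managed_by,facilities,half_plots,full_plots
Site A,Council,"Water;Toilets",4,10
Site B,Association,"Water",0,22
Site C,Association,"",6,0
"""

def load_sites(text):
    """Parse CSV text into dicts with a facilities set and integer plot counts."""
    sites = []
    for row in csv.DictReader(io.StringIO(text)):
        row["facilities"] = {
            f.strip().lower() for f in row["facilities"].split(";") if f.strip()
        }
        row["half_plots"] = int(row["half_plots"])
        row["full_plots"] = int(row["full_plots"])
        sites.append(row)
    return sites

def filter_sites(sites, facilities=(), plot_column=None):
    """Keep sites offering every requested facility and, optionally,
    at least one plot of the size named by plot_column."""
    wanted = {f.lower() for f in facilities}
    return [
        s for s in sites
        if wanted <= s["facilities"]
        and (plot_column is None or s[plot_column] > 0)
    ]

sites = load_sites(SAMPLE)
with_water_and_full_plots = filter_sites(sites, ["water"], "full_plots")
```

Holding facilities as a set keeps the "has all of these" check to a single subset comparison.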
The obvious question is: can we agree on a standard, core set of attributes that all councils should release in their data about allotments? Clearly, this would need to be tested with data users.
Blending the “official” and “unofficial” sources – OpenStreetMap (OSM)
As a contributor to OSM, I thought it would be interesting to compare its data on allotments with the “official” GMDSP and Leeds sources.
To extract data from OSM, I’ve used the Overpass API.
Queries are straightforward. For example, this:
http://www.overpass-api.de/api/xapi?way[landuse=allotments][bbox=-1.8003592385542788,53.69898436015236,-1.2903945878677803,53.94587148697487]
Gets me polygons for allotment sites within the bounding box covering Leeds City Council.
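For anyone wanting to script this, the sketch below builds the same style of query URL from a bounding box and pulls allotment polygons out of an OSM XML response. It assumes the response contains `node` elements alongside the `way` elements that reference them; the sample XML and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

XAPI_BASE = "http://www.overpass-api.de/api/xapi"

def allotments_url(west, south, east, north):
    """Build an XAPI-style query for allotment ways inside a bounding box.
    The bbox order is min longitude, min latitude, max longitude, max latitude."""
    return f"{XAPI_BASE}?way[landuse=allotments][bbox={west},{south},{east},{north}]"

def parse_ways(osm_xml):
    """Return {way_id: {"name": str or None, "coords": [(lat, lon), ...]}}."""
    root = ET.fromstring(osm_xml)
    # Index node coordinates by id so ways can be resolved to polygons.
    nodes = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.iter("node")
    }
    ways = {}
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        if tags.get("landuse") != "allotments":
            continue
        coords = [nodes[nd.get("ref")] for nd in way.iter("nd") if nd.get("ref") in nodes]
        ways[way.get("id")] = {"name": tags.get("name"), "coords": coords}
    return ways

# Invented sample response: one unnamed allotment site with three corners.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="53.80" lon="-1.60"/>
  <node id="2" lat="53.80" lon="-1.59"/>
  <node id="3" lat="53.81" lon="-1.59"/>
  <way id="100">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>
    <tag k="landuse" v="allotments"/>
  </way>
</osm>"""

ways = parse_ways(SAMPLE)
```

A missing `name` tag comes back as `None`, which makes it easy to count how many sites have a boundary but no name.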
It’s no great surprise to discover various differences between the official and OSM sources:
- GMDSP and Leeds are reporting more sites than are recorded on OSM.
- OSM is also lacking much of the attribute data about each site, and – in many cases – has only the boundary and no names.
- However, OSM arguably provides a much richer and more accurate source for mapping. Boundaries are more informative/useful than point locations. Map points are less accurate, particularly where locations are geo-coded by the user based on approximate addresses.
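One way to put a number on that last point is to reduce each OSM boundary to a centroid and measure how far it sits from the official (or geo-coded) point for the same site. The Python sketch below does this under two simplifying assumptions: it treats lat/long as flat coordinates when computing the centroid, which should be reasonable for something the size of an allotment site, and it uses a spherical Earth for the distance. The function names are made up for the example.

```python
import math

def polygon_centroid(coords):
    """Area-weighted centroid of a simple polygon given as (lat, lon) pairs.
    Treats lat/lon as planar: an approximation suited to small polygons."""
    pts = list(coords)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring if needed
    lat0, lon0 = pts[0]
    # Shift to a local origin to limit floating-point cancellation.
    local = [(lat - lat0, lon - lon0) for lat, lon in pts]
    area2 = cx = cy = 0.0
    for (y0, x0), (y1, x1) in zip(local, local[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # Degenerate polygon: fall back to the mean of its vertices.
        n = len(pts) - 1
        return (sum(p[0] for p in pts[:-1]) / n, sum(p[1] for p in pts[:-1]) / n)
    return (lat0 + cy / (3 * area2), lon0 + cx / (3 * area2))

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

# Example: offset between a polygon's centroid and a separately sourced point.
square = [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (2.0, 0.0)]
centre = polygon_centroid(square)
offset_m = haversine_m(centre, (1.0, 1.0))
```

A large offset for a given site would suggest either a poor geo-code or a mismatch between the two sources.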
Perhaps there is a case for data publishers to collaborate more closely with OpenStreetMap. Similar work by New York City provides an interesting model and case study for you to chew over.
Thanks for reading, and please do get in touch if you have any views or feedback.