Inspired by Jamie Whyte and other Club Open Data friends, I’ve been playing with the crowd-sourced open dataset of commemorative plaques in the UK.
I’ve taken my lead from Jez Nicholson’s interesting Trello Board questions about what we might learn from the dataset. Regular readers of my irregular blog will know I’m especially keen on maps, so I decided to start there: specifically, to understand how plaques are distributed across different types of geographic area.
My story begins with data munging: lots of data munging.
Prepare to be mapped!
Open the CSV list of plaques and you’ll see – after you’ve stopped cursing malformed rows (more on this later) – that a large chunk of the entries are geo-coded: specifically 10,440 of the 11,260 rows in the UK file have a lat/long grid reference.
That got me thinking: what if I could generate maps showing the number of plaques in all sorts of geographic areas, such as Local Authorities, Wards, Parliamentary Constituencies, and Towns and Cities?
So, I ran the file through Quantum GIS to perform a “point in polygon” search against these higher-level geographic areas in England, Wales and Scotland. Points falling outside of the UK territory have been excluded – I found 315 entries with “invalid” grid references.
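For anyone curious about what a “point in polygon” search does under the hood, here is a minimal Python sketch of the classic ray-casting test. It is not how Quantum GIS implements it, and the two rectangular "areas" are made-up stand-ins for real boundary polygons (which have thousands of vertices, holes and multiple parts).

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means "inside"
    return inside


def find_area(point: Point, areas: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the name of the first area containing the point, or None."""
    for name, ring in areas.items():
        if point_in_polygon(point, ring):
            return name
    return None


# Hypothetical areas, purely for illustration
AREAS = {
    "Area A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "Area B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
```

A point that matches no area at all, so `find_area` returns `None`, is the kind of entry I treated as having an “invalid” grid reference.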
The result is this GeoJSON file (6 MB). Open it in, say, Quantum GIS, and you should see a map like the one below, showing the 10,125 plaques that are correctly geo-located within the UK.
Click on any of the points to display a list of its attributes:
Most of these attributes are the product of my “point-in-polygon” searching. Specifically:
- Attributes beginning “LAD” are the Local Authority code and name, from Ordnance Survey Boundary-Line, in which the plaque is located, as determined by its grid reference.
- “WD” attributes are the containing Ward name and code – again from Ordnance Survey Boundary-Line
- “BUA” means built-up area name and code – as defined in ONS’s datasets
- “S” and “L” are Settlements and Localities in Scotland.
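With those attributes attached to every point, counting plaques per area takes only a few lines. The sketch below is in Python rather than the tools I used, and the property name `LAD_NAME` and the sample values are assumptions for illustration: check the attribute list in the real file for the exact field names.

```python
import json
from collections import Counter

# Tiny made-up sample in the same shape as a GeoJSON FeatureCollection
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-0.14, 51.54]},
     "properties": {"id": 1, "LAD_NAME": "Example Borough"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-0.13, 51.55]},
     "properties": {"id": 2, "LAD_NAME": "Example Borough"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-2.24, 53.48]},
     "properties": {"id": 3, "LAD_NAME": "Sample City"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
     "properties": {"id": 4}}
  ]
}
"""


def count_by_property(geojson: dict, key: str) -> Counter:
    """Count features by the value of one property, skipping features that lack it."""
    counts = Counter()
    for feature in geojson.get("features", []):
        value = (feature.get("properties") or {}).get(key)
        if value:
            counts[value] += 1
    return counts


plaques = json.loads(SAMPLE)
per_authority = count_by_property(plaques, "LAD_NAME")  # hypothetical field name
```

To run it against the real file, replace `json.loads(SAMPLE)` with `json.load(open(path))`, where `path` points at the downloaded GeoJSON.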
My next step was to process the full (CSV) list of Plaques in the UK, using the Plaque ID to link back to the GeoJSON file in a demo app I’ve developed (see below).
Open the CSV in your favourite spreadsheet, and you’ll quickly discover some issues with the layout and formatting:
- Blank and “ragged” rows – caused by line breaks, mainly within the inscription column.
- Strange characters in various columns – which created difficulties parsing the file in my app.
- Single and double quotes within the text – also mainly within the inscription column, which send my app bonkers.
To fix these issues I ran the CSV file through TextWrangler (my favourite text editor) to:
- Search for and remove all blank rows
- Remove all strange characters (using TextWrangler’s handy “Zap Gremlins” feature); and
- Search and replace all in-text quotes with a vertical pipe character: so a single quote is marked as |, and double quote as ||.
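If you would rather script the clean-up than do it by hand in a text editor, the three steps above could be sketched in Python roughly as follows. This is an approximation, not what TextWrangler does internally: it assumes the file is close enough to valid CSV for Python's `csv` module to parse, and it treats anything outside printable ASCII as a "gremlin", which also strips legitimate accented letters. The sample text and column names are made up.

```python
import csv
import io
import re
from typing import List

# Anything outside printable ASCII is treated as a "gremlin" (a blunt assumption)
GREMLINS = re.compile(r"[^\x20-\x7E]")


def clean_field(value: str) -> str:
    """Clean one already-parsed CSV field."""
    value = re.sub(r"\s*[\r\n]+\s*", " ", value)  # rejoin text split by line breaks
    value = GREMLINS.sub("", value)               # drop strange characters
    # Mark in-text quotes with pipes: double quote -> ||, single quote -> |
    value = value.replace('"', "||").replace("'", "|")
    return value.strip()


def clean_rows(text: str) -> List[List[str]]:
    """Parse CSV text, drop blank rows, and clean every remaining field."""
    reader = csv.reader(io.StringIO(text, newline=""))
    cleaned = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # blank row
        cleaned.append([clean_field(cell) for cell in row])
    return cleaned


# Made-up sample: a line break and quotes inside a field, a blank row, odd characters
RAW = 'id,inscription\n1,"Here lived ""Bob"" Smith,\nthe baker\'s son"\n\n2,Caf\u00e9 \x07bell\n'
ROWS = clean_rows(RAW)
```

Writing `ROWS` back out with `csv.writer` then gives a file with one plaque per line and no stray quotes inside the text.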
The result is this CSV file (7 MB).
Visualising the results
Once the data has loaded (I use D3’s defer library for that), you should see a screen like the one below. Note that the Local Authority boundaries come from a 2 MB TopoJSON file which I created from Ordnance Survey Boundary-Line – and which you can download here.
I’ve also experimented with D3 bubble charts to show the top 20 subjects and people mentioned in the data (by number of associated plaques).
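The ranking behind a chart like that is simple to reproduce. Here is a small Python sketch of it (the charts themselves are drawn with D3); the column name `subject` and the sample rows are hypothetical, and it assumes one subject per row, which the real data may not follow.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_subjects(rows: Iterable[Dict[str, str]], column: str = "subject",
                 n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent values in a column, as (value, plaque count) pairs."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return counts.most_common(n)


# Made-up rows standing in for the cleaned CSV
SAMPLE_ROWS = [
    {"id": "1", "subject": "Person A"},
    {"id": "2", "subject": "Person B"},
    {"id": "3", "subject": "Person A"},
    {"id": "4", "subject": ""},
    {"id": "5", "subject": "Person A"},
    {"id": "6", "subject": "Person B"},
    {"id": "7", "subject": "Person C"},
]
TOP_TWO = top_subjects(SAMPLE_ROWS, n=2)
```

Each (subject, count) pair then maps naturally onto one bubble, with the count driving the bubble's size.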
Clicking on a bubble or Local Authority (in the table or map) should then display a list of associated plaques, with an interactive map of where they are: you’ll need to scroll down to see this bit.
Images in the table are clickable too, like this one below… which is my cue to leave!
Hope this has inspired you to start playing with a fantastic dataset, and find out more about your local area, or favourite subject or person.