Back in January the UK Government opened a web site described as “a one-stop shop for developers hoping to find inventive new ways of using government data”. The site, http://data.gov.uk/, aims to pull together government-generated data sets in a form that application developers can use: to create ‘mashups’ combining public and private data sources, to build map-based information from the data, and so on. In other words, the idea is to open up public data for private use.
I was pretty excited; professionally I’ve used some public data in the past and acquiring it is usually quite hard going. Even if you know where to find the data, it’s not easy to just grab and download, and then it comes in various formats that need pre-processing to make it useful. So this project sounded promising. I wouldn’t go so far as to say that my nipples were pinging with excitement, but there was definite anticipation.
So….my thoughts. Bottom line for me at the moment is ‘Sorry chaps, sort of getting there but there’s a long trail a-winding before you reach your goal’. Now, this may sound rather churlish of me, but allow me to explain….
Nature of data
First of all, a lot of the data on the site has been available in other places before now – however, it is at least now under one roof, so to speak. The data comes in disparate formats – CSV files and the like – and has been pre-processed, or sanitised, depending upon how you want to view it. In some cases the data is in the form of spreadsheets that are great for humans but dire for automated processing into mashups. The datasets are not always as up to date as one might expect; for example, on digging through to the Scottish Government data, I found nothing more recent than 2007.
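To give a flavour of what “great for humans but dire for automated processing” means in practice, here’s a minimal sketch of the sort of clean-up a developer ends up writing. The dataset name, layout and figures below are entirely invented for illustration – they stand in for the title rows, blank padding and totals rows that human-oriented spreadsheet exports typically contain:

```python
import csv
import io

# Hypothetical sample: a human-oriented spreadsheet export with a title
# line, blank padding, a header and a totals row -- the kind of material
# that must be stripped out before the figures can be mashed up.
RAW = """Population estimates (illustrative figures)
,
Area,Population
Aberdeen,210400
Dundee,142470
Total,352870
"""

def clean_rows(raw):
    """Strip title lines, blanks and totals, returning (area, count) pairs."""
    rows = []
    for row in csv.reader(io.StringIO(raw)):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue  # skip the title line, blank padding and the header
        if row[0].strip().lower() == "total":
            continue  # derived rows like totals don't belong in source data
        rows.append((row[0], int(row[1])))
    return rows

print(clean_rows(RAW))  # [('Aberdeen', 210400), ('Dundee', 142470)]
```

Every consumer of the data ends up writing a variant of this boilerplate; publishing machine-readable rows in the first place would remove the need.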
Use of SPARQL and RDF
Although a SPARQL query interface has been implemented to allow machine-based searching of the site, the data available via this interface seems to be pretty thin on the ground AND, to be honest, I’m not sure that the format is the best for the job. SPARQL is a means of querying data represented in the RDF format, designed for searching what’s called the ‘Semantic Web’ – a way of representing data on the Internet so that its meaning is more accessible to search tools. But for a lot of statistical data this isn’t necessarily the best way to search, and the SPARQL language is not widely used or understood by developers.
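For readers who haven’t met SPARQL, here is a sketch of what using such an endpoint involves. The query’s predicate URIs are invented for illustration (real queries depend on whichever RDF vocabulary a given dataset uses), and the response payload is hand-written rather than fetched – but it follows the W3C-standard JSON serialisation that SPARQL endpoints return:

```python
import json

# A minimal SPARQL SELECT, as one might submit to an endpoint.
# The ex: predicates are hypothetical, for illustration only.
QUERY = """
PREFIX ex: <http://example.org/stats#>
SELECT ?area ?population WHERE {
    ?s ex:area ?area ;
       ex:population ?population .
}
"""

# A hand-written sample of the standard SPARQL JSON results format:
# variables listed under "head", rows under "results" -> "bindings".
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["area", "population"]},
    "results": {"bindings": [
        {"area": {"type": "literal", "value": "Aberdeen"},
         "population": {"type": "literal", "value": "210400"}},
    ]},
})

def bindings_to_rows(payload):
    """Flatten SPARQL JSON results into plain (area, population) tuples."""
    doc = json.loads(payload)
    return [(b["area"]["value"], int(b["population"]["value"]))
            for b in doc["results"]["bindings"]]

print(bindings_to_rows(SAMPLE_RESPONSE))  # [('Aberdeen', 210400)]
```

Note the layers involved – a query language, a vocabulary, and a results format – all before a developer gets back a simple table of figures. That’s the learning curve being asked of the average mashup builder.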
There’s no API available, such as a Web Service, to get at the data. The site acknowledges this and states:
“The W3C guidance on opening up government data suggests that data should be published in its original raw format so that it’s available for re-use as soon as possible. Over time, we will convert datasets to use Linked Data standards, including access through a SPARQL end-point; this will provide an API for easy re-use.”
I think this is a rather facile argument. Apart from the data not being that up to date, one can surely publish the content of the data raw – i.e. with no numerical alterations – whilst still making it available via a SOAP, JSON or other similar API that more developers have experience of and access to. As it stands, it seems that some of the time spent on this project could have been better spent getting the data into a consistent format that could be served up to a wider range of developers.
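The point is that “raw” need only mean numerically unaltered. A minimal sketch of the idea, with a made-up dataset name and illustrative figures – the same numbers can be re-serialised as JSON without touching a single value:

```python
import json

# Illustrative figures only; the values pass through unaltered.
raw_rows = [("Aberdeen", 210400), ("Dundee", 142470)]

def as_json_api(dataset, rows):
    """Wrap unaltered figures in the kind of payload a JSON endpoint might serve."""
    return json.dumps({
        "dataset": dataset,
        "records": [{"area": a, "population": p} for a, p in rows],
    }, indent=2)

print(as_json_api("population-estimates", raw_rows))
```

Any developer who has consumed a JSON feed could work with a response like that immediately – no RDF vocabulary or SPARQL knowledge required.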
This current interface – wait for the heresy, people – may be wonderful for the Semantic Web geeks amongst us, BUT for people wishing to make widescale, real use of the data it’s NOT the best format for the majority of non-bleeding-edge developers to start working with.
This is an early-stage operation – it is labelled ‘Beta’ in the top right of the screen – so I guess we can wait for improvements. But right now it seems geared too much towards providing a sop for the ‘Open Data’ people rather than a widely usable and up-to-date resource.