Since my arrival at the ONS 6 months ago, I have received a good deal of training on the sort of stuff you’d expect at a big, responsible organisation – anti-bullying and harassment training, equality and diversity training, health and safety, responsibility with information, and a raft of ‘digital awareness’ modules. I think this training is of real value, but a lot of people might see the legal requirement for it as an undue burden on business, especially for small to medium sized enterprises (SMEs). I would argue that government bodies like the ONS can maximise the value of their work and reduce some of the perceived burden on SMEs by applying Open Data philosophy to all resources, pushing beyond the common misunderstanding that ‘open data’ is just the information that can be found in spreadsheets.
Open Data Day has come around again! Hundreds of events with thousands of attendees happened over 6 continents – what a community of developers, hackers, data wranglers and designers there is out there: talk about the Digital Revolution! I was lucky enough to attend the London event and take part in an excellent project to do with the gender of London’s street names.
The project was all the more interesting because it was based on another project by hackers from Montevideo in Uruguay. They had collected their city’s street names from Open Data sources and then used a system called Genderize and a lot of manual curation to identify all the streets named after women. They’d then plotted this on a map on their project site, A-tu-nombre.
We decided to do the same thing for London. It was interesting to see how differently we approached the same project. Our assumption was that this was a project intended to highlight gender disparity, so we were concerned with plotting men vs women on our map. However, a big part of the focus in Uruguay had been to highlight the women and link to their Wikipedia pages so people could learn more about them: more about cultural history and a bit less adversarial.
Other differences became obvious in the challenge itself. For example, street naming in Montevideo often uses the full name of the person, whereas in the UK we tend to use a surname or title, which makes it much harder to automate the identification. This meant we didn’t bother with automated links to Wikipedia and just stuck with the war of the sexes (see how that looked at the end).
This is user engagement
Getting stuck in at a hackathon was a great way to build relationships with developers and Open Data users that wouldn’t normally fall into our ‘User Experience’ surveys and seminars as well as to build relationships with obvious groups like Open Knowledge. I was impressed to be working alongside local council employees and after discovering they have lots of opinions on ONS Open Data I’ll be going to visit them to hear the experience of their whole team.
Another exciting hookup was with Data Campfire who are prototyping a platform that lets data users promote their projects and link to the publishers of the data they’ve used. It’ll be so much easier to learn from our wider data users if we can get a ping from that platform whenever someone posts a new use of our data.
Perhaps the best linkup was with the original team from Uruguay who were at their own Open Data Day event and happy to give us pointers and encouragement over the course of the event. Open Data is global and it’s great to have the opportunity to engage with potential users on another continent.
For anyone that’s thinking, ‘but I don’t have the skills to go to one of these things’, I can report that it was a hugely diverse group with bloggers, designers, journalists and activists alongside the obvious programmers and data geeks. You can definitely join in and contribute at an event like this.
See our project
Open Source goes hand in hand with Open Data so check out the gender assignment code over at GitHub. Or check out the CartoDB map of London’s streets with gender.
You might notice that although it’s ‘quite good’, it’s not perfect. Long Acre is considered female for example and we had to manually intervene to stop all the lanes being genderised because Lane is a legitimate name. However, there is a reason the Open mantra is ‘release early, release often’. Rather than sit on the project until the system is perfect – many, many months from now – we can post our code and share our ideas and hopefully inspire the community straight away, just as we were inspired by the team in Uruguay.
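To give a flavour of the kind of rule involved, here is a minimal Python sketch. It is not the code from our GitHub repository: the tiny name table, the list of street-type words and the function `guess_street_gender` are all made up for illustration, and a real run would need a much larger name list or a service like Genderize.

```python
import re

# Hypothetical lookup table for illustration only. The values are not
# real Genderize output.
NAME_GENDER = {
    "victoria": "female",
    "mary": "female",
    "albert": "male",
    "george": "male",
    "lane": "male",  # illustrative value: Lane is also a given name
}

# Hypothetical list of street-type words to ignore at the end of a name.
STREET_TYPES = {"street", "road", "lane", "avenue", "close", "square", "way"}


def guess_street_gender(street_name, name_gender=NAME_GENDER):
    """Return 'female', 'male' or 'unknown' for a street name."""
    tokens = re.findall(r"[a-z]+", street_name.lower())
    # Drop a trailing street-type word so that the 'Lane' in 'Mill Lane'
    # is not mistaken for a person's name.
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        tokens = tokens[:-1]
    # Use the first token that appears in the name table.
    for token in tokens:
        gender = name_gender.get(token)
        if gender:
            return gender
    return "unknown"


for street in ["Victoria Street", "Albert Road", "Mill Lane"]:
    print(street, "->", guess_street_gender(street))
```

Even with the street-type rule in place, a lookup like this will still make mistakes on names it happens to recognise, which is why manual curation stays part of the process.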
Update: Gregor Boyd over at the Data Donkey blog has copied/extended this project for Edinburgh’s streets using a different data source and a different mapping system – check out Edinburgh streets by gender too. If you repeat/extend this project for your neighbourhood, please do comment to let us know!
In the science community this mountain or bottleneck (depending on your preferred metaphor) happens when the authors of a paper that summarises several years of work are suddenly asked to find all the metadata from all their lab books, spreadsheets and Post-It Notes and put them in a super-interoperable, machine-readable format. The problem is exacerbated by how many groups and experts collaborate on science these days and how often they move labs (or lose their USB drives).
Official Statistics producers are very much like scientific researchers. We have many collaborative groups of people with their own complex areas of expertise that all feed into the pipeline that culminates in the publication of data and analyses.
Open Data is the new kid on the block and it’s trying to muscle in on well established, effective, data production pipelines, often as a publishing exercise sitting at the end of this pipeline.
Cut me and it’s Open Data all the way through
A retrospective attempt at piecing together semantically viable metadata is an insurmountable task. Trying to bring together a comprehensive Open Data story for such a large body of work at the end of the production line is like trying to squeeze the words inside a stick of rock after you’ve rolled it.
The only way for any large process to reach the summit of Open Data is to have every contributor store, tag and share the deep metadata that’s relevant to their manipulation of the data. This applies to scientists in the field, in the lab and when writing the stats software, and it applies to the rainbow of official statistics teams too.
For example, survey questions should be linked to the statistics they’ll produce (e.g. a Linked Data URI for ‘date of birth’) and statistical bulletins should document the statistical methods and the software used to analyse the data. (Check out STATO, the Linked Data resource for statistical methods, and the EDAM ontology, a budding Linked Data resource for software.)
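As a rough sketch of what one contributor's metadata record might look like, here is a small Python example. Everything in it is an assumption for illustration: the `describe_step` function and the field names are invented, and the URIs are placeholders on example.org rather than real identifiers from STATO, EDAM or any ONS service.

```python
import json


def describe_step(question_text, concept_uri, method_uri, software_uri):
    """Bundle the metadata for one step of the pipeline into a dict."""
    return {
        "question": question_text,   # the survey question as asked
        "concept": concept_uri,      # Linked Data URI for the concept measured
        "method": method_uri,        # URI for the statistical method applied
        "software": software_uri,    # URI for the software used
    }


# Placeholder URIs only. Real records would point at published vocabularies.
record = describe_step(
    question_text="What is your date of birth?",
    concept_uri="https://example.org/concept/date-of-birth",
    method_uri="https://example.org/method/weighted-mean",
    software_uri="https://example.org/software/stats-package",
)

print(json.dumps(record, indent=2))
```

The point is less the format than the timing: each team fills in its own record as it goes, so nobody has to reconstruct it at the end.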
This might seem like a major burden of work but we expect it of scientists because we want to raise the reproducibility and robustness of their work so as to increase secondary value (re-use) and to improve trust in their findings. Those goals are very relevant to official statistics producers and eventually there will be calls for bringing these benchmarks into official data and methods.
More than that, this is how Open Data principles will most likely bring efficiency improvements within public institutions, as our internal procedures become transparent to ourselves and to the wider community.
By spreading these tasks throughout the pipeline we reduce the size of the mountain at any one point.
What will it take? Training and awareness-raising at all levels of our organisations is a must, but what about department-wide Open Data officers, a bit like our Data Protection officers, or even team-specific Open Data experts, like our First Aiders or Fire Wardens? Openness is a lens for viewing how we do our business, just like health and safety or risk management.
Please comment below (or over on Twitter @bobbledavidson) with your ideas about where you think Open Data should fit within an organisation, what that would take and where it would lead.
If you want to be followed on social media, just add #bigdata or #datascience to your posts. These are the buzzwords of the century so far and with the aid of geek chic have brought computer tech to its greatest prominence since #dotcombubble. We’re all going to get rich off Big Data or so the story goes – data is the ‘new oil’ (or information is the new oil) and Data Scientist is the ‘sexiest job of the 21st Century’. These ideas have been endlessly rebutted and reinforced over the last couple of years but regardless of how much might be hype, data is definitely the big thing of the moment.
Arguably the poor cousin of Big Data is Open Data. This is probably because venture capitalists hate the idea of just giving away their IP, USP and other acronyms but also possibly because outside of the nascent hacker communities, not many people get too excited about having machine readable access to bus timetables or waste management data.
And yet, Open Data has been getting a lot of loving attention from governments, especially in the aftermath of the global financial crash and the ubiquitous drive to cut costs via efficiency savings and perhaps even increase economic returns from government assets.
This government-sponsored open movement is incredibly timely and important. In part this is because the Open Source and Open Data movements are really priming the pump of the Data Science industry (or Digital Economy), but it also offers to increase public trust in government, something that appears a lot in the UK Statistics Authority’s Code of Practice. It also promises more globally linked-up monitoring, evaluation and strategising, which is surely required for tackling global social challenges like climate change, food security and our ageing populations.
The UK has been at the forefront of Open Data for a few years, only just being pipped to pole position by Taiwan in this year’s Open Data Index. The Office for National Statistics has been leading that charge and currently has 1213 datasets and over 20 thousand reference tables available via the ONS website – and yet there is so much further to go in opening up and unleashing the full potential of Open Data for “UK Plc” and our society.
I have been in my new role as Open Data Lead at the ONS for 3 weeks so far. It’s still early days but I’ve been excited to see the developments underway – with a new website almost finished beta testing, an API that’s also in late stage beta, and a pilot project for a Linked Data portal/API just kicking off (watch this space).
A big part of my role is community engagement and advocacy and I’ll be hoping to create a dialogue both on this blog and on social media (@bobbledavidson) on how the ONS should be pushing forward with Open Data. What data needs to be released? What format is best? I want to hear from you.
If data is the new oil, Open Data is the oil that fuels society and we need all hands at the pump.