The Office for National Statistics Big Data Project
Nigel Swier
NTTS Conference, 10-12 March 2015, Brussels
Background
• Set up as a 12-month project, Jan-Dec 2014
• Aims:
  - Investigate the potential of big data for official statistics and understand the challenges
  - Establish an ONS policy and longer-term strategy
  - Recommend practical next steps
• 3-month extension to prepare the business case for the next phase
Work packages
• Management and Strategy
• Stakeholder Engagement
• Communication
• Analysis and infrastructure: Technology; Pilots (Twitter, Smart meters, Mobile phones, Prices)
Stakeholder Engagement
• Commercial Sector
• International
• Privacy Groups
• Academia
• Government
Pilot 1: Smart meters
Potential of data from electricity smart-type meters to identify unoccupied households
• More efficient response chasing
• Data from smart meter trials in Great Britain and the Republic of Ireland
• A range of potential methods identified
• Privacy and ethics
[Figure: Smart-type meter energy use profiles, showing an occupied profile, an unoccupied profile and an anomaly]
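The slides do not say which occupancy methods were identified, so the following is only an illustrative sketch of one simple heuristic: flag a day as possibly unoccupied when its load profile stays flat. The function name `flag_flat_days`, the threshold `max_range_kwh` and the sample readings are all assumptions made for illustration, not ONS methods or trial data.

```python
def flag_flat_days(daily_readings, max_range_kwh=0.1):
    """Flag days whose load never moves far from its minimum.

    daily_readings maps a day label to a list of interval readings in kWh.
    max_range_kwh is an assumed, untuned threshold (hypothetical).
    """
    flags = {}
    for day, readings in daily_readings.items():
        # A flat profile (small max-min range) suggests baseline load only.
        flags[day] = (max(readings) - min(readings)) <= max_range_kwh
    return flags


# Made-up readings (kWh per interval), not trial data.
sample = {
    "day_1": [0.05, 0.06, 0.40, 0.90, 0.30, 0.07],  # visible peaks
    "day_2": [0.05, 0.06, 0.05, 0.07, 0.06, 0.05],  # flat baseline only
}
flags = flag_flat_days(sample)
```

A rule this simple would misclassify households with timers or unusual routines, which is one reason the anomaly profile in the figure matters.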
Pilot 2: Mobile phones
Mobile phone data to model population flows, e.g. commuting statistics
• Building relationships with mobile network operators and other parts of UK Government
• No data yet; seeking better-coordinated data access for Government
• Privacy and ethics (again)
Pilot 3: Prices
Use of web-scraped price data in price statistics
• ONS prices collection is manual
• Web scraping promises more detailed, more frequent and cheaper data
• Prototype web scrapers:
  - 35 CPI/RPI item categories
  - 3 supermarkets
  - Daily collection (around 6500 a day)
• Data ‘wrangling’ is a big challenge
Daily Price Index (Whiskey)
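The slides do not state how the daily index was computed. As a hedged illustration of how scraped prices could be turned into a daily index, the sketch below uses an unweighted Jevons index (geometric mean of price relatives against a base day) over products present on both days. The function name `jevons_index`, the product identifiers and the prices are hypothetical.

```python
import math


def jevons_index(prices_by_day, base_day):
    """Daily index (base day = 100) from the geometric mean of price relatives.

    prices_by_day maps a day label to {product_id: price}. Only products
    also priced on the base day are used; days with no match are skipped.
    """
    base = prices_by_day[base_day]
    index = {}
    for day, prices in sorted(prices_by_day.items()):
        matched = [p for p in prices if p in base]
        if not matched:
            continue
        log_relatives = [math.log(prices[p] / base[p]) for p in matched]
        index[day] = 100.0 * math.exp(sum(log_relatives) / len(log_relatives))
    return index


# Made-up scraped prices, not ONS data.
scraped = {
    "2014-06-01": {"whisky_a": 20.0, "whisky_b": 25.0},
    "2014-06-02": {"whisky_a": 22.0, "whisky_b": 25.0, "whisky_c": 30.0},
    "2014-06-03": {"whisky_a": 20.0},
}
index = jevons_index(scraped, base_day="2014-06-01")
```

Matching products to the base day is where the data 'wrangling' shows up in practice: a product that appears or disappears between days (here `whisky_c`) simply drops out of the comparison.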
Pilot 4: Twitter
Potential of geo-located Twitter data to gain new insights into mobility and migration
• 7 months of geo-located tweets within Great Britain (about 100 million data points)
• Methodology to infer place of usual residence:
  - Identify user ‘anchor points’ by clustering tweets using the DBSCAN algorithm
  - Identify residential anchor points using AddressBase and nearest-neighbour analysis
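The two methodology steps can be sketched roughly as follows. This is a minimal pure-Python DBSCAN over one user's tweet coordinates, followed by a nearest-address lookup. The parameters `eps_m` and `min_pts`, the address records and their `use` field are assumptions for illustration; they are not the project's settings or the AddressBase schema.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_pts):
    """Return one label per point: cluster id 0, 1, ... or -1 for noise."""
    labels = [None] * len(points)  # None means not yet visited

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def anchor_points(points, labels):
    """Centroid (lat, lon) and tweet count for each cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            groups.setdefault(lab, []).append(p)
    return {lab: (sum(p[0] for p in pts) / len(pts),
                  sum(p[1] for p in pts) / len(pts), len(pts))
            for lab, pts in groups.items()}


def nearest_address(anchor, addresses):
    """Address record closest to an anchor point (brute-force search)."""
    return min(addresses,
               key=lambda a: haversine_m(anchor, (a["lat"], a["lon"])))


# Made-up tweet locations for one user, not real data.
tweets = [(51.5000, -0.1200), (51.5001, -0.1201), (51.5002, -0.1199),
          (51.5001, -0.1200), (51.5200, -0.1000), (51.5201, -0.1001),
          (51.5199, -0.1000), (52.0000, 0.0000)]
labels = dbscan(tweets, eps_m=100.0, min_pts=3)
anchors = anchor_points(tweets, labels)

# Hypothetical address records standing in for an address register.
addresses = [{"lat": 51.5001, "lon": -0.1200, "use": "residential"},
             {"lat": 51.5200, "lon": -0.1000, "use": "commercial"}]
residential = [lab for lab, a in anchors.items()
               if nearest_address(a, addresses)["use"] == "residential"]
```

The brute-force neighbour search is quadratic in the number of tweets per user, which is acceptable for a sketch; data at the scale described in the slide would need a spatial index.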
Use case: Student mobility
Conclusion
• A range of potential benefits (not just about replacing existing outputs)
• Challenges can be overcome:
  - Technical/Skills => Innovation labs
  - Legal/ethical => Policy and guidance
  - Statistical (bias) => Benchmarking survey
  - Affordable access => ???
• Long-term investment is required
The project recommends a further 3-year project to deliver both tangible benefits and a broader capability to support big data projects