Trusted Smart Statistics: What it is, why it comes, where it brings us
Fabio Ricciato (fabio.ricciato@ec.europa.eu), EUROSTAT - Big Data Task Force
Smart Statistics 4 Smart Cities, Kalamata, Greece, 6.10.2018
The new datafied world
• The cyber world is natively digital, and the physical world is being increasingly digitized (IoT, Smart Devices …).
• “Anything that goes digital, gets logged” (somewhere, by somebody): the 1st fundamental law of datafication. Digitalization → datafication.
• Individuals, organizations, places … become “data fountains”.
• More and more business companies become “data buckets”.
[Figure labels: me and my smart devices; my app provider; my mobile phone operator; my energy provider]
data and new data
“micro-data”
• Features about the individual
• Changing slowly or rarely
• Recorded at coarse temporal aggregation (months, years)
Examples: name, gender, birth date, marital status, residence address, occupation, household composition …, monthly income, monthly expenditures per good category, number of touristic trips in a year.
“nano-data”
• Features about single events, transactions … → highly pervasive, sub-individual level
• Changing continuously
• Recorded at fine temporal aggregation (minutes, seconds)
Examples: your exact location, every second; every single heart-beat, blood pressure …; every single transaction, purchase, encounter, event involving you …; your current opinion on any single fact …
data and new data
“micro-data” = “Shallow data”: name, gender, birth date, marital status, residence address, occupation, household composition …, monthly income, monthly expenditures per good category, number of touristic trips in a year …
“nano-data” = “Deep data”: your exact location, every second; every single heart-beat, blood pressure …; every single transaction, purchase, encounter, event involving you …; your current opinion on any single fact …
Official Statistics
• The ultimate goal of Official Statistics is to produce macro-data (statistics) from input micro-data.
• Collection of micro-data is an ancillary task.
[Diagram labels: micro-data (about individuals); macro-data (statistics)]
Official Statistics, augmented
• Availability of new (deep, nano) data sources as an opportunity to extend and empower Official Statistics.
• Additional statistical products: more dimensions, better timeliness, finer spatio-temporal granularity, …
• Additional micro-data, possibly derived from nano-data.
• Additional processes.
• Additional input data sources.
[Diagram labels: nano-data (sub-individual); micro-data (about individuals); macro-data (statistics)]
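The chain from nano-data to micro-data to macro-data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the transaction records, the field layout and the choice of a monthly mean are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical nano-data: one record per single transaction
# (person_id, month, amount). Values are invented for illustration.
nano_data = [
    ("p1", "2018-09", 12.50),
    ("p1", "2018-09", 7.50),
    ("p2", "2018-09", 30.00),
    ("p1", "2018-10", 20.00),
    ("p2", "2018-10", 10.00),
    ("p2", "2018-10", 5.00),
]


def to_micro(records):
    """Aggregate event-level records into one total per person and month."""
    totals = defaultdict(float)
    for person, month, amount in records:
        totals[(person, month)] += amount
    return dict(totals)


def to_macro(micro):
    """Aggregate per-person totals into one mean value per month."""
    by_month = defaultdict(list)
    for (_person, month), total in micro.items():
        by_month[month].append(total)
    return {month: mean(values) for month, values in by_month.items()}


micro_data = to_micro(nano_data)    # monthly expenditure per individual
macro_data = to_macro(micro_data)   # mean monthly expenditure (a statistic)
```

The first step turns sub-individual events into the kind of coarse, per-individual feature that surveys traditionally collect; the second step is the classical production of a statistic.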
Where can the data be accessed?
• B2G channel, Business(Bucket?)-to-Government: access to privately-held data, private-public partnerships …
• C2G channel, Citizens-to-Government: crowdsourcing, Smart Surveys. Citizen Statistics!
[Diagram labels: online platforms; energy company; carmaker; smart home; smart car; smartphone; smartwatch; Statistical Office]
Official Statistics based on survey data
[Diagram labels: society, policy, economy, media, research; Public sector; SO (collection, processing). SO: Statistical Office]
Official Statistics based on survey data and administrative data
[Diagram labels: society, policy, economy, media, research; Public sector; SO (collection, processing); processing. SO: Statistical Office]
And now Big Data come into play
[Diagram labels: society, policy, economy, media, research; Public sector; SO (collection, processing); processing; Private sector (business and citizens)]
Handling the new in old ways: pull data in
This is not feasible: technical scalability, organisational, legal (risk concentration), …
[Diagram labels: society, policy, economy, media, research; Public sector; SO (collection, processing); “Shallow data” (micro-data); “Deep data” (nano-data); processing; Private sector]
Handle the new in new ways: push computation out (partially)
[Diagram labels: society, policy, economy, media, research; Public sector; SO (collection, processing); processing; Trusted Smart Statistics; Private sector]
Trusted Smart Statistics
Smart Statistics as an opportunity to deliver more advanced statistical products, more timely (nowcasting), more targeted to specific user groups, through novel reporting and presentation ways …
Smart: externalization towards data sources of the (initial) part of processing execution. Leveraging the “smart” features of the data sources (often Smart Systems, Smart Objects) and other “smart technologies” (e.g., Smart Contracts).
Trusted: ensure an articulated set of trust guarantees to all players (SO as “taker” and “giver” of trust guarantees). Guarantee that data are processed for the agreed purpose, by the agreed method, with respect of user privacy and business confidentiality, compliance with legal provisions …
[Diagram labels: SO; processing; Trusted Smart Statistics; Private sector (business and citizens)]
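A minimal sketch of "pushing computation out", under stated assumptions: each data holder runs the initial part of the processing locally and exports only a summary, and the Statistical Office combines the summaries. The names `PartialAggregate`, `local_processing` and `combine_at_so` are hypothetical, and the statistic (a mean) is chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAggregate:
    """Summary exported by a data holder instead of its raw records."""
    total: float
    count: int


def local_processing(raw_values):
    """Runs at the data holder: reduce raw records to a sum and a count."""
    return PartialAggregate(total=sum(raw_values), count=len(raw_values))


def combine_at_so(partials):
    """Runs at the Statistical Office: merge summaries into the final mean."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    if count == 0:
        raise ValueError("no observations to aggregate")
    return total / count


# Invented raw data; in this sketch they never leave the data holders.
dh1_raw = [2.0, 4.0, 6.0]
dh2_raw = [10.0, 8.0]

partials = [local_processing(dh1_raw), local_processing(dh2_raw)]
overall_mean = combine_at_so(partials)
```

The sketch does not model disclosure control: a sum over very few records can still reveal individual values, so a real deployment would need additional safeguards before a summary counts as non-personal.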
Towards a Reference Architecture for Trusted Smart Statistics
[Diagram labels: Design Principles; Reference Architecture; Specifications; Implementation]
Work-in-progress at Eurostat, in coordination with the ESS (European Statistical System), in dialogue with other stakeholders:
• Private Data Holders
• Researchers, Academic communities
• Data Protection Authorities
• other arms of the European Commission
• National and Local authorities
• …
Some design principles
1. Processing method (algorithm) transparent to all involved parties
• co-designed or at least agreed upon (consensus-based design)
2. Data are not “moved to/shared with”, but only “used by” the Statistical Office: the goal is the output, not the input!
• Adopt Secure Private Computing technologies, e.g., Secure Multi-Party Computation
3. Engage and partner with the input parties
• Incentives might involve “giving back” computation output to them
4. Agreement for data usage bound to computation instance
• Technological means guarantee that data cannot be used for any query/purpose other than the agreed one(s)
5. Purpose and algorithms open for public scrutiny
• public transparency → public trust
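Principles 1 and 4 can be sketched as follows, purely as an illustration: the agreement is identified by a hash of the agreed source code together with the agreed purpose, and execution is allowed only once every party has approved exactly that pair. The class `UsageAgreement` and the function `fingerprint` are hypothetical names; the slides do not prescribe this mechanism.

```python
import hashlib


def fingerprint(source_code: str, purpose: str) -> str:
    """Hash the agreed source code together with the agreed purpose."""
    h = hashlib.sha256()
    h.update(source_code.encode("utf-8"))
    h.update(b"\x00")  # separator between the code and the purpose
    h.update(purpose.encode("utf-8"))
    return h.hexdigest()


class UsageAgreement:
    """Binds data usage to one specific (code, purpose) pair."""

    def __init__(self, source_code: str, purpose: str, parties):
        self.digest = fingerprint(source_code, purpose)
        self.parties = frozenset(parties)
        self.approvals = set()

    def approve(self, party: str, source_code: str, purpose: str) -> None:
        """Record an approval, but only for the exact agreed code and purpose."""
        if party not in self.parties:
            raise ValueError(f"unknown party: {party}")
        if fingerprint(source_code, purpose) != self.digest:
            raise ValueError("party reviewed a different code or purpose")
        self.approvals.add(party)

    def may_run(self, source_code: str, purpose: str) -> bool:
        """True only with full consensus and an unchanged code and purpose."""
        return (self.approvals == self.parties
                and fingerprint(source_code, purpose) == self.digest)


# Invented example values.
agreed_code = "def statistic(values):\n    return sum(values) / len(values)\n"
agreed_purpose = "monthly mean, official statistics only"
agreement = UsageAgreement(agreed_code, agreed_purpose, {"SO", "DH-1", "DH-2"})

before_consensus = agreement.may_run(agreed_code, agreed_purpose)
for party in ("SO", "DH-1", "DH-2"):
    agreement.approve(party, agreed_code, agreed_purpose)
after_consensus = agreement.may_run(agreed_code, agreed_purpose)
other_purpose = agreement.may_run(agreed_code, "marketing analysis")
```

This is a plain software check, so it only shows the logic of the binding. Enforcing it against a party that controls the machine would require the technological means the slides refer to, which are not modelled here.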
[Diagram legend: DH = Data Holders; CA = Certification Authority?; SO = Statistical Office. Parties shown: CA, DH-1, DH-2, SO. Step 1: consensus source code approved by all parties]
[Diagram, parties CA, DH-1, DH-2, [ … ], SO. Step 1: consensus source code approved by all parties. Step 2: non-personal intermediate data exported to SO. Other labels: confidential input data; secret shares; authenticated binary code executed in secure hardware; official statistics]
Secure Multi-Party Computation (SMPC) infrastructure
An infrastructure (technology + organizational provisions) to let the output information be extracted without exchanging the input data.
[Diagram labels: confidential input data; secret shares; SMPC computation; computation output (non-personal)]
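The idea of extracting an output without exchanging the inputs can be illustrated with additive secret sharing, one common building block of SMPC. The slides do not say which protocol is intended, so this is a sketch under assumptions: the modulus, the number of computing parties and the input values are all hypothetical.

```python
import secrets

# Assumed public modulus; it must exceed the largest possible sum of inputs.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares; any proper subset of them looks uniformly random."""
    return sum(shares) % MODULUS


# Invented confidential inputs held by two data holders.
inputs = {"DH-1": 1200, "DH-2": 3400}
n_parties = 3

# Each data holder splits its input into one share per computing party.
shares_by_holder = {name: share(v, n_parties) for name, v in inputs.items()}

# Computing party i sees only the i-th share of each input and adds them locally.
partial_sums = [
    sum(shares_by_holder[name][i] for name in inputs) % MODULUS
    for i in range(n_parties)
]

# Combining the partial sums reveals the total, not the individual inputs.
total = reconstruct(partial_sums)
```

Here `total` equals the sum of the two inputs, while each computing party has only handled values that are individually indistinguishable from random numbers. Sums are the simplest case; other operations need more elaborate protocols.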
B2G scenario with multiple DHs
[Diagram labels: DH-1, DH-2, [ … ]: confidential input data, secret shares; authenticated binary code executed in secure hardware; SO: official statistics]
BG2G scenario: SO providing input data
[Diagram labels: SO, DH-1, DH-2, [ … ]: confidential input data, secret shares; authenticated binary code executed in secure hardware; SO: official statistics]