I was like her according to her; We were both outliers Privacy Preserved Data Augmentation using Enterprise Data Fabric Final blow before Tea! Atif Rahman Twitter: @mantaq10 Zetaris www.zetaris.com
Data Exchanged (without consent) • GPS • HIV Status • Email addresses • Weapon: Contract • Response: Excuse • Exposure: (Potential) exposure of marginalized people.
Data Breach: • Email Addresses • Username & Passwords Exposure: • 150 million customers Response: • No clear Apologies • (Delayed) Corrective Actions Weapon: Contract
Data Breach: • Names • Loyalty data • Email addresses • Physical addresses • DOB • Credit Card last 4 digits Exposure: • Millions of Customers Response: • Denial • Fake Solutions • 8 months before first action
Paper contracts are still the most common weapon organizations use to get away with incidents like these. As regulations mature, the onus to be more effective at privacy preservation will fall on service providers.
Enterprises have a different data landscape than consumer-facing (typically tech) organisations. Enterprises have silos and legacy systems, have to learn to be data driven the hard way, and have divergent forces giving a unique focus on
From the exhibition: "M. Hulot, the protagonist in Jacques Tati's 1967 film Playtime, is continually frustrated by the endless repetition of office cubicles."
Agenda • Data Augmentation • First Principles • Enterprise Data Fabric
Data Augmentation Class 1 Class 2 ORG A Class 3
Data Augmentation Typical Modeling Exercise Class 1 Class 2 ORG A Class 3 Potentially Better Modeling after data augmentation Class 1 ORG A Class 2 ORG B Class 3 ORG C
Data Augmentation Class 1 ORG A Class 2 ORG B Class 3 ORG C Content Shared: • Aggregated Data / Insights • Open Data • Stratified Sampling • Synthetic Data • De-identified / Anonymized Channels: • Public Portals • Private Marketplaces • In Person Walkthroughs/handovers • Gossiping • Pigeons
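One of the content types listed above, stratified sampling, can be sketched in a few lines. This is a minimal illustration only: the record layout, the `label` field and the 20% fraction are assumptions made for the example, not something taken from the talk.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum so class balance is preserved."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for label in sorted(strata):
        rows = strata[label]
        k = max(1, round(len(rows) * fraction))  # keep at least one row per stratum
        sample.extend(rng.sample(rows, k))
    return sample

# Hypothetical records held by one organisation: 75 rows of class_1, 25 of class_2.
records = [{"id": i, "label": "class_1" if i % 4 else "class_2"} for i in range(100)]
shared = stratified_sample(records, key="label", fraction=0.2)
```

The shared subset keeps the 3:1 class ratio of the source, which is the property that makes it useful for augmenting another organisation's model. Sampling alone does not de-identify anything.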
Data as an asset • Easy to copy and spawn • Does not depreciate or deplete • Really hard to valuate • Process to yield value • Various forms and derivatives Resolve to First Principles: Data has properties that make it intrinsically hard to ensure privacy preservation. Therefore, we must adhere to first principles to better understand the problem statement first.
The Five Safes: • Safe Data • Safe People • Safe Setting • Safe Project • Safe Output Great Resources: • ACS Data Sharing Frameworks • The De-Identification Decision Making Framework
First Principles • Safe Data: Encryption • Safe Setting: Environment for Data Controllers & Processors • Safe People: Authentication & Authorisation • Safe Output: Linkage Problem • Safe Project: Audit Trail, Lineage and Access & Query Logs
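The linkage problem behind Safe Output can be made concrete with a k-anonymity check. The talk does not name k-anonymity; it is used here as one common way to measure linkage risk. The column names and the generalisation rules are hypothetical.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

def generalise(row):
    """Coarsen quasi-identifiers (assumed rules: 2-digit postcode prefix, birth decade)."""
    out = dict(row)
    out["postcode"] = row["postcode"][:2] + "**"
    out["birth_year"] = row["birth_year"] // 10 * 10
    return out

# Hypothetical output table with the direct identifiers already removed.
rows = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "diagnosis": "A"},
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "diagnosis": "B"},
    {"postcode": "2000", "birth_year": 1975, "sex": "M", "diagnosis": "A"},
    {"postcode": "2010", "birth_year": 1975, "sex": "M", "diagnosis": "C"},
]
qi = ["postcode", "birth_year", "sex"]
k_raw = k_anonymity(rows, qi)                                   # 1: some rows are unique
k_generalised = k_anonymity([generalise(r) for r in rows], qi)  # 2: every row has a twin
```

A value of 1 means at least one person can be singled out by joining the output with another dataset that holds the same quasi-identifiers, even though no name or email was released.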
Safe Data – (Encryption) • Data at Rest: Standard Encryption • Data in Transit: Secure the Pipe • Data for Compute: Homomorphic Encryption
Homomorphic Encryption (ordered from Less Costly to More General) • Partially Homomorphic Encryption (PHE): Addition/Multiplication • Somewhat Homomorphic Encryption (SWHE): Low Order Polynomials • Fully Homomorphic Encryption (FHE): Eval of Arbitrary Functions Source: Data Analytics without seeing the data, Max Ott, YOW Data 2016
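To give a feel for the cheapest tier, here is a toy sketch of the Paillier scheme, which is additively homomorphic (a PHE example). The talk does not name Paillier; it is chosen here for illustration. The primes are tiny and hard-coded, so this is not secure and only demonstrates the arithmetic. It assumes Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import secrets

def keygen(p, q):
    """Textbook Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # modular inverse of lam; valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen(1789, 1861)  # toy primes, for illustration only
c_a = encrypt(pub, 120)         # value held by one party
c_b = encrypt(pub, 305)         # value held by another party
total = decrypt(pub, priv, add_encrypted(pub, c_a, c_b))  # 425
```

The party that runs `add_encrypted` never sees 120 or 305; only the key holder can read the sum. Multiplying two encrypted values together is not possible in this scheme, which is why the more general tiers exist.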
Safe Setting – Confidential Computing • Trusted Execution Environments (Safe Data in Safe Setting) • Microsoft Azure Confidential Computing • Google Cloud Platform: Asylo Open Source Framework • Confidential Computing at the Software layer?
Alice Bob
Safe People – (System Span)
Safe People – (System Span) Expanding the Span of control
Safe Project – Audit Trails & Lineage
Safe Project – Audit Trails & Lineage: It's still very hard within enterprises to have a point-to-point track of data lineage and processing. The problem is compounded when data leaves the span of vision. Data in the wild
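One building block for an audit trail is a tamper-evident log in which every entry carries the hash of the previous one. The sketch below is an assumption about how such a log could look, not a description of any product. The field names (`actor`, `action`, `dataset`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash used before the first entry

def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """Append-only log where each entry is chained to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, actor, action, dataset):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "dataset": dataset, "prev": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash; any edited or removed entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("actor", "action", "dataset", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True

log = AuditLog()
log.append("alice", "read", "crm.customers")
log.append("bob", "export", "crm.customers")
chain_ok = log.verify()
```

This only covers data that stays inside the system writing the log. Once a copy leaves the span of vision, nothing appends to the chain, which is the "data in the wild" problem.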
One Ring to Rule them All? • Encryption • Environment for Data Controllers & Processors • Authentication & Authorisation • Linkage Problem • Audit Trail, Lineage and Access & Query Logs A data landscape must cover all principles of data privacy.
Monoliths in the era of Microservices
[Diagram sequence: a single App / Server / DB stack; then several Servers and DBs behind one App; then multiple Apps, Servers and DBs joined by Caching, In-Memory, Streams and Messaging components]
The Enterprise Data Fabric: a unified data layer that is used by both user-facing applications and downstream analytics; a potential holistic Five Safes environment. [Diagram: multiple App / Server / DB stacks sitting on one shared layer]
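The idea of one layer that every application and every analyst goes through can be sketched as a small facade that checks authorisation and logs each access before touching a source. This is a hypothetical illustration and not the Zetaris API; the class, method and column names are all invented for the example.

```python
class DataFabric:
    """Toy unified data layer: every read is authorised and logged in one place."""

    def __init__(self):
        self._sources = {}    # source name -> list of row dicts
        self._grants = {}     # user -> {source name -> set of allowed columns}
        self.access_log = []  # one record per query attempt, granted or not

    def register_source(self, name, rows):
        self._sources[name] = rows

    def grant(self, user, source, columns):
        self._grants.setdefault(user, {})[source] = set(columns)

    def query(self, user, source, columns):
        allowed = self._grants.get(user, {}).get(source, set())
        denied = [c for c in columns if c not in allowed]
        self.access_log.append(
            {"user": user, "source": source, "columns": list(columns), "granted": not denied}
        )
        if denied:
            raise PermissionError(f"{user} may not read {denied} from {source}")
        return [{c: row[c] for c in columns} for row in self._sources[source]]

fabric = DataFabric()
fabric.register_source("crm", [
    {"email": "a@example.com", "segment": "gold"},
    {"email": "b@example.com", "segment": "silver"},
])
fabric.grant("analyst", "crm", ["segment"])
segments = fabric.query("analyst", "crm", ["segment"])
```

Because applications and analytics share the same entry point, Safe People (grants) and Safe Project (access log) are enforced once rather than per silo. A request for the `email` column by the same analyst raises `PermissionError` and is still logged.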
The Zetaris Enterprise Data Fabric – Location Aware, Usage Aware, People Aware, Privacy Preserved data in a secure environment. Also check out Apache Ignite and Red Hat OpenShift + JBoss Virtualization.
GDPR Highlights: Monoliths (e.g. Lakes) vs. Data Fabric
• Portability: Right to transfer personal data from one electronic processing system to and into another. Monoliths: Only through Serialization. Data Fabric: By Design.
• Erasure: Right to withdraw consent and ask for personal data to be deleted. Monoliths: Random writes are not typical. Data Fabric: By Design.
• Access: Right to know what's been collected and how it's being processed. Monoliths: Limited Purview. Data Fabric: By Design.
• Consent: Consumer is informed in 'clear' and plain language. Consent to collect can be withdrawn at any time. Monoliths: Hard. Data Fabric: By Design.
As data scientists, we are at the forefront of disruption and hold the potential to change things. We are automating decisions in all aspects of society. Yet our work has serious negative implications; we need to educate ourselves on the broader societal questions around regulations, ethics and impact. Enjoy the Tribe!
Recommend
More recommend