Think “Census” A Cryptography-Flavored � Method for sanitizing a database � Meaningful statistical analysis Approach to Privacy in � Preservation of individuals’ privacy Public Databases � What do we mean? Drineas, Dwork, Goldberg, Isard, Redz, Smith, Stockmeyer “Privacy” in English Focus on Geometric Data � Protection from being brought to the � Real database (RDB) consists of n points attention of others [Gavison] in d-dimensional space (say, unit ball) � inherently valuable � points are unlabeled � attention invites further privacy loss, eg info � Publish sanitized database (SDB) � One’s privacy is maintained to the extent � candidate sanitization procedure (later) that one blends in with the crowd. � Crowd size exceeds threshold T 1
Relative Notion of Isolation Adversary: The Isolator � Inputs to a c-isolator: � SDB T x � auxiliary information z � Output x δ � Success occurs if q c δ Cryptographic Flavoring Isolation Does Not Imply Failure of Sanitization � Cynthia publishes her point p on web � SDB shouldn’t help the isolator “too much” � I(SDB,Cynthia’s web site) = p � Definition of “not too much” should be � δ = 0 and ball of radius c δ contains only one fairly forgiving, eg, advantage obtained from seeing the SDB may be, say, n 1+ ε RDB point � Not the fault of the sanitization procedure! � I’(Cynthia’s web sit) = p 2
Candidate: Effective Sanitization Distribution on Databases? � Don’t want to deal with crypto-like definitions, in which, say, sum of every 7 th elements is congruent to 23 mod 51 � Take statistician’s approach: each point in the RDB is an independent sample from a single fixed distribution Meaningful Statistical Analysis Candidate Sanitization Procedure � For each x � RDB � Dream: find a large class of algorithms � Find T x = distance to T th nearest neighbor that “perform well” on sanitized data � Choose x’ � R B(x,T x ) � Start with clustering � Complements definition of c-isolation � if q c-isolates x then D(q,x) � T x /(c-1) � clusterings have measures of quality (diameter, conductance, etc.) � consequence: high dimensionality is our friend � Intuition: � See how measures are preserved � perturb minimally to prevent isolation � under sanitization � outliers randomized to oblivion � under de-sanitization � kills isolated anomalies, maintains group anomalies 3
Recommend
More recommend