Advances in Privacy-Preserving Machine Learning

Claire Monteleoni
Center for Computational Learning Systems
Columbia University

Challenges of real-world data

We face an explosion in data from e.g.:
- Internet transactions
- Satellite measurements
- Environmental sensors
- ...

Real-world data can be:
- Vast (many examples)
- High-dimensional
- Noisy (incorrect/missing labels/features)
- Sparse (relevant subspace is low-dim.)
- Streaming, time-varying
- Sensitive/private

Machine learning

Goal: design algorithms to detect patterns in real-world data.

Given labeled data points, find a good classification rule, e.g. a linear separator, that:
- Describes the data
- Generalizes well

Principled ML for real-world data

We want efficient algorithms, with performance guarantees.
- Learning with online constraints: algorithms for streaming, or time-varying, data.
- Active learning: algorithms for settings in which unlabeled data is abundant, and labels are difficult to obtain.
- Privacy-preserving machine learning: algorithms to detect cumulative patterns in real databases, while maintaining the privacy of individuals.
- New applications of machine learning: e.g. Climate Informatics, algorithms to detect patterns in climate data, to answer pressing questions.
Privacy-preserving machine learning

Sensitive personal data is increasingly being digitally aggregated and stored.

Problem: How to maintain the privacy of individuals, when detecting patterns in cumulative, real-world data?

E.g.:
- Disease studies, insurance risk
- Economics research, credit risk
- Analysis of social networks

Anonymization: not enough

Anonymization does not ensure privacy. Attacks may be possible e.g. with:
- Auxiliary information
- Structural information

Privacy attacks:
- [Narayanan & Shmatikov '08] identify Netflix users from anonymized records, using IMDB as auxiliary information.
- [Backstrom, Dwork & Kleinberg '07] identify LiveJournal social relations from anonymized network topology and minimal local information.

Related work

- Data mining: algorithms, often lacking strong privacy guarantees. Subject to various attacks.
- Cryptography and information security: privacy guarantees, but machine learning less explored.
- Learning theory: learning guarantees for algorithms that adhere to strong privacy protocols, but are not efficient.

Related work

- Data mining: k-anonymity [Sweeney '02], l-diversity [Machanavajjhala et al. '06], t-closeness [Li et al. '07]. Each found privacy attacks on the previous. All are subject to composition attacks [Ganta et al. '08].
- Cryptography and information security: [Dwork, McSherry, Nissim & Smith, TCC 2006] introduce differential privacy and the sensitivity method. Extensions: [Nissim et al. '07].
- Learning theory: [Blum et al. '08] give a method to publish data that is differentially private under certain query types (can be computationally prohibitive). [Kasiviswanathan et al. '08] give an exponential-time (in dimension) algorithm to find classifiers that respect differential privacy.
ε-differential privacy

[DMNS '06]: Given two databases D1, D2 that differ in one element, a random function f is ε-private if, for any output t:

    Pr[ f(D1) = t ] ≤ (1 + ε) Pr[ f(D2) = t ]

Idea: the effect of any one person's data on the output is low.

The sensitivity method

[DMNS '06]: a method to produce an ε-private approximation to any function g of a database.

Sensitivity: for a function g, the sensitivity S(g) is the maximum change in g when one input element changes.

[DMNS '06]: add noise proportional to the sensitivity. Output:

    f(D) = g(D) + Lap(0, S(g)/ε)

(A minimal code sketch of this mechanism appears at the end of this section.)

Motivations and contributions

Goal: machine learning algorithms that maintain privacy yet output good classifiers.
- Adapt canonical, widely-used machine learning algorithms
- Learning performance guarantees
- Efficient algorithms with good practical performance

[Chaudhuri & Monteleoni, NIPS 2008]:
- A new privacy-preserving technique: perturb the optimization problem, instead of perturbing the solution.
- Applied both techniques to logistic regression, a canonical ML algorithm.
- Proved learning performance guarantees that are significantly tighter for our new algorithm.
- Encouraging results in simulation.

Regularized logistic regression

We apply the sensitivity method of [DMNS '06] to regularized logistic regression, a canonical, widely-used algorithm for learning a linear separator.

Regularized logistic regression:
- Input: (x1, y1), ..., (xn, yn), with xi in R^d of norm at most 1, and yi in {-1, +1}.
- Output:

    w* = argmin_w  (λ/2) wᵀw + (1/n) Σᵢ log(1 + exp(−yᵢ wᵀxᵢ))

- Derived from the model p(y | x; w) = 1 / (1 + exp(−y wᵀx)).
- The first term is the regularization.
- w in R^d predicts SIGN(wᵀx) for x in R^d.
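To make the sensitivity method above concrete, here is a minimal Python sketch, not from the talk: it privatizes a real-valued statistic by adding Laplace noise with scale S(g)/ε. The function name and the private-mean example are illustrative assumptions.

```python
import numpy as np

def sensitivity_method(g, sensitivity, epsilon, D, rng=None):
    """epsilon-private approximation of a real-valued function g,
    as in [DMNS '06]: output g(D) + Lap(0, S(g)/epsilon)."""
    rng = rng or np.random.default_rng()
    return g(D) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: the mean of n values in [0, 1] changes by at most 1/n when
# one record changes, so S(mean) = 1/n.
D = np.array([0.2, 0.9, 0.4, 0.7])
private_mean = sensitivity_method(np.mean, sensitivity=1.0 / len(D),
                                  epsilon=0.1, D=D)
```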
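A minimal sketch of the regularized logistic regression objective above, minimized with SciPy's general-purpose optimizer; the name train_lr is an illustrative assumption, not from the talk. Predictions are SIGN(wᵀx).

```python
import numpy as np
from scipy.optimize import minimize

def train_lr(X, y, lam):
    """Minimize (lam/2) w^T w + (1/n) sum_i log(1 + exp(-y_i w^T x_i))
    over w, for X of shape (n, d) and labels y in {-1, +1}."""
    d = X.shape[1]

    def objective(w):
        margins = y * (X @ w)               # y_i w^T x_i
        # logaddexp(0, -m) = log(1 + exp(-m)), computed stably
        return 0.5 * lam * (w @ w) + np.mean(np.logaddexp(0.0, -margins))

    return minimize(objective, x0=np.zeros(d), method="L-BFGS-B").x
```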
Sensitivity method applied to LR

Sensitivity method [DMNS '06] applied to logistic regression:

Lemma: The sensitivity of regularized logistic regression is 2/(nλ).

Algorithm 1 [Sensitivity-based PPLR]:
1. Solve w = regularized logistic regression with parameter λ.
2. Pick a vector h:
   - Pick ‖h‖ from Γ(d, 2/(nλε)), where the density of Γ(d, t) at x is proportional to x^{d−1} e^{−x/t}.
   - Pick the direction of h uniformly at random.
3. Output w + h.

Theorem 1: Algorithm 1 is ε-private. (Code sketches of both algorithms appear at the end of this section.)

New method for PPML

A new privacy-preserving technique: perturb the optimization problem, instead of perturbing the solution.
- No need to bound sensitivity (which may be difficult for other ML algorithms).
- The noise does not depend on (the sensitivity of) the function to be learned.
- Optimization happens after perturbation.

Application to regularized logistic regression:

Algorithm 2 [New PPLR]:
1. Pick a vector b:
   - Pick ‖b‖ from Γ(d, 2/ε).
   - Pick the direction of b uniformly at random.
2. Output:

    w* = argmin_w  (λ/2) wᵀw + (1/n) Σᵢ log(1 + exp(−yᵢ wᵀxᵢ)) + (1/n) bᵀw

New method for PPML

Theorem 2: Algorithm 2 is ε-private.

Remark: Algorithm 2 solves a convex program similar to standard regularized LR, so it has a similar running time.

General PPML for a class of convex loss functions:

Theorem 3: Given a database X = {x1, ..., xn}, to minimize functions of the form

    F(w) = G(w) + Σᵢ l(w, xᵢ)

if G(w) and l(w, xᵢ) are everywhere differentiable with continuous derivatives, G(w) is strongly convex, l(w, xᵢ) is convex for all i, and ‖∇w l(w, x)‖ ≤ κ for any x, then outputting

    w* = argmin_w  G(w) + Σᵢ l(w, xᵢ) + bᵀw

where b = Br, with B drawn from Γ(d, 2κ/ε) and r a random unit vector, is ε-private.

Privacy of Algorithm 2

Proof outline (Theorem 2). We want to show Pr[ f(D1) = w* ] ≤ (1 + ε) Pr[ f(D2) = w* ], where

    D1 = {(x1, y1), ..., (x_{n−1}, y_{n−1}), (a, y)}
    D2 = {(x1, y1), ..., (x_{n−1}, y_{n−1}), (a′, y′)}

with ‖xᵢ‖ ≤ 1 for all i, and ‖a‖, ‖a′‖ ≤ 1. Then:

    Pr[ f(D1) = w* ] = Pr[ w* | x1, ..., x_{n−1}, y1, ..., y_{n−1}, x_n = a, y_n = y ]
    Pr[ f(D2) = w* ] = Pr[ w* | x1, ..., x_{n−1}, y1, ..., y_{n−1}, x_n = a′, y_n = y′ ]

We must bound the ratio:

    Pr[ f(D1) = w* ] / Pr[ f(D2) = w* ] = h(b1) / h(b2) = e^{−(ε/2)(‖b1‖ − ‖b2‖)}

where b1 is the unique value of b that yields w* on input D1 (likewise b2 for D2), and h(bᵢ) is the noise density at bᵢ.
- The b's are unique because both terms in the objective are differentiable everywhere.
- Bound the right-hand side, using the optimality of w* for both problems, and the bounded norms.
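A sketch of Algorithm 1 (output perturbation), following the slide: train non-privately, then add a noise vector whose norm is Gamma-distributed, matching the 2/(nλ) sensitivity bound. It reuses train_lr from the earlier sketch; the function name is an illustrative assumption.

```python
import numpy as np

def pplr_sensitivity(X, y, lam, epsilon, rng=None):
    """Algorithm 1 [Sensitivity-based PPLR]: solve regularized LR, then
    output w + h, with ||h|| ~ Gamma(d, 2/(n*lam*epsilon)) and the
    direction of h uniform on the unit sphere."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    w = train_lr(X, y, lam)                  # non-private solution (sketch above)
    h_norm = rng.gamma(shape=d, scale=2.0 / (n * lam * epsilon))
    h_dir = rng.standard_normal(d)
    h_dir /= np.linalg.norm(h_dir)           # normalized Gaussian = uniform direction
    return w + h_norm * h_dir
```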
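And a sketch of Algorithm 2 (objective perturbation): draw b first, then minimize the perturbed convex objective. Again the function name is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize

def pplr_objective_perturbation(X, y, lam, epsilon, rng=None):
    """Algorithm 2 [New PPLR]: draw b with ||b|| ~ Gamma(d, 2/epsilon) and
    uniform direction, then minimize
    (lam/2) w^T w + (1/n) sum_i log(1 + exp(-y_i w^T x_i)) + (1/n) b^T w."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    b_norm = rng.gamma(shape=d, scale=2.0 / epsilon)
    b_dir = rng.standard_normal(d)
    b = b_norm * b_dir / np.linalg.norm(b_dir)

    def objective(w):
        margins = y * (X @ w)
        return (0.5 * lam * (w @ w)
                + np.mean(np.logaddexp(0.0, -margins))
                + (b @ w) / n)

    return minimize(objective, x0=np.zeros(d), method="L-BFGS-B").x
```

Note the design point from the slides: the perturbed problem is still convex and differentiable, so this runs in essentially the same time as standard regularized LR; Theorem 3 extends the same recipe to any strongly convex regularizer plus convex per-example losses with gradient norm bounded by κ, with noise scale 2κ/ε.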