Differentially Private Recommender Systems
David Madras
University of Toronto
April 4, 2017
Introduction
- Today I'll be discussing "Differentially Private Recommender Systems", by Frank McSherry and Ilya Mironov (2009) [1]
- Modern recommender systems aggregate many user preferences
  - This allows for better recommendations
  - It can also compromise privacy
- Improved privacy can lead to "a virtuous cycle"
  - Better privacy → more user data → better recommendations → ...
Introduction
- Example: the Netflix movie recommendation system
  - Has a database of ratings (1-5 stars) of many movies by many users
  - Recommends movies based on past ratings by you and similar users
- This information can be used to link profiles
- Attackers can make inferences about others by injecting their own input

Figure 1: Netflix
Contribution of this paper
- Develops a "realistic" DP recommender system
  - Integrates DP into the computations themselves, rather than publishing a privatized version of the data
- Proves privacy guarantees
- Tests algorithm performance on the Netflix Prize dataset
Related Work
- Survey of DP analogues of various machine learning algorithms [2]
- Demonstrations of privacy attacks on Netflix (or similar) data
  - Can identify rows based on a few data points [3]
  - Can make valid inferences about user history by observing recommendations (Amazon data) [4]
- Data anonymization techniques [5, 6]
  - These tend to destroy the performance of recommender algorithms
- Cryptographic solutions [7, 8]
  - Focus on removing a central trusted party with complete access
High-level Recommendation Algorithm Framework
- Given: users, items, and ratings on a subset of (user, item) pairs
- Want to predict held-out values at (user, item) locations
1. Global effects: centre ratings by subtracting per-user/per-movie averages
   - Augment with artificial ratings at the global average to stabilize averages with small support
2. Find the covariance matrix C
3. Apply a geometric recommendation algorithm to C
   - Roughly, many learning algorithms can be computed from the covariance matrix, e.g. factor analysis, clustering, etc.
   - If the covariance matrix is DP, the whole algorithm will be DP
A DP Recommendation Algorithm - Notation
- Let r_u be user u's ratings vector, and r_{ui} be user u's rating on item i
- Let e_u, e_{ui} be the binary vectors and elements denoting the presence of ratings
- Let c_u = ‖e_u‖_1 be the number of ratings by user u
- Writing X = x + Noise means we add some type of DP noise, either Laplace or Gaussian depending on which guarantee we want to satisfy
A DP Recommendation Algorithm - Item Effects
- First calculate the global average G privately:

  G = GSum / GCount = (Σ_{u,i} r_{ui} + Noise) / (Σ_{u,i} e_{ui} + Noise)    (1)

- Then calculate per-item averages MAvg_i privately, stabilizing with β_m fictitious ratings of G for each item:

  MAvg_i = (MSum_i + β_m G) / (MCount_i + β_m)    (2)

  where MSum_i = Σ_u r_{ui} + Noise and MCount_i = Σ_u e_{ui} + Noise

- These averages are DP and can be published; we can incorporate them into further computation at no additional privacy cost (see the sketch below)
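As a minimal sketch of equations (1)-(2), here is one way the noisy averages could be computed in numpy. The Laplace noise, the noise scale, and the value of β_m are illustrative assumptions (in practice the scale would be calibrated to sensitivity / ε), and the ratings matrix R is assumed to hold zeros where no rating is present:

    import numpy as np

    rng = np.random.default_rng(0)

    def laplace(scale, size=None):
        # Laplace noise; scale would be calibrated to sensitivity / epsilon.
        return rng.laplace(0.0, scale, size)

    def dp_item_averages(R, E, beta_m=20.0, noise_scale=1.0):
        """R: user x item ratings (0 where absent); E: 0/1 presence matrix."""
        # Noisy global average G (eq. 1): noise on both numerator and denominator.
        G = (R.sum() + laplace(noise_scale)) / (E.sum() + laplace(noise_scale))
        # Noisy per-item sums and counts, stabilized by beta_m fictitious ratings of G (eq. 2).
        num_items = R.shape[1]
        MSum = R.sum(axis=0) + laplace(noise_scale, num_items)
        MCount = E.sum(axis=0) + laplace(noise_scale, num_items)
        MAvg = (MSum + beta_m * G) / (MCount + beta_m)
        return G, MAvg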
A DP Recommendation Algorithm - User Effects
- We can subtract these per-item averages, and then centre ratings by user as well (see the sketch below)
- The per-user average r̄_u (not itself DP) is calculated as

  r̄_u = (Σ_i (r_{ui} − MAvg_i) + β_p G) / (c_u + β_p)    (3)

- Calculate centred ratings r̂_{ui} = r_{ui} − r̄_u
- Clamp these to a sensible interval [−B, B] to lower the sensitivity of the measurements
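Continuing the previous sketch, a hedged reading of this step in code: subtract the published item averages, compute the per-user average of equation (3), subtract it, and clamp. The values β_p = 4 and B = 1 are illustrative assumptions, and the exact composition (item average first, then user average) follows the slide's description:

    import numpy as np

    def centre_and_clamp(R, E, MAvg, G, beta_p=4.0, B=1.0):
        """Subtract item averages, centre per user (eq. 3), clamp residuals to [-B, B]."""
        resid = (R - MAvg) * E                 # per-item-centred ratings, zero where absent
        c_u = E.sum(axis=1)                    # number of ratings per user
        r_bar = (resid.sum(axis=1) + beta_p * G) / (c_u + beta_p)   # eq. (3), not itself DP
        r_hat = (resid - r_bar[:, None]) * E   # centre by user, keep missing entries at zero
        return np.clip(r_hat, -B, B)           # clamping lowers sensitivity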
Effect of a Single Rating Change
- What is the maximum effect of a single rating change on the centred and clamped ratings r̂?
- Let r^a, r^b be two sets of ratings, with a single new rating at r^b_{ui}
- Then the only difference between r̂^a and r̂^b is in row u
- For any item j where r^a, r^b have common ratings:

  |r̂^b_{uj} − r̂^a_{uj}| ≤ |r̄^b_u − r̄^a_u| ≤ α / (c^b_u + β_p)    (4)

  where α is the maximum possible difference between ratings (for Netflix, α = 5 − 1 = 4)
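To make the bound concrete (the values β_p = 4 and c^b_u = 20 are illustrative, not from the paper): with Netflix's α = 4,

  |r̂^b_{uj} − r̂^a_{uj}| ≤ α / (c^b_u + β_p) = 4 / (20 + 4) ≈ 0.17,

so one new rating moves each of that user's other centred ratings by at most about a sixth of a star.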
Effect of a Single Rating Change
- α / (c^b_u + β_p) is a bound on the difference in a single clamped, centred rating
- Using that |r̂^b_{ui}| ≤ B, we can bound the difference between the clamped, centred databases as well (they only differ on one row):

  ‖r̂^b − r̂^a‖_1 ≤ c^a_u × α / (c^b_u + β_p) + B < α + B
  ‖r̂^b − r̂^a‖²_2 ≤ c^a_u × (α / (c^b_u + β_p))² + B² < α²/(4β_p) + B²    (5)

- Since c^a_u + 1 = c^b_u, we can bound the first squared term from above by α²/(4β_p), by taking the derivative w.r.t. c^a_u and maximizing
- As β_p increases, these differences become arbitrarily close to B and B², respectively
Calculating the Covariance Matrix - User Weights
- For a single change in rating (in row u), the difference in covariance matrices is bounded by (possibly up to a constant factor)

  ‖Cov^a − Cov^b‖ ≤ ‖r^a_u‖ + ‖r^b_u‖    (6)

- For users with many ratings, this can be very high
- We introduce weights w_u = 1/‖e_u‖ for each user, to normalize the contributions of each user
- These weights will be used to calculate the covariance matrix
Calculating the Covariance Matrix
- We want to find good low-dimensional subspaces of the data; three similar approaches:
  1. Apply SVD to the data matrix
  2. Apply SVD to the item x item covariance matrix
  3. Apply SVD to the user x user Gram matrix
- Adding noise for privacy makes some of these approaches inconvenient:
  1. Data matrix: error scales with # users
  2. Item covariance matrix: error scales with # items
  3. User Gram matrix: error scales with # users, # items, and the max covariance between two users
- For most applications, the item covariance matrix is best
- To calculate the covariance matrix C of movies in a DP way (see the sketch below):

  C = Σ_u w_u r̂_u r̂_u^T + Noise    (7)
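A minimal sketch of equation (7), assuming the clamped, centred matrix R_hat from the previous step. The Laplace noise, its scale, and the symmetrization are my illustrative choices (the slide on notation notes the noise could be Laplace or Gaussian depending on the guarantee), and the per-user loop is written for clarity rather than efficiency. The weighted count matrix W is accumulated alongside C because the cleaning step later needs it:

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_covariance(R_hat, E, noise_scale=1.0):
        """Per-user-weighted item x item covariance (eq. 7), plus a weight matrix W."""
        num_items = R_hat.shape[1]
        C = np.zeros((num_items, num_items))
        W = np.zeros((num_items, num_items))
        for r_u, e_u in zip(R_hat, E):
            w_u = 1.0 / max(e_u.sum(), 1.0)    # down-weight users with many ratings
            C += w_u * np.outer(r_u, r_u)
            W += w_u * np.outer(e_u, e_u)
        # Add noise once, symmetrized so C and W stay symmetric.
        Nc = rng.laplace(0.0, noise_scale, C.shape)
        Nw = rng.laplace(0.0, noise_scale, W.shape)
        C += np.triu(Nc) + np.triu(Nc, 1).T
        W += np.triu(Nw) + np.triu(Nw, 1).T
        return C, W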
Calculating the Covariance Matrix
- We want to show that, given a change in a single rating, this covariance matrix will not change too much
- Again, take r^a, r^b to be two sets of ratings with a single new rating at r^b_{ui}
- How big can ‖C^a − C^b‖ be?
- First, note that since the ratings only differ in one row,

  ‖C^a − C^b‖ = ‖w^a_u r̂^a_u (r̂^a_u)^T − w^b_u r̂^b_u (r̂^b_u)^T‖
              ≤ ‖w^a_u r̂^a_u (r̂^a_u − r̂^b_u)^T‖ + ‖w^b_u (r̂^a_u − r̂^b_u)(r̂^b_u)^T‖ + ‖(w^a_u − w^b_u) r̂^a_u (r̂^b_u)^T‖

- Since ‖e^a_u‖ − ‖e^b_u‖ ≤ 1 and w^a_u − w^b_u = 1/‖e^a_u‖ − 1/‖e^b_u‖ ≤ 1/(‖e^a_u‖‖e^b_u‖), we can also say that:

  ‖C^a − C^b‖ ≤ (‖r̂^a_u‖/‖e^a_u‖ + ‖r̂^b_u‖/‖e^b_u‖) ‖r̂^a_u − r̂^b_u‖ + ‖r̂^a_u‖‖r̂^b_u‖/(‖e^a_u‖‖e^b_u‖)    (8)
Calculating the Covariance Matrix
- Using ‖r̂_u‖ ≤ ‖e_u‖ × B and the previous bounds on ‖r̂^a_u − r̂^b_u‖:

  ‖C^a − C^b‖_1 ≤ (B + B)(α + B) + B² = 2Bα + 3B²
  ‖C^a − C^b‖_2 ≤ (B + B)√(α²/(4β_p) + B²) + B² = 2B(√2 B) + B² = B²(1 + 2√2)    (9)

  where we use β_p = α²/(4B²)
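Plugging in Netflix's α = 4 with an illustrative clamp B = 1 (my choice, not from the paper): β_p = α²/(4B²) = 4, so

  ‖C^a − C^b‖_1 ≤ 2Bα + 3B² = 8 + 3 = 11,   ‖C^a − C^b‖_2 ≤ B²(1 + 2√2) ≈ 3.83,

i.e. the sensitivity of the covariance matrix, and hence the noise it needs, is bounded by constants that do not grow with the number of users or ratings.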
Calculating the Covariance Weight Matrix
- A similar result holds for the binary e matrix (which indicates which ratings are present):

  ‖w^a_u e^a_u (e^a_u)^T − w^b_u e^b_u (e^b_u)^T‖_1 ≤ 3
  ‖w^a_u e^a_u (e^a_u)^T − w^b_u e^b_u (e^b_u)^T‖_2 ≤ √2    (10)
Per-User Privacy
- The claims in this paper are with respect to per-rating privacy
- A stronger guarantee would mask the presence of an entire user
- The only change we need to make is to apply a "more aggressive down-weighting by number of ratings"
  - So our ratings vectors are normalized before we do any of the counting operations
- This claim is not entirely clear to me
Cleaning the Covariance Matrix
- Optionally, we can denoise the covariance matrix a little for better performance
- "Shrinking to the average":

  C̄_ij = (C_ij + β mean(C)) / (W_ij + β mean(W))    (11)

- Conduct a rank-k approximation
  - The low-rank approximation also compresses the matrix, making it easier to send to client computers
- Post-processing does not affect privacy (see the sketch below)
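A short sketch of both cleaning steps, assuming the noisy C and the weighted count matrix W from the earlier covariance sketch; the values β = 10 and k = 20 are illustrative, not from the paper. Because this only touches the already-private matrices, it is pure post-processing and costs no extra privacy:

    import numpy as np

    def clean_covariance(C, W, beta=10.0, k=20):
        """Shrink noisy covariance entries toward the mean (eq. 11), then take a rank-k approximation."""
        C_bar = (C + beta * C.mean()) / (W + beta * W.mean())
        # Rank-k approximation via SVD; post-processing, so privacy is unaffected.
        U, s, Vt = np.linalg.svd(C_bar)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]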