Efficient Algorithms for Public-Private Social Networks

  1. Efficient Algorithms for Public-Private Social Networks. Flavio Chierichetti (Sapienza University), Vahab Mirrokni (Google), Alessandro Epasto (Brown University), Ravi Kumar (Google), Silvio Lattanzi (Google). KDD 2015, Sydney, Australia, August 11, 2015.

  2. Private-Public networks: idealized vision.

  3. Private-Public networks: reality. My friends are private.

  4. Private-Public networks: reality. My friends are private. [Figure: users A, B, C]

  5. Private-Public networks: reality. My friends are private: only my friends can see my friends.

  6. Private-Public networks: reality. My friends are private: only my friends can see my friends. [Figure: users A, B, C, D]

  7. Private-Public networks: reality. My friends are private: only my friends can see my friends. "We are a private group."

  8. Private-Public networks: reality. My friends are private: only my friends can see my friends. "We are a private group." ~52% of NYC Facebook users hide their friends.

  9. Private-Public networks: reality. My friends are private: only my friends can see my friends. "We are a private group." ~52% of NYC Facebook users hide their friends. There is no such thing as "the" social network!

  10. Social network of User A: each user has his/her own personal social network!

  11. Social network of User B: each user has his/her own personal social network! [Figure: the networks of users A and B]

  12. Computational implication: the algorithms need to respect the privacy of the users, so we can only use the data that each user can access. Naively, we would need to run the algorithms once for each user, each time on a different (and huge) graph!

  13. Application: friend suggestion. Network signals are very useful: number of common neighbors, personalized PageRank, etc. My friends are private. [Figure: users A, B, C, D]

  14. Application: friend suggestion. Common neighbors, ideal world: 1) run the algorithm (in parallel) on the graph $G$; 2) for each user, suggest the top k users by common neighbors. … but there is no such graph $G$. My friends are private.

  15. Application: friend suggestion. Common neighbors, real world: multiple graphs = multiple answers! How many common neighbors do B and C have? Answer for A: one common neighbor, me! My friends are private. [Figure: users A, B, C, D]

  16. Application: friend suggestion. Common neighbors, real world: multiple graphs = multiple answers! How many common neighbors do B and C have? Answer for B: zero common neighbors! We cannot suggest C to B as a friend based on common neighbors! My friends are private.
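A minimal Python sketch of the "multiple graphs = multiple answers" point: the common-neighbor count for the pair (B, C) depends on whose graph $G \cup G_u$ it is computed on. The toy edge lists and the helper names `union_adjacency` and `common_neighbors` are hypothetical; in particular, we simply assume that B's view contains only its own edge to A, which is what the slide's answer for B implies.

```python
def union_adjacency(public_edges, private_edges):
    """Undirected adjacency sets for one user's graph G ∪ G_u."""
    adj = {}
    for a, b in list(public_edges) + list(private_edges):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def common_neighbors(adj, x, y):
    """Number of common neighbors of x and y in one user's graph."""
    return len(adj.get(x, set()) & adj.get(y, set()))

public_edges = []  # hypothetical: no public edges among A, B, C, D
private_views = {
    "A": [("A", "B"), ("A", "C"), ("A", "D")],  # A sees its own friendships
    "B": [("B", "A")],  # assumed: B's view does not include the edge A-C
}
for viewer, private_edges in private_views.items():
    adj = union_adjacency(public_edges, private_edges)
    print(viewer, common_neighbors(adj, "B", "C"))
# prints "A 1", then "B 0"
```

The same query over the same pair gives two different answers, so a single global run on one graph cannot serve both users.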

  17. Naive approaches: 1) running the algorithms N times (once per user) is infeasible; 2) ignoring all private data is very ineffective! My friends are private. From user A's private perspective there are interesting signals: E and D are good suggestions! [Figure: users A, B, C, D, E]

  18. Naive approaches: 1) running the algorithms N times (once per user) is infeasible; 2) ignoring all private data is very ineffective! My friends are private. From the public-data perspective there are no signals: no suggestions for the user!

  19. Public-Private Graph Model

  20. Private-Public model: there is a public graph $G$.

  21. Private-Public model: there is a public graph $G$; in addition, every node $u$ has access to a private graph $G_u$. We assume the private graph to be within 2 hops of $u$.

  22. Private-Public model: for each $u$, we would like to execute computation on $G \cup G_u$.

  23. Private-Public model: for each $u$, we would like to execute computation on $G \cup G_u$. This respects the privacy of each user. We want the computation to be efficient.

  24. Two-step approach: precompute a data structure for $G$ so that we can solve problems on $G \cup G_u$ efficiently. [Diagram: public graph $G$ → preprocessing → synopsis of $G$; synopsis + private graph $G_u$ → query for user $u$ (fast computation) → output for user $u$]

  25. Private-Public problem. Ideally: preprocessing time $\tilde{O}(|E_G|)$; preprocessing space $\tilde{O}(|V_G|)$; query time $\tilde{O}(|E_{G_u}|)$.

  26. Warm-up: # connected components

  27. Warm-up: # connected components. Precompute component IDs in $G$. [Figure: public graph with nodes labeled by component ID A, B, C]

  28. Warm-up: # connected components. Add private edges and merge connected components. [Figure: the same graph with private edges added]

  29. Warm-up: # connected components. Add private edges and merge connected components. [Figure: the merged components, labeled A and B]
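The warm-up maps directly onto the two-step approach. Here is a minimal Python sketch, assuming undirected graphs and that every private edge joins two nodes of $G$; all function names are hypothetical. Preprocessing stores one component ID per node of $G$ plus the component count, and a query runs union-find only over the component IDs that the private edges touch.

```python
def find(parent, x):
    """Root of x's set, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    """Merge the sets of a and b; return True if they were different."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return False
    parent[ra] = rb
    return True

def preprocess(nodes, public_edges):
    """Step 1, on the public graph G: a component ID per node, plus the count."""
    parent = {v: v for v in nodes}
    for a, b in public_edges:
        union(parent, a, b)
    comp_id = {v: find(parent, v) for v in parent}
    return comp_id, len(set(comp_id.values()))

def query_num_components(synopsis, private_edges):
    """Step 2, for user u: # components of G ∪ G_u, touching only G_u's edges."""
    comp_id, count = synopsis
    parent = {}  # union-find over the component IDs seen in G_u only
    for a, b in private_edges:
        ca, cb = comp_id[a], comp_id[b]
        parent.setdefault(ca, ca)
        parent.setdefault(cb, cb)
        if union(parent, ca, cb):
            count -= 1  # this private edge merges two public components
    return count

synopsis = preprocess(range(1, 7), [(1, 2), (3, 4), (5, 6)])
print(query_num_components(synopsis, [(2, 3)]))          # 2
print(query_num_components(synopsis, [(2, 3), (4, 5)]))  # 1
```

The synopsis takes $O(|V_G|)$ space, and a query allocates state only for component IDs that appear in $G_u$, so its cost is near-linear in $|E_{G_u}|$, in line with the targets on slide 25.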

  30. Results. Algorithms: reachability, approximate all-pairs shortest paths, correlation clustering, social affinity. Heuristics: personalized PageRank, centrality measures.

  32. Reachability: how many nodes can I reach from $u$?

  33. Reachability: how many nodes can I reach from $u$? We have to handle overlaps.

  34. Reachability. Key idea: use a size-estimation sketch [Cohen JCSS97]. Every node samples a random number in [0, 1]. [Figure: node values 0.5, 0.23, 0.33, 0.9, 0.2, 0.3, 0.1]

  35. Reachability. Key idea: use a size-estimation sketch [Cohen JCSS97]. Every node samples a random number in [0, 1]. Look at the k-th smallest value and use it to estimate the size of the set. [Figure: the same values, with sketch [0.1, 0.2] for k = 2]

  36. Reachability. Key idea: use a size-estimation sketch [Cohen JCSS97]. Every node samples a random number in [0, 1]. Look at the k-th smallest value and use it to estimate the size of the set. Composable sketch of size k. [Figure: a second set with sketch [0.15, 0.2] next to the first set's sketch [0.1, 0.2]]

  37. Reachability. Key idea: use a size-estimation sketch [Cohen JCSS97]. Every node samples a random number in [0, 1]. Look at the k-th smallest value and use it to estimate the size of the set. Composable sketch of size k. [Figure: the sketches [0.1, 0.2] and [0.15, 0.2] compose into [0.1, 0.15], the sketch of the union]
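A minimal Python sketch of the bottom-k size-estimation sketch, assuming every element already carries its own uniform random rank in [0, 1] and that distinct elements get distinct ranks. `estimate_size` uses (k − 1) divided by the k-th smallest rank, a standard estimator for this kind of sketch; the function names are hypothetical. The toy values are the ones on the slides, with k = 2.

```python
import heapq

def bottom_k(ranks, k):
    """Bottom-k sketch of a set: its k smallest ranks, sorted ascending."""
    return heapq.nsmallest(k, ranks)

def merge(sketch_a, sketch_b, k):
    """Sketch of A ∪ B computed from the two sketches alone.
    An element in both sets has the same rank in both, so dedupe by value."""
    return heapq.nsmallest(k, set(sketch_a) | set(sketch_b))

def estimate_size(sketch, k):
    """Estimate |S| from its sketch; exact when S has fewer than k elements."""
    if len(sketch) < k:
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]

k = 2
first = bottom_k([0.5, 0.23, 0.33, 0.9, 0.2, 0.3, 0.1], k)
second = [0.15, 0.2]  # the second set's sketch, as shown on the slide
union = merge(first, second, k)
print(first, union)  # [0.1, 0.2] [0.1, 0.15]
print(estimate_size(union, k))
```

Composability is what makes the sketch useful here: the union's sketch depends only on the two k-element sketches, never on the underlying sets, so overlapping elements are automatically counted once.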

  38. Reachability: how many nodes can I reach from $u$? Precompute sketches for each node in the public graph. [Figure: per-node sketches [0.7, 1.0], [0.8, 1.0], [0.1, 1.0], [0.2, 0.3] around $u$]

  39. Reachability: how many nodes can I reach from $u$? Compose sketches of nodes reachable in the private graph. [Figure: the composed sketch [0.1, 0.2]]
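A simplified Python sketch of the reachability query on a directed graph. It follows the slide literally: precompute a bottom-k sketch of each node's public reachability set, then merge the sketches of the nodes that $u$ reaches through private edges alone. That simplification is ours and not necessarily the paper's full algorithm: it can undercount when a public path leads back into another private edge (the second query below shows this). Preprocessing runs one BFS per node, acceptable only on a toy graph; the ranks, graphs, and function names are hypothetical, and `rank` must cover private-only nodes too.

```python
import heapq
import random
from collections import deque

def reachable(adj, source):
    """All nodes reachable from source (itself included), by BFS over adj."""
    seen = {source}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def preprocess(public_adj, nodes, rank, k):
    """Bottom-k sketch of every node's reachability set in the public graph G."""
    return {v: heapq.nsmallest(k, (rank[w] for w in reachable(public_adj, v)))
            for v in nodes}

def query_reach(sketches, private_adj, u, rank, k):
    """Estimate |nodes reachable from u| by composing the public sketches of
    the nodes u reaches via private edges only (simplification, see above)."""
    pool = set()
    for x in reachable(private_adj, u):
        pool.update(sketches.get(x, [rank[x]]))  # private-only node: its own rank
    merged = heapq.nsmallest(k, pool)
    if len(merged) < k:
        return float(len(merged))  # fewer than k nodes in total: exact count
    return (k - 1) / merged[k - 1]

nodes = [1, 2, 3, 4, 5]
rng = random.Random(7)
rank = {v: rng.random() for v in nodes}
public_adj = {1: [2], 2: [3], 4: [5]}
sketches = preprocess(public_adj, nodes, rank, k=8)
print(query_reach(sketches, {1: [4]}, 1, rank, k=8))  # 5.0: u = 1 reaches all five
print(query_reach(sketches, {3: [4]}, 1, rank, k=8))  # 3.0: undercount, true answer 5
```

With k larger than the graph, the sketches hold whole sets and the answers are exact counts, which makes the undercount in the second query easy to see.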

  40. Experiments. Personalized PageRank: approximating the PPR stationary distribution. Up to 4 orders of magnitude faster than the naive approach.

  41. Conclusions: a new model for practical problems; some algorithms designed using sampling and sketching techniques; large speed-ups in practice.

  42. Future work: new algorithms for other problems; not only graph problems; study the limits of the model (lower bounds).

  43. Thanks!

  44. Personalized PageRank: $\mathrm{PPR}(v, z)$ is the probability of visiting $z$ in the following lazy random walk: with probability $\alpha$, jump to $v$; with probability $1 - \alpha$, jump to a random neighbor.
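A minimal power-iteration sketch of $\mathrm{PPR}(v, \cdot)$ in Python, for an undirected graph given as an adjacency dict with no isolated nodes. It applies the two transition rules above to a whole probability vector at once. The slide calls the walk lazy; this sketch adds no separate stay-in-place step. The default `alpha = 0.15`, the iteration count, and the toy graph are assumptions, not values from the talk.

```python
def personalized_pagerank(adj, v, alpha=0.15, iters=200):
    """Approximate PPR(v, z) for every node z by power iteration."""
    p = {x: 0.0 for x in adj}
    p[v] = 1.0  # the walk starts at v
    for _ in range(iters):
        nxt = {x: 0.0 for x in adj}
        nxt[v] += alpha  # total mass 1 jumps back to v with probability alpha
        for y, mass in p.items():
            share = (1 - alpha) * mass / len(adj[y])  # split over y's neighbors
            for z in adj[y]:
                nxt[z] += share
        p = nxt
    return p

graph = {"v": ["a", "b"], "a": ["v", "b"], "b": ["v", "a", "c"], "c": ["b"]}
ppr = personalized_pagerank(graph, "v")
print({z: round(x, 3) for z, x in ppr.items()})
```

Each iteration contracts the error by a factor of $1 - \alpha$, so a few hundred iterations are plenty on a toy graph; the result is a fixed point of the neighbor recursion that the following slides use.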

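  51. Personalized PageRank. Nice property [Jeh and Widom WWW03], where $N(z)$ is the neighborhood of $z$ in $G \cup G_u$:
      $$\mathrm{PPR}_{G \cup G_u}(v, z) = (1 - \alpha) \sum_{y \in N(z)} d_{G \cup G_u}(y)^{-1}\, \mathrm{PPR}_{G \cup G_u}(v, y) + \alpha\, \mathbf{1}[z = v]$$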

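  53. Personalized PageRank. Nice property [Jeh and Widom WWW03]:
      $$\mathrm{PPR}_{G \cup G_u}(v, z) = (1 - \alpha) \sum_{y \in N(z)} d_{G \cup G_u}(y)^{-1}\, \mathrm{PPR}_{G \cup G_u}(v, y) + \alpha\, \mathbf{1}[z = v]$$
      We don't have $\mathrm{PPR}_{G \cup G_u}(v, y)$ on the right-hand side.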

  54. Personalized PageRank. Nice property [Jeh and Widom WWW03]:
      $$\mathrm{PPR}_{G \cup G_u}(v, z) = (1 - \alpha) \sum_{y \in N(z)} d_{G \cup G_u}(y)^{-1}\, \mathrm{PPR}_{G \cup G_u}(v, y) + \alpha\, \mathbf{1}[z = v]$$
      Simple heuristic, using the public-graph distribution:
      $$\mathrm{PPR}_{G \cup G_u}(v, z) \approx (1 - \alpha) \sum_{y \in N(z)} d_{G \cup G_u}(y)^{-1}\, \mathrm{PPR}_{G}(v, y) + \alpha\, \mathbf{1}[z = v]$$
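A sketch of the heuristic in Python, assuming undirected graphs as adjacency dicts: compute $\mathrm{PPR}_G(v, \cdot)$ once on the public graph (plain power iteration here, standing in for whatever precomputation is actually used), then evaluate the right-hand side once on $G \cup G_u$. The toy graphs, the private edges, taking the PPR source $v$ to be the querying user, `alpha = 0.15`, and the names `heuristic_ppr` and `personalized_pagerank` are all hypothetical.

```python
def personalized_pagerank(adj, v, alpha=0.15, iters=200):
    """Power iteration for PPR(v, .) on an undirected graph without isolated nodes."""
    p = {x: 0.0 for x in adj}
    p[v] = 1.0
    for _ in range(iters):
        nxt = {x: 0.0 for x in adj}
        nxt[v] += alpha
        for y, mass in p.items():
            for z in adj[y]:
                nxt[z] += (1 - alpha) * mass / len(adj[y])
        p = nxt
    return p

def heuristic_ppr(public_ppr, union_adj, v, alpha=0.15):
    """One application of the recursion on G ∪ G_u, with PPR_G(v, y) plugged in
    for the unknown PPR_{G ∪ G_u}(v, y); nodes absent from G contribute 0."""
    return {
        z: (1 - alpha) * sum(public_ppr.get(y, 0.0) / len(union_adj[y]) for y in union_adj[z])
        + (alpha if z == v else 0.0)
        for z in union_adj
    }

public = {"v": ["a"], "a": ["v", "b"], "b": ["a"]}
private_edges = [("v", "b"), ("v", "c")]  # hypothetical private edges of the user v
union = {x: list(nbrs) for x, nbrs in public.items()}
for x, y in private_edges:
    union.setdefault(x, []).append(y)
    union.setdefault(y, []).append(x)

approx = heuristic_ppr(personalized_pagerank(public, "v"), union, "v")
exact = personalized_pagerank(union, "v")
for z in sorted(union):
    print(z, round(approx[z], 3), round(exact[z], 3))
```

The query touches only the neighborhoods in $G \cup G_u$ plus precomputed public values, which is why it can be far cheaper than recomputing PPR on the union; when $G_u$ is empty, the heuristic returns exactly the public PPR.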

  55. Social affinity: which connection is stronger?

  56. Social affinity: which connection is stronger? It is important to consider the number of paths and their lengths.
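The deck does not say which affinity measure it uses. As one standard score that rewards both more connections and shorter ones, here is a truncated Katz index in Python: $\sum_{\ell=1}^{L} \beta^{\ell} \cdot (\text{number of walks of length } \ell \text{ from } x \text{ to } y)$. Choosing Katz, the values `beta = 0.1` and `max_len = 4`, and the toy graphs are our assumptions; note that it counts walks, which may revisit nodes, rather than simple paths.

```python
def katz_affinity(adj, x, y, beta=0.1, max_len=4):
    """Truncated Katz score: sum over l of beta**l * (# walks of length l from x to y)."""
    walks = {x: 1}  # walks[n] = number of walks of the current length from x to n
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for nb in adj.get(node, ()):
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += beta ** length * walks.get(y, 0)
    return score

one_path = {"x": ["m"], "m": ["x", "y"], "y": ["m"]}
two_paths = {"x": ["m1", "m2"], "m1": ["x", "y"], "m2": ["x", "y"], "y": ["m1", "m2"]}
print(katz_affinity(one_path, "x", "y") < katz_affinity(two_paths, "x", "y"))  # True
```

Because each extra hop multiplies the contribution by `beta`, a direct edge outweighs any single longer path, while several parallel paths add up.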
