
Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! - PowerPoint PPT Presentation



  1. Purnamrita Sarkar (Carnegie Mellon), Deepayan Chakrabarti (Yahoo! Research), Andrew W. Moore (Google, Inc.)

  2. Which pair of nodes {i,j} should be connected?
     • Variant: node i is given
     • Examples: friend suggestion in Facebook, movie recommendation in Netflix
     (Figure: nodes Alice, Bob, Charlie)

  3. Predict link between nodes
     • With the minimum number of hops
     • With max common neighbors (length 2 paths)
     (Figure: Alice, Bob, Charlie. A prolific common friend with 1000 followers is less evidence of a link; a less prolific common friend with 8 followers is much more evidence.)
     The Adamic/Adar score gives more weight to low degree common neighbors.

  4. Predict link between nodes
     • With the minimum number of hops
     • With more common neighbors (length 2 paths)
     • With larger Adamic/Adar
     • With more short paths (e.g. length 3 paths)
     • …
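A minimal sketch of two of the scores listed above, assuming an undirected graph stored as a dict of neighbor sets. The 1/log(degree) weight is the usual form of the Adamic/Adar score; the toy graph and the function names are made up for illustration.

```python
import math

def common_neighbors(adj, i, j):
    """Number of length-2 paths between i and j (shared neighbors)."""
    return len(adj[i] & adj[j])

def adamic_adar(adj, i, j):
    """Sum of 1/log(degree) over shared neighbors; low-degree ones weigh more."""
    return sum(1.0 / math.log(len(adj[k]))
               for k in adj[i] & adj[j] if len(adj[k]) > 1)

# Hypothetical toy graph: alice and charlie share a high-degree node ("hub")
# and a low-degree node ("quiet").
edges = [("alice", "hub"), ("charlie", "hub"),
         ("alice", "quiet"), ("charlie", "quiet")]
edges += [("hub", f"fan{n}") for n in range(8)]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cn_score = common_neighbors(adj, "alice", "charlie")
aa_score = adamic_adar(adj, "alice", "charlie")
print(cn_score, aa_score)
```

Both shared neighbors count equally toward the common-neighbors score, while in the Adamic/Adar sum the degree-2 node contributes more than the degree-10 hub.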

  5. Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths
     How do we justify these observations? Especially if the graph is sparse.
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  6. Raftery et al.'s Model
     • Nodes are uniformly distributed in a latent space (unit volume universe)
     • Points close in this space are more likely to be connected
     • The problem of link prediction is to find the nearest neighbor who is not currently linked to the node
     • Equivalent to inferring distances in the latent space

  7. Two sources of randomness
     • Point positions: uniform in D dimensional space
     • Linkage probability: logistic with parameters α, r
     • α, r and D are known
     (Figure: logistic curve of linking probability, with labels "higher probability of linking", "α determines the steepness", "1", and "radius r")
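The two sources of randomness can be sketched as a small sampler. This is an illustration under stated assumptions, not the authors' code: the logistic is taken to have the form 1 / (1 + exp(alpha * (d - r))), the unit volume universe is taken to be the unit cube, and the names `link_probability` and `sample_latent_graph` are hypothetical.

```python
import math
import random

def link_probability(d, alpha, r):
    """Assumed logistic form: 1/2 at d == r, steeper for larger alpha."""
    z = max(-60.0, min(60.0, alpha * (d - r)))  # clamp so exp cannot overflow
    return 1.0 / (1.0 + math.exp(z))

def sample_latent_graph(n, dim, alpha, r, seed=0):
    """Draw positions uniformly in the unit cube, then link pairs independently."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j])  # Euclidean latent distance
            if rng.random() < link_probability(d, alpha, r):
                edges.add((i, j))
    return pos, edges

pos, edges = sample_latent_graph(n=200, dim=2, alpha=50.0, r=0.1)
print(len(edges))
```

Raising `alpha` pushes the link probability toward a hard cutoff at distance `r`.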

  8. Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths
     Especially if the graph is sparse
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  9. Pr_2(i,j) = Pr(common neighbor | d_ij)
     • Product of two logistic probabilities, integrated over a volume determined by d_ij
     • As α → ∞, logistic → step function. Much easier to analyze!
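Written out, the common-neighbor probability and its step-function limit might look as follows. This is a sketch that assumes the logistic form 1 / (1 + e^{α(d − r)}), a uniform density on the unit volume universe, and no boundary effects; x_k denotes the latent position of a candidate common neighbor k, and A(r, r, d_ij) is the volume of the intersection of two radius-r balls whose centers are d_ij apart.

```latex
\begin{align*}
\Pr\nolimits_2(i,j)
  &= \int \frac{1}{1+e^{\alpha(d_{ik}-r)}}\cdot
          \frac{1}{1+e^{\alpha(d_{jk}-r)}}\;\mathrm{d}x_k \\
  &\xrightarrow{\;\alpha\to\infty\;}
     \int \mathbf{1}[d_{ik}\le r]\;\mathbf{1}[d_{jk}\le r]\;\mathrm{d}x_k
   \;=\; A(r, r, d_{ij})
\end{align*}
```

In the limit the integrand is an indicator, so the probability reduces to a purely geometric quantity that shrinks as d_ij grows.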

  10. Everyone has the same radius r, in a unit volume universe
     • η = number of common neighbors of i and j
     • V(r) = volume of radius r in D dims
     • Empirical Bernstein bounds on distance

  11. OPT = node closest to i; MAX = node with max common neighbors with i
     • Theorem: w.h.p. d_OPT ≤ d_MAX ≤ d_OPT + 2[…]
     • Common neighbors is an asymptotically optimal heuristic as N → ∞

  12. Node k has radius r_k
     • i → k if d_ik ≤ r_k (directed graph)
     • r_k captures popularity of node k
     • Type 1: i → k ← j, with area A(r_k, r_k, d_ij)
     • Type 2: i ← k → j, with area A(r_i, r_j, d_ij)

  13. Example graph: N_1 nodes of radius r_1 and N_2 nodes of radius r_2, with r_1 << r_2
     • η_1 ~ Bin[N_1, A(r_1, r_1, d_ij)]
     • η_2 ~ Bin[N_2, A(r_2, r_2, d_ij)]
     • Maximize Pr[η_1, η_2 | d_ij] = product of two binomials
     • w(r_1) E[η_1 | d*] + w(r_2) E[η_2 | d*] = w(r_1) η_1 + w(r_2) η_2
     • RHS ↑ ⇒ LHS ↑ ⇒ d* ↓
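The maximization over d_ij can be illustrated numerically. The sketch below makes several assumptions not stated on the slide: D = 2, so A(r, r, d) is the lens-shaped overlap of two discs; the universe has unit area and boundary effects are ignored; and the maximizer d* is found by a plain grid search. All function names and the example counts are hypothetical.

```python
import math

def lens_area(r, d):
    """Overlap area of two discs of radius r with centers d apart (2-D only)."""
    if d >= 2 * r:
        return 0.0
    chord = math.sqrt(max(0.0, 4 * r * r - d * d))
    return 2 * r * r * math.acos(d / (2 * r)) - 0.5 * d * chord

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k successes."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def mle_distance(counts, sizes, radii, grid=2000):
    """Grid search for the d that maximizes the product of binomials."""
    d_max = 2 * max(radii)
    best_d, best_ll = 0.0, -math.inf
    for step in range(grid + 1):
        d = d_max * step / grid
        ll = sum(log_binom_pmf(k, n, lens_area(r, d))
                 for k, n, r in zip(counts, sizes, radii))
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d

radii = (0.05, 0.2)    # r_1 << r_2
sizes = (5000, 5000)   # N_1, N_2
d_hat = mle_distance(counts=(4, 469), sizes=sizes, radii=radii)
print(round(d_hat, 3))
```

The example counts are close to the expected values at a true distance of 0.08, so the estimate should land near that value; raising either count pulls the estimate toward a smaller distance.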

  14. Figure: weight w(r) = Jacobian / Variance, compared with Adamic/Adar. Labels on the plot:
     • Small variance → presence is more surprising
     • Small variance → absence is more surprising
     • 1/r
     • r is close to max radius
     • Real world graphs generally fall in this range

  15. Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths
     Especially if the graph is sparse
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  16. Common neighbors = 2 hop paths
     Analysis of longer paths: two components
     1. Bounding E(η_l | d_ij) [η_l = # l hop paths]: bounds Pr_l(i,j) by using the triangle inequality on a series of common neighbor probabilities (triangulation)
     2. η_l ≈ E(η_l | d_ij)

  17. Common neighbors = 2 hop paths
     Analysis of longer paths: two components
     1. Bounding E(η_l | d_ij) [η_l = # l hop paths]: bounds Pr_l(i,j) by using the triangle inequality on a series of common neighbor probabilities
     2. η_l ≈ E(η_l | d_ij)
        • Bounded dependence of η_l on position of each node → can use McDiarmid's inequality to bound |η_l − E(η_l | d_ij)|
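For reference, the standard statement of McDiarmid's bounded-differences inequality is given below. The symbols f, X_k, c_k and t are generic and do not come from the slides; in this setting f would be the path count η_l viewed as a function of the node positions, and the constants c_k would come from the bounded-dependence argument, which the slide does not spell out.

```latex
% X_1, ..., X_N independent; changing only the k-th argument of f
% moves its value by at most c_k.
\[
\Pr\Bigl(\,\bigl|f(X_1,\dots,X_N) - \mathbb{E}\,f(X_1,\dots,X_N)\bigr| \ge t\,\Bigr)
  \;\le\; 2\exp\!\left(-\frac{2t^{2}}{\sum_{k=1}^{N} c_k^{2}}\right)
\]
```

The smaller the influence c_k of any single node position, the tighter the concentration of f around its mean.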

  18. Bound d_ij as a function of η_l using McDiarmid's inequality
     • For l' > l we need η_l' >> η_l to obtain similar bounds
     • Also, we can obtain much tighter bounds for long paths if shorter paths are known to exist

  19. 1 … Factor → weak bound for logistic
     • Can be made tighter, as logistic approaches the step function

  20. Three key ingredients
     1. Closer points are likelier to be linked (Small World Model: Watts & Strogatz, 1998; Kleinberg, 2001)
     2. Triangle inequality holds → necessary to extend to l hop paths
     3. Points are spread uniformly at random → otherwise properties will depend on location as well as distance

  21. Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths. Annotations:
     • The number of paths matters, not the length
     • For large dense graphs, common neighbors are enough
     • Differentiating between different degrees is important
     • In sparse graphs, length 3 or more paths help in prediction
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  22. Diagram: Link Prediction, Generative model, Heuristics, A few properties. "Most likely neighbor of node i?" Compare node a and node b.
     • Can justify the empirical observations
     • We also offer some new prediction algorithms

  23. Combine bounds from different radii
     • But there might not be enough data to obtain individual bounds from each radius
     • New sweep estimator: Q_r = fraction of nodes with radius ≤ r which are common neighbors
     • Higher Q_r → smaller d_ij w.h.p.

  24. Q_r = fraction of nodes with radius ≤ r which are common neighbors
     • Larger Q_r → smaller d_ij w.h.p.
     T_R := fraction of nodes with radius ≥ R which are common neighbors
     • Smaller T_R → large d_ij w.h.p.

  25. Figure: number of common neighbors of a given radius r
     • Q_r = fraction of nodes with radius ≤ r which are common neighbors. Large Q_r → small d_ij
     • T_R = fraction of nodes with radius ≥ R which are common neighbors. Small T_R → large d_ij
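The two sweep quantities can be computed directly from per-node radii and common-neighbor flags. This sketch assumes Q_r counts nodes with radius at most r and T_R counts nodes with radius at least R (the inequality signs are unreadable in the extracted slide text); the function names and the toy data are hypothetical.

```python
def q_fraction(radii, is_common, r):
    """Q_r: fraction of nodes with radius <= r that are common neighbors."""
    flags = [c for rad, c in zip(radii, is_common) if rad <= r]
    return sum(flags) / len(flags) if flags else None

def t_fraction(radii, is_common, big_r):
    """T_R: fraction of nodes with radius >= R that are common neighbors."""
    flags = [c for rad, c in zip(radii, is_common) if rad >= big_r]
    return sum(flags) / len(flags) if flags else None

# Toy data: one radius per candidate node k, and whether k is a common
# neighbor of the pair (i, j) under consideration.
radii = [0.1, 0.1, 0.2, 0.3, 0.3]
is_common = [True, False, True, True, False]

q_small = q_fraction(radii, is_common, 0.1)
t_large = t_fraction(radii, is_common, 0.3)
print(q_small, t_large)
```

Sweeping r upward (for Q_r) or R downward (for T_R) pools nodes across radii, which is one way to work around having too little data at any single radius.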
