  1. Similarity Ranking in Large-Scale Bipartite Graphs. Alessandro Epasto, Brown University, 20th March 2014.

  2. Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014].

  3. AdWords [Figure: ads shown alongside search results.]

  4. Our Goal
  ● Mine AdWords data to automatically identify, for each advertiser, its main competitors, and to suggest relevant queries to each advertiser.
  ● Goals: useful business information, improved advertising, and more relevant performance benchmarks.

  5. The Data
  Query                  | Information
  Nike store New York    | Market Segment: Retailer; Geo: NY (USA); Stats: 10 clicks
  Soccer shoes           | Market Segment: Apparel; Geo: London (UK); Stats: 4 clicks
  Soccer ball            | Market Segment: Equipment; Geo: San Francisco (USA); Stats: 5 clicks
  ... millions of other queries ...
  Large advertisers (e.g., Amazon, Ask.com) compete in several market segments with very different advertisers.

  6. Modeling the Data as a Bipartite Graph
  [Figure: bipartite graph with millions of advertisers on one side, billions of queries on the other, and hundreds of labels on the queries.]

  7. Other Applications
  ● The general approach applies in several contexts:
  ● Users, Movies, Categories: find similar users and suggest movies.
  ● Authors, Papers, Conferences: find related authors and suggest papers to read.
  ● Generally, these bipartite graphs are lopsided: we want algorithms whose complexity depends on the smaller side.

  8.–11. Semi-Formal Problem Definition (progressive build)
  A bipartite graph with Advertisers on one side and Queries on the other; each query carries labels. Goal: find the nodes most “similar” to a given advertiser A.

  12. How to Define Similarity?
  ● We address the computation of several node similarity measures:
  ● Neighborhood-based: common neighbors, Jaccard coefficient, Adamic-Adar.
  ● Path-based: Katz.
  ● Random-walk-based: Personalized PageRank.
  ● Key questions: What is the accuracy? Can it scale to huge graphs? Can it be computed in real time?
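For concreteness, a minimal sketch of the neighborhood-based measures named above (these are the standard definitions; the dictionary-of-sets representation is an illustrative choice, not the paper's code):

```python
# Sketch of the neighborhood-based measures; G maps a node to its
# set of neighbors. Illustrative only.
import math

def common_neighbors(G, u, v):
    return len(G[u] & G[v])

def jaccard(G, u, v):
    union = G[u] | G[v]
    return len(G[u] & G[v]) / len(union) if union else 0.0

def adamic_adar(G, u, v):
    # Common neighbors count more when they have low degree.
    return sum(1.0 / math.log(len(G[z]))
               for z in G[u] & G[v] if len(G[z]) > 1)
```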

  13. Our Contribution
  ● Reduce and Aggregate: a general approach to induce real-time similarity rankings in multi-categorical bipartite graphs, which we apply to several similarity measures.
  ● Theoretical guarantees on the precision of the algorithms.
  ● Experimental evaluation on real-world data.

  14. Personalized PageRank
  For a seed node v and a jump probability α: at each step the walk restarts at v with probability α and otherwise moves to a random neighbor. The stationary distribution assigns a similarity score to each node in the graph w.r.t. the seed v.
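A minimal power-iteration sketch of this process (dense numpy; an illustration, not the paper's large-scale implementation):

```python
import numpy as np

def personalized_pagerank(P, seed, alpha, iters=200):
    """PPR by power iteration. P is the row-stochastic transition
    matrix; at every step the walk jumps back to `seed` with
    probability alpha and otherwise follows P."""
    e = np.zeros(P.shape[0])
    e[seed] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * e + (1 - alpha) * pi @ P
    return pi  # similarity scores of all nodes w.r.t. the seed
```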

  15. Personalized PageRank
  ● Extensive algorithmic literature.
  ● Very good accuracy in our experimental evaluation compared to other similarities (Jaccard, intersection, etc.).
  ● Efficient MapReduce algorithms scale to large graphs (hundreds of millions of nodes). However...

  16. Personalized PageRank
  ● Our graphs are too big (billions of nodes) even for large-scale systems.
  ● MapReduce is not real-time.
  ● We cannot pre-compute the rankings for each subset of labels.

  17. Reduce and Aggregate
  Reduce: Given the bipartite graph and a category, construct a graph with only A-side nodes that preserves the ranking on the entire graph.
  Aggregate: Given a node v in A and the reduced graphs of the categories of interest, determine the ranking for v.

  18. In Practice
  First stage: a large-scale (but feasible) MapReduce pre-computation of the reduced graph of each individual category.
  Second stage: a fast real-time aggregation algorithm.

  19. Reduce for Personalized PageRank
  ● Based on Markov chain state aggregation theory (Simon and Ando, '61; Meyer, '89; etc.).
  ● 750x reduction in the number of nodes while exactly preserving the PPR distribution on the entire graph.

  20.–24. Stochastic Complementation
  Partition the transition matrix P into blocks according to the subsets of states C_1, ..., C_k:

  P = \begin{pmatrix} P_{11} & \cdots & P_{1i} & \cdots & P_{1k} \\ \vdots & & \vdots & & \vdots \\ P_{i1} & \cdots & P_{ii} & \cdots & P_{ik} \\ \vdots & & \vdots & & \vdots \\ P_{k1} & \cdots & P_{ki} & \cdots & P_{kk} \end{pmatrix}

  The stochastic complement of C_i is the |C_i| × |C_i| matrix

  S_i = P_{ii} + P_{i*} (I - P_{**})^{-1} P_{*i}

  where P_{i*} collects the transitions from C_i to all other states, P_{**} the transitions among the other states, and P_{*i} the transitions from the other states back into C_i.
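The formula translates directly into a few lines of numpy; a sketch for dense matrices (the point of the paper is precisely to avoid this inversion at scale):

```python
import numpy as np

def stochastic_complement(P, in_block):
    """S_i = P_ii + P_i* (I - P_**)^{-1} P_*i for the states where
    the boolean mask `in_block` is True; P must be row-stochastic."""
    P_ii = P[np.ix_(in_block, in_block)]
    P_io = P[np.ix_(in_block, ~in_block)]   # C_i -> other states
    P_oo = P[np.ix_(~in_block, ~in_block)]  # other -> other
    P_oi = P[np.ix_(~in_block, in_block)]   # other -> C_i
    eye = np.eye(P_oo.shape[0])
    # solve() applies (I - P_**)^{-1} without forming the inverse.
    return P_ii + P_io @ np.linalg.solve(eye - P_oo, P_oi)
```

Each row of S_i sums to 1, so the complement is itself a Markov chain on the states of C_i.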

  25. Stochastic Complementation
  Theorem [Meyer '89]: For every irreducible aperiodic Markov chain, \pi_i = t_i \, s_i, where \pi_i is the stationary distribution restricted to the nodes of C_i, s_i is the stationary distribution of the stochastic complement S_i, and t_i is the total stationary probability of C_i.
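The theorem is easy to check numerically on a small chain; a self-contained sketch (the random chain and the block choice are arbitrary):

```python
import numpy as np

def stationary(P, iters=5000):
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

def stochastic_complement(P, in_block):
    P_ii = P[np.ix_(in_block, in_block)]; P_io = P[np.ix_(in_block, ~in_block)]
    P_oo = P[np.ix_(~in_block, ~in_block)]; P_oi = P[np.ix_(~in_block, in_block)]
    return P_ii + P_io @ np.linalg.solve(np.eye(P_oo.shape[0]) - P_oo, P_oi)

rng = np.random.default_rng(0)
P = rng.random((6, 6))
P /= P.sum(axis=1, keepdims=True)       # random irreducible aperiodic chain
pi = stationary(P)

C_i = np.array([True, True, True, False, False, False])
s_i = stationary(stochastic_complement(P, C_i))
t_i = pi[C_i].sum()                     # stationary mass of C_i
assert np.allclose(pi[C_i], t_i * s_i)  # Meyer: pi_i = t_i * s_i
```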

  26. Stochastic Complementation
  ● Computing the stochastic complements is infeasible in general for large matrices (it requires a matrix inversion).
  ● In our case we can exploit the properties of random walks on bipartite graphs to invert the matrix analytically.

  27. Reduce for PPR
  [Figure: bipartite graph with A-side nodes x, y and a B-side node z; edge weights w(x, z) and w(y, z).]

  28. Reduce for PPR
  The reduced graph connects the A-side nodes x and y with weight

  w(x, y) = \sum_{z \in N(x) \cap N(y)} \frac{w(x, z)\, w(y, z)}{\sum_{h \in N(z)} w(z, h)}

  29. Reduce for PPR
  [Figure: the reduced edge w(x, y) replaces the two-step paths through common neighbors z.] One step in the reduced graph is equivalent to two steps in the bipartite graph.
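A sketch of this reduction over weighted adjacency lists (dictionary-based; the representation and names are illustrative):

```python
from collections import defaultdict

def reduce_to_a_side(adj):
    """Collapse a weighted bipartite graph onto its A side.
    `adj` maps each A-node x to a dict {z: w(x, z)} of B-neighbors.
    Implements w_hat(x, y) = sum_z w(x, z) * w(y, z) / w(z), with
    w(z) = sum_h w(z, h): one reduced step = two bipartite steps."""
    w_z = defaultdict(float)    # total weight at each B-node
    by_z = defaultdict(dict)    # B-node -> {A-node: weight}
    for x, nbrs in adj.items():
        for z, w in nbrs.items():
            w_z[z] += w
            by_z[z][x] = w
    reduced = defaultdict(lambda: defaultdict(float))
    for z, a_nbrs in by_z.items():
        for x, wxz in a_nbrs.items():
            for y, wyz in a_nbrs.items():
                reduced[x][y] += wxz * wyz / w_z[z]  # self-loops kept
    return reduced
```

Self-loops (x = y) are kept on purpose: a two-step walk can return to its starting node, and dropping them would change the stationary distribution.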

  30. Properties of the Reduced Graph
  Lemma 1: PPR(G, \alpha, a)[A] = \frac{1}{2 - \alpha} \, PPR(\hat{G}, 2\alpha - \alpha^2, a)
  Proof sketch:
  ● Every path between nodes in A has even length.
  ● The probability of not jumping for two consecutive steps is (1 - \alpha)^2, which gives the jump probability 2\alpha - \alpha^2 in the reduced graph.
  ● The probability of being on the A side at stationarity is 1/(2 - \alpha) and does not depend on the graph.

  31. Properties of the Reduced Graph
  Similarly, we can reduce the process to a graph with B-side nodes only.
  Lemma 2: PPR(G, \alpha, a)[B] = \frac{1 - \alpha}{2 - \alpha} \sum_{b \in N(a)} w(a, b) \, PPR(\hat{G}_B, 2\alpha - \alpha^2, b)
  Finally, the stationary distribution of either side uniquely determines that of the other side.
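Lemma 1 can be verified end-to-end on a toy graph; a self-contained sketch (the graph, weights, and α are arbitrary illustrative choices):

```python
import numpy as np

def ppr(P, seed, a, iters=500):
    e = np.zeros(P.shape[0]); e[seed] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = a * e + (1 - a) * pi @ P
    return pi

alpha = 0.15
W = np.array([[1.0, 1.0, 0.0, 0.0],   # W[x, z] = w(x, z):
              [0.0, 1.0, 1.0, 0.0],   # 3 A-nodes x 4 B-nodes
              [0.0, 0.0, 1.0, 1.0]])
nA, nB = W.shape

# Random walk on the full bipartite graph (A states first, then B).
P = np.zeros((nA + nB, nA + nB))
P[:nA, nA:] = W / W.sum(axis=1, keepdims=True)
P[nA:, :nA] = (W / W.sum(axis=0, keepdims=True)).T
pi_full = ppr(P, seed=0, a=alpha)

# Reduced A-side walk: W_hat = W D_B^{-1} W^T, jump prob 2a - a^2.
W_hat = W @ np.diag(1.0 / W.sum(axis=0)) @ W.T
P_hat = W_hat / W_hat.sum(axis=1, keepdims=True)
pi_hat = ppr(P_hat, seed=0, a=2 * alpha - alpha ** 2)

# Lemma 1: A-side of the full PPR = reduced PPR scaled by 1/(2 - alpha).
assert np.allclose(pi_full[:nA], pi_hat / (2 - alpha))
```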

  32. Koury et al. Aggregation-Disaggregation Algorithm
  Step 1: Partition the Markov chain into disjoint subsets (here, A and B).

  33. Koury et al. Aggregation-Disaggregation Algorithm
  Step 2: Approximate the stationary distribution on each subset independently (\pi_A, \pi_B).

  34. Koury et al. Aggregation-Disaggregation Algorithm
  Step 3: Compute the k × k approximated transition matrix T between the subsets (from the blocks P_{AA}, P_{AB}, P_{BA}, P_{BB}).

  35. Koury et al. Aggregation-Disaggregation Algorithm
  Step 4: Compute the stationary distribution (t_A, t_B) of T.

  36. Koury et al. Aggregation-Disaggregation Algorithm
  Step 5: Based on the stationary distribution of T, improve the estimates of \pi_A and \pi_B. Repeat until convergence.
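A compact sketch of the five steps for a chain split into disjoint blocks (dense numpy; an illustration of the classical scheme, not the paper's variant):

```python
import numpy as np

def aggregation_disaggregation(P, blocks, outer_iters=50):
    """Steps 1-5 above. `blocks` is a disjoint partition of the
    states of the row-stochastic chain P, e.g. [A_idx, B_idx]."""
    n, k = P.shape[0], len(blocks)
    # Step 2: initial per-block estimates (uniform inside each block).
    phi = [np.full(len(b), 1.0 / len(b)) for b in blocks]
    for _ in range(outer_iters):
        # Step 3: k x k coupling matrix T between the blocks.
        T = np.array([[phi[i] @ P[np.ix_(blocks[i], blocks[j])].sum(axis=1)
                       for j in range(k)] for i in range(k)])
        # Step 4: stationary distribution t of T by power iteration.
        t = np.full(k, 1.0 / k)
        for _ in range(200):
            t = t @ T
        # Step 5: disaggregate, refine with one power step, re-split.
        pi = np.zeros(n)
        for i in range(k):
            pi[blocks[i]] = t[i] * phi[i]
        pi = pi @ P
        pi /= pi.sum()
        phi = [pi[b] / pi[b].sum() for b in blocks]
    return pi
```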

  37. Aggregation in PPR
  [Figure: advertiser side A with categories X and Y; distribution \pi_A.] Precompute the stationary distributions individually.

  38. Aggregation in PPR
  [Figure: query side B with categories X and Y; distribution \pi_B.] Precompute the stationary distributions individually.

  39. Aggregation in PPR
  The two subsets are not disjoint!

  40. Reduction to the Query Side
  [Figure: categories X and Y reduced on the query side; distributions \pi_A, \pi_B.]

  41. Reduction to the Query Side
  This is the larger side of the graph.

  42. Our Approach
  ● We exploit the bijective relationship between the stationary distributions of the two sides.
  ● The algorithm is based only on the reduced graphs with advertiser-side nodes.
  ● The aggregation algorithm is scalable and converges to the correct distribution.
