Finding Dense Subgraphs via Low-Rank Bilinear Optimization



  1. Finding Dense Subgraphs via Low-Rank Bilinear Optimization. Ioannis Mitliagkas, Dimitris Papailiopoulos. With: Alex Dimakis, Constantine Caramanis. UT Austin.

  2-3. Densest k-Subgraph (DkS). Given a graph and a parameter k, find k vertices containing the most edges. Applications: community mining (communities = large dense components); link spam detection (dense parts of the web: spam); computational biology (complex patterns in gene annotation graphs).
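The DkS definition above can be made concrete with a brute-force sketch. This is not the method of the talk; it simply enumerates every k-subset of a tiny graph, which is feasible only for very small n. The function name and the example edge list are assumptions made for illustration.

```python
from itertools import combinations


def densest_k_subgraph_bruteforce(edges, k):
    """Return (vertex tuple, edge count) of a densest k-subgraph by exhaustive search.

    Illustrative only: the number of subsets grows as C(n, k).
    """
    vertices = sorted({u for edge in edges for u in edge})
    edge_set = {frozenset(edge) for edge in edges}
    best_subset, best_count = None, -1
    for subset in combinations(vertices, k):
        # Count the edges with both endpoints inside the candidate subset.
        count = sum(
            1 for pair in combinations(subset, 2) if frozenset(pair) in edge_set
        )
        if count > best_count:
            best_subset, best_count = subset, count
    return best_subset, best_count


# Hypothetical toy graph: a triangle {0, 1, 2} with a path 2-3-4 attached.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
print(densest_k_subgraph_bruteforce(toy_edges, 3))
```

On this toy graph the triangle is the densest 3-subgraph, since no other triple of vertices contains three edges.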

  4. Densest k-Subgraph (DkS). There is a 5-subgraph with 10 edges. Q: Can you find it?

  5-6. Densest k-Subgraph (DkS). Given a graph and a parameter k, find k vertices containing the most edges. NP-hard. Hard to approximate [Khot, 2004]*. *Except in specific cases: [Arora et al. '95] give a (1+ε)-approximation for linear-size subgraphs of dense graphs.

  7-9. Worst-Case Analysis. After long effort, [Feige, 2001], [Bhaskara et al., STOC '10]: best known ratio $O(n^{1/4+\varepsilon})$. That is roughly a 10-factor approximation for graphs with 10K nodes and a 100-factor approximation for graphs with 100 million nodes.
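The two example factors on this slide are consistent with a ratio that grows like $n^{1/4}$. The arithmetic below is a sketch that assumes exactly that growth rate and ignores constants and the small $\varepsilon$ in the exponent.

```latex
\[
  n = 10^{4}:\quad n^{1/4} = \left(10^{4}\right)^{1/4} = 10,
  \qquad
  n = 10^{8}:\quad n^{1/4} = \left(10^{8}\right)^{1/4} = 100 .
\]
```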

  10-11. Known DkS guarantees are not useful in practice… under worst-case analysis. Q1: Provable, graph-dependent bounds? Q2: DkS on billion-scale graphs?

  12. Beyond the Worst Case. New DkS algorithm with graph-dependent bounds. In practice: scalable, with nearly-linear running times for many real-world graphs. Parallelizable: implementation in MapReduce + Python, run on up to billion-edge graphs using 800 cores on Amazon EC2.

  13-15. Our Low-Rank Framework. DkS on a graph: hard to solve, hard to approximate. After a low-rank approximation, DkS on a constant-rank graph: nearly-linear time solvable (!). Low-rank DkS is related to original DkS. [Figure: a 0/1 matrix next to its real-valued low-rank approximation, with entries such as 0.9, 1.1, 1.2, 0.1, -0.2, -0.3.]
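The low-rank approximation step in the framework above can be sketched with a truncated eigendecomposition. This is a minimal dense-matrix illustration, not the scalable implementation from the talk. It assumes eigenvalues are ranked by magnitude; the function name is hypothetical. The returned value $|\lambda_{d+1}|$ is the spectral-norm error of the approximation.

```python
import numpy as np


def low_rank_approximation(A, d):
    """Rank-d approximation of a symmetric matrix A.

    Keeps the d eigenvalues of largest magnitude (an assumption of this sketch)
    and also returns |lambda_{d+1}|, the spectral-norm approximation error.
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)       # eigh: symmetric eigensolver
    order = np.argsort(-np.abs(eigvals))       # indices sorted by |eigenvalue|
    keep = order[:d]
    # Sum of lambda_i * v_i v_i^T over the kept eigenpairs.
    A_d = (eigvecs[:, keep] * eigvals[keep]) @ eigvecs[:, keep].T
    next_eig = abs(eigvals[order[d]]) if d < len(eigvals) else 0.0
    return A_d, next_eig


# Example: adjacency matrix of the complete graph on 4 vertices.
K4 = np.ones((4, 4)) - np.eye(4)
A_1, lam_2 = low_rank_approximation(K4, 1)
print(A_1)
print(lam_2)
```

For large sparse graphs a dense `eigh` call is impractical; an iterative sparse eigensolver would be the natural substitute, but that choice is outside what the slides state.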

  16. Results: Theory

  17-19. Graph-dependent Guarantees. Theorems: (1) The algorithm computes, in time $O(n^{d+2}/\delta)$, a k-subgraph with density $\mathrm{OPT}_d \ge 0.5\,(1-\delta)\,\mathrm{OPT} - 2\,|\lambda_{d+1}|$. (2) If the largest d eigenvalues of the adjacency matrix are positive, our algorithm computes, in time $O(|E| \log n + n/\epsilon^{d})$, a k-subgraph with density $\mathrm{OPT}_d \ge (1-\epsilon)\,\mathrm{OPT} - 2\,|\lambda_{d+1}|$. Larger d => better approximation, slower computation.

  20. Performance in Practice

  21-27. com-LiveJournal graph: 4M nodes, 35M edges. [Plot: density versus subgraph size k. Trivial upper bound = k-1. Blue: TPower (JMLR '13). Green: GreedyFeige (Algorithmica '01). Yellow: GreedyRavi (OR '94). Spannogram curves are added in turn for d=1, d=2, and d=5. Annotations: "Big Gap" with the baselines only, "Smaller Gap" once the spannogram curves are shown.]

  28. com-LiveJournal graph: 4M nodes, 35M edges. [Plot: density versus subgraph size k, with a graph-dependent bound expressed through $\mathrm{OPT}_d$ and $\lambda_{d+1}$; annotation: 80% OPT. Blue: TPower (JMLR '13). Green: GreedyFeige (Algorithmica '01). Yellow: GreedyRavi (OR '94).]

  29. How we do it

  30-34. DkS via Quadratic Optimization. [Figure: a matrix with both axes labelled "vertex"; a highlighted region is labelled "Edges in subgraph". The DkS formula on the slide is not preserved.]
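The formula on these slides did not survive extraction. A standard way to write DkS as a quadratic optimization is sketched below; the notation ($A$ for the adjacency matrix, $\mathbf{1}_S$ for the 0/1 indicator vector of a vertex set $S$, $E(S)$ for the edges inside $S$) is an assumption of this sketch and may differ from the slide.

```latex
\[
  \text{DkS}:\qquad
  \max_{S \subseteq \{1,\dots,n\},\; |S| = k}\;
  \mathbf{1}_S^{\top} A \,\mathbf{1}_S ,
  \qquad\text{where}\qquad
  \mathbf{1}_S^{\top} A \,\mathbf{1}_S \;=\; \sum_{i \in S}\sum_{j \in S} A_{ij} \;=\; 2\,|E(S)| .
\]
```

Each edge inside $S$ is counted twice, once as $A_{ij}$ and once as $A_{ji}$, which is why the quadratic form equals twice the number of edges in the subgraph.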

  35-38. DkS via Bilinear Optimization. DkS and its bilinear counterpart DBkS are shown side by side (formulas not preserved). Lemma: ρ-approximation for DBkS = ½ρ-approximation for DkS.
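The DBkS formula is also missing from the extracted slides. A plausible reading, stated here as an assumption, is that the bilinear problem lets the row set and the column set be chosen independently, using the same indicator-vector notation as for the quadratic form:

```latex
\[
  \text{DBkS}:\qquad
  \max_{|X| = k,\; |Y| = k}\;
  \mathbf{1}_X^{\top} A \,\mathbf{1}_Y ,
  \qquad\text{so that}\qquad
  \max_{|X| = |Y| = k} \mathbf{1}_X^{\top} A \,\mathbf{1}_Y
  \;\ge\;
  \max_{|S| = k} \mathbf{1}_S^{\top} A \,\mathbf{1}_S .
\]
```

The inequality holds because choosing $X = Y = S$ recovers the quadratic objective. The lemma on the slide addresses the other direction: how much is lost when a bilinear solution is turned back into a single vertex set.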

  39. DkS via Bilinear Optimization. DBkS: [Figure: matrix of 1 entries.]

  40-43. Low-Rank Approximation. DBkS: [Figure: matrix with real-valued entries (0.9, 1.1, 1.2, 0.1, 1.3, 0.6, 1.4, 0.7, -0.2, -0.3).] Efficiently solvable.

  44. How the Low-Rank Solver Works. Naïvely: check all $\binom{n}{k}$ subgraphs. Rank-1 case: Q: Maximize the product of two numbers. A: Maximize each number individually.

  45. How the Rank-1 Solver Works. Example: v = (1, 2, 3, 4). Top-k set: the k largest coordinates of a vector, e.g., if k = 2, then top-2 set = {3, 4}. Intuition: x, y pick the top-k set of v.
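The rank-1 intuition above can be sketched in a few lines. The sketch assumes the rank-1 matrix has the form λ·v·vᵀ with λ > 0, so the bilinear objective is λ times the product of two sums of entries of v. It compares the top-k set of v with the top-k set of -v, since two large negative sums also give a large product. The function name is hypothetical and indices are 0-based.

```python
import numpy as np


def rank1_bilinear_solver(v, k, lam=1.0):
    """Maximize lam * (sum of v over X) * (sum of v over Y) with |X| = |Y| = k.

    Assumes lam > 0. Both factors are maximized in magnitude by the same set,
    so X = Y is either the k largest or the k smallest entries of v.
    """
    v = np.asarray(v, dtype=float)
    order = np.argsort(v)                  # ascending order of entries
    top, bottom = order[-k:], order[:k]    # k largest, k smallest
    sum_top, sum_bottom = v[top].sum(), v[bottom].sum()
    if abs(sum_top) >= abs(sum_bottom):
        chosen, value = top, lam * sum_top ** 2
    else:
        chosen, value = bottom, lam * sum_bottom ** 2
    return sorted(chosen.tolist()), value


# The slide's example vector, with 0-based indices.
print(rank1_bilinear_solver([1, 2, 3, 4], k=2))
```

With the slide's vector and k = 2, the chosen set is the last two coordinates, which matches the top-2 set {3, 4} in the slide's 1-based numbering.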

  46-48. How the Rank-2 Solver Works. Example: $v_1 = (1, 2, 3, 4)$, $v_2 = (5, 2, 7, 0)$. Intuition: x, y pick the top-k set of a vector from a 2-dimensional span. Q: How many top-k sets are there in a 2-dimensional span? Based on the Spannogram [Asteris, Papail., Karystinos, ISIT 2011]. Theorem: bounds the number of top-k sets in a d-dimensional span. Spannogram: traverses all of them efficiently. Randomized algorithm: take random points $s_1, \dots, s_{1/\epsilon^{d}} \in \mathrm{span}(v_1, \dots, v_d)$. Practically linear time.
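To make the question about top-k sets in a 2-dimensional span concrete, the sketch below enumerates them for two small vectors. It is not the authors' Spannogram code. It relies on the observation that the ordering of the entries of cos(φ)·v1 + sin(φ)·v2 can change only at angles where two entries tie, so one test angle per interval between consecutive tie angles suffices. The function names are hypothetical, and the example vectors are a reading of the numbers on the slide.

```python
import math
from itertools import combinations


def top_k_set(vec, k):
    """Indices (0-based) of the k largest entries of vec."""
    order = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    return frozenset(order[:k])


def top_k_sets_in_2d_span(v1, v2, k):
    """Collect the top-k sets of cos(phi)*v1 + sin(phi)*v2 over all angles phi."""
    critical = set()
    for i, j in combinations(range(len(v1)), 2):
        a, b = v1[i] - v1[j], v2[i] - v2[j]
        if a == 0 and b == 0:
            continue  # entries i and j are identical for every phi
        # Entries i and j tie when cos(phi)*a + sin(phi)*b = 0.
        phi = math.atan2(-a, b) % math.pi
        critical.update((phi, phi + math.pi))
    angles = sorted(critical)
    found = set()
    # Wrap around: the last interval runs from the largest angle to the first + 2*pi.
    for lo, hi in zip(angles, angles[1:] + [angles[0] + 2 * math.pi]):
        mid = (lo + hi) / 2
        vec = [math.cos(mid) * x + math.sin(mid) * y for x, y in zip(v1, v2)]
        found.add(top_k_set(vec, k))
    return found


sets = top_k_sets_in_2d_span([1, 2, 3, 4], [5, 2, 7, 0], k=2)
print(sorted(sorted(s) for s in sets))
```

Each pair of coordinates contributes two tie angles, so the number of intervals, and hence of candidate sets, is at most n(n-1) in two dimensions. That is polynomial in n rather than the $\binom{n}{k}$ of a naive search.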

  49. Implementation

  50-51. MapReduce Implementation: git.io/spannogram

  52. Billion-scale Graphs. $G(n, \tfrac{1}{2})$, $k = 3\sqrt{n}$. [Plot: subgraph density (0 to 1000) versus $|E|$ ($10^{4}$ to $10^{10}$) for G-Feige, G-Ravi, TPower, and Spannogram.]

  53. Conclusions

  54-59. Conclusions • New combinatorial approx. algorithm for DkS. • Graph-dependent spectral bounds: OPT within 70% in most experiments. • Bound could be trivial in the worst case. • Empirically outperforms previous state of the art. • Highly scalable implementation.

  60. Thank you

  61. Backup slides

  62. Other experiments

  63. Randomized Algorithm. Step 1: Take random points $s_1, \dots, s_{1/\epsilon^{d}} \in \mathrm{span}(v_1, \dots, v_d)$. Step 2: Find the largest k entries of each point. Step 3: Compute the density of the corresponding subgraph.
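The three steps above can be sketched as follows. This is a minimal dense-matrix illustration under stated assumptions, not the MapReduce implementation from the talk. It takes the eigenvectors of the d largest eigenvalues as the span, draws Gaussian combinations as the random points, tries both signs of each point, and measures density as twice the edge count divided by k (consistent with the trivial upper bound of k-1 mentioned earlier). The function name and parameters are hypothetical.

```python
import numpy as np


def randomized_low_rank_dks(A, k, d, num_samples, seed=0):
    """Sample points in the span of the top-d eigenvectors and keep the best top-k set."""
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    eigvals, eigvecs = np.linalg.eigh(A)
    V = eigvecs[:, np.argsort(-eigvals)[:d]]      # basis of the d-dimensional span
    best_set, best_density = None, -1.0
    for _ in range(num_samples):
        s = V @ rng.standard_normal(d)            # Step 1: random point in the span
        for candidate in (s, -s):                 # the span contains both signs
            S = np.argsort(-candidate)[:k]        # Step 2: largest k entries
            density = A[np.ix_(S, S)].sum() / k   # Step 3: 2 * edges / k
            if density > best_density:
                best_set, best_density = sorted(S.tolist()), density
    return best_set, best_density


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} and a disjoint path 4-5-6-7.
A = np.zeros((8, 8))
for i in range(4):
    for j in range(4):
        if i != j:
            A[i, j] = 1.0
for u, w in [(4, 5), (5, 6), (6, 7)]:
    A[u, w] = A[w, u] = 1.0
print(randomized_low_rank_dks(A, k=4, d=1, num_samples=5))
```

On the toy graph the leading eigenvector is supported on the clique, so the clique is recovered with density k-1 = 3. On real graphs the number of samples and the choice of d trade accuracy against running time, as the theorem slide notes.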
