
To be connected, or not to be connected... That is the Minimum Inefficiency Subgraph Problem (PowerPoint presentation)



  1. To be connected, or not to be connected... That is the Minimum Inefficiency Subgraph Problem Natali Ruchansky Francesco Bonchi David Garcia-Soriano Francesco Gullo Nicolas Kourtellis

  2. Biologists in Lab X have constructed a large protein-protein interaction network (PPI).

  3. Biologists in Lab X have constructed a large protein-protein interaction network (PPI). The PI has tasked them with making an amazing discovery about the relationships among specific proteins P1, P2, and P3.

  4. Given a set of subjects in a terrorist network suspected of organizing an attack (suspects 1, 2, and 3 in the figure), which other subjects, likely to be involved, should we keep under surveillance?

  5. Given a set of users who clicked on an ad (impressions 1, 2, and 3 in the figure), who else should the ad be displayed to?

  6. Given a set of patients infected with a viral disease (patients 1, 2, and 3 in the figure), which other people should we monitor?

  7. Community search / seed set expansion
     • General class of problems of the form: Given a graph G=(V,E) and a set of vertices Q ⊆ V, find a subgraph H of G that “explains” the connections among Q (H minimizes or maximizes some objective function).
     • Several approaches in the literature:
       – H must be a connected subgraph
       – Mostly based on random walks
       – Tend to return rather large solutions
       – Solutions get very large when query nodes belong to different communities
       – Have parameters

  8. The Minimum Wiener Connector Problem (SIGMOD 2015). Our proposal: find the connected subgraph containing Q that minimizes the Wiener index (the sum of pairwise shortest-path distances).
     • Parameter-free
     • Returns smaller and denser subgraphs, regardless of whether the query nodes belong to the same community
     • Adds “important” nodes (high centrality)
     • Efficient algorithm with approximation guarantees
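To make the objective concrete, here is a minimal sketch that evaluates the Wiener index of a given vertex set on an unweighted, undirected graph stored as an adjacency dict. It only scores a candidate set; it is not the Minimum Wiener Connector algorithm itself, and the names `bfs_distances` and `wiener_index` are hypothetical helpers introduced for illustration.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source` via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def wiener_index(adj, nodes):
    """Sum of shortest-path distances over unordered pairs of `nodes`,
    measured in the subgraph induced by `nodes`. Returns math.inf if that
    subgraph is disconnected (some distance is infinite)."""
    nodes = set(nodes)
    # Induced subgraph: keep only edges with both endpoints in `nodes`.
    sub = {u: [w for w in adj[u] if w in nodes] for u in nodes}
    total = 0
    for u in nodes:
        dist = bfs_distances(sub, u)
        if len(dist) < len(nodes):
            return math.inf
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted from both ends


# Star: hub "x" joins three query vertices.
adj = {"x": ["q1", "q2", "q3"], "q1": ["x"], "q2": ["x"], "q3": ["x"]}
print(wiener_index(adj, {"x", "q1", "q2", "q3"}))  # 9
print(wiener_index(adj, {"q1", "q2", "q3"}))       # inf
```

The second call shows why the Wiener connector must be connected: without the hub, some distance is infinite and the objective is no longer informative.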

  9. Smaller, denser, and more central vertices

  10. Relaxing connectivity: instead of forcing connectivity, relax the constraint

  11. Desired Properties
     • Parsimonious vertex addition: vertices should be added iff they help form a more cohesive subgraph
     • Outlier tolerance: query vertices that are far from the others should remain disconnected
     • Multi-community awareness: if the query vertices span multiple communities, connectedness should not be imposed among them

  12. Cohesiveness
     • As with the Wiener Connector, we leverage shortest-path distances; however, the distance between disconnected vertices is infinite.
     • Idea: use the reciprocal of the shortest-path distance! This handles disconnection neatly ($1/\infty = 0$).
     • Network Efficiency (Latora and Marchiori): $E(G) = \frac{1}{|V|(|V|-1)} \sum_{u \neq v \in V} \frac{1}{d(u,v)}$
     • Harmonic Centrality (Boldi and Vigna): $c(v) = \sum_{u \neq v} \frac{1}{d(u,v)}$, with total harmonic centrality $C(G) = \sum_{v \in V} c(v)$
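The reciprocal-distance idea can be sketched in a few lines: BFS from each vertex, add 1/d for every reachable vertex, and let unreachable vertices fall out (their 1/∞ term is 0). This assumes an unweighted, undirected graph given as an adjacency dict; the function names are hypothetical, and efficiency is taken over ordered pairs so that E(G) = C(G) / (n(n−1)).

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source`; unreachable vertices are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def harmonic_centrality(adj, v):
    """c(v) = sum over u != v of 1/d(u, v) in an undirected graph.
    Unreachable vertices have d = inf, contribute 1/inf = 0, and are simply skipped."""
    dist = bfs_distances(adj, v)
    return sum(1.0 / d for u, d in dist.items() if u != v)


def total_harmonic_centrality(adj):
    """C(G): sum of c(v) over all vertices, i.e. sum of 1/d over ordered pairs."""
    return sum(harmonic_centrality(adj, v) for v in adj)


def network_efficiency(adj):
    """E(G) = C(G) / (n (n - 1)): mean of 1/d(u, v) over ordered pairs u != v."""
    n = len(adj)
    if n < 2:
        return 0.0  # convention assumed here: a graph with no pairs has zero efficiency
    return total_harmonic_centrality(adj) / (n * (n - 1))


# Path a - b - c plus an isolated vertex d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(total_harmonic_centrality(adj))  # 5.0
print(network_efficiency(adj))         # 0.4166666666666667 (= 5/12)
```

The isolated vertex d adds nothing to C(G) but still enlarges the denominator of E(G), which is exactly the "disconnection handled neatly" behaviour described above.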

  13. What about these problem statements?
     • Given a graph G=(V,E) and a set of vertices Q ⊆ V, find a (not necessarily connected) subgraph H of G, with Q ⊆ V(H), that maximizes network efficiency E(H).
     • Given a graph G=(V,E) and a set of vertices Q ⊆ V, find a (not necessarily connected) subgraph H of G, with Q ⊆ V(H), that maximizes the total harmonic centrality C(H).

  14. These do not work… [Figure: three example solutions, one of which contains a clique of size 100. Values shown: C(H)=9900, C(G[Q])=0, C(H)=0; E(H)=0.942, E(G[Q])=0, E(H)=0.]
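A quick check of the 9900 and 0.942 values, assuming (the figure does not state it explicitly) that the example H consists of the three query vertices plus a 100-vertex clique not connected to them, with E and C taken over ordered pairs:

```latex
\begin{aligned}
C(H) &= \sum_{v \in V(H)} \sum_{u \neq v} \frac{1}{d(u,v)}
      = \underbrace{100 \cdot 99 \cdot 1}_{\text{clique pairs}}
      + \underbrace{0}_{\text{pairs with a query vertex}} = 9900 \\
E(H) &= \frac{C(H)}{|V(H)|\,(|V(H)|-1)} = \frac{9900}{103 \cdot 102} = \frac{9900}{10506} \approx 0.942
\end{aligned}
```

Under this reading, both objectives reward attaching a large, dense block that says nothing about how the query vertices relate to each other.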

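  15. Minimize Network Inefficiency: Given a graph G=(V,E), we define its inefficiency as $I(G) = \sum_{u,v \in V,\, u \neq v} \left(1 - \frac{1}{d_G(u,v)}\right)$. Note: $I(G) = |V|(|V|-1) - C(G) = |V|(|V|-1)\,(1 - E(G))$.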
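A minimal sketch of the inefficiency score, assuming the ordered-pair form $I(G) = \sum_{u \neq v} (1 - 1/d(u,v))$ on an unweighted, undirected graph (adjacency dict). Adjacent pairs cost nothing, and a disconnected pair costs a full 1 because $1/\infty = 0$. The helper names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source`; unreachable vertices are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def inefficiency(adj):
    """I(G) = sum over ordered pairs u != v of (1 - 1/d(u, v)).
    Adjacent pairs cost 0; disconnected pairs (d = inf) cost a full 1."""
    total = 0.0
    for v in adj:
        dist = bfs_distances(adj, v)
        for u in adj:
            if u != v:
                d = dist.get(u)
                total += 1.0 if d is None else 1.0 - 1.0 / d
    return total


# Three isolated query vertices: 6 ordered pairs, all disconnected.
query_only = {"q1": [], "q2": [], "q3": []}
print(inefficiency(query_only))  # 6.0

# The same query vertices next to a disjoint clique on 100 vertices.
with_clique = {i: [j for j in range(100) if j != i] for i in range(100)}
with_clique.update(query_only)
print(inefficiency(with_clique))  # 606.0
```

Unlike C and E, which grow when the clique is added, inefficiency grows too, so minimizing it no longer favours padding the solution with unrelated dense structure.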

  16. … and this works [Figure: the same three examples, one containing a clique of size 100. Values shown: C(G[Q])=0, C(G[Q])=9900, C(G[Q])=0; E(G[Q])=0, E(G[Q])=0, E(G[Q])=0.942; I(G[Q])=6, I(G[Q])=606, I(G[Q])=12.]
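The inefficiency values 6 and 606 can be reproduced if inefficiency sums $1 - 1/d(u,v)$ over ordered pairs, so that it equals $|V|(|V|-1) - C(G)$. The 606 case again assumes the three query vertices sit next to a disjoint 100-vertex clique:

```latex
\begin{aligned}
I(G) &= \sum_{u \neq v} \Bigl(1 - \frac{1}{d(u,v)}\Bigr) = |V|\,(|V|-1) - C(G) \\
I(\text{3 isolated query vertices}) &= 3 \cdot 2 - 0 = 6 \\
I(\text{query vertices} + K_{100}) &= 103 \cdot 102 - 9900 = 10506 - 9900 = 606
\end{aligned}
```

Every clique pair is at distance 1 and costs nothing, but each of the 606 ordered pairs involving a disconnected query vertex costs a full 1, so adding the clique now makes the solution worse rather than better.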

  17. Problem statement and hardness

  18. Greedy Algorithm
     • Connect: start with the Minimum Wiener Connector for Q.
     • Remove: remove one vertex at a time until Q is disconnected.
     • Choose: choose the intermediate solution S that minimizes I(S).
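The Connect / Remove / Choose loop can be sketched as follows, under several assumptions not stated on the slide: the starting connector is passed in precomputed (computing the Minimum Wiener Connector is not shown); the vertex removed at each step is the non-query vertex whose deletion gives the lowest inefficiency; and removal continues until only query vertices remain, which loosely stands in for "until Q is disconnected". All names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source, allowed):
    """BFS distances from `source` inside the subgraph induced by `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def inefficiency(adj, nodes):
    """I(G[nodes]): sum over ordered pairs u != v of (1 - 1/d), with 1/inf = 0."""
    total = 0.0
    for v in nodes:
        dist = bfs_distances(adj, v, nodes)
        for u in nodes:
            if u != v:
                d = dist.get(u)
                total += 1.0 if d is None else 1.0 - 1.0 / d
    return total


def greedy_min_inefficiency(adj, query, connector):
    """Hypothetical sketch of Connect / Remove / Choose.

    `connector` is assumed to be a precomputed connected vertex set that
    contains `query` (e.g. a Minimum Wiener Connector).
    """
    query = set(query)
    current = set(connector)  # Connect
    best, best_cost = set(current), inefficiency(adj, current)
    while current - query:
        # Remove (assumed rule): drop the non-query vertex whose deletion gives
        # the lowest inefficiency; sorting makes tie-breaking deterministic.
        candidates = sorted(current - query, key=str)
        victim = min(candidates, key=lambda x: inefficiency(adj, current - {x}))
        current.remove(victim)
        cost = inefficiency(adj, current)
        if cost < best_cost:  # Choose: keep the best intermediate solution
            best, best_cost = set(current), cost
    return best, best_cost


# Hub x between q1 and q2: keeping x (I = 1.0) beats G[Q] alone (I = 2.0).
hub = {"q1": ["x"], "q2": ["x"], "x": ["q1", "q2"]}
print(greedy_min_inefficiency(hub, {"q1", "q2"}, {"q1", "q2", "x"}))

# Long path q1 - a - b - c - q2: leaving q1 and q2 disconnected (I = 2.0) wins.
path = {"q1": ["a"], "a": ["q1", "b"], "b": ["a", "c"], "c": ["b", "q2"], "q2": ["c"]}
print(greedy_min_inefficiency(path, {"q1", "q2"}, set(path)))
```

The two calls illustrate parsimonious vertex addition: a single shared neighbour is worth keeping, while a long chain of intermediaries is dropped and the query vertices are left disconnected.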

  19. Competitors: ICDE ’15, SIGMOD ’15, SDM ’13, KDD ’10, KDD ’06

  20. Brain Co-activation Network: The data is a graph where each vertex is an area of the brain, and edges are added according to co-activation in experiments (the graph is one connected component). The 3 components in the solution end up corresponding to different functions: motor, visual, and emotional. [Figure legend: query vertices, extra vertices.] Relaxing connectivity highlights three different functional relationships and gives a smaller, more interpretable solution.

  21. Brain Co-activation Network: competitors

  22. Experimental Results
     • Parsimonious vertex addition: vertices should be added iff they help form a more cohesive subgraph
     • Outlier tolerance: query vertices that are far from the others should remain disconnected
     • Multi-community awareness: if the query vertices span multiple communities, connectedness should not be imposed among them

  23. Experimental Results [Figure: plots with axis labels # disconnected singletons in solution, solution size, # query vertices, # outliers selected, # connected components in solution, and # of communities spanned by Q.]

  24. Cohesive meal creation [Figure: ingredient sets selected by Minimum Inefficiency, MDL-based, and Bump Hunting. Ingredients shown: lemongrass, scallop, onion, bell pepper, coffee, mushroom, beef, soybean, honey, peanut butter, black bean.]

  25. Biology [Figure: gene sets selected by Minimum Inefficiency, MDL-based, and Bump Hunting. Genes shown: FAM100B, ELAV, NRAS, NOD2, PIK3CA, SMARCA4, ERBB3, NF1, SMAD4, BRAC1, GALNT2, MUC1, CTNNB1, ESR1.]

  26. Takeaway [Figure: speech bubbles “you love cats!”, “but I don’t...”, “how are we related?”] Selective Connector
