Scalable Diffusion-Aware Optimization of Network Topology Elias Boutros Khalil, Bistra Dilkina, Le Song Georgia Institute of Technology
Problem
• Given
  • a graph G(V,E)
  • a set of source nodes X (infected nodes)
  • the Linear Threshold Model
• Find a set of k edges to
  • remove, s.t. the spread of a certain substance is minimized, or
  • add, s.t. the spread of a certain substance is maximized
Review: Diffusion Models
• Linear Threshold Model
  • Each edge has a weight Wuv
  • Each node v chooses a threshold θv uniformly at random in [0,1]
  • Node v will be infected if the weights Wuv of its infected in-neighbors u sum to at least θv
• Independent Cascade Model
  • Each edge has a propagation probability Puv
  • Each infected node u has only one chance to infect its neighbor v, with prob. Puv
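The Linear Threshold rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `simulate_lt`, the `in_weights` dictionary layout, and the optional `thresholds` argument (useful for reproducible runs) are all assumptions made for this example.

```python
import random

def simulate_lt(in_weights, sources, thresholds=None, rng=None):
    """Run one Linear Threshold cascade and return the set of infected nodes.

    in_weights[v] maps each in-neighbor u of v to the edge weight W_uv
    (hypothetical layout chosen for this sketch).
    """
    rng = rng or random.Random()
    nodes = set(in_weights) | set(sources)
    for nbrs in in_weights.values():
        nodes.update(nbrs)
    # Each node draws its threshold uniformly at random in [0, 1]
    # unless fixed thresholds are supplied.
    if thresholds is None:
        thresholds = {v: rng.random() for v in nodes}
    infected = set(sources)
    changed = True
    while changed:  # repeat until no new node becomes infected
        changed = False
        for v in nodes - infected:
            pressure = sum(w for u, w in in_weights.get(v, {}).items()
                           if u in infected)
            if pressure >= thresholds[v]:
                infected.add(v)
                changed = True
    return infected
```

Averaging `len(simulate_lt(...))` over many random runs gives a Monte Carlo estimate of the expected spread from a source set.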
Review: Influence Maximization
• Given
  • G(V,E)
  • LT model or IC model
• Find k nodes to activate so as to maximize the spread of a certain substance
• Greedy algorithm
  • Objective function is submodular
  • (1-1/e)-approximation
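The greedy algorithm repeatedly adds the node with the largest marginal gain in spread. The sketch below is illustrative only: it uses deterministic reachability as a stand-in for expected spread (this corresponds to the IC model with every propagation probability equal to 1), whereas a real implementation would estimate the spread by simulation. The names `reachable` and `greedy_seeds` are assumptions for this example.

```python
def reachable(adj, seeds):
    """Return all nodes reachable from the seed nodes in a directed graph."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def greedy_seeds(adj, k, spread=None):
    """Pick k seeds greedily by marginal gain of a set function `spread`."""
    # Assumption: the spread of a seed list is the number of reachable nodes.
    spread = spread or (lambda seeds: len(reachable(adj, seeds)))
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        base = spread(chosen)
        best, best_gain = None, -1
        for v in sorted(nodes - set(chosen)):  # sorted for deterministic ties
            gain = spread(chosen + [v]) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k nodes available
            break
        chosen.append(best)
    return chosen
```

Because reachability counts are a coverage function, the objective here is monotone and submodular, which is the property the (1-1/e) guarantee relies on.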
Edge Deletion Problem
• Given G, source set A
• Find k edges to delete
• The objective is supermodular
• Greedy algorithm provides a (1-1/e)-approximation
• Scaling-up tricks
Edge Addition Problem
• Given G, source set A
• Find k edges to add
• Still supermodular (equivalent to constrained submodular minimization)
• Algorithm: maximize a lower bound
Edge Addition Problem
• Marginal gain is bounded
• Apply an approach for constrained submodular minimization with approximation guarantees
  • R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential-based submodular function optimization. In ICML, 2013.
Experiments
• Datasets
  • Synthetic datasets: generated by the Kronecker graph model
    • (1) CorePeriphery, (2) ErdosRenyi and (3) Hierarchical
  • Real datasets
Experiments
• Competing heuristics
  • Random
  • Weights: k edges with the highest weights
  • Betweenness
  • Eigen: k edges that maximize the drop in the leading eigenvalue
  • Degree: k edges whose destination nodes have the highest out-degrees [8]
Experiments: results for edge deletion and edge addition
Core Decomposition of Uncertain Graphs Francesco Bonchi, Francesco Gullo, Andreas Kaltenbrunner, Yana Volkovich Yahoo Labs, Spain
Core decomposition
• k-core of a graph
  • a maximal subgraph in which every vertex is connected to at least k other vertices within that subgraph
• Core decomposition
  • The set of all k-cores of a graph G forms the core decomposition of G
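For a deterministic graph, the core decomposition can be computed by repeatedly peeling off a vertex of minimum remaining degree. The sketch below is a simple quadratic-time version written for clarity; the linear-time algorithm uses bucketed degrees instead of `min`. The function name `core_numbers` and the adjacency-set input format are assumptions for this example.

```python
def core_numbers(adj):
    """Return {vertex: core index} for an undirected graph.

    adj maps each vertex to the set of its neighbors.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.get)
        # The core index never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def k_core(adj, k):
    """Vertices of the k-core: those whose core index is at least k."""
    return {v for v, c in core_numbers(adj).items() if c >= k}
```

For a triangle a, b, c with a pendant vertex d attached to a, the triangle vertices get core index 2 and d gets core index 1.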
k-core in uncertain graphs: the (k, η)-core
• A maximal subgraph whose vertices have at least k neighbors in that subgraph with probability no less than η
Motivation
• Core decomposition can be computed efficiently in deterministic graphs
  • computed in linear time
• However, this does not guarantee efficiency in uncertain graphs
  • even the simplest graph operations may become computationally intensive
• Uncertain graph
  • edges are assigned a probability of existence
  • e.g., protein-interaction networks, the influence of one person on another
Applications
• Influence maximization
  • Idea: reduce the input graph G by keeping only the innermost η-shells
  • the higher the core index, the more likely the vertex is an influential spreader [17]
• Task-driven team formation
  • Nodes: individuals; edges: a probabilistic topic model
  • Given a pair <T,Q>, where T is a set of terms and Q is a set of nodes
  • Goal: find a set of nodes A with Q ⊆ A that forms a good team to perform the task described by T
  • Solution: find a connected component of a (k, η)-core that contains Q
Algorithm framework
• Follows the deterministic case
• Uses the η-degree of v: the maximum degree such that the probability for v to have at least that degree is no less than η
• Non-trivial to compute
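One way to compute this quantity for a single vertex is a dynamic program over its incident edges. The sketch below assumes that edges exist independently of one another and reads the definition as "the largest k such that the probability of having at least k existing incident edges is no less than η". The function name `eta_degree` is an assumption for this example, and this is not necessarily the procedure used by the authors.

```python
def eta_degree(edge_probs, eta):
    """Largest k with Pr[deg(v) >= k] >= eta, for independent incident edges.

    edge_probs lists the existence probabilities of the edges incident to v.
    """
    # dist[j] = Pr[exactly j of the edges processed so far exist]
    dist = [1.0]
    for p in edge_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # this edge is absent
            nxt[j + 1] += q * p       # this edge is present
        dist = nxt
    # Accumulate the tail probability Pr[deg >= k] from the top down.
    tail = 0.0
    for k in range(len(dist) - 1, -1, -1):
        tail += dist[k]
        if tail >= eta:
            return k
    return 0  # guards against rounding when eta is very close to 1
```

The table has one entry per possible degree, so the cost is quadratic in the number of incident edges, which illustrates why this step is harder than reading off a degree in a deterministic graph.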
Experiments: results for influence maximization and task-driven team formation
Fast Influence-based Coarsening for Large Networks
Manish Purohit^, B. Aditya Prakash*, Chanhyun Kang^, Yao Zhang*, V S Subrahmanian^
*Virginia Tech, ^University of Maryland
KDD, New York City, August 26, 2014
Networks are getting huge!
• Flickr (friendship network): 87 million users and 8 billion photos until 2013
• Amazon (friendship network): 237 million accounts until 2013
• Facebook (friendship network): 829 million daily active users on average in June 2014
• Twitter (follower network): 271 million monthly active users
Purohit, Prakash, Kang, Zhang, Subrahmanian 2014
Need for fast analysis
• Ever-growing list of applications of network effects
  • Viral marketing
  • Immunization
  • Information diffusion
  • …
• However, scaling traditional algorithms up to millions of nodes is hard
How to handle large-scale networks
• Approaches
  • Use faster / simpler algorithms
  • Perform analysis locally
    • i.e., divide the large network into smaller subgraphs
  • Zoom out of the network to obtain a smaller representation of it (this paper)
Bird’s eye view of a network
• “Zoom out” of the graph to get a quick picture
• Figure: a big graph (nodes A to F) is zoomed out into a small representation of the network
• Called “coarsening” in this paper
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
• Experiments
• Applications
• Conclusion
Challenges
• C1: How do we maintain diffusive characteristics when coarsening networks?
• C2: How do we merge nodes to get the coarse network?
• C3: How do we find the best nodes to merge quickly?
C1: Information Diffusion
• Cascading behavior in networks
• Figure: a blog network (blogs B1 to B4 with their posts and links) and the resulting information cascade. Source: [McGlohon et al., SDM 2007]
• A diffusion is the graph induced by a time-ordered propagation of information (edges)
C1: Model information diffusion
• Information spreads over networks
  • e.g., a rumor/meme spreads over the Twitter follower network
• Independent Cascade model (IC) [Kempe+, KDD03]
  • Weights p_ij: propagation prob. from i to j
  • Each node has only one chance to infect its neighbors
C1: Diffusive characteristics
• The first eigenvalue λ1 (of the adjacency matrix) is enough to characterize most diffusion models (Prakash et al. [ICDM’12])
• λ1 determines the epidemic threshold
• “Safe” → “Vulnerable” → “Deadly”: increasing λ1, increasing vulnerability
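The first eigenvalue can be estimated with power iteration. The sketch below is a plain-Python illustration for small dense matrices with non-negative entries, not the authors' implementation; the function name `leading_eigenvalue` is an assumption. It iterates on A + I rather than A, a standard shift that avoids oscillation on bipartite graphs, and subtracts 1 at the end.

```python
import math

def leading_eigenvalue(A, iters=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a non-negative square matrix A.

    A is a list of rows. Power iteration is run on (A + I) so that the
    dominant eigenvalue is unique in magnitude even for bipartite graphs.
    """
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n  # unit-length positive start vector
    lam = 0.0
    for _ in range(iters):
        # y = (A + I) x
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        new_lam = norm - 1.0  # undo the shift by I
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam
```

For the triangle graph the estimate is 2, and for a three-node path it is the square root of 2; a sparse matrix library would be the usual choice for networks with millions of nodes.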
C1: Maintain diffusive characteristics
• Goal: maintain the diffusive characteristics of the original network in the coarsened network
• Make the coarsened network have the least change in the first eigenvalue
• Figure: the original network (nodes A to F) is coarsened into the coarsened network
C2: How to merge nodes
• Goal: merge nodes of graph G to get a coarsened graph that “approximates” G with respect to diffusion
• Example (original network): merging b and a gives the least change of λ1
  • Influence from d to b: 0.5
  • Influence from d to a: 0.25
  • Average: 0.375
  • 0.375! Is this correct?
C2: How to merge nodes (details)
• In general: merging a, b
C3: Which nodes to merge
• Goal:
  • Find the best nodes to merge
  • Fast, scalable to large networks
• Discussed later in the talk
• Figure: the original network (nodes A to F) is coarsened into the coarsened network
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
• Experiments
• Applications
• Conclusion
Problem Definition: Graph Coarsening Problem (GCP)
• Given: a large graph G(V, E) and a reduction factor α
• Find: the best set of edges to merge
• Such that: |λ_G - λ_H| is minimized
  • i.e., H is the coarsened graph with the least change in the first eigenvalue
Naive Greedy Heuristic
• Steps:
  • Score every edge by the change in eigenvalue
  • Greedily choose the edge (a,b) with the least score, and merge (a,b)
  • Re-evaluate the scores of every edge and repeat
• Too slow! O(m^2) time to score all edges
• Loses the time benefits of analyzing the smaller graph
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
  • CoarseNet
• Experiments
• Applications
• Conclusion
CoarseNet: idea
• Can we approximate the edge scores faster?
  • Yes!
• Use matrix perturbation arguments to estimate (up to first-order terms) the score of an edge in constant time!
  • Score all edges in O(m) time
  • Naive heuristic: O(m^2) time
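To illustrate the flavor of a first-order perturbation estimate, the sketch below uses the generic textbook formula for a change ΔA to a matrix with left eigenvector u and right eigenvector v of λ1: Δλ1 ≈ (uᵀ ΔA v) / (uᵀ v). It scores the removal of a single edge, which is a simpler operation than merging two nodes; it is not the CoarseNet merge score, whose exact form is not given in these slides. The names `_dominant_vector` and `edge_scores` are assumptions for this example.

```python
import math

def _dominant_vector(matvec, n, iters=500):
    """Power iteration on (M + I); returns a unit-length dominant eigenvector."""
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [xi + yi for xi, yi in zip(x, matvec(x))]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

def edge_scores(A):
    """First-order estimate of the change in lambda_1 if edge (i, j) is removed.

    A is a non-negative square matrix given as a list of rows.
    """
    n = len(A)
    # Right eigenvector: A v = lambda v
    right = _dominant_vector(
        lambda x: [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)], n)
    # Left eigenvector: u^T A = lambda u^T
    left = _dominant_vector(
        lambda x: [sum(A[i][j] * x[i] for i in range(n)) for j in range(n)], n)
    denom = sum(l * r for l, r in zip(left, right))
    # Removing edge (i, j) changes A[i][j] by -A[i][j]; once the two
    # eigenvectors are known, each edge is scored in constant time.
    return {(i, j): -A[i][j] * left[i] * right[j] / denom
            for i in range(n) for j in range(n) if A[i][j] != 0}
```

The eigenvectors are computed once, so scoring all m edges afterwards takes O(m) arithmetic, which mirrors the cost argument on the slide.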