Graph testing
Observations: infection status of the n nodes in a graph
k infected nodes (1)
c censored (nonreporting) nodes (⋆)
n − k − c uninfected nodes (0)
Po-Ling Loh (UW-Madison) Data science for networked data Apr 16, 2019
Graph testing
[Figure: candidate graphs H0 vs. H1 vs. H2, with T = 10, T = 0, T = 3]
Compute the test statistic T = number of edges between infected nodes.
We need to construct a proper rejection rule based on T and derive the validity of the resulting hypothesis test.
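A minimal sketch of the statistic, assuming the graph is an edge list over nodes 0..n-1 and the observation is encoded as 1 (infected), 0 (uninfected), or None (censored); the function name and this encoding are illustrative choices, not taken from the talk.

```python
from typing import Iterable, Optional, Sequence, Tuple


def edges_between_infected(edges: Iterable[Tuple[int, int]],
                           states: Sequence[Optional[int]]) -> int:
    """Test statistic T: number of edges whose endpoints are both infected.

    states[v] is 1 (infected), 0 (uninfected) or None (censored, "⋆").
    Censored and uninfected endpoints never contribute to T.
    """
    return sum(1 for u, v in edges if states[u] == 1 and states[v] == 1)


# Hypothetical example: 5-cycle 0-1-2-3-4-0, node 2 censored.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
states = [1, 1, None, 1, 0]
print(edges_between_infected(cycle, states))  # 1: only edge (0, 1) qualifies
```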
Infection model
Parameters: λ, η
For each node v, generate $T_v \sim \mathrm{Exp}(\lambda)$.
For each edge (u, v), generate $T_{uv} \sim \mathrm{Exp}(\eta)$.
The infection time of any vertex v is $t_v = \min_{u \in N(v)} \{t_u + T_{uv}\} \wedge T_v$, where $a \wedge b = \min\{a, b\}$.
The observation vector consists of the infection states at a certain time.
The subset of censored nodes is chosen uniformly at random.
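The infection model can be simulated directly. Because $t_v$ is the smallest total delay along any chain of transmissions that starts with a spontaneous infection, the recursion is a shortest-path problem, so the sketch below solves it with Dijkstra's algorithm. The observation time `t_obs`, the helper names, and the use of Python's `random` module are our assumptions, not details from the talk.

```python
import heapq
import random
from typing import List, Optional, Sequence, Tuple


def infection_times(n: int, edges: Sequence[Tuple[int, int]],
                    lam: float, eta: float, rng: random.Random) -> List[float]:
    """Sample t_v = min( min_{u in N(v)} (t_u + T_uv), T_v ) for all nodes v.

    T_v ~ Exp(lam) is node v's spontaneous infection time and
    T_uv ~ Exp(eta) is the transmission time of edge (u, v), used in both
    directions. The recursion is solved as a shortest-path problem from a
    virtual source that reaches each v at cost T_v (Dijkstra's algorithm).
    """
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v in edges:
        w = rng.expovariate(eta)
        adj[u].append((v, w))
        adj[v].append((u, w))
    t = [rng.expovariate(lam) for _ in range(n)]  # start from t_v = T_v
    heap = [(tv, v) for v, tv in enumerate(t)]
    heapq.heapify(heap)
    settled = [False] * n
    while heap:
        tv, v = heapq.heappop(heap)
        if settled[v]:
            continue  # stale heap entry
        settled[v] = True
        for u, w in adj[v]:
            if tv + w < t[u]:  # infection via neighbor v arrives earlier
                t[u] = tv + w
                heapq.heappush(heap, (t[u], u))
    return t


def observe(times: Sequence[float], t_obs: float, c: int,
            rng: random.Random) -> List[Optional[int]]:
    """States at time t_obs (1 infected, 0 not), with c nodes censored at random."""
    states: List[Optional[int]] = [1 if tv <= t_obs else 0 for tv in times]
    for v in rng.sample(range(len(times)), c):
        states[v] = None
    return states


rng = random.Random(0)
path = [(0, 1), (1, 2), (2, 3)]
times = infection_times(4, path, lam=0.5, eta=1.0, rng=rng)
print(observe(times, t_obs=1.0, c=1, rng=rng))
```

With no edges the sketch reduces to $t_v = T_v$, which is a quick sanity check of the recursion.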
Permutation test
Goal: for α ∈ (0, 1), construct a rejection rule such that $P(\text{reject} \mid H_0 \text{ is true}) \le \alpha$.
Use a permutation test that computes T for each of the $\binom{n}{k,\, c,\, n-k-c}$ reassignments of infected/nonreporting/uninfected labels to nodes.
[Figure: H1 with example reassignments; T = 0, T = 4, T = 4, T = 4]
Based on (randomly chosen) permutations, compute the p-value/rejection region, and reject H0 if (p-value of T) ≤ α.
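For small graphs the permutation distribution can be enumerated exactly. A minimal sketch, assuming large values of T count as evidence against H0 (one-sided) and that `edges` is the graph on which T is evaluated; both are assumptions, and the function names are hypothetical. Because T depends only on which nodes are infected, and every infected set of size k appears with the same number $\binom{n-k}{c}$ of censored placements, enumerating the $\binom{n}{k}$ infected sets gives the same distribution as enumerating all $\binom{n}{k,\, c,\, n-k-c}$ reassignments.

```python
from itertools import combinations
from math import comb
from typing import Iterable, Optional, Sequence, Tuple


def count_infected_edges(edges: Iterable[Tuple[int, int]], infected: Iterable[int]) -> int:
    """T for a given set of infected nodes."""
    s = set(infected)
    return sum(1 for u, v in edges if u in s and v in s)


def exact_permutation_pvalue(n: int, edges: Sequence[Tuple[int, int]],
                             states: Sequence[Optional[int]]) -> float:
    """Fraction of reassignments whose T is at least the observed T (one-sided)."""
    observed = [v for v in range(n) if states[v] == 1]
    k = len(observed)
    t_obs = count_infected_edges(edges, observed)
    # Each k-subset stands for comb(n - k, c) equally likely full reassignments,
    # so counting k-subsets gives the same proportion.
    extreme = sum(1 for subset in combinations(range(n), k)
                  if count_infected_edges(edges, subset) >= t_obs)
    return extreme / comb(n, k)


path = [(0, 1), (1, 2), (2, 3)]
p = exact_permutation_pvalue(4, path, [1, 1, 0, None])
alpha = 0.05
print(p, "reject H0" if p <= alpha else "do not reject H0")  # 0.5 do not reject H0
```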
Permutation test
[Figure: distribution of T over reassignments, split into "do not reject H0" and "reject H0" regions at level α, with the observed statistic T(I) marked]
In practice, it suffices to compute the empirical distribution from a large number of random permutations.
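When exhaustive enumeration is too expensive, shuffling the observed labels uniformly at random samples from the same reassignment distribution. A sketch under the same assumptions (one-sided test, hypothetical names); the (hits + 1)/(B + 1) form is a common choice that keeps a Monte Carlo p-value valid, not necessarily the one used in the talk.

```python
import random
from typing import Optional, Sequence, Tuple


def count_infected_edges(edges: Sequence[Tuple[int, int]],
                         states: Sequence[Optional[int]]) -> int:
    return sum(1 for u, v in edges if states[u] == 1 and states[v] == 1)


def mc_permutation_test(edges: Sequence[Tuple[int, int]],
                        states: Sequence[Optional[int]], alpha: float,
                        num_perm: int = 10_000,
                        rng: Optional[random.Random] = None) -> Tuple[float, bool]:
    """Monte Carlo permutation test; returns (p_value, reject_H0).

    Shuffling the observed labels draws a uniformly random reassignment of
    the k infected, c censored and n - k - c uninfected labels.
    Large T is treated as evidence against H0 (one-sided; an assumption).
    """
    if rng is None:
        rng = random.Random()
    t_obs = count_infected_edges(edges, states)
    labels = list(states)
    hits = 0
    for _ in range(num_perm):
        rng.shuffle(labels)
        if count_infected_edges(edges, labels) >= t_obs:
            hits += 1
    # Counting the observed labeling as one extra draw keeps the test valid.
    p_value = (hits + 1) / (num_perm + 1)
    return p_value, p_value <= alpha


path = [(0, 1), (1, 2), (2, 3)]
print(mc_permutation_test(path, [1, 1, 0, None], alpha=0.05, rng=random.Random(0)))
```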
Theory for permutation test
Success depends on the symmetries of the underlying networks rather than on the parameters λ, η.
Consider $\Pi_0 = \mathrm{Aut}(G_0)$ and $\Pi_1 = \mathrm{Aut}(G_1)$, subsets of $S_n$, and write $\Pi_1 \Pi_0 = \{\pi_1 \pi_0 : \pi_1 \in \Pi_1,\ \pi_0 \in \Pi_0\}$.
[Figure: an 8-node graph relabeled by an automorphism π ∈ Aut(G)]
Theorem. Let π be drawn uniformly from $S_n$. If $\Pi_1 \Pi_0 = S_n$, the permutation test controls Type I error at level α.
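The condition $\Pi_1 \Pi_0 = S_n$ can be checked by brute force on very small graphs (the search runs over all n! permutations, so this is only for intuition). In the sketch, a permutation is a tuple with p[i] the image of node i, and the product $\pi_1 \pi_0$ is composition with $\pi_0$ applied first; these conventions and the function names are our assumptions.

```python
from itertools import combinations, permutations
from math import factorial
from typing import List, Sequence, Set, Tuple

Perm = Tuple[int, ...]  # p[i] is the image of node i


def automorphisms(n: int, edges: Sequence[Tuple[int, int]]) -> List[Perm]:
    """All permutations of {0, ..., n-1} mapping the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if {frozenset((p[u], p[v])) for u, v in edges} == edge_set]


def compose(a: Perm, b: Perm) -> Perm:
    """(a b)(i) = a(b(i)): apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


def product_is_symmetric_group(n: int, edges0: Sequence[Tuple[int, int]],
                               edges1: Sequence[Tuple[int, int]]) -> bool:
    """Check Pi_1 Pi_0 = S_n, with Pi_i = Aut(G_i), by brute force."""
    pi0 = automorphisms(n, edges0)
    pi1 = automorphisms(n, edges1)
    product: Set[Perm] = {compose(p1, p0) for p1 in pi1 for p0 in pi0}
    return len(product) == factorial(n)


complete = list(combinations(range(4), 2))
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
print(product_is_symmetric_group(4, complete, path))  # True: Aut(K_4) = S_4
print(product_is_symmetric_group(4, star, path))      # False: only 12 of 24
```

For the star and path on 4 nodes, $|\Pi_1 \Pi_0| = |\Pi_1||\Pi_0| / |\Pi_1 \cap \Pi_0| = 2 \cdot 6 / 1 = 12 < 24$, so the sufficient condition fails there.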
Extensions and open directions
Characterization of the condition $\Pi_1 \Pi_0 = S_n$ for various graph families
Bounds on Type II error for specific graphs
Conditioning on the identity of censored nodes
Open directions:
How can we identify which graphs to use as null/alternative hypotheses?
Inhomogeneous λ and η?
Confidence sets for the underlying network?
Resource allocation
Collaborators: Justin Khim (UPenn), Varun Jog (UW-Madison), Ashley Hou (UW-Madison), Wen Yan (Southeast University)
Influence maximization (with Justin Khim and Varun Jog)
New goal: seed a network to “infect” as many nodes as possible
Useful for information dissemination, marketing, etc.
[Figure: infection spreading from seed nodes at t = 0, t = 1, t = 2]
Questions:
1. If k nodes may be infected initially, which nodes should be selected to maximize infection spread?
2. How can the maximal set be determined efficiently?
Model: Linear threshold model (broadly, triggering models)
Edges have weights $(b_{ij})$ satisfying $\sum_j b_{ji} \le 1$ for each node i.
Nodes choose thresholds $\theta_i \in [0, 1]$ i.i.d., uniformly at random.
[Figure: example network with edge weights and node thresholds, shown at t = 0, t = 1, t = 2]
On each round, uninfected nodes compute the total weight of their infected neighbors and become infected if $\sum_{j \text{ infected}} b_{ji} > \theta_i$.
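A minimal simulation sketch of the linear threshold model, assuming weights are given as a dictionary `weights[(j, i)] = b_ji` for the directed edge j → i; the names and input format are our assumptions. Thresholds can be passed in explicitly (useful for reproducing a worked example) or drawn i.i.d. uniform.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple


def linear_threshold(n: int, weights: Dict[Tuple[int, int], float], seeds: Iterable[int],
                     thresholds: Optional[Sequence[float]] = None,
                     rng: Optional[random.Random] = None) -> Set[int]:
    """Run the linear threshold model until no new node becomes infected.

    weights[(j, i)] = b_ji is the weight of edge j -> i, with sum_j b_ji <= 1.
    thresholds[i] = theta_i; if omitted, drawn i.i.d. uniformly from [0, 1).
    Returns the final infected set.
    """
    if thresholds is None:
        rng = rng if rng is not None else random.Random()
        thresholds = [rng.random() for _ in range(n)]
    incoming: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (j, i), b in weights.items():
        incoming[i].append((j, b))
    infected = set(seeds)
    while True:
        # One round: each uninfected node sums the weights of infected in-neighbors.
        newly = {i for i in range(n) if i not in infected
                 and sum(b for j, b in incoming[i] if j in infected) > thresholds[i]}
        if not newly:
            return infected
        infected |= newly


chain = {(0, 1): 0.5, (1, 2): 0.5}
print(linear_threshold(3, chain, seeds=[0], thresholds=[0.9, 0.4, 0.6]))  # {0, 1}
print(linear_threshold(3, chain, seeds=[0], thresholds=[0.9, 0.4, 0.3]))  # {0, 1, 2}
```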
Previous work
Monotonicity and submodularity of the influence function in triggering models (Kempe et al. ’03)
⇒ The greedy algorithm yields a $(1 - 1/e)$-approximation to $\max_{A \subseteq V:\, |A| \le k} I(A)$.
However, the method involves approximating I via simulations at each iteration of the greedy algorithm.
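The simulation-based greedy approach can be sketched as follows: at each step, add the node with the largest estimated influence, where I is estimated by averaging repeated linear-threshold runs. This is an illustrative sketch under our own choices (number of simulations, Python's `random` module, hypothetical helper names), and it makes the cost visible: each greedy step runs `num_sims` simulations for every candidate node.

```python
import random
from typing import Dict, List, Sequence, Tuple


def simulate_lt(n: int, incoming: List[List[Tuple[int, float]]],
                seeds: Sequence[int], rng: random.Random) -> int:
    """One linear-threshold run with fresh Uniform[0, 1) thresholds; returns the spread."""
    thresholds = [rng.random() for _ in range(n)]
    infected = set(seeds)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i not in infected and sum(b for j, b in incoming[i] if j in infected) > thresholds[i]:
                infected.add(i)
                changed = True
    return len(infected)


def greedy_seeds(n: int, weights: Dict[Tuple[int, int], float], k: int,
                 num_sims: int = 200, seed: int = 0) -> List[int]:
    """Greedy seed selection with Monte Carlo estimates of the influence I(A)."""
    rng = random.Random(seed)
    incoming: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (j, i), b in weights.items():
        incoming[i].append((j, b))

    def estimate(seeds: List[int]) -> float:
        return sum(simulate_lt(n, incoming, seeds, rng) for _ in range(num_sims)) / num_sims

    chosen: List[int] = []
    for _ in range(k):
        # Maximizing I(A + v) is the same as maximizing the marginal gain I(A + v) - I(A).
        candidates = [v for v in range(n) if v not in chosen]
        chosen.append(max(candidates, key=lambda v: estimate(chosen + [v])))
    return chosen


star = {(0, i): 1.0 for i in range(1, 5)}  # center 0 points to leaves 1..4
print(greedy_seeds(5, star, k=1))  # [0]
```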
Key contributions
1. Computable upper and lower bounds for the influence function in general triggering models
2. Characterization of the gap between the bounds
3. Proof of monotonicity and submodularity for a family of lower bounds ⇒ $(1 - 1/e)$-approximation for the sequential greedy algorithm
Leads to significant speed-ups (relative computation times, normalized so that LB 1 = 1.00):
Graph                      LB 1   LB 2      UB   Simulation
Erdős-Rényi                1.00   2.36   27.43       710.58
Preferential attachment    1.00   2.56   28.49       759.83
2D grid                    1.00   2.43   47.08      1301.73
Budget allocation (with Ashley Hou)
Problem: given a fixed budget to distribute among influencers, how should resources be allocated optimally?
[Figure: source nodes S and customers T, with allocations y(1) = 2, y(4) = 3]
Mathematical formulation: if resources $\{y(s)\}_{s \in S}$ are allocated among source nodes S, the probability of influencing customer t is
$I_t(y) = 1 - \prod_{(s,t) \in E} (1 - p_{st})^{y(s)}$,
so we solve
$\max_y \sum_{t \in T} I_t(y)$ subject to $\sum_{s \in S} y(s) \le B$.
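A sketch of the objective together with a simple greedy heuristic that spends an integer budget B one unit at a time on the source with the largest marginal gain. The heuristic, the integer-budget assumption, and the function names are ours, shown only to make the formula concrete; it is not claimed to be the method from the talk.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (source s, customer t)


def influence_prob(t: Hashable, y: Dict[Hashable, int], p: Dict[Edge, float]) -> float:
    """I_t(y) = 1 - prod_{(s, t) in E} (1 - p_st) ** y(s)."""
    miss = 1.0
    for (s, target), p_st in p.items():
        if target == t:
            miss *= (1.0 - p_st) ** y.get(s, 0)
    return 1.0 - miss


def total_influence(y: Dict[Hashable, int], p: Dict[Edge, float]) -> float:
    """Objective sum_{t in T} I_t(y); customers without edges contribute 0."""
    return sum(influence_prob(t, y, p) for t in {t for (_, t) in p})


def greedy_allocation(sources: List[Hashable], p: Dict[Edge, float],
                      budget: int) -> Dict[Hashable, int]:
    """Heuristic: give each of the B units to the source with the largest marginal gain."""
    y = {s: 0 for s in sources}
    for _ in range(budget):
        best = max(sources, key=lambda s: total_influence({**y, s: y[s] + 1}, p))
        y[best] += 1
    return y


p = {(1, "a"): 0.5, (2, "b"): 0.5, (2, "c"): 0.5}
print(total_influence({1: 1, 2: 0}, p))        # 0.5
print(greedy_allocation([1, 2], p, budget=1))  # {1: 0, 2: 1}
```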
Robust variant
In practice, we might not know the edge parameters $p = \{p_{st}\}$, or even the edge structure.
Robust optimization framework, where Σ is the set of candidate edge parameters:
$\max_{y:\, \sum_{s \in S} y(s) \le B} \ \min_{p \in \Sigma} \ \sum_{t \in T} I_t^p(y)$
Goal: develop efficient algorithms for robust budget allocation with provable approximation guarantees.
Ingredients: maximization of the minimum of submodular functions, extensions to integer lattices and budget constraints.
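To make the max-min concrete, the sketch below assumes Σ is a small finite list of candidate parameter sets (an assumption; Σ need not be finite in general) and finds the robust allocation by exhaustive search over integer allocations. This is exponential in |S| and meant only for toy instances; efficient algorithms with guarantees are exactly what the slide lists as the goal.

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def total_influence(y: Dict[Hashable, int], p: Dict[Edge, float]) -> float:
    """sum_t I_t^p(y) with I_t^p(y) = 1 - prod_{(s, t)} (1 - p_st) ** y(s)."""
    miss: Dict[Hashable, float] = {}
    for (s, t), p_st in p.items():
        miss[t] = miss.get(t, 1.0) * (1.0 - p_st) ** y.get(s, 0)
    return sum(1.0 - m for m in miss.values())


def robust_allocation(sources: List[Hashable], scenarios: Sequence[Dict[Edge, float]],
                      budget: int) -> Tuple[Dict[Hashable, int], float]:
    """Exhaustive max over integer y with sum(y) <= B of the min over a finite Sigma."""
    best_y: Dict[Hashable, int] = {}
    best_val = float("-inf")
    for alloc in product(range(budget + 1), repeat=len(sources)):
        if sum(alloc) > budget:
            continue
        y = dict(zip(sources, alloc))
        worst = min(total_influence(y, p) for p in scenarios)  # adversary picks p in Sigma
        if worst > best_val:
            best_y, best_val = y, worst
    return best_y, best_val


# Two hypothetical scenarios: either source 1 or source 2 reaches customer "a".
sigma = [{(1, "a"): 0.5}, {(2, "a"): 0.5}]
print(robust_allocation([1, 2], sigma, budget=2))  # ({1: 1, 2: 1}, 0.5)
```

Hedging across both sources is optimal here: putting the whole budget on one source scores 0 in the scenario where that source has no edge.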
Network immunization (with Wen Yan)
Goal: given a budget of interventions at the nodes/edges of a graph, how should resources be distributed optimally to slow an epidemic?
We are interested in fractional immunization, which only decreases the infectiveness of nodes/edges.
[Figure: example graph with edge weights 0.2, 0.5, 0.4]
Network immunization
Formulation as an influence maximization problem:
$\min_{\{\theta_{ij}\}:\, \sum \theta_{ij} \le B} \ \max_{A \subseteq V:\, |A| \le k} \ I(A; \{b_{ij}\} - \{\theta_{ij}\})$
(here $\theta_{ij}$ is the reduction applied to edge weight $b_{ij}$, not a node threshold)
Challenges:
1. Bilevel optimization problem involving discrete and continuous variables
2. No computable closed-form expression for I or ∇I
Local algorithms
Collaborators: Muni Pydi (UW-Madison), Varun Jog (UW-Madison)
Maximizing graph functions
Given a function f defined on the nodes of a graph
Examples: degree, age of node, power/population level, etc.
[Figure: example graph with the value of f at each node]
Goal: maximize f by “walking” along edges and querying values.
Could use a “vanilla random walk” with transition probabilities $P_{ij} = w_{ij} / d_i$, where $d_i = \sum_j w_{ij}$, but can we leverage the smoothness/structure of the graph function?
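A sketch of the vanilla random-walk baseline, assuming the graph is a list of dicts `adj[i] = {j: w_ij}` with symmetric weights and every node has at least one neighbor; the names are hypothetical. `random.choices` with weights $w_{ij}$ draws j with probability $w_{ij}/d_i$.

```python
import random
from typing import Dict, List, Optional, Sequence


def random_walk_max(adj: List[Dict[int, float]], f: Sequence[float], steps: int,
                    start: Optional[int] = None,
                    rng: Optional[random.Random] = None) -> float:
    """Walk `steps` steps with P_ij = w_ij / d_i and return the largest f value seen.

    adj[i] = {j: w_ij} holds symmetric edge weights; every node needs a neighbor.
    """
    if rng is None:
        rng = random.Random()
    i = rng.randrange(len(f)) if start is None else start
    best = f[i]
    for _ in range(steps):
        nbrs = list(adj[i])
        # random.choices normalizes the weights, i.e. divides by d_i = sum_j w_ij.
        i = rng.choices(nbrs, weights=[adj[i][j] for j in nbrs])[0]
        best = max(best, f[i])
    return best


# Hypothetical 4-cycle with f values at the nodes.
cycle = [{1: 1.0, 3: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0, 3: 1.0}, {2: 1.0, 0: 1.0}]
print(random_walk_max(cycle, [2, 3, 6, 1], steps=20, rng=random.Random(0)))
```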
Metropolis-Hastings algorithm
The MH algorithm is specified by a target density $p_f$ and a proposal distribution Q (a stochastic matrix).
Transition matrix:
$P_{ij} = Q_{ij} \min\left\{1, \frac{p_f(j)\, Q_{ji}}{p_f(i)\, Q_{ij}}\right\}$ for $j \ne i$, and $P_{ii} = 1 - \sum_{j \ne i} P_{ij}$.
The MH chain is known to converge to $p_f$ (under standard irreducibility and aperiodicity conditions).
Idea: build a density $p_f$ that is maximized wherever f is maximized, and hope that the MH algorithm finds maximizers quickly.
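The transition matrix can be built and checked numerically. A sketch with NumPy (the function name and the toy target are hypothetical); it assumes $p_f > 0$ everywhere. The final check relies on detailed balance: $p_f(i) P_{ij} = \min\{p_f(i) Q_{ij},\ p_f(j) Q_{ji}\}$ is symmetric in i and j, so $p_f$ is stationary.

```python
import numpy as np


def mh_transition_matrix(p: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Metropolis-Hastings transition matrix for a positive target p (unnormalized is fine).

    P_ij = Q_ij * min(1, p_j Q_ji / (p_i Q_ij)) for j != i,
    P_ii = 1 - sum_{j != i} P_ij.
    """
    p = np.asarray(p, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n = len(p)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if j != i and Q[i, j] > 0:
                P[i, j] = Q[i, j] * min(1.0, p[j] * Q[j, i] / (p[i] * Q[i, j]))
        P[i, i] = 1.0 - P[i].sum()  # mass of rejected proposals stays at i
    return P


# Path graph 0-1-2 with Q = D^{-1} W and a hypothetical target p = (1, 2, 3).
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Q = W / W.sum(axis=1, keepdims=True)
p = np.array([1.0, 2.0, 3.0])
P = mh_transition_matrix(p, Q)
pi = p / p.sum()
print(np.allclose(pi @ P, pi))  # True: p_f is stationary
```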
Local algorithm
1. Initialize at a random vertex $i_0$.
2. Take T steps of the MH algorithm according to transition matrix P.
3. Output the maximum among $\{f(i_0), \ldots, f(i_T)\}$.
Exponential walk: $p_f(i) \propto \exp(\gamma f(i))$ and $Q = D^{-1} W$
Laplacian walk: $p_f(i) \propto f^2(i)$ and Q defined with respect to eigenvectors of the graph Laplacian $L = D - W$
Theoretical results: rates of convergence in TV distance and hitting-time bounds for both algorithms in terms of graph/function characteristics
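A sketch of the three-step local algorithm with the exponential walk, simulated step by step rather than by forming P. With $Q = D^{-1} W$ and symmetric W, the proposal ratio is $Q_{ji}/Q_{ij} = d_i/d_j$, so the acceptance probability is $\min\{1, e^{\gamma(f(j) - f(i))} d_i/d_j\}$; the code works with its logarithm to avoid overflow. The graph format and names are our assumptions, and the Laplacian walk is not sketched, since its proposal Q is only described here as being defined through Laplacian eigenvectors.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def exponential_walk_max(adj: List[Dict[int, float]], f: Sequence[float], steps: int,
                         gamma: float, start: Optional[int] = None,
                         rng: Optional[random.Random] = None) -> float:
    """Local algorithm: MH with p_f(i) ∝ exp(gamma f(i)) and Q = D^{-1} W.

    adj[i] = {j: w_ij} with symmetric weights; every node needs a neighbor.
    """
    if rng is None:
        rng = random.Random()
    deg = [sum(nbrs.values()) for nbrs in adj]
    i = rng.randrange(len(f)) if start is None else start  # 1. random start i_0
    best = f[i]
    for _ in range(steps):                                  # 2. T MH steps
        nbrs = list(adj[i])
        j = rng.choices(nbrs, weights=[adj[i][k] for k in nbrs])[0]  # propose j ~ Q_i.
        # log of p_f(j) Q_ji / (p_f(i) Q_ij) = gamma (f_j - f_i) + log(d_i / d_j)
        log_ratio = gamma * (f[j] - f[i]) + math.log(deg[i] / deg[j])
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            i = j                                           # accept; otherwise stay
        best = max(best, f[i])                              # 3. running max of f(i_t)
    return best


cycle = [{1: 1.0, 3: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0, 3: 1.0}, {2: 1.0, 0: 1.0}]
print(exponential_walk_max(cycle, [2, 3, 6, 1], steps=20, gamma=1.0, rng=random.Random(0)))
```

Larger γ makes moves toward higher f more likely to be accepted and moves toward lower f less likely.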
Summary
Many interesting data analysis problems involve network-structured data.