
http://cs224w.stanford.edu (1) New problem: Outbreak detection



  1. HW2 Q1.1 parts (b) and (c) cancelled. HW3 released. It is long. Start early. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. - (1) New problem: Outbreak detection
     - (2) Develop an approximation algorithm
       - It is a submodular optimization problem!
     - (3) Speed up greedy hill-climbing
       - Valid for optimizing general submodular functions (i.e., also works for influence maximization)
     - (4) Prove a new "data-dependent" bound on the solution quality
       - Valid for optimizing any submodular function (i.e., also works for influence maximization)

     10/26/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

  3. - Given a real city water distribution network
     - And data on how contaminants spread in the network
     - Detect the contaminant as quickly as possible
     - Problem posed by the US Environmental Protection Agency

  4. [Figure: blogs and their posts linked by time-ordered hyperlinks, forming an information cascade]

     Which blogs should one read to detect cascades as effectively as possible?

  5. Want to read things before others do.
     - Detect the blue and yellow stories soon but miss the red story.
     - Detect all stories, but late.

  6. - Both of these are instances of the same underlying problem!
     - Given a dynamic process spreading over a network, we want to select a set of nodes to detect the process effectively
     - Many other applications:
       - Epidemics
       - Influence propagation
       - Network security

  7. - Utility of placing sensors:
       - Water flow dynamics, demands of households, …
     - For each subset $S \subseteq V$ compute utility $f(S)$

     [Figure: set $V$ of all network junctions with candidate sensor locations $S_1, \dots, S_4$ and contamination outbreaks of high, medium, and low impact. A sensor reduces impact through early detection. One placement has low sensing "quality" (e.g., $f(S) = 0.01$), another has high sensing "quality" (e.g., $f(S) = 0.9$).]

  8. Given:
     - Graph $G(V, E)$
     - Data on how outbreaks spread over $G$:
       - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$

     [Figure: water distribution network (physical pipes and junctions) and a simulator of water consumption and flow (built by Mech. Eng. people)]

     We simulate the contamination spread for every possible location.

  9. Given:
     - Graph $G(V, E)$
     - Data on how outbreaks spread over $G$:
       - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$

     [Figure: the network of the blogosphere with blogs a, b, c; traces of the information flow are used to identify influence sets]

     Collect lots of blog posts and trace hyperlinks to obtain data about information flow from a given blog.

  10. Given:
      - Graph $G(V, E)$
      - Data on how outbreaks spread over $G$:
        - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$
      - Goal: Select a subset of nodes $S$ that maximizes the expected reward:

        $$\max_{S \subseteq V} f(S) = \sum_i P(i)\, f_i(S) \quad \text{subject to: } cost(S) < B$$

        - $P(i)$ … probability of outbreak $i$ occurring
        - $f_i(S)$ … reward for detecting outbreak $i$ using sensors $S$
        - $P(i)\, f_i(S)$ … expected reward for detecting outbreak $i$
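The objective above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout `T[i][u]`, the helper names, the toy outbreaks, the additive sensor costs, and the 0/1 reward `detected` are all hypothetical and not part of the lecture.

```python
from math import inf

def detection_time(T, S, i):
    """T(S, i): earliest time any sensor in S sees outbreak i (inf if none does)."""
    return min((T[i].get(u, inf) for u in S), default=inf)

def expected_reward(T, P, reward, S):
    """f(S) = sum_i P(i) * f_i(S), where f_i(S) = reward(i, T(S, i))."""
    return sum(P[i] * reward(i, detection_time(T, S, i)) for i in T)

def within_budget(cost, S, B):
    """Constraint cost(S) < B, assuming sensor costs simply add up."""
    return sum(cost[u] for u in S) < B

# Hypothetical toy data: T[i][u] is the time outbreak i reaches node u.
T = {"o1": {"a": 1, "b": 3}, "o2": {"b": 2, "c": 5}}
P = {"o1": 0.5, "o2": 0.5}
cost = {"a": 1, "b": 1, "c": 1}

def detected(i, t):
    """Illustrative reward: 1 if the outbreak is detected at all, else 0."""
    return 0.0 if t == inf else 1.0

print(expected_reward(T, P, detected, {"a"}))  # only outbreak o1 is seen
print(expected_reward(T, P, detected, {"b"}))  # both outbreaks are seen
```

Node `b` lies on both toy outbreaks, so a single sensor there collects the full expected reward, while a sensor at `a` collects only the share of `o1`.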

  11. - Reward (one of the following three):
        - (1) Minimize time to detection
        - (2) Maximize number of detected propagations
        - (3) Minimize number of infected people
      - Cost (context dependent):
        - Reading big blogs is more time consuming
        - Placing a sensor in a remote location is expensive

      [Figure: outbreak $i$ spreading over nodes 1 to 11, with reward $f(S)$. Monitoring the blue node saves more people than monitoring the green node.]

  12. - Objective functions:
        - 1) Time to detection (DT)
          - How long does it take to detect a contamination?
          - Penalty for detecting at time $t$: $\pi_i(t) = t$
        - 2) Detection likelihood (DL)
          - How many contaminations do we detect?
          - Penalty for detecting at time $t$: $\pi_i(t) = 0$, $\pi_i(\infty) = 1$
          - Note, this is a binary outcome: we either detect or not
        - 3) Population affected (PA)
          - How many people drank contaminated water?
          - Penalty for detecting at time $t$: $\pi_i(t) = \{\text{\# of infected nodes in outbreak } i \text{ by time } t\}$
      - Observation: In all cases detecting sooner does not hurt!
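As a rough sketch, the three penalties can be written as small Python functions. The function names, the `infection_times` dictionary, and the finite `horizon` that stands in for the DT penalty of a never-detected outbreak are assumptions made for this example, not definitions from the slides.

```python
from math import inf

def penalty_dt(t, horizon):
    """Time to detection: pi_i(t) = t, capped at an assumed finite horizon."""
    return min(t, horizon)

def penalty_dl(t):
    """Detection likelihood: 0 if detected at any finite time, 1 if never detected."""
    return 0.0 if t < inf else 1.0

def penalty_pa(t, infection_times):
    """Population affected: number of nodes infected in this outbreak by time t."""
    return sum(1 for tu in infection_times.values() if tu <= t)

# Hypothetical outbreak: node -> time at which it gets infected.
infection_times = {"a": 1, "b": 3, "c": 5}

for t in (1, 3, 5, inf):
    print(t, penalty_dt(t, horizon=10), penalty_dl(t), penalty_pa(t, infection_times))
```

Each penalty is non-decreasing in the detection time, which is the "detecting sooner does not hurt" observation in code form.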

  13. We define $f_i(S)$ as penalty reduction:

      $$f_i(S) = \pi_i(\emptyset) - \pi_i(T(S, i))$$

      Here $T(S, i) = \min_{u \in S} T(u, i)$ is the time at which the sensor set $S$ first detects outbreak $i$, and $\pi_i(\emptyset)$ is the penalty when no sensor is placed, i.e., $\pi_i(\infty)$.
      - Observation: Diminishing returns

      [Figure: a new sensor $s'$ is added to two placements. With placement $S = \{s_1, s_2\}$, adding $s'$ helps a lot; with placement $S' = \{s_1, s_2, s_3, s_4\}$, adding $s'$ helps very little.]

  14. - Claim: For all $A \subseteq B \subseteq V$ and sensors $s \in V \setminus B$:

        $$f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$$

      - Proof: All our objectives are submodular
        - Fix cascade/outbreak $i$
        - Show $f_i(A) = \pi_i(\infty) - \pi_i(T(A, i))$ is submodular
        - Consider $A \subseteq B \subseteq V$ and sensor $s \in V \setminus B$
        - When does node $s$ detect cascade $i$?
        - We analyze 3 cases based on when $s$ detects outbreak $i$
        - (1) $T(s, i) \ge T(A, i)$: $s$ detects late, nobody benefits: $f_i(A \cup \{s\}) = f_i(A)$, also $f_i(B \cup \{s\}) = f_i(B)$, and so $f_i(A \cup \{s\}) - f_i(A) = 0 = f_i(B \cup \{s\}) - f_i(B)$

  15. Remember $A \subseteq B$
      - Proof (contd.):
        - (2) $T(B, i) \le T(s, i) < T(A, i)$: $s$ detects after $B$ but before $A$. $s$ detects sooner than any node in $A$ but no sooner than the earliest detection in $B$. So $s$ only helps improve the solution $A$ (but not $B$):

          $$f_i(A \cup \{s\}) - f_i(A) \ge 0 = f_i(B \cup \{s\}) - f_i(B)$$

        - (3) $T(s, i) < T(B, i)$: $s$ detects early:

          $$f_i(A \cup \{s\}) - f_i(A) = [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(A) \ge [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(B) = f_i(B \cup \{s\}) - f_i(B)$$

          - The inequality is due to the non-decreasingness of $f_i(\cdot)$, i.e., $f_i(A) \le f_i(B)$
        - So, $f_i(\cdot)$ is submodular!
      - So, $f(\cdot)$ is also submodular, since it is a non-negative weighted sum of submodular functions:

        $$f(S) = \sum_i P(i)\, f_i(S)$$
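The claim can be sanity-checked by brute force on a tiny instance. The sketch below is not a proof; it enumerates every $A \subseteq B \subseteq V$ and $s \in V \setminus B$ for hypothetical toy data, using the DT penalty $\pi_i(t) = t$ with an assumed finite `HORIZON` in place of $\pi_i(\infty)$.

```python
from itertools import combinations
from math import inf

# Hypothetical toy data: T[i][u] is the time outbreak i reaches node u.
T = {"o1": {"a": 1, "b": 3, "c": 6}, "o2": {"b": 2, "c": 4}}
P = {"o1": 0.7, "o2": 0.3}
V = {"a", "b", "c"}
HORIZON = 10  # assumed finite stand-in for the penalty of never detecting

def f(S):
    """f(S) = sum_i P(i) * (pi_i(inf) - pi_i(T(S, i))) with pi_i(t) = t."""
    total = 0.0
    for i, times in T.items():
        t = min((times.get(u, inf) for u in S), default=inf)
        total += P[i] * (HORIZON - min(t, HORIZON))
    return total

def subsets(items):
    """Yield every subset of items as a set."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)

def is_submodular(func, ground, tol=1e-12):
    """Check func(A+s) - func(A) >= func(B+s) - func(B) for all A <= B, s not in B."""
    for B in subsets(ground):
        for A in subsets(B):
            for s in ground - B:
                if func(A | {s}) - func(A) < func(B | {s}) - func(B) - tol:
                    return False
    return True

print(is_submodular(f, V))
```

The check is exponential in $|V|$, so it is only useful for validating a reward implementation on small examples.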

  16. - What do we know about optimizing submodular functions?
        - Hill-climbing (i.e., greedy) is near optimal: $(1 - \frac{1}{e}) \cdot OPT$
      - But:
        - (1) This only works for the unit-cost case! (each sensor costs the same)
          - For us each sensor $s$ has cost $c(s)$
        - (2) The hill-climbing algorithm is slow
          - At each iteration we need to re-evaluate marginal gains of all nodes
          - Runtime $O(|V| \cdot K)$ for placing $K$ sensors

      [Figure: hill-climbing reward for candidate sensors a, b, c, d, e; at each step, add the sensor with the highest marginal gain]
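A minimal sketch of unit-cost hill-climbing follows, assuming the reward is available as a Python function on sets. The coverage function and the `cover` data are hypothetical stand-ins for $f$; they are used only because set coverage is a simple submodular function.

```python
def greedy_hill_climbing(f, V, K):
    """Pick up to K elements, each time adding the one with the highest marginal gain."""
    A = set()
    for _ in range(K):
        candidates = V - A
        if not candidates:
            break
        # Re-evaluate the marginal gain of every remaining node (the slow part).
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda s: f(A | {s}) - f(A))
        A.add(best)
    return A

# Hypothetical coverage data: each sensor "covers" a set of outbreaks.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def f(S):
    """Number of outbreaks covered by at least one sensor in S."""
    return len(set().union(*(cover[s] for s in S)))

chosen = greedy_hill_climbing(f, set(cover), K=2)
print(sorted(chosen), f(chosen))
```

Each of the $K$ rounds scans all remaining nodes, which is where the $O(|V| \cdot K)$ count of marginal-gain evaluations comes from.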

  17. - Consider the following algorithm to solve the outbreak detection problem: hill-climbing that ignores cost
        - Ignore sensor cost $c(s)$
        - Repeatedly select the sensor with the highest marginal gain
        - Do this until the budget is exhausted
      - Q: How well does this work?
      - A: It can fail arbitrarily badly!
        - Next we come up with an example where the hill-climbing solution is arbitrarily far from OPT

  18. - Bad example when we ignore cost:
        - $n$ sensors, budget $B$
        - $s_1$: reward $r$, cost $B$
        - $s_2, \dots, s_n$: reward $r - \varepsilon$, all with the same cost $c(s_i) = 1$
        - Hill-climbing always prefers the more expensive sensor $s_1$ with reward $r$ (and exhausts the budget). It never selects the cheaper sensors with reward $r - \varepsilon$ → for variable cost it can fail arbitrarily badly!
      - Idea: What if we optimize the benefit-cost ratio? Greedily pick the sensor $s_i$ that maximizes the benefit-to-cost ratio:

        $$s_i = \arg\max_{s \in V} \frac{f(A_{i-1} \cup \{s\}) - f(A_{i-1})}{c(s)}$$
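The bad example can be reproduced with a short sketch that runs both selection rules on the same instance. Several things here are assumptions for illustration: rewards are taken to be additive, the budget check is total cost $\le B$ (so that $s_1$, which costs exactly $B$, is affordable as the example requires), and the concrete values of `n`, `B`, `r`, and `eps` are made up.

```python
def greedy(f, cost, V, B, use_ratio):
    """Greedy selection under a budget (total cost <= B, an assumption).

    use_ratio=False ranks candidates by marginal gain alone (cost-blind);
    use_ratio=True ranks them by marginal gain divided by cost.
    """
    A, spent = set(), 0.0
    while True:
        affordable = [s for s in sorted(V - A) if spent + cost[s] <= B]
        if not affordable:
            return A

        def score(s):
            gain = f(A | {s}) - f(A)
            return gain / cost[s] if use_ratio else gain

        best = max(affordable, key=score)
        A.add(best)
        spent += cost[best]

# Hypothetical instance of the bad example.
n, B, r, eps = 6, 5, 1.0, 0.01
sensors = {f"s{k}" for k in range(1, n + 1)}
reward = {s: (r if s == "s1" else r - eps) for s in sensors}
cost = {s: (float(B) if s == "s1" else 1.0) for s in sensors}

def f(S):
    """Additive reward (an assumption; additive functions are submodular)."""
    return sum(reward[s] for s in S)

blind = greedy(f, cost, sensors, B, use_ratio=False)
ratio = greedy(f, cost, sensors, B, use_ratio=True)
print(sorted(blind), f(blind))
print(sorted(ratio), f(ratio))
```

On this instance the cost-blind rule spends the whole budget on $s_1$ for reward $r$, while the ratio rule buys $B$ cheap sensors for reward $B(r - \varepsilon)$, so the gap grows with $B$.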
