http://cs224w.stanford.edu (1) New problem: Outbreak detection (2) - PowerPoint PPT Presentation

CS224W: Machine Learning with Graphs Jure Leskovec, Stanford University http://cs224w.stanford.edu

- (1) New problem: Outbreak detection
- (2) Develop an approximation algorithm
  - It is a submodular optimization problem!
- (3) Speed up greedy hill-climbing
  - Valid for optimizing general submodular functions (i.e., also works for influence maximization)
- (4) Prove a new "data-dependent" bound on the solution quality
  - Valid for optimizing any submodular function (i.e., also works for influence maximization)

11/12/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu

- Given a real city water distribution network
- And data on how contaminants spread in the network
- Detect the contaminant as quickly as possible
- Problem posed by the US Environmental Protection Agency

[Figure: posts by users/blogs over time, connected by time-ordered hyperlinks into information cascades.]

Which users/news sites should one follow to detect cascades as effectively as possible?

Want to read things before others do.

[Figure: two choices of blogs to follow. One detects the blue and yellow stories soon but misses the red story. The other detects all stories, but late.]

- Both of these are instances of the same underlying problem!
- Given a dynamic process spreading over a network, we want to select a set of nodes to detect the process effectively
- Many other applications:
  - Epidemics
  - Influence propagation
  - Network security

- Utility of placing sensors:
  - Water flow dynamics, demands of households, …
- For each subset $S \subseteq V$ compute utility $f(S)$

[Figure: set $V$ of all network junctions with candidate sensors $S_1, \dots, S_4$ and the locations of high-, medium-, and low-impact contamination outbreaks. A sensor reduces impact through early detection. One placement has low sensing "quality" (e.g., $f(S) = 0.01$), another has high sensing "quality" (e.g., $f(S) = 0.9$).]

Given:
- Graph $G(V, E)$
- Data about how outbreaks spread over $G$:
  - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$

[Figure: water distribution network (physical pipes and junctions) and a simulator of water consumption and flow (built by Mech. Eng. people). We simulate the contamination spread for every possible location.]

Given:
- Graph $G(V, E)$
- Data about how outbreaks spread over $G$:
  - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$

[Figure: the network of news media, with traces of the information flow between sites a, b, and c. Collect lots of articles and trace them to obtain data about information flow from a given news site and identify influence sets.]

Given:
- Graph $G(V, E)$
- Data on how outbreaks spread over $G$:
  - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$
- Goal: Select a subset of nodes $S$ that maximizes the expected reward:

$$\max_{S \subseteq V} f(S) = \sum_i P(i)\, f_i(S) \quad \text{subject to: } cost(S) < B$$

  - The sum is the expected reward for detecting outbreak $i$
  - $P(i)$ … probability of outbreak $i$ occurring
  - $f_i(S)$ … reward for detecting outbreak $i$ using sensors $S$
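The objective above can be written down directly for a toy instance. The sketch below is only an illustration: the outbreak data, the unit costs, and the "detected or not" reward are all made-up assumptions, and the exhaustive search is viable only for a handful of nodes.

```python
import math
from itertools import combinations

# Hypothetical toy data: each outbreak i is (P(i), {node u: T(u, i)}).
OUTBREAKS = [
    (0.5, {"a": 1.0, "b": 3.0}),
    (0.3, {"b": 2.0, "c": 4.0}),
    (0.2, {"c": 1.0}),
]
COST = {"a": 1.0, "b": 1.0, "c": 1.0}  # assumed unit costs


def detection_time(sensors, times):
    """T(S, i): earliest time any sensor in S is contaminated (inf if never)."""
    return min((times[s] for s in sensors if s in times), default=math.inf)


def detected(sensors, times):
    """Assumed reward f_i(S): 1 if outbreak i reaches some sensor, else 0."""
    return 1.0 if detection_time(sensors, times) < math.inf else 0.0


def expected_reward(sensors, outbreaks=OUTBREAKS, reward=detected):
    """f(S) = sum_i P(i) * f_i(S)."""
    return sum(p * reward(sensors, times) for p, times in outbreaks)


def best_placement(budget, cost=COST):
    """Exhaustive search over all subsets with cost(S) < budget."""
    nodes = sorted(cost)
    best, best_val = frozenset(), 0.0
    for k in range(len(nodes) + 1):
        for subset in combinations(nodes, k):
            if sum(cost[s] for s in subset) < budget:  # strict, as on the slide
                val = expected_reward(subset)
                if val > best_val:
                    best, best_val = frozenset(subset), val
    return best, best_val
```

With a budget of 2.5 this toy instance admits at most two sensors, and the search returns a pair that detects all three outbreaks. The number of subsets grows exponentially in $|V|$, which is why an approximation algorithm is needed.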

- Reward (one of the following three):
  - (1) Minimize time to detection
  - (2) Maximize number of detected propagations
  - (3) Minimize number of infected people
- Cost (context dependent):
  - Reading big blogs is more time consuming
  - Placing a sensor in a remote location is expensive

[Figure: outbreak $i$ spreading over nodes 1 to 11, with reward $f(S)$ depending on which node is monitored. Monitoring the blue node saves more people than monitoring the green node.]

- Penalty $\pi_i(t)$ for detecting outbreak $i$ at time $t$
  - (1) Time to detection (DT)
    - How long does it take to detect a contamination?
    - Penalty for detecting at time $t$: $\pi_i(t) = t$
  - (2) Detection likelihood (DL)
    - How many contaminations do we detect?
    - Penalty for detecting at time $t$: $\pi_i(t) = 0$, $\pi_i(\infty) = 1$
    - Note: this is a binary outcome, we either detect or not
  - (3) Population affected (PA)
    - How many people drank contaminated water?
    - Penalty for detecting at time $t$: $\pi_i(t) = \{\text{\# of infected nodes in outbreak } i \text{ by time } t\}$
- Observation: In all cases detecting sooner does not hurt!
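A minimal sketch of the three penalties, to make the "detecting sooner does not hurt" observation concrete. The function names and the finite `HORIZON` cap on the time-to-detection penalty are assumptions for illustration (the slide only states $\pi_i(t) = t$), and `infection_times` is a hypothetical map from node to the time it is infected in outbreak $i$.

```python
import math

HORIZON = 100.0  # assumed time horizon so that the DT penalty at t = inf is finite


def penalty_dt(t, horizon=HORIZON):
    """Time to detection (DT): pi_i(t) = t, capped at the assumed horizon."""
    return min(t, horizon)


def penalty_dl(t):
    """Detection likelihood (DL): 0 if detected at any finite time, 1 if never."""
    return 1.0 if math.isinf(t) else 0.0


def penalty_pa(t, infection_times):
    """Population affected (PA): number of nodes infected by time t."""
    return sum(1 for u in infection_times.values() if u <= t)
```

All three functions are non-decreasing in the detection time `t`, which is the property the submodularity argument relies on.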

We define $f_i(S)$ as penalty reduction:

$$f_i(S) = \pi_i(\infty) - \pi_i(T(S, i))$$

- Observation: Diminishing returns

[Figure: adding a new sensor $x'$ to two placements. For placement $S = \{x_1, x_2\}$, adding $x'$ helps a lot. For placement $S' = \{x_1, x_2, x_3, x_4\}$, adding $x'$ helps very little.]

- Claim: For all $A \subseteq B \subseteq V$ and sensor $x \in V \setminus B$:

$$f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$$

- Proof: All our objectives are submodular
  - Fix outbreak $i$
  - Show $f_i(A) = \pi_i(\infty) - \pi_i(T(A, i))$ is submodular
  - Consider $A \subseteq B \subseteq V$ and sensor $x \in V \setminus B$
  - When does sensor $x$ detect outbreak $i$?
  - We analyze 3 cases based on when $x$ detects outbreak $i$
  - (1) $T(B, i) \le T(A, i) < T(x, i)$: $x$ detects late, nobody benefits: $f_i(A \cup \{x\}) = f_i(A)$, also $f_i(B \cup \{x\}) = f_i(B)$, and so $f_i(A \cup \{x\}) - f_i(A) = 0 = f_i(B \cup \{x\}) - f_i(B)$

Remember $A \subseteq B$

- Proof (contd.):
  - (2) $T(B, i) \le T(x, i) \le T(A, i)$: $x$ detects after $B$ but before $A$
    - $x$ detects no later than any node in $A$, but not before the earliest detection in $B$. So $x$ only helps improve the solution $A$ (but not $B$):
      $f_i(A \cup \{x\}) - f_i(A) \ge 0 = f_i(B \cup \{x\}) - f_i(B)$
  - (3) $T(x, i) < T(B, i) \le T(A, i)$: $x$ detects early
    - $f_i(A \cup \{x\}) - f_i(A) = [\pi_i(\infty) - \pi_i(T(x, i))] - f_i(A) \ge [\pi_i(\infty) - \pi_i(T(x, i))] - f_i(B) = f_i(B \cup \{x\}) - f_i(B)$
    - The inequality is due to non-decreasingness of $f_i(\cdot)$, i.e., $f_i(A) \le f_i(B)$
  - So, $f_i(\cdot)$ is submodular!
- So, $f(\cdot)$ is also submodular, because it is a non-negative weighted sum of submodular functions:

$$f(S) = \sum_i P(i)\, f_i(S)$$
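The claim can be sanity-checked by brute force on a tiny instance. The sketch below assumes one hypothetical outbreak with made-up detection times `TIMES`, uses the time-to-detection penalty capped at an assumed `HORIZON`, and tests the diminishing-returns inequality for every pair $A \subseteq B$ and every $x \in V \setminus B$. It is a numerical check, not a substitute for the proof.

```python
import math
from itertools import combinations

HORIZON = 10.0  # assumed cap so that pi_i(inf) is finite
TIMES = {"x1": 2.0, "x2": 5.0, "x3": 7.0}  # hypothetical T(u, i) for one outbreak i


def detection_time(sensors):
    """T(S, i): earliest detection time among the sensors (inf if S is empty)."""
    return min((TIMES[s] for s in sensors), default=math.inf)


def penalty(t):
    """Time-to-detection penalty pi_i(t) = t, capped at the assumed horizon."""
    return min(t, HORIZON)


def f_i(sensors):
    """Penalty reduction f_i(S) = pi_i(inf) - pi_i(T(S, i))."""
    return penalty(math.inf) - penalty(detection_time(sensors))


def powerset(items):
    items = list(items)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def is_submodular(f, ground):
    """Check f(A + x) - f(A) >= f(B + x) - f(B) for all A <= B, x not in B."""
    ground = frozenset(ground)
    for B in powerset(ground):
        for A in powerset(B):
            for x in ground - B:
                gain_a = f(A | {x}) - f(A)
                gain_b = f(B | {x}) - f(B)
                if gain_a < gain_b - 1e-12:
                    return False
    return True
```

For contrast, a function such as `lambda s: len(s) ** 2` has increasing marginal gains and fails the same check.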

- What do we know about optimizing submodular functions?
  - Hill-climbing (i.e., greedy) is near optimal: $(1 - \frac{1}{e}) \cdot OPT$
- But:
  - (1) This only works for the unit-cost case! (each sensor costs the same)
    - For us each sensor $s$ has cost $c(s)$
  - (2) Hill-climbing algorithm is slow
    - At each iteration we need to re-evaluate marginal gains of all nodes
    - Runtime $O(|V| \cdot K)$ for placing $K$ sensors

[Figure: hill-climbing reward for candidate sensors a, b, c, d, e. At each step, add the sensor with the highest marginal gain.]
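A minimal sketch of unit-cost hill-climbing, assuming the objective is available as a Python function `f` on lists of nodes. The coverage function and the `COVER` data are made-up stand-ins for a submodular objective, not the outbreak reward itself.

```python
# Hypothetical toy data: each candidate sensor "covers" a set of items.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}


def coverage(selected):
    """Toy submodular objective: number of distinct items covered."""
    return len(set().union(*(COVER[s] for s in selected)))


def greedy(f, ground, k):
    """Pick k elements, each time adding the one with the highest marginal gain."""
    selected = []
    for _ in range(k):
        candidates = [u for u in ground if u not in selected]
        if not candidates:
            break
        # Every remaining candidate is re-evaluated in every round,
        # which gives the O(|V| * K) evaluations noted on the slide.
        best = max(candidates, key=lambda u: f(selected + [u]) - f(selected))
        selected.append(best)
    return selected
```

On this toy data, `greedy(coverage, list(COVER), 2)` first picks `"a"` (gain 3) and then `"c"` (gain 2), covering all five items.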

- Consider the following algorithm to solve the outbreak detection problem: Hill-climbing that ignores cost
  - Ignore sensor cost $c(s)$
  - Repeatedly select sensor with highest marginal gain
  - Do this until the budget is exhausted
- Q: How well does this work?
- A: It can fail arbitrarily badly!
  - There exists a problem setting where the hill-climbing solution is arbitrarily far from OPT
  - Next we come up with an example

- Bad example when we ignore cost:
  - $n$ sensors, budget $B$
  - $s_1$: reward $r$, cost $B$
  - $s_2 \dots s_n$: reward $r - \varepsilon$, cost $1$
  - Hill-climbing always prefers the more expensive sensor $s_1$ with reward $r$ (and exhausts the budget). It never selects the cheaper sensors with reward $r - \varepsilon$ → For variable cost it can fail arbitrarily badly!
- Idea: What if we optimize benefit-cost ratio? Greedily pick the sensor $s_i$ that maximizes the benefit-to-cost ratio:

$$s_i = \arg\max_{s \in (V \setminus A_{i-1})} \frac{f(A_{i-1} \cup \{s\}) - f(A_{i-1})}{c(s)}$$
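The bad example can be reproduced in a few lines. This is a sketch under stated assumptions: the concrete numbers ($n = 5$, $B = 4$, $r = 1$, $\varepsilon = 0.01$) are made up, rewards are taken to be additive, and a sensor is considered affordable when the total cost stays at or below the budget, so that $s_1$ with cost $B$ can be selected as the slide describes.

```python
# Hypothetical instance of the bad example: n = 5 sensors, budget B = 4.
BUDGET = 4.0
REWARD = {"s1": 1.0, "s2": 0.99, "s3": 0.99, "s4": 0.99, "s5": 0.99}
COST = {"s1": 4.0, "s2": 1.0, "s3": 1.0, "s4": 1.0, "s5": 1.0}


def total_reward(selected):
    """Assumed additive objective f(S) for this toy instance."""
    return sum(REWARD[s] for s in selected)


def budgeted_greedy(f, cost, budget, by_ratio):
    """Greedy under a budget: rank by marginal gain, or by gain / cost."""
    selected, spent = [], 0.0
    while True:
        affordable = [s for s in cost
                      if s not in selected and spent + cost[s] <= budget]
        if not affordable:
            return selected

        def score(s):
            gain = f(selected + [s]) - f(selected)
            return gain / cost[s] if by_ratio else gain

        best = max(affordable, key=score)
        selected.append(best)
        spent += cost[best]
```

Ignoring cost (`by_ratio=False`) selects only `s1` for a reward of 1.0, while ranking by benefit-cost ratio (`by_ratio=True`) selects the four cheap sensors for a reward of about 3.96. Growing $n$ and $B$ together widens the gap without bound.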
