CS345a: Data Mining
Jure Leskovec and Anand Rajaraman, Stanford University
Feature selection: Given a set of features X1, …, Xn, we want to predict Y from a subset A = (Xi1, …, Xik). What are the k most informative features?
Active learning: Want to predict a medical condition. Each test has a cost (but also reveals information). Which tests should we perform to make the most effective decisions?
Influence maximization: In a social network, which nodes should we advertise to? Which are the most influential blogs?
Sensor placement: Given a water distribution network, where should we place sensors to quickly detect contaminations?
Given: a finite set V and a function F: 2^V → R.
Want: A* = argmax_A F(A), subject to some constraints on A.
For example:
Influence maximization: V = nodes of the network, F(A) = expected cascade size when A is activated.
Sensor placement: V = junctions of the water network, F(A) = sensing quality of placement A.
Feature selection: V = the set of features, F(A) = informativeness of A about Y.
Given random variables Y, X1, …, Xn, we want to predict Y from a subset A = (Xi1, …, Xik).
Naïve Bayes model: P(Y, X1, …, Xn) = P(Y) ∏i P(Xi | Y).
Example: Y = "Sick", with features X1 = "Fever", X2 = "Rash", X3 = "Cough".
Want the k most informative features:
A* = argmax_A I(A; Y) subject to |A| ≤ k,
where I(A; Y) = H(Y) − H(Y | A), the uncertainty about Y before knowing A minus the uncertainty after knowing A.
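As a concrete illustration, here is a minimal sketch of computing the information gain I(A; Y) = H(Y) − H(Y | X_A) for discrete variables from a joint probability table. The function names and the table layout (axis 0 for Y, one axis per feature) are our own illustrative choices, not from the course materials.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def info_gain(joint, A):
    """I(A; Y) = H(Y) - H(Y | X_A) from a joint table.

    joint: ndarray of shape (|Y|, |X1|, ..., |Xn|) holding P(y, x1, ..., xn).
    A: iterable of feature indices (0-based, into X1..Xn).
    """
    p_y = joint.sum(axis=tuple(range(1, joint.ndim)))  # P(y)
    h_y = entropy(p_y)
    # Marginalize out features not in A, keeping axis 0 (Y).
    keep = (0,) + tuple(1 + i for i in sorted(A))
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    p_ya = joint.sum(axis=drop)        # P(y, x_A)
    p_a = p_ya.sum(axis=0)             # P(x_A)
    # H(Y | X_A) = sum_{x_A} P(x_A) H(Y | X_A = x_A)
    h_y_given_a = 0.0
    for idx in np.ndindex(p_a.shape):
        pa = p_a[idx]
        if pa > 0:
            h_y_given_a += pa * entropy(p_ya[(slice(None),) + idx] / pa)
    return h_y - h_y_given_a
```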
Given: a finite set V of features and a utility function F(A) = I(A; Y) (as in the "Sick"/"Fever"/"Rash"/"Cough" example above).
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A).
This is typically NP-hard!
Greedy hill-climbing (sketched in code below):
Start with A0 = {}.
For i = 1 to k: s* = argmax_s F(A_{i−1} ∪ {s}); A_i = A_{i−1} ∪ {s*}.
How well does this simple heuristic do?
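A minimal, generic implementation of the greedy loop above, assuming F is supplied as a callable on Python sets (the names `greedy`, `V`, `F`, `k` are illustrative):

```python
def greedy(V, F, k):
    """Greedy hill-climbing for max_{|A| <= k} F(A).

    V: iterable of candidate elements.
    F: set function, callable on Python sets.
    k: cardinality budget.
    """
    A = set()
    for _ in range(k):
        # Pick the element with the largest marginal gain F(A ∪ {s}) - F(A).
        s_best = max((s for s in V if s not in A),
                     key=lambda s: F(A | {s}) - F(A))
        A.add(s_best)
    return A
```

Each iteration evaluates F once per remaining candidate, so the sketch costs O(k·|V|) function evaluations.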
Greedy hill-climbing produces a solution A where F(A) ≥ (1 − 1/e) of the optimal value (~63%) [Nemhauser, Fisher, Wolsey '78].
The claim holds for functions F with two properties:
F is monotone: if A ⊆ B then F(A) ≤ F(B), and F({}) = 0.
F is submodular: adding an element to a larger set gives less improvement than adding it to one of its subsets.
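For reference, a sketch of the standard argument behind the (1 − 1/e) guarantee, with A* the optimal size-k set and A_i the greedy set after i steps:

```latex
% Sketch of the (1 - 1/e) guarantee for monotone submodular F.
\begin{align*}
F(A^*) &\le F(A_i \cup A^*) && \text{(monotonicity)}\\
 &\le F(A_i) + \sum_{s \in A^* \setminus A_i} \bigl[F(A_i \cup \{s\}) - F(A_i)\bigr] && \text{(submodularity)}\\
 &\le F(A_i) + k \bigl[F(A_{i+1}) - F(A_i)\bigr] && \text{(greedy takes the best $s$; } |A^*| \le k\text{)}
\end{align*}
% Rearranging: F(A^*) - F(A_{i+1}) \le (1 - 1/k)\,(F(A^*) - F(A_i)).
% Iterating k times from F(A_0) = 0:
%   F(A^*) - F(A_k) \le (1 - 1/k)^k F(A^*) \le e^{-1} F(A^*),
% i.e., F(A_k) \ge (1 - 1/e)\, F(A^*).
```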
Definition: A set function F on V is called submodular if for all A, B ⊆ V:
F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
(Figure: Venn diagrams of A and B versus A ∪ B and A ∩ B.)
Diminishing returns characterization.
Definition: A set function F on V is called submodular if for all A ⊆ B and s ∉ B:
F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
The gain of adding s to the small set A (a large improvement) is at least the gain of adding s to the large set B (a small improvement).
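For tiny ground sets, this definition can be verified directly by brute force. A small sketch (exponential in |V|, so only for sanity-checking toy functions; the names are illustrative):

```python
from itertools import combinations

def is_submodular(V, F, tol=1e-9):
    """Brute-force check of diminishing returns:
    F(A ∪ {s}) - F(A) >= F(B ∪ {s}) - F(B) for all A ⊆ B ⊆ V, s ∉ B.
    """
    V = list(V)
    subsets = [set(c) for r in range(len(V) + 1)
               for c in combinations(V, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for s in V:
                if s in B:
                    continue
                if F(A | {s}) - F(A) < F(B | {s}) - F(B) - tol:
                    return False
    return True
```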
Given random variables X1, …, Xn, consider the mutual information:
F(A) = I(A; V\A) = H(V\A) − H(V\A | A)
     = ∑_{x_A, y} P(x_A, y) [log P(y | x_A) − log P(y)], with y ranging over assignments to V\A.
Mutual information F(A) is submodular [Krause–Guestrin '05]:
F(A ∪ {s}) − F(A) = H(s | A) − H(s | V\(A ∪ {s})),
and for A ⊆ B we have H(s | A) ≥ H(s | B): "information never hurts".
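Spelling out why this gain formula implies submodularity, using only the "information never hurts" inequality (a short derivation, for A ⊆ B and s ∉ B):

```latex
\begin{align*}
H(s \mid A) &\ge H(s \mid B) && \text{(conditioning on more cannot increase entropy)}\\
H\bigl(s \mid V \setminus (A \cup \{s\})\bigr) &\le H\bigl(s \mid V \setminus (B \cup \{s\})\bigr)
  && \text{(since } V \setminus (B \cup \{s\}) \subseteq V \setminus (A \cup \{s\})\text{)}\\[2pt]
\Rightarrow\; F(A \cup \{s\}) - F(A) &\ge F(B \cup \{s\}) - F(B).
\end{align*}
```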
Let Y = ∑i αi Xi + ε, where (X1, …, Xn, ε) ~ N(·; μ, Σ).
Want to pick a subset A to predict Y.
Var(Y | X_A = x_A): the conditional variance of Y given X_A = x_A.
Expected variance: Var(Y | X_A) = ∫ p(x_A) Var(Y | X_A = x_A) dx_A.
Variance reduction: F_V(A) = Var(Y) − Var(Y | X_A).
Then [Das–Kempe '08]:
F_V(A) is monotonic.
F_V(A) is submodular* (*under some conditions on Σ).
Hence orthogonal matching pursuit [Tropp, Donoho] is near optimal!
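For jointly Gaussian variables, F_V(A) has a closed form: Var(Y | X_A = x_A) = Σ_YY − Σ_YA Σ_AA⁻¹ Σ_AY, which does not depend on x_A, so the expected variance equals the pointwise one. A minimal sketch under that Gaussian assumption (function and variable names are illustrative):

```python
import numpy as np

def variance_reduction(Sigma, y_idx, A):
    """F_V(A) = Var(Y) - Var(Y | X_A) for a jointly Gaussian vector
    with covariance matrix Sigma; y_idx indexes Y, A indexes X_A.

    Uses the Gaussian conditioning formula
        Var(Y | X_A) = S_YY - S_YA S_AA^{-1} S_AY,
    which is independent of the observed value x_A.
    """
    A = list(A)
    var_y = Sigma[y_idx, y_idx]
    if not A:
        return 0.0
    S_ya = Sigma[y_idx, A]                 # cross-covariances
    S_aa = Sigma[np.ix_(A, A)]             # covariance of X_A
    cond_var = var_y - S_ya @ np.linalg.solve(S_aa, S_ya)
    return var_y - cond_var
```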
If F1, …, Fm are submodular functions on V and λ1, …, λm > 0, then F(A) = ∑i λi Fi(A) is submodular!
Submodularity is closed under nonnegative linear combinations.
This is an extremely useful fact: if each Fθ(A) is submodular, then ∑θ P(θ) Fθ(A) is submodular!
Multicriterion optimization: F1, …, Fm submodular and λi > 0 ⇒ ∑i λi Fi(A) is submodular.
Each element covers some area. Observation: diminishing returns.
Consider a new element S':
For A = {S1, S2}, adding S' helps a lot.
For B = {S1, S2, S3, S4}, adding S' helps very little.
(Figure: overlapping coverage regions S1, …, S4 and the new region S'.)
F is submodular: A ⊆ B ⇒ F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B):
the gain of adding a set s to a small solution is at least the gain of adding s to a large solution.
Natural example (sketched in code below): sets s1, s2, …, sn, with F(A) = size of the union of the si in A (the size of the covered area).
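A minimal sketch of this coverage function, modeling "areas" as finite point sets (the names and toy data are illustrative):

```python
def coverage(sets):
    """Return F(A) = |union of sets[i] for i in A| -- the canonical
    monotone submodular coverage function."""
    def F(A):
        covered = set()
        for i in A:
            covered |= sets[i]
        return len(covered)
    return F

# Toy regions as finite point sets standing in for areas.
regions = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
F = coverage(regions)
assert F({0}) + F({1}) >= F({0, 1}) + F(set())  # submodular inequality

# Plugging F into the greedy routine sketched earlier, e.g.
# greedy(range(len(regions)), F, k=2), picks the sets that cover
# the most new area at each step.
```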
Most influential set of size k: the set S of k nodes producing the largest expected cascade size F(S) if activated [Domingos–Richardson '01].
Optimization problem: max over S of size k of F(S).
(Figure: example network with edge activation probabilities of 0.2–0.4.)
Fix an outcome i of the coin flips, and let Fi(S) be the size of the cascade from S given these coin flips.
Let Fi(v) = the set of nodes reachable from v on live-edge paths.
Fi(S) = size of the union of the Fi(v) for v ∈ S → Fi is submodular.
F = ∑i Fi → F is submodular [Kempe–Kleinberg–Tardos '03].
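In practice F(S) is estimated by sampling the coin flips. A minimal Monte Carlo sketch of the independent cascade model (the adjacency format and function names are our own illustrative choices):

```python
import random

def simulate_cascade(adj, S):
    """One live-edge sample of the independent cascade model.

    adj: dict mapping node -> list of (neighbor, activation_probability).
    S: seed set. Returns the cascade size for this sample.
    """
    active, frontier = set(S), list(S)
    while frontier:
        u = frontier.pop()
        for v, p in adj.get(u, []):
            # Each edge gets one activation attempt (one coin flip).
            if v not in active and random.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)

def influence(adj, S, n_samples=1000):
    """F(S) ~ average cascade size over sampled coin flips.
    Each sample F_i is a coverage-style submodular function,
    so their average is submodular too."""
    return sum(simulate_cascade(adj, frozenset(S))
               for _ in range(n_samples)) / n_samples
```

Combined with the greedy routine from earlier, `greedy(nodes, lambda A: influence(adj, A), k)` gives the (1 − 1/e)-approximate seed set, up to sampling error.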
Given a real city water distribution network, and data on how contaminants spread in the network.
The problem was posed by the US Environmental Protection Agency.
[Leskovec et al., KDD '07]
Real metropolitan-area water network:
V = 21,000 nodes, E = 25,000 pipes.
Water flow simulator provided by the EPA; 3.6 million contamination events.
Multiple objectives: detection time, affected population, …
Place sensors that detect well "on average".
Utility of placing sensors, given water flow dynamics, demands of households, …
For each subset A of the set V of all network junctions, compute the utility F(A).
The model predicts the impact (low, medium, or high) of each contamination location; a sensor reduces impact through early detection!
(Figure: example placements, from low sensing quality F(A) = 0.01 to high sensing quality F(A) = 0.9.)
Given: a graph G(V, E), a budget B, and data on how outbreaks o1, …, oi, …, oK spread over time.
Select a set of nodes A maximizing the expected reward over outbreaks, where Ri(A) is the reward for detecting outbreak oi, subject to cost(A) ≤ B.
Cost: the cost of monitoring is node dependent.
Reward: minimize the number of affected nodes: if A is the set of monitored nodes, let R(A) denote the number of nodes we save. (A greedy sketch for this budgeted problem follows below.)
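A simplified benefit-per-cost greedy sketch for this budgeted setting. Note this is not the full algorithm from the KDD '07 paper (CELF), which additionally runs a plain unit-cost greedy, takes the better of the two solutions, and uses lazy evaluations for speed; names here are illustrative:

```python
def cost_benefit_greedy(V, F, cost, B):
    """Greedy by marginal gain per unit cost, subject to total cost <= B.

    V: candidate nodes; F: monotone submodular reward, callable on sets;
    cost: dict node -> monitoring cost; B: budget.
    """
    A, spent = set(), 0.0
    while True:
        candidates = [s for s in V
                      if s not in A and spent + cost[s] <= B]
        if not candidates:
            return A
        # Best marginal reward per unit of cost.
        best = max(candidates,
                   key=lambda s: (F(A | {s}) - F(A)) / cost[s])
        A.add(best)
        spent += cost[best]
```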