
CS345a: Data Mining, Jure Leskovec and Anand Rajaraman, Stanford - PowerPoint PPT Presentation



  1. CS345a: Data Mining. Jure Leskovec and Anand Rajaraman, Stanford University

  2.  Feature selection:
       Given a set of features X1, …, Xn
       Want to predict Y from a subset A = (Xi1, …, Xik)
       What are the k most informative features?
      Active learning:
       Want to predict a medical condition
       Each test has a cost (but also reveals information)
       Which tests should we perform to make the most effective decisions?

  3.  Influence maximization:
       In a social network, which nodes should we advertise to?
       Which are the most influential blogs?
      Sensor placement:
       Given a water distribution network
       Where should we place sensors to quickly detect contaminations?

  4.  Given:  Given:  finite set V  A function F: 2 V  A function F: 2   Want: A * = argmax A F(A) A s.t. some constraints on A  For example:  Influence maximization: V= F(A)=  Sensor placement: V= F(A)=  Feature selection: V= F(A)= 3/9/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 4

  5.  Given random variables Y, X1, …, Xn
      Want to predict Y from a subset A = (Xi1, …, Xik)
      Naive Bayes model: Y = "Sick"; X1 = "Fever", X2 = "Rash", X3 = "Cough"
       P(Y, X1, …, Xn) = P(Y) ∏i P(Xi | Y)
      Want the k most informative features:
       A* = argmax_A I(A; Y) s.t. |A| ≤ k
       where I(A; Y) = H(Y) − H(Y | A)
       (uncertainty before knowing A minus uncertainty after knowing A)
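
The objective on this slide can be computed directly from data when the features are discrete. Below is a minimal sketch (not from the course materials); the toy arrays, the entropy helpers, and the candidate subset A are all made up for illustration.

```python
# Sketch of F(A) = I(A; Y) = H(Y) - H(Y | X_A), assuming discrete features and labels
# given as integer-coded numpy arrays (data below is invented).
from collections import Counter
import numpy as np

def entropy(labels):
    """Empirical entropy H(Y) in bits."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def conditional_entropy(labels, features):
    """Empirical H(Y | X_A): average entropy of Y within each feature configuration."""
    rows = [tuple(r) for r in features]          # one joint configuration per sample
    h, n = 0.0, len(labels)
    for cfg in set(rows):
        idx = [i for i, r in enumerate(rows) if r == cfg]
        h += len(idx) / n * entropy(labels[idx])
    return h

def mutual_information(labels, features):
    """F(A) = I(A; Y) = H(Y) - H(Y | X_A)."""
    return entropy(labels) - conditional_entropy(labels, features)

# Toy usage: Y = sick, columns = (fever, rash, cough).
Y = np.array([1, 1, 0, 0, 1, 0])
X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]])
A = [0, 2]                                        # candidate subset {fever, cough}
print(mutual_information(Y, X[:, A]))
```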

  6.  Given: a finite set V of features and a utility function F(A) = I(A; Y)
      Want: A* ⊆ V maximizing F(A) subject to |A| ≤ k
      Typically NP-hard!
      Greedy hill-climbing:
       Start with A0 = {}
       For i = 1 to k:
         s* = argmax_s F(A_{i-1} ∪ {s})
         A_i = A_{i-1} ∪ {s*}
      How well does this simple heuristic do?
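
A minimal Python sketch of the greedy hill-climbing loop described above; the coverage-style objective in the usage example is a stand-in of my own, not something from the slides.

```python
# Greedy hill-climbing: start from the empty set and repeatedly add the element
# with the largest gain in the set function F.
def greedy(F, V, k):
    """Return a k-element subset of V built by greedily maximizing F."""
    A = set()
    for _ in range(k):
        # pick the element whose addition increases F the most
        best = max((s for s in V if s not in A), key=lambda s: F(A | {s}))
        A.add(best)
    return A

# Toy usage with a coverage-style objective (elements "cover" sets of points).
covers = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6, 7}, "s4": {1, 7}}
F = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0
print(greedy(F, covers.keys(), k=2))
```

The loop makes O(k·|V|) evaluations of F, so its cost is dominated by how expensive a single evaluation of F is.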

  7.  Greedy hill-climbing produces a solution A where F(A) ≥ (1 − 1/e) · OPT (≈ 63% of the optimal value) [Nemhauser, Fisher, Wolsey '78]
      The claim holds for functions F with two properties:
       F is monotone: if A ⊆ B then F(A) ≤ F(B), and F({}) = 0
       F is submodular: adding an element to a set gives less improvement than adding it to one of its subsets
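
As a sanity check of the guarantee, the toy instance below (sets invented for illustration) compares greedy against the brute-force optimum for a small coverage function; brute force is only feasible because the ground set is tiny.

```python
# Compare greedy with the brute-force optimum on a small coverage instance and
# confirm F(greedy) >= (1 - 1/e) * F(optimal).
from itertools import combinations
import math

covers = {"s1": {1, 2, 3, 4}, "s2": {3, 4, 5}, "s3": {5, 6}, "s4": {1, 6, 7, 8}}
F = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0
k = 2

# Greedy: repeatedly add the element with the largest marginal gain.
A = set()
for _ in range(k):
    A.add(max(set(covers) - A, key=lambda s: F(A | {s})))

opt = max(F(set(B)) for B in combinations(covers, k))
print(F(A), opt, F(A) >= (1 - 1 / math.e) * opt)   # greedy value, optimum, bound holds
```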

  8.  Definition:
       A set function F on V is called submodular if:
       For all A, B ⊆ V:  F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)

  9.  Diminishing returns characterization
      Definition:
       A set function F on V is called submodular if:
       For all A ⊆ B and s ∉ B:  F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
       Gain of adding s to a small set A (large improvement) ≥ gain of adding s to a large set B (small improvement)
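
Both characterizations can be checked exhaustively on a small ground set. The sketch below uses F(A) = sqrt(|A|), a standard monotone submodular function chosen for illustration (not an example from the slides).

```python
# Brute-force check of both equivalent definitions of submodularity on a tiny ground set.
from itertools import chain, combinations
import math

V = {1, 2, 3, 4}
F = lambda A: math.sqrt(len(A))

def subsets(S):
    return [set(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

# Definition 1: F(A) + F(B) >= F(A | B) + F(A & B) for all A, B.
lattice_ok = all(F(A) + F(B) >= F(A | B) + F(A & B) - 1e-12
                 for A in subsets(V) for B in subsets(V))

# Definition 2 (diminishing returns): for all A <= B and s not in B,
# F(A | {s}) - F(A) >= F(B | {s}) - F(B).
dimret_ok = all(F(A | {s}) - F(A) >= F(B | {s}) - F(B) - 1e-12
                for B in subsets(V) for A in subsets(B) for s in V - B)

print(lattice_ok, dimret_ok)   # both True
```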

  10. Given random variables X1, …, Xn
      Mutual information: F(A) = I(A; V\A) = H(V\A) − H(V\A | A) = Σ_{y,A} P(y, A) [log P(y | A) − log P(y)]
      Mutual information F(A) is submodular [Krause-Guestrin '05]:
       F(A ∪ {s}) − F(A) = H(s | A) − H(s | V\(A ∪ {s}))
       A ⊆ B ⇒ H(s | A) ≥ H(s | B)  ("information never hurts")

  11. Let Y = Σi αi Xi + ε, where (X1, …, Xn, ε) ~ N(·; μ, Σ)
      Want to pick a subset A to predict Y
       Var(Y | X_A = x_A): conditional variance of Y given X_A = x_A
       Expected variance: Var(Y | X_A) = ∫ p(x_A) Var(Y | X_A = x_A) dx_A
       Variance reduction: F_V(A) = Var(Y) − Var(Y | X_A)
      Then [Das-Kempe '08]:
       F_V(A) is monotonic
       F_V(A) is submodular* (*under some conditions on Σ)
       → orthogonal matching pursuit is near optimal! [Tropp, Donoho]
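
For jointly Gaussian variables the expected conditional variance has a closed form, so F_V(A) can be evaluated directly. The sketch below is my own construction with a randomly generated covariance matrix, purely to illustrate Var(Y | X_A) = Var(Y) − Cov(Y, X_A) Cov(X_A)^{-1} Cov(X_A, Y).

```python
# Numerical sketch of the variance-reduction objective F_V(A) = Var(Y) - Var(Y | X_A)
# for Y = alpha^T X + eps with (X, eps) jointly Gaussian. All parameters are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.normal(size=(n, n))
Sigma_X = B @ B.T + np.eye(n)          # covariance of X (positive definite)
alpha = rng.normal(size=n)             # weights in Y = alpha^T X + eps
sigma2 = 0.5                           # noise variance of eps

var_Y = alpha @ Sigma_X @ alpha + sigma2
cov_XY = Sigma_X @ alpha               # Cov(X_i, Y)

def variance_reduction(A):
    """F_V(A) = Var(Y) - Var(Y | X_A) for an index set A of features."""
    A = list(A)
    if not A:
        return 0.0
    S_AA = Sigma_X[np.ix_(A, A)]
    c = cov_XY[A]
    return float(c @ np.linalg.solve(S_AA, c))   # variance explained by X_A

print(variance_reduction([0, 2]), variance_reduction(range(n)))
```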

  12. F1, …, Fm submodular functions on V and λ1, …, λm > 0
      Then: F(A) = Σi λi Fi(A) is submodular!
       Submodularity is closed under nonnegative linear combinations
      Extremely useful fact:
       Fθ(A) submodular ⇒ Σθ P(θ) Fθ(A) submodular (expectations of submodular functions are submodular)
       Multicriterion optimization: F1, …, Fm submodular, λi > 0 ⇒ Σi λi Fi(A) submodular
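
A tiny illustration of the closure property: combine two coverage functions with nonnegative weights and spot-check the diminishing-returns inequality on one pair A ⊆ B (weights and sets are invented for the example).

```python
# Submodularity is preserved by nonnegative linear combinations: F = 2*F1 + 0.5*F2.
covers1 = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
covers2 = {"a": {10}, "b": {10, 11}, "c": {11, 12, 13}}

def coverage(cov):
    return lambda A: len(set().union(*(cov[s] for s in A))) if A else 0

F1, F2 = coverage(covers1), coverage(covers2)
F = lambda A: 2.0 * F1(A) + 0.5 * F2(A)

A, B, s = {"a"}, {"a", "b"}, "c"
gain_small = F(A | {s}) - F(A)
gain_large = F(B | {s}) - F(B)
print(gain_small, gain_large, gain_small >= gain_large)   # diminishing returns preserved
```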

  13. Each element covers some area
      Observation: diminishing returns
       New element S': adding S' to A = {S1, S2} helps a lot; adding S' to B = {S1, S2, S3, S4} helps very little

  14. F is submodular: A ⊆ B ⇒ F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
       Gain of adding a set s to a small solution ≥ gain of adding s to a large solution
      Natural example:
       Sets s1, s2, …, sn
       F(A) = size of the union of the sets in A (size of covered area)
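
The picture on the previous slide can be made concrete with a few invented regions: the marginal gain of the new element S' is large when added to the small solution A and small when added to the larger solution B.

```python
# Coverage function F(A) = size of the union of the regions covered by A (regions invented).
regions = {
    "S1": {1, 2, 3, 4},
    "S2": {4, 5, 6},
    "S3": {7, 8, 9},
    "S4": {9, 10, 11},
    "S'": {3, 7, 8, 12},
}
F = lambda A: len(set().union(*(regions[s] for s in A))) if A else 0

A = {"S1", "S2"}
B = {"S1", "S2", "S3", "S4"}
print(F(A | {"S'"}) - F(A))   # gain of S' on the small set A  -> 3
print(F(B | {"S'"}) - F(B))   # gain of S' on the large set B  -> 1 (smaller: diminishing returns)
```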

  15. Most influential set of size k: the set S of k nodes producing the largest expected cascade size F(S) if activated [Domingos-Richardson '01]
      Optimization problem: max F(S) over sets S of size k
      (Figure: example influence network on nodes a–i with edge activation probabilities between 0.2 and 0.4)

  16. Fix outcome i of the coin flips (which edges are "live")
       Let Fi(S) be the size of the cascade from S given these coin flips
       Let Fi(v) = set of nodes reachable from v on live-edge paths
       Fi(S) = size of the union of the Fi(v), v ∈ S → Fi is submodular
       F = Σi Fi → F is submodular [Kempe-Kleinberg-Tardos '03]
      (Figure: the same example network, one realization of the coin flips)
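
A Monte Carlo sketch of this live-edge view (my own code, with an invented toy graph): each trial flips every edge's coin once, and the cascade size from S is the number of nodes reachable over the live edges; averaging over trials estimates F(S).

```python
# Estimate the influence F(S) by sampling live-edge outcomes and measuring reachability.
import random

# Directed edges with independent activation probabilities p(u, v) (made up).
edges = {("a", "d"): 0.4, ("a", "b"): 0.3, ("b", "e"): 0.3, ("d", "f"): 0.2,
         ("e", "h"): 0.4, ("e", "g"): 0.3, ("c", "g"): 0.4, ("g", "i"): 0.3}

def reachable(live, S):
    """Nodes reachable from seed set S along live edges (simple graph traversal)."""
    seen, frontier = set(S), list(S)
    while frontier:
        u = frontier.pop()
        for (a, b) in edges:
            if a == u and (a, b) in live and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

def estimate_influence(S, trials=5000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        live = {e for e, p in edges.items() if rng.random() < p}   # one coin-flip outcome
        total += len(reachable(live, S))                           # cascade size F_i(S)
    return total / trials                                          # estimate of F(S)

print(estimate_influence({"a"}), estimate_influence({"a", "c"}))
```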

  17. Given a real city water distribution network
      And data on how contaminants spread in the network
      Problem posed by the US Environmental Protection Agency

  18. [Leskovec et al., KDD '07] Real metropolitan-area water network:
       V = 21,000 nodes
       E = 25,000 pipes
      Water flow simulator provided by the EPA
      3.6 million contamination events
      Multiple objectives: detection time, affected population, …
      Place sensors that detect well "on average"

  19. Utility of placing sensors:
       Model the water flow dynamics, demands of households, …
       For each subset A ⊆ V of network junctions, compute the utility F(A)
      (Figure: a contamination at one location spreads through the network; a sensor reduces impact through early detection. A placement with low sensing quality might score F(A) = 0.01, a good placement F(A) = 0.9)

  20. Given:
       Graph G(V, E), budget B
       Data on how outbreaks o1, …, oi, …, oK spread over time
      Select a set of nodes A maximizing the expected reward Σi P(i) Ri(A), where Ri(A) is the reward for detecting outbreak i,
      subject to cost(A) ≤ B
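
A minimal sketch of a cost-benefit greedy for this budgeted problem; the node costs, the placeholder reward function, and the budgeted_greedy helper are illustrative assumptions, not the course's code.

```python
# Cost-benefit greedy: repeatedly add the node with the best marginal reward per unit cost
# while the total cost stays within the budget B.
def budgeted_greedy(reward, cost, V, B):
    """Greedy selection by marginal reward / cost ratio, subject to total cost <= B."""
    A, spent = set(), 0.0
    while True:
        candidates = [v for v in V if v not in A and spent + cost[v] <= B]
        if not candidates:
            return A
        best = max(candidates, key=lambda v: (reward(A | {v}) - reward(A)) / cost[v])
        if reward(A | {best}) - reward(A) <= 0:
            return A
        A.add(best)
        spent += cost[best]

# Toy usage: reward = number of distinct outbreaks a placement detects (made up).
detects = {"n1": {1, 2}, "n2": {2, 3, 4}, "n3": {5}, "n4": {1, 5, 6}}
cost = {"n1": 1.0, "n2": 3.0, "n3": 1.0, "n4": 2.0}
R = lambda A: len(set().union(*(detects[v] for v in A))) if A else 0
print(budgeted_greedy(R, cost, detects.keys(), B=4.0))
```

Note that the procedure in the referenced KDD '07 work (CELF) goes beyond this basic ratio heuristic, also considering a plain benefit-only greedy and using lazy evaluations of the marginal gains for speed.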

  21. Cost:
       The cost of monitoring is node dependent
      Reward:
       Minimize the number of affected nodes
       If A is the set of monitored nodes, let R(A) denote the number of nodes we save
