Active Learning and Optimized Information Gathering Lecture 13 – Submodularity (cont’d) CS 101.2 Andreas Krause
Announcements
Homework 2: due Thursday, Feb 19.
Project milestone due Feb 24: 4 pages, NIPS format (http://nips.cc/PaperInformation/StyleFiles). Should contain preliminary results (model, experiments, proofs, …) as well as a timeline for the remaining work. Come to office hours to discuss projects!
Office hours (come before your presentation!): Andreas: Monday 3pm-4:30pm, 260 Jorgensen; Ryan: Wednesday 4:00-6:00pm, 109 Moore
Feature selection
Given random variables Y, X_1, …, X_n, we want to predict Y from a subset X_A = (X_{i1}, …, X_{ik}).
Naïve Bayes model example: Y = "Sick", with features X_1 = "Fever", X_2 = "Rash", X_3 = "Male".
Want the k most informative features:
A* = argmax_{|A| ≤ k} IG(X_A; Y)
where IG(X_A; Y) = H(Y) - H(Y | X_A): the uncertainty about Y before knowing X_A, minus the uncertainty after knowing X_A.
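To make the objective concrete, here is a minimal Python sketch that computes IG(X_A; Y) = H(Y) - H(Y | X_A) by direct enumeration, assuming the joint distribution is given as an explicit probability table (only feasible for tiny examples like the Naïve Bayes toy model above; the function and variable names are illustrative, not from the lecture).

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def information_gain(joint, A):
    """IG(X_A; Y) = H(Y) - H(Y | X_A).

    `joint` maps tuples (y, x1, ..., xn) to probabilities;
    `A` lists the (1-based) feature positions selected into X_A."""
    p_y, p_xa, p_yxa = {}, {}, {}
    for outcome, p in joint.items():
        y, xa = outcome[0], tuple(outcome[i] for i in A)
        p_y[y] = p_y.get(y, 0.0) + p
        p_xa[xa] = p_xa.get(xa, 0.0) + p
        p_yxa[(y, xa)] = p_yxa.get((y, xa), 0.0) + p
    # H(Y | X_A) = sum over values xa of P(xa) * H(Y | X_A = xa)
    h_cond = sum(
        pxa * entropy({y: pj / pxa for (y, xa2), pj in p_yxa.items() if xa2 == xa})
        for xa, pxa in p_xa.items())
    return entropy(p_y) - h_cond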
Example: Greedy algorithm for feature selection
Given: finite set V of features, utility function F(A) = IG(X_A; Y).
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A). This problem is NP-hard!
Greedy algorithm:
Start with A = ∅
For i = 1 to k:
  s* := argmax_s F(A ∪ {s})
  A := A ∪ {s*}
How well can this simple heuristic do? (A sketch of the loop follows below.)
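The greedy loop is only a few lines. A minimal sketch, treating F as a black-box set function (e.g., a wrapper around information_gain above); ties are broken arbitrarily:

```python
def greedy_maximize(V, F, k):
    """Greedily build A with |A| <= k, adding the element of V with the
    largest marginal gain F(A ∪ {s}) - F(A) at each step."""
    A = set()
    for _ in range(k):
        candidates = [s for s in V if s not in A]
        if not candidates:
            break
        best = max(candidates, key=lambda s: F(A | {s}) - F(A))
        A.add(best)
    return A
```

Note that each iteration costs |V| evaluations of F, so the whole run is O(k |V|) evaluations.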
Key property: Diminishing returns
Compare selection A = {} with selection B = {X_2, X_3}: adding a new feature X_1 to A will help a lot (large improvement), while adding X_1 to the larger set B doesn't help much (small improvement).
Submodularity: For A ⊆ B, F(A ∪ {s}) - F(A) ≥ F(B ∪ {s}) - F(B).
Theorem [Krause, Guestrin UAI '05]: Information gain F(A) in Naïve Bayes models is submodular!
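Diminishing returns can be sanity-checked by brute force on small ground sets. A sketch (exponential in |V|, so only for toy examples):

```python
from itertools import combinations

def is_submodular(V, F, tol=1e-12):
    """Check diminishing returns: for all A ⊆ B ⊆ V and s ∉ B,
    F(A ∪ {s}) - F(A) >= F(B ∪ {s}) - F(B)."""
    V = list(V)
    subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    return all(
        F(A | {s}) - F(A) >= F(B | {s}) - F(B) - tol
        for A in subsets for B in subsets if A <= B
        for s in V if s not in B)
```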
Why is submodularity useful?
Theorem [Nemhauser et al. '78]: The greedy maximization algorithm returns A_greedy with
F(A_greedy) ≥ (1 - 1/e) max_{|A| ≤ k} F(A).
Since 1 - 1/e ≈ 0.63, the greedy algorithm gives a near-optimal solution! For info-gain, this guarantee is the best possible unless P = NP [Krause, Guestrin UAI '05].
Submodularity is an incredibly useful and powerful concept!
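As a quick check of the (1 - 1/e) guarantee, here is a toy coverage function (coverage functions are monotone submodular), reusing greedy_maximize from the sketch above; the regions are made up for illustration:

```python
import math
from itertools import combinations

# Hypothetical coverage instance: F(A) = number of items covered by A.
regions = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {5}, 'd': {1, 5, 6}}
F = lambda A: len(set().union(*(regions[s] for s in A))) if A else 0

A_greedy = greedy_maximize(regions.keys(), F, k=2)
F_opt = max(F(set(c)) for c in combinations(regions, 2))   # brute force
assert F(A_greedy) >= (1 - 1 / math.e) * F_opt   # Nemhauser et al. '78 bound
```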
Monitoring water networks [Krause et al., J Wat Res Mgt 2008]
Contamination of drinking water could affect millions of people. We place sensors (e.g., Hach sensors, ~$14K each) to detect contaminations, simulated with a contamination simulator from the EPA; this was the setting of the "Battle of the Water Sensor Networks" competition.
Where should we place sensors to quickly detect contamination?
Model-based sensing
Utility of placing sensors is based on a model of the world; for water networks, a water flow simulator from the EPA. For each contamination location, the model predicts which junctions in the set V of all network junctions suffer low, medium, or high impact; a sensor reduces impact through early detection. F(A) = expected impact reduction from placing sensors at A (e.g., a poor placement achieves a low impact reduction F(A) = 0.01, a good one a high impact reduction F(A) = 0.9).
Theorem [Krause et al., J Wat Res Mgt '08]: Impact reduction F(A) in water networks is submodular!
Battle of the Water Sensor Networks competition
Real metropolitan area network (12,527 nodes); water flow simulator provided by the EPA; 3.6 million contamination events; multiple objectives: detection time, affected population, … Goal: place sensors that detect well "on average".
What about the worst case? [Krause et al., NIPS '07]
Knowing the sensor locations, an adversary contaminates exactly where the placement is weakest! A placement that detects well on "average-case" (accidental) contamination can have very different average-case impact but the same worst-case impact as a much worse placement.
Where should we place sensors to quickly detect contamination in the worst case?
Constrained maximization: Outline
Ingredients: utility function, selected set, selection cost, budget.
Topics: subset selection; complex constraints; robust optimization.
Optimizing for the worst case
Use a separate utility function F_i for each contamination i: F_i(A) = impact reduction by sensors A for contamination i. Each of the F_i is submodular. For example, for a contamination at node s, both F_s(A) and F_s(B) may be high, while for a contamination at node r, F_r(A) is low but F_r(B) is high.
Want to solve A* = argmax_{|A| ≤ k} min_i F_i(A). Unfortunately, min_i F_i is not submodular! How can we solve this robust optimization problem?
How does the greedy algorithm do?
V = {s1, s2, s3}; can only buy k = 2.

Set A      F_1   F_2   min_i F_i
{s1}       1     0     0
{s2}       0     2     0
{s3}       ε     ε     ε
{s1,s3}    1     ε     ε
{s2,s3}    ε     2     ε
{s1,s2}    1     2     1

Greedy picks s3 first (the only singleton with nonzero min), and can then only choose s1 or s2: greedy score ε. The optimal set {s1,s2} has score 1, so greedy does arbitrarily badly. Is there something better?
Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation algorithm unless P = NP. Hence we can't find any approximation algorithm. Or can we?
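The instance in the table is easy to reproduce. A sketch assuming each F_i is the best single-sensor score in A (a max of nonnegative weights, which is monotone submodular), reusing greedy_maximize from above:

```python
eps = 0.01
w1 = {'s1': 1.0, 's2': 0.0, 's3': eps}   # scenario 1 weights
w2 = {'s1': 0.0, 's2': 2.0, 's3': eps}   # scenario 2 weights
F1 = lambda A: max((w1[s] for s in A), default=0.0)
F2 = lambda A: max((w2[s] for s in A), default=0.0)
F_min = lambda A: min(F1(A), F2(A))

A = greedy_maximize(['s1', 's2', 's3'], F_min, k=2)
print(A, F_min(A))   # greedy grabs s3 first and ends with score eps, not 1
```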
Alternative formulation
If somebody told us the optimal value c, could we recover the optimal solution A*? Then we need to find A such that min_i F_i(A) ≥ c with |A| ≤ k. Is this any easier? Yes, if we relax the constraint |A| ≤ k.
Solving the alternative problem
Trick: For each F_i and c, define the truncation F'_{i,c}(A) = min{F_i(A), c}, which caps F_i at level c. Truncation remains submodular!
Problem 1 (last slide): max_{|A| ≤ k} min_i F_i(A). Non-submodular; don't know how to solve.
Problem 2: min |A| s.t. F'_{avg,c}(A) ≥ c, where F'_{avg,c}(A) = (1/m) Σ_i min{F_i(A), c}. Submodular!
Same optimal solutions, so solving one solves the other. But the target value appears as a constraint F(A) ≥ c; how do we handle that?
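The truncated average is one line of code. A sketch, with the m functions F_i given as black boxes (min with the constant c preserves submodularity, and sums of submodular functions are submodular):

```python
def truncated_avg(Fs, c):
    """Return F'_avg,c with F'_avg,c(A) = (1/m) * sum_i min(F_i(A), c)."""
    return lambda A: sum(min(F(A), c) for F in Fs) / len(Fs)
```

Note that F'_{avg,c}(A) = c exactly when every F_i(A) ≥ c, i.e., exactly when min_i F_i(A) ≥ c; that is why the two problems share optimal solutions.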
Maximization vs. coverage
Previously we wanted A* = argmax F(A) s.t. |A| ≤ k. Now we need to solve A* = argmin |A| s.t. F(A) ≥ Q.
Greedy algorithm:
Start with A := ∅
While F(A) < Q and |A| < n:
  s* := argmax_s F(A ∪ {s})
  A := A ∪ {s*}
(For the bound, assume F is integral; if not, just round it.)
Theorem [Wolsey et al.]: Greedy returns A_greedy with |A_greedy| ≤ (1 + log max_s F({s})) |A_opt|.
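A minimal sketch of the coverage variant, again treating F as a black box; the loop mirrors the maximization greedy but stops once the quota Q is met:

```python
def greedy_cover(V, F, Q):
    """Grow A greedily until F(A) >= Q (or all of V is used)."""
    A, V = set(), set(V)
    while F(A) < Q and A != V:
        best = max(V - A, key=lambda s: F(A | {s}) - F(A))
        A.add(best)
    return A
```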
Solving the alternative problem (cont'd)
With the truncation F'_{i,c}(A) = min{F_i(A), c}, Problem 1 (non-submodular, don't know how to solve) turns into Problem 2 (submodular), and for Problem 2 we can use the greedy coverage algorithm!
Back to our example
Guess c = 1 and run greedy on F'_{avg,1}:

Set A      F_1   F_2   min_i F_i   F'_{avg,1}
{s1}       1     0     0           1/2
{s2}       0     2     0           1/2
{s3}       ε     ε     ε           ε
{s1,s3}    1     ε     ε           (1+ε)/2
{s2,s3}    ε     2     ε           (1+ε)/2
{s1,s2}    1     2     1           1

Greedy first picks s1 (or s2), then picks the other: the optimal solution! How do we find c? Do binary search!
SATURATE Algorithm [Krause et al., NIPS '07]
Given: set V, integer k, and monotonic submodular functions F_1, …, F_m
Initialize c_min = 0, c_max = min_i F_i(V)
Do binary search: c = (c_min + c_max)/2
  Greedily find A_G such that F'_{avg,c}(A_G) = c
  If |A_G| ≤ αk: increase c_min
  If |A_G| > αk: decrease c_max
until convergence
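Putting the pieces together, here is a sketch of SATURATE built from truncated_avg and greedy_cover above; the convergence tolerance and the handling of the cardinality check are simplifications of the NIPS '07 algorithm, and α is the factor 1 + log max_s Σ_i F_i({s}) from the guarantee on the next slide:

```python
def saturate(V, Fs, k, alpha, tol=1e-4):
    """Binary search on the achievable worst-case value c; keep the best
    set whose truncated-average greedy cover fits within alpha * k."""
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    A_best = set()
    while c_max - c_min > tol:
        c = (c_min + c_max) / 2.0
        F_avg = truncated_avg(Fs, c)
        A_G = greedy_cover(V, F_avg, c)          # try to reach F'_avg,c = c
        if F_avg(A_G) >= c - 1e-12 and len(A_G) <= alpha * k:
            c_min, A_best = c, A_G               # c is achievable: raise it
        else:
            c_max = c                            # c not achievable: lower it
    return A_best
```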
Theoretical guarantees [Krause et al., NIPS '07]
Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Theorem: SATURATE finds a solution A_S such that min_i F_i(A_S) ≥ OPT_k and |A_S| ≤ αk, where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}).
Theorem: If there were a polytime algorithm with a better factor β < α, then NP ⊆ DTIME(n^{log log n}).
Example: Lake monitoring
Monitor pH values using a robotic sensor transect. Given observations A, a probabilistic model (Gaussian processes) predicts the true (hidden) pH value at each unobserved position s along the transect and estimates the prediction error Var(s | A).
Where should we sense to minimize our maximum error? Variance reduction is (often) submodular [Das & Kempe '08], so this is a robust submodular optimization problem!
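To make Var(s | A) concrete, here is a minimal Gaussian-process posterior-variance sketch; the RBF kernel, noise level, and transect positions are made-up illustration values, not from the lecture:

```python
import numpy as np

def gp_posterior_variance(s, A, kernel, noise=1e-2):
    """Var(s | A) = k(s,s) - k_sA (K_AA + noise*I)^{-1} k_As."""
    K = np.array([[kernel(a, b) for b in A] for a in A]) + noise * np.eye(len(A))
    k = np.array([kernel(s, a) for a in A])
    return kernel(s, s) - k @ np.linalg.solve(K, k)

rbf = lambda x, y: np.exp(-0.5 * (x - y) ** 2)
positions = np.linspace(0, 10, 50)     # candidate positions s along the transect
A = [2.0, 5.0, 8.0]                    # sensed locations
worst = max(gp_posterior_variance(s, A, rbf) for s in positions)
```

Minimizing this maximum predictive variance over placements A is exactly a min-max (robust) objective of the kind SATURATE handles.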
Comparison with state of the art
The algorithm used in geostatistics is simulated annealing [Sacks & Schiller '88, van Groeningen & Stein '98, Wiens '05, …], with 7 parameters that need to be fine-tuned.
[Plots: maximum marginal variance vs. number of sensors for greedy, simulated annealing, and SATURATE, on precipitation data and environmental monitoring data.]
SATURATE is competitive, 10x faster, and has no parameters to tune!
Results on water networks
[Plot: maximum detection time (minutes) vs. number of sensors on the water networks data; greedy shows no decrease until all contaminations are detected, and SATURATE beats both greedy and simulated annealing.]
60% lower worst-case detection time!
Worst- vs. average case
Given: set V, submodular functions F_1, …, F_m.
Average-case score: F_ac(A) = (1/m) Σ_i F_i(A). Too optimistic?
Worst-case score: F_wc(A) = min_i F_i(A). Very pessimistic!
Want to optimize both average- and worst-case score: F_ac(A) ≥ c_ac and F_wc(A) ≥ c_wc.
Truncate: min{F_ac(A), c_ac} + min{F_wc(A), c_wc} ≥ c_ac + c_wc.
Can modify SATURATE to solve this problem! ☺
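One way to encode both constraints as a single submodular objective, following the slide's truncation idea; here the worst-case part uses the per-scenario truncated average (which keeps it submodular, since min{F_wc, c_wc} ≥ c_wc holds exactly when every truncated F_i reaches c_wc), and the thresholds c_ac, c_wc would be searched over as in SATURATE:

```python
def tradeoff_objective(Fs, c_ac, c_wc):
    """min{F_ac(A), c_ac} + (1/m) * sum_i min{F_i(A), c_wc}.
    Reaches c_ac + c_wc iff F_ac(A) >= c_ac and min_i F_i(A) >= c_wc."""
    m = len(Fs)
    F_ac = lambda A: sum(F(A) for F in Fs) / m
    return lambda A: (min(F_ac(A), c_ac)
                      + sum(min(F(A), c_wc) for F in Fs) / m)
```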
Worst- vs. average case (results)
[Plot: worst-case vs. average-case detection time on the water networks data; optimizing only for the average case or only for the worst case is dominated by the knee in the tradeoff curve found by SATURATE.]
Can find a good compromise between average- and worst-case score!