Using Partial Probes to Infer Network States
Pavan Rangudu◦, Bijaya Adhikari∗, B. Aditya Prakash∗, Anil Vullikanti∗◦
∗Department of Computer Science, Virginia Tech   ◦NDSSL, Biocomplexity Institute, Virginia Tech
Contact: badityap@cs.vt.edu
Motivation
• Network nodes and links fail dynamically
• Networks are not fully known because of privacy constraints
• Our focus: if some failed nodes are known, can we infer the states of the remaining nodes?
Figures: Node failures in the Internet; Traffic jam in a road network
Prior works fail to address the problem directly.
Our model
• Graph $G(V, E)$ with a set $I \subseteq V$ of failed nodes
• Geographically correlated failure model [Agarwal et al., 2013]
• Single seed of the failure, with probability $p_s(v)$ of node $v$ being the seed
• Correlated failure model: $F(u \mid v)$ denotes the probability that node $u$ fails given that $v$ has failed
• Independence assumption: $F(u_1, u_2 \mid v) = F(u_1 \mid v) \cdot F(u_2 \mid v)$
• Motivation: attacks or natural disasters in infrastructure networks
• Probes: a subset $Q \subseteq I$ of the failed nodes is known
• Objective: find the set $I \setminus Q$
Figure: A toy road network with node failures
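The single-seed, conditionally independent failure model can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the callable `F(u, s)` (probability that `u` fails when `s` is the seed), the dict `p_seed` for $p_s$, and the function names are hypothetical. `prob_of_failure_set` evaluates the probability of an exact failure set $I$ implied by the two modelling assumptions (one seed drawn from $p_s$, then independent failures given the seed).

```python
import random


def sample_failures(nodes, p_seed, F, rng):
    """Draw one failure set I under the single-seed correlated model.

    nodes:  list of node ids (the set V)
    p_seed: dict mapping node -> p_s(node); values sum to 1
    F:      callable F(u, s) = Pr(u fails | s is the seed)  (hypothetical signature)
    rng:    a random.Random instance
    """
    # Pick exactly one seed s with probability p_s(s).
    seed = rng.choices(nodes, weights=[p_seed[v] for v in nodes], k=1)[0]
    # Given the seed, every node fails independently with probability F(u | s).
    return {u for u in nodes if rng.random() < F(u, seed)}


def prob_of_failure_set(I, nodes, p_seed, F):
    """Pr(I) = sum_s p_s(s) * prod_{v in I} F(v|s) * prod_{v' not in I} (1 - F(v'|s))."""
    total = 0.0
    for s in nodes:
        term = p_seed[s]
        for v in nodes:
            # Failed nodes contribute F(v|s); surviving nodes contribute 1 - F(v|s).
            term *= F(v, s) if v in I else 1.0 - F(v, s)
        total += term
    return total
```

Because each seed contributes a product distribution over the nodes, summing `prob_of_failure_set` over all $2^{|V|}$ subsets of a small graph returns 1, which is a quick sanity check for any choice of `F`.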
Our approach: Minimum Description Length
• Model cost $L(|Q|, |I|, I)$ has three components:
$$L(|Q|, |I|, I) = L(|Q|) + L\big(|I| \mid |Q|\big) + L\big(I \mid |Q|, |I|\big).$$
• $L(|Q|) = -\log \Pr(|Q|)$, using the Shannon-Fano code
• $L\big(|I| \mid |Q|\big) = -\log \dfrac{\Pr(|Q| \mid |I|)\,\Pr(|I|)}{\Pr(|Q|)}$
• $L\big(I \mid |Q|, |I|\big) = -\log \Pr\big(I \mid |Q|, |I|\big) = -\log \Pr\big(I \mid |I|\big)$
• Data cost: description of $Q^+ = I \setminus Q$ (assuming no observation errors)
$$L(Q^+ \mid I) = -\log\big(\gamma^{|Q|}(1-\gamma)^{|Q^+|}\big) = -|Q|\log\gamma - (|I| - |Q|)\log(1-\gamma)$$
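The Shannon-Fano code length and the data cost above can be checked numerically with a small sketch. The poster does not state the logarithm base, so this assumes base 2 (costs in bits); it also reads $\gamma$ as the probability that a failed node is probed, which is our interpretation of the factor $\gamma^{|Q|}(1-\gamma)^{|Q^+|}$. The function names are illustrative.

```python
import math


def code_length(prob):
    """Shannon-Fano code length, in bits, of an outcome with probability prob."""
    return -math.log2(prob)


def data_cost(num_probed, num_failed, gamma):
    """L(Q+ | I) in bits, reading gamma as the chance a failed node is probed (assumed).

    num_probed = |Q| and num_failed = |I|, so |Q+| = |I| - |Q| failures are unprobed.
    """
    num_hidden = num_failed - num_probed  # |Q+| = |I \ Q|
    return -num_probed * math.log2(gamma) - num_hidden * math.log2(1.0 - gamma)
```

With $\gamma = 0.5$ every failed node costs exactly one bit, so `data_cost(3, 5, 0.5)` is 5 bits; for other values of $\gamma$ the result matches `code_length` applied to $\gamma^{|Q|}(1-\gamma)^{|I|-|Q|}$.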
Problem Description
Model cost (after algebra):
$$\begin{aligned} L(|Q|, |I|, I) &= L(|Q|) + L\big(|I| \mid |Q|\big) + L\big(I \mid |Q|, |I|\big) \\ &= -\log\binom{|I|}{|Q|} - |Q|\log\gamma - (|I| - |Q|)\log(1-\gamma) - \log\Big(\sum_{s \in V} p_s(s) \prod_{v \in I} F(v \mid s) \prod_{v' \notin I} \big(1 - F(v' \mid s)\big)\Big) \end{aligned}$$
Problem Formulation
Given $G$, $p_s$, $F(\cdot)$ and $Q$, find $I$ that minimizes the total MDL cost, i.e. the model cost plus the data cost $L(Q^+ \mid I)$:
$$L(|Q|, |I|, I, Q) = -\log\binom{|I|}{|Q|} - \log\Big(\sum_{s \in V} p_s(s) \prod_{v \in I} F(v \mid s) \prod_{v' \notin I} \big(1 - F(v' \mid s)\big)\Big) - 2|Q|\log\gamma - 2(|I| - |Q|)\log(1-\gamma)$$
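A direct evaluation of the total MDL cost $L(|Q|, |I|, I, Q)$ from the problem formulation might look like the sketch below. It assumes base-2 logarithms, $Q \subseteq I$ and $0 < \gamma < 1$; `F(v, s)` and `p_seed` are the same hypothetical stand-ins for $F(v \mid s)$ and $p_s$. The per-seed products are accumulated in log space and combined with log-sum-exp, since multiplying thousands of probabilities (e.g. the 2650-node JAM network) underflows in plain floating point.

```python
import math


def log2_prob_failure_set(I, nodes, p_seed, F):
    """log2 of sum_s p_s(s) prod_{v in I} F(v|s) prod_{v' not in I} (1 - F(v'|s))."""
    log_terms = []
    for s in nodes:
        if p_seed[s] <= 0.0:
            continue
        lt = math.log2(p_seed[s])
        for v in nodes:
            q = F(v, s) if v in I else 1.0 - F(v, s)
            if q <= 0.0:
                lt = -math.inf  # this seed cannot produce exactly the set I
                break
            lt += math.log2(q)
        if lt > -math.inf:
            log_terms.append(lt)
    if not log_terms:
        return -math.inf  # Pr(I) = 0
    # log-sum-exp: factor out the largest term before summing.
    m = max(log_terms)
    return m + math.log2(sum(2.0 ** (t - m) for t in log_terms))


def total_mdl_cost(I, Q, nodes, p_seed, F, gamma):
    """Total MDL cost L(|Q|, |I|, I, Q) in bits; assumes Q is a subset of I and 0 < gamma < 1."""
    n_i, n_q = len(I), len(Q)
    return (-math.log2(math.comb(n_i, n_q))
            - log2_prob_failure_set(I, nodes, p_seed, F)
            - 2 * n_q * math.log2(gamma)
            - 2 * (n_i - n_q) * math.log2(1.0 - gamma))
```

A set that no seed can produce gets an infinite cost, so it can never be the minimizer.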
Algorithm Greedy
Input: Instance $(V, Q, p, P, \gamma)$
Output: Solution $\hat{I}$ that minimizes $L(|Q|, |\hat{I}|, \hat{I}, Q)$
for each $s \in V$ do
    for each $k \in [|Q|, |V|]$ do
        $I_s(k) \leftarrow$ top $k - |Q|$ nodes in $V \setminus Q$ with highest weight $f(s, v)$
        $I_s(k) \leftarrow I_s(k) \cup Q$
    end for
end for
$S \leftarrow \{I_s(k) : \forall s \in V,\ k \in [|Q|, |V|]\}$
$\hat{I} \leftarrow \arg\min_{I \in S} L(|Q|, |I|, I, Q)$
return $\hat{I}$
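Algorithm Greedy translates into a short Python sketch. The poster does not define the weight $f(s, v)$, so it is passed in as a caller-supplied callable; one plausible choice (our assumption) is the log-odds $\log\frac{F(v \mid s)}{1 - F(v \mid s)}$, which ranks nodes in the same order as $F(v \mid s)$ itself. Likewise `cost` is any callable that returns $L(|Q|, |I|, I, Q)$ for a candidate set. Both parameter names are hypothetical.

```python
import math


def greedy(nodes, Q, f, cost):
    """Sketch of Algorithm Greedy.

    nodes: list of node ids (V)
    Q:     set of probed failed nodes
    f:     callable f(s, v) -> weight of node v for seed s (definition supplied by caller)
    cost:  callable cost(I) -> total MDL cost L(|Q|, |I|, I, Q) of a candidate I containing Q
    Returns (best candidate set, its cost); the set is None if every candidate costs inf.
    """
    Q = set(Q)
    rest = [v for v in nodes if v not in Q]  # V \ Q
    best_set, best_cost = None, math.inf
    for s in nodes:
        # Rank the unprobed nodes by weight f(s, v), highest first.
        ranked = sorted(rest, key=lambda v: f(s, v), reverse=True)
        # k runs over [|Q|, |V|], so k - |Q| extra nodes runs over [0, |V \ Q|].
        for extra in range(len(rest) + 1):
            candidate = Q | set(ranked[:extra])  # I_s(k) = top (k - |Q|) nodes plus Q
            c = cost(candidate)
            if c < best_cost:
                best_set, best_cost = candidate, c
    return best_set, best_cost
```

Instead of materializing the whole family $S$ and then taking the arg min, the sketch keeps a running minimum, which selects the same set while storing only one candidate at a time.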
Analysis of Greedy
Theorem (Additive approximation): Let $I^*$ be the set minimizing the MDL cost, and let $I$ denote the solution computed by Algorithm Greedy. Then
$$L(|Q|, |I|, I, Q) \le L(|Q|, |I^*|, I^*, Q) + \log n,$$
where $n$ is the number of seed nodes.
Running time: Algorithm Greedy runs in $O(|V|^3)$ time.
Experiments
• Baseline: local improvement algorithm LocalSearch
• Datasets
  • Synthetic grid: 60 × 60 grid with uniform seed probability $p_s(\cdot)$; conditional failure probabilities from the model of [Agarwal et al., 2013]: $F(v \mid s) = 1 - d(s, v)$, where $d(\cdot)$ is the (normalized) distance
  • Real datasets: seed and conditional failure probability distributions computed from data
    • JAM data from WAZE for Boston: road network with 2650 nodes
    • WEATHER data from WAZE for Boston: road network with 1520 nodes
    • POWER-GRID: network of 24 nodes from electric disturbance events
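The synthetic grid instance can be built in a few lines. The poster only says that $d(\cdot)$ is a normalized distance; this sketch assumes Euclidean distance between grid coordinates divided by the grid diagonal, which keeps $F(v \mid s)$ in $[0, 1]$ and gives $F(s \mid s) = 1$. The function name `grid_instance` is hypothetical.

```python
import math


def grid_instance(rows, cols):
    """Synthetic grid: uniform p_s and F(v|s) = 1 - d(s, v).

    d is assumed to be the Euclidean distance normalized by the grid diagonal.
    Returns (nodes, p_seed, F) with F(v, s) = Pr(v fails | seed s).
    """
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    p_seed = {v: 1.0 / len(nodes) for v in nodes}  # uniform seed probability
    diag = math.hypot(rows - 1, cols - 1)  # largest possible distance on the grid

    def F(v, s):
        return 1.0 - math.dist(v, s) / diag

    return nodes, p_seed, F
```

For the 60 × 60 grid, `grid_instance(60, 60)` yields 3600 nodes; under this normalization the seed fails with probability 1 and the opposite corner never fails.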
WAZE dataset
Figure: Visualization of the Waze dataset. Each partition of the 119 × 78 grid is a node in our network.
Results for JAM dataset
Figure: Precision, recall, F1 score and MDL cost ratio $L(I, Q)/L(I^*, Q)$ as functions of $\gamma$ for LocalSearch and Greedy, and a comparison of the MDL costs of the algorithms.
Takeaways
• Our MDL-based approach helps identify missing failures
• Promising approach for other problems with missing information
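The metrics plotted for the JAM dataset can be computed with a small helper. The poster does not say whether precision and recall are measured on the full failure set $I$ or only on the hidden failures $I \setminus Q$; the function below works for either, and comparing $\hat{I} \setminus Q$ with the true $I \setminus Q$ is one plausible (assumed) choice. The function names are illustrative.

```python
def precision_recall_f1(inferred, truth):
    """Precision, recall and F1 of an inferred node set against the ground-truth set."""
    tp = len(inferred & truth)  # correctly inferred failed nodes
    precision = tp / len(inferred) if inferred else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def mdl_cost_ratio(cost_found, cost_reference):
    """MDL cost ratio L(I, Q) / L(I*, Q), as labeled on the JAM result plots."""
    return cost_found / cost_reference
```

For example, inferring {1, 2, 3, 4} when the true set is {2, 3, 5} gives precision 1/2, recall 2/3 and F1 4/7.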