
Scalable Algorithms for Distributed Statistical Inference

Animashree Anandkumar
School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853
Currently visiting EECS, MIT, Cambridge, MA 02139
PhD Committee: Lang Tong,


1. Scaling of Fusion Cost & Lossless Fusion

Cost of a fusion policy: the fusion policy π_n schedules the transmissions of the sensor nodes. The average fusion cost is
Ē(π_n) ≜ (1/n) Σ_{V_i ∈ V_n} E_i(π_n).
How does Ē(π_n) behave as n grows? (Figure: candidate growth rates O(n^{ν/2}), O(n), O(√n), O(1).)

Constraint: no loss in inference performance. A fusion policy is lossless if it results in no loss of inference performance at the fusion center, i.e., as if all raw data were available at the fusion center.

A. Anandkumar, J. E. Yukich, L. Tong, A. Swami, "Energy scaling laws for distributed inference in random networks," accepted to IEEE JSAC: Special Issue on Stochastic Geometry and Random Graphs for Wireless Networks, Dec. 2008 (on arXiv).

2–5. Problem Statement-I: Energy Scaling Laws

(Slide build: network graph, dependency graph, and fusion policy graph.)

Scalable Lossless Fusion Policy
Find a sequence of scalable policies {π_n}, i.e.,
limsup_{n→∞} (1/n) Σ_{V_i ∈ V_n} E_i(π_n) = Ē^π_∞ < ∞ (in the L² sense),
with a small scaling constant Ē^π_∞, such that optimal inference is achieved at the fusion center (lossless) for a class of node configurations.

6–10. Problem II: Optimal Node Placement Distribution

(Figure: sample node placements under spread-out, clustered, and uniform distributions; plot of Ē_n vs. n separating scalable policies, Ē_n = O(1), from non-scalable ones such as O(√n) or O(n).)

Goal: which placement strategy has the best asymptotic average energy Ē^π_∞?
Challenge: both the network and dependency graphs are influenced by the node locations.

11. Related Work: Scaling Laws in Networks

Capacity scaling in wireless networks (Gupta & Kumar, IT '00): information flow between nodes, O(1/√(n log n)) per-node throughput scaling.
Routing correlated data: algorithms for gathering correlated data (Cristescu, Beferull-Lozano & Vetterli, TON '06).
Function computation: rate scaling for computation of separable functions at a sink (Giridhar & Kumar, JSAC '05); bounds on the time required to achieve a distortion level for distributed computation (Ayaso, Dahleh & Shah, ISIT '08).

12. Outline

Models, assumptions, and problem formulations
◮ Propagation, network, and inference models
Insights from special cases
Markov random fields
Scalable data fusion for Markov random fields
Some related problems
Conclusion and future work

13. Propagation Model and Assumptions

(Figure: received-to-transmitted power ratio P_r/P_t (dB) vs. log d between transmitter and receiver.)

Cost for perfect reception over distance d: E_T = O(d^ν), where ν is the path-loss exponent.
Scheduling to avoid interference.
Quantization effects ignored.

Berkeley Mote characteristics: transmission range 500–1000 ft; current draw 25 mA (tx), 8 mA (rx); rate 38.4 kbaud.

A. Ephremides, "Energy concerns in wireless networks," IEEE Wireless Communications, no. 4, Aug. 2002.

14–15. Network Graph Model for Communication

Random node placement
◮ Points X_i ~ κ(x), i.i.d., on the unit ball Q_1.
◮ κ(x) bounded away from 0 and ∞.
◮ Network scaled to a fixed density λ: V_i = √(n/λ) X_i (region radius R = √(n/(πλ))).

Network graph for communication
◮ Connected set of communication links.
◮ Energy & interference constraints: disc graph above the critical radius.
◮ Adjustable transmission power.

16–20. Routing Strategies With No Fusion Are Not Scalable

(Figure: Ē(π_n) vs. n; direct single-hop transmission to the fusion center grows as O(n^{ν/2}), shortest-path forwarding of raw data grows as O(√n), whereas scalability requires Ē(π_n) = O(1).)

Incorporate the inference model (dependency graph) to obtain a scalable fusion policy.

21–25. Distributed Computation of a Sufficient Statistic

Example: sufficient statistic for mean estimation. Y_1, ..., Y_n ~ i.i.d. N(θ, 1): Σ_i Y_i is sufficient to estimate θ, so there is no performance loss.

Sufficient statistic for inference: no performance loss.
Dimensionality reduction: lower communication costs.
Minimal sufficiency: maximum dimensionality reduction.

Binary hypothesis testing: decide Y_1, ..., Y_n ~ f_0(Y^n) or f_1(Y^n).
Minimal sufficient statistic for binary hypothesis testing (Dynkin '61): the log-likelihood ratio
L_G(Y^n) = log [ f_0(Y^n) / f_1(Y^n) ].

Is there a scalable fusion policy for computing the likelihood ratio?

E. Dynkin, "Necessary and sufficient statistics for a family of probability distributions," Trans. Math. Stat. and Prob., vol. 1, pp. 23–41, 1961.
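As an illustration of the i.i.d. case above, here is a minimal numerical sketch (not from the talk) of how the global log-likelihood ratio decomposes into a sum of per-node LLRs, which is what makes in-network aggregation possible; the two Gaussian hypotheses and all parameter values are hypothetical.

```python
import numpy as np

# Hypothetical binary test on i.i.d. data: H0: Y_i ~ N(0, 1) vs. H1: Y_i ~ N(1, 1).
mu0, mu1, sigma = 0.0, 1.0, 1.0

def node_llr(y):
    """Per-node log-likelihood ratio log f0(y)/f1(y) for the Gaussian test."""
    return ((y - mu1) ** 2 - (y - mu0) ** 2) / (2 * sigma ** 2)

rng = np.random.default_rng(0)
y = rng.normal(mu1, sigma, size=200)      # data drawn under H1

# Global LLR = sum of per-node LLRs, so each sensor contributes a single scalar
# that can be accumulated along any spanning tree toward the fusion center.
global_llr = node_llr(y).sum()
decision = "H0" if global_llr > 0 else "H1"
print(global_llr, decision)
```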

26–27. Inference Model and Assumptions

Random locations V_n ≜ (V_1, ..., V_n) and sensor data Y_{V_n}.
Binary hypothesis H_0 vs. H_1:  H_k: Y_{V_n} ~ f(y_{v_n} | V_n = v_n; H_k).
Y_{V_n} is a Markov random field with dependency graph G_k(V_n).
Dependency neighbor condition: no direct "interaction" between two nodes unless they are neighbors in the dependency graph.

28. Outline

Models, assumptions, and problem formulations
◮ Propagation, network, and inference models
Insights from special cases
Markov random fields
Scalable data fusion for Markov random fields
Some related problems
Conclusion and future work

29–32. Optimal Fusion: the i.i.d. Case

Consider i.i.d. observations:  H_k: Y_V ~ Π_{i ∈ V} f_k(Y_i).
Sufficient statistic:  L(Y_V) = log [ f_0(Y_V) / f_1(Y_V) ] = Σ_{i ∈ V} L(Y_i).

The optimal data fusion is LLR aggregation over the MST. Why?
◮ Each node must transmit at least once.
◮ The MST minimizes the power-weighted edge sum min Σ_i |e_i|^ν.
◮ Assume the network graph contains the MST.

33. Optimal Fusion: Energy Analysis

Energy per node is
Ē(π_n^MST) = (1/n) Σ_{e ∈ MST} |e|^ν.
By Steele '88 and Yukich '00,
(1/n) Σ_{e ∈ MST} |e|^ν → Ē^MST_∞ < ∞ (convergence in L²).
Scalable fusion along the MST for independent data.

J. E. Yukich, "Asymptotics for weighted minimal spanning trees on random points," Stochastic Processes and their Applications, vol. 85, no. 1, pp. 123–138, Jan. 2000.
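A minimal simulation sketch (not part of the talk) of the quantity above: it draws uniform node locations at a fixed density, builds the Euclidean MST, and computes the per-node power-weighted edge sum (1/n) Σ_{e ∈ MST} |e|^ν, which should stabilize as n grows. Using a square region instead of the unit ball, and the specific density and path-loss values, are assumptions of this sketch.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def avg_mst_energy(n, nu=2.0, lam=1.0, seed=0):
    """Per-node energy (1/n) * sum_{e in MST} |e|^nu for n uniform points at density lam."""
    rng = np.random.default_rng(seed)
    side = np.sqrt(n / lam)                     # square of area n/lam (fixed density)
    pts = rng.uniform(0.0, side, size=(n, 2))
    d = distance_matrix(pts, pts)               # pairwise Euclidean distances
    mst = minimum_spanning_tree(d)              # sparse matrix of MST edge lengths
    return (mst.data ** nu).sum() / n

for n in [100, 400, 1600]:
    print(n, avg_mst_energy(n))                 # should approach a finite constant
```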

34–36. Role of Sensor Location Distribution

Can a better scaling constant be achieved?  Ē^MST_∞ = ζ(ν; MST) ∫_{Q_1} κ(x)^{1−ν/2} dx.

(Figure: sample clustered, uniform, and spread-out placements; plot of the ratio of Ē^MST_∞ for clustered and spread-out placements relative to uniform, as a function of the path-loss exponent ν.)

Uniform placement is worst-case in one regime of ν and optimal in the other; the crossover is at ν = 2, where the exponent 1 − ν/2 in the integral changes sign.
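A small numerical sketch (not from the talk) of the density-dependent factor ∫ κ(x)^{1−ν/2} dx in the scaling constant above, comparing a uniform density against a hypothetical clustered density (a Gaussian bump, normalized on the unit square); it illustrates why uniform placement switches from worst-case to optimal as ν crosses 2.

```python
import numpy as np

# Grid quadrature on the unit square Q1 = [0,1]^2.
m = 400
xs = (np.arange(m) + 0.5) / m
X, Y = np.meshgrid(xs, xs)
dA = 1.0 / m ** 2

# Uniform density and a hypothetical clustered density (Gaussian bump, normalized).
sig = 0.2
kappa_unif = np.ones_like(X)
bump = np.exp(-((X - 0.5) ** 2 + (Y - 0.5) ** 2) / (2 * sig ** 2))
kappa_clust = bump / (bump.sum() * dA)          # integrates to 1 over Q1

def density_factor(kappa, nu):
    """Approximate integral of kappa(x)^(1 - nu/2) over Q1."""
    return (kappa ** (1.0 - nu / 2.0)).sum() * dA

for nu in [1.0, 2.0, 3.0, 4.0]:
    r = density_factor(kappa_clust, nu) / density_factor(kappa_unif, nu)
    print(f"nu={nu}: clustered/uniform ratio = {r:.3f}")   # <1 for nu<2, >1 for nu>2
```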

37. Outline

Models, assumptions, and problem formulations
◮ Propagation, network, and inference models
Insights from special cases
Markov random fields
◮ Conditional-independence relationships
◮ Hammersley-Clifford theorem
◮ Form of the likelihood ratio
Scalable data fusion for Markov random fields
Some related problems
Conclusion and future work

38–39. Inference Model and Assumptions

Random locations V_n ≜ (V_1, ..., V_n) and samples Y_{V_n}.
Binary hypothesis H_0 vs. H_1:  H_k: Y_{V_n} ~ f(y_{v_n} | V_n = v_n; H_k).
Y_{V_n} is a Markov random field with dependency graph G_k(V_n).

40–42. Dependency Graph and Markov Random Field

Consider an undirected graph G(V); each vertex V_i ∈ V is associated with a random variable Y_i.
Local Markov property:  Y_i ⊥⊥ Y_{V∖{Nbd(i) ∪ i}} | Y_{Nbd(i)}.
Global Markov property: for any disjoint sets A, B, C such that C separates A and B,  Y_A ⊥⊥ Y_B | Y_C.

43–44. Likelihood Function of an MRF

Hammersley-Clifford Theorem ('71): let f be the joint pdf of an MRF with graph G(V). Then
−log f(Y_V) = Σ_{c ∈ C} Ψ_c(Y_c),
where C is the set of maximal cliques.

Gaussian MRF:
−log f(Y_V) = (1/2) Y_V^T Σ_V^{-1} Y_V + (n/2) log 2π + (1/2) log |Σ_V|,
where the cross terms Σ_V^{-1}(i, j) Y_i Y_j are nonzero only for edges (i, j) ∈ G: the sparsity pattern of the inverse covariance (precision) matrix matches the dependency graph.

(Figure: an 8-node dependency graph and the matching sparsity pattern of the inverse covariance matrix.)
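A small sketch (illustration only, using a hypothetical 4-node chain dependency graph) of the correspondence above: it builds a precision matrix whose nonzero off-diagonals are exactly the dependency-graph edges, and evaluates −log f(Y_V) both from the quadratic form and as a sum of nodewise and edgewise terms plus a constant.

```python
import numpy as np

# Hypothetical 4-node chain dependency graph: 1 - 2 - 3 - 4.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4

# Precision matrix J = Sigma^{-1}: nonzero off-diagonals only on graph edges.
J = np.diag([1.5, 2.0, 2.0, 1.5])
for i, j in edges:
    J[i, j] = J[j, i] = -0.4
assert np.all(np.linalg.eigvalsh(J) > 0)       # valid (positive-definite) model

Sigma = np.linalg.inv(J)
rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(n), Sigma)

# -log f(y) via the quadratic form.
neg_log_f = 0.5 * y @ J @ y + 0.5 * n * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma))

# Same quantity as per-node and per-edge (clique) terms plus the constant.
node_terms = 0.5 * np.diag(J) * y ** 2
edge_terms = [J[i, j] * y[i] * y[j] for i, j in edges]
const = 0.5 * n * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma))
print(np.isclose(neg_log_f, node_terms.sum() + sum(edge_terms) + const))   # True
```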

45–46. Inference Model and Assumptions

Random locations V_n ≜ (V_1, ..., V_n) and samples Y_{V_n}.
Binary hypothesis H_0 vs. H_1:  H_k: Y_{V_n} ~ f(y_{v_n} | V_n = v_n, H_k).
Y_{V_n} is a Markov random field with dependency graph G_k(V_n):
−log f(Y_{V_n} | G_k, H_k) = Σ_{c ∈ C_k} Ψ_{k,c}(Y_c),
where C_k is the collection of maximal cliques and Ψ_{k,c} are the clique potentials.
H_0: graph G_0;  H_1: graph G_1.

47–50. Dependency Graph Model

H_0: graph G_0;  H_1: graph G_1;  joint graph G = G_0 ∪ G_1.

Recall the Hammersley-Clifford theorem:
−log f(Y_{V_n} | G_k, H_k) = Σ_{c ∈ C_k} Ψ_{k,c}(Y_c).

Minimal sufficient statistic:
L_G(Y_V) = log [ f(Y_V | G_0, H_0) / f(Y_V | G_1, H_1) ] = Σ_{c ∈ C} φ(Y_c),
where C is the set of cliques of the joint graph G.

51. Outline

Models, assumptions, and problem formulations
◮ Propagation, network, and inference models
Insights from special cases
Markov random fields
Scalable data fusion for Markov random fields
◮ A suboptimal scalable policy
◮ Effects of sparsity on scalability
◮ Energy scaling analysis
Some related problems
Conclusion and future work

52. Fusion for a Markov Random Field

(Figure: dependency graph, network graph, and fusion policy graph.)

Lossless fusion policies: given the network and dependency graphs (N, G),
F_{G,N} ≜ { π : L_G(Y_V) = Σ_{c ∈ C} φ(Y_c) is computable at the fusion center }.

Optimal fusion policy:  E(π*_n) = min_{π ∈ F_{G_n, N_n}} Σ_i E_i(π_n).

Finding the optimal policy is NP-hard, via a Steiner-tree reduction (INFOCOM '08).

53–59. Data Fusion for Markov Random Fields (DFMRF)

Log-likelihood ratio:  L_G(Y_V) = Σ_{c ∈ C} φ(Y_c).

Step I: data forwarding and local computation. Given the dependency graph G and the network graph N, randomly select a representative (processor) in each clique of G; the clique members forward their raw data Y_i to the processor via shortest-path routing (SPR) on N, and the processor computes the clique term φ(Y_c).

Step II: aggregate the LLR over the MST toward the fusion center.

Total energy consumption = data forwarding + MST aggregation.
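A simplified sketch (not from the talk) of the two-step DFMRF cost accounting above, under several assumptions: the dependency graph is a disc graph whose cliques are taken to be its edges, the network graph is a disc graph that contains the Euclidean MST, and the radii, density, and path-loss exponent are hypothetical parameters.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

rng = np.random.default_rng(1)
n, nu = 60, 2.0
pts = rng.uniform(0.0, np.sqrt(n), size=(n, 2))    # density ~ 1 on a square
d = distance_matrix(pts, pts)
energy = d ** nu                                    # per-link transmission energy |e|^nu

# Network graph N: disc graph with an assumed connectivity radius.
r_net = 2.5 * np.sqrt(np.log(n))
net = np.where(d <= r_net, energy, 0.0)
sp = shortest_path(net, directed=False)             # min-energy routes on N

# Dependency graph G (assumed): disc graph with a smaller radius; edges used as cliques.
r_dep = 1.5
cliques = [(i, j) for i in range(n) for j in range(i + 1, n) if d[i, j] <= r_dep]

# Step I: each clique picks a random member as processor; the other member forwards raw data.
forward_cost = 0.0
for (i, j) in cliques:
    proc, other = (i, j) if rng.random() < 0.5 else (j, i)
    forward_cost += sp[other, proc]

# Step II: aggregate the per-clique and per-node LLR terms over the MST of all nodes.
mst_cost = minimum_spanning_tree(energy).data.sum()

print("avg energy per node:", (forward_cost + mst_cost) / n)
```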

60–61. Effects of Dependency Graph Sparsity on Scalability

Sparsity of the dependency graph spans a spectrum: from fully independent data, Σ_{i ∈ V} φ(Y_i), through clique-decomposable statistics, Σ_{c ∈ C} φ(Y_c), to a single global term φ(Y_V) for the complete graph.

Stabilizing graphs (Penrose-Yukich): the local graph structure is not affected by far-away points (e.g., k-NNG, disc graph).

M. D. Penrose and J. E. Yukich, "Weak laws of large numbers in geometric probability," Annals of Applied Probability, vol. 13, no. 1, pp. 277–303, 2003.

62–64. Effects of Network Graph Sparsity on Scalability

(Figure: example network graphs of varying sparsity — single hop, the complete graph N̄_n, and a u-spanner.)

u-Spanner: given a network graph N_n and its completion N̄_n, N_n is a u-spanner if
max_{V_i, V_j ∈ V_n}  E(V_i → V_j; SP on N_n) / E(V_i → V_j; SP on N̄_n) ≤ u.

Gabriel graph: u = 1 for ν ≥ 2.  Longest edge is O(√(log n)).
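A small sketch (not from the talk) that estimates the energy stretch factor u of a given network graph by comparing min-energy shortest paths on the graph against those on its completion, with edge energies |e|^ν; the disc-graph radius and path-loss exponent are assumed parameters.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import shortest_path

def energy_stretch(pts, radius, nu=2.0):
    """Max over node pairs of (min-energy path on disc graph) / (min-energy path on complete graph)."""
    d = distance_matrix(pts, pts)
    energy = d ** nu
    sparse_graph = np.where(d <= radius, energy, 0.0)    # disc network graph N_n
    sp_sparse = shortest_path(sparse_graph, directed=False)
    sp_full = shortest_path(energy, directed=False)       # completion of N_n
    off_diag = ~np.eye(len(pts), dtype=bool)
    return np.max(sp_sparse[off_diag] / sp_full[off_diag])

rng = np.random.default_rng(2)
pts = rng.uniform(0, 10, size=(50, 2))
print(energy_stretch(pts, radius=3.0))    # inf if the disc graph is disconnected
```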

65–66. Main Result: Scalability of DFMRF

If the dependency graph is stabilizing and the network graph is a u-spanner, then the DFMRF fusion policy is scalable.

Scaling constant for scale-invariant dependency graphs (k-NNG):
limsup_{n→∞} E(π_n^DFMRF)/n ≤ λ^{-ν/2} [ u ζ(ν; G) + ζ(ν; MST) ] ∫_{Q_1} κ(x)^{1−ν/2} dx,
where the u ζ(ν; G) term accounts for data forwarding and the ζ(ν; MST) term for MST aggregation, and
ζ(ν; G) ≜ E[ Σ_{(0,j) ∈ G(P_1 ∪ {0})} |0, j|^ν ].

67–69. Approximation Ratio for DFMRF

Recall F_G ≜ { π : L_G(Y_V) computable at the fusion center } and E(π*_n) = min_{π ∈ F_G} Σ_i E_i(π_n).

Lower and upper bounds for the optimal fusion policy:
E(π^MST_n) ≤ E(π*_n) ≤ E(π^DFMRF_n).

Approximation ratio of DFMRF for k-NNG dependency:
limsup_{n→∞} E(π^DFMRF_n) / E(π*_n) ≤ 1 + u ζ(ν; G) / ζ(ν; MST).

Constant-factor approximation for DFMRF in large networks; the approximation ratio is independent of the node placement for k-NNG dependency.

70. Simulation Results for k-NNG Dependency

(Figure: average energy under uniform placement and approximation ratio for DFMRF vs. the number of nodes n, comparing MST aggregation (0-NNG, no correlation), DFMRF under 1-, 2-, and 3-NNG dependency, and shortest-path routing with no fusion.)

71–73. What Have We Done and Left Out...

Energy scaling laws
◮ Assumed a stabilizing dependency graph and a u-spanner network graph.
◮ Defined a fusion policy π^DFMRF_n (DFMRF).
◮ Scalability analysis: limsup_{n→∞} (1/n) Σ_i E_i(π^DFMRF_n) ≤ Ē^DFMRF_∞, with α ≤ Ē^{π*}_∞ ≤ Ē^DFMRF_∞ ≤ β < ∞.
◮ Asymptotic approximation ratio: β/α.

Remarks
◮ Energy consumption is a key parameter for large sensor networks.
◮ Sensor location is a new source of randomness in distributed inference.
◮ Asymptotic techniques are useful in overall network design.

We have ignored several issues:
◮ one-shot inference
◮ quantization of measurements and link capacity constraints
◮ perfect transmission/reception and scheduling
◮ computation cost and overheads

74. Outline

Models, assumptions, and problem formulations
◮ Propagation, network, and inference models
Insights from special cases
Markov random fields
Scalable data fusion for Markov random fields
Some related problems
◮ Error exponents on random graphs
◮ Cost-performance tradeoff
◮ Inference in finite networks
Conclusion and future work

75–76. Design for Energy-Constrained Inference

Error exponent (IT '09, ISIT '09): for the MRF hypotheses with node density λ and placement distribution κ(x),
P_{1→0}(n) ~ exp(−n D_{λ,κ}), i.e., −(1/n) log P_{1→0}(n) → D_{λ,κ}.

Design for energy-constrained inference (SP '08):
max_{λ,κ,π} D_{λ,κ}  subject to  Ē^π_{λ,κ} ≤ Ē_o.

(1) A. Anandkumar, L. Tong, A. Swami, "Detection of Gauss-Markov Random Fields with Nearest-Neighbor Dependency," IEEE Trans. on Information Theory, Feb. 2009.
(2) A. Anandkumar, J. E. Yukich, L. Tong, A. Willsky, "Detection Error Exponent for Spatially Dependent Samples in Random Networks," Proc. of IEEE ISIT, Jun. 2009.
(3) A. Anandkumar, L. Tong, and A. Swami, "Optimal Node Density for Detection in Energy-Constrained Random Networks," IEEE Trans. Signal Proc., pp. 5232–5245, Oct. 2008.

77. Inference in Finite Fusion Networks

We have so far considered random node placement and scaling as n → ∞; the harder problem has arbitrary node placement and finite n.

Results (INFOCOM '08 & '09):
◮ The fusion scheme has a Steiner-tree reduction.
◮ Cost-performance tradeoff.

(1) A. Anandkumar, L. Tong, A. Swami, and A. Ephremides, "Minimum Cost Data Aggregation with Localized Processing for Statistical Inference," in Proc. of INFOCOM, April 2008.
(2) A. Anandkumar, M. Wang, L. Tong, and A. Swami, "Prize-Collecting Data Fusion for Cost-Performance Tradeoff in Distributed Inference," in Proc. of IEEE INFOCOM, April 2009.

78–81. Medium Access Control; Transaction Monitoring; Learning Dependency Models; Competitive Learning

Medium access control (SP '07, IT '08), with L. Tong, Cornell, & A. Swami, ARL. (Figure: fusion center; constant bandwidth vs. scaling; realization vs. type.)

Transaction monitoring (Sigmetrics '08), with C. Bisdikian & D. Agrawal, IBM Research: decentralized bipartite matching.

Learning dependency models (ISIT '09), with V. Tan, A. Willsky, MIT, & L. Tong, Cornell: SNR for learning.

Competitive learning, with A. K. Tang, Cornell: regret-free under interference; spectrum whitespace with primary and secondary users.

82–83. Holy Grail...

Networks: seamless operation; efficient resource utilization; a unified theory of the feasibility of large networks under different applications.

Network data: data-centric paradigms; unifying computation and communication (e.g., inference); fundamental limits and scalable algorithms.

84. Multidisciplinary Research

A unified network theory drawing on approximation algorithms, detection/estimation theory, information theory, random graphs, communication theory, and asymptotics and large deviations.

85. Thank You!

http://acsp.ece.cornell.edu/members/anima.html
