  1. Distributed Estimation with Relative Measurements Fundamental Limitations, Algorithms, Application to Power Systems Paolo Frasca based on joint works with F. Fagnani, H. Ishii, C. Ravazzi, W.S. Rossi, and R. Tempo partly supported by Joint International Lab COOPS and by JST CREST SICE International Symposium on Control Systems 2015 March 4-7, 2015, Tokyo, Japan

  2. Outline
     1. Estimation from relative measurements: problem statement & graph representation; least-squares formulation; fundamental limitations (error of the optimal estimator)
     2. Gradient algorithm: finite-time optimality of the expected error
     3. Ergodic randomized algorithm: asynchronous gossip randomization; oscillations and ergodicity (sample averages from time averages); an asynchronous distributed algorithm exploiting ergodicity
     4. Application & extension: power systems

  3. Problem statement: relative estimation
     V is a set of sensors of cardinality N; ξ ∈ R^V is an unknown vector.
     Each sensor u obtains noisy relative measurements with some other nodes v: b_uv = ξ_u − ξ_v + η_uv, where the η_uv are i.i.d. noise.
     Goal: for each sensor v ∈ V, estimate the scalar value ξ_v.
     Applications:
     - Clock synchronization: A. Giridhar and P. R. Kumar. Distributed clock synchronization over wireless networks: Algorithms and analysis. In IEEE Conference on Decision and Control, pages 4915–4920, San Diego, CA, USA, December 2006.
     - Self-localization of mobile robots: P. Barooah and J. P. Hespanha. Estimation from relative measurements: Algorithms and scaling laws. IEEE Control Systems Magazine, 27(4):57–74, 2007.
     - Statistical ranking in databases: B. Osting, C. Brune, and S. J. Osher. Optimal data collection for improved rankings expose well-connected graphs. Journal of Machine Learning Research, 15:2981–3012, 2014.
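As a concrete illustration (not part of the slides), here is a minimal numpy sketch of the measurement model b_uv = ξ_u − ξ_v + η_uv; the node values, edge list, noise level, and random seed are all assumed for the example.

import numpy as np

rng = np.random.default_rng(0)
N = 5
xi = rng.normal(0.0, 1.0, size=N)   # unknown node values, e.g. clock offsets
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]
sigma = 0.1                         # standard deviation of the i.i.d. noise

# each measurement is the difference of the endpoint values plus noise
b = np.array([xi[u] - xi[v] + sigma * rng.standard_normal() for (u, v) in edges])
print(b)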

  4. Relative estimation as a graph problem
     Measurements → edges E of an oriented connected graph G = (V, E).
     Incidence matrix A ∈ {0, ±1}^(E×V):
        A_ew = +1 if w is the first endpoint of edge e, −1 if w is the second endpoint, 0 otherwise.
     Laplacian matrix L = Aᵀ A.
     Example: 5 nodes, edges (1,2), (1,5), (2,3), (2,5), (3,4), (4,5):
        A = [  1 −1  0  0  0
               1  0  0  0 −1
               0  1 −1  0  0
               0  1  0  0 −1
               0  0  1 −1  0
               0  0  0  1 −1 ]
        L = Aᵀ A = [  2 −1  0  0 −1
                     −1  3 −1  0 −1
                      0 −1  2 −1  0
                      0  0 −1  2 −1
                     −1 −1  0 −1  3 ]
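The next sketch (again illustrative, not from the slides) builds the incidence matrix and Laplacian of the 5-node example above and checks that L = AᵀA has the node degrees on its diagonal.

import numpy as np

N = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]   # oriented edges (u, v), 0-indexed

A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u] = +1.0   # first endpoint of edge e
    A[e, v] = -1.0   # second endpoint of edge e

L = A.T @ A
print(L)             # diagonal = degrees (2, 3, 2, 2, 3); off-diagonal = -1 on edges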

  5. Relative estimation as a least-squares problem
     We define the least-squares problem min_z ||Az − b||².
     Matrix A has rank N − 1 ⇒ affine space of solutions (determined up to an additive constant).
     The minimum-norm solution x⋆ = L† Aᵀ b best explains the measurements (L† denotes the Moore–Penrose pseudoinverse of L).
     Questions:
     Q1. How good is the estimate x⋆?
     Q2. How can the sensor network compute x⋆?
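A minimal sketch of the minimum-norm estimate, assuming the same example graph and measurements as in the previous sketches; np.linalg.pinv computes the Moore–Penrose pseudoinverse L†.

import numpy as np

rng = np.random.default_rng(0)
N = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
L = A.T @ A

xi = rng.normal(0.0, 1.0, N)
b = A @ xi + 0.1 * rng.standard_normal(len(edges))

x_star = np.linalg.pinv(L) @ A.T @ b        # minimum-norm least-squares solution
print(np.allclose(L @ x_star, A.T @ b))     # solves the normal equations
print(x_star.mean())                        # ~0: xi is recovered only up to a constant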

  6. Estimator error and effective resistance
     Estimator error: (1/N) E||x⋆ − ξ||² = σ² (1/N) Σ_{i≥2} 1/λ_i,
     where 0 = λ_1 < λ_2 ≤ ··· ≤ λ_N are the eigenvalues of L and σ² is the variance of the noise.
     Observation from graph theory: (1/N) Σ_{i≥2} 1/λ_i = R_ave(G),
     where R_ave(G) is the average of the effective resistances between all pairs of nodes, if the graph were an electrical network of unit resistors.
     The error is determined by the topology of the measurement graph: e.g., its scaling in N depends on the graph dimension.
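This relation can be checked numerically; a sketch on the same example graph follows. (The normalization of R_ave as an average over pairs may differ by a constant factor depending on the convention, so the sketch verifies the underlying identity Σ_{u<v} R_uv = N Σ_{i≥2} 1/λ_i.)

import numpy as np

N = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
L = A.T @ A

lam = np.linalg.eigvalsh(L)                 # 0 = lam[0] < lam[1] <= ... <= lam[N-1]
sum_inv = np.sum(1.0 / lam[1:])

Ldag = np.linalg.pinv(L)
def R(u, v):                                # effective resistance between u and v
    return Ldag[u, u] + Ldag[v, v] - 2.0 * Ldag[u, v]

pair_sum = sum(R(u, v) for u in range(N) for v in range(u + 1, N))
print(sum_inv, pair_sum / N)                # the two values coincide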

  7. Gradient algorithm

  8–9. Gradient descent algorithm
      The gradient of Ψ(z) = ||Az − b||² is ∇Ψ(z) = 2Lz − 2Aᵀb.
      We define, choosing a parameter τ > 0,
         x(0) = 0
         x(k+1) = (I − τL) x(k) + τ Aᵀ b
      Proposition (Convergence). If τ < 1/d_max, where d_max is the largest degree in G, then lim_{k→+∞} x(k) = x⋆.
      The gradient algorithm is distributed: each node only needs to know the states of its neighbors.
         x(k+1) = (I − τL) x(k)  [a consensus algorithm]  +  τ Aᵀ b  [a constant input]
      It is synchronous: all nodes update their states at the same time.
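A runnable sketch of this iteration on the example graph; the step size, horizon, and noise level are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
L = A.T @ A

xi = rng.normal(0.0, 1.0, N)
b = A @ xi + 0.1 * rng.standard_normal(len(edges))

tau = 0.9 / L.diagonal().max()              # tau < 1/d_max guarantees convergence
x = np.zeros(N)
for _ in range(2000):
    x = x - tau * (L @ x - A.T @ b)         # = (I - tau*L) x + tau*A^T b
print(np.allclose(x, np.linalg.pinv(L) @ A.T @ b, atol=1e-6))   # converged to x*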

  10–11. Finite-time optimality of the expected error
      Assume the ξ_v are i.i.d. with zero mean and variance ν².
      Expected error: J(k) := (1/N) E_ξ E_η ||x(k) − ξ||².
      Results:
      - If k ≥ ν²/(τσ²), then J(k+1) ≥ J(k) (eventual increase).
      - The error J(k) has a minimum at a finite time k_min.
      - k_min has an upper bound which does not depend on N or on G.
      (Figure: J(k) versus time, log-log scale, on a ring with ν = 20, σ = 1, τ = 0.250, for N = 10, 20, 40, 80, 160, 320.)
      Surprising conclusion: the algorithm should not be run until convergence, but stopped earlier, irrespective of the measurement graph!
      W. S. Rossi, P. Frasca, and F. Fagnani. Limited benefit of cooperation in distributed relative localization. In IEEE Conference on Decision and Control, pages 5427–5431, Florence, Italy, December 2013.
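A Monte Carlo sketch of J(k) on a ring, loosely following the figure's parameters (the trial count and horizon are arbitrary choices), illustrating that the averaged error dips at a finite time and then rises again.

import numpy as np

rng = np.random.default_rng(0)
N, nu, sigma, tau = 20, 20.0, 1.0, 0.25
# ring graph: edge i joins node i and node (i+1) mod N
A = np.zeros((N, N))
for i in range(N):
    A[i, i], A[i, (i + 1) % N] = 1.0, -1.0
L = A.T @ A

steps, trials = 1500, 200
J = np.zeros(steps + 1)
for _ in range(trials):
    xi = nu * rng.standard_normal(N)
    b = A @ xi + sigma * rng.standard_normal(N)
    x = np.zeros(N)
    J[0] += np.mean((x - xi) ** 2)
    for k in range(steps):
        x = x - tau * (L @ x - A.T @ b)
        J[k + 1] += np.mean((x - xi) ** 2)
J /= trials

k_min = int(np.argmin(J))
print(k_min, J[k_min], J[-1])   # finite-time minimum, after which J(k) increases again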

  12. Ergodic randomized algorithm

  13–15. Asynchronous randomized algorithm
      We take a pairwise "gossip" approach. Fix a real number γ ∈ (0, 1).
      At every time instant k ∈ Z_+, an edge (u, v) ∈ E is sampled uniformly at random,
         P[(u, v) is selected at time k] = 1/|E|,
      and the states are updated:
         x_u(k+1) = (1 − γ) x_u(k) + γ x_v(k) + γ b_(u,v)
         x_v(k+1) = (1 − γ) x_v(k) + γ x_u(k) − γ b_(u,v)
         x_w(k+1) = x_w(k) if w ∉ {u, v}
      This is not a standard coordinate-gradient step; it is reminiscent of a gossip consensus algorithm with a constant input.
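A sketch of one run of this update rule; the graph, measurements, γ, and horizon are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]
xi = rng.normal(0.0, 1.0, N)
b = np.array([xi[u] - xi[v] + 0.1 * rng.standard_normal() for (u, v) in edges])

gamma = 0.5
x = np.zeros(N)
for k in range(1000):
    e = rng.integers(len(edges))          # edge sampled uniformly at random
    u, v = edges[e]
    xu, xv = x[u], x[v]
    x[u] = (1 - gamma) * xu + gamma * xv + gamma * b[e]
    x[v] = (1 - gamma) * xv + gamma * xu - gamma * b[e]
    # all other nodes keep their current state
print(x)   # the raw states keep oscillating, cf. the "no convergence" slide below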

  16. Simulations: no convergence
      The states x(k) persistently oscillate!
      (Figure: two panels of state trajectories x versus k, 0 ≤ k ≤ 1000, oscillating without converging.)
      Can we still use this algorithm?

  17–20. Countermeasure: time-averages
      Time-averages smooth out the oscillations:
         x̄(k) := (1/(k+1)) Σ_{ℓ=0}^{k} x(ℓ)   ⇒   x̄(k) → x⋆ as k → +∞
      (Figure: the time-averaged trajectories x̄(k) settle towards x⋆ while the raw states x(k) keep oscillating.)
      This works thanks to:
      - ergodicity of x(·): sample averages ⇐⇒ time averages
      - simple average dynamics (like the gradient algorithm):
           E[x(k+1)] = (I − (γ/|E|) L) E[x(k)] + (γ/|E|) Aᵀ b,  so E[x(k)] → x⋆ as k → +∞
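A sketch of the time-averaging countermeasure on the same illustrative example: the running average of the gossip states approaches x⋆ even though the states themselves oscillate (γ, seed, and iteration count are assumptions).

import numpy as np

rng = np.random.default_rng(0)
N = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
xi = rng.normal(0.0, 1.0, N)
b = A @ xi + 0.1 * rng.standard_normal(len(edges))
x_star = np.linalg.pinv(A.T @ A) @ A.T @ b

gamma = 0.5
x = np.zeros(N)          # gossip states x(k)
x_bar = np.zeros(N)      # time-averages, x_bar(0) = x(0) = 0
for k in range(200000):
    e = rng.integers(len(edges))
    u, v = edges[e]
    xu, xv = x[u], x[v]
    x[u] = (1 - gamma) * xu + gamma * xv + gamma * b[e]
    x[v] = (1 - gamma) * xv + gamma * xu - gamma * b[e]
    x_bar = ((k + 1) * x_bar + x) / (k + 2)    # average of x(0), ..., x(k+1)

print(np.abs(x_bar - x_star).max())            # small: the time-average approaches x*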

  21. Local vs global clocks
      To compute the time-averages x̄, each sensor needs to know the absolute time k.
      We can overcome this drawback by defining two auxiliary dynamics.
      Local times: κ_w(0) = 1 for all w ∈ V, and when edge (u, v) is selected
         κ_u(k+1) = κ_u(k) + 1
         κ_v(k+1) = κ_v(k) + 1
         κ_w(k+1) = κ_w(k) if w ∉ {u, v}
      "Local" time-averages: x̂_w(0) = 0 for all w ∈ V, and
         x̂_u(k+1) = (κ_u(k) x̂_u(k) + x_u(k+1)) / κ_u(k+1)
         x̂_v(k+1) = (κ_v(k) x̂_v(k) + x_v(k+1)) / κ_v(k+1)
         x̂_w(k+1) = x̂_w(k) if w ∉ {u, v}
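A sketch of this local-clock variant on the same illustrative example: each node keeps its own counter κ_w and updates its local average only when it takes part in a gossip exchange (graph, γ, seed, and horizon are assumptions).

import numpy as np

rng = np.random.default_rng(0)
N = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4)]
xi = rng.normal(0.0, 1.0, N)
b = np.array([xi[u] - xi[v] + 0.1 * rng.standard_normal() for (u, v) in edges])

gamma = 0.5
x = np.zeros(N)          # gossip states
kappa = np.ones(N)       # local clocks, kappa_w(0) = 1
x_hat = np.zeros(N)      # local time-averages, x_hat_w(0) = 0

for k in range(200000):
    e = rng.integers(len(edges))
    u, v = edges[e]
    xu, xv = x[u], x[v]
    x[u] = (1 - gamma) * xu + gamma * xv + gamma * b[e]
    x[v] = (1 - gamma) * xv + gamma * xu - gamma * b[e]
    for w in (u, v):                             # only the two active nodes update
        kappa[w] += 1
        x_hat[w] = ((kappa[w] - 1) * x_hat[w] + x[w]) / kappa[w]

print(np.round(x_hat, 2))   # per the slides, these local averages also approach x*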
