SLIDE 1

Distributed Estimation with Relative Measurements
Fundamental Limitations, Algorithms, Application to Power Systems

Paolo Frasca

based on joint works with

  • F. Fagnani, H. Ishii, C. Ravazzi, W.S. Rossi, and R. Tempo

partly supported by Joint International Lab COOPS and by JST CREST

SICE International Symposium on Control Systems 2015, March 4–7, 2015, Tokyo, Japan

SLIDE 2

Outline

1. Estimation from relative measurements
   • Problem statement & graph representation
   • Least-squares formulation
   • Fundamental limitations: error of the optimal estimator
2. Gradient algorithm
   • Finite-time optimality of the expected error
3. Ergodic randomized algorithm
   • Asynchronous gossip randomization
   • Oscillations and ergodicity: sample averages from time averages
   • An asynchronous distributed algorithm exploiting ergodicity
4. Application & extension: power systems

SLIDE 3

Problem statement: relative estimation

  • V is a set of sensors of cardinality N
  • ξ ∈ R^V is an unknown vector
  • each sensor u obtains noisy relative measurements with some other nodes v:
      b_uv = ξ_u − ξ_v + η_uv
    where the η_uv are i.i.d. noise
  • Goal: for each sensor v ∈ V, estimate the scalar value ξ_v

Applications:
  • clock synchronization
    A. Giridhar and P. R. Kumar. Distributed clock synchronization over wireless networks: Algorithms and analysis. In IEEE Conference on Decision and Control, pages 4915–4920, San Diego, CA, USA, December 2006
  • self-localization of mobile robots
    P. Barooah and J. P. Hespanha. Estimation from relative measurements: Algorithms and scaling laws. IEEE Control Systems Magazine, 27(4):57–74, 2007
  • statistical ranking in databases
    B. Osting, C. Brune, and S. J. Osher. Optimal data collection for improved rankings expose well-connected graphs. Journal of Machine Learning Research, 15:2981–3012, 2014

SLIDE 4

Relative estimation as a graph problem

Measurements → edges E of an oriented connected graph G = (V, E)

Incidence matrix A ∈ {0, ±1}^{E×V}:

  A_ew = +1  if e = (w, v) for some v
         −1  if e = (v, w) for some v
          0  otherwise

(with this orientation, (Aξ)_e = ξ_u − ξ_v for e = (u, v), consistently with the measurements b_uv)

Example (5-node graph from the figure: the cycle 1–2–3–4–5 plus the chord {2, 5}; edge orientations chosen arbitrarily):

  A = ⎡  1 −1  0  0  0 ⎤
      ⎢  0  1 −1  0  0 ⎥
      ⎢  0  0  1 −1  0 ⎥
      ⎢  0  0  0  1 −1 ⎥
      ⎢ −1  0  0  0  1 ⎥
      ⎣  0  1  0  0 −1 ⎦

Laplacian matrix:

  L = A⊤A = ⎡  2 −1  0  0 −1 ⎤
            ⎢ −1  3 −1  0 −1 ⎥
            ⎢  0 −1  2 −1  0 ⎥
            ⎢  0  0 −1  2 −1 ⎥
            ⎣ −1 −1  0 −1  3 ⎦
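As a sanity check, the incidence and Laplacian matrices can be rebuilt in a few lines of numpy. The edge list and orientations below are our reconstruction of the slide's figure (any orientation of the edges yields the same L):

```python
import numpy as np

# Oriented edges of the example graph: the 5-cycle 1-2-3-4-5 plus the chord
# {2, 5}, written with 0-based node labels. The orientation of each edge is an
# arbitrary choice: it flips signs in A but leaves L = A^T A unchanged.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
N = 5

# Incidence matrix: row e has +1 at the tail and -1 at the head of edge e.
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0

L = A.T @ A  # graph Laplacian

# Laplacian as on the slide: diagonal = degrees, off-diagonal = -adjacency.
L_slide = np.array([
    [ 2, -1,  0,  0, -1],
    [-1,  3, -1,  0, -1],
    [ 0, -1,  2, -1,  0],
    [ 0,  0, -1,  2, -1],
    [-1, -1,  0, -1,  3],
], dtype=float)

assert np.array_equal(L, L_slide)
# For a connected graph, rank(A) = N - 1 and the all-ones vector spans ker(L).
assert np.linalg.matrix_rank(A) == N - 1
assert np.allclose(L @ np.ones(N), 0)
```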

SLIDE 5

Relative estimation as a least-squares problem

We define the least-squares problem

  min_z ‖Az − b‖²

Matrix A has rank N − 1 ⟹ an affine space of solutions (determined up to an additive constant)
The minimum-norm solution x⋆ = L†A⊤b best explains the measurements

Questions:
  Q1 How good is the estimate x⋆?
  Q2 How can the sensor network compute x⋆?
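A minimal numpy sketch of the minimum-norm solution, on the same example graph and with synthetic data of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Example graph again: the 5-cycle plus one chord, 0-based labels.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
N = 5
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
L = A.T @ A

xi = rng.normal(size=N)                          # unknown true values
b = A @ xi + 0.1 * rng.normal(size=len(edges))   # noisy relative measurements

# Minimum-norm least-squares solution x* = L^+ A^T b
x_star = np.linalg.pinv(L) @ A.T @ b

# x* solves the normal equations L x = A^T b ...
assert np.allclose(L @ x_star, A.T @ b)
# ... and, being minimum-norm, is orthogonal to ker(L) = span(1):
assert np.isclose(x_star.sum(), 0.0)
# It recovers xi only up to the unobservable additive constant:
assert np.linalg.norm(x_star - (xi - xi.mean())) < 1.0
```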

SLIDE 6

Estimator error and effective resistance

Estimator error:

  (1/N) E‖x⋆ − ξ‖² = σ² (1/N) Σ_{i≥2} 1/λ_i

where 0 = λ₁ < λ₂ ≤ ⋯ ≤ λ_N are the eigenvalues of L and σ² is the variance of the noise.

Observation from graph theory:

  (1/N) Σ_{i≥2} 1/λ_i = R_ave(G)

where R_ave(G) is the average of the effective resistances between all pairs of nodes, if the graph were an electrical network of unit resistors.

The error is determined by the topology of the measurement graph: e.g., the scaling in N depends on the graph dimension.
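The identity between eigenvalues and effective resistances can be verified numerically. Here R_ave is computed from the pseudoinverse via R_uv = L†_uu + L†_vv − 2 L†_uv, summing over unordered pairs and normalizing by N² — the normalization under which, to our checking, the identity is exact:

```python
import numpy as np

# Example graph again: the 5-cycle plus one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
N = 5
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
L = A.T @ A

# Left-hand side: (1/N) * sum of inverse nonzero Laplacian eigenvalues.
lam = np.sort(np.linalg.eigvalsh(L))
lhs = np.sum(1.0 / lam[1:]) / N

# Right-hand side: pairwise effective resistances from the pseudoinverse,
# R_uv = L+_uu + L+_vv - 2 L+_uv, averaged with the 1/N^2 normalization.
Lp = np.linalg.pinv(L)
R = np.add.outer(np.diag(Lp), np.diag(Lp)) - 2 * Lp
rhs = R[np.triu_indices(N, k=1)].sum() / N**2

assert np.isclose(lhs, rhs)
```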

SLIDE 7

Gradient algorithm

SLIDES 8–9

Gradient descent algorithm

The gradient of Ψ(z) = ‖Az − b‖² is ∇Ψ(z) = 2Lz − 2A⊤b. We define, choosing a parameter τ > 0,

  x(0) = 0
  x(k + 1) = (I − τL)x(k) + τA⊤b

Proposition (Convergence)
If τ < 1/d_max, where d_max is the largest degree in G, then lim_{k→+∞} x(k) = x⋆.

The gradient algorithm is distributed: each node only needs to know the states of its neighbors. In the update x(k + 1) = (I − τL)x(k) + τA⊤b, the term (I − τL)x(k) is a consensus algorithm and τA⊤b is a constant input.

It is synchronous: all nodes update their states at the same time.
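A numpy sketch of the gradient iteration on the running example, with a step size satisfying the proposition (graph and data are our choices, not the slide's):

```python
import numpy as np

rng = np.random.default_rng(1)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
N = 5
A = np.zeros((len(edges), N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
L = A.T @ A

b = A @ rng.normal(size=N) + 0.1 * rng.normal(size=len(edges))
x_star = np.linalg.pinv(L) @ A.T @ b

d_max = np.diag(L).max()   # largest degree in G
tau = 0.9 / d_max          # step size with tau < 1/d_max

# Gradient iteration x(k+1) = (I - tau L) x(k) + tau A^T b from x(0) = 0.
# In a sensor network each node computes its own row of this update: it only
# needs the states of its neighbors and its incident measurements.
x = np.zeros(N)
for _ in range(2000):
    x = x - tau * (L @ x - A.T @ b)

assert np.allclose(x, x_star, atol=1e-8)
```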

SLIDES 10–11

Finite-time optimality of the expected error

Assume the ξ_v are i.i.d. with zero mean and variance ν². Define the expected error

  J(k) := (1/N) E_ξ E_η ‖x(k) − ξ‖²

Results: if k ≥ ν²/(τσ²), then J(k + 1) ≥ J(k) (eventual increase)
  • the error J(k) has a minimum at a finite time k_min
  • k_min has an upper bound which does not depend on N or on G

Surprising conclusion: the algorithm should not be run until convergence, but stopped earlier, irrespective of the measurement graph!

[Figure: log–log plot of J(k) versus time for a ring graph, ν = 20, σ = 1, τ = 0.250, and N = 10, 20, 40, 80, 160, 320.]

  • W. S. Rossi, P. Frasca, and F. Fagnani. Limited benefit of cooperation in distributed relative localization. In IEEE Conference on Decision and Control, pages 5427–5431, Florence, Italy, December 2013
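The stopping-time phenomenon can be reproduced with a closed form for J(k) that we derive from the slide's setup (the formula is ours, not on the slide): diagonalizing L, with nonzero eigenvalues λ_i and μ_i = 1 − τλ_i, one finds for x(0) = 0 that J(k) = (1/N)[ν² + Σ_{i≥2}(ν² μ_i^{2k} + σ²(1 − μ_i^k)²/λ_i)]. A sketch on the ring with the slide's parameters:

```python
import numpy as np

# Ring graph as in the slide's simulation: N = 10, nu = 20, sigma = 1, tau = 0.25.
N, nu, sigma, tau = 10, 20.0, 1.0, 0.25
L = 2 * np.eye(N) - np.roll(np.eye(N), 1, axis=1) - np.roll(np.eye(N), -1, axis=1)
lam = np.sort(np.linalg.eigvalsh(L))[1:]   # nonzero eigenvalues
mu = 1 - tau * lam

# Our closed form for J(k), evaluated on a grid of times k.
k = np.arange(5001)
muk = mu[:, None] ** k[None, :]
J = (nu**2 + (nu**2 * muk**2 + sigma**2 * (1 - muk) ** 2 / lam[:, None]).sum(0)) / N

k_min = int(J.argmin())
bound = nu**2 / (tau * sigma**2)   # slide: J(k+1) >= J(k) once k >= nu^2/(tau sigma^2)

assert 0 < k_min <= bound                        # minimum at a finite time
assert J[k_min] < J[0] and J[k_min] < J[-1]      # stopping early beats running forever
assert np.all(np.diff(J[int(bound):]) >= -1e-9)  # eventual (non-strict) increase
```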

SLIDE 12

Ergodic randomized algorithm

SLIDES 13–15

Asynchronous randomized algorithm

We take a pairwise "gossip" approach. Fix a real number γ ∈ (0, 1). At every time instant k ∈ Z₊, an edge (u, v) ∈ E is sampled uniformly at random,

  P[(u, v) is selected at time k] = 1/|E|

and the states are updated:

  x_u(k + 1) = (1 − γ)x_u(k) + γx_v(k) + γb_(u,v)
  x_v(k + 1) = (1 − γ)x_v(k) + γx_u(k) − γb_(u,v)
  x_w(k + 1) = x_w(k)   if w ∉ {u, v}

This is not a standard coordinate gradient; it is reminiscent of a gossip consensus algorithm with a constant input.

SLIDE 16

Simulations: no convergence

The states x(k) persistently oscillate!

[Figure: two sample trajectories of the components of x(k) over 1000 iterations, showing sustained oscillations.]

Can we still use this algorithm?
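A numpy sketch of the gossip iteration on the running example (synthetic data of our choosing): with noisy, hence inconsistent, measurements the state never settles.

```python
import numpy as np

rng = np.random.default_rng(2)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
N, E = 5, 6
A = np.zeros((E, N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0

# Noisy, hence inconsistent, measurements: no x satisfies every edge exactly.
b = A @ rng.normal(size=N) + 0.3 * rng.normal(size=E)

gamma = 0.5
x = np.zeros(N)
traj = []
for k in range(5000):
    e = rng.integers(E)            # sample an edge uniformly at random
    u, v = edges[e]
    r = x[v] - x[u] + b[e]         # edge residual
    x[u] += gamma * r              # gossip update of the pair (u, v)
    x[v] -= gamma * r
    traj.append(x.copy())
traj = np.array(traj)

# The state keeps moving: at every state some edge has a nonzero residual,
# so the iterates oscillate persistently instead of converging.
last_steps = np.abs(np.diff(traj[-1000:], axis=0)).max()
assert last_steps > 1e-3
```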

SLIDES 17–20

Countermeasure: time-averages

Time-averages smooth out the oscillations:

  x̄(k) := (1/(k + 1)) Σ_{ℓ=0}^{k} x(ℓ)  ⟹  x̄(k) → x⋆ as k → +∞

[Figure: the oscillating trajectories x(k) together with their time-averages x̄(k), which settle down.]

This works thanks to:
  • ergodicity of x(·): sample averages ⟺ time averages
  • simple average dynamics (like the gradient algorithm):

      E[x(k + 1)] = (I − (γ/|E|)L) E[x(k)] + (γ/|E|)A⊤b

    so that E[x(k)] → x⋆ as k → +∞
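Adding a running time-average to the gossip simulation shows the ergodic effect: the instantaneous state oscillates, while its time-average approaches x⋆ (graph, data, and tolerance below are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
N, E = 5, 6
A = np.zeros((E, N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
L = A.T @ A
b = A @ rng.normal(size=N) + 0.3 * rng.normal(size=E)
x_star = np.linalg.pinv(L) @ A.T @ b

gamma = 0.5
x = np.zeros(N)
xbar = np.zeros(N)
K = 200_000
for k in range(K):
    e = rng.integers(E)
    u, v = edges[e]
    r = x[v] - x[u] + b[e]
    x[u] += gamma * r
    x[v] -= gamma * r
    xbar += (x - xbar) / (k + 2)   # running average of x(0), ..., x(k+1)

# The oscillating state is away from x*, but its time-average is close.
assert np.linalg.norm(xbar - x_star) < 0.05
```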

SLIDE 21

Local vs global clocks

To compute the time-averages x̄, each sensor needs to know the absolute time k. We can overcome this drawback by defining two auxiliary dynamics.

Local times: κ_w(0) = 1 for all w ∈ V

  κ_u(k + 1) = κ_u(k) + 1
  κ_v(k + 1) = κ_v(k) + 1
  κ_w(k + 1) = κ_w(k)   if w ∉ {u, v}

"Local" time-averages (written x̂ here to distinguish them from x̄): x̂_w(0) = 0 for all w ∈ V

  x̂_u(k + 1) = (κ_u(k) x̂_u(k) + x_u(k + 1)) / κ_u(k + 1)
  x̂_v(k + 1) = (κ_v(k) x̂_v(k) + x_v(k + 1)) / κ_v(k + 1)
  x̂_w(k + 1) = x̂_w(k)   if w ∉ {u, v}
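A sketch of the local-clock bookkeeping: each active node updates only its own counter and average, and the result coincides with the plain average of the samples that node has seen, so no global time is needed (graph and data are again our choices):

```python
import numpy as np

rng = np.random.default_rng(4)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
N, E = 5, 6
A = np.zeros((E, N))
for e, (u, v) in enumerate(edges):
    A[e, u], A[e, v] = 1.0, -1.0
b = A @ rng.normal(size=N) + 0.3 * rng.normal(size=E)

gamma = 0.5
x = np.zeros(N)
kappa = np.ones(N)                    # local clocks: kappa_w(0) = 1
xhat = np.zeros(N)                    # local time-averages
samples = [[0.0] for _ in range(N)]   # bookkeeping for the check below

for k in range(10_000):
    e = rng.integers(E)
    u, v = edges[e]
    r = x[v] - x[u] + b[e]
    x[u] += gamma * r
    x[v] -= gamma * r
    for w in (u, v):   # only the two active nodes touch their clocks/averages
        xhat[w] = (kappa[w] * xhat[w] + x[w]) / (kappa[w] + 1)
        kappa[w] += 1
        samples[w].append(x[w])

# Each local average equals the plain average of that node's own samples
# (its initial value plus the value after each of its updates).
for w in range(N):
    assert np.isclose(xhat[w], np.mean(samples[w]))
```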

SLIDE 22

Convergence of "local" time-averages

We obtain a correct algorithm:

Theorem (Ergodicity & Convergence)
For the local time-averages x̂,

  lim_{k→+∞} x̂(k) = x⋆  almost surely

  lim_{k→+∞} E[‖x̂(k) − x⋆‖²₂] = 0  and  E[‖x̂(k) − x⋆‖²₂] ≤ C(γ, G)/k

[Figure: oscillating states x(k) and their time-averages, which converge.]

SLIDE 23

Estimation in power systems

SLIDE 24

Power systems estimation

IEEE 14-bus test system
  • x is the state of the network (frequencies at the nodes)
  • linearized model of measurement: z = Hx + ν
  • E[νν⊤] = R, H full rank (observable)
  • measurements can be power flows, power injections, PMU data, ...

Least-squares problem: x⋆ = arg min_x (z − Hx)⊤R⁻¹(z − Hx)
Closed-form solution: x⋆ = L⁻¹u, where L = H⊤R⁻¹H and u = H⊤R⁻¹z

Issues: possibly very large state; computing capacities are not available everywhere
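A numpy sketch of the weighted least-squares estimator; the dimensions and data below are hypothetical stand-ins, not the actual 14-bus model:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical dimensions: n states, m > n measurements, H full column rank,
# R the measurement-noise covariance (here diagonal: independent noises).
n, m = 14, 30
H = rng.normal(size=(m, n))
R = np.diag(rng.uniform(0.5, 2.0, size=m))
x_true = rng.normal(size=n)
z = H @ x_true + rng.multivariate_normal(np.zeros(m), R)

# Weighted least squares: x* = argmin_x (z - Hx)^T R^{-1} (z - Hx)
Rinv = np.linalg.inv(R)
L = H.T @ Rinv @ H               # "L" and "u" as on the slide
u = H.T @ Rinv @ z
x_star = np.linalg.solve(L, u)   # closed form x* = L^{-1} u

# Optimality: the weighted residual is orthogonal to the columns of H.
assert np.allclose(H.T @ Rinv @ (z - H @ x_star), 0, atol=1e-8)
```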

SLIDE 25

Grouping

We divide the nodes into groups endowed with computing power:
  • partition V into N disjoint groups V_i ⊂ V, for i ∈ V̄ = {1, 2, ..., N}
  • decompose the vectors z, x, u and the matrix H into blocks, x = [x₁⊤, ..., x_N⊤]⊤ where x_i ∈ R^{V_i}, etc.
  • define neighborhoods, for each group i ∈ V̄:
      out-neighbors: N_i = {j ∈ V̄ : H_ji ≠ 0}
      in-neighbors: M_i = {j ∈ V̄ : H_ij ≠ 0}

[Figure: IEEE 14-bus system partitioned into Groups 1–4.]

Issue: the resulting graph can be directed (H_ij ≠ H_ji), so we need to design a new ergodic randomized algorithm...

SLIDE 26

A more involved randomized algorithm

Initialization: (x_i, κ_i, x̂_i) = (0, 0, 0) and

  u_i = (H⊤R⁻¹z)_i = Σ_{ℓ∈N_i} H_ℓi⊤ R_ℓ⁻¹ z_ℓ

using the measurements z_ℓ from the out-neighbors.

At each time k ∈ Z≥0:
  • one group j ∈ V̄ is randomly chosen to initiate the updates
  • group j sends its current estimate x_j(k) to its out-neighbors ℓ ∈ N_j
  • the groups ℓ compute y_ℓ^(i,j) := H_ℓi⊤ R_ℓ⁻¹ H_ℓj x_j(k)
  • the in-neighbors i ∈ M_ℓ update

      x_i(k + 1) = x_i(k) − τ Σ_{m ∈ N_i ∩ N_j} y_m^(i,j) + τ u_i,   i ∈ M_ℓ, ℓ ∈ N_j

  • the states of the other groups remain unchanged: x_i(k + 1) = x_i(k) for i ∉ M_ℓ, ℓ ∈ N_j
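The initialization step can be checked on a toy block structure (our hypothetical partition, not the slide's grouping): accumulating H_ℓi⊤ R_ℓ⁻¹ z_ℓ over a group's out-neighbors reproduces the corresponding block of u = H⊤R⁻¹z.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical block structure: 12 measurements in 3 row-groups, 8 states in
# 4 column-groups (the slide's groups V_i); R block-diagonal per row-group.
row_groups = [range(0, 4), range(4, 8), range(8, 12)]
col_groups = [range(0, 2), range(2, 4), range(4, 6), range(6, 8)]
H = rng.normal(size=(12, 8))
R = np.diag(rng.uniform(0.5, 2.0, size=12))
z = rng.normal(size=12)
Rinv = np.linalg.inv(R)

u_global = H.T @ Rinv @ z   # u = H^T R^{-1} z, computed centrally

# Distributed initialization: group i accumulates H_li^T R_l^{-1} z_l only
# over row-groups l with H_li != 0 (its out-neighbors), each using its own
# local measurements z_l.
for i, ci in enumerate(col_groups):
    ci = list(ci)
    u_i = np.zeros(len(ci))
    for l in row_groups:
        l = list(l)
        H_li = H[np.ix_(l, ci)]
        if np.any(H_li != 0):   # l is an out-neighbor of group i
            u_i += H_li.T @ Rinv[np.ix_(l, l)] @ z[l]
    assert np.allclose(u_i, u_global[ci])
```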

SLIDE 27

Simulations

[Figure: IEEE 14-bus system partitioned into Groups 1–4, and the trajectories x_i(k), which converge within 3 × 10⁴ iterations.]

  • P. Frasca, H. Ishii, C. Ravazzi, and R. Tempo. Distributed randomized algorithms for opinion formation, centrality computation and power systems estimation. European Journal of Control, 2015. Submitted

SLIDE 28

Take-home summary & final comments

Estimation from relative measurements is ubiquitous: sensor networks, clock networks, statistical ranking
  • graph theory is useful to formulate the problem
  • the estimation error depends on the measurement graph
  • stopping times of (gradient) algorithms do not depend on the network size
  • the problem can be solved by distributed, asynchronous, randomized algorithms, which also apply to similar least-squares problems, as in power systems

Note: dynamics combining randomization and averaging are also relevant in PageRank computation and in social networks
  • C. Ravazzi, P. Frasca, R. Tempo, and H. Ishii. Ergodic randomized algorithms and dynamics over networks, 2015. To appear in IEEE Transactions on Control of Network Systems. URL: http://arxiv.org/abs/1309.1349