Time Series — Bhiksha Raj, Class 22, 14 Nov 2013 — PowerPoint PPT Presentation


SLIDE 1

Machine Learning for Signal Processing

Prediction and Estimation from Time Series

Bhiksha Raj, Class 22, 14 Nov 2013

14 Nov 2013 11-755/18797 1

SLIDE 2

Administrivia

  • No class on Tuesday.
  • Project Demos: 5th December (Thursday)

– Before exams week

SLIDE 3

An automotive example

  • Determine automatically, by only listening to a running automobile, if it is:

– Idling; or
– Travelling at constant velocity; or
– Accelerating; or
– Decelerating

  • Assume (for illustration) that we only record energy level (SPL) in the sound

– The SPL is measured once per second

SLIDE 4

What we know

  • An automobile that is at rest can accelerate, or continue to stay at rest
  • An accelerating automobile can hit a steady-state velocity, continue to accelerate, or decelerate
  • A decelerating automobile can continue to decelerate, come to rest, cruise, or accelerate
  • An automobile at a steady-state velocity can stay in steady state, accelerate or decelerate

SLIDE 5

What else we know

  • The probability distribution of the SPL of the sound is different in the various conditions

– As shown in figure

  • In reality, depends on the car
  • The distributions for the different conditions overlap

– Simply knowing the current sound level is not enough to know the state of the car

[Figure: overlapping densities P(x|idle), P(x|decel), P(x|cruise), P(x|accel), centred near 45, 60, 65 and 70 dB SPL respectively]

SLIDE 6

The Model!

  • The state-space model

– Assuming all transitions from a state are equally probable

[Figure: four states and their output densities – Idling (P(x|idle), ~45 dB), Accelerating (P(x|accel), ~70 dB), Cruising (P(x|cruise), ~65 dB), Decelerating (P(x|decel), ~60 dB)]

Transition probabilities (row = current state, column = next state):

        I     A     C     D
  I    0.5   0.5    -     -
  A     -    1/3   1/3   1/3
  C     -    1/3   1/3   1/3
  D   0.25  0.25  0.25  0.25
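The model above can be written down directly as data structures. A minimal sketch in NumPy: the transition table is taken from the slide, while the Gaussian emission parameters (means at the figure's 45/70/65/60 dB SPL, a 3 dB standard deviation) are assumptions standing in for the drawn density curves.

```python
# Sketch of the slide's state-space model.
import numpy as np

STATES = ["idle", "accel", "cruise", "decel"]

# T[i, j] = P(next state j | current state i); rows sum to 1
T = np.array([
    [0.50, 0.50, 0.00, 0.00],   # idle  -> idle or accel
    [0.00, 1/3,  1/3,  1/3 ],   # accel -> accel, cruise or decel
    [0.00, 1/3,  1/3,  1/3 ],   # cruise -> accel, cruise or decel
    [0.25, 0.25, 0.25, 0.25],   # decel -> any state
])

EMISSION_MEAN = np.array([45.0, 70.0, 65.0, 60.0])  # dB SPL per state (from figure)
EMISSION_STD = 3.0                                  # assumed, not on the slide

def emission_density(x):
    """P(x | state) for each state under the assumed Gaussian emissions."""
    z = (x - EMISSION_MEAN) / EMISSION_STD
    return np.exp(-0.5 * z ** 2) / (EMISSION_STD * np.sqrt(2 * np.pi))

assert np.allclose(T.sum(axis=1), 1.0)
```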

SLIDE 7

Estimating the state at T = 0-

  • At T=0, before the first observation, we know nothing of the state

– Assume all states are equally likely

  Idling  Accelerating  Cruising  Decelerating
   0.25      0.25         0.25        0.25

SLIDE 8

The first observation

  • At T=0 we observe the sound level x0 = 67 dB SPL

– The observation modifies our belief in the state of the system

  • P(x0|idle) = 0
  • P(x0|deceleration) = 0.0001
  • P(x0|acceleration) = 0.7
  • P(x0|cruising) = 0.5

– Note, these don’t have to sum to 1
– In fact, since these are densities, any of them can be > 1

[Figure: densities P(x|idle), P(x|decel), P(x|cruise), P(x|accel) near 45, 60, 65 and 70 dB SPL]

SLIDE 9

Estimating the state after observing x0

  • P(state | x0) = C P(state) P(x0|state)

– P(idle | x0) = 0
– P(deceleration | x0) = C x 0.000025
– P(cruising | x0) = C x 0.125
– P(acceleration | x0) = C x 0.175

  • Normalizing

– P(idle | x0) = 0
– P(deceleration | x0) = 0.000083
– P(cruising | x0) = 0.42
– P(acceleration | x0) = 0.57

SLIDE 10

Estimating the state at T = 0+

  • At T=0, after the first observation, we must update our belief about the states

– The first observation provided some evidence about the state of the system
– It modifies our belief in the state of the system

  Idling  Accelerating  Cruising  Decelerating
   0.0       0.57         0.42     8.3 x 10-5

SLIDE 11

Predicting the state at T=1

  • Predicting the probability of idling at T=1

– P(idling|idling) = 0.5
– P(idling|deceleration) = 0.25
– P(idling at T=1 | x0) = P(I_T=0 | x0) P(I|I) + P(D_T=0 | x0) P(I|D) = 2.1 x 10-5

  • In general, for any state S

– P(S_T=1 | x0) = Σ_{S_T=0} P(S_T=0 | x0) P(S_T=1 | S_T=0)

[Transition table as on Slide 6; current distribution: Idling 0.0, Accelerating 0.57, Cruising 0.42, Decelerating 8.3 x 10-5]

SLIDE 12

Predicting the state at T = 1

    P(S_T=1 | x0) = Σ_{S_T=0} P(S_T=0 | x0) P(S_T=1 | S_T=0)

                    Idling      Accelerating  Cruising  Decelerating
  Updated, T=0       0.0            0.57        0.42     8.3 x 10-5
  Predicted, T=1   2.1 x 10-5       0.33        0.33        0.33

SLIDE 13

Updating after the observation at T=1

  • At T=1 we observe x1 = 63 dB SPL
  • P(x1|idle) = 0
  • P(x1|deceleration) = 0.2
  • P(x1|acceleration) = 0.001
  • P(x1|cruising) = 0.5

[Figure: densities P(x|idle), P(x|decel), P(x|cruise), P(x|accel)]

SLIDE 14

Update after observing x1

  • P(state | x0:1) = C P(state | x0) P(x1|state)

– P(idle | x0:1) = 0
– P(deceleration | x0:1) = C x 0.066
– P(cruising | x0:1) = C x 0.165
– P(acceleration | x0:1) = C x 0.00033

  • Normalizing

– P(idle | x0:1) = 0
– P(deceleration | x0:1) = 0.285
– P(cruising | x0:1) = 0.713
– P(acceleration | x0:1) = 0.0014

SLIDE 15

Estimating the state at T = 1+

  • The updated probability at T=1 incorporates information from both x0 and x1

– It is NOT a local decision based on x1 alone
– Because of the Markov nature of the process, the state at T=0 affects the state at T=1

  • x0 provides evidence for the state at T=1

  Idling  Accelerating  Cruising  Decelerating
   0.0      0.0014       0.713       0.285

SLIDE 16

Estimating a Unique state

  • What we have estimated is a distribution over the states
  • If we had to guess a state, we would pick the most likely state from the distributions
  • State(T=0) = Accelerating
  • State(T=1) = Cruising

  T=0:  Idling 0.0,  Accelerating 0.57,    Cruising 0.42,   Decelerating 8.3 x 10-5
  T=1:  Idling 0.0,  Accelerating 0.0014,  Cruising 0.713,  Decelerating 0.285

SLIDE 17

Overall procedure

  • At T=0 the predicted state distribution is the initial state probability
  • At each time T, the current estimate of the distribution over states considers all observations x0 ... xT

– A natural outcome of the Markov nature of the model

  • The prediction+update is identical to the forward computation for HMMs to within a normalizing constant

Loop: predict the distribution of the state at T; update the distribution of the state at T after observing xT; T = T+1

  PREDICT:  P(S_T | x0:T-1) = Σ_{S_T-1} P(S_T-1 | x0:T-1) P(S_T | S_T-1)
  UPDATE:   P(S_T | x0:T) = C · P(S_T | x0:T-1) P(xT | S_T)
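The predict+update loop can be run directly on the car example. A sketch using the point likelihood values the slides give for x0 = 67 dB and x1 = 63 dB; it reproduces the posteriors of the earlier slides up to rounding.

```python
# Forward filter (predict + update) for the car example.
import numpy as np

T = np.array([
    [0.50, 0.50, 0.00, 0.00],   # from idle
    [0.00, 1/3,  1/3,  1/3 ],   # from accel
    [0.00, 1/3,  1/3,  1/3 ],   # from cruise
    [0.25, 0.25, 0.25, 0.25],   # from decel
])
# state order: idle, accel, cruise, decel
likelihoods = [
    np.array([0.0, 0.7, 0.5, 0.0001]),   # P(x0 | state), from the slides
    np.array([0.0, 0.001, 0.5, 0.2]),    # P(x1 | state)
]

belief = np.full(4, 0.25)                # uniform prior at T = 0-
for like in likelihoods:
    # UPDATE: P(S_T | x_0:T) = C . P(S_T | x_0:T-1) P(x_T | S_T)
    belief = belief * like
    belief /= belief.sum()
    posterior = belief.copy()
    # PREDICT: P(S_T+1 | x_0:T) = sum_S P(S | x_0:T) P(S_T+1 | S)
    belief = T.T @ belief

# posterior now holds P(state | x0:1): approximately (0, 0.0014, 0.713, 0.285)
```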

SLIDE 18

Comparison to Forward Algorithm

  • Forward Algorithm:

– P(x0:T, S_T) = P(xT | S_T) Σ_{S_T-1} P(x0:T-1, S_T-1) P(S_T | S_T-1)

  • Normalized:

– P(S_T | x0:T) = (Σ_{S'_T} P(x0:T, S'_T))^-1 P(x0:T, S_T) = C P(x0:T, S_T)

  PREDICT:  P(S_T | x0:T-1) = Σ_{S_T-1} P(S_T-1 | x0:T-1) P(S_T | S_T-1)
  UPDATE:   P(S_T | x0:T) = C · P(S_T | x0:T-1) P(xT | S_T)

SLIDE 19

Decomposing the forward algorithm

    P(x0:T, S_T) = P(xT | S_T) Σ_{S_T-1} P(x0:T-1, S_T-1) P(S_T | S_T-1)

  • Predict:

    P(x0:T-1, S_T) = Σ_{S_T-1} P(x0:T-1, S_T-1) P(S_T | S_T-1)

  • Update:

    P(x0:T, S_T) = P(xT | S_T) P(x0:T-1, S_T)
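One way to check the equivalence: run the unnormalized forward recursion and the normalized predict/update filter side by side and confirm they agree up to a constant. A sketch with a randomly generated model (numbers are arbitrary, not from the slides).

```python
# The normalized forward variable equals the predict/update posterior.
import numpy as np

rng = np.random.default_rng(0)
n = 4
T = rng.random((n, n))
T /= T.sum(axis=1, keepdims=True)      # random row-stochastic transition matrix
prior = np.full(n, 1.0 / n)
likes = rng.random((5, n))             # arbitrary P(x_t | s) values

alpha = prior * likes[0]               # forward algorithm: P(x_0, S_0)
belief = alpha / alpha.sum()           # filter: P(S_0 | x_0)
for t in range(1, 5):
    alpha = likes[t] * (T.T @ alpha)   # P(x_0:t, S_t)
    belief = likes[t] * (T.T @ belief) # predict, then update...
    belief /= belief.sum()             # ...and normalize

assert np.allclose(alpha / alpha.sum(), belief)  # same distribution up to C
```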

SLIDE 20

Estimating the state

  • The state is estimated from the updated distribution

– The updated distribution is propagated into time, not the state

    Estimate(S_T) = argmax_{S_T} P(S_T | x0:T)
    PREDICT:  P(S_T | x0:T-1) = Σ_{S_T-1} P(S_T-1 | x0:T-1) P(S_T | S_T-1)
    UPDATE:   P(S_T | x0:T) = C · P(S_T | x0:T-1) P(xT | S_T)

SLIDE 21

Predicting the next observation

  • The probability distribution for the observations at the next time is a mixture:

– P(xT | x0:T-1) = Σ_{S_T} P(xT | S_T) P(S_T | x0:T-1)

  • The actual observation can be predicted from P(xT | x0:T-1)

SLIDE 22

Predicting the next observation

  • MAP estimate:

– argmax_{xT} P(xT | x0:T-1)

  • MMSE estimate:

– E[xT | x0:T-1]
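A sketch of the MMSE predictor for the car example. The per-state means (45/70/65/60 dB) are an assumption standing in for the slide's density curves; under any emission model, the MMSE estimate is the mean of the mixture, which reduces to a probability-weighted average of the per-state means.

```python
# MMSE prediction of the next observation as the mixture mean.
import numpy as np

state_mean = np.array([45.0, 70.0, 65.0, 60.0])    # idle, accel, cruise, decel
predicted = np.array([2.1e-5, 0.33, 0.33, 0.33])   # P(S_T=1 | x0), from Slide 12
predicted = predicted / predicted.sum()

# E[x_T | x_0:T-1] = sum_S P(S | x_0:T-1) * E[x | S]
x_mmse = float(predicted @ state_mean)             # close to (70 + 65 + 60) / 3
```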

SLIDE 23

Difference from Viterbi decoding

  • Estimating only the current state at any time

– Not the state sequence
– Although we are considering all past observations

  • The most likely state at T and T+1 may be such that there is no valid transition between ST and ST+1

SLIDE 24

A known state model

  • HMM assumes a very coarsely quantized state space

– Idling / accelerating / cruising / decelerating

  • Actual state can be finer

– Idling, accelerating at various rates, decelerating at various rates, cruising at various speeds

  • Solution: Many more states (one for each acceleration/deceleration rate, cruising speed)?
  • Solution: A continuous-valued state

SLIDE 25

The real-valued state model

  • A state equation describing the dynamics of the system

    s_t = f(s_{t-1}, e_t)

– s_t is the state of the system at time t
– e_t is a driving function, which is assumed to be random

  • The state of the system at any time depends only on the state at the previous time instant and the driving term at the current time
  • An observation equation relating state to observation

    o_t = g(s_t, γ_t)

– o_t is the observation at time t
– γ_t is the noise affecting the observation (also random)

  • The observation at any time depends only on the current state of the system and the noise

SLIDE 26

Continuous state system

  • The state is a continuous-valued parameter that is not directly seen

– The state is the position of navlab or the star

  • The observations are dependent on the state and are the only way of knowing about the state

– Sensor readings (for navlab) or recorded image (for the telescope)

    s_t = f(s_{t-1}, e_t)
    o_t = g(s_t, γ_t)

SLIDE 27

Statistical Prediction and Estimation

  • Given an a priori probability distribution for the state

– P0(s): Our belief in the state of the system before we observe any data

  • Probability of state of navlab
  • Probability of state of stars
  • Given a sequence of observations o0..ot
  • Estimate state at time t

SLIDE 28

Prediction and update at t = 0

  • Prediction

– Initial probability distribution for state
– P(s0) = P0(s0)

  • Update:

– Then we observe o0
– We must update our belief in the state

  • P(s0|o0) = C · P0(s0) P(o0|s0)

    P(s0|o0) = P0(s0) P(o0|s0) / ∫ P0(s) P(o0|s) ds

SLIDE 29

The observation probability: P(o|s)

    o_t = g(s_t, γ_t)

  • This is a (possibly many-to-one) stochastic function of state s_t and noise γ_t

– Noise γ_t is random. Assume it is the same dimensionality as o_t

  • Let P_γ(γ_t) be the probability distribution of γ_t
  • Let {γ : g(s_t, γ) = o_t} be the set of all γ that result in o_t

    P(o_t | s_t) = Σ_{γ : g(s_t, γ) = o_t} P_γ(γ) / |J_g(s_t,·)(γ)|

SLIDE 30

The observation probability

  • P(o|s):

    P(o_t | s_t) = Σ_{γ : g(s_t, γ) = o_t} P_γ(γ) / |J_g(s_t,·)(γ)|

  • The J is a Jacobian
  • For scalar functions of scalar variables, it is simply a derivative:

    |J_g(s_t,·)(γ_t)| = |dg(s_t, γ)/dγ| at γ = γ_t

  • In general it is the determinant of the matrix of partial derivatives:

    J_g(s_t,·)(γ_t) = det [ ∂g(1)/∂γ(1) ... ∂g(1)/∂γ(n) ; ... ; ∂g(n)/∂γ(1) ... ∂g(n)/∂γ(n) ]
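The change-of-variables rule above can be checked on a scalar example. Here the observation function is a hypothetical g(s, γ) = s + 2γ with γ ~ N(0, 1), chosen because the formula must then reproduce the known closed form o | s ~ N(s, 4).

```python
# Scalar change-of-variables check for P(o|s).
import math

def pdf_normal(x, mu=0.0, var=1.0):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

# g(s, gamma) = s + 2*gamma  =>  gamma = (o - s) / 2,  |dg/dgamma| = 2
def p_obs(o, s):
    gamma = (o - s) / 2.0
    return pdf_normal(gamma) / 2.0      # P_gamma(gamma) divided by the Jacobian

# Must agree with the closed form o | s ~ N(s, 4)
o, s = 3.7, 1.2
assert abs(p_obs(o, s) - pdf_normal(o, mu=s, var=4.0)) < 1e-12
```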

SLIDE 31

Predicting the next state

  • Given P(s0|o0), what is the probability of the state at t=1

    P(s1|o0) = ∫ P(s1, s0|o0) ds0 = ∫ P(s0|o0) P(s1|s0) ds0

  • State progression function:

    s_t = f(s_{t-1}, e_t)

– e_t is a driving term with probability distribution P_e(e_t)

  • P(s_t|s_{t-1}) can be computed similarly to P(o|s)

– P(s1|s0) is an instance of this

SLIDE 32

And moving on

  • P(s1|o0) is the predicted state distribution for t=1
  • Then we observe o1

– We must update the probability distribution for s1
– P(s1|o0:1) = C P(s1|o0) P(o1|s1)

  • We can continue on

SLIDE 33

Discrete vs. Continuous state systems

  Prediction at time 0:   P(s0) = π(s0)
  Update after O0:        P(s0 | O0) = C π(s0) P(O0 | s0)
  Prediction at time 1:
      discrete:    P(s1 | O0) = Σ_s P(s | O0) P(s1 | s)
      continuous:  P(s1 | O0) = ∫ P(s | O0) P(s1 | s) ds
  Update after O1:        P(s1 | O0:1) = C P(s1 | O0) P(O1 | s1)

[Figure: a discrete distribution over states 1, 2, 3 vs a continuous density P(s); the continuous model is s_t = f(s_{t-1}, e_t), o_t = g(s_t, γ_t)]

SLIDE 34

Discrete vs. Continuous State Systems

  Prediction at time t:
      discrete:    P(s_t | O0:t-1) = Σ_{s_t-1} P(s_t-1 | O0:t-1) P(s_t | s_t-1)
      continuous:  P(s_t | O0:t-1) = ∫ P(s_t-1 | O0:t-1) P(s_t | s_t-1) ds_t-1

  Update after Ot:
      P(s_t | O0:t) = C P(s_t | O0:t-1) P(Ot | s_t)

    s_t = f(s_{t-1}, e_t)
    o_t = g(s_t, γ_t)

SLIDE 35

Discrete vs. Continuous State Systems

  Parameters             Discrete                          Continuous
  Initial state prob.    π                                 P(s)
  Transition prob.       T_ij = P(s_t = j | s_t-1 = i)     P(s_t | s_t-1), from s_t = f(s_{t-1}, e_t)
  Observation prob.      P(O | s)                          P(O | s), from o_t = g(s_t, γ_t)

SLIDE 36

Special case: Linear Gaussian model

  • A linear state dynamics equation

    s_t = A_t s_{t-1} + e_t
    P(e) = (2π)^{-d/2} |Θ_e|^{-1/2} exp( -0.5 (e - μ_e)^T Θ_e^{-1} (e - μ_e) )

– Probability of state driving term e is Gaussian
– Sometimes viewed as a driving term μ_e and additive zero-mean noise

  • A linear observation equation

    o_t = B_t s_t + γ_t
    P(γ) = (2π)^{-d/2} |Θ_γ|^{-1/2} exp( -0.5 (γ - μ_γ)^T Θ_γ^{-1} (γ - μ_γ) )

– Probability of observation noise γ is Gaussian

  • A_t, B_t and Gaussian parameters assumed known

– May vary with time
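A minimal simulation sketch of such a linear Gaussian model, with assumed A, B and covariances (a position/velocity system observed through position only; none of these numbers come from the slides).

```python
# Simulate s_t = A s_{t-1} + e_t,  o_t = B s_t + gamma_t  (zero-mean noises).
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # position-velocity dynamics (assumed)
B = np.array([[1.0, 0.0]])               # observe position only (assumed)
Theta_e = 0.01 * np.eye(2)               # driving-term covariance
Theta_g = np.array([[0.5]])              # observation-noise covariance

s = np.zeros(2)
obs = []
for _ in range(100):
    s = A @ s + rng.multivariate_normal(np.zeros(2), Theta_e)
    obs.append(B @ s + rng.multivariate_normal(np.zeros(1), Theta_g))
obs = np.array(obs)                      # 100 noisy scalar observations
```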

SLIDE 37

The initial state probability

  • We also assume the initial state distribution to be Gaussian

    P(s) = Gaussian(s; s̄, R) = (2π)^{-d/2} |R|^{-1/2} exp( -0.5 (s - s̄)^T R^{-1} (s - s̄) )

– Often assumed zero mean

    s_t = A_t s_{t-1} + e_t
    o_t = B_t s_t + γ_t

SLIDE 38

The observation probability

  • The probability of the observation, given the state, is simply the probability of the noise, with the mean shifted

– Since the only uncertainty is from the noise

  • The new mean is the mean of the distribution of the noise + the value of the observation in the absence of noise

    o_t = B_t s_t + γ_t,   P(γ) = Gaussian(γ; μ_γ, Θ_γ)
    P(o_t | s_t) = Gaussian(o_t; B_t s_t + μ_γ, Θ_γ)
SLIDE 39

The updated state probability at T=0

  • P(s0|o0) = C P(s0) P(o0|s0)

    P(s0) = Gaussian(s0; s̄, R)
    P(o0|s0) = Gaussian(o0; B s0 + μ_γ, Θ_γ)
    P(s0|o0) = C · Gaussian(s0; s̄, R) · Gaussian(o0; B s0 + μ_γ, Θ_γ)

SLIDE 40

Note 1: product of two Gaussians

  • The product of two Gaussians is a Gaussian

    Gaussian(s; s̄, R) · Gaussian(o; Bs + μ, Θ)
      = C1 exp( -0.5 (s - s̄)^T R^{-1} (s - s̄) ) · C2 exp( -0.5 (o - Bs - μ)^T Θ^{-1} (o - Bs - μ) )
      = C · Gaussian( s; (R^{-1} + B^T Θ^{-1} B)^{-1} (R^{-1} s̄ + B^T Θ^{-1} (o - μ)),  (R^{-1} + B^T Θ^{-1} B)^{-1} )

Not a good estimate --
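The product formula can be verified numerically in one dimension: the product of the two densities, divided by the claimed posterior Gaussian, must be the same constant at every s. A sketch with arbitrary assumed parameters.

```python
# 1-D check of the product-of-two-Gaussians identity.
import math

def gpdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

sbar, R = 1.0, 2.0              # prior Gaussian over s
B, mu, Theta = 1.5, 0.3, 0.8    # observation model o = B s + mu + noise
o = 2.0

V = 1.0 / (1.0 / R + B * B / Theta)          # (R^-1 + B^T Theta^-1 B)^-1
m = V * (sbar / R + B * (o - mu) / Theta)    # posterior mean

# ratio product / Gaussian(s; m, V) must not depend on s
ratios = [gpdf(s, sbar, R) * gpdf(o, B * s + mu, Theta) / gpdf(s, m, V)
          for s in (-1.0, 0.0, 0.7, 2.5)]
assert max(ratios) - min(ratios) < 1e-9
```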

SLIDE 41

The updated state probability at T=0

  • P(s0|o0) = C P(s0) P(o0|s0)

    P(s0) = Gaussian(s0; s̄, R)
    P(o0|s0) = Gaussian(o0; B s0 + μ_γ, Θ_γ)

    P(s0|o0) = Gaussian( s0; (R^{-1} + B^T Θ_γ^{-1} B)^{-1} (R^{-1} s̄ + B^T Θ_γ^{-1} (o0 - μ_γ)),  (R^{-1} + B^T Θ_γ^{-1} B)^{-1} )
             = Gaussian(s0; ŝ0, R̂0)

SLIDE 42

The state transition probability

  • The probability of the state at time t, given the state at time t-1, is simply the probability of the driving term, with the mean shifted

    s_t = A_t s_{t-1} + e_t,   P(e) = Gaussian(e; μ_e, Θ_e)
    P(s_t | s_{t-1}) = Gaussian(s_t; A_t s_{t-1} + μ_e, Θ_e)

SLIDE 43

Note 2: integral of product of two Gaussians

  • The integral of the product of two Gaussians is a Gaussian

    ∫ Gaussian(x; μ_x, Θ_x) · Gaussian(y; Ax + b, Θ_y) dx = Gaussian(y; A μ_x + b, Θ_y + A Θ_x A^T)

SLIDE 44

Note 2: integral of product of two Gaussians

  • P(y), the integral of the product of two Gaussians, is a Gaussian

    If x ~ N(μ_x, Θ_x) and e ~ N(b, Θ_y), with y = Ax + e:

    P(y) = ∫ P(y, x) dx = ∫ Gaussian(x; μ_x, Θ_x) Gaussian(y; Ax + b, Θ_y) dx
         = N(y; A μ_x + b, Θ_y + A Θ_x A^T)
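The identity can be checked by direct numerical integration in one dimension; a sketch with arbitrary assumed parameters.

```python
# Numerically integrate Gaussian(x; mu_x, var_x) * Gaussian(y; a*x + b, var_y) dx
# and compare with the closed form N(y; a*mu_x + b, var_y + a^2 * var_x).
import math

def gpdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

mu_x, var_x = 0.5, 1.2
a, b, var_y = 2.0, -1.0, 0.7
y = 1.3

dx, lo, hi = 0.001, -20.0, 20.0          # fine Riemann sum over a wide range
total = sum(gpdf(x, mu_x, var_x) * gpdf(y, a * x + b, var_y) * dx
            for x in (lo + i * dx for i in range(int((hi - lo) / dx))))

closed_form = gpdf(y, a * mu_x + b, var_y + a * a * var_x)
assert abs(total - closed_form) < 1e-6
```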

SLIDE 45

The predicted state probability at t=1

  • Remains Gaussian

    P(s1|o0) = ∫ P(s1, s0|o0) ds0 = ∫ P(s0|o0) P(s1|s0) ds0

    P(s0|o0) = Gaussian(s0; ŝ0, R̂0)
    P(s1|s0) = Gaussian(s1; A1 s0 + μ_e, Θ_e)

    P(s1|o0) = ∫ Gaussian(s0; ŝ0, R̂0) Gaussian(s1; A1 s0 + μ_e, Θ_e) ds0
             = Gaussian(s1; A1 ŝ0 + μ_e, Θ_e + A1 R̂0 A1^T)

    s_t = A_t s_{t-1} + e_t

SLIDE 46

The updated state probability at T=1

  • P(s1 | o0:1) = C P(s1 | o0) P(o1 | s1)

    P(s1 | o0) = Gaussian(s1; A1 ŝ0 + μ_e, Θ_e + A1 R̂0 A1^T)
    P(o1 | s1) = Gaussian(o1; B1 s1 + μ_γ, Θ_γ)

    P(s1 | o0:1) = Gaussian(s1; ŝ1, R̂1)

SLIDE 47

The Kalman Filter!

  • Prediction at T

    P(s_t | o0:t-1) = Gaussian(s_t; s̄_t, R_t)
    s̄_t = A_t ŝ_{t-1} + μ_e,   R_t = Θ_e + A_t R̂_{t-1} A_t^T

  • Update at T

    P(s_t | o0:t) = Gaussian(s_t; ŝ_t, R̂_t)
    R̂_t = (R_t^{-1} + B_t^T Θ_γ^{-1} B_t)^{-1}
    ŝ_t = R̂_t (R_t^{-1} s̄_t + B_t^T Θ_γ^{-1} (o_t - μ_γ))

    s_t = A_t s_{t-1} + e_t
    o_t = B_t s_t + γ_t

SLIDE 48

Linear Gaussian Model

  Prediction at time 0:  P(s0) = P(s)
  Update after O0:       P(s0 | O0) = C P(s0) P(O0 | s0)
  Prediction at time 1:  P(s1 | O0) = ∫ P(s0 | O0) P(s1 | s0) ds0
  Update after O1:       P(s1 | O0:1) = C P(s1 | O0) P(O1 | s1)
  Prediction at time 2:  P(s2 | O0:1) = ∫ P(s1 | O0:1) P(s2 | s1) ds1
  Update after O2:       P(s2 | O0:2) = C P(s2 | O0:1) P(O2 | s2)

  All distributions remain Gaussian

  a priori: P(s)     Transition prob.: P(s_t | s_t-1)     State output prob.: P(O_t | s_t)

    s_t = A_t s_{t-1} + e_t
    o_t = B_t s_t + γ_t

SLIDE 49

The Kalman filter

  • The actual state estimate is the mean of the updated distribution
  • Predicted state at time t

    s̄_t = mean[ P(s_t | o0:t-1) ] = A_t ŝ_{t-1} + μ_e

  • Updated estimate of state at time t

    ŝ_t = mean[ P(s_t | o0:t) ] = (R_t^{-1} + B_t^T Θ_γ^{-1} B_t)^{-1} (R_t^{-1} s̄_t + B_t^T Θ_γ^{-1} (o_t - μ_γ))
slide-50
SLIDE 50

Stable Estimation

  • The above equation fails if there is no
  • bservation noise

– g = 0 – Paradoxical? – Happens because we do not use the relationship between o and s effectively

  • Alternate derivation required

– Conventional Kalman filter formulation

14 Nov 2013 11-755/18797 50

   

) ( )] | ( [ ˆ

1 1 1 1 1 : g g g

m       

     t T t t t t T t t t t t

  • B

s R B B R

  • s

P mean s

SLIDE 51

Conditional Probability of y|x

  • If P(x,y) is Gaussian:

    P([x; y]) = N( [x; y]; [μ_x; μ_y], [C_xx C_xy; C_yx C_yy] )

  • The conditional probability of y given x is also Gaussian

– The slice in the figure is Gaussian

    P(y|x) = N( y; μ_y + C_yx C_xx^{-1} (x - μ_x),  C_yy - C_yx C_xx^{-1} C_xy )

  • The mean of this Gaussian is a function of x
  • The variance of y reduces if x is known

– Uncertainty is reduced

SLIDE 52

A matrix inverse identity

    (A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}

– Work it out..
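A quick numerical check of the identity, here with A and C symmetric positive definite as in the Kalman use case (where A plays the role of R and C of Θ_γ):

```python
# Verify (A^-1 + B^T C^-1 B)^-1 B^T C^-1 == A B^T (B A B^T + C)^-1.
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((3, 3)); A = A @ A.T + 3 * np.eye(3)   # s.p.d., 3x3
C = rng.random((2, 2)); C = C @ C.T + 3 * np.eye(2)   # s.p.d., 2x2
B = rng.random((2, 3))

inv = np.linalg.inv
lhs = inv(inv(A) + B.T @ inv(C) @ B) @ B.T @ inv(C)
rhs = A @ B.T @ inv(B @ A @ B.T + C)
assert np.allclose(lhs, rhs)
```

Note that the right-hand side never inverts C on its own, which is exactly what the stable Kalman formulation exploits later.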

SLIDE 53

For any jointly Gaussian RV

  • Using the Matrix Inversion Identity

    Z = [X; Y],   μ_Z = [μ_X; μ_Y],   C_Z = [C_XX C_XY; C_XY^T C_YY]

    With D = C_YY - C_XY^T C_XX^{-1} C_XY:

    C_Z^{-1} = [ C_XX^{-1} + C_XX^{-1} C_XY D^{-1} C_XY^T C_XX^{-1},   -C_XX^{-1} C_XY D^{-1} ;
                 -D^{-1} C_XY^T C_XX^{-1},                             D^{-1} ]

SLIDE 54

For any jointly Gaussian RV

  • Using the Matrix Inversion Identity

    P(Z) = Const · exp( -0.5 (Z - μ_Z)^T C_Z^{-1} (Z - μ_Z) )

    (Z - μ_Z)^T C_Z^{-1} (Z - μ_Z)
      = (Y - μ_Y - C_YX C_XX^{-1}(X - μ_X))^T (C_YY - C_XY^T C_XX^{-1} C_XY)^{-1} (Y - μ_Y - C_YX C_XX^{-1}(X - μ_X)) + Quadratic(X)

SLIDE 55

For any jointly Gaussian RV

  • The conditional of Y is a Gaussian

    P(X, Y) = Const · exp( -0.5 (Z - μ_Z)^T C_Z^{-1} (Z - μ_Z) )
            = K exp( -0.5 (Y - μ_Y - C_YX C_XX^{-1}(X - μ_X))^T (C_YY - C_XY^T C_XX^{-1} C_XY)^{-1} (Y - μ_Y - C_YX C_XX^{-1}(X - μ_X)) ) · exp( -0.5 Quadratic(X) )

    P(Y|X) = Gaussian( Y; μ_Y + C_YX C_XX^{-1}(X - μ_X),  C_YY - C_XY^T C_XX^{-1} C_XY )

SLIDE 56

Conditional Probability of y|x

  • If P(x,y) is Gaussian:

    P([x; y]) = N( [x; y]; [μ_x; μ_y], [C_xx C_xy; C_yx C_yy] )

  • The conditional probability of y given x is also Gaussian

– The slice in the figure is Gaussian

    P(y|x) = N( y; μ_y + C_yx C_xx^{-1} (x - μ_x),  C_yy - C_yx C_xx^{-1} C_xy )

  • The mean of this Gaussian is a function of x
  • The variance of y reduces if x is known

– Uncertainty is reduced
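A sampling sketch of the conditional formulas: draw from a joint Gaussian with assumed parameters, keep the "slice" of samples with x near a chosen x0, and compare their mean and spread to the formula. Tolerances are loose because this is Monte Carlo.

```python
# Empirical check of the conditional-Gaussian mean and variance.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])              # [mu_x, mu_y], assumed
C = np.array([[2.0, 0.8],
              [0.8, 1.5]])              # [[Cxx, Cxy], [Cyx, Cyy]], assumed

samples = rng.multivariate_normal(mu, C, size=200_000)
x, y = samples[:, 0], samples[:, 1]

x0 = 1.5
sel = np.abs(x - x0) < 0.05             # the "slice" of the joint near x = x0
cond_mean = mu[1] + C[1, 0] / C[0, 0] * (x0 - mu[0])
cond_var = C[1, 1] - C[1, 0] * C[0, 1] / C[0, 0]

assert abs(y[sel].mean() - cond_mean) < 0.1   # Monte Carlo, loose tolerance
assert y[sel].var() < np.var(y)               # knowing x reduces the variance of y
```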

SLIDE 57

Estimating P(s|o)

  • Consider the joint distribution of o and s

    P(s | o0:t-1) = Gaussian(s; s̄, R)      (dropping subscript t and o0:t-1 for brevity)
    o = B s + γ,   P_γ(γ) = (2π)^{-d/2} |Θ_γ|^{-1/2} exp( -0.5 γ^T Θ_γ^{-1} γ )   (assuming γ is 0 mean)

  • Consider the extended vector [o; s]
  • o is a linear function of s; hence o is also Gaussian

    P(o) = Gaussian(o; μ_O, Θ_O)

SLIDE 58

The joint PDF of o and s

  • o is Gaussian. Its cross covariance with s:

    P(s | o0:t-1) = Gaussian(s; s̄, R),   o = B s + γ,   P(γ) = Gaussian(γ; 0, Θ_γ)

    C_{o,o} = B R B^T + Θ_γ
    C_{o,s} = B R

    P(o | o0:t-1) = Gaussian(o; B s̄, B R B^T + Θ_γ)

SLIDE 59

The probability distribution of O

    O = B s + γ
    P(s) = Gaussian(s; s̄, R),   P(γ) = Gaussian(γ; 0, Θ_γ)

    μ_O = E[O] = E[B s + γ] = B E[s] + E[γ] = B s̄

    P(O) = Gaussian(O; μ_O, Θ_O)

SLIDE 60

The probability distribution of O

    P(O) = Gaussian(O; μ_O, Θ_O),   μ_O = B s̄

    Covariances of the extended vector [O; s]:
    C_{O,O} = B R B^T + Θ_γ,   C_{O,s} = B R,   C_{s,O} = R B^T,   C_{s,s} = R

SLIDE 61

The probability distribution of O

    P(O) = Gaussian(O; μ_O, Θ_O)
    μ_O = B s̄,   Θ_O = B R B^T + Θ_γ
SLIDE 62

The probability distribution of O

  • Writing it out in extended form

    P(O, s | o0:t-1) = Gaussian( [O; s]; [B s̄; s̄], [B R B^T + Θ_γ, B R; R B^T, R] )
      = C exp( -0.5 ([O; s] - [B s̄; s̄])^T [B R B^T + Θ_γ, B R; R B^T, R]^{-1} ([O; s] - [B s̄; s̄]) )

SLIDE 63

Recall: For any jointly Gaussian RV

    P(Y|X) = Gaussian( Y; μ_Y + C_YX C_XX^{-1}(X - μ_X),  C_YY - C_XY^T C_XX^{-1} C_XY )

  • Applying it to our problem (replace Y by s, X by o):

    C_{o,o} = B R B^T + Θ_γ,   C_{s,o} = R B^T

    P(s | o0:t) = Gaussian(s; m, Θ)
    m = (I - R B^T (B R B^T + Θ_γ)^{-1} B) s̄ + R B^T (B R B^T + Θ_γ)^{-1} (o - μ_γ)
    Θ = R - R B^T (B R B^T + Θ_γ)^{-1} B R

SLIDE 64

Stable Estimation

  • Note that we are not computing Θ_γ^{-1} in this formulation

    P(s_t | o0:t) = Gaussian(s_t; m_{t|0:t}, Θ_{t|0:t})
    m_{t|0:t} = (I - R B^T (B R B^T + Θ_γ)^{-1} B) s̄_t + R B^T (B R B^T + Θ_γ)^{-1} (o_t - μ_γ)
    Θ_{t|0:t} = R - R B^T (B R B^T + Θ_γ)^{-1} B R

SLIDE 65

The Kalman filter

  • The actual state estimate is the mean of the updated distribution
  • Predicted state at time t

    s̄_t = mean[ P(s_t | o0:t-1) ] = A_t ŝ_{t-1} + μ_e

  • Updated estimate of state at time t

    ŝ_t = mean[ P(s_t | o0:t) ]
        = (I - R_t B_t^T (B_t R_t B_t^T + Θ_γ)^{-1} B_t) s̄_t + R_t B_t^T (B_t R_t B_t^T + Θ_γ)^{-1} (o_t - μ_γ)

    s_t = A_t s_{t-1} + e_t
    o_t = B_t s_t + γ_t

SLIDE 66

The Kalman filter

  • Prediction

    s̄_t = A_t ŝ_{t-1} + μ_e
    R_t = Θ_e + A_t R̂_{t-1} A_t^T

  • Update

    ŝ_t = (I - R_t B_t^T (B_t R_t B_t^T + Θ_γ)^{-1} B_t) s̄_t + R_t B_t^T (B_t R_t B_t^T + Θ_γ)^{-1} (o_t - μ_γ)
    R̂_t = R_t - R_t B_t^T (B_t R_t B_t^T + Θ_γ)^{-1} B_t R_t

slide-67
SLIDE 67

The Kalman filter

  • Prediction
  • Update

14 Nov 2013 11-755/18797 67

e

m  

1

ˆt

t t

s A s

 

t t t t

R B K I R   ˆ

T t t t t

A R A R

1

ˆ

  

e

 

t t t t t t

s B

  • K

s s    ˆ

 

1 

  

g T t t t T t t t

B R B B R K

t t t t

s A s e  

1 t t t t

s B

  • g
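The boxed recursions can be implemented directly. A sketch with assumed toy parameters (position/velocity state, observations of position only) that also checks the gain-form covariance update against the information form of the earlier slides.

```python
# Kalman filter in the gain form: predict, then update with gain K.
import numpy as np

def kalman_step(s_hat, R_hat, o, A, B, Theta_e, Theta_g, mu_e, mu_g):
    """One predict + update step."""
    # Prediction
    s_bar = A @ s_hat + mu_e
    R = Theta_e + A @ R_hat @ A.T
    # Update
    K = R @ B.T @ np.linalg.inv(B @ R @ B.T + Theta_g)   # Kalman gain
    s_new = s_bar + K @ (o - B @ s_bar - mu_g)
    R_new = (np.eye(len(s_hat)) - K @ B) @ R
    return s_new, R_new

# toy position/velocity tracker (assumed parameters)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0]])
Theta_e = 0.01 * np.eye(2)
Theta_g = np.array([[0.25]])
mu_e, mu_g = np.zeros(2), np.zeros(1)

s_hat, R_hat = np.zeros(2), np.eye(2)
for o in [0.9, 2.1, 2.9, 4.2]:           # noisy position readings ~1 apart
    s_hat, R_hat = kalman_step(s_hat, R_hat, np.array([o]), A, B,
                               Theta_e, Theta_g, mu_e, mu_g)

# gain-form covariance update agrees with the information form
R_chk = Theta_e + A @ np.eye(2) @ A.T
K = R_chk @ B.T @ np.linalg.inv(B @ R_chk @ B.T + Theta_g)
info = np.linalg.inv(np.linalg.inv(R_chk) + B.T @ np.linalg.inv(Theta_g) @ B)
assert np.allclose((np.eye(2) - K @ B) @ R_chk, info)
```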

 

SLIDE 68

The Kalman Filter

  • Very popular for tracking the state of processes

– Control systems
– Robotic tracking

  • Simultaneous localization and mapping

– Radars
– Even the stock market..

  • What are the parameters of the process?

SLIDE 69

Kalman filter contd.

  • Model parameters A and B must be known

– Often the state equation includes an additional driving term: s_t = A_t s_{t-1} + G_t u_t + e_t
– The parameters of the driving term must be known

  • The initial state distribution must be known

    s_t = A_t s_{t-1} + e_t
    o_t = B_t s_t + γ_t

SLIDE 70

Defining the parameters

  • The state must be carefully defined

– E.g. for a robotic vehicle, the state is an extended vector that includes the current velocity and acceleration

  • S = [X, dX, d2X]
  • State equation: must incorporate appropriate constraints

– If state includes acceleration and velocity, velocity at next time = current velocity + acc. * time step
– S_t = A S_{t-1} + e

  • A = [1 t 0.5t^2; 0 1 t; 0 0 1]
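The constant-acceleration state equation above can be sanity-checked directly; dt here is an assumed numeric time step (the slide writes it simply as t).

```python
# Constant-acceleration transition matrix for S = [X, dX, d2X].
import numpy as np

dt = 0.1
A = np.array([[1.0, dt, 0.5 * dt * dt],   # X    <- X + dX*dt + 0.5*d2X*dt^2
              [0.0, 1.0, dt],             # dX   <- dX + d2X*dt
              [0.0, 0.0, 1.0]])           # d2X  <- d2X

S = np.array([0.0, 2.0, 1.0])             # position 0, velocity 2, acceleration 1
S1 = A @ S

# velocity at next time = current velocity + acceleration * time step
assert abs(S1[1] - (2.0 + 1.0 * dt)) < 1e-12
```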

SLIDE 71

Parameters

  • Observation equation:

– Critical to have an accurate observation equation
– Must provide a valid relationship between state and observations

  • Observations typically high-dimensional

– May have higher or lower dimensionality than state

SLIDE 72

Problems

  • f() and/or g() may not be nice linear functions

– Conventional Kalman update rules are no longer valid

  • e and/or γ may not be Gaussian

– Gaussian-based update rules no longer valid

    s_t = f(s_{t-1}, e_t)
    o_t = g(s_t, γ_t)

SLIDE 73

Solutions

  • f() and/or g() may not be nice linear functions

– Conventional Kalman update rules are no longer valid
– Extended Kalman Filter

  • e and/or γ may not be Gaussian

– Gaussian-based update rules no longer valid
– Particle Filters

    s_t = f(s_{t-1}, e_t)
    o_t = g(s_t, γ_t)