

  1. Machine Learning for Signal Processing: Predicting and Estimation from Time Series. Bhiksha Raj, 25 Nov 2014. 11-755/18797

  2. Administrivia
  • Final class on Tuesday the 2nd
  • Project Demos: 4th December (Thursday)
    – Before exams week
  • Problem: How to set up posters for SV students?
    – Find a representative here?

  3. An automotive example
  • Determine automatically, by only listening to a running automobile, if it is:
    – Idling; or
    – Travelling at constant velocity; or
    – Accelerating; or
    – Decelerating
  • Assume (for illustration) that we only record the energy level (SPL) of the sound
    – The SPL is measured once per second

  4. What we know
  • An automobile that is at rest can accelerate, or continue to stay at rest
  • An accelerating automobile can hit a steady-state velocity, continue to accelerate, or decelerate
  • A decelerating automobile can continue to decelerate, come to rest, cruise, or accelerate
  • An automobile at a steady-state velocity can stay in steady state, accelerate, or decelerate

  5. What else we know
  [Figure: emission densities P(x|idle), P(x|decel), P(x|cruise), P(x|accel), peaking near 45, 60, 65, and 70 dB SPL respectively]
  • The probability distribution of the SPL of the sound is different in the various conditions
    – As shown in the figure
    – In reality, it depends on the car
  • The distributions for the different conditions overlap
    – Simply knowing the current sound level is not enough to know the state of the car

  6. The Model!
  [Figure: four-state graph over the Idling, Accelerating, Cruising, and Decelerating states, with the emission densities P(x|idle), P(x|accel), P(x|cruise), P(x|decel) peaking near 45, 70, 65, and 60 dB SPL]
  • Transition probabilities P(next state | current state):

          I     A     C     D
    I    0.5   0.5    0     0
    A     0    1/3   1/3   1/3
    C     0    1/3   1/3   1/3
    D    0.25  0.25  0.25  0.25

  • The state-space model
    – Assuming all transitions from a state are equally probable
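The transition structure above is easy to write down directly. A minimal sketch (state order and probabilities taken from the slide; the variable names are ours):

```python
# State order: Idle, Accelerate, Cruise, Decelerate (as on the slide).
STATES = ["idle", "accel", "cruise", "decel"]

# Transition probabilities P(next | current); rows = current state.
A = [
    [0.5,  0.5,  0.0,  0.0 ],   # idle    -> idle or accel
    [0.0,  1/3,  1/3,  1/3 ],   # accel   -> accel, cruise, or decel
    [0.0,  1/3,  1/3,  1/3 ],   # cruise  -> accel, cruise, or decel
    [0.25, 0.25, 0.25, 0.25],   # decel   -> any state
]

# Each row is a probability distribution, so it must sum to 1.
for row in A:
    assert abs(sum(row) - 1.0) < 1e-12
```

Note the "equally probable" assumption: each state simply spreads its mass uniformly over its allowed successors.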

  7. Estimating the state at T = 0-
  • At T=0, before the first observation, we know nothing of the state
    – Assume all states are equally likely: P(Idling) = P(Accelerating) = P(Cruising) = P(Decelerating) = 0.25

  8. The first observation
  • At T=0 we observe the sound level x_0 = 68 dB SPL
    – The observation modifies our belief in the state of the system
  • P(x_0 | idle) = 0
  • P(x_0 | deceleration) = 0.0001
  • P(x_0 | acceleration) = 0.7
  • P(x_0 | cruising) = 0.5
    – Note, these don't have to sum to 1
    – In fact, since these are densities, any of them can be > 1

  9. Estimating the state after observing x_0
  • P(state | x_0) = C · P(state) · P(x_0 | state)
    – P(idle | x_0) = 0
    – P(deceleration | x_0) = C · 0.000025
    – P(cruising | x_0) = C · 0.125
    – P(acceleration | x_0) = C · 0.175
  • Normalizing:
    – P(idle | x_0) = 0
    – P(deceleration | x_0) = 0.000083
    – P(cruising | x_0) = 0.42
    – P(acceleration | x_0) = 0.58

  10. Estimating the state at T = 0+
  [Bar chart: P(state | x_0) = 0.0 (Idling), 0.58 (Accelerating), 0.42 (Cruising), 8.3 × 10^-5 (Decelerating)]
  • At T=0, after the first observation, we must update our belief about the states
    – The first observation provided some evidence about the state of the system
    – It modifies our belief in the state of the system
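This update is just Bayes' rule with the uniform prior. A minimal sketch, using the likelihood values from the slides (the exact normalized values are about 0.583 and 0.417 before rounding):

```python
# Bayes update at T=0: P(state | x0) ∝ P(state) · P(x0 | state).
# State order: idle, accel, cruise, decel; likelihoods for x0 = 68 dB SPL.
prior = [0.25, 0.25, 0.25, 0.25]
likelihood = [0.0, 0.7, 0.5, 0.0001]

unnorm = [p * l for p, l in zip(prior, likelihood)]
posterior = [u / sum(unnorm) for u in unnorm]
# posterior ≈ [0.0, 0.583, 0.417, 8.3e-5]
```

The normalizing constant C on the slide is simply 1/sum(unnorm); it never needs to be computed explicitly before the division.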

  11. Predicting the state at T=1
  • Predicting the probability of idling at T=1 (using the transition matrix from slide 6):
    – P(idling | idling) = 0.5
    – P(idling | deceleration) = 0.25
    – P(idling at T=1 | x_0) = P(I_{T=0} | x_0) · P(I|I) + P(D_{T=0} | x_0) · P(I|D) = 2.1 × 10^-5
  • In general, for any state S:
    – P(S_{T=1} | x_0) = Σ_{S_{T=0}} P(S_{T=0} | x_0) · P(S_{T=1} | S_{T=0})

  12. Predicting the state at T = 1
  P(S_{T=1} | x_0) = Σ_{S_{T=0}} P(S_{T=0} | x_0) · P(S_{T=1} | S_{T=0})
  [Bar chart: predicted P(state at T=1 | x_0) ≈ 2.1 × 10^-5 (Idling), 0.33 (Accelerating), 0.33 (Cruising), 0.33 (Decelerating). Rounded; in reality they sum to 1.0]
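The prediction step is a vector-matrix product of the posterior at T=0 with the transition matrix. A minimal sketch reproducing the numbers above:

```python
# Predict: P(S_{T=1} | x0) = Σ_{S_{T=0}} P(S_{T=0} | x0) · P(S_{T=1} | S_{T=0}).
# State order: idle, accel, cruise, decel.
A = [
    [0.5,  0.5,  0.0,  0.0 ],
    [0.0,  1/3,  1/3,  1/3 ],
    [0.0,  1/3,  1/3,  1/3 ],
    [0.25, 0.25, 0.25, 0.25],
]
posterior = [0.0, 0.58328, 0.41663, 8.33e-5]   # P(state | x0) from slide 9

predicted = [sum(posterior[i] * A[i][j] for i in range(4)) for j in range(4)]
# predicted[0] (idling) ≈ 2.1e-5; the other three entries are each ≈ 1/3
```

Only deceleration (and idling itself) can lead back to idling, which is why the predicted idling probability stays tiny.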

  13. Updating after the observation at T=1
  • At T=1 we observe x_1 = 63 dB SPL
  • P(x_1 | idle) = 0
  • P(x_1 | deceleration) = 0.2
  • P(x_1 | acceleration) = 0.001
  • P(x_1 | cruising) = 0.5

  14. Update after observing x_1
  • P(state | x_{0:1}) = C · P(state | x_0) · P(x_1 | state), where P(state | x_0) is the predicted distribution from the previous slide
    – P(idle | x_{0:1}) = 0
    – P(deceleration | x_{0:1}) = C · 0.066
    – P(cruising | x_{0:1}) = C · 0.165
    – P(acceleration | x_{0:1}) = C · 0.00033
  • Normalizing:
    – P(idle | x_{0:1}) = 0
    – P(deceleration | x_{0:1}) = 0.285
    – P(cruising | x_{0:1}) = 0.713
    – P(acceleration | x_{0:1}) = 0.0014

  15. Estimating the state at T = 1+
  [Bar chart: P(state | x_{0:1}) = 0.0 (Idling), 0.0014 (Accelerating), 0.713 (Cruising), 0.285 (Decelerating)]
  • The updated probability at T=1 incorporates information from both x_0 and x_1
    – It is NOT a local decision based on x_1 alone
    – Because of the Markov nature of the process, the state at T=0 affects the state at T=1
    – x_0 provides evidence for the state at T=1

  16. Estimating a unique state
  • What we have estimated is a distribution over the states
  • If we had to guess a state, we would pick the most likely state from the distribution
    – At T=0: P(state | x_0) = (0.0, 0.58, 0.42, 8.3 × 10^-5) over (Idling, Accelerating, Cruising, Decelerating) → State(T=0) = Accelerating
    – At T=1: P(state | x_{0:1}) = (0.0, 0.0014, 0.713, 0.285) → State(T=1) = Cruising

  17. Overall procedure
  PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) · P(S_T | S_{T-1})
  UPDATE:  P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) · P(x_T | S_T)
  • Predict the distribution of the state at T, update it after observing x_T, then set T = T+1 and repeat
  • At T=0 the predicted state distribution is the initial state probability
  • At each time T, the current estimate of the distribution over states considers all observations x_0 ... x_T
    – A natural outcome of the Markov nature of the model
  • The prediction + update is identical to the forward computation for HMMs to within a normalizing constant
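The whole predict/update recursion, run on the two observations of the example, can be sketched as follows (likelihood values as given on the slides):

```python
# Forward filtering by alternating PREDICT and UPDATE.
# State order: idle, accel, cruise, decel.
A = [
    [0.5,  0.5,  0.0,  0.0 ],
    [0.0,  1/3,  1/3,  1/3 ],
    [0.0,  1/3,  1/3,  1/3 ],
    [0.25, 0.25, 0.25, 0.25],
]
# P(x_T | state) for the two observed sound levels, read off the slides.
likelihoods = [
    [0.0, 0.7,   0.5, 0.0001],   # x0 = 68 dB SPL
    [0.0, 0.001, 0.5, 0.2   ],   # x1 = 63 dB SPL
]

belief = [0.25, 0.25, 0.25, 0.25]   # uniform initial state distribution
for t, lik in enumerate(likelihoods):
    if t > 0:   # PREDICT (skipped at T=0: the prior is the initial distribution)
        belief = [sum(belief[i] * A[i][j] for i in range(4)) for j in range(4)]
    unnorm = [b * l for b, l in zip(belief, lik)]          # UPDATE
    belief = [u / sum(unnorm) for u in unnorm]

# After both observations, belief ≈ [0.0, 0.0014, 0.713, 0.285]: cruising wins.
```

The normalized belief, not a hard state decision, is what gets carried into the next iteration.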

  18. Comparison to the Forward Algorithm
  PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) · P(S_T | S_{T-1})
  UPDATE:  P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) · P(x_T | S_T)
  • Forward algorithm:
    – P(x_{0:T}, S_T) = P(x_T | S_T) · Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) · P(S_T | S_{T-1})
    – The sum is the PREDICT step; the multiplication by P(x_T | S_T) is the UPDATE step
  • Normalized:
    – P(S_T | x_{0:T}) = ( Σ_{S'_T} P(x_{0:T}, S'_T) )^{-1} · P(x_{0:T}, S_T) = C · P(x_{0:T}, S_T)

  19. Decomposing the forward algorithm
  P(x_{0:T}, S_T) = P(x_T | S_T) · Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) · P(S_T | S_{T-1})
  • Predict: P(x_{0:T-1}, S_T) = Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) · P(S_T | S_{T-1})
  • Update: P(x_{0:T}, S_T) = P(x_T | S_T) · P(x_{0:T-1}, S_T)
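The equivalence is easy to check numerically: running the unnormalized forward recursion on the example and normalizing the final α gives the same posterior as the predict/update filter. A sketch:

```python
# Unnormalized forward recursion:
#   alpha_T(S) = P(x_T | S) · Σ_{S'} alpha_{T-1}(S') · P(S | S').
# State order: idle, accel, cruise, decel.
A = [
    [0.5,  0.5,  0.0,  0.0 ],
    [0.0,  1/3,  1/3,  1/3 ],
    [0.0,  1/3,  1/3,  1/3 ],
    [0.25, 0.25, 0.25, 0.25],
]
likelihoods = [[0.0, 0.7,   0.5, 0.0001],   # x0 = 68 dB SPL
               [0.0, 0.001, 0.5, 0.2   ]]   # x1 = 63 dB SPL
prior = [0.25] * 4

alpha = [prior[j] * likelihoods[0][j] for j in range(4)]        # alpha_0
alpha = [likelihoods[1][j] * sum(alpha[i] * A[i][j] for i in range(4))
         for j in range(4)]                                      # alpha_1

posterior = [a / sum(alpha) for a in alpha]   # P(S_1 | x_{0:1})
# Matches the filter's T=1 result: ≈ [0.0, 0.0014, 0.713, 0.285]
```

The only difference from the filter is where the normalization happens: the filter normalizes at every step, the forward algorithm only when a posterior is needed.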

  20. Estimating the state
  Estimate(S_T) = argmax_{S_T} P(S_T | x_{0:T})
  • The state is estimated from the updated distribution
    – The updated distribution, not the state estimate, is propagated forward in time

  21. Predicting the next observation
  • The probability distribution for the observation at the next time is a mixture:
    – P(x_T | x_{0:T-1}) = Σ_{S_T} P(x_T | S_T) · P(S_T | x_{0:T-1})
  • The actual observation can be predicted from P(x_T | x_{0:T-1})

  22. Predicting the next observation
  • MAP estimate: argmax_{x_T} P(x_T | x_{0:T-1})
  • MMSE estimate: E[x_T | x_{0:T-1}]
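As a concrete, hypothetical instance of the MMSE estimate: if the four emission densities were Gaussians centred at 45, 70, 65, and 60 dB (roughly where the curves on slide 5 peak; these means are our assumption, not given in the slides), the MMSE prediction of x_1 is just the predicted-state-weighted average of the means, since the expectation of a mixture is the weighted sum of component expectations:

```python
# MMSE prediction of the next observation:
#   E[x_T | x_{0:T-1}] = Σ_S P(S_T | x_{0:T-1}) · E[x | S].
# Assumes Gaussian emissions with the (hypothetical) means below.
means = [45.0, 70.0, 65.0, 60.0]            # idle, accel, cruise, decel
predicted = [2.1e-5, 0.333, 0.333, 0.333]   # P(S_1 | x0) from slide 12

x_mmse = sum(p * m for p, m in zip(predicted, means))
# ≈ 65 dB: the three moving states are equally likely, so the estimate
# is close to the average of their means.
```

The MAP estimate, by contrast, maximizes the mixture density itself and in general requires a numerical search over x.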

  23. Difference from Viterbi decoding
  • We estimate only the current state at any time, not the state sequence
    – Although we are considering all past observations
  • The most likely states at T and T+1 may be such that there is no valid transition between S_T and S_{T+1}

  24. A known-state model
  • The HMM assumes a very coarsely quantized state space
    – Idling / accelerating / cruising / decelerating
  • The actual state can be finer
    – Idling, accelerating at various rates, decelerating at various rates, cruising at various speeds
  • Solution: many more states (one for each acceleration/deceleration rate and cruising speed)?
  • Solution: a continuous-valued state

  25. The real-valued state model
  • A state equation describing the dynamics of the system:
    – s_t = f(s_{t-1}, e_t)
    – s_t is the state of the system at time t
    – e_t is a driving term, which is assumed to be random
    – The state of the system at any time depends only on the state at the previous time instant and the driving term at the current time
  • An observation equation relating state to observation:
    – o_t = g(s_t, γ_t)
    – o_t is the observation at time t
    – γ_t is the noise affecting the observation (also random)
    – The observation at any time depends only on the current state of the system and the noise
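A minimal simulation of such a model, for a hypothetical linear instance (our choice of f and g, not from the slides): f(s, e) = 0.9·s + e with Gaussian driving noise, and g(s, γ) = s + γ with Gaussian observation noise:

```python
import random

random.seed(0)   # reproducible noise draws

def step_state(s_prev):
    """State equation: s_t = f(s_{t-1}, e_t) = 0.9 * s_{t-1} + e_t."""
    return 0.9 * s_prev + random.gauss(0.0, 1.0)

def observe(s):
    """Observation equation: o_t = g(s_t, gamma_t) = s_t + gamma_t."""
    return s + random.gauss(0.0, 0.5)

states, obs = [], []
s = 0.0
for t in range(50):
    s = step_state(s)
    states.append(s)
    obs.append(observe(s))
```

The state is continuous-valued here, so the discrete predict/update tables of the HMM no longer apply; for this linear-Gaussian special case the corresponding filter is the Kalman filter.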
