Machine Learning for Signal Processing Predicting and Estimation from Time Series Bhiksha Raj 15 Nov 2016 11-755/18797 1
Preliminaries: P(y|x) for Gaussian
• If P(x,y) is Gaussian:
$$P(x, y) = N\left(\begin{bmatrix}\mu_x \\ \mu_y\end{bmatrix}, \begin{bmatrix}C_{xx} & C_{xy} \\ C_{yx} & C_{yy}\end{bmatrix}\right)$$
• The conditional probability of y given x is also Gaussian
– The slice in the figure is Gaussian
$$P(y|x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$$
• The mean of this Gaussian is a function of x
• The variance of y reduces if x is known
– Uncertainty is reduced
Preliminaries: P(y|x) for Gaussian
$$P(y|x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$$
• $\mu_y$: best guess for Y when X is not known
Preliminaries: P(y|x) for Gaussian
$$P(y|x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$$
• Update the guess of Y based on the information in X
• $\mu_y$: best guess for Y when X is not known
• $C_{yx}C_{xx}^{-1}(x - \mu_x)$: correction of Y using information in X, where x is the given X value
– The correction is 0 if X and Y are uncorrelated, i.e. $C_{yx} = 0$
• $\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x)$: mean of Y given X
Preliminaries: P(y|x) for Gaussian
$$P(y|x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$$
• Correction to Y = slope × (offset of X from its mean)
– Slope: $C_{yx}C_{xx}^{-1}$
– Offset: $x - \mu_x$, where x is the given X value
• $\mu_y$: best guess for Y when X is not known
• $\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x)$: mean of Y given X
Preliminaries: P(y|x) for Gaussian
$$P(y|x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$$
• $\mu_y$: best guess for Y when X is not known
• $C_{yx}C_{xx}^{-1}(x - \mu_x)$: correction of Y using information in X
• $C_{yy}$: uncertainty in Y when X is not known
Preliminaries: P(y|x) for Gaussian
$$P(y|x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$$
• $\mu_y$: best guess for Y when X is not known
• $C_{yx}C_{xx}^{-1}(x - \mu_x)$: correction of Y using information in X
• $C_{yy}$: uncertainty in Y when X is not known
• $C_{yx}C_{xx}^{-1}C_{xy}$: shrinkage of uncertainty from knowing X
– The shrinkage of variance is 0 if X and Y are uncorrelated, i.e. $C_{yx} = 0$
• $C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}$: reduced uncertainty from knowing X
Preliminaries: P(y|x) for Gaussian
$$P(y|x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$$
• Knowing X modifies the mean of Y and shrinks its variance
• $\mu_y + C_{yx}C_{xx}^{-1}(x - \mu_x)$: mean of Y given X (MAP estimate of Y), where x is the given X value
• $C_{yy}$: overall variance of Y when X is unknown
• $C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}$: variance of Y when X is known
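The conditional mean and variance can be checked numerically. The following is a minimal sketch for the scalar case only, where $C_{xx}^{-1}$ is an ordinary division and $C_{yx} = C_{xy}$. The function name `gaussian_conditional` and all numbers are hypothetical, chosen only for illustration.

```python
def gaussian_conditional(mu_x, mu_y, c_xx, c_xy, c_yy, x):
    """Return (mean, variance) of P(y | x) for a jointly Gaussian scalar pair."""
    slope = c_xy / c_xx                  # C_yx * C_xx^{-1} in the scalar case
    mean = mu_y + slope * (x - mu_x)     # best guess + slope * offset
    var = c_yy - slope * c_xy            # prior variance minus shrinkage
    return mean, var


# Hypothetical numbers: X and Y are correlated (c_xy = 2.0).
mean, var = gaussian_conditional(mu_x=1.0, mu_y=2.0, c_xx=4.0, c_xy=2.0, c_yy=3.0, x=3.0)
print(mean, var)   # mean moves from 2.0 to 3.0, variance shrinks from 3.0 to 2.0

# With c_xy = 0 there is no correction and no shrinkage.
print(gaussian_conditional(mu_x=1.0, mu_y=2.0, c_xx=4.0, c_xy=0.0, c_yy=3.0, x=3.0))
```

In the uncorrelated case the function returns the unconditional mean and variance of Y unchanged, which matches the statement that both the correction and the shrinkage vanish when $C_{yx} = 0$.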
The little parable
You've been kidnapped
And blindfolded
You can only hear the car
You must find your way back home from wherever they drop you off
Kidnapped!
• Determine, only by listening to a running automobile, whether it is:
– Idling; or
– Travelling at constant velocity; or
– Accelerating; or
– Decelerating
• You only record the energy level (SPL) of the sound
– The SPL is measured once per second
What we know
• An automobile that is at rest can accelerate, or continue to stay at rest
• An accelerating automobile can hit a steady-state velocity, continue to accelerate, or decelerate
• A decelerating automobile can continue to decelerate, come to rest, cruise, or accelerate
• An automobile at a steady-state velocity can stay in steady state, accelerate or decelerate
What else we know
[Figure: SPL distributions P(x|idle), P(x|decel), P(x|cruise) and P(x|accel), marked at 45, 60, 65 and 70 dB respectively]
• The probability distribution of the SPL of the sound is different in the various conditions
– As shown in the figure
• In reality, it depends on the car
• The distributions for the different conditions overlap
– Simply knowing the current sound level is not enough to know the state of the car
The Model!
[Figure: state diagram with four states and their SPL distributions: Idling (45), Accelerating (70), Cruising (65), Decelerating (60)]
Transition probabilities (rows: current state; columns: next state; I = idling, A = accelerating, C = cruising, D = decelerating):
     I     A     C     D
I    0.5   0.5   0     0
A    0     1/3   1/3   1/3
C    0     1/3   1/3   1/3
D    0.25  0.25  0.25  0.25
• The state-space model
– Assuming all transitions from a state are equally probable
– This is a Hidden Markov Model!
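The transition table follows directly from the list of allowed moves plus the assumption that all transitions out of a state are equally probable. A minimal sketch of that construction is below. The names `ALLOWED`, `build_transitions` and `TRANS` are illustrative, not part of the lecture.

```python
from fractions import Fraction

STATES = ["I", "A", "C", "D"]  # idling, accelerating, cruising, decelerating

# Allowed next states for each current state, as read off the transition table.
ALLOWED = {
    "I": ["I", "A"],
    "A": ["A", "C", "D"],
    "C": ["A", "C", "D"],
    "D": ["I", "A", "C", "D"],
}


def build_transitions(allowed):
    """Give every allowed transition out of a state the same probability."""
    trans = {}
    for cur in STATES:
        p = Fraction(1, len(allowed[cur]))
        trans[cur] = {nxt: (p if nxt in allowed[cur] else Fraction(0)) for nxt in STATES}
    return trans


TRANS = build_transitions(ALLOWED)
for cur in STATES:
    # Each row is a distribution over next states, so it must sum to 1.
    assert sum(TRANS[cur].values()) == 1
    print(cur, [str(TRANS[cur][nxt]) for nxt in STATES])
```

Exact fractions are used here only so that the row sums can be checked without floating-point rounding.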
Estimating the state at T = 0-
P(state): Idling 0.25, Decelerating 0.25, Cruising 0.25, Accelerating 0.25
• At T=0, before the first observation, we know nothing of the state
– Assume all states are equally likely
The first observation: T=0
[Figure: SPL distributions P(x|idle), P(x|decel), P(x|cruise), P(x|accel), with the observation at 68 dB marked]
• At T=0 you observe the sound level $x_0$ = 68 dB SPL
– The observation modifies our belief in the state of the system
The first observation: T=0
[Figure: SPL distributions P(x|idle), P(x|decel), P(x|cruise), P(x|accel), with the observation at 68 dB marked]
Likelihood of the observation under each state:
P(x|idle) = 0, P(x|deceleration) = 0.0001, P(x|cruising) = 0.5, P(x|acceleration) = 0.7
• These don't have to sum to 1
• Can even be greater than 1!
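The likelihoods are density values evaluated at the observed SPL, one per state, so nothing forces them to sum to 1 or to stay below 1. The sketch below illustrates this under assumptions that are not in the slides: each state's SPL distribution is taken to be Gaussian, centered at 45, 60, 65 and 70 dB, with a hypothetical standard deviation. It is not meant to reproduce the 0.5 and 0.7 read off the figure.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Assumed centers (dB) for each state; the spread is a made-up value.
MEANS = {"idle": 45.0, "decel": 60.0, "cruise": 65.0, "accel": 70.0}
STD = 2.0  # hypothetical standard deviation in dB

# One density value per state, all evaluated at the same observation.
likelihoods = {state: gaussian_pdf(68.0, mean, STD) for state, mean in MEANS.items()}
total = sum(likelihoods.values())
print(likelihoods)
print("sum over states:", total)   # not 1: these are not a distribution over states

# A narrow density exceeds 1 at its peak (here std = 0.2 dB).
narrow_peak = gaussian_pdf(70.0, 70.0, 0.2)
print("peak of a narrow Gaussian:", narrow_peak)
```

A Gaussian density peaks above 1 whenever its standard deviation is below $1/\sqrt{2\pi} \approx 0.4$, which is why a likelihood greater than 1 is not a mistake.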
The first observation: T=0
[Figure: SPL distributions P(x|idle), P(x|decel), P(x|cruise), P(x|accel), with the observation at 68 dB marked]
Likelihood $P(x_0|\text{state})$: Idling 0, Decelerating 0.0001, Cruising 0.5, Accelerating 0.7
• Remember the prior
Prior $P(\text{state})$: Idling 0.25, Decelerating 0.25, Cruising 0.25, Accelerating 0.25
Estimating the state after observing $x_0$
• Combine prior information about the state and evidence from the observation
• We want $P(\text{state}|x_0)$
• We can compute it using Bayes rule as
$$P(\text{state}|x_0) = \frac{P(\text{state})\,P(x_0|\text{state})}{\sum_{\text{state}'} P(\text{state}')\,P(x_0|\text{state}')}$$
The Posterior
Likelihood $P(x_0|\text{state})$: Idling 0, Decelerating 0.0001, Cruising 0.5, Accelerating 0.7
Prior $P(\text{state})$: Idling 0.25, Decelerating 0.25, Cruising 0.25, Accelerating 0.25
• Multiply the two, term by term, and normalize them so that they sum to 1.0
Estimating the state at T = 0+
$P(S_{T=0}|x_0)$: Idling 0.0, Decelerating $8.3 \times 10^{-5}$, Cruising 0.42, Accelerating 0.57
• At T=0, after the first observation $x_0$, we update our belief about the states
– The first observation provided some evidence about the state of the system
– It modifies our belief in the state of the system
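The update at T=0 is a term-by-term product of prior and likelihood followed by normalization. A minimal sketch is below, using the uniform prior and the likelihood values quoted for 68 dB. The function name `bayes_update` is illustrative.

```python
STATES = ["idle", "decel", "cruise", "accel"]


def bayes_update(prior, likelihood):
    """Multiply prior and likelihood term by term, then normalize to sum to 1."""
    unnormalized = {s: prior[s] * likelihood[s] for s in STATES}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


prior = {s: 0.25 for s in STATES}                       # nothing known before x_0
likelihood_x0 = {"idle": 0.0, "decel": 0.0001, "cruise": 0.5, "accel": 0.7}

posterior = bayes_update(prior, likelihood_x0)
for state in STATES:
    print(f"{state:7s} {posterior[state]:.6f}")
```

With these rounded likelihoods the sketch gives about 0.583 for accelerating and 0.417 for cruising, close to but not identical to the 0.57 and 0.42 quoted above. The small gap may come from rounding in the quoted values.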
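Predicting the state at T=1
[Figure: state diagram, shown with the posterior at T=0: Idling 0.0, Decel $8.3 \times 10^{-5}$, Cruising 0.42, Accel 0.57]
Transition probabilities (rows: current state; columns: next state):
     I     A     C     D
I    0.5   0.5   0     0
A    0     1/3   1/3   1/3
C    0     1/3   1/3   1/3
D    0.25  0.25  0.25  0.25
• Predicting the probability of idling at T=1
– P(idling | idling) = 0.5
– P(idling | deceleration) = 0.25
– Accelerating and cruising cars cannot go directly to idling, so those terms vanish
– P(idling at T=1 | $x_0$) = $P(I_{T=0}|x_0)\,P(I|I) + P(D_{T=0}|x_0)\,P(I|D) = 2.1 \times 10^{-5}$
• In general, for any state S
$$P(S_{T=1}|x_0) = \sum_{S_{T=0}} P(S_{T=0}|x_0)\,P(S_{T=1}|S_{T=0})$$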
Predicting the state at T = 1
$P(S_{T=0}|x_0)$: Idling 0.0, Decelerating $8.3 \times 10^{-5}$, Cruising 0.42, Accelerating 0.57
$$P(S_{T=1}|x_0) = \sum_{S_{T=0}} P(S_{T=0}|x_0)\,P(S_{T=1}|S_{T=0})$$
$P(S_{T=1}|x_0)$: Idling $2.1 \times 10^{-5}$, Decelerating 0.33, Cruising 0.33, Accelerating 0.33
• Rounded. In reality, they sum to 1.0
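The prediction step sums, for each next state, the belief in every current state times the probability of moving from it. The sketch below applies that sum to the T=0 belief quoted above (which sums to about 0.99 because of rounding). The names `TRANS` and `predict` are illustrative.

```python
STATES = ["idle", "accel", "cruise", "decel"]

# TRANS[current][next], from the transition table of the model.
TRANS = {
    "idle":   {"idle": 0.5,  "accel": 0.5,  "cruise": 0.0,  "decel": 0.0},
    "accel":  {"idle": 0.0,  "accel": 1/3,  "cruise": 1/3,  "decel": 1/3},
    "cruise": {"idle": 0.0,  "accel": 1/3,  "cruise": 1/3,  "decel": 1/3},
    "decel":  {"idle": 0.25, "accel": 0.25, "cruise": 0.25, "decel": 0.25},
}


def predict(belief, trans):
    """P(S_{T+1} | x_0..x_T) = sum over S_T of P(S_T | x_0..x_T) * P(S_{T+1} | S_T)."""
    return {
        nxt: sum(belief[cur] * trans[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


# Belief after the first observation, using the rounded values quoted above.
belief_t0 = {"idle": 0.0, "decel": 8.3e-5, "cruise": 0.42, "accel": 0.57}

predicted = predict(belief_t0, TRANS)
for state in STATES:
    print(f"{state:7s} {predicted[state]:.6g}")
```

Only the decelerating state contributes to idling here, giving 8.3e-5 × 0.25 ≈ 2.1e-5, while the other three states each receive about 0.33.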
Updating after the observation at T=1
[Figure: SPL distributions P(x|idle), P(x|decel), P(x|cruise), P(x|accel), with the observation at 63 dB marked]
• At T=1 we observe $x_1$ = 63 dB SPL
Updating after the observation at T=1
[Figure: SPL distributions P(x|idle), P(x|decel), P(x|cruise), P(x|accel), with the observation at 63 dB marked]
Likelihood of the observation under each state:
P(x|idle) = 0, P(x|deceleration) = 0.2, P(x|cruising) = 0.5, P(x|acceleration) = 0.01
Bar chart of $P(x_1|\text{state})$: Idling 0, Decelerating 0.2, Cruising 0.5, Accelerating 0.02
Updating after the observation at T=1
[Figure: SPL distributions P(x|idle), P(x|decel), P(x|cruise), P(x|accel), with the observation at 63 dB marked]
Likelihood $P(x_1|\text{state})$: Idling 0, Decelerating 0.2, Cruising 0.5, Accelerating 0.02
• Remember the prior
Prior $P(\text{state}|x_0)$: Idling $2.1 \times 10^{-5}$, Decelerating 0.33, Cruising 0.33, Accelerating 0.33
Estimating the state after observing $x_1$
• Combine prior information from the observation at time T=0, AND evidence from the observation at T=1, to estimate the state at T=1
• We want $P(\text{state}|x_0, x_1)$
• We can compute it using Bayes rule as
$$P(\text{state}|x_0, x_1) = \frac{P(\text{state}|x_0)\,P(x_1|\text{state})}{\sum_{\text{state}'} P(\text{state}'|x_0)\,P(x_1|\text{state}')}$$
The Posterior at T = 1
Likelihood $P(x_1|\text{state})$: Idling 0, Decelerating 0.2, Cruising 0.5, Accelerating 0.02
Prior $P(\text{state}|x_0)$: Idling $2.1 \times 10^{-5}$, Decelerating 0.33, Cruising 0.33, Accelerating 0.33
• Multiply the two, term by term, and normalize them so that they sum to 1.0
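The whole procedure alternates two steps: predict with the transition probabilities, then update with the likelihood of the new observation. The sketch below chains them for T=0 and T=1. It assumes the likelihood values quoted on the slides, taking 0.02 (the bar chart value) for acceleration at 63 dB. The function names are illustrative, and the printed numbers are the output of this sketch, not figures from the lecture.

```python
STATES = ["idle", "accel", "cruise", "decel"]

# TRANS[current][next], from the transition table of the model.
TRANS = {
    "idle":   {"idle": 0.5,  "accel": 0.5,  "cruise": 0.0,  "decel": 0.0},
    "accel":  {"idle": 0.0,  "accel": 1/3,  "cruise": 1/3,  "decel": 1/3},
    "cruise": {"idle": 0.0,  "accel": 1/3,  "cruise": 1/3,  "decel": 1/3},
    "decel":  {"idle": 0.25, "accel": 0.25, "cruise": 0.25, "decel": 0.25},
}


def predict(belief):
    """Push the belief one time step forward through the transition table."""
    return {n: sum(belief[c] * TRANS[c][n] for c in STATES) for n in STATES}


def update(prior, likelihood):
    """Bayes rule: multiply term by term and normalize."""
    unnormalized = {s: prior[s] * likelihood[s] for s in STATES}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return {s: v / total for s, v in unnormalized.items()}


def forward_filter(initial, likelihoods):
    """Return the posterior over states after each observation."""
    belief = dict(initial)
    history = []
    for t, likelihood in enumerate(likelihoods):
        if t > 0:
            belief = predict(belief)       # prior for time t given x_0..x_{t-1}
        belief = update(belief, likelihood)
        history.append(belief)
    return history


initial = {s: 0.25 for s in STATES}
likelihoods = [
    {"idle": 0.0, "decel": 0.0001, "cruise": 0.5, "accel": 0.7},   # x_0 = 68 dB
    {"idle": 0.0, "decel": 0.2,    "cruise": 0.5, "accel": 0.02},  # x_1 = 63 dB (assumed 0.02)
]

history = forward_filter(initial, likelihoods)
for t, belief in enumerate(history):
    print(t, {s: round(p, 4) for s, p in belief.items()})
```

Under these assumptions the sketch puts roughly 0.69 on cruising, 0.28 on decelerating and 0.03 on accelerating at T=1. Because the three non-idle states enter T=1 with equal predicted probability, the posterior there is essentially the normalized likelihood.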