Machine Learning for Signal Processing
Linear Gaussian Models
Class 21, 12 Nov 2013
Instructor: Bhiksha Raj
Administrivia
• HW3 is up
• Projects – please send us an update
Recap: MAP Estimators
• MAP (Maximum A Posteriori): find a "best guess" for y (statistically), given known x:
$\hat{y} = \arg\max_y P(y \mid x)$
Recap: MAP estimation
• x and y are jointly Gaussian. Stack them as $z = \begin{bmatrix} x \\ y \end{bmatrix}$, with
$\mu_z = E[z] = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}$, $\mathrm{Var}(z) = C_{zz} = \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix}$, $C_{xy} = E[(x - \mu_x)(y - \mu_y)^T] = C_{yx}^T$
$P(z) = N(\mu_z, C_{zz}) = \frac{1}{\sqrt{(2\pi)^d |C_{zz}|}} \exp\!\left(-0.5\,(z - \mu_z)^T C_{zz}^{-1}(z - \mu_z)\right)$
• z is Gaussian
MAP estimation: Gaussian PDF
[Figure: the joint Gaussian density $P(x, y)$ plotted over the X–Y plane]
MAP estimation: The Gaussian at a particular value of X
[Figure: the joint density sliced at $x = x_0$]
Conditional Probability of y|x
$P(y \mid x) = N\!\left(\mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}\right)$
$E[y \mid x] = \mu_{y|x} = \mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x)$
$\mathrm{Var}(y \mid x) = C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}$
• The conditional probability of y given x is also Gaussian
 – The slice in the figure is Gaussian
• The mean of this Gaussian is a function of x
• The variance of y reduces if x is known
 – Uncertainty is reduced
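A minimal numerical sketch of these formulas; the means and covariances below are made-up one-dimensional placeholders, not values from the lecture:

```python
import numpy as np

# Illustrative joint Gaussian over (x, y); the numbers are placeholders.
mu_x, mu_y = np.array([0.0]), np.array([1.0])
C_xx = np.array([[2.0]])
C_xy = np.array([[1.2]])
C_yx = C_xy.T
C_yy = np.array([[1.5]])

def conditional_gaussian(x):
    """Mean and covariance of P(y | x) when (x, y) are jointly Gaussian."""
    mean = mu_y + C_yx @ np.linalg.solve(C_xx, x - mu_x)
    cov = C_yy - C_yx @ np.linalg.solve(C_xx, C_xy)
    return mean, cov

m, v = conditional_gaussian(np.array([1.0]))
print(m, v)   # the mean shifts with x; the variance (0.78) is smaller than C_yy (1.5)
```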
MAP estimation: The Gaussian at a particular value of X
[Figure: the conditional slice at $x = x_0$; its peak marks the most likely value of y]
MAP Estimation of a Gaussian RV
$\hat{y} = \arg\max_y P(y \mid x) = E[y \mid x]$
[Figure: the conditional Gaussian at $x = x_0$; the MAP estimate is its mean]
It's also a minimum mean squared error estimate
• Minimize the error:
$Err = E[\|y - \hat{y}\|^2 \mid x] = E[(y - \hat{y})^T (y - \hat{y}) \mid x]$
$Err = E[y^T y - 2\hat{y}^T y + \hat{y}^T \hat{y} \mid x] = E[y^T y \mid x] - 2\hat{y}^T E[y \mid x] + \hat{y}^T \hat{y}$
• Differentiating and equating to 0:
$d\,Err = 2\hat{y}^T d\hat{y} - 2 E[y \mid x]^T d\hat{y} = 0 \;\Rightarrow\; \hat{y} = E[y \mid x]$
The MMSE estimate is the mean of the distribution
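A small Monte Carlo check of this result; the joint covariance is an illustrative placeholder, and the "other" estimator is just a deliberately scaled-down competitor:

```python
import numpy as np

# Verify numerically that the conditional mean minimizes mean squared error.
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
C = np.array([[2.0, 1.2],
              [1.2, 1.5]])          # joint covariance of (x, y)
z = rng.multivariate_normal(mu, C, size=100_000)
x, y = z[:, 0], z[:, 1]

y_mmse = mu[1] + (C[1, 0] / C[0, 0]) * (x - mu[0])        # conditional mean E[y | x]
y_other = mu[1] + 0.9 * (C[1, 0] / C[0, 0]) * (x - mu[0]) # a biased alternative

print(np.mean((y - y_mmse) ** 2))   # close to C_yy - C_yx C_xx^{-1} C_xy = 0.78
print(np.mean((y - y_other) ** 2))  # strictly larger
```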
For the Gaussian: MAP = MMSE
• The most likely value is also the mean value
• This would be true of any symmetric, unimodal distribution
MMSE estimates for mixture distributions
Let $P(y \mid x)$ be a mixture density: $P(y \mid x) = \sum_k P(k \mid x)\, P(y \mid k, x)$
The MMSE estimate of y is given by
$E[y \mid x] = \int y \sum_k P(k \mid x)\, P(y \mid k, x)\, dy = \sum_k P(k \mid x) \int y\, P(y \mid k, x)\, dy = \sum_k P(k \mid x)\, E[y \mid k, x]$
Just a weighted combination of the MMSE estimates from the component distributions
MMSE estimates from a Gaussian mixture
Let $P(x, y)$ be a Gaussian mixture over $z = \begin{bmatrix} x \\ y \end{bmatrix}$: $P(x, y) = P(z) = \sum_k P(k)\, N(z; \mu_k, C_k)$
$P(y \mid x)$ is also a Gaussian mixture:
$P(x, y) = \sum_k P(k, x, y) = \sum_k P(x)\, P(k \mid x)\, P(y \mid x, k)$
$P(y \mid x) = \frac{P(x, y)}{P(x)} = \sum_k P(k \mid x)\, P(y \mid x, k)$
MMSE estimates from a Gaussian mixture
$P(y \mid x)$ is a Gaussian mixture: $P(y \mid x) = \sum_k P(k \mid x)\, P(y \mid x, k)$
$P(x, y \mid k) = N\!\left(\begin{bmatrix} \mu_{k,x} \\ \mu_{k,y} \end{bmatrix}, \begin{bmatrix} C_{k,xx} & C_{k,xy} \\ C_{k,yx} & C_{k,yy} \end{bmatrix}\right)$
$P(y \mid x, k) = N\!\left(\mu_{k,y} + C_{k,yx} C_{k,xx}^{-1}(x - \mu_{k,x}),\; \Theta_k\right)$, where $\Theta_k = C_{k,yy} - C_{k,yx} C_{k,xx}^{-1} C_{k,xy}$
$P(y \mid x) = \sum_k P(k \mid x)\, N\!\left(\mu_{k,y} + C_{k,yx} C_{k,xx}^{-1}(x - \mu_{k,x}),\; \Theta_k\right)$
MMSE estimates from a Gaussian mixture
$P(y \mid x) = \sum_k P(k \mid x)\, N\!\left(\mu_{k,y} + C_{k,yx} C_{k,xx}^{-1}(x - \mu_{k,x}),\; \Theta_k\right)$
$P(y \mid x)$ is a mixture Gaussian density, so $E[y \mid x]$ is also a mixture:
$E[y \mid x] = \sum_k P(k \mid x)\, E[y \mid k, x] = \sum_k P(k \mid x)\left(\mu_{k,y} + C_{k,yx} C_{k,xx}^{-1}(x - \mu_{k,x})\right)$
MMSE estimates from a Gaussian mixture
$E[y \mid x] = \sum_k P(k \mid x)\left(\mu_{k,y} + C_{k,yx} C_{k,xx}^{-1}(x - \mu_{k,x})\right)$
A weighted combination of the MMSE estimates obtained from the individual Gaussians!
The weight $P(k \mid x)$ is easily computed too:
$P(k \mid x) = \frac{P(k, x)}{P(x)} = \frac{P(k)\, N(x; \mu_{k,x}, C_{k,xx})}{\sum_{k'} P(k')\, N(x; \mu_{k',x}, C_{k',xx})}$
MMSE estimates from a Gaussian mixture
• A mixture of estimates from the individual Gaussians
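A compact numerical sketch of this estimator; the mixture weights, means, and covariances are hypothetical placeholders with one-dimensional x and y, not values from the lecture:

```python
import numpy as np
from scipy.stats import multivariate_normal

# MMSE estimate of y given x under a joint GMM over z = [x; y].
weights = np.array([0.4, 0.6])                        # P(k)
means = [np.array([0.0, 0.0]), np.array([3.0, 2.0])]  # mu_k = [mu_{k,x}; mu_{k,y}]
covs = [np.array([[1.0, 0.5], [0.5, 1.0]]),
        np.array([[1.5, -0.4], [-0.4, 0.8]])]         # C_k, partitioned into xx, xy, yx, yy

def mmse_estimate(x, dim_x=1):
    # Posterior component weights P(k | x) from the x-marginal of each Gaussian
    px_k = np.array([multivariate_normal.pdf(x, m[:dim_x], C[:dim_x, :dim_x])
                     for m, C in zip(means, covs)])
    post = weights * px_k
    post /= post.sum()

    # Per-component conditional means, combined with the posterior weights
    est = 0.0
    for p, m, C in zip(post, means, covs):
        mu_x, mu_y = m[:dim_x], m[dim_x:]
        C_xx, C_yx = C[:dim_x, :dim_x], C[dim_x:, :dim_x]
        est = est + p * (mu_y + C_yx @ np.linalg.solve(C_xx, x - mu_x))
    return est

print(mmse_estimate(np.array([1.0])))
```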
Voice Morphing
• Align training recordings from both speakers
 – Cepstral vector sequences
• Learn a GMM on the joint vectors
• Given speech from one speaker, find the MMSE estimate of the other
• Synthesize from the estimated cepstra
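A hypothetical sketch of the learning and conversion steps using scikit-learn; the alignment (e.g. DTW) and the cepstrum-to-waveform synthesis are not shown, and src_ceps / tgt_ceps are random placeholders standing in for time-aligned cepstral sequences:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
T, D = 2000, 13
src_ceps = rng.standard_normal((T, D))                          # placeholder source cepstra
tgt_ceps = 0.8 * src_ceps + 0.2 * rng.standard_normal((T, D))   # placeholder "aligned" target cepstra

# Learn a GMM on the joint vectors z = [x; y]
gmm = GaussianMixture(n_components=8, covariance_type='full', random_state=0)
gmm.fit(np.hstack([src_ceps, tgt_ceps]))

def convert_frame(x):
    """MMSE estimate of the target speaker's cepstrum for one source frame x."""
    lik = np.array([multivariate_normal.pdf(x, m[:D], C[:D, :D])
                    for m, C in zip(gmm.means_, gmm.covariances_)])
    post = gmm.weights_ * lik
    post /= post.sum()
    y = np.zeros(D)
    for p, m, C in zip(post, gmm.means_, gmm.covariances_):
        y += p * (m[D:] + C[D:, :D] @ np.linalg.solve(C[:D, :D], x - m[:D]))
    return y

converted = np.vstack([convert_frame(f) for f in src_ceps[:5]])
```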
MMSE with GMM: Voice Transformation
• Festvox GMM transformation suite (Toda)
[Audio demos: transformations among the voices awb, bdl, jmk, and slt]
MAP / ML / MMSE
• General statistical estimators
• All used to predict a variable based on other parameters related to it
• Most common assumption: data are Gaussian; all RVs are Gaussian
 – Other probability densities may also be used
• For Gaussians, the relationships are linear, as we saw
Gaussians and more Gaussians
• Linear Gaussian Models
• But first, a recap
A Brief Recap
[Figure: $D \approx BC$ — the data matrix D factored into a basis matrix B and a coefficient matrix C]
• Principal component analysis: find the K bases that best explain the given data
• Find B and C such that the difference between D and BC is minimized
 – While constraining the columns of B to be orthonormal
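A minimal sketch of this factorization via the truncated SVD; the data matrix here is a random placeholder, not course data:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 200))      # placeholder data: 64-dim vectors, 200 samples
K = 5

U, s, Vt = np.linalg.svd(D, full_matrices=False)
B = U[:, :K]                            # K orthonormal bases (columns)
C = np.diag(s[:K]) @ Vt[:K, :]          # per-sample weights
err = np.linalg.norm(D - B @ C) ** 2    # minimum over all rank-K factorizations
print(err, np.allclose(B.T @ B, np.eye(K)))
```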
Remember Eigenfaces
• Approximate every face f as $f \approx w_{f,1} V_1 + w_{f,2} V_2 + w_{f,3} V_3 + \ldots + w_{f,K} V_K$
• Estimate V to minimize the squared error
• The error is the part of f unexplained by $V_1 \ldots V_K$
• The error is orthogonal to the Eigenfaces
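A quick check of the orthogonality claim; the "eigenfaces" here are just a random orthonormal basis used as a stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, K = 64, 5
V, _ = np.linalg.qr(rng.standard_normal((dim, K)))   # stand-in orthonormal eigenfaces
f = rng.standard_normal(dim)                          # stand-in face vector

w = V.T @ f                # weights w_{f,1..K}
f_hat = V @ w              # reconstruction from the eigenfaces
error = f - f_hat
print(np.max(np.abs(V.T @ error)))   # ~0: the error is orthogonal to every eigenface
```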
Karhunen-Loève vs. PCA
• Eigenvectors of the Correlation matrix:
 – Principal directions of the tightest ellipse centered on the origin
 – Directions that retain maximum energy
Karhunen-Loève vs. PCA
• Eigenvectors of the Correlation matrix:
 – Principal directions of the tightest ellipse centered on the origin
 – Directions that retain maximum energy
• Eigenvectors of the Covariance matrix:
 – Principal directions of the tightest ellipse centered on the data
 – Directions that retain maximum variance
Karhunen-Loève vs. PCA
• If the data are naturally centered at the origin, KLT == PCA
• The following slides refer to PCA!
 – Assume data centered at the origin, for simplicity
• Not essential, as we will see
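A small illustration of the distinction; the data are synthetic, and "correlation matrix" here means the second-moment matrix $E[xx^T]$, as on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[5.0, 2.0], cov=[[3.0, 1.0], [1.0, 1.0]], size=10_000)

corr = (X.T @ X) / len(X)               # second-moment ("correlation") matrix
cov = np.cov(X, rowvar=False)           # covariance matrix (mean removed)

_, klt_dirs = np.linalg.eigh(corr)      # KLT directions: tightest ellipse centered on the origin
_, pca_dirs = np.linalg.eigh(cov)       # PCA directions: tightest ellipse centered on the data
print(klt_dirs[:, -1], pca_dirs[:, -1]) # differ here; they coincide if X is centered first
```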
Remember Eigenfaces
• Approximate every face f as $f \approx w_{f,1} V_1 + w_{f,2} V_2 + w_{f,3} V_3 + \ldots + w_{f,K} V_K$
• Estimate V to minimize the squared error
• The error is the part of f unexplained by $V_1 \ldots V_K$
• The error is orthogonal to the Eigenfaces
Eigen Representation
[Figure: a face expressed as $w_{11} V_1 + e_1$; illustration assuming a 3-D space]
• K-dimensional representation
 – The error is orthogonal to the representation
 – The weight and error are specific to the data instance
Representation
[Figure: a second face expressed as $w_{12} V_1 + e_2$; the error is at 90° to the eigenface. Illustration assuming a 3-D space]
• K-dimensional representation
 – The error is orthogonal to the representation
 – The weight and error are specific to the data instance
Representation
[Figure: all data with the same representation $w V_1$ lie on a plane orthogonal to $V_1$, passing through the point $w V_1$]
• K-dimensional representation
 – The error is orthogonal to the representation