EM in Equations

Given the image pair I(x,y,t), I(x,y,t+1) and a flow field u(x,y;a) for a layer with motion parameters a, the weighted brightness-constancy error is

E(a) = \sum_{x,y \in R} w(x,y) \left( \nabla I^\top u(x,y;a) + I_t \right)^2
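A minimal sketch of evaluating this objective in NumPy, assuming the spatial gradients I_x, I_y and the temporal difference I_t have already been computed (e.g. by finite differences) and that u(x,y;a) is affine; the function names are illustrative, not from the slides.

```python
import numpy as np

def affine_flow(a, xs, ys):
    """Affine flow u(x, y; a) with parameters a = (a1, a2, tx, a3, a4, ty)."""
    u = a[0] * xs + a[1] * ys + a[2]
    v = a[3] * xs + a[4] * ys + a[5]
    return u, v

def layer_error(a, w, Ix, Iy, It):
    """E(a) = sum over pixels of w * (grad(I)^T u(x; a) + I_t)^2."""
    ys, xs = np.mgrid[0:Ix.shape[0], 0:Ix.shape[1]]
    u, v = affine_flow(a, xs, ys)
    residual = Ix * u + Iy * v + It          # linearized brightness-constancy residual
    return np.sum(w * residual ** 2)
```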
EM in Pictures Ok. So where do we get the weights?
EM in Pictures The weights represent the probability that the constraint "belongs" to a particular layer.
EM in Pictures Assume we know the motion of the layers but not the ownership probabilities of the pixels (weights).
EM in Pictures

Assume we know the motion of the layers but not the ownership probabilities of the pixels (weights). Also assume we have a likelihood at each pixel:

p(I(t), I(t+1) \mid a) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\left( \nabla I^\top u(a) + I_t \right)^2 / 2\sigma^2 \right)
EM in Pictures

Given the flow for layer 1, warp the first image towards the second and look at the residual error I_t (since the flow is now zero). Some pixels match, others don't:

p(W(I(t), a_1), I(t+1) \mid 0) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -I_t^2 / 2\sigma^2 \right)
EM in Pictures

Given the flow for layer 2, warp the first image towards the second and look at the residual error I_t (since the flow is now zero). Now a different set of pixels matches:

p(W(I(t), a_2), I(t+1) \mid 0) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -I_t^2 / 2\sigma^2 \right)
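A sketch of this warp-and-compare step, assuming SciPy is available for bilinear resampling; the helper names and the flow-direction convention are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(I, u, v):
    """Resample image I at positions displaced by the flow field (u, v)."""
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    return map_coordinates(I, [ys + v, xs + u], order=1, mode='nearest')

def pixel_likelihood(I_t0, I_t1, u, v, sigma):
    """Per-pixel Gaussian likelihood of the residual after warping with one layer's flow."""
    residual = I_t1 - warp(I_t0, u, v)       # the "I_t" image once the flow is accounted for
    return np.exp(-residual ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
```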
EM in Pictures

Two "explanations" for each pixel, hence two likelihoods:

p(I(x, t+1) \mid u(a_1))   and   p(I(x, t+1) \mid u(a_2))
EM in Pictures

Compute the total likelihood and normalize:

w_i(x) = \frac{p(I(x, t+1) \mid u(a_i))}{\sum_k p(I(x, t+1) \mid u(a_k))}
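Turning the per-layer likelihoods into ownership weights is then a normalization across layers; this sketch reuses the hypothetical `pixel_likelihood` helper from the previous sketch.

```python
import numpy as np

def ownership_weights(I_t0, I_t1, flows, sigma):
    """w_i(x) = p(I(x,t+1) | u(a_i)) / sum_k p(I(x,t+1) | u(a_k))."""
    likes = np.stack([pixel_likelihood(I_t0, I_t1, u, v, sigma) for (u, v) in flows])
    return likes / likes.sum(axis=0)         # one weight map per layer, summing to 1 at each pixel
```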
Motion segmentation example

• Model an image pair (or video sequence) as consisting of regions of parametric motion; affine motion is popular:

\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} t_x \\ t_y \end{pmatrix} + \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}

• Iterate E/M:
  – determine which pixels belong to which region
  – estimate parameters
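Because the affine parameters enter the linearized brightness-constancy residual linearly, the per-region parameter estimate can be written as one weighted linear least-squares solve. This is a sketch under that assumption, not the specific implementation behind the slides.

```python
import numpy as np

def estimate_affine(w, Ix, Iy, It):
    """Weighted least-squares estimate of affine motion parameters for one region."""
    ys, xs = np.mgrid[0:Ix.shape[0], 0:Ix.shape[1]]
    # Brightness constancy is linear in the parameters p = (a, b, tx, c, d, ty): A @ p ≈ -I_t
    A = np.stack([Ix * xs, Ix * ys, Ix, Iy * xs, Iy * ys, Iy], axis=-1).reshape(-1, 6)
    b = -It.reshape(-1)
    sw = np.sqrt(w.reshape(-1))              # fold the ownership weights into both sides
    p, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return p
```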
Three frames from the MPEG "flower garden" sequence.
Figure from "Representing Moving Images with Layers," by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.

Grey level shows the region number with the highest probability. Segments and the motion fields associated with them.
Figure from "Representing Moving Images with Layers," by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.

If we use multiple frames to estimate the appearance of a segment, we can fill in occlusions; so we can re-render the sequence with some segments removed.
Figure from "Representing Moving Images with Layers," by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.
Lines

• Simple case: we have one line, and n points. Some come from the line, some from "noise".
• We wish to determine
  – the line parameters
  – p(comes from line)
• This is a mixture model:

P(point | line and noise params) = P(point | line) P(comes from line) + P(point | noise) P(comes from noise)
                                 = P(point | line) λ + P(point | noise) (1 − λ)

• e.g.,
  – allocate each point to a line with a weight, which is the probability of the point given the line
  – refit lines to the weighted set of points
Line fitting review

• In the case of a single line and normal i.i.d. errors, maximum likelihood estimation reduces to least squares:

\min_{a,b} \sum_i (a x_i + b - y_i)^2 = \min_{a,b} \sum_i r_i^2

• The line parameters (a, b) are the solution of the system:

\begin{pmatrix} \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i & \sum_i 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i x_i y_i \\ \sum_i y_i \end{pmatrix}
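A sketch of this least-squares fit, solving the 2×2 normal equations directly with NumPy; the names are illustrative.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a*x + b via the 2x2 normal equations."""
    A = np.array([[np.sum(x * x), np.sum(x)],
                  [np.sum(x),     len(x)  ]])
    rhs = np.array([np.sum(x * y), np.sum(y)])
    a, b = np.linalg.solve(A, rhs)
    return a, b
```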
The E Step

• Compute residuals:

r_1(i) = a_1 x_i + b_1 - y_i
r_2(i) = a_2 x_i + b_2 - y_i   (or a constant k, for a uniform noise model)

• Compute soft assignments:

w_1(i) = \frac{e^{-r_1(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}}

w_2(i) = \frac{e^{-r_2(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}}
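The E-step as a few lines of NumPy, assuming Gaussian residual likelihoods with a shared σ; this is a sketch, not the lecture's reference code.

```python
import numpy as np

def e_step(x, y, lines, sigma):
    """Soft assignments w_j(i): posterior that point i belongs to line j = (a_j, b_j)."""
    r = np.stack([a * x + b - y for (a, b) in lines])     # residual of every point to every line
    lik = np.exp(-r ** 2 / (2 * sigma ** 2))              # unnormalized Gaussian likelihoods
    return lik / lik.sum(axis=0)                          # normalize so weights sum to 1 per point
```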
The M Step

The weighted least-squares system is solved for (a_1, b_1):

\begin{pmatrix} \sum_i w_1(i) x_i^2 & \sum_i w_1(i) x_i \\ \sum_i w_1(i) x_i & \sum_i w_1(i) \end{pmatrix} \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} \sum_i w_1(i) x_i y_i \\ \sum_i w_1(i) y_i \end{pmatrix}
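And the matching M-step, again as a sketch: the same 2×2 system as in ordinary least squares, with every sum weighted by w(i).

```python
import numpy as np

def m_step(x, y, w):
    """Weighted least-squares refit of one line, given its per-point weights w(i)."""
    A = np.array([[np.sum(w * x * x), np.sum(w * x)],
                  [np.sum(w * x),     np.sum(w)    ]])
    rhs = np.array([np.sum(w * x * y), np.sum(w * y)])
    return np.linalg.solve(A, rhs)                        # returns (a, b)
```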
The expected values of the deltas at the maximum (notice the one value close to zero).
Closeup of the fit
Issues with EM

• Local maxima
  – can be a serious nuisance in some problems
  – no guarantee that we have reached the "right" maximum
• Starting
  – using k-means to cluster the points is often a good idea
Local maximum
which is an excellent fit to some points
and the deltas for this maximum
Choosing parameters

• What about the noise parameter, and the sigma for the line?
  – several methods:
    • from first-principles knowledge of the problem (seldom really possible)
    • play around with a few examples and choose (usually quite effective, as the precise choice doesn't matter much)
  – notice that if k_n is large, this says that points very seldom come from noise, however far from the line they lie
    • this usually biases the fit, by pushing outliers into the line
  – rule of thumb: it's better to fit to the better-fitting points, within reason; if this is hard to do, then the model could be a problem
Estimating the number of models

• In the weighted scenario, additional models will not necessarily reduce the total error.
• The optimal number of models is a function of the σ parameter – how well we expect the model to fit the data.
• Algorithm: start with many models; redundant models will collapse.
Fitting 2 lines to data points (x_i, y_i)

• Input:
  – data points that were generated by 2 lines with Gaussian noise:
    y = a_1 x + b_1 + σv
    y = a_2 x + b_2 + σv,   v ~ N(0, 1)
• Output:
  – the parameters of the 2 lines
  – the assignment of each point to its line
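A sketch of generating such synthetic data; the sampling range and the fixed random seed are made-up choices, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_two_line_data(n, line1, line2, sigma):
    """Sample n points from each of y = a*x + b + sigma*v, with v ~ N(0, 1)."""
    x = rng.uniform(-1.0, 1.0, size=2 * n)
    label = np.repeat([0, 1], n)                          # which line each point came from
    a = np.where(label == 0, line1[0], line2[0])
    b = np.where(label == 0, line1[1], line2[1])
    y = a * x + b + sigma * rng.standard_normal(2 * n)
    return x, y, label
```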
The E Step

• Compute residuals, assuming the lines are known:

r_1(i) = a_1 x_i + b_1 - y_i
r_2(i) = a_2 x_i + b_2 - y_i

• Compute soft assignments:

w_1(i) = \frac{e^{-r_1(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}}

w_2(i) = \frac{e^{-r_2(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}}
The M Step

• In the weighted case we find

\min_{a,b} \left( \sum_i w_1(i)\, r_1(i)^2 + \sum_i w_2(i)\, r_2(i)^2 \right)

• The weighted least-squares system is solved twice, once for (a_1, b_1) and once for (a_2, b_2):

\begin{pmatrix} \sum_i w_1(i) x_i^2 & \sum_i w_1(i) x_i \\ \sum_i w_1(i) x_i & \sum_i w_1(i) \end{pmatrix} \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} \sum_i w_1(i) x_i y_i \\ \sum_i w_1(i) y_i \end{pmatrix}

\begin{pmatrix} \sum_i w_2(i) x_i^2 & \sum_i w_2(i) x_i \\ \sum_i w_2(i) x_i & \sum_i w_2(i) \end{pmatrix} \begin{pmatrix} a_2 \\ b_2 \end{pmatrix} = \begin{pmatrix} \sum_i w_2(i) x_i y_i \\ \sum_i w_2(i) y_i \end{pmatrix}
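Putting the pieces together, a sketch of the full loop; it reuses the hypothetical `e_step`, `m_step`, `make_two_line_data`, and `rng` helpers from the earlier sketches.

```python
def em_two_lines(x, y, sigma, iters=50):
    """Alternate E and M steps for two lines, starting from a random initialization."""
    lines = [(rng.standard_normal(), rng.standard_normal()) for _ in range(2)]
    for _ in range(iters):
        w = e_step(x, y, lines, sigma)                    # (2, n) soft assignments
        lines = [tuple(m_step(x, y, w[j])) for j in range(2)]
    return lines, w

# Example usage on synthetic data:
x, y, _ = make_two_line_data(100, line1=(1.0, 0.5), line2=(-0.7, -0.2), sigma=0.05)
lines, w = em_two_lines(x, y, sigma=0.05)
```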
Illustrations
Illustration
Illustration: l = log(likelihood)
Color segmentation example

Parameters include the mixing weights and the means/covariances of each component.
EM for Mixture models

If the log-likelihood is linear in the missing variables, we can replace the missing variables with their expectations – e.g., the complete-data log-likelihood of a mixture model.

1. (E-step) estimate the complete data (e.g., the z_j's) using the previous parameters
2. (M-step) maximize the complete-data log-likelihood using the estimated complete data
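For a mixture of Gaussians, the expected value of the indicator z is the posterior responsibility of each component. A sketch of this E-step using SciPy's multivariate normal density; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, means, covs):
    """E-step: E[z_ij] = pi_j N(x_i; mu_j, Sigma_j) / sum_k pi_k N(x_i; mu_k, Sigma_k)."""
    dens = np.stack([pi * multivariate_normal.pdf(X, mean=mu, cov=cov)
                     for pi, mu, cov in zip(pis, means, covs)], axis=1)   # shape (n, G)
    return dens / dens.sum(axis=1, keepdims=True)
```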
Color segmentation with EM

Color segmentation with EM: Initialize
Color segmentation

• At each pixel in an image, we compute a d-dimensional feature vector x, which encapsulates position, colour and texture information.
• Each pixel is generated by one of G segments, each Gaussian, chosen with probability π:
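A sketch of assembling such per-pixel feature vectors; here only position and colour are used, and texture channels would be appended in the same way. The normalization and the specific features are illustrative assumptions, not the slides' exact choice.

```python
import numpy as np

def pixel_features(img):
    """Stack (row, col, R, G, B) into one d-dimensional feature vector per pixel."""
    h, w, _ = img.shape
    rows, cols = np.mgrid[0:h, 0:w]
    feats = np.dstack([rows, cols, img]).reshape(-1, 5).astype(float)
    return (feats - feats.mean(axis=0)) / feats.std(axis=0)   # roughly equalize the channels
```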
Color segmentation with EM: Initialize → E

Color segmentation with EM: Initialize → E → M
E-step
Estimate support maps:
M-step
Update the means, covariances, and mixing coefficients using the support maps:
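A sketch of these updates given the support maps (the per-pixel responsibilities from the E-step): each quantity becomes a responsibility-weighted average. Names and shapes are illustrative.

```python
import numpy as np

def m_step_gmm(X, resp):
    """Weighted updates of mixing coefficients, means, and covariances.

    X: (n, d) pixel feature vectors; resp: (n, G) support maps (responsibilities)."""
    n, d = X.shape
    Nk = resp.sum(axis=0)                         # effective number of pixels per segment
    pis = Nk / n                                  # mixing coefficients
    means = (resp.T @ X) / Nk[:, None]            # weighted means, shape (G, d)
    covs = []
    for j in range(resp.shape[1]):
        diff = X - means[j]
        covs.append((resp[:, j, None] * diff).T @ diff / Nk[j])   # weighted covariance (d, d)
    return pis, means, covs
```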
Segmentation with EM
Model Selection

• We wish to choose a model to fit to the data
  – e.g. is it a line or a circle?
  – e.g. is this a perspective or orthographic camera?
  – e.g. is there an aeroplane there, or is it noise?
• Issue
  – In general, models with more parameters will fit a dataset better, but are poorer at prediction
  – This means we can't simply look at the negative log-likelihood (or fitting error)