In the name of Allah the compassionate, the merciful
Digital Video Systems
S. Kasaei
Room: CE 307, Department of Computer Engineering, Sharif University of Technology
E-Mail: skasaei@sharif.edu
Webpage: http://sharif.edu/~skasaei
Lab. Website: http://mehr.sharif.edu/~ipl
Acknowledgment
Most of the slides used in this course have been provided by Prof. Yao Wang (Polytechnic University, Brooklyn), based on the book Video Processing & Communications by Yao Wang, Jörn Ostermann, & Ya-Qin Zhang, Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105 .2 .W36 2001]
Chapter 6 2-D Motion Estimation Part I: Fundamentals & Basic Techniques
Outline
• 2-D motion vs. optical flow
• Optical flow equation & ambiguity in motion estimation
• General methodologies in motion estimation
• Motion representation
• Motion estimation criterion
• Optimization methods
• Gradient descent methods
• Pixel-based motion estimation
• Block-based motion estimation
• EBMA algorithm
2-D Motion vs. Optical Flow
• 2-D motion: the projection of 3-D motion onto the image plane; it depends on the 3-D object motion & the projection operator (physical aspects).
• Optical flow: the "perceived" 2-D motion, based on changes in the image pattern; it also depends on illumination & object surface texture.
Figure: (a) A sphere rotates under constant ambient illumination, but the observed image does not change, so the 2-D motion is nonzero while the optical flow is zero. (b) A point light source rotates around a stationary sphere, causing the highlight point on the sphere to rotate, so the optical flow is nonzero while the 2-D motion is zero.
Correspondence & Optical Flow
• The 2-D displacement & velocity fields are projections of the respective 3-D fields onto the image plane.
• The correspondence & optical flow fields are the displacement & velocity functions perceived from the time-varying image intensity pattern.
Correspondence & Optical Flow
• The correspondence field & the optical flow field are also called the "apparent 2-D displacement" field & the "apparent 2-D velocity" field.
• Since we can only observe the correspondence & optical flow fields, we assume that they are the same as the 2-D motion field.
Optical Flow Equation
• When the illumination condition is unknown, the best one can do is to estimate the optical flow.
• The constant intensity assumption (CIA) leads to the optical flow (OF) equation.
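Optical Flow Equation
Under the "constant intensity assumption":
$$\psi(x + d_x,\, y + d_y,\, t + d_t) = \psi(x, y, t)$$
But, using Taylor's expansion (to first order):
$$\psi(x + d_x,\, y + d_y,\, t + d_t) \approx \psi(x, y, t) + \frac{\partial \psi}{\partial x} d_x + \frac{\partial \psi}{\partial y} d_y + \frac{\partial \psi}{\partial t} d_t$$
Comparing the above two, we have the optical flow equation:
$$\frac{\partial \psi}{\partial x} d_x + \frac{\partial \psi}{\partial y} d_y + \frac{\partial \psi}{\partial t} d_t = 0 \quad \text{or} \quad \frac{\partial \psi}{\partial x} v_x + \frac{\partial \psi}{\partial y} v_y + \frac{\partial \psi}{\partial t} = 0 \quad \text{or} \quad \nabla \psi^T \mathbf{v} + \frac{\partial \psi}{\partial t} = 0$$
where $\mathbf{v} = (v_x, v_y)^T = (d_x / d_t,\, d_y / d_t)^T$ is the velocity (flow) vector.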
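As a quick numerical check of the optical flow equation, the sketch below builds a synthetic pattern that translates with a known velocity and evaluates the left-hand side with central differences. The pattern `psi`, the velocity values, and the helper names are all hypothetical choices made for illustration; they are not from the slides.

```python
import math

def psi(x, y, t, vx=0.5, vy=-0.25):
    """Hypothetical intensity pattern translating with velocity (vx, vy)."""
    return math.sin(0.3 * (x - vx * t)) + math.cos(0.2 * (y - vy * t))

def partials(f, x, y, t, h=1e-4):
    """Central-difference estimates of (dpsi/dx, dpsi/dy, dpsi/dt)."""
    fx = (f(x + h, y, t) - f(x - h, y, t)) / (2 * h)
    fy = (f(x, y + h, t) - f(x, y - h, t)) / (2 * h)
    ft = (f(x, y, t + h) - f(x, y, t - h)) / (2 * h)
    return fx, fy, ft

def of_residual(f, x, y, t, vx, vy):
    """Left-hand side of the OF equation: psi_x * vx + psi_y * vy + psi_t."""
    fx, fy, ft = partials(f, x, y, t)
    return fx * vx + fy * vy + ft

# The true velocity gives a residual near zero; a wrong velocity does not.
r_true = of_residual(psi, 3.0, 4.0, 1.0, 0.5, -0.25)
r_wrong = of_residual(psi, 3.0, 4.0, 1.0, 1.0, -0.25)
```

Because the pattern is a pure translation under constant intensity, the residual vanishes (up to finite-difference error) only when the tested velocity is consistent with the true one along the gradient direction.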
Ambiguities in Motion Estimation
• The optical flow equation only constrains the flow vector in the gradient direction ($v_n$).
• The flow vector in the tangent direction ($v_t$) is under-determined.
• In regions with constant brightness ($\nabla \psi = 0$), the flow is indeterminate -> Motion estimation is unreliable in regions with flat texture, & more reliable near edges.
$$\mathbf{v} = v_n \mathbf{e}_n + v_t \mathbf{e}_t$$
$$v_n \|\nabla \psi\| + \frac{\partial \psi}{\partial t} = 0$$
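The ambiguity can be illustrated with a few lines of arithmetic. The sketch below computes the normal flow $v_n = -(\partial\psi/\partial t) / \|\nabla\psi\|$ from given derivative values and shows that any tangent component leaves the OF equation satisfied. The function names and the sample derivative values are hypothetical.

```python
import math

def normal_flow(gx, gy, gt, eps=1e-12):
    """Return (v_n, e_n) for spatial gradient (gx, gy) and temporal derivative gt.

    Returns None when the gradient vanishes (flat region): the flow is
    indeterminate there.
    """
    mag = math.hypot(gx, gy)
    if mag < eps:
        return None
    return -gt / mag, (gx / mag, gy / mag)

def of_lhs(gx, gy, gt, vx, vy):
    """Left-hand side of the OF equation for a candidate flow (vx, vy)."""
    return gx * vx + gy * vy + gt

gx, gy, gt = 3.0, 4.0, -10.0           # illustrative derivative values
v_n, (enx, eny) = normal_flow(gx, gy, gt)
etx, ety = -eny, enx                   # unit tangent, orthogonal to the gradient

# Any tangent component v_t gives a flow that still satisfies the equation.
residuals = [of_lhs(gx, gy, gt, v_n * enx + v_t * etx, v_n * eny + v_t * ety)
             for v_t in (0.0, 1.0, -3.0)]
```

Here `v_n` is fixed by the data, while `v_t` can take any value without changing the residual, which is why additional constraints (smoothness, block-wise constancy) are needed.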
General Considerations for Motion Estimation
• Two categories of approaches:
  ◦ Feature-based: more often used in object tracking & 3-D reconstruction from 2-D.
  ◦ Intensity-based: based on the constant intensity assumption. More often used for motion-compensated prediction (required in video coding) & frame interpolation -> Our focus.
General Considerations for Motion Estimation
• Three important questions:
  ◦ How to represent the motion field?
  ◦ What criteria to use to estimate the motion parameters?
  ◦ How to search for the motion parameters?
Motion Representation
• Global: the entire motion field is represented by a few global parameters (camera motion).
• Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs.
• Block-based: the entire frame is divided into blocks, and the motion in each block is characterized by a few parameters.
• Region-based: the entire frame is divided into regions, each region corresponding to an object or sub-object with consistent motion, represented by a few parameters.
• Other representation: mesh-based (control grid) (to be discussed later).
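Notations
• Anchor frame: $\psi_1(\mathbf{x})$
• Target frame: $\psi_2(\mathbf{x})$
• Motion parameters: $\mathbf{a}$
• Motion vector at a pixel in the anchor frame: $\mathbf{d}(\mathbf{x})$
• Motion field: $\mathbf{d}(\mathbf{x}; \mathbf{a}),\ \mathbf{x} \in \Lambda$
• Mapping function: $\mathbf{w}(\mathbf{x}; \mathbf{a}) = \mathbf{x} + \mathbf{d}(\mathbf{x}; \mathbf{a}),\ \mathbf{x} \in \Lambda$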
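Motion Estimation Criterion
• To minimize the displaced frame difference (DFD):
$$E_{DFD}(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \left| \psi_2(\mathbf{x} + \mathbf{d}(\mathbf{x}; \mathbf{a})) - \psi_1(\mathbf{x}) \right|^p \rightarrow \min$$
  $p = 1$: MAD; $p = 2$: MSE
• To satisfy the optical flow equation:
$$E_{OF}(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \left| \nabla \psi_1(\mathbf{x})^T \mathbf{d}(\mathbf{x}; \mathbf{a}) + \psi_2(\mathbf{x}) - \psi_1(\mathbf{x}) \right|^p \rightarrow \min$$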
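The DFD criterion can be sketched for the simplest motion model: a single integer translation applied to the whole frame. The function name, the tiny synthetic frames, and the choice to skip pixels whose displaced position falls outside the target frame are all assumptions made for illustration.

```python
def dfd_error(anchor, target, d, p=1):
    """Sum of |target(x + d) - anchor(x)|^p over the anchor frame.

    `anchor` and `target` are lists of rows; `d = (dx, dy)` is an integer
    displacement. Pixels that map outside the target frame are skipped
    (an assumed boundary rule).
    """
    dx, dy = d
    h, w = len(anchor), len(anchor[0])
    err = 0.0
    for y in range(h):
        for x in range(w):
            xt, yt = x + dx, y + dy
            if 0 <= xt < w and 0 <= yt < h:
                err += abs(target[yt][xt] - anchor[y][x]) ** p
    return err

# Synthetic 4x5 anchor frame and a target shifted one pixel to the right.
anchor = [[x * x + 10 * y for x in range(5)] for y in range(4)]
target = [[anchor[y][max(x - 1, 0)] for x in range(5)] for y in range(4)]

e_true = dfd_error(anchor, target, (1, 0), p=1)  # displacement matches the shift
e_zero = dfd_error(anchor, target, (0, 0), p=1)  # no motion compensation
```

Note that skipping out-of-frame pixels means different candidates are summed over slightly different pixel sets; a practical implementation would typically normalize by the number of pixels used or pad the frame.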
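Motion Estimation Criterion
• To impose an additional smoothness constraint using a regularization technique (important in pixel- & block-based representations):
$$E_s(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \sum_{\mathbf{y} \in N_{\mathbf{x}}} \left\| \mathbf{d}(\mathbf{x}; \mathbf{a}) - \mathbf{d}(\mathbf{y}; \mathbf{a}) \right\|^2$$
$$w_{DFD} E_{DFD}(\mathbf{a}) + w_s E_s(\mathbf{a}) \rightarrow \min$$
• Bayesian (MAP) criterion: to maximize the a posteriori probability:
$$P(D = \mathbf{d} \mid \psi_2, \psi_1) \rightarrow \max$$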
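A minimal sketch of the smoothness term, assuming a dense field of MVs stored as rows of `(dx, dy)` tuples and a 4-connected neighborhood for $N_{\mathbf{x}}$. The neighborhood choice, the function names, and the default weights are assumptions, not values from the slides.

```python
def smoothness_error(field):
    """E_s: sum over pixels x and their 4-neighbors y of ||d(x) - d(y)||^2.

    Each neighboring pair is visited from both sides, matching the double
    sum over x and over y in N_x.
    """
    rows, cols = len(field), len(field[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            dx, dy = field[r][c]
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    ex, ey = field[nr][nc]
                    total += (dx - ex) ** 2 + (dy - ey) ** 2
    return total

def regularized_error(e_dfd, e_s, w_dfd=1.0, w_s=0.1):
    """Weighted sum w_DFD * E_DFD + w_s * E_s (weights are illustrative)."""
    return w_dfd * e_dfd + w_s * e_s

uniform = [[(1, 2)] * 3 for _ in range(3)]      # constant field: E_s = 0
stepped = [[(0, 0), (1, 0)], [(0, 0), (1, 0)]]  # one vertical motion edge
```

A constant field incurs no penalty, while every discontinuity between adjacent MVs adds to the cost; the weight `w_s` controls how strongly such discontinuities are discouraged.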
Relation Among Different Criteria
• The OF criterion is good only if the motion is small.
• The OF criterion can often yield a closed-form solution, as the objective function is quadratic in the MVs.
• When the motion is not small, one can iterate the solution based on the OF criterion to satisfy the DFD criterion.
• The Bayesian criterion can be reduced to the DFD criterion plus a motion smoothness constraint.
• More in the textbook.
[DFD: displaced frame difference]
Optimization Methods
• Exhaustive search:
  ◦ Typically used for the DFD criterion with p = 1 (MAD).
  ◦ Guarantees reaching the global optimum.
  ◦ The required computation may be unacceptable when the number of parameters to search simultaneously is large!
  ◦ Fast search algorithms reach a sub-optimal solution in a shorter time.
Optimization Methods
• Gradient-based search:
  ◦ Typically used for the DFD or OF criterion with p = 2 (MSE).
  ◦ The gradient can often be calculated analytically.
  ◦ When used with the OF criterion, a closed-form solution may be obtained.
  ◦ Reaches the local optimum closest to the initial solution.
• Multi-resolution search:
  ◦ Searches from coarse to fine resolution; faster than exhaustive search.
  ◦ Helps avoid being trapped in a local minimum.
Gradient Descent Method
• Iteratively updates the current estimate in the direction opposite to the gradient direction.
Figure panels: a poor initial estimate; a good initial estimate; an appropriate stepsize; a stepsize that is too big.
Gradient Descent Method
• The solution depends on the initial condition: the method reaches the local minimum closest to the initial condition.
• Choice of stepsize:
  ◦ Fixed stepsize: the stepsize must be small to avoid oscillation, which requires many iterations.
  ◦ Steepest gradient descent: adjusts the stepsize optimally.
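The effect of the stepsize can be seen on a small quadratic objective. The sketch below uses a fixed stepsize; the objective, the starting point, and the two stepsize values are hypothetical choices for illustration.

```python
def gradient_descent(grad, a0, step, iters):
    """Fixed-stepsize gradient descent: a <- a - step * grad(a)."""
    a = list(a0)
    for _ in range(iters):
        g = grad(a)
        a = [ai - step * gi for ai, gi in zip(a, g)]
    return a

def grad_quadratic(a):
    """Gradient of E(a) = (a0 - 1)^2 + 4 * (a1 + 2)^2, minimized at (1, -2)."""
    return [2 * (a[0] - 1), 8 * (a[1] + 2)]

# A small stepsize converges, but needs many iterations.
a_small = gradient_descent(grad_quadratic, [0.0, 0.0], step=0.1, iters=200)

# A stepsize that is too big overshoots along the steep a1 axis:
# the error is multiplied by (1 - 8 * 0.3) = -1.4 at every iteration.
a_big = gradient_descent(grad_quadratic, [0.0, 0.0], step=0.3, iters=20)
```

With `step=0.1` the estimate settles at the minimum; with `step=0.3` the second coordinate oscillates with growing amplitude, which is the "too big stepsize" case.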
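Newton’s Method
• Newton’s method:
$$\mathbf{a}^{(l+1)} = \mathbf{a}^{(l)} - \alpha \left[ \mathbf{H}(\mathbf{a}^{(l)}) \right]^{-1} \nabla E(\mathbf{a}^{(l)})$$
  where $\mathbf{H}$ is the Hessian (2nd-order derivative matrix) of the objective function $E$ and $\alpha$ is the stepsize.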
Newton’s Method
• Converges faster than the 1st-order method (i.e., requires fewer iterations to reach convergence).
• Requires more calculations in each iteration.
• More prone to noise (gradient calculation is subject to noise, more so with 2nd order than with 1st order).
• May not converge if the stepsize α >= 1. α should be chosen appropriately to reach a good compromise between guaranteeing convergence & the convergence rate.
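A minimal two-parameter sketch of a damped Newton iteration, assuming the standard update a <- a - alpha * H^{-1} * grad E. The quadratic objective, the helper names, and the alpha values are hypothetical and chosen only so the result can be checked by hand.

```python
def grad_e(a):
    """Gradient of E(a) = (a0 - 1)^2 + 4 * (a1 + 2)^2 + a0 * a1."""
    return [2 * (a[0] - 1) + a[1], 8 * (a[1] + 2) + a[0]]

def hess_e(a):
    """Hessian of E; constant because E is quadratic."""
    return [[2.0, 1.0], [1.0, 8.0]]

def newton(grad, hess, a0, alpha=1.0, iters=1):
    """Damped Newton iteration for two parameters."""
    a = list(a0)
    for _ in range(iters):
        g0, g1 = grad(a)
        (h00, h01), (h10, h11) = hess(a)
        det = h00 * h11 - h01 * h10
        # Solve H * s = g with the explicit 2x2 inverse.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h10 * g0 + h00 * g1) / det
        a = [a[0] - alpha * s0, a[1] - alpha * s1]
    return a

a_full = newton(grad_e, hess_e, [0.0, 0.0], alpha=1.0, iters=1)
a_damped = newton(grad_e, hess_e, [0.0, 0.0], alpha=0.5, iters=50)
```

For this exactly quadratic objective a single full step lands on the minimum (32/15, -34/15), while the damped version halves the error at each iteration. For non-quadratic objectives, such as the DFD error, the Hessian changes with the estimate and a smaller alpha trades convergence speed for stability.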
Newton-Raphson Method
• Newton-Raphson method:
  ◦ Approximates the 2nd-order gradient (Hessian) with a product of 1st-order gradients.
  ◦ Applicable when the objective function is a sum of squared errors.
  ◦ Only needs to calculate 1st-order gradients, yet converges at a rate similar to Newton’s method.
Newton-Raphson Method
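The approximation described for this method (replacing the Hessian of a sum of squared errors with a product of 1st-order gradients) is commonly known elsewhere as the Gauss-Newton method. The sketch below applies it to a one-parameter problem: estimating the shift between two 1-D signals by minimizing the sum of squared displaced differences. The signals, the true shift `D_TRUE`, and all function names are hypothetical.

```python
import math

D_TRUE = 0.7  # hypothetical ground-truth shift, used only to build psi2

def psi1(x):
    """Anchor signal."""
    return math.sin(0.5 * x)

def psi2(x):
    """Target signal: psi2(x + D_TRUE) equals psi1(x)."""
    return math.sin(0.5 * (x - D_TRUE))

def dpsi2(x):
    """Analytic derivative of psi2."""
    return 0.5 * math.cos(0.5 * (x - D_TRUE))

def estimate_shift(xs, d0=0.0, alpha=1.0, iters=10):
    """Minimize sum_k e_k(d)^2 with e_k(d) = psi2(x_k + d) - psi1(x_k).

    Gradient: 2 * sum(e_k * e_k'); approximate Hessian: 2 * sum(e_k'^2),
    so only 1st-order derivatives are needed.
    """
    d = d0
    for _ in range(iters):
        grad = 0.0
        hess = 0.0
        for x in xs:
            e = psi2(x + d) - psi1(x)
            de = dpsi2(x + d)
            grad += 2 * e * de
            hess += 2 * de * de
        d -= alpha * grad / hess
    return d

d_hat = estimate_shift(range(20))
```

In this example the starting point is close enough to the true shift that a full step (`alpha=1.0`) converges in a few iterations; with larger motion or noisy data a smaller `alpha` may be needed.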
Pixel-Based Motion Estimation
• Horn-Schunck method:
  ◦ OF + smoothness criterion.
• Multipoint neighborhood method:
  ◦ Assumes that every pixel in a small block surrounding a pixel has the same MV.
• Pel-recursive method:
  ◦ The MV for the current pel is updated from those of its previous pels, so that the MV does not need to be coded.
  ◦ Developed for early generations of video coders.
Multipoint Neighborhood Method
• Estimates the MV at each pixel independently, by minimizing the DFD error over a neighborhood surrounding this pixel.
• Every pixel in the neighborhood is assumed to have the same MV.
• Minimizing function:
$$E_{DFD}(\mathbf{d}_n) = \sum_{\mathbf{x} \in B(\mathbf{x}_n)} w(\mathbf{x}) \left| \psi_2(\mathbf{x} + \mathbf{d}_n) - \psi_1(\mathbf{x}) \right|^2 \rightarrow \min$$
Multipoint Neighborhood Method
• Optimization method:
  ◦ Exhaustive search (feasible, as one only needs to search one MV at a time).
    Needs to select the appropriate search range & the search step-size.
  ◦ Gradient-based method.
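The exhaustive-search variant can be sketched as follows, assuming integer-pel candidates, uniform weights w(x) = 1, and a square neighborhood. The function name, the parameter names `nbr` and `search`, and the synthetic frames are hypothetical.

```python
def multipoint_mv(anchor, target, xn, yn, nbr=1, search=2):
    """Exhaustive integer search for the MV at pixel (xn, yn).

    Minimizes the squared DFD over a (2*nbr+1) x (2*nbr+1) neighborhood,
    assuming every pixel in it shares the same MV. The caller must keep
    the neighborhood plus search range inside both frames.
    """
    best_d, best_err = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            err = 0.0
            for y in range(yn - nbr, yn + nbr + 1):
                for x in range(xn - nbr, xn + nbr + 1):
                    diff = target[y + dy][x + dx] - anchor[y][x]
                    err += diff * diff  # uniform weights w(x) = 1
            if err < best_err:
                best_d, best_err = (dx, dy), err
    return best_d, best_err

def f(x, y):
    """Hypothetical smooth texture used to synthesize both frames."""
    return x * x + 3 * y * y + x * y

# Target frame is the anchor shifted by (1, 2): target(x + d) = anchor(x).
anchor = [[f(x, y) for x in range(12)] for y in range(12)]
target = [[f(x - 1, y - 2) for x in range(12)] for y in range(12)]

d_hat, err = multipoint_mv(anchor, target, 5, 5)
```

The search range and step-size (here integer steps within plus or minus 2 pixels) bound both the cost and the largest motion that can be recovered; a half-pel search would require interpolating the target frame.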