Super-resolution using Gaussian Process Regression
Final Year Project Interim Report
He He
Department of Electronic and Information Engineering
The Hong Kong Polytechnic University
December 30, 2010
Outline
1 Introduction
2 Gaussian Process Regression
  Multivariate Normal Distribution
  Gaussian Process Regression
  Training
3 GPR for Super-resolution
  Framework
  Covariance Function
The goal of super-resolution (SR) is to estimate a high-resolution (HR) image from one or a set of low-resolution (LR) images. It is widely applied in face recognition, medical imaging, HDTV, etc.
Figure: Face recognition in video.
Figure: Super-resolution in medical imaging.
Super-resolution Methods
Interpolation-based methods: fast, but the HR image is usually blurred. E.g., bicubic interpolation, NEDI.
Learning-based methods: hallucinate textures from an HR/LR image pair database.
Reconstruction-based methods: formulate an optimization problem constrained by the LR image with various priors.
Multivariate Normal Distribution: Definition
A random vector X = (X_1, X_2, ..., X_p) is said to be multivariate normally (MVN) distributed if every linear combination of its components Y = a^T X has a univariate normal distribution. Real-world random variables can often be approximated as following a multivariate normal distribution.
The probability density function of X is
f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)    (1)
where µ is the mean of X and Σ is the covariance matrix.
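As a quick check of Eq. (1), the density can be evaluated directly with NumPy. This is a minimal sketch; the function name mvn_pdf is an illustrative choice and not part of the report.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density of Eq. (1) at a point x."""
    p = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / norm_const

# Example: the bivariate case on the next slide (mu = [1, 1], Sigma = I)
print(mvn_pdf(np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.eye(2)))
```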
Multivariate Normal Distribution: Example
Bivariate normal distribution with
\mu = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
Multivariate Normal Distribution: Property 1
The joint distribution of two MVN random variables is also an MVN distribution.
Given X_1 ∼ N(µ_1, Σ_11), X_2 ∼ N(µ_2, Σ_22) and X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, we have X ∼ N_p(µ, Σ) with
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.
Multivariate Normal Distribution: Property 2
The conditional distributions of the components of an MVN are (multivariate) normal. The distribution of X_1, given that X_2 = x_2, is normal and has
Mean = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)    (2)
Covariance = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}    (3)
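The conditional mean and covariance of Eqs. (2) and (3) translate directly into a few lines of linear algebra. A minimal NumPy sketch, with an illustrative helper name, assuming the block means and covariances are given as arrays:

```python
import numpy as np

def conditional_mvn(mu1, mu2, S11, S12, S21, S22, x2):
    """Mean and covariance of X1 given X2 = x2, following Eqs. (2) and (3)."""
    solve = np.linalg.solve                      # avoids forming Sigma_22^{-1} explicitly
    cond_mean = mu1 + S12 @ solve(S22, x2 - mu2)
    cond_cov = S11 - S12 @ solve(S22, S21)
    return cond_mean, cond_cov
```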
Gaussian Process: Definition
A Gaussian process (GP) defines a distribution over functions f, where f is a mapping from the input space X to R, such that for any finite subset of X the marginal distribution P(f(x_1), f(x_2), ..., f(x_n)) is a multivariate normal distribution:
f | X ∼ N(m(X), K(X, X))    (4)
where
X = {x_1, x_2, ..., x_n}    (5)
m(x) = E[f(x)]    (6)
k(x_i, x_j) = E[(f(x_i) - m(x_i))(f(x_j) - m(x_j))]    (7)
and K(X, X) denotes the covariance matrix such that K_{ij} = k(x_i, x_j).
Gaussian Process
Formally, we write the Gaussian process as
f(x) ∼ GP(m(x), k(x_i, x_j))    (8)
Without loss of generality, the mean is usually taken to be zero.
The GP is parameterized by the mean function m(x) and the covariance function k(x_i, x_j).
Inference is carried out directly in function space.
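To make the function-space view concrete, one can draw sample functions from a zero-mean GP prior. The sketch below uses the squared exponential covariance that appears later in Eq. (29); the helper name se_kernel and the jitter term are illustrative assumptions.

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared exponential covariance between two sets of 1-D inputs (see Eq. (29))."""
    d = X1[:, None] - X2[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / ell**2)

# Draw three sample functions from the zero-mean GP prior f | X ~ N(0, K(X, X))
X = np.linspace(0.0, 5.0, 100)
K = se_kernel(X, X)
jitter = 1e-9 * np.eye(len(X))            # keeps the covariance numerically positive definite
samples = np.random.multivariate_normal(np.zeros(len(X)), K + jitter, size=3)
```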
Gaussian Process Regression
Model:
f(x) ∼ GP(m(x), k(x_i, x_j))    (9)
Given the test inputs X_*, the prior on the test outputs f_* is
f_* ∼ N(0, K(X_*, X_*))    (10)
Under the Gaussian prior, the joint distribution of the training outputs f and the test outputs f_* is
\begin{bmatrix} f \\ f_* \end{bmatrix} \sim N\left( 0, \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right).    (11)
Noisy Model
In reality, we do not have access to true function values but rather noisy observations. Assuming independent, identically distributed noise, we have the noisy model
y = f(x) + ε,  ε ∼ N(0, \sigma_n^2)    (12)
f(x) ∼ GP(m(x), K(X, X))    (13)
Var(y) = Var(f(x)) + Var(ε) = K(X, X) + \sigma_n^2 I    (14)
Thus, the joint distribution for prediction is
\begin{bmatrix} y \\ f_* \end{bmatrix} \sim N\left( 0, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right).    (15)
Prediction
Referring to the previous property of the conditional distribution, we can obtain
f_* ∼ N(\bar{f}_*, V(f_*))    (16)
\bar{f}_* = K(X_*, X) [K(X, X) + \sigma_n^2 I]^{-1} y    (17)
V(f_*) = K(X_*, X_*) - K(X_*, X) [K(X, X) + \sigma_n^2 I]^{-1} K(X, X_*)    (18)
where y are the training outputs and f_* are the test outputs, which are predicted by the mean \bar{f}_*.
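A minimal NumPy sketch of the predictive equations (17) and (18), reusing the se_kernel helper sketched earlier; gp_predict is an illustrative name and the default noise level is an assumed example value.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, kernel, sigma_n=0.1):
    """Predictive mean and covariance of Eqs. (17) and (18)."""
    K = kernel(X_train, X_train) + sigma_n**2 * np.eye(len(X_train))   # K(X, X) + sigma_n^2 I
    K_s = kernel(X_test, X_train)                                      # K(X_*, X)
    K_ss = kernel(X_test, X_test)                                      # K(X_*, X_*)
    alpha = np.linalg.solve(K, y_train)                                # [K + sigma_n^2 I]^{-1} y
    mean = K_s @ alpha                                                 # Eq. (17)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)                       # Eq. (18)
    return mean, cov
```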
Marginal Likelihood
GPR model:
y = f + ε    (19)
f ∼ GP(m(x), K)    (20)
ε ∼ N(0, \sigma_n^2 I)    (21)
y is an n-dimensional vector of observations. Without loss of generality, let m(x) = 0. Thus y | X follows a normal distribution with
E(y | X) = 0    (22)
Var(y | X) = K(X, X) + \sigma_n^2 I    (23)
Marginal Likelihood
Let K_y = Var(y | X). Then
p(y | X) = \frac{1}{(2\pi)^{n/2} |K_y|^{1/2}} \exp\left( -\frac{1}{2} y^T K_y^{-1} y \right)    (24)
The log marginal likelihood is
L = \log p(y | X) = -\frac{1}{2} y^T K_y^{-1} y - \frac{1}{2} \log |K_y| - \frac{n}{2} \log 2\pi    (25)
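Eq. (25) is typically evaluated through a Cholesky factorization of K_y rather than an explicit inverse. A minimal NumPy sketch under that standard assumption; the function name is illustrative.

```python
import numpy as np

def log_marginal_likelihood(X, y, kernel, sigma_n):
    """Log marginal likelihood of Eq. (25) for a zero-mean GP."""
    n = len(y)
    K_y = kernel(X, X) + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(K_y)                          # K_y = L L^T (numerically stable)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K_y^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # log |K_y|
    return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)
```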
Maximum a posteriori
Matrix derivatives:
\frac{\partial}{\partial \theta_i} Y^{-1} = -Y^{-1} \frac{\partial Y}{\partial \theta_i} Y^{-1}    (26)
\frac{\partial}{\partial \theta_i} \log |Y| = \mathrm{tr}\left( Y^{-1} \frac{\partial Y}{\partial \theta_i} \right)    (27)
Gradient ascent:
\frac{\partial L}{\partial \theta_i} = \frac{1}{2} y^T K^{-1} \frac{\partial K}{\partial \theta_i} K^{-1} y - \frac{1}{2} \mathrm{tr}\left( K^{-1} \frac{\partial K}{\partial \theta_i} \right)    (28)
where \partial K / \partial \theta_i is the matrix of element-wise derivatives.
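Eq. (28) gives the gradient used in each ascent step. A minimal sketch, assuming the current kernel matrix and its element-wise derivative with respect to one hyperparameter are already available; the function name lml_gradient is illustrative.

```python
import numpy as np

def lml_gradient(y, K_y, dK_dtheta):
    """Gradient of the log marginal likelihood w.r.t. one hyperparameter, Eq. (28).

    K_y       : K(X, X) + sigma_n^2 I at the current hyperparameters
    dK_dtheta : element-wise derivative of K_y w.r.t. the hyperparameter theta_i
    """
    K_inv = np.linalg.inv(K_y)
    alpha = K_inv @ y
    return 0.5 * alpha @ dK_dtheta @ alpha - 0.5 * np.trace(K_inv @ dK_dtheta)
```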
Graphical Representation
Model: y = f(x) + ε
Squares: observed pixels.
Circles: unknown Gaussian field.
Inputs (x): neighbors (predictors) of the target pixel.
Outputs (y): pixel at the center of each 3×3 patch.
Thick horizontal line: a set of fully connected nodes.
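To illustrate how such input/output pairs could be collected, the sketch below forms one training pair per interior 3×3 patch, taking the 8 neighbors as the input vector and the center pixel as the target. This is a minimal NumPy sketch of the idea described on this slide, not the report's actual implementation.

```python
import numpy as np

def patch_training_pairs(image):
    """Build (neighbors, center) training pairs from every interior 3x3 patch."""
    H, W = image.shape
    X, y = [], []
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = image[i - 1:i + 2, j - 1:j + 2].flatten()
            X.append(np.delete(patch, 4))   # the 8 neighbors (drop the center, index 4)
            y.append(patch[4])              # the center pixel is the regression target
    return np.array(X), np.array(y)
```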
Workflow
Stage 1: interpolation
- Input LR patch.
- Sample training targets.
- SR based on bicubic interpolation.
Stage 2: deblurring
- Sample training targets.
- Obtain neighbors from the downsampled patch.
- SR based on the simulated blurring process.
Covariance Function
The covariance function defines the similarity between two points (vectors) and determines the underlying distribution of functions in the GP.
Squared exponential covariance function:
k(x_i, x_j) = \sigma_f^2 \exp\left( -\frac{(x_i - x_j)^T (x_i - x_j)}{2 \ell^2} \right)    (29)
\sigma_f^2 represents the signal variance and ℓ defines the characteristic length scale.
Given an image I, the covariance between two pixels I_{i,j} and I_{m,n} is calculated as k(I_{(i,j),N}, I_{(m,n),N}), where N denotes the 8 nearest pixels around the pixel. Therefore, the similarity is based on the Euclidean distance between the pixels' neighborhoods.
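A minimal sketch of this neighborhood-based covariance, assuming grayscale pixels at interior coordinates; the helper name pixel_covariance and the default hyperparameter values are illustrative.

```python
import numpy as np

def pixel_covariance(image, p, q, sigma_f=1.0, ell=1.0):
    """SE covariance of Eq. (29) between pixels p = (i, j) and q = (m, n),
    computed on their 8-pixel neighborhoods (both pixels must be interior)."""
    def neighbors(i, j):
        patch = image[i - 1:i + 2, j - 1:j + 2].flatten()
        return np.delete(patch, 4)          # drop the center pixel, keep the 8 neighbors
    d = neighbors(*p) - neighbors(*q)
    return sigma_f**2 * np.exp(-0.5 * (d @ d) / ell**2)
```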
Covariance Function
(a) Test point  (b) Training patch  (c) Covariance matrix
Local similarity: high responses (red regions) in the training patch are concentrated on edges.
Global similarity: high-response regions also include other similar edges within the patch.
Conclusion: pixels whose neighborhoods are embedded in a structure similar to that of the target pixel tend to receive higher weights during prediction.
Hyperparameter Adaptation
Hyperparameters:
\sigma_f^2: signal variance
\sigma_n^2: noise variance
ℓ: characteristic length scale
(a) Test  (b) Training  (c) ℓ = 0.50, σ_n = 0.01  (d) ℓ = 0.05, σ_n = 0.001  (e) ℓ = 1.65, σ_n = 0.14
(c): MAP estimation
(d): Quickly varying field with low noise
(e): Slowly varying field with high noise