Image Restoration from a Machine Learning Perspective
September 2012, NIST
Dianne P. O'Leary
© 2012
Image Restoration Using Machine Learning
Dianne P. O'Leary
Applied and Computational Mathematics Division, NIST
Computer Science Dept. and Institute for Advanced Computer Studies, University of Maryland
oleary@cs.umd.edu
http://www.cs.umd.edu/users/oleary
Joint work with Julianne Chung (Virginia Tech) and Matthias Chung (Virginia Tech).
Support from NIST and NSF.
The Problem
• Focus on numerical solution of ill-posed problems.
• In particular, we try to reconstruct a clear image from a blurred one.
• Focus on methods that take advantage of the singular value decomposition (SVD) of a matrix (spectral methods).
Goal of our work: to achieve better solutions than previously obtained from the SVD.
Ingredients:
• Exploiting training data.
• Using Bayesian estimation.
• Designing optimal filters.
Note: I'll focus in this talk on methods that take advantage of having the full SVD available, but our methods can exploit the savings of using iterative methods as well.
The Problem
We have m observations b_i resulting from convolution of a blurring function with a true image. We model this as a linear system
    b = A x_true + δ,
where b ∈ R^m is the vector of observed data, x_true ∈ R^n is an unknown vector containing values of x(t_j), the matrix A ∈ R^{m×n}, m ≥ n, is known, and δ ∈ R^m represents noise in the data.
Goal: compute an approximation of x_true, given b and A.
In other words: we need to learn the mapping between blurred images and true ones.
Problem characteristics
This is a discretization of an ill-posed inverse problem, meaning that small perturbations in the data may result in large errors in the solution.
Example
Suppose we have taken a picture but our lens gives us some Gaussian blur (figures: a single bright pixel, and the blurred pixel).
We construct the matrix A from the blurred image. Our problem becomes
    min_x ‖b − Ax‖_2^2.
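To make the example concrete, here is a minimal Python/NumPy sketch (not code from the talk) that builds a small one-dimensional Gaussian blurring matrix A, blurs a single bright pixel, adds noise, and shows that the naive least-squares solution is useless; the problem size, blur width, and noise level are illustrative choices.

```python
import numpy as np

# Build a small 1-D Gaussian blurring matrix (illustrative parameters).
n = 64
sigma_blur = 2.0
t = np.arange(n)
A = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * sigma_blur ** 2))
A /= A.sum(axis=1, keepdims=True)          # normalize each row of the blur

# A "true" signal: a single bright pixel.
x_true = np.zeros(n)
x_true[n // 2] = 1.0

# Blur it and add a small amount of noise.
rng = np.random.default_rng(0)
b = A @ x_true + 1e-3 * rng.standard_normal(n)

# The naive (unregularized) least-squares reconstruction amplifies the noise enormously.
x_naive = np.linalg.lstsq(A, b, rcond=None)[0]
print("cond(A):", np.linalg.cond(A))
print("relative error of naive solution:",
      np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true))
```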
Can we deblur this image?
Remedy
We regularize our problem by using extra information we have about the solution. For example,
• We may have a bound on ‖x‖_1 or ‖x‖_2.
• We may know that 0 ≤ x, and we may have upper bounds, too.
Example, continued
Suppose we replace our problem Ax = b by
    min_x ‖b − Ax‖_2^2   subject to   ‖x‖_2 ≤ β.
This formulation is called Tikhonov regularization. Using a Lagrange multiplier λ, this problem becomes
    max_λ min_x ‖b − Ax‖_2^2 + λ(‖x‖_2 − β).
Write the solution to this problem using a spectral decomposition, the SVD of A:
    A = U Σ V^T,
where
• Σ = [ Σ̂ ; 0 ], with the n × n block Σ̂ diagonal and its entries equal to the singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n ≥ 0.
• The singular vectors u_i (i = 1, ..., m) and v_i (i = 1, ..., n) are columns of the matrices U and V respectively.
• The singular vectors are orthonormal, so U^T U = I_m and V^T V = I_n.
The solution becomes
    x = V (Σ^T Σ + λ I)^{-1} Σ^T c,   where c = U^T b.
Unfortunately, we don't know λ, so a bit of trial-and-error is necessary.
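A minimal NumPy sketch of this formula, assuming a matrix A and data vector b such as those in the earlier example; the candidate values of λ are placeholders for the trial-and-error step.

```python
import numpy as np

def tikhonov_svd(A, b, lam):
    """Tikhonov-regularized solution x = V (Sigma^T Sigma + lam I)^(-1) Sigma^T U^T b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    c = U.T @ b
    # Filtered expansion coefficients: sigma_i * c_i / (sigma_i^2 + lam).
    return Vt.T @ (s * c / (s ** 2 + lam))

# Trial-and-error over a few candidate regularization parameters (illustrative values):
# for lam in (1e-6, 1e-4, 1e-2):
#     x_lam = tikhonov_svd(A, b, lam)
```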
Can we deblur this image? (Revisited)
What makes spectral methods work?
For discretizations of ill-posed problems:
• The singular values σ_i > 0 have a cluster point at 0 as m, n → ∞.
• There is no noticeable gap in the singular values, and therefore the matrix A should be considered to be full-rank.
• The small singular values correspond to oscillatory singular vectors.
We need two further features:
• The discretization is fine enough to satisfy the discrete Picard condition: the sequence {|u_i^T b_true|} decreases to 0 faster than {σ_i}.
• The noise components δ_j, j = 1, ..., m, are uncorrelated, zero mean, and have identical but unknown variance.
Picard plot: the singular values (red solid line) exhibit gradual decay to 0; the coefficients |u_i^T b| (blue stars) are plotted against the index i on a logarithmic vertical scale.
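A hedged sketch of how such a Picard plot can be produced with NumPy and Matplotlib, assuming A and b come from an example like the one above.

```python
import numpy as np
import matplotlib.pyplot as plt

U, s, Vt = np.linalg.svd(A, full_matrices=False)
coeffs = np.abs(U.T @ b)                     # the coefficients |u_i^T b|

plt.semilogy(s, 'r-', label='singular values')
plt.semilogy(coeffs, 'b*', label='|u_i^T b|')
plt.xlabel('i')
plt.title('Picard plot')
plt.legend()
plt.show()
```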
The Plan
• The Problem
• Spectral Filtering
• Learning the Filter: Data to the Rescue
• Judging Goodness
• Results
• Conclusions
Spectral Filtering
We wrote our Tikhonov solution as
    x = V (Σ^T Σ + λ I)^{-1} Σ^T c,   where c = U^T b.
We can express this as
    x = V Φ(λ) Σ† c,
where the diagonal matrix Φ is
    Φ(λ) = (Σ^T Σ + λ I)^{-1} Σ^T Σ.
For Tikhonov, λ is a single parameter.
• Can we do better by using more parameters, resulting in a filter matrix Φ(α)?
• If so, how can we choose α? We will learn it!
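The filtered solution x = V Φ(α) Σ† c is easy to compute once the SVD is available; the sketch below assumes the filter factors are supplied as a vector phi of length n (this helper function and its name are ours, not the talk's).

```python
import numpy as np

def filtered_solution(A, b, phi):
    """Spectral-filter solution x = V diag(phi) Sigma^dagger U^T b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    c = U.T @ b
    inv_s = np.divide(1.0, s, out=np.zeros_like(s), where=s > 0)   # pseudoinverse of Sigma
    return Vt.T @ (phi * inv_s * c)

# Tikhonov is the special case phi_i = sigma_i^2 / (sigma_i^2 + lam):
# _, s, _ = np.linalg.svd(A, full_matrices=False)
# x = filtered_solution(A, b, s**2 / (s**2 + lam))
```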
The Plan
• The Problem
• Spectral Filtering
• Learning the Filter: Data to the Rescue
• Judging Goodness
• Results
• Conclusions
Learning the Filter: Data to the Rescue
What do we need? Informally, we need:
• Knowledge of A.
• A universe of possible true images.
• A blurred image corresponding to one of these true images, chosen at random.
• Knowledge of some characteristics of the noise.
• Some training data.
More formally, we need:
• Knowledge of A. We assume we know it exactly.
• A universe of possible true images. We assume that the true images that resulted in the ones presented to us are chosen from a known probability distribution P_ξ on images in Ξ ⊂ R^n that has finite second moments.
• A blurred image corresponding to one of these true images, chosen at random according to P_ξ.
• Knowledge of some characteristics of the noise: mean zero, finite second moments, known probability distribution P_δ on noise vectors in ∆ ⊂ R^m.
• Some training data: pairs consisting of a true image and its resulting blurred image.
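A hedged sketch of generating such a training set, assuming for illustration that P_ξ is sampling uniformly from a stored library of true images and that P_δ is zero-mean Gaussian noise; both distributional choices are ours, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(1)

def training_pairs(A, true_images, noise_std, N):
    """Draw N pairs (xi^(k), b^(k)) with b^(k) = A xi^(k) + delta^(k)."""
    pairs = []
    for _ in range(N):
        xi = true_images[rng.integers(len(true_images))]       # sample from P_xi
        delta = noise_std * rng.standard_normal(A.shape[0])    # sample from P_delta
        pairs.append((xi, A @ xi + delta))
    return pairs
```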
Where does the training data come from?
When an expensive imaging device is powered on, there is often a calibration procedure. For example, for an MRI machine, we might use a phantom, made of material with density similar to that of the brain, and insert a small sphere with density similar to that of a tumor. Taking images of the phantom at different positions in the field of view, or at different well-controlled rotations, gives us pairs of truth and measured values.
The Plan
• The Problem
• Spectral Filtering
• Learning the Filter: Data to the Rescue
• Judging Goodness
• Results
• Conclusions
How do we judge goodness of parameters?
We want to minimize the error in our reconstruction! We settle for minimizing the expected error in our reconstruction.
Error vector:
    e(α, ξ, δ) = x_filter(α, ξ, δ) − ξ.
Measure the error as
    ERR(α, ξ, δ) = (1/n) Σ_{i=1}^{n} ρ(e_i(α, ξ, δ)),
where, for example, ρ(z) = (1/p) |z|^p, for p ≥ 1, related to the p-norm of the error vector.
Choice of ρ
We use the 1-norm, the 2-norm, and the p-norm (p = 4, as an approximation to the ∞-norm). We also use the Huber function to reduce the effects of outliers:
    ρ(z) = |z| − β/2,     if |z| ≥ β,
    ρ(z) = z^2 / (2β),    if |z| < β.
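These error measures are straightforward to code; a sketch of the p-norm-type and Huber choices of ρ (vectorized over the components of the error vector), with an illustrative default for the Huber threshold β.

```python
import numpy as np

def rho_pnorm(z, p=2):
    """rho(z) = |z|^p / p, related to the p-norm of the error vector."""
    return np.abs(z) ** p / p

def rho_huber(z, beta=0.1):
    """Huber function: quadratic for |z| < beta, linear for |z| >= beta (down-weights outliers)."""
    return np.where(np.abs(z) < beta, z ** 2 / (2 * beta), np.abs(z) - beta / 2)

def err_measure(e, rho=rho_pnorm):
    """ERR = (1/n) * sum_i rho(e_i) for an error vector e = x_filter - xi."""
    return np.mean(rho(e))
```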
Bayes risk minimization
An optimal filter would minimize the expected value of the error:
    α̌ = arg min_α f(α) = E_{δ,ξ} { ERR(α, ξ, δ) }.
Given our training data, we approximate this problem by minimizing the empirical Bayes risk:
    α̂ = arg min_α f_N(α),
where
    f_N(α) = (1/(nN)) Σ_{k=1}^{N} Σ_{i=1}^{n} ρ(e_i^(k)(α)),
and the samples ξ^(k) and noise realizations δ^(k), for k = 1, ..., N, constitute a training set.
Convergence theorems: Shapiro 2009. Statistical learning theory: Vapnik 1998.
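A minimal sketch of minimizing the empirical Bayes risk; for simplicity it optimizes the single Tikhonov parameter with scipy.optimize.minimize_scalar, so the parameterization, optimizer, and bounds are illustrative assumptions rather than the method actually used in the talk.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def empirical_risk(lam, A, train_pairs, rho):
    """f_N: average of rho over all error components of all training pairs."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # in practice, precompute once
    total = 0.0
    for xi, b in train_pairs:
        c = U.T @ b
        x = Vt.T @ (s * c / (s ** 2 + lam))            # Tikhonov-filtered solution
        total += np.mean(rho(x - xi))
    return total / len(train_pairs)

# Learn lambda from the training set (illustrative bounds):
# result = minimize_scalar(empirical_risk, bounds=(1e-10, 1e2), method='bounded',
#                          args=(A, pairs, rho_pnorm))
# lam_hat = result.x
```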
Standard choices for the parameters α
Two standard choices:
• Truncated SVD:
    φ_i^tsvd(α) = 1 if i ≤ α, 0 otherwise,
  with α ∈ A_tsvd = {1, ..., n}.
• Tikhonov filtering:
    φ_i^tik(α) = σ_i^2 / (σ_i^2 + α),
  for α ∈ A_tik = R_+.
Advantage: 1-parameter optimization problems are easy.
Disadvantage: The filters are quite limited by their form.
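In code, the two standard filter-factor choices are one-liners; a sketch assuming the singular values are available in a vector s.

```python
import numpy as np

def phi_tsvd(s, k):
    """Truncated-SVD filter: 1 for the k largest singular values, 0 for the rest."""
    return (np.arange(1, len(s) + 1) <= k).astype(float)

def phi_tikhonov(s, lam):
    """Tikhonov filter factors sigma_i^2 / (sigma_i^2 + lam)."""
    return s ** 2 / (s ** 2 + lam)
```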
Most general choice of parameters
We let
    φ_i^err(α) = α_i,   i = 1, ..., n.
Advantage: The filters are now quite general.
Disadvantage: n-parameter optimization problems are hard and the resulting filter can be very oscillatory.
A compromise: smoothing filters
Take an n-parameter optimal filter and apply a smoothing operator to it:
    φ^smooth = K φ̂^err,
where K denotes a smoothing matrix (e.g., a Gaussian).
Advantage: The filter is now smoother.
Disadvantage: It is no longer optimal.
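One possible choice of the smoothing operator K is a one-dimensional Gaussian kernel applied to the vector of optimal filter factors; the kernel width below is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_filter(phi_opt, width=3.0):
    """Apply a Gaussian smoothing operator K to the optimal-error filter factors."""
    return gaussian_filter1d(phi_opt, sigma=width, mode='nearest')
```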
A second compromise: spline filters
Constrain the filter function φ(α) to be a cubic spline with m (given) knots. (We used knots equally spaced on a log scale.)
Advantage: This reduces the optimization problem to approximately m variables and prevents wild oscillations or abrupt changes.
Disadvantage: Knots and boundary conditions need to be specified or chosen by optimization.
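A sketch of the spline parameterization: the optimization variables alpha are the filter values at knots equally spaced on a log scale of the singular values, and a cubic spline interpolates them to all n filter factors; the knot placement and the default (not-a-knot) boundary conditions here are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_filter(alpha, s):
    """Cubic-spline filter: alpha holds the filter values at len(alpha) log-spaced knots."""
    s_pos = s[s > 0]
    knots = np.logspace(np.log10(s_pos.min()), np.log10(s_pos.max()), len(alpha))
    spline = CubicSpline(knots, alpha)    # scipy default: not-a-knot boundary conditions
    return spline(s)                      # one filter factor per singular value
```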
Typical optimal filters
(Figure: filter factors plotted against the singular values for the opt-error, opt-TSVD, opt-Tik, and opt-spline filters. The smooth filter, not shown, follows the trend of the optimal-error filter.)