Image Restoration from a Machine Learning Perspective (September 2012) - PowerPoint PPT Presentation


  1. Image Restoration from a Machine Learning Perspective September 2012 NIST Dianne P. O’Leary © 2012 1

  2. Image Restoration Using Machine Learning Dianne P. O’Leary Applied and Computational Mathematics Division, NIST Computer Science Dept. and Institute for Advanced Computer Studies University of Maryland oleary@cs.umd.edu http://www.cs.umd.edu/users/oleary Joint work with Julianne Chung (Virginia Tech), Matthias Chung (Virginia Tech) Support from NIST and NSF. 2

  3. The Problem • Focus on numerical solution of ill-posed problems. • In particular, we try to reconstruct a clear image from a blurred one. • Focus on methods that take advantage of the singular value decomposition (SVD) of a matrix (spectral methods). 3

  4. Goal of our work: To achieve better solutions than previously obtained from the SVD. Ingredients: • Exploiting training data. • Using Bayesian estimation. • Designing optimal filters. Note: I’ll focus in this talk on methods that take advantage of having the full SVD available, but our methods can exploit the savings of using iterative methods as well. 4

  5. The Problem We have m observations b_i resulting from convolution of a blurring function with a true image. We model this as a linear system b = A x_true + δ, where b ∈ R^m is the vector of observed data, x_true ∈ R^n is an unknown vector containing values of x(t_j), matrix A ∈ R^(m×n), m ≥ n, is known, and δ ∈ R^m represents noise in the data. Goal: compute an approximation of x_true, given b and A. In other words: we need to learn the mapping between blurred images and true ones. 5
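As a rough NumPy sketch of this data model (not part of the original slides), the following builds a placeholder blurring matrix and simulates b = A x_true + δ; the names, sizes, and noise level are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the data model  b = A x_true + delta.
# The matrix A, the true image, and the noise level are placeholders,
# not the presentation's actual data.
rng = np.random.default_rng(0)
m, n = 300, 256                          # m >= n observations vs. unknowns

A = rng.random((m, n)) / n               # stands in for a known blurring matrix
x_true = rng.random(n)                   # unknown true image, stacked as a vector
delta = 1e-2 * rng.standard_normal(m)    # zero-mean noise
b = A @ x_true + delta                   # the observed (blurred, noisy) data
```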

  6. Problem characteristics This is a discretization of an ill-posed inverse problem, meaning that small perturbations in the data may result in large errors in the solution. 6

  7. Example Suppose we have taken a picture but our lens gives us some Gaussian blur (images: a single bright pixel, and the blurred pixel). We construct the matrix A from the blurred image. Our problem becomes min_x ‖b − Ax‖_2^2. 7
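A minimal sketch of this example, assuming a 1-D signal and a Gaussian point-spread function; the sizes, blur width, and noise level are illustrative, not the settings used in the talk.

```python
import numpy as np
from scipy.linalg import toeplitz

# Build a 1-D Gaussian blurring matrix A from its point-spread function
# (the blurred image of a single bright pixel), then pose the
# unregularized problem  min_x ||b - A x||_2^2 .
n, sigma = 256, 3.0

t = np.arange(n)
psf = np.exp(-0.5 * (t / sigma) ** 2)    # Gaussian point-spread function
psf /= psf.sum()
A = toeplitz(psf)                        # columns are shifted copies of the PSF

x_true = np.zeros(n)
x_true[n // 2] = 1.0                     # a single bright pixel
rng = np.random.default_rng(0)
b = A @ x_true + 1e-3 * rng.standard_normal(n)

# Plain least squares amplifies the noise badly, which is what motivates
# the regularization introduced on the following slides.
x_naive = np.linalg.lstsq(A, b, rcond=None)[0]
```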

  8. Can we deblur this image? 8


  10. Remedy We regularize our problem by using extra information we have about the solution. For example, • We may have a bound on ‖x‖_1 or ‖x‖_2. • We may know that 0 ≤ x, and we may have upper bounds, too. 10

  11. Example, continued Suppose we replace our problem Ax = b by min_x ‖b − Ax‖_2^2 subject to ‖x‖_2 ≤ β. This formulation is called Tikhonov regularization. Using a Lagrange multiplier λ, this problem becomes max_λ min_x ‖b − Ax‖_2^2 + λ(‖x‖_2 − β). 11
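In practice one usually solves the closely related penalized problem min_x ‖b − Ax‖_2^2 + λ‖x‖_2^2 for a fixed λ; that is the form whose closed-form solution appears on the next slide. Below is a sketch of solving it via an augmented least-squares system, assuming A and b from the earlier example and a placeholder trial value of λ.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Solve  min_x ||b - A x||^2 + lam * ||x||^2  by stacking
    sqrt(lam)*I under A and solving one least-squares problem."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# x_reg = tikhonov_solve(A, b, lam=1e-2)   # lam is a placeholder trial value
```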

  12. Write the solution to this problem using a spectral decomposition, the SVD of A: A = U Σ V^T, where • Σ = [ Σ̂ ; 0 ], with Σ̂ diagonal and its entries equal to the singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n ≥ 0. • The singular vectors u_i (i = 1, …, m) and v_i (i = 1, …, n) are the columns of the matrices U and V respectively. • The singular vectors are orthonormal, so U^T U = I_m and V^T V = I_n. The solution becomes x = V (Σ^T Σ + λ I)^{-1} Σ^T c, where c = U^T b. Unfortunately, we don’t know λ, so a bit of trial-and-error is necessary. 12
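A sketch of this SVD-based formula, assuming A and b from the earlier example; λ is a placeholder value that would be tuned by the trial and error mentioned above.

```python
import numpy as np

def tikhonov_svd(A, b, lam):
    """Tikhonov solution written through the SVD:
       x = V (Sigma^T Sigma + lam I)^{-1} Sigma^T c,   c = U^T b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U diag(s) V^T
    c = U.T @ b
    filtered = (s / (s**2 + lam)) * c                  # elementwise sigma_i / (sigma_i^2 + lam)
    return Vt.T @ filtered

# x_lam = tikhonov_svd(A, b, lam=1e-2)
```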

  13. Can we deblur this image? (Revisited) 13


  23. What makes spectral methods work? For discretizations of ill-posed problems: • The singular values σ_i > 0 have a cluster point at 0 as m, n → ∞. • There is no noticeable gap in the singular values, and therefore the matrix A should be considered to be full-rank. • The small singular values correspond to oscillatory singular vectors. We need two further features: • The discretization is fine enough to satisfy the discrete Picard condition: the sequence {|u_i^T b_true|} decreases to 0 faster than {σ_i}. • The noise components δ_j, j = 1, …, m, are uncorrelated, zero mean, and have identical but unknown variance. 23
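A sketch of how a Picard plot (next slide) can be produced to check this condition, assuming A and b from the earlier example; the plotting choices are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def picard_plot(A, b):
    """Compare the decay of the singular values sigma_i with the
    coefficients |u_i^T b|; the discrete Picard condition asks that the
    noise-free coefficients decay faster than the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = np.abs(U.T @ b)
    i = np.arange(1, s.size + 1)
    plt.semilogy(i, s, 'r-', label='singular values')
    plt.semilogy(i, coeffs, 'b*', label='|u_i^T b|')
    plt.xlabel('i')
    plt.legend()
    plt.show()
```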

  24. Picard plot: the singular values, represented with a red solid line, exhibit gradual decay to 0; the coefficients |u_i^T b| are represented by blue stars. 24

  25. The Plan • The Problem • Spectral Filtering • Learning the Filter: Data to the Rescue • Judging Goodness • Results • Conclusions 25

  26. Spectral Filtering We wrote our Tikhonov solution as x = V (Σ^T Σ + λ I)^{-1} Σ^T c, where c = U^T b. We can express this as x = V Φ(λ) Σ† c, where the diagonal matrix Φ is Φ(λ) = (Σ^T Σ + λ I)^{-1} Σ^T Σ. For Tikhonov, λ is a single parameter. • Can we do better by using more parameters, resulting in a filter matrix Φ(α)? • If so, how can we choose α? We will learn it! 26
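A sketch of a general spectral filter applied through the SVD, x = V Φ Σ† c: the helper below takes an arbitrary vector of filter factors φ, with Tikhonov as the special case φ_i = σ_i^2 / (σ_i^2 + λ). The SVD factors are assumed to come from the earlier sketches.

```python
import numpy as np

def filtered_solution(U, s, Vt, b, phi):
    """Spectrally filtered solution  x = V Phi Sigma^dagger U^T b,
    i.e. x = sum_i phi_i * (u_i^T b / sigma_i) * v_i."""
    c = U.T @ b
    inv_s = np.zeros_like(s)
    nz = s > 0
    inv_s[nz] = 1.0 / s[nz]          # pseudoinverse of the singular values
    return Vt.T @ (phi * inv_s * c)
```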

  27. The Plan • The Problem • Spectral Filtering • Learning the Filter: Data to the Rescue • Judging Goodness • Results • Conclusions 27

  28. Learning the Filter: Data to the Rescue What do we need? Informally, we need: • Knowledge of A . • A universe of possible true images. • A blurred image corresponding to one of these true images, chosen at random. • Knowledge of some characteristics of the noise. • Some training data. 28

  29. More formally, we need: • Knowledge of A. We assume we know it exactly. • A universe of possible true images. We assume that the true images that resulted in the ones presented to us are chosen from a known probability distribution P_ξ on images in Ξ ⊂ R^n that has finite second moments. • A blurred image corresponding to one of these true images, chosen at random according to P_ξ. • Knowledge of some characteristics of the noise: mean zero, finite second moments, known probability distribution P_δ on noise vectors in Δ ⊂ R^m. • Some training data: pairs consisting of a true image and its resulting blurred image. 29

  30. Where does the training data come from? When an expensive imaging device is powered on, there is often a calibration procedure. For example, for an MRI machine, we might use a phantom, made of material with density similar to that of the brain, and insert a small sphere with density similar to that of a tumor. Taking images of the phantom at different positions in the field of view, or at different well-controlled rotations, gives us pairs of truth and measured values. 30

  31. The Plan • The Problem • Spectral Filtering • Learning the Filter: Data to the Rescue • Judging Goodness • Results • Conclusions 31

  32. How do we judge goodness of parameters? We want to minimize the error in our reconstruction! We settle for minimizing the expected error in our reconstruction: Error vector: e(α, ξ, δ) = x_filter(α, ξ, δ) − ξ. Measure error as ERR(α, ξ, δ) = (1/n) Σ_{i=1}^n ρ(e_i(α, ξ, δ)), where, for example, ρ(z) = (1/p)|z|^p for p ≥ 1, related to the p-norm of the error vector. 32
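A small sketch of this error measure with ρ(z) = |z|^p / p; the reconstruction x_filter and the true image ξ are assumed to come from the filtering sketch above.

```python
import numpy as np

def err(x_filter, xi, p=2):
    """ERR = (1/n) * sum_i rho(e_i)  with  rho(z) = |z|^p / p."""
    e = x_filter - xi
    return np.mean(np.abs(e) ** p / p)
```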

  33. Choice of ρ We use the 1-norm, 2-norm, and p-norm (p = 4, as an approximation to the ∞-norm). We also use the Huber function to reduce the effects of outliers: ρ(z) = |z| − β/2 if |z| ≥ β, and ρ(z) = z^2/(2β) if |z| < β. 33
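A sketch of this Huber-type loss as written above; β is a tuning choice.

```python
import numpy as np

def huber(z, beta):
    """rho(z) = z^2/(2*beta) for |z| < beta, and |z| - beta/2 otherwise.
    Quadratic for small residuals, linear for large ones, which damps
    the influence of outliers."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) < beta,
                    z**2 / (2.0 * beta),
                    np.abs(z) - beta / 2.0)
```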

  34. Bayes risk minimization An optimal filter would minimize the expected value of the error: α̌ = arg min_α f(α) = E_{δ,ξ}{ ERR(α, ξ, δ) }. Given our training data, we approximate this problem by minimizing the empirical Bayes risk: α̂ = arg min_α f_N(α), where f_N(α) = (1/(nN)) Σ_{k=1}^N Σ_{i=1}^n ρ(e_i^(k)(α)), and the samples ξ^(k) and noise realizations δ^(k), for k = 1, …, N, constitute a training set. Convergence theorems: Shapiro 2009. Statistical learning theory: Vapnik 1998. 34
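A sketch of minimizing the empirical Bayes risk over a training set, using the single-parameter Tikhonov filter and the 2-norm loss purely for illustration; filtered_solution and the SVD factors are assumed from the earlier sketches, and training_pairs is a hypothetical list of (x_true, b) pairs.

```python
import numpy as np
from scipy.optimize import minimize

def empirical_risk(log_alpha, U, s, Vt, training_pairs):
    """f_N(alpha) = (1/(nN)) * sum_k sum_i rho(e_i^(k)(alpha)),
    here with the Tikhonov filter and rho(z) = z^2 / 2."""
    alpha = np.exp(log_alpha)                  # optimize log(alpha) to keep alpha > 0
    phi = s**2 / (s**2 + alpha)                # Tikhonov filter factors
    total = 0.0
    for x_true_k, b_k in training_pairs:
        x_k = filtered_solution(U, s, Vt, b_k, phi)
        total += np.mean((x_k - x_true_k) ** 2) / 2.0
    return total / len(training_pairs)

# res = minimize(empirical_risk, x0=np.log(1e-2),
#                args=(U, s, Vt, training_pairs), method='Nelder-Mead')
# alpha_hat = np.exp(res.x[0])
```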

  35. Standard choices for the parameters α Two standard choices: • Truncated SVD: φ_i^tsvd(α) = 1 if i ≤ α, 0 otherwise, with α ∈ A_tsvd = {1, …, n}. • Tikhonov filtering: φ_i^tik(α) = σ_i^2 / (σ_i^2 + α), for α ∈ A_tik = R_+. Advantage: 1-parameter optimization problems are easy. Disadvantage: The filters are quite limited by their form. 35
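The two standard filters written as filter-factor vectors (one entry per singular value), as a sketch matching the formulas above.

```python
import numpy as np

def phi_tsvd(s, k):
    """Truncated SVD: phi_i = 1 for i <= k, 0 otherwise."""
    return (np.arange(1, s.size + 1) <= k).astype(float)

def phi_tikhonov(s, alpha):
    """Tikhonov: phi_i = sigma_i^2 / (sigma_i^2 + alpha)."""
    return s**2 / (s**2 + alpha)
```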

  36. Most general choice of parameters We let φ_i^err(α) = α_i, i = 1, …, n. Advantage: The filters are now quite general. Disadvantage: n-parameter optimization problems are hard and the resulting filter can be very oscillatory. 36

  37. A compromise: smoothing filters Take an n-parameter optimal filter and apply a smoothing operator to it: φ^smooth = K φ̂^err, where K denotes a smoothing matrix (e.g., a Gaussian). Advantage: The filter is now smoother. Disadvantage: It is no longer optimal. 37
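A sketch of the smoothing step, here implemented as convolution of the optimal filter factors with a truncated, normalized Gaussian kernel; the kernel width and radius are illustrative choices.

```python
import numpy as np

def smooth_filter(phi_opt, width=5.0, radius=15):
    """Apply a Gaussian smoothing operator K to the n-parameter
    optimal filter factors: phi_smooth = K phi_opt."""
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / width) ** 2)
    kernel /= kernel.sum()
    return np.convolve(phi_opt, kernel, mode='same')
```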

  38. A second compromise: spline filters Constrain the filter function φ ( α ) to be a cubic spline with m (given) knots. (We used knots equally spaced on a log scale.) Advantage: This simplifies the optimization problem to have approx. m variables and prevents wild oscillations or abrupt changes. Disadvantage: Knots and boundary conditions need to be specified or chosen by optimization. 38
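A sketch of the spline parameterization: the filter is a cubic spline in the singular values with knots equally spaced on a log scale, and only the knot values would be optimized. The knot count, boundary handling, and the assumption that all singular values are positive are illustrative simplifications.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_filter(s, knot_values, num_knots=8):
    """Evaluate a cubic-spline filter at the singular values s (> 0);
    knot_values has length num_knots, and the knots are log-spaced
    between the smallest and largest singular values."""
    knots = np.logspace(np.log10(s.min()), np.log10(s.max()), num_knots)
    spline = CubicSpline(knots, knot_values)
    return spline(s)
```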

  39. Typical optimal filters Plot of filter factors versus singular values for the opt-error, opt-TSVD, opt-Tik, and opt-spline filters. (The smooth filter, not shown, follows the trend of the optimal-error filter.) 39
