Numerical Methods for Linear Discrete Ill-Posed Problems
Lothar Reichel
Como, May 2018
Part 1: The SVD and Krylov subspace methods

Outline of Part 1:
• Inverse and ill-posed problems
• Solution of small to moderately sized problems: direct methods based on the SVD
• Solution of large-scale problems: iterative methods
Inverse problems

Inverse problems arise when one seeks to determine the cause of an observed effect.
• Inverse heat conduction problems: What was the temperature of a rod an hour ago?
• Computerized tomography.
• Image restoration: Determine the unavailable exact image from an available contaminated version.

Inverse problems often are ill-posed.
Ill-posed problems

A problem is said to be ill-posed if it has at least one of the properties:
• The problem does not have a solution,
• The problem does not have a unique solution,
• The solution does not depend continuously on the data.
Example of an ill-posed problem: the Fredholm integral equation of the first kind

  ∫_0^1 k(s,t) x(t) dt = f(s),   0 ≤ s ≤ 1,

with a continuous kernel k. By the Riemann–Lebesgue lemma, small perturbations in f may correspond to large perturbations in x:

  max_{0 ≤ s ≤ 1} | ∫_0^1 k(s,t) cos(2πℓt) dt |

can be made "tiny" by choosing |ℓ| large.
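As a small numerical illustration (not part of the slides; the kernel k(s,t) = exp(st) and the grids are assumed for the example), the quantity above can be evaluated by the composite trapezoidal rule for increasing ℓ:

```python
import numpy as np

# Illustration of the Riemann-Lebesgue lemma for an assumed kernel k(s,t) = exp(s*t):
# max over s of |int_0^1 k(s,t) cos(2*pi*l*t) dt| shrinks as |l| grows.
t = np.linspace(0.0, 1.0, 4001)            # fine quadrature grid in t
s = np.linspace(0.0, 1.0, 101)             # grid of s-values for the maximum
w = np.full(t.size, t[1] - t[0])           # trapezoidal quadrature weights
w[0] *= 0.5
w[-1] *= 0.5

for ell in [1, 4, 16, 64, 256]:
    integrand = np.exp(np.outer(s, t)) * np.cos(2 * np.pi * ell * t)
    vals = integrand @ w                   # integral for each fixed s
    print(f"l = {ell:4d}:  max_s |integral| = {np.abs(vals).max():.2e}")
```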
Linear discrete ill-posed problems

Let A ∈ R^{m×n} and b ∈ R^m with m ≥ n. When m > n, consider the least-squares problem

  min_{x ∈ R^n} ‖Ax − b‖_2,

or, when m = n, consider the linear system of equations

  Ax = b.

Matrices that arise in inverse problems, such as problems of remote sensing or image restoration problems, are of ill-determined rank, possibly rank deficient.
Least-squares problems or linear systems of equations with a matrix of this kind are referred to as linear discrete ill-posed problems. The vector b contains available data and is not required to be in R ( A ).
Linear discrete ill-posed problems arise from the discretization of linear ill-posed problems, such as Fredholm integral equations of the first kind, or they arise directly in discrete form, as in image restoration. The vector b in linear discrete ill-posed problems that arise in applications is generally determined by measurement and therefore is contaminated by error (noise). In image restoration problems b represents an observed image.
Example 1: Consider the Fredholm integral equation of the first kind

  ∫_0^π exp(s cos(t)) x(t) dt = 2 sinh(s)/s,   0 ≤ s ≤ π/2.

The solution to be determined is x(t) = sin(t). Discretize the integral equation by a Galerkin method with piecewise constant functions, using the code baart from Regularization Tools by Hansen.
The code baart gives
• the matrix A ∈ R^{200×200}, which is numerically singular,
• the desired solution x_exact ∈ R^{200}, and
• the error-free right-hand side b_exact ∈ R^{200}.
Then A x_exact = b_exact.
Assume that b_exact is not available. Instead a noise-contaminated vector

  b = b_exact + e

is known. Here e represents white Gaussian noise scaled to correspond to 0.1% relative noise, i.e.,

  ‖e‖_2 = 10^{-3} ‖b_exact‖_2.

We would like to determine an approximation of x_exact by solving Ax = b.
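The following sketch (Python/NumPy; it is not the baart code and uses a simple midpoint quadrature instead of baart's Galerkin discretization, so the matrix differs slightly) sets up a discrete problem of this kind and adds noise at relative level 10^{-3}:

```python
import numpy as np

# Illustrative (non-Galerkin) discretization of the Baart test problem:
# midpoint quadrature in t on [0, pi], collocation in s on [0, pi/2].
n = 200
t = (np.arange(n) + 0.5) * np.pi / n            # quadrature nodes in t
s = (np.arange(n) + 0.5) * (np.pi / 2) / n      # collocation points in s
w = np.pi / n                                   # quadrature weight

A = w * np.exp(np.outer(s, np.cos(t)))          # kernel exp(s*cos(t))
x_exact = np.sin(t)
b_exact = A @ x_exact                           # approximates 2*sinh(s)/s

rng = np.random.default_rng(0)
e = rng.standard_normal(n)
e *= 1e-3 * np.linalg.norm(b_exact) / np.linalg.norm(e)   # 0.1% relative noise
b = b_exact + e
```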
Figure: Right-hand side b, with and without added noise (added noise/signal = 10^{-3}).
Figure: Exact solution x_exact.
Figure: Solution of Ax = b, noise level 10^{-3} (entries of magnitude about 10^{13}).
Figure: Solution of Ax = b, no added noise (oscillating entries of magnitude up to about 10^{2}).
The singular value decomposition

The SVD of A ∈ R^{m×n}, m ≥ n:

  A = U Σ V^T,

where
  U = [u_1, u_2, ..., u_m] ∈ R^{m×m} is orthogonal,
  V = [v_1, v_2, ..., v_n] ∈ R^{n×n} is orthogonal,
  Σ = diag[σ_1, σ_2, ..., σ_n] ∈ R^{m×n}, with σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0.
Application to least-squares approximation

  min_x ‖Ax − b‖_2² = min_x ‖U Σ V^T x − b‖_2² = min_x ‖Σ V^T x − U^T b‖_2².

Let

  y = [y_1, y_2, ..., y_n]^T = V^T x,
  b′ = [b′_1, b′_2, ..., b′_m]^T = U^T b.

Then

  min_x ‖Ax − b‖_2² = min_y ‖Σ y − b′‖_2² = min_y ∑_{j=1}^{n} (σ_j y_j − b′_j)² + ∑_{j=n+1}^{m} (b′_j)².
If A is of full rank, then all σ_j > 0 and

  y_j = b′_j / σ_j,   1 ≤ j ≤ n,

yields the solution x = V y. If some σ_j = 0, then y_j is undetermined and the least-squares solution is not unique. Often one is interested in the least-squares solution of minimal norm. Then the undetermined elements y_j are set to zero.
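A minimal sketch (not from the slides; NumPy assumed, and the tolerance tol is an illustrative choice) of the SVD-based least-squares solution described above, setting undetermined components to zero so that the minimal-norm solution is obtained:

```python
import numpy as np

def svd_lsq(A, b, tol=1e-12):
    """Minimal-norm least-squares solution via the SVD (illustrative sketch)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD
    bp = U.T @ b                        # b' = U^T b (first n components)
    y = np.zeros_like(bp)
    nonzero = sigma > tol * sigma[0]    # treat tiny singular values as zero
    y[nonzero] = bp[nonzero] / sigma[nonzero]
    return Vt.T @ y                     # x = V y

# x = svd_lsq(A, b)   # with A, b as in Example 1
```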
Assume that

  σ_1 ≥ σ_2 ≥ ... ≥ σ_ℓ > σ_{ℓ+1} = ... = σ_n = 0.

Then A is of rank ℓ. Introduce the diagonal matrix

  Σ† = diag[1/σ_1, 1/σ_2, ..., 1/σ_ℓ, 0, ..., 0] ∈ R^{n×m}.

The matrix

  A† = V Σ† U^T ∈ R^{n×m}

is known as the Moore–Penrose pseudoinverse of A.
The solution of the least-squares problem

  min_x ‖Ax − b‖_2

of minimal Euclidean norm can be expressed as

  x = A† b.

Moreover, A A† = P_{R(A)} ∈ R^{m×m}, the orthogonal projector onto R(A), and, when ℓ = n (full column rank), A† A = I ∈ R^{n×n}.
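As an illustration (assumptions: NumPy, a relative cutoff tol for treating tiny singular values as zero), the pseudoinverse A† = V Σ† U^T can be formed directly from the SVD and checked against NumPy's built-in np.linalg.pinv:

```python
import numpy as np

def pseudoinverse(A, tol=1e-12):
    """Form the Moore-Penrose pseudoinverse A^+ = V Sigma^+ U^T (sketch)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    sigma_plus = np.zeros_like(sigma)
    mask = sigma > tol * sigma[0]
    sigma_plus[mask] = 1.0 / sigma[mask]
    return Vt.T @ (sigma_plus[:, None] * U.T)   # V Sigma^+ U^T

# Consistency check against NumPy's built-in pseudoinverse:
# A_plus = pseudoinverse(A)
# print(np.allclose(A_plus, np.linalg.pinv(A, rcond=1e-12)))
```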
Note that

  A = U Σ V^T = ∑_{j=1}^{ℓ} σ_j u_j v_j^T.

Define

  A_k := ∑_{j=1}^{k} σ_j u_j v_j^T,   1 ≤ k ≤ ℓ.

Then A_k is of rank k; A_k is the sum of k rank-one matrices σ_j u_j v_j^T. Moreover,

  ‖A − A_k‖_2 = min_{rank(B) ≤ k} ‖A − B‖_2 = σ_{k+1},

i.e., A_k is the closest matrix of rank ≤ k to A. Here ‖·‖_2 denotes the spectral norm.
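A short sketch (NumPy assumed; A and k are placeholders) that forms the best rank-k approximation A_k and verifies ‖A − A_k‖_2 = σ_{k+1}:

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation A_k = sum_{j<=k} sigma_j u_j v_j^T (sketch)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * sigma[:k]) @ Vt[:k, :]

# Example check (A as before, any k < rank(A)):
# k = 5
# A_k = best_rank_k(A, k)
# sigma = np.linalg.svd(A, compute_uv=False)
# print(np.linalg.norm(A - A_k, 2), sigma[k])   # the two numbers agree
```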
Let b = b_exact + e, where e denotes an error. Then

  x := A† b = ∑_{j=1}^{ℓ} (u_j^T b / σ_j) v_j
            = ∑_{j=1}^{ℓ} (u_j^T b_exact / σ_j) v_j + ∑_{j=1}^{ℓ} (u_j^T e / σ_j) v_j
            = x_exact + ∑_{j=1}^{ℓ} (u_j^T e / σ_j) v_j.

If σ_ℓ > 0 is tiny, then u_ℓ^T e / σ_ℓ might be huge and x a meaningless approximation of x_exact.
Recall that

  A_k = ∑_{j=1}^{k} σ_j u_j v_j^T

is the best rank-k approximation of A. Pseudoinverse of A_k:

  A_k† := ∑_{j=1}^{k} σ_j^{-1} v_j u_j^T,   σ_k > 0.
Approximate x_exact by

  x_k := A_k† b = ∑_{j=1}^{k} (u_j^T b / σ_j) v_j
       = ∑_{j=1}^{k} (u_j^T b_exact / σ_j) v_j + ∑_{j=1}^{k} (u_j^T e / σ_j) v_j

for some k ≤ ℓ. Approximating x_exact by x_k is known as the truncated SVD (TSVD) method. How to choose k?
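A minimal TSVD sketch (NumPy assumed; the function name is illustrative) computing x_k for a given truncation index k:

```python
import numpy as np

def tsvd_solution(A, b, k):
    """Truncated SVD solution x_k = sum_{j<=k} (u_j^T b / sigma_j) v_j (sketch)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :k].T @ b) / sigma[:k]
    return Vt[:k, :].T @ coeffs

# x_k = tsvd_solution(A, b, 3)   # e.g. k = 3
```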
Figure: Example 1, singular values of A (log scale, decaying over many orders of magnitude).
Figure: Example 1, right-hand side without noise: ‖x_k‖ versus k.
Figure: Example 1, right-hand side without noise: log10 ‖b − A x_k‖ versus k.
Figure: Example 1, right-hand side without noise: exact solution and computed solutions x_9, x_10, x_11.
Figure: Example 1, right-hand side with relative noise 10^{-3}: ‖x_k‖ versus k.
Figure: Example 1, right-hand side with relative noise 10^{-3}: log10 ‖b − A x_k‖ versus k.

The discrepancy principle prescribes that the truncation index k be chosen as small as possible so that, for some fixed η > 1,

  ‖A x_k − b‖_2 ≤ η ‖b − b_exact‖_2.

Here k = 3.
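A sketch of the discrepancy principle for TSVD (assumptions: NumPy, a known bound delta ≈ ‖e‖_2 = ‖b − b_exact‖_2, and a fixed η slightly larger than 1); it returns the smallest index k whose residual satisfies the bound:

```python
import numpy as np

def tsvd_discrepancy(A, b, delta, eta=1.01):
    """Pick the smallest k with ||A x_k - b||_2 <= eta*delta; return (k, x_k) (sketch)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    bp = U.T @ b
    for k in range(1, sigma.size + 1):
        # For the TSVD solution with square A, ||b - A x_k||_2 = ||bp[k:]||_2.
        if np.linalg.norm(bp[k:]) <= eta * delta:
            return k, Vt[:k, :].T @ (bp[:k] / sigma[:k])
    return sigma.size, Vt.T @ (bp / sigma)

# k, x_k = tsvd_discrepancy(A, b, delta=1e-3 * np.linalg.norm(b_exact))
```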
Figure: Example 1, right-hand side with relative noise 10^{-3}: exact solution and computed solutions x_3, x_4, x_5.
There are many ways to determine the truncation index k, including:
• The discrepancy principle: gives an improved approximation of x_exact as ‖b − b_exact‖_2 decreases to 0. Requires a bound for ‖b − b_exact‖_2 and that b_exact ∈ R(A).
• The L-curve criterion: a method that often gives a suitable value of k when ‖b − b_exact‖_2 is not too small.
• Cross validation and generalized cross validation: statistically based methods.
• The quasi-optimality criterion: is based on comparing consecutive norms ‖x_{j+1} − x_j‖_2, j = 1, 2, ... (see the sketch after this list).

All methods except for the discrepancy principle are referred to as "heuristic methods." They work well for certain problems, but may fail to determine a suitable truncation index for others.
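The quasi-optimality criterion can be sketched as follows for TSVD (an illustration under the same NumPy assumptions as above, not code from the slides); it uses that, for TSVD, ‖x_{k+1} − x_k‖_2 = |u_{k+1}^T b| / σ_{k+1}:

```python
import numpy as np

def tsvd_quasi_optimality(A, b, kmax=None):
    """Quasi-optimality: choose k minimizing ||x_{k+1} - x_k||_2, 1 <= k < kmax (sketch)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    kmax = kmax or sigma.size
    bp = U.T @ b
    # For TSVD, ||x_{k+1} - x_k||_2 = |u_{k+1}^T b| / sigma_{k+1}.
    diffs = np.abs(bp[1:kmax]) / sigma[1:kmax]
    k = int(np.argmin(diffs)) + 1
    return k, Vt[:k, :].T @ (bp[:k] / sigma[:k])

# k, x_k = tsvd_quasi_optimality(A, b, kmax=20)
```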
Figure: Example 1, L-curve for the right-hand side with relative noise 10^{-3}: log10 ‖b − A x_k‖ versus log10 ‖x_k‖.
Krylov subspace iterative methods: regularization by truncated iteration

The most popular iterative method is the conjugate gradient (CG) method applied to the normal equations

  A^T A x = A^T b.

The most stable implementation is based on Golub–Kahan bidiagonalization applied to A. This is the basis for the LSQR algorithm by Paige and Saunders.
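A brief sketch of regularization by truncated iteration (SciPy's lsqr assumed; the noise estimate and η are illustrative): LSQR is stopped early once the residual norm drops to about η·‖e‖_2, here via the btol tolerance.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# Regularization by truncated iteration with LSQR (illustrative sketch).
# Discrepancy-style stopping: terminate once ||b - A x||_2 <= eta*delta,
# expressed through LSQR's btol tolerance (atol = 0 disables the other test).
eta = 1.01
delta = 1e-3 * np.linalg.norm(b)     # assumed estimate of the noise norm ||e||_2
result = lsqr(A, b, atol=0.0, btol=eta * delta / np.linalg.norm(b), iter_lim=100)
x_lsqr, itn, resnorm = result[0], result[2], result[3]
print(f"stopped after {itn} iterations, residual norm {resnorm:.2e}")
```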