
Statistical Properties of the Regularized Least Squares Functional - PowerPoint PPT Presentation



1. Statistical Properties of the Regularized Least Squares Functional and a Hybrid LSQR Newton Method for Finding the Regularization Parameter: Application in Image Deblurring and Signal Restoration
Rosemary Renaut, Midwest Conference on Mathematical Methods for Images and Surfaces, April 18, 2009.
National Science Foundation: Division of Computational Mathematics.

2. Outline
1. Motivation
2. Least Squares Problems
3. Statistical Results for Least Squares
4. Implications of Statistical Results for Regularized Least Squares
5. Newton algorithm
6. Algorithm with LSQR (Paige and Saunders)
7. Results
8. Conclusions and Future Work

3. Signal/Image Restoration: Integral Model of Signal Degradation
b(t) = ∫ K(t, s) x(s) ds, where K(t, s) describes the blur of the signal.
Convolutional (invariant) model: K(t, s) = K(t − s) is the point spread function (PSF).
Typically sampling includes noise e(t), so the model is b(t) = ∫ K(t − s) x(s) ds + e(t).
Discrete model: given discrete samples b, find the samples x of x(s). Let A discretize K, assumed known; the model is b = Ax + e.
Naïvely invert the system to find x!
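The discrete model can be made concrete with a short sketch. The following Python/NumPy snippet (the Gaussian PSF width, grid, test signal, and noise level are illustrative choices, not values from the talk) builds a convolution matrix A from a sampled PSF and simulates b = Ax + e.

```python
import numpy as np

def gaussian_psf_matrix(n, width=0.05):
    # Discretize K(t - s) for a Gaussian PSF on a uniform grid of [0, 1];
    # the scaling by the grid spacing h approximates the integral.
    t = np.linspace(0, 1, n)
    h = t[1] - t[0]
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / width) ** 2)
    return h * K / (width * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
n = 200
A = gaussian_psf_matrix(n)
x_true = (np.abs(np.linspace(0, 1, n) - 0.5) < 0.15).astype(float)  # a simple box signal
e = 1e-2 * rng.standard_normal(n)   # additive noise at an assumed level
b = A @ x_true + e                  # discrete degradation model b = A x + e
```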

4. Example: 1-D Original and Blurred Noisy Signal
[Figures: the original signal x; the blurred and noisy signal b, Gaussian PSF.]

5. The Solution: Regularization is Needed
[Figures: the naïve solution; a regularized solution.]

6. Least Squares for Ax = b: A Quick Review
Consider discrete systems A ∈ R^(m×n), b ∈ R^m, x ∈ R^n, with b = Ax + e.
Classical approach, linear least squares: x_LS = argmin_x ‖Ax − b‖_2^2.
Difficulty: x_LS is sensitive to changes in the right-hand side b when A is ill-conditioned. For convolutional models the system is numerically ill-posed.
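This sensitivity can be seen numerically with a minimal check that continues the blur sketch above (it reuses A, b, and x_true from that snippet): solving the least squares problem with noise-free versus noisy data gives very different answers because A is severely ill-conditioned.

```python
import numpy as np

# Continuing the blur sketch above: naive inversion amplifies the noise in b.
x_naive = np.linalg.lstsq(A, b, rcond=None)[0]            # solve with noisy data
x_clean = np.linalg.lstsq(A, A @ x_true, rcond=None)[0]   # solve with noise-free data

print("cond(A) =", np.linalg.cond(A))                     # huge for a smooth PSF
print("error with noise-free data:", np.linalg.norm(x_clean - x_true))
print("error with noisy data:    ", np.linalg.norm(x_naive - x_true))
```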

7. Introduce Regularization to Pick a Solution
Weighted fidelity with regularization: x_RLS = argmin_x { ‖b − Ax‖²_{W_b} + λ² R(x) }.
W_b is a weighting matrix, R(x) is a regularization term, and λ is a regularization parameter, which is unknown.
The solution x_RLS(λ) depends on λ, on the regularization operator R, and on the weighting matrix W_b.

8. Generalized Tikhonov Regularization with Weighting
x̂ = argmin_x J(x) = argmin_x { ‖Ax − b‖²_{W_b} + λ² ‖D(x − x_0)‖² }.   (1)
D is a suitable operator, often a derivative approximation; assume N(A) ∩ N(D) = {0}.
x_0 is a reference solution, often x_0 = 0.
Given multiple measurements of the data, the error statistics can be estimated: usually the error in b, e, is an m-vector of random measurement errors with mean 0 and positive definite covariance matrix C_b = E(ee^T).
For uncorrelated measurements C_b is a diagonal matrix containing the variances of the errors (colored noise); for white noise C_b = σ² I.
Weighting the data-fit term by W_b = C_b^{−1} makes the weighted errors ẽ, theoretically, uncorrelated.
Question: given D and W_b, how do we find λ?
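For a fixed λ, (1) reduces to a linear solve. A minimal sketch follows, assuming the default choices D = I, W_b = I, x_0 = 0 when none are supplied; the example value of λ is arbitrary, not a selected parameter.

```python
import numpy as np

def tikhonov_solve(A, b, lam, D=None, Wb=None, x0=None):
    # Minimize ||A x - b||^2_{Wb} + lam^2 ||D (x - x0)||^2 via the normal equations:
    # (A^T Wb A + lam^2 D^T D)(x - x0) = A^T Wb (b - A x0).
    m, n = A.shape
    D = np.eye(n) if D is None else D
    Wb = np.eye(m) if Wb is None else Wb
    x0 = np.zeros(n) if x0 is None else x0
    lhs = A.T @ Wb @ A + lam**2 * (D.T @ D)
    rhs = A.T @ Wb @ (b - A @ x0)
    return x0 + np.linalg.solve(lhs, rhs)

# With the blur model above and an arbitrary (not selected) regularization parameter:
x_reg = tikhonov_solve(A, b, lam=1e-1)
```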

9-12. Example: Solution for Increasing λ, D = I
[Four figure slides showing the regularized solution as λ increases.]

13. Choice of λ is Crucial
Different algorithms yield different solutions. Examples: the discrepancy principle, generalized cross validation (GCV), the L-curve, and the unbiased predictive risk estimator (UPRE).
General difficulties: expensive (GCV, L-curve, UPRE); not necessarily a unique solution (GCV); oversmoothing (discrepancy principle); no kink in the L-curve.
A new statistical approach: the χ² result.
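As one concrete instance of these selection rules, here is a sketch of the GCV function for standard Tikhonov regularization, assuming D = I and W_b = I and evaluating via the SVD; the λ grid is arbitrary and a crude grid search stands in for a proper minimizer. This is the classical GCV rule, not the new χ² approach of the talk.

```python
import numpy as np

def gcv(lam, A, b):
    # GCV function G(lam) = m ||(I - A(lam)) b||^2 / trace(I - A(lam))^2,
    # where A(lam) is the Tikhonov influence matrix, evaluated via the SVD.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    f = s**2 / (s**2 + lam**2)                        # Tikhonov filter factors
    res2 = np.sum(((1 - f) * beta) ** 2) + b @ b - beta @ beta
    return A.shape[0] * res2 / (A.shape[0] - np.sum(f)) ** 2

# Using A and b from the blur sketch above:
lams = np.logspace(-6, 1, 50)
lam_gcv = lams[np.argmin([gcv(l, A, b) for l in lams])]   # crude grid search
```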

14. Background: Statistics of the Least Squares Problem
Theorem (Rao 1973, First Fundamental Theorem): let r be the rank of A and suppose b ∼ N(Ax, σ_b² I), i.e. the errors in the measurements are normally distributed with mean 0 and covariance σ_b² I. Then
J = min_x ‖Ax − b‖² ∼ σ_b² χ²(m − r),
i.e. J follows a χ² distribution with m − r degrees of freedom: basically the discrepancy principle.
Corollary (weighted least squares): for b ∼ N(Ax, C_b) and W_b = C_b^{−1},
J = min_x ‖Ax − b‖²_{W_b} ∼ χ²(m − r).
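A quick Monte Carlo illustration of the theorem (the dimensions, noise level, and number of trials are arbitrary): the minimum residual sum of squares, scaled by σ_b², should have sample mean close to m − r.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, sigma_b = 50, 10, 0.3                   # illustrative sizes and noise level
A_ls = rng.standard_normal((m, n))            # full column rank, so r = n
x_star = rng.standard_normal(n)

J = []
for _ in range(2000):
    b_noisy = A_ls @ x_star + sigma_b * rng.standard_normal(m)
    res = b_noisy - A_ls @ np.linalg.lstsq(A_ls, b_noisy, rcond=None)[0]
    J.append(res @ res / sigma_b**2)          # J / sigma_b^2 ~ chi^2(m - r)

print("sample mean:", np.mean(J), " m - r =", m - n)
```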

15. Extension: Statistics of the Regularized Least Squares Problem
Theorem: χ² distribution of the regularized functional (Renaut/Mead 2008).
x̂ = argmin_x J_D(x) = argmin_x { ‖Ax − b‖²_{W_b} + ‖x − x_0‖²_{W_D} },  W_D = D^T W_x D.   (2)
Assume W_b and W_x are symmetric positive definite and that the problem is uniquely solvable, N(A) ∩ N(D) = {0}.
Let C_D be the Moore-Penrose generalized inverse of W_D.
Statistics: (b − Ax) = e ∼ N(0, C_b) and (x − x_0) = f ∼ N(0, C_D), where x_0 is the mean vector of the model parameters.
Then J_D ∼ χ²(m + p − n), where p is the number of rows of D.
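The result can be checked empirically in the simplest setting D = I, so p = n and the predicted number of degrees of freedom is m. A Monte Carlo sketch with assumed values σ_b, σ_x (so C_b = σ_b² I, C_D = σ_x² I, W_b = C_b^{−1}, W_x = C_D^{−1}); all sizes and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 40, 15                                  # illustrative sizes
sigma_b, sigma_x = 0.2, 1.5
A_t = rng.standard_normal((m, n))
x0 = np.zeros(n)

vals = []
for _ in range(2000):
    x_t = x0 + sigma_x * rng.standard_normal(n)          # f = x - x0 ~ N(0, C_D)
    b_t = A_t @ x_t + sigma_b * rng.standard_normal(m)   # e ~ N(0, C_b)
    lhs = A_t.T @ A_t / sigma_b**2 + np.eye(n) / sigma_x**2
    rhs = A_t.T @ (b_t - A_t @ x0) / sigma_b**2
    xhat = x0 + np.linalg.solve(lhs, rhs)
    JD = np.sum((A_t @ xhat - b_t) ** 2) / sigma_b**2 + np.sum((xhat - x0) ** 2) / sigma_x**2
    vals.append(JD)

print("sample mean:", np.mean(vals), " m + p - n =", m)   # p = n when D = I
```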

16. Key Aspects of the Proof I: The Functional J
Algebraic simplifications: rewrite the functional as a quadratic form.
The regularized solution is given in terms of the resolution matrix R(W_D):
x̂ = x_0 + (A^T W_b A + D^T W_x D)^{−1} A^T W_b r,  r = b − A x_0,   (3)
  = x_0 + R(W_D) W_b^{1/2} r = x_0 + y(W_D),   (4)
R(W_D) = (A^T W_b A + D^T W_x D)^{−1} A^T W_b^{1/2}.   (5)
The functional is given in terms of the influence matrix A(W_D):
A(W_D) = W_b^{1/2} A R(W_D),   (6)
J_D(x̂) = r^T W_b^{1/2} (I_m − A(W_D)) W_b^{1/2} r;  let r̃ = W_b^{1/2} r,   (7)
        = r̃^T (I_m − A(W_D)) r̃,  a quadratic form.   (8)
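A small numerical check of these identities on a standalone random test problem (the names A, D, W_b, W_x mirror the equations; the sizes and diagonal SPD weights are arbitrary, and this is separate from the blur example): build R(W_D) and the influence matrix, then confirm that the quadratic form (7)-(8) reproduces the value of the functional.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 12, 8                                   # small random test problem
A = rng.standard_normal((m, n))
D = np.eye(n)
Wb = np.diag(rng.uniform(0.5, 2.0, m))         # arbitrary SPD (diagonal) weights
Wx = np.diag(rng.uniform(0.5, 2.0, n))
b = rng.standard_normal(m)
x0 = np.zeros(n)
r = b - A @ x0

Wb_half = np.sqrt(Wb)                          # Wb diagonal, so elementwise sqrt gives Wb^{1/2}
WD = D.T @ Wx @ D
R = np.linalg.solve(A.T @ Wb @ A + WD, A.T @ Wb_half)   # resolution matrix, eq. (5)
xhat = x0 + R @ (Wb_half @ r)                            # solution, eqs. (3)-(4)
Ainf = Wb_half @ A @ R                                   # influence matrix, eq. (6)

r_tilde = Wb_half @ r
J_quad = r_tilde @ (np.eye(m) - Ainf) @ r_tilde          # quadratic form, eqs. (7)-(8)
J_direct = (A @ xhat - b) @ Wb @ (A @ xhat - b) + (xhat - x0) @ WD @ (xhat - x0)
print(np.isclose(J_quad, J_direct))                      # should print True
```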

17. Key Aspects of the Proof II: Properties of a Quadratic Form
χ² distribution of quadratic forms x^T P x for normal variables (Fisher-Cochran theorem):
Suppose the components x_i are independent normal variables, x_i ∼ N(0, 1), i = 1:n.
A necessary and sufficient condition that x^T P x has a central χ² distribution is that P is idempotent, P² = P. In that case the number of degrees of freedom of the χ² distribution is rank(P) = trace(P).
When the means of the x_i are µ_i ≠ 0, x^T P x has a non-central χ² distribution with non-centrality parameter c = µ^T P µ.
A χ² random variable with n degrees of freedom and non-centrality parameter c has mean n + c and variance 2(n + 2c).
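A brief numerical illustration of the idempotency condition (the projector below is an arbitrary orthogonal projector, not one arising from the regularized problem): for P = QQ^T with Q having k orthonormal columns, P is idempotent and x^T P x with x ∼ N(0, I_n) has sample mean close to rank(P) = trace(P) = k.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 20, 6
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal columns
P = Q @ Q.T                                        # idempotent projector: P @ P = P

samples = [x @ P @ x for x in rng.standard_normal((5000, n))]
print("idempotent:", np.allclose(P @ P, P))
print("trace(P) =", np.trace(P), " sample mean of x^T P x =", np.mean(samples))
```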
