

  1. Unbiased Risk Estimation as Parameter Choice Rule for Filter-based Regularization Methods. Frank Werner¹, Statistical Inverse Problems in Biophysics Group, Max Planck Institute for Biophysical Chemistry, Göttingen, and Felix Bernstein Institute for Mathematical Statistics in the Biosciences, University of Göttingen. Chemnitz Symposium on Inverse Problems 2017 (on Tour in Rio), October 30, 2017. ¹ Joint work with Housen Li.

  2. Outline: 1 Introduction, 2 A posteriori parameter choice methods, 3 Error analysis, 4 Simulations, 5 Conclusion

  3. Introduction (section divider)

  4.–7. Introduction: Statistical inverse problems
  Setting: X, Y Hilbert spaces, T : X → Y bounded, linear
  Task: Recover unknown f ∈ X from noisy measurements Y = Tf + σξ
  Noise: ξ is a standard Gaussian white noise process, σ > 0 the noise level
  The model has to be understood in a weak sense:
      Y_g := ⟨Tf, g⟩_Y + σ⟨ξ, g⟩   for all g ∈ Y,
  with ⟨ξ, g⟩ ∼ N(0, ‖g‖²_Y) and E[⟨ξ, g₁⟩⟨ξ, g₂⟩] = ⟨g₁, g₂⟩_Y.
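To make the setting concrete, here is a minimal numerical sketch of a discretized version of the model Y = Tf + σξ. The operator, signal, dimensions, and noise level below are illustrative placeholders, not values from the talk.

```python
import numpy as np

# Discretized sketch of the model Y = T f + sigma * xi:
# T is an n x m matrix standing in for the bounded linear operator T : X -> Y,
# f_true is the unknown element of X, xi is standard Gaussian white noise.
rng = np.random.default_rng(0)

n, m = 200, 200
T = np.tril(np.ones((n, m))) / n                 # illustrative smoothing (integration) operator
f_true = np.sin(np.linspace(0.0, np.pi, m))      # unknown signal f
sigma = 1e-3                                     # known noise level
xi = rng.standard_normal(n)                      # discretized white noise, xi ~ N_n(0, I_n)
Y = T @ f_true + sigma * xi                      # observed data
```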

  8.–11. Introduction: Statistical inverse problems (continued)
  Assumptions:
  • T is injective and Hilbert-Schmidt (∑ σ_k² < ∞, σ_k the singular values)
  • σ is known exactly
  As the problem is ill-posed, regularization is needed. Consider filter-based regularization schemes
      f̂_α := q_α(T*T) T* Y,   α > 0.
  Aim: An a posteriori choice of α such that the rate of convergence (as σ ↘ 0) is order optimal (no loss of log-factors).
  Note: Heuristic parameter choice rules might work here as well, as the Bakushinskiĭ veto does not hold in our setting (Becker '11).
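The estimator f̂_α = q_α(T*T) T* Y can be computed through the singular value decomposition of T. The sketch below uses two standard filters as examples, Tikhonov (q_α(λ) = 1/(λ + α)) and spectral cut-off (q_α(λ) = 1/λ for λ ≥ α, 0 otherwise); the function name and interface are illustrative, not from the talk.

```python
import numpy as np

def filter_estimate(T, Y, alpha, method="tikhonov"):
    """Filter-based regularization f_hat_alpha = q_alpha(T* T) T* Y via the SVD of T.

    Two example filters: Tikhonov, q_alpha(lam) = 1 / (lam + alpha),
    and spectral cut-off, q_alpha(lam) = 1 / lam if lam >= alpha else 0.
    """
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    lam = s**2                                    # eigenvalues of T* T (squared singular values)
    if method == "tikhonov":
        q = 1.0 / (lam + alpha)
    elif method == "cutoff":
        q = np.where(lam >= alpha, 1.0 / lam, 0.0)
    else:
        raise ValueError(f"unknown filter: {method}")
    # In the singular basis: f_hat = sum_k q(lam_k) * s_k * <Y, u_k> * v_k
    return Vt.T @ (q * s * (U.T @ Y))
```

For instance, f_hat = filter_estimate(T, Y, alpha=1e-4) computes a Tikhonov-regularized reconstruction from the data generated in the sketch above.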

  12. A posteriori parameter choice methods (section divider)

  13.–16. A posteriori parameter choice methods: The discrepancy principle
  • For deterministic data: α_DP = max{α > 0 : ‖T f̂_α − Y‖_Y ≤ τσ}
  • But here Y ∉ Y! Either pre-smoothing (Y ↦ Z := T*Y ∈ X) ...
  • ... or discretization: Y ∈ R^n, ξ ∼ N_n(0, I_n), and choose
      α_DP = max{α > 0 : ‖T f̂_α − Y‖ ≤ τσ√n}
  Pros:
  • Easy to implement
  • Works for all q_α
  • Order-optimal convergence rates
  Cons:
  • How to choose τ ≥ 1?
  • Only meaningful after discretization
  • Early saturation
  Davies & Anderssen '86, Lukas '95, Blanchard, Hoffmann & Reiß '16
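A minimal sketch of the discretized discrepancy principle over a finite grid of candidate values, reusing the illustrative filter_estimate helper from above; the default τ and the fallback rule are assumptions, not choices made in the talk.

```python
import numpy as np

def discrepancy_principle(T, Y, alphas, sigma, tau=1.1, method="tikhonov"):
    """Discretized discrepancy principle (sketch): return the largest candidate alpha
    whose residual satisfies ||T f_hat_alpha - Y|| <= tau * sigma * sqrt(n)."""
    n = Y.shape[0]
    threshold = tau * sigma * np.sqrt(n)
    admissible = [a for a in alphas
                  if np.linalg.norm(T @ filter_estimate(T, Y, a, method) - Y) <= threshold]
    # Fall back to the smallest candidate if no alpha meets the discrepancy bound (assumption).
    return max(admissible) if admissible else min(alphas)
```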

  17.–20. A posteriori parameter choice methods: The quasi-optimality criterion
  • Neubauer '08 (with r_α(λ) = 1 − λ q_α(λ)): α_QO = argmin_{α > 0} ‖r_α(T*T) f̂_α‖_X
  • But for spectral cut-off, r_α(T*T) f̂_α = 0 for all α > 0
  • Alternative formulation for Tikhonov regularization if candidates α_1 < ... < α_m are given:
      n_QO = argmin_{1 ≤ n ≤ m−1} ‖f̂_{α_n} − f̂_{α_{n+1}}‖_X,   α_QO := α_{n_QO}.
  Pros:
  • Easy to implement, very fast
  • No knowledge of σ necessary
  • Order-optimal convergence rates in mildly ill-posed situations
  Cons:
  • Only for special q_α
  • Additional assumptions on noise and/or f necessary
  • Performance unclear in severely ill-posed situations
  Bauer & Kindermann '08, Bauer & Reiß '08, Bauer & Kindermann '09
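A sketch of the grid-based quasi-optimality criterion for Tikhonov regularization, again using the illustrative filter_estimate helper; the candidate grid in the usage line is an assumption.

```python
import numpy as np

def quasi_optimality(T, Y, alphas):
    """Quasi-optimality (sketch): for candidates alpha_1 < ... < alpha_m, pick alpha_n
    minimizing ||f_hat_{alpha_n} - f_hat_{alpha_{n+1}}|| (Tikhonov filter).
    Note that the noise level sigma is not needed anywhere."""
    alphas = np.sort(np.asarray(alphas))
    estimates = [filter_estimate(T, Y, a, method="tikhonov") for a in alphas]
    diffs = [np.linalg.norm(estimates[i] - estimates[i + 1]) for i in range(len(alphas) - 1)]
    return alphas[int(np.argmin(diffs))]

# Example usage with an illustrative candidate grid:
# alpha_qo = quasi_optimality(T, Y, np.logspace(-8, 0, 30))
```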

  21. A posteriori parameter choice methods: The Lepskiĭ-type balancing principle
  • For given α, the standard deviation of f̂_α can be bounded by
      std(α) := σ √(Tr(q_α(T*T)² T*T))
  • If candidates α_1 < ... < α_m are given:
      n_LEP = max{j : ‖f̂_{α_j} − f̂_{α_k}‖_X ≤ 4κ std(α_k) for all 1 ≤ k ≤ j}
    and α_LEP = α_{n_LEP}
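A sketch of the Lepskiĭ-type balancing principle as stated on the slide, with std(α) evaluated through the SVD of T and reusing the illustrative filter_estimate helper; the default value of the constant κ is an assumption.

```python
import numpy as np

def lepskii_balancing(T, Y, alphas, sigma, kappa=1.0, method="tikhonov"):
    """Lepskii-type balancing (sketch): n_LEP is the largest index j such that
    ||f_hat_{alpha_j} - f_hat_{alpha_k}|| <= 4 * kappa * std(alpha_k) for all k <= j,
    where std(alpha) = sigma * sqrt(Tr(q_alpha(T*T)^2 T*T))."""
    alphas = np.sort(np.asarray(alphas))
    _, s, _ = np.linalg.svd(T, full_matrices=False)
    lam = s**2                                    # eigenvalues of T* T

    def std(alpha):
        if method == "tikhonov":
            q = 1.0 / (lam + alpha)
        else:                                     # spectral cut-off
            q = np.where(lam >= alpha, 1.0 / lam, 0.0)
        return sigma * np.sqrt(np.sum(q**2 * lam))

    estimates = [filter_estimate(T, Y, a, method) for a in alphas]
    stds = [std(a) for a in alphas]

    n_lep = 0
    for j in range(len(alphas)):
        if all(np.linalg.norm(estimates[j] - estimates[k]) <= 4 * kappa * stds[k]
               for k in range(j + 1)):
            n_lep = j
    return alphas[n_lep]
```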
