  1. Statistical Inverse Problems and Instrumental Variables. Thorsten Hohage, Institut für Numerische und Angewandte Mathematik, University of Göttingen. Workshop on Inverse and Partial Information Problems: Methodology and Applications, RICAM, Linz, 27.-31.10.2008

  2. Collaborators • Frank Bauer (Linz) • Laurent Cavalier (Marseille) • Jean-Pierre Florens (Toulouse) • Jan Johannes (Heidelberg) • Enno Mammen (Mannheim) • Axel Munk (Göttingen)

  3. outline 1. A Newton method for nonlinear statistical inverse problems 2. Oracle inequalities 3. Nonparametric instrumental variables and perturbed operators

  4. statistical inverse problem Problem: Let X, Y be separable Hilbert spaces and F : D(F) ⊂ X → Y a Fréchet differentiable, one-to-one operator. Estimate a† given indirect observations in the form of a random process
  Y = F(a†) + σξ + δζ.
  Note that F⁻¹ is not continuous!
  • ξ: normalized stochastic noise, a Hilbert-space valued random process satisfying Eξ = 0 and ‖Cov ξ‖ ≤ 1
  • σ ≥ 0: stochastic noise level
  • ζ ∈ Y: normalized deterministic noise, ‖ζ‖ = 1
  • δ ≥ 0: deterministic noise level
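A minimal discretized sketch of this observation model (the grid size, the particular forward operator F, and the noise draw below are illustrative assumptions; the talk works in abstract Hilbert spaces):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                      # discretization level of X and Y

def F(a):
    # hypothetical smoothing forward operator: cumulative integration
    return np.cumsum(a) / n

a_true = np.sin(np.linspace(0, np.pi, n))    # the unknown a†
xi = rng.standard_normal(n)                  # white noise: E ξ = 0, ‖Cov ξ‖ ≤ 1
zeta = np.ones(n) / np.sqrt(n)               # deterministic noise with ‖ζ‖ = 1
sigma, delta = 1e-2, 1e-3                    # stochastic / deterministic noise levels

Y = F(a_true) + sigma * xi + delta * zeta    # the observed process
```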

  5. the algorithm The Newton equation
  F′[â_k](â_{k+1} − â_k) = Y − F(â_k), k = 1, 2, ...
  is regularized in each step by Tikhonov regularization with initial guess a_0 and regularization parameters α_k = α_0 q^k, q ∈ (0, 1):
  â_{k+1} := argmin_{a ∈ X} ‖F′[â_k](a − â_k) + F(â_k) − Y‖²_Y + α_{k+1} ‖a − a_0‖²_X
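A sketch of the resulting iteration for a discretized problem, assuming hypothetical callables `F` (forward map) and `F_prime` (Jacobian matrix); each Tikhonov minimization is solved through its normal equations. This is an illustrative implementation, not the code behind the talk:

```python
import numpy as np

def irgnm(F, F_prime, Y, a0, alpha0=1.0, q=0.5, n_steps=10):
    """Iteratively regularized Gauss-Newton sketch: â_{k+1} minimizes
    ‖F'[â_k](a − â_k) + F(â_k) − Y‖² + α_{k+1} ‖a − a0‖²."""
    a = a0.copy()
    for k in range(n_steps):
        A = F_prime(a)                       # Fréchet derivative at the current iterate
        alpha = alpha0 * q ** (k + 1)        # geometric sequence α_{k+1} = α_0 q^{k+1}
        rhs = A.T @ (Y - F(a) + A @ (a - a0))
        a = a0 + np.linalg.solve(A.T @ A + alpha * np.eye(len(a0)), rhs)
    return a
```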

  6. What is this for linear problems? If F = T is linear, the iteration formula simplifies to
  â_{k+1} := argmin_{a ∈ X} ‖Ta − Y‖²_Y + α_{k+1} ‖a − a_0‖²_X.
  The iteration steps decouple in the sense that none of the previous iterates appears in the formula for â_{k+1}. Bias and variance must be balanced by a proper choice of the stopping index.
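In this linear case each iterate is ordinary Tikhonov regularization with parameter α_{k+1}; a minimal sketch of the closed form via the normal equations (T a matrix, Y and a0 vectors, assumptions carried over from the sketches above):

```python
import numpy as np

def tikhonov(T, Y, a0, alpha):
    # argmin_a ‖T a − Y‖² + α ‖a − a0‖², solved via (T'T + αI)(a − a0) = T'(Y − T a0)
    n = T.shape[1]
    return a0 + np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ (Y - T @ a0))
```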

  7. What if â_k ∉ D(F) for some k?
  • Since typically D(F) ≠ X and the stochastic noise σξ can be arbitrarily large, there is a positive probability that â_k ∉ D(F) in each Newton step.
  • "Emergency stop": If this happens, we stop the Newton iteration and return a_0 as the estimator of a†.
  • We will have to show that the probability that such an emergency stop is necessary tends to 0 rapidly as the stochastic noise level σ → 0.
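A sketch of how the emergency stop can be wired into the loop, assuming for illustration that D(F) contains the ball {a : ‖a − a_0‖ ≤ 2R}; the domain test and the `step` callable (one regularized Newton update such as the one above) are hypothetical placeholders:

```python
import numpy as np

def newton_with_emergency_stop(step, a0, R, n_steps):
    """step(a, k) performs one regularized Newton update; if an iterate leaves
    the (assumed) domain ball, return the initial guess a0 as the estimator."""
    a = a0.copy()
    for k in range(n_steps):
        a = step(a, k)
        if np.linalg.norm(a - a0) > 2 * R:   # â_k ∉ D(F): emergency stop
            return a0
    return a
```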

  8. Can we improve on the qualification of Tikhonov regularization? Replace Tikhonov regularization by m-fold iterated Tikhonov regularization:
  â^(0)_{k+1} := a_0
  â^(j)_{k+1} := argmin_{a ∈ X} ‖F′[â_k](a − â_k) + F(â_k) − Y‖²_Y + α_{k+1} ‖a − â^(j−1)_{k+1}‖²_X, j = 1, ..., m
  â_{k+1} := â^(m)_{k+1}
  Closed formula:
  â_{k+1} := a_0 + g_{α_{k+1}}(F′[â_k]∗ F′[â_k]) F′[â_k]∗ (Y − F(â_k) + F′[â_k](â_k − a_0))
  with r_α(λ) := (α/(α+λ))^m and g_α(λ) := (1 − r_α(λ))/λ.
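A sketch of one Newton step using the closed formula, with the filter g_α applied through an SVD of A = F′[â_k] (the names and the SVD route are illustrative assumptions):

```python
import numpy as np

def iterated_tikhonov_step(A, data_misfit, a_k, a0, alpha, m):
    """One step â_{k+1} = a0 + g_α(A'A) A' (data_misfit + A (â_k − a0)),
    where A = F'[â_k] and data_misfit = Y − F(â_k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    lam = s ** 2
    r = (alpha / (alpha + lam)) ** m                       # r_α(λ) = (α/(α+λ))^m
    with np.errstate(divide="ignore", invalid="ignore"):
        g = np.where(lam > 0, (1.0 - r) / lam, m / alpha)  # g_α(λ), with g_α(0) = m/α
    rhs = data_misfit + A @ (a_k - a0)
    return a0 + Vt.T @ (g * s * (U.T @ rhs))
```

For m = 1 this reduces to the ordinary Tikhonov step above; larger m raises the qualification of the method from 1 to m.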

  9. references Deterministic convergence analysis:
  B. Kaltenbacher, A. Neubauer, O. Scherzer. Iterative Regularization Methods for Nonlinear Ill-Posed Problems. Radon Series on Computational and Applied Mathematics, de Gruyter, Berlin, 2008.
  A. B. Bakushinsky and M. Y. Kokurin. Iterative Methods for Approximate Solution of Inverse Problems. Springer, Dordrecht, 2008.
  A. B. Bakushinsky. The problem of the convergence of the iteratively regularized Gauss-Newton method. Comput. Math. Math. Phys., 32:1353–1359, 1992.
  The following results are from:
  F. Bauer, T. Hohage and A. Munk. Iteratively Regularized Gauss-Newton Method for Nonlinear Inverse Problems with Random Noise. Preprint, under revision for SIAM J. Numer. Anal.

  10. error decomposition Let T := F′[a†] and T_k := F′[â_k]. The error E_k = â_k − a† in the k-th Newton step can be decomposed into
  • an approximation error E^app_{k+1} := r_{α_{k+1}}(T∗T) E_0,
  • a propagated data noise error E^noi_{k+1} := g_{α_{k+1}}(T_k∗ T_k) T_k∗ (δζ + σξ),
  • and a nonlinearity error E^nl_{k+1} := g_{α_{k+1}}(T_k∗ T_k) T_k∗ (F(a†) − F(â_k) + T_k E_k) + (r_{α_{k+1}}(T_k∗ T_k) − r_{α_{k+1}}(T∗T)) E_0,
  i.e. E_{k+1} = E^app_{k+1} + E^noi_{k+1} + E^nl_{k+1}.
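The decomposition can be verified numerically on a small nonlinear toy problem (the forward map, sizes, and noise below are all illustrative assumptions); the identity E_{k+1} = E^app_{k+1} + E^noi_{k+1} + E^nl_{k+1} then holds exactly, up to floating point, for the closed-formula step:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, alpha = 8, 2, 0.1

def F(a):                                    # hypothetical nonlinear forward map
    return np.cumsum(a) / n + 0.1 * a ** 2

def F_prime(a):                              # its Jacobian
    return np.tril(np.ones((n, n))) / n + np.diag(0.2 * a)

def apply_filter(B, v, filt):                # computes filt(B'B) v spectrally
    lam, V = np.linalg.eigh(B.T @ B)
    return V @ (filt(np.maximum(lam, 0.0)) * (V.T @ v))

r = lambda lam: (alpha / (alpha + lam)) ** m

def g(lam):
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(lam > 0, (1 - r(lam)) / lam, m / alpha)

a_true = rng.standard_normal(n)
a0 = a_true + 0.3 * rng.standard_normal(n)
a_k = a0 + 0.1 * rng.standard_normal(n)      # some current iterate â_k
noise = 1e-2 * rng.standard_normal(n)        # δζ + σξ lumped together
Y = F(a_true) + noise

T, Tk = F_prime(a_true), F_prime(a_k)
E0, Ek = a0 - a_true, a_k - a_true

# one Newton step via the closed formula, and its error E_{k+1}
a_next = a0 + apply_filter(Tk, Tk.T @ (Y - F(a_k) + Tk @ (a_k - a0)), g)
E_next = a_next - a_true

E_app = apply_filter(T, E0, r)
E_noi = apply_filter(Tk, Tk.T @ noise, g)
E_nl = (apply_filter(Tk, Tk.T @ (F(a_true) - F(a_k) + Tk @ Ek), g)
        + apply_filter(Tk, E0, r) - apply_filter(T, E0, r))
print(np.allclose(E_next, E_app + E_noi + E_nl))   # expected: True
```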

  11. crucial lemma Lemma. Under certain assumptions discussed below there exists γ_nl > 0 such that
  ‖E^nl_k‖ ≤ γ_nl (‖E^app_k‖ + ‖E^noi_k‖), k = 1, ..., K_max.

  12. assumptions of the lemma
  • Source condition: There exists a sufficiently small "source" w ∈ Y such that a_0 − a† = T∗w.
  • α_0 sufficiently large such that ‖E_0‖ ≤ q^{−m} ‖E^app_1‖.
  • Lipschitz condition: For all a_1, a_2 ∈ D(F), ‖F′[a_1] − F′[a_2]‖ ≤ L ‖a_1 − a_2‖.
  • Choice of K_max: K_max := max{ k ∈ ℕ : ‖E^noi_k‖ ≤ C_stop √α_k }.

  13. on the proof of the lemma
  • The proof uses a straightforward induction argument in k.
  • The following properties of iterated Tikhonov regularization are used:
  • There exists γ_app > 0 such that ‖E^app_{k+1}‖ ≤ ‖E^app_k‖ ≤ γ_app ‖E^app_{k+1}‖ for all k. This rules out methods with infinite qualification such as Landweber iteration!
  • The propagated data noise is an ordered process in the sense that ‖E^noi_k‖ ≤ ‖E^noi_{k+1}‖ for all k.

  14. optimal deterministic rates Corollary. For deterministic errors (σ = 0) define the optimal stopping index by
  K∗ := min{K_max, K}, K := argmin_{k ∈ ℕ} ( ‖E^app_k‖ + δ/√α_k ).
  Then there exist constants C, δ_0 > 0 such that
  ‖â_{K∗} − a†‖ ≤ C inf_{k ∈ ℕ} ( ‖E^app_k‖ + δ/√α_k ) for all δ ∈ (0, δ_0].
  In particular, under the Hölder source condition a_0 − a† = (T∗T)^µ w̃ with µ ∈ [1/2, m] we obtain
  ‖â_{K∗} − a†‖ = O( δ^{2µ/(2µ+1)} ‖w̃‖^{1/(2µ+1)} ).
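The exponent can be recovered by the standard balancing argument (a sketch, using the bound ‖E^app_k‖ ≲ α_k^µ ‖w̃‖, which iterated Tikhonov with qualification m ≥ µ yields under this source condition): balancing α^µ ‖w̃‖ against δ/√α gives α∗ ∼ (δ/‖w̃‖)^{2/(2µ+1)}, and substituting back gives an error of order α∗^µ ‖w̃‖ = δ^{2µ/(2µ+1)} ‖w̃‖^{1/(2µ+1)}. Since α_k = α_0 q^k is geometric, some k realizes α∗ up to a constant factor.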

  15. propagated data noise error We make the following assumptions on the variance term V(a, α) := ‖g_α(F′[a]∗ F′[a]) F′[a]∗ ξ‖²:
  • There exists a known function φ_noi such that (E V(a, α))^{1/2} ≤ φ_noi(α) for all α ∈ (0, α_0] and a ∈ D(F).
  • There are constants 1 < γ_noi ≤ γ̄_noi < ∞ such that γ_noi ≤ φ_noi(α_{k+1})/φ_noi(α_k) ≤ γ̄_noi for all k ∈ ℕ_0.
  • (Exponential inequality) There exist λ_1, λ_2 > 0 such that for all a ∈ D(F), α ∈ (0, α_0] and τ ≥ 1,
  P{ V(a, α) ≥ τ E V(a, α) } ≤ λ_1 e^{−λ_2 τ}.
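For white noise ξ the expectation has the closed spectral form E V(a, α) = Σ_i g_α(λ_i)² λ_i, which suggests how a candidate bound φ_noi can be probed numerically; a Monte Carlo sketch (the matrix A, the iterated Tikhonov filter, and the sample size are illustrative assumptions):

```python
import numpy as np

def mean_variance_term(A, alpha, m=2, n_samples=500, seed=2):
    """Monte Carlo estimate of E V(a, α) = E ‖g_α(A'A) A' ξ‖² for white noise ξ,
    returned alongside the exact value Σ_i g_α(λ_i)² λ_i (λ_i = s_i²)."""
    rng = np.random.default_rng(seed)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    lam = s ** 2
    r = (alpha / (alpha + lam)) ** m
    with np.errstate(divide="ignore", invalid="ignore"):
        g = np.where(lam > 0, (1 - r) / lam, m / alpha)
    xi = rng.standard_normal((n_samples, A.shape[0]))      # white noise samples
    V = (((g * s) * (xi @ U)) ** 2).sum(axis=1)            # samples of V(a, α)
    return V.mean(), np.sum(g ** 2 * lam)
```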

  16. optimal rates for known smoothness Theorem. Assume that {a : ‖a − a_0‖ ≤ 2R} ⊂ D(F) and define the optimal stopping index
  K := argmin_{k ∈ ℕ} ( ‖E^app_k‖ + δ/√α_k + σ φ_noi(α_k) ).
  If ‖â_k − a_0‖ ≤ 2R for k = 1, ..., K, set K∗ := K, otherwise K∗ := 0. Then there exist constants C > 1 and δ_0, σ_0 > 0 such that
  ( E ‖â_{K∗} − a†‖² )^{1/2} ≤ C min_{k ∈ ℕ} ( ‖E^app_k‖ + δ/√α_k + σ φ_noi(α_k) )
  for all δ ∈ (0, δ_0] and σ ∈ (0, σ_0]. In short: the Newton method achieves the same rate as iterated Tikhonov regularization applied to the linearized problem.

  17. outline 1. A Newton method for nonlinear statistical inverse problems 2. Oracle inequalities 3. Nonparametric instrumental variables and perturbed operators

  18. oracle parameter choice rules Consider an inverse problem Y = F(a†) + σξ + δζ and a family {R_α : Y → X} of regularized inverses of F. An oracle parameter choice rule α_or for the method {R_α} and the solution a† is defined by
  sup_{‖ζ‖≤1} E ‖R_{α_or}(Y) − a†‖² = inf_α sup_{‖ζ‖≤1} E ‖R_α(Y) − a†‖².
  An oracle inequality for some given parameter choice rule α∗ = α∗(Y, σ, δ) is an estimate of the form
  sup_{‖ζ‖≤1} E ‖R_{α∗}(Y) − a†‖² ≤ χ(σ, δ) sup_{‖ζ‖≤1} E ‖R_{α_or}(Y) − a†‖².
  In the optimal case χ(σ, δ) → 1 as σ, δ → 0.
  E. Candès. Modern statistical estimation via oracle inequalities. Acta Numerica, 15:257–325, 2006.
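A sketch of computing the oracle parameter on a grid for linear Tikhonov with a_0 = 0 and δ = 0 (so the sup over ζ drops out); for white noise the exact risk splits into squared bias plus variance in the SVD basis. The operator, grid, and noise level below are illustrative assumptions:

```python
import numpy as np

n = 50
T = np.tril(np.ones((n, n))) / n                 # hypothetical smoothing operator
a_true = np.sin(np.linspace(0, np.pi, n))        # a†
sigma = 1e-3                                     # white-noise level, δ = 0

U, s, Vt = np.linalg.svd(T)
b = Vt @ a_true                                  # coefficients of a† in the SVD basis

def risk(alpha):
    f = s ** 2 / (s ** 2 + alpha)                # Tikhonov filter factors
    bias2 = np.sum(((1 - f) * b) ** 2)           # squared bias ‖r_α(T'T) a†‖²
    var = sigma ** 2 * np.sum((f / s) ** 2)      # variance E ‖g_α(T'T) T' σξ‖²
    return bias2 + var

alphas = np.logspace(-8, 0, 200)
alpha_oracle = alphas[np.argmin([risk(al) for al in alphas])]
print(alpha_oracle, risk(alpha_oracle))
```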

  19. typical convergence results in deterministic regularization theory
  • In deterministic theory, convergence results for parameter choice rules typically contain a comparison with all other reconstruction methods R : Y → X.
  • In this case one cannot consider only one a† ∈ X, since otherwise the optimal method would be R(Y) ≡ a†.
  • Hence, estimates must be uniform over a smoothness class S ⊂ X, which is typically defined by a source condition, e.g.
  sup_{a† ∈ S} sup_{‖ζ‖≤1} ‖R_{α∗}(F(a†) + δζ) − a†‖ ≤ C inf_{R̃} sup_{a† ∈ S} sup_{‖ζ‖≤1} ‖R̃(F(a†) + δζ) − a†‖.
