Iterative regularization for general inverse problems
Guillaume Garrigos, with L. Rosasco and S. Villa
CNRS, École Normale Supérieure
Séminaire CVN, Centrale Supélec, 23 Jan 2018
Outline
1. Regularization of inverse problems
2. Regularization by penalization and early stopping
3. Iterative regularization for general models
Intro: Inverse Problems

An ill-posed inverse problem. Given $A : X \to Y$ and $\bar{y} \in Y$, we want to solve
$$Ax = \bar{y} \qquad (P)$$

Typically $\bar{y} = A\bar{x}$:
- Signal/image processing: $\bar{x}$ is the original signal, deteriorated by $A$
- Linear regression: $(a_i, y_i)$ are the data, $A = (a_1; \dots; a_i; \dots)$
- Non-linear/kernel regression/SVM: same, but the $a_i$'s are sent into a feature space

(P) might be ill-posed!
- (P) might have no solution → introduce a discrepancy: $D(Ax;\bar{y}) = \|Ax - \bar{y}\|$, $\|Ax - \bar{y}\|_1$, or $D_{\mathrm{KL}}(Ax;\bar{y})$, ...
- The solution $x^\dagger$ might not be unique → introduce a prior: $R(x)$ a convex functional ($\|x\|^2$, $\|Wx\|_1$, $\|\nabla x\|$, ...)

This leads to
$$x^\dagger = \operatorname*{arg\,min}_{x \,\in\, \operatorname{arg\,min} D(A\,\cdot\,;\,\bar{y})} R(x) \qquad (P)$$
(P) is our model.
Intro: Inverse Problems

What about stability to noise? Suppose we only observe $\hat{y} = \bar{y} + \varepsilon$.

[Figure: a noisy example, showing the original $\bar{x}$, the clean data $\bar{y} = A\bar{x}$, the noisy data $\hat{y}$, and the corresponding solution $x^\dagger$.]

We need to impose well-posedness!
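A minimal numerical illustration of this instability (my own sketch, not from the talk): with an ill-conditioned operator $A$, exactly solving $Ax = y$ is fine on clean data but useless on noisy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an ill-conditioned forward operator A with fast-decaying
# singular values: inverting it amplifies noise by up to 1/s_min.
n = 12
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** -np.arange(n)                      # singular values 1, 0.1, ..., 1e-11
A = U @ np.diag(s) @ V.T

x_bar = rng.standard_normal(n)                 # "true" signal
y_bar = A @ x_bar                              # clean data
y_hat = y_bar + 1e-6 * rng.standard_normal(n)  # tiny additive noise

x_clean = np.linalg.solve(A, y_bar)            # recovers x_bar accurately
x_noisy = np.linalg.solve(A, y_hat)            # noise amplified by ~1/s_min
print(np.linalg.norm(x_clean - x_bar))         # small
print(np.linalg.norm(x_noisy - x_bar))         # enormous
```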
Regularization

Regularization is a parametrization of a low-dimensional subset of the space of solutions, balancing between fitting the data and fitting the model.

We want a map $(y, \lambda) \in Y \times \mathcal{P} \mapsto x_\lambda(y) \in X$ such that:
1. $\lim_{\lambda \in \mathcal{P}} x_\lambda(\bar{y}) = x^\dagger$;
2. $\|\hat{y} - \bar{y}\| \le \delta \ \Rightarrow\ \exists\, \lambda_\delta \in \mathcal{P},\ \|x_{\lambda_\delta}(\hat{y}) - x^\dagger\| = O(\delta^\alpha)$.

A good regularization method is one for which $\alpha$ is large.
Regularization

$$x^\dagger = \operatorname*{arg\,min}_{x \,\in\, \operatorname{arg\,min} D(A\,\cdot\,;\,\bar{y})} R(x) \qquad (P)$$

Which regularization method for our model problem?
Regularization via Penalization (Tikhonov)

Penalization method:
$$x_\lambda(y) := \operatorname*{arg\,min}_{x \in X}\ \lambda R(x) + D(Ax; y) \qquad (P_\lambda)$$

In practice: from (P), solve $(P_{\lambda_1}), (P_{\lambda_2}), (P_{\lambda_3}), \dots$ for a grid of parameters (one optimization per parameter), collect the solutions $x_{\lambda_1}, x_{\lambda_2}, x_{\lambda_3}, \dots$ into a regularization path, then perform parameter selection to obtain $x_{\lambda_\delta}$.

[Example figure: reconstructions for $\lambda = 1$, $\lambda = 0.3$, $\lambda = 0.01$.]
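For the quadratic case $R(x) = \|x\|^2$, $D(Ax;y) = \|Ax - y\|^2$, each $(P_\lambda)$ has a closed-form solution via the normal equations. A minimal numpy sketch of the path construction (my own illustration, not from the talk; `tikhonov_path` is a hypothetical helper name):

```python
import numpy as np

def tikhonov_path(A, y, lambdas):
    """Solve (P_lambda): argmin_x lambda*||x||^2 + ||Ax - y||^2
    for each lambda, via the normal equations
    (A^T A + lambda*I) x = A^T y  (closed form in the quadratic case)."""
    n = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    return [np.linalg.solve(AtA + lam * np.eye(n), Aty) for lam in lambdas]

# Usage: compute the path on a grid of lambdas, then apply a
# parameter-selection rule to pick x_{lambda_delta}:
# path = tikhonov_path(A, y_hat, [1.0, 0.3, 0.01])
```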
Regularization via Penalization (Tikhonov)

Tikhonov regularization is a regularization method (linear case).
Assume $R(x) = \|x\|^2$, $D(Ax; y) = \|Ax - y\|^2$, and $x^\dagger \in \operatorname{Range}(A^*)$. Let $\|\hat{y} - \bar{y}\| \le \delta$ and let $\hat{x}_\lambda$ be generated from the data $\hat{y}$. If $\lambda_\delta = O(\delta)$, then
$$\|\hat{x}_{\lambda_\delta} - x^\dagger\| \lesssim \delta^{1/2}.$$

- The exponent $1/2$ is optimal.
- Very few results for other models...
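A rough numerical sanity check of this rate (my own sketch; the $\sqrt{\delta}$ behavior is only visible when $A$ is genuinely ill-conditioned and the source condition $x^\dagger \in \operatorname{Range}(A^*)$ holds):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag((1.0 + np.arange(n)) ** -4.0) @ V.T  # decaying spectrum: ill-posed

x_dag = A.T @ rng.standard_normal(n)   # source condition: x† in Range(A*)
y_bar = A @ x_dag

for delta in [1e-2, 1e-4, 1e-6, 1e-8]:
    eps = rng.standard_normal(n)
    y_hat = y_bar + delta * eps / np.linalg.norm(eps)  # ||y_hat - y_bar|| = delta
    lam = delta                                        # choice lambda_delta = O(delta)
    x_lam = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y_hat)
    print(f"delta={delta:.0e}  error={np.linalg.norm(x_lam - x_dag):.2e}")
# Expect the error to shrink roughly like sqrt(delta),
# i.e. about 10x smaller per 100x drop in delta.
```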
Iterative Regularization (Early Stopping)

Early stopping: take any (robust) algorithm solving directly (P):
$$\operatorname*{arg\,min}_{x \,\in\, \operatorname{arg\,min} D(A\,\cdot\,;\,\bar{y})} R(x)$$
The regularization path is the sequence of iterates $(x_n)_{n \in \mathbb{N}}$; the regularization parameter is the iteration counter $n$.

In practice: run the algorithm on (P) once, producing the iterates $(x_n)_{n \in \mathbb{N}}$ as the regularization path, then perform parameter selection on the stopping index to obtain $x_{n_\delta}$.

[Example figure: iterates at $n = 300$, $n = 500$, $n = 1000$.]
Iterative Regularization (Robust Optimization)

The algorithm(s). If $D(Ax; y) = \|Ax - y\|^2$, the constraint is linear, so the dual of (P) is
$$\min_u\ R^*(-A^*u) + \langle u, y \rangle,$$
which can be solved by gradient descent on the dual:
$$x_n = \nabla R^*(-A^*u_n), \qquad u_{n+1} = u_n + \tau (Ax_n - y).$$
NB: if $R = \|\cdot\|^2$, this becomes the Landweber algorithm
$$x_{n+1} = x_n - \tau A^*(Ax_n - y).$$
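A minimal numpy sketch of this dual iteration (my own illustration; `dual_gradient` and the step-size choice are assumptions). For $R(x) = \|x\|^2$ one has $R^*(z) = \|z\|^2/4$, hence $\nabla R^*(z) = z/2$, and the iteration reduces to Landweber up to the step size.

```python
import numpy as np

def dual_gradient(A, y, grad_R_star, tau, n_iters):
    """Iterative regularization via gradient descent on the dual
    min_u R*(-A^T u) + <u, y>:
        x_n     = grad_R_star(-A^T u_n)
        u_{n+1} = u_n + tau * (A x_n - y)
    Returns the whole primal path (x_n)_n -- the regularization path."""
    u = np.zeros(A.shape[0])
    path = []
    for _ in range(n_iters):
        x = grad_R_star(-A.T @ u)
        u = u + tau * (A @ x - y)
        path.append(x)
    return path

# For R(x) = ||x||^2: R*(z) = ||z||^2 / 4, so grad R*(z) = z / 2, and the
# iteration is Landweber: x_{n+1} = x_n - (tau/2) A^T (A x_n - y).
# path = dual_gradient(A, y_hat, lambda z: z / 2,
#                      tau=1.0 / np.linalg.norm(A, 2) ** 2, n_iters=1000)
```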
Iterative Regularization (Robust Optimization)

Gradient descent is a regularization method.
Assume $R(x) = \|x\|^2$, $D(Ax; y) = \|Ax - y\|^2$, and $x^\dagger \in \operatorname{Range}(A^*)$. Let $\|\hat{y} - \bar{y}\| \le \delta$ and let $\hat{x}_n$ be generated from the data $\hat{y}$ via
$$\hat{x}_{n+1} = \hat{x}_n - \gamma A^*(A\hat{x}_n - \hat{y}).$$
If $n_\delta = O(\delta^{-1})$, then
$$\|\hat{x}_{n_\delta} - x^\dagger\| \lesssim \delta^{1/2}.$$
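Analogous to the Tikhonov check above, a rough numerical illustration of the early-stopping rate (again my own sketch, under the same ill-conditioned setup and source condition): run Landweber on noisy data and stop after $n_\delta \propto 1/\delta$ iterations.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag((1.0 + np.arange(n)) ** -4.0) @ V.T  # ill-conditioned operator

x_dag = A.T @ rng.standard_normal(n)   # source condition: x† in Range(A*)
y_bar = A @ x_dag
gamma = 1.0 / np.linalg.norm(A, 2) ** 2

for delta in [1e-2, 1e-3, 1e-4]:
    eps = rng.standard_normal(n)
    y_hat = y_bar + delta * eps / np.linalg.norm(eps)  # ||y_hat - y_bar|| = delta
    x = np.zeros(n)
    for _ in range(int(1.0 / delta)):                  # early stopping: n_delta = O(1/delta)
        x = x - gamma * A.T @ (A @ x - y_hat)
    print(f"delta={delta:.0e}  error={np.linalg.norm(x - x_dag):.2e}")
# The error at the stopping index should shrink roughly like sqrt(delta).
```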