

  1. Institut de Physique du Globe de Paris & Université Pierre et Marie Curie (Paris VI) Course on Inverse Problems Albert Tarantola Lesson XVII: Least-Squares Involving Functions

  2. ⇒ mathematica notebook

  3. II: Functional Formulation. The a priori information is represented by $m_{\mathrm{prior}} = \{ m_{\mathrm{prior}}(z) \}$ ; $C_{\mathrm{prior}} = \{ C_{\mathrm{prior}}(z,z') \}$. The observable parameters are of the kind $o^i = \int dz \, O^i(z) \, m(z)$, where, in our example, $O^1(z)$ is a delta function and $O^2(z)$ is a box-car function. This corresponds to a linear relation between the model function $m$ and the observations $o$: $o = O \, m$. We have some observations $o_{\mathrm{obs}} = \{ o^i_{\mathrm{obs}} \}$ ; $C_{\mathrm{obs}} = \{ C^{ij}_{\mathrm{obs}} \}$.
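
A minimal NumPy sketch of how these two functionals might be discretized on a grid; the grid size, the delta position $z = 0.3$, and the box-car interval $[0.5, 0.8]$ are illustrative assumptions, not values from the lesson:

```python
import numpy as np

n = 200                                     # number of grid points (assumed)
z, dz = np.linspace(0.0, 1.0, n, retstep=True)

# O^1(z): delta function at z = 0.3, i.e. sampling m at the nearest node.
O1 = np.zeros(n)
O1[np.argmin(np.abs(z - 0.3))] = 1.0 / dz   # unit integral once multiplied by dz

# O^2(z): box-car on [0.5, 0.8], i.e. averaging m over that interval.
O2 = np.where((z >= 0.5) & (z <= 0.8), 1.0 / 0.3, 0.0)

O = np.vstack([O1, O2])                     # rows are the kernels O^i(z)

def observe(m):
    r"""o^i = \int dz O^i(z) m(z), approximated by a Riemann sum."""
    return O @ m * dz
```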

  4. The solution is provided by the standard least-squares equation, that we just need to interpret. The first equation is $m_{\mathrm{post}} = m_{\mathrm{prior}} - C_{\mathrm{prior}} O^t ( O C_{\mathrm{prior}} O^t + C_{\mathrm{obs}} )^{-1} ( O m_{\mathrm{prior}} - o_{\mathrm{obs}} )$, i.e., $m_{\mathrm{post}} = m_{\mathrm{prior}} - P \, Q \, r$, where $P = C_{\mathrm{prior}} O^t$, $S = O C_{\mathrm{prior}} O^t + C_{\mathrm{obs}}$, $Q = S^{-1}$, and $r = O m_{\mathrm{prior}} - o_{\mathrm{obs}}$.

  5. This leads to $m_{\mathrm{post}}(z) = m_{\mathrm{prior}}(z) - \sum_i \sum_j P_i(z) \, Q_{ij} \, r_j$, where $P_i(z) = \int dz' \, C_{\mathrm{prior}}(z,z') \, O_i(z')$, $S_{ij} = \int dz \int dz' \, O_i(z) \, C_{\mathrm{prior}}(z,z') \, O_j(z') + C^{ij}_{\mathrm{obs}}$, $Q = S^{-1}$, and $r_i = \int dz \, O_i(z) \, m_{\mathrm{prior}}(z) - o^i_{\mathrm{obs}}$.

  6. The second equation is $C_{\mathrm{post}} = C_{\mathrm{prior}} - C_{\mathrm{prior}} O^t ( O C_{\mathrm{prior}} O^t + C_{\mathrm{obs}} )^{-1} O C_{\mathrm{prior}}$, i.e., $C_{\mathrm{post}} = C_{\mathrm{prior}} - P \, Q \, P^t$, with $P = C_{\mathrm{prior}} O^t$, $S = O C_{\mathrm{prior}} O^t + C_{\mathrm{obs}}$, and $Q = S^{-1}$.

  7. This leads to $C_{\mathrm{post}}(z,z') = C_{\mathrm{prior}}(z,z') - \sum_i \sum_j P_i(z) \, Q_{ij} \, P_j(z')$, with $P_i(z) = \int dz' \, C_{\mathrm{prior}}(z,z') \, O_i(z')$, $S_{ij} = \int dz \int dz' \, O_i(z) \, C_{\mathrm{prior}}(z,z') \, O_j(z') + C^{ij}_{\mathrm{obs}}$, and $Q = S^{-1}$. ⇒ mathematica notebook
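
The kernel formulas of slides 4 to 7 translate almost line by line into a discretized computation. A minimal sketch, assuming an exponential a priori covariance and the delta/box-car kernels above (all numerical values are illustrative):

```python
import numpy as np

n = 200
z, dz = np.linspace(0.0, 1.0, n, retstep=True)

# A priori model and covariance (all numerical values are assumptions).
m_prior = np.zeros(n)
sigma, ell = 1.0, 0.2
C_prior = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / ell)

# Observation kernels O_i(z): a delta at z = 0.3 and a box-car on [0.5, 0.8].
O = np.zeros((2, n))
O[0, np.argmin(np.abs(z - 0.3))] = 1.0 / dz
O[1] = np.where((z >= 0.5) & (z <= 0.8), 1.0 / 0.3, 0.0)

o_obs = np.array([0.5, -0.2])                 # assumed observed values
C_obs = 0.01 * np.eye(2)                      # assumed observational covariance

P = C_prior @ O.T * dz                        # P_i(z) = \int dz' C_prior(z,z') O_i(z')
S = (O * dz) @ C_prior @ O.T * dz + C_obs     # S_ij = \int\int O_i C_prior O_j + C_obs^ij
Q = np.linalg.inv(S)
r = (O * dz) @ m_prior - o_obs                # r_i = \int dz O_i(z) m_prior(z) - o_obs^i

m_post = m_prior - P @ (Q @ r)                # slide 5
C_post = C_prior - P @ Q @ P.T                # slide 7
```

With the integrals replaced by Riemann sums, $P$, $S$, $Q$, and $r$ become ordinary arrays, and the posterior mean and covariance follow from two matrix products.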

  8. Let me bring a clarification. We have examined two of the many possible algorithms providing the mean posterior model $n_{\mathrm{post}}$. The first one was the steepest-descent algorithm $n_{k+1} = n_k - \mu \, \big( C_{\mathrm{prior}} L^t C_{\mathrm{obs}}^{-1} ( L \, n_k - t_{\mathrm{obs}} ) + ( n_k - n_{\mathrm{prior}} ) \big)$, which could advantageously be replaced by a preconditioned steepest-descent algorithm, $n_{k+1} = n_k - P \, \big( C_{\mathrm{prior}} L^t C_{\mathrm{obs}}^{-1} ( L \, n_k - t_{\mathrm{obs}} ) + ( n_k - n_{\mathrm{prior}} ) \big)$, where $P$ is a suitably chosen, but arbitrary, positive definite operator. A good choice of $P$ will accelerate convergence (but not change the final result). Let us guess what $P$ could be.
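
A minimal finite-dimensional sketch of this iteration (the operator $L$, the covariances, and the step size $\mu$ are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_model, n_data = 50, 10
L = rng.normal(size=(n_data, n_model)) / np.sqrt(n_model)  # assumed forward operator
C_prior = np.eye(n_model)
C_obs_inv = np.linalg.inv(0.1 * np.eye(n_data))
n_prior = np.zeros(n_model)
t_obs = rng.normal(size=n_data)                            # assumed data

def descend(P, n_iter=200):
    """n_{k+1} = n_k - P (C_prior L^t C_obs^{-1} (L n_k - t_obs) + (n_k - n_prior))."""
    n_k = n_prior.copy()
    for _ in range(n_iter):
        grad = C_prior @ L.T @ C_obs_inv @ (L @ n_k - t_obs) + (n_k - n_prior)
        n_k = n_k - P @ grad
    return n_k

# Plain steepest descent: P = mu * identity, with an assumed step size mu.
n_plain = descend(0.05 * np.eye(n_model))
```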

  9. The first iteration, when choosing $n_0 = n_{\mathrm{prior}}$, gives $n_1 = n_{\mathrm{prior}} - P \, C_{\mathrm{prior}} L^t C_{\mathrm{obs}}^{-1} ( L \, n_{\mathrm{prior}} - t_{\mathrm{obs}} )$. We also saw the Newton algorithm $n_{\mathrm{post}} = n_{\mathrm{prior}} - C_{\mathrm{prior}} L^t ( L C_{\mathrm{prior}} L^t + C_{\mathrm{obs}} )^{-1} ( L \, n_{\mathrm{prior}} - t_{\mathrm{obs}} )$ that, because the forward relation is here linear, converges in just one step. Equivalent to the last equation is (see lesson XI) $n_{\mathrm{post}} = n_{\mathrm{prior}} - ( L^t C_{\mathrm{obs}}^{-1} L + C_{\mathrm{prior}}^{-1} )^{-1} L^t C_{\mathrm{obs}}^{-1} ( L \, n_{\mathrm{prior}} - t_{\mathrm{obs}} )$, which can also be written as $n_{\mathrm{post}} = n_{\mathrm{prior}} - \Pi \, C_{\mathrm{prior}} L^t C_{\mathrm{obs}}^{-1} ( L \, n_{\mathrm{prior}} - t_{\mathrm{obs}} )$, with $\Pi = \big( C_{\mathrm{prior}} ( L^t C_{\mathrm{obs}}^{-1} L + C_{\mathrm{prior}}^{-1} ) \big)^{-1}$.
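
The step between the last two expressions is worth making explicit: using $(AB)^{-1} = B^{-1} A^{-1}$, $\Pi \, C_{\mathrm{prior}} L^t C_{\mathrm{obs}}^{-1} = \big( C_{\mathrm{prior}} ( L^t C_{\mathrm{obs}}^{-1} L + C_{\mathrm{prior}}^{-1} ) \big)^{-1} C_{\mathrm{prior}} L^t C_{\mathrm{obs}}^{-1} = ( L^t C_{\mathrm{obs}}^{-1} L + C_{\mathrm{prior}}^{-1} )^{-1} C_{\mathrm{prior}}^{-1} \, C_{\mathrm{prior}} L^t C_{\mathrm{obs}}^{-1} = ( L^t C_{\mathrm{obs}}^{-1} L + C_{\mathrm{prior}}^{-1} )^{-1} L^t C_{\mathrm{obs}}^{-1}$, which is exactly the operator multiplying the residual in the lesson-XI form above.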

  10. The conclusion is that the closer the (arbitrary) preconditioning operator $P$ is to $\Pi = \big( C_{\mathrm{prior}} ( L^t C_{\mathrm{obs}}^{-1} L + C_{\mathrm{prior}}^{-1} ) \big)^{-1}$, the faster the preconditioned steepest-descent algorithm will converge (and in only one step if $P = \Pi$, but this is usually impossible). Simplest example: let $Q = C_{\mathrm{prior}} ( L^t C_{\mathrm{obs}}^{-1} L + C_{\mathrm{prior}}^{-1} )$, so we must use $P \approx Q^{-1}$. Letting $Q(x,x')$ be the kernel of $Q$, one may choose to approximate $Q^{-1}$ by the inverse of its diagonal elements, $P(x,x') = \frac{1}{Q(x,x)} \, \delta(x - x')$. Evaluating $Q(x,x)$ grossly corresponds to “counting how many rays pass through point $x$”.
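
A sketch of how this diagonal approximation can be built in finite dimensions, continuing the assumed operators of the previous sketch (whether it actually accelerates convergence depends on how diagonally dominant $Q$ is; for tomographic operators the diagonal roughly encodes ray coverage):

```python
import numpy as np

rng = np.random.default_rng(0)
n_model, n_data = 50, 10
L = rng.normal(size=(n_data, n_model)) / np.sqrt(n_model)  # assumed forward operator
C_prior = np.eye(n_model)
C_obs_inv = np.linalg.inv(0.1 * np.eye(n_data))

# Q = C_prior (L^t C_obs^{-1} L + C_prior^{-1})
Q = C_prior @ (L.T @ C_obs_inv @ L + np.linalg.inv(C_prior))

# Keep only the diagonal of Q and invert it: P(x, x') = delta(x - x') / Q(x, x).
P = np.diag(1.0 / np.diag(Q))                 # can be fed to descend(P) above
```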

  11. I have already explained that, in the context of least squares, and given a covariance function $C(z,z')$, the norm of any function $m = \{ m(z) \}$ is to be defined via $\| m \|^2 = ( m , m ) = \langle C^{-1} m , m \rangle$. Following a discussion with Prof. Mark Simons, I realized that I have not mentioned that when considering the exponential covariance function $C(z,z') = \sigma^2 \exp( - |z - z'| / L )$, one obtains the simple and interesting result (Tarantola, 2005, page 311) $\| m \|^2 = \frac{1}{2\sigma^2} \left( \frac{1}{L} \int dz \, m(z)^2 + L \int dz \left( \frac{dm}{dz}(z) \right)^2 \right)$.
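
A numerical sanity check of this identity (a sketch; $\sigma$, $L$, and the test function are assumed, and the test function is chosen to vanish near the boundaries so that boundary terms can be neglected):

```python
import numpy as np

n = 2000
z, dz = np.linspace(0.0, 1.0, n, retstep=True)
sigma, ell = 1.0, 0.05                      # assumed sigma and correlation length L

C = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / ell)

m = np.exp(-0.5 * ((z - 0.5) / 0.1) ** 2)   # smooth bump, ~0 near the edges

# <C^{-1} m, m>: with the operator discretized as (C * dz), the dz factors
# cancel and the quadratic form reduces to m^T C_matrix^{-1} m.
lhs = m @ np.linalg.solve(C, m)

dm = np.gradient(m, dz)
rhs = ((1.0 / ell) * np.sum(m**2) * dz + ell * np.sum(dm**2) * dz) / (2.0 * sigma**2)

print(lhs, rhs)                             # agree up to discretization/boundary terms
```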

  12. This implies that when solving the basic least-squares problem, i.e., the problem of finding the model $m_{\mathrm{post}}$ that minimizes the misfit function $2 S(m) = \| o(m) - o_{\mathrm{obs}} \|^2_{C_{\mathrm{obs}}} + \| m - m_{\mathrm{prior}} \|^2_{C_{\mathrm{prior}}} = \langle C_{\mathrm{obs}}^{-1} ( o(m) - o_{\mathrm{obs}} ) , o(m) - o_{\mathrm{obs}} \rangle + \langle C_{\mathrm{prior}}^{-1} ( m - m_{\mathrm{prior}} ) , m - m_{\mathrm{prior}} \rangle$, we are imposing that $m_{\mathrm{post}}(z) - m_{\mathrm{prior}}(z)$ must be small, but we are also imposing that $\frac{dm_{\mathrm{post}}}{dz}(z) - \frac{dm_{\mathrm{prior}}}{dz}(z)$ must be small. In particular, if $m_{\mathrm{prior}}(z)$ is smooth, then $m_{\mathrm{post}}(z)$ must also be smooth. See page 316 of my book for the 3D case.

  13. In lesson XII, I introduced the definition of the transpose operator: Let $L$ be a linear operator from a linear space $A$ into a linear space $B$. The transpose of $L$, denoted $L^t$, is a linear operator from $B^*$ into $A^*$, defined by the condition that for any $\beta \in B^*$ and for any $a \in A$, $\langle \beta , L a \rangle_B = \langle L^t \beta , a \rangle_A$.

  14. Then, without demonstration, I gave two examples: If the expression $b = L a$ means $b^i = \sum_I L^{iI} a^I$, then an expression like $\alpha = L^t \beta$ means $\alpha^I = \sum_i L^{iI} \beta^i$. If the expression $b = L a$ means $b(t) = \int dV(x) \, L(t,x) \, a(x)$, then an expression like $\alpha = L^t \beta$ means $\alpha(x) = \int dt \, L(t,x) \, \beta(t)$. Ozgun, like St. Thomas, asks for the demonstrations.

  15. Case $s = L v \Leftrightarrow s^i = \sum_\alpha L^{i\alpha} v^\alpha$. There must be two duality products: $\langle \omega , v \rangle = \sum_\alpha \omega_\alpha v^\alpha$ ; $\langle \sigma , s \rangle = \sum_i \sigma_i s^i$. The transpose operator, say $M = L^t$, will operate on some $\sigma$ to give some $\omega$: $\omega = M \sigma \Leftrightarrow \omega_\alpha = \sum_i M^{\alpha i} \sigma_i$. Relation between $M^{\alpha i}$ and $L^{i\alpha}$? For any $v$ and any $\sigma$ one must have $\langle \sigma , L v \rangle = \langle M \sigma , v \rangle$, i.e., $\sum_i \sigma_i (L v)^i = \sum_\alpha (M \sigma)_\alpha v^\alpha$, i.e., $\sum_i \sigma_i \big( \sum_\alpha L^{i\alpha} v^\alpha \big) = \sum_\alpha \big( \sum_i M^{\alpha i} \sigma_i \big) v^\alpha$, i.e., $\sum_i \sum_\alpha L^{i\alpha} \sigma_i v^\alpha = \sum_\alpha \sum_i M^{\alpha i} \sigma_i v^\alpha$, and this implies that $M^{\alpha i} = L^{i\alpha}$ (the matrix representing $M = L^t$ is the transpose of the matrix representing $L$).

  16. Note: this demonstration is trivial if using matrix notations. Case $s = L v$. There must be two duality products: $\langle \omega , v \rangle = \omega^t v$ ; $\langle \sigma , s \rangle = \sigma^t s$. The transpose operator, say $M$, will operate on some $\sigma$ to give some $\omega$: $\omega = M \sigma$. Relation between $M$ and $L$? For any $v$ and any $\sigma$ one must have $\langle \sigma , L v \rangle = \langle M \sigma , v \rangle$, i.e., $\sigma^t ( L v ) = ( M \sigma )^t v = ( \sigma^t M^t ) v$, i.e., $\sigma^t L v = \sigma^t M^t v$, and this implies that $M^t = L$, i.e., that $M = L^t$.

  17. Case $s = L v \Leftrightarrow s^i = \int dz \, L^i(z) \, v(z)$. There must be two duality products: $\langle \sigma , s \rangle = \sum_i \sigma_i s^i$ ; $\langle \omega , v \rangle = \int dz \, \omega(z) \, v(z)$. The transpose operator, say $M = L^t$, will operate on some $\sigma$ to give some $\omega$: $\omega = M \sigma \Leftrightarrow \omega(z) = \sum_i M^i(z) \, \sigma_i$. Relation between $M^i(z)$ and $L^i(z)$? For any $v$ and any $\sigma$ one must have $\langle \sigma , L v \rangle = \langle M \sigma , v \rangle$, i.e., $\sum_i \sigma_i (L v)^i = \int dz \, (M \sigma)(z) \, v(z)$, i.e., $\sum_i \sigma_i \big( \int dz \, L^i(z) \, v(z) \big) = \int dz \, \big( \sum_i M^i(z) \, \sigma_i \big) v(z)$, i.e., $\sum_i \int dz \, L^i(z) \, \sigma_i \, v(z) = \sum_i \int dz \, M^i(z) \, \sigma_i \, v(z)$, $\Rightarrow M^i(z) = L^i(z)$ (the two operators $L$ and $M = L^t$ have

  18. the same kernel; when $L$ operates, we make a sum (integral) over $z$; when $L^t$ operates, we make a (discrete) sum over $i$).
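
A numerical illustration of this mixed discrete/continuous case, with assumed kernels on a grid; the two duality products match up to rounding:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3
z, dz = np.linspace(0.0, 1.0, n, retstep=True)

Lk = rng.normal(size=(k, n))          # kernels L^i(z) on the grid (assumed)
v = np.sin(2 * np.pi * z)             # arbitrary test function
sigma = rng.normal(size=k)            # arbitrary dual vector

s = Lk @ v * dz                       # (L v)^i: integral over z
omega = Lk.T @ sigma                  # (L^t sigma)(z): discrete sum over i

lhs = sigma @ s                       # <sigma, L v> (sum over i)
rhs = np.sum(omega * v) * dz          # <L^t sigma, v> (integral over z)
print(lhs, rhs)                       # equal up to rounding
```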

  19. Case $s = L v \Leftrightarrow s(t) = \sum_\alpha L_\alpha(t) \, v^\alpha$. Please do it. Case $s = L v \Leftrightarrow s(t) = \int dz \, L(t,z) \, v(z)$. Please do it. Case $s = L v \Leftrightarrow s^i(t,\varphi) = \sum_\alpha \sum_\beta \int dz \, L^i{}_{\alpha\beta}(t,\varphi,z) \, v^{\alpha\beta}(z)$. Answer: $\omega = L^t \sigma \Leftrightarrow \omega_{\alpha\beta}(z) = \sum_i \int dt \int d\varphi \, L^i{}_{\alpha\beta}(t,\varphi,z) \, \sigma_i(t,\varphi)$.

  20. The transpose of the derivative operator. The derivative operator $D$ maps a space $X$ of functions $x = \{ x(t) \}$ into a space $V$ of functions $v = \{ v(t) \}$. It is defined as $v = D x \Leftrightarrow v(t) = \frac{dx}{dt}(t)$. It is obviously a linear operator. By definition, the transpose operator $D^t$ maps the dual of $V$ (with functions that we may denote $\omega = \{ \omega(t) \}$) into the dual of $X$ (with functions that we may denote $\chi = \{ \chi(t) \}$). We don't need to be interested in the interpretation of the two spaces $X^*$ and $V^*$. I want to prove that, except for some boundary condition (to be found), the derivative operator is antisymmetric, i.e., $D^t = -D$.

  21. In other words, I want to prove the second of these two expressions: $v = D x \Leftrightarrow v(t) = \frac{dx}{dt}(t)$ ; $\chi = D^t \omega \Leftrightarrow \chi(t) = -\frac{d\omega}{dt}(t)$.

  22. The elements of the two spaces $X$ and $V$ are functions of the same variable $t$. Therefore the duality products in the two spaces are here quite similar: $\langle \chi , x \rangle_X = \int_{t_1}^{t_2} dt \, \chi(t) \, x(t)$ ; $\langle \omega , v \rangle_V = \int_{t_1}^{t_2} dt \, \omega(t) \, v(t)$.

  23. By definition of the transpose of a linear operator, for any $x$ and any $\omega$, one must have $\langle \omega , D x \rangle = \langle D^t \omega , x \rangle$. Here, this gives $\int_{t_1}^{t_2} dt \, \omega(t) \, (D x)(t) = \int_{t_1}^{t_2} dt \, (D^t \omega)(t) \, x(t)$. To prove that if $(D x)(t) = \frac{dx}{dt}(t)$, then $(D^t \omega)(t) = -\frac{d\omega}{dt}(t)$, we must then prove that for any two functions $x(t)$ and $\omega(t)$, $\int_{t_1}^{t_2} dt \, \omega(t) \, \frac{dx}{dt}(t) + \int_{t_1}^{t_2} dt \, \frac{d\omega}{dt}(t) \, x(t) = 0$.
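
A numerical check of this last statement (a sketch with assumed test functions): the two integrals do not sum to zero in general, but to the boundary term $\omega(t_2) x(t_2) - \omega(t_1) x(t_1)$, which is what the announced boundary condition must kill:

```python
import numpy as np

n = 10000
t, dt = np.linspace(0.0, 1.0, n, retstep=True)
x = np.sin(3.0 * t) + t**2                 # arbitrary smooth test functions
omega = np.cos(5.0 * t)

dx = np.gradient(x, dt)
domega = np.gradient(omega, dt)

# \int omega x' + \int omega' x equals the boundary term [omega x]_{t1}^{t2}.
lhs = np.sum(omega * dx + domega * x) * dt
boundary = omega[-1] * x[-1] - omega[0] * x[0]
print(lhs, boundary)                       # equal up to discretization error
```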
