Princeton University, Department of Geosciences
Course on Inverse Problems
Albert Tarantola

Lesson X: Optimization
Optimization

• If the volumetric probability $f_{\rm post}(\mathbf{M})$ is expected to have a small number of maxima (say one, two, or three),
• we may try to locate them by using standard optimization methods (simplex methods, gradient-based methods),
• and we may try to study $f_{\rm post}(\mathbf{M})$ in the neighborhood of each optimum.

Practical tip: simplex methods and gradient-based methods work much better with the function $\psi(\mathbf{M}) = \log\big( f_{\rm post}(\mathbf{M}) / f_0 \big)$ than with the function $f_{\rm post}(\mathbf{M})$ itself.
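A minimal sketch of why the logarithm helps, using NumPy/SciPy and a hypothetical narrow Gaussian posterior (the curvature $10^4$ and the peak location are illustrative assumptions): far from the maximum, $f_{\rm post}$ underflows to exactly 0.0 in double precision, so an optimizer sees a flat function, while $\psi$ keeps a usable slope everywhere.

```python
import numpy as np
from scipy.optimize import minimize

PEAK = np.array([1.0, 2.0])   # hypothetical location of the maximum

def f_post(m):
    # Narrow Gaussian: evaluates to exactly 0.0 far from PEAK,
    # so minimizing -f_post directly stalls (zero gradient).
    return np.exp(-0.5 * 1e4 * np.sum((m - PEAK) ** 2))

def psi(m):
    # log(f_post / f_0) with f_0 = 1: work directly with the
    # exponent, which never underflows.
    return -0.5 * 1e4 * np.sum((m - PEAK) ** 2)

# Maximizing psi (i.e. minimizing -psi) converges even from a
# distant starting point where f_post is numerically flat.
res = minimize(lambda m: -psi(m), x0=np.array([10.0, -10.0]))
print(res.x)  # ~ [1.0, 2.0]
```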
Least-squares theory

• The model parameter manifold may be a linear space, with vectors denoted $\mathbf{m}, \mathbf{m}', \ldots$, and the a priori information may have the Gaussian form
$$f_{\rm prior}(\mathbf{m}) = k \exp\!\Big( -\tfrac{1}{2}\, (\mathbf{m} - \mathbf{m}_{\rm prior})^t\, \mathbf{C}_m^{-1}\, (\mathbf{m} - \mathbf{m}_{\rm prior}) \Big) \; .$$
• The observable parameter manifold may be a linear space, with vectors denoted $\mathbf{o}, \mathbf{o}', \ldots$, and the information brought by measurements may have the Gaussian form
$$g_{\rm obs}(\mathbf{o}) = k \exp\!\Big( -\tfrac{1}{2}\, (\mathbf{o} - \mathbf{o}_{\rm obs})^t\, \mathbf{C}_o^{-1}\, (\mathbf{o} - \mathbf{o}_{\rm obs}) \Big) \; .$$
• The forward modeling relation becomes, with these notations, $\mathbf{o} = \mathbf{o}(\mathbf{m})$.
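A minimal sketch of evaluating the Gaussian prior above in NumPy; the prior mean, the covariance, and the choice of $k$ as the usual multivariate-Gaussian normalization are illustrative assumptions.

```python
import numpy as np

m_prior = np.array([1.0, 2.0])        # hypothetical prior mean
C_m = np.array([[0.5, 0.1],
                [0.1, 0.3]])          # hypothetical prior covariance

def f_prior(m):
    dm = m - m_prior
    # Quadratic form dm^t C_m^{-1} dm via a linear solve
    # (avoids forming the explicit inverse).
    quad = dm @ np.linalg.solve(C_m, dm)
    k = 1.0 / np.sqrt((2 * np.pi) ** m.size * np.linalg.det(C_m))
    return k * np.exp(-0.5 * quad)
```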
Then, the posterior volumetric probability for the model parameters, whose general expression is
$$f_{\rm post}(\mathbf{m}) = \frac{1}{\nu}\; f_{\rm prior}(\mathbf{m})\; g_{\rm obs}\big( \mathbf{o}(\mathbf{m}) \big) \; ,$$
here becomes
$$f_{\rm post}(\mathbf{m}) = k \exp\big( -S(\mathbf{m}) \big) \; ,$$
where the misfit function $S(\mathbf{m})$ is the sum of squares
$$2\, S(\mathbf{m}) = (\mathbf{m} - \mathbf{m}_{\rm prior})^t\, \mathbf{C}_m^{-1}\, (\mathbf{m} - \mathbf{m}_{\rm prior}) + \big( \mathbf{o}(\mathbf{m}) - \mathbf{o}_{\rm obs} \big)^t\, \mathbf{C}_o^{-1}\, \big( \mathbf{o}(\mathbf{m}) - \mathbf{o}_{\rm obs} \big) \; .$$
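A minimal sketch of the misfit function; the forward model, prior, observations, and covariances below are all hypothetical placeholders standing in for a real problem.

```python
import numpy as np

def forward(m):
    # Hypothetical nonlinear forward model o(m).
    return np.array([m[0] + m[1], m[0] * m[1]])

m_prior = np.array([1.0, 1.0])
C_m = np.diag([0.5, 0.5])         # assumed prior covariance
o_obs = np.array([2.1, 0.9])
C_o = np.diag([0.01, 0.01])       # assumed observational covariance

def S(m):
    # 2 S(m) = (m - m_prior)^t C_m^{-1} (m - m_prior)
    #        + (o(m) - o_obs)^t C_o^{-1} (o(m) - o_obs)
    dm = m - m_prior
    do = forward(m) - o_obs
    return 0.5 * (dm @ np.linalg.solve(C_m, dm)
                  + do @ np.linalg.solve(C_o, do))
```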
The maximum likelihood model is the model $\mathbf{m}$ maximizing $f_{\rm post}(\mathbf{m})$. It is also the model minimizing $S(\mathbf{m})$. It can be obtained using a quasi-Newton algorithm,
$$\mathbf{m}_{n+1} = \mathbf{m}_n - \mathbf{H}_n^{-1}\, \boldsymbol{\gamma}_n \; ,$$
where the Hessian of $S$ is
$$\mathbf{H}_n = \mathbf{O}_n^t\, \mathbf{C}_o^{-1}\, \mathbf{O}_n + \mathbf{C}_m^{-1} \; ,$$
and the gradient of $S$ is
$$\boldsymbol{\gamma}_n = \mathbf{O}_n^t\, \mathbf{C}_o^{-1}\, \big( \mathbf{o}(\mathbf{m}_n) - \mathbf{o}_{\rm obs} \big) + \mathbf{C}_m^{-1}\, (\mathbf{m}_n - \mathbf{m}_{\rm prior}) \; .$$
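A minimal sketch of the quasi-Newton iteration, assuming a hypothetical *linear* forward model $\mathbf{o}(\mathbf{m}) = \mathbf{G}\,\mathbf{m}$ so that the tangent operator $\mathbf{O}_n$ is simply $\mathbf{G}$ and the iteration converges in one step; for a nonlinear model, $\mathbf{O}_n$ must be recomputed at each $\mathbf{m}_n$, e.g. by finite differences as sketched after the next slide.

```python
import numpy as np

G = np.array([[1.0,  1.0],
              [2.0, -1.0]])        # hypothetical linear forward operator

def forward(m):
    return G @ m

m_prior = np.array([0.0, 0.0]); C_m_inv = np.eye(2)
o_obs = np.array([1.0, 0.5]);   C_o_inv = np.linalg.inv(0.01 * np.eye(2))

m = m_prior.copy()
for _ in range(10):
    O = G                                                # tangent operator O_n
    gamma = (O.T @ C_o_inv @ (forward(m) - o_obs)
             + C_m_inv @ (m - m_prior))                  # gradient of S
    H = O.T @ C_o_inv @ O + C_m_inv                      # Hessian of S
    step = np.linalg.solve(H, gamma)                     # H_n^{-1} gamma_n
    m = m - step
    if np.linalg.norm(step) < 1e-12:                     # converged to m_inf
        break
```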
Here, the tangent linear operator $\mathbf{O}_n$ is defined via
$$\mathbf{o}(\mathbf{m}_n + \delta\mathbf{m}) = \mathbf{o}(\mathbf{m}_n) + \mathbf{O}_n\, \delta\mathbf{m} + \ldots$$
When the notations
$$\mathbf{m} = \{ m^\alpha \} = \{ m^1, m^2, \ldots, m^p \} \; , \qquad \mathbf{o} = \{ o^i \} = \{ o^1, o^2, \ldots, o^q \} \; , \qquad o^i = o^i( m^1, m^2, \ldots, m^p )$$
apply, then $\mathbf{O}_n$ is the matrix of partial derivatives
$$O^i{}_\alpha = \frac{\partial o^i}{\partial m^\alpha}$$
(evaluated at the point $\mathbf{m}_n$).
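When analytic derivatives of the forward model are not available, the matrix $O^i{}_\alpha$ can be approximated column by column with forward finite differences. A minimal sketch, reusing the hypothetical forward model from above; the step size `eps` is an assumption to be tuned to the problem's scales.

```python
import numpy as np

def forward(m):
    # Hypothetical nonlinear forward model o(m).
    return np.array([m[0] + m[1], m[0] * m[1]])

def tangent_operator(m, eps=1e-6):
    # Column alpha approximates do^i/dm^alpha at the point m.
    o0 = forward(m)
    O = np.empty((o0.size, m.size))
    for alpha in range(m.size):
        dm = np.zeros_like(m)
        dm[alpha] = eps
        O[:, alpha] = (forward(m + dm) - o0) / eps
    return O
```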
As we have seen, the model $\mathbf{m}_\infty$ at which the algorithm converges maximizes the posterior volumetric probability $f_{\rm post}(\mathbf{m})$. To estimate the posterior uncertainties: the covariance operator of the Gaussian volumetric probability that is tangent to $f_{\rm post}(\mathbf{m})$ at $\mathbf{m}_\infty$ is
$$\widetilde{\mathbf{C}}_m = \mathbf{H}_\infty^{-1} \; ,$$
while the covariance operator of the Gaussian volumetric probability that is tangent to $g_{\rm post}(\mathbf{o})$ at $\mathbf{o}_\infty = \mathbf{o}(\mathbf{m}_\infty)$ is
$$\widetilde{\mathbf{C}}_o = \mathbf{O}_\infty\, \widetilde{\mathbf{C}}_m\, \mathbf{O}_\infty^t \; .$$

Example: plant leaves and the radiative transfer model. ⇒ Mathematica notebook
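A minimal sketch of these posterior uncertainty estimates in NumPy; $\mathbf{O}_\infty$ and the covariance inverses below are hypothetical values standing in for the quantities produced at convergence of the iteration above.

```python
import numpy as np

O_inf = np.array([[1.0,  1.0],
                  [2.0, -1.0]])          # tangent operator at m_inf (assumed)
C_m_inv = np.eye(2)                      # prior C_m^{-1} (assumed)
C_o_inv = np.linalg.inv(0.01 * np.eye(2))  # observational C_o^{-1} (assumed)

# Hessian at convergence, then the two tangent-Gaussian covariances.
H_inf = O_inf.T @ C_o_inv @ O_inf + C_m_inv
C_m_post = np.linalg.inv(H_inf)          # posterior model covariance
C_o_post = O_inf @ C_m_post @ O_inf.T    # posterior observable covariance

# One-sigma posterior uncertainties on the model parameters.
sigma_m = np.sqrt(np.diag(C_m_post))
```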