
SIO 230 Introduction to Geophysical Inverse Theory
Cathy Constable, cconstable@ucsd.edu
IGPP Munk Lab 329, x43183
http://igppweb.ucsd.edu/~cathy/Classes/SIO230
Background Reading: Sections 1-4 (pages 1-19) of the Supplementary Notes


1. The Marquardt method combines the steepest descent and Newton methods in one algorithm by modifying the curvature matrix:
α′_jk = α_jk (1 + λ) for j = k,  α′_jk = α_jk for j ≠ k.
When λ is large, this reduces to steepest descent. When λ is small, it reduces to the Newton method. Starting with a large λ and then reducing it close to the solution works very well. For problems that are naturally discretely parameterized, Marquardt is hard to beat. For sparse parameterizations of infinite-dimensional models, the parameterization (e.g. the number of layers chosen) has a big influence on the outcome.
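A minimal sketch of a Marquardt iteration, assuming user-supplied NumPy arrays for the data d, a forward model f(m), and a Jacobian jac(m) (all hypothetical names; this is illustrative, not the course code):

```python
import numpy as np

def marquardt_step(m, d, f, jac, lam):
    """One Marquardt update: damp the diagonal of the curvature matrix."""
    J = jac(m)                      # N_data x N_params Jacobian
    r = d - f(m)                    # residuals
    alpha = J.T @ J                 # curvature (Gauss-Newton approximation)
    beta = J.T @ r                  # right-hand side (gradient direction)
    # alpha_jk -> alpha_jk * (1 + lambda) on the diagonal only
    alpha_damped = alpha + lam * np.diag(np.diag(alpha))
    dm = np.linalg.solve(alpha_damped, beta)
    return m + dm

def marquardt(m0, d, f, jac, lam=1e-2, n_iter=50):
    """Start with a large lambda and shrink it as the misfit improves."""
    m, lam_cur = np.asarray(m0, float), lam
    misfit = np.sum((d - f(m))**2)
    for _ in range(n_iter):
        m_try = marquardt_step(m, d, f, jac, lam_cur)
        misfit_try = np.sum((d - f(m_try))**2)
        if misfit_try < misfit:           # accept: behave more like Newton
            m, misfit, lam_cur = m_try, misfit_try, lam_cur / 10
        else:                             # reject: behave more like steepest descent
            lam_cur *= 10
    return m
```

Accepting a step lowers λ toward Newton behaviour; rejecting a step raises λ toward steepest descent.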

2. Global versus local minima: for nonlinear problems there are no guarantees that Gauss-Newton will converge, and no guarantee that, if it does converge, the solution is a global one. The solution might well depend on the starting model.
[Sketch: an objective function with a global minimum and a local minimum (maybe)]

3. If you increase N (the number of parameters) too much, even with the Marquardt approach the solutions become unstable, oscillatory, and generally useless (they are probably trying to converge to D+-type solutions). Almost all inversion today incorporates some type of regularization, which minimizes some aspect of the model as well as the misfit to the data:
U = ||R m_1||² + µ⁻¹ ||W d − W f(m_1)||²
where R m is some measure of the model and µ is a trade-off parameter or Lagrange multiplier.

4. U = ||R m_1||² + µ⁻¹ ||W d − W f(m_1)||²
In 1D a typical R might be the first-difference operator

    R_1 = [ −1  +1   0   0  …   0   0
             0  −1  +1   0  …   0   0
             0   0  −1  +1  …   0   0
             ⋮                   ⋮
             0   0   0   0  …  −1  +1 ]

acting on m = (m_1, m_2, …, m_8)^T, which extracts a measure of slope. This stabilizes the inversion, creates a unique solution, and manufactures models with useful properties. It is easily extended to 2D and 3D modelling.
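A small sketch of how such an R_1 might be built with NumPy (the function name is illustrative):

```python
import numpy as np

def first_difference(n):
    """(n-1) x n matrix with (R_1 m)_j = m_{j+1} - m_j, a discrete slope."""
    R = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    R[idx, idx] = -1.0
    R[idx, idx + 1] = 1.0
    return R

R1 = first_difference(8)
m = np.linspace(0.0, 7.0, 8)
print(R1 @ m)   # all ones: a uniform slope has constant first differences
```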

5. U = ||R m_1||² + µ⁻¹ ||W d − W f(m_1)||²
When µ is small, model roughness is ignored and we try to fit the data. When µ is large, we smooth the model at the expense of data fit.
One approach is to choose µ and minimize U by least squares. There are various sets of machinery to do this (Newton, quasi-Newton, conjugate gradients, etc.). With many of these methods µ must be chosen by trial and error, increasing the computational burden and introducing some subjectivity.
Picking µ a priori is simply choosing how rough your model is compared to the data misfit. But we've no idea how rough our model should be. However, we ought to have a decent idea of how well our data can be fit.
The Occam approach is to introduce some acceptable fit to the data (χ²_*) and minimize:
U = ||R m_1||² + µ⁻¹ [ ||W d − W f(m_1)||² − χ²_* ]

6. U = ||R m_1||² + µ⁻¹ [ ||W d − W f(m_1)||² − χ²_* ]
Linearizing:
U = ||R m_1||² + µ⁻¹ [ ||W d − W( f(m_0) + J(m_1 − m_0) )||² − χ²_* ]
After differentiating and setting the result to zero we get
m_1 = [ µ R^T R + (WJ)^T WJ ]⁻¹ (WJ)^T W ( d − f(m_0) + J m_0 )
The only thing we need is to find the right value for µ.
(Note we are solving for the next model m_1 directly instead of for ∆m. Bob Parker calls these "leaping" and "creeping" algorithms.)
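A minimal sketch of this "leaping" update, assuming user-supplied f, jac, W, and R (placeholder names, NumPy arrays and callables):

```python
import numpy as np

def occam_update(m0, d, f, jac, W, R, mu):
    """Solve (mu R^T R + (WJ)^T WJ) m1 = (WJ)^T W (d - f(m0) + J m0)."""
    J = jac(m0)
    WJ = W @ J
    lhs = mu * (R.T @ R) + WJ.T @ WJ
    rhs = WJ.T @ W @ (d - f(m0) + J @ m0)
    return np.linalg.solve(lhs, rhs)
```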

7. Occam finds µ by carrying out a line search for the ideal value. Before χ²_* is reached, we choose the µ which minimizes χ². Once χ²_* can be reached, we choose the µ which gives us exactly χ²_*.
[Figure: χ² versus µ, with the target misfit χ²_* marked]
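A hedged sketch of the µ search, assuming a user-supplied callable chi2_of_mu(µ) that builds m_1(µ) with the update above and returns the weighted sum-square misfit, and assuming the misfit grows monotonically with µ over the bracket:

```python
import numpy as np

def find_mu(chi2_of_mu, chi2_star, mu_lo=1e-6, mu_hi=1e6, tol=1e-3, n_max=100):
    """Bisection in log(mu) for the mu that hits the target misfit chi2_star."""
    mu_mid = np.sqrt(mu_lo * mu_hi)
    for _ in range(n_max):
        mu_mid = np.sqrt(mu_lo * mu_hi)          # geometric midpoint
        chi2 = chi2_of_mu(mu_mid)
        if abs(chi2 - chi2_star) < tol * chi2_star:
            break
        if chi2 > chi2_star:
            mu_hi = mu_mid                        # misfit too large: smooth less
        else:
            mu_lo = mu_mid                        # over-fitting: smooth more
    return mu_mid
```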

  8. Non-linear Gravity Problem

9. How to choose χ²_*?
For zero-mean, Gaussian, independent errors, the sum-square misfit
χ² = ||W d − W d̂||²
is chi-squared distributed with M degrees of freedom. The expectation value is just M, which corresponds to an RMS misfit of one, and so this could be a reasonable target misfit. Or, one could look up the 95% (or other) confidence interval for chi-squared with M degrees of freedom.
[Figure: models fit at RMS = 1 and at RMS = 1.36]
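A small sketch of these two choices of χ²_* using scipy.stats (M = 50 is just an example value):

```python
import numpy as np
from scipy.stats import chi2

M = 50                                   # number of data (example value)
chi2_star_expect = M                     # expectation value of chi-squared_M
chi2_star_95 = chi2.ppf(0.95, M)         # 95th percentile of chi-squared_M
print(np.sqrt(chi2_star_expect / M))     # corresponding RMS misfit: 1.0
print(np.sqrt(chi2_star_95 / M))         # slightly larger RMS target (~1.16 here)
```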

10. So, if our errors are well estimated and well behaved, this provides a statistical guideline for choosing χ²_*.
Errors come from
• statistical processing errors,
• systematic errors such as instrument calibrations, and
• "geological or geophysical noise" (our inability to parameterize fine details of geology, or extraneous physical processes).
Instrument noise should be captured by processing errors, but some error models assume stationarity (i.e. noise statistics don't vary with time). In practice, we only have a good handle on processing errors; everything else is lumped into a noise floor.

11. Even with well-estimated errors, the choice of misfit can still be somewhat subjective.
Joint 2D inversion of marine CSEM (3 frequencies, no phase) and MT (Gemini salt prospect, Gulf of Mexico):
[Figure: inversion models at target misfits 2.4, 2.0, 1.5, 1.3, 1.2, and 1.1]

12. Beware of trade-off ("L") curves:
[Figure A: RMS misfit versus roughness measure, with models labeled by target misfit from 2.4 down to 1.1]

13. Beware of trade-off ("L") curves:
[Figures A and B: two trade-off curves of RMS misfit versus roughness measure, with models labeled by target misfit]
(they are not as objective as proponents say...)

14. Choice of Misfit Again
[Figure: example of a core surface field trade-off curve; panels show the regularization and the solutions on the trade-off curve]

15. Bayesian Methods
• Read Chapter 11 of Aster et al.; many of the following figures are drawn from there.
• Read Appendix B if you need a refresher on statistics and maximum likelihood, especially the sections on conditional probability and Bayes' theorem, or consult the notes from SIO223A (Chapters 4 and 5).
• Bayesian methods treat the model as a random variable.

16. Travel Time Picks (SIO 230, Lecture 8)

17. • For a Bayesian approach to the travel time problem, read Malinverno & Parker (2006), "Two ways to quantify uncertainty in geophysical inverse problems", doi:10.1190/1.2194516.

18. Shaw diffraction intensity problem (Aster et al., 2013)

19.-24. [Figure-only slides, from Aster et al., 2013]

25. Monte Carlo Methods in Inversion
For details see Sambridge & Mosegaard (2002), Rev. Geophys., doi:10.1029/2000RG000089.
• direct search methods
• use pseudorandom sampling
• includes random sampling from highly nonuniform multidimensional distributions
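A minimal sketch of a direct-search Monte Carlo scan, with the forward model f, data d, prior bounds, and acceptable misfit chi2_star supplied by the user (all placeholder names):

```python
import numpy as np

def direct_search(f, d, lower, upper, chi2_star, n_samples=100_000, seed=None):
    """Draw models uniformly inside prior bounds; keep the acceptable ones."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_samples):
        m = rng.uniform(lower, upper)            # pseudorandom sample of model space
        if np.sum((d - f(m))**2) <= chi2_star:   # acceptable data fit?
            accepted.append(m)
    return np.array(accepted)                    # ensemble of acceptable models
```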

  26. Sambridge & Mosegaard, 2002

27. Simulated Annealing
• general-purpose global optimization method based on the Metropolis sampling algorithm
• numerous geophysical applications in optimization problems since the mid 1980s
• annealing schedule controls the characteristics of the sampling
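A minimal simulated annealing sketch built on the Metropolis rule, with a user-supplied objective energy(m) (e.g. a data misfit) and an illustrative geometric cooling schedule:

```python
import numpy as np

def simulated_annealing(energy, m0, step=0.1, T0=1.0, cooling=0.99,
                        n_iter=10_000, seed=None):
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, float)
    E, T = energy(m), T0
    best_m, best_E = m.copy(), E
    for _ in range(n_iter):
        m_try = m + step * rng.standard_normal(m.shape)    # random perturbation
        E_try = energy(m_try)
        # Metropolis rule: always accept downhill, sometimes accept uphill
        if E_try < E or rng.random() < np.exp(-(E_try - E) / T):
            m, E = m_try, E_try
            if E < best_E:
                best_m, best_E = m.copy(), E
        T *= cooling                                        # annealing schedule
    return best_m, best_E
```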

28. Genetic Algorithms
• not originally designed as function optimizers
• a broad class of methods
• first used by geophysicists as global optimizers in the 1990s; several papers on seismic waveform fitting
• search involves control parameters, and tuning for each problem may be non-trivial
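A toy real-coded genetic algorithm sketch (truncation selection, arithmetic crossover, Gaussian mutation); fitness(m), the bounds, and all control parameters are illustrative assumptions, which underscores that the tuning is problem-dependent:

```python
import numpy as np

def genetic_search(fitness, lower, upper, pop_size=50, n_gen=100,
                   mut=0.1, seed=None):
    """lower/upper are NumPy arrays of prior bounds; lower fitness = better."""
    rng = np.random.default_rng(seed)
    n = len(lower)
    pop = rng.uniform(lower, upper, size=(pop_size, n))
    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]   # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            w = rng.random(n)
            child = w * a + (1 - w) * b                      # arithmetic crossover
            child += mut * rng.standard_normal(n) * (upper - lower)  # mutation
            children.append(np.clip(child, lower, upper))
        pop = np.vstack([parents, children])
    scores = np.array([fitness(m) for m in pop])
    return pop[np.argmin(scores)]
```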

29. Other global optimization methods
• evolutionary programming
• tabu search
• neighborhood algorithm direct search
• parallel tempering

30. Ensemble Inference
• how to assess trade-offs, constraints, and resolution in multimodal nonlinear problems
• Bayesian approach: Tarantola and Valette (1982)
• Bayesian inference is named for Bayes (1763)
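A minimal Metropolis sampler sketch for ensemble inference, assuming a user-supplied log-posterior log_post(m) (log-likelihood plus log-prior; the names and step size are placeholders):

```python
import numpy as np

def metropolis(log_post, m0, step=0.1, n_samples=50_000, seed=None):
    """Random-walk Metropolis sampling of the posterior over models."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, float)
    lp = log_post(m)
    samples = []
    for _ in range(n_samples):
        m_try = m + step * rng.standard_normal(m.shape)
        lp_try = log_post(m_try)
        if np.log(rng.random()) < lp_try - lp:   # Metropolis acceptance rule
            m, lp = m_try, lp_try
        samples.append(m.copy())
    # the sample ensemble can be used to assess trade-offs and resolution
    return np.array(samples)
```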

31. Linearization or Monte Carlo?
• depends on (i) complexity of the data-model relationship, (ii) number of unknowns, (iii) computational resources
• some advantages in stability from not relying on model perturbations or matrix inversions
• appraisal of the solution is usually more reliable with MC: don't use linearized derivatives for model covariance and resolution estimates

32. MC issues
• MC is only applicable to discretized problems
• a large number of unknowns renders direct search impractical
• which MC methods should be used?

33. Generating Random Samples
• pseudorandom deviates vs. quasi-random sequences
[Figure from Sambridge & Mosegaard, 2002]
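A small sketch contrasting pseudorandom deviates with a quasi-random (Sobol) sequence, using scipy.stats.qmc; the sample size and dimension are arbitrary:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
pseudo = rng.random((256, 2))                               # pseudorandom deviates
sobol = qmc.Sobol(d=2, scramble=True, seed=0).random(256)   # quasi-random sequence

# Quasi-random points cover the unit square more evenly (lower discrepancy):
print(qmc.discrepancy(pseudo), qmc.discrepancy(sobol))
```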

34. Trading off exploration and exploitation
Smoothly varying, near-quadratic objective functions can converge rapidly with a Newton-type descent method. Highly nonlinear problems with multiple minima in the objective function are better approached with a combination of exploration and exploitation.
[Figure from Sambridge & Mosegaard, 2002]
