  1. Announcements. Please turn in Assignment 5. Lecture today: HW will not be graded, but content from the lecture and HW can show up on the final exam. Final Exam: 13:00-16:00, Wednesday May 29, in H331. Note: no make-up of the final exam except in cases of emergency or prior arrangement. Visualization Project due by email on May 28.

  2. Bayesian analyses for parameter estimation. Lecture 5, Gravitational Waves MSc Course.

  3. How do we go from detector data... LVC, PRL 116, 241103 (2016)

  4. ...to astrophysical parameters? LVC, PRL 118, 221101 (2017) LVC, PRL 119, 161101 (2017)

  5. We’ve seen that we can apply the matched filtering technique with many different possible filters in a coarse template bank and extract possible events... What can we conclude? Can we claim detection? If it is a detection, how can we reconstruct the properties of the source? And with what accuracy?

  6. Probability. Consider a set $S$ with subsets $A, B, \ldots$ Probability is a real-valued function that satisfies:
1. For every $A$ in $S$, $P(A) \geq 0$.
2. For disjoint subsets ($A \cap B = \emptyset$), $P(A \cup B) = P(A) + P(B)$.
3. $P(S) = 1$.
Conditional probability (the probability of $A$ given $B$): $P(A|B) = P(A \cap B)/P(B)$.
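As a quick check of the conditional-probability formula, here is a minimal Python sketch with two fair dice; the events A and B are our own toy choices, not from the slides.

```python
# Minimal illustration of conditional probability with two fair dice.
# A = "first die shows 6", B = "sum of the two dice is at least 10".
# Worked check of P(A|B) = P(A and B) / P(B).
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))         # the sample space S
A = {o for o in outcomes if o[0] == 6}                 # event A
B = {o for o in outcomes if sum(o) >= 10}              # event B

P = lambda event: Fraction(len(event), len(outcomes))  # uniform probability
print(P(A & B) / P(B))          # P(A|B) = 1/2
print(P(A), P(B), P(A & B))     # 1/6, 1/6, 1/12
```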

  7. Frequentist versus Bayesian interpretation. Frequentist: 1. $A, B, \ldots$ are outcomes of a repeatable experiment. 2. $P(A)$ is the frequency of occurrence of $A$. 3. $P(\text{data}|\text{hypothesis})$ or $P(\text{data}|\text{parameters})$ are probabilities of obtaining some data, given some hypothesis or a given value of a parameter. 4. Hypotheses are either correct or wrong, and parameters have a true value; we do not talk about probabilities of hypotheses or parameters.

  8. Frequentist versus Bayesian interpretation. Bayesian: 1. $A, B, \ldots$ are hypotheses, or theories, or parameters within a theory. 2. $P(A)$ is the probability that $A$ is true. 3. $P(\text{data}|\text{hypothesis})$ or $P(\text{data}|\text{parameters})$ are probabilities of obtaining some data, given some hypothesis or a given value of a parameter. 4. Hypotheses and parameters are associated with probability distribution functions.

  9. Bayes' Theorem. Given $P(A \cap B) = P(A|B)\,P(B)$, $P(B \cap A) = P(B|A)\,P(A)$, and $A \cap B = B \cap A$, we can derive Bayes' Theorem:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$
With $A$ = hypothesis (or parameters, or theory) and $B$ = data: $P(\text{hypothesis}|\text{data}) \propto P(\text{data}|\text{hypothesis})\,P(\text{hypothesis})$.
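A toy numerical application of Bayes' theorem may help fix ideas; all probabilities below are invented for illustration and are not LIGO/Virgo numbers.

```python
# Toy application of Bayes' theorem with made-up numbers (illustration only):
# hypothesis H = "a signal is present", data D = "the detection statistic
# exceeded a threshold".
p_H = 1e-4             # prior P(H): assumed rarity of real signals
p_D_given_H = 0.95     # likelihood P(D|H)
p_D_given_notH = 1e-3  # assumed false-alarm probability P(D|not H)

# Evidence P(D) via the law of total probability.
p_D = p_D_given_H * p_H + p_D_given_notH * (1.0 - p_H)
p_H_given_D = p_D_given_H * p_H / p_D   # posterior P(H|D)
print(f"P(H|D) = {p_H_given_D:.3f}")    # ~0.087: the small prior keeps the posterior modest
```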

  10. More on conditional probability. It is customary to explicitly denote probabilities being conditional on "all background information we have": $P(A|I)$, $P(B|I)$, ... All essential formulae are unaffected, for example:
$$P(A, B | I) = P(A | B, I)\, P(B | I), \qquad P(A | B, I) = \frac{P(B | A, I)\, P(A | I)}{P(B | I)}$$

  11. Marginalization. Consider sets $B_k$ such that:
- They are disjoint: $B_k \cap B_l = \emptyset$.
- They are exhaustive: $\cup_k B_k$ is the Universe, or $\sum_k p(B_k|I) = 1$.
Then the Marginalization Rule holds: $p(A|I) = \sum_k p(A, B_k|I)$.
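A minimal numerical sketch of the marginalization rule, assuming a made-up discrete joint distribution.

```python
# Sketch of the marginalization rule for a discrete joint distribution
# (made-up numbers): p(A|I) = sum_k p(A, B_k|I).
import numpy as np

# Joint probabilities p(A_i, B_k) on a 2 x 3 grid; rows = A_i, columns = B_k.
p_joint = np.array([[0.10, 0.25, 0.15],
                    [0.20, 0.05, 0.25]])
assert np.isclose(p_joint.sum(), 1.0)   # the B_k are disjoint and exhaustive

p_A = p_joint.sum(axis=1)   # marginalize over the B_k
print(p_A)                  # [0.5, 0.5]
```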

  12. Marginalization over a continuous variable. Consider the proposition, "The continuous variable $x$ has the value $\alpha$." The quantity $p(x = \alpha | I)$ does not have a well-defined meaning as a probability. Instead assign probabilities to finite intervals:
$$p(x_1 \leq x \leq x_2 | I) = \int_{x_1}^{x_2} \mathrm{pdf}(x)\, dx$$
where pdf() is the probability density function, normalized so that $\int_{x_{\min}}^{x_{\max}} \mathrm{pdf}(x)\, dx = 1$.
Marginalization for continuous variables: $p(A) = \int_{x_{\min}}^{x_{\max}} \mathrm{pdf}(A, x)\, dx$.
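The same idea sketched numerically for a continuous variable, using a unit Gaussian pdf chosen purely for illustration.

```python
# Probabilities for a continuous variable are assigned to intervals by
# integrating the pdf; a quick numerical check with a unit Gaussian.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

pdf = norm(loc=0.0, scale=1.0).pdf
p_interval, _ = quad(pdf, -1.0, 1.0)     # p(-1 <= x <= 1 | I)
p_total, _ = quad(pdf, -np.inf, np.inf)  # normalization check
print(p_interval, p_total)               # ~0.6827, ~1.0
```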

  13. More on Bayes' Theorem: Initial Understanding + New Observation = Updated Understanding.
$$p(h_0 | d) = \frac{p(d | h_0)\, p(h_0)}{p(d)}$$
Here $p(h_0|d)$ is the posterior probability, $p(d|h_0)$ the likelihood function, $p(h_0)$ the prior probability, and $p(d)$ the evidence.

  14. More on Bayes' Theorem. An experiment is performed and data $d$ is collected. We are measuring the parameter $\theta$. Consider a model $H$ that allows us to calculate the probability of getting the data $d$ if the parameter $\theta$ is known. Posterior probability of $\theta$:
$$p(\theta | d, H, I) = \frac{p(d | \theta, H, I)\, p(\theta | H, I)}{p(d | H, I)}$$
The evidence doesn't depend on $\theta$, so ignore it for now: $p(\theta | d, H, I) \propto p(d | \theta, H, I)\, p(\theta | H, I)$.
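A grid-based sketch of this proportionality for a single parameter, assuming toy Gaussian measurements and a flat prior; none of this is the lecture's data.

```python
# Grid-based sketch of p(theta|d) proportional to p(d|theta) p(theta) for a
# single parameter: Gaussian measurements of theta with a flat prior.
import numpy as np

rng = np.random.default_rng(0)
theta_true, sigma = 2.0, 0.5
d = theta_true + sigma * rng.standard_normal(20)   # simulated data (toy)

theta = np.linspace(0.0, 4.0, 2001)                # parameter grid
log_like = -0.5 * ((d[:, None] - theta) ** 2 / sigma**2).sum(axis=0)
log_prior = np.zeros_like(theta)                   # flat prior on the grid

log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())           # unnormalized posterior
post /= post.sum() * (theta[1] - theta[0])         # normalize; the constant plays the role of the evidence
print(theta[np.argmax(post)])                      # close to theta_true
```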

  15. More parameters. Can extend to more parameters: the joint posterior $p(\theta_1, \ldots, \theta_N | d, H, I)$. If we want the posterior distribution just for the variable $\theta_1$, i.e. $p(\theta_1 | d, H, I)$, then we marginalize:
$$p(\theta_1 | d, H, I) = \int_{\theta_2^{\min}}^{\theta_2^{\max}} \!\!\cdots \int_{\theta_N^{\min}}^{\theta_N^{\max}} p(\theta_1, \ldots, \theta_N | d, H, I)\, d\theta_2 \cdots d\theta_N$$
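A numerical sketch of marginalizing a two-parameter posterior down to $p(\theta_1|d)$, using an invented correlated-Gaussian joint posterior.

```python
# Numerical marginalization of a two-parameter posterior on a grid:
# p(theta1|d) = integral of p(theta1, theta2|d) over theta2.
import numpy as np

t1 = np.linspace(-4, 4, 401)
t2 = np.linspace(-4, 4, 401)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")

# Unnormalized toy joint posterior with correlation 0.6 between the parameters.
joint = np.exp(-0.5 * (T1**2 + T2**2 - 1.2 * T1 * T2) / (1 - 0.6**2))

d2 = t2[1] - t2[0]
marginal = joint.sum(axis=1) * d2                # integrate out theta2
marginal /= marginal.sum() * (t1[1] - t1[0])     # normalize p(theta1|d)
print(t1[np.argmax(marginal)])                   # peak of p(theta1|d)
```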

  16. The likelihood function. In $p(h_0|d) = p(d|h_0)\,p(h_0)/p(d)$, the likelihood function is $p(d|h_0)$: the probability of the data given the hypothesis, a true "frequentist" probability. In GW science, the likelihood function is the noise model.

  17. The likelihood function: the data. If the detector noise is stationary and Gaussian:
$$\langle \tilde n^*(f)\, \tilde n(f') \rangle = \delta(f - f')\, \tfrac{1}{2} S_n(f)$$
Gaussian probability distribution for the noise:
$$p(n_0) = N \exp\left\{ -\frac{1}{2}\int_{-\infty}^{\infty} \frac{|\tilde n_0(f)|^2}{(1/2)\,S_n(f)}\, df \right\} = N \exp\left\{ -\frac{(n_0|n_0)}{2} \right\}$$
Output of the detector: $s(t) = h(t; \theta_t) + n_0(t)$, so $n_0 = s - h(\theta_t)$. Plug into $p(n_0)$ to get:
$$\Lambda(s|\theta_t) = N \exp\left\{ -\tfrac{1}{2}\bigl(s - h(\theta_t)\,\big|\,s - h(\theta_t)\bigr) \right\}$$
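A sketch of the noise-weighted inner product and the resulting log-likelihood, written in the common one-sided-PSD convention; the flat PSD and the template below are toy assumptions, not a real detector model.

```python
# Sketch of the noise-weighted inner product
#   (a|b) = 4 Re  integral of a(f) conj(b(f)) / S_n(f) df
# and the log-likelihood log Lambda(s|theta) = (h|s) - (h|h)/2 (dropping the
# theta-independent -(s|s)/2 term), using a one-sided PSD.
import numpy as np

def inner_product(a, b, psd, df):
    """Noise-weighted inner product of two frequency-domain series."""
    return 4.0 * np.real(np.sum(a * np.conj(b) / psd)) * df

def log_likelihood(s_tilde, h_tilde, psd, df):
    """log Lambda up to the theta-independent -(s|s)/2 term."""
    return (inner_product(h_tilde, s_tilde, psd, df)
            - 0.5 * inner_product(h_tilde, h_tilde, psd, df))

# Toy example: flat PSD and a made-up frequency-domain template.
df = 0.25
freqs = np.arange(20.0, 512.0, df)
psd = np.full_like(freqs, 1e-46)                  # flat PSD (illustrative)
h = 1e-23 * np.exp(-((freqs - 100.0) / 30.0)**2)  # fake template
s = h + 1e-24 * np.random.default_rng(1).standard_normal(freqs.size)
print(log_likelihood(s, h, psd, df))
```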

  18. The likelihood function: the data. With $h_t \equiv h(\theta_t)$:
$$\Lambda(s|\theta_t) = N \exp\left\{ (h_t|s) - \tfrac{1}{2}(h_t|h_t) - \tfrac{1}{2}(s|s) \right\}$$
In this form, the information might not be very manageable: for a binary coalescence there could be more than 15 parameters $\theta^i$.

  19. The prior probability. In $p(h_0|d) = p(d|h_0)\,p(h_0)/p(d)$, the prior probability is $p(h_0)$: the probability of the hypothesis, which makes no sense in the frequentist interpretation. But a Bayesian can make assumptions to include a prior, which can be subjective; thus prior choices can influence results. The prior can be seen as the "degree of belief" that the hypothesis is true before a measurement is made.

  20. The prior probability $p^{(0)}(\theta_t)$. Examples in GW science:
* Known distributions in space: $p^{(0)}(r)\,dr \sim r^2\, dr$ for isotropic sources; $p^{(0)}(r)\,dr \sim r\, dr$ for sources in the Galaxy.
* Known mass distribution of neutron stars, ~1.35 $M_\odot$.
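As an illustration of using such a prior in practice, here is a sketch that draws distance samples from $p^{(0)}(r) \propto r^2$ by inverting the CDF; the cutoff r_max and sample size are arbitrary assumptions.

```python
# Sketch of drawing distance samples from the isotropic-source prior
# p(r) dr ~ r^2 dr on [0, r_max], via the inverse CDF r = r_max * u^(1/3).
import numpy as np

rng = np.random.default_rng(42)
r_max = 500.0                       # assumed upper cutoff (e.g. in Mpc)
u = rng.uniform(size=100_000)
r = r_max * u ** (1.0 / 3.0)        # CDF(r) = (r/r_max)^3  =>  r = r_max u^{1/3}

# Sanity check: the histogram counts should grow roughly like r^2.
counts, edges = np.histogram(r, bins=10, range=(0.0, r_max))
print(counts)
```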

  21. The posterior probability. In $p(h_0|d) = p(d|h_0)\,p(h_0)/p(d)$, the posterior probability is $p(h_0|d)$: it can be seen as the "degree of belief" that the hypothesis is true after a measurement is made. For GW parameter estimation:
$$p(\theta_t|s) = N\, p^{(0)}(\theta_t) \exp\left\{ (h_t|s) - \tfrac{1}{2}(h_t|h_t) \right\}$$

  22. The evidence. In $p(h_0|d) = p(d|h_0)\,p(h_0)/p(d)$, the evidence is $p(d)$. The evidence is unimportant for parameter estimation (but not for model selection): it is basically a normalization factor, and notice that it doesn't depend on the parameter being measured.

  23. The evidence: model selection.
$$p(h_0|d, M) = \frac{p(d|h_0, M)\, p(h_0|M)}{p(d|M)}$$
$M$: any overall assumption or model (e.g. the signal is a GW, the binary black hole is spin-precessing, the binary components are neutron stars). Odds ratio: compare competing models, for example "GW170817 was a BNS" vs "GW170817 was a BBH":
$$O_{ij} = \frac{p(M_i|d)}{p(M_j|d)} = \frac{p(M_i)\, p(d|M_i)}{p(M_j)\, p(d|M_j)}$$
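A toy sketch of an odds-ratio computation, with the evidences obtained by integrating likelihood times prior over the parameter; the two models and all numbers are invented for illustration.

```python
# Odds ratio O_12 = [p(M_1)/p(M_2)] * [p(d|M_1)/p(d|M_2)], with each evidence
# p(d|M) computed by marginalizing likelihood x prior over the parameter.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

d = 4.2   # a single toy measurement

# Model 1: parameter theta with a flat prior on [0, 10], Gaussian likelihood.
evidence_1, _ = quad(lambda th: norm.pdf(d, loc=th, scale=1.0) * (1.0 / 10.0),
                     0.0, 10.0)
# Model 2: no free parameter, theta fixed to 0.
evidence_2 = norm.pdf(d, loc=0.0, scale=1.0)

prior_odds = 1.0                       # assume equal prior model probabilities
odds_12 = prior_odds * evidence_1 / evidence_2
print(f"O_12 = {odds_12:.1f}")         # >> 1 favours model 1 in this toy case
```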

  24. What is the most probable value of the parameters $\theta_t$? A rule for assigning the most probable value is called an estimator. Choices of estimators include: 1. Maximum likelihood estimator; 2. Maximum posterior probability; 3. Bayes estimator.

  25. 1. Maximum likelihood estimator. Define $\hat\theta$ as the value which maximizes the probability distribution:
$$p(\theta_t|s) = N\, p^{(0)}(\theta_t) \exp\left\{ (h_t|s) - \tfrac{1}{2}(h_t|h_t) \right\}$$
Let the prior be flat. Then the problem is to maximize the likelihood $\Lambda(s|\theta_t)$. It is generally simpler to maximize $\log\Lambda$:
$$\log\Lambda(s|\theta_t) = (h_t|s) - \tfrac{1}{2}(h_t|h_t), \qquad \frac{\partial}{\partial\theta_t^i}\left[ (h_t|s) - \tfrac{1}{2}(h_t|h_t) \right] = 0$$
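A minimal numerical version of this recipe, maximizing $\log\Lambda$ (equivalently minimizing its negative) for a toy Gaussian model with one unknown parameter; this is not the GW likelihood itself.

```python
# Maximum-likelihood estimate by numerically maximizing log Lambda.
# Toy model: Gaussian data with unknown mean theta and known sigma.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
sigma = 0.5
data = 1.7 + sigma * rng.standard_normal(50)     # true theta = 1.7

def neg_log_like(theta):
    return 0.5 * np.sum((data - theta) ** 2) / sigma**2

res = minimize_scalar(neg_log_like, bounds=(-10, 10), method="bounded")
print(res.x, data.mean())   # the numerical MLE agrees with the sample mean
```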

  26. 2. Maximum posterior probability. Allows us to include prior information; we then maximize the full posterior probability:
$$p(\theta_t|s) = N\, p^{(0)}(\theta_t) \exp\left\{ (h_t|s) - \tfrac{1}{2}(h_t|h_t) \right\}$$
Non-trivial priors can lead to conceptual issues. For example, if $(\bar\theta_1, \bar\theta_2)$ is the maximum of the distribution function $p(\theta_1, \theta_2|s)$, it is no longer true that $\bar\theta_1$ is the maximum of the reduced distribution function $\tilde p(\theta_1|s) = \int d\theta_2\, p(\theta_1, \theta_2|s)$, obtained by integrating out $\theta_2$.
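A small numerical demonstration of this caveat, using an invented two-component density for which the joint maximum and the marginal maximum disagree.

```python
# The theta1-location of the joint-posterior maximum need not coincide with
# the maximum of the marginalized p(theta1|s). Toy two-component density.
import numpy as np
from scipy.stats import norm

t1 = np.linspace(-1.0, 2.0, 601)
t2 = np.linspace(-8.0, 8.0, 801)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")

# Narrow peak at (0, 0) plus a broad-in-theta2 component at theta1 = 1.
joint = (0.3 * norm.pdf(T1, 0.0, 0.1) * norm.pdf(T2, 0.0, 0.1)
         + 0.7 * norm.pdf(T1, 1.0, 0.1) * norm.pdf(T2, 0.0, 2.0))

i_joint = np.unravel_index(joint.argmax(), joint.shape)
marginal = joint.sum(axis=1) * (t2[1] - t2[0])   # integrate out theta2
print(t1[i_joint[0]], t1[marginal.argmax()])     # 0.0 vs 1.0
```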

  27. 3. Bayes' estimator. Neither 1) nor 2) minimizes the error on the parameter estimation. The most probable values of the parameters are defined by
$$\hat\theta_B^i(s) \equiv \int d\theta\, \theta^i\, p(\theta|s)$$
Errors on the parameters are defined by the matrix
$$\Sigma^{ij}_B = \int \left[ \theta^i - \hat\theta_B^i(s) \right]\left[ \theta^j - \hat\theta_B^j(s) \right] p(\theta|s)\, d\theta$$
This is independent of whether we integrate out a variable and it minimizes the parameter estimation error, but it has a high computational cost.
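A sketch of the Bayes estimator and its error evaluated on a one-dimensional parameter grid, using a toy posterior chosen only for illustration.

```python
# Bayes estimator theta_B = integral of theta p(theta|s) dtheta and its error
# Sigma = integral of (theta - theta_B)^2 p(theta|s) dtheta, on a 1D grid.
import numpy as np

theta = np.linspace(-5.0, 10.0, 3001)
dtheta = theta[1] - theta[0]

post = np.exp(-0.5 * ((theta - 2.0) / 0.8) ** 2)   # toy unnormalized posterior
post /= post.sum() * dtheta                        # normalize numerically

theta_B = np.sum(theta * post) * dtheta            # posterior mean
sigma2 = np.sum((theta - theta_B) ** 2 * post) * dtheta
print(theta_B, np.sqrt(sigma2))                    # ~2.0 and ~0.8
```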

  28. Confidence versus credibility. The frequentist approach relies on a confidence interval (CI); the Bayesian approach relies on a credible region (CR). For example, consider an experimental apparatus that provides values $x$ distributed as a Gaussian around the true value $x_t$ with standard deviation $\sigma$:
$$P(x|x_t) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{(x - x_t)^2}{2\sigma^2} \right\}$$
One repetition of the experiment yields the value $x_0 = 5$.

  29. Frequentist confidence interval. Use Neyman's construction for a 90% confidence level.
1. Find the value $x_1 < x_0$ such that 5% of the area under $P(x|x_1)$ is at $x > x_0$: $x_1 \simeq x_0 - 1.64485\,\sigma$.
[Figure: Gaussian $P(x|x_1)$ with the measured value $x_0$ marked on the x-axis.]
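A numerical version of this Neyman step, solving for $x_1$ such that 5% of the area under $P(x|x_1)$ lies above $x_0$; $\sigma = 1$ is our assumption, since the slide only fixes $x_0 = 5$.

```python
# Find x1 such that 5% of the area under P(x|x1) lies at x > x0,
# i.e. x1 = x0 - z_0.95 * sigma with z_0.95 = 1.64485.
from scipy.stats import norm
from scipy.optimize import brentq

x0, sigma = 5.0, 1.0
f = lambda x1: norm.sf(x0, loc=x1, scale=sigma) - 0.05   # upper-tail area - 5%
x1 = brentq(f, x0 - 10 * sigma, x0)
print(x1, x0 - 1.64485 * sigma)   # both ~3.35515
```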
