Gravitational Wave Data Analysis: II. Model Selection and Parameter Estimation
Chris Van Den Broeck
Kavli RISE Summer School on Gravitational Waves, Cambridge, UK, 23-27 September 2019
Bayesian inference
- Aim: use the available data to
  - evaluate which of several hypotheses is the most likely: model selection
  - construct the probability density distribution for the parameters associated with the hypotheses: parameter estimation
- Do this while making all assumptions explicit
Probabilities of propositions
- Propositions (or statements) are denoted by uppercase letters: $A, B, C, \ldots, X$
- Boolean algebra:
  - Conjunction $A \wedge B$: $A$ and $B$ are both true
  - Disjunction $A \vee B$: at least one of $A$ or $B$ is true
  - Negation $\neg A$: $A$ is false
  - Implication $A \Rightarrow B$: from $A$ follows $B$
Probabilities of propositions
- It is useful to view propositions as sets, which are subsets of a “Universe”
  - Conjunction $A \wedge B$: intersection of sets
  - Disjunction $A \vee B$: union of sets
  - Negation $\neg A$: complement within the Universe
- Each of these sets has a probability associated with it
  - If $A \subset B$ then $p(A) \leq p(B)$
  - If $A$ and $B$ are disjoint then $p(A \vee B) = p(A) + p(B)$
  - The Universe has probability 1, so that e.g. $p(A) + p(\neg A) = 1$
Bayes’ theorem
- Conditional probability:
  $$p(A|B) \equiv \frac{p(A \wedge B)}{p(B)}$$
- ... from which follows the product rule:
  $$p(A \wedge B) = p(A|B)\, p(B)$$
- ... and from the product rule follows Bayes’ theorem:
  $$p(A|B) = \frac{p(B|A)\, p(A)}{p(B)}$$
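As a minimal numerical sketch of Bayes’ theorem, the snippet below computes $p(A|B)$ for a two-outcome problem. The function name and all numbers are hypothetical illustrations, and $p(B)$ is expanded as $p(B|A)\,p(A) + p(B|\neg A)\,p(\neg A)$.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return p(A|B) = p(B|A) p(A) / p(B) for a binary proposition A."""
    # p(B) = p(B|A) p(A) + p(B|not A) p(not A), with p(not A) = 1 - p(A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: A = "a signal is present",
# B = "the detection statistic crosses a threshold".
p_signal = posterior(prior_a=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05)
print(f"p(A|B) = {p_signal:.3f}")
```

Even with a high $p(B|A)$, a small prior $p(A)$ keeps the posterior modest in this toy case, which is the role the prior plays in the theorem.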
Marginalization
- Note that for any $A$ and $B$, $A \wedge B$ and $A \wedge (\neg B)$ are disjoint sets whose union is $A$, so that
  $$p(A) = p(A \wedge B) + p(A \wedge (\neg B))$$
- Consider sets $\{B_k\}$ such that
  - They are disjoint: $B_k \wedge B_l = \emptyset$ for $k \neq l$
  - They are exhaustive: $\vee_k B_k$ is the Universe, so that $\sum_k p(B_k) = 1$
- Then one has the marginalization rule:
  $$p(A) = \sum_k p(A \wedge B_k)$$
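The marginalization rule can be checked on a small discrete example. This is a sketch with made-up probabilities; the labels `B1`, `B2`, `B3` and the numbers are assumptions, and the joint probabilities are built with the product rule.

```python
# Hypothetical disjoint, exhaustive propositions B_k with probabilities p(B_k).
p_b = {"B1": 0.5, "B2": 0.3, "B3": 0.2}

# Hypothetical conditional probabilities p(A | B_k).
p_a_given_b = {"B1": 0.1, "B2": 0.6, "B3": 0.9}

# Exhaustiveness: the p(B_k) must sum to 1.
total_b = sum(p_b.values())

# Product rule: p(A and B_k) = p(A | B_k) p(B_k).
p_joint = {k: p_a_given_b[k] * p_b[k] for k in p_b}

# Marginalization rule: p(A) = sum_k p(A and B_k).
p_a = sum(p_joint.values())
print(f"sum_k p(B_k) = {total_b:.2f}, p(A) = {p_a:.2f}")
```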
Marginalization over a continuous variable
- Consider the proposition “The continuous variable $x$ has the value $\alpha$”
  - Then the probability $p(x = \alpha)$ might be zero
- Instead assign probabilities to finite intervals:
  $$p(x_1 \leq x \leq x_2) = \int_{x_1}^{x_2} \mathrm{pdf}(x)\, dx$$
  where $\mathrm{pdf}(x)$ is called the probability density function
  - Exhaustiveness is given by $\int_{x_{\min}}^{x_{\max}} \mathrm{pdf}(x)\, dx = 1$
- Marginalization for continuous variables:
  $$p(A) = \int_{x_{\min}}^{x_{\max}} \mathrm{pdf}(A, x)\, dx$$
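The continuous marginalization integral can be approximated numerically. The sketch below assumes a toy case that is not from the slides: $x$ uniform on $[0, 1]$ and $p(A|x) = x$, so that $\mathrm{pdf}(A, x) = p(A|x)\,\mathrm{pdf}(x)$ and the exact answer is $p(A) = 1/2$.

```python
def trapezoid(f, a, b, n=1000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def pdf_x(x):
    """Toy probability density: uniform on [0, 1] (an assumption)."""
    return 1.0


def joint_pdf(x):
    """pdf(A, x) = p(A | x) pdf(x), with the toy choice p(A | x) = x."""
    return x * pdf_x(x)


normalization = trapezoid(pdf_x, 0.0, 1.0)   # exhaustiveness check
p_a = trapezoid(joint_pdf, 0.0, 1.0)         # marginalized probability of A
print(normalization, p_a)
```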
Application to gravitational wave data analysis
- The template banks we use to search for signals from coalescing binaries are coarse at high masses
- Information about angles and distance enters through the waveform amplitude, hence matched filtering with a normalized template only involves the intrinsic parameters (masses, spins):
  $$\left(\frac{S}{N}\right)_{\max} = \max_i \frac{(h(\bar\theta_i) | s)}{\sqrt{(h(\bar\theta_i) | h(\bar\theta_i))}}$$
  - Fast sky position estimates instead come from the different arrival times and phases at the different detectors in a network
- After detection has taken place, we will want information about all parameters
  - Binary black holes: 15 parameters $\{m_1, m_2, \vec{S}_1, \vec{S}_2, \alpha, \delta, \iota, \psi, d_L, t_c, \varphi_c\}$
  - Binary neutron stars: 17 parameters $\{m_1, m_2, \vec{S}_1, \vec{S}_2, \alpha, \delta, \iota, \psi, d_L, t_c, \varphi_c, \Lambda_1, \Lambda_2\}$
Application to gravitational wave data analysis
- Parameter estimation: find the posterior probability density $p(\bar\theta | d, \mathcal{H})$, where
  - $\bar\theta = (\theta_1, \theta_2, \ldots, \theta_N)$ are the parameters
  - $\mathcal{H}$ is the hypothesis that e.g. the signal was from the inspiral of two neutron stars, which comes with a family of waveforms $h(\bar\theta; t)$
  - $d(t) = n(t) + h(\bar\theta; t)$ are the detector data
- Model selection: compare different hypotheses through an odds ratio
  $$O^{\mathcal{H}_1}_{\mathcal{H}_2} = \frac{p(\mathcal{H}_1 | d)}{p(\mathcal{H}_2 | d)}$$
  where
  - The hypotheses $\mathcal{H}_1$, $\mathcal{H}_2$ correspond to different waveform models
    - Binary neutron star versus binary black hole
    - Waveform predicted by general relativity versus alternative theory of gravity
    - ...
  - The probabilities (not probability densities) $p(\mathcal{H}_1 | d)$, $p(\mathcal{H}_2 | d)$ do not involve any statement about parameters
1. Parameter estimation
- Using Bayes’ theorem:
  $$p(\bar\theta | \mathcal{H}, d) = \frac{p(d | \mathcal{H}, \bar\theta)\, p(\bar\theta | \mathcal{H})}{p(d | \mathcal{H})}$$
  where
  - $p(d | \mathcal{H}, \bar\theta)$ is called the likelihood
  - $p(\bar\theta | \mathcal{H})$ is the prior probability density
  - $p(d | \mathcal{H})$ is the evidence for the hypothesis $\mathcal{H}$
- The prior probability density $p(\bar\theta | \mathcal{H})$ is a function we choose ourselves, based on what we know about the parameters prior to the measurement:
  - If the hypothesis is binary neutron star inspiral, then we can take the prior on the component masses to be uniform in the interval $[1, 3]\, M_\odot$
  - For sources that are roughly uniformly distributed over spatial volume, we take the distance prior $p(r)\, dr \propto r^2\, dr$
  - The prior for all parameters together is usually taken to be the product of the priors for the individual parameters
- The evidence $p(d | \mathcal{H})$ is not important here; it is set by the requirement that the posterior probability density $p(\bar\theta | \mathcal{H}, d)$ be normalized
- The likelihood $p(d | \mathcal{H}, \bar\theta)$ is something we can calculate!
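The prior choices listed above can be sketched in a few lines. The mass range $[1, 3]\, M_\odot$ and the $r^2$ distance prior come from the slide; the distance cutoff `R_MAX`, the function names, and the restriction to three parameters are assumptions made only for illustration.

```python
import math
import random

M_MIN, M_MAX = 1.0, 3.0  # component-mass range in solar masses (from the slide)
R_MAX = 100.0            # hypothetical distance cutoff in Mpc, needed to normalize r^2


def log_prior(m1, m2, r):
    """Log of the product prior: uniform in each mass, proportional to r^2 in distance."""
    in_range = M_MIN <= m1 <= M_MAX and M_MIN <= m2 <= M_MAX and 0.0 < r <= R_MAX
    if not in_range:
        return -math.inf
    log_p_mass = -math.log(M_MAX - M_MIN)            # uniform density, one per mass
    log_p_dist = math.log(3.0 * r ** 2 / R_MAX ** 3)  # normalized r^2 density on (0, R_MAX]
    return 2.0 * log_p_mass + log_p_dist


def sample_prior(rng):
    """Draw (m1, m2, r) from the prior; distance uses the inverse CDF r = R_MAX * u^(1/3)."""
    m1 = rng.uniform(M_MIN, M_MAX)
    m2 = rng.uniform(M_MIN, M_MAX)
    r = R_MAX * rng.random() ** (1.0 / 3.0)
    return m1, m2, r


rng = random.Random(0)
print(sample_prior(rng), log_prior(2.0, 2.0, 50.0))
```

Working with the log of the prior is a common design choice because it combines with a log-likelihood by simple addition.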
1. Parameter estimation
- How to calculate the likelihood $p(d | \mathcal{H}, \bar\theta)$?
- One has $d(t) = n(t) + h(\bar\theta; t)$
  - In the conditional probability density above, the hypothesis and parameter values are assumed known, hence $h(\bar\theta; t)$ is assumed known
  - We have a probability distribution for noise realizations!
- Assuming stationary, Gaussian noise,
  $$p[n] = \mathcal{N} \exp\left[-2 \int_0^\infty \frac{|\tilde{n}(f)|^2}{S_n(f)}\, df\right]$$
  or, in terms of the noise-weighted inner product $(A|B) = 4\, \Re \int_0^\infty \frac{\tilde{A}^*(f)\, \tilde{B}(f)}{S_n(f)}\, df$:
  $$p[n] = \mathcal{N} e^{-\frac{1}{2}(n|n)}$$
- But in our case we can write $\tilde{n}(f) = \tilde{d}(f) - \tilde{h}(\bar\theta; f)$, which gives us
  $$p(d | \mathcal{H}, \bar\theta) = \mathcal{N} e^{-\frac{1}{2}(d - h | d - h)}$$
- We now have all we need to calculate the posterior probability density of the parameters:
  $$p(\bar\theta | \mathcal{H}, d) = \frac{p(d | \mathcal{H}, \bar\theta)\, p(\bar\theta | \mathcal{H})}{p(d | \mathcal{H})}$$
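A discretized version of the inner product and the resulting log-likelihood can be sketched as follows. This assumes the frequency-domain data, template, and one-sided noise power spectral density are already sampled on a common grid with spacing `df`; the function names and the toy arrays are hypothetical, and the normalization constant $\mathcal{N}$ is dropped.

```python
def inner_product(a, b, psd, df):
    """Discretized (A|B) = 4 Re sum_k conj(A_k) B_k / S_n(f_k) * df."""
    return 4.0 * df * sum(
        (a_k.conjugate() * b_k).real / s_k for a_k, b_k, s_k in zip(a, b, psd)
    )


def log_likelihood(d, h, psd, df):
    """log p(d | H, theta) up to the constant log N: -(d - h | d - h) / 2."""
    residual = [d_k - h_k for d_k, h_k in zip(d, h)]
    return -0.5 * inner_product(residual, residual, psd, df)


# Toy frequency-domain arrays (hypothetical values, four frequency bins).
psd = [2.0, 2.0, 2.0, 2.0]
df = 0.5
h = [1 + 1j, 0.5 - 2j, -1j, 3 + 0j]
noise = [1 + 0j, 1j, 0j, 0j]
d = [h_k + n_k for h_k, n_k in zip(h, noise)]

print(log_likelihood(d, h, psd, df))
```

In the toy case the residual is exactly the injected noise, so the log-likelihood equals $-\frac{1}{2}(n|n)$, matching the step from $p[n]$ to $p(d | \mathcal{H}, \bar\theta)$.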
1. Parameter estimation
  $$p(\bar\theta | d, \mathcal{H}) \propto p(d | \bar\theta, \mathcal{H})\, p(\bar\theta | \mathcal{H})$$
- The posterior is the likelihood weighted by the prior
- Conclusions drawn are based on:
  - Experimental data obtained (likelihood)
  - Information available before the experiment (prior)
- If we want the posterior distribution for just one variable $\theta_1$, then we marginalize over all the others:
  $$p(\theta_1 | d, \mathcal{H}) = \int_{\theta_2^{\min}}^{\theta_2^{\max}} \cdots \int_{\theta_N^{\min}}^{\theta_N^{\max}} p(\theta_1, \theta_2, \ldots, \theta_N | d, \mathcal{H})\, d\theta_2 \cdots d\theta_N$$
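For a low-dimensional problem, the marginalization integral can be approximated on a grid. The sketch below assumes a toy two-parameter posterior (an uncorrelated Gaussian standing in for likelihood times prior) and a simple Riemann sum; all names and numbers are illustrative assumptions.

```python
import math


def unnormalized_posterior(theta1, theta2):
    """Toy stand-in for likelihood x prior: peaks at theta1 = 1, theta2 = -0.5."""
    return math.exp(-0.5 * ((theta1 - 1.0) ** 2 + ((theta2 + 0.5) / 0.5) ** 2))


N_GRID = 201
LO, HI = -5.0, 5.0
step = (HI - LO) / (N_GRID - 1)
grid = [LO + i * step for i in range(N_GRID)]

# Integrate out theta2 at each grid value of theta1 (Riemann sum).
marginal = [
    sum(unnormalized_posterior(t1, t2) for t2 in grid) * step for t1 in grid
]

# Normalize so that the marginal posterior integrates to 1 over theta1.
norm = sum(marginal) * step
marginal = [m / norm for m in marginal]

theta1_peak = grid[marginal.index(max(marginal))]
theta1_mean = sum(t * m for t, m in zip(grid, marginal)) * step
print(theta1_peak, theta1_mean)
```

Grid integration scales poorly with the number of parameters, so for the 15 or 17 parameters of a compact binary one would expect stochastic sampling to be used instead.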
2. Model selection
- Suppose we want to compare two hypotheses $\mathcal{H}_1$, $\mathcal{H}_2$
  - Binary neutron star versus binary black hole
  - Waveform predicted by general relativity versus alternative theory of gravity
  - ...
- We want to compare the probabilities $p(\mathcal{H}_1 | d)$ and $p(\mathcal{H}_2 | d)$
- Bayes’ theorem for e.g. $\mathcal{H}_1$:
  $$p(\mathcal{H}_1 | d) = \frac{p(d | \mathcal{H}_1)\, p(\mathcal{H}_1)}{p(d)}$$
- Define the odds ratio:
  $$O^{\mathcal{H}_1}_{\mathcal{H}_2} = \frac{p(\mathcal{H}_1 | d)}{p(\mathcal{H}_2 | d)} = \frac{p(d | \mathcal{H}_1)\, p(\mathcal{H}_1)}{p(d | \mathcal{H}_2)\, p(\mathcal{H}_2)}$$
  where the factors of $p(d)$ have canceled out
  - $p(\mathcal{H}_1) / p(\mathcal{H}_2)$: prior odds
  - $p(d | \mathcal{H}_1) / p(d | \mathcal{H}_2)$: ratio of evidences
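2. Model selection
- Recall from parameter estimation:
  $$p(\bar\theta | \mathcal{H}, d) = \frac{p(d | \mathcal{H}, \bar\theta)\, p(\bar\theta | \mathcal{H})}{p(d | \mathcal{H})}$$
  or
  $$p(\bar\theta | d, \mathcal{H})\, p(d | \mathcal{H}) = p(d | \mathcal{H}, \bar\theta)\, p(\bar\theta | \mathcal{H})$$
- Integrate both sides over all parameters:
  $$\int p(\bar\theta | d, \mathcal{H})\, p(d | \mathcal{H})\, d^N\theta = \int p(d | \mathcal{H}, \bar\theta)\, p(\bar\theta | \mathcal{H})\, d^N\theta$$
- Note that $p(d | \mathcal{H})$ is independent of the parameter(s), and $p(\bar\theta | d, \mathcal{H})$ is normalized, hence the left hand side becomes:
  $$\int p(\bar\theta | d, \mathcal{H})\, p(d | \mathcal{H})\, d^N\theta = p(d | \mathcal{H}) \int p(\bar\theta | d, \mathcal{H})\, d^N\theta = p(d | \mathcal{H})$$
- Therefore the evidence is given by
  $$p(d | \mathcal{H}) = \int p(d | \mathcal{H}, \bar\theta)\, p(\bar\theta | \mathcal{H})\, d^N\theta$$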
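The evidence integral can be evaluated directly in a one-parameter toy problem. The sketch below assumes a single datum with unit-variance Gaussian noise and a flat prior on $[-10, 10]$; these choices, the observed value, and the function names are illustrative assumptions, not the slides' setup.

```python
import math


def likelihood(d, theta, sigma=1.0):
    """Gaussian likelihood for a single datum d = theta + noise (toy model)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((d - theta) / sigma) ** 2) / norm


def prior(theta, lo=-10.0, hi=10.0):
    """Flat prior density on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= theta <= hi else 0.0


def trapezoid(f, a, b, n=4000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


d_obs = 1.2  # hypothetical measured value

# Evidence: integral of likelihood x prior over the parameter.
evidence = trapezoid(lambda t: likelihood(d_obs, t) * prior(t), -10.0, 10.0)

# Dividing by the evidence yields a posterior that integrates to 1.
posterior_norm = trapezoid(
    lambda t: likelihood(d_obs, t) * prior(t) / evidence, -10.0, 10.0
)
print(evidence, posterior_norm)
```

Because the likelihood is sharply peaked well inside the prior range, the evidence here is close to the prior density $1/20$.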
2. Model selection
- Odds ratio:
  $$O^{\mathcal{H}_1}_{\mathcal{H}_2} = \frac{p(\mathcal{H}_1 | d)}{p(\mathcal{H}_2 | d)} = \frac{p(d | \mathcal{H}_1)\, p(\mathcal{H}_1)}{p(d | \mathcal{H}_2)\, p(\mathcal{H}_2)}$$
- Define the Bayes factor:
  $$B^{\mathcal{H}_1}_{\mathcal{H}_2} = \frac{p(d | \mathcal{H}_1)}{p(d | \mathcal{H}_2)}$$
- Evidences:
  $$p(d | \mathcal{H}) = \int p(d | \mathcal{H}, \bar\theta)\, p(\bar\theta | \mathcal{H})\, d^N\theta$$
- Hypotheses can have an arbitrary number of free parameters
  - Does the model that fits the data best tend to give the highest evidence?
  - If so, a model with more parameters could give the highest evidence even if incorrect!
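Given two evidences, the odds ratio is the Bayes factor times the prior odds. The sketch below works with logarithms, a common numerical choice since evidences can be extremely small; the function name and all numbers are hypothetical.

```python
import math


def log_odds(log_evidence_1, log_evidence_2, prior_1=0.5, prior_2=0.5):
    """log O = log B + log prior odds, with log B the difference of log evidences."""
    log_bayes_factor = log_evidence_1 - log_evidence_2
    return log_bayes_factor + math.log(prior_1 / prior_2)


# Hypothetical log evidences for two hypotheses, with equal prior probabilities.
log_o_equal = log_odds(-1000.0, -1004.6)

# Same evidences, but H_1 is disfavored a priori by 1:9.
log_o_skewed = log_odds(-1000.0, -1004.6, prior_1=0.1, prior_2=0.9)

print(log_o_equal, log_o_skewed)
```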
Occam’s razor
- For simplicity, compare two hypotheses of the following form:
  - $X$ has no free parameters
  - $Y$ has one free parameter, $\lambda$
- Will $Y$ automatically be favored over $X$?
- Odds ratio:
  $$O^X_Y = \frac{p(d | X)\, p(X)}{p(d | Y)\, p(Y)}$$
- Evidence for $Y$:
  $$p(d | Y) = \int p(d | \lambda, Y)\, p(\lambda | Y)\, d\lambda$$
- For simplicity assume a flat prior for $\lambda \in [\lambda_{\min}, \lambda_{\max}]$:
  $$p(\lambda | Y) = \frac{1}{\lambda_{\max} - \lambda_{\min}}$$
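A small numerical experiment can illustrate the question posed above. The sketch assumes, beyond what the slide states, that $X$ corresponds to $Y$ with $\lambda$ fixed at a value `lam0`, and that the likelihood as a function of $\lambda$ is a Gaussian centered on a best-fit value `lam_hat` with width `sigma`; all names and numbers are hypothetical.

```python
import math


def gaussian_likelihood(lam, lam_hat, sigma):
    """Toy likelihood p(d | lambda, Y), Gaussian in lambda (an assumption)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((lam - lam_hat) / sigma) ** 2) / norm


def bayes_factor_x_over_y(lam_hat, sigma, lam_min, lam_max, lam0=0.0, n=20000):
    """Ratio p(d|X) / p(d|Y), with X taken as Y at fixed lambda = lam0."""
    evidence_x = gaussian_likelihood(lam0, lam_hat, sigma)
    # p(d|Y): trapezoid integral of the likelihood times the flat prior density.
    h = (lam_max - lam_min) / n
    integral = 0.5 * (
        gaussian_likelihood(lam_min, lam_hat, sigma)
        + gaussian_likelihood(lam_max, lam_hat, sigma)
    )
    for i in range(1, n):
        integral += gaussian_likelihood(lam_min + i * h, lam_hat, sigma)
    evidence_y = integral * h / (lam_max - lam_min)
    return evidence_x / evidence_y


# Best fit within half a sigma of lam0: the extra parameter is not needed.
b_near = bayes_factor_x_over_y(lam_hat=0.5, sigma=1.0, lam_min=-50.0, lam_max=50.0)

# Best fit five sigma away from lam0: the data call for the extra parameter.
b_far = bayes_factor_x_over_y(lam_hat=5.0, sigma=1.0, lam_min=-50.0, lam_max=50.0)

print(b_near, b_far)
```

In this toy setup $Y$ always fits at least as well as $X$ at its best-fit point, yet its evidence is diluted by the width of the prior range, so $X$ is favored unless the data pull $\lambda$ well away from `lam0`.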