Some statistics for high-energy astrophysics with illustrations from XSPEC
Andy Pollock, European Space Agency, XMM-Newton RGS Calibration Scientist
Urbino Workshop in High-Energy Astrophysics, 2008 July 31
A.M.T. Pollock European Space Astronomy Centre Statistics for high-energy astrophysics XMM-Newton SOC Villanueva de la Cañada, Madrid, Spain
Make every photon count. Account for every photon.
Analysis in high-energy astrophysics
data $\{n_i\}_{i=1,N}$
• individual events
• detector coordinates
• never change
• have no errors
• most precious resource
• kept forever in archives
models $\{\mu_i\}_{i=1,N} \ge 0$
• continuously distributed
• physical parameters
• change limited only by physics
• subject to fluctuations
• predictions possible
• kept forever in journals and textbooks
statistics
“There are three sorts of lies: lies, damned lies and statistics.”
Statistical nature of scientific truth
• Measurements in high-energy astrophysics collect individual events
• Many different things could have happened to give those events
• Alternatives are governed by the laws of probability
• Direct inversion impossible
• Information derived about the universe is not certain
• Statistics quantifies the uncertainties:
  • What do we know?
  • How well do we know it?
  • Can we avoid mistakes?
  • What should we do next?
There are two sorts of statistical inference
• Classical statistical inference
  • infinite series of identical measurements (Frequentist)
  • hypothesis testing and rejection
  • the usual interpretation
• Bayesian statistical inference
  • prior and posterior probabilities
  • currently popular
• Neither especially relevant for astrophysics
  • one universe
  • irrelevance of prior probabilities and cost analysis
  • choice among many models driven by physics
There are two sorts of statistic
• $\chi^2$-statistic: Gaussian statistics
• C-statistic: Poisson statistics
There are two sorts of statistics
• Gaussian statistics: $\chi^2$
• Poisson statistics: C
Gaussian statistics
The Normal probability distribution $P(x|\mu,\sigma)$ for data = $\{x \in \Re\}$ and model = $\{\mu, \sigma\}$

$$P(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$$\int_{-\infty}^{+\infty} P(x|\mu,\sigma)\,dx = 1$$

$$\ln P = -\frac{(x-\mu)^2}{2\sigma^2} - \ln\left(\sigma\sqrt{2\pi}\right)$$

$$\int_{\mu-1\sigma}^{\mu+1\sigma} P(x|\mu,\sigma)\,dx \approx 0.6827$$

Interval   Probability inside   Chance of falling outside
±1σ        68.3%                1/3
±2σ        95.45%               1/22
±3σ        99.730%              1/370
±4σ        99.99367%            1/15787
±5σ        99.999943%           1/1744277

[Figure: $P(x|\mu,\sigma)$ against $x$, with $\mu$ and the interval from −1σ to +1σ marked]
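The probabilities tabulated on this slide can be checked with a few lines of Python. This is only a sketch using the standard library; the function names `gaussian_coverage` and `one_in` are illustrative choices, not part of the slides or of XSPEC.

```python
import math

def gaussian_coverage(k):
    """Probability that a Normal variate lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2.0))

def one_in(k):
    """Two-sided chance of falling outside k sigma, expressed as 'one in N'."""
    # erfc gives the tail directly and avoids the rounding error of 1 - erf.
    return 1.0 / math.erfc(k / math.sqrt(2.0))

if __name__ == "__main__":
    for k in range(1, 6):
        inside = 100.0 * gaussian_coverage(k)
        print(f"{k} sigma: {inside:.6f}% inside, 1/{one_in(k):.0f} outside")
```

Working from the complementary error function keeps the 4σ and 5σ tails accurate, where the probability inside the interval is too close to 1 to subtract safely.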
Poisson statistics
The Poisson probability distribution for data = $\{n \ge 0\}$ and model = $\{\mu > 0\}$

$$P(n|\mu) = \frac{e^{-\mu}\mu^n}{n!} \quad \forall\, n = 0, 1, 2, 3, \ldots, \infty$$

$$P(0|\mu) = e^{-\mu}$$
$$P(1|\mu) = e^{-\mu}\,\frac{\mu}{1}$$
$$P(2|\mu) = e^{-\mu}\,\frac{\mu}{1}\,\frac{\mu}{2}$$
$$P(3|\mu) = e^{-\mu}\,\frac{\mu}{1}\,\frac{\mu}{2}\,\frac{\mu}{3}$$

$$P(n|\mu) = P(n-1|\mu)\,\frac{\mu}{n}$$

$$\sum_{n=0}^{\infty} P(n|\mu) = 1$$

$$\ln P = n\ln\mu - \mu - \ln n!$$
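The recurrence P(n|µ) = P(n−1|µ) µ/n gives a simple way to tabulate Poisson probabilities without computing factorials. The following is a minimal sketch in standard-library Python; the function names are illustrative and do not come from the slides.

```python
import math

def poisson_pmf(n_max, mu):
    """Return [P(0|mu), ..., P(n_max|mu)] built with P(n) = P(n-1) * mu / n."""
    probs = [math.exp(-mu)]              # P(0|mu) = exp(-mu)
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * mu / n)
    return probs

def poisson_log_pmf(n, mu):
    """ln P = n ln(mu) - mu - ln(n!), with ln(n!) taken from lgamma(n + 1)."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

if __name__ == "__main__":
    for n, p in enumerate(poisson_pmf(5, 2.5)):
        print(n, p, math.exp(poisson_log_pmf(n, 2.5)))
```

The two routes should agree to rounding error, and the tabulated probabilities should sum to 1 once `n_max` is well beyond µ.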
Likelihood of data on models
data $\{n_i\}_{i=1,N}$, statistics, models $\{\mu_i\}_{i=1,N}$

$$L = \prod_{i=1}^{N} P(n_i|\mu_i)$$

Gaussian

$$L = \prod_{i=1}^{N} \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left(-\frac{(n_i-\mu_i)^2}{2\sigma_i^2}\right) dn_i$$

$$\ln L = -\frac{1}{2}\sum_{i=1}^{N} \frac{(n_i-\mu_i)^2}{\sigma_i^2} - \sum_{i=1}^{N} \ln\sigma_i + \kappa(\ln dn_i)$$

$$-2\ln L = \chi^2$$

Poisson

$$L = \prod_{i=1}^{N} \frac{e^{-\mu_i}\mu_i^{n_i}}{n_i!}$$

$$\ln L = \sum_{i=1}^{N} \left(n_i\ln\mu_i - \mu_i\right) - \kappa(\ln n_i!)$$

$$-2\ln L = C$$

Here $\kappa(\ldots)$ stands for terms that do not depend on the model; the identifications with $\chi^2$ and $C$ hold once those terms are dropped.
Cash 1979, ApJ, 228, 939
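As a rough illustration of the two statistics on this slide, the sketch below evaluates χ² and C for lists of binned data and model values. It assumes fixed, model-independent σ_i for the Gaussian case and drops the ln n! term for the Poisson case; the function names are illustrative and this is not XSPEC's implementation.

```python
import math

def chi_squared(n, mu, sigma):
    """Gaussian -2 ln L without model-independent terms (sigma assumed fixed)."""
    return sum(((ni - mi) / si) ** 2 for ni, mi, si in zip(n, mu, sigma))

def cash_statistic(n, mu):
    """Poisson -2 ln L without the model-independent ln(n!) term."""
    return -2.0 * sum(ni * math.log(mi) - mi for ni, mi in zip(n, mu))

def poisson_log_likelihood(n, mu):
    """Full Poisson ln L, including ln(n!) via lgamma."""
    return sum(ni * math.log(mi) - mi - math.lgamma(ni + 1)
               for ni, mi in zip(n, mu))

if __name__ == "__main__":
    counts = [3, 0, 5, 2]
    model_a = [2.5, 0.4, 4.0, 3.0]
    model_b = [3.0, 1.0, 3.0, 3.0]
    # The dropped term cancels when two models are compared on the same data.
    print(cash_statistic(counts, model_a) - cash_statistic(counts, model_b))
    print(-2.0 * (poisson_log_likelihood(counts, model_a)
                  - poisson_log_likelihood(counts, model_b)))
```

Because the ln n! term depends only on the data, differences in C between models equal differences in the full −2 ln L, which is why it can be ignored when fitting.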
Numerical model of the life of a photon
Detected data are governed by the laws of physics. The numerical model should reproduce as completely as possible every process that gives rise to events in the detector:
• photon production in the source (or sources) of interest
• intervening absorption
• effects of the instrument
  • calibration
• background components
  • cosmic X-ray background
  • local energetic particles
  • instrumental noise
  • model it, don’t subtract it
An XMM-Newton RGS instrument
RGS SAS & CCF
$m\lambda = d(\cos\beta - \cos\alpha)$
rgsproc
• atthkgen
• rgsoffsetcalc
• rgssources
• rgsframes
• rgsbadpix
• rgsevents
• evlistcomb
• gtimerge
• rgsangles
• rgsfilter
• rgsregions
• rgsspectrum
• rgsrmfgen
• rgsfluxer
CCF components
BORESIGHT, LINCOORDS, MISCDATA, HKPARMINT, ADUCONV, BADPIX, CROSSPSF, CTI, LINESPREADFUNC, QUANTUMEFF, REDIST, EFFAREACORR
5-10% accuracy is a common calibration goal
The final data model
$$\mu(\theta,\beta,\Delta,D) = S(\theta(\Omega)) \otimes R(\Omega\,\langle\Delta\rangle\,D) + B(\beta(D))$$
D = set of detector coordinates {X, Y, t, PI, …}
S = source of interest
θ = set of source parameters
R = instrumental response
Ω = set of physical coordinates {α, δ, τ, υ, …}
Δ = set of instrumental calibration parameters
B = background
β = set of background parameters
$\ln L(\theta,\beta,\Delta) \rightarrow \ln L(\theta)$
Uses of the log-likelihood, $\ln L(\theta)$
• $\ln L$ is what you need to assess any and all data models
• locate the maximum-likelihood model when $\theta = \theta^*$
  • minimum $\chi^2$ is a maximum-likelihood Gaussian statistic
  • minimum C is a maximum-likelihood Poisson statistic
• compute a goodness-of-fit statistic
  • reduced chi-squared: $\chi^2/\nu \sim 1$ ideally
  • reduced C: $C/\nu \sim 1$ ideally
  • $\nu$ = number of degrees of freedom
• estimate model parameters and uncertainties
  • $\ln L(\theta)$
  • $\theta^* = \{p_1, p_2, p_3, p_4, \ldots, p_M\}$
• investigate the whole multi-dimensional surface $\ln L(\theta)$
• compare two or more models
• calibrating $\ln L$: $2\Delta\ln L$, $\sigma \sim \sqrt{2\Delta\ln L}$
  • $2\Delta\ln L < 1$ is not interesting
  • $2\Delta\ln L > 10$ is worth thinking about (e.g. 2XMM DET_ML ≥ 8)
  • $2\Delta\ln L > 100$ Hmmm…
Example of a maximum-likelihood solution
N-pixel image: data $\{n_i\}$ photons; model $\{\mu_i = s p_i + b\}$; PSF $p_i$; unknown parameters $\{s, b\}$

$$\ln L = \sum_{i=1}^{N} \left(n_i\ln\mu_i - \mu_i\right) = \sum_{i=1}^{N} \left[n_i\ln(s p_i + b) - (s p_i + b)\right]$$

$$\frac{\partial\ln L}{\partial s} = \sum_{i=1}^{N} \left(\frac{n_i p_i}{s p_i + b} - p_i\right) = 0$$

$$\frac{\partial\ln L}{\partial b} = \sum_{i=1}^{N} \left(\frac{n_i}{s p_i + b} - 1\right) = 0$$

$$s\frac{\partial\ln L}{\partial s} + b\frac{\partial\ln L}{\partial b} = \sum_{i=1}^{N} \left(\frac{n_i s p_i}{s p_i + b} - s p_i + \frac{n_i b}{s p_i + b} - b\right) = 0$$

$$\sum_{i=1}^{N} n_i = s\sum_{i=1}^{N} p_i + b\sum_{i=1}^{N} 1$$
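The two gradient equations have no closed-form solution in general, but they can be solved numerically. The sketch below uses an EM-style fixed-point iteration, which is a technique chosen here for illustration and is not taken from the slides: multiplying ∂ln L/∂s = 0 by s and ∂ln L/∂b = 0 by b gives update rules that keep s and b positive. The function name, starting values and iteration count are assumptions, and the sketch assumes the solution has s > 0 and b > 0.

```python
def fit_source_and_background(counts, psf, n_iter=2000):
    """Maximum-likelihood s and b for the model mu_i = s * p_i + b.

    Iterates s <- s * sum(n_i p_i / mu_i) / sum(p_i)
         and b <- b * sum(n_i / mu_i) / N,
    whose fixed points satisfy both gradient equations.
    """
    n_pix = len(counts)
    psf_sum = sum(psf)
    s, b = 1.0, 1.0                       # arbitrary positive starting point
    for _ in range(n_iter):
        mu = [s * p + b for p in psf]
        s_new = s * sum(n * p / m for n, p, m in zip(counts, psf, mu)) / psf_sum
        b_new = b * sum(n / m for n, m in zip(counts, mu)) / n_pix
        s, b = s_new, b_new
    return s, b

if __name__ == "__main__":
    psf = [0.01, 0.04, 0.1, 0.2, 0.3, 0.2, 0.1, 0.04, 0.01]
    counts = [2, 4, 7, 12, 17, 12, 7, 4, 3]
    s, b = fit_source_and_background(counts, psf)
    print(s, b)
    # The last equation on the slide: total counts = s * sum(p) + b * N.
    print(sum(counts), s * sum(psf) + b * len(psf))
```

Each update preserves the final relation on the slide, Σn_i = sΣp_i + bN, so the fitted model always accounts for every photon in the image.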
Goodness-of-fit
• Gaussian model and data are consistent if $\chi^2/\nu \sim 1$
  • $\nu$ = “number of degrees of freedom” = number of bins − number of free model parameters = N − M
  • cf. $\langle (x-\mu)^2/\sigma^2 \rangle = 1$
  • same as comparison with the best-possible $\nu = 0$ model, $\mu = x$
  • $\chi^2 = 2\left(\ln L(\mu = x) - \ln L(\theta)\right)$
• Poisson model and data are consistent if $C/\nu \sim 1$
  • comparison with the best-possible $\nu = 0$ model, $\mu = n$
  • $C = 2\sum(n_i\ln n_i - n_i) - 2\sum(n_i\ln\mu_i - \mu_i) = 2\sum\left[n_i\ln(n_i/\mu_i) - (n_i - \mu_i)\right]$
  • XSPEC definition
  • What happens when many $\mu_i \ll 1$ and $n_i = 0$?

Estimate model parameters and their uncertainties
• Parameter error estimates, $d\theta$, around the maximum-likelihood solution, $\theta^*$
• $-2\ln L(\theta^* + d\theta) = -2\ln L(\theta^*) + 1$ for 1σ (choices other than 1 are sometimes made)
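The sketch below evaluates C in the form given on this slide and uses it to estimate a 1σ upper error by finding where the statistic rises by 1 above its minimum. It is an illustration under stated assumptions, not XSPEC code: the model is a single constant rate in every bin, the error is located by bisection, and the function names are hypothetical. Note what the formula does for an empty bin: with n_i = 0 the term reduces to 2µ_i, so many bins with µ_i ≪ 1 add almost nothing to C while each still adds 1 to ν.

```python
import math

def cstat(counts, model):
    """C = 2 * sum[n ln(n/mu) - (n - mu)], taking n ln(n/mu) = 0 when n = 0."""
    total = 0.0
    for n, mu in zip(counts, model):
        if n > 0:
            total += n * math.log(n / mu) - (n - mu)
        else:
            total += mu                   # an empty bin contributes 2 * mu to C
    return 2.0 * total

def upper_error_constant_rate(counts, delta=1.0):
    """Upper error on a constant rate: where C rises by delta above its minimum."""
    n_bins = len(counts)
    best = sum(counts) / n_bins           # maximum-likelihood constant rate
    c_min = cstat(counts, [best] * n_bins)
    lo = best
    hi = best + 10.0 * math.sqrt(best / n_bins) + 10.0   # generous bracket
    for _ in range(200):                  # bisection on C(mu) - c_min = delta
        mid = 0.5 * (lo + hi)
        if cstat(counts, [mid] * n_bins) - c_min < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - best

if __name__ == "__main__":
    print(cstat([0, 0, 0], [0.1, 0.2, 0.3]))
    print(upper_error_constant_rate([100, 100, 100, 100]))
```

For well-populated bins the result should be close to the Gaussian expectation, the square root of the rate divided by the number of bins, with a slightly larger upper error reflecting the asymmetry of the Poisson likelihood.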