The ARCHES cross-correlation tool
François-Xavier Pineau (1)
(1) Observatoire Astronomique de Strasbourg, Université de Strasbourg, CNRS
Paris, 1st December, 2015
INTRODUCTION
This talk: "Cross-correlation tool development & catalogue creation" (WP4)
Aims of ARCHES's WP4:
◮ Create a public n-catalogue cross-correlation tool:
  ⋆ No magic, BUT a flexible, multi-purpose, scriptable multi-catalogue xmatch engine
  ⋆ Usable as a building block from your own specific code
◮ Use/develop statistical methods to compute probabilities of association:
  ⋆ Astrometry-based probabilities only!
  ⋆ Can be combined with photometry-based probabilities (in a further step)
◮ Use the tool to build the ARCHES catalogue(s)
Beyond the ARCHES project:
◮ The tool will be part of the CDS XMatch Service
◮ ⇒ it will be maintained and will keep evolving
INTRODUCTION
This talk focuses mainly on the probabilistic part.
More details on the tool will be given during the hands-on session.
METHOD
Steps to a probabilistic positional xmatch:
◮ Make simplifying assumptions
◮ Select candidates: select and group together sources that may be different detections of the same real source
  ⋆ This requires a selection criterion
◮ Make hypotheses: do the sources really come from the same real source, or from different real sources?
◮ For each hypothesis:
  ⋆ derive the associated likelihood
  ⋆ derive the associated prior
◮ Compute astrometry-based probabilities
SIMPLIFYING ASSUMPTIONS
Radical simplifying assumptions:
◮ No proper motions
◮ No blending
◮ No clustering (the density of sources follows a Poisson law)
◮ No systematic offsets
◮ The positional uncertainties provided in the catalogues can be trusted
CANDIDATE SELECTION
Candidate selection criterion
How do we select a group of n sources from n distinct catalogues as possibly being different observations of the same actual source?
Statistical hypothesis testing:
◮ $H_0$ (null hypothesis): all n sources come from the same real source
◮ $H_1 = \bar{H}_0$ (alternative hypothesis): at least one source (out of n) is spurious
User input: γ, the probability of accepting $H_0$ when it is true
◮ γ (I call it completeness) is also called the true negative rate
◮ We usually fix γ = 0.9973 (99.73%, the value of the 3σ rule in a 1-dimensional problem)
◮ ⇔ fixing the type I error at 1 − γ = 0.27% = probability of rejecting the null hypothesis when it is true
◮ We (theoretically) miss 27 out of 10 000 real associations
The criterion used is based on a $\chi^2$ test with 2(n − 1) degrees of freedom.
Now, a few slides to explain it, since it plays a role in the probabilities.
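As a rough sketch of how γ turns into a numerical threshold: a $\chi^2$ law with 2(n − 1) degrees of freedom always has an even number of degrees of freedom, so its cumulative distribution has a closed form and the threshold can be found by bisection with the standard library alone. The function names below are illustrative assumptions, not part of the ARCHES tool.

```python
import math


def chi2_cdf_even_dof(x, dof):
    """CDF of a chi-square law with an even number of degrees of freedom."""
    if dof < 2 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    m = dof // 2
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of sum_{j < m} (x/2)^j / j!
    for j in range(1, m):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_threshold(gamma, n_catalogues):
    """Value x such that P(chi2 with 2(n-1) dof <= x) = gamma, by bisection."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if n_catalogues < 2:
        raise ValueError("at least 2 catalogues are needed")
    dof = 2 * (n_catalogues - 1)
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_dof(hi, dof) < gamma:  # grow the bracket
        hi *= 2.0
    for _ in range(200):  # plain bisection on the monotonic CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even_dof(mid, dof) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in (2, 3, 4):
        x = chi2_threshold(0.9973, n)
        print(n, 2 * (n - 1), round(x, 3), round(math.sqrt(x), 3))
```

The printed columns are the number of catalogues, the degrees of freedom, the $\chi^2$ threshold and its square root. A library quantile function would do the same job; bisection is used here only to keep the sketch dependency-free.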
CANDIDATE SELECTION
Classical 2 catalogues case
In the classical case (e.g. De Ruiter et al. 1977):
◮ Errors on α and δ are independent
◮ Source 1 has errors $\sigma_{\alpha_1}$ and $\sigma_{\delta_1}$ on α and δ respectively
◮ Source 2 has errors $\sigma_{\alpha_2}$ and $\sigma_{\delta_2}$ on α and δ respectively
◮ The normalized distance (or σ-distance) is defined by:
$$ r = \left( \frac{\Delta\alpha^2}{\sigma_{\alpha_1}^2 + \sigma_{\alpha_2}^2} + \frac{\Delta\delta^2}{\sigma_{\delta_1}^2 + \sigma_{\delta_2}^2} \right)^{1/2} $$
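A minimal sketch of the classical σ-distance follows. It assumes the offsets Δα and Δδ are already expressed as on-sky offsets in the same unit as the errors (for example arcseconds); the function name and the sample values are purely illustrative.

```python
import math


def sigma_distance(d_alpha, d_delta, sa1, sd1, sa2, sd2):
    """Normalized distance between two sources with independent errors.

    d_alpha, d_delta: offsets between the two positions along alpha and delta
    sa1, sd1: 1-sigma errors of source 1 on alpha and delta
    sa2, sd2: 1-sigma errors of source 2 on alpha and delta
    All quantities are assumed to share the same unit.
    """
    return math.sqrt(
        d_alpha ** 2 / (sa1 ** 2 + sa2 ** 2)
        + d_delta ** 2 / (sd1 ** 2 + sd2 ** 2)
    )


if __name__ == "__main__":
    # Offsets of (0.6, -0.8) with combined variances of 0.25 on each axis.
    print(sigma_distance(0.6, -0.8, 0.3, 0.3, 0.4, 0.4))
```

With these illustrative values the combined error is 0.5 on each axis and the separation is 1.0, so the result is a normalized distance of about 2.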
CANDIDATE SELECTION
Classical 2 catalogues case
More generally (see e.g. Pineau et al. 2011):
◮ We locally approximate the surface of the sphere by the Euclidean plane
◮ The positions of the 2 sources are 2-dimensional vectors $\vec{\mu}_1$ and $\vec{\mu}_2$
◮ Errors on $\vec{\mu}_1$ and $\vec{\mu}_2$ are oriented ellipses defined by the covariance matrices $V_1$ and $V_2$ respectively
◮ The normalized distance becomes (vectorial form):
$$ r = \left( (\vec{\mu}_1 - \vec{\mu}_2)^T (V_1 + V_2)^{-1} (\vec{\mu}_1 - \vec{\mu}_2) \right)^{1/2} $$
◮ ⇒ equation of an ellipse of radius r and covariance matrix $V_1 + V_2$
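The vectorial form can be sketched without any linear-algebra library, because the inverse of a symmetric 2×2 matrix has a closed form. The representation of covariance matrices as nested tuples and the function name are assumptions made for this illustration only.

```python
import math


def normalized_distance(mu1, mu2, v1, v2):
    """r = sqrt((mu1 - mu2)^T (V1 + V2)^-1 (mu1 - mu2)) for 2D positions.

    mu1, mu2: positions (x, y) in the local Euclidean plane
    v1, v2: symmetric covariance matrices ((vxx, vxy), (vxy, vyy))
    """
    dx, dy = mu1[0] - mu2[0], mu1[1] - mu2[1]
    # Elements of the summed covariance matrix V = V1 + V2.
    a = v1[0][0] + v2[0][0]
    b = v1[0][1] + v2[0][1]
    c = v1[1][1] + v2[1][1]
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("V1 + V2 must be positive definite")
    # Quadratic form using the closed-form inverse of a symmetric 2x2 matrix.
    return math.sqrt((c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det)


if __name__ == "__main__":
    v1 = ((0.09, 0.0), (0.0, 0.09))
    v2 = ((0.16, 0.0), (0.0, 0.16))
    # With diagonal covariances this reduces to the classical sigma-distance.
    print(normalized_distance((0.6, -0.8), (0.0, 0.0), v1, v2))
```

With diagonal covariance matrices the off-diagonal term vanishes and the expression falls back to the classical formula with independent errors on each axis.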
CANDIDATE SELECTION
Classical 2 catalogues case
For real associations, i.e. when $H_0$ is true:
◮ The distribution of normalized distances is a Rayleigh distribution of scale σ = 1
[Figure: Rayleigh distribution. Probability density $x e^{-x^2/2}$ plotted for x from 0 to 6; under $H_0$, $r \sim$ Rayleigh.]
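A quick Monte Carlo check can illustrate this statement. The sketch below assumes circular Gaussian errors (so that $V_1 + V_2$ is proportional to the identity) and arbitrary error values; all names are hypothetical and the simulation is not taken from the ARCHES tool.

```python
import math
import random


def simulate_normalized_distances(n_pairs, sigma1, sigma2, seed=0):
    """Draw two noisy detections of one source; return normalized distances."""
    rng = random.Random(seed)
    combined = math.sqrt(sigma1 ** 2 + sigma2 ** 2)
    distances = []
    for _ in range(n_pairs):
        # True position at the origin, circular Gaussian errors on each detection.
        x1, y1 = rng.gauss(0.0, sigma1), rng.gauss(0.0, sigma1)
        x2, y2 = rng.gauss(0.0, sigma2), rng.gauss(0.0, sigma2)
        distances.append(math.hypot(x1 - x2, y1 - y2) / combined)
    return distances


def rayleigh_cdf(x):
    """CDF of the Rayleigh distribution of scale 1."""
    return 1.0 - math.exp(-0.5 * x * x)


if __name__ == "__main__":
    r = simulate_normalized_distances(100_000, 0.3, 1.2)
    mean_r = sum(r) / len(r)
    frac_below_2 = sum(d <= 2.0 for d in r) / len(r)
    print("mean:", mean_r, "expected:", math.sqrt(math.pi / 2.0))
    print("P(r <= 2):", frac_below_2, "expected:", rayleigh_cdf(2.0))
```

The empirical mean and the fraction of pairs below a given normalized distance should approach the Rayleigh values, whatever the individual error sizes, which is why the normalized distance is a convenient test statistic.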
CANDIDATE SELECTION
Classical 2 catalogues case
Fixing the completeness γ ⇔ fixing a normalized distance threshold $k_\gamma$:
$$ \int_0^{k_\gamma} \mathrm{Rayleigh}(r)\, dr = \gamma $$
For γ = 99.73% (the 1D 3σ rule) ⇒ $k_\gamma \approx 3.44$ (not 3!)
So, for 2 sources from 2 distinct catalogues, the selection criterion is
$$ \left( (\vec{\mu}_1 - \vec{\mu}_2)^T (V_1 + V_2)^{-1} (\vec{\mu}_1 - \vec{\mu}_2) \right)^{1/2} \le k_\gamma $$
I.e. source 2 is kept as a candidate if it lies inside an error ellipse of covariance matrix $V = V_1 + V_2$ and radius $k_\gamma$, centred on source 1.
⇒ the surface area of the acceptance region is $\pi k_\gamma^2 \, |V_1 + V_2|^{1/2}$
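Since the Rayleigh cumulative distribution of scale 1 is $1 - e^{-k^2/2}$, the threshold has the closed form $k_\gamma = \sqrt{-2\ln(1-\gamma)}$. The sketch below uses it to evaluate the selection criterion and the area of the acceptance region; the function names and the nested-tuple covariance representation are assumptions for illustration.

```python
import math


def k_gamma(gamma):
    """Rayleigh quantile: solves 1 - exp(-k^2 / 2) = gamma."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return math.sqrt(-2.0 * math.log(1.0 - gamma))


def _summed_covariance(v1, v2):
    """Elements (a, b, c) and determinant of the symmetric matrix V1 + V2."""
    a = v1[0][0] + v2[0][0]
    b = v1[0][1] + v2[0][1]
    c = v1[1][1] + v2[1][1]
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("V1 + V2 must be positive definite")
    return a, b, c, det


def acceptance_area(v1, v2, gamma):
    """Area pi * k_gamma^2 * sqrt(det(V1 + V2)) of the acceptance ellipse."""
    _, _, _, det = _summed_covariance(v1, v2)
    return math.pi * k_gamma(gamma) ** 2 * math.sqrt(det)


def is_candidate(mu1, mu2, v1, v2, gamma):
    """True if source 2 falls inside the acceptance ellipse around source 1."""
    a, b, c, det = _summed_covariance(v1, v2)
    dx, dy = mu1[0] - mu2[0], mu1[1] - mu2[1]
    r_squared = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return r_squared <= k_gamma(gamma) ** 2


if __name__ == "__main__":
    v1 = ((1.0, 0.0), (0.0, 1.0))
    v2 = ((3.0, 0.0), (0.0, 3.0))
    print(k_gamma(0.9973))
    print(acceptance_area(v1, v2, 0.9973))
    print(is_candidate((0.0, 0.0), (3.0, 0.0), v1, v2, 0.9973))
```

Comparing squared quantities avoids one square root per pair, which matters little here but is a common choice when the test is applied to many candidate pairs.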
CANDIDATE SELECTION
Now, a different version of the same story, more easily generalisable to n catalogues.
CANDIDATE SELECTION
Revisited 2 catalogues case
I have 2 sources from 2 distinct catalogues, and I suppose $H_0$ is true.
The Maximum Likelihood Estimate (MLE) of the position of the real source is the weighted mean position
$$ \vec{\mu}_\Sigma = V_\Sigma \left( V_1^{-1} \vec{\mu}_1 + V_2^{-1} \vec{\mu}_2 \right) $$
in which
$$ V_\Sigma = \left( V_1^{-1} + V_2^{-1} \right)^{-1} $$
The error on this MLE is ... $V_\Sigma$
The result is the same with a (block) Weighted Least Squares method.
We can now define the Mahalanobis distance:
$$ D_M = \left( \sum_{i=1}^{2} (\vec{\mu}_i - \vec{\mu}_\Sigma)^T V_i^{-1} (\vec{\mu}_i - \vec{\mu}_\Sigma) \right)^{1/2} \overset{H_0}{\sim} \chi_{\mathrm{dof}=2} $$
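The sketch below computes the weighted mean position and the Mahalanobis distance for a list of 2D positions, using small hand-written 2×2 helpers. For two sources, $D_M$ should coincide numerically with the quadratic form $\left( (\vec{\mu}_1 - \vec{\mu}_2)^T (V_1 + V_2)^{-1} (\vec{\mu}_1 - \vec{\mu}_2) \right)^{1/2}$, which the example prints alongside it. Helper names and sample values are assumptions made for this illustration.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular matrix")
    return ((d / det, -b / det), (-c / det, a / det))


def add2(m, n):
    """Sum of two 2x2 matrices."""
    return tuple(tuple(m[i][j] + n[i][j] for j in range(2)) for i in range(2))


def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def quad(v, m):
    """Quadratic form v^T m v."""
    w = matvec(m, v)
    return v[0] * w[0] + v[1] * w[1]


def weighted_mean(positions, covariances):
    """Return (mu_sigma, V_sigma): weighted mean position and its covariance."""
    inverses = [inv2(v) for v in covariances]
    total = inverses[0]
    for w in inverses[1:]:
        total = add2(total, w)
    v_sigma = inv2(total)
    weighted = [matvec(w, mu) for w, mu in zip(inverses, positions)]
    summed = (sum(p[0] for p in weighted), sum(p[1] for p in weighted))
    return matvec(v_sigma, summed), v_sigma


def mahalanobis(positions, covariances):
    """Square root of the summed quadratic deviations from the weighted mean."""
    mu_sigma, _ = weighted_mean(positions, covariances)
    total = 0.0
    for mu, v in zip(positions, covariances):
        d = (mu[0] - mu_sigma[0], mu[1] - mu_sigma[1])
        total += quad(d, inv2(v))
    return math.sqrt(total)


if __name__ == "__main__":
    mus = [(0.0, 0.0), (1.0, 0.5)]
    vs = [((0.04, 0.01), (0.01, 0.09)), ((0.25, -0.05), (-0.05, 0.16))]
    delta = (mus[0][0] - mus[1][0], mus[0][1] - mus[1][1])
    print(mahalanobis(mus, vs))
    print(math.sqrt(quad(delta, inv2(add2(vs[0], vs[1])))))
```

The functions accept lists of any length, so the same code can be tried on more than two positions, although only the two-source case is checked here.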