PRACTICAL ANALYTICS Tamás Budavári / The Johns Hopkins University 7/19/2012
Statistics: of numbers, of vectors, of functions, of trees. ISSAC at HIPACC 7/19/2012
Statistics: description, modeling, inference, machine learning. Bayesian / Frequentist / Pragmatist?
            Supervised      Unsupervised
Discrete    Classification  Clustering
Continuous  Regression      Dimensional reduction
What’s Large? VOLUME: say >100 TB today, but tomorrow? A moving target… COMPLEXITY: the raw datasets are simple, unlike their derivatives. DEFINITION? Large when you cannot apply the “usual” tools.
Data LARGE !!
Large? Sample size
Large? Dimensions: the ratio of surface to volume grows; all points are lonely in high dimensions.
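One way to make the surface/volume remark concrete: the volume of a d-dimensional ball scales as r^d, so the fraction of its volume lying within a thin shell of relative thickness eps below the surface is 1 - (1 - eps)^d. The sketch below is only an illustration; the function name and the 1% shell thickness are arbitrary choices, not from the slides.

```python
def shell_fraction(dim, eps=0.01):
    """Fraction of a unit ball's volume within eps of its surface.

    Uses V(r) proportional to r**dim, so the inner ball of radius
    (1 - eps) holds (1 - eps)**dim of the total volume.
    """
    return 1.0 - (1.0 - eps) ** dim


if __name__ == "__main__":
    for d in (1, 2, 3, 10, 100, 1000):
        print(f"d={d:5d}  outer 1% shell holds {shell_fraction(d):.4f} of the volume")
```

As the dimension grows, almost all of the volume sits next to the surface, which is why uniformly scattered points end up far from one another.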
Keeping Up? Image processing, catalog extraction: O(n). What is difficult? O(n log n), O(n^2), … Worse with Moore’s law.
Fundamental Challenges: cross-identification of sources, to assemble multicolor catalogs; drop-outs from sky coverage, to constrain fluxes not detected; constraining physical properties, to interpret the data.
Cross-Identification: From long-tail science to the largest experiments
Recording Observations: Astronomers drew it… Now kids do it on SkyServer. #1 by Haley
Multicolor Universe
Eventful Universe
Cross-Identification: One of the most fundamental analysis steps
What is the Right Question? Cross-identification is a hard problem: computationally, scientifically & statistically. Need a symmetric n-way solution. Need a reliable quality measure. Same or not? Distance threshold? Maximum likelihood?
Tabletop Astronomy: Imagine the observed sky has only 6 pixels. One object: one die. Observing: rolling a die. Locality: the die is loaded. Sky: a bag of dice.
Model Comparison: Same or Not? Crossmatch: draw two dice with replacement. Same or not? The Bayes factor is the ratio of the likelihood of “Same” to the likelihood of “Not”. Likelihood of a hypothesis? Sum over all possibilities.
Model Comparison: Same or Not? The model for loaded dice is a matrix of probabilities, e.g., loaded toward l=1, etc. for l=2…6. 2-way case: likelihood of “Same” and likelihood of “Not”. n-way: same.
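The tabletop model can be sketched numerically. The code below assumes a uniform bag (each loading direction l equally likely) and a single loading strength q for the favored face; both are illustrative assumptions, as are the function names. "Same" sums over one shared die; "Not" sums over an independent die for every roll.

```python
import numpy as np

N_FACES = 6


def loaded_dice(q=0.5):
    """Matrix p[l, x]: probability that a die loaded toward face l shows face x.

    Assumption: the favored face has probability q, the rest share 1 - q evenly.
    """
    p = np.full((N_FACES, N_FACES), (1.0 - q) / (N_FACES - 1))
    np.fill_diagonal(p, q)
    return p


def bayes_factor(rolls, p, prior=None):
    """Bayes factor L(Same) / L(Not) for a list of observed faces (0-based)."""
    if prior is None:
        prior = np.full(p.shape[0], 1.0 / p.shape[0])  # uniform bag (assumption)
    per_roll = p[:, rolls]                  # shape (dice types, number of rolls)
    like_same = np.sum(prior * np.prod(per_roll, axis=1))  # one die, all rolls
    like_not = np.prod(prior @ per_roll)                   # a separate die per roll
    return like_same / like_not


if __name__ == "__main__":
    p = loaded_dice(q=0.5)
    print(f"same face twice: B = {bayes_factor([0, 0], p):.2f}")   # 1.80
    print(f"different faces: B = {bayes_factor([0, 1], p):.2f}")   # 0.84
```

With fair dice (q = 1/6) the Bayes factor is exactly 1 for any rolls: without locality the observations carry no information about the match. The same function handles the n-way case by passing more than two rolls.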
Celestial Sphere: Continuous functions. General formalism. Accuracy is a density function on the sky.
Modeling the Astrometry: Astrometric precision is a simple function. Where on the sky? Anywhere really…
Same or Not? The Bayes factor compares two hypotheses. SAME (H): all observations are of the same object at m. NOT (K): the observations might be from separate objects at {m_i}. Annotated terms in the formula: “On the sky” and “Astrometry”.
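The two hypotheses can be written out as marginal likelihoods. The block below is a sketch following the formalism of Budavári & Szalay (2008); the symbols D (the set of measured positions), x_i (the position measured in catalog i) and p_i (the astrometric model of catalog i) are notation assumed here, since the slide's own formula did not survive. The prior p(m | ·) is the "on the sky" term and p_i is the "astrometry" term.

```latex
B(H, K \mid D) = \frac{p(D \mid H)}{p(D \mid K)},
\qquad
p(D \mid H) = \int d\mathbf{m}\; p(\mathbf{m} \mid H)
              \prod_{i=1}^{n} p_i(\mathbf{x}_i \mid \mathbf{m}, H),
\qquad
p(D \mid K) = \prod_{i=1}^{n} \int d\mathbf{m}_i\; p(\mathbf{m}_i \mid K)\,
              p_i(\mathbf{x}_i \mid \mathbf{m}_i, K)
```

This is the continuous analogue of the dice example: the sum over loading directions becomes an integral over positions on the sky.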
Analytic Results: Normal distribution, flat and spherical (Gauss and Fisher). 2-way results.
Normal Distribution: Astrometric precision. Fisher distribution. Analytic results. For high accuracies.
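For two catalogs with small Gaussian errors, Budavári & Szalay (2008) give a closed-form Bayes factor, B = 2 / (σ1² + σ2²) · exp(−ψ² / (2(σ1² + σ2²))), with the angular separation ψ and the uncertainties σ in radians. The sketch below evaluates that high-accuracy limit; the formula should be checked against the paper before use, and the function name and arcsecond interface are illustrative assumptions.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # radians per arcsecond


def bayes_factor_2way(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """High-accuracy 2-way Bayes factor for a pair of detections.

    sep_arcsec: angular separation of the two detections, in arcseconds.
    sigma*_arcsec: astrometric uncertainty of each catalog, in arcseconds.
    """
    s2 = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    psi = sep_arcsec * ARCSEC
    return 2.0 / s2 * math.exp(-psi ** 2 / (2.0 * s2))


if __name__ == "__main__":
    for sep in (0.0, 0.2, 0.5, 1.0):
        b = bayes_factor_2way(sep, 0.1, 0.1)
        print(f"separation {sep:.1f} arcsec  ->  log10 B = {math.log10(b):.2f}")
```

The Bayes factor is very large for coincident detections because the "Not" hypothesis spreads its probability over the whole sky, and it falls off as a Gaussian in the separation.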
Wikipedia: Interpretation
Probability of a Match: Same or not?
From Priors to Posteriors: The Bayes factor is the connection.
From Priors to Posteriors: Posterior probability from prior & Bayes factor. Prior probability of a match: like dice in a bag, 1/N and N^(1-n). In general?
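The step from prior to posterior is Bayes' rule for two complementary hypotheses: P(H|D) = [1 + (1 − P(H)) / (B · P(H))]^(−1). The sketch below pairs it with the dice-in-a-bag prior N^(1−n) for n draws from a bag of N dice; the function names are illustrative.

```python
def bag_prior(n_objects, n_catalogs=2):
    """Prior that n_catalogs draws (with replacement) from a bag of
    n_objects dice all picked the same die: N**(1 - n)."""
    return float(n_objects) ** (1 - n_catalogs)


def posterior(bayes_factor, prior):
    """Posterior probability of 'Same' from its prior and the Bayes factor."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("Bayes factor must be non-negative")
    if bayes_factor == 0.0:
        return 0.0
    return 1.0 / (1.0 + (1.0 - prior) / (bayes_factor * prior))


if __name__ == "__main__":
    p0 = bag_prior(6)  # two draws from a bag of six dice: 1/6
    for b in (0.84, 1.0, 1.8, 100.0):
        print(f"B = {b:6.2f}  prior = {p0:.3f}  posterior = {posterior(b, p0):.3f}")
```

A Bayes factor of 1 leaves the prior unchanged; on the real sky the prior is tiny, so B has to be very large before a match becomes probable.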
From Priors to Posteriors: Different selections: nearby / distant, red / blue. But only 1 number.
Self-Consistent Estimates: The prior has an unknown fudge factor. Educated guess: TB & Szalay (2008). Or solve for it.
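One way to "solve for it" is a fixed-point iteration. The sketch below assumes the 2-way prior has the form P(H) = N★ / (N1 · N2), as in Budavári & Szalay (2008), and requires that the posteriors of all candidate pairs sum to N★, the number of objects common to both catalogs. The function name, starting guess and tolerance are illustrative, and this is not necessarily the procedure used in the talk.

```python
def self_consistent_n_star(bayes_factors, n1, n2, tol=1e-9, max_iter=1000):
    """Iterate N* = sum of posteriors, with prior = N* / (n1 * n2).

    bayes_factors: positive Bayes factors of the candidate pairs.
    n1, n2: number of sources in each catalog.
    Returns the estimated number of true matches N*.
    """
    if not bayes_factors:
        return 0.0
    n_star = float(min(n1, n2))  # starting guess: every source has a partner
    for _ in range(max_iter):
        prior = n_star / (n1 * n2)
        updated = sum(
            1.0 / (1.0 + (1.0 - prior) / (b * prior)) for b in bayes_factors
        )
        if abs(updated - n_star) < tol:
            return updated
        n_star = updated
    return n_star


if __name__ == "__main__":
    # Three convincing pairs and two poor ones (made-up Bayes factors).
    candidates = [1e12, 1e12, 1e12, 1e-3, 1e-3]
    print(f"N* is about {self_consistent_n_star(candidates, 1000, 1000):.3f}")
```

The estimate settles near the number of convincing pairs, and the resulting prior can then be reused to compute each pair's posterior.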
Simulations: Mock objects with correct clustering; U(0,1) values as properties. Simulated sources: subsets N1, N2; overlap N★.
Simulations: Quality. Multiple matches. Explained by a simple model of point sources! Heinis, TB, Szalay (2009)
Proper Motion: Same hypotheses but different parameters. Just need a prior to integrate. Kerekes, TB+ (2010). Sources from SDSS.
Matching Events: Streams of events in time and space, e.g., thresholded peaks in signal-to-noise. (1) (x) (2)
Dropouts from Sky Coverage
Drawing with Equations: TB, Szalay & Fekete (2010). r = 0.6, r = 0.5
Matching in Practice
Open SkyQuery: Following our 1st prototype. Successful. Not Bayesian. Limitations.
SkyQuery – The 3rd Generation: Dynamic federation of astronomy databases. Query the collection as if it were one database. The 3rd-generation tool is coming this summer. Cluster of machines running partitioned jobs. Proper probabilistic execution with variable errors.
SkyQuery: Almost pure standard SQL. Added XMATCH. Verifiable. Flexible.
HST Crossmatch Catalog (release at AAS): SQL pipeline. Astrometric correction. Subpixel precision. TB & Lubow (2012)
HST Crossmatch Catalog (release at AAS): FoF groups. Possible chains. Bayesian model selection. Chainbreaker.
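Friends-of-friends (FoF) grouping links any two detections closer than a linking length and takes the connected components as groups. Because linking is transitive, a group can form a chain whose ends are farther apart than the linking length, which is why a later model-selection step is needed to break chains. The sketch below is a generic union-find implementation on flat 2D coordinates with a naive O(n^2) pair loop; it is not the SQL pipeline of TB & Lubow (2012), and all names are illustrative.

```python
import math


def fof_groups(points, link):
    """Group points by friends-of-friends with linking length `link`.

    points: list of (x, y) tuples in a flat coordinate system.
    Returns a sorted list of groups, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs linking; a spatial index would replace this at scale.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= link:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Points 0 and 2 are 1.6 apart, yet end up chained through point 1.
    detections = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (5.0, 5.0)]
    print(fof_groups(detections, link=1.0))  # [[0, 1, 2], [3]]
```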
HST Crossmatch Catalog (release at AAS): Lots of matching sources during HST’s long life. TB & Lubow (2012)