PRACTICAL ANALYTICS Tamás Budavári / The Johns Hopkins University 7/19/2012
Statistics
• Of numbers
• Of vectors
• Of functions
• Of trees
ISSAC at HIPACC 7/19/2012

Statistics
• Description, modeling, inference, machine learning
• Bayesian / Frequentist / Pragmatist?

             Supervised       Unsupervised
Discrete     Classification   Clustering
Continuous   Regression       Dimensional Reduction

What’s Large?
• VOLUME: Say >100TB today, but tomorrow? Moving target…
• COMPLEXITY: The raw datasets are simple, unlike their derivatives
• DEFINITION? Large when you cannot apply the “usual” tools

Data: LARGE!!

Large?
• Sample size

Large?
• Dimensions
• Ratio of surface/volume grows: all points are lonely in high dimensions

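The surface-to-volume remark can be made concrete with a minimal Python sketch. It assumes a unit ball and treats a thin outer shell as a stand-in for the "surface"; the function name `shell_fraction` and the 1% shell thickness are illustrative choices, not from the talk.

```python
def shell_fraction(dim, thickness=0.01):
    """Fraction of a unit ball's volume lying within `thickness` of its surface."""
    # Volume scales as r**dim, so the inner ball of radius (1 - thickness)
    # holds a fraction (1 - thickness)**dim of the total volume.
    return 1.0 - (1.0 - thickness) ** dim


for d in (2, 10, 100, 1000):
    print(d, shell_fraction(d))
```

In 2 dimensions only about 2% of the volume sits in the shell; by 1000 dimensions nearly all of it does, which is one way to read "all points are lonely".
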
Keeping Up?
• Image processing
• Catalog extraction
• O(n)
• What is difficult?
• O(n log n)
• O(n^2), … Worse with Moore’s law

Fundamental Challenges
• Cross-identification of sources: to assemble multicolor catalogs
• Drop-outs from sky coverage: to constrain fluxes not detected
• Constraining physical properties: to interpret the data

Cross-Identification: From long-tail science to the largest experiments

Recording Observations
• Astronomers drew it…
• Now kids do it on SkyServer (#1 by Haley)

Multicolor Universe

Eventful Universe

Cross-Identification: One of the most fundamental analysis steps

What is the Right Question?
• Cross-identification is a hard problem
• Computationally, Scientifically & Statistically
• Need symmetric n-way solution
• Need reliable quality measure
• Same or not?
• Distance threshold? Maximum likelihood?

Tabletop Astronomy
• Imagine the observed sky has only 6 pixels
• One object: one die
• Observing: rolling a die
• Locality: die is loaded
• Sky: a bag of dice

Model Comparison: Same or Not?
• Crossmatch: draw two dice with replacement
• Same or not?
• Bayes factor is the ratio of the likelihood of “Same” to the likelihood of “Not”
• Likelihood of a hypothesis? Sum over all possibilities

Model Comparison: Same or Not?
• Model for loaded dice is a matrix of probabilities
• E.g., loaded toward l = 1
• Etc. for l = 2…6
• 2-way case
• Same:
• Not:
• n-way: same

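The equations on this slide did not survive, so the following is only a minimal sketch of the dice model under stated assumptions: a uniform prior over which die is drawn from the bag, and a hypothetical loading weight of 0.75 on the favored face. "Same" sums over the single shared die; "Not" sums over a die for each roll independently. The names `loaded_dice` and `bayes_factor` are illustrative.

```python
from math import prod


def loaded_dice(n_faces=6, weight=0.75):
    """Row l is the face distribution of a die loaded toward face l (assumed form)."""
    off = (1.0 - weight) / (n_faces - 1)
    return [[weight if face == die else off for face in range(n_faces)]
            for die in range(n_faces)]


def bayes_factor(model, rolls, prior=None):
    """Ratio of the likelihood of 'Same' to the likelihood of 'Not' for observed rolls."""
    n_dice = len(model)
    if prior is None:
        prior = [1.0 / n_dice] * n_dice  # assumption: every die equally likely
    # Same: one die produced all rolls, so sum over that single unknown die.
    same = sum(p * prod(row[x] for x in rolls) for p, row in zip(prior, model))
    # Not: each roll came from its own die, so the sums factorize per roll.
    not_same = prod(sum(p * row[x] for p, row in zip(prior, model)) for x in rolls)
    return same / not_same


model = loaded_dice()
print(bayes_factor(model, [0, 0]))  # two rolls landing on the same pixel
print(bayes_factor(model, [0, 1]))  # two rolls landing on different pixels
```

With these assumed numbers the factor is about 3.45 when both rolls land on the same pixel and about 0.51 when they differ. Passing a longer list of rolls gives the n-way case with the same code.
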
Celestial Sphere
• Continuous functions
• General formalism
• Accuracy is a density function on the sky

Modeling the Astrometry
• Astrometric precision
• A simple function
• Where on the sky?
• Anywhere really…

Same or Not?
• The Bayes factor
• SAME (H): all observations are of the same object at m
• or NOT (K): they might be from separate objects at {m_i}
• Labels on the equation: On the sky; Astrometry

Analytic Results
• Normal distribution
• Flat and spherical
• Gauss and Fisher
• 2-way results

Normal Distribution
• Astrometric precision:
• Fisher distribution:
• Analytic results:
• For high accuracies:

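The formulas on this slide were lost in extraction. As a hedged sketch, the 2-way Bayes factor in the high-accuracy limit is usually quoted from TB & Szalay (2008) as B = 2 / (s1^2 + s2^2) * exp(-psi^2 / (2 (s1^2 + s2^2))), with the separation psi and the astrometric errors s1, s2 in radians. The function below assumes that form; its name and the arcsecond inputs are illustrative choices.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # radians per arcsecond


def bayes_factor_2way(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Assumed small-angle 2-way Bayes factor for two detections."""
    psi = sep_arcsec * ARCSEC
    # The two astrometric variances add in the combined Gaussian.
    var = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    return 2.0 / var * math.exp(-psi * psi / (2.0 * var))


for sep in (0.0, 0.3, 1.0, 5.0):
    print(sep, bayes_factor_2way(sep, 0.1, 0.1))
```

For 0.1 arcsecond errors the factor is of order 10^12 at zero separation and collapses far below 1 a few arcseconds away, so it acts as a soft, error-aware replacement for a hard distance threshold.
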
Wikipedia: Interpretation

Probability of a Match: Same or not?

From Priors to Posteriors
• Bayes factor is the connection

From Priors to Posteriors
• Posterior probability from prior & Bayes factor
• Prior probability of a match
• Like dice in a bag: 1/N and N^(1-n)
• In general?

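The step from prior to posterior follows from Bayes' rule for two exclusive hypotheses: posterior odds equal the Bayes factor times the prior odds. A minimal sketch, assuming the dice-in-a-bag prior N^(1-n) from the slide; the function names are illustrative.

```python
def match_prior(n_objects, n_catalogs):
    """Dice-in-a-bag prior: N**(1 - n) for n catalogs that each list all N objects."""
    return float(n_objects) ** (1 - n_catalogs)


def posterior(bayes_factor, prior):
    """Posterior probability of 'Same' from the prior and the Bayes factor."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    weighted = bayes_factor * prior
    if weighted <= 0.0:
        return 0.0  # no support for 'Same' at all
    # Equivalent to odds / (1 + odds) with odds = B * prior / (1 - prior).
    return 1.0 / (1.0 + (1.0 - prior) / weighted)


prior = match_prior(1000, 2)
print(prior, posterior(1e6, prior))
```

With N = 1000 objects and two catalogs the prior is 0.001, and a Bayes factor of 10^6 lifts the posterior to about 0.999. A Bayes factor of 1 leaves the prior unchanged.
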
From Priors to Posteriors
• Different selections
• Nearby / Distant
• Red / Blue
• But only 1 number

Self-Consistent Estimates
• Prior has an unknown fudge factor
• Educated guess
• Or solve for it:
TB & Szalay (2008)

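The equation for "solve for it" is missing from the slide. One way to read it, offered here only as an assumption, is a fixed-point condition: take the 2-way prior as N★ / (N1 N2), where N★ is the unknown number of objects in both catalogs, and require that the posteriors of all candidate pairs sum back to N★. The sketch below iterates that condition; all names and the toy inputs are hypothetical.

```python
def posterior(bayes_factor, prior):
    """Posterior probability of a match from the prior and the Bayes factor."""
    return 1.0 / (1.0 + (1.0 - prior) / (bayes_factor * prior))


def solve_n_star(bayes_factors, n1, n2, max_iter=100, tol=1e-9):
    """Iterate N* = sum of posteriors, with prior = N* / (n1 * n2) (assumed form)."""
    n_star = float(min(n1, n2))  # starting guess: the smaller catalog fully matches
    for _ in range(max_iter):
        prior = n_star / (n1 * n2)
        if not 0.0 < prior < 1.0:
            break  # the assumed prior is no longer a valid probability
        updated = sum(posterior(b, prior) for b in bayes_factors if b > 0.0)
        if abs(updated - n_star) <= tol * max(updated, 1.0):
            return updated
        n_star = updated
    return n_star


# Toy input: 50 convincing pairs and 200 chance alignments between two
# catalogs of 100 sources each.
factors = [1e8] * 50 + [1e-3] * 200
print(solve_n_star(factors, 100, 100))
```

On this toy input the estimate settles very close to 50, the number of convincing pairs, without anyone having to guess the fudge factor up front.
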
Simulations
• Mock objects
• With correct clustering
• U01 values (0 to 1) as properties
• Simulated sources
• Subsets: N1, N2
• Overlap: N★

Simulations
• Quality
• Multiple matches
Explained by simple model of point sources!
Heinis, TB, Szalay (2009)

Proper Motion
• Same hypotheses but different parameters
• Just need prior to integrate
Kerekes, TB+ (2010)
Sources from SDSS

Matching Events
• Streams of events in time and space
• E.g., thresholded peaks in signal-to-noise

Dropouts from Sky Coverage

Drawing with Equations
• r = 0.6°
• r = 0.5°
TB, Szalay & Fekete (2010)

Matching in Practice

Open SkyQuery
• Following our 1st prototype
• Successful
• Not Bayesian
• Limitations

SkyQuery – The 3rd Generation
• Dynamic federation of astronomy databases
• Query the collection as if it were one
• The 3rd-generation tool coming this summer
• Cluster of machines running partitioned jobs
• Proper probabilistic execution with variable errors

SkyQuery
• Almost pure standard SQL
• Added XMATCH
• Verifiable
• Flexible

HST Crossmatch Catalog (release at AAS)
• SQL pipeline
• Astrometric correction
• Subpixel precision
TB & Lubow (2012)

HST Crossmatch Catalog (release at AAS)
• FoF groups
• Possible chains
• Bayesian model selection
• Chainbreaker

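To illustrate why friends-of-friends (FoF) groups can form chains, here is a minimal sketch using a union-find structure on flat 2D positions. It is a brute-force O(n^2) illustration with a hypothetical linking length, not the catalog's SQL pipeline, and it does not attempt the chain-breaking step.

```python
import math


def friends_of_friends(points, link):
    """Group point indices so that any two points closer than `link` share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= link:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


sources = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (5.0, 5.0)]
print(friends_of_friends(sources, link=1.0))
```

Sources 0 and 2 are 1.6 apart, farther than the linking length, yet they land in one group because source 1 links them. That is the kind of chain a model-selection step would then have to decide whether to split.
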
HST Crossmatch Catalog (release at AAS)
• Lots of matching sources during HST’s long life
TB & Lubow (2012)
