PRACTICAL ANALYTICS Tamás Budavári / The Johns Hopkins University

  1. PRACTICAL ANALYTICS Tamás Budavári / The Johns Hopkins University 7/19/2012

  2. Statistics: of numbers, of vectors, of functions, of trees. ISSAC at HIPACC 7/19/2012

  3. Statistics: description, modeling, inference, machine learning. Bayesian / Frequentist / Pragmatist? Supervised: classification (discrete), regression (continuous). Unsupervised: clustering (discrete), dimensional reduction (continuous).

  4. What’s Large? VOLUME: say >100 TB today, but tomorrow? A moving target… COMPLEXITY: the raw datasets are simple, unlike their derivatives. DEFINITION? Large when you cannot apply the “usual” tools.

  5. Data LARGE!!

  7. Large? Sample size

  9. Large? Dimensions: the ratio of surface to volume grows, so all points are lonely in high dimensions.
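The "lonely points" remark can be checked numerically. The sketch below is an illustration only, not taken from the slides: it assumes a unit hypercube with uniformly random points, and the helper names `shell_fraction` and `mean_nearest_neighbor_distance` are hypothetical. It reports how much of the volume sits near the surface and how far away the nearest neighbor is as the dimension grows.

```python
import math
import random


def shell_fraction(dim, eps=0.05):
    """Fraction of a unit hypercube's volume lying within eps of its surface."""
    # The interior cube has side (1 - 2*eps); everything else is the "shell".
    return 1.0 - (1.0 - 2.0 * eps) ** dim


def mean_nearest_neighbor_distance(dim, n_points=200, seed=0):
    """Average distance from each uniform random point to its nearest neighbor."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(pts):
        # Brute-force search over all other points (fine for a small demo).
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n_points


for d in (2, 10, 100):
    print(d, round(shell_fraction(d), 3), round(mean_nearest_neighbor_distance(d), 3))
```

With a fixed number of points, both numbers rise with the dimension: nearly all of the volume ends up in the thin shell, and every point's nearest neighbor drifts away.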

  11. Keeping Up? Image processing and catalog extraction: O(n). What is difficult? O(n log n), O(n^2), … Worse with Moore’s law.
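One way to read "worse with Moore's law" is that data volume and compute speed both grow exponentially, so only linear algorithms keep pace. The sketch below assumes, purely for illustration, that both double every generation; the function `relative_runtime` and the starting size `n0` are hypothetical.

```python
import math


def relative_runtime(cost, generations, n0=1_000_000):
    """Runtime after `generations` doublings of data size AND compute speed,
    expressed relative to today's runtime for the same cost function."""
    n = n0 * 2 ** generations      # assumed: data doubles each generation
    speedup = 2 ** generations     # assumed: hardware doubles each generation
    return (cost(n) / speedup) / cost(n0)


costs = {
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n * n,
}

for name, cost in costs.items():
    print(name, [round(relative_runtime(cost, g), 2) for g in (0, 5, 10)])
```

Under these assumptions the linear job takes the same time forever, the O(n log n) job slows down gently, and the quadratic job gets slower by the same factor the data grew.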

  12. Fundamental Challenges. Cross-identification of sources: to assemble multicolor catalogs. Drop-outs from sky coverage: to constrain fluxes not detected. Constraining physical properties: to interpret the data.

  13. Cross-Identification: from long-tail science to the largest experiments

  14. Recording Observations. Astronomers drew it… Now kids do it on SkyServer (#1 by Haley).

  15. Multicolor Universe

  16. Eventful Universe

  17. Cross-Identification: one of the most fundamental analysis steps

  18. What is the Right Question? Cross-identification is a hard problem: computationally, scientifically and statistically. Need a symmetric n-way solution. Need a reliable quality measure. Same or not? Distance threshold? Maximum likelihood?

  19. Tabletop Astronomy. Imagine the observed sky has only 6 pixels. One object: one die. Observing: rolling a die. Locality: the die is loaded. Sky: a bag of dice.

  20. Model Comparison: Same or Not? Crossmatch: draw two dice with replacement. Same or not? The Bayes factor is the ratio of the likelihood of “Same” to the likelihood of “Not”. Likelihood of a hypothesis? Sum over all possibilities.

  22. Model Comparison: Same or Not? The model for loaded dice is a matrix of probabilities, e.g., loaded toward l = 1, etc. for l = 2…6. 2-way case: likelihoods for Same and for Not. n-way: same.
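The slide equations for the dice model did not survive, so the following is a minimal sketch of the idea as described in words: each die is a row of a probability matrix, "Same" sums over which single die produced all the rolls, and "Not" lets each roll come from its own independent draw. The loading weight of 0.5, the uniform prior over dice, and the names `loaded_dice` and `bayes_factor` are assumptions made for illustration.

```python
from math import prod


def loaded_dice(weight=0.5, faces=6):
    """One die per face: die l shows face l with probability `weight`,
    and every other face with equal share of the remainder."""
    rest = (1.0 - weight) / (faces - 1)
    return [[weight if i == l else rest for i in range(faces)]
            for l in range(faces)]


def bayes_factor(model, observed):
    """Bayes factor of 'Same' vs 'Not' for a list of observed faces (n-way)."""
    prior = 1.0 / len(model)  # assumed: every die in the bag is equally likely
    # Same: one die produced all rolls; sum over which die it was.
    same = sum(prior * prod(die[i] for i in observed) for die in model)
    # Not: each roll comes from an independent draw; sum over dice per roll.
    not_same = prod(sum(prior * die[i] for die in model) for i in observed)
    return same / not_same


model = loaded_dice()
print(bayes_factor(model, [0, 0]))     # two rolls landing on the same face
print(bayes_factor(model, [0, 1]))     # two rolls landing on different faces
print(bayes_factor(model, [2, 2, 2]))  # three-way agreement
```

With these settings, matching faces give a Bayes factor above 1 (about 1.8) and differing faces give one below 1 (about 0.84); a three-way agreement is stronger evidence still. The same function handles the n-way case because the products simply run over more observations.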

  25. Celestial Sphere. Continuous functions. General formalism. Accuracy is a density function on the sky.

  26. Modeling the Astrometry. Astrometric precision: a simple function. Where on the sky? Anywhere really…

  27. Same or Not? The Bayes factor compares SAME, H: all observations are of the same object at m, with NOT, K: observations might be from separate objects at {m_i}.

  30. Same or Not? The Bayes factor compares SAME, H: all observations are of the same object at m, with NOT, K: observations might be from separate objects at {m_i}. Equation terms labeled: On the sky; Astrometry.

  32. Analytic Results. Normal distribution: flat and spherical (Gauss and Fisher). 2-way results.

  33. Normal Distribution. Astrometric precision; Fisher distribution; analytic results; the limit for high accuracies.
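The equations on this slide were lost in extraction. As a hedged sketch, the code below uses the commonly quoted two-way, high-accuracy limit for Gaussian-like astrometric errors, B = 2 / (σ1² + σ2²) · exp(−ψ² / (2(σ1² + σ2²))), with the separation ψ and the uncertainties σ1, σ2 in radians. That this is the exact expression shown on the slide is an assumption; the function name `bayes_factor_2way` and the arcsecond interface are hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # radians per arcsecond


def bayes_factor_2way(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Two-way Bayes factor in the small-angle, high-accuracy limit.

    sep_arcsec: angular separation of the two detections
    sigma*_arcsec: circular astrometric uncertainty of each catalog
    """
    s2 = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    psi = sep_arcsec * ARCSEC
    return 2.0 / s2 * math.exp(-psi * psi / (2.0 * s2))


for sep in (0.0, 0.3, 1.0, 5.0):
    print(sep, bayes_factor_2way(sep, 0.1, 0.1))
```

The factor is enormous for coincident detections because the "Same" hypothesis concentrates the probability in a tiny patch of sky, and it falls off as a Gaussian in the separation. Unlike a hard distance threshold, it accounts for the accuracy of each catalog.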

  34. Wikipedia: Interpretation

  35. Probability of a Match: same or not?

  36. From Priors to Posteriors. The Bayes factor is the connection.

  37. From Priors to Posteriors. Posterior probability from prior and Bayes factor. Prior probability of a match: like dice in a bag, 1/N and N^(1-n). In general?
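The step from prior to posterior is ordinary Bayes' rule in odds form: posterior odds equal the Bayes factor times the prior odds. A minimal sketch follows; the function name `posterior` and the example numbers are illustrative assumptions, not values from the talk.

```python
def posterior(bayes_factor, prior):
    """Posterior probability of a match from the prior and the Bayes factor."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = bayes_factor * prior / (1.0 - prior)  # posterior odds
    return odds / (1.0 + odds)


# Dice-in-a-bag prior for a 2-way match among N objects: 1/N.
N = 1_000_000  # hypothetical number of objects
for B in (1.0, 1e3, 1e6, 1e9):
    print(B, posterior(B, 1.0 / N))
```

A Bayes factor of 1 leaves the prior unchanged, and with a prior of 1/N the Bayes factor has to be comparable to N before a match becomes more likely than not.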

  38. From Priors to Posteriors. Different selections: nearby / distant, red / blue. But only 1 number.

  39. Self-Consistent Estimates. The prior has an unknown fudge factor. Educated guess, TB & Szalay (2008), or solve for it.
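The slide's equation for "solve for it" is missing, so the following is a sketch under a stated assumption: the two-way prior has the form N★ / (N1 · N2), where N★ is the unknown number of objects common to both catalogs, and N★ is chosen so that the posteriors of all candidate pairs sum back to N★. The function `solve_overlap` and the toy inputs are hypothetical.

```python
def solve_overlap(bayes_factors, n1, n2, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for N_star such that sum(posteriors) == N_star.

    bayes_factors: Bayes factor of every candidate pair
    n1, n2: catalog sizes; the assumed prior is N_star / (n1 * n2),
            which stays below 1 as long as N_star < n1 * n2
    """
    n_star = float(min(n1, n2))  # starting guess: every source has a partner
    for _ in range(max_iter):
        prior = n_star / (n1 * n2)
        total = 0.0
        for b in bayes_factors:
            odds = b * prior / (1.0 - prior)
            total += odds / (1.0 + odds)
        if abs(total - n_star) < tol:
            return total
        n_star = total
    return n_star


# Toy input: 600 convincing pairs and 5000 chance alignments.
factors = [1e9] * 600 + [1e-3] * 5000
print(solve_overlap(factors, n1=1000, n2=1000))
```

On this toy input the iteration settles near 600, the number of convincing pairs. N★ = 0 is also a trivial fixed point, which is why the iteration starts from the largest plausible value.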

  40. Simulations. Mock objects with correct clustering; U[0,1] values as properties. Simulated sources: subsets N_1, N_2; overlap N_★.

  42. Simulations. Quality. Multiple matches. Explained by a simple model of point sources! Heinis, TB, Szalay (2009)

  43. Proper Motion. Same hypotheses but different parameters. Just need prior to integrate. Sources from SDSS.

  44. Proper Motion. Same hypotheses but different parameters. Just need prior to integrate. Sources from SDSS. Kerekes, TB+ (2010)

  45. Matching Events. Streams of events in time and space, e.g., thresholded peaks in signal-to-noise.

  46. Dropouts from Sky Coverage

  47. Drawing with Equations. TB, Szalay & Fekete (2010). Figure labels: r = 0.6 and r = 0.5.

  48. Matching in Practice

  49. Open SkyQuery. Following our 1st prototype. Successful. Not Bayesian. Limitations.

  50. SkyQuery – The 3rd Generation. Dynamic federation of astronomy databases. Query the collection as if they were one. The 3rd-generation tool is coming this summer. Cluster of machines running partitioned jobs. Proper probabilistic execution with variable errors.

  51. SkyQuery. Almost pure standard SQL.

  54. SkyQuery. Almost pure standard SQL. Added XMATCH. Verifiable. Flexible.

  56. HST Crossmatch Catalog (release at AAS). SQL pipeline. Astrometric correction. Subpixel precision. TB & Lubow (2012)

  57. HST Crossmatch Catalog (release at AAS). FoF groups. Possible chains. Bayesian model selection. Chainbreaker.
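Friends-of-friends (FoF) grouping links any two detections closer than a chosen linking length and takes connected components as groups, which is how chains of pairwise-close sources can arise. The sketch below is a generic union-find illustration on flat 2-D coordinates, not the HST catalog pipeline; the function name, the brute-force pair loop, and the sample points are assumptions.

```python
import math


def friends_of_friends(points, linking_length):
    """Group points into connected components of the 'closer than
    linking_length' graph, using a union-find structure."""
    parent = list(range(len(points)))

    def find(i):
        # Walk to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) pair search; a spatial index would replace this at scale.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= linking_length:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Points 0 and 2 are too far apart to link directly but are chained through 1.
pts = [(0.0, 0.0), (0.9, 0.0), (1.8, 0.0), (10.0, 10.0)]
print(friends_of_friends(pts, linking_length=1.0))
```

In the example, the first three points end up in one group even though the two ends are farther apart than the linking length. Such chains are the cases a later model-selection step would have to split or confirm.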

  58. HST Crossmatch Catalog (release at AAS). Lots of matching sources during HST’s long life. TB & Lubow (2012)
