
VIRTUAL OBSERVATORY TECHNOLOGIES: Tamás Budavári / The Johns Hopkins University


  1. VIRTUAL OBSERVATORY TECHNOLOGIES Tamás Budavári / The Johns Hopkins University 7/30/2010

  2. Moore’s Law, Big Data!

  3. Outline
     - SQL for Big Data
       - Computing where the bytes are
     - Database and GPU integration
       - CUDA from SQL
     - Data intensive Web services
       - Behind the scenes
     - Working examples
       - Sloan Digital Sky Survey
       - Virtual Observatory tools and services

  4. The Virtual Observatory
     “The Virtual Observatory is a framework that enables new astronomical research by greatly enhancing access to worldwide data and computing resources.” http://us-vo.org/
     - How it works
     - How to build it
     - How to use it
     - What’s next

  5. Hierarchy of Services
     - Atomic services
       - Access to observations, simulations
       - Access to models
     - Higher level services
       - Combine for more functionality
     - User and analysis tools
       - Can be a high level service, too

  6. Heterogeneous Datasets
     - Blobs: images, spectra, etc.
       - Access, transfer
     - Catalogs
       - Fast searches, indexes

  7. Structured Query Language
     - SQL-92 standard
       - Almost in English: SELECT <columns> FROM <table> WHERE <conditions>
     - Astronomical Data Query Language
       - An extended subset
       - GIS-like spatial

  8. Structured Query Language
     - SQL-92 standard
       - Almost in English: SELECT RA, Dec FROM Stars WHERE r < 15
     - Astronomical Data Query Language
       - An extended subset
       - GIS-like spatial
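The query on this slide can be tried without a survey archive. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module rather than the SQL Server setup the talk describes, the three rows in `Stars` are made up, and the `ORDER BY` clause is added here just to make the result order predictable.

```python
import sqlite3

# In-memory database with a tiny, made-up Stars table (values are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stars (RA REAL, Dec REAL, r REAL)")
conn.executemany(
    "INSERT INTO Stars VALUES (?, ?, ?)",
    [(10.5, -1.2, 14.3), (11.0, 0.4, 16.8), (12.7, 2.1, 13.9)],
)

# The query from the slide: positions of stars brighter than r = 15.
# ORDER BY is an addition so the output order is deterministic.
bright = conn.execute(
    "SELECT RA, Dec FROM Stars WHERE r < 15 ORDER BY RA"
).fetchall()
print(bright)
```

Only the two rows with `r` below 15 come back, which is the whole point of pushing the filter into the database instead of downloading the table first.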

  9. Joining Tables
     - Sources in observation fields: 2 tables
       SELECT f.FieldID, … s.ObjID, s.RA, s.Dec, …
       FROM Fields AS f INNER JOIN Sources AS s ON s.FieldID = f.FieldID
       WHERE f.ExpTime > 1000 AND s.Rmag > 16
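A minimal runnable version of this join, again using `sqlite3` as a stand-in engine. The schema keeps only the columns named on the slide (the elided "…" columns are left out), and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fields  (FieldID INTEGER PRIMARY KEY, ExpTime REAL);
CREATE TABLE Sources (ObjID INTEGER PRIMARY KEY, FieldID INTEGER,
                      RA REAL, Dec REAL, Rmag REAL);
-- Made-up rows: field 1 is a long exposure, field 2 is not.
INSERT INTO Fields  VALUES (1, 1200.0), (2, 600.0);
INSERT INTO Sources VALUES
    (101, 1, 10.1, 0.5, 17.2),
    (102, 1, 10.2, 0.6, 15.0),
    (103, 2, 20.3, 1.5, 18.1);
""")

# Faint sources (Rmag > 16) that lie in long-exposure fields (ExpTime > 1000).
joined = conn.execute("""
    SELECT f.FieldID, s.ObjID, s.RA, s.Dec
    FROM Fields AS f
    INNER JOIN Sources AS s ON s.FieldID = f.FieldID
    WHERE f.ExpTime > 1000 AND s.Rmag > 16
""").fetchall()
print(joined)
```

Source 102 is dropped by the magnitude cut and source 103 by the exposure-time cut, so a single row survives.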

  10. Calculations in SQL
      - Computed columns
        - Use J-H in SELECT and/or WHERE
        - Similarly functions, e.g., POWER(10,-0.4*Rmag)
      - Grouping
        SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID
        - Can use for histogramming, etc.
      - E.g., SDSS Catalog Archive here
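The computed-column and grouping ideas can be sketched with `sqlite3` as well. Two assumptions are worth flagging: SQLite has no `STDEV` aggregate and may lack `POWER`, so both are registered from Python here (with `STDEV` taken to be the sample standard deviation), and the magnitudes are invented. The 1-magnitude histogram query is one possible reading of "can use for histogramming".

```python
import sqlite3
import statistics

class Stdev:
    """Sample standard deviation aggregate (SQLite has no built-in STDEV)."""
    def __init__(self):
        self.values = []
    def step(self, value):
        if value is not None:
            self.values.append(value)
    def finalize(self):
        return statistics.stdev(self.values) if len(self.values) > 1 else None

conn = sqlite3.connect(":memory:")
conn.create_function("POWER", 2, lambda base, exp: base ** exp)
conn.create_aggregate("STDEV", 1, Stdev)
conn.executescript("""
CREATE TABLE Sources (ObjID INTEGER, FieldID INTEGER, J REAL, H REAL, Rmag REAL);
-- Made-up magnitudes for four sources in two fields.
INSERT INTO Sources VALUES
    (1, 1, 15.0, 14.5, 16.0),
    (2, 1, 16.0, 15.8, 17.5),
    (3, 2, 14.0, 13.0, 15.0),
    (4, 2, 14.0, 13.6, 15.5);
""")

# Computed columns: the J-H color in SELECT and WHERE, plus a relative flux.
red = conn.execute("""
    SELECT ObjID, J - H AS color, POWER(10, -0.4 * Rmag) AS flux
    FROM Sources WHERE J - H > 0.45 ORDER BY ObjID
""").fetchall()

# Grouping: per-field mean and scatter of J.
stats = conn.execute("""
    SELECT FieldID, AVG(J), STDEV(J) FROM Sources
    GROUP BY FieldID ORDER BY FieldID
""").fetchall()

# Histogram of J in 1-magnitude bins, also via GROUP BY.
hist = conn.execute("""
    SELECT CAST(J AS INTEGER) AS bin, COUNT(*) FROM Sources
    GROUP BY bin ORDER BY bin
""").fetchall()
print(red, stats, hist)
```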

  11. Surveys in Astronomy
      - Sloan Digital Sky Survey 2001-2008
        - 8TB Catalog Archive Server
        - Custom tools and indices
      - Upcoming Surveys
        - PanSTARRS: 100TB 2010-
        - LSST: 1PB+ 201?

  12. New Moore’s Law
      - In the number of cores
      - Faster than ever (for now)

  13. New Programming Paradigm
      - 100s of cores – 27k parallel threads per GPU
      - Running a billion threads a second
      - Forget the fancy old algorithms
      - Built on wrong assumptions
      - Today CPU is free, RAM is slow
      - GPU has >50GB/s bandwidth
      - Still difficult to occupy the cores

  14. Hybrid Architecture
      (Diagram; surviving labels: launch, run, sync)

  15. Extending SQL Server
      - Dedicated service for direct access
      - Shared memory IPC w/ on-the-fly data transform
      (Diagram; surviving labels: IPC, SQL)

  17. Spatial Statistics
      - Correlation functions
        - From pair counts, 8 bins
      - State of the art
        - Dual-tree traversal
      - High resolution bins?
        - Just like brute force
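To make the "just like brute force" point concrete, here is a minimal pair-counting sketch. It is not the GPU kernel from the talk: it runs on the CPU, uses flat 2D Euclidean distances instead of sky geometry, and the function name `pair_counts` and the sample points are assumptions made for illustration. Every pair is visited once, which is the O(N²) loop that maps naturally onto many parallel threads.

```python
import math
from itertools import combinations

def pair_counts(points, bin_width, n_bins):
    """Count unordered pairs of 2D points per separation bin, by brute force."""
    counts = [0] * n_bins
    for (x1, y1), (x2, y2) in combinations(points, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) / bin_width)
        if k < n_bins:          # pairs beyond the last bin are ignored
            counts[k] += 1
    return counts

# Four made-up points give six pairs in total.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
histogram = pair_counts(points, bin_width=1.0, n_bins=6)
print(histogram)
```

The cost of this loop does not depend on the number of bins, which is why fine binning removes the advantage of tree-based methods.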

  18. Sloan DR7: 800 × 800 bins

  19. All Done Inside the Database
      - Pair counts computed on GPU
      - Returns 2D histogram as a table (i, j, cts)
      - Calculate the correlation fn in SQL
      - Can also do async parallel GPU jobs
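Once the pair counts sit in a table with columns (i, j, cts), the last step is an ordinary join. The sketch below assumes a second histogram of random-random pairs and the simple "natural" estimator, normalized DD over normalized RR minus one. The slides do not say which estimator was used, so the table names `DD` and `RR`, the estimator, and the counts are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DD (i INTEGER, j INTEGER, cts INTEGER);  -- data-data pair counts
CREATE TABLE RR (i INTEGER, j INTEGER, cts INTEGER);  -- random-random pair counts
-- Made-up 2 x 2 histograms.
INSERT INTO DD VALUES (0, 0, 30), (0, 1, 20), (1, 0, 30), (1, 1, 20);
INSERT INTO RR VALUES (0, 0, 50), (0, 1, 50), (1, 0, 50), (1, 1, 50);
""")

# Assumed estimator: xi = (DD / sum(DD)) / (RR / sum(RR)) - 1, per bin.
xi = conn.execute("""
    SELECT d.i, d.j,
           (1.0 * d.cts / (SELECT SUM(cts) FROM DD)) /
           (1.0 * r.cts / (SELECT SUM(cts) FROM RR)) - 1.0 AS xi
    FROM DD AS d
    INNER JOIN RR AS r ON r.i = d.i AND r.j = d.j
    ORDER BY d.i, d.j
""").fetchall()
print(xi)
```

Bins where the data have more pairs than the randoms come out positive (here about 0.2) and the others negative (about -0.2).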

  21. Distributed Data

  22. Data at the Projects
      - Exponential growth
      - Projects last 3-5 years, data sent upwards at the end
      - Data will never be centralized
      - Most data at projects
      - More responsibility on projects
      - Bring analysis close to the data

  24. Data Federation
      - Metcalfe’s Law
        - Utility of computer networks grows as the number of possible connections: O(N²)
      - The Virtual Observatory
        - The federation of N astronomy archives has utility O(N²), i.e. possibilities for making discoveries
      The whole is more than the sum of the parts
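The quadratic scaling follows from counting pairs: among N archives, each distinct pair is one possible connection. As a short worked form (the N = 10 figure is just an example):

```latex
\[
  \binom{N}{2} = \frac{N(N-1)}{2} = O(N^2),
  \qquad \text{e.g. } N = 10 \;\Rightarrow\; 45 \text{ possible pairings.}
\]
```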

  25. Interoperability Challenges
      - Metadata standards
      - Data discovery
      - Data requests
      - Data delivery
      - Units
      - Database queries
      - Distributed applications
      - Authentication and authorization

  26. US National Virtual Observatory
      - NVO Research 2002-2007
        - NSF ITR Program: $10M for 5 years
        - 17 organizations: Astro, CS, IT
      - VAO Facility 2010-
        - NSF $20M for 5 years
        - Operational phase!
      http://us-vo.org/

  27. http://ivoa.net/

  29. IVOA Specifications

  30. First Standards
      - VOTable
        - Universal container for tables (in XML)
        - First VO standard (from the DTD era)
      - ConeSearch
        - Simple catalog access based on location
        - First VO standard interface (HTTP GET)
        - Many implemented them!
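Both standards are simple enough to sketch with the Python standard library. A ConeSearch request is an HTTP GET carrying `RA`, `DEC` and `SR` (search radius) in decimal degrees, and the answer is a VOTable. In the sketch below the base URL is a placeholder, the helper names are hypothetical, and the sample VOTable is hand-written and namespace-free; real responses usually declare an XML namespace, which this minimal parser does not handle. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a ConeSearch GET URL; RA, DEC and SR are in decimal degrees."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"

def votable_rows(xml_text):
    """Return (field names, rows) from a namespace-free VOTable using TABLEDATA."""
    root = ET.fromstring(xml_text)
    names = [field.get("name") for field in root.iter("FIELD")]
    rows = [[td.text for td in tr.iter("TD")] for tr in root.iter("TR")]
    return names, rows

# Hand-written sample response, not output from any real service.
SAMPLE = """<VOTABLE><RESOURCE><TABLE>
<FIELD name="id" datatype="char" arraysize="*"/>
<FIELD name="ra" datatype="double"/>
<FIELD name="dec" datatype="double"/>
<DATA><TABLEDATA>
<TR><TD>star-1</TD><TD>180.01</TD><TD>-0.52</TD></TR>
</TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""

url = cone_search_url("http://example.org/cone", 180.0, -0.5, 0.1)
names, rows = votable_rows(SAMPLE)
print(url, names, rows)
```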

  31. Early Standards
      - Simple Image Access Protocol (SIAP)
        - HTTP request, similar to opening a web page
        - Returns links to the matching images in a VOTable
        - Assumes we know how to deal with FITS images
      - Universal Content Descriptor (UCD)
        - Crystallized set of keywords from literature
        - For data discovery, not queries
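An image query is built the same way as a cone search. The sketch assumes the version 1 SIAP parameters `POS` (RA and Dec separated by a comma), `SIZE` (degrees) and `FORMAT`; the base URL is a placeholder and `siap_url` is a hypothetical helper. The response would be a VOTable whose rows link to the matching images.

```python
from urllib.parse import urlencode

def siap_url(base_url, ra_deg, dec_deg, size_deg, image_format="image/fits"):
    """Build a Simple Image Access GET URL for a region around (ra, dec)."""
    params = {
        "POS": f"{ra_deg},{dec_deg}",   # center of the region, decimal degrees
        "SIZE": size_deg,               # angular size of the region, degrees
        "FORMAT": image_format,         # requested image MIME type
    }
    separator = "&" if "?" in base_url else "?"
    # Keep "," and "/" readable instead of percent-encoding them.
    return base_url + separator + urlencode(params, safe=",/")

siap_query = siap_url("http://example.org/siap", 180.0, -0.5, 0.2)
print(siap_query)
```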

  32. Components
      - Discovery
        - Directory, Sky coverage
      - Access
        - Tables, Catalogs
        - Images, Spectra
        - Events
      - Distributed Storage
        - VOSpace
      - Authentication
      - Distributed Computing
        - Web & Grid services
        - VOStat
      - Messaging
        - SAMP, VOPipe
      - User Interfaces
        - Aladin
        - Topcat
        - Mirage, etc.

  33. VO Examples: VO Applications and Services

  34. NVO Quick Start

  35. Ready, Steady…

  36. DataScope
      - Collect info in VO
        - On a particular object
        - Or a part of the sky
        - GRBs, transients, etc.
      - VO plotting tools
        - FITS images
        - Catalog data
        - And more …

  37. Bandpass Services
      - Public repository
        - Search by keyword or λeff
        - Extract in various formats
        - Register & submit yours
      - Web site
        - On-the-fly plotting
        - Easy access to all
      - Web services
        - To code against

  38. Spectrum Services
      - Public repository
        - SDSS, 2dF spectra, etc.
        - Spatial and SQL search
        - Register & submit yours
      - Web site
        - On-the-fly plotting
        - Building composites
        - De-reddening
        - Line analysis
      - Web services

  39. Open SkyQuery
      - SkyNode interface to archives
        - Implements ADQL, returns VOTable
        - Basic node understands “REGION”
        - Full node understands “XMATCH”
      - SkyQuery portal
        - Knows the SkyNodes from Registry
        - Understands federated query
      http://openskyquery.net/
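As a rough intuition for what a cross-match delivers, the sketch below pairs entries of two small catalogs whose great-circle separation is within a radius. This is a simplified stand-in and not the SkyQuery XMATCH algorithm: it is a naive O(N·M) loop with a hard cut, and the catalog names, identifiers and coordinates are invented.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def crossmatch(cat_a, cat_b, radius_arcsec):
    """Naive match: every (id, ra, dec) in cat_a against every entry of cat_b."""
    radius_deg = radius_arcsec / 3600.0
    return [(id_a, id_b)
            for id_a, ra_a, dec_a in cat_a
            for id_b, ra_b, dec_b in cat_b
            if angular_sep_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg]

# Made-up catalogs: s1 and t1 are about 1.3 arcsec apart, everything else is far away.
sdss = [("s1", 180.0000, 0.0000), ("s2", 181.0000, 1.0000)]
twomass = [("t1", 180.0003, 0.0002), ("t2", 185.0000, 5.0000)]
matches = crossmatch(sdss, twomass, radius_arcsec=3.0)
print(matches)
```

In a federated setting the two catalogs live in different archives, so the interesting part is deciding which side ships its candidates to the other; the loop above ignores that entirely.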

  40. WESIX
      Web Enabled Source-Identification with Crossmatching
      - Higher level astronomy service built on other existing VO services: SExtractor service and Open SkyQuery
      - Result can be sent to a plotting tool for quick inspection
      http://nvogre.astro.washington.edu:8080/wesix/

  41. VOStat
      - Enabling R for VO data

  42. Sky Coverage
      - Discovery

  43. Transients: VOEvent

  44. Help!

  45. VO for Developers
      - Automated tools for analysis
      - Advanced services

  46. Web Services
      - Simple HTTP requests
        - ConeSearch
        - Simple Image Access
      - Standard SOAP and REST
        - Interoperable across platforms
        - IVOA compliant XML messages
        - Programming toolkits exist

  47. Command Line: VO-CLI
      - VOTool

  49. Future
      - New features
      - Better integration

  50. VOSpace 2.0
      - Storage instances soon everywhere
        - Save intermediate data products
        - Arrange for their transfer to other places
      - VOPipe
        - Chain VOSpaces for data flow between services
        - Async execution of custom processing steps

  51. Summary
      - More and Moore data: new opportunities
      - No central data store but at projects
      - On-site processing: CPU + GPU
      - Hierarchical Services
      - Standardized interfaces
      - Data federation
      - New “VxOs”
        - VaO: Virtual Astronomical Observatory
        - VsO,

  52. Sites to Explore

