VIRTUAL OBSERVATORY TECHNOLOGIES Tamás Budavári / The Johns Hopkins University 7/30/2010
Moore’s Law, Big Data!
Outline: SQL for Big Data (computing where the bytes are; database and GPU integration; CUDA from SQL). Data-intensive Web services (behind the scenes). Working examples (Sloan Digital Sky Survey; Virtual Observatory tools and services).
The Virtual Observatory: “The Virtual Observatory is a framework that enables new astronomical research by greatly enhancing access to worldwide data and computing resources.” (http://us-vo.org/) How it works, how to build it, how to use it, and what’s next.
Hierarchy of Services: Atomic services provide access to observations, simulations, and models. Higher-level services combine them for more functionality. User and analysis tools can be high-level services, too.
Heterogeneous Datasets: Blobs (images, spectra, etc.) require access and transfer. Catalogs require fast searches and indexes.
Structured Query Language: The SQL-92 standard reads almost like English: SELECT <columns> FROM <table> WHERE <conditions>, e.g., SELECT RA, Dec FROM Stars WHERE r < 15. Astronomical Data Query Language (ADQL): an extended subset with GIS-like spatial constructs.
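To try the query above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The Stars table reuses the slide's column names, but its rows are made-up values for illustration. SQLite stands in here for a production engine such as SQL Server.

```python
import sqlite3

# Hypothetical toy catalog: column names follow the slide (RA, Dec, r); rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stars (ObjID INTEGER PRIMARY KEY, RA REAL, Dec REAL, r REAL)")
conn.executemany(
    "INSERT INTO Stars (ObjID, RA, Dec, r) VALUES (?, ?, ?, ?)",
    [(1, 180.0, 2.5, 14.2), (2, 181.3, -1.0, 17.8), (3, 10.7, 41.3, 12.9)],
)

# Declarative query: we state *what* we want; the engine decides *how* to scan or index.
# ORDER BY only makes the output order deterministic for this example.
bright = conn.execute("SELECT RA, Dec FROM Stars WHERE r < 15 ORDER BY ObjID").fetchall()
print(bright)  # [(180.0, 2.5), (10.7, 41.3)]
```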
Joining Tables: Sources in observation fields are stored in 2 tables: SELECT f.FieldID, …, s.ObjID, s.RA, s.Dec, … FROM Fields AS f INNER JOIN Sources AS s ON s.FieldID = f.FieldID WHERE f.ExpTime > 1000 AND s.Rmag > 16
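A runnable sketch of the join, again with sqlite3 and toy rows. The slide elides some selected columns ("…"), and they are not reproduced here. The columns picked below are only an illustrative choice. The three sources are chosen so that each WHERE condition visibly removes one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fields  (FieldID INTEGER PRIMARY KEY, ExpTime REAL);
CREATE TABLE Sources (ObjID INTEGER PRIMARY KEY,
                      FieldID INTEGER REFERENCES Fields(FieldID),
                      RA REAL, Dec REAL, Rmag REAL);
INSERT INTO Fields  VALUES (1, 1200.0), (2, 600.0);
INSERT INTO Sources VALUES
  (10, 1, 150.1, 2.2, 17.5),   -- long exposure, faint: kept
  (11, 1, 150.2, 2.3, 15.0),   -- long exposure, bright: dropped by the Rmag cut
  (12, 2, 200.0, -5.0, 18.0);  -- short exposure: dropped by the ExpTime cut
""")

# Each source row is paired with the row of the field it was observed in.
rows = conn.execute("""
SELECT f.FieldID, f.ExpTime, s.ObjID, s.RA, s.Dec, s.Rmag
FROM Fields AS f
INNER JOIN Sources AS s ON s.FieldID = f.FieldID
WHERE f.ExpTime > 1000 AND s.Rmag > 16
""").fetchall()
print(rows)  # [(1, 1200.0, 10, 150.1, 2.2, 17.5)]
```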
Calculations in SQL: Computed columns: use J-H in SELECT and/or WHERE; similarly functions, e.g., POWER(10, -0.4*Rmag). Grouping: SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID; this can also be used for histogramming, etc. Example: the SDSS Catalog Archive.
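The three ideas above (computed columns, grouping, histogramming) fit in one small sqlite3 sketch. SQLite does not ship a STDEV aggregate, and some builds lack POWER. The sketch therefore registers Python versions under those names; they are stand-ins, not the server's built-ins. Table contents are toy values.

```python
import math
import sqlite3
import statistics


class Stdev:
    """Sample standard deviation aggregate (n - 1 denominator); NULL for fewer than 2 values."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # aggregates ignore NULLs
            self.values.append(value)

    def finalize(self):
        return statistics.stdev(self.values) if len(self.values) >= 2 else None


conn = sqlite3.connect(":memory:")
# Stand-ins for the POWER and STDEV functions used on the slide.
conn.create_function("POWER", 2, math.pow)
conn.create_aggregate("STDEV", 1, Stdev)

conn.execute("CREATE TABLE Sources (ObjID INTEGER, FieldID INTEGER, J REAL, H REAL, Rmag REAL)")
conn.executemany(
    "INSERT INTO Sources VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 15.0, 14.5, 16.2), (2, 1, 16.0, 15.2, 17.9), (3, 2, 14.0, 13.8, 16.7)],  # toy rows
)

# Computed columns: a color (J-H) in SELECT and WHERE, and a relative flux from a magnitude.
colors = conn.execute(
    "SELECT ObjID, J - H AS JH, POWER(10, -0.4 * Rmag) AS flux "
    "FROM Sources WHERE J - H > 0.3 ORDER BY ObjID"
).fetchall()

# Grouping: per-field mean and scatter of J.
per_field = conn.execute(
    "SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID ORDER BY FieldID"
).fetchall()

# Histogramming: 1-mag bins; CAST truncates toward zero, i.e. floor for positive magnitudes.
hist = conn.execute(
    "SELECT CAST(Rmag AS INTEGER) AS bin, COUNT(*) FROM Sources GROUP BY bin ORDER BY bin"
).fetchall()
print(hist)  # [(16, 2), (17, 1)]
```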
Surveys in Astronomy: Sloan Digital Sky Survey (2001-2008): 8 TB Catalog Archive Server, custom tools and indices. Upcoming surveys: PanSTARRS, 100 TB (2010-); LSST, 1 PB+ (201?).
New Moore’s Law: growth is now in the number of cores. Faster than ever (for now).
New Programming Paradigm: 100s of cores and 27k parallel threads per GPU, running a billion threads a second. Forget the fancy old algorithms: they were built on wrong assumptions. Today CPU is free and RAM is slow; a GPU has >50 GB/s bandwidth. It is still difficult to occupy all the cores.
Hybrid Architecture [Diagram labels: run, launch, launch, sync]
Extending SQL Server: a dedicated service for direct access; shared-memory IPC with on-the-fly data transform. [Diagram labels: SQL, IPC]
Spatial Statistics: Correlation functions are computed from pair counts in coarse bins (e.g., 8 bins). The state of the art is dual-tree traversal. With high-resolution bins, however, it becomes just like brute force.
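For reference, here is the brute-force baseline that high-resolution binning reduces to: every pair is visited once and its separation is dropped into a bin. This is a minimal single-threaded sketch, not the GPU implementation from the talk. Two choices are assumptions not stated on the slide: flat-sky (Euclidean) separations and the Landy-Szalay estimator. The uniform toy points are randomly generated.

```python
import bisect
import math
import random


def pair_counts(a, b, edges, same=False):
    """Brute-force histogram of pair separations into bins [edges[k], edges[k+1]).

    With same=True, each unordered pair within `a` is counted once (b is ignored);
    otherwise all pairs between `a` and `b` are counted. Cost: O(N*M) distances.
    """
    counts = [0] * (len(edges) - 1)
    for i, (x1, y1) in enumerate(a):
        partners = a[i + 1:] if same else b
        for x2, y2 in partners:
            d = math.hypot(x1 - x2, y1 - y2)  # flat-sky (Euclidean) separation: an assumption
            k = bisect.bisect_right(edges, d) - 1
            if 0 <= k < len(counts):  # separations outside the binned range are dropped
                counts[k] += 1
    return counts


def landy_szalay(data, rand, edges):
    """Landy-Szalay estimator xi = (DD - 2 DR + RR) / RR on normalized pair counts."""
    nd, nr = len(data), len(rand)
    dd = pair_counts(data, None, edges, same=True)
    rr = pair_counts(rand, None, edges, same=True)
    dr = pair_counts(data, rand, edges)
    n_dd, n_rr, n_dr = nd * (nd - 1) / 2, nr * (nr - 1) / 2, nd * nr
    xi = []
    for k in range(len(edges) - 1):
        if rr[k] == 0:
            xi.append(float("nan"))  # no random pairs: estimator undefined in this bin
            continue
        fdd, fdr, frr = dd[k] / n_dd, dr[k] / n_dr, rr[k] / n_rr
        xi.append((fdd - 2 * fdr + frr) / frr)
    return xi


# Usage: unclustered (uniform) toy points should give xi close to 0 in every bin.
rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(200)]
rand = [(rng.random(), rng.random()) for _ in range(400)]
edges = [0.05, 0.1, 0.2]
print(landy_szalay(data, rand, edges))
```

The double loop is exactly the work that grows as N² and that the GPU parallelizes: each (point, partner) distance is independent, so it maps naturally onto many threads.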
Sloan DR7: 800 × 800 bins [figure]
All Done Inside the Database: Pair counts are computed on the GPU and returned as a 2D histogram in a table (i, j, cts). The correlation function is then calculated in SQL. Asynchronous parallel GPU jobs are also possible.
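Once pair counts come back as (i, j, cts) rows, the last step is an ordinary SQL join. The sketch below uses sqlite3 and several assumptions. The table names DD and RR are hypothetical, and the counts are toy numbers. It uses the simple "natural" estimator DD/RR - 1 for brevity, and it normalizes by the histogram totals. A real analysis would normalize by the total number of pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical pair-count histograms in the (i, j, cts) layout:
-- DD from data-data pairs, RR from random-random pairs (toy numbers).
CREATE TABLE DD (i INTEGER, j INTEGER, cts INTEGER, PRIMARY KEY (i, j));
CREATE TABLE RR (i INTEGER, j INTEGER, cts INTEGER, PRIMARY KEY (i, j));
INSERT INTO DD VALUES (0, 0, 30), (0, 1, 20), (1, 0, 10), (1, 1, 40);
INSERT INTO RR VALUES (0, 0, 20), (0, 1, 20), (1, 0, 20), (1, 1, 40);
""")

# Natural estimator per cell: xi = (DD / total DD) / (RR / total RR) - 1.
xi = conn.execute("""
WITH nd AS (SELECT SUM(cts) AS n FROM DD),
     nr AS (SELECT SUM(cts) AS n FROM RR)
SELECT d.i, d.j,
       (CAST(d.cts AS REAL) / nd.n) / (CAST(r.cts AS REAL) / nr.n) - 1.0 AS xi
FROM DD AS d
JOIN RR AS r ON r.i = d.i AND r.j = d.j
CROSS JOIN nd
CROSS JOIN nr
WHERE r.cts > 0            -- skip empty random cells, where xi is undefined
ORDER BY d.i, d.j
""").fetchall()

for i, j, value in xi:
    print(i, j, round(value, 3))
```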
Distributed Data
Data at the Projects: Data grow exponentially. Projects last 3-5 years, and their data are sent upwards at the end. Data will never be centralized: most data stay at the projects, which puts more responsibility on them. Bring the analysis close to the data.
Data Federation: Metcalfe’s Law: the utility of computer networks grows as the number of possible connections, O(N^2). The Virtual Observatory: the federation of N astronomy archives has utility O(N^2), i.e., possibilities for making discoveries. The whole is more than the sum of the parts.
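Counting the distinct pairs of archives that could be combined makes the quadratic growth explicit. Here "links" is only an illustrative count of possible pairings, not a measure of scientific utility. Doubling the number of archives from 10 to 20 roughly quadruples the number of pairings.

```latex
\text{links}(N) = \binom{N}{2} = \frac{N(N-1)}{2} = O(N^2),
\qquad \text{e.g. } \text{links}(10) = 45, \quad \text{links}(20) = 190.
```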
Interoperability Challenges: metadata standards, data discovery, data requests, data delivery, units, database queries, distributed applications, authentication and authorization.
US National Virtual Observatory: NVO research phase, 2002-2007: NSF ITR Program, $10M for 5 years; 17 organizations (astronomy, CS, IT). VAO facility, 2010-: NSF, $20M for 5 years. Operational phase! http://us-vo.org/
http://ivoa.net/
IVOA Specifications
First Standards: VOTable, a universal container for tables (in XML) and the first VO standard (from the DTD era). ConeSearch, simple catalog access based on location and the first VO standard interface (HTTP GET). Many implemented them!
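The two first standards work together: a ConeSearch is a plain HTTP GET, and the answer comes back as a VOTable. The sketch builds the request URL from the RA, DEC, and SR parameters (decimal degrees) used by the Simple Cone Search standard. It then reads a hand-written VOTable with the standard library. The service URL and the sample response are hypothetical. The parser handles only the TABLEDATA serialization and a few datatypes. For real work, a maintained reader such as astropy's VOTable module is the safer choice.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Append the Cone Search parameters RA, DEC, SR (decimal degrees) to a service URL."""
    query = urllib.parse.urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    base = base_url.rstrip("?&")  # tolerate base URLs that already end in '?' or '&'
    sep = "&" if "?" in base else "?"
    return base + sep + query


# A few common FIELD datatypes; anything else is kept as a string.
_CASTS = {"double": float, "float": float, "int": int, "long": int, "short": int}


def _local(tag):
    """Strip the XML namespace, e.g. '{http://...}TABLE' -> 'TABLE'."""
    return tag.rsplit("}", 1)[-1]


def parse_votable(xml_text):
    """Return (column names, rows) of the first TABLE (TABLEDATA serialization only)."""
    root = ET.fromstring(xml_text)
    for table in root.iter():
        if _local(table.tag) != "TABLE":
            continue
        fields = [f for f in table if _local(f.tag) == "FIELD"]
        names = [f.get("name") for f in fields]
        casts = [_CASTS.get(f.get("datatype"), str) for f in fields]
        rows = []
        for tr in table.iter():
            if _local(tr.tag) != "TR":
                continue
            cells = [td.text for td in tr if _local(td.tag) == "TD"]
            rows.append([None if c is None else cast(c) for cast, c in zip(casts, cells)])
        return names, rows
    return [], []


# Hypothetical service URL and hand-written response, for illustration only.
print(cone_search_url("https://example.org/scs", 180.0, 2.5, 0.1))
# https://example.org/scs?RA=180.0&DEC=2.5&SR=0.1
SAMPLE = """<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
  <RESOURCE><TABLE>
    <FIELD name="id" datatype="char" arraysize="*"/>
    <FIELD name="ra" datatype="double" unit="deg"/>
    <FIELD name="dec" datatype="double" unit="deg"/>
    <DATA><TABLEDATA>
      <TR><TD>star-1</TD><TD>180.01</TD><TD>2.49</TD></TR>
    </TABLEDATA></DATA>
  </TABLE></RESOURCE>
</VOTABLE>"""
print(parse_votable(SAMPLE))
```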
Early Standards: Simple Image Access Protocol (SIAP): an HTTP request, similar to opening a web page, that returns links to the matching images in a VOTable; it assumes we know how to deal with FITS images. Unified Content Descriptors (UCD): a crystallized set of keywords from the literature, for data discovery, not queries.
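A SIAP query is again a URL, and the image links come back in a VOTable. The sketch below shows both halves, with the UCD used to find the right column instead of relying on a column name. Several details follow our reading of SIAP 1.0, so check them against the specification: the POS (ra,dec) and SIZE parameters in degrees, the FORMAT parameter, and the VOX:Image_AccessReference UCD for the image URL column. The service URL and response are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def siap_url(base_url, ra_deg, dec_deg, size_deg, fmt="image/fits"):
    """Build a SIAP 1.0 style query: POS=ra,dec and SIZE in degrees, plus FORMAT."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": fmt}
    query = urllib.parse.urlencode(params, safe=",/")  # keep ',' and '/' unescaped for readability
    base = base_url.rstrip("?&")
    sep = "&" if "?" in base else "?"
    return base + sep + query


def _local(tag):
    """Strip the XML namespace, e.g. '{http://...}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def image_links(xml_text, ucd="VOX:Image_AccessReference"):
    """Return the image URLs from a SIAP VOTable, locating the column by its UCD."""
    root = ET.fromstring(xml_text)
    table = next((el for el in root.iter() if _local(el.tag) == "TABLE"), None)
    if table is None:
        return []
    fields = [f for f in table if _local(f.tag) == "FIELD"]
    col = next((i for i, f in enumerate(fields) if f.get("ucd") == ucd), None)
    if col is None:
        raise ValueError(f"no FIELD with ucd={ucd!r}")
    links = []
    for tr in table.iter():
        if _local(tr.tag) == "TR":
            cells = [td.text for td in tr if _local(td.tag) == "TD"]
            links.append(cells[col])
    return links


# Hypothetical service URL and hand-written response, for illustration only.
print(siap_url("https://example.org/siap", 180.0, 2.5, 0.2))
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.1"><RESOURCE type="results"><TABLE>
  <FIELD name="title" datatype="char" arraysize="*" ucd="VOX:Image_Title"/>
  <FIELD name="url" datatype="char" arraysize="*" ucd="VOX:Image_AccessReference"/>
  <DATA><TABLEDATA>
    <TR><TD>r-band cutout</TD><TD>https://example.org/img/1.fits</TD></TR>
  </TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""
print(image_links(SAMPLE))
```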
Components: Discovery (directory, sky coverage); Access (tables, catalogs, images, spectra); User interfaces (Aladin, Topcat, Mirage, etc.); Distributed computing (Web & Grid services, VOStat); Messaging (SAMP, VOPipe, events); Distributed storage (VOSpace, authentication).
VO Examples: VO Applications and Services
NVO Quick Start
Ready, Steady…
DataScope: Collects information from the VO on a particular object or a part of the sky (GRBs, transients, etc.). VO plotting tools: FITS images, catalog data, and more.
Bandpass Services: Public repository: search by keyword or eff; extract in various formats; register and submit your own. Web site: on-the-fly plotting, easy access to all. Web services to code against.
Spectrum Services: Public repository (SDSS, 2dF spectra, etc.): spatial and SQL search; register and submit your own. Web site: on-the-fly plotting, building composites, de-reddening, line analysis. Web services.
Open SkyQuery: The SkyNode interface to archives implements ADQL and returns VOTable. A basic node understands “REGION”; a full node understands “XMATCH”. The SkyQuery portal knows the SkyNodes from the Registry and understands federated queries. http://openskyquery.net/
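To make the idea of a crossmatch concrete, here is a single-machine sketch: for each source in one catalog, find the nearest source in another within a search radius. This is a simple nearest-neighbor positional match chosen for illustration. It is not SkyQuery's XMATCH, which runs across distributed SkyNodes and may use different matching criteria. The catalogs are made-up toy data.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, accurate at small angles).

    sin^2 of the half RA difference is periodic, so pairs straddling RA = 0/360 work.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(cat_a, cat_b, radius_arcsec):
    """Nearest cat_b counterpart within radius_arcsec for each (id, ra, dec) in cat_a.

    Returns (id_a, id_b, sep_arcsec) tuples; cat_a sources without a counterpart are omitted.
    Brute force O(N*M); large catalogs need a spatial index instead.
    """
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= radius_arcsec and (best is None or sep < best[2]):
                best = (id_a, id_b, sep)
        if best is not None:
            matches.append(best)
    return matches


# Toy catalogs (made-up ids and positions in degrees).
cat_a = [("a1", 180.0, 0.0), ("a2", 10.0, 10.0)]
cat_b = [("b1", 180.0, 0.5 / 3600), ("b2", 180.0, 3.0 / 3600), ("b3", 50.0, -20.0)]
print(crossmatch(cat_a, cat_b, radius_arcsec=1.0))
```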
WESIX: Web Enabled Source-Identification with Crossmatching, a higher-level astronomy service built on other existing VO services: a SExtractor service and Open SkyQuery. Results can be sent to a plotting tool for quick inspection. http://nvogre.astro.washington.edu:8080/wesix/
VOStat: Enabling R for VO data
Sky Coverage (Discovery)
Transients: VOEvent
Help!
VO for Developers: Automated tools for analysis; advanced services
Web Services: Simple HTTP requests (ConeSearch, Simple Image Access); standard SOAP and REST, interoperable across platforms; IVOA-compliant XML messages. Programming toolkits exist.
Command Line: VO-CLI (VOTool)
Future: New features, better integration
VOSpace 2.0: Storage instances will soon be everywhere: save intermediate data products and arrange for their transfer to other places. VOPipe: chain VOSpaces for data flow between services, with asynchronous execution of custom processing steps.
Summary: More and Moore data: new opportunities. No central data store; data stay at the projects. On-site processing: CPU + GPU. Hierarchical services: standardized interfaces, data federation. New “VxOs”: VaO (Virtual Astronomical Observatory), VsO, …
Sites to Explore