The ALMA archive
Mark Lacy
Data Services Lead, NAASC, NRAO
NA ALMA Development 2016
Motivation
• Papers from archival data are important. Publications using archival data account for half of the HST and Chandra publications each year.
• ALMA archival papers are picking up; some examples (in high-z extragalactic science):
  – Fujimoto et al. 2016: faint end of ALMA source counts to 0.02 mJy.
  – Silva et al. 2015: excess of submm sources around bright WISE-selected AGN.
  – Oteo et al. 2016: number counts from calibrator fields.
  – Can save many hours of observation time spent looking at deep fields.
  – Removes issues of field-to-field variations.
• Can point the way for some future observations without needing the TAC to be “brave” and approve an ambitious/risky proposal.
• Could allow entire projects to be constructed, or supplemented with small amounts of new ALMA data, e.g. for a student thesis.
Goals of the Archive
• Provide access to data.
  – Traditional search/download is limited by bandwidth.
  – Need to move to server-side tools for the largest ALMA datasets.
• Provide rich metadata to allow complicated queries/data mining.
  – ALMA has a lot of data (~200 TB today), but with very low information content (~10^-4 %).
  – To change it to “Big Data” that we can mine using data science techniques, we need to extract the information from the noise. (Specific examples: source and line lists.)
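To put the two figures on this slide side by side, the arithmetic can be sketched as below. It assumes decimal terabytes (1 TB = 10^12 bytes) and takes the ~10^-4 % figure at face value; both numbers are order-of-magnitude estimates from the slide, so the result is only indicative.

```python
# Rough estimate of how much of the archive is "information" rather than noise.
# Assumptions (not stated in the talk): 1 TB = 1e12 bytes, and the quoted
# ~1e-4 percent applies uniformly to the whole ~200 TB holding.
archive_bytes = 200e12          # ~200 TB of archived data
info_percent = 1e-4             # information content, in percent

info_bytes = archive_bytes * info_percent / 100.0
info_megabytes = info_bytes / 1e6

print(f"Estimated information content: ~{info_megabytes:.0f} MB")
```

Under these assumptions the information content is of order a few hundred megabytes, which is why compact products such as source and line lists are a natural target.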
Short-term Archive Developments
• You may have noticed “collapsed rows” in the latest release.
  – These were needed as a prerequisite for the ingest of individual pipeline products in Cycle 4.
  – Tests of this will begin in September.
  – Current product “tar blobs” will be phased out.
• Coming soon (during Cycle 4):
  – Footprints via Aladin Lite.
  – RMS values.
  – Upload of target lists to search on.
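As an illustration of what a target-list search involves, the sketch below matches an uploaded list of positions against archive pointings by angular separation. It is not the archive's implementation: the function names, the tuple layout (name, RA in degrees, Dec in degrees), and the sample coordinates are all hypothetical.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for small separations.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))

def match_targets(targets, pointings, radius_deg):
    """Map each target name to the ids of pointings within radius_deg of it."""
    matches = {}
    for name, ra, dec in targets:
        matches[name] = [
            pid for pid, pra, pdec in pointings
            if angular_separation_deg(ra, dec, pra, pdec) <= radius_deg
        ]
    return matches

# Hypothetical uploaded target list and archive pointings.
targets = [("T1", 150.0, 2.0), ("T2", 10.0, -30.0)]
pointings = [("P1", 150.001, 2.001), ("P2", 200.0, 45.0)]
result = match_targets(targets, pointings, radius_deg=0.01)
print(result)
```

A production service would use a spatial index rather than this brute-force loop, but the matching criterion is the same.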
Current projects
• Access to data, even very large files
  – CARTA: remote visualization capability (Erik Rosolowsky).
  – Pipeline Processing Interface (PPI): remote pipeline runs to deliver calibrated measurement sets and/or images (NRAO initiative; this talk).
• From “a lot of data” to “Big Data”
  – ADMIT: enhanced metadata production (Peter Teuben).
PPI
• The Pipeline Processing Interface will allow reruns of the ALMA pipeline. Two modes:
  – Apply existing archival calibration tables to ALMA raw data to produce calibrated measurement sets.
    • Useful if you just need calibrated uv-data.
  – Run the current ALMA pipeline version to produce calibrated measurement sets and/or images.
    • Useful for running data with a new version of the pipeline.
    • Will also allow some pipeline parameter tweaks (the set of parameters will expand with time).
• Initially available as part of the new NRAO archive access tool; will also be made available to all ARCs as an add-on to the request handler.
RH/OODT infrastructure for the PPI
• The PPI uses a modified ALMA request handler and “Object Oriented Data Technology” (OODT) to kick off jobs on the cluster.
• This can be generalized to other tasks, e.g. analysis tools, visualization tasks, etc.
• It therefore provides a framework for server-side deployment of software from the ALMA Development program.
Creating rich metadata
• Accurate prior information is crucial:
  – Source positions and spatial extents.
  – Source velocities/redshifts.
  – (Targeted lines.)
• Some of this is supplied by the PI in the OT, some not.
  – Need to supplement with 3rd-party sources (SIMBAD, NED).
  – But there is the problem of validation, and of how to update.
  – We are working on it, but slowly.
• Source/clump finders in 2D and 3D (Jeff’s talk).
What else might we need/want?
• Predictive searches (because you selected this dataset you may also be interested in…) (Barrientos, JAO).
• Cutout server (for extracting small pieces of large cubes).
• VO interoperability, e.g.
  – Search multiwavelength archives for other data on an ALMA field.
  – Upload ALMA pointings/footprints to e.g. DS9 or TOPCAT (SAMP).
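The core operation of a cutout server can be sketched as follows: given a cube, a center pixel, and a half-width per axis, clamp the requested box to the cube edges and return only that sub-cube. This is a minimal in-memory illustration using nested lists; the function names and the (channel, y, x) axis order are assumptions, and a real service would read only the needed bytes from disk.

```python
def cutout_bounds(cube_shape, center, half_size):
    """Return [(start, stop), ...] per axis, clamped to the cube edges."""
    bounds = []
    for n, c, h in zip(cube_shape, center, half_size):
        start = max(0, c - h)
        stop = min(n, c + h + 1)  # stop is exclusive
        bounds.append((start, stop))
    return bounds

def extract_cutout(cube, bounds):
    """Slice a nested-list cube indexed as cube[channel][y][x]."""
    (z0, z1), (y0, y1), (x0, x1) = bounds
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in cube[z0:z1]]

# Small synthetic cube (4 channels, 5 rows, 6 columns); each value encodes
# its own position as z*100 + y*10 + x so the cutout is easy to check.
shape = (4, 5, 6)
cube = [[[z * 100 + y * 10 + x for x in range(shape[2])]
         for y in range(shape[1])]
        for z in range(shape[0])]

# Request a box centered near the x edge; the x range is clamped.
bounds = cutout_bounds(shape, center=(1, 2, 5), half_size=(1, 1, 1))
cutout = extract_cutout(cube, bounds)
print(bounds, len(cutout), len(cutout[0]), len(cutout[0][0]))
```

Clamping instead of raising an error lets a request near the edge of a cube still return the overlapping region.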
Summary
• We strongly encourage development proposals for the ALMA archive.
• Proposers do need to work closely with the Archive working group and the software development team in Garching to ensure successful integration of products/services.