CANFAR: a platform for data-intensive research
David Schade
Canadian Advanced Network for Astronomical Research (CANFAR)
Canadian Astronomy Data Centre
National Research Council Canada & University of Victoria
The Canadian organizations
Canadian Astronomy Data Centre
• Curation of Canada’s national astronomy data collections
• 29 years supporting Canadian university researchers
• National Research Council Canada / Government of Canada
Compute Canada
• Canada’s national Advanced Research Computing organization
CANFAR
• National consortium of university astronomers
• Directs CANFAR development and operations
CADC-CANFAR
CADC is part of the National Research Council Canada (government)
• 29 years of experience in data management
CANFAR is the Canadian Advanced Network for Astronomical Research
• Consortium of university astronomers
Compute Canada is the national organization that provides Advanced Research Computing (HPC)
• Now moving toward support for data-intensive research
Canadian Astronomy Data Centre
We began as a data centre:
• Data curation
• Long-term preservation
• Distribution
• Telescope collections:
  • Multiple missions, facilities and wavelengths
  • 12 telescopes
• 22 staff:
  • 6 scientists
  • 5 operations staff
  • 10 developers
  • Admin
Canadian Advanced Network for Astronomical Research
A cloud ecosystem for data-intensive astronomy
• User services:
  • Store and share data
  • Create and configure VMs
  • Run interactive VMs
  • Run persistent VMs
  • Batch processing with VMs
  • Support for visualization & analytics
• Using research cloud resources:
  • Compute Canada
• Integrated authentication and authorization
Big Data
The era of the “siloed” data centre is dead.
The fundamental problem now is to develop a range of architectures that couple data to processing, networking, and services in ways that support researchers.
CANFAR serves a global research community
CANFAR/CADC 2014
• Size:
  • 932M files
  • 2.3 PiB
• Users:
  • Authenticated access: 762
  • Anonymous access: 7,544
  • Registered: 7,018
• Data moved in the last year:
  • 1,106 TiB
  • 91M files
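A quick back-of-envelope check puts the 2014 figures in proportion. This sketch assumes PiB and TiB are binary units (2^50 and 2^40 bytes) and treats "932M" and "91M" as exact file counts; it is illustrative arithmetic, not CADC's own accounting.

```python
# Back-of-envelope figures derived from the 2014 CANFAR/CADC numbers.
# Assumption: PiB and TiB are binary units; "932M" and "91M" are taken
# as exactly 932e6 and 91e6 files.

PIB = 2**50  # bytes in one pebibyte
TIB = 2**40  # bytes in one tebibyte


def mean_file_size_mb(total_bytes: float, n_files: float) -> float:
    """Mean file size in decimal megabytes (1 MB = 1e6 bytes)."""
    return total_bytes / n_files / 1e6


holdings_mb = mean_file_size_mb(2.3 * PIB, 932e6)  # archived holdings
moved_mb = mean_file_size_mb(1106 * TIB, 91e6)     # data moved in one year
moved_tib_per_day = 1106 / 365                     # mean daily volume moved

print(f"Mean archived file size:    {holdings_mb:.1f} MB")
print(f"Mean transferred file size: {moved_mb:.1f} MB")
print(f"Mean data moved per day:    {moved_tib_per_day:.2f} TiB")
```

Under these assumptions the archive averages roughly 2.8 MB per file, the files moved during the year average roughly 13 MB, and about 3 TiB moved per day.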
Leap in data transfers 2010
Context: Compute Canada
• Large national computing infrastructure
• Agencies pushing researchers to use it
• Limited success in data-intensive astronomy
• Users must adapt to local OS, software and policies
• Conflicting demands
• Limited mobility
CANFAR as a platform
CANFAR develops, integrates, and operates the services:
• Distributed storage
• VOSpace: user-managed storage
• Batch cloud processing
• Interactive and persistent VMs
• Authentication and authorization
CANFAR supports users of the services.
Compute Canada provides the hardware.
There is a Compute Canada-CANFAR Operations Committee.
COMPUTE CANADA: things are changing
• Compute Canada (CC) has new funding and is committed to serving all Advanced Research Computing needs
• The new funding program for CC emphasizes data-intensive research
• The future of CC lies in providing services rather than hardware
• CADC is contracting with CC to develop services:
  • Project kick-off meeting November 5-6
• CADC is pushing for generic research services
CANFAR: generic research platform
Why Federate International e-Infrastructures?
CANFAR: Observatory Partners / Primary Data Producers in astronomy • Chile • Canada • France • Australia • Korea • United States • United Kingdom • China • Taiwan • Netherlands • Japan • Argentina • Brazil • + ESA members
Where are the consumers of CANFAR data & services?
Science, Facilities, Data
• All Canadian astronomy is collaborative, global, reciprocal
• Many other sciences are the same
• All Canadian observing facilities are multi-national
• All Canadian science teams are multi-national
• Shared e-infrastructure needs to be multi-national
European Grid Initiative: CANFAR/INAF/EGI PROPOSAL
Technical Annex, Sections 1-3: Excellence, Impact & Implementation
Proposal full title: Engaging the Research Community towards an Open Science Commons
Proposal acronym: EGI-Engage
Call: EINFRA-1-2014
Canadian Advanced Network for Astronomical Research (Lead: INFN) (M6 – M30)
The Canadian Advanced Network for Astronomical Research (CANFAR) is a computing infrastructure for astronomers in Canada. International collaboration in the Astronomy discipline will be supported both by the Canadian Astronomy Data Centre (CADC) and EGI. CANFAR and EGI will work together to integrate both e-Infrastructures towards a seamless and uniform platform for international astronomy research collaboration. Community services will be provided on top of the federated cloud of EGI using open source solutions and re-using the CANFAR experience.
CADC CANFAR
[Figure: CANFAR/CADC architecture. University researcher and telescope clients reach Storage Management (2.6 petabytes), Metadata Management (8.5 terabytes) and Processing Management (954 compute cores) through services, with VM service creation and deployment; hosted by Compute Canada (Victoria, Saskatoon, Calgary) and NRC Herzberg (Victoria).]
University researchers and telescope staff have privileges to upload data, create VMs and install science applications, run interactive VM sessions, submit batch processing jobs to VMs, share their VMs, control the life-cycle of their VMs, and offer software-as-a-service applications in their VMs.
Key data activities
• Data engineering
• Operations and user support
• Software development
• Software integration
• Data processing
• Data management
• User web services
• User web interfaces
                 Data In                    Data Out
                 # of files    Terabytes    # of files    Terabytes
Peak per day     2,169,190     8.0          648,093       16.8
Avg per day      130,952       0.4          99,253        2.6
Definition: VM – Virtual Machine
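The daily transfer table can be turned into the sustained network rates and mean file sizes it implies. This sketch assumes "Terabytes" are decimal (10^12 bytes) and that each figure is a 24-hour total; the dictionary and helper names are illustrative only.

```python
# Derived rates from the daily Data In / Data Out table.
# Assumption: "Terabytes" means decimal TB (1e12 bytes) and each figure
# is a total over one 24-hour day.

SECONDS_PER_DAY = 86_400

transfers = {
    # (direction, statistic): (files per day, terabytes per day)
    ("in", "peak"): (2_169_190, 8.0),
    ("in", "avg"): (130_952, 0.4),
    ("out", "peak"): (648_093, 16.8),
    ("out", "avg"): (99_253, 2.6),
}


def mean_rate_gbps(terabytes_per_day: float) -> float:
    """Average line rate in Gbit/s needed to move this volume in one day."""
    return terabytes_per_day * 1e12 * 8 / SECONDS_PER_DAY / 1e9


def mean_file_mb(files: int, terabytes: float) -> float:
    """Mean file size in decimal MB (1 MB = 1e6 bytes)."""
    return terabytes * 1e12 / files / 1e6


for (direction, stat), (files, tb) in transfers.items():
    print(f"{direction:>3} {stat:>4}: {mean_rate_gbps(tb):5.2f} Gbit/s, "
          f"{mean_file_mb(files, tb):5.1f} MB/file")
```

On these assumptions the outbound peak of 16.8 TB corresponds to about 1.56 Gbit/s sustained over the day, and outbound files average about 26 MB against roughly 3 to 4 MB inbound.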