
The Helmholtz Association Project "Large Scale Data Management and Analysis" (LSDMA)



  1. The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA) Kilian Schwarz, GSI; Christopher Jung, KIT

  2. Overview • Motivation • Data Life Cycle • LSDMA’s dual approach • Facts and Numbers • Initial Communities • LSDMA, FAIR and ALICE 2 05.10.2012 Christopher Jung SCC, KIT

  3. Why is Scientific Big Data important? Honestly, I do not need to explain this to you.

  4. Examples of Scientific Big Data in non-HEP. Sciences with Big Data: • Systems Biology: ~10 TB per day in high-throughput microscopy (zebra fish embryos) • Climate simulation: 10-100 PB per year • Brain research: 1 PB per year for brain mapping • Photon Science: XFEL 10 PB/year • and many other sciences which do not know their needs yet

  5. Challenges of Big Data • Scientific data are non-reproducible (or reproducible only at high cost) • Current analysis methods scale poorly • Existing big data knowledge in the respective fields • Each discipline has its specific needs • Multidisciplinary research • Metadata • Authentication and authorization (single sign-on) • Data privacy (incl. removal of private data) • “Good scientific practice” • Cost estimation for long-term archival (at different service levels) • Data preservation • Open Access • …

  6. Data Life Cycle. Inspiration for LSDMA: support the whole data life cycle!

  7. Dual approach: community-specific and generic • Data Life Cycle Labs (DLCLs): joint R&D with the scientific user communities – Optimization of the data life cycle – Community-specific data analysis tools and services • Data Services Integration Team (DSIT): generic R&D – Interface between federated data infrastructures and DLCLs/communities – Integration of data services into scientific working process

  8. Facts and numbers • Initial project period: 1.1.2012-31.12.2016 • Funded by Helmholtz Association (13 MEUR for 5 years) • To become a part of the sustainable program-oriented funding of Helmholtz Association in 2015 • Partners: 4 Helmholtz research centers, 6 universities and the German climate research center • Leading project partner: KIT

  9. Initial communities • Energy – Smart grids, battery research, fusion research • Earth and Environment – Climate model, environmental satellite data • Health – Virtual human brain map • Key Technologies – Synchrotron radiation, nanoscopy, systems biology, electron-microscopical imaging techniques • Structure of Matter – Photon Science: Petra 3, XFEL – FAIR@GSI (14 experiments with big and small communities)

  10. LHC Computing – Prototype for FAIR • FAIR profits from computing experience within an already running experiment • ALICE can test new developments in FAIR • New FAIR developments are on the way, and to some extent they already go back to ALICE • FAIR will play an increasing role (funding, network architecture, software development and more ...)

  11. Goals for GSI/FAIR in LSDMA. To be developed within LSDMA (DLCL: structure of matter) in collaboration with LSDMA-DSIT, the FAIR community, and ALICE (wherever synergy can be found) • Parallel and distributed computing – triggerless “online” system: porting of needed algorithms to GPU – Grid/Cloud infrastructure: enable the possibility to submit compute jobs to Clouds; create interfaces to existing environments (AliEn, ...) • Data archives – long-term data archives, including concepts for xrootd and gStore – metadata catalog and data analysis • Metropolitan Area Systems – include the distributed FAIR T0/T1 centre into a global Grid/Cloud infrastructure – Federated Identity Management • Global Federations – Global File System – Optimization of Data Storage: hot versus cold data; corrupt and incomplete data sets; parallel storage; 3rd party copy • Additional synergies via DSIT

  12. Next Steps at GSI • Advertise LSDMA positions (2 for FAIR DLCL) – do you know candidates? – GSI DSIT already started to hire people • Discussion with FAIR experiments and ALICE • Set-up of e-science infrastructures, first for PANDA and CBM, based on the experiences with ALICE (AliEn/xrootd/...) • Include smaller FAIR experiments • Continue to develop existing e-science infrastructure, also in close collaboration with DSIT and ALICE

  13. Summary and Outlook • There are many challenges in Scientific Big Data • LSDMA is a sustainable Helmholtz Association project, supporting the whole data life cycle, using a community-specific and a generic approach • FAIR is an important initial community in the research field ‘structure of matter’; several developments planned -> synergies w/ALICE • GSI has two open job positions for LSDMA
