Report from VLDATA F2F meeting in London
G. Ganis, 16 June 2014
Reminder
● End of May
  ○ R. Graciani / UB (DIRAC project leader) proposes to include a plan for some CernVM-FS developments of possible interest for us in a proposal (VLDATA) for the H2020 call EINFRA-1 {4,5}
● 5 June
  ○ R. Graciani presents the project to us in a special SFT meeting; we decide to evaluate our possible contribution
● 6-11 June
  ○ We have some internal discussions and prepare a draft document; we decide that I will attend the London F2F meeting
VLDATA Objectives
• VLDATA = Collaboration among Technology providers, integrating existing technologies, to simplify the connection between Users and Resource providers.
• VLDATA = open & generic platform supporting efficient and cost-effective solutions for large-scale distributed data processing, curation, analysis and publication.
• User Community driven co-design, validated by end-users and supporting a new generation of data scientists.
• Sustainability, increasing the user base by promoting VLDATA among other relevant communities.
June 12th, 2014 VLDATA
Make IT simple
• Simplicity: VLDATA provides an abstraction of the different Resources, which are all made accessible to the end user via the same interfaces.
• Transparency: Users are allowed to specify their Workflows/Pipelines with different levels of abstraction. The platform takes care of the Resource Allocation needed to fulfill the required specifications.
• Extendibility and flexibility: VLDATA provides an API that allows users to extend the provided functionality by developing new or customized components.
• Reliability: Quality standards and extensive validation in several scientific domains ensure the readiness-to-use and robustness of VLDATA-based solutions.
• Scalability: Modular implementation allowing horizontal (amount of connected Resources or Users) and vertical (amount of processed Units) scaling to adapt VLDATA to the needs of each particular community or Research Infrastructure project.
• Smart and intelligent: building on collected experience and monitoring data, algorithms can look for optimized scheduling/searching strategies, including automated decision making based on usage traces and expectations.
• Cost-effective: Building on existing well-established solutions and incrementally extending and developing them to address new challenges with an evolving, validated common solution, avoiding unnecessary duplication of effort.
VLDATA or Open DISDATA
● Main components
  ○ DIRAC
    ■ LHCb platform for distributed computing
    ■ Widely used in the EGI community
    ■ Expected to improve in scalability and in efficient support for multi-core resources
  ○ SCI-BUS
    ■ SCIentific gateway Based User Support
    ■ Flexible solution for complex data processing workflows
EINFRA-1 {4,5}
(4) Large scale virtualization of data/compute centre resources to achieve on-demand compute capacities, improve flexibility for data analysis and avoid unnecessary costly large data transfers.
● Budget: 15 EUR million in 2014
(5) Development and adoption of a standards-based computing platform (with open software stack) that can be deployed on different hardware and e-infrastructures (such as clouds providing infrastructure-as-a-service (IaaS), HPC, grid infrastructure …) to abstract application development and execution from available (possibly remote) computing systems. This platform should be capable of federating multiple commercial and/or public cloud resources or services and deliver Platform-as-a-Service (PaaS) adapted to the scientific community with a short learning curve. Adequate coordination and interoperability with existing e-infrastructures (including GÉANT, EGI, PRACE and others) is recommended.
● Budget: 40 EUR million in 2015
● Deadline: 2 September 2014, 17h CET
Work Packages (WP)
● Development
  ○ WP1: Requirements Analysis & Design (Cardiff Univ.)
  ○ WP2: Framework and basic modules (Univ. Barcelona)
  ○ WP3: Compute Management (MTA SZTAKI)
  ○ WP4: Data Management (CYFRONET)
  ○ WP5: Quality Assessment (Univ Auton Barcelona)
● Validation
  ○ WP6 (No leading institution appointed yet; INAF possible candidate)
● Exploitation
  ○ WP7: Communication (IDGC)
  ○ WP8: Exploitation (Emergence Tech)
  ○ WP9: Internationalization (Univ Amsterdam)
WP2: Frameworks & Basic modules
1. Improve accessibility to secure interfaces, including new client libraries supporting standard Authentication mechanisms.
2. Optimize DB backend access, including schema redesign, more efficient access patterns, intensive usage of caching and bulk operations, and transparent replication.
3. Transparent access, in user space, to application software and data.
4. Enhance Task – Resource matching capabilities, including extended semantics, without loss of efficiency (> 1M Matches / day).
5. Improve overall resource status awareness, including automated, active and flexible end-to-end functional probes down to the execution node level.
6. Scale current WMS and File Catalog solution vertically and horizontally.
7. Produce High Availability solution, including live migration between geographically separated instances.
8. Fulfill quality and security standards defined by WP6
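Objective 4 sets a throughput target of more than 1M matches per day. As a rough illustration of what that implies, the sketch below converts the daily target into a sustained per-second rate and shows a toy Task – Resource matcher. All names (`Task`, `Resource`, `first_match`, `required_rate`, the tag strings) are hypothetical and are not part of DIRAC or of any VLDATA component; the matching rule (core count plus tag subset) is an assumption chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(matches_per_day: int) -> float:
    """Sustained matches per second needed to reach a daily target."""
    return matches_per_day / SECONDS_PER_DAY


@dataclass(frozen=True)
class Resource:
    # Hypothetical description of an execution resource.
    name: str
    cores: int
    tags: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Task:
    # Hypothetical description of a task and its requirements.
    name: str
    min_cores: int = 1
    required_tags: frozenset = field(default_factory=frozenset)


def first_match(task: Task, resources: Iterable[Resource]) -> Optional[Resource]:
    """Return the first resource with enough cores and all required tags."""
    for resource in resources:
        if resource.cores >= task.min_cores and task.required_tags <= resource.tags:
            return resource
    return None


if __name__ == "__main__":
    pool = [
        Resource("site-a", cores=1, tags=frozenset({"cvmfs"})),
        Resource("site-b", cores=8, tags=frozenset({"cvmfs", "multicore"})),
    ]
    task = Task("reco", min_cores=4, required_tags=frozenset({"multicore"}))
    print(first_match(task, pool))
    print(f"{required_rate(1_000_000):.2f} matches/s")
```

Averaged over a day, 1M matches corresponds to roughly 11.6 matches per second; peak load would be higher, so a real matcher would need headroom beyond this figure.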
CernVM-FS developments
1. Unified access to all repositories
   a. Exploit bootstrap repository cvmfs-config.cern.ch
2. Easier access in user space
   a. Improve efficiency of Parrot approach
   b. Provide tools to set up efficient caching
3. Distribution of proprietary or licensed software
   a. Granular ACL and/or encryption of repositories
CernVM-FS work plan
1. Get familiar with CernVM-FS, Parrot and all the relevant packages (M3)
2. Design and prototype the required additions/modifications (M6)
3. Provide a proper, clean, production-quality implementation, inclusive of all relevant tests (M12)
4. Provide relevant documentation and dissemination material (M14)
5. Follow closely early adoption, analyse feedback, implement consolidation changes (M24)
Request for 24 person months (Junior fellow)
CernVM developments
● Multi-OS support for CernVM
  ○ Bootloader technology decouples OS from kernel
    ■ OS-on-demand; proven with SLC4 and SLC5
  ○ Needs tools and procedures to automate and simplify the addition of a new Linux flavour
CernVM work plan
1. Get familiar with the uCernVM technology (M3)
2. Design and prototype the required additions/modifications (M6)
3. Provide relevant documentation and dissemination material (M11)
4. Follow closely early adoption, analyse feedback, implement consolidation changes (M24)
Request for 24 person months (Junior fellow)
SFT commitment / request
● Host the person(s)
● Provide relevant training
● Code subject to the same ownership and license terms as current CernVM software
LHCb online interest (N. Neufeld)
● Study options for optimal fabric usage and management
  ○ Evaluate light-virtualization solutions (e.g. Linux containers, Docker)
  ○ Assess performance and operational issues, such as opportunistic use of facilities having a different main purpose (e.g. the LHCb online farm), ease of maintenance, turn-around times, data handling
● 1 junior fellow (24 person months) fully funded by EC
  ○ Hosted by LHCb Online admin team (2 senior, 2 junior members, 1 or 2 engineers/physicists)
  ○ No additional financial needs (e.g. for hardware)
● To be coordinated with Manchester/LHCb (Andrew McNab)
The London meeting: intentions
● Objectives of the meeting
  ○ Define work plan
    ■ Work Packages + objectives & deliverables & milestones
  ○ Define contractors
  ○ First iteration on the budget
  ○ Define editors
  ○ Define contributors to the proposal text and calendar
Location: University of Westminster (Cavendish campus)
The London meeting: how it went
● 17 attendees + ~10 remote
  ○ For 5 people it was the first direct interaction with the proposal
● Morning
  ○ Presentation Tour-de-table (30’)
  ○ R. Graciani introduction (~the same as given at the SFT meeting)
  ○ Expression of interest for the various WPs
    ■ 1 slide each with detailed explanations and some discussion
● Afternoon
  ○ Objectives definition for each WP
    ■ A lot of discussion, took all the remaining time
● Before leaving (17h15): draft schedule for writing and next virtual meetings
The London meeting: impressions
● Program too full for a 7h meeting (including lunch)
● Several parties still have to decide about participation
● Some WPs still without a well-defined list of objectives
● The meeting was useful to identify weak points, but is the time left enough (10 weeks including vacations)?
  ○ This meeting should probably have taken place earlier
● Indicated budget (10 MEur) large compared to EINFRA-1 {4,5}
  ○ Perhaps lack of required ‘political’ work
  ○ Competing with other big proposals (EUDAT, HelixNebula, …)
The London meeting: outcome
● General consensus (during and after the meeting) that a global vision and detailed list of concrete developments and their expected impact need to be clearly stated in the proposal
● Editors need to be appointed
  ○ Essentially done after the meeting
● Almost daily cross-checks from here to the draft completeness deadline (July 11th)