QoS and DLC in IaaS
INDIGO-DataCloud
Presenter: Patrick Fuhrmann
Contributions by: Giacinto Donvito, INFN; Marcus Hardt, KIT; Paul Millar, DESY; Alvaro Garcia, CSIC
With kind contributions by: Zdenek Sustr, CESNET; Shaun DEWITT, EUDAT; and many more
Content
Introducing INDIGO-DataCloud. What is the issue with QoS in storage? Which part are we trying to solve? What is our approach?
INDIGO-DataCloud, QoS and Data Life Cycle, Patrick Fuhrmann 2 15/10/2015
INDIGO-DataCloud Cheat Sheet
H2020 project, approved Jan 2015. Started April 2015, ends Sep 2017 (30 months). 26 European partners in 11 European countries, more than 11 million euros.
Objective: develop an open source platform for computing and data, deployable on public and private cloud infrastructures.
Requirements and use cases collected from 11 INDIGO communities.
For further details: http://indigo-datacloud.eu
INDIGO-DataCloud 3 22/03/2016
INDIGO-DataCloud WP structure
WP1 Management; WP2 Community requirements; WP3 Software management, pilot services; WP4 IaaS, resource virtualization; WP5 PaaS, platform; WP6 Portals and user access.
Stolen from Alvaro’s and Andrea’s presentation
WP4 in detail
Virtualized Computing Resources: full container support for cloud management infrastructures and batch; container support for special hardware (InfiniBand, GP-GPUs); spot instances; fair share scheduling.
Virtualized Storage Resources: QoS and Data Life Cycle for storage (storage management); access to data by metadata instead of namespace; dual access to data (object store versus POSIX file namespace); identity harmonization for storage.
Virtualized Network Resources: orchestrating local and federated network resources; "Software Defined Network" evaluation; services and appliances for virtual networks.
Why QoS and DLC
The EU requires all data-intensive EU projects to provide a "Data Management Plan".
Problem:
No common way to describe QoS or Data Life Cycle.
No common way to negotiate QoS with storage endpoints (except for SRM systems).
Common definitions for QoS would be very convenient in general, but they are indispensable for PaaS layers, where negotiation or brokering is done by engines (similar to hotel or flight finders).
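The "hotel or flight finder" analogy can be made concrete once endpoints describe their offers with shared property names. The following is a minimal sketch under that assumption; the `StorageOffer` record, its fields and units, the endpoint names and the `broker` function are hypothetical illustrations, not an INDIGO interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOffer:
    """Hypothetical offer advertised by one storage endpoint."""
    endpoint: str
    access_latency_s: float        # time from request to first byte, seconds
    durability: float              # assumed: probability of not losing data per year
    price_eur_per_gb_month: float


def broker(offers, max_latency_s, min_durability):
    """Keep the offers that meet both requirements, cheapest first."""
    suitable = [
        o for o in offers
        if o.access_latency_s <= max_latency_s and o.durability >= min_durability
    ]
    return sorted(suitable, key=lambda o: o.price_eur_per_gb_month)


offers = [
    StorageOffer("site-a/disk", 0.001, 0.9999, 0.02),
    StorageOffer("site-b/tape", 600.0, 0.99999999, 0.005),
    StorageOffer("site-c/ssd", 0.0001, 0.999, 0.05),
]
ranked = broker(offers, max_latency_s=1.0, min_durability=0.999)
print([o.endpoint for o in ranked])  # ['site-a/disk', 'site-c/ssd']
```

Without a common vocabulary, the filter in `broker` could not be written once for all endpoints, which is exactly the gap the slide describes.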
Description of Work for WP4
1. Define a common vocabulary for QoS storage properties and their values, based on use cases from scientific communities. Involve standardization bodies, e.g. RDA, OGF.
2. Define semantics to negotiate QoS with endpoints.
3. Find a real network protocol (prototype or demonstrator) and implement the defined QoS semantics for different systems.
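Point 2 could, for instance, reduce to a three-way answer from a single endpoint: accept, counter-offer or reject. This sketch assumes, purely for illustration, that durability is a hard requirement while access latency may be relaxed in a counter-offer; `negotiate`, `QoSClass`, `Reply` and `Verdict` are hypothetical names, not a defined protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "accept"
    COUNTER = "counter-offer"
    REJECT = "reject"


@dataclass(frozen=True)
class QoSClass:
    name: str
    access_latency_s: float
    durability: float


@dataclass(frozen=True)
class Reply:
    verdict: Verdict
    offered: Optional[QoSClass]


def negotiate(classes, max_latency_s, min_durability):
    """Answer a QoS request from the classes one endpoint supports.

    Assumed policy: durability is non-negotiable, latency is negotiable.
    """
    durable = [c for c in classes if c.durability >= min_durability]
    if not durable:
        return Reply(Verdict.REJECT, None)
    fastest = min(durable, key=lambda c: c.access_latency_s)
    if fastest.access_latency_s <= max_latency_s:
        return Reply(Verdict.ACCEPT, fastest)
    # Nothing is fast enough: propose the closest acceptable alternative.
    return Reply(Verdict.COUNTER, fastest)


endpoint = [QoSClass("disk", 0.001, 0.9999), QoSClass("tape", 600.0, 0.99999999)]
print(negotiate(endpoint, 1.0, 0.999).verdict)      # Verdict.ACCEPT
print(negotiate(endpoint, 1.0, 0.9999999).verdict)  # Verdict.COUNTER
```

The counter-offer is what distinguishes negotiation from plain lookup: the client learns what the endpoint can do instead of only hearing "no".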
Introducing part of the issue
Storage provisioning for large public infrastructures is facing two contradicting problems:
the complexity of storage and storage management;
the large variety of sciences and their diverging expectations on storage.
Infrastructure Problem
Infrastructures are growing in size of storage, in number of supported sciences and communities, and in number of direct customers accessing storage. They all have different ideas on how to use storage. Serving them in the old fashion doesn't scale any more, so you need APIs or portals to let them select what they need.
Infrastructures are used by platforms, which tend to federate resources from different locations and storage providers. So storage needs to be brokered and procured automatically (or programmatically).
Examples for Storage Complexity
Quality of Service based on media
(Table comparing five storage media; the media names were images and are not recoverable. One value per medium, in the original column order.)
Access latency: MEDIUM | HIGH | MEDIUM | LOW | MEDIUM
Durability: Not so clear | OK | Quite OK | MEDIUM | OK
Data rate: MEDIUM | OK | OK | OK | OK
Cost: Very high | Very low | Reasonable | MEDIUM | MEDIUM
Not quite as easy as that
It looks simple, but there are issues, starting with:
a) What are storage properties?
b) What are storage property values?
Storage quality properties and values
Property: Value
Access Latency: how long it takes from the request for a byte to receiving that byte.
Retention Policy: the probability of data loss.
Access Mechanisms: http, GridFTP, NFS, ….
Security: encrypted during the transfer, on disk, end-to-end.
Authentication: SAML, OpenID Connect, password, X509.
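One way to turn this property table into a machine-readable vocabulary is a typed record with enumerated values. This is a sketch under assumptions: the enum members mirror the examples listed above, while the class names, the use of seconds for latency and the expression of the retention policy as a single loss probability are choices made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMechanism(Enum):
    HTTP = "http"
    GRIDFTP = "gridftp"
    NFS = "nfs"


class Encryption(Enum):
    IN_TRANSFER = "in-transfer"
    ON_DISK = "on-disk"
    END_TO_END = "end-to-end"


class Authentication(Enum):
    SAML = "saml"
    OPENID_CONNECT = "openid-connect"
    PASSWORD = "password"
    X509 = "x509"


@dataclass(frozen=True)
class StorageQoS:
    access_latency_s: float    # request for a byte -> receiving that byte
    loss_probability: float    # retention policy expressed as P(data loss)
    access: frozenset          # set of AccessMechanism
    encryption: frozenset      # set of Encryption
    authentication: frozenset  # set of Authentication

    def __post_init__(self):
        if self.access_latency_s < 0:
            raise ValueError("access latency must be non-negative")
        if not 0.0 <= self.loss_probability <= 1.0:
            raise ValueError("loss probability must lie in [0, 1]")

    def supports(self, mechanism, auth):
        """True if a client can reach the data with this protocol and login."""
        return mechanism in self.access and auth in self.authentication


disk = StorageQoS(
    access_latency_s=0.001,
    loss_probability=1e-4,
    access=frozenset({AccessMechanism.HTTP, AccessMechanism.NFS}),
    encryption=frozenset({Encryption.IN_TRANSFER}),
    authentication=frozenset({Authentication.X509, Authentication.OPENID_CONNECT}),
)
print(disk.supports(AccessMechanism.HTTP, Authentication.X509))     # True
print(disk.supports(AccessMechanism.GRIDFTP, Authentication.X509))  # False
```

Enumerations make typos such as "gridFtp" versus "gridftp" impossible, which matters once engines rather than humans compare the values.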
How many QoS properties?
Is there a sufficiently complete set of properties?
In WLCG we only had two properties: Access Latency and Retention Policy. That was already too much for most people.
Talking to Reagan Moore (iRODS) at the Paris RDA meeting: he suggests about 200 properties. That might be a bit over the top for a start.
Even more complexity
QoS property "value ambiguity"
Property dependencies
Property quantization
Non-standard property zoo of existing systems
QoS Property Value Ambiguity
(Figure: an access-latency axis running from 1 day through 1 hour and 1 ms to 1 ns, with the typical uses archive, backup, streaming and HPC placed along it, from "cheapest" to "fastest". Annotation: high ambiguity.)
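The axis spans about fourteen orders of magnitude (1 day is roughly 8.6e4 s, 1 ns is 1e-9 s), so a word like "fast" only means something relative to one of these use cases. A numeric canonical value removes that ambiguity. The sketch below places a numeric latency on the slide's axis by picking the nearest anchor on a logarithmic scale; the nearest-anchor rule is an assumption for illustration, not part of any proposed standard.

```python
import math

# Anchors read off the slide's access-latency axis, in seconds.
ANCHORS = {
    "archive": 86_400.0,  # 1 day
    "backup": 3_600.0,    # 1 hour
    "streaming": 1e-3,    # 1 ms
    "HPC": 1e-9,          # 1 ns
}


def nearest_use_case(latency_s):
    """Place a numeric access latency on the axis: nearest anchor in log10."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    x = math.log10(latency_s)
    return min(ANCHORS, key=lambda name: abs(math.log10(ANCHORS[name]) - x))


for latency in (2e5, 600.0, 0.01, 1e-7):
    print(latency, nearest_use_case(latency))
```

A logarithmic distance is used because a 10-minute latency is "close to" one hour for an archive user, even though it is 10^11 times slower than an HPC expectation.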
Property dependencies
(Figure: durability plotted against access latency.)
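The figure suggests that durability and access latency cannot be chosen independently: an offering typically gains one at the expense of the other. A standard way to expose such a trade-off is the Pareto front of the offered classes, i.e. the classes that no other class beats on both axes. The sketch computes it for hypothetical (latency, durability) pairs; the numbers are placeholders.

```python
def dominates(a, b):
    """a dominates b: no slower, no less durable, and not identical."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def pareto_front(points):
    """Return the (latency_s, durability) pairs no other pair dominates."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


offered = [
    (0.001, 0.9999),      # fast, moderately durable
    (600.0, 0.99999999),  # slow, very durable
    (0.01, 0.999),        # slower AND less durable than the first entry
    (3600.0, 0.9999999),  # slower AND less durable than the second entry
]
print(pareto_front(offered))  # [(0.001, 0.9999), (600.0, 0.99999999)]
```

Only classes on the front are worth advertising as separate choices; the others are strictly worse options.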
Property Quantization
(Figure: a multi-dimensional property space with axes such as cost and access latency; offerings like S3 and Glacier appear as discrete points rather than a continuum.)
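Quantization means a request is served by the next discrete class that satisfies it, often with a large overshoot in quality and price. This minimal sketch assumes two hypothetical classes, "hot" and "cold", loosely in the spirit of the S3 versus Glacier pair; all values are placeholders, not vendor figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoSClass:
    name: str
    latency_s: float
    durability: float
    cost: float  # arbitrary cost units per GB and month (placeholder)


def quantize(request_latency_s, request_durability, classes):
    """Snap a continuous QoS request onto the cheapest class that meets it.

    Returns None if no class is good enough in every dimension.
    """
    candidates = [
        c for c in classes
        if c.latency_s <= request_latency_s and c.durability >= request_durability
    ]
    return min(candidates, key=lambda c: c.cost, default=None)


CLASSES = [
    QoSClass("hot", 0.01, 0.99999999999, 10.0),
    QoSClass("cold", 4 * 3600.0, 0.99999999999, 1.0),
]
# "About 5 minutes" cannot be served by 'cold' (hours), so the request is
# quantized up to the ten times more expensive 'hot' class.
print(quantize(300.0, 0.9999, CLASSES).name)    # hot
print(quantize(86400.0, 0.9999, CLASSES).name)  # cold
```

The gap between what was asked (5 minutes) and what is delivered (10 ms) is the price of quantization; a richer set of classes shrinks it.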
Properties zoo of existing systems
Amazon: S3, Glacier
Google: Standard, Durable Reduced Availability, Nearline
HPSS/GPSS: corresponds to the HPSS classes (customizable)
dCache: disk+tape, Resilient, TAPE
Time to tidy up!
Starting with the unambiguous technical view, as seen by the storage system: Canonical Properties.
What are canonical properties?
(Table: three example classes, Class A, Class B and Class C, described by canonical properties. The assignment of values to individual classes was lost in extraction; the recovered values are listed per property.)
Access latency: < 1 ms; < 10 min
Durability: 0.99999999; < 0.9999; ******
Media: Disk / SSD; Tape
Replicas: 1 disk; 2 tape
Price: 10 E/m/GB; 20 E/m/GB
!!! For EUDAT, those "Classes" are close to their "Services" !!!
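The replica row and the durability row are linked by simple arithmetic if one assumes independent failures: with n replicas, each lost with probability p over some period, all copies are lost with probability p^n. The sketch below shows that relation; the per-replica loss probability, the independence assumption and the absence of repair are all assumptions for illustration, not how any of the classes above was derived.

```python
def durability(per_replica_loss, replicas):
    """Probability that at least one of `replicas` independent copies survives.

    Assumes replica failures are independent and that lost replicas are not
    repaired within the period considered.
    """
    if not 0.0 <= per_replica_loss <= 1.0:
        raise ValueError("per_replica_loss must lie in [0, 1]")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - per_replica_loss ** replicas


def expected_losses(n_files, per_replica_loss, replicas):
    """Expected number of files whose replicas are all lost."""
    return n_files * per_replica_loss ** replicas


p = 1e-4  # assumed loss probability of a single replica per year
print(f"{durability(p, 1):.8f}")                    # 0.99990000
print(f"{durability(p, 2):.8f}")                    # 0.99999999
print(f"{expected_losses(1_000_000, p, 1):.2f}")    # 100.00
print(f"{expected_losses(1_000_000, p, 2):.2f}")    # 0.01
```

Under these assumptions a second replica turns 100 expected lost files per million into 0.01; correlated failures (same site, same tape batch) would make the real gain smaller.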
How to get …
So, after having defined canonical storage properties and their values, how do we get them out of existing storage systems?
Canonical Storage Properties
(Diagram: the storage systems dCache, StoRM and EOS are slightly extended with an information provider (internal component), which feeds the canonical storage property information access.)
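The internal variant can be pictured as a common interface that each storage system implements from the inside, plus one access point that merges the answers. In this sketch, `InformationProvider`, `BuiltInProvider` and `publish` are hypothetical names and JSON as the exchange format is an assumption; no real dCache, StoRM or EOS interface is implied.

```python
import json
from abc import ABC, abstractmethod


class InformationProvider(ABC):
    """Hypothetical common interface: every storage system answers one question."""

    @abstractmethod
    def canonical_classes(self):
        """Return the QoS classes, described with canonical property names only."""


class BuiltInProvider(InformationProvider):
    """Internal component living inside a (slightly extended) storage system."""

    def __init__(self, system, classes):
        self.system = system
        self._classes = classes

    def canonical_classes(self):
        # Tag every class with the system that offers it.
        return [dict(c, system=self.system) for c in self._classes]


def publish(providers):
    """Single canonical access point: merge all providers into one JSON document."""
    merged = [c for p in providers for c in p.canonical_classes()]
    return json.dumps(merged, sort_keys=True)


provider = BuiltInProvider(
    "example-se",
    [{"name": "disk", "access_latency_s": 0.001, "replicas": 1}],
)
print(publish([provider]))
```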
Canonical Storage Properties
(Diagram: storage systems such as HPSS, GPSS, Google and Amazon only expose proprietary storage property info; a canonical storage property information plug-in (external component) translates it for the canonical storage property information access.)
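For systems that cannot be modified, the plug-in acts as an adapter that maps proprietary class names onto canonical properties. A minimal sketch, assuming the plug-in operator supplies the translation table and a callable that lists the native class names; `ExternalPlugin`, the system name and all class names and values are hypothetical.

```python
class ExternalPlugin:
    """Hypothetical adapter for a system that only exposes proprietary class names.

    The storage system itself is not modified; the operator of the plug-in
    supplies both the listing callable and the translation table.
    """

    def __init__(self, system, list_native_classes, translation):
        self.system = system
        self._list_native_classes = list_native_classes  # () -> iterable of names
        self._translation = translation                  # native name -> canonical dict

    def canonical_classes(self):
        result = []
        for native in self._list_native_classes():
            props = self._translation.get(native)
            if props is None:
                continue  # unknown proprietary class: skip rather than guess its QoS
            result.append(dict(props, system=self.system, native_name=native))
        return result


plugin = ExternalPlugin(
    "example-archive",
    lambda: ["GOLD", "ARCHIVE", "EXPERIMENTAL"],
    {
        "GOLD": {"access_latency_s": 0.01, "replicas": 2},
        "ARCHIVE": {"access_latency_s": 3600.0, "replicas": 2},
    },
)
for c in plugin.canonical_classes():
    print(c["native_name"], c["access_latency_s"])
```

Skipping unknown classes keeps the plug-in from advertising QoS it cannot vouch for; an alternative would be to report them with explicitly missing values.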
Customer View
The canonical view only helps to describe the system on the technical level. It is not very helpful for the storage end user. We need to introduce more convenient QoS views.
QoS views
Examples of how a user would describe his/her needs:
Low latency & lowest price
Highest possible throughput & short term
Scratch & very cheap
Long term storage & price not important
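Each of these views can be read as a filter plus a preference order over the canonical classes. The sketch below encodes the four example views that way; the thresholds, the property set and the three classes are illustrative assumptions, and "short term" and "scratch" are interpreted loosely as "durability does not matter".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalClass:
    name: str
    latency_s: float
    throughput_mb_s: float
    durability: float
    price: float  # per GB and month, arbitrary units


# view -> (filter on a class, ranking key; the smallest key wins)
VIEWS = {
    "low latency & lowest price": (lambda c: c.latency_s <= 0.01, lambda c: c.price),
    # "short term": durability is assumed not to matter, so only speed counts.
    "highest possible throughput & short term": (lambda c: True, lambda c: -c.throughput_mb_s),
    # "scratch": assumed to need interactive access (<= 1 s), durability irrelevant.
    "scratch & very cheap": (lambda c: c.latency_s <= 1.0, lambda c: c.price),
    "long term storage & price not important": (lambda c: True, lambda c: -c.durability),
}


def resolve(view, classes):
    """Translate a user-level QoS view into the name of one canonical class."""
    keep, rank = VIEWS[view]
    matches = [c for c in classes if keep(c)]
    return min(matches, key=rank).name if matches else None


CLASSES = [
    CanonicalClass("ssd", 0.0001, 2000.0, 0.999, 20.0),
    CanonicalClass("disk", 0.005, 500.0, 0.9999, 10.0),
    CanonicalClass("tape", 600.0, 300.0, 0.99999999, 1.0),
]
for view in VIEWS:
    print(f"{view}: {resolve(view, CLASSES)}")
```

Keeping the views as data rather than code paths lets an infrastructure add a new user-facing view without touching the canonical layer underneath.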