UNIBUS: ASPECTS OF HETEROGENEITY AND FAULT TOLERANCE IN CLOUD COMPUTING

Magdalena Slawinska, Jaroslaw Slawinski, Vaidy Sunderam
{magg, jaross, vss}@mathcs.emory.edu
Emory University, Dept. of Mathematics and Computer Science, Atlanta, GA, USA

Atlanta, Georgia, April 19, 2010, in conjunction with IPDPS 2010
Creating a problem

1. What do I want? Execute an MPI application.
2. What do I need? Target resource: an MPI cluster. FT services: Checkpoint, Heartbeat.
3. What do I have? Access to the Rackspace cloud.
4. Why might I want FT on cloud? Reliability; to reduce costs (money, time, energy, ...).
5. What is the overhead introduced by FT?
6. Can I do that? How?
Problem

User's requirements:
- Execute MPI software
- Target resource: MPI cluster
- Target platform: FT-flavor

User's resources: the Rackspace cloud (credentials); other candidate resources include the EC2 cloud and workstations.

Manual resource transformation:
- Interaction with the web page
- Prepare the image: install the required software and dependencies
- Instantiate servers
- Configure passwordless authentication
- ...

Roughly 1 man-hour for 16+1 nodes.
Unibus: a resource orchestrator

User's requirements:
- Execute MPI software
- Target resource: MPI cluster
- Target platform: FT-flavor

User's resources: the Rackspace cloud (credentials).

[Diagram: Unibus sits between the user and the available resources (Rackspace cloud, EC2 cloud, workstations) and produces the target resource automatically]
Outline

- Unibus: an infrastructure framework for orchestrating resources
  - Resource access virtualization
  - Resource provisioning
- Unibus: an FT MPI platform on demand
  - Automatic assembly of an FT MPI-enabled platform
  - Execution of an MPI application on the Unibus-created FT MPI-enabled platform
  - Discussion of the FT overhead
Unibus resource sharing model

                                           Traditional Model                        Proposed Model
  Resource exposition                      Virtual Organization (VO)                Resource provider
  Resource usage                           Determined by the VO                     Determined by a particular resource provider
  Resource virtualization and aggregation  Resource providers belonging to the VO   Software at the client side
Handling heterogeneity in Unibus

- Resources are exposed in an arbitrary manner as access points (daemons, libraries implementing access protocols)
- Unibus uses:
  - the Capability Model to implement abstract operations available on providers' resources
  - Mediators to implement the specifics of access points
  - a Knowledge engine to infer relevant facts

[Diagram: user, Unibus (Capability Model, engine, mediators), network, resources' access points]
Complicating a big picture ...

Resources remain exposed in an arbitrary manner as access points. The Unibus access device comprises:
- the Capability Model, implementing abstract operations on resources
- Mediators, implementing the specifics of access points (access protocols)
- a Knowledge engine, inferring relevant facts
- Resource descriptors, describing resources semantically (OWL-DL)
- Services (standard and third-party), e.g., heartbeat, checkpoint, resource discovery, etc.
- Metaapplications, orchestrating the execution of applications on relevant resources
Virtualizing access to resources: Capability Model and mediators

Capability Model:
- Provides virtually homogenized access to heterogeneous resources
- Specifies abstract binding operations, grouped in interfaces
- An interface hierarchy is not appropriate (e.g., fs:ITransferable and ssh:ISftp)

Mediators:
- Implement resource access point protocols (e.g., an Ssh access point)

[Diagram: mediators implement access-point details for the Rackspace cloud, a cluster, and workstations]
Virtualizing access to resources

- The ISsh interface specifies the abstract operations shell, exec, and subsystem
- The Ssh Mediator implements them as invoke_shell, exec_command, get_subsystem (e.g., sftp), ...
- The mediator is compatibleWith resources that expose an sshd access point, e.g., a workstation (a sketch of such a mediator follows)
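A minimal sketch of what such a mediator might look like in Python, mapping ISsh's abstract operations onto the paramiko SSH library; the class and method names are illustrative, not the actual Unibus sources:

```python
# Illustrative Ssh mediator sketch: maps the abstract ISsh operations
# (shell, exec, subsystem) onto a concrete access protocol via the
# paramiko SSH library. Names are hypothetical, not the Unibus code.
import paramiko

class SshMediator:
    """Implements the ISsh interface against an sshd access point."""

    def __init__(self, host, username, key_filename):
        self._client = paramiko.SSHClient()
        self._client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self._client.connect(host, username=username,
                             key_filename=key_filename)

    def shell(self):
        # ISsh 'shell' -> paramiko's interactive channel
        return self._client.invoke_shell()

    def exec(self, command):
        # ISsh 'exec' -> run a single remote command, return its stdout
        _stdin, stdout, _stderr = self._client.exec_command(command)
        return stdout.read().decode()

    def subsystem(self, name):
        # ISsh 'subsystem' -> e.g., the sftp subsystem for file transfer
        if name == "sftp":
            return self._client.open_sftp()
        raise NotImplementedError(name)
```

For example, SshMediator("emily", "user", "~/.ssh/id_rsa").exec("uname -a") would exercise the exec operation on the workstation emily.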
Knowledge engine

- The mediator's developer contributes a knowledge set: the Ssh Mediator implements the ISsh interface (invoke_shell for shell, exec_command for exec, get_subsystem for subsystem, ...) and is compatibleWith some Resource that hasOS some Linux and hasAccessPoint some OpenSshD
- A resource descriptor states the provider's facts, e.g., resource emily hasOS Linux and hasAccessPoint OpenSshD
- When the user requests the ISsh interface, the Knowledge Engine infers that emily is a compatible resource (a toy illustration follows)
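A toy illustration, in plain Python dictionaries rather than OWL-DL or the real engine, of the kind of matching the knowledge engine performs:

```python
# Toy illustration (plain dictionaries, not OWL-DL or the real engine)
# of the inference: find a (mediator, resource) pair able to serve a
# requested interface, given the developer's and provider's facts.
mediators = {
    "SshMediator": {
        "implements": "ISsh",
        "compatibleWith": {"hasOS": "Linux", "hasAccessPoint": "OpenSshD"},
    },
}
resources = {
    "emily": {"hasOS": "Linux", "hasAccessPoint": "OpenSshD"},
}

def compatible(interface):
    """Yield (mediator, resource) pairs that can serve `interface`."""
    for m_name, m in mediators.items():
        if m["implements"] != interface:
            continue
        for r_name, facts in resources.items():
            if all(facts.get(p) == v
                   for p, v in m["compatibleWith"].items()):
                yield m_name, r_name

print(list(compatible("ISsh")))   # -> [('SshMediator', 'emily')]
```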
Composite operations

- Example: ISimpleCloud specifies addhosts and deletehosts; IRackspace specifies create_server and delete_server
- create_server is implemented by the Rackspace (RS) Mediator
- The composite operation rs_addhosts (a.k.a. the addhosts definition, defined in ISimpleCloud_RS.py) dependsOn create_server and implements addhosts; so the RS mediator also implements addhosts, with rs_addhosts as the entry point
- Composite operations:
  - Dynamically expand a mediator's operations
  - May result in classification of mediators and compatible resources to new interfaces
Resource access unification via composite operations

- The unified interface ISimpleCloud (addhosts, deletehosts) eliminates the need for standardization
- IRackspace (create_server, delete_server) is implemented by the Rackspace Mediator; the composite operation rs_addhosts provides addhosts on top of it
- IEC2 (run_instance, ...) is implemented by the EC2 Mediator; the composite operation ec2_addhosts provides addhosts on top of it
- Different resources (Rackspace cloud, EC2 cloud), yet semantically similar (a sketch of a composite operation follows)
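A hedged sketch, in Python, of how a composite operation could unify access; the names rs_addhosts and create_server follow the slides, but the surrounding scaffolding is invented for illustration:

```python
# Hedged sketch of a composite operation (scaffolding invented for
# illustration): ISimpleCloud.addhosts expressed in terms of the
# provider-specific primitive IRackspace.create_server.
class RackspaceMediator:
    """Stands in for the mediator that implements IRackspace."""
    def create_server(self, image, flavor):
        print(f"creating Rackspace server (image={image}, flavor={flavor})")
        return {"image": image, "flavor": flavor}

def rs_addhosts(mediator, count, image="debian-5.0", flavor="256MB"):
    """Composite operation: addhosts dependsOn create_server.

    Because the mediator implements create_server, attaching this
    composite also makes it implement ISimpleCloud.addhosts, which is
    how the engine can classify it under the new interface.
    """
    return [mediator.create_server(image, flavor) for _ in range(count)]

hosts = rs_addhosts(RackspaceMediator(), count=3)
```

An ec2_addhosts composite built on IEC2.run_instance would play the same role for the EC2 Mediator.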
Resource provisioning: homogenizing resource heterogeneity

- Conditioning increases resource specialization levels (a sketch follows the list)
- Soft conditioning changes a resource's software capabilities; e.g., installing MPI enables the execution of MPI apps
- Successive conditioning enhances a resource's capabilities in terms of available access points, and may use soft conditioning; e.g., deploying the Globus Toolkit makes the resource accessible via Grid protocols
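As a concrete illustration, a soft-conditioning step might boil down to running package installation over an existing ssh access point. This sketch assumes Debian-era package names and a remote-exec callable like the SshMediator.exec above; it is not the Unibus provisioning code:

```python
# Hypothetical soft-conditioning helper: install MPI over ssh so the
# resource gains the capability to execute MPI applications.
def soft_condition_mpi(ssh_exec):
    """ssh_exec: a callable that runs a command on the resource and
    returns its output (e.g., SshMediator.exec from the earlier sketch)."""
    for cmd in (
        "apt-get update",
        # Debian 5.0-era package names assumed:
        "apt-get install -y openmpi-bin libopenmpi-dev",
        "mpirun --version",   # verify the freshly added capability
    ):
        print(ssh_exec(cmd))

if __name__ == "__main__":
    soft_condition_mpi(lambda cmd: f"(dry run) {cmd}")  # offline demo
```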
Transforming Rackspace into an FT-enabled MPI platform

User's requirements:
- Execute software: NAS Parallel Benchmarks (NPB)
- Target resource: MPI cluster
- FT services: Heartbeat, Checkpoint

[Diagram: the user hands Unibus their credentials, a Rackspace descriptor, and a metaapplication; Unibus applies soft conditioning, successive conditioning, and composite operations to the Rackspace cloud, yielding an FT MPI cluster and the NPB logs]
Rackspace cloud to MPI cluster

Each step obtains a higher level of abstraction:
- Creating a new group of resources (Rackspace ssh-enabled servers) in terms of new access points
- Deployment of MPI on the new resources
- Installing other services (FT)
Metaapplications

User's requirements:
- Execute software: NAS Parallel Benchmarks (NPB)
- Target resource: MPI cluster
- FT services: Heartbeat, Checkpoint

[Diagram: the metaapplication runs on the Unibus access device, on top of the services, Capability Model, engine, mediators, and resource descriptors, reaching the resources over the network]
Metaapplication

The metaapplication:
- Requests IClusterMPI and the FT services IHeartbeat and ICheckpointRestart
- Specifies the available resources
- Performs the benchmarks
- Transfers the benchmark execution logs to the head node (requests ISftp); a sketch follows
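A self-contained, hedged sketch of such a metaapplication. The Unibus API itself is not shown on the slides, so the Unibus/request scaffolding here is a stand-in; only the interface names (IClusterMPI, IHeartbeat, ICheckpointRestart, ISftp) come from the slides:

```python
# Hedged metaapplication sketch; StubUnibus/StubCluster stand in for
# the real Unibus runtime, which the slides do not show.
class StubCluster:
    head_node = "10.0.0.1"          # hypothetical head-node address
    def run(self, command):
        print(f"[cluster] {command}")

class StubUnibus:
    def request(self, interface, **spec):
        print(f"[unibus] satisfying {interface} with {spec}")
        return StubCluster()

def metaapplication(unibus):
    # 1. Request an MPI cluster together with the FT services.
    cluster = unibus.request("IClusterMPI",
                             services=["IHeartbeat", "ICheckpointRestart"])
    # 2. Perform the benchmarks (NPB BT, class B, 64 processes assumed).
    cluster.run("mpirun -np 64 bt.B.64")
    # 3. Request ISftp to transfer execution logs to the head node.
    unibus.request("ISftp", resource=cluster.head_node)
    # ... the actual log transfer would follow here ...

metaapplication(StubUnibus())
```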
Rackspace testbed

- 16 working nodes (WN) + 1 head node (HN); private IPs
- Node: 4-core, 64-bit AMD 2GHz; WN: 1GB RAM, 40GB HDD; HN: 256MB RAM, 10GB HDD
- Debian 5.0 (Lenny), OpenMPI v1.3.4, GNU suite v4.3.2 (gcc, gfortran)
- NAS Parallel Benchmarks v3.3, class B

FT setup:
- Heartbeat service: OpenMPI-based; in case of failure, the service determines the failed node(s) and raises an exception
- Checkpoint/restart service: DMTCP (Distributed MultiThreaded CheckPointing), user-level transparent checkpointing
  - dmtcp_coordinator runs on the HN; the MPI job is launched under the checkpointer: dmtcp_checkpoint -j -h headNode_privateIP mpirun ...
  - dmtcp_command is executed every 60 secs on the HN to checkpoint 81 processes (64 MPI processes, 16+1 OpenMPI supervisor processes)
  - Local checkpoint files are moved from the WN to the HN (in parallel)
  - Checkpoint time: ~5 sec; moving checkpoints from WN to HN: less than 10 sec; compressed checkpoint size: ca. 1GB
- (a sketch of the checkpoint driver follows)
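The checkpoint cycle described above could be driven by a small script along these lines; this is a sketch, not the authors' code: the worker IPs and checkpoint paths are assumptions, and it presumes DMTCP's dmtcp_command is on the PATH:

```python
# Sketch of the periodic checkpoint driver described above (not the
# authors' code): every 60 s ask the DMTCP coordinator for a global
# checkpoint, then pull the images from the worker nodes in parallel.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS = [f"10.0.0.{i}" for i in range(2, 18)]   # 16 WN, IPs assumed

def fetch(worker):
    # Passwordless ssh is already configured, so scp works directly;
    # the remote shell expands the checkpoint-image glob.
    subprocess.run(["scp", f"{worker}:/tmp/ckpt_*.dmtcp", "/ckpt/"],
                   check=False)

while True:
    time.sleep(60)
    # dmtcp_command contacts the coordinator running on the head node
    # and requests a checkpoint of all connected processes.
    subprocess.run(["dmtcp_command", "--checkpoint"], check=True)
    with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
        list(pool.map(fetch, WORKERS))
```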
Results: NPB class B on Rackspace (DMTCP, OpenMPI, Heartbeat)

- 16 Worker Nodes (WN) + 1 Head Node
- WN: 4-core, 64-bit AMD Opteron 2GHz, 1GB RAM, 40GB HDD
- Checkpoints every 60 sec; results averaged over 8 series
- FT overhead: 2%-10%

[Chart: NPB execution times with and without the FT services; HB = Heartbeat]
Summary

- The Unibus infrastructure framework provides:
  - Virtualization of access to various resources
  - Automatic resource provisioning
- Innovatively used to assemble an FT MPI execution platform on cloud resources
  - Reduces the user's effort to a bare minimum (server instantiation, etc.): 15-20 min instead of 1 man-hour
  - Observed FT overhead of 2%-10% (at least 8% was expected)
- Future work:
  - Migration and restart of MPI-based computations across two different clouds, or a cloud and a local cluster
  - Work with an MPI application