Economics of Cloud Computing: a Statistical Genetics Case Study

Jeremy M. R. Martin, Dale Dunlap (Univa UD), Steven J. Barrett, Steve Weston (Revolution Computing), Simon J. Thornber, Silviu-Alin Bacanu

CPA 2009, 3rd November 2009
Some Definitions

• Grid Computing: combining multiple computer resources to solve a single task, typically a scientific, technical or business problem that requires a great number of processing cycles or needs to process large amounts of data.
• Cloud Computing: a paradigm of computing in which dynamically scalable and often virtualised resources are provided as a service over the Internet.
• Genetic Association Analysis: statistical analysis of data from many patients to link a disease to a genetic mutation, potentially leading to the discovery of new medicines.
• R: a very powerful, high-level programming language for statistical analysis and data visualisation.
Project Summary

Aim: investigate the feasibility and economics of running a major genetic association analysis on external 'clouds'.

Method: set up a three-way collaboration between GSK, Revolution Computing (statistical software specialists) and Univa (cloud brokers) to run a cloud computing proof of concept (PoC) using our parallel R software.

Three strategies for cost reduction:
1. Make efficient use of external resources with Univa's scheduler and resource manager – keep the rented cloud resources busy.
2. Optimise the serial performance of the R code, using Revolution's expertise and toolset.
3. Seek out the lowest-cost cloud computing resources to run the application.
Genetic Association Analysis using Simulation (Silviu-Alin Bacanu)

• SNP association analysis: 5000 cases, 5000 controls
• Phenotype and genotype data
• Run 250,000 sub-analyses, each with a different combination of parameters and 1000 different permutations of the data
• Total CPU time: 7 years
• Elapsed time on the GSK desktop grid: ~3 days
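For concreteness, a minimal sketch of what one such sub-analysis might look like, assuming a linear-model association test per SNP with an empirical p-value from permuted phenotypes. All function and variable names here are illustrative, not the actual GSK code:

    # Illustrative sketch only: one of the 250,000 sub-analyses, assuming a
    # linear-model test of association between a SNP genotype and the
    # case/control phenotype, with an empirical p-value from permutations.
    run_subanalysis <- function(genotype, phenotype, n_perm = 1000) {
      observed <- summary(lm(phenotype ~ genotype))$coefficients[2, "t value"]
      perm_stats <- sapply(seq_len(n_perm), function(i) {
        perm_pheno <- sample(phenotype)   # permute the case/control labels
        summary(lm(perm_pheno ~ genotype))$coefficients[2, "t value"]
      })
      # Empirical p-value: fraction of permuted statistics at least as extreme
      mean(abs(perm_stats) >= abs(observed))
    }

    # Toy data: 5000 cases and 5000 controls, one simulated SNP
    phenotype <- rep(c(1, 0), each = 5000)
    genotype  <- rbinom(10000, 2, 0.3)    # minor allele count 0/1/2
    run_subanalysis(genotype, phenotype, n_perm = 100)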
GSK Desktop Grid

[Diagram: GSK desktop grid architecture – submission PC, grid job server (Linux), grid database server, global storage (SAN), and R&D desktop PCs worldwide (1500 concurrent licenses).]
Cloud Computing

[Diagram: users behind the corporate firewall reach external clouds – e.g. CERN, Amazon, Rackspace – via a trusted cloud broker such as Univa.]
Obstacles to Cloud Computing

Source: "Above the Clouds: A Berkeley View of Cloud Computing", Technical Report No. UCB/EECS-2009-28, http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html

1. Availability of Service
2. Data Lock-In
3. Data Confidentiality and Auditability
4. Data Transfer Bottlenecks
5. Performance Unpredictability
6. Scalable Storage
7. Bugs in Large Distributed Systems
8. Scaling Quickly
9. Reputation Fate Sharing
10. Software Licensing
Strategy 1: Efficient Resource Management

UniCloud works as follows:
– One Linux virtual machine (the "installer node") is created within the chosen cloud environment (e.g. Amazon EC2).
– Additional virtual machines are created programmatically using the vendor's cloud API.
– Sun Grid Engine (SGE) is installed by UniCloud and used as the batch job scheduler.
– Jobs are queued by SGE and then farmed out to different machines as they become available (see the sketch below).
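As an illustration of the job-farming step (an assumption about the workflow, not UniCloud's actual mechanism), the 250,000 sub-analyses could be queued as a single SGE array job; the wrapper script name is hypothetical:

    # Hypothetical sketch: queueing the sub-analyses with Sun Grid Engine as
    # an array job, invoked from R. 'run_subanalysis.sh' is an assumed
    # wrapper that starts R in batch mode, with the task index
    # ($SGE_TASK_ID) selecting the parameter combination.
    submit_array_job <- function(n_tasks, script = "run_subanalysis.sh") {
      # -t 1-N requests an array job; -cwd runs tasks in the current directory
      cmd <- sprintf("qsub -cwd -t 1-%d %s", n_tasks, script)
      system(cmd)
    }
    # submit_array_job(250000)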
Strategy 2: Accelerating R Code

The R programming language is a popular and productive open-source tool for statistical computing and graphics. However, R programs may take a long time to execute compared with equivalent programs written in low-level languages like C, so there have been many initiatives to make R programs run faster. These fall into three general categories:
– Task-farm parallelisation: running a single program many times in parallel with different data across a grid or cluster of computers.
– Explicit parallelisation in the R code using MPI or parallel loop constructs (e.g. R/Parallel).
– Speeding up particular R functions through improved memory handling, or through multithreaded or parallelised algorithms 'beneath the hood', e.g. REvolution R, Parallel R or SPRINT.

We are using a combination of approaches 1 and 3.
Results of Serial Optimisation of the R Code

• Using the open-source REvolution build of R improved execution time by 6%.
• Code profiling with Rprof revealed that most of the execution time was spent in the 'lm' function (and related functions) for fitting linear models, based on the QR matrix factorisation. This has not yet been optimised by Revolution Computing, but it is work in progress and should provide a further 5–10% improvement.
• A simple code transformation provided a further 20% improvement: converting a 'for' loop into a function applied with 'sapply' (see the sketch below).
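A minimal sketch of the kind of transformation described, with a placeholder loop body (the real body was the permutation analysis):

    # Hypothetical placeholder for the per-permutation analysis
    run_one_permutation <- function(i) runif(1)

    # Before: growing the results vector inside a 'for' loop forces
    # repeated copying and reallocation
    pvals <- numeric(0)
    for (i in 1:1000) {
      pvals <- c(pvals, run_one_permutation(i))
    }

    # After: hoist the loop body into a function and apply it with
    # 'sapply', which builds the result vector in one pass
    pvals <- sapply(1:1000, run_one_permutation)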
Strategy 3: Pursuit of Low-Cost Compute Cycles

Amazon Elastic Compute Cloud (EC2) Linux instances:

Instance                Memory (GB)   EC2 compute units   Storage (GB)   Architecture   Price per VM-hour
Standard small          1.7           1                   160            32-bit         $0.10
Standard large          7.5           4                   850            64-bit         $0.40
Standard extra large    15            8                   1690           64-bit         $0.80
High-CPU medium         1.7           5                   350            32-bit         $0.20
High-CPU extra large    7             20                  1690           64-bit         $0.80
Rackspace Linux Instances

Memory (MB)   Storage (GB)   Price per VM-hour
256           10             $0.015
512           20             $0.03
1024          40             $0.06
2048          80             $0.12

Association analysis code resource requirements:
• 50 MB RAM
• Negligible storage
• 15 minutes of CPU time on a modern Intel processor
• Run 250,000 times over
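A quick sanity check connects these per-job requirements to the total CPU time quoted earlier:

    # 250,000 jobs x 15 minutes each
    cpu_hours <- 250000 * 15 / 60    # 62,500 CPU-hours
    cpu_hours / (24 * 365)           # ~7.1 CPU-years, matching the "7 years" figure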
Final Results

Cloud virtual machine                                      Cost/hr   Throughput (jobs/instance/hr)   Estimated total cost of run
Amazon EC2 Standard XL (8 EC2 compute units, 15 GB RAM)    $0.80     8.25                            $24,250.00
Amazon EC2 Standard XL, following R code optimisation      $0.80     12.5                            $16,000.00
Amazon EC2 High-CPU XL (20 EC2 compute units, 7 GB RAM)    $0.80     26.82                           $7,458.33
Amazon EC2 High-CPU XL, following R code optimisation      $0.80     35.56                           $5,625.00
Rackspace 256 MB RAM                                       $0.015    12.23                           $306.72
Rackspace 256 MB RAM, following R code optimisation        $0.015    15.24                           $246.09
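The estimated totals follow directly from the hourly price and the per-instance throughput; a quick reconstruction (small differences from the table come from rounding in the quoted throughputs):

    # Total instance-hours needed is jobs / throughput; cost is that times
    # the hourly price.
    estimate_cost <- function(jobs, jobs_per_hour, price_per_hour) {
      (jobs / jobs_per_hour) * price_per_hour
    }
    estimate_cost(250000, 35.56, 0.80)    # ~$5624: High-CPU XL, optimised code
    estimate_cost(250000, 15.24, 0.015)   # ~$246: Rackspace 256 MB, optimised code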
Caveats

• This is a best-case scenario for Rackspace – performance could be adversely affected by other users on the system.
• These results are specific to a particular class of problem. Applications requiring huge amounts of data might not map so effectively to cloud computing.
• Rackspace usage is limited to 200 concurrent VMs, which gives a best execution time of about 4 days for this problem. Amazon can scale much higher.
• No cloud broker fee is included here – just the cost of the cloud cycles.
Conclusions

• Going forward, we foresee a model of cloud brokerage emerging in which a layer of middleware helps satisfy customers' constraints on utilising software services, based on factors such as cost, security or overall execution time.
• One attractive feature would be for the cloud broker to charge the customer a fixed cost for total compute cycles, rather than for virtual CPU time (see the sketch below).
  – The broker would then be taking on the risk of performance degradation on a third-party cloud.
  – They would need to build performance monitoring into their resource manager.
• If you can find a vendor with an instance that perfectly matches your workload (or if you can vary your workload to perfectly match an inexpensive instance), cloud computing can be very inexpensive.
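A hedged sketch of the contrast between the two pricing models (the per-job rate is an invented figure):

    # Under per-hour pricing, contention that halves throughput doubles the
    # customer's bill; under per-job pricing the cost is fixed and the
    # broker absorbs the degradation.
    cost_per_hour <- function(jobs, jobs_per_hour, hourly_rate) {
      (jobs / jobs_per_hour) * hourly_rate
    }
    cost_per_job <- function(jobs, job_rate) {
      jobs * job_rate
    }
    cost_per_hour(250000, 15.24, 0.015)       # nominal throughput: ~$246
    cost_per_hour(250000, 15.24 / 2, 0.015)   # degraded throughput: ~$492
    cost_per_job(250000, 0.001)               # fixed $250 either way (assumed rate)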