Availability Knob: Flexible User-Defined Availability in the Cloud
Mohammad Shahrad and David Wentzlaff
October 5, 2016

IaaS Providers and Availability Guarantees
One thing in common: fixed 99.95% availability!

What’s wrong with fixed availability?
Cloud infrastructures:
• Heterogeneous HW & SW reliability
Cloud customers:
• Various downtime demands
• Different WTP*
* WTP = Willingness to Pay

The Availability Knob (AK)
Let’s have clients ask for their desired availability and be charged correspondingly.

What should change in the cloud to support AK?
• Service Level Agreements (SLAs)
• Cloud scheduler: availability-aware scheduling
• Cloud management: gathering failure data and building failure statistics

How do SLAs look with AK?
1. Desired availability / period (e.g. 99.8% / 7 days)
2. Availability price scale, e.g. (99.95%, 1.00), (99.9%, 0.95)
3. Variable service credit (penalty)
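The first two SLA terms can be made concrete with a little arithmetic. The sketch below is illustrative only: the class name `AvailabilitySLA`, its fields, and the step-wise price lookup (charging the cheapest listed tier that still meets the request) are assumptions, not something the slides specify. For 99.8% over 7 days, the allowed downtime is 10080 min × 0.002 = 20.16 min.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLA:
    """Hypothetical container for the first two AK SLA terms."""
    desired_availability: float  # e.g. 0.998 for 99.8%
    period_days: float           # e.g. 7
    price_scale: tuple           # ((availability, relative price), ...)

    def allowed_downtime_minutes(self) -> float:
        """Downtime budget implied by the desired availability and period."""
        total_minutes = self.period_days * 24 * 60
        return total_minutes * (1.0 - self.desired_availability)

    def relative_price(self) -> float:
        """Assumed lookup: cheapest listed tier that meets the request."""
        eligible = [price for avail, price in self.price_scale
                    if avail >= self.desired_availability]
        if not eligible:
            raise ValueError("no price tier covers the requested availability")
        return min(eligible)


if __name__ == "__main__":
    sla = AvailabilitySLA(0.998, 7, ((0.9995, 1.00), (0.999, 0.95)))
    print(round(sla.allowed_downtime_minutes(), 2), sla.relative_price())
```

A real price scale would presumably list a tier for every availability level on offer; the two tiers here are just the examples from the slide.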

The AK Scheduler
Inputs: PM* Failure DB, Service Record DB
1. Check for available resources
2. Find the cheapest resource considering possible penalties using:
• Expected PM time-to-next-failure
• VM size and expected DT** length in case of failure
• User’s experienced vs. requested DT
* PM = Physical Machine
** DT = Downtime
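The two scheduler steps can be sketched as a small cost minimization. This is a minimal sketch under stated assumptions, not the authors' scheduler: the `Machine` and `Request` types, the flat penalty, and the rule "a failure is expected if the time-to-next-failure falls inside the remaining period" are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    name: str
    hourly_cost: float            # assumed provider cost per slot-hour
    free_slots: int
    hours_to_next_failure: float  # expected PM time-to-next-failure


@dataclass
class Request:
    vm_slots: int                 # VM size
    remaining_hours: float        # time left in the SLA period
    allowed_dt_hours: float       # requested DT budget
    experienced_dt_hours: float   # DT already experienced this period
    dt_per_failure_hours: float   # expected DT length if the PM fails
    penalty: float                # assumed flat service credit on a miss


def expected_cost(m: Machine, r: Request) -> float:
    """Hosting cost plus the penalty if a failure would break the SLO."""
    base = m.hourly_cost * r.vm_slots * r.remaining_hours
    fails = m.hours_to_next_failure < r.remaining_hours
    projected_dt = r.experienced_dt_hours + (r.dt_per_failure_hours if fails else 0.0)
    return base + (r.penalty if projected_dt > r.allowed_dt_hours else 0.0)


def pick_machine(machines: List[Machine], r: Request) -> Optional[Machine]:
    """Step 1: filter by capacity. Step 2: take the cheapest expected cost."""
    candidates = [m for m in machines if m.free_slots >= r.vm_slots]
    return min(candidates, key=lambda m: expected_cost(m, r), default=None)
```

With a large penalty the more reliable (and more expensive) machine wins; with a small one the cheap machine is chosen even though a miss is expected, which is the trade-off the slide describes.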

AK-Specific Scheduler Features
Extra knowledge of user availability demand enables new scheduling features:
• Benign VM* Migration (BVM)
• Deliberate Downtimes (DDT)
* VM = Virtual Machine

Benign VM Migration (BVM)
• VMs can be over-served:
  • Low failure rate
  • Assignment to HR resources (resource shortfall)
• Periodic migration of over-served VMs to cheaper resources
* DTF = Downtime Fulfillment
** SLO = Service Level Objective
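Picking the VMs to migrate can be sketched as a ranking problem. The code below assumes that downtime fulfillment (DTF) is the fraction of the allowed downtime a VM has already experienced, so a low DTF means the VM is over-served; that definition, the function names, and the dictionary layout are assumptions for illustration. The default of 10% mirrors the "top 10% of over-served clients" setting used in the evaluation slides.

```python
def downtime_fulfillment(experienced_dt: float, allowed_dt: float) -> float:
    """Assumed DTF: share of the downtime budget already used (0 = untouched)."""
    if allowed_dt <= 0:
        raise ValueError("allowed downtime must be positive")
    return experienced_dt / allowed_dt


def select_for_bvm(vms: dict, percent: int = 10) -> list:
    """Return the most over-served VMs (lowest DTF), rounded up to `percent`.

    `vms` maps a VM name to (experienced_dt, allowed_dt) in the same unit.
    """
    ranked = sorted(vms, key=lambda name: downtime_fulfillment(*vms[name]))
    count = (len(ranked) * percent + 99) // 100  # integer ceiling
    return ranked[:count]
```

A periodic job (hourly in the evaluation setup) could call `select_for_bvm` and move the returned VMs to cheaper machines.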

Deliberate Downtimes (DDT)
• Providers can deliberately fail VMs near the end of the period.
• Motivations:
  • Bidding redeemed resources
  • Building market incentives
  • Lowering energy consumption
  • etc.
[Figure: delivered availability vs. requested availability, separated by a safety margin]
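The room a provider has for deliberate downtime follows from the gap between delivered and requested availability. The sketch below assumes the safety margin is expressed in availability points added on top of the request; the function name and that interpretation are hypothetical.

```python
def deliberate_downtime_budget(period_hours: float,
                               requested_avail: float,
                               experienced_dt_hours: float,
                               safety_margin: float) -> float:
    """Hours of downtime that could still be imposed near the end of the
    period while delivering at least requested_avail + safety_margin."""
    target = min(1.0, requested_avail + safety_margin)
    allowed_hours = period_hours * (1.0 - target)
    return max(0.0, allowed_hours - experienced_dt_hours)
```

For example, a 168-hour period at 99% requested availability with a 0.5-point margin allows 168 × 0.005 = 0.84 hours of total downtime, so a VM that has already been down for 0.5 hours leaves about 0.34 hours of budget.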

Economics of AK
• How to set prices to ensure mutual benefit?
• How does AK make money?

Incentive Compatibility
Providers can:
- neglect meeting SLOs*
- cause deliberate DTs**
Clients may:
- run buggy VMs
Pricing for incentive compatibility: using game theory to ensure that
- Providers maximize profit margin by not violating SLOs
- Clients pay less by asking for their true demands
* SLO = Service Level Objective
** DT = Downtime

How does AK make money?
1. Adapting service to real demand: higher market efficiency through supply chain flexibility
2. More efficient resource utilization: lowering OpEx, extra bidding/sprinting
3. Variable profit margins: compensates risks & supply/demand disparity
~10% cost reduction, ~20% profit increase

AK Deployment
• No hardware change required
• Low technology adoption cost
• Existing fixed availability is a subset of AK
• Can be offered as an optional feature
• Easy shift to the new model

How to evaluate AK?
• Infrequency of failures
• Accelerated testing
• Simulations
• Data center scale
1. Stochastic simulations in MATLAB [1]
2. Prototype implementation with OpenStack
[1] http://gdkomeg.en.made-in-china.com/productimage

AKSim: Stochastic Cloud Simulator
• Various machine types
• Scalability (cost/resilience trade-off)
• Resolution/accuracy trade-off
• Diverse applications
• Multiple VMs
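To give a feel for what a stochastic availability simulation involves, here is a toy Monte Carlo sketch in Python. It is not AKSim (which the slides say is written in MATLAB) and models only a single machine; the exponential failure and repair times and the parameter names `mtbf_h` and `mttr_h` are assumptions.

```python
import random


def simulate_availability(mtbf_h: float, mttr_h: float,
                          period_h: float, rng: random.Random) -> float:
    """Delivered availability of one machine over one SLA period."""
    t = 0.0
    down = 0.0
    while True:
        t += rng.expovariate(1.0 / mtbf_h)      # uptime until the next failure
        if t >= period_h:
            break
        repair = rng.expovariate(1.0 / mttr_h)  # length of this outage
        down += min(repair, period_h - t)       # clip at the period end
        t += repair
        if t >= period_h:
            break
    return 1.0 - down / period_h


def miss_rate(requested: float, mtbf_h: float, mttr_h: float,
              period_h: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated periods that deliver less than `requested`."""
    rng = random.Random(seed)
    misses = sum(
        simulate_availability(mtbf_h, mttr_h, period_h, rng) < requested
        for _ in range(trials)
    )
    return misses / trials
```

Sweeping `requested` over a range of availability levels with a fixed seed shows how the miss rate grows as clients ask for more nines on the same hardware.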

OpenStack AK Prototype

Availability-aware Scheduler
Setup: 1000 machines, 12000 users, normal demand dist., 6 months. BVM every 1 hr for top 10% of over-served clients.

Benign VM Migration (BVM)
• ~7% cost reduction
• Increased miss rate: 0.19% -> 0.34%
• Benefits of BVM depend on machine type blend and data-center utilization.
Setup: 1000 machines, 12000 users, uniform demand dist. [3 nines, 5 nines], 30 days. BVM every 1 hr for top 10% of over-served clients.

Deliberate Downtimes (DDT)
• Benefits of DDT depend on demand distribution.
Setup: 1000 machines, 12000 users, normal demand dist. [3 nines, 5 nines], 6 months. BVM every 1 hr for top 10% of over-served clients.

Improved Service Satisfaction
[Figure: AK satisfaction vs. fixed-availability satisfaction; axes: downtime, price]
* WTP = Willingness to Pay

Things to Remember
• Supply chain flexibility -> market efficiency
• Knowing user demand can enable new techniques
• Game theory to ensure mutual economic incentive
• Leveraging reliability/cost trade-offs

The Availability Knob
Mohammad Shahrad
mshahrad@princeton.edu

Back-up Slides

What if the client’s demand changes?
The client must have an incentive to change the plan.
Upper bound of SC given arbitrary P.
Plan update condition (price paid under each option):
• No change; fixed $A_1$: $P_{A_1}$
• Deliberate failures by user to earn cash back: $P_{A_1} - SC_{A_1}(\alpha A_1 + (1-\alpha) A_2)$
• Change to $A_2$: $\alpha P_{A_1} + (1-\alpha) P_{A_2}$
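One way to read the three prices on this slide is as a bound on the service credit. The following is a sketch under assumptions the slide does not state explicitly: $\alpha$ is taken to be the fraction of the period served at $A_1$ before the demand drops to $A_2$, and $SC_{A_1}(\cdot)$ is read as the service credit of plan $A_1$ evaluated at the delivered availability. Requiring that updating the plan costs the client no more than staying on $A_1$ and forcing failures gives:

```latex
\begin{align}
\alpha P_{A_1} + (1-\alpha) P_{A_2}
  &\le P_{A_1} - SC_{A_1}\bigl(\alpha A_1 + (1-\alpha) A_2\bigr) \\
SC_{A_1}\bigl(\alpha A_1 + (1-\alpha) A_2\bigr)
  &\le (1-\alpha)\,\bigl(P_{A_1} - P_{A_2}\bigr)
\end{align}
```

Under this reading, the service credit for drifting down to the availability a client actually wants should not exceed what the client would save by switching plans for the rest of the period.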

Nash Equilibrium
Nash equilibrium:

Catastrophic Failure & AK
• When the whole cloud service is down.
[Figure: missed SLOs (%) vs. catastrophic event length (hours, 0 to 8), for AK (uniform dist.) and fixed availability]

Why OpenStack
• VM migration (unlike Eucalyptus)
• Diverse hypervisor support (KVM)
• AWS compatibility
• Big community (good support)
• Real-world adoption in public/private/hybrid clouds

Some More Results

Service Credit Reshaping

Availability Monitoring Tools
• There are some performance monitoring tools AK can use to gather availability data:
  • Nagios (used in AWS)
  • Zabbix
  • Ganglia
