Cloud Spot Markets are Not Sustainable: The Case for Transient Guarantees - PowerPoint PPT Presentation


  1. Cloud Spot Markets are Not Sustainable: The Case for Transient Guarantees
     Supreeth Subramanya, Amr Rizk, David Irwin

  2. Selling Idle Cloud Capacity
     ❝ Shared warehouse scale has its limitations machines tend to have 10-50% utilization ❞ [2013] The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.
     Commoditized compute: users bid in a 2nd price auction, and EC2 continually evaluates supply and demand to price spot servers.
     Allocate: bid price ≥ spot price
     Revoke: bid price < spot price
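
The allocate/revoke rule on this slide can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: the function names, the hourly granularity, and the price trace are hypothetical, and the real EC2 mechanism has more detail than a single comparison.

```python
from typing import List


def is_allocated(bid: float, spot_price: float) -> bool:
    """Slide rule: allocate when bid >= spot price, revoke otherwise."""
    return bid >= spot_price


def availability_periods(bid: float, hourly_prices: List[float]) -> List[int]:
    """Lengths (in hours) of the uninterrupted runs during which the bid holds."""
    periods = []
    current = 0
    for price in hourly_prices:
        if is_allocated(bid, price):
            current += 1
        elif current > 0:
            # The price rose above the bid: the server is revoked.
            periods.append(current)
            current = 0
    if current > 0:
        periods.append(current)
    return periods


# Hypothetical hourly spot prices in $, made up for illustration only.
prices = [0.10, 0.12, 0.30, 0.11, 0.09, 0.08, 0.50, 0.10]
periods = availability_periods(0.15, prices)
print(periods)
```

Each price spike above the bid ends one availability period, which is why a more volatile market produces more revocations for the same bid.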

  3. Commodity Spot Markets
     Commodity and futures markets are great at pricing resources and balancing supply and demand, but mature markets are inherently volatile.
     Efficient Market Hypothesis: it is not possible to “beat the market” by predicting future prices.

  4. Compute Time vs. Other Commodities
     Compute time is ❝ stateful ❞:
     1. Losing a server unpredictably incurs an overhead.
     2. This overhead decreases the useful compute time of the server.
     ∴ Market volatility reduces the amount of compute time purchased.
     As the cloud spot markets mature, they will decrease the value of resources they allocate.

  5. Understanding Spot Market Characteristics

  6. Spot Servers are Intrinsically Less Valuable!
     Setting: a stateful single-node batch job on a spot VM, checkpointing to remote disk.
     Expected runtime:
     $$E[T_{spot}] = T + \left(\frac{T}{T_{opt}} \cdot \epsilon\right) + \left(\frac{T}{MTTR} \cdot \frac{T_{opt}}{2}\right)$$
     Here $T$ is the actual runtime, the second term is the overhead of checkpointing, and the third term is the recomputation.
     Optimal checkpointing interval:
     $$T_{opt} \approx \sqrt{2 \cdot \epsilon \cdot MTTR}$$
     ❝ On average, spot servers get less work done per unit of time compared to an equivalent on-demand server ❞
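
A minimal sketch of the expected-runtime model on this slide, E[T_spot] = T + (T / T_opt) * ε + (T / MTTR) * (T_opt / 2) with T_opt ≈ √(2 * ε * MTTR). It assumes that ε is the time to write one checkpoint and that all quantities are in hours; the function names and the sample values are hypothetical and do not come from the slides.

```python
import math


def optimal_checkpoint_interval(eps: float, mttr: float) -> float:
    """T_opt ~ sqrt(2 * eps * MTTR), as given on the slide."""
    return math.sqrt(2.0 * eps * mttr)


def expected_spot_runtime(t: float, eps: float, mttr: float) -> float:
    """E[T_spot] = T + checkpointing overhead + recomputation."""
    t_opt = optimal_checkpoint_interval(eps, mttr)
    checkpoint_overhead = (t / t_opt) * eps      # one checkpoint per interval
    recomputation = (t / mttr) * (t_opt / 2.0)   # half an interval lost per revocation
    return t + checkpoint_overhead + recomputation


# Hypothetical inputs: a 12-hour job, 0.5-hour checkpoints, 4-hour MTTR.
t_opt = optimal_checkpoint_interval(0.5, 4.0)
runtime = expected_spot_runtime(12.0, 0.5, 4.0)
print(t_opt, runtime)
```

At the optimal interval the two overhead terms are equal, so under this model the relative slowdown depends only on the ratio ε / MTTR.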

  7. Spot Servers are Intrinsically Less Valuable!
     Equilibrium price of spot (or the price at which spot stops being cheap):
     $$P_{eq} = P_{on\text{-}demand} \cdot \frac{T_{on\text{-}demand}}{E[T_{spot}]}$$
     Example: a stateful batch job with a completion time of 12 hours on an on-demand VM and 20 hours on a spot VM.
     ❝ For this application, a spot server with 40% discount on the on-demand price, provides no savings at all ❞
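
The 40% figure on this slide follows from the equilibrium price P_eq = P_on-demand * T_on-demand / E[T_spot]. The sketch below reproduces that arithmetic with the slide's 12-hour and 20-hour completion times; the on-demand price of $1.00 per hour and the function names are assumptions made for illustration.

```python
def equilibrium_price(p_on_demand: float, t_on_demand: float, t_spot: float) -> float:
    """Spot price at which the total job cost equals the on-demand cost."""
    return p_on_demand * t_on_demand / t_spot


P_ON_DEMAND = 1.00  # hypothetical hourly on-demand price in $

p_eq = equilibrium_price(P_ON_DEMAND, t_on_demand=12.0, t_spot=20.0)
break_even_discount = 1.0 - p_eq / P_ON_DEMAND

# Total cost of the job on each kind of server at the equilibrium price.
cost_on_demand = 12.0 * P_ON_DEMAND
cost_spot = 20.0 * p_eq
print(p_eq, break_even_discount, cost_on_demand, cost_spot)
```

Any spot price above `p_eq` makes the spot run more expensive than the on-demand run, even though the hourly price is lower.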

  8. Distilling the Spot Market Characteristics
     We identify three key metrics: Availability, Volatility, Predictability.
     [Figure: three panels plotting availability (0 or 1) against time t over a unit time, each marking compute time and time to checkpoint. Labels: availability periods a_1, a_2, a_3, a_4, ..., a_V with ∑ a_i = A_S; lost time; a_3, a_4 < f_chkp.]
     Available, Not Volatile, Predictable: needs just one checkpoint.
     Available, Volatile, Predictable: needs as many checkpoints as there are revocations.
     Available, Volatile, Unpredictable: needs periodic checkpointing.

  9. Market Characteristics Impact the Performance
     [Figure: hourly spot price (in $) from Jan 1 to Mar 1 for c4.large (Linux) us-east-1 and cg1.4xlarge (Linux) us-east-1.]
     [Figure: performance (% of on-demand) for OnDemand, c4.large, and cg1.4xl, broken down into useful server time, checkpointing overhead, and recomputation. A further label reads "Equilibrium price of markets".]
     Mature markets are more volatile and less predictable.
     Deprecated/rarely used markets are less volatile and more predictable.

  10. On Spot Market Evolution

  11. State of EC2 Spot Markets (Adoption Level, Cost and Complexity)
      2009-2014: low adoption, priced cheaply, complex to use.
      2015 onwards: increasing adoption, priced moderately, decreasing complexity.
      Under mature market conditions: demand ≈ supply, equilibrium price, convenient to use.
      As they mature, cloud spot markets may not maximize the value of idle cloud capacity.

  12. Transient Guarantees
      ❝ Uncertainty is more stressful than knowing for sure something bad will happen ❞
      de Berker, Archy O., et al. “Computations of uncertainty mediate acute stress responses in humans.” Nature Communications 7 (2016).

  13. Why Transient Guarantees?
      Not all spots are alike, and there are many ways to sell them.
      Idle cloud capacity ranges from highly volatile nodes to highly available nodes.
      EC2 Spot and GCE Preemptible: no explicit information on availability and volatility.
      Transient Guarantees: MTTR-based classes, from Class-N (low MTTR) to Class-1 (high MTTR).

  14. Transient Guarantees
      Providing probabilistic assurances on the availability, volatility, and predictability of spot servers. E.g., Class-1 servers come with an MTTR of 55 hours, and Class-4 servers with an MTTR of 2 hours.

      Able to value spot servers correctly
      Increase revenue through differentiated offering
      Minimize fault-tolerance overhead
      Retain the freedom to reclaim any server

      Partitioning transient nodes into classes
      Verifying transient guarantees
      Fixed pricing vs. market pricing
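
To see why a stated MTTR helps in valuing a server, the sketch below plugs the two example classes (MTTR of 55 hours and 2 hours) into the checkpointing model from slide 6. It is an illustration only: the checkpoint time of 0.1 hours and the function name are assumptions, and the slides do not say that classes would be priced this way.

```python
import math


def overhead_fraction(eps_hours: float, mttr_hours: float) -> float:
    """Relative slowdown (E[T_spot] - T) / T at the optimal checkpoint interval."""
    t_opt = math.sqrt(2.0 * eps_hours * mttr_hours)
    checkpointing = eps_hours / t_opt          # (T / T_opt) * eps, divided by T
    recomputation = t_opt / (2.0 * mttr_hours)  # (T / MTTR) * (T_opt / 2), divided by T
    return checkpointing + recomputation


# MTTR values are the slide's examples; the checkpoint time is hypothetical.
CLASS_MTTR_HOURS = {"Class-1": 55.0, "Class-4": 2.0}
EPS_HOURS = 0.1

overheads = {
    name: overhead_fraction(EPS_HOURS, mttr)
    for name, mttr in CLASS_MTTR_HOURS.items()
}
for name, frac in overheads.items():
    # 1 / (1 + overhead) is the break-even price as a fraction of on-demand.
    print(f"{name}: overhead {frac:.1%}, break-even price {1.0 / (1.0 + frac):.1%} of on-demand")
```

Under these assumptions the high-MTTR class loses far less time to fault tolerance, which is the sense in which differentiated classes could be valued differently.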

  15. Thank you! Supreeth Subramanya http://people.umass.edu/ssubramanya/
