CS 744: DATACENTER AS A COMPUTER Shivaram Venkataraman Fall 2020

  1. CS 744: DATACENTER AS A COMPUTER Shivaram Venkataraman Fall 2020

  2. ANNOUNCEMENTS - Assignments, Piazza - Assignment zero is due! - Form groups for Assignment 1 on Piazza (Thursday) - Class format - Review - Lecture - Discussion

  3. Applications - Machine Learning, SQL, Streaming, Graph - Computational Engines - Scalable Storage Systems - Resource Management - Datacenter Architecture

  4. OUTLINE - Hardware Trends - Datacenter design - WSC workloads - Discussion

  5. Why is One Machine Not Enough? - Limited parallelism - Not enough resources - Cost could be high - Redundancy - Data volumes are high

  6. What’s in a Machine? - Interconnected compute and storage: Processor, DRAM, Memory Bus, PCIe v4, Ethernet, SATA, SSD, HDD - Newer Hardware - GPUs, FPGAs - RDMA, NVLink

  7. Scale Up: Make More Powerful Machines - Moore’s law – Stated 52 years ago by Intel founder Gordon Moore – Number of transistors on a microchip doubles every 2 years – Today “closer to 2.5 years” (Intel CEO Brian Krzanich)
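
To make the gap between the two doubling periods on this slide concrete, here is a minimal sketch that compounds transistor counts over a fixed horizon. The function name `transistor_growth` and the 10-year horizon are illustrative assumptions, not part of the lecture.

```python
def transistor_growth(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth in transistor count after `years`,
    assuming the count doubles once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Compare the classic 2-year period with the "closer to 2.5 years" quote.
    for period in (2.0, 2.5):
        factor = transistor_growth(10, period)
        print(f"doubling every {period} years: x{factor:.0f} over 10 years")
```

Over a decade, the slower cadence yields half as many transistors (16x instead of 32x), which is one reason scaling up a single machine gets harder.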

  8. Dennard Scaling is the Problem - Suggested that power requirements are proportional to the area for transistors – Both voltage and current being proportional to length – Stated in 1974 by Robert H. Dennard (DRAM inventor) - Broken since 2005 - “Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al

  9. Dennard Scaling is the Problem - Performance per-core is stalled - Number of cores is increasing - “Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al

  10. MEMORY TRENDS [chart of DRAM trends; handwritten annotations not recoverable]

  11. MEMORY TAKEAWAY - Growing +15% per year - Data access from memory is getting more expensive!

  12. HDD CAPACITY [chart; annotation mentions Backblaze backup storage]

  13. HDD BANDWIDTH - Disk bandwidth is not growing [chart: read bandwidth in MB/s, axis ticks at 100 and 200]

  14. SSDs - Performance: – Reads: 25us latency – Write: 200us latency – Erase: 1.5 ms (deleting or overwriting data is expensive) - Steady state, when SSD full – One erase every 64 or 128 reads (depending on page size) - Lifetime: 100,000-1 million writes per page

  15. SSD VS HDD COST [chart not recoverable]

  16. ETHERNET BANDWIDTH - Growing 33-40% per year! [chart with data points at 1995, 1998, 2002, 2017; annotation: disk at 100 MB/s]

  17. AMAZON EC2 (2019) [instance table not recoverable]

  18. TRENDS SUMMARY - CPU speed per core is flat - Memory bandwidth growing slower than capacity - SSD, NVMe replacing HDDs - Ethernet bandwidth growing

  19. DATACENTER ARCHITECTURE - Servers (Memory Bus, PCIe, Ethernet, SATA) - Racks with switches

  20. STORAGE HIERARCHY (DC AS A COMPUTER v2) [figure not recoverable]

  21. Warehouse-Scale Computers - Single organization - Homogeneity (to some extent) - Cost efficiency at scale – Multiplexing across applications and services – Rent it out! - Many concerns – Infrastructure – Networking – Storage – Software – Power/Energy – Failure/Recovery – …

  22. SOFTWARE IMPLICATIONS - Reliability (component failures) - Storage Hierarchy - Workload Diversity - Single organization

  23. WORKLOAD: Partition-Aggregate - Big data, low latency - Top-level Aggregator - Mid-level Aggregators - Workers (sharded index)
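
The partition-aggregate tree on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory `SHARDS` data, the `(doc_id, score)` hit format, and all function names are hypothetical, the query itself is omitted, and a thread pool stands in for remote workers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sharded index: each worker holds (doc_id, score) hits for one shard.
SHARDS = [
    [("d1", 0.9), ("d2", 0.1)],
    [("d3", 0.7), ("d4", 0.4)],
    [("d5", 0.8), ("d6", 0.3)],
    [("d7", 0.2), ("d8", 0.6)],
]


def worker(shard, k):
    """Return the k best hits from a single shard."""
    return heapq.nlargest(k, shard, key=lambda hit: hit[1])


def merge(partials, k):
    """Combine partial top-k lists into one top-k list."""
    hits = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])


def mid_level_aggregator(shards, k, pool):
    """Fan out to the workers of one group in parallel, then merge."""
    partials = list(pool.map(lambda shard: worker(shard, k), shards))
    return merge(partials, k)


def top_level_aggregator(shard_groups, k):
    """Ask every mid-level aggregator, then merge their answers."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = [mid_level_aggregator(group, k, pool) for group in shard_groups]
    return merge(partials, k)


result = top_level_aggregator([SHARDS[:2], SHARDS[2:]], k=3)
print(result)
```

Each level only forwards its own top-k, so the data sent up the tree stays small even when the index is spread over many workers.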

  24. WORKLOAD: SCHOLAR SIMILARITY - Map Stage - Reduce Stage
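
The slide names only a map stage and a reduce stage. As a rough sketch of how such a pipeline could compute article similarity, the example below assumes co-citation counts as the similarity signal (an assumption, the slide does not spell out the metric). The `CITATIONS` data and all function names are hypothetical, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict
from itertools import combinations


def map_stage(citing_paper, cited_papers):
    """Emit ((a, b), 1) for every pair of papers cited together."""
    for a, b in combinations(sorted(set(cited_papers)), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(key, values):
    """Sum the votes for one pair of papers."""
    return key, sum(values)


def co_citation_counts(citations):
    emitted = [kv for paper, cited in citations.items()
               for kv in map_stage(paper, cited)]
    return dict(reduce_stage(k, v) for k, v in shuffle(emitted).items())


# Hypothetical input: citing paper -> list of papers it cites.
CITATIONS = {"p1": ["a", "b", "c"], "p2": ["a", "b"], "p3": ["b", "c"]}
counts = co_citation_counts(CITATIONS)
print(counts)
```

The map stage is embarrassingly parallel across citing papers, while the reduce stage needs all values for a key in one place, which is why the shuffle sits between them.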

  25. VIDEO ENCODING - Compute intensive - Parallelize across fragments (e.g. YouTube)

  26. MACHINE LEARNING

  27. DISCUSSION https://forms.gle/CrrrhCPYHerwXNEt5

  28. DISCUSSION - Scale-up vs Scale-out - Notes: the app may not have parallelism; communication; scale-out is overkill for a small dataset; fault tolerance

  29. DISCUSSION - Microsoft Word vs. online document editor like Google Docs - Notes: Word has a yearly release, Docs a monthly release path; collaboration and consistency are a challenge; access it from anywhere; machine/hardware patches; online compatibility; permanent storage needs redundancy; 99.99% uptime
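
To put the 99.99% uptime figure from this discussion in perspective, the sketch below converts an availability target into an allowed downtime budget. The function name and the 365-day year are assumptions made for the example.

```python
def allowed_downtime_minutes(availability: float, period_days: float = 365.0) -> float:
    """Minutes of downtime permitted over `period_days` at a given availability."""
    return (1.0 - availability) * period_days * 24 * 60


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.2%} uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99% the budget is under an hour per year, which leaves little room for a single machine's hardware patches or failures without redundancy.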

  30. DISCUSSION - Even with 99% of servers working well, parallelism makes tail latencies worse
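
A short calculation shows why fan-out amplifies tail latency. This sketch assumes each server is slow independently with probability 1% (the independence assumption and the function name are not from the slide) and that a request is slow if any server it touches is slow.

```python
def prob_request_slow(fanout: int, p_server_slow: float = 0.01) -> float:
    """Probability that at least one of `fanout` servers is slow,
    assuming servers are slow independently of each other."""
    return 1.0 - (1.0 - p_server_slow) ** fanout


if __name__ == "__main__":
    for fanout in (1, 10, 100):
        print(f"fan-out {fanout:>3}: {prob_request_slow(fanout):.1%} of requests hit a slow server")
```

With a fan-out of 100, roughly 63% of requests wait on at least one slow server, even though each server is fast 99% of the time.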

  31. NEXT STEPS - Next class: Storage Systems - Assignment 1 out Thursday. Submit groups before that! - Wait list
