CS 744: DATACENTER AS A COMPUTER Shivaram Venkataraman Fall 2020
ANNOUNCEMENTS - Assignment 0 is due! - Form groups for Assignment 1 on Piazza by Thursday - Class format: review, lecture, discussion
Applications (Machine Learning, SQL, Streaming, Graph) → Computational Engines → Scalable Storage Systems → Resource Management → Datacenter Architecture (Hardware)
OUTLINE - Hardware Trends - Datacenter design - WSC workloads - Discussion
Why is One Machine Not Enough? - Not enough resources: parallelism is limited - Cost could be high - Redundancy is hard to manage on one machine - Data volumes are high and a single machine is too slow
What’s in a Machine? Interconnected compute and storage: CPU and DRAM over the memory bus, SSDs over PCIe (v4), HDDs over SATA, the network over Ethernet - Newer hardware: GPUs, FPGAs, RDMA, NVLink
Scale Up: Make More Powerful Machines - Moore’s Law: stated 52 years ago by Intel founder Gordon Moore - Number of transistors on a microchip doubles every ~2 years - Today “closer to 2.5 years” (Intel CEO Brian Krzanich)
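As a quick sanity check on what those doubling periods imply (the 10-year horizon below is just an illustrative choice, not a number from the slide):

```python
# Transistor count after t years is roughly N0 * 2**(t / T).
# The 2- and 2.5-year doubling periods are from the slide.
for doubling_period in (2.0, 2.5):
    growth = 2 ** (10 / doubling_period)
    print(f"doubling every {doubling_period} years -> {growth:.0f}x transistors in a decade")
```

Shrinking the doubling period from 2.5 to 2 years is the difference between 16x and 32x transistors over a decade.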
Dennard Scaling is the Problem - Suggested that power requirements are proportional to the area of transistors - Both voltage and current being proportional to length - Stated in 1974 by Robert H. Dennard (DRAM inventor) - Broken since 2005 (“Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al.)
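One way to unpack “power proportional to area”: the block below restates Dennard’s scaling argument in its standard textbook form; it is not copied from the slide.

```latex
% Scale every linear dimension and the supply voltage by 1/\kappa.
\[
  P_{\text{transistor}} = C V^2 f,
  \qquad C \propto \tfrac{1}{\kappa},\;
         V \propto \tfrac{1}{\kappa},\;
         f \propto \kappa
  \;\Rightarrow\;
  P_{\text{transistor}} \propto \tfrac{1}{\kappa^{2}} \propto \text{area}
\]
% Power density therefore stays constant as transistors shrink.
% Once voltage stopped scaling (~2005), this no longer held and
% power density started to climb.
```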
Dennard Scaling is the Problem - Performance per core has stalled - Number of cores is increasing (“Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al.)
MEMORY TRENDS [Chart: DRAM capacity per core (GB) and bandwidth (GB/s) over time, log scale]
MEMORY TAKEAWAY - Capacity is growing ~15% per year - Data access from memory is getting more expensive!
HDD CAPACITY [Chart: drive capacity over time, data from Backblaze (backup storage provider)]
HDD BANDWIDTH - Read bandwidth ~100-200 MB/s - Disk bandwidth is not growing
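To make that concrete, a back-of-the-envelope scan time; the 100-200 MB/s figures are from the slide, while the 10 TB drive size is an assumed illustrative capacity:

```python
# How long does a full sequential scan of one disk take?
capacity_tb = 10  # assumed drive size, not from the lecture
for bandwidth_mb_s in (100, 200):
    seconds = capacity_tb * 1e6 / bandwidth_mb_s   # 1 TB = 1e6 MB
    print(f"{bandwidth_mb_s} MB/s -> {seconds / 3600:.1f} hours to scan {capacity_tb} TB")
```

Capacity keeps growing while bandwidth does not, so the time to read a whole disk keeps getting longer.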
SSDs Performance (latency far below HDD): - Reads: ~25 us latency - Writes: ~200 us latency - Erase: ~1.5 ms (deleting/overwriting data is expensive) - Steady state, when the SSD is full: one erase every 64 or 128 writes (depending on page size) - Lifetime: 100,000 - 1 million writes per page
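A small sketch of what the steady-state erase overhead does to effective write latency, using only the latencies and the 64/128 pages-per-block figures from this slide:

```python
# Amortized write latency for a full SSD in steady state.
write_us = 200      # per-page write latency (slide)
erase_us = 1500     # per-block erase latency (slide)

for pages_per_block in (64, 128):
    # One erase is paid for every pages_per_block writes.
    amortized = write_us + erase_us / pages_per_block
    print(f"{pages_per_block} pages/block: ~{amortized:.0f} us per write "
          f"(vs {write_us} us when the SSD is not full)")
```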
SSD VS HDD COST [Chart: SSD vs. HDD cost over time]
ETHERNET BANDWIDTH [Chart: Ethernet bandwidth 1995-2017 vs. disk bandwidth (~100 MB/s)] - Growing 33-40% per year!
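For a feel of how these rates compound, a tiny calculation using only the growth rates quoted in the slides (33-40%/yr for Ethernet here, ~15%/yr for memory capacity earlier):

```python
import math

def years_to_10x(annual_rate):
    """Years for a quantity growing at `annual_rate` per year to reach 10x."""
    return math.log(10) / math.log(1 + annual_rate)

for name, rate in [("Ethernet @33%/yr", 0.33),
                   ("Ethernet @40%/yr", 0.40),
                   ("Memory capacity @15%/yr", 0.15)]:
    print(f"{name}: ~{years_to_10x(rate):.0f} years to grow 10x")
```

Ethernet bandwidth grows 10x in roughly 7-8 years at those rates, while a 15%/yr trend takes over 16 years to do the same.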
AMAZON EC2 (2019) [Table: instance types, including flash/NVMe drives]
TRENDS SUMMARY - CPU speed per core is flat - Memory bandwidth growing slower than capacity - SSDs, NVMe replacing HDDs - Ethernet bandwidth growing - Limitations of a single machine
DATACENTER ARCHITECTURE - Racks of servers connected by top-of-rack switches and Ethernet - Within each server: memory bus, PCIe, SATA
STORAGE HIERARCHY (DC AS A COMPUTER v2) [Figure: latency, bandwidth, and capacity at the local, rack, and datacenter levels of the hierarchy]
Warehouse-Scale Computers - Many concerns: infrastructure, networking, storage, software, power/energy, failure/recovery, ... - Single organization - Homogeneity (to some extent) - Cost efficiency at scale - Multiplexing across applications and services - Rent it out!
SOFTWARE IMPLICATIONS - Component failures → reliability must come from software - Storage hierarchy - Workload diversity - Single organization
WORKLOAD: Partition-Aggregate - Big data, low latency - Top-level aggregator → mid-level aggregators → workers over a sharded index; partial results are aggregated back up
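A minimal sketch of the fan-out/fan-in structure the slide draws; the toy shard contents, fan-out of 2, and function names are all illustrative, not from the lecture:

```python
# Partition-aggregate: a top-level aggregator fans a query out to mid-level
# aggregators, which fan out to workers that each own one shard of the index.
from concurrent.futures import ThreadPoolExecutor

INDEX_SHARDS = [  # each worker owns one shard of a toy inverted index
    {"datacenter": [1, 4], "computer": [2]},
    {"computer": [7, 9], "network": [3]},
    {"datacenter": [11], "storage": [5]},
    {"computer": [12], "storage": [8]},
]

def worker(shard, query):
    """Leaf worker: look up the query in its own shard only."""
    return shard.get(query, [])

def mid_level_aggregator(shards, query):
    """Mid-level aggregator: fan out to its workers and merge partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: worker(s, query), shards)
    return [doc for part in partials for doc in part]

def top_level_aggregator(query, fanout=2):
    """Top-level aggregator: fan out to mid-level aggregators, then merge."""
    groups = [INDEX_SHARDS[i:i + fanout] for i in range(0, len(INDEX_SHARDS), fanout)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda g: mid_level_aggregator(g, query), groups)
    return sorted(doc for r in results for doc in r)

print(top_level_aggregator("computer"))   # -> [2, 7, 9, 12]
```

The latency of the whole request is set by the slowest worker on the path, which is why this pattern is so sensitive to stragglers.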
WORKLOAD: SCHOLAR SIMILARITY - Map stage → Reduce stage (MapReduce-style batch job)
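The slide only names a Map stage and a Reduce stage, so the sketch below picks one plausible similarity signal (co-citation counts) purely for illustration; the real Scholar pipeline is not specified in the lecture:

```python
# Toy map/reduce job: papers cited together often are treated as "similar".
from collections import defaultdict
from itertools import combinations

# Each input record: the list of papers cited by one document (illustrative data).
CITATION_LISTS = [
    ["paperA", "paperB", "paperC"],
    ["paperA", "paperB"],
    ["paperB", "paperC"],
]

def map_stage(citation_lists):
    """Emit ((paperX, paperY), 1) for every pair cited by the same document."""
    for cited in citation_lists:
        for a, b in combinations(sorted(cited), 2):
            yield (a, b), 1

def reduce_stage(mapped):
    """Sum counts per pair: a higher count means a stronger similarity signal."""
    counts = defaultdict(int)
    for pair, one in mapped:
        counts[pair] += one
    return dict(counts)

print(reduce_stage(map_stage(CITATION_LISTS)))
# {('paperA', 'paperB'): 2, ('paperA', 'paperC'): 1, ('paperB', 'paperC'): 2}
```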
WORKLOAD: VIDEO ENCODING - Compute intensive - Parallelize by splitting uploads (e.g., YouTube videos) into fragments
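A sketch of that pattern: split the video into fragments and encode them independently. The fragment length and the `encode_fragment` placeholder are assumptions; a real pipeline would invoke an actual encoder:

```python
# Embarrassingly parallel encoding: fragments are independent units of work.
from concurrent.futures import ProcessPoolExecutor

def split_into_fragments(num_frames, fragment_len):
    """Return (start, end) frame ranges, one per fragment."""
    return [(s, min(s + fragment_len, num_frames))
            for s in range(0, num_frames, fragment_len)]

def encode_fragment(frag):
    """Placeholder for the per-fragment encoding work."""
    start, end = frag
    return f"encoded[{start}:{end}]"

if __name__ == "__main__":
    fragments = split_into_fragments(num_frames=1000, fragment_len=250)
    with ProcessPoolExecutor() as pool:
        encoded = list(pool.map(encode_fragment, fragments))
    print(encoded)  # independently encoded fragments, ready to be stitched back
```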
WORKLOAD: MACHINE LEARNING in WSCs
DISCUSSION https://forms.gle/CrrrhCPYHerwXNEt5
DISCUSSION: Scale-up vs. Scale-out - Scale-up: if your app doesn't have parallelism, or the dataset is small, scale-out is overkill - Scale-out: fault tolerance, pay as you go, scale to ~10,000 machines; communication overhead is the cost
DISCUSSION: Microsoft Word vs. an online document editor like Google Docs - Word: yearly release is a challenge, monthly patches, machine/hardware compatibility, local storage - Docs: collaboration and consistency, access it from anywhere, online release path, redundancy in storage, 99.99% uptime
DISCUSSION - Even if 99% of servers work well (only 1% have a slowdown), parallelism makes tail latencies worse
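The arithmetic behind this point, assuming (illustratively) that each request fans out to many servers in parallel and each server is slow 1% of the time:

```python
# Tail latency at scale: a request is slow if ANY server it touches is slow.
p_fast = 0.99
for fanout in (1, 10, 100):
    p_all_fast = p_fast ** fanout
    print(f"fan-out {fanout:3d}: P(no slow server) = {p_all_fast:.2f}, "
          f"P(request hits a slow server) = {1 - p_all_fast:.2f}")
```

With a fan-out of 100, roughly 63% of requests touch at least one slow server, even though each server is slow only 1% of the time.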
NEXT STEPS Next class: Storage Systems Assignment 1 out Thursday. Submit groups before that! Wait list