FAWNdamentally Power-efficient Clusters
David Andersen, for: Vijay Vasudevan, Jason Franklin, Amar Phanishayee, Lawrence Tan, Michael Kaminsky*, Iulian Moraru
Carnegie Mellon University, *Intel Research Pittsburgh
Monthly energy statement considered harmful
• Power is a limiting factor in computing
• 3-year TCO soon to be dominated by power cost [EPA 2007]
• Influences location, technology choices
Approaches to saving power
• Infrastructure efficiency: power generation, power distribution, cooling
• Dynamic power scaling: sleeping when idle, rate adaptation, VM consolidation
• Computational efficiency: FAWN
Goal of computational efficiency: reduce the amount of energy to do useful work
FAWN: Fast Array of Wimpy Nodes
Improve the computational efficiency of data-intensive computing using an array of well-balanced, low-power systems.
Node hardware: AMD Geode, 256MB DRAM, 4GB CompactFlash
Target: Data-intensive computing
• Large amounts of data
• Highly parallelizable
• Fine-grained, independent tasks
Workloads amenable to a “scale-out” approach
Outline
• What is FAWN?
• Why FAWN?
• When FAWN?
• Challenges (How FAWN?)
1. Fixed power costs dominate
70% of peak power at 0% utilization!
[Figure: power (W) vs. utilization, showing fixed power costs above the ideal energy-proportional line; adapted from Tolia et al., HotPower ’08]
2. Balancing to save energy
• How do we balance?
• Big CPUs clocked down?
• Embedded CPUs?
• Why not use more disks with big CPUs?
[Figure: CPU-to-disk seek speed ratio vs. year]
3. Targeting the sweet spot in efficiency
• Fast processors mask the memory wall at the cost of efficiency
• Fixed power costs can dominate efficiency for slow processors
• FAWN targets the sweet spot in processor efficiency when including fixed costs
[Figure: speed vs. efficiency]
4. Reducing peak power consumption
Provisioning for peak power requires:
1. Worst-case cooling
2. UPS systems to ride out power failures
3. Investment in power generation and substations
What is FAWN good for?
• Random-access workloads (key-value lookup)
• Scan-bound workloads (Hadoop, data analytics)
• CPU-bound workloads (compression, encryption)
Important metrics
• Performance = work / time
• Efficiency = performance / Watt
• Density = performance / volume
• Cost = performance / $
Random access workloads
Systems compared:
• FAWN + CompactFlash (4W)
• Traditional + hard disk (87W)
• Traditional + SSD (83W)
Random access workloads
FAWN is 6-200x more efficient than traditional systems.
Queries per Joule: Traditional + HD: 2.03; Traditional + SSD: 69.9; FAWN + CF: 424.25 (roughly 6x the SSD system and over 200x the disk system).
CPU-bound encryption
AES encryption/decryption of a 512MB file with a 256-bit key.
Encryption efficiency (MB/J): FAWN: 0.73; Traditional: 0.365.
FAWN is 2x more efficient for CPU-bound operations!
When to use FAWN for random access workloads?
• Total cost of ownership = capital cost + 3 years of power at $0.10/kWh
• What is the cheapest architecture for serving random access workloads?
  • Traditional + {disks, SSD, DRAM}?
  • FAWN + {disks, SSD, DRAM}?
Architecture with lowest TCO for random access workloads
• The ratio of query rate to dataset size informs the choice of storage technology
• FAWN-based systems can provide lower cost per {GB, query rate}
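To make the comparison concrete, here is a back-of-the-envelope sketch of the slide's TCO formula: capital cost plus 3 years of power at $0.10/kWh. The capital costs below are hypothetical placeholders; the wattages come from the earlier random-access comparison (87W traditional + disk, 4W FAWN + CompactFlash).

    KWH_PRICE = 0.10             # $/kWh, from the previous slide
    HOURS_3Y = 3 * 365 * 24      # hours in three years

    def tco_3yr(capital_usd, watts):
        energy_kwh = watts / 1000.0 * HOURS_3Y
        return capital_usd + energy_kwh * KWH_PRICE

    print(tco_3yr(1000, 87))     # traditional + disk: ~$228.64 of power on top
    print(tco_3yr(300, 4))       # FAWN + CompactFlash: ~$10.51 of power on top

Which side wins overall depends on how many wimpy nodes are needed to match the traditional system's query rate and capacity, which is why the ratio of query rate to dataset size drives the choice.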
Challenges
“Each decimal order of magnitude increase in parallelism requires a major redesign and rewrite of parallel code” - Kathy Yelick
• Algorithms and architectures at 10x scale
• Dealing with Amdahl’s law
• High performance using low-performance nodes
• Today’s software may not run out of the box
• Manageability, failures, network design, power cost vs. engineering cost
But it’s possible...
• Example: FAWN-KV, a high-performance key-value store
FAWN-KV
[Architecture diagram: requests flow into the FAWN-KV cluster and responses flow back]
Using BerkeleyDB on ext3
• Initially implemented using BDB on ext3
• BDB uses a B-tree indexing structure
• Files stored on CompactFlash
• Benchmarked:
  • Inserts (BDB file creation)
  • Splits
  • Merges
DB management on flash: split/merge operations
Creating a 1.8GB BDB file:
  Number of files    Insertion time
  1                  12 hours 50 min
  8                  3 hours 18 min
  32                 2 hours 26 min
B-Tree does many small, random writes. Flash does not like.
Flash...
• Setting a bit to 0 is free
• Setting a bit back to 1 requires clearing a 128-256KB erase block
• Practical consequence:
  • Sequential reads fast; sequential writes pretty fast
  • Random reads decent; random writes awful
• Almost everything on flash becomes log-structured
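As a rough illustration of why random writes are awful (erase-block size from this slide; the 4KB page is an assumed typical B-tree page size):

    ERASE_BLOCK = 256 * 1024     # bytes; the slide cites 128-256KB blocks
    PAGE = 4 * 1024              # hypothetical B-tree page being updated

    # Rewriting one page in place can force a full erase-block cycle:
    print(ERASE_BLOCK // PAGE)   # 64x write amplification, worst case

A log-structured layout sidesteps this by turning every update into a sequential append.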
FAWN-DS
• Key-value storage
• Inserts written sequentially to the end of the log
• Deletions/appends require periodic compaction
• In-memory hash index: key fragment (15 bits) -> log offset (32 bits)
• Log-like behavior is free: the DB already tracks each key-value location at byte granularity; a filesystem or device can only do so at block granularity, with higher overhead
• Wimpies have little DRAM, so the in-memory index limits the size of the DB
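A minimal sketch of this design, with details the slide leaves out filled in by assumption: a Python dict stands in for the hash index, a bytearray for the on-flash log, and a collision on the 15-bit key fragment is detected by verifying the full key on read (the real store handles collisions more carefully).

    import hashlib

    class LogStore:
        def __init__(self):
            self.log = bytearray()   # stands in for the on-flash, append-only log
            self.index = {}          # 15-bit key fragment -> 32-bit log offset

        @staticmethod
        def _frag(key):
            h = int.from_bytes(hashlib.sha1(key).digest()[:4], "big")
            return h & 0x7FFF        # keep only 15 bits

        def put(self, key, value):
            offset = len(self.log)   # inserts go sequentially to the end
            self.log += len(key).to_bytes(2, "big")
            self.log += len(value).to_bytes(4, "big")
            self.log += key + value
            self.index[self._frag(key)] = offset  # old record is now garbage

        def get(self, key):
            off = self.index.get(self._frag(key))
            if off is None:
                return None
            klen = int.from_bytes(self.log[off:off + 2], "big")
            vlen = int.from_bytes(self.log[off + 2:off + 6], "big")
            if bytes(self.log[off + 6:off + 6 + klen]) != key:
                return None          # fragment collision; real FAWN-DS probes on
            start = off + 6 + klen
            return bytes(self.log[start:start + vlen])

A delete would append a tombstone record and rely on the periodic compaction the slide mentions to reclaim the space.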
FAWN-DS performance
Creating a 1.8GB BDB file:
  Number of files    Insertion time
  1                  12 hours 50 min
  8                  3 hours 18 min
  32                 2 hours 26 min
Creating a 1.8GB FAWN-DS file:
  Number of files    Insertion time
  1                  9.63 min
  8                  9.83 min
  32                 9.93 min
Tackling other challenges
• Limited DRAM
• Good progress on developing a new “massive multi-grep” (given 1M strings, find whether any of them occur in a massive dataset) with low memory requirements
• and more! :)
Conclusion
• FAWN improves the computational efficiency of datacenters
• Informed by fundamental system power trends
• Challenges: programming for 10x scale, running today’s software on yesterday’s machine, dealing with flash, ...
http://www.cs.cmu.edu/~fawnproj/
Thanks to: Google, Intel, NetApp
21-node FAWN Cluster
• 2 Gb/s of small key-value queries from an 80GB dataset (on 4-year-old hardware)
• 891µs median access time
• 90W
Cleaning & Merging: Not bad!
• LFS's traditional weak link: cleaning...
[Figure: cleaning overhead under max load and low load]
More Design
• Reliability: chain replication
  • FAWN-DS is log-structured
  • Stream the log tail to the next replica
• Load balancing
  • Front-ends have a small cache
  • Working on: read from any replica, load balancing across workers
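A toy sketch of the chain-replication point (class and method names are assumptions, not FAWN-KV's actual code): because the store is log-structured, replicating a put is just appending the same record at each node down the chain.

    class ChainNode:
        def __init__(self, store, next_node=None):
            self.store = store       # e.g., a log-structured store like LogStore
            self.next = next_node    # successor in the replication chain

        def put(self, key, value):
            self.store.put(key, value)      # append to the local log tail
            if self.next is not None:
                self.next.put(key, value)   # stream the tail to the next replica
            # else: this node is the chain tail; in chain replication the
            # tail acknowledges writes and serves consistent reads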
Database distribution
Requirements:
1) Spread data and queries uniformly
2) Handle node joins and departures without affecting many nodes
[Diagram: ring of nodes A-H]
Consistent hashing
[Diagram: keys hash onto a ring of nodes; each node stores the values in the range ending at itself, e.g., node B owns (H,B]]
Node Addition
[Diagram: node A joins the ring between H and B]
Splits and joins
1. At B: split (H,B] into (H,A] and (A,B]
2. Transfer (H,A] from B to A
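A minimal consistent-hashing sketch (the sorted-ring-plus-binary-search implementation is an assumption; the slides only show the ring diagrams). Each node owns the arc between its predecessor and itself, so a join like A's above only splits one range.

    import bisect
    import hashlib

    def ring_hash(b):
        return int.from_bytes(hashlib.sha1(b).digest()[:8], "big")

    class Ring:
        def __init__(self, nodes):
            # place each node at its hash position on the ring
            self.points = sorted((ring_hash(n.encode()), n) for n in nodes)

        def owner(self, key):
            # owner = first node clockwise from the key's hash position
            hashes = [h for h, _ in self.points]
            i = bisect.bisect_right(hashes, ring_hash(key))
            return self.points[i % len(self.points)][1]   # wrap past the top

    ring = Ring(["A", "B", "C", "D", "E", "F", "G", "H"])
    print(ring.owner(b"some key"))   # the node whose arc covers this key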