

  1. GreenHDFS: Towards an Energy-Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster
     Rini T. Kaushik, Milind Bhandarkar*, Klara Nahrstedt
     University of Illinois, Urbana-Champaign; *Yahoo! Inc.

  2. • Motivation
     • Existing Techniques
     • GreenHDFS
     • Yahoo! Cluster Analysis
     • Evaluation

  3. Data-Intensive Computing Rapidly Popular
     • Advertising optimizations, mail anti-spam, data analytics
     • Growing Hadoop deployment: open-source Hadoop is the platform of choice; Yahoo! runs 38,000 servers and 170 PB
     • Escalating energy costs: operating energy costs >= acquisition costs; environmentally (un)friendly
     Energy conservation in Hadoop clusters is necessary.

  4. Existing Techniques
     • Server scale-down
     • CPU: DVFS, DFS, DVS
     • Disks
     • Cooling: smart cooling
     Possible power states: Active, Idle, Inactive (Sleep)
     • Idle power = 30-40% of active power
     • Sleep power = 3-10% of active power
     Scale-down transitions servers from the active to the inactive (sleep) power state → the most energy-proportional option. Scale-down is very attractive.

  5. Scale-Down Mandates
     • Sufficient idleness: to amortize the power-state transition time and the energy expended
     • No performance degradation
     • Few power state transitions: to avoid reducing the lifetime of disks

  6. Existing Scale-Down Techniques
     • Workload migration: Chase et al., SOSP'01; G. Chen et al., NSDI'08; ...
       Con: works only if servers are stateless
     • Always-on covering / primary replica set: Leverich et al., HotPower'09; Amur et al., SOCC'10
       Con: write performance impact

  7. Scale-Down Is Hard in Hadoop
     [Figure: data replicas and chunks spread across the cluster's servers]
     • Hard to generate significant idleness: replicas and chunks are distributed across the cluster
     • Workload migration is not an option: servers are NOT stateless
     • Data-locality: computations reside with the data

  8. Write Performance Important
     ▪ Reduce phase of a MapReduce task
     ▪ Production workloads such as click-stream processing operate on newly written data
     Need more scale-down approaches in a Hadoop cluster.

  9. ▪ Focus on energy-aware data placement instead of workload placement
     ▪ Exploit heterogeneity in data access patterns towards data-differentiated data placement
     Meets all scale-down mandates and works for Hadoop.

  10. Hot and Cold Zones
      [Figure: cluster split into a Hot zone and a Cold zone; Cold-zone servers scale down (zzz...)]
      • Opportunities for consolidation: servers typically run at 10-50% CPU utilization*
      • In peak loads, the compute capacity of Cold-zone servers can be used
      * Barroso et al.

  11. Zones Trade Off Energy and Performance
      Hot zone → performance-driven policies
      Cold zone → aggressive energy-driven policies:
      • Minimize server wakeups
      • No data chunking
      • In-order file placement
      • On-demand power-on
      • Storage-heavy servers → reduces the Cold zone's footprint

  12. File Migration Policy (FMP)
      ▪ Dormant, low-temperature data is moved to the Cold zone
      ▪ Runs during periods of low load
      Coldness > Threshold_FMP → migrate from Hot zone to Cold zone
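
The slides give the migration rule but no code, so here is a minimal sketch of what an FMP pass could look like. The FileStats record, the lastAccess field, and moveToColdZone are hypothetical names for illustration; the actual GreenHDFS implementation is not shown in the deck.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.List;

    // Sketch of the File Migration Policy (FMP): files whose coldness
    // (time since last access) exceeds Threshold_FMP are moved from the
    // Hot zone to the Cold zone.
    class FileMigrationPolicy {
        private final Duration coldnessThreshold; // Threshold_FMP

        FileMigrationPolicy(Duration coldnessThreshold) {
            this.coldnessThreshold = coldnessThreshold;
        }

        // Intended to run during periods of low load, e.g. nightly.
        void run(List<FileStats> hotZoneFiles, Instant now) {
            for (FileStats f : hotZoneFiles) {
                Duration coldness = Duration.between(f.lastAccess(), now);
                if (coldness.compareTo(coldnessThreshold) > 0) {
                    moveToColdZone(f); // placeholder for the replica migration
                }
            }
        }

        private void moveToColdZone(FileStats f) {
            // Hypothetical: copy replicas to Cold-zone servers, update metadata.
        }
    }

    // Hypothetical per-file metadata; not the actual HDFS structures.
    record FileStats(String path, Instant lastAccess) {}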

  13. Server Power Conservation Policy (SCP)
      ▪ Operates at the whole-server (CPU, DRAM & disks) level
      ▪ Dormant > Threshold_SCP → Active to Sleep
      ▪ Wake-on-LAN wakes a sleeping server for: file access, data placement, bit-rot scanning, file deletion
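
As a hedged illustration of the same idea in code (not the paper's implementation), a server-level dormancy sweep might look like the sketch below; the ServerPowerPolicy class, the lastActivity bookkeeping, and sendWakeOnLan are all assumed names.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;

    // Sketch of the Server Power Conservation Policy (SCP): a Cold-zone
    // server dormant longer than Threshold_SCP is put to sleep; any of the
    // four triggering operations wakes it via Wake-on-LAN.
    class ServerPowerPolicy {
        enum PowerState { ACTIVE, SLEEP }

        private final Duration dormancyThreshold;        // Threshold_SCP
        private final Map<String, Instant> lastActivity; // server -> last I/O
        private final Map<String, PowerState> state;

        ServerPowerPolicy(Duration dormancyThreshold,
                          Map<String, Instant> lastActivity,
                          Map<String, PowerState> state) {
            this.dormancyThreshold = dormancyThreshold;
            this.lastActivity = lastActivity;
            this.state = state;
        }

        // Periodic sweep: put servers to sleep once dormant long enough.
        void sweep(Instant now) {
            lastActivity.forEach((server, last) -> {
                if (state.getOrDefault(server, PowerState.ACTIVE) == PowerState.ACTIVE
                        && Duration.between(last, now).compareTo(dormancyThreshold) > 0) {
                    state.put(server, PowerState.SLEEP); // e.g. suspend-to-RAM
                }
            });
        }

        // Called before file access, data placement, bit-rot scanning,
        // or file deletion that needs the server powered on.
        void ensureAwake(String server, Instant now) {
            if (state.get(server) == PowerState.SLEEP) {
                sendWakeOnLan(server); // broadcast a WoL magic packet
                state.put(server, PowerState.ACTIVE);
            }
            lastActivity.put(server, now);
        }

        private void sendWakeOnLan(String server) { /* placeholder */ }
    }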

  14. File Reversal Policy (FRP)
      ▪ Ensures QoS of data that becomes hot again after a period of dormancy
      ▪ Hotness > Threshold_FRP → move from Cold zone back to Hot zone
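
The reversal check is the mirror image of migration; a minimal sketch follows, again with assumed names (onColdZoneAccess, moveToHotZone) rather than the real GreenHDFS API.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the File Reversal Policy (FRP): a Cold-zone file accessed
    // more than Threshold_FRP times is deemed hot again and moved back to
    // the Hot zone to preserve its QoS.
    class FileReversalPolicy {
        private final int hotnessThreshold; // Threshold_FRP
        private final Map<String, Integer> coldZoneAccesses = new HashMap<>();

        FileReversalPolicy(int hotnessThreshold) {
            this.hotnessThreshold = hotnessThreshold;
        }

        // Invoked on every read that lands in the Cold zone.
        void onColdZoneAccess(String path) {
            int hotness = coldZoneAccesses.merge(path, 1, Integer::sum);
            if (hotness > hotnessThreshold) {
                moveToHotZone(path);           // placeholder reverse migration
                coldZoneAccesses.remove(path); // reset bookkeeping after reversal
            }
        }

        private void moveToHotZone(String path) {
            // Hypothetical: copy replicas back to Hot-zone servers, update metadata.
        }
    }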

  15. ▪ Maximize energy savings
      ▪ Minimize data oscillations
      ▪ Minimize performance degradation
      All three can be achieved if there are no, or few, accesses to the Cold zone.

  16. Threshold Trade-Offs (set too low vs. too high)
      • File Migration Policy: too low → data oscillations and performance impact; too high → pressure on Hot-zone space and lost energy savings
      • Server Power Policy: too low → many power state changes and performance impact; too high → lost energy savings
      • File Reversal Policy: too low → data oscillations; too high → performance impact

  17. Yahoo! Cluster Analysis
      ▪ 2,600 servers, 5 petabytes, 34 million files
      ▪ 1 month of HDFS traces and metadata snapshots
      ▪ Multi-tenant production cluster
      ▪ Analyzed 6 top-level directories, each signifying a tenant
        ▪ Directories d, p, u, m

  18. 63.16% of the total file count and 56.23% of the total used capacity is cold (not accessed in the 1-month trace).

  19. File Lifespan Metrics
      [Timeline: Create → First Read → Last Read → Delete]
      • Lifespan_CLR (Create → Last Read): the period in which a file is hot
      • Lifespan_LRD (Last Read → Delete): the period in which a file is dormant
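
Both metrics follow directly from the timeline above. The sketch below shows the computation, with a hypothetical FileEvents record standing in for a trace entry (the actual trace schema is not given in the slides).

    import java.time.Duration;
    import java.time.Instant;

    // Lifespan_CLR = Create -> Last Read (how long the file stays hot);
    // Lifespan_LRD = Last Read -> Delete (how long it lies dormant).
    record FileEvents(Instant create, Instant firstRead,
                      Instant lastRead, Instant delete) {

        // Hot lifespan: period during which the file is still being read.
        Duration lifespanCLR() {
            return Duration.between(create, lastRead);
        }

        // Dormant lifespan: period between the final read and deletion.
        Duration lifespanLRD() {
            return Duration.between(lastRead, delete);
        }
    }

For example, a file with a lifespanCLR of 10 days and a lifespanLRD of 20 days spends two-thirds of its life dormant, making it an ideal candidate for migration to the Cold zone.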

  20. 90% of data's first read happens within 2 days of creation.

  21. 89% of data is accessed for fewer than 10 days after creation. Threshold_FMP should therefore be > Lifespan_CLR, so files are not migrated while still hot.

  22. Dormancy varies by tenant:
      • 80% of data in dir d is dormant for > 20 days
      • 20% of data in dir p is dormant for > 10 days
      • 0.02% of data in dir m is dormant beyond 1 day

  23. Great for GreenHDFS Goals
      ▪ 89% of data in the Yahoo! Hadoop compute cluster has a news-server-like access pattern
      ▪ Once data is deemed cold, there is a low probability of it being accessed again
      ▪ Significant idleness in the Cold zone → high energy savings
      ▪ Few accesses to the Cold zone → less performance degradation
      ▪ System stable → fewer data oscillations

  24. Evaluation Setup
      ▪ Trace-driven simulation, driven by the 1-month HDFS traces from the 2,600-server / 5 PB cluster, for the main directory dir d
      ▪ Hot zone → 1,170 servers; Cold zone → 390 servers
      ▪ Assumed 3-way replication in both zones
      ▪ Power and transition penalties taken from datasheets of a Quad-Core Intel Xeon and a Seagate Barracuda SATA disk*
      * Not representative of Yahoo! H/W configuration
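
As a rough illustration of what such a trace-driven energy account involves (not the simulator used in the evaluation), the sketch below charges active power, sleep power, and a fixed per-transition penalty. All three constants are placeholders, not the datasheet values the authors used.

    // Sketch of per-server energy accounting for a trace-driven simulation.
    class EnergyModel {
        static final double ACTIVE_WATTS = 250.0;      // placeholder active power
        static final double SLEEP_WATTS  = 10.0;       // placeholder sleep power
        static final double TRANSITION_JOULES = 500.0; // placeholder wake/sleep penalty

        // Energy (joules) over a simulated interval, given seconds spent in
        // each state and the number of power state transitions.
        static double energyJoules(double activeSecs, double sleepSecs, int transitions) {
            return ACTIVE_WATTS * activeSecs
                 + SLEEP_WATTS * sleepSecs
                 + TRANSITION_JOULES * transitions;
        }

        public static void main(String[] args) {
            // One day in which a Cold-zone server is active 4h, sleeps 20h,
            // and wakes twice:
            double j = energyJoules(4 * 3600, 20 * 3600, 2);
            System.out.printf("%.1f kWh%n", j / 3.6e6); // joules -> kWh
        }
    }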

  25. 24% energy cost savings: $2.1 million when extrapolated to 38,000 servers. In reality, savings would be higher (cooling, idle power in the Hot zone). Results are minimally sensitive to thresholds.

  26. Only 6.38 TB of data migrated daily.

  27. File reversals are insignificant; data oscillations and energy savings are insensitive to the File Migration Policy threshold.

  28. More free space in the Hot zone → room for more hot data.

  29. Maximum power state transitions observed = 11; no risk to disk longevity.

  30. Conclusions
      ▪ GreenHDFS yields a significant energy cost reduction, as shown with real-world, large-scale traces from the Yahoo! Hadoop cluster
      ▪ Insensitive to thresholds
      ▪ Allows effective server-level scale-down in a Hadoop cluster
        ▪ Generates significant idleness in the Cold zone
        ▪ Few power state transitions
        ▪ No write performance impact

  31. Thank You
