
Dynamic Fine-Grained Scheduling for Energy-Efficient Main-Memory Queries - PowerPoint PPT Presentation



  1. Dynamic Fine-Grained Scheduling for Energy-Efficient Main-Memory Queries
     Iraklis Psaroudakis (EPFL, SAP AG), Thomas Kissinger (TU Dresden), Danica Porobic (EPFL), Thomas Ilsche (TU Dresden), Erietta Liarou (EPFL), Pinar Tözün (EPFL), Anastasia Ailamaki (EPFL), Wolfgang Lehner (TU Dresden)

  2. Why care about power?
     [Pie chart: monthly datacenter costs [J. R. Hamilton]: servers, networking equipment, power distribution & cooling, power, other; roughly 30% of the cost is power-related, and the dynamic fraction is increasing]
     [Plot: energy proportionality, power vs. utilization, today vs. ideal]
     Getting there:
     • Power management features
     • Power-aware software
     We need to make DBMS power-aware

  3. Power management features
     • Dynamic voltage and frequency scaling (DVFS): 1.2GHz to 2.9GHz
     • Turbo boost: above 2.9GHz
     • Idle states (C-states)
     • Power-related H/W counters
     We can exploit these to improve energy efficiency
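The power-related hardware counters above are exposed on Linux through the intel_rapl powercap driver. Below is a minimal sketch, not part of the talk, that samples the package energy counter and derives average power; the exact sysfs paths and the need for root privileges are assumptions that vary by kernel and machine. DRAM energy, when exposed, appears as a subdomain under the same tree and can be sampled the same way.

```python
# Sketch: sample CPU package energy via the Linux powercap (intel_rapl) sysfs
# interface and derive average power over a window. Paths are assumptions.
import time

RAPL_PKG = "/sys/class/powercap/intel-rapl:0/energy_uj"            # package 0 energy counter
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"  # wrap-around limit

def read_uj(path):
    with open(path) as f:
        return int(f.read())

def average_power_watts(seconds=5.0):
    """Average package power in Watts over `seconds`, handling counter wrap-around."""
    start = read_uj(RAPL_PKG)
    time.sleep(seconds)
    end = read_uj(RAPL_PKG)
    if end < start:                       # the energy counter wrapped during the window
        end += read_uj(RAPL_MAX)
    return (end - start) / 1e6 / seconds  # microjoules -> Joules -> Watts

if __name__ == "__main__":
    print(f"average package power: {average_power_watts(1.0):.1f} W")
```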

  4. Current approaches
     • Black box, e.g. dynamic concurrency throttling [TPDS13]: treats the DBMS as a black box, leading to unpredictable behavior
     • Query optimizer with power costs [ICDE10]: coarse-grained, without low-level tuning
     We need fine-grained energy-awareness in the database

  5. Fine-grained energy-aware scheduling
     How do you schedule this query plan? [Diagram: query plan with an aggregation (Σ) over a scan (S)]
     • Parameters:
       – parallelism
       – thread placement
       – data placement
       – dynamic voltage and frequency scaling (DVFS)
     Calibration of operators under different parameters
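A minimal sketch of what such a calibration pass could look like, assuming hypothetical run_operator and measure_energy helpers; the parameter grid mirrors the ranges used later in the slides (up to 32 threads, four placements, 1.2 to 2.9 GHz). It illustrates the idea, not the system described in the paper.

```python
# Sketch: calibrate one operator over a grid of scheduling parameters and record
# its energy efficiency. run_operator and measure_energy are hypothetical helpers.
from itertools import product

PARALLELISM   = [1, 2, 4, 8, 16, 32]
PLACEMENTS    = ["socket-fill", "socket-fill-HT", "socket-wise", "socket-wise-HT"]
FREQUENCY_GHZ = [1.2, 2.0, 2.9]

def calibrate(run_operator, measure_energy):
    profile = {}
    for threads, placement, freq in product(PARALLELISM, PLACEMENTS, FREQUENCY_GHZ):
        # measure_energy runs the operator and returns (Joules, seconds, work done)
        joules, seconds, work = measure_energy(run_operator, threads, placement, freq)
        throughput = work / seconds          # e.g. bytes or tuples per second
        power = joules / seconds             # average Watts
        profile[(threads, placement, freq)] = throughput / power
    return profile                           # consulted later at runtime
```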

  6. Concurrent partitioned scans
     • Each thread scans 128MB of integers for 5 secs
     • Maximize: performance per power = throughput / power
       – under different parallelism, scheduling, and frequency settings
     • Machine: two 8-core Intel Xeon E5-2690, HT enabled, 64GB RAM, frequencies from 1.2GHz to 2.9GHz
     • Power measurements: hardware performance counters RAPL (CPU & DRAM), and external equipment
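For illustration, the sketch below has a single worker scan a 128 MB array of integers for a fixed interval and reports its scan throughput; dividing by the average power (from the RAPL helper above or an external meter) gives the throughput-per-Watt metric being maximized. The 8-byte element size and the plain Python scan loop are illustrative assumptions, not the paper's benchmark code.

```python
# Sketch: one worker of the partitioned-scan micro-benchmark and its efficiency metric.
import array
import time

def scan_partition(seconds=5.0, size_bytes=128 * 1024 * 1024):
    """Repeatedly scan a partition of 8-byte integers for `seconds`; return bytes/s."""
    data = array.array("q", range(size_bytes // 8))
    total_bytes = 0
    deadline = time.time() + seconds
    while time.time() < deadline:
        _ = sum(data)                         # sequential read over the whole partition
        total_bytes += len(data) * data.itemsize
    return total_bytes / seconds

def throughput_per_watt(bytes_per_sec, avg_watts):
    """The metric maximized on this slide: throughput divided by power."""
    return (bytes_per_sec / 1e9) / avg_watts  # GB/s per Watt
```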

  7. Socket-fill scheduling
     [Diagram: threads 1-8 fill the physical cores of socket 1, threads 9-16 their hyper-thread siblings, threads 17-24 the cores of socket 2, threads 25-32 its hyper-threads]
     [Plot: throughput per Watt vs. # threads for the Auto (RAPL) setting; efficiency flattens at the memory bandwidth saturation point]
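A minimal sketch of the socket-fill placement in the diagram, assuming the common Linux CPU numbering (CPUs 0-7 and 8-15 are the physical cores of sockets 1 and 2, CPUs 16-31 their hyper-thread siblings); that numbering is an assumption and differs across machines.

```python
# Sketch: CPU order for socket-fill scheduling, as read off the slide's diagram.
import os

CORES_PER_SOCKET, SOCKETS = 8, 2
TOTAL_CORES = CORES_PER_SOCKET * SOCKETS       # 16 physical cores

def socket_fill_order():
    order = []
    for socket in range(SOCKETS):
        cores = [socket * CORES_PER_SOCKET + c for c in range(CORES_PER_SOCKET)]
        order += cores                             # physical cores of this socket first
        order += [c + TOTAL_CORES for c in cores]  # then their hyper-thread siblings
    return order

def pin_worker(worker_index, order=None):
    """Pin the calling worker to the CPU chosen for it (Linux only)."""
    order = order or socket_fill_order()
    os.sched_setaffinity(0, {order[worker_index]})
```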

  8. Socket-fill scheduling
     [Same placement diagram as the previous slide]
     [Plot: throughput per Watt vs. # threads for the Auto setting, measured both via RAPL and via external equipment; the two curves differ by a constant offset]

  9. Socket-fill scheduling
     [Same placement diagram as before]
     [Plot: throughput per Watt vs. # threads at 1.2GHz, 2.0GHz, 2.9GHz, and Auto; an intermediate frequency is best, and the frequencies have different bandwidth saturation points]
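The fixed-frequency configurations compared here can be approximated on Linux through the cpufreq sysfs interface. The sketch below merely caps the maximum frequency of every core; truly fixing the frequency, as in the experiments, would also require raising scaling_min_freq or using the userspace governor. Paths, driver behaviour, and the need for root are machine-dependent assumptions.

```python
# Sketch: cap every core's frequency through cpufreq sysfs (requires root).
import glob

def cap_frequency_khz(khz):
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(khz))

# cap_frequency_khz(1_200_000)  # 1.2 GHz
# cap_frequency_khz(2_900_000)  # 2.9 GHz
```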

  10. Socket-fill HT scheduling
     [Diagram: both hardware threads of each core are filled before moving to the next core, socket by socket (threads 1-2 on core 1, threads 3-4 on core 2, ..., threads 17-18 on core 9)]
     [Plot: throughput per Watt vs. # threads at 1.2GHz, 2.0GHz, 2.9GHz, and Auto; hyper-threading draws negligible power]

  11. Socket-wise scheduling
     [Diagram: threads are dealt round-robin across the two sockets (thread 1 on socket 1, thread 2 on socket 2, and so on), with hyper-threads used last]
     [Plot: throughput per Watt vs. # threads at 1.2GHz, 2.0GHz, 2.9GHz, and Auto; spreading across sockets avoids socket-specific bandwidth saturation]

  12. Socket-wise HT scheduling
     [Diagram: threads alternate between the sockets and fill both hardware threads of a core before moving on (threads 1-2 on core 1 of socket 1, threads 3-4 on core 9 of socket 2, and so on)]
     [Plot: throughput per Watt vs. # threads at 1.2GHz, 2.0GHz, 2.9GHz, and Auto; this placement reaches the best energy efficiency, about 1.3x better]
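The two socket-wise diagrams can be captured with the same kind of placement function as the socket-fill sketch above, under the same assumed CPU numbering; ht_first=True corresponds to the socket-wise HT variant on this slide.

```python
# Sketch: CPU order for socket-wise scheduling (round-robin across sockets),
# optionally filling both hardware threads of a core before moving on.
CORES_PER_SOCKET, SOCKETS = 8, 2
TOTAL_CORES = CORES_PER_SOCKET * SOCKETS

def socket_wise_order(ht_first=False):
    order = []
    for core in range(CORES_PER_SOCKET):
        for socket in range(SOCKETS):
            cpu = socket * CORES_PER_SOCKET + core
            if ht_first:
                order += [cpu, cpu + TOTAL_CORES]  # core, then its hyper-thread sibling
            else:
                order.append(cpu)
    if not ht_first:
        order += [c + TOTAL_CORES for c in order[:TOTAL_CORES]]  # hyper-threads last
    return order

# socket_wise_order()              -> [0, 8, 1, 9, ..., 7, 15, 16, 24, ...]
# socket_wise_order(ht_first=True) -> [0, 16, 8, 24, 1, 17, 9, 25, ...]
```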

  13. Parallel aggregation
     • a = Σ (b_i + c_i), 4GB arrays
     • Minimize: energy delay product (EDP) = response time (sec) * energy (J)
       – under different parallelism, scheduling, and memory placement
     • Machine: two 8-core Intel Xeon E5-2640, HT disabled, 256GB of RAM
     • Memory placement: on first socket, or interleaved
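A small sketch of the workload and the metric on this slide: the aggregation sums the element-wise totals of two arrays, and the quantity minimized is the energy-delay product. The pure-Python aggregation is only illustrative (the paper's arrays are 4 GB); the energy value is assumed to come from RAPL or an external meter.

```python
# Sketch: the aggregation a = sum(b_i + c_i) and the energy-delay product metric.
def aggregate(b, c):
    return sum(bi + ci for bi, ci in zip(b, c))

def energy_delay_product(response_time_sec, energy_joules):
    """EDP = response time (s) * energy (J); lower is better."""
    return response_time_sec * energy_joules

# Example: a run taking 2.0 s at an average 150 W consumes 300 J, so EDP = 600 J*s.
```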

  14. Parallel aggregation
     [Two plots: EDP (kJ x sec) vs. # threads (0-16) for socket-fill and socket-wise scheduling, one with memory on the first socket and one with memory interleaved; annotations: socket-wise better, bandwidth constrained]

  15. Main-memory memory-bound operations
     • An intermediate frequency has the best efficiency
       – different saturation points
     • Avoid memory bandwidth saturation
       – by data and thread placement
     • Up to 4x energy efficiency

  16. Fine-grained energy awareness
     [Diagram: calibration analysis of operators and parameters -> measurements with hardware counters and/or external equipment -> runtime decisions on scheduling, resource allocation, and power management; part of the workflow is labeled THIS PAPER]
     [Plots: energy efficiency vs. # threads, and power over time; annotated with parallelism, CPU utilization, memory utilization, data & thread placement, DVFS]
     Thank you!
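As a very rough sketch of the runtime-decision box, assuming a calibrated profile per operator (as produced by the calibration sketch earlier) and hypothetical read_counters and apply_config helpers; the policy in a real system would be considerably more involved.

```python
# Sketch: a runtime loop that feeds measurements and calibrated profiles into
# scheduling decisions. read_counters and apply_config are hypothetical helpers.
import time

def scheduler_loop(profiles, active_operators, read_counters, apply_config, interval=1.0):
    while True:
        counters = read_counters()             # e.g. power, CPU and memory-bandwidth utilization
        for op in active_operators():          # operators currently running in query plans
            # pick the calibrated (threads, placement, frequency) with the best efficiency
            best = max(profiles[op], key=profiles[op].get)
            apply_config(op, best, counters)   # adjust parallelism, placement, DVFS
        time.sleep(interval)
```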

  17. References
     • [J. R. Hamilton] J. R. Hamilton. Internet-Scale Datacenter Economics: Where the Costs and Opportunities Lie. HPTS, 2011.
     • [TPDS13] D. Li, B. R. de Supinski, M. Schulz, D. S. Nikolopoulos, and K. W. Cameron. Strategies for energy-efficient resource management of hybrid programming models. IEEE TPDS, 24(1):144-157, 2013.
     • [ICDE10] Z. Xu, Y.-C. Tu, and X. Wang. Exploring power-performance tradeoffs in database systems. In ICDE, pages 485-496, 2010.
