Application-controlled Frequency Scaling

Application-controlled Frequency Scaling
Jons-Tobias Wamhoff, Stephan Diestelhorst, Christof Fetzer (Technische Universität Dresden, Germany)
Patrick Marlier, Pascal Felber (Université de Neuchâtel, Switzerland)
Dave Dice (Oracle Labs, USA)

Overview
• Dynamic voltage and frequency scaling (DVFS)
  • traditionally: used to save energy or boost sequential bottlenecks/serial peak loads
  • today: improve performance by exposing asymmetric properties of applications
• Outline
  • Recap DVFS features on current x86 multicores
  • DVFS properties: latency and power
  • Applying DVFS on application-level

P- and C-states
• P-states: performance states
  • predefined frequency/voltage pairs
  • controlled through machine-specific registers (MSRs, privileged rdmsr / wrmsr)
• C-states: power states
  • trade entry/wakeup latency for higher power savings
  • entered by hlt or monitor / mwait
[Figure: P-states ordered by frequency/voltage from P turbo through P base down to P slow, all within C0; halted states C1-Cn below]
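The MSR access mentioned above can be sketched on Linux through the msr driver, which exposes each logical CPU's registers as /dev/cpu/N/msr (root privileges and a loaded msr module are required). This is a minimal sketch, not part of the slides: the register numbers are the architectural Intel ones (AMD uses different registers), and the `path` parameter is a hypothetical hook added only so the function can be exercised on an ordinary file.

```python
import os
import struct

# Architectural Intel MSR numbers (assumption: Intel CPU; AMD differs).
IA32_PERF_STATUS = 0x198  # current P-state, read-only
IA32_PERF_CTL = 0x199     # requested P-state


def read_msr(reg, cpu=0, path=None):
    """Return the 64-bit value of MSR `reg` on logical CPU `cpu`.

    The Linux msr driver maps the register number to the file offset,
    so one 8-byte read at offset `reg` yields the register contents.
    `path` is a hypothetical override used for testing only.
    """
    path = path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from {path}")
    # x86 is little-endian; the driver returns the raw 64-bit value.
    return struct.unpack("<Q", raw)[0]
```

On a real machine, `read_msr(IA32_PERF_STATUS, cpu=0)` would be called as root; how the returned bits encode frequency and voltage is model-specific and not decoded here.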

AMD Turbo CORE & Intel Turbo Boost
[Figure: AMD module (x86, FPU, x86) and Intel core (HT, HT), annotated with P base, P turbo, P slow and ≥ C1 states]
• Voltage and frequency domain: module vs. package
• Boosting: deterministic vs. thermal
• AMD only: asymmetric frequencies with manual boost

Evaluation Setup
[Figure: timeline of one lock operation at f P base: acquire entry, t wait, acquire exit, t CS, release]
• Critical sections (CS) protected by MCS queue lock
• Decorations on acquire/release → trigger DVFS
• Variable size of CS → amortize DVFS cost
• Effective CS frequency: $f_{CS} = f_{base} \cdot \frac{t_{CS}}{t_{A+CS+R}}$
• Energy for 1 hour at P base: $E_{NORM} = E_{sample} \cdot \frac{t_{A+CS+R}}{t_{CS}}$
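The two metrics can be computed as below. This is a small sketch that reads the slide's formulas as f CS = f base · t CS / t A+CS+R and E NORM = E sample · t A+CS+R / t CS, where t A+CS+R is the time for acquire, critical section and release together. Function and parameter names are illustrative, not taken from the authors' code.

```python
def effective_cs_frequency(f_base_ghz, t_cs, t_total):
    """Effective frequency of the critical section in GHz.

    t_cs    -- time the CS takes at the base frequency
    t_total -- measured time for acquire + CS + release (same unit)
    """
    if t_cs <= 0 or t_total <= 0:
        raise ValueError("times must be positive")
    return f_base_ghz * t_cs / t_total


def normalized_energy(e_sample_kwh, t_cs, t_total):
    """Energy scaled by the same ratio, so lock overhead raises the cost."""
    if t_cs <= 0 or t_total <= 0:
        raise ValueError("times must be positive")
    return e_sample_kwh * t_total / t_cs
```

With these definitions, any time spent in acquire or release lowers the effective frequency below f base and raises the normalized energy, while a boosted CS that shortens the total time below t CS pushes the effective frequency above f base.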

Automatic Frequency Scaling
[Figure: timeline with frequency levels f P turbo and f P base; intervals t wait, t ramp, t CS, t P turbo → P base, t P base → C halt, t C halt → P base; OS halt: entry, wakeup; CPU: deeper C-state, boosted P-state]
• Decoration: spinning vs. blocking
• P-state transitions triggered by hardware

Blocking vs. Spinning Locks
[Figure: four panels (Frequency AMD, Frequency Intel, Energy AMD, Energy Intel). y-axes: f CS (GHz) and E NORM (kWh); x-axis: Size CS (cycles, log); series: spin, futex. Annotated points: 1.5M, 4M (frequency panels); 10k, 1M, t wait = 7M, t wait = 70k (energy panels)]

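Manual Frequency Scaling
[Figure: timeline with frequency levels f P turbo, f P base, f P slow; intervals t wait, t ramp, t CS, t P turbo → P base, t P base → P slow, t P slow → P turbo]
Cost in cycles for each of the three transitions in the figure:
ioctl: 1k, 1k, 1k
wrmsr: 28k, 2k, 23k
transition: 2k, 225k, 1k
• Decoration: spin and application-level DVFS control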
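Because a manual P-state change costs thousands of cycles, boosting only pays off for critical sections above a certain size. The sketch below estimates that break-even size under a deliberately simple model that is an assumption of this example, not the authors' analysis: the CS needs a fixed number of cycles regardless of frequency, and the whole DVFS overhead is added as stall time.

```python
def unboosted_time(cs_cycles, f_base_hz):
    """Seconds to run the CS at the base frequency."""
    return cs_cycles / f_base_hz


def boosted_time(cs_cycles, f_boost_hz, overhead_s):
    """Seconds to run the CS boosted, plus the DVFS overhead."""
    return cs_cycles / f_boost_hz + overhead_s


def break_even_cycles(f_base_hz, f_boost_hz, overhead_s):
    """CS size (cycles) at which boosting and not boosting take equally long.

    Solves N / f_base = N / f_boost + overhead for N.
    """
    if f_boost_hz <= f_base_hz:
        raise ValueError("boost frequency must exceed base frequency")
    return overhead_s * f_base_hz * f_boost_hz / (f_boost_hz - f_base_hz)


# Illustrative numbers only: 3.1 GHz base, 4.0 GHz boost,
# and a hypothetical overhead worth 30k base-frequency cycles.
F_BASE, F_BOOST = 3.1e9, 4.0e9
OVERHEAD = 30_000 / F_BASE
N = break_even_cycles(F_BASE, F_BOOST, OVERHEAD)
```

Critical sections larger than `N` finish sooner when boosted under this model; smaller ones are better left at the base frequency. When IPC depends on core frequency, the fixed-cycle assumption no longer holds and the estimate shifts.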

Manual Lock Boosting
[Figure: two panels (Frequency AMD, Energy AMD). y-axes: f CS (GHz) and E NORM (kWh); x-axis: Size CS (cycles, log); series: spin, ownr, dlgt, mgrt. Annotated points: 200k, 400k, 600k; futex: 1.5M]
• delegate: dedicated wrmsr core
• spin: static P base
• owner: dynamically boost
• migrate: statically boosted core
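The "owner" strategy, where the lock holder boosts itself for the duration of the critical section, can be sketched as a lock with hooks on acquire and release. Everything here is a hypothetical illustration: the class and hook names are invented for this example, a plain `threading.Lock` stands in for the MCS queue lock used in the evaluation, and the hooks only record events where a real setup would request a P-state change.

```python
import threading


class DecoratedLock:
    """Lock that runs a hook after acquiring and another before releasing."""

    def __init__(self, on_acquired=None, on_release=None):
        self._lock = threading.Lock()  # stand-in for an MCS queue lock
        self._on_acquired = on_acquired or (lambda: None)
        self._on_release = on_release or (lambda: None)

    def __enter__(self):
        self._lock.acquire()
        self._on_acquired()  # e.g. request the boosted P-state
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            self._on_release()  # e.g. return to the base P-state
        finally:
            self._lock.release()  # always release, even if the hook fails
        return False  # do not swallow exceptions from the CS


events = []
lock = DecoratedLock(on_acquired=lambda: events.append("boost"),
                     on_release=lambda: events.append("unboost"))
with lock:
    events.append("critical section")
```

Dropping the boost before the release keeps both hooks inside the lock's protection, at the price of a slightly longer hold time; running the release hook after unlocking is the alternative choice.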

TURBO Library
• Convenient programmatic application-level DVFS control
• Testbed to explore challenges of future heterogeneous cores
[Figure: library layers on top of the Linux kernel and hardware interfaces]
• Execution control
  • ThreadRegistry: create/register
  • ThreadControl: decorate locks, barriers, …: boosting/profiling
• Performance configuration
  • Thread: migrate to core
  • P-States: setting & configuration
  • PerformanceMonitor: low-level profiling
• Hardware abstraction
  • Topology, PCI-Configuration, MSR-Interface, PerfEvent (P-states, HW counters)
https://bitbucket.org/donjonsn/turbo

Boosting Applications
• Expose application knowledge
  • Asymmetric software transactional memory: up to 50% speedup with only 2% more energy
• Tradeoffs when IPC depends on core frequency
  • Hash table resize in memcached: 9% speedup but 22% higher frequency
• Outweigh P-state latency by delegating CS
  • High cross-module round-trip delay (2k cycles)
  • Intra-module delay scales with P-state (P boost: 280 cycles)

Next Steps
• Intel Haswell-EP supports per-core P-states
  • Allows giving hints
• Application domains
  • Real-time scheduling
  • Fork-join benchmarks
  • …?


