Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
Presented by Hadeel Alabandi, 2/18/2014
Introduction and Motivation
• A serious obstacle to the effective utilization of multicore processors is cache partitioning and sharing
• Existing studies evaluated cache partitioning through simulation; however, simulation has several limitations
  • Excessive simulation time
  • Absence of OS activities
  • Proneness to simulation inaccuracy
Introduction and Motivation (cont.)
• In this paper, a software approach is used
  • It supports static and dynamic cache partitioning by using memory address mapping
  • It emulates a hardware partitioning mechanism, making it possible to examine cache partitioning policies on real systems
• Three metrics were used in the evaluation for optimization purposes
  • Performance
  • Fairness
  • QoS
Cache Partitioning for Multicore Processors
• It has two interdependent parts
  • Mechanism
    • Enforces cache partitioning
    • Provides input to the partitioning policy
  • Policy
    • Decides how much cache resource is allocated to each program, guided by an optimization objective
Adopted Evaluation Metrics in the Study
• Performance metrics
  • Throughput (IPCs): the absolute sum of the programs' IPC values
  • Combined miss rates: summarizes the miss rates
  • Combined misses: summarizes the number of cache misses
• QoS metrics
  • Assume that QoS constraints are never violated in this case
Adopted Evaluation Metrics in the Study (cont.)
• Fairness metrics
  • Miss rates
  • The number of misses
  • The slowdown of each co-scheduled program should be identical after cache partitioning
  • In the study, fairness metrics are relative to single-core execution with a dedicated L2 cache
• Data required for the policy metric and the evaluation metric were acquired by running a workload with different cache partitionings
  • The resulting correlation value falls in the range (-1 to 1)
  • If the result is 1, the correlation between the two metrics is perfect
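The correlation described above can be sketched as a Pearson coefficient computed over measurements taken at different partitionings. This is an illustrative sketch, not the paper's code; the sample numbers are hypothetical.

```python
# Sketch (not the paper's code): Pearson correlation between a policy
# metric and an evaluation metric, measured across several cache
# partitionings of the same workload. Sample values are illustrative.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical measurements at five different partition sizes:
policy_metric = [0.12, 0.10, 0.08, 0.07, 0.05]   # e.g. combined miss rate
eval_metric   = [1.10, 1.05, 0.98, 0.95, 0.90]   # e.g. slowdown ratio

r = pearson(policy_metric, eval_metric)
print(round(r, 3))  # close to 1: the policy metric tracks the evaluation metric
```

A value near 1 means optimizing the cheap-to-measure policy metric is a good proxy for optimizing the true evaluation metric.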
Static OS-based Cache Partitioning
• A static cache partitioning policy predetermines the amount of cache blocks allocated to each program at the beginning of its execution
• Page coloring is used as the partitioning mechanism
  • Several bits of the physical address are shared between the cache set index and the physical page number
  • These bits are used as the page color
  • The cache is divided into non-intersecting regions by page color
  • Pages with the same color are mapped to the same cache region
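The color derivation above can be sketched numerically. The cache geometry below matches the Xeon 5160 L2 used later in the talk (4 MB, 16-way, assuming 64 B lines and 4 KB pages), but the constants and function names are our assumptions, not the paper's code.

```python
# Sketch: deriving a page's "color" from its physical address.
# Assumed geometry: 4 MB, 16-way L2 with 64 B lines and 4 KB pages.
CACHE_SIZE    = 4 * 1024 * 1024
ASSOCIATIVITY = 16
LINE_SIZE     = 64
PAGE_SIZE     = 4096

num_sets         = CACHE_SIZE // (ASSOCIATIVITY * LINE_SIZE)  # 4096 sets
set_index_bits   = num_sets.bit_length() - 1                  # 12 bits
line_offset_bits = LINE_SIZE.bit_length() - 1                 # 6 bits
page_offset_bits = PAGE_SIZE.bit_length() - 1                 # 12 bits

# Color bits = the set-index bits that lie above the page offset,
# i.e. the bits shared by the page number and the cache set index.
color_bits = set_index_bits + line_offset_bits - page_offset_bits  # 6
num_colors = 1 << color_bits                                       # 64

def page_color(phys_addr):
    """Pages of the same color map to the same cache region."""
    return (phys_addr >> page_offset_bits) & (num_colors - 1)

print(num_colors)              # 64 colors under these assumptions
print(page_color(0x12345000))  # 5
```

With 64 colors, giving a process 16 colors confines it to a quarter of the cache.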
Cache Partitioning – Page Coloring (figures)
Dynamic OS-based Cache Partitioning
• Adjusts cache quotas among processes dynamically
• Page recoloring procedure
  • Increases a process's cache resources (i.e., the number of colors it uses)
  • The kernel rearranges the virtual memory mapping of the process by
    • Allocating physical pages of the new color
    • Copying the memory contents
    • Freeing the old pages
• Remapping virtual pages causes performance overhead
  • Reduce the overall overhead by lowering the frequency of cache allocation adjustments
  • Another option is a lazy method of page migration, where a colored page's contents are moved only when the page is accessed
    • The average overhead of dynamic partitioning is reduced to 2%
    • The highest migration overhead observed is 7%
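The recoloring steps above can be sketched as kernel-style bookkeeping over plain dictionaries. This is a toy model, not the actual kernel implementation; all names and addresses are illustrative.

```python
# Toy sketch of page recoloring: allocate a frame of the new color,
# copy the contents, free the old frame, remap the virtual page.
free_pages = {0: [0x1000, 0x2000], 1: [0x5000, 0x6000]}  # color -> free frames
page_table = {0xA000: (0x1000, 0)}   # virtual page -> (physical frame, color)
memory     = {0x1000: b"payload"}    # physical frame -> contents

def recolor(vpage, new_color):
    old_frame, old_color = page_table[vpage]
    new_frame = free_pages[new_color].pop()     # allocate a new-color frame
    memory[new_frame] = memory.pop(old_frame)   # copy the page contents
    free_pages[old_color].append(old_frame)     # free the old frame
    page_table[vpage] = (new_frame, new_color)  # remap the virtual page

recolor(0xA000, 1)
print(page_table[0xA000])  # the page is now backed by a color-1 frame
```

The lazy variant mentioned above would defer the copy step until the page is next accessed, trading a page fault for avoided up-front copies.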
Page Recoloring (figure)
Dynamic Cache Partitioning Policies
• Cache partitioning is adjusted periodically by the policies at the end of each epoch
• Dynamic cache partitioning policy for performance
  • Adjusts cache partitioning dynamically
  • Metrics
    • Throughput (IPCs)
    • Combined miss rate
    • Combined misses
    • Fair speedup
• Dynamic cache partitioning policy for fairness
  • Two dynamic policies were implemented, based on FM0 and FM4
    • FM0 is the evaluation metric (i.e., the ratio of the current cumulative IPC over the baseline IPC)
    • FM4 is based on cache miss rates
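One plausible shape of such an epoch-driven loop is sketched below: at each epoch boundary, try shifting one color between the two programs and keep the move if the chosen policy metric (here: summed IPC) improves. This is our illustration, not the paper's algorithm; `measure_ipcs` stands in for reading hardware counters, and `fake_ipcs` is a made-up workload model.

```python
# Sketch of an epoch-driven partitioning policy: greedy one-color moves,
# kept only if the policy metric (summed IPC) improves.
def tune_partition(colors_a, colors_b, measure_ipcs, epochs=10):
    best = sum(measure_ipcs(colors_a, colors_b))
    for _ in range(epochs):                 # one adjustment per epoch
        for delta in (+1, -1):              # try giving a color to A, then to B
            na, nb = colors_a + delta, colors_b - delta
            if na < 1 or nb < 1:
                continue
            score = sum(measure_ipcs(na, nb))
            if score > best:                # keep the move only if it helps
                colors_a, colors_b, best = na, nb, score
                break
    return colors_a, colors_b

# Hypothetical workload: program A benefits from extra cache, B barely does.
def fake_ipcs(a, b):
    return (1.0 + 0.05 * a, 0.8 + 0.01 * b)

print(tune_partition(32, 32, fake_ipcs))  # colors shift toward program A
```

A policy optimizing fair speedup or miss rates would keep the same loop and swap only the scoring function.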
Dynamic Cache Partitioning Policies (cont.)
• Dynamic cache partitioning policy with QoS consideration
  • A two-core workload of two programs
    • The first is the target program
    • The second is the partner program
  • QoS guarantee
    • Ensure the target program's performance is at least X% of a baseline execution of a homogeneous workload on a dual-core processor, with half of the cache capacity allocated to each program
    • Increase the performance of the partner program
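The QoS guarantee above amounts to an admission test on each candidate partitioning. A minimal sketch, with an assumed threshold of X = 95 and illustrative IPC numbers (neither is from the paper):

```python
# Sketch of the QoS guarantee: a partitioning is admissible only if the
# target program keeps at least X% of its baseline performance, where
# the baseline run gave it half of the cache.
def qos_ok(target_ipc, baseline_ipc, threshold_pct=95):
    return target_ipc >= baseline_ipc * threshold_pct / 100.0

# While the guarantee holds, the policy may shrink the target's quota
# to boost the partner program:
print(qos_ok(target_ipc=1.90, baseline_ipc=2.00))  # True: within 95%
print(qos_ok(target_ipc=1.80, baseline_ipc=2.00))  # False: QoS violated
```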
Experimental Methodology
• Hardware and software platform
  • Dell PowerEdge 1950
    • Two dual-core, 3.0 GHz Intel Xeon 5160 processors and 8 GB Fully Buffered DIMM (FB-DIMM) main memory
    • Shared, 4 MB, 16-way set-associative L2 cache
    • Each core has a private 32 KB instruction cache and a private 32 KB data cache
  • Red Hat Enterprise Linux 4.0
    • Kernel linux-2.6.20.3
  • Performance data collected using pfmon
Evaluation Results
• Shows the improvement of the best static partitioning of each workload over a shared cache
The Performance – Static & Dynamic (figure)
Fairness – Correlation between Evaluation Metrics and Policy Metrics (figure)
QoS – Static & Dynamic (figure)
Related Work
• Cache partitioning for multicore processors
• Page coloring
Summary
• An OS-based cache partitioning mechanism for multicore processors was designed and implemented
• It was used to study different cache partitioning policies
• Some simulation-based studies' findings were confirmed; however, this approach also reveals new insights that simulation had not shown
• Future work
  • Reduce cache partitioning overhead
  • Add an easy user interface
  • Conduct partitioning research at the compiler level for both multiprogramming and multithreaded applications
Discussion
• Did the OS-based approach provide new insights and observations that simulation could not, or failed to, reveal?
References
• Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
• http://www.contrib.andrew.cmu.edu/~hyoseunk/pdf/ecrts13-hyos-slides.pdf
• http://ftp.cs.rochester.edu/~xiao/eurosys09/euro061-zhang.pdf