Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
Presented by Hadeel Alabandi, 2/18/2014
Introduction and Motivation
• A serious obstacle to the effective utilization of multicore processors is cache partitioning and sharing
• Existing studies evaluated cache partitioning through simulation; however, simulation has several limitations
  • Excessive simulation time
  • Absence of OS activities
  • Proneness to simulation inaccuracy
Introduction and Motivation (cont.)
• In this paper, a software approach is used
  • It supports static and dynamic cache partitioning by using memory address mapping
  • It emulates a hardware partitioning mechanism, making it possible to examine cache partitioning policies on real systems
• Three metrics were used in the evaluation for optimization purposes
  • Performance
  • Fairness
  • QoS
Cache Partitioning for Multicore Processors
• It has two interdependent parts
  • Mechanism
    • Enforces cache partitioning
    • Provides input to the partitioning policy
  • Policy
    • Decides how much cache resource is allocated to each program, guided by an optimization objective
Adopted Evaluation Metrics in the Study
• Performance metrics
  • Throughput (IPCs): the absolute sum of the programs' IPC values
  • Combined miss rates: summarizes the miss rates
  • Combined misses: summarizes the number of cache misses
• QoS metrics
  • Assume that QoS constraints are never violated in this case
Adopted Evaluation Metrics in the Study (cont.)
• Fairness metrics
  • Miss rates
  • The number of misses
  • The slowdown of each co-scheduled program should be identical after cache partitioning
  • In the study, fairness metrics are relative to single-core execution with a dedicated L2 cache
• Data required for the policy metric and the evaluation metric were acquired by running a workload with different cache partitionings
  • The resulting correlation value falls in the range (-1 to 1)
  • If the result is 1, the correlation between the two metrics is perfect
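The correlation described above can be sketched as a Pearson coefficient computed over measurements taken at different partitionings. This is an illustrative sketch, not the paper's code; the sample numbers are hypothetical.

```python
# Sketch (not the paper's code): Pearson correlation between a policy
# metric and an evaluation metric, measured across several cache
# partitionings of the same workload. Sample values are illustrative.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical measurements at five different partition sizes:
policy_metric = [0.12, 0.10, 0.08, 0.07, 0.05]   # e.g. combined miss rate
eval_metric   = [1.10, 1.05, 0.98, 0.95, 0.90]   # e.g. slowdown ratio

r = pearson(policy_metric, eval_metric)
print(round(r, 3))  # close to 1: the policy metric tracks the evaluation metric
```

A value near 1 means optimizing the cheap-to-measure policy metric is a good proxy for optimizing the true evaluation metric.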
Static OS-based Cache Partitioning
• A static cache partitioning policy predetermines the amount of cache blocks allocated to each program at the beginning of its execution
• Page coloring is used as the partitioning mechanism
  • Several bits of the physical address are shared between the cache set index and the physical page number
  • These bits are used as the page color
  • The cache is divided into non-intersecting regions by page color
  • Pages with the same color are mapped to the same cache region
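The color derivation above can be sketched numerically. The cache geometry below matches the Xeon 5160 L2 used later in the talk (4 MB, 16-way, assuming 64 B lines and 4 KB pages), but the constants and function names are our assumptions, not the paper's code.

```python
# Sketch: deriving a page's "color" from its physical address.
# Assumed geometry: 4 MB, 16-way L2 with 64 B lines and 4 KB pages.
CACHE_SIZE    = 4 * 1024 * 1024
ASSOCIATIVITY = 16
LINE_SIZE     = 64
PAGE_SIZE     = 4096

num_sets         = CACHE_SIZE // (ASSOCIATIVITY * LINE_SIZE)  # 4096 sets
set_index_bits   = num_sets.bit_length() - 1                  # 12 bits
line_offset_bits = LINE_SIZE.bit_length() - 1                 # 6 bits
page_offset_bits = PAGE_SIZE.bit_length() - 1                 # 12 bits

# Color bits = the set-index bits that lie above the page offset,
# i.e. the bits shared by the page number and the cache set index.
color_bits = set_index_bits + line_offset_bits - page_offset_bits  # 6
num_colors = 1 << color_bits                                       # 64

def page_color(phys_addr):
    """Pages of the same color map to the same cache region."""
    return (phys_addr >> page_offset_bits) & (num_colors - 1)

print(num_colors)              # 64 colors under these assumptions
print(page_color(0x12345000))  # 5
```

With 64 colors, giving a process 16 colors confines it to a quarter of the cache.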
Cache Partitioning – Page Coloring (figures)
Dynamic OS-based Cache Partitioning
• Adjusts cache quotas among processes dynamically
• Page recoloring procedure
  • Increases a process's cache resources (i.e., the number of colors it uses)
  • The kernel rearranges the virtual memory mapping of the process by
    • Allocating physical pages of the new color
    • Copying the memory contents
    • Freeing the old pages
• Remapping virtual pages causes performance overhead
  • Reduce the overall overhead by lowering the frequency of cache allocation adjustments
  • Another option is a lazy method of page migration, where a colored page's contents are moved only when the page is accessed
    • The average overhead of dynamic partitioning is reduced to 2%
    • The highest migration overhead observed is 7%
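The recoloring steps above can be sketched as kernel-style bookkeeping over plain dictionaries. This is a toy model, not the actual kernel implementation; all names and addresses are illustrative.

```python
# Toy sketch of page recoloring: allocate a frame of the new color,
# copy the contents, free the old frame, remap the virtual page.
free_pages = {0: [0x1000, 0x2000], 1: [0x5000, 0x6000]}  # color -> free frames
page_table = {0xA000: (0x1000, 0)}   # virtual page -> (physical frame, color)
memory     = {0x1000: b"payload"}    # physical frame -> contents

def recolor(vpage, new_color):
    old_frame, old_color = page_table[vpage]
    new_frame = free_pages[new_color].pop()     # allocate a new-color frame
    memory[new_frame] = memory.pop(old_frame)   # copy the page contents
    free_pages[old_color].append(old_frame)     # free the old frame
    page_table[vpage] = (new_frame, new_color)  # remap the virtual page

recolor(0xA000, 1)
print(page_table[0xA000])  # the page is now backed by a color-1 frame
```

The lazy variant mentioned above would defer the copy step until the page is next accessed, trading a page fault for avoided up-front copies.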
Page Recoloring (figure)
Dynamic Cache Partitioning Policies
• Cache partitioning is adjusted periodically by the policies at the end of each epoch
• Dynamic cache partitioning policy for performance
  • Adjusts cache partitioning dynamically
  • Metrics
    • Throughput (IPCs)
    • Combined miss rate
    • Combined misses
    • Fair speedup
• Dynamic cache partitioning policy for fairness
  • Two dynamic policies were implemented, based on FM0 and FM4
    • FM0 is the evaluation metric (i.e., the ratio of the current cumulative IPC over the baseline IPC)
    • FM4 is based on cache miss rates
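One plausible shape of such an epoch-driven loop is sketched below: at each epoch boundary, try shifting one color between the two programs and keep the move if the chosen policy metric (here: summed IPC) improves. This is our illustration, not the paper's algorithm; `measure_ipcs` stands in for reading hardware counters, and `fake_ipcs` is a made-up workload model.

```python
# Sketch of an epoch-driven partitioning policy: greedy one-color moves,
# kept only if the policy metric (summed IPC) improves.
def tune_partition(colors_a, colors_b, measure_ipcs, epochs=10):
    best = sum(measure_ipcs(colors_a, colors_b))
    for _ in range(epochs):                 # one adjustment per epoch
        for delta in (+1, -1):              # try giving a color to A, then to B
            na, nb = colors_a + delta, colors_b - delta
            if na < 1 or nb < 1:
                continue
            score = sum(measure_ipcs(na, nb))
            if score > best:                # keep the move only if it helps
                colors_a, colors_b, best = na, nb, score
                break
    return colors_a, colors_b

# Hypothetical workload: program A benefits from extra cache, B barely does.
def fake_ipcs(a, b):
    return (1.0 + 0.05 * a, 0.8 + 0.01 * b)

print(tune_partition(32, 32, fake_ipcs))  # colors shift toward program A
```

A policy optimizing fair speedup or miss rates would keep the same loop and swap only the scoring function.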
Dynamic Cache Partitioning Policies (cont.)
• Dynamic cache partitioning policy with QoS consideration
  • A two-core workload of two programs
    • The first is the target program
    • The second is the partner program
  • QoS guarantee
    • Ensure the target program's performance is at least X% of a baseline execution of a homogeneous workload on a dual-core processor, with half of the cache capacity allocated to each program
    • Increase the performance of the partner program
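The QoS guarantee above amounts to an admission test on each candidate partitioning. A minimal sketch, with an assumed threshold of X = 95 and illustrative IPC numbers (neither is from the paper):

```python
# Sketch of the QoS guarantee: a partitioning is admissible only if the
# target program keeps at least X% of its baseline performance, where
# the baseline run gave it half of the cache.
def qos_ok(target_ipc, baseline_ipc, threshold_pct=95):
    return target_ipc >= baseline_ipc * threshold_pct / 100.0

# While the guarantee holds, the policy may shrink the target's quota
# to boost the partner program:
print(qos_ok(target_ipc=1.90, baseline_ipc=2.00))  # True: within 95%
print(qos_ok(target_ipc=1.80, baseline_ipc=2.00))  # False: QoS violated
```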
Experimental Methodology
• Hardware and software platform
  • Dell PowerEdge 1950
    • Two dual-core, 3.0 GHz Intel Xeon 5160 processors and 8 GB Fully Buffered DIMM (FB-DIMM) main memory
    • Shared, 4 MB, 16-way set-associative L2 cache
    • Each core has a private 32 KB instruction cache and a private 32 KB data cache
  • Red Hat Enterprise Linux 4.0
    • Kernel linux-2.6.20.3
  • Performance data collected using pfmon
Evaluation Results
• Shows the improvement of the best static partitioning of each workload over a shared cache
The Performance – Static & Dynamic (figure)
Fairness – Correlation between Evaluation Metrics and Policy Metrics (figure)
QoS – Static & Dynamic (figure)
Related Work
• Cache partitioning for multicore processors
• Page coloring
Summary
• An OS-based cache partitioning mechanism for multicore processors was designed and implemented
• It was used to study different cache partitioning policies
• Some simulation-based studies' findings were confirmed; however, this approach also reveals new insights that simulation had not shown
• Future work
  • Reduce cache partitioning overhead
  • Add an easy user interface
  • Conduct partitioning research at the compiler level for both multiprogramming and multithreaded applications
Discussion
• Did the OS-based approach provide new insights and observations that simulation could not, or failed to, reveal?
References
• Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
• http://www.contrib.andrew.cmu.edu/~hyoseunk/pdf/ecrts13-hyos-slides.pdf
• http://ftp.cs.rochester.edu/~xiao/eurosys09/euro061-zhang.pdf