
GLIMPSES: Memory and program behavior estimation for SPEs

Jaswanth Sreeram, Ling Liu, Santosh Pande


  1. GLIMPSES: Memory and program behavior estimation for SPEs. Jaswanth Sreeram, Ling Liu, Santosh Pande

  2. Motivation
     • “Prototyping large codebases for porting to SPEs is challenging”
     • “Need a way to quickly evaluate program behavior and its suitability for SPEs”
       – Important for legacy code/reuse

  3. Motivation (contd)
     • Porting large codebases to SPEs is challenging
       – Limited local store
       – High branch penalty
       – Geared towards vectorizable code
       – Code/data partitioning is not trivial
       – SPE-SPE, SPE-PPE interactions
     • Provide programmer with tools to
       – Understand dynamic program behavior
       – Quickly construct candidate partitions for SPEs
       – Evaluate/quantify partitions’ suitability for SPEs

  4. GLIMPSES: Tool Overview
     • Dynamic Call Graphs
     • Memory Requirements
       – Dynamic
       – Analytical
     • Memory Access Patterns
       – Locality (spatial, temporal, neighbor affinity)
     • Partitioning
       – Criteria based estimates
     • Visual, interactive

  5. Dynamic Call Chains
     [Screenshot: graph visualization area and results display panel]

  6. Call chains (contd)

  7. Mpeg-2 Decode
     • Zoom view
     • Shows dynamic call chains for a program run (in this case the program is mpeg2-decode)

  8. GLIMPSES flow
     [Diagram. Box labels: C/C++ program; LLVM compiler; Bytecode; Analysis & Instrumentation Passes; Instru. Bytecode; Runtime; Link; Test Inputs; Execute; Profile Trace; Analytical Memory Estimator; Dyn. Memory Estimator; Partition Estimator; GraphML Trace; Visualization Engine]

  9. Memory Behavior
     • Estimate static and dyn. memory usage
       – Code, stack and heap (per function)
       – Usage < SPE LS limit?
     • Estimate function attributes
       – Branch density
       – Number of auto-vectorizable loops
     • Analytical estimation
       – Detect program objects affecting dynamic memory behavior
       – Show correlation between these program objects and memory usage
     • Construct an arithmetic expression for amount of memory allocated, in terms of inputs or other program objects
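The "Usage < SPE LS limit?" check on this slide can be illustrated with a small sketch. It is not GLIMPSES code: the `FunctionFootprint` struct, the function names, and the sample byte counts are hypothetical. The 256 KB constant is the size of the Cell SPE local store; the limit is passed as a parameter because the budget actually available to a function is usually smaller.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the Cell SPE local store. The usable budget is normally
   smaller, so callers pass the limit explicitly. */
#define SPE_LOCAL_STORE_BYTES ((size_t)256 * 1024)

/* Hypothetical per-function estimate: code, stack and heap, as on the slide. */
typedef struct {
    const char *name;
    size_t code_bytes;
    size_t stack_bytes;
    size_t heap_bytes;
} FunctionFootprint;

/* Total estimated memory for one function. */
size_t footprint_total(const FunctionFootprint *f)
{
    assert(f != NULL);
    return f->code_bytes + f->stack_bytes + f->heap_bytes;
}

/* The slide's test: is the usage strictly below the local-store limit? */
int fits_in_local_store(const FunctionFootprint *f, size_t limit)
{
    return footprint_total(f) < limit;
}
```

With made-up numbers, a function estimated at 8192 bytes of code and 2048 bytes of stack passes the check against `SPE_LOCAL_STORE_BYTES`, while one with a heap estimate above 1 MB does not.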

  10. Analytical Estimator: Mpeg2 example
      Code segment:
          for (……)
          {
            if (cc==0)
              size = Picture_Width*Picture_Height;
            else
              size = Chroma_Width*Chroma_Height;
            if (!(backward_reference_frame[cc] =
                  (unsigned char *) malloc(size) ))
              Error(…);
            if (!(forward_reference_frame[cc] =
                  (unsigned char *) malloc(size) ))
              Error(…);
          }
      Result:
          __Malloc_size__1 = 1024
          __Malloc_size__2 = 0 + Coded_Picture_Width*Coded_Picture_Height
          __Malloc_size__3 = 0 + Coded_Picture_Width*Coded_Picture_Height
          __Malloc_size__4 = 0 + Coded_Picture_Width*Coded_Picture_Height
          __Malloc_size__5 = 0 + Chroma_Width*Chroma_Height
          __Malloc_size__6 = 0 + Chroma_Width*Chroma_Height
          __Malloc_size__7 = 0 + Chroma_Width*Chroma_Height
          __Malloc_size__8 = 0 + Chroma_Width*Chroma_Height
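To show how such symbolic results might be used, here is a minimal sketch that plugs concrete inputs into the eight expressions listed in the result and sums them. The function name `estimate_malloc_bytes` and its parameters are hypothetical, and the sketch assumes the eight listed expressions are the only allocations of interest; it is not output of the tool.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the eight __Malloc_size__ expressions from the slide for given
   picture dimensions. Assumes the products do not overflow size_t. */
size_t estimate_malloc_bytes(size_t coded_w, size_t coded_h,
                             size_t chroma_w, size_t chroma_h)
{
    assert(coded_w > 0 && coded_h > 0);

    size_t fixed  = 1024;                /* __Malloc_size__1          */
    size_t luma   = coded_w * coded_h;   /* __Malloc_size__2 .. __4   */
    size_t chroma = chroma_w * chroma_h; /* __Malloc_size__5 .. __8   */

    return fixed + 3 * luma + 4 * chroma;
}
```

For example, a 704x480 picture with 352x240 chroma planes gives 1024 + 3*337920 + 4*84480 = 1352704 bytes, well above the 256 KB SPE local store, so these buffers could not all be resident on one SPE without further partitioning.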

  11. Memory Access Patterns
      • Locality metrics for loads/stores
        – Spatial Locality
          • “Loads to different addresses in a spatial window”
        – Temporal Locality
          • “Loads to same address in a time window”
        – Neighbor Affinity
          • “Loads to addresses within a space and time window”
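The three definitions above can be made concrete with a small counting sketch over a trace of load addresses. This is one possible reading of the slide, not the tool's implementation. The assumptions are: the time window is measured in number of preceding loads, the spatial window is a byte distance, a load counts toward spatial locality if any earlier load touched a different address within the spatial window, and neighbor affinity requires both windows at once (same address included). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t temporal; /* loads repeating an address seen within the time window  */
    size_t spatial;  /* loads near (but not at) any earlier load address         */
    size_t neighbor; /* loads within both the time and the spatial window        */
} LocalityCounts;

static uint64_t abs_diff(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/* Quadratic scan over the trace; fine for a sketch, too slow for real traces. */
LocalityCounts count_locality(const uint64_t *addr, size_t n,
                              size_t time_window, uint64_t space_window)
{
    assert(addr != NULL || n == 0);
    LocalityCounts c = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        int t = 0, s = 0, na = 0;
        for (size_t j = 0; j < i; j++) {
            int recent = (i - j) <= time_window;
            uint64_t d = abs_diff(addr[i], addr[j]);
            if (recent && d == 0)
                t = 1;
            if (d != 0 && d <= space_window)
                s = 1;
            if (recent && d <= space_window)
                na = 1;
        }
        c.temporal += (size_t)t;
        c.spatial  += (size_t)s;
        c.neighbor += (size_t)na;
    }
    return c;
}
```

A histogram of such per-load counts over a whole run is the kind of data the locality charts on the following slides summarize.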

  12. Locality Measures (mpeg2decode)
      [Chart: Memory Access Locality: Recurrence of Loads. x-axis: number of times recurred (1 to 61); y-axis: number of loads (0 to 1200)]

  13. Locality measures: Affinity
      [Chart: Locality: Neighbor Affinity. x-axis: "NA" values (1 to 61); y-axis: number of loads (0 to 8000)]

  14. Program Partitions
      • Provide programmer with possible partition candidates
        – Can be based on criteria:
          • Memory consumption
          • Memory reference behavior
          • Branch density
          • Auto-vectorizable loops
          • Aliasing
          • Combination (a “rank” metric)
        – Does not assume code/data overlays

  15. Partitioning
      • Start with earliest leaf node in dyn. call graph
      • Try to add its parent to the partition
        – Try to add all of parent’s children to the partition
        – If they can be added, try to add parent to partition
      • Try to add parent’s parent to partition
      • Estimates only: no code generation in a partition
      • Programmer to take care of “cloning”
      • Can produce interprocedural, context-sensitive alias information
      • Given two partitions, can they alias each other’s data?
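The bottom-up growth described on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the GLIMPSES algorithm: the dynamic call graph is treated as a tree stored in a `parent` array, the only criterion for "can be added" is a memory budget, and adding a parent's children means adding their whole subtrees. The names `grow_partition` and `subtree_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `node` lies in the subtree rooted at `root`. */
static int is_in_subtree(const int *parent, int node, int root)
{
    while (node != -1) {
        if (node == root)
            return 1;
        node = parent[node];
    }
    return 0;
}

/* Total estimated bytes of all functions in the subtree rooted at `root`. */
static size_t subtree_bytes(const int *parent, const size_t *bytes,
                            int n, int root)
{
    size_t total = 0;
    for (int k = 0; k < n; k++)
        if (is_in_subtree(parent, k, root))
            total += bytes[k];
    return total;
}

/* Start at `leaf` and keep moving the partition root to its parent while the
   parent plus all of its children (with their callees) still fit in `budget`.
   Returns the root of the resulting partition, or -1 if the leaf alone
   does not fit. parent[i] == -1 marks the root of the call tree. */
int grow_partition(const int *parent, const size_t *bytes, int n,
                   int leaf, size_t budget)
{
    assert(leaf >= 0 && leaf < n);
    if (bytes[leaf] > budget)
        return -1;

    int root = leaf;
    while (parent[root] != -1 &&
           subtree_bytes(parent, bytes, n, parent[root]) <= budget)
        root = parent[root];
    return root;
}
```

As on the slide, this only produces an estimate of which functions could share a partition; it generates no code, and other criteria (branch density, aliasing, a combined rank) would replace or extend the budget test.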

  16. Status
      • Several features/improvements planned
        – Alias Analysis information for refining partition-set
        – Alias Analysis information for data pinning/prefetching opportunities
        – Leverage DataStructureAnalyses for smart memory allocation on SPUs
      • Tested on
        – Workloads from SPECINT
        – Workloads from mediabench
        – ODE (Open Dynamics Engine)
      • Beta version to be released shortly

  17. The End. Email contact: jaswanth@cc.gatech.edu
