
An unsophisticated cooperative approach to prefetching linked data structures

An unsophisticated cooperative approach to prefetching linked data structures. Alexander Galazin, Murad Neiman-zade, JSC MCST, Moscow. EPIC-8, April 24, 2010.


  1. An unsophisticated cooperative approach to prefetching linked data structures. Alexander Galazin, Murad Neiman-zade, JSC “MCST”, Moscow. EPIC-8, April 24, 2010.

  2. Motivation
     • Pointer-based applications lose significant performance because their memory access patterns are irregular.
     • There is no information on how the addresses of linked data structures (LDS) evolve in major applications.
     • Existing approaches propose sophisticated cooperative techniques that require major modifications to the CPU.

  3. Background

     App         Procedure       %T app   Data misses
     181.mcf     flow_cost       53.7%    94.2%
                 update_tree     15.8%    95.1%
     197.parser  xfree            7.0%    43.6%
                 table_pointer    3.6%    59.4%
     254.gap     CollectGarb      9.4%    82.4%
     300.twolf   new_dbox_a      17.3%    71.0%

  4. Studying LDS Traversal
     • Discover LDS traversal
     • Collect $\Delta_k = addr_i - addr_{i-k}$, where addr is the address with which the LDS traversal operates, i is the loop iteration, and k = {1..16}
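
The delta collection on this slide can be illustrated offline. The following is a minimal sketch, assuming the traversal addresses have already been recorded into an array; the names `delta_k` and `delta_share` are hypothetical and do not come from the presentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Delta for distance k at iteration i: addr[i] - addr[i - k]. */
static int64_t delta_k(const uintptr_t *addr, size_t i, size_t k) {
    assert(k > 0 && i >= k);
    return (int64_t)addr[i] - (int64_t)addr[i - k];
}

/* Fraction of iterations i in [k, n) whose delta for distance k equals `value`. */
static double delta_share(const uintptr_t *addr, size_t n, size_t k, int64_t value) {
    if (k == 0 || n <= k) {
        return 0.0;              /* no iteration has a k-th predecessor */
    }
    size_t hits = 0;
    for (size_t i = k; i < n; i++) {
        if (delta_k(addr, i, k) == value) {
            hits++;
        }
    }
    return (double)hits / (double)(n - k);
}
```

Calling `delta_share` for each k from 1 to 16 with a candidate delta gives the kind of coverage figure reported per procedure on the next slide.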

  5. LDS Traversal Behavior
     • 181.mcf
       – flow_cost: 2 addresses in LDS and only 1 Δ if k is fixed
       – update_tree: 3 Δ in 97%
     • 197.parser
       – xfree: 1 Δ in 90%
       – table_pointer: 3 Δ in 49%
     • 254.gap
       – CollectGarb: 2 Δ in 96%
     • 300.twolf
       – new_dbox_a: 3 Δ in 98%

  6. Our method
     • Architectural support
       – New instruction IsOperandsNotReady
     • Compiler support
       – Discover LDS traversal
       – Inject prefetching code
       – Create compensating nodes

  7. Architectural support
     • IsOperandsNotReady(TI)
       – returns TRUE if any of the operands of TI are not ready
       – otherwise FALSE
       – is always scheduled together with TI in the same wide instruction and requires 1 logical unit.

     C-code:
       while(a)
       {
         a=a->next;
       }

     ASM-code:
       {
         cmpesb,1 %r0, 0, %pred1
         pass %ionr1, %pred5
       }

  8. Compiler support. Preparation
     • for each LD we create a global array that keeps the 3 most popular Δ and their frequencies;
     • we keep a history of the addresses of the load for D iterations;
     • in the preloop we load all elements of the array into registers;
     • in the postloop we save the values of the top 3 Δ and their frequencies back to the array.

     Code schematic from the slide:
       original loop:  LD r1 → r1
       preloop:        LD arr[i] → d_i;  LD arr[i+1] → f_i
       loop:           HISTORY(r1);  LD r1 → r1
       postloop:       ST arr[i] ← d_i;  ST arr[i+1] ← f_i
       HISTORY(r1):    MOV r1_i → r_i  …  MOV r1_(i+k) → r_(i+k)
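
The address history described on this slide can be sketched in software. The slide keeps the history in registers with a chain of MOV instructions; the ring buffer below is a substitute chosen for brevity, and `HISTORY_DEPTH`, `history_t`, `history_push`, and `history_oldest` are hypothetical names, with the depth value picked arbitrarily.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_DEPTH 4   /* assumed value for D; the slides do not give one */

typedef struct {
    uintptr_t addr[HISTORY_DEPTH];
    size_t next;    /* slot that the next push overwrites */
    size_t count;   /* number of valid entries, at most HISTORY_DEPTH */
} history_t;

/* Record the address used by the load on the current iteration. */
static void history_push(history_t *h, uintptr_t a) {
    h->addr[h->next] = a;
    h->next = (h->next + 1) % HISTORY_DEPTH;
    if (h->count < HISTORY_DEPTH) {
        h->count++;
    }
}

/* Oldest retained address; the history must be zero-initialized before use. */
static uintptr_t history_oldest(const history_t *h) {
    assert(h->count > 0);
    /* Until the buffer wraps, slot 0 is the oldest; afterwards it is `next`. */
    size_t idx = (h->count < HISTORY_DEPTH) ? 0 : h->next;
    return h->addr[idx];
}
```

The oldest retained address is what the compensating node later subtracts from the current load address.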

  9. Compiler support. Prefetching
     • in the loop head we create prefetches for (A+Δ), where A is the address of the LD on the current iteration;
     • after the USE of the LD result we add IsOperandsNotReady and a branch that transfers control to a compensating node.

     Code schematic from the slide:
       original loop:  LD r1 → r1;  USE(r1)
       preloop:        LD arr[i] → d_i;  LD arr[i+1] → f_i
       loop:           HISTORY(r1)
                       LD r1 → r1
                       PREFETCH(r1+d_1);  PREFETCH(r1+d_2);  PREFETCH(r1+d_3)
                       IsONR(USE) → P
                       BRANCH cn P
       postloop:       ST arr[i] ← d_i;  ST arr[i+1] ← f_i
       HISTORY(r1):    MOV r1_i → r_i  …  MOV r1_(i+k) → r_(i+k)
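
The prefetching step can be approximated in portable code. This is a rough sketch, assuming a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`; the node layout, `NUM_DELTAS`, and `sum_with_prefetch` are hypothetical. It covers only the prefetches for A+Δ. The IsOperandsNotReady check and the branch to the compensating node rely on the proposed instruction and have no equivalent in standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DELTAS 3   /* the slides keep the 3 most popular deltas */

typedef struct node {
    struct node *next;
    long value;
} node_t;

/* Walk the list and, at the head of each iteration, prefetch A + delta[j],
 * where A is the address of the current node. A delta of 0 means "unused". */
static long sum_with_prefetch(const node_t *a, const intptr_t delta[NUM_DELTAS]) {
    assert(delta != NULL);
    long sum = 0;
    while (a) {
        for (int j = 0; j < NUM_DELTAS; j++) {
            if (delta[j] != 0) {
                /* Integer arithmetic avoids forming an out-of-bounds pointer;
                 * the prefetch itself is only a hint to the hardware. */
                uintptr_t target = (uintptr_t)a + (uintptr_t)delta[j];
                __builtin_prefetch((const void *)target);
            }
        }
        sum += a->value;   /* the USE of the loaded data */
        a = a->next;       /* the LD that drives the traversal */
    }
    return sum;
}
```

A prefetch to a wrong address costs memory bandwidth but does not change the result, which is why the deltas only need to be right most of the time.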

  10. Compiler support. Calculating Δ
     • in the compensating node we calculate S, the difference between the current load address and its oldest retained address;
     • then we search for a Δ equal to S and, if there is one, we increment the register that keeps its frequency;
     • if there is no such Δ, we initialize a new register with S and set its frequency register to one;
     • if the frequency of S becomes greater than that of the previous register, we swap them, thus doing a “lazy bubble sort”.

     Code schematic from the slide:
       preloop:            LD arr[i] → d_i;  LD arr[i+1] → f_i
       loop:               HISTORY(r1)
                           LD r1 → r1
                           PREFETCH(r1+d_1);  PREFETCH(r1+d_2);  PREFETCH(r1+d_3)
                           IsONR(LD) → P
                           BRANCH cn P
       postloop:           ST arr[i] ← d_i;  ST arr[i+1] ← f_i
       HISTORY(r1):        MOV r1_i → r_i  …  MOV r1_(i+k) → r_(i+k)
       compensating node:  SUB r1, r_i → v_i
                           SEARCH(v_i) in d_i
                           INCR(f_i)
                           SWAP(d_i, d_i-1)
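
The bookkeeping done in the compensating node can be sketched as a small table update. This is an illustration under stated assumptions, not the authors' generated code: `delta_table_t` and `delta_table_update` are hypothetical names, a frequency of 0 is used to mark a free slot, and the policy of overwriting the last slot when the table is full is an assumption, since the slides do not say what happens in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DELTAS 3

typedef struct {
    int64_t  delta[NUM_DELTAS];
    uint32_t freq[NUM_DELTAS];   /* 0 marks a free slot (assumption) */
} delta_table_t;

/* Record one observation of the delta s, then do at most one swap with the
 * previous slot so that frequent deltas drift toward the front. */
static void delta_table_update(delta_table_t *t, int64_t s) {
    assert(t != NULL);
    size_t i;
    for (i = 0; i < NUM_DELTAS; i++) {
        if (t->freq[i] == 0) {        /* free slot: start tracking s */
            t->delta[i] = s;
            t->freq[i] = 1;
            break;
        }
        if (t->delta[i] == s) {       /* known delta: bump its frequency */
            t->freq[i]++;
            break;
        }
    }
    if (i == NUM_DELTAS) {            /* table full: overwrite last slot (assumption) */
        i = NUM_DELTAS - 1;
        t->delta[i] = s;
        t->freq[i] = 1;
    }
    if (i > 0 && t->freq[i] > t->freq[i - 1]) {   /* one "lazy bubble sort" step */
        int64_t  d = t->delta[i];
        uint32_t f = t->freq[i];
        t->delta[i] = t->delta[i - 1];
        t->freq[i]  = t->freq[i - 1];
        t->delta[i - 1] = d;
        t->freq[i - 1]  = f;
    }
}
```

Only one neighbor swap happens per update, so the table becomes ordered gradually over many visits to the compensating node instead of being fully sorted each time.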

  11. Experimental results
     • The method was evaluated on a computer with the Elbrus microprocessor.
     • The microprocessor has an EPIC architecture, a 4-way associative L2 cache of 256 KB, and 4 load/store units.
     • 181.mcf reduced by 15%
     • 254.gap reduced by 4%
     • The method is still in active development.
