
Uncovering Hidden Loop Level Parallelism in Sequential Applications



  1. Uncovering Hidden Loop Level Parallelism in Sequential Applications. Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke. Advanced Computer Architecture Lab, Electrical Engineering and Computer Science, University of Michigan

  2. CMP Architectures • Multiple cores on a chip – Higher throughput – Reduced complexity (per core) – More power/heat friendly • Multithreaded applications. Pictured: Intel Core 2 Duo, AMD Quad-core (Barcelona), Sun Niagara 2

  3. How About Single Thread? [Source: Bridges et al., MICRO '07]

  4. Loop Level Parallelization. DOALL loop: i = 0-39

  5. Loop Level Parallelization. DOALL loop: i = 0-39 is split into i = 0-19 and i = 20-39, one half per core (Core 0, Core 1)

  6. Loop Level Parallelization. Speculative DOALL loop: i = 0-39

  7. Loop Level Parallelization. Speculative DOALL loop: i = 0-39 is divided into loop chunks i = 0-9, i = 10-19, i = 20-29, i = 30-39, which are distributed across Core 0 and Core 1
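The chunking on this slide can be sketched in a few lines of C. This is only an illustration under stated assumptions: the slide does not say which chunk goes to which core, so the round-robin assignment below is a guess, and the names `chunk` and `chunk_bounds` are hypothetical.

```c
#include <assert.h>

/* One loop chunk: iterations [start, end) and the core assumed to run it. */
typedef struct {
    int start;
    int end;
    int core;
} chunk;

/*
 * Bounds of the k-th chunk of an n-iteration loop cut into pieces of
 * chunk_size iterations. Assumption (not stated on the slide): chunks
 * are handed to cores round-robin.
 */
chunk chunk_bounds(int k, int n, int chunk_size, int num_cores) {
    assert(k >= 0 && n >= 0 && chunk_size > 0 && num_cores > 0);
    chunk c;
    c.start = k * chunk_size;
    if (c.start > n) c.start = n;   /* chunk lies past the end of the loop */
    c.end = c.start + chunk_size;
    if (c.end > n) c.end = n;       /* last chunk may be shorter */
    c.core = k % num_cores;
    return c;
}
```

With n = 40, chunk_size = 10 and two cores, chunks 0 to 3 cover the four ranges shown on the slide; under the round-robin assumption, chunks 0 and 2 land on core 0 and chunks 1 and 3 on core 1.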

  8. Loop Level Parallelization. Speculative DOALL loop: i = 0-39 is divided into loop chunks i = 0-9, i = 10-19, i = 20-29, i = 30-39, which are distributed across Core 0 and Core 1. Bad news: limited number of parallel loops in general purpose applications – 1.3x speedup for SpecINT2000 on 4 cores

  9. Contributions • Code generation framework – Speculative parallelization of uncounted loops • Compiler transformations – Speculative loop fission – Isolation of infrequent dependences – Speculative prematerialization. Diagram labels: Spawn, Initialization, Abort Handler, Consolidation. Diagram code:
      XBEGIN
      if (global_brk_flag) break;
      for (i=IS; i<IE; i++) {
        ......
        if (brk_cond) { local_brk_flag = 1; break; }
      }
      perm = RECV(THREAD j-1)
      XCOMMIT
      if (local_brk_flag) { global_brk_flag = 1; kill_other_threads; }
      else if (IE < n) SEND(perm, THREAD j+1)
      IS = ...; IE = ...;

  10. Target Architecture. Four cores (Core 0, Core 1, Core 2, Core 3) with L2 cache

  11. Target Architecture. Four cores (Core 0, Core 1, Core 2, Core 3) with L2 cache, connected by a scalar operand network

  12. Target Architecture. Four cores (Core 0, Core 1, Core 2, Core 3) with L2 cache, connected by a scalar operand network, with hardware transactional memory

  13. Code Generation Framework
      for (i=0;i<n;i++)
        // original loop code

  14. Code Generation Framework
      while (...) {
        IS+=...; IE+=...;
        XBEGIN
        for (i=IS;i<IE;i++)
          // original loop code
        XCOMMIT
      }

  15. Code Generation Framework
      while (...) {
        IS+=...; IE+=...;
        XBEGIN
        for (i=IS;i<IE;i++)
          // original loop code
        RECV(THREAD j-1)
        XCOMMIT
        SEND(THREAD j+1)
      }

  16. Code Generation Framework
      Spawn
      while (...) {
        IS+=...; IE+=...;
        XBEGIN
        for (i=IS;i<IE;i++)
          // original loop code
        RECV(THREAD j-1)
        XCOMMIT
        SEND(THREAD j+1)
      }

  17. Code Generation Framework
      Spawn
      while (...) {
        IS+=...; IE+=...;
        XBEGIN
        for (i=IS;i<IE;i++) {
          // original loop code
          if (brkCond) break;
        }
        RECV(THREAD j-1)
        XCOMMIT
        SEND(THREAD j+1)
      }

  18. Code Generation Framework
      Spawn
      while (...) {
        IS+=...; IE+=...;
        XBEGIN
        if (globalBrk) break;
        for (i=IS;i<IE;i++) {
          // original loop code
          if (brkCond) { localBrk=1; break; }
        }
        RECV(THREAD j-1)
        XCOMMIT
        if (localBrk) { globalBrk=1; abortOtherTXs; }
        SEND(THREAD j+1)
      }

  19. Code Generation Framework
      Spawn
      while (...) {
        IS+=...; IE+=...;
        XBEGIN
        if (globalBrk) break;
        for (i=IS;i<IE;i++) {
          // original loop code
          if (brkCond) { localBrk=1; break; }
        }
        RECV(THREAD j-1)
        XCOMMIT
        if (localBrk) { globalBrk=1; abortOtherTXs; }
        SEND(THREAD j+1)
      }
      Consolidation

  20. Code Generation Framework • Supports counted and uncounted loops – Software managed control speculation • Iteration chunking • Enforce transaction ordering • Handles livein, liveout & accumulator registers
      Spawn
      while (...) {
        IS+=...; IE+=...;
        XBEGIN
        if (globalBrk) break;
        for (i=IS;i<IE;i++) {
          // original loop code
          if (brkCond) { localBrk=1; break; }
        }
        RECV(THREAD j-1)
        XCOMMIT
        if (localBrk) { globalBrk=1; abortOtherTXs; }
        SEND(THREAD j+1)
      }
      Consolidation
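The control flow of this framework can be approximated in plain C without transactional hardware. The sketch below is a sequential emulation and not the authors' generated code: chunks run one after another, each chunk accumulates into a private variable (standing in for the speculative state between XBEGIN and XCOMMIT), and results are consolidated in chunk order (standing in for the RECV/SEND ordering). The loop body (summing until a negative value is seen) and the names `loop_result`, `run_chunk` and `chunked_sum_until_negative` are assumptions made purely for illustration.

```c
#include <assert.h>

/* Final state after consolidation: accumulator plus the break position. */
typedef struct {
    long sum;       /* accumulator value */
    int broke_at;   /* index where the break condition fired, or -1 */
} loop_result;

/*
 * Runs iterations [is, ie) into a private accumulator, as a transaction
 * would buffer its updates. Returns 1 if the break condition fired
 * (the localBrk flag on the slide), 0 otherwise.
 */
static int run_chunk(const int *a, int is, int ie,
                     long *local_sum, int *break_index) {
    *local_sum = 0;
    for (int i = is; i < ie; i++) {
        if (a[i] < 0) {            /* hypothetical brkCond */
            *break_index = i;
            return 1;
        }
        *local_sum += a[i];
    }
    return 0;
}

/* Sequential emulation of the chunked, in-order-committing loop. */
loop_result chunked_sum_until_negative(const int *a, int n, int chunk_size) {
    assert(chunk_size > 0);
    loop_result r = {0, -1};
    int global_brk = 0;
    for (int is = 0; is < n && !global_brk; is += chunk_size) {
        int ie = (n - is < chunk_size) ? n : is + chunk_size;
        long local_sum;
        int brk_index = -1;
        int local_brk = run_chunk(a, is, ie, &local_sum, &brk_index);
        /* "Commit": merge the private accumulator in chunk order. */
        r.sum += local_sum;
        if (local_brk) {
            global_brk = 1;        /* later chunks must not commit */
            r.broke_at = brk_index;
        }
    }
    return r;
}
```

On the input {1, 2, 3, -1, 5} with a chunk size of 2, the second chunk hits the break condition at index 3, so the final sum is 6 and the chunk covering index 4 never commits. A real implementation would run the chunks concurrently and rely on the hardware to abort the transactions that follow the breaking chunk.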

  21. DOALL Coverage – Provable and Profiled. [Bar chart: fraction of sequential execution (0 to 1) per benchmark; series: Provable DOALL. SPEC FP: 052.alvinn, 056.ear, 171.swim, 172.mgrid, 177.mesa, 179.art, 183.equake, 188.ammp. SPEC INT: 008.espresso, 023.eqntott, 026.compress, 072.sc, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 164.gzip, 175.vpr, 181.mcf, 197.parser, 256.bzip2, 300.twolf. Mediabench: cjpeg, djpeg, epic, g721decode, g721encode, gsmdecode, gsmencode, mpeg2dec, mpeg2enc, pegwitdec, pegwitenc, rawcaudio, rawdaudio, unepic. Utilities: grep, lex, yacc. Plus average.]

  22. DOALL Coverage – Provable and Profiled. [Same bar chart; series: Provable DOALL and Profiled DOALL.]

  23. DOALL Coverage – Provable and Profiled. [Same bar chart; series: Provable DOALL and Profiled DOALL.] Few dependences hinder parallelization in many loops. Still not good enough!

  24. DOALL Coverage – Provable and Profiled. [Same bar chart; series: Provable DOALL and Profiled DOALL.] Few dependences hinder parallelization in many loops. Still not good enough! Compiler can help: • Speculative fission • Isolation of infrequent paths • Speculative prematerialization

  25. Speculative Loop Fission
      1: while (node) {
      2:   work(node);
      3:   node = node->next;
         }

  26. Speculative Loop Fission. The loop is split into two loops.
      First loop:
      1: while (node) {
      4:   node_array[count++] = node;
      3:   node = node->next;
         }
      Second loop:
      1: while (node) {
      2:   work(node);
      3:   node = node->next;
         }

  27. Speculative Loop Fission. The loop is split into two loops.
      First loop:
      1: while (node) {
      4:   node_array[count++] = node;
      3:   node = node->next;
         }
      Second loop:
      1: while (node) {
      2:   work(node);
      3:   node = node->next;
         }
      Second loop, chunked and wrapped in a transaction:
         XBEGIN
      5: node = node_array[IS]; i = 0;
      1': while (node && i++ < CS) {
      2:   work(node);
      3':  node = node->next;
         }
         RECV(THREAD j-1)
         XCOMMIT
         SEND(THREAD j+1)
         }
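The fission shown on these slides can be tried out as ordinary C. The following is a minimal sketch under several assumptions: the chunks are processed one after another rather than on separate cores, the transaction and RECV/SEND steps are omitted, and `work` is a stand-in that simply doubles a value. The names `fissioned_traverse` and `work_chunk`, and the caller-supplied `node_array` with its `capacity`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Hypothetical loop body; the slides only call it work(node). */
static void work(node *n) {
    n->value *= 2;
}

/* Chunk body: start at node_array[is] and process at most cs nodes. */
static void work_chunk(node *const *node_array, size_t is, size_t cs) {
    node *n = node_array[is];
    size_t i = 0;
    while (n && i++ < cs) {
        work(n);
        n = n->next;
    }
}

/*
 * First loop: walk the list and record every node pointer.
 * Second loop: process the list in chunks of cs nodes. The chunks touch
 * disjoint nodes, which is what would let them run on different cores.
 * Returns the number of nodes visited.
 */
size_t fissioned_traverse(node *head, node **node_array,
                          size_t capacity, size_t cs) {
    assert(cs > 0);
    size_t count = 0;
    for (node *n = head; n; n = n->next) {
        assert(count < capacity);   /* caller must size node_array */
        node_array[count++] = n;
    }
    for (size_t is = 0; is < count; is += cs)
        work_chunk(node_array, is, cs);
    return count;
}
```

For a five-node list and cs = 2, the second loop runs three chunks starting at indices 0, 2 and 4, and each node is processed exactly once. The split is speculative in the original because `work` might modify the list; this sketch assumes it does not.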
