Uncovering Hidden Loop Level Parallelism in Sequential Applications
Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke
Advanced Computer Architecture Lab, University of Michigan
University of Michigan, Electrical Engineering and Computer Science
CMP Architectures
• Multiple cores on a chip
  – Higher throughput
  – Reduced complexity (per core)
  – More power/heat friendly
• Multithreaded applications
[Figure: Intel Core 2 Duo, AMD Quad-core (Barcelona), Sun Niagara 2]
How About Single Thread?
[Figure; source: Bridges et al., MICRO '07]
Loop Level Parallelization
DOALL loop: iterations i = 0-39
[Figure: the iteration space is split into two halves, i = 0-19 and i = 20-39, one half on each of Core 0 and Core 1]
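As a minimal sketch of the static DOALL split above, the C code below assigns each core one contiguous block of iterations. It assumes zero-based core IDs and contiguous blocks; `IterRange`, `doall_block_range`, and `doall_body` (an element-wise add standing in for a DOALL loop body) are hypothetical names, not taken from the paper.

```c
#include <assert.h>

/* Half-open range of loop iterations [start, end) assigned to one core. */
typedef struct {
    int start;
    int end;
} IterRange;

/* Hypothetical helper: split n independent (DOALL) iterations into ncores
 * contiguous blocks. When n is not divisible by ncores, the first
 * (n % ncores) cores each take one extra iteration, so every iteration
 * is assigned to exactly one core. */
IterRange doall_block_range(int n, int ncores, int core) {
    assert(n >= 0 && ncores > 0 && core >= 0 && core < ncores);
    int base = n / ncores;
    int extra = n % ncores;
    IterRange r;
    r.start = core * base + (core < extra ? core : extra);
    r.end = r.start + base + (core < extra ? 1 : 0);
    return r;
}

/* Work done by one core: no cross-iteration dependences, so no locking. */
void doall_body(const int *a, const int *b, int *c, IterRange r) {
    for (int i = r.start; i < r.end; i++)
        c[i] = a[i] + b[i];
}
```

With n = 40 and two cores this yields the two halves i = 0-19 and i = 20-39. Because DOALL iterations are independent, each core can run `doall_body` on its block without any synchronization.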
Loop Level Parallelization
Speculative DOALL loop: iterations i = 0-39, divided into loop chunks i = 0-9, 10-19, 20-29, 30-39
[Figure: the loop chunks are distributed across Core 0 and Core 1]
Bad news: limited number of parallel loops in general-purpose applications
  – 1.3x speedup for SpecINT2000 on 4 cores
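The chunked speculative execution above can be modeled in a single-threaded C sketch. This is an illustrative model under stated assumptions, not the paper's hardware: the loop `a[dst[i]] = a[src[i]] + 1`, the wave-at-a-time schedule (one chunk per core per wave), and all names (`spec_doall`, `run_chunk`, `Chunk`, `seq_loop`) are hypothetical. Each chunk buffers its writes and records its reads, roughly as a transactional memory would; chunks commit in iteration order, and a chunk that read a location written by an earlier chunk of the same wave is aborted and re-executed, so the final array always matches sequential execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MEM_SIZE 16   /* size of the shared array; src/dst must index inside it */
#define MAX_CORES 8

/* Speculative state of one in-flight chunk (models one transaction). */
typedef struct {
    int  buf[MEM_SIZE];      /* private copy: speculative writes are buffered here */
    bool read[MEM_SIZE];     /* exposed-read set */
    bool written[MEM_SIZE];  /* write set */
} Chunk;

/* Run iterations [lo, hi) of  a[dst[i]] = a[src[i]] + 1  against a snapshot. */
static void run_chunk(Chunk *c, const int *mem, const int *src,
                      const int *dst, int lo, int hi) {
    memcpy(c->buf, mem, sizeof c->buf);
    memset(c->read, 0, sizeof c->read);
    memset(c->written, 0, sizeof c->written);
    for (int i = lo; i < hi; i++) {
        if (!c->written[src[i]])
            c->read[src[i]] = true;   /* value produced outside this chunk */
        c->buf[dst[i]] = c->buf[src[i]] + 1;
        c->written[dst[i]] = true;
    }
}

/* Sequential reference version of the same loop. */
void seq_loop(int *a, const int *src, const int *dst, int n) {
    for (int i = 0; i < n; i++)
        a[dst[i]] = a[src[i]] + 1;
}

/* Speculative DOALL with chunks of cs iterations; returns the number of aborts. */
int spec_doall(int *a, const int *src, const int *dst, int n, int cs, int cores) {
    assert(cs > 0 && cores > 0 && cores <= MAX_CORES);
    Chunk ch[MAX_CORES];
    int aborts = 0;
    for (int base = 0; base < n; base += cs * cores) {
        int nchunks = 0;
        /* Phase 1: "parallel" speculative execution on one snapshot of a[]. */
        for (int k = 0; k < cores && base + k * cs < n; k++, nchunks++) {
            int lo = base + k * cs;
            int hi = lo + cs < n ? lo + cs : n;
            run_chunk(&ch[k], a, src, dst, lo, hi);
        }
        /* Phase 2: validate and commit in iteration order. */
        bool dirty[MEM_SIZE] = {false};
        for (int k = 0; k < nchunks; k++) {
            int lo = base + k * cs;
            int hi = lo + cs < n ? lo + cs : n;
            bool conflict = false;
            for (int x = 0; x < MEM_SIZE; x++)
                if (ch[k].read[x] && dirty[x]) conflict = true;
            if (conflict) {   /* misspeculation: abort and re-execute on committed state */
                aborts++;
                run_chunk(&ch[k], a, src, dst, lo, hi);
            }
            for (int x = 0; x < MEM_SIZE; x++)
                if (ch[k].written[x]) { a[x] = ch[k].buf[x]; dirty[x] = true; }
        }
    }
    return aborts;
}
```

When no chunk reads a value written by another chunk of the same wave, nothing aborts and the chunks behave like a plain DOALL loop. In the target architecture this read/write-set bookkeeping would presumably fall to the hardware transactional memory rather than to explicit arrays.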
Contributions
• Code generation framework
  – Speculative parallelization of uncounted loops
• Compiler transformations
  – Speculative loop fission
  – Isolation of infrequent dependences
  – Speculative prematerialization
[Figure: generated code with Spawn, Initialization, Abort Handler, and Consolidation blocks]
    XBEGIN
    if (global_brk_flag) break;
    for (i = IS; i < IE; i++) {
      ......
      if (brk_cond) { local_brk_flag = 1; break; }
    }
    perm = RECV(THREAD j-1);
    XCOMMIT
    if (local_brk_flag) { global_brk_flag = 1; kill_other_threads; }
    else if (IE < n) SEND(perm, THREAD j+1);
    IS = ...; IE = ...;
Target Architecture
[Figure: four cores (Core 0 to Core 3) sharing two L2 caches, connected by a scalar operand network, with hardware transactional memory]
Code Generation Framework
    for (i = 0; i < n; i++) {
      // original loop code
    }
Code Generation Framework
    while (...) {
      IS += ...; IE += ...;
      XBEGIN
      for (i = IS; i < IE; i++) {
        // original loop code
      }
      XCOMMIT
    }
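Iteration chunking can be sketched as follows, assuming fixed-size chunks (size `cs`) handed out round-robin to `nthreads` threads. The slide leaves the IS/IE updates elided, so the stride `nthreads * cs` is an assumption, and `run_thread_chunks` is a hypothetical name; XBEGIN/XCOMMIT appear only as comments because they are primitives of the target machine, not C.

```c
#include <assert.h>

/* Hypothetical round-robin chunking for thread j of nthreads:
 * thread j executes chunks j, j + nthreads, j + 2*nthreads, ...
 * Here the "original loop code" just records which thread ran iteration i. */
void run_thread_chunks(int j, int nthreads, int cs, int n, int *owner) {
    assert(nthreads > 0 && cs > 0 && j >= 0 && j < nthreads);
    int IS = j * cs;                            /* start of thread j's first chunk */
    while (IS < n) {
        int IE = IS + cs < n ? IS + cs : n;     /* clamp the final chunk */
        /* XBEGIN */
        for (int i = IS; i < IE; i++)
            owner[i] = j;                       /* original loop code */
        /* XCOMMIT */
        IS += nthreads * cs;                    /* skip the other threads' chunks */
    }
}
```

This sketch advances IS at the bottom of the loop instead of the top, which is equivalent but avoids a negative starting value. Every iteration lands in exactly one chunk, so the threads together cover the original iteration space once.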
Code Generation Framework
    while (...) {
      IS += ...; IE += ...;
      XBEGIN
      for (i = IS; i < IE; i++) {
        // original loop code
      }
      RECV(THREAD j-1);
      XCOMMIT
      SEND(THREAD j+1);
    }
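The ordering enforced by RECV/SEND can be modeled in a single thread: a chunk that finishes early waits in RECV until the chunk before it has committed and sent it the token. In this hypothetical sketch, `ordered_commit` takes the order in which chunk transactions finish, fills in the order in which they commit, and returns the largest number of finished chunks left waiting; the token stands in for the value that would presumably travel over the scalar operand network.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHUNKS 64

/* Chunk k may pass XCOMMIT only after RECV-ing the token that chunk k-1
 * SENDs when it commits. finish_order must be a permutation of 0..nchunks-1. */
int ordered_commit(const int *finish_order, int nchunks, int *commit_order) {
    assert(nchunks >= 0 && nchunks <= MAX_CHUNKS);
    bool finished[MAX_CHUNKS] = {false};
    int token = 0;          /* id of the chunk currently allowed to commit */
    int ncommitted = 0;
    int max_waiting = 0;
    for (int e = 0; e < nchunks; e++) {
        finished[finish_order[e]] = true;          /* chunk now blocks in RECV */
        while (token < nchunks && finished[token]) {
            commit_order[ncommitted++] = token;    /* XCOMMIT */
            token++;                               /* SEND token to the next chunk */
        }
        int waiting = (e + 1) - ncommitted;        /* finished but not yet committed */
        if (waiting > max_waiting) max_waiting = waiting;
    }
    assert(ncommitted == nchunks);
    return max_waiting;
}
```

However the transactions finish, they commit in chunk order, which is what lets the speculative loop preserve sequential semantics. The returned count hints at the cost of ordering: chunks that finish early sit idle until their predecessors commit.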
Code Generation Framework
    Spawn
    while (...) {
      IS += ...; IE += ...;
      XBEGIN
      for (i = IS; i < IE; i++) {
        // original loop code
      }
      RECV(THREAD j-1);
      XCOMMIT
      SEND(THREAD j+1);
    }
Code Generation Framework
    Spawn
    while (...) {
      IS += ...; IE += ...;
      XBEGIN
      for (i = IS; i < IE; i++) {
        // original loop code
        if (brkCond) break;
      }
      RECV(THREAD j-1);
      XCOMMIT
      SEND(THREAD j+1);
    }
Code Generation Framework
    Spawn
    while (...) {
      IS += ...; IE += ...;
      XBEGIN
      if (globalBrk) break;
      for (i = IS; i < IE; i++) {
        // original loop code
        if (brkCond) { localBrk = 1; break; }
      }
      RECV(THREAD j-1);
      XCOMMIT
      if (localBrk) { globalBrk = 1; abortOtherTXs; }
      SEND(THREAD j+1);
    }
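The break handling above can be modeled for a simple search loop, `for (i = 0; i < n; i++) if (a[i] == key) break;`, whose live-out is i. This single-threaded sketch (all names hypothetical) runs every chunk speculatively, in reverse order to mimic threads finishing out of order, then commits in order: the first committed chunk with `local_brk` set plays the role of setting `globalBrk`, and every later chunk is discarded, as `abortOtherTXs` would do.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHUNKS 64

typedef struct {
    bool local_brk;   /* localBrk: break condition hit inside this chunk */
    int  brk_iter;    /* iteration at which this chunk broke out */
} ChunkResult;

/* Speculatively execute iterations [is, ie) of the search loop. */
static ChunkResult run_chunk(const int *a, int key, int is, int ie) {
    ChunkResult r = {false, ie};
    for (int i = is; i < ie; i++) {
        if (a[i] == key) { r.local_brk = true; r.brk_iter = i; break; }
    }
    return r;
}

/* Returns the live-out i (first index with a[i] == key, or n), exactly as
 * the sequential loop would; *squashed counts discarded chunks. */
int spec_search(const int *a, int n, int key, int cs, int *squashed) {
    assert(cs > 0 && n >= 0);
    int nchunks = (n + cs - 1) / cs;
    assert(nchunks <= MAX_CHUNKS);
    ChunkResult res[MAX_CHUNKS];
    /* Speculative phase: later chunks may finish before earlier ones. */
    for (int k = nchunks - 1; k >= 0; k--) {
        int is = k * cs;
        int ie = is + cs < n ? is + cs : n;
        res[k] = run_chunk(a, key, is, ie);
    }
    /* In-order commit phase. */
    bool global_brk = false;
    int live_out = n;
    *squashed = 0;
    for (int k = 0; k < nchunks; k++) {
        if (global_brk) { (*squashed)++; continue; }   /* abortOtherTXs */
        if (res[k].local_brk) { global_brk = true; live_out = res[k].brk_iter; }
    }
    return live_out;
}
```

A later chunk can hit the break condition too; its result is discarded because only the earliest break in iteration order matches sequential semantics.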
Code Generation Framework
    Spawn
    while (...) {
      IS += ...; IE += ...;
      XBEGIN
      if (globalBrk) break;
      for (i = IS; i < IE; i++) {
        // original loop code
        if (brkCond) { localBrk = 1; break; }
      }
      RECV(THREAD j-1);
      XCOMMIT
      if (localBrk) { globalBrk = 1; abortOtherTXs; }
      SEND(THREAD j+1);
    }
    Consolidation
Code Generation Framework
• Supports counted and uncounted loops
  – Software-managed control speculation
• Iteration chunking
• Enforces transaction ordering
• Handles live-in, live-out, and accumulator registers
    Spawn
    while (...) {
      IS += ...; IE += ...;
      XBEGIN
      if (globalBrk) break;
      for (i = IS; i < IE; i++) {
        // original loop code
        if (brkCond) { localBrk = 1; break; }
      }
      RECV(THREAD j-1);
      XCOMMIT
      if (localBrk) { globalBrk = 1; abortOtherTXs; }
      SEND(THREAD j+1);
    }
    Consolidation
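Handling of live-ins, live-outs, and accumulators can be sketched for the loop `for (i = 0; i < n; i++) { sum += scale * a[i]; last = a[i]; }`, where `scale` is a live-in, `sum` an accumulator, and `last` a live-out. This is a hypothetical C model, not the paper's code generator, and it assumes round-robin chunking: each thread reads the live-in, keeps a private accumulator starting at 0, and the consolidation step adds the partial sums and takes `last` from the thread that ran the final iteration.

```c
#include <assert.h>

#define MAX_THREADS 8

typedef struct {
    long sum;        /* private accumulator, starts at the identity 0 */
    int  last;       /* private copy of the live-out */
    int  last_iter;  /* highest iteration this thread executed (-1 if none) */
} ThreadState;

/* Thread j's share of the loop under round-robin chunking (hypothetical). */
static ThreadState run_thread(int j, int nthreads, int cs,
                              const int *a, int n, int scale) {
    ThreadState st = {0, 0, -1};
    for (int is = j * cs; is < n; is += nthreads * cs) {
        int ie = is + cs < n ? is + cs : n;
        for (int i = is; i < ie; i++) {
            st.sum += (long)scale * a[i];   /* scale is a read-only live-in */
            st.last = a[i];
            st.last_iter = i;
        }
    }
    return st;
}

/* Consolidation: sum0/last0 are the values of sum/last before the loop. */
void parallel_loop(const int *a, int n, int scale, int nthreads, int cs,
                   long sum0, int last0, long *sum_out, int *last_out) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS && cs > 0);
    long sum = sum0;
    int last = last0;
    int last_iter = -1;
    for (int j = 0; j < nthreads; j++) {   /* threads would run in parallel */
        ThreadState st = run_thread(j, nthreads, cs, a, n, scale);
        sum += st.sum;                      /* combine partial accumulators */
        if (st.last_iter > last_iter) {     /* live-out from the final iteration */
            last_iter = st.last_iter;
            last = st.last;
        }
    }
    *sum_out = sum;
    *last_out = last;
}
```

Combining partial accumulators in thread order is valid only because integer addition is associative and commutative; for simplicity the sketch runs the threads one after another.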
DOALL Coverage – Provable and Profiled
[Figure: bar chart of provable DOALL and profiled DOALL coverage per benchmark and on average; y-axis: fraction of sequential execution, 0 to 1. Benchmarks:
  SPEC FP: 052.alvinn, 056.ear, 171.swim, 172.mgrid, 177.mesa, 179.art, 183.equake, 188.ammp
  SPEC INT: 008.espresso, 023.eqntott, 026.compress, 072.sc, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 164.gzip, 175.vpr, 181.mcf, 197.parser, 256.bzip2, 300.twolf
  Mediabench: cjpeg, djpeg, epic, g721decode, g721encode, gsmdecode, gsmencode, mpeg2dec, mpeg2enc, pegwitdec, pegwitenc, rawcaudio, rawdaudio, unepic
  Utilities: grep, lex, yacc]
Still not good enough! A few dependences hinder parallelization in many loops.
Compiler can help:
• Speculative fission
• Isolation of infrequent paths
• Speculative prematerialization
Speculative Loop Fission
    1: while (node) {
    2:   work(node);
    3:   node = node->next;
       }
Speculative Loop Fission
Original loop:
    1: while (node) {
    2:   work(node);
    3:   node = node->next;
       }
First loop after fission:
    1: while (node) {
    4:   node_array[count++] = node;
    3:   node = node->next;
       }
Speculative Loop Fission
Original loop:
    1: while (node) {
    2:   work(node);
    3:   node = node->next;
       }
First loop after fission:
    1: while (node) {
    4:   node_array[count++] = node;
    3:   node = node->next;
       }
Second loop, executed in parallel chunks of CS nodes:
    XBEGIN
    5:  node = node_array[IS]; i = 0;
    1': while (node && i++ < CS) {
    2:    work(node);
    3':   node = node->next;
        }
    RECV(THREAD j-1);
    XCOMMIT
    SEND(THREAD j+1);
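Speculative loop fission on a linked list can be sketched in C as follows. The first loop only chases `next` pointers and fills `node_array`; the second loop is chunked, each chunk starting at `node_array[IS]` and walking at most `CS` nodes, mirroring statements 5, 1', 2, and 3' on the slide. `Node`, `work`, and the helper names are hypothetical, and the sketch simply assumes that `work` never modifies a `next` pointer; since the slide calls the fission speculative, that assumption is presumably checked at run time in the real system rather than proven.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Hypothetical stand-in for work(node): per-node work that leaves next alone. */
static void work(Node *node) { node->value *= 2; }

/* Fission, first loop (sequential): pointer chasing only, statement 4 records nodes. */
static int collect_nodes(Node *node, Node **node_array) {
    int count = 0;
    while (node) {
        node_array[count++] = node;
        node = node->next;
    }
    return count;
}

/* Fission, second loop, one chunk: start at node_array[IS], walk at most CS
 * nodes. In the generated code this body sits between XBEGIN and XCOMMIT. */
static void run_chunk(Node **node_array, int IS, int CS) {
    Node *node = node_array[IS];
    int i = 0;
    while (node && i++ < CS) {
        work(node);
        node = node->next;
    }
}

/* Whole transformed traversal; node_array must hold every node in the list. */
void fissioned_traversal(Node *head, Node **node_array, int CS) {
    assert(CS > 0);
    int count = collect_nodes(head, node_array);
    for (int IS = 0; IS < count; IS += CS)   /* chunks are independent */
        run_chunk(node_array, IS, CS);
}
```

Once `node_array` is filled, the chunks no longer depend on each other's pointer chasing, so in the generated code each `run_chunk` call could run on a different thread inside its own transaction; here they run sequentially.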