ICS/Eindhoven University of Technology

Limited Address Range Architecture for Reducing Code Size in Embedded Processors
Qin Zhao, Bart Mesman, Henk Corporaal
Eindhoven University of Technology, The Netherlands
Philips Research Laboratories, The Netherlands

• Introduction
• The proposed architecture: LAR
• Sequential code generation for LAR
  – Annotated Conflict Graph (ACG)
• The integrated approach
  – Annotated Worst-Case Conflict Graph (AWCCG)
• Experimental results
• Conclusions and future work

Introduction
– Code size and power consumption of embedded cores must be small, since the cores are on chip
– Irregularities in architectures
  • Difficult for efficient code generation
– Clustered register file vs. central register file
  • Advantage: small code size and power consumption
  • Disadvantage: extra hardware, copy operations
– Phase coupling in code generation
  • Sequential phases may generate inefficient code
  • Integrated approach potentially offers better solutions

The proposed architecture: LAR
[Figure: LAR architecture, with address ranges S1 and S2 over registers r1, r2 and functional units FU1 to FU4]
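As a rough illustration of why a limited address range can shrink the encoding, the sketch below counts operand address bits. It assumes (the slides do not state this) that an operand field only has to distinguish the registers inside its own address range instead of the whole register file. The function names and the register counts are hypothetical and are not taken from the experiments.

```python
def address_bits(num_registers: int) -> int:
    """Bits needed to select one register out of `num_registers`."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (num_registers - 1).bit_length()


def operand_bits(num_operand_fields: int, addressable_registers: int) -> int:
    """Total register-address bits of one operation (hypothetical model)."""
    return num_operand_fields * address_bits(addressable_registers)


# Hypothetical numbers: three operand fields, 16 registers in total,
# and address ranges that cover 4 registers each.
central_bits = operand_bits(3, 16)  # every field addresses the whole file
limited_bits = operand_bits(3, 4)   # every field addresses one range only
```

Under these assumed numbers the register-address part of an operation drops from 12 to 6 bits; the real saving depends on the actual instruction format.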
– Annotated Conflict Graph (ACG)
[Figure: example schedule with nodes n0 to n4 and values a, b, c, d over time t, address ranges S1 and S2 over registers 1, 2, 3, and conflict graphs on a, b, c, d annotated with register sets (1,2) and (2,3)]

– Strong conflict: u and v overlap for sure
– No conflict: u and v can never overlap
– Weak conflict: neither of the above holds
[Figure: lifetimes of values u and v, drawn between the nodes Pu, Cu and Pv, Cv]

[Figure: data-flow graph with nodes n0 to n6 (ld, + and * operations) and values a to f scheduled over time steps t = 0 to 3, next to its Best-Case Conflict Graph (BCCG)]
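The three conflict classes for two values u and v can be made concrete with a small interval test. This is only a sketch under simplifying assumptions that are not on the slides: each value has one producing and one consuming operation, each operation has an earliest (`asap`) and latest (`alap`) feasible time step, and the four time windows are treated as independent. The names `Window` and `classify_lifetime_conflict` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Earliest and latest feasible time step of one operation."""
    asap: int
    alap: int


def classify_lifetime_conflict(pu: Window, cu: Window,
                               pv: Window, cv: Window) -> str:
    """Return "strong", "none" or "weak" for values u and v.

    u is live from its producer pu to its consumer cu, v from pv to cv.
    Two lifetimes overlap when t(pu) < t(cv) and t(pv) < t(cu).
    """
    # Overlap in every schedule: both inequalities hold in the worst case.
    if pu.alap < cv.asap and pv.alap < cu.asap:
        return "strong"
    # Overlap in no schedule: one inequality fails even in the best case.
    if pu.asap >= cv.alap or pv.asap >= cu.alap:
        return "none"
    # Neither of the above holds.
    return "weak"


fixed = classify_lifetime_conflict(
    Window(0, 0), Window(2, 2), Window(1, 1), Window(3, 3))
disjoint = classify_lifetime_conflict(
    Window(0, 0), Window(1, 1), Window(1, 2), Window(3, 3))
undecided = classify_lifetime_conflict(
    Window(0, 0), Window(1, 2), Window(1, 2), Window(3, 3))
```

In the third call the consumer of u and the producer of v share the window 1 to 2, so the lifetimes overlap in some schedules and not in others, which is the weak case.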
[Figure: the same data-flow graph (nodes n0 to n6, values a to f, time steps t = 0 to 3), next to its Worst-Case Conflict Graph (WCCG)]

• Range assignment conflict:
  – Strong conflict: u and r have a strong conflict if u can never reside in S_i, where r ∈ S_i
  – No conflict: u and r have no conflict if u can always reside in S_i, where r ∈ S_i
  – Weak conflict: u and r have a weak conflict if u can reside in S_j, where r ∉ S_i
[Figure: value u (between Pu and Cu), register r, and address ranges S_i and S_j, here S1 or S2]

[Figure: AWCCG and ABCCG for values a to f and registers r1 to r4 in address ranges S1 and S2, with values annotated by register sets (r1,r2,r3,r4), (r1,r2,r3) and (r2,r3,r4)]

– Overview of the integrated approach
[Figure: flow chart with the boxes constraint analysis, address range assignment, lifetime serialization and bottleneck identification]
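The range assignment conflict classes between a value u and a register r can be read as a set test over the address ranges that u may still be assigned to. The sketch below follows that reading; it is an interpretation, not the authors' implementation. The range contents S1 = {r1, r2, r3} and S2 = {r2, r3, r4} are hypothetical, as are the names `classify_range_conflict` and `allowed_registers`.

```python
from typing import Dict, List, Set


def classify_range_conflict(register: str,
                            candidate_ranges: Set[str],
                            ranges: Dict[str, Set[str]]) -> str:
    """Return "strong", "none" or "weak" for value u and `register`.

    candidate_ranges: names of the ranges u may still reside in.
    ranges: maps a range name to the registers it contains.
    """
    if not candidate_ranges:
        raise ValueError("u needs at least one candidate range")
    contains = [register in ranges[name] for name in candidate_ranges]
    if not any(contains):
        return "strong"  # u can never sit in a range that holds r
    if all(contains):
        return "none"    # every range u can sit in holds r
    return "weak"        # u may end up in a range without r


def allowed_registers(candidate_ranges: Set[str],
                      ranges: Dict[str, Set[str]]) -> List[str]:
    """Registers that have no strong conflict with u (its annotation)."""
    registers = set().union(*ranges.values())
    return sorted(r for r in registers
                  if classify_range_conflict(r, candidate_ranges, ranges)
                  != "strong")


# Hypothetical range layout, not taken from the slides.
RANGES = {"S1": {"r1", "r2", "r3"}, "S2": {"r2", "r3", "r4"}}

undecided_r1 = classify_range_conflict("r1", {"S1", "S2"}, RANGES)
shared_r2 = classify_range_conflict("r2", {"S1", "S2"}, RANGES)
fixed_r4 = classify_range_conflict("r4", {"S1"}, RANGES)
annotation_s1 = allowed_registers({"S1"}, RANGES)
```

With this layout a value that can only go to S1 gets the annotation (r1, r2, r3), while a value that is still free to use either range keeps all four registers, some of them with a weak conflict.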
[Figure: the example data-flow graph (nodes n0 to n6, values a to f) with address ranges S1 and S2 and registers r1 to r4, shown in two versions]

Experimental results

Encoding
DFG_fu,l          central   LAR    %
ar_filter_1,18    392       308    78.57
wdelf_1,27        476       398    83.61
fdct_2,20         714       588    82.35
fdct_4,11         714       588    82.35
loef_2,15         952       952    100
loef_4,11         952       952    100
chen_2,15         680       560    82.35
chen_4,8          680       560    82.35

Code generation
                          sequential            integrated
DFG_fu,l                  |S|   |S_o|   T(s)    |S|   |S_o|   T(s)
ar_filter_1,18            5     2       0.07    5     2       0.09
wdelf_1,27                6     2       0.07    6     2       0.12
fdct_2,20 / fdct_4,11     9     5       inf     9     5       4.95
                          9     6       inf     9     6       0.35
                          9     7       0.26    9     7       0.37
loef_2,15 / loef_4,11     12    7       0.95    12    7       no
                          12    8       inf     12    8       1.43
                          12    9       0.2     12    9       no
                          12    9       inf     12    9       0.89
chen_2,15                 8     4       inf     8     4       1.28
                          8     5       0.24    8     5       4.28
chen_4,8                  9     3       0.22    9     3       no
                          9     4       0.19    9     4       0.24
                          9     5       0.16    9     5       0.24
(The fdct and loef rows are listed per benchmark pair; the split between the two variants of each pair is not recoverable.)

Conclusions and future work
– Conclusions
  • New encoding style for reducing code size
  • No extra hardware, no extra move operations
  • Corresponding code generation techniques
  • ACG for range constraints
  • AWCCG solves phase coupling problem
– Future work
  • More versatile architectures
  • Combine with the operation assignment phase