Bungee Jumps: Accelerating Indirect Branches Through Hardware/Software Co-Design
Daniel S. McFarlin, Craig Zilles
Indirect Branches Are Increasingly Predictable
[Figure: indirect-branch mispredictions per kilo-instruction (lower is better) on Nehalem, Sandy Bridge, Haswell, and TAGE, for benchmarks 0 to 28 and their average.]

And Unbiased
[Figure: predictability and bias (0 to 1) for meteor, raytrace, btree, fannkuch, fasta, richards, nqueens, revcomp, float, specnorm, regexdna, knuke, mandelbrot, and the geometric mean.]
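The slides do not define "bias" or "predictability". As a hedged illustration, assume bias is the fraction of executions that go to a branch's single most frequent target, and predictability is the hit rate of a history-based predictor. The C++ sketch below uses hypothetical function names, and its order-1 context predictor is a toy stand-in for TAGE, not a model of it. It shows how a branch can be completely unbiased and still predictable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Assumed definition: bias = fraction of dynamic executions that go to the
// branch's single most frequent target (1.0 means a monomorphic branch).
double targetBias(const std::vector<std::uint64_t>& trace) {
    assert(!trace.empty());
    std::unordered_map<std::uint64_t, std::size_t> counts;
    std::size_t best = 0;
    for (std::uint64_t target : trace) best = std::max(best, ++counts[target]);
    return static_cast<double>(best) / static_cast<double>(trace.size());
}

// Toy history-based predictor (order-1 context): predict that the next target
// is whatever followed the previous target last time. Returns mispredictions
// over trace[1..]; the first execution has no history and is not counted.
std::size_t contextMispredicts(const std::vector<std::uint64_t>& trace) {
    std::unordered_map<std::uint64_t, std::uint64_t> nextAfter;  // previous -> predicted next
    std::size_t misses = 0;
    for (std::size_t i = 1; i < trace.size(); ++i) {
        auto it = nextAfter.find(trace[i - 1]);
        if (it == nextAfter.end() || it->second != trace[i]) ++misses;
        nextAfter[trace[i - 1]] = trace[i];  // train on the actual outcome
    }
    return misses;
}
```

Consider the alternating target trace A, B, A, B, and so on. Its bias is only 0.5, so a predictor that always guesses the most frequent target would miss half the time. The context predictor, however, misses only its two warm-up predictions and is correct from then on.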
In-Order Machines Specialize Based on Branch Bias or Eliminate Branch Prediction Altogether
[Figure: a Shape vtable whose area() targets are reached from one call site with skewed frequencies (labeled 30, 60, and 10).]
Specialize on bias:
    if s->type == Circle
        area()
    else if s->type == Rect
        area()
    else if s->type == Square
        area()
    else
        area()
Predication:
    p0 = (obj is type B);
    p1 = (obj is type C);
    p2 = (obj is type D);
    p0 : r = B::func();
    p1 : r = C::func();
    p2 : r = D::func();
    if (!(p0 | p1 | p2)) r = obj->func();
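The following is a minimal C++ sketch of the two transformations on this slide. It assumes a hypothetical Shape hierarchy that carries an explicit type tag; ShapeType, area_guarded, and area_predicated are illustrative names, not names from the talk.

- area_guarded tests the most frequent types first with direct calls and falls back to the vtable.
- area_predicated evaluates all three predicates and selects a result without branching on the type. This is only safe here because each candidate is computed from fields every shape has.

At source level, the compiler may still emit branches or conditional moves for the selects. True predication is an ISA feature.

```cpp
#include <cassert>

// Hypothetical hierarchy: every shape carries an explicit type tag so that a
// guard can test the dynamic type without an indirect call.
enum class ShapeType { Circle, Rect, Square, Other };

constexpr double kPi = 3.14159265358979323846;

struct Shape {
    ShapeType type;
    double w, h;  // circles store their radius in w
    Shape(ShapeType t, double w_, double h_) : type(t), w(w_), h(h_) {}
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Circle : Shape {
    explicit Circle(double r) : Shape(ShapeType::Circle, r, r) {}
    double area() const override { return kPi * w * w; }
};
struct Rect : Shape {
    Rect(double w_, double h_) : Shape(ShapeType::Rect, w_, h_) {}
    double area() const override { return w * h; }
};
struct Square : Shape {
    explicit Square(double s) : Shape(ShapeType::Square, s, s) {}
    double area() const override { return w * w; }
};
struct Triangle : Shape {  // a type the guards below do not cover
    Triangle(double b, double h_) : Shape(ShapeType::Other, b, h_) {}
    double area() const override { return 0.5 * w * h; }
};

// Baseline: one indirect call through the vtable.
double area_virtual(const Shape& s) { return s.area(); }

// Specialization by bias: test the most frequent types first and make a
// direct (inlinable) call; only unmatched types pay for the indirect call.
double area_guarded(const Shape& s) {
    if (s.type == ShapeType::Circle) return static_cast<const Circle&>(s).Circle::area();
    if (s.type == ShapeType::Rect)   return static_cast<const Rect&>(s).Rect::area();
    if (s.type == ShapeType::Square) return static_cast<const Square&>(s).Square::area();
    return s.area();  // fallback: virtual call
}

// Predication-style version: compute every predicate, evaluate every
// candidate from fields all shapes share, and select the matching one.
double area_predicated(const Shape& s) {
    const bool p0 = s.type == ShapeType::Circle;
    const bool p1 = s.type == ShapeType::Rect;
    const bool p2 = s.type == ShapeType::Square;
    double r = 0.0;
    r = p0 ? kPi * s.w * s.w : r;
    r = p1 ? s.w * s.h : r;
    r = p2 ? s.w * s.w : r;
    if (!(p0 || p1 || p2)) r = s.area();  // no candidate matched: virtual call
    return r;
}
```

Ordering the guards by observed bias keeps the common case to one or two well-predicted compares. The predicated form trades those compares for evaluating every candidate. That cost grows with the number of targets.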
Challenge: Non-Reconvergence & Large Number of Targets
[Figure: control-flow graph of sjeng's f_in_check. Block A ends in a jump-table dispatch (ld r9, [rax*8+0x94]; jmp r9) to target blocks B through G. Each target is a short load/compare sequence (e.g., cmp edi, r10; jnz M) that branches to its own successor (H through M) rather than reconverging. Edge labels show the dispatch spread fairly evenly across the targets (weights 7 to 25). The conditional branches inside the targets are strongly biased (splits from 95/5 to 99/1).]
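If-conversion needs a reconvergence point. This is the nearest block that every path from the branch must pass through, which is the branch's immediate post-dominator. The following C++ sketch computes it using the standard iterative post-dominator equations. The helper names are hypothetical, and the sketch is limited to 64 blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Post-dominator sets for a small CFG (at most 64 blocks, one exit), using the
// iterative data-flow equations: pdom(exit) = {exit}, and
// pdom(b) = {b} plus the intersection of pdom(s) over all successors s of b.
// Bit i of each mask represents block i. Assumes every non-exit block has a successor.
std::vector<std::uint64_t> postDominators(const std::vector<std::vector<int>>& succ, int exit) {
    const int n = static_cast<int>(succ.size());
    assert(n <= 64 && exit >= 0 && exit < n);
    const std::uint64_t all = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
    std::vector<std::uint64_t> pdom(n, all);  // start from "every block" and shrink
    pdom[exit] = 1ULL << exit;
    for (bool changed = true; changed;) {
        changed = false;
        for (int b = 0; b < n; ++b) {
            if (b == exit) continue;
            std::uint64_t meet = all;
            for (int s : succ[b]) meet &= pdom[s];
            const std::uint64_t updated = meet | (1ULL << b);
            if (updated != pdom[b]) { pdom[b] = updated; changed = true; }
        }
    }
    return pdom;
}

// Immediate post-dominator: the strict post-dominator d of b whose own
// post-dominator set equals b's strict set, i.e. the closest reconvergence
// point. Returns -1 for the exit block, which has no strict post-dominator.
int immediatePostDominator(const std::vector<std::uint64_t>& pdom, int b) {
    const std::uint64_t strict = pdom[b] & ~(1ULL << b);
    for (int d = 0; d < static_cast<int>(pdom.size()); ++d)
        if (((strict >> d) & 1ULL) && pdom[d] == strict) return d;
    return -1;
}
```

The sketch can be applied to two example CFGs:

- A diamond such as {{1, 2}, {3}, {3}, {4}, {}}. Here the branch in block 0 reconverges at block 3, right after its two arms.
- A hypothetical fan-out loosely modeled on f_in_check. Block 0 (standing in for A) jumps to blocks 1 through 6 (B to G). Each of these continues to its own successor, blocks 7 through 12 (standing in for H to M), before reaching exit block 13.

In the fan-out, the only post-dominator of block 0 is the exit. There is therefore no short region to predicate.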