Dynamic Translation for EPIC Architectures
David R. Ditzel
Chief Architect for Hybrid Computing, VP IAG, Intel Corporation
Presentation for 8th Workshop on EPIC Architectures, April 24, 2010
CGO 2010

Thesis: The future of computing belongs to EPIC Architectures
EPIC: Explicitly Parallel Instruction Computer
• or Exposed Parallelism Instruction Computer
• Parallelism exposed for software to exploit
• Examples: Itanium, GPGPUs, Transmeta Efficeon/Crusoe
My belief:
• EPIC is a more power efficient approach
• Dynamic translation will improve power advantages
• May be a different EPIC than we know today

Biggest challenge
Power is the limiter
We must move to more efficient computing structures or the number of cores could be limited

Simple Power Scaling Example
Power = Cdyn × Voltage² × Frequency + Leakage (33%)
Moore’s Law says the number of devices can double every node
• 4 cores go to 128 cores over 10 years
• How does power limit this expectation?
With an upper power limit of ~100 Watts, how many cores?
Easy to calculate scaling per node:
• Voltage scaling about 0.9x
• Cdyn scaling about 0.8x
• Assume frequency increase of 1.2x
From this data we can see how many cores we can have if we do not change to a more efficient approach

Power Limits # of Big Cores
Year                    2008  2010  2012  2014  2016  2018
Technology Node (nm)      45    32    22    15    11     8
Total Power (W)          100   100   100   100   100   100
Power/core (W)            25    19    15    12     9     7
Freq                     3.0   3.6   4.3   5.2   6.2   7.5
Voltage                  1.0   0.9   0.8   0.7   0.7   0.6
Cdyn/Core                5.6   4.4   3.6   2.8   2.3   1.8
Expected #Cores            4     8    16    32    64   128
Power Limited #Cores       4     5     7     9    11    14
We need to improve the efficiency of each core or we will suffer severe performance reduction
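The Power/core and Power Limited #Cores rows can be reproduced from the per-node scaling factors with a few lines of Python. This is a minimal sketch, not the author's spreadsheet: the function name and parameters are made up for illustration, and it assumes leakage stays a fixed fraction of total power, so per-core power scales by 0.8 × 0.9² × 1.2 at every node.

```python
def power_limited_cores(nodes=6, budget_w=100.0, core_w=25.0,
                        v_scale=0.9, cdyn_scale=0.8, f_scale=1.2):
    """Return (node index, watts per core, cores that fit the budget) per node."""
    # Dynamic power scales as Cdyn * V^2 * F. Assumption: leakage remains a
    # fixed share of the total, so total per-core power scales the same way.
    per_node = cdyn_scale * v_scale ** 2 * f_scale
    rows = []
    for n in range(nodes):
        watts = core_w * per_node ** n
        rows.append((n, round(watts), round(budget_w / watts)))
    return rows


if __name__ == "__main__":
    # One technology node every two years, starting from the 2008 baseline.
    for year, (_, watts, cores) in zip(range(2008, 2020, 2), power_limited_cores()):
        print(year, watts, cores)
```

With the defaults above, the rounded results match the table: per-core power falls from 25 W to 7 W, but the 100 W budget still allows only 14 cores in 2018 against the 128 that device scaling alone would suggest.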

So how do we build improved cores?

Premise
Change of perspective needed
Software should be part of the picture
• Hardware co-designed with software increases the available options
• Software needs a simple model of the “cost” of an instruction
  • Out-of-order processors made this impossible
  • In-order EPIC processor can provide this simple model
• Software can do a very good job of scheduling, but only if the scheduling blocks are large enough
• Let’s look at an example of how to increase block size and improve scheduling

Compiler optimization example
Conditional branches tend to have very biased program behavior
• Exploitable by compiler
Correctness makes it difficult
• Fixup code for cold exits
• Exceptions
A little special purpose hardware can make it much easier
Code with branches            Code with asserts
tst.ne p1, ecx, ecx           tst.ne p1, ecx, ecx
brc p1, D                     assert ~p1
or eax, zero, 1               or eax, zero, 1
ld r32, [esp + 112]           ld edx, [esp + 112]
or ebx, zero, 0               or ebx, zero, 0
st ebx, [r32]                 st ebx, [r32]
ld esi, [ebp + 0x878]         ld esi, [ebp + 0x878]
cmp.ne p1 edi, 72             cmp.ne p1 edi, 72
brc p1, E                     assert ~p1
or eax, zero, 1               or eax, zero, 1
ld ebx, [ebp]                 ld ebx, [ebp]
ld ebx, [ebx + esi*4]         ld ebx, [ebx + esi*4]
ld edx, [esp + 112]           ld edx, [esp + 112]
st ebx, [edx]                 st ebx, [edx]
tst.ne p1, ecx, ecx           tst.ne p1, ecx, ecx
brc p1, F                     brc p1, F

Hardware atomicity
Hardware executes a region of code completely or not at all
Common case is fast
Uncommon case rolls back
• Resume in non-specialized code
Code with branches            Atomic region with asserts
tst.ne p1, ecx, ecx           tst.ne p1, ecx, ecx
brc p1, D                     assert ~p1
or eax, zero, 1               or eax, zero, 1
ld r32, [esp + 112]           ld edx, [esp + 112]
or ebx, zero, 0               or ebx, zero, 0
st ebx, [r32]
ld esi, [ebp + 0x878]         ld esi, [ebp + 0x878]
cmp.ne p1 edi, 72             cmp.ne p1 edi, 72
brc p1, E                     assert ~p1
or eax, zero, 1
ld ebx, [ebp]                 ld ebx, [ebp]
ld ebx, [ebx + esi*4]         ld ebx, [ebx + esi*4]
ld edx, [esp + 112]
st ebx, [edx]                 st ebx, [edx]
tst.ne p1, ecx, ecx
brc p1, F
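The "completely or not at all" behavior can be modeled in software to make the control flow concrete. The sketch below is only an illustration under stated assumptions: a Python dict stands in for architectural state, a dict copy stands in for the hardware checkpoint, and the names `run_atomic`, `specialized`, `fallback`, and `AssertFailed` are hypothetical rather than taken from the slides.

```python
class AssertFailed(Exception):
    """Raised when a speculated condition (an `assert ~p1`) does not hold."""


def run_atomic(state, specialized, fallback):
    """Run `specialized` completely or not at all; otherwise run `fallback`."""
    checkpoint = dict(state)       # stand-in for a hardware checkpoint
    try:
        specialized(state)         # common case: all updates are kept (commit)
        return "commit"
    except AssertFailed:
        state.clear()
        state.update(checkpoint)   # uncommon case: discard partial updates
        fallback(state)            # resume in non-specialized code
        return "rollback"


def specialized(s):
    # Straight-line code that speculates ecx == 0, as the asserts do.
    s["eax"] = 1                   # partial update, undone if the assert fails
    if s["ecx"] != 0:
        raise AssertFailed
    s["mem"] = s["ebx"]


def fallback(s):
    # Non-specialized code keeps the real branch; the cold path is made up.
    if s["ecx"] != 0:
        s["eax"] = 2
    else:
        s["eax"] = 1
        s["mem"] = s["ebx"]


if __name__ == "__main__":
    hot = {"ecx": 0, "ebx": 7}
    cold = {"ecx": 5, "ebx": 7}
    print(run_atomic(hot, specialized, fallback), hot)
    print(run_atomic(cold, specialized, fallback), cold)
```

Because a failed assert restores the checkpoint before the fallback runs, the specialized path needs no fixup code for cold exits, which is the point the slide makes about correctness.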

Dynamic binary translation
x86 code                  Translation               Optimized atomic region
test ecx, ecx             tst.ne p1, ecx, ecx       tst.ne p1, ecx, ecx
jne D                     brc p1, D                 assert ~p1
mov eax, 1                or eax, zero, 1           or eax, zero, 1
mov esi, [esp + 112]      ld r32, [esp + 112]       ld edx, [esp + 112]
xor ebx, ebx              or ebx, zero, 0           or ebx, zero, 0
mov [esi], ebx            st ebx, [r32]
mov esi, [ebp + 0x878]    ld esi, [ebp + 0x878]     ld esi, [ebp + 0x878]
cmp edi, 72               cmp.ne p1 edi, 72         cmp.ne p1 edi, 72
jne E                     brc p1, E                 assert ~p1
mov eax, 1                or eax, zero, 1
mov ebx, [ebp]            ld ebx, [ebp]             ld ebx, [ebp]
mov ebp, [ebx + esi*4]    ld ebx, [ebx + esi*4]     ld ebx, [ebx + esi*4]
mov edx, [esp + 112]      or edx, r32, 0
mov [edx], ebx            st ebx, [edx]             st ebx, [edx]
test ecx, ecx             tst.ne p1, ecx, ecx
jne F                     brc p1, F
[Diagram: x86 Applications and x86 OS run on the x86 ISA, which is provided by Code Morphing Software (Translations, Interpreter, Runtime) running on the RISC ISA of an EPIC Processor]

Efficeon Processor Example
Up to 6-issue/clock EPIC style architecture
• 2 loads or stores
• 2 integer ALU
• 2 SIMD
• 1 branch/call or other control
Co-designed with CMS (Code Morphing Software)
Includes hardware atomicity under software control
• Commit
• Rollback
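To show how software might fill these issue slots, here is a small greedy packer that places each atom in the earliest cycle where its inputs are ready and a matching unit is free. It is a sketch under simplifying assumptions and not the CMS scheduler: every atom has a latency of one cycle, memory ordering and branch placement are ignored, and `LIMITS`, `MAX_ISSUE`, and `pack` are illustrative names built from the slot counts listed above.

```python
# Per-cycle slot counts taken from the slide; the names are illustrative.
LIMITS = {"mem": 2, "alu": 2, "simd": 2, "branch": 1}
MAX_ISSUE = 6


def pack(atoms):
    """Greedily pack (name, unit, deps) atoms, given in program order.

    Returns a list of cycles, each a list of atom names issued together.
    Assumes unit latency: a result is usable in the cycle after its producer.
    """
    cycle_of = {}
    bundles = []                   # bundles[c] holds (name, unit) pairs
    for name, unit, deps in atoms:
        # Earliest cycle in which every input has already been produced.
        c = max((cycle_of[d] + 1 for d in deps), default=0)
        while True:
            while len(bundles) <= c:
                bundles.append([])
            used = sum(1 for _, u in bundles[c] if u == unit)
            if len(bundles[c]) < MAX_ISSUE and used < LIMITS[unit]:
                break
            c += 1                 # slot or unit is full, try the next cycle
        bundles[c].append((name, unit))
        cycle_of[name] = c
    return [[n for n, _ in b] for b in bundles]


if __name__ == "__main__":
    atoms = [
        ("ld_a", "mem", []),
        ("ld_b", "mem", []),
        ("ld_c", "mem", []),               # third memory op spills to cycle 1
        ("add", "alu", ["ld_a", "ld_b"]),
        ("br", "branch", ["add"]),
    ]
    for cycle, names in enumerate(pack(atoms)):
        print(cycle, names)
```

In this toy input the third load cannot issue in cycle 0 because only two load/store slots exist, which illustrates the kind of simple, visible cost model that an in-order EPIC design gives to software.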

Efficeon Hardware Example
Each clock, processor can issue from one to six 32-bit instruction “atoms” to 11 functional units
[Diagram: an instruction made of atom1 through atom8 feeding the functional units: two Load or Store or 32-bit add units, Integer ALU-1, Integer ALU-2, Alias, Control, two FP / SIMD units, Branch, Exec-1, Exec-2]

Code Morphing Software 4 Gear System
Significantly Improved Responsiveness and Overall Performance
1st Gear
Executes 1 instruction at a time
• Profiles code at runtime
• Gathers data for flow analysis
• Gathers branch frequencies and directions
• Detects load/store typing (IO vs memory)
Filters out infrequently executed code
No startup cost
Lowest speed

Code Morphing Software 4 Gear System
Significantly Improved Responsiveness and Overall Performance
2nd Gear
Uses profile data to create initial translations after code reaches 1st threshold.
• Translates a “Region” of up to 100 x86 instructions
• Adds flow graph “Shape” information
• Light Optimization
• “Greedy” scheduling
Low translation overhead
Fast execution
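The hand-off between the first two gears amounts to an execution counter with a promotion threshold. The sketch below illustrates that idea only: the `Gearbox` class is hypothetical, the threshold value is invented because the slides do not state one, and "translation" is reduced to caching a region of at most 100 instructions.

```python
from collections import Counter

FIRST_THRESHOLD = 3   # hypothetical value; the slides do not give the number
MAX_REGION = 100      # a region holds up to 100 x86 instructions


class Gearbox:
    """Toy model of 1st gear (interpret and profile) and 2nd gear (translate)."""

    def __init__(self):
        self.counts = Counter()    # per-address execution counts (the profile)
        self.translations = {}     # address -> translated region

    def execute(self, addr, block):
        """Return the gear that handled this execution of `block` at `addr`."""
        if addr in self.translations:
            return 2               # run the cached translation
        self.counts[addr] += 1     # 1st gear: interpret and profile
        if self.counts[addr] >= FIRST_THRESHOLD:
            # Hot enough: build an initial translation for later executions.
            self.translations[addr] = tuple(block[:MAX_REGION])
        return 1


if __name__ == "__main__":
    box = Gearbox()
    hot_block = ["nop"] * 150
    print([box.execute(0x1000, hot_block) for _ in range(5)])
    print(box.execute(0x2000, ["nop"]), 0x2000 in box.translations)
```

Code that runs only once or twice never crosses the threshold, so it pays no translation cost, which is how the first gear filters out infrequently executed code.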
