Emulation Michael Jantz

Acknowledgements
• Slides adapted from Chapter 2 of Virtual Machines: Versatile Platforms for Systems and Processes by James E. Smith and Ravi Nair
• Credit to Prasad A. Kulkarni: some slides were borrowed from his course on Virtual Machines at the University of Kansas

Outline
• Emulation
• Interpretation
  • Basic, indirect threaded, and direct threaded
• Binary translation
  • Code discovery, code location
• Other issues
  • Control transfer optimizations
  • Instruction set issues

Emulation vs. Simulation
• Emulation: the process of implementing the interface / functionality of one (sub)system on a different system
  • Applied here specifically to an instruction set
  • Different emulation techniques
    • Interpretation (instruction-at-a-time)
    • Binary translation (block-at-a-time)
• Simulation: a method for modeling a system's operation
  • The goal is to study the process, not to imitate function

Definitions
• Guest
  • Environment supported by the underlying platform
• Host
  • Underlying platform used to provide an environment for the guest
[Figure: the Guest is supported by the Host]

Definitions
• Source ISA or binary
  • Original instruction set or binary
  • The ISA to be emulated
• Target ISA or binary
  • ISA of the host processor
  • Underlying ISA
• Source / target refer to ISAs
• Guest / host refer to platforms
[Figure: the Source is emulated by the Target]

Instruction Set Emulation
• Binaries in the source instruction set can be executed on a machine implementing the target instruction set
• Required for many VM implementations
• Example: IA-32 EL (runs IA-32 binaries on Itanium processors)

Interpretation vs. Translation
• Interpretation
  • Simple, easy to implement
  • Low performance
• Binary translation
  • Complex implementation
  • Higher initial cost, better performance
• Techniques in between these extremes
  • Predecoding
  • Selective compilation

Interpreter State
• Must maintain the state of the machine implementing the source ISA
  • Registers
  • Memory
    • Code
    • Data
    • Stack
[Figure: memory image held by the interpreter, containing Program Counter, Condition Codes, Reg 0 ... Reg n-1, Code, Data, Stack, and Interpreter Code]
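The state listed on this slide could be gathered into a single C structure. The following is only a sketch: the type name `SourceState`, the helper `state_reset`, the register count, and the memory sizes are assumptions made for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS  32     /* assumption: 32 general-purpose source registers */
#define MEM_WORDS 1024   /* assumption: toy-sized code and data areas */

/* Hypothetical container for the architected state of the source machine. */
typedef struct {
    uint32_t pc;                /* source program counter (byte address) */
    uint32_t cond_codes;        /* condition codes */
    uint64_t regs[NUM_REGS];    /* Reg 0 .. Reg n-1 */
    uint32_t code[MEM_WORDS];   /* source instructions */
    uint64_t data[MEM_WORDS];   /* source data, including the stack */
    int halt;                   /* set when the source program stops */
    int interrupt;              /* set when an interrupt is pending */
} SourceState;

/* Clear all state and point the source PC at the entry address. */
void state_reset(SourceState *s, uint32_t entry_pc) {
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    s->pc = entry_pc;
}
```

Keeping everything in one structure means an interpreter routine needs only a single pointer to reach any part of the source machine's state.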

Decode-And-Dispatch Interpreter
• Decode-and-dispatch loop
  • One instruction at a time
  • Decode the current instruction
  • Dispatch to the corresponding interpreter routine

    while (!halt && !interrupt) {
        inst = code[PC];
        opcode = extract(inst, 31, 6);
        switch (opcode) {
            case LoadWordAndZero: LoadWordAndZero(inst); break;
            case ALU:             ALU(inst);             break;
            case Branch:          Branch(inst);          break;
            . . .
        }
    }

Decode-And-Dispatch Interpreter

    LoadWordAndZero(inst) {
        RT = extract(inst, 25, 5);
        RA = extract(inst, 20, 5);
        displacement = extract(inst, 15, 16);
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;   /* keep the low 32 bits, zero the upper 32 */
        PC = PC + 4;
    }
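The slides call `extract(inst, start, length)` without defining it. One plausible reading, assumed here, is that `start` is the position of the field's most significant bit (bit 0 being the least significant) and `length` is the field width. A minimal sketch under that assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Return the `len`-bit field of `word` whose most significant bit sits at
   position `hi` (bit 0 = least significant). This is an assumed reading of
   the slides' extract(); the slides do not define it. */
uint32_t extract(uint32_t word, unsigned hi, unsigned len) {
    assert(len >= 1 && len <= 32 && hi <= 31 && hi + 1 >= len);
    uint32_t shifted = word >> (hi + 1 - len);
    uint32_t mask = (len == 32) ? 0xFFFFFFFFu : ((1u << len) - 1u);
    return shifted & mask;
}
```

With this definition, the PowerPC word 0x80220008 (`lwz r1, 8(r2)`) yields opcode 32 from `extract(w, 31, 6)`, RT = 1 from `extract(w, 25, 5)`, RA = 2 from `extract(w, 20, 5)`, and displacement 8 from `extract(w, 15, 16)`.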

Decode-And-Dispatch Interpreter

    ALU(inst) {
        RT = extract(inst, 25, 5);
        RA = extract(inst, 20, 5);
        RB = extract(inst, 15, 5);
        source1 = regs[RA];
        source2 = regs[RB];
        extended_opcode = extract(inst, 10, 10);
        switch (extended_opcode) {
            case Add:         Add(inst);         break;
            case AddCarrying: AddCarrying(inst); break;
            case AddExtended: AddExtended(inst); break;
            . . .
        }
        PC = PC + 4;
    }

Decode-And-Dispatch Efficiency
• Decode-and-dispatch loop
  • Several branch instructions
  • Indirect branch on the switch statement
• Interpreting an add instruction
  • Requires approximately 20 target instructions
  • Several expensive loads/stores to memory
• Hand-coded assembly can improve performance
  • Example: HotSpot JVM
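To make the cost of the central loop concrete, here is a small self-contained decode-and-dispatch interpreter. It is a sketch for a made-up three-instruction ISA: the opcodes `OP_HALT`, `OP_LOADI`, `OP_ADD`, the `Machine` type, and the `encode` helper are all assumptions for illustration, and only the field positions follow the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2 };   /* hypothetical toy opcodes */

typedef struct {
    uint32_t pc;         /* byte address of the current instruction */
    uint32_t regs[32];
    int halt;
} Machine;

/* Field of `len` bits (len < 32) whose top bit is at position `hi`. */
static uint32_t extract(uint32_t w, unsigned hi, unsigned len) {
    return (w >> (hi + 1 - len)) & ((1u << len) - 1u);
}

/* Interpreter routines: one per opcode, each advances the PC itself. */
static void load_imm(Machine *m, uint32_t inst) {
    m->regs[extract(inst, 25, 5)] = extract(inst, 15, 16);
    m->pc += 4;
}

static void add(Machine *m, uint32_t inst) {
    uint32_t rt = extract(inst, 25, 5);
    uint32_t ra = extract(inst, 20, 5);
    uint32_t rb = extract(inst, 15, 5);
    m->regs[rt] = m->regs[ra] + m->regs[rb];
    m->pc += 4;
}

/* Central loop: fetch, decode the opcode, dispatch through a switch. */
void interpret(Machine *m, const uint32_t *code, size_t n_words) {
    while (!m->halt) {
        assert(m->pc / 4 < n_words);
        uint32_t inst = code[m->pc / 4];
        switch (extract(inst, 31, 6)) {
        case OP_LOADI: load_imm(m, inst); break;
        case OP_ADD:   add(m, inst);      break;
        default:       m->halt = 1;       break;   /* OP_HALT or unknown opcode */
        }
    }
}

/* Build a toy instruction word: opcode, RT, RA, and the low 16 bits. */
uint32_t encode(uint32_t op, uint32_t rt, uint32_t ra, uint32_t low16) {
    return (op << 26) | (rt << 21) | (ra << 16) | (low16 & 0xFFFFu);
}
```

Every interpreted instruction passes through the loop test, the switch's bounds check and indirect branch, a call, and a return, which is the branch overhead the slide refers to.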

Indirect Threaded Interpretation
• The high number of branches in the decode-and-dispatch loop reduces performance
  • At least 5 branches per instruction
• Threaded interpretation
  • Append dispatch code to each interpretation routine
  • Removes 3 branches
  • Threads interpretation routines together

Indirect Threaded Interpretation

    LoadWordAndZero:
        RT = extract(inst, 25, 5);
        RA = extract(inst, 20, 5);
        displacement = extract(inst, 15, 16);
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        PC = PC + 4;
        /* dispatch code appended to the routine */
        if (halt || interrupt) goto exit;
        inst = code[PC];
        opcode = extract(inst, 31, 6);
        extended_opcode = extract(inst, 10, 10);
        routine = dispatch[opcode][extended_opcode];
        goto *routine;

Indirect Threaded Interpretation

    Add:
        RT = extract(inst, 25, 5);
        RA = extract(inst, 20, 5);
        RB = extract(inst, 15, 5);
        source1 = regs[RA];
        source2 = regs[RB];
        sum = source1 + source2;
        regs[RT] = sum;
        PC = PC + 4;
        /* dispatch code appended to the routine */
        if (halt || interrupt) goto exit;
        inst = code[PC];
        opcode = extract(inst, 31, 6);
        extended_opcode = extract(inst, 10, 10);
        routine = dispatch[opcode][extended_opcode];
        goto *routine;

Indirect Threaded Interpretation
• Dispatch occurs indirectly through a table
  • Interpretation routines can be modified and relocated independently
• Advantages
  • Interpretation routines still portable
  • Improves efficiency over decode-and-dispatch
• Disadvantages
  • Increases interpreter code size
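A runnable version of table-driven threading needs a way to jump to an address held in a variable. The sketch below uses the GNU C "labels as values" extension (`&&label` and `goto *`), which GCC and Clang accept but ISO C does not define. The toy opcodes and the function name `interpret_threaded` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, NUM_OPS = 3 };   /* hypothetical */

/* Field of `len` bits (len < 32) whose top bit is at position `hi`. */
static uint32_t extract(uint32_t w, unsigned hi, unsigned len) {
    return (w >> (hi + 1 - len)) & ((1u << len) - 1u);
}

/* Interpret toy code until OP_HALT, updating the caller's register file. */
void interpret_threaded(const uint32_t *code, size_t n_words, uint32_t regs[32]) {
    /* Dispatch table: opcode -> address of its routine (GNU C extension). */
    static void *const dispatch[NUM_OPS] = { &&Halt, &&LoadImm, &&Add };
    uint32_t pc = 0, inst;

/* The dispatch code that is appended to every routine. */
#define DISPATCH()                                  \
    do {                                            \
        assert(pc / 4 < n_words);                   \
        inst = code[pc / 4];                        \
        assert(extract(inst, 31, 6) < NUM_OPS);     \
        goto *dispatch[extract(inst, 31, 6)];       \
    } while (0)

    DISPATCH();

LoadImm:
    regs[extract(inst, 25, 5)] = extract(inst, 15, 16);
    pc += 4;
    DISPATCH();

Add:
    regs[extract(inst, 25, 5)] =
        regs[extract(inst, 20, 5)] + regs[extract(inst, 15, 5)];
    pc += 4;
    DISPATCH();

Halt:
    return;
#undef DISPATCH
}
```

There is no central loop: each routine ends by looking up the next routine in the table and jumping to it. Because routines are reached only through the table, one can be replaced by changing a single table entry.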

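Indirect Threaded Interpretation
[Figure: side-by-side comparison of decode-dispatch and threaded interpretation. In both panels the source code is read through "data" accesses and control moves among the interpreter routines; the decode-dispatch panel also shows the central dispatch loop.]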

Predecoding
• Parse each instruction into a pre-defined data structure to facilitate interpretation
  • Separate opcodes, operands, etc.
  • Reduces shifts / masks for decoding
  • More useful when the source ISA is CISC
• Example source code to predecode:

    lwz r1, 8(r2)
    add r3, r3, r1
    stw r3, 0(r4)

Predecoding

    struct instruction {
        unsigned long op;
        unsigned char dest, src1, src2;
    } code[CODE_SIZE];

    LoadWordAndZero:
        RT = code[TPC].dest;
        RA = code[TPC].src1;
        displacement = code[TPC].src2;
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        SPC = SPC + 4;      /* source PC: advance by one 4-byte instruction */
        TPC = TPC + 1;      /* index into the predecoded code: advance by one entry */
        if (halt || interrupt) goto exit;
        opcode = code[TPC].op;
        routine = dispatch[opcode];
        goto *routine;
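The predecoding pass itself is not shown on the slides. A minimal sketch of one follows, filling the slide's `struct instruction` from raw PowerPC words for the three example instructions. The intermediate opcode numbers (`IOP_*`) and the function name `predecode` are assumptions; the primary opcodes 32 (`lwz`) and 36 (`stw`), and 31 with extended opcode 266 (`add`), are the PowerPC encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical intermediate opcode numbers chosen for this sketch. */
enum { IOP_LWZ = 0, IOP_ADD = 1, IOP_STW = 2, IOP_UNKNOWN = 3 };

/* Same layout as the slide's predecoded instruction. */
struct instruction {
    unsigned long op;
    unsigned char dest, src1, src2;
};

/* Field of `len` bits (len < 32) whose top bit is at position `hi`. */
static uint32_t extract(uint32_t w, unsigned hi, unsigned len) {
    assert(len < 32 && hi + 1 >= len);
    return (w >> (hi + 1 - len)) & ((1u << len) - 1u);
}

/* Do the shifting and masking once, ahead of time. Note that src2 is an
   unsigned char as on the slide, so displacements above 255 are truncated. */
struct instruction predecode(uint32_t inst) {
    struct instruction out = { IOP_UNKNOWN, 0, 0, 0 };
    uint32_t opcode = extract(inst, 31, 6);

    out.dest = (unsigned char)extract(inst, 25, 5);   /* RT (RS for a store) */
    out.src1 = (unsigned char)extract(inst, 20, 5);   /* RA */
    if (opcode == 32) {                               /* lwz RT, D(RA) */
        out.op = IOP_LWZ;
        out.src2 = (unsigned char)extract(inst, 15, 16);
    } else if (opcode == 36) {                        /* stw RS, D(RA) */
        out.op = IOP_STW;
        out.src2 = (unsigned char)extract(inst, 15, 16);
    } else if (opcode == 31 && extract(inst, 10, 10) == 266) {   /* add */
        out.op = IOP_ADD;
        out.src2 = (unsigned char)extract(inst, 15, 5);          /* RB */
    }
    return out;
}
```

After this pass the interpreter routines read `dest`, `src1`, and `src2` directly, and a single small `op` value replaces the opcode / extended-opcode pair.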

Direct Threaded Interpretation
• Replace the table lookup with direct access to the address of the interpreter routine
  • Requires predecoding
  • Reduces portability

Direct Threaded Interpretation

    LoadWordAndZero:
        RT = code[TPC].dest;
        RA = code[TPC].src1;
        displacement = code[TPC].src2;
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        SPC = SPC + 4;
        TPC = TPC + 1;
        if (halt || interrupt) goto exit;
        routine = code[TPC].op;     /* op now holds the routine's address: no table lookup */
        goto *routine;
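A runnable sketch of the same idea follows, again relying on the GNU C labels-as-values extension (GCC and Clang, not ISO C). A predecode pass stores each routine's address in the intermediate code, so the dispatch at the end of every routine is a single indirect jump. The toy opcodes, the two struct types, `MAX_CODE`, and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { IOP_HALT = 0, IOP_LOADI = 1, IOP_ADD = 2, NUM_IOPS = 3 };   /* hypothetical */
#define MAX_CODE 64   /* assumption: small fixed limit for the sketch */

struct predecoded { unsigned op; unsigned char dest, src1, src2; };
struct threaded   { void *routine; unsigned char dest, src1, src2; };

void run_direct_threaded(const struct predecoded *in, size_t n, uint32_t regs[32]) {
    static void *const routines[NUM_IOPS] = { &&Halt, &&LoadImm, &&Add };
    struct threaded code[MAX_CODE];
    size_t tpc = 0;

    assert(n > 0 && n <= MAX_CODE);
    /* Predecode pass: replace each opcode number with the routine's address. */
    for (size_t i = 0; i < n; i++) {
        assert(in[i].op < NUM_IOPS);
        assert(in[i].dest < 32 && in[i].src1 < 32);
        code[i].routine = routines[in[i].op];
        code[i].dest = in[i].dest;
        code[i].src1 = in[i].src1;
        code[i].src2 = in[i].src2;
    }

    /* From here on the table is never consulted again. */
    goto *code[tpc].routine;

LoadImm:
    regs[code[tpc].dest] = code[tpc].src2;
    tpc++;
    assert(tpc < n);
    goto *code[tpc].routine;

Add:
    assert(code[tpc].src2 < 32);
    regs[code[tpc].dest] = regs[code[tpc].src1] + regs[code[tpc].src2];
    tpc++;
    assert(tpc < n);
    goto *code[tpc].routine;

Halt:
    return;
}
```

The intermediate code now contains raw addresses of this particular interpreter build, which is one way to see why the slide lists reduced portability as the cost.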

Direct Threaded Interpretation
[Figure: the source code passes through the pre-decoder to produce intermediate code, which transfers control directly to the interpreter routines]

Binary Translation
• Convert the source binary to a target binary before execution
  • Logical conclusion of predecoding
  • Removes parsing and jumps altogether
  • Allows optimizations on native code
• Achieves better performance than interpretation
• Generated code is no longer portable

Binary Translation
[Figure: the binary translator converts the source code into binary translated target code]

Binary Translation
• x86 source binary

    addl %edx,4(%eax)
    movl 4(%eax),%edx
    add  %eax,4

• Translate to PowerPC target
  • r1 points to the x86 register context block
  • r2 points to the x86 memory image
  • r3 contains the x86 ISA PC value

Binary Translation

    ; addl %edx,4(%eax)
    lwz  r4,0(r1)     ;load %eax from register block
    addi r5,r4,4      ;add 4 to %eax
    lwzx r5,r2,r5     ;load operand from memory
    lwz  r4,12(r1)    ;load %edx from register block
    add  r5,r4,r5     ;perform add
    stw  r5,12(r1)    ;put result into %edx
    addi r3,r3,3      ;update PC (3 bytes)

    ; movl 4(%eax),%edx
    lwz  r4,0(r1)     ;load %eax from register block
    addi r5,r4,4      ;add 4 to %eax
    lwz  r4,12(r1)    ;load %edx from register block
    stwx r4,r2,r5     ;store %edx value into memory
    addi r3,r3,3      ;update PC (3 bytes)

    ; add %eax,4
    lwz  r4,0(r1)     ;load %eax from register block
    addi r4,r4,4      ;add immediate
    stw  r4,0(r1)     ;place result back into %eax
    addi r3,r3,3      ;update PC (3 bytes)
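Each group of PowerPC instructions above follows a fixed pattern for its x86 instruction, so a simple translator can be written as template expansion. The sketch below covers only the `add <reg>,<imm>` pattern and emits assembly text rather than machine code. The function name `translate_add_imm` is hypothetical, and the context-block offsets 0 (`%eax`) and 12 (`%edx`) are read off the comments on this slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Offsets into the x86 register context block, as used in the slide's code. */
enum { CTX_EAX = 0, CTX_EDX = 12 };

/* Emit PowerPC assembly text for the x86 instruction `add <reg>,<imm>`.
   ctx_offset selects the x86 register inside the context block (r1),
   x86_len is the byte length of the source instruction (added to r3).
   Returns the number of characters written, or -1 if `out` is too small. */
int translate_add_imm(char *out, size_t cap, int ctx_offset, int imm, int x86_len) {
    assert(out != NULL && cap > 0);
    int n = snprintf(out, cap,
                     "lwz r4,%d(r1)\n"     /* load the x86 register */
                     "addi r4,r4,%d\n"     /* add the immediate */
                     "stw r4,%d(r1)\n"     /* write the result back */
                     "addi r3,r3,%d\n",    /* advance the x86 PC */
                     ctx_offset, imm, ctx_offset, x86_len);
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

Calling `translate_add_imm(buf, sizeof buf, CTX_EAX, 4, 3)` reproduces the last four-instruction group on the slide. A real translator would emit encoded machine instructions and handle many more source patterns.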

Register Mapping
• Map source registers to target registers
  • Reduces memory loads / stores
• If the target has fewer registers than the source
  • Map some to memory
  • Map on a per-block basis

Register Mapping
r1 points to the x86 register context block
r2 points to the x86 memory image
r3 contains the x86 ISA PC value
r4 holds x86 register %eax
r7 holds x86 register %edx
etc.

    addi r16,r4,4     ;add 4 to %eax
    lwzx r17,r2,r16   ;load operand from memory
    add  r7,r17,r7    ;perform add of %edx
    addi r16,r4,4     ;add 4 to %eax
    stwx r7,r2,r16    ;store %edx value into memory
    addi r4,r4,4      ;increment %eax
    addi r3,r3,9      ;update PC (9 bytes)

Code Discovery Problem
• It may be difficult to statically predecode or translate the entire source program
• Code discovery problem: how to find the beginning of every source instruction?
• Consider the x86 code:

    31 c0 8b b5 00 00 03 08 8b bd 00 00 03 00

  • Starting at the third byte (8b b5 00 00 03 08): movl %esi, 0x08030000(%ebp) ??
  • Starting at the fourth byte (b5 00): mov %ch,0 ??
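The ambiguity in this byte sequence can be reproduced with a small length decoder. The sketch below understands only the handful of 32-bit x86 opcodes that occur in the example (0x00, 0x08, 0x31, 0x8B with a ModRM byte, and 0xB0 to 0xB7 with an 8-bit immediate), and it ignores prefixes and the SIB base special case. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extra bytes that follow a ModRM byte in 32-bit mode (simplified). */
static size_t modrm_extra(uint8_t modrm) {
    unsigned mod = modrm >> 6, rm = modrm & 7u;
    size_t extra = 0;
    if (mod != 3 && rm == 4) extra += 1;         /* SIB byte */
    if (mod == 1) extra += 1;                    /* 8-bit displacement */
    else if (mod == 2) extra += 4;               /* 32-bit displacement */
    else if (mod == 0 && rm == 5) extra += 4;    /* 32-bit absolute address */
    return extra;
}

/* Length of the instruction at p, or 0 if it is unknown or truncated. */
size_t insn_length(const uint8_t *p, size_t avail) {
    size_t len;
    if (avail == 0) return 0;
    if (p[0] >= 0xB0 && p[0] <= 0xB7) {
        len = 2;                                 /* mov r8, imm8 */
    } else if (p[0] == 0x00 || p[0] == 0x08 || p[0] == 0x31 || p[0] == 0x8B) {
        if (avail < 2) return 0;
        len = 2 + modrm_extra(p[1]);             /* opcode + ModRM + extras */
    } else {
        return 0;
    }
    return len <= avail ? len : 0;
}

/* Decode sequentially from `start`, recording where each instruction begins. */
size_t find_starts(const uint8_t *buf, size_t n, size_t start,
                   size_t *starts, size_t cap) {
    size_t count = 0, pos = start;
    assert(buf != NULL && starts != NULL);
    while (pos < n && count < cap) {
        size_t len = insn_length(buf + pos, n - pos);
        if (len == 0) break;
        starts[count++] = pos;
        pos += len;
    }
    return count;
}
```

Decoding the 14 bytes from offset 0 gives instruction starts at offsets 0, 2, and 8, while decoding from offset 3 gives an equally well-formed but different sequence starting at offsets 3, 5, and 7. Nothing in the bytes themselves says which starting point is the right one.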

Code Discovery Problem
• Contributors to the code discovery problem
  • Variable-length CISC instructions
  • Indirect jumps
  • Data interspersed with code
  • Padding instructions to align branch targets
[Figure: a stream of source ISA instructions containing inst. 1, inst. 2, inst. 3, jump, reg., data (annotated "data in instruction stream"), inst. 5, inst. 6, uncond. brnch, pad (annotated "pad for instruction alignment"), inst. 8, and a final jump annotated "jump indirect to ???"]
