For our next chapter, we will discuss the emulation process, which is an integral part of virtual machines.
For today’s lecture, we’ll start by defining what we mean by emulation. Specifically, in this section, we’ll focus on how to emulate the instructions for one machine on another machine. There are two basic methods for emulating an instruction set. The first is interpretation. We’ll discuss basic techniques for building an interpreter, including basic, indirect threaded, and direct threaded interpretation. The other is binary translation, which is basically compiling sections of the source binary to the target platform. A number of issues arise during binary translation, including code discovery, which has to do with separating instructions from data in the source binary, and code location, which has to do with mapping addresses in the source binary to the corresponding translated code. We’ll also look at other issues, such as mapping registers in the source machine to registers on the target platform. Next, we’ll look at optimizations to deal with control transfers in the translated code, and we’ll end by looking at some issues with translating specific parts of the instruction set.
Let’s talk about a few of the definitions we’ll use in our discussion going forward. First, the book defines emulation as the process of implementing the interface or functionality of one system on a different system. Notice that this is not very different from our definition of virtualization, and taken in this general sense, I think you could argue that virtualization is just a form of emulation. However, for our discussion, we’re going to narrow the definition of emulation and say that it applies specifically to instruction sets. So, emulation is the process of taking one instruction set and implementing its interface or functionality on a machine with a different instruction set. There are different techniques for doing emulation, which we’ll discuss over the next few lectures. The first is interpretation, which is basically instruction-at-a-time translation of the source instructions. Interpretation is simple to implement, but relatively slow compared to binary translation. The other strategy is binary translation, which is basically block-at-a-time translation of the source program. This technique can improve performance over interpretation, but it’s a bit more challenging to implement efficiently. It takes time to compile blocks of code, so you have to prioritize which parts of the program you want to compile first. Also, you have to keep this compiled code in memory and have a way to transfer control from compiled to non-compiled blocks, and vice versa.
Now, I want to talk about how emulation is related to the term simulation, because the terms are often confused. They are related, but they are different concepts. Simulation is a method for modeling a system’s operation. So, simulators are used when you need to understand how something works, but don’t have the means to set up an experiment on a real machine. For instance, in my lab, we are trying to study heterogeneous memory architectures (systems with multiple types of memory), but these systems do not exist yet. So, we use a simulator called Ramulator to simulate the operation of multiple types of memory in one machine. Emulation is often part of the process of simulation, but simulators will often implement much more functionality to give you information about how a system works. For instance, there are processor simulators that will actually execute the application, but additionally, they provide information about how many cycles the processor would require to execute the application, or how efficiently the application will utilize processor caches.
Recall from our discussion on virtual machines that the guest refers to the system or interface that will be supported by the underlying platform. The host refers to the underlying platform that is used to provide an environment for the guest.
In the context of instruction set emulation, we use the terms source and target to refer to the instructions that participate in the emulation. The source ISA (or binary) is the original instruction set or binary file that needs to be emulated. The target ISA (or binary) is the ISA of the host processor you want to use to run your source instructions. So, we need to somehow translate the source instructions to the target ISA to emulate them on the host platform. I’ll try to use the terms source and target when referring to instruction sets that are emulated, and guest and host when referring to platforms that are virtualized. The terms are very similar, and even in the literature, they are not always used consistently.
OK, so for this lecture we will be primarily concerned with instruction set emulation, since that is a key aspect of most virtual machine implementations. Our definition for instruction set emulation is on this slide. We say the ‘source is emulated by the target’ if the binaries in the source instruction set can be executed on a machine implementing the target instruction set. As we’ve discussed, this capability is required for many VM implementations. An example of instruction set emulation is the IA-32 Execution Layer we discussed last time. Basically, this is software that Intel developed to enable the execution of older 32-bit x86 executables on 64-bit Itanium processors. The IA-32 EL translated 32-bit x86 instructions to the Itanium’s 64-bit instruction set. It was integrated into the OS, so it did this emulation seamlessly and transparently to the upper-level applications.
You can think of ISA emulation techniques as existing on a spectrum where different techniques require different amounts of computing resources and offer different performance and portability characteristics. On one end of the spectrum is the straightforward method of interpretation, and on the other is binary translation. Interpretation involves a cycle of fetching a source instruction, analyzing it, performing the required operation, and then fetching the next source instruction. Interpretation is the simplest emulation technique, but it typically has poor performance. Interpreters are also often implemented in a high-level language, such as C, and so they’re often portable. Binary translation tries to amortize the fetch and analysis costs by translating a block of source instructions to a block of target instructions, and then saving the translated block for repeated use. Implementing binary translation is more complex, and it requires a higher initial cost to translate the blocks, but it can pay dividends by providing better long-term performance. There are other techniques that attempt to eliminate the drawbacks of both approaches. Predecoding is a preprocessing step for interpretation that does some of the work of interpreting the instructions beforehand, which speeds up the interpretation itself. And selective compilation is sort of a hybrid approach that uses interpretation early in the run and for sections of code that are not executed very often, and binary translation for sections of code that are expected to be hot. We’ll talk about each of these in more depth later in the lecture.
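To make the idea of predecoding a little more concrete, here is a minimal sketch in C. It is not code from the book or the slides. It assumes 32-bit source instructions with a fixed field layout (a 6-bit primary opcode, two 5-bit register fields, and a 16-bit immediate, as in PowerPC D-form instructions), and the names `Predecoded`, `extract`, and `predecode` are hypothetical. The point is just that the bit slicing is done once, up front, so the interpreter loop can later read ready-made fields instead of re-analyzing each instruction every time it executes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predecoded form of one source instruction. */
typedef struct {
    uint8_t op;   /* primary opcode, instruction bits 0-5 */
    uint8_t r1;   /* first register field, bits 6-10 */
    uint8_t r2;   /* second register field, bits 11-15 */
    int32_t imm;  /* sign-extended 16-bit immediate, bits 16-31 */
} Predecoded;

/* Extract `len` bits starting at big-endian bit `start` (bit 0 is the
 * most significant bit). Assumes 0 < len < 32 and start + len <= 32. */
static uint32_t extract(uint32_t inst, unsigned start, unsigned len) {
    return (inst >> (32u - start - len)) & ((1u << len) - 1u);
}

/* Decode every instruction once, before interpretation begins. */
void predecode(const uint32_t *code, size_t n, Predecoded *out) {
    assert(code != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int32_t imm = (int32_t)extract(code[i], 16, 16);
        out[i].op  = (uint8_t)extract(code[i], 0, 6);
        out[i].r1  = (uint8_t)extract(code[i], 6, 5);
        out[i].r2  = (uint8_t)extract(code[i], 11, 5);
        out[i].imm = (imm ^ 0x8000) - 0x8000;  /* sign-extend 16 bits */
    }
}
```

The trade-off in this sketch is memory for time: the predecoded array is larger than the original code, but each field is extracted only once no matter how often the instruction runs.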
Let’s first talk about how interpretation of the source ISA is implemented. The interpreter program has to maintain the complete architected state of a machine implementing the source ISA. So, this figure shows that the interpreter maintains an image of all of the guest’s memory, including code, data, and stack regions for the executable. Additionally, the interpreter holds a table called the ‘context block’. The context block contains the various components of the source’s architected state, including general-purpose registers, the program counter, condition codes, and miscellaneous control registers. (Point out the context block.) Ask: what is the condition codes register?
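As a rough illustration of the state described above, an interpreter written in C might hold it in structures like the following. This is only a sketch: the type names, field names, register count, and memory size are assumptions chosen for a PowerPC-like source machine, not definitions from the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 32            /* assumed register count */
#define MEM_SIZE (64 * 1024)   /* assumed size of the memory image */

/* Hypothetical context block: the source's architected register state. */
typedef struct {
    uint64_t gpr[NUM_GPRS];  /* general-purpose registers */
    uint64_t pc;             /* source program counter */
    uint32_t cr;             /* condition codes */
    uint64_t lr;             /* miscellaneous control register (link) */
    uint64_t ctr;            /* miscellaneous control register (count) */
} ContextBlock;

/* Everything the interpreter keeps on behalf of the source program. */
typedef struct {
    ContextBlock ctx;
    uint8_t memory[MEM_SIZE];  /* image of source code, data, and stack */
} SourceMachine;

/* Clear all state and point the source PC at the program entry. */
void machine_reset(SourceMachine *m, uint64_t entry_pc) {
    assert(m != NULL);
    memset(m, 0, sizeof *m);
    m->ctx.pc = entry_pc;
}
```

Note that all of this lives in ordinary host memory: the source’s “registers” are just array elements that the interpreter routines read and write.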
The simplest interpreter implementation is known as a decode-and-dispatch interpreter. Its implementation is structured around a simple loop that steps through the program, one instruction at a time, and modifies the state of the source according to the instruction. For each iteration of the loop, it decodes the current instruction, and dispatches it to an interpretation routine based on the type of the instruction. The code on this slide shows a decode-and-dispatch loop for interpreting the PowerPC ISA. In this example, the source instructions are kept in an array called ‘code’, and PC is the current value of the source’s program counter. So, we first get the next instruction by indexing into the ‘code’ array. Next, the extract function extracts the opcode from the current instruction (bit slicing).
Next, we enter a switch statement, where, based on this opcode, we will jump to an interpreter routine that implements this source instruction on the target machine.
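The slide’s code is not reproduced in these notes, so here is a minimal sketch of the loop just described, under some assumptions. It handles only two real PowerPC D-form instructions (`addi`, primary opcode 14, and `ori`, primary opcode 24), it treats the program counter as an index into the ‘code’ array, and it uses primary opcode 0 as a made-up halt marker so the loop has a way to stop. The names `Cpu`, `run`, `do_addi`, and `do_ori` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_HALT = 0 /* assumed stop marker */, OP_ADDI = 14, OP_ORI = 24 };

typedef struct {
    uint64_t gpr[32];
    size_t pc;  /* index into the code array, not a byte address */
} Cpu;

/* Bit slicing with big-endian bit numbering (bit 0 is the MSB).
 * Assumes 0 < len < 32 and start + len <= 32. */
static uint32_t extract(uint32_t inst, unsigned start, unsigned len) {
    return (inst >> (32u - start - len)) & ((1u << len) - 1u);
}

/* addi RT,RA,SI: RT = (RA == 0 ? 0 : GPR[RA]) + sign-extended SI. */
static void do_addi(Cpu *c, uint32_t inst) {
    uint32_t rt = extract(inst, 6, 5);
    uint32_t ra = extract(inst, 11, 5);
    int64_t si = (int64_t)extract(inst, 16, 16);
    si = (si ^ 0x8000) - 0x8000;  /* sign-extend 16 bits */
    c->gpr[rt] = (ra == 0 ? 0 : c->gpr[ra]) + (uint64_t)si;
    c->pc += 1;
}

/* ori RA,RS,UI: GPR[RA] = GPR[RS] | zero-extended UI. */
static void do_ori(Cpu *c, uint32_t inst) {
    uint32_t rs = extract(inst, 6, 5);
    uint32_t ra = extract(inst, 11, 5);
    c->gpr[ra] = c->gpr[rs] | extract(inst, 16, 16);
    c->pc += 1;
}

/* Decode-and-dispatch loop; returns the number of instructions run. */
size_t run(Cpu *c, const uint32_t *code, size_t n) {
    size_t executed = 0;
    for (;;) {
        assert(c->pc < n);
        uint32_t inst = code[c->pc];            /* fetch */
        uint32_t opcode = extract(inst, 0, 6);  /* decode */
        switch (opcode) {                       /* dispatch */
        case OP_ADDI: do_addi(c, inst); break;
        case OP_ORI:  do_ori(c, inst);  break;
        case OP_HALT: return executed;
        default:      return executed;  /* unimplemented opcode: stop */
        }
        executed++;
    }
}
```

Even in this small sketch you can see where the time goes: every source instruction pays for a fetch, a bit-slice, a switch, and a call before any useful work happens.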
Here’s an example interpreter routine for the LoadWordAndZero source instruction. The Load Word and Zero instruction loads a 32-bit word into a 64-bit register and zeroes the upper 32 bits of the register; it is the basic PowerPC load word instruction.
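Since the slide’s routine is not included in these notes, the following is a sketch of what such a routine could look like, not the slide’s actual code. It assumes the standard D-form encoding of `lwz` (primary opcode 32, RT in bits 6-10, RA in bits 11-15, a signed 16-bit displacement in bits 16-31), a big-endian source memory image, and a small byte array standing in for that image. The `Machine` type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 256  /* assumed size of the memory image */

typedef struct {
    uint64_t gpr[32];
    uint64_t pc;            /* source PC as a byte address */
    uint8_t mem[MEM_SIZE];  /* image of source memory */
} Machine;

/* Bit slicing with big-endian bit numbering (bit 0 is the MSB).
 * Assumes 0 < len < 32 and start + len <= 32. */
static uint32_t extract(uint32_t inst, unsigned start, unsigned len) {
    return (inst >> (32u - start - len)) & ((1u << len) - 1u);
}

/* lwz RT,D(RA): load a 32-bit word and zero the upper 32 bits of RT. */
void load_word_and_zero(Machine *m, uint32_t inst) {
    uint32_t rt = extract(inst, 6, 5);
    uint32_t ra = extract(inst, 11, 5);
    int64_t d = (int64_t)extract(inst, 16, 16);
    d = (d ^ 0x8000) - 0x8000;  /* sign-extend the displacement */

    /* An RA field of 0 means "use the value 0", not register 0. */
    uint64_t base = (ra == 0) ? 0 : m->gpr[ra];
    uint64_t ea = base + (uint64_t)d;  /* effective address */
    assert(ea <= MEM_SIZE - 4);        /* stay inside the image */

    /* Assemble the word byte by byte, most significant byte first. */
    uint32_t word = ((uint32_t)m->mem[ea] << 24)
                  | ((uint32_t)m->mem[ea + 1] << 16)
                  | ((uint32_t)m->mem[ea + 2] << 8)
                  |  (uint32_t)m->mem[ea + 3];

    m->gpr[rt] = word;  /* widening to 64 bits zeroes the upper half */
    m->pc += 4;         /* advance to the next source instruction */
}
```

Assembling the word from individual bytes keeps the routine correct regardless of the host’s byte order, at the cost of a few extra operations per load.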