
ENCM 501: Principles of Computer Architecture, Slides for Lecture 7



Slide 2/31: Previous Lecture

ENCM 501: Principles of Computer Architecture, Winter 2014 Term
Steve Norman, PhD, PEng
Electrical & Computer Engineering, Schulich School of Engineering, University of Calgary
30 January, 2014

◮ endianness
◮ addressing modes
◮ examples of tradeoffs in instruction set design

Slide 3/31: Today's Lecture

◮ completion of previous lecture
◮ introduction to memory systems
◮ review of SRAM and DRAM

Related reading in Hennessy & Patterson: Sections 2.1, B.1.

Slide 4/31: Conditional branch options

Most ISAs make branch decisions based on a few bits, called flag bits or condition code bits, that sit within some kind of processor status register. Let's look at this for a simple C example, in which i and k are int variables in registers:

    if (i < k) goto L1;

x86-64 translation, assuming i in %eax and k in %edx:

    cmpl %edx, %eax   # compare registers
    jl   L1           # branch based on N and V flags

jl means "jump if less than." (Note: in reality the assembly language label almost certainly won't be the same as the C label L1.)

Slide 5/31: Conditional instructions in ARM

For the same C code, here is an ARM translation, assuming i in r0 and k in r1:

    CMP r0, r1   ; compare registers
    BLT L1       ; branch based on N and V flags

ARM takes the idea of conditional execution to the extreme: every ARM instruction is conditional! Bits 31-28 of an ARM instruction are the so-called cond field, which specifies that the instruction either performs some action or is a no-op, depending on some condition on zero or more of the N, Z, V and C flags.

Slide 6/31

Recall from Assignment 1 that MIPS offers the conditional move instructions MOVN and MOVZ. (MIPS also has some similar floating-point conditional move instructions.)

MIPS is unusual: the comparison result goes into a GPR. Suppose we have i in R4 and k in R5 . . .
    SLT R8, R4, R5    # R8 = (R4 < R5)
    BNE R8, R0, L1    # branch if R8 != 0

Example ARM cond field patterns:

◮ 1110, for ALWAYS. The instruction is never a no-op. This is the default cond field in ARM assembly language.
◮ 0000, for EQUAL. Execute the instruction if and only if the Z flag is 1.
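Both the x86-64 jl and the ARM BLT above are annotated "branch based on N and V flags." The sketch below is a hypothetical illustration, not from the slides: the helper name is invented, but it shows the standard rule those flags encode. After a flag-setting subtract of k from i, "signed less than" holds exactly when N differs from V.

```c
#include <stdint.h>

/* Hypothetical illustration (not from the slides): after computing i - k,
 * N is the sign bit of the result and V records signed overflow.
 * "Signed less than" is the condition N != V, which is what x86-64 jl
 * and ARM BLT test. */
int signed_less_than_via_flags(int32_t i, int32_t k)
{
    int32_t diff = (int32_t)((uint32_t)i - (uint32_t)k); /* wraparound subtract */
    int n = diff < 0;                                    /* N: result is negative */
    /* V: operands had different signs and the result's sign differs from i's. */
    int v = ((i < 0) != (k < 0)) && ((diff < 0) != (i < 0));
    return n != v;
}
```

The overflow case is why the condition is N != V rather than just N: with 32-bit values, INT32_MIN < 1 is true even though INT32_MIN - 1 wraps around to a positive result, so N alone would give the wrong answer.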

Slides 7-8/31: The power of ARM conditional instructions

The power of ARM conditional instructions is illustrated by this example. Here is some C code:

    if (i == 33 || i == 63) count++;

If i and count are ints in ARM registers r0 and r1, here is ARM assembly language for the C code:

    TEQ   r0, #33     ; # indicates immediate mode
    TEQNE r0, #63
    ADDEQ r1, r1, #1  ; Note typo in Lec 6 slides!

The cond field for the first instruction is 1110, for "always". For the second instruction, it's 0001, for "do it only if the Z flag is 0", and for the third, it's 0000, for "do it only if the Z flag is 1".

Acknowledgment: this example is adapted from one on pages 129-130 of Hohl, W., ARM Assembly Language: Fundamentals and Techniques, © 2009 ARM (UK), published by CRC Press.

Slide 9/31: MIPS versus ARM: Vague arguments

A fair and thorough study would require at least:

◮ real applications that are reasonably good fits for both ISAs;
◮ the best possible compilers for each of the ISAs;
◮ processors fabricated with the same transistor and interconnect technology, and very similar die sizes.

Even then, it might not be a truly fair fight between ISAs, if one side has better digital designers than the other.

Slide 10/31: MIPS versus ARM: How to be quantitative

CPU time = IC × CPI × clock period

MIPS attacks CPI by making instructions very simple and easy to pipeline. ARM tries to be close to MIPS with respect to CPI, and is much better than older CISC ISAs for CPI. ARM attacks IC by doing things in one instruction that might sometimes take two or three MIPS instructions.

Slide 11/31: We're moving on from ISA to microarchitecture

For (much) more about ISA design considerations, see Appendix K of the textbook, which is available in PDF format as a no-charge download.

The first aspect of microarchitecture we'll look at is the memory hierarchy.

Slide 12/31: Views of memory: ISA versus microarchitecture (1)

The modern ISA view of memory is simple: memory is flat.
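The flat view can be made concrete in C. This is an illustrative sketch under an assumption the slides take for granted: on a flat-memory system, an address is just an integer, so adjacent array elements sit at addresses exactly sizeof(element) bytes apart. (Strictly, converting pointers through uintptr_t is implementation-defined in C, but it behaves this way on ordinary flat-memory systems.)

```c
#include <stdint.h>

/* Illustrative sketch: treat addresses as plain unsigned integers and
 * check that consecutive int array elements are sizeof(int) bytes apart,
 * as the flat model predicts. */
int addresses_are_flat(void)
{
    int a[4];
    uintptr_t p0 = (uintptr_t)&a[0];
    uintptr_t p1 = (uintptr_t)&a[1];
    return (p1 - p0) == sizeof(int);
}
```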
For a program on a 32-bit system, a few regions within the address space from 0 to 0xffff ffff are available. As long as alignment rules are respected, any memory read is pretty much the same as any other read, and any memory write is pretty much the same as any other write.

The story is essentially the same for 64-bit systems, except that the maximum address is 0xffff ffff ffff ffff.

This simplicity is great for compiler writers choosing addressing modes for instructions, and for linker writers finding ways to stitch pieces of machine language together into complete machine language programs.
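The alignment rules mentioned above can be sketched with a small helper. This is an invented illustration, not from the slides: it assumes the common convention that an n-byte access is aligned when its address is a multiple of n.

```c
#include <stdint.h>

/* Invented illustration of a common alignment convention: an n-byte
 * access is aligned when its address is a multiple of n. For a
 * power-of-two n this is the usual (addr & (n - 1)) == 0 test. */
int is_aligned(uint64_t addr, uint64_t n)
{
    return (addr & (n - 1)) == 0;
}
```

For example, address 0x1000 is aligned for a 4-byte access, while 0x1002 is aligned for a 2-byte access but not a 4-byte one.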

Slide 13/31: Views of memory: ISA versus microarchitecture (2)

The modern microarchitecture view of memory is that memory is not at all simple! Modern memory systems are designed as complex hierarchies, with some subsystems optimized for high speed and others for large capacity and/or low cost. Energy use per memory access may be an important factor as well.

Understanding of this kind of hierarchy is critical at several levels of computer engineering. Examples:

◮ selection of processors for embedded applications
◮ systems software development: operating system kernels, libraries, etc.
◮ application software development

Slide 14/31: Components within a memory system

The schematic on the next page shows typical memory organization for a desktop computer in the time period from about 1999 to 2004. The box labeled CORE would contain GPRs, ALUs, control circuits and so on.

TLB stands for translation lookaside buffer. A TLB does high-speed translation of virtual addresses into physical addresses.

The core generates virtual addresses: PC values for instruction fetches, and data addresses generated by load and store instructions. Most cache designs are based on physical addresses, and the DRAM circuits definitely require physical addresses.

Slides 15-16/31: Memory system schematic; What are caches for?

[Schematic: the CORE reaches a UNIFIED L2 CACHE through an I-TLB and L1 I-CACHE on the instruction side, and a D-TLB and L1 D-CACHE on the data side; the L2 cache connects to a DRAM CONTROLLER and DRAM MODULES. Sizes of boxes reflect neither chip area nor storage capacity!]

In trying to make sense of the complicated interconnections and interactions between caches, it really helps to keep in mind what problems are solved by caches and what very different problems are solved by virtual memory.

Let's start with caches.
Caches exist to optimize performance in the face of some difficult facts:

◮ DRAM latency is on the order of 100 processor clock cycles
◮ latency in small SRAM arrays is on the order of 1 processor clock cycle
◮ latency in larger SRAM arrays is on the order of 10 processor clock cycles

The yellow box in the schematic shows what would be included in a processor chip in the 1999-2004 time frame. In 2014, a quad-core chip would include four copies of everything in yellow, plus a large L3 cache shared by all four cores. The DRAM controller would be on-chip.

Slide 17/31: What is virtual memory for?

Virtual memory is a system that operating system kernels can use to support applications. Some of the key benefits are:

◮ Protection. Each process, that is, each running user program, has its own virtual address space. Processes cannot accidentally or maliciously access each other's memory.
◮ Efficient memory allocation. A kernel can give an application a large contiguous piece of virtual address space made from many fragmented pieces of physical address space.
◮ Spilling to disk. If DRAM gets close to full, the kernel can copy pages of application memory to disk; the effective memory available can be greater than the DRAM capacity.

Slide 18/31: SRAM and DRAM

Before looking in detail at how caches work, let's look at the two main kinds of volatile storage in use in computer systems.
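The latency figures above (DRAM around 100 cycles, a small SRAM array around 1, a larger SRAM array around 10) hint at why a cache hierarchy pays off. The sketch below combines them into an average access latency; the formula is the standard average-memory-access-time calculation, and the miss rates used in the example are invented for illustration, not taken from the slides.

```c
/* Average latency of a two-level cache hierarchy in front of DRAM:
 *   avg = L1 time + L1 miss rate * (L2 time + L2 miss rate * DRAM time)
 * All times are in processor clock cycles. */
double avg_access_cycles(double l1, double l2, double dram,
                         double l1_miss, double l2_miss)
{
    return l1 + l1_miss * (l2 + l2_miss * dram);
}
```

With the slide's rough figures (L1 = 1 cycle, L2 = 10 cycles, DRAM = 100 cycles) and invented miss rates of 5% for L1 and 20% for L2, the average is 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles, far below the raw 100-cycle DRAM latency.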
