
CPI (cycles per instruction)

  1. Performance and CPU Time

     (Slides by Yashwant Malaiya; limited content from Computer Architecture: A
     Quantitative Approach, Hennessy and Patterson. CS 270 - Spring Semester 2016.)

     Processor Clock Cycles = Instruction Count × Cycles per Instruction
     CPU Time = Instruction Count × CPI × Clock Period

     Parallelism

     The time taken by a program to execute is the product of
     - the number of machine instructions executed,
     - the number of clock cycles per instruction (CPI),
     - the duration of a single clock period.

     Example: 10,000 instructions, CPI = 2, clock period = 250 ps.
     CPU Time = 10,000 × 2 × 250 ps = 10^4 × 2 × 250 × 10^-12 s = 5 × 10^-6 s.
     (A small sketch of this calculation follows this item.)

     Processor execution time

     CPU Time = Instruction Count × CPI × Clock Cycle Time
     - Instruction count for a program is determined by the program, the ISA, and
       the compiler.
     - Average cycles per instruction (CPI) is determined by the CPU hardware; if
       different instructions have different CPI, the average CPI is affected by
       the instruction mix.
     - Clock cycle time (the inverse of frequency) depends on the logic levels and
       the technology.

     Reducing clock cycle time

     Lowering the clock cycle time has worked well for decades: smaller transistor
     dimensions implied smaller delays and hence lower clock cycle time. Not any
     more.
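
     A minimal sketch of the CPU-time formula above, reproducing the slide's own
     example (10,000 instructions, CPI = 2, 250 ps clock period); the helper name
     and the use of Python are illustrative choices, not part of the slides:

         # cpu_time.py - CPU time = instruction count * CPI * clock period
         def cpu_time(instruction_count, cpi, clock_period_s):
             """Return execution time in seconds."""
             return instruction_count * cpi * clock_period_s

         # Slide example: 10,000 instructions, CPI = 2, clock period = 250 ps
         t = cpu_time(10_000, 2, 250e-12)
         print(t)  # 5e-06 seconds, i.e. 5 microseconds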

  2. CPI (cycles per instruction)

     What is the LC-3's cycles per instruction? Instructions take 5-9 cycles
     (p. 568), assuming memory access time is one clock period, so the LC-3 CPI
     may be about 6 (ideal). Load/store instructions are about 20-30% of the mix.
     - With no cache and a memory access time of 100 cycles, the LC-3 CPI would be
       very high.
     - A cache reduces the access time to 2 cycles: the LC-3 CPI is then higher
       than 6, but still reasonable. (A sketch of this effective-CPI estimate
       follows this item.)

     Parallelism to save time

     Do things in parallel to save time. Example: pipelining.
     - Divide the flow into stages.
     - Let instructions flow into the pipeline.
     - At any given time, multiple instructions are under execution.

     Pipelining analogy

     Pipelined laundry: overlapping execution. Parallelism improves performance.
     - Four loads done one after another: time = 4 × 2 = 8 hours.
     - Pipelined, the time in the example = 7 × 0.5 = 3.5 hours.
     - Non-stop work = 4 × 0.5 = 2 hours.
     (A timing sketch for this analogy appears after the effective-CPI sketch below.)

     Pipeline processor performance

     Single-cycle (Tc = 800 ps) versus pipelined (Tc = 200 ps).
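
     A minimal sketch of the effective-CPI estimate above. The model and the 25%
     load/store fraction are assumptions for illustration (the slides give only an
     ideal CPI of about 6, a 20-30% load/store share, a 100-cycle memory, and a
     2-cycle cache); it charges every instruction one fetch access and load/store
     instructions one extra data access:

         # effective_cpi.py - rough effective CPI when a memory access is not 1 cycle
         def effective_cpi(base_cpi, mem_access_cycles, loadstore_fraction):
             """base_cpi assumes 1-cycle memory; each access beyond that adds cycles."""
             extra_per_access = mem_access_cycles - 1
             accesses_per_instr = 1 + loadstore_fraction  # fetch + possible data access
             return base_cpi + accesses_per_instr * extra_per_access

         print(effective_cpi(6, 100, 0.25))  # no cache: 129.75 -> very high
         print(effective_cpi(6, 2, 0.25))    # 2-cycle cache: 7.25, "higher than 6"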

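     A minimal sketch of the pipelining arithmetic from the laundry analogy,
     assuming 4 loads and 4 stages of 0.5 hours each (the 4-stage count is implied
     by the 2-hour load time, not stated explicitly above):

         # pipeline_time.py - sequential vs. pipelined completion time
         def sequential_time(n_items, item_time):
             return n_items * item_time

         def pipelined_time(n_items, n_stages, stage_time):
             # the first item takes n_stages * stage_time; each later item adds one stage
             return (n_stages + n_items - 1) * stage_time

         print(sequential_time(4, 2.0))    # 8.0 hours, as on the slide
         print(pipelined_time(4, 4, 0.5))  # 3.5 hours (7 x 0.5)
         print(4 * 0.5)                    # 2.0 hours of non-stop work in the pipeline
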
  3. Pipelining: Issues

     - Cannot predict which branch will be taken. Actually, you may be able to
       make a good guess, with some performance penalty for bad guesses.
     - Instructions may depend on the results of previous instructions. There may
       be a way to get around that problem in some cases.

     Instruction level parallelism (ILP)

     Pipelining is one example.
     - Multiple issue: have multiple copies of resources, so multiple instructions
       start at the same time. This needs careful scheduling:
       - compiler assisted scheduling, or
       - hardware assisted ("superscalar"), i.e. "dynamic scheduling".
         Ex: AMD Opteron x4.
     - CPI can be less than 1!

     Flynn's taxonomy (Michael J. Flynn, 1966)

                                Data Streams
                                Single                     Multiple
     Instruction    Single      SISD: Intel Pentium 4      SIMD: SSE instructions of x86
     Streams        Multiple    MISD: no examples today    MIMD: Intel Xeon e5345

     - Instruction level parallelism is still SISD.
     - SSE (Streaming SIMD Extensions): vector operations. (A small data-parallel
       sketch follows this item.)
     - Intel Xeon e5345: 4 cores.

     Multi what?

     - Multitasking: tasks share a processor.
     - Multithreading: threads share a processor.
     - Multiprocessors: using multiple processors, for example multi-core
       processors (multiple x86 processors on the same chip). Scheduling of
       tasks/subtasks is needed.
     - Thread level parallelism: multiple threads on one or more processors.
     - Simultaneous multi-threading: multiple threads in parallel (using multiple
       states).
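
     A minimal sketch of the SIMD idea from Flynn's taxonomy: one operation
     applied to many data elements at once. NumPy is used purely as an
     illustration (it is not mentioned on the slides); whether its elementwise
     operations actually map to SSE/AVX vector instructions depends on the NumPy
     build and the hardware:

         # simd_style.py - a single operation over multiple data elements
         import numpy as np

         a = np.arange(8, dtype=np.float32)      # [0, 1, ..., 7]
         b = np.full(8, 2.0, dtype=np.float32)   # [2, 2, ..., 2]

         # One vectorized add over all eight elements, instead of a scalar loop
         c = a + b
         print(c)  # [2. 3. 4. 5. 6. 7. 8. 9.]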

  4. Multi-core processors

     Power consumption has become a limiting factor. The key advantage of multiple
     cores is lower power consumption for the same performance.
     - Ex: 20% lower clock frequency gives 87% of the performance at 51% of the
       power. (A sketch of the power scaling behind this figure follows this item.)
     - A processor can switch to a lower frequency to reduce power.
     - N cores can run N or more threads.

     - Cores may be identical or specialized.
     - Higher-level caches are shared; cache coherency is required at the lower
       levels.
     - Cores may use superscalar or simultaneous multi-threading architectures.

     LC-3 states

     Instruction          Cycles
     ADD, AND, NOT, JMP   5
     TRAP                 8
     LD, LDR, ST, STR     7
     LDI, STI             9
     BR                   5, 6
     JSR                  6

     (A sketch estimating an average LC-3 CPI from this table follows below.)
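
     A minimal sketch of the power scaling behind the "20% lower frequency, 51%
     power" figure, assuming dynamic power scales as C × V^2 × f and that the
     supply voltage is lowered together with the frequency (an assumption, not
     stated on the slide; the 87% performance figure is taken from the slide, not
     derived here):

         # power_scaling.py - dynamic power scales roughly with V^2 * f
         def relative_power(freq_scale, voltage_scale):
             """Power relative to the baseline when frequency and voltage are scaled."""
             return (voltage_scale ** 2) * freq_scale

         # 20% lower clock, with the voltage lowered by the same 20%
         print(relative_power(0.8, 0.8))  # 0.512 -> about 51% of the original power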

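     A minimal sketch estimating an average LC-3 CPI from the cycle table above.
     The instruction mix is entirely assumed for illustration (the slides only say
     that loads/stores are roughly 20-30% of instructions); the cycle counts come
     from the table, with BR averaged to 5.5:

         # lc3_avg_cpi.py - weighted-average CPI from per-class cycle counts
         cycles = {"ALU/JMP": 5, "TRAP": 8, "LD/ST": 7, "LDI/STI": 9, "BR": 5.5, "JSR": 6}

         # Assumed instruction mix (fractions sum to 1); not taken from the slides
         mix = {"ALU/JMP": 0.45, "TRAP": 0.02, "LD/ST": 0.25, "LDI/STI": 0.03,
                "BR": 0.20, "JSR": 0.05}

         avg_cpi = sum(mix[name] * cycles[name] for name in mix)
         print(round(avg_cpi, 2))  # 5.83, consistent with the "about 6" estimate earlier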