last time
  HCL details / built-in components
  HCL debug / interactive options
  walkthrough of SEQ stages / needed MUXes
critical path
  every path from state output to state input needs enough time
    output: may change on rising edge of clock
    input: must be stable sufficiently before rising edge of clock
  critical path: slowest of all these paths; determines cycle time
SEQ paths
  [Figure: SEQ datapath (PC, instruction memory, register file, ALU, data memory, Stat) with example paths highlighted]
  path 1: 25 picoseconds
  path 2: 50 picoseconds
  path 3: 400 picoseconds
  path 4: 900 picoseconds
  ...and many, many more paths
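The cycle-time rule above can be sketched in a few lines of Python. This is a minimal illustration, not something taken from the slides: the names `path_delays_ps`, `critical_path`, and `min_cycle_time_ps` are hypothetical. Only the four delays listed on the slide are included, although the slide notes that a real design has many more paths.

```python
# Delays (in picoseconds) of the four example paths listed on the slide.
# A real processor has many more paths; these four are only the ones shown.
path_delays_ps = {
    "path 1": 25,
    "path 2": 50,
    "path 3": 400,
    "path 4": 900,
}

def critical_path(delays):
    """Return (name, delay) of the slowest path in the given mapping."""
    name = max(delays, key=delays.get)
    return name, delays[name]

def min_cycle_time_ps(delays):
    """The clock period must be at least as long as the slowest path."""
    return critical_path(delays)[1]

print(critical_path(path_delays_ps))      # slowest of the listed paths
print(min_cycle_time_ps(path_delays_ps))  # lower bound on the cycle time
```

Under these assumptions, the shorter paths finish early and simply wait for the next clock edge. Only the slowest path constrains the clock.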
sequential addq paths split
  [Figure: addq datapath (PC, instruction memory, register file, ADD units) with example paths highlighted]
  path 1: 25 picoseconds
  path 2: 375 picoseconds
  path 3: 500 picoseconds
  path 4: 500 picoseconds
  overall cycle time: 500 picoseconds (longest path)
Human pipeline: laundry
  [Figure: laundry timelines from 11:00 to 14:00 with rows for Washer, Dryer, and Folding Table; loads: whites, colors, sheets]
Waste (1)
  [Figure: laundry timeline from 11:00 to 14:00 (Washer, Dryer, Folding Table; whites, colors, sheets) with gaps marked "wasted time!"]
Waste (2)
  [Figure: laundry timeline from 11:00 to 14:00 (Washer, Dryer, Folding Table; whites, colors, sheets)]
Latency: time for one
  normal latency (1.8 h)
  pipelined latency (2.1 h)
  [Figure: laundry timeline from 11:00 to 14:00 (Washer, Dryer, Folding Table) with the colors load highlighted]
Throughput: rate of many
  time between starts (0.83 h)
  time between finishes (0.83 h)
  1 load / 0.83 h ≈ 1.2 loads/h
  [Figure: laundry timeline from 11:00 to 14:00 (Washer, Dryer, Folding Table; whites, colors, sheets)]
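The difference between latency and throughput in the laundry example can be checked with a short calculation. The sketch below uses the figures from the slides (1.8 h normal latency, 2.1 h pipelined latency, 0.83 h between finishes). The function names are hypothetical, and the model assumes perfectly regular spacing between loads, which the slides do not state explicitly.

```python
NORMAL_LATENCY_H = 1.8     # one load done alone, start to finish (from the slide)
PIPELINED_LATENCY_H = 2.1  # one load when overlapped with others (from the slide)
INTERVAL_H = 0.83          # time between successive finishes (from the slide)

def throughput(interval_h):
    """Loads completed per hour once the pipeline is full."""
    return 1.0 / interval_h

def total_time_pipelined(n_loads, latency_h, interval_h):
    """Assumes the first load takes the full latency and each later
    load finishes exactly one interval after the previous one."""
    return latency_h + (n_loads - 1) * interval_h

def total_time_sequential(n_loads, latency_h):
    """Assumes each load starts only after the previous one is folded."""
    return n_loads * latency_h

print(round(throughput(INTERVAL_H), 1))
print(round(total_time_pipelined(3, PIPELINED_LATENCY_H, INTERVAL_H), 2))
print(round(total_time_sequential(3, NORMAL_LATENCY_H), 2))
```

Under these assumptions, each individual load takes longer when pipelined (2.1 h instead of 1.8 h), yet three loads finish sooner overall. Latency got worse while throughput improved.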
times three circuit
  A -> add -> A + A = 2 × A -> add -> 2A + A = 3 × A
  [Figure: two ADD units in series; time markers at 0 ps, 50 ps, 100 ps; example values 7, 14, 21]
  100 ps latency ⇒ 10 results/ns throughput
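The circuit and its timing can be sketched as follows. The 50 ps per-adder delay is an assumption read off the 0 ps / 50 ps / 100 ps markers in the figure, and the names `ADD_DELAY_PS` and `times_three` are hypothetical.

```python
ADD_DELAY_PS = 50  # assumed delay of one adder (inferred from the figure's time markers)

def times_three(a):
    """Compute 3 x A with two additions, as in the circuit."""
    doubled = a + a        # first adder:  A + A  = 2 x A
    tripled = doubled + a  # second adder: 2A + A = 3 x A
    return doubled, tripled

# Without pipelining, a new input can only be applied once the previous
# result is complete, so one result is produced per full latency.
latency_ps = 2 * ADD_DELAY_PS
throughput_per_ns = 1000 / latency_ps  # 1 ns = 1000 ps

print(times_three(7))
print(latency_ps, throughput_per_ns)
```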
times three and repeat
  A -> add -> A + A = 2 × A -> add -> 2A + A = 3 × A
  [Figure: the circuit reused for successive inputs; time axis from 0 ps to 500 ps in 100 ps steps; example values (A, 2 × A, 3 × A): 7, 14, 21; 17, 34, 51; 4, 8, 12; 1, 2, 3; 23, 46, 69]
pipelined times three
  [Figure: two ADD units separated by registers; signals labeled A(t+2), A(t+1), 2 × A(t+1), 3 × A(t+0); example values 17, 34 in the first stage and 7, 14, 21 in the second]
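The pipelined circuit can be simulated one clock cycle at a time. This is a minimal sketch that assumes a single pipeline register between the two adders; the exact register placement in the figure may differ, and `pipelined_times_three` is a hypothetical name. The input values are the ones used in the earlier example.

```python
def pipelined_times_three(inputs):
    """Simulate a two-stage times-three pipeline, one loop iteration per clock cycle."""
    stage_reg = None  # pipeline register: holds (A, 2*A) latched on the previous clock edge
    outputs = []
    # One extra empty cycle (None) drains the last value out of the register.
    for a in list(inputs) + [None]:
        # Stage 2: uses the values latched on the previous clock edge.
        if stage_reg is not None:
            prev_a, prev_doubled = stage_reg
            outputs.append(prev_doubled + prev_a)  # 2A + A = 3 x A
        # Stage 1: works on the new input during the same cycle;
        # its result is latched into the register at the next clock edge.
        stage_reg = (a, a + a) if a is not None else None
    return outputs

print(pipelined_times_three([7, 17, 4, 1, 23]))
```

In each cycle the second adder finishes one value while the first adder starts on the next one. If each adder takes about 50 ps and register delay is ignored (both assumptions), a result would complete roughly every 50 ps instead of every 100 ps, while each individual value still passes through both stages.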
register tolerances
  [Figure: timing diagram of register input and register output, marking where the output changes, the window in which the input must not change, and the register delay]