
Datapath Components (3): Those Who Remember Things, Prof. Usagi



  1. Datapath Components (3): Those Who “Remember” Things, Prof. Usagi

  2. Recap: Combinational vs. sequential logic • Combinational logic • The output is a pure function of its current inputs • The output doesn’t change no matter how many times the logic is triggered: it is idempotent • Sequential logic • The output depends on the current inputs and on previous inputs (their history) • Sequential circuits have memory!
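A small C sketch of the distinction, with hypothetical names: `majority3` is combinational, so repeated calls with the same inputs always agree, while `accumulator_step` is sequential, because the count it returns depends on every earlier call and not only on the current input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Combinational: the output is a pure function of the current inputs. */
static bool majority3(bool a, bool b, bool c)
{
    return (a && b) || (b && c) || (a && c);
}

/* Sequential: the state persists between calls, so the output
   depends on the history of inputs, not just the current one. */
typedef struct {
    unsigned count; /* number of 1s seen so far */
} accumulator;

static unsigned accumulator_step(accumulator *acc, bool in)
{
    assert(acc != NULL);
    if (in)
        acc->count++;
    return acc->count;
}
```

Calling `majority3(true, true, false)` twice returns `true` both times; calling `accumulator_step` twice with input `true` returns 1 and then 2.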

  3. Recap: D flip-flop [Figure: master-slave D flip-flop built from two D-latches (each with D, Q, and Clk); Input enters the first latch and Output is taken from the second]

  4. Recap: Positive-edge-triggered D flip-flop [Figure: Data and Clock inputs, Q output]
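The master-slave construction recapped above can be checked with a behavioural C sketch. Each call evaluates the circuit at one clock level; the names (`d_latch`, `dff`, `dff_update`) are hypothetical, and electrical timing such as setup and hold is ignored. The master latch is transparent while the clock is low and the slave while it is high, so Q only changes on a rising edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Level-sensitive D latch: transparent while enable is high,
   holds its value while enable is low. */
typedef struct { bool q; } d_latch;

static void latch_update(d_latch *l, bool d, bool enable)
{
    if (enable)
        l->q = d;
}

/* Master-slave D flip-flop built from two latches. */
typedef struct { d_latch master, slave; } dff;

static bool dff_update(dff *ff, bool d, bool clk)
{
    assert(ff != 0);
    latch_update(&ff->master, d, !clk);          /* master open while clk is low  */
    latch_update(&ff->slave, ff->master.q, clk); /* slave open while clk is high  */
    return ff->slave.q;                          /* Q is the slave's output       */
}
```

Feeding D = 1 while the clock is low leaves Q unchanged; Q becomes 1 only once the clock goes high, and later changes on D are ignored until the next rising edge.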

  5. Outline • Volatile Memory • Registers • SRAM • DRAM • Programming and memory • Non-volatile Memory

  6. Registers

  7. Registers • Register: a sequential component that can store multiple bits • A basic register can be built simply from multiple D-FFs [Figure: five D flip-flops side by side; Inputs 1-5 drive the D pins, Outputs 1-5 come from the Q pins, and all share Clk]
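A behavioural C sketch of the five-bit register in the figure, assuming all flip-flops share one clock; `reg5` and `reg5_clock` are illustrative names. One call models one rising clock edge, at which every flip-flop captures its own input bit at the same time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REG_WIDTH 5 /* five D-FFs, as in the figure */

/* One D flip-flop per bit. */
typedef struct { bool q[REG_WIDTH]; } reg5;

/* Rising clock edge: every bit is captured together, so the
   register holds the whole input word until the next edge. */
static void reg5_clock(reg5 *r, const bool d[REG_WIDTH])
{
    assert(r != NULL && d != NULL);
    for (size_t i = 0; i < REG_WIDTH; i++)
        r->q[i] = d[i];
}
```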

  8. Poll: What will we output 4 cycles later? [Figure: four D flip-flops with Input 1, Outputs 1-4, and Clk] • For the above D-FF organization, what do we expect to see in (O1, O2, O3, O4) at the beginning of the 5th cycle after receiving (1, 0, 1, 1)? A. (1,1,1,1) B. (1,0,1,1) C. (1,1,0,1) D. (0,0,1,0) E. (0,1,0,0)

  9. What will we output 4 cycles later? [Figure: four D flip-flops with Input 1, Outputs 1-4, and Clk] • For the above D-FF organization, what do we expect to see in (O1, O2, O3, O4) at the beginning of the 5th cycle after receiving (1, 0, 1, 1)? A. (1,1,1,1) B. (1,0,1,1) C. (1,1,0,1) D. (0,0,1,0) E. (0,1,0,0)

  10. Shift register • Holds & shifts samples of the input [Figure: four D flip-flops in series; Input 1 feeds the first, each Q feeds the next D, Outputs 1-4 come from the Q pins, and all share Clk]
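The shifting behaviour can be modelled in C as below. This sketch assumes Output 1 is the flip-flop fed directly by Input 1 and that bits arrive in the order written; `shift_reg` and `shift_reg_clock` are hypothetical names. Updating from the last stage backwards gives the same result as all flip-flops capturing on the same edge, without a temporary copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STAGES 4

/* q[0] is Output 1 (fed by the input), q[3] is Output 4. */
typedef struct { bool q[STAGES]; } shift_reg;

/* One rising clock edge: each stage takes its neighbour's old
   value and q[0] takes the new input bit. */
static void shift_reg_clock(shift_reg *s, bool in)
{
    assert(s != NULL);
    for (size_t i = STAGES - 1; i > 0; i--)
        s->q[i] = s->q[i - 1];
    s->q[0] = in;
}
```

Under these assumptions, clocking in 1, 0, 1, 1 leaves (O1, O2, O3, O4) = (1, 1, 0, 1), with the most recent bit in O1.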

  11. Let’s play with the shift register more… [Figure: extended shift register of D flip-flops fed by Input 1, with Outputs 1-4 and Clk]

  12. Poll: Let’s play with the shift register more… • For the extended shift register, which input sequence will make the circuit output “1”? A. (1, 1, 1, 1) B. (0, 1, 0, 1) C. (1, 0, 1, 0) D. (0, 1, 1, 0) E. (1, 0, 0, 1)

  13. Let’s play with the shift register more… • For the extended shift register, which input sequence will make the circuit output “1”? A. (1, 1, 1, 1) B. (0, 1, 0, 1) C. (1, 0, 1, 0) D. (0, 1, 1, 0) E. (1, 0, 0, 1)

  14. Pattern Recognizer: We can recognize 1001! • The output is a combinational function of the stored input samples [Figure: extended shift register fed by Input 1, with Outputs 1-4 and Clk]
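The recognizer is a shift register followed by one combinational function of its stored samples. A sketch in C with hypothetical names, where the detector is `q0 AND NOT q1 AND NOT q2 AND q3`. Because 1001 reads the same in both directions, it does not matter which end of the register holds the oldest sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Four-stage shift register plus a combinational detector. */
typedef struct { bool q[4]; } recognizer;

/* Clock in one input bit, then evaluate the detector on the
   four stored samples. Returns true when they read 1001. */
static bool recognizer_clock(recognizer *r, bool in)
{
    assert(r != NULL);
    for (size_t i = 3; i > 0; i--)
        r->q[i] = r->q[i - 1];
    r->q[0] = in;
    return r->q[0] && !r->q[1] && !r->q[2] && r->q[3];
}
```

Feeding 1, 0, 0, 1 makes the fourth call return `true`; one more input bit shifts the pattern out again.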

  15. Counters • Sequence through a fixed set of patterns • Note: the definition is general • For example, the circuit in the figure is a type of counter called a Linear Feedback Shift Register (LFSR) [Figure: four D flip-flops in a chain with Outputs 1-4 and Clk]
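A 4-bit LFSR can be sketched in C as follows. The slide’s tap positions are not recoverable, so the feedback here (XOR of the two lowest stages, shifted in at the top) is an assumption, chosen because it gives a maximal-length sequence: from any non-zero state, the register visits all 15 non-zero 4-bit states before repeating. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One clock step of a 4-bit Fibonacci LFSR (assumed taps):
   feedback = bit0 XOR bit1, shifted in as the new bit3. */
static uint8_t lfsr4_step(uint8_t state)
{
    uint8_t fb = (uint8_t)((state ^ (state >> 1)) & 1u);
    return (uint8_t)((state >> 1) | (fb << 3));
}

/* Count steps until the LFSR returns to its starting state. */
static unsigned lfsr4_period(uint8_t seed)
{
    assert(seed != 0 && seed < 16); /* the all-zero state would lock up */
    unsigned steps = 0;
    uint8_t s = seed;
    do {
        s = lfsr4_step(s);
        steps++;
    } while (s != seed);
    return steps;
}
```

Starting from 0001, the sequence runs 1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3 and then returns to 1.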

  16. Static Random Access Memory (SRAM)

  17. A Classical 6-T SRAM Cell [Figure: 6-T cell with wordline, bitline, bitline’, storage nodes Q and Q’, and a sense amplifier]

  18. A Classical 6-T SRAM Cell [Figure: the cell connected to its sense amplifier]

  19. Write “1” to an SRAM Cell [Figure: cell with wordline, bitline, bitline’, Q, Q’, and sense amplifier] • The bitlines overpower the cell with the new value • Starting from Q = 0, Q’ = 1, drive BL = 1, BL’ = 0: this forces Q’ low, and then Q rises high

  20. Write “0” to an SRAM Cell [Figure: the mirror image of writing “1”: BL = 0, BL’ = 1 force Q low, and Q’ rises high]

  21. Reading from an SRAM Cell [Figure: cell storing Q = 1, Q’ = 0, read out through bitline and bitline’ to the sense amplifier]
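The write and read steps on slides 19 to 21 can be captured in a behavioural (not electrical) C model. The names are hypothetical, and the model assumes the wordline is asserted during a read; real cells rely on analog details such as transistor sizing and bitline precharge that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Two cross-coupled inverters keep Q and Q' complementary. */
typedef struct { bool q, q_bar; } sram_cell;

/* Write: with the wordline asserted, the bitline drivers overpower
   the inverters and force Q = BL, Q' = BL'. */
static void sram_write(sram_cell *c, bool wordline, bool bl, bool bl_bar)
{
    assert(bl != bl_bar); /* a write drives complementary bitlines */
    if (!wordline)
        return;           /* access transistors off: value is kept */
    c->q = bl;
    c->q_bar = bl_bar;
}

/* Read (wordline asserted): BL follows Q and BL' follows Q'; the
   sense amplifier reports 1 when BL is the higher line. */
static bool sram_read(const sram_cell *c)
{
    bool bl = c->q, bl_bar = c->q_bar;
    assert(bl != bl_bar);
    return bl && !bl_bar;
}
```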

  22. SRAM array [Figure: the upper bits of the address feed a decoder that drives wordlines wd0 … wd(m-1); each column has a sense amp, and a MUX controlled by the lower bits of the address selects the output] • We can only work on cells that share the same wordline simultaneously
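The address split in the SRAM array can be sketched in C. The geometry is hypothetical (`COL_BITS = 3`, that is, eight words per row); only the principle comes from the slide: the upper bits select a wordline through the decoder, and the lower bits steer the column MUX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: 2^COL_BITS words per row. */
#define COL_BITS 3u

typedef struct { uint32_t row, col; } sram_addr;

static sram_addr split_address(uint32_t addr)
{
    sram_addr a;
    a.row = addr >> COL_BITS;               /* upper bits -> row decoder */
    a.col = addr & ((1u << COL_BITS) - 1u); /* lower bits -> column MUX  */
    assert(((a.row << COL_BITS) | a.col) == addr);
    return a;
}
```

Addresses 0x1A3 and 0x1A4 both map to row 0x34 (columns 3 and 4), so they sit on the same wordline and can be served by one row activation.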

  23. Dynamic Random Access Memory (DRAM)

  24. A DRAM cell [Figure: one transistor, gated by the wordline, connects the data line d to a storage capacitor] • 1 transistor (rather than 6) • Relies on a large capacitor to store the bit • Write: the transistor conducts, and the data voltage level is stored on the top plate of the capacitor • Read: look at the value of d • Problem: the capacitor discharges over time • It must be “refreshed” regularly, by reading d and then writing it right back
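A toy C model of leakage and refresh. The numbers are illustrative assumptions, not device data: the stored voltage keeps 90% of its value per time tick, and the sense threshold is half of VDD (taken as 1.0). Refresh is exactly the slide’s recipe: read the bit, then write it straight back.

```c
#include <assert.h>
#include <stdbool.h>

#define LEAK_PER_TICK 0.9 /* assumed fraction of charge kept per tick */
#define THRESHOLD 0.5     /* assumed sense threshold (VDD = 1.0)      */

typedef struct { double v; } dram_cell;

static void dram_write(dram_cell *c, bool bit) { c->v = bit ? 1.0 : 0.0; }

static bool dram_read(const dram_cell *c) { return c->v > THRESHOLD; }

/* Let time pass; if refresh_every > 0, refresh the cell (read it,
   then write the value back) every refresh_every ticks. */
static void dram_elapse(dram_cell *c, int ticks, int refresh_every)
{
    assert(ticks >= 0 && refresh_every >= 0);
    for (int t = 1; t <= ticks; t++) {
        c->v *= LEAK_PER_TICK;
        if (refresh_every > 0 && t % refresh_every == 0)
            dram_write(c, dram_read(c));
    }
}
```

With these numbers, a stored 1 left alone for 10 ticks decays to about 0.35 and reads back as 0, while refreshing every 4 ticks keeps it readable as 1.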

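  25. DRAM array [Figure: the upper bits of the address feed a row decoder; the selected row is read into the row buffer, and a MUX controlled by the lower bits of the address selects the output. Annotation: “Usually 4K: the page size of your OS!”]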
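Because a whole row lands in the row buffer, consecutive accesses to the same row are cheap and switching rows is expensive. A minimal C sketch that counts row-buffer misses for a fixed access stride, assuming a 4096-byte row as the slide’s “usually 4K” suggests and ignoring banks, channels, and address interleaving:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROW_BYTES 4096u /* assumed row size */

/* Count accesses that fall in a row other than the open one. */
static size_t row_buffer_misses(size_t n, size_t stride)
{
    assert(stride > 0);
    size_t misses = 0;
    size_t open_row = SIZE_MAX; /* no row open yet */
    for (size_t i = 0; i < n; i++) {
        size_t row = (i * stride) / ROW_BYTES; /* upper address bits */
        if (row != open_row) {
            misses++;
            open_row = row;
        }
    }
    return misses;
}
```

Under these assumptions, 4096 accesses at an 8-byte stride touch 8 rows (8 misses), while the same number of accesses at a 32-byte stride touch 32 rows (32 misses).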

  26. Poll: Register vs. DRAM vs. SRAM • Consider the following memory elements: ① 64 × 64-bit registers ② 512B SRAM ③ 512B DRAM • A. Area: (1) > (2) > (3); Delay: (1) < (2) < (3) • B. Area: (1) > (3) > (2); Delay: (1) < (3) < (2) • C. Area: (3) > (1) > (2); Delay: (1) < (3) < (2) • D. Area: (3) > (2) > (1); Delay: (3) < (2) < (1) • E. Area: (2) > (3) > (1); Delay: (2) < (3) < (1)

  27. Register vs. DRAM vs. SRAM • Consider the following memory elements: ① 64 × 64-bit registers ② 512B SRAM ③ 512B DRAM • A. Area: (1) > (2) > (3); Delay: (1) < (2) < (3) • B. Area: (1) > (3) > (2); Delay: (1) < (3) < (2) • C. Area: (3) > (1) > (2); Delay: (1) < (3) < (2) • D. Area: (3) > (2) > (1); Delay: (3) < (2) < (1) • E. Area: (2) > (3) > (1); Delay: (2) < (3) < (1)

  28. RC charging
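The slide’s figure did not survive extraction. For reference, the standard first-order model of charging a capacitance C through a resistance R toward V_DD is given below (a textbook model rather than the slide’s own figure). The second relation shows that a larger RC product, such as a bigger storage capacitor or a longer, more resistive bitline, stretches the time before the voltage is reliably readable.

```latex
V(t) = V_{DD}\left(1 - e^{-t/RC}\right), \qquad
V(t_{90}) = 0.9\,V_{DD} \;\Rightarrow\; t_{90} = RC\ln 10 \approx 2.3\,RC
```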

  29. Latency of volatile memory

      Memory     Size (transistors per bit)   Latency
      Register   18T                          ~0.1 ns
      SRAM       6T                           ~0.5 ns
      DRAM       1T                           50-100 ns
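The per-bit transistor counts on this slide make the area side of the poll on slides 26 and 27 a matter of arithmetic: 64 × 64-bit registers hold 4096 bits, the same as 512 B. A small C helper (hypothetical name) that counts only the storage transistors, ignoring decoders, sense amplifiers, wiring, and the area of a DRAM cell’s capacitor:

```c
#include <assert.h>
#include <stdint.h>

/* Transistors per bit, taken from the table. */
enum { REG_T_PER_BIT = 18, SRAM_T_PER_BIT = 6, DRAM_T_PER_BIT = 1 };

/* Storage transistors for an array of `bytes` bytes. */
static uint64_t storage_transistors(uint64_t bytes, unsigned t_per_bit)
{
    assert(t_per_bit > 0);
    return bytes * 8u * t_per_bit;
}
```

For 512 B this gives 73,728 transistors for registers, 24,576 for SRAM, and 4,096 for DRAM.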

  30. Programming and memory

  31. Memory “hierarchy” in modern processor architectures (fastest at the top, larger toward the bottom)

      Level                 Capacity         Latency
      Registers (core)      32 or 64 words   < 1 ns
      L1/L2/L3 $ (SRAM)     KBs ~ MBs        a few ns
      DRAM                  GBs              tens of ns
      Storage               TBs

  32. Poll: Thinking about programming

      Left (array of structs):

        struct student_record
        {
            int id;
            double homework;
            double midterm;
            double final;
        };

        int main(int argc, char **argv)
        {
            int i, j;
            double midterm_average = 0.0;
            int number_of_records = 10000000;
            struct timeval time_start, time_end;
            struct student_record *records;
            records = (struct student_record *)malloc(sizeof(struct student_record) * number_of_records);
            init(number_of_records, records);   /* init() is not shown on the slide */
            for (j = 0; j < 100; j++)
                for (i = 0; i < number_of_records; i++)
                    midterm_average += records[i].midterm;
            printf("average: %lf\n", midterm_average / number_of_records);
            free(records);
            return 0;
        }

      Right (one array per field):

        /* id, midterm, final, homework and init() are declared outside main (not shown on the slide) */
        int main(int argc, char **argv)
        {
            int i, j;
            double midterm_average = 0.0;
            int number_of_records = 10000000;
            struct timeval time_start, time_end;
            id = (int *)malloc(sizeof(int) * number_of_records);
            midterm = (double *)malloc(sizeof(double) * number_of_records);
            final = (double *)malloc(sizeof(double) * number_of_records);
            homework = (double *)malloc(sizeof(double) * number_of_records);
            init(number_of_records);
            for (j = 0; j < 100; j++)
                for (i = 0; i < number_of_records; i++)
                    midterm_average += midterm[i];
            free(id);
            free(midterm);
            free(final);
            free(homework);
            return 0;
        }

      • Which side is faster in executing the for-loop? A. Left B. Right C. About the same

  33. Thinking about programming • Same two programs as slide 32 • Which side is faster in executing the for-loop? A. Left B. Right C. About the same • Annotation next to the right-hand program: “More row buffer hits in the DRAM, more SRAM hits”
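A self-contained C sketch for trying the comparison yourself. `compare_layouts` and the data it fills in are hypothetical stand-ins for the slide’s `init()`; timing uses the standard `clock()`. On a typical 64-bit ABI the struct is 32 bytes, so the left loop touches a `midterm` every 32 bytes, while the right loop reads a dense `double` array every 8 bytes and uses every byte of each cache line and DRAM row it fetches.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

struct student_record {
    int id;
    double homework;
    double midterm;
    double final;
};

/* Left: array of structs, one field read per record. */
static double sum_aos(const struct student_record *r, size_t n, int reps)
{
    double sum = 0.0;
    for (int j = 0; j < reps; j++)
        for (size_t i = 0; i < n; i++)
            sum += r[i].midterm;
    return sum;
}

/* Right: a dedicated midterm array, read contiguously. */
static double sum_soa(const double *midterm, size_t n, int reps)
{
    double sum = 0.0;
    for (int j = 0; j < reps; j++)
        for (size_t i = 0; i < n; i++)
            sum += midterm[i];
    return sum;
}

/* Fill both layouts with the same scores and time each loop (seconds).
   Returns -1 if allocation fails, 0 otherwise. */
static int compare_layouts(size_t n, int reps, double *t_aos, double *t_soa)
{
    struct student_record *records = malloc(n * sizeof *records);
    double *midterm = malloc(n * sizeof *midterm);
    if (records == NULL || midterm == NULL) {
        free(records);
        free(midterm);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        double score = (double)(i % 101);
        records[i] = (struct student_record){(int)i, 0.0, score, 0.0};
        midterm[i] = score;
    }

    clock_t start = clock();
    double a = sum_aos(records, n, reps);
    *t_aos = (double)(clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    double b = sum_soa(midterm, n, reps);
    *t_soa = (double)(clock() - start) / CLOCKS_PER_SEC;

    assert(a == b); /* same values added in the same order */
    free(records);
    free(midterm);
    return 0;
}
```

Calling `compare_layouts(10000000, 100, &ta, &ts)` mirrors the slide’s sizes. The right-hand layout is expected to finish sooner, but the size of the gap depends on the cache hierarchy, the DRAM, and the compiler.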
