Datapath Components (3) — Those Who “Remember” Things Prof. Usagi
Recap: Combinational vs. sequential logic
• Combinational logic
  • The output is a pure function of its current inputs
  • The output doesn’t change regardless of how many times the logic is triggered with the same inputs (idempotent)
• Sequential logic
  • The output depends on the current inputs and on the history of previous inputs
• A sequential circuit has memory!
Recap: D flip-flop
[Figure: master-slave D flip-flop built from two D-latches in series, with signals Input, Clk, and Output.]
Recap: Positive-edge-triggered D flip-flop
[Figure: positive-edge-triggered D flip-flop with signals Clock, Data, and Q.]
Outline
• Volatile Memory
  • Registers
  • SRAM
  • DRAM
• Programming and memory
• Non-volatile Memory
Registers
Registers
• Register: a sequential component that can store multiple bits
• A basic register can be built simply by using multiple D-FFs
[Figure: a 5-bit register made of five D flip-flops sharing one Clk; flip-flop k takes Input k on D and drives Output k from Q.]
What will we output 4 cycles later?
[Figure: four D flip-flops in a chain sharing one Clk; Input 1 feeds the first D, each Q feeds the next D, and the Q outputs are Output 1 to Output 4.]
• For the above D-FF organization, what are we expecting to see in (O1,O2,O3,O4) at the beginning of the 5th cycle after receiving (1,0,1,1)?
  A. (1,1,1,1)
  B. (1,0,1,1)
  C. (1,1,0,1)
  D. (0,0,1,0)
  E. (0,1,0,0)
Shift register
• Holds & shifts samples of input
[Figure: chain of four D flip-flops sharing one Clk; Input 1 enters the first flip-flop and each Q feeds the next D; the Q outputs are Output 1 to Output 4.]
Let’s play with the shift register more…
[Figure: the four-stage shift register (Input 1, Clk, Output 1 to Output 4) extended with extra logic on its outputs.]
Let’s play with the shift register more…
• For the extended shift register, what sequence of inputs will let the circuit output “1”?
  A. (1, 1, 1, 1)
  B. (0, 1, 0, 1)
  C. (1, 0, 1, 0)
  D. (0, 1, 1, 0)
  E. (1, 0, 0, 1)
Pattern Recognizer
• We can recognize 1001!
• Combinational function of input samples
[Figure: the four-stage shift register (Input 1, Clk, Output 1 to Output 4) with combinational logic over its outputs.]
Counters
• Sequences through a fixed set of patterns
• Note: definition is general
• For example, the one in the figure is a type of counter called a Linear Feedback Shift Register (LFSR)
[Figure: four D flip-flops in a chain sharing one Clk, with Output 1 to Output 4 and feedback into the first flip-flop.]
Static Random Access Memory (SRAM)
A Classical 6-T SRAM Cell
[Figure: 6-T cell storing Q and Q’, connected to bitline and bitline’ when wordline is raised; a sense amplifier sits on the bitlines.]
Write “1” to an SRAM Cell
• Bitlines overpower cell with new value
• Q = 0, Q’ = 1, BL = 1, BL’ = 0: force Q’ low, then Q rises high
[Figure: 6-T cell with wordline = 1, bitline = 1, bitline’ = 0; Q goes 0 → 1 and Q’ goes 1 → 0; sense amplifier below.]
Write “0” to an SRAM Cell
[Figure: 6-T cell with wordline = 1, bitline = 0, bitline’ = 1; Q is forced to 0 and Q’ to 1; sense amplifier below.]
Reading from an SRAM Cell
[Figure: 6-T cell storing Q = 1, Q’ = 0 with the wordline raised; the cell drives bitline = 1 and bitline’ = 0 into the sense amplifier.]
SRAM array
• We can only work on cells sharing the same word line simultaneously
[Figure: a decoder driven by the upper bits of the address raises one of the word lines wd0, wd1, wd2, ..., wd(m-1); each column has a sense amp; a MUX driven by the lower bits of the address selects among the sense amp outputs.]
Dynamic Random Access Memory (DRAM)
A DRAM cell
• 1 transistor (rather than 6)
• Relies on large capacitor to store bit
• Write: transistor conducts, data voltage level gets stored on top plate of capacitor
• Read: look at the value of d
• Problem: Capacitor discharges over time
• Must “refresh” regularly, by reading d and then writing it right back
[Figure: one transistor, gated by wordline, connecting the data line to a capacitor.]
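DRAM array
• Usually 4K: the page size of your OS!
[Figure: a row decoder driven by the upper bits of the address selects one of rows 0, 1, 2, ..., n-1; the selected row is read into the row buffer; a MUX driven by the lower bits of the address selects within the row buffer.]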
Register vs. DRAM vs. SRAM
• Consider the following memory elements
  ① 64 × 64-bit registers
  ② 512B SRAM
  ③ 512B DRAM
  A. Area: (1) > (2) > (3) Delay: (1) < (2) < (3)
  B. Area: (1) > (3) > (2) Delay: (1) < (3) < (2)
  C. Area: (3) > (1) > (2) Delay: (1) < (3) < (2)
  D. Area: (3) > (2) > (1) Delay: (3) < (2) < (1)
  E. Area: (2) > (3) > (1) Delay: (2) < (3) < (1)
RC charging
Latency of volatile memory
            Size (transistors per bit)   Latency (ns)
Register    18T                          ~0.1
SRAM        6T                           ~0.5
DRAM        1T                           50-100
Programming and memory
Memory “hierarchy” in modern processor architectures
(fastest at the top, larger toward the bottom)
Level                        Technology   Latency      Capacity
Processor core: registers                 < 1 ns       32 or 64 words
L1 $, L2 $, L3 $             SRAM         a few ns     KBs ~ MBs
DRAM                         DRAM         tens of ns   GBs
Storage                                                TBs
Thinking about programming

Left: one array of structures

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    struct student_record
    {
        int id;
        double homework;
        double midterm;
        double final;
    };

    /* init() is not shown on the slide */
    void init(int number_of_records, struct student_record *records);

    int main(int argc, char **argv)
    {
        int i, j;
        double midterm_average = 0.0;
        int number_of_records = 10000000;
        struct timeval time_start, time_end;
        struct student_record *records;
        records = (struct student_record *)malloc(sizeof(struct student_record) * number_of_records);
        init(number_of_records, records);
        for (j = 0; j < 100; j++)
            for (i = 0; i < number_of_records; i++)
                midterm_average += records[i].midterm;
        /* the sum covers 100 passes over the data, so divide by 100 as well */
        printf("average: %lf\n", midterm_average / (100.0 * number_of_records));
        free(records);
        return 0;
    }

Right: one array per field

    #include <stdlib.h>
    #include <sys/time.h>

    /* assumed to be globals declared off-slide */
    int *id;
    double *midterm, *final, *homework;

    /* init() is not shown on the slide */
    void init(int number_of_records);

    int main(int argc, char **argv)
    {
        int i, j;
        double midterm_average = 0.0;
        int number_of_records = 10000000;
        struct timeval time_start, time_end;
        id = (int *)malloc(sizeof(int) * number_of_records);
        midterm = (double *)malloc(sizeof(double) * number_of_records);
        final = (double *)malloc(sizeof(double) * number_of_records);
        homework = (double *)malloc(sizeof(double) * number_of_records);
        init(number_of_records);
        for (j = 0; j < 100; j++)
            for (i = 0; i < number_of_records; i++)
                midterm_average += midterm[i];
        free(id);
        free(midterm);
        free(final);
        free(homework);
        return 0;
    }

• Which side is faster in executing the for-loop?
  A. Left
  B. Right
  C. About the same
Thinking about programming: answer
• B. Right: the version with one array per field reads midterm values that sit next to each other in memory
• More row buffer hits in the DRAM, more SRAM hits