
30000BC Palaeolithic peoples in central Europe and France record - PowerPoint PPT Presentation

30000BC Palaeolithic peoples in central Europe and France record numbers on bones. 5000BC A decimal number system is in use in Egypt. 4000BC Babylonian and Egyptian calendars in use. 3400BC The first symbols for numbers,


  1. Twos Complement number wheel
     [Figure: 4-bit number wheel. Clockwise from 0000 (+0) to 0111 (+7), then 1000 (-8) to 1111 (-1); 0100 = +4, 1100 = -4. Like 1's comp except shifted one position clockwise.]
     • Easy to determine sign (0?)
     • Only one representation for 0
     • Addition and subtraction just as in unsigned case
     • Simple comparison: A < B iff A – B < 0
     • One more negative number than positive number: one number has no additive inverse

  2. Twos Complement (algebraically)
     $N^* = 2^n - N$
     • Example: twos complement of 7: $2^4 = 10000$, sub 7 = 0111, gives 1001 = repr. of -7
     • Example: twos complement of -7: $2^4 = 10000$, sub -7 = 1001, gives 0111 = repr. of 7
     • Bit manipulation: take bitwise complement and add one
       – 0111 -> 1000 + 1 -> 1001 (representation of -7)
       – 1001 -> 0110 + 1 -> 0111 (representation of 7)
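The "complement and add one" rule can be sketched in a few lines. This is a minimal illustration, not part of the original slides; the function name `twos_complement` and the default width of 4 bits are assumptions chosen to match the examples.

```python
def twos_complement(pattern: int, bits: int = 4) -> int:
    """Return the n-bit pattern of -N given the n-bit pattern of N.

    Equivalent to N* = 2^n - N, reduced to n bits.
    """
    mask = (1 << bits) - 1
    # Invert every bit, add one, and keep only the low n bits.
    return ((pattern ^ mask) + 1) & mask


print(format(twos_complement(0b0111), "04b"))  # 1001, the representation of -7
print(format(twos_complement(0b1001), "04b"))  # 0111, the representation of 7
```

Note that `twos_complement(0b1000)` returns `0b1000` again: -8 is the one number with no additive inverse in 4 bits.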

  3. How is addition performed in each number system? • Operands may be positive or negative

  4. Sign Magnitude Addition
     • Operands have same sign: unsigned addition of magnitudes; result sign bit is the same as the operands' sign
       – 4 (0100) + 3 (0011) = 7 (0111)
       – -4 (1100) + (-3) (1011) = -7 (1111)
     • Operands have different signs: subtract smaller from larger and keep sign of the larger
       – 4 (0100) - 3 (1011) = 1 (0001)
       – -4 (1100) + 3 (0011) = -1 (1001)

  5. Ones complement addition
     Perform unsigned addition, then add in the end-around carry
     • 4 (0100) + 3 (0011) = 7 (0111)
     • -4 (1011) + (-3) (1100) = 1 0111; end-around carry: 0111 + 1 = 1000 (-7)
     • 4 (0100) - 3 (1100) = 1 0000; end-around carry: 0000 + 1 = 0001 (1)
     • -4 (1011) + 3 (0011) = 1110 (-1)
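A minimal sketch of ones complement addition with end-around carry, assuming operands are already given as n-bit patterns. The name `ones_complement_add` is illustrative and does not come from the slides.

```python
def ones_complement_add(a: int, b: int, bits: int = 4) -> int:
    """Add two n-bit ones complement patterns using end-around carry."""
    mask = (1 << bits) - 1
    total = a + b                      # plain unsigned addition
    if total > mask:                   # a carry came out of the top bit
        total = (total & mask) + 1     # drop it and add it back at the bottom
    return total & mask


print(format(ones_complement_add(0b1011, 0b1100), "04b"))  # 1000, i.e. -7
print(format(ones_complement_add(0b0100, 0b1100), "04b"))  # 0001, i.e. 1
```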

  6. When carry occurs
     [Figure: 4-bit 1's complement number wheel. 0000 = +0, 1111 = -0; 0100 = +4, 1011 = -4. Marked cases: M – N where M > N, and -M - N.]

  7. Why does end-around carry work?
     Recall: $\overline{N} = (2^n - 1) - N$
     End-around carry is equivalent to subtracting $2^n$ and adding 1.
     • $M - N$ (when $M > N$):
       $M - N = M + \overline{N} = M + (2^n - 1 - N) = (M - N) + 2^n - 1$
     • $-M + (-N)$ (where $M + N < 2^{n-1}$):
       $-M + (-N) = \overline{M} + \overline{N} = (2^n - M - 1) + (2^n - N - 1) = 2^n + [2^n - 1 - (M + N)] - 1$
       After end-around carry: $= 2^n - 1 - (M + N)$
       This is the correct form for representing $-(M + N)$ in 1's comp!

  8. Twos Complement Addition
     Perform unsigned addition and discard the carry out. Overflow?
     • 4 (0100) + 3 (0011) = 7 (0111)
     • -4 (1100) + (-3) (1101) = 1 1001; discard carry: 1001 (-7)
     • 4 (0100) - 3 (1101) = 1 0001; discard carry: 0001 (1)
     • -4 (1100) + 3 (0011) = 1111 (-1)
     Simpler addition scheme makes twos complement the most common choice for integer number systems within digital systems

  9. Twos Complement number wheel
     [Figure: 4-bit twos complement number wheel, 0000 (+0) to 0111 (+7) and 1000 (-8) to 1111 (-1); 0100 = +4, 1100 = -4. Marked cases: -M + -N where N + M ≤ $2^{n-1}$, and -M + N when N > M.]

  10. 2s Comp: ignore the carry out
      • $-M + N$ when $N > M$:
        $M^* + N = (2^n - M) + N = 2^n + (N - M)$
        Ignoring carry-out is just like subtracting $2^n$
      • $-M + -N$ where $N + M \le 2^{n-1}$:
        $-M + (-N) = M^* + N^* = (2^n - M) + (2^n - N) = 2^n - (M + N) + 2^n$
        After ignoring the carry, this is just the right twos compl. representation for $-(M + N)$!

  11. 2s Complement Overflow
      How can you tell an overflow occurred? Add two positive numbers to get a negative number, or two negative numbers to get a positive number.
      [Figure: two 4-bit twos complement number wheels illustrating 5 + 3 = -8! and -7 - 2 = +7!]

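  12. 2s comp. Overflow Detection
      | Sum | Carries | First operand | Second operand | Result | Overflow? |
      |---|---|---|---|---|---|
      | 5 + 3 | 0 1 1 1 | 0101 | 0011 | 1000 (-8) | Overflow |
      | -7 + -2 | 1 0 0 0 | 1001 | 1110 | 1 0111 (7) | Overflow |
      | 5 + 2 | 0 0 0 0 | 0101 | 0010 | 0111 (7) | No overflow |
      | -3 + -5 | 1 1 1 1 | 1101 | 1011 | 1 1000 (-8) | No overflow |
      The carries column lists the carry out of the sign bit first, then the carries into bits 3, 2 and 1.
      Overflow occurs when carry in to sign does not equal carry out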
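The "carry in to sign differs from carry out" rule can be checked directly. The sketch below assumes operands are passed as n-bit patterns; `add_with_overflow` is a hypothetical helper name, not something defined in the slides.

```python
def add_with_overflow(a: int, b: int, bits: int = 4):
    """Add two n-bit twos complement patterns.

    Returns (n-bit result pattern, overflow flag).
    """
    mask = (1 << bits) - 1
    low_mask = mask >> 1                       # every bit except the sign bit
    # Carry into the sign position: add the low n-1 bits only.
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    # Carry out of the sign position: add the full n-bit patterns.
    carry_out = ((a & mask) + (b & mask)) >> bits
    return (a + b) & mask, carry_in != carry_out


print(add_with_overflow(0b0101, 0b0011))  # 5 + 3: pattern 1000, overflow
print(add_with_overflow(0b1101, 0b1011))  # -3 + -5: pattern 1000, no overflow
```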

  13. Two’s Complement for N=32
      0000 ... 0000 0000 0000 0000 two = 0 ten
      0000 ... 0000 0000 0000 0001 two = 1 ten
      0000 ... 0000 0000 0000 0010 two = 2 ten
      . . .
      0111 ... 1111 1111 1111 1101 two = 2,147,483,645 ten
      0111 ... 1111 1111 1111 1110 two = 2,147,483,646 ten
      0111 ... 1111 1111 1111 1111 two = 2,147,483,647 ten
      1000 ... 0000 0000 0000 0000 two = –2,147,483,648 ten
      1000 ... 0000 0000 0000 0001 two = –2,147,483,647 ten
      1000 ... 0000 0000 0000 0010 two = –2,147,483,646 ten
      . . .
      1111 ... 1111 1111 1111 1101 two = –3 ten
      1111 ... 1111 1111 1111 1110 two = –2 ten
      1111 ... 1111 1111 1111 1111 two = –1 ten
      • One zero; 1st bit called sign bit
      • 1 “extra” negative: no positive 2,147,483,648 ten

  14. Two’s Complement Formula
      • Can represent positive and negative numbers in terms of the bit value times a power of 2:
        $d_{31} \times -(2^{31}) + d_{30} \times 2^{30} + \dots + d_2 \times 2^2 + d_1 \times 2^1 + d_0 \times 2^0$
      • Example: $1101_{two} = 1 \times -(2^3) + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = -2^3 + 2^2 + 0 + 2^0 = -8 + 4 + 0 + 1 = -8 + 5 = -3_{ten}$
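The weighted-sum formula translates almost literally into code. This is a small sketch under the assumption that the bit pattern arrives as a string, most significant bit first; `twos_complement_value` is an illustrative name.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate a twos complement bit string, most significant bit first."""
    n = len(bits)
    total = 0
    for i, digit in enumerate(bits):
        weight = 1 << (n - 1 - i)      # 2^(n-1), ..., 2^1, 2^0
        if i == 0:
            weight = -weight           # the sign bit carries weight -(2^(n-1))
        total += int(digit) * weight
    return total


print(twos_complement_value("1101"))  # -3, matching the worked example
```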

  15. Two’s Complement shortcut: Negation
      • Change every 0 to 1 and 1 to 0 (invert or complement), then add 1 to the result
      • Proof: Sum of number and its (one’s) complement must be 111...111 two
        However, 111...111 two = -1 ten
        Let x’ ⇒ one’s complement representation of x
        Then x + x’ = -1 ⇒ x + x’ + 1 = 0 ⇒ x’ + 1 = -x
      • Example: -3 to +3 to -3
        x :  1111 1111 1111 1111 1111 1111 1111 1101 two
        x’:  0000 0000 0000 0000 0000 0000 0000 0010 two
        +1:  0000 0000 0000 0000 0000 0000 0000 0011 two
        ()’: 1111 1111 1111 1111 1111 1111 1111 1100 two
        +1:  1111 1111 1111 1111 1111 1111 1111 1101 two
      You should be able to do this in your head…

  16. Two’s comp. shortcut: Sign extension
      • Convert 2’s complement number rep. using n bits to more than n bits
      • Simply replicate the most significant bit (sign bit) of smaller to fill new bits
        – 2’s comp. positive number has infinite 0s
        – 2’s comp. negative number has infinite 1s
        – Binary representation hides leading bits; sign extension restores some of them
        – 16-bit -4 ten to 32-bit:
          1111 1111 1111 1100 two
          1111 1111 1111 1111 1111 1111 1111 1100 two
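Sign extension amounts to copying the sign bit into every new high-order position. A minimal sketch, assuming the value is held as an unsigned bit pattern; the function name `sign_extend` is illustrative.

```python
def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Widen a twos complement bit pattern by replicating its sign bit."""
    sign = (pattern >> (from_bits - 1)) & 1
    if sign:
        # Set every bit between the old width and the new width to 1.
        pattern |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return pattern


print(format(sign_extend(0xFFFC, 16, 32), "032b"))  # 16-bit -4 widened to 32 bits
```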

  17. What if too big?
      • Binary bit patterns above are simply representatives of numbers. Strictly speaking they are called “numerals”.
      • Numbers really have an ∞ number of digits
        – with almost all being same (00…0 or 11…1) except for a few of the rightmost digits
        – Just don’t normally show leading digits
      • If result of add (or -, *, / ) cannot be represented by these rightmost HW bits, overflow is said to have occurred.
      [Figure: unsigned 5-bit patterns 11110, 11111, 00000, 00001, 00010.]

  18. Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta
      physics.nist.gov/cuu/Units/binary.html
      | Name | Abbr | Factor | SI size |
      |---|---|---|---|
      | Kilo | K | $2^{10}$ = 1,024 | $10^{3}$ = 1,000 |
      | Mega | M | $2^{20}$ = 1,048,576 | $10^{6}$ = 1,000,000 |
      | Giga | G | $2^{30}$ = 1,073,741,824 | $10^{9}$ = 1,000,000,000 |
      | Tera | T | $2^{40}$ = 1,099,511,627,776 | $10^{12}$ = 1,000,000,000,000 |
      | Peta | P | $2^{50}$ = 1,125,899,906,842,624 | $10^{15}$ = 1,000,000,000,000,000 |
      | Exa | E | $2^{60}$ = 1,152,921,504,606,846,976 | $10^{18}$ = 1,000,000,000,000,000,000 |
      | Zetta | Z | $2^{70}$ = 1,180,591,620,717,411,303,424 | $10^{21}$ = 1,000,000,000,000,000,000,000 |
      | Yotta | Y | $2^{80}$ = 1,208,925,819,614,629,174,706,176 | $10^{24}$ = 1,000,000,000,000,000,000,000,000 |
      • Confusing! Common usage of “kilobyte” means 1024 bytes, but the “correct” SI value is 1000 bytes
      • Hard Disk manufacturers & Telecommunications are the only computing groups that use SI factors, so what is advertised as a 30 GB drive will actually only hold about 28 x $2^{30}$ bytes, and a 1 Mbit/s connection transfers $10^{6}$ bps.

  19. kibi, mebi, gibi, tebi, pebi, exbi, zebi, yobi
      en.wikipedia.org/wiki/Binary_prefix
      | Name | Abbr | Factor |
      |---|---|---|
      | kibi | Ki | $2^{10}$ = 1,024 |
      | mebi | Mi | $2^{20}$ = 1,048,576 |
      | gibi | Gi | $2^{30}$ = 1,073,741,824 |
      | tebi | Ti | $2^{40}$ = 1,099,511,627,776 |
      | pebi | Pi | $2^{50}$ = 1,125,899,906,842,624 |
      | exbi | Ei | $2^{60}$ = 1,152,921,504,606,846,976 |
      | zebi | Zi | $2^{70}$ = 1,180,591,620,717,411,303,424 |
      | yobi | Yi | $2^{80}$ = 1,208,925,819,614,629,174,706,176 |
      • International Electrotechnical Commission (IEC) in 1999 introduced these to specify binary quantities.
        – Names come from shortened versions of the original SI prefixes (same pronunciation) and bi is short for “binary”, but pronounced “bee” :-(
        – Now SI prefixes only have their base-10 meaning and never have a base-2 meaning.

  20. The way to remember #s
      • What is $2^{34}$? How many bits addresses (i.e., what’s ceil log 2 = lg of) 2.5 TiB?
      • Answer! $2^{XY}$ means…
        | X | Prefix | Approx. | Y | Value |
        |---|---|---|---|---|
        | X=0 | --- | | Y=0 | 1 |
        | X=1 | kibi | ~$10^{3}$ | Y=1 | 2 |
        | X=2 | mebi | ~$10^{6}$ | Y=2 | 4 |
        | X=3 | gibi | ~$10^{9}$ | Y=3 | 8 |
        | X=4 | tebi | ~$10^{12}$ | Y=4 | 16 |
        | X=5 | pebi | ~$10^{15}$ | Y=5 | 32 |
        | X=6 | exbi | ~$10^{18}$ | Y=6 | 64 |
        | X=7 | zebi | ~$10^{21}$ | Y=7 | 128 |
        | X=8 | yobi | ~$10^{24}$ | Y=8 | 256 |
        | | | | Y=9 | 512 |
      MEMORIZE!
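The two questions on this slide can be worked mechanically. The sketch below is only an illustration; the names `PREFIXES`, `describe_power_of_two` and `address_bits` are assumptions, not part of the slides.

```python
import math

# Index X of the decimal exponent "XY" selects the binary prefix.
PREFIXES = ["", "kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"]


def describe_power_of_two(exponent: int) -> str:
    """Split 2^(10*X + Y) into 2^Y times the X-th binary prefix."""
    x, y = divmod(exponent, 10)
    return f"{2 ** y} {PREFIXES[x]}".strip()


def address_bits(size_in_bytes: float) -> int:
    """Number of address bits needed: ceil(log2(size))."""
    return math.ceil(math.log2(size_in_bytes))


print(describe_power_of_two(34))     # 16 gibi
print(address_bits(2.5 * 2 ** 40))   # 42 bits for 2.5 TiB
```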

  21. Comparing the signed number systems
      | Decimal | S.M. | 1’s comp. | 2’s comp. |
      |---|---|---|---|
      | 7 | 0111 | 0111 | 0111 |
      | 6 | 0110 | 0110 | 0110 |
      | 5 | 0101 | 0101 | 0101 |
      | 4 | 0100 | 0100 | 0100 |
      | 3 | 0011 | 0011 | 0011 |
      | 2 | 0010 | 0010 | 0010 |
      | 1 | 0001 | 0001 | 0001 |
      | 0 | 0000 | 0000 | 0000 |
      | -0 | 1000 | 1111 | n/a |
      | -1 | 1001 | 1110 | 1111 |
      | -2 | 1010 | 1101 | 1110 |
      | -3 | 1011 | 1100 | 1101 |
      | -4 | 1100 | 1011 | 1100 |
      | -5 | 1101 | 1010 | 1011 |
      | -6 | 1110 | 1001 | 1010 |
      | -7 | 1111 | 1000 | 1001 |
      | -8 | n/a | n/a | 1000 |
      • Here are all the 4-bit numbers in the different systems.
      • Positive numbers are the same in all three representations.
      • Signed magnitude and one’s complement have two ways of representing 0. This makes things more complicated.
      • Two’s complement has asymmetric ranges; there is one more negative number than positive number. Here, you can represent -8 but not +8.
      • However, two’s complement is preferred because it has only one 0, and its addition algorithm is the simplest.

  22. And in Conclusion...
      • We represent “things” in computers as particular bit patterns: N bits ⇒ $2^N$
      • Decimal for human calculations, binary for computers, hex to write binary more easily
      • 1’s complement - mostly abandoned
      • 2’s complement universal in computing: cannot avoid, so learn
      • Overflow: numbers ∞; computers finite, errors!

  23. Numbers represented in memory
      • Memory is a place to store bits
      • A word is a fixed number of bits (eg, 32) at an address
      • Addresses are naturally represented as unsigned numbers in C
      [Figure: memory with addresses from 00000 to 11111 = $2^k - 1$; the word 101101100110 is stored at address 01110.]

  24. Signed vs. Unsigned Variables
      • Java just declares integers int
        – Uses two’s complement
      • C has declaration int also
        – Declares variable as a signed integer
        – Uses two’s complement
      • Also, C declaration unsigned int
        – Declares an unsigned integer
        – Treats 32-bit number as unsigned integer, so most significant bit is part of the number, not a sign bit

  25. Binary Codes for Decimal Digits
      There are over 8,000 ways that you can choose 10 elements from the 16 binary numbers of 4 bits. A few are useful:
      | Decimal | 8,4,2,1 | Excess3 | 8,4,-2,-1 |
      |---|---|---|---|
      | 0 | 0000 | 0011 | 0000 |
      | 1 | 0001 | 0100 | 0111 |
      | 2 | 0010 | 0101 | 0110 |
      | 3 | 0011 | 0110 | 0101 |
      | 4 | 0100 | 0111 | 0100 |
      | 5 | 0101 | 1000 | 1011 |
      | 6 | 0110 | 1001 | 1010 |
      | 7 | 0111 | 1010 | 1001 |
      | 8 | 1000 | 1011 | 1000 |
      | 9 | 1001 | 1100 | 1111 |

  26. Binary Coded Decimal (BCD)
      Binary Coded Decimal or 8,4,2,1 Code. This code is the simplest, most intuitive binary code for decimal digits and uses the same weights as a binary number, but only encodes the first ten values from 0 to 9.
      Examples:
      • 1001 is 8 + 1 = 9
      • 0011 is 2 + 1 = 3
      • 0100 is 4
      • 1010 is an illegal code.

  27. Other Decimal Codes The Excess-3 Code adds binary 0011 to the BCD code. The BCD (8,4, 2, 1) Code, and the (8,4,-2,-1) Code are examples of weighted codes. Each bit has a "weight" associated with it and you can compute the decimal value by adding the weights where a 1 exists in the code-word. Example: 1111 in (8,4,-2,-1) is 8 + 4 + (-2) + (-1) = 9

  28. Warning: Conversion or Coding?
      DO NOT mix up CONVERSION of a decimal number to a binary number with CODING a decimal number with a BINARY CODE.
      • $13_{10} = 1101_2$ (This is CONVERSION)
      • 13 ⇔ 0001 0011 (This is CODING)
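The difference between conversion and coding, and the idea of a weighted code, can be seen side by side in a short sketch. The helper names `decode_weighted`, `excess3` and `bcd_encode` are hypothetical and chosen only for this illustration.

```python
def decode_weighted(code: str, weights=(8, 4, 2, 1)) -> int:
    """Add up the weights at every position where the code word has a 1."""
    return sum(w for bit, w in zip(code, weights) if bit == "1")


def excess3(digit: int) -> str:
    """Excess-3 code word: the BCD code of the digit plus binary 0011."""
    return format(digit + 3, "04b")


def bcd_encode(number: int) -> str:
    """CODING: one 4-bit 8,4,2,1 code word per decimal digit."""
    return " ".join(format(int(d), "04b") for d in str(number))


print(decode_weighted("1111", (8, 4, -2, -1)))  # 9
print(excess3(5))                               # 1000
print(format(13, "b"))                          # 1101 (conversion)
print(bcd_encode(13))                           # 0001 0011 (coding)
```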

  29. Binary Addition: Half Adder
      | Ai | Bi | Sum | Carry |
      |---|---|---|---|
      | 0 | 0 | 0 | 0 |
      | 0 | 1 | 1 | 0 |
      | 1 | 0 | 1 | 0 |
      | 1 | 1 | 0 | 1 |
      Carry = Ai Bi
      Sum = Ai’ Bi + Ai Bi’ = Ai xor Bi
      [Figure: Karnaugh maps for Sum and Carry, and the half-adder schematic with inputs Ai, Bi and outputs Sum, Carry.]
      But each bit position may have a carry in…

  30. Full-Adder
      | A | B | CI | S | CO |
      |---|---|---|---|---|
      | 0 | 0 | 0 | 0 | 0 |
      | 0 | 0 | 1 | 1 | 0 |
      | 0 | 1 | 0 | 1 | 0 |
      | 0 | 1 | 1 | 0 | 1 |
      | 1 | 0 | 0 | 1 | 0 |
      | 1 | 0 | 1 | 0 | 1 |
      | 1 | 1 | 0 | 0 | 1 |
      | 1 | 1 | 1 | 1 | 1 |
      S = CI xor A xor B
      CO = B CI + A CI + A B = CI (A + B) + A B
      [Figure: Karnaugh maps for S and CO over A B (00 01 11 10) and CI, and a one-bit adder block with inputs A, B, Cin and output Co.]
      Now we can connect them up to do multiple bits…

  31. Ripple Carry
      [Figure: four full adders in a chain with inputs A0 B0 to A3 B3 and sums S0 to S3; carries C1, C2, C3 pass from each stage to the next.]
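Chaining full adders into a ripple carry adder can be modelled bit by bit. This is a sketch for intuition rather than a hardware description; it assumes bit lists are ordered least significant bit first, and the names `full_adder` and `ripple_carry_add` are illustrative.

```python
def full_adder(a: int, b: int, cin: int):
    """One bit slice: S = CI xor A xor B, CO = B CI + A CI + A B."""
    s = a ^ b ^ cin
    cout = (b & cin) | (a & cin) | (a & b)
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin: int = 0):
    """Add two equal-length bit lists (LSB first); each carry feeds the next stage."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


# 1111 + 0001: the carry ripples through every stage.
print(ripple_carry_add([1, 1, 1, 1], [1, 0, 0, 0]))  # ([0, 0, 0, 0], 1)
```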

  32. Full Adder from Half Adders (little aside)
      • Standard Approach: 6 Gates
      • Alternative Implementation: 5 Gates, using two half adders: the first produces A xor B and A B, the second produces S = A xor B xor CI and CI (A xor B)
      CO = A B + CI (A xor B) = A B + B CI + A CI

  33. Delay in the Ripple Carry Adder
      Critical delay: the propagation of carry from low to high order stages
      • Single stage: A and B arrive @0, CI is a late arriving signal @N; S is ready @N+1 and CO @N+2 (two gate delays to compute CO)
      • 4 stage adder, with all A, B inputs and C0 @0:
        – S0 @2, C1 @2
        – S1 @3, C2 @4
        – S2 @5, C3 @6
        – S3 @7, C4 @8 (final sum and carry)

  34. Ripple Carry Timing
      Critical delay: the propagation of carry from low to high order stages
      1111 + 0001 worst case addition
      • T0: Inputs to the adder are valid
      • T2: Stage 0 carry out (C1); S0, C1 valid
      • T4: Stage 1 carry out (C2); S1, C2 valid
      • T6: Stage 2 carry out (C3); S2, C3 valid
      • T8: Stage 3 carry out (C4); S3, C4 valid
      2 delays to compute sum, but last carry not ready until 6 delays later

  35. Adders (cont.)
      [Figure: ripple adder, a chain of full adders (FA) with inputs a0, b0 onward, carries c0 to c7 and sums s0 to s7.]
      Ripple adder is inherently slow because, in general, s7 must wait for c7, which must wait for c6 …
      T ∝ n, Cost ∝ n
      How do we make it faster, perhaps with more cost?
      • Classic approach: Carry Look-Ahead
      • Or use a MUX !!!

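  36. Carry Select Adder
      [Figure: 8-bit carry select adder. One 4-bit adder computes s3..s0 from a3..a0, b3..b0 and c0. Two more 4-bit adders compute the upper half from a7..a4, b7..b4, one with carry-in 0 and one with carry-in 1; multiplexers then select s7..s4 and c8.]
      $T = T_{ripple\_adder} / 2 + T_{MUX}$
      $COST = 1.5 \times COST_{ripple\_adder} + (n+1) \times COST_{MUX}$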

  37. Extended Carry Select Adder
      [Figure: 16-bit carry select adder built from 4-bit adders over b15-b12/a15-a12, b11-b8/a11-a8, b7-b4/a7-a4 and b3-b0/a3-a0. Each upper block has one adder with carry-in 0 and one with carry-in 1, followed by multiplexers; cin enters the lowest block and cout leaves the highest.]
      • What is the optimal # of blocks and # of bits/block?
        – If # blocks too large, delay dominated by total mux delay
        – If # blocks too small, delay dominated by adder delay per block
      $T \propto \sqrt{N}$, with $\sqrt{N}$ stages of $\sqrt{N}$ bits
      Cost ≈ 2*ripple + muxes

  38. Carry Select Adder Performance
      [Figure: the same 16-bit carry select adder built from 4-bit adders and multiplexers as on the Extended Carry Select Adder slide.]
      • Compare to ripple adder delay:
        – $T_{total} = 2\sqrt{N}\,T_{FA} - T_{FA}$, assuming $T_{FA} = T_{MUX}$
        – For ripple adder $T_{total} = N\,T_{FA}$
        – “cross-over” at N=3, Carry select faster for any value of N>3.
      • Is sqrt(N) really the optimum?
        – From right to left increase size of each block to better match delays
        – Ex: 64-bit adder, use block sizes [12 11 10 9 8 7 7]
      • How about recursively defined carry select?

  39. What really happens with the carries
      [Figure: ripple adder chain of full adders (FA) with carries c0 to c7 and sums s0 to s7.]
      | A | B | Cout | S | Carry action |
      |---|---|---|---|---|
      | 0 | 0 | 0 | Cin | kill |
      | 0 | 1 | Cin | ~Cin | propagate |
      | 1 | 0 | Cin | ~Cin | propagate |
      | 1 | 1 | 1 | Cin | generate |
      • Carry Generate Gi = Ai Bi: must generate carry when A = B = 1
      • Carry Propagate Pi = Ai xor Bi: carry in will equal carry out here
      All generates and propagates in parallel at first stage. No ripple.

  40. Carry Look Ahead Logic
      • Carry Generate Gi = Ai Bi: must generate carry when A = B = 1
      • Carry Propagate Pi = Ai xor Bi: carry in will equal carry out here
      Sum and Carry can be reexpressed in terms of generate/propagate:
      Si = Ai xor Bi xor Ci = Pi xor Ci
      Ci+1 = Ai Bi + Ai Ci + Bi Ci
           = Ai Bi + Ci (Ai + Bi)
           = Ai Bi + Ci (Ai xor Bi)
           = Gi + Ci Pi

  41. All Carries in Parallel
      Reexpress the carry logic for each of the bits:
      C1 = G0 + P0 C0
      C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 C0
      C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
      C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
      Each of the carry equations can be implemented in a two-level logic network
      Variables are the adder inputs and carry in to stage 0!
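The expanded carry equations can be evaluated directly from the generate and propagate signals, without waiting on earlier carries. The sketch below assumes bit lists ordered least significant bit first; `lookahead_carries` is an illustrative name, and the nested loops simply spell out the sum-of-products form.

```python
def lookahead_carries(a_bits, b_bits, c0: int = 0):
    """Return [C0, C1, ..., Cn] using the flattened lookahead equations.

    C(i+1) = Gi + Pi G(i-1) + ... + Pi...P1 G0 + Pi...P0 C0
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = [c0]
    for i in range(len(g)):
        # Term that carries C0 through every stage up to i.
        carry = c0
        for j in range(i + 1):
            carry &= p[j]
        # One product term per stage k that generates a carry
        # which is then propagated by stages k+1 .. i.
        for k in range(i + 1):
            term = g[k]
            for j in range(k + 1, i + 1):
                term &= p[j]
            carry |= term
        carries.append(carry)
    return carries


print(lookahead_carries([1, 1, 1, 1], [1, 0, 0, 0]))  # [0, 1, 1, 1, 1]
```

Every entry depends only on the adder inputs and C0, which is why each equation fits in a two-level logic network.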

  42. CLA Implementation
      • Adder with Propagate and Generate Outputs: Pi @ 1 gate delay, Gi @ 1 gate delay, Si @ 2 gate delays
      • Increasingly complex logic for C1, C2, C3, C4
      [Figure: two-level gate networks computing C1 to C4 from G0..G3, P0..P3 and C0.]

  43. How do we extend this to larger adders?
      [Figure: four 4-bit adders in a chain, taking A[15-12], B[15-12] down to A[3-0], B[3-0] and producing S[15-12] to S[3-0].]
      • Faster carry propagation
        – 4 bits at a time
      • But still linear
      • Can we get to log?
      • Compute propagate and generate for each adder BLOCK

  44. Cascaded Carry Lookahead
      [Figure: four 4-bit adders over A[15-12], B[15-12] down to A[3-0], B[3-0] send block signals P0..P3 and G0..G3 to a Lookahead Carry Unit, which returns C4, C8, C12 and C16 from C0; gate-delay annotations from @0 to @8 mark when each signal is valid.]
      • 4 bit adders with internal carry lookahead
      • Second level carry lookahead unit, extends lookahead to 16 bits
      • One more level to 64 bits

  45. Trade-offs in combinational logic design • Time vs. Space Trade-offs Doing things fast requires more logic and thus more space Example: carry lookahead logic • Simple with lots of gates vs complex with fewer • Arithmetic Logic Units Critical component of processor datapath Inner-most "loop" of most computer instructions

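  46. 2s comp. Overflow Detection
      | Sum | Carries | First operand | Second operand | Result | Overflow? |
      |---|---|---|---|---|---|
      | 5 + 3 | 0 1 1 1 | 0101 | 0011 | 1000 (-8) | Overflow |
      | -7 + -2 | 1 0 0 0 | 1001 | 1110 | 1 0111 (7) | Overflow |
      | 5 + 2 | 0 0 0 0 | 0101 | 0010 | 0111 (7) | No overflow |
      | -3 + -5 | 1 1 1 1 | 1101 | 1011 | 1 1000 (-8) | No overflow |
      The carries column lists the carry out of the sign bit first, then the carries into bits 3, 2 and 1.
      Overflow occurs when carry in to sign does not equal carry out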

  47. 2s Complement Adder/Subtractor
      [Figure: 4-bit ripple adder in which each B input passes through a 0/1 selector that chooses B or its complement. The Add/Subtract line drives the selectors and the carry in of bit 0; an Overflow output is derived at the sign bit.]
      $A - B = A + (-B) = A + \overline{B} + 1$
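The identity behind the adder/subtractor can be tried on bit patterns. In this sketch the `subtract` flag plays the role of the Add/Subtract control line; the function name `add_sub` and the 4-bit default are assumptions for illustration.

```python
def add_sub(a: int, b: int, subtract: bool, bits: int = 4) -> int:
    """Compute A + B, or A - B as A + (complement of B) + 1, on n-bit patterns."""
    mask = (1 << bits) - 1
    b_selected = (b ^ mask) if subtract else b   # the selector picks B or its complement
    carry_in = 1 if subtract else 0              # the control line also feeds carry in
    return (a + b_selected + carry_in) & mask    # the carry out of the top bit is dropped


print(format(add_sub(0b0100, 0b0011, subtract=True), "04b"))   # 0001, i.e. 4 - 3 = 1
print(format(add_sub(0b0011, 0b0100, subtract=True), "04b"))   # 1111, i.e. 3 - 4 = -1
```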

  48. Summary
      • Circuit design for unsigned addition
        – Full adder per bit slice
        – Delay limited by Carry Propagation
          » Ripple is algorithmically slow, but wires are short
      • Carry select
        – Simple, resource-intensive
        – Excellent layout
      • Carry look-ahead
        – Excellent asymptotic behavior
        – Great at the board level, but wire length effects are significant on chip
      • Digital number systems
        – How to represent negative numbers
        – Simple operations
        – Clean algorithmic properties
      • 2s complement is most widely used
        – Circuit for unsigned arithmetic
        – Subtract by complement and carry in
        – Overflow when cin xor cout of sign-bit is 1

  49. Basic Arithmetic and the ALU • Now – Integer multiplication » Booth’s algorithm – Integer division » Restoring, non-restoring – Floating point representation – Floating point addition, multiplication

  50. Multiplication
      • Flashback to 3rd grade
        – Multiplier
        – Multiplicand
        – Partial products
        – Final sum
      • Worked example: multiplicand 1000 x multiplier 1001
        – Partial products: 1000, 0000, 0000, 1000 (each shifted one position further left)
        – Final sum: 1001000
      • Base 10: 8 x 9 = 72
        – PP: 8 + 0 + 0 + 64 = 72
      • How wide is the result?
        – log(n x m) = log(n) + log(m)
        – 32b x 32b = 64b result

  51. Combinational Multiplier • Generating partial products – 2:1 mux based on multiplier[i] selects multiplicand or 0x0 – 32 partial products (!) • Summing partial products – Build Wallace tree of CSA

  52. Carry Save Adder
      • A + B => S
      • Save carries: A + B => S, C out
      • Use C in: A + B + C => S1, S2 (3# to 2# in parallel)
      • Used in combinational multipliers by building a Wallace Tree
      [Figure: CSA block with inputs a, b, c and outputs c, s.]

  53. Wallace Tree
      [Figure: tree of four CSA blocks reducing the inputs a, b, c, d, e, f.]

  54. Multicycle Multipliers • Combinational multipliers – Very hardware-intensive – Integer multiply relatively rare – Not the right place to spend resources • Multicycle multipliers – Iterate through bits of multiplier – Conditionally add shifted multiplicand

  55. Multiplier (F4.25)
      [Figure: multiplier hardware; only the worked example survives: 1000 x 1001 = 1001000, with partial products 1000, 0000, 0000, 1000.]

  56. Multiplier (F4.26)
      Start
      1. Test Multiplier0
         1a. If Multiplier0 = 1: add multiplicand to product and place the result in Product register (if Multiplier0 = 0, skip this step)
      2. Shift the Multiplicand register left 1 bit
      3. Shift the Multiplier register right 1 bit
      32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): Done
      Worked example: 1000 x 1001 = 1001000

  57. Multiplier Improvements • Do we really need a 64-bit adder? – No, since low-order bits are not involved – Hence, just use a 32-bit adder » Shift product register right on every step • Do we really need a separate multiplier register? – No, since low-order bits of 64-bit product are initially unused – Hence, just store multiplier there initially

  58. Multiplier (F4.31)
      [Figure: multiplier hardware. Multiplicand register (32 bits), 32-bit ALU, Product register (64 bits, shift right, write), Control test. Worked example: 1000 x 1001 = 1001000.]

  59. Multiplier (F4.32)
      Start
      1. Test Product0
         1a. If Product0 = 1: add multiplicand to the left half of the product and place the result in the left half of the Product register (if Product0 = 0, skip this step)
      2. Shift the Product register right 1 bit
      32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): Done
      Worked example: 1000 x 1001 = 1001000
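The refined multiplier loop (multiplier stored in the low half of the product register, add into the left half, shift right) can be simulated with plain integers. This is a sketch for unsigned operands only; `shift_add_multiply` is an illustrative name, and a Python integer stands in for the product register, so the carry out of the add is kept implicitly.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Unsigned n-bit multiply using a single shifting product register."""
    product = multiplier                      # low half starts as the multiplier
    for _ in range(bits):
        if product & 1:                       # 1. test Product0
            product += multiplicand << bits   # 1a. add into the left half
        product >>= 1                         # 2. shift the product right 1 bit
    return product


print(shift_add_multiply(0b1000, 0b1001))  # 72, i.e. binary 1001000
```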

  60. Signed Multiplication
      • Recall
        – For p = a x b, if a<0 or b<0 (but not both), then p < 0
        – If a<0 and b<0, then p > 0
        – Hence sign(p) = sign(a) xor sign(b)
      • Hence
        – Convert multiplier, multiplicand to positive number with (n-1) bits
        – Multiply positive numbers
        – Compute sign, convert product accordingly
      • Or,
        – Perform sign-extension on shifts for F4.31 design
        – Right answer falls out

  61. Booth’s Encoding • Recall grade school trick – When multiplying by 9: » Multiply by 10 (easy, just shift digits left) » Subtract once – E.g. » 123454 x 9 = 123454 x (10 – 1) = 1234540 – 123454 » Converts addition of six partial products to one shift and one subtraction • Booth’s algorithm applies same principle – Except no ‘9’ in binary, just ‘1’ and ‘0’ – So, it’s actually easier!

  62. Booth’s Encoding
      • Search for a run of ‘1’ bits in the multiplier
        – E.g. ‘0110’ has a run of 2 ‘1’ bits in the middle
        – Multiplying by ‘0110’ (6 in decimal) is equivalent to multiplying by 8 and subtracting twice the multiplicand, since 6 x m = (8 – 2) x m = 8m – 2m
      • Hence, iterate right to left and:
        – Subtract multiplicand from product at first ‘1’ of a run
        – Add multiplicand to product after last ‘1’ of a run
        – Don’t do either for ‘1’ bits in the middle

  63. Booth’s Algorithm
      | Current bit | Bit to right | Explanation | Example | Operation |
      |---|---|---|---|---|
      | 1 | 0 | Begins run of ‘1’ | 00001111000 | Subtract |
      | 1 | 1 | Middle of run of ‘1’ | 00001111000 | Nothing |
      | 0 | 1 | End of a run of ‘1’ | 00001111000 | Add |
      | 0 | 0 | Middle of a run of ‘0’ | 00001111000 | Nothing |
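The table maps onto a short loop over the multiplier bits. The sketch below is a software model, not the register-level hardware: it assumes the multiplier is a signed value that fits in `bits` bits, accumulates shifted copies of the multiplicand in a Python integer, and uses the illustrative name `booth_multiply`.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Multiply using Booth's recoding of an n-bit twos complement multiplier."""
    pattern = multiplier & ((1 << bits) - 1)   # n-bit pattern of the multiplier
    product = 0
    bit_to_right = 0                           # implicit 0 to the right of bit 0
    for i in range(bits):                      # iterate right to left
        current = (pattern >> i) & 1
        if current == 1 and bit_to_right == 0:
            product -= multiplicand << i       # begins a run of 1s: subtract
        elif current == 0 and bit_to_right == 1:
            product += multiplicand << i       # end of a run of 1s: add
        bit_to_right = current                 # middle of a run: nothing
    return product


print(booth_multiply(5, 6))    # 30: multiplier 0110 becomes 8m - 2m
print(booth_multiply(5, -3))   # -15: a run reaching the sign bit is never closed
```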

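  64. Integer Division
      • Again, back to 3rd grade
        – Divisor: 1000
        – Dividend: 1001010
        – Quotient: 1001
        – Remainder: 10
      • Steps: 1001010 - 1000 (aligned under the leading 1001) leaves 10; bring down digits to get 101, then 1010; 1010 - 1000 leaves the remainder 10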

  65. Integer Division
      • How does hardware know if division fits?
        – Condition: if remainder ≥ divisor
        – Use subtraction: (remainder – divisor) ≥ 0
      • OK, so if it fits, what do we do?
        – $Remainder_{n+1} = Remainder_n - divisor$
      • What if it doesn’t fit?
        – Have to restore original remainder
      • Called restoring division

  66. Integer Division (F4.40)
      Start
      1. Subtract the Divisor register from the Remainder register and place the result in the Remainder register
      2. Test Remainder
         2a. Remainder ≥ 0: shift the Quotient register to the left, setting the new rightmost bit to 1
         2b. Remainder < 0: restore the original value by adding the Divisor register to the Remainder register and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0
      3. Shift the Divisor register right 1 bit
      33rd repetition? No (< 33 repetitions): go back to step 1. Yes (33 repetitions): Done
      Worked example: 1001010 / 1000 = 1001, remainder 10
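The restoring division flowchart can be followed step by step in a small simulation. This sketch assumes unsigned operands and a non-zero divisor, uses n + 1 iterations for an n-bit divisor (mirroring the 33 repetitions for 32 bits), and the name `restoring_divide` is illustrative.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 4):
    """Unsigned restoring division; returns (quotient, remainder)."""
    remainder = dividend                 # Remainder register starts as the dividend
    shifted_divisor = divisor << bits    # divisor starts in the left half
    quotient = 0
    for _ in range(bits + 1):
        remainder -= shifted_divisor             # 1. subtract
        if remainder >= 0:                       # 2a. it fit
            quotient = (quotient << 1) | 1
        else:                                    # 2b. it did not fit: restore
            remainder += shifted_divisor
            quotient = quotient << 1
        shifted_divisor >>= 1                    # 3. shift the divisor right
    return quotient, remainder


print(restoring_divide(0b1001010, 0b1000))  # (9, 2): quotient 1001, remainder 10
```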

  67. Integer Division
      [Figure: division hardware. Divisor register (64 bits, shift right), 64-bit ALU, Quotient register (32 bits, shift left), Remainder register (64 bits, write), Control test. Worked example: 1001010 / 1000 = 1001, remainder 10.]

  68. Division Improvements • Skip first subtract – Can’t shift ‘1’ into quotient anyway – Hence shift first, then subtract » Undo extra shift at end • Hardware similar to multiplier – Can store quotient in remainder register – Only need 32b ALU » Shift remainder left vs. divisor right

  69. Improved Divider (F4.40)
      Start
      1. Shift the Remainder register left 1 bit
      2. Subtract the Divisor register from the left half of the Remainder register and place the result in the left half of the Remainder register
      3. Test Remainder
         3a. Remainder ≥ 0: shift the Remainder register to the left, setting the new rightmost bit to 1
         3b. Remainder < 0: restore the original value by adding the Divisor register to the left half of the Remainder register and place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new rightmost bit to 0
      32nd repetition? No (< 32 repetitions): go back to step 2. Yes (32 repetitions): Done. Shift left half of Remainder right 1 bit

  70. Improved Divider (F4.41)
      [Figure: improved divider hardware. Divisor register (32 bits), 32-bit ALU, Remainder register (64 bits, shift right, shift left, write), Control test.]

  71. Further Improvements • Division still takes: – 2 ALU cycles per bit position » 1 to check for divisibility (subtract) » One to restore (if needed) • Can reduce to 1 cycle per bit – Called non-restoring division – Avoids restore of remainder when test fails
