CO Unit 2
Arithmetic Unit
This unit focuses on
Addition and subtraction of two numbers are basic operations
at the machine-instruction level in all computers. These
operations, as well as other arithmetic and logic operations,
are implemented in the arithmetic and logic unit (ALU) of the
processor.
To know about
The logic circuits used to implement arithmetic operations.
The time needed to perform addition or subtraction affects the
processor’s performance.
Multiply and divide operations, which require more complex
circuitry than either addition or subtraction operations, also
affect performance.
To know some of the techniques used in modern computers
to perform arithmetic operations at high speed. Operations on
floating-point numbers are also described.
Addition/subtraction of signed numbers
At the ith stage:
Input:
xi and yi are the operand bits
ci is the carry-in
Output:
si is the sum
ci+1 is the carry-out to the (i+1)st stage
The logic expression for si can be implemented with a 3-input XOR
gate, as part of the logic required for a single stage of binary
addition. The carry-out function, ci+1, is implemented with an
AND-OR circuit. The complete circuit for a single stage of addition
is called a full adder (FA).
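The single stage described above can be sketched in a few lines of Python. This is only an illustrative model, not a circuit description: the function name `full_adder` is hypothetical, bits are modelled as the integers 0 and 1, and the carry is written in the usual AND-OR form xi·yi + xi·ci + yi·ci.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """Model one stage of binary addition on single bits (0 or 1)."""
    s = x ^ y ^ c_in                            # sum: 3-input XOR
    c_out = (x & y) | (x & c_in) | (y & c_in)   # carry-out: AND-OR circuit
    return s, c_out


# Print the truth table for all eight input combinations.
for x in (0, 1):
    for y in (0, 1):
        for c in (0, 1):
            print(x, y, c, full_adder(x, y, c))
```

Each row satisfies x + y + c = s + 2·c_out, which is a quick way to check the two expressions.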
An n-bit ripple-carry adder
• Cascade n full adder (FA) blocks to form an n-bit adder.
• Carries propagate, or ripple, through this cascade, so the
configuration is called an n-bit ripple-carry adder.
The carry-in, c0, into the least-significant-bit (LSB) position
provides a convenient way of adding 1 to a number. For instance,
forming the 2’s-complement of a number involves adding 1 to the
1’s-complement of the number. So, the same circuit is used for
addition and subtraction.
kn-bit adder
Numbers that are kn bits long can be added by cascading k n-bit adders.
Addition/Subtraction Logic Unit
n-bit adder/subtractor
• Add/Sub control = 0: addition.
• Add/Sub control = 1: subtraction.
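A minimal sketch of the n-bit adder/subtractor, assuming the usual arrangement in which each y bit is XORed with the Add/Sub control and the control also drives the carry-in c0. The helper names (`add_sub`, `to_bits`, `from_bits`) are hypothetical, and bit lists are stored LSB first.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """One full-adder stage on single bits."""
    return x ^ y ^ c_in, (x & y) | (x & c_in) | (y & c_in)


def add_sub(x_bits: list[int], y_bits: list[int], sub: int) -> tuple[list[int], int]:
    """Ripple-carry add (sub=0) or subtract (sub=1); bit lists are LSB first."""
    carry = sub                       # c0 = Add/Sub control supplies the "+1"
    s_bits = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y ^ sub, carry)   # XOR complements y when sub=1
        s_bits.append(s)
    return s_bits, carry


def to_bits(value: int, n: int) -> list[int]:
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


print(from_bits(add_sub(to_bits(5, 4), to_bits(6, 4), 0)[0]))   # 5 + 6
print(from_bits(add_sub(to_bits(7, 4), to_bits(5, 4), 1)[0]))   # 7 - 5
```

The result is an n-bit pattern, so it should be read as a 2’s-complement number when the operands are signed.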
Computing the add time
[Figure: logic for a single full-adder stage: the sum circuit with inputs xi, yi, ci and output si, and the carry circuit producing ci+1]
Computing the add time (contd..)
Cascade of 4 Full Adders, or a 4-bit adder
[Figure: four FA blocks in cascade, with operand inputs x3 y3 to x0 y0, carry-in c0, carries c1 to c4, and sum outputs s3 to s0]
• s0 is available after 1 gate delay, c1 after 2 gate delays.
• s1 is available after 3 gate delays, c2 after 4 gate delays.
• s2 is available after 5 gate delays, c3 after 6 gate delays.
• s3 is available after 7 gate delays, c4 after 8 gate delays.
In a carry-lookahead adder (CLA), the generate function Gi = xi yi
and the propagate function Pi = xi + yi are computed only from xi
and yi and do not depend on ci.
If the generate function for stage i is equal to 1, then
ci+1 = 1, independent of the input carry, ci. This occurs when
both xi and yi are 1.
The propagate function means that an input carry will
produce an output carry when either xi is 1 or yi is 1.
All Gi and Pi functions can be formed independently and in
parallel in one logic-gate delay after the X and Y operands are
applied to the inputs of an n-bit adder.
Each bit stage contains an AND gate to form Gi, an OR
gate to form Pi, and a three-input XOR gate to form si.
A simpler circuit can be derived by observing that an
adequate propagate function can be realized as Pi = xi ⊕ yi,
which differs from Pi = xi + yi only when xi = yi = 1. But in
this case Gi = 1, so it does not matter whether Pi is 0 or 1.
A 4-bit carry-lookahead adder
Let us consider the design of a 4-bit adder. From the general
relation ci+1 = Gi + Pici, the carries can be implemented as
c1 = G0 + P0c0
c2 = G1 + P1c1 = G1 + P1(G0 + P0c0) = G1 + P1G0 + P1P0c0
c3 = G2 + P2G1 + P2P1G0 + P2P1P0c0
c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0
In general, substituting for ci gives
ci+1 = Gi + PiGi−1 + PiPi−1ci−1
Continuing this type of expansion, the final expression for any
carry variable is
ci+1 = Gi + PiGi−1 + PiPi−1Gi−2 + ··· + PiPi−1···P1G0 + PiPi−1···P0c0
Thus, all carries can be obtained three gate delays after the input
operands X, Y, and c0 are applied, because one gate delay is needed
to develop all Pi and Gi signals, followed by two gate delays in the
AND-OR circuit for ci+1. After a further XOR gate delay (1 gate
delay), all sum bits are available.
• Independent of n, n-bit addition requires only 4 gate delays,
provided the gates have enough inputs (fan-in) for the widest terms.
This is called a carry-lookahead adder.
The delay through the adder is 3 gate delays for all carry bits and
1 further gate delay for all sum bits. In comparison, a 4-bit
ripple-carry adder requires 7 gate delays for the sum and 8 gate
delays for the carry.
Delay in 4-bit adder
• Ripple-carry design:
8 gate delays for carry (2 for each FA × 4)
7 gate delays for sum
• Carry-lookahead design:
1 gate delay for all Pi and Gi
2 gate delays for all ci
1 gate delay for all si
Total: 4 gate delays
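The lookahead carry expressions for a 4-bit adder can be checked with a short Python sketch. It is a behavioural model only: the function name `cla_add4` is hypothetical, the XOR form of the propagate function (Pi = xi ⊕ yi) is assumed so that si = Pi ⊕ ci, and gate delays are not modelled.

```python
def cla_add4(x: int, y: int, c0: int = 0) -> tuple[int, int]:
    """Add two 4-bit numbers using the expanded lookahead carry expressions."""
    xb = [(x >> i) & 1 for i in range(4)]
    yb = [(y >> i) & 1 for i in range(4)]
    G = [a & b for a, b in zip(xb, yb)]   # generate:  Gi = xi yi
    P = [a ^ b for a, b in zip(xb, yb)]   # propagate: Pi = xi XOR yi

    # Every carry depends only on the G, P signals and c0, never on another carry.
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))

    carries = [c0, c1, c2, c3]
    s = sum((P[i] ^ carries[i]) << i for i in range(4))   # si = Pi XOR ci
    return s, c4


print(cla_add4(0b1011, 0b0110))   # 11 + 6, returned as (4-bit sum, carry-out)
```

Comparing the result with ordinary integer addition for all operand pairs is a simple way to confirm that the expanded expressions agree with the ripple-carry definition.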
Blocked Carry-Lookahead adder
Carry-out from a 4-bit block can be given as:
c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0
G0^I = G3 + P3G2 + P3P2G1 + P3P2P1G0
[Figure: array multiplier cells arranged in rows, Row 1 to Row 5]
There are two gate delays from the inputs to the outputs of a
full-adder block, FA. The critical path runs from Row 1 to
Row 5.
• Combinatorial array multipliers:
– Are extremely inefficient.
– Have a high gate count for multiplying numbers of
practical size, such as 32-bit or 64-bit numbers. For example, a
32 × 32-bit multiplication requires 1024 AND gates and
1024 FA circuits, so a large area is needed to
place all these gates.
– Perform only one function, namely, unsigned integer
product.
• Gate efficiency can be improved by using a mixture of combinatorial
array techniques and sequential techniques, which requires less
combinational logic.
Sequential multiplication
• The combinational array multiplier uses a large number of
logic gates for multiplying numbers of practical size, such
as 32- or 64-bit numbers. Multiplication of two n-bit
numbers can also be performed in a sequential circuit that
uses a single n-bit adder.
Sequential multiplication Example
Sequential Circuit Multiplier
Registers A and Q are shift registers. Together, they hold
partial product PPi while multiplier bit qi generates the signal
Add/Noadd. This signal causes the multiplexer MUX to select
0 when qi = 0, or to select the multiplicand M when qi = 1, to
be added to PPi to generate PP(i + 1). The product is
computed in n cycles.
The partial product grows in length by one bit per cycle from
the initial vector, PP0, of n 0s in register A. The carry-out from
the adder is stored in flip-flop C, shown at the left end of
register A.
At the start, the multiplier is loaded into register Q, the
multiplicand into register M, and C and A are cleared to 0. At
the end of each cycle, C, A, and Q are shifted right one bit
position to allow for growth of the partial product as the
multiplier is shifted out of register Q.
Because of this shifting, multiplier bit qi appears at the LSB
position of Q to generate the Add/Noadd signal at the correct
time, starting with q0 during the first cycle, q1 during the
second cycle, and so on. After they are used, the multiplier bits
are discarded by the right-shift operation.
The carry-out from the adder is the leftmost bit of PP(i + 1),
and it must be held in the C flip-flop to be shifted right with
the contents of A and Q.
After n cycles, the high-order half of the product is held in
register A and the low-order half is in register Q.
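The register-level behaviour described above can be simulated in Python. This is a sketch under stated assumptions: the function name `sequential_multiply` is hypothetical, operands are unsigned n-bit integers, and the variables `c`, `a`, and `q` stand in for flip-flop C and registers A and Q.

```python
def sequential_multiply(m: int, q: int, n: int) -> int:
    """Multiply two unsigned n-bit numbers with an n-bit adder and shifts."""
    mask = (1 << n) - 1
    m &= mask
    q &= mask
    c, a = 0, 0                      # C and A are cleared at the start
    for _ in range(n):
        if q & 1:                    # LSB of Q generates Add/Noadd
            total = a + m            # MUX selects M when the bit is 1
            c = (total >> n) & 1     # carry-out held in flip-flop C
            a = total & mask
        else:
            c = 0                    # adding 0 produces no carry
        # Shift C, A, and Q right by one bit position.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q              # high half in A, low half in Q


print(sequential_multiply(13, 11, 4))   # 1101 x 1011
```

After n cycles the multiplier bits have all been shifted out of Q and replaced by the low-order half of the product.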
Multiplication of Signed Numbers
Signed Multiplication
Considering 2’s-complement signed operands, what happens to
(−13) × (+11) if we follow the same method as for unsigned multiplication?
[Figure: normal multiplication compared with Booth multiplication]
Booth recoding of a multiplier
Carry-Save Addition of Summands
CSA speeds up the addition process.
• Consider the 4 × 4 multiplication array in which the first
row consists of AND gates that produce the four inputs
m3q0,m2q0, m1q0, and m0q0.
• Instead of letting the carries ripple along the rows, they can
be "saved" and introduced into the next row, at the correct
weighted positions. This frees up an input to each of three
full adders in the first row. These inputs can be used to
introduce the third summand bits m2q2, m1q2, and m0q2.
• Now, two inputs of each of three full adders in the second
row are fed by the sum and carry outputs from the first row.
• The third input is used to introduce the bits m2q3, m1q3, and
m0q3 of the fourth summand. The high-order bits m3q2 and
m3q3 of the third and fourth summands are introduced into
the remaining free full-adder inputs at the left end in the
second and third rows.
• The saved carry bits and the sum bits from the second row
are now added in the third row, which is a ripple-carry
adder, to produce the final product bits.
• The delay through the carry-save array is somewhat less
than the delay through the ripple-carry array. This is because
the S and C vector outputs from each row are produced in
parallel in one full-adder delay.
Summand Addition Tree using 3-2 Reducers
• Consider the addition of many summands:
– Group the summands in threes and perform carry-save
addition on each of these groups in parallel to generate a
set of S and C vectors in one full-adder delay.
– Group all of the S and C values into threes, and perform
carry-save addition on them, generating a further set of S
and C values in one more full-adder delay.
– Continue with this process until there are only two values
remaining. They can be added in an RCA or CLA to
produce the desired product.
The adder at each bit position of the three summands is
called a 3-2 reducer, and the logic circuit structure that
reduces a number of summands to two is called a CSA tree.
The final two S and C vectors can be added in a carry-
lookahead adder to produce the desired product.
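The reduction procedure above can be sketched in Python, treating each summand as an integer bit vector. The names `carry_save_add` and `csa_tree_sum` are hypothetical, and Python's built-in addition stands in for the final RCA or CLA stage.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3-2 reduction: three summands in, an S vector and a C vector out."""
    s = a ^ b ^ c                                   # bitwise sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1      # saved carries, one position to the left
    return s, carry


def csa_tree_sum(summands: list[int]) -> tuple[int, int]:
    """Reduce the summands to two with 3-2 reducers; return (total, levels used)."""
    values = list(summands)
    levels = 0
    while len(values) > 2:
        grouped = len(values) - len(values) % 3     # how many values fit into groups of three
        next_values = []
        for i in range(0, grouped, 3):
            next_values.extend(carry_save_add(*values[i:i + 3]))
        next_values.extend(values[grouped:])        # leftovers pass through to the next level
        values = next_values
        levels += 1
    return sum(values), levels                      # final carry-propagate addition


print(csa_tree_sum(list(range(1, 17))))             # 16 summands
```

Each level keeps the total unchanged because a + b + c equals the S vector plus the shifted C vector, so only the final addition needs carry propagation.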
Multiplication example used to illustrate carry-save addition
Schematic representation of the carry-save addition operations
Integer Division
Manual Division
G0^I = G3 + P3G2 + P3P2G1 + P3P2P1G0
4-bit carry-lookahead Adder
2. Assuming 6-bit 2’s-complement number representation, multiply the
multiplicand A= 110101 by the multiplier B = 011011 using both the
normal Booth algorithm and the bit-pair recoding Booth algorithm.
3. Show that the logic expression cn ⊕ cn−1 is a correct
indicator of overflow in the addition of 2’s-complement
integers by using an appropriate truth table.
Consider the truth table of Full Adder. Overflow occurs only
when xn−1 and yn−1 are the same and sn−1 is different. This
occurs in the second and seventh rows of the table; and cn and
cn−1 are different only in those rows. Therefore, cn ⊕ cn−1 is a
correct indicator of overflow.
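The truth-table argument can also be checked exhaustively for small word lengths. The sketch below is only a verification aid: the function names are hypothetical, and the "true" overflow condition is taken to be that the signed sum falls outside the n-bit 2’s-complement range.

```python
def add_with_carries(x: int, y: int, n: int) -> tuple[int, int, int]:
    """Ripple-add two n-bit patterns; return (sum pattern, c_n, c_(n-1))."""
    carries = [0]                                   # c0 = 0
    total = 0
    for i in range(n):
        xi, yi, ci = (x >> i) & 1, (y >> i) & 1, carries[-1]
        total |= (xi ^ yi ^ ci) << i
        carries.append((xi & yi) | (xi & ci) | (yi & ci))
    return total, carries[n], carries[n - 1]


def to_signed(pattern: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2's-complement integer."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


def overflow_rule_holds(n: int) -> bool:
    """Check c_n XOR c_(n-1) against true overflow for every operand pair."""
    low, high = -(1 << (n - 1)), (1 << (n - 1)) - 1
    for x in range(1 << n):
        for y in range(1 << n):
            _, cn, cn_1 = add_with_carries(x, y, n)
            true_sum = to_signed(x, n) + to_signed(y, n)
            overflow = not (low <= true_sum <= high)
            if (cn ^ cn_1) != int(overflow):
                return False
    return True


print(overflow_rule_holds(4))
```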
4. Show that the worst-case delay through an n × n array of
the type shown in the figure is 6(n − 1) − 1 gate delays.
No full adders are needed in the first row of the array because
the incoming partial product PP0 is zero. Each of the two FA
blocks in rows 2 through n − 1 introduces 2 gate delays, for a
total of 4(n − 2) gate delays. Row n introduces 2n gate delays.
Adding the initial AND gate delay for row 1 and all other
cells, the total delay is:
4(n − 2) + 2n + 1 = 6n − 8 + 1 = 6(n − 1) − 1
5. Multiply each of the following pairs of signed 2’s-
complement numbers using the Booth algorithm. In each case,
assume that A is the multiplicand and B is the multiplier.
(a) A = 010111 and B = 110110
(b) A = 110011 and B = 101100
(c) A = 110101 and B = 011011
(d) A = 001111 and B = 001111
6. Multiply each of the following pairs of signed 2’s-complement
numbers using the bit-pair recoding of the
multiplier. In each case, assume that A is the multiplicand and
B is the multiplier.
(a) A = 010111 and B = 110110
(b) A = 110011 and B = 101100
(c) A = 110101 and B = 011011
(d) A = 001111 and B = 001111
7. Using manual methods, perform the operations A × B and A ÷ B
on the 5-bit unsigned numbers A = 10101 and B = 00101.
8. Show how the multiplication and division operations would
be performed by the hardware and construct the charts to
perform the operations A × B and A ÷ B on the 5-bit unsigned
numbers A= 10101 and B = 00101
The multiplication chart is
The division chart is
9. How many levels of 4-2 reducers are needed to reduce k summands to
2 in a reduction tree? How many levels are needed if 3-2 reducers are
used?
Let the number of levels be L. For 4-2 reducers, we have k(2/4)^L = 2,
where the denominator is the number of inputs and the numerator is the
number of outputs of a reducer. This simplifies to k(1/2)^L = 2. Taking
logarithms to the base 2 of each side of this equation gives
log2(k(1/2)^L) = log2 2
log2 k + log2(1/2^L) = 1
log2 k + log2 1 − log2 2^L = 1
log2 k + 0 − L log2 2 = 1
log2 k − L = 1
L = log2 k − 1
10. (a) How many 3-2 reduction levels are needed to reduce 16
summands to 2?
(b) Show that 8 levels are required to reduce 32 summands
to 2.
(c)Compare the exact answers in parts (a) and (b) to the results
obtained by using the approximation developed in 3-2 reduction.
(a) Six 3-2 reduction levels are needed: 16 → 11 → 8 → 6 → 4 → 3 → 2 summands.
(b) Eight 3-2 reduction levels are needed: 32 → 22 → 15 → 10 → 7 → 5 → 4 → 3 → 2 summands.
(c) For 3-2 reducers, the approximation is L = 1.7 log2 k − 1.7,
where k is the number of summands, so k = 16 in part (a) and k = 32
in part (b).
Substituting k = 16: L = 1.7 log2 2^4 − 1.7 = (4 × 1.7) − 1.7 = 5.1
Substituting k = 32: L = 1.7 log2 2^5 − 1.7 = (5 × 1.7) − 1.7 = 6.8
The approximation slightly underestimates the exact answers of 6 and
8 levels.
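The exact level counts and the approximation can be compared with a short Python sketch. It assumes k is the number of summands, that each level groups as many summands as possible into threes, and that leftover summands pass through unchanged; the function names are hypothetical.

```python
import math


def levels_3_2(k: int) -> int:
    """Exact number of 3-2 reduction levels needed to bring k summands down to 2."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3      # each group of 3 becomes 2; leftovers pass through
        levels += 1
    return levels


def approx_levels_3_2(k: int) -> float:
    """Approximation L = 1.7 log2 k - 1.7."""
    return 1.7 * math.log2(k) - 1.7


for k in (16, 32):
    print(k, levels_3_2(k), round(approx_levels_3_2(k), 1))
```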
11. Indicate generally how to modify the sequential multiplier circuit
diagram to implement multiplication of 2’s-complement n-bit numbers
using the Booth algorithm, by clearly specifying inputs and outputs for
the Control sequencer and any other changes needed around the adder
and register A.
Both the A and M registers are augmented by one bit to the left to hold a
sign extension bit. The adder is changed to an (n + 1)-bit adder. A bit is
added to the right end of the Q register to implement the Booth multiplier
recoding operation. It is initially set to zero. The control logic decodes
the two bits at the right end of the Q register according to the Booth
algorithm. The right shift is an arithmetic right shift, as indicated by the
repetition of the extended sign bit at the left end of the A register.
12. (a) Apply the Booth algorithm to the multiplier bit pattern
11111111. How many addition operations are saved?
(b) Similarly, if a bit pattern consists of a hundred 1s,
how many addition operations are saved?
(a) Treating the run of eight 1s as preceded by a 0:
 0  1 1 1 1 1 1 1  1
+1  0 0 0 0 0 0 0 −1
Eight additions are replaced by one addition and one subtraction,
so six addition operations are saved.
(b) For any run of N consecutive 1s, N − 2 addition operations
are saved. For a hundred 1s, 98 addition operations are saved.
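A small Python sketch of Booth recoding makes the saving visible. It assumes the multiplier is given as a bit string, MSB first, and that the run of 1s is preceded by a 0 so the pattern is a positive number; the names `booth_recode` and `recoded_value` are hypothetical.

```python
def booth_recode(bits: str) -> list[int]:
    """Return Booth digits (+1, 0, -1), MSB first, for a bit string given MSB first."""
    padded = bits + "0"                        # implied 0 to the right of the LSB
    # The digit at each position is (bit to the right) minus (bit itself).
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]


def recoded_value(digits: list[int]) -> int:
    """Evaluate a signed-digit string, MSB first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


digits = booth_recode("011111111")             # eight 1s preceded by a 0
print(digits)
print(sum(1 for d in digits if d != 0), "add/subtract operations instead of 8 additions")
```

The recoded digits still represent the same number, which `recoded_value` can confirm.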
13. The two numbers given below are multiplied using Booth’s
algorithm.
Multiplicand: 0101 1010 1110 1110   Multiplier: 0111 0111 1011 1101
How many additions/subtractions are required for the multiplication of
the above two numbers? (GATE 2008)
(A) 6 (B) 8 (C) 10 (D) 12
Booth’s algorithm recodes the multiplier, which is given here in binary
as a positive 2’s-complement number, so no complementing is needed:
0111 0111 1011 1101
Append a 0 to the right of the LSB:
0111 0111 1011 1101 0
Then examine each pair of adjacent bits, moving one bit at a time,
and recode it as
00 = 0, 01 = +1, 10 = −1, 11 = 0
The pairs, listed from MSB to LSB, are
01, 11, 11, 10, 01, 11, 11, 11, 10, 01, 11, 11, 11, 10, 01, 10
= +1 0 0 −1 +1 0 0 0 −1 +1 0 0 0 −1 +1 −1
Therefore, 4 subtractions and 4 additions, a total of 8
additions/subtractions, are required, which is option (B).
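Since every nonzero Booth digit corresponds to a 0/1 boundary in the multiplier, the count can be obtained by scanning adjacent bit pairs. The sketch below assumes the multiplier is written MSB first (spaces allowed); the function name `booth_operation_count` is hypothetical.

```python
def booth_operation_count(multiplier: str) -> tuple[int, int]:
    """Return (additions, subtractions) needed for a multiplier given MSB first."""
    padded = multiplier.replace(" ", "") + "0"     # append the implied 0 after the LSB
    additions = subtractions = 0
    for left, right in zip(padded, padded[1:]):
        if (left, right) == ("0", "1"):            # pair 01 recodes to +1
            additions += 1
        elif (left, right) == ("1", "0"):          # pair 10 recodes to -1
            subtractions += 1
    return additions, subtractions


adds, subs = booth_operation_count("0111 0111 1011 1101")
print(adds, subs, adds + subs)
```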
Floating-Point Numbers and Operations
Floating-point (FP) numbers
• IEEE standard 754-2008 defines representation and
operations for floating-point numbers
• The 32-bit single-precision format is:
A sign bit: S (0 for +, 1 for -)
An 8-bit signed exponent: E (base = 2)
A 23-bit mantissa fraction magnitude: M
FP numbers
• The value represented is
± 1.M × 2^E
0 10000101 0110 …
S E’ M
+1.0110… × 2^6
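The example pattern can be decoded with a few lines of Python. This sketch rests on two assumptions that should be read as such: the stored exponent E’ is the IEEE excess-127 form (E = E’ − 127), and the mantissa bits elided by "…" are all zero. The function name `decode_single` is hypothetical, and only normalized values are handled.

```python
import struct


def decode_single(bits: str) -> tuple[int, int, float]:
    """Decode a normalized 32-bit pattern 'S E' M' into (sign, E, value)."""
    bits = bits.replace(" ", "")
    sign = int(bits[0])
    exponent = int(bits[1:9], 2) - 127             # E = E' - 127 (excess-127)
    fraction = int(bits[9:], 2) / 2 ** 23          # the M field as a fraction
    value = (-1) ** sign * (1 + fraction) * 2.0 ** exponent
    return sign, exponent, value


# Assumption: the mantissa bits not shown in the example are zeros.
pattern = "0 10000101 0110" + "0" * 19
print(decode_single(pattern))

# Cross-check against the machine's own single-precision decoding.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
print(struct.unpack(">f", raw)[0])
```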
1.110010…
4. C = 0 10000101 110010…
FP Multiplication
• Multiply procedure:
Result = 1.1100
Implementation of FP operations
• A considerable amount of logic circuitry is needed to
implement floating-point operations in hardware, especially
if high performance is needed
• It is also possible to implement floating-point operations in
software
• A hardware addition/subtraction unit is shown in the next
figure
12. Convert the decimal fraction 0.1 to a binary fraction. If the
conversion is not exact, give the binary fraction approximation to 8
bits after the binary point using each of the three truncation
methods.
Multiplying the decimal fraction 0.1 by 2 repeatedly generates the
sequence of bits 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, . . . to the left of the
decimal point. The sequence continues indefinitely, repeating the
pattern 0, 0, 1, 1, so the conversion is not exact.
• Truncation by chopping gives 0.00011001
• Truncation by von Neumann rounding gives 0.00011001
• Truncation by rounding gives 0.00011010
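The three truncation methods can be reproduced exactly with Python's `fractions` module. This is a sketch for non-negative fractions less than 1; the function names are hypothetical, and a rounding carry out of the fraction (possible for values very close to 1) is not handled.

```python
from fractions import Fraction


def fraction_bits(value: Fraction, count: int) -> list[int]:
    """Generate binary fraction bits by repeated multiplication by 2."""
    bits = []
    for _ in range(count):
        value *= 2
        bit = int(value >= 1)
        bits.append(bit)
        value -= bit
    return bits


def truncate(value: Fraction, keep: int = 8) -> tuple[str, str, str]:
    """Return (chopping, von Neumann rounding, rounding) to `keep` fraction bits."""
    scaled = value * 2 ** keep
    chopped = int(scaled)                          # discard the removed bits
    remainder = scaled - chopped                   # removed bits as a fraction of one LSB
    von_neumann = (chopped | 1) if remainder > 0 else chopped
    half = Fraction(1, 2)
    if remainder > half or (remainder == half and chopped & 1):
        rounded = chopped + 1                      # round up; ties go to the even value
    else:
        rounded = chopped

    def show(n: int) -> str:
        return "0." + format(n, f"0{keep}b")

    return show(chopped), show(von_neumann), show(rounded)


print(fraction_bits(Fraction(1, 10), 13))
print(truncate(Fraction(1, 10)))
```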
13. Consider the following 12-bit floating-point number
representation format that is manageable for working through
numerical exercises. The first bit is the sign of the number. The next
five bits represent an excess-15 exponent for the scale factor, which
has an implied base of 2. The last six bits represent the fractional
part of the mantissa, which has an implied 1 to the left of the binary
point. Perform Subtract and Multiply operations on the operands
which represent the numbers
A = 1.011011 × 2^2 and B = −1.101010 × 2^0
Subtraction:
According to the Add/Subtract rule, perform the following four steps:
1. Shift the mantissa of B to the right by two bit positions, giving
0.01101010.
2. Set the exponent of the result to 10001.
3. Subtract the mantissa of B from the mantissa of A by adding the
mantissas, because B is negative, giving 1.11010110, and set the sign
of the result to 0 (positive).
4. The result is in normalized form, but the fractional part of the mantissa
needs to be truncated to six bits. If this is done by rounding, the two bits
to be removed represent the tie case, so round to the nearest even number
by adding 1, obtaining a result mantissa of 1.110110. The answer is
0 10001 110110.
Multiplication
According to the Multiplication rule, perform the following three
steps:
1. Add the exponents and subtract 15 to obtain 10001 as the exponent of
the result.
2. Multiply the mantissas to obtain 10.010110101110 as the mantissa of the
result. The sign of the result is set to 1 (negative).
3. Normalize the resulting mantissa by shifting it to the right by one bit
position. Then add 1 to the exponent to obtain 10010 as the exponent of
the result. Truncate the mantissa fraction to six bits by rounding to obtain
the result, 1 10010 001011.
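The subtraction and multiplication steps for these two operands can be traced with integer arithmetic on the mantissas. This is a sketch for this specific pair only: the function names are hypothetical, mantissas are held as integers scaled by 2^6, rounding is to nearest with ties to even, and the subtraction path assumes the mantissa sum is already normalized, which holds here.

```python
BIAS = 15          # excess-15 exponent
FRAC_BITS = 6      # fraction bits kept after the implied 1


def round_ties_even(value: int, drop: int) -> int:
    """Drop the low `drop` bits of value, rounding to nearest, ties to even."""
    kept = value >> drop
    removed = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if removed > half or (removed == half and kept & 1):
        kept += 1
    return kept


def pack(sign: int, exponent: int, mantissa: int) -> str:
    """Format sign, excess-15 exponent, and the six fraction bits of 1.ffffff."""
    return f"{sign} {exponent + BIAS:05b} {mantissa & 0b111111:06b}"


def subtract_example() -> str:
    ma, ea = 0b1011011, 2          # A = +1.011011 x 2^2
    mb, eb = 0b1101010, 0          # |B| = 1.101010 x 2^0, B is negative
    shift = ea - eb                # align B to the larger exponent
    total = (ma << shift) + mb     # A - B adds magnitudes: 1.11010110
    return pack(0, ea, round_ties_even(total, shift))


def multiply_example() -> str:
    ma, ea = 0b1011011, 2
    mb, eb = 0b1101010, 0
    product = ma * mb              # 10.010110101110, scaled by 2^12
    exponent = ea + eb
    drop = FRAC_BITS               # 12 fraction bits down to 6
    if product >= (2 << 12):       # mantissa is 2 or more: normalize
        drop += 1
        exponent += 1
    return pack(1, exponent, round_ties_even(product, drop))


print(subtract_example())
print(multiply_example())
```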
14. Consider that floating-point numbers are represented in a 12-bit
format. The scale factor has an implied base of 2 and a 5-bit, excess-15
exponent, with the two end values of 0 and 31 used to signify exact 0 and
infinity, respectively. The 6-bit mantissa is normalized as in the IEEE
format, with an implied 1 to the left of the binary point.
(a) Represent the numbers +1.7, −0.012, +19, and 1/8 in this format.
+1.7 0 01111 101101
−0.012 1 01000 100010
+19 0 10011 001100
1/8 0 01100 000000
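The four representations can be reproduced with a small encoder for this 12-bit format. It is a sketch under stated assumptions: the function name `encode12` is hypothetical, the mantissa is rounded to nearest with ties to even, the input must be nonzero, and the reserved exponent values 0 and 31 are not produced or checked.

```python
from fractions import Fraction


def encode12(value) -> str:
    """Encode a nonzero number as 'S EEEEE MMMMMM' (excess-15, implied leading 1)."""
    value = Fraction(value)
    if value == 0:
        raise ValueError("exact 0 uses the reserved exponent value 0")
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    exponent = 0
    while magnitude >= 2:                 # normalize to 1.xxxxxx
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1
    scaled = magnitude * 64               # 1.ffffff scaled by 2^6, in [64, 128)
    mantissa = int(scaled)
    remainder = scaled - mantissa
    half = Fraction(1, 2)
    if remainder > half or (remainder == half and mantissa & 1):
        mantissa += 1                     # round to nearest, ties to even
    if mantissa == 128:                   # rounding overflowed to 10.000000
        mantissa = 64
        exponent += 1
    return f"{sign} {exponent + 15:05b} {mantissa - 64:06b}"


for text in ("1.7", "-0.012", "19", "1/8"):
    print(text, encode12(Fraction(text)))
```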
(b) What are the smallest and largest numbers representable in
this format?
Other than exact 0 and ±infinity, the smallest numbers are
±1.000000 × 2^−14 and the largest numbers are ±1.111111 × 2^15.
(c) How does the range calculated in part (b) compare to the
ranges of a 12-bit signed integer and a 12-bit signed fraction?
Assuming sign-and-magnitude format, the smallest and largest integers
(other than 0) are ±1 and ±(2^11 − 1); and the smallest and largest fractions
(other than 0) are ±2^−11 and approximately ±1.
(d) Perform Add, Subtract, Multiply, and Divide operations on
the operands
A + B = 0 10000 000000
A − B = 0 10000 110110
A × B = 1 10000 001011
A / B = 1 10000 101110