CO Unit 2

l

Uploaded by

duskdelight763
Copyright
© © All Rights Reserved
UNIT 2

Arithmetic Unit
This unit focuses on
Addition and subtraction of two numbers, which are basic operations at the machine-instruction level in all computers. These operations, as well as other arithmetic and logic operations, are implemented in the arithmetic and logic unit (ALU) of the processor.
Objectives
To know the logic circuits used to implement arithmetic operations. The time needed to perform addition or subtraction affects the processor’s performance. Multiply and divide operations, which require more complex circuitry than either addition or subtraction operations, also affect performance.
To know some of the techniques used in modern computers to perform arithmetic operations at high speed. Operations on floating-point numbers are also described.
Addition/subtraction of signed numbers
At the ith stage:
Input: xi and yi are the operand bits, and ci is the carry-in
Output: si is the sum, and ci+1 is the carry-out to the (i+1)st stage
The logic expression for si can be implemented with a 3-input XOR gate. The carry-out function, ci+1, is implemented with an AND-OR circuit. A complete circuit for a single stage of binary addition is called a full adder (FA).
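As a minimal sketch of the single stage described above, the sum and carry-out can be written directly from the logic expressions. The function name `full_adder` is an illustrative assumption, not something the slides define.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """One stage of binary addition: returns (sum bit, carry-out bit)."""
    s = x ^ y ^ c_in                            # 3-input XOR gate
    c_out = (x & y) | (x & c_in) | (y & c_in)   # AND-OR circuit
    return s, c_out


# Check all eight input combinations against ordinary integer addition.
for x in (0, 1):
    for y in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(x, y, c)
            assert 2 * c_out + s == x + y + c
```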
An n-bit ripple-carry adder
•Cascade n full adder (FA) blocks to form an n-bit adder.
•Carries propagate, or ripple, through this cascade, so the configuration is called an n-bit ripple-carry adder.
The carry-in, c0, into the least-significant-bit (LSB) position provides a convenient way of adding 1 to a number. For instance, forming the 2’s-complement of a number involves adding 1 to the 1’s-complement of the number. So, the same circuit is used for addition and subtraction.
Cascade of k n-bit adders
kn-bit numbers can be added by cascading k n-bit adders.
Addition/Subtraction Logic Unit
n-bit adder/subtractor

•Add/sub control = 0,
addition.
•Add/sub control = 1,
subtraction.
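The following sketch models the n-bit adder/subtractor as a ripple of full-adder stages. It assumes the usual arrangement in which the Add/Sub control line feeds c0 and also complements each yi through an XOR gate; the slides only state the meaning of the control values, so that wiring, the function name `add_sub`, and its return convention are assumptions. The overflow flag uses the cn ⊕ cn−1 rule.

```python
def add_sub(x: int, y: int, n: int, sub: int) -> tuple[int, int, int]:
    """n-bit ripple-carry adder/subtractor.

    x and y are n-bit patterns; sub = 0 adds, sub = 1 subtracts (x - y).
    Returns (n-bit result pattern, carry-out c_n, overflow flag).
    """
    carry = sub            # c0 = Add/Sub control, adds the 1 for 2's complement
    carry_into_msb = 0
    result = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ sub      # XOR gate complements y when subtracting
        carry_into_msb = carry         # after the loop this holds c_{n-1}
        si = xi ^ yi ^ carry
        carry = (xi & yi) | (xi & carry) | (yi & carry)
        result |= si << i
    overflow = carry ^ carry_into_msb  # c_n XOR c_{n-1}
    return result, carry, overflow
```

For example, with n = 4 the call `add_sub(3, 5, 4, 1)` produces the pattern 1110, which is −2 in 2’s-complement form.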
Computing the add time

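Consider the 0th stage, a full adder with inputs x0, y0, and c0 and outputs s0 and c1:
•c1 is available after 2 gate delays.
•s0 is available after 1 gate delay.
[Figure: sum and carry logic circuits of a full adder, with inputs xi, yi, ci and outputs si, ci+1]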
Computing the add time (contd..)
Cascade of 4 full adders, or a 4-bit adder
[Figure: four FA blocks in cascade, with operand inputs x3 y3 to x0 y0, carry-in c0, carries c4 to c1, and sum outputs s3 to s0]
•s0 available after 1 gate delay, c1 available after 2 gate delays.
•s1 available after 3 gate delays, c2 available after 4 gate delays.
•s2 available after 5 gate delays, c3 available after 6 gate delays.
•s3 available after 7 gate delays, c4 available after 8 gate delays.

For an n-bit adder, sn-1 is available after 2n-1 gate delays and cn is available after 2n gate delays.
Design of Fast Adders

Two approaches can be taken to reduce delay in adders.


The first approach is to use the fastest possible electronic
technology.
The second approach is to use a logic gate network called a
carry-lookahead network.
Carry-Lookahead Addition
The logic expressions for si (sum) and ci+1 (carry-out) of stage i are
$s_i = x_i \oplus y_i \oplus c_i$ and $c_{i+1} = x_i y_i + x_i c_i + y_i c_i$
The second equation can be written as: $c_{i+1} = x_i y_i + (x_i + y_i) c_i$
$c_{i+1} = G_i + P_i c_i$, where $G_i = x_i y_i$ and $P_i = x_i + y_i$
The expressions Gi and Pi are called the generate and
propagate functions for stage i.

In a CLA, Gi and Pi are computed only from xi and yi, not from ci, so all of them can be formed independently and in parallel, one logic-gate delay after the X and Y operands are applied to the inputs of an n-bit adder.
If the generate function for stage i is equal to 1, then ci+1 = 1, independent of the input carry, ci. This occurs when both xi and yi are 1.
The propagate function means that an input carry will produce an output carry when either xi is 1 or yi is 1.
Each bit stage contains an AND gate to form Gi, an OR gate to form Pi, and a three-input XOR gate to form si.
A simpler circuit can be derived by observing that an adequate propagate function can be realized as Pi = xi ⊕ yi, which differs from Pi = xi + yi only when xi = yi = 1. But in this case Gi = 1, so it does not matter whether Pi is 0 or 1.
A 4-bit carry-lookahead adder
Let us consider the design of a 4-bit adder. Starting from ci+1 = Gi + Pici, the carries can be implemented as
$c_1 = G_0 + P_0 c_0$
$c_2 = G_1 + P_1 c_1 = G_1 + P_1 G_0 + P_1 P_0 c_0$
$c_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 c_0$
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$
In general, substituting for ci gives
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} c_{i-1}$
Continuing this type of expansion, the final expression for any carry variable is
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$
Thus, all carries can be obtained three gate delays after the input operands X, Y, and c0 are applied, because one gate delay is needed to develop all Pi and Gi signals, followed by two gate delays in the AND-OR circuit for ci+1. After a further XOR gate delay (1 gate delay), all sum bits are available.
•Independent of n, n-bit addition requires only 4 gate delays, provided gates with the required fan-in are available.
This is called a carry-lookahead adder.

Delay through the adder is 3 gate delays for all carry bits and 1 more gate delay for all sum bits, 4 gate delays in total. In comparison, a 4-bit ripple-carry adder requires 7 gate delays for sum and 8 gate delays for carry.
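To make the lookahead idea concrete, the sketch below computes every carry directly from the Gi, Pi, and c0 values using the expanded expression for ci+1, never from the previous carry. It is a software illustration only; the name `cla_add` and its parameters are assumptions, and the loops stand in for AND-OR gates that would all operate in parallel in hardware.

```python
def cla_add(x: int, y: int, c0: int = 0, n: int = 4) -> tuple[int, int]:
    """Carry-lookahead addition of two n-bit patterns: returns (sum, carry-out)."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    g = [xb[i] & yb[i] for i in range(n)]   # generate:  G_i = x_i y_i
    p = [xb[i] | yb[i] for i in range(n)]   # propagate: P_i = x_i + y_i

    carries = [c0]
    for i in range(n):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i...P_1 G_0 + P_i...P_0 c_0
        c = 0
        prod = 1                  # running product P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)

    total = 0
    for i in range(n):
        total |= (xb[i] ^ yb[i] ^ carries[i]) << i   # s_i = x_i XOR y_i XOR c_i
    return total, carries[n]
```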
Delay in 4-bit adder
• Ripple-carry design:
8 gate delays for carry (2 for each FA × 4)
7 gate delays for sum
• Carry-lookahead design:
1 gate delay for all Pi and Gi
2 gate delays for all ci
1 gate delay for all si
4 gate delays in total
Blocked Carry-Lookahead adder
Carry-out from a 4-bit block can be given as:
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$
Rewrite this as $c_4 = G_0^I + P_0^I c_0$, where
$P_0^I = P_3 P_2 P_1 P_0$
$G_0^I = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$
The superscript I denotes the blocked carry-lookahead functions, and the subscript identifies the block.
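When four 4-bit adders are cascaded, c16 can be expressed as:
$c_{16} = G_3^I + P_3^I G_2^I + P_3^I P_2^I G_1^I + P_3^I P_2^I P_1^I G_0^I + P_3^I P_2^I P_1^I P_0^I c_0$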
Multiplication
Multiplication of unsigned numbers example 1

Product of 2 n-bit numbers is at most a 2n-bit number.


Unsigned multiplication can be viewed as addition of
shifted versions of the multiplicand.
Multiplication of unsigned numbers example 2
Multiplication of unsigned numbers

Typical multiplication cell


Combinatorial array multiplier
The main component in each cell is a full adder, FA. The AND gate in each cell determines whether a multiplicand bit, mj, is added to the incoming partial-product bit, based on the value of the multiplier bit, qi. Each row i, where 0 ≤ i ≤ 3, adds the multiplicand (appropriately shifted) to the incoming partial product, PPi, to generate the outgoing partial product, PP(i + 1), if qi = 1. If qi = 0, PPi is passed vertically downward unchanged. PP0 is all 0s, and PP4 is the desired product. The multiplicand is shifted left one position per row by the diagonal signal path.
The worst-case signal propagation delay path is from the
upper right corner of the array to the high-order product bit
output at the bottom left corner of the array. This critical path
consists of the staircase pattern that includes the two cells at
the right end of each row, followed by all the cells in the
bottom row.
[Figure: array multiplier with its rows labeled Row 1 to Row 5]
There are two gate delays from the inputs to the outputs of a full-adder block, FA, and the critical path runs from Row 1 to Row 5.
• Combinatorial array multipliers:
– Are extremely inefficient in their use of gates.
– Have a high gate count for multiplying numbers of practical size, such as 32-bit or 64-bit numbers. For example, a 32 × 32-bit multiplication requires 1024 AND gates and 1024 FA circuits, so a large area is needed to place all these gates.
– Perform only one function, namely, unsigned integer product.
• Gate efficiency can be improved by using a mixture of combinatorial array techniques and sequential techniques, which requires less combinational logic.
Sequential multiplication
• The combinational array multiplier uses a large number of
logic gates for multiplying numbers of practical size, such
as 32- or 64-bit numbers. Multiplication of two n-bit
numbers can also be performed in a sequential circuit that
uses a single n-bit adder.
Sequential multiplication Example
Sequential Circuit Multiplier
Registers A and Q are shift registers. Together, they hold
partial product PPi while multiplier bit qi generates the signal
Add/Noadd. This signal causes the multiplexer MUX to select
0 when qi = 0, or to select the multiplicand M when qi = 1, to
be added to PPi to generate PP(i + 1). The product is
computed in n cycles.
The partial product grows in length by one bit per cycle from
the initial vector, PP0, of n 0s in register A. The carry-out from
the adder is stored in flip-flop C, shown at the left end of
register A.
At the start, the multiplier is loaded into register Q, the
multiplicand into register M, and C and A are cleared to 0. At
the end of each cycle, C, A, and Q are shifted right one bit
position to allow for growth of the partial product as the
multiplier is shifted out of register Q.
Because of this shifting, multiplier bit qi appears at the LSB
position of Q to generate the Add/Noadd signal at the correct
time, starting with q0 during the first cycle, q1 during the
second cycle, and so on. After they are used, the multiplier bits
are discarded by the right-shift operation.
The carry-out from the adder is the leftmost bit of PP(i + 1),
and it must be held in the C flip-flop to be shifted right with
the contents of A and Q.
After n cycles, the high-order half of the product is held in
register A and the low-order half is in register Q.
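The register-level behavior described above can be sketched in software as follows. The variables c, a, q, and m mirror flip-flop C and registers A, Q, and M; the function name `sequential_multiply` is an assumption made for illustration.

```python
def sequential_multiply(m: int, q: int, n: int) -> int:
    """Shift-and-add multiplication of two unsigned n-bit numbers in n cycles."""
    mask = (1 << n) - 1
    a, c = 0, 0                       # A and C are cleared at the start
    for _ in range(n):
        if q & 1:                     # Add/Noadd: LSB of Q selects M or 0
            total = a + m
            c = total >> n            # carry-out of the n-bit adder
            a = total & mask
        # Shift C, A, and Q right one bit position as a single unit.
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q               # high half in A, low half in Q
```

With the 4-bit operands 1101 (13) and 1011 (11), the function returns 143, with 1000 left in A and 1111 left in Q.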
Multiplication of Signed Numbers
Signed Multiplication
Considering 2’s-complement signed operands, what will happen to
(-13)(+11) if following the same method of unsigned multiplication?
Normal Multiplication Booth Multiplication
Booth recoding of a multiplier

Booth multiplier recoding table


Booth recoded multipliers

A 16-bit worst-case multiplier, an ordinary multiplier, and a good multiplier are shown
• The transformation 011 . . . 110⇒+1 0 0. . . 0 −1 0 is
called skipping over 1s. This term is derived from the case
in which the multiplier has its 1s grouped into a few
contiguous blocks.
• Only a few versions of the shifted multiplicand (the
summands) need to be added to generate the product, thus
speeding up the multiplication operation.
• However, in the worst case—that of alternating 1s and 0s
in the multiplier—each bit of the multiplier selects a
summand. In fact, this results in more summands than if
the Booth algorithm were not used.
• The Booth algorithm has two attractive features:
1. It handles both positive and negative multipliers uniformly.
2. It achieves some efficiency in the number of additions required when the multiplier has a few large blocks of 1s.
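A small sketch of Booth recoding may help here. Each multiplier bit is compared with the bit to its right (with an implied 0 to the right of the LSB), giving the digits +1, 0, or −1. The function names `booth_recode` and `booth_multiply` are illustrative assumptions, and the multiplication is modeled with Python integers rather than registers.

```python
def booth_recode(multiplier: int, n: int) -> list[int]:
    """Booth digits (+1, 0, -1) of an n-bit 2's-complement pattern, LSB first."""
    digits = []
    prev = 0                          # implied 0 to the right of the LSB
    for i in range(n):
        bit = (multiplier >> i) & 1
        digits.append(prev - bit)     # pair 01 -> +1, 10 -> -1, 00 or 11 -> 0
        prev = bit
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two signed integers; the multiplier must fit in n bits."""
    digits = booth_recode(multiplier & ((1 << n) - 1), n)
    # Each nonzero digit adds or subtracts one shifted copy of the multiplicand.
    return sum(d * (multiplicand << i) for i, d in enumerate(digits))
```

Counting the nonzero digits returned by `booth_recode` gives the number of additions and subtractions the multiplier needs, which is how the "skipping over 1s" saving shows up.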
Booth multiplication with a negative multiplier
Fast Multiplication (or) High-speed multipliers

Neither the combinational array nor the sequential circuit multiplier is fast enough for high-performance processors.
Two approaches are used for higher speed:
1. Reduce the number of summands
2. Use more parallelism in adding them


Two techniques for speeding up the multiplication operation
1. Bit-Pair Recoding of Multipliers: This technique
guarantees that the maximum number of summands
(versions of the multiplicand) that must be added is n/2 for
n-bit operands.
2. Carry-Save Addition of Summands (CSA): The second
technique leads to adding the summands in parallel.

Bit-Pair Recoding of Multipliers


•It is derived directly from the Booth algorithm
Example of bit-pair recoding derived from Booth recoding

Table of multiplicand selection decisions


Example of bit-pair recoding derived from Booth recoding
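Bit-pair recoding can be sketched by examining bits i+1 and i of the multiplier together with bit i−1 on their right. Each pair then selects 0, ±1, or ±2 times the multiplicand, computed here as −2·b(i+1) + b(i) + b(i−1), which is what combining two adjacent Booth digits produces. The function names are assumptions, and n is assumed to be even.

```python
def bit_pair_recode(multiplier: int, n: int) -> list[int]:
    """Radix-4 digits in {-2, -1, 0, +1, +2}, least-significant pair first."""
    if n % 2:
        raise ValueError("n must be even; sign-extend the multiplier by one bit")

    def bit(i: int) -> int:
        return (multiplier >> i) & 1 if i >= 0 else 0   # implied 0 right of LSB

    return [-2 * bit(i + 1) + bit(i) + bit(i - 1) for i in range(0, n, 2)]


def bit_pair_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply signed integers using at most n/2 summands."""
    digits = bit_pair_recode(multiplier & ((1 << n) - 1), n)
    return sum(d * (multiplicand << (2 * k)) for k, d in enumerate(digits))
```

For the 6-bit multiplier 111010 (−6), the digits are −2, −1, 0 from the least-significant pair upward, so only two summands are nonzero.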
Carry-Save Addition of Summands
Ripple-carry array
[Figure: ripple-carry multiplication array producing product bits P7 to P0]
Carry-Save Addition of Summands
CSA speeds up the addition process.
[Figure: carry-save multiplication array producing product bits P7 to P0]
• Consider the 4 × 4 multiplication array in which the first
row consists of AND gates that produce the four inputs
m3q0,m2q0, m1q0, and m0q0.
• Instead of letting the carries ripple along the rows, they can be "saved" and introduced into the next row, at the correct weighted positions. This frees up an input to each of three full adders in the first row. These inputs can be used to introduce the third summand bits m2q2, m1q2, and m0q2.
• Now, two inputs of each of three full adders in the second row are fed by the sum and carry outputs from the first row.
• The third input is used to introduce the bits m2q3, m1q3, and
m0q3 of the fourth summand. The high-order bits m3q2 and
m3q3 of the third and fourth summands are introduced into
the remaining free full-adder inputs at the left end in the
second and third rows.
• The saved carry bits and the sum bits from the second row
are now added in the third row, which is a ripple-carry
adder, to produce the final product bits.
• The delay through the carry-save array is somewhat less
than the delay through the ripple-carry array. This is because
the S and C vector outputs from each row are produced in
parallel in one full-adder delay.
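The carry-save idea above can be sketched with whole-word operations: each row produces a sum vector S and a saved-carry vector C without propagating carries, and only the last step performs an ordinary addition. The function names are assumptions, and the final `s + c` stands in for the ripple-carry or carry-lookahead adder in the last row.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3-2 reduction: returns (S, C) such that a + b + c == S + C."""
    s = a ^ b ^ c                                   # per-bit sum, no carry ripple
    carry = ((a & b) | (a & c) | (b & c)) << 1      # saved carries, moved one position left
    return s, carry


def multiply_csa(m: int, q: int, n: int = 4) -> int:
    """Unsigned n x n multiplication (n >= 2) with carry-save rows."""
    summands = [(m << i) if (q >> i) & 1 else 0 for i in range(n)]
    s, c = summands[0], summands[1]
    for w in summands[2:]:
        s, c = carry_save_add(s, c, w)              # one full-adder delay per row
    return s + c                                    # final carry-propagate addition
```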
Summand Addition Tree using 3-2 Reducers
• Consider the addition of many summands:
– Group the summands in threes and perform carry-save addition on each of these groups in parallel to generate a set of S and C vectors in one full-adder delay.
– Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delay.
– Continue with this process until there are only two vectors remaining. They can be added in a ripple-carry adder (RCA) or a carry-lookahead adder (CLA) to produce the desired product.
– The adder at each bit position of the three summands is called a 3-2 reducer, and the logic circuit structure that reduces a number of summands to two is called a CSA tree.
Multiplication example used to illustrate carry-save addition
Schematic representation of the carry-save addition operations
Integer Division
Manual Division

Longhand Division Steps

Step 1: Position the divisor appropriately with respect to the dividend and perform a subtraction.
Step 2: If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
Step 3: If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.
Circuit arrangement for binary division
Two types of Division algorithm
1. Restoring Division ( Dividend is restored by adding back
the divisor)
2. Non Restoring Division
For both Restoring and Non-Restoring Division methods:
•An n-bit positive divisor is loaded into register M and an n-bit positive dividend is loaded into register Q at the start of the operation. Register A is set to 0.
•After the division is complete, the n-bit quotient is in register Q and the remainder is in register A.
•The required subtractions are facilitated by using 2’s-complement arithmetic.
•The extra bit position at the left end of both A and M accommodates the sign bit during subtractions.
Restoring Division
Steps:
•Shift A and Q left one binary position
•Subtract M from A, and place the answer back in A
•If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1
•Repeat these steps n times
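The restoring steps above can be sketched as follows. Python's signed integers stand in for the extra sign bit of register A, so "the sign of A is 1" becomes the test `a < 0`; the function name and argument order are assumptions.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Restoring division of unsigned n-bit numbers: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        # Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                    # subtract M from A
        if a < 0:                 # sign of A is 1: restore A, q0 stays 0
            a += m
        else:                     # sign of A is 0: set q0 to 1
            q |= 1
    return q, a
```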
Example
Non-Restoring Division
Stage 1: (Repeat n times)
If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Stage 2: If the sign of A is 1, add M to A. The remainder is in A.
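The two stages above translate into the sketch below. As a simplification, a signed Python integer models register A together with its sign bit; the function name is an assumption.

```python
def nonrestoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Non-restoring division of unsigned n-bit numbers: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend, divisor
    for _ in range(n):                        # Stage 1, repeated n times
        was_negative = a < 0                  # sign of A before the shift
        a = (a << 1) | ((q >> (n - 1)) & 1)   # shift A and Q left one position
        q = (q << 1) & mask
        a = a + m if was_negative else a - m  # add M if A was negative, else subtract
        if a >= 0:
            q |= 1                            # sign of A is 0: set q0 to 1
    if a < 0:                                 # Stage 2: final correction
        a += m
    return q, a
```

Unlike the restoring method, each cycle performs exactly one addition or subtraction; the only restore happens once at the end.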
Example
1. How many logic gates are needed to build the 16-bit
carry-lookahead adder built from 4-bit adders?
Each B cell requires 3 gates. Hence, 12 gates are needed for all four B cells. The carries c1, c2, c3, and c4, produced by the carry-lookahead logic, require 2, 3, 4, and 5 gates, respectively. The carry-lookahead logic also produces $G_0^I$, using 4 gates, and $P_0^I$, using 1 gate. Hence, a total of 19 gates are needed to implement the carry-lookahead logic. The complete 4-bit adder requires 12 + 19 = 31 gates, with a maximum fan-in of 5.
$P_0^I = P_3 P_2 P_1 P_0$
$G_0^I = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$
4-bit carry-lookahead Adder
2. Assuming 6-bit 2’s-complement number representation, multiply the
multiplicand A= 110101 by the multiplier B = 011011 using both the
normal Booth algorithm and the bit-pair recoding Booth algorithm.
3. Show that the logic expression cn ⊕ cn−1 is a correct
indicator of overflow in the addition of 2’s-complement
integers by using an appropriate truth table.
Consider the truth table of Full Adder. Overflow occurs only
when xn−1 and yn−1 are the same and sn−1 is different. This
occurs in the second and seventh rows of the table; and cn and
cn−1 are different only in those rows. Therefore, cn ⊕ cn−1 is a
correct indicator of overflow.
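The truth-table argument can also be checked mechanically. The sketch below enumerates the eight rows of the full-adder table for the sign position, with c standing for cn−1, and confirms that the overflow condition matches cn ⊕ cn−1 in every row. The function name is an assumption.

```python
def overflow_rows() -> list[tuple[int, int, int]]:
    """Return the (x, y, c_{n-1}) rows of the sign-bit stage where overflow occurs."""
    rows = []
    for x in (0, 1):
        for y in (0, 1):
            for c in (0, 1):                      # c is the carry into the sign stage
                s = x ^ y ^ c
                c_n = (x & y) | (x & c) | (y & c)
                overflow = int(x == y and s != x)  # same operand signs, different result sign
                assert overflow == c_n ^ c         # the rule holds in every row
                if overflow:
                    rows.append((x, y, c))
    return rows
```

The two rows returned are (0, 0, 1) and (1, 1, 0), the second and seventh rows of the table.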
4. Show that the worst-case delay through an n × n array of the type shown in the figure is 6(n − 1) − 1 gate delays.
No full adders are needed in the first row of the array because the incoming partial product PP0 is zero. Each of the two FA blocks in rows 2 through n − 1 introduces 2 gate delays, for a total of 4(n − 2) gate delays. Row n introduces 2n gate delays. Adding in the initial AND gate delay for row 1 and all other cells, the total delay is:
4(n − 2) + 2n + 1 = 6n − 8 + 1 = 6(n − 1) − 1
5. Multiply each of the following pairs of signed 2’s-complement numbers using the Booth algorithm. In each case, assume that A is the multiplicand and B is the multiplier.
(a) A = 010111 and B = 110110
(b) A = 110011 and B = 101100
(c) A = 110101 and B = 011011
(d) A = 001111 and B = 001111
6.Multiply each of the following pairs of signed 2’s-
complement numbers using the bit-pair recoding of the
multiplier. In each case, assume that A is the multiplicand and
B is the multiplier.
(a) A = 010111 and B = 110110
(b) A = 110011 and B = 101100
(c) A = 110101 and B = 011011
(d) A = 001111 and B = 001111
7. Using manual methods, perform the operations A × B and A ÷ B
on the 5-bit unsigned numbers A = 10101 and B = 00101.
8. Show how the multiplication and division operations would
be performed by the hardware and construct the charts to
perform the operations A × B and A ÷ B on the 5-bit unsigned
numbers A= 10101 and B = 00101
The multiplication chart is
The division chart is
9. How many levels of 4-2 reducers are needed to reduce k summands to
2 in a reduction tree? How many levels are needed if 3-2 reducers are
used?
Let the number of levels be L. For 4-2 reducers, we have $k(2/4)^L = 2$, where the denominator represents the number of inputs and the numerator represents the number of outputs. This reduces to $k(1/2)^L = 2$. Take logarithms to the base 2 of each side of this equation to derive
$\log_2(k(1/2)^L) = \log_2 2$
$\log_2 k + \log_2(1/2^L) = 1$
$\log_2 k + \log_2 1 - \log_2 2^L = 1$
$\log_2 k + 0 - L \log_2 2 = 1$
$\log_2 k - L = 1 \Rightarrow L = \log_2 k - 1$

For 3-2 reducers, we have $k(2/3)^L = 2$. Taking logarithms to the base 2, we derive
$\log_2 k + L(\log_2 2 - \log_2 3) = \log_2 2$
$\log_2 k + L(1 - 1.59) = 1$
$L = (1 - \log_2 k)/(-0.59) = 1/(-0.59) - \log_2 k/(-0.59) = -1.7 + 1.7\log_2 k$
$L = 1.7\log_2 k - 1.7$
How to find $\log_2 3$?
$\log_b a = \log_n a / \log_n b$, so $\log_2 3 = \log_{10} 3 / \log_{10} 2 = 0.4771/0.3010 = 1.585$
These expressions are only approximations unless the number of input
summands to each level is a multiple of 4 in the case of 4-2 reduction, or
is a multiple of 3 in the case of 3-2 reduction.

10. (a) How many 3-2 reduction levels are needed to reduce 16
summands to 2?
(b) Show that the claim that 8 levels are required for reducing 32 summands to 2 is correct.
(c) Compare the exact answers in parts (a) and (b) to the results obtained by using the approximation developed for 3-2 reduction.
(a) Six 3-2 reduction levels are needed:
(b) To show that eight 3-2 reduction levels are needed:
(c) For 3-2 reducers, L = 1.7 log2 k − 1.7.
Here k is the number of summands, so k = 16 in part (a) and k = 32 in part (b).
Substitute k = 16: $L = 1.7\log_2 2^4 - 1.7 = (4 \times 1.7) - 1.7 = 5.1$, compared with the exact answer of 6 levels.
Substitute k = 32: $L = 1.7\log_2 2^5 - 1.7 = (5 \times 1.7) - 1.7 = 6.8$, compared with the exact answer of 8 levels.
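The exact level counts can be checked by simulating the reduction. At each level, every complete group of three vectors becomes two, and any leftover vectors pass through unchanged. The function names below are assumptions; the approximation is the 3-2 formula derived in problem 9.

```python
import math


def csa_levels(k: int) -> int:
    """Exact number of 3-2 reduction levels needed to reduce k summands to 2."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3   # each full group of 3 yields 2; leftovers pass through
        levels += 1
    return levels


def approx_levels(k: int) -> float:
    """Approximation L = 1.7 log2(k) - 1.7 for 3-2 reducers."""
    return 1.7 * math.log2(k) - 1.7
```

For 16 summands the vector counts go 16, 11, 8, 6, 4, 3, 2 (six levels), and for 32 summands they go 32, 22, 15, 10, 7, 5, 4, 3, 2 (eight levels), so the approximation underestimates both cases.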
11. Indicate generally how to modify the sequential multiplier circuit
diagram to implement multiplication of 2’s-complement n-bit numbers
using the Booth algorithm, by clearly specifying inputs and outputs for
the Control sequencer and any other changes needed around the adder
and register A.

Both the A and M registers are augmented by one bit to the left to hold a
sign extension bit. The adder is changed to an n + 1-bit adder. A bit is
added to the right end of the Q register to implement the Booth multiplier
recoding operation. It is initially set to zero. The control logic decodes
the two bits at the right end of the Q register according to the Booth
algorithm. The right shift is an arithmetic right shift as indicated by the
repetition of the extended sign bit at the left end of the A register.
12. a. Apply Booth algorithm for a bit pattern for the
multiplier 11111111. How many addition operations are to be
reduced?
b. Similarly suppose a bit pattern consists of hundred 1’s ,
how many addition operations are reduced?

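a). With a 0 assumed to the left of the block of 1s:
0 1 1 1 1 1 1 1 1
+1 0 0 0 0 0 0 0 -1
Only two operations (one addition and one subtraction) remain instead of eight additions, so six addition operations are saved.
b). For any block of N consecutive 1’s, N − 2 addition operations are saved. For a hundred 1’s, 98 addition operations are saved.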
13.The two numbers given below are multiplied using the Booth’s
algorithm.
Multiplicand : 0101 1010 1110 1110 Multiplier: 0111 0111 1011 1101
How many additions/Subtractions are required for the multiplication of
the above two numbers? (Gate 2008)
(A) 6 (B) 8 (C) 10 (D) 12
Booth’s algorithm: write the multiplier in 2’s-complement form, then append a 0 to the right of the LSB.
Then recode each pair of adjacent bits (bit i, bit i − 1), moving one bit at a time:
00 = 0, 01 = +1, 10 = -1, 11 = 0
Booth’s algorithm is based on the multiplier, which is already given in binary representation: 0111 0111 1011 1101
Appending 0 to the right of the LSB gives 0111 0111 1011 1101 0
The overlapping bit pairs, listed from MSB to LSB, are:
01, 11, 11, 10, 01, 11, 11, 11, 10, 01, 11, 11, 11, 10, 01, 10
= +1 0 0 -1 +1 0 0 0 -1 +1 0 0 0 -1 +1 -1
Therefore, 4 subtractions and 4 additions, a total of 8 additions/subtractions, are required. The answer is (B).
Floating-Point Numbers and Operations
Floating-point (FP) numbers
• IEEE standard 754-2008 defines representation and
operations for floating-point numbers
• The 32-bit single-precision format is:
A sign bit: S (0 for +, 1 for -)
An 8-bit signed exponent: E (base = 2)
A 23-bit mantissa fraction magnitude:M
FP numbers
• The value represented is $\pm 1.M \times 2^{E}$
• E is actually encoded as E’ = E + 127, which is called an excess-127 representation


FP numbers
• Example of 32-bit number representation:

0 10000101 0110…
(fields, left to right: S, E’, M)
• Value represented (with E = E’ − 127 = 133 − 127 = 6): $+1.0110\ldots \times 2^{6}$
• This is called a normalized representation, with the binary point to the right of the first significant bit
• 64-bit double-precision is similar, with more bits for E’ and M
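A short sketch can confirm the single-precision example. It decodes the S, E’, and M fields by hand and cross-checks the result with Python's standard `struct` module. The function names are assumptions, the trailing mantissa bits hidden by "…" are assumed to be zeros, and only normalized values are handled (the special exponent values 0 and 255 are ignored).

```python
import struct


def decode_single(bits: str) -> float:
    """Decode a 32-character bit string laid out as S, E' (8 bits), M (23 bits)."""
    sign = int(bits[0])
    e_prime = int(bits[1:9], 2)
    m = int(bits[9:], 2)
    # Value is +/- 1.M x 2^(E' - 127) for normalized numbers.
    return (-1) ** sign * (1 + m / 2 ** 23) * 2.0 ** (e_prime - 127)


def decode_with_struct(bits: str) -> float:
    """Same decoding done by the standard library, used as a cross-check."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]


example = "0" + "10000101" + "0110" + "0" * 19   # assumes the elided bits are 0
assert decode_single(example) == decode_with_struct(example) == 88.0
```

Here 1.0110 in binary is 1.375, and 1.375 × 2^6 = 88.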


FP Addition/Subtraction
• Add/Subtract procedure:
1. Shift mantissa of number with smaller exponent to the right
2. Set exponent of result to larger exponent
3. Perform addition/subtraction of mantissas and set sign of result
4. Normalize the result, if necessary
FP Addition example
• Perform C = A+ B for
A = 0 10000101 0110…
B = 0 10000011 1010…
1. Shift mantissa of B two places to right
2. Set exponent of C to 10000101
3. Add mantissas
1.011000…
+ 0.011010…

1.110010…
4. C = 0 10000101 110010…
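The four steps of the example can be traced on the E’ and M fields directly. The sketch below handles only the case shown in the example, two positive normalized operands, and simply drops any bits shifted out on the right; the function name, argument layout, and that simplification are assumptions.

```python
def fp_add_positive(ea: int, ma: int, eb: int, mb: int,
                    frac_bits: int = 23) -> tuple[int, int]:
    """Add two positive normalized numbers given as (E', M) fields.

    Returns the (E', M) fields of the sum. Bits shifted out during
    alignment or normalization are discarded (chopped).
    """
    sig_a = (1 << frac_bits) | ma          # restore the implied leading 1
    sig_b = (1 << frac_bits) | mb
    if ea < eb:                            # make operand a the larger exponent
        ea, sig_a, eb, sig_b = eb, sig_b, ea, sig_a
    sig_b >>= ea - eb                      # step 1: align the smaller operand
    e = ea                                 # step 2: result takes the larger exponent
    total = sig_a + sig_b                  # step 3: add mantissas
    if total >> (frac_bits + 1):           # step 4: normalize if the sum reached 2.0
        total >>= 1
        e += 1
    return e, total & ((1 << frac_bits) - 1)


# The example from the slide, assuming the elided mantissa bits are zeros.
e, m = fp_add_positive(0b10000101, 0b0110 << 19, 0b10000011, 0b1010 << 19)
assert (e, m) == (0b10000101, 0b110010 << 17)
```

In decimal terms this is 88 + 26 = 114, which is 1.110010 × 2^6 in binary.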
FP Multiplication
• Multiply procedure:
1. Add exponents and subtract 127 (to maintain excess-127 representation)
2. Multiply mantissas, determine sign of result
3. Normalize result, if necessary
FP Division
• Divide procedure:
1. Subtract exponents and add 127 (to maintain excess-127 representation)
2. Divide mantissas, determine sign of result
3. Normalize result, if necessary

Truncation of FP mantissas

• The mantissa resulting from an arithmetic operation on two floating-point numbers may be longer than 24 bits
• It must be truncated to 24 bits (for 32-bit FP)
• The IEEE standard requires that rounding to the nearest 24-bit value is the truncation method to be used
Rounding an FP mantissa
• Consider examples of rounding an 8-bit mantissa to a 5-bit length to illustrate the rounding operation:
Ex. 1: Round 1.1011011
Result = 1.1011
Ex. 2: Round 1.1011110
1.1011
+ 0.0001

Result = 1.1100
Implementation of FP operations
• A considerable amount of logic circuitry is needed to
implement floating-point operations in hardware, especially
if high performance is needed
• It is also possible to implement floating-point operations in
software
• A hardware addition/subtraction unit is shown in the next
figure
12. Convert the decimal fraction 0.1 to a binary fraction. If the
conversion is not exact, give the binary fraction approximation to 8
bits after the binary point using each of the three truncation
methods.
Multiplying the decimal fraction 0.1 by 2 repeatedly, it generates the
sequence of bits 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, . . . to the left of the
decimal point, which continues indefinitely, repeating the pattern 0, 0, 1,
1.
• Truncation by chopping gives 0.00011001
• Truncation by von Neumann rounding gives 0.00011001
• Truncation by rounding gives 0.00011010
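The three truncation methods can be compared with exact rational arithmetic. This sketch uses Python's `fractions.Fraction` so that 0.1 is represented exactly; the function names are assumptions, and the code assumes a fraction 0 ≤ x < 1 whose rounded value does not carry into the integer position.

```python
from fractions import Fraction


def to_bits(x: Fraction, count: int) -> str:
    """First `count` bits after the binary point, by repeated multiplication by 2."""
    bits = []
    for _ in range(count):
        x *= 2
        bit = int(x)                 # integer part is the next bit
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


def truncate(x: Fraction, n: int) -> dict[str, str]:
    """Truncate x to n fraction bits by chopping, von Neumann rounding, and rounding."""
    scaled = x * 2 ** n
    kept = int(scaled)               # the n retained bits as an integer
    rest = scaled - kept             # value of the removed bits, 0 <= rest < 1
    von_neumann = (kept | 1) if rest else kept   # force LSB to 1 if any removed bit is 1
    half = Fraction(1, 2)
    round_up = rest > half or (rest == half and kept & 1)   # ties go to the even value
    rounded = kept + 1 if round_up else kept

    def fmt(v: int) -> str:
        return "0." + format(v, f"0{n}b")

    return {"chop": fmt(kept), "von_neumann": fmt(von_neumann), "round": fmt(rounded)}
```

Calling `truncate(Fraction(1, 10), 8)` reproduces the three 8-bit answers listed above.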
13.Consider the following 12-bit floating-point number
representation format that is manageable for working through
numerical exercises. The first bit is the sign of the number. The next
five bits represent an excess-15 exponent for the scale factor, which
has an implied base of 2. The last six bits represent the fractional
part of the mantissa, which has an implied 1 to the left of the binary
point. Perform Subtract and Multiply operations on the operands
which represent the numbers
$A = 1.011011 \times 2^{2}$ and $B = -1.101010 \times 2^{0}$
Subtraction:
According to the Add/Subtract rule, perform the following four steps:
1. Shift the mantissa of B to the right by two bit positions, giving 0.01101010.
2. Set the exponent of the result to 10001.
3. Subtract the mantissa of B from the mantissa of A by adding mantissas, because B is negative, giving 1.11010110, and set the sign of the result to 0 (positive).
4. The result is in normalized form, but the fractional part of the mantissa needs to be truncated to six bits. If this is done by rounding, the two bits to be removed represent the tie case, so round to the nearest even number by adding 1, obtaining a result mantissa of 1.110110. The answer is 0 10001 110110.
Multiplication:
According to the Multiplication rule, perform the following three steps:
1. Add the exponents and subtract 15 to obtain 10001 as the exponent of the result.
2. Multiply mantissas to obtain 10.010110101110 as the mantissa of the result. The sign of the result is set to 1 (negative).
3. Normalize the resulting mantissa by shifting it to the right by one bit position. Then add 1 to the exponent to obtain 10010 as the exponent of the result. Truncate the mantissa fraction to six bits by rounding to obtain the result 1 10010 001011.
14. Consider that floating-point numbers are represented in a 12-bit
format. The scale factor has an implied base of 2 and a 5-bit, excess-15
exponent, with the two end values of 0 and 31 used to signify exact 0 and
infinity, respectively. The 6-bit mantissa is normalized as in the IEEE
format, with an implied 1 to the left of the binary point.
(a) Represent the numbers +1.7, −0.012, +19, and 1/8 in this format.
+1.7 0 01111 101101
−0.012 1 01000 100010
+19 0 10011 001100
1/8 0 01100 000000
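The encodings in part (a) can be reproduced with a short sketch of the 12-bit format. It normalizes the value to the form 1.f × 2^e, rounds the fraction to six bits, and adds the excess of 15. The function name `encode12` is an assumption, values are passed as strings so that decimals such as 1.7 are treated exactly, and no check is made that the exponent stays within the usable range 1 to 30.

```python
from fractions import Fraction


def encode12(value: str) -> str:
    """Encode a nonzero number as 'S EEEEE MMMMMM' (excess-15 exponent, 6-bit fraction)."""
    x = Fraction(value)
    if x == 0:
        raise ValueError("exact 0 uses the reserved exponent value 0")
    sign = "1" if x < 0 else "0"
    x = abs(x)
    e = 0
    while x >= 2:                     # normalize to 1 <= x < 2
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    frac = round((x - 1) * 64)        # round to 6 fraction bits (ties to even)
    if frac == 64:                    # rounding carried into the integer bit
        frac = 0
        e += 1
    return f"{sign} {e + 15:05b} {frac:06b}"
```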
(b) What are the smallest and largest numbers representable in this format?
Other than exact 0 and ±infinity, the smallest numbers are $\pm 1.000000 \times 2^{-14}$ and the largest numbers are $\pm 1.111111 \times 2^{15}$.
(c) How does the range calculated in part (b) compare to the ranges of a 12-bit signed integer and a 12-bit signed fraction?
Assuming sign-and-magnitude format, the smallest and largest integers (other than 0) are ±1 and $\pm(2^{11} - 1)$; and the smallest and largest fractions (other than 0) are $\pm 2^{-11}$ and approximately ±1.
(d) Perform Add, Subtract, Multiply, and Divide operations on
the operands

A + B =0 10000 000000
A − B =0 10000 110110
A × B =1 10000 001011
A/B = 1 10000 101110
