DLCO Unit 3
OBJECTIVES:
In this lesson, you will learn about the execution of instructions by a processor, the functional
units of a processor and how they are interconnected, hardware for generating control signals,
and microprogrammed control. You will also learn about fixed-point and floating-point
arithmetic for ALU operations, including adder and subtractor circuits, high-speed adders
based on carry-lookahead logic, the Booth algorithm for multiplication of signed numbers,
logic circuits for division, and arithmetic operations on floating-point numbers conforming to
the IEEE standard.
CONTENTS:
1. Fundamental Concepts
2. Computer Arithmetic
Introduction
REGISTER TRANSFERS
Instruction execution involves a sequence of steps in which data are transferred from
one register to another. For each register Ri, two control-signals are used: Riin and Riout.
These are called gating signals. When Riin = 1, the data on the bus are loaded into Ri. When
Riout = 1, the contents of Ri are placed on the bus. When Riout = 0, the bus can be used for
transferring data from other registers. For example, consider the instruction Move R1, R2,
which transfers the contents of register R1 to register R2. This can be accomplished as follows:
1) Enable the output of register R1 by setting R1out to 1 (Figure 7.2). This places the contents
of R1 on the processor-bus.
2) Enable the input of register R2 by setting R2in to 1. This loads the data from the
processor-bus into register R2.
All operations and data transfers within the processor take place within time-periods
defined by the processor-clock. The control-signals that govern a particular transfer are
asserted at the start of the clock cycle.
1) R1out, Yin
2) R2out, SelectY, Add, Zin
3) Zout, R3in
Instruction execution proceeds as follows:
Step 1 --> Contents of register R1 are loaded into register Y.
Step 2 --> Contents of Y and of register R2 are applied to the A and B inputs of the ALU;
the addition is performed and the result is stored in the Z register.
Step 3 --> The contents of the Z register are transferred to register R3.
The signals are activated for the duration of the clock cycle corresponding to that
step. All other signals are inactive.
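The three control steps above can be sketched in Python. This is only an illustrative model (the function name and register dictionary are ours, not part of the text): each step reads or drives the single shared processor bus, as the gating signals would.

```python
# A minimal sketch modeling the three control steps that compute
# R3 <- R1 + R2 over a single shared processor bus.
def add_r1_r2_to_r3(regs):
    # Step 1: R1out, Yin -- contents of R1 placed on bus, loaded into Y
    bus = regs["R1"]
    regs["Y"] = bus
    # Step 2: R2out, SelectY, Add, Zin -- Y and R2 drive the ALU inputs
    bus = regs["R2"]
    regs["Z"] = regs["Y"] + bus
    # Step 3: Zout, R3in -- result moved from Z to R3
    bus = regs["Z"]
    regs["R3"] = bus
    return regs

regs = {"R1": 25, "R2": 17, "Y": 0, "Z": 0, "R3": 0}
add_r1_r2_to_r3(regs)
print(regs["R3"])  # 42
```

Note that only one value occupies the bus in any one step, which is why the transfer needs three clock cycles.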
CONTROL-SIGNALS OF MDR
The MDR register has 4 control-signals (Figure): MDRin and MDRout control the
connection to the internal processor data bus, while MDRinE and MDRoutE control the
connection to the external memory data bus. The MAR register has 2 control-signals: MARin
controls the connection to the internal processor address bus, and MARout controls the
connection to the external memory address bus.
FETCHING A WORD FROM MEMORY
To fetch an instruction or data from memory, the processor transfers the required
address to the MAR. At the same time, the processor issues a Read signal on the control-lines
of the memory-bus. When the requested data are received from memory, they are stored in the
MDR. From the MDR, they are transferred to other registers. The response time of each
memory access varies (for example, because of a cache miss or memory-mapped I/O). To
accommodate this, the MFC signal is used (MFC stands for Memory Function Completed).
MFC is sent from the addressed device to the processor, informing it that the requested
operation has been completed.
Consider the instruction Move (R1), R2. The sequence of steps is (Figure):
1) R1out, MARin, Read ; the desired address is loaded into MAR and a Read command is issued.
2) MDRinE, WMFC ; MDR is loaded from the memory-bus; WMFC is the control-signal that
causes the processor's control circuitry to wait for the arrival of the MFC signal.
3) MDRout, R2in ; R2 is loaded from MDR.
STORING A WORD IN MEMORY
Consider the instruction Move R2, (R1). This requires the following sequence:
1) R1out, MARin ; the desired address is loaded into MAR.
2) R2out, MDRin, Write ; the data to be written are loaded into MDR and a Write command is issued.
3) MDRoutE, WMFC ; the data are transferred from MDR into the memory location pointed to by R1.
BRANCHING INSTRUCTIONS
The control sequence for an unconditional branch instruction proceeds as follows:
Steps 1-3: Processing starts, and the fetch phase ends in step 3.
Step 4: The offset-value X is extracted from the IR by the instruction-decoding circuit. Since
the updated value of PC is already available in register Y, the offset X is gated onto the bus,
and an addition operation is performed.
Step 5: The result, which is the branch address, is loaded into the PC.
The branch instruction loads the branch target address into PC so that the next
instruction is fetched from that address. The branch target address is usually obtained by
adding the offset X to the contents of PC. The offset X is therefore the difference between the
branch target address and the address immediately following the branch instruction.
In the case of a conditional branch, the status of the condition codes must be checked
before a new value is loaded into the PC. For example: Offset-field-of-IRout, Add, Zin,
If N=0 then End. If N = 0, the processor returns to step 1 immediately after step 4. If N = 1,
step 5 is performed to load the new value into the PC.
As shown in figure, three buses can be used to connect registers and the ALU of the
processor. All general-purpose registers are grouped into a single block called the Register
File. Register-file has 3 ports:
1) Two output-ports allow the contents of 2 different registers to be simultaneously placed on
buses A & B.
2) Third input-port allows data on bus C to be loaded into a third register during the same
clock-cycle.
Buses A and B are used to transfer source-operands to A & B inputs of ALU. The
result is transferred to destination over bus C. Incrementer Unit is used to increment PC by 4.
Instruction execution proceeds as follows:
Step 1: The contents of PC are passed through the ALU, using the R=B control-signal, and
loaded into MAR to start a memory Read operation. At the same time, PC is incremented by 4.
Step 2: The processor waits for the MFC signal from memory.
Step 3: The processor loads the requested data into MDR and then transfers them to IR.
Step 4: The instruction is decoded, and the Add operation takes place in a single step.
COMPLETE PROCESSOR
This processor has separate processing-units to deal with integer data and floating-point
data (Figure): the integer unit processes integer data, and the floating-point unit processes
floating-point data. A data-cache is inserted between these processing-units and main-memory;
both units get their data from the data cache. The instruction-unit fetches instructions from an
instruction-cache, or from main-memory when the desired instructions are not already in the
cache.
The processor is connected to the system-bus, and hence to the rest of the computer, by
means of a bus interface. Using separate caches for instructions and data is common practice
in many processors today. A processor may include several units of each type to increase the
potential for concurrent operations. For example, the 80486 processor has a single 8-Kbyte
cache for both instructions and data, whereas the Pentium processor has two separate 8-Kbyte
caches, one for instructions and one for data.
Note: To execute instructions, the processor must have some means of generating the control-
signals. There are two approaches for this purpose:
1) Hardwired control and
2) Microprogrammed control.
HARDWIRED CONTROL
Hardwired control is a method of control-unit design (Figure). The control-signals are
generated using logic circuits such as gates, flip-flops and decoders. The decoder/encoder
block is a combinational circuit that generates the required control-outputs depending on the
state of all its inputs. The instruction decoder decodes the instruction loaded in the IR. If the
IR is an 8-bit register, the instruction decoder generates 2^8 = 256 output-lines, one for each
instruction. It has a separate output-line, INS1 through INSm, for each machine instruction.
According to the code in the IR, one of the output-lines INS1 through INSm is set to 1, and all
other lines are set to 0. The step-decoder provides a separate signal line for each step in the
control sequence. The encoder takes its inputs from the instruction decoder, the step decoder,
the external inputs and the condition codes, and uses them to generate the individual
control-signals: Yin, PCout, Add, End and so on.
For example (Figure 7.12), Zin = T1 + T6.ADD + T4.BR. This signal is asserted during
time-slot T1 for all instructions, during T6 for an Add instruction, and during T4 for an
unconditional branch instruction. When RUN = 1, the step counter is incremented by 1 at the
end of every clock cycle; when RUN = 0, the counter stops counting. After the execution of
each instruction, the End signal is generated, which resets the step counter. The sequence of
operations carried out by this machine is determined by the wiring of the logic circuits, hence
the name "hardwired".
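The encoder equation Zin = T1 + T6.ADD + T4.BR can be written directly as a boolean function. This is a small illustrative sketch (the function name is ours): T is the active time-step from the step decoder, and the add/br flags stand for the instruction-decoder output lines.

```python
# The hardwired encoder equation Zin = T1 + T6.ADD + T4.BR as a
# boolean function of the step-decoder and instruction-decoder outputs.
def zin(T, add, br):
    # asserted in T1 for every instruction, in T6 for Add, in T4 for BR
    return (T == 1) or (T == 6 and add) or (T == 4 and br)

print(zin(1, False, False))  # True: T1 of any instruction
print(zin(6, True, False))   # True: T6 of an Add instruction
print(zin(4, False, True))   # True: T4 of an unconditional branch
print(zin(6, False, True))   # False
```

In the actual circuit this is just an AND-OR network; the point is that every control signal has such a sum-of-products equation, and the whole set of equations is what gets wired into gates.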
Advantage: Can operate at high speed.
Disadvantages: 1) Since the number of instructions and control-lines often runs into the
hundreds, the complexity of the control unit is very high.
2) It is costly and difficult to design.
3) The control unit is inflexible because it is difficult to change the design.
HARDWIRED CONTROL VS MICROPROGRAMMED CONTROL

Attribute            | Hardwired Control                    | Microprogrammed Control
---------------------|--------------------------------------|----------------------------------------
Definition           | A control mechanism that generates   | A control mechanism that generates
                     | control-signals using gates,         | control-signals using a memory called
                     | flip-flops, decoders and other       | the control store (CS), which contains
                     | digital circuits.                    | the control-signals.
Speed                | Fast.                                | Slow.
Control functions    | Implemented in hardware.             | Implemented in software.
Flexibility          | Not flexible; accommodating new      | More flexible; new system
                     | system specifications or new         | specifications or new instructions can
                     | instructions requires a redesign.    | be accommodated without a redesign.
Ability to handle    | Difficult.                           | Easier.
large or complex     |                                      |
instruction sets     |                                      |
Ability to support   | Very difficult.                      | Easy.
operating systems &  |                                      |
diagnostic features  |                                      |
Design process       | Complicated.                         | Orderly and systematic.
Applications         | Mostly RISC microprocessors.         | Mainframes, some microprocessors.
Instruction set size | Usually under 100 instructions.      | Usually over 100 instructions.
ROM size             | -                                    | 2K to 10K of 20-400 bit
                     |                                      | microinstructions.
Chip area efficiency | Uses least area.                     | Uses more area.
MICROPROGRAMMED CONTROL
Microprogramming is a method of control unit design (Figure). Control-signals are
generated by a program similar to machine language programs. Control Word(CW) is a word
whose individual bits represent various control-signals (like Add, PCin). Each of the control-
steps in control sequence of an instruction defines a unique combination of 1s & 0s in CW.
Individual control-words in microroutine are referred to as microinstructions (Figure).
A sequence of CWs corresponding to control-sequence of a machine instruction
constitutes the microroutine. The microroutines for all instructions in the instruction-set of a
computer are stored in a special memory called the Control Store (CS). The control-unit
generates the control-signals for any instruction by sequentially reading the CWs of the
corresponding microroutine from the CS. Every time a new instruction is loaded into the IR,
the output of the Starting Address Generator is loaded into the µPC. The µPC is then
automatically incremented by the clock, causing successive microinstructions to be read from
the CS. Hence, control-signals are delivered to the various parts of the processor in the
correct sequence.
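The control-store/µPC mechanism can be sketched as a tiny interpreter loop. This is purely illustrative (the dictionary contents, addresses and names are ours): the starting address generator maps an instruction to a control-store address, and the µPC then steps through successive control words until End.

```python
# Illustrative control store: address -> control word. Address 10 is a
# hypothetical microroutine for an Add instruction.
CONTROL_STORE = {
    10: "R1out, Yin",
    11: "R2out, SelectY, Add, Zin",
    12: "Zout, R3in, End",
}

def run_microroutine(start_addr):
    upc = start_addr                # starting-address generator -> uPC
    issued = []
    while True:
        cw = CONTROL_STORE[upc]     # read control word from control store
        issued.append(cw)           # its bits drive the control signals
        upc += 1                    # uPC incremented by the clock
        if "End" in cw:
            break
    return issued

print(run_microroutine(10))
```

The loop makes the key point explicit: sequencing is data (addresses in a memory), not wiring, which is exactly why microprogrammed control is easy to change.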
Advantages
• It simplifies the design of the control unit. Thus it is both cheaper and less error-prone to implement.
• Control functions are implemented in software rather than hardware.
• The design process is orderly and systematic.
• More flexible, can be changed to accommodate new system specifications or to correct the
design errors quickly and cheaply.
• Complex function such as floating point arithmetic can be realized efficiently.
Disadvantages
• A microprogrammed control unit is somewhat slower than a hardwired control unit, because
time is required to fetch the microinstructions from the control memory.
• The flexibility is achieved at some extra hardware cost due to the control memory and its
access circuitry.
Microinstructions
A simple way to structure microinstructions is to assign one bit position to each
control-signal required in the CPU. There are 42 signals and hence each microinstruction will
have 42 bits.
Drawbacks of microprogrammed control:
1) Assigning individual bits to each control-signal results in long microinstructions because the
number of required signals is usually large.
2) Available bit-space is poorly used because only a few bits are set to 1 in any given
microinstruction.
Solution: Signals can be grouped because
1) Most signals are not needed simultaneously.
2) Many signals are mutually exclusive. E.g. only 1 function of ALU can be activated at a time.
For example: gating signals (the IN and OUT signals, Figure); control-signals (Read, Write);
and ALU signals (Add, Sub, Mul, Div, Mod).
Grouping control-signals into fields requires a little more hardware because decoding-
circuits must be used to decode bit patterns of each field into individual control-signals.
Advantage: This method results in a smaller control-store (only 20 bits are needed to store the
patterns for the 42 signals)
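The bit-count argument behind grouping can be checked with a short calculation. The particular field sizes below are a hypothetical grouping chosen for illustration (the text does not give them); the rule is that a field of n mutually exclusive signals needs ceil(log2(n+1)) bits, the extra code meaning "no signal in this field".

```python
import math

# Hypothetical grouping of the 42 control signals into fields of
# mutually exclusive signals (sizes are our assumption, total = 42).
fields = [7, 7, 7, 7, 7, 3, 3, 1]

# Horizontal (one-hot) format: one bit per signal.
horizontal_bits = sum(fields)

# Grouped (encoded) format: ceil(log2(n+1)) bits per field,
# reserving one code for "none".
vertical_bits = sum(math.ceil(math.log2(n + 1)) for n in fields)

print(horizontal_bits)  # 42
print(vertical_bits)    # 20
```

With this grouping the microinstruction shrinks from 42 bits to 20, matching the saving claimed above, at the cost of one decoder per field.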
Microprogram Sequencing
The task of microprogram sequencing is done by microprogram sequencer. Two important
factors must be considered while designing the microprogram sequencer,
1) The size of the microinstruction &
2) The address generation time.
The size of the microinstruction should be as small as possible, so that the amount of control
memory required to store the microinstructions is also small; this reduces the cost of the
control memory. With a shorter address-generation time, each microinstruction can be
executed in less time, resulting in better throughput. During execution of a microprogram, the
address of the next microinstruction to be executed has 3 sources:
1) Determined by instruction register.
2) Next sequential address &
3) Branch.
Microinstructions can be shared using microinstruction branching.
Disadvantage of microprogrammed branching:
1) Having a separate microroutine for each machine instruction results in a large total number
of microinstructions and a large control-store.
2) Execution time is longer because it takes more time to carry out the required branches.
Consider the instruction Add src, Rdst, which adds the source-operand to the contents
of Rdst and places the sum in Rdst. The source-operand can be specified in the following
addressing modes:
a) Indexed b) Autoincrement c) Autodecrement d) Register indirect e) Register direct
Each box in the chart corresponds to a microinstruction that controls the transfers and
operations indicated within the box. The microinstruction is located at the address indicated by
the octal number (001,002).
Branch Address Modification Using Bit-Oring
The branch address is determined by ORing a particular bit or bits into the current
address of the microinstruction. For example, if the current address is 170 and the branch
address is 171, the branch address can be generated by ORing 01 (the least significant bit)
into the current address. Consider the
point labelled in the figure. At this point, it is necessary to choose between direct and indirect
addressing modes. If indirect-mode is specified in the instruction, then the microinstruction in
location 170 is performed to fetch the operand from the memory. If direct-mode is specified,
this fetch must be bypassed by branching immediately to location 171. The most efficient way
to bypass microinstruction 170 is to have bit-ORing of current address 170 & branch address
171.
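The bit-ORing trick is easy to demonstrate with octal literals. A small sketch (the function name is ours): the decoded addressing-mode bit is ORed into the low-order bit of the current microinstruction address, so no separate branch microinstruction is needed.

```python
# Bit-ORing branch-address modification: ORing the addressing-mode bit
# into current address 170 (octal) yields 171 when direct mode must
# bypass the operand-fetch microinstruction at 170.
def next_microaddress(current, direct_mode):
    # OR in the low-order bit only when the fetch is to be skipped
    return current | (1 if direct_mode else 0)

print(oct(next_microaddress(0o170, True)))   # 0o171
print(oct(next_microaddress(0o170, False)))  # 0o170
```

This only works because 170 and 171 differ in exactly the ORed bit position, which is why microroutine layout in the control store must be planned around the branch points.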
Prefetching Microinstructions
Disadvantage of Microprogrammed Control: Slower operating-speed because of the time it
takes to fetch microinstructions from the control-store.
Solution: Faster operation is achieved if the next microinstruction is pre-fetched while the
current one is being executed.
Emulation
• The main function of microprogrammed control is to provide a means for simple, flexible
and relatively inexpensive execution of machine instructions.
• Its flexibility in using a machine's resources allows diverse classes of instructions to be
implemented.
• Suppose we add to the instruction repertoire of a given computer M1 an entirely new set of
instructions that is in fact the instruction-set of a different computer M2.
• Programs written in the machine language of M2 can be then be run on computer M1 i.e. M1
emulates M2.
• Emulation allows us to replace obsolete equipment with more up-to-date machines.
• If the replacement computer fully emulates the original one, then no software changes have
to be made to run existing programs.
• Emulation is easiest when the machines involved have similar architectures.
2. COMPUTER ARITHMETIC
Addition and subtraction of two numbers are basic operations at the machine-
instruction level in all computers. These operations, as well as other arithmetic and logic
operations, are implemented in the arithmetic and logic unit (ALU) of the processor. In this
chapter, we present the logic circuits used to implement arithmetic operations. The time needed
to perform addition or subtraction affects the processor’s performance. Multiply and divide
operations, which require more complex circuitry than either addition or subtraction operations,
also affect performance. We present some of the techniques used in modern computers to
perform arithmetic operations at high speed. Operations on floating-point numbers are also
described.
ADDITION AND SUBTRACTION OF SIGNED NUMBERS
Figure shows the truth table for the sum and carry-out functions for adding equally
weighted bits xi and yi in two numbers X and Y . The figure also shows logic expressions for
these functions, along with an example of addition of the 4-bit unsigned numbers 7 and 6. Note
that each stage of the addition process must accommodate a carry-in bit. We use ci to represent
the carry-in to stage i, which is the same as the carry-out from stage (i − 1). The logic expression
for si in Figure 9.1 can be implemented with a 3-input XOR gate, used in Figure a as part of
the logic required for a single stage of binary addition. The carry-out function, ci+1, is
implemented with an AND-OR circuit, as shown. A convenient symbol for the complete circuit
for a single stage of addition, called a full adder (FA), is also shown in the figure.
A cascaded connection of n full-adder blocks can be used to add two n-bit numbers, as
shown in Figure b. Since the carries must propagate, or ripple, through this cascade, the
configuration is called a ripple-carry adder. The carry-in, c0, into the least-significant-bit (LSB)
position provides a convenient means of adding 1 to a number. For instance, forming the 2’s-
complement of a number involves adding 1 to the 1’s-complement of the number. The carry
signals are also useful for interconnecting k adders to form an adder capable of handling input
numbers that are kn bits long, as shown in Figure c.
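The single-stage logic and the ripple-carry cascade can be sketched directly from the expressions si = xi XOR yi XOR ci and c(i+1) = xi.yi + xi.ci + yi.ci. The function names are ours; bits are given LSB first so the carry ripples left through the loop just as it does through the cascade.

```python
# Full adder: sum and carry-out expressions from the truth table.
def full_adder(x, y, c):
    s = x ^ y ^ c                          # 3-input XOR
    c_out = (x & y) | (x & c) | (y & c)    # AND-OR carry logic
    return s, c_out

def ripple_carry_add(xs, ys, c0=0):
    """xs, ys: lists of bits, LSB first. Returns (sum bits, carry-out)."""
    c, out = c0, []
    for x, y in zip(xs, ys):               # carry ripples stage to stage
        s, c = full_adder(x, y, c)
        out.append(s)
    return out, c

# 4-bit unsigned example from the text: 7 + 6 = 13 (0111 + 0110 = 1101)
s, c = ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0])
print(s, c)  # [1, 0, 1, 1] 0
```

Setting c0=1 adds 1 to the sum, which is the convenient hook for forming a 2's-complement (1's-complement plus one) mentioned above.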
MULTIPLICATION ALGORITHM
The usual algorithm for multiplying integers by hand is illustrated for the binary system.
The product of two unsigned n-digit numbers can be accommodated in 2n digits, so the
product of the two 4-bit numbers in this example is accommodated in 8 bits, as shown. In the
binary system, multiplication of the multiplicand by one bit of the multiplier is easy. If the
multiplier bit is 1, the multiplicand is entered in the appropriate shifted position. If the
multiplier bit is 0, then 0s are entered, as in the third row of the example. The product is
computed one bit at a time by adding the bit columns from right to left and propagating carry
values between columns.
If the multiplier is negative, its bits can be viewed as an upper part consisting of 1s, extending
down to some position k + 1, and a lower part; the upper part is the 2's-complement
representation of −2^(k+1). The recoded multiplier then consists of the recoding of the lower
part, with −1 added in position k + 1. For example, the multiplier 110110 is recoded as
0 −1 +1 0 −1 0. The Booth technique for recoding multipliers is summarized in Figure. The
transformation 011...110 ⇒ +1 0 0...0 −1 0 is called skipping over 1s. This term is derived
from the case in which the multiplier has its
1s grouped into a few contiguous blocks. Only a few versions of the shifted multiplicand (the
summands) need to be added to generate the product, thus speeding up the multiplication
operation. However, in the worst case—that of alternating 1s and 0s in the multiplier—each bit
of the multiplier selects a summand. In fact, this results in more summands than if the Booth
algorithm were not used. A 16-bit worst-case multiplier, an ordinary multiplier, and a good
multiplier are shown in Figure.
A faster multiplier can be obtained by bit-pair recoding, which examines the Booth-recoded
multiplier two digits at a time. The pair (+1 −1) is equivalent to the pair (0 +1). That is, instead of adding −1 times the
multiplicand M at shift position i to +1 ×M at position i + 1, the same result is obtained by
adding +1 ×M at position i. Other examples are: (+1 0) is equivalent to (0 +2), (−1 +1) is
equivalent to (0 −1), and so on. Thus, if the Booth-recoded multiplier is examined two bits at
a time, starting from the right, it can be rewritten in a form that requires at most one version of
the multiplicand to be added to the partial product for each pair of multiplier bits.
DIVISION ALGORITHM
We discussed the multiplication of unsigned numbers by relating the way the
multiplication operation is done manually to the way it is done in a logic circuit. We use the
same approach here in discussing integer division. We discuss unsigned-number division in
detail, and then make some general comments on the signed-number case.
Integer Division
Figure shows examples of decimal division and binary division of the same values.
Consider the decimal version first. The 2 in the quotient is determined by the following
reasoning: First, we try to divide 13 into 2, and it does not work. Next, we try to divide 13 into
27. We go through the trial exercise of multiplying 13 by 2 to get 26, and, observing that 27 −
26 = 1 is less than 13, we enter 2 as the quotient and perform the required subtraction. The next
digit of the dividend, 4, is brought down, and we finish by deciding that 13 goes into 14 once,
and the remainder is 1. We can discuss binary division in a similar way, with the simplification
that the only possibilities for the quotient bits are 0 and 1.
A circuit that implements division by this longhand method operates as follows: It
positions the divisor appropriately with respect to the dividend and performs a subtraction. If
the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended
by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding
back the divisor, and the divisor is repositioned for another subtraction. This is called the
restoring division algorithm.
Restoring Division
Figure shows a logic circuit arrangement that implements the restoring division
algorithm just discussed. Note its similarity to the structure for multiplication shown in Figure.
An n-bit positive divisor is loaded into register M and an n-bit positive dividend is loaded into
register Q at the start of the operation. Register A is set to 0. After the division is complete, the
n-bit quotient is in register Q and the remainder is in register A. The required subtractions are
facilitated by using 2’s-complement arithmetic. The extra bit position at the left end of both A
and M accommodates the sign bit during subtractions.
The following algorithm performs restoring division. Do the following three steps n
times:
1. Shift A and Q left one bit position.
2. Subtract M from A, and place the answer back in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A (that is, restore A); otherwise, set
q0 to 1.
Figure shows a 4-bit example as it would be processed by the circuit in Figure.
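The three-step loop above can be sketched with Python integers standing in for the A, Q and M registers. This is an illustrative model, not the circuit itself (the function name is ours); A is kept as a signed integer so the sign test in step 3 is just a comparison with zero.

```python
# Restoring division for n-bit unsigned operands: A (remainder) and Q
# (dividend, later quotient) shift left together; M is the divisor.
def restoring_divide(dividend, divisor, n=4):
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        # Step 1: shift A and Q left one bit position
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        # Step 2: subtract M from A
        a -= m
        # Step 3: if A went negative, restore it and set q0 = 0;
        # otherwise set q0 = 1
        if a < 0:
            a += m          # restore A
        else:
            q |= 1          # q0 = 1
    return q, a             # (quotient, remainder)

print(restoring_divide(8, 3))  # (2, 2)
```

Each restore is an extra addition, which is exactly the cost the non-restoring algorithm below removes.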
Non-Restoring Division
The restoring division algorithm can be improved by avoiding the need for restoring A
after an unsuccessful subtraction. Subtraction is said to be unsuccessful if the result is negative.
Consider the sequence of operations that takes place after the subtraction operation in the
preceding algorithm. If A is positive, we shift left and subtract M; that is, we perform 2A − M.
If A is negative, we restore it by performing A + M, and then we shift it left and subtract M.
This is equivalent to performing 2A + M. The q0 bit is appropriately set to 0 or 1 after the
correct operation has been performed.
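The non-restoring variant can be sketched the same way (again an illustrative model with our own function name): the restore step disappears from the loop, and the choice between 2A − M and 2A + M is made from the sign of A, with a single final restore for the remainder if needed.

```python
# Non-restoring division: instead of restoring A after a negative
# result, the next iteration performs 2A + M rather than 2A - M.
def nonrestoring_divide(dividend, divisor, n=4):
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        shift_in = (q >> (n - 1)) & 1
        q = (q << 1) & ((1 << n) - 1)
        if a >= 0:
            a = ((a << 1) | shift_in) - m   # 2A - M
        else:
            a = ((a << 1) | shift_in) + m   # 2A + M
        if a >= 0:
            q |= 1                          # q0 = 1, else q0 stays 0
    if a < 0:
        a += m          # single final restore step for the remainder
    return q, a

print(nonrestoring_divide(8, 3))  # (2, 2)
```

Both versions give the same quotient and remainder; the non-restoring one simply does one add-or-subtract per bit instead of up to two.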
There are no simple algorithms for directly performing division on signed operands that
are comparable to the algorithms for signed multiplication. In division, the operands can be
preprocessed to change them into positive values. After using one of the algorithms just
discussed, the signs of the quotient and the remainder are adjusted as necessary.
3. FLOATING POINT ARITHMETIC OPERATIONS
We have seen that floating-point numbers can be represented in a 32-bit binary format.
Now, we provide more detail on representation formats and arithmetic operations on
floating-point numbers. The descriptions provided here are based on the 2008 version of the
IEEE (Institute of Electrical and Electronics Engineers) 754 Standard.
The basic IEEE format is a 32-bit representation, shown in Figure a. The leftmost bit
represents the sign, S, of the number. The next 8 bits, E', represent the stored exponent of the
scale factor (with an implied base of 2), and the remaining 23 bits, M, are the fractional part
of the significant bits. The full 24-bit string, B, of significant bits, called the mantissa, always
has a leading 1, with the binary point immediately to its right. Therefore, the mantissa has the
value 1.M, and the represented number is ±1.M × 2^E.
By convention, when the binary point is placed to the right of the first significant bit,
the number is said to be normalized. Note that the base, 2, of the scale factor and the leading
1 of the mantissa are both fixed; they do not need to appear explicitly in the representation.
Instead of the actual signed exponent, E, the value stored in the exponent field is an unsigned
integer E' = E + 127. This is called the excess-127 format. Thus, E' is in the range
0 ≤ E' ≤ 255. The end values of this range, 0 and 255, are used to represent special values, as
described later. Therefore, the range of E' for normal values is 1 ≤ E' ≤ 254.
This means that the actual exponent, E, is in the range −126 ≤ E ≤ 127. The use of the
excess-127 representation for exponents simplifies comparison of the relative sizes of two
floating-point numbers. The 32-bit standard representation in Figure 9.26a is called a
single-precision representation because it occupies a single 32-bit word. The scale factor has
a range of 2^−126 to 2^+127, which is approximately equal to 10^±38. The 24-bit mantissa
provides approximately the same precision as a 7-digit decimal value. An example of a
single-precision floating-point number is shown in Figure b.
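The single-precision fields can be unpacked with the standard library. A short sketch (the function name is ours): the float is reinterpreted as a 32-bit pattern, and the sign, excess-127 exponent and 23-bit fraction are masked out.

```python
# Unpack the IEEE 754 single-precision fields: sign S, stored exponent
# E' (excess-127), and 23-bit fraction M.
import struct

def decode_single(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31
    e_stored = (bits >> 23) & 0xFF      # E' = E + 127
    m = bits & 0x7FFFFF                 # 23 fraction bits
    return s, e_stored - 127, m         # return the actual exponent E

# -6.5 = -1.101 x 2^2, so sign = 1, E = 2, fraction = 101 then 20 zeros
sign, exponent, fraction = decode_single(-6.5)
print(sign, exponent, fraction)  # 1 2 5242880
```

Decoding a few values by hand this way is a good check of the excess-127 convention: 1.0, for instance, comes back as sign 0, exponent 0, fraction 0 (stored exponent 127).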
To provide more precision and range for floating-point numbers, the IEEE standard
also specifies a double-precision format, as shown in Figure c. The double-precision format
has increased exponent and mantissa ranges. The 11-bit excess-1023 stored exponent E' has
the range 1 ≤ E' ≤ 2046 for normal values, with 0 and 2047 used to indicate special values,
as before. Thus, the actual exponent E is in the range −1022 ≤ E ≤ 1023, providing scale
factors of 2^−1022 to 2^1023 (approximately 10^±308). The 53-bit mantissa provides a
precision equivalent to about 16 decimal digits.
A computer must provide at least single-precision representation to conform to the
IEEE standard. Double-precision representation is optional. The standard also specifies certain
optional extended versions of both of these formats. The extended versions provide increased
precision and increased exponent range for the representation of intermediate values in a
sequence of calculations. The use of extended formats helps to reduce the size of the
accumulated round-off error in a sequence of calculations leading to a desired result.
For example, the dot product of two vectors of numbers involves accumulating a sum
of products. The input vector components are given in a standard precision, either single or
double, and the final answer (the dot product) is truncated to the same precision. All
intermediate calculations should be done using extended precision to limit accumulation of
errors. Extended formats also enhance the accuracy of evaluation of elementary functions such
as sine, cosine, and so on. This is because they are usually evaluated by adding up a number of
terms in a series representation. In addition to requiring the four basic arithmetic operations,
the standard requires three additional operations to be provided: remainder, square root, and
conversion between binary and decimal representations.
We note two basic aspects of operating with floating-point numbers. First, if a number
is not normalized, it can be put in normalized form by shifting the binary point and adjusting
the exponent. Figure shows an unnormalized value, 0.0010110... × 2^9, and its normalized
version, 1.0110... × 2^6. Since the scale factor is of the form 2^i, shifting the mantissa right or
left by one bit position is compensated by an increase or a decrease of 1 in the exponent,
respectively. Second, as computations proceed, a number that does not fall in the representable
range of normal numbers might be generated. In single precision, this means that its normalized
representation requires an exponent less than −126 or greater than +127. In the first case, we
say that underflow has occurred, and in the second case, we say that overflow has occurred.
Special Values
The end values 0 and 255 of the stored exponent E' are used to represent special
values. When E' = 0 and the mantissa fraction M is zero, the value 0 is represented. When
E' = 255 and M = 0, the value ∞ is represented, where ∞ is the result of dividing a normal
number by zero. The sign bit is still used in these representations, so there are representations
for ±0 and ±∞. When E' = 0 and M ≠ 0, denormal numbers are represented; their value is
±0.M × 2^−126. Therefore, they are smaller than the smallest normal number. There is no
implied 1 to the left of the binary point, and M is any nonzero 23-bit fraction. The purpose of
introducing denormal numbers is to allow for gradual underflow, providing an extension of
the range of normal representable numbers. This is useful in dealing with very small numbers,
which may be needed in certain situations. When E' = 255 and M ≠ 0, the value represented
is called Not a Number (NaN). A NaN represents the result of performing an invalid
operation such as 0/0 or √−1.
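The special-value rules amount to a small decision table on the stored exponent and fraction fields. A sketch (function name ours) classifying a raw 32-bit pattern:

```python
# Classify a 32-bit pattern using the end values of the stored exponent:
# E' = 0 gives zero/denormal, E' = 255 gives infinity/NaN.
def classify(bits):
    e = (bits >> 23) & 0xFF     # stored exponent E'
    m = bits & 0x7FFFFF         # 23-bit fraction M
    if e == 0:
        return "zero" if m == 0 else "denormal"
    if e == 255:
        return "infinity" if m == 0 else "NaN"
    return "normal"

print(classify(0x00000000))  # zero
print(classify(0x00000001))  # denormal (smallest positive value)
print(classify(0x7F800000))  # infinity
print(classify(0x7FC00000))  # NaN
print(classify(0x3F800000))  # normal (this pattern is 1.0)
```

Note that sign plays no role in the classification, which is why both +0/−0 and +∞/−∞ exist.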
Exceptions
In conforming to the IEEE Standard, a processor must set exception flags if any of the
following conditions arise when performing operations: underflow, overflow, divide by zero,
inexact, invalid. We have already mentioned the first three. Inexact is the name for a result that
requires rounding in order to be represented in one of the normal formats. An invalid exception
occurs if operations such as 0/0 or √−1 are attempted. When an exception occurs, the result is
set to one of the special values. If interrupts are enabled for any of the exception flags, system
or user-defined routines are entered when the associated exception occurs. Alternatively, the
application program can test for the occurrence of exceptions, as necessary, and decide how to
proceed.
Add/Subtract Rule
1. Choose the number with the smaller exponent and shift its mantissa right a number of steps
equal to the difference in exponents.
2. Set the exponent of the result equal to the larger exponent.
3. Perform addition/subtraction on the mantissas and determine the sign of the result.
4. Normalize the resulting value, if necessary.
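The four steps can be sketched on unpacked (exponent, mantissa) pairs. This is a simplified model, not an IEEE implementation (the function name, 8-bit mantissa width and integer encoding are our assumptions, and signs are ignored): the mantissa includes its leading 1 as the top bit.

```python
# The four-step add rule on unpacked, unsigned (exponent, mantissa)
# pairs; mantissas are mant_bits wide with the leading 1 included.
def fp_add(ea, ma, eb, mb, mant_bits=8):
    # Step 1: shift the mantissa of the smaller-exponent operand right
    if ea < eb:
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= (ea - eb)
    # Step 2: the result takes the larger exponent
    e = ea
    # Step 3: add the mantissas
    m = ma + mb
    # Step 4: normalize if the sum overflowed the mantissa width
    if m >> mant_bits:
        m >>= 1
        e += 1
    return e, m

# (1.1010000)_2 x 2^3 + (1.0000000)_2 x 2^1, i.e. 13 + 2
print(fp_add(3, 0b11010000, 1, 0b10000000))  # (3, 240)
```

The result (3, 240) is 1.1110000 × 2^3 = 15, as expected; a real unit would also handle signs, rounding of the shifted-out bits, and re-packing into excess-127 form.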
Multiplication and division are somewhat easier than addition and subtraction, in that
no alignment of mantissas is needed.
Multiply Rule
1. Add the exponents and subtract 127 to maintain the excess-127 representation.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
Divide Rule
1. Subtract the exponents and add 127 to maintain the excess-127 representation.
2. Divide the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
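The bias adjustments in the exponent steps of the Multiply and Divide rules can be checked with a small sketch (excess-127 exponent values; the function names are illustrative):

```python
BIAS = 127

def mul_exponent(ea: int, eb: int) -> int:
    """Exponent of a product: the sum (EA + bias) + (EB + bias) carries
    the bias twice, so one bias is subtracted."""
    return ea + eb - BIAS

def div_exponent(ea: int, eb: int) -> int:
    """Exponent of a quotient: the difference cancels both biases,
    so one bias is added back."""
    return ea - eb + BIAS

# 2.0 * 8.0 = 16.0: true exponents 1 and 3 are stored as 128 and 130.
print(mul_exponent(128, 130))   # -> 131, i.e. true exponent 4
# 8.0 / 2.0 = 4.0
print(div_exponent(130, 128))   # -> 129, i.e. true exponent 2
```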
The sign of the difference that results from comparing exponents determines which
mantissa is to be shifted. Therefore, in step 1, the sign is sent to the SWAP network in the upper
right corner of Figure. If the sign is 0, then EA ≥ EB and the mantissas MA and MB are sent
straight through the SWAP network. This results in MB being sent to the SHIFTER, to be
shifted n positions to the right. The other mantissa, MA, is sent directly to the mantissa
adder/subtractor. If the sign is 1, then EA < EB and the mantissas are swapped before they are
sent to the SHIFTER.
Step 2 is performed by the two-way multiplexer, MUX, near the bottom left corner of the
figure. The exponent of the result, E, is tentatively determined as EA if EA ≥ EB, or EB if EA
< EB, based on the sign of the difference resulting from comparing exponents in step 1.
Step 3 involves the major component, the mantissa adder/subtractor in the middle of the figure.
The CONTROL logic determines whether the mantissas are to be added or subtracted. This is
decided by the signs of the operands (SA and SB) and the operation (Add or Subtract) that is
to be performed on the operands. The CONTROL logic also determines the sign of the result,
SR. For example, if A is negative (SA = 1), B is positive (SB = 0), and the operation is A − B,
then the mantissas are added and the sign of the result is negative (SR = 1). On the other hand,
if A and B are both positive and the operation is A − B, then the mantissas are subtracted. The
sign of the result, SR, now depends on the mantissa subtraction operation. For instance, if EA
> EB, then M = MA − (shifted MB) and the resulting number is positive. But if EB > EA, then
M = MB − (shifted MA) and the result is negative. This example shows that the sign from the
exponent comparison is also required as an input to the CONTROL network. When EA = EB
and the mantissas are subtracted, the sign of the mantissa adder/subtractor output determines
the sign of the result. The reader should now be able to construct the complete truth table for
the CONTROL network.
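A fragment of that truth table can be sketched in Python. The effective sign of B is its sign bit XORed with the operation bit (1 for Subtract); like effective signs mean the magnitudes are added, unlike signs mean they are subtracted. This is a hedged sketch of only the add/subtract decision, not the full CONTROL network with the SR output:

```python
def control(sa: int, sb: int, op_is_sub: int) -> str:
    """Decide whether the mantissa adder/subtractor adds or subtracts.
    sa, sb: operand sign bits; op_is_sub: 1 for A - B, 0 for A + B."""
    sb_eff = sb ^ op_is_sub          # effective sign of the second operand
    return "add" if sa == sb_eff else "subtract"

# Example from the text: A negative (SA = 1), B positive (SB = 0), operation A - B.
print(control(1, 0, 1))   # -> "add" (and the result sign SR = SA = 1)
# A and B both positive, operation A - B.
print(control(0, 0, 1))   # -> "subtract"
```

In the "subtract" case, determining SR additionally needs the exponent-comparison sign and, when EA = EB, the sign of the mantissa difference, exactly as described above.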
Step 4 of the Add/Subtract rule consists of normalizing the result of step 3 by shifting M to the
right or to the left, as appropriate. The number of leading zeros in M determines the number of
bit shifts, X, to be applied to M. The normalized value is rounded to generate the 24-bit
mantissa, MR, of the result. The value X is also subtracted from the tentative result exponent
E to generate the true result exponent, ER. Note that only a single right shift might be needed
to normalize the result. This would be the case if two mantissas of the form 1.xx . . . were
added. The vector M would then have the form 1x.xx . . . . We have not given any details on
the guard bits that must be carried along with intermediate mantissa values. In the IEEE
standard, only a few bits are needed to generate the 24-bit normalized mantissa of the result.
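Step 4 can be modeled as follows (a sketch for unsigned 24-bit mantissas; rounding and guard bits are omitted):

```python
def normalize(e: int, m: int):
    """Normalize a mantissa sum/difference: shift left past leading zeros,
    decrementing the tentative exponent E by the shift count X, or shift
    right once if addition produced a carry into bit 24."""
    if m == 0:
        return 0, 0                  # exact zero result
    if m >= 1 << 24:                 # form 1x.xx...: a single right shift
        return e + 1, m >> 1
    x = 0
    while m < 1 << 23:               # count leading zeros; shift left, E -= X
        m <<= 1
        x += 1
    return e - x, m

print(normalize(127, 0x200000))      # two leading zeros -> (125, 0x800000)
print(normalize(127, 0x1800000))     # carry out -> (128, 0xC00000)
```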
Let us consider the actual hardware that is needed to implement the blocks in Figure.
The two 8-bit subtractors and the mantissa adder/subtractor can be implemented by
combinational logic, as discussed earlier in this chapter. Because their outputs must be in sign-
and-magnitude form, we must modify some of our earlier discussions. A combination of 1’s-
complement arithmetic and sign-and-magnitude representation is often used. Considerable
flexibility is allowed in implementing the SHIFTER and the output normalization operation.
The operations can be implemented with shift registers. However, they can also be built as
combinational logic units for high performance.
CONCLUSION:
This chapter explained the basic structure of a processor and how it executes
instructions. Modern processors have a multi-stage organization because such a structure is
well suited to pipelined operation. Each stage implements the actions needed in one of the
execution steps of an instruction. A five-step sequence in which each step is completed in one
clock cycle has been demonstrated. Such an approach is commonly used in processors that
have a RISC-style instruction set. The discussion assumed that the execution of one instruction
is completed before the next instruction is fetched. Only one of the five hardware stages is used
at any given time, as execution moves from one stage to the next in each clock cycle.
Computer arithmetic poses several interesting logic design problems. This chapter
discussed some of the techniques that have proven useful in designing binary arithmetic units.
The carry-lookahead technique is one of the major ideas in high-performance adder design. We
now summarize the techniques for high-speed multiplication. Bit-pair recoding of the
multiplier, derived from the Booth algorithm, can be used to initially reduce the number of
summands by a factor of two. The resulting summands can then be reduced to two in a
reduction tree with a relatively small number of reduction levels. The final product can be
generated by an addition operation that uses a carry-lookahead adder. All three of these
techniques—bit-pair recoding of the multiplier, parallel reduction of summands, and carry-
lookahead addition—have been used in various combinations by the designers of high-
performance processors to reduce the time needed to perform multiplication. The important
IEEE floating-point number representation standard was described, and rules for performing
the four standard operations were given.