Floating-Point Arithmetic: Second Slide
Floating-Point Arithmetic
- Floating-point computation is often found in systems that must handle both very small and very large real numbers and that require fast processing times.
- The term floating point refers to the fact that a number's radix point (the decimal point or, more commonly in computers, the binary point) can "float"; that is, it can be placed anywhere relative to the significant digits.
- A floating-point system can represent numbers of different orders of magnitude with a fixed number of digits.
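As a rough illustration of that last point, the sketch below prints three values of very different magnitude using the same number of mantissa digits. The sample values and the helper name `sci` are assumptions made for this example and do not come from the slides.

```python
# Illustrative values only (assumed, not from the slides).
values = [0.000000123456789, 1.23456789, 123456789000000.0]


def sci(x, digits=4):
    """Format x in scientific notation with a fixed number of mantissa digits."""
    return f"{x:.{digits}e}"


for v in values:
    # Same digit budget each time; only the exponent changes.
    print(sci(v))
```

Each line of output has the same mantissa width, and only the exponent moves the radix point.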
Second Slide
- Recall from Section 3.2.3 that the mantissa (significand) X_M and the exponent X_E are fixed-point numbers, and that the base B is the same as the base (radix) of X_M.
- We also assume that floating-point numbers are stored in normal form only, and that the final result of each floating-point arithmetic operation is normalized.
Third Slide
Floating-point arithmetic has four basic operations, each with its own general formula: addition, subtraction, multiplication, and division.
- Multiplication and division are relatively simple because the mantissas and exponents can be processed independently. Floating-point multiplication requires a fixed-point multiplication of the mantissas and a fixed-point addition of the exponents.
Fourth Slide
In multiplication
- The formula is X × Y = (X_M × Y_M) × B^(X_E + Y_E): we simply multiply the significands (mantissas) of X and Y and add their exponents.
- The product of the X and Y mantissas is 1.38758607, and the sum of the X and Y exponents is 38.
In division
- As with multiplication, we just follow the formula: X / Y = (X_M / Y_M) × B^(X_E − Y_E).
- We simply divide the X mantissa by the Y mantissa and subtract the Y exponent from the X exponent.
- The quotient of the X and Y mantissas is 1.263368939, and the difference of the X and Y exponents is -4.
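The two rules can be sketched in a few lines of Python. This is only a minimal decimal model under stated assumptions: the helper names `normalize`, `fp_mul`, and `fp_div` are invented for the example, base 10 is assumed, and the operands (2.5 × 10^3 and 5.0 × 10^1) are made up so that the normalization step is actually exercised.

```python
from decimal import Decimal


def normalize(m, e):
    """Shift a decimal mantissa until 1 <= |m| < 10, adjusting the exponent."""
    if m == 0:
        return m, 0
    while abs(m) >= 10:   # mantissa too large: shift right, raise exponent
        m /= 10
        e += 1
    while abs(m) < 1:     # mantissa too small: shift left, lower exponent
        m *= 10
        e -= 1
    return m, e


def fp_mul(xm, xe, ym, ye):
    """Multiply the mantissas, add the exponents, then normalize."""
    return normalize(xm * ym, xe + ye)


def fp_div(xm, xe, ym, ye):
    """Divide the mantissas, subtract the exponents, then normalize."""
    return normalize(xm / ym, xe - ye)


# Hypothetical operands (not from the slides): X = 2.5e3, Y = 5.0e1.
print(fp_mul(Decimal("2.5"), 3, Decimal("5.0"), 1))
print(fp_div(Decimal("2.5"), 3, Decimal("5.0"), 1))
```

Here the raw product mantissa is 12.5 and the raw quotient mantissa is 0.5, so both results need one shift before they are back in normal form.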
Fifth Slide
- Floating-point addition and subtraction are complicated by the fact that the exponents of the two input operands must be made equal before the corresponding mantissas can be added or subtracted.
- There are three main steps in floating-point addition and subtraction: compute the exponent difference, align the mantissas by shifting, and add or subtract the aligned mantissas.
Sixth Slide
- For example, add the floating-point numbers X = 1.32400111 × 10^17 and Y = 1.04799245 × 10^21.
- The first step is to subtract the exponents, Y_E − X_E. The exponent of Y is 21 and the exponent of X is 17.
- The difference Y_E − X_E is 4.
- The next step is to shift X_M to the right by the difference between X_E and Y_E. X_E − Y_E is -4, and its magnitude tells us how many places to shift X_M.
- Since X_E is the smaller exponent, X_M is shifted 4 places to the right. The value of X_M is 1.32400111; after the shift, the result is 0.00013240.
Seventh Slide
- The last step is to add the shifted mantissa X_M × 10^-4 and Y_M. The computed value of X_M × 10^-4 is 0.00013240 and the value of Y_M is 1.04799245.
- The sum of X_M × 10^-4 and Y_M is 1.04812485, with an exponent of 21, so the final result is 1.04812485 × 10^21.
- Each floating-point arithmetic operation needs an extra step to normalize the result.
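The worked addition can be reproduced with a short sketch. It assumes base 10, an 8-digit fraction, and truncation of the digits shifted out (the slides do not say whether the lost digits are truncated or rounded); the function name `fp_add` is invented for this example.

```python
from decimal import Decimal, ROUND_DOWN


def fp_add(xm, xe, ym, ye, digits=8):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    # Step 1: compare exponents; make X the operand with the smaller one.
    if xe > ye:
        xm, xe, ym, ye = ym, ye, xm, xe
    diff = ye - xe
    # Step 2: shift the smaller operand's mantissa right by diff places,
    # dropping digits that no longer fit (assumed truncation).
    quantum = Decimal(1).scaleb(-digits)
    xm_shifted = xm.scaleb(-diff).quantize(quantum, rounding=ROUND_DOWN)
    # Step 3: add the aligned mantissas; the result keeps the larger exponent.
    return xm_shifted + ym, ye


# Operands from the slides: X = 1.32400111e17, Y = 1.04799245e21.
print(fp_add(Decimal("1.32400111"), 17, Decimal("1.04799245"), 21))
```

In this case the sum is already in normal form, so no further shift is needed; in general a normalization step would follow.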
Eighth Slide
- If we add the biased exponents X_E and Y_E using ordinary binary addition, the sum is 10100 in binary (20 in decimal), whereas the expected value is 4 + 8 = 12.
- To correct this sum, we simply subtract the bias, which is 8, from 10100. In binary, 8 is 1000.
- The difference of 10100 and 1000 is 1100, which is 12 in decimal.
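The correction can be checked with a few lines of Python. The bias of 8 and the sums 10100 and 1100 come from the slide; the individual true exponents (1 and 3) are assumptions chosen only so that the biased sum comes out to 10100, and `add_biased_exponents` is a name invented for this sketch.

```python
BIAS = 8  # bias value given on the slide


def add_biased_exponents(xe_biased, ye_biased, bias=BIAS):
    """Add two biased exponents and remove the bias that was counted twice."""
    return xe_biased + ye_biased - bias


# Assumed true exponents 1 and 3 (hypothetical; the slide does not give them).
xe_b = 1 + BIAS            # 9  = 0b1001
ye_b = 3 + BIAS            # 11 = 0b1011
raw_sum = xe_b + ye_b      # plain binary addition, bias included twice
corrected = add_biased_exponents(xe_b, ye_b)

print(bin(raw_sum), bin(corrected))
```

The raw sum carries the bias twice, so subtracting it once leaves a correctly biased exponent.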
Ninth Slide
- Overflow or underflow resulting from mantissa operations can usually be corrected by shifting the mantissa of the result and modifying its exponent, for example by shifting one digit to the right and incrementing the exponent by one.
- For example, add the normalized decimal numbers X = 5.1049 × 10^7 and Y = 7.9379 × 10^7. The exponents are equal, so we simply add X_M and Y_M.
- The sum of X_M and Y_M is 13.0428 × 10^7. This is not normalized, so we shift the sum one digit to the right and increment the exponent by 1.
- The normalized value is now 1.30428 × 10^8.
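The overflow correction in this example can be sketched as follows. The code assumes base 10 and operands that already share an exponent; `add_same_exponent` is a name made up for the illustration.

```python
from decimal import Decimal


def add_same_exponent(xm, ym, e):
    """Add two mantissas with equal exponents and fix mantissa overflow."""
    s = xm + ym
    if abs(s) >= 10:        # mantissa overflow: more than one integer digit
        s = s.scaleb(-1)    # shift one digit to the right
        e += 1              # compensate by incrementing the exponent
    return s, e


# Operands from the slides: X = 5.1049e7, Y = 7.9379e7.
print(add_same_exponent(Decimal("5.1049"), Decimal("7.9379"), 7))
```

A single right shift is enough here because the sum of two normalized decimal mantissas is always below 20.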
Tenth Slide
Floating-point unit
- It is an integrated circuit that handles all mathematical operations that have anything to do with floating-point numbers or fractions.
Floating-point arithmetic can be implemented by two loosely connected fixed-point datapath