Part 5 Floating Point Add Sub Mul

The document explains the IEEE 754 standard for floating-point representation, detailing both single and double precision formats. It covers the structure of binary floating-point numbers, including the sign, exponent, and mantissa, as well as the rules for arithmetic operations such as addition, subtraction, multiplication, and division. The document emphasizes the importance of normalized values for accuracy and data exchange.

Floating Point Representation (IEEE 754 Standard): Add, Sub, Mul
Module 2

Floating Point Representation
(IEEE 754 Standard)
• A binary floating-point number can be represented by:
  • A sign for the number
  • Some significant bits
  • A signed scale factor exponent for an implied base of 2
• The basic IEEE format is a 32-bit representation
• The leftmost bit represents the sign, S, for the number.
• The next 8 bits, E, represent the signed exponent of the scale factor (with an implied base of 2),
and the remaining 23 bits, M, are the fractional part of the significant bits.
• The full 24-bit string, B, of significant bits, called the mantissa, always has a leading 1,
which is implied rather than stored.
• When the binary point is placed to the right of the first significant bit, the number is said to be
normalized

IEEE 754 Standard (Single Precision)
• Instead of the actual signed exponent, E, the value stored in the
exponent field is an unsigned integer E´ = E + 127.
• This is called the excess-127 format. Thus, E´ is in the range 0 ≤ E´ ≤
255.
• The end values of this range, 0 and 255, are used to represent special
values.
• Therefore, the range of E´ for normal values is 1 ≤ E´ ≤ 254.
• This means that the actual exponent, E, is in the range −126 ≤ E ≤ 127.
• The use of the excess-127 representation for exponents simplifies
comparison of the relative sizes of two floating-point numbers.
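
The field layout and the excess-127 bias described above can be checked with a short Python sketch. The function name decode_single is an illustrative assumption, not part of the slides; it relies only on the standard struct module to obtain the 32-bit pattern.

```python
import struct

def decode_single(x):
    """Return (S, E', M) for x stored as an IEEE 754 single-precision value.

    decode_single is a hypothetical helper name used for illustration.
    """
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # S: leftmost bit
    biased_exp = (bits >> 23) & 0xFF   # E': next 8 bits, excess-127
    fraction = bits & 0x7FFFFF         # M: remaining 23 bits
    return sign, biased_exp, fraction

s, e_prime, m = decode_single(-0.75)   # -0.75 = -1.1 (binary) x 2^-1
print(s, e_prime, m)                   # 1 126 4194304
print("actual exponent E =", e_prime - 127)   # actual exponent E = -1
```

The smallest and largest normal values give E´ = 1 and E´ = 254, matching the range stated above.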

Double Precision
• The double-precision format has increased exponent and mantissa
ranges.
• The 11-bit excess-1023 exponent E´ has the range 1 ≤ E´ ≤ 2046 for
normal values, with 0 and 2047 used to indicate special values, as
before.
• Thus, the actual exponent E is in the range −1022 ≤ E ≤ 1023,
providing scale factors of 2^−1022 to 2^1023 (approximately 10^±308).
• The 53-bit mantissa provides a precision equivalent to about 16
decimal digits
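
These double-precision limits can be cross-checked against Python's own float type, which is an IEEE 754 double on common platforms (an assumption about the interpreter, not something the slides state). The variable names below are illustrative.

```python
import math
import sys

BIAS = 1023                      # excess-1023
e_min = 1 - BIAS                 # smallest normal exponent
e_max = 2046 - BIAS              # largest normal exponent
print(e_min, e_max)              # -1022 1023

# Smallest and largest normal magnitudes follow from the exponent range.
print(sys.float_info.min == 2.0 ** e_min)                      # True
print(sys.float_info.max == (2 - 2.0 ** -52) * 2.0 ** e_max)   # True

# 53 significant bits correspond to 53 * log10(2) decimal digits.
print(sys.float_info.mant_dig)   # 53
decimal_digits = sys.float_info.mant_dig * math.log10(2)
print(round(decimal_digits, 2))  # 15.95
```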

Normalized Value

Why normalized form?
• Simplifies the exchange of data that includes floating-point numbers
• Simplifies the arithmetic algorithms to know that the numbers will
always be in this form
• Increases the accuracy of the numbers that can be stored in a word,
since each unnecessary leading 0 is replaced by another significant
digit to the right of the binary point

Floating Point Arithmetic
• Add/Subtract Rule
1. Choose the number with the smaller exponent and shift its mantissa right a
number of steps equal to the difference in exponents.
2. Set the exponent of the result equal to the larger exponent.
3. Perform addition/subtraction on the mantissas and determine the sign of the
result.
4. Normalize the resulting value, if necessary
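
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: operands are normalized, each is a tuple (sign, exponent, mantissa) with an unbiased exponent and a 24-bit integer mantissa that includes the leading 1, and bits shifted out are simply truncated (no rounding, no special values). The names fp_add and to_float are hypothetical.

```python
def to_float(value, bits=24):
    """Convert (sign, exponent, mantissa) back to a Python float for checking."""
    sign, exp, mant = value
    return (-1) ** sign * mant * 2.0 ** (exp - (bits - 1))

def fp_add(a, b, bits=24):
    """Add two normalized values following the Add/Subtract Rule."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Step 1: make `a` the operand with the larger exponent, then shift the
    # other mantissa right by the difference in exponents (truncating).
    if ea < eb:
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb
    # Step 2: the result takes the larger exponent.
    exp = ea
    # Step 3: add the signed mantissas and determine the sign of the result.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    sign = 1 if total < 0 else 0
    mant = abs(total)
    if mant == 0:
        return (0, 0, 0)          # simplified stand-in for zero
    # Step 4: normalize so the leading 1 sits in bit position bits-1.
    while mant >= (1 << bits):
        mant >>= 1
        exp += 1
    while mant < (1 << (bits - 1)):
        mant <<= 1
        exp -= 1
    return (sign, exp, mant)

three = (0, 1, 0xC00000)            # 1.5  x 2^1
half = (0, -1, 0x800000)            # 1.0  x 2^-1
minus_2_5 = (1, 1, 0xA00000)        # -1.25 x 2^1
print(to_float(fp_add(three, half)))        # 3.5
print(to_float(fp_add(three, minus_2_5)))   # 0.5
```

The second call shows why step 4 is needed: the subtraction leaves leading zeros in the mantissa, so it is shifted left and the exponent is decreased.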

Floating point addition in Decimal
• Add 2.9400 × 10^2 to 4.3100 × 10^4.
• We rewrite 2.9400 × 10^2 as 0.0294 × 10^4
• Perform addition of the mantissas to get 4.3394 × 10^4.
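
The same alignment can be reproduced with integer arithmetic. In this sketch each number is a pair (mantissa, exponent), where the mantissa is an integer holding four fractional decimal digits (so 2.9400 is stored as 29400). The representation and the name add_decimal are assumptions made for illustration.

```python
def add_decimal(a, b, digits=4):
    """Add two positive decimal floating-point values given as (mantissa, exponent)."""
    (ma, ea), (mb, eb) = a, b
    # Make `a` the operand with the larger exponent.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # Shift the smaller operand right by the exponent difference (truncating).
    mb //= 10 ** (ea - eb)
    total, exp = ma + mb, ea
    # Renormalize if the sum needs more than one digit before the point.
    while total >= 10 ** (digits + 1):
        total //= 10
        exp += 1
    return total, exp

m, e = add_decimal((29400, 2), (43100, 4))
print(f"{m / 10**4:.4f} x 10^{e}")   # 4.3394 x 10^4
```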

Floating Point Binary Representation
• Represent 85.125
• 85 = 1010101 (binary)
• 0.125 = 0.001 (binary)
• 85.125 = 1010101.001
• = 1.010101001 x 2^6
• sign = 0
• 1. Single precision:
• biased exponent 127+6=133
• 133 = 10000101
• Normalised mantissa = 010101001
• we will add 0's to complete the 23 bits

• The IEEE 754 Single precision is:
• =0 10000101 01010100100000000000000
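
The hand conversion above can be verified with Python's standard struct module. The helper name single_fields is an assumption used only for this check.

```python
import struct

def single_fields(x):
    """Return the sign, exponent, and fraction fields of x as bit strings."""
    # Pack as a 32-bit float and reread the bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return s[0], s[1:9], s[9:]

sign, exponent, fraction = single_fields(85.125)
print(sign, exponent, fraction)   # 0 10000101 01010100100000000000000
print(int(exponent, 2) - 127)     # 6
```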

Floating Point Binary Representation
• Double precision:
• 85.125
• 85 = 1010101 (binary)
• 0.125 = 0.001 (binary)
• 85.125 = 1010101.001 = 1.010101001 x 2^6
• sign = 0
• biased exponent 1023+6=1029
• 1029 = 10000000101
• Normalised mantissa = 010101001
• we will add 0's to complete the 52 bits
• The IEEE 754 Double precision is: = 0 10000000101
0101010010000000000000000000000000000000000000000000
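
The double-precision pattern can be checked the same way, using the 64-bit format codes of the struct module. The helper name double_fields is an assumption for illustration.

```python
import struct

def double_fields(x):
    """Return the sign, 11-bit exponent, and 52-bit fraction of x as bit strings."""
    # Pack as a 64-bit float and reread the bytes as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = f"{bits:064b}"
    return s[0], s[1:12], s[12:]

sign, exponent, fraction = double_fields(85.125)
print(sign, exponent)             # 0 10000000101
print(int(exponent, 2) - 1023)    # 6
print(fraction[:9], len(fraction))  # 010101001 52
```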

Example
• Perform the following arithmetic operation using floating-point arithmetic. In
each case, show how the numbers would be stored using IEEE single-precision
format.

Example 2

Multiply Rule
1. Add the exponents and subtract 127 to maintain the excess-127
representation.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary
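
A minimal Python sketch of the Multiply Rule follows. It assumes normalized operands given as (sign, biased exponent, 24-bit integer mantissa including the leading 1), truncates the product instead of rounding, and ignores overflow, underflow, and special values. The name fp_mul is hypothetical.

```python
BIAS = 127

def fp_mul(a, b):
    """Multiply two normalized single-precision-style values."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Step 1: add the biased exponents and subtract 127 once, because the
    # bias would otherwise be counted twice.
    exp = ea + eb - BIAS
    # Step 2: multiply the mantissas; the sign is the XOR of the operand signs.
    sign = sa ^ sb
    prod = ma * mb                 # up to 48 bits, scaled by 2^46
    # Step 3: each mantissa lies in [1, 2), so the product lies in [1, 4)
    # and needs at most one right shift to normalize.
    if prod >= (1 << 47):
        prod >>= 1
        exp += 1
    mant = prod >> 23              # keep the top 24 bits (truncation)
    return sign, exp, mant

three = (0, 128, 0xC00000)         # 1.5  x 2^1
minus_five = (1, 129, 0xA00000)    # -1.25 x 2^2
print(fp_mul(three, minus_five))   # (1, 130, 15728640)
```

The printed result is sign 1, E´ = 130, and mantissa 0xF00000, that is, −1.875 × 2^3 = −15.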

Multiplication
• Add the biased exponents and subtract the bias (127)
• Multiply the mantissas
• Normalise (already normalised)
• Round the result (no change)
• Adjust the sign

Divide Rule
1. Subtract the exponents and add 127 to maintain the excess-127
representation.
2. Divide the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
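
The Divide Rule can be sketched in the same style. The assumptions match the multiply sketch: normalized operands as (sign, biased exponent, 24-bit integer mantissa including the leading 1), a nonzero divisor, truncation instead of rounding, and no handling of special values. The name fp_div is hypothetical.

```python
BIAS = 127

def fp_div(a, b):
    """Divide a by b for normalized single-precision-style values."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Step 1: subtract the biased exponents and add 127 back, because the
    # subtraction cancels the bias.
    exp = ea - eb + BIAS
    # Step 2: divide the mantissas; the sign is the XOR of the operand signs.
    sign = sa ^ sb
    quot = (ma << 24) // mb        # quotient scaled by 2^24, in [2^23, 2^25)
    # Step 3: the mantissa ratio lies in (0.5, 2). If it is below 1, the
    # quotient already holds the doubled value, so only the exponent changes.
    if quot >= (1 << 24):
        mant = quot >> 1
    else:
        mant = quot
        exp -= 1
    return sign, exp, mant

minus_fifteen = (1, 130, 0xF00000)   # -1.875 x 2^3
three = (0, 128, 0xC00000)           # 1.5 x 2^1
print(fp_div(minus_fifteen, three))  # (1, 129, 10485760)
```

The printed result is sign 1, E´ = 129, and mantissa 0xA00000, that is, −1.25 × 2^2 = −5.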
