15 - Floating Point Encoding

Uploaded by

ranbir singh

Floating-point Number Encoding

IEEE 754 standard

Reading and Exercises
• P & H: Section 3.5

Objective
At the end of this section, you will understand
1. How floating-point numbers are represented

Floating-Point Numbers
• Most modern architectures support floating-point representations for fractional quantities
▪ Usually implement the IEEE 754 standard
• Most CPUs now include floating-point units (FPUs)
▪ Have instructions to do floating-point arithmetic
• Very quick
▪ If an FPU is missing, floating-point operations are simulated in software
• Very slow

Floating-Point Numbers (cont’d)
• A floating-point number stored in a fixed-size register may only approximate a real value
▪ Beware: rounding errors may accumulate when doing repeated FP calculations
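The accumulation of rounding errors can be seen in a small sketch. It assumes Python, whose `float` type is an IEEE 754 double on common platforms; the variable name `total` is only illustrative.

```python
import math

# 0.1 has no exact binary representation, so every addition
# below rounds the running total slightly.
total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)              # exact comparison is unreliable here
print(math.isclose(total, 1.0))  # comparing within a tolerance is safer
print(math.fsum([0.1] * 10))     # fsum compensates for the lost low-order bits
```

A tolerance-based comparison or a compensated sum is usually preferable to testing floating-point results for exact equality.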

Fixed-Point Numbers
• With integers, the binary point is assumed to be to the right of the LSB
▪ E.g.: 4-bit register
0 1 0 1 .
• Gives the value 5.0:
0 × 2^3 + 1 × 2^2 + 0 × 2^1 + 1 × 2^0 = 5
• Fractional quantities can be represented by moving the binary point to a new position
▪ i.e., scaling the integer by a power of 2

Fixed-Point Numbers (cont’d)
▪ E.g.: 4-bit register with the binary point two places from the right
0 1 . 1 1
• Gives the value 1.75:
0 × 2^1 + 1 × 2^0 + 1 × 2^{-1} + 1 × 2^{-2} = 0 + 1 + 0.5 + 0.25 = 1.75

• The position of the point is established by convention
• Fixed point is used for the significand in floating-point representations
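The two register examples can be checked with a minimal sketch. The helper name `fixed_to_float` and its parameter `frac_bits` (the number of bits to the right of the binary point) are assumptions made for illustration, not part of the slides.

```python
def fixed_to_float(raw: int, frac_bits: int) -> float:
    """Interpret an unsigned integer as fixed point.

    frac_bits is the number of bits to the right of the binary point,
    so the integer is scaled by 2**-frac_bits.
    """
    return raw / (1 << frac_bits)


print(fixed_to_float(0b0101, 0))  # binary point right of the LSB
print(fixed_to_float(0b0111, 2))  # binary point two places left: 01.11
```

The stored bits are identical in both interpretations; only the agreed position of the point changes the value.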

Floating-Point Single Format
• Any nonzero number N can be represented as:
N = (-1)^s × 1.f × 2^u
▪ s = sign bit
▪ 1.f = significand or mantissa
▪ u = unbiased exponent
• 11.01 = (-1)^0 × 1.101 × 2^1 (binary; 3.25 in decimal)
• -101.011 = (-1)^1 × 1.01011 × 2^2 (binary; -5.375 in decimal)
• That is, we need only know s, f, and u
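A hedged sketch of this normalization, using Python's `math.frexp`. The function name `normalize` is hypothetical. `frexp` returns a fraction in [0.5, 1), so the result is rescaled by one binary place to get a significand in [1, 2).

```python
import math


def normalize(x: float) -> tuple[int, float, int]:
    """Return (s, significand, u) such that
    x == (-1)**s * significand * 2**u and 1.0 <= significand < 2.0."""
    if x == 0.0:
        raise ValueError("zero has no normalized form")
    s = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    return s, m * 2.0, e - 1   # shift one place so the significand is 1.f


print(normalize(3.25))    # 11.01 in binary
print(normalize(-5.375))  # -101.011 in binary
```

The significands 1.625 and 1.34375 are the decimal values of 1.101 and 1.01011 in binary.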

Floating-Point Single Format
• Analogous to scientific notation
▪ The differences are:
• Base 2 instead of base 10
• Uses a sign bit s
• The significand and exponent are in binary
• The number is normalized so that: 1.0 ≤ significand < 2.0
▪ The significand is stored in fixed point format
▪ Since the MSB is always 1, it is not stored
• Provides one more bit of precision
• The exponent is biased by adding the constant 127
▪ Ensures it will always be stored as a positive integer

Floating-Point Single Format (cont’d)
• IEEE 754 Standard:
Field: s | e[7:0] | f[22:0]
Bits: 31 | 30 to 23 | 22 to 0
▪ Uses 4 bytes
▪ Number represented is: N = (-1)^s × 1.f × 2^{e-127}
• s: sign bit
• e: biased exponent
▪ = 127 (0x7f) + unbiased exponent
• f: fractional part of the significand
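The field layout can be inspected with a short sketch. It assumes Python's standard `struct` module, whose `">f"` format packs a value as a big-endian IEEE 754 single; the helper name `single_fields` is illustrative.

```python
import struct


def single_fields(x: float) -> tuple[int, int, int]:
    """Round x to single precision and return its (s, e, f) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31            # bit 31
    e = (bits >> 23) & 0xFF   # bits 30 to 23, biased exponent
    f = bits & 0x7FFFFF       # bits 22 to 0, fraction without the hidden 1
    return s, e, f


print(single_fields(1.0))     # unbiased exponent 0, so e is 127
print(single_fields(-5.375))  # -1.01011 x 2^2 in binary, so e is 129
```

For -5.375 the fraction field holds the bits 01011 followed by 18 zeros, since the leading 1 of the significand is not stored.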

Floating-Point Single Format (cont’d)
▪ Largest biased exponent allowed for a normalized number is 254 (0xfe)
• 255 (0xff) is reserved for ±∞ (f = 0) and for quantities that are not numbers (so-called NaNs, f ≠ 0)
• Thus, the largest possible unbiased exponent is 127
▪ The smallest biased exponent allowed for a normalized number is 1
• 0 is used for zero (f = 0) and for subnormal numbers (tiny fractional quantities, f ≠ 0)
• Thus, the smallest possible unbiased exponent is -126
▪ Range of normalized magnitudes: 1.0 × 2^{-126} to (2.0 - ε) × 2^{127}, where ε = 2^{-23} is the spacing between adjacent significands
• 1.17549435e-38 to 3.40282346e+38
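The two limits can be cross-checked against the bit patterns with the smallest and largest normalized exponents. This is a sketch using Python's `struct` module; `bits_to_single` is a hypothetical helper, and ε is taken as 2^-23 (the weight of the last fraction bit).

```python
import struct


def bits_to_single(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


# Limits computed from the formula, with epsilon = 2**-23.
smallest_normal = 1.0 * 2.0**-126
largest_normal = (2.0 - 2.0**-23) * 2.0**127

# e = 1, f = 0 and e = 254, f = all ones.
print(bits_to_single(0x00800000) == smallest_normal)
print(bits_to_single(0x7F7FFFFF) == largest_normal)
print(f"{smallest_normal:.8e} {largest_normal:.8e}")
```

Both products are exact in double precision, so the equality comparisons are meaningful here.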

Floating-Point Single Format (cont’d)
▪ E.g.: 0x3e800000
Bits: 0 | 01111101 | 00000000000000000000000
Field: s (bit 31) | e (bits 30 to 23) | f (bits 22 to 0)
• f is 0
• e is 125
• s is 0
• (-1)^0 × 1.0 × 2^{125-127} = 1 × 1.0 × 2^{-2} = +0.25
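The worked example can be reproduced by applying the formula directly to the bit pattern. This is a minimal sketch for normalized numbers only; the name `decode_single` is an assumption, and the special exponents 0 and 255 are deliberately rejected.

```python
def decode_single(bits: int) -> float:
    """Apply N = (-1)^s * 1.f * 2^(e-127) to a normalized single-format pattern."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if not 1 <= e <= 254:
        raise ValueError("only normalized numbers are handled in this sketch")
    significand = 1.0 + f / 2**23  # restore the hidden leading 1
    return (-1.0) ** s * significand * 2.0 ** (e - 127)


print(decode_single(0x3E800000))  # 0.25
print(decode_single(0xC0AC0000))  # -5.375
```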

Floating-Point Double Format
• IEEE 754 Standard:
Field: s | e[10:0] | f[51:0]
Bits: 63 | 62 to 52 | 51 to 0
▪ Uses 8 bytes
▪ Number represented is: N = (-1)^s × 1.f × 2^{e-1023}
• s: sign bit
• e: biased exponent
▪ Created by adding 1023 (0x3ff) to the unbiased exponent
• f: fractional part of the significand
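The double layout can be examined the same way as the single layout, only with wider fields. The sketch below assumes Python's `struct` format `">d"` (big-endian IEEE 754 double); `double_fields` is an illustrative name.

```python
import struct


def double_fields(x: float) -> tuple[int, int, int]:
    """Return the (s, e, f) fields of x in double format."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                 # bit 63
    e = (bits >> 52) & 0x7FF       # bits 62 to 52, biased by 1023
    f = bits & ((1 << 52) - 1)     # bits 51 to 0
    return s, e, f


print(double_fields(0.25))    # 1.0 x 2^-2, so e is 1021
print(double_fields(-5.375))  # -1.01011 x 2^2 in binary, so e is 1025
```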

Floating-Point Double Format (cont’d)
▪ Range of normalized magnitudes: ~2.2e-308 to ~1.8e+308
▪ Has about 15 to 17 significant decimal digits of precision (53-bit significand)
▪ ±∞ and NaNs are represented using a biased exponent of 2047 (0x7ff)
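These limits can be read from the running system. The sketch assumes a platform where Python's `float` is an IEEE 754 double, which is the common case; `sys.float_info` reports the parameters of that type.

```python
import sys

info = sys.float_info
print(info.min, info.max)       # smallest and largest normalized magnitudes
print(info.mant_dig, info.dig)  # significand bits, digits always preserved

# 17 significant digits are enough to round-trip any double,
# while 15 digits are not always enough.
x = 0.1 + 0.2
print(float(f"{x:.17g}") == x)
print(float(f"{x:.15g}") == x)
```

This is why the precision is often quoted as a range: 15 decimal digits always survive a trip through a double, and 17 digits always identify a double uniquely.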

Floating-Point NaNs
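• A NaN (“Not a Number”) is an entity that cannot be represented using conventional numbers
• In single format, NaNs and ±∞ use a biased exponent of 0xff
▪ positive ∞ : 0x7f800000 (f = 0)
▪ negative ∞ : 0xff800000 (f = 0)
▪ NaN, e.g. the result of √-1 : 0x7fffffff (any f ≠ 0 encodes a NaN; the exact pattern produced is implementation-dependent)
• Strictly, ±∞ are not NaNs; they share the exponent 0xff but have f = 0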
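The encodings can be told apart from the e and f fields alone. The classifier below is a sketch; the function name and the category labels are chosen for illustration.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit single-format pattern by its exponent and fraction."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0xFF:
        return "infinity" if f == 0 else "NaN"
    if e == 0:
        return "zero" if f == 0 else "subnormal"
    return "normal"


for pattern in (0x7F800000, 0xFF800000, 0x7FFFFFFF, 0x3E800000):
    print(hex(pattern), classify_single(pattern))
```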

Floating-Point NaNs (cont’d)
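• In double format, NaNs and ±∞ use a biased exponent of 0x7ff
▪ positive ∞ : 0x7ff00000 00000000
▪ negative ∞ : 0xfff00000 00000000
▪ NaN, e.g. the result of √-1 : 0x7fffffff ffffffff
• ±∞ results from overflow or from dividing a nonzero number by 0; a NaN results from invalid operations such as 0/0, ∞ - ∞, or taking the square root of a negative number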
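A sketch of how these values arise and how they are encoded in double format. One caveat: Python itself raises `ZeroDivisionError` for `1.0 / 0.0` and `ValueError` for `math.sqrt(-1.0)` instead of returning ∞ or a NaN, so the example uses overflow and ∞ - ∞ instead. The helper `double_hex` is hypothetical.

```python
import math
import struct


def double_hex(x: float) -> str:
    """Return the 64-bit pattern of x as 16 hex digits."""
    return struct.pack(">d", x).hex()


print(double_hex(math.inf))    # 7ff0000000000000
print(double_hex(-math.inf))   # fff0000000000000

overflow = 1e308 * 10          # too large for a double, becomes +infinity
invalid = math.inf - math.inf  # invalid operation, becomes a NaN
print(overflow, invalid)
print(double_hex(invalid))     # exponent 0x7ff with a nonzero fraction
```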

Floating-Point NaNs (cont’d)
• A NaN operand propagates through most arithmetic instructions, so the result is also a NaN; a signaling NaN additionally raises the invalid-operation exception
▪ However, ±∞ can be compared to a conventional number using fcmp
▪ May also be used as a valid argument to some functions
• E.g.: atan(∞) returns π / 2
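The same behaviour can be observed from a high-level language. This sketch uses Python's `math` module rather than the fcmp instruction named above, so it only illustrates the IEEE 754 semantics, not the machine-level encoding of the comparison.

```python
import math

nan = math.nan

# Infinity orders correctly against ordinary numbers.
print(math.inf > 1e308)
print(-math.inf < -1e308)

# A NaN propagates through arithmetic and compares unequal to everything,
# including itself.
print(math.isnan(nan + 1.0))
print(nan == nan)

# Infinity is a valid argument to some functions.
print(math.atan(math.inf), math.pi / 2)
```

Because a NaN never compares equal to itself, a test such as `math.isnan` is needed to detect it.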
