chapter02b float (Chinese)

The document provides an overview of computer systems from a programmer's perspective, focusing on the representation and manipulation of information as bits, including integers and floating-point numbers. It covers the IEEE floating point standard, its properties, and the encoding of floating-point numbers, including normalized and denormalized values. Additionally, it discusses limitations in representable numbers and special values like NaN and infinity.

Computer Systems:

A Programmer’s Perspective
Course: Computer Systems
Zhou Xuehai (周学海)
xhzhou@ustc.edu.cn
0551-63492149
University of Science and Technology of China
Review
• Representing information as bits
• Bit-level manipulations
• Integers
– Representation: unsigned and signed
– Conversion, casting
– Expanding, truncating
– Addition, negation, multiplication, shifting
• Summary
Floating Point

• Background: Fractional binary numbers

• IEEE floating point standard: Definition
• Example and properties
• Rounding, addition, multiplication
• Floating point in C
• Summary

10/17/2024 Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, Third Edition 3
Floating Point Puzzles
• For each of the following C expressions, either:
– Argue that it is true for all argument values
– Explain why not true
• Declarations (assume neither d nor f is NaN):
int x = …;
float f = …;
double d = …;
• x == (int)(float) x
• x == (int)(double) x
• f == (float)(double) f
• d == (float) d
• f == -(-f);
• 2/3 == 2/3.0
• d < 0.0 ⇒ ((d*2) < 0.0)
• d > f ⇒ -f > -d
• d * d >= 0.0
• (d+f)-d == f

Fractional binary numbers

• What is $1011.101_2$?

Fractional Binary Numbers
[Figure: bit positions $b_i\, b_{i-1} \cdots b_2\, b_1\, b_0 \,.\, b_{-1}\, b_{-2}\, b_{-3} \cdots b_{-j}$; bits left of the binary point have weights $2^i, 2^{i-1}, \ldots, 4, 2, 1$ and bits right of it have weights $1/2, 1/4, 1/8, \ldots, 2^{-j}$]
• Fractional binary numbers
– Bits to right of "binary point" represent fractional powers of 2
– Represents rational number: $\sum_{k=-j}^{i} b_k \times 2^k$

Frac. Binary Number Examples
• Value      Representation
5 + 3/4      $101.11_2$ = 4 + 1 + 1/2 + 1/4
2 + 7/8      $10.111_2$ = 2 + 1/2 + 1/4 + 1/8
63/64        $0.111111_2$ = 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + 1/64
• Observations
– Divide by 2 by shifting right
– Multiply by 2 by shifting left
– Numbers of form $0.111111…_2$ are just below 1.0
• $1/2 + 1/4 + 1/8 + \cdots + 1/2^i + \cdots \to 1.0$
• Use notation $1.0 - \varepsilon$

Representable Numbers

• Limitation #1
– Can only exactly represent numbers of the form $x/2^k$
• Other rational numbers have repeating bit representations

• Value    Representation
– 1/3      $0.0101010101[01]…_2$
– 1/5      $0.001100110011[0011]…_2$
– 1/10     $0.0001100110011[0011]…_2$

• Limitation #2
– Just one setting of binary point within the w bits
• Limited range of numbers (very small values? very large?)

IEEE Floating Point

• IEEE Standard 754

– Established in 1985 as uniform standard for floating point arithmetic
• Before that, many idiosyncratic formats
– Supported by all major CPUs
– Some CPUs don’t implement IEEE 754 in full
e.g., early GPUs, Cell BE processor

• Driven by Numerical Concerns

– Nice standards for rounding, overflow, underflow
– Hard to make go fast in hardware
• Numerical analysts predominated over hardware types in defining standard

IEEE 754-2019

Floating Point Representation
• Numerical form: $(-1)^s \times M \times 2^E$
• Sign bit s determines whether number is negative or positive
• Significand (mantissa) M is normally a fractional value in range [1.0, 2.0)
• Exponent E weights value by power of two

• Encoding
– MSB is sign bit s
– exp field encodes E
– frac field encodes M

Field layout: | s | exp | frac |

Floating Point Precisions
• Single precision: 32 bits, ≈ 7 decimal digits, range ≈ $10^{\pm 38}$

| s (1 bit) | exp (8 bits) | frac (23 bits) |

• Double precision: 64 bits, ≈ 16 decimal digits, range ≈ $10^{\pm 308}$

| s (1 bit) | exp (11 bits) | frac (52 bits) |

• Extended precision: 80 bits (Intel only)

| s (1 bit) | exp (15 bits) | frac (63 or 64 bits) |

• Other formats: half precision (FP16), quad precision, FP8

Inference accuracy of FP8 in deep learning models

Model inference accuracy with different Float8 formats

Three “kinds” of floating point numbers

“Normalized” Numeric Values
• When a value is normalized:
– Condition: exp ≠ 000…0 and exp ≠ 111…1
• The exponent E (a signed integer) is encoded as a non-negative integer exp by adding a bias
– Exponent coded as biased value: E = exp – Bias
– exp: unsigned value denoted by the exp field
– Bias: bias value
• Single precision: 127 (exp: 1…254 ⇒ E: -126…127) (8 exponent bits)
• Double precision: 1023 (exp: 1…2046 ⇒ E: -1022…1023) (11 exponent bits)
• In general: Bias = $2^{e-1} - 1$, where e is the number of exponent bits
• The frac field is interpreted as the fractional part of the significand
– Significand coded with implied leading 1: M = $1.xxx…x_2$
– xxx…x: bits of frac
– Minimum when 000…0 (M = 1.0)
– Maximum when 111…1 (M = 2.0 – ε)
– Get extra leading bit for "free"

Normalized Encoding Example
• Value: float F = 15213.0;
– $15213_{10} = 11101101101101_2 = 1.1101101101101_2 \times 2^{13}$
• Significand
M = $1.1101101101101_2$
frac = $11011011011010000000000_2$ (23 bits)
• Exponent
E = 13
Bias = 127
exp = 140 = $10001100_2$
• Floating Point Representation (Class 02):
Hex:    4    6    6    D    B    4    0    0
Binary: 0100 0110 0110 1101 1011 0100 0000 0000
140:     100 0110 0
15213:             1110 1101 1011 01

Denormalized Values
• When a value is denormalized
– Condition: exp = 000…0
• Interpretation of the exponent and significand
– Exponent value E = 1 – Bias
– Significand coded with implied leading 0: M = $0.xxx…x_2$
• xxx…x: bits of frac
• Two cases
– exp = 000…0, frac = 000…0
• Represents value 0
• Note that there are distinct values +0 and –0
– exp = 000…0, frac ≠ 000…0
• Numbers very close to 0.0
• Lose precision as they get smaller
• "Gradual underflow"

Special Values
• Condition for special values: exp = 111…1
• Case 1: exp = 111…1, frac = 000…0
– Represents value ∞ (infinity)
– Result of an operation that overflows
– Both positive and negative
– E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
• Case 2: exp = 111…1, frac ≠ 000…0
– Not-a-Number (NaN)
– Represents case when no numeric value can be determined
– E.g., sqrt(–1), ∞ − ∞, ∞ × 0

C float Decoding Example
• float: 0xC0A00000
• binary: 1100 0000 1010 0000 0000 0000 0000 0000
• s = 1 ⇒ negative number
• exp = $10000001_2$ = 129, so E = exp – Bias = 129 – 127 = 2
• M = $1.010\,0000\,0000\,0000\,0000\,0000_2$ = 1 + 1/4 = 1.25
• v = $(-1)^s \times M \times 2^E = (-1)^1 \times 1.25 \times 2^2 = -5$

C float Decoding Example #2
• float: 0x001C0000
• binary: 0000 0000 0001 1100 0000 0000 0000 0000
• s = 0 ⇒ positive number
• exp = 0 (denormalized), so E = 1 – Bias = 1 – 127 = -126
• M = $0.001\,1100\,0000\,0000\,0000\,0000_2$ = 1/8 + 1/16 + 1/32 = 7/32 = $7 \times 2^{-5}$
• v = $(-1)^s \times M \times 2^E = (-1)^0 \times 7 \times 2^{-5} \times 2^{-126} = 7 \times 2^{-131} \approx 2.571393892 \times 10^{-39}$

Summary of Floating Point Real Number Encoding

[Figure: number line, left to right: NaN | −∞ | −Normalized | −Denorm | −0 | +0 | +Denorm | +Normalized | +∞ | NaN]

Tiny Floating Point Example

Bit layout: bit 7 = s (1 bit) | bits 6–3 = exp (4 bits) | bits 2–0 = frac (3 bits)

• 8-bit floating point representation
– the sign bit is in the most significant bit
– the next four bits are the exponent, with a bias of 7 ($2^{4-1} - 1$)
– the last three bits are the frac
• Same general form as IEEE format
– normalized, denormalized
– representation of 0, NaN, infinity

Values Related to the Exponent
exp (decimal)  exp (binary)  E     2^E
0              0000          -6    1/64   (denorms)
1              0001          -6    1/64
2              0010          -5    1/32
3              0011          -4    1/16
4              0100          -3    1/8
5              0101          -2    1/4
6              0110          -1    1/2
7              0111           0    1
8              1000          +1    2
9              1001          +2    4
10             1010          +3    8
11             1011          +4    16
12             1100          +5    32
13             1101          +6    64
14             1110          +7    128
15             1111          n/a          (inf, NaN)

Dynamic Range
Group          s exp  frac    E    Value
Denormalized   0 0000 000    -6    0
               0 0000 001    -6    1/8 × 1/64 = 1/512     (closest to zero)
               0 0000 010    -6    2/8 × 1/64 = 2/512
               …
               0 0000 110    -6    6/8 × 1/64 = 6/512
               0 0000 111    -6    7/8 × 1/64 = 7/512     (largest denorm)
Normalized     0 0001 000    -6    8/8 × 1/64 = 8/512     (smallest norm)
               0 0001 001    -6    9/8 × 1/64 = 9/512
               …
               0 0110 110    -1    14/8 × 1/2 = 14/16
               0 0110 111    -1    15/8 × 1/2 = 15/16     (closest to 1 below)
               0 0111 000     0    8/8 × 1 = 1
               0 0111 001     0    9/8 × 1 = 9/8          (closest to 1 above)
               0 0111 010     0    10/8 × 1 = 10/8
               …
               0 1110 110     7    14/8 × 128 = 224
               0 1110 111     7    15/8 × 128 = 240       (largest norm)
Infinity       0 1111 000    n/a   inf

Dynamic Range
• 6-bit IEEE-like format
– e = 3 exponent bits
– f = 2 fraction bits
– Bias is $2^{3-1} - 1 = 3$

• Notice: the distribution gets denser towards 0.

[Figure: number line from -15 to 15 marking the denormalized, normalized, and infinity values of the 6-bit format]

Distribution of Values (close-up view)

• 6-bit IEEE-like format

– e = 3 exponent bits
– f = 2 fraction bits
– Bias is 3

[Figure: close-up of the number line from -1 to 1; legend: Denormalized, Normalized, Infinity]

Interesting Numbers
• Description                 exp     frac    Numeric value
• Zero                        00…00   00…00   0.0
• Smallest pos. denormalized  00…00   00…01   $2^{-\{23,52\}} \times 2^{-\{126,1022\}}$
– Single ≈ $1.4 \times 10^{-45}$
– Double ≈ $4.9 \times 10^{-324}$
• Largest denormalized        00…00   11…11   $(1.0 - \varepsilon) \times 2^{-\{126,1022\}}$
– Single ≈ $1.18 \times 10^{-38}$
– Double ≈ $2.2 \times 10^{-308}$
• Smallest pos. normalized    00…01   00…00   $1.0 \times 2^{-\{126,1022\}}$
– Just larger than largest denormalized
• One                         01…11   00…00   1.0
• Largest normalized          11…10   11…11   $(2.0 - \varepsilon) \times 2^{\{127,1023\}}$
– Single ≈ $3.4 \times 10^{38}$
– Double ≈ $1.8 \times 10^{308}$

Special Properties of Encoding
• FP zero has the same representation as integer zero
– All bits = 0
• In most cases, unsigned integer comparison can be applied to floating-point bit patterns
– Must first compare sign bits
– Must consider –0 = 0
– NaNs are problematic
• Bit pattern will be greater than that of any other value
• What should comparison yield? The answer is complicated.
– Otherwise OK
• Denorm vs. normalized
• Normalized vs. infinity

Floating Point

• Background: Fractional binary numbers

• IEEE floating point standard: Definition
• Example and properties
• Rounding, addition, multiplication
• Floating point in C
• Summary

Floating Point Operations: Basic Idea

• $x +_f y = \mathrm{Round}(x + y)$
• $x \times_f y = \mathrm{Round}(x \times y)$
• Basic idea
– First compute exact result
– Make it fit into desired precision
• Possibly overflow if exponent too large
• Possibly round to fit into frac

Floating Point Operations
• Basic idea
– First compute exact result
– Make it fit into desired precision
• Possibly overflow if exponent too large
• Possibly round to fit into frac
• Rounding modes (illustrated with $ rounding)
                           $1.40  $1.60  $1.50  $2.50  –$1.50
– Towards zero             $1     $1     $1     $2     –$1
– Round down (−∞)          $1     $1     $1     $2     –$2
– Round up (+∞)            $2     $2     $2     $3     –$1
– Nearest even (default)   $1     $2     $2     $2     –$2

Note:
1. Round down: rounded result is close to but no greater than true result.
2. Round up: rounded result is close to but no less than true result.

Closer Look at Round-To-Even
• Default rounding mode in IEEE 754: Round-To-Even
– Hard to get any other kind without dropping into assembly
• C99 has support for rounding mode management
– All others are statistically biased
• Sum of set of positive numbers will consistently be over- or under-estimated
• Round-To-Even
– When exactly halfway between two possible values
• Round so that least significant digit is even
– E.g., round to nearest hundredth
1.2349999 → 1.23 (less than halfway)
1.2350001 → 1.24 (greater than halfway)
1.2350000 → 1.24 (halfway: round up)
1.2450000 → 1.24 (halfway: round down)

Rounding Binary Numbers
• Binary fractional numbers
– "Even" when least significant bit is 0

• Example
– Round to nearest 1/4 (2 bits right of binary point)
Value    Binary         Rounded     Action         Rounded value
2 3/32   $10.00011_2$   $10.00_2$   (< 1/2: down)  2
2 3/16   $10.00110_2$   $10.01_2$   (> 1/2: up)    2 1/4
2 7/8    $10.11100_2$   $11.00_2$   (= 1/2: up)    3
2 5/8    $10.10100_2$   $10.10_2$   (= 1/2: down)  2 1/2

Rounding
Bit positions in 1.BBGRXXX:
– Guard bit (G): LSB of result
– Round bit (R): 1st bit removed
– Sticky bit (S): OR of remaining bits (XXX)

• Conditions for rounding up
– Round = 1, Sticky = 1 ⇒ > 0.5
– Guard = 1, Round = 1, Sticky = 0 ⇒ round to even

Fraction    GRS   Incr?   Rounded
1.0000000   000   N       1.000
1.1010000   100   N       1.101
1.0001000   010   N       1.000
1.0011000   110   Y       1.010
1.0001010   011   Y       1.001
1.1111100   111   Y       10.000

FP Multiplication
• Operands: $(-1)^{s_1} M_1 2^{E_1} \times (-1)^{s_2} M_2 2^{E_2}$
• Exact result: $(-1)^s M 2^E$
– Sign s: s1 ^ s2
– Significand M: M1 × M2
– Exponent E: E1 + E2
• Adjusting the result
– If M ≥ 2, shift M right, increment E
– If E out of range, overflow
– Round M to fit frac precision
• Implementation effort
– Biggest chore is multiplying significands

FP Addition
• Operands: $(-1)^{s_1} M_1 2^{E_1} + (-1)^{s_2} M_2 2^{E_2}$
– Assume E1 > E2
• Exact result: $(-1)^s M 2^E$
– Sign s, significand M:
• Result of signed align & add
– Exponent E: E1
[Figure: $(-1)^{s_2} M_2$ is shifted right by E1 – E2 positions to align with $(-1)^{s_1} M_1$; their sum is $(-1)^s M$]
• Adjusting the result
– If M ≥ 2, shift M right, increment E
– If M < 1, shift M left k positions, decrement E by k
– Overflow if E out of range
– Round M to fit frac precision

Mathematical Properties of FP Add
• Does it form an Abelian group?
– Closed under addition? YES
• But may generate infinity or NaN
– Commutative? YES
– Associative? NO
• Overflow and inexactness of rounding
• (3.14+1e10)-1e10 = 0; 3.14+(1e10-1e10) = 3.14
– 0 is additive identity? YES
– Every element has additive inverse? ALMOST
• Except for infinities & NaNs
• Monotonicity
– a ≥ b ⇒ a+c ≥ b+c? ALMOST
• Except for infinities & NaNs

Math. Properties of FP Mult
• Does it form a commutative ring?
– Closed under multiplication? YES
• But may generate infinity or NaN
– Multiplication commutative? YES
– Multiplication is associative? NO
• Possibility of overflow, inexactness of rounding
– 1 is multiplicative identity? YES
– Multiplication distributes over addition? NO
• Possibility of overflow, inexactness of rounding
• 1e20*(1e20-1e20) = 0.0, 1e20*1e20 – 1e20*1e20 = NaN
• Monotonicity
– a ≥ b & c ≥ 0 ⇒ a*c ≥ b*c? ALMOST
• Except for infinities & NaNs

Floating Point in C
• C supports floating-point operations at two precisions
float    single precision
double   double precision

• Conversion rules between data types
– Casting between int, float, & double changes the bit representation and may change the numeric value
– double or float to int
• Truncates fractional part
• Like rounding toward zero
• Not defined when out of range
– Generally sets to TMin
– int to double
• Exact conversion, as long as int has ≤ 53 bit word size
– int to float
• Will round according to rounding mode

Answers to Floating Point Puzzles
int x = …;
float f = …;
double d = …;
Assume neither d nor f is NaN

• x == (int)(float) x          No: 24 bit significand
• x == (int)(double) x         Yes: 53 bit significand
• f == (float)(double) f       Yes: increases precision
• d == (float) d               No: loses precision
• f == -(-f);                  Yes: just change sign bit
• 2/3 == 2/3.0                 No: 2/3 == 0
• d < 0.0 ⇒ ((d*2) < 0.0)      Yes!
• d > f ⇒ -f > -d              Yes!
• d * d >= 0.0                 Yes!
• (d+f)-d == f                 No: not associative

Summary
• IEEE 754 floating-point arithmetic has clear mathematical properties
– Its behavior can be predicted independently of the implementation
– As if computed with perfect precision and then rounded
– Floating-point numbers have the form $M \times 2^E$

• Differences from real arithmetic:
– Violates associativity/distributivity
– Makes life difficult for compilers & serious numerical applications programmers

Additional Slides: Creating Floating Point Number

• Basic steps
– Normalize to have leading 1
– Round to fit within fraction
– Postnormalize to deal with effects of rounding
• Example
– Convert 8-bit unsigned numbers to tiny floating point format

Normalize

• Basic steps
– Set binary point so that numbers are of form 1.xxxxx
– Adjust all to have leading one
• Decrement exponent as shift left

Postnormalize

• Post-normalization
– Rounding may have caused overflow
– Handle by shifting right once & incrementing exponent

This is important!
• Ariane 5 exploded on its maiden flight, destroying cargo worth $500 million
– Exploded 37 seconds after liftoff

• Cause:
– 64-bit floating point number assigned to 16-bit integer
• Computed horizontal velocity as floating point number
• Converted to 16-bit integer
• Worked OK for Ariane 4
• Overflowed for Ariane 5
• Used same software
– Caused rocket to get incorrect value of horizontal velocity and crash
• Patriot missile defense system failed to intercept a Scud missile: 28 people killed
– System tracks time in tenths of a second
– Converted from integer to floating point number
– Accumulated rounding error causes drift: 20% drift over 8 hours
– Eventually (on 2/25/1991 the system had been on for 100 hours) causes range misestimation sufficiently large to miss incoming missiles

Acknowledgements

• This course was developed and fine-tuned by Randal E. Bryant and David O'Hallaron. They wrote The Book!
• http://www.cs.cmu.edu/~./213/schedule.html

Definitions of groups, rings, and fields
