chapter02b float (Chinese)

The document provides an overview of computer systems from a programmer's perspective, focusing on the representation and manipulation of information as bits, including integers and floating-point numbers. It covers the IEEE floating point standard, its properties, and the encoding of floating-point numbers, including normalized and denormalized values. Additionally, it discusses limitations in representable numbers and special values like NaN and infinity.


Computer Systems:

A Programmer’s Perspective
Computer Systems
Zhou Xuehai (周学海)
xhzhou@ustc.edu.cn
0551-63492149
University of Science and Technology of China
Review
• Representing information as bits
• Bit-level manipulations
• Integers
– Representation: unsigned and signed
– Conversion, casting
– Expanding, truncating
– Addition, negation, multiplication, shifting
• Summary
Floating Point

• Background: Fractional binary numbers


• IEEE floating point standard: Definition
• Example and properties
• Rounding, addition, multiplication
• Floating point in C
• Summary

10/17/2024 Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, Third Edition 3
Floating Point Puzzles
• For each of the following C expressions, either:
– Argue that it is true for all argument values
– Explain why it is not true
• Declarations: int x = …; float f = …; double d = …; (assume neither d nor f is NaN)
• x == (int)(float) x
• x == (int)(double) x
• f == (float)(double) f
• d == (float) d
• f == -(-f);
• 2/3 == 2/3.0
• d < 0.0 ⇒ ((d*2) < 0.0)
• d > f ⇒ -f > -d
• d * d >= 0.0
• (d+f)-d == f
Fractional binary numbers

• What is $1011.101_2$?
Fractional Binary Numbers
• Bit layout: $b_i\, b_{i-1} \cdots b_2\, b_1\, b_0 \,.\, b_{-1}\, b_{-2}\, b_{-3} \cdots b_{-j}$
– Weights to the left of the point: $2^i, 2^{i-1}, \ldots, 4, 2, 1$
– Weights to the right of the point: $1/2, 1/4, 1/8, \ldots, 2^{-j}$
• Binary fractions
– Bits to the right of the "binary point" represent fractional powers of 2
– Represents the rational number: $\sum_{k=-j}^{i} b_k \cdot 2^k$
Frac. Binary Number Examples
• Value: Representation
– $5 + 3/4$: $101.11_2 = 4 + 1 + 1/2 + 1/4$
– $2 + 7/8$: $10.111_2 = 2 + 1/2 + 1/4 + 1/8$
– $63/64$: $0.111111_2 = 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + 1/64$
• Observations
– Divide by 2 by shifting right
– Multiply by 2 by shifting left
– Numbers of the form $0.111111\ldots_2$ are just below 1.0
• $1/2 + 1/4 + 1/8 + \cdots + 1/2^i + \cdots \to 1.0$
• Use the notation $1.0 - \varepsilon$
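The weighted-sum rule above can be checked mechanically. The sketch below (the function name `binfrac_value` is a hypothetical helper, not part of any library) reads a string such as "101.11" as base 2: integer bits double the running value, and each fractional bit adds a weight that halves at every position, mirroring "divide by 2 by shifting right".

```c
#include <assert.h>

/* Hypothetical helper: value of a string such as "101.11" read in base 2.
   Bits left of the point: value = 2*value + bit (shift left, append bit).
   Bits right of the point: add bit * 2^-k, halving the weight each step. */
double binfrac_value(const char *s)
{
    double value = 0.0;
    double weight = 0.5;      /* weight of the first bit after the point */
    int after_point = 0;

    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '.') {
            assert(!after_point);            /* at most one binary point */
            after_point = 1;
            continue;
        }
        assert(*p == '0' || *p == '1');      /* binary digits only */
        int bit = *p - '0';
        if (!after_point) {
            value = 2.0 * value + bit;
        } else {
            value += bit * weight;           /* b_{-k} * 2^{-k} */
            weight /= 2.0;
        }
    }
    return value;
}
```

For example, `binfrac_value("1011.101")` evaluates the question on the "Fractional binary numbers" slide as 8 + 2 + 1 + 1/2 + 1/8 = 11.625; all of these values are exact in a double.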
Representable Numbers

• Limitation #1
– Can only represent numbers of the form $x/2^k$ exactly
• Other rational numbers have repeating bit representations

• Value: Representation
– 1/3: $0.0101010101[01]\ldots_2$
– 1/5: $0.001100110011[0011]\ldots_2$
– 1/10: $0.0001100110011[0011]\ldots_2$

• Limitation #2
– Just one setting of the binary point within the w bits
• Limited range of numbers (very small values? very large?)
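The repeating patterns in the table come from binary long division. As a sketch (the helper `fraction_bits` is hypothetical), the code doubles the remainder at each step and emits a 1 whenever it reaches the denominator; because the remainder of 1/10 recurs, no finite number of bits is exact, which is why `0.1f` and `0.1` are different values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: first n (1..32) bits after the binary point of
   num/den, packed into an integer with the first bit most significant.
   Binary long division: double the remainder; if it reaches den, the
   next bit is 1 and den is subtracted. */
uint32_t fraction_bits(uint32_t num, uint32_t den, int n)
{
    assert(den > 0 && num < den && den <= UINT32_MAX / 2); /* rem*2 cannot overflow */
    assert(n >= 1 && n <= 32);
    uint32_t rem = num, bits = 0;
    for (int i = 0; i < n; i++) {
        rem *= 2;                              /* bring down the next bit */
        bits = (bits << 1) | (rem >= den);
        if (rem >= den)
            rem -= den;
    }
    return bits;
}
```

`fraction_bits(1, 10, 12)` yields the pattern 0001 1001 1001 (0x199), matching the 1/10 row; a dyadic value such as 3/4 terminates instead (0.11 gives 0x3 for n = 2).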
IEEE Floating Point

• IEEE Standard 754
– Established in 1985 as a uniform standard for floating point arithmetic
• Before that, many idiosyncratic formats
– Supported by all major CPUs
– Some CPUs don't implement IEEE 754 in full, e.g., early GPUs and the Cell BE processor

• Driven by numerical concerns
– Nice standards for rounding, overflow, underflow
– Hard to make go fast in hardware
• Numerical analysts predominated over hardware designers in defining the standard

IEEE 754-2019
Floating Point Representation
• Numerical form: $(-1)^s \, M \, 2^E$
• Sign bit s determines whether the number is negative or positive
• Significand M: normally a fractional value in the range [1.0, 2.0)
• Exponent E: weights the value by a power of two

• Encoding
– MSB is the sign bit s
– exp field encodes E
– frac field encodes M

s | exp | frac
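To see the three fields of a real `float`, one can copy its bits into an integer and mask them out. This sketch assumes `float` is a 32-bit IEEE 754 single-precision type (true on mainstream platforms, but not promised by the C standard); `split_float` and `float_fields` are hypothetical names. `memcpy` is used instead of pointer casting because it copies the bit pattern without violating C's aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three IEEE 754 single-precision fields (hypothetical type name). */
typedef struct { uint32_t s, exp, frac; } float_fields;

/* Split a float into sign (1 bit), exp (8 bits) and frac (23 bits).
   Assumes float is a 32-bit IEEE 754 binary32 value. */
float_fields split_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);      /* reinterpret the bits, no conversion */
    float_fields r;
    r.s = u >> 31;                 /* MSB */
    r.exp = (u >> 23) & 0xFFu;     /* next 8 bits */
    r.frac = u & 0x7FFFFFu;        /* low 23 bits */
    return r;
}
```

For -5.0f this gives s = 1, exp = 129 and frac = 0x200000 (binary 010 0000 ... 0), the same fields worked out by hand in the decoding example that follows.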
Floating Point Precisions
• Single precision: 32 bits, ≈7 decimal digits, range ≈ $10^{\pm 38}$

s | exp | frac
1 bit | 8 bits | 23 bits

• Double precision: 64 bits, ≈16 decimal digits, range ≈ $10^{\pm 308}$

s | exp | frac
1 bit | 11 bits | 52 bits

• Extended precision: 80 bits (Intel only)

s | exp | frac
1 bit | 15 bits | 63 or 64 bits

• Other formats: half precision (FP16), quad precision, FP8
Inference accuracy of FP8 in deep learning models

Model inference accuracy with different Float8 formats

Three “kinds” of floating point numbers
“Normalized” Numeric Values
• Normalized values:
– Condition: exp ≠ 000…0 and exp ≠ 111…1
• The exponent E (a signed integer) is encoded as the non-negative integer exp by adding a bias (biased, or excess, representation)
– Exponent coded as a biased value: $E = \mathit{exp} - \mathit{Bias}$
– exp: unsigned value of the exp field
– Bias: bias value
• Single precision: 127 (exp: 1…254 → E: -126…127) (8-bit exponent field)
• Double precision: 1023 (exp: 1…2046 → E: -1022…1023) (11-bit exponent field)
• In general: $\mathit{Bias} = 2^{e-1} - 1$, where e is the number of exponent bits
• The significand is interpreted as a fraction with an implied leading 1
– Significand coded with implied leading 1: $M = 1.xxx\ldots x_2$
– xxx…x: bits of frac
– Minimum when frac = 000…0 (M = 1.0)
– Maximum when frac = 111…1 (M = 2.0 − ε)
– Get an extra leading bit for “free”
Normalized Encoding Example
• Value: float F = 15213.0;
– $15213_{10} = 11101101101101_2 = 1.1101101101101_2 \times 2^{13}$
• Significand
$M = 1.1101101101101_2$
frac = $11011011011010000000000_2$ (23 bits)
• Exponent
E = 13
Bias = 127
exp = 140 = $10001100_2$
Floating Point Representation (Class 02):
Hex: 4 6 6 D B 4 0 0
Binary: 0100 0110 0110 1101 1011 0100 0000 0000
exp = 140: bits 30–23, 100 0110 0
15213: 1110 1101 1011 01 (after its leading 1, these bits begin the frac field)
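The hand encoding above can be replayed in code for any positive integer below $2^{24}$. In that range every such integer fits in the 24-bit significand, so no rounding step is needed. The sketch below (hypothetical names `encode_small_int` and `float_bits`; assumes 32-bit IEEE `float`) finds the leading 1 to get E, stores E + 127 in exp, and left-aligns the remaining bits in frac. It then compares the result with the bit pattern the compiler produces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode a positive integer n < 2^24 as float bits by
   hand: n = 1.xxx * 2^E, exp = E + 127, frac = bits after the leading 1. */
uint32_t encode_small_int(uint32_t n)
{
    assert(n > 0 && n < (1u << 24));   /* exact range: no rounding needed */
    int E = 31;
    while (((n >> E) & 1u) == 0)       /* position of the leading 1 */
        E--;
    uint32_t frac = (n << (23 - E)) & 0x7FFFFFu;   /* drop implied 1 */
    uint32_t exp = (uint32_t)(E + 127);            /* add the bias */
    return (exp << 23) | frac;                     /* sign bit is 0 */
}

/* Bit pattern the compiler actually uses (assumes 32-bit IEEE float). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

`encode_small_int(15213)` returns 0x466DB400, the hex value shown on the slide.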
Denormalized Values
• Denormalized values:
– Condition: exp = 000…0
• Interpretation of the exponent and significand
– Exponent value: E = −Bias + 1 (not −Bias, so the denormalized range joins smoothly onto the smallest normalized values)
– Significand coded with implied leading 0: $M = 0.xxx\ldots x_2$
• xxx…x: bits of frac
• Two cases
– exp = 000…0, frac = 000…0
• Represents value 0
• Note that there are distinct values +0 and −0
– exp = 000…0, frac ≠ 000…0
• Numbers very close to 0.0
• Lose precision as they get smaller
• “Gradual underflow”
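Gradual underflow can be observed directly from bit patterns. This sketch assumes 32-bit IEEE `float` with denormals enabled (the default; flush-to-zero modes, for example under some fast-math options, would break it). The helper names `float_from_bits` and `is_denormal_bits` are hypothetical. Pattern 0x00000001 is the smallest positive denormal, $2^{-23} \times 2^{-126} = 2^{-149}$. Halving `FLT_MIN` (pattern 0x00800000) yields a nonzero denormal rather than 0.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 binary32). */
float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Denormalized: exp field == 0 and frac field != 0. */
int is_denormal_bits(uint32_t u)
{
    return ((u >> 23) & 0xFFu) == 0 && (u & 0x7FFFFFu) != 0;
}
```

Note that 0x80000000 (−0) compares equal to +0, as the slide's "+0 and −0" case implies.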
Special Values
• Special values: exp = 111…1
• Case 1: exp = 111…1, frac = 000…0
– Represents value ∞ (infinity)
– Result of an operation that overflows
– Both positive and negative
– E.g., $1.0/0.0 = -1.0/-0.0 = +\infty$, $1.0/-0.0 = -\infty$
• Case 2: exp = 111…1, frac ≠ 000…0
• Not-a-Number (NaN)
• Represents the case when no numeric value can be determined
• E.g., sqrt(−1), $\infty - \infty$, $\infty \times 0$
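The three kinds of values, plus the two cases of each of denormalized and special values, are distinguished purely by the exp and frac fields. A minimal classifier along those lines (hypothetical enum `fkind` and function `classify_float`; assumes 32-bit IEEE `float`) is shown below. The standard `fpclassify` macro in `<math.h>` serves the same purpose in production code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification mirroring the slides' field rules. */
typedef enum { K_ZERO, K_DENORM, K_NORM, K_INF, K_NAN } fkind;

fkind classify_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                 /* assumes binary32 */
    uint32_t exp = (u >> 23) & 0xFFu;
    uint32_t frac = u & 0x7FFFFFu;
    if (exp == 0)
        return frac == 0 ? K_ZERO : K_DENORM; /* +0/-0 or denormalized */
    if (exp == 0xFFu)
        return frac == 0 ? K_INF : K_NAN;     /* special values */
    return K_NORM;                            /* 0 < exp < 255 */
}
```

With a run-time zero, `1.0f / zero` classifies as infinity, while `inf - inf` and `inf * 0.0f` classify as NaN, matching the examples on the slide.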
C float Decoding Example
• float: 0xC0A00000
• binary: 1100 0000 1010 0000 0000 0000 0000 0000
(s = 1, exp = 1000 0001 = 129, frac = 010 0000 0000 0000 0000 0000)
• E = exp − Bias = 129 − 127 = 2 (decimal)
• S = 1 → negative number
• M = $1.010\,0000\,0000\,0000\,0000\,0000_2$ = 1 + 1/4 = 1.25
• $v = (-1)^s M 2^E = (-1)^1 \times 1.25 \times 2^2 = -5$
C float Decoding Example #2
• float: 0x001C0000
• binary: 0000 0000 0001 1100 0000 0000 0000 0000
(s = 0, exp = 0000 0000, so the value is denormalized)
• E = 1 − Bias = 1 − 127 = −126 (decimal)
• S = 0 → positive number
• M = $0.001\,1100\,0000\,0000\,0000\,0000_2$ = 1/8 + 1/16 + 1/32 = 7/32 = $7 \times 2^{-5}$
• $v = (-1)^s M 2^E = (-1)^0 \times 7 \times 2^{-5} \times 2^{-126} = 7 \times 2^{-131} \approx 2.571393892 \times 10^{-39}$
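Both decoding examples follow one procedure, so they can be folded into a single decoder. The sketch below applies $v = (-1)^s M 2^E$ with the normalized, denormalized and special-value rules. `pow2` and `decode_float_bits` are hypothetical helper names. `<math.h>` is used only for the `INFINITY` and `NAN` macros, so no math-library function is called. Every intermediate value is exact in a double because powers of two and 24-bit significands are representable.

```c
#include <assert.h>
#include <math.h>     /* INFINITY and NAN macros only */
#include <stdint.h>

/* Hypothetical helper: 2^k by exact repeated doubling/halving. */
double pow2(int k)
{
    double r = 1.0;
    for (; k > 0; k--) r *= 2.0;
    for (; k < 0; k++) r *= 0.5;
    return r;
}

/* Decode a binary32 bit pattern: v = (-1)^s * M * 2^E, Bias = 127. */
double decode_float_bits(uint32_t u)
{
    const int bias = 127;
    uint32_t s = u >> 31;
    uint32_t exp_field = (u >> 23) & 0xFFu;
    uint32_t frac = u & 0x7FFFFFu;
    double sign = s ? -1.0 : 1.0;

    if (exp_field == 0xFFu)                  /* special values */
        return frac ? NAN : sign * INFINITY;

    double M;                                /* significand */
    int E;                                   /* unbiased exponent */
    if (exp_field == 0) {                    /* denormalized: M = 0.frac */
        M = frac * pow2(-23);
        E = 1 - bias;
    } else {                                 /* normalized: M = 1.frac */
        M = 1.0 + frac * pow2(-23);
        E = (int)exp_field - bias;
    }
    return sign * M * pow2(E);
}
```

`decode_float_bits(0xC0A00000)` gives −5.0 and `decode_float_bits(0x001C0000)` gives $7 \times 2^{-131}$, reproducing both slides.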
Summary of Floating Point Real Number Encoding

[Figure: number line, left to right: NaN, −∞, −Normalized, −Denorm, −0 | +0, +Denorm, +Normalized, +∞, NaN]
Tiny Floating Point Example

Bit positions: 7 | 6–3 | 2–0
Field: s | exp | frac
Width: 1 bit | 4 bits | 3 bits

• 8-bit floating point representation
– The sign bit is in the most significant bit
– The next four bits are the exponent, with a bias of 7 ($2^{4-1} - 1$)
– The last three bits are the frac
• Same general form as the IEEE format
– Normalized, denormalized
– Representation of 0, NaN, infinity
Values Related to the Exponent
Exp | exp | E | $2^E$
0 | 0000 | -6 | 1/64 (denorms)
1 | 0001 | -6 | 1/64
2 | 0010 | -5 | 1/32
3 | 0011 | -4 | 1/16
4 | 0100 | -3 | 1/8
5 | 0101 | -2 | 1/4
6 | 0110 | -1 | 1/2
7 | 0111 | 0 | 1
8 | 1000 | +1 | 2
9 | 1001 | +2 | 4
10 | 1010 | +3 | 8
11 | 1011 | +4 | 16
12 | 1100 | +5 | 32
13 | 1101 | +6 | 64
14 | 1110 | +7 | 128
15 | 1111 | n/a | (inf, NaN)
Dynamic Range
s exp frac | E | Value
Denormalized numbers:
0 0000 000 | -6 | 0
0 0000 001 | -6 | 1/8 × 1/64 = 1/512 (closest to zero)
0 0000 010 | -6 | 2/8 × 1/64 = 2/512
…
0 0000 110 | -6 | 6/8 × 1/64 = 6/512
0 0000 111 | -6 | 7/8 × 1/64 = 7/512 (largest denorm)
Normalized numbers:
0 0001 000 | -6 | 8/8 × 1/64 = 8/512 (smallest norm)
0 0001 001 | -6 | 9/8 × 1/64 = 9/512
…
0 0110 110 | -1 | 14/8 × 1/2 = 14/16
0 0110 111 | -1 | 15/8 × 1/2 = 15/16 (closest to 1 below)
0 0111 000 | 0 | 8/8 × 1 = 1
0 0111 001 | 0 | 9/8 × 1 = 9/8 (closest to 1 above)
0 0111 010 | 0 | 10/8 × 1 = 10/8
…
0 1110 110 | 7 | 14/8 × 128 = 224
0 1110 111 | 7 | 15/8 × 128 = 240 (largest norm)
0 1111 000 | n/a | inf
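Every row of the table can be regenerated by one decoder that is parameterized by the exponent width e and fraction width f. The bias is $2^{e-1} - 1$ and the IEEE rules are otherwise unchanged. The name `tiny_value` is hypothetical. With e = 4, f = 3 it models the 8-bit format above. With e = 3, f = 2 it models the 6-bit format on the following slides. `<math.h>` is used only for the `INFINITY`/`NAN` macros.

```c
#include <assert.h>
#include <math.h>     /* INFINITY and NAN macros only */

/* Hypothetical decoder for an IEEE-like format: 1 sign bit, e exponent
   bits, f fraction bits, bias 2^(e-1) - 1. Results are exact doubles. */
double tiny_value(unsigned bits, int e, int f)
{
    assert(e >= 2 && e <= 8 && f >= 1 && f <= 16);
    unsigned frac = bits & ((1u << f) - 1u);
    unsigned exp_field = (bits >> f) & ((1u << e) - 1u);
    unsigned s = (bits >> (e + f)) & 1u;
    int bias = (1 << (e - 1)) - 1;
    double sign = s ? -1.0 : 1.0;

    if (exp_field == (1u << e) - 1u)              /* all ones: inf / NaN */
        return frac ? NAN : sign * INFINITY;

    double M = (double)frac / (double)(1u << f);  /* 0.frac */
    int E;
    if (exp_field == 0) {
        E = 1 - bias;                             /* denormalized */
    } else {
        M += 1.0;                                 /* implied leading 1 */
        E = (int)exp_field - bias;
    }
    double scale = 1.0;                           /* exact 2^E */
    for (int i = 0; i < E; i++) scale *= 2.0;
    for (int i = 0; i > E; i--) scale *= 0.5;
    return sign * M * scale;
}
```

For instance, pattern 0x77 (0 1110 111) decodes to 240, the largest normalized value, and in the 6-bit format 0x1B (0 110 11) decodes to 14, its largest finite value.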
Dynamic Range
• 6-bit IEEE-like format
– e = 3 exponent bits
– f = 2 fraction bits
– Bias is $2^{3-1} - 1 = 3$

• Notice how the distribution gets denser toward 0.

[Figure: values of the 6-bit format on a number line from −15 to 15, marked as denormalized, normalized, and infinity]
Distribution of Values (close-up view)

• 6-bit IEEE-like format
– e = 3 exponent bits
– f = 2 fraction bits
– Bias is 3

[Figure: close-up of the number line from −1 to 1, marked as denormalized, normalized, and infinity]
Interesting Numbers
Description | exp | frac | Numeric value
• Zero | 00…00 | 00…00 | 0.0
• Smallest positive denormalized | 00…00 | 00…01 | $2^{-\{23,52\}} \times 2^{-\{126,1022\}}$
– Single ≈ $1.4 \times 10^{-45}$
– Double ≈ $4.9 \times 10^{-324}$
• Largest denormalized | 00…00 | 11…11 | $(1.0 - \varepsilon) \times 2^{-\{126,1022\}}$
– Single ≈ $1.18 \times 10^{-38}$
– Double ≈ $2.2 \times 10^{-308}$
• Smallest positive normalized | 00…01 | 00…00 | $1.0 \times 2^{-\{126,1022\}}$
– Just larger than the largest denormalized
• One | 01…11 | 00…00 | 1.0
• Largest normalized | 11…10 | 11…11 | $(2.0 - \varepsilon) \times 2^{\{127,1023\}}$
– Single ≈ $3.4 \times 10^{38}$
– Double ≈ $1.8 \times 10^{308}$
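The table's formulas can be checked against the limits in `<float.h>`. The sketch below evaluates each formula from the fraction width f and the bias, using hypothetical helper names. Since the largest exp is $2^e - 2 = 2 \cdot \mathit{Bias}$, the largest E equals the bias. The results are compared in the test against `FLT_TRUE_MIN`, `FLT_MIN`, `FLT_MAX` and their `DBL_` counterparts. The `*_TRUE_MIN` macros are C11 additions, so this assumes a C11 compiler.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: 2^k by exact repeated doubling/halving. */
double pow2i(int k)
{
    double r = 1.0;
    for (; k > 0; k--) r *= 2.0;
    for (; k < 0; k++) r *= 0.5;
    return r;
}

/* Formulas from the table for f fraction bits and a given bias
   (hypothetical names). Denormals use E = 1 - Bias; the largest
   normalized exponent is E = Bias. All products are exact. */
double smallest_denorm(int f, int bias) { return pow2i(-f) * pow2i(1 - bias); }
double largest_denorm(int f, int bias)  { return (1.0 - pow2i(-f)) * pow2i(1 - bias); }
double smallest_norm(int bias)          { return pow2i(1 - bias); }
double largest_norm(int f, int bias)    { return (2.0 - pow2i(-f)) * pow2i(bias); }
```

The "just larger than the largest denormalized" remark also shows up numerically. Adding one smallest denormal step to the largest denormal lands exactly on the smallest normalized value.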
Special Properties of Encoding
• FP zero has the same representation as integer zero
– All bits = 0
• In most cases, unsigned integer comparison rules also work for floating point numbers
– Must first compare sign bits
– Must consider −0 = 0
– NaNs are problematic
• Will be greater than any other values?
• What should comparison yield? The answer is complicated.
– Otherwise OK
• Denorm vs. normalized
• Normalized vs. infinity
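One common way to act on this property is to turn a float's bits into an unsigned key that sorts like the float itself. The sketch below uses a hypothetical helper `float_order_key` and assumes 32-bit IEEE `float`. For non-negative values it sets the top bit, so they rank above all negatives, and the remaining exp|frac bits already order correctly as an unsigned integer. For negative values it inverts all bits, so larger magnitudes rank lower. As the slide warns, −0 and +0 still get different (adjacent) keys, and NaN has no meaningful position.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unsigned key whose ordering matches numeric order
   for non-NaN floats. -0.0 and +0.0 map to adjacent keys, so callers must
   still treat them as equal. Assumes 32-bit IEEE 754 float. */
uint32_t float_order_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (u & 0x80000000u) ? ~u                 /* negative: flip all bits */
                             : (u | 0x80000000u); /* non-negative: set top bit */
}
```

The keys also confirm the "Otherwise OK" cases: a positive denormal sorts between +0 and the normalized values, and +∞ sorts above every finite value.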
Floating Point

• Background: Fractional binary numbers
• IEEE floating point standard: Definition
• Example and properties
• Rounding, addition, multiplication
• Floating point in C
• Summary
Floating Point Operations: Basic Idea

• $x +_f y = \mathrm{Round}(x + y)$
• $x \times_f y = \mathrm{Round}(x \times y)$
• Basic idea
– First compute the exact result
– Make it fit into the desired precision
• Possibly overflow if the exponent is too large
• Possibly round to fit into frac
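For multiplication, the "compute exactly, then round once" idea can be demonstrated in portable C. The product of two floats has at most 24 + 24 = 48 significant bits, and its exponent stays far inside double's range, so it fits exactly in a double (53 bits). Converting that double back to float is the single rounding step. The sketch below uses a hypothetical function `mul_exact_then_round`. It assumes `FLT_EVAL_METHOD == 0` (float expressions are evaluated in float, as on typical x86-64/SSE targets) and in-range products.

```c
#include <assert.h>

/* x *f y = Round(x * y), computed literally: the double product of two
   floats is exact (<= 48 significant bits), and the conversion back to
   float performs the one rounding. Assumes FLT_EVAL_METHOD == 0 and that
   the product does not overflow float. */
float mul_exact_then_round(float x, float y)
{
    double exact = (double)x * (double)y;   /* no rounding happens here */
    return (float)exact;                    /* round to nearest even */
}
```

Under these assumptions the result always equals the hardware's `x * y`. This is the sense in which IEEE multiplication behaves as if computed with perfect precision and then rounded.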
Floating Point Operations
• Basic idea
– First compute the exact result
– Make it fit into the desired precision
• Possibly overflow if the exponent is too large
• Possibly round to fit into frac
• Rounding modes (illustrated with dollar rounding)
Mode | $1.40 | $1.60 | $1.50 | $2.50 | –$1.50
– Towards zero | $1 | $1 | $1 | $2 | –$1
– Round down (toward −∞) | $1 | $1 | $1 | $2 | –$2
– Round up (toward +∞) | $2 | $2 | $2 | $3 | –$1
– Nearest even (default) | $1 | $2 | $2 | $2 | –$2

Note:
1. Round down: the rounded result is close to but no greater than the true result.
2. Round up: the rounded result is close to but no less than the true result.
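The dollar table can be reproduced with integer arithmetic on cents, which makes each mode's rule explicit without depending on `<fenv.h>` support. In the sketch below, `round_dollars` and the `round_mode` enum are hypothetical names. They model the four rules themselves, not the C library's rounding-mode API.

```c
#include <assert.h>

/* Hypothetical sketch: the four rounding modes applied to an amount in
   cents, returning whole dollars. Integer arithmetic only. */
typedef enum { TOWARD_ZERO, ROUND_DOWN, ROUND_UP, NEAREST_EVEN } round_mode;

long round_dollars(long cents, round_mode mode)
{
    long q = cents / 100;                 /* C division truncates toward 0 */
    long r = cents % 100;                 /* remainder has the sign of cents */
    long floor_q = (r < 0) ? q - 1 : q;   /* toward -infinity */
    long ceil_q  = (r > 0) ? q + 1 : q;   /* toward +infinity */

    switch (mode) {
    case TOWARD_ZERO: return q;
    case ROUND_DOWN:  return floor_q;
    case ROUND_UP:    return ceil_q;
    case NEAREST_EVEN: {
        long above = cents - floor_q * 100;   /* 0..99 cents above floor */
        if (above < 50) return floor_q;
        if (above > 50) return ceil_q;
        return (floor_q % 2 == 0) ? floor_q : ceil_q;  /* tie: even wins */
    }
    }
    assert(0 && "unknown rounding mode");
    return 0;
}
```

Only the nearest-even case ever looks at the parity of the result, and only when the amount is exactly halfway.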
Closer Look at Round-To-Even
• Default rounding mode in IEEE 754: Round-To-Even
– Hard to get any other kind without dropping into assembly
• C99 has support for rounding mode management
– All others are statistically biased
• Sum of a set of positive numbers will consistently be over- or underestimated
• Round-To-Even
– When exactly halfway between two possible values
• Round so that the least significant digit is even
– E.g., round to nearest hundredth
1.2349999 → 1.23 (less than halfway)
1.2350001 → 1.24 (greater than halfway)
1.2350000 → 1.24 (halfway, round up)
1.2450000 → 1.24 (halfway, round down)
Rounding Binary Numbers
• Binary fractional numbers
– “Even” when the least significant bit is 0
– “Halfway” when the bits to the right of the rounding position are $100\ldots_2$

• Example
– Round to nearest 1/4 (2 bits right of binary point)
Value | Binary | Rounded | Action | Rounded value
2 3/32 | $10.00011_2$ | $10.00_2$ | (< 1/2, down) | 2
2 3/16 | $10.00110_2$ | $10.01_2$ | (> 1/2, up) | 2 1/4
2 7/8 | $10.11100_2$ | $11.00_2$ | (1/2, up) | 3
2 5/8 | $10.10100_2$ | $10.10_2$ | (1/2, down) | 2 1/2
Rounding
Format: 1.BBGRXXX
• Guard bit (G): LSB of the result
• Round bit (R): first bit removed
• Sticky bit (S): OR of the remaining removed bits

• Conditions for rounding up
– Round = 1, Sticky = 1 → more than halfway (> 0.5)
– Guard = 1, Round = 1, Sticky = 0 → exactly halfway, round to even

Fraction | GRS | Incr? | Rounded
1.0000000 | 000 | N | 1.000
1.1010000 | 100 | N | 1.101
1.0001000 | 010 | N | 1.000
1.0011000 | 110 | Y | 1.010
1.0001010 | 011 | Y | 1.001
1.1111100 | 111 | Y | 10.000
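The guard/round/sticky rule fits in a few lines of integer code. The rule is: increment when R = 1 and either S = 1 or G = 1. In the sketch below (hypothetical name `round_drop_bits`), a binary fraction is held as an integer with a fixed number of fraction bits, and the low k bits are dropped with round-to-nearest-even. It reproduces both tables above. For the GRS table, 1.xxxxxxx is stored as an 8-bit integer and k = 4. For the "nearest 1/4" table, values are stored in units of 1/32 and k = 3, so the result is in units of 1/4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: drop the low k bits of v, rounding to nearest even
   using the guard/round/sticky bits described on the slide. */
uint32_t round_drop_bits(uint32_t v, int k)
{
    assert(k >= 1 && k <= 31);
    uint32_t kept       = v >> k;
    uint32_t guard_bit  = kept & 1u;                         /* LSB of result */
    uint32_t round_bit  = (v >> (k - 1)) & 1u;               /* 1st removed bit */
    uint32_t sticky_bit = (v & ((1u << (k - 1)) - 1u)) != 0; /* OR of the rest */
    return kept + (round_bit & (sticky_bit | guard_bit));
}
```

For example, 1.0011000 is 0x98 and `round_drop_bits(0x98, 4)` returns 0xA, i.e. 1.010. Likewise, 2 7/8 is 92/32, and `round_drop_bits(92, 3)` returns 12, i.e. 12/4 = 3.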
FP Multiplication
• Operands: $(-1)^{s_1} M_1 2^{E_1} \times (-1)^{s_2} M_2 2^{E_2}$
• Exact result: $(-1)^s M 2^E$
– Sign s: $s_1 \oplus s_2$ (exclusive OR, s1 ^ s2)
– Significand M: $M_1 \times M_2$
– Exponent E: $E_1 + E_2$
• Fixing
– If M ≥ 2, shift M right, increment E
– If E is out of range, overflow
– Round M to fit frac precision
• Implementation
– Biggest chore is multiplying significands
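The steps above can be traced in software for single precision. The sketch below (`fmul_sketch` is a hypothetical name; assumes 32-bit IEEE `float`) handles only normalized operands whose product is also normalized. It has no zero, denormal, infinity, NaN, overflow or underflow handling. The code XORs the signs, adds the unbiased exponents, multiplies the 24-bit significands into an exact 48-bit product, shifts right once if M ≥ 2, rounds to nearest even, and postnormalizes if rounding carried out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of binary32 multiplication for normalized inputs
   with a normalized result. */
float fmul_sketch(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    uint32_t ea = (ua >> 23) & 0xFFu, eb = (ub >> 23) & 0xFFu;
    assert(ea != 0 && ea != 0xFFu && eb != 0 && eb != 0xFFu);

    uint32_t s = (ua ^ ub) >> 31;                 /* sign: s1 ^ s2 */
    int E = ((int)ea - 127) + ((int)eb - 127);    /* E1 + E2 */
    uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;   /* 1.frac, scaled by 2^23 */
    uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;
    uint64_t p = ma * mb;          /* exact M1*M2 scaled by 2^46: [2^46, 2^48) */

    int k = 23;                    /* low bits to drop, keeping 24 */
    if (p >= (1ull << 47)) {       /* M >= 2: shift right, increment E */
        k = 24;
        E += 1;
    }
    uint64_t q = p >> k;
    uint64_t rem = p & ((1ull << k) - 1);
    uint64_t half = 1ull << (k - 1);
    if (rem > half || (rem == half && (q & 1)))   /* round to nearest even */
        q += 1;
    if (q == (1ull << 24)) {       /* rounding carried out: postnormalize */
        q >>= 1;
        E += 1;
    }

    int exp_field = E + 127;
    assert(exp_field >= 1 && exp_field <= 254);   /* normalized result only */
    uint32_t bits = (s << 31) | ((uint32_t)exp_field << 23)
                  | ((uint32_t)q & 0x7FFFFFu);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}
```

On inputs that satisfy these restrictions, the result should match the hardware product `a * b` bit for bit.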
FP Addition
• Operands: $(-1)^{s_1} M_1 2^{E_1} + (-1)^{s_2} M_2 2^{E_2}$
– Assume $E_1 > E_2$
• Exact result: $(-1)^s M 2^E$
– Sign s, significand M: result of signed align & add (shift $(-1)^{s_2} M_2$ right by $E_1 - E_2$ bits to line it up with $(-1)^{s_1} M_1$, then add)
– Exponent E: $E_1$
• Fixing
– If M ≥ 2, shift M right, increment E
– If M < 1, shift M left k positions, decrement E by k
– Overflow if E is out of range
– Round M to fit frac precision
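The align, add, renormalize and round steps can be sketched the same way. To stay short, the sketch below (`fadd_sketch` is a hypothetical name; assumes 32-bit IEEE `float`) accepts only positive, normalized operands whose sum does not overflow. With equal signs M never drops below 1, so only the "M ≥ 2" fix-up is needed. The significands are widened by 32 extra low bits. Any bits shifted out during alignment are folded into a single sticky bit, which preserves the round-to-nearest-even decision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of binary32 addition for two positive normalized
   operands (no signs, denormals, specials or overflow handling). */
float fadd_sketch(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    assert((ua >> 31) == 0 && (ub >> 31) == 0);    /* positive operands */
    if (ub > ua) { uint32_t t = ua; ua = ub; ub = t; }  /* larger first */
    uint32_t e1 = ua >> 23, e2 = ub >> 23;         /* sign bits are 0 */
    assert(e2 >= 1 && e1 <= 254);                  /* both normalized */

    /* 1.frac significands, scaled so 32 low bits hold alignment residue */
    uint64_t m1 = (uint64_t)((ua & 0x7FFFFFu) | 0x800000u) << 32;
    uint64_t m2 = (uint64_t)((ub & 0x7FFFFFu) | 0x800000u) << 32;
    uint32_t d = e1 - e2;                          /* E1 - E2 */
    uint64_t aligned;
    if (d > 56) {
        aligned = 1;                               /* only sticky survives */
    } else {
        aligned = m2 >> d;
        if (m2 & ((1ull << d) - 1))                /* bits shifted out */
            aligned |= 1;                          /* record as sticky */
    }

    uint64_t sum = m1 + aligned;                   /* < 2^57 */
    int exp_field = (int)e1;
    int k = 32;                                    /* drop bits, keep 24 */
    if (sum >= (1ull << 56)) {                     /* M >= 2 */
        k = 33;
        exp_field += 1;
    }
    uint64_t q = sum >> k;
    uint64_t rem = sum & ((1ull << k) - 1);
    uint64_t half = 1ull << (k - 1);
    if (rem > half || (rem == half && (q & 1)))
        q += 1;                                    /* round to nearest even */
    if (q == (1ull << 24)) {                       /* postnormalize */
        q >>= 1;
        exp_field += 1;
    }
    assert(exp_field <= 254);                      /* no overflow handling */
    uint32_t bits = ((uint32_t)exp_field << 23) | ((uint32_t)q & 0x7FFFFFu);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}
```

Two cases worth noting. Adding 1 to $2^{24}$ is an exact tie that rounds back down to $2^{24}$ (even). Adding 3 rounds up to $2^{24} + 4$.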
Mathematical Properties of FP Add
• Does FP addition form an Abelian group?
– Closed under addition? YES
• But may generate infinity or NaN
– Commutative? YES
– Associative? NO
• Overflow and inexactness of rounding
• (3.14+1e10)-1e10 = 0, 3.14+(1e10-1e10) = 3.14 (in single precision)
– 0 is additive identity? YES
– Every element has additive inverse? ALMOST
• Except for infinities & NaNs
• Is it monotonic?
– $a \ge b \Rightarrow a + c \ge b + c$? ALMOST
• Except for infinities & NaNs
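The associativity counterexample depends on the precision. In single precision the spacing between floats near $10^{10}$ is 1024, so 3.14 is absorbed entirely. In double precision the absorption is only partial, so the result is neither 0 nor exactly 3.14. The functions below (hypothetical names) evaluate both groupings. They assume `FLT_EVAL_METHOD == 0`, i.e. float expressions are not silently evaluated in a wider type.

```c
#include <assert.h>

/* The two groupings of a + b + c, evaluated in single precision.
   Assumes FLT_EVAL_METHOD == 0 (no wider intermediate precision). */
float add_left(float a, float b, float c)  { return (a + b) + c; }
float add_right(float a, float b, float c) { return a + (b + c); }
```

Compilers that reassociate floating point (for example under fast-math style options) can change which grouping is actually executed.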
Math. Properties of FP Mult
• Does FP multiplication form a commutative ring?
– Closed under multiplication? YES
• But may generate infinity or NaN
– Multiplication commutative? YES
– Multiplication associative? NO
• Possibility of overflow, inexactness of rounding
– 1 is multiplicative identity? YES
– Multiplication distributes over addition? NO
• Possibility of overflow, inexactness of rounding
• 1e20*(1e20-1e20) = 0.0, 1e20*1e20 - 1e20*1e20 = NaN (in single precision)
• Is it monotonic?
– $a \ge b \;\&\; c \ge 0 \Rightarrow a \cdot c \ge b \cdot c$? ALMOST
• Except for infinities & NaNs
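Both failures are easy to trigger in single precision because $10^{40}$ exceeds `FLT_MAX` (about $3.4 \times 10^{38}$). In the distributivity example, $10^{20} \times 10^{20}$ overflows to $+\infty$, and $+\infty + (-\infty)$ is NaN. The same overflow breaks associativity. The helper names below are hypothetical, and the sketch assumes IEEE overflow-to-infinity behavior with `FLT_EVAL_METHOD == 0`.

```c
#include <assert.h>
#include <float.h>

/* Single-precision groupings (hypothetical names). Assumes
   FLT_EVAL_METHOD == 0 and IEEE overflow to infinity. */
float mul_left(float a, float b, float c)         { return (a * b) * c; }
float mul_right(float a, float b, float c)        { return a * (b * c); }
float distrib_outer(float a, float b, float c)    { return a * (b + c); }
float distrib_expanded(float a, float b, float c) { return a * b + a * c; }
```

In double precision the same inputs stay finite, so these particular counterexamples are specific to float.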
Floating Point in C
• C supports floating point operations in two precisions
float: single precision
double: double precision

• Conversion rules between data types
– Casting between int, float, & double changes the numeric value and the bit representation
– double or float to int
• Truncates the fractional part
• Like rounding toward zero
• Not defined when out of range or NaN
– Generally sets the result to TMin (e.g., on x86)
– int to double
• Exact conversion, as long as int has ≤ 53-bit word size
– int to float
• Will round according to the rounding mode
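Because an out-of-range (or NaN) double-to-int conversion is undefined behavior in C, portable code checks the range before casting rather than relying on x86's TMin result. The sketch below (hypothetical name `double_to_int_checked`) accepts d only when its truncation fits in `int`. The bounds `INT_MIN - 1.0` and `INT_MAX + 1.0` are exact in double. NaN fails every comparison and is therefore rejected.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: store (int)d in *out only when C defines the
   conversion, i.e. the truncated value fits in int. Returns 1 on success,
   0 for out-of-range values and NaN. */
int double_to_int_checked(double d, int *out)
{
    /* trunc(d) fits iff INT_MIN - 1 < d < INT_MAX + 1 (both exact). */
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return 0;
    *out = (int)d;                 /* truncates toward zero */
    return 1;
}
```

The other two rules on the slide can be checked directly. `(float)16777217` rounds to 16777216.0f because $2^{24} + 1$ needs 25 significant bits. `(double)16777217` is exact.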
Answers to Floating Point Puzzles
• Declarations: int x = …; float f = …; double d = …; (assume neither d nor f is NaN)
• x == (int)(float) x → No: 24-bit significand
• x == (int)(double) x → Yes: 53-bit significand
• f == (float)(double) f → Yes: increases precision
• d == (float) d → No: loses precision
• f == -(-f); → Yes: just changes the sign bit
• 2/3 == 2/3.0 → No: 2/3 == 0 (integer division)
• d < 0.0 ⇒ ((d*2) < 0.0) → Yes!
• d > f ⇒ -f > -d → Yes!
• d * d >= 0.0 → Yes!
• (d+f)-d == f → No: not associative
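Each "No" answer has a concrete counterexample, which the sketch below encodes as small predicate functions (hypothetical names). It assumes a 32-bit `int` and `FLT_EVAL_METHOD == 0`. The inputs used in the test are chosen to avoid undefined conversions. For example, `(int)(float)INT_MAX` is deliberately not used, because `(float)INT_MAX` rounds up to $2^{31}$, which is out of `int` range.

```c
#include <assert.h>

/* Puzzle predicates for specific inputs (hypothetical names).
   Assumes 32-bit int and FLT_EVAL_METHOD == 0. */
int p_int_float(int x)         { return x == (int)(float)x; }
int p_int_double(int x)        { return x == (int)(double)x; }
int p_double_float(double d)   { return d == (float)d; }
int p_assoc(double d, float f) { return (d + f) - d == f; }
```

The counterexamples: x = 16777217 ($2^{24} + 1$) for the float round trip, d = 0.1 for `d == (float) d`, and d = 1e20, f = 1.0f for the associativity puzzle, where the 1.0 is absorbed because the spacing of doubles near $10^{20}$ is 16384.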
Summary
• IEEE 754 floating point has clear mathematical properties
– We can predict how operations behave without reference to any particular implementation
– As if computed with perfect precision and then rounded
• Represents numbers of the form $M \times 2^E$

• Differences from real (mathematical) arithmetic:
– Violates associativity/distributivity
– Makes life difficult for compilers & serious numerical application programmers
Additional Slides: Creating Floating Point Number

• Steps
– Normalize to have leading 1
– Round to fit within fraction
– Postnormalize to deal with effects of rounding
• Example
– Convert 8-bit unsigned numbers to tiny floating point format
Normalize

• Steps
– Set binary point so that numbers are of the form 1.xxxxx
– Adjust all to have leading one
• Decrement exponent as shift left
Postnormalize

• Postnormalization
– Rounding may have caused overflow
– Handle by shifting right once & incrementing exponent
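The three steps can be combined into one conversion from an 8-bit unsigned integer to the tiny format (1 sign bit, 4 exponent bits with bias 7, 3 fraction bits). The sketch below uses a hypothetical function `u8_to_tiny`. Inputs 128, 13, 17, 19, 138 and 63 produce exactly the six significands of the guard/round/sticky table (1.0000000, 1.1010000, …, 1.1111100), so their rounded results (128, 13, 16, 20, 144, 64) can be checked against it. The case 63 needs postnormalization. The sketch also treats results beyond the largest normalized value (240) as overflow to +∞, which is what 255 triggers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: 8-bit unsigned integer -> tiny float
   (1 sign, 4 exp bits with bias 7, 3 frac bits): normalize, round to
   nearest even, postnormalize. */
uint8_t u8_to_tiny(uint8_t x)
{
    if (x == 0)
        return 0;                          /* +0: all bits zero */

    int E = 7;                             /* normalize: x = 1.xxx * 2^E */
    while (((x >> E) & 1u) == 0)
        E--;

    unsigned q;                            /* leading 1 + 3 frac bits */
    if (E <= 3) {
        q = (unsigned)x << (3 - E);        /* fits exactly, no rounding */
    } else {
        int k = E - 3;                     /* bits that do not fit */
        q = x >> k;
        unsigned rem = x & ((1u << k) - 1u);
        unsigned half = 1u << (k - 1);
        if (rem > half || (rem == half && (q & 1u)))
            q++;                           /* round to nearest even */
        if (q == 16) {                     /* postnormalize: 10.000 -> 1.000 */
            q = 8;
            E++;
        }
    }

    unsigned exp_field = (unsigned)E + 7;  /* bias 7 */
    if (exp_field >= 15)
        return 0x78;                       /* overflow: +inf (0 1111 000) */
    return (uint8_t)((exp_field << 3) | (q & 7u));
}
```

For example, 19 (10011) keeps 1.001, sees an exact tie, and rounds up to the even 1.010. That gives $1.25 \times 2^4 = 20$, encoded as 0x5A.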
This is important!
• Ariane 5 exploded on its maiden launch
– Exploded 37 seconds after liftoff; cargo worth $500 million

• Cause:
– 64-bit floating point number assigned to a 16-bit integer
• Computed horizontal velocity as a floating point number
• Converted to a 16-bit integer
• Worked OK for Ariane 4
• Overflowed for Ariane 5
• Used the same software
– Caused the rocket to get an incorrect value of horizontal velocity and crash
• Patriot missile defense system failed to intercept a Scud missile: 28 people killed
– System tracks time in tenths of a second
– Converted from integer to floating point number
– Accumulated rounding error causes drift: 20% drift over 8 hours
– Eventually (on 2/25/1991, after the system had been on for 100 hours) causes a range misestimation large enough to miss incoming missiles
Acknowledgements

• This course was developed and fine-tuned by Randal E. Bryant and David O'Hallaron.
They wrote The Book!
• http://www.cs.cmu.edu/~./213/schedule.html

Definitions of groups, rings, and fields
