
Floating Point Instructions

Ray Seyfarth

June 29, 2012

64 Bit Intel Assembly Language © 2011 Ray Seyfarth

Floating point instructions

PC floating point operations were once done in a separate chip, the 8087
This chip managed a stack of eight 80 bit floating point values
The stack and instructions still exist, but are largely ignored
x86-64 CPUs have 16 floating point registers (128 or 256 bits wide)
These registers can be used for scalar (single data) instructions or for single instruction, multiple data (SIMD) instructions
We will focus on these newer registers
The older instructions tended to start with the letter “f” and referenced the stack using register names like ST0
The newer instructions reference registers with names like XMM0

Outline

1 Moving data in and out of floating point registers

2 Addition

3 Subtraction

4 Basic floating point instructions

5 Data conversion

6 Floating point comparisons

7 Mathematical functions

8 Sample floating point code

Moving scalars to or from floating point registers

movss moves a single 32 bit floating point value to or from an XMM register
movsd moves a single 64 bit floating point value
There is no implicit data conversion, unlike the old instructions, which converted floating point data to an 80 bit internal format
As with most x86-64 instructions, at most one operand can be a memory address

movss xmm0, [x] ; move value at x into xmm0
movsd [y], xmm1 ; move value from xmm1 to y
movss xmm2, xmm0 ; move from xmm0 to xmm2

Moving packed data

The XMM registers are 128 bits
They can hold 4 floats or 2 doubles (or integers of various sizes)
On newer CPUs they are extended to 256 bits and referred to as YMM registers when using all 256 bits
movaps moves 4 floats to/from a memory address aligned at a 16 byte boundary
movups does the same task with unaligned memory addresses
The Core i series performs unaligned moves efficiently
movapd moves 2 doubles to/from a memory address aligned at a 16 byte boundary
movupd does the same task with unaligned memory addresses

movups xmm0, [x] ; move 4 floats to xmm0
movupd [a], xmm15 ; move 2 doubles to a
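
The 16 byte alignment that movaps and movapd require can be checked from C. The sketch below is only an illustration: the names is_aligned16 and aligned_floats are made up for this example, and _Alignas assumes a C11 compiler.

```c
#include <stdint.h>   /* uintptr_t */

/* Nonzero when p lies on a 16 byte boundary, as movaps/movapd require. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* _Alignas (C11) asks the compiler to place the array on a 16 byte boundary. */
_Alignas(16) float aligned_floats[4] = {1.0f, 2.0f, 3.0f, 4.0f};
```

An address such as aligned_floats + 1 is only 4 byte aligned, so it would need movups rather than movaps.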

Floating point addition
addss adds a scalar float (single precision) to another
addsd adds a scalar double to another
addps adds 4 floats to 4 floats - pairwise addition
addpd adds 2 doubles to 2 doubles
There are 2 operands: destination and source
The source can be memory or an XMM register
The destination must be an XMM register
Flags are unaffected

movss xmm0, [a] ; load a
addss xmm0, [b] ; add b to a
movss [c], xmm0 ; store sum in c

movapd xmm0, [a] ; load 2 doubles from a
addpd xmm0, [b] ; add a[0]+b[0] and a[1]+b[1]
movapd [c], xmm0 ; store 2 sums in c

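
As a rough C picture of what "pairwise" means for addps, each of the 4 lanes is added independently. The function name addps_model is hypothetical, and the loop only models the result, not how the hardware computes it.

```c
/* Model of addps: c[i] = a[i] + b[i] for each of the 4 float lanes. */
void addps_model(const float a[4], const float b[4], float c[4])
{
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];
}
```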
Floating point subtraction

subss subtracts the source float from the destination
subsd subtracts the source double from the destination
subps subtracts 4 floats from 4 floats
subpd subtracts 2 doubles from 2 doubles

movss xmm0, [a] ; load a
subss xmm0, [b] ; subtract b from a
movss [c], xmm0 ; store a-b in c

movapd xmm0, [a] ; load 2 doubles from a
subpd xmm0, [b] ; compute a[0]-b[0] and a[1]-b[1]
movapd [c], xmm0 ; store 2 differences in c

Basic floating point instructions

instruction  effect
addsd        add scalar double
addss        add scalar float
addpd        add packed double
addps        add packed float
subsd        subtract scalar double
subss        subtract scalar float
subpd        subtract packed double
subps        subtract packed float
mulsd        multiply scalar double
mulss        multiply scalar float
mulpd        multiply packed double
mulps        multiply packed float
divsd        divide scalar double
divss        divide scalar float
divpd        divide packed double
divps        divide packed float

Conversion to a different length floating point

cvtss2sd converts a scalar single (float) to a scalar double
cvtps2pd converts 2 packed floats to 2 packed doubles
cvtsd2ss converts a scalar double to a scalar float
cvtpd2ps converts 2 packed doubles to 2 packed floats

cvtss2sd xmm0, [a] ; get a into xmm0 as a double
addsd xmm0, [b] ; add the double b to a
cvtsd2ss xmm0, xmm0 ; convert the sum to float
movss [c], xmm0 ; store the float result in c
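
The widen, add, narrow sequence above corresponds to ordinary C casts. This is a sketch with hypothetical function names; the comments show which instruction each step plausibly maps to, not what a particular compiler is guaranteed to emit.

```c
/* Mirrors the slide: widen a to double, add b, narrow the sum to float. */
float add_via_double(float a, double b)
{
    double wide = (double)a;   /* cvtss2sd */
    wide += b;                 /* addsd    */
    return (float)wide;        /* cvtsd2ss */
}

/* Nonzero when narrowing d to float and widening again gives d back. */
int narrowing_is_exact(double d)
{
    return (double)(float)d == d;
}
```

Narrowing is exact for a value like 0.5 but not for 0.1, whose double approximation needs more fraction bits than a float has.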

Converting floating point to/from integer

cvtss2si converts a float to a double word or quad word integer
cvtsd2si converts a double to a double word or quad word integer
These two round the value using the current rounding mode (round to nearest by default)
cvttss2si and cvttsd2si convert by truncation
cvtsi2ss converts an integer to a float in an XMM register
cvtsi2sd converts an integer to a double in an XMM register
When converting from memory a size qualifier is needed

cvtss2si eax, xmm0 ; convert to dword integer
cvtsi2sd xmm0, rax ; convert qword to double
cvtsi2sd xmm0, dword [x] ; convert dword integer
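
In C, a cast from floating point to integer always truncates toward zero, which is the behavior of the cvttss2si and cvttsd2si forms. A minimal sketch with hypothetical function names:

```c
#include <stdint.h>   /* int32_t, int64_t */

/* Truncating conversion, the behavior of cvttss2si with a dword destination. */
int32_t float_to_int_trunc(float x)
{
    return (int32_t)x;
}

/* Integer to double, the behavior of cvtsi2sd with a qword source. */
double int_to_double(int64_t n)
{
    return (double)n;
}
```

Note that truncation moves negative values toward zero, so -2.7 becomes -2, not -3.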

Unordered versus ordered comparisons
Floating point comparisons can cause exceptions
Ordered comparisons cause exceptions on QNaN or SNaN
  - QNaN means “quiet not a number”
  - SNaN means “signalling not a number”
  - Both have all exponent field bits set to 1
  - QNaN has its top fraction bit equal to 1
An unordered comparison causes exceptions only for SNaN
gcc uses unordered comparisons
If it’s good enough for gcc, it’s good enough for me
ucomiss compares floats
ucomisd compares doubles
The first operand must be an XMM register
They set the zero flag, parity flag and carry flag
The flags are set as for an unsigned integer comparison, so use ja, jae, jb and jbe afterwards

movss xmm0, [a]
mulss xmm0, [b]
ucomiss xmm0, [c]
jbe less_eq ; jump if a*b <= c (also taken if either value is NaN)

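
The flag settings after ucomiss or ucomisd can be modelled in C. The struct and function names below are hypothetical and exist only for this sketch; the flag values follow the instruction's documented results (unordered sets all three flags, less sets CF, equal sets ZF, greater clears all three).

```c
#include <math.h>   /* isnan, NAN */

/* Model of the three flags written by ucomiss/ucomisd. */
struct fp_flags { int zf, pf, cf; };

struct fp_flags ucomis_model(double a, double b)
{
    struct fp_flags f = {0, 0, 0};      /* a > b: all three clear */
    if (isnan(a) || isnan(b)) {
        f.zf = f.pf = f.cf = 1;         /* unordered: all three set */
    } else if (a < b) {
        f.cf = 1;                       /* less: carry flag only */
    } else if (a == b) {
        f.zf = 1;                       /* equal: zero flag only */
    }
    return f;
}

/* jbe is taken when CF = 1 or ZF = 1. */
int jbe_taken(struct fp_flags f)
{
    return f.cf || f.zf;
}
```

Because an unordered result sets both CF and ZF, jbe is taken when either operand is NaN. Code that must exclude NaN can test the parity flag first (jp).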
Mathematical functions

The 8087 had sine, cosine, arctangent and more
The newer instructions omit these operations on XMM registers
Instead you are supposed to use efficient library functions
There are instructions for
  - Minimum
  - Maximum
  - Rounding
  - Square root
  - Reciprocal of square root

Minimum and maximum

minss and maxss compute minimum or maximum of scalar floats
minsd and maxsd compute minimum or maximum of scalar doubles
The destination operand must be an XMM register
The source can be an XMM register or memory
minps and maxps compute minimum or maximum of packed floats
minpd and maxpd compute minimum or maximum of packed doubles
minps xmm0, xmm1 computes 4 minimums and places them in xmm0

movss xmm0, [x] ; move x into xmm0
maxss xmm0, [y] ; xmm0 has max(x,y)

movapd xmm0, [a] ; move a[0] and a[1] into xmm0
minpd xmm0, [b] ; xmm0[0] has min(a[0],b[0])
; xmm0[1] has min(a[1],b[1])
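
One detail worth knowing is how these instructions treat NaN: the result is the source (second) operand whenever either input is NaN. A C conditional expression has the same behavior, so the following sketch (with hypothetical function names) models one lane of minss and maxss.

```c
/* Model of minss dst, src: dst is kept only when dst < src is true. */
float minss_model(float dst, float src)
{
    return dst < src ? dst : src;
}

/* Model of maxss dst, src: dst is kept only when dst > src is true. */
float maxss_model(float dst, float src)
{
    return dst > src ? dst : src;
}
```

Any comparison with NaN is false, so minss_model(NaN, 3) gives 3 while minss_model(3, NaN) gives NaN. The operand order matters, unlike the C library function fminf.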

Rounding

roundss rounds 1 float
roundps rounds 4 floats
roundsd rounds 1 double
roundpd rounds 2 doubles
The first operand is an XMM destination register
The second is the source in an XMM register or memory
The third operand is an immediate value selecting the rounding mode

mode  meaning
0     round, giving ties to even numbers
1     round down
2     round up
3     round toward 0 (truncate)
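
The four modes in the table can be modelled in plain C. This is only a sketch: round_model is a hypothetical name, and the model works only for values that fit in a long long, a limit the real instructions do not have.

```c
/* Model of the roundsd mode operand (values 0 to 3 from the table).
   Assumption: x fits in a long long. */
double round_model(double x, int mode)
{
    long long t = (long long)x;            /* C casts truncate toward 0 */
    double tr = (double)t;
    double lo = tr > x ? tr - 1.0 : tr;    /* largest integer <= x  */
    double hi = tr < x ? tr + 1.0 : tr;    /* smallest integer >= x */
    double d  = x - lo;                    /* distance above lo, in [0,1) */

    switch (mode) {
    case 1:  return lo;                    /* round down */
    case 2:  return hi;                    /* round up */
    case 3:  return tr;                    /* round toward 0 */
    default:                               /* mode 0: nearest, ties to even */
        if (d < 0.5) return lo;
        if (d > 0.5) return hi;
        return ((long long)lo % 2 == 0) ? lo : hi;
    }
}
```

With mode 0, both 2.5 and 1.5 round to 2, since ties go to the even neighbor.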

Square roots

sqrtss computes 1 float square root
sqrtps computes 4 float square roots
sqrtsd computes 1 double square root
sqrtpd computes 2 double square roots
The first operand is an XMM destination register
The second is the source in an XMM register or memory

Distance in 3D
$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$

distance3d:
movss xmm0, [rdi] ; x from first point
subss xmm0, [rsi] ; subtract x from second point
mulss xmm0, xmm0 ; (x1-x2)^2
movss xmm1, [rdi+4] ; y from first point
subss xmm1, [rsi+4] ; subtract y from second point
mulss xmm1, xmm1 ; (y1-y2)^2
movss xmm2, [rdi+8] ; z from first point
subss xmm2, [rsi+8] ; subtract z from second point
mulss xmm2, xmm2 ; (z1-z2)^2
addss xmm0, xmm1 ; add x and y parts
addss xmm0, xmm2 ; add z part
sqrtss xmm0, xmm0 ; d = square root of the sum
ret

Dot product in 3D

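$d = x_1 x_2 + y_1 y_2 + z_1 z_2$

dot_product:
movss xmm0, [rdi] ; x from first vector
mulss xmm0, [rsi] ; x1*x2
movss xmm1, [rdi+4] ; y from first vector
mulss xmm1, [rsi+4] ; y1*y2
addss xmm0, xmm1 ; x1*x2 + y1*y2
movss xmm2, [rdi+8] ; z from first vector
mulss xmm2, [rsi+8] ; z1*z2
addss xmm0, xmm2 ; add z part, result in xmm0
ret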
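
For comparison, a C version of the same computation might look like the sketch below. It assumes each vector is an array of 3 floats, matching the offsets 0, 4 and 8 used in the assembly; the parameter names p and q are chosen for this example.

```c
/* Dot product of two 3D vectors stored as arrays of 3 floats. */
float dot_product(const float *p, const float *q)
{
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2];
}
```

Under the System V calling convention used on Linux, p and q arrive in rdi and rsi and the float result is returned in xmm0, which matches the register use in the assembly version.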

Polynomial evaluation by Horner’s Rule

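$P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n$

$b_n = p_n$
$b_{n-1} = p_{n-1} + b_n x$
$b_{n-2} = p_{n-2} + b_{n-1} x$
$\vdots$
$b_0 = p_0 + b_1 x$

Then $P(x) = b_0$

; rdi = array of coefficients p, esi = degree n, xmm0 = x; result in xmm0
horner: movsd xmm1, xmm0 ; use xmm1 as x
movsd xmm0, [rdi+rsi*8] ; accumulator for b_k, starts at p_n
test esi, esi ; is the degree 0?
jz done
more: sub esi, 1 ; k = k-1, sets the zero flag when k reaches 0
mulsd xmm0, xmm1 ; b_k * x
addsd xmm0, [rdi+rsi*8] ; add p_k
jnz more ; flags are still those of sub
done: ret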
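
The same recurrence in C may make the loop structure easier to follow. This is a sketch that assumes the same inputs as the assembly routine: an array p of n+1 coefficients, the degree n, and the point x.

```c
/* Evaluate p[0] + p[1]*x + ... + p[n]*x^n by Horner's rule. */
double horner(const double *p, int n, double x)
{
    double b = p[n];          /* b_n = p_n */
    while (n > 0) {
        n--;
        b = b * x + p[n];     /* b_k = p_k + b_{k+1} * x */
    }
    return b;                 /* b_0 = P(x) */
}
```

For p = {1, 2, 3} and x = 2 the steps are b = 3, then 3*2 + 2 = 8, then 8*2 + 1 = 17, which equals 1 + 2*2 + 3*4.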