Floating Point Instructions

Ray Seyfarth

June 29, 2012

64 Bit Intel Assembly Language © 2011 Ray Seyfarth

Floating point instructions

PC floating point operations were once done in a separate chip - 8087

This chip managed a stack of eight 80 bit floating point values
The stack and instructions still exist, but are largely ignored
x86-64 CPUs have 16 floating point registers (128 or 256 bits)
These registers can be used for single data instructions or single
instruction multiple data instructions (SIMD)
We will focus on these newer registers
The older instructions tended to start with the letter “f” and
referenced the stack using register names like ST0
The newer instructions use registers with names like “XMM0”

Outline

1 Moving data in and out of floating point registers

2 Addition

3 Subtraction

4 Basic floating point instructions

5 Data conversion

6 Floating point comparisons

7 Mathematical functions

8 Sample floating point code

Moving scalars to or from floating point registers

movss moves a single 32 bit floating point value to or from an XMM register
movsd moves a single 64 bit floating point value
There is no implicit data conversion - unlike the old instructions
which converted floating point data to an 80 bit internal format
The instructions follow the standard pattern of allowing at most one
memory operand

movss xmm0, [x] ; move value at x into xmm0

movsd [y], xmm1 ; move value from xmm1 to y
movss xmm2, xmm0 ; move from xmm0 to xmm2
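A minimal sketch of the three forms with their data declarations, assuming the same yasm/nasm syntax as the slides. The labels x and y and their values are hypothetical. The comments record how each form treats the rest of the register: a load from memory clears the upper bits, while a register to register move leaves them alone.

```nasm
        segment .data
x       dd      1.5             ; 32 bit float (hypothetical label)
y       dq      2.25            ; 64 bit double (hypothetical label)

        segment .text
        movss   xmm0, [x]       ; load float; bits 32-127 of xmm0 are zeroed
        movsd   xmm1, [y]       ; load double; bits 64-127 of xmm1 are zeroed
        movss   xmm1, xmm0      ; register to register: only the low 32 bits
                                ; of xmm1 change, the rest is preserved
```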

Moving packed data

The XMM registers are 128 bits

They can hold 4 floats or 2 doubles (or integers of various sizes)
On newer CPUs they are extended to 256 bits and referred to as YMM
registers when using all 256 bits
movaps moves 4 floats to/from a memory address aligned at a 16
byte boundary
movups does the same task with unaligned memory addresses
The Core i series performs unaligned moves efficiently
movapd moves 2 doubles to/from a memory address aligned at a 16
byte boundary
movupd does the same task with unaligned memory addresses

movups xmm0, [x] ; move 4 floats to xmm0

movupd [a], xmm15 ; move 2 doubles to a
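The aligned and unaligned forms can be sketched side by side as follows. The labels a and b are hypothetical, and the align directive is the assembler's way of placing data at a 16 byte boundary. Using movaps or movapd with an address that is not a multiple of 16 raises a general protection fault, which is why the last load uses movups.

```nasm
        segment .data
        align   16
a       dd      1.0, 2.0, 3.0, 4.0   ; 4 floats starting at a 16 byte boundary
b       dq      1.0, 2.0             ; 2 doubles; a is 16 bytes long, so b is
                                     ; also at a 16 byte boundary

        segment .text
        movaps  xmm0, [a]       ; aligned load of 4 floats
        movapd  xmm1, [b]       ; aligned load of 2 doubles
        movups  xmm2, [a+4]     ; a+4 is not a multiple of 16: use movups
```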

Floating point addition
addss adds a scalar float (single precision) to another
addsd adds a scalar double to another
addps adds 4 floats to 4 floats - pairwise addition
addpd adds 2 doubles to 2 doubles
There are 2 operands: destination and source
The source can be memory or an XMM register
The destination must be an XMM register
Flags are unaffected

movss xmm0, [a] ; load a

addss xmm0, [b] ; add b to a
movss [c], xmm0 ; store sum in c
movapd xmm0, [a] ; load 2 doubles from a
addpd xmm0, [b] ; add a[0]+b[0] and a[1]+b[1]
movapd [c], xmm0 ; store 2 sums in c

Floating point subtraction

subss subtracts the source float from the destination

subsd subtracts the source double from the destination
subps subtracts 4 floats from 4 floats
subpd subtracts 2 doubles from 2 doubles

movss xmm0, [a] ; load a

subss xmm0, [b] ; subtract b from a
movss [c], xmm0 ; store a-b in c
movapd xmm0, [a] ; load 2 doubles from a
subpd xmm0, [b] ; compute a[0]-b[0] and a[1]-b[1]
movapd [c], xmm0 ; store 2 differences in c

Basic floating point instructions

instruction effect
addsd add scalar double
addss add scalar float
addpd add packed double
addps add packed float
subsd subtract scalar double
subss subtract scalar float
subpd subtract packed double
subps subtract packed float
mulsd multiply scalar double
mulss multiply scalar float
mulpd multiply packed double
mulps multiply packed float
divsd divide scalar double
divss divide scalar float
divpd divide packed double
divps divide packed float
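Multiplication and division follow the same two operand pattern as addition and subtraction. A short sketch, in which a, b, c and d are hypothetical doubles and v, w and p are hypothetical arrays of 4 floats assumed to sit at 16 byte boundaries (a packed instruction with a memory source requires that alignment):

```nasm
        movsd   xmm0, [a]       ; xmm0 = a
        mulsd   xmm0, [b]       ; xmm0 = a*b
        divsd   xmm0, [c]       ; xmm0 = (a*b)/c
        movsd   [d], xmm0       ; store the scalar result in d

        movaps  xmm1, [v]       ; load 4 floats from v
        mulps   xmm1, [w]       ; 4 pairwise products v[i]*w[i]
        movaps  [p], xmm1       ; store the 4 products in p
```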

Conversion to a different length floating point

cvtss2sd converts a scalar single (float) to a scalar double

cvtps2pd converts 2 packed floats to 2 packed doubles
cvtsd2ss converts a scalar double to a scalar float
cvtpd2ps converts 2 packed doubles to 2 packed floats

cvtss2sd xmm0, [a] ; get a into xmm0 as a double

addsd xmm0, [b] ; add a double to a
cvtsd2ss xmm0, xmm0 ; convert to float
movss [c], xmm0

Converting floating point to/from integer

cvtss2si converts a float to a double word or quad word integer
cvtsd2si converts a double to a double word or quad word integer
These 2 round the value using the current rounding mode (round to
nearest by default)
cvttss2si and cvttsd2si convert by truncation
cvtsi2ss converts an integer to a float in an XMM register
cvtsi2sd converts an integer to a double in an XMM register
When converting from memory a size qualifier is needed

cvtss2si eax, xmm0 ; convert to dword integer

cvtsi2sd xmm0, rax ; convert qword to double
cvtsi2sd xmm0, dword [x] ; convert dword integer
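The difference between rounding and truncation can be seen with a single value. This is a sketch with a hypothetical label v, assuming the rounding mode in MXCSR has been left at its default of round to nearest:

```nasm
        segment .data
v       dq      2.7             ; hypothetical test value

        segment .text
        movsd     xmm0, [v]
        cvtsd2si  eax, xmm0     ; rounds:    eax = 3
        cvttsd2si ecx, xmm0     ; truncates: ecx = 2
        cvtsi2sd  xmm1, eax     ; back to a double: xmm1 = 3.0
```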

Unordered versus ordered comparisons
Floating point comparisons can cause exceptions
Ordered comparisons cause exceptions on QNaN or SNaN
  - QNaN means “quiet not a number”
  - SNaN means “signalling not a number”
  - Both have all exponent field bits set to 1
  - QNaN has its top fraction bit equal to 1
An unordered comparison causes exceptions only for SNaN
gcc uses unordered comparisons
If it’s good enough for gcc, it’s good enough for me
ucomiss compares floats
ucomisd compares doubles
The first operand must be an XMM register
They set the zero flag, parity flag and carry flag
movss xmm0, [a]
mulss xmm0, [b]
ucomiss xmm0, [c]
jbe less_eq ; jump if a*b <= c (also taken if unordered)
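Because the result is reported in the zero, parity and carry flags, the conditional jumps to use are the ones normally paired with unsigned integer comparisons (jb, jbe, ja, jae, je, jne). An unordered result sets all three flags, so a NaN can be filtered out first with jp. A sketch with hypothetical labels:

```nasm
        movss   xmm0, [a]
        ucomiss xmm0, [b]       ; flags ZF PF CF: a > b gives 0 0 0,
                                ; a < b gives 0 0 1, a == b gives 1 0 0,
                                ; unordered gives 1 1 1
        jp      is_nan          ; PF = 1 only when a or b is a NaN
        jb      a_less          ; CF = 1: a < b
        je      a_equal         ; ZF = 1: a == b
                                ; fall through: a > b
```

Testing jp before the other jumps matters, since jb and je would also be taken for an unordered result.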

Mathematical functions

8087 had sine, cosine, arctangent and more

The newer instructions omit these operations on XMM registers
Instead you are supposed to use efficient library functions
There are instructions for
  - Minimum
  - Maximum
  - Rounding
  - Square root
  - Reciprocal of square root

Minimum and maximum

minss and maxss compute minimum or maximum of scalar floats

minsd and maxsd compute minimum or maximum of scalar doubles
The destination operand must be an XMM register
The source can be an XMM register or memory
minps and maxps compute minimum or maximum of packed floats
minpd and maxpd compute minimum or maximum of packed doubles
minps xmm0, xmm1 computes 4 minimums and places them in xmm0

movss xmm0, [x] ; move x into xmm0

maxss xmm0, [y] ; xmm0 has max(x,y)
movapd xmm0, [a] ; move a[0] and a[1] into xmm0
minpd xmm0, [b] ; xmm0[0] has min(a[0],b[0])
; xmm0[1] has min(a[1],b[1])
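One common use of the pair is clamping a value to a range. A sketch, assuming hypothetical floats x, lo and hi with lo <= hi:

```nasm
        movss   xmm0, [x]       ; xmm0 = x
        maxss   xmm0, [lo]      ; xmm0 = max(x, lo)
        minss   xmm0, [hi]      ; xmm0 = min(max(x, lo), hi)
        movss   [x], xmm0       ; store the clamped value back in x
```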

Rounding

roundss rounds 1 float

roundps rounds 4 floats
roundsd rounds 1 double
roundpd rounds 2 doubles
The first operand is an XMM destination register
The second is the source in an XMM register or memory
The third operand is a rounding mode

mode meaning
0 round, giving ties to even numbers
1 round down
2 round up
3 round toward 0 (truncate)
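The four modes can be compared on a single value. This sketch assumes a hypothetical double v holding -2.5 and a CPU with SSE4.1, which is where the round instructions were introduced. The result stays a floating point value in the destination register; it is not converted to an integer.

```nasm
        movsd   xmm1, [v]       ; v = -2.5 (hypothetical)
        roundsd xmm0, xmm1, 0   ; nearest, ties to even: -2.0
        roundsd xmm0, xmm1, 1   ; round down:            -3.0
        roundsd xmm0, xmm1, 2   ; round up:              -2.0
        roundsd xmm0, xmm1, 3   ; toward 0 (truncate):   -2.0
```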

Square roots

sqrtss computes 1 float square root

sqrtps computes 4 float square roots
sqrtsd computes 1 double square root
sqrtpd computes 2 double square roots
The first operand is an XMM destination register
The second is the source in an XMM register or memory

Distance in 3D
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
distance3d:
movss xmm0, [rdi] ; x from first point
subss xmm0, [rsi] ; subtract x from second point
mulss xmm0, xmm0 ; (x1-x2)^2
movss xmm1, [rdi+4] ; y from first point
subss xmm1, [rsi+4] ; subtract y from second point
mulss xmm1, xmm1 ; (y1-y2)^2
movss xmm2, [rdi+8] ; z from first point
subss xmm2, [rsi+8] ; subtract z from second point
mulss xmm2, xmm2 ; (z1-z2)^2
addss xmm0, xmm1 ; add x and y parts
addss xmm0, xmm2 ; add z part
sqrtss xmm0, xmm0 ; distance = square root of the sum
ret

Dot product in 3D

$$d = x_1 x_2 + y_1 y_2 + z_1 z_2$$

dot_product:
movss xmm0, [rdi]
mulss xmm0, [rsi]
movss xmm1, [rdi+4]
mulss xmm1, [rsi+4]
addss xmm0, xmm1
movss xmm2, [rdi+8]
mulss xmm2, [rsi+8]
addss xmm0, xmm2
ret

Polynomial evaluation by Horner’s Rule

$$P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n$$

$$\begin{aligned}
b_n &= p_n \\
b_{n-1} &= p_{n-1} + b_n x \\
b_{n-2} &= p_{n-2} + b_{n-1} x \\
&\;\;\vdots \\
b_0 &= p_0 + b_1 x
\end{aligned}$$
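As a quick check of the recurrence, take the hypothetical polynomial $P(x) = 1 + 2x + 3x^2$ (so $n = 2$) and evaluate it at $x = 2$:

$$\begin{aligned}
b_2 &= p_2 = 3 \\
b_1 &= p_1 + b_2 x = 2 + 3 \cdot 2 = 8 \\
b_0 &= p_0 + b_1 x = 1 + 8 \cdot 2 = 17
\end{aligned}$$

Direct evaluation gives $1 + 4 + 12 = 17$, the same value, using only one multiplication and one addition per coefficient after the first.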
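; rdi: array of doubles p, esi: degree n, xmm0: x; result in xmm0
horner: movsd xmm1, xmm0 ; use xmm1 as x
movsd xmm0, [rdi+rsi*8] ; accumulator for b_k, starts as p_n
cmp esi, 0 ; is the degree 0?
jz done
more: sub esi, 1 ; k = k-1, sets ZF when k reaches 0
mulsd xmm0, xmm1 ; b_k * x
addsd xmm0, [rdi+rsi*8] ; add p_k
jnz more ; flags are still those from sub
done: ret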