Aula Ch3

This document provides an overview of computer arithmetic, including integer and floating point representations and operations. It discusses multiplication and division algorithms, parallelism techniques like SIMD, and optimizations for graphics and multimedia workloads. Key topics covered include floating point number formats and operations, parallel hardware designs for faster multiplication and division, and issues that can arise like overflow, underflow, and lack of associativity in floating point addition. The document concludes by noting limitations of finite-precision binary representations and importance of the ISA in defining number interpretations and arithmetic capabilities.

EEL580 - Arquitetura de Computadores: Arithmetic for Computers

Instructor: Diego L. C. Dutra


Outline

● Introduction
● Multiplication
● Division
● Floating Point
● Parallelism and Computer Arithmetic
● Real Stuff
● Going Faster
● Fallacies and Pitfalls
● Concluding Remarks
Introduction

● Operations on integers
○ Addition and subtraction
■ Overflow if result out of range
○ Multiplication and division
○ Dealing with overflow
● Floating-point real numbers
○ Representation and operations
● Arithmetic for Multimedia
○ Graphics and media processing operates on vectors of 8-bit and 16-bit data
■ Use 64-bit adder, with partitioned carry chain
■ Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
○ SIMD (single-instruction, multiple-data)
■ Saturating operations
■ On overflow, result is largest representable value
● cf. 2s-complement modulo (wraparound) arithmetic
■ E.g., clipping in audio, saturation in video
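The saturating-versus-wraparound distinction above can be illustrated with a minimal C sketch on 8-bit unsigned values (function names are illustrative, not from any library):

```c
#include <assert.h>
#include <stdint.h>

/* 2s-complement / unsigned modulo arithmetic: on overflow the result
   simply wraps around, e.g. 200 + 100 becomes 44 in 8 bits. */
uint8_t add_wrap(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Saturating arithmetic, as in media SIMD instruction sets: on
   overflow the result clamps to the largest representable value. */
uint8_t add_sat(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}
```

For audio samples or pixel values, the clamped result (full brightness, full volume) is far less jarring than a wrapped one.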
Multiplication: Basic
Multiplication: Optimized Multiplier

● Faster: uses multiple adders
○ Cost/performance tradeoff
● Perform steps in parallel: add/shift
● One cycle per partial-product addition
○ That's OK if the frequency of multiplications is low
● Can be pipelined
○ Several multiplications performed in parallel
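The add/shift loop can be sketched behaviorally in C: one partial-product addition per iteration, matching the one-cycle-per-step description above (a software model of the datapath, not the hardware itself):

```c
#include <assert.h>
#include <stdint.h>

/* Behavioral model of the basic shift-and-add multiplier: examine one
   multiplier bit per step; if it is 1, add the (shifted) multiplicand
   into the running product. 32 steps for 32-bit operands. */
uint64_t shift_add_mul(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;
    uint64_t mcand = multiplicand;      /* shifted left each step */
    for (int step = 0; step < 32; ++step) {
        if (multiplier & 1u)
            product += mcand;           /* one partial-product addition */
        mcand <<= 1;
        multiplier >>= 1;
    }
    return product;
}
```

The optimized hardware performs these additions with multiple adders (or a tree of them) instead of iterating, trading area for cycles.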
Division: Basic
Division: Optimized Divider

● Faster division
○ Can't use parallel hardware as in the multiplier
■ Subtraction is conditional on the sign of the remainder
■ We can still use parallel hardware, but with diminishing returns
○ Faster dividers (e.g. Newton-Raphson) generate multiple quotient bits per step
■ Still require multiple steps
● One cycle per partial-remainder subtraction
● Looks a lot like a multiplier!
○ Same hardware can be used for both
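The one-subtraction-per-cycle behavior can be modeled with a restoring-division loop in C (a behavioral sketch; the conditional restore is exactly the data dependence that blocks the multiplier-style parallel trick):

```c
#include <assert.h>
#include <stdint.h>

/* Behavioral model of restoring division: each step shifts the partial
   remainder left by one bit, tries a subtraction, and "restores"
   (keeps the old remainder) if the result would go negative. That
   sign-dependent decision makes each step depend on the previous one. */
void restoring_div(uint32_t dividend, uint32_t divisor,
                   uint32_t *quotient, uint32_t *remainder) {
    uint64_t rem = 0;
    uint32_t quo = 0;
    for (int step = 31; step >= 0; --step) {
        rem = (rem << 1) | ((dividend >> step) & 1u);
        if (rem >= divisor) {           /* subtraction succeeds */
            rem -= divisor;
            quo |= 1u << step;
        }                               /* else: restore, keep rem */
    }
    *quotient = quo;
    *remainder = (uint32_t)rem;
}
```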
RISC-V Multiplication & Division

● RISC-V division: overflow and division-by-zero don't produce errors
○ Just return defined results
○ Faster for the common case of no error
Floating Point

● Representation for non-integral numbers
○ Including very small and very large numbers
● Types float and double in C
● Defined by IEEE Std 754-1985
● Developed in response to divergence of representations
○ Portability issues for scientific code
● Now almost universally adopted
● Two representations
○ Single precision (32-bit)
○ Double precision (64-bit)
● IEEE Floating-Point Format: S | Exponent | Fraction
○ Single: 8-bit exponent, 23-bit fraction; Double: 11-bit exponent, 52-bit fraction
○ S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
○ Normalized significand: 1.0 ≤ |significand| < 2.0
■ Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
■ Significand is the Fraction with the "1." restored
○ Exponent: excess representation: actual exponent + Bias
■ Ensures the exponent field is unsigned
■ Single: Bias = 127; Double: Bias = 1023
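The S/Exponent/Fraction split can be checked by pulling the fields out of a float's bit pattern (a sketch using memcpy for the bit-level copy; the field widths and the bias of 127 are the single-precision values from the slide):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 single: 1 sign bit, 8 exponent bits (bias 127),
   23 fraction bits. For normalized values the significand is this
   fraction with the hidden leading 1 restored. */
void decode_float(float f, int *sign, int *exponent, uint32_t *fraction) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* reinterpret, no conversion */
    *sign     = (int)(bits >> 31);
    *exponent = (int)((bits >> 23) & 0xFFu) - 127;  /* remove bias */
    *fraction = bits & 0x7FFFFFu;
}
```

For example, -0.75 = -1.1₂ × 2⁻¹, so its sign is 1, its unbiased exponent is -1, and its fraction bits encode the ".1".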
Floating Point

● Relative precision
○ All fraction bits are significant
○ Single: approx 2^–23
■ Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
○ Double: approx 2^–52
■ Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
● Infinities and NaNs
○ Exponent = 111...1, Fraction = 000...0
■ ±Infinity
■ Can be used in subsequent calculations, avoiding the need for an overflow check
○ Exponent = 111...1, Fraction ≠ 000...0
■ Not-a-Number (NaN)
■ Indicates an illegal or undefined result, e.g. 0.0 / 0.0
■ Can be used in subsequent calculations
● Denormal numbers
○ Exponent = 000...0 ⇒ hidden bit is 0
○ Smaller than normal numbers
■ Allow gradual underflow, with diminishing precision
○ Denormal with Fraction = 000...0 represents ±0.0
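These special cases are directly observable from C on IEEE 754 hardware (a small sketch; FLT_MIN is the smallest normalized single, so halving it lands in the denormal range — note that ISO C only guarantees these division results on implementations with IEEE semantics, which covers essentially all modern platforms):

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* 0.0/0.0 yields NaN; a nonzero value over zero yields infinity; and
   below FLT_MIN lie the denormals, which underflow gradually toward
   zero instead of snapping straight to it. */
int gives_nan(float x, float y) { return isnan(x / y); }
int gives_inf(float x, float y) { return isinf(x / y); }
float half_of(float x)          { return x / 2.0f; }
```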
Floating Point: Adder HW

Step 1: Compare the exponents; shift the smaller number's significand right until its exponent matches the larger

Step 2: Add the significands

Step 3: Normalize the sum, checking for exponent overflow or underflow

Step 4: Round the significand; renormalize if necessary
Floating Point: HW

● FP Adder Hardware
○ Much more complex than an integer adder
○ Doing it in one clock cycle would take too long
■ Much longer than integer operations
■ Slower clock would penalize all instructions
○ FP adder usually takes several cycles
■ Can be pipelined
● FP Arithmetic Hardware & Accurate Arithmetic
○ FP multiplier is of similar complexity to FP adder
■ But uses a multiplier for significands instead of an adder
○ FP arithmetic hardware usually does
■ Addition, subtraction, multiplication, division, reciprocal, square-root
■ FP ↔ integer conversion
○ Operations usually take several cycles
■ Can be pipelined
○ IEEE Std 754 specifies additional rounding control
■ Extra bits of precision (guard, round, sticky)
■ Choice of rounding modes
■ Allows programmer to fine-tune numerical behavior of a computation
■ Not all FP units implement all options: Trade-off between hardware complexity,
performance, and market requirements
Floating Point Instructions in RISC-V
Floating Point Example: °F to °C

● C code:

● Compiled RISC-V code:
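The code listings on this slide are figures that did not survive extraction; as a sketch, the classic textbook conversion in C (single-precision throughout, so a compiler would emit single-precision RISC-V FP instructions such as fsub.s, fmul.s, and fdiv.s):

```c
#include <assert.h>

/* Convert Fahrenheit to Celsius: celsius = (5/9) * (fahr - 32).
   Float literals keep the whole computation in single precision. */
float f2c(float fahr) {
    return (5.0f / 9.0f) * (fahr - 32.0f);
}
```

Note that 5.0f/9.0f cannot be represented exactly, so f2c(212.0f) is extremely close to, but not exactly, 100.0f — a first taste of the precision issues discussed later.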


Floating Point Example: Array Multiplication

● C = C + A × B
○ All 32 × 32 matrices, with 64-bit double-precision elements
● C code:
○ Addresses of c, a, b in x10, x11, x12; i, j, k in x5, x6, x7

● RISC-V code:
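The code figures are not reproduced here; a scalar C sketch of the C = C + A × B loop nest (row-major one-dimensional arrays; the slides fix n = 32, here it is a parameter):

```c
#include <assert.h>

/* DGEMM-style kernel: C = C + A * B for n x n matrices of doubles,
   stored row-major in flat arrays. The innermost loop accumulates
   one dot product per output element. */
void dgemm(int n, const double *a, const double *b, double *c) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double cij = c[i * n + j];
            for (int k = 0; k < n; ++k)
                cij += a[i * n + k] * b[k * n + j];
            c[i * n + j] = cij;
        }
}
```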
Real Stuff: Parallelism in Computer Arithmetic and SIMD

● Graphics and audio applications can take advantage of performing simultaneous operations on short vectors
○ Example: 128-bit adder:
■ Sixteen 8-bit adds
■ Eight 16-bit adds
■ Four 32-bit adds
● Also called data-level parallelism, vector parallelism, or Single Instruction,
Multiple Data (SIMD)
● Streaming SIMD Extension 2 (SSE2) on x86_64
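The "partitioned carry chain" idea mentioned earlier can be sketched portably in C: add four 16-bit lanes packed into one 64-bit word while masking so that no carry crosses a lane boundary (a SWAR trick standing in for the real partitioned adder; the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Add four 16-bit lanes packed into 64-bit words without letting a
   carry propagate between lanes: add the low 15 bits of each lane
   separately, then XOR the lane-top bits back in. Each lane wraps
   modulo 2^16, just like independent 16-bit adders. */
uint64_t add4x16(uint64_t x, uint64_t y) {
    const uint64_t H = 0x8000800080008000ull; /* top bit of each lane */
    uint64_t low = (x & ~H) + (y & ~H);       /* carries stop below bit 15 */
    return low ^ ((x ^ y) & H);               /* restore lane-top sum bits */
}
```

Hardware SIMD units implement the same partitioning in the adder itself, so one 128-bit add performs sixteen 8-bit, eight 16-bit, or four 32-bit adds in a single cycle.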
Going Faster: Subword Parallelism and Matrix Multiply

● Unoptimized C code:

● Optimized C code:
Fallacies and Pitfalls

● Fallacy: Right Shift and Division
○ Left shift by i places multiplies an integer by 2^i, and right shift divides by 2^i?
■ Only for unsigned integers
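A two-line C example shows where the fallacy breaks for signed values (the behavior of >> on negative values is implementation-defined in ISO C, but essentially all compilers use an arithmetic shift on two's-complement machines):

```c
#include <assert.h>

/* C integer division truncates toward zero, while an arithmetic right
   shift rounds toward minus infinity, so for negative operands the two
   disagree: -5 / 2 == -2, but -5 >> 1 == -3. */
int div2(int x)   { return x / 2; }
int shift2(int x) { return x >> 1; }
```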
● Pitfall: Floating-point addition is not associative.
○ Parallel programs may interleave operations in unexpected orders
■ Assumptions of associativity may fail
○ Need to validate parallel programs under varying degrees of parallelism
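The non-associativity is easy to trigger: a huge value absorbs a small one, so the grouping of additions changes the answer (a standard demonstration with single-precision values):

```c
#include <assert.h>

/* With x = -1.5e38, y = 1.5e38, z = 1.0:
   (x + y) + z  computes 0.0 + 1.0 = 1.0, but
   x + (y + z)  computes x + 1.5e38 = 0.0, because adding 1.0 to
   1.5e38 is absorbed by rounding. */
float left_first(float x, float y, float z)  { return (x + y) + z; }
float right_first(float x, float y, float z) { return x + (y + z); }
```

A parallel reduction that regroups the sum can therefore produce a different result on every run — correct behavior, but surprising if associativity was assumed.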

● Fallacy: Parallel execution strategies that work for integer data types also work for floating-point data types.
Fallacies and Pitfalls

● Fallacy: Only theoretical mathematicians care about floating-point accuracy.
○ Important for scientific code
○ But for everyday consumer use? "My bank balance is out by 0.0002¢!" ☹
○ The Intel Pentium FDIV bug: the market expects accuracy
■ The recall cost $500 million
■ See Colwell, The Pentium Chronicles
Concluding Remarks

● Bits have no inherent meaning
○ Interpretation depends on the instructions applied
● Computer representations of numbers
○ Finite range and precision
○ Need to account for this in programs
(Figure: the frequency of the RISC-V instructions for the SPEC CPU2006 benchmarks.)
● ISAs support arithmetic
○ Signed and unsigned integers
○ Floating-point approximation to reals
● Bounded range and precision
○ Operations can overflow and underflow
Questions
