Bits and Bytes PDF

This document provides an introduction to how computers represent and manipulate data at the bit level. It discusses how bits are used to represent binary digits and how groups of bits can represent numbers, characters, and other data types. It also summarizes the history leading to modern approaches, including the work of George Boole on Boolean algebra and Claude Shannon's application of Boolean logic to digital circuit design. Key concepts covered include binary, bytes, encoding integer and floating-point values, and byte ordering conventions like big-endian and little-endian.


CSCI-UA.0201

Computer Systems Organization


Bits and Bytes:
Data Presentation
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com

Some slides adapted and modified from:
•Clark Barrett
•Jinyang Li
•Bryant and O’Hallaron
Bits and Bytes
• Representing information as bits
• How are bits manipulated?
• Integers
• Floating point
Our First Steps…
How do we represent data in a computer?
• How do we represent information using
electrical signals?
• At the lowest level, a computer is an
electronic machine.
• Easy to recognize two conditions:
– presence of a voltage – we call this state “1”
– absence of a voltage – we call this state “0”
Binary Representations

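[Figure: a voltage waveform read as the bit sequence 0 1 0. Voltages between 2.8V and 3.3V are read as 1; voltages between 0.0V and 0.5V are read as 0.]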
A Computer is a Binary Digital
Machine
• Basic unit of information is the binary digit, or bit.
• Values with more than two states require multiple
bits.
– A collection of two bits has four possible states:
00, 01, 10, 11
– A collection of three bits has eight possible states:
000, 001, 010, 011, 100, 101, 110, 111
– A collection of n bits has 2^n possible states.
George Boole
• (1815-1864)
• English mathematician and
philosopher
• Inventor of Boolean Algebra
• Now we can use things like:
AND, OR, NOT, XOR,
XNOR, NAND, NOR, ….

Source: http://history-computer.com/ModernComputer/thinkers/Boole.html
Claude Shannon
• (1916–2001)
• American mathematician
and electronic engineer
• His work is the foundation for using switches (i.e. transistors), and hence binary numbers, to implement Boolean functions.

Source: http://history-computer.com/ModernComputer/thinkers/Shannon.html
So, we use transistors to
implement logic gates.
Logic gates manipulate binary
numbers to implement Boolean
functions.
Boolean functions solve
problems.

…. Simply Speaking … 
Encoding Byte Values
• Byte = 8 bits
– Binary: 00000000_2 to 11111111_2
– Decimal: 0_10 to 255_10
– Hexadecimal: 00_16 to FF_16
• Base 16 number representation
• Every 4 bits → 1 hexadecimal digit
• Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
• Write FA1D37B_16 in C language as
– 0xFA1D37B
– 0xfa1d37b

Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111
Data Representations
C Data Type (sizes in bytes)  Typical 32-bit  Intel IA32  x86-64
char                          1               1           1
short                         2               2           2
int                           4               4           4
long                          4               4           8
long long                     8               8           8
float                         4               4           4
double                        8               8           8
pointer                       4               4           8

The most widely used .
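One way to check the table on a given machine is to ask the compiler directly with `sizeof`. This is a minimal sketch; the function name `print_sizes` is an assumption made for illustration, and the numbers printed depend on the platform that builds it.

```c
#include <stdio.h>

/* Print the size in bytes of each C type from the table, as seen by
   whatever compiler and platform build this code. */
void print_sizes(void) {
    printf("char      %zu\n", sizeof(char));
    printf("short     %zu\n", sizeof(short));
    printf("int       %zu\n", sizeof(int));
    printf("long      %zu\n", sizeof(long));
    printf("long long %zu\n", sizeof(long long));
    printf("float     %zu\n", sizeof(float));
    printf("double    %zu\n", sizeof(double));
    printf("pointer   %zu\n", sizeof(void *));
}
```

Only `sizeof(char) == 1` is fixed by the language; the other sizes are the platform's choice, which is why the table has several columns.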


Byte Ordering
• How are bytes within a multi-byte word
ordered in memory?
• Conventions
– Big Endian: Sun, PPC, Internet
• Most significant byte has lowest address
– Little Endian: x86
• Most significant byte has highest address
Byte Ordering Example
• Big Endian
– Most significant byte has lowest address
• Little Endian
– Most significant byte has highest address
• Example
– Variable x has 4-byte representation 0x01234567
– Address given by &x is 0x100

Address:        0x100  0x101  0x102  0x103
Big Endian:     01     23     45     67
Little Endian:  67     45     23     01

(01 is the most significant byte.)
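The example can be turned into a quick test of the machine's own byte order: store 0x01234567 and look at the byte at the lowest address. This is a sketch; the function name `is_little_endian` is illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine and 0 on a big-endian one,
   by inspecting the byte stored at the lowest address of x. */
int is_little_endian(void) {
    uint32_t x = 0x01234567;
    unsigned char first;
    memcpy(&first, &x, 1);   /* the byte at address &x */
    return first == 0x67;    /* 0x67 is the least significant byte */
}
```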
Reading Byte-Reversed Listings
• Disassembly
– given the binary file, get the assembly

• Example Fragment
Address Instruction Code Assembly Rendition
8048365: 5b pop %ebx
8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx
804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)

• Deciphering Numbers
– Value: 0x12ab
– Pad to 32 bits (int is 4 bytes): 0x000012ab
– Split into bytes: 00 00 12 ab
– Reverse (little endian): ab 12 00 00
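Going the other way, from the bytes in the listing back to the value, can be sketched as below. The helper name `from_little_endian` is an assumption for illustration.

```c
#include <stdint.h>

/* Rebuild a 32-bit value from four bytes listed in little-endian order
   (least significant byte first), as they appear in the disassembly. */
uint32_t from_little_endian(const uint8_t b[4]) {
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```

Feeding it the bytes `ab 12 00 00` from the `add` instruction yields 0x12ab.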
Examining Data Representations
• Code to print Byte Representation of
data

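#include <stdio.h>

/* Print the address and value of each of the len bytes starting at start. */
void show_bytes(unsigned char *start, int len) {
    int i;
    for (i = 0; i < len; i++)
        printf("%p\t0x%.2x\n", (void *)(start + i), start[i]);
    printf("\n");
}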

printf directives:
%p: Print pointer
%x: Print Hexadecimal
show_bytes Execution Example
int a = 15213;
printf("int a = 15213;\n");
show_bytes((unsigned char *) &a, sizeof(int));

Result (Linux):
int a = 15213;
0x11ffffcb8 0x6d
0x11ffffcb9 0x3b
0x11ffffcba 0x00
0x11ffffcbb 0x00

Note: 15213 in decimal is 3B6D in hexadecimal


Representing Integers
Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D

Bytes listed from lowest to highest address:

int A = 15213;
IA32, x86-64:  6D 3B 00 00
Sun:           00 00 3B 6D

long int C = 15213;
IA32:    6D 3B 00 00
x86-64:  6D 3B 00 00 00 00 00 00
Sun:     00 00 3B 6D
Representing Strings
• Strings in C
– Represented by array of characters
– Each character encoded in ASCII format
• Standard 7-bit encoding of character set
• Character ‘0’ has code 0x30
– Digit i has code 0x30+i
– String should be null-terminated
• Byte ordering not an issue

char S[6] = "18243";
Little Endian:  31 38 32 34 33 00
Big Endian:     31 38 32 34 33 00

Byte Ordering is an issue for a data item.
An array is a group of data items.
How to Manipulate Bits?
Boolean Algebra
• Developed by George Boole in 19th Century
– Algebraic representation of logic
• Encode “True” as 1 and “False” as 0
And
• A&B = 1 when both A=1 and B=1
A B A&B
0 0 0
0 1 0
1 0 0
1 1 1

Or
• A|B = 1 when either A=1 or B=1
A B A|B
0 0 0
0 1 1
1 0 1
1 1 1

Not
• ~A = 1 when A=0
A ~A
0 1
1 0

Exclusive-Or (Xor)
• A^B = 1 when either A=1 or B=1, but not both
A B A^B
0 0 0
0 1 1
1 0 1
1 1 0
Application of Boolean Algebra
• Applied to Digital Systems by Claude
Shannon
– 1937 MIT Master’s Thesis
– Reason about networks of relay switches
• Encode closed switch as 1, open switch as 0

Transistor
General Boolean Algebras
• Operate on Bit Vectors (e.g. an integer is a bit vector
of 4 bytes = 32 bits)
– Operations applied bitwise
  01101001      01101001      01101001
& 01010101    | 01010101    ^ 01010101    ~ 01010101
  --------      --------      --------      --------
  01000001      01111101      00111100      10101010
Bit-Level Operations in C
• Operations &, |, ~, ^ Available in C
– Apply to any “integral” data type
• long, int, short, char, unsigned
• Examples (Char data type)
– ~0x41 = 0xBE
• ~01000001_2 = 10111110_2
– ~0x00 = 0xFF
• ~00000000_2 = 11111111_2
– 0x69 & 0x55 = 0x41
• 01101001_2 & 01010101_2 = 01000001_2
– 0x69 | 0x55 = 0x7D
• 01101001_2 | 01010101_2 = 01111101_2
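The char examples above can be reproduced with a few one-line functions. This is a sketch using `uint8_t` as the 8-bit type; the function names are illustrative.

```c
#include <stdint.h>

/* Each function applies one bitwise operator to 8-bit operands.
   The cast back to uint8_t matters for ~: C promotes the operand to int
   first, so the result only equals 0xBE once it is narrowed to 8 bits. */
uint8_t bit_not(uint8_t a)            { return (uint8_t)~a; }
uint8_t bit_and(uint8_t a, uint8_t b) { return a & b; }
uint8_t bit_or(uint8_t a, uint8_t b)  { return a | b; }
uint8_t bit_xor(uint8_t a, uint8_t b) { return a ^ b; }
```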
Contrast: Logic Operations in C
• Contrast to Logical Operators
&&, ||, !
• View 0 as “False”
• Anything nonzero as “True”
• Always return 0 or 1
• Examples (char data type)
– !0x41 = 0x00
– !0x00 = 0x01
– !!0x41 = 0x01

– 0x69 && 0x55 = 0x01


– 0x69 || 0x55 = 0x01
– p && *p (avoids null pointer access)
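A short sketch of the contrast: the logical operators collapse any nonzero value to 1, and `&&` stops evaluating as soon as the result is known. The function names are assumptions for illustration.

```c
#include <stddef.h>

/* Logical operators yield only 0 or 1, unlike the bitwise ones. */
int logical_not(int a)        { return !a; }
int logical_and(int a, int b) { return a && b; }

/* Short-circuit evaluation: *p is read only when p is non-null. */
int points_to_nonzero(const int *p) { return p && *p; }
```

Compare `logical_and(0x69, 0x55)`, which is 1, with the bitwise `0x69 & 0x55`, which is 0x41.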
Boolean in C
• Did not exist in standard C89/90
• It was introduced in C99 standard
• You may need to use the following switch
with gcc:
gcc -std=c99 …
#include <stdbool.h>
bool x;
x = false; /* lower case */
x = true;
Shift Operations
• Left Shift: x << y
– Shift x left by y positions
• Throw away extra bits on left
• Fill with 0’s on right
• Right Shift: x >> y
– Shift x right by y positions
• Throw away extra bits on right
– Type 1: Logical shift
• Fill with 0’s on left
– Type 2: Arithmetic shift (covered later)
• Replicate most significant bit on left
• Undefined Behavior
– Shift amount < 0 or ≥ size of x

Argument x   01100010
<< 3         00010000
Log. >> 2    00011000
Arith. >> 2  00011000

Argument x   10100010
<< 3         00010000
Log. >> 2    00101000
Arith. >> 2  11101000
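The two example tables can be reproduced on 8-bit values with the sketch below. C does not guarantee which kind of right shift a negative signed value gets, so the arithmetic shift here is built by hand from a logical shift; the function names are illustrative.

```c
#include <stdint.h>

/* Left shift, keeping only the low 8 bits. */
uint8_t shl8(uint8_t x, unsigned k) { return (uint8_t)(x << k); }

/* Logical right shift: shifting an unsigned value fills with 0s. */
uint8_t shr_logical8(uint8_t x, unsigned k) { return (uint8_t)(x >> k); }

/* Arithmetic right shift written portably: shift logically, then set the
   k vacated high bits when the sign bit of x was 1. Assumes 0 <= k < 8. */
uint8_t shr_arith8(uint8_t x, unsigned k) {
    uint8_t fill = (x & 0x80) ? (uint8_t)(0xFF << (8 - k)) : 0;
    return (uint8_t)((x >> k) | fill);
}
```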
How to Represent Integers?
(unsigned and signed)
Two Types of Integers
• Unsigned
– positive numbers and 0
• Signed numbers
– negative numbers as well as positive
numbers and 0
Unsigned Integers
$B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$

Example: 10111011
Weights: 128 64 32 16 8 4 2 1
Value: 128 + 32 + 16 + 8 + 2 + 1 = 187
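The B2U sum can be evaluated directly from a string of bits. This is a minimal sketch; the function name `b2u` and the choice of a text string as input are assumptions for illustration.

```c
#include <stddef.h>

/* B2U for a bit string written most significant bit first, e.g. "10111011".
   Assumes the string holds only '0' and '1' and fits in an unsigned long. */
unsigned long b2u(const char *bits) {
    unsigned long value = 0;
    for (size_t i = 0; bits[i] != '\0'; i++)
        value = value * 2 + (unsigned long)(bits[i] - '0');
    return value;
}
```

Multiplying the running value by 2 at each step gives every bit its weight 2^i by the time the loop ends.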
Unsigned Integers
• An n-bit unsigned integer represents 2^n values: from 0 to 2^n - 1.

2^2  2^1  2^0  Value
0    0    0    0
0    0    1    1
0    1    0    2
0    1    1    3
1    0    0    4
1    0    1    5
1    1    0    6
1    1    1    7
Unsigned Binary Arithmetic
• Base-2 addition – just like base-10
– add from right to left, propagating carry
  10010      10010       1111
+  1001    +  1011    +     1
-------    -------    -------
  11011      11101      10000

  10111
+   111
-------
      ?
What About Negative Numbers?
People have tried several options:
Sign Magnitude: One's Complement Two's Complement
000 = +0 000 = +0 000 = +0
001 = +1 001 = +1 001 = +1
010 = +2 010 = +2 010 = +2
011 = +3 011 = +3 011 = +3
100 = -0 100 = -3 100 = -4
101 = -1 101 = -2 101 = -3
110 = -2 110 = -1 110 = -2
111 = -3 111 = -0 111 = -1

• Issues: balance, number of zeros, ease of operations


• Which one is best? Why?
Signed Integers
• With n bits, we have 2^n distinct values.
– assign about half to positive integers and
about half to negative
• Positive integers
– just like unsigned: zero in most significant
(MS) bit
00101 = 5
• Negative integers
– In two’s complement form
In general: a 0 at the MS bit indicates positive
and a 1 indicates negative.
Two’s Complement
• Two’s complement representation
developed to make circuits easy for
arithmetic.
– for each positive number (X), assign value to
its negative (-X),
such that X + (-X) = 0 with “normal” addition,
ignoring carry out

00101 (5) 01001 (9)


+ 11011 (-5) + 10111 (-9)
00000 (0) 00000 (0)
Two’s Complement Signed Integers
• MS bit is sign bit.
• Range of an n-bit number: -2^(n-1) through 2^(n-1) – 1.
– The most negative number (-2^(n-1)) has no positive counterpart.

-2^3 2^2 2^1 2^0  Value      -2^3 2^2 2^1 2^0  Value
 0    0   0   0    0           1    0   0   0   -8
 0    0   0   1    1           1    0   0   1   -7
 0    0   1   0    2           1    0   1   0   -6
 0    0   1   1    3           1    0   1   1   -5
 0    1   0   0    4           1    1   0   0   -4
 0    1   0   1    5           1    1   0   1   -3
 0    1   1   0    6           1    1   1   0   -2
 0    1   1   1    7           1    1   1   1   -1
Converting Binary (2’s C) to Decimal
1. If MS bit is one (i.e. number is negative), take two’s complement to get a positive number.
2. Get the decimal as if the number is unsigned (using powers of 2).
3. If original number was negative, add a minus sign.

n    2^n
0    1
1    2
2    4
3    8
4    16
5    32
6    64
7    128
8    256
9    512
10   1024
Examples
X = 00100111_two
= 2^5 + 2^2 + 2^1 + 2^0 = 32 + 4 + 2 + 1
= 39_ten

X = 11100110_two
-X = 00011010_two
= 2^4 + 2^3 + 2^1 = 16 + 8 + 2
= 26_ten
X = -26_ten
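The three conversion steps can be sketched for 8-bit patterns as follows. The function name `twos_to_decimal` is illustrative.

```c
#include <stdint.h>

/* Interpret an 8-bit pattern as two's complement using the three steps:
   take the two's complement if the MS bit is 1, read the result as
   unsigned, then restore the sign. */
int twos_to_decimal(uint8_t x) {
    int negative = (x & 0x80) != 0;                        /* step 1: test MS bit */
    uint8_t magnitude = negative ? (uint8_t)(~x + 1) : x;  /* step 2: unsigned value */
    return negative ? -(int)magnitude : (int)magnitude;    /* step 3: add the sign */
}
```

The two examples above are 0x27 (00100111) and 0xE6 (11100110).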
Numeric Ranges
Example: Assume 16-bit numbers
              Decimal  Hex    Binary
Unsigned Max  65535    FF FF  11111111 11111111
Signed Max    32767    7F FF  01111111 11111111
Signed Min    -32768   80 00  10000000 00000000
-1            -1       FF FF  11111111 11111111
0             0        00 00  00000000 00000000
Values for Different Sizes
              W = 8  W = 16   W = 32          W = 64
Unsigned Max  255    65,535   4,294,967,295   18,446,744,073,709,551,615
Signed Max    127    32,767   2,147,483,647   9,223,372,036,854,775,807
Signed Min    -128   -32,768  -2,147,483,648  -9,223,372,036,854,775,808

• C Programming
– #include <limits.h>
– Declares constants, e.g.,
• INT_MAX
• LONG_MAX
• INT_MIN
• UINT_MAX
• …
What happens if you change the
type of a variable
(aka type casting)?
Signed vs. Unsigned in C
• Constants
– By default, signed integers
– Unsigned with “U” as suffix
0U, 4294967259U
• Casting
– Explicit casting between signed & unsigned
int tx, ty;
unsigned ux, uy;
tx = (int) ux;
uy = (unsigned) ty;

– Implicit casting also occurs via assignments and procedure calls
tx = ux;
uy = ty;
General Rule for Casting:
signed <-> unsigned
Follow these two steps:
1. Keep the bit representation
2. Re-interpret

Effect:
• Numerical value may change.
• Bit pattern stays the same.
Mapping Signed ↔ Unsigned
Bits  Signed  Unsigned
0000  0       0
0001  1       1
0010  2       2
0011  3       3
0100  4       4
0101  5       5
0110  6       6
0111  7       7
1000  -8      8
1001  -7      9
1010  -6      10
1011  -5      11
1100  -4      12
1101  -3      13
1110  -2      14
1111  -1      15

For bit patterns 0000 to 0111 the two values are equal (=); for 1000 to 1111 they differ by 16 (+/- 16).
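The "keep the bits, re-interpret" rule can be sketched with 8-bit types, where the offset is 256 rather than the 16 of the 4-bit table. The helper names are illustrative, and `memcpy` is used so that the bit pattern is copied unchanged.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the same 8 bits under the other signedness.
   memcpy copies the bit pattern without converting the value. */
uint8_t as_unsigned8(int8_t s) { uint8_t u; memcpy(&u, &s, 1); return u; }
int8_t  as_signed8(uint8_t u)  { int8_t s;  memcpy(&s, &u, 1); return s; }
```

For example, -5 becomes 251 (that is, -5 + 256), while 5 stays 5.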
Casting Surprises
• Expression Evaluation
– If there is a mix of unsigned and signed in a single expression, signed values are implicitly cast to unsigned
– Including comparison operations <, >, ==, <=, >=

If there is an expression
that has many types, the
compiler follows these rules.
Example
#include <stdio.h>

int main(void) {
    int i = -7;
    unsigned j = 5;

    /* Condition is TRUE: i is converted to unsigned before comparing. */
    if (i > j)
        printf("Surprise!\n");

    return 0;
}
Expanding & Truncating a variable
Expanding
• Convert w-bit integer to w+k-bit integer with same value
• Convert unsigned: pad k 0 bits in front
• Convert signed: make k copies of sign bit

[Figure: a w-bit value X extended to w+k bits; the k new high-order bits are copies of the sign bit of X.]
Sign Extension Example
short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;

Decimal Hex Binary


x 15213 3B 6D 00111011 01101101
ix 15213 00 00 3B 6D 00000000 00000000 00111011 01101101
y -15213 C4 93 11000100 10010011
iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011

• Converting from smaller to larger integer data type
• C automatically performs sign extension
Truncating
• Example: from int to short (i.e. from
32-bit to 16-bit)
• High-order bits are truncated
• Value is altered → must reinterpret
• This non-intuitive behavior can lead to buggy code! → So don’t do it!
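Both directions can be sketched with fixed-width types: widening preserves the value, while truncation keeps only the low 16 bits and the result must be re-interpreted. The function names `widen16` and `truncate16` are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Widening a signed value copies the sign bit into the new high bits. */
int32_t widen16(int16_t x) { return (int32_t)x; }

/* Truncation keeps only the low 16 bits; the result is then re-interpreted
   as signed. memcpy does the re-interpretation so the sketch does not rely
   on implementation-defined narrowing conversions. */
int16_t truncate16(int32_t x) {
    uint16_t low = (uint16_t)((uint32_t)x & 0xFFFFu);
    int16_t result;
    memcpy(&result, &low, sizeof result);
    return result;
}
```

Widening -15213 gives the bit pattern FF FF C4 93 from the table, while truncating 53191 (0x0000CFC7) gives -12345, a different value.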
Addition, negation, multiplication,
and shifting
Negation: Complement & Increment
• The complement of x satisfies
Two’sComp(x) + x = 0
Two’sComp(x) = ~x + 1

• Proof sketch
– Observation: ~x + x == 1111…111 == -1
→ ~x + x + 1 = 0
→ (~x + 1) + x = 0
→ Two’sComp(x) + x = 0

   x    1 0 0 1 1 1 0 1
+ ~x    0 1 1 0 0 0 1 0
-----------------------
  -1    1 1 1 1 1 1 1 1
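The identity can be checked with a one-line function. This sketch assumes a two's complement machine and the illustrative name `negate`.

```c
/* Negate by complement and increment. Assumes two's complement and
   x != INT_MIN, whose negation does not fit in an int. */
int negate(int x) { return ~x + 1; }
```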
Unsigned Addition
Operands (w bits): u, v
True Sum (w+1 bits): u + v
Discard Carry (w bits): UAdd_w(u, v)

Two’s Complement Addition
Operands (w bits): u, v
True Sum (w+1 bits): u + v
Discard Carry (w bits): TAdd_w(u, v)

– If sum ≥ 2^(w–1), becomes negative (positive overflow)
– If sum < –2^(w–1), becomes positive (negative overflow)
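Overflow is easiest to see with w = 8. The sketch below implements both additions on 8-bit values; the names `uadd8` and `tadd8` are illustrative stand-ins for UAdd_w and TAdd_w.

```c
#include <stdint.h>
#include <string.h>

/* UAdd_8: keep the low 8 bits of the true sum (the carry is discarded). */
uint8_t uadd8(uint8_t u, uint8_t v) { return (uint8_t)(u + v); }

/* TAdd_8: the same bit-level addition, with the result re-interpreted
   as a signed 8-bit value. */
int8_t tadd8(int8_t u, int8_t v) {
    uint8_t bits = (uint8_t)((uint8_t)u + (uint8_t)v);
    int8_t result;
    memcpy(&result, &bits, 1);
    return result;
}
```

With 8 bits, 100 + 100 exceeds 127 and comes out as -56 (positive overflow), while -100 + -100 comes out as 56 (negative overflow).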
Multiplication
• Exact Product of w-bit numbers x, y
– Either signed or unsigned
• Ranges
– Unsigned: 0 ≤ x * y ≤ (2^w – 1)^2 = 2^(2w) – 2^(w+1) + 1
– Two’s complement min: x * y ≥ (–2^(w–1)) * (2^(w–1) – 1) = –2^(2w–2) + 2^(w–1)
– Two’s complement max: x * y ≤ (–2^(w–1))^2 = 2^(2w–2)
Power-of-2 Multiply with Shift
• Operation
– u << k gives u * 2^k
– Both signed and unsigned

• Examples
– u << 3 == u * 8
– (u << 5) – (u << 3) == u * 24
– Most machines shift and add faster than
multiply
• Compiler generates this code automatically
Compiled Multiplication Code
C Function
int mul12(int x)
{
return x*12;
}

Compiled Arithmetic Operations    Explanation
leal (%eax,%eax,2), %eax          t = x+x*2
sall $2, %eax                     return t << 2;

• C compiler automatically generates shift/add code when multiplying by constant
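The same two steps can be written out by hand in C. This is a sketch that uses unsigned operands so the shifts stay well defined (the slide's `mul12` takes an int); the names `mul12_shift` and `mul24_shift` are illustrative.

```c
/* x * 12 as the compiler's two steps: t = x + x*2, then t << 2. */
unsigned mul12_shift(unsigned x) {
    unsigned t = x + (x << 1);
    return t << 2;
}

/* u * 24 as a difference of two shifts: 32u - 8u. */
unsigned mul24_shift(unsigned u) { return (u << 5) - (u << 3); }
```

In practice there is rarely a reason to write this by hand, since the compiler already does it when multiplying by a constant.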
Unsigned Power-of-2
Divide with Shift
• Quotient of Unsigned by Power of 2
– u >> k gives ⌊u / 2^k⌋

Examples:

        Division    Computed  Hex    Binary
x       15213       15213     3B 6D  00111011 01101101
x >> 1  7606.5      7606      1D B6  00011101 10110110
x >> 4  950.8125    950       03 B6  00000011 10110110
x >> 8  59.4257813  59        00 3B  00000000 00111011
Compiled Unsigned Division Code
C Function
unsigned udiv8(unsigned x)
{
return x/8;
}

Compiled Arithmetic Operations Explanation


shrl $3, %eax # Logical shift
return x >> 3;

• Uses logical shift for unsigned


• For Java Users
– Logical shift written as >>>
Signed Power-of-2 Divide with Shift
• Quotient of Signed by Power of 2
– x >> k gives ⌊x / 2^k⌋
– Uses arithmetic shift

Examples:

        Division     Computed  Hex    Binary
y       -15213       -15213    C4 93  11000100 10010011
y >> 1  -7606.5      -7607     E2 49  11100010 01001001
y >> 4  -950.8125    -951      FC 49  11111100 01001001
y >> 8  -59.4257813  -60       FF C4  11111111 11000100
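The "Computed" columns of both division tables can be reproduced as below. In C, right-shifting a negative signed value is implementation-defined, so this sketch computes the floor for signed values with ordinary division instead of relying on `>>`; the function names are illustrative.

```c
#include <stdint.h>

/* Unsigned: a logical right shift by k equals floor(u / 2^k). */
uint16_t udiv_pow2(uint16_t u, unsigned k) { return (uint16_t)(u >> k); }

/* Signed: floor(x / 2^k), the value an arithmetic shift produces.
   C's / truncates toward zero, so subtract 1 when x is negative and
   the division is inexact. Assumes 0 <= k < 15. */
int16_t sdiv_pow2_floor(int16_t x, unsigned k) {
    int d = 1 << k;
    int q = x / d;
    if (x % d != 0 && x < 0)
        q -= 1;
    return (int16_t)q;
}
```

The difference from plain `x / d` shows up only for negative inexact results: -15213 / 2 is -7606 in C, while the arithmetic shift gives -7607.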
Floating Point
Some slides and information about FP are adapted from
Prof. Michael Overton's book:
Numerical Computing with IEEE Floating Point Arithmetic
Turing Award 1989 to William Kahan for design of the
IEEE Floating Point Standards 754 (binary) and 854
(decimal)
Carnegie Mellon

Background: Fractional binary numbers
• What is 1011.101_2?
Background: Fractional Binary Numbers

Bits:     b_i  b_(i-1)  •••  b_2  b_1  b_0 . b_(-1)  b_(-2)  b_(-3)  •••  b_(-j)
Weights:  2^i  2^(i-1)  •••  4    2    1     1/2     1/4     1/8     •••  2^(-j)

• Value: $\sum_{k=-j}^{i} b_k \cdot 2^k$
Fractional Binary Numbers: Examples
Value   Representation
5 3/4   101.11_2
2 7/8   10.111_2
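These values can be checked by evaluating the weighted sum over a bit string. This is a minimal sketch; the function name `frac_bin` and the text-string input are assumptions for illustration.

```c
/* Evaluate a fractional binary string such as "101.11".
   Assumes only '0', '1' and at most one '.' appear. */
double frac_bin(const char *s) {
    double value = 0.0, weight = 0.5;
    int after_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') { after_point = 1; continue; }
        int bit = *s - '0';
        if (!after_point) {
            value = value * 2 + bit;   /* integer part: shift in one bit */
        } else {
            value += bit * weight;     /* fraction: weights 1/2, 1/4, 1/8, ... */
            weight /= 2;
        }
    }
    return value;
}
```

Under this sketch, 1011.101_2 evaluates to 8 + 2 + 1 + 1/2 + 1/8 = 11.625.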
Why not fractional binary numbers?
• Not efficient
– 3 * 2^100 → 11 followed by 100 zeros
• Given a finite length (e.g. 32 bits), cannot represent very large nor very small numbers (ε → 0)
IEEE Floating Point
• IEEE Standard 754
– Supported by all major CPUs
– The IEEE standards committee consisted mostly
of hardware people, plus a few academics led by
W. Kahan at Berkeley.

• Main goals:
– Consistent representation of floating point numbers by all machines.
– Correctly rounded floating point operations.
– Consistent treatment of exceptional situations
such as division by zero.
Floating Point Representation
• Numerical Form:
(–1)^s M 2^E
– Sign bit s determines whether number is negative
or positive
– Significand M a fractional value
– Exponent E weights value by power of two

• Encoding
– MSB s is sign bit s
– exp field encodes E
– frac field encodes M

s exp frac

Precisions
• Single precision: 32 bits
s exp frac
1 8-bits 23-bits

• Double precision: 64 bits


s exp frac
1 11-bits 52-bits

• Extended precision: 80 bits (Intel only)


s exp frac
1 15-bits 63 or 64-bits
Based on exp we have 3 encoding schemes
• exp ≠ 000…0 and exp ≠ 111…1 → normalized encoding
• exp = 000…0 → denormalized encoding
• exp = 111…1 → special value encoding
– frac = 000…0
– frac = something else
1. Normalized Encoding
• Condition: exp ≠ 000…0 and exp ≠ 111…1
• Exponent is: E = exp – (2^(k-1) – 1), where k is the # of exponent bits and 2^(k-1) – 1 is referred to as the Bias
– Single precision: E = exp – 127, Range(E) = [-126, 127]
– Double precision: E = exp – 1023, Range(E) = [-1022, 1023]
• Significand is: M = 1.xxx…x_2, where xxx…x are the bits of frac
– Range(M) = [1.0, 2.0-ε)
– Get extra leading bit for free
Normalized Encoding Example
• Value: float F = 15213.0;
15213_10 = 11101101101101_2
= 1.1101101101101_2 x 2^13

• Significand
M = 1.1101101101101_2
frac = 11011011011010000000000_2

• Exponent
E = exp – Bias = exp - 127 = 13
→ exp = 140 = 10001100_2

• Result:

0 10001100 11011011011010000000000
s exp frac
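The three fields of this result can be read back from a real float. This is a sketch that assumes `float` is the IEEE 754 single-precision format, which holds on mainstream platforms; the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bits of a float into a 32-bit integer without converting
   the value. Assumes float is IEEE 754 single precision. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

unsigned sign_field(float f) { return float_bits(f) >> 31; }           /* 1 bit   */
unsigned exp_field(float f)  { return (float_bits(f) >> 23) & 0xFF; }  /* 8 bits  */
uint32_t frac_field(float f) { return float_bits(f) & 0x7FFFFF; }      /* 23 bits */
```

For 15213.0f the fields come out as s = 0, exp = 140, and frac = 0x6DB400, which is the 23-bit pattern shown above.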
2. Denormalized Encoding
(called subnormal in revised standard 854)
• Condition: exp = 000…0
• Exponent value: E = 1 – Bias (instead of E = 0 – Bias)
• Significand is: M = 0.xxx…x_2 (instead of M = 1.xxx…x_2), where xxx…x are the bits of frac
• Cases
– exp = 000…0, frac = 000…0
• Represents zero
• Note distinct values: +0 and –0
– exp = 000…0, frac ≠ 000…0
• Numbers very close to 0.0
3. Special Values Encoding
• Condition: exp = 111…1
• Case: exp = 111…1, frac = 000…0
– Represents value ∞ (infinity)
– Operation that overflows
– E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
• Case: exp = 111…1, frac ≠ 000…0
– Not-a-Number (NaN)
– Represents case when no numeric value can be determined
– E.g., sqrt(–1), ∞ − ∞, ∞ × 0
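The special values can be produced from ordinary arithmetic and recognized with the `isinf` and `isnan` macros from `<math.h>`. This sketch assumes the platform follows IEEE 754 for `double`; the function names are illustrative.

```c
#include <math.h>

/* Build the special values from ordinary arithmetic. `volatile` keeps the
   compiler from folding the expressions at compile time. */
double pos_inf(void)        { volatile double zero = 0.0; return  1.0 / zero; }
double neg_inf(void)        { volatile double zero = 0.0; return -1.0 / zero; }
double inf_minus_inf(void)  { volatile double inf = INFINITY; return inf - inf; }
double inf_times_zero(void) { volatile double inf = INFINITY; return inf * 0.0; }
```

A NaN compares unequal to everything, including itself, so `isnan` is the reliable way to detect it.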
Visualization: Floating Point Encodings
NaN | −∞ | −Normalized | −Denorm | −0 +0 | +Denorm | +Normalized | +∞ | NaN

Floating Point in C
• C:
–float single precision
–double double precision
• Conversions/Casting
–Casting between int, float, and double
changes bit representation, examples:
– double/float → int
• Truncates fractional part
• Not defined when out of range or NaN
– int → double
• Exact conversion
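A short sketch of the two conversions, assuming a 32-bit `int` and illustrative function names:

```c
/* double -> int drops the fractional part (rounds toward zero).
   Only call this with values that fit in an int; out-of-range values
   and NaN have no defined result. */
int to_int(double d) { return (int)d; }

/* int -> double is exact when int is 32 bits, since a double
   carries 53 bits of precision. */
double to_double(int i) { return (double)i; }
```

Note that truncation goes toward zero for both signs: 3.99 becomes 3 and -3.99 becomes -3.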
Conclusions
• Everything is stored in memory as 1s
and 0s
• The binary representation by itself does not carry a meaning; it depends on the interpretation.
• IEEE Floating Point has clear mathematical properties
– Represents numbers as: (-1)^s x M x 2^E
