2 Data - Information - Storage v2
Representing Numbers: Decimals
• One of the first things we’d like to do is represent numbers in binary
• But first a quick review of decimal numbers to help us understand binary
numbers
• Decimal numbers are known as a base 10 system
• Numbers are represented by the ten symbols: 0-9
• A decimal number has a one’s place (10⁰), a ten’s place (10¹), a hundred’s place (10²), a thousand’s place (10³), and so on
• Example: decimal number 224 = 2×10² + 2×10¹ + 4×10⁰ = 200 + 20 + 4
Binary (base 2) Number System
• Binary is base 2
• Numbers are represented by the symbols 0 and 1
• A binary number has a one’s place (2⁰), a two’s place (2¹), a four’s place (2²), an eight’s place (2³), and so on
• Used to represent unsigned integers
• Example: 0101₂ = 0×2³ + 1×2² + 0×2¹ + 1×2⁰ = 5₁₀
Reading Binary
• With practice, you will soon become comfortable reading 8-bit binary numbers (values up to 255)
• All you need to do is add combinations of:
• 1, 2, 4, 8, 16, 32, 64, and 128
Converting from decimal to binary
• Example: 42₁₀
• 32 is the largest power of two <= 42, so we have a one in the 32’s column; subtract 32 from 42, leaving 10
• 16 > 10, so we have a zero in the 16’s column
• 8 <= 10, so we have a one in the 8’s column; subtract 8 from 10, leaving 2
• 4 > 2, so we have a zero in the 4’s column
• 2 <= 2, so we have a one in the 2’s column; subtract 2 from 2, leaving 0
• 1 > 0, so we have a zero in the 1’s column
• Putting it all together: 42₁₀ = 101010₂
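Hexadecimal (base 16) Number System
Decimal   Binary   Hex
0         0000     0
1         0001     1
2         0010     2
3         0011     3
4         0100     4
5         0101     5
6         0110     6
7         0111     7
8         1000     8
9         1001     9
10        1010     A
11        1011     B
12        1100     C
13        1101     D
14        1110     E
15        1111     F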
Hexadecimal Conversion
• Starting from the right-hand side, group the bits in sets of four
• Convert each set of 4 bits to its hex digit
• Example:
  10001101010101000101
  1000 1101 0101 0100 0101   -- Group into sets of 4 bits
  8    D    5    4    5      -- Convert each set into a hex digit
  0x8D545
Data Sizes in C (Typical)
• The sizes of the basic data types can vary based on compiler and machine
settings
• These are typical values on an x86-64 system
Storing Multi-Byte Data
• Memory is byte addressable
• Every byte of memory has an address
• Think of memory as a large array with the address as the index in the array
• For multi-byte data
• The address specifies the starting byte location of the data in memory
• The rest of the data is in the increasing memory addresses that follow
• Example
• The integer at address 4 (0100) contains the four bytes: 31 76 D9 5C

Address   Byte   Integer
0000      94     ADDR = 0000
0001      53
0010      7F
0011      EA
0100      31     ADDR = 0100
0101      76
0110      D9
0111      5C
1000      AB     ADDR = 1000
1001      83
1010      75
1011      BB
1100      28     ADDR = 1100
1101      39
1110      4E
1111      05
Byte Ordering
• There are two conventions for the layout of multi-byte objects
• Big Endian and Little Endian
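• Example
• 4-byte integer 0x01234567 is located at memory address 0x100
• This value occupies memory locations 0x100, 0x101, 0x102, 0x103
• Big endian: 0x100: 01, 0x101: 23, 0x102: 45, 0x103: 67 (most significant byte first)
• Little endian: 0x100: 67, 0x101: 45, 0x102: 23, 0x103: 01 (least significant byte first)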
Byte Ordering Takeaways
• x86-64 uses little endian
• On a little endian machine, when reading data from left to right in increasing memory order:
• The bytes will be in reverse order from how the number is written
• Within each byte, bits are always written from MSB (most significant bit) to LSB (least significant bit) in both endian conventions; only the order of the bytes changes
• Little endian systems always store the least significant byte (the "little end") at the smallest address
• Big endian systems always store the least significant byte at the largest address (the most significant byte comes first)
Representing Strings
• There is no native type for strings in C
• Instead, strings are represented as an array of characters (char)
• Terminated by the null character (‘\0’, the literal value 0)
• Characters
• 1 byte, represented by an ASCII (American Standard Code for Information Interchange) encoding
• Arrays
• Data of the same type ordered in contiguous memory
• We’ll talk more about arrays later

Char   Byte   Address
       94     0000
       53     0001
       7F     0010
       EA     0011
‘H’    48     0100
‘e’    65     0101
‘l’    6C     0110
‘l’    6C     0111
‘o’    6F     1000
\0     00     1001
       75     1010
Representing Code
• Binary programs have a well-defined format
• Linux uses the ELF binary format
• Programs loaded into memory are represented as a stream of bytes
written in machine language
00000000000006ca <main>:
6ca: 55 push %rbp
6cb: 48 89 e5 mov %rsp,%rbp
6ce: 48 83 ec 20 sub $0x20,%rsp
6d2: 89 7d ec mov %edi,-0x14(%rbp)
6d5: 48 89 75 e0 mov %rsi,-0x20(%rbp)
6d9: 83 7d ec 02 cmpl $0x2,-0x14(%rbp)
6dd: 74 13 je 6f2 <main+0x28>
Boolean Algebra
• Developed by George Boole in the 19th century
• Algebraic representation of logic
• Encode “True” as 1 and “False” as 0
AND            OR             XOR (Exclusive OR)    NOT
A B  A&B       A B  A|B       A B  A^B              A  ~A
0 0   0        0 0   0        0 0   0               0   1
0 1   0        0 1   1        0 1   1               1   0
1 0   0        1 0   1        1 0   1
1 1   1        1 1   1        1 1   0
Boolean Properties
• Where A, B, and C are Boolean values (either 0 or 1)
• Commutative Law:
• A | B == B | A
• A & B == B & A
• Associative Law:
• A | (B | C) == (A | B) | C
• A & (B & C) == (A & B) & C
• Distributive Law:
• A & (B | C) == (A & B) | (A & C)
• A | (B & C) == (A | B) & (A | C)
• De Morgan’s Theorem:
• ~(A & B) == ~A | ~B
• ~(A | B) == ~A & ~B
Bit Vectors and Bit-Level Operations in C
• A bit vector is a string of zeros and ones of some fixed length w
• C supports bitwise Boolean operations on bit vectors
• AND (&), OR (|), NOT (~), and XOR (^)
• Can be used on any integral data type (char, short, int, long)
• View arguments as bit vectors
• Operation is applied bit-wise
• Example:
• char a = 0x69; char b = 0x55;
• a & b; // evaluates to 0x41
• a | b; // evaluates to 0x7D
• a ^ b; // evaluates to 0x3C
Shift Operations in C
• Left Shift: x << y
• Shift bit-vector x left y positions
• Throw away extra bits on left
• Fill with 0’s on right
• Right Shift: x >> y
• Shift bit-vector x right y positions
• Throw away extra bits on right
• Logical shift
• Fill with 0’s on left
• Arithmetic shift
• Replicate most significant bit on left

Argument x    01100010    10100010
<< 3          00010000    00010000
Log. >> 2     00011000    00101000
Arith. >> 2   00011000    11101000
Masking Operations
• Goal: modify some bits of a bit vector while leaving the other bits unchanged
• Apply a mask to a value to change only certain bits
• Example: value 0xAA (10101010), mask 0x0F (00001111)
• Setting bits to 1 (OR operator)
• 0xAA | 0x0F // evaluates to 0xAF
• Wherever the mask is 1, that bit will be set to 1
• Wherever the mask is 0, the bit will be unchanged
• Clearing bits to 0 (AND operator)
• 0xAA & 0x0F // evaluates to 0x0A
• Wherever the mask is 0, that bit will be cleared to 0
• Wherever the mask is 1, that bit will remain the same
• Flipping bits (XOR operator)
• 0xAA ^ 0x0F // evaluates to 0xA5
• Wherever the mask is 1, that bit will be flipped
• Wherever the mask is 0, that bit will remain the same
Extracting Bits
• Goal: get the value for some subset of bits from a bit vector
• Can extract any contiguous group of bits by varying the shift amount and the mask
Logic Operations in C
• Traditionally no Boolean type in C (C99 added _Bool, and <stdbool.h> provides bool)
• View 0 as “False”
• Anything nonzero as “True”
• Logical operators always return 0 (False) or 1 (True)
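• Early termination (short-circuit evaluation): the second operand is not evaluated if the first already determines the result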
• Logical operators
• &&, ||, ! (logical AND, OR, NOT)
• Examples (char data type)
• !0x41 -> 0x00
• !0x00 -> 0x01
• !!0x41 -> 0x01
• 0x69 && 0x55 -> 0x01
• 0x69 || 0x55 -> 0x01