
CP264 Final Review

Queue
 Abstract Queue: Linearly ordered collection of data elements with enqueue, dequeue, and peek operations
o Enqueue operation inserts element into collection
 Linear order of elements is determined by the time they're inserted
 Front element is earliest inserted element; Rear is last inserted element
o Dequeue operation removes/deletes front element
o Peek operation gets front element
 Queue Data Structure: Implementation of abstract queue
 Queue Characteristic → First-In-First-Out (FIFO): Deletes first inserted element
 Time and space complexities for queue operations → time O(1), space O(1)

Array Queue
 Simple Array Queue: Queue implementation with array representation, where front and rear
variables represent the index positions where deletions and insertions are done, respectively
Drawbacks:
o Length of queue is bound by length of its array
o Wastes space if length of queue is shorter than length of array
Linked List Queue
 Fixes drawbacks of array queues → length is dynamic + doesn’t waste space
 Linked List Queue: Uses singly linked list to store queue data values; uses two pointers front and
rear to represent front and rear positions
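The enqueue/dequeue logic above can be sketched in C as follows (a minimal illustration; struct and function names are mine, not from the course code):

```c
#include <stdlib.h>

typedef struct qnode {
    int data;
    struct qnode *next;
} QNODE;

typedef struct queue {
    QNODE *front;  /* deletions happen here */
    QNODE *rear;   /* insertions happen here */
} QUEUE;

/* Insert at the rear: O(1) time, O(1) extra space. */
void enqueue(QUEUE *q, int value) {
    QNODE *node = (QNODE *)malloc(sizeof(QNODE));
    node->data = value;
    node->next = NULL;
    if (q->rear == NULL) {          /* empty queue */
        q->front = q->rear = node;
    } else {
        q->rear->next = node;
        q->rear = node;
    }
}

/* Remove from the front: O(1). Returns 0 on an empty queue. */
int dequeue(QUEUE *q, int *out) {
    if (q->front == NULL) return 0;
    QNODE *node = q->front;
    *out = node->data;
    q->front = node->next;
    if (q->front == NULL) q->rear = NULL;   /* queue became empty */
    free(node);
    return 1;
}
```

Because both pointers are maintained, neither operation traverses the list, which gives the O(1)/O(1) complexities stated above.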

Deques (Double-ended Queue)


 Deque (Double-ended Queue): Queue where elements can be inserted/deleted at either end

Priority Queue
 Priority Queue: Queue where each element is assigned a priority; priority of element used to
determine order in which elements are processed
 Rule of processing elements of priority queue:
1. Element with higher priority is processed before element with lower priority
2. Two elements of same priority processed by first come first serve order
Applications of Queues
Whenever an algorithm needs to remember data items and FIFO order is required for item processing, a
queue data structure can be an option
1. Used to store intermediate data within algorithm for FIFO retrieval (e.g. Breadth-first search)
2. Used as waiting lists for single shared resource (e.g. printer, disk, CPU, network device)
3. Used in OS for handling interrupts; If interrupts must be handled in order of arrival, FIFO queue
is appropriate data structure
4. Used to transfer data asynchronously (e.g. Pipes, file I/O, sockets)
5. Used in simulations for services
6. Used for play lists, adding songs to the end, playing from the front of the list.
7. Priority queues are used in OS for process execution management.
8. Used to remember the path of exploration if FIFO retrieval is needed.

Stack
 Abstract Stack: Linearly ordered collection of data elements with push, pop, and peek operations
 Stack Characteristic → Last-In-First-Out (LIFO): Pops the most recently pushed element
 Stack Data Structure: Implementation of abstract stack
 Stack: Linear data structure with push and pop operations satisfying LIFO property
 Why is only one variable needed for accessing a stack data structure?
1. Stack operations only work on one side of the stack list, so one variable does the job.
2. It’s more time efficient to maintain one variable than several.
3. It’s more space efficient to use one variable than several.
 Time and space complexities for stack operations -> time O(1), space O(1)

Array Stack
 Array Stack: Stack representation by an array, top variable represents index position where
push, pop, and peek operations are done
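A minimal C sketch of such an array stack (names and the fixed capacity are illustrative choices, not from the course code):

```c
#define STACK_MAX 100

typedef struct {
    int items[STACK_MAX];
    int top;              /* index of the top element, -1 when empty */
} STACK;

void stack_init(STACK *s) { s->top = -1; }

/* Push onto the top: O(1). Returns 0 if the array is full. */
int push(STACK *s, int value) {
    if (s->top >= STACK_MAX - 1) return 0;
    s->items[++s->top] = value;
    return 1;
}

/* Pop the most recently pushed element (LIFO): O(1).
   Returns 0 if the stack is empty. */
int pop(STACK *s, int *out) {
    if (s->top < 0) return 0;
    *out = s->items[s->top--];
    return 1;
}
```

The single `top` variable is the one access variable discussed above; the maximum-size limitation is the con listed in the comparison table.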
Linked List Stack
 Linked List Stack: Uses singly linked list to implement stack; pointer top points to first node
 Array vs. Linked List Stack

Array Stack
Pros:
 Easy to implement
 More time efficient in operation
Cons:
 Limitation on maximum size
 Less space efficient when stack size is less than array size

Linked List Stack
Pros:
 No size constraint
 More space efficient with dynamic memory allocation
Cons:
 Uses more time on operations
 When is it better to use an array stack or a linked list stack?
o Applying stacks in an application, if the maximum size of a stack is known and not big,
then an array stack is a better choice, otherwise a linked list stack is preferred.
Applications of Stacks
1. Backtracking
2. Reversing a list
3. Parentheses checking
4. Infix/Prefix/Postfix expressions and evaluation
5. Conversion from infix to postfix expressions
6. Function call stack and recursive functions
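Application 3 (parentheses checking) can be sketched with a simple array stack of opening brackets (illustrative code; the function name and the 256-slot limit are my choices):

```c
/* Returns 1 if every bracket in expr is properly matched and nested,
   0 otherwise. Opening brackets are pushed; each closing bracket
   must match the most recently pushed opening bracket (LIFO). */
int parens_balanced(const char *expr) {
    char stack[256];
    int top = -1;
    for (const char *p = expr; *p != '\0'; p++) {
        char c = *p;
        if (c == '(' || c == '[' || c == '{') {
            if (top == 255) return 0;   /* stack overflow: treat as failure */
            stack[++top] = c;
        } else if (c == ')' || c == ']' || c == '}') {
            if (top < 0) return 0;      /* closing with nothing open */
            char open = stack[top--];
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{')) return 0;
        }
    }
    return top == -1;                   /* nothing left unclosed */
}
```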

Call Stack and Recursive Calls
 Call Stack: Array stack that the runtime uses to manage function calls
 Stack Region: Array holding call stack
 When function is called → function data including arguments, local variables, registers, and
return addresses pushed onto call stack
 When function call is done → function data popped off
 Recursion: Method of solving problem by recursive function; has two major cases:
o Base Case - where problem is simple enough to solve directly without making further
calls to the same function
o Recursive Case -
 Problem is divided into smaller sub-parts
 Function calls itself but with sub-parts of problem obtained in first step
 Result obtained by combining solution of sub-parts
 Recursion can be classified based on:
o Direct vs. Indirect Recursion: Whether function calls itself directly or indirectly
 Direct Recursion: Function explicitly calls itself
 Example: Recursive factorial is a directly recursive function
 Indirect Recursion: Contains call to another function that ultimately calls the
function
o Tail-Recursive vs. Not: Whether any operation is pending after each recursive call
 Tail Recursion: No operations are pending to be performed when recursive
function returns to its caller, i.e. When called function returns, returned value is
immediately returned from calling function
 Tail-recursive functions preferred because they can be optimized to an iterative
type function
o Linear vs. Tree-Recursive: Structure of calling pattern
 Linear Recursion: Pending operation involves one recursive call to the function,
i.e. Recursion tree is a path
 Tree Recursion: Pending Operation makes more than one recursive call to
function
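The tail vs. non-tail distinction can be seen in two factorial sketches (illustrative code, not from the course materials):

```c
/* Linear (non-tail) recursion: the multiplication is pending
   after each recursive call returns. */
long fact(int n) {
    if (n <= 1) return 1;        /* base case */
    return n * fact(n - 1);      /* pending operation: n * ... */
}

/* Tail recursion: an accumulator carries the partial result, so
   nothing is pending when the recursive call returns; a compiler
   can optimize this into an iterative loop. */
long fact_tail(int n, long acc) {
    if (n <= 1) return acc;
    return fact_tail(n - 1, n * acc);
}
```

Call it as `fact_tail(n, 1)`; the extra accumulator parameter is what makes the last action a bare recursive call.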
 Why is stack used to manage function calls?
o After a function call, it needs to return to the call position of the calling function and all
local data of the calling function should be available
 The running time of a recursion function call is proportional to?
o The number of its recursion function calls
 The space usage of a recursion function call is proportional to?
o The depth of the recursive function calls
 Why should recursive functions be avoided in practical programming?
o Not time efficient due to the number of function calls.
o Memory usage is proportional to the depth of the recursive calls.

Linked List
Singly Linked List
 Singly Linked Lists: Linear collection of nodes with the following properties
o Node contains data and one pointer pointing to next node, last node points to NULL
o Accessed by pointer pointing to first node of linked list
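The node structure and access-by-head-pointer property can be sketched in C (a minimal illustration; names are mine):

```c
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;   /* NULL in the last node */
} NODE;

/* Insert a new node at the front: O(1). The access pointer
   (*head) is updated to point at the new first node. */
void insert_front(NODE **head, int value) {
    NODE *n = (NODE *)malloc(sizeof(NODE));
    n->data = value;
    n->next = *head;
    *head = n;
}

/* Count nodes by traversal from the head pointer: O(n). */
int list_length(const NODE *head) {
    int count = 0;
    for (const NODE *p = head; p != NULL; p = p->next) count++;
    return count;
}
```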
 Array vs. Linked List

Array
Pros:
 Efficient for random access with time complexity O(1).
 Efficient in memory usage; does not need extra space to maintain the linear relations of data elements.
Cons:
 Maximum number of array elements must be given at compile time, either by a constant or by a variable.
 Wastes memory space if there are not many data objects to store, and runs out of memory space when there are more data objects to store.
 Costs more in insert and delete operations due to shifting.

Linked List
Pros:
 The number of nodes need not be predetermined at compile or creation time.
 Nodes can be created and inserted into linked lists dynamically at runtime.
 Nodes can be deleted, and memory space can be released at runtime.
 Insert and delete operations are more efficient than those of arrays, without shifting.
Cons:
 Linked lists need extra space to store the links, and cost more time maintaining the links in operations.
 It needs to traverse a linked list to access a specific node, so random access on linked lists is less efficient than on arrays.
Doubly Linked List
 Doubly Linked List: Linear collection of nodes with the following properties
o Node structure contains data members and two pointers
 next pointing to next node
 prev pointing to previous node
o First node has prev pointing to NULL and last node has next pointing to NULL
o Has two accessing pointers, start pointing to first node and end pointing to last node
Circular Linked List
o Circular Linked List: Linear collection of nodes with the following properties
 Node structure consists of data members and one address pointer next pointing to the
next node
 Last node has next pointing to first node
 Accessed by pointer start pointing to start node
o Circular Doubly Linked List: Has the following properties
 next of the last node points to the first node
 prev of the first node points to the last node
 Has one access pointer start pointing to any node
 Advantage -> supports traversing cyclically in both directions efficiently

Need to know all operations and functions on the above data structures, both iterative and recursive

Tree Data Structures
 Abstract Tree: Collection of nodes connected in a tree structure with set of operations
 Tree Data Structure: Implementation of abstract tree in programming language where parent-child
relations of nodes represented by specific method
o Provides more efficient way of executing certain algorithms (e.g. Search)
Terms and Notation
 Parent: If node N has subtree T1, N1 is root of T1 (i.e. (N, N1) is an edge) then N is parent of N1
 Child: If node N has subtree T1, N1 is root of T1 (i.e. (N, N1) is an edge) then N1 is child of N
 Siblings: Nodes n2 and n3 have the same parent n1, i.e. Both (n1, n2) and (n1, n3) are edges of the tree
 Ancestor: If node n3 is in sub-tree n1, n1 is ancestor of n3
 Descendent: If node n3 is in sub-tree n1, n3 is descendent of n1
 Leaf: Node without child
 Root: Node without ancestor
 Path: Sequence of different edges such that child node of edge is parent node of next edge
o Length of path is number of edges in path
o Example: (n1, n2),(n2, n3), ..., (nk, nk+1)
 Cycle: Path + edge connecting last node to first node in path
o Example: Edge sequence (n1, n2),(n2, n3),(n3, n1) is a cycle of length 3
 Depth of Node: Length of path from root to node
o Root of tree has depth 0
 Level: Set of nodes of the same depth
o Level i consists of all nodes of depth i
 Width: Maximum number of nodes over all tree levels
 Height of Tree: 1 + Maximum height of sub-trees
o Height of tree = height of its root
o If tree contains one node (the root) height is 1
 Forest: Disjoint union of general trees, i.e. Ordered set of zero or more general trees
Classification of Trees
 General Trees: No restrictions, each node has zero or more children
 Binary Trees: Each node has at most two children
 Ternary Tree: Each node has at most three children
 Binary Search Trees: Binary trees for efficient searching
o AVL Tree: Height balanced binary search trees
o Red-black-trees: Type of balanced binary search tree
o Splay Trees: Binary search trees efficient for searching recent accessed nodes
 Multi-way Search Trees: Generalization to binary search trees
o B-trees: Balanced multi-way search trees
 Complete Binary Tree: Binary tree where every level except possibly the last is fully filled, and all nodes
in the last level are as far left as possible
 Perfect Binary Tree: Complete binary tree whose last level is fully filled; properties:
o Number of nodes in a perfect binary tree is determined by its height
o Level k has 2^k nodes, so the last level has 2^(h-1) nodes, where h is the height
o Width of a perfect binary tree of height h is 2^(h-1)
Tree Theorems
 Theorem 1.
1. A perfect binary tree of height h has 2^h − 1 nodes
2. A perfect binary tree of n nodes has height log₂(n+1)
3. A perfect binary tree of n nodes has width (n+1)/2


 Theorem 2. Let M(h) be the maximum number of nodes over all binary trees of height h; then M(h) ≤ 2^h − 1
 Theorem 3. The height of a binary tree of n nodes is at least log₂(n+1)

Note: Given an expression tree, you should be able to write the in-order and pre-order traversals, and
you should be able to draw a Huffman tree.

Application of Binary Trees
Expression Tree
 Expression Tree: Binary tree where each non-leaf node represents operator and each leaf node
represents operand
o Provides alternative method to represent algebra expression
Huffman Tree
 ASCII code uses fixed-length code (i.e. every character has a code of 8 bits)
 Huffman Coding: Variable length coding method to code symbols depending on frequencies of
symbols
o Codes of symbols by Huffman coding have different lengths
 A high-frequency symbol has a shorter code
 A low-frequency symbol has a longer code
o Huffman coding is often used in data compression
 How to draw Huffman tree:
1. Get and order frequency of each character
2. Start from the bottom of the list; draw least frequent items from most to least frequent
from right to left
3. Starting from the right and moving left, give each pair of nodes a parent whose value is
equal to the sum of their frequencies
4. Continue doing step (3) until each node has a parent
 Huffman code of symbol is derived from Huffman tree from path connecting root to leaf node of symbol
o Left child edge gives 0
o Right child edge gives 1
Other
1. Store data in a hierarchical structure, representing the relations of data elements
2. Represent collection of data objects for efficient search
3. Implement other types of data structures like hash tables, sets, and maps
4. Binary search trees, self-balancing trees, AVL trees, red-black trees used to store record data for
efficient search, insert, and delete operations
5. B-trees used to store tree structures on disc and to index a large number of records, and
secondary indexes in databases, where the index facilitates a select operation to answer some
range criteria
6. Compiler construction
7. Database design
8. File system directories
9. Binary trees used to represent algebraic expressions and Huffman trees for encoding/decoding

Binary Trees and Operations
Pre/In/Post-Order Traversal
 Pre-order Traversal: Define recursively as -> Visit tree in order of root, left sub-tree, and right
sub-tree; same rules apply traversing left and right sub-trees
 In-order Traversal: Visiting tree in order of left-subtree, root, right-subtree
 Post-order Traversal: Visiting tree in order of left sub-tree, right sub-tree, root
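The three recursive traversals can be sketched in C; here each visit appends the node's data to a buffer so the visit order can be inspected (buffer-collecting is my illustrative choice, not the course API):

```c
#include <stddef.h>

typedef struct tnode {
    char data;
    struct tnode *left, *right;
} TNODE;

/* Pre-order: root, left sub-tree, right sub-tree. */
void preorder(const TNODE *root, char *buf, int *pos) {
    if (root == NULL) return;
    buf[(*pos)++] = root->data;
    preorder(root->left, buf, pos);
    preorder(root->right, buf, pos);
}

/* In-order: left sub-tree, root, right sub-tree. */
void inorder(const TNODE *root, char *buf, int *pos) {
    if (root == NULL) return;
    inorder(root->left, buf, pos);
    buf[(*pos)++] = root->data;
    inorder(root->right, buf, pos);
}

/* Post-order: left sub-tree, right sub-tree, root. */
void postorder(const TNODE *root, char *buf, int *pos) {
    if (root == NULL) return;
    postorder(root->left, buf, pos);
    postorder(root->right, buf, pos);
    buf[(*pos)++] = root->data;
}
```

For the expression tree of `a + b` (root `'+'`, leaves `'a'` and `'b'`), in-order yields the infix form, pre-order the prefix form, and post-order the postfix form.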
Depth-First Order
 Depth-first Order: Explores nodes starting at the root's leftmost sub-tree until all nodes there are
visited, then moves to the right sub-trees
 Depth-First-Search (DFS) Algorithm: Using pre-order traversal method that checks data value of
node against search key and, if matched, returns node and stops traversal
Breadth-First Order
 Breadth-first Order: Visits tree nodes using queue by level from low level to high level, and at
each level from left to right
 Breadth-First-Search (BFS): Using breadth-first traversal with action of checking node data
against search key and returning node if it matches

Binary Search Tree (BST)


 Binary Search Tree (BST) aka Ordered Binary Tree: Has the following properties
1. Tree node associated with key value, field/property of tree node used as key for search operation
2. For any node of tree, all nodes in left sub-tree have key values less than key value of node, and all
nodes in right sub-tree have key values equal or greater to key value of node
 BST is not balanced in general; in the worst case its height is O(n), and in the best case O(log n)
 Balanced BST: Height of left and right subtrees are equal or don’t differ greatly at any node
Operation Functions
Search
 Recursive:
o // your code here
o time: O(h), space: O(h)
 Iterative
o // your code here
o time: O(h), space: O(1)

Insert
 // your code here
 Time: O(h), space: O(1)
Delete
 // your code here
 time: O(h), space: O(1)

AVL Tree
Height-balanced Tree
 Balanced Binary Tree: Left and right subtrees height are equal or don’t differ greatly at any node
 Balancedness is defined and measured precisely by the balance factor
 Balance Factor: Height of left sub-tree minus height of right sub-tree
o balance factor = height(left sub-tree) − height(right sub-tree)
 Left Heavy: Node's balance factor is greater than 0
 Right Heavy: Node's balance factor is less than 0
 Balanced: Balance factor is -1, 0, or 1
 Unbalanced: Balance factor is not -1, 0, or 1
 Height Balanced: Binary tree where all nodes are balanced
 Theorem 1. Let T be a height balanced tree of height h; then the number of nodes in T is at least 2^(h/2) − 1
 Theorem 2. The height of a height balanced tree of n nodes is O(log n)
 Theorem 3-4. The self-balancing insert/delete operation on AVL tree derives an AVL tree.

AVL Tree Operations


 AVL Tree: Height-balanced binary search tree data structure with self-balancing insert and
delete operations
Properties:
o AVL tree is a binary search tree
o AVL tree is a height balanced binary tree
o Self-balancing AVL insert delete operations have two major steps:
1. Do BST insertion/deletion
2. If the derived BST is not height balanced, do self-balancing operations (rotations) to
restore a height balanced BST
 Critical Node: First unbalanced node on path from new node to root
 Time and space complexities for AVL operations -> time O(log n), space O(log n) / O(1)
 Rotations
o INSERT ROTATIONS -> Four possible cases (patterns) and actions (rotations) taken accordingly for
re-balancing. Let x denote the critical node:
1. Case 1: Balance-factor(x) = 2, balance-factor(x->left) >= 0
Action: Do right-rotation
2. Case 2: Balance-factor(x) = 2, balance-factor(x->left) < 0
Action: Do left-right-rotation
3. Case 3: Balance-factor(x) = -2, balance-factor(x->right) <= 0
Action: Do left-rotation
4. Case 4: Balance-factor(x) = -2, balance-factor(x->right) > 0
Action: Do right-left-rotation
o DELETE ROTATIONS -> Four possible cases (patterns) and actions (rotations) taken accordingly
for re-balancing. Let x denote the critical node:
1. Case 1: Balance-factor(x) = 2, balance-factor(x->left) >= 0
Action: Do right-rotation, rotate_right(x)
2. Case 2: Balance-factor(x) = 2, balance-factor(x->left) < 0
Action: Do left-right-rotation, x->left = rotate_left(x->left); rotate_right(x);
3. Case 3: Balance-factor(x) = -2, balance-factor(x->right) <= 0
Action: Do left-rotation, rotate_left(x)

4. Case 4: Balance-factor(x) = -2, balance-factor(x->right) > 0
Action: Do right-left-rotation, x->right = rotate_right(x->right); rotate_left(x)
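The two basic rotations used in all four cases can be sketched in C (a minimal illustration over a plain BST node; real AVL code would also update stored node heights, which is omitted here, and the double rotations are just the compositions shown in Cases 2 and 4):

```c
typedef struct avlnode {
    int key;
    struct avlnode *left, *right;
} AVLNODE;

/* Right rotation about x: x's left child y becomes the new sub-tree
   root, y's right sub-tree becomes x's left sub-tree. Used for the
   left-left pattern (Case 1). Returns the new sub-tree root. */
AVLNODE *rotate_right(AVLNODE *x) {
    AVLNODE *y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

/* Left rotation about x: the mirror image, used for the
   right-right pattern (Case 3). */
AVLNODE *rotate_left(AVLNODE *x) {
    AVLNODE *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}
```

For example, after inserting 3, 2, 1 in that order the critical node is 3 (Case 1), and `rotate_right` makes 2 the new sub-tree root with children 1 and 3.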

Concepts of Red Black Trees


 Red-black-tree (RBT): Type of balanced binary search tree data structure
o Height of RBT is O(log n)
o Search on RBT done in time O(log n); space O(1)
o Self-balancing insert/delete operations done in time O(log n); space O(log n)
o RBTs are balanced binary trees in the sense that (longest path from a node to a leaf) / (shortest path from that node to a leaf) <= 2
 Concept of RBT has two aspects:
1. Red-black balanced binary trees
2. Red-black-tree data structures
 Red-black Balanced Tree: Binary tree with the following properties
1. Every node is labeled a colour of either red or black
2. The colour of root node is black, and all NULL children are viewed as black nodes
3. Child nodes of red nodes are black
4. For any given node, every path from node to any of its NULL node has the same number
of black nodes
 Black Height: Number of black nodes on path from node to leaf node
 Properties of Red-Black Binary Trees
o Lemma 1. Let N be any node of a red-black balanced tree. Let H(N) and BH(N) be the
height and black height of node N. Then H(N) ≤ 2BH(N)+1 if N is a red node, or H(N) ≤
2BH(N) if N is a black node.
o Theorem 1. For any node N of a red-black balanced tree, the number of nodes of the
longest path from N to a leaf is less than or equal to twice of the number of nodes of the
shortest path from N to a leaf.
o Theorem 2. The height of a red-black balanced tree of n nodes is O(log n).
 Red-black-tree Data Structure (aka red-black-tree, RBT): Red-black balanced binary search tree
with self-balancing insert and delete operations
Properties:
1. RBT is a BST
2. RBT is a red-black balanced binary tree
3. RBT insert/delete operation does BST insertion/deletion and self-balancing operations
to restore properties 1 and 2

Concepts of Splay Trees
 Splay Tree: Binary search tree (BST) with self-adjusting search, insert, and delete operations;
moves most recently accessed nodes closer to root node -> nodes are ∴ accessed faster
 Splay Node: Last accessed node
 Amortized Analysis: Evaluates the worst-case average resource usage of an algorithm over a sequence of runs, i.e.
if a sequence of M operations takes O(M f(n)) time, we say the amortized runtime is O(f(n))
o Worst Case -> Splay trees are not balanced BSTs; worst-case time per operation is O(n)
o Splay tree amortized time per operation for search, insertion, and deletion is O(log n)
 Self-adjusting: Search/insert/delete operation consists of two major steps
1. Do BST search/insert/delete
2. Move splay node N up to root by splay operation, which consists of sequence of zig-zig
or zig-zag operations, followed by zig operation in case of need
 Splay Operations
o Zig-zig Operation: Right-right-rotation or left-left-rotation, moves splay node up by two levels
o Zig-zag Operation: Left-right-rotation or right-left-rotation, moves splay node up by two levels
o Zig Operation: Single right or left rotation, moves splay node up by one level

Concepts of Multiway Search Trees


 Multi-way (m-way) Search Tree: Generalization to BST by allowing at most m children and m-1
keys, defined by the following properties
1. Nodes are connected in m-tree structure
2. In each node, all key values are in ascending order: Ki < Ki+1 for 0 ≤ i ≤ n-2, where n is the number of keys in the node
3. All key values in the sub-tree pointed to by Pi are less than Ki for i = 0, …, m-2
4. All key values in the sub-tree pointed to by Pi+1 are bigger than Ki for i = 0, …, m-2
 By definition of m-way search tree, we have:
1. m is the maximum number children/sub-trees, so out degree of node of an m-way
search tree is at most m
2. m-1 is the upper limit that defines how many key values can be stored in a node
3. When m = 2, m-way search tree is a BST
4. Node structure of m-way search tree contains m-1 key values K0, K1, K2, …, Km-2, and m
pointers P0, P1, P2, …, Pm-1, pointing to m children

Concepts of B-trees
 When m > 2, a balanced m-way search tree is shorter than a balanced BST, so it's more efficient
for search operations than a BST
 B-tree of order m: Self-balancing m-way search tree, balancedness of B-trees define by following
properties
1. Every node in a B-tree except the root node and leaf nodes has at least ⌈m/2⌉ children
2. Root node has at least two children if it isn't a leaf node
3. All leaf nodes are on the same level
 Properties guarantee height of B-tree of order m is O(log n); search operation has time
complexity O(log n)
 Self-balancing means insert and delete operations on B-tree are done by:
1. Doing m-way search tree insertion/deletion
2. Doing self-balancing operation to restore B-tree balancedness

Hash Tables
 Hash Tables: Data structures with average time O(1) for search, insert, and delete operations
Concepts of Hash Tables
 Abstract Hash Table: Defined by following properties
1. Stores collection of data values of certain type
 One field is used as a key, the others (if any) as the value, so data can be viewed as key-value pairs
2. Array of type of length m is used to store data records, where m is size of the hash table
3. Function h is used to map key k to an integer h(k) between 0 and m-1
4. h(k) is the hash/index value of k
5. Insert operation puts key-value pair (k, x) at array position of hash value index h(k),
namely hash_array[h(k)]; if array position is taken, an alternative position is used
6. Delete element by key k to remove data element of key k at hash_array[h(k)]
7. Search element by key k (called look-up) is to check data element at array position h(k).
If key value of hash_array[h(k)] matches with k, then return hash_array[h(k)], otherwise
check the data element at alternative positions
 Hash Table: Data structure that implements abstract hash table with concrete hash_data_type,
hash table size, hash function h, insert, look-up, and delete operations
Hash Functions: Division Method
 Hash Function: Mathematical function which maps any key k in set U to an integer h(k)
for a given modulus m; h(k) is called the hash value of k by h
o i.e. h : U → {0, 1, ..., m-1} where m is a given positive integer
 Division function: h(k) = k mod m, where k is any natural number, and mod is the modulo
operator. i.e. h(k) is the remainder of k divided by m
o How to choose m? Best to choose m to be a prime number and not too close to an exact
power of 2 (the example here uses m = 10 only for easy arithmetic)
o When k = 18, h(k) = h(18) = 18 mod 10 = 8
 Drawbacks of division method:
o Consecutive keys map to consecutive hash values; although this means consecutive keys don't collide,
consecutive locations will be occupied, which may lead to clustering and degradation of performance
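The division method is one line of C; for non-integer keys one common approach, shown here as an illustrative sketch (the 31-multiplier folding is my choice, not from the notes), is to fold the bytes into an integer checksum first:

```c
/* Division-method hash: h(k) = k mod m, mapping any key to 0..m-1. */
unsigned int hash_div(unsigned int k, unsigned int m) {
    return k % m;
}

/* For string keys: fold the characters into an integer (a simple
   checksum), then apply the division method. */
unsigned int hash_str(const char *s, unsigned int m) {
    unsigned int sum = 0;
    while (*s)
        sum = sum * 31 + (unsigned char)*s++;   /* accumulate bytes */
    return sum % m;
}
```

With m = 10 as in the example above, `hash_div(18, 10)` is 8.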
Collisions and How to Handle Them
 Collided/Collision: If two keys are hashed to same hash value h
o Ideal Situation -> No collisions with given hash function and set of keys, meaning insert,
search, and delete operation can be done in O(1) time
o BUT collisions are unavoidable
 Collision Handling/Resolving: In hash table, two collided keys will have the same hash value
index, but their corresponding data records can't be stored at the same position in the array.
Alternative locations must be found to store collided records

Solutions
Solution 1: When collision happens, find open position by probing and store data at open position.
Probe by:
 Linear Probing
o For a value stored at location generated by h(k), use hash function h(k, i) = (h(k) + i) mod m, where
 m is the size of the hash table
 i is the probe number, varying from 0 to m-1
to find the first i such that position h(k, i) is open
o We know array position is open because hash table contains two types of values:
sentinel (e.g. -1) or data values
o Presence of sentinel value means location contains no data value ∴ open for storage
o Pros and cons of linear probing
Pros:
 Linear probing finds an empty location by doing a linear search in the array beginning from position h(k).
 The algorithm provides good memory caching, through good locality of reference.
 Good performance if no deletion is applied.
Cons:
 It results in clustering, and thus a higher risk that where there has been one collision there will be more.
 The performance of linear probing is sensitive to the distribution of input values.
 The performance gets degraded after a sequence of insert and delete operations.
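Linear probing insert and search can be sketched in C for non-negative integer keys (illustrative code; the table size and the -1 sentinel are my choices, matching the sentinel idea described above):

```c
#define TABLE_SIZE 11
#define EMPTY (-1)     /* sentinel: location holds no data value */

/* Insert key using h(k, i) = (k mod m + i) mod m, probing
   i = 0, 1, ... until an open position is found.
   Returns the index used, or -1 if the table is full. */
int lp_insert(int table[], int key) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (key % TABLE_SIZE + i) % TABLE_SIZE;
        if (table[idx] == EMPTY) { table[idx] = key; return idx; }
    }
    return -1;
}

/* Search: follow the same probe sequence until the key is found
   or an empty slot ends the search. Returns index or -1. */
int lp_search(const int table[], int key) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (key % TABLE_SIZE + i) % TABLE_SIZE;
        if (table[idx] == key) return idx;
        if (table[idx] == EMPTY) return -1;
    }
    return -1;
}
```

Note the clustering effect: keys 5 and 16 both hash to index 5 (16 mod 11 = 5), so 16 is stored at the next open slot, index 6.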

Solution 2: Use alternative data structure to store all data records at the same index value
Criteria of a Good Hash Function
1. Efficient to compute hash value for any given key
o Efficient algorithm to calculate checksum of data record together with division
2. Fewer collisions for a given set of keys
Chained Hash Table and Operations
 Linked Hash Table: Each location in hash table array stores a pointer to a linked list or BST that
stores all data records of the same hash value
o If no key hashes to an index, pointer at hash table index position is NULL
 Chained Hash Table: Linked data structure used to store data records
 Insert, search, and delete operations on linked hash table have two steps
o Compute hash index value of given key
o Do insert/search/delete operation on linked data structure at hash index position
Application of Hash Tables
1. Store key-value pair data records for fast search by key
2. Implement symbol tables for variable names and their values
3. Compilers for symbol tables
4. Database indexing
5. Caching information for quick look-up
6. Internet search engines to store keyword-URL as key-value pairs in databases for quick search

Potential final exam question:
 Applying min/max heap properties
 Heapify
 Heap sort

Heaps
 Heaps: Data structure efficient for finding and deleting the minimum (or maximum) element and inserting any element

 Heap Data Structure: Implementation of abstract heap
 Three heap data structures:
1. Binary heaps
2. Binomial heaps
3. Fibonacci heaps
Min-heap
 Minimum Heap (aka min-heap): Abstract min-heap has following properties:
1. One component/property of data element used as a key
 Denoted as key(A), where A is a node
2. Data elements (called nodes) of min-heap are connected by trees (called heap trees or
heap-ordered trees)
 If node A has child node B, then key(A) <= key(B)
3. Basic heap operation is to find the element of the minimum key value (find-min)
4. Other min-heap operations include:
1. Create: Create a new min-heap
2. Insert: Insert a data element into a min-heap
3. Delete-min: Delete the minimum key node from a min-heap
4. Extract-min: Delete and return the minimum element from a min-heap
5. Decrease-key/Increase-key: Decrease/increase key value of a node
6. Merge: Merge two min-heaps into one min-heap
 Lemma 1. Any node of a min-heap has the minimum key value over all nodes in the sub-tree rooted at that node.

Max-heap
 At abstract level, maximum heap (aka max-heap) has following properties:
1. One component/property of data element used as a key
 Denoted as key(A), where A is a node
2. Data elements (nodes) of max-heap connected by trees (aka heap/heap-ordered trees)
 If node A has child node B, then key(A) >= key(B)
3. Basic heap operation is to find the element of the maximum key value (find-max)
4. Other max-heap operations include:
1. Create: Create a new max-heap
2. Insert: Insert a data element into a max-heap
3. Delete-max: Delete the maximum key node from a max-heap
4. Extract-max: Delete and return the maximum element from a max-heap
5. Decrease-key/Increase-key: Decrease/increase key value of a node
6. Merge: Merge two max-heaps into one max-heap
Binary Heap
 Binary Heap: Heap whose heap tree is a complete binary tree
 Complete Binary Tree: All levels but the last one are fully filled. All nodes in last level are left
filled
 Linked Binary Heap: Uses linked binary tree to represent complete binary tree
o Flexible on the number of elements
o More efficient when the number of heap elements is unknown.
Array Implementation
 Advantage of using complete binary trees for heap -> efficient array representation

 Implementation of array binary heap uses array of given maximum length and variable n to
denote number of elements in heap
 Complete binary tree of n elements can be represented by array of n elements
o Order complete binary tree node data elements in breadth-first and left-first order, put
element into array in same order
o Root has index 0
 Node of index i has left child at index 2i+1 and right child at index 2i+2; parent at index ⌊(i−1)/2⌋
 Height of complete binary tree of n nodes is ⌊log₂ n⌋ + 1, i.e. O(log n)
 Element at index 0 is min element

Time and Space Complexities of Operations

Application of Heaps
1. Heap Sort
o Sorting problem:
 Input: array x[n]
 Output: sorted array of x[n] in increasing order
o Sorting solution:
 Insert each x[i] into a binary max-heap, then repeatedly extract the maximum to fill the sorted array from the back
2. Used as priority queue to improve the performance of Prim’s and Dijkstra’s algorithms

These next 3 pages are all graph definitions, so if you’ve taken or are taking MA238 you can probably ignore them and be ok

Graphs
Concepts of Graphs
 Graph: Non-linear data structure used to implement mathematical graph of related data elements

o Set of vertices together with set of edges, where an edge is a binary relation of two
vertices
o Generalization of trees
 Graphs used to model anything where entities are related to each other in pairs
o Example: Family trees, web graph, social network
 Abstract Graph Data Structure: Specification of collection of data objects related in graph
structure and set of operations on data objects through graph
o Used in algorithm design and analysis
 Graph Data Structure: Implementation of abstract graph data structure; used in programs
 Node -> Vertex
Graph Definitions
Graphs
 Graph G is an ordered pair (V, E) where V is a set of vertices (nodes) and E is a collection of
vertex pairs (edges)
o Written as G = (V, E)
 V(G) represents node set; |V(G)| represents order (# of nodes) of graph
o Order of graph: Number of nodes
 E(G) represents edge set; |E(G)| represents size (# of edges) of graph
o Size of graph: Number of edges
 Dense: m = O(n^2), where n = |V(G)| and m = |E(G)|
 Sparse: m = O(n)
 Loop Edge: Edge where both endpoints are the same
 Multiple Edges: Two edges have the same pair of nodes
Subgraphs
 Subgraph: Graph H = (V', E') is a subgraph of G = (V, E) if V' is a subset of V and E' is a subset of E
 Supergraph: G = (V, E) contains H / is a supergraph of H if H = (V', E') is a subgraph of G
 Spanning Subgraph: H is a spanning subgraph of G if V' = V
Paths
 Path: List of nodes and distinct edges: v0, e0, v1, e1, … , vk-1, ek-1, vk, such that vi and vi+1 are end
nodes of ei for i = 0, ..., k-1
o Path connects its first node v0 and last node vk
o Path can be represented by list of edges: e0, e1, …, ek-1
o Path can be represented by list of nodes: v0, v1, …, vk
 Terms used and General Knowledge:
o Length of path = Number of edges in the list
o Simple Path: v0, v1, …, vk-1, vk are distinct
o Closed Path: v0 and vk represent the same node
o Cycle: Path is closed and v0, v1, …, vk-1 are distinct
o Graph G contains path P if nodes and edges of P are from G
o Reachable: Node v is reachable from node u if some nodes and edges of G form a path with u as
the first node and v as the last node

Undirected Graphs
o Undirected Graph: Edges have no orientation (non-ordered pair of nodes)
o Edge represented by set of its two nodes
o Simple Graph: Undirected and does not contain loop or multiple edges

o For undirected edge e = {u, v}, following sayings used:
o e connects u and v
o u and v are adjacent
o u is a neighbour of v
o u and v are incident with e
o e is incident with u and v
o Degree of node u: Total number of edges incident with u
o Isolated: deg(u) = 0, i.e. u isn't incident with any edges
o Handshaking Theorem: Sum of degrees of all nodes of graph G is equal to 2|E(G)|
o K-Regular: Every node has degree k
o Complete Graph: Graph is simple and there is an edge between every pair of nodes
o Complete graph of order n is (n-1)-regular and has n(n-1)/2 edges
o Connected: Any two nodes of the graph have a path connecting them
o Tree: Graph is connected and contains no cycles
o Tree has n-1 edges, so a tree is a sparse graph
o Component: Subgraph H of G is a component of G if H is connected and G contains no edge
from V(H) to V(G)-V(H)
o If G is not connected, then it can be decomposed into the disjoint union of its components
Directed graphs
o Directed Graph (Digraph): Edges are ordered pair of nodes
o Directed edge e connecting node u to node v written as (u, v) or e = (u, v)
o Arrow from u to v is used in drawing of directed edge e
o For directed edge e = (u, v), following sayings used:
o u is tail/origin/initial of e
o v is head/target/terminal of e
o u is parent/predecessor of v
o v is child/successor of u
o e is directed edge from u to v
o e is from u to v
o e connects u to v
o e begins/originates from u; ends/terminates at v
o e is an outgoing edge of u
o e is an incoming edge of v
o Out-degree of node u: Number of outgoing edges of u
o Notation: outdeg(u) or deg+(u)
o In-degree of node u: Number of incoming edges of u
o Notation: indeg(u) or deg-(u)
o Degree of node u: deg(u) = indeg(u) + outdeg(u)
o Source: Node with indeg(u) = 0
o Sink: Node with outdeg(u) = 0
o Directed Path: v0, e0, v1, e1, … , vk-1, ek-1, vk, is a path where each edge connects the previous
node to the next node in the list, i.e., ei = (vi, vi+1) for i = 0, …, k-1
o P connects v0 to vk, and P is a path from v0 to vk
o Reachable: Node v is reachable from node u if G contains a directed path connecting u to v
o Connected (or Strongly Connected): For any pair of nodes u and v there is a directed path
from u to v and a directed path from v to u
o Directed Acyclic Graph (DAG): Digraph does not contain a directed cycle

o Bidirectional Graph: Undirected graph transformed into a digraph by replacing each edge by
two directed edges of opposite directions
Weighted Graphs
o Weighted Graph: Graph where each edge is associated with value (weight)
o Graph Notation: G = (V, E, w), where w is a function w: E -> W, where W is the set of
weights
o Edge Notation: Weight of edge e written as w(e)
o Weight of G: Sum of weights of all edges of G
o Notation: w(G)
o Weight can be applied to both undirected and directed graphs
Graph Representations
Adjacency Matrix
 Adjacency Matrix: Adjacency matrix A (or AG) of a simple graph G with vertices v₁ , v₂ ,…, vₙ is a
n×n 0-1 matrix whose (i, j)ᵗʰ entry is 1 when vᵢ and vⱼ are adjacent and 0 when they are not
adjacent
o For directed graph G = ({v0, v1 ,… , vn-1} , E): Adjacency matrix of G is an n by n matrix M =
[ai,j]n x n and ai,j = 1 if (vi, vj) is an edge of G; otherwise ai,j = 0
o For undirected graph G = ({v0, v1 ,… , vn-1} , E): Adjacency matrix of G is matrix M = [ai,j]n x
n, ai,j = aj,i = 1 if {vi, vj} is an edge of G; otherwise ai,j = aj,i = 0
 i.e. adjacency matrix of an undirected graph is a symmetric matrix
o For weighted digraph G = ({v0, v1 ,… , vn-1} , E, w): Adjacency matrix of G is matrix M =
[ai,j]n x n, ai,j = w((vi, vj)) if (vi, vj) is an edge of G; otherwise ai,j = 0 (a sentinel value)
o For weighted undirected graph, G = ({v0, v1 ,… , vn-1} , E, w): Adjacency matrix of G is
matrix M = [ai,j]n x n, ai,j = aj,i = w((vi, vj)) if (vi, vj) is an edge of G; otherwise ai,j = aj,i = 0

Adjacency List
 Adjacency List Representation: Represents graph by a list of nodes, each with a list of its neighbours

Edge List
 Edge List: List of edges

Graph Traversal
BF-traversal
 Begins at start node and explores all neighbour nodes
 For each of the neighbour nodes, algorithm explores their unexplored neighbour nodes, and so
on until all reachable nodes are explored
 Needs queue data structure to remember discovered nodes in order of distance from the start node
 Needs status array to store state of each node
DF-traversal
 Depth-first traversal starts from given node and explores unvisited node deeper and deeper
until node has no neighbours (i.e. dead-end)
 When a dead-end is reached, the algorithm backtracks, returning to the most recent node that
hasn't been completely explored
o ∴ it needs to remember a path for backtracking (think pre-order traversal)
Graph Algorithms
Kruskal’s Algorithm
 Idea of Kruskal's algorithm:
1. Sort edges of G in increasing order of weights
2. Starting with a forest of no edges, it adds edges in sorted order to the forest as long as each
edge connects two different trees in the forest, until a spanning tree is derived or no more edges are left
 time: O(m log m), space: O(m)
 Theorem. If input weighted graph G is connected, Kruskal's algorithm outputs minimum
spanning tree of G
Prim’s Algorithm for MST
 Drawback of Kruskal's algorithm -> sorting the edges, which costs extra time and space
 Prim's algorithm avoids this drawback using efficient data structures with better time and space
performance
 Idea of Prim's Algorithm:
 Grow a tree T starting from a node by adding edges until a spanning tree is derived
 In each iteration, it adds an edge of minimal weight from a node of T to a node not in T
 time: O(n^2) with an adjacency matrix (O(m log n) with a binary heap), space: O(n)
 Prim's Theorem. If the input weighted graph G is connected, then Prim's algorithm gives a
minimum spanning tree of G
Dijkstra's Algorithm for Shortest Path/Tree
 Shortest Path Tree (SPT): Tree subgraph rooted at source node such that path from source node
to each node in the tree is the shortest path connecting the source and destination node in the
super-graph (application: find shortest route from point A to B on a map)
 Dijkstra's Algorithm: Efficient algorithm to find costs of shortest path from source node to
destination node
o Greedy algorithm
 Idea of Dijkstra's Algorithm:
1. Starts with tree T consisting of source node s
2. In each iteration, it adds the node with minimum distance to s through T
3. Iteration stops when no more nodes can be added
 Like Prim's algorithm, Dijkstra's algorithm grows current SPT by Greedy strategy by adding
shortest reachable node and edge
