Comp 251 Final Cribsheet Calem Bendell


1 Good Facts to Know

1.1 Trees
Trees are graphs; not every graph is a tree.
A connected graph that is not a tree has at least one cycle.
A Hamiltonian cycle visits each vertex exactly once.
Every tree is a bipartite graph and a median graph; countable trees are planar graphs.
Height and nodes: 2^(h-1) <= n <= 2^h - 1
Leaf nodes: l = (n + 1)/2
Nodes given l leaves: n = 2l - 1
Number of nodes in a perfect tree: 2^h - 1
Number of nodes on level i: 2^i
Recurrence relation for the number of nodes: a_n = a_(n-1)^2 + a_(n-1)(1 + sqrt(4a_(n-1) - 3))
The number of binary trees with n nodes is the Catalan number C_n.
The successor of node x is the node with the smallest key greater than x->key, and it can be determined without comparing keys.

1.2 Heaps
height = Θ(log n), basic operations O(log n)
T(n) = T(largest) + Θ(1); largest <= 2n/3 (last row half full is the worst case), so T(n) <= T(2n/3) + Θ(1) => T(n) = O(log n)
BuildMaxHeap: n * Σ_(h>=0) h/2^h = O(n)

1.3 Max Heap Correctness
Initialisation: Prior to the first iteration, i = ⌊n/2⌋; each node ⌊n/2⌋+1, ⌊n/2⌋+2, ..., n is a leaf and thus the root of a trivial max heap.
Maintenance: The children of node i are numbered higher than i, so they are both roots of max heaps. MaxHeapify preserves the property that the nodes up to n are all roots of max heaps. Decrementing i in the for loop re-establishes the invariant for the next iteration.
Termination: At termination, i = 0. By the loop invariant, each node is the root of a max heap; in particular, the first node is.

2 Hashing
Load factor: α = (number of keys stored) / (number of slots in the table)
Uniform hashing: each key is equally likely to have any of the m! permutations as its probe sequence.

2.1 Direct Address Table
Represented by an array T[0..m-1]; each slot corresponds to a key in U.
If element x has key k then T[k] holds a pointer to x; otherwise T[k] is empty.
Delete = Θ(1), Search = Θ(1), Insert = Θ(1)

2.2 Chaining
Search, removal, and lookup are all constant by amortized analysis, but otherwise the average case is O(α), where α is the load factor.
Worst case is the number of entries in the table, when all keys hash to the same slot, for everything but insert.
Unsuccessful search: O(1) + O(α) = O(1 + α)
Successful search: Θ(1 + α/2 + α/(2n)) = Θ(1 + α)

2.3 Hash Function
Division: h(k) = k mod m, with m prime and not too close to a power of 2 or 10.
Multiplication: h(k) = (A * k mod 2^w) >> (w - r), with 2^(w-1) < A < 2^w; multiplies, then extracts r bits from the end.

2.4 Open Addressing
Open addressing: linear and quadratic probing and double hashing.
Linear probing: the interval between probes is fixed; best cache performance, but sensitive to clustering.
Quadratic probing: the interval between probes increases linearly.
Double hashing: more computation.
No memory overhead, but wasted entries. Recommended load factor: 0.7.
Performance degrades like α/(1 - α).
Deletion needs to leave a marker so that searches continue past the deleted slot.
Given uniform hashing, the expected number of probes is at most 1/(1 - α) in an unsuccessful search and (1/α) * log(1/(1 - α)) in a successful search.

2.5 Universal hashing
A family H of hash functions is universal if for any two distinct keys x != y, Pr_(h in H)[h(x) = h(y)] <= 1/m. Choosing h uniformly at random from H then bounds the expected number of collisions with any fixed key by n/m = α, regardless of the input keys.
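A minimal sketch of one classic universal family (the Carter-Wegman construction h(k) = ((a*k + b) mod p) mod m); the prime p and the helper name are illustrative assumptions, not course material:

import random

def make_universal_hash(m, p=2**31 - 1):  # p: any prime larger than the keys
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    # for distinct keys x != y, Pr[h(x) == h(y)] <= 1/m over the choice of a, b
    return lambda k: ((a * k + b) % p) % m

h = make_universal_hash(m=128)
print(h(42), h(43))  # two slots in [0, 128)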
3 Heaps
Tree-based structure: in a max heap the largest element is at the root and for all nodes the parent is greater than or equal to the node; opposite for a min heap. Most operations run in O(lg n) time and the heap height is Θ(lg n). Max heaps are used for sorting.
The heap property is maintained by exchanging the value of an offending node with the larger of the values at its children, which may in turn lead to the subtree at that child not being a heap, which must then be recursively fixed. To sort: build the heap, then swap the first and last elements and max-heapify repeatedly at the root.
Heapsort is O(n lg n) like merge sort but in place like insertion sort; it is built from BuildMaxHeap, O(n), and a for loop running n - 1 times over MaxHeapify, O(lg n).

MaxHeapify(n) = T(largest) + Θ(1) (1)
largest <= 2n/3 (last row half full) (2)
T(n) <= T(2n/3) + Θ(1) => T(n) = O(lg n) (3)

def heapsort(lst):
    # build a max heap, then repeatedly move the max to the end and sift down
    for start in range((len(lst) - 2) // 2, -1, -1):
        siftdown(lst, start, len(lst) - 1)
    for end in range(len(lst) - 1, 0, -1):
        lst[end], lst[0] = lst[0], lst[end]
        siftdown(lst, 0, end - 1)
    return lst

def siftdown(lst, start, end):
    root = start
    while True:
        child = root * 2 + 1
        if child > end:
            break
        if child + 1 <= end and lst[child] < lst[child + 1]:
            child += 1  # take the larger child
        if lst[root] < lst[child]:
            lst[root], lst[child] = lst[child], lst[root]
            root = child
        else:
            break

3.1 BuildMaxHeap
Calls MaxHeapify on each element in a bottom-up manner.
Correctness:
Loop invariant: At the start of each iteration of the for loop, each node i+1, i+2, ..., n is the root of a max-heap.
Initialisation: Before the first iteration, i = ⌊n/2⌋; nodes ⌊n/2⌋+1, ⌊n/2⌋+2, ..., n are leaves, hence roots of trivial max heaps.
Maintenance: By the loop invariant, the subtrees at the children of node i are max heaps, hence MaxHeapify(i) renders node i a max heap root; decrementing i re-establishes the loop invariant for the next iteration.
The cost of MaxHeapify is O(h), there are <= ⌈n/2^(h+1)⌉ nodes of height h, and the height of the heap is ⌊lg n⌋, so

Σ_(h=0..⌊lg n⌋) O(n * h / 2^h) = O(n) (4)

4 Disjoint Sets
map: f(x) = y
relation: R ⊆ {(a, b) : a, b ∈ S}; can be defined by a boolean matrix
equivalence relation: i is equivalent to j if they belong to the same set
reflexivity: ∀a ∈ S, (a, a) ∈ R
symmetry: ∀a, b ∈ S, (a, b) ∈ R → (b, a) ∈ R
transitivity: ∀a, b, c ∈ S, (a, b) ∈ R ∧ (b, c) ∈ R → (a, c) ∈ R
make-set makes a new set whose only member is x, assuming x is not in another set already; O(1) using the linked-list method, by making a new linked list.
union takes two values and forms the union of their two sets, some implementations of which will choose a particular representative; we must destroy the original sets to ensure disjointness.
find-set returns a pointer to the representative of the set containing x; O(1) with the linked-list method, by following the pointer from x back to its set object and returning the member pointed to by head.

CONNECTED-COMPONENTS(G):
    for each vertex v in G.V
        make_set(v)
    for each edge (u,v) in G.E
        if find_set(u) != find_set(v)
            union(u,v)

SAME-COMPONENTS(u,v):
    return find_set(u) == find_set(v)

path_compression_find(i):   # O(log(n))
    if p[i] == i { return i; }
    else { p[i] = find(p[i]); return p[i]; }
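A minimal sketch of the linked-list representation described above (class and helper names are illustrative): each element points back to its set object, whose head is the representative, so make-set and find-set are O(1); union relinks the smaller list (the weighted-union heuristic).

class LLSet:
    def __init__(self, x):
        self.members = [x]        # head of the list is the representative

def make_set(pointer, x):
    pointer[x] = LLSet(x)         # O(1): a new list containing only x

def find_set(pointer, x):
    return pointer[x].members[0]  # O(1): follow the pointer back, return head

def union(pointer, x, y):
    sx, sy = pointer[x], pointer[y]
    if sx is sy:
        return
    if len(sx.members) < len(sy.members):  # append the shorter list
        sx, sy = sy, sx
    for z in sy.members:                   # relink every member of the smaller set
        pointer[z] = sx
    sx.members.extend(sy.members)          # the old set object is discarded

pointer = {}
for v in "abc": make_set(pointer, v)
union(pointer, "a", "b")
print(find_set(pointer, "b") == find_set(pointer, "a"))  # True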
5 Trees

5.1 Binary Trees
n is nodes, h is height, l is leaves

h + 1 <= n <= 2^(h+1) - 1 (5)
l = (n + 1)/2 (6)
perfect tree: n = 2l - 1 (7)
balanced: h = ⌊log(n + 1)⌋ (8)
perfect: l = 2^h, n = 2^(h+1) - 1 (9)

5.2 Binary Search Trees
Search, Insert, and Delete are all Θ(h), where h is the height.
Each node contains a key and two subtrees, which are both binary search trees; the left subtree is strictly less than the node, the right subtree strictly greater.
The shape can degenerate through poor insertions. Comparison computation is required to traverse the tree: balanced tree Θ(log n), unbalanced tree Θ(n).
BST sort is like heap sort but works from a binary search tree: BST-sort = Ω(n log n), O(n^2).

5.3 AVL Trees
A BST such that the heights of the two child subtrees differ by at most one.
Operations are O(log n) in the average and worst cases; properties follow balanced trees.
Uses tree rotations to re-establish the tree properties.
AVL-sort: worst case brought down to O(n log n).

5.4 Red-black trees
A variation on BSTs to ensure balance; O(lg n) worst case for operations.
Has an extra bit per node for the colour, red or black.
Every node is red or black, the root is black, nil leaves are black, if a node is red then its children are black, and for each node, all paths from the node to descendant leaves contain the same number of black nodes.
Black-height: the number of black nodes on the path from x to a leaf, including the leaf.

6 Greedy algorithms
Bellman-Ford's algorithm uses dynamic programming; Dijkstra's algorithm uses the greedy approach.

6.1 Scheduling problem
Proof of optimality: Let S = {1, 2, ..., n} be the set of activities ordered by finish time; thus activity 1 has the earliest finish time. Suppose A ⊆ S is an optimal solution, with the activities in A ordered by finish time. Suppose that the first activity in A is k != 1, that is, this optimal solution does not start with the "greedy choice." We want to show that there is another solution B that begins with the greedy choice, activity 1. Let B = (A \ {k}) ∪ {1}. Because f_1 <= f_k, the activities in B are disjoint, and since B has the same number of activities as A, i.e., |A| = |B|, B is also optimal. Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem: if A is an optimal solution to the original problem S, then A' = A \ {1} is an optimal solution to the activity-selection problem S' = {i ∈ S : s_i >= f_1}. Why? If we could find a solution B' to S' with more activities than A', adding 1 to B' would yield a solution B to S with more activities than A, contradicting the optimality.

6.2 Huffman encoding

6.3 Dijkstra's Substructure
Optimal substructure means simply that any subpath of a shortest path is itself a shortest path, and the shortest-path length between some u and v is less than or equal to the length of the shortest path from u to x plus that from x to v (triangle inequality).

6.4 Differences Between Bellman-Ford and Dijkstra
The Bellman-Ford algorithm is a single-source shortest path algorithm which allows for negative edge weights and can detect negative cycles in a graph. Dijkstra's algorithm is also a single-source shortest path algorithm, but the weights of all edges must be non-negative. As far as the total cost is concerned, there is no difference when all edges in the graph have non-negative weight. However, Dijkstra's algorithm is usually used, since the typical implementation with a binary heap has Θ((|E| + |V|) log |V|) time complexity, while the Bellman-Ford algorithm has O(|V||E|) complexity. If there is more than one path with minimum cost, the actual path returned is implementation dependent (even for the same algorithm).

6.5 Making Change with Coins
The greedy algorithm gives as many of the highest-value coins as possible before moving to coins of decreasing value, in order. Correctness: for US coins this is optimal, as an optimal solution must have at most 2 dimes (otherwise 3 dimes can be replaced with a quarter and a nickel); if there are 2 dimes then no nickels (otherwise we can replace 2 dimes and 1 nickel with a quarter); at most 1 nickel (otherwise we can replace 2 nickels with a dime); and at most 4 pennies (otherwise we can replace 5 pennies with a nickel).
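A minimal sketch of the greedy change-making rule from 6.5, assuming US coin denominations:

def greedy_change(amount, coins=(25, 10, 5, 1)):
    # take as many of the highest-value coin as possible,
    # then move to coins of decreasing value
    result = {}
    for c in coins:
        result[c], amount = divmod(amount, c)
    return result

print(greedy_change(67))  # {25: 2, 10: 1, 5: 1, 1: 2}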
6.6 Rental Car Problem
Let c(i, j) be the optimal cost of driving from agency i to j; the optimal cost of reaching agency j then satisfies OPT(j) = min over i < j of OPT(i) + c(i, j).

6.7 Lecture Hall Assignment Problem
Given a set of activities, schedule them all among lecture halls using minimal lecture halls. To determine which activity should use which lecture hall, the algorithm uses GREEDY-ACTIVITY-SELECTOR to calculate the activities in the first lecture hall. If there are some activities yet to be scheduled, a new lecture hall is selected and GREEDY-ACTIVITY-SELECTOR is called again. This continues until all activities have been scheduled.
The algorithm can be shown to be correct and optimal. For a contradiction, assume the number of lecture halls is not optimal, that is, the algorithm allocates more halls than necessary. Then there exists a set of activities B which have been wrongly allocated: an activity b belonging to B which has been allocated to hall H[i] should have optimally been allocated to H[k]. This implies that the activities for lecture hall H[k] have not been allocated optimally, contradicting that GREEDY-ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR runs in Θ(n); the running time of this algorithm is O(n^2).
Observe that choosing the activity of least duration will not always produce an optimal solution. For example, given the activities (3, 5), (6, 8), (1, 4), (4, 7), (7, 10), either (3, 5) or (6, 8) will be picked first, which will prevent the optimal solution (1, 4), (4, 7), (7, 10) from being found. Also observe that choosing the activity with the least overlap will not always produce an optimal solution. For example, given the activities (0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0, 3), (0, 2), (7, 10), (8, 10), the one with the least overlap with other activities is (4, 6), so it will be picked first; but that prevents the optimal solution (0, 1), (1, 5), (5, 9), (9, 10) from being found.

7 Graphs

7.1 Edge Classification
Tree edge: an edge of the DFS forest; (u, v) is a tree edge when v is White as the edge is explored.
Back edge: joins a vertex to an ancestor (a Grey vertex) in its DFS tree.
Forward edge: a non-tree edge joining a vertex to one of its descendants.
Cross edge: any other edge (between subtrees, or to an already-finished vertex that is not a descendant).
Light edge: a minimum-weight edge crossing a cut.
White vertex: unvisited by the search.
Grey vertex: vertex in progress.
Black vertex: DFS (or another algorithm) has finished processing the vertex.

7.2 Depth First Search
Explores as deep as possible along each branch before backtracking; Θ(V + E).

7.3 Breadth First Search
Explores the vertices level by level, in order of distance from the source; Θ(V + E).

7.4 Parenthesis Theorem
For every two vertices u and v, exactly one of the following conditions holds: the intervals [s[u], f[u]] and [s[v], f[v]] are disjoint, or one interval contains the other (either s[u] < s[v] < f[v] < f[u] or s[v] < s[u] < f[u] < f[v]).
To prove the theorem it suffices to prove that if s[u] < s[v] < f[u] then s[u] < s[v] < f[v] < f[u] (and similarly, if s[v] < s[u] < f[v] then s[v] < s[u] < f[u] < f[v]). So suppose that s[u] < s[v] < f[u]. In this case, at time s[v], when v is colored Grey and pushed onto the stack, u is on the stack and has color Grey. Thus u is never added again to the stack, and therefore it can only become Black after this occurrence of v is taken out and v is colored Black. This means f[v] < f[u].

7.5 White Path Theorem
In a DFS forest of a (directed or undirected) graph G, vertex v is a descendant of vertex u if and only if at time s[u] (just before u is colored Grey) there is a path from u to v that consists of only White vertices.
Proof: There are two directions to prove. (forward) Suppose that v is a descendant of u. So there is a path in the tree from u to v (which is of course also a path in G). All vertices w on this path are also descendants of u, so by the corollary (see 11.4) they are colored Grey during the interval [s[u], f[u]]. In other words, at time s[u] they are all White. (backward) Suppose that there is a White path from u to v at time s[u]; let this path be v_0 = u, v_1, v_2, ..., v_(k-1), v_k = v. To show that v is a descendant of u, we will indeed show that all v_i (for 0 <= i <= k) are descendants of u. (Note that this path may not be in the DFS tree.) We prove this claim by induction on i. Base case: i = 0, v_i = u, so the claim is obviously true. Induction step: Suppose that v_i is a descendant of u; we show that v_(i+1) is also a descendant of u. By the corollary, this is equivalent to showing that s[u] < s[v_(i+1)] < f[v_(i+1)] < f[u], i.e., v_(i+1) is colored Grey during the interval [s[u], f[u]]. Since v_(i+1) is White at time s[u], we have s[u] < s[v_(i+1)]. Now, since v_(i+1) is a neighbor of v_i, v_(i+1) cannot stay White after v_i is colored Black; in other words, s[v_(i+1)] < f[v_i]. Applying the induction hypothesis (v_i is a descendant of u, so s[u] <= s[v_i] < f[v_i] <= f[u]), we obtain s[v_(i+1)] < f[u]. Thus s[u] < s[v_(i+1)] < f[v_(i+1)] < f[u] by the Parenthesis Theorem. QED.
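A minimal DFS sketch recording the discovery/finish times s[u] and f[u] used by the Parenthesis and White Path Theorems; the adjacency-dict representation and function name are illustrative:

def dfs_times(graph):
    s, f, time = {}, {}, [0]   # discovery s[u] and finish f[u] timestamps
    def visit(u):              # u becomes Grey here
        time[0] += 1
        s[u] = time[0]
        for v in graph.get(u, ()):
            if v not in s:     # v is White: (u, v) is a tree edge
                visit(v)
        time[0] += 1           # u becomes Black here
        f[u] = time[0]
    for u in graph:
        if u not in s:
            visit(u)
    return s, f

s, f = dfs_times({'u': ['v'], 'v': ['w'], 'w': []})
print(s, f)  # intervals nest: s[u] < s[v] < s[w] < f[w] < f[v] < f[u]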
7.6 Directed Acyclic Graph
Good for partial orders, as a > b ∧ b > c => a > c, but we can also have neither a > b nor b > a. A total ordering can always be made from a partial order.
A DAG may not have back edges.
Proof (a back edge means a cycle): suppose there is a back edge (u, v); then v is an ancestor of u in the depth-first forest, therefore there is a path from v to u, so v to u to v is a cycle.
Proof (a graph is acyclic iff a DFS of G yields no back edges): given a cycle c in G, let v be the first vertex discovered in c, and let (u, v) be the preceding edge in c. At time d[v] the vertices of c form a white path from v to u; then by the white path theorem, u is a descendant of v in the depth-first forest, and therefore (u, v) is a back edge.

7.6.1 Topological sort
Performed on a DAG. It is a linear ordering of the vertices such that if (u, v) ∈ E then u appears somewhere before v.
1. Call DFS to compute finishing times for all vertices; as each vertex finishes, insert it onto the front of a linked list; return the linked list of vertices.
Time: Θ(V + E)
Proof of correctness: show that if (u, v) ∈ E then f[v] < f[u]. When we explore (u, v), consider the colors of u and v: u is grey. v cannot be grey, because then v would be an ancestor of u, making (u, v) a back edge, a contradiction in a DAG. If v is white, it becomes a descendant of u and the parenthesis theorem gives f[v] < f[u]. If v is black, then v is already finished, therefore f[v] < f[u].
A topological sort (sometimes abbreviated topsort or toposort) or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering.

from collections import deque

GRAY, BLACK = 0, 1

def topological(graph):
    order, enter, state = deque(), set(graph), {}

    def dfs(node):
        state[node] = GRAY
        for k in graph.get(node, ()):
            sk = state.get(k, None)
            if sk == GRAY:
                raise ValueError("cycle")
            if sk == BLACK:
                continue
            enter.discard(k)
            dfs(k)
        order.appendleft(node)
        state[node] = BLACK

    while enter:
        dfs(enter.pop())
    return order

7.7 Minimum Spanning Tree
Has |V| - 1 edges, has no cycles, and may not be unique.
Can be calculated by an algorithm that adds edges to a growing set A such that the added edges are always safe for A.
Loop invariant: A is a subset of some MST. It initialises with the empty set, which satisfies the invariant; it is maintained by adding only safe edges; at termination all edges added to A are in an MST, so the spanning tree grown from A must be an MST.
A cut partitions the vertices into two disjoint sets. A cut respects A if and only if no edge in A crosses the cut. A light edge crosses the cut with minimum weight.
Safe edge: if the cut respects A and (u, v) is a light edge crossing (S, V - S), then (u, v) is safe for A.
Proof of safe edge: Let T be an MST that includes A. Case 1: the edge (u, v) is in T and we are done. Case 2: (u, v) is not in T; the path from u to v in T then contains some edge (x, y) crossing the cut. Let T' = T - {(x, y)} ∪ {(u, v)}. Because (u, v) is light for the cut, w(u, v) <= w(x, y), thus w(T') = w(T) - w(x, y) + w(u, v) <= w(T), hence T' is also an MST, so (u, v) is safe for A.
Corollary: if (u, v) is a light edge connecting one connected component in (V, A) to another connected component in (V, A), then (u, v) is safe for A.

7.7.1 Kruskal's Algorithm
Basic algorithm: sort all edges in increasing order by weight. Pick the smallest edge and check whether it forms a cycle with the spanning tree formed so far: if no cycle, include it, else discard it. Repeat until there are (V - 1) edges in the spanning tree.
Starts with each vertex in its own component and repeatedly merges two components into one by choosing a light edge that connects them; scans the set of edges in monotonically increasing order by weight; uses a disjoint-set data structure to determine whether an edge connects vertices in different components.
The first for loop is O(V), sorting E is O(E lg E), and the second for loop performs O(E) finds and unions. Assuming union by rank and path compression, this is O((V + E) α(V)) + O(E lg E). Since G is connected, |E| >= |V| - 1, which gives O(E α(V)) + O(E lg E); α(|V|) = O(lg V) = O(lg E), so the total time is O(E lg E). Since |E| <= |V|^2, lg E = O(2 lg V) = O(lg V), giving O(E lg V) time.

parent = dict()
rank = dict()

def make_set(vertice):
    parent[vertice] = vertice
    rank[vertice] = 0

def find(vertice):
    if parent[vertice] != vertice:
        parent[vertice] = find(parent[vertice])  # path compression
    return parent[vertice]

def union(vertice1, vertice2):
    root1 = find(vertice1)
    root2 = find(vertice2)
    if root1 != root2:
        if rank[root1] > rank[root2]:  # union by rank
            parent[root2] = root1
        else:
            parent[root1] = root2
            if rank[root1] == rank[root2]:
                rank[root2] += 1

def kruskal(graph):
    for vertice in graph['vertices']:
        make_set(vertice)
    minimum_spanning_tree = set()
    edges = list(graph['edges'])
    edges.sort()  # increasing weight: edges are (weight, u, v) tuples
    for edge in edges:
        weight, vertice1, vertice2 = edge
        if find(vertice1) != find(vertice2):
            union(vertice1, vertice2)
            minimum_spanning_tree.add(edge)
    return minimum_spanning_tree
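Usage of the kruskal function above on a small example graph (the dict shape with 'vertices' and (weight, u, v) edge tuples is what the code expects):

graph = {
    'vertices': ['A', 'B', 'C', 'D'],
    'edges': set([
        (1, 'A', 'B'),
        (5, 'A', 'C'),
        (3, 'B', 'C'),
        (4, 'B', 'D'),
        (2, 'C', 'D'),
    ]),
}
print(kruskal(graph))  # {(1, 'A', 'B'), (2, 'C', 'D'), (3, 'B', 'C')}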
Proof (Kruskal): We are given a graph G = (V, E), with costs on the edges, and we want to find a spanning tree of minimum cost. We use Kruskal's algorithm, which sorts the edges in order of increasing cost, and tries to add them in that order, leaving edges out only if they create a cycle with the previously selected edges. Let T = (V, F) be the spanning tree produced by Kruskal's algorithm, and let T* = (V, F*) be a minimum spanning tree. If T is not optimal then F != F*, and there is an edge e ∈ F* such that e ∉ F. Then e creates a cycle C in the graph T + e, and at least one edge f of this cycle crosses the cut defined by removing e from T*. Furthermore, e ∉ F because we tried to add it after the rest of C was already in the tree; then it was the most expensive edge in C, and so cost(f) <= cost(e). If we add the edge f to the graph T* - e, then we reconnect the graph and create a spanning tree. Also, cost(T* - e + f) = cost(T*) - cost(e) + cost(f) <= cost(T*), and so we have created a new spanning tree of no more cost than T*, but with one more edge in common with T. We can do this for every edge that differs between T and T*, until we obtain the tree T, of no more cost than T*, contradicting that T was not optimal.

7.7.2 Prim's Algorithm
1) Create a set mstSet that keeps track of the vertices already included in the MST. 2) Assign a key value to all vertices in the input graph: initialize all key values as INFINITE, and assign key value 0 to the first vertex so that it is picked first. 3) While mstSet doesn't include all vertices: a) pick a vertex u not in mstSet with minimum key value; b) include u in mstSet; c) update the key values of all vertices adjacent to u: for every adjacent vertex v, if the weight of edge u-v is less than the previous key value of v, update the key value to the weight of u-v.
A greedy algorithm that finds a minimum spanning tree for a connected weighted undirected graph. It builds on a tree, so A is always a tree; it starts from an arbitrary "root" r, and at each step adds a light edge crossing the cut (V_A, V - V_A) to A, where V_A is the set of vertices that A is incident on.
Using different data structures for representing the graph and linearly searching the array of weights to find the minimum-weight edge: adjacency matrix O(|V|^2); binary heap and adjacency list O((|V| + |E|) log |V|) = O(|E| log |V|); Fibonacci heap and adjacency list O(|E| + |V| log |V|). In the method that uses binary heaps, the traversal is executed O(V + E) times (similar to BFS), and each traversal has an operation which takes O(log V) time, so the overall time complexity is O((E + V) * log V) = O(E log V) (for a connected graph, V = O(E)).
Proof: Let P be a connected, weighted graph. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since P is connected, there will always be a path to every vertex. The output Y of Prim's algorithm is a tree, because the edge and vertex added to tree Y are connected. Let Y1 be a minimum spanning tree of graph P. If Y1 = Y then Y is a minimum spanning tree. Otherwise, let e be the first edge added during the construction of tree Y that is not in tree Y1, and let V be the set of vertices connected by the edges added before edge e. Then one endpoint of edge e is in set V and the other is not. Since tree Y1 is a spanning tree of graph P, there is a path in tree Y1 joining the two endpoints. As one travels along the path, one must encounter an edge f joining a vertex in set V to one that is not in set V. Now, at the iteration when edge e was added to tree Y, edge f could also have been added, and it would have been added instead of edge e if its weight was less than e; since edge f was not added, we conclude that w(f) >= w(e). Let tree Y2 be the graph obtained by removing edge f from, and adding edge e to, tree Y1. It is easy to show that tree Y2 is connected, has the same number of edges as tree Y1, and the total weight of its edges is not larger than that of tree Y1; therefore it is also a minimum spanning tree of graph P, and it contains edge e and all the edges added before it during the construction of set V. Repeat the steps above and we will eventually obtain a minimum spanning tree of graph P that is identical to tree Y. This shows Y is a minimum spanning tree.

import sys
from pythonds.graphs import PriorityQueue

def prim(G, start):
    pq = PriorityQueue()
    for v in G:
        v.setDistance(sys.maxsize)
        v.setPred(None)
    start.setDistance(0)
    pq.buildHeap([(v.getDistance(), v) for v in G])
    while not pq.isEmpty():
        currentVert = pq.delMin()
        for nextVert in currentVert.getConnections():
            newCost = currentVert.getWeight(nextVert) \
                    + currentVert.getDistance()
            if nextVert in pq and newCost < nextVert.getDistance():
                nextVert.setPred(currentVert)
                nextVert.setDistance(newCost)
                pq.decreaseKey(nextVert, newCost)

7.8 Shortest Path Algorithms
Shortest paths are not necessarily unique.
Single-source shortest paths: find the shortest paths from a given source vertex to every vertex v ∈ V.
A modified breadth first search can be used to count the number of edge traversals needed to reach another vertex.

7.8.1 Dijkstra's Algorithm
(Properties are summarized in 9.2; code is under 7.8.3.)

7.8.2 Bellman-Ford Algorithm
Like Dijkstra's algorithm, Bellman-Ford is based on the principle of relaxation, in which an approximation to the correct distance is gradually replaced by more accurate values until eventually reaching the optimum solution. Dijkstra's algorithm greedily selects the minimum-weight node that has not yet been processed, and performs this relaxation process on all of its outgoing edges; by contrast, the Bellman-Ford algorithm simply relaxes all the edges, and does this |V| - 1 times, where |V| is the number of vertices in the graph. In each of these repetitions, the number of vertices with correctly calculated distances grows, from which it follows that eventually all vertices will have their correct distances. This method allows the Bellman-Ford algorithm to be applied to a wider class of inputs than Dijkstra. Bellman-Ford runs in O(|V| * |E|) time, where |V| and |E| are the number of vertices and edges respectively.
Correctness: Lemma: after i repetitions of the for loop: (a) if Distance(u) is not infinity, it is equal to the length of some path from s to u; (b) if there is a path from s to u with at most i edges, then Distance(u) is at most the length of the shortest path from s to u with at most i edges. Proof: For the base case of the induction, consider i = 0 and the moment before the for loop is executed for the first time. Then, for the source vertex, source.distance = 0, which is correct. For other vertices u, u.distance = infinity, which is also correct because there is no path from source to u with 0 edges. For the inductive case, we first prove part (a). Consider a moment when a vertex's distance is updated by v.distance := u.distance + uv.weight. By the inductive assumption, u.distance is the length of some path from source to u; then u.distance + uv.weight is the length of the path from source to v that follows the path from source to u and then goes to v. For part (b), consider the shortest path from source to u with at most i edges. Let v be the last vertex before u on this path. Then the part of the path from source to v is the shortest path from source to v with at most i - 1 edges. By the inductive assumption, v.distance after i - 1 iterations is at most the length of this path. Therefore, uv.weight + v.distance is at most the length of the path from s to u. In the i-th iteration, u.distance gets compared with uv.weight + v.distance, and is set equal to it if uv.weight + v.distance is smaller. Therefore, after i iterations, u.distance is at most the length of the shortest path from source to u that uses at most i edges. If there are no negative-weight cycles, then every shortest path visits each vertex at most once, so at step 3 no further improvements can be made. Conversely, suppose no improvement can be made. Then for any cycle with vertices v[0], ..., v[k-1]: v[i].distance <= v[(i-1) mod k].distance + v[(i-1) mod k]v[i].weight. Summing around the cycle, the v[i].distance and v[(i-1) mod k].distance terms cancel, leaving 0 <= sum from i = 1 to k of v[(i-1) mod k]v[i].weight, i.e., every cycle has nonnegative weight.

7.8.3 Code

from pythonds.graphs import PriorityQueue, Graph, Vertex

def dijkstra(aGraph, start):
    pq = PriorityQueue()
    start.setDistance(0)
    pq.buildHeap([(v.getDistance(), v) for v in aGraph])
    while not pq.isEmpty():
        currentVert = pq.delMin()
        for nextVert in currentVert.getConnections():
            newDist = currentVert.getDistance() \
                    + currentVert.getWeight(nextVert)
            if newDist < nextVert.getDistance():
                nextVert.setDistance(newDist)
                nextVert.setPred(currentVert)
                pq.decreaseKey(nextVert, newDist)

## BELLMAN FORD
# Step 1: for each node prepare the destination and predecessor
def bel_initialize(graph, source):
    d = {}  # stands for destination
    p = {}  # stands for predecessor
    for node in graph:
        d[node] = float('Inf')  # we start by admitting that the rest
                                # of the nodes are very, very far
        p[node] = None
    d[source] = 0  # for the source we know how to reach it
    return d, p

def bel_relax(node, neighbour, graph, d, p):
    if d[neighbour] > d[node] + graph[node][neighbour]:
        # Record this lower distance
        d[neighbour] = d[node] + graph[node][neighbour]
        p[neighbour] = node

def bellman_ford(graph, source):
    d, p = bel_initialize(graph, source)
    for i in range(len(graph) - 1):  # run this until it converges
        for u in graph:
            for v in graph[u]:  # for each neighbour of u
                bel_relax(u, v, graph, d, p)
    # Step 3: check for negative-weight cycles
    for u in graph:
        for v in graph[u]:
            assert d[v] <= d[u] + graph[u][v]
    return d, p

8 Bipartite Graphs
A graph is bipartite iff it does not contain an odd cycle.
Run DFS and build a DFS tree of the graph, then colour the vertices red and black, alternating level by level; if all non-tree edges join vertices of different colours, then the graph is bipartite.
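A minimal sketch of the two-colouring test just described, using BFS layers rather than the DFS colouring (the adjacency-dict shape is an assumption); it returns False as soon as some edge joins two vertices of the same colour, i.e. an odd cycle exists:

from collections import deque

def is_bipartite(graph):
    colour = {}
    for s in graph:
        if s in colour:
            continue
        colour[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]   # alternate colours by level
                    q.append(v)
                elif colour[v] == colour[u]:    # odd cycle found
                    return False
    return True

print(is_bipartite({0: [1], 1: [0, 2], 2: [1]}))        # True (a path)
print(is_bipartite({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # False (a triangle)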
8.1 Gale Shapley
Correctness: the algorithm must terminate after at most n^2 iterations, because each time through the while loop a man proposes to a new woman, and there are only n^2 possible proposals. All men and women get matched: given a man and a woman who are both unmatched, the woman must never have been proposed to; but since the man must propose to everyone, there is a contradiction. The pairs must be stable: supposing an unstable pair gives the contradiction that either the man never proposed to the woman (although men propose, in order, to everyone they prefer to their final match), or she was proposed to by him but rejected him for a partner she prefers. In either case the pairing is stable, which makes a contradiction.
All executions yield a man-optimal assignment, which is a stable matching. Suppose some man is paired with someone other than his best valid partner. Men propose in decreasing order of preference => some man is rejected by a valid partner. Let Y be the first such man, and let A be the first valid woman that rejects him. Let S be a stable matching where A and Y are matched. In building the matching, when Y is rejected, A forms (or reaffirms) an engagement with a man, say Z, whom she prefers to Y. Let B be Z's partner in S. In building the matching, Z is not rejected by any valid partner at the point when Y is rejected by A. Thus, Z prefers A to B. But A prefers Z to Y. Thus A-Z is unstable in S, a contradiction.

import copy

def matchmaker():
    # assumes globals: guys, guyprefers, galprefers (preference lists)
    guysfree = guys[:]
    engaged = {}
    guyprefers2 = copy.deepcopy(guyprefers)
    galprefers2 = copy.deepcopy(galprefers)
    while guysfree:
        guy = guysfree.pop(0)
        guyslist = guyprefers2[guy]
        gal = guyslist.pop(0)
        fiance = engaged.get(gal)
        if not fiance:
            # She's free
            engaged[gal] = guy
            print(" %s and %s" % (guy, gal))
        else:
            # The bounder proposes to an engaged lass!
            galslist = galprefers2[gal]
            if galslist.index(fiance) > galslist.index(guy):
                # She prefers the new guy
                engaged[gal] = guy
                print(" %s dumped %s for %s" % (gal, fiance, guy))
                if guyprefers2[fiance]:
                    # Ex has more girls to try
                    guysfree.append(fiance)
            else:
                # She is faithful to her old fiance
                if guyslist:
                    # Look again
                    guysfree.append(guy)
    return engaged

9 Network Flow
capacity constraint: ∀u, v ∈ V, f(u, v) <= c(u, v)
skew symmetry: ∀u, v ∈ V, f(u, v) = -f(v, u)
flow conservation: ∀u ∈ V - {s, t}, Σ_(v ∈ V) f(u, v) = 0

9.1 Ford-Fulkerson Algorithm
Finding a path from s to t takes O(|E|) by either BFS or DFS, and the flow increases by at least 1 at each iteration, so the algorithm runs in O(E * f), where f is the maximum flow in the graph.
[Figure: an example flow network with source s, internal vertices v1-v4 on edges e1-e3, and sink t.]

class Edge(object):
    def __init__(self, u, v, w):
        self.source = u
        self.sink = v
        self.capacity = w

    def __repr__(self):
        return "%s->%s:%s" % (self.source, self.sink, self.capacity)

class FlowNetwork(object):
    def __init__(self):
        self.adj = {}
        self.flow = {}

    def add_vertex(self, vertex):
        self.adj[vertex] = []

    def get_edges(self, v):
        return self.adj[v]

    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u, v, w)
        redge = Edge(v, u, 0)  # residual edge
        edge.redge = redge
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)
        self.flow[edge] = 0
        self.flow[redge] = 0

    def find_path(self, source, sink, path):
        if source == sink:
            return path
        for edge in self.get_edges(source):
            residual = edge.capacity - self.flow[edge]
            if residual > 0 and edge not in path:
                result = self.find_path(edge.sink, sink, path + [edge])
                if result != None:
                    return result

    def max_flow(self, source, sink):
        path = self.find_path(source, sink, [])
        while path != None:
            residuals = [edge.capacity - self.flow[edge] for edge in path]
            flow = min(residuals)  # bottleneck of the augmenting path
            for edge in path:
                self.flow[edge] += flow
                self.flow[edge.redge] -= flow
            path = self.find_path(source, sink, [])
        return sum(self.flow[edge] for edge in self.get_edges(source))

9.2 Dijkstra's Algorithm
Does not allow negative weights.
Uses a priority queue whose keys are shortest-path weights.
Similar to Prim's algorithm, but computing d[v] using shortest-path weights as keys.
At each step we make the greedy choice of the light edge.

9.3 Bellman Ford
Allows negative-weight edges.
Returns TRUE if no negative-weight cycle is reachable from s, FALSE otherwise: if it has not converged after |V(G)| - 1 iterations, then there cannot be a shortest-path tree, so there must be a negative-weight cycle.
When the algorithm is used to find shortest paths, the existence of negative cycles is a problem, preventing the algorithm from finding a correct answer. However, since it terminates upon finding a negative cycle, the Bellman-Ford algorithm can be used for applications in which this is the target to be sought, for example in cycle-cancelling techniques in network flow analysis.
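The assert in bellman_ford under 7.8.3 aborts on a negative cycle; the TRUE/FALSE behaviour described in 9.3 can be recovered with a small wrapper. A sketch reusing the bel_* helpers above (the function name is illustrative):

def bellman_ford_detect(graph, source):
    # returns (d, p, True) if no negative-weight cycle is reachable
    # from source, else (d, p, False)
    d, p = bel_initialize(graph, source)
    for i in range(len(graph) - 1):
        for u in graph:
            for v in graph[u]:
                bel_relax(u, v, graph, d, p)
    for u in graph:
        for v in graph[u]:
            if d[v] > d[u] + graph[u][v]:  # still relaxable => negative cycle
                return d, p, False
    return d, p, True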
10 Dynamic programming

10.1 Best alignment
Levenshtein distance: the alignment with the minimal number of substitutions, insertions, and deletions is considered optimal (a minimal DP sketch follows).
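A minimal DP sketch for 10.1, assuming unit cost for substitution, insertion, and deletion (the classic O(nm) table):

def levenshtein(a, b):
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i            # i deletions
    for j in range(m + 1):
        dist[0][j] = j            # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution/match
    return dist[n][m]

print(levenshtein("kitten", "sitting"))  # 3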
11 Some Proofs

11.1 Minimum spanning trees
1. Show that if an edge (u, v) is the unique light edge crossing some cut of the connected, weighted undirected graph G, then (u, v) must be included in all minimum spanning trees of G.
Solution: Suppose for contradiction that there is some MST T that does not contain the light edge (u, v) crossing a cut C. Because T is a connected tree, there is some path in T connecting the vertices u and v, and therefore adding the edge (u, v) to T forms a cycle. Following this cycle, there must be at least one other edge in the cycle crossing the cut C. Removing this edge, and leaving (u, v), we now have a tree with a lower total weight than the original T. Because T was assumed to be an MST, this is a contradiction. Therefore, any unique light edge of a cut must be part of every MST.
2. Suppose that we have a connected, weighted undirected graph G = (V, E) such that every cut of G has a unique light edge crossing the cut. Show that G has exactly one minimum spanning tree.
Solution: From proof 1 above we know that for every cut of G, the unique light edge crossing the cut must be part of every MST of G. Consider the set T of edges, containing the unique light edge crossing each cut C of G. By the previous part, we know that T is a subset of every MST. Now we can show that T must be equal to every MST. Because there is exactly one edge of T crossing every cut of G, the graph (V, T) must have exactly one connected component: if there were two unconnected components V1 and V - V1, then the cut (V1, V - V1) would not be crossed by any edge of T, which contradicts the definition of T. In addition, T cannot contain any cycles: the heaviest edge on a cycle in T cannot be the unique lightest edge in any cut. Then T itself must be a spanning tree of G. Because T is a subset of all minimum spanning trees of G, it must be the unique MST of G, as any additional edge added to T would form a cycle, making it no longer a tree.
3. Given any connected undirected graph G with positive edge weights w, does there always exist a shortest path tree S such that S is a minimum spanning tree of G?
Solution: counterexample. Draw a square with an X in it: the outside edges have weight 2 and the diagonal inner edges have weight 3. The minimum spanning trees are any three of the weight-2 edges, while the shortest path tree from each corner is the star of edges incident to that corner, which uses a weight-3 diagonal to reach the opposite corner (3 < 2 + 2); no MST uses a diagonal.
4. Does there exist some connected undirected graph G with positive edge weights w such that G has a shortest path tree S and a minimum spanning tree T that do not share any edges? Prove your answer.
Solution: Assuming that G has more than one vertex, there is no such graph. Given any source vertex s, consider the light edges leaving s, defined as the edges (s, v) of minimum weight (where v is some neighbor of s).
5. Lemma: Every MST of G must contain at least one of these light edges.
Proof: Suppose there is some MST T containing none of the light edges leaving s. Consider any light edge (s, u). There is some path from s to u in T, and therefore adding (s, u) to T forms a cycle. There must be some other edge (s, v) in the cycle such that w(s, u) < w(s, v), as we have assumed that there are no light edges leaving s contained in T. Removing (s, v) from T and replacing it with (s, u) forms a spanning tree of smaller total weight than T, contradicting our assumption that T is an MST.
6. Lemma: All of the light edges leaving s must be contained in all shortest path trees rooted at s.
Proof: If (s, u) is a light edge leaving s, then because G has positive edge weights, the single edge (s, u) must be the unique shortest path from s to u: any other path from s to u must go through some edge (s, v), and by definition we know that w(s, v) >= w(s, u); therefore, all other paths from s to u must have strictly greater weight than the single edge (s, u).
These two lemmas together show that for any source node s, all shortest path trees from s must share some edge with all MSTs of G, so there is no shortest path tree S and MST T that share no edges.
7. Either G has some path of length at least k, or G has O(kn) edges.
Proof: look at the longest path in the DFS tree. If it has length at least k, we're done. Otherwise, since each edge connects an ancestor and a descendant, we can bound the number of edges by counting the total number of ancestors of each descendant; but if the longest path is shorter than k, each descendant has at most k - 1 ancestors. So there can be at most (k - 1)n edges.

11.2 Shortest Paths
Any subpath of a shortest path is a shortest path: suppose some path p is the shortest path from u to v, and x and y lie on it (so they also have a path between them). Suppose there were a shorter path from x to y; then there would be a shorter path from u to v, which contradicts the hypothesis that p is the shortest possible path.

11.3 Graphs
1. A graph is bipartite iff it does not contain an odd cycle.
DECIDE IF THIS IS NECESSARY

11.4 DFS
Corollary: For v != u, v is a descendant of u in a DFS tree if and only if s[u] < s[v] < f[v] < f[u].
Proof: This is an "if and only if" statement, so we must prove two directions. (back) The condition s[u] < s[v] < f[v] < f[u] implies that v is added to the stack during the time u is Grey. When u is Grey, only descendants of u are added to the stack; therefore v is a descendant of u. (forward) For this direction, it suffices to show that if u is the parent of v in the DFS tree, then s[u] < s[v] < f[v] < f[u]. Suppose that u is the parent of v; then u is the last vertex that causes v to be added on the stack. So at time s[u] v is White, i.e., s[u] < s[v]. Furthermore, u is not added to the stack after s[u]. So when v is colored Black, u is still on the stack and has not yet been colored Black. Hence f[v] < f[u]. QED

11.5 BFS
Vertices in order: The proof that vertices are listed in this order by breadth first search goes by induction on the level number. By the induction hypothesis, BFS lists all vertices at level k - 1 before those at level k. Therefore it will place into L all vertices at level k before all those of level k + 1, and therefore it lists those of level k before those of level k + 1.

11.6 Code

def dfs(graph, start):
    # graph maps a vertex to the set of its neighbours
    visited, stack = set(), [start]
    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            stack.extend(graph[vertex] - visited)
    return visited

def dfs_paths(graph, start, goal):
    stack = [(start, [start])]
    while stack:
        (vertex, path) = stack.pop()
        for next in graph[vertex] - set(path):
            if next == goal:
                yield path + [next]
            else:
                stack.append((next, path + [next]))

def bfs(graph, start):
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)
    return visited

12 Divide and Conquer

12.1 Merge Sort
Recurrence analysis: T(n) = 2T(n/2) + Θ(n) => T(n) = Θ(n log n) (master theorem case 2; see 16).

def mergesort(arr):
    if len(arr) <= 1:
        return arr
    m = len(arr) // 2
    l = mergesort(arr[:m])
    r = mergesort(arr[m:])
    if not len(l) or not len(r):
        return l or r
    result = []
    i = j = 0
    while len(result) < len(r) + len(l):
        if l[i] < r[j]:
            result.append(l[i])
            i += 1
        else:
            result.append(r[j])
            j += 1
        # one side exhausted: copy the rest of the other side
        if i == len(l) or j == len(r):
            result.extend(l[i:] or r[j:])
            break
    return result
12.2 Karatsuba Multiplication

_CUTOFF = 64  # assumed threshold: below this, fall back to builtin multiplication

def multiply(x, y):
    if x.bit_length() <= _CUTOFF or y.bit_length() <= _CUTOFF:
        return x * y
    else:
        n = max(x.bit_length(), y.bit_length())
        half = (n + 32) // 64 * 32  # split point, rounded to a multiple of 32 bits
        mask = (1 << half) - 1
        xlow = x & mask
        ylow = y & mask
        xhigh = x >> half
        yhigh = y >> half
        # three recursive multiplications instead of four
        a = multiply(xhigh, yhigh)
        b = multiply(xlow + xhigh, ylow + yhigh)
        c = multiply(xlow, ylow)
        d = b - a - c
        return (((a << half) + d) << half) + c

12.3 Block Matrix Multiplication

12.4 Strassen's Algorithm
The Strassen algorithm, named after Volker Strassen, is an algorithm used for matrix multiplication. It is faster than the standard matrix multiplication algorithm and is useful in practice for large matrices, but it would be slower than the fastest known algorithms for extremely large matrices.
It assumes the matrices are square, with size a power of two (padding should be used if needed). This restriction allows the matrices to be split in half, recursively, until the limit of scalar multiplication is reached.
The number of additions and multiplications required in the Strassen algorithm can be calculated as follows: let f(n) be the number of operations for a 2^n x 2^n matrix. Then by recursive application of the Strassen algorithm, we see that f(n) = 7f(n-1) + l*4^n, for some constant l that depends on the number of additions performed at each application of the algorithm. Hence f(n) = (7 + o(1))^n, i.e., the asymptotic complexity for multiplying matrices of size N = 2^n using the Strassen algorithm is O([7 + o(1)]^n) = O(N^(log2 7 + o(1))).
The proof that Strassen's algorithm should exist is a simple dimension count (combined with a proof that the naive dimension count gives the correct answer). Consider the vector space of all bilinear maps C^n x C^n -> C^n; this is a vector space of dimension n^3 (in the case of matrix multiplication, we have n = m^2, e.g. n = 4 for the 2 x 2 case). The set of bilinear maps of rank one, i.e., those computable in an algorithm using just one scalar multiplication, has dimension 3(n - 1) + 1, and the set of bilinear maps of rank at most r has dimension min(r[3(n - 1)] + r, n^3) for most values of n and r (and one can check that this is correct when r = 7, n = 4). Thus any bilinear map C^4 x C^4 -> C^4 with probability one has rank at most 7, and may always be approximated to arbitrary precision by a bilinear map of rank at most 7.

12.5 Dot Product
The grade-school dot product is optimal.

12.6 Fast Fourier
Given A(x) and B(x) as some polynomials, adding them is O(n); multiplying them is O(n) but we need 2n + 1 points, and evaluation using Lagrange's formula is O(n^2).

13 Amortized Analysis
Works well for online algorithms that can process input piece-by-piece in a serial fashion.

13.1 Aggregate Analysis
Determines an upper bound T(n) on the total cost, and then calculates the amortized cost to be T(n)/n.

13.2 Accounting Method
Determines the cost of each operation, combining its immediate execution and its influence on future operations. Usually short-running operations accumulate a debt of unfavourable state in small increments, while long-running operations decrease it drastically. The total amortized cost must be greater than or equal to the total cost of the actual operations.

13.3 Potential Method
Like the accounting method, but overcharges operations early to compensate for undercharges later.

13.4 Infinite Binary Counter
Prove that any non-negative integer n can be represented as the sum of distinct powers of 2.
Proof: The base case n = 0 is trivial. For any n > 0, the inductive hypothesis implies that there is a set of distinct powers of 2 whose sum is n - 1. If we add 2^0 to this set, we obtain a multiset of powers of two whose sum is n, which might contain two copies of 2^0. Then as long as there are two copies of any 2^i in the multiset, we remove them both and insert 2^(i+1) in their place. The sum of the elements of the multiset is unchanged by this replacement, because 2^(i+1) = 2^i + 2^i. Each iteration decreases the size of the multiset by 1, so the replacement process must eventually terminate. When it does terminate, we have a set of distinct powers of 2 whose sum is n.
Naive runtime: Now suppose we want to use INCREMENT to count from 0 to n. If we only use the worst-case running time for each increment, we get an upper bound of O(n log n) on the total running time.
Summation/aggregate method: the least significant bit B[0] flips every time, but B[1] only every other time, and in general B[i] flips every 2^i-th time. n increments flip each bit B[i] ⌊n/2^i⌋ times, thus the total number of bit flips is Σ_(i=0..⌊lg n⌋) ⌊n/2^i⌋ < Σ_(i=0..∞) n/2^i = 2n. Thus on average each call flips two bits and thus runs in constant time.
Accounting method: charge 2 dollars for setting a bit from 0 to 1: one is spent on the change, the other is banked for changing it back. We always have enough credit to pay for the next increment on that bit.

13.5 Queue
Earlier in the semester we saw a way of implementing a queue (FIFO) using two stacks (LIFO). Say that our stack has three operations, push, pop and empty, each with cost 1. A queue can then be implemented as: enqueue(x): push x onto stack1; dequeue(): if stack2 is empty, pop the entire contents of stack1, pushing each element in turn onto stack2; now pop from stack2 and return the result. We have seen earlier that this algorithm is correct; now we consider the running time in more detail. A conventional worst-case analysis would establish that dequeue takes O(n) time, but this is clearly a weak bound for a sequence of operations, because very few dequeues actually take that long. Thus O(n^2) is not a very accurate characterization of the time needed for a sequence of n enqueue and dequeue operations, even though in the worst case an individual dequeue can take O(n) time. To simplify the amortized analysis, we will consider only the cost of the push and pop operations and not of checking whether stack2 is empty.
Aggregate method: each element is clearly pushed at most twice and popped at most twice, at most once from each stack. If an element is enqueued and never dequeued, then it is pushed at most twice and popped at most once. Thus the amortized cost of each enqueue is 3 and of each dequeue is 1.
Banker's method: each enqueue will be charged $3. This covers the $2 cost of popping it and pushing it from stack1 to stack2 if that ever needs to be done, plus $1 for the initial push onto stack1. The dequeue operations cost $1 to pop from stack2.
Note that the analysis in both cases seems to charge more for storing than removing, even though in the code it is the other way around. Amortized analysis bounds the overall sequence, which in this case depends on how much stuff is stored in the data structure. It does not bound the individual operations.
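The two-stack queue analysed in 13.5, as code (the class name is illustrative); each element is pushed and popped at most twice, so n operations cost O(n):

class TwoStackQueue:
    def __init__(self):
        self.stack1, self.stack2 = [], []

    def enqueue(self, x):
        self.stack1.append(x)          # amortized cost 3 (banker's method)

    def dequeue(self):
        if not self.stack2:            # pop the entire contents of stack1,
            while self.stack1:         # reversing it into FIFO order
                self.stack2.append(self.stack1.pop())
        return self.stack2.pop()       # amortized cost 1

q = TwoStackQueue()
for x in (1, 2, 3): q.enqueue(x)
print(q.dequeue(), q.dequeue(), q.dequeue())  # 1 2 3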
14 Randomized Algorithms

14.1 Karger Contraction Algorithm
Pick an edge uniformly at random. Contract the edge by replacing its endpoint nodes with a single supernode, keeping parallel edges but deleting self-loops. Repeat until the graph has just two nodes, and return the cut (all the nodes that were contracted into each side).
By repeating this algorithm n^2 log(n) times with independent choices, the probability of failing to find the min-cut is <= 1/n^2.
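A minimal sketch of one contraction run (the multigraph is a list of edges; supernodes are tracked with a union-find over labels; names are illustrative). Repeating it and keeping the smallest cut gives the probability bound above; the Karger-Stein refinement is analysed next.

import random

def karger_min_cut_once(vertices, edges):
    # contract random edges until only two supernodes remain;
    # parallel edges stay in the list, self-loops are skipped
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    remaining = len(vertices)
    while remaining > 2:
        u, v = random.choice(edges)      # pick an edge uniformly at random
        ru, rv = find(u), find(v)
        if ru != rv:                     # skip self-loops inside a supernode
            parent[rv] = ru              # contract: merge the two endpoints
            remaining -= 1
    # the cut is the set of edges whose endpoints lie in different supernodes
    return [(u, v) for (u, v) in edges if find(u) != find(v)]

edges = [('a','b'), ('a','b'), ('b','c'), ('c','d'), ('a','d'), ('b','d')]
print(karger_min_cut_once('abcd', edges))  # one (not necessarily minimum) cut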
Proof for fastmincut (Karger-Stein): the probability of finding a specific cutset is P(n) = 1 - (1 - (1/2) P(⌈1 + n/√2⌉))^2, with solution P(n) = O(1/log n). The running time of fastmincut satisfies T(n) = 2T(⌈1 + n/√2⌉) + O(n^2), with solution T(n) = O(n^2 log n). To achieve error probability O(1/n), the algorithm can be repeated O(log n / P(n)) times, for an overall runtime of O(n^2 log^3 n).

14.2 Maximum 3 Satisfiability
MAX-3SAT is a problem in the computational complexity subfield of computer science. It generalises the Boolean satisfiability problem (SAT), which is a decision problem considered in complexity theory. It is defined as: given a 3-CNF formula φ (i.e. with at most 3 variables per clause), find an assignment that satisfies the largest number of clauses.
Approx-Max3SAT is 7/8-approximate.
Proof: let the random variable Z_j be 1 if clause j is satisfied and 0 otherwise; the sum from j = 1 to k is the number of clauses satisfied. The expected number of clauses satisfied is the sum from j = 1 to k of Pr(C_j is satisfied) = 7k/8.
Corollary (Lower Bound on Number of Satisfiable Clauses): For any instance of 3-SAT, there exists a truth assignment that satisfies at least 7/8 of the clauses. Proof: a random variable is at least its expectation some of the time.
Corollary: Any instance of 3-SAT with at most 7 clauses is satisfiable. Proof: follows from the lower bound on the number of satisfiable clauses.

15 Common Recurrence Relations
Unrolling the merge sort recurrence:
T(n) = 2T(n/2) + n (10)
T(n) = 4[2T(n/8) + n/4] + 2n (11)
T(n) = 2^k T(n/2^k) + kn (12)
T(n) = n + n log2 n (13)
T(n) = 2^(log2 n) T(1) + (log2 n) n => T(n) = O(n log n) (14)

Recurrence              Algorithm                   Big-O
T(n/2) + Θ(1)           Binary Search               O(log n)
T(n-1) + Θ(1)           Sequential Search           O(n)
2T(n/2) + Θ(1)          Tree traversal              O(n)
T(n-1) + Θ(n)           Selection Sort (n^2 sorts)  O(n^2)
2T(n/2) + Θ(n)          Mergesort                   O(n log n)
T(n-1) + T(0) + Θ(n)    Quicksort (worst case)      O(n^2)

16 Master Theorem
T(n) = aT(n/b) + f(n), where a >= 1 and b > 1 are constants and f(n) is asymptotically positive.
Case 1: if f(n) = O(n^(log_b a - ε)) for some ε > 0, then T(n) = Θ(n^(log_b a)).
Case 2: if f(n) = Θ(n^(log_b a) log^k n) with k >= 0, then T(n) = Θ(n^(log_b a) log^(k+1) n).
Case 3: if f(n) = Ω(n^(log_b a + ε)) with ε > 0 and f(n) satisfies the regularity condition, then T(n) = Θ(f(n)). The regularity condition is a f(n/b) <= c f(n) for some c < 1 and all sufficiently large n.
In the examples below, the case applied is given on the right:
T(n) = 3T(n/2) + n^2 = Θ(n^2)  [case 3] (15)
T(n) = 4T(n/2) + n^2 = Θ(n^2 log n)  [case 2] (16)
T(n) = T(n/2) + 2^n = Θ(2^n)  [case 3] (17)
T(n) = 2^n T(n/2) + n^n : does not apply, a is not constant (18)
T(n) = 16T(n/4) + n = Θ(n^2)  [case 1] (19)
T(n) = 2T(n/2) + n log n = Θ(n log^2 n)  [case 2] (20)
T(n) = 2T(n/2) + n/log n : does not apply, non-polynomial difference (21)
T(n) = 2T(n/4) + n^0.51 = Θ(n^0.51)  [case 3] (22)
T(n) = 0.5T(n/2) + 1/n : does not apply, a < 1 (23)
T(n) = 16T(n/4) + n! = Θ(n!)  [case 3] (24)
T(n) = √2 T(n/2) + log n = Θ(√n)  [case 1] (25)
T(n) = 3T(n/3) + √n = Θ(n)  [case 1] (26)
T(n) = 4T(n/2) + c^n = Θ(c^n)  [case 3, constant c > 1] (27)
T(n) = T(n/2) + n(2 - cos n) : does not apply, regularity violated (28)

16.1 Proof
Probably too long to go onto the exam.

16.2 Height of Recursion Tree
See the unrolling in (10)-(14) above: the recursion tree for T(n) = 2T(n/2) + n has lg n levels, each of total cost n.

17 Akra-Bazzi Theorem
Generalizes the master theorem to divide and conquer algorithms. Given a_i > 0, 0 < b_i <= 1, functions h_i(n) = O(n/log^2 n) and g(n) = O(n^c), if the function T(n) satisfies T(n) = Σ_(i=1..k) a_i T(b_i n + h_i(n)) + g(n), then T(n) = Θ(n^p (1 + ∫_1^n g(u)/u^(p+1) du)), where p satisfies Σ_(i=1..k) a_i b_i^p = 1.
The examples below are, in order, randomized quicksort, deterministic selection, randomized search trees, and ham-sandwich trees:
T(n) = T(3n/4) + T(n/4) + n = Θ(n(1 + ∫_1^n du/u)) = Θ(n log n) (29)
T(n) = T(n/5) + T(7n/10) + n = Θ(n^p (1 + Θ(n^(1-p)))) = Θ(n) (30)
T(n) = (1/4)T(n/4) + (3/4)T(3n/4) + 1 = Θ(1 + ∫_1^n du/u) = Θ(log n) (31)
T(n) = T(n/2) + T(n/4) + 1 = Θ(n^p (1 + Θ(1))) = Θ(n^(log2 φ)) (32)

18 Red Black Trees
A red-black tree is a binary search tree with an extra bit of data per node, its color, which can be either red or black. The extra bit of storage ensures an approximately balanced tree by constraining how nodes are colored on any path from the root to a leaf. Thus, it is a data structure which is a type of self-balancing binary search tree.
In addition to the requirements imposed on a binary search tree, the following must be satisfied by a red-black tree: 1. A node is either red or black. 2. The root is black. (This rule is sometimes omitted: since the root can always be changed from red to black, but not necessarily vice versa, it has little effect on analysis.) 3. All leaves (NIL) are black (all leaves are the same color as the root). 4. Every red node must have two black child nodes. 5. Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.
Proof of the height bound:
Lemma 1: any node x with height h(x) has black-height bh(x) >= h(x)/2. Proof: by property 4, <= h/2 nodes on the path from the node are red, hence >= h/2 are black.
Lemma 2: the subtree rooted at any node x contains >= 2^bh(x) - 1 internal nodes. Proof: by induction on the height of x. Base case: height h(x) = 0 implies x is a leaf, which implies bh(x) = 0, and the subtree has 2^0 - 1 = 0 internal nodes. Induction step: 1. Each child of x has height h(x) - 1 and black-height either bh(x) (if the child is red) or bh(x) - 1 (if the child is black). 2. By the induction hypothesis, each child subtree has >= 2^(bh(x)-1) - 1 internal nodes. 3. So the subtree rooted at x has >= 2(2^(bh(x)-1) - 1) + 1 = 2^bh(x) - 1 internal nodes (the +1 is for x itself).
Lemma 3: a red-black tree with n internal nodes has height at most 2 lg(n + 1). Proof: by Lemma 2, n >= 2^bh - 1; by Lemma 1, bh >= h/2; thus n >= 2^(h/2) - 1, which implies h <= 2 lg(n + 1).
18.1 Insertion
Insert as in an ordinary BST, colour the new node red, then repair the red-black properties bottom-up (_insert_fixup below): while the parent is red, either the uncle is also red (recolour parent, uncle and grandparent, move up two levels) or the uncle is black (rotate the "triangle" case into the "line" case, recolour, then rotate at the grandparent). Finally recolour the root black.

18.2 Code

class rbnode(object):
    def __init__(self, key):
        self._key = key
        self._red = False
        self._left = None
        self._right = None
        self._p = None
    # read-only accessors; the tree methods below read node.key,
    # node.red, node.left, node.right and node.p
    key = property(fget=lambda self: self._key)
    red = property(fget=lambda self: self._red)
    left = property(fget=lambda self: self._left)
    right = property(fget=lambda self: self._right)
    p = property(fget=lambda self: self._p)

class rbtree(object):
    def __init__(self, create_node=rbnode):
        self._nil = create_node(key=None)   # our nil node, used for all leaves
        self._root = self.nil               # the root of the tree
        self._create_node = create_node     # a callable that creates a node

    root = property(fget=lambda self: self._root, doc="The tree's root node")
    nil = property(fget=lambda self: self._nil, doc="The tree's nil node")

    def search(self, key, x=None):
        if None == x:
            x = self.root
        while x != self.nil and key != x.key:
            if key < x.key:
                x = x.left
            else:
                x = x.right
        return x

    def minimum(self, x=None):
        if None == x:
            x = self.root
        while x.left != self.nil:
            x = x.left
        return x

    def maximum(self, x=None):
        if None == x:
            x = self.root
        while x.right != self.nil:
            x = x.right
        return x

    def insert_key(self, key):
        self.insert_node(self._create_node(key=key))

    def insert_node(self, z):
        y = self.nil
        x = self.root
        while x != self.nil:
            y = x
            if z.key < x.key:
                x = x.left
            else:
                x = x.right
        z._p = y
        if y == self.nil:
            self._root = z
        elif z.key < y.key:
            y._left = z
        else:
            y._right = z
        z._left = self.nil
        z._right = self.nil
        z._red = True
        self._insert_fixup(z)

    def _insert_fixup(self, z):
        while z.p.red:
            if z.p == z.p.p.left:
                y = z.p.p.right          # y is z's uncle
                if y.red:                # case 1: recolour, move up two levels
                    z.p._red = False
                    y._red = False
                    z.p.p._red = True
                    z = z.p.p
                else:
                    if z == z.p.right:   # case 2: rotate triangle into line
                        z = z.p
                        self._left_rotate(z)
                    z.p._red = False     # case 3: recolour, rotate grandparent
                    z.p.p._red = True
                    self._right_rotate(z.p.p)
            else:                        # mirror image of the above
                y = z.p.p.left
                if y.red:
                    z.p._red = False
                    y._red = False
                    z.p.p._red = True
                    z = z.p.p
                else:
                    if z == z.p.left:
                        z = z.p
                        self._right_rotate(z)
                    z.p._red = False
                    z.p.p._red = True
                    self._left_rotate(z.p.p)
        self.root._red = False

    def _left_rotate(self, x):
        y = x.right
        x._right = y.left
        if y.left != self.nil:
            y.left._p = x
        y._p = x.p
        if x.p == self.nil:
            self._root = y
        elif x == x.p.left:
            x.p._left = y
        else:
            x.p._right = y
        y._left = x
        x._p = y

    def _right_rotate(self, y):
        x = y.left
        y._left = x.right
        if x.right != self.nil:
            x.right._p = y
        x._p = y.p
        if y.p == self.nil:
            self._root = x
        elif y == y.p.right:
            y.p._right = x
        else:
            y.p._left = x
        x._right = y
        y._p = x

    def check_invariants(self):
        def is_red_black_node(node):
            # check has _left and _right or neither
            if (node.left and not node.right) or (node.right and not node.left):
                return 0, False
            # check leaves are black
            if not node.left and not node.right and node.red:
                return 0, False
            # if node is red, check children are black
            if node.red and node.left and node.right:
                if node.left.red or node.right.red:
                    return 0, False
            # descend tree and check black counts are balanced
            if node.left and node.right:
                # check children's parents are correct
                if self.nil != node.left and node != node.left.p:
                    return 0, False
                if self.nil != node.right and node != node.right.p:
                    return 0, False
                # check children are ok
                left_counts, left_ok = is_red_black_node(node.left)
                if not left_ok:
                    return 0, False
                right_counts, right_ok = is_red_black_node(node.right)
                if not right_ok:
                    return 0, False
                # check children's counts are ok
                if left_counts != right_counts:
                    return 0, False
                return left_counts, True
            else:
                return 0, True
        num_black, is_ok = is_red_black_node(self.root)
        return is_ok and not self.root._red
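A tiny usage sketch for the class above (the keys are made up):

t = rbtree()
for k in [5, 1, 9, 3, 7]:
    t.insert_key(k)
assert t.check_invariants()
assert t.search(7).key == 7
assert t.minimum().key == 1 and t.maximum().key == 9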
19 Interval Scheduling

Proof of optimality for weighted interval scheduling: we need to show that for all 1 <= i <= n, S_i contains the value of an optimal solution of the first i intervals. We do so by induction. For i = 0, S_0 = 0 and it is optimal since no interval has been processed. Suppose the claim holds for S_j for all j < i, and consider S_i: either I_i was added to the solution or it wasn't. If I_i was not added, then the optimal solution for the first i intervals is the same as the optimal solution for the first i − 1 intervals, i.e. S_i = S_{i−1}. Otherwise, suppose I_i is added. Then the intervals I_{p[i]+1}, I_{p[i]+2}, ..., I_{i−1} all conflict with I_i, and the remaining intervals to choose from are amongst the first p[i] intervals; therefore any optimal solution that includes I_i must be a subset of {I_1, I_2, ..., I_{p[i]}, I_i}. Since I_i does not intersect any interval in {I_1, ..., I_{p[i]}} and S_{p[i]} is the optimal value over the first p[i] intervals (by the induction hypothesis), in this case S_i = S_{p[i]} + w_i. Since S_i is the maximum of these two cases, the larger of the two values is the value of S_i.

import bisect
import collections

def schedule_weighted_intervals(I):
    # Use dynamic programming to schedule weighted intervals
    # sorting is O(n log n),
    # finding p[1..n] is O(n log n),
    # finding OPT[1..n] is O(n),
    # selecting is O(n)
    # whole operation is dominated by O(n log n)
    if not I:
        return []
    I.sort(key=lambda x: x.finish)   # f_1 <= f_2 <= .. <= f_n
    p = compute_previous_intervals(I)   # defined below
    # compute OPTs iteratively in O(n), here we use DP
    OPT = collections.defaultdict(int)
    OPT[-1] = 0
    OPT[0] = I[0].weight   # base case: the best schedule of {I_0} takes I_0
    for j in range(1, len(I)):
        OPT[j] = max(I[j].weight + OPT[p[j]], OPT[j - 1])
    # given OPT and p, find actual solution intervals in O(n)
    O = []
    def compute_solution(j):
        if j >= 0:   # will halt on OPT[-1]
            if I[j].weight + OPT[p[j]] > OPT[j - 1]:
                O.append(I[j])
                compute_solution(p[j])
            else:
                compute_solution(j - 1)
    compute_solution(len(I) - 1)
    # resort, as our O is in reverse order (OPTIONAL)
    O.sort(key=lambda x: x.finish)
    return O
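These listings assume an Interval object with start, finish and weight attributes, which the sheet never defines. A minimal sketch (class name and example data mine):

class Interval(object):
    def __init__(self, start, finish, weight=1):
        self.start = start     # start time
        self.finish = finish   # finish time
        self.weight = weight   # only used by the weighted scheduler

# e.g. I = [Interval(0, 3, 2), Interval(2, 5, 4), Interval(4, 7, 4)]
# schedule_weighted_intervals(I) picks (0,3) and (4,7): weight 6 beats (2,5)'s 4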
def schedule_unweighted_intervals(I):
    # Use greedy algorithm to schedule unweighted intervals
    # sorting is O(n log n), selecting is O(n)
    # whole operation is dominated by O(n log n)
    I.sort(key=lambda x: x.finish)   # f_1 <= f_2 <= .. <= f_n
    O = []
    finish = 0
    for i in I:
        if finish <= i.start:
            finish = i.finish
            O.append(i)
    return O

def compute_previous_intervals(I):
    # For every interval j, compute the rightmost mutually
    # compatible interval i, where i < j.
    # I is a sorted list of Interval objects (sorted by finish time)
    # extract start and finish times
    start = [i.start for i in I]
    finish = [i.finish for i in I]
    p = []
    for j in range(len(I)):
        # rightmost interval f_i <= s_j
        i = bisect.bisect_right(finish, start[j]) - 1
        p.append(i)
    return p

20 Insert into AVL Tree

After a standard BST insert, let z be the first unbalanced node found walking back up, y the child of z on that path, and x the grandchild. Rebalance by case:
1. Left-Left Case: x is the left child of y and y is the left child of z: right rotate z.
2. Left-Right Case: x is the right child of y and y is the left child of z: left rotate y, then right rotate z.
3. Right-Left Case: x is the left child of y and y is the right child of z: right rotate y, then left rotate z.
4. Right-Right Case: x is the right child of y and y is the right child of z: left rotate z.

21 Difference Between Prim's and Kruskal's Algorithm

Kruskal's builds a minimum spanning tree by adding one edge at a time: the next edge taken is always the shortest (minimum weight) edge, but ONLY if it does NOT create a cycle. Prim's builds a minimum spanning tree by adding one vertex at a time: the next vertex added is always the one nearest to a vertex already on the tree.
In Prim's, you always keep a connected component, starting with a single vertex. You look at all edges from the current component to other vertices and find the smallest among them; you then add the neighbouring vertex to the component, increasing its size by 1. In N − 1 steps, every vertex has been merged into the component if the graph is connected. In Kruskal's, you do not keep one connected component but a forest. At each stage, you take the globally smallest edge that does not create a cycle in the current forest; such an edge necessarily merges two trees of the forest into one. Since you start with N single-vertex trees, in N − 1 steps they have all merged into one if the graph is connected. (A code sketch of both follows.)
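The sheet gives no code for either algorithm; here is a minimal sketch under assumed input formats (an adjacency dict for Prim's, an edge list for Kruskal's; Kruskal's reuses the UnionFind class listed in section 24 below):

import heapq

def prim(adj, s):
    # adj: {u: [(w, v), ...]}, an undirected weighted graph.
    # Grow one component from s, always taking the cheapest
    # edge leaving the component.
    visited = set([s])
    mst = []
    frontier = [(w, s, v) for (w, v) in adj[s]]
    heapq.heapify(frontier)
    while frontier:
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue
        visited.add(v)
        mst.append((u, v, w))
        for (w2, x) in adj[v]:
            if x not in visited:
                heapq.heappush(frontier, (w2, v, x))
    return mst

def kruskal(edges):
    # edges: [(w, u, v), ...]. Scan edges globally by weight,
    # skipping any edge whose endpoints already share a root.
    uf = UnionFind()   # class listed in section 24
    mst = []
    for w, u, v in sorted(edges):
        if uf[u] != uf[v]:
            uf.union(u, v)
            mst.append((u, v, w))
    return mst

# e.g. adj = {'a': [(1,'b'), (4,'c')], 'b': [(1,'a'), (2,'c')], 'c': [(4,'a'), (2,'b')]}
# prim(adj, 'a') -> [('a','b',1), ('b','c',2)]; kruskal gives the same tree.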
22 Heap Sort

def heapsort(aList):
    # convert aList to heap
    length = len(aList) - 1
    leastParent = length // 2   # integer division
    for i in range(leastParent, -1, -1):
        moveDown(aList, i, length)
    # flatten heap into sorted array
    for i in range(length, 0, -1):
        if aList[0] > aList[i]:
            swap(aList, 0, i)
            moveDown(aList, 0, i - 1)

def moveDown(aList, first, last):
    largest = 2 * first + 1
    while largest <= last:
        # right child exists and is larger than left child
        if (largest < last) and (aList[largest] < aList[largest + 1]):
            largest += 1
        # larger child is bigger than parent
        if aList[largest] > aList[first]:
            swap(aList, largest, first)
            # move down to largest child
            first = largest
            largest = 2 * first + 1
        else:
            return   # force exit

def swap(A, x, y):
    A[x], A[y] = A[y], A[x]
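A quick sanity check of the in-place heapsort above (example data mine):

data = [9, 4, 7, 1, 8, 2]
heapsort(data)
assert data == [1, 2, 4, 7, 8, 9]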
23 Quick Sort

Below are, in order, the randomized quicksort analysis and a deterministic quicksort.
Randomized quicksort analysis: one can show that randomized quicksort has the desirable property that, for any input, it requires only O(n log n) expected time (averaged over all choices of pivots). There is also a combinatorial proof. To each execution of quicksort corresponds the following binary search tree (BST): the initial pivot is the root node; the pivot of the left half is the root of the left subtree, the pivot of the right half is the root of the right subtree, and so on. The number of comparisons in an execution of quicksort equals the number of comparisons made while constructing the BST by a sequence of insertions, so the average number of comparisons for randomized quicksort equals the average cost of constructing a BST when the values inserted (x_1, x_2, ..., x_n) form a random permutation. By linearity of expectation, the expected value is sum_i sum_{j<i} Pr[c(i, j)], where c(i, j) is a binary random variable expressing whether, during the insertion of x_i, there was a comparison to x_j. Since the permutation is random, the probability that x_i is adjacent to x_j is exactly 2/(j + 1), and sum_i sum_{j<i} 2/(j + 1) = O(sum_i log i) = O(n log n).

def quickSort(alist):
    quickSortHelper(alist, 0, len(alist) - 1)

def quickSortHelper(alist, first, last):
    if first < last:
        splitpoint = partition(alist, first, last)
        quickSortHelper(alist, first, splitpoint - 1)
        quickSortHelper(alist, splitpoint + 1, last)

def partition(alist, first, last):
    pivotvalue = alist[first]
    leftmark = first + 1
    rightmark = last
    done = False
    while not done:
        while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1
        while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark = rightmark - 1
        if rightmark < leftmark:
            done = True
        else:
            alist[leftmark], alist[rightmark] = alist[rightmark], alist[leftmark]
    alist[first], alist[rightmark] = alist[rightmark], alist[first]
    return rightmark

24 Quick Find and Union

Union-find cost model: when studying algorithms for union-find, we count the number of array accesses (the number of times an array entry is accessed, for read or write).
Definitions: the size of a tree is its number of nodes. The depth of a node in a tree is the number of links on the path from it to the root. The height of a tree is the maximum depth among its nodes.
Proposition: the quick-find algorithm uses one array access for each call to find() and between N + 3 and 2N + 1 array accesses for each call to union() that combines two components.
Proposition: the number of array accesses used by find() in quick-union is 1 plus twice the depth of the node corresponding to the given site. The number of array accesses used by union() and connected() is the cost of the two find() operations (plus 1 for union() if the given sites are in different trees).
Proposition: the depth of any node in a forest built by weighted quick-union for N sites is at most lg N. Corollary: for weighted quick-union with N sites, the worst-case order of growth of the cost of find(), connected(), and union() is log N.

class UnionFind:
    # Union-find data structure (weighted quick-union with path compression).
    # Each UnionFind instance X maintains a family of disjoint sets of
    # hashable objects, supporting the following two methods:
    # - X[item] returns a name for the set containing the given item.
    #   Each set is named by an arbitrarily-chosen one of its members; as
    #   long as the set remains unchanged it will keep the same name. If
    #   the item is not yet part of a set in X, a new singleton set is
    #   created for it.
    # - X.union(item1, item2, ...) merges the sets containing each item
    #   into a single larger set. If any item is not yet part of a set
    #   in X, it is added to X as one of the members of the merged set.

    def __init__(self):
        # Create a new empty union-find structure.
        self.weights = {}
        self.parents = {}

    def __getitem__(self, object):
        # Find and return the name of the set containing the object.
        # check for previously unknown object
        if object not in self.parents:
            self.parents[object] = object
            self.weights[object] = 1
            return object
        # find path of objects leading to the root
        path = [object]
        root = self.parents[object]
        while root != path[-1]:
            path.append(root)
            root = self.parents[root]
        # compress the path and return
        for ancestor in path:
            self.parents[ancestor] = root
        return root

    def __iter__(self):
        # Iterate through all items ever found or unioned by this structure.
        return iter(self.parents)

    def union(self, *objects):
        # Find the sets containing the objects and merge them all.
        roots = [self[x] for x in objects]
        heaviest = max([(self.weights[r], r) for r in roots])[1]
        for r in roots:
            if r != heaviest:
                self.weights[heaviest] += self.weights[r]
                self.parents[r] = heaviest
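A hypothetical quick check of the UnionFind class above:

uf = UnionFind()
uf.union('a', 'b')
uf.union('b', 'c')
assert uf['a'] == uf['c']   # same set
assert uf['a'] != uf['x']   # 'x' is auto-created as its own singleton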
25 Knapsack Problem

The knapsack problem or rucksack problem is a problem in combinatorial optimization: given a set of items, each with a mass and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items.

The decision problem form of the knapsack problem (Can a value of at least V be achieved without exceeding the weight W?) is NP-complete; thus there is no algorithm that is both correct and fast (polynomial-time) on all cases, unless P = NP. While the decision problem is NP-complete, the optimization problem is NP-hard: its resolution is at least as difficult as the decision problem, and there is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger V, thus solving the NP-complete decision problem). There is a pseudo-polynomial time algorithm using dynamic programming, and a fully polynomial-time approximation scheme which uses the pseudo-polynomial time algorithm as a subroutine. Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.

Another algorithm for 0-1 knapsack, sometimes called "meet-in-the-middle" due to parallels to a similarly named algorithm in cryptography, is exponential in the number of different items but may be preferable to the DP algorithm when W is large compared to n. In particular, if the w_i are nonnegative but not integers, we could still use the dynamic programming algorithm by scaling and rounding (i.e. using fixed-point arithmetic); but if the problem requires d fractional digits of precision to arrive at the correct answer, W will need to be scaled by 10^d, and the DP algorithm will then require O(W * 10^d) space and O(nW * 10^d) time.

Meet-in-the-middle algorithm:
    input: a set of items with weights and values
    output: the greatest combined value of a subset
    partition the set {1, ..., n} into two sets A and B of approximately equal size
    compute the weights and values of all subsets of each set
    for each subset of A, find the subset of B of greatest value
        such that the combined weight is less than W
    keep track of the greatest combined value seen so far
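A minimal sketch of the pseudo-polynomial DP mentioned above (0-1 variant, integer weights; the function name and example are mine):

def knapsack_01(weights, values, W):
    # dp[w] = best value achievable with capacity w; O(nW) time, O(W) space
    dp = [0] * (W + 1)
    for wi, vi in zip(weights, values):
        # iterate capacities downwards so each item is used at most once
        for w in range(W, wi - 1, -1):
            dp[w] = max(dp[w], dp[w - wi] + vi)
    return dp[W]

# e.g. knapsack_01([2, 3, 4], [3, 4, 5], 5) == 7  (take the weight-2 and weight-3 items)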
26 Minimum cut

In graph theory, a minimum cut of a graph is a cut (a partition of the vertices of a graph into two disjoint subsets that are joined by at least one edge) whose cut set has the smallest number of edges (unweighted case) or smallest sum of weights possible. Several algorithms exist to find minimum cuts.

For a graph G = (V, E), the problem can be reduced to 2|V| − 2 = O(|V|) maximum flow problems, equivalent to O(|V|) s-t cut problems by the max-flow min-cut theorem. Hao and Orlin [1] have shown an algorithm to compute these max-flow problems in time asymptotically equal to one max-flow computation, requiring O(|V||E| log(|V|^2/|E|)) steps. Asymptotically faster algorithms exist for undirected graphs, though these do not necessarily extend to the directed case. A study by Chekuri et al. established experimental results with various algorithms.
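The section gives no code; below is a hedged sketch of the reduction for the undirected case: a BFS-based (Edmonds-Karp) max flow, then a global min cut that fixes one vertex s and takes the best s-t cut over every other t, i.e. |V| − 1 flow computations. The input format (cap[u][v] = capacity, symmetric for undirected graphs) is an assumption.

from collections import deque

def bfs_augment(cap, flow, s, t):
    # find one shortest augmenting path in the residual graph and push
    # the bottleneck amount along it; return the amount pushed (0 if none)
    parent = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in parent and cap[u][v] - flow[u][v] > 0:
                parent[v] = u
                if v == t:
                    bottleneck = float('inf')
                    x = t
                    while parent[x] is not None:
                        bottleneck = min(bottleneck, cap[parent[x]][x] - flow[parent[x]][x])
                        x = parent[x]
                    x = t
                    while parent[x] is not None:
                        flow[parent[x]][x] += bottleneck
                        flow[x][parent[x]] -= bottleneck
                        x = parent[x]
                    return bottleneck
                q.append(v)
    return 0

def max_flow(cap, s, t):
    flow = {u: {v: 0 for v in cap[u]} for u in cap}
    total = 0
    while True:
        pushed = bfs_augment(cap, flow, s, t)
        if pushed == 0:
            return total
        total += pushed

def global_min_cut(cap):
    # undirected global min cut = min over t != s of max_flow(s, t)
    vertices = list(cap)
    s = vertices[0]
    return min(max_flow(cap, s, t) for t in vertices[1:])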