An algorithm is named after the ninth-century scholar Abu Ja'far Muhammad ibn Musa al-Khwarizmi. Roughly speaking, an algorithm is a finite set of precisely stated instructions that, for any legitimate input, produces the required output after a finite number of steps.
The most famous algorithm in history dates well before the time of the ancient
Greeks: this is Euclid's algorithm for calculating the greatest common divisor
of two integers.
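As a concrete illustration, here is a minimal Python sketch of Euclid's algorithm; the function name and the test values are ours, not part of the original notes.

def gcd(a, b):
    # Repeatedly replace (a, b) by (b, a mod b); the GCD is unchanged at each step.
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(1071, 462))   # prints 21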
Two versions of the classical multiplication procedure illustrate that the same problem can be solved by different algorithms:
i. Multiply the multiplicand one after another by each digit of the multiplier taken from right to left.
ii. Multiply the multiplicand one after another by each digit of the multiplier taken from left to right.
Algorithmics is the branch of computer science that consists of designing and analyzing computer algorithms.
We start by defining the model of computation, which is usually the Random Access Machine (RAM) model, but other models of computation, such as the PRAM, can be used. Once the model of computation has been defined, an algorithm can be described using a simple language (or pseudo-language) whose syntax is close to that of a programming language such as C or Java.
Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its
space complexity and time complexity. Time complexity of an algorithm
concerns determining an expression of the number of steps needed as a function
of the problem size. Since the step count measure is somewhat coarse, one does
not aim at obtaining an exact step count. Instead, one attempts only to get
asymptotic bounds on the step count. Asymptotic analysis makes use of the O
(Big Oh) notation. Two other notational constructs used by computer scientists
in the analysis of algorithms are Θ (Big Theta) notation and Ω (Big Omega)
notation.
The performance evaluation of an algorithm is obtained by totaling the number
of occurrences of each operation when running the algorithm. The performance
of an algorithm is evaluated as a function of the input size n and is to be
considered modulo a multiplicative constant.
Θ-Notation (Same Order)
This notation bounds a function to within constant factors. We say f(n) = Θ(g(n)) if there exist positive constants n0, c1 and c2 such that to the right of n0 the value of f(n) always lies between c1·g(n) and c2·g(n) inclusive.
O-Notation (Upper Bound)
This notation gives an upper bound for a function to within a constant factor. We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or below c·g(n).
Ω-Notation (Lower Bound)
This notation gives a lower bound for a function to within a constant factor. We
write f(n) = Ω(g(n)) if there are positive constants n0 and c such that to the right of
n0, the value of f(n) always lies on or above cg(n).
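As a worked example of these definitions (ours, not from the notes): for f(n) = 3n² + 10n we can take c1 = 3, c2 = 4 and n0 = 10, since 3n² ≤ 3n² + 10n ≤ 4n² whenever n ≥ 10 (because 10n ≤ n² for n ≥ 10). Hence f(n) = Θ(n²), and in particular f(n) = O(n²) and f(n) = Ω(n²).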
Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives an upper bound on the number of operations (or running time) performed by the algorithm when the input size is n.
There are two interpretations of upper bound.
Worst-case Complexity
The running time for any given size input will be lower than the upper bound except
possibly for some values of the input where the maximum is reached.
Average-case Complexity
The running time for any given size input will be the average number of operations
over all problem instances for a given size.
Optimality
Once the complexity of an algorithm has been estimated, the question arises
whether this algorithm is optimal. An algorithm for a given problem is optimal
if its complexity reaches the lower bound over all the algorithms solving this
problem. For example, any algorithm solving the "intersection of n segments" problem will execute at least n² operations in the worst case, even if it does nothing but print the output. This is abbreviated by saying that the problem has Ω(n²) complexity. If one finds an O(n²) algorithm that solves this problem, it will be optimal and of complexity Θ(n²).
Reduction
Another technique for estimating the complexity of a problem is the
transformation of problems, also called problem reduction. As an example,
suppose we know a lower bound for a problem A, and that we would like to
estimate a lower bound for a problem B. If we can transform A into B by a
transformation step whose cost is less than that for solving A, then B has the
same bound as A.
Set
A set is a collection of distinguishable objects, called its elements or members. For example, consider the set S = {7, 21, 57}. Then 7 ∈ {7, 21, 57} and 8 ∉ {7, 21, 57}, or equivalently, 7 ∈ S and 8 ∉ S.
We can also describe a set containing elements according to some rule. We write
{n : rule about n}
Set Cardinality
The number of elements in a set is called the cardinality, or size, of the set, denoted |S| or sometimes n(S). Two sets have the same cardinality if their elements can be put into a one-to-one correspondence. It is easy to see that the cardinality of the empty set is zero, i.e., |∅| = 0.
Multiset
If we do want to take the number of occurrences of members into account, we call
the group a multiset.
For example, {7} and {7, 7} are identical as sets, but {7} and {7, 7} are different as multisets.
Infinite Set
A set containing infinitely many elements; for example, the set of negative integers, the set of all integers, etc.
Empty Set
A set containing no members, denoted ∅ or {}.
Subset
For two sets A and B, we say that A is a subset of B, written A ⊆ B, if every member of A is also a member of B. Formally, A ⊆ B if
x ∈ A implies x ∈ B,
written x ∈ A => x ∈ B.
Proper Subset
The set A is a proper subset of B, written A ⊂ B, if A ⊆ B and A ≠ B.
Equal Sets
The sets A and B are equal, written A = B, if each is a subset of the other. Rephrasing the definition: let A and B be sets; then A = B if A ⊆ B and B ⊆ A.
Power Set
Let A be a set. The power set of A, written P(A) or 2^A, is the set of all subsets of A. That is, P(A) = {B : B ⊆ A}.
For example, consider A = {0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}, and the set of all pairs (2-tuples) whose elements are 0 and 1 is {(0, 0), (0, 1), (1, 0), (1, 1)}.
Disjoint Sets
Two sets A and B are disjoint if they have no elements in common, that is, if A ∩ B = ∅.
Union of Sets
The union of A and B, written A ∪ B, is the set we get by combining all elements of A and B into a single set. That is,
A ∪ B = {x : x ∈ A or x ∈ B}.
For any two sets A and B,
|A ∪ B| = |A| + |B| - |A ∩ B|,
from which we can conclude that
|A ∪ B| ≤ |A| + |B|.
Intersection of Sets
The intersection of sets A and B, written A ∩ B, is the set of elements that are both in A and in B. That is,
A ∩ B = {x : x ∈ A and x ∈ B}.
Partition of Set
A collection {Si} of nonempty sets forms a partition of a set S if
i. the sets are pairwise disjoint, that is, i ≠ j implies Si ∩ Sj = ∅, and
ii. their union is S, that is, S = ∪i Si.
Difference of Sets
Let A and B be sets. The difference of A and B is
A - B = {x : x ∈ A and x ∉ B}.
For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A - B = {1, 3}, while B - A = {4, 6, 8}.
Complement of a Set
All sets under consideration are subsets of some large set U called the universal set. Given a universal set U, the complement of A, written A', is the set of all elements under consideration that are not in A; that is,
A' = U - A.
For any set A ⊆ U we have the following laws:
i. A'' = A
ii. A ∩ A' = ∅
iii. A ∪ A' = U
Symmetric difference
The symmetric difference of A and B, written A Δ B, is the set of elements that are in A or in B but not in both. Therefore,
A Δ B = (A ∪ B) - (A ∩ B).
As an example, consider the two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The symmetric difference is A Δ B = {1, 3, 4, 6, 8}.
Sequences
A sequence of objects is a list of objects in some order. For example, the sequence 7,
21, 57 would be written as (7, 21, 57). In a set the order does not matter but in a
sequence it does.
Hence, (7, 21, 57) ≠ (57, 7, 21), but {7, 21, 57} = {57, 7, 21}.
Repetition is not permitted in a set, but repetition is permitted in a sequence. So the sequence (7, 7, 21, 57) is different from the sequence (7, 21, 57), whereas the sets {7, 7, 21, 57} and {7, 21, 57} are the same.
Tuples
Finite sequences are often called tuples. For example, an ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = {a, {a, b}}.
Cartesian Product
The Cartesian product of two sets A and B, written A × B, is the set of all ordered pairs whose first element is in A and whose second element is in B: A × B = {(a, b) : a ∈ A and b ∈ B}.
For example, let A = {1, 2} and B = {x, y, z}. Then A × B = {(1, x), (1, y), (1, z), (2, x), (2, y), (2, z)}.
n-tuples
The Cartesian product of n sets A1, A2, ..., An is the set of n-tuples
A1 × A2 × ... × An = {(a1, a2, ..., an) : ai ∈ Ai for i = 1, 2, ..., n},
whose cardinality is
|A1 × A2 × ... × An| = |A1| · |A2| ··· |An|
if all the sets are finite. We denote an n-fold Cartesian product over a single set A by
A^n = A × A × ... × A,
whose cardinality is |A^n| = |A|^n if A is finite.
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/MathAlgor/sets.html
Greedy Introduction
Greedy Approach
A greedy algorithm works by making the decision that seems most promising at any moment; it never reconsiders this decision, whatever situation may arise later.
Informal Algorithm
• Start with nothing.
• At every stage, without passing the given amount:
      o add the largest available coin to the coins already chosen.
Formal Algorithm
Make change for n units using the least possible number of coins.
MAKE-CHANGE (n)
    C ← {100, 25, 10, 5, 1}        // constant: the available denominations
    S ← {}                         // set that will hold the solution
    sum ← 0                        // sum of the items in the solution set
    WHILE sum ≠ n
        x ← the largest item in C such that sum + x ≤ n
        IF there is no such item THEN
            RETURN "No Solution"
        S ← S ∪ {value of x}
        sum ← sum + x
    RETURN S
Example: Make change for $2.89 (289 cents); here n = 289 and the solution contains 2 dollars, 3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the largest coin without worrying about the consequences. Moreover, it never changes its mind: once a coin has been included in the solution set, it remains there.
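A minimal Python sketch of MAKE-CHANGE under the same assumptions; the function name and the use of a list (so that repeated coins are allowed) are our choices, not part of the notes.

def make_change(n, coins=(100, 25, 10, 5, 1)):
    # Greedily add the largest coin that does not overshoot the amount n (in cents).
    solution = []
    total = 0
    while total != n:
        candidates = [c for c in coins if total + c <= n]
        if not candidates:
            return None            # no solution with this coin set
        x = max(candidates)
        solution.append(x)
        total += x
    return solution

print(make_change(289))   # [100, 100, 25, 25, 25, 10, 1, 1, 1, 1]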
To construct the solution in an optimal way, the algorithm maintains two sets: one contains candidates that have already been considered and chosen, while the other contains candidates that have been considered and rejected. The greedy algorithm consists of four functions:
1. A function that checks whether a chosen set of items provides a solution.
2. A function that checks the feasibility of a set.
3. The selection function, which tells which of the candidates is the most promising.
4. An objective function, which does not appear explicitly, and which gives the value of a solution.
Definitions of Feasibility
A feasible set (of candidates) is one that can be extended to yield a (not necessarily optimal) solution to the problem. A feasible set is promising if it can be extended to produce not merely a solution, but an optimal solution to the problem. In particular, the empty set is always promising. Why? Because an optimal solution always exists.
Greedy-Choice Property
The greedy-choice property says that a globally optimal solution can be arrived at by making a locally optimal (greedy) choice.
Knapsack Problem
Statement: A thief robbing a store can carry a maximum weight of W in his knapsack. There are n items; the ith item weighs wi and is worth vi dollars. What items should the thief take?
I. Fractional knapsack problem
The setup is the same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces, so the thief may decide to carry only a fraction xi of item i, where 0 ≤ xi ≤ 1.
II. 0-1 knapsack problem
The setup is the same, but the items may not be broken into smaller pieces, so the thief may decide either to take an item or to leave it (binary choice), but may not take a fraction of an item.
Dynamic-Programming Solution
to the 0-1 Knapsack Problem
Let i be the highest-numbered item in an optimal solution S for W pounds. Then S' = S - {i} is an optimal solution for W - wi pounds, and the value of the solution S is vi plus the value of the subproblem solution S'.
We can express this fact in the following formula: define c[i, w] to be the value of the solution for items 1, 2, . . . , i and maximum weight w. Then
c[i, w] = 0                                      if i = 0 or w = 0
c[i, w] = c[i-1, w]                              if wi > w
c[i, w] = max(vi + c[i-1, w-wi], c[i-1, w])      if i > 0 and wi ≤ w
This says that the value of a solution for i items either includes item i, in which case it is vi plus a subproblem solution for i - 1 items and the weight excluding wi, or does not include item i, in which case it is a subproblem solution for i - 1 items and the same weight. That is, if the thief picks item i, he takes value vi and can then choose from items 1, 2, . . . , i - 1 up to the weight limit w - wi, getting c[i-1, w-wi] additional value. On the other hand, if he decides not to take item i, he can choose from items 1, 2, . . . , i - 1 up to the weight limit w, getting c[i-1, w] value. The better of these two choices should be made.
Although the 0-1 knapsack problem is a different problem, the above formula for c is similar to the LCS formula: the boundary values are 0, and the other values are computed from the input and "earlier" values of c. So the 0-1 knapsack algorithm is like the LCS-length algorithm given in the CLR book for finding a longest common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores the c[i, j] values in a table, that is, a two-dimensional array c[0 . . n, 0 . . W], whose entries are computed in row-major order. That is, the first row of c is filled in from left to right, then the second row, and so on. At the end of the computation, c[n, W] contains the maximum value that can be packed into the knapsack.
Dynamic-0-1-knapsack (v, w, n, W)
    for w = 0 to W
        do c[0, w] = 0
    for i = 1 to n
        do c[i, 0] = 0
           for w = 1 to W
               do if wi ≤ w
                      then if vi + c[i-1, w-wi] > c[i-1, w]
                               then c[i, w] = vi + c[i-1, w-wi]
                               else c[i, w] = c[i-1, w]
                      else c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, W] and tracing backwards where the optimal values came from. If c[i, w] = c[i-1, w], item i is not part of the solution, and we continue tracing with c[i-1, w]. Otherwise item i is part of the solution, and we continue tracing with c[i-1, w-wi].
Analysis
This dynamic-0-1-knapsack algorithm takes Θ(nW) time, broken up as follows: Θ(nW) time to fill the c-table, which has (n+1)·(W+1) entries, each requiring Θ(1) time to compute, and O(n) time to trace the solution, because the tracing process starts in row n of the table and moves up one row at each step.
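A runnable Python sketch of the table-filling and traceback just described; the 1-indexed layout (with unused 0th entries) mirrors the pseudocode, and the sample data are ours.

def knapsack_01(v, w, W):
    n = len(v) - 1                       # v[0], w[0] are unused placeholders
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for wt in range(1, W + 1):
            if w[i] <= wt and v[i] + c[i - 1][wt - w[i]] > c[i - 1][wt]:
                c[i][wt] = v[i] + c[i - 1][wt - w[i]]
            else:
                c[i][wt] = c[i - 1][wt]
    # Trace back which items were taken.
    items, wt = [], W
    for i in range(n, 0, -1):
        if c[i][wt] != c[i - 1][wt]:     # item i is part of the solution
            items.append(i)
            wt -= w[i]
    return c[n][W], items

value, items = knapsack_01([0, 60, 100, 120], [0, 10, 20, 30], 50)
print(value, items)                      # 220 [3, 2]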
Greedy Solution to the Fractional Knapsack Problem
It is clear that an optimal solution must fill the knapsack exactly, for otherwise we could add a fraction of one of the remaining objects and increase the value of the load. Thus in an optimal solution ∑i=1..n xi·wi = W.
Greedy-fractional-knapsack (w, v, W)
    FOR i = 1 to n
        do x[i] = 0
    weight = 0
    WHILE weight < W
        do i = the best remaining item (largest vi/wi)
           IF weight + w[i] ≤ W
               then x[i] = 1
                    weight = weight + w[i]
               else x[i] = (W - weight) / w[i]
                    weight = W
    return x
Analysis
If the items are already sorted into decreasing order of vi/wi, then the while-loop takes time in O(n); therefore, the total time including the sort is in O(n log n).
If we keep the items in a heap with the largest vi/wi at the root, then building the heap takes O(n) time and each iteration of the while-loop takes O(lg n) time (the heap property must be restored after the root is removed). Although this data structure does not alter the worst case, it may be faster if only a small number of items are needed to fill the knapsack.
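A small Python sketch of Greedy-fractional-knapsack, assuming the items are given as parallel value and weight lists; the names and sample data are illustrative.

def fractional_knapsack(values, weights, W):
    n = len(values)
    # Consider items in decreasing order of value/weight ratio.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    x = [0.0] * n
    weight = 0.0
    for i in order:
        if weight + weights[i] <= W:
            x[i] = 1.0                          # take the whole item
            weight += weights[i]
        else:
            x[i] = (W - weight) / weights[i]    # take a fraction and the knapsack is full
            break
    return x, sum(x[i] * values[i] for i in range(n))

x, total = fractional_knapsack([60, 100, 120], [10, 20, 30], 50)
print(x, total)   # [1.0, 1.0, 0.6666666666666666] 240.0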
One variant of the 0-1 knapsack problem occurs when the order of the items sorted by increasing weight is the same as their order sorted by decreasing value. The optimal solution to this variant is to sort the items by value in decreasing order, then pick the most valuable item, which also has the least weight, provided its weight is less than the remaining capacity, and reduce the remaining capacity by the weight of the item just picked. The second item to pick is the most valuable item among those remaining. Keep following the same strategy until the thief cannot carry any more items (due to weight).
Proof
One way to prove the correctness of the above algorithm is to prove the greedy-choice property and the optimal-substructure property. It consists of two steps. First, prove that there exists an optimal solution that begins with the greedy choice given above. Second, prove that if A is an optimal solution to the original problem S, then A - {a} is also an optimal solution to the problem S - {a}, where a is the item the thief picked as the greedy choice and S - {a} is the subproblem after the first greedy choice has been made. The second part is easy to prove, since the more valuable items have less weight.
Note that if the item of value v' and weight w' chosen by the greedy strategy is not in an optimal solution, it can replace any item of that solution: because w' < w the replacement remains feasible, and because v' > v it increases the value. □
An Activity-Selection Problem
Problem Statement
Given a set S of n activities, where si is the start time and fi the finish time of the ith activity, find a maximum-size set of mutually compatible activities.
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do not overlap, that is, i and j are compatible if si ≥ fj or sj ≥ fi.
Greedy Algorithm
I. Sort the input activities in increasing order of finish time: f1 ≤ f2 ≤ . . . ≤ fn.
II. Call GREEDY-ACTIVITY-SELECTOR (s, f):
1. n = length[s]
2. A = {1}
3. j = 1
4. for i = 2 to n
5.     do if s[i] ≥ f[j]
6.            then A = A ∪ {i}
7.                 j = i
8. return set A
Analysis
Part I requires O(n lg n) time (using merge sort or heapsort).
Part II requires Θ(n) time, assuming that the activities were already sorted in Part I by their finish times.
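A compact Python sketch of the two parts (sort by finish time, then a single greedy scan); the function name and the sample data, which follow the classic CLR activity-selection example with 0-indexed activities, are ours.

def greedy_activity_selector(s, f):
    # Part I: order activity indices by finish time. Part II: greedy scan.
    order = sorted(range(len(s)), key=lambda i: f[i])
    A = [order[0]]
    j = order[0]
    for i in order[1:]:
        if s[i] >= f[j]:       # compatible with the last chosen activity
            A.append(i)
            j = i
    return A

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(greedy_activity_selector(s, f))   # [0, 3, 7, 10]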
Correctness
Note that greedy algorithms do not always produce optimal solutions, but GREEDY-ACTIVITY-SELECTOR does.
Proof
I. Let S = {1, 2, . . . , n} be the set of activities. Since the activities are in order of finish time, activity 1 has the earliest finish time. Suppose A ⊆ S is an optimal solution, and let the activities in A be ordered by finish time. Suppose the first activity in A is k.
If k = 1, then A begins with the greedy choice and we are done (or, to be very precise, there is nothing to prove here).
If k ≠ 1, we want to show that there is another optimal solution B that begins with the greedy choice, activity 1. Let B = (A - {k}) ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the same number of activities as A, i.e., |A| = |B|, B is also optimal.
II. Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If A is an optimal solution to the original problem S, then A' = A - {1} is an optimal solution to the activity-selection problem S' = {i ∈ S : si ≥ f1}.
Why? Because if we could find a solution B' to S' with more activities than A', adding activity 1 to B' would yield a solution B to S with more activities than A, thereby contradicting the optimality of A. □
LECTURE-HALL-ASSIGNMENT (s, f)
    n = length[s]
    for i = 1 to n
        do HALL[i] = NIL
    k = 1
    while (NOT empty(s))
        do HALL[k] = GREEDY-ACTIVITY-SELECTOR (s, f, n)
           k = k + 1
    return HALL
The following changes can be made in GREEDY-ACTIVITY-SELECTOR (s, f) (see CLR): activities already assigned to a hall are marked with "-" in s and are skipped on later passes.
GREEDY-ACTIVITY-SELECTOR (s, f, n)
    j = first(s)
    A = {j}
    for i = j + 1 to n
        do if s[i] ≠ "-"
               then if s[i] ≥ f[j]
                        then A = A ∪ {i}
                             s[i] = "-"
                             j = i
    return A
Correctness
The algorithm can be shown to be correct and optimal. For a contradiction, assume the number of lecture halls used is not optimal, that is, the algorithm allocates more halls than necessary. Then there exists a set of activities B that have been wrongly allocated: an activity b belonging to B that has been allocated to hall H[i] should optimally have been allocated to H[k]. But this implies that the activities for lecture hall H[k] were not allocated optimally, contradicting the fact that GREEDY-ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.
Analysis
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR runs in Θ(n) time, so the running time of this algorithm is O(n²).
Note that choosing the activity with the least overlap will not always produce an optimal solution. For example, consider the set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0, 3), (0, 2), (7, 10), (8, 10)}. Here the activity with the least overlap with other activities is (4, 6), so it would be picked first. But that would prevent the optimal solution {(0, 1), (1, 5), (5, 9), (9, 10)} from being found.
Huffman Codes
Example
Suppose we have data consisting of 100,000 characters that we want to compress. The characters in the data occur with the following frequencies:
Character:   a        b        c        d        e       f
Frequency:   45,000   13,000   12,000   16,000   9,000   5,000
Consider the problem of designing a "binary character code" in which each
character is represented by a unique binary string.
Character:          a        b        c        d        e       f
Frequency:          45,000   13,000   12,000   16,000   9,000   5,000
Fixed-length code:  000      001      010      011      100     101
Conclusion
The fixed-length code requires 300,000 bits while the variable-length code requires 224,000 bits.
Prefix Codes
A prefix code is one in which no codeword is a prefix of any other codeword. The reason prefix codes are desirable is that they simplify encoding (compression) and decoding.
Can we do better?
Character:             a        b        c        d        e       f
Frequency:             45,000   13,000   12,000   16,000   9,000   5,000
Variable-length code:  0        101      100      111      1101    1100
This implies that the total number of bits is 45,000·1 + (13,000 + 12,000 + 16,000)·3 + (9,000 + 5,000)·4 = 45,000 + 123,000 + 56,000 = 224,000 bits.
Encoding: Concatenate the codewords representing each character of the file.
String   Encoding
TEA      10 00 010
SEA      011 00 010
TEN      10 00 110
(Here T = 10, E = 00, A = 010, S = 011, N = 110.)
Decoding
Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous. To decode (translate back to the original characters), identify the initial codeword, translate it back to the original character, remove it from the encoded file, and repeat on the remainder.
For example, using the "variable-length code" table above, the string 001011101 parses uniquely as 0·0·101·1101, which decodes to aabe.
A convenient representation of the decoding process is a binary tree whose leaves are the characters. We interpret the binary codeword for a character as the path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child." Note that an optimal code for a file is always represented by a full binary tree, in which every internal node has two children.
Proof: Let T be the binary tree corresponding to a prefix code, and suppose T is not full. Then there must exist an internal node, say x, that has only one child, y. Construct another binary tree T' which has the same leaves as T and the same depths, except for the leaves in the subtree rooted at y in T; these leaves have depth one less in T'. This implies that T cannot correspond to an optimal prefix code.
To obtain T', simply merge x and y into a single node z: z is a child of the parent of x (if a parent exists) and z is a parent of any children of y. Then T' has the desired properties: it corresponds to a code on the same alphabet as the code for T, and the leaves that were in the subtree rooted at y in T have depth in T' strictly less (by one) than their depth in T. This completes the proof. □
Character:          a        b        c        d        e       f
Frequency:          45,000   13,000   12,000   16,000   9,000   5,000
Fixed-length code:  000      001      010      011      100     101
Figure
If C is the alphabet from which the characters are drawn, then the tree for an optimal prefix code has exactly |C| leaves (one for each letter) and exactly |C| - 1 internal nodes. Given a tree T corresponding to the prefix code, we can compute the number of bits required to encode a file. For each character c in C, let f(c) be the frequency of c and let dT(c) denote the depth of c's leaf; note that dT(c) is also the length of the codeword for c. The number of bits required to encode the file is then
B(T) = ∑c∈C f(c)·dT(c),
which we define as the cost of the tree T.
Therefore, the cost of the tree corresponding to the optimal prefix code above is 224 (224 × 1,000 = 224,000).
Analysis
• The priority queue Q is implemented as a binary heap.
• Line 2 can be performed using BUILD-HEAP (p. 145, CLR) in O(n) time.
• The FOR loop is executed n - 1 times, and since each heap operation requires O(lg n) time, the FOR loop contributes (n - 1)·O(lg n) = O(n lg n).
• Thus the total running time of Huffman on a set of n characters is O(n lg n).
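A minimal Python sketch of the Huffman construction using a binary heap (heapq) as the priority queue Q; the nested-tuple tree representation and the names are our choices, not CLR's pseudocode.

import heapq

def huffman(freq):
    # freq: dict mapping character -> frequency.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]  # i breaks ties
    heapq.heapify(heap)                     # corresponds to BUILD-HEAP, O(n)
    count = len(heap)
    while len(heap) > 1:                    # n - 1 greedy merge steps
        f1, _, left = heapq.heappop(heap)   # the two lowest-frequency trees
        f2, _, right = heapq.heappop(heap)
        count += 1
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):         # internal node: 0 = left, 1 = right
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

# Codeword lengths for this input are a:1, b:3, c:3, d:3, e:4, f:4 -> 224,000 bits total.
print(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))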
Operation of the Algorithm
The example uses an alphabet of 8 letters whose frequencies are the first 8 Fibonacci numbers (1, 1, 2, 3, 5, 8, 13, 21). Since there are 8 letters in the alphabet, the initial queue size is n = 8, and 7 merge steps are required to build the tree. The final tree represents the optimal prefix code.
Figure
The codeword for a letter is the sequence of the edge labels on the path from
the root to the letter. Thus, the optimal Huffman code is as follows:
h: 1
g: 10
f: 110
e: 1110
d: 11110
c: 111110
b: 1111110
a: 1111111
As we can see, the tree is one long limb with the leaves hanging off it. This is true for Fibonacci weights in general, because the Fibonacci recurrence Fi+1 = Fi + Fi-1 implies that ∑j=0..i Fj = Fi+2 - 1 < Fi+2, so the subtree built so far is always lighter than every letter not yet merged except the next one, and the tree grows as a chain. To prove the identity, write Fj as Fj+1 - Fj-1 and sum from 0 to i, using F-1 = 0 and F0 = 1.
Correctness of the Huffman Code Algorithm
Greedy-Choice Property: Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.
Proof Idea
Take the tree T representing an optimal prefix code and transform T into a tree T' representing another optimal prefix code such that the two characters x and y appear as sibling leaves of maximum depth in T'. If we can do this, then their codewords will have the same length and differ only in the last bit.
Figures
Proof
Let characters b and c be sibling leaves of maximum depth in tree T. Without loss of generality assume that f[b] ≤ f[c] and f[x] ≤ f[y]. Since f[x] and f[y] are the two lowest leaf frequencies, in order, and f[b] and f[c] are two arbitrary leaf frequencies, in order, we have f[x] ≤ f[b] and f[y] ≤ f[c]. As shown in the figure above, exchange the positions of the leaves to get first T' and then T''. By the formula B(T) = ∑c∈C f(c)·dT(c), the difference in cost between T and T' is
Proof Idea
Figure
Proof
We show that the cost B(T) of tree T can be expressed in terms of the cost B(T') of the tree T' by considering the component costs in the equation B(T) = ∑c∈C f(c)·dT(c). For each c ∈ C - {x, y}, we have dT(c) = dT'(c), so these characters contribute equally to B(T) and B(T').
Proof
Let S be the set of integers n ≥ 2 for which the Huffman procedure produces a tree representing an optimal prefix code for frequency f and alphabet C with |C| = n.
If C = {x, y}, then Huffman produces one of the following optimal trees:
figure
Proof
Let T be a full binary tree with n leaves. Apply the induction hypothesis on the number of leaves in T. When n = 2 (the case n = 1 is trivially true), there are two leaves x and y (say) with the same parent z, and the cost of T is B(T) = f[x] + f[y].
Spanning Trees
A spanning tree of a graph is any tree that includes every vertex in the graph.
Little more formally, a spanning tree of a graph G is a subgraph of G that is a
tree and contains all the vertices of G. An edge of a spanning tree is called a
branch; an edge in the graph that is not in the spanning tree is called a chord.
We construct a spanning tree whenever we want to find a simple, cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.).
Spanning trees are important for the following reasons:
• Spanning trees construct a sparse subgraph that tells a lot about the original graph.
• Spanning trees are very important in designing efficient routing algorithms.
• Some hard problems (e.g., the Steiner tree problem and the traveling salesman problem) can be solved approximately by using spanning trees.
• Spanning trees have wide applications in many areas, such as network design.
Note that each time a step of the algorithm is performed, one edge is
examined. If there is only a finite number of edges in the graph, the algorithm
must halt after a finite number of steps. Thus, the time complexity of this
algorithm is clearly O(n), where n is the number of edges in the graph.
Greediness It is easy to see that this algorithm has the property that
each edge is examined at most once. Algorithms, like this one, which examine
each entity at most once and decide its fate once and for all during that
examination are called greedy algorithms. The obvious advantage of greedy
approach is that we do not have to spend time reexamining entities.
Consider the problem of finding a spanning tree with the smallest possible
weight or the largest possible weight, respectively called a minimum spanning
tree and a maximum spanning tree. It is easy to see that if a graph possesses a
spanning tree, it must have a minimum spanning tree and also a maximum
spanning tree. These spanning trees can be constructed by performing the
spanning tree algorithm (e.g., above mentioned algorithm) with an appropriate
ordering of the edges.
Minimum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of nondecreasing weight (smallest first, largest last). If two or more edges have the same weight, order them arbitrarily.
Problem Find a subset T of the edges of G such that all the vertices
remain connected when only the edges T are used, and the sum of the lengths
of the edges in T is as small as possible.
Let G' = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: a connected graph with n vertices must have at least n - 1 edges, and more than n - 1 edges implies at least one cycle.] So n - 1 is the minimum number of edges in T. Hence, if G' is connected and T has more than n - 1 edges, then T contains at least one cycle, and we can remove at least one edge without disconnecting G' (choose an edge that is part of the cycle). This decreases the total length of the edges in T, so the new solution is preferable to the old one. Thus, a set T with more than n - 1 edges cannot be an optimal solution. It follows that T must have exactly n - 1 edges, and since G' is connected, it must be a tree. This G' is called a Minimum Spanning Tree (MST).
Kruskal's Algorithm
In Kruskal's algorithm the selection function chooses edges in increasing order of length without worrying too much about their connection to previously chosen edges, except that it never forms a cycle. The result is a forest of trees that grows until all the trees in the forest (all the components) merge into a single tree.
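A short Python sketch of Kruskal's algorithm with a simple union-find (disjoint-set) structure to detect cycles; the edge-list format, names and sample graph are our assumptions.

def kruskal(n, edges):
    # edges: list of (weight, u, v) with vertices numbered 0..n-1.
    parent = list(range(n))
    def find(x):                        # find the root, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):       # examine edges by nondecreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # adding (u, v) does not create a cycle
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))    # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]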
Prim's Algorithm
This algorithm was first proposed by Jarník, but is typically attributed to Prim. It starts from an arbitrary vertex (the root) and at each stage adds a new branch (edge) to the tree already constructed; the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy in the sense that at each step the partial spanning tree is augmented with an edge that is the smallest among all possible adjacent edges.
MST-PRIM
    T = {}
    Let r be an arbitrarily chosen vertex from V.
    U = {r}
    WHILE |U| < n
        DO Find u in U and v in V - U such that the edge (u, v) is a smallest edge between U and V - U.
           T = T ∪ {(u, v)}
           U = U ∪ {v}
Analysis
The algorithm spends most of its time finding the smallest edge, so its running time basically depends on how we search for this edge.
Straightforward method
Just find the smallest edge by searching the adjacency lists of the vertices in V. In this case, each iteration costs O(m) time, yielding a total running time of O(mn).
Binary heap
By using binary heaps, the algorithm runs in O(m log n).
Fibonacci heap
By using Fibonacci heaps, the algorithm runs in O(m + n log n) time.
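A Python sketch of MST-PRIM using a binary heap (heapq), which gives the O(m log n) behaviour mentioned above; this is the "lazy deletion" variant (stale entries are skipped rather than decrease-keyed), and the adjacency-list format and sample graph are our assumptions.

import heapq

def prim(adj, r=0):
    # adj: {u: [(weight, v), ...]} for an undirected weighted graph; r is the root.
    in_tree = {r}
    tree = []
    heap = [(w, r, v) for w, v in adj[r]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)       # smallest edge leaving the tree so far
        if v in in_tree:
            continue                        # stale entry: v was already added
        in_tree.add(v)
        tree.append((u, v, w))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return tree

adj = {0: [(1, 1), (4, 2)], 1: [(1, 0), (3, 2), (2, 3)],
       2: [(4, 0), (3, 1), (5, 3)], 3: [(2, 1), (5, 2)]}
print(prim(adj))    # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]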
Dijkstra's Algorithm
Problem: Determine the length of the shortest path from the source to each of the other nodes of the graph. This problem can be solved by a greedy algorithm often called Dijkstra's algorithm.
The algorithm maintains two sets of vertices, S and C. At every stage the set S contains those vertices that have already been selected and the set C contains all the other vertices. Hence we have the invariant property V = S ∪ C. When the algorithm starts, S contains only the source vertex, and when the algorithm halts, S contains all the vertices of the graph and the problem is solved. At each step the algorithm chooses the vertex in C whose distance to the source is least and adds it to S.
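A compact Python sketch of Dijkstra's algorithm with a heap-based priority queue; the adjacency-list format, names and sample graph are our assumptions (edge weights must be nonnegative).

import heapq

def dijkstra(adj, source):
    # adj: {u: [(weight, v), ...]} with nonnegative weights.
    dist = {source: 0}
    done = set()                      # the set S of selected vertices
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)    # vertex of C with least distance to the source
        if u in done:
            continue
        done.add(u)                   # u's distance is now final
        for w, v in adj[u]:
            if v not in done and d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = {'s': [(7, 'a'), (2, 'b')], 'a': [(3, 'c')], 'b': [(4, 'a'), (8, 'c')], 'c': []}
print(dijkstra(adj, 's'))   # {'s': 0, 'a': 6, 'b': 2, 'c': 9}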
Divide-and-Conquer Algorithm
Problem: Given a sorted array A of n elements and a query value q, determine where q fits in the array. Formally, find the index i such that 1 ≤ i ≤ n+1 and A[i-1] < q ≤ A[i].
Sequential Search
Look sequentially at each element of A until either we reach the end of the array A or find an item no smaller than q.
Analysis
This algorithm clearly takes Θ(r) time, where r is the index returned. This is Ω(n) in the worst case and O(1) in the best case.
If the elements of array A are distinct and the query point q is indeed in the array, then the loop is executed (n + 1)/2 times on average. On average (as well as in the worst case), sequential search therefore takes Θ(n) time.
Binary Search
Look for q either in the first half or in the second half of the array A. Compare q to the element in the middle of the array: let k = ⌈n/2⌉. If q ≤ A[k], then search in A[1 . . k]; otherwise search A[k+1 . . n] for q. Binary search for q in a subarray A[i . . j] proceeds with the promise that A[i-1] < q ≤ A[j].
Analysis
Binary search can be accomplished in logarithmic time in the worst case, i.e., T(n) = Θ(log n). This version of binary search also takes logarithmic time in the best case.
Analysis
The analysis of the iterative algorithm is identical to that of its recursive counterpart.
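An iterative Python sketch of the binary search just described; it uses the usual 0-indexed convention, returning the smallest index i with q ≤ A[i] (or len(A) if q exceeds every element). The function name and test data are ours.

def binary_search(A, q):
    lo, hi = 0, len(A)
    while lo < hi:
        mid = (lo + hi) // 2
        if q <= A[mid]:
            hi = mid          # the answer lies in A[lo..mid]
        else:
            lo = mid + 1      # the answer lies in A[mid+1..hi]
    return lo

A = [2, 5, 7, 7, 11, 13]
print(binary_search(A, 7))    # 2
print(binary_search(A, 8))    # 4
print(binary_search(A, 20))   # 6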
Dynamic Programming Algorithms
Dynamic programming solves problems bottom-up. Bottom-up means:
i. Start with the smallest subproblems.
ii. Combine their solutions to obtain solutions to subproblems of increasing size, until the solution of the original problem is reached.
Dynamic-Programming Algorithm
for the Activity-Selection Problem
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do not overlap, that is, i and j are compatible if si ≥ fj or sj ≥ fi.
Dynamic-Programming Algorithm
The finish times are in a sorted array f[i] and the start times are in array s[i]. The array m[i] will store the value mi, where mi is the size of the largest set of mutually compatible activities among activities {1, 2, . . . , i}. Let BINARY-SEARCH(f, s) return the index of a number i in the sorted array f such that f[i] ≤ s ≤ f[i + 1].
for i = 1 to n
    do m[i] = max(m[i-1], 1 + m[BINARY-SEARCH(f, s[i])])
We have P[i] = 1 if activity i is in the optimal selection, and P[i] = 0 otherwise.
i = n
while i > 0
    do if m[i] = m[i-1]
           then P[i] = 0
                i = i - 1
           else P[i] = 1
                i = BINARY-SEARCH(f, s[i])
Analysis
The running time of this algorithm is O(n lg n) because of the binary search, which takes O(lg n) time per activity, as opposed to the O(n) running time of the greedy scan. The algorithm assumes that the activities are already sorted by increasing finish time.
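A Python sketch of this dynamic-programming formulation, using the standard bisect module in place of BINARY-SEARCH; the array layout (1-indexed m with m[0] = 0), names and sample data are ours, and it assumes si < fi for every activity.

import bisect

def dp_activity_selection(s, f):
    # Activities are given already sorted by finish time: f[0] <= ... <= f[n-1].
    n = len(s)
    m = [0] * (n + 1)      # m[i] = largest set of compatible activities among 1..i
    for i in range(1, n + 1):
        # Number of activities that finish no later than activity i starts.
        j = bisect.bisect_right(f, s[i - 1])
        m[i] = max(m[i - 1], 1 + m[j])
    return m[n]

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(dp_activity_selection(s, f))   # 4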
Amortized Analysis
In an amortized analysis, the time required to perform a sequence of data-structure operations is averaged over all the operations performed. Amortized analysis can be used to show that the average cost of an operation is small, if one averages over a sequence of operations, even though a single operation might be expensive. Unlike average-case analysis, which relies on a probability distribution over the inputs, amortized analysis guarantees the 'average' performance of each operation in the worst case.
CLR covers the three most common techniques used in amortized analysis. The main difference between them is the way the cost is assigned.
1. Aggregate Method
   o Compute an upper bound T(n) on the total cost of any sequence of n operations; the amortized cost per operation is then T(n)/n.
2. Accounting Method
   o Overcharge some operations early in the sequence. This 'overcharge' is used later in the sequence to pay for operations that are charged less than they actually cost.
3. Potential Method
   o Maintain the credit as the 'potential energy' of the data structure, to pay for future operations.
Aggregate Method
Consider a stack with the operations PUSH, POP and MULTIPOP:
MULTIPOP (S, k)
    while (NOT STACK-EMPTY(S) and k ≠ 0)
        do POP(S)
           k = k - 1
Analysis
i. The worst-case cost of a single MULTIPOP is O(n), so n successive stack operations would appear to cost O(n²). This O(n²) bound is not tight, because each item can be popped at most once for each time it is pushed.
ii. In a sequence of n mixed operations, MULTIPOP can be called at most n/2 times, and the total number of pops (including those inside MULTIPOP) cannot exceed the total number of pushes. Since the cost of each PUSH and POP is O(1), the cost of n stack operations is O(n). Therefore, the amortized cost of an operation is the average: O(n)/n = O(1).
Amortized Analysis
As a second example of the aggregate method, consider a binary counter: bit i is flipped ⌊n/2^i⌋ times in a sequence of n INCREMENT operations starting from zero, so the total number of bit flips is
∑i=0..⌊lg n⌋ ⌊n/2^i⌋ < n · ∑i=0..∞ 1/2^i = 2n,
and the amortized cost per operation is O(1).
Accounting Method
For the stack, the actual costs of the operations are:
PUSH (S, x):      1
POP (S):          1
MULTIPOP (S, k):  min(k, s)
where s is the stack size. The amortized cost assignments are:
PUSH:      2
POP:       0
MULTIPOP:  0
Observe that the amortized cost of each operation is O(1). We must show that one can pay for any sequence of stack operations by charging the amortized costs. The two units of cost collected for each PUSH are used as follows:
• 1 unit is used to pay the cost of the PUSH itself.
• 1 unit is collected in advance (as credit) to pay for a potential future POP.
Therefore, for any sequence of n PUSH, POP, and MULTIPOP operations, the total amortized cost is an upper bound on the total actual cost.
As another example of the accounting method, consider a sequence in which the ith operation has actual cost i when i is an exact power of 2 and 1 otherwise, and charge an amortized cost of 3 per operation. The credit after i operations is
Ci = ∑j=1..i 3 - (total actual cost of the first i operations)
   = 3i - (2^(⌊lg i⌋+1) + i - ⌊lg i⌋ - 2).
If i = 2^k, where k ≥ 0, then
Ci = 3i - (2^(k+1) + i - k - 2) = k + 2.
If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then
Ci = 3i - (2^(k+1) + i - k - 2) = 2j + k + 2.
In both cases the credit is positive, so the total amortized cost 3i is an upper bound on the total actual cost; since the total amortized cost is O(n), so is the total actual cost.
Now consider a stack whose size never exceeds k and whose entire contents are copied (STACK-COPY) after every k operations. The amortized cost assignments are:
PUSH:        4
POP:         0
MULTIPOP:    0
STACK-COPY:  0
Every time we PUSH, we pay 1 dollar (unit) to perform the actual operation and store 1 dollar in the bank. That leaves us with 2 dollars, which are placed on the pushed element, x say. When we later POP x off the stack, one of the two dollars pays for the POP operation and the other is again put into the bank. The money in the bank pays for the STACK-COPY operations: since after every k operations there are at least k dollars in the bank and the stack size never exceeds k, there are enough dollars (units) in the bank to pay for the STACK-COPY operations. The cost of n stack operations, including copying the stack, is therefore O(n).
As a further example of the accounting method, consider the binary counter with the INCREMENT operation:
INCREMENT (A)
1. i = 0
2. while i < length[A] and A[i] = 1
3.     do A[i] = 0
4.        i = i + 1
5. if i < length[A]
6.     then A[i] = 1
Within the while loop, the cost of resetting the bits is paid for by the dollars on the bits that are reset. At most one bit is set, in line 6 above, and therefore the amortized cost of an INCREMENT operation is at most 2 dollars (units). Thus, for n INCREMENT operations, the total amortized cost is O(n), which bounds the total actual cost.
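A small Python simulation of this analysis: it counts the actual number of bit flips over n INCREMENT operations and confirms the 2n bound; the counter width, names and n are our choices.

def increment(A):
    # Returns the number of bits flipped by one INCREMENT.
    i, flips = 0, 0
    while i < len(A) and A[i] == 1:
        A[i] = 0                  # reset a 1-bit, paid for by its stored credit
        flips += 1
        i += 1
    if i < len(A):
        A[i] = 1                  # set at most one bit
        flips += 1
    return flips

A = [0] * 16
n = 1000
total = sum(increment(A) for _ in range(n))
print(total, "<", 2 * n)          # prints: 1994 < 2000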
Consider a Variant
Let us implement a binary counter as a bit vector so that any sequence of n INCREMENT and RESET operations takes O(n) time on an initially zero counter. The goal here is not only to increment the counter but also to reset it to zero, that is, to make all bits in the binary counter zero. The new field max[A] holds the index of the high-order 1 in A. Initially, max[A] is set to -1, and max[A] is updated appropriately when the counter is incremented (or reset). To bound the cost of RESET, we limit it to an amount that can be covered by credit from earlier INCREMENTs.
INCREMENT (A)
1. i = 0
2. while i < length[A] and A[i] = 1
3.     do A[i] = 0
4.        i = i + 1
5. if i < length[A]
6.     then A[i] = 1
7.          if i > max[A]
8.              then max[A] = i
9.     else max[A] = -1
Note that lines 7, 8 and 9 have been added to the CLR binary-counter algorithm.
RESET(A)
For i = 0 to max[A]
do A[i] = 0
max[A] = -1
For the counter in CLR we assumed that it costs 1 dollar to flip a bit. In addition, we now assume that we need 1 dollar to update max[A]. Setting and resetting of bits work exactly as for the binary counter in CLR: pay 1 dollar to set a bit to 1 and place another 1 dollar on the same bit as credit, so that the credit on each bit will pay to reset the bit during a later increment.
In addition, we use 1 dollar to update max[A], and if max[A] increases, we place 1 dollar as credit on the new high-order 1. (If max[A] does not increase, we just waste that dollar.) Since RESET only manipulates bits at positions at or below max[A], and every such bit was seen by an earlier INCREMENT, every bit seen by RESET has one dollar of credit on it. So the zeroing of bits by RESET can be completely paid for by the credit stored on the bits; we just need one more dollar to pay for resetting max[A].
Thus, charging 4 dollars for each INCREMENT and 1 dollar for each RESET is sufficient, so a sequence of n INCREMENT and RESET operations takes O(n) amortized time.
Potential Method
Notation:
• D0 is the initial data structure (e.g., a stack).
• Di is the data structure after the ith operation.
• ci is the actual cost of the ith operation.
• The potential function Ψ maps each Di to its potential value Ψ(Di).
The amortized cost ĉi of the ith operation with respect to the potential function Ψ is defined by
ĉi = ci + Ψ(Di) - Ψ(Di-1)    --------- (1)
Summing over a sequence of n operations,
∑i=1..n ĉi = ∑i=1..n [ci + Ψ(Di) - Ψ(Di-1)]
           = ∑i=1..n ci + Ψ(D1) + Ψ(D2) + . . . + Ψ(Dn-1) + Ψ(Dn) - {Ψ(D0) + Ψ(D1) + . . . + Ψ(Dn-1)}
           = ∑i=1..n ci + Ψ(Dn) - Ψ(D0)    ----------- (2)
As an example, consider again the sequence in which the ith operation costs i if i is an exact power of 2 and 1 otherwise, and take the potential function Ψ(Di) = 2i - 2^(⌊lg i⌋+1), with Ψ(D0) = 0.
If i = 2^k, where k ≥ 0, then 2^(⌊lg i⌋+1) = 2^(k+1) = 2i and 2^(⌊lg(i-1)⌋+1) = 2^k = i, so
ĉi = ci + Ψ(Di) - Ψ(Di-1) = i + (2i - 2i) - (2(i-1) - i) = 2.
If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then 2^(⌊lg i⌋+1) = 2^(⌊lg(i-1)⌋+1), so
ĉi = ci + Ψ(Di) - Ψ(Di-1) = 1 + 2i - 2(i-1) = 3.
Because ∑i=1..n ĉi = ∑i=1..n ci + Ψ(Dn) - Ψ(D0) and Ψ(Di) ≥ Ψ(D0) for all i, the total amortized cost of n operations is an upper bound on the total actual cost. Therefore, the total amortized cost of a sequence of n operations is O(n), and the amortized cost per operation is O(n)/n = O(1).
For the stack operations, define the potential Ψ to be the number of objects on the stack; then Ψ(Di) ≥ 0 = Ψ(D0).
If the ith operation on a stack containing s objects is a PUSH operation, then the potential difference is
Ψ(Di) - Ψ(Di-1) = (s + 1) - s = 1.
In simple words, if the ith operation is a PUSH, the stack afterwards holds one more object than before. By equation (1), the amortized cost of this PUSH operation is
ĉi = ci + Ψ(Di) - Ψ(Di-1) = 1 + 1 = 2.
MULTIPOP
If the ith operation is MULTIPOP(S, k) and k' = min(k, s) objects are popped, the actual cost is k' and the potential difference is Ψ(Di) - Ψ(Di-1) = -k', so the amortized cost is k' - k' = 0.
POP
Similarly, the amortized cost of a POP operation is 0.
Analysis
Since the amortized cost of each of the three operations is O(1), the total amortized cost of n operations is O(n), and this total amortized cost is an upper bound on the total actual cost.
Proof
We know that the amortized cost ĉi of operation i is defined as
ĉi = ci + Ψ(Di) - Ψ(Di-1).
For the heap operations, consider the potential function Ψ(D) = lg(n!), where n is the number of items in D.
For the binary counter, take as potential Ψ(Di) = bi, the number of 1's in the counter after the ith operation. Suppose the ith INCREMENT operation resets ti bits. Its actual cost is therefore at most ti + 1. Why? Because in addition to resetting ti bits it also sets at most one bit to 1. Therefore the number of 1's in the counter after the ith operation is bi ≤ bi-1 - ti + 1, and the potential difference is
Ψ(Di) - Ψ(Di-1) ≤ (bi-1 - ti + 1) - bi-1 = 1 - ti.
The amortized cost is therefore
ĉi = ci + Ψ(Di) - Ψ(Di-1) ≤ (ti + 1) + (1 - ti) = 2.
If the counter starts at zero, then Ψ(D0) = 0. Since Ψ(Di) ≥ 0 for all i, the total amortized cost of a sequence of n INCREMENT operations is an upper bound on the total actual cost, and so the worst-case cost of n INCREMENT operations is O(n).
If the counter does not start at zero, then the initial number of 1's is b0, and after n INCREMENT operations the number of 1's is bn, where 0 ≤ b0, bn ≤ k.
Implementation of a queue with two stacks, such that the amortized cost of each ENQUEUE and each DEQUEUE operation is O(1): ENQUEUE pushes an object onto the first stack. DEQUEUE pops an object off the second stack if it is not empty; if the second stack is empty, DEQUEUE first transfers all objects from the first stack to the second stack and then pops off the top object. The goal is to show that this implementation has an O(1) amortized cost for each ENQUEUE and DEQUEUE operation.
Suppose Di denotes the state of the stacks after the ith operation. Define Ψ(Di) to be the number of elements in the first stack. Clearly, Ψ(D0) = 0 and Ψ(Di) ≥ Ψ(D0) for all i. If the ith operation is an ENQUEUE operation, then Ψ(Di) - Ψ(Di-1) = 1. Since the actual cost of an ENQUEUE operation is 1, the amortized cost of an ENQUEUE operation is 2. If the ith operation is a DEQUEUE, then there are two cases to consider.
Case i: the second stack is not empty. In this case we have Ψ(Di) - Ψ(Di-1) = 0, and the actual cost of the DEQUEUE operation is 1.
Case ii: the second stack is empty. In this case, we have Ψ(Di) - Ψ(Di-1) = -Ψ(Di-1), and the actual cost of the DEQUEUE operation is Ψ(Di-1) + 1.
In either case, the amortized cost of the DEQUEUE operation is 1. It follows that each operation has O(1) amortized cost.
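A Python sketch of the two-stack queue just described; the class and method names are ours.

class TwoStackQueue:
    def __init__(self):
        self.inbox = []       # first stack: ENQUEUE pushes here
        self.outbox = []      # second stack: DEQUEUE pops from here

    def enqueue(self, x):
        self.inbox.append(x)              # actual cost 1, amortized cost 2

    def dequeue(self):
        if not self.outbox:               # transfer only when the second stack is empty
            while self.inbox:
                self.outbox.append(self.inbox.pop())
        return self.outbox.pop()          # raises IndexError if the queue is empty

q = TwoStackQueue()
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue(), q.dequeue(), q.dequeue())   # 1 2 3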
Dynamic Table
If the allocated space for the table is not enough, we must copy the table into a larger table. Similarly, if a large number of members are erased from the table, it is a good idea to reallocate the table with a smaller size. Using amortized analysis we shall show that the amortized cost of insertion and deletion is constant, and that the unused space in a dynamic table never exceeds a constant fraction of the total space.
TABLE-DELETE
Load Factor
The number of items stored in the table, n, divided by the size of the table, m, is defined as the load factor, denoted α(T) = n/m.
The load factor of an empty table (size m = 0) is defined to be 1.
A table is full when there are no unused slots, that is, when the number of items stored in the table equals the number of available slots (n = m); in this case α(T) = 1.
Proposed Algorithm
1. Initialize the table size to m = 1.
2. Keep inserting elements as long as the number of items is less than the table size, i.e., n < m.
3. When the table is full (n = m), allocate a new table of twice the size, m ← 2m.
4. Copy the items (by elementary insertions) from the old table into the new one.
5. GOTO step 2.
Analysis
If n elementary insert operations can be performed in step 4, the worst-case cost of a single insert operation is O(n), which leads to an upper bound of O(n²) on the total running time for n operations.
Aggregate Analysis
The ith insert operation causes an expansion only when i - 1 is an exact power of 2. Let ci be the cost of the ith insert operation. Then
ci = i     if i - 1 is an exact power of 2
ci = 1     otherwise
As an example, consider the following illustration.
Insertion (n):   1   2     3     4   5     6   7   8   9     10
Table size (m):  1   2     4     4   8     8   8   8   16    16
Cost ci:         1   1+1   1+2   1   1+4   1   1   1   1+8   1
The total cost of n insert operations is therefore
∑i=1..n ci ≤ n + ∑j=0..⌊lg n⌋ 2^j
          = n + [2^(⌊lg n⌋+1) - 1] / [2 - 1]      (since ∑k=0..n x^k = [x^(n+1) - 1] / [x - 1])
          ≤ n + 2·2^(lg n) - 1
          = n + 2n - 1
          = 3n - 1
          < 3n
Therefore, the amortized cost of a single operation is
(total cost) / (number of operations) = 3n / n = 3.
Asymptotically, the amortized cost of an insertion into the dynamic table is O(1), the same as for a table of fixed size.
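A short Python simulation of the doubling table that tallies the actual cost of each insertion and checks the 3n bound derived above; the names and n are ours, and a plain list with explicit copying is used so the copying cost is visible.

def insert_all(n):
    size, table, num, total = 1, [None], 0, 0
    for i in range(1, n + 1):
        cost = 1                        # the elementary insertion itself
        if num == size:                 # table full: double and copy num items
            size *= 2
            new_table = [None] * size
            new_table[:num] = table[:num]
            table = new_table
            cost += num
        table[num] = i
        num += 1
        total += cost
    return total

n = 1000
print(insert_all(n), "<", 3 * n)        # prints: 2023 < 3000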
Accounting Method
Here we charge 3 dollars per insertion. Intuitively, each item pays for 3 elementary operations:
1. 1 dollar for inserting the item itself (the immediate insertion).
2. 1 dollar for moving (re-inserting) the item the next time the table is expanded.
3. 1 dollar for moving another item that has already been moved once, when the table is expanded.
Potential Method
Define a potential function Φ that is 0 immediately after an expansion but builds up to the table size by the time the table is full:
Φ(T) = 2·num[T] - size[T]
Immediately after an expansion we have num[T] = size[T]/2, which implies Φ(T) = 2·num[T] - size[T] = 0. Immediately before an expansion, we have num[T] = size[T], which implies Φ(T) = 2·num[T] - size[T] = num[T]. The initial value of the potential is zero, i.e., Φ(T) = 0, and since the table is always at least half full, i.e., num[T] ≥ size[T]/2, or 2·num[T] ≥ size[T], the potential Φ(T) is always nonnegative.
Before analyzing the amortized cost of the ith TABLE-INSERT operation, we define the following. Let
numi  = the number of elements in the table after the ith operation,
sizei = the size of the table after the ith operation,
Φi    = the potential after the ith operation.
If the ith insertion does not trigger an expansion, then sizei = sizei-1 and the amortized cost of the operation is
ĉi = ci + Φi - Φi-1
   = 1 + [2·numi - sizei] - [2·numi-1 - sizei-1]
   = 1 + 2·numi - sizei - 2·(numi - 1) + sizei
   = 3
If the ith insertion does trigger an expansion (the size of the table doubles), then sizei = 2·sizei-1 and sizei-1 = numi-1 = numi - 1, and the actual cost is ci = numi (one elementary insertion plus the numi - 1 items copied). The amortized cost of the operation is
ĉi = ci + Φi - Φi-1
   = numi + [2·numi - sizei] - [2·numi-1 - sizei-1]
   = numi + [2·numi - 2·(numi - 1)] - [2·(numi - 1) - (numi - 1)]
   = numi + 2 - (numi - 1)
   = 3
What is the catch? The derivation shows how the potential builds (from zero) to pay for the table expansion.
Dynamic Table Expansion and Contraction
When the load factor of the table, α(T) = n/m, becomes too small, we want to preserve the following two properties:
1. Keep the load factor of the dynamic table bounded below by a positive constant.
2. Keep the amortized cost of the dynamic table operations bounded above by a constant.
Proposed Strategy
Event:  An item is inserted into a full table.
Action: Double the size of the table, i.e., m ← 2m.
Event:  Removing an item makes the table less than half full.
Action: Halve the size of the table, i.e., m ← m/2.
The problem with this strategy is thrashing. We can avoid this problem by allowing the load factor of the table, α(T) = n/m, to drop below 1/2 before contracting it. By contracting the table only when the load factor falls below 1/4, we maintain the lower bound α(T) ≥ 1/4, i.e., the load factor is bounded below by the constant 1/4.
Load Factor
The load factor α(T) of a non-empty table T is defined as the number of items stored in T divided by the size of T (the number of slots in T), i.e., α(T) = num[T]/size[T].
The potential function is chosen so that, among other properties, it is 0 immediately after a contraction and builds as the load factor α(T) = n/m decreases toward 1/4.
When α(T) = 1, we have num[T] = size[T], so Φ(T) = 2·num[T] - size[T] = num[T], which indicates that the potential can pay for an expansion if an item is inserted.
Similarly, when the load factor has dropped to 1/4, the potential has built up enough that it can pay for a contraction if an item is deleted.
Notation
The subscript i is used in the existing notation to denote values after the ith operation; that is, ĉi, ci, numi, sizei, αi and Φi indicate values after the ith operation.
Initially,
num0 = size0 = Φ0 = 1 and α0 = 1.
Hash Table
Direct-address table
If the keys are drawn from a reasonably small universe U = {0, 1, . . . , m-1} of keys, a solution is to use a table T[0 . . m-1] indexed by the keys. To represent the dynamic set, we use an array, or direct-address table, denoted T[0 . . m-1], in which each slot corresponds to a key in the universe.
Each key in the universe U corresponds to an index in the table T[0 . . m-1]. Using this approach, all three basic dictionary operations take Θ(1) time in the worst case.
Hash Tables
When the size of the universe is much larger, the same approach (a direct-address table) could still work in principle, but the size of the table would make it impractical. A solution is to map the keys onto a small range using a function called a hash function. The resulting data structure is called a hash table.
With direct addressing, an element with key k is stored in slot k. With hashing, this same element is stored in slot h(k); that is, we use a hash function h to compute the slot from the key. The hash function maps the universe U of keys into the slots of a hash table T[0 . . m-1]:
h: U → {0, 1, . . ., m-1}
More formally, suppose we want to store a set of size n in a table of size m. The ratio α = n/m is called the load factor, that is, the average number of elements stored per slot. Assume we have a hash function h that maps each key k ∈ U to an integer name h(k) ∈ [0 . . m-1]. The basic idea is to store key k in location T[h(k)].
Typically, hash functions generate "random looking" values; for example, the division-method function h(k) = k mod m, discussed below, usually works well. Is there any point to the hash function? Yes, the point of the hash function is to reduce the range of array indices that need to be handled.
Collision
As keys are inserted into the table, it is possible that two keys hash to the same table slot. If the hash function distributes the elements uniformly over the table, the number of collisions cannot be too large on average, but the birthday paradox makes it very likely that there will be at least one collision, even for a lightly loaded table.
A hash function h may map two keys k and j to the same slot, so they collide. There are two basic methods for handling collisions in a hash table: chaining and open addressing.
Chaining
In chaining, all the elements that hash to the same slot are kept in a linked list (chain) for that slot.
• Deletion of an element x can be accomplished in O(1) time if the lists are doubly linked.
• In the worst-case behavior of hashing with chaining, all n keys hash to the same slot, creating a list of length n; the worst-case time for a search is thus Θ(n) plus the time to compute the hash function.
Example (m = 9):
h(19) = 19 mod 9 = 1
h(15) = 15 mod 9 = 6
h(20) = 20 mod 9 = 2
h(33) = 33 mod 9 = 6
h(12) = 12 mod 9 = 3
h(17) = 17 mod 9 = 8
h(10) = 10 mod 9 = 1
Figure
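A minimal Python sketch of hashing with chaining using h(k) = k mod 9, reproducing the slot assignments computed above; the class and method names are ours.

class ChainedHashTable:
    def __init__(self, m=9):
        self.m = m
        self.slots = [[] for _ in range(m)]    # one chain (list) per slot

    def h(self, k):
        return k % self.m                      # division-method hash function

    def insert(self, k):
        self.slots[self.h(k)].append(k)        # O(1): append to the chain

    def search(self, k):
        return k in self.slots[self.h(k)]      # proportional to the chain length

T = ChainedHashTable()
for k in (19, 15, 20, 33, 12, 17, 10):
    T.insert(k)
print(T.slots[1], T.slots[6])     # [19, 10] [15, 33]  (collisions chained together)
print(T.search(33), T.search(7))  # True False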
A good hash function satisfies the assumption of simple uniform hashing: each element is equally likely to hash into any of the m slots, independently of where any other element has hashed to. Usually, however, it is not possible to check this condition, because one rarely knows the probability distribution according to which the keys are drawn.
Most hash functions assume that the universe of keys is the set of natural numbers. Thus, if the keys are not natural numbers, a way must be found to interpret them as natural numbers.
There are three main schemes for creating hash functions:
1. The division method.
2. The multiplication method.
3. Universal hashing.
Division Method
The hash function is
h(k) = k mod m.
Example:
If the table size is m = 12 and the key is k = 100, then
h(100) = 100 mod 12 = 4.
Poor choices of m
m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k. Similarly, m = 2^p - 1 may be a poor choice when k is a character string interpreted in radix 2^p, because permuting the characters of k then does not change its hash value.
Good choice of m
A prime not too close to an exact power of 2.
Multiplication Method
Step 1: Multiply the key k by a constant 0 < A < 1 and extract the fractional part of kA.
Step 2: Multiply this fractional part by m and take the floor, i.e., h(k) = ⌊m·(k·A mod 1)⌋.
An advantage of this method is that the value of m is not critical and the method can be implemented easily on most computers. A reasonable value for the constant A is
≈ (√5 - 1)/2.
Universal Hashing
In universal hashing, the hash function is chosen at random from a carefully designed class of functions, independently of the keys that are actually stored, so that no single input always produces the worst-case behavior.
Open Addressing
In this technique all elements are stored in the hash table itself. That is, each table entry contains either an element or NIL. When searching for an element (or an empty slot), we systematically examine slots until we find the element (or an empty slot). There are no lists and no elements stored outside the table, which implies that the table can completely "fill up"; the load factor α can never exceed 1. An advantage of this technique is that it avoids pointers (pointers need space too). Instead of chasing pointers, we compute the sequence of slots to be examined. To perform insertion, we successively examine, or probe, the hash table until we find an empty slot. The sequence of slots probed depends upon the key being inserted. To determine which slots to probe, the hash function includes the probe number as a second input. Thus, the hash function becomes
h : U × {0, 1, . . . , m-1} → {0, 1, . . . , m-1}.
The search algorithm is:
    i = 0
    repeat j ← h(k, i)
           if T[j] = k
               then return j
           i = i + 1
    until T[j] = NIL or i = m
    return NIL
There are three common techniques for computing the probe sequences required for open addressing:
1. Linear probing.
2. Quadratic probing.
3. Double hashing.
These techniques guarantee that the probe sequence ⟨h(k, 0), h(k, 1), . . . , h(k, m-1)⟩ is a permutation of ⟨0, 1, . . . , m-1⟩ for each key k, but none of them fulfills the assumption of uniform hashing, since none of these techniques is capable of generating more than m² different probe sequences (instead of the m! that uniform hashing requires).
Uniform Hashing
Each key is equally likely to have any of the m! permutations of ⟨0, 1, . . . , m-1⟩ as its probe sequence.
Note that uniform hashing generalizes the notion of simple uniform hashing.
1. Linear Probing
This method uses a hash function of the form
h(k, i) = (h'(k) + i) mod m,
where h' is an ordinary (auxiliary) hash function.
2. Quadratic Probing
This method uses a hash function of the form
h(k, i) = (h'(k) + c1·i + c2·i²) mod m,
where h' is an auxiliary hash function and c1, c2 are constants (c2 ≠ 0).
3. Double Hashing
This method produces probe sequences that are very close to random. It uses a hash function of the form
h(k, i) = (h1(k) + i·h2(k)) mod m,
where h1 and h2 are auxiliary hash functions. The probe sequence here depends in two ways on the key k: h1(k) gives the initial probe position and h2(k) gives the offset between successive probes.
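A small Python sketch of open addressing with double hashing in the form h(k, i) = (h1(k) + i·h2(k)) mod m; the auxiliary hash functions, table size and keys are our choices for illustration.

def probe_sequence(k, m=13):
    # h1 gives the initial probe position, h2 the offset; h2(k) is nonzero,
    # and since m is prime every slot is eventually probed.
    h1 = k % m
    h2 = 1 + (k % (m - 1))
    return [(h1 + i * h2) % m for i in range(m)]

def insert(table, k):
    for j in probe_sequence(k, len(table)):
        if table[j] is None:             # probe until an empty slot is found
            table[j] = k
            return j
    raise RuntimeError("hash table overflow")

table = [None] * 13
for k in (79, 69, 98, 72, 14, 50):
    insert(table, k)
print(table)    # 79 lands in slot 1, 72 and 14 are placed after collisions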
Binary Search tree is a binary tree in which each internal node x stores an element such that the element stored in the left subtree of x
are less than or equal to x and elements stored in the right subtree of x are greater than or equal to x. This is called binary-search-tree
property.
The basic operations on a binary search tree take time proportional to the height of the tree. For a complete binary tree with n nodes, such operations run in Θ(lg n) worst-case time. If the tree is a linear chain of n nodes, however, the same operations take Θ(n) worst-case time.
The height of the Binary Search Tree equals the number of links from
the root node to the deepest node.
Inorder Tree Walk
In which we visit the nodes in the left subtree, then the root, and then the nodes in the right subtree.
INORDER-TREE-WALK (x)
    if x ≠ NIL then
        INORDER-TREE-WALK (left[x])
        print key[x]
        INORDER-TREE-WALK (right[x])
It takes Θ(n) time to walk a tree of n nodes. Note that the binary-search-tree property allows us to print out all the elements of the Binary Search Tree in sorted order.
Preorder Tree Walk
In which we visit the root node before the nodes in either subtree.
PREORDER-TREE-WALK (x)
    if x ≠ NIL then
        print key[x]
        PREORDER-TREE-WALK (left[x])
        PREORDER-TREE-WALK (right[x])
Postorder Tree Walk
In which we visit the root node after the nodes in both subtrees.
POSTORDER-TREE-WALK (x)
    if x ≠ NIL then
        POSTORDER-TREE-WALK (left[x])
        POSTORDER-TREE-WALK (right[x])
        print key[x]
It takes Θ(n) time to walk (inorder, preorder or postorder) a tree of n nodes.
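A minimal Python sketch of the three walks, using a simple node class; the field names key, left and right are assumptions of this example.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def inorder(x):          # left subtree, root, right subtree: sorted order for a BST
        if x is not None:
            inorder(x.left)
            print(x.key)
            inorder(x.right)

    def preorder(x):         # root before either subtree
        if x is not None:
            print(x.key)
            preorder(x.left)
            preorder(x.right)

    def postorder(x):        # root after both subtrees
        if x is not None:
            postorder(x.left)
            postorder(x.right)
            print(x.key)

    root = Node(5, Node(3, Node(2), Node(4)), Node(8, Node(7)))
    inorder(root)            # prints 2 3 4 5 7 8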
The fact that an inorder walk prints the keys in sorted order implies that, since sorting n elements takes Ω(n lg n) time in the worst case in the comparison model, any comparison-based algorithm for constructing a Binary Search Tree from an arbitrary list of n elements takes Ω(n lg n) time in the worst case.
We can show the validity of this argument (in case you are thinking of beating the Ω(n lg n) bound) as follows: let c(n) be the worst-case running time for constructing a binary search tree from a set of n elements. Given an n-node BST, the inorder walk of the tree outputs the keys in sorted order (shown above). Since the worst-case running time of any comparison-based sorting algorithm is Ω(n lg n), we have
c(n) + O(n) = Ω(n lg n),
and therefore c(n) = Ω(n lg n).
The TREE-SEARCH (x, k) algorithm searches the tree rooted at x for a node whose key equals k. It returns a pointer to the node if one exists and NIL otherwise.
TREE-SEARCH (x, k)
    if x = NIL or k = key[x]
        then return x
    if k < key[x]
        then return TREE-SEARCH (left[x], k)
        else return TREE-SEARCH (right[x], k)
Clearly, this algorithm runs in O(h) time where h is the height of the tree.
The same search can be performed iteratively:
ITERATIVE-TREE-SEARCH (x, k)
    while x ≠ NIL and k ≠ key[x] do
        if k < key[x]
            then x ← left[x]
            else x ← right[x]
    return x
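A Python version of the iterative search, reusing the Node class from the walk sketch above; it returns the node whose key equals k, or None if no such node exists.

    def tree_search(x, k):
        # Walk down from x, going left or right by comparing keys;
        # stops at the matching node or at None (NIL).
        while x is not None and k != x.key:
            x = x.left if k < x.key else x.right
        return x

    # Example: tree_search(root, 4) returns the node with key 4,
    # while tree_search(root, 6) returns None.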
The TREE-MINIMUM (x) algorithm returns a pointer to the node of the tree rooted at x whose key is the minimum of all keys in the tree. Thanks to the BST property, the minimum element can always be found by following left child pointers from the root until a NIL is encountered.
TREE-MINIMUM (x)
    while left[x] ≠ NIL do
        x ← left[x]
    return x
Clearly, it runs in O(h) time where h is the height of the tree. Again thanks to the BST property, the element whose key is the maximum can always be found by following right child pointers from the root until a NIL is encountered.
TREE-MAXIMUM (x)
    while right[x] ≠ NIL do
        x ← right[x]
    return x
Clearly, it runs in O(h) time where h is the height of the tree.
The TREE-SUCCESSOR (x) algorithm returns a pointer to the node in the tree
whose key value is next higher than key [x].
TREE-SUCCESSOR (x)
    if right[x] ≠ NIL
        then return TREE-MINIMUM (right[x])
        else y ← p[x]
             while y ≠ NIL and x = right[y] do
                 x ← y
                 y ← p[y]
             return y
Note that the algorithms TREE-MINIMUM, TREE-MAXIMUM, TREE-SUCCESSOR, and TREE-PREDECESSOR never look at the keys; they only follow child and parent pointers.
Note that e = n - 1 for any tree with at least one node. This allows us to
prove the claim by induction on e (and therefore, on n).
Base case Suppose that e = 0. Then, either the tree is empty or consists
only of a single node. So, e = r = 0. Therefore, the claim holds.
Inductive step: Suppose e > 0 and assume that the claim holds for all e' < e. Let T be a binary search tree with e edges. Let x be the root, and let T1 and T2 respectively be the left and right subtrees of x. Since T has at least one edge, at least one of T1 and T2 is nonempty. For each i = 1, 2, let ei be the number of edges in Ti, pi the node holding the maximum key in Ti, and ri the distance from pi to the root of Ti. Similarly, let e, p and r be the corresponding values for T. First assume that both T1 and T2 are nonempty. Then e = e1 + e2 + 2, p = p2, and r = r2 + 1. The action of the enumeration is as follows:
• Upon being called, TREE-MINIMUM(x) traverses the left branch of x and enters T1.
• Once the root of T1 is visited, the edges of T1 are traversed as if T1 were the input tree. This situation lasts until p1 is visited.
• When TREE-SUCCESSOR is called from p1, the upward path from p1 to x is traversed and x is discovered to hold the successor.
• The right branch of x is then taken into T2, and the edges of T2 are traversed as if T2 were the input tree, ending at p = p2.
Counting the edge traversals and applying the induction hypothesis to T1 and T2,
mT = [1 + (2e1 - r1)] + (r1 + 1) + [1 + (2e2 - r2)]
   = 2(e1 + e2 + 2) - (r2 + 1)
   = 2e - r.
Therefore, the claim holds for this case.
Consider any binary search tree T and let y be the parent of a leaf z. Our goal is to show that key[y] is either the smallest key in T larger than key[z] or the largest key in T smaller than key[z].
INSERTION
To insert a node into a BST:
1. find a leaf position at the appropriate place (follow the search path down from the root until a NIL is reached), and
2. attach the new node there as the left or right child of the last node on that path, as appropriate.
TREE-INSERT (T, z)
    y ← NIL
    x ← root[T]
    while x ≠ NIL do
        y ← x
        if key[z] < key[x]
            then x ← left[x]
            else x ← right[x]
    p[z] ← y
    if y = NIL
        then root[T] ← z
        else if key[z] < key[y]
            then left[y] ← z
            else right[y] ← z
Like other primitive operations on search trees, this algorithm begins at the root
of the tree and traces a path downward. Clearly, it runs in O(h) time on a tree
of height h.
Sorting
We can sort a given set of n numbers by first building a binary search tree containing these numbers, using the TREE-INSERT procedure repeatedly to insert the numbers one by one, and then printing the numbers with an inorder tree walk.
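A sketch of this BST-based sort in Python, combining an insert routine in the spirit of TREE-INSERT with the inorder walk; the class and function names are illustrative only.

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def tree_insert(root, key):
        # Walk down to a NIL position and attach the new node there.
        z = Node(key)
        if root is None:
            return z
        x, y = root, None
        while x is not None:
            y = x
            x = x.left if key < x.key else x.right
        if key < y.key:
            y.left = z
        else:
            y.right = z
        return root

    def inorder(x, out):
        if x is not None:
            inorder(x.left, out)
            out.append(x.key)
            inorder(x.right, out)

    def tree_sort(numbers):
        root = None
        for n in numbers:
            root = tree_insert(root, n)
        out = []
        inorder(root, out)
        return out

    print(tree_sort([5, 1, 7, 3, 9, 2]))   # [1, 2, 3, 5, 7, 9]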
Analysis
Best-case running time
Printing takes O(n) time and the n insertions cost O(lg n) each (the tree is balanced, so half of the insertions are at depth lg(n) - 1), for a total of O(n lg n).
Worst-case running time
Printing still takes O(n) time, but the n insertions cost O(n) each (the tree is a single chain of nodes), so the n insertions take O(n^2) time in total.
Deletion
Removing a node from a BST is a bit more complex, since we do not want to create any "holes" in the tree. If the node has no children, it is simply removed. If the node has one child, then the child is spliced to the parent of the node. If the node has two children, then its successor has no left child; we copy the successor into the node and delete the successor instead. TREE-DELETE (T, z) removes the node pointed to by z from the tree T. It returns a pointer to the node removed so that the node can be put on a free-node list, etc.
TREE-DELETE (T, z)
    if left[z] = NIL or right[z] = NIL
        then y ← z
        else y ← TREE-SUCCESSOR (z)
    if left[y] ≠ NIL
        then x ← left[y]
        else x ← right[y]
    if x ≠ NIL
        then p[x] ← p[y]
    if p[y] = NIL
        then root[T] ← x
        else if y = left[p[y]]
            then left[p[y]] ← x
            else right[p[y]] ← x
    if y ≠ z
        then key[z] ← key[y]
             (copy y's satellite data into z)
    return y
Like the other operations, TREE-DELETE runs in O(h) time on a tree of height h.
Graph Algorithms
Graph theory has important applications in critical path analysis, social psychology, matrix theory, set theory, topology, group theory, molecular chemistry, and searching.
Digraph
A directed graph, or digraph G consists of a finite nonempty set of vertices V,
and a finite set of edges E, where an edge is an ordered pair of vertices in V.
Vertices are also commonly referred to as nodes. Edges are sometimes referred
to as arcs.
V = {1, 2, 3, 4}
E = {(1, 2), (2, 4), (4, 2), (4, 1)}
The definition of a graph implies that a graph can be drawn just from knowing its vertex set and its edge set. For example, our first example has vertex set V and edge set E where V = {1, 2, 3, 4} and E = {(1,2), (2,4), (4,3), (3,1), (1,4), (2,1), (4,2), (3,4), (1,3), (4,1)}. Notice that each edge seems to be listed twice.
Another example is the Petersen graph G = (V, E); it too is completely specified by listing its vertex set V and its edge set E.
We next look at three notions related to the representation of a directed graph:
1. Transpose
2. Square
3. Incidence Matrix
1. Transpose
If graph G = (V, E) is a directed graph, its transpose GT = (V, ET) is the same as graph G with all arrows reversed. We define the transpose of an adjacency matrix A = (aij) to be the adjacency matrix AT = (aTij) given by aTij = aji. In other words, the rows of matrix A become the columns of matrix AT, and the columns of matrix A become the rows of matrix AT. Since in an undirected graph (u, v) and (v, u) represent the same edge, the adjacency matrix A of an undirected graph is its own transpose: A = AT.
Formally, the transpose of a directed graph G = (V, E) is the graph GT = (V, ET), where ET = {(v, u) ∈ V×V : (u, v) ∈ E}. Thus, GT is G with all its edges reversed.
ALGORITHM MATRIX-TRANSPOSE (G, GT)
    for i = 0 to |V[G]| - 1 do
        for j = 0 to |V[G]| - 1 do
            GT(j, i) = G(i, j)
To see why this works, notice that setting GT(j, i) equal to G(i, j) for every pair (i, j) turns each row of G into the corresponding column of GT. The time complexity is clearly O(V^2).
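A direct Python translation of this idea, assuming the graph is given as an n×n adjacency matrix (a list of lists); the variable names are illustrative only.

    def matrix_transpose(G):
        # GT[j][i] = G[i][j] for every pair (i, j); runs in O(V^2) time.
        n = len(G)
        GT = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                GT[j][i] = G[i][j]
        return GT

    G = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
    print(matrix_transpose(G))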
Algorithm for Computing GT from G in Adjacency-List
Representation
In this representation, a new adjacency list must be constructed for the transpose of G. Every list in the adjacency-list structure is scanned; while scanning the adjacency list of v (say), if we encounter u, we put v into the adjacency list of u in GT.
ALGORITHM LIST-TRANSPOSE (G)
    for u = 1 to |V[G]| do
        for each element v ∈ Adj[u] do
            insert u at the front of the new list AdjT[v]
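The same operation on an adjacency-list representation can be sketched in Python with a dictionary of lists (the names adj and adj_t are illustrative only); it runs in O(V + E) time.

    def list_transpose(adj):
        # Scan every list once; when u's list contains v, put u in GT's list of v.
        adj_t = {u: [] for u in adj}
        for u in adj:
            for v in adj[u]:
                adj_t[v].insert(0, u)   # insert at the front, as in the pseudocode
        return adj_t

    adj = {1: [2], 2: [4], 4: [2, 1]}
    print(list_transpose(adj))   # {1: [4], 2: [4, 1], 4: [2]}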
2. Square
The square of a directed graph G = (V, E) is the graph G2 = (V, E2) such that (a, b) ∈ E2 if and only if for some vertex c ∈ V, both (a, c) ∈ E and (c, b) ∈ E. That is, G2 contains an edge between vertex a and vertex b whenever G contains a path with exactly two edges from vertex a to vertex b.
For each vertex, we must make a copy of at most |E| list elements. The total
time is O(|V| * |E|).
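Under the adjacency-list representation, G² can be sketched in Python as follows (names are illustrative only): for every edge (a, c) we copy c's list into a's new list, and the total work stays within the O(V·E) bound mentioned above.

    def graph_square(adj):
        # (a, b) is an edge of G^2 iff there is some c with (a, c) and (c, b) in G.
        adj2 = {a: set() for a in adj}
        for a in adj:
            for c in adj[a]:
                for b in adj[c]:
                    adj2[a].add(b)
        return {a: sorted(adj2[a]) for a in adj2}

    adj = {1: [2], 2: [3], 3: [1]}
    print(graph_square(adj))   # {1: [3], 2: [1], 3: [2]}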
3. Incidence Matrix
The incidence matrix of a directed graph G = (V, E) is a |V| × |E| matrix B = (bij) such that
    bij = -1 if edge j leaves vertex i,
           1 if edge j enters vertex i,
           0 otherwise.
If B is the incidence matrix and BT is its transpose, the diagonal of the product matrix BBT gives the degree of every node; that is, if P is the product matrix BBT, then P[i, i] is the degree of node i. Specifically, we have
P[i, i] = Σj bij bTji = Σj (bij)^2.
Since (bij)^2 = 1 exactly when edge j is incident on vertex i (it either enters or leaves i) and 0 otherwise, the sum counts the edges incident on vertex i. Therefore P[i, i] equals the degree of node i.
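A small numeric check of this fact in Python; the graph and all names are chosen for illustration only. The diagonal of B·BT comes out equal to the number of edges incident on each vertex.

    # Directed graph on vertices 0..3 with edges (0,1), (1,2), (2,0), (2,3).
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    n = 4

    # Incidence matrix: B[i][j] = -1 if edge j leaves vertex i,
    #                             +1 if edge j enters vertex i, 0 otherwise.
    B = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        B[u][j] = -1
        B[v][j] = 1

    # Compute P = B * B^T and print its diagonal.
    P = [[sum(B[i][e] * B[k][e] for e in range(len(edges))) for k in range(n)]
         for i in range(n)]
    print([P[i][i] for i in range(n)])   # [2, 2, 3, 1] = degree of each vertex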
Breadth First Search (BFS)
Like depth first search, BFS traverses a connected component of a given graph and defines a spanning tree.
BREADTH-FIRST-SEARCH (G, s)
Input: A graph G and a vertex s.
Output: Edges labeled as discovery and cross edges in the connected component.
    create a queue Q
    ENQUEUE (Q, s)                      // insert s into Q
    while Q is not empty do
        for each vertex v in Q do
            for all edges e incident on v do
                if edge e is unexplored then
                    let w be the other endpoint of e
                    if vertex w is unexplored then
                        mark e as a discovery edge
                        insert w into Q
                    else
                        mark e as a cross edge
BFS labels each vertex with the length of a shortest path (in terms of number of edges) from the start vertex.
Example (CLR): a sequence of figures (steps 1 through 9) showing the BFS traversal of a sample graph (omitted).
As with depth first search (DFS), the discovery edges form a spanning tree, which in this case we call the BFS-tree.
BFS can be used to solve the following problems:
Testing whether a graph is connected.
Computing a spanning forest of a graph.
Computing a cycle in a graph, or reporting that no such cycle exists.
Analysis
Total running time of BFS is O(V + E).
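A compact Python sketch of BFS on an adjacency-list graph (names assumed for this example): it labels every reachable vertex with its shortest-path distance, in edges, from the start vertex s, and runs in O(V + E) time.

    from collections import deque

    def bfs(adj, s):
        # dist[v] = number of edges on a shortest path from s to v.
        dist = {s: 0}
        Q = deque([s])
        while Q:
            u = Q.popleft()
            for v in adj[u]:
                if v not in dist:            # v is still unexplored
                    dist[v] = dist[u] + 1    # (u, v) is a discovery edge
                    Q.append(v)
        return dist

    adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
    print(bfs(adj, 1))   # {1: 0, 2: 1, 3: 1, 4: 2}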
Bipartite Graph
We define a bipartite graph as follows: a bipartite graph is an undirected graph G = (V, E) in which V can be partitioned into two sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2, or u ∈ V2 and v ∈ V1. That is, all edges go between the two sets V1 and V2.
To check whether a given graph is bipartite, the algorithm traverses the graph, labeling the vertices 0, 1, or 2 corresponding to unvisited, partition-1, and partition-2 nodes. If an edge is detected between two vertices in the same partition, the algorithm returns 0 (the graph is not bipartite).
ALGORITHM BIPARTITE (G, s)
    for each vertex u ∈ V[G] - {s} do
        color[u] = WHITE
        d[u] = ∞
        partition[u] = 0
    color[s] = GRAY
    partition[s] = 1
    d[s] = 0
    Q = [s]
    while queue Q is not empty do
        u = head[Q]
        for each v in Adj[u] do
            if partition[u] = partition[v]
                then return 0
                else if color[v] = WHITE
                    then color[v] = GRAY
                         d[v] = d[u] + 1
                         partition[v] = 3 - partition[u]
                         ENQUEUE (Q, v)
        DEQUEUE (Q)
        color[u] = BLACK
    return 1
Correctness
As BIPARTITE (G, s) traverses the graph, it labels the vertices with a partition number consistent with the graph being bipartite. If at any vertex the algorithm detects an inconsistency, it signals this with an invalid return value (0). The partition value of u will always be valid, since u was enqueued at some point and its partition was assigned then. When partition[v] is assigned, it is left unchanged if it has already been set; otherwise it is set to the value opposite that of vertex u.
Analysis
The lines added to BFS algorithm take constant time to execute and so the
running time is the same as that of BFS which is O(V + E).
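A sketch of the same test in Python (names follow the pseudocode loosely and the sample graphs are assumptions of this example): partition values 1 and 2 are assigned level by level, and an edge between two vertices in the same partition makes the function return 0.

    from collections import deque

    def bipartite(adj, s):
        partition = {u: 0 for u in adj}   # 0 = unvisited
        partition[s] = 1
        Q = deque([s])
        while Q:
            u = Q.popleft()
            for v in adj[u]:
                if partition[v] == partition[u]:
                    return 0                         # same side: not bipartite
                if partition[v] == 0:
                    partition[v] = 3 - partition[u]  # opposite side of u
                    Q.append(v)
        return 1

    square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
    triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
    print(bipartite(square, 1), bipartite(triangle, 1))   # 1 0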
Diameter of a Tree
The diameter of a tree T = (V, E) is the largest of all shortest-path distances in the tree, given by max[dist(u, v)]. As we have mentioned, BFS can be used to compute, for every vertex in a graph, a path with the minimum number of edges between the start vertex and the current vertex. It is therefore quite easy to compute the diameter of a tree: for each vertex in the tree, we use the BFS algorithm to get the shortest-path distances, and with a global variable we record the largest of all shortest paths. This clearly takes O(V(V + E)) time.
ALGORITHM TREE-DIAMETER (T)
    maxlength = 0
    for s = 0 to |V[T]| - 1 do
        temp = BFS (T, s)        // the largest shortest-path distance found from s
        if maxlength < temp
            then maxlength = temp
    return maxlength
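The same O(V(V + E)) computation can be sketched in Python, assuming the tree is given as a connected adjacency list (names are illustrative only): run BFS from every vertex and keep the largest distance seen.

    from collections import deque

    def tree_diameter(adj):
        maxlength = 0
        for s in adj:
            # BFS from s: dist[v] = shortest-path distance from s to v.
            dist = {s: 0}
            Q = deque([s])
            while Q:
                u = Q.popleft()
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        Q.append(v)
            maxlength = max(maxlength, max(dist.values()))
        return maxlength

    path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(tree_diameter(path))   # 3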
Depth First Search (DFS)
Edges that lead to a new vertex are called discovery (or tree) edges, and edges that lead to an already visited vertex are called back edges.
Example (CLR): figure showing a DFS traversal of a sample graph; solid edges are discovery (tree) edges and dashed edges are back edges (omitted).
Each vertex has two time stamps: the first time stamp records when the vertex is first discovered, and the second records when the search finishes examining the vertex's adjacency list.
The DFS algorithm can be used to solve the following problems:
Testing whether a graph is connected.
Computing a spanning forest of a graph.
Computing a path between two vertices of a graph, or reporting that no such path exists.
Computing a cycle in a graph, or reporting that no such cycle exists.
Analysis
The running time of DFS is Θ(V + E).
Consider a directed graph G = (V, E). After a DFS of graph G we can put each edge into one of four classes:
1. Tree edges: edges in the depth-first forest; (u, v) is a tree edge if v was first discovered by exploring edge (u, v).
2. Back edges: edges (u, v) connecting a vertex u to an ancestor v in a depth-first tree.
3. Forward edges: non-tree edges (u, v) connecting a vertex u to a descendant v in a depth-first tree.
4. Cross edges: all other edges.
Observation 1: For an edge (u, v), d[u] < f[u] and d[v] < f[v], since any vertex has to be discovered before we can finish exploring it.
Just for the fun of it, let's determine whether or not an undirected graph contains a cycle. It is not difficult to see that the algorithm for this problem is very similar to DFS(G), except that when the adjacent vertex is already GRAY, a cycle is detected. While doing this, the algorithm also takes care that it does not report a cycle when the edge to the GRAY vertex is actually the tree edge from an ancestor to a descendant that we have just traversed.
ALGORITHM DFS_DETECT_CYCLES (G)
    for each vertex u in V[G] do
        color[u] = WHITE
        predecessor[u] = NIL
    time = 0
    for each vertex u in V[G] do
        if color[u] = WHITE
            then DFS_visit(u)
The subalgorithm DFS_visit(u) is as follows:
DFS_visit(u)
    color[u] = GRAY
    d[u] = time = time + 1
    for each v in Adj[u] do
        if color[v] = GRAY and predecessor[u] ≠ v
            then return "cycle exists"
        if color[v] = WHITE
            then predecessor[v] = u
                 recursively DFS_visit(v)
    color[u] = BLACK
    f[u] = time = time + 1
Correctness
To see why this algorithm works, suppose the node v about to be examined is a gray node; then there are two possibilities:
1. The node v is the parent of u, and we are looking back along the tree edge we traversed when moving from v to u. In this case the check predecessor[u] ≠ v prevents us from reporting a cycle.
2. The second possibility is that v has already been encountered during DFS_visit and is an ancestor of u other than its parent; the edge we are examining is then a back edge, so the graph contains a cycle and we report it.
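A recursive Python sketch of this cycle test for an undirected graph, following the WHITE/GRAY/BLACK coloring and the predecessor check; the sample graphs and names are assumptions of this example.

    def has_cycle(adj):
        color = {u: "WHITE" for u in adj}
        pred = {u: None for u in adj}

        def visit(u):
            color[u] = "GRAY"
            for v in adj[u]:
                if color[v] == "GRAY" and pred[u] != v:
                    return True                    # back edge to a non-parent: cycle
                if color[v] == "WHITE":
                    pred[v] = u
                    if visit(v):
                        return True
            color[u] = "BLACK"
            return False

        return any(color[u] == "WHITE" and visit(u) for u in adj)

    tree = {1: [2, 3], 2: [1], 3: [1]}
    cycle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
    print(has_cycle(tree), has_cycle(cycle))   # False True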
Topological Sort
It is important to note that if the graph is not acyclic, then no linear ordering is
possible. That is, we must not have circularities in the directed graph. For
example, in order to get a job you need to have work experience, but in order to
get work experience you need to have a job (sounds familiar?).
Suppose G is acyclic. Since G is acyclic, it must have a vertex with no incoming edges. Let v1 be such a vertex. If we remove v1 from the graph, together with its outgoing edges, the resulting digraph is still acyclic. Hence the resulting digraph also has a vertex with no incoming edges; repeating this argument, we can remove the vertices one by one, and the order in which they are removed is a valid linear (topological) ordering.
TOPOLOGICAL_SORT (G)
    1. For each vertex, find the finishing time by calling DFS(G).
    2. As each vertex is finished, insert it onto the front of a linked list.
    3. Return the linked list of vertices; it is a topological ordering of G.
Example: figure showing a DFS-based topological sort of a sample graph G, starting from node u (omitted).
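A Python sketch of DFS-based topological sorting, assuming the digraph is acyclic and given as an adjacency list (the dependency example and all names are hypothetical): as each vertex finishes, it is pushed onto the front of the output list.

    def topological_sort(adj):
        visited = set()
        order = []                       # vertices in topological order

        def dfs(u):
            visited.add(u)
            for v in adj[u]:
                if v not in visited:
                    dfs(v)
            order.insert(0, u)           # u is finished: put it at the front

        for u in adj:
            if u not in visited:
                dfs(u)
        return order

    # Hypothetical dependency example: each item must come before the items it points to.
    adj = {"undershorts": ["pants"], "pants": ["shoes"], "socks": ["shoes"], "shoes": []}
    print(topological_sort(adj))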