MC4101 ADSA - Unit I
UNIT- I
Algorithms – Algorithms as a Technology – Time and Space complexity of algorithms –
Asymptotic analysis – Average and worst-case analysis – Asymptotic notation – Importance of
efficient algorithms – Program performance measurement – Recurrences: The Substitution Method
– The Recursion-Tree Method – Data structures and algorithms.
Linked List
Tree
Graph
Stack, Queue etc.
All these data structures allow us to perform different operations on data. We select a data
structure based on the type of operation required. We will look into these data structures in
more detail in later lessons.
The data structures can also be classified on the basis of the following characteristics:
Characteristic – Description
Linear – In linear data structures, the data items are arranged in a linear sequence. Example: Array
Homogeneous – In homogeneous data structures, all the elements are of the same type. Example: Array
Non-Homogeneous – In non-homogeneous data structures, the elements may or may not be of the same type. Example: Structures
Static – Static data structures are those whose sizes and associated memory locations are fixed at compile time. Example: Array
Dynamic – Dynamic data structures are those which expand or shrink depending upon the program's needs during execution; their associated memory locations can also change. Example: Linked list created using pointers
What is an Algorithm?
An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain
predefined task. An algorithm is not the complete code or program; it is just the core logic (solution)
of the problem. The performance of an algorithm is generally measured on the basis of two factors:
1. Time Complexity
2. Space Complexity
Space Complexity
It is the amount of memory space required by the algorithm during the course of its execution.
Space complexity must be taken seriously for multi-user systems and in situations where limited
memory is available.
An algorithm generally requires space for the following components:
Instruction Space: the space required to store the executable version of the program. This
space is fixed for a given program, but varies with the number of lines of code in the program.
Data Space: the space required to store the values of all constants and variables.
Environment Space: the space required to store the environment information needed to
resume a suspended function.
Time Complexity
Time Complexity is a way to represent the amount of time needed by the program to run to its
completion. We will study this in detail in later sections.
Informally, the asymptotic notations can be read as follows:
1. Big Oh denotes "fewer than or the same as" <expression> iterations.
2. Big Omega denotes "more than or the same as" <expression> iterations.
3. Big Theta denotes "the same as" <expression> iterations.
4. Little Oh denotes "fewer than" <expression> iterations.
5. Little Omega denotes "more than" <expression> iterations.
…………………………………………………………………………………………………...
Arrays:
Whenever we want to work with large number of data values, we need to use that much number
of different variables. As the number of variables are increasing, complexity of the program also
increases and programmers get confused with the variable names. There may be situations in
APPLICATIONS
a) To store a set of programs which are to be given access to a hard disk according to their priority.
b) For representing a city region telephone network.
c) To store a set of fixed keywords which are referenced very frequently.
d) To represent an image in the form of a bitmap.
e) To implement the back functionality in an internet browser.
f) To store dynamically growing data which is accessed very frequently, based upon a key value.
g) To implement a printer spooler so that jobs can be printed in the order of their arrival.
h) To record the sequence of all the pages browsed in one session.
i) To implement the undo function in a text editor.
j) To store information about the directories and files in a system.
Algorithm Analysis
Efficiency of an algorithm can be analyzed at two different stages, before implementation and
after implementation. They are the following −
A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an
algorithm is measured by assuming that all other factors, for example, processor speed,
are constant and have no effect on the implementation.
A Posteriori Analysis − This is an empirical analysis of an algorithm. The selected
algorithm is implemented using a programming language and then executed on a target
machine. In this analysis, actual statistics like running time and space required are collected.
We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution or
running time of various operations involved. The running time of an operation can be defined as
the number of computer instructions executed per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space required by
the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required by
the algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components −
A fixed part, that is, the space required to store certain data and variables that are
independent of the size of the problem. For example, simple variables and constants used,
program size, etc.
A variable part, that is, the space required by variables whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and SP(I)
is the variable part of the algorithm, which depends on instance characteristic I. Following is a
simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 − START
Step 2 − C ← A + B + 10
Step 3 − Stop
Here we have three variables A, B, and C and one constant. Hence S(P) = 1 + 3. Now, space
depends on data types of given variables and constant types and it will be multiplied accordingly.
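As a rough illustration of the fixed and variable parts (the functions below are assumed examples, not taken from the text), compare an iterative sum, whose extra space is constant, with a recursive sum, whose call stack grows with n:

#include <stdio.h>

/* Iterative sum: the extra space (sum, i) does not depend on n, so the
 * variable part of S(P) is O(1). */
long sum_iterative(const int a[], int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Recursive sum: each call adds a stack frame (environment space), so the
 * variable part of S(P) grows linearly, i.e. O(n). */
long sum_recursive(const int a[], int n) {
    if (n == 0)
        return 0;
    return a[n - 1] + sum_recursive(a, n - 1);
}

int main(void) {
    int a[] = {1, 2, 3, 4, 5};
    printf("%ld %ld\n", sum_iterative(a, 5), sum_recursive(a, 5));
    return 0;
}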
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run
to completion. Time requirements can be defined as a numerical function T(n), where T(n) can
be measured as the number of steps, provided each step consumes constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the total computational
time is T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here, we observe that
T(n) grows linearly as the input size increases.
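A minimal sketch of this kind of step counting (the function add_bits below is an assumed example, not part of the text): adding two n-bit integers stored as bit arrays performs one constant-time step per bit position, so T(n) = c ∗ n.

/* Adds two n-bit integers stored as bit arrays (least significant bit first).
 * The loop body is constant time and runs n times, so T(n) = c * n, i.e. O(n). */
void add_bits(const int a[], const int b[], int result[], int n) {
    int carry = 0;
    for (int i = 0; i < n; i++) {      /* executes n times            */
        int s = a[i] + b[i] + carry;   /* constant-time bit addition  */
        result[i] = s & 1;
        carry = s >> 1;
    }
    result[n] = carry;                 /* the sum may need n+1 bits   */
}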
Asymptotic analysis is input bound, i.e., if there is no input to the algorithm, it is concluded to
work in constant time. Other than the "input", all other factors are considered constant.
Asymptotic analysis refers to computing the running time of any operation in mathematical units
of computation. For example, the running time of one operation may be computed as f(n), and for
another operation it may be computed as g(n^2). This means the running time of the first operation
will increase linearly with the increase in n, while the running time of the second operation will
increase quadratically as n increases. Similarly, the running times of both operations will be nearly
the same if n is significantly small.
Asymptotic Notations
Following are the commonly used asymptotic notations to calculate the running time complexity
of an algorithm.
Ο Notation
Ω Notation
θ Notation
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time. It
measures the worst case time complexity, or the longest amount of time an algorithm can possibly
take to complete.
Ο(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0 }
Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time. It
measures the best case time complexity, or the minimum amount of time an algorithm can possibly
take to complete.
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0 }
Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows −
θ(g(n)) = { f(n) : f(n) = Ο(g(n)) and f(n) = Ω(g(n)) }
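A short worked example of these definitions: take f(n) = 3n + 2 and g(n) = n. Since 3n + 2 ≤ 4n for all n ≥ 2, we have f(n) = Ο(n) with c = 4 and n0 = 2; since 3n + 2 ≥ 3n for all n ≥ 1, we also have f(n) = Ω(n); therefore f(n) = θ(n).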
Some of the commonly used asymptotic running times are:
constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
quadratic − Ο(n^2)
cubic − Ο(n^3)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
Time complexity:
Big O notation
f(n) = O(g(n)) means
There are positive constants c and k such that:
0<= f(n) <= c*g(n) for all n >= k.
For large problem sizes the dominant term (the one with the highest exponent) almost completely
determines the value of the complexity expression, so the abstract complexity is expressed in terms
of the dominant term for large N. Multiplicative constants are also ignored.
N^2 + 3N + 4 is O(N^2),
since for N > 4, N^2 + 3N + 4 < 2N^2 (c = 2 and k = 4)
Examples of O(N) algorithms:
1. Traversing an array.
2. Sequential/linear search in an array.
3. Best case time complexity of Bubble sort (i.e. when the elements of the array are already in sorted order).
The basic structure is:
for (i = 0; i < N; i++) {
sequence of statements of O(1)
}
The loop executes N times, so the total time is N*O(1) which is O(N).
Examples of O(N^2) algorithms:
1) Worst case time complexity of Bubble, Selection and Insertion sort.
Nested loops:
1.
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
sequence of statements of O(1)
}
}
The outer loop executes N times and the inner loop executes M times, so the time complexity is
O(N*M).
2.
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
sequence of statements of O(1)
}
}
Now the time complexity is O(N^2)
3. Let's consider nested loops where the number of iterations of the inner loop depends on the
value of the outer loop's index, as in the sketch below.
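A sketch of such a loop (an assumed example, written in the same style as the cases above): on the i-th pass of the outer loop the inner loop runs i times, so the O(1) body executes 0 + 1 + ... + (N-1) = N(N-1)/2 times, which is still O(N^2).
for (i = 0; i < N; i++) {
for (j = 0; j < i; j++) {
sequence of statements of O(1)
}
}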
Examples of O(N log N) algorithms:
1. MergeSort, QuickSort, etc.
The best time in the above list is obviously constant time, and the worst is exponential time
which, as we have seen, quickly overwhelms even the fastest computers even for relatively small
n. Polynomial growth (linear, quadratic, cubic, etc.) is considered manageable as compared to
exponential growth.
Using the "<" sign informally, we can say that the order of growth is
O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(a^n) where a > 1
Example: when a loop body contains a call to a method, the cost of that call is multiplied by the
number of iterations. Below, statement S1 is O(1) and g(N) is a method of complexity O(N):
perform any statement S1       // O(1)
for (i = 0; i < n; i++) {
g(n)                           // method call of complexity O(N)
}
The loop has complexity O(N^2): it executes N times and each method call g(N) is complexity O(N).
Other Examples
A recursive solution:
public long Fib1(long n) {
    if ((n == 1) || (n == 2)) return 1;
    return Fib1(n - 1) + Fib1(n - 2);
}
T(n) is exponential in n. It takes approximately 2^(0.7n) steps to compute F(n); the proof is out
of the scope of this course. Computing F(200) would take about 2^140 steps, which, even at billions
of steps per second, is far longer than the age of the universe!
A better solution:
Solve F1, F2, ..., Fn. Solve them in order and save their values!
Function Fib2(n) {
    Create an array fib[1..n]
    fib[1] = 1
    fib[2] = 1
    for i = 3 to n:
        fib[i] = fib[i-1] + fib[i-2]
    return fib[n]
}
The time complexity of this algorithm is O(n). The number of steps required is proportional to n.
F(200) is now reasonable to compute, and so are F(2000) and F(20000).
Tower of Hanoi
The Tower of Hanoi is a mathematical puzzle. It consists of three rods, and a number of disks of
different sizes which can slide onto any rod. The puzzle starts with the disks neatly stacked in
order of size on one rod, the smallest at the top, thus making a conical shape.
The objective of the puzzle is to move the entire stack to another rod, obeying the following rules:
1. Only one disk can be moved at a time.
2. Each move takes the upper disk from one of the stacks and places it on top of another stack.
3. No disk may be placed on top of a smaller disk.
We want to write a recursive method, THanoi(n,A,B,C) which moves n disks from peg A to peg C
using peg B for intermediate transfers.
Stopping condition: n = 1
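A minimal sketch of such a method in C (this particular formulation, with pegs named by characters and moves printed, is an assumed illustration rather than the text's own code):

#include <stdio.h>

/* Moves n disks from peg 'from' to peg 'to', using peg 'via' for
 * intermediate transfers. */
void THanoi(int n, char from, char via, char to) {
    if (n == 1) {                               /* stopping condition */
        printf("Move disk 1 from %c to %c\n", from, to);
        return;
    }
    THanoi(n - 1, from, to, via);   /* move n-1 disks from 'from' to 'via' */
    printf("Move disk %d from %c to %c\n", n, from, to);
    THanoi(n - 1, via, from, to);   /* move n-1 disks from 'via' to 'to'   */
}

Calling THanoi(3, 'A', 'B', 'C') prints the seven moves needed for three disks.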
The time complexity of the above algorithm can be determined using the following recurrence relation.
Let T(n) be the number of steps required to solve the puzzle for n disks. It is clearly evident from the
above observation that the solution for n disks is equivalent to solving the puzzle twice for n-1
disks, plus a single step that transfers the largest disk from the starting peg to the final peg and takes
constant time.
Thus,
T(n) = T(n-1) + T(n-1) + O(1) = 2*T(n-1) + O(1)
The solution to this recurrence relation is exponential in n (in fact T(n) ∈ Θ(2^n)), so T(n) is of
exponential order. The full proof is out of the scope of this course.
This is an example where recursion is much easier to formulate than a loop based solution.
The time complexity (generally referred to as running time) of an algorithm is expressed as the amount of
time taken by an algorithm for some size of the input to the problem. Big O notation is commonly
used to express the time complexity of any algorithm, as this suppresses the lower order terms and is
described asymptotically. Time complexity is estimated by counting the operations (provided as
instructions in a program) performed in an algorithm, where each operation takes a fixed amount of time in
execution. Generally time complexities are classified as constant, linear, logarithmic, polynomial,
exponential etc. Among these, the polynomial and exponential classes are the most prominently
considered and define the complexity of an algorithm. These two parameters for any algorithm are
always influenced by the size of the input.
An algorithm is said to be solvable in polynomial time if the number of steps required to complete the
algorithm for a given input is O(n^k) for some non-negative integer k, where n is the size of the
input. Polynomial-time algorithms are said to be "fast." Most familiar mathematical operations such as
addition, subtraction, multiplication, and division, as well as computing square roots, powers, and
logarithms, can be performed in polynomial time. Computing the digits of most interesting
mathematical constants, including pi and e, can also be done in polynomial time.
There is also a set of problems which can be solved by exponential time algorithms, but for which no
polynomial time algorithm is known. An algorithm is said to be exponential time if T(n) is upper
bounded by 2^poly(n), where poly(n) is some polynomial in n. More formally, an algorithm is exponential
time if T(n) is bounded by O(2^(n^k)) for some constant k.
Algorithms which have exponential time complexity grow much faster than polynomial algorithms.
The difference you are probably looking for happens to be where the variable is in the equation that
expresses the run time. Equations that show a polynomial time complexity have the variable in the bases of
their terms, for example n^3 + 2n^2 + 1; notice n is in the base, NOT the exponent. In exponential equations,
the variable is in the exponent, for example 2^n. As said before, exponential time grows much faster. If n is
equal to 1000 (a reasonable input for an algorithm), then notice that 1000^3 is 1 billion, while 2^1000 is
astronomically large. For reference, there are roughly 2^190 hydrogen atoms in the Sun, which is vastly more than 1 billion and still nothing compared to 2^1000.
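The worst, average and best case discussion below refers to a simple linear search; a minimal C sketch is given here so that the references to search(), arr[] and x have something concrete to point at (the exact names are assumptions):

/* Linear search: returns the index of x in arr[0..n-1], or -1 if absent. */
int search(const int arr[], int n, int x) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == x)        /* one comparison per element examined */
            return i;
    }
    return -1;                  /* x not present: all n elements were compared */
}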
Worst Case:
In the worst case analysis, we calculate an upper bound on the running time of an algorithm. We must know
the case that causes the maximum number of operations to be executed. For linear search, the worst case
happens when the element to be searched (x in the code above) is not present in the array. When x is not
present, the search() function compares it with all the elements of arr[] one by one. Therefore, the worst
case time complexity of linear search would be Θ(n).
Average Case:
In average case analysis, we take all possible inputs, calculate the computing time for each of them,
sum all the calculated values, and divide the sum by the total number of inputs. We must know (or predict)
the distribution of cases. For the linear search problem, let us assume that all cases are uniformly
distributed (including the case of x not being present in the array). So we sum all the cases and divide the
sum by (n+1). Following is the value of the average case time complexity:
Average case time = ( Σ_{i=1}^{n+1} θ(i) ) / (n + 1) = θ( (n+1)(n+2)/2 ) / (n + 1) = Θ(n)
Best Case:
In the best case analysis, we calculate a lower bound on the running time of an algorithm. We must know the
case that causes the minimum number of operations to be executed. In the linear search problem, the best
case occurs when x is present at the first location. The number of operations in the best case is constant
(not dependent on n), so the time complexity in the best case would be Θ(1).
Most of the time, we do worst case analysis to analyze algorithms. In the worst case analysis, we guarantee
an upper bound on the running time of an algorithm, which is useful information.
The average case analysis is not easy to do in most practical cases and is rarely done. In the
average case analysis, we must know (or predict) the mathematical distribution of all possible inputs.
The best case analysis is of little use: guaranteeing a lower bound on an algorithm does not provide any
information, since in the worst case an algorithm may take years to run.
For some algorithms, all the cases are asymptotically the same, i.e., there are no separate worst and best cases.
For example, Merge Sort does Θ(n log n) operations in all cases. Most other
sorting algorithms have distinct worst and best cases. For example, in the typical implementation of Quick Sort
(where the pivot is chosen as a corner element), the worst case occurs when the input array is already sorted and
the best case occurs when the pivot elements always divide the array into two halves. For insertion sort, the worst
case occurs when the array is reverse sorted and the best case occurs when the array is sorted in the same
order as the output.
Example: Factorial
n! = 1·2·3·...·n and 0! = 1 (called the initial case)
So the recursive definition is n! = n·(n-1)!
Algorithm F(n)
    if n = 0 then return 1        // base case
    else return F(n-1)·n          // recursive call
Let M(n) be the number of multiplications performed by F(n). Then
M(n) = M(n-1) + 1
is a recursive formula too. This is typical.
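Solving this by backward substitution (with the initial condition M(0) = 0, since F(0) returns 1 without performing any multiplication):
M(n) = M(n-1) + 1 = [M(n-2) + 1] + 1 = M(n-2) + 2 = ... = M(n-i) + i = ... = M(0) + n = n
so M(n) ∈ Θ(n): the recursive factorial performs exactly n multiplications.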
...
By contrast, for the Tower of Hanoi recurrence M(n) = 2M(n-1) + 1 with M(1) = 1 (two recursive calls plus one constant-time move), backward substitution gives
M(n) = 2^i · M(n-i) + Σ_{j=0}^{i-1} 2^j = 2^i · M(n-i) + 2^i - 1
...
M(n) = 2^(n-1) · M(n-(n-1)) + 2^(n-1) - 1 = 2^(n-1) · M(1) + 2^(n-1) - 1 = 2^(n-1) + 2^(n-1) - 1 = 2^n - 1
M(n) ∈ Θ(2^n)
Where did the exponential term come from? Because two recursive calls are made. Suppose three
recursive calls were made: what would the order of growth be?
Lesson learned: be careful with recursive algorithms; they can grow exponentially,
especially if the problem size is measured by the level of the recursion tree and the operation count is the total
number of nodes.
1. Problem size is n
2. Basic operation is the addition in the recursive call
3. There is no difference between worst and best case
4. Recursive relation including initial conditions
A(n) = A(floor(n/2)) + 1
IC A(1) = 0
5. Solve the recurrence relation.
The division and the floor function in the argument of the recursive call make the analysis difficult.
We can make the variable substitution n = 2^k, which gets rid of the floor function,
but the substitution skips a lot of values of n.
The smoothness rule (see appendix B) says that this is ok.
Smoothness rule
If T(n) is eventually non-decreasing and f(n) is smooth (eventually non-decreasing and f(2n) ∈ Θ(f(n))),
then T(n) ∈ Θ(f(n)) for n a power of b implies T(n) ∈ Θ(f(n)) for all n.
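With the substitution n = 2^k the recurrence becomes A(2^k) = A(2^(k-1)) + 1 with A(2^0) = 0, and backward substitution gives
A(2^k) = A(2^(k-1)) + 1 = A(2^(k-2)) + 2 = ... = A(2^0) + k = k
Since k = log2 n, we get A(n) = log2 n ∈ Θ(log n) for n a power of 2, and by the smoothness rule this holds for all n.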
In general, the solution to an inhomogeneous recurrence is equal to the sum of the solution to the homogeneous
problem plus a solution to just the inhomogeneous part. The undetermined coefficients of the solution for
the homogeneous problem are used to satisfy the initial conditions.
We guess at the inhomogeneous part I(n) and then determine the new initial conditions for the homogeneous part B(n).
There is also the Master Theorem, which gives the asymptotic limit for many common recurrences.
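For reference, one common statement of the Master Theorem (quoted here as a summary; it is not derived in these notes): if T(n) = a·T(n/b) + f(n) with a ≥ 1, b > 1, and f(n) ∈ Θ(n^d) for some d ≥ 0, then
T(n) ∈ Θ(n^d)             if a < b^d
T(n) ∈ Θ(n^d · log n)     if a = b^d
T(n) ∈ Θ(n^(log_b a))     if a > b^d
For example, Merge Sort's recurrence T(n) = 2T(n/2) + Θ(n) has a = 2, b = 2, d = 1, so a = b^d and T(n) ∈ Θ(n log n).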
Iterative algorithm
Algorithm Fib(n)
    F[0] ← 0; F[1] ← 1
    for i ← 2 to n do
        F[i] ← F[i-1] + F[i-2]
    return F[n]
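A runnable C sketch of the same idea (the function name fib is an assumption): since only the last two values are ever needed, the array can be dropped, bringing the extra space down from O(n) to O(1) while keeping the O(n) running time.

#include <stdio.h>

/* Iterative Fibonacci: O(n) time, O(1) extra space.
 * Only the previous two values are kept instead of the whole array. */
unsigned long long fib(int n) {
    if (n <= 1) return n;                /* F(0) = 0, F(1) = 1 */
    unsigned long long prev = 0, curr = 1;
    for (int i = 2; i <= n; i++) {
        unsigned long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}

int main(void) {
    printf("F(10) = %llu\n", fib(10));   /* prints 55 */
    return 0;
}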