DESIGN AND ANALYSIS OF ALGORITHMS (18CS42)
MODULE-1
INTRODUCTION
1. How to devise algorithms: mastering the various design strategies helps in devising new algorithms.
2. How to validate algorithms: checking whether the algorithm gives the correct answer for all possible inputs. Only after validation should the program be written.
3. How to analyze algorithms: analysis of algorithms, or performance analysis, refers to the task of determining how much computing time and storage an algorithm requires.
4. How to test a program: testing a program involves two phases, debugging and profiling (performance measurement).
a. Debugging is the process of executing programs on sample data sets to determine whether faulty results occur and, if so, to correct them.
b. Profiling is the process of executing a correct program on data sets and measuring the time and space it takes to compute the results.
The analysis framework is a systematic approach for analyzing an algorithm's efficiency along two dimensions:
• Time efficiency
• Space efficiency
Time efficiency, also called time complexity, indicates how fast the algorithm in question runs. Space efficiency, also called space complexity, refers to the number of memory units required by the algorithm in addition to the space needed for its input and output.
Analysis Framework
[Figure: the analysis framework covers measuring an algorithm's running time and measuring its space complexity.]
For algorithms working with numbers, the input's size is often measured by the number b of bits in the binary representation of the input n:
b = ⌊log2 n⌋ + 1
where b is the number of bits and n is the input parameter.
This metric usually gives a better idea about the efficiency of algorithms.
An algorithm's efficiency must be measured with a metric that does not depend on extraneous factors such as the computer used or the compiler used to run the program.
One possible approach is to count the number of times each of the algorithm's operations is executed. The operation contributing the most to the total running time is called the basic operation.
Identification of the basic operation of an algorithm: it is usually the most time-
consuming operation in the algorithm’s innermost loop.
For example, most sorting algorithms work by comparing elements (keys) of a list being
sorted with each other; for such algorithms, the basic operation is a key comparison.
Let cop be the execution time of an algorithm's basic operation on a particular computer, and let C(n) be the number of times this operation needs to be executed for this algorithm. Then we can estimate the running time T(n) of a program implementing this algorithm on that computer by the formula
T(n) ≈ cop C(n)
Problem: How much longer will the algorithm run if we double its input size? Assume C(n) is ½
n(n+1) .
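As a worked check, using the estimate T(n) ≈ cop C(n) (the constant cop cancels in the ratio):
T(2n)/T(n) ≈ C(2n)/C(n) = [½(2n)(2n + 1)] / [½n(n + 1)] = (4n^2 + 2n)/(n^2 + n) ≈ 4 for large n,
so doubling the input size makes the algorithm run roughly four times longer.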
3. Orders of Growth
The function growing the slowest among these is the logarithmic function. The exponential function 2^n and the factorial function n! grow so fast that their values become astronomically large even for rather small values of n. Algorithms that require an exponential number of operations are practical for solving only problems of very small sizes.
Many algorithms' running times depend not only on the input size but also on the specifics of a particular input.
Consider, as an example, sequential search. This is a straightforward algorithm that searches for a given item (some search key K) in a list of n elements by checking successive elements of the list until either a match with the search key is found or the list is exhausted. The running time of this algorithm can be quite different for inputs of the same list size n.
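To make the discussion concrete, here is a minimal Python sketch of sequential search (a direct transcription of the algorithm described above; the function name and list interface are our own choices):

def sequential_search(a, key):
    """Return the index of key in list a, or -1 if key is absent."""
    for i in range(len(a)):
        if a[i] == key:   # basic operation: key comparison
            return i      # best case: key in a[0], 1 comparison
    return -1             # worst case: key absent, n comparisons

The comments on the two exits show why Cbest(n) = 1 and Cworst(n) = n, as discussed next.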
Worst case efficiency: when there are no matching elements or the first matching element happens
to be the last one on the list, the algorithm makes the largest number of key comparisons among all
possible inputs of size n:
Cworst (n) = n.
The worst-case efficiency of an algorithm is its efficiency for the worst-case input of size
n, which is an input (or inputs) of size n for which the algorithm runs the longest among all
possible inputs of that size.
The worst-case analysis provides very important information about an algorithm’s efficiency
by bounding its running time from above. In other words, it guarantees that for any instance of size
n, the running time will not exceed Cworst (n), its running time on the worst-case inputs.
The best-case efficiency of an algorithm is its efficiency for the best-case input of size n, which
is an input (or inputs) of size n for which the algorithm runs the fastest among all possible
inputs of that size.
Accordingly, we can analyze the best case efficiency as follows. First, we determine the kind of
inputs for which the count C(n) will be the smallest among all possible inputs of size n. For
example, the best-case inputs for sequential search are lists of size n with their first element equal to
a search key; accordingly, Cbest(n) = 1 for this algorithm.
The average-case efficiency of an algorithm is the average time taken (the average number of times the basic operation is executed) over all possible instances of the input.
Average-case efficiency is studied because neither the worst-case analysis nor its best-case counterpart yields the necessary information about an algorithm's behavior on a "typical" or "random" input.
*Space complexity: The space complexity of an algorithm is the amount of memory it needs to run to completion.
The space needed by an algorithm is the sum of the following components:
i) A Fixed Part: the part that is independent of the characteristics of the inputs and outputs.
Example: space for simple variables, fixed-size component variables, constants, etc.
ii) A Variable Part: the space needed by component variables whose size depends on the particular problem instance being solved.
Example: the space needed by referenced variables and the recursion stack.
Therefore the space requirement S(P) of any algorithm P may be given as
S(P) = C + Sp (instance characteristics),
where C is a constant and Sp (the instance characteristics) is the main focus in analyzing the space complexity of any algorithm; Sp is always problem specific.
Example 1: the iterative algorithm that computes the sum of the n elements of an array a.
S(P) = C + Sp
C: the simple variables s, n and i take 3 words (1 word per variable)
Sp: the array a[ ] takes n words
S(P) = 3 + n
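A minimal Python sketch of the algorithm this example analyzes (assuming the standard iterative Sum(a, n); names are ours):

def iterative_sum(a, n):
    """Sum the first n elements of array a."""
    s = 0.0               # 1 word for s
    for i in range(n):    # 1 word each for n and i
        s = s + a[i]      # the array a itself accounts for n words
    return s              # total space: S(P) = 3 + n words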
In recursive algorithms the instances are characterized by n. The recursion stack needs space, which is used to store the formal parameters, the local variables, and the return address. So each call to Rsum requires at least 3 words:
• Space for the value of n
• Return address
• Pointer to a[ ]
Since the depth of the recursion (the number of nested calls) is n + 1,
S(P) = 3(n + 1)
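A sketch of the recursive version (a hypothetical Python rendering of Rsum):

def rsum(a, n):
    """Recursively sum the first n elements of a.

    Each activation stores the value of n, a pointer to a, and a
    return address (3 words); the depth of recursion is n + 1,
    giving S(P) = 3(n + 1)."""
    if n <= 0:
        return 0.0
    return rsum(a, n - 1) + a[n - 1]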
*Time complexity:
The time T(P) taken by a program P is the sum of the compile time and the run time:
T(P) = compile time + run time
The compile time can be ignored, since the compiled program will be run several times without recompilation. The run time is denoted by tp.
Many factors affect tp, such as the characteristics of the compiler used. One could determine the number of additions, subtractions, multiplications, divisions, compares, loads, stores, and so on that would be made by the code for P, giving a formula of the form
tp(n) = ca ADD(n) + cs SUB(n) + cm MUL(n) + cd DIV(n) + …
where ca, cs, cm, cd denote the times taken for an addition, subtraction, multiplication, and division, respectively. This formula could be used if the time for each operation were known, but operation execution times are machine dependent. So tp must be deduced in a machine-independent way.
Hence it is ideal to calculate tp by a machine-independent feature, i.e., to identify the program steps and calculate their count.
Here a count variable is declared as global; count is incremented by the step count of each statement as it executes.
So in the algorithm the for loop increases the count by a total of 2n, and finally 2n + 3 is the count after program termination: each invocation of the Sum algorithm executes a total of 2n + 3 steps. For the recursive Sum algorithm the step count is computed in the same way.
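A sketch of the count method applied to the iterative Sum algorithm (a hypothetical Python rendering; the global count mirrors the description above):

count = 0   # global step counter

def sum_with_count(a, n):
    global count
    s = 0.0
    count += 1        # step for s = 0.0
    for i in range(n):
        count += 1    # one step for each successful loop test (n times)
        s = s + a[i]
        count += 1    # one step for each assignment (n times)
    count += 1        # step for the final, failing loop test
    count += 1        # step for the return
    return s          # count has grown by 2n + 3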
2. Tabular method: to determine the step count of an algorithm, we build a table in which we list the total number of steps contributed by each statement. Here we should identify two things:
s/e: steps per execution of the statement
Frequency: the number of times the statement is executed.
Example: the step table for the iterative Sum algorithm is shown below.
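The table itself did not survive here; the following is a reconstruction of the standard step table for the iterative Sum(a, n) algorithm (an assumption, consistent with the 2n + 3 total derived above):

Statement                s/e    Frequency    Total steps
Algorithm Sum(a, n)      0      -            0
  s := 0.0               1      1            1
  for i := 1 to n do     1      n + 1        n + 1
    s := s + a[i]        1      n            n
  return s               1      1            1
Total                                        2n + 3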
To compare and rank orders of growth, we use three notations: O (big oh), Ω (big
omega), and Θ (big theta).
O-notation:
A function t(n) is said to be in O(g(n)), denoted t(n)∈O(g(n)), if t(n) is bounded above by some
constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some
nonnegative integer n0 , such that
t(n) ≤ c g(n) for all n≥ n0
The definition is illustrated in the figure below.
Figure: O-notation
Ex: 3n^3 + 2n^2 ∈ O(n^3)
According to the definition of O-notation,
t(n) ≤ c g(n) for all n ≥ n0
i.e. 3n^3 + 2n^2 ≤ c·n^3. For n ≥ 2 we have 2n^2 ≤ n·n^2 = n^3, so
3n^3 + 2n^2 ≤ 3n^3 + n^3 = 4n^3,
and the definition is satisfied with c = 4 and n0 = 2.
for more examples refer to the material uploaded.
Ω-notation:
A function t(n) is said to be in Ω(g(n)), denoted t(n) ∈ Ω(g(n)), if t(n) is bounded below by some constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some nonnegative integer n0 such that
t(n) ≥ c g(n) for all n ≥ n0
The definition is illustrated in the figure below.
Ex: n! ∈ Ω(2^n)
i.e. n! ≥ c·2^n
with c = 1 and n0 = 4 the above inequality is satisfied.
Θ-notation:
A function t(n) is said to be in Θ(g(n)), denoted t(n) ∈ Θ(g(n)), if t(n) is bounded both above and below by some positive constant multiples of g(n) for all large n, i.e., if there exist some positive constants c1 and c2 and some nonnegative integer n0 such that
c2 g(n) ≤ t(n) ≤ c1 g(n) for all n ≥ n0
The definition is illustrated in the figure below.
Ex: ½n(n−1) ∈ Θ(n^2)
with c2 = ¼, c1 = ½ and n0 = 2 the above inequalities are satisfied.
THEOREM:
If t1(n)∈O(g1(n)) and t2(n)∈O(g2(n)), then t1(n)+t2(n)∈O(max{g1(n),g2(n)}).
(The analogous assertions are true for the Θ and Ω notations as well.)
Proof: Let us first note that for four arbitrary real numbers a1, b1, a2 and b2: if a1 ≤ b1 and a2 ≤ b2, then
a1 + a2 ≤ 2 max{b1, b2}.
Since t1(n) ∈ O(g1(n)), there exist some positive constant c1 and some nonnegative integer n1 such that
t1(n) ≤ c1 g1(n) for all n ≥ n1.
Similarly, since t2(n) ∈ O(g2(n)), there exist c2 > 0 and n2 such that t2(n) ≤ c2 g2(n) for all n ≥ n2. Let c3 = max{c1, c2} and n0 = max{n1, n2}. Then for all n ≥ n0,
t1(n) + t2(n) ≤ c1 g1(n) + c2 g2(n) ≤ c3 g1(n) + c3 g2(n) ≤ 2c3 max{g1(n), g2(n)},
which proves that t1(n) + t2(n) ∈ O(max{g1(n), g2(n)}).
Orders of growth can also be compared using the limit of the ratio t(n)/g(n) as n → ∞, which can be 0 (t grows slower than g), a positive constant c (t and g have the same order of growth), or ∞ (t grows faster than g). The first two cases mean that t(n) ∈ O(g(n)), the last two that t(n) ∈ Ω(g(n)), and the second case means that t(n) ∈ Θ(g(n)).
The limit-based approach is often more convenient than the approach based on the definitions because it can take advantage of the powerful calculus techniques developed for computing limits, such as L'Hospital's rule.
Example 1: lim(n→∞) [½n(n − 1)] / n^2 = ½ lim(n→∞) (1 − 1/n) = ½. Here the limit is a positive constant, i.e., the functions have the same order of growth:
½n(n − 1) ∈ Θ(n^2).
Example 2: lim(n→∞) log2 n / √n = 0 (by L'Hospital's rule). Here the limit is equal to zero, i.e., log2 n has a smaller order of growth than √n:
log2 n ∈ O(√n).
Example 3: lim(n→∞) n! / 2^n = ∞ (by Stirling's formula). Here the limit is equal to ∞, i.e., n! has a larger order of growth than 2^n:
n! ∈ Ω(2^n).
Analysis (of the algorithm that finds the largest element in a given array A[0..n − 1]):
1. The measure of an input's size here is the number of elements in the array, i.e., n.
2. There are two operations in the algorithm's loop: the comparison A[i] > maxval and the assignment maxval ← A[i]. Since the comparison is executed on each repetition of the loop and the assignment is not, we take the comparison to be the algorithm's basic operation.
3. The basic operation count depends only on the size of the input, so we need to analyze only one kind of efficiency.
4. Let C(n) denote the number of times this comparison is executed. The algorithm makes one comparison on each value of the loop's variable i within the bounds 1 and n − 1. Therefore, we get the following sum for C(n):
C(n) = Σ(i=1 to n−1) 1 = n − 1 ∈ Θ(n)
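A minimal Python transcription of the max-element algorithm being analyzed (the pseudocode itself is not reproduced above, so this sketch assumes the standard version):

def max_element(a):
    """Return the value of the largest element in a[0..n-1]."""
    maxval = a[0]
    for i in range(1, len(a)):   # i runs from 1 to n - 1
        if a[i] > maxval:        # basic operation: executed n - 1 times
            maxval = a[i]        # assignment: not executed on every pass
    return maxval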
Analysis (of the element uniqueness algorithm, which checks whether all the elements in a given array are distinct):
3. The number of element comparisons depends not only on the size of the input but also on whether there are equal elements in the array and, if there are, which array positions they occupy. So we need to analyze the best case, worst case and average case separately.
4. Worst-case analysis:
There are two kinds of worst-case inputs: arrays with no equal elements, and arrays in which the last two elements are the only pair of equal elements. For such inputs, one comparison is made for each repetition of the innermost loop, i.e., for each value of the loop's variable j between its limits i + 1 and n − 1; and this is repeated for each value of the outer loop, i.e., for each value of the loop's variable i between its limits 0 and n − 2. So we get the basic operation count:
Cworst(n) = Σ(i=0 to n−2) Σ(j=i+1 to n−1) 1 = n(n − 1)/2 ∈ Θ(n^2)
5. Solve the above equation by using standard formulas & rules of Summation.
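This analysis refers to the element uniqueness algorithm; a Python sketch (assuming the standard two-loop version) looks like this:

def unique_elements(a):
    """Return True if all the elements of a are distinct."""
    n = len(a)
    for i in range(n - 1):          # i from 0 to n - 2
        for j in range(i + 1, n):   # j from i + 1 to n - 1
            if a[i] == a[j]:        # basic operation: element comparison
                return False
    return True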
(Similarly, for the matrix multiplication algorithm, the total number of multiplications M(n) is expressed by the triple sum
M(n) = Σ(i=0 to n−1) Σ(j=0 to n−1) Σ(k=0 to n−1) 1 = n^3,
obtained by solving the sum with the standard formulas and rules of summation.)
ALGORITHM Binary(n)
//Input: A positive decimal integer n
//Output: The number of binary digits in n's binary representation
count ← 1
while n > 1 do
    count ← count + 1
    n ← ⌊n/2⌋
return count
Analysis: the basic operation here is the comparison n > 1 that governs the loop. Since the value of n is roughly halved on each repetition of the loop, the comparison is executed about log2 n times (more precisely, ⌊log2 n⌋ + 1 times).
ALGORITHM F(n)
//Computes n! recursively
//Input: A nonnegative integer n
//Output: The value of n!
if n = 0 return 1
else return F(n - 1) * n
Analysis:
1. The size of the input is the value of n.
2. The basic operation of the algorithm is multiplication; let M(n) denote the number of multiplications executed.
3. The basic operation count of the algorithm depends only on the size of the input, so we need to analyze only one kind of efficiency.
4. F(n) is computed as F(n) = F(n - 1) * n, so the number of multiplications needed to compute F(n) is the number needed to compute F(n - 1), plus 1 to multiply the result by n:
M(n) = M(n - 1) + 1 for n > 0.
To solve the above recurrence relation we need an initial condition i.e. the value with which
the sequence starts. This can be obtained by looking at the condition that makes the recursive
call to stop. Here
if n=0 return 1
so the recurrence relation with initial condition is
M(n)=M(n-1)+1
M(0)=0
5. Solve the above recurrence relation by using backward substitution method.
M(n) =M(n-1)+1 //Substitute M(n-1)=M(n-2)+1
= (M(n-2)+1)+1
=M(n-2)+2 //Substitute M(n-2)=M(n-3)+1
= (M(n-3)+1)+2
=M(n-3)+3
.…
=M(n-i)+i
.…
=M(n-n)+n
=M(0)+n
M(n) =0+n //Initial Condition
M(n) =n
The number of multiplications required is: M(n)=n
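A runnable Python version of this recursive factorial, instrumented to confirm M(n) = n (the counter argument is our addition):

def factorial(n, counter):
    """Compute n! recursively; counter[0] tallies multiplications."""
    if n == 0:
        return 1
    result = factorial(n - 1, counter) * n
    counter[0] += 1   # exactly one multiplication per call with n > 0
    return result

c = [0]
print(factorial(5, c), c[0])   # prints: 120 5, i.e., M(5) = 5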
Example: Tower of Hanoi. We have n disks of different sizes and three pegs. Initially all the disks are on the first peg such that the largest is on the bottom and the smallest is on the top. We have to move all the disks to the third peg using the second one as an auxiliary. We can move only one disk at a time, and a smaller disk must always be on top of a larger one. This problem can be solved by a recursive technique.
When n > 1 (number of disks), we first move recursively n − 1 disks from peg 1 to peg 2 with peg 3 as auxiliary, then move the largest disk from peg 1 to peg 3. Finally we move the n − 1 disks recursively from peg 2 to peg 3 with peg 1 as auxiliary.
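A short Python sketch of this recursive procedure (the function and peg names are our own):

def hanoi(n, source, target, auxiliary):
    """Move n disks from source to target using auxiliary."""
    if n == 1:
        print("move disk 1 from", source, "to", target)
        return
    hanoi(n - 1, source, auxiliary, target)              # clear n - 1 disks
    print("move disk", n, "from", source, "to", target)  # move the largest
    hanoi(n - 1, auxiliary, target, source)              # restack n - 1 disks

hanoi(3, "peg1", "peg3", "peg2")   # makes 2^3 - 1 = 7 moves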
The number of moves M(n) satisfies the recurrence
M(n) = 2M(n − 1) + 1 for n > 1, with M(1) = 1.
Solving it by backward substitution:
M(n) = 2M(n − 1) + 1    //Substitute M(n − 1) = 2M(n − 2) + 1
= 2[2M(n − 2) + 1] + 1 = 2^2 M(n − 2) + 2 + 1
= 2^3 M(n − 3) + 2^2 + 2 + 1
.…
= 2^i M(n − i) + 2^(i−1) + … + 2^2 + 2 + 1
.…
= 2^(n−1) M(1) + 2^(n−2) + … + 2^2 + 2 + 1
= 2^(n−1) + 2^(n−2) + … + 2 + 1 = 1·(2^n − 1)/(2 − 1)
M(n) = 2^n − 1 ∈ Θ(2^n)
When a recursive algorithm makes more than a single call to itself, it can be useful for analysis purposes to construct a tree of its recursive calls. In this tree, nodes correspond to recursive calls, and we can label them with the value of the parameter (or, more generally, parameters) of the calls. For the Tower of Hanoi example, the tree is given in the figure below. By counting the number of nodes in the tree, we can get the total number of calls made by the Tower of Hanoi algorithm:
C(n) = Σ(l=0 to n−1) 2^l = 2^n − 1, where l is the level in the tree.
Figure: Tree of recursive calls made by the recursive algorithm for the Tower of Hanoi
puzzle.
ALGORITHM BinRec(n)
//Input: A positive decimal integer n
//Output: The number of binary digits in n's binary representation
if n = 1 return 1
else return BinRec(⌊n/2⌋) + 1
Analysis:
1. The size of the input is the value of n.
2. The basic operation is the addition made in computing BinRec(⌊n/2⌋) + 1; let A(n) denote the number of additions.
3. The basic operation count depends only on the value of the input parameter.
4. The recurrence relation for the basic operation count is:
A(n) = A(⌊n/2⌋) + 1 for n > 1
The initial condition is A(1) = 0
5. Solve the above recurrence relation by using backward substitution method.
Assume n = 2^k for simplification.
A(2^k) = A(2^k/2) + 1 = A(2^(k−1)) + 1    //Substitute A(2^(k−1)) = A(2^(k−2)) + 1
= [A(2^(k−2)) + 1] + 1
= A(2^(k−2)) + 2    //Substitute A(2^(k−2)) = A(2^(k−3)) + 1
= [A(2^(k−3)) + 1] + 2
= A(2^(k−3)) + 3
….
= A(2^(k−i)) + i
….
= A(2^(k−k)) + k
= A(2^0) + k
= A(1) + k    //Initial condition is A(1) = 0
A(2^k) = 0 + k = k
But n = 2^k, so
log2 n = log2 2^k = k log2 2 = k
k = log2 n
∴ Basic operation count A(n) = log2 n
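A Python rendering of BinRec with an addition counter, to check A(n) = ⌊log2 n⌋ empirically (the counter is our addition):

def bin_rec(n, counter):
    """Number of binary digits of n; counter[0] tallies additions."""
    if n == 1:
        return 1
    counter[0] += 1   # the addition in BinRec(⌊n/2⌋) + 1
    return bin_rec(n // 2, counter) + 1

c = [0]
print(bin_rec(16, c), c[0])   # prints: 5 4, i.e., A(16) = log2 16 = 4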
1.8 Important Problem types: The most important problem types are
1. Sorting
2. Searching
3. String processing
4. Graph problems
5. Combinatorial problems
6. Geometric problems
7. Numerical problems
These problems are used in the subject to illustrate different algorithm design techniques and
methods of algorithm analysis.
1. Sorting
The sorting problem is to rearrange the items of a given list in nondecreasing order. In the case of records, we need to choose a piece of information to guide the sorting. For example, we can choose to sort student records in alphabetical order of names or by student number or by student grade-point average. Such a specially chosen piece of information is called a key.
The first notable feature of a sorting algorithm is stability: an algorithm is called stable if it preserves the relative order of any two equal elements in its input. The second notable feature is the amount of extra memory the algorithm requires. An algorithm is said to be in-place if it does not require extra memory, except, possibly, for a few memory units. There are important sorting algorithms that are in-place and those that are not.
2. Searching
The searching problem deals with finding a given value, called a search key, in a given set. There are plenty of searching algorithms to choose from, for example sequential search and binary search. These algorithms are of particular importance for real-world applications because they are indispensable for storing and retrieving information from large databases.
For searching, too, there is no single algorithm that fits all situations best. Some algorithms
work faster than others but require more memory; some are very fast but applicable only to sorted
arrays; and so on. Unlike with sorting algorithms, there is no stability problem, but different issues
arise.
3. String Processing
In recent decades, applications dealing with nonnumerical data have intensified the interest of researchers in string-handling algorithms.
A string is a sequence of characters from an alphabet. Strings of particular interest are text
strings, which comprise letters, numbers, and special characters; bit strings, which comprise zeros
and ones; and gene sequences, which can be modeled by strings of characters from the four-
character alphabet {A,C, G, T}.
There are many string-processing algorithms in computer science; one particular problem—that of searching for a given word in a text—has attracted special attention from researchers. They call it string matching.
4. Graph Problems
A graph consists of a finite set of vertices (or nodes) and a set of edges, each of which connects a pair of nodes.
Graphs are used to solve many real-life problems. Graphs are used to represent networks. The
networks may include paths in a city or telephone network or circuit network. Graphs are also used
in social networks like LinkedIn, Facebook. For example, in Facebook, each person is represented
with a vertex (or node). Each node is a structure and contains information like person id, name,
gender, locale etc.
Graphs can be used for modeling a wide variety of applications, including transportation,
communication, social and economic networks, project scheduling, and games. Studying different
technical and social aspects of the Internet in particular is one of the active areas of current research
involving computer scientists, economists, and social scientists. Basic graph algorithms include graph-traversal algorithms (how can one reach all the points in a network?), shortest-path algorithms (what is the best route between two cities?), and topological sorting for graphs with directed edges. Some graph problems are computationally very hard; examples are the traveling salesman problem and the graph-coloring problem.
5. Combinatorial Problems
A combinatorial problem deals with, given a finite collection of objects and a set of constraints, finding an object of the collection that satisfies all the constraints. Combinatorial problems are problems involving arrangements of elements from a finite set and selections from a finite set.
These problems can be divided into three basic types: (1) enumeration
problems, (2) existence problems, and (3) optimization problems.
In enumeration problems the goal is either to find how many arrangements there are satisfying the
given properties or to produce a list of arrangements satisfying the given properties.
In existence problems the goal is to decide whether or not an arrangement exists satisfying the given
properties.
In optimization problems the goal is to find where a given function of several variables takes on an
extreme value (maximum or minimum) over a given finite domain.
The traveling salesman problem and the graph-coloring problem are examples of combinatorial problems. These are problems that ask, explicitly or implicitly, to find a combinatorial object—such as a permutation, a combination, or a subset—that satisfies certain constraints.
A desired combinatorial object may also be required to have some additional property such
as a maximum value or a minimum cost. Combinatorial problems are the most difficult problems in
computing, from both a theoretical and practical standpoint.
Some combinatorial problems can be solved by efficient algorithms, but they should be
considered fortunate exceptions to the rule. The shortest-path problem mentioned earlier is among
such exceptions.
6. Geometric Problems:
Geometric algorithms deal with geometric objects such as points, lines, and polygons. The ancient Greeks were very much interested in such problems; of course, today people are interested in geometric algorithms with quite different applications in mind, such as computer graphics, robotics, and tomography. Two classic problems of computational geometry are the closest-pair problem and the convex-hull problem.
The closest-pair problem is self-explanatory: given n points in the plane, find the closest pair
among them. The convex-hull problem asks to find the smallest convex polygon that would include
all the points of a given set.
7. Numerical Problems
Numerical problems, another large special area of applications, are problems that involve
mathematical objects of continuous nature: solving equations and systems of equations,
computing definite integrals, evaluating functions, and so on. The majority of such mathematical
problems can be solved only approximately. Another principal difficulty stems from the fact that
such problems typically require manipulating real numbers, which can be represented in a computer
only approximately. Moreover, a large number of arithmetic operations performed on approximately
represented numbers can lead to an accumulation of the round-off error to a point where it can
drastically distort an output produced by a seemingly sound algorithm.
Many sophisticated algorithms have been developed over the years in this area, and they
continue to play a critical role in many scientific and engineering applications. But in the last 30
years or so, the computing industry has shifted its focus to business applications. These new
applications require primarily algorithms for information storage, retrieval, transportation through
networks, and presentation to users. As a result of this revolutionary change, numerical analysis has
lost its formerly dominating position in both industry and computer science programs. Still, it is
important for any computer-literate person to have at least a rudimentary idea about numerical
algorithms.
Fundamental Data Structures
Since the majority of algorithms operate on data, particular ways of organizing data play a critical role in the design and analysis of algorithms.
A data structure can be defined as a particular scheme of organizing related data items. The
nature of the data items is dictated by the problem at hand; they can range from elementary data
types (e.g., integers or characters) to data structures (e.g., a one-dimensional array of one-
dimensional arrays is often used for implementing matrices).
There are a few data structures that have proved to be particularly important for computer
algorithms.
The two most important elementary data structures are the array and the linked list. A (one-dimensional) array is a sequence of n items of the same data type that are stored contiguously in computer memory and made accessible by specifying a value of the array's index.
A linked list is a sequence of zero or more elements called nodes, each containing two kinds of information: some data and one or more links called pointers to other nodes of the linked list.
Insertions and deletions can be made quite efficiently in a linked list by reconnecting a few
appropriate pointers. It is often convenient to start a linked list with a special node called the header.
This node may contain information about the linked list itself, such as its current length; it may also
contain, in addition to a pointer to the first element, a pointer to the linked list’s last element.
Doubly linked list: another extension of the singly linked list, in which every node, except the first and the last, contains pointers to both its successor and its predecessor.
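A minimal Python sketch of a singly linked list with a header node, as described above (the class and method names are ours):

class Node:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

class LinkedList:
    """Singly linked list whose header node stores the current length."""
    def __init__(self):
        self.header = Node(None)   # header carries metadata, no list data
        self.length = 0

    def insert_front(self, data):
        # insertion reconnects a single pointer: O(1)
        self.header.next = Node(data, self.header.next)
        self.length += 1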
The array and linked list are two principal choices in representing a more abstract data structure
called a linear list or simply a list.
A list is a finite sequence of data items, i.e., a collection of data items arranged in a certain linear
order. The basic operations performed on this data structure are searching for, inserting, and deleting
an element. Two special types of lists, stacks and queues, are particularly important.
A stack is a list in which insertions and deletions can be done only at the end. This end is
called the top because a stack is usually visualized not horizontally but vertically—akin to a stack of
plates whose “operations” it mimics very closely. As a result, when elements are added to (pushed
onto) a stack and deleted from (popped off) it, the structure operates in a “last-in–first-out” (LIFO)
fashion—exactly like a stack of plates if we can add or remove a plate only from the top.
Stacks have a multitude of applications; in particular, they are indispensable for
implementing recursive algorithms.
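As a small illustration, a Python list used as a stack in the LIFO fashion just described:

stack = []
stack.append("plate1")   # push
stack.append("plate2")   # push
top = stack.pop()        # pop returns "plate2": last in, first out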
A queue, on the other hand, is a list from which elements are deleted from one end of the structure, called the front (this operation is called dequeue), and new elements are added to the other end, called the rear (this operation is called enqueue).
Consequently, a queue operates in a “first-in–first-out” (FIFO) fashion—akin to a queue of
customers served by a single teller in a bank. Queues also have many important applications,
including several algorithms for graph problems. Many important applications require selection of
an item of the highest priority among a dynamically changing set of candidates.
A data structure that seeks to satisfy the needs of such applications is called a priority queue.
A priority queue is a collection of data items from a totally ordered universe (most often integer or real numbers). The principal operations on a priority queue are finding its largest element, deleting its largest element, and adding a new element.
A better implementation of a priority queue is based on an ingenious data structure called
the heap. We discuss heaps and an important sorting algorithm based on them in Section 6.4.
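A quick sketch using Python's standard heapq module (heapq is a min-heap, so values are negated here to obtain the largest-element operations described above):

import heapq

pq = []
heapq.heappush(pq, -5)          # add element 5
heapq.heappush(pq, -9)          # add element 9
largest = -pq[0]                # find the largest element: 9
largest = -heapq.heappop(pq)    # delete and return the largest: 9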
Graphs
A graph is informally thought of as a collection of points in the plane called “vertices” or “nodes,”
some of them connected by line segments called “edges” or “arcs.”
Formally, a graph G = ⟨V, E⟩ is defined by a pair of two sets: a finite nonempty set V of items called vertices and a set E of pairs of these items called edges. If these pairs of vertices are unordered, i.e., a pair of vertices (u, v) is the same as the pair (v, u), we say that the vertices u and v are adjacent to each other and that they are connected by the undirected edge (u, v).
We call the vertices u and v endpoints of the edge (u, v) and say that u and v are incident to
this edge; we also say that the edge (u, v) is incident to its endpoints u and v. A graph G is called
undirected if every edge in it is undirected. If a pair of vertices (u, v) is not the same as the pair (v,
u), we say that the edge (u, v) is directed from the vertex u, called the edge’s tail, to the vertex v,
called the edge’s head. We also say that the edge (u, v) leaves u and enters v. A graph whose every
edge is directed is called directed. Directed graphs are also called digraphs.
The graph depicted in the figure has six vertices and seven undirected edges:
V = {a, b, c, d, e, f }, E = {(a, c), (a, d), (b, c), (b, f ), (c, e), (d, e), (e, f )}.
The digraph depicted in the figure has six vertices and eight directed edges:
V = {a, b, c, d, e, f }, E = {(a, c), (b, c), (b, f ), (c, e), (d, a), (d, e), (e, c), (e, f )}.
Our definition of a graph does not forbid loops, or edges connecting vertices to themselves.
A graph with relatively few possible edges missing is called dense. A graph with few edges relative
to the number of its vertices is called sparse.
Whether we are dealing with a dense or sparse graph may influence how we choose to
represent the graph and, consequently, the running time of an algorithm being designed or used.
Graph Representations Graphs for computer algorithms are usually represented in one of two
ways: the adjacency matrix and adjacency lists.
The adjacency matrix of a graph with n vertices is an n × n boolean matrix with one row
and one column for each of the graph’s vertices, in which the element in the ith row and the j th
column is equal to 1 if there is an edge from the ith vertex to the jth vertex, and equal to 0 if there is
no such edge. For example, in the figure below, (a) shows the adjacency matrix and (b) the adjacency lists of the graph above.
Note that the adjacency matrix of an undirected graph is always symmetric, i.e., A[i, j ]= A[j, i] for
every 0 ≤ i, j ≤ n − 1
The adjacency lists of a graph or a digraph is a collection of linked lists, one for each vertex, that contain all the vertices adjacent to the list's vertex (i.e., all the vertices connected to it by an edge). To put it another way, the adjacency list of a vertex indicates the columns of the adjacency matrix that contain 1's in that vertex's row.
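A small Python sketch contrasting the two representations for the six-vertex undirected graph listed earlier (the dictionary-based adjacency lists are our implementation choice):

# Adjacency lists: one list of adjacent vertices per vertex.
adj_list = {
    "a": ["c", "d"], "b": ["c", "f"], "c": ["a", "b", "e"],
    "d": ["a", "e"], "e": ["c", "d", "f"], "f": ["b", "e"],
}

# Adjacency matrix: n x n, symmetric for an undirected graph.
vertices = sorted(adj_list)
index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)
adj_matrix = [[0] * n for _ in range(n)]
for u, neighbors in adj_list.items():
    for v in neighbors:
        adj_matrix[index[u]][index[v]] = 1   # so A[i][j] = A[j][i]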
Weighted Graphs A weighted graph (or weighted digraph) is a graph (or digraph) with numbers
assigned to its edges. These numbers are called weights or costs.
If a weighted graph is represented by its adjacency matrix, then its element A[i, j ] will
simply contain the weight of the edge from the ith to the jth vertex if there is such an edge and a
special symbol, e.g.,∞, if there is no such edge. Such a matrix is called the weight matrix or cost
matrix.
Paths and Cycles Among the many properties of graphs, two are important for a great number of
applications: connectivity and acyclicity. Both are based on the notion of a path.
A path from vertex u to vertex v of a graph G can be defined as a sequence of adjacent
(connected by an edge) vertices that starts with u and ends with v. If all vertices of a path are
distinct, the path is said to be simple.
The length of a path is the total number of vertices in the vertex sequence defining the path
minus 1, which is the same as the number of edges in the path. For example, a, c, b, f is a simple
path of length 3 from a to f in the graph in Figure 1.6a, whereas a, c, e, c, b, f is a path (not simple)
of length 5 from a to f.
A directed path is a sequence of vertices in which every consecutive pair of the vertices is
connected by an edge directed from the vertex listed first to the vertex listed next. For example, a, c,
e, f is a directed path from a to f in the graph in Figure 1.6b.
A graph is said to be connected if for every pair of its vertices u and v there is a path from u to v. If
we make a model of a connected graph by connecting some balls representing the graph’s vertices
with strings representing the edges, it will be a single piece.
If a graph is not connected, such a model will consist of several connected pieces that are
called connected components of the graph. Formally, a connected component is a maximal (not
expandable by including another vertex and an edge) connected subgraph of a given graph. For
example, the graphs in Figures 1.6a and 1.8a are connected, whereas the graph in Figure below is
not, because there is no path, for example, from a to f. The graph in the below Figure has two
connected components with vertices {a, b, c, d, e} and {f, g, h, i}, respectively. Graphs with several
connected components do happen in real-world applications.
A cycle is a path of a positive length that starts and ends at the same vertex and does not traverse the same edge more than once. For example, f, h, i, g, f is a cycle in the graph in Figure 1.9. A graph with no cycles is said to be acyclic.
Trees
A tree (more accurately, a free tree) is a connected acyclic graph (see the figure below).
A graph that has no cycles but is not necessarily connected is called a forest: each of its connected components is a tree. (A subgraph of a given graph G = ⟨V, E⟩ is a graph G′ = ⟨V′, E′⟩ such that V′ ⊆ V and E′ ⊆ E.)
Trees have several important properties other graphs do not have. In particular, the number of edges
in a tree is always one less than the number of its vertices: |E| = |V| − 1.
Rooted Trees Another very important property of trees is the fact that for every two vertices in a
tree, there always exists exactly one simple path from one of these vertices to the other. This
property makes it possible to select an arbitrary vertex in a free tree and consider it as the root of the
so-called rooted tree.
A rooted tree is usually depicted by placing its root on the top (level 0 of the tree), the
vertices adjacent to the root below it (level 1), the vertices two edges apart from the root
still below (level 2), and so on.
Figure below presents such a transformation from a free tree to a rooted tree. Rooted trees
play a very important role in computer science, a much more important one than free trees do; in
fact, for the sake of brevity, they are often referred to as simply “trees.” An obvious application of
trees is for describing hierarchies, from file directories to organizational charts of enterprises. There
are many less obvious applications, such as implementing dictionaries and data encoding.
Among the many applications of trees, we should mention the so-called state-space trees that underlie two important algorithm design techniques: backtracking and branch-and-bound.
• For any vertex v in a tree T , all the vertices on the simple path from the root to that vertex
are called ancestors of v.
• The vertex itself is usually considered its own ancestor; the set of ancestors that excludes the
vertex itself is referred to as the set of proper ancestors.
• If (u, v) is the last edge of the simple path from the root to vertex v (and u ≠ v), u is said to be the parent of v and v is called a child of u;
• vertices that have the same parent are said to be siblings.
• A vertex with no children is called a leaf ;
• a vertex with at least one child is called parental. All the vertices for which a vertex v is an
ancestor are said to be descendants of v;
• the proper descendants exclude the vertex v itself.
• All the descendants of a vertex v with all the edges connecting them form the subtree of T
rooted at that vertex.
• Thus, for the tree in Figure b, the root of the tree is a; vertices d, g, f, h, and i are leaves, and vertices a, b, e, and c are parental; the parent of b is a; the children of b are c and g; the siblings of b are d and e; and the vertices of the subtree rooted at b are {b, c, g, h, i}.
• The depth of a vertex v is the length of the simple path from the root to v.
• The height of a tree is the length of the longest simple path from the root to a leaf.
Ordered Trees An ordered tree is a rooted tree in which all the children of each vertex are ordered.
It is convenient to assume that in a tree’s diagram, all the children are ordered left to right.
A binary tree can be defined as an ordered tree in which every vertex has no more than two children
and each child is designated as either a left child or a right child of its parent; a binary tree may also
be empty. The binary tree with its root at the left (right) child of a vertex in a binary tree is called the
left (right) subtree of that vertex.
In the figure below, some numbers are assigned to vertices of the binary tree. Note that the number assigned to each parental vertex is larger than all the numbers in its left subtree and smaller than all the numbers in its right subtree. Such trees are called binary search trees. (The figure shows a binary tree and its binary search tree representation.)
Binary trees and binary search trees have a wide variety of applications in computer science;
you will encounter some of them throughout the book. In particular, binary search trees can be
generalized to more general types of search trees called multiway search trees, which are
indispensable for efficient access to very large data sets. As you will see later in the book, the efficiency of most important algorithms for binary search trees and their extensions depends on the tree's height. Therefore, the following inequalities for the height h of a binary tree with n nodes are especially important for the analysis of such algorithms:
⌊log2 n⌋ ≤ h ≤ n − 1.
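A minimal Python sketch of binary search tree insertion and search, whose cost is proportional to the tree's height h as just noted (the class design is ours):

class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key so each parental key exceeds its left subtree's keys."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Follow one root-to-leaf path: at most h + 1 comparisons."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None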
The notion of a set plays a central role in mathematics. A set can be described as an unordered collection (possibly empty) of distinct items called elements of the set.
A specific set is defined either by an explicit listing of its elements (e.g., S = {2,3, 5, 7}) or by
specifying a property that all the set’s elements and only they must satisfy (e.g., S = {n: n is a prime
number smaller than 10}).
The most important set operations are: checking membership of a given item in a given set; finding
the union of two sets, which comprises all the elements in either or both of them; and finding the
intersection of two sets, which comprises all the common elements in the sets.
Sets can be implemented in computer applications in two ways. The first considers only sets
that are subsets of some large set U, called the universal set. If set U has n elements, then any subset
S of U can be represented by a bit string of size n, called a bit vector, in which the ith element is 1 if
and only if the ith element of U is included in set S. Thus, to continue with our example, if U = {1, 2,
3, 4, 5, 6, 7, 8, 9}, then S = {2, 3, 5, 7} is represented by the bit string 011010100. This way of
representing sets makes it possible to implement the standard set operations very fast, but at the
expense of potentially using a large amount of storage.
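A short Python sketch of the bit-vector representation from this example (a plain list of 0/1 values is used for clarity):

U = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # universal set
S = {2, 3, 5, 7}

bit_vector = [1 if u in S else 0 for u in U]
# -> [0, 1, 1, 0, 1, 0, 1, 0, 0], i.e., the bit string 011010100

T = [0, 1, 0, 1, 0, 1, 0, 1, 0]   # the bit vector of some other subset
union        = [a | b for a, b in zip(bit_vector, T)]   # bitwise OR
intersection = [a & b for a, b in zip(bit_vector, T)]   # bitwise AND
is_member_4  = bit_vector[U.index(4)] == 1              # membership test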
The second and more common way to represent a set for computing purpose is to use the list
structure to indicate the set’s elements. Of course, this option, too, is feasible only for finite sets;
fortunately, unlike mathematics, this is the kind of sets most computer applications need.
Note, however, there are two principal points of distinction between sets and lists. First, a set cannot
contain identical elements; a list can. This requirement for uniqueness is sometimes circumvented
by the introduction of a multiset, or bag, an unordered collection of items that are not necessarily
distinct. Second, a set is an unordered collection of items; therefore, changing the order of its
elements does not change the set. A list, defined as an ordered collection of items, is exactly the
opposite.
This is an important theoretical distinction, but fortunately it is not important for many
applications. It is also worth mentioning that if a set is represented by a list, depending on the
application at hand, it might be worth maintaining the list in a sorted order.
In computing, the operations we need to perform for a set or a multiset most often are searching for
a given item, adding a new item, and deleting an item from the collection. A data structure that
implements these three operations is called the dictionary. Note the relationship between this data
structure and the problem of searching mentioned in Section 1.3; obviously, we are dealing here
with searching in a dynamic context. Consequently, an efficient implementation of a dictionary has
to strike a compromise between the efficiency of searching and the efficiencies of the other two
operations.