BCS304 Notes
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
LESSON NOTES
SUBJECT: Data Structure & Applications – BCS304
Vision
Mission
To keep pace with advancements in knowledge and make the students competitive
and capable at the global level.
To create an environment for the students to acquire the right physical, intellectual,
emotional and moral foundations and shine as torch bearers of tomorrow's society.
To strive to attain ever-higher benchmarks of educational excellence.
Department of Computer Science & Engineering
Vision of the Department
PSO1: Ability to apply skills in the field of algorithms, database design, web design, cloud
computing and data analytics.
PSO2: Apply knowledge in the field of computer networks for building network and
internet-based applications.
Course Syllabi with COs
Textbook:
1. Ellis Horowitz, Sartaj Sahni and Susan Anderson-Freed, Fundamentals of Data Structures in C, 2nd Ed, Universities Press, 2014.
Reference Books:
1. Seymour Lipschutz, Data Structures, Schaum's Outlines, Revised 1st Ed, McGraw Hill, 2014.
2. Gilberg & Forouzan, Data Structures: A Pseudo-code Approach with C, 2nd Ed, Cengage Learning, 2014.
3. Reema Thareja, Data Structures using C, 3rd Ed, Oxford Press, 2012.
4. Jean-Paul Tremblay & Paul G. Sorenson, An Introduction to Data Structures with Applications, 2nd Ed, McGraw Hill, 2013.
5. A M Tenenbaum, Data Structures using C, PHI, 1989.
6. Robert Kruse, Data Structures and Program Design in C, 2nd Ed, PHI, 1996.
Web links and Video Lectures (e-Resources):
● http://elearning.vtu.ac.in/econtent/courses/video/CSE/06CS35.html
● https://nptel.ac.in/courses/106/105/106105171/
● http://www.nptelvideos.in/2012/11/data-structures-and-algorithms.html
● https://www.youtube.com/watch?v=3Xo6P_V-qns&t=201s
● https://ds2-iiith.vlabs.ac.in/exp/selection-sort/index.html
● https://nptel.ac.in/courses/106/102/106102064/
● https://ds1-iiith.vlabs.ac.in/exp/stacks-queues/index.html
● https://ds1-iiith.vlabs.ac.in/exp/linked-list/basics/overview.html
● https://ds1-iiith.vlabs.ac.in/List%20of%20experiments.html
● https://ds1-iiith.vlabs.ac.in/exp/tree-traversal/index.html
● https://ds1-iiith.vlabs.ac.in/exp/tree-traversal/depth-first-traversal/dft-practice.html
● https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01350159542807756812559/overview
At the end of the course the student will be able to:
The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE)
is 50%. The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of
50) and for the SEE minimum passing mark is 35% of the maximum marks (18 out of 50 marks).
A student shall be deemed to have satisfied the academic requirements and earned the credits
allotted to each subject/course if the student secures a minimum of 40% (40 marks out of 100) in
the sum total of the CIE (Continuous Internal Evaluation) and SEE (Semester End Examination)
taken together.
Internal Assessment Test question paper is designed to attain the different levels of Bloom’s
taxonomy as per the outcome defined for the course.
Semester-End Examination:
Theory SEE will be conducted by University as per the scheduled timetable, with common question
papers for the course (duration 03 hours).
1. The question paper will have ten questions. Each question is set for 20 marks.
2. There will be 2 questions from each module. Each of the two questions under a module (with a
maximum of 3 sub-questions) should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each module.
4. Marks scored shall be proportionally reduced to 50 marks.
Data Structure & Applications – BCS304
DATA STRUCTURES
Data may be organized in many different ways. The logical or mathematical model of a
particular organization of data is called a data structure.
Data items that are divided into sub-items are called group items. Ex: An employee name
may be divided into three sub-items: first name, middle name and last name.
Data items that cannot be divided into sub-items are called elementary items.
Ex: SSN
Entity: An entity is something that has certain attributes or properties which may be assigned
values. The values may be either numeric or non-numeric.
Ex: Attributes: Name, Age, Sex, SSN
Values: Rohland Gail, 34, F, 134-34-5533
Entities with similar attributes form an entity set. Each attribute of an entity set has a range of
values, the set of all possible values that could be assigned to the particular attribute.
The term “information” is sometimes used for data with given attributes or, in other words,
meaningful or processed data.
Each record in a file may contain many field items, but the value in a certain field may uniquely
determine the record in the file. Such a field K is called a primary key, and the values k1, k2,
... in such a field are called keys or key values.
Example: Student records have variable lengths, since different students take different numbers
of courses. Variable-length records have a minimum and a maximum length.
The above organization of data into fields, records and files may not be complex enough to maintain
and efficiently process certain collections of data. For this reason, data are also organized into more
complex types of structures.
The study of complex data structures includes the following three steps:
1. Logical or mathematical description of the structure
2. Implementation of the structure on a computer
3. Quantitative analysis of the structure, which includes determining the amount of
memory needed to store the structure and the time required to process the structure.
1. Primitive data structures: Primitive data structures are the fundamental data types that are
supported by a programming language. Basic data types such as integer, real, character and
Boolean are known as primitive data structures. These data types hold values that cannot be
divided further, and hence they are also called simple data types.
2. Non-primitive data structures: Non-primitive data structures are data structures that
are created using primitive data structures. Examples of non-primitive data structures include
complex numbers, linked lists, stacks, trees and graphs.
Based on the structure and arrangement of data, non-primitive data structures are further
classified into
1. Linear Data Structure
2. Non-linear Data Structure
Common examples of linear data structures are arrays, queues, stacks and linked lists.
Arrays:
The simplest type of data structure is a linear (or one-dimensional) array: a list of a finite
number n of similar data elements referenced respectively by a set of n consecutive numbers, usually 1,
2, 3, ..., n. If A is chosen as the name of the array, then the elements of A are denoted by the
subscript notation a1, a2, a3, ..., an,
or
by the parenthesis notation A(1), A(2), A(3), ..., A(n),
or
by the bracket notation A[1], A[2], A[3], ..., A[n].
Example 1: A linear array STUDENT consisting of the names of six students is pictured in
the figure below. Here STUDENT[1] denotes John Brown, STUDENT[2] denotes Sandra
Gold, and so on.
Linear arrays are called one-dimensional arrays because each element in such an array is referenced
by one subscript. A two-dimensional array is a collection of similar data elements where each
element is referenced by two subscripts.
Example 2: A chain of 28 stores, each store having 4 departments, may list its weekly sales as in
the figure below. Such data can be stored in the computer using a two-dimensional array in which the
first subscript denotes the store and the second subscript the department. If SALES is the name
given to the array, then
SALES[1, 1] = 2872, SALES[1, 2] = 805, SALES[1, 3] = 3211, ..., SALES[28, 4] = 982
Trees
Data frequently contain a hierarchical relationship between various elements. The data structure
which reflects this relationship is called a rooted tree graph or a tree.
Some of the basic properties of trees are explained by means of examples.
In a record such as an employee record, Name may be a group item with the sub-items Last, First and MI (middle initial). Also,
Address may be a group item with the sub-items Street address and Area address, where Area itself
may be a group item having the sub-items City, State and ZIP code number.
This hierarchical structure is pictured below
Another way of picturing such a tree structure is in terms of levels, as shown below
1. Stack: A stack, also called a last-in first-out (LIFO) system, is a linear list in which insertions
and deletions can take place only at one end, called the top. This structure is similar in its operation
to a stack of dishes on a spring system, as shown in the figure.
Note that new dishes are inserted only at the top of the stack, and dishes can be deleted only from
the top of the stack.
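An array-based stack can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the notes' own implementation. The names Stack, STACK_SIZE, push and pop are chosen for illustration, the capacity is fixed, and the elements are ints. Both operations touch only the element at index top, and each returns false instead of failing on overflow or underflow.

```c
#include <stdbool.h>

#define STACK_SIZE 100   /* hypothetical fixed capacity */

typedef struct {
    int items[STACK_SIZE];
    int top;             /* index of the top element; -1 when empty */
} Stack;

void stack_init(Stack *s) { s->top = -1; }

bool stack_is_empty(const Stack *s) { return s->top == -1; }

/* Insert at the top; returns false if the stack is full. */
bool push(Stack *s, int item) {
    if (s->top == STACK_SIZE - 1)
        return false;
    s->items[++s->top] = item;
    return true;
}

/* Delete from the top; returns false if the stack is empty. */
bool pop(Stack *s, int *item) {
    if (stack_is_empty(s))
        return false;
    *item = s->items[s->top--];
    return true;
}
```

For example, after push(&s, 1), push(&s, 2) and push(&s, 3), three successive pop calls return 3, 2 and 1. This is the last-in first-out order described above.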
2. Queue: A queue, also called a first-in first-out (FIFO) system, is a linear list in which deletions
can take place only at one end of the list, the “front” of the list, and insertions can take place only at
the other end of the list, the “rear” of the list.
This structure operates in much the same way as a line of people waiting at a bus stop, as pictured
in the figure: the first person in line is the first person to board the bus. Another analogy is
automobiles waiting to pass through an intersection: the first car in line is the first car through.
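A queue can be sketched in C with a fixed-size array used circularly, so that positions freed at the front are reused at the rear. The circular layout and the names Queue, QUEUE_SIZE, enqueue and dequeue are assumptions made for this sketch; the notes do not prescribe a particular implementation.

```c
#include <stdbool.h>

#define QUEUE_SIZE 5   /* hypothetical capacity */

typedef struct {
    int items[QUEUE_SIZE];
    int front;   /* index of the element to delete next */
    int count;   /* number of elements currently stored */
} Queue;

void queue_init(Queue *q) { q->front = 0; q->count = 0; }

/* Insert at the rear; returns false when the queue is full. */
bool enqueue(Queue *q, int item) {
    if (q->count == QUEUE_SIZE)
        return false;
    int rear = (q->front + q->count) % QUEUE_SIZE;   /* wrap around */
    q->items[rear] = item;
    q->count++;
    return true;
}

/* Delete from the front; returns false when the queue is empty. */
bool dequeue(Queue *q, int *item) {
    if (q->count == 0)
        return false;
    *item = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_SIZE;
    q->count--;
    return true;
}
```

Keeping count separately from front lets the code tell a full queue from an empty one without leaving an array slot unused.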
3. Graph: Data sometimes contain a relationship between pairs of elements which is not
necessarily hierarchical in nature. For example, suppose an airline flies only between the cities
connected by lines in the figure. The data structure which reflects this type of relationship is called a graph.
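One common way to store such a graph is an adjacency matrix. The sketch below makes three assumptions. It uses a hypothetical set of MAX_CITIES cities numbered 0 to MAX_CITIES - 1. It treats every route as usable in both directions. The names Graph, add_route and has_route are illustrative only.

```c
#include <stdbool.h>

#define MAX_CITIES 4   /* hypothetical number of cities */

/* adj[i][j] is true when there is a direct flight between city i and city j */
typedef struct {
    bool adj[MAX_CITIES][MAX_CITIES];
} Graph;

void graph_init(Graph *g) {
    for (int i = 0; i < MAX_CITIES; i++)
        for (int j = 0; j < MAX_CITIES; j++)
            g->adj[i][j] = false;
}

/* A flight connects both cities, so the edge is stored in both directions. */
void add_route(Graph *g, int a, int b) {
    g->adj[a][b] = true;
    g->adj[b][a] = true;
}

bool has_route(const Graph *g, int a, int b) { return g->adj[a][b]; }
```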
ARRAYS
An array is defined as an ordered set of similar data items. All the data items of an
array are stored in consecutive memory locations.
The data items of an array are of the same type, and each data item can be accessed using
the same name but a different index value.
An array is a set of pairs, <index, value>, such that each index has a value associated
with it. It is also called a correspondence or a mapping.
Ex: <index, value>
< 0 , 25 > list[0]=25
< 1 , 15 > list[1]=15
< 2 , 20 > list[2]=20
< 3 , 17 > list[3]=17
< 4 , 35 > list[4]=35
Here, list is the name of the array. The data items in list can be accessed using list[0] to
list[4].
Array in C
Declaration: A one-dimensional array in C is declared by adding brackets to the name of a
variable.
Ex: int list[5], *plist[5];
The declaration list[5] defines 5 integers. Since arrays in C start at index 0, list[0], list[1],
list[2], list[3] and list[4] are the names of the five array elements, each of which contains an integer
value.
The declaration *plist[5] defines an array of 5 pointers to integers: plist[0], plist[1],
plist[2], plist[3] and plist[4] are the five array elements, each of which contains a pointer to an
integer.
Implementation:
When the compiler encounters an array declaration such as list[5], it allocates five consecutive
memory locations. Each memory location is large enough to hold a single integer.
The address of the first element of an array is called the base address. Ex: For list[5], the
address of list[0] is the base address.
To compute the memory address of list[i], the compiler obtains the size of an int with
sizeof(int). If α denotes the base address, the memory address of list[i] is α + i * sizeof(int).
Note: In C, the programmer does not multiply the offset i by the size of the element type to reach the appropriate
element of the array; pointer arithmetic does this automatically. Hence (list2 + i) equals &list2[i], and *(list2 + i) equals list2[i].
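The equivalences above can be checked directly. This sketch walks an int array and confirms that list2 + i and &list2[i] are the same address. It also shows that moving from one element to the next advances the address by sizeof(int) bytes, even though the program only adds 1. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Returns true when (arr + i) == &arr[i] and *(arr + i) == arr[i] for every i.
   The compiler scales i by sizeof(element) automatically. */
bool pointer_arithmetic_matches(const int *arr, int n) {
    for (int i = 0; i < n; i++) {
        if (arr + i != &arr[i] || *(arr + i) != arr[i])
            return false;
    }
    return true;
}

/* Byte distance between consecutive elements; equals sizeof(int). */
long element_stride(const int *arr) {
    return (long)((const char *)(arr + 1) - (const char *)arr);
}
```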
When sum is invoked, the value input = &input[0] is copied into a temporary location and associated
with the formal parameter list.
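The function sum itself is not reproduced in these notes. The following minimal sketch assumes that sum adds the n float values in list and returns the total, and it shows how the array parameter is used.

```c
/* The parameter list is really a pointer: the call sum(input, n)
   copies the value input == &input[0] into it. */
float sum(float list[], int n) {
    float tempsum = 0;
    for (int i = 0; i < n; i++)
        tempsum += list[i];   /* list[i] is evaluated as *(list + i) */
    return tempsum;
}
```

A call such as sum(input, 4) passes only the base address of input, so no array elements are copied.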
A function that prints both the address of the ith element of the array and the value found
at that address produces output such as the following (the addresses depend on the machine):
Output:
Address     Content
12344868    0
12344872    1
12344876    2
12344880    3
12344884    4
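The printing function itself is not shown above. The following minimal sketch assumes the hypothetical name print1 and an int array argument. It prints each address with %p, which is the portable printf conversion for pointer values.

```c
#include <stdio.h>

/* Prints the address and the value of each element ptr[0..rows-1].
   %p is used because addresses are pointers, not ints. */
void print1(int *ptr, int rows) {
    printf("Address          Contents\n");
    for (int i = 0; i < rows; i++)
        printf("%p  %5d\n", (void *)(ptr + i), *(ptr + i));
    printf("\n");
}
```

Consecutive addresses printed by such a function differ by sizeof(int), which was 4 bytes on the machine that produced the output above. The exact addresses vary from run to run.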
STRUCTURES
Ex: struct {
char name[10];
int age;
float salary;
} Person;
The above example creates a structure variable named Person that has three fields:
name = a name that is a character array
age = an integer value representing the age of the person
salary = a float value representing the salary of the individual
Ex: strcpy(Person.name, "james");
Person.age =10;
Person.salary = 35000;
Type-Defined Structure
The structure definition associated with keyword typedef is called Type-Defined Structure.
Syntax 1: typedef struct
{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
}Type_name;
Where,
typedef is the keyword used at the beginning of the definition; by using typedef, a
user-defined data type can be obtained.
struct is the keyword which tells the compiler that a structure is being defined.
The members are declared with their data_type.
Type_name is not a variable; it is a user-defined data type.
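For example, if a structure is defined with typedef under the name humanBeing, then humanBeing is the name of the type, and it is a user-defined data type.
The declaration humanBeing person1, person2; declares the variables person1 and person2 to be of type humanBeing.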
Structure Operation
Various operations can be performed on structures and structure members. In particular, one structure can be embedded in another in two ways:
1. The structures are defined separately, and a variable of one structure type is declared inside the
definition of another structure. A structure variable nested inside another structure is accessed
in the same way as any other member of that structure.
Example: The following example shows two structures, where both structures are defined
separately.
typedef struct {
int month;
int day;
int year;
}date;
typedef struct {
char name[10];
int age;
float salary;
date dob;
} humanBeing;
humanBeing person1;
A person born on February 11, 1944, would have the values for the date struct set as:
person1.dob.month = 2;
person1.dob.day = 11;
person1.dob.year = 1944;
2. The complete definition of a structure is placed inside the definition of another structure.
Example:
typedef struct {
char name[10];
int age;
float salary;
struct {
int month;
int day;
int year;
} date;
} humanBeing;
SELF-REFERENTIAL STRUCTURES
A self-referential structure is one in which one or more of its components is a pointer to a structure of its own type.
Self-referential structures usually require dynamic storage management routines (malloc and free) to
explicitly obtain and release memory.
Consider as an example:
typedef struct list {
char data;
struct list *link; /* points to another structure of the same type */
} list;
Each instance of the structure list has two components, data and link:
data is a single character;
link is a pointer to a list structure. The value of link is either the address in
memory of an instance of list or the null pointer.
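Consider these statements, which create three structures and assign values to their respective fields:
list item1, item2, item3;
item1.data = 'a';
item2.data = 'b';
item3.data = 'c';
item1.link = item2.link = item3.link = NULL;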
Structures item1, item2 and item3 contain the data items a, b and c respectively, and the null
pointer. These structures can be attached together by replacing the null link field in item2 with
one that points to item3, and by replacing the null link field in item1 with one that points to
item2:
item1.link = &item2;
item2.link = &item3;
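Once the links are set, the three structures form a chain that can be followed starting from item1. The sketch below repeats the list declaration so that it compiles on its own. The helper collect_chain is a hypothetical name; it visits each node in link order until it reaches the null pointer.

```c
#include <stddef.h>

typedef struct list {
    char data;
    struct list *link;   /* next node, or NULL at the end */
} list;

/* Follows the link fields from first until NULL, copying each data
   character into out (which must have room for the string plus '\0').
   Returns the number of nodes visited. */
int collect_chain(const list *first, char *out) {
    int count = 0;
    for (const list *p = first; p != NULL; p = p->link)
        out[count++] = p->data;
    out[count] = '\0';
    return count;
}
```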
Unions:
A union is similar to a structure: it is a collection of data items of similar or dissimilar data types.
Syntax: union{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
}variable_name;
Example:
union{
int children;
int beard;
} u;
Union Declaration:
A union declaration is similar to a structure, but the fields of a union must share their memory
space. This means that only one field of the union is "active" at any given time.
union{
char name;
int age;
float salary;
}u;
The major difference between a union and a structure is that structure members are stored in
separate memory locations, whereas all the members of a union share the same memory space.
Example:
#include <stdio.h>
union job {
char name[32];
float salary;
int worker_no;
}u;
int main( ){
printf("Enter name:\n");
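scanf("%31s", u.name); /* u.name already decays to a char pointer */
printf("Enter salary: \n");
scanf("%f", &u.salary); /* salary shares memory with name, so name is overwritten */
printf("Displaying\n Name :%s\n",u.name);
printf("Salary: %.2f\n",u.salary);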
return 0;
}
Output:
Enter name: Albert
Enter salary: 45678.90
Displaying
Name: f%gupad (Garbage Value)
Salary: 45678.90
POINTERS
A pointer is a variable which contains the address in memory of another variable.
The two most important operators used with the pointer type are
& - the unary operator &, which gives the address of a variable
* - the indirection or dereference operator *, which gives the content of the object pointed to
by a pointer.
Declaration
int i, *pi;
pi = &i;
Here, &i returns the address of i and assigns it as the value of pi
Null Pointer
The null pointer points to no object or function.
The null pointer is represented by the integer 0 (written NULL in C programs).
The null pointer can be used in conditional expressions, where it is interpreted as false.
3. Pointers can be dangerous when explicit type casts are used to convert between pointer types.
Ex: pi = malloc (sizeof (int));
pf = (float*) pi;
4. On some systems, pointers have the same size as type int. Since int is the default type specifier
in older C, some programmers omit the return type when defining a function. The return type then defaults to int,
which can later be interpreted as a pointer. This has proven to be a dangerous practice on some
computers, so programmers should define explicit return types for functions.
Pointers to Pointers
A variable which contains address of a pointer variable is called pointer-to-pointer.
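The following short sketch shows two levels of indirection. It also illustrates a common reason to need them: a function can change the caller's pointer only if it receives the address of that pointer. Both function names are hypothetical.

```c
/* Sets i to value through two levels of indirection:
   ppi holds the address of pi, and pi holds the address of i. */
int set_via_pointer_to_pointer(int value) {
    int i = 0;
    int *pi = &i;      /* pointer to int */
    int **ppi = &pi;   /* pointer to pointer to int */
    **ppi = value;     /* same effect as *pi = value or i = value */
    return i;
}

/* Makes *pp point to target; a function needs an int ** to change
   the caller's pointer variable. */
void redirect(int **pp, int *target) {
    *pp = target;
}
```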
1. malloc( ):
The function malloc allocates a user-specified amount of memory and returns a pointer to the start of
the allocated memory.
If there is insufficient memory to make the allocation, the returned value is NULL.
Syntax:
data_type *x;
x= (data_type *) malloc(size);
Where, x is a pointer to data_type and size is the number of bytes to allocate.
2. calloc( ):
The function calloc allocates a user-specified amount of memory, initializes the allocated
memory to 0, and returns a pointer to the start of the allocated memory.
If there is insufficient memory to make the allocation, the returned value is NULL.
Syntax:
data_type *x;
x= (data_type *) calloc(n, size);
Where, n is the number of elements and size is the size of each element in bytes.
Ex: int *x;
x= calloc (10, sizeof(int));
The above example defines a one-dimensional array of integers. The capacity of this
array is n = 10, and x[0] through x[9] are initially 0.
Macro CALLOC
#define CALLOC(p, n, s) \
if (!((p) = calloc(n, s))) \
{ \
fprintf(stderr, "Insufficient memory"); \
exit(EXIT_FAILURE); \
}
3. realloc( ):
Before using the realloc( ) function, the memory should have been allocated using the malloc( )
or calloc( ) functions.
The function realloc( ) resizes memory previously allocated by either malloc or calloc; that is,
the allocated memory is either extended or shrunk.
If the existing allocated memory can be extended in place, the pointer value does not change.
If the existing allocated memory cannot be extended, the function allocates a new block,
copies the contents of the existing memory block into the new block, and then frees the
old memory block.
When realloc is able to do the resizing, it returns a pointer to the start of the new block. When
it is unable to do the resizing, the old block is unchanged and the function returns the
value NULL.
Syntax:
data_type *x;
x= (data_type *) realloc(p, s );
The size of the memory block pointed at by p changes to s bytes. If s is larger than the old size, the block is
extended by (s - old size) bytes; if s is smaller, the last (old size - s) bytes of the old block are freed.
Macro REALLOC
#define REALLOC(p, s) \
if (!((p) = realloc(p, s))) \
{ \
fprintf(stderr, "Insufficient memory"); \
exit(EXIT_FAILURE); \
}
4. free( )
Memory allocated dynamically with either malloc( ) or calloc( ) is not released automatically.
The programmer must call free( ) explicitly to release the space.
Syntax:
free(ptr);
This statement causes the space in memory pointed to by ptr to be deallocated.
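The following sketch combines realloc and free to grow an array as values arrive. The function name append, the starting capacity of 4 and the doubling policy are assumptions of this sketch. Unlike the REALLOC macro, it first assigns the result of realloc to a temporary pointer, so the original block is not lost if resizing fails.

```c
#include <stdlib.h>

/* Appends value to a dynamically allocated array, doubling its capacity
   with realloc when it is full. Returns 0 on success, -1 if memory runs out;
   on failure the old block is left unchanged and still valid. */
int append(int **arr, int *size, int *capacity, int value) {
    if (*size == *capacity) {
        int newcap = (*capacity == 0) ? 4 : 2 * *capacity;
        int *tmp = realloc(*arr, newcap * sizeof(int));
        if (tmp == NULL)
            return -1;            /* *arr still points to the old block */
        *arr = tmp;
        *capacity = newcap;
    }
    (*arr)[(*size)++] = value;
    return 0;
}
```

The array can start as NULL, because passing a null pointer to realloc behaves like malloc. The caller releases the array with free(arr) once it is no longer needed.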
Linear Array
A linear array is a list of a finite number ‘n’ of homogeneous data elements such that
a. The elements of the array are referenced respectively by an index set consisting of n
consecutive numbers.
b. The elements of the array are stored respectively in successive memory locations.
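The number n of elements is called the length or size of the array. The length, or the number
of elements of the array, can be obtained from the index set by the formula
Length = UB - LB + 1
where UB is the largest index, called the upper bound, and LB is the smallest index, called the
lower bound.
When LB = 1, Length = UB.
When LB = 0 (as in C), Length = UB + 1.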
Let LA be a linear array in the memory of the computer. The memory of the computer is
simply a sequence of addressed locations, for example 1000, 1001, 1002, 1003, 1004, ...
Using the base address of LA, the computer calculates the address of any element of LA by
the formula
LOC(LA[K]) = Base(LA) + w(K - LB)
where w is the number of words per memory cell for the array LA.
While writing computer programs, if we find ourselves in a situation where we cannot determine
how large an array to use, a good solution is to defer this decision to run
time and allocate the array when we have a good estimate of the required array size.
Example:
int i, n, *list;
printf("Enter the number of numbers to generate: ");
scanf("%d", &n);
if (n < 1)
{
fprintf(stderr, "Improper value of n\n");
exit(EXIT_FAILURE);
}
MALLOC(list, n * sizeof(int));
The program fails only when n < 1 or when there is insufficient memory to hold the list of numbers
to be sorted.
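A self-contained sketch of the same idea follows. The MALLOC macro used above is not defined in this section, so the sketch calls malloc directly. It also returns NULL instead of exiting, which leaves the decision to the caller. The function name make_list and the sample fill values are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates room for n ints at run time, fills it with 0..n-1, and
   returns it; returns NULL (after printing a message) if n < 1 or the
   allocation fails. The caller must free the result. */
int *make_list(int n) {
    if (n < 1) {
        fprintf(stderr, "Improper value of n\n");
        return NULL;
    }
    int *list = malloc(n * sizeof(int));
    if (list == NULL) {
        fprintf(stderr, "Insufficient memory\n");
        return NULL;
    }
    for (int i = 0; i < n; i++)
        list[i] = i;
    return list;
}
```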
Two-Dimensional Arrays
C uses the array-of-arrays representation to represent a multidimensional array. A
two-dimensional array is represented as a one-dimensional array in which each element is itself a
one-dimensional array.
Array-of-arrays representation
int **myArray;
myArray = make2dArray(5,10);
myArray[2][4]=6;
The second line allocates memory for a 5 by 10 two-dimensional array of integers and the
third line assigns the value 6 to the [2][4] element of this array.
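The function make2dArray is used above but not defined in this section. The following is a minimal sketch of the array-of-arrays representation. It assumes that make2dArray(rows, cols) returns an int ** and uses plain malloc in place of the MALLOC macro. free2dArray is a hypothetical companion function.

```c
#include <stdlib.h>

/* Array-of-arrays representation: x points to an array of rows pointers,
   and each x[i] points to a separately allocated row of cols ints.
   Returns NULL if any allocation fails (rows already allocated are freed). */
int **make2dArray(int rows, int cols) {
    int **x = malloc(rows * sizeof(*x));
    if (x == NULL)
        return NULL;
    for (int i = 0; i < rows; i++) {
        x[i] = malloc(cols * sizeof(**x));
        if (x[i] == NULL) {
            while (--i >= 0)
                free(x[i]);
            free(x);
            return NULL;
        }
    }
    return x;
}

/* Releases every row, then the array of row pointers. */
void free2dArray(int **x, int rows) {
    for (int i = 0; i < rows; i++)
        free(x[i]);
    free(x);
}
```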
ARRAY OPERATIONS
1. Traversing
Let A be a collection of data elements stored in the memory of the computer. Printing
the contents of each element of A, or counting the
number of elements of A with a given property, can be accomplished by traversing A.
Traversing is accessing and processing each element in the array exactly once.
Here LA is a linear array with the lower bound LB and upper bound UB. This algorithm
traverses LA, applying an operation PROCESS to each element of LA, using a while loop.
1. [Initialize Counter] set K:= LB
2. Repeat step 3 and 4 while K ≤ UB
3. [Visit element] Apply PROCESS to LA [K]
4. [Increase counter] Set K:= K + 1
[End of step 2 loop]
5. Exit
Here LA is a linear array with the lower bound LB and upper bound UB. This algorithm
traverses LA, applying an operation PROCESS to each element of LA, using a repeat-for loop.
1. Repeat for K = LB to UB
Apply PROCESS to LA [K]
[End of loop]
2. Exit.
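The repeat-for traversal maps naturally onto a C for loop. In this sketch, PROCESS is passed as a function pointer, and double_it is a hypothetical example operation. Both names are assumptions and are not part of the algorithm above.

```c
/* Applies the operation process to every element LA[LB..UB] exactly once,
   mirroring the repeat-for loop above. C arrays start at 0, so LB is
   usually 0 here. */
void traverse(int LA[], int LB, int UB, void (*process)(int *)) {
    for (int K = LB; K <= UB; K++)
        process(&LA[K]);   /* [Visit element] */
}

/* Example PROCESS: doubles an element in place. */
void double_it(int *x) { *x *= 2; }
```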
Example:
Consider the array AUTO which records the number of automobiles sold each year from 1932
through 1984.
To find the number NUM of years during which more than 300 automobiles were sold,
involves traversing AUTO.
1. [Initialization step.] Set NUM := 0
2. Repeat for K = 1932 to 1984:
If AUTO [K] > 300, then: Set NUM: = NUM + 1.
[End of loop.]
3. Return.
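The AUTO example can be sketched in C by shifting the year into a 0-based index. Several names here are assumptions of this sketch: the array name auto_sold (C reserves the word auto as a keyword), the threshold parameter, and the macros FIRST_YEAR and LAST_YEAR.

```c
#define FIRST_YEAR 1932
#define LAST_YEAR  1984

/* auto_sold[K - FIRST_YEAR] holds the number of automobiles sold in year K,
   because C arrays are indexed from 0 rather than from 1932. */
int count_years_over(const int auto_sold[], int threshold) {
    int num = 0;                                   /* 1. Set NUM := 0 */
    for (int K = FIRST_YEAR; K <= LAST_YEAR; K++)  /* 2. Repeat for K */
        if (auto_sold[K - FIRST_YEAR] > threshold)
            num++;
    return num;                                    /* 3. Return */
}
```

Calling count_years_over(auto_sold, 300) corresponds to finding NUM in the algorithm above.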
2. Inserting
Let A be a collection of data elements stored in the memory of the computer.
Inserting refers to the operation of adding another element to the collection A.
Inserting an element at the “end” of the linear array can be done easily, provided the memory
space allocated for the array is large enough to accommodate the additional element.
To insert an element in the middle of the array, on average half of the elements must
be moved downward to new locations to accommodate the new element and keep the order
of the other elements.
Algorithm:
INSERT (LA, N, K, ITEM)
Here LA is a linear array with N elements and K is a positive integer such that K ≤ N. This
algorithm inserts an element ITEM into the Kth position in LA.
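The steps of INSERT are not listed here. The following minimal 0-based C sketch assumes the array has room for at least one more element. It moves elements from the end toward position k before storing the new item. insert_at is a hypothetical name.

```c
/* Inserts item at index k (0 <= k <= *n) of la, which holds *n elements and
   must have space for one more. Elements la[k..*n-1] are moved one place
   toward the end, starting from the last one so nothing is overwritten. */
void insert_at(int la[], int *n, int k, int item) {
    for (int j = *n - 1; j >= k; j--)
        la[j + 1] = la[j];     /* move element downward */
    la[k] = item;              /* insert item */
    (*n)++;                    /* one more element */
}
```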
3. Deleting
Deleting refers to the operation of removing one element from the collection A.
Deleting an element at the “end” of the linear array can be done without difficulty.
If an element in the middle of the array needs to be deleted, then each subsequent
element must be moved one location upward to fill up the array.
Algorithm
DELETE (LA, N, K, ITEM)
Here LA is a linear array with N elements and K is a positive integer such that K ≤ N. This
algorithm deletes the Kth element from LA.
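Deletion can be sketched in the same 0-based style. The following minimal sketch, under the hypothetical name delete_at, saves the kth element in item and shifts the later elements one place toward the front.

```c
/* Deletes the element at index k (0 <= k < *n), stores it in *item, and
   moves each subsequent element one location upward. */
void delete_at(int la[], int *n, int k, int *item) {
    *item = la[k];
    for (int j = k; j < *n - 1; j++)
        la[j] = la[j + 1];
    (*n)--;
}
```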
4. Sorting
Sorting refers to the operation of rearranging the elements of a list, where the list is a set of n
elements. The elements are arranged in increasing or decreasing order.
Ex: Suppose A is a list of n numbers. Sorting A refers to the operation of rearranging the
elements of A so that they are in increasing order, i.e., so that
A[1] ≤ A[2] ≤ A[3] ≤ ... ≤ A[N]
Bubble Sort
Suppose the list of numbers A[1], A[2], ..., A[N] is in memory. The bubble sort algorithm
works as follows:
Example:
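The following minimal C sketch of bubble sort works on a 0-based int array, under the hypothetical name bubble_sort. It compares adjacent elements and swaps them when they are out of order. After each pass, the largest remaining element has moved to its final position at the end of the unsorted part.

```c
/* Bubble sort: on pass k, adjacent pairs a[j], a[j+1] are compared and
   swapped when out of order, so the largest remaining element "bubbles"
   to position n-k. After n-1 passes the array is in increasing order. */
void bubble_sort(int a[], int n) {
    for (int pass = 1; pass < n; pass++) {
        for (int j = 0; j < n - pass; j++) {
            if (a[j] > a[j + 1]) {
                int temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}
```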
5. Searching
Let DATA be a collection of data elements in memory, and suppose a specific ITEM of
information is given. Searching refers to the operation of finding the location LOC of ITEM
in DATA, or printing some message that ITEM does not appear there.
The search is said to be successful if ITEM does appear in DATA and unsuccessful otherwise.
Linear Search
Suppose DATA is a linear array with n elements. Given no other information about DATA, the
way to search for a given ITEM in DATA is to compare ITEM with each element of DATA one by
one. That is, first test whether DATA[1] = ITEM, then test whether DATA[2] = ITEM, and so
on. This method, which traverses DATA sequentially to locate ITEM, is called linear search or
sequential search.
Average Case: The average number of comparisons required to find the location of ITEM is
approximately equal to half the number of elements in the array.
$f(n) = \frac{n+1}{2}$
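The following minimal 0-based C sketch of linear search uses the hypothetical name linear_search. It returns the index of the first match, or -1 instead of printing a message when ITEM is absent.

```c
/* Linear (sequential) search: compares item with data[0], data[1], ...
   in turn. Returns the index LOC of item, or -1 if item does not appear. */
int linear_search(const int data[], int n, int item) {
    for (int loc = 0; loc < n; loc++)
        if (data[loc] == item)
            return loc;    /* successful search */
    return -1;             /* unsuccessful search */
}
```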
Binary Search
Suppose DATA is an array which is sorted in increasing numerical order or, equivalently,
alphabetically. Then there is an extremely efficient searching algorithm, called binary search,
which can be used to find the location LOC of a given ITEM of information in DATA.
The running time of binary search in the worst case is approximately equal to $\log_2 n$. One can also
show that the running time for the average case is approximately equal to the running time for
the worst case.
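A minimal C sketch of iterative binary search on a 0-based int array follows. The variable names beg, mid and end, and the -1 result for an unsuccessful search, are choices made for this sketch.

```c
/* Binary search on data[0..n-1] sorted in increasing order.
   Each comparison halves the segment [beg, end], so roughly log2 n
   comparisons are needed. Returns the index, or -1 if item is absent. */
int binary_search(const int data[], int n, int item) {
    int beg = 0, end = n - 1;
    while (beg <= end) {
        int mid = beg + (end - beg) / 2;   /* avoids overflow of beg + end */
        if (data[mid] == item)
            return mid;
        else if (item < data[mid])
            end = mid - 1;                 /* search the left half */
        else
            beg = mid + 1;                 /* search the right half */
    }
    return -1;
}
```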
MULTIDIMENSIONAL ARRAY
Two-Dimensional Arrays
The element of A with first subscript J and second subscript K will be denoted by
$A_{J,K}$ or A[J, K].
For a linear array LA with lower bound 1, the computer finds the address of LA[K] in time independent of K using the formula
LOC(LA[K]) = Base(LA) + w(K - 1)
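Similarly, for an M × N array A, the computer keeps track of Base(A), the address of the first element A[1, 1] of A, and
computes the address LOC(A[J, K]) of A[J, K] using the formula
LOC(A[J, K]) = Base(A) + w[N(J - 1) + (K - 1)] (row-major order)
or
LOC(A[J, K]) = Base(A) + w[M(K - 1) + (J - 1)] (column-major order)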
The element of an n-dimensional array B with subscripts K1, K2, ..., Kn will be denoted by B[K1, K2, ..., Kn].
The programming language will store the array B either in row-major order or in column-
major order.
Let C be such an n-dimensional array. The index set for each dimension of C consists of the
consecutive integers from the lower bound to the upper bound of the dimension. The length Li
of dimension i of C is the number of elements in the index set, and Li can be calculated, as
Li = upper bound - lower bound + 1
For a given subscript Ki, the effective index Ei of Ki is the number of indices preceding Ki in
the index set, and Ei can be calculated from
Ei = Ki - lower bound
Then the address LOC(C[K1, K2, ..., KN]) of an arbitrary element of C can be obtained from the
formula
$\mathrm{Base}(C) + w\,[(\cdots((E_N L_{N-1} + E_{N-1})L_{N-2} + E_{N-2})L_{N-3} + \cdots + E_2)L_1 + E_1]$ (column-major order)
or from the formula
$\mathrm{Base}(C) + w\,[(\cdots((E_1 L_2 + E_2)L_3 + E_3)L_4 + \cdots + E_{N-1})L_N + E_N]$ (row-major order)
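The row-major formula can be evaluated with a simple loop, one dimension at a time. This sketch uses the hypothetical name row_major_offset and returns the bracketed factor, that is, the offset in elements. Multiplying the offset by w = sizeof(element) and adding Base(C) gives the address.

```c
/* Row-major offset of C[K1, ..., Kn] in elements, computed from the formula
   ((E1*L2 + E2)*L3 + E3) ... *Ln + En,
   where Ei = Ki - lower[i] and Li = upper[i] - lower[i] + 1. */
long row_major_offset(int n, const int k[], const int lower[], const int upper[]) {
    long offset = 0;
    for (int i = 0; i < n; i++) {
        long L = upper[i] - lower[i] + 1;   /* length of dimension i */
        long E = k[i] - lower[i];           /* effective index */
        offset = offset * L + E;
    }
    return offset;
}
```

C stores arrays in row-major order with lower bound 0. For a C array int arr[3][4][5], the element arr[1][2][3] therefore has offset (1*4 + 2)*5 + 3 = 33.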
POLYNOMIALS
What is a polynomial?
“A polynomial is a sum of terms, where each term has the form $ax^e$, where x is the variable, a is
the coefficient and e is the exponent.”
The largest (or leading) exponent of a polynomial is called its degree. Coefficients that are
zero are not displayed. The term with exponent equal to zero does not show the variable since
x raised to a power of zero is 1.
Polynomial Representation
One way to represent polynomials in C is to use typedef to create the type polynomial as
below:
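The typedef is not reproduced in these notes. The following minimal sketch is consistent with the fields a.degree and a.coef used below. It assumes MAX_DEGREE = 101, which is an arbitrary choice. It also includes a hypothetical helper, peval, that evaluates a polynomial stored this way by Horner's rule.

```c
#define MAX_DEGREE 101   /* assumed: largest degree allowed + 1 */

typedef struct {
    int degree;
    float coef[MAX_DEGREE];   /* coef[i] is the coefficient of x^(degree - i) */
} polynomial;

/* Evaluates the polynomial at x with Horner's rule, walking coef[0..degree]
   from the leading coefficient down to the constant term. */
float peval(const polynomial *a, float x) {
    float result = 0;
    for (int i = 0; i <= a->degree; i++)
        result = result * x + a->coef[i];
    return result;
}
```

For A(x) = 3x^2 + 2x + 1, degree is 2 and coef holds 3, 2, 1, so peval returns 17 at x = 2.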
Now if a is a variable of type polynomial and n < MAX_DEGREE, the polynomial
$A(x) = \sum_{i=0}^{n} a_i x^i$ would be represented as:
a.degree = n
a.coef[i] = $a_{n-i}$, 0 ≤ i ≤ n
In this representation, the coefficients are stored in order of decreasing exponents, such that
a.coef[i] is the coefficient of $x^{n-i}$, provided a term with exponent n-i exists;
otherwise, a.coef[i] = 0. Although this representation leads to very simple algorithms for most of the
operations, it wastes a lot of space.
To preserve space, an alternate representation uses only one global array, terms, to store
all polynomials.
The C declarations needed are:
The above figure shows how these polynomials are stored in the array terms. The index
of the first term of A and B is given by startA and startB, while finishA and finishB
give the index of the last term of A and B.
The index of the next free location in the array is given by avail.
For the above example, startA = 0, finishA = 1, startB = 2, finishB = 5, and avail = 6.
Polynomial Addition
A C function, padd( ), adds two polynomials A and B to obtain D = A + B.
To produce D(x), padd( ) adds A(x) and B(x) term by term. Starting at
position avail, attach( ) places the terms of D into the array terms.
If there is not enough space in terms to accommodate D, an error message is printed to
the standard error device and the program exits with an error condition.
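padd( ) relies on the global array terms, the counter avail, the COMPARE macro and the function attach( ), none of which appear in this section. The following is a minimal sketch of these supporting pieces. MAX_TERMS = 100 and the error message text are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TERMS 100   /* assumed size of the terms array */

/* Returns -1, 0 or 1 as x is less than, equal to or greater than y. */
#define COMPARE(x, y) (((x) < (y)) ? -1 : ((x) == (y)) ? 0 : 1)

typedef struct {
    float coef;
    int expon;
} term;

term terms[MAX_TERMS];   /* one global array holds all polynomials */
int avail = 0;           /* index of the next free position in terms */

/* Adds a new term with the given coefficient and exponent at position
   avail; prints an error and exits if terms is already full. */
void attach(float coefficient, int exponent) {
    if (avail >= MAX_TERMS) {
        fprintf(stderr, "Too many terms in the polynomial\n");
        exit(EXIT_FAILURE);
    }
    terms[avail].coef = coefficient;
    terms[avail++].expon = exponent;
}
```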
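void padd(int startA, int finishA, int startB, int finishB, int *startD, int *finishD)
{ /* add A(x) and B(x) to obtain D(x) */
float coefficient;
*startD = avail;
while (startA <= finishA && startB <= finishB)
switch (COMPARE(terms[startA].expon, terms[startB].expon))
{
case -1: /* a expon < b expon */
attach(terms[startB].coef, terms[startB].expon);
startB++;
break;
case 0: /* equal exponents */
coefficient = terms[startA].coef + terms[startB].coef;
if (coefficient)
attach(coefficient, terms[startA].expon);
startA++;
startB++;
break;
case 1: /* a expon > b expon */
attach(terms[startA].coef, terms[startA].expon);
startA++;
}
/* add in the remaining terms of A(x) */
for (; startA <= finishA; startA++)
attach(terms[startA].coef, terms[startA].expon);
/* add in the remaining terms of B(x) */
for (; startB <= finishB; startB++)
attach(terms[startB].coef, terms[startB].expon);
*finishD = avail - 1;
}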
Analysis of padd( ):
The numbers of nonzero terms in A and B are the most important factors in analyzing the time
complexity.
Let m and n be the numbers of nonzero terms in A and B. If m > 0 and n > 0, the while loop is
entered. Each iteration of the loop requires O(1) time. At each iteration, the value of startA or
startB or both is incremented. The iteration terminates when either startA exceeds finishA or
startB exceeds finishB, so the number of iterations is bounded by m + n - 1.
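This bound is reached, for example, when $A(x) = \sum_{i=0}^{n} x^{2i}$ and $B(x) = \sum_{i=0}^{n} x^{2i+1}$, because their exponents interleave and never coincide.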
The time for the remaining two for loops is bounded by O(n + m), because the first loop cannot iterate
more than m times and the second more than n times. So, the asymptotic
computing time of this algorithm is O(n + m).
SPARSE MATRICES
A matrix contains m rows and n columns of elements, as illustrated in the figures below. In these figures,
the elements are numbers. The first matrix has five rows and three columns, and the second has six
rows and six columns. We write m x n (read "m by n") to designate a matrix with m rows and n
columns. The total number of elements in such a matrix is mn. If m equals n, the matrix is
square.
Important Note:
A sparse matrix can be represented using a 1-dimensional, 2-dimensional or 3-dimensional array.
When a sparse matrix is represented as a two-dimensional array, as shown in
Figure B, much of the space is wasted on zero entries.
Example: consider the space requirements necessary to store a 1000 x 1000 matrix that has only
2000 non-zero elements. The corresponding two-dimensional array requires space for 1,000,000
elements. A better choice is a representation in which only the nonzero elements are
stored.
The figure below shows the representation of the matrix in the array a: a[0].row contains the
number of rows, a[0].col contains the number of columns and a[0].value contains the total
number of nonzero entries.
Positions 1 through 8 store the triples representing the nonzero entries. The row index is in
the field row, the column index is in the field col, and the value is in the field value. The
triples are ordered by row and, within rows, by columns.
a[0] 6 6 8 b[0] 6 6 8
[1] 0 0 15 [1] 0 0 15
[2] 0 3 22 [2] 0 4 91
[3] 0 5 -15 [3] 1 1 11
[4] 1 1 11 [4] 2 1 3
[5] 1 2 3 [5] 2 5 28
[6] 2 3 -6 [6] 3 0 22
[7] 4 0 91 [7] 3 2 -6
[8] 5 2 28 [8] 5 0 -15
Fig (a): Sparse matrix stored as triple Fig (b): Transpose matrix stored as triple
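The triple layout of Fig (a) can be produced from an ordinary two-dimensional array with a short loop. The sketch below is one possible way to do it, assuming the field names row, col and value used in the notes; MAX_TERMS and the helper toTriples are hypothetical, and the dense matrix is passed in row-major order.

```c
#define MAX_TERMS 101 /* maximum number of nonzero terms + 1 (assumed) */

typedef struct {
    int row;
    int col;
    int value;
} term;

/* Hypothetical helper: store the nonzero entries of a rows x cols matrix m
   (row-major) as triples in a[1..], ordered by row and, within a row, by column.
   a[0] records the row count, the column count and the number of nonzero terms.
   Returns that number, or -1 if more than MAX_TERMS - 1 entries are nonzero. */
int toTriples(int rows, int cols, const int *m, term a[])
{
    int k = 1; /* next free position in a */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            if (m[i * cols + j] != 0) {
                if (k >= MAX_TERMS)
                    return -1;
                a[k].row = i;
                a[k].col = j;
                a[k].value = m[i * cols + j];
                k++;
            }
    a[0].row = rows;
    a[0].col = cols;
    a[0].value = k - 1;
    return k - 1;
}
```

Applied to the 6 x 6 matrix whose nonzero entries are listed in Fig (a), this produces the same eight triples in the same order.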
Transposing a Matrix
To transpose a matrix, interchange the rows and columns. This means that each element
a[i][j] in the original matrix becomes element a[j][i] in the transpose matrix.
If we process the original matrix by the row indices, it is difficult to know exactly where to
place element <j, i, value> in the transpose matrix until we have processed all the elements that
precede it.
This can be avoided by using the column indices to determine the placement of elements in
the transpose matrix. This suggests the following algorithm:
The columns within each row of the transpose matrix will be arranged in ascending order.
void transpose(term a[], term b[])
{ /* b is set to the transpose of a */
  int n, i, j, currentb;
  n = a[0].value;      /* total number of elements */
  b[0].row = a[0].col; /* rows in b = columns in a */
  b[0].col = a[0].row; /* columns in b = rows in a */
  b[0].value = n;
  if (n > 0)
  { /* nonzero matrix */
    currentb = 1;
    for (i = 0; i < a[0].col; i++)  /* transpose by the columns in a */
      for (j = 1; j <= n; j++)      /* find elements from the current column */
        if (a[j].col == i)
        { /* element is in the current column, add it to b */
          b[currentb].row = a[j].col;
          b[currentb].col = a[j].row;
          b[currentb].value = a[j].value;
          currentb++;
        }
  }
}
Transpose of a sparse matrix
STRING
BASIC TERMINOLOGY:
Each programming language contains a character set that is used to communicate with the
computer. The character set includes the following:
Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Digits: 0123456789
Special characters: + - / * ( ) , . $ = ' _ (blank space)
Concatenation: Let S1 and S2 be strings. The string consisting of the characters of S1
followed by the characters of S2 is called the concatenation of S1 and S2, written S1 // S2.
Ex: ‘THE’ // ‘END’ = ‘THEEND’
‘THE’ // ‘ ’ // ‘END’ = ‘THE END’
Substring: A string Y is called a substring of a string S if there exist strings X and Z such that
S = X // Y // Z
If X is an empty string, then Y is called an initial substring of S; if Z is an empty string, then
Y is called a terminal substring of S.
Ex: ‘BE OR NOT’ is a substring of ‘TO BE OR NOT TO BE’
‘THE’ is an initial substring of ‘THE END’
STRINGS IN C
In C, the strings are represented as character arrays terminated with the null character \0.
Declaration 1:
#define MAX_SIZE 100 /* maximum size of string */
char s[MAX_SIZE] = {"dog"};
char t[MAX_SIZE] = {"house"};
s[0] s[1] s[2] s[3] t[0] t[1] t[2] t[3] t[4] t[5]
d o g \0 h o u s e \0
The above figure shows how these strings would be represented internally in memory.
Declaration 2:
char s[] = {"dog"};
char t[] = {"house"};
Using these declarations, the C compiler allocates just enough space to hold each word,
including the null character.
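The difference between the two declarations can be checked with sizeof. The sketch below restates both styles under hypothetical names (fixedDog for Declaration 1, exactDog for Declaration 2) and adds a hypothetical helper, unusedBytes, that reports how much of the reserved array the string leaves unused.

```c
#include <string.h>

#define MAX_SIZE 100 /* maximum size of string */

char fixedDog[MAX_SIZE] = "dog"; /* Declaration 1 style: MAX_SIZE bytes reserved */
char exactDog[] = "dog";         /* Declaration 2 style: 3 characters + '\0' = 4 bytes */

/* Bytes reserved but not used by a string stored in an array of n bytes. */
size_t unusedBytes(const char *str, size_t n)
{
    return n - (strlen(str) + 1); /* +1 for the null character */
}
```

With Declaration 1, 96 of the 100 bytes stay unused for "dog"; with Declaration 2 none do.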
STORING STRINGS
Example: Suppose the input consists of a program. Using a record-oriented, fixed-length
storage medium, the input data will appear in memory as pictured below.
Suppose a new record needs to be inserted; then all succeeding records must be moved
to new memory locations. This disadvantage can be easily remedied as shown in the figure below.
That is, one can use a linear array POINT which gives the address of each successive record, so
that the records need not be stored in consecutive locations in memory. Inserting a new record
will require only an updating of the array POINT.
Example:
Another method stores the strings one after another, separated by some marker, such as
two dollar signs ($$), or uses a pointer giving the location of each string.
These ways of storing strings save space and are sometimes used in secondary memory
when records are relatively permanent and require few changes.
These methods of storage are usually inefficient when the strings and their lengths
are frequently being changed.
Linked Storage
In most extensive word-processing applications, strings are stored by means of linked
lists.
A one-way linked list is a linearly ordered sequence of memory cells, called nodes,
where each node contains an item, called a link, which points to the next node in the
list, i.e., which contains the address of the next node.
Ex: TO BE OR NOT TO BE
Constants
Many programming languages denote string constants by placing the string in either single
or double quotation marks.
Ex: ‘THE END’
“THE BEGINNING”
These are string constants of length 7 and 13 characters, respectively.
Variables
Each programming language has its own rules for forming character variables. These
variables fall into one of three categories:
1. Static: A static character variable has a length that is defined before the program is
executed and cannot change throughout the program.
2. Semi-static: The length of the variable may vary during the execution of the program
as long as the length does not exceed a maximum value determined by the program
before the program is executed.
3. Dynamic: The length of the variable can change during the execution of the program.
STRING OPERATIONS
Substring
Accessing a substring from a given string requires three pieces of information:
(1) The name of the string or the string itself
(2) The position of the first character of the substring in the given string
(3) The length of the substring or the position of the last character of the substring.
We write SUBSTRING(S, K, L) to denote the substring of a string S beginning in position K and having length L.
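A minimal C sketch of SUBSTRING(S, K, L), assuming 1-based positions as in the notes. The helper name substring and the convention of returning NULL for an out-of-range request are hypothetical choices; C itself has no built-in substring function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for SUBSTRING(S, K, L): copies the len characters of s that
   start at 1-based position k into out and returns out. Returns NULL when the
   requested substring does not lie inside s. out must hold at least len + 1 chars. */
char *substring(const char *s, int k, int len, char *out)
{
    int n = (int)strlen(s);
    if (k < 1 || len < 0 || k - 1 + len > n)
        return NULL;
    memcpy(out, s + k - 1, (size_t)len); /* copy characters k .. k + len - 1 */
    out[len] = '\0';
    return out;
}
```

For example, SUBSTRING(‘TO BE OR NOT TO BE’, 4, 9) yields ‘BE OR NOT’.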
Indexing
Indexing, also called pattern matching, refers to finding the position where a string pattern P
first appears in a given string text T. This operation is called INDEX and is written INDEX(text, pattern).
If the pattern P does not appear in the text T, then INDEX is assigned the value 0.
The arguments “text” and “pattern” can be either string constants or string variables.
Concatenation
Let S1 and S2 be strings. The concatenation of S1 and S2, which is denoted by S1 // S2, is the string
consisting of the characters of S1 followed by the characters of S2.
Ex:
(a) Suppose S1 = ‘MARK’ and S2 = ‘TWAIN’; then
S1 // S2 = ‘MARKTWAIN’
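In C, S1 // S2 can be built with the standard strcpy and strcat functions from string.h. The wrapper concat below is a hypothetical helper for illustration; the caller must supply a destination array large enough for both strings plus the null character.

```c
#include <string.h>

/* Hypothetical helper: builds s1 // s2 in dest using strcpy and strcat.
   dest must have room for strlen(s1) + strlen(s2) + 1 characters. */
char *concat(char *dest, const char *s1, const char *s2)
{
    strcpy(dest, s1); /* copy S1, including its '\0' */
    strcat(dest, s2); /* append S2, overwriting that '\0' */
    return dest;
}
```

Calling concat(buf, "MARK", "TWAIN") leaves "MARKTWAIN" in buf; chaining a further strcat gives three-way concatenations such as ‘THE’ // ‘ ’ // ‘END’.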
Length
The number of characters in a string is called its length.
Syntax: LENGTH (string)
String length is determined in C using the strlen( ) function, as shown below:
X = strlen("sunrise");
strlen returns the value 7, which is assigned to the variable X.
Like strcat, strlen is declared in string.h, hence this header file must be included at the
time of pre-processing.
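To see what strlen actually counts, here is a hand-written equivalent. It is a sketch for illustration only (the name stringLength is hypothetical); in real programs the library strlen should be used.

```c
#include <stddef.h>

/* Hypothetical stringLength(): counts characters up to, but not including,
   the terminating '\0', which is the value strlen() returns. */
size_t stringLength(const char *s)
{
    const char *p = s;
    while (*p != '\0') /* walk to the null character */
        p++;
    return (size_t)(p - s);
}
```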
Pattern matching is the problem of deciding whether or not a given string pattern P appears in a
string text T. We assume that the length of P does not exceed the length of T.
Observation of algorithms
P is an r-character string and T is an s-character string.
The algorithm contains two loops, one inside the other. The outer loop runs through each
successive r-character substring WK = T[K] T[K + 1] ... T[K + r - 1] of T.
The inner loop compares P with WK, character by character. If any character does not match,
control transfers to Step 5, which increases K and then leads to the next substring of T.
If all the r characters of P match those of some WK, then P appears in T and K is the
INDEX of P in T.
If the outer loop completes all of its cycles, then P does not appear in T, and so INDEX
= 0.
Complexity
The complexity of this pattern matching algorithm is O(n²).
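The two-loop algorithm can be sketched in C as follows. The function name patternIndex is hypothetical; it returns the 1-based INDEX used in the notes, or 0 when P does not occur. C indices start at 0, so K in the notes corresponds to k + 1 here.

```c
#include <string.h>

/* Brute-force pattern matching: returns the 1-based INDEX of p in t,
   or 0 if p does not appear in t. */
int patternIndex(const char *t, const char *p)
{
    int r = (int)strlen(p), s = (int)strlen(t);
    for (int k = 0; k <= s - r; k++) {     /* outer loop: each substring W_K of length r */
        int l = 0;
        while (l < r && p[l] == t[k + l])  /* inner loop: compare P with W_K */
            l++;
        if (l == r)
            return k + 1;                  /* all r characters matched */
    }
    return 0;                              /* outer loop finished: P not in T */
}
```

In the worst case every one of the roughly s - r + 1 substrings is compared almost completely, which is where the quadratic bound comes from.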
This algorithm uses a table built from the pattern P; here the table is built for P = aaba.
The table is obtained as follows.
Let Qi denote the initial substring of P of length i; hence Q0 = Λ, Q1 = a, Q2 = a², Q3
= aab, Q4 = aaba = P. (Here Q0 = Λ is the empty string.)
The rows of the table are labeled by these initial substrings of P, excluding P itself.
The columns of the table are labeled a, b and x, where x represents any character that doesn't
appear in the pattern P.
Let f be the function determined by the table; i.e., let f(Qi, t) denote the entry in the table in
row Qi and column t (where t is any character). This entry f(Qi, t) is defined to be the largest
Q that appears as a terminal substring in the string (Qi t), the concatenation of Qi and t.
For example,
a² is the largest Q that is a terminal substring of Q2a = a³, so f(Q2, a) = Q2.
Λ is the largest Q that is a terminal substring of Q1b = ab, so f(Q1, b) = Q0.
a is the largest Q that is a terminal substring of Q0a = a, so f(Q0, a) = Q1.
Λ is the largest Q that is a terminal substring of Q3x = a²bx, so f(Q3, x) = Q0.
The remaining entries of the table are obtained in the same way.
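The table construction and the matching step can be sketched in C. This is an illustration under stated assumptions: the pattern uses only the characters a and b (column x covers everything else), entries store the index j of Qj rather than the string itself, and buildTable, tableIndex and MAX_PAT are hypothetical names. Matching runs through T once, moving from row to row with f; reaching row r means P has been found.

```c
#include <string.h>

#define MAX_PAT 16 /* longest pattern handled (assumed) */
#define COLS 3     /* columns a, b and x (any other character) */

/* Column of the table for character t; the pattern is assumed to use only 'a' and 'b'. */
static int column(char t) { return t == 'a' ? 0 : t == 'b' ? 1 : 2; }

/* f[i][c] = j, where Q_j is the largest initial substring of p that is a
   terminal substring of Q_i followed by the character of column c. */
void buildTable(const char *p, int f[][COLS])
{
    const char reps[COLS] = { 'a', 'b', '#' }; /* '#' stands for x */
    char buf[MAX_PAT + 1];
    int r = (int)strlen(p);
    for (int i = 0; i < r; i++)
        for (int c = 0; c < COLS; c++) {
            int j = i + 1;
            memcpy(buf, p, (size_t)i); /* buf holds Q_i t */
            buf[i] = reps[c];
            /* try the longest candidate first: is Q_j a terminal substring of buf? */
            while (j > 0 && memcmp(buf + (i + 1 - j), p, (size_t)j) != 0)
                j--;
            f[i][c] = j;
        }
}

/* INDEX of p in t (1-based) using the table, or 0 if p does not occur. */
int tableIndex(const char *t, const char *p)
{
    int f[MAX_PAT][COLS];
    int r = (int)strlen(p), state = 0; /* state i: Q_i has just been matched */
    if (r == 0 || r > MAX_PAT)
        return 0;
    buildTable(p, f);
    for (int k = 0; t[k] != '\0'; k++) {
        state = f[state][column(t[k])];
        if (state == r)
            return k - r + 2; /* convert 0-based end position to 1-based start */
    }
    return 0;
}
```

Each character of T is examined exactly once, so after the table is built the scan is linear in the length of T.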
STACKS
DEFINITION
“A stack is an ordered list in which insertions (pushes) and deletions (pops) are made at one
end called the top.”
Given a stack S= (a0, ... ,an-1), where a0 is the bottom element, an-1 is the top element, and ai is
on top of element ai-1, 0 < i < n.
As shown in the figure above, the elements are added to the stack in the order A, B, C, D, E; then
E is the first element that is deleted from the stack, and the last element deleted from the stack
is A. The figure illustrates this sequence of operations.
Since the last element inserted into a stack is the first element removed, a stack is also known
as a Last-In-First-Out (LIFO) list.
STACK OPERATIONS
The stack operations are implemented as follows.
1. Stack Create
Stack CreateS(maxStackSize) ::=
#define MAX_STACK_SIZE 100 /* maximum stack size */
typedef struct
{
int key;
/* other fields */
} element;
element stack[MAX_STACK_SIZE];
int top = -1;
The element which is used to insert or delete is specified as a structure that consists of only a
key field.
The IsEmpty and IsFull operations are simple and are implemented directly in the
push and pop functions. Each of these functions assumes that the variables stack and top are
global.
4. Push( )
Function push checks whether stack is full. If it is, it calls stackFull( ), which prints an error
message and terminates execution. When the stack is not full, increment top and assign item to
stack [top].
5. Pop( )
Deleting an element from the stack is called pop operation. The element is deleted only from
the top of the stack and only one element is deleted at a time.
element pop ( )
{ /*delete and return the top element from the stack */
if (top == -1)
return stackEmpty(); /*returns an error key */
return stack[top--];
}
6. stackFull( )
The stackFull function prints an error message and terminates execution.
void stackFull()
{
fprintf(stderr, "Stack is full, cannot add element");
exit(EXIT_FAILURE);
}
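The notes describe push in words only. A minimal sketch of the complete array-based stack, assuming the global stack and top from Stack Create, is shown below. The behaviour of stackEmpty (printing a message and returning an element whose key is -1) is a hypothetical choice, since the notes only say that it returns an error key.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_STACK_SIZE 100 /* maximum stack size */

typedef struct {
    int key; /* other fields */
} element;

element stack[MAX_STACK_SIZE];
int top = -1;

void stackFull(void)
{
    fprintf(stderr, "Stack is full, cannot add element\n");
    exit(EXIT_FAILURE);
}

element stackEmpty(void)
{ /* hypothetical error key: -1 */
    element err = { -1 };
    fprintf(stderr, "Stack is empty, cannot delete element\n");
    return err;
}

void push(element item)
{ /* add an item to the global stack */
    if (top >= MAX_STACK_SIZE - 1)
        stackFull();
    stack[++top] = item; /* increment top, then store the item */
}

element pop(void)
{ /* delete and return the top element from the stack */
    if (top == -1)
        return stackEmpty();
    return stack[top--];
}
```

Pushing keys 10 and 20 and then popping twice returns 20 first and 10 second, which is the LIFO behaviour described above.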
When an array is used to implement a stack, its bound (MAX_STACK_SIZE) must be known
at compile time, and it cannot be altered while the program runs. This limitation
can be overcome by using a dynamically allocated array for the elements and then increasing
the size of the array as needed.
4. push()
Here the test against MAX_STACK_SIZE is replaced with a test against capacity.
5. pop( )
In this function, no changes are made.
element pop ( )
{ /* delete and return the top element from the stack */
if (top == -1)
return stackEmpty(); /* returns an error key */
return stack[top--];
}
6. stackFull( )
The new code shown below attempts to increase the capacity of the array stack so that a new
element can be added to the stack. Before increasing the capacity of an array, decide what
the new capacity should be.
In array doubling, array capacity is doubled whenever it becomes necessary to increase the
capacity of an array.
void stackFull()
{
REALLOC (stack, 2*capacity*sizeof(*stack));
capacity *= 2;
}
Stack full with array doubling
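A self-contained sketch of the dynamically allocated stack with array doubling follows. MALLOC and REALLOC are written here as macros that exit on allocation failure; their exact definitions are an assumption, since they are not shown in this section. createS is a hypothetical initializer that starts with capacity 1, as the analysis below assumes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed helper macros: allocate (or reallocate) and exit on failure. */
#define MALLOC(p, s) \
    do { if (!((p) = malloc(s))) { fprintf(stderr, "Insufficient memory\n"); exit(EXIT_FAILURE); } } while (0)
#define REALLOC(p, s) \
    do { if (!((p) = realloc((p), (s)))) { fprintf(stderr, "Insufficient memory\n"); exit(EXIT_FAILURE); } } while (0)

typedef struct {
    int key; /* other fields */
} element;

element *stack;   /* dynamically allocated array */
int capacity = 1; /* current size of the array */
int top = -1;

void createS(void)
{ /* hypothetical initializer: start with capacity 1 */
    capacity = 1;
    top = -1;
    MALLOC(stack, sizeof(*stack));
}

void stackFull(void)
{ /* array doubling */
    REALLOC(stack, 2 * capacity * sizeof(*stack));
    capacity *= 2;
}

void push(element item)
{ /* same as before, but tests against capacity instead of MAX_STACK_SIZE */
    if (top >= capacity - 1)
        stackFull();
    stack[++top] = item;
}

element pop(void)
{ /* unchanged, except for the hypothetical error key -1 */
    if (top == -1) {
        element err = { -1 };
        return err;
    }
    return stack[top--];
}
```

Pushing ten elements starting from capacity 1 triggers doublings to 2, 4, 8 and 16, so the final capacity is 16.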
Analysis
In the worst case, the realloc function needs to allocate 2*capacity*sizeof(*stack) bytes of
memory and copy capacity*sizeof(*stack) bytes of memory from the old array into the
new one. Under the assumptions that memory may be allocated in O(1) time and that a stack
element can be copied in O(1) time, the time required by array doubling is O(capacity).
Initially, capacity is 1.
Suppose that all elements are pushed onto the stack and the final capacity is $2^k$ for some $k > 0$. Then
the total time spent over all array doublings is $O(\sum_{i=1}^{k} 2^i) = O(2^{k+1}) = O(2^k)$.
Since the total number of pushes is more than $2^{k-1}$, the total time spent in array doubling is
O(n), where n is the total number of pushes. Hence, even with the time spent on array
doubling added in, the total run time of push over all n pushes is O(n).
Expressions: A sequence of operators and operands that reduces to a single value after
evaluation is called an expression.
X = a / b - c + d * e - a * c
The above expression contains the operators +, -, / and * and the operands a, b, c, d and e.
Infix Expression: In this expression, the binary operator is placed in between its operands.
The expression can be parenthesized or unparenthesized.
Example: A + B
Here, A and B are operands and + is the operator.
Prefix or Polish Expression: In this expression, the operator appears before its operands.
Example: + A B
Here, A and B are operands and + is the operator.
Postfix or Reverse Polish Expression: In this expression, the operator appears after its
operands.
Example: A B +
Here, A and B are operands and + is the operator.
The usual interpretation of X is (((a / b) - c) + (d * e)) - (a * c), because division is carried out
before subtraction, and multiplication before addition. If we wanted a different interpretation, in which
b - c + d is evaluated first, we would write the expression differently, using parentheses to change the order of evaluation:
X = (a / (b - c + d)) * (e - a) * c
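The two readings really do give different values. The sketch below evaluates both with int operands, so / is C integer division; the helper names and the sample values a = 4, b = c = 2, d = e = 3 are hypothetical choices for illustration.

```c
/* Two readings of X = a / b - c + d * e - a * c (hypothetical helper names). */
int defaultOrder(int a, int b, int c, int d, int e)
{
    return a / b - c + d * e - a * c; /* C groups this as (((a / b) - c) + (d * e)) - (a * c) */
}

int parenthesizedOrder(int a, int b, int c, int d, int e)
{
    return (a / (b - c + d)) * (e - a) * c; /* b - c + d is evaluated first */
}
```

With a = 4, b = c = 2 and d = e = 3, the first reading gives 2 - 2 + 9 - 8 = 1, while the parenthesized one gives (4 / 3) * (-1) * 2 = -2 in integer arithmetic.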
In C, there is a precedence hierarchy that determines the order in which operators are
evaluated. The figure below contains the precedence hierarchy for C.
The operators are arranged from highest precedence to lowest. Operators with highest
precedence are evaluated first.
The associativity column indicates how to evaluate operators with the same precedence. For
example, the multiplicative operators have left-to-right associativity. This means that the
expression a * b / c % d / e is equivalent to ( ( ( ( a * b ) / c ) % d ) / e )
Parentheses are used to override precedence, and expressions are always evaluated from the
innermost parenthesized expression first
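Left-to-right associativity matters with integer operators, because regrouping can change intermediate results. The hypothetical helpers below compare the implicit grouping of a * b / c % d / e with the fully parenthesized form and with a grouping forced by parentheses.

```c
/* a * b / c % d / e versus explicitly grouped versions (hypothetical helpers). */
int implicitGrouping(int a, int b, int c, int d, int e)
{
    return a * b / c % d / e; /* left to right: ((((a * b) / c) % d) / e) */
}

int explicitGrouping(int a, int b, int c, int d, int e)
{
    return ((((a * b) / c) % d) / e);
}

int otherGrouping(int a, int b, int c, int d, int e)
{
    return a * (b / c) % d / e; /* parentheses force b / c to be evaluated first */
}
```

With a = 7, b = 5, c = 3, d = 10, e = 1, the first two both give 35 / 3 = 11, 11 % 10 = 1, 1 / 1 = 1, while the third gives 7 * 1 = 7, 7 % 10 = 7, 7 / 1 = 7.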
The analysis of the examples suggests a precedence-based scheme for stacking and
unstacking operators.
The left parenthesis complicates matters because it behaves like a low-precedence operator
when it is on the stack and a high-precedence one when it is not. It is placed in the stack
whenever it is found in the expression, but it is unstacked only when its matching right
parenthesis is found.
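There are two types of precedence: in-stack precedence (isp) and incoming precedence
(icp). An operator on the stack is unstacked while its isp is greater than or equal to the icp of the incoming token.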
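void postfix(void)
{ /* output the postfix of the expression; the expression string, the stack and top are global */
  char symbol;
  precedence token;
  int n = 0;
  top = 0; /* place eos on stack (uses the global top) */
  stack[0] = eos;
  for (token = getToken(&symbol, &n); token != eos; token = getToken(&symbol, &n))
  {
    if (token == operand)
      printf("%c", symbol);
    else if (token == rparen)
    { /* unstack tokens until the left parenthesis */
      while (stack[top] != lparen)
        printToken(pop());
      pop(); /* discard the left parenthesis */
    }
    else
    { /* remove and print symbols whose isp is greater than or equal to the current token's icp */
      while (isp[stack[top]] >= icp[token])
        printToken(pop());
      push(token);
    }
  }
  while ((token = pop()) != eos)
    printToken(token);
  printf("\n");
}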
Program: Function to convert from infix to postfix
Analysis of postfix: Let n be the number of tokens in the expression. Θ(n) time is spent extracting
tokens and outputting them. The time spent in the two while loops is also Θ(n), as the number of tokens
that get stacked and unstacked is linear in n. So, the complexity of function postfix is Θ(n).
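A self-contained version of the conversion is sketched below so it can be run and tested. Several pieces are assumptions because they are not shown in this section: the precedence enumeration, the isp/icp values (the usual textbook-style choice, where '(' has the lowest in-stack priority but the highest incoming priority), and getToken, which accepts single-character operands with no blanks. Instead of printing, this version writes the postfix string into a caller-supplied buffer.

```c
#include <string.h>

#define MAX_STACK_SIZE 100

typedef enum { lparen, rparen, plus, minus, times, divide, mod, eos, operand } precedence;

/* isp and icp indexed by precedence (assumed values) */
static const int isp[] = { 0, 19, 12, 12, 13, 13, 13, 0 };
static const int icp[] = { 20, 19, 12, 12, 13, 13, 13, 0 };

static precedence stack[MAX_STACK_SIZE];
static int top;
static const char *expr; /* infix expression being converted */
static char *out;        /* postfix result being built */
static int outLen;

static void push(precedence item) { stack[++top] = item; }
static precedence pop(void) { return stack[top--]; }

static precedence getToken(char *symbol, int *n)
{ /* get the next token; symbol receives its character */
    *symbol = expr[(*n)++];
    switch (*symbol) {
    case '(': return lparen;
    case ')': return rparen;
    case '+': return plus;
    case '-': return minus;
    case '*': return times;
    case '/': return divide;
    case '%': return mod;
    case '\0': return eos;
    default: return operand; /* single-character operands, no blanks */
    }
}

static void printToken(precedence token)
{ /* append the operator symbol for token to the output */
    out[outLen++] = "()+-*/%"[token];
}

void postfix(const char *infix, char *result)
{
    char symbol;
    precedence token;
    int n = 0;
    expr = infix;
    out = result;
    outLen = 0;
    top = 0;
    stack[0] = eos; /* place eos on stack */
    for (token = getToken(&symbol, &n); token != eos; token = getToken(&symbol, &n)) {
        if (token == operand)
            out[outLen++] = symbol;
        else if (token == rparen) {
            while (stack[top] != lparen) /* unstack until left parenthesis */
                printToken(pop());
            pop(); /* discard the left parenthesis */
        } else {
            while (isp[stack[top]] >= icp[token])
                printToken(pop());
            push(token);
        }
    }
    while ((token = pop()) != eos)
        printToken(token);
    out[outLen] = '\0';
}
```

For the expression a/b-c+d*e-a*c used earlier, this produces ab/c-de*+ac*-, and a*(b+c)*d becomes abc+*d*.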
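int eval(void)
{ /* evaluate a postfix expression held in a global string; '\0' ends the expression.
     The stack and top are global; operands are single-digit characters */
  precedence token;
  char symbol;
  int op1, op2;
  int n = 0; /* counter for the expression string */
  top = -1;  /* reset the global top (do not declare a local one) */
  token = getToken(&symbol, &n);
  while (token != eos)
  {
    if (token == operand)
      push(symbol - '0'); /* stack insert */
    else
    { /* pop two operands, perform the operation, and push the result */
      op2 = pop(); /* stack delete */
      op1 = pop();
      switch (token)
      {
        case plus: push(op1 + op2);
          break;
        case minus: push(op1 - op2);
          break;
        case times: push(op1 * op2);
          break;
        case divide: push(op1 / op2);
          break;
        case mod: push(op1 % op2);
      }
    }
    token = getToken(&symbol, &n);
  }
  return pop(); /* return result */
}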
Program: Function to evaluate a postfix expression
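A runnable sketch of postfix evaluation follows. As in the conversion example, the precedence enumeration and getToken are assumptions (single-digit operands, no blanks), and here the expression is passed as a parameter instead of being a global. Note that the right operand is popped first.

```c
#define MAX_STACK_SIZE 100

typedef enum { lparen, rparen, plus, minus, times, divide, mod, eos, operand } precedence;

static int stack[MAX_STACK_SIZE];
static int top;
static const char *expr; /* postfix expression being evaluated */

static void push(int item) { stack[++top] = item; }
static int pop(void) { return stack[top--]; }

static precedence getToken(char *symbol, int *n)
{ /* get the next token; symbol receives its character */
    *symbol = expr[(*n)++];
    switch (*symbol) {
    case '(': return lparen;
    case ')': return rparen;
    case '+': return plus;
    case '-': return minus;
    case '*': return times;
    case '/': return divide;
    case '%': return mod;
    case '\0': return eos;
    default: return operand; /* single-digit operands, no blanks */
    }
}

int eval(const char *postfixExpr)
{
    precedence token;
    char symbol;
    int op1, op2, n = 0;
    expr = postfixExpr;
    top = -1;
    token = getToken(&symbol, &n);
    while (token != eos) {
        if (token == operand)
            push(symbol - '0'); /* stack insert */
        else {
            op2 = pop(); /* right operand is on top */
            op1 = pop();
            switch (token) {
            case plus: push(op1 + op2); break;
            case minus: push(op1 - op2); break;
            case times: push(op1 * op2); break;
            case divide: push(op1 / op2); break;
            case mod: push(op1 % op2); break;
            default: break; /* parentheses never occur in postfix */
            }
        }
        token = getToken(&symbol, &n);
    }
    return pop(); /* return result */
}
```

For example, 62/3-42*+ (the postfix form of 6/2-3+4*2) evaluates to 8.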
RECURSION
A recursive procedure
Suppose P is a procedure containing either a Call statement to itself or a Call statement to a
second procedure that may eventually result in a Call statement back to the original procedure
P. Then P is called a recursive procedure. So that the program will not continue to run
indefinitely, a recursive procedure must have the following two properties:
1. There must be certain criteria, called base criteria, for which the procedure does not call
itself.
2. Each time the procedure does call itself (directly or indirectly), it must be closer to the
base criteria.
A recursive function
A function is said to be recursively defined if the function definition refers to itself. A recursive
function must have the following two properties:
1. There must be certain arguments, called base values, for which the function does not
refer to itself.
2. Each time the function does refer to itself, the argument of the function must be closer
to a base value
A recursive function with these two properties is also said to be well-defined.
Factorial Function
“The product of the positive integers from 1 to n, is called "n factorial" and is denoted by n!”
n! = 1 · 2 · 3 ⋯ (n - 2) · (n - 1) · n
It is also convenient to define 0! = 1, so that the function is defined for all nonnegative integers.
Accordingly, n! = 1 if n = 0, and n! = n · (n - 1)! if n > 0.
Observe that this definition of n! is recursive, since it refers to itself when it uses (n - 1)!:
(a) The value of n! is explicitly given when n = 0 (thus 0 is the base value).
(b) The value of n! for arbitrary n is defined in terms of a smaller value of n, which is closer to
the base value 0.
1. Using for loop: This procedure evaluates N! using an iterative loop process
2. Using recursive function: This is a recursive procedure, since it contains a call to itself
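Both procedures can be sketched in C as follows; the function names are hypothetical. They assume n >= 0, and since the result grows very quickly, long overflows for moderately large n (beyond 12! if long is 32 bits).

```c
/* 1. Iterative version: multiplies 1 * 2 * ... * n */
long factorialIter(int n)
{
    long fact = 1;
    for (int k = 2; k <= n; k++)
        fact *= k;
    return fact; /* 0! = 1 because the loop body never runs */
}

/* 2. Recursive version: base value 0! = 1, otherwise n! = n * (n - 1)! */
long factorialRec(int n)
{
    if (n == 0)
        return 1;
    return n * factorialRec(n - 1); /* each call moves closer to the base value */
}
```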
GCD
The greatest common divisor (GCD) of two integers m and n is the greatest integer that divides
both m and n with no remainder.
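A standard recursive way to compute the GCD is Euclid's algorithm, which uses gcd(m, 0) = m as the base value and otherwise gcd(m, n) = gcd(n, m % n). The sketch below assumes nonnegative arguments that are not both zero.

```c
/* Euclid's algorithm: gcd(m, 0) = m, otherwise gcd(m, n) = gcd(n, m % n). */
int gcd(int m, int n)
{
    if (n == 0)
        return m;          /* base value */
    return gcd(n, m % n);  /* second argument strictly decreases */
}
```

For example, gcd(48, 18) calls gcd(18, 12), then gcd(12, 6), then gcd(6, 0), and returns 6.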
Fibonacci Sequence
The Fibonacci sequence is 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ... That is, F0 = 0 and F1 = 1, and each succeeding term is the sum of the two preceding terms: Fn = Fn-2 + Fn-1.
Here
(a) The base values are 0 and 1
(b) The value of Fn is defined in terms of smaller values of n which are closer to the base values.
A procedure for finding the nth term Fn of the Fibonacci sequence follows.
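A direct recursive C version of this definition is shown below (the function name fib is an assumption). It mirrors the definition closely but recomputes the same terms many times, so its running time grows exponentially with n.

```c
/* Recursive Fibonacci: base values F0 = 0 and F1 = 1; assumes n >= 0. */
long fib(int n)
{
    if (n <= 1)
        return n;                   /* F0 = 0, F1 = 1 */
    return fib(n - 2) + fib(n - 1); /* Fn = Fn-2 + Fn-1 */
}
```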
Tower of Hanoi
Problem description
Suppose three pegs, labeled A, B and C, are given, and suppose a finite number n of
disks with decreasing size are placed on peg A.
The objective of the game is to move the disks from peg A to peg C using peg B as an auxiliary.
The rules are: only one disk may be moved at a time, only the top disk on a peg may be moved,
and at no time may a larger disk be placed on a smaller disk.
We write A→B to denote the instruction "Move top disk from peg A to peg B".
For example, the solution for n = 3 is:
n=3: A→C, A→B, C→B, A→C, B→A, B→C, A→C
For completeness, the solutions to the Towers of Hanoi problem for n = 1 and n = 2 are:
n=1: A→C
n=2: A→B, A→C, B→C
The Towers of Hanoi problem for n > 1 disks may be reduced to the following sub-problems:
(1) Move the top n - 1 disks from peg A to peg B
(2) Move the top disk from peg A to peg C: A→C.
(3) Move the top n - 1 disks from peg B to peg C.
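In general, let TOWER(N, BEG, AUX, END) denote moving the top N disks from peg BEG to peg END
using peg AUX as an auxiliary. When N > 1, the solution may be reduced to the solution of the
following three sub-problems:
(a) TOWER(N - 1, BEG, END, AUX)
(b) TOWER(1, BEG, AUX, END) or BEG → END
(c) TOWER(N - 1, AUX, BEG, END)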
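The reduction translates directly into a recursive C function. In this sketch, tower records each move as two letters followed by a space in a caller-supplied buffer instead of printing it; the buffer convention is a hypothetical choice that makes the output easy to check.

```c
#include <string.h>

/* TOWER(N, BEG, AUX, END): appends each move as "XY " (from X to Y) to moves.
   moves must already hold a valid string (e.g. "") and have room for
   3 * (2^n - 1) more characters plus '\0'. */
void tower(int n, char beg, char aux, char end, char *moves)
{
    if (n < 1)
        return;
    if (n == 1) { /* base case: BEG -> END */
        size_t len = strlen(moves);
        moves[len] = beg;
        moves[len + 1] = end;
        moves[len + 2] = ' ';
        moves[len + 3] = '\0';
        return;
    }
    tower(n - 1, beg, end, aux, moves); /* (a) TOWER(N - 1, BEG, END, AUX) */
    tower(1, beg, aux, end, moves);     /* (b) TOWER(1, BEG, AUX, END) */
    tower(n - 1, aux, beg, end, moves); /* (c) TOWER(N - 1, AUX, BEG, END) */
}
```

Calling tower(3, 'A', 'B', 'C', moves) on an empty buffer records AC AB CB AC BA BC AC, the seven moves listed above for n = 3.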
Ackermann function
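The Ackermann function is a function with two arguments, each of which can be assigned any
nonnegative integer: 0, 1, 2, .... It is defined as follows:
(a) If m = 0, then A(m, n) = n + 1.
(b) If m ≠ 0 but n = 0, then A(m, n) = A(m - 1, 1).
(c) If m ≠ 0 and n ≠ 0, then A(m, n) = A(m - 1, A(m, n - 1)).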
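The three cases map one-to-one onto a recursive C function. This is a sketch; both the values and the recursion depth grow extremely fast, so only very small arguments are practical.

```c
/* Ackermann function for nonnegative m and n. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0)
        return n + 1;                               /* case (a) */
    if (n == 0)
        return ackermann(m - 1, 1);                 /* case (b) */
    return ackermann(m - 1, ackermann(m, n - 1));   /* case (c) */
}
```

For example, A(1, 2) = 4, A(2, 3) = 9 and A(3, 3) = 61.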
MODULE 2: QUEUES
DEFINITION
“A queue is an ordered list in which insertions (additions, pushes) and deletions
(removals and pops) take place at different ends.”
The end at which new elements are added is called the rear, and that from which old
elements are deleted is called the front.
If the elements are inserted A, B, C, D and E in this order, then A is the first element deleted
from the queue. Since the first element inserted into a queue is the first element removed,
queues are also known as First-In-First-Out (FIFO) lists.
Figure indicates the way elements will be deleted from the queue and the way new elements
will be added to the queue.
Whenever an element is deleted from the queue, the value of FRONT is increased by 1;
this can be implemented by the assignment FRONT := FRONT + 1
When an element is added to the queue, the value of REAR is increased by 1; this can
be implemented by the assignment REAR := REAR + 1
QUEUE OPERATIONS
The queue operations are implemented as follows.
1. Queue Create
Queue CreateQ(maxQueueSize) ::=
#define MAX_QUEUE_SIZE 100 /* maximum queue size */
typedef struct
{
int key; /* other fields */
} element;
element queue[MAX_QUEUE_SIZE];
int rear = -1;
int front = -1;
The queue uses two variables, front and rear. The queue increments rear in
addq( ) and front in deleteq( ). The function calls would be
addq(item); and item = deleteq( );
4. addq(item)
void addq(element item)
{ /* add an item to the queue */
if (rear == MAX_QUEUE_SIZE-1)
queueFull();
queue [++rear] = item;
}
5. deleteq( )
element deleteq()
{ /* remove element at the front of the queue */
if (front == rear)
return queueEmpty( ); /* return an error key */
return queue[++front];
}
Program: Delete from a queue
6. queueFull( )
The queueFull function prints an error message and terminates execution.
void queueFull()
{
fprintf(stderr, "Queue is full, cannot add element");
exit(EXIT_FAILURE);
}
Drawback of Queue
When items are added to and deleted from the queue, the queue gradually shifts to the right, as shown
in figure.
In this situation, an attempt to insert another item reports that the queue is full,
because the rear index equals MAX_QUEUE_SIZE - 1. Even if space is
available at the front end, insertion at the rear cannot be done.
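The drawback is easy to reproduce with the linear queue functions. In this sketch MAX_QUEUE_SIZE is deliberately tiny, queueEmpty returning key -1 is a hypothetical error value, and wouldReportFull is a hypothetical helper that checks the same condition addq tests, without terminating the program.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_QUEUE_SIZE 4 /* deliberately tiny to expose the drawback */

typedef struct {
    int key; /* other fields */
} element;

element queue[MAX_QUEUE_SIZE];
int rear = -1;
int front = -1;

void queueFull(void)
{
    fprintf(stderr, "Queue is full, cannot add element\n");
    exit(EXIT_FAILURE);
}

element queueEmpty(void)
{ /* hypothetical error key: -1 */
    element err = { -1 };
    return err;
}

void addq(element item)
{ /* add an item to the queue */
    if (rear == MAX_QUEUE_SIZE - 1)
        queueFull();
    queue[++rear] = item;
}

element deleteq(void)
{ /* remove element at the front of the queue */
    if (front == rear)
        return queueEmpty();
    return queue[++front];
}

/* Would the next addq() report "queue full"? */
int wouldReportFull(void) { return rear == MAX_QUEUE_SIZE - 1; }
```

After adding and then deleting MAX_QUEUE_SIZE elements, the queue is empty (front == rear), yet rear equals MAX_QUEUE_SIZE - 1, so the next addq would still call queueFull.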
Method 1:
When an item is deleted from the queue, move the entire queue to the left so that the first
element is again at queue[0] and front is at -1. Also recalculate rear so that it is
correctly positioned.
Shifting an array is very time-consuming when there are many elements in the queue, and
queueFull has a worst-case complexity of O(MAX_QUEUE_SIZE).
Method 2:
Circular Queue
A circular queue is a queue that wraps around the end of the array: the array positions are
arranged in a circle, as shown in figure.
In this convention the variable front is changed: front points one position
counterclockwise from the location of the front element in the queue. The convention for
rear is unchanged.
When the array is viewed as a circle, each array position has a next and a previous position.
The position next to MAX_QUEUE_SIZE - 1 is 0, and the position that precedes 0 is
MAX_QUEUE_SIZE - 1.
When the queue rear is at MAX_QUEUE_SIZE - 1, the next element is inserted at position
0.
In a circular queue, the variables front and rear are moved from their current position to the
next position in the clockwise direction. This may be done using the code
if (rear == MAX_QUEUE_SIZE - 1)
rear = 0;
else rear++;
which is equivalent to rear = (rear + 1) % MAX_QUEUE_SIZE;
element deleteq()
{ /* remove front element from the queue */
element item;
if (front == rear)
return queueEmpty( ); /* return an error key */
front = (front+1)% MAX_QUEUE_SIZE;
return queue[front];
}
Program: Delete from a circular queue
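The matching insertion for a circular queue is not listed here, so the sketch below pairs a circular addq with the deletion above. To keep the example testable, both functions return 0 on failure instead of calling queueFull or queueEmpty (a hypothetical variation). One array slot is always left empty, so at most MAX_QUEUE_SIZE - 1 elements fit and front == rear can only mean "empty".

```c
#define MAX_QUEUE_SIZE 4

typedef struct {
    int key; /* other fields */
} element;

element queue[MAX_QUEUE_SIZE];
int front = 0, rear = 0; /* front is one position counterclockwise from the front element */

int addq(element item)
{ /* returns 0 instead of calling queueFull() */
    int next = (rear + 1) % MAX_QUEUE_SIZE; /* clockwise neighbour of rear */
    if (next == front)
        return 0; /* full: one slot stays empty so full differs from empty */
    rear = next;
    queue[rear] = item;
    return 1;
}

int deleteq(element *item)
{ /* returns 0 instead of calling queueEmpty() */
    if (front == rear)
        return 0;
    front = (front + 1) % MAX_QUEUE_SIZE;
    *item = queue[front];
    return 1;
}
```

After three insertions the fourth is refused; deleting one element then lets the next insertion wrap around to position 0.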
Note:
When the queue becomes empty, front == rear. If every array position could be filled, front == rear
would also hold when the queue is full, so it would be difficult to distinguish between an empty and a full queue.
To avoid the resulting confusion, increase the capacity of the queue just before it
becomes full.
Consider the full queue of figure (a). This figure shows a queue with seven elements in an
array whose capacity is 8. Figure (b) shows the array of this circular queue flattened out.
To get a proper circular queue configuration, slide the elements in the right segment (i.e.,
elements A and B) to the right end of the array as in figure (d).
The program below gives the code to add to a circular queue using a dynamically allocated array.
It obtains the configuration of figure (e) and gives the code for queueFull. The
function copy(a, b, c) copies elements from locations a through b-1 to locations beginning at c.
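void queueFull( )
{ /* allocate an array with twice the capacity; called when the queue holds capacity - 1 elements */
  element *newQueue;
  int start;
  MALLOC(newQueue, 2 * capacity * sizeof(*queue));
  /* copy from queue to newQueue */
  start = (front + 1) % capacity;
  if (start < 2) /* no wrap around */
    copy(queue + start, queue + start + capacity - 1, newQueue);
  else
  { /* queue wraps around */
    copy(queue + start, queue + capacity, newQueue);
    copy(queue, queue + rear + 1, newQueue + capacity - start);
  }
  /* switch to newQueue */
  front = 2 * capacity - 1;
  rear = capacity - 2;
  capacity *= 2;
  free(queue);
  queue = newQueue;
}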
Program: queueFull
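A self-contained sketch of the whole dynamically allocated circular queue follows. Several details are assumptions: the initial capacity of 4, the hypothetical initializer createQ, plain malloc in place of the MALLOC macro, and an addq that checks for fullness before advancing rear, which is the order that makes queueFull's final rear = capacity - 2 line up with the next free slot.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int key; /* other fields */
} element;

element *queue;   /* dynamically allocated circular array */
int capacity = 4; /* assumed initial capacity */
int front = 0, rear = 0;

static void noMemory(void)
{
    fprintf(stderr, "Insufficient memory\n");
    exit(EXIT_FAILURE);
}

/* copy(a, b, c): copy elements a .. b-1 to locations beginning at c */
static void copy(element *a, element *b, element *c)
{
    while (a < b)
        *c++ = *a++;
}

void createQ(void)
{ /* hypothetical initializer */
    front = rear = 0;
    if (!(queue = malloc(capacity * sizeof(*queue))))
        noMemory();
}

void queueFull(void)
{ /* called when the queue holds capacity - 1 elements */
    element *newQueue = malloc(2 * capacity * sizeof(*queue));
    int start = (front + 1) % capacity;
    if (!newQueue)
        noMemory();
    if (start < 2) /* no wrap around */
        copy(queue + start, queue + start + capacity - 1, newQueue);
    else { /* queue wraps around */
        copy(queue + start, queue + capacity, newQueue);
        copy(queue, queue + rear + 1, newQueue + capacity - start);
    }
    front = 2 * capacity - 1; /* configuration of figure (e) */
    rear = capacity - 2;
    capacity *= 2;
    free(queue);
    queue = newQueue;
}

void addq(element item)
{ /* double the array first if rear would catch up with front */
    if ((rear + 1) % capacity == front)
        queueFull();
    rear = (rear + 1) % capacity;
    queue[rear] = item;
}

int deleteq(element *item)
{ /* returns 0 if the queue is empty */
    if (front == rear)
        return 0;
    front = (front + 1) % capacity;
    *item = queue[front];
    return 1;
}
```

In the test scenario the fifth insertion arrives while the queue wraps around, so the wrap-around branch of queueFull runs and the elements still come out in FIFO order.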
DEQUEUES OR DEQUE
A deque (double ended queue) is a linear list in which elements can be added or removed at
either end but not in the middle.
Representation
Deque is maintained by a circular array DEQUE with pointers LEFT and RIGHT, which
point to the two ends of the deque.
The figure shows a deque with 4 elements maintained in an array with N = 8 memory
locations.
The condition LEFT = NULL will be used to indicate that a deque is empty.
DEQUE
Position: 1    2    3    4    5    6    7    8
Contents: -    -    -    AAA  BBB  CCC  DDD  -
LEFT: 4 RIGHT: 7
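A minimal sketch of this representation in C, assuming 1-based positions 1..N as in the figure and LEFT == 0 standing for NULL (empty deque). The function names insertLeft, insertRight, deleteLeft and deleteRight are hypothetical; insertion returns 0 on overflow and deletion returns NULL on underflow.

```c
#include <stddef.h>

#define N 8 /* array positions 1..N, as in the figure */

const char *DEQUE[N + 1]; /* DEQUE[0] is unused */
int LEFT = 0, RIGHT = 0;  /* LEFT == 0 plays the role of NULL: empty deque */

static int nextPos(int k) { return k == N ? 1 : k + 1; } /* clockwise neighbour */
static int prevPos(int k) { return k == 1 ? N : k - 1; } /* counterclockwise neighbour */

static int isFull(void) { return LEFT != 0 && nextPos(RIGHT) == LEFT; }

int insertRight(const char *item)
{
    if (isFull()) return 0;
    if (LEFT == 0) LEFT = RIGHT = 1; /* first element */
    else RIGHT = nextPos(RIGHT);
    DEQUE[RIGHT] = item;
    return 1;
}

int insertLeft(const char *item)
{
    if (isFull()) return 0;
    if (LEFT == 0) LEFT = RIGHT = 1;
    else LEFT = prevPos(LEFT);
    DEQUE[LEFT] = item;
    return 1;
}

const char *deleteLeft(void)
{
    const char *item;
    if (LEFT == 0) return NULL; /* underflow */
    item = DEQUE[LEFT];
    if (LEFT == RIGHT) LEFT = RIGHT = 0; /* deque is now empty */
    else LEFT = nextPos(LEFT);
    return item;
}

const char *deleteRight(void)
{
    const char *item;
    if (LEFT == 0) return NULL; /* underflow */
    item = DEQUE[RIGHT];
    if (LEFT == RIGHT) LEFT = RIGHT = 0;
    else RIGHT = prevPos(RIGHT);
    return item;
}
```

Inserting on the left of position 1 wraps LEFT around to position N, just as the circular array suggests.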
PRIORITY QUEUES
A priority queue is a collection of elements such that each element has been assigned a priority and
such that the order in which elements are deleted and processed comes from the following rules:
(1) An element of higher priority is processed before any element of lower priority.
(2) Two elements with the same priority are processed according to the order in which they were
added to the queue.
A prototype of a priority queue is a timesharing system: programs of high priority are processed
first, and programs with the same priority form a standard queue.
One way to maintain a priority queue in memory is by means of a one-way list, as follows:
1. Each node in the list will contain three items of information: an information field INFO,
a priority number PRN and a link number LINK.
2. A node X precedes a node Y in the list
a. When X has higher priority than Y
b. When both have the same priority but X was added to the list before Y. This means
that the order in the one-way list corresponds to the order of the priority queue.
Example:
Below Figure shows the way the priority queue may appear in memory using linear arrays
INFO, PRN and LINK with 7 elements.
The diagram does not tell us whether BBB was added to the list before or after DDD. On the
other hand, the diagram does tell us that BBB was inserted before CCC, because BBB and
CCC have the same priority number and BBB appears before CCC in the list.
The main property of the one-way list representation of a priority queue is that the element in
the queue that should be processed first always appears at the beginning of the one-way list.
Accordingly, it is a very simple matter to delete and process an element from our priority
queue.
Algorithm: This algorithm deletes and processes the first element in a priority queue which
appears in memory as a one-way list.
1. Set ITEM:= INFO[START] [This saves the data in the first node.]
2. Delete first node from the list.
3. Process ITEM.
4. Exit.
Algorithm: This algorithm adds an ITEM with priority number N to a priority queue which is
maintained in memory as a one-way list.
1. Traverse the one-way list until finding a node X whose priority number exceeds N. Insert
ITEM in front of node X.
2. If no such node is found, insert ITEM as the last element of the list.
The main difficulty in the algorithm comes from the fact that ITEM is inserted before node X. This
means that, while traversing the list, one must also keep track of the address of the node preceding
the node being accessed.
Example:
Consider the priority queue in Fig (a). Suppose an item XXX with priority number 2 is to be
inserted into the queue. We traverse the list, comparing priority numbers.
Fig (a)
Fig(b)
Observe that DDD is the first element in the list whose priority number exceeds that of XXX.
Hence XXX is inserted in the list in front of DDD, as pictured in Fig(b).
Observe that XXX comes after BBB and CCC, which have the same priority as XXX. Suppose now
that an element is to be deleted from the queue. It will be AAA, the first element in the List.
Assuming no other insertions, the next element to be deleted will be BBB, then CCC, then XXX,
and so on.
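The two one-way-list algorithms can be sketched in C with dynamically allocated nodes. The node layout follows INFO, PRN and LINK; the names pqNode, pqInsert and pqDelete are hypothetical, and, as in the notes, a smaller priority number means a higher priority. The variable prev is the pointer to the preceding node that the insertion algorithm needs.

```c
#include <stdlib.h>

typedef struct pqNode {
    const char *info;    /* INFO */
    int prn;             /* PRN: smaller number = higher priority */
    struct pqNode *link; /* LINK */
} pqNode;

/* Insert item with priority n in front of the first node whose PRN exceeds n
   (or at the end). Returns 0 if no memory is available. */
int pqInsert(pqNode **start, const char *item, int n)
{
    pqNode *node = malloc(sizeof(*node));
    pqNode *prev = NULL, *cur = *start; /* prev tracks the preceding node */
    if (!node)
        return 0;
    node->info = item;
    node->prn = n;
    while (cur && cur->prn <= n) {
        prev = cur;
        cur = cur->link;
    }
    node->link = cur;
    if (prev)
        prev->link = node;
    else
        *start = node; /* new first node */
    return 1;
}

/* Delete the first node and return its INFO (NULL if the queue is empty). */
const char *pqDelete(pqNode **start)
{
    pqNode *first = *start;
    const char *item;
    if (!first)
        return NULL;
    item = first->info;
    *start = first->link;
    free(first);
    return item;
}
```

Inserting XXX with priority 2 into a list AAA(1), BBB(2), CCC(2), DDD(4) places it after CCC and in front of DDD, matching the example above.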
Another way to maintain a priority queue in memory is to use a separate queue for each
level of priority (or for each priority number).
Each such queue will appear in its own circular array and must have its own pair of
pointers, FRONT and REAR.
If each queue is allocated the same amount of space, a two-dimensional array QUEUE
can be used instead of the linear arrays.
Observe that FRONT[K] and REAR[K] contain, respectively, the front and rear elements of
row K of QUEUE, the row that maintains the queue of elements with priority number K.
The following are outlines or algorithms for deleting and inserting elements in a priority
queue
Algorithm: This algorithm deletes and processes the first element in a priority queue
maintained by a two-dimensional array QUEUE.
1. [Find the first non-empty queue.]
Find the smallest K such that FRONT[K] ≠ NULL.
2. Delete and process the front element in row K of QUEUE.
3. Exit.
Algorithm: This algorithm adds an ITEM with priority number M to a priority queue
maintained by a two-dimensional array QUEUE.
1. Insert ITEM as the rear element in row M of QUEUE.
2. Exit.
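A sketch of the two-dimensional representation in C follows. The sizes MAX_PRN and ROW_SIZE are assumptions, and instead of testing FRONT[K] against NULL the sketch keeps a hypothetical COUNT[K] per row; here FRONT[K] is the index of the front element and REAR[K] the index of the rear element of row K.

```c
#include <stddef.h>

#define MAX_PRN 5  /* priority numbers 1..MAX_PRN (assumed) */
#define ROW_SIZE 4 /* capacity of each row (assumed) */

const char *QUEUE[MAX_PRN + 1][ROW_SIZE]; /* row K holds the queue of priority K */
int FRONT[MAX_PRN + 1], REAR[MAX_PRN + 1];
int COUNT[MAX_PRN + 1]; /* COUNT[K] == 0 plays the role of FRONT[K] = NULL */

int pqInsert(const char *item, int m)
{ /* insert ITEM as the rear element in row M of QUEUE */
    if (m < 1 || m > MAX_PRN || COUNT[m] == ROW_SIZE)
        return 0; /* bad priority number or row full */
    REAR[m] = (FRONT[m] + COUNT[m]) % ROW_SIZE; /* circular slot after the rear */
    QUEUE[m][REAR[m]] = item;
    COUNT[m]++;
    return 1;
}

const char *pqDelete(void)
{ /* find the smallest K whose row is non-empty, then delete its front element */
    for (int k = 1; k <= MAX_PRN; k++)
        if (COUNT[k] > 0) {
            const char *item = QUEUE[k][FRONT[k]];
            FRONT[k] = (FRONT[k] + 1) % ROW_SIZE;
            COUNT[k]--;
            return item;
        }
    return NULL; /* every row is empty */
}
```

Insertion is O(1), while deletion may scan all MAX_PRN rows to find the first non-empty one.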
In the figure, n is the number of stacks entered by the user, n < MAX_STACKS, and
m = MEMORY_SIZE. Stack i grows from boundary[i] + 1 to boundary[i + 1] before it is full.
A boundary for the last stack is needed, so set boundary[n] to MEMORY_SIZE - 1.
element pop(int i)
{ /* remove top element from the ith stack */
if (top[i] == boundary[i])
return stackEmpty(i);
return memory[top[i]--];
}
Program: Delete an item from the ith stack
The top[i] == boundary[i+1] condition in push implies only that a particular stack ran out of
memory, not that the entire memory is full. But still there may be a lot of unused space between
other stacks in array memory as shown in Figure.
Therefore, create an error recovery function called stackFull, which determines whether there is any free
space in memory. If there is space available, it should shift the stacks so that space is allocated to
the full stack.
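The initialization and push for the ith stack are not listed here, so the sketch below shows one way to write them, following the boundary conventions above. MEMORY_SIZE is deliberately small, and push/pop return 0 where the notes would call stackFull(i) or stackEmpty(i); that return convention is a hypothetical simplification (no stack shifting is attempted).

```c
#define MEMORY_SIZE 12 /* deliberately small (assumed) */
#define MAX_STACKS 10

int memory[MEMORY_SIZE];
int top[MAX_STACKS];
int boundary[MAX_STACKS];
int n; /* number of stacks in use, n < MAX_STACKS */

void initStacks(int numStacks)
{ /* divide memory into roughly equal segments */
    n = numStacks;
    top[0] = boundary[0] = -1;
    for (int j = 1; j < n; j++)
        top[j] = boundary[j] = (MEMORY_SIZE / n) * j;
    boundary[n] = MEMORY_SIZE - 1; /* boundary for the last stack */
}

int push(int i, int item)
{ /* returns 0 where the notes would call stackFull(i) */
    if (top[i] == boundary[i + 1])
        return 0;
    memory[++top[i]] = item;
    return 1;
}

int pop(int i, int *item)
{ /* returns 0 where the notes would call stackEmpty(i) */
    if (top[i] == boundary[i])
        return 0;
    *item = memory[top[i]--];
    return 1;
}
```

With three stacks in 12 cells, stack 2 fills after three pushes even though stack 0 is still empty, which is exactly the situation stackFull is meant to resolve.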
DEFINITION
A linked list, or one-way list, is a linear collection of data elements, called nodes, where the
linear order is given by means of pointers. Each node is divided into two parts:
The first part contains the information of the element, and
The second part, called the link field or next-pointer field, contains the address of the
next node in the list.
A pointer variable called START or FIRST contains the address of the first node.
A special case is the list that has no nodes. Such a list is called the null list or empty list and is
denoted by the null pointer in the variable START.
Let LIST be a linked list. Then LIST will be maintained in memory as follows.
1. LIST requires two linear arrays, such as INFO and LINK, such that INFO[K] and LINK[K]
contain the information part and the next-pointer field of a node of LIST.
2. LIST also requires a variable, such as START, which contains the location of the
beginning of the list, and a next-pointer sentinel, denoted by NULL, which indicates the end
of the list.
3. Since the subscripts of the arrays INFO and LINK are positive, choose NULL = 0 unless
otherwise stated.
The following examples of linked lists indicate that the nodes of a list need not occupy adjacent
elements in the arrays INFO and LINK, and that more than one list may be maintained in the same
linear arrays INFO and LINK. However, each list must have its own pointer variable giving the
location of its first node.
START = 9, INFO[9] = N
LINK[3] = 6, INFO[6] = V
LINK[6] = 11, INFO[11] = E
LINK[11] = 7, INFO[7] = X
LINK[7] = 10, INFO[10] = I
LINK[10] = 4, INFO[4] = T
LINK[4] = NULL, so the list has ended.
REPRESENTING CHAIN IN C
The following capabilities are needed to build a linked representation:
1. A mechanism for defining a node's structure, that is, the fields it contains. Self-referential
structures can be used for this.
2. A way to create new nodes. The malloc function performs this operation.
3. A way to remove nodes that are no longer needed. The free function handles this
operation.
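The three capabilities can be sketched in C as follows: a self-referential structure, node creation with malloc, and node removal with free. The names makeNode and freeList are hypothetical helpers, and the int data field is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Self-referential structure: the link field points to another listNode. */
typedef struct listNode *listPointer;
typedef struct listNode {
    int data;
    listPointer link;
} listNode;

/* Create a node with malloc; stop the program if memory is exhausted. */
listPointer makeNode(int data, listPointer link)
{
    listPointer p = malloc(sizeof *p);
    if (p == NULL) {
        fprintf(stderr, "The memory is full\n");
        exit(EXIT_FAILURE);
    }
    p->data = data;
    p->link = link;
    return p;
}

/* Return every node of the chain to the system with free. */
void freeList(listPointer first)
{
    while (first != NULL) {
        listPointer next = first->link;   /* save the link before freeing */
        free(first);
        first = next;
    }
}
```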
The maintenance of linked lists in memory assumes the possibility of inserting new nodes
into the lists and hence requires some mechanism which provides unused memory space for
the new nodes.
A mechanism is also required whereby the memory space of deleted nodes becomes available for
future use.
Together with the linked lists in memory, a special list is maintained which consists of
unused memory cells. This list, which has its own pointer, is called the list of available space
or the free storage list or the free pool.
Suppose linked lists are implemented by parallel arrays and insertions and deletions are to be
performed on the linked lists. Then the unused memory cells in the arrays are also linked together to
form a linked list, using AVAIL as its list pointer variable. Such a data structure is denoted by
LIST (INFO, LINK, START, AVAIL)
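The following C sketch shows LIST (INFO, LINK, START, AVAIL) with parallel arrays. New nodes come from the front of AVAIL, and deleted nodes return to the front of AVAIL. It uses 1-based cells and the name NIL for the null subscript 0, which avoids clashing with C's NULL macro. SIZE, initAvail, getNode, freeNode, insertFirst and deleteFirst are hypothetical names. The overflow and underflow checks anticipate the conditions defined below.

```c
#include <assert.h>

#define SIZE 10    /* cells 1..SIZE (assumed); subscript 0 plays the role of NULL */
#define NIL 0

static char INFO[SIZE + 1];
static int LINK[SIZE + 1];
static int START = NIL, AVAIL = NIL;

/* Link every cell into the free-storage list; LIST starts empty. */
void initAvail(void)
{
    for (int k = 1; k < SIZE; k++)
        LINK[k] = k + 1;
    LINK[SIZE] = NIL;
    AVAIL = 1;
    START = NIL;
}

/* Take the first cell of AVAIL; returns NIL when AVAIL is empty. */
int getNode(void)
{
    int node = AVAIL;
    if (node != NIL)
        AVAIL = LINK[AVAIL];
    return node;
}

/* Return cell k to the front of AVAIL. */
void freeNode(int k)
{
    LINK[k] = AVAIL;
    AVAIL = k;
}

/* Insert item as the first node; returns 0 on OVERFLOW (AVAIL = NULL). */
int insertFirst(char item)
{
    int node = getNode();
    if (node == NIL)
        return 0;
    INFO[node] = item;
    LINK[node] = START;
    START = node;
    return 1;
}

/* Delete the first node into *item; returns 0 on UNDERFLOW (START = NULL). */
int deleteFirst(char *item)
{
    if (START == NIL)
        return 0;
    int node = START;
    *item = INFO[node];
    START = LINK[node];
    freeNode(node);
    return 1;
}
```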
Garbage Collection
Suppose some memory space becomes reusable because a node is deleted from a list or an
entire list is deleted from a program. This space should be made available for future use.
One way to achieve this is to immediately reinsert the space into the free-storage list. However,
this method may be too time-consuming for the operating system of a computer, which
may choose an alternative method, as follows.
The operating system of a computer may periodically collect all the deleted space onto the
free-storage list. Any technique which does this collection is called garbage collection.
Garbage collection takes place in two steps.
1. First, the computer runs through all lists, tagging those cells which are currently in use.
2. Then, the computer runs through the memory, collecting all untagged space onto the
free-storage list.
The garbage collection may take place when there is only some minimum amount of space or no
space at all left in the free-storage list, or when the CPU is idle and has time to do the collection.
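The two-step collection can be sketched in C over the parallel-array representation. This is a minimal sketch that assumes deleted nodes were only unlinked and never returned to AVAIL. It also assumes every live list is a NIL-terminated chain whose head is passed in. SIZE, TAG, garbageCollect and availCount are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define SIZE 8     /* cells 1..SIZE (assumed) */
#define NIL 0      /* null subscript */

static int LINK[SIZE + 1];
static bool TAG[SIZE + 1];
static int AVAIL = NIL;

/* Step 1: tag every cell reachable from the given list heads.
   Step 2: sweep memory, linking every untagged cell onto AVAIL. */
void garbageCollect(const int heads[], int numHeads)
{
    for (int k = 1; k <= SIZE; k++)
        TAG[k] = false;
    for (int h = 0; h < numHeads; h++)
        for (int p = heads[h]; p != NIL; p = LINK[p])
            TAG[p] = true;
    AVAIL = NIL;
    for (int k = SIZE; k >= 1; k--)          /* sweep backwards so AVAIL ascends */
        if (!TAG[k]) {
            LINK[k] = AVAIL;
            AVAIL = k;
        }
}

/* Number of cells currently on the free-storage list. */
int availCount(void)
{
    int c = 0;
    for (int p = AVAIL; p != NIL; p = LINK[p])
        c++;
    return c;
}
```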
Overflow
Sometimes new data are to be inserted into a data structure but there is no available space,
i.e., the free-storage list is empty. This situation is usually called overflow.
The programmer may handle overflow by printing the message OVERFLOW. In such a case,
the programmer may then modify the program by adding space to the underlying arrays.
Overflow will occur with linked lists when AVAIL = NULL and there is an insertion.
Underflow
The term underflow refers to the situation where one wants to delete data from a data
structure that is empty.
The programmer may handle underflow by printing the message UNDERFLOW.
The underflow will occur with linked lists when START = NULL and there is a deletion.
1. Traversing a Linked List
Let LIST be a linked list in memory stored in linear arrays INFO and LINK with START
pointing to the first element and NULL indicating the end of LIST.
Traversing algorithm uses a pointer variable PTR which points to the node that is
currently being processed.
PTR→LINK points to the next node to be processed.
Thus the assignment PTR = PTR→LINK moves the pointer to the next node in the list,
as pictured in the figure below.
Algorithm: (Traversing a Linked List) Let LIST be a linked list in memory. This algorithm
traverses LIST, applying an operation PROCESS to each element of LIST.
The variable PTR points to the node currently being processed.
1. Set PTR = START
2. Repeat Steps 3 and 4 while PTR ≠ NULL
3. Apply PROCESS to PTR→INFO
4. Set PTR = PTR→LINK
5. Exit.
Example:
The following procedure prints the information at each node of a linked list. Since the procedure
must traverse the list, it follows the traversal algorithm above, with PROCESS replaced by a print statement.
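The procedure itself is not reproduced in these notes, so here is a minimal C sketch of it with pointer-based nodes. It assumes an int info field. printList and countNodes are hypothetical names, and the second function shows the same traversal with a different PROCESS.

```c
#include <assert.h>
#include <stdio.h>

typedef struct listNode *listPointer;
typedef struct listNode {
    int info;
    listPointer link;
} listNode;

/* Traversal with PROCESS = print: PTR starts at START and follows LINK until NULL. */
void printList(listPointer start)
{
    printf("The list contains: ");
    for (listPointer ptr = start; ptr != NULL; ptr = ptr->link)
        printf("%4d", ptr->info);
    printf("\n");
}

/* The same traversal with PROCESS = count. */
int countNodes(listPointer start)
{
    int count = 0;
    for (listPointer ptr = start; ptr != NULL; ptr = ptr->link)
        count++;
    return count;
}
```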
2. Searching a Linked List
There are two searching algorithms for finding the location LOC of the node where ITEM first
appears in LIST.
Let LIST be a linked list in memory, and suppose a specific ITEM of information is given.
If ITEM is a key value and we are searching through a file for the record containing ITEM,
then ITEM can appear only once in LIST.
LIST Is Unsorted
Suppose the data in LIST are not sorted. Then search for ITEM in LIST by traversing the list
with a pointer variable PTR and comparing ITEM with the contents PTR→INFO of each node,
one by one. Before updating the pointer with
PTR = PTR→LINK
two tests are required.
First, check whether we have reached the end of the list, i.e.,
PTR == NULL
If not, then check whether
PTR→INFO == ITEM
The worst-case running time of this algorithm is proportional to the number n of elements in
LIST, and the average-case running time is approximately proportional to n/2 (assuming that
ITEM appears once in LIST, with equal probability in any node of LIST).
LIST is Sorted
Suppose the data in LIST are sorted in ascending order. Search for ITEM in LIST by traversing the list
with a pointer variable PTR and comparing ITEM with the contents PTR→INFO of each node, one
by one. Now the search can stop as soon as PTR→INFO exceeds ITEM, because ITEM cannot appear later in the list.
The worst-case running time of this algorithm is still proportional to the number
n of elements in LIST, and the average-case running time is approximately proportional to n/2.
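Both searches can be sketched in C with pointer-based nodes. These sketches assume an int info field and a list sorted in ascending order for the second function. search and searchSorted are hypothetical names, and each returns the node (LOC) or NULL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct listNode *listPointer;
typedef struct listNode {
    int info;
    listPointer link;
} listNode;

/* LIST unsorted: two tests per node (end of list? found?). */
listPointer search(listPointer start, int item)
{
    for (listPointer ptr = start; ptr != NULL; ptr = ptr->link)
        if (ptr->info == item)
            return ptr;
    return NULL;
}

/* LIST sorted ascending: stop as soon as PTR->INFO exceeds item. */
listPointer searchSorted(listPointer start, int item)
{
    for (listPointer ptr = start; ptr != NULL; ptr = ptr->link) {
        if (ptr->info == item)
            return ptr;
        if (ptr->info > item)
            return NULL;          /* item cannot appear further on */
    }
    return NULL;
}
```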
Let LIST be a linked list with successive nodes A and B, as pictured in Fig. (a). Suppose a node N
is to be inserted into the list between nodes A and B. The schematic diagram of such an insertion
appears in Fig. (b). That is, node A now points to the new node N, and node N points to node B,
to which A previously pointed.
The above figure does not take into account that the memory space for the new node N will come
from the AVAIL list.
Specifically, for easier processing, the first node in the AVAIL list is used for the new node
N. Thus a more exact schematic diagram of such an insertion is shown in the figure below.
Insertion Algorithms
Algorithms which insert nodes into linked lists come up in various situations:
1. Inserting a node at the beginning of the list
2. Inserting a node after the node with a given location
3. Inserting a node into a sorted list
The following algorithm inserts ITEM into LIST so that ITEM follows node A or, when
LOC = NULL, so that ITEM is the first node.
Let N denote the new node. If LOC = NULL, then N is inserted as the first node in LIST. Otherwise,
let node N point to node B by the assignment NEW→LINK := LOC→LINK, and let node A point
to the new node N by the assignment LOC→LINK := NEW.
To insert into a sorted list, first find the location LOC of the last node whose INFO does not exceed
ITEM (LOC = NULL if ITEM belongs before the first node). The following procedure finds LOC:
1. [List empty?] If START = NULL, then: Set LOC := NULL, and Return.
2. [Special case?] If ITEM < START→INFO, then: Set LOC := NULL, and Return.
3. Set SAVE := START and PTR := START→LINK. [Initializes pointers.]
4. Repeat Steps 5 and 6 while PTR ≠ NULL.
5. If ITEM < PTR→INFO, then:
Set LOC := SAVE, and Return.
[End of If structure.]
6. Set SAVE := PTR and PTR := PTR→LINK. [Updates pointers.]
[End of Step 4 loop.]
7. Set LOC := SAVE.
8. Return.
The algorithm which inserts ITEM into a sorted linked list is then simple, because it uses the
previous two procedures: first find LOC, then insert ITEM after the node at LOC.
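A minimal C sketch of the two procedures and their combination. Here malloc stands in for taking the first node of AVAIL, and an int info field is assumed. The names findA, insLoc and insertSorted are hypothetical, chosen to mirror the steps above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct listNode *listPointer;
typedef struct listNode {
    int info;
    listPointer link;
} listNode;

/* Location of the last node whose info does not exceed item,
   or NULL if item belongs before the first node. */
listPointer findA(listPointer start, int item)
{
    if (start == NULL || item < start->info)      /* steps 1-2 */
        return NULL;
    listPointer save = start, ptr = start->link;  /* step 3 */
    while (ptr != NULL) {                         /* step 4 */
        if (item < ptr->info)                     /* step 5 */
            return save;
        save = ptr;                               /* step 6 */
        ptr = ptr->link;
    }
    return save;                                  /* step 7 */
}

/* Insert item after node loc, or as the first node when loc == NULL.
   Returns 0 on OVERFLOW (malloc failure). */
int insLoc(listPointer *start, listPointer loc, int item)
{
    listPointer newNode = malloc(sizeof *newNode);
    if (newNode == NULL)
        return 0;
    newNode->info = item;
    if (loc == NULL) {                 /* N becomes the first node */
        newNode->link = *start;
        *start = newNode;
    } else {                           /* NEW->LINK := LOC->LINK; LOC->LINK := NEW */
        newNode->link = loc->link;
        loc->link = newNode;
    }
    return 1;
}

/* Insert into a sorted list: find LOC, then insert after it. */
int insertSorted(listPointer *start, int item)
{
    return insLoc(start, findA(*start, item), item);
}
```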
Let LIST be a linked list with a node N between nodes A and B, as pictured in Fig. (a)
below. Suppose node N is to be deleted from the linked list. The schematic diagram of
such a deletion appears in Fig. (b).
The deletion occurs as soon as the next-pointer field of node A is changed so that it
points to node B.
Linked list is maintained in memory in the form
LIST (INFO, LINK, START, AVAIL)
The above figure does not take into account the fact that, when a node N is deleted from the
list, its memory space is immediately returned to the AVAIL list. For easier processing, it is
returned to the beginning of the AVAIL list. Thus a more exact schematic diagram of such
a deletion is shown in the figure below.
Deletion Algorithms
Deletion of nodes from linked lists comes up in various situations:
1. Deleting the node following a given node
2. Deleting the node with a given ITEM of information
All deletion algorithms will return the memory space of the deleted node N to the beginning
of the AVAIL list.
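Both deletion cases can be sketched in C. This sketch assumes free() stands in for returning the node to the front of AVAIL, and an int info field is used. delLoc, deleteItem and the helper cons are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct listNode *listPointer;
typedef struct listNode {
    int info;
    listPointer link;
} listNode;

/* Hypothetical helper: allocate a node holding info in front of link. */
listPointer cons(int info, listPointer link)
{
    listPointer p = malloc(sizeof *p);
    if (p == NULL) abort();
    p->info = info;
    p->link = link;
    return p;
}

/* Delete the node following loc (node A), or the first node when loc == NULL. */
void delLoc(listPointer *start, listPointer loc)
{
    listPointer n = (loc == NULL) ? *start : loc->link;
    if (n == NULL)
        return;                     /* nothing to delete */
    if (loc == NULL)
        *start = n->link;
    else
        loc->link = n->link;        /* A now points to B */
    free(n);
}

/* Delete the first node whose info equals item; returns 0 if not found or the list is empty. */
int deleteItem(listPointer *start, int item)
{
    listPointer save = NULL, ptr = *start;
    while (ptr != NULL && ptr->info != item) {
        save = ptr;                 /* save trails one node behind ptr */
        ptr = ptr->link;
    }
    if (ptr == NULL)
        return 0;
    delLoc(start, save);
    return 1;
}
```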
Doubly linked list: A doubly linked list is a linear collection of data elements, called nodes, where each node N
is divided into three parts:
1. An information field INFO, which contains the data of N
2. A pointer field FORW (also called RLINK), which contains the location of the next node in the list
3. A pointer field BACK (also called LLINK), which contains the location of the preceding node in
the list
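A C sketch of a doubly linked list. As an assumption, it is kept circular with a header node so that insertion and deletion need no NULL checks. llink plays the role of BACK and rlink the role of FORW. makeHeader, dinsert and ddelete are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct dNode *dPointer;
typedef struct dNode {
    dPointer llink;   /* BACK: location of the preceding node */
    int info;
    dPointer rlink;   /* FORW: location of the next node */
} dNode;

/* An empty circular list is a header node whose links point to itself. */
dPointer makeHeader(void)
{
    dPointer h = malloc(sizeof *h);
    if (h == NULL) abort();
    h->llink = h->rlink = h;
    return h;
}

/* Insert a new node holding info immediately to the right of node. */
dPointer dinsert(dPointer node, int info)
{
    dPointer newnode = malloc(sizeof *newnode);
    if (newnode == NULL) abort();
    newnode->info = info;
    newnode->llink = node;
    newnode->rlink = node->rlink;
    node->rlink->llink = newnode;
    node->rlink = newnode;
    return newnode;
}

/* Unlink and free deleted; the header node itself is never removed. */
void ddelete(dPointer header, dPointer deleted)
{
    if (deleted == header)
        return;
    deleted->llink->rlink = deleted->rlink;
    deleted->rlink->llink = deleted->llink;
    free(deleted);
}
```

Because each node knows its predecessor, ddelete needs only the node itself. A singly linked list would first need a search for the preceding node.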
A header linked list is a linked list which contains a special node, called the header node, at the
beginning of the list.
The following are two kinds of widely used header lists:
1. A grounded header list is a header list where the last node contains the null pointer.
2. A circular header list is a header list where the last node points back to the header node.
Observe that the list pointer START always points to the header node.
START→LINK = NULL indicates that a grounded header list is empty.
START→LINK = START indicates that a circular header list is empty.
The first node in a header list is the node following the header node, and the location of the first
node is START→LINK, not START, as with ordinary linked lists.
The algorithm below, which uses a pointer variable PTR to traverse a circular header list,
1. begins with PTR = START→LINK (not PTR = START), and
2. ends when PTR = START (not PTR = NULL).
Algorithm: (Traversing a Circular Header List) Let LIST be a circular header list in memory.
This algorithm traverses LIST, applying an operation PROCESS to each node of LIST.
1. Set PTR := START→LINK. [Initializes the pointer PTR.]
2. Repeat Steps 3 and 4 while PTR ≠ START:
3. Apply PROCESS to PTR→INFO.
4. Set PTR := PTR→LINK. [PTR now points to the next node.]
[End of Step 2 loop.]
5. Exit.
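A minimal C sketch of the circular header list traversal. Here PROCESS is assumed to add INFO to a running total. sumCircular and isEmptyCircular are hypothetical names.

```c
#include <assert.h>

typedef struct listNode *listPointer;
typedef struct listNode {
    int info;
    listPointer link;
} listNode;

/* Begin at START->LINK and stop on returning to the header START. */
int sumCircular(listPointer start)
{
    int total = 0;
    for (listPointer ptr = start->link; ptr != start; ptr = ptr->link)
        total += ptr->info;       /* PROCESS */
    return total;
}

/* START->LINK = START means the circular header list is empty. */
int isEmptyCircular(listPointer start)
{
    return start->link == start;
}
```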
Linked Stack
Function push creates a new node, temp, and places item in the data field and top in the link field.
The variable top is then changed to point to temp. A typical function call to add an element to the
ith stack would be push (i,item).
element pop(int i)
{ /* remove top element from the ith stack */
  stackPointer temp = top[i];
  element item;
  if (!temp)
    return stackEmpty();
  item = temp->data;
  top[i] = temp->link;
  free(temp);
  return item;
}
Function pop returns the top element and changes top to point to the address contained in its link
field. The removed node is then returned to system memory. A typical function call to delete an
element from the ith stack would be item = pop (i);
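The declarations and push function described above are not reproduced in these notes. A self-contained C sketch of the linked stacks follows. The element type (a struct with an int key), MAX_STACKS = 10 and the stackEmpty that returns a sentinel key of -1 are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_STACKS 10                 /* assumed maximum number of stacks */
typedef struct { int key; } element;  /* assumed element type */
typedef struct stack *stackPointer;
typedef struct stack {
    element data;
    stackPointer link;
} stack;
stackPointer top[MAX_STACKS];         /* top[i] == NULL: stack i is empty */

/* Create temp, put item in its data field and top[i] in its link field. */
void push(int i, element item)
{
    stackPointer temp = malloc(sizeof *temp);
    if (temp == NULL) {
        fprintf(stderr, "The memory is full\n");
        exit(EXIT_FAILURE);
    }
    temp->data = item;
    temp->link = top[i];
    top[i] = temp;
}

/* Hypothetical error handler: report underflow, return a sentinel element. */
element stackEmpty(void)
{
    fprintf(stderr, "Stack is empty\n");
    element e = { -1 };
    return e;
}

/* Remove the top element from the ith stack and free its node. */
element pop(int i)
{
    stackPointer temp = top[i];
    element item;
    if (!temp)
        return stackEmpty();
    item = temp->data;
    top[i] = temp->link;
    free(temp);
    return item;
}
```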
Linked Queue
To represent m ≤ MAX_QUEUES queues, use the following declarations:
#define MAX_QUEUES 10 /* maximum number of queues */
typedef struct queue *queuePointer;
typedef struct queue {
element data;
queuePointer link;
} queue;
queuePointer front[MAX_QUEUES], rear[MAX_QUEUES];
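Functions addq and deleteq implement the add and delete operations for multiple queues.
void addq(int i, element item)
{ /* add item to the rear of queue i */
  queuePointer temp = malloc(sizeof(*temp));
  if (temp == NULL) exit(EXIT_FAILURE); /* the memory is full */
  temp->data = item;
  temp->link = NULL;
  if (front[i])
    rear[i]->link = temp;
  else
    front[i] = temp;
  rear[i] = temp;
}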
Function addq is more complex than push because we must check for an empty queue. If the queue
is empty, then change front to point to the new node; otherwise change rear's link field to point to
the new node. In either case, we then change rear to point to the new node.
element deleteq(int i)
{ /* delete an element from queue i */
  queuePointer temp = front[i];
  element item;
  if (!temp)
    return queueEmpty();
  item = temp->data;
  front[i] = temp->link;
  free(temp);
  return item;
}
Function deleteq is similar to pop, since both remove the node that is currently at the start of the
list. Typical function calls would be addq(i, item); and item = deleteq(i);
Consider a polynomial $a(x) = a_{m-1}x^{e_{m-1}} + \cdots + a_0x^{e_0}$, where the $a_i$ are nonzero coefficients and the $e_i$ are nonnegative integer exponents such that
$e_{m-1} > e_{m-2} > \cdots > e_1 > e_0 \ge 0$.
Represent each term as a node containing coefficient and exponent fields, as well as a pointer to
the next term.
Assuming that the coefficients are integers, the type declarations are:
Adding Polynomials
To add two polynomials, examine their terms starting at the nodes pointed to by a and b.
If the exponents of the two terms are equal, then add the two coefficients and create a new
term for the result, and also move the pointers to the next nodes in a and b.
If the exponent of the current term in a is less than the exponent of the current term in b,
then create a duplicate term of b, attach this term to the result, called c, and advance the
pointer to the next term in b.
If the exponent of the current term in b is less than the exponent of the current term in a,
then create a duplicate term of a, attach this term to the result, called c, and advance the
pointer to the next term in a.
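A C sketch of padd that follows the three cases above. It assumes integer coefficients and chains ordered by decreasing exponent. When equal exponents produce a zero coefficient, no term is created; this is an added detail that keeps the coefficients of c nonzero. A stack-allocated dummy node simplifies attaching the first term. The names polyNode, attach and padd follow common textbook usage but should be treated as assumptions here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct polyNode *polyPointer;
typedef struct polyNode {
    int coef;
    int expon;
    polyPointer link;
} polyNode;

/* Create a new term and attach it after *rear, then advance *rear. */
static void attach(int coefficient, int exponent, polyPointer *rear)
{
    polyPointer temp = malloc(sizeof *temp);
    if (temp == NULL) {
        fprintf(stderr, "The memory is full\n");
        exit(EXIT_FAILURE);
    }
    temp->coef = coefficient;
    temp->expon = exponent;
    temp->link = NULL;
    (*rear)->link = temp;
    *rear = temp;
}

/* Return c = a + b; a and b are ordered by decreasing exponent. */
polyPointer padd(polyPointer a, polyPointer b)
{
    polyNode dummy;                          /* temporary first node of c */
    polyPointer rear = &dummy;
    dummy.link = NULL;
    while (a && b) {
        if (a->expon < b->expon) {           /* duplicate the term of b */
            attach(b->coef, b->expon, &rear);
            b = b->link;
        } else if (a->expon > b->expon) {    /* duplicate the term of a */
            attach(a->coef, a->expon, &rear);
            a = a->link;
        } else {                             /* equal exponents: add coefficients */
            int sum = a->coef + b->coef;
            if (sum)
                attach(sum, a->expon, &rear);
            a = a->link;
            b = b->link;
        }
    }
    for (; a; a = a->link)                   /* copy the rest of a */
        attach(a->coef, a->expon, &rear);
    for (; b; b = b->link)                   /* copy the rest of b */
        attach(b->coef, b->expon, &rear);
    return dummy.link;
}
```

For example, adding $3x^{14} + 2x^8 + 1$ and $8x^{14} - 3x^{10} + 10x^6$ yields $11x^{14} - 3x^{10} + 2x^8 + 10x^6 + 1$.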
Analysis of padd:
To determine the computing time of padd, first determine which operations contribute to the
cost. For this algorithm, there are three cost measures:
(1) Coefficient additions
(2) Exponent comparisons
(3) Creation of new nodes for c
Let m and n be the numbers of terms in a and b. Each iteration of the main loop advances a, b, or both,
so the maximum number of executions of any statement in padd is bounded above by m + n. Therefore,
the computing time is O(m + n). This means that if we implement and run the algorithm on a
computer, the time it takes will be at most C1·m + C2·n + C3, where C1, C2, C3 are constants. Since any
algorithm that adds two polynomials must look at each nonzero term at least once, padd is optimal
to within a constant factor.
In the linked representation of a sparse matrix, each column is represented as a circularly linked list
with a header node. A similar representation is used for each row.
Each node has a tag field, which is used to distinguish between header nodes and entry nodes.
Header Node:
Each header node has three fields: down, right, and next as shown in figure (a).
The down field is used to link into a column list and the right field to link into a row list.
The next field links the header nodes together.
The header node for row i is also the header node for column i, and the total number of
header nodes is max {number of rows, number of columns}.
Element node:
Each element node has five fields in addition to the tag field: row, col, down,
right, and value, as shown in figure (b).
The down field is used to link to the next nonzero term in the same column and the right
field to link to the next nonzero term in the same row. Thus, if $a_{ij} \ne 0$, there is a node with
tag field = entry, value = $a_{ij}$, row = i, and col = j, as shown in figure (c).
We link this node into the circular linked lists for row i and column j. Hence, it is
simultaneously linked into two different lists.
Figure (3) shows the linked representation of this matrix. Although we have not shown the value
of the tag fields, we can easily determine these values from the node structure.
For each nonzero term of a, there is one entry node that is in exactly one row list and one column list.
The header nodes are marked H0-H3. As the figure shows, the right field of the header node
list header is used to link into the list of header nodes.
To represent a numRows x numCols matrix with numTerms nonzero terms, we need
max{numRows, numCols} + numTerms + 1 nodes. While each node may require several words of
memory, the total storage will be less than numRows x numCols when numTerms is
sufficiently small.
There are two different types of nodes in the representation, so unions are used to create the
appropriate data structure. The C declarations are as follows:
#define MAX_SIZE 50 /* size of largest matrix */
typedef enum {head, entry} tagfield;
typedef struct matrixNode *matrixPointer;
typedef struct {
  int row;
  int col;
  int value;
} entryNode;
typedef struct matrixNode {
  matrixPointer down;
  matrixPointer right;
  tagfield tag;
  union {
    matrixPointer next; /* used by header nodes */
    entryNode entry;    /* used by entry nodes */
  } u;
} matrixNode;
matrixPointer hdnode[MAX_SIZE];
MODULE 4: TREES
DEFINITION
A tree is a finite set of one or more nodes such that
There is a specially designated node called the root.
The remaining nodes are partitioned into n >= 0 disjoint sets T1, …, Tn, where each of
these sets is a tree. T1, …, Tn are called the subtrees of the root.
TERMINOLOGY
Node: The item of information plus the branches to other nodes
Degree: The number of subtrees of a node
Degree of a tree: The maximum of the degree of the nodes in the tree.
Terminal nodes (or leaves): nodes that have degree zero, i.e., nodes with no successors
Nonterminal nodes: nodes that are not terminal nodes
Parent and Children: Suppose N is a node in T with left successor S1 and right
successor S2, then N is called the Parent (or father) of S1 and S2. Here, S1 is called
left child (or Son) and S2 is called right child (or Son) of N.
Siblings: Children of the same parent are said to be siblings.
Edge: A line drawn from a node N of T to a successor is called an edge
Path: A sequence of consecutive edges from node N to a node M is called a path.
Ancestors of a node: All the nodes along the path from the root to that node.
The level of a node: defined by letting the root be at level zero. If a node is at level l,
then its children are at level l+1.
Height (or depth): The maximum level of any node in the tree
Example
Representation of Trees
Figure (A)
1. List Representation
2. Left Child- Right Sibling Representation
3. Representation as a Degree-Two tree
List Representation:
The tree can be represented as a list. The tree of figure (A) could be written as the list
(A (B (E (K, L), F), C (G), D (H (M), I, J) ) )
Each tree node is represented by a memory node that has fields for the data and pointers to the tree
node's children.
Since the degree of each tree node may be different, memory nodes with a varying number
of pointer fields are used.
For a tree of degree k, the node structure can be represented as in the figure below. Each child field
is used to point to a subtree.
The figure below shows the node structure used in the left child-right sibling representation.
Ex:
In Figure (A), the leftmost child of A is B, and the leftmost child of D is H.
The closest right sibling of B is C, and the closest right sibling of H is I.
Choose the nodes based on how the tree is drawn. The left child field of each node points to
its leftmost child (if any), and the right sibling field points to its closest right sibling (if
any).
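A C sketch of the left child-right sibling node. It rebuilds the list form of the tree of Figure (A) by following the two pointers. The lcrsNode type, toList, degree and the compact output format without spaces are our own choices. The caller is assumed to supply a buffer large enough for the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct lcrsNode *lcrsPointer;
typedef struct lcrsNode {
    char data;
    lcrsPointer leftChild;     /* leftmost child, if any */
    lcrsPointer rightSibling;  /* closest right sibling, if any */
} lcrsNode;

/* Append the list representation of the subtree rooted at t to out at *pos. */
void toList(lcrsPointer t, char *out, size_t *pos)
{
    out[(*pos)++] = t->data;
    if (t->leftChild) {
        out[(*pos)++] = '(';
        for (lcrsPointer c = t->leftChild; c; c = c->rightSibling) {
            toList(c, out, pos);
            if (c->rightSibling)
                out[(*pos)++] = ',';
        }
        out[(*pos)++] = ')';
    }
    out[*pos] = '\0';
}

/* Degree of a node = length of the sibling chain starting at its leftmost child. */
int degree(lcrsPointer t)
{
    int d = 0;
    for (lcrsPointer c = t->leftChild; c; c = c->rightSibling)
        d++;
    return d;
}
```

Every node needs exactly two pointer fields, whatever its degree. This is the advantage over the variable-size nodes of the list representation.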
Figure (D) shows the tree of Figure (A) redrawn using the left child-right sibling representation.
To obtain the degree-two tree representation of a tree, simply rotate the right-sibling pointers in
a left child-right sibling tree clockwise by 45 degrees. This gives us the degree-two tree displayed
in Figure (E).
In the degree-two representation, the two children of a node are referred to as the left and right children.
BINARY TREES
Definition: A binary tree T is defined as a finite set of nodes such that,
T is empty or
T consists of a root and two disjoint binary trees called the left subtree and the right
subtree.
1. Skewed Tree
A skewed tree is a tree that is skewed either to the left or to the right; that is,
it is a tree consisting of only left subtrees or only right subtrees.
A tree with only left subtrees is called Left Skewed Binary Tree.
A tree with only right subtrees is called Right Skewed Binary Tree.
The following figure shows its extended binary tree. The circles represent internal nodes, and the squares
represent external nodes.
Every internal node in the extended tree has exactly two children, and every external node is
a leaf. The result is a 2-tree, that is, a binary tree in which every node has either 0 or 2 children.
Proof:
(1) The proof is by induction on i.
Induction Base: The root is the only node on level i = 1. Hence, the maximum number of nodes
on level i = 1 is $2^{i-1} = 2^0 = 1$.
Induction Hypothesis: Let i be an arbitrary positive integer greater than 1. Assume that the
maximum number of nodes on level i - 1 is $2^{i-2}$.
Induction Step: The maximum number of nodes on level i - 1 is $2^{i-2}$ by the induction hypothesis.
Since each node in a binary tree has a maximum degree of 2, the maximum number of nodes on
level i is two times the maximum number of nodes on level i - 1, or $2 \cdot 2^{i-2} = 2^{i-1}$.
Proof: Let $n_0$ be the number of leaf nodes, $n_1$ the number of nodes of degree one, $n_2$ the number of nodes of degree two, and n the total number of nodes.
Since all nodes in T are of degree at most two, we have
$n = n_0 + n_1 + n_2$    (1)
Now count the number of branches in a binary tree. Every node except the root has exactly one branch leading into it, so if B is the number of branches, then
$n = B + 1$.
All branches stem from a node of degree one or two. Thus,
$B = n_1 + 2n_2$.
Hence, we obtain
$n = B + 1 = n_1 + 2n_2 + 1$    (2)
Subtracting Eq. (2) from Eq. (1) and rearranging terms, we get
$n_0 = n_2 + 1$
Array representation:
A binary tree can be represented using an array; this is called the sequential representation.
The nodes are numbered from 1 to n, and a one-dimensional array can be used to store
the nodes.
Position 0 of this array is left empty, and the node numbered i is mapped to position i of
the array.
The figure below shows the array representation for both trees of figure (a).
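With position 0 unused, the standard index relations for a complete binary tree of n nodes are parent(i) = i/2, leftChild(i) = 2i and rightChild(i) = 2i + 1. A child exists only if its index does not exceed n. These relations are not spelled out in these notes, so the sketch below states them as our own addition. The function names are hypothetical, and 0 is used to mean "no such node".

```c
#include <assert.h>

/* Sequential representation: node i lives in tree[i]; position 0 is unused. */
int parent(int i)                { return i / 2; }                        /* 0 when i == 1 (root) */
int leftChild(int i, int n)      { return 2 * i <= n ? 2 * i : 0; }       /* 0: no left child */
int rightChild(int i, int n)     { return 2 * i + 1 <= n ? 2 * i + 1 : 0; } /* 0: no right child */
```

For a skewed tree these relations still hold. The catch is that most array positions stay empty, which is why less than half the array is used.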
For complete binary tree the array representation is ideal, as no space is wasted.
For the skewed tree, less than half the array is utilized.
Linked representation:
The problems with the array representation are:
It is good for complete binary trees, but it wastes memory for skewed trees and many
other binary trees.
The insertion and deletion of nodes from the middle of a tree require the movement of
many nodes to reflect the change in the level numbers of these nodes.
1. Inorder: Inorder traversal calls for moving down the tree toward the left until you cannot go
further. Then visit the node, move one node to the right and continue. If no move can be done, then
go back one more node.
Let ptr be the pointer which contains the location of the node N currently being scanned.
L(N) denotes the left child of node N, and R(N) denotes the right child of node N.
Recursion function:
The inorder traversal of a binary tree can be recursively defined as
void inorder(treepointer ptr)
{ /* inorder tree traversal */
  if (ptr) {
    inorder(ptr->leftchild);
    printf("%d", ptr->data);
    inorder(ptr->rightchild);
  }
}
2. Preorder: Preorder visits a node, traverses left, and continues. When you
cannot continue, move right and begin again, or move back until you can move right and resume.
Recursion function:
The preorder traversal of a binary tree can be recursively defined as
Visit the root.
Traverse the left subtree in preorder.
Traverse the right subtree in preorder.
3. Postorder: Postorder traversal calls for moving down the tree towards the left until you can
go no further. Then move to the right node, visit the node, and continue.
Recursion function:
The postorder traversal of a binary tree can be recursively defined as
Traverse the left subtree in postorder.
Traverse the right subtree in postorder.
Visit the root.
void postorder(treepointer ptr)
{ /* postorder tree traversal */
  if (ptr) {
    postorder(ptr->leftchild);
    postorder(ptr->rightchild);
    printf("%d", ptr->data);
  }
}
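A self-contained C sketch of all three recursive traversals. As an assumption, each visit writes the node's char data into a buffer instead of calling printf, so the visit order can be checked. The node layout (data, leftchild, rightchild) follows the functions above. The example tree is the expression A / B * C * D + E.

```c
#include <assert.h>
#include <string.h>

typedef struct node *treepointer;
typedef struct node {
    char data;
    treepointer leftchild, rightchild;
} node;

/* Left subtree, root, right subtree. */
void inorder(treepointer ptr, char *out, int *len)
{
    if (ptr) {
        inorder(ptr->leftchild, out, len);
        out[(*len)++] = ptr->data;
        inorder(ptr->rightchild, out, len);
    }
}

/* Root, left subtree, right subtree. */
void preorder(treepointer ptr, char *out, int *len)
{
    if (ptr) {
        out[(*len)++] = ptr->data;
        preorder(ptr->leftchild, out, len);
        preorder(ptr->rightchild, out, len);
    }
}

/* Left subtree, right subtree, root. */
void postorder(treepointer ptr, char *out, int *len)
{
    if (ptr) {
        postorder(ptr->leftchild, out, len);
        postorder(ptr->rightchild, out, len);
        out[(*len)++] = ptr->data;
    }
}
```

On this expression tree, inorder yields the infix form A/B*C*D+E, preorder the prefix form +**/ABCDE, and postorder the postfix form AB/C*D*E+.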
5. Level-Order traversal:
Visiting the nodes in the order suggested by the node numbering is called level-order
traversal.
The nodes in a tree are numbered starting with the root on level 1, and so on.
First visit the root, then the root's left child, followed by the root's right child. Continue
in this manner, visiting the nodes at each new level from the leftmost node to the
rightmost node.
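Level-order traversal is usually implemented with a queue. A minimal C sketch follows, assuming a simple array queue and a tree with at most MAX_QUEUE_SIZE nodes. levelOrder and MAX_QUEUE_SIZE are hypothetical names, and the visited data is written to a buffer.

```c
#include <assert.h>
#include <string.h>

#define MAX_QUEUE_SIZE 100   /* assumed upper bound on the number of nodes */

typedef struct node *treepointer;
typedef struct node {
    char data;
    treepointer leftchild, rightchild;
} node;

/* Visit nodes level by level, left to right; each node is enqueued exactly once. */
void levelOrder(treepointer ptr, char *out)
{
    treepointer queue[MAX_QUEUE_SIZE];
    int front = 0, rear = 0, len = 0;
    if (ptr)
        queue[rear++] = ptr;
    while (front < rear) {
        ptr = queue[front++];            /* delete from the front */
        out[len++] = ptr->data;          /* visit */
        if (ptr->leftchild)
            queue[rear++] = ptr->leftchild;
        if (ptr->rightchild)
            queue[rear++] = ptr->rightchild;
    }
    out[len] = '\0';
}
```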
1. Copying a Binary Tree
This operation copies one binary tree into another.
2. Testing Equality
This operation determines the equivalence of two binary trees. Equivalent binary trees have
the same structure and the same information in the corresponding nodes.
This function will return TRUE if two trees are equivalent and FALSE if they are not.
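Both operations are natural recursive functions. A minimal C sketch, assuming int data and malloc for new nodes; copy and equal are hypothetical names. copy builds both subtrees before filling in the root, in postorder fashion. equal is true when both trees are empty, or when the roots match and both pairs of subtrees are equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node *treepointer;
typedef struct node {
    int data;
    treepointer leftchild, rightchild;
} node;

/* Return a pointer to an exact copy of the tree original. */
treepointer copy(treepointer original)
{
    if (original == NULL)
        return NULL;
    treepointer temp = malloc(sizeof *temp);
    if (temp == NULL) abort();
    temp->leftchild = copy(original->leftchild);
    temp->rightchild = copy(original->rightchild);
    temp->data = original->data;
    return temp;
}

/* TRUE if the trees have the same structure and the same data in corresponding nodes. */
bool equal(treepointer first, treepointer second)
{
    return (!first && !second) ||
           (first && second && first->data == second->data &&
            equal(first->leftchild, second->leftchild) &&
            equal(first->rightchild, second->rightchild));
}
```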
The satisfiability problem for formulas of the propositional calculus asks whether there is an
assignment of values to the variables that causes the value of the expression to be true.
A simple algorithm for determining satisfiability lets (x1, x2, …, xn) take on all possible
combinations of true and false values and checks the formula for each combination.
For an expression with n variables, there are $2^n$ possible combinations of true and false values.
For example, for n = 3, the eight combinations are (t,t,t), (t,t,f), (t,f,t), (t,f,f), (f,t,t), (f,t,f), (f,f,t),
(f,f,f).
The algorithm takes $O(g \cdot 2^n)$ time, where g is the time to substitute values for x1, x2, …, xn and
evaluate the expression.
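A brute-force C sketch of this algorithm. It evaluates the expression tree in postorder once for every one of the $2^n$ assignments. The node layout is an assumption: leftChild, data, value and rightChild, plus an extra var field that holds a variable's index. The kind enum, postOrderEval and satisfiable are hypothetical names. A NOT node uses only its right child, and n must be small enough for 1UL << n to fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { VAR, NOT, AND, OR } kind;
typedef struct node *treePointer;
typedef struct node {
    treePointer leftChild;
    kind data;               /* operator, or VAR for a variable leaf */
    int var;                 /* assumed: index of the variable when data == VAR */
    bool value;              /* result of evaluating this subtree */
    treePointer rightChild;
} node;

/* Evaluate children first, then combine their values at this node. */
void postOrderEval(treePointer n, const bool x[])
{
    if (n == NULL)
        return;
    postOrderEval(n->leftChild, x);
    postOrderEval(n->rightChild, x);
    switch (n->data) {
    case VAR: n->value = x[n->var]; break;
    case NOT: n->value = !n->rightChild->value; break;
    case AND: n->value = n->leftChild->value && n->rightChild->value; break;
    case OR:  n->value = n->leftChild->value || n->rightChild->value; break;
    }
}

/* Try all 2^n assignments; on success, x holds a satisfying assignment. */
bool satisfiable(treePointer root, int n, bool x[])
{
    for (unsigned long mask = 0; mask < (1UL << n); mask++) {
        for (int i = 0; i < n; i++)
            x[i] = (mask >> i) & 1;
        postOrderEval(root, x);
        if (root->value)
            return true;
    }
    return false;
}
```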
Node structure:
For the purpose of the evaluation algorithm, assume each node has four fields:
In the linked representation of any binary tree, there are more null links than actual pointers: a tree
with n nodes has n + 1 null links out of its 2n links. These null links are replaced by pointers, called
threads, which point to other nodes in the tree. A null leftChild is replaced by a pointer to the node's
inorder predecessor, and a null rightChild by a pointer to its inorder successor.
When trees are represented in memory, we must be able to distinguish between threads and
ordinary pointers. This is done by adding two additional fields to the node structure, i.e., leftThread and
rightThread.
If ptr→leftThread = TRUE, then ptr→leftChild contains a thread; otherwise it contains a
pointer to the left child.
If ptr→rightThread = TRUE, then ptr→rightChild contains a thread; otherwise it contains
a pointer to the right child.
Node Structure:
The node structure is given in C declaration
The complete memory representation for the tree of figure is shown in Figure C
The variable root points to the header node of the tree, while root→leftChild points to the first
node of the actual tree. This is true for all threaded trees. The problem of the loose
threads (the leftmost node's left thread and the rightmost node's right thread) is handled by pointing them to the header node, root.
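The C declaration referred to above is not reproduced here, so the following is a sketch that assumes bool thread flags. It shows why threads help: an inorder traversal needs no stack, because the inorder successor is found from the threads alone. The traversal starts and ends at the header node. insucc, tinorder and the buffer-based output are hypothetical choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct threadedTree *threadedPointer;
typedef struct threadedTree {
    bool leftThread;              /* true: leftChild is a thread */
    threadedPointer leftChild;
    char data;
    threadedPointer rightChild;
    bool rightThread;             /* true: rightChild is a thread */
} threadedTree;

/* Inorder successor: follow a right thread directly; otherwise go to the
   right child and then as far left as possible. */
threadedPointer insucc(threadedPointer tree)
{
    threadedPointer temp = tree->rightChild;
    if (!tree->rightThread)
        while (!temp->leftThread)
            temp = temp->leftChild;
    return temp;
}

/* Stack-free inorder traversal, starting and ending at the header node. */
void tinorder(threadedPointer header, char *out)
{
    int len = 0;
    for (threadedPointer temp = insucc(header); temp != header; temp = insucc(temp))
        out[len++] = temp->data;
    out[len] = '\0';
}
```

In this sketch, the header's rightChild points to the header itself with rightThread = false. An empty tree is a header whose leftThread is true and whose leftChild points back to itself.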
Graphs
Definitions
A graph is a pictorial representation of a set of objects where some pairs of objects are connected
by links. The interconnected objects are represented by points termed as vertices, and the links that
connect the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges
connecting pairs of vertices. Consider the following graph:
In the above graph, V = {a, b, c, d, e}.
Vertex − Each node of the graph is represented as a vertex. In the example given below, the labeled circles
represent vertices, so A to G are vertices. We can represent them using an array, as shown in the image
below. Here A can be identified by index 0, B by index 1, and so on.
Edge − An edge represents a path or line between two vertices. In the example given
below, the lines from A to B, B to C, and so on represent edges. We can use a two-dimensional array to
represent the edges, as shown in the image below. Here AB can be represented as 1 at row 0, column 1, BC as
1 at row 1, column 2, and so on, keeping other combinations as 0.
Adjacency − Two nodes or vertices are adjacent if they are connected to each other through an
edge. In the example given below, B is adjacent to A, C is adjacent to B, and so on.
Path − A path represents a sequence of edges between two vertices. In the example given below,
ABCD represents a path from A to D.
Traversal methods
Breadth First Search
Breadth First Search (BFS) traverses a graph in a breadthward motion and uses a
queue to remember the next vertex from which to continue the search when a dead end occurs in any
iteration.
In the example given above, the BFS algorithm traverses from A to B to E to F first, then to C and G, and lastly
to D. It employs the following rules.
Rule 1 − Visit an adjacent unvisited vertex. Mark it visited. Display it. Insert it in a queue.
Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue.
Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty.
Traversal Description
At this stage we are left with no unmarked (unvisited) nodes. But as per the algorithm, we keep on
dequeuing in order to get all unvisited nodes. When the queue becomes empty, the program is over.
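A C sketch of BFS on an adjacency matrix, where adj[u][v] == 1 means an edge u–v. Vertex k is labelled 'A' + k, and the visit order is written to a buffer. The test graph (edges A-B, A-D, B-C, D-E) is hypothetical; it is not the graph in the figure. Neighbours are examined in index order, which is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VERTICES 26   /* assumed: vertices are labelled 'A'..'Z' */

/* Breadth-first search from start; each vertex enters the queue once. */
void bfs(int n, int adj[][MAX_VERTICES], int start, char *out)
{
    bool visited[MAX_VERTICES] = { false };
    int queue[MAX_VERTICES], front = 0, rear = 0, len = 0;
    visited[start] = true;                 /* visit the start vertex */
    out[len++] = 'A' + start;
    queue[rear++] = start;
    while (front < rear) {
        int u = queue[front++];           /* Rule 2: dequeue the first vertex */
        for (int v = 0; v < n; v++)
            if (adj[u][v] && !visited[v]) {   /* Rule 1 */
                visited[v] = true;
                out[len++] = 'A' + v;
                queue[rear++] = v;
            }
    }                                      /* Rule 3: stop when the queue is empty */
    out[len] = '\0';
}
```

On the test graph, BFS from A visits A, then both of its neighbours B and D, and only then C and E.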
Depth First Search (DFS) traverses a graph in a depthward motion and uses a stack to
remember the next vertex from which to continue the search when a dead end occurs in any iteration.
In the example given above, the DFS algorithm traverses from A to B to C to D first, then to E, then to
F, and lastly to G. It employs the following rules.
Rule 1 − Visit an adjacent unvisited vertex. Mark it visited. Display it. Push it onto a stack.
Rule 2 − If no adjacent vertex is found, pop a vertex from the stack. (It will pop all the vertices
from the stack which have no unvisited adjacent vertices.)
Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.
Traversal Description
We choose B, mark it visited, and put it onto the stack. Here B does not have any unvisited adjacent node,
so we pop B from the stack.
As C does not have any unvisited adjacent node, we keep popping the stack until we find a
node which has an unvisited adjacent node. In this case there is none, and we keep popping until the stack
is empty.
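A matching C sketch of DFS with an explicit stack, following Rules 1 and 2 directly. It uses the same assumptions as a BFS sketch over an adjacency matrix: vertex k is labelled 'A' + k, and neighbours are tried in index order. The test graph (edges A-B, A-D, B-C, D-E) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VERTICES 26   /* assumed: vertices are labelled 'A'..'Z' */

/* Depth-first search from start using an explicit stack. */
void dfs(int n, int adj[][MAX_VERTICES], int start, char *out)
{
    bool visited[MAX_VERTICES] = { false };
    int stack[MAX_VERTICES], top = -1, len = 0;
    visited[start] = true;
    out[len++] = 'A' + start;
    stack[++top] = start;
    while (top >= 0) {                     /* Rule 3: until the stack is empty */
        int u = stack[top], v;
        for (v = 0; v < n; v++)            /* look for an unvisited adjacent vertex */
            if (adj[u][v] && !visited[v])
                break;
        if (v < n) {                       /* Rule 1: visit, mark, push */
            visited[v] = true;
            out[len++] = 'A' + v;
            stack[++top] = v;
        } else {
            top--;                         /* Rule 2: dead end, pop */
        }
    }
    out[len] = '\0';
}
```

On the same graph, DFS from A goes as deep as possible (A, B, C) before backtracking to A and continuing with D and E.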
MODULE 5: HASHING
The Hash Table organizations
If the keys are not unique, then we can simply construct a set of m lists and store the heads of these
lists in the direct address table. The time to find an element matching an input key will still be O(1).
However, if each element of the collection has some other …
The range of the key determines the size of the direct address table and may be too large to be practical.
For instance, it is not likely that you will be able to use a direct address table to store elements which have
arbitrary 32-bit integers as their keys for a few years yet!
Instead, use a hash function h(k), which maps each value of the key, k, to the range (1, m). In this case, we place the element in T[h(k)]
rather than T[k], and we can search in O(1) time as before.
Hashing Functions
The following functions map a single integer key (k) to a small integer bucket value h(k), where m is the size
of the hash table (number of buckets).
Division method (Cormen): Choose m to be a prime that isn't close to a power of 2, and compute h(k) = k mod m.
This works badly for many types of patterns in the input data.
Knuth variant on division: h(k) = k(k+3) mod m. Supposedly works much better than the raw
division method.
Multiplication method: choose a constant A with 0 < A < 1 and compute
s = k*A
x = fractional part of s
h(k) = floor(m*x)
To do this quickly with integer arithmetic, let w be the number of bits in a word (e.g. 32) and suppose
m is 2^p. Then compute:
s = floor(A * 2^w)
x = k*s (keeping only the low w bits of the product)
h(k) = x >> (w-p) // i.e. right shift x by (w-p) bits,
// i.e. extract the p most significant bits from x
A hash table works well only if its size stays proportional to the number of entries. With a fixed
size and the common structures, lookup is similar to linear search, except with a better constant
factor. In some cases, the number of entries may be definitely known in advance, for example the
keywords of a programming language. More commonly, this is not known for sure, if only because of
later changes in code and data. It is a serious, although common, mistake not to provide any way for
the table to resize. A general-purpose hash table "class" will almost always have some way to resize,
and it is good practice even for simple "custom" tables. An implementation should check the load
factor (the number of entries divided by the number of buckets) and act if it becomes too large; this
check is needed only on inserts, since insertion is the only operation that increases the load factor.
To keep the load factor under a certain limit, e.g. under 3/4, many table implementations expand the
table when items are inserted. For example, in Java's HashMap class the default load factor threshold
for table expansion is 3/4, and in Python's dict the table is resized when the load factor is greater
than 2/3.
Since buckets are usually implemented on top of a dynamic array and any constant proportion for
resizing greater than 1 will keep the load factor under the desired limit, the exact choice of the constant
is determined by the same space-time tradeoff as for dynamic arrays.
Resizing is accompanied by a full or incremental table rehash whereby existing items are mapped to
new bucket locations.
To limit the proportion of memory wasted due to empty buckets, some implementations also shrink
the size of the table—followed by a rehash—when items are deleted. From the point of space-time
tradeoffs, this operation is similar to the deallocation in dynamic arrays.
Resizing by copying all entries
A common approach is to automatically trigger a complete resizing when the load factor
exceeds some
threshold rmax. Then a new larger table is allocated, all the entries of the old table are removed and
inserted into this new table, and the old table is returned to the free storage pool. Symmetrically,
when the load factor falls below a second threshold rmin, all entries are moved to a new smaller table.
For hash tables that shrink and grow frequently, the resizing downward can be skipped entirely. In this
case, the table size is proportional to the maximum number of entries that ever were in the hash table
at one time, rather than the current number. The disadvantage is that memory usage will be higher,
and thus cache behavior may be worse. For best control, a "shrink-to-fit" operation can be provided
that does this only on request.
If the table size increases or decreases by a fixed percentage at each expansion, the total cost of these
resizings, amortized over all insert and delete operations, is still a constant, independent of the number
of entries and of the number of operations performed.
For example, consider a table that was created with the minimum possible size and is doubled each
time the load ratio exceeds some threshold. If m elements are inserted into that table, the numbers of
entries re-inserted by successive resizings double each time and the last one is at most m, so the total
number of extra re-insertions in all dynamic resizings of the table is less than 2m. In other words,
dynamic resizing increases the cost of each insert or delete operation only by a small constant factor.
Incremental resizing
Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging the
hash table all at once, because it may interrupt time-critical operations. If one cannot avoid dynamic
resizing, a solution is to perform the resizing gradually:
During the resize, allocate the new hash table, but keep the old table unchanged.
In each lookup or delete operation, check both tables.
Perform insertion operations only in the new table.
At each insertion also move r elements from the old table to the new table.
When all elements are removed from the old table, deallocate it.
To ensure that the old table is completely copied over before the new table itself needs to be enlarged,
it is necessary to increase the size of the table by a factor of at least (r + 1)/r during resizing.
Disk-based hash tables almost always use some scheme of incremental resizing, since the cost of
rebuilding the entire table on disk would be too high.
Monotonic keys
If it is known that key values will always increase (or decrease) monotonically, then a variation of
consistent hashing can be achieved by keeping a list of the single most recent key value at each hash
table resize operation. Upon lookup, keys that fall in the ranges defined by these list entries are directed
to the appropriate hash function—and indeed hash table—both of which can be different for each
range. Since it is common to grow the overall number of entries by doubling, there will only
be O(log N) ranges to check, and binary search time for the redirection would be O(log log N). As
with consistent hashing, this approach guarantees that any key's hash, once issued, will never change,
even when the hash table is later grown.
Other solutions
Linear hashing is a hash table algorithm that permits incremental hash table expansion. It is
implemented using a single hash table, but with two possible lookup functions.
Another way to decrease the cost of table resizing is to choose a hash function in such a way that the hashes of
most values do not change when the table is resized. This approach, called consistent hashing, is prevalent in
disk-based and distributed hash tables, where rehashing is prohibitively costly.
INSERTION SORT
The basic step in this method is to insert a new record into a sorted sequence of i records
in such a way that the resulting sequence of size i + 1 is also ordered.
Function insert accomplishes this insertion.
Function insert first stores the record being inserted in a[0]. This sentinel simplifies the while
loop: the loop is guaranteed to stop at position 0 at the latest, so no separate test for the end of
the list (i < 1) is needed. In insertion sort, begin with the ordered sequence a[1] and successively insert
the records a[2], a[3], ..., a[n]. Since each insertion leaves the resultant sequence ordered,
the list with n records can be ordered by making n - 1 insertions.
The details are given in function insertionSort.
Analysis of insertion sort: In the worst case, insert(e, a, i) makes i + 1 comparisons
before making the insertion. Hence the complexity of insert is O(i). Function
insertionSort invokes insert for i = j - 1 = 1, 2, ..., n - 1. So, the complexity of
insertionSort is

$$O\left(\sum_{i=1}^{n-1} (i+1)\right) = O(n^2)$$
Example: Assume that n = 5 and the input key sequence is 2, 3, 4, 5, 1. After each
iteration we have:

 j    [1] [2] [3] [4] [5]
 -     2   3   4   5   1
 2     2   3   4   5   1
 3     2   3   4   5   1
 4     2   3   4   5   1
 5     1   2   3   4   5
RADIX SORT
Radix sort is the method that many people intuitively use or begin to use when
alphabetizing a large list of names. (Here the radix is 26, the 26 letters of the alphabet.)
Specifically, the list of names is first sorted according to the first letter of each name.
That is, the names are arranged in 26 classes, where the first class consists of those
names that begin with "A," the second class consists of those names that begin with
"B," and so on. During the second pass, each class is alphabetized according to the
second letter of the name. And so on. If no name contains, for example, more than 12
letters, the names are alphabetized with at most 12 passes.
The radix sort is the method used by a card sorter. A card sorter contains 13 receiving
pockets labelled as follows:
9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 11, 12, R (reject)
Each pocket other than R corresponds to a row on a card in which a hole can be punched.
Decimal numbers, where the radix is 10, are punched in the obvious way and hence use
only the first 10 pockets of the sorter. The sorter uses a radix reverse-digit sort on
numbers. That is, suppose a card sorter is given a collection of cards where each card
contains a 3-digit number punched in columns 1 to 3. The cards are first sorted
according to the units digit. On the second pass, the cards are sorted according to the
tens digit. On the third and last pass, the cards are sorted according to the hundreds digit.
Given to a card sorter, the numbers would be sorted in three phases, as pictured in the figure.
Data field: A data field is an elementary unit that stores a single fact. A data field is
usually characterized by its type and size.
Example: a student’s name is a data field that stores the name of a student.
Record: A record is a collection of related data fields which is seen as a single unit
from the application point of view.
Example: The student’s record may contain data fields such as name, address, phone
number, roll number, marks obtained, and so on.
FILE ATTRIBUTES
Every file has a list of attributes associated with it that gives the operating system and the
application software information about the file and how it is intended to be used.
File name: It is a string of characters that stores the name of a file. File naming
conventions vary from one operating system to the other.
File position: It is a pointer that points to the position at which the next read/write
operation will be performed.
File structure: It indicates whether the file is a text file or a binary file. In the text file,
the numbers are stored as a string of characters. A binary file stores numbers in the same
way as they are represented in the main memory.
File Access Method: It indicates whether the records in a file can be accessed
sequentially or randomly.
In sequential access mode, records are read one by one. That is, if 60 records of students
are stored in the STUDENT file, then to read the record of the 39th student, you have to go
through the records of the first 38 students.
In random access, records can be accessed in any order.
Attributes Flag: A file can have six additional attributes attached to it. These attributes
are usually stored in a single byte, with each bit representing a specific attribute. If a
particular bit is set to ‘1’ then this means that the corresponding attribute is turned on.
Above figure shows the list of attributes and their position in the attribute flag or attribute
byte.
Text Files
A text file, also known as a flat file or an ASCII file, is structured as a sequence of
lines of alphabetic characters, numerals and special characters.
The data in a text file, whether numeric or non-numeric, is stored using its
corresponding ASCII code.
The end of a text file is denoted by placing a special character, called an end-of-file
marker, after the last line in the text file.
It is possible for humans to read text files which contain only ASCII text.
Text files can be manipulated by any text editor, but they do not provide efficient storage.
Binary Files
A binary file contains any type of data encoded in binary form for computer storage
and processing purposes.
A binary file can contain text that is not broken up into lines.
A binary file stores data in a format that is similar to the format in which the data is
stored in the main memory. Therefore, a binary file is not readable by humans.
Binary files contain formatting information that only certain applications or processors
can understand.
Binary files must be processed by appropriate software or hardware, which transforms the
data in order to make it readable.
Binary files provide efficient storage of data, but they can be read only through an
appropriate program.
The basic operations that can be performed on a file are given in the figure below.
Creating a File
A file is created by specifying its name and mode. Then the file is opened for writing records
that are read from an input device. Once all the records have been written into the file, the file
is closed. The file is now available for future read/write operations by any program that has
been designed to use it in some way or the other.
Updating a File
Updating a file means changing the contents of the file to reflect a current picture of reality.
A file can be updated in the following ways:
Inserting a new record in the file. For example, if a new student joins the course, we
need to add his record to the STUDENT file.
Deleting an existing record. For example, if a student quits a course in the middle of
the session, his record has to be deleted from the STUDENT file.
Modifying an existing record. For example, if the name of a student was spelt
incorrectly, then correcting the name will be a modification of the existing record.
Maintaining a File
It involves restructuring or re-organizing the file to improve the performance of the programs
that access this file.
Restructuring a file keeps the file organization unchanged and changes only the structural
aspects of the file.
Example: changing the field width or adding/deleting fields.
File reorganization may involve changing the entire organization of the file.
FILE ORGANIZATION
Organization of records means the logical arrangement of records in the file and not the
physical layout of the file as stored on a storage media.
The following considerations should be kept in mind before selecting an appropriate file
organization method:
Rapid access to one or more records
Ease of inserting/updating/deleting one or more records without disrupting the speed
of accessing record
Efficient storage of records
Using redundancy to ensure data integrity
1. Sequential Organization
A sequentially organized file stores the records in the order in which they were entered.
Sequential files can be read only sequentially, starting with the first record in the file.
Sequential file organization is the most basic way to organize a large collection of records in
a file.
Features
Records are written in the order in which they are entered
Records are read and written sequentially
Deletion or updating of one or more records calls for replacing the original file with a
new file that contains the desired changes
Records have the same size and the same field format
Records are sorted on a key value
Generally used for report generation or sequential reading
Advantages
Simple and easy to Handle
No extra overheads involved
Sequential files can be stored on magnetic disks as well as magnetic tapes
Well suited for batch-oriented applications
Disadvantages
Records can be read only sequentially. If the ith record has to be read, then all the i–1
records before it must be read
Does not support update operation. A new file has to be created and the original file
has to be replaced with the new file that contains the desired changes
Cannot be used for interactive applications
2. Relative File Organization
If the records are of fixed length and we know the base address of the file and the length of
the record, then any record i can be accessed using the following formula:

$$\text{Address of } i\text{th record} = \text{base address} + (i - 1) \times \text{record length}$$

Consider a file whose base address is 1000 and in which each record occupies 20 bytes. Then the
address of the 5th record can be given as:
1000 + (5–1) * 20
= 1000 + 80
= 1080
Features
Provides an effective way to access individual records
The record number represents the location of the record relative to the beginning of the
file
Records in a relative file are of fixed length
Relative files can be used for both random as well as sequential access
Every location in the table either stores a record or is marked as FREE
Advantages
Ease of processing
If the relative record number of the record that has to be accessed is known, then the
record can be accessed instantaneously
Random access of records makes access to relative files fast
Allows deletions and updates in the same file
Provides random as well as sequential access of records with low overhead
New records can be easily added in the free locations based on the relative record
number of the record to be inserted
Well suited for interactive applications
Disadvantages
Use of relative files is restricted to disk devices
Records can be of fixed length only
For random access of records, the relative record number must be known in advance
3. Indexed Sequential File Organization
Features
Provides fast data retrieval
Records are of fixed length
Index table stores the address of the records in the file
The ith entry in the index table points to the ith record of the file
While the index table is read sequentially to find the address of the desired record, a
direct access is made to the address of the specified record in order to access it randomly
Indexed sequential files perform well in situations where sequential access as well as
random access is made to the data
Advantages
The key improvement is that the indices are small and can be searched quickly,
allowing the database to access only the records it needs
Supports applications that require both batch and interactive processing
Records can be accessed sequentially as well as randomly
Updates the records in the same file
Disadvantages
Indexed sequential files can be stored only on disks
Needs extra space and overhead to store indices
Handling these files is more complicated than handling sequential files
Supports only fixed length records
INDEXING
An indexing technique is chosen based on factors such as access type, access time, insertion time,
deletion time, and space overhead involved. There are two kinds of indices:
Ordered indices that are sorted based on one or more key values
Hash indices that are based on the values generated by applying a hash function
1. Ordered Indices
Indices are used to provide fast random access to records. An index of a file may be a
primary index or a secondary index.
Primary Index
In a sequentially ordered file, the index whose search key specifies the sequential order of the
file is defined as the primary index.
Example: suppose records of students are stored in a STUDENT file in a sequential order
starting from roll number 1 to roll number 60. Now, if we want to search a record for, say,
roll number 10, then the student’s roll number is the primary index.
Secondary Index
An index whose search key specifies an order different from the sequential order of the file is
called a secondary index.
Example: If the record of a student is searched by his name, then the name is a secondary index.
Secondary indices are used to improve the performance of queries on non-primary keys.
2. Dense and Sparse Indices
Dense index
In a dense index, the index table stores the address of every record in the file.
A dense index is more efficient to use than a sparse index if it fits in memory.
By looking at the dense index, it can be concluded directly whether the record exists in
the file or not.
Sparse index
In a sparse index, the index table stores the address of only some of the records in the
file.
Sparse indices are small, so they fit easily in main memory.
In a sparse index, to locate a record, first find an entry in the index table with the largest
search key value that is either less than or equal to the search key value of the desired
record. Then, start at that record pointed to by that entry in the index table and then
proceed searching the record using the sequential pointers in the file, until the desired
record is obtained.
Example: If we need to access record number 40, then record number 30 is the largest key
value that is less than 40. So jump to the record pointed by record number 30 and move along
the sequential pointer to reach record number 40.
Below figure shows a dense index and a sparse index for an indexed sequential file.
3. Cylinder Surface Indexing
Cylinder surface indexing is a very simple technique used only for the primary key index of a
sequentially ordered file.
The index file will contain two fields—cylinder index and several surface indices.
There are multiple cylinders, and each cylinder has multiple surfaces. If the file needs m
cylinders for storage then the cylinder index will contain m entries.
When a record with a particular key value has to be searched, then the following steps are
performed:
First the cylinder index of the file is read into memory.
Second, the cylinder index is searched to determine which cylinder holds the desired
record. For this, either the binary search technique can be used or the cylinder index can
be made to store an array of pointers to the start of individual key values. In either
case the search will take O(log m) time.
After the cylinder index is searched, appropriate cylinder is determined.
Depending on the cylinder, the surface index corresponding to the cylinder is then
retrieved from the disk.
Since the number of surfaces on a disk is very small, linear search can be used to
determine surface index of the record.
Once the cylinder and the surface are determined, the corresponding track is read and
searched for the record with the desired key.
Hence, the total number of disk accesses is three—first, for accessing the cylinder index,
second for accessing the surface index, and third for getting the track address.
4. Multi-level Indices
Consider very large files that may contain millions of records. For such files, a simple
indexing technique will not suffice. In such a situation, we use multi-level indices.
The figure below shows two-level multi-indexing. Three-level indexing and so on can also be
used.
In the figure, the main index table stores pointers to three inner index tables. The inner index
tables are sparse index tables that in turn store pointers to the records.
5. Inverted Indices
Inverted files are used in document retrieval systems for large textual databases.
An inverted file reorganizes the structure of an existing data file in order to provide fast
access to all records having one field falling within the set limits.
When a term or keyword specified in the inverted file is identified, the record number
is given and a set of records corresponding to the search criteria are created.
For each keyword, an inverted file contains an inverted list that stores a list of pointers
to all occurrences of that term in the main text. Therefore, given a keyword, the
addresses of all the documents containing that keyword can easily be located.
6. B-Tree Indices
It is impractical to maintain the entire database in memory, hence B-trees are used to
index the data in order to provide fast access.
B-trees are used for their data retrieval speed, ease of maintenance, and simplicity.
It forms a tree structure with the root at the top. The index consists of a B-tree (balanced
tree) structure based on the values of the indexed column.
In this example, the indexed column is name and the B-tree is created using all the
existing names that are the values of the indexed column.
The upper blocks of the tree contain index data pointing to the next lower block, thus
forming a hierarchical structure. The lowest level blocks, also known as leaf blocks,
contain pointers to the data rows stored in the table.
7. Hashed Indices
Hashing is used to compute the address of a record by using a hash function on the search key
value.
If two search-key values hash to the same address, a collision occurs, and collision-resolution
schemes are applied to generate a new address.
Choosing a good hash function is critical to the success of this technique. A good hash
function has two properties:
1. First, a good hash function, irrespective of the number of search keys, gives an
average-case lookup that is a small constant.
2. Second, the function distributes records uniformly and randomly among the buckets,
where a bucket is defined as a unit of one or more records
The worst hash function is one that maps all the keys to the same bucket.
1. Insertion
To insert a record that has ki as its search value, use the hash function h(ki) to compute the
address of the bucket for that record.
If the bucket is free, store the record there; otherwise, use chaining to store the record.
2. Search
To search a record having the key value ki, use h(ki) to compute the address of the bucket
where the record is stored.
The bucket may contain one or several records, so check for every record in the bucket to
retrieve the desired record with the given key value.
3. Deletion
To delete a record with key value ki, use h(ki) to compute the address of the bucket where the
record is stored. The bucket may contain one or several records so check for every record in
the bucket, and then delete the record.