Data Structure Unit 5 (Searching and Sorting Notes)
5.1 SEARCHING
5.1.1 Linear Search
Linear search or sequential search is a method for finding a particular value in a list that consists
of checking every one of its elements, one at a time and in sequence, until the desired one is
found.
Linear search is the simplest search algorithm; it is a special case of brute-force search. Its worst
case cost is proportional to the number of elements in the list; and so is its expected cost, if all
list elements are equally likely to be searched for. Therefore, if the list has more than a few
elements, other methods (such as binary search or hashing) will be faster, but they also impose
additional requirements.
Linear search in an array is usually programmed by stepping up an index variable until it reaches
the last index. This normally requires two comparisons for each list item: one to check whether
the index has reached the end of the array, and another one to check whether the item has the
desired value.
1. Repeat For J = 1 to N
2. If (ITEM == A[J]) Then
3. Print: ITEM found at location J
4. Return [End of If]
[End of For Loop]
5. If (J > N) Then
6. Print: ITEM doesn’t exist
[End of If]
7. Exit
//CODE
int a[10],i,n,m,c=0, x;
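The linear search steps listed earlier can be sketched as a C function. This is a minimal sketch, not the original listing: the function name linear_search, the 0-based indexing (the pseudocode counts from 1) and the return value -1 for "not found" are assumptions made for illustration.

```c
/* Hypothetical helper: returns the 0-based index of the first element
   equal to item, or -1 if item does not occur in a[0..n-1]. */
int linear_search(const int a[], int n, int item)
{
    int j;
    for (j = 0; j < n; j++) {   /* comparison 1: has the index reached the end? */
        if (a[j] == item)       /* comparison 2: does this element match? */
            return j;
    }
    return -1;                  /* every element was checked without a match */
}
```

The two comparisons per element mentioned in the text are the loop test and the equality test.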
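Binary Search
Searching a sorted collection is a common task. A dictionary is a sorted list of word definitions.
Given a word, one can find its definition. A telephone book is a sorted list of people's names,
addresses, and telephone numbers. Knowing someone's name allows one to quickly find their
telephone number and address. Binary search exploits this ordering: it compares the search value
with the middle element of the sorted array and then continues in only the half that can still
contain the value.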
6. Else
[End of If]
11. Else
[End of If]
13. Exit
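//CODE
#include <stdio.h>

int main(void)
{
    int ar[10], val, mid, low, high, size, i;
    int found = 0;                  /* becomes 1 once val is located */

    printf("\nEnter the number of elements (at most 10), to be given in ascending order\n");
    if (scanf("%d", &size) != 1 || size < 1 || size > 10)
        return 1;                   /* ar holds at most 10 elements */
    for (i = 0; i < size; i++)
    {
        printf("Input element no. %d\n", i + 1);
        scanf("%d", &ar[i]);
    }
    printf("The array entered is\n");
    for (i = 0; i < size; i++)
    {
        printf("%d\t", ar[i]);
    }
    low = 0;
    high = size - 1;
    printf("\nInput the number to search for\n");
    scanf("%d", &val);
    while (low <= high)
    {
        mid = low + (high - low) / 2;   /* mid is set before it is first read */
        if (ar[mid] == val)
        {
            printf("Value found at position %d\n", mid + 1);
            found = 1;
            break;
        }
        if (val > ar[mid])
            low = mid + 1;              /* discard the lower half */
        else
            high = mid - 1;             /* discard the upper half */
    }
    if (!found)
        printf("Value not found\n");
    return 0;
}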
Complexity of Binary Search
A binary search halves the number of items still to be checked with each iteration, so locating an
item (or determining its absence) takes logarithmic time: at most floor(log2 n) + 1 probes of an
array of n items, i.e. O(log n).
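The logarithmic bound can be checked by counting probes. The following is a minimal sketch under stated assumptions: the function name binary_search and the probes output parameter are hypothetical, introduced only to make the count observable.

```c
/* Hypothetical helper: searches the ascending array a[0..n-1] for val.
   Returns the 0-based index of val or -1, and stores in *probes the
   number of array elements that were examined. */
int binary_search(const int a[], int n, int val, int *probes)
{
    int low = 0, high = n - 1, mid;
    *probes = 0;
    while (low <= high) {
        mid = low + (high - low) / 2;
        (*probes)++;
        if (a[mid] == val)
            return mid;
        if (val > a[mid])
            low = mid + 1;      /* continue in the upper half */
        else
            high = mid - 1;     /* continue in the lower half */
    }
    return -1;
}
```

For an array of 8 elements no search examines more than floor(log2 8) + 1 = 4 elements, whether or not the value is present.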
Sorting Efficiency
There are many sorting techniques, and the choice among them depends on the situation. The
efficiency of a sorting technique is judged mainly on two parameters.
The first is the execution time, that is, the time the program takes to run.
The second is the space, that is, the memory the program occupies.
5.3 TYPES OF SORTING
• An internal sort is any data sorting process that takes place entirely within the main
memory of a computer. This is possible whenever the data to be sorted is small enough to
all be held in the main memory.
• External sorting is a term for a class of sorting algorithms that can handle massive
amounts of data. External sorting is required when the data being sorted do not fit into
the main memory of a computing device (usually RAM) and instead they must reside in
the slower external memory (usually a hard drive). External sorting typically uses
a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in
main memory are read, sorted, and written out to a temporary file. In the merge phase, the
sorted sub files are combined into a single larger file.
• We can say a sorting algorithm is stable if two objects with equal keys appear in the same
order in sorted output as they appear in the input unsorted array.
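Stability can be observed by sorting records that carry a key plus a second field. The sketch below uses insertion sort (Section 5.3.1) on an illustrative record type; struct record, its tag field and the function name insertion_sort_records are assumptions for this example only.

```c
/* Illustrative record: tag remembers where the record stood in the input. */
struct record {
    int key;
    char tag;
};

/* Insertion sort by key. The strict comparison (<) means a record never
   moves past another record with an equal key, so the sort is stable. */
void insertion_sort_records(struct record a[], int n)
{
    int i, j;
    struct record cur;
    for (i = 1; i < n; i++) {
        cur = a[i];
        j = i - 1;
        while (j >= 0 && cur.key < a[j].key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = cur;
    }
}
```

Sorting (2,a) (1,b) (2,c) (1,d) this way gives (1,b) (1,d) (2,a) (2,c): records with equal keys keep their input order.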
5.3.1 Insertion sort
It is a simple sorting algorithm that builds the final sorted array (or list) one item at a time. This
algorithm is less efficient on large lists than more advanced algorithms such
as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:
· Simple implementation
· Efficient for small data sets
· Stable; i.e., does not change the relative order of elements with equal keys
· In-place; i.e., only requires a constant amount O(1) of additional memory space.
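1. Repeat Steps 2 to 5 for i = 1 to N-1
2. Set key=A[i]
3. Set j=i-1
4. Repeat while j>=0 and key<A[j]
a) Set A[j+1]=A[j]
b) j=j-1
[End of While loop]
5. Set A[j+1]=key
[End of For loop]
6. Return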
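//CODE
#include <stdio.h>

int main(void)
{
    int A[6] = {5, 1, 6, 2, 4, 3};
    int i, j, key;
    for (i = 1; i < 6; i++)
    {
        key = A[i];                 /* next element to insert into the sorted part A[0..i-1] */
        j = i - 1;
        while (j >= 0 && key < A[j])
        {
            A[j + 1] = A[j];        /* shift larger elements one place to the right */
            j--;
        }
        A[j + 1] = key;
    }
    for (i = 0; i < 6; i++)
        printf("%d ", A[i]);        /* prints 1 2 3 4 5 6 */
    return 0;
}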
Complexity of Insertion Sort
The number f(n) of comparisons in the insertion sort algorithm can be easily computed. First of
all, the worst case occurs when the array A is in reverse order and the inner loop must use the
maximum number K-1 of comparisons to place the K-th element. Hence
f(n) = 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2)
Selection Sort
Selection sort is conceptually the simplest sorting algorithm. This algorithm first finds the
smallest element in the array and exchanges it with the element in the first position, then finds
the second smallest element and exchanges it with the element in the second position, and
continues in this way until the entire array is sorted.
//CODE
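A minimal sketch of selection sort as described above, assuming an int array. The function name selection_sort and the returned comparison count are assumptions added for illustration; the count is not needed for sorting.

```c
/* Sorts a[0..n-1] in ascending order and returns the number of key
   comparisons performed (hypothetical helper, for illustration). */
long selection_sort(int a[], int n)
{
    int i, j, min, tmp;
    long comparisons = 0;
    for (i = 0; i < n - 1; i++) {
        min = i;                        /* index of the smallest element seen in a[i..n-1] */
        for (j = i + 1; j < n; j++) {
            comparisons++;
            if (a[j] < a[min])
                min = j;
        }
        if (min != i) {                 /* move the smallest element into position i */
            tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }
    return comparisons;
}
```

For n = 6 the function performs 5 + 4 + 3 + 2 + 1 = 15 comparisons whatever the input order.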
The number of comparisons in the selection sort algorithm is independent of the original order of
the elements. That is, there are n-1 comparisons during PASS 1 to find the smallest element, n-2
comparisons during PASS 2 to find the second smallest element, and so on. Accordingly
f(n) = (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = O(n^2)
Bubble Sort
Bubble Sort is an algorithm used to sort N elements held in memory, for example an array with N
elements. It repeatedly compares adjacent elements and swaps them according to their values.
It is called Bubble Sort because with each pass the smaller elements bubble up towards the front
of the list, just as a water bubble rises to the surface, while the largest unsorted element
settles at the end.
Sorting takes place by stepping through the data items in adjacent pairs, comparing each pair, and
swapping every pair that is out of order.
Let us take the array of numbers "5 1 4 2 8" and sort it from lowest number to greatest number
using bubble sort. In each step one adjacent pair is compared, working from left to right. Three
passes will be required.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, the algorithm compares the first two elements, and swaps
since 5 > 1.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), the
algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), Swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now the array is already sorted, but the algorithm does not know whether it is finished. It
needs one whole pass without any swap to know the array is sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
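1. Repeat Steps 2 and 3 for k = 1 to N-1
2. Set ptr=1
3. Repeat while ptr <= N-k
a) If A[ptr] > A[ptr+1] Then
Interchange A[ptr] and A[ptr+1]
[End of If]
b) ptr=ptr+1
[End of Step 3 loop]
[End of Step 1 loop]
4. Exit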
//CODE
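A minimal sketch of bubble sort with an early-exit flag, assuming an int array. The function name bubble_sort and the returned pass count are assumptions for illustration; i, j and flag follow the names used in the surrounding text.

```c
/* Sorts a[0..n-1] in ascending order and returns the number of passes made. */
int bubble_sort(int a[], int n)
{
    int i, j, tmp, flag, passes = 0;
    for (i = 0; i < n - 1; i++) {
        flag = 0;                           /* no swap seen yet in this pass */
        passes++;
        for (j = 0; j < n - 1 - i; j++) {   /* the last i elements are already in place */
            if (a[j] > a[j + 1]) {
                tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
                flag = 1;                   /* a swap happened */
            }
        }
        if (flag == 0)                      /* a whole pass without swaps: sorted */
            break;
    }
    return passes;
}
```

On the array 5 1 4 2 8 this makes three passes, the third being the swap-free pass that confirms the array is sorted.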
In the above code, if no swapping takes place in a complete cycle of the j iteration (inner for
loop) and flag remains 0, then we break out of the for loops, because the array has already
been sorted.
In Bubble Sort, n-1 comparisons are done in the 1st pass, n-2 in the 2nd pass, n-3 in the 3rd pass
and so on. So in the worst case the total number of comparisons will be
f(n) = (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = O(n^2)
Quick Sort
Quick Sort, as the name suggests, sorts a list very quickly in practice. Quick sort is not a
stable sort, but it is very fast and requires very little additional space. It is based on the
rule of Divide and Conquer and is also called partition-exchange sort. The algorithm divides the
list into three main parts:
· Elements less than the pivot element
· The pivot element
· Elements greater than the pivot element
In the example below, 25 is taken as the pivot. After the first pass, the list looks like this.
6 8 17 14 25 63 37 52
Hence after the first pass the pivot is in its final position, with all the elements smaller than
it on its left and all the elements larger than it on its right. Now 6 8 17 14 and 63 37 52 are
treated as two separate lists, the same logic is applied to each, and we keep doing this until the
complete list is sorted.
QUICKSORT (A, p, r)
1 if p < r
2     then q ← PARTITION (A, p, r)
3          QUICKSORT (A, p, q - 1)
4          QUICKSORT (A, q + 1, r)
The key to the algorithm is the PARTITION procedure, which rearranges the subarray A[p..r] in
place.
PARTITION (A, p, r)
1 x ← A[r]
2 i ← p - 1
3 for j ← p to r - 1
4     do if A[j] ≤ x
5         then i ← i + 1
6              exchange A[i] ↔ A[j]
7 exchange A[i + 1] ↔ A[r]
8 return i + 1
//CODE
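A minimal sketch of QUICKSORT and PARTITION in C, assuming an int array and 0-based, inclusive indices p and r. The last element A[r] is the pivot, as in the pseudocode; the helper swap_int is an assumption added for readability.

```c
/* Hypothetical helper: exchanges two ints. */
static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Rearranges a[p..r] around the pivot a[r] and returns the pivot's final index. */
int partition(int a[], int p, int r)
{
    int x = a[r];           /* pivot */
    int i = p - 1;          /* a[p..i] holds the elements <= pivot found so far */
    int j;
    for (j = p; j <= r - 1; j++) {
        if (a[j] <= x) {
            i++;
            swap_int(&a[i], &a[j]);
        }
    }
    swap_int(&a[i + 1], &a[r]);     /* put the pivot between the two parts */
    return i + 1;
}

/* Sorts a[p..r] in ascending order. */
void quicksort(int a[], int p, int r)
{
    if (p < r) {
        int q = partition(a, p, r);
        quicksort(a, p, q - 1);
        quicksort(a, q + 1, r);
    }
}
```

A whole array of n elements is sorted with quicksort(a, 0, n - 1).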
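The worst case occurs when the list is already sorted and the first element is chosen as the
pivot (the same happens when the last element is the pivot, as in PARTITION above). Then the first
element will require n comparisons to recognize that it remains in the first position.
Furthermore, the first sublist will be empty, but the second sublist will have n-1 elements.
Accordingly, the second element requires n-1 comparisons to recognize that it remains in the
second position, and so on. Consequently, there will be a total of
f(n) = n + (n-1) + ... + 2 + 1 = n(n+1)/2 = O(n^2)
comparisons.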
Merge Sort
Merge Sort follows the rule of Divide and Conquer. It does not stop after dividing the list into
two halves: the unsorted list is divided into N sub lists, each having one element, because a list
of one element is considered sorted. It then repeatedly merges these sub lists to produce new
sorted sub lists, until at last one sorted list is produced.
Merge Sort is quite fast, with a time complexity of O(n log n). It is also a stable sort, which
means that equal elements appear in the sorted list in the same order as in the input.
Suppose the array A contains 8 elements. Each pass of the merge-sort algorithm will start at the
beginning of the array A and merge pairs of sorted subarrays as follows.
PASS 1. Merge each pair of elements to obtain the list of sorted pairs.
PASS 2. Merge each pair of pairs to obtain the list of sorted quadruplets.
PASS 3. Merge the two sorted quadruplets to obtain the single sorted array.
With 8 = 2^3 elements three passes suffice; in general about log2 n passes are needed.
while(i <= q)
{
b[k++] = a[i++];
}
while(j <= r)
{
b[k++] = a[j++];
}
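The fragment above is the tail of a merge step. A complete version might look like the following minimal sketch, which reuses the names a, b, i, j, k, p, q and r from the fragment. The function names merge and merge_sort, the use of malloc for the temporary array b, and the recursive top-down driver are assumptions, not the original listing.

```c
#include <stdlib.h>
#include <string.h>

/* Merges the sorted runs a[p..q] and a[q+1..r] into one sorted run a[p..r]. */
void merge(int a[], int p, int q, int r)
{
    int n = r - p + 1;
    int i = p, j = q + 1, k = 0;
    int *b = malloc((size_t)n * sizeof *b);     /* temporary output buffer */
    if (b == NULL)
        abort();
    while (i <= q && j <= r) {
        if (a[i] <= a[j])           /* <= keeps equal elements in input order */
            b[k++] = a[i++];
        else
            b[k++] = a[j++];
    }
    while (i <= q)                  /* copy what is left of the first run */
        b[k++] = a[i++];
    while (j <= r)                  /* copy what is left of the second run */
        b[k++] = a[j++];
    memcpy(&a[p], b, (size_t)n * sizeof *b);
    free(b);
}

/* Sorts a[p..r] in ascending order. */
void merge_sort(int a[], int p, int r)
{
    if (p < r) {
        int q = p + (r - p) / 2;
        merge_sort(a, p, q);
        merge_sort(a, q + 1, r);
        merge(a, p, q, r);
    }
}
```

A whole array of n elements is sorted with merge_sort(a, 0, n - 1).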
Let f(n) denote the number of comparisons needed to sort an n-element array A using the merge-sort
algorithm. The algorithm requires at most log2 n passes (rounded up). Each pass merges a total of
n elements, so each pass requires at most n comparisons. Thus for both the worst and the average
case
f(n) ≤ n log2 n
Thus the time complexity of Merge Sort is O(n log n) in all 3 cases (worst, average and best), as
merge sort always divides the array into two halves and takes linear time to merge two halves.
Heap Sort
Heap Sort is one of the best sorting methods, being in-place and with no quadratic worst-case
scenarios. The heap sort algorithm is divided into two basic parts:
· Creating a Heap of the unsorted list.
· Creating a sorted array by repeatedly removing the largest/smallest element from the heap and
inserting it into the array. The heap is reconstructed after each removal.
What is a Heap?
A heap is a special tree-based data structure that satisfies the following heap properties:
Shape Property: A heap is always a Complete Binary Tree, which means all levels of the tree are
fully filled, except possibly the last, which is filled from left to right.
Heap Property: Every node is either greater than or equal to each of its children, or less than or
equal to each of its children. If the parent nodes are greater than or equal to their children,
the heap is called a Max-Heap, and if the parent nodes are smaller than or equal to their
children, it is called a Min-Heap.
Initially on receiving an unsorted list, the first step in heap sort is to create a Heap data structure
(Max-Heap or Min-Heap). Once heap is built, the first element of the Heap is either largest or
smallest (depending upon Max-Heap or Min-Heap), so we put the first element of the heap in our
array. Then we again make heap using the remaining elements, to again pick the first element of
the heap and put it into the array. We keep on doing the same repeatedly until we have the
complete sorted list in our array.
Heap Sort Algorithm
• HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← length[A] downto 2
3.     do exchange A[1] ↔ A[i]
4.        heap-size[A] ← heap-size[A] – 1
5.        MAX-HEAPIFY(A, 1)
• BUILD-MAX-HEAP(A)
1. heap-size[A] ← length[A]
2. for i ← ⌊length[A]/2⌋ downto 1
3.     do MAX-HEAPIFY(A, i)
• MAX-HEAPIFY(A, i)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l ≤ heap-size[A] and A[l] > A[i]
4.     then largest ← l
5.     else largest ← i
6. if r ≤ heap-size[A] and A[r] > A[largest]
7.     then largest ← r
8. if largest ≠ i
9.     then exchange A[i] ↔ A[largest]
10.         MAX-HEAPIFY(A, largest)
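//CODE
In the code below, main() calls the heapsort() function, which calls buildmaxheap() to build
the heap, which in turn uses maxheap().
#include <stdio.h>

/* heapsort() is defined elsewhere in the program; this signature is assumed from the call below */
void heapsort(int a[], int size);

int main(void)
{
    int a[10], i, size;
    printf("Enter size of list: ");     /* at most 10, because the array holds 10 elements */
    if (scanf("%d", &size) != 1 || size < 1 || size > 10)
        return 1;
    printf("Enter elements: ");
    for (i = 0; i < size; i++)
    {
        scanf("%d", &a[i]);
    }
    heapsort(a, size);
    for (i = 0; i < size; i++)
        printf("%d ", a[i]);            /* show the sorted list */
    return 0;
}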
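The three functions named above are not shown in the listing. A minimal sketch follows, assuming 0-based arrays (so the children of index i are 2i+1 and 2i+2, unlike the 1-based pseudocode) and assumed parameter lists, since the text gives only the call heapsort(a, size).

```c
/* Restores the max-heap property for the subtree rooted at index i,
   assuming both subtrees below i are already max-heaps.
   size is the number of elements currently in the heap. */
void maxheap(int a[], int i, int size)
{
    int l = 2 * i + 1;      /* left child in a 0-based array */
    int r = 2 * i + 2;      /* right child */
    int largest = i;
    int tmp;
    if (l < size && a[l] > a[largest])
        largest = l;
    if (r < size && a[r] > a[largest])
        largest = r;
    if (largest != i) {
        tmp = a[i];
        a[i] = a[largest];
        a[largest] = tmp;
        maxheap(a, largest, size);      /* keep sifting the moved value down */
    }
}

/* Turns a[0..size-1] into a max-heap, working up from the last parent. */
void buildmaxheap(int a[], int size)
{
    int i;
    for (i = size / 2 - 1; i >= 0; i--)
        maxheap(a, i, size);
}

/* Sorts a[0..size-1] in ascending order. */
void heapsort(int a[], int size)
{
    int i, tmp;
    buildmaxheap(a, size);
    for (i = size - 1; i >= 1; i--) {
        tmp = a[0];         /* move the current maximum to the end */
        a[0] = a[i];
        a[i] = tmp;
        maxheap(a, 0, i);   /* the heap now holds the first i elements */
    }
}
```

The array is sorted in place: each removed maximum is stored in the slot freed at the end of the shrinking heap, so no second array is needed.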
The heap sort algorithm is applied to an array A with n elements. The algorithm has two phases,
and we analyze the complexity of each phase separately.
Phase 1. Suppose H is a heap. The number of comparisons to find the appropriate place of a new
element item in H cannot exceed the depth of H. Since H is a complete tree, its depth is bounded
by log2 m, where m is the number of elements in H. Accordingly, the total number g(n) of
comparisons to insert the n elements of A into H is bounded as
g(n) ≤ n log2 n
Phase 2. Suppose H is a complete tree with m elements, the left and right subtrees of H are heaps,
and L is the root of H. Reheaping uses 4 comparisons to move the node L one step down the tree H.
Since the depth cannot exceed log2 m, reheaping uses at most 4 log2 m comparisons to find the
appropriate place of L in the tree H. Because the n elements of A are removed from H one at a
time, with one reheaping per removal, the total number h(n) of comparisons in this phase is
bounded as
h(n) ≤ 4n log2 n
Since each phase requires time proportional to n log2 n, the running time to sort an n-element
array A is O(n log2 n).
Radix Sort
The idea is to consider the key one character at a time and to divide the entries, not into two sub
lists, but into as many sub lists as there are possibilities for the given character from the key. If
our keys, for example, are words or other alphabetic strings, then we divide the list into 26 sub
lists at each stage. That is, we set up a table of 26 lists and distribute the entries into the lists
according to one of the characters in the key.
A person sorting words by this method might first distribute the words into 26 lists according to
the initial letter (or distribute punched cards into 12 piles), then divide each of these sub lists into
further sub lists according to the second letter, and so on. The following idea eliminates this
multiplicity of sub lists: Partition the items into the table of sub lists first by the least significant
position, not the most significant. After this first partition, the sub lists from the table are put
back together as a single list, in the order given by the character in the least significant position.
The list is then partitioned into the table according to the second least significant position and
recombined as one list. When, after repetition of these steps, the list has been partitioned by the
most significant place and recombined, it will be completely sorted. This process is illustrated by
sorting the list of nine three-letter words below.
Radix Sort Algorithm
Radixsort(A,d)
1. For i←1 to d
2. Do use a stable sort to sort array A on digit i
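A minimal sketch of Radixsort(A,d) for non-negative decimal integers, using counting sort as the stable sort on each digit. The restriction to non-negative ints in base 10, the limit d <= 9 and the names radix_sort and counting_sort_by_digit are assumptions for illustration; the text itself describes the method for alphabetic keys.

```c
#include <stdlib.h>

/* Stable counting sort of a[0..n-1] on one decimal digit.
   exp selects the digit: 1 = units, 10 = tens, 100 = hundreds, ...
   out is a scratch array with room for n ints. */
static void counting_sort_by_digit(int a[], int n, int exp, int out[])
{
    int count[10] = {0};
    int i;
    for (i = 0; i < n; i++)             /* how many keys have each digit value */
        count[(a[i] / exp) % 10]++;
    for (i = 1; i < 10; i++)            /* running totals give end positions */
        count[i] += count[i - 1];
    for (i = n - 1; i >= 0; i--) {      /* scan backwards to keep the sort stable */
        int digit = (a[i] / exp) % 10;
        out[--count[digit]] = a[i];
    }
    for (i = 0; i < n; i++)
        a[i] = out[i];
}

/* Sorts n non-negative ints of at most d decimal digits (d <= 9 assumed). */
void radix_sort(int a[], int n, int d)
{
    int pass, exp = 1;
    int *out;
    if (n <= 0)
        return;
    out = malloc((size_t)n * sizeof *out);
    if (out == NULL)
        abort();
    for (pass = 1; pass <= d; pass++) { /* least significant digit first */
        counting_sort_by_digit(a, n, exp, out);
        if (pass < d)
            exp *= 10;
    }
    free(out);
}
```

Because every pass is stable, the order established by the less significant digits survives among keys that agree on the current digit, which is why the list is fully sorted after the most significant digit has been processed.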
The list A of n elements A1, A2, ..., An is given. Let d denote the radix (e.g. d=10 for
decimal digits, d=26 for letters and d=2 for bits), and let each item Ai be represented by means
of s of the digits:
Ai = di1 di2 ... dis
The radix sort requires s passes, one for each digit position of an item. (Here d is the radix and
s the number of digit positions, whereas in Radixsort(A,d) above d is the number of digit
positions.) Pass K compares each dik with each of the d digits. Hence the number C(n) of
comparisons satisfies
C(n) ≤ d*s*n
If the file, F, has been sorted so that at the end of the sort P is a pointer to the first record in a
linked list of records then each record in this list will have a key which is greater than or equal to
the key of the previous record (if there is a previous record). To physically rearrange these
records into the order specified by the list, we begin by interchanging records R1 and RP. Now,
the record in the position R1 has the smallest key. If P≠1 then there is some record in the list with
link field = 1. If we could change this link field to indicate the new position of the record
previously at position 1 then we would be left with records R2, ..., Rn linked together in
nondecreasing order. Repeating the above process will, after n - 1 iterations, result in the
desired rearrangement.
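The rearrangement described above can be sketched as follows. This is a simplified illustration, not an efficient implementation: it assumes 0-based positions, parallel arrays key[] and link[] with -1 marking the end of the list, and it finds the record whose link field must change by walking the remaining list. The function name rearrange is hypothetical.

```c
/* key[] and link[] describe n records; p is the position of the record with
   the smallest key, and link[] chains the records in nondecreasing key order.
   On return key[0..n-1] is physically in nondecreasing order. */
void rearrange(int key[], int link[], int n, int p)
{
    int i, q, t, tmp;
    for (i = 0; i < n - 1; i++) {
        /* positions 0..i-1 are final; p heads the list of the remaining records */
        q = link[p];                    /* successor of the smallest remaining record */
        if (p != i) {
            /* interchange records i and p, link fields included */
            tmp = key[i];  key[i] = key[p];   key[p] = tmp;
            tmp = link[i]; link[i] = link[p]; link[p] = tmp;
            /* the record that was at position i now lives at position p,
               so the one link field that pointed to i must point to p */
            if (q == i) {
                q = p;
            } else {
                for (t = q; t != -1; t = link[t]) {
                    if (link[t] == i) {
                        link[t] = p;
                        break;
                    }
                }
            }
        }
        p = q;                          /* head of the list of remaining records */
    }
}
```

Walking the list to find the link field makes this sketch quadratic in the worst case; it is meant only to show which link changes at each of the n - 1 iterations.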
Binary Search Tree
A binary search tree (BST) is a binary tree with the following properties:
· The left subtree of a node contains only nodes with keys less than the node's key.
· The right subtree of a node contains only nodes with keys greater than the node's key.
· The left and right subtree each must also be a binary search tree.
· Each node can have up to two successor nodes.
· There must be no duplicate nodes.
· A unique path exists from the root to every other node.
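The properties listed above translate directly into insertion and search routines. The following is a minimal sketch assuming integer keys; struct node and the function names bst_insert and bst_search are illustrative assumptions, and duplicate keys are ignored in line with the "no duplicate nodes" property.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left;      /* subtree with keys less than key */
    struct node *right;     /* subtree with keys greater than key */
};

/* Inserts key into the tree rooted at root and returns the (possibly new) root. */
struct node *bst_insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            abort();
        n->key = key;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    /* key == root->key: duplicate, nothing is inserted */
    return root;
}

/* Returns the node holding key, or NULL if key is not in the tree. */
struct node *bst_search(struct node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

Each step of a search or insertion descends one level, so the cost is proportional to the height of the tree.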
The major advantage of binary search trees over other data structures is that the related sorting
algorithms and search algorithms such as in-order traversal can be very efficient. The other
advantages are:
· Binary Search Tree is fast in insertion and deletion etc. when balanced.
· Very efficient and its code is easier than other data structures.
· Stores keys in the nodes in a way that searching, insertion and deletion can be done
efficiently.
· Implementation is very simple in Binary Search Trees.
· Nodes in tree are dynamic in nature.
The main disadvantages are:
· The shape of the binary search tree depends entirely on the order of insertions, and it can
degenerate (for example into a chain when the keys are inserted in sorted order).
· When inserting or searching for an element in a binary search tree, the key of each visited
node has to be compared with the key of the element to be inserted or found, so a search can
take a long time when the tree is tall.
· The keys in the binary search tree may be long, which increases the cost of each comparison and
therefore the run time.
Insertion in BST
Deletion in BST
Consider the BST shown below. First the element 4 is deleted, then 10 is deleted, and after
that 27 is deleted from the BST.