DS Module-3
Sorting, Searching and Hash Tables

MODULE-3 (8 Hours)

Searching Techniques: Linear Search, Binary Search and Fibonacci Search.
Sorting Techniques: Selection Sort, Insertion Sort, Bubble Sort, Merge Sort, Quick Sort, Heap Sort, Radix Sort.
Hash Tables: Hash Functions, Collision Handling Schemes, Applications.
Linear Search Algorithm

• Linear Search is defined as a sequential search algorithm that starts at one end and goes through each element of a list until the desired element is found; otherwise the search continues till the end of the data set.
How Does Linear Search Algorithm Work?
• In Linear Search Algorithm,
• Every element is considered as a potential match for
the key and checked for the same.
• If any element is found equal to the key, the search is
successful and the index of that element is returned.
• If no element is found equal to the key, the search
yields “No match found”.
For example: Consider the array arr[] = {10, 50, 30, 70, 80, 20, 90, 40} and key = 30
Step 1: Start from the first element (index 0) and compare key with each element (arr[i]).
Step 2: Now when comparing arr[2] with key, the value matches.
So the Linear Search Algorithm will yield a successful message
and return the index of the element when key is found (here 2).
Implementation of Linear Search Algorithm:
Below is the implementation of the linear search algorithm:

// C code to linearly search x in arr[].
#include <stdio.h>

int search(int arr[], int N, int x)
{
    for (int i = 0; i < N; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

// Driver code
int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int x = 10;
    int N = sizeof(arr) / sizeof(arr[0]);

    // Function call
    int result = search(arr, N, x);
    (result == -1)
        ? printf("Element is not present in array")
        : printf("Element is present at index %d", result);
    return 0;
}

Output: Element is present at index 3
• The search function is defined to perform the linear search. It takes three arguments:
• arr[]: An integer array in which the search is performed.
• N: The number of elements in the array.
• x: The element to be searched for.
• Inside the search function, a for loop iterates through the array elements from index 0 to N-1.
• Inside the loop, an if statement checks whether the current element (arr[i]) is equal to the target element (x).
• If the condition is met (i.e., arr[i] == x), the function returns the index i, indicating that the element was found at that index.
• If the loop completes without finding the element (x), the function returns -1, indicating that the element is not present in the array.
• In the main function: An array arr[] is defined with some sample elements.
• The target element x is set to 10. The variable N is calculated as the number of elements in the array using sizeof(arr) / sizeof(arr[0]), which is a common way to determine the size of an array in C.
• The search function is called with the array, its size, and the target element, and the result is stored in the result variable.
• Finally, the code checks the value of result: if result is -1, it means the element was not found in the array, and it prints "Element is not present in array".
• If result is not -1, it means the element was found, and it prints "Element is present at index result", where result is the index at which the element was found.
• So, when you run this code with x set to 10, it prints "Element is present at index 3", indicating that the element 10 was found at index 3 in the array.
Complexity Analysis of Linear Search:
• Time Complexity:
• Best Case: In the best case, the key might be present at the first index, so the best-case complexity is O(1).
• Worst Case: In the worst case, the key might be present at the last index, i.e., at the end opposite to where the search started. So the worst-case complexity is O(N), where N is the size of the list.
• Average Case: O(N)
• Auxiliary Space: O(1), since apart from the variable used to iterate through the list, no other variable is used.
Advantages of Linear Search:
• Linear search can be used irrespective of whether the
array is sorted or not. It can be used on arrays of any
data type.
• Does not require any additional memory.
• It is a well-suited algorithm for small datasets.
Drawbacks of Linear Search:
• Linear search has a time complexity of O(N), which in
turn makes it slow for large datasets.
• Not suitable for large arrays.
When to use Linear Search?
• When we are dealing with a small dataset.
• When you are searching for a dataset stored in
contiguous memory.
Binary Search Algorithm
• Searching is the process of finding some particular
element in the list.
• If the element is present in the list, then the process is
called successful, and the process returns the location
of that element.
• Otherwise, the search is called unsuccessful.
• Binary search is the search technique that works efficiently on
sorted lists.
• Hence, to search for an element in some list using the binary search technique, we must ensure that the list is sorted.
• Binary search follows the divide and conquer approach in which
the list is divided into two halves, and the item is compared with
the middle element of the list.
• If the match is found then, the location of the middle element is
returned.
• Otherwise, we search into either of the halves depending upon
the result produced through the match.
Binary_Search(a, lower_bound, upper_bound, val)
// 'a' is the given array, 'lower_bound' is the index of the first array element,
// 'upper_bound' is the index of the last array element, 'val' is the value to search

Step 1: set beg = lower_bound, end = upper_bound, pos = -1
Step 2: repeat steps 3 and 4 while beg <= end
Step 3:     set mid = (beg + end)/2
Step 4:     if a[mid] = val
                set pos = mid
                print pos
                go to step 6
            else if a[mid] > val
                set end = mid - 1
            else
                set beg = mid + 1
            [end of if]
        [end of loop]
Step 5: if pos = -1
            print "value is not present in the array"
        [end of if]
Step 6: exit
Working of Binary Search

There are two methods to implement the binary search algorithm:
• Iterative method
• Recursive method
The recursive method of binary search follows the divide and conquer approach (an iterative sketch is also given after the recursive program below).
Let the elements of the array be a sorted list of nine elements (indices 0 to 8).
• Let the element to search be K = 56.
• We use the formula mid = (beg + end)/2 to calculate the mid of the array.
In the given array,
• beg = 0
• end = 8
• mid = (0 + 8)/2 = 4. So, 4 is the mid of the array.
• The element at index 4 matches K, so the element to search is found.
• The algorithm will return the index of the matched element.
Binary Search Complexity
1. Time Complexity
• Best Case Complexity - In binary search, the best case occurs when the element to search is found in the first comparison, i.e., when the first middle element itself is the element to be searched. The best-case time complexity of binary search is O(1).
• Average Case Complexity - The average-case time complexity of binary search is O(log n).
• Worst Case Complexity - In binary search, the worst case occurs when we have to keep reducing the search space until it has only one element. The worst-case time complexity of binary search is O(log n).
2. Space Complexity
• The iterative version of binary search uses O(1) auxiliary space; the recursive version uses O(log n) space for the call stack.
Program: Write a program to implement
Binary search.
#include <stdio.h>

int binarySearch(int a[], int beg, int end, int val)
{
    int mid;
    if (end >= beg) {
        mid = (beg + end) / 2;
        /* if the item to be searched is present at middle */
        if (a[mid] == val) {
            return mid + 1;   /* 1-based position */
        }
        /* if the item to be searched is greater than middle, then it can only be in the right subarray */
        else if (a[mid] < val) {
            return binarySearch(a, mid + 1, end, val);
        }
        /* if the item to be searched is smaller than middle, then it can only be in the left subarray */
        else {
            return binarySearch(a, beg, mid - 1, val);
        }
    }
    return -1;
}

int main()
{
    int a[] = {11, 14, 25, 30, 40, 41, 52, 57, 70}; // given array
    int val = 40;                      // value to be searched
    int n = sizeof(a) / sizeof(a[0]);  // size of array
    int res = binarySearch(a, 0, n - 1, val); // store result
    printf("The elements of the array are - ");
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\nElement to be searched is - %d", val);
    if (res == -1)
        printf("\nElement is not present in the array");
    else
        printf("\nElement is present at %d position of array", res);
    return 0;
}
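The program above implements the recursive method. For completeness, here is a minimal iterative sketch of the same search; the function name binarySearchIter is illustrative, and it returns a 0-based index rather than the 1-based position used above.

/* Iterative binary search: returns the 0-based index of val
   in the sorted array a[0..n-1], or -1 if it is absent. */
int binarySearchIter(int a[], int n, int val)
{
    int beg = 0, end = n - 1;
    while (beg <= end) {
        int mid = (beg + end) / 2;
        if (a[mid] == val)
            return mid;        /* match found */
        else if (a[mid] < val)
            beg = mid + 1;     /* search the right half */
        else
            end = mid - 1;     /* search the left half */
    }
    return -1;                 /* not present */
}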
Fibonacci Search

• Given a sorted array arr[] of size n and an element x to be searched in it. Return index of x if it is
present in array else return -1.
Examples:
• Input: arr[] = {2, 3, 4, 10, 40}, x = 10
Output: 3
Element x is present at index 3.
• Input: arr[] = {2, 3, 4, 10, 40}, x = 11
Output: -1
Element x is not present.
• Fibonacci Search is a comparison-based technique that uses Fibonacci numbers to search an
element in a sorted array.
• Background:
Fibonacci Numbers are recursively defined as F(n) = F(n-1) + F(n-2), F(0) = 0, F(1) = 1. First few
Fibonacci Numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …
Observations:
The observations below are used for range elimination, and hence for the O(log n) complexity:

• F(n - 2) ≈ (1/3) * F(n), and
• F(n - 1) ≈ (2/3) * F(n).


Algorithm:
Let the searched element be x.
The idea is to first find the smallest Fibonacci number that is greater than or equal to the length of the given array. Let the found Fibonacci number be fib (the m'th Fibonacci number). We use the (m-2)'th Fibonacci number as the index (if it is a valid index). Let the (m-2)'th Fibonacci number be i; we compare arr[i] with x. If they are equal, we return i. Else if x is greater, we recur for the subarray after i; else we recur for the subarray before i.
• Below is the complete algorithm
Let arr[0..n-1] be the input array and the element to be searched be x.

• Find the smallest Fibonacci Number greater than or equal to n. Let this number be fibM [m’th Fibonacci Number]. Let the two
Fibonacci numbers preceding it be fibMm1 [(m-1)’th Fibonacci Number] and fibMm2 [(m-2)’th Fibonacci Number].

• While the array has elements to be inspected:

• Compare x with the last element of the range covered by fibMm2

• If x matches, return index

• Else if x is less than the element, move the three Fibonacci variables two Fibonacci down, indicating elimination of approximately the rear two-thirds of the remaining array.

• Else (x is greater than the element), move the three Fibonacci variables one Fibonacci down and reset offset to index. Together these indicate the elimination of approximately the front one-third of the remaining array.

• Since there might be a single element remaining for comparison, check if fibMm1 is 1. If yes, compare x with that remaining element, and if it matches, return its index. A C sketch of this procedure is given below.
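Since the notes give only the algorithm, the following is a minimal C sketch of it; the function name fibonacciSearch and the min helper are illustrative.

#include <stdio.h>

static int min(int x, int y) { return (x <= y) ? x : y; }

/* Returns the index of x in sorted arr[0..n-1], or -1 if absent. */
int fibonacciSearch(int arr[], int x, int n)
{
    int fibMm2 = 0;              /* (m-2)'th Fibonacci number */
    int fibMm1 = 1;              /* (m-1)'th Fibonacci number */
    int fibM = fibMm2 + fibMm1;  /* m'th Fibonacci number */

    /* find the smallest Fibonacci number greater than or equal to n */
    while (fibM < n) {
        fibMm2 = fibMm1;
        fibMm1 = fibM;
        fibM = fibMm2 + fibMm1;
    }

    int offset = -1;             /* eliminated range in front of the array */

    while (fibM > 1) {
        int i = min(offset + fibMm2, n - 1);
        if (arr[i] < x) {        /* eliminate the front one-third */
            fibM = fibMm1;       /* move one Fibonacci down */
            fibMm1 = fibMm2;
            fibMm2 = fibM - fibMm1;
            offset = i;
        } else if (arr[i] > x) { /* eliminate the rear two-thirds */
            fibM = fibMm2;       /* move two Fibonacci down */
            fibMm1 = fibMm1 - fibMm2;
            fibMm2 = fibM - fibMm1;
        } else {
            return i;            /* element found */
        }
    }

    /* a single element may remain for comparison */
    if (fibMm1 == 1 && offset + 1 < n && arr[offset + 1] == x)
        return offset + 1;

    return -1;                   /* element not present */
}

int main(void)
{
    int arr[] = {2, 3, 4, 10, 40};
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("%d\n", fibonacciSearch(arr, 10, n));  /* prints 3 */
    printf("%d\n", fibonacciSearch(arr, 11, n));  /* prints -1 */
    return 0;
}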


7.2. Insertion Sort
7.2.1. The Algorithm
• One of the simplest sorting algorithms is the insertion sort.
• Insertion sort consists of n - 1 passes.
• For pass p = 2 through n, insertion sort ensures that the
elements in positions 1 through p are in sorted order.
• Insertion sort makes use of the fact that elements in
positions 1 through p - 1 are already known to be in sorted
order.
Figure 7.1 shows a sample file after each pass of insertion sort, and illustrates the general strategy.
• In pass p, we move the pth element left until its correct place is found among the first p elements.
• The code in Figure 7.2 implements this strategy (a sketch is given after the discussion below).
1. A sentinel value (usually a minimum or maximum value, depending on the sorting order) is placed in the first position of the array. This sentinel ensures that the inner loop won't go beyond the beginning of the array.
2. The algorithm iterates through the array starting from the second element (index 2) up to the last element (index n).
3. The current element being considered for insertion is stored in the variable tmp.
4. A backward loop (j) starts from the current index (p) and compares tmp with the previous element (a[j-1]). If the previous element is greater than tmp, it is shifted one position to the right.
5. Once the correct position for tmp is found within the sorted part of the array, the inner loop exits.
6. The element tmp is then inserted into the correct position in the sorted part of the array.
This process continues until all elements are correctly positioned within the sorted array.
Keep in mind that the code assumes that MIN_DATA is a predefined constant representing the smallest possible value for the input_type data type.
Note that this indexing applies only to the sentinel version, in which the data occupy positions 1 through n and a[0] holds the sentinel. In a conventional 0-based implementation without a sentinel, the outer loop would instead run from p = 1 while p < n.
• The sentinel in a[0] terminates the while loop in the event
that in some pass an element is moved all the way to the
front.
• Lines 3 through 6 implement that data movement without
the explicit use of swaps.
• The element in position p is saved in tmp, and all larger
elements (prior to position p) are moved one spot to the
right.
• Then tmp is placed in the correct spot.
• This is the same technique that was used in the
implementation of binary heaps.
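Figure 7.2 itself is not reproduced in these notes; the following is a minimal sketch of the sentinel version described above, assuming input_type is int and MIN_DATA is INT_MIN.

#include <stdio.h>
#include <limits.h>

typedef int input_type;
#define MIN_DATA INT_MIN     /* sentinel: smaller than any real key */

/* Sorts a[1..n]; position a[0] is reserved for the sentinel. */
void insertion_sort(input_type a[], unsigned int n)
{
    unsigned int j, p;
    input_type tmp;

    a[0] = MIN_DATA;               /* the sentinel stops the inner loop */
    for (p = 2; p <= n; p++) {     /* passes 2 through n */
        tmp = a[p];                /* save the element to insert */
        for (j = p; tmp < a[j - 1]; j--)
            a[j] = a[j - 1];       /* shift larger elements one spot right */
        a[j] = tmp;                /* place tmp in its correct spot */
    }
}

int main(void)
{
    /* a[0] is the sentinel slot; the data occupy a[1..6] */
    input_type a[] = {0, 34, 8, 64, 51, 32, 21};
    insertion_sort(a, 6);
    for (unsigned int i = 1; i <= 6; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}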
7.2.2. Analysis of Insertion Sort
• Because of the nested loops, each of which can take n iterations, insertion sort is O(n²).
• A precise calculation shows that the test at line 4 can be executed at most p times for each value of p. Summing over all p gives a total of
    Σ (p = 2 to n) p = 2 + 3 + ... + n = Θ(n²).
• On the other hand, if the input is presorted, the running
time is O(n), because the test in the inner for loop
always fails immediately.
• Indeed, if the input is almost sorted insertion sort will
run quickly.
• Because of this wide variation, it is worth analyzing the
average-case behavior of this algorithm.
• It turns out that the average case is Θ(n²) for insertion sort.
7.3. A Lower Bound for Simple Sorting Algorithms
• An inversion in an array of numbers is any ordered pair (i, j)
having the property that i < j but a[i] > a[j].
• In the example, the input list 34, 8, 64, 51, 32, 21 has nine inversions, namely (34,8), (34,32), (34,21), (64,51), (64,32), (64,21), (51,32), (51,21) and (32,21).
• Notice that this is exactly the number of swaps that needed
to be (implicitly) performed by insertion sort.
• This is always the case, because swapping two adjacent
elements that are out of place removes exactly one
inversion, and a sorted file has no inversions.
• Since there is O(n) other work involved in the
algorithm, the running time of insertion sort is O(I + n),
where I is the number of inversions in the original file.
• Thus, insertion sort runs in linear time if the number of
inversions is O(n).
• We can compute precise bounds on the average running
time of insertion sort by computing the average number
of inversions in a permutation.
• As usual, defining average is a difficult proposition. We
will assume that there are no duplicate elements.
• Using this assumption, we can assume that the input is
some permutation of the first n integers (since only
relative ordering is important) and that all are equally
likely.
• Under these assumptions, we have the following
theorem:
THEOREM 7.1. The average number of inversions in an
array of n distinct numbers is n(n - 1)/4.
PROOF:
• For any list, L, of numbers, consider Lr , the list in
reverse order.
• The reverse list of the example is 21, 32, 51, 64, 34, 8.
• Consider any pair of two numbers in the list (x, y), with
y > x.
• Clearly, in exactly one of L and Lr this ordered pair
represents an inversion.
• The total number of these pairs in a list L and its
reverse Lr is n(n - 1)/2.
• Thus, an average list has half this amount, or n(n -1)/4
inversions.
• This theorem implies that insertion sort is quadratic
on average.
• It also provides a very strong lower bound for any algorithm that only exchanges adjacent elements.
THEOREM 7.2. Any algorithm that sorts by exchanging adjacent elements requires Ω(n²) time on average.
PROOF:
• The average number of inversions is initially n(n - 1)/4 = Ω(n²). Each swap removes only one inversion, so Ω(n²) swaps are required.
• This is an example of a lower-bound proof. It is valid not only for insertion sort, which performs adjacent exchanges implicitly, but also for other simple algorithms such as bubble sort and selection sort. In fact, it is valid over an entire class of sorting algorithms, including those undiscovered, that perform only adjacent exchanges.
• This lower bound shows us that in order for a sorting algorithm to run in subquadratic, or o(n²), time, it must do comparisons and exchanges between elements
that are far apart. A sorting algorithm makes progress
by eliminating inversions, and to run efficiently, it must
eliminate more than just one inversion per exchange.
7.5. Heapsort
• Priority queues can be used to sort in O(n log n) time.
• The algorithm based on this idea is known as heapsort and gives the
best Big-Oh running time we have seen so far.
• As an example, suppose we have a heap with six elements. The first delete_min produces a1.
• Now the heap has only five elements, so we can place a1 in position 6.
• The next delete_min produces a2.
• Since the heap will now only have four elements, we can place a2 in position 5.
• Using this strategy, after the last delete_min the array will contain the
elements in decreasing sorted order.
• If we want the elements in the more typical increasing sorted order,
we can change the ordering property so that the parent has a larger
key than the child. Thus we have a (max)heap.
• In our implementation, we will use a (max)heap, but avoid the actual
ADT for the purposes of speed.
• We then perform n - 1 delete_maxes by swapping the last element in
the heap with the first, decrementing the heap size, and percolating
down.
• When the algorithm terminates, the array contains the elements in
sorted order.
• For instance, consider the input sequence 31, 41, 59, 26, 53, 58, 97.
• The resulting heap is shown in Figure 7.6.
• Figure 7.7 shows the heap that results after the first delete_max. As
the figures imply, the last element in the heap is 31; 97 has been
placed in a part of the heap array that is technically no longer part of
the heap.
• After 5 more delete_max operations, the heap will actually have only
one element, but the elements left in the heap array will be in sorted
order.
The code to perform heapsort is given in
Figure 7.8.
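Figure 7.8 is not reproduced in these notes; the following is a minimal 0-based sketch of the heapsort described above (build a max-heap, then repeatedly swap the maximum to the end and percolate down).

#include <stdio.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Percolate the element at position i down through the
   max-heap stored in a[0..n-1]. */
static void perc_down(int a[], int i, int n)
{
    int child;
    int tmp = a[i];
    for (; 2 * i + 1 < n; i = child) {
        child = 2 * i + 1;                   /* left child */
        if (child + 1 < n && a[child + 1] > a[child])
            child++;                         /* pick the larger child */
        if (tmp < a[child])
            a[i] = a[child];                 /* move the child up */
        else
            break;
    }
    a[i] = tmp;
}

void heapsort(int a[], int n)
{
    int i;
    for (i = n / 2 - 1; i >= 0; i--)   /* build the (max)heap */
        perc_down(a, i, n);
    for (i = n - 1; i > 0; i--) {      /* n - 1 delete_maxes */
        swap(&a[0], &a[i]);            /* move the current max to the end */
        perc_down(a, 0, i);            /* restore the heap on a[0..i-1] */
    }
}

int main(void)
{
    int a[] = {31, 41, 59, 26, 53, 58, 97};  /* the input sequence above */
    int n = sizeof(a) / sizeof(a[0]);
    heapsort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}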
7.6. Mergesort
• Merge sort runs in O(n log n) worst-case running time, and the
number of comparisons used is nearly optimal.
• The fundamental operation in this algorithm is merging two sorted
lists.
• Because the lists are sorted, this can be done in one pass through the
input, if the output is put in a third list.
• The basic merging algorithm takes two input arrays a and b, an output
array c, and three counters, aptr, bptr, and cptr, which are initially set
to the beginning of their respective arrays.
• The smaller of a[aptr] and b[bptr] is copied to the next entry in c, and
the appropriate counters are advanced.
• When either input list is exhausted, the remainder of the other list is
copied to c.
• An example of how the merge routine works is provided for the following input.
If the array a contains 1, 13, 24, 26, and b contains 2, 15, 27, 38, then the algorithm proceeds as follows: First, a comparison is done between 1 and 2; 1 is added to c, and then 13 and 2 are compared. 2 is added to c, and then 13 and 15 are compared. 13 is added to c, and then 24 and 15 are compared. This proceeds until 26 and 27 are compared; 26 is added to c, and the a array is exhausted. The remainder of the b array is then copied to c.
• The time to merge two sorted lists is clearly linear, because at most n
- 1 comparisons are made, where n is the total number of elements.
• The mergesort algorithm is therefore easy to describe. If n = 1, there
is only one element to sort, and the answer is at hand.
• Otherwise, recursively mergesort the first half and the second half.
• This gives two sorted halves, which can then be merged together
using the merging algorithm described above.
• For instance, to sort the eight-element array 24, 13, 26, 1, 2, 27, 38,
15, we recursively sort the first four and last four elements, obtaining
1, 13, 24, 26, 2, 15, 27, 38.
• Then we merge the two halves as above, obtaining the final list 1, 2,
13, 15, 24, 26, 27, 38.
• This algorithm is a classic divide and-conquer strategy.
• The problem is divided into smaller problems and solved recursively.
The conquering phase consists of patching together the answers.
• Divide-and-conquer is a very powerful use of recursion that we will
see many times.
• An implementation of mergesort is provided in Figure 7.9.
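Figure 7.9 is not reproduced in these notes; the following is a minimal 0-based sketch of the mergesort described above (one temporary array is allocated once and shared by all the merges).

#include <stdio.h>
#include <stdlib.h>

/* Merge the sorted runs a[lpos..rpos-1] and a[rpos..right_end]
   into tmp, then copy the merged run back into a. */
static void merge(int a[], int tmp[], int lpos, int rpos, int right_end)
{
    int left_end = rpos - 1;
    int tpos = lpos;
    int num = right_end - lpos + 1;
    int i;

    while (lpos <= left_end && rpos <= right_end)  /* main merge loop */
        tmp[tpos++] = (a[lpos] <= a[rpos]) ? a[lpos++] : a[rpos++];
    while (lpos <= left_end)          /* copy the rest of the first half */
        tmp[tpos++] = a[lpos++];
    while (rpos <= right_end)         /* copy the rest of the second half */
        tmp[tpos++] = a[rpos++];
    for (i = 0; i < num; i++, right_end--)
        a[right_end] = tmp[right_end];
}

static void m_sort(int a[], int tmp[], int left, int right)
{
    if (left < right) {
        int center = (left + right) / 2;
        m_sort(a, tmp, left, center);          /* sort the first half */
        m_sort(a, tmp, center + 1, right);     /* sort the second half */
        merge(a, tmp, left, center + 1, right);
    }
}

void merge_sort(int a[], int n)
{
    int *tmp = malloc(n * sizeof(int));   /* one scratch array suffices */
    if (tmp != NULL) {
        m_sort(a, tmp, 0, n - 1);
        free(tmp);
    }
}

int main(void)
{
    int a[] = {24, 13, 26, 1, 2, 27, 38, 15};  /* the example above */
    int n = sizeof(a) / sizeof(a[0]);
    merge_sort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}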
7.6.1. Analysis of Mergesort
• Mergesort is a classic example of the techniques used to analyze recursive
routines.
• Assume that n is a power of 2, so that we always split into even halves.
• For n = 1, the time to mergesort is constant, which we will denote by 1.
• Otherwise, the time to mergesort n numbers is equal to the time to do two recursive mergesorts of size n/2, plus the time to merge, which is linear. The equations below say this exactly:
    T(1) = 1
    T(n) = 2 T(n/2) + n
• This is a standard recurrence relation, which can be solved several ways. We will show two methods. The first idea is to divide the recurrence relation through by n. The reason for doing this will become apparent soon. This yields
    T(n)/n = T(n/2)/(n/2) + 1
• This equation is valid for any n that is a power of 2, so we may also write
    T(n/2)/(n/2) = T(n/4)/(n/4) + 1
and
    T(n/4)/(n/4) = T(n/8)/(n/8) + 1
    ...
    T(2)/2 = T(1)/1 + 1

Now add up all the equations. This means that we add all of the terms on the left-hand side and set the result equal to the sum of all of the terms on the right-hand side. Observe that the term T(n/2)/(n/2) appears on both sides and thus cancels. In fact, virtually all the terms appear on both sides and cancel. This is called telescoping a sum. After everything is added, the final result is
    T(n)/n = T(1)/1 + log n
• because all of the other terms cancel and there are log n equations, and so all the 1s at the end of these equations add up to log n. Multiplying through by n gives the final answer:
    T(n) = n log n + n = O(n log n)
• Notice that if we did not divide through by n at the start of the
solutions, the sum would not telescope. This is why it was necessary
to divide through by n.
• The analysis can be refined to handle cases when n is not a power of
2.
• The answer turns out to be almost identical (this is usually the case).
• Although merge sort's running time is O(n log n), it is hardly ever used
for main memory sorts.
