UNIT 10 SORTING
Structure
10.0 Introduction
10.1 Objectives
10.2 Internal Sorting
     10.2.1 Insertion Sort
     10.2.2 Bubble Sort
     10.2.3 Quick Sort
     10.2.4 2-way Merge Sort
     10.2.5 Heap Sort
10.3 Sorting on Several Keys
10.4 Summary
10.5 Solutions/Answers
10.6 Further Readings
10.0 INTRODUCTION
Retrieval of information is made easier when it is stored in some predefined order.
Sorting is, therefore, a very important computer application activity. Many
sorting algorithms are available. Different environments require different sorting
methods. Sorting algorithms can be characterised in the following two ways:
1. Simple algorithms, which require of the order of n^2 (written as O(n^2)) comparisons
to sort n items.
2. Sophisticated algorithms, which require O(n log2 n) comparisons to sort n items.
The difference lies in the fact that algorithms of the first kind move data only over small
distances in the process of sorting, whereas those of the second kind move data over large
distances, so that items settle into the proper order sooner, thus resulting in fewer
comparisons. Performance of a sorting algorithm can also depend on the degree of
order already present in the data.
There are two basic categories of sorting methods: Internal Sorting and External
Sorting. Internal sorting is applied when the entire collection of data to be sorted is
small enough so that the sorting can take place within the main memory. The time
required to read or write is not considered to be significant in evaluating the
performance of internal sorting methods. External sorting methods are applied to
larger collections of data which reside on secondary storage devices. Read and write
access times are a major concern in determining the performance of such methods.
In this unit, we will study some methods of internal sorting. The next unit will discuss
methods of external sorting.
10.1 OBJECTIVES
After going through this unit, you should be able to:
10.2 INTERNAL SORTING
In internal sorting, all the data to be sorted is available in the high speed main memory
of the computer. We will study the following methods of internal sorting:
1. Insertion sort
2. Bubble sort
3. Quick sort
4. Two-way Merge sort
5. Heap sort
10.2.1 Insertion Sort
We will illustrate insertion sort with an example (refer to Figure 10.1) before
presenting the formal algorithm.
Example: Sort the following list using the insertion sort method:
Thus, to find the correct position for an item (the target), search the sorted part of the
list till an item greater than the target is found. Shift all the items from this point one
position down the list and insert the target in the vacated slot. Repeat this process for
all the elements in the list. This results in a sorted list.
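The procedure just described can be sketched in C as follows. This is a minimal sketch, assuming the list is an array of int to be sorted into ascending order; the name insertion_sort is illustrative and does not come from the unit. The sketch combines the search and the shift by scanning the sorted part from its right end.

```c
/* Insertion sort: sorts a[0..n-1] into ascending order, in place. */
void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; ++i) {
        int target = a[i];      /* item to be inserted into a[0..i-1] */
        int j = i;
        /* Shift every item greater than the target one place down the list. */
        while (j > 0 && a[j - 1] > target) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = target;          /* insert the target into the vacated slot */
    }
}
```

Because an item is never moved past an equal key, this version keeps equal items in their original relative order.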
10.2.2 Bubble Sort
In this method, adjacent members of the list to be sorted are compared. If the item on
top is greater than the item immediately below it, then they are swapped. This process
is carried on till the list is sorted.
1. Begin
2. Read the n elements into a[1..n]
3. for i = 1 to n-1
       for j = n downto i+1
           if a[j] < a[j-1]
               swap(a[j], a[j-1])
4. End // of Bubble Sort
Total number of comparisons in Bubble sort
= (N-1) + (N-2) + ... + 2 + 1
= (N-1)*N / 2 = O(N^2)
This inefficiency is due to the fact that an item moves only to the next position in each
pass.
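A minimal C sketch of bubble sort follows, assuming an array of int indexed from 0 and ascending order. The name bubble_sort and the comparison counter are illustrative additions; the counter is there only so that the total of (N-1)*N/2 comparisons can be checked.

```c
/* Bubble sort: sorts a[0..n-1] into ascending order and returns the
   number of comparisons that were made. */
long bubble_sort(int a[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n - 1; ++i) {
        /* Each pass bubbles the smallest remaining item up to a[i]. */
        for (int j = n - 1; j > i; --j) {
            ++comparisons;
            if (a[j] < a[j - 1]) {      /* adjacent pair out of order */
                int temp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = temp;
            }
        }
    }
    return comparisons;
}
```

For a list of 5 items the function returns 4 + 3 + 2 + 1 = 10, whatever the initial order of the items.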
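10.2.3 Quick Sort
The basic idea of quick sort is as follows:
1. Choose one item A[I] from the list A[ ] and rearrange the list so that this item is in
   its proper position, i.e., all preceding items have a lesser value and all succeeding
   items have a greater value than this item.
2. A[I] now separates the list into two sublists: sublist1 holds the items preceding A[I]
   and sublist2 holds the items following it.
Repeat steps 1 & 2 for sublist1 & sublist2 till A[ ] is a sorted list.
The rearrangement in step 1 is carried out as follows:
1. Choose A[I] as the dividing element.
2. From the left end of the list (A[0] onwards) scan till an item A[R] is found
   whose value is greater than A[I].
3. From the right end of the list (A[N] backwards) scan till an item A[L] is found
   whose value is less than A[I].
4. Swap A[R] and A[L].
5. Continue steps 2, 3 & 4 till the scan pointers cross. Stop at this stage.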
Program 10.1 gives the program segment for Quick sort. It uses recursion.
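/* Program 10.1: Quick sort of A[m..n] (inclusive), with A[m] as the dividing element */
void Quicksort(int A[], int m, int n)
{
    int i, j, k, temp;
    if (m < n)
    {
        i = m;
        j = n + 1;
        k = A[m];                          /* dividing element */
        do
        {
            do                             /* scan from the left for an item >= k */
                ++i;
            while (i <= n && A[i] < k);    /* bound check: no sentinel needed at A[n+1] */
            do                             /* scan from the right for an item <= k */
                --j;
            while (A[j] > k);
            if (i < j)                     /* pointers have not crossed: swap the pair */
            {
                temp = A[i];
                A[i] = A[j];
                A[j] = temp;
            }
        } while (i < j);
        temp = A[m];                       /* put the dividing element in its final place */
        A[m] = A[j];
        A[j] = temp;
        Quicksort(A, m, j - 1);            /* sort sublist1 */
        Quicksort(A, j + 1, n);            /* sort sublist2 */
    }
}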
The Quick sort algorithm uses O(N log2 N) comparisons on average. Its performance
can be improved by keeping in mind the following points.
1. Switch to a scheme that is faster on short lists, such as insertion sort, when the
sublist size becomes comparatively small.
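Point 1 can be sketched in C as follows. This is only an illustration under stated assumptions: the threshold CUTOFF (set to 8 here) and the names insertion_sort_range and quicksort_cutoff are hypothetical, and a suitable threshold has to be found by measurement on the target machine.

```c
#define CUTOFF 8   /* hypothetical threshold; tune by measurement */

/* Insertion sort of the sublist a[lo..hi] (inclusive). */
static void insertion_sort_range(int a[], int lo, int hi)
{
    for (int i = lo + 1; i <= hi; ++i) {
        int target = a[i];
        int j = i;
        while (j > lo && a[j - 1] > target) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = target;
    }
}

/* Quick sort of a[m..n] (inclusive) that hands short sublists to insertion sort. */
void quicksort_cutoff(int a[], int m, int n)
{
    if (n - m + 1 <= CUTOFF) {
        insertion_sort_range(a, m, n);
        return;
    }
    int k = a[m];                  /* dividing element */
    int i = m, j = n + 1;
    for (;;) {
        do { ++i; } while (i <= n && a[i] < k);   /* scan from the left */
        do { --j; } while (a[j] > k);             /* scan from the right */
        if (i >= j)
            break;                                /* scan pointers have crossed */
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
    a[m] = a[j];                   /* place the dividing element at position j */
    a[j] = k;
    quicksort_cutoff(a, m, j - 1);
    quicksort_cutoff(a, j + 1, n);
}
```

A call such as quicksort_cutoff(a, 0, n - 1) sorts the whole array; sublists of CUTOFF items or fewer never reach the partitioning code.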
10.2.4 2-way Merge Sort
Merge sort is also one of the ‘divide and conquer’ class of algorithms. The basic idea
is to divide the list into a number of sublists, sort each of these sublists and
merge them to get a single sorted list. The illustrative implementation of 2-way merge
sort sees the input initially as n lists of size 1. These are merged to get n/2 lists of size
2. These n/2 lists are merged pairwise, and so on, till a single list is obtained. This
method is also called Concatenate sort. Figure 10.2 depicts 2-way merge sort.
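The passes described above (lists of size 1, then 2, then 4, and so on) can be sketched in C as follows. This is a minimal sketch, assuming an array of int and a second array of the same size obtained with malloc; the names merge_runs and merge_sort are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1]. */
static void merge_runs(const int src[], int dst[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)       /* on ties take from the left run (stable) */
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/* 2-way merge sort of a[0..n-1]. Returns 0 on success, or -1 if the
   second array cannot be allocated. */
int merge_sort(int a[], int n)
{
    if (n < 2)
        return 0;
    int *b = malloc((size_t)n * sizeof *b);     /* second array of size n */
    if (b == NULL)
        return -1;
    for (int width = 1; width < n; width *= 2) {
        /* One pass: merge neighbouring runs of length width. */
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = (lo + width < n) ? lo + width : n;
            int hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(a, b, lo, mid, hi);
        }
        memcpy(a, b, (size_t)n * sizeof *a);    /* copy the pass result back */
    }
    free(b);
    return 0;
}
```

Copying back after every pass keeps the sketch short; an implementation concerned with speed would swap the roles of the two arrays between passes instead.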
Merge sort is well suited to sorting linked lists in random order, since merging needs
only sequential access to the items. The total computing time is O(n log2 n).
The disadvantage of using merge sort is that the merge phase requires two arrays of the
same size. That is, to sort a list of size n, it needs space for 2n elements.
10.2.5 Heap Sort
A complete binary tree is said to satisfy the ‘heap condition’ if the key of each node is
greater than or equal to the keys of its children. Thus the root node will have the largest
key value.
Trees can be represented as arrays, by first numbering the nodes (starting from the
root) from left to right. The key values of the nodes are then assigned to array
positions whose index is given by the number of the node. For the example tree, the
corresponding array is depicted in Figure 10.4.
The relationships of a node can also be determined from this array representation. If a
node is at position j, its children will be at positions 2j and 2j + 1. Its parent will be at
position ⌊j/2⌋.
Consider the node M. It is at position 5. Its parent node is, therefore, at position
⌊5/2⌋ = 2, i.e., the parent is R. Its children are at positions 2 × 5 and (2 × 5) + 1, i.e.,
10 and 11 respectively, i.e., E and I are its children.
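The position arithmetic used in this example can be written as three small C functions. This is a sketch that assumes nodes are numbered from 1, as in the text; the function names are illustrative.

```c
/* Position arithmetic for a tree stored in array positions 1..n. */
int heap_parent(int j)      { return j / 2; }       /* integer division gives the floor of j/2 */
int heap_left_child(int j)  { return 2 * j; }
int heap_right_child(int j) { return 2 * j + 1; }
```

For the node at position 5 these give parent 2 and children 10 and 11, which agrees with the worked example.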
A Heap is a complete binary tree, in which each node satisfies the heap condition,
represented as an array.
We will now study the operations possible on a heap and see how these can be
combined to generate a sorting algorithm.
1. Initially R is added as the right child of J and given the number 13.
2. But R > J, so the heap condition is violated.
3. Move R up to position 6 and move J down to position 13.
4. R > P. Therefore, the heap condition is still violated.
5. Swap R and P.
6. The heap condition is now satisfied by all nodes, which gives the heap of Figure 10.5.
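The insertion steps above can be sketched in C as follows. This is a minimal sketch, assuming integer keys stored in positions 1 to size of an array (position 0 is unused) and enough room for one more item; the name heap_insert is illustrative.

```c
/* Insert key into a heap stored in heap[1..*size]; position 0 is unused.
   The caller must make sure the array has room for one more item. */
void heap_insert(int heap[], int *size, int key)
{
    int j = ++*size;                    /* new node at the next free position */
    /* While the parent is smaller than the new key, move the parent down. */
    while (j > 1 && heap[j / 2] < key) {
        heap[j] = heap[j / 2];
        j /= 2;
    }
    heap[j] = key;                      /* heap condition now holds everywhere */
}
```

Instead of swapping at every level, the sketch moves each smaller parent down and writes the new key once, which gives the same final heap.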
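We will first see two methods of heap construction and then the removal of items, in
order, from the heap to sort the list.
• Top down heap construction: insert items into an initially empty heap, satisfying the
heap condition at all steps.
• Bottom up heap construction: build the tree with the items in the order presented and
then, starting from the rightmost node, modify it to satisfy the heap condition.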
Figure 10.6 shows different steps of the top down construction of the heap.
Example: The input file is (2, 3, 81, 64, 4, 25, 36, 16, 9, 49). When the file is interpreted
as a binary tree, it results in Figure 10.7. Figure 10.8 depicts the heap.
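The second construction method (take the file as a binary tree and repair it, starting from the rightmost nodes) can be sketched in C as follows. This is a sketch under stated assumptions: keys are int, position 0 of the array is unused so that the children of position j are at 2j and 2j + 1, and the names sift_down and build_heap are illustrative. Leaves already satisfy the heap condition, so the repair starts at the last node that has a child.

```c
/* Restore the heap condition for the subtree rooted at position j of
   heap[1..n], assuming both of its subtrees already satisfy it. */
static void sift_down(int heap[], int n, int j)
{
    int key = heap[j];
    while (2 * j <= n) {
        int child = 2 * j;
        if (child < n && heap[child + 1] > heap[child])
            ++child;                    /* pick the larger of the two children */
        if (key >= heap[child])
            break;                      /* heap condition holds here */
        heap[j] = heap[child];          /* move the larger child up */
        j = child;
    }
    heap[j] = key;
}

/* Turn heap[1..n] into a heap in place, working from the rightmost
   node that has a child back to the root. */
void build_heap(int heap[], int n)
{
    for (int j = n / 2; j >= 1; --j)
        sift_down(heap, n, j);
}
```

With the example file stored in positions 1 to 10, this sketch leaves the array as (81, 64, 36, 16, 49, 25, 2, 3, 9, 4).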
[Figures 10.7 and 10.8: the input file as a binary tree, and the heap built from it.]
[Figure: removal of the root, step by step. After each removal the heap shrinks by one
item and the sorted list grows by one item.]
Sorted: 81, 64, 49, 36, 25                      Size: 5
Sorted: 81, 64, 49, 36, 25, 16                  Size: 4
Sorted: 81, 64, 49, 36, 25, 16, 9               Size: 3
Sorted: 81, 64, 49, 36, 25, 16, 9, 4            Size: 2
Sorted: 81, 64, 49, 36, 25, 16, 9, 4, 3         Size: 1
Sorted: 81, 64, 49, 36, 25, 16, 9, 4, 3, 2      Result
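The complete heap sort (construct the heap, then remove the root repeatedly) can be sketched in C as follows. This is a minimal sketch, assuming int keys in positions 1 to n of an array with position 0 unused; the names restore_heap and heap_sort are illustrative. Each removed root is stored in the slot freed at the end of the shrinking heap, so the keys leave the heap in descending order (81 first for the example file) while the array itself ends up in ascending order.

```c
/* Restore the heap condition at position j of heap[1..n], assuming the
   subtrees below j already satisfy it. */
static void restore_heap(int heap[], int n, int j)
{
    int key = heap[j];
    while (2 * j <= n) {
        int child = 2 * j;
        if (child < n && heap[child + 1] > heap[child])
            ++child;                    /* larger child */
        if (key >= heap[child])
            break;
        heap[j] = heap[child];
        j = child;
    }
    heap[j] = key;
}

/* Heap sort: sorts a[1..n] into ascending order; a[0] is unused. */
void heap_sort(int a[], int n)
{
    for (int j = n / 2; j >= 1; --j)    /* construct the heap */
        restore_heap(a, n, j);
    for (int size = n; size > 1; --size) {
        int largest = a[1];             /* remove the root */
        a[1] = a[size];                 /* last item takes the root's place */
        a[size] = largest;              /* removed key goes into the freed slot */
        restore_heap(a, size - 1, 1);   /* repair the smaller heap */
    }
}
```

No second array is needed: the sorted items accumulate in the part of the array that the heap no longer uses.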
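10.3 SORTING ON SEVERAL KEYS
The first method is called the MSD (Most Significant Digit) sort and the second
method is called the LSD (Least Significant Digit) sort. Here ‘digit’ stands for a key: an
MSD sort works on the most significant key first, and an LSD sort on the least significant
key first. Though they are called sorting methods, MSD and LSD sorts only decide the
order in which the keys are used. The actual sorting on each key could be done by any of
the sorting methods discussed in this unit, provided that, for the LSD sort, the method
is stable, i.e., it keeps items with equal keys in their existing relative order.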
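An LSD sort on two keys can be sketched in C as follows. This is only an illustration: the record layout with fields key1 (most significant) and key2 (least significant) and the function names are hypothetical. The sort used for each pass is an insertion sort written so that it never moves an item past an equal key, because the second pass must not disturb the order established by the first pass among records with equal key1.

```c
struct record {
    int key1;    /* most significant key */
    int key2;    /* least significant key */
};

/* Insertion sort of r[0..n-1] on one key: key1 if use_key1 is non-zero,
   otherwise key2. Items with equal keys keep their relative order. */
static void sort_on_key(struct record r[], int n, int use_key1)
{
    for (int i = 1; i < n; ++i) {
        struct record target = r[i];
        int t = use_key1 ? target.key1 : target.key2;
        int j = i;
        /* Strict '>' never moves an item past an equal key. */
        while (j > 0 && (use_key1 ? r[j - 1].key1 : r[j - 1].key2) > t) {
            r[j] = r[j - 1];
            --j;
        }
        r[j] = target;
    }
}

/* LSD sort on two keys: least significant key first, then the most
   significant key. */
void lsd_sort(struct record r[], int n)
{
    sort_on_key(r, n, 0);    /* pass 1: key2 */
    sort_on_key(r, n, 1);    /* pass 2: key1 */
}
```

For the records (2,1), (1,3), (2,0), (1,2) the first pass gives (2,0), (2,1), (1,2), (1,3) and the second pass gives (1,2), (1,3), (2,0), (2,1).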
10.4 SUMMARY
Sorting is an important application activity. Many sorting algorithms are available, but
each is efficient only for a particular situation or a particular kind of data. The choice
of a sorting algorithm is therefore crucial to the performance of the application.
In this unit we have studied many sorting algorithms used in internal sorting. This is
not an exhaustive list and the student is advised to read the suggested books for
exposure to additional sorting methods and for detailed discussions of the methods
introduced here.
The following are the three most important efficiency criteria:
10.5 SOLUTIONS/ANSWERS
1) O(N^2), where N is the number of elements in the list to be sorted.
2) Divide and Conquer.
3) O(N log N), where N is the number of elements to be sorted.
Reference Websites
http://www.it.jcu.edu.au/Subjects/ cp2001/1998/LectureNotes/Sorting/
http://oopweb.com/Algorithms/Files/Algorithms.html