Sorting Algorithms: CENG 213 Data Structures
Sorting
Sorting is a process that organizes a collection of data into either ascending or descending order. An internal sort requires that the collection of data fit entirely in the computer's main memory. We can use an external sort when the collection of data cannot fit in the computer's main memory all at once but must reside in secondary storage, such as on a disk. We will analyze only internal sorting algorithms. Any significant amount of computer output is generally arranged in some sorted order so that it can be interpreted. Sorting also has indirect uses: an initial sort of the data can significantly enhance the performance of an algorithm. The majority of programming projects use a sort somewhere, and in many cases the sorting cost determines the running time. A comparison-based sorting algorithm makes ordering decisions only on the basis of comparisons between items.
Sorting Algorithms
There are many sorting algorithms, such as:
Selection Sort, Insertion Sort, Bubble Sort, Merge Sort, Quick Sort
The first three are the foundations for faster and more efficient algorithms.
Selection Sort
The list is divided into two sublists, sorted and unsorted, which are separated by an imaginary wall. We find the smallest element in the unsorted sublist and swap it with the element at the beginning of the unsorted data. After each selection and swap, the imaginary wall between the two sublists moves one element ahead, increasing the number of sorted elements and decreasing the number of unsorted ones. Each time we move one element from the unsorted sublist to the sorted sublist, we say that we have completed a sort pass. A list of n elements requires n-1 passes to completely rearrange the data.
Selection sort trace (the bar | marks the wall between the sorted sublist on the left and the unsorted sublist on the right):
Original list:  | 23 78 45 8 32 56
After pass 1:   8 | 78 45 23 32 56
After pass 2:   8 23 | 45 78 32 56
After pass 3:   8 23 32 | 78 45 56
After pass 4:   8 23 32 45 | 78 56
After pass 5:   8 23 32 45 56 | 78
In the selectionSort function, the outer for loop executes n-1 times, and the swap function is invoked once in each iteration. Total swaps: n-1. Total moves: 3*(n-1) (each swap has three moves).
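The selection step and the swap count can be sketched as follows. This is a minimal illustration, not necessarily the course's own listing: the name `wall` and the returned swap counter are assumptions added so that the n-1 swap total can be checked directly.

```cpp
#include <utility>  // std::swap

// Sorts theArray[0..n-1] into ascending order.
// Returns the number of swaps performed (an illustrative extra,
// added only so the n-1 total can be observed).
template <class DataType>
int selectionSort(DataType theArray[], int n) {
    int swaps = 0;
    // 'wall' is the index of the first element of the unsorted sublist.
    for (int wall = 0; wall < n - 1; ++wall) {
        // Find the smallest element in the unsorted sublist.
        int smallest = wall;
        for (int i = wall + 1; i < n; ++i) {
            if (theArray[i] < theArray[smallest]) {
                smallest = i;
            }
        }
        // Exactly one swap per pass, even if the element is already in place.
        std::swap(theArray[wall], theArray[smallest]);
        ++swaps;
    }
    return swaps;
}
```

Called on the six-element list 23 78 45 8 32 56, this sketch performs five passes and therefore five swaps.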
Insertion Sort
Insertion sort is a simple sorting algorithm that is appropriate for small inputs.
It is the sorting technique most commonly used by card players.
The list is divided into two parts: sorted and unsorted. In each pass, the first element of the unsorted part is picked up, transferred to the sorted sublist, and inserted at the appropriate place. A list of n elements will take at most n-1 passes to sort the data.
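A minimal sketch of the pass described above, assuming an array-based list; the names `unsorted`, `nextItem`, and `loc` are illustrative choices rather than names taken from the slides.

```cpp
// Sorts theArray[0..n-1] into ascending order.
template <class DataType>
void insertionSort(DataType theArray[], int n) {
    // theArray[0..unsorted-1] is the sorted part;
    // theArray[unsorted..n-1] is the unsorted part.
    for (int unsorted = 1; unsorted < n; ++unsorted) {
        // Pick up the first element of the unsorted part.
        DataType nextItem = theArray[unsorted];
        int loc = unsorted;
        // Shift larger sorted elements one position to the right.
        for (; loc > 0 && theArray[loc - 1] > nextItem; --loc) {
            theArray[loc] = theArray[loc - 1];
        }
        // Insert the element at its appropriate place.
        theArray[loc] = nextItem;
    }
}
```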
Insertion sort trace (the bar | marks the wall between the sorted part on the left and the unsorted part on the right):
Original list:  23 | 78 45 8 32 56
After pass 1:   23 78 | 45 8 32 56
After pass 2:   23 45 78 | 8 32 56
After pass 3:   8 23 45 78 | 32 56
After pass 4:   8 23 32 45 78 | 56
After pass 5:   8 23 32 45 56 78 |
Worst-case: $O(n^2)$
Array is in reverse order: the inner loop is executed $i-1$ times, for $i = 2, 3, \ldots, n$.
The number of moves: $2(n-1) + (1 + 2 + \cdots + (n-1)) = 2(n-1) + n(n-1)/2$
The number of key comparisons: $1 + 2 + \cdots + (n-1) = n(n-1)/2$
Average-case: $O(n^2)$
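The worst-case counts can be checked with an instrumented insertion sort. The sketch below is an assumption-laden illustration: the `SortCounts` struct and the function name `countInsertionSort` are hypothetical, and a "move" is taken to mean any assignment of a data item (picking up the item, shifting an item, and putting the item down).

```cpp
// Hypothetical helper type holding the two counters.
struct SortCounts {
    long moves = 0;        // assignments of data items
    long comparisons = 0;  // key comparisons
};

// Insertion sort on ints that also counts moves and key comparisons.
SortCounts countInsertionSort(int theArray[], int n) {
    SortCounts c;
    for (int unsorted = 1; unsorted < n; ++unsorted) {
        int nextItem = theArray[unsorted];  // pick up the item
        ++c.moves;
        int loc = unsorted;
        while (loc > 0) {
            ++c.comparisons;  // one key comparison per test
            if (!(theArray[loc - 1] > nextItem)) {
                break;
            }
            theArray[loc] = theArray[loc - 1];  // shift right
            ++c.moves;
            --loc;
        }
        theArray[loc] = nextItem;  // put the item down
        ++c.moves;
    }
    return c;
}
```

For the reversed array 6 5 4 3 2 1 (n = 6), the formulas give 6*5/2 = 15 key comparisons and 2*5 + 15 = 25 moves, which is what this sketch counts.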
Worst: the longest running time (this is the upper limit for the algorithm). It is guaranteed that the algorithm will not be worse than this.
Sometimes we are interested in the average case, but the average case has some problems.
It is difficult to figure out the average case, i.e. what is an average input? Are we going to assume that all possible inputs are equally likely? In fact, for many algorithms the average case has the same order of growth as the worst case.
Bubble Sort
The list is divided into two sublists: sorted and unsorted. The smallest element is bubbled from the unsorted list and moved to the sorted sublist. After that, the wall moves one element ahead, increasing the number of sorted elements and decreasing the number of unsorted ones. Each time an element moves from the unsorted part to the sorted part one sort pass is completed. Given a list of n elements, bubble sort requires up to n-1 passes to sort the data.
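A minimal sketch of this variant, in which the smallest remaining element is bubbled toward the front. The `swapped` flag, which stops early once a pass makes no exchange, is an assumption added here to reflect the "up to n-1 passes" wording; it is not confirmed by the slides.

```cpp
#include <utility>  // std::swap

// Sorts theArray[0..n-1] into ascending order.
template <class DataType>
void bubbleSort(DataType theArray[], int n) {
    bool swapped = true;  // false once a full pass makes no exchange
    // 'wall' is the index of the first element of the unsorted sublist.
    for (int wall = 0; wall < n - 1 && swapped; ++wall) {
        swapped = false;
        // Walk from the end toward the wall, exchanging adjacent
        // out-of-order pairs so the smallest element reaches the wall.
        for (int i = n - 1; i > wall; --i) {
            if (theArray[i] < theArray[i - 1]) {
                std::swap(theArray[i], theArray[i - 1]);
                swapped = true;
            }
        }
    }
}
```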
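Bubble sort trace (the bar | marks the wall between the sorted sublist on the left and the unsorted sublist on the right):
Original list:  | 23 78 45 8 32 56
After pass 1:   8 | 23 78 45 32 56
After pass 2:   8 23 | 32 78 45 56
After pass 3:   8 23 32 | 45 78 56
After pass 4:   8 23 32 45 | 56 78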
Worst-case: $O(n^2)$
Array is in reverse order: the outer loop is executed $n-1$ times.
The number of moves: $3(1 + 2 + \cdots + (n-1)) = 3n(n-1)/2$
The number of key comparisons: $1 + 2 + \cdots + (n-1) = n(n-1)/2$
Average-case: $O(n^2)$
Mergesort
The mergesort algorithm is one of two important divide-and-conquer sorting algorithms (the other one is quicksort). It is a recursive algorithm that divides the list into halves, sorts each half separately, and then merges the sorted halves into one sorted array.
Merge
const int MAX_SIZE = maximum-number-of-items-in-array;

void merge(DataType theArray[], int first, int mid, int last)
{
   DataType tempArray[MAX_SIZE];  // temporary array
   int first1 = first;            // beginning of first subarray
   int last1 = mid;               // end of first subarray
   int first2 = mid + 1;          // beginning of second subarray
   int last2 = last;              // end of second subarray
   int index = first1;            // next available location in tempArray

   // while both subarrays are not empty, copy the smaller item
   for ( ; (first1 <= last1) && (first2 <= last2); ++index) {
      if (theArray[first1] < theArray[first2]) {
         tempArray[index] = theArray[first1];
         ++first1;
      }
      else {
         tempArray[index] = theArray[first2];
         ++first2;
      }
   }

   // finish off the first subarray, if necessary
   for ( ; first1 <= last1; ++first1, ++index)
      tempArray[index] = theArray[first1];

   // finish off the second subarray, if necessary
   for ( ; first2 <= last2; ++first2, ++index)
      tempArray[index] = theArray[first2];

   // copy the result back into the original array
   for (index = first; index <= last; ++index)
      theArray[index] = tempArray[index];
}  // end merge
Mergesort
void mergesort(DataType theArray[], int first, int last)
{
   if (first < last) {
      int mid = (first + last)/2;   // index of midpoint
      mergesort(theArray, first, mid);
      mergesort(theArray, mid+1, last);

      // merge the two halves
      merge(theArray, first, mid, last);
   }
}  // end mergesort
Mergesort - Example
Original:  6 3 9 1 5 4 7 2
divide:    6 3 9 1 | 5 4 7 2
divide:    6 3 | 9 1 | 5 4 | 7 2
divide:    6 | 3 | 9 | 1 | 5 | 4 | 7 | 2
merge:     3 6 | 1 9 | 4 5 | 2 7
merge:     1 3 6 9 | 2 4 5 7
merge:     1 2 3 4 5 6 7 9
Mergesort - Example 2
Merging two sorted arrays of size k into one array with indices 0 .. 2k-1:
Best-case: All the elements in the first array are smaller (or larger) than all the elements in the second array.
The number of moves: 2k + 2k
The number of key comparisons: k
Worst-case:
The number of moves: 2k + 2k
The number of key comparisons: 2k-1
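The two comparison counts can be observed with a small counting merge. This is a sketch under stated assumptions: `mergeCount` is a hypothetical helper that works on `std::vector<int>` rather than on the slides' `theArray`/`tempArray` pair, and it counts only key comparisons, not moves.

```cpp
#include <cstddef>
#include <vector>

// Merges the sorted vectors a and b into out.
// Returns the number of key comparisons performed.
int mergeCount(const std::vector<int>& a, const std::vector<int>& b,
               std::vector<int>& out) {
    out.clear();
    int comparisons = 0;
    std::size_t i = 0;
    std::size_t j = 0;
    // While both inputs still have items, copy the smaller front item.
    while (i < a.size() && j < b.size()) {
        ++comparisons;
        if (a[i] < b[j]) {
            out.push_back(a[i++]);
        } else {
            out.push_back(b[j++]);
        }
    }
    // Finish off whichever input is not yet exhausted (no comparisons).
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return comparisons;
}
```

With k = 4, merging 1 2 3 4 with 5 6 7 8 takes k = 4 comparisons, while merging the interleaved inputs 1 3 5 7 and 2 4 6 8 takes 2k-1 = 7.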
Mergesort - Analysis
Levels of recursive calls to mergesort, given an array of eight items
For an array of $2^m$ items:
level 0: array of size $2^m$; 1 merge (size $2^{m-1}$)
level 1: two subarrays of size $2^{m-1}$
...
level m: subarrays of size $2^0$
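Worst-case: the number of key comparisons
$= 2^0 (2 \cdot 2^{m-1} - 1) + 2^1 (2 \cdot 2^{m-2} - 1) + \cdots + 2^{m-1} (2 \cdot 2^0 - 1)$
$= (2^m - 1) + (2^m - 2) + \cdots + (2^m - 2^{m-1})$ ($m$ terms)
$= m \cdot 2^m - \sum_{i=0}^{m-1} 2^i$
$= m \cdot 2^m - 2^m + 1$
With $n = 2^m$, this is $n \log_2 n - n + 1$, which is $O(n \log_2 n)$.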
Mergesort Analysis
Mergesort is an extremely efficient algorithm with respect to time.
Both the worst case and the average case are $O(n \log_2 n)$.
But mergesort requires an extra array whose size equals the size of the original array. If we use a linked list, we do not need an extra array.
But we need space for the links, and it will be difficult to divide the list in half ( $O(n)$ ).
Quicksort
Like mergesort, quicksort is also based on the divide-and-conquer paradigm. But it uses this technique in a somewhat opposite manner, as all the hard work is done before the recursive calls. It works as follows:
1. First, it partitions an array into two parts.
2. Then, it sorts the parts independently.
3. Finally, it combines the sorted subsequences by a simple concatenation.
Quicksort (cont.)
The quick-sort algorithm consists of the following three steps:
1. Divide: Partition the list. To partition the list, we first choose some element from the list for which we hope about half the elements will come before and half after. Call this element the pivot. Then we partition the elements so that all those with values less than the pivot come in one sublist and all those with greater values come in another.
2. Recursion: Recursively sort the sublists separately.
3. Conquer: Put the sorted sublists together.
Partition
Partitioning places the pivot in its correct position within the array.
Arranging the array elements around the pivot p generates two smaller sorting problems: sort the left section of the array, and sort the right section of the array. When these two smaller sorting problems are solved recursively, our bigger sorting problem is solved.
Partition Function
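template <class DataType>
void partition(DataType theArray[], int first, int last, int &pivotIndex)
{
   // Partitions an array for quicksort.
   // Precondition: first <= last.
   // Postcondition: Partitions theArray[first..last] such that:
   //    S1 = theArray[first..pivotIndex-1] < pivot
   //    theArray[pivotIndex] == pivot
   //    S2 = theArray[pivotIndex+1..last] >= pivot
   // Calls: choosePivot and swap.

   // place pivot in theArray[first]
   // (choosePivot is not shown here; it is assumed to move the chosen
   // pivot into theArray[first])
   choosePivot(theArray, first, last);
   DataType pivot = theArray[first];   // copy pivot

   int lastS1 = first;             // index of last item in S1
   int firstUnknown = first + 1;   // index of 1st item in unknown

   // move one item at a time until unknown region is empty
   for ( ; firstUnknown <= last; ++firstUnknown) {
      if (theArray[firstUnknown] < pivot) {   // item belongs to S1
         ++lastS1;
         swap(theArray[firstUnknown], theArray[lastS1]);
      }
      // else item belongs to S2, which grows without any move
   }

   // place pivot in its proper position and mark its location
   swap(theArray[first], theArray[lastS1]);
   pivotIndex = lastS1;
}  // end partition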
Quicksort Function
void quicksort(DataType theArray[], int first, int last)
{
   // Sorts the items in an array into ascending order.
   // Precondition: theArray[first..last] is an array.
   // Postcondition: theArray[first..last] is sorted.
   // Calls: partition.
   int pivotIndex;
   if (first < last) {
      // create the partition: S1, pivot, S2
      partition(theArray, first, last, pivotIndex);

      // sort regions S1 and S2
      quicksort(theArray, first, pivotIndex-1);
      quicksort(theArray, pivotIndex+1, last);
   }
}  // end quicksort
Quicksort Analysis
Worst Case (assume that we are selecting the first element as pivot):
The pivot divides the list of size $n$ into two sublists of sizes $0$ and $n-1$.
The number of key comparisons $= (n-1) + (n-2) + \cdots + 1 = n^2/2 - n/2$, which is $O(n^2)$.
The number of swaps $= (n-1)$ [swaps outside of the for loop] $+ \; (n-1) + (n-2) + \cdots + 1$ [swaps inside of the for loop] $= n^2/2 + n/2 - 1$, which is $O(n^2)$.
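The worst-case comparison count can be checked with an instrumented quicksort that always takes the first element as the pivot. This is a sketch, not the course's listing: `partitionFirst` and `quicksortCount` are hypothetical names, the code works on `int` arrays only, and the counter covers key comparisons but not swaps.

```cpp
#include <utility>  // std::swap

// Partitions theArray[first..last] around theArray[first] (the pivot).
// Adds the number of key comparisons to 'comparisons' and returns the
// final index of the pivot.
int partitionFirst(int theArray[], int first, int last, long& comparisons) {
    int pivot = theArray[first];
    int lastS1 = first;  // index of the last item known to be < pivot
    for (int firstUnknown = first + 1; firstUnknown <= last; ++firstUnknown) {
        ++comparisons;
        if (theArray[firstUnknown] < pivot) {
            ++lastS1;
            std::swap(theArray[firstUnknown], theArray[lastS1]);
        }
    }
    std::swap(theArray[first], theArray[lastS1]);  // pivot to its place
    return lastS1;
}

// Sorts theArray[first..last] and accumulates key comparisons.
void quicksortCount(int theArray[], int first, int last, long& comparisons) {
    if (first < last) {
        int pivotIndex = partitionFirst(theArray, first, last, comparisons);
        quicksortCount(theArray, first, pivotIndex - 1, comparisons);
        quicksortCount(theArray, pivotIndex + 1, last, comparisons);
    }
}
```

On an already sorted array of n = 8 items, every partition leaves sublists of sizes 0 and n-1, so the sketch counts 7 + 6 + ... + 1 = 28 key comparisons, matching $n^2/2 - n/2$.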
Quicksort Analysis
Quicksort is $O(n \log_2 n)$ in the best case and in the average case. Quicksort is slow when the array is already sorted and we choose the first element as the pivot. Although its worst-case behavior is not so good, its average-case behavior is much better than its worst case.
So, quicksort is one of the best sorting algorithms that use key comparisons.
Figure: A worst-case partitioning with quicksort
Figure: An average-case partitioning with quicksort
Radix Sort
The radix sort algorithm is different from the other sorting algorithms that we have discussed: it does not use key comparisons to sort an array. Radix sort treats each data item as a character string. First, it groups the data items according to their rightmost character and puts these groups into order with respect to this rightmost character. Then it combines these groups. We repeat these grouping and combining operations for all other character positions in the data items, from the rightmost to the leftmost character position. At the end, the sort operation is complete.
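The grouping and combining steps can be sketched as follows. The sketch assumes that every item is a string of exactly `width` characters (shorter items would need padding first) and uses one group per possible character value; the function name and parameters are illustrative, not taken from the slides.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sorts equal-length strings into ascending order without key comparisons.
// Assumption: every string in 'items' has exactly 'width' characters.
void radixSort(std::vector<std::string>& items, std::size_t width) {
    // Process character positions from the rightmost to the leftmost.
    for (std::size_t pos = width; pos-- > 0; ) {
        // Group the items by the character at position 'pos'.
        std::vector<std::vector<std::string>> groups(256);
        for (const std::string& s : items) {
            groups[static_cast<unsigned char>(s[pos])].push_back(s);
        }
        // Combine the groups in character order. Items keep their
        // relative order inside a group, which preserves the ordering
        // established by the positions already processed.
        items.clear();
        for (const std::vector<std::string>& group : groups) {
            for (const std::string& s : group) {
                items.push_back(s);
            }
        }
    }
}
```

For example, calling `radixSort` with `width` 4 on the items 0123, 2154, 0222, 0004, 0283, 1560, 1061, 2150 takes four grouping passes, one per character position.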