Sorting Lecture-5 DAA
Sorting - Definitions
Input: You are given an array A of data records, each with a key (which
could be an integer, character, string, etc.).
There is a total ordering on the set of possible keys
You can compare any two keys using <, >, ==
For simplicity, we will assume that A[i] contains only one element – the key
Sorting - Definitions
Notice that sorting is easy for data sets stored in search trees (BST,
AVL, splay trees) and in B-Trees / B+-Trees:
An inorder traversal of the tree yields the keys in sorted order in O(N)
time.
Why Sort?
Sorting algorithms are among the most frequently used
algorithms in computer science
Crucial for efficient retrieval and processing of large volumes of data,
e.g., Database systems
Once the array is sorted, the kth largest element can be accessed in O(1)
time for any k
Sorting – Things to consider
Space: Does the sorting algorithm require extra memory to sort
the collection of items?
Do you need to copy and temporarily store some subset of the keys/data
records?
An algorithm that requires only O(1) extra space is known as an in-place
sorting algorithm
Exchange Sort
Compare 1st element to each other element
Swap a pair if out of order
This completes one pass
Places the smallest element in the 1st position
Repeat for the 2nd, 3rd, 4th, etc. elements
Exchange Sort – single pass
3 8 2 4 1 7 6 5   initial array
3 8 2 4 1 7 6 5   compare 3, 8: no swap
2 8 3 4 1 7 6 5   compare 3, 2: swap
2 8 3 4 1 7 6 5   compare 2, 4: no swap
1 8 3 4 2 7 6 5   compare 2, 1: swap
1 8 3 4 2 7 6 5   compare 1, 7: no swap
1 8 3 4 2 7 6 5   compare 1, 6: no swap
1 8 3 4 2 7 6 5   compare 1, 5: no swap; smallest element now in 1st position
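The pass shown above, repeated for each successive position, gives the full algorithm. A minimal Python sketch (the function name and test values are illustrative, not from the slides):

```python
def exchange_sort(a):
    """Exchange sort: on pass i, compare a[i] with every later
    element and swap when out of order, so pass i leaves the
    i-th smallest element at index i."""
    n = len(a)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[j] < a[i]:
                a[i], a[j] = a[j], a[i]
    return a

print(exchange_sort([3, 8, 2, 4, 1, 7, 6, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Each pass does one fewer comparison than the last, giving n(n-1)/2 comparisons in total, i.e. O(n²).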
An overview of Simple Sorting Algorithms
Bubble sort, selection sort, and insertion sort are all O(n²)
As we will see later, we can do much better than this with
somewhat more complicated sorting algorithms
Within O(n²),
Bubble sort is very slow, and should probably never be used for anything
Selection sort is intermediate in speed
Insertion sort is usually the fastest of the three; in fact, for small arrays
(say, 10 or 15 elements), insertion sort is faster than more complicated
sorting algorithms
Selection sort and insertion sort are “good enough” for small
arrays
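Since insertion sort is the usual choice for small arrays, a compact Python sketch may be useful (the function name is illustrative):

```python
def insertion_sort(a):
    """Insertion sort: grow a sorted prefix a[0..i-1]; shift
    larger elements right and drop each key into its place."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift the larger element right
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

On an already-sorted array the inner while-loop never runs, which is the Θ(N) best case in the table below.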
Sorting methods
Comparison based sorting
O(n²) methods
E.g., Insertion, bubble
Average time O(n log n) methods
E.g., quick sort
O(n log n) methods
E.g., Merge sort, heap sort
Non-comparison based sorting
Integer sorting: linear time
E.g., Counting sort, bin sort
Radix sort, bucket sort
Stable vs. non-stable sorting
Comparison Sorting
Sort           Worst Case   Average Case  Best Case    Comments
InsertionSort  Θ(N²)        Θ(N²)         Θ(N)         Fast for small N
MergeSort      Θ(N log N)   Θ(N log N)    Θ(N log N)   Requires memory
HeapSort       Θ(N log N)   Θ(N log N)    Θ(N log N)   Large constants
QuickSort      Θ(N²)        Θ(N log N)    Θ(N log N)   Small constants
Binary Trees
A tree in which no node can have more than two children.
[Figures: a generic binary tree, and a worst-case (degenerate) binary tree with N = 5 nodes]
Heap
Definition of Heap
A complete binary tree: every level is full except possibly the bottom
level, which is filled from left to right
Every node is always <= (or >=) its children (min-heap or max-heap)
Stored level by level in an array; for the max-heap below:
index: 1  2  3  4  5  6  7  8  9
value: 80 70 60 20 30 50 10 15 8
[Figure: the corresponding tree – 80 at the root, children 70 and 60,
then 20 30 50 10, then leaves 15 and 8; level i holds 2^i nodes
(2^0, 2^1, 2^2, ...)]
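The array layout above makes the parent/child relations pure index arithmetic; a small Python sketch of the 1-based convention (variable names are illustrative):

```python
# 1-based array storage of the max-heap above (index 0 unused)
heap = [None, 80, 70, 60, 20, 30, 50, 10, 15, 8]

def parent(i): return i // 2
def left(i):   return 2 * i
def right(i):  return 2 * i + 1

# Heap property: every non-root node is <= its parent
n = len(heap) - 1
print(all(heap[parent(i)] >= heap[i] for i in range(2, n + 1)))  # True
```

No pointers are needed: the "right of the rightmost node in the deepest level" is simply the next free array slot.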
Construct a heap (heapify)
A tree with a single node is automatically a heap
We construct a heap by inserting nodes one at a time
to the right of the rightmost node in the deepest level
If the deepest level is full, start a new level
[Figure: arrows marking where the new node is added in each case]
Construct a heap (heapify) cont.
After adding the node, if it violates the heap property, we percolate it
up by repeatedly comparing (and swapping) with its parent
Example (insert 4 numbers {8, 10, 5, 12} into a max-heap):
Insert 8: the heap is just {8}
Insert 10: 10 > its parent 8, so swap; 10 becomes the root, 8 its left child
Insert 5: 5 <= its parent 10, so no swap
Insert 12: 12 > 8, swap; then 12 > 10, swap again; 12 is the new root,
with children 10 (whose child is 8) and 5
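The insert-and-percolate-up step can be sketched in Python over the 1-based array representation (names are illustrative):

```python
def heap_insert(heap, key):
    """Insert into a 1-based max-heap list (heap[0] unused):
    append at the next free leaf, then percolate up."""
    heap.append(key)
    i = len(heap) - 1
    while i > 1 and heap[i // 2] < heap[i]:
        heap[i // 2], heap[i] = heap[i], heap[i // 2]  # swap with parent
        i //= 2

# The {8, 10, 5, 12} example from the slide:
h = [None]
for x in (8, 10, 5, 12):
    heap_insert(h, x)
print(h[1:])  # [12, 10, 5, 8]
```

The while-loop walks at most one root-to-leaf path, which is where the O(log n) insertion bound comes from.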
Heap sort
Sorting can be performed by
Construct a heap from the original array
Then keep deleting the root node of the heap
Sort in ascending order: min-heap
Sort in descending order: max-heap
Problem
Delete operation may destroy heap property
So after deleting the root, we have to re-heap the tree
Re-heap solution
Remove the rightmost leaf at the deepest level and use it for
the new root
Percolate down the root node by comparing with its children
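The two bullets above can be sketched in Python for a 1-based max-heap (function name is illustrative; the sample heap is the one used in the delete example below):

```python
def delete_max(heap):
    """Delete the root of a 1-based max-heap (heap[0] unused):
    move the rightmost deepest leaf to the root, percolate down."""
    top = heap[1]
    heap[1] = heap[-1]        # last leaf becomes the new root
    heap.pop()
    n, i = len(heap) - 1, 1
    while True:
        l, r, big = 2 * i, 2 * i + 1, i
        if l <= n and heap[l] > heap[big]:
            big = l
        if r <= n and heap[r] > heap[big]:
            big = r
        if big == i:          # both children smaller: heap restored
            return top
        heap[i], heap[big] = heap[big], heap[i]
        i = big

h = [None, 25, 22, 17, 19, 22, 14, 15, 18, 14, 21, 3, 9, 11]
print(delete_max(h))  # 25
print(h[1:])          # [22, 22, 17, 19, 21, 14, 15, 18, 14, 11, 3, 9]
```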
Delete example (max-heap):

            25
      22          17
   19    22    14    15
  18 14 21 3  9  11

Step 1: remove the root 25 and move the rightmost leaf at the deepest
level (11) into the root
Step 2: 11 < its larger child 22, so swap them
Step 3: 11 < its larger child 22 on the next level, so swap again
Step 4: 11 < its larger child 21, so swap; 11 is now a leaf and the heap
property is restored:

            22
      22          17
   19    21    14    15
  18 14 11 3  9
Complexity
Insert one node in heap: O(log n) time
Delete one node in heap: O(log n) time
Why?
Insertion/deletion only causes one node to percolate up/down along the tree
The height of an n-node heap is O(log n)
Complexity
Heapify the array: O(n log n) time (n insertions)
Keep deleting the root and re-heaping: O(n log n) time (n deletions)
So the overall complexity of Heap Sort is O(n log n) time
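Putting heapify and repeated deletion together gives heap sort. The slides sort ascending via a min-heap and repeated root deletions; the sketch below is the common in-place variant that uses a max-heap and swaps the root to the end instead (an assumption of this example, not the slides' exact procedure):

```python
def heap_sort(a):
    """In-place ascending heap sort using a 0-based max-heap."""
    def sift_down(i, size):
        # Percolate a[i] down within a[0:size]
        while True:
            l, r, big = 2 * i + 1, 2 * i + 2, i
            if l < size and a[l] > a[big]:
                big = l
            if r < size and a[r] > a[big]:
                big = r
            if big == i:
                return
            a[i], a[big] = a[big], a[i]
            i = big

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # build the max-heap
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # n - 1 deletions: O(n log n)
        a[0], a[end] = a[end], a[0]      # max goes to its final slot
        sift_down(0, end)
    return a

print(heap_sort([3, 8, 2, 4, 1, 7, 6, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the deleted maxima are parked at the tail of the same array, this variant needs no extra memory, unlike merge sort.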
Radix Sort – digits and radix
A key is viewed as d digits x_d ... x_1 in radix k
Example: key = 15
In radix k = 10: digits "15", d = 2, where 0 ≤ x_i ≤ 9
In radix k = 2: digits "1111", d = 4, where 0 ≤ x_i ≤ 1
Radix Sort
Assumptions
d = Θ(1) and k = O(n)
Sorting looks at one column at a time
For a d digit number, sort the least significant digit first
Continue sorting on the next least significant digit, until all
digits have been sorted
Requires only d passes through the list
RADIX-SORT
Alg.: RADIX-SORT(A, d)
  for i ← 1 to d
    do use a stable sort to sort array A on digit i
(digit 1 is the lowest-order digit, d is the highest-order digit)
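RADIX-SORT can be sketched in Python, using counting sort as the stable per-digit sort (the function name and example values are illustrative):

```python
def radix_sort(a, d, k=10):
    """LSD radix sort of non-negative integers with d base-k
    digits: one stable counting-sort pass per digit."""
    for p in range(d):
        exp = k ** p
        digit = lambda x: (x // exp) % k   # digit p+1 of key x
        count = [0] * k
        for x in a:
            count[digit(x)] += 1
        for i in range(1, k):              # cumulative counts
            count[i] += count[i - 1]
        out = [0] * len(a)
        for x in reversed(a):              # right-to-left keeps it stable
            count[digit(x)] -= 1
            out[count[digit(x)]] = x
        a = out
    return a

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]
```

The per-digit sort must be stable, or ties on the current digit would scramble the ordering established by earlier passes.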
Analysis of Radix Sort
One pass of sorting per digit takes Θ(n + k), assuming we use counting sort
There are d passes (one per digit), for a total of Θ(d(n + k));
with d = Θ(1) and k = O(n), this is Θ(n)
Correctness of Radix sort
We use induction on the number of passes (digits)
Basis: if d = 1, there is only one digit, so sorting on it is trivially correct
Inductive step: assume the keys are sorted on digits 1, 2, . . . , d-1
Now sort on the d-th digit: keys with different d-th digits end up in the
correct order, and keys with equal d-th digits keep their (already sorted)
relative order because the per-digit sort is stable
Example - Bucket Sort
n = 10 keys in [0, 1); each key x is inserted into bucket B[⌊n·x⌋],
then each bucket is sorted:
0: /
1: .12 → .17 /
2: .21 → .23 → .26 /
3: .39 /
4: /
5: /
6: .68 /
7: .72 → .78 /
8: /
9: .94 /
Concatenate the lists from 0 to n – 1 together, in order:
.12 .17 .21 .23 .26 .39 .68 .72 .78 .94
Correctness of Bucket Sort
Consider two elements A[i], A[ j]
Assume without loss of generality that A[i] ≤ A[j]
Then ⌊n·A[i]⌋ ≤ ⌊n·A[j]⌋
So A[i] belongs to the same bucket as A[j] or to a bucket with a lower
index than that of A[j]
If A[i], A[j] belong to the same bucket:
sorting puts them in the proper order
If A[i], A[j] are put in different buckets:
concatenation of the lists puts them in the proper order
Analysis of Bucket Sort
Alg.: BUCKET-SORT(A, n)
  for i ← 1 to n                                   O(n)
    do insert A[i] into list B[⌊n·A[i]⌋]
  for i ← 0 to n - 1
    do sort list B[i] with insertion sort          Θ(n) expected, for
  concatenate lists B[0], B[1], ..., B[n-1]        uniformly distributed keys
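The algorithm above can be sketched in Python; for brevity the built-in sort stands in for the per-bucket insertion sort (an assumption of this sketch), and keys are assumed uniform in [0, 1):

```python
def bucket_sort(a):
    """Bucket sort for keys in [0, 1): key x goes to bucket
    floor(n*x); each bucket is sorted, then all are concatenated."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)   # distribute
    for b in buckets:
        b.sort()                        # sort each (short) bucket
    return [x for b in buckets for x in b]

print(bucket_sort([.72, .94, .21, .12, .23, .26, .39, .68, .17, .78]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

With uniform keys each bucket holds O(1) elements in expectation, so the total expected time is Θ(n).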
[Slides omitted: radix sort viewed as a bucket sort; the running time of the bucket-sorting step; the effect of the radix k]
Counting Sort
Assumptions:
Sort n integers which are in the range [0 ... r]
r is in the order of n, that is, r=O(n)
Idea:
For each element x, count the number of elements ≤ x
Use that count to place x directly into its correct position in the output array
Example
Input (1-indexed), n = 8 keys in [0, 5]:
A: 2 5 3 0 2 3 0 3

Step 1 – frequencies (C[i] = number of elements equal to i, i = 0..5):
C: 2 0 2 3 0 1

Step 2 – cumulative sums (C[i] = number of elements ≤ i):
C: 2 2 4 7 7 8

Step 3 – scan A from right to left; each A[j] goes to position C[A[j]]
of the output array B, then C[A[j]] is decremented:
place 3: B = _ _ _ _ _ _ 3 _    C: 2 2 4 6 7 8
place 0: B = _ 0 _ _ _ _ 3 _    C: 1 2 4 6 7 8
place 3: B = _ 0 _ _ _ 3 3 _    C: 1 2 4 5 7 8
place 2: B = _ 0 _ 2 _ 3 3 _    C: 1 2 3 5 7 8
place 0: B = 0 0 _ 2 _ 3 3 _    C: 0 2 3 5 7 8
place 3: B = 0 0 _ 2 3 3 3 _    C: 0 2 3 4 7 8
place 5: B = 0 0 _ 2 3 3 3 5    C: 0 2 3 4 7 7
place 2: B = 0 0 2 2 3 3 3 5    C: 0 2 2 4 7 7
COUNTING-SORT
(array A has indices 1..n, C has indices 0..r, B has indices 1..n)
Alg.: COUNTING-SORT(A, B, n, r)
1. for i ← 0 to r
2.   do C[ i ] ← 0
3. for j ← 1 to n
4.   do C[A[ j ]] ← C[A[ j ]] + 1
5. C[i] now contains the number of elements equal to i
6. for i ← 1 to r
7.   do C[ i ] ← C[ i ] + C[i - 1]
8. C[i] now contains the number of elements ≤ i
9. for j ← n downto 1
10.   do B[C[A[ j ]]] ← A[ j ]
11.      C[A[ j ]] ← C[A[ j ]] - 1
Analysis of Counting Sort
Alg.: COUNTING-SORT(A, B, n, r)
1. for i ← 0 to r                            Θ(r)
2.   do C[ i ] ← 0
3. for j ← 1 to n                            Θ(n)
4.   do C[A[ j ]] ← C[A[ j ]] + 1
5. C[i] now contains the number of elements equal to i
6. for i ← 1 to r                            Θ(r)
7.   do C[ i ] ← C[ i ] + C[i - 1]
8. C[i] now contains the number of elements ≤ i
9. for j ← n downto 1                        Θ(n)
10.   do B[C[A[ j ]]] ← A[ j ]
11.      C[A[ j ]] ← C[A[ j ]] - 1
Overall time: Θ(n + r); since r = O(n), this is Θ(n)
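A direct Python transcription of COUNTING-SORT (0-indexed, as is idiomatic in Python; the function name is illustrative):

```python
def counting_sort(a, r):
    """Stable counting sort of integers in [0, r], Theta(n + r)."""
    c = [0] * (r + 1)
    for x in a:                # c[i] = number of elements equal to i
        c[x] += 1
    for i in range(1, r + 1):  # c[i] = number of elements <= i
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):      # right-to-left scan keeps the sort stable
        c[x] -= 1
        b[c[x]] = x
    return b

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]
```

Scanning A right to left (line 9 of the pseudocode) is what makes the sort stable, which radix sort depends on.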
Comparison Sorts
Comparison sorts use only comparisons between elements to gain
information about the order of an input sequence
They can be modeled by a decision tree:
each internal node performs a test (a comparison between two elements)
each leaf outputs a permutation of the input
Example: the decision tree for Insertion Sort
Worst-case number of comparisons?
Worst-case number of comparisons depends on:
the length of the longest path from the root to a leaf
(i.e., the height of the decision tree)
Lemma
Any binary tree of height h has at most 2^h leaves
Proof: induction on h
Basis: h = 0 – the tree consists of one node, which is a leaf, and 2^0 = 1
Inductive step: assume the claim holds for height h - 1
Extend the height of the tree by one more level
Each leaf becomes parent to at most two new leaves, so the number of
leaves at most doubles: 2 · 2^(h-1) = 2^h
[Figure: a tree of height h - 1 extended by one level to height h]
What is the least number of leaves in a Decision Tree Model?
All permutations on n elements must appear as one of the leaves in the
decision tree: n! permutations
At least n! leaves
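Combining the two facts, a decision tree of height h with at least n! leaves must satisfy 2^h ≥ n!. A sketch of the standard lower-bound derivation (using n! ≥ (n/2)^(n/2)):

```latex
2^h \ge n!
\;\Rightarrow\;
h \;\ge\; \log_2(n!)
\;\ge\; \log_2\!\left(\frac{n}{2}\right)^{n/2}
\;=\; \frac{n}{2}\,\log_2\frac{n}{2}
\;=\; \Omega(n \log n)
```

So every comparison sort needs Ω(n log n) comparisons in the worst case, which is why Merge Sort and Heap Sort are asymptotically optimal among comparison sorts.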
How to sort …
1. Distinct Integers in Reverse Order
Radix Sort is best, if space is not a factor.
Insertion Sort: O(n²) – also its worst case
Selection Sort: always O(n²)
Bubble Sort: O(n²) – also its worst case
Quicksort:
Simple Quicksort: O(n²) – reverse-sorted input is a worst case
With median-of-3 pivot picking: O(n log n)
Heapsort: always O(n log n)
Mergesort: always O(n log n)
Radix Sort: O(nk) = O(n).
How to sort …
2. Distinct Real Numbers in Random Order
Quicksort is best. Heapsort is good. Mergesort is also good if space is not a
factor.
Insertion Sort: O(n²)
Selection Sort: always O(n²)
Bubble Sort: O(n²)
Quicksort: O(n log n) in the average case (not stable)
Heapsort: always O(n log n) (not stable)
Mergesort: always O(n log n) (stable)
Radix Sort: not appropriate for real numbers.
How to sort …
3. Distinct Integers with One Element Out of Place