Sorting Lecture-5 DAA

This lecture provides an overview of sorting algorithms, defining sorting as the process of arranging data records in a specified order based on their keys. It discusses various sorting methods, including comparison-based algorithms like bubble sort and quicksort, as well as non-comparison-based methods like radix sort and bucket sort, highlighting their time complexities and stability properties. It also explains the importance of sorting in data retrieval and processing, particularly in database systems.


Sorting Algorithms

Sorting - Definitions
• Input: You are given an array A of data records, each with a key (which could be an integer, character, string, etc.)
  - There is an ordering on the set of possible keys
  - You can compare any two keys using <, >, ==
• For simplicity, we will assume that A[i] contains only one element – the key

• Sorting Problem: Given an array A, output A such that:
  - For any i and j, if i < j then A[i] <= A[j]

• Internal sorting: all data in main memory
• External sorting: data on disk

Sorting - Definitions
• Sorting is also a problem for data sets stored in a linked list, where the keys are not kept in sorted order
  - Given an unsorted linked list (singly-linked or doubly-linked), produce a linked list that has the keys in sorted order
  - We will not look at sorting algorithms for linked lists

• Notice that sorting is easy for data sets stored in search trees (BST, AVL, splay trees) and in B-trees / B+-trees:
  - An inorder traversal of the tree yields the keys in sorted order in O(N) time

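A minimal Python sketch (not from the slides) of that last point: an inorder traversal of a binary search tree yields the keys in ascending order.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def inorder(root):
        # Left subtree, then the node itself, then the right subtree.
        if root is not None:
            yield from inorder(root.left)
            yield root.key
            yield from inorder(root.right)

    # Example BST holding {2, 3, 5, 7}:
    root = Node(5, Node(3, Node(2)), Node(7))
    print(list(inorder(root)))  # [2, 3, 5, 7]
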
Why Sort?
• Sorting algorithms are among the most frequently used algorithms in computer science
  - Crucial for efficient retrieval and processing of large volumes of data, e.g., database systems

• Allows binary search of an N-element array in O(log N) time (see the sketch below)

• Allows O(1) time access to the kth largest element in the array, for any k

• Allows easy detection of any duplicates

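A minimal sketch of the binary-search point, assuming a sorted Python list:

    def binary_search(a, target):
        # Return an index of target in sorted list a, or -1 if absent.
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] == target:
                return mid
            if a[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    print(binary_search([1, 3, 5, 7, 9], 7))  # 3
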
Sorting – Things to consider
• Space: Does the sorting algorithm require extra memory to sort the collection of items?
  - Do you need to copy and temporarily store some subset of the keys/data records?
  - An algorithm which requires only O(1) extra space is known as an in-place sorting algorithm

• Stability: Does it rearrange the order of input data records which have the same key value (duplicates)?
  - E.g., given a phone book sorted by name, now sort it by district – is the list still sorted by name within each district?
  - Extremely important property for databases – next slide
  - A stable sorting algorithm is one which does not rearrange the relative order of duplicate keys

Stability – Why?
• Consider the following data records sorted by name. The second field is the student's class (1st year, 2nd year):

(Ali, 1), (Mehmet, 2), (Nazan, 1), (Selim, 1), (Zeynep, 2)

• Now sort this set with respect to class:

(Ali, 1), (Nazan, 1), (Selim, 1), (Mehmet, 2), (Zeynep, 2)
  - The set is sorted with respect to class
  - And students are sorted with respect to name within each class
  - This is the output of a STABLE sorting algorithm

(Nazan, 1), (Ali, 1), (Selim, 1), (Zeynep, 2), (Mehmet, 2)
  - The set is sorted with respect to class
  - But students are NOT sorted with respect to name within each class
  - This is the output of an UNSTABLE sorting algorithm

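A quick illustration, relying on the fact that Python's built-in sorted() is stable:

    students = [("Ali", 1), ("Mehmet", 2), ("Nazan", 1), ("Selim", 1), ("Zeynep", 2)]
    # Already sorted by name; a stable sort by class keeps names
    # ordered within each class.
    print(sorted(students, key=lambda s: s[1]))
    # [('Ali', 1), ('Nazan', 1), ('Selim', 1), ('Mehmet', 2), ('Zeynep', 2)]
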
Exchange Sort
• Compare the 1st element to each other element
• Swap a pair if out of order
• This completes one pass
  - Places the smallest element in the 1st position
• Repeat for the 2nd, 3rd, 4th, etc. elements (a code sketch follows the trace below)
Exchange Sort – single pass

3 8 2 4 1 7 6 5    initial array
3 8 2 4 1 7 6 5    compare A[1]=3 with A[2]=8: no swap
2 8 3 4 1 7 6 5    compare A[1]=3 with A[3]=2: swap
2 8 3 4 1 7 6 5    compare A[1]=2 with A[4]=4: no swap
1 8 3 4 2 7 6 5    compare A[1]=2 with A[5]=1: swap
1 8 3 4 2 7 6 5    compare A[1]=1 with A[6]=7: no swap
1 8 3 4 2 7 6 5    compare A[1]=1 with A[7]=6: no swap
1 8 3 4 2 7 6 5    compare A[1]=1 with A[8]=5: no swap; the smallest element is now in the 1st position

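A minimal Python sketch of exchange sort as described above (the helper name is ours, not from the slides):

    def exchange_sort(a):
        n = len(a)
        for i in range(n - 1):          # pass i places the i-th smallest at a[i]
            for j in range(i + 1, n):
                if a[j] < a[i]:         # out of order: swap the pair
                    a[i], a[j] = a[j], a[i]
        return a

    print(exchange_sort([3, 8, 2, 4, 1, 7, 6, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
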
An overview of Simple Sorting Algorithms
• Bubble sort, selection sort, and insertion sort are all O(n²)
• As we will see later, we can do much better than this with somewhat more complicated sorting algorithms
• Within O(n²):
  - Bubble sort is very slow, and should probably never be used for anything
  - Selection sort is intermediate in speed
  - Insertion sort is usually the fastest of the three; in fact, for small arrays (say, 10 or 15 elements), insertion sort is faster than more complicated sorting algorithms
• Selection sort and insertion sort are “good enough” for small arrays
Sorting methods
• Comparison-based sorting
  - O(n²) methods
    E.g., insertion sort, bubble sort
  - Average-time O(n log n) methods
    E.g., quicksort
  - Worst-case O(n log n) methods
    E.g., merge sort, heap sort
• Non-comparison-based sorting
  - Integer sorting: linear time
    E.g., counting sort, bin sort
  - Radix sort, bucket sort
• Stable vs. non-stable sorting

Comparison Sorting

Sort           Worst Case    Average Case   Best Case     Comments
InsertionSort  Θ(N²)         Θ(N²)          Θ(N)          Fast for small N
MergeSort      Θ(N log N)    Θ(N log N)     Θ(N log N)    Requires memory
HeapSort       Θ(N log N)    Θ(N log N)     Θ(N log N)    Large constants
QuickSort      Θ(N²)         Θ(N log N)     Θ(N log N)    Small constants

Binary Trees
• A tree in which no node can have more than two children

(Figure: a generic binary tree)

• The depth of an “average” binary tree is considerably smaller than N, even though in the worst case the depth can be as large as N – 1

(Figure: a worst-case binary tree – a chain of N = 5 nodes)

Heap
• Definition of Heap
  - A complete binary tree: every level is full except possibly the bottom level, which is packed to the left
  - Every node is always <= (or >=) its children
  - The root node is therefore the minimum (min-heap) or maximum (max-heap)

• Array Implementation
  - When stored in an array A:
  - If A[i] has a parent, then the parent is A[⌊i/2⌋]
  - If A[i] has a left child, then the left child is A[2i]
  - If A[i] has a right child, then the right child is A[2i+1]
  - The depth of A[i] is ⌊log i⌋
A (max) heap stored in an array

Index: 1  2  3  4  5  6  7  8  9
Value: 80 70 60 20 30 50 10 15 8

As a tree (level d holds up to 2^d nodes):

              80
            /    \
          70      60
         /  \    /  \
       20    30 50   10
      /  \
    15    8

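A minimal sketch of the 1-based index arithmetic above (A[0] is unused padding so the formulas match the slides):

    A = [None, 80, 70, 60, 20, 30, 50, 10, 15, 8]  # the max-heap above

    def parent(i): return i // 2
    def left(i):   return 2 * i
    def right(i):  return 2 * i + 1

    print(A[parent(9)], A[left(2)], A[right(2)])   # 20 20 30
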
Construct a heap (heapify)
• A tree with a single node is automatically a heap
• We construct a heap by inserting nodes one at a time
  - to the right of the rightmost node in the deepest level
  - If the deepest level is full, start a new level

(Figure: the new node is added at the next free position in the deepest level, or at the leftmost position of a new level)
Construct a heap (heapify) cont.
• After adding the node, if it destroys the heap property, we percolate it up by comparing it with its parent
• Example (given 4 numbers, {8, 10, 5, 12}):

Step 1: insert 8        8

Step 2: insert 10      10       (10 percolates up past 8)
                      /
                     8

Step 3: insert 5       10
                      /  \
                     8    5

Step 4: insert 12      12       (12 percolates up past 8, then 10)
                      /  \
                    10    5
                   /
                  8

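A minimal sketch of insertion with percolate-up in a 1-based max-heap array (the helper name is ours; it reproduces the example above):

    def heap_insert(A, key):
        # A[0] is unused padding; A[1:] is a max-heap.
        A.append(key)
        i = len(A) - 1
        while i > 1 and A[i // 2] < A[i]:   # percolate up past smaller parents
            A[i], A[i // 2] = A[i // 2], A[i]
            i //= 2

    A = [None]
    for key in (8, 10, 5, 12):
        heap_insert(A, key)
    print(A[1:])  # [12, 10, 5, 8]
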
Heap sort
• Sorting can be performed by:
  - Constructing a heap from the original array
  - Then repeatedly deleting the root node of the heap
    Sort in ascending order: min-heap
    Sort in descending order: max-heap
• Problem
  - The delete operation may destroy the heap property
  - So after deleting the root, we have to re-heap the tree
Delete
• Re-heap solution
  - Remove the rightmost leaf at the deepest level and use it as the new root
  - Percolate the root node down by comparing it with its children
Delete example

Initial max-heap:

                 25
              /      \
           22          17
          /  \        /  \
        19    22    14    15
       /  \   / \   / \
     18   14 21   3 9  11

Remove the root (25); move the rightmost leaf at the deepest level (11) to the root:

                 11
              /      \
           22          17
          /  \        /  \
        19    22    14    15
       /  \   / \   /
     18   14 21   3 9

Percolate down: swap 11 with its larger child (the left 22):

                 22
              /      \
           11          17
          /  \        /  \
        19    22    14    15
       /  \   / \   /
     18   14 21   3 9

Swap 11 with its larger child (22 again):

                 22
              /      \
           22          17
          /  \        /  \
        19    11    14    15
       /  \   / \   /
     18   14 21   3 9

Swap 11 with its larger child (21); 11 is now >= its children, so stop:

                 22
              /      \
           22          17
          /  \        /  \
        19    21    14    15
       /  \   / \   /
     18   14 11   3 9

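A minimal sketch of deletion with percolate-down in a 1-based max-heap array (the helper name is ours; it reproduces the example above):

    def heap_delete_max(A):
        # A[0] is unused padding; A[1:] is a max-heap.
        root = A[1]
        last = A.pop()                   # rightmost leaf at the deepest level
        if len(A) > 1:
            A[1] = last                  # it becomes the new root
            i, n = 1, len(A) - 1
            while 2 * i <= n:
                c = 2 * i                # pick the larger of the two children
                if c + 1 <= n and A[c + 1] > A[c]:
                    c += 1
                if A[i] >= A[c]:
                    break
                A[i], A[c] = A[c], A[i]  # percolate down
                i = c
        return root

    A = [None, 25, 22, 17, 19, 22, 14, 15, 18, 14, 21, 3, 9, 11]
    print(heap_delete_max(A))  # 25
    print(A[1:])  # [22, 22, 17, 19, 21, 14, 15, 18, 14, 11, 3, 9]
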
Complexity
• Insert one node into a heap: O(log n) time
• Delete one node from a heap: O(log n) time
• Why?
  - Insertion/deletion only causes one node to percolate up/down along the tree
  - The height of an n-node heap is O(log n)
Complexity
• Heapify the array: O(n log n) time (n insertions)
• Keep deleting the root and re-heaping
  - Takes O(n log n) time (n deletions)
• So the overall complexity of heap sort is O(n log n) time

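A minimal sketch using Python's built-in min-heap module (heapq), matching the ascending-order case above. Note that heapq.heapify builds the heap bottom-up in O(n), which is even cheaper than the n-insertion construction described on the slide:

    import heapq

    def heap_sort(items):
        h = list(items)
        heapq.heapify(h)                          # build a min-heap, O(n)
        return [heapq.heappop(h) for _ in items]  # n deletions, O(n log n)

    print(heap_sort([3, 8, 2, 4, 1, 7, 6, 5]))    # [1, 2, 3, 4, 5, 6, 7, 8]
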
Complexity
• Heapsort is always O(n log n)
• Quicksort is usually O(n log n), but in the worst case it slows to O(n²)
• Quicksort is generally faster, but heapsort is better in time-critical applications
Radix Sort
• Represents keys as d-digit numbers in some base k:
key = x1 x2 ... xd, where 0 ≤ xi ≤ k−1

• Example: key = 15
  - base 10: key = 15, d = 2, k = 10, 0 ≤ xi ≤ 9
  - base 2: key = 1111, d = 4, k = 2, 0 ≤ xi ≤ 1
Radix Sort
• Assumptions:
  - d = Θ(1) and k = O(n)
• Sorting looks at one column (digit position) at a time
  - For a d-digit number, sort on the least significant digit first
  - Continue sorting on the next least significant digit, until all digits have been sorted
  - Requires only d passes through the list
RADIX-SORT
Alg.: RADIX-SORT(A, d)
    for i ← 1 to d
        do use a stable sort to sort array A on digit i
• Digit 1 is the lowest-order digit; digit d is the highest-order digit

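A minimal Python sketch of RADIX-SORT for base-10 integers. Python's sorted() is stable, so it stands in here for the per-digit stable sort (the analysis below assumes counting sort instead):

    def radix_sort(a, d):
        for i in range(d):  # digit 1 (lowest order) up to digit d
            a = sorted(a, key=lambda x: (x // 10**i) % 10)
        return a

    print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
    # [329, 355, 436, 457, 657, 720, 839]
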
Analysis of Radix Sort
• Given n numbers of d digits each, where each digit may take up to k possible values, RADIX-SORT correctly sorts the numbers in Θ(d(n + k)) time
  - One pass of sorting per digit takes Θ(n + k), assuming that we use counting sort
  - There are d passes (one for each digit)
Correctness of Radix sort
• We use induction on the number of passes through each digit
• Basis: if d = 1, there's only one digit, trivial
• Inductive step: assume digits 1, 2, . . . , d−1 are sorted
  - Now sort on the d-th digit
  - If ad < bd, the sort will put a before b: correct, since a < b regardless of the low-order digits
  - If ad > bd, the sort will put a after b: correct, since a > b regardless of the low-order digits
  - If ad = bd, the sort will leave a and b in the same order (stable!), and a and b are already sorted on the low-order d−1 digits
Bucket Sort
• Assumption:
  - the input is generated by a random process that distributes elements uniformly over [0, 1)
• Idea:
  - Divide [0, 1) into n equal-sized buckets
  - Distribute the n input values into the buckets
  - Sort each bucket (e.g., using quicksort)
  - Go through the buckets in order, listing the elements in each one

• Input: A[1 . . n], where 0 ≤ A[i] < 1 for all i
• Output: the elements of A in sorted order
• Auxiliary array: B[0 . . n − 1] of linked lists, each list initially empty

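A minimal Python sketch of the idea (plain lists stand in for the linked lists):

    def bucket_sort(a):
        n = len(a)
        b = [[] for _ in range(n)]      # B[0 .. n-1], initially empty
        for x in a:                     # distribute: x goes to bucket floor(n*x)
            b[int(n * x)].append(x)
        for bucket in b:                # sort each bucket
            bucket.sort()
        return [x for bucket in b for x in bucket]  # concatenate in order

    print(bucket_sort([.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]))
    # [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
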
Example - Bucket Sort

Input array A (indices 1..10), distributed into buckets B[0..9]; bucket i holds keys in [i/10, (i+1)/10):

A: .78 .17 .39 .26 .72 .94 .21 .12 .23 .68

B[0]: /
B[1]: .17 -> .12 /
B[2]: .26 -> .21 -> .23 /
B[3]: .39 /
B[4]: /
B[5]: /
B[6]: .68 /
B[7]: .78 -> .72 /
B[8]: /
B[9]: .94 /
Example - Bucket Sort

After sorting each bucket, concatenate the lists from B[0] to B[n − 1] together, in order:

B[0]: /
B[1]: .12 -> .17 /
B[2]: .21 -> .23 -> .26 /
B[3]: .39 /
B[4]: /
B[5]: /
B[6]: .68 /
B[7]: .72 -> .78 /
B[8]: /
B[9]: .94 /

Output: .12 .17 .21 .23 .26 .39 .68 .72 .78 .94
Correctness of Bucket Sort
• Consider two elements A[i], A[j]
• Assume without loss of generality that A[i] ≤ A[j]
• Then ⌊n·A[i]⌋ ≤ ⌊n·A[j]⌋
  - A[i] belongs to the same bucket as A[j] or to a bucket with a lower index than that of A[j]
• If A[i], A[j] belong to the same bucket:
  - sorting puts them in the proper order
• If A[i], A[j] are put in different buckets:
  - concatenation of the lists puts them in the proper order
Analysis of Bucket Sort
Alg.: BUCKET-SORT(A, n)
    for i ← 1 to n                                    Θ(n)
        do insert A[i] into list B[⌊n·A[i]⌋]
    for i ← 0 to n − 1                                Θ(n) expected, for a
        do sort list B[i] with quicksort              uniformly distributed input
    concatenate lists B[0], B[1], . . . , B[n − 1]    Θ(n)
        together in order
    return the concatenated lists

Overall expected time: Θ(n)
Radix Sort Is a Bucket Sort
(Slides on the running time of the 2nd step and on the effect of the radix k: figures not reproduced)
Counting Sort
• Assumptions:
  - Sort n integers which are in the range [0 ... r]
  - r is in the order of n, that is, r = O(n)
• Idea:
  - For each element x, find the number of elements ≤ x
  - Place x into its correct position in the output array

Step 1: count the occurrences of each key value (frequencies)
Step 2: compute running totals of the counts (cumulative sums); C[i] then gives the number of elements ≤ i, i.e., the position in the output array of the last element with key i

Algorithm
• Start from the last element of A (why? this is what keeps the sort stable – see hw)
• Place A[i] at its correct place in the output array
• Decrease C[A[i]] by one
Example

A (indices 1..8):           2 5 3 0 2 3 0 3
C (frequencies, 0..5):      2 0 2 3 0 1
C (cumulative sums, 0..5):  2 2 4 7 7 8

Scan A from right to left; place A[j] at B[C[A[j]]], then decrement C[A[j]]:

A[8]=3: B = _ _ _ _ _ _ 3 _    C = 2 2 4 6 7 8
A[7]=0: B = _ 0 _ _ _ _ 3 _    C = 1 2 4 6 7 8
A[6]=3: B = _ 0 _ _ _ 3 3 _    C = 1 2 4 5 7 8
A[5]=2: B = _ 0 _ 2 _ 3 3 _    C = 1 2 3 5 7 8
A[4]=0: B = 0 0 _ 2 _ 3 3 _    C = 0 2 3 5 7 8
A[3]=3: B = 0 0 _ 2 3 3 3 _    C = 0 2 3 4 7 8
A[2]=5: B = 0 0 _ 2 3 3 3 5    C = 0 2 3 4 7 7
A[1]=2: B = 0 0 2 2 3 3 3 5    C = 0 2 2 4 7 7
COUNTING-SORT

Alg.: COUNTING-SORT(A, B, n, r)
1.  for i ← 0 to r
2.      do C[i] ← 0
3.  for j ← 1 to n
4.      do C[A[j]] ← C[A[j]] + 1
5.  C[i] contains the number of elements equal to i
6.  for i ← 1 to r
7.      do C[i] ← C[i] + C[i − 1]
8.  C[i] contains the number of elements ≤ i
9.  for j ← n downto 1
10.     do B[C[A[j]]] ← A[j]
11.        C[A[j]] ← C[A[j]] − 1

(A[1 . . n] is the input, C[0 . . r] holds the counts, B[1 . . n] is the output.)

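A minimal 0-based Python translation of the pseudocode above:

    def counting_sort(A, r):
        C = [0] * (r + 1)
        for x in A:                # C[i] = number of elements equal to i
            C[x] += 1
        for i in range(1, r + 1):  # C[i] = number of elements <= i
            C[i] += C[i - 1]
        B = [0] * len(A)
        for x in reversed(A):      # right to left keeps the sort stable
            C[x] -= 1              # 0-based, so decrement before placing
            B[C[x]] = x
        return B

    print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]
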
Analysis of Counting Sort

Alg.: COUNTING-SORT(A, B, n, r)
1.  for i ← 0 to r                          Θ(r)
2.      do C[i] ← 0
3.  for j ← 1 to n                          Θ(n)
4.      do C[A[j]] ← C[A[j]] + 1
5.  C[i] contains the number of elements equal to i
6.  for i ← 1 to r                          Θ(r)
7.      do C[i] ← C[i] + C[i − 1]
8.  C[i] contains the number of elements ≤ i
9.  for j ← n downto 1                      Θ(n)
10.     do B[C[A[j]]] ← A[j]
11.        C[A[j]] ← C[A[j]] − 1

Overall time: Θ(n + r)

Analysis of Counting Sort
• Overall time: Θ(n + r)
• In practice we use COUNTING sort when r = O(n)
  - in which case the running time is Θ(n)
• Counting sort is stable
• Counting sort is not an in-place sort
Comparison sort
• A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator).
• The only requirement is that the operator obey the three defining properties of a total order:
  - if a ≤ b and b ≤ a then a = b (antisymmetry)
  - if a ≤ b and b ≤ c then a ≤ c (transitivity)
  - a ≤ b or b ≤ a (totality)

Examples
• Some of the most well-known comparison sorts include:
  - Quicksort
  - Heapsort
  - Merge sort
  - Introsort
  - Insertion sort
  - Selection sort
  - Bubble sort
• Some examples of sorts which are not comparison sorts include:
  - Radix sort (examines individual digits of keys)
  - Counting sort (indexes using key values)
  - Bucket sort (distributes keys into buckets based on their values)


Comparison Sorts
• Comparison sorts use comparisons between elements to gain information about an input sequence a1, a2, …, an
• Perform tests:
  ai < aj, ai ≤ aj, ai = aj, ai ≥ aj, or ai > aj
to determine the relative order of ai and aj
• For simplicity, assume that all the elements are distinct


Lower-Bound for Sorting
• Theorem: To sort n elements, comparison sorts must make Ω(n lg n) comparisons in the worst case.
Decision Tree Model
• Represents the comparisons made by a sorting algorithm on an input of a given size
• Models all possible execution traces
  - Control, data movement, and other operations are ignored
  - Count only the comparisons

• Internal node: a comparison of two elements (ai ≤ aj?)
• Leaf: an output of the algorithm – one permutation of the input
Example: Insertion Sort
(Decision-tree figure not reproduced)
Worst-case number of comparisons?
• The worst-case number of comparisons equals the length of the longest path from the root to a leaf (i.e., the height of the decision tree)
Lemma
• Any binary tree of height h has at most 2^h leaves

Proof: induction on h
Basis: h = 0 ⇒ the tree has one node, which is a leaf; 2^0 = 1
Inductive step: assume the lemma is true for h − 1
• Extend the height of the tree by one more level
• Each leaf becomes parent to at most two new leaves

No. of leaves at height h ≤ 2 × (no. of leaves at height h − 1)
                          ≤ 2 × 2^(h−1)
                          = 2^h
What is the least number of leaves in a Decision Tree Model?
• All permutations of n elements must appear as one of the leaves in the decision tree: n! permutations
• At least n! leaves

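Combining the lemma with the leaf count completes the proof of the theorem (a standard step, sketched here):

2^h ≥ (number of leaves) ≥ n!
⟹ h ≥ lg(n!) ≥ lg((n/e)^n) = n lg n − n lg e = Ω(n lg n),

using n! ≥ (n/e)^n (Stirling). So every comparison sort's decision tree has height Ω(n lg n), i.e., some input forces Ω(n lg n) comparisons.
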
How to sort …
1. Distinct Integers in Reverse Order
• Radix Sort is best, if space is not a factor.
  - Insertion Sort: O(n²) – this is also its worst case
  - Selection Sort: always O(n²)
  - Bubble Sort: O(n²) – this is also its worst case
  - Quicksort:
    Simple quicksort: O(n²) – worst case
    With median-of-3 pivot picking: O(n log n)
  - Heapsort: always O(n log n)
  - Mergesort: always O(n log n)
  - Radix Sort: O(kn) = O(n)
How to sort …
2. Distinct Real Numbers in Random Order
• Quicksort is best. Heapsort is good. Mergesort is also good if space is not a factor.
  - Insertion Sort: O(n²)
  - Selection Sort: always O(n²)
  - Bubble Sort: O(n²)
  - Quicksort: O(n log n) on average (unstable)
  - Heapsort: always O(n log n) (unstable)
  - Mergesort: always O(n log n) (stable)
  - Radix Sort: not appropriate for real numbers
How to sort …
3. Distinct Integers with One Element Out of Place
• Insertion Sort is best.
• If the element is “later” than its proper place, then Bubble Sort (toward the smallest end) is also good.
• Otherwise, Radix Sort.
How to sort …
3. Distinct Integers with One Element Out of Place
  - Insertion Sort: O(n)
  - Selection Sort: always O(n²)
  - Bubble Sort:
    “Later”: O(n)
    “Earlier”: O(n²)
  - Quicksort:
    Simple quicksort: O(n²) – close to the worst case
    With median-of-3 pivot picking: O(n log n)
  - Heapsort: always O(n log n)
  - Mergesort: always O(n log n)
  - Radix Sort: O(kn) = O(n)
How to sort …
4. Distinct Real Numbers, “Almost Sorted”
• Insertion Sort is best; Bubble Sort is almost as good
  - Insertion Sort: almost O(n)
  - Selection Sort: always O(n²)
  - Bubble Sort: almost O(n)
  - Quicksort: depending on the data, somewhere between O(n²) and O(n log n)
  - Heapsort: always O(n log n)
  - Mergesort: always O(n log n)
  - Radix Sort: not appropriate for real numbers
Timing Comparisons
• O(n²) sorting algorithms (timing plot not reproduced)

Timing Comparisons
• O(n log n) sorting algorithms (timing plot not reproduced)

Complexity Comparison
• Comparing the functions 10n² and 30n log(n) for small values of n (plot not reproduced)
Quiz No 1 (20 min)

Q No 1: Apply the master theorem to the following recurrence relations
a) T(n) = T(n/2) + 2^n
b) T(n) = 0.5 T(n/2) + 1/n
c) T(n) = 2^n T(n/2) + n^n
d) T(n) = 64 T(n/8) − n² log n

Q No 2: Prove the following
a) 7n − 2 is O(n)
b) 3n³ + 20n² + 5 is O(n³)
