Intro To DS and Algo Analysis
ALGORITHM ANALYSIS
Textbook
Seymour Lipschutz and GAV Pai, “Data Structures”, Schaum’s Outlines,
McGraw Hill.
For programming enthusiasts:
Aaron M. Tenenbaum, “Data Structures Using C”.
TOPICS
Arrays
Stacks and Queues
Linked Lists
Trees
Heaps / Priority Queues
Binary Search Trees
Search Trees
Hashing / Dictionaries
Sorting
Graphs and graph algorithms
PROBLEM SOLVING: MAIN STEPS
1. Problem definition
2. Algorithm design / Algorithm specification
3. Algorithm analysis
4. Implementation
5. Testing
6. [Maintenance]
1. PROBLEM DEFINITION
Space complexity
How much memory does the algorithm require?
Time complexity
How long does the algorithm take to run?
[Figure: running time in ms (1 to 5) for inputs A through G, illustrating best-case, average-case, and worst-case behaviour]
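Consider a sequential search through an unsorted array. A minimal sketch of the kind of routine being described here (assuming the same find(a, n, x) signature and the same 1/0 return convention used by the binary search later on):

int find ( int a[], int n, int x )
{
    int i;

    /* Scan every slot until x is found or the array is exhausted */
    for ( i = 0; i < n; i++ ) {
        if ( a[i] == x )
            return 1;   /* found */
    }

    return 0;           /* not found */
}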
This algorithm is clearly O(N): it has a single loop that depends on the size of the array, and its running time doubles as the size of the array doubles. That is only the worst-case upper bound, though. If the items are randomly distributed, on average only half of the array is searched before the item is found. So while the running time can reach the O(N) bound, it is usually less, even though we cannot say exactly how much less without more information.
Okay, how about a binary search instead of a sequential search? If
the array is sorted, we can make the search a lot faster by splitting
the array in half at each comparison and only searching the half
where the item might be. That's common knowledge, but why is it
faster? Here's the code for a binary search:
int find ( int a[], int n, int x )
{
    int i = 0;

    while ( i < n ) {
        int mid = ( i + n ) / 2;   /* midpoint of the current range [i, n) */

        if ( a[mid] < x )
            i = mid + 1;           /* x can only be in the upper half */
        else if ( a[mid] > x )
            n = mid;               /* x can only be in the lower half */
        else
            return 1;              /* found */
    }

    return 0;                      /* not found */
}
We can call this an O(N) algorithm and not be wrong, because the running time never exceeds O(N). But because the array is split in half at each step, the number of steps is at most the base-2 logarithm of N, which is considerably less than N. So a better choice is to set the upper bound at log2 N, the tightest limit we are guaranteed never to cross. Therefore, a more accurate claim is that binary search is a logarithmic, or O(log2 N), algorithm.
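As a quick illustration (a hypothetical driver, not from the original notes), searching a small sorted array with the binary-search find above:

#include <stdio.h>

int find ( int a[], int n, int x );   /* the binary search shown above */

int main ( void )
{
    int a[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
    int n = sizeof a / sizeof a[0];

    /* The range is halved at each step, so only a few comparisons are ever needed */
    printf ( "%d\n", find ( a, n, 11 ) );   /* prints 1: found     */
    printf ( "%d\n", find ( a, n, 4 ) );    /* prints 0: not found */

    return 0;
}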
WHAT IS LOG2 N?
In mathematics, the binary logarithm (log2 n) is the logarithm to base 2. It is the inverse of the function n → 2^n: the binary logarithm of n is the power to which 2 must be raised to obtain the value n. This makes the binary logarithm useful for anything involving powers of 2. For example, the binary logarithm of 1 is 0, the binary logarithm of 2 is 1, the binary logarithm of 4 is 2, the binary logarithm of 8 is 3, the binary logarithm of 16 is 4 and the binary logarithm of 32 is 5.
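One informal way to see the connection to halving (a small illustrative helper, not part of the search code; the name log2_floor is ours) is to count how many times n can be divided by 2 before it reaches 1:

/* Counts how many times n can be halved before reaching 1;
   for n a power of two this is exactly log2(n). */
int log2_floor ( int n )
{
    int steps = 0;

    while ( n > 1 ) {
        n /= 2;
        ++steps;
    }

    return steps;
}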
Sometimes we're interested not in an upper bound but in a lower bound: what is the smallest amount of work we can expect? For example, what is the lower bound for the binary search we just analyzed? We can easily say that the lower bound is constant, O(1), because the best possible case is an immediate match on the first comparison.
Okay, what about a sorting algorithm? Let's start with selection sort. The
algorithm is simple: find the largest item and move it to the back of the
array. When you move an item to the back, decrease the size of the array so
that you don't continually choose from the items that have already been
selected:
void jsw_selection ( int a[], int n )
{
    while ( --n > 0 ) {
        int i, max = n;

        /* Find the index of the largest item in a[0..n] */
        for ( i = 0; i < n; i++ ) {
            if ( a[i] > a[max] )
                max = i;
        }

        /* Move that item to the back of the unsorted portion */
        if ( max != n )
            jsw_swap ( &a[n], &a[max] );
    }
}
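Both this sort and the heap sort below call a jsw_swap helper that isn't shown in these notes; a minimal version consistent with those calls would be:

void jsw_swap ( int *a, int *b )
{
    int save = *a;

    *a = *b;
    *b = save;
}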
This algorithm has two loops, one inside the other. Both depend on the size of the array, so the algorithm is clearly O(N * N), more commonly written O(N2) and referred to as quadratic. The fact that N decreases with each step of the outer loop does not change this unless you want a tight bound, and even then the analysis is harder. But that doesn't matter much, because the upper bound is really all we care about for an existing sorting algorithm.
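To make the quadratic bound concrete (a worked count, not from the original notes): the inner loop makes n comparisons for n = N - 1, N - 2, ..., 1, so the total number of comparisons is 1 + 2 + ... + (N - 1) = N(N - 1) / 2, which is roughly N2 / 2 and therefore still grows in proportion to N2.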
Let's look at a faster sort. The heap sort algorithm uses a tree-based structure to speed up the selection step.
void jsw_do_heap ( int a[], int i, int n )
{
    int k = i * 2 + 1;        /* left child of i */
    int save = a[i];

    /* Sift a[i] down until the max-heap property is restored */
    while ( k < n ) {
        if ( k + 1 < n && a[k] < a[k + 1] )
            ++k;              /* pick the larger child */

        if ( save >= a[k] )
            break;

        a[i] = a[k];
        i = k;
        k = i * 2 + 1;
    }

    a[i] = save;
}

void jsw_heapsort ( int a[], int n )
{
    int i = n / 2;

    /* Build the initial max-heap */
    while ( i-- > 0 )
        jsw_do_heap ( a, i, n );

    /* Repeatedly move the largest item to the back and re-heapify */
    while ( --n > 0 ) {
        jsw_swap ( &a[0], &a[n] );
        jsw_do_heap ( a, 0, n );
    }
}
Because the heap is structured like a tree, jsw_do_heap walks down a single path from the root, so it takes O(log2 N) steps. The first loop in jsw_heapsort runs N / 2 times, but because the second loop runs N times and dominates it, we can ignore the first loop's contribution. So we have an O(N) loop that calls an O(log2 N) function. We conclude that the upper bound of heap sort is O(N log2 N), which doesn't have a common one-word name, but is often written O(N * log2 N).
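As a quick sanity check (a hypothetical driver, not part of the original notes; it assumes the jsw_heapsort above and the jsw_swap helper sketched earlier):

#include <stdio.h>

void jsw_heapsort ( int a[], int n );   /* defined above */

int main ( void )
{
    int a[] = { 9, 1, 8, 2, 7, 3, 6, 4, 5 };
    int n = sizeof a / sizeof a[0];
    int i;

    jsw_heapsort ( a, n );

    for ( i = 0; i < n; i++ )
        printf ( "%d ", a[i] );          /* prints 1 2 3 4 5 6 7 8 9 */
    printf ( "\n" );

    return 0;
}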
We've looked at the most common time complexities: O(1) for constant time, O(log2 N) for logarithmic time, O(N) for linear time, O(N log2 N), and O(N2) for quadratic time. Others exist, such as O(N!) for factorial growth, but you won't see them often. Here are the upper-bound time complexities you are most likely to see, in order of growth from least to greatest:
O(1) - No growth
O(log2 N) - Grows by a constant amount when N doubles
O(N) - Doubles when N doubles
O(N log2 N) - Slightly more than doubles when N doubles
O(N2) - Quadruples when N doubles
O(N!) - Factorial growth; explodes even for modest N
COMPARISON OF GROWTH OF COMMON
FUNCTIONS
GROWTH OF FUNCTIONS
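For a rough sense of scale (illustrative values computed for this comparison, not taken from the original slides):

N            log2 N     N log2 N        N2
16           4          64              256
1,024        10         10,240          1,048,576
1,048,576    20         20,971,520      about 1.1 * 10^12 (1,099,511,627,776)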
ASYMPTOTIC NOTATION
The notations we use to describe the asymptotic running time of an algorithm are defined in terms of functions whose domains are the set of natural numbers N = {0, 1, 2, ...}. Such notations are convenient for describing the worst-case running-time function T(n), which is usually defined only on integer input sizes.
Θ-NOTATION