CSX0003 Materials
Integer Multiplication
Transcript
Sometime when you were a kid, maybe in third grade or so, you
learned an algorithm for multiplying two numbers. Maybe your third
grade teacher didn't call it that, maybe that's not how you thought about
it. But you learned a well-defined set of rules for transforming an input,
namely two numbers, into an output, namely their product. So that is an
algorithm for solving a computational problem. Let's pause and be
precise about it. Many of the lectures in this course will follow a pattern.
We'll define a computational problem. We'll say what the input is, and
then we'll say what the desired output is. Then we will proceed to giving
a solution, to giving an algorithm that transforms the input to the output.
In the integer multiplication problem, the input is just two n-digit
numbers. So the length, n, of the two input integers x and y could be
anything, but for motivation you might want to think of n as large, in the
thousands or even more, perhaps we're implementing some kind of
cryptographic application which has to manipulate very large numbers.
We also need to explain what the desired output is. In this simple problem,
it's simply the product x times y. A quick digression: back in third grade,
around the same time I was learning the integer multiplication algorithm, I got
a C in penmanship, and I don't think my handwriting has improved much
since. Many people tell me that by the end of the course they think of it
fondly, as a sort of acquired taste, but if you're feeling impatient, please
note there are typed versions of these slides, which I encourage you to
use as you go through the lectures if you don't want to take the time
deciphering the handwriting. Returning to the integer multiplication
problem, having now specified it precisely, the input and the
desired output, we'll move on to discussing an algorithm that solves it,
namely, the same algorithm you learned in third grade. The way we will
assess the performance of this algorithm is through the number of basic
operations that it performs. And for the moment, let's think of a basic
operation as simply adding two single-digit numbers together or
multiplying two single digit numbers. We're going to then move on to
counting the number of these basic operations performed by the third
grade algorithm, as a function of the number n of digits in the input.
Here's the integer multiplication algorithm that you learned back in third
grade, illustrated on a concrete example. Let's take, say, the numbers 1,
2, 3, 4 and 5, 6, 7, 8. As we go through this algorithm quickly, let me
remind you that our focus should be on the number of basic operations
this algorithm performs. As a function of the length of the input numbers.
Which, in this particular example, is four digits long. So as you'll recall,
we just compute one partial product for each digit of the second number.
So we start by just multiplying 4 times the upper number 5, 6, 7, 8. So,
you know, 4 times 8 is 32, write down the 2, carry the 3; 4 times 7 is 28, with the 3 that's
31, write down the 1, carry the 3, and so on. When we do the next partial
product, we do a shift effectively, we add a 0 at the end, and then we
just do exactly the same thing. And so on for the final two partial
products. And finally, we just add everything up. What you probably
realized back in third grade is that this algorithm is
what we would call correct. That is, no matter what integers x and y you
start with, if you carry out this procedure, this algorithm, and all of your
intermediate computations are done properly, then the algorithm will
eventually terminate with the product, x times y, of the two input
numbers. You're never going to get a wrong answer. You're always
going to get the actual product. What you probably didn't think about was
the amount of time needed to carry this algorithm out to its
conclusion, to termination. That is, the number of basic operations,
additions or multiplications of single-digit numbers, needed before
finishing. So let's now quickly give an informal analysis of the number of
operations required as a function of the input length n. Let's begin with
the first partial product, the top row. How did we compute this number
22,712? Well we multiplied 4 times each of the numbers 5, 6, 7 and 8.
So that was four basic operations, one for each digit of the top number,
plus we had to do these carries, so those were some extra additions.
But in any case, this is at most twice the number of digits in the
first number, at most 2n basic operations, to form this first partial
product. And if you think about it, there's nothing special about the first
partial product. The same argument says that we need at most 2n
operations to form each of the partial products, of which there are again
n, one for each digit of the second number. Well, if we need at most 2n
operations to compute each partial product and we have n partial
products, that's a total of at most 2n squared operations to form all of
these blue numbers, all of the partial products. Now we're not done at
that point. We still have to add all of those up to get the final answer, in
this case 7,006,652. And that final addition requires a comparable
number of operations, roughly another 2n squared operations, at most.
So the upshot, the high-level point that I want you to focus
on, is that as we think about the input numbers getting bigger and
bigger, that is, as a function of n, the number of digits in the input
numbers, the number of operations that the grade-school multiplication
algorithm performs grows like some constant, roughly 4 say, times n
squared. That is, it's quadratic in the input length n. For example, if you
double the size of the input, if you double the number of digits in each of
the two integers that you're given, then the number of operations you
will have to perform using this algorithm goes up by a factor of four.
Similarly, if you quadruple the input length, the number of operations
is going to go up by a factor of 16, and so on. Now, depending on
what type of third grader you were, you might well have accepted this
procedure as the unique, or at least the optimal, way of multiplying two
numbers together to form their product. Now, if you want to be a serious
algorithm designer, that kind of obedient timidity is a quality you're
going to have to grow out of. An early and extremely important textbook
on the design and analysis of algorithms was by Aho, Hopcroft, and
Ullman. It's about 40 years old now. And there's the following quote,
which I absolutely adore. After iterating through a number of the
algorithm design paradigms covered in the textbook, they say the
following: perhaps the most important principle of all, for the good
algorithm designer, is to refuse to be content. And I think this is a spot-on
comment. I might summarize it a little more succinctly: as an
algorithm designer, you should adopt as your mantra the question, can
we do better? This question is particularly apropos when you're faced
with a naive or straightforward solution to a computational problem, like,
for example, the third grade algorithm for integer multiplication. The
question you perhaps did not ask yourself in third grade was, can we do
better than the straightforward multiplication algorithm? And now is the
time for an answer.
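To make the operation count concrete, here is a small Python sketch, not from the lecture and with function and variable names of my own choosing, that carries out the grade-school method digit by digit and tallies the single-digit multiplications (it ignores the extra additions and carries, which only change the constant factor):

def grade_school_multiply(x_digits, y_digits):
    # x_digits and y_digits are lists of decimal digits, most significant first.
    # Returns the product and the number of single-digit multiplications used.
    ops = 0
    total = 0
    # One partial product per digit of the second number, as in the lecture.
    for shift, d in enumerate(reversed(y_digits)):
        partial = 0
        for place, e in enumerate(reversed(x_digits)):
            partial += d * e * (10 ** place)  # one single-digit multiplication
            ops += 1
        total += partial * (10 ** shift)      # shifting = padding with zeros
    return total, ops

print(grade_school_multiply([5, 6, 7, 8], [1, 2, 3, 4]))  # (7006652, 16): n^2 = 16 multiplications for n = 4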
Karatsuba Multiplication
Transcript
If you want to multiply two integers, is there a better method than the
one we learned back in third grade? To give you the final answer to this
question, you'll have to wait until I provide you with a toolbox for
analyzing Divide and Conquer algorithm a few lectures hence. What I
want to do in this lecture is convince you that the algorithm design space
is surprisingly rich. There are certainly other interesting methods of
multiplying two integers beyond what we learned in third grade. And the
highlight of this lecture will be something called Karatsuba multiplication.
Let me introduce you to Karatsuba multiplication through a concrete
example. I am going to take the same pair of integers we studied last
lecture, 1, 2, 3, 4, 5, 6, 7, 8. I am going to execute a sequence of steps
resulting in their product. But that sequence of steps is going to look
very different than the one we undertook during the grade school
algorithm, yet we'll arrive at exactly the same answer. The sequence of
steps will strike you as very mysterious. It'll seem like I'm pulling a rabbit
out of a hat, and the rest of this video will develop more systematically
what exactly this Karatsuba multiplication method is, and why it works.
But what I want you to appreciate already on this slide is that the
algorithm design space is far richer than you might expect. There's this
dazzling array of options for how to actually solve problems like integer
multiplication. Let me begin by introducing some notation for the first and
second halves of the input numbers x and y. So the first half of x, that is,
56, we're going to regard as a number in its own right, called a. Similarly,
b will be 78, c will be 12, and d will be 34. I'm going to do a sequence of
operations involving only these double-digit numbers a, b, c, and d. And
then after a few such operations I will collect all of the terms together in a
magical way resulting in the product of x and y. First let me compute the
product of a times c and also the product of b times d. I'm going to skip
the elementary calculations, and just tell you the answer. So you can
verify that a times c is 672, whereas b times d is 2652. Next, I'm going to
do something still more inscrutable. I'm going to take the sum of a
and b, I'm going to take the sum of c and d, and then I'm going to
compute the product of those two sums. That boils down to computing
the product of 134 and 46, namely 6164. Now, I'm going to subtract
our first two products from the result of this computation. That is, I'm
going to take 6164, subtract 2652, and subtract 672. You should check
that if you subtract the results of the first two steps from the result of the
third step, you get 2840. Now, I claim that I can take the results of steps 1,
2, and 4 and combine them in a super simple way to produce the product
of x and y. Here's how I do it. I start with the first product, ac, and I pad
it with four zeros. I take the result of the second step, and I don't pad it
with any zeros at all. And I take the result of the fourth step, and I pad it
with two zeros. If we add up these three quantities, reading from right to left, we
get two, five, six, six, zero, zero, seven. If you go back to the previous
lecture, you'll note that this is exactly the same output as the grade school
algorithm, that this is in fact the product of one, two, three, four and
five, six, seven, eight. So let me reiterate that you should not have any
intuitions for the computations I just did, you should not understand what
just went down on this slide. Rather, I hope you feel some mixture of
bafflement and intrigue, but, more to the point, I hope you appreciate that
the third grade algorithm is not the only game in town. There are
fundamentally different algorithms for multiplying integers than what you
learned as a kid. Once you realize that, once you realize how rich the
space of algorithms is, you have to wonder: can we do better than that
third grade algorithm? In fact, does this algorithm already do better than
the third grade algorithm? Before I explain full-blown Karatsuba
multiplication, let me begin by explaining a simpler, more straightforward
recursive approach to integer multiplication. Now, I am assuming you
have a bit of programming background. In particular, that you know what
recursive algorithms are. That is, algorithms which invoke themselves as
a subroutine with a smaller input. So, how might you approach the
integer multiplication problem recursively? Well, the input is two numbers,
each with n digits. So to call the algorithm
recursively, you need to pass it inputs that have smaller size, fewer digits.
Well, we were already doing that in the computations on the previous
slide. For example, for the number 5678, we treated the first half of its digits,
56, as a number in its own right, and similarly 78. In general, given a
number x with n digits, it can be expressed, decomposed, in terms of
two n/2-digit numbers: namely, as a, the first half of the digits,
shifted appropriately, that is, multiplied by 10 raised to the power n over
two, plus the second half of the digits, b. In our example, we had a equal
to 56, 78 was b. N was 4, so 10 to the n over 2 was 100, and then c and
d were 12 and 34. What I want to do next is illuminate the relevant
recursive calls. To do that, let's look at the product, x times y. Express it
in terms of these smaller numbers, a, b, c, and d, and do an elementary
computation. Multiplying the expanded versions of x and y, we get an
expression with three terms. One is shifted by n, that is, multiplied by 10 raised to the power n,
and the coefficient there is a times c. We have a term that's shifted by 10
to the n over 2, and that has a coefficient of ad plus bc. And
bringing up the rear, we have the term b times d. We're going to be
referring to this expression a number of times, so let me both circle it and
just give it a shorthand. We're going to call this expression star. One
detail I'm glossing over for simplicity, is that I've assumed that n is an
even integer. Now, if n is an odd integer, you can apply this exact same
recursive approach to integer multiplication. In the straightforward way,
so if n were 9, then you would decompose one of these input numbers
into, say, the first five digits and the last four digits, and you would
proceed in exactly the same way. Now, the point of the expression star is
that, if we look at it, despite being the product of just elementary algebra, it
suggests a recursive approach to multiplying two numbers. If we care
about the product of x and y, why not, instead, compute this expression
star, which involves only products of the smaller numbers a, b, c, and
d? You'll notice, staring at the expression star, there are four relevant
products, each involving a pair of these smaller numbers: namely ac,
ad, bc, and bd. So why not compute each of those four products
recursively? After all, the inputs will be smaller. And then, once our four
recursive calls come back to us with the answers, we can formulate the
rest of expression star in the obvious way. We just pad a times c with n
zeros at the end. We add up ad and bc, using the grade school
algorithm, and pad the result with n over two zeros, and then we just
sum up these three terms, again using the grade school addition
algorithm. So the one detail missing, that I've glossed over, required to
turn this idea into a bona fide recursive algorithm, would be to specify a
base case. As I hope you all know, recursive algorithms need a base
case. If the input is sufficiently small, then you just immediately compute
the answer rather than recursing further. Of course, recursive algorithms
need a base case so they don't keep calling themselves until the end of
time. So for integer multiplication, what's the base case? Well, if you're
given two numbers that have just one digit each, then you just
multiply them in one basic operation and return the result. So, what I
hope is clear at the moment is that there is indeed a recursive approach
to solving the integer multiplication problem, resulting in an algorithm
which looks quite different than the one you learned in third grade, but
which nevertheless you could code up quite easily in your favorite
programming language. Now, what you shouldn't have any intuition
about is whether or not this is a good idea or a completely crackpot idea.
Is this algorithm faster or slower than the grade school algorithm? You'll
just have to wait to find out the answer to that question. Let's now refine
this recursive algorithm, resulting in the full-blown Karatsuba
multiplication algorithm. To explain the optimization behind Karatsuba
multiplication, let's recall the expression we were calling star on the
previous slide. So, this just expressed the product of x and y in terms of
the smaller numbers a, b, c, and d. In the straightforward recursive
algorithm, we made four recursive calls to compute the four products
which seemed necessary to evaluate, to compute, the expression star. But if
you think about it, there are really only three quantities in star that we care
about, the three relevant coefficients. We care about the numbers ad
and bc not per se, but only inasmuch as we care about their sum, ad
plus bc. So this motivates the question: if there are only three quantities that
we care about, can we get away with only three rather than four recursive calls?
It turns out that we can, and here's how we do it. The first coefficient, ac,
and the third coefficient, bd, we compute exactly as before, recursively.
Next, rather than recursively computing ad or bc, we're going to
recursively compute the product of a plus b and c plus d. If we expand
this out, this is the same thing as computing ac plus ad plus bc plus bd.
Now, here is the key observation in Karatsuba Multiplication, and it's
really a trick that goes back to the early 19th century mathematician
Gauss. Let's look at the quantity we computed in step 3 and subtract
from it the two quantities that we already computed in steps one and
two. Subtracting out the result of step one cancels the ac term.
Subtracting out the result of step two cancels out the bd term, leaving
us with exactly what we wanted all along, the middle coefficient ad plus
bc. And now, in the same way that on the previous slide we had a
straightforward recursive algorithm making four recursive calls and then
combining them in the obvious way, here we have a straightforward
recursive algorithm that makes only three recursive calls, and on top of
the recursive calls does just grade school addition and subtraction. So
you do this particular difference between the three recursively computed
products, and then you do the shifts, the padding by zeros, and the final
sum as before. So that's pretty cool, and this kind of showcases the
ingenuity which bears fruit even in the simplest imaginable computational
problems. Now you should still be asking the question: is this crazy
algorithm really faster than the grade school algorithm we learned in third
grade? Totally not obvious. We will answer that question a few lectures
hence, and we'll answer it as a special case of an entire toolbox I'll
provide you with to analyze the running time of so-called divide and
conquer algorithms like Karatsuba multiplication, so stay tuned.
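To make the three-recursive-call idea concrete, here is a minimal Python sketch of Karatsuba multiplication on nonnegative integers. It is not the lecture's code; the function name and the choice to split on half the larger digit count (rather than assuming n is a power of two) are my own.

def karatsuba(x, y):
    # Base case: single-digit factors are multiplied directly.
    if x < 10 or y < 10:
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)   # x = a * 10^half + b
    c, d = divmod(y, 10 ** half)   # y = c * 10^half + d
    ac = karatsuba(a, c)                # step 1
    bd = karatsuba(b, d)                # step 2
    abcd = karatsuba(a + b, c + d)      # step 3: (a + b)(c + d)
    ad_plus_bc = abcd - ac - bd         # step 4: Gauss's trick
    # Recombine according to the expression star.
    return ac * 10 ** (2 * half) + ad_plus_bc * 10 ** half + bd

print(karatsuba(5678, 1234))  # 7006652, matching the example above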
Roughly how many levels does this recursion tree have (as a function of n, the length of the input
array)?
log2 (n)
√n
In this video, we'll be giving a running time analysis of the merge sort
algorithm. In particular, we'll be substantiating the claim that the
recursive divide and conquer merge sort algorithm is better, has better
performance than simpler sorting algorithms that you might know, like
insertion sort, selection sort, and bubble sort. So, in particular, the goal
of this lecture will be to mathematically argue the following claim from an
earlier video, that, in order to sort an array of n numbers, the merge sort
algorithm needs no more than a constant times N log N operations.
That's the maximum number of lines of executable code that will ever
execute specifically six times n log n plus six n operations. So, how are
we going to prove this claim? We're going to use what is called a
recursion tree method. The idea of the recursion tree method is to write
out all of the work done by the recursive merge sort algorithm in a tree
structure, with the children of a given node corresponding to the
recursive calls made by that node. The point of this tree structure is that it will
give us an interesting way to count up the overall work done by the
algorithm, and will greatly facilitate the analysis. So specifically, what is
this tree? So at level zero, we have a root. And this corresponds to the
outer call of Merge Sort, okay? So I'm gonna call this level zero. Now
this tree is going to be binary, in recognition of the fact that each
invocation of Merge Sort makes two recursive calls. So the two children
will correspond to the two recursive calls of Merge Sort. So at the root,
we operate on the entire input array, so let me draw a big array
indicating that. And at level one, we have one sub problem for the left
half, and another sub problem for the right half of the input array. And I'll
call these first two recursive calls, level one. Now of course each of
these two level one recursive calls will themselves make two recursive
calls. Each operating on then a quarter of the original input array. So
those are the level two recursive calls, of which there are four, and this
process will continue until eventually the recursion bottoms out, in base
cases where the array has size zero or one. So now I have a
question for you, which I'll give you in the form of a quiz, which is: at
the bottom of this recursion tree corresponding to the base cases, what
is the level number at the bottom? So, at what level do the leaves in this
tree reside? Okay, so hopefully you guessed, correctly guessed, that the
answer is the second one, so namely that the number of levels of the
recursion tree is essentially logarithmic in the size of the input array. The
reason is basically that the input size is being decreased by a factor two
with each level of the recursion. If you have an input size of n at the
outer level, then each of the first set of recursive calls operates on an array
of size n over two; at level two, each array has size n over four, and so
on. Where does the recursion bottom out? Well, down at the base cases
where there's no more recursion, which is where the input array has size
one or less. So in other words, the number of levels of recursion is
exactly the number of times you need to divide n by two until you get
down to a number that's at most one. Recall that's exactly the definition of
the logarithm, base two, of n. So since the first level is level zero and the last
level is level log base two of n, the total number of levels is actually log
base two of n, plus one. And when I write down this expression, I'm here
assuming that n is a power of two, which is not a big deal. I mean
the analysis is easily extended to the case where N is not a power of
two. And this way, we don't have to think about fractions. Log base two
of N then is an integer. Okay so let's return to the recursion tree. Let me
just redraw it really quick. So again, down here at the bottom of the tree
we have the leaves, i.e., the base cases, where there's no more
recursion, which, when n is a power of two, correspond exactly to
single-element arrays. So that's the recursion tree corresponding to an
invocation of Merge Sort. And the motivation for writing down, for
organizing the work performed by Merge Sort in this way, is it allows us
to count up the work, level by level. And we'll see that that's a particularly
convenient way to account for all of the different lines of code that get
executed. Now, to see that in more detail, I need to ask you to identify a
particular pattern. So, first of all, the first question is, at a given level, j, of
this recursion, exactly how many distinct sub-problems are there, as a
function of the level j? That's the first question. The second question is,
for each of those distinct sub-problems at level j, what is the input size?
So, what is the size of the array which is passed to a sub-
problem residing at level j of this recursion tree. So, the correct answer is
the third one. So, first of all, at a given level, j, there's precisely two to
the j distinct sub-problems. There's one outermost sub-problem at level
zero, it has two recursive calls, those are the two, sub-problems at level
one, and so on. In general, since merge sort calls itself twice, the
number of sub-problems is doubling at each level, so that gives us the
expression, two to the j, for the number of sub-problems at level j. On the
other hand, by a similar argument, the input size is halving each time.
With each recursive call you pass it half of the input. That you were
given. So at each level of the recursion tree we're seeing half of the input
size of the previous level. So after J levels, since we started with an
input size of N, after J levels, each sub-problem will be operating on an
array of length N over two to the J. Okay, so now let's put this pattern to
use, and actually count up all of the lines of code that Merge Sort
executes. And as I said before, the key, the key idea is to count up the
work level by level. Now, to be clear, when I talk about the amount of
work done at level J, what I'm talking about is the work done by those 2
to the J invocations of Merge Sort, not counting their respective
recursive calls. Not counting work which is gonna get done in the
recursion, lower in the tree. Now, recall, Merge Sort is a very simple
algorithm. It just has three lines of code. First there is a recursive call, so
we're not counting that; second, there is another recursive call, again
we're not counting that; and then third, we just invoke the merge
subroutine. So really, outside of the recursive calls, all that merge sort
does is a single invocation of merge. Further recall we already have a
good understanding of the number of lines of code that merge needs.
On an input of size m, it's gonna use, at most, 6m lines of code. That's
an analysis that we did in the previous video. So, let's fix a level j. We
know how many sub-problems there are, two to the j. We know the size
of each sub-problem, n over two to the j, and we know how much work
merge needs on such an input, we just multiply it by six, and then we just
multiply it out, and we get the amount of work done at level j, across
all of the level-j subproblems. So here it is in more detail. Alright.
So, we start with just the number of different sub-problems at level j, and
we just notice that that was at most two to the j. We also observe that
each level-j sub-problem is passed an array as input which has
length n over two to the j. And we know that the merge subroutine,
when given an input array of size n over two to the j, will
execute at most six times that many lines of code. So to
compute the total amount of work done at level j, we just multiply the
number of sub-problems times the work done per sub-problem.
And then something sort of remarkable happens, where you get this
cancellation of the two two-to-the-j's, and we get an upper bound of 6n,
which is independent of the level j. So we do at most six n operations
at the root, we do at most six n operations at level one, at level two,
and so on, okay? It's independent of the level. Morally, the reason this is
happening is because of a perfect equilibrium between two competing
forces. First of all, the number of subproblems is doubling with each
level of the recursion tree; but secondly, the amount of work that we do per
sub-problem is halving with each level of the recursion tree. Once those
two cancel out, we get an upper bound of 6n, which is independent of the
level j. Now, here's why that's so cool, right? We don't really care about
the amount of work just at a given level. We care about the amount of
work that Merge Sort does overall, across all levels. But, if we have a bound on
the amount of work at a level which is independent of the level, then our
overall bound is really easy. What do we do? We just take the number of
levels, and we know what that is. It's exactly log base two of n, plus
one. Remember the levels are zero through log base two of n, inclusive.
And then we have an upper bound of 6n for each of those log n plus one
levels. So if we expand out this quantity, we get exactly the upper bound
that was claimed earlier, namely that the number of operations merge sort
executes is at most 6n times log base 2 of n, plus 6n. So that, my friends,
is a running time analysis of the merge sort algorithm. That's why its
running time is bounded by a constant times n log n, which, especially
as n grows large, is far superior to the simpler iterative algorithms
like insertion or selection sort.
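To collect the level-by-level count in one place, the arithmetic just described can be written as:

\[
\text{work at level } j \;\le\; 2^j \cdot 6\,\frac{n}{2^j} \;=\; 6n,
\qquad
\text{total work} \;\le\; 6n\,(\log_2 n + 1) \;=\; 6n\log_2 n + 6n.
\]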
Merge Sort: Analysis - Question 2
1 point possible (ungraded)
What is the pattern? Fill in the blanks in the following statement: at each
level j = 0, 1, 2, 3, …, log2(n) there are _______ subproblems, each of size
______.
2^j and 2^j respectively
n/2^j and n/2^j respectively
2^j and n/2^j respectively
n/2^j and 2^j respectively
Problem: Does array A contain the integer t? Given A (array of length n) and
t (an integer).
Algorithm 1
1: for i = 1 to n do
2: if A[i] == t then
3: Return TRUE
4: Return FALSE
What is the running time of this piece of code?
Algorithm 2
1: for i = 1 to n do
2: if A[i] == t then
3: Return TRUE
4: for i = 1 to n do
5: if B[i] == t then
6: Return TRUE
7: Return FALSE
What is the running time of this piece of code?
Algorithm 3
1: for i = 1 to n do
2: for j = 1 to n do
3: if A[i] == B[j] then
4: Return TRUE
5: Return FALSE
What is the running time of this piece of code?
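For readers who prefer running code, here is one possible Python rendering of the three pseudocode listings above; the function names are my own, and Algorithms 2 and 3 assume a second array B of the same length n, as discussed in the lecture that follows.

def algorithm1(A, t):
    # Single scan of A: O(n).
    for x in A:
        if x == t:
            return True
    return False

def algorithm2(A, B, t):
    # Two scans in sequence, one over A and one over B: still O(n).
    for x in A:
        if x == t:
            return True
    for y in B:
        if y == t:
            return True
    return False

def algorithm3(A, B):
    # Nested loops comparing every pair (A[i], B[j]): O(n^2).
    for x in A:
        for y in B:
            if x == y:
                return True
    return False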
The Gist - Part 4
In this sequence of lectures we're going to learn Asymptotic Analysis.
This is the language by which every serious computer programmer and
computer scientist discusses the high level performance of computer
algorithms. As such, it's a totally crucial topic. In this video, the plan is to
segue between the high level discussion you've already seen in the
course introduction and the mathematical formalism,which we're going
to start developing in the next video. Before getting into that
mathematical formalism, however. I want to make sure that the topic is
well motivated. That you have solid intuition for what it's trying to
accomplish. And also that you've seen a couple simple, intuitive
examples. Let's get started. Asymptotic analysis provides basic
vocabulary for discussing the design and analysis of algorithms. While it
is a mathematical concept, it is by no means math for math's sake. You
will very frequently hear serious programmers saying that such and such
code runs in O of n time, whereas such and such other code runs in O of n
squared time. It's important you know what programmers mean when
they make statements like that. The reason this vocabulary is so
ubiquitous, is that it identifies a sweet spot for discussing the high level
performance of algorithms. What I mean by that is, it is on the one hand
coarse enough to suppress all of the details that you want to ignore.
Details that depend on the choice of architecture, the choice of
programming language, the choice of compiler. And so on. On the
other hand, it's sharp enough to be useful. In particular, to make
predictive comparisons between different high level algorithmic
approaches to solving a common problem. This is going to be especially
true for large inputs. And remember, as we discussed, in some sense
large inputs are the interesting ones. Those are the ones for which we
need algorithmic ingenuity. For example, asymptotic analysis will allow us
to differentiate between better and worse approaches to sorting. Better
and worse approaches to multiplying two integers, and so on. Now most
serious programmers if you ask them, what's the deal with asymptotic
analysis anyways? They'll tell you reasonably, that the main point is to
suppress both leading constant factors and lower order terms. Now as
we'll see there's more to Asymptotic Analysis than just these seven
words here but long term, ten years from now, if you only remember
seven words about Asymptotic Analysis I'll be reasonably happy if these
are the seven words that you remember. So how do we justify adopting
a formalism which essentially by definition suppresses constant factors
and lower-order terms? Well, lower-order terms basically by definition
become increasingly irrelevant as you focus on large inputs, which, as
we've argued are the interesting inputs, the ones where algorithmic
ingenuity is important. As far as constant factors these are going to be
highly dependent on the details of the environment, the compiler, the
language and so on. So, if we want to ignore those details it makes
sense to have a formalism which doesn't focus unduly on leading
constant factors. Here's an example. Remember when we analyzed the
merge sort algorithm? We gave an upper bound on its running time that
was 6 times n log n plus 6n, where n was the input length, the number of
numbers in the input array. So, the lower order term here is
the 6n. That's growing more slowly than n log n, so we just drop that.
And then the leading constant factor is the 6, so we suppress that. Well,
after the two suppressions we're left with a much simpler expression, n log
n. The terminology would then be to say that the running time of merge
sort is big O of n log n. So, in other words, when you say that an
algorithm is big O of some function, what you mean is that after you
drop the lower order terms and suppress the leading constant
factor, you're left with that function f of n. Intuitively, that is what big O
notation means. So to be clear I'm certainly not asserting the constant
factors never matter when you're designing and analyzing
algorithms. Rather, I'm just saying that when you think about high-level
algorithmic approaches, when you want to make a comparison between
fundamentally different ways of solving a problem, asymptotic analysis is
often the right tool for giving you guidance about which one is going to
perform better, especially on reasonably large inputs. Now, once you've
committed to a particular algorithmic solution to a problem, of course,
you might want to then work harder to improve the leading constant
factor, perhaps even to improve the lower order terms. By all means, if
the future of your start-up depends on how efficiently you implement
some particular set of lines of code, have at it. Make it as fast as you
can. In the rest of this video I want to go through four very simple
examples. In fact, these examples are so simple that, if you have any
experience with big O notation, you're probably just better off skipping
the rest of this video and moving on to the mathematical formalism that we
begin in the next video. But if you've never seen it before I hope these
simple examples will get you oriented. So let's begin with a very basic
problem, searching an array for a given integer. Let's analyze the
straightforward algorithm for this problem, where we just do a linear
scan through the array, checking each entry to see if it is the
desired integer t. That is the code just checks each array entry in turn. If
it ever finds integer t it returns true. If it falls off the end of the array
without finding it it returns false. So, what do you think? We haven't
formally defined big O notation, but I've given you an intuitive
description. What would you say is the running time of this algorithm as
a function of the length n of the array, capital A? So the answer I am
looking for is C, O(n), or equivalently we would say that the running
time of this algorithm is linear in the input length n. Why is that true?
Well, let's think about how many operations this piece of code is going
to execute. Actually, the number of lines of code executed is going to depend on
the input. It depends on whether or not the target t is contained in the
array A, and if so, where in the array A it lies. But, in the worst case, this
code will do an unsuccessful search: t will not be in the array, and
the code will scan through the entire array A and return false. The
number of operations then is a constant, there's some initial setup
perhaps, and maybe an operation to return this final boolean value,
but outside of that constant, which will get suppressed in the big-O
notation, it does a constant number of operations per entry in the
array. And you could argue about what the constant is, whether it's 2, 3, or 4
operations per entry in the array, but the point is, whatever that constant
is, 2, 3, or 4, it gets conveniently suppressed by the big O notation. So
as a result, the total number of operations will be linear in n, and so the big
O notation will just be O of n. So that was the first example, and in the last
three examples, I want to look at different ways that we could have two
loops. And in this example, I want to think about one loop followed by
another. So two loops in sequence. I want to study almost the same
problem as the previous one. Where now we're just given two arrays,
capital a and capital b, we'll say both of the same length n, and we want
to know whether the target t is in either one of them. Again, we'll look at
the straightforward algorithm, where we just search through A, and if we
fail to find t in A, we search through B. If we don't find t in B either, then
we have to return false. So the question then is exactly the same as
last time. Given this new, longer piece of code, what, in big O notation, is
its running time? Well, the question was the same, and in this case the
answer is the same: this algorithm, just like the last one, has running
time big O of n. If we actually count the number of operations, it won't be
exactly the same as last time, it will be roughly twice as many
operations as the previous piece of code. That's because we have to
search two different arrays, each of length n. So whatever work we did
before, we now do twice as many times. Of course, that factor of two, being a
constant independent of the input length n, is going to get suppressed
once we pass to big O notation. So this, like the previous algorithm, is
a linear time algorithm. It has running time big O of n. Let's look at a
more interesting example of two loops where rather than processing
each loop in sequence, they're going to be nested. In particular let's look
at the problem of searching whether two given input arrays each of
length n contain a common number. The code that we're going to look
at for solving this problem is the most straightforward one you can
imagine, where we just compare all possibilities. So for each index i
into the array A and each index j into the array B, we just see if A[i] is the
same number as B[j]. If it is, we return true. If we exhaust all of the
possibilities without ever finding equal elements, then we're safe in
returning false. The question, of course, is, in terms of big O notation,
asymptotic analysis, as a function of the array length n, what is the
running time of this piece of code? So this time, the answer has
changed. For this piece of code, the running time is not big O of n, but
it is big O of n squared. So we might also call this a quadratic time
algorithm, because the running time is quadratic in the input length n.
So this is one of those kinds of algorithms where, if you double the
input length, then the running time of the algorithm will go up by a
factor of 4, rather than by a factor of 2 like in the previous two pieces of
code. So, why is this? Why does it have quadratic running time,
big O of n squared? Well again, there's some constant setup
cost which gets suppressed in the big O notation. Again, for each
fixed choice of an index i into array A and an index j into array B, for
each fixed choice of i and j, we only do a constant number of operations.
The particular constant is irrelevant, because it gets suppressed in the
big O notation. What's different is that there's a total of n squared
iterations of this double for loop. In the first example, we only had n
iterations of a single for loop. In our second example, because one for
loop completed before the second one began, we had only 2n
iterations overall. Here, for each of the n iterations of the outer for loop,
we do n iterations of the inner for loop. So that gives us n times n,
i.e., n squared, total iterations. So that's going to be the running time of
this piece of code. Let's wrap up with one final example. It will again be
nested for loops, but this time, we're going to be looking for duplicates in
a single array A, rather than needing to compare two distinct arrays A
and B. So, here's the piece of code we're going to analyze for solving
this problem, for detecting whether or not the input array A has duplicate
entries. There are only two small differences relative to the code we went
through on the previous slide, when we had two different arrays. The first
change won't surprise you at all: instead of
referencing the array B, I change that B to an A, so I just compare the i-th
entry of A to the j-th entry of A. The second change is a little more subtle,
which is that I changed the inner for loop so the index j begins at i plus 1,
where i is the current value of the outer for loop's index, rather than
starting at the index 1. I could have had it start at the index 1. That
would still be correct, but it would be wasteful, and you should think
about why. If we started the inner for loop's index at 1, then this code
would actually compare each distinct pair of elements of A to each other
twice, which, of course, is silly. You only need to compare two different
elements of A to each other once, to know whether they are equal or
not. So this is the piece of code. The question is the same as it always
is: what, in terms of big O notation, as a function of the input length n, is the running time
of this piece of code? So the answer to this question is the same as the last
one: big O of n squared. That is, this piece of code also has
quadratic running time. So what I hope was clear was that, you know,
whatever the running time of this piece of code is, it's proportional to
the number of iterations of this double for loop. Like in all the
examples, we do constant work per iteration. We don't care about the
constant. It gets suppressed by the big O notation. So all we gotta do is
figure out how many iterations there are of this double for loop. My
claim is that there's roughly n squared over two iterations of this double
for loop. There's a couple ways to see that. Informally, we discussed
how the difference between this code and the previous one, is that,
instead of counting something twice, we're counting it once. So that
saves us a factor of two in the number of iterations. Of course, this one
half factor gets suppressed by the big O notation anyways. So the big
O running time doesn't change. A different argument would just say,
you know, there's one iteration for every distinct choice of indices i
and j between one and n, and a simple counting argument
says that there are n choose 2 such choices of distinct i and j, where n
choose 2 is the number n times n minus 1, over 2. And again,
suppressing lower-order terms and the constant factor, we still get a
quadratic dependence on the length of the input array A. So that wraps
up some of these simple basic examples. I hope this gets you
oriented, that you have a strong intuitive sense for what big O notation is
trying to accomplish and how it's defined mathematically. Let's now
move on to both the mathematical development and some more
interesting algorithms.
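To round out the four examples with runnable code, here is a small Python sketch of the last one, my own rendering rather than the lecture's slide, with the inner loop starting at i + 1 so each distinct pair is compared only once, roughly n(n-1)/2 iterations in the worst case.

def has_duplicate(A):
    # Compare each distinct pair (i, j) with j > i exactly once: O(n^2).
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            if A[i] == A[j]:
                return True
    return False

print(has_duplicate([3, 1, 4, 1, 5]))  # True: the value 1 appears twice
print(has_duplicate([2, 7, 1, 8]))     # False: all entries are distinct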
Big-Oh Notation
In the following series of videos, we'll give a formal treatment of
asymptotic notation, in particular big-Oh notation, as well as work
through a number of examples. Big-Oh notation concerns functions
defined on the positive integers; we'll call it T(n). We'll pretty much always
have the same semantics for T(n). We're gonna be concerned about the
worst-case running time of an algorithm, as a function of the input size,
n. So, the question I wanna answer for you in the rest of this video, is,
what does it mean when we say a function, T(n), is big-Oh of f(n), where
f(n) is some basic function, like for example n log n. So I'll give you
a number of answers, a number of ways to think about what big-Oh
notation really means. For starters let's begin with an English definition.
What does it mean for a function to be big-Oh of f(n)? It means
eventually, for all sufficiently large values of n, it's bounded above by a
constant multiple of f(n). Let's think about it in a couple other ways. So
next I'm gonna translate this English definition into picture and then I'll
translate it into formal mathematics. So pictorially you can imagine that
perhaps we have T(n) denoted by this blue function here. And perhaps
f(n) is denoted by this green function here, which lies below T(n). But
when we double f(n), we get a function that eventually crosses T(n) and
forevermore is larger than it. So in this event, we would say that T(n)
indeed is a Big-Oh of f(n). The reason being that for all sufficiently large
n, once we go far enough out to the right on this graph, indeed, a constant
multiple of f(n), namely twice f(n), is an upper bound on T(n). So finally, let
me give you an actual mathematical definition that you could use to do
formal proofs. So how do we say, in mathematics, that eventually it
should be bounded above by a constant multiple of f(n)? We say that
there exist two constants, which I'll call c and n0, so that T(n) is no
more than c times f(n) for all n that exceed or equal n0. So, the role of
these two constants is to quantify what we mean by a constant multiple,
and what we mean by sufficiently large, in the English definition. c
obviously quantifies the constant multiple of f(n), and n0 is quantifying
sufficiently large, that's the threshold beyond which we insist that, c
times f(n) is an upper bound on T(n). So, going back to the picture, what
are c and n0? Well, c, of course, is just going to be two. And n0 is the
crossing point. So we look at where twice f(n) and T(n) cross, and then we
drop down to the horizontal axis; that would be the relevant value of n0 in this picture.
So that's the formal definition. The way to prove that something is big-Oh
of f(n) is to exhibit these two constants c and n0, and it had better be the case
that for all n at least n0, c times f(n) upper-bounds T(n). One way to think
about it: if you're trying to establish that something is big-Oh of some
function, it's like you're playing a game against an opponent, and you
want to prove that this inequality here holds, while your opponent must
show that it doesn't hold for sufficiently large n. You have to go first: your
job is to pick a strategy in the form of a constant c and a constant n0, and
your opponent is then allowed to pick any number n larger than n0. So
the function is big-Oh of f(n) if and only if you have a winning strategy in
this game, if you can up front commit to constants c and n0 so that no
matter how big of an n your opponent picks, this inequality holds. If you
have no winning strategy, then it's not big-Oh of f(n): no matter what c
and n0 you choose, your opponent can always flip this inequality by
choosing a suitably large value of n. I want to emphasize one
last thing, which is these constants. What do I mean by constants? I
mean they are independent of n. And so when you apply this definition,
and you choose your constants c and n0, it had better be that n does not
appear anywhere. So c should just be something like a thousand or a
million, some constant independent of n. So those are a bunch of ways to
think about big-Oh notation. In English, you want T(n) to be bounded above
by a constant multiple of f(n) for all sufficiently large n. I showed you how to translate that
into mathematics, gave you a pictorial representation, and also a sort of
game-theoretic way to think about it. Now, let's move on to a video that
explores a number of examples.
Basic Examples
Having slogged through the formal definition of big O notation, I wanna
quickly turn to a couple of examples. Now, I wanna warn you up front,
these are pretty basic examples. They're not really gonna provide us
with any insight that we don't already have. But they serve as a sanity
check that big O notation is doing what it's intended to do, namely to suppress constant factors and lower-order terms. Obviously, these simple examples will also give us some facility with the definition.
So the first example's going to be to prove formally the following claim.
The claim states that if T(n) is some polynomial of degree k, namely T(n) = a_k n^k + ... + a_1 n + a_0, for any positive integer k and any coefficients a_i (positive or negative), then T(n) is big O of n^k. So this claim is a
mathematical statement and something we'll be able to prove. As far as,
you know, what this claim is saying, it's just saying big O notation really
does suppress constant factors and lower order terms. If you have a
polynomial then all you have to worry about is what is the highest power
in that polynomial and that dominates its growth as "n" goes to infinity.
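As a concrete instance of the claim (my own illustrative example, anticipating the constants chosen in the proof below), take T(n) = 3n^2 - 5n + 7 and k = 2. Then

\[
T(n) = 3n^2 - 5n + 7 \;\le\; \bigl(|3| + |-5| + |7|\bigr)\, n^2 \;=\; 15\, n^2 \quad \text{for all } n \ge 1,
\]

so T(n) = O(n^2) with the choices c = 15 and n0 = 1.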
So, recall how one goes about showing that one function is big O of
another. The whole key is to find this pair of constants, c and n0, where c quantifies the constant multiple of the function you're trying to prove big O of, and n0 quantifies what you mean by "for all sufficiently large n." Now, for this proof, to keep things very simple to follow, but admittedly a little mysterious, I'm just gonna pull these constants, c and n0, out of a hat. So I'm not gonna tell you how I derived them, but it'll be easy to check that they work. So let's work with the constant n0 equal to one, a very simple choice of n0, and then c we're gonna pick to be the sum of the absolute values of the coefficients: the absolute value of a_k, plus the absolute value of a_(k-1), and so on. Remember, I didn't assume that the original polynomial had non-negative coefficients. So I claim these constants work, in the sense that we'll be able to establish the definition of big O notation. What does that mean? Well, we need to show that for all n at least one (because, remember, we chose n0 equal to one), T(n), this polynomial up here, is bounded above by c times n^k, where c is the way we chose it here, underlined in red. So let's just check why this
is true. So, for every positive integer "n" at least one, what do we need to
prove? We need to prove T(n) is upper bounded by something else. So
we're gonna start on the left hand side with T(n). And now we need a
sequence of upper bounds terminating with "c" times "n^k" (our choice of
c underlined in red). So T(n) is given as equal to this polynomial
underlined in green. So what happens when we replace each of the
coefficients with the absolute value of that coefficient? Well, you take the
absolute value of a number, either it stays the same as it was before, or
it flips from negative to positive. Now, "n" here, we know is at least one.
So if any coefficient flips from negative to positive, then the overall
number only goes up. So if we apply the absolute value to each of the coefficients, we get an only bigger number. So T(n) is bounded above by
the new polynomial where the coefficients are the absolute values of
those that we had before. So why was that a useful step? Well now what
we can do is we can play the same trick but with "n". So it's sort of
annoying how right now we have these different powers of "n". It would
be much nicer if we just had a common power of "n", so let's just replace
all of these different "n"s by "n^k", the biggest power of "n" that shows up
anywhere. So if you replace each of these lower powers of "n" with the
higher power "n^k", that number only goes up. Now, the coefficients are
all non-negative, so the overall number only goes up. So this is bounded
above by "the absolute value of a<u>k" "n^k"</u> ...up to "absolute
value of a<u>1" "n^k" ...plus "a<u>0" "n^k".</u></u> I'm using here that
"n" is at least one, so higher powers of "n" are only bigger. And now
you'll notice this, by our choice of "c" underlined in red, this is exactly
equal to "c" times "n^k". And that's what we have to prove. We have to
prove that T(n) is at most "c" times "n^k", given our choice of "c" for
every "n" at least one. And we just proved that, so, end of proof. Now
Now, there remains the question of how I knew what a workable value of c and n0 was. If you yourself want to prove that something is big O of something else, usually what you do is reverse engineer constants that work. So you would go through a proof like this with generic values of c and n0, and then you'd say, "Ahh, well, if only I choose c in this way, I can push the proof through." And that tells you what c you should use. If you look at the
optional video on further examples of asymptotic notation, you'll see
some examples where we derive the constants via this reverse
engineering method. But now let's turn to a second example, or really I
should say, a non-example. So what we're going to prove now is that
something is not big O of something else. So I claim that for every "k" at
least 1, "n^k" is not O(n^(k-1)). And again, this is something you would
certainly hope would be true. If this was false, there'd be something
wrong with our definition of big O notation and so really this is just to get
further comfort with the definition, how to prove something is not big O of
something else, and to verify that indeed you don't have any collapse of
distinct powers of polynomials, which would be a bad thing. So how
would we prove that something is not big O of something else? The
most frequently useful proof method is gonna be proof by contradiction. So, remember, in a proof by contradiction, you assume that what you're trying to establish is actually false, and, from that, you do a sequence of logical
steps, culminating in something which is just patently false, which
contradicts basic axioms of mathematics, or of arithmetic. So, suppose,
in fact, n^k was big O of n^(k-1), so that's assuming the opposite of what
we're trying to prove. What would that mean? Well, we just refer to
the definition of Big O notation. If in fact "n^k" hypothetically were Big O
of n^(k-1), then by definition there would be two constants, a winning strategy if you like, c and n0, such that for all sufficiently large n, we have a constant multiple c times n^(k-1) upper bounding
"n^k". So from this, we need to derive something which is patently false
that will complete the proof. The easiest way to do that is to
cancel "n^(k-1)" from both sides of this inequality. And remember since
"n" is at least one and "k" is at least one, it's legitimate to cancel this
"n^(k-1)" from both sides. And when we do that we get the assertion that
"n" is at most some constant "c" for all "n" at least "n<u>0". And this
now</u> is a patently false statement. It is not the case that all positive
integers are bounded above by a constant "c". In particular, "c+1", or the
integer right above that, is not bigger than "c". So that provides the
contradiction that shows that our original assumption that "n^k" is big O
of "n^(k-1)" is false. And that proves the claim. "n^k" is not big O of "n^(k-
1)", for every value of "k". So different powers of polynomials do not
collapse. They really are distinct, with respect to big O notation.
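Restating the contradiction compactly (just the argument above in symbols): if n^k were O(n^(k-1)), there would be constants c and n0 with

\[
n^k \;\le\; c \cdot n^{k-1} \quad \text{for all } n \ge n_0 .
\]

Cancelling n^(k-1) from both sides, which is legitimate since n is at least 1, gives n <= c for all n >= n0, and this fails at any integer n that is both at least n0 and larger than c.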
Additional Examples
Homework
2. You are given a unimodal array of n distinct elements, meaning that its
entries are in increasing order up until its maximum element, after which its
elements are in decreasing order. Give an algorithm to compute the
maximum element that runs in O(log n) time.
Overview
THE MASTER METHOD - These lectures cover a "black-box" method
for solving recurrences. You can then immediately determine the running
time of most of the divide-and-conquer algorithms that you'll ever see!
(Including Karatsuba's integer multiplication algorithm and Strassen's
matrix multiplication algorithm from Week 1.) The proof is a nice
generalization of the recursion tree method that we used to analyze
MergeSort. Ever wonder about the mysterious three cases of the Master
Method? Watch these videos and hopefully all will become clear.
HOMEWORK: Problem Set #2 has five questions that should give you
practice with the Master Method and help you understand QuickSort
more deeply. Programming Assignment #2 asks you to implement
QuickSort and compute the number of comparisons that it makes for
three different pivot rules.
1. Motivation (8 min)
2. Formal Statement (10 min)
3. Examples (13 min)
4. Proof I (10 min)
5. Interpretation of the 3 Cases (11 min)
6. Proof II (16 min)