Materials


Stanford Online

CSX0003

Algorithms: Design and Analysis, Part 1

Chapter 1: INTRODUCTION

Welcome and Overview


WELCOME: Welcome to Algorithms: Design and Analysis, Part I! Here's an
overview of the first few sections of material.

INTRODUCTION: The first set of lectures for this week is meant to give
you the flavor of the course, and hopefully get you excited about it. We
begin by discussing algorithms in general and why they're so important,
and then use the problem of multiplying two integers to illustrate how
algorithmic ingenuity can often improve over more straightforward or naive
solutions. We discuss the Merge Sort algorithm in detail, for several
reasons: it's a practical and famous algorithm that you should all know; it's
a good warm-up to get you ready for more intricate algorithms; and it's the
canonical introduction to the "divide and conquer" algorithm design
paradigm. These lectures conclude by describing several guiding principles
for how we'll analyze algorithms in this course.

ASYMPTOTIC ANALYSIS: The second set of lectures for this week is an
introduction to big-oh notation and its relatives, which belongs in the
vocabulary of every serious programmer and computer scientist. The goal
is to identify a "sweet spot" of granularity for reasoning about algorithms ---
we want to suppress second-order details like constant factors and lower-
order terms, and focus on how the running time of an algorithm scales as
the input size grows large.
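To give one illustrative example (mine, not from the course materials): an
algorithm that performs 6n log n + 10n operations on an input of size n would,
at this level of granularity, be described simply as an O(n log n) algorithm;
the constant factor 6 and the lower-order term 10n are suppressed.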

DIVIDE AND CONQUER ALGORITHMS: The final set of lectures for this
week discusses three non-trivial examples of the divide and conquer
algorithm design paradigm. The first is for counting the number of
inversions in an array. This problem is related to measuring similarity
between two ranked lists, which in turn is relevant for making good
recommendations to someone based on your knowledge of their and
others' preferences ("collaborative filtering"). The second algorithm is
Strassen's mind-blowing recursive algorithm for matrix multiplication, which
improves over the obvious iterative method. The third algorithm, which is
more advanced and is optional material, is for computing the closest pair of
points in the plane.

PREREQUISITES: This course is not an introduction to programming, and
it assumes that you have basic programming skills in a language such as
Python, Java, or C. There are several outstanding free online courses that
teach basic programming. We also use mathematical analysis as needed
to understand how and why algorithms and data structures really work. If
you need a refresher on the basics of proofs (induction, contradiction, etc.),
I recommend the lecture notes "Mathematics for Computer Science" by
Lehman and Leighton (see separate Resources pages).

DISCUSSION FORUMS: The discussion forums play a crucial role in
massive online courses like this one, which is an all-volunteer effort. If you
have trouble understanding a lecture or completing an assignment, you
should turn to the forums for help. After you've mastered the lectures and
assignments for a given week, I hope you'll contribute to the forums and
help out your fellow students. While I won't have time to carefully monitor
the discussion forums, I'll check in and answer questions whenever I find
the time.

VIDEOS AND SLIDES: Videos can be streamed or downloaded and
watched offline (recommended for commutes, etc.). We are also providing
PDF lecture slides (typed versions of what's written in the lecture videos),
as well as subtitle files. And if you find yourself wishing that I spoke more
quickly or more slowly, note that you can adjust the video speed to
accommodate your preferred pace.

HOMEWORK #1: The first problem set consists of 5 problems, mostly
about Merge Sort and asymptotic notation. The first programming
assignment asks you to implement the counting inversions algorithm (see
the third set of lectures) in whatever programming language you please,
run it on a quite large input, and enter the answer.

RELATED READINGS: Algorithms Illuminated (Part 1), Chapters 1, 2, and 3.

Why Study Algorithms?

Transcript

Hi, my name's Tim Roughgarden. I'm a professor here at Stanford University.
And I'd like to welcome you to this first course on the Design and Analysis of
Algorithms. Now, I imagine many of you are already clear on your reasons for
taking this course. But let me begin by justifying this course's existence. And
giving you some reasons why you should be highly motivated to learn about
algorithms. So what is an algorithm anyways? Basically, it's a set of well-defined
rules, a recipe in effect, for solving some computational problem. Maybe you
have a bunch of numbers and you want to rearrange them so that they're in
sorted order. Maybe you have a roadmap and an origin and a destination, and
you want to compute the shortest path from that origin to that destination.
Maybe you face a number of different tasks that need to be completed by
certain deadlines and you want to know in what order you should accomplish
the tasks so that you complete them all by their respective deadlines. So why
study algorithms? Well first of all, understanding the basics of algorithms and
the related field of data structures is essential for doing serious work in pretty
much any branch of computer science. This is the reason why here at Stanford,
this course is required for every single degree that the department offers. The
bachelor's degree, the master's degree, and also the PhD. To give you a few
examples, routing in communication networks piggybacks on classical
shortest-path algorithms. The effectiveness of public-key cryptography relies
on that of number-theoretic algorithms. Computer graphics needs the
computational primitives supplied by geometric algorithms. Database indices
rely on balanced search tree data structures. Computational biology uses
dynamic programming algorithms to measure genome similarity. And the list
goes on. Second, algorithms play a key role in modern technological
innovation. To give just one obvious example, search engines use a tapestry of
algorithms to efficiently compute the relevance of various webpages to a
given search query. The most famous such algorithm is the PageRank
algorithm currently in use by Google. Indeed, in a December 2010 report to the
United States White House, the President's Council of Advisors on Science and
Technology argued that in many areas, performance gains due to improvements
in algorithms have vastly exceeded even the dramatic performance gains due
to increased processor speeds. Third, although this is outside the scope of this
course, algorithms are increasingly being used to provide a novel
lens on processes outside of computer science and technology. For example,
the study of quantum computation has provided a new computational
viewpoint on quantum mechanics. Price fluctuations in economic markets can
be fruitfully viewed as an algorithmic process, and even evolution can be
usefully thought of as a surprisingly effective search algorithm. The last two
reasons for studying algorithms might sound flippant, but both have more than
a grain of truth to them. I don't know about you, but back when I was a
student, my favorite classes were always the challenging ones that, after I
struggled through them, left me feeling a few IQ points smarter than when I
started. I hope this course provides a similar experience for many of you.
Finally, I hope that by the end of the course I'll have converted some of you to
agree with me that the design and analysis of algorithms is simply fun. It's an
endeavor that requires a rare blend of precision and creativity. It can certainly
be frustrating at times, but it's also highly addictive. So let's descend from
these lofty generalities and get much more concrete. And let's remember that
we've all been learning about and using algorithms since we were little kids.

Integer Multiplication
Transcript
Sometime when you were a kid, maybe say third grade or so, you
learned an Algorithm for multiplying two numbers. Maybe your third
grade teacher didn't call it that, maybe that's not how you thought about
it. But you learned a well defined set of rules for transforming input,
namely two numbers into an output, namely their product. So, that is an
algorithm for solving a computational problem. Let's pause and be
precise about it. Many of the lectures in this course will follow a pattern.
We'll define a computational problem. We'll say what the input is, and
then we'll say what the desired output is. Then we will proceed to giving
a solution, to giving an algorithm that transforms the input to the output.
In the integer multiplication problem, the input is just two n-digit
numbers. So the length, n, of the two input integers x and y could be
anything, but for motivation you might want to think of n as large, in the
thousands or even more, perhaps we're implementing some kind of
cryptographic application which has to manipulate very large numbers.
We also need to explain what the desired output is. In this simple problem, it's
simply the product x times y. So, a quick digression: back in third grade,
around the same time I was learning the integer multiplication algorithm, I got
a C in penmanship, and I don't think my handwriting has improved much
since. Many people tell me that by the end of the course they think of it
fondly, as a sort of acquired taste, but if you're feeling impatient, please
note there are typed versions of these slides. Which I encourage you to
use as you go through the lectures, if you don't want to take the time
deciphering the handwriting. Returning to the Integer Multiplication
problem, having now specified the problem precisely, the input, the
desired output. We'll move on to discussing an algorithm that solves it,
namely, the same algorithm you learned in third grade. The way we will
assess the performance of this algorithm is through the number of basic
operations that it performs. And for the moment, let's think of a basic
operation as simply adding two single-digit numbers together or
multiplying two single digit numbers. We're going to then move on to
counting the number of these basic operations performed by the third
grade algorithm. As a function of the number n of digits in the input.
Here's the integer multiplication algorithm that you learned back in third
grade, illustrated on a concrete example. Let's take, say, the numbers 1,
2, 3, 4 and 5, 6, 7, 8. As we go through this algorithm quickly, let me
remind you that our focus should be on the number of basic operations
this algorithm performs. As a function of the length of the input numbers.
Which, in this particular example, is four digits long. So as you'll recall,
we just compute one partial product for each digit of the second number.
So we start by just multiplying 4 times the upper number 5, 6, 7, 8. So,
you know, 4 times 8 is 32, write down the 2, carry the 3; 4 times 7 is 28, with the 3 that's
31, write down the 1, carry the 3, and so on. When we do the next partial
product, we do a shift effectively, we add a 0 at the end, and then we
just do exactly the same thing. And so on for the final two partial
products. [SOUND] And finally, we just add everything up. [SOUND],
what you probably realized back in third grade, is that this algorithm is
what we would call correct. That is, no matter what integers x and y you
start with, if you carry out this procedure, this algorithm, and all of your
intermediate computations are done properly, then the algorithm will
eventually terminate with the product, x times y, of the two input
numbers. You're never going to get a wrong answer. You're always
going to get the actual product. What you probably didn't think about was
the amount of time needed to carry this algorithm out to its
conclusion, to termination. That is, the number of basic operations,
additions or multiplications of single digit numbers needed before
finishing. So let's now quickly give an informal analysis of the number of
operations required as a function of the input length n. Let's begin with
the first partial product, the top row. How did we compute this number
22,712? Well we multiplied 4 times each of the numbers 5, 6, 7 and 8.
So those were four basic operations, one for each digit of the top number,
plus we had to do these carries, so those were some extra additions.
But in any case, this is at most twice the number of digits in the
first number, at most 2n basic operations, to form this first partial
product. And if you think about it, there's nothing special about the first
partial product. The same argument says that we need at most 2 n
operations to form each of the partial products of which there are again
n, one for each digit of the second number. Well if we need at most two
n operations to compute each partial product and we have n partial
products. That's a total of at most two n squared operations to form all of
these blue numbers, all of the partial products. Now we're not done at
that point. We still have to add all of those up to get the final answer, in
this case 7,006,652. And that final addition requires a comparable
number of operations. Roughly, another say two n squared, at most
operations. So, the upshot, the high level point that I want you to focus
on, is that as we think about the input numbers getting bigger and
bigger. That is as a function of n the number of digits in the input
numbers. The number of operations that the Grade-School Multiplication
Algorithm performs grows like some constant, roughly 4 say, times n
squared. That is, it's quadratic in the input length n. For example, if you
double the size of the input, if you double the number of digits in each of
the two integers that you're given. Then the number of operations you
will have to perform using this algorithm has to go up by a factor of four.
Similarly, if you quadruple the input length, the number of operations
is going to go up by a factor of 16, and so on. Now, depending on
what type of third grader you were, you might well have accepted this
procedure as the unique, or at least the optimal, way of multiplying two
numbers together to form their product. Now, if you want to be a serious
algorithm designer, that kind of obedient timidity is a quality you're
going to have to grow out of. An early and extremely important textbook
on the design and analysis of algorithms was by Aho, Hopcroft, and
Ullman. It's about 40 years old now. And there's the following quote,
which I absolutely adore. So after iterating through a number of the
algorithm design paradigms covered in the textbook, they say the
following: perhaps the most important principle of all for the good
algorithm designer is to refuse to be content. And I think this is a spot-on
comment. I might summarize it a little bit more succinctly: as an
algorithm designer, you should adopt as your mantra the question, can
we do better? This question is particularly apropos when you're faced
with a naive or straightforward solution to a computational problem. Like
for example, the third grade algorithm for integer multiplication. The
question you perhaps did not ask yourself in third grade was, can we do
better than the straight forward multiplication algorithm? And now is the
time for an answer.
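
To make this operation count concrete, here is a minimal Python sketch of the
grade-school algorithm (illustrative only, not from the lecture; the digit-list
representation, the helper name, and the rough operation counter are my own
choices, and only single-digit multiplications and additions are tallied):

    # Illustrative sketch (not from the lecture): grade-school multiplication on
    # lists of digits (both numbers assumed to have n digits, most significant
    # digit first). The bookkeeping below uses ordinary Python integers for
    # convenience; only the single-digit work is counted.
    def grade_school_multiply(x_digits, y_digits):
        n = len(x_digits)
        ops = 0
        total = 0
        # One partial product per digit of the second number.
        for i, d in enumerate(reversed(y_digits)):
            partial, carry = 0, 0
            # Multiply d by each digit of the first number, right to left.
            for j, e in enumerate(reversed(x_digits)):
                prod = d * e + carry        # one multiplication plus one addition
                ops += 2
                partial += (prod % 10) * 10 ** j
                carry = prod // 10
            partial += carry * 10 ** n
            total += partial * 10 ** i      # the "shift" is the factor 10 ** i
            ops += n                        # roughly n more single-digit additions
        return total, ops

    # The lecture's example: 5678 times 1234.
    product, ops = grade_school_multiply([5, 6, 7, 8], [1, 2, 3, 4])
    print(product, ops)                     # 7006652, with ops growing like n squared

Doubling the number of digits roughly quadruples the operation count here,
consistent with the quadratic growth described above.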

Karatsuba Multiplication
Transcript
If you want to multiply two integers, is there a better method than the
one we learned back in third grade? To give you the final answer to this
question, you'll have to wait until I provide you with a toolbox for
analyzing Divide and Conquer algorithm a few lectures hence. What I
want to do in this lecture is convince you that the algorithm design space
is surprisingly rich. There are certainly other interesting methods of
multiplying two integers beyond what we learned in third grade. And the
highlight of this lecture will be something called Karatsuba multiplication.
Let me introduce you to Karatsuba multiplication through a concrete
example. I am going to take the same pair of integers we studied last
lecture, 1, 2, 3, 4, 5, 6, 7, 8. I am going to execute a sequence of steps
resulting in their product. But that sequence of steps is going to look
very different than the one we undertook during the grade school
algorithm, yet we'll arrive at exactly the same answer. The sequence of
steps will strike you as very mysterious. It'll seem like I'm pulling a rabbit
out of the hat, and the rest of this video will develop more systematically
what exactly this Karatsuba multiplication method is, and why it works.
But what I want you to appreciate already on this slide is that the
algorithm design space is far richer than you might expect. There's this
dazzling array of options for how to actually solve problems like integer
multiplication. Let me begin by introducing some notation for the first and
second halves of the input numbers x and y. So the first half of x, that is
56, we're going to regard as a number in its own right called a. Similarly,
b will be 78, c will be 12, and d will be 34. I'm going to do a sequence of
operations involving only these double digit numbers a b c and d. And
then after a few such operations I will collect all of the terms together in a
magical way resulting in the product of x and y. First let me compute the
product of a times c and also the product of b times d. I'm going to skip
the elementary calculations, and just tell you the answer. So you can
verify that a times c is 672, where as b times d is 2652. Next I'm going to
do something even still more inscrutable. I'm going to take the sum of a
and b. I'm going to take the sum of c and d. And then I'm going to
compute the product of those two sums. That boils down to computing
the product of 134 and 46. Namely, 6164. Now, I'm going to subtract
our first two products from the result of this computation. That is, I'm
going to take 6164, subtract 2652, and subtract 672. You should check
that if you subtract the results of the first two steps from the result of the
third step, you get 2840. Now, I claim that I can take the results of steps 1,
2 and 4 and combine them in a super simple way to produce the product
of x and y. Here's how I do it. I start with the first product, ac, and I pad
it with four zeros. I take the results of the second step, and I don't pad it
with any zeros at all. And I take the result of the fourth step, and I pad it
with two zeros. If we add up these three quantities, from right to left, we
get two, five, six, six, zero, zero, seven. If you go back to the previous
lecture, you'll note that this is exactly the same output as the grade school
algorithm, that this is in fact the product of one, two, three, four and
five, six, seven, eight. So let me reiterate that you should not have any
intuitions for the computations I just did, you should not understand what
just went down on this slide. Rather I hope you feel some mixture of
bafflement and intrigue but, more to the point, I hope you appreciate that
the third grade algorithm is not the only game in town. There are
fundamentally different algorithms for multiplying integers than what you
learned as a kid. Once you realize that, once you realize how rich the
space of algorithms is, you have to wonder: can we do better than that
third grade algorithm? In fact, does this algorithm already do better than
the third grade algorithm? Before I explain full-blown Karatsuba
multiplication, let me begin by explaining a simpler, more straightforward
recursive approach. To integer multiplication. Now, I am assuming you
have a bit of programming background. In particular, that you know what
recursive algorithms are. That is, algorithms which invoke themselves as
a subroutine with a smaller input. So, how might you approach the
integer multiplication problem recursively? Well, the input is two numbers,
each with n digits. So to call the algorithm
recursively, you need to produce inputs that have smaller size, fewer digits.
Well, we already were doing that in the computations on the previous
slide. For example, the number 5678: we treated the first half of its digits,
56, as a number in its own right, and similarly 78. In general, given a
number x with n digits, it can be decomposed in terms of
two n-over-two-digit numbers. Namely, as a, the first half of the digits,
shifted appropriately, that is, multiplied by ten raised to the power n over
two, plus the second half of the digits, b. In our example, we had a equal
to 56, 78 was b. N was 4, so 10 to the n over 2 was 100, and then c and
d were 12 and 34. What I want to do next is illuminate the relevant
recursive calls. To do that, let's look at the product, x times y. Express it
in terms of these smaller numbers, a, b, c, and d, and do an elementary
computation. Multiplying the expanded versions of x and y, we get an
expression with three terms. One shifted by n, 10 raised to the power n,
and the coefficient there is a times c. We have a term that's shifted by 10
to the n over 2, and that has a coefficient of ad plus bc. And
bringing up the rear, we have the term b times d. We're going to be
referring to this expression a number of times, so let me both circle it and
just give it a shorthand. We're going to call this expression star. One
detail I'm glossing over for simplicity, is that I've assumed that n is an
even integer. Now, if n is an odd integer, you can apply this exact same
recursive approach to integer multiplication. In the straightforward way,
so if n was 9 then you would decompose one of these input numbers
into say the first five digits and the last four digits, and you would
proceed in exactly the same way. Now, the point of the expression star is,
if we look at it, despite being the product of just elementary algebra, it
suggests a recursive approach to multiplying two numbers. If we care
about the product of x and y, why not, instead, compute this expression
star, which involves only the products of smaller numbers, A, B, C and
D. You'll notice, staring at the expression star, there are 4 relevant
products, each involving a pair of these smaller numbers. Namely AC,
AD, BC, and BD. So why not compute each of those four products
recursively? After all, the inputs will be smaller. And then once our four
recursive calls come back to us with the answer, we can formulate the
rest of expression star in the obvious way. We just pad a times c with n
zeros at the end. We add up ad and bc, using the grade school
algorithm, and pad the result with n over two zeros, and then we just
sum up these terms, again using the grade school addition
algorithm. So the one detail missing, that I've glossed over, required to
turn this idea into a bonafide recursive algorithm, would be to specify a
base case. As I hope you all know, recursive algorithms need a base
case. If the input is sufficiently small, then you just immediately compute
the answer rather than recursing further. Of course, recursive algorithms
need a base case so they don't keep calling themselves until the end of
time. So for integer multiplication, what's the base case? Well, if you're
given two numbers that have just one digit each, then you just
multiply them in one basic operation and return the result. So, what I
hope is clear at the moment is that there is indeed a recursive approach
to solving the integer multiplication problem, resulting in an algorithm
which looks quite different than the one you learned in third grade, but
which nevertheless you could code up quite easily in your favorite
programming language. Now, what you shouldn't have any intuition
about is whether or not this is a good idea or a completely crackpot idea.
Is this algorithm faster or slower than the grade school algorithm? You'll
just have to wait to find out the answer to that question. Let's now refine
this recursive algorithm, resulting in the full-blown Karatsuba
multiplication algorithm. To explain the optimization behind Karatsuba
multiplication, let's recall the expression we were calling star on the
previous slide. So, this just expressed the product of x and y in terms of
the smaller numbers a, b, c, and d. In this straight forward recursive
algorithm we made four recursive calls to compute the four products
which seemed necessary to evaluate the expression star. But if
you think about it, there are really only three quantities in star that we care
about, the three relevant coefficients. We care about the numbers ad
and bc not per se, but only inasmuch as we care about their sum, ad
plus bc. So this motivates the question: if there are only 3 quantities that
we care about, can we get away with only 3 rather than 4 recursive calls?
It turns out that we can and here's how we do it. The first coefficient a c
and the third coefficient b d, we compute exactly as before, recursively.
Next, rather than recursively computing a d or b c, we're going to
recursively compute the product of a plus b and c plus d. If we expand
this out, this is the same thing as computing ac plus ad plus bc plus bd.
Now, here is the key observation in Karatsuba Multiplication, and it's
really a trick that goes back to the early 19th Century mathematician,
Gauss. Let's look at the quantity we computed in step 3 and subtract
from it. The two quantities that we already computed in steps one and
two. Subtracting out the result of step one cancels the a c term.
Subtracting out the result of step two, cancels out the bd term, leaving
us with exactly what we wanted all along, the middle coefficient a d plus
b c. And now in the same that on the previous slide we have a
straightforward recursive algorithm making four recursive calls, and then
combining them in the obvious way. Here we have a straightforward
recursive algorithm that makes only three recursive calls. And on top of
the recursive calls, does just grade school addition and subtraction. So
you take this particular difference between the three recursively computed
products, and then you do the shifts, the padding by zeros, and the final
sum as before. So that's pretty cool, and this kind of showcases the
ingenuity which bears fruit even in the simplest imaginable computational
problems. Now, you should still be asking the question: is this crazy
algorithm really faster than the grade school algorithm we learned in third
grade? It's totally not obvious. We will answer that question a few lectures
hence, and we'll answer it as a special case of an entire toolbox I'll
provide you with to analyze the running time of so-called divide and
conquer algorithms like Karatsuba multiplication, so stay tuned.
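
Here is a minimal Python sketch of this three-recursive-call idea (illustrative
only, not the lecture's code; it assumes non-negative integers and splits at the
floor of half the larger digit count, which also handles odd lengths):

    # Illustrative sketch of Karatsuba multiplication. Writing x = a * 10^m + b
    # and y = c * 10^m + d, the product is
    #     x * y = ac * 10^(2m) + (ad + bc) * 10^m + bd,
    # and ad + bc is recovered with Gauss's trick as (a+b)(c+d) - ac - bd,
    # so only three recursive multiplications are needed.
    def karatsuba(x, y):
        # Base case: single-digit numbers are multiplied directly.
        if x < 10 or y < 10:
            return x * y
        m = max(len(str(x)), len(str(y))) // 2   # split point
        a, b = divmod(x, 10 ** m)                # x = a * 10^m + b
        c, d = divmod(y, 10 ** m)                # y = c * 10^m + d
        ac = karatsuba(a, c)                     # recursive call 1
        bd = karatsuba(b, d)                     # recursive call 2
        gauss = karatsuba(a + b, c + d)          # recursive call 3
        ad_plus_bc = gauss - ac - bd             # Gauss's trick
        return ac * 10 ** (2 * m) + ad_plus_bc * 10 ** m + bd

    print(karatsuba(5678, 1234))                 # 7006652, the lecture's example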

About the Course


Transcript
In this video I'll talk about various aspects of the course, the topics that
we'll cover, the kinds of skills you can expect to acquire, the kind of
background that I expect, the supporting materials and the available
tools for self assessment. Let's start with the specific topics that this
course is going to cover. The course material corresponds to the first
half of the ten week Stanford course. It's taken by all computer science
undergraduates, as well as many of our graduate students. There will be
five high level topics, and at times these will overlap. The five topics are
first of all, the vocabulary for reasoning about algorithm performance, the
divide and conquer algorithm design paradigm, randomization and
algorithm design, primitives for reasoning about graphs, and the use and
implementation of basic data structures. The goal is to provide an
introduction to and basic literacy in each of these topics. Much, much
more could be said about each of them, than we'll have time for here.
The first topic is the shortest, and probably also the driest. But it's a
prerequisite for thinking seriously about the design and analysis of
algorithms. The key concept here is big-O notation, which, conceptually,
is a modeling choice about the granularity with which we measure a
performance metric like the running time of an algorithm. It turns out that
the sweet spot for clear high level thinking about algorithm design, is to
ignore constant factors and lower-order terms. And to concentrate on
how well algorithm performance scales with large input sizes. Big O
notation is the way to mathematize this sweet spot. Now, there's no one
silver bullet in algorithm design. No single problem solving method that's
guaranteed to unlock all of the computational problems that you're likely
to face. That said, there are a few general algorithm design techniques.
High level approaches to algorithm design that find successful
application across a range of different domains. These relatively widely
applicable techniques are the backbone of a general algorithms course
like this one. In this course, we'll only have time to deeply explore one
such algorithm design paradigm, namely that of the divide and conquer
algorithms. In the sequel course as we'll discuss, there's two other major
algorithms on paradigms to get covered. But for now, divide and conquer
algorithm, the idea is to first break the problem into smaller problems
which then gets solved recursively, and then to somehow quickly
combine the solutions to the sub problems into one for the original
problem that you actually care about. So for example, in the last video.
We saw two algorithms of this sort, two divide and conquer algorithms
from multiplying two large integers. In later videos we will see a number
of different applications. We'll see how to design fast divide and conquer
algorithms for problems ranging from sorting to matrix multiplication to
nearest neighbor-type problems and computational geometry. In
addition, we'll cover some powerful methods for reasoning about the
running time of recursive algorithms like these. As for the third topic. A
randomized algorithm is one that, in some sense, flips coins while it
executes. That is, a randomized algorithm will actually have different
executions if you run it over and over again on a fixed input. It turns out,
and this is definitely not intuitive, that allowing randomization internal to
an algorithm, often leads to simple, elegant, and practical solutions to
various computational problems. The canonical example is randomized
quick sort, and that algorithm and analysis we will cover in detail in a few
lectures. Randomized primality testing is another killer application that
we'll touch on. And we'll also discuss a randomized approach to graph
partitioning. And finally we'll discuss how randomization is used to
reason about hash functions and hash maps. One of the themes of this
course, and one of the concrete skills that I hope you take away from the
course, is, literacy with a number of computational primitives for
operating on data, that are so fast, that they're, in some sense,
essentially free. That is, the amount of time it takes to invoke one of these
computational primitives is barely more than the amount of time you're
already spending just examining or reading the input. When you have a
primitive which is so fast, that the running time is barely more than what
it takes to read the input, you should be ready to apply it. For example, in
a preprocessing step, whenever it seems like it might be helpful. It
should just be there on the shelf waiting to be applied at will. Sorting is
one canonical example of a very fast, almost for-free primitive of this
form. But there are ones that operate on more complex data as well. So
recall that a graph is a data structure that has, on the one hand, vertices,
and on the other hand, edges, which connect pairs of vertices. Graphs
model, among many other things, different types of networks. So even
though graphs are much more complicated than mere arrays, there's still
a number of blazingly fast primitives for reasoning about their structure.
In this class we'll focus on primitives for computing connectivity
information and also shortest paths. We'll also touch on how some
primitives have been used to investigate the structure of information in
social networks. Finally, data structures are often a crucial ingredient in
the design of fast algorithms. A data structure's responsible for
organizing data in a way that supports fast queries. Different data
structures support different types of queries. I'll assume that you're
familiar with the structures that you typically encounter in a basic
programming class including arrays and vectors. Lists, stacks, and
queues. Hopefully, you've seen at some point both trees and heaps, or
you're willing to read a bit about them outside of the course, but we'll
also include a brief review of each of those data structures as we go
along. There's two extremely useful data structures that we'll discuss in
detail. The first is balanced binary search trees. These data structures
dynamically maintain an ordering on a set of elements, while supporting
a large number of queries that run in time logarithmic in the size of the
set. The second data structure we'll talk a fair bit about is hash tables or
hash maps, which keep track of a dynamic set, while supporting
extremely fast insert and lookup queries. We'll talk about some canonical
uses of such data structures, as well as what's going on under the hood
in a typical implementation of such a data structure. >> There's a
number of important concepts in the design and analysis of algorithms
that we won't have time to cover in this five week course. Some of these
will be covered in the sequel course, Design and Analysis of Algorithms
II, which corresponds to the second half of Stanford's ten week course
on this topic. The first part of this sequel course focuses on two more
algorithm design paradigms. First of all, the design and analysis of greedy
algorithms, with applications to minimum spanning trees, scheduling, and
information-theoretic coding. And secondly, the design and analysis of
dynamic programming algorithms with example applications being in
genome sequence alignment and the shortest path protocols in
communication networks. The second part of the sequel course
concerns NP complete problems, and what to do about them. Now, NP
complete problems are problems that, assuming a famous mathematical
conjecture you might have heard of, called the "P not equal to
NP" conjecture, cannot be solved by any computationally
efficient algorithm. We'll discuss the
theory of NP completeness, with a focus on what it means for you
as an algorithm designer. We'll also talk about several ways to approach
NP complete problems, including: fast algorithms that correctly solve
special cases; fast heuristics with provable performance guarantees; and
exponential time algorithms that are qualitatively faster than brute force
search. Of course there are plenty of important topics that can't be fit into
either of these two five-week courses. Depending on the demand, there
might well be further courses on more advanced topics. Following this
course is going to involve a fair amount of time and effort on your part.
So it's only reasonable to ask: What can you hope to get out of it? What
skills will you learn? Well. Primarily, you know, even though this isn't a
programming class per se, it should make you a better programmer.
You'll get lots of practice describing and reasoning about algorithms,
you'll learn algorithm design paradigms, so really high level problem-
solving strategies that are relevant for many different problems across
different domains, and tools for predicting the performance of such
algorithms. You'll learn several extremely fast subroutines for processing
data and several useful data structures for organizing data that can be
deployed directly in your own programs. Second, while this is not a math
class per se, we'll wind up doing a fair amount of mathematical analysis.
And this in turn will sharpen your mathematical analytical skills. You
might ask, why is mathematics relevant for a class in the design and
analysis of algorithms, seemingly more of a programming class. Well let
me be clear. I am totally uninterested in merely telling you facts or
regurgitating code that you can already find on the web or in any number
of good programming books. My goal here in this class, and the way I
think I can best supplement the resources that you probably already
have access to is to explain why things are the way they are. Why we
analyze the algorithms in the way that we do, why various super fast
algorithms are in fact super fast, and so on. And it turns out that good
algorithmic ideas usually require nontrivial mathematical analysis to
understand properly. You'll acquire fundamental insights into the specific
algorithms and data structures that we discuss in the course. And
hopefully, many of these insights will prove useful, more generally, in
your other work. Third, and perhaps the most relevant for those of you
who work in some other discipline: this course should help you learn how
to think algorithmically. Indeed, after studying algorithms, it's hard
not to see them pretty much everywhere, whether you are riding an
elevator, watching a flock of birds, buying and selling stocks out of your
portfolio, or even watching an infant learn. As I said in the previous video,
algorithmic thinking is becoming increasingly useful and prevalent in fields
outside of computer science and technology, like biology, statistics,
and economics. Fourth, if you're interested in feeling like a card carrying
computer scientist, in some sense, then you'll definitely want basic
literacy in all of the topics that we'll be covering. Indeed, one of the
things that makes studying algorithms so fun, is, it really feels like you're
studying a lot of the greatest hits from the last 50 years of computer
science. So, after this class, no longer will you feel excluded at that
computer science cocktail party when someone cracks a joke about
Dijkstra's Algorithm. Now you'll know exactly what they mean. Finally,
there's no question that studying this material is helpful for technical
interview questions. To be clear, my sole goal here is to teach you
algorithms, not to prepare you for interviews, per se. But over the years,
countless students of mine have regaled me with stories about how
mastering the concepts in this class enabled them to ace every technical
question they were ever asked. I told you, this is fundamental stuff. So,
what do I expect from you? Well, honestly, the answer is nothing. After
all isn't the whole point of a free online class like this one that anyone
can take it and devote as much effort to it as they like. So that said, as a
teacher it's still useful to have one or more canonical students in mind.
And I thought I'd go ahead and be transparent with you about how I'm
thinking about these lectures. Who I have in mind that I'm teaching to.
So again, please don't feel discouraged if you don't conform to this
canonical student template. I'm happy to have the opportunity to teach
you about algorithms no matter who you are. So first, I have in mind
someone who knows at least some programming. For example, consider
the previous lecture. We talked about a recursive approach to multiplying
two numbers, and I mentioned how a certain mathematical expression,
which back then we labeled star and circled in green, naturally
translated into a recursive algorithm. In particular, I was
certainly assuming that you had some familiarity with recursive
programs. If you feel comfortable with my statement in that lecture, if you
feel like you could code up a recursive integer multiplication algorithm
based on the high level outline that I gave you, then you should be in
good shape for this course. You should be good to go. If you weren't
comfortable with that statement, well, you might not be comfortable with
the relatively high conceptual level at which we discuss programming in this
course. But I encourage you to watch the next several videos anyway, to see
if you get enough out of them to make it worth your while. [sound]. Now,
while I'm aiming these lectures at people who know some programming,
I'm not making any assumptions whatsoever about exactly which
programming languages you know. Any standard imperative language
you know, something like C, Java or Python, is totally fine for this
course. Now, to make these lectures accessible to as many
programmers as possible, and to be honest, you know, also to promote
thinking about programming at a relatively abstract conceptual level, I
won't be describing algorithms in any particular programming language.
Rather, when I discuss the algorithms, I'll use only high-level pseudo-
code, or often simply English. My inductive hypothesis is that you are
capable of translating such a high level description into a working
program in your favorite programming language. In fact, I strongly
encourage everyone watching these lectures to do such a translation of
all of the algorithms that we discussed. This will ensure your
comprehension, and appreciation of them. Indeed, many professional
computer scientists and programmers don't feel that they really
understand an algorithm until they've coded it up. Many of the course's
assignments will have a problem in which we ask you to do precisely
this. Put another way, if you're looking for a sort of coding cookbook,
code that you can copy and paste directly into your own programs.
Without necessarily understanding how it works, then this is definitely
not the course for you. There are several books out there that cater to
programmers looking for such coding cook books. Second, for these
lectures I have in mind someone who has at least a modest amount of
mathematical experience though perhaps with a fair bit of accumulated
rust. Concretely I expect you to be able to recognize a logical argument
that is, a proof. In addition, two methods of proof that I hope you've seen
before are proofs by induction and proofs by contradiction. I also need
you to be familiar with basic mathematical notation, like the standard
quantifier and summation symbols. A few of the lectures on randomized
algorithms and hashing will go down much easier for you if you've seen
discrete probability at some point in your life. But beyond these basics,
the lectures will be self contained. You don't even need to know any
calculus, save for a single simple integral that magically pops up in the
analysis of the randomized quick sort algorithm. I imagine that many of
you have studied math in the past, but could use a refresher, or are
a bit rusty. And there's plenty of free resources out there on the web,
and I encourage you to explore and find some that you like. But one that
I want to particularly recommend is a great set of free lecture notes. It's
called Mathematics for Computer Science. It's authored by Eric Lehman
and Tom Leighton, and it's quite easy to find on the web if you just do a
web search. And those notes cover all of the prerequisites that we'll
need, in addition to tons of other stuff. In the spirit of keeping this course
as widely accessible as possible, we're keeping the required supporting
materials to an absolute minimum. Lectures are meant to be self-
contained and we'll always provide you with the lecture notes in
PowerPoint and PDF format. Once in a while, we'll also provide some
additional lecture notes. No textbook is required for this class. But that
said, most of the material that we'll study is well covered in a number of
excellent algorithms books that are out there. So I'll single out four such
books here. The first three I mention because they all had a significant
influence on the way that I both think about and teach algorithms. So it's
natural to acknowledge that debt here. One very cool thing about the
second book, the one by Dasgupta, Papadimitriou and Vazirani, is that
the authors have made a version of it available online for free. And
again, if you search on the authors' names and the textbook title, you
should have no trouble coming up with it with a web search. Similarly,
that's the reason I've listed the fourth book because those authors have
likewise made essentially a complete version of that book available
online and it's a good match for the material that we're going to cover
here. If you're looking for more details about something covered in this
class, or simply a different explanation than the one that I give you, all of
these books are gonna be good resources for you. There are also a
number of excellent algorithm textbooks that I haven't put on this list. I
encourage you to explore and find your own favorite. >> In our assignments,
we'll sometimes ask you to code up an algorithm and use it to solve a
concrete problem that is too large to solve by hand. Now, we don't care
what programming language and development environment you use to
do this, as we're only going to be asking you for the final answer. Thus,
we're not requiring anything specific, just that you are able to write and
execute programs. If you need help or advice about how to get set up
with a suitable coding environment, we suggest that you ask other
students for help via the course discussion forum. Finally, let's talk a bit
more about assessment. Now this course doesn't have official grades
per se, but we will be assigning weekly homeworks. Now we're going to
assign homeworks for three different reasons. The first is just for self-
assessment. It's to give you the opportunity to test your understanding of
the material so that you can figure out which topics you've mastered and
which ones that you haven't. The second reason we do it is to impose
some structure on the course, including deadlines, to provide you with
some additional motivation to work through all the topics. Deadlines also
have a very important side effect that synchronizes a lot of the students
in the class. And this of course makes the course discussion forum a far
more effective tool for students to seek and provide help in
understanding the course material. The final reason that we give
homeworks is to satisfy those of you who, on top of learning the course
material, are looking to challenge yourself intellectually. [sound]. Now,
this class has tens of thousands of students. So it's obviously essential
that the assignments can be graded automatically. Now, we're currently
only in the 1.0 generation of free online courses such as this one. So the
available tools for auto graded assessment are currently rather primitive.
So, we'll do the best we can, but I have to be honest with you. It's
difficult, or maybe even impossible to test deep understanding of the
design and analysis of algorithms, using the current set of tools. Thus,
while the lecture content in this online course is in no way watered down
from the original Stanford version, the required assignments and exams
we'll give you are not as demanding as those that are given in the on
campus version of the course. To make up for this fact, we'll
occasionally propose optional algorithm design problems, either in a
video or via supplementary assignment. We don't have the ability to
grade these, but we hope that you'll find them interesting and
challenging, and that you'll discuss possible solutions with other students
via the course discussion forum. So I hope this discussion answered
most of the questions you have about the course. Let's move on to the
real reason that we're all here, to learn more about algorithms.

Merge Sort: Motivation and Example


Transcript
Okay. So in this video, we'll get our first sense of what it's actually like to
analyze an algorithm. And we'll do that by first of all reviewing a famous
sorting algorithm, namely the Merge Sort algorithm. And then giving a
really fairly mathematically precise upper bound on exactly how many
operations the Merge Sort algorithm requires to correctly sort an input
array. So I feel like I should begin with a bit of an apology. Here we are
in 2012, a very futuristic sounding date. And yet I'm beginning with a
really quite ancient algorithm. So for example, Merge Sort was certainly
known to John von Neumann all the way back in 1945. So, what
justification do I have for beginning, you know, a modern class in
algorithms with such an old example? Well, there's a bunch of reasons.
One I haven't even put down on the slide, which is that, like a number of the
algorithms we'll see, Merge Sort is an oldie but a goodie. So it's over
60, or maybe even 70 years old. But it's still used all the time in practice,
because this really is one of the methods of choice for sorting, the
standard sorting algorithm in a number of programming libraries. So
that's the first reason. But there's a number of others as well that I want
to be explicit about. So first of all, throughout these online courses, we'll
see a number of general algorithm design paradigms, ways of solving
problems that cut across different application domains. And the first one
we're going to focus on is called the Divide-and-Conquer algorithm
design paradigm. So in Divide-and-Conquer, the idea is, you take a
problem, and break it down into smaller subproblems which you then
solve recursively, and then you somehow combine the results of the
smaller subproblems to get a solution to the original problem that you
actually care about. And Merge Sort is still today perhaps the
most transparent application of the Divide-and-Conquer paradigm,
one that will exhibit very clearly what the paradigm is, what analysis
challenges it presents, and what kind of benefits you might derive. As for
its benefits, so for example, you're probably all aware of the sorting
problem. Probably you know some number of sorting algorithms perhaps
including Merge Sort itself. And Merge Sort is better than a lot of the
simpler, I would say more obvious, sorting algorithms. So for example,
here are three other sorting algorithms that you may know about, but that I'm not
going to discuss here. If you don't know them, I encourage you to look
them up in a textbook or look them up on the web. Let's start with three
sorting algorithms which are perhaps simpler. First of all is "Selection
Sort". This is where you do a number of passes through the array,
repeatedly identifying the minimum of the elements that you haven't
looked at yet, so you basically do a linear number of passes, each
time doing a minimum computation. There's "Insertion Sort", which is still
useful in certain cases in practice as we will discuss, but again it's
generally not as good as Merge Sort. Here you repeatedly
maintain the invariant that a prefix of the array is a sorted version of
those elements. So after ten loops of Insertion Sort, you'll have the
invariant that the first ten elements of the array are
in sorted order, and then when Insertion Sort completes, you'll have
an entirely sorted array. Finally, some of you may know about "Bubble
Sort", which is where you identify adjacent pairs of elements which are
out of order, and then you do repeated swaps until in the end the
array is completely sorted. Again, I just say this to jog your memory;
these are simpler sorts than Merge Sort, but all of them are worse
in the sense that their performance in general scales
with N^2, where the input array has N elements, so they all have, in
some sense, quadratic running time. But if we use this non-trivial Divide-
and-Conquer approach, or non-obvious approach, we'll get, as we'll
see, a much better running time than this quadratic dependence on the
input size. Okay? So we'll get our first win from Divide-and-Conquer, and
Merge Sort is the algorithm that realizes that benefit. So the second
reason that I wanna start out by talking about the Merge Sort algorithm,
is to help you calibrate your preparation. I think the discussion we're
about to have will give you a good signal for whether your
background's at about the right level for the audience that I'm thinking
about for this course. So in particular, when I describe the Merge Sort
algorithm, you'll notice that I'm not going to describe it in a level of detail
that you can just translate line by line into a working program in some
programming language. My assumption again is that you're sort of a
programmer, and you can take the high-level idea of the algorithm, how
it works, and you're perfectly capable of turning that into a working
program in whatever language you see fit. So hopefully, I don't know, the
analysis of Merge Sort discussion may not be easy, but I hope that
you find it at least relatively straightforward, because as the course
moves on, we're going to be discussing algorithms and analysis which
are a bit more complicated than the one we're about to do with Merge
Sort. So in other words, I think that this would be a good warm-up for
what's to come. Now another reason I want to discuss Merge Sort is that
our analysis of it will naturally segue into a discussion of how we analyze
algorithms in this course and in general. So we're going to expose a
couple of assumptions in our analysis. We'll focus on worst-case
behavior, where we'll look for guarantees on performance, on running
time, that hold for every possible input of a given size, and then we'll
also expose our focus on so-called "Asymptotic Analysis", which
means we'll be much more concerned with the rate of growth of an
algorithm's performance than with things like low-order terms or small
changes in the constant factors. Finally, we'll do the analysis of Merge
Sort using what's called the "Recursion-Tree" method. So this is a way of
tallying up the total number of operations that are executed by an
algorithm. And as we'll see a little bit later, this Recursion-Tree method
generalizes greatly. And it will allow us to analyze lots of different
recursive algorithms, lots of different Divide-and-Conquer algorithms,
including the integer multiplication algorithm that we discussed in an
earlier segment. So those are the reasons to start out with Merge Sort.
So what is the computational problem that Merge Sort is meant to solve?
Well, presumably, you all know about the sorting problem. But let me tell
you a little bit about it anyways, just so that we're all on the same page.
So, we're given as input. An array of N numbers in arbitrary order, and
the goal of course is to produce output array where the numbers are in
sorted order, let's say, from smallest to largest. Okay so, for example,
we could consider the following input array, and then the goal would be
to produce the following output array. Now one quick comment. You'll
notice that the input array here had eight elements, all of them
distinct; it was the integers between 1 and 8. Now the sorting
problem really isn't any harder if you have duplicates, in fact it can even
be easier, but to keep the discussion as simple as possible let's
just, among friends, go ahead and assume that they're distinct, for the
purpose of this lecture. And I'll leave it as an exercise which I encourage
you to do, which is to think about how the Merge Sort algorithm
implementation and analysis would be different, if at all, if there were
ties, okay? So we'll go ahead and make the distinctness assumption for simplicity
from here on out. Okay, so before I write down any pseudo code for
Merge Sort, let me just show you how the algorithm works using a
picture, and I think it'll be pretty clear what the code would be, even
just given a single example. So let's go ahead and consider the same
unsorted input array that we had on the previous slide. So the Merge
Sort algorithm is a recursive algorithm, and again, that means it's a
program which calls itself, and it calls itself on smaller subproblems of
the same form, okay? So Merge Sort's purpose in life is to sort
the given input array. So it's going to spawn, or call, itself on smaller
arrays. And this is gonna be a canonical Divide-and-Conquer
application, where we simply take the input array, we split it in half, we
solve the left half recursively, we solve the right half recursively, and
then we combine the results. So let's look at that in the picture. So the
first recursive call gets the first four elements, the left half of the array,
namely 5, 4, 1, 8. And, of course, the other recursive call is gonna get
the rest of the elements, 7, 2, 6, 3. You can imagine these have been
copied into new arrays before they're given to the recursive calls. Now,
by the magic of recursion, or by induction if you like, the recursive calls
will do their task. They will correctly sort each of these arrays of four
elements, and we'll get back sorted versions of them. So from our first
recursive call, we receive the output, 1, 4, 5, 8, and from the second
recursive call, we receive the sorted output, 2, 3, 6, 7. So now, all that remains to complete the Merge Sort is to take the two results of our recursive calls, these two sorted arrays of length 4, and combine
them to produce the final output, namely the sorted array of all eight of
the input numbers. And this is the step which is called "Merge". And
hopefully you are already thinking about how you might actually
implement this merge in a computationally efficient way. But I do owe
you some more details. And I will tell you exactly how the merge is done.
In effect, you just walk pointers down each of the two sorted sub-arrays,
copying over, populating the output array in the sorted order. But I will
give you some more details in just a slide or two. So that's Merge Sort in
a picture. Split it in half, solve recursively, and then have some slick
merging procedure to combine the two results into a sorted output.
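
Just to make that picture concrete, here is a minimal Python sketch of the recursion described above. It assumes a merge helper like the one discussed in the next sections, and it is only an illustration of the idea, not official course code.

def merge_sort(a):
    # Base case: arrays with zero or one element are already sorted.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])     # recursively sort the left half
    right = merge_sort(a[mid:])    # recursively sort the right half
    return merge(left, right)      # combine the two sorted halves

# For example, merge_sort([5, 4, 1, 8, 7, 2, 6, 3]) should return
# [1, 2, 3, 4, 5, 6, 7, 8], assuming merge is defined as sketched later.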

Merge Sort: Pseudocode


Transcript
Okay, so let's move on, and actually discuss the pseudo-code for the
merge sort algorithm. First, let me just tell you the pseudo-code, leaving
aside exactly how the merging subroutine is implemented. And at this high level, it should be very simple and clear at this point. So there's
gonna be two recursive calls, and then there's gonna be a merging step.
Now, I owe you a few comments, 'cause I'm being a little sloppy. Again,
as I promised, this isn't something you would directly translate into code,
although it's pretty close. But so what are a couple of the ways that I'm being sloppy? Well, first of all, you know, in any recursive algorithm, you gotta have some base cases. You gotta have this idea that when the input's sufficiently small, you don't do any recursion; you just return some trivial answer. So in the sorting problem, the base case would be: if you're handed an array that has either zero or one elements, well, it's already sorted, there's nothing to do, so you just return it without any recursion. Okay, so to be clear, I haven't written down the base cases, although of course you would if you were actually implementing Merge Sort. So just make a note of that. A couple of other things I'm ignoring. I'm ignoring what to do if the array has odd length; if it has, say, nine elements, obviously you have to somehow break that into five and four or four and five, and you could do that either way and it would be fine. And then secondly,
I'm ignoring the details of what it really means to recursively sort; so, for example, I'm not discussing exactly how you would pass these subarrays on to the recursive calls. That's something that would really depend somewhat on the programming language, and that's exactly what I want to avoid. I really want to talk about the concepts which transcend any particular programming language implementation. So that's why I'm going to describe algorithms at this level, okay? Alright,
so now for the hard part, relatively speaking, that is: how do you implement the merge step? The recursive calls have done their work; we have these two sorted halves of the numbers, the left half and the right half.
How do we combine them into one? And in English, I already told you on
the last slide. The idea is you just populate the output array in a sorted
order, by traversing pointers or just traversing through the two, sorted
sub-arrays in parallel. So let's look at that in some more detail. Okay, so
here is the pseudo-code for the merge step. So let me begin by introducing some names for the characters in what we're about to discuss. Let's use C to denote the output array; this is what we're supposed to spit out, with the numbers in sorted order. And then I'm gonna use a and b to denote the results of the two recursive calls, okay?
So, the first recursive call has given us array a, which contains the left
half of the input array in sorted order. Similarly, b contains the right half
of the input array, again, in sorted order. So, as I said, we're gonna need
to traverse the two sorted sub-arrays, a and b, in parallel. So I'm gonna introduce a counter, i, to traverse through a, and j to traverse through b; i and j will both be initialized to one, to be at the beginning of their respective arrays. And now we're going to do a single pass over the output array, populating it in increasing order, always taking the smallest remaining element from the union of the two sorted sub-arrays. If there's one idea in this merge step, it's just the realization that the minimum element you haven't yet looked at in A and B has to be at the front of one of the two lists, right? So, for example, at the very beginning of the algorithm, where is the minimum element overall? Well, whichever of the two arrays it lands in, A or B, it has to be the smallest one there, okay? So the smallest element overall is either the smallest element of A or the smallest element of B. So you just check both places; whichever is smaller is the overall smallest, you copy it over, and you repeat. That's it. So the purpose of k is just to traverse the output array from left to right; that's the order in which we're gonna populate it. We're currently looking at position i in the first array and position j in the second array, so that's how far we've gotten, how deeply we've probed into both of those two arrays. We look at whichever one has the current smallest element, and we copy that smallest one over. Okay?
So if the entry in the i-th position of A is smaller, we copy that one over, and of course we have to increment i, since we've probed one deeper into the list A; and symmetrically for the case where the current position in B has the smaller element. Now again, I'm being a little bit sloppy, so that we can focus on the forest and not get bogged down in the trees. I'm ignoring some end cases: if you really wanted to implement this, you'd have to add a little bit to keep track of when you fall off either A or B, that is, additional checks for when i or j reaches the end of its array, at which point you copy over all the remaining elements into C. Alright, so I'm gonna give you a cleaned-up version of that pseudo-code, so that you don't have to tolerate my questionable handwriting any longer than is absolutely necessary. This, again, is just the same thing that we wrote on the last slide, okay? The pseudo-code for the merge step.
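
As a concrete illustration of that pseudo-code, here is one way the merge step might look in Python, including the end-of-array checks that the lecture deliberately glosses over. Indices start at 0 rather than 1 here, since that's natural in Python, and this is just a sketch, not the official course implementation.

def merge(a, b):
    # a and b are the two sorted halves produced by the recursive calls.
    c = []         # the output array C
    i, j = 0, 0    # pointers into a and b (0-based here, 1-based on the slides)
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i])   # copy over the smaller front element, from a
            i += 1
        else:
            c.append(b[j])   # copy over the smaller front element, from b
            j += 1
    # One of the halves is exhausted; copy over whatever remains of the other.
    c.extend(a[i:])
    c.extend(b[j:])
    return c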
So that's the Merge Sort algorithm. Now let's get to the meaty part of this lecture, which is: okay, merge sort produces a sorted array, but what makes it, if anything, better than much simpler non-divide-and-conquer algorithms, like, say, insertion sort? In other words, what is the running time of the merge sort algorithm?
Now, I'm not gonna give you a completely precise definition of
what I mean by running time and there's good reason for that, as we'll
discuss shortly. But intuitively, to think about the running time of an algorithm, you should imagine that you're just running the algorithm in a debugger. Then, every time you press enter, you advance by one line of the program through the debugger. And then, basically, the running time is just the number of operations executed, the number of lines of code
executed. So the question is, how many times do you have to hit enter in the debugger before the program finally terminates? So we're interested in how many such lines of code get executed for Merge Sort when an input array has n numbers. Okay, so that's a fairly complicated question, so let's start with a more modest goal. Rather than thinking about the
number of operations executed by Merge Sort, which is this crazy
recursive algorithm, which is calling itself over and over and over again.
Let's just think about how many operations are gonna get executed
when we do a single merge of two sorted sub arrays. That seems like it
should be an easier place to start. So let me remind you, the pseudo
code of the merge subroutine, here it is. So let's just go and count up
how many operations are gonna get used. So there's the initialization step; let's say that I'm gonna charge us one operation for each of the two initializations, so let's call this two operations, just setting i equal to one and j equal to one. Then we have this for loop, which executes a total of n times. In each of these n iterations of the for loop, how many instructions get executed? Well, we have a comparison, so we compare A(i) to B(j), and either way the comparison comes up, we then do two more operations: we do an assignment, here or here, and then we do an increment of the relevant variable, either here or here. So that's gonna be three operations per iteration. And then maybe I'll also say that, in order to increment k, we're gonna call that a fourth operation. Okay? So for each of these n iterations of the for loop, we're gonna do four operations. All right? So putting it all together, what we have is the running time for merge. The upshot is that the running time of the merge subroutine, given an array of m numbers, is at most 4m plus 2. So a couple of comments. First
of all, I've changed a letter on you so don't get confused. In the previous
slide we were thinking about an input size of n. Here I've just changed the name of the variable to m. That's gonna be convenient once we think about merge sort, which is recursing on smaller sub-problems. But it's exactly the same thing: an array of m entries takes at most 4m plus 2 lines of code.
The second thing is, there's some ambiguity in exactly how we counted
lines of code on the previous slide. So maybe you might argue that, you
know, really, each loop iteration should count as two operations, not just
one, 'cause you don't just have to increment k, but you also have to compare it to the upper bound of the loop. Eh, maybe; then it would have been 5m plus 2 instead of 4m plus 2. So it turns out these small differences in how you count up the number of lines of code executed are not gonna matter, and we'll see why shortly. So, amongst friends, let's just agree to call it 4m plus 2 operations for merge to execute on an array of exactly m entries. So, let me abuse our friendship now a little bit further with an inequality which is true, but extremely sloppy; I promise it'll make our lives easier in some future calculations. Rather than 4m plus 2, 'cause the 2 is sorta getting on my nerves, let's just call this at most 6m, because m is at least one. Okay, you have to admit it's true: 6m is at least 4m plus 2. It's very sloppy, and these two quantities are not anywhere close to each other for m large, but let's just go ahead and be sloppy in the interest of future simplicity.
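
In symbols, the chain of bounds we just agreed to, writing the operation count of merge on an array of m entries simply as a quantity in m, is:

\[
4m + 2 \;\le\; 4m + 2m \;=\; 6m \qquad \text{whenever } m \ge 1 .
\]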
Okay. Now, I don't expect anyone to be impressed with this rather crude upper bound on the number of lines of code that the merge subroutine needs to execute. The key
question you recall was how many lines of code does merge sort require
to correctly sort the input array, not just this subroutine. And in fact,
analyzing Merge Sort seems a lot more intimidating, because it keeps
spawning off these recursive versions of itself. So the number of
recursive calls, the number of things we have to analyze, is blowing up
exponentially as we think about various levels of the recursion. Now, if
there's one thing we have going for us, it's that every time we make a
recursive call. It's on a quite a bit smaller input then what we started
with, it's on an array only half the size of the input array. So there's some
kind of tension between on the one hand explosion of sub problems, a
proliferation of sub problems and the fact that successive subproblems
only have to solve smaller and smaller subproblems. And resolving these two forces is what's going to drive our analysis of Merge Sort. So, the good news is, I'll be able to show you a complete analysis of exactly how many lines of code Merge Sort takes, and, in fact, a very precise upper bound. And so here's
gonna be the claim that we're gonna prove in the remainder of this
lecture. So the claim is that Merge Sort never needs more than 6n times the logarithm of n (log base two, if you're keeping track), plus an extra 6n operations, to correctly sort an input array of n numbers. Okay, so let's discuss for a second: is this good, is this a win, knowing that this is an upper bound on the number of lines of code that Merge Sort takes? Well, yes it is, and it shows the benefits of the divide and conquer paradigm. Recall, in the simpler sorting methods that we briefly
discussed like insertion sort, selection sort, and bubble sort, I claimed
that their performance was governed by a quadratic function of the input size. That is, they need a constant times n squared operations to sort an input array of length n. Merge sort, by contrast,
needs at most a constant times N times log N, not N squared but N
times log N lines of code to correctly sort an input array. So to get a feel
for what kind of win this is let me just remind you for those of you who
are rusty, or for whatever reason have lived in fear of a logarithm, just
exactly what the logarithm is. Okay? So the way to think about the logarithm is as follows. You have the x axis, where you have n, which is going from one up to infinity. And for comparison, let's think about just the identity function, okay? So, the function which is just f(n) = n. Okay, and let's contrast this with a logarithm. So what is the logarithm? Well, for our purposes, we can just think of a logarithm as follows, okay? So the log of n, log base 2 of n, is: you type the number n into your calculator, okay? Then you hit divide by two, and then you keep repeating dividing by two, and you count how many times you divide by two until you get down to one, okay. So if
you plug in 32, you've got to divide by two five times to get down to one. Log
base two of 32 is five. You put in 1024 you have to divide by two, ten
times till you get down to one. So log base two of 1024 is ten and so on,
okay. So the point is, you already see that if the log of roughly 1000 is something like ten, then the logarithm is much, much smaller than the input. So graphically, what the logarithm is going to look like is a curve that becomes very flat very quickly as n grows large, okay? So this is f(n) = log base 2 of n. And I encourage you to do this,
perhaps a little bit more precisely on the computer or a graphing
calculator, at home. But the log is growing much, much, much more slowly than the identity function. And as a result, a sorting algorithm which runs in time
proportional to n times log n is much, much faster, especially as n grows
large, than a sorting algorithm with a running time that's a constant times
n squared.
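
As a quick sanity check of that "repeated halving" description, here is a tiny Python sketch; the function name is just for illustration.

def log2_by_halving(n):
    # Repeatedly halve n, counting the halvings, until we get down to 1.
    # For n a power of two this returns exactly log base 2 of n:
    # e.g. 32 -> 5 and 1024 -> 10, matching the examples above.
    count = 0
    while n > 1:
        n = n / 2
        count += 1
    return count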

Merge Sort: Analysis - Part 1


Transcript
In this video, we'll be giving a running time analysis of the merge sort
algorithm. In particular, we'll be substantiating the claim that the
recursive divide and conquer merge sort algorithm is better, has better
performance than simpler sorting algorithms that you might know, like
insertion sort, selection sort, and bubble sort. So, in particular, the goal
of this lecture will be to mathematically argue the following claim from an
earlier video, that, in order to sort an array of n numbers, the merge sort algorithm needs no more than a constant times n log n operations. That's the maximum number of lines of executable code that will ever execute: specifically, 6n log n plus 6n operations. So, how are
we going to prove this claim? We're going to use what is called a
recursion tree method. The idea of the recursion tree method is to write
out all of the work done by the recursive merge sort algorithm in a tree
structure, with the children of a given node corresponding to the
recursive calls made by that node. The point of this tree structure is that it gives us an interesting way to count up the overall work done by the algorithm, and it will greatly facilitate the analysis. So specifically, what is this tree? At level zero, we have a root, and this corresponds to the outer call of Merge Sort, okay? So I'm gonna call this level zero. Now, this tree is going to be binary, in recognition of the fact that each invocation of Merge Sort makes two recursive calls. So the two children will correspond to the two recursive calls of Merge Sort. So at the root,
we operate on the entire input array, so let me draw a big array
indicating that. And at level one, we have one sub problem for the left
half, and another sub problem for the right half of the input array. And I'll
call these first two recursive calls, level one. Now of course each of
these two level one recursive calls will themselves make two recursive
calls, each operating on a quarter of the original input array. So
those are the level two recursive calls, of which there are four, and this
process will continue until eventually the recursion bottoms out, in base cases where there's only an array of size zero or one. So now I have a
question for you which I'll, I'll give you in the form of a quiz which is, at
the bottom of this recursion tree corresponding to the base cases, what
is the level number at the bottom? So, at what level do the leaves in this
tree reside? Okay, so hopefully you guessed, correctly, that the answer is the second one, namely that the number of levels of the recursion tree is essentially logarithmic in the size of the input array. The reason is basically that the input size is being decreased by a factor of two with each level of the recursion. If you have an input size of n at the outer level, then each of the first set of recursive calls operates on an array of size n over two; at level two, each array has size n over four; and so
on. Where does the recursion bottom out? Well, down at the base cases
where there's no more recursion, which is where the input array has size
one or less. So in other words, the number of levels of recursion is
exactly the number of times you need to divide n by two until you get down to a number that's at most one. Recall that's exactly the definition of the logarithm, base two, of n. So since the first level is level zero and the last level is level log base two of n, the total number of levels is actually log base two of n plus one. And when I write down this expression, I'm here assuming that n is a power of two, which is not a big deal. I mean,
the analysis is easily extended to the case where N is not a power of
two. And this way, we don't have to think about fractions. Log base two
of N then is an integer. Okay so let's return to the recursion tree. Let me
just redraw it really quick. So again, down here at the bottom of the tree
we have the leaves, i.e. The base cases where there's no more
recursion, which, when n is a power of two, correspond exactly to single-element arrays. So that's the recursion tree corresponding to an invocation of Merge Sort. And the motivation for writing down, for
organizing the work performed by Merge Sort in this way, is it allows us
to count up the work, level by level. And we'll see that that's a particularly
convenient way to account for all of the different lines of code that get
executed. Now, to see that in more detail, I need to ask you to identify a
particular pattern. So, first of all, the first question is, at a given level, j, of
this recursion, exactly how many distinct sub-problems are there, as a
function of the level j? That's the first question. The second question is,
for each of those distinct sub-problems at level j, what is the input size?
So, what is the size of the array which is passed to a sub-problem residing at level j of this recursion tree? So, the correct answer is
the third one. So, first of all, at a given level, j, there's precisely two to
the j distinct sub-problems. There's one outermost sub-problem at level
zero, it has two recursive calls, those are the two, sub-problems at level
one, and so on. In general, since merge sort calls itself twice, the
number of sub-problems is doubling at each level, so that gives us the
expression, two to the j, for the number of sub-problems at level j. On the
other hand, by a similar argument, the input size is halving each time.
With each recursive call you pass it half of the input. That you were
given. So at each level of the recursion tree we're seeing half of the input
size of the previous level. So after J levels, since we started with an
input size of N, after J levels, each sub-problem will be operating on an
array of length N over two to the J. Okay, so now let's put this pattern to
use, and actually count up all of the lines of code that Merge Sort
executes. And as I said before, the key, the key idea is to count up the
work level by level. Now, to be clear, when I talk about the amount of
work done at level J, what I'm talking about is the work done by those 2
to the J invocations of Merge Sort, not counting their respective
recursive calls. Not counting work which is gonna get done in the
recursion, lower in the tree. Now, recall, Merge Sort is a very simple
algorithm. It just has three lines of code. First, there is a recursive call, so we're not counting that; second, there is another recursive call, and again we're not counting that; and then third, we just invoke the merge subroutine. So really, outside of the recursive calls, all that merge sort
does is a single invocation of merge. Further recall we already have a
good understanding of the number of lines of code that merge needs.
On an input of size m, it's gonna use, at most, 6m lines of code. That's
an analysis that we did in the previous video. So, let's fix a level j. We
know how many sub-problems there are, two to the j. We know the size
of each sub-problem, n over two to the j, and we know how much work
merge needs on such an input, at most six times its length. We just multiply it all out, and we get the amount of work done at level j, over all of the level-j subproblems. So here it is in more detail. Alright.
So. We start with just the number of different sub-problems at level J and
we just notice that, that was at most two to the J. We also observe that
each level J sub-problem is passed an, an array as input which has
length N over two to the J. And we know that the merge subroutine,
when given an array of size n over two to the j, will execute at most six times that many lines of code. So to compute the total amount of work done at level j, we just multiply the number of sub-problems times the work done per sub-problem. And then something sort of remarkable happens, where you get this cancellation of the two 2-to-the-j factors, and we get an upper bound of 6n, which is independent of the level j. So we do at most 6n operations at the root, at most 6n operations at level one, at level two,
and so on, okay? It's independent of the level. Morally, the reason this is
happening is because of a perfect equilibrium between two competing
forces. First of all, the number of subproblems is doubling with each level of the recursion tree; but secondly, the amount of work that we do per sub-problem is halving with each level of the recursion tree. Once those two cancel out, we get an upper bound of 6n, which is independent of the
level J. Now, here's why that's so cool, right? We don't really care about
the amount of work just at a given level. We care about the amount of
work that Merge Sort does ever, at any level. But, if we have a bound on
the amount of work at a level which is independent of the level, then our
overall bound is really easy. What do we do? We just take the number of
levels, and we know what that is: it's exactly log base two of n plus one. Remember, the levels are zero through log base two of n, inclusive. And then we have an upper bound of 6n for each of those log n plus one levels. So if we expand out this quantity, we get exactly the upper bound that was claimed earlier, namely that the number of operations merge sort
executes is at most 6n times log base 2 of n, plus 6n. So that, my friends, is a running time analysis of the merge sort algorithm. That's why its running time is bounded by a constant times n log n, which, especially as n grows large, is far superior to the simpler iterative algorithms like insertion or selection sort.
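
For reference, here is the level-by-level calculation written out as math, with n assumed to be a power of two as in the lecture:

\[
\text{work at level } j \;\le\; \underbrace{2^{j}}_{\text{subproblems}} \cdot 6\cdot\frac{n}{2^{j}} \;=\; 6n,
\qquad
\text{total} \;\le\; (\log_2 n + 1)\cdot 6n \;=\; 6n\log_2 n + 6n .
\]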

Merge Sort: Analysis - Question 1


Multiple Choice
1 point possible (ungraded)

Roughly how many levels does this recursion tree have (as a function of n, the length of the input
array)?

A constant number (independent of n)

log2 (n)

√n

Merge Sort: Analysis - Part 2


Merge Sort: Analysis - Question 2
1 point possible (ungraded)

What is the pattern? Fill in the blanks in the following statement: at each
level j = 0, 1, 2, 3, …, log2(n) there are _______ subproblems, each of size
______.
2^j and 2^j, respectively
n/2^j and n/2^j, respectively
2^j and n/2^j, respectively
n/2^j and 2^j, respectively

Merge Sort: Analysis - Part 3



Guiding Principles for Analysis of Algorithms


Transcript
Having completed our first analysis of an algorithm, namely an upper bound on the running time of the Merge Sort algorithm, what I wanna do next is take a step back and be explicit about three assumptions, three biases that we made when we did this analysis of Merge Sort,
and interpreted the results. These three assumptions we will adopt as
guiding principles for how to reason about algorithms, and how to define
a so called fast algorithm for the rest of the course. So, the first guiding
principle is that we used what's often called worst-case analysis. By worst-case analysis, I simply mean that our upper bound of 6n log n plus 6n applies to the number of lines of code executed for every single input array of length n. We made absolutely no assumptions about the
input, where it comes from, what it looks like beyond what the input
length N was. Put differently, if, hypothetically, we had some adversary
whose sole purpose in life was to concoct some malevolent input
designed to make our algorithm run as slow as possible, the worst this adversary could do is upper bounded by this same number, 6n log n plus 6n. Now, this sort of worst-case guarantee popped out so naturally from our analysis of Merge Sort that you might well be wondering, what else could you do? Well, two other methods of analysis, which do have their place, although we won't really discuss them in this course, are
quote unquote, average case analysis. And also the use of a set of
prespecified benchmarks. By average case analysis, I mean, you
analyze the average running time of an algorithm under some
assumption about the relative frequencies of different inputs. So, for
example, in the sorting problem, one thing you could do, although it's not
what we do here, you could assume that every possible input array is equally likely, and then analyze the average running time of an
algorithm. By benchmarks, I just mean that one agrees up front about
some set, say ten or twenty, benchmark inputs, which are thought to
represent practical or typical inputs for the algorithm. Now, both average-
case analysis and benchmarks are useful in certain settings, but for
them to make sense, you really have to have domain knowledge about
your problem. You need to have some understanding of what inputs are
more common than others, what inputs better represent typical inputs
than others. By contrast, in worst-case analysis, by definition you're
making absolutely no assumptions about where the input comes from.
So, as a result, worst-case analysis is particularly appropriate for
general-purpose sub-routines, sub-routines that you design without having any knowledge of how they will be used or what kind of
inputs they will be used on. And happily, another bonus of doing worst-case analysis, as we will in this course, is that it's usually mathematically much more tractable than trying to analyze the average performance of an
algorithm under some distribution over inputs. Or to understand the
detailed behavior of an algorithm on a particular set of benchmark
inputs. This mathematical tractability was reflected in our Merge Sort
analysis, where we had no a priori goal of analyzing the worst case, per
se. But it's naturally what popped out of our reasoning about the
algorithm's running time. The second and third guiding principles are
closely related. The second one is that, in this course, when we analyze
algorithms, we won't worry unduly about small constant factors or lower
order terms. We saw this philosophy at work very early on in our
analysis of merge sort. When we discussed the number of lines of code
that the merge subroutine requires. We first upper-bounded it by 4m plus
two, for an array of length m, and then we said, eh, let's just think about
it as 6m instead. Let's have a simpler, sloppy upper-bound and work with
that. So, that was already an example of not worrying about small
changes in the constant factor. Now, the question you should be
wondering about is, why do we do this, and can we really get away with
it? So let me tell you about the justifications for this guiding principle. So
the first motivation is clear, and we used it already in our Merge Sort analysis: it's simply way easier mathematically if we don't have to precisely pin down what the constant factors and lower-order
terms are. The second justification is a little less obvious, but is
extremely important. So, I claim that, given the level at which we're
describing and analyzing algorithms in this course, it would be totally
inappropriate to obsess unduly about exactly what the constant factors
are. Recall our discussion of the merge subroutine. So, we wrote that
subroutine down in pseudocode, and we gave an upper bound of 4m plus 2
on the number of lines of code executed, given an input of length m. We
also noted that, it was somewhat ambiguous exactly how many lines of
code we should count it as, depending on how you count loop
increments and so on. So even there, small constant factors could creep in, given the under-specification of the pseudocode. Depending on how that pseudocode gets translated into an actual programming language like C or Java, you'll see the number of lines of code deviate even further, not by a lot, but again by small constant factors. When such a
program is then compiled down into machine code, you'll see even
greater variance depending on the exact processor, the compiler, the
compiler optimizations, the programming implementation, and so on. So
to summarize, because we're going to describe algorithms at a level that transcends any particular programming language, it would be inappropriate to specify precise constants. The precise constants would ultimately be determined by machine-dependent aspects like who the programmer is, what the compiler is, what the processor is, and so on.
And now the third justification is, frankly, that we're just going to be able to get away with it. That is, one might be concerned that ignoring
things like small constant factors leads us astray. That we wind up
deriving results which suggest that an algorithm is fast when it's really
slow in practice, or vice versa. But for the problems we discuss in this course, we'll get extremely accurate predictive power, even though we won't be keeping track of lower-order terms and constant factors. When the
mathematical analysis we do suggests that an algorithm is fast, indeed it
will be. When it suggests that it's not fast, indeed that will be the case.
So we lose a little bit of granularity of information, but we don't lose
what we really care about, which is accurate guidance about what
algorithms are gonna be faster than others. So the first two justifications,
I think, are pretty self evident. This third justification is more of an
assertion, but it's one we'll be backing up over and over again as we
proceed through this course. Now, don't get me wrong. I'm not saying
constant factors aren't important in practice. Obviously, for crucial
programs the constant factors are hugely important. If you're writing the sort of crucial loop that, you know, your startup's survival depends on, by all means optimize the constants like crazy. The point is just that
understanding tiny constant factors in the analysis is an inappropriate
level of granularity for the kind of algorithm analysis we're going to be
doing in this course. Okay, let's move on to the third and final guiding principle. So the third principle is that we're going to use what's called asymptotic analysis, by which I mean we will focus on the case of large input sizes, that is, the performance of an algorithm as the size n of the input grows large, tending to infinity. Now, this focus on large input sizes was already evident when we interpreted our bound on Merge Sort.
So, how did we describe the bound on Merge Sort? We said, oh, well, it needs a number of operations proportional to n log n, a constant factor times n log n. And we very cavalierly declared that this was better than any algorithm which has a quadratic dependence of its running time on the input size. So, for example, we argued that merge sort is a better, faster algorithm than something like insertion sort, without actually discussing the constant factors at all. So, mathematically, we were saying that the running time of merge sort, which we can represent as the function 6n log base 2 of n plus 6n, is better than any function which has a quadratic dependence on n, even one with a small constant, like, let's say, one-half n squared, which might roughly be the running time of insertion sort. And this is a mathematical statement that is true if and only if n is sufficiently large. Once n grows large, it is certainly true that the expression on the left is smaller than the expression on the right, but for small n the expression on the right is actually going to be smaller, because of its smaller constant factor. So in saying that merge sort is superior to insertion sort, the bias is that we're focusing on problems with a large n. So the question you should have is, is that reasonable, is that a justified assumption, to focus on large input sizes? And the answer is certainly yes. So the reason we focus on large
input sizes is because, frankly, those are the only problems which are
even, which are at all interesting. If all you need to do is sort 100
numbers, use whatever method you want, and it's gonna happen
instantaneously on modern computers. You don't need to know say, the
divide and conquer paradigm, if all you need to do is sort 100 numbers.
So one thing you might be wondering is whether, with computers getting faster all the time according to Moore's Law, it really even matters to think about algorithmic analysis, if eventually all problem sizes will just be trivially solvable on super-fast computers. But, in fact, the opposite is
true. Moore's Law, with computers getting faster, actually says that our
computational ambitions will naturally grow. We naturally focus on ever
larger problem sizes. And the gulf between an N squared algorithm and
an n log n algorithm will become ever wider. A different way to think
about it is in terms of how much bigger a problem size you can solve as computers get faster. If you are using an algorithm with a running time proportional to the input size, then if computers get faster by a factor of four, you can solve problems that are a factor of four larger. Whereas if you are using an algorithm whose running time is proportional to the square of the input size, then when a computer gets faster by a factor of four, you can only solve double the problem size. And we'll see even starker examples of this gulf between different algorithmic approaches as time goes on.
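
To spell out that arithmetic: if an algorithm performs about c times n squared operations and a machine becomes four times faster, the largest solvable size n' satisfies

\[
c\,(n')^{2} \;=\; 4\,c\,n^{2} \quad\Longrightarrow\quad n' = 2n ,
\]

whereas for a running time of c times n, the same budget gives c n' = 4 c n, that is, n' = 4n.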
So, to drive this point home, let me show you a couple of graphs. What we're looking at here is a graph of two functions. The solid curve is the upper bound that we proved on Merge Sort, so this is gonna be 6n log base 2 of n plus 6n. And the dotted line is an estimate, a rather generous estimate, of the running time of insertion sort, namely one-half times n squared. And we see here in the graph exactly the behavior that we discussed earlier, which is that for small n, down here, in fact, because one-half n squared has a smaller leading constant, it's actually the smaller function. And this is true up to a crossing point of maybe 90 or so. Beyond n = 90, the quadratic growth in the n squared term overwhelms the fact that it had a smaller constant, and it starts being bigger than this other function, 6n log base 2 of n plus 6n. So in the regime below 90, it's predicting that insertion sort will be better, and in the regime above 90, it's predicting that merge sort will be faster.
interesting let's scale the X axis let's look well beyond this crossing point
of 90 let's just increase it in order of magnitude up to a raise in size
1500. And I want to emphasize these are still very small problem sizes. If
all you need to do is sort arrays of size 1500 you really don't need to
know Divide-and-conquer or anything else we'll talk about -- that's a
pretty trivial problem on modern computers. [sound]. So what we're
seeing is, that even for very modest problem sizes here, array of, of,
size, say 1500. The quadratic dependence in the insertion sort bound is
more than dwarfing the fact, that it had a lower constant factor. So in this
large regime, the gulf between the two algorithms is growing. And of
course, if I increased it another 10X or 100x or 1000x to get to genuinely
interesting problem sizes, the gap between these two algorithms would
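
As a quick sanity check on that crossing point, here is a short Python sketch (not from the lecture; it simply evaluates the two bounds quoted above) that finds where the quadratic estimate first exceeds the merge sort bound.

import math

def merge_sort_bound(n):
    # The upper bound proved earlier for merge sort: 6n*log2(n) + 6n.
    return 6 * n * math.log2(n) + 6 * n

def insertion_sort_estimate(n):
    # The generous estimate used above for insertion sort: one-half n squared.
    return 0.5 * n * n

for n in range(2, 200):
    if insertion_sort_estimate(n) > merge_sort_bound(n):
        print("crossover near n =", n)  # prints a value around 90
        break
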
That said, I'm not saying you should be completely ignorant of constant factors when you implement algorithms. It's still good to have a general sense of what these constant factors are. For example, in highly tuned versions of merge sort, which you'll find in many programming libraries, because of the difference in constant factors, the algorithm will actually switch from merge sort over to insertion sort once the problem size drops below some particular threshold, say seven elements or so. So for small problem sizes you use the algorithm with the smaller constant factors, insertion sort, and for larger problem sizes you use the algorithm with the better rate of growth, namely merge sort; a sketch of this hybrid appears below.
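
Here is a minimal Python sketch of that hybrid idea (not from the lecture; the cutoff of 7 is just the illustrative threshold mentioned above, and real library implementations tune both the cutoff and the merge step far more carefully).

def insertion_sort(a):
    # Standard in-place insertion sort; cheap for very small inputs.
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def hybrid_merge_sort(a, cutoff=7):
    # Below the cutoff, the smaller constant factors of insertion sort win.
    if len(a) <= cutoff:
        return insertion_sort(a)
    mid = len(a) // 2
    left = hybrid_merge_sort(a[:mid], cutoff)
    right = hybrid_merge_sort(a[mid:], cutoff)
    # Merge the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(hybrid_merge_sort([5, 2, 9, 1, 7, 3, 8, 6, 4]))  # [1, 2, 3, ..., 9]
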
So, to review: our first guiding principle is that we're going to pursue worst-case analysis. We're going to look at bounds on the running time of an algorithm which make no domain assumptions, that is, no assumptions about which input of a given length the algorithm is provided. The second guiding principle is that we're not going to focus on constant factors or lower-order terms; that would be inappropriate, given the level of granularity at which we're describing algorithms. And the third is that we're going to focus on the rate of growth of algorithms for large problem sizes. Putting these three principles together, we get a mathematical definition of a fast algorithm. Namely, we're gonna pursue algorithms whose worst-case running time grows slowly as a function of the input size. So let me tell you how you should interpret what I just wrote down in this box. On the left hand side is clearly what we want: we want algorithms which run quickly if we implement them. And on the right hand side is a proposed mathematical surrogate for a fast algorithm. The left hand side is not a mathematical definition; the right hand side is, as will become clear in the next set of lectures. So we're identifying fast algorithms with those that have good asymptotic running time, running time which grows slowly with the input size. Now, what would we want from a mathematical definition? We'd want a sweet spot. On one hand, we want something we can actually reason about. This is why we zoom out and squint and ignore things like constant factors and lower-order terms; we can't keep track of everything, otherwise we'd never be able to analyze stuff. On the other hand, we don't want to throw out the baby with the bath water; we want to retain predictive power. And this definition turns out, for the problems we're going to talk about in this course, to be the sweet spot for reasoning about algorithms: worst-case analysis using the asymptotic running time. We'll be able to prove lots of theorems, we'll be able to establish a lot of performance guarantees for fundamental algorithms, and at the same time we'll have good predictive power: what the theory advocates will in fact be algorithms that are known to be best in practice. So, the final explanation I owe you is: what do I mean by "the running time grows slowly" with respect to the input size? Well, the answer depends a little bit on the context, but for almost all of the problems we're going to discuss, the holy grail will be to have what's called a linear time algorithm, an algorithm whose number of instructions grows proportionally to the input size. We won't always be able to achieve linear time, but that's, in some sense, the best-case scenario. Notice that linear time is even better than what we achieved with our merge sort algorithm for sorting. Merge sort runs a little bit superlinearly: it's n times log n, where n is the input size. If possible, we'd like to be linear time. It's not always gonna be possible, but that is what we will aspire toward for most of the problems we'll discuss in this course. Looking ahead, the next series of videos is going to have two goals. First of all, on the analysis side, I'll describe formally what I mean by asymptotic running time. I'll introduce "Big Oh" notation and its variants, explain its mathematical definitions, and give a number of examples. On the design side, we'll get more experience applying the divide and conquer paradigm to further problems. See you then.
Chapter 2: ASYMPTOTIC ANALYSIS

The Gist - Part 1


Transcript

In this sequence of lectures we're going to learn Asymptotic Analysis.


This is the language by which every serious computer programmer and
computer scientist discusses the high level performance of computer
algorithms. As such, it's a totally crucial topic. In this video, the plan is to
segue between the high level discussion you've already seen in the
course introduction and the mathematical formalism, which we're going to start developing in the next video. Before getting into that mathematical formalism, however, I want to make sure that the topic is well motivated, that you have solid intuition for what it's trying to accomplish, and that you've seen a couple of simple, intuitive examples. Let's get started. Asymptotic analysis provides the basic vocabulary for discussing the design and analysis of algorithms. While it is a mathematical concept, it is by no means math for math's sake. You will very frequently hear serious programmers saying that such and such code runs in O of n time, whereas such and such other code runs in O of n squared time. It's important you know what programmers mean when they make statements like that.
The reason this vocabulary is so ubiquitous, is that it identifies a sweet
spot for discussing the high level performance of algorithms. What I
mean by that is, it is on the one hand coarse enough to suppress all of
the details that you want to ignore. Details that depend on the choice of
architecture, the choice of programming language, the choice of
compiler. And so on. On the other hand, it's sharp enough to be useful.
In particular, to make predictive comparisons between different high
level algorithmic approaches to solving a common problem. This is going
to be especially true for large inputs. And remember, as we discussed, in some sense large inputs are the interesting ones; those are the ones for which we need algorithmic ingenuity. For example, asymptotic analysis will allow us to differentiate between better and worse approaches to sorting, between better and worse approaches to multiplying two integers, and so
on. Now most serious programmers if you ask them, what's the deal with
asymptotic analysis anyways? They'll tell you reasonably, that the main
point is to suppress both leading constant factors and lower order terms.
Now as we'll see there's more to Asymptotic Analysis than just these
seven words here but long term, ten years from now, if you only
remember seven words about Asymptotic Analysis I'll be reasonably
happy if these are the seven words that you remember. So how do we
justify adopting a formalism which essentially by definition suppresses
constant factors and lower-order terms? Well, lower-order terms basically by definition become increasingly irrelevant as you focus on large inputs, which, as we've argued, are the interesting inputs, the ones where
algorithmic ingenuity is important. As far as constant factors these are
going to be highly dependent on the details of the environment, the
compiler, the language and so on. So, if we want to ignore those details
it makes sense to have a formalism which doesn't focus unduly on
leading constant factors. Here's an example. Remember when we analyzed the merge sort algorithm? We gave an upper bound on its running time of 6n log n + 6n, where n was the input length, the number of numbers in the input array. The lower-order term here is the 6n; it's growing more slowly than n log n, so we just drop it. And then the leading constant factor is the 6, so we suppress that too. After these two suppressions we're left with a much simpler expression, n log n. The terminology would then be to say that the running time of merge sort is big O of n log n. In other words, when you say that an algorithm is big O of some function f of n, what you mean is that after you drop the lower-order terms and suppress the leading constant factor, you're left with the function f of n. Intuitively, that is what big O notation means. So to be clear, I'm certainly not asserting that constant factors never matter when you're designing and analyzing algorithms. Rather, I'm just saying that when you think about high-level algorithmic approaches, when you want to make a comparison between fundamentally different ways of solving a problem, asymptotic analysis is often the right tool for giving you guidance about which one is going to perform better, especially on reasonably large inputs. Now, once you've committed to a particular algorithmic solution to a problem, of course you might want to then work harder to improve the leading constant factor, perhaps even to improve the lower-order terms. By all means, if the future of your start-up depends on how efficiently you implement some particular set of lines of code, have at it. Make it as fast as you can. In the rest of this video I want to go through four very simple examples. In fact, these examples are so simple that if you have any experience with big O notation, you're probably better off skipping the rest of this video and moving on to the mathematical formalism that we begin in the next video. But if you've never seen it before, I hope
these simple examples will get you oriented. So let's begin with a very
basic problem, searching an array for a given integer. Let's analyze the
straightforward algorithm for this problem, where we just do a linear scan through the array, checking each entry to see if it is the desired integer t. That is, the code just checks each array entry in turn. If it ever finds the integer t, it returns true; if it falls off the end of the array without finding it, it returns false. So, what do you think? We haven't formally defined big O notation, but I've given you an intuitive description. What would you say is the running time of this algorithm as a function of the length n of the array A? The answer I am looking for is C, O(n), or equivalently, we would say that the running time of this algorithm is linear in the input length n. Why is that true? Well, let's think about how many operations this piece of code is going to execute. Actually, the number of lines of code executed is going to depend on the input. It depends on whether or not the target t is contained in the array A, and if so, where in the array A it lies. But, in the worst case, this code will do an unsuccessful search: t will not be in the array, and the code will scan through the entire array A and return false. What is the number of operations then? There's some constant amount of initial setup, perhaps, and maybe an operation to return the final boolean value, but outside of that constant, which will get suppressed in the big O notation, the code does a constant number of operations per entry in the array. And you could argue about what that constant is, whether it's 2, 3, or 4 operations per entry in the array, but the point is that whatever that constant is, it gets conveniently suppressed by the big O notation. So as a result, the total number of operations will be linear in n, and so the big O bound will just be O of n. So that was the first example. In the last three examples, I want to
look at different ways that we could have two loops. And in this example,
I want to think about one loop followed by another. So two loops in
sequence. I want to study almost the same problem as the previous one, where now we're given two arrays, A and B, both of the same length n, and we want to know whether the target t is in either one of them. Again, we'll look at the straightforward algorithm, where we just search through A, and if we fail to find t in A, we search through B. If we don't find t in B either, then we have to return false. The question is exactly the same as last time: given this new, longer piece of code, what, in big O notation, is its running time? Well, the question was the same, and in this case the answer is the same: this algorithm, just like the last one, has running time big O of n. If we actually count the number of operations, it won't be exactly the same as last time; it will be roughly twice as many operations as the previous piece of code. That's because we have to search two different arrays, each of length n, so whatever work we did before, we now do twice. Of course that factor of two, being a constant independent of the input length n, is going to get suppressed once we pass to big O notation.
So, this, like the previous algorithm, is a linear time algorithm. It has
running time big O of n. Let's look at a more interesting example of two
loops where rather than processing each loop in sequence, they're going
to be nested. In particular let's look at the problem of searching whether
two given input arrays each of length n contain a common number. The
code that we're going to look at for solving this problem is the most
straightforward one you can imagine, where we just compare all possibilities. For each index i into the array A and each index j into the array B, we just see if A[i] is the same number as B[j]. If it is, we return true. If we exhaust all of the possibilities without ever finding equal elements, then we're safe in returning false. The question, of course, is: in terms of big O notation, asymptotic analysis, as a function of the array length n, what is the running time of this piece of code? This time, the answer has changed. For this piece of code, the running time is not big O of n; it is big O of n squared. So we might also call this a quadratic time algorithm, because the running time is quadratic in the input length n. This is one of those kinds of algorithms where, if you double the input length, then the running time of the algorithm will go up by a factor of 4, rather than by a factor of 2 like in the previous two pieces of code. So why is this? Why does it have quadratic running time, big O of n squared? Well, again, there's some constant setup cost which gets suppressed in the big O notation. And again, for each fixed choice of an index i into array A and an index j into array B, we only do a constant number of operations. The particular constant is irrelevant, because it gets suppressed in the big O notation. What's different is that there's a total of n squared iterations of this double for loop. In the first example, we only had n iterations of a single for loop. In our second example, because one for loop completed before the second one began, we had only 2n iterations overall. Here, for each of the n iterations of the outer for loop, we do n iterations of the inner for loop. So that gives us n times n, i.e., n squared total iterations. So that's going to be the running time of this
piece of code. Let's wrap up with one final example. It will again be
nested for loops, but this time, we're going to be looking for duplicates in
a single array A, rather than needing to compare two distinct arrays A
and B. So, here's the piece of code we're going to analyze for solving
this problem, for detecting whether or not the input array A has duplicate
entries. There are only two small differences relative to the code we went through on the previous slide, when we had two different arrays. The first change won't surprise you at all: instead of referencing the array B, I change that B to an A, so I just compare the ith entry of A to the jth entry of A. The second change is a little more subtle: I changed the inner for loop so that the index j begins at i + 1, where i is the current value of the outer for loop's index, rather than starting at the index 1. I could have had it start at the index 1; that would still be correct, but it would be wasteful, and you should think about why. If we started the inner for loop's index at 1, then this code would actually compare each distinct pair of elements of A to each other twice, which, of course, is silly. You only need to compare two different elements of A to each other once to know whether they are equal or not. So this is the piece of code. The question is the same as it always is: what, in terms of big O notation, as a function of the input length n, is the running time of this piece of code? The answer to this question is the same as the last one: big O of n squared. That is, this piece of code also has quadratic running time. What I hope was clear is that whatever the running time of this piece of code is, it's proportional to the number of iterations of this double for loop. Like in all the examples, we do constant work per iteration; we don't care about the constant, since it gets suppressed by the big O notation. So all we've got to do is figure out how many iterations there are of this double for loop. My claim is that there are roughly n squared over two iterations of this double for loop. There are a couple of ways to see that. Informally, we discussed how the difference between this code and the previous one is that, instead of counting something twice, we're counting it once. That saves us a factor of two in the number of iterations. Of course, this one-half factor gets suppressed by the big O notation anyway, so the big O running time doesn't change. A different argument would say that there's one iteration for every distinct choice of indices i and j between 1 and n, and a simple counting argument says that there are n choose 2 such choices of distinct i and j, where n choose 2 is the number n(n - 1)/2. And again, suppressing the lower-order terms and the constant factor, we still get a quadratic dependence on the length of the input array A. So that wraps up these simple, basic examples. I hope this gets you oriented and that you have a strong intuitive sense of what big O notation is trying to accomplish and how it's defined mathematically. Let's now move on to both the mathematical developments and some more interesting algorithms.
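
The last example above, checking a single array for duplicates with the inner index starting at i + 1, is the one piece of code not reproduced as pseudocode below, so here is a minimal Python sketch of it (the function name is illustrative).

def has_duplicate(A):
    # Compare each distinct pair of positions exactly once by starting the
    # inner index at i + 1; roughly n*(n-1)/2 iterations in the worst case.
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            if A[i] == A[j]:
                return True
    return False
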

The Gist - Question 1



Problem: Does array A contain the integer t? Given A (array of length n) and
t (an integer).

Algorithm 1
1: for i = 1 to n do
2: if A[i] == t then
3: Return TRUE
4: Return FALSE
What is the running time of this piece of code?
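
For readers who prefer runnable code to the pseudocode above, here is a direct Python transcription of Algorithm 1 (the function name and zero-based indexing are illustrative; the pseudocode is one-based).

def contains(A, t):
    # Scan the array left to right; return True as soon as t is found.
    for x in A:
        if x == t:
            return True
    return False
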

The Gist - Part 2


Transcript
NOT AVAILABLE
Given A, B (arrays of length n) and t (an integer). [Does A or B contain t?]

Algorithm 2

1: for i = 1 to n do
2: if A[i] == t then
3: Return TRUE
4: for i = 1 to n do
5: if B[i] == t then
6: Return TRUE
7: Return FALSE
What is the running time of this piece of code?
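
A Python transcription of Algorithm 2, under the same assumptions as before (names and zero-based indexing are illustrative): two loops in sequence, so roughly 2n iterations in the worst case.

def contains_either(A, B, t):
    # Search A first; only if t is not found there, search B.
    for x in A:
        if x == t:
            return True
    for x in B:
        if x == t:
            return True
    return False
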

The Gist - Part 3



The Gist - Question 3


Problem: Do arrays A, B have a number in common? Given arrays A, B of
length n.

Algorithm 3

1: for i = 1 to n do
2: for j = 1 to n do
3: if A[i] == B[j] then
4: Return TRUE
5: Return FALSE
What is the running time of this piece of code?
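
A Python transcription of Algorithm 3 (names and zero-based indexing are illustrative); the nested loops are what push the worst-case running time to quadratic.

def have_common_element(A, B):
    # Compare every entry of A against every entry of B: n * n iterations.
    for a in A:
        for b in B:
            if a == b:
                return True
    return False
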
The Gist - Part 4

Big-Oh Notation
In the following series of videos, we'll give a formal treatment of
asymptotic notation, in particular big-Oh notation, as well as work
through a number of examples. Big-Oh notation concerns functions defined on the positive integers; we'll call such a function T(n). We'll pretty much always have the same semantics for T(n): we're gonna be concerned with the worst-case running time of an algorithm as a function of the input size n. So, the question I want to answer for you in the rest of this video is: what does it mean when we say a function T(n) is big-Oh of f(n), where f(n) is some basic function, like for example n log n? I'll give you a number of ways to think about what big-Oh notation really means. For starters, let's begin with an English definition.
What does it mean for a function to be big-Oh of f(n)? It means
eventually, for all sufficiently large values of n, it's bounded above by a
constant multiple of f(n). Let's think about it in a couple other ways. So
next I'm gonna translate this English definition into a picture, and then I'll translate it into formal mathematics. Pictorially, you can imagine that perhaps we have T(n) denoted by this blue function here, and perhaps f(n) is denoted by this green function here, which lies below T(n). But when we double f(n), we get a function that eventually crosses T(n) and forevermore is larger than it. In this event, we would say that T(n) indeed is big-Oh of f(n), the reason being that for all sufficiently large n, once we go far enough out to the right on this graph, a constant multiple of f(n), namely twice f(n), is an upper bound on T(n). So finally, let
me give you an actual mathematical definition that you could use to do formal proofs. How do we say, in mathematics, that eventually T(n) should be bounded above by a constant multiple of f(n)? We say that there exist two constants, which I'll call c and n0, such that T(n) is no more than c times f(n) for all n that exceed or equal n0. The role of these two constants is to quantify what we mean by a constant multiple, and what we mean by sufficiently large, in the English definition. c obviously quantifies the constant multiple of f(n), and n0 quantifies sufficiently large; that's the threshold beyond which we insist that c times f(n) is an upper bound on T(n). So, going back to the picture, what are c and n0? Well, c, of course, is just going to be two, and n0 is the crossing point: we go to where 2 f(n) and T(n) cross, and then drop down to the horizontal axis, and that would be the relevant value of n0 in this picture. So that's the formal definition, and the way to prove that something is big-Oh of f(n): you exhibit these two constants c and n0, and it had better be the case that for all n at least n0, c times f(n) upper-bounds T(n). One way to think
about it: if you're trying to establish that something is big-Oh of some function, it's like you're playing a game against an opponent. You want to prove that this inequality holds, and your opponent wants to show that it doesn't hold for sufficiently large n. You have to go first: your job is to pick a strategy in the form of a constant c and a constant n0, and your opponent is then allowed to pick any number n larger than n0. The function is big-Oh of f(n) if and only if you have a winning strategy in this game, that is, if you can commit up front to constants c and n0 so that no matter how big an n your opponent picks, this inequality holds. If you have no winning strategy, then it's not big-Oh of f(n): no matter what c and n0 you choose, your opponent can always flip the inequality by choosing a suitably large value of n. I want to emphasize one
last thing, which is what I mean by constants: I mean they are independent of n. So when you apply this definition and you choose your constants c and n0, it had better be that n does not appear anywhere; c should just be something like a thousand or a million, some constant independent of n. So those are a bunch of ways to think about big-Oh notation. In English, you want T(n) to be bounded above by a constant multiple of f(n) for all sufficiently large n. I showed you how to translate that into mathematics, gave you a pictorial representation, and also gave you a sort of game-theoretic way to think about it. Now, let's move on to a video that explores a number of examples.
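
To tie the definition back to the merge sort bound discussed earlier, here is a small Python sanity check (not from the lecture; the witnesses c = 12 and n0 = 2 are one workable choice, since 6n <= 6n*log2(n) whenever n >= 2, and are not the only one) that T(n) = 6n*log2(n) + 6n satisfies T(n) <= c * n*log2(n) from n0 onward.

import math

def T(n):
    # The merge sort upper bound from the earlier lectures.
    return 6 * n * math.log2(n) + 6 * n

def c_times_f(n, c=12):
    # c * f(n) with f(n) = n * log2(n) and the chosen witness c = 12.
    return c * n * math.log2(n)

# Check the defining inequality T(n) <= c * f(n) at a spread of points n >= n0 = 2.
for n in [2, 3, 10, 100, 10**4, 10**6]:
    assert T(n) <= c_times_f(n)
print("T(n) <= 12 * n * log2(n) held at every sampled n >= 2")
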

Basic Examples
Having slogged through the formal definition of big O notation, I wanna
quickly turn to a couple of examples. Now, I wanna warn you up front,
these are pretty basic examples. They're not really gonna provide us
with any insight that we don't already have. But they serve as a sanity
check that big O notation is doing what it's intended to do, namely suppressing constant factors and lower-order terms. Obviously, these simple examples will also give us some facility with the definition.
So the first example's going to be to prove formally the following claim.
The claim states that if T(n) is some polynomial of degree "k", so namely
a<u>k n^k. Plus all the way up to a<u>1 N + a<u>0. For any integer
"k", positive</u></u></u> integer "k" and any coefficients a<u>i's
positive or negative. Then: T(n) is big</u> O of n^k. So this claim is a
mathematical statement and something we'll be able to prove. As far as,
you know, what this claim is saying, it's just saying big O notation really
does suppress constant factors and lower order terms. If you have a
polynomial then all you have to worry about is what is the highest power
in that polynomial and that dominates its growth as "n" goes to infinity.
So, recall how one goes about showing that one function is big O of
another. The whole key is to find this pair of constants, c and n<u>0,
where c quantifies the constant multiple</u> of the function you're trying
to prove big O of, and n<u>0 quantifies what you mean</u> by "for all
sufficiently large n." Now, for this proof, to keep things very simple to
follow, but admittedly a little mysterious, I'm just gonna pull these
constants, c and n<u>0, out of a hat. So, I'm not gonna tell you how I
derived them,</u> but it'll be easy to check that they work. So let's work
with the constants n<u>0</u> equal to one, so it's very simple choice of
n<u>0 and then "c" we are gonna pick to</u> be sum of the absolute
values of the coefficients. So the absolute value of "a<u>k"</u> plus the
absolute value of "a<u>(k-1)", and so on. Remember I didn't assume
that</u> the pol..., the original polynomial, had non-negative coefficients.
So I claim these constants work, in the sense that we'll be able to prove
to that, assert, you know, establish the definition of big O notation. What
does that mean? Well we need to show that for all "n" at least one
(cause remember we chose n<u>0 equal to</u> one), T(n) (this
polynomial up here) is bounded above by "c" times "n^k", where "c" is
the way we chose it here, underlined in red. So let's just check why this
is true. So, for every positive integer "n" at least one, what do we need to
prove? We need to prove T(n) is upper bounded by something else. So
we're gonna start on the left hand side with T(n). And now we need a
sequence of upper bounds terminating with "c" times "n^k" (our choice of
c underlined in red). So T(n) is given as equal to this polynomial
underlined in green. So what happens when we replace each of the
coefficients with the absolute value of that coefficient? Well, you take the
absolute value of a number, either it stays the same as it was before, or
it flips from negative to positive. Now, "n" here, we know is at least one.
So if any coefficient flips from negative to positive, then the overall
number only goes up. So if we apply the absolute value of each of the
coefficients we get an only bigger number. So T(n) is bounded above by
the new polynomial where the coefficients are the absolute values of
those that we had before. So why was that a useful step? Well now what
we can do is we can play the same trick but with "n". So it's sort of
annoying how right now we have these different powers of "n". It would
be much nicer if we just had a common power of "n", so let's just replace
all of these different "n"s by "n^k", the biggest power of "n" that shows up
anywhere. So if you replace each of these lower powers of "n" with the
higher power "n^k", that number only goes up. Now, the coefficients are
all non negative so the overall number only goes up. So this is bounded
above by "the absolute value of a<u>k" "n^k"</u> ...up to "absolute
value of a<u>1" "n^k" ...plus "a<u>0" "n^k".</u></u> I'm using here that
"n" is at least one, so higher powers of "n" are only bigger. And now
you'll notice this, by our choice of "c" underlined in red, this is exactly
equal to "c" times "n^k". And that's what we have to prove. We have to
prove that T(n) is at most "c" times "n^k", given our choice of "c" for
every "n" at least one. And we just proved that, so, end of proof. Now
there remains the question of how did I know what the correct, what a
workable value of "c" and "n<u>0"</u> were? And if you yourself want to
prove that something is big O of something else, usually what you do is
you reverse engineer constants that work. So you would go through a
proof like this with a generic value of "c" and "n<u>0" and then</u> you'd
say, "Ahh, well if only I choose "c" in this way, I can push the proof
through." And that tells you what "c" you should use. If you look at the
optional video on further examples of asymptotic notation, you'll see
some examples where we derive the constants via this reverse
engineering method. But now let's turn to a second example, or really I
should say, a non-example. So what we're going to prove now is that
something is not big O of something else. So I claim that for every "k" at
least 1, "n^k" is not O(n^(k-1)). And again, this is something you would
certainly hope would be true. If this was false, there'd be something
wrong with our definition of big O notation and so really this is just to get
further comfort with the definition, how to prove something is not big O of
something else, and to verify that indeed you don't have any collapse of distinct powers of polynomials, which would be a bad thing. So how
would we prove that something is not big O of something else? The
most frequently useful proof method is gonna be by contradiction. So, remember, in a proof by contradiction you assume that what you're trying to establish is actually false, and, from that, you do a sequence of logical
steps, culminating in something which is just patently false, which
contradicts basic axioms of mathematics, or of arithmetic. So, suppose,
in fact, n^k was big O of n^(k-1), so that's assuming the opposite of what
we're trying to prove. What would that mean? Well, we just referred to
the definition of Big O notation. If in fact "n^k" hypothetically were Big O
of n^(k-1), then by definition there would be two constants, a winning strategy if you like, "c" and "n_0", such that for all sufficiently large
"n", we have a constant multiple "c" times "n^(k-1)" upper bounding
"n^k". So from this, we need to derive something which is patently false
that will complete the proof. And the way, the easiest way to do that is to
cancel "n^(k-1)" from both sides of this inequality. And remember since
"n" is at least one and "k" is at least one, it's legitimate to cancel this
"n^(k-1)" from both sides. And when we do that we get the assertion that
"n" is at most some constant "c" for all "n" at least "n<u>0". And this
now</u> is a patently false statement. It is not the case that all positive
integers are bounded above by a constant "c". In particular, "c+1", or the
integer right above that, is not bigger than "c". So that provides the
contradiction that shows that our original assumption that "n^k" is big O
of "n^(k-1)" is false. And that proves the claim. "n^k" is not big O of "n^(k-
1)", for every value of "k". So different powers of polynomials do not
collapse. They really are distinct, with respect to big O notation.
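To restate the contradiction compactly: if there were constants c and n_0 with

\[
n^k \;\le\; c\,n^{k-1} \qquad \text{for all } n \ge n_0,
\]

then dividing both sides by n^(k-1) (which is positive for every positive integer n) would give n \le c for all n \ge n_0, and that already fails at n = \max(n_0, \lceil c \rceil + 1).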

Big Omega and Theta - Part 1


In this lecture, we'll continue our formal treatment of asymptotic notation.
We've already discussed big O notation, which is by far the most
important and ubiquitous concept that's part of asymptotic notation, but,
for completeness, I do want to tell you about a couple of close relatives
of big O, namely omega and theta. If big O is analogous to less than or
equal to, then omega and theta are analogous to greater than or equal
to, and equal to, respectively. But let's treat them a little more precisely.
The formal definition of omega notation closely mirrors that of big O
notation. We say that one function, T of N, is big omega of another
function, F of N, if eventually, that is for sufficiently large N, it's lower
bounded by a constant multiple of F of N. And we quantify the ideas of a
constant multiple and eventually in exactly the same way as before,
namely via explicitly giving two constants, C and N naught, such that T
of N is bounded below by C times F of N for all sufficiently large N. That
is, for all N at least N naught. There's a picture just like there was for big
O notation. Perhaps we have a function T of N which looks something
like this green curve. And then we have another function F of N which is
above T of N. But then when we multiply F of N by one half, we get
something that, eventually, is always below T of N. So in this picture, this
is an example where T of N is indeed big Omega of F of N. As far as
what the constants are, well, the multiple that we use, C, is obviously just
one half. That's what we're multiplying F of N by. And as before, N
naught is the crossing point between the two functions. So, N naught is
the point after which C times F of N always lies below T of N
forevermore. So that's Big Omega. Theta notation is the equivalent of
equals, and so it just means that the function is both Big O of F of N and
Omega of F of N. An equivalent way to think about this is that,
eventually, T of N is sandwiched between two different constant
multiples of F of N. I'll write that down, and I'll leave it to you to verify that
the two notions are equivalent. That is, one implies the other and vice
versa. So what do I mean by T of N is eventually sandwiched between
two multiples of F of N? Well, I just mean we choose two constants. A
small one, C1, and a big constant, C2, and for all N at least N naught, T
of N lies between those two constant multiples.
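For reference, here are the two definitions just described, written in the same quantified style as the definition of big O (this is simply the text above, in symbols):

\[
T(n) = \Omega(f(n)) \;\Longleftrightarrow\; \exists\, c > 0,\ n_0 \ \text{such that}\ T(n) \ge c\, f(n)\ \text{for all } n \ge n_0,
\]
\[
T(n) = \Theta(f(n)) \;\Longleftrightarrow\; \exists\, c_1, c_2 > 0,\ n_0 \ \text{such that}\ c_1 f(n) \le T(n) \le c_2 f(n)\ \text{for all } n \ge n_0.
\]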
One way that algorithm designers can be quite sloppy is by using O notation instead of theta notation. That's a common convention, and I will follow it often in this class. Let me give you an example. Suppose we have a
subroutine, which does a linear scan through an array of length N. It
looks at each entry in the array and does a constant amount of work with
each entry. So the merge subroutine would be more or less an example
of a subroutine of that type. So even though the running time of such an
algorithm, a subroutine, is patently theta of N, it does constant work for
each of N entries, so it's exactly theta of N, we'll often just say that it has
running time O of N. We won't bother to make the stronger statement
that it's theta of N. The reason we do that is because you know, as
algorithm designers, what we really care about is upper bounds. We
want guarantees on how long our algorithms are going to run, so
naturally we focus on the upper bounds and not so much on the lower
bound side. So don't get confused. Once in a while, there will be a quantity which is obviously theta of F of N, and I'll just make the weaker
statement that it's O of F of N. The next quiz is meant to check your
understanding of these three concepts: Big O, Big Omega, and Big
Theta notation. So the final three responses are all correct, and I hope
the high level intuition for why is fairly clear. T of N is definitely a
quadratic function. We know that the linear term doesn't matter much as
it grows, as N grows large. So since it has quadratic growth, then the
third response should be correct. It's theta of N squared. And it is omega
of N. So Omega of N is not a very good lower bound on the asymptotic
rate of growth of T of N, but it is legitimate. Indeed, as a quadratic
growing function, it grows at least as fast as a linear function. So it's
Omega of N. For the same reason, big O of N cubed, it's not a very good
upper bound, but it is a legitimate one, it is correct. The rate of growth of
T of N is at most cubic. In fact, it's at most quadratic, but it is indeed, at
most, cubic. Now if you wanted to prove these three statements formally,
you would just exhibit the appropriate constants. So for proving that it's
big Omega of N, you could take N naught equal to one, and C equal to
one-half. For the final statement, again you could take N naught equal to
one. And C equal to say four. And to prove that it's theta of N squared
you could do something similar just using the two constants combined.
So N naught would be one. You could take C1 to be one-half and C2 to
be four. And I'll leave it to you to verify that the formal definitions of big
omega, big theta, and big O would be satisfied with these choices of
constants. One final piece of asymptotic notation: we're not going to
use this much, but you do see it from time to time so I wanted to mention
it briefly. This is called little O notation, in contrast to big O notation. So
while big O notation informally is a less than or equal to type relation,
little O is a strictly less than relation. So intuitively it means that one
function is growing strictly less quickly than another. So formally we say
that a function T of N is little O of F of N, if and only if for all constants C,
there is a constant N naught beyond which T of N is upper bounded by
this constant multiple C times F of N. So the difference between this
definition and that of Big-O notation, is that, to prove that one function is
big O of another, we only have to exhibit one measly constant C, such that C times F of N is, eventually, an upper bound for T of N. By contrast, to
prove that something is little O of another function, we have to prove
something quite a bit stronger. We have to prove that, for every single
constant C, no matter how small, for every C, there exists some large
enough N naught beyond which T of N is bounded above by C times F
of N. So, for those of you looking for a little more facility with little O
notation, I'll leave it as an exercise to prove that, as you'd expect for all
polynomial powers K, in fact, N to the K minus one is little O of N to the
K. There is an analogous notion of little omega notation expressing that
one function grows strictly quicker than another. But that one you don't
see very often, and I'm not gonna say anything more about it. So let me
conclude this video with a quote from an article, back from 1976, about
my colleague Don Knuth, widely regarded as the grandfather of the
formal analysis of algorithms. And it's rare that you can pinpoint why and
where some kind of notation became universally adopted in the field. In
the case of asymptotic notation, indeed, it's very clear where it came
from. The notation was not invented by algorithm designers or computer
scientists. It's been in use in number theory since the nineteenth century.
But it was Don Knuth in '76 that proposed that this become the standard
language for discussing rate of growth, and in particular, for the running
time of algorithms. So in particular, he says in this article, "On the basis
of the issues discussed here, I propose that members of SIGACT," this
is the special interest group of the ACM, which is concerned with
theoretical computer science, in particular the analysis of algorithms. So,
"I propose that the members of SIGACT and editors in computer science
and mathematics journals adopt the O, omega, and theta notations as
defined above unless a better alternative can be found reasonably soon.
So clearly a better alternative was not found and ever since that time this
has been the standard way of discussing the rate of growth of running
times of algorithms and that's what we'll be using here.
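For reference, here is the little-o definition in symbols, together with one way (not necessarily the only way) to do the exercise mentioned above:

\[
T(n) = o(f(n)) \;\Longleftrightarrow\; \forall\, c > 0\ \ \exists\, n_0 \ \text{such that}\ T(n) \le c\, f(n)\ \text{for all } n \ge n_0.
\]

For the exercise: given any c > 0, take n_0 = \lceil 1/c \rceil; then every n \ge n_0 satisfies c\,n \ge 1, hence n^{k-1} \le c\,n^k, which shows that n^(k-1) is indeed little o of n^k.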

Additional Examples

Chapter 3: DIVIDE & CONQUER ALGORITHMS

1. O(n log n) Algorithm for Counting Inversions I (13 min)


2. O(n log n) Algorithm for Counting Inversions II (17 min)
3. Strassen's Subcubic Matrix Multiplication Algorithm (22 min)
4. O(n log n) Algorithm for Closest Pair I [Advanced - Optional] (32 min)
5. O(n log n) Algorithm for Closest Pair II [Advanced - Optional] (19 min)

Homework

Optional Theory Problems


1. You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most n + log2 n - 2 comparisons.

2. You are given a unimodal array of n distinct elements, meaning that its entries are in increasing order up until its maximum element, after which its elements are in decreasing order. Give an algorithm to compute the maximum element that runs in O(log n) time. (One possible approach is sketched after this list.)

3. You are given a sorted (from smallest to largest) array A of n distinct integers, which can be positive, negative, or zero. You want to decide whether or not there is an index i such that A[i] = i. Design the fastest algorithm that you can for solving this problem.
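As referenced in problem 2, here is a minimal Python sketch of one possible O(log n) approach, a binary search that compares a middle entry with its right neighbor. The function name unimodal_max is made up for illustration, and this is one way to do it rather than an official solution.

def unimodal_max(a):
    # One possible approach to problem 2 (a sketch, not an official solution).
    # Assumes a is unimodal with distinct entries: strictly increasing up to
    # its maximum, then strictly decreasing. Runs in O(log n) time.
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < a[mid + 1]:
            lo = mid + 1   # still on the increasing side; the maximum lies to the right
        else:
            hi = mid       # on the decreasing side (or at the peak itself)
    return a[lo]

For example, unimodal_max([1, 3, 8, 12, 9, 4, 2]) returns 12.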

Chapter 4: THE MASTER METHOD

Overview
THE MASTER METHOD - These lectures cover a "black-box" method
for solving recurrences. You can then immediately determine the running
time of most of the divide-and-conquer algorithms that you'll ever see!
(Including Karatsuba's integer multiplication algorithm and Strassen's
matrix multiplication algorithm from Week 1.) The proof is a nice
generalization of the recursion tree method that we used to analyze
MergeSort. Ever wonder about the mysterious three cases of the Master
Method? Watch these videos and hopefully all will become clear.
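For reference, the version of the master method usually stated for recurrences of this shape (at most a subproblems of size n/b, plus O(n^d) work outside the recursive calls) is the following; the lectures state and prove it carefully, so treat this only as a preview, not the official statement:

\[
T(n) \le a\,T(n/b) + O(n^d) \;\Longrightarrow\;
T(n) =
\begin{cases}
O(n^d \log n) & \text{if } a = b^d \\
O(n^d) & \text{if } a < b^d \\
O\!\big(n^{\log_b a}\big) & \text{if } a > b^d
\end{cases}
\]

For example, MergeSort has a = b = 2 and d = 1, so a = b^d and T(n) = O(n log n); Strassen's algorithm has a = 7, b = 2, d = 2, so a > b^d and T(n) = O(n^(log2 7)).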

QUICKSORT - THE ALGORITHM - One of the greatest algorithms ever, and our first example of a randomized algorithm. These lectures go over
the pseudocode --- the high-level approach, how to partition an array
around a pivot element in linear time with minimal extra storage, and the
ramifications of different pivot choices --- and explain how the algorithm
works.
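To give a taste of the partitioning step mentioned above, here is a minimal Python sketch of one standard in-place scheme (pivot kept in the first position, a single left-to-right scan); the lectures' pseudocode may differ in its details, and a production version would choose the pivot randomly rather than always using the first element.

def partition(a, l, r):
    # Partition a[l..r] (inclusive) in place around the pivot a[l], in linear
    # time and with O(1) extra storage. Returns the pivot's final index.
    # A sketch of one standard scheme; details may differ from the lectures.
    p = a[l]
    i = l + 1                        # boundary: a[l+1 .. i-1] holds entries < p
    for j in range(l + 1, r + 1):    # j scans the not-yet-examined entries
        if a[j] < p:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[l], a[i - 1] = a[i - 1], a[l]  # move the pivot into its final position
    return i - 1

def quicksort(a, l=0, r=None):
    # Sorts a[l..r] in place. A randomized version would first swap a uniformly
    # random entry of a[l..r] into position l, then partition as above.
    if r is None:
        r = len(a) - 1
    if l >= r:
        return
    split = partition(a, l, r)
    quicksort(a, l, split - 1)
    quicksort(a, split + 1, r)

For example, calling quicksort(xs) on xs = [3, 8, 2, 5, 1, 4, 7, 6] leaves xs sorted in place.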
QUICKSORT - THE ANALYSIS - These lectures are optional, but I
strongly encourage you to watch them if you have time. They prove that
randomized QuickSort (i.e., with random pivot choices) runs in O(n log n)
time on average. The analysis is as elegant as the algorithm itself, and is
based on a "decomposition principle" that is often useful in the analysis
of randomized algorithms. Note that there are some accompanying lecture notes for this part (available for download underneath each
video).
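To give a flavor of that decomposition principle (the lectures derive this carefully; what follows is only the rough shape of the calculation, assuming the pivot is chosen uniformly at random in every recursive call): let z_1 < z_2 < ... < z_n denote the input elements in sorted order, and let X_{ij} be 1 if z_i and z_j ever get compared and 0 otherwise. Then the total number of comparisons is C = \sum_{i<j} X_{ij}, and by linearity of expectation

\[
\mathbf{E}[C] \;=\; \sum_{i<j} \Pr[z_i \text{ and } z_j \text{ are compared}]
\;=\; \sum_{i<j} \frac{2}{j-i+1}
\;\le\; 2n \sum_{k=2}^{n} \frac{1}{k}
\;=\; O(n \log n).
\]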

PROBABILITY REVIEW - The first of these optional lecture videos reviews the concepts from discrete probability that are necessary for the
QuickSort analysis --- sample spaces, events, random variables,
expectation, and linearity of expectation. The second video covers just
two topics, although quite tricky ones! (Namely, conditional probability
and independence.) You need to review this material (via this video or
some other source, as you wish) before studying the analysis of the
randomized contraction algorithm in Week 3.

HOMEWORK: Problem Set #2 has five questions that should give you
practice with the Master Method and help you understand QuickSort
more deeply. Programming Assignment #2 asks you to implement
QuickSort and compute the number of comparisons that it makes for
three different pivot rules.

1. Motivation (8 min)
2. Formal Statement (10 min)
3. Examples (13 min)
4. Proof I (10 min)
5. Interpretation of the 3 Cases (11 min)
6. Proof II (16 min)

Chapter 5: QUICKSORT - ALGORITHM

1. Quicksort: Overview (12 min)


2. Partitioning Around a Pivot (25 min)
3. Correctness of Quicksort [Review - Optional] (11 min)
4. Choosing a Good Pivot (22 min)

Chapter 6: QUICKSORT - ANALYSIS

1. Analysis I: A Decomposition Principle [Advanced - Optional] (22 min)


2. Analysis II: The Key Insight [Advanced - Optional] (12 min)
3. Analysis III: Final Calculations [Advanced - Optional] (9 min)
