Math1059 - Calculus

Table of Contents
1. Functions
2. Limits
3. Continuity
4. Differentiation
5. Curve Sketching
6. Integration
7. Methods of Integration
8. Indeterminate Forms and Improper Integrals
9. Differential Equations
10. Appendix: Mathematical Induction
1. Functions
Getting started.
Definition 1.1. A function is a map f : X −→ Y between sets X and Y with
the property that f sends each element of X to a unique element of Y . For
x ∈ X, write f(x) for the corresponding element in Y .
What this means is that f cannot send x to both y and y′ in Y if y ≠ y′.

To give examples, we first set some notation. Let R be the set of real numbers.
Let R+ be the set of non-negative real numbers, that is those real numbers x
such that x ≥ 0.
Example 1.2. (i) Define f : R −→ R by f(x) = x.
(ii) Define g : R −→ R+ by g(x) = x^2.
(iii) Define h : R −→ R by h(x) = 1/(1 + x^2).
Examining the definition of a function, let’s consider what a function *does not* do.
• A function does not have to send each x ∈ X to a different y ∈ Y .
For example, the function g(x) = x^2 sends both 2 and −2 to 4.
• A function does not have to hit every y ∈ Y . For example, the function
h(x) = 1/(1 + x^2) has the property that 0 < h(x) ≤ 1 for every x ∈ R.
These examples show that it’s good to get a grip on the set from which a
function starts and the set where it sends elements to.
Definition 1.3. Let f : X −→ Y be a function. The set X is called the
domain of the function f . The range of f is the smallest subset R of Y
containing all the elements f (x). That is,
R = {f (x) | x ∈ X}.
More notation. For a < b in R, write [a, b] for the closed interval from a to b,
write (a, b) for the open interval from a to b, and write [a, b) and (a, b] for the
half-open intervals from a to b. That is:
• [a, b] = {x ∈ R | a ≤ x ≤ b};
• (a, b) = {x ∈ R | a < x < b};
• [a, b) = {x ∈ R | a ≤ x < b};
• (a, b] = {x ∈ R | a < x ≤ b}.
Example 1.4. Return to Example 1.2.
(i) The domain of f is R. The range is R.
(ii) The domain of g is R. The range is R+ .
(iii) The domain of h is R. The range is (0, 1] because 0 < h(x) ≤ 1 for every
x ∈ R, and any y ∈ (0, 1] can be written as y = 1/(1 + x^2) for x = √((1 − y)/y).
Example 1.5. Let f(x) = 1/x. What are the domain and range of f ?
Solution 1.6. The domain is not all of R since 1/0 makes no sense. Otherwise,
it does make sense to take the reciprocal of any non-zero x ∈ R. So the domain
of f is (−∞, 0) ∪ (0, ∞). For the range, again, it makes no sense to say
that 0 is the reciprocal of any x ∈ R; that is, 0 ≠ 1/x for any x ∈ R. So the
range R of f satisfies R ⊆ (−∞, 0) ∪ (0, ∞). On the other hand, if y ∈ (−∞, 0) ∪ (0, ∞) then take
x = 1/y to get f(x) = 1/x = 1/(1/y) = y. Therefore y is in the range of f . Hence
R = (−∞, 0) ∪ (0, ∞).
There are special functions with extra properties.
There are special functions with extra properties.

Definition 1.7. Let f : X −→ Y be a function. The function f is injective if
each y in the range of f is hit by precisely one x ∈ X. That is, if f(x) = f(x′)
then x = x′.
Definition 1.8. Let f : X −→ Y be a function. The function f is surjective
if each y ∈ Y is hit by an x ∈ X. That is, if the range R of f equals Y .
Definition 1.9. Let f : X −→ Y be a function. The function f is bijective if
it is both injective and surjective.
Example 1.10. Let’s return once more to Example 1.2.
(i) f : R −→ R defined by f(x) = x: This function is bijective. It is injective
because if f(x) = f(x′) then, by definition of f , f(x) = x and f(x′) = x′, so
we obtain x = x′. It is surjective because if y ∈ R then f(y) = y so y is in the
range of f .
(ii) g : R −→ R+ given by g(x) = x^2: This function is not injective but it
is surjective. It is not injective because, for example, g(2) = g(−2) = 4
but 2 ≠ −2. It is surjective because if y ∈ R+ then taking x = √y gives
g(x) = x^2 = (√y)^2 = y. Therefore y is in the range of g.
(iii) h : R −→ R given by h(x) = 1/(1 + x^2): This function is neither injective nor
surjective. It is not injective because, for example, h(1) = h(−1) = 1/2 but
1 ≠ −1. It is not surjective because we have already seen that the range of h
is (0, 1] and (0, 1] ≠ R.
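The definitions of injective and surjective can be tested mechanically when the sets are finite. The following is a minimal sketch, assuming small finite stand-ins for R and R+ (the helper names is_injective and is_surjective and the sample set X are illustrative assumptions, not part of the notes); it says nothing about the real-valued functions themselves beyond illustrating the definitions.

```python
def is_injective(f, domain):
    """True if no two distinct domain elements share an image."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

def is_surjective(f, domain, codomain):
    """True if the set of images equals the whole codomain."""
    return {f(x) for x in domain} == set(codomain)

# Hypothetical finite stand-in for R (an assumption of this sketch).
X = range(-3, 4)                  # {-3, ..., 3}
identity = lambda x: x
square = lambda x: x * x
squares = {x * x for x in X}      # target set chosen to equal the range {0, 1, 4, 9}

print(is_injective(identity, X), is_surjective(identity, X, X))
print(is_injective(square, X), is_surjective(square, X, squares))
```

On this sample the squaring map fails injectivity for the same reason as in Example 1.10 (ii): 2 and −2 have the same image.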
More examples of functions. It’s useful to build up a kitbag of functions

that will appear again and again as examples throughout the module. These
are functions that you should be familiar with from school and we usually
won’t go through explicit definitions.
The absolute value function. Write |x| for the absolute value of a real
number x. That is,
|x| = x if x ≥ 0,
|x| = −x if x < 0.
The absolute value function is f : R −→ R+ where f(x) = |x|. You
should check that the domain of f is R, the range is R+ , f is not injective,
and f is surjective.
The sine and cosine functions. First consider f : R −→ R defined by
f(x) = sin x. Observe that the domain of f is R; since −1 ≤ sin x ≤ 1 for
all x, the range of f is [−1, 1]; f is not injective since f(x + 2π) = f(x); and f
is not surjective since the range is not equal to R.
On the other hand, we could shrink the target set to the range by considering f′ : R −→ [−1, 1]
defined by f′(x) = sin x. Now f′ is surjective.
Further, we could restrict the domain by considering f″ : [−π/2, π/2] −→ [−1, 1]
defined by f″(x) = sin x. Now f″ is both injective and surjective, that is, f″
is bijective.
Similarly, g : [0, π] −→ [−1, 1] defined by g(x) = cos x is bijective.
More trig functions. There are four more trig functions:
tan x = sin x / cos x,   sec x = 1 / cos x,   csc x = 1 / sin x,   cot x = cos x / sin x.
To analyse these, we need a bit more notation. Let Z be the integers. For a
set X and a subset A ⊂ X, write X − A for the complement of A in X. That
is, X − A consists of all the elements in X that are not in A.
(i) tan x = sin x / cos x. Observe that tan x is not defined when cos x = 0, that is,
when x = kπ + π/2 for some integer k. Let A = {kπ + π/2 | k ∈ Z}. Then
we can define a function f : R − A −→ R by f(x) = tan x. Observe that f is
a surjection but f(x + 2π) = f(x) so f is not injective.
The restricted function f′ : (−π/2, π/2) −→ R defined by f′(x) = tan x is a bijection.
(ii) sec x = 1 / cos x. As with tan x, sec x is not defined on A = {kπ + π/2 | k ∈ Z}.
So there is a function g : R − A −→ R defined by g(x) = sec x. Observe that g is
neither a surjection nor an injection. It is not a surjection since −1 ≤ cos x ≤ 1
implies that sec x ≥ 1 or sec x ≤ −1. In particular, no point of the open interval (−1, 1)
is in the range of sec x. It is not an injection since sec(x + 2π) = sec x.
(iii) csc x = 1 / sin x. Now csc x is not defined when sin x = 0, that is, when
x = kπ for some integer k. Let B = {kπ | k ∈ Z}. Then we can define
a function h : R − B −→ R by h(x) = csc x. Observe that h is neither a
surjection nor an injection, for reasons similar to those for sec x.
(iv) cot x = cos x / sin x. As for csc x, cot x is not defined on B = {kπ | k ∈ Z}. We
can define a function ℓ : R − B −→ R by ℓ(x) = cot x. Observe that ℓ is not an
injection since cot(x + 2π) = cot x but ℓ is a surjection.
Inverse Functions. Suppose that there are functions f : X −→ Y and

g : Y −→ Z. Then we may compose to produce a new function
g ◦ f : X −→ Z
defined by (g ◦ f )(x) = g(f (x)). That is, first send x to f (x) in Y and then
send f (x) to g(f (x)) in Z.
Example 1.11. Define f : R −→ R by f(x) = 2x + 1 and define g : R −→ R+
by g(x) = x^2. Then
(g ◦ f)(x) = g(f(x)) = g(2x + 1) = (2x + 1)^2 = 4x^2 + 4x + 1.
Observe that composition in the other order, f ◦ g, does make sense since
the range of g is contained in the domain of f . In this case,
(f ◦ g)(x) = f(g(x)) = f(x^2) = 2(x^2) + 1 = 2x^2 + 1.
Remark 1.12. Note in Example 1.11 that g ◦ f and f ◦ g are NOT equal.
That is, thinking of composition as an “operation” analogous to addition or
multiplication, composition is not commutative. You’ll see this sort of thing
again more abstractly when you study groups in Linear Algebra II.
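The failure of commutativity in Remark 1.12 is easy to see numerically. The sketch below uses f(x) = 2x + 1 and g(x) = x^2, the formulas used in the computation of Example 1.11; the helper name compose is an assumption introduced only for illustration.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

f = lambda x: 2 * x + 1   # the formula used in the computation of Example 1.11
g = lambda x: x ** 2

g_after_f = compose(g, f)  # (g o f)(x) = (2x + 1)^2
f_after_g = compose(f, g)  # (f o g)(x) = 2x^2 + 1

for x in (-1, 0, 1, 2):
    print(x, g_after_f(x), f_after_g(x))
```

The two compositions happen to agree at x = 0 but differ at x = 1, which is enough to show that they are different functions.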
Now suppose that f : X −→ Y is a bijection. Then we can define an inverse
function
f^{-1} : Y −→ X
as follows. Suppose that y = f(x). Then set f^{-1}(y) = x. Observe that this is
the same as saying that (f^{-1} ◦ f)(x) = x. Observe also that composition in
the other order gives (f ◦ f^{-1})(y) = f(f^{-1}(y)) = f(x) = y.
More formally, we have the following definition. For a set X, the identity
function I : X −→ X is the function defined by I(x) = x.
Definition 1.13. Let f : X −→ Y be a function. Then f has an inverse
function if there is a function g : Y −→ X such that g ◦ f equals the identity
map on X and f ◦ g equals the identity map on Y .
This definition does not explicitly say that f has to be a bijection, but that is
inherent from the compositions g ◦ f and f ◦ g being the identity maps on X
and Y respectively. (Exercise: check this!)
Example 1.14. Let f : R −→ R be defined by f (x) = 2x + 1. Then f is a
bijection. What is the inverse of f ?
Solution 1.15. To find f^{-1}, let y = f(x) and solve y = 2x + 1 for x. This
gives x = (y − 1)/2. Let g(x) = (x − 1)/2. Let’s check that g is an inverse of f by
showing that g ◦ f and f ◦ g are the identity maps on R. We have
(g ◦ f)(x) = g(f(x)) = g(2x + 1) = ((2x + 1) − 1)/2 = 2x/2 = x
(f ◦ g)(x) = f(g(x)) = f((x − 1)/2) = 2((x − 1)/2) + 1 = (x − 1) + 1 = x.
Thus g is the inverse of f so we can take f^{-1} = g. That is, f^{-1}(x) = (x − 1)/2.
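The two identity checks in Solution 1.15 can also be spot-checked on sample inputs. This is a minimal sketch, assuming exact rational arithmetic via Python's fractions module so that no rounding interferes; the name f_inverse and the sample points are illustrative choices, and a finite check is of course not a proof.

```python
from fractions import Fraction

def f(x):
    return 2 * x + 1

def f_inverse(y):
    # Candidate inverse from Solution 1.15: solve y = 2x + 1 for x.
    return (y - 1) / Fraction(2)

# Sample rational points -2, -5/3, ..., 2 (an arbitrary choice for this sketch).
samples = [Fraction(n, 3) for n in range(-6, 7)]
round_trips_ok = all(
    f_inverse(f(x)) == x and f(f_inverse(x)) == x for x in samples
)
print(round_trips_ok)
```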
Example 1.16. Let f : R −→ R+ be defined by f(x) = x^2. Find a restricted
domain on which f is a bijection and then, on this restricted domain, find f^{-1}.
Solution 1.17. Observe that f is not a bijection since f(−x) = (−x)^2 = x^2 =
f(x). So to obtain a bijection, restrict f to R+ . That is, let f′ : R+ −→ R+
be defined by f′(x) = x^2. Then f′ is a bijection, and so has an inverse.
To find the inverse of f′, let y = f′(x) and solve y = x^2 for x. This gives x = √y (taking
the non-negative square root). Let g(x) = √x. Check that g is the inverse of f′:
(g ◦ f′)(x) = g(f′(x)) = g(x^2) = √(x^2) = x    (since x ≥ 0)
(f′ ◦ g)(x) = f′(g(x)) = f′(√x) = (√x)^2 = x.
Thus g is the inverse of f′ so we can take (f′)^{-1}(x) = √x.
One useful observation is that the graph of f^{-1} is the mirror image of the
graph of f along the line y = x. That is, if y = f(x) then the point (x, y) is on
the graph of f . Now apply f^{-1} to y = f(x) to get f^{-1}(y) = f^{-1}(f(x)) = x.
This implies that the point (y, x) is on the graph of f^{-1}. You can get the
point (y, x) from the point (x, y) by reflecting along the line y = x, that is, by
swapping coordinates. Doing this for every such point on the graph of f gives
the graph of f^{-1}.
Inverse Trig Functions. Special cases of inverse functions occur for trig
functions, but caution needs to be exercised to get the domains and ranges
correct.
Let f : R −→ R be defined by f(x) = sin x. Then f is not a bijection so it has
no inverse. But we saw that f becomes a bijection if the domain and range
are appropriately restricted. Starting again, let f : [−π/2, π/2] −→ [−1, 1] be the
function defined by f(x) = sin x. Then f is a bijection, so f^{-1} exists. Let
sin^{-1} x be the inverse function of sin x for this domain and range. So sin^{-1} x
is a function
sin^{-1} x : [−1, 1] −→ [−π/2, π/2]
with domain [−1, 1] and range [−π/2, π/2]. The graph of sin^{-1} x is the mirror
image of that for sin x, with the mirror placed along the line y = x.
Similarly, the function g : [0, π] −→ [−1, 1] defined by g(x) = cos x is a bijection
and so has an inverse
cos^{-1} x : [−1, 1] −→ [0, π].
Its graph is the mirror image of that for cos x, with the mirror placed along
the line y = x.
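The caution about domains can be seen with Python's math.asin and math.acos, which return values in [−π/2, π/2] and [0, π] respectively. The sketch below is only an illustration: it shows that sin^{-1}(sin x) returns x when x lies in the restricted domain and something else otherwise. The helper in_principal_range is an assumed name.

```python
import math

def in_principal_range(x):
    """True if x lies in [-pi/2, pi/2], the restricted domain of sin."""
    return -math.pi / 2 <= x <= math.pi / 2

for x in (0.5, 1.0, 2.0, 3.0):
    back = math.asin(math.sin(x))
    # For x outside the restricted domain, back differs from x.
    print(x, round(back, 6), in_principal_range(x))
```

For x = 2.0, which is outside [−π/2, π/2], the round trip gives π − 2 rather than 2, whereas cos^{-1}(cos 2) does return 2 because 2 lies in [0, π].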
Exponential and logarithmic functions. Let a > 0 be a real number. The
exponential function with base a is the function f : R −→ R+ defined by
f(x) = a^x.
WARNING: The value “a” is fixed. For example, if a = 2 we’re looking at
f(x) = 2^x. The function inputs an x into the exponent.
Exponential functions satisfy important properties:
• a^0 = 1;
• a^{-x} = 1/a^x;
• a^x a^y = a^{x+y} for all x, y ∈ R;
• (a^x)^y = a^{xy} for all x, y ∈ R;
• (ab)^x = a^x b^x for all a, b > 0.
Provided a ≠ 1, the exponential function f(x) = a^x is injective with range (0, ∞), so it is
a bijection from R onto (0, ∞). Therefore it has an inverse function, which is called the
logarithm function.
Definition 1.18. The logarithm function with base a (where a > 0 and a ≠ 1) is the function
log_a : (0, ∞) −→ R
defined by the rule:
log_a(x) = y if and only if a^y = x.
In particular, the exponential and logarithm functions with base a compose
to give identity maps. That is, we have
a^{log_a(x)} = x for all x > 0 and log_a(a^x) = x for all x ∈ R.
Logarithm functions also satisfy important properties:
• log_a(1) = 0;
• log_a(1/x) = − log_a(x);
• log_a(xy) = log_a(x) + log_a(y);
• log_a(x^y) = y log_a(x);
• log_a(x) = log_b(x) / log_b(a).
The five rules for logarithms are derived from the five rules for exponentials
by using inverse functions. Here is an example.
Example 1.19. Show that log_a(xy) = log_a(x) + log_a(y).
Solution 1.20. Let c = log_a(xy) and d = log_a(x) + log_a(y). We want to show
that c = d. The strategy is to first show that a^c = a^d and then show that
c = d.
Taking exponentials using the base a gives a^c = a^{log_a(xy)} = xy. On the other
hand, consider the string of equalities
a^d = a^{log_a(x) + log_a(y)} = a^{log_a(x)} a^{log_a(y)} = xy.
From left to right, the first equality holds by definition of d, the second equality
holds from the property a^{s+t} = a^s a^t, and the third equality holds since
a^{log_a(x)} = x and a^{log_a(y)} = y. Thus we have a^c = a^d. Now apply the logarithm
function to get log_a(a^c) = log_a(a^d). But log_a(a^c) = c and log_a(a^d) = d, so
c = d as required.
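The logarithm rules can be sanity-checked numerically. The following sketch assumes floating-point arithmetic, so equalities are tested with math.isclose rather than exactly; the helper log_base (which itself uses the change-of-base rule with natural logarithms) and the sample values a = 2, x = 8, y = 32 are illustrative assumptions.

```python
import math

def log_base(a, x):
    """log_a(x), computed via the change-of-base rule with natural logs."""
    return math.log(x) / math.log(a)

a, x, y = 2.0, 8.0, 32.0
product_rule = math.isclose(log_base(a, x * y), log_base(a, x) + log_base(a, y))
power_rule = math.isclose(log_base(a, x ** 3), 3 * log_base(a, x))
round_trip = math.isclose(a ** log_base(a, x), x)
print(product_rule, power_rule, round_trip)
```

A numerical check like this only tests particular values; the argument in Solution 1.20 is what establishes the rule in general.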
2. Limits
Let’s consider two simple functions g, h : R −→ R defined by
g(x) = 2x + 1
and
h(x) = 2x + 1 if x ≠ 1,
h(x) = 1 if x = 1.
[Figure: graphs of g(x) and h(x).]
They are clearly different functions because they take different values at x = 1.
However, we’ll see that the concept of limits does not see this, while the
concept of continuity does.
The idea of a limit is as follows. Start with a function f : R −→ R. Fix
a point a in R. Suppose x is another point in R which can be moved; in
particular, we can let x get closer and closer to a. The lingo is to say that x
approaches a. As x approaches a, look at how f(x) changes.
Informally, we say that the function f approaches the limit L as x approaches a
if f(x) approaches L. If f is defined on either side of a, write
lim_{x→a} f(x) = L.
WARNING! A limit says nothing about what f (x) is doing at a. It is only

concerned with what happens near a.
For example, consider the functions g and h above, and take a = 1. Let x
approach 1. This could be from the left with x < 1 and x getting bigger as it
nears 1, or from the right with x > 1 and x getting smaller as it nears 1. For
all x near 1 but not equal to 1, we have g(x) = 2x + 1 so g(x) approaches 3
as x approaches 1. Thus
lim_{x→1} g(x) = 3.
On the other hand, exactly the same thing is happening with h. For all x
near 1 but not equal to 1, we have h(x) = 2x + 1 so h(x) approaches 3 as x
approaches 1. Thus
lim_{x→1} h(x) = 3.
In the first case, it happens to be the case that g(1) = 3, which matches the
limit, but in the second case h(1) = 1, which does not match the limit. There
is nothing inconsistent here if we remember the key fact that limits are only
concerned with what happens near a but not at a.
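A small table of values makes the "near a but not at a" point concrete. This is a minimal sketch using the functions g and h defined at the start of this section; the list offsets of distances from 1 is an arbitrary choice made for illustration.

```python
def g(x):
    return 2 * x + 1

def h(x):
    # Same as g except at the single point x = 1.
    return 1 if x == 1 else 2 * x + 1

offsets = [10.0 ** -k for k in range(1, 6)]   # 0.1, 0.01, ..., 0.00001
for d in offsets:
    # Values of h just to the left and just to the right of 1.
    print(d, h(1 - d), h(1 + d))
print(g(1), h(1))
```

The values of h on either side of 1 close in on 3 even though h(1) itself is 1, which is exactly the behaviour the limit records.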
Example 2.1. Find lim_{x→2} (3x − 1)/(x − 1).
Solution 2.2. First, we double-check the problem makes sense. The function
(3x − 1)/(x − 1) is not defined when x = 1. But as we are only concerned with what
happens near 2, this is not a problem as we can restrict the function to the
domain (1, ∞).
Now observe that the numerator approaches 5 as x approaches 2 and the
denominator approaches 1 as x approaches 2. So the function as a whole
approaches 5/1 = 5. That is,
lim_{x→2} (3x − 1)/(x − 1) = 5.
Example 2.3. Find lim_{x→1} (x^2 − 1)/(x − 1).
Solution 2.4. Note that the denominator is 0 at x = 1. On the one hand, as
the limit is not concerned with what happens at x = 1 but only near 1, this
is not a problem. On the other hand, a denominator of 0 is troubling.
But in this case, observe that the numerator is also 0 at x = 1. That is, 1 is a
root of the polynomial x^2 − 1 and we have x^2 − 1 = (x − 1)(x + 1). This lets
us cancel out the x − 1 factor to get (x^2 − 1)/(x − 1) = x + 1 for x ≠ 1. Now the limit is easy to
work out:
lim_{x→1} (x^2 − 1)/(x − 1) = lim_{x→1} (x + 1) = 2.
Sometimes limits don’t exist. For example, consider lim_{x→0} 1/x.
The function is not defined at x = 0 but it is defined at any other point near 0,
so it makes sense to ask for the limit. Let x approach 0 from the right, that is,
take x ∈ (0, 1) and let x get smaller. Then 1/x gets bigger. In fact, the closer x
gets to 0 the bigger 1/x gets. So there is no number L such that f(x) = 1/x
approaches L as x approaches 0 from the right. Therefore 1/x has no limit at
x = 0. Note also that the same argument works if we take x ∈ (−1, 0) and
let x approach 0. Then 1/x is an increasingly large negative number.
The argument in the last paragraph is somewhat intuitive and logically loose,
while mathematics likes proof and logical rigour. So having a sense for what
limits are now, let’s consider the real, formal definition of a limit.
Definition 2.5. Suppose that f is a function defined on the set (c, a) ∪ (a, b)
where c < a and b > a. Then
lim_{x→a} f(x) = L
if for every ε > 0 there exists a δ > 0 such that |f(x) − L| < ε whenever
0 < |x − a| < δ.
This is a mouthful. Bit by bit, here’s what it’s saying. First, the notation.
• The condition, “f is a function defined on the set (c, a) ∪ (a, b) where
c < a and b > a” means that f is defined near a but does not have to
be defined at a. The requirement 0 < |x − a| excludes the point x = a itself.
• ε and δ are real numbers which are usually thought of as very small,
but they don’t necessarily have to be.
• |f(x) − L| is the distance from f(x) to L and |x − a| is the distance
from x to a.
Now the content, thought of as a two-step procedure.
• Step 1: start with a given ε > 0. You have to find a δ > 0 so that
if x ≠ a and the distance from x to a is less than δ (i.e., 0 < |x − a| < δ) then the
distance from f(x) to L must be less than ε (i.e., |f(x) − L| < ε).
• Step 2: Make sure you can do Step 1 for every ε > 0.
The two steps are consistent with the intuitive idea of a limit. They say that
whenever x approaches a (the distance between x and a is small) then f (x)
approaches L (the distance between f (x) and L is small). However, they are
phrased in a way that makes them apt for studying when limits exist, and
especially for when limits do not exist.
Let’s revisit two of the earlier examples.
Example 2.6. Using the formal definition of the limit, show that
lim_{x→1} (2x + 1) = 3.
Solution 2.7. Let ε > 0. By the definition of the limit, we need to show that
there is a δ > 0 such that whenever 0 < |x − a| < δ we obtain |f(x) − L| < ε.
In this case, f(x) = 2x + 1, a = 1 and L = 3, so we need to show there is a
δ > 0 such that whenever 0 < |x − 1| < δ we obtain |(2x + 1) − 3| < ε. Note that
|(2x + 1) − 3| = |2x − 2| = 2|x − 1|. So what we really need to show is that
there is a δ > 0 such that whenever 0 < |x − 1| < δ we obtain 2|x − 1| < ε. This
is now easy. Take δ = ε/2. Then whenever 0 < |x − 1| < δ we obtain
|(2x + 1) − 3| = 2|x − 1| < 2 · (ε/2) = ε,
as required.
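The choice δ = ε/2 can be stress-tested by sampling. The sketch below is an illustration only, not a proof: it draws points x with 0 < |x − 1| < δ using exact fractions and checks that |(2x + 1) − 3| < ε each time. The function names delta_for and check, the sampling grid, and the fixed random seed are all assumptions of this sketch.

```python
from fractions import Fraction
import random

def delta_for(epsilon):
    """The choice made in Solution 2.7: delta = epsilon / 2."""
    return epsilon / 2

def check(epsilon, delta_rule=delta_for, trials=1000, seed=0):
    """Sample points with 0 < |x - 1| < delta and test |(2x + 1) - 3| < epsilon."""
    rng = random.Random(seed)
    delta = delta_rule(epsilon)
    for _ in range(trials):
        # x = 1 + t * delta with -1 < t < 1, so |x - 1| < delta exactly.
        t = Fraction(rng.randint(-999, 999), 1000)
        x = 1 + t * delta
        if x != 1 and not abs((2 * x + 1) - 3) < epsilon:
            return False
    return True

epsilons = [Fraction(1), Fraction(1, 10), Fraction(1, 1000)]
print(all(check(eps) for eps in epsilons))
```

Passing a careless rule such as delta_rule=lambda e: e makes the check fail, since then 2|x − 1| can exceed ε; this is why the factor of 2 has to be absorbed into δ.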
Challenge Problem 2.8. Using the formal definition of the limit, show that
lim_{x→0} 1/x does not exist.
This is not the formal argument, but let’s get a feeling for why lim_{x→0} 1/x does
not exist. Suppose the limit *did* exist and took the value L. Let ε > 0.
Then by the definition of the limit, we can find a δ > 0 such that whenever
0 < |x − a| < δ we have |f(x) − L| < ε. As a = 0 and f(x) = 1/x in this case, this
reads as whenever 0 < |x| < δ we have |1/x − L| < ε. But as |x| gets very small,
|1/x| gets very large, so no matter what value the fixed number L is, |1/x − L|
would also get very large. In particular, it would be larger than ε. This would
give a contradiction, meaning that the assumption that the limit did exist was
incorrect, and therefore the limit does not exist.
Properties of Limits. In order to calculate limits it’s good to establish

some fundamental properties. This also marks our first foray into proving
mathematical statements. To get started we need some inequalities among
absolute values, which are easy to verify if you think of |x − y| as the distance
from x to y.
Distance Inequalities: Let a, b, c be three real numbers. Then:

(i) |a − b| ≤ |a| + |b|;
(ii) (the triangle inequality) |a − b| ≤ |a − c| + |c − b|.
The first property says that, as you would hope, limits are unique.
Theorem 2.9 (Uniqueness of Limits). If lim_{x→a} f(x) = L and lim_{x→a} f(x) =
M then L = M .
Proof. The strategy of the proof is to show that |L − M | can be made as small
as we want. This shows that L = M since if L ≠ M then |L − M | = d for
some distance d > 0. But if we can make |L − M | as small as we want, we
could make it less than d, a contradiction.
Let ε > 0. We aim to show that |L − M | < ε. By the definition of a limit,
to say lim_{x→a} f(x) = L means that there is a δ1 > 0 such that whenever
0 < |x − a| < δ1 we have |f(x) − L| < ε/2. Similarly, to say lim_{x→a} f(x) = M means
that there is a δ2 > 0 such that whenever 0 < |x − a| < δ2 we have |f(x) − M | < ε/2.
Let δ be the minimum of δ1 and δ2 . Then when 0 < |x − a| < δ we have both
|f(x) − L| < ε/2 and |f(x) − M | < ε/2. By the triangle inequality, this implies
that
|L − M | ≤ |L − f(x)| + |f(x) − M | = |f(x) − L| + |f(x) − M | < ε/2 + ε/2 = ε.
Here is an example of Theorem 2.9 in action.
Example 2.10. Find lim_{x→0} x/|x|.
Solution 2.11. To deal with absolute value signs always break the function
into parts depending on when the absolute value sign turns a negative number
into a positive one. In this case, note that x/|x| makes no sense at x = 0 so the
domain is R − {0}. Now |x| is x if x > 0 and is −x if x < 0. So we obtain
x/|x| = 1 if x > 0,
x/|x| = −1 if x < 0.
As x approaches 0 from the right, x/|x| = 1 so it looks like lim_{x→0} x/|x| should
be 1. But as x approaches 0 from the left, x/|x| = −1, so it looks like lim_{x→0} x/|x|
should be −1. But if a limit exists then it is unique by Theorem 2.9. Therefore
lim_{x→0} x/|x| does not exist.
The next series of properties lets us do arithmetic with limits.

Theorem 2.12 (The Arithmetic of Limits). Suppose that lim_{x→a} f(x) = L
and lim_{x→a} g(x) = M . Then the following hold:
(a) lim_{x→a} (f(x) + g(x)) = L + M ;
(b) lim_{x→a} (f(x) − g(x)) = L − M ;
(c) lim_{x→a} f(x)g(x) = LM ;
(d) lim_{x→a} f(x)/g(x) = L/M provided M ≠ 0;
(e) lim_{x→a} cf(x) = cL for any constant c;
(f) if f(x) ≤ g(x) near a then L ≤ M .
Proof. We will only do part (a). The rest are left as exercises for you.
Let ε > 0. We wish to show that there is a δ > 0 such that
|(f(x) + g(x)) − (L + M )| < ε
whenever 0 < |x − a| < δ.
By the definition of the limit, to say that lim_{x→a} f(x) = L means that there
exists δ1 > 0 such that |f(x) − L| < ε/2 whenever 0 < |x − a| < δ1 . Similarly, to say
that lim_{x→a} g(x) = M means that there exists δ2 > 0 such that |g(x) − M | < ε/2
whenever 0 < |x − a| < δ2 . Let δ be the minimum of δ1 and δ2 . Then whenever
0 < |x − a| < δ we have both |f(x) − L| < ε/2 and |g(x) − M | < ε/2. Also, observe
that (f(x) + g(x)) − (L + M ) = (f(x) − L) + (g(x) − M ). Now, by Distance
Inequality (i) applied to the numbers f(x) − L and −(g(x) − M ),
|(f(x) + g(x)) − (L + M )| = |(f(x) − L) + (g(x) − M )|
≤ |f(x) − L| + |g(x) − M |
< ε/2 + ε/2 = ε
as required.
Theorem 2.12 greatly increases the number of limit problems we can tackle.
Starting with the very simple limit lim_{x→a} x = a, part (c) gives
lim_{x→a} x^n = a^n
for any positive integer n, part (e) implies that
lim_{x→a} c x^n = c a^n
for a constant c, and parts (a) and (b) then imply that
lim_{x→a} p(x) = p(a)
for any polynomial p(x). Further, part (d) then implies that
lim_{x→a} p(x)/q(x) = p(a)/q(a)
when both p(x) and q(x) are polynomials and q(a) ≠ 0. Let’s record this for
future reference.
Corollary 2.13. Let p(x) and q(x) be polynomials and suppose a is a real
number such that q(a) ≠ 0. Then
lim_{x→a} p(x)/q(x) = p(a)/q(a).
Example 2.14. Find lim_{x→3} (x^2 − 3x + 2)/(2x^3 − x + 1).
Solution 2.15. Let q(x) = 2x^3 − x + 1. Then q(3) = 52 is not zero. So by
Corollary 2.13,
lim_{x→3} (x^2 − 3x + 2)/(2x^3 − x + 1) = (3^2 − 3(3) + 2)/(2(3)^3 − 3 + 1) = 2/52 = 1/26.
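Corollary 2.13 turns this kind of limit into plain substitution, which is easy to mechanise. The sketch below is a minimal illustration under the assumption that polynomials are stored as coefficient lists from the highest power down; the names poly and rational_limit are hypothetical helpers, and exact fractions are used so the answer comes out as 1/26 rather than a decimal.

```python
from fractions import Fraction

def poly(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from highest degree to constant."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value

def rational_limit(p, q, a):
    """lim_{x->a} p(x)/q(x) by substitution, valid only when q(a) != 0."""
    qa = poly(q, a)
    if qa == 0:
        raise ValueError("Corollary 2.13 does not apply: q(a) = 0")
    return poly(p, a) / qa

p = [1, -3, 2]       # x^2 - 3x + 2
q = [2, 0, -1, 1]    # 2x^3 - x + 1
print(rational_limit(p, q, 3))   # prints 1/26
```

The explicit check on q(a) mirrors the hypothesis of the corollary: when q(a) = 0, as in Example 2.3, substitution is not available and some cancellation is needed first.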
One more property of limits is the Squeeze Theorem.
Theorem 2.16 (The Squeeze Theorem). Suppose that f(x) ≤ g(x) ≤ h(x) for
all x near a. If lim_{x→a} f(x) = L and lim_{x→a} h(x) = L then lim_{x→a} g(x) = L.
Proof. Suppose that lim_{x→a} g(x) = M . Since f(x) ≤ g(x) for all x near a,
by Theorem 2.12 (f) we have L ≤ M . Since g(x) ≤ h(x) for all x near a, by
Theorem 2.12 (f) we have M ≤ L. Therefore M = L. (This argument takes for
granted that lim_{x→a} g(x) exists; a complete proof checks this directly from
Definition 2.5.)
Here is an example showing how the Squeeze Theorem works in practice.
Problem 2.17. Show that lim_{x→0} (sin x)/x = 1.
Solution 2.18. Without getting sidetracked into it, a little trigonometry
shows that if x ∈ (−π/2, π/2) and x ≠ 0 then
cos x ≤ (sin x)/x ≤ 1.
Observe that lim_{x→0} cos x = 1 and lim_{x→0} 1 = 1. Therefore, by the Squeeze
Theorem, lim_{x→0} (sin x)/x = 1.
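The inequality used in Solution 2.18 can be checked at sample points. This is a rough numerical sketch in floating-point arithmetic, with the sample points xs and the helper name squeeze_holds chosen for illustration; it supports the inequality but does not replace the trigonometric argument.

```python
import math

def squeeze_holds(x):
    """Check cos x <= sin(x)/x <= 1 for a nonzero x in (-pi/2, pi/2)."""
    ratio = math.sin(x) / x
    return math.cos(x) <= ratio <= 1

# Points approaching 0 from both sides: +-1, +-0.1, ..., +-0.00001.
xs = [s * 10.0 ** -k for k in range(0, 6) for s in (1, -1)]
print(all(squeeze_holds(x) for x in xs))
print([round(math.sin(x) / x, 10) for x in (0.1, 0.01, 0.001)])
```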
One-sided limits. The example of x/|x| in Example 2.10 illustrates a good
point. The function was 1 for x to the right of 0 and −1 for x to the left of 0.
The limit didn’t exist because the two values didn’t match up. But there is
value to thinking of the function having limit 1 at 0 as x → 0 from the right
and having limit −1 at 0 as x → 0 from the left. This leads to one-sided
limits.
Definition 2.19. Suppose that f is a function defined on the set (a, b) where
b > a. Then
lim_{x→a+} f(x) = L
if for every ε > 0 there exists a δ > 0 such that |f(x) − L| < ε whenever
0 < x − a < δ. We say that L is the right hand limit of f(x).
This definition is exactly the same as that for a limit but it is only concerned
with those x near a which also satisfy x > a. That’s why the absolute value
disappeared in the |x − a| < δ part of the definition.
A similar definition works from the left, but now it concerns those x near a
which also satisfy x < a, so |x − a| = a − x.
Definition 2.20. Suppose that f is a function defined on the set (c, a) where
c < a. Then
lim_{x→a−} f(x) = L
if for every ε > 0 there exists a δ > 0 such that |f(x) − L| < ε whenever
0 < a − x < δ. We say that L is the left hand limit of f(x).
For example, we have
lim_{x→0+} x/|x| = 1 and lim_{x→0−} x/|x| = −1.
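The two one-sided limits of x/|x| can be seen directly by evaluating the function on each side of 0. A minimal sketch, with the helper name sign_ratio and the sample points assumed for illustration:

```python
def sign_ratio(x):
    """x / |x| for x != 0."""
    return x / abs(x)

# Points approaching 0 from the right and from the left.
right = [sign_ratio(10.0 ** -k) for k in range(1, 8)]
left = [sign_ratio(-(10.0 ** -k)) for k in range(1, 8)]
print(right)
print(left)
```

Every value on the right is 1 and every value on the left is −1, matching the two one-sided limits and showing why no single two-sided limit can exist.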
Limits at infinity. It is often interesting to see what a function is doing

as x becomes a very large positive number (ie, as x approaches infinity) or
as it becomes a very large negative number (ie, as x approaches negative
infinity).
More informally now, if f(x) is a function defined on an interval (a, ∞)
then
lim_{x→∞} f(x) = L
if f(x) approaches L as x approaches ∞.
Similarly, if f(x) is a function defined on an interval (−∞, a) then
lim_{x→−∞} f(x) = L
if f(x) approaches L as x approaches −∞.
A fundamental example is lim_{x→∞} 1/x = 0. This makes sense since if x is very
large then 1/x is very small, and as x gets increasingly larger, 1/x gets increasingly
closer to 0.
Example 2.21. Find lim_{x→∞} (2x^2 − x + 1)/(3x^2 + 2x − 1).
Solution 2.22. Pick the highest power in either the numerator or the
denominator, say it’s x^n, and divide both the numerator and the denominator by x^n
(that is, multiply by (1/x^n)/(1/x^n) = 1). In our case, the maximum power
is 2, so we get
(2x^2 − x + 1)/(3x^2 + 2x − 1) = ((1/x^2)/(1/x^2)) · (2x^2 − x + 1)/(3x^2 + 2x − 1) = (2 − 1/x + 1/x^2)/(3 + 2/x − 1/x^2).
Now take limits. By the arithmetic of limits in Theorem 2.12, we can take
limits in the numerator and denominator separately, and in each we can take
limits term by term. Further, since lim_{x→∞} 1/x = 0 and 1/x^2 = (1/x) · (1/x),
Theorem 2.12 (c) implies that lim_{x→∞} 1/x^2 = 0. Therefore, in the numerator we
have
lim_{x→∞} (2 − 1/x + 1/x^2) = lim_{x→∞} 2 − lim_{x→∞} 1/x + lim_{x→∞} 1/x^2 = 2 − 0 + 0 = 2
and in the denominator we have
lim_{x→∞} (3 + 2/x − 1/x^2) = lim_{x→∞} 3 + 2 lim_{x→∞} 1/x − lim_{x→∞} 1/x^2 = 3 + 0 − 0 = 3.
Hence
lim_{x→∞} (2x^2 − x + 1)/(3x^2 + 2x − 1) = 2/3.
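Evaluating the quotient at larger and larger x shows the approach to 2/3. This is a quick numerical sketch in floating-point arithmetic; the function name r and the sample points are assumptions made for illustration.

```python
def r(x):
    """The quotient from Example 2.21."""
    return (2 * x**2 - x + 1) / (3 * x**2 + 2 * x - 1)

for x in (10.0, 1e3, 1e6):
    # Third column: distance from the claimed limit 2/3.
    print(x, r(x), abs(r(x) - 2 / 3))
```

The gap to 2/3 shrinks roughly in proportion to 1/x, in line with the 1/x and 1/x^2 terms that vanish in the solution.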
Infinite Limits. Another variation is when the function value tends to ∞ or
−∞ as x approaches a. Instead of simply saying the limit does not exist, we
can be a bit more informative.
For example, reconsider lim_{x→0} 1/x from Challenge Problem 2.8. For x > 0 we have 1/x > 0,
and as x approaches 0 from the right we have 1/x getting very large. In fact, the
closer to 0 that x gets the larger that 1/x gets. So what we really have is
lim_{x→0+} 1/x = ∞.
On the other hand, for x < 0 we have 1/x < 0, and as x approaches 0 from the
left we have 1/x becoming a very large negative number. Moreover, the closer to 0
that x gets, the larger a negative number 1/x becomes. So we have
lim_{x→0−} 1/x = −∞.
This was an interesting example because it combined many of the variations
on limits. Another combination is the following.
Example 2.23. Find lim_{x→∞} (4x^3 − 3x + 1)/(2x^2 + 8x − 2).
Solution 2.24. Try the same trick as in Example 2.21. We have
(4x^3 − 3x + 1)/(2x^2 + 8x − 2) = ((1/x^3)/(1/x^3)) · (4x^3 − 3x + 1)/(2x^2 + 8x − 2) = (4 − 3/x^2 + 1/x^3)/(2/x + 8/x^2 − 2/x^3).
The limit of the numerator as x → ∞ is 4 while the limit of the denominator
is 0. This gives a total limit of 4/0 which doesn’t make sense. The problem
is the 0 in the denominator, which obscures what’s going on. Let’s instead
modify the trick to make sure the denominator has a limit which is a nonzero
constant.
Divide the numerator and the denominator by x^n (that is, multiply by (1/x^n)/(1/x^n))
where n is the highest power in the denominator (rather than
the highest in either the numerator or denominator). Doing so will give a limit
with a constant in the denominator. In our case, we get
(4x^3 − 3x + 1)/(2x^2 + 8x − 2) = ((1/x^2)/(1/x^2)) · (4x^3 − 3x + 1)/(2x^2 + 8x − 2) = (4x − 3/x + 1/x^2)/(2 + 8/x − 2/x^2).
Now in the numerator we have
lim_{x→∞} (4x − 3/x + 1/x^2) = lim_{x→∞} 4x − 3 lim_{x→∞} 1/x + lim_{x→∞} 1/x^2
= lim_{x→∞} 4x − 0 + 0
= lim_{x→∞} 4x
where the term lim_{x→∞} 4x has been deliberately left alone for the moment.
In the denominator we have
lim_{x→∞} (2 + 8/x − 2/x^2) = lim_{x→∞} 2 + 8 lim_{x→∞} 1/x − 2 lim_{x→∞} 1/x^2 = 2 + 0 − 0 = 2.
Thus
lim_{x→∞} (4x^3 − 3x + 1)/(2x^2 + 8x − 2) = lim_{x→∞} 4x/2 = lim_{x→∞} 2x = ∞.
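For the quotient in Example 2.23, a few evaluations show both that the values grow without bound and that they grow like 2x. The sketch below is a floating-point illustration only; the function name s and the sample points are assumptions.

```python
def s(x):
    """The quotient from Example 2.23."""
    return (4 * x**3 - 3 * x + 1) / (2 * x**2 + 8 * x - 2)

for x in (1e2, 1e4, 1e6):
    # Third column: ratio of s(x) to 2x, which should approach 1.
    print(x, s(x), s(x) / (2 * x))
```

The ratio s(x)/(2x) settling near 1 is the numerical counterpart of the final step, where the limit is compared with lim_{x→∞} 2x.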
3. Continuity
Intuitively, a function is continuous if it has no breaks or sudden changes, that

is, if when drawing the graph of the function you can do so without lifting
your pencil off the page. Formally, this is phrased as follows.
Definition 3.1. A function f is continuous at a point a if lim_{x→a} f(x) = f(a).
Picking apart the definition, this says:
• the function is defined on an open interval containing a (so it makes

sense to take the limit at a);
• the limit limx→a f (x) exists; and
• the limit at a is equal to f (a).
Definition 3.2. If a function f is not continuous at a we say that f is
discontinuous at a.
For example, the function
f(x) = 2x + 1 if x ≠ 1,
f(x) = 1 if x = 1
is not continuous at x = 1 since lim_{x→1} f(x) = 3 is not equal to f(1) = 1.
[Figure: graph of f(x).]
The function
g(x) = 1/x if x ≠ 0,
g(x) = 0 if x = 0
is not continuous at x = 0 since lim_{x→0+} g(x) = ∞ while g(0) = 0.
[Figure: graph of g(x).]

There are variations on continuity just as there are variations on limits.
Definition 3.3. A function f is continuous on the right at a if lim_{x→a+} f(x) =
f(a). A function f is continuous on the left at a if lim_{x→a−} f(x) = f(a).
For example, if
f(x) = x/|x| if x ≠ 0,
f(x) = 1 if x = 0
then f is continuous on the right at 0 since lim_{x→0+} f(x) = 1 and f(0) = 1,
but f is not continuous on the left at 0 since lim_{x→0−} f(x) = −1 but f(0) = 1.
[Figure: graph of f(x).]
Definition 3.4. A function f is continuous on an open interval (b, c) if f is
continuous at every point a ∈ (b, c).
Definition 3.5. A function f is continuous on a closed interval [b, c] if f is
continuous at a for every a ∈ (b, c), if it is continuous on the right at b, and
if it is continuous on the left at c.
Analogous definitions of continuity exist for half-open intervals (b, c] and [b, c),
as well as for extended intervals (b, ∞) or [b, ∞), and for unions of intervals
such as (−1, 1) ∪ [3, ∞).
For example, consider the function f(x) = x^2 restricted to the closed interval
[0, 2]. Then f is continuous at a for every a ∈ (0, 2), it is continuous on the
right at 0, and it is continuous on the left at 2. Therefore f is continuous on
[0, 2].
The arithmetic of limits translates into arithmetic for continuous functions.
Theorem 3.6. If f and g are functions that are continuous at a then:
(a) f + g is continuous at a;
(b) f − g is continuous at a;
(c) f g is continuous at a;
(d) f /g is continuous at a provided that g(a) ≠ 0;
(e) cf is continuous at a, for any constant c.
Proof. We’ll prove part (a), the others being similar. Since f and g are
continuous at a, by the definition of continuity, lim_{x→a} f(x) = f(a) and
lim_{x→a} g(x) = g(a). By the arithmetic of limits,
lim_{x→a} (f(x) + g(x)) = lim_{x→a} f(x) + lim_{x→a} g(x) = f(a) + g(a).
Since f(a) + g(a) = (f + g)(a), we obtain lim_{x→a} (f(x) + g(x)) = (f + g)(a),
so f + g is continuous at a.
Definition 3.7. A polynomial is a function p : R −→ R of the form
\[ p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \]
where each $a_i$ is a real number, for 0 ≤ i ≤ n. A rational function is a function
of the form
\[ r(x) = \frac{p(x)}{q(x)} \]
where both p(x) and q(x) are polynomials.
Theorem 3.6 lets us quickly build up a stockpile of continuous functions.
Start with $f(x) = x$. It is easy to see that this is continuous at a for any
a ∈ R. Theorem 3.6 (c) implies that $f(x) = x^m$ is continuous at a for any
strictly positive integer m. Theorem 3.6 (e) then implies that $f(x) = a_m x^m$
is continuous at a for any constant $a_m$. Theorem 3.6 (a) then implies that
$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x$ is continuous at a. Observe that the
constant function $g(x) = a_0$ is continuous at a as well, so Theorem 3.6 (a)
implies that any polynomial $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ is
continuous at a. Theorem 3.6 (d) then implies that any rational function p/q is
continuous at a provided that $q(a) \neq 0$. Summarizing:
Corollary 3.8. Any polynomial p(x) is continuous at a for every a ∈ R. Any
rational function $r(x) = \frac{p(x)}{q(x)}$ is continuous at a provided that $q(a) \neq 0$.
Many other functions are also continuous. Without proof, we’ll simply state
that sin x and cos x are continuous at all x ∈ R. Theorem 3.6 (d) then implies
that $\tan x = \frac{\sin x}{\cos x}$ is continuous whenever $\cos x \neq 0$. Similarly,
$\sec x = \frac{1}{\cos x}$, $\csc x = \frac{1}{\sin x}$ and $\cot x = \frac{\cos x}{\sin x}$ are
continuous whenever the denominator is nonzero.
As well, for a fixed base a > 0 the exponential function $a^x$ is continuous for
all x ∈ R, and for a ≠ 1 the logarithm $\log_a x$ is continuous for all x > 0.
Composition is an important operation that builds new functions out of
existing ones. If f and g are functions with appropriate matching conditions
on the range of g and the domain of f , then $(f \circ g)(x) = f(g(x))$. The next
theorem says composition preserves continuity.
Theorem 3.9. Let g be a function that is continuous at a and let f be a
function that is continuous at g(a). Then the composition f ◦ g is continuous
at a.
The proof is a bit tricky, but the idea is simple. As x approaches a we have
g(x) approaching g(a). Let g(x) = y and g(a) = b. Then as y approaches b
we have f (y) approaching f (b). That is, as x approaches a we have f (g(x))
approaching f (g(a)). So f ◦ g is continuous.
Challenge Problem 3.10. Prove Theorem 3.9.
Example 3.11. Show that the function $F(x) = \sqrt{\frac{3x+1}{x^2-2x+1}}$ is continuous
for all $x \in [-\frac{1}{3}, 1) \cup (1, \infty)$.
Solution 3.12. Observe that $F(x) = (f \circ g)(x)$ where $g(x) = \frac{3x+1}{x^2-2x+1}$ and
$f(x) = \sqrt{x}$. First, $x^2 - 2x + 1 = (x-1)^2$, which is 0 at x = 1, so g(x) is
continuous on R − {1}. Second, the domain of $f(x) = \sqrt{x}$ is x ≥ 0, so for the
composition $F(x) = f(g(x))$ to make sense we need g(x) ≥ 0. Observe that
the denominator of g(x) is the square $(x-1)^2$, which is always nonnegative.
Therefore g(x) ≥ 0 if and only if the numerator satisfies 3x + 1 ≥ 0. This is
true if and only if $x \ge -\frac{1}{3}$. Putting it all together, if $x \ge -\frac{1}{3}$ and $x \neq 1$ then
$F(x) = f(g(x))$ makes sense and is continuous because it is a composition of
two continuous functions.
Properties of continuous functions. Continuous functions have two key
properties. We will first discuss them and then give some justification for why
they are true.
The Intermediate Value Theorem says that if a function is continuous and
takes values c and d with c < d then it must take all values in between c and
d as well. This is the notion that a continuous function can be drawn without
lifting your pencil off the page.
Theorem 3.13 (Intermediate Value Theorem). If f is continuous on the
closed interval [a, b] and K is any number between f (a) and f (b) then there is
a c ∈ (a, b) such that f (c) = K.
One application of this theorem is to help locate the zeroes of a function,
which can have many practical applications, especially in cases where it is not
possible to obtain zeroes by factoring.
Example 3.14. Show that $\sin x + 2\cos x - x^2 = 0$ has a solution in the interval
$[0, \pi/2]$.
Solution 3.15. Let $f(x) = \sin x + 2\cos x - x^2$. Since f(x) is a sum of multiples
of continuous functions, by Theorem 3.6 it is continuous. Observe that
\[ f(0) = \sin(0) + 2\cos(0) - 0^2 = 2 \]
and
\[ f\left(\frac{\pi}{2}\right) = \sin\frac{\pi}{2} + 2\cos\frac{\pi}{2} - \left(\frac{\pi}{2}\right)^2 = 1 + 2 \cdot 0 - \frac{\pi^2}{4} = 1 - \frac{\pi^2}{4}. \]
Whatever $1 - \frac{\pi^2}{4}$ may be, the point is that it is smaller than 0, because
$\pi^2 > 4$. Summarizing to glean the key information: f(0) > 0 and f(π/2) < 0.
Since f is a continuous function and 0 lies between f(π/2) and f(0), the
Intermediate Value Theorem implies that there is a c ∈ (0, π/2) such that
f(c) = 0.
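The Intermediate Value Theorem only says that a zero exists. As a minimal sketch of how one might go on to locate it numerically, the bisection method below repeatedly halves an interval on which f changes sign. The helper name `bisect` and the tolerance are illustrative choices, not part of these notes.

```python
import math

def f(x):
    # The function from Example 3.14.
    return math.sin(x) + 2 * math.cos(x) - x ** 2

def bisect(func, lo, hi, tol=1e-10):
    """Shrink [lo, hi] around a sign change of func until it is shorter than tol."""
    f_lo, f_hi = func(lo), func(hi)
    if f_lo * f_hi > 0:
        raise ValueError("func must have opposite signs at lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_mid == 0:
            return mid
        # Keep the half-interval on which the sign change survives.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

root = bisect(f, 0.0, math.pi / 2)
print(root, f(root))
```

Each step relies on the Intermediate Value Theorem: whichever half of the interval still shows a sign change must contain a zero.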
Another key property of continuous functions is that they attain minimums
and maximums on closed intervals. To make sense of this, let’s start with
some definitions.
Definition 3.16. Let f be a function on an interval I. Then f has an upper
bound if there is a number M such that f (x) ≤ M for all x ∈ I. Also, f has
a lower bound if there is a number N such that N ≤ f (x) for all x ∈ I. We
say that f is bounded if it has both an upper and a lower bound.
Example 3.17. Show that the function $f(x) = \frac{1}{x}$ has a lower bound on the
interval I = (0, 1] but it does not have an upper bound.
Solution 3.18. Before starting, note that $f(x) = \frac{1}{x}$ is not defined at x = 0,
which is why this point was omitted from the interval. Now observe that
f(1) = 1 and for any 0 < x < 1 we have $f(x) = \frac{1}{x} > 1$. Therefore 1 is a
lower bound for f on (0, 1]. On the other hand, suppose that f has an upper
bound of M . Since f(x) ≥ 1 for all x ∈ (0, 1] we must have M ≥ 1.
Consider $\frac{1}{M+1}$. As M ≥ 1 we have $0 < \frac{1}{M+1} < 1$, so $\frac{1}{M+1} \in (0, 1]$. Also,
$f\left(\frac{1}{M+1}\right) = M + 1 > M$. But this contradicts the fact that M is supposed to
be an upper bound for f on (0, 1]. Therefore, it cannot be the case that M
is an upper bound for f . As this holds for any M , the function f has no
upper bound on (0, 1].
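The argument above can be tried out on a few candidate bounds. This is only an illustrative sketch: the helper name `witness` is hypothetical, and checking finitely many values of M does not replace the proof.

```python
def f(x):
    return 1 / x

def witness(M):
    """For a proposed upper bound M >= 1, return a point of (0, 1] where f exceeds M."""
    return 1 / (M + 1)

for M in [1, 10, 1000]:
    x = witness(M)
    # x lies in (0, 1] and f(x) is approximately M + 1, which is larger than M.
    print(M, x, f(x))
```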
Remark 3.19. It’s worth pointing out that upper and lower bounds are not
unique. In the last example, 1 is a lower bound for f on (0, 1], but so is 0.
For 0 is less than f (x) for any x ∈ (0, 1], so it fits the definition of being a
lower bound. In fact, any number less than 1 is also a lower bound.
Definition 3.20. A function f on an interval I has a maximum value if there
is a number M ∈ R such that M is an upper bound for f on I and M = f (c)
for some c ∈ I. The function f has a minimum value on I if there is a number
N ∈ R such that N is a lower bound for f on I and N = f (d) for some d ∈ I.
Maximum and minimum values refine the notions of upper and lower bounds
respectively, provided a maximum or minimum value exists. In the previous
example of $f(x) = \frac{1}{x}$ on (0, 1], the function f has minimum value 1 since 1 is a
lower bound for f on (0, 1] and f(1) = 1. However, as f has no upper bound
it can have no maximum value either.
Example 3.21. Let $f(x) = x^2$ on (0, 2]. Find the minimum and maximum
values of f , if they exist.
Solution 3.22. Observe that 4 is an upper bound since f(x) ≤ 4 for any
x ∈ (0, 2]. Also, 2 is a point in the interval and f(2) = 4, so 4 is the maximum
value of f on (0, 2]. On the other hand, 0 is a lower bound since f(x) ≥ 0
for all x ∈ (0, 2], but 0 is not a minimum value since there is no d ∈ (0, 2]
such that f(d) = 0. In fact, f has no minimum value on (0, 2]. To see why,
suppose that N is the minimum value of f on (0, 2]. We claim that it must
be the case that N ≤ 0. For if N > 0 then consider $\frac{N}{2}$, which is larger than 0
but smaller than N . Since N is a value of f we have N ≤ 4, so $\sqrt{N/2}$ lies in
(0, 2] and $\frac{N}{2} = f\left(\sqrt{N/2}\right)$, so N cannot be a lower bound
for f . But with N ≤ 0, there cannot be any d ∈ (0, 2] with f(d) = N since
f(d) > 0 for all such d while N ≤ 0. Hence N cannot be a minimum value
for f , a contradiction.
Notice that continuity has not yet entered the picture in terms of upper and
lower bounds and maximum and minimum values. It only does so now. The
following theorem says that a continuous function on a closed interval does
have both a maximum and a minimum value.
Theorem 3.23 (Extreme Value Theorem). Let f be a continuous function on
a closed interval [a, b]. Then f is bounded on [a, b] and f has both a maximum
value and a minimum value on [a, b].
The Extreme Value Theorem is useful in optimisation problems, when it helps
to know a maximum or minimum exists.
Example 3.24. What is the area of the largest rectangular plot that can be
enclosed by a 200 metre fence?
Solution 3.25. Let x and y be the side lengths of the rectangle. We want
to maximise the area A = xy given the condition that the perimeter satisfies
2x + 2y = 200. Solving the latter gives y = 100 − x, so we write A as a function
of x only:
\[ A(x) = x(100 - x) = 100x - x^2. \]
Notice that A(x) = 0 at x = 0 and x = 100. As A(x) is continuous on [0, 100],
by the Extreme Value Theorem, A(x) has a maximum value on [0, 100]. Also
note that A(x) > 0 on (0, 100) so this maximum value is positive. This
ensures that the problem has a sensible solution. As for working out the actual
solution, this is left to you!
Some proofs. This subsection is optional, but is recommended reading. It
goes over the proofs of the Intermediate Value Theorem and the Extreme
Value Theorem. First, we need another variation on the upper/lower bound
concept.
Definition 3.26. Let S be a nonempty set of real numbers which is bounded
above. A number M is the least upper bound of S if and only if:
• M is an upper bound for S; and
• M ≤ K where K is any other upper bound for S.
For example, let S = (0, 1). Then 1 is an upper bound for S, as are 2 and 17.
We claim that 1 is the least upper bound for S. This means showing that
if K is any other upper bound for S then 1 ≤ K. Suppose instead that K is an
upper bound for S and K < 1. We’ll aim for a contradiction. Note that K > 0,
since K is at least as large as the element $\frac{1}{2}$ of S. The distance between K
and 1 is 1 − K, so consider $L = K + \frac{1-K}{2}$. Then K < L and L < 1. As 0 < L < 1 we
have L ∈ (0, 1), so K < L implies that K cannot be an upper bound for (0, 1),
contradicting our choice of K.
The notion of a greatest lower bound can also be defined, although we won’t
need it right now.
The following is an axiom of the real numbers. That is, it is a statement to
be accepted without proof.
Axiom (The Least Upper Bound Axiom). Every nonempty set of real num-
bers that has an upper bound has a least upper bound.
Lemma 3.27. Let f be continuous on [a, b]. If f (a) < 0 < f (b), or if
f (a) > 0 > f (b), then there is a number c between a and b such that f (c) = 0.
Proof. Suppose f (a) < 0 < f (b), the other case being proved similarly. Since f
is continuous, limx→a f (x) = f (a). That is, as x approaches a, f (x) ap-
proaches f (a). As f (a) < 0, this means that if x is close enough to a then
f(x) also has to be less than 0. Technically, there is some number $\xi_0 > a$
such that f is negative on the interval $[a, \xi_0)$. This line of reasoning will be
used repeatedly in the argument that follows.
Let S be the set of all values $\xi$ for which f is negative on $[a, \xi)$. That is,
\[ S = \{\xi \mid f \text{ is negative on } [a, \xi)\}. \]
Observe that S is nonempty since $\xi_0 \in S$. The set S has an upper bound
since f (b) > 0 implies that f must be positive on some interval (t, b), implying
that f is not negative on the whole interval [a, b). So b is an upper bound on S.
Therefore, by the Least Upper Bound Axiom, S has a least upper bound c.
Note that c ≤ b.
We claim that f (c) = 0, which would prove the lemma. If f (c) > 0 then f is
positive on some interval of the form (t, c) so t is also an upper bound on S
and t < c. But this contradicts the fact that c is the least upper bound on S.
If f (c) < 0 then f is negative on some interval of the form (c, s), implying
that f is negative on the interval [a, s), meaning that the least upper bound
of S must be at least as large as s. But this contradicts the fact that c < s.
Thus the only option left is that f (c) = 0.
Theorem 3.28 (The Intermediate Value Theorem). If f is continuous on
the closed interval [a, b] and K is any number between f(a) and f(b) then there
is a c ∈ (a, b) such that f(c) = K.
Proof. Saying that K is between f (a) and f (b) means that either f (a) < K <
f (b) or f (b) < K < f (a). Suppose that f (a) < K < f (b), the other case being
similar. Let g be the function defined by
g(x) = f (x) − K.
Observe that g is a difference of continuous functions on [a, b] so g is contin-
uous on [a, b]. Also, g(a) = f (a) − K < 0 and g(b) = f (b) − K > 0. So
by Lemma 3.27, there is a number c between a and b such that g(c) = 0.
Therefore, as f (x) = g(x) + K, we obtain f (c) = g(c) + K = K.
Now on to the Extreme Value Theorem.

Lemma 3.29. If f is continuous on [a, b] then f is bounded on [a, b].
Proof. Let S be the following set:

S = {x ∈ [a, b] | f is bounded on [a, x]}.
Observe that a ∈ S since a ∈ [a, b], the interval [a, a] is just the one point a,
and f is bounded on this one point set by f (a). In particular, S is nonempty.
Observe that any element x in S has the property that x ≤ b, so b is an upper
bound on S. Therefore, by the Least Upper Bound Axiom, S has a least upper
bound c. Note that c ≤ b.
We claim that c = b, which would prove the lemma. Let’s aim for a contradic-
tion by supposing that c < b. Since f is continuous on [a, b] and c ∈ [a, b], we
have limx→c f (x) = f (c). In particular, if x is near c then f (x) is near f (c).
Technically, if we pick $\varepsilon > 0$ then there is a δ > 0 such that if x ∈ (c, c + δ)
then $|f(x) - f(c)| \le \varepsilon$. If c + δ > b we can take a smaller δ to make sure
c + δ ≤ b, and the inequality $|f(x) - f(c)| \le \varepsilon$ still holds. But then f is
bounded on [a, c + δ] by either: (i) $u + \varepsilon$ where u is the bound for f on [a, c],
or (ii) f(c + δ). Consequently, c + δ ∈ S. But then the least upper bound
on S must be larger than or equal to c + δ. This is a contradiction since c is
the least upper bound on S and c < c + δ. Hence it must be the case that
c = b.
Theorem 3.30 (Extreme Value Theorem). Let f be a continuous function on

a closed interval [a, b]. Then f is bounded on [a, b] and f has both a maximum
value and a minimum value on [a, b].
Proof. By Lemma 3.29, f is bounded on [a, b]. Let S be the set

S = {f (x) | x ∈ [a, b]}.
As f (a) ∈ S we have S nonempty, and as f is bounded on [a, b] the set S is
bounded. Therefore, by the Least Upper Bound Axiom, S has a least upper
bound M .
We claim that M = f (c) for some c ∈ [a, b]. That is, we claim that M ∈ S.
Let’s aim for a contradiction by supposing that $M \notin S$. Since M is an upper
bound for S, we have M ≥ f(x) for every x ∈ [a, b]. Saying $M \notin S$ means
that we in fact have M > f(x) for every x ∈ [a, b]. For if M = f(x) for some
x ∈ [a, b] then M ∈ S. Let g be the function defined by
\[ g(x) = \frac{1}{M - f(x)}. \]
Since M > f(x) for all x ∈ [a, b], the function g(x) is defined for all x ∈ [a, b]
and it is continuous since it is the composition F ◦ G of continuous functions,
where $F(x) = \frac{1}{x}$ and $G(x) = M - f(x)$. Therefore, by Lemma 3.29, g is
bounded on [a, b]. But this is a contradiction: since M is the least upper bound
of S, we can choose x so that f(x) is as close to M as we like, implying that
g(x) becomes arbitrarily large.
Thus M ∈ S, implying that M = f (c) for some c ∈ [a, b]. But this implies
that f has a maximum value since M ≥ f (x) for all x ∈ [a, b].
4. Differentiation
What is the derivative? Informally, start with a function f and a point x
in the interior of the domain of f (i.e., x is not an endpoint). Consider the
point (x, f(x)) on the graph of f . We want to find the tangent line to the graph
of f at x.
How can we do this? First, remember we want the equation of a line, and a
line is determined by its slope and one point on the line. We have the one
point, it is (x, f (x)). What we need to find is the slope.
The idea is to find the slope through a series of approximations. Pick a point
x + h close to x. To simplify the argument, suppose h > 0. Look at the red
line connecting the points (x, f (x)) and (x + h, f (x + h)) on the graph of f ,
called the secant line.
[Figure (1): the graph of f with the secant line through (x, f(x)) and (x + h, f(x + h)).]
This secant line approximates the tangent line. The advantage of the secant
line is that its slope is easy to calculate since we know two points on the line.
The slope of the secant line is
\[ \frac{f(x + h) - f(x)}{(x + h) - x} = \frac{f(x + h) - f(x)}{h}. \]
Thinking in terms of the graph (1), if h is made smaller then we get a new
secant line which should be a better approximation to the tangent line.
As these approximations to the tangent line improve the smaller h gets, let’s
take the limit as h approaches 0 in order to get the tangent line on the nose.
Then the slope of the tangent line to the graph of f at x should be
\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. \]
We take this to be the definition of the derivative of f at x.
Definition 4.1. A function f is said to be differentiable at x if and only if
\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \]
exists. If this limit exists it is called the derivative of f at x, and it is written
as $f'(x)$.
WARNING!: The definition of the derivative is careful to make sure that the
limit actually exists. If the limit does not exist - and this can happen - then
the function is not differentiable at x.
The definition of the derivative works one point x at a time. It’s often easier
to talk about functions being defined on an interval, or some domain.
Definition 4.2. A function f is differentiable on a subset S of the real num-
bers if and only if it is differentiable at every point x in S.
We’ll soon be able to calculate derivatives with ease, but let’s stick to the
definition for the time being.
Example 4.3. Using the definition of the derivative, find the derivative of
$f(x) = x^2$.
Solution 4.4. Before taking any limits, just consider the fraction $\frac{f(x+h) - f(x)}{h}$.
Using $f(x) = x^2$ and simplifying we get
\[ \frac{f(x + h) - f(x)}{h} = \frac{(x + h)^2 - x^2}{h} = \frac{(x^2 + 2xh + h^2) - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h. \]
With the arithmetic done, now take limits to get
\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} (2x + h) = 2x. \]
Thus the derivative of $f(x) = x^2$ is $f'(x) = 2x$. Note that this works for all
x ∈ R, so f is differentiable on R.
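To see the limit at work numerically, here is a small sketch that evaluates the difference quotient of the squaring function at x = 3 for shrinking values of h. The function names are illustrative, and floating-point arithmetic only approximates the exact values 2x + h.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x ** 2

x = 3.0
for h in [1.0, 0.1, 0.01, 0.001]:
    # Each value is close to 2x + h, so the quotients approach 2x = 6.
    print(h, difference_quotient(square, x, h))
```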
In general, the derivative $f'$ is a new function. Its domain is a subset of
the domain of f , since the derivative of f can’t possibly exist at x unless f
is defined at x. Sometimes the domain of $f'$ is a proper subset of the domain
of f .
Example 4.5. Using the definition of the derivative, find the derivative of
the function $f(x) = \sqrt{x}$ when it exists.
Solution 4.6. First observe that the domain of $f(x) = \sqrt{x}$ is [0, ∞). The
derivative of f is defined in terms of a limit involving h approaching 0, where
we can equivalently think of this as x + h approaching x. In this case, if x = 0
then the function f is not defined for points less than 0, so 0 + h can approach
0 only from the right. Therefore the limit $\lim_{h \to 0} \frac{f(0+h) - f(0)}{h}$ does not exist.
Hence f is not differentiable at x = 0.
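Now suppose x > 0. Then f is defined both to the left and right of x so
it makes sense to take a limit as x + h approaches x. Before taking limits,
consider the fraction $\frac{f(x+h) - f(x)}{h}$. We have
\begin{align*}
\frac{f(x + h) - f(x)}{h} &= \frac{\sqrt{x+h} - \sqrt{x}}{h} = \frac{\sqrt{x+h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x+h} + \sqrt{x}}{\sqrt{x+h} + \sqrt{x}} \\
&= \frac{(x + h) - x}{h \cdot (\sqrt{x+h} + \sqrt{x})} \\
&= \frac{h}{h \cdot (\sqrt{x+h} + \sqrt{x})} \\
&= \frac{1}{\sqrt{x+h} + \sqrt{x}}.
\end{align*}
Therefore
\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{1}{\sqrt{x+h} + \sqrt{x}} = \frac{1}{\sqrt{x} + \sqrt{x}} = \frac{1}{2\sqrt{x}}. \]
In conclusion, $f(x) = \sqrt{x}$ is differentiable for x ∈ (0, ∞) and at any such x
the derivative is $f'(x) = \frac{1}{2\sqrt{x}}$.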
Remember that the derivative is designed to find the slope of a tangent line.
So we can now go back to work out actual tangent lines.
Example 4.7. Find the equation of the tangent line for the function
$f(x) = x^2$ at x = 3.
Solution 4.8. To get the equation of a line we need a point on the line and
its slope. For the point on the line, at x = 3 we have $f(3) = 3^2 = 9$ so (3, 9)
will be a point on the tangent line. Its slope is given by the derivative of f
at x = 3. We already saw that $f'(x) = 2x$, so at x = 3 we have $f'(3) = 6$.
Therefore the equation of the tangent line is y − 9 = 6(x − 3), which simplifies
to y = 6x − 9.
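The same recipe (one point plus a slope) can be written as a short sketch. The helper `tangent_line` is a hypothetical name, and it assumes the derivative is supplied by hand as a second function.

```python
def tangent_line(f, df, a):
    """Return (slope, intercept) of the tangent line to f at x = a.

    f is the function and df its derivative, both passed in by the caller.
    """
    slope = df(a)
    # The line passes through (a, f(a)), so y = slope * (x - a) + f(a).
    intercept = f(a) - slope * a
    return slope, intercept

slope, intercept = tangent_line(lambda x: x ** 2, lambda x: 2 * x, 3)
print(slope, intercept)
```

For the function in Example 4.7 this gives slope 6 and intercept −9, the line y = 6x − 9.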
Differentiability v. Continuity. It turns out that a differentiable
function has to be continuous, but a continuous function does not have to be
differentiable. Let’s see why.
Consider the absolute value function f (x) = |x|.
It’s not hard to see using the definition of continuity that f is continuous on R,
and this matches the geometric intuition of being able to draw the graph of f
without lifting your pencil off the page.
Now consider the differentiability of f . If x > 0 then f(x) = x. It’s easy
to check using the definition of the derivative that f(x) is differentiable on
(0, ∞) and $f'(x) = 1$. Geometrically, this makes sense since the tangent line
to a line is the same line, and the slope of f(x) = x is 1. Similarly, if x < 0
then f(x) = −x and this is differentiable on (−∞, 0), although now the slope
of the tangent line is −1. However, at x = 0 something different happens.
Since f(0 + h) − f(0) = |h|, which equals h for h > 0 and −h for h < 0, we have
\[ \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = 1 \quad \text{and} \quad \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = -1 \]
for f(x) = |x|. Therefore $\lim_{h \to 0} \frac{f(0+h) - f(0)}{h}$ does not exist. Hence f is *not*
differentiable at x = 0. Geometrically, the problem is the corner at x = 0 in the
graph of f , where the slope of the tangent line has to change abruptly.
What we conclude is that the function f (x) = |x| is continuous on R but not
differentiable on R. More precisely, f is continuous on R and differentiable
only on R − {0}.
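The disagreement between the two one-sided limits at 0 is easy to observe numerically. The following is a minimal sketch using Python’s built-in `abs`; the helper name is illustrative.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

for h in [0.1, 0.01, 0.001]:
    right = difference_quotient(abs, 0.0, h)   # h approaches 0 from the right
    left = difference_quotient(abs, 0.0, -h)   # h approaches 0 from the left
    print(h, right, left)
```

The right-hand quotients stay at 1 and the left-hand quotients stay at −1 however small h becomes, so no single limit can exist.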
On the other hand, a differentiable function is continuous.
Theorem 4.9. If f is differentiable at x then f is continuous at x.
Proof. Since f is differentiable at x we have $\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = f'(x)$. That
is, the limit exists and its value is $f'(x)$. Observe that, for h ≠ 0,
\[ f(x + h) - f(x) = \frac{f(x + h) - f(x)}{h} \cdot h. \]
So taking limits we obtain
\begin{align*}
\lim_{h \to 0} (f(x + h) - f(x)) &= \lim_{h \to 0} \left( \frac{f(x + h) - f(x)}{h} \cdot h \right) \\
&= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \cdot \lim_{h \to 0} h \\
&= f'(x) \cdot 0 = 0.
\end{align*}
Thus
\[ \lim_{h \to 0} f(x + h) = f(x). \]
This doesn’t immediately look like it, but it is equivalent to saying that f
is continuous at x. Thinking through it, as x + h approaches x we have
f(x + h) approaching f(x). This is the same intuition used in saying that
$\lim_{y \to x} f(y) = f(x)$.
Challenge Problem 4.10. Finish the proof of Theorem 4.9 by showing that
$\lim_{h \to 0} f(x + h) = f(x)$ is the same as saying that the limit of f at x is f(x).
Differentiation Formulas. Now we establish some rules which will let us
calculate freely. We begin with two special cases that will be the building
blocks.
Lemma 4.11. The following hold:
(a) if f(x) = x then $f'(x) = 1$;
(b) if f(x) = c for some constant c then $f'(x) = 0$.
Proof. For part (a), observe that
\[ \frac{f(x + h) - f(x)}{h} = \frac{(x + h) - x}{h} = \frac{h}{h} = 1. \]
Therefore
\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} 1 = 1. \]
For part (b), observe that
\[ \frac{f(x + h) - f(x)}{h} = \frac{c - c}{h} = 0. \]
Therefore
\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} 0 = 0. \]
From the building blocks we move on to derivatives of operations on
functions.
Theorem 4.12. Let f and g be functions which are differentiable at x. Then:
(a) $(f + g)'(x) = f'(x) + g'(x)$;
(b) $(f - g)'(x) = f'(x) - g'(x)$;
(c) $(cf)'(x) = c f'(x)$ for any constant c.
Proof. We only prove part (a) to give a taste of how things are done. The
definition of the derivative of f + g at x considers the fraction
$\frac{(f+g)(x+h) - (f+g)(x)}{h}$. So first observe that
\begin{align*}
\frac{(f + g)(x + h) - (f + g)(x)}{h} &= \frac{[f(x + h) + g(x + h)] - [f(x) + g(x)]}{h} \\
&= \frac{[f(x + h) - f(x)] + [g(x + h) - g(x)]}{h} \\
&= \frac{f(x + h) - f(x)}{h} + \frac{g(x + h) - g(x)}{h}.
\end{align*}
Therefore
\begin{align*}
(f + g)'(x) &= \lim_{h \to 0} \frac{(f + g)(x + h) - (f + g)(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = f'(x) + g'(x).
\end{align*}
Theorem 4.13 (The Product Rule). Let f and g be functions which are
differentiable at x. Then
\[ (f \cdot g)'(x) = f'(x)g(x) + f(x)g'(x). \]
Theorem 4.14 (The Quotient Rule). Let f and g be functions which are
differentiable at x and suppose that $g(x) \neq 0$. Then
\[ \left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}. \]
The proofs of Theorems 4.13 and 4.14 are trickier but not unreasonable. How-
ever, we will not dwell on them as the point of the formulas is to start calcu-
lating.
Now let’s build. Consider $f(x) = x^2$. Think of f(x) as x · x. Then by the
Product Rule and the fact that $(x)' = 1$ we get
\[ f'(x) = (x)' \cdot x + x \cdot (x)' = 1 \cdot x + x \cdot 1 = 2x. \]
Suppose that for some n ≥ 2 we have proved that $(x^{n-1})' = (n-1)x^{n-2}$. By the
Product Rule we obtain
\begin{align*}
(x^n)' = (x \cdot x^{n-1})' &= (x)' \cdot x^{n-1} + x \cdot (x^{n-1})' \\
&= 1 \cdot x^{n-1} + x \cdot (n-1)x^{n-2} \tag{2} \\
&= n x^{n-1}.
\end{align*}
Continuing, suppose that $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$ is a polynomial.
From Lemma 4.11, Theorem 4.12 and (2) we obtain the following.
Proposition 4.15. If $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$ is a polynomial then
\[ p'(x) = n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + \cdots + 2 a_2 x + a_1. \]
Proposition 4.15 can seem like a mouthful, but it’s just the notation that makes
it that way. In practice, differentiating polynomials is dead easy.
Example 4.16. Find the derivative of $f(x) = 3x^4 - 7x^3 + 2x + 8$.
Solution 4.17. We have $f'(x) = 12x^3 - 21x^2 + 2$.
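Proposition 4.15 is mechanical enough to turn into a few lines of code. The sketch below assumes a polynomial is stored as a list of coefficients starting from the constant term; that representation and the function names are illustrative choices, not part of these notes.

```python
def poly_derivative(coeffs):
    """coeffs[i] is the coefficient a_i of x**i; return the coefficients of p'."""
    # The term a_i * x**i differentiates to i * a_i * x**(i - 1).
    return [i * a for i, a in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# f(x) = 3x^4 - 7x^3 + 2x + 8, listed from the constant term upwards.
f = [8, 2, 0, -7, 3]
print(poly_derivative(f))
```

The output lists the coefficients of $12x^3 - 21x^2 + 2$ from the constant term upwards.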
Example 4.18. If $g(x) = (x^3 - 4x + 2)(x^{11} - x^6 + 1)$, find $g'(1)$.
Solution 4.19. You could multiply the two factors of g(x) together and then
differentiate, but it may be easier to simply use the Product Rule. We have
\begin{align*}
g'(x) &= (x^3 - 4x + 2)'(x^{11} - x^6 + 1) + (x^3 - 4x + 2)(x^{11} - x^6 + 1)' \\
&= (3x^2 - 4)(x^{11} - x^6 + 1) + (x^3 - 4x + 2)(11x^{10} - 6x^5).
\end{align*}
So $g'(1) = (-1)(1) + (-1)(5) = -6$.
Using the Quotient Rule, rational functions are also easy to differentiate.
Example 4.20. Find the derivative of $f(x) = \frac{3x^2 - 2x + 4}{x^2 - 1}$.
Solution 4.21. By the Quotient Rule, we have
\begin{align*}
f'(x) &= \frac{(3x^2 - 2x + 4)'(x^2 - 1) - (3x^2 - 2x + 4)(x^2 - 1)'}{(x^2 - 1)^2} \\
&= \frac{(6x - 2)(x^2 - 1) - (3x^2 - 2x + 4)(2x)}{(x^2 - 1)^2} \\
&= \frac{6x^3 - 2x^2 - 6x + 2 - (6x^3 - 4x^2 + 8x)}{(x^2 - 1)^2} \\
&= \frac{2x^2 - 14x + 2}{(x^2 - 1)^2}.
\end{align*}
A special case of Theorem 4.14 that sometimes gets singled out as its own rule
is when f(x) = 1.
Corollary 4.22 (The Reciprocal Rule). Let g be a function which is
differentiable at x and suppose that $g(x) \neq 0$. Then
\[ \left(\frac{1}{g}\right)'(x) = -\frac{g'(x)}{[g(x)]^2}. \]
Example 4.23. If $h(x) = \frac{1}{2x^3 - 4x + 3}$, find $h'(2)$.
Solution 4.24. By the Reciprocal Rule (or the Quotient Rule),
\[ h'(x) = -\frac{(2x^3 - 4x + 3)'}{(2x^3 - 4x + 3)^2} = -\frac{6x^2 - 4}{(2x^3 - 4x + 3)^2}. \]
So $h'(2) = -\frac{20}{11^2} = -\frac{20}{121}$.
Here’s one more example that’s slightly more abstract.
Example 4.25. Suppose that $f(x) = \frac{ax^2 + bx + c}{2x + 1}$ for some constants a, b and c.
Find $f'(-1)$.
Solution 4.26. Use the Quotient Rule as usual, keeping track of constants.
We get
\begin{align*}
f'(x) &= \frac{(ax^2 + bx + c)'(2x + 1) - (ax^2 + bx + c)(2x + 1)'}{(2x + 1)^2} \\
&= \frac{(2ax + b)(2x + 1) - (ax^2 + bx + c)(2)}{(2x + 1)^2} \\
&= \frac{(4ax^2 + 2bx + 2ax + b) - (2ax^2 + 2bx + 2c)}{(2x + 1)^2} \\
&= \frac{2ax^2 + 2ax + (b - 2c)}{(2x + 1)^2}.
\end{align*}
Therefore $f'(-1) = \frac{2a - 2a + (b - 2c)}{(-1)^2} = b - 2c$.
Challenge Problem 4.27. Use the Reciprocal Rule to show that if n is a
negative integer then $(x^n)' = n x^{n-1}$.
Higher order derivatives. Having taken the derivative of a function f we
get a new function $f'$. We could take the derivative of $f'$ now. This is called
the second derivative of f and is written $f''$. That is,
\[ f''(x) = (f'(x))'. \]
We could take the third derivative $f'''$, which equals the derivative of $f''$.
Carrying on, the prime notation is awkward and instead we write $f^{(n)}$ for the
nth derivative of f .
Example 4.28. Let $f(x) = x^4 + 2x^3 - x^2 + 4x + 3$. Find all the higher order
derivatives of f .
Solution 4.29. We have
\begin{align*}
f(x) &= x^4 + 2x^3 - x^2 + 4x + 3 \\
f'(x) &= 4x^3 + 6x^2 - 2x + 4 \\
f''(x) &= 12x^2 + 12x - 2 \\
f'''(x) &= 24x + 12 \\
f^{(4)}(x) &= 24 \\
f^{(5)}(x) &= 0
\end{align*}
and $f^{(n)}(x) = 0$ for all n ≥ 6 since the derivative of the zero function is zero.
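Repeated differentiation of a polynomial can be sketched as a loop that stops once the zero polynomial appears. As an assumption for this sketch, a polynomial is a list of coefficients starting from the constant term, and the empty list stands for the zero polynomial; the function names are illustrative.

```python
def poly_derivative(coeffs):
    """coeffs[i] is the coefficient of x**i; return the coefficients of the derivative."""
    return [i * a for i, a in enumerate(coeffs)][1:]

def higher_derivatives(coeffs):
    """Return [p, p', p'', ...], stopping once the zero polynomial ([]) is reached."""
    result = [list(coeffs)]
    while result[-1]:
        result.append(poly_derivative(result[-1]))
    return result

# f(x) = x^4 + 2x^3 - x^2 + 4x + 3, listed from the constant term upwards.
derivs = higher_derivatives([3, 4, -1, 2, 1])
for n, d in enumerate(derivs):
    print(n, d)
```

Each differentiation shortens the list by one entry, which is why a polynomial of degree 4 reaches the zero polynomial at the fifth derivative.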
Challenge Problem 4.30. Find a formula for $f^{(n)}(x)$ if $f(x) = \frac{1}{x}$.
The derivative as a rate of change. The derivative $f'(x)$ is the slope of
the tangent line to a function f at the point x. There is an alternative view
of the derivative as an instantaneous rate of change of f . Let’s try to figure
out what this means.
What should the rate of change of a function f be? A reasonable answer is
that it should be the amount by which f changes, in proportion to how x
changes. That is, if we look at x changing to x + h, then f(x) changes to
f(x + h). The amount by which f has changed is f(x + h) − f(x), and taking
this in proportion to how much x has changed is the quotient
\[ \frac{f(x + h) - f(x)}{(x + h) - x}. \]
This is exactly the slope of the secant line connecting the points (x, f(x)) and
(x + h, f(x + h)) on the graph of f , just as in diagram (1).
Further, if h is made smaller so x + h is closer to x, then we get a more
accurate description of the rate at which f is changing when it’s very near
to x. If we take this to the limit by letting h approach 0, then we get the
instantaneous rate of change of f at x. But this limit is
\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{(x + h) - x} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}, \]
which is precisely the definition of $f'(x)$. Thus:
$f'(x)$ is the instantaneous rate of change of f at x.
In terms of an object moving in two dimensional space, if the path of motion
is given by the function y(t) then the derivative $y'(t)$ is the velocity of the
object, and $y''(t)$ is the acceleration of the object. The velocity is therefore the
instantaneous rate of change in the motion of the object, and the acceleration
is the instantaneous rate of change in the velocity of the object.
Example 4.31. An object’s motion in space is given by the function
$p(t) = t^3 - 2t + 1$. Find the velocity and acceleration of the object at t = 2.
Solution 4.32. The velocity is $p'(t) = 3t^2 - 2$ and the acceleration is
$p''(t) = 6t$. At t = 2, the velocity is $p'(2) = 10$ and the acceleration is $p''(2) = 12$.
Thinking of the derivative as a rate of change leads to a new form of notation,
which is commonly used in multivariable calculus and in applications
of calculus to physics and engineering. Suppose that y is a function of x, for
example, $y(x) = x^2 - 2x + 1$. Let’s introduce the following notation for the
derivative, due to Leibniz:
\[ \frac{dy}{dx} = y'(x). \]
The Leibniz notation is more akin to a rate of change, where you’re thinking
intuitively of dy as the change in y and dx as the change in x.
Example 4.33. Find: (i) $\frac{dy}{dx}$ if $y = \frac{2x - 1}{3x + 4}$, and (ii) the rate of change of y at
x = 0.
Solution 4.34. This is just finding the derivative using the Quotient Rule:
\begin{align*}
\frac{dy}{dx} &= \frac{(2x - 1)'(3x + 4) - (2x - 1)(3x + 4)'}{(3x + 4)^2} \\
&= \frac{2(3x + 4) - (2x - 1) \cdot 3}{(3x + 4)^2} \\
&= \frac{11}{(3x + 4)^2}.
\end{align*}
The rate of change of y at x = 0 is $\frac{dy}{dx}(0) = \frac{11}{16}$.
In terms of higher order derivatives, we write $\frac{d^2y}{dx^2}$ for $y''(x)$, $\frac{d^3y}{dx^3}$ for $y'''(x)$,
and so on.
Example 4.35. An object’s motion in space over time is given by the function
$y(t) = 2t^3 - 3$. Find its acceleration $\frac{d^2y}{dt^2}$.
Solution 4.36. The velocity is the first derivative of y, which is $\frac{dy}{dt} = 6t^2$.
The acceleration is the second derivative of y, which is $\frac{d^2y}{dt^2} = 12t$.
The $\frac{dy}{dx}$ notation is very handy if there is more than one variable in play, which
is the content of the next section on the Chain Rule.
The Chain Rule. The Chain Rule is a formula for taking the derivative
of a composition of functions. Suppose that y = f ◦ g where f and g are
differentiable. What is the derivative of the composition y?
To break it down, we’ll use rates of change and the Leibniz notation. Let’s
slow the composition down by inserting a middle term. Think of y as
\[ y = f(u). \]
That is, y is a function depending on a variable u. For u we want to take g(x),
so think of u as
\[ u = g(x). \]
That is, u is a function depending on a variable x. Then we have
\[ y = f(g(x)). \]
Now the rate at which y changes with respect to x should have something to
do with how y changes with respect to u and how u changes with respect to
x. That is, $\frac{dy}{dx}$ should have something to do with $\frac{dy}{du}$ and $\frac{du}{dx}$. The Chain Rule
describes the relationship as a product:
\[ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}. \tag{3} \]
Example 4.37. If $y = \frac{1}{3u + 1}$ and $u = 2x^2$, find $\frac{dy}{dx}$.
Solution 4.38. Observe that $\frac{dy}{du} = -\frac{3}{(3u + 1)^2}$ and $\frac{du}{dx} = 4x$. So by the Chain
Rule,
\[ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = -\frac{3}{(3u + 1)^2} \cdot 4x. \]
The answer should be in terms of the variable x, so substitute in $u = 2x^2$ to
get
\[ \frac{dy}{dx} = -\frac{3}{(6x^2 + 1)^2} \cdot 4x = \frac{-12x}{(6x^2 + 1)^2}. \]
Sometimes a problem can be rephrased in terms of a composition in order to
get the derivative more easily.
Example 4.39. Find $\frac{d}{dx}\left((x^2-1)^{32}\right)$.
Solution 4.40. You could multiply out $(x^2-1)^{32}$ and then take the derivative
but this would take a while. Instead, think of $y = (x^2-1)^{32}$ as $y = u^{32}$ where
$u = x^2 - 1$. Then by the Chain Rule,
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = 32u^{31} \cdot 2x = 64x(x^2-1)^{31}. \]
Now let’s try to get a formula for the Chain Rule in terms of the functions $f$
and $g$. We have $y = f \circ g$ where we think of $y = f(u)$ and $u = g(x)$. The
Chain Rule says
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}. \]
Now $\frac{dy}{du}$ is $f'(u)$ where we have taken the derivative of $f$ with respect to $u$.
Remembering that $u = g(x)$, this means that $\frac{dy}{du} = f'(g(x))$. On the other
hand, $\frac{du}{dx}$ is $g'(x)$. Thus
\[ \frac{dy}{dx} = f'(g(x))\,g'(x). \]
Finally, since $y = f \circ g$, we have $\frac{dy}{dx} = (f \circ g)'$. Hence we have:
Theorem 4.41 (The Chain Rule). Let $f$ and $g$ be functions such that $g$ is
differentiable at $x$ and $f$ is differentiable at $g(x)$. Then
\[ (f \circ g)'(x) = f'(g(x))\,g'(x). \]
Remark 4.42. This is not a proof of the Chain Rule since (3) was not
rigorously justified.
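The formula in Theorem 4.41 translates almost word for word into code. The sketch below is only an illustration: `chain_rule` is a hypothetical helper name, and the caller is assumed to supply $f'$, $g$ and $g'$ as ordinary Python functions.

```python
import math


def chain_rule(df, g, dg):
    """Return the derivative of f∘g, given f', g and g' as Python callables."""
    return lambda x: df(g(x)) * dg(x)


# Example 4.39: y = (x^2 - 1)^32 with f(u) = u^32 and g(x) = x^2 - 1.
power_example = chain_rule(lambda u: 32 * u ** 31,
                           lambda x: x ** 2 - 1,
                           lambda x: 2 * x)

# Another illustration: y = sin(x^2) with f(u) = sin u and g(x) = x^2.
sine_example = chain_rule(math.cos, lambda x: x ** 2, lambda x: 2 * x)

if __name__ == "__main__":
    print(power_example(1.5))  # same value as 64x(x^2 - 1)^31 at x = 1.5
    print(sine_example(2.0))   # same value as 2x cos(x^2) at x = 2
```

Note that the helper never needs $f$ itself, only $f'$: the outer function enters the Chain Rule solely through its derivative evaluated at $g(x)$.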
The Chain Rule is an immensely powerful tool for calculating derivatives.
Example 4.43. Let $y = \left(\frac{x+1}{x-1}\right)^3$. Find $y'$.
Solution 4.44. Solution 1: Use the Leibniz technique. Let $u = \frac{x+1}{x-1}$ so that
$y = u^3$. Then
\[
\begin{aligned}
\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} &= 3u^2 \cdot \frac{1(x-1) - (x+1)(1)}{(x-1)^2} \\
&= 3\left(\frac{x+1}{x-1}\right)^2 \cdot \frac{-2}{(x-1)^2} \\
&= \frac{-6(x+1)^2}{(x-1)^4}.
\end{aligned}
\]
Solution 2: Use Theorem 4.41. We have $y = f \circ g$ where $f(x) = x^3$ and
$g(x) = \frac{x+1}{x-1}$. Note that $f'(x) = 3x^2$ so $f'(g(x)) = 3(g(x))^2$. We get
\[
\begin{aligned}
y'(x) = (f \circ g)'(x) = f'(g(x))\,g'(x) &= 3(g(x))^2 \left(\frac{x+1}{x-1}\right)' \\
&= 3\left(\frac{x+1}{x-1}\right)^2 \cdot \left(\frac{x+1}{x-1}\right)' \\
&= 3\left(\frac{x+1}{x-1}\right)^2 \cdot \frac{-2}{(x-1)^2} = \frac{-6(x+1)^2}{(x-1)^4}.
\end{aligned}
\]
Sometimes you can have multiple compositions.
Example 4.45. Let $y = \sqrt{\frac{1}{(x^2+1)^3 - 1}}$. Find $y'$.
Solution 4.46. Warning: the ideas used here are easy but the details are
technically harder.
Solution 1: Use the Leibniz technique. Let $v = x^2 + 1$ and let $u = \frac{1}{v^3 - 1}$, so
$y = \sqrt{u}$. Now the Chain Rule can be applied to $u$ as well as to $y$, so we obtain
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dv}\,\frac{dv}{dx} = \frac{1}{2\sqrt{u}} \cdot \frac{-3v^2}{(v^3-1)^2} \cdot 2x. \]
Substituting for $u$ and $v$ gives
\[ \frac{1}{2\sqrt{u}} \cdot \frac{-3v^2}{(v^3-1)^2} \cdot 2x = \frac{1}{2\sqrt{\frac{1}{(x^2+1)^3-1}}} \cdot \frac{-3(x^2+1)^2 \cdot 2x}{[(x^2+1)^3-1]^2}. \]
Solution 2: Use Theorem 4.41. We have $y = f \circ g \circ h$ where $f(x) = \sqrt{x}$,
$g(x) = \frac{1}{x^3-1}$ and $h(x) = x^2 + 1$. Note that $f'(x) = \frac{1}{2\sqrt{x}}$, $g'(x) = -\frac{3x^2}{(x^3-1)^2}$ and
$h'(x) = 2x$. Applying Theorem 4.41 twice gives
\[
\begin{aligned}
y'(x) &= f'(g(h(x)))\,g'(h(x))\,h'(x) \\
&= \frac{1}{2\sqrt{g(h(x))}} \cdot \left(-\frac{3(h(x))^2}{((h(x))^3-1)^2}\right) \cdot 2x \\
&= \frac{1}{2\sqrt{\frac{1}{(x^2+1)^3-1}}} \cdot \frac{-3(x^2+1)^2 \cdot 2x}{[(x^2+1)^3-1]^2}.
\end{aligned}
\]
The key to making the Chain Rule work for composites of multiple functions
is to start differentiating from the outermost function in the composite and
work inwards to the innermost function.
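The "outermost first, then work inwards" recipe can be sketched as a small loop. This is an illustrative helper rather than anything from the notes: `chain_derivative` and `EXAMPLE_4_45` are hypothetical names, and each layer of the composite is assumed to be given as a (function, derivative) pair.

```python
import math


def chain_derivative(layers, x):
    """Differentiate a composite f1(f2(...fn(x))) at the point x.

    `layers` lists (function, derivative) pairs from outermost to innermost.
    """
    # First compute the argument each layer receives, from the inside out.
    inputs = [x]
    for func, _ in reversed(layers[1:]):
        inputs.append(func(inputs[-1]))
    inputs.reverse()  # inputs[i] is now the argument fed to layers[i]

    # Then multiply the derivatives together, starting with the outermost.
    result = 1.0
    for (_, deriv), arg in zip(layers, inputs):
        result *= deriv(arg)
    return result


# The composite from Example 4.45: f(u) = sqrt(u), g(v) = 1/(v^3 - 1), h(x) = x^2 + 1.
EXAMPLE_4_45 = [
    (math.sqrt, lambda u: 1 / (2 * math.sqrt(u))),
    (lambda v: 1 / (v ** 3 - 1), lambda v: -3 * v ** 2 / (v ** 3 - 1) ** 2),
    (lambda x: x ** 2 + 1, lambda x: 2 * x),
]

if __name__ == "__main__":
    print(chain_derivative(EXAMPLE_4_45, 1.0))
```

The loop multiplies $f'(g(h(x)))$, $g'(h(x))$ and $h'(x)$ in that order, which is exactly the product written out in Solution 2 of Example 4.45.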
TIP: How do you get good at the Chain Rule? Practise. A lot. And when
you think you’re good at it, practise some more.
Differentiating Trig Functions. Fundamentally, you only need to know
two derivatives:
\[ \frac{d}{dx}(\sin x) = \cos x \quad\text{and}\quad \frac{d}{dx}(\cos x) = -\sin x. \tag{4} \]
The rest of the derivatives of the trig functions follow from their definitions
and the rules of differentiation. For example, since $\tan x = \frac{\sin x}{\cos x}$ the Quotient
Rule gives
\[ \frac{d}{dx}(\tan x) = \frac{\cos x \cos x - \sin x(-\sin x)}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x. \]
You can check yourself that the derivatives of the other trig functions are:
\[
\begin{aligned}
\frac{d}{dx}(\tan x) &= \sec^2 x \\
\frac{d}{dx}(\cot x) &= -\csc^2 x \\
\frac{d}{dx}(\sec x) &= \sec x \tan x \\
\frac{d}{dx}(\csc x) &= -\csc x \cot x.
\end{aligned}
\]
Using trig functions is a great way to practise the Chain Rule.
Example 4.47. Find the derivative of $y = \sin(x^2)$.
Solution 4.48. Using the Leibniz technique, let $u = x^2$ so $y = \sin u$. Then
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = \cos u \cdot 2x = \cos(x^2) \cdot 2x = 2x\cos(x^2). \]
Using Theorem 4.41, take $f(x) = \sin x$ and $g(x) = x^2$, so
\[ (f \circ g)'(x) = f'(g(x))\,g'(x) = \cos(g(x)) \cdot 2x = \cos(x^2) \cdot 2x. \]
Example 4.49. If $y = \tan(\sqrt{2x^2 + 3x - 1})$, find $y'$.
Solution 4.50. This time we’ll only use Theorem 4.41 and speed up a little
bit. This is a composite of three functions, $y = f \circ g \circ h$, where $f(x) = \tan(x)$,
$g(x) = \sqrt{x}$ and $h(x) = 2x^2 + 3x - 1$. So by Theorem 4.41,
\[
\begin{aligned}
y'(x) &= f'(g(h(x)))\,g'(h(x))\,h'(x) \\
&= \sec^2(g(h(x))) \cdot \frac{1}{2\sqrt{h(x)}} \cdot (4x+3) \\
&= \sec^2\!\left(\sqrt{2x^2+3x-1}\right) \cdot \frac{1}{2\sqrt{2x^2+3x-1}} \cdot (4x+3) \\
&= \sec^2\!\left(\sqrt{2x^2+3x-1}\right) \cdot \frac{4x+3}{2\sqrt{2x^2+3x-1}}.
\end{aligned}
\]
Let’s return to (4) and see why these derivatives make sense. Two limits play
an important role:
\[ \lim_{x \to 0} \frac{\sin x}{x} = 1 \quad\text{and}\quad \lim_{x \to 0} \frac{\cos x - 1}{x} = 0. \]
The first says that the tangent line to sin x as x approaches 0 matches the line
y = x, which is believable from the graph. The second says that the tangent
line to cos x as x approaches 0 matches the line y = 1, or equivalently, the
tangent line to cos x − 1 as x approaches 0 matches the line y = 0. Again,
thinking of the first interpretation, this is believable from the graph. Actually
working these out is not overly difficult but goes a bit beyond the scope of
what we want to do. Let’s take them as given and just use them.
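Although we are taking these two limits as given, they are easy to make plausible numerically. The short Python sketch below (the function names are illustrative) simply evaluates both quotients for a few small values of $h$.

```python
import math


def sin_ratio(h):
    """sin(h)/h, which should approach 1 as h approaches 0."""
    return math.sin(h) / h


def cos_ratio(h):
    """(cos(h) - 1)/h, which should approach 0 as h approaches 0."""
    return (math.cos(h) - 1) / h


# Print the two quotients for shrinking values of h.
for h in (0.1, 0.01, 0.001):
    print(h, sin_ratio(h), cos_ratio(h))
```

This is evidence rather than proof: the table only suggests where the quotients are heading.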
Theorem 4.51. We have $\frac{d}{dx}(\sin x) = \cos x$ and $\frac{d}{dx}(\cos x) = -\sin x$.
Proof. Since $\sin x$ and $\cos x$ can’t be interpreted as sums, products, quotients,
or compositions of other functions, we have to go back to the definition of the
derivative. So
\[ (\sin x)' = \lim_{h \to 0} \frac{\sin(x+h) - \sin x}{h} \quad\text{and}\quad (\cos x)' = \lim_{h \to 0} \frac{\cos(x+h) - \cos x}{h}. \]
We’ll need the trig identities
\[
\begin{aligned}
\sin(x+h) &= \sin x \cos h + \sin h \cos x \\
\cos(x+h) &= \cos x \cos h - \sin x \sin h.
\end{aligned}
\]
Now using the arithmetic of limits we obtain
\[
\begin{aligned}
(\sin x)' = \lim_{h \to 0} \frac{\sin(x+h) - \sin x}{h} &= \lim_{h \to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} \\
&= \lim_{h \to 0} \frac{\sin x(\cos h - 1)}{h} + \lim_{h \to 0} \frac{\cos x \sin h}{h} \\
&= \sin x \cdot \lim_{h \to 0} \frac{\cos h - 1}{h} + \cos x \cdot \lim_{h \to 0} \frac{\sin h}{h} \\
&= \sin x \cdot 0 + \cos x \cdot 1 = \cos x.
\end{aligned}
\]
The derivative of $\cos x$ is similar and left to you.
Challenge Problem 4.52. Finish the proof of Theorem 4.51 by showing
that $\frac{d}{dx}(\cos x) = -\sin x$.
Differentiating exponential and logarithmic functions. Recall that if
$a$ is a positive real number the exponential function is $f(x) = a^x$ and the
logarithmic function $g(x) = \log_a(x)$ is the inverse function of $a^x$. If we try to
differentiate $f(x) = a^x$ we obtain
\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{a^{x+h} - a^x}{h}. \]
Recall that $a^{x+h} = a^x a^h$ so $\frac{a^{x+h} - a^x}{h} = \frac{a^x(a^h - 1)}{h}$. Therefore
\[ f'(x) = \lim_{h \to 0} \frac{a^{x+h} - a^x}{h} = \lim_{h \to 0} \frac{a^x(a^h - 1)}{h} = \lim_{h \to 0} a^x \left(\frac{a^h - 1}{h}\right). \]
Next recall that $a^0 = 1$, so we can write $\frac{a^h - 1}{h}$ as $\frac{a^{0+h} - a^0}{h}$. Thus
\[ \lim_{h \to 0} \frac{a^h - 1}{h} = \lim_{h \to 0} \frac{a^{0+h} - a^0}{h} = f'(0). \]
Hence
\[ f'(x) = \lim_{h \to 0} a^x \left(\frac{a^h - 1}{h}\right) = f'(0)\,a^x. \]
That is, $(a^x)' = f'(0)\,a^x$: the derivative of an exponential function is a
constant times itself!
It turns out that there is a unique choice of $a$ for which $f'(0) = 1$. This
number is so special it’s given its own name: $e$. That is, $e$ is the unique real
number such that the exponential function $e^x$ is its own derivative:
\[ (e^x)' = e^x. \]
The inverse function of $e^x$ is the natural logarithm $\ln x$, where $\ln x = \log_e(x)$.
We won’t prove that $e$ really is the unique choice of $a$ for which $f'(0) = 1$, as
this is a bit subtle. Moreover, the usual proof is to first show the existence of
$\ln x$ via an integral, and obtain $e^x$ as its inverse. To do this we’d have to wait
until we do integration. But it seems sensible to introduce $e^x$ and $\ln x$ now,
with their derivatives, in order to expand our kit bag of useful functions we
can differentiate and play with.
We list as facts the following information:
\[
\begin{aligned}
\frac{d}{dx}(e^x) &= e^x \\
\frac{d}{dx}(a^x) &= a^x \ln a \\
\frac{d}{dx}(\ln x) &= \frac{1}{x} \\
\frac{d}{dx}(\log_a x) &= \frac{1}{x \ln a}.
\end{aligned}
\]
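Comparing the fact $(a^x)' = a^x \ln a$ with the earlier formula $(a^x)' = f'(0)\,a^x$ suggests that the constant $f'(0)$ is $\ln a$. The sketch below makes this plausible numerically by evaluating the quotient $\frac{a^h - 1}{h}$ for a small $h$; the helper name and the choice $h = 10^{-6}$ are assumptions made for illustration.

```python
import math


def slope_at_zero(a, h=1e-6):
    """Estimate f'(0) for f(x) = a**x using the quotient (a**h - 1) / h."""
    return (a ** h - 1) / h


# Compare the estimate with ln(a) for a few bases, including a = e.
for a in (2.0, math.e, 3.0):
    print(a, slope_at_zero(a), math.log(a))
```

For $a = e$ the estimate should come out very close to 1, matching the defining property of $e$.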
Example 4.53. Find the derivative of $y = e^{\sin 2x}$ at $x = \pi/2$.
Solution 4.54. The function $y$ is a composite $y = f \circ g \circ h$ where $f(x) = e^x$,
$g(x) = \sin x$ and $h(x) = 2x$. Note that $f'(x) = e^x$, $g'(x) = \cos x$ and $h'(x) = 2$.
So by the Chain Rule,
\[ y'(x) = f'(g(h(x)))\,g'(h(x))\,h'(x) = e^{g(h(x))} \cos(h(x)) \cdot 2 = 2e^{\sin 2x} \cos 2x. \]
(You should be getting the hang of the Chain Rule by now so that you don’t
have to write down the composite so explicitly.) Therefore, at $x = \pi/2$, we
have
\[ y'\!\left(\frac{\pi}{2}\right) = 2e^{\sin \pi} \cos \pi = 2e^0(-1) = -2. \]
Example 4.55. Find the derivative of $y = \ln(\cos \sqrt{x})$.
Solution 4.56. This is a composite of three functions again, but this time let’s
find the derivative without writing down the explicit composition. Starting
with the outermost function and taking derivatives going inwards, we obtain
\[ y' = \frac{1}{\cos \sqrt{x}} \cdot (-\sin \sqrt{x}) \cdot \frac{1}{2\sqrt{x}} = \frac{-\sin \sqrt{x}}{2\sqrt{x} \cos \sqrt{x}}. \]
Implicit Differentiation. All the functions we have considered to this point
have been of the form $y = f(x)$ where $f(x)$ only involved $x$’s. For example,
$y = \sin(x^2)$ or $y = e^{\sqrt{x}}$. However, when it comes to doing calculus in real
applications it’s common that the $y$’s and $x$’s are mixed up. For example,
$3xy^2 - 2x^3y - 4 = 0$ is awkward to solve explicitly for $y$. We say that $y$ is defined
implicitly by this equation.
Nevertheless, we want to be able to differentiate such a function to find $\frac{dy}{dx}$.
The way to do it is to think of $y$ a bit more abstractly and use the Chain Rule.
For example,
\[ (y^2)' = 2y \cdot y'. \]
That is, $y^2$ is a composite: first do $y$ (whatever that does) and then square it.
So to differentiate take the derivative of the square, giving $2y$, and multiply by
the derivative of the inner function, which is $y$. Its derivative is $y'$ (whatever
that happens to be).
Example 4.57. The function $y$ is defined implicitly by the equation
$3xy^2 - 2x^3y - 4 = 0$. Find $\frac{dy}{dx}$.
Solution 4.58. We need the Product Rule to differentiate something like $xy^2$,
so using this gives
\[
\begin{aligned}
(3xy^2 - 2x^3y - 4)' &= 3y^2 + 3x(y^2)' - (6x^2y + 2x^3y') \\
&= 3y^2 + 6xyy' - 6x^2y - 2x^3y' \\
&= (3y^2 - 6x^2y) + (6xy - 2x^3)y'.
\end{aligned}
\]
Thus, starting from $3xy^2 - 2x^3y - 4 = 0$ and differentiating both sides of the
equation, we get
\[ (3y^2 - 6x^2y) + (6xy - 2x^3)y' = 0. \]
Now solving for $y'$ gives
\[ y' = \frac{6x^2y - 3y^2}{6xy - 2x^3}. \]
Note in the last example we obtain $y'$ as a function of both $x$ and $y$. This is
okay and to be expected: since $y$ is not given explicitly in terms of $x$
you wouldn’t ordinarily expect $y'$ to be only in terms of $x$ either.
Example 4.59. Find $\frac{d^2y}{dx^2}$ given that $y^3 - x^2 = 3$.
Solution 4.60. First find $y'$. Differentiating implicitly gives
\[ 3y^2y' - 2x = 0 \]
and so
\[ y' = \frac{2x}{3y^2}. \]
Now differentiate both sides, using the Quotient Rule and implicit
differentiation on the right, to get
\[ y'' = \frac{2(3y^2) - 2x(6yy')}{(3y^2)^2} = \frac{6y^2 - 12xyy'}{9y^4}. \]
We know what $y'$ is so substitute it in and simplify to get
\[ y'' = \frac{6y^2 - 12xy \cdot \frac{2x}{3y^2}}{9y^4} = \frac{\frac{18y^4 - 24x^2y}{3y^2}}{9y^4} = \frac{18y^4 - 24x^2y}{27y^6} = \frac{6y^3 - 8x^2}{9y^5}. \]
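Example 4.59 happens to be one where $y$ can also be written explicitly, as $y = (3 + x^2)^{1/3}$, so the implicit answers can be checked against numerical slopes. The following Python sketch does that; the helper names and the step size are illustrative assumptions.

```python
def y(x):
    """Explicit solution of y^3 - x^2 = 3."""
    return (3 + x * x) ** (1 / 3)


def y_prime(x):
    """First derivative from implicit differentiation: y' = 2x / (3y^2)."""
    return 2 * x / (3 * y(x) ** 2)


def y_double_prime(x):
    """Second derivative from Solution 4.60: y'' = (6y^3 - 8x^2) / (9y^5)."""
    yy = y(x)
    return (6 * yy ** 3 - 8 * x * x) / (9 * yy ** 5)


def central_difference(f, x, h=1e-5):
    """Estimate f'(x) by the symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    print(y_prime(1.0), central_difference(y, 1.0))
    print(y_double_prime(1.0), central_difference(y_prime, 1.0))
```

Each printed pair should agree closely. In most implicit problems no explicit formula for $y$ is available, which is exactly why the implicit technique is needed.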
Differentiating Rational Powers. We know how to differentiate $x^n$ if $n$
is a positive integer: $(x^n)' = nx^{n-1}$. Challenge Problem 4.27 shows that the
same formula works if $n$ is a negative integer. What if $n$ is a rational number?
Let’s use implicit differentiation to work this out.
Theorem 4.61. If $p$ is an integer and $q$ is a positive integer, then
\[ \left(x^{\frac{p}{q}}\right)' = \frac{p}{q}\, x^{\frac{p}{q} - 1}. \]
Proof. Let $y = x^{\frac{p}{q}}$. Take $q$th powers of both sides to get
\[ y^q = x^p. \]
Now differentiate implicitly to get
\[ qy^{q-1}y' = px^{p-1}. \]
Solving for $y'$ gives
\[ y' = \frac{px^{p-1}}{qy^{q-1}} = \frac{p}{q}\, x^{p-1} y^{1-q}. \]
Since $y = x^{\frac{p}{q}}$ we have
\[ x^{p-1} y^{1-q} = x^{p-1} x^{\frac{p(1-q)}{q}} = x^{p-1+\frac{p(1-q)}{q}} = x^{\frac{(p-1)q + p(1-q)}{q}} = x^{\frac{p-q}{q}} = x^{\frac{p}{q} - 1}. \]
Therefore
\[ y' = \frac{p}{q}\, x^{\frac{p}{q} - 1}. \]
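Example 4.62. Find the derivatives of $y = x^{\frac{2}{3}} - 2x^{\frac{1}{4}}$ and $z = e^{x^{4/5}}$.
Solution 4.63. For the first one,
\[ y' = \frac{2}{3} x^{-\frac{1}{3}} - \frac{1}{2} x^{-\frac{3}{4}}. \]
For the second, use the Chain Rule to get
\[ z' = e^{x^{4/5}} \cdot \frac{4}{5} x^{-\frac{1}{5}} = \frac{4}{5} x^{-\frac{1}{5}} e^{x^{4/5}}. \]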
Differentiating Inverse Trig Functions. Recall that the six trig functions
have inverses, once restricted to an appropriate domain. What are their
derivatives?
Take for example $\sin^{-1} x$, which has domain $[-1, 1]$ and range $[-\frac{\pi}{2}, \frac{\pi}{2}]$. Since
$\sin^{-1} x$ is the inverse function for $\sin x$ we have $\sin(\sin^{-1} x) = x$. So if we let
$y = \sin^{-1} x$ then taking the sine of both sides gives $\sin y = x$. Now differentiate
implicitly to get
\[ \cos y \cdot y' = 1. \]
Solving for $y'$ gives
\[ y' = \frac{1}{\cos y}. \]
Here’s a trick. Since $\sin^2 t + \cos^2 t = 1$, we have $\cos t = \sqrt{1 - \sin^2 t}$ whenever
$\cos t \geq 0$, which is the case for $t = y$ because $y$ lies in $[-\frac{\pi}{2}, \frac{\pi}{2}]$. In our
case this gives
\[ y' = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}} \]
where the last step occurred since $y = \sin^{-1} x$ so $\sin y = \sin(\sin^{-1} x) = x$.
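Working similarly for the other trig functions we end up with:
\[
\begin{aligned}
\frac{d}{dx}(\sin^{-1} x) &= \frac{1}{\sqrt{1 - x^2}} \\
\frac{d}{dx}(\cos^{-1} x) &= \frac{-1}{\sqrt{1 - x^2}} \\
\frac{d}{dx}(\tan^{-1} x) &= \frac{1}{1 + x^2} \\
\frac{d}{dx}(\cot^{-1} x) &= \frac{-1}{1 + x^2} \\
\frac{d}{dx}(\sec^{-1} x) &= \frac{1}{|x|\sqrt{x^2 - 1}} \\
\frac{d}{dx}(\csc^{-1} x) &= \frac{-1}{|x|\sqrt{x^2 - 1}}
\end{aligned}
\]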
Note the odd form that the derivatives of $\sec^{-1} x$ and $\csc^{-1} x$ take, involving
absolute values. These are slightly unusual and won’t be dwelt on much in
what follows.
The usual rules (Chain Rule, Product Rule, Quotient Rule) apply as normal.
Example 4.64. Find the derivatives of $y = \tan^{-1}(2x^2)$ and $z = \sin^{-1}(\cos x)$.
Solution 4.65. Using the Chain Rule in the first case gives
\[ y' = \frac{1}{1 + (2x^2)^2} \cdot (2x^2)' = \frac{4x}{1 + 4x^4}. \]
The Chain Rule in the second case gives
\[ z' = \frac{1}{\sqrt{1 - (\cos x)^2}} \cdot (\cos x)' = \frac{1}{\sqrt{1 - \cos^2 x}}(-\sin x) = \frac{-\sin x}{\sqrt{\sin^2 x}} = \frac{-\sin x}{|\sin x|}. \]
Note that in the denominator we can’t say $\sqrt{\sin^2 x} = \sin x$ since $\sin x$ could
be negative while the square root is never negative. That’s why the
absolute value signs appeared. Note that if $\sin x$ is positive then $z' = -1$
whereas if $\sin x$ is negative then $z' = 1$. Moreover, $z'$ is not defined when
$\sin x = 0$.
Challenge Problem 4.66. Prove that $\frac{d}{dx}(\tan^{-1} x) = \frac{1}{1 + x^2}$.
Related Rates. Differentiation is a powerful tool for working out rates of
change in real problems.
Example 4.67. A spherical balloon is expanding. If the radius is increasing
at the rate of 2 cm per minute, at what rate is the volume increasing when the
radius is 5 cm?
Solution 4.68. First, we have to make sense of the problem. The volume
of the sphere is $V = \frac{4}{3}\pi r^3$ where $r$ is the radius. We’re looking for the rate
at which volume changes over time, that is, we’re looking for $\frac{dV}{dt}$. We are
told that the radius changes with respect to time, i.e. $\frac{dr}{dt} = 2$. Note that the
variable $t$ does not explicitly appear in the formula for the volume. So when
we differentiate we need to do so implicitly. This gives
\[ \frac{dV}{dt} = \frac{4}{3}\pi \cdot 3r^2 \frac{dr}{dt} = 4\pi r^2 \frac{dr}{dt}. \]
We’re told that the radius is increasing at the rate of 2 cm per minute, so
$\frac{dr}{dt} = 2$. Therefore $\frac{dV}{dt} = 8\pi r^2$. When $r = 5$ we get $\frac{dV}{dt} = 200\pi$. Remembering
the units, $\frac{dV}{dt} = 200\pi$ cm$^3$/min.
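One way to build confidence in a related-rates answer is to invent a concrete time history and differentiate it numerically. In the sketch below the function `radius` is a hypothetical model (the balloon starts at 1 cm and grows at 2 cm/min, so $r = 5$ at $t = 2$); only the growth rate comes from the example.

```python
import math


def volume(r):
    """Volume of a sphere of radius r."""
    return 4 / 3 * math.pi * r ** 3


def dV_dt(r, dr_dt):
    """Rate of change of volume from dV/dt = 4*pi*r^2 * dr/dt."""
    return 4 * math.pi * r ** 2 * dr_dt


def radius(t, r0=1.0, rate=2.0):
    """Hypothetical radius history: r0 cm at t = 0, growing at `rate` cm/min."""
    return r0 + rate * t


if __name__ == "__main__":
    h = 1e-6
    # Numerical slope of V(r(t)) at t = 2, where the radius is 5 cm.
    numeric = (volume(radius(2 + h)) - volume(radius(2 - h))) / (2 * h)
    print(numeric, dV_dt(5, 2), 200 * math.pi)
```

The starting radius does not matter: any history with $\frac{dr}{dt} = 2$ at the moment $r = 5$ gives the same rate.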
Example 4.69. Two ships, one heading east and the other heading west,
approach each other on parallel courses 8 miles apart. Given that each ship
is cruising at 20 miles per hour, at what rate is the distance between them
diminishing when they are 10 miles apart?
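Solution 4.70. It always helps to draw a picture.
[Figure: a right triangle with a vertical side of 8 miles, a horizontal side $x$, and hypotenuse $y$ joining the two ships; the upper ship heads west and the lower ship heads east.]
Here, the vertical line is the 8 mile difference between the parallel courses the
ships are moving in, $y$ is the distance between the two ships, and $x$ is the
“horizontal” distance between the ships.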
Now let’s figure out what the problem is asking for and what information we
have. We want to find the rate at which the distance between the ships is
changing, that is, we want $\frac{dy}{dt}$. We are told that the ships are approaching
one another so $x$ is decreasing, meaning $\frac{dx}{dt}$ should be negative. Further, the
ships are each moving at 20 miles per hour, so the total speed at which they
approach one another is 40 miles per hour. Therefore, $\frac{dx}{dt} = -40$. Finally, we
need to relate $x$ and $y$. But the triangle does this: $x^2 + 64 = y^2$. Therefore,
differentiating implicitly gives
\[ 2x \frac{dx}{dt} = 2y \frac{dy}{dt} \quad\Rightarrow\quad \frac{dy}{dt} = \frac{x}{y} \frac{dx}{dt}. \]
We want to know $\frac{dy}{dt}$ when $y = 10$. But when $y = 10$, from $x^2 + 64 = y^2$ we
get $x = 6$. Therefore
\[ \frac{dy}{dt} = \frac{6}{10} \cdot (-40) = -24. \]
That is, the distance between the ships is diminishing at 24 miles per hour.
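The relationship between the two rates in the ships problem can be packaged as a small function, which makes it easy to see how the answer changes with the horizontal gap. This is a sketch with illustrative names; the 8 mile course separation is taken from the example.

```python
import math


def separation(x):
    """Distance y between the ships when the horizontal gap is x (courses 8 miles apart)."""
    return math.sqrt(x ** 2 + 64)


def dy_dt(x, dx_dt):
    """dy/dt = (x / y) * dx/dt, from differentiating x^2 + 64 = y^2."""
    return x / separation(x) * dx_dt


if __name__ == "__main__":
    # Horizontal gaps of 60, 6 and 0 miles, closing at 40 miles per hour.
    for x in (60.0, 6.0, 0.0):
        print(x, separation(x), dy_dt(x, -40.0))
```

When the ships are far apart the distance shrinks at nearly the full 40 miles per hour, and at the instant they pass each other ($x = 0$) the rate is momentarily zero.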
5. Curve Sketching
The first and second derivatives give an enormous amount of information
about the shape of the graph of a function that will let us draw a rough but fairly
accurate sketch of how it looks without having to plot lots of points. The first
step is to find “critical points”.
Critical Points. Let’s start with a definition.
Definition 5.1. Let $f$ be a differentiable function defined on an open interval
$(a, b)$. A point $c$ in $(a, b)$ is a critical point if $f'(c) = 0$.
Example 5.2. If $f(x) = 2x^3 + 3x^2 - 12x + 7$, find the critical points of $f$.
Solution 5.3. We have $f'(x) = 6x^2 + 6x - 12$. To find the critical points
of $f$, find the solutions of $f'(x) = 0$. We have $6x^2 + 6x - 12 = 0$ implying that
$x^2 + x - 2 = 0$, which gives $(x + 2)(x - 1) = 0$. Therefore $f$ has critical points
at $x = -2$ and $x = 1$.
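When the derivative is a quadratic, as in Example 5.2, the critical points can also be found with the quadratic formula instead of factoring. The helper below is a hypothetical illustration and assumes the leading coefficient is nonzero.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (a nonzero), smallest first."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots, so no critical points
    root = math.sqrt(disc)
    # A set removes the duplicate when the two roots coincide.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


def f_prime(x):
    """Derivative of f(x) = 2x^3 + 3x^2 - 12x + 7 from Example 5.2."""
    return 6 * x ** 2 + 6 * x - 12


critical_points = quadratic_roots(6, 6, -12)

if __name__ == "__main__":
    print(critical_points)
    print([f_prime(c) for c in critical_points])  # each value should be zero
```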
We’ll see that critical points have something to do with maximums and
minimums.
Definition 5.4. A function $f$ has a local maximum at $x_0$ if there is an
interval $I$ with $x_0 \in I$ and $f(x_0) \geq f(x)$ for all $x$ in $I$.
A function $f$ has a local minimum at $x_0$ if there is an interval $I$ with $x_0 \in I$
and $f(x_0) \leq f(x)$ for all $x$ in $I$.
The word “local” is used because you’re thinking of $f$ as having a maximum
or minimum only on a small part of the graph of $f$ near $x_0$, rather than looking
at the whole graph of $f$. Note that the definition leaves open the possibility
that $I$ could be a closed interval, in which case the local max or local min
could be an endpoint.
For the moment, we’ll concentrate on points in the interior of intervals. The
following lemma is very important.
Lemma 5.5. If $f$ is differentiable on $(a, b)$ and has a local maximum (or a
local minimum) at a point $c$ in $(a, b)$ then $f'(c) = 0$. That is, $c$ is a critical
point of $f$.
Proof. Suppose that $f$ has a local maximum at $c$ for some $c \in (a, b)$.
Then $f(c) \geq f(x)$ for all $x$ sufficiently close to $c$. In particular, if $h > 0$ and $h$ is
sufficiently small, so that $c + h \in (a, b)$ and $c + h$ is close enough to $c$, then
$f(c + h) - f(c) \leq 0$. As $h$ is positive, this also implies that $\frac{f(c+h) - f(c)}{h} \leq 0$.
Further, this is true for every sufficiently small $h > 0$, so
\[ \lim_{h \to 0^+} \frac{f(c+h) - f(c)}{h} \leq 0. \]
This limit is very close to the definition of $f'(c)$, the only difference being
that it’s one-sided. But by hypothesis, $f$ is differentiable at $c$ so $f'(c)$
exists, meaning the limit from the left and the limit from the right are equal.
Therefore,
\[ f'(c) = \lim_{h \to 0^+} \frac{f(c+h) - f(c)}{h} \leq 0. \]
On the other hand, if $h < 0$ and $|h|$ is sufficiently small so that $c + h \in (a, b)$
then the same argument gives
\[ f'(c) = \lim_{h \to 0^-} \frac{f(c+h) - f(c)}{h} \geq 0. \]
Thus $f'(c)$ is both $\geq 0$ and $\leq 0$, meaning we must have $f'(c) = 0$. The case of
a local minimum is similar.
Maximum and Minimum values. We’ve seen local maximums and
minimums. There are also absolute maximums and minimums.
Definition 5.6. A function f has an absolute maximum at x0 if f (x0 ) ≥ f (x)
for every x in the domain of f .
The function has an absolute minimum at x0 if f (x0 ) ≤ f (x) for every x in
the domain of f .
Note that absolute maximum and minimum values do not have to be unique.
In an extreme case, take f (x) = 1 for all x ∈ R. Then f has an absolute
maximum of 1 and an absolute minimum of 1 at every point x in R.
For example, consider the following graph of a function $y = f(x)$:
[Figure (5): graph of $y = f(x)$ on $[a, b]$, with the points $a$, $x_1$, $x_2$ and $b$ marked on the $x$-axis.]
Observe that $f$ has a local maximum at the endpoint $a$, a local minimum at
the point $x_1$, a local maximum at the point $x_2$, and a local minimum at the
endpoint $b$. Since the value of $f$ at $x_2$ is larger than at $a$, $f(x_2)$ is the absolute
maximum, and since the value of $f$ at $x_1$ is less than the value at $b$, $f(x_1)$
is the absolute minimum. The sharp corner at $x_2$ has the distinct feature
of $f$ not being differentiable at $x_2$. This leads to a definition.
Definition 5.7. Let $f$ be a function and suppose there is a point $c$ such that
$f(c)$ is defined but $f'(c)$ is not. Then $c$ is a singular point.
Returning to (5), f had extreme values (local max’s and local min’s) at three
types of points:
• endpoints;
• singular points;
• critical points.
Example 5.8. Consider the function $f(x) = x^4 - x^2 + 2$ restricted to the
domain $[-2, 2]$. Find the absolute maximum and absolute minimum of $f$.
Solution 5.9. The first step is to find the critical points. We have
$f'(x) = 4x^3 - 2x = 2x(2x^2 - 1)$ so the critical points are $x = 0$, $x = 1/\sqrt{2}$ and
$x = -1/\sqrt{2}$. Notice that $f'$ is defined on all of $(-2, 2)$ so $f$ has no singular
points. Thus the extreme values of $f$ occur at critical points and endpoints.
To find the absolute maximum and absolute minimum value of $f$ simply
evaluate $f$ at these points. We get $f(-2) = 14$, $f(-1/\sqrt{2}) = 7/4$, $f(0) = 2$,
$f(1/\sqrt{2}) = 7/4$ and $f(2) = 14$. Therefore $f$ has an absolute maximum of 14
which it takes twice, at $x = \pm 2$, and an absolute minimum of $7/4$, which it
takes twice, at $x = \pm 1/\sqrt{2}$.
The Mean Value Theorem. Having found local minimums and maximums,
the next step is to try to determine on what intervals a function is increasing
or decreasing. The key tool for doing this is The Mean Value Theorem.
Theorem 5.10 (The Mean Value Theorem). Let $f$ be a continuous function
defined on the closed interval $[a, b]$ which is differentiable on the open interval
$(a, b)$. Then there is a point $c$ in $(a, b)$ such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
What is the theorem saying? Let’s start with a graph:
[Figure: graph of $f$ with the points $(a, f(a))$ and $(b, f(b))$ marked, and $a$, $c$, $b$ on the $x$-axis.]
On the one hand, $\frac{f(b) - f(a)}{b - a}$ is the slope of the secant line joining the points
$(a, f(a))$ and $(b, f(b))$ on the graph. On the other hand, $f'(c)$ is the slope of
the tangent line to the graph of $f$ at $c$. What the theorem says is that there
is some point $c$ between $a$ and $b$ such that the slope of the tangent line at $c$
matches the slope of the secant line connecting $(a, f(a))$ and $(b, f(b))$.
What is the theorem *not* saying? It does not say where the point c is. That
is, it does not give an algorithm for finding the point c. It just says that a
point c with the right property exists. Moreover, it does not say that c is
unique. It could be that there are many points c1 , . . . , ck in (a, b) with the
same property. The theorem only tells us that, no matter what f is, there will
always be at least one point c with the right property.
Example 5.11. Verify the Mean Value Theorem for the function $f(x) = x^2$
on $[1, 4]$.
Solution 5.12. The Mean Value Theorem says that there is a point $c$ in $(1, 4)$
such that $f'(c) = 2c$ equals $\frac{f(b) - f(a)}{b - a} = \frac{16 - 1}{4 - 1} = 5$, i.e. $2c = 5$. But just take
$c = 5/2$ and this works.
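The theorem itself only guarantees that $c$ exists, but for a specific function one can often hunt for it numerically. The sketch below uses bisection on $f'(x) - \frac{f(b)-f(a)}{b-a}$. It assumes that $f'$ is continuous and that this difference has opposite signs at $a$ and $b$, which holds in Example 5.11 but is not guaranteed in general; the function names are illustrative.

```python
def f(x):
    """The function from Example 5.11."""
    return x ** 2


def df(x):
    """Its derivative."""
    return 2 * x


def secant_slope(func, a, b):
    """Slope of the secant line through (a, func(a)) and (b, func(b))."""
    return (func(b) - func(a)) / (b - a)


def find_mvt_point(deriv, slope, a, b, tol=1e-12):
    """Bisection for deriv(c) = slope; assumes deriv - slope changes sign on [a, b]."""
    lo, hi = a, b
    g_lo = deriv(lo) - slope
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = deriv(mid) - slope
        if g_mid == 0:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid  # the sign change lies in the right half
        else:
            hi = mid               # the sign change lies in the left half
    return (lo + hi) / 2


if __name__ == "__main__":
    slope = secant_slope(f, 1, 4)
    print(slope, find_mvt_point(df, slope, 1, 4))
```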
Let’s now prove the Mean Value Theorem.
Proof of Theorem 5.10. First we do a special case known as Rolle’s Theorem.
Suppose that $f(a) = f(b) = 0$, so we’re aiming to show there is a $c \in (a, b)$
such that $f'(c) = \frac{f(b) - f(a)}{b - a} = 0$. If we could show that $f$ has a maximum
or minimum value at some point $c$ in $(a, b)$, then Lemma 5.5 implies that
$f'(c) = 0$ and we’re done. Luckily, since $f$ is continuous on $[a, b]$, by the
Extreme Value Theorem, $f$ has a maximum and a minimum value on $[a, b]$.
However, this maximum or minimum may occur at an endpoint, while we
want it to be in $(a, b)$.
To get around this, observe that if $f(x) = 0$ for all $x \in (a, b)$ then $f'(x) = 0$
and we can take $c$ as any point of $(a, b)$. Otherwise, there is some $x_0 \in (a, b)$
such that $f(x_0) \neq 0$. Suppose that $f(x_0) > 0$, the case when $f(x_0) < 0$ being
similar. The maximum value at $c$ has the property that $f(c) \geq f(x_0) > 0$. But
as $f(a) = 0 = f(b)$, we cannot have $c$ equal to either $a$ or $b$. Hence $c \in (a, b)$.
Now return to the general case. Let’s do a trick. Define the function $g$ by
\[ g(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a). \]
Observe that $g$ is continuous on $[a, b]$ since it’s a sum/difference of continuous
functions, and it’s differentiable on $(a, b)$ for the same reason. Observe that
$g(a) = 0$ and $g(b) = 0$. So $g$ satisfies the special case, implying that there is a
$c \in (a, b)$ such that $g'(c) = 0$. But from the definition of $g$ we obtain
\[ g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}. \]
Thus $0 = g'(c) = f'(c) - \frac{f(b) - f(a)}{b - a}$, and therefore $f'(c) = \frac{f(b) - f(a)}{b - a}$.
Intervals of increase and decrease. We now use the Mean Value Theorem
to determine when a function is increasing or decreasing on an interval. First,
let’s define terms.
Definition 5.13. Let $f$ be a function defined on an interval $I$ and suppose
$x_1, x_2$ are two points in $I$.
• If $f(x_1) < f(x_2)$ whenever $x_1 < x_2$ then $f$ is increasing on $I$.
• If $f(x_1) > f(x_2)$ whenever $x_1 < x_2$ then $f$ is decreasing on $I$.
• If $f(x_1) \leq f(x_2)$ whenever $x_1 < x_2$ then $f$ is non-decreasing on $I$.
• If $f(x_1) \geq f(x_2)$ whenever $x_1 < x_2$ then $f$ is non-increasing on $I$.
Theorem 5.14. Let $f$ be a differentiable function on an interval $I$.
• If $f'(x) > 0$ for all $x \in I$ then $f$ is increasing on $I$.
• If $f'(x) < 0$ for all $x \in I$ then $f$ is decreasing on $I$.
• If $f'(x) \geq 0$ for all $x \in I$ then $f$ is non-decreasing on $I$.
• If $f'(x) \leq 0$ for all $x \in I$ then $f$ is non-increasing on $I$.
Proof. Let $x_1, x_2 \in I$ and suppose that $x_1 < x_2$. By the Mean Value
Theorem there is a point $c \in (x_1, x_2)$ such that $f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$. Notice that
$x_2 - x_1 > 0$, so $f'(c)$ and $f(x_2) - f(x_1)$ have the same sign. If $f'(x) > 0$ for all
$x \in I$ then $f'(c) > 0$, implying that $f(x_2) - f(x_1) > 0$. Thus $f(x_2) > f(x_1)$.
As this is true for all $x_1 < x_2$ in $I$, the function $f$ must be increasing on $I$.
The other cases are similar.
Example 5.15. Find the intervals of increase and decrease for the function
$f(x) = x^3 - 12x + 1$.
Solution 5.16. The derivative of $f$ is $f'(x) = 3x^2 - 12 = 3(x^2 - 4) =
3(x + 2)(x - 2)$. We want to know when $f'(x) > 0$ and $f'(x) < 0$. It helps to
first know when $f'(x) = 0$. We have $f'(x) = 3(x + 2)(x - 2) = 0$ when $x = \pm 2$.
This means we need to consider the intervals $(-\infty, -2)$, $(-2, 2)$ and $(2, \infty)$.
In each interval, $f'(x)$ will be either always positive or always negative. (For
if there is a sign change then at some point the graph of $f'(x)$ must cross
the $x$-axis, that is, $f'(x)$ must be zero, but we’ve already found all the points
where $f'(x) = 0$.) Therefore all we need to do is test $f'(x)$ on one point in
each interval. Observe that $f'(-3) = 15 > 0$ so $f$ is increasing on $(-\infty, -2)$;
$f'(0) = -12 < 0$ so $f$ is decreasing on $(-2, 2)$; and $f'(3) = 15 > 0$ so $f$ is
increasing on $(2, \infty)$.
Example 5.17. Show that for all $x > 0$ we have $\sin x < x$.
Solution 5.18. Let $f(x) = x - \sin x$. We want to show that $f(x) > 0$ for all
$x > 0$. Since $-1 \leq \sin x \leq 1$ for any $x$, if $x > 1$ then certainly $f(x) > 0$. So it
remains to consider $f$ on $(0, 1]$. We have $f(0) = 0$ and $f'(x) = 1 - \cos x$. Now
$\cos x \leq 1$ for any $x$, and $\cos x = 1$ only if $x$ is a multiple of $2\pi$. Thus $\cos x < 1$
on $(0, 1]$, implying that $f'(x) > 0$ on $(0, 1]$. By the Mean Value Theorem, for
each $x$ in $(0, 1]$ there is a point $c$ in $(0, x)$ with $f(x) - f(0) = f'(c) \cdot x > 0$, so
$f(x) > f(0) = 0$.
Testing critical points I. Given a function $f$ we can find its critical points
by finding solutions to $f'(x) = 0$. We also know that critical points have
something to do with local maximums and minimums, although the only definite
statement we have so far is logically going the other way: given a local max or
local min in an open interval on which $f$ is differentiable then it is a critical
point. So when we find solutions to $f'(x) = 0$, we don’t know immediately
whether these critical points are local minimums, local maximums, or neither.
We need a way of testing this.
There are two ways to do this. The first is to look at the intervals of increase
and decrease and correctly interpret information. This is called The First
Derivative Test, and is best seen through an example.
Example 5.19. Find and classify the critical points of the function
$f(x) = x^4 - 2x^2 - 2$.
Solution 5.20. First find the critical points: $f'(x) = 4x^3 - 4x = 4x(x^2 - 1) =
4x(x - 1)(x + 1)$, so the critical points are $x = 0$, $x = 1$ and $x = -1$. Consider
the intervals of increase and decrease:
        (−∞, −1)   (−1, 0)   (0, 1)   (1, ∞)
f'         −          +         −        +
f          ↘          ↗         ↘        ↗
Here, in each open interval we have evaluated $f'$ at one point in order to
determine whether it is positive or negative. For example, taking $-2 \in (-\infty, -1)$
gives $f'(-2) = -24$ and all we care about is the sign, which is negative.
Knowing that $f'(x) < 0$ on $(-\infty, -1)$, we know that $f$ is decreasing on this interval,
and so a downward pointing arrow is inserted in the table.
Now interpret what the intervals of increase and decrease say about local
maximums and minimums. As $f$ is decreasing on $(-\infty, -1)$ and then increases
on $(-1, 0)$, it must be the case that $f$ has a local minimum at $x = -1$. Similarly,
as $f$ is increasing on $(-1, 0)$ and decreasing on $(0, 1)$, it must be the case that $f$
has a local maximum at $x = 0$. The same reasoning shows that $f$ has a local
minimum at $x = 1$.
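The First Derivative Test amounts to comparing the sign of $f'$ just to the left and just to the right of each critical point. A minimal Python sketch for the critical points $x = -1, 0, 1$ found in Solution 5.20 follows; `classify` and the spacing `delta` are hypothetical choices, and `delta` must be small enough that no other critical point lies within that distance.

```python
def f_prime(x):
    """Derivative of f(x) = x^4 - 2x^2 - 2 from Example 5.19."""
    return 4 * x ** 3 - 4 * x


def classify(df, c, delta=0.5):
    """Classify the critical point c from the sign of df at c - delta and c + delta."""
    left, right = df(c - delta), df(c + delta)
    if left < 0 < right:
        return "local minimum"   # decreasing, then increasing
    if left > 0 > right:
        return "local maximum"   # increasing, then decreasing
    return "neither"


if __name__ == "__main__":
    for c in (-1, 0, 1):
        print(c, classify(f_prime, c))
```

A critical point where the sign of $f'$ does not change is reported as "neither".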
Remark 5.21. Having seen how it works, note that the local maximums and
minimums can be simply read off the table.
It’s worth being slightly cautious here. It’s possible for a critical point to be
neither a maximum nor a minimum.
Example 5.22. Find and classify the critical points of $f(x) = x^3$.
Solution 5.23. This is a simple but illustrative example, because you know
the graph of $y = x^3$ is always increasing, but it flattens out momentarily at
$x = 0$. Doing the algebra, $f'(x) = 3x^2$ so $f$ has a critical point at $x = 0$. The
intervals of increase and decrease are:
        (−∞, 0)   (0, ∞)
f'         +         +
f          ↗         ↗
The table shows that while $x = 0$ is a critical point, the function is increasing
both to the left and right of it, so $x = 0$ is neither a local maximum nor a
local minimum.
The second way of testing whether a critical point is a local maximum or a
local minimum uses the second derivative, but will be saved for later.
Concavity and Inflection Points. Now we see what extra information the
second derivative of $f$ adds. Essentially, the second derivative will say
something about whether the graph of $f$ is cupped upwards or cupped
downwards.
Definition 5.24. Let $f$ be a differentiable function on an open interval $I$.
• The graph of $f$ is concave up on $I$ if $f'$ is increasing on $I$.
• The graph of $f$ is concave down on $I$ if $f'$ is decreasing on $I$.
What does this mean in terms of the graph of $f$? Remember that the derivative
$f'(x)$ is the slope of the tangent line to the graph of $f$ at $x$. If $f'$ is increasing it
means that over time, travelling from left to right on the $x$-axis, the slopes of
the tangent lines to the graph of $f$ are increasing. Here is an example:
[Figure: a concave up graph with the point $x_0$ marked on the $x$-axis.]
Note that on the left part of the graph the slope of the tangent line is negative,
but as we go from left to right it becomes less and less negative, until it hits 0
at $x_0$, and then it becomes increasingly positive. It’s worth repeating this:
saying $f'$ is increasing does not mean the slope of the tangent line to the graph
of $f$ is positive, it could be increasing from $-2$ to $-1$, for example.
Similarly, if $f'$ is decreasing then the slopes of the tangent lines to the graph
of $f$ are decreasing, as in:
[Figure: a concave down graph with the point $x_0$ marked on the $x$-axis.]
Graphically, a function is concave up if the tangent lines are *below* the
graph of $f$ and it is concave down if the tangent lines are *above* the graph
of $f$.
Here is a third graph which has one part of $f$ being concave up and another
being concave down.
[Figure: a graph whose concavity changes at the point $x_0$ marked on the $x$-axis.]
The points at which the concavity changes are special.
Definition 5.25. A point $x_0$ is an inflection point or turning point of $f$ if
the graph of $f$ has a tangent line at $x_0$ and the concavity of $f$ is opposite on
opposite sides of $x_0$.
How do we find inflection points? How do we determine the intervals where
the graph of $f$ is concave up or concave down? The following theorem says
that it boils down to looking at the second derivative.
Theorem 5.26. Let $f$ be a function which is twice differentiable (that is, $f''$
exists) on an open interval $I$.
(1) If $f''(x) > 0$ on $I$ then $f$ is concave up on $I$.
(2) If $f''(x) < 0$ on $I$ then $f$ is concave down on $I$.
(3) If $f$ has an inflection point at $x_0$ then $f''(x_0) = 0$.
Before explaining why this works, let’s read some fine print. First, the theorem
does not say that if $f''(x_0) = 0$ then $x_0$ is an inflection point. It may be the case
that some point $x_0$ satisfying $f''(x_0) = 0$ is not an inflection point. (Just as
having $f'(c) = 0$ leaves open the possibility that $c$ is neither a local maximum
nor a local minimum.) Second, it could be the case that the concavity of $f$
changes at a sharp corner, but $f$ is not differentiable at this corner so it does
not qualify as an inflection point.
Proof. This is really just interpreting the information. Let $g = f'$ so $g' = f''$.
To say $g'(x) > 0$ on $I$ means $g$ is increasing on $I$, that is, $f'$ is increasing
on $I$. So by definition, $f$ is concave up on $I$. Similarly, to say $g'(x) < 0$ on $I$
means $g$ is decreasing on $I$, that is, $f'$ is decreasing on $I$. So by definition, $f$
is concave down on $I$. This proves parts (1) and (2).
Finally, if $f$ has an inflection point at $x_0$ then the concavity of $f$ changes from
the left to the right of $x_0$. For example, if $f$ is concave up to the left of $x_0$
and concave down to the right of $x_0$ then $f'$ is increasing to the left of $x_0$
and decreasing to the right of $x_0$. So $f'$ has a local maximum at $x_0$, and since
$f'$ is differentiable there we must have $f''(x_0) = 0$. The same argument works if
$f$ is concave down to the left of $x_0$ and concave up to the right of $x_0$, where
$f'$ has a local minimum at $x_0$.
Example 5.27. Find the intervals of concavity and the inflection points of
$f(x) = x^6 - 10x^4$.
Solution 5.28. We have $f'(x) = 6x^5 - 40x^3$ so $f''(x) = 30x^4 - 120x^2 =
30x^2(x^2 - 4)$. Thus $f''(x) = 0$ when $x = 0$, $x = -2$ and $x = 2$, so the possible
inflection points of $f$ are $x = 0$, $x = -2$ and $x = 2$. Now consider the intervals
of concavity:
Interval          (−∞, −2)   (−2, 0)   (0, 2)   (2, ∞)
Sign of $f''$        +          −         −        +
Concavity of $f$     up        down      down      up
Here, like in the case of intervals of increase and decrease, we simply test $f''$ at
one point in each interval and only record the sign. We find that $f$ is concave
up on $(-\infty, -2)$, concave down on $(-2, 2)$ and concave up on $(2, \infty)$. The
change in concavity at $x = -2$ and $x = 2$ implies that both of these points are
inflection points. However, there is no change of concavity at $x = 0$ so this is
not an inflection point.
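The sign test in the table can be mimicked on a computer. The following Python sketch is not part of the course material: the names `f2` and `concavity` and the chosen test points are our own assumptions. It evaluates the second derivative from Solution 5.28 at one point in each interval and records only the sign.

```python
def f2(x):
    """Second derivative of f(x) = x**6 - 10*x**4, taken from Solution 5.28."""
    return 30 * x**4 - 120 * x**2


def concavity(x):
    """Return 'up' if f''(x) > 0, 'down' if f''(x) < 0, else 'undetermined'."""
    value = f2(x)
    if value > 0:
        return "up"
    if value < 0:
        return "down"
    return "undetermined"


# One hand-picked test point inside each interval cut out by x = -2, 0, 2.
test_points = {"(-inf, -2)": -3, "(-2, 0)": -1, "(0, 2)": 1, "(2, inf)": 3}
signs = {interval: concavity(x) for interval, x in test_points.items()}

for interval, shape in signs.items():
    print(f"{interval}: concave {shape}")
```

The concavity is the same on both sides of 0, so 0 is not an inflection point, while it changes across −2 and across 2.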
Testing Critical Points II. Information from the second derivative gives
another way of testing whether a critical point is a local maximum, a local
minimum or neither.
Theorem 5.29 (The Second Derivative Test). Suppose that $f'(c) = 0$ and
$f''(c)$ exists. Then
• if $f''(c) > 0$ then $f$ has a local minimum at $c$;
• if $f''(c) < 0$ then $f$ has a local maximum at $c$;
• if $f''(c) = 0$ then more information is needed to classify the critical
point.
This looks counter-intuitive: a positive second derivative says the function has
a local minimum. But think about it. As $f'(c) = 0$, the tangent line to the
graph of $f$ at $c$ is horizontal. As $f''(c) > 0$, Theorem 5.26 says that $f$ is concave
up around $c$. The only way to be concave up around $c$ and have a horizontal
tangent line at $c$ is for $f$ to have a local minimum at $c$. The same argument works for
a local maximum.
Example 5.30. Return to $f(x) = x^4 - 2x^2 - 2$ in Example 5.19. Classify the
critical points of $f$ using the Second Derivative Test.
Solution 5.31. We know what the answer should be, so we’re checking that
the Second Derivative Test gives the same answer. We have $f'(x) = 4x^3 - 4x$
so $f''(x) = 12x^2 - 4$. The critical points are $x = 0$, $x = 1$ and $x = -1$.
Evaluating, $f''(-1) = 8 > 0$ so $x = -1$ is a local minimum, $f''(0) = -4 < 0$
so $x = 0$ is a local maximum, and $f''(1) = 8 > 0$ so $x = 1$ is a local minimum.
Example 5.32. Return to $f(x) = x^3$ in Example 5.22. Classify the critical
points of $f$ using the Second Derivative Test.
Solution 5.33. We have $f'(x) = 3x^2$ so $f''(x) = 6x$. There is one critical
point at $x = 0$, and $f''(0) = 0$, so the Second Derivative Test does not give
enough information to determine whether $x = 0$ is a local maximum, a local
minimum or neither.
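The three cases of the Second Derivative Test translate directly into a small function. This is only a sketch under our own naming (`second_derivative_test`, `quartic_f2` and `cubic_f2` are hypothetical helpers). It assumes the critical point and the second derivative have already been worked out by hand, as in the two solutions above.

```python
def second_derivative_test(f2, c):
    """Classify a critical point c (where f'(c) = 0) using the sign of f''(c)."""
    value = f2(c)
    if value > 0:
        return "local minimum"
    if value < 0:
        return "local maximum"
    return "inconclusive"


def quartic_f2(x):
    """Second derivative of f(x) = x**4 - 2*x**2 - 2."""
    return 12 * x**2 - 4


def cubic_f2(x):
    """Second derivative of f(x) = x**3."""
    return 6 * x


for c in (-1, 0, 1):
    print(c, second_derivative_test(quartic_f2, c))
print(0, second_derivative_test(cubic_f2, 0))
```

An "inconclusive" result means only that this test says nothing; another method is then needed to classify the point.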
Vertical and Horizontal Asymptotes. The graph of a function has an

asymptote if it approaches the graph of a straight line. We’ll consider two
types of asymptotes: vertical asymptotes and horizontal asymptotes.
Definition 5.34. The graph of a function $f$ has a vertical asymptote at a
point $a$ if
\[ \lim_{x \to a^+} f(x) = \pm\infty \quad\text{or}\quad \lim_{x \to a^-} f(x) = \pm\infty. \]
For example, the graph of $f(x) = \frac{1}{x}$ satisfies both $\lim_{x \to 0^+} \frac{1}{x} = \infty$ and
$\lim_{x \to 0^-} \frac{1}{x} = -\infty$. Either of these limits shows that $f$ has a vertical asymptote
at $x = 0$.
Definition 5.35. The graph of a function $f$ has a horizontal asymptote $y = L$,
where $L$ is a constant, if
\[ \lim_{x \to \infty} f(x) = L \quad\text{or}\quad \lim_{x \to -\infty} f(x) = L. \]
This says that the graph of $f$ flattens out to a horizontal line if you go
far enough along the $x$-axis. For example, the graph of $f(x) = e^{-x}$ satisfies
$\lim_{x \to \infty} e^{-x} = 0$, so $f$ has a horizontal asymptote $y = 0$ as $x$ approaches $\infty$.
Example 5.36. Determine whether the function $f(x) = \frac{x}{x-2}$ has vertical or
horizontal asymptotes.
Solution 5.37. Vertical asymptotes tend to occur at points where the function
is not defined. In this case, $f$ is not defined at $x = 2$. Think of $f$ as a
product, $f(x) = x \cdot \frac{1}{x-2}$. As $x$ approaches 2 from the right, the factor $x$
approaches 2, and $x - 2$ becomes a very small positive number, implying that the
factor $\frac{1}{x-2}$ becomes a very large positive number. Thus $\lim_{x \to 2^+} \frac{x}{x-2} = \infty$.
Therefore $f$ has a vertical asymptote at $x = 2$.
We don’t strictly need to check the left hand limit at this point, but let’s do
so to be complete. Again think of $f$ as $f(x) = x \cdot \frac{1}{x-2}$. As $x$ approaches 2 from
the left, the factor $x$ approaches 2 and $x - 2$ becomes a very small negative
number, implying that the factor $\frac{1}{x-2}$ becomes a very large negative number.
Thus $\lim_{x \to 2^-} \frac{x}{x-2} = -\infty$.
For horizontal asymptotes, if $x$ is very large then $x$ and $x - 2$ have essentially
the same magnitude, so the limit of $\frac{x}{x-2}$ as $x$ approaches $\infty$ should be 1.
That’s intuition, let’s now check it works for real. Do a trick by writing the
denominator $x - 2$ as $x(1 - \frac{2}{x})$. This looks strange until you see it in action:
\[ \lim_{x \to \infty} \frac{x}{x-2} = \lim_{x \to \infty} \frac{x}{x(1 - \frac{2}{x})} = \lim_{x \to \infty} \frac{1}{1 - \frac{2}{x}} = 1 \]
since $\lim_{x \to \infty} \frac{2}{x} = 0$. Therefore $f$ has a horizontal asymptote of $y = 1$ as $x$
approaches $\infty$.
Similarly, $f$ has a horizontal asymptote of $y = 1$ as $x$ approaches $-\infty$.
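A quick numerical experiment supports these limits. The sketch below is our own illustration, not a proof; the sample points and the variable names are assumptions. It evaluates the function from Example 5.36 closer and closer to 2 from each side, and then very far out along the x-axis.

```python
def f(x):
    """The function from Example 5.36."""
    return x / (x - 2)


# Approach x = 2 from the right and from the left in steps 0.1, 0.01, ...
right = [f(2 + 10**-k) for k in range(1, 6)]
left = [f(2 - 10**-k) for k in range(1, 6)]

# Sample far out along the x-axis in both directions.
far_right = f(1e9)
far_left = f(-1e9)

print("from the right:", right)
print("from the left: ", left)
print("far right:", far_right, " far left:", far_left)
```

The values from the right grow without bound, the values from the left become large and negative, and both far-out values sit very close to 1.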
Curve Sketching. Now we put it all together. Given a function f (x) we

want to sketch its graph. The data we need is:
• critical points;
• intervals of increase or decrease;
• inflection points;
• intervals of concavity;
• asymptotes;
• x and y intercepts, if possible.
Example 5.38. Sketch the graph of the function $f(x) = x^4 - 4x^3 + 1$.
Solution 5.39. Let’s find critical points and inflection points first and then
put the intervals of increase, decrease and concavity on one table. We have
$f'(x) = 4x^3 - 12x^2 = 4x^2(x - 3)$, so $f$ has critical points at $x = 0$, $x = 3$.
Also, $f''(x) = 12x^2 - 24x = 12x(x - 2)$ so $f$ has possible inflection points at $x = 0$
and $x = 2$. Use the critical and inflection points to divide the real line into
intervals and test $f'$ and $f''$ at one point in each. We get
Interval          (−∞, 0)      (0, 2)       (2, 3)       (3, ∞)
Sign of $f'$         −            −            −            +
$f$              decreasing   decreasing   decreasing   increasing
Sign of $f''$        +            −            +            +
Concavity of $f$     up          down          up           up
The table gives us the shape of the graph of f .
To pin it down more, we need some values for f (x) at certain x. Ideally, we’d
find intercepts. However, finding solutions to f (x) = 0 is not terribly easy,
so let’s check the value of f at the critical and inflection points. We have
f (0) = 1, f (2) = −15 and f (3) = −26. Let’s also check for asymptotes. Since
f (x) is defined for all x there are no vertical asymptotes, and the limit of f
as x approaches ±∞ is ∞, so there are no horizontal asymptotes.
Putting it all together gives something like
[Figure: sketch of the graph of $f$, with $x = 2$ and $x = 3$ marked on the $x$-axis.]
Example 5.40. Sketch the graph of the function $f(x) = \frac{x^2 - 1}{x^2 - 4}$.
Solution 5.41. First observe that $f$ is not defined at $x = \pm 2$. Note that
$\lim_{x \to 2^+} f(x) = \infty$, $\lim_{x \to 2^-} f(x) = -\infty$, $\lim_{x \to -2^+} f(x) = -\infty$ and
$\lim_{x \to -2^-} f(x) = \infty$. So $f$ has vertical asymptotes at $x = \pm 2$. Speaking of asymptotes, note also
that as in Example 5.36 we have $\lim_{x \to \infty} f(x) = 1$ and $\lim_{x \to -\infty} f(x) = 1$.
So $f$ has a horizontal asymptote of $y = 1$ as $x$ approaches $\pm\infty$.
While we’re on $f$, we can find intercepts this time. We have $f(x) = 0$ if and
only if $x^2 - 1 = 0$, so $f$ has $x$-intercepts at $x = \pm 1$.
Now take derivatives. Doing the quotient rule gives
\[ f'(x) = \frac{-6x}{(x^2 - 4)^2}, \qquad f''(x) = \frac{6(3x^2 + 4)}{(x^2 - 4)^3}. \]
So $f$ has one critical point at $x = 0$, and no inflection points. Let’s put
together a table, where the real line is divided into intervals using the critical
point and the points where $f$ has a vertical asymptote. We get
Interval          (−∞, −2)     (−2, 0)      (0, 2)       (2, ∞)
Sign of $f'$         +            +            −            −
$f$              increasing   increasing   decreasing   decreasing
Sign of $f''$        +            −            −            +
Concavity of $f$     up          down         down          up
For pinning points, there are the intercepts at $x = \pm 1$ and $f(0) = \frac{1}{4}$. So for
the graph of $f$ we obtain something roughly like
[Figure: sketch of the graph of $f$, with the vertical asymptotes at $x = -2$ and $x = 2$ marked.]
6. Integration
Integration is about finding the area under the graph of a function f . We’ll see
that, remarkably, integration is an “inverse process” to differentiation. Let’s
start with the inverse process.
Antiderivatives. The idea is simple: given a function $f$, find another function
$g$ such that $g' = f$. More precisely:
Definition 6.1. Let $f$ be a function. A function $g$ is an antiderivative of $f$
if $g$ is differentiable and $g' = f$.
For example, since $\frac{d}{dx}(x^n) = nx^{n-1}$, an antiderivative of $x^{n-1}$ is $\frac{1}{n}x^n$. By
linearity we can find antiderivatives of polynomials.
Example 6.2. Find an antiderivative of $f(x) = x^3 - 2x + 3$.
Solution 6.3. An antiderivative is $g(x) = \frac{1}{4}x^4 - x^2 + 3x$.
Other functions can be introduced as well.
Example 6.4. Find an antiderivative of $f(x) = \sin x + x^{3/2} - e^x$.
Solution 6.5. Since the derivative of $\cos x$ is $-\sin x$, an antiderivative of $\sin x$
is $-\cos x$. Since the derivative of $e^x$ is itself, an antiderivative of $e^x$ is itself.
Thus an antiderivative of $f$ is $g(x) = -\cos x + \frac{2}{5}x^{5/2} - e^x$.
Here is an important point.
Lemma 6.6. Suppose that $g(x)$ is an antiderivative of $f(x)$, and $c$ is a constant.
Let $h(x)$ be a new function defined by $h(x) = g(x) + c$. Then $h$ is also
an antiderivative of $f$.
Proof. Since $g$ is an antiderivative of $f$ we have $g'(x) = f(x)$. To show that $h$
is an antiderivative of $f$ we need to show that $h'(x) = f(x)$ as well. But
$h'(x) = (g(x) + c)' = g'(x) = f(x)$ since the derivative of the constant
is 0.
What is this saying? It says that antiderivatives are *not* unique. Whatever
antiderivative you find you can always add a constant to get another antiderivative.
For example, if $f(x) = 3x^2$ then one antiderivative is $g(x) = x^3$
and another is $h(x) = x^3 - 17$.
Question: What is the geometric meaning of an antiderivative?
At this point, the answer is that it has no meaning. It’s just a game we can
play with functions. We’ll soon see that antiderivatives give a way of finding
integrals.
The definite integral. Let f be a continuous function and consider its

graph. We want to find the area under the curve between x = a and x = b.
“Under” is not a good term, really. For if the graph of f is above the x-axis we
want to count this as positive area, and if the graph of f is under the x-axis
we want to count this as negative area. So more accurately, we want to find
the area A between the curve and the x-axis that lies over [a, b]. How can we
do this?
Before aiming for an exact answer, let’s first try to approximate the area.
We do know how to calculate the area of a rectangle: it’s the base times the
height. So let’s fit together a sequence of rectangles that are below the curve
with total area $L$. Then the area $A$ is $\ge L$. Then let’s fit together a sequence
of rectangles that are above the curve, with total area $U$. Then the area $A$
is $\le U$. Here is a picture where the lower rectangles have their base on the
x-axis and their tops being the black horizontal lines below the curve, and the
upper rectangles have their base on the x-axis and their tops being the red
horizontal lines above the curve:
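[Figure: graph of $f$ over $[a, b]$ with the points $a, x_1, x_2, x_3, x_4, b$ marked on the $x$-axis, showing the lower rectangles (black tops) and the upper rectangles (red tops).]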
Notice that we were organised about this. The number of lower rectangles
equals the number of upper rectangles, and they have the same bases. The
height of the lower rectangles is the least value of the curve running along that
section of the base. The height of the upper rectangles is the greatest value
of the curve running along that section of the base.
More formally, let’s proceed as follows.
Step 1: Divide $[a, b]$ into $n$ segments $[a, x_1]$, $[x_1, x_2]$, ..., $[x_{n-2}, x_{n-1}]$, $[x_{n-1}, b]$
where $a < x_1 < x_2 < \cdots < x_{n-1} < b$. To align the notation, let $x_0 = a$ and
$x_n = b$. Let $\ell_i = x_i - x_{i-1}$ be the length of the segment $[x_{i-1}, x_i]$.
Step 2: Consider $f$ restricted to $[x_{i-1}, x_i]$. On that closed interval $f$ is continuous
so it has a maximum value and a minimum value. Let $M_i$ be the
maximum value and let $m_i$ be the minimum value.
Step 3: Now form rectangles over $[x_{i-1}, x_i]$. The lower rectangle has base $\ell_i$
and height $m_i$, so its area is $\ell_i m_i$. The upper rectangle has base $\ell_i$ and
height $M_i$, so its area is $\ell_i M_i$.
Step 4: Add up the areas of the lower and upper rectangles. The sum of the
areas of the lower rectangles is
\[ L(f, P) = \ell_1 m_1 + \cdots + \ell_n m_n \]
where we remember in the notation $L(f, P)$ that this total lower area depends
on the function $f$ and on how $[a, b]$ was divided (“partitioned”) into $n$ segments. The
sum of the areas of the upper rectangles is
\[ U(f, P) = \ell_1 M_1 + \cdots + \ell_n M_n. \]
Therefore, the area between the graph of $f$ and the $x$-axis that lies over $[a, b]$
satisfies
\[ L(f, P) \le A \le U(f, P). \]
You can imagine that if the intervals are made smaller then there’s less un-
counted area between A and the total area of the upper rectangles, and less
uncounted area between A and the total area of the lower rectangles. This
should continue to improve as the interval lengths get smaller. So in the limit
we should get an accurate approximation to A.
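To watch the squeeze happen, here is a rough Python sketch of Steps 1 to 4. It rests on assumptions that are ours, not the notes’: the segments all have equal length, and the minimum and maximum on each segment are estimated by sampling (so in general the results only approximate the lower and upper sums). The function name `lower_upper_sums` is hypothetical.

```python
def lower_upper_sums(f, a, b, n, samples=200):
    """Estimate the lower and upper sums of f over n equal segments of [a, b].

    The least and greatest values on each segment are found by sampling,
    so for a general f these are approximations rather than exact bounds.
    """
    width = (b - a) / n
    lower = 0.0
    upper = 0.0
    for i in range(n):
        left = a + i * width
        values = [f(left + width * k / samples) for k in range(samples + 1)]
        lower += width * min(values)
        upper += width * max(values)
    return lower, upper


# Try it on f(x) = x**2 over [0, 1]; the area under this curve is 1/3.
for n in (4, 40, 400):
    low, up = lower_upper_sums(lambda x: x * x, 0.0, 1.0, n)
    print(n, low, up, up - low)
```

For this increasing function the sampled minimum and maximum occur at the endpoints of each segment, so the sums are exact up to rounding, and the gap between them shrinks like 1/n.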
This limiting process is what is meant by the definite integral, which measures
the area $A$ between the curve of $f$ and the $x$-axis that lies over the
interval $[a, b]$. Being cautious, it’s necessary to know that this limit exists,
and that the limit for L(f, P ) equals the limit for U (f, P ). We won’t prove
it but all this holds, provided f is continuous. The following definition then
makes sense.
Definition 6.7. Let $f$ be a continuous function on the interval $[a, b]$. The
definite integral of $f$ from $a$ to $b$ is the unique number $I$ such that $L(f, P) \le
I \le U(f, P)$ for all partitions $P$ of $[a, b]$. We write this as
\[ I = \int_a^b f(x)\,dx. \]
The definite integral has nice properties because of the arithmetic of limits and
the fact that within those limits we’re just adding up areas of rectangles.
Theorem 6.8. Let $f$ and $g$ be continuous functions on the interval $[a, b]$.
Then:
(a) $\int_a^b (cf(x) + dg(x))\,dx = c\int_a^b f(x)\,dx + d\int_a^b g(x)\,dx$;
(b) $\int_a^a f(x)\,dx = 0$;
(c) $\int_a^t f(x)\,dx + \int_t^b f(x)\,dx = \int_a^b f(x)\,dx$;
(d) $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$;
(e) if $f(x) \le g(x)$ for all $x \in [a, b]$ then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
The Fundamental Theorem of Calculus. At this point we have a defini-

tion for the definite integral but no practical way to calculate it. One purpose
in this section is to show that an integral can be calculated using an antideriv-
ative.
Before stating the theorem, let’s get used to some notation. Let $f$ be a continuous
function on the interval $[a, b]$. Define a new function $F$ as follows. Let $x$
be a point in $[a, b]$. Let $F(x)$ be the area between the graph of $f$ and the $x$-axis
that lies over the interval $[a, x]$. That is, $F(x)$ is the indefinite integral
\[ F(x) = \int_a^x f(t)\,dt. \]
For example, suppose the graph of f lies above the x-axis so we are considering
area under the graph. Then F (x) is the area under the graph between a and
the point x. As x moves towards b, F is measuring the accumulation of
additional area.
Now we come to the first part of the Fundamental Theorem of Calculus.
Theorem 6.9 (The Fundamental Theorem of Calculus, Part I). Let $f$ be a
continuous function on the interval $[a, b]$. The function $F$ defined by
\[ F(x) = \int_a^x f(t)\,dt \]
is continuous on $[a, b]$, differentiable on $(a, b)$, and has derivative
\[ F'(x) = f(x). \]
Proof. We’ll only show that $F$ is differentiable and $F'(x) = f(x)$. By the
definition of the derivative,
\[ F'(x) = \lim_{h \to 0} \frac{F(x+h) - F(x)}{h}. \]
Let’s look at the righthand limit first, so $h > 0$. By definition of $F$ and
properties of the definite integral, we have
\begin{align*}
F(x+h) - F(x) &= \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \\
&= \int_a^x f(t)\,dt + \int_x^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \\
&= \int_x^{x+h} f(t)\,dt.
\end{align*}
Now $\int_x^{x+h} f(t)\,dt$ is the area between the graph of $f$ and the $x$-axis that lies
over $[x, x+h]$. We’re thinking of $h$ as being small. Let’s approximate this area
by going back to rectangles. Let $M_h$ be the maximum value of $f$ on $[x, x+h]$
and let $m_h$ be the minimum value of $f$ on $[x, x+h]$. Let $L$ be the rectangle
with base $[x, x+h]$ and height $m_h$, so its area is $hm_h$. Let $U$ be the rectangle
with base $[x, x+h]$ and height $M_h$, so its area is $hM_h$. Therefore
\[ hm_h \le \int_x^{x+h} f(t)\,dt \le hM_h. \]
Since $h > 0$, if we divide through by $h$ we get
\[ m_h \le \frac{1}{h}\int_x^{x+h} f(t)\,dt \le M_h. \]
That is,
\[ m_h \le \frac{F(x+h) - F(x)}{h} \le M_h. \tag{6} \]
As $h$ approaches 0, the minimum value of $f$ on $[x, x+h]$ and the maximum
value of $f$ on $[x, x+h]$ both approach $f(x)$, because $f$ is continuous. Therefore, taking limits in all
three terms in (6) we obtain
\[ f(x) \le \lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h} \le f(x). \]
Hence
\[ \lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h} = f(x). \]
Similarly for the lefthand limit. Therefore $F'(x)$ exists and it equals $f(x)$.
Example 6.10. If $F(x) = \int_{-1}^x \sin t\,dt$, find $F'(\pi/2)$.
Solution 6.11. By the Fundamental Theorem of Calculus, $F'(x) = \sin x$, so
$F'(\pi/2) = 1$.
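The answer to Example 6.10 can be sanity-checked numerically without using any antiderivative. In the sketch below, the area function is approximated by a midpoint rule and then differentiated with a difference quotient. Both choices, along with the step sizes and the names `integral`, `F` and `derivative`, are our own assumptions rather than methods from the notes.

```python
import math


def integral(f, a, b, n=2000):
    """Midpoint-rule approximation to the definite integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


def F(x):
    """Approximate area function: the integral of sin t from -1 to x."""
    return integral(math.sin, -1.0, x)


def derivative(g, x, h=1e-3):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


approx = derivative(F, math.pi / 2)
print(approx, math.sin(math.pi / 2))
```

The two printed numbers should agree to several decimal places, in line with the theorem’s claim that differentiating the area function gives back the integrand.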
Example 6.12. Suppose that
\[ \int_0^x tf(t)\,dt = \sin x. \]
Find $f(\pi)$ and $f'(\pi)$.
Solution 6.13. Let $F(x) = \int_0^x tf(t)\,dt - \sin x$, so $F(x) = 0$. Taking derivatives,
\[ F'(x) = xf(x) - \cos x \quad\text{and}\quad F'(x) = 0. \]
Equating the two expressions for $F'(x)$ gives
\[ xf(x) = \cos x. \]
Therefore, evaluating at $\pi$ gives $\pi f(\pi) = \cos\pi = -1$, so $f(\pi) = -\frac{1}{\pi}$. Further,
taking derivatives in $xf(x) = \cos x$ gives
\[ f(x) + xf'(x) = -\sin x \]
so evaluating at $\pi$ gives $f(\pi) + \pi f'(\pi) = -\sin\pi = 0$. Knowing that $f(\pi) = -\frac{1}{\pi}$,
we get $f'(\pi) = -\frac{1}{\pi}f(\pi) = \frac{1}{\pi^2}$.
The Fundamental Theorem of Calculus Part I says that if you first integrate f
and then differentiate you get f back again. So the integral of f is an an-
tiderivative of f ! We know that antiderivatives are unique up to adding a
constant, so if we can find an antiderivative of f then we know what the in-
tegral is, up to a constant. The next theorem tells us how to deal with the
constant in a definite integral.
Theorem 6.14. Let $f$ be a continuous function on the interval $[a, b]$. If $G$ is
any antiderivative of $f$ then
\[ \int_a^b f(t)\,dt = G(b) - G(a). \]
Proof. As mentioned, one antiderivative of $f$ is
\[ F(x) = \int_a^x f(t)\,dt. \]
If $G(x)$ is another antiderivative, then $F(x) - G(x) = c$ for some constant $c$.
By definition of $F$, we have $F(a) = 0$, so $c = -G(a)$. Therefore $F(x) - G(x) =
-G(a)$, or $F(x) = G(x) - G(a)$. Evaluating at $x = b$ and using the definition
of $F$ we get
\[ F(b) = \int_a^b f(t)\,dt = G(b) - G(a). \]

The Fundamental Theorem of Calculus Part II is what lets us calculate integrals.
Just find an antiderivative and evaluate at the endpoints.
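Example 6.15. Find $\int_1^2 (x^3 - 2x + 1)\,dx$.
Solution 6.16. An antiderivative of $x^3 - 2x + 1$ is $\frac{1}{4}x^4 - x^2 + x$. So
\begin{align*}
\int_1^2 (x^3 - 2x + 1)\,dx &= \left[\tfrac{1}{4}x^4 - x^2 + x\right]_1^2 \\
&= \left(\tfrac{1}{4}\cdot 2^4 - 2^2 + 2\right) - \left(\tfrac{1}{4}\cdot 1^4 - 1^2 + 1\right) \\
&= 2 - \tfrac{1}{4} = \tfrac{7}{4}.
\end{align*}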
Remark 6.17. In the previous example, another antiderivative of $x^3 - 2x + 1$
is $\frac{1}{4}x^4 - x^2 + x + \pi$. Evaluating this at $b$ adds an extra value of $\pi$, as does
evaluating at $a$. But these extra values cancel out because of the subtraction.
So it doesn’t matter which antiderivative you choose.
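Both points can be checked in a few lines of Python: the endpoint formula gives 7/4, and shifting the antiderivative by a constant changes nothing. The comparison against a midpoint-rule sum is our own addition (an assumption, not a method from the notes), and the names `G` and `midpoint_sum` are hypothetical.

```python
import math


def G(x, c=0.0):
    """Antiderivative of x**3 - 2*x + 1, shifted by an arbitrary constant c."""
    return x**4 / 4 - x**2 + x + c


def midpoint_sum(f, a, b, n=10000):
    """Midpoint-rule approximation to the definite integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


exact = G(2) - G(1)                      # endpoint evaluation with c = 0
shifted = G(2, math.pi) - G(1, math.pi)  # same, with c = pi
numeric = midpoint_sum(lambda x: x**3 - 2 * x + 1, 1, 2)

print(exact, shifted, numeric)
```

All three numbers should be 1.75 up to rounding and discretisation error.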
Example 6.18. Find $\int_{-\pi}^0 (\sin x - \cos x)\,dx$.
Solution 6.19. Going quicker now, we have
\[ \int_{-\pi}^0 (\sin x - \cos x)\,dx = \big[-\cos x - \sin x\big]_{-\pi}^0 = (-1 - 0) - (-(-1) - 0) = -2. \]
Area Problems. Let’s go back to the basic aim of integration, which is

to calculate areas. First, suppose you want to know the area between two
functions $f$ and $g$ on an interval $[a, b]$. Suppose for simplicity that $f(x)$ is
always $\ge g(x)$ for $x \in [a, b]$. Remember that $\int_a^b f(x)\,dx$ is the area between
the graph of $f$ and the $x$-axis, and similarly for $g$. So if we want to find the
area between $f$ and $g$ we take the area from $f$ to the $x$-axis and cancel out
the area from $g$ to the $x$-axis. That is, we look at
\[ \int_a^b (f(x) - g(x))\,dx. \]
Example 6.20. Find the area enclosed between the curves $y = 4x$ and $y = x^2$.
Solution 6.21. The first thing to do is find where the curves intersect so
we have bounds for the integration. Intersection points lie on both curves, so
they are solutions to the equation $4x = x^2$. That is, $0 = x^2 - 4x = x(x - 4)$,
so the points of intersection are at $x = 0$ and $x = 4$.
The next thing to do is find which curve is on top and which is on bottom.
Pick a point in $(0, 4)$ and evaluate both functions at that point. For example,
pick $x = 1$. Then on the line $y = 4x$ we get $y(1) = 4$ and on the parabola
$y = x^2$ we get $y(1) = 1$. So $y = 4x$ is on top and $y = x^2$ is on bottom.
The area between the two curves is therefore
\[ \int_0^4 (4x - x^2)\,dx = \left[2x^2 - \tfrac{1}{3}x^3\right]_0^4 = \left(32 - \tfrac{64}{3}\right) - (0 - 0) = \tfrac{32}{3}. \]
Next, suppose that you want to know the total area enclosed between the
graph of f and the x-axis, but some of the area is above the x-axis and some
is below. The definite integral counts area below the x-axis as negative but as
we want a total enclosed area we want to count it as positive. So we’ll have
to manually insert a minus sign.
Example 6.22. Determine the area enclosed by the graph of $y = x^2 - 2x$
and the $x$-axis, on the interval $[-1, 3]$.
Solution 6.23. First, we need to know where the graph crosses the $x$-axis,
as these are the points where the graph will change from being above the
$x$-axis to below or vice-versa. That is, find solutions to $0 = x^2 - 2x = x(x - 2)$.
These are $x = 0$ and $x = 2$.
Second, on the subintervals $(-1, 0)$, $(0, 2)$ and $(2, 3)$, check where the graph
of $y$ is above or below the $x$-axis. This is done by evaluating $y$ at a point from the
interior of each subinterval and checking its sign. We have $y(-\frac{1}{2}) = \frac{1}{4} + 1 > 0$
so $y$ is above the $x$-axis on $(-1, 0)$; $y(1) = 1 - 2 < 0$ so $y$ is below the $x$-axis
on $(0, 2)$; and $y(5/2) = \frac{25}{4} - 5 > 0$ so $y$ is above the $x$-axis on $(2, 3)$.
The total area $A$ enclosed is therefore
\begin{align*}
A &= \int_{-1}^0 (x^2 - 2x)\,dx - \int_0^2 (x^2 - 2x)\,dx + \int_2^3 (x^2 - 2x)\,dx \\
&= \left[\tfrac{1}{3}x^3 - x^2\right]_{-1}^0 - \left[\tfrac{1}{3}x^3 - x^2\right]_0^2 + \left[\tfrac{1}{3}x^3 - x^2\right]_2^3 \\
&= \left(0 - (-\tfrac{4}{3})\right) - \left(-\tfrac{4}{3} - 0\right) + \left(0 - (-\tfrac{4}{3})\right) \\
&= \tfrac{4}{3} + \tfrac{4}{3} + \tfrac{4}{3} = 4.
\end{align*}
The Indefinite Integral. The indefinite integral is basically just another

name for an antiderivative. For example, an antiderivative of $2x$ is $x^2$ and
another is $x^2 + 1$. Any two antiderivatives differ by a constant. We write
\[ \int 2x\,dx = x^2 + c \]
for the indefinite integral. Here, the bounds on the integration are left out as
we’re only looking for an antiderivative, not an area, and the “+c” is used to
indicate that the antiderivative can be changed up to any constant c.
Example 6.24. Find $\int \left(e^x - \cos x + x^{-1/3}\right)dx$.
Solution 6.25. We have
\[ \int \left(e^x - \cos x + x^{-1/3}\right)dx = e^x - \sin x + \tfrac{3}{2}x^{2/3} + c. \]
We’ll often use indefinite integrals because it’s easier to tackle methods of
integration without having to worry about carrying along bounds all the time.
WARNING: An indefinite integral without the “+c” is incorrect and you will
lose marks for doing this on the exam!
7. Methods of Integration
Differentiation is algorithmic: once you know the derivatives of the basic func-
tions and the rules of differentiation (including the Chain Rule), you can
differentiate anything you like. Integration is not algorithmic: there are no
formulas that are exact analogues of the Product Rule, the Quotient Rule
or the Chain Rule. So it is sometimes not straightforward to integrate, and
finding an integral can be something of an art form.
There are some helpful methods of integration. We’ll look at substitution,
integration by parts, and partial fractions. In what follows, we’re mainly
concerned with finding methods to allow for calculation rather than justifying
precisely why they work.
Methods of Integration: Substitution.

The method of substitution is the integral version of the Chain Rule. Remember
that
\[ \frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x) \]
so integrating (taking antiderivatives) we get
\[ \int f'(g(x))\,g'(x)\,dx = f(g(x)) + c. \]
Writing this a bit differently, let $u = g(x)$. Then $\frac{du}{dx} = g'(x)$, or loosely,
$du = g'(x)\,dx$. So
\[ \int f'(g(x))\,g'(x)\,dx = \int f'(u)\,du = f(u) + c = f(g(x)) + c. \]
WARNING: The argument we just went through is not rigorous, that’s why
we said “loosely”. Remember $\frac{du}{dx}$ is *not* a fraction, it’s just notation for
the derivative. It happens to be particularly suggestive notation because the
intuition that comes from loosely treating it like a fraction actually turns out to
be rigorously true, although this needs a proper proof, which we won’t give.
The best way to get used to substitution is to do examples.

Example 7.1. Find $\int e^x\sqrt{1 + e^x}\,dx$.
Solution 7.2. Let $u = 1 + e^x$. Then $du = e^x\,dx$, so
\[ \int e^x\sqrt{1 + e^x}\,dx = \int \sqrt{u}\,du = \int u^{1/2}\,du = \tfrac{2}{3}u^{3/2} + c = \tfrac{2}{3}(1 + e^x)^{3/2} + c. \]
Note the process in the last example: use $u$ for the substitution, integrate
to find a function in $u$, and substitute back to get an answer in the variable $x$.
Example 7.3. Find the definite integral
\[ \int_0^8 \frac{\cos\sqrt{x+1}}{\sqrt{x+1}}\,dx. \]
Solution 7.4. Let $u = \sqrt{x+1}$. Then $du = \frac{1}{2\sqrt{x+1}}\,dx$, or equivalently,
$\frac{1}{\sqrt{x+1}}\,dx = 2\,du$. In changing variables, we should change the bounds. From
the equation $u = \sqrt{x+1}$ we get: if $x = 0$ then $u = 1$, and if $x = 8$ then $u = 3$. So
\[ \int_0^8 \frac{\cos\sqrt{x+1}}{\sqrt{x+1}}\,dx = \int_1^3 2\cos u\,du = \big[2\sin u\big]_1^3 = 2\sin 3 - 2\sin 1. \]
Substitutions are very good for handling some trig integrals.

Example 7.5. Find $\int \sin^3 x\cos^5 x\,dx$.
Solution 7.6. If $u = \cos x$ then $du = -\sin x\,dx$. This leaves two more powers
of sine unaccounted for when we try to substitute in $u$. But we can use the
usual trick of writing $\sin^2 x = 1 - \cos^2 x$ to convert the sines into cosines. This
gives
\begin{align*}
\int \sin^3 x\cos^5 x\,dx &= \int \sin x\,(1 - \cos^2 x)\cos^5 x\,dx = -\int (1 - u^2)u^5\,du \\
&= \int (u^7 - u^5)\,du \\
&= \tfrac{1}{8}u^8 - \tfrac{1}{6}u^6 + c \\
&= \tfrac{1}{8}\cos^8 x - \tfrac{1}{6}\cos^6 x + c.
\end{align*}
8 3
For even powers of sine and cosine it’s handy to remember the half-angle
formulas
\[ \cos^2 x = \tfrac{1}{2}(1 + \cos 2x), \qquad \sin^2 x = \tfrac{1}{2}(1 - \cos 2x). \]
Example 7.7. Find $\int \sin^4 x\,dx$.
Solution 7.8. First, use the half-angle formula to convert some sines to
cosines:
\[ \sin^4 x = \left(\tfrac{1}{2}(1 - \cos 2x)\right)^2 = \tfrac{1}{4}(1 - 2\cos 2x + \cos^2 2x). \]
Now convert the $\cos^2 2x$ using the half-angle formula, to get $\cos^2 2x = \frac{1}{2}(1 + \cos 4x)$.
Therefore
\begin{align*}
\int \sin^4 x\,dx &= \tfrac{1}{4}\int \left(1 - 2\cos 2x + \tfrac{1}{2}(1 + \cos 4x)\right)dx \\
&= \tfrac{1}{4}\left(x - \sin 2x + \tfrac{1}{2}x + \tfrac{1}{8}\sin 4x\right) + c \\
&= \tfrac{3}{8}x - \tfrac{1}{4}\sin 2x + \tfrac{1}{32}\sin 4x + c.
\end{align*}
Remark 7.9. It’s always easy to check whether your indefinite integral is
correct. Simply differentiate the answer to see if you get the original function
back again. If so, your answer is right. If not, your answer is wrong.
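When differentiating by hand is tedious, the same check can be done approximately on a computer. The sketch below is our own illustration: the helper names, the sample points and the tolerance are all assumptions. It differentiates a proposed answer numerically and compares it with the integrand, using the answer from Example 6.24 together with a deliberately wrong variant.

```python
import math


def numerical_derivative(g, x, h=1e-5):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def looks_like_antiderivative(g, f, points, tol=1e-5):
    """Return True if g'(x) is within tol of f(x) at every sample point."""
    return all(abs(numerical_derivative(g, x) - f(x)) < tol for x in points)


def f(x):
    """Integrand from Example 6.24 (we only sample x > 0)."""
    return math.exp(x) - math.cos(x) + x ** (-1 / 3)


def g(x):
    """Proposed answer from Solution 6.25, with c = 0."""
    return math.exp(x) - math.sin(x) + 1.5 * x ** (2 / 3)


def g_wrong(x):
    """A deliberately wrong candidate: the sign of the sine term is flipped."""
    return math.exp(x) + math.sin(x) + 1.5 * x ** (2 / 3)


points = [0.5, 1.0, 2.0, 3.0]
print(looks_like_antiderivative(g, f, points))
print(looks_like_antiderivative(g_wrong, f, points))
```

A passing check at a handful of points is evidence rather than proof, but a failing check reliably tells you that something went wrong.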
Trig substitutions. A variant of the method of substitution involves certain
trig functions. Integrals involving $\sqrt{a^2 - x^2}$, $\sqrt{a^2 + x^2}$ or $\sqrt{x^2 - a^2}$ can sometimes
be handled using the substitutions $a\sin\theta = x$, $a\tan\theta = x$ or $a\sec\theta = x$
respectively.
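Example 7.10. Find $\int \sqrt{4 - x^2}\,dx$.
Solution 7.11. The $\sqrt{4 - x^2}$ suggests we use the substitution $2\sin\theta = x$.
Note this gives
\[ \sqrt{4 - x^2} = \sqrt{4 - 4\sin^2\theta} = \sqrt{4(1 - \sin^2\theta)} = \sqrt{4\cos^2\theta} = 2\cos\theta, \]
where we take $-\frac{\pi}{2} \le \theta \le \frac{\pi}{2}$ so that $\cos\theta \ge 0$.
Also, differentiating both sides of $2\sin\theta = x$ gives (loosely) $2\cos\theta\,d\theta = dx$.
So substituting for both $\sqrt{4 - x^2}$ and $dx$ we get
\begin{align*}
\int \sqrt{4 - x^2}\,dx &= \int (2\cos\theta)(2\cos\theta)\,d\theta = \int 4\cos^2\theta\,d\theta \\
&= 4\int \tfrac{1}{2}(1 + \cos 2\theta)\,d\theta \\
&= 2\theta + \sin 2\theta + c.
\end{align*}
The answer as it stands is awkward, as we started with the variable $x$ so
we should end with the variable $x$. For $\theta$ we can use $2\sin\theta = x$ to get
$\theta = \sin^{-1}(\frac{x}{2})$. But how do we handle $\sin 2\theta$? Let’s convert back to $x$ by using
$\sin 2\theta = 2\sin\theta\cos\theta$ and use a triangle to figure out $\cos\theta$:
[Figure: right triangle with angle $\theta$, hypotenuse 2, opposite side $x$ and adjacent side $\sqrt{4 - x^2}$.]
Here, from $2\sin\theta = x$ we have $\sin\theta = \frac{x}{2}$, so knowing that $\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}$
we obtain that the opposite side to the angle $\theta$ is $x$ and the hypotenuse is 2.
Therefore the adjacent side has length $\sqrt{4 - x^2}$. As $\cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}$ we
obtain $\cos\theta = \frac{\sqrt{4 - x^2}}{2}$. Thus
\begin{align*}
\int \sqrt{4 - x^2}\,dx &= 2\theta + \sin 2\theta + c = 2\theta + 2\sin\theta\cos\theta + c \\
&= 2\sin^{-1}\left(\tfrac{x}{2}\right) + \frac{x\sqrt{4 - x^2}}{2} + c.
\end{align*}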
Methods of Integration: Integration by Parts. Integration by parts is
the analogue of the Product Rule in differentiation. Suppose that $u(x)$ and
$v(x)$ are differentiable functions. Then
\[ \frac{d}{dx}(uv) = \frac{du}{dx}v + u\frac{dv}{dx}. \]
Adjusting the terms,
\[ u\frac{dv}{dx} = \frac{d}{dx}(uv) - v\frac{du}{dx}. \]
Integrating then gives
\[ \int u\frac{dv}{dx}\,dx = uv - \int v\frac{du}{dx}\,dx. \]
Simplifying the notation then gives
\[ \int u\,dv = uv - \int v\,du. \tag{7} \]
Example 7.12. Find $\int xe^x\,dx$.
Solution 7.13. We need to select $u$ and $dv$. Take $u = x$ and $dv = e^x\,dx$.
Then, being rough, differentiating gives $du = dx$ and integrating gives $v = e^x$.
Thus from $\int u\,dv = uv - \int v\,du$ we obtain
\[ \int xe^x\,dx = xe^x - \int e^x\,dx. \]
The integral on the right side is easy to work out, so the end result is
\[ \int xe^x\,dx = xe^x - e^x + c. \]
Remark 7.14. It may not always be easy to see what u and dv should be to
get the calculation going. Sometimes experimentation is needed.
Example 7.15. Find $\int x^2\ln x\,dx$.
Solution 7.16. It seems natural at first glance to take $u = x^2$ and $dv =
\ln x\,dx$. But right away you see that the integration to get $v$ is not immediate.
Counterintuitively, try taking $u = \ln x$ and $dv = x^2\,dx$. Then $du = \frac{1}{x}\,dx$ and
$v = \frac{1}{3}x^3$. So integration by parts gives
\begin{align*}
\int x^2\ln x\,dx &= \tfrac{1}{3}x^3\ln x - \int \tfrac{1}{3}x^3\cdot\tfrac{1}{x}\,dx \\
&= \tfrac{1}{3}x^3\ln x - \tfrac{1}{3}\int x^2\,dx \\
&= \tfrac{1}{3}x^3\ln x - \tfrac{1}{9}x^3 + c.
\end{align*}
Integration by parts sometimes lets you do sneaky moves.

Example 7.17. Find $\int \ln x\,dx$.
Solution 7.18. Let $u = \ln x$ and $dv = dx$. Then $du = \frac{1}{x}\,dx$ and $v = x$. So
integration by parts gives
\[ \int \ln x\,dx = x\ln x - \int x\cdot\tfrac{1}{x}\,dx = x\ln x - \int 1\,dx = x\ln x - x + c. \]
x
Sometimes it may be appropriate to do integration by parts twice.

Example 7.19. Find $\int x^2e^{-2x}\,dx$.
Solution 7.20. This is much like the first problem. Let $u = x^2$ and $dv =
e^{-2x}\,dx$. Then $du = 2x\,dx$ and $v = -\frac{1}{2}e^{-2x}$. Integration by parts gives
\[ \int x^2e^{-2x}\,dx = -\tfrac{1}{2}x^2e^{-2x} + \tfrac{1}{2}\int 2xe^{-2x}\,dx = -\tfrac{1}{2}x^2e^{-2x} + \int xe^{-2x}\,dx. \]
The integral on the right side can be handled by integration by parts again.
Now let $u = x$ and $dv = e^{-2x}\,dx$. Then $du = dx$ and $v = -\frac{1}{2}e^{-2x}$. We
therefore get
\[ \int xe^{-2x}\,dx = -\tfrac{1}{2}xe^{-2x} + \tfrac{1}{2}\int e^{-2x}\,dx = -\tfrac{1}{2}xe^{-2x} - \tfrac{1}{4}e^{-2x} + c. \]
Putting everything together gives
\[ \int x^2e^{-2x}\,dx = -\tfrac{1}{2}x^2e^{-2x} - \tfrac{1}{2}xe^{-2x} - \tfrac{1}{4}e^{-2x} + c. \]
Challenge Problem 7.21. Find and prove a formula for $\int x^ne^x\,dx$.
Integration by parts with sine and cosine sometimes involves doing parts twice
and then collecting like terms.
Example 7.22. Find $\int e^x\sin x\,dx$.
Solution 7.23. Let $u = \sin x$ and $dv = e^x\,dx$. Then $du = \cos x\,dx$ and $v = e^x$.
So integration by parts gives
\[ \int e^x\sin x\,dx = e^x\sin x - \int e^x\cos x\,dx. \]
For the integral on the right side, do integration by parts again with $u = \cos x$
and $dv = e^x\,dx$. Then $du = -\sin x\,dx$ and $v = e^x$, so
\[ \int e^x\cos x\,dx = e^x\cos x + \int e^x\sin x\,dx. \]
Note that the integral on the right is the one we started with, so we seem to be
going in circles. But if we put the two equations together we get
\[ \int e^x\sin x\,dx = e^x\sin x - \left(e^x\cos x + \int e^x\sin x\,dx\right) = e^x\sin x - e^x\cos x - \int e^x\sin x\,dx. \]
Collecting like terms then gives
\[ 2\int e^x\sin x\,dx = e^x\sin x - e^x\cos x. \]
Dividing by 2 and remembering to add the “+c” finally gives
\[ \int e^x\sin x\,dx = \tfrac{1}{2}e^x(\sin x - \cos x) + c. \]
2
Finally, let’s not forget that integration is about finding area.
Example 7.24. Find the area between the curve $y = \sqrt{x}\,e^{\sqrt{x}}$ and the $x$-axis
on the interval $[0, 4]$.
Solution 7.25. The area is given by the definite integral
\[ \int_0^4 \sqrt{x}\,e^{\sqrt{x}}\,dx. \]
Let’s forget about the bounds for the moment and work out the indefinite
integral first. We use a combination of integration methods. Start with a
substitution: let $u = \sqrt{x}$ so $du = \frac{1}{2\sqrt{x}}\,dx$. Notice that the latter implies
that $2\sqrt{x}\,du = dx$, and substituting for the $\sqrt{x}$ again, this gives $2u\,du = dx$.
Therefore
\[ \int \sqrt{x}\,e^{\sqrt{x}}\,dx = \int ue^u\cdot 2u\,du = 2\int u^2e^u\,du. \]
Adjusting Example 7.19, integration by parts twice gives
\[ \int u^2e^u\,du = u^2e^u - 2ue^u + 2e^u + c. \]
Thus
\[ \int \sqrt{x}\,e^{\sqrt{x}}\,dx = 2\left(xe^{\sqrt{x}} - 2\sqrt{x}\,e^{\sqrt{x}} + 2e^{\sqrt{x}}\right) + c. \]
Now going back to the definite integral, we get
\begin{align*}
\int_0^4 \sqrt{x}\,e^{\sqrt{x}}\,dx &= \left[2\left(xe^{\sqrt{x}} - 2\sqrt{x}\,e^{\sqrt{x}} + 2e^{\sqrt{x}}\right)\right]_0^4 \\
&= 2(4e^2 - 4e^2 + 2e^2) - 2(2e^0) \\
&= 4e^2 - 4.
\end{align*}
Methods of Integration: Partial Fractions. This method is sometimes
good for finding integrals of rational functions,
\[ \int \frac{P(x)}{Q(x)}\,dx \]
where $P(x)$ and $Q(x)$ are polynomials. The idea is to rewrite $\frac{P(x)}{Q(x)}$ as a sum
of terms which are easier to integrate.
Example 7.26. Find $\int \frac{x + 4}{x^2 - 5x + 6}\,dx$.
Solution 7.27. Focus on the term inside the integral first (called the inte-
grand ). Notice that the denominator factors, x2 − 5x + 6 = (x − 3)(x − 2).
Let’s look for an A and B such that
x+4 A B
= + .
(x − 3)(x − 2) x−3 x−2
Cross-multiplying gives
x+4 A(x − 2) + B(x − 3) (A + B)x + (−2A − 3B)
= = .
(x − 3)(x − 2) (x − 3)(x − 2) (x − 3)(x − 2)
Equating coefficients of powers of x in the numerator gives
A+B =1 − 2A − 3B = 4.
Solving for A and B gives A = 7 and B = −6. Thus
x+4 7 6
= − .
(x − 3)(x − 2) x−3 x−2
Now integrate to get
Z Z
x+4 7 6
dx = − dx = 7 ln |x − 3| − 6 ln |x − 2| + c.
x2 − 5x + 6 x−3 x−2
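The coefficient comparison above is a small linear system, so it is easy to check by machine. The following sketch is an illustration under stated assumptions: the helper `solve_2x2` (Cramer's rule) and the sample points are choices made for this example. It solves $A + B = 1$, $-2A - 3B = 4$ exactly and then tests the decomposition at a few points away from $x = 2$ and $x = 3$.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*A + a12*B = b1 and a21*A + a22*B = b2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("system is singular")
    A = Fraction(b1 * a22 - a12 * b2) / det
    B = Fraction(a11 * b2 - b1 * a21) / det
    return A, B

# Coefficient equations from the example: A + B = 1 and -2A - 3B = 4.
A, B = solve_2x2(1, 1, 1, -2, -3, 4)

def original(x):
    return Fraction(x + 4) / ((x - 3) * (x - 2))

def decomposed(x):
    return A / (x - 3) + B / (x - 2)

# Exact comparison at sample points that avoid the poles x = 2 and x = 3.
samples = [Fraction(0), Fraction(1), Fraction(5), Fraction(-7, 2)]
identity_holds = all(original(x) == decomposed(x) for x in samples)
print(A, B, identity_holds)
```

Exact rational arithmetic is used so that agreement at the sample points is a genuine equality rather than a floating-point coincidence.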
How did we know looking for an A and a B like this would work? Essentially,
this is what led to an effective comparison of coefficients in the numerators
after cross-multiplying.
This leads to a series of steps for doing partial fractions.
Step 1: If necessary, rewrite the fraction $\frac{P(x)}{Q(x)}$ so that the degree of the numerator is strictly less than the degree of the denominator. That is, if the degree of P(x) is ≥ the degree of Q(x) then do a long division to write P(x) = A(x)Q(x) + R(x) where A(x) is a polynomial and the degree of the remainder R(x) is strictly less than the degree of Q(x). Then $\frac{P(x)}{Q(x)} = A(x) + \frac{R(x)}{Q(x)}$. Integrating the polynomial A(x) is easy and we’re left with integrating $\frac{R(x)}{Q(x)}$.
Step 2: Factor the denominator Q(x). The Fundamental Theorem of Algebra (something you prove in Complex Analysis) says that Q(x) can be factored as a product of linear and quadratic terms. Some of these may repeat, for example, $Q(x) = (x-3)(x+1)^2(x^2+x+1)(x^2-x+2)^3$.
Step 3: Write $\frac{P(x)}{Q(x)}$ as a sum of terms where:
• each factor $(x-a)$ in Q(x) contributes a summand $\frac{A}{x-a}$;
• a repeated factor $(x-a)^k$ in Q(x) contributes a series of summands
\[ \frac{A_1}{x-a} + \frac{A_2}{(x-a)^2} + \cdots + \frac{A_k}{(x-a)^k}; \]
• each irreducible factor $x^2+ax+b$ in Q(x) contributes a summand $\frac{Bx+C}{x^2+ax+b}$;
• each repeated irreducible factor $(x^2+ax+b)^k$ in Q(x) contributes a series of summands
\[ \frac{B_1x+C_1}{x^2+ax+b} + \frac{B_2x+C_2}{(x^2+ax+b)^2} + \cdots + \frac{B_kx+C_k}{(x^2+ax+b)^k}. \]
For example, if
\[ \frac{P(x)}{Q(x)} = \frac{P(x)}{(x-3)(x+1)^2(x^2+x+1)(x^2-x+2)^3} \]
write
\[ \frac{P(x)}{(x-3)(x+1)^2(x^2+x+1)(x^2-x+2)^3} = \frac{A_1}{x-3} + \frac{A_2}{x+1} + \frac{A_3}{(x+1)^2} + \frac{B_1x+C_1}{x^2+x+1} + \frac{B_2x+C_2}{x^2-x+2} + \frac{B_3x+C_3}{(x^2-x+2)^2} + \frac{B_4x+C_4}{(x^2-x+2)^3}. \]
Example 7.28. Evaluate the definite integral $\int_1^2 \frac{x^2+2}{4x^5+4x^3+x} \, dx$.
Solution 7.29. This will be a longer example as there are lots of bits and pieces. First observe that
\[ 4x^5 + 4x^3 + x = x(4x^4 + 4x^2 + 1) = x(2x^2+1)^2. \]
So we aim to solve
\[ \frac{x^2+2}{x(2x^2+1)^2} = \frac{A}{x} + \frac{Bx+C}{2x^2+1} + \frac{Dx+E}{(2x^2+1)^2}. \]
Cross-multiplying gives
\[ \frac{x^2+2}{x(2x^2+1)^2} = \frac{A(2x^2+1)^2 + (Bx+C)x(2x^2+1) + (Dx+E)x}{x(2x^2+1)^2} = \frac{A(4x^4+4x^2+1) + B(2x^4+x^2) + C(2x^3+x) + Dx^2 + Ex}{x(2x^2+1)^2}. \]
Comparing coefficients gives equations
\[ 4A + 2B = 0, \qquad 2C = 0, \qquad 4A + B + D = 1, \qquad C + E = 0, \qquad A = 2. \]
Solving gives A = 2, B = −4, C = 0, D = −3, E = 0. Therefore
\[ \int \frac{x^2+2}{4x^5+4x^3+x} \, dx = \int \Big( \frac{2}{x} + \frac{-4x}{2x^2+1} + \frac{-3x}{(2x^2+1)^2} \Big) dx. \]
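On the right the substitution $u = 2x^2+1$ gives $du = 4x\,dx$ and so (ignoring the “+c” for now)
\[ \int \frac{-4x}{2x^2+1} \, dx = -\int \frac{1}{u} \, du = -\ln u \]
\[ \int \frac{-3x}{(2x^2+1)^2} \, dx = -\frac{3}{4} \int \frac{1}{u^2} \, du = \frac{3}{4} \cdot \frac{1}{u}. \]
Therefore
\[ \int \frac{x^2+2}{4x^5+4x^3+x} \, dx = \int \Big( \frac{2}{x} + \frac{-4x}{2x^2+1} + \frac{-3x}{(2x^2+1)^2} \Big) dx = 2\ln x - \ln u + \frac{3}{4} \cdot \frac{1}{u} \]
\[ = 2\ln x - \ln(2x^2+1) + \frac{3}{4} \cdot \frac{1}{2x^2+1} + c = \ln\Big( \frac{x^2}{2x^2+1} \Big) + \frac{3}{4} \cdot \frac{1}{2x^2+1} + c. \]
Finally, returning to the definite integral,
\[ \int_1^2 \frac{x^2+2}{4x^5+4x^3+x} \, dx = \Big[ \ln\Big( \frac{x^2}{2x^2+1} \Big) + \frac{3}{4} \cdot \frac{1}{2x^2+1} \Big]_1^2 = \Big( \ln\frac{4}{9} + \frac{3}{36} \Big) - \Big( \ln\frac{1}{3} + \frac{3}{12} \Big) = \ln\frac{4}{3} - \frac{1}{6}. \]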
In the linear case there is sometimes a faster way to solve the equations when comparing coefficients. This is useful to remember when later we turn to differential equations.
Example 7.30. Find $\int \frac{2x^2+3}{x(x-1)^2} \, dx$.
Solution 7.31. Using partial fractions, we want
\[ \frac{2x^2+3}{x(x-1)^2} = \frac{A}{x} + \frac{B}{x-1} + \frac{C}{(x-1)^2} = \frac{A(x-1)^2 + Bx(x-1) + Cx}{x(x-1)^2}. \]
Comparing numerators, we want
\[ 2x^2 + 3 = A(x-1)^2 + Bx(x-1) + Cx. \]
Substituting in x = 1 or x = 0 produces zero terms that help quickly determine the constants. Substituting in x = 1 gives 5 = C. Substituting in x = 0 gives 3 = A. For B, we can’t do the same trick, but now that we know A and C we can pick any other value of x and see what happens. Take, for example, x = 2 to get 11 = A + 2B + 2C = 3 + 2B + 10, so B = −1. Hence
\[ \int \frac{2x^2+3}{x(x-1)^2} \, dx = \int \Big( \frac{3}{x} - \frac{1}{x-1} + \frac{5}{(x-1)^2} \Big) dx = 3\ln|x| - \ln|x-1| - \frac{5}{x-1} + c. \]
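The substitution trick in this solution is mechanical enough to script. The sketch below is an illustration only (the function name `numerator` and the extra test points are choices made here): it recovers A and C by plugging in the roots x = 0 and x = 1 of the denominator, then gets B from one further value of x, and finally confirms the numerator identity at points that were not used to find the constants.

```python
from fractions import Fraction

def numerator(x):
    """Left-hand numerator 2x^2 + 3 from the example."""
    return 2 * x * x + 3

# x = 1 kills the A and B terms, leaving numerator(1) = C * 1.
C = Fraction(numerator(1))
# x = 0 kills the B and C terms, leaving numerator(0) = A * (0 - 1)^2.
A = Fraction(numerator(0))
# Any other x gives one equation for B; x = 2 is used as in the text.
x = 2
B = (numerator(x) - A * (x - 1) ** 2 - C * x) / Fraction(x * (x - 1))

def right_side(t):
    return A * (t - 1) ** 2 + B * t * (t - 1) + C * t

# Two quadratics that agree at more than two points are identical.
identity_holds = all(numerator(t) == right_side(t) for t in (3, -1, Fraction(1, 2)))
print(A, B, C, identity_holds)
```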
8. Indeterminate Forms and Improper Integrals
In this section limits, derivatives and integrals combine in certain cases.
L’Hospital’s Rule. This is a method for working out a limit
\[ \lim_{x \to a} \frac{f(x)}{g(x)} \]
when both f(x) and g(x) approach 0, or when both f(x) and g(x) approach ±∞.
Theorem 8.1 (L’Hospital’s Rule Version I). Let f and g be differentiable functions with $g'(x) \neq 0$. Suppose that
\[ \lim_{x \to a} f(x) = 0 \quad \text{and} \quad \lim_{x \to a} g(x) = 0. \]
If
\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = L \]
then
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = L. \]
Remark 8.2. In the statement of the theorem, the case x → ∞ or x → −∞ is also allowed. Further, if $\lim_{x \to a} \frac{f'(x)}{g'(x)} = \infty$ or $-\infty$ then $\lim_{x \to a} \frac{f(x)}{g(x)} = \infty$ or $-\infty$ respectively.
Example 8.3. Find $\lim_{x \to 1} \frac{\ln x}{1-x}$.
Solution 8.4. Observe that $\lim_{x \to 1} \ln x = 0$ and $\lim_{x \to 1} (1-x) = 0$, both $\ln x$ and $1-x$ are differentiable, and $(1-x)' \neq 0$. So the hypotheses of L’Hospital’s Rule are fulfilled. As the derivatives of $\ln x$ and $1-x$ are $\frac{1}{x}$ and $-1$ respectively, by L’Hospital’s Rule we obtain
\[ \lim_{x \to 1} \frac{\ln x}{1-x} = \lim_{x \to 1} \frac{1/x}{-1} = \lim_{x \to 1} \Big( -\frac{1}{x} \Big) = -1. \]
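Example 8.5. Find $\lim_{x \to 0} \frac{e^x - x - 1}{x^2}$.
Solution 8.6. A bit quicker this time, observe that $\lim_{x \to 0} (e^x - x - 1) = 0$ and $\lim_{x \to 0} x^2 = 0$. So by L’Hospital’s Rule,
\[ \lim_{x \to 0} \frac{e^x - x - 1}{x^2} = \lim_{x \to 0} \frac{e^x - 1}{2x}. \]
But now $\lim_{x \to 0} (e^x - 1) = 0$ and $\lim_{x \to 0} 2x = 0$ so we have to apply L’Hospital’s Rule again to get
\[ \lim_{x \to 0} \frac{e^x - 1}{2x} = \lim_{x \to 0} \frac{e^x}{2} = \frac{1}{2}. \]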
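A limit found with L’Hospital’s Rule can be sanity-checked by evaluating the quotient at points approaching the limit point. The sketch below does this for $\frac{e^x - x - 1}{x^2}$; it is a numerical illustration, not a proof, and the function name `ratio` and the sample points are assumptions made for this example. The standard library function `math.expm1` computes $e^x - 1$ accurately for small x.

```python
import math

def ratio(x):
    """(e^x - x - 1) / x^2, using expm1 to limit cancellation for small x."""
    return (math.expm1(x) - x) / (x * x)

# Approach 0 from both sides; the values should drift towards 1/2.
for x in (0.1, 0.01, 0.001, -0.001):
    print(x, ratio(x))
```

Taking x extremely small is not helpful here: the numerator and denominator both underflow towards rounding noise, which is one reason the analytic argument is still needed.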
Theorem 8.7 (L’Hospital’s Rule Version II). Let f and g be differentiable functions with $g'(x) \neq 0$. Suppose that
\[ \lim_{x \to a} f(x) = \pm\infty \quad \text{and} \quad \lim_{x \to a} g(x) = \pm\infty. \]
If
\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = L \]
then
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = L. \]
Remark 8.8. As before, the case x → ∞ or x → −∞ is also allowed. Also, if $\lim_{x \to a} \frac{f'(x)}{g'(x)} = \infty$ or $-\infty$ then $\lim_{x \to a} \frac{f(x)}{g(x)} = \infty$ or $-\infty$ respectively.
Example 8.9. Let r be any positive integer. Find $\lim_{x \to \infty} \frac{\ln x}{x^r}$.
Solution 8.10. Observe that $\lim_{x \to \infty} \ln x = \infty$ and $\lim_{x \to \infty} x^r = \infty$. Observe also that both $\ln x$ and $x^r$ are differentiable, and $(x^r)' = rx^{r-1}$ is nonzero since r > 0. Therefore, by L’Hospital’s Rule,
\[ \lim_{x \to \infty} \frac{\ln x}{x^r} = \lim_{x \to \infty} \frac{1/x}{rx^{r-1}} = \lim_{x \to \infty} \frac{1}{rx^r} = 0. \]
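Example 8.11. Let k be a positive integer. Find $\lim_{x \to \infty} \frac{x^k}{e^x}$.
Solution 8.12. Here, we have to use L’Hospital’s Rule repeatedly:
\[ \lim_{x \to \infty} \frac{x^k}{e^x} = \lim_{x \to \infty} \frac{kx^{k-1}}{e^x} = \lim_{x \to \infty} \frac{k(k-1)x^{k-2}}{e^x} = \cdots = \lim_{x \to \infty} \frac{k!}{e^x} = 0. \]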
One way to interpret the previous example is that the exponential function
grows faster than any power of x.
Sometimes an indeterminate form arises that appears differently, like $0 \cdot \infty$ or $0^0$, but a little craftiness allows for L’Hospital’s Rule to be used.
Example 8.13. Find $\lim_{x \to 0^+} \sqrt{x} \ln x$.
Solution 8.14. Superficially, we obtain an indeterminate form $0 \cdot (-\infty)$ to which L’Hospital’s Rule does not apply. But let’s rewrite $\sqrt{x}$ as $1 / \frac{1}{\sqrt{x}}$, so we are considering
\[ \lim_{x \to 0^+} \frac{\ln x}{1/\sqrt{x}}. \]
Now we have the indeterminate form $\frac{-\infty}{\infty}$ so L’Hospital’s Rule gives
\[ \lim_{x \to 0^+} \frac{\ln x}{1/\sqrt{x}} = \lim_{x \to 0^+} \frac{1/x}{-\frac{1}{2} x^{-3/2}} = \lim_{x \to 0^+} \big( -2\sqrt{x} \big) = 0. \]
Example 8.15. Find $\lim_{x \to 0^+} x^x$.
Solution 8.16. Now we have the indeterminate form $0^0$. To deal with limits in the exponent, take logarithms. We have $\ln(x^x) = x \ln x$, so now using the same idea as in the last example,
\[ \lim_{x \to 0^+} x \ln x = \lim_{x \to 0^+} \frac{\ln x}{1/x} = \lim_{x \to 0^+} \frac{1/x}{-1/x^2} = \lim_{x \to 0^+} (-x) = 0. \]
This, remember, is the logarithm of $\lim_{x \to 0^+} x^x$, which is not what we want. To get back to $\lim_{x \to 0^+} x^x$ we need to exponentiate. Thus
\[ \lim_{x \to 0^+} x^x = e^0 = 1. \]
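The two limits above can be watched numerically as x shrinks towards 0 from the right. The sketch below is illustrative only; the function names and the sample points are assumptions made for this example. It evaluates $x^x$ through its logarithm $x \ln x$, mirroring the "take logarithms, then exponentiate" idea in the solution.

```python
import math

def sqrt_x_log_x(x):
    """The product sqrt(x) * ln(x) for x > 0."""
    return math.sqrt(x) * math.log(x)

def x_to_the_x(x):
    """x^x for x > 0, computed as exp(x ln x)."""
    return math.exp(x * math.log(x))

for x in (0.1, 0.01, 1e-4, 1e-8):
    print(x, sqrt_x_log_x(x), x_to_the_x(x))
```

The first column of results should approach 0 and the second should approach 1, although slowly, because $\ln x$ grows (in size) without bound as x → 0+.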
Improper Integrals.
All the integrals we’ve considered so far have been of the form
\[ \int_a^b f(x) \, dx \]
for some interval [a, b]. What about an integral of the form
\[ \int_a^\infty f(x) \, dx \quad \text{or} \quad \int_{-\infty}^b f(x) \, dx \, ? \]
Such integrals are called improper.
We can make sense of such an improper integral by using a limit. If
\[ \lim_{b \to \infty} \int_a^b f(x) \, dx = L \]
for some finite number L then we write
\[ \int_a^\infty f(x) \, dx = L \]
and say that $\int_a^\infty f(x)\,dx$ converges to L. If
\[ \lim_{b \to \infty} \int_a^b f(x) \, dx \ \text{does not exist} \]
then we say that $\int_a^\infty f(x)\,dx$ diverges.
We can similarly analyse $\int_{-\infty}^b f(x)\,dx$ by considering
\[ \lim_{a \to -\infty} \int_a^b f(x) \, dx. \]
Example 8.17. Show that the improper integral $\int_1^\infty \frac{1}{x^2} \, dx$ converges.
Solution 8.18. Observe that
\[ \int_1^b \frac{1}{x^2} \, dx = \Big[ -\frac{1}{x} \Big]_1^b = -\frac{1}{b} - (-1) = 1 - \frac{1}{b}. \]
Therefore
\[ \lim_{b \to \infty} \int_1^b \frac{1}{x^2} \, dx = \lim_{b \to \infty} \Big( 1 - \frac{1}{b} \Big) = 1. \]
As the limit exists, $\int_1^\infty \frac{1}{x^2}\,dx$ converges and we have
\[ \int_1^\infty \frac{1}{x^2} \, dx = 1. \]
Example 8.19. Show that the improper integral $\int_1^\infty \frac{1}{x} \, dx$ diverges.
Solution 8.20. Observe that
\[ \int_1^b \frac{1}{x} \, dx = \Big[ \ln x \Big]_1^b = \ln(b) - \ln(1) = \ln b. \]
Therefore
\[ \lim_{b \to \infty} \int_1^b \frac{1}{x} \, dx = \lim_{b \to \infty} \ln b = \infty. \]
Since the limit does not exist, $\int_1^\infty \frac{1}{x}\,dx$ diverges.
One way to interpret the last two examples is to say that the area under the curve $\frac{1}{x^2}$ on the interval [1, ∞) is finite, while the area under the curve $\frac{1}{x}$ on [1, ∞) is infinite.
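The contrast between the two areas is easy to see by tabulating the partial areas on [1, b] as b grows. The sketch below simply evaluates the closed forms $1 - \frac{1}{b}$ and $\ln b$ obtained in the two solutions; the function names and the chosen values of b are assumptions made for this illustration.

```python
import math

def area_under_inverse_square(b):
    """Integral of 1/x^2 over [1, b], from the antiderivative -1/x."""
    return 1.0 - 1.0 / b

def area_under_inverse(b):
    """Integral of 1/x over [1, b], from the antiderivative ln x."""
    return math.log(b)

for b in (10.0, 1e3, 1e6, 1e12):
    print(b, area_under_inverse_square(b), area_under_inverse(b))
```

The first column of areas stays below 1 however large b becomes, while the second keeps growing, only very slowly.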
Challenge Problem 8.21. Let p be a real number. Show that the area under the curve $f(x) = \frac{1}{x^p}$ on the interval [1, ∞) is finite if and only if p > 1.
There are other types of improper integrals. One is of the form
\[ \int_{-\infty}^{\infty} f(x) \, dx. \]
In this case the thing to do is split the integral as
\[ \int_{-\infty}^{a} f(x) \, dx + \int_{a}^{\infty} f(x) \, dx \]
for some a and then regard each of the two summands as its own improper integral.
Example 8.22. Find $\int_{-\infty}^{\infty} \frac{1}{1+x^2} \, dx$.
Solution 8.23. First observe that, as an indefinite integral,
\[ \int \frac{1}{1+x^2} \, dx = \tan^{-1} x + c \]
since the derivative of $\tan^{-1} x$ is $\frac{1}{1+x^2}$. So we obtain
\[ \int_{-\infty}^{0} \frac{1}{1+x^2} \, dx = \lim_{a \to -\infty} \int_a^0 \frac{1}{1+x^2} \, dx = \lim_{a \to -\infty} \Big[ \tan^{-1} x \Big]_a^0 = \lim_{a \to -\infty} \big( \tan^{-1}(0) - \tan^{-1}(a) \big) = \lim_{a \to -\infty} \big( -\tan^{-1}(a) \big) = \frac{\pi}{2} \]
and
\[ \int_{0}^{\infty} \frac{1}{1+x^2} \, dx = \lim_{b \to \infty} \int_0^b \frac{1}{1+x^2} \, dx = \lim_{b \to \infty} \Big[ \tan^{-1} x \Big]_0^b = \lim_{b \to \infty} \big( \tan^{-1}(b) - \tan^{-1}(0) \big) = \lim_{b \to \infty} \tan^{-1}(b) = \frac{\pi}{2}. \]
Thus
\[ \int_{-\infty}^{\infty} \frac{1}{1+x^2} \, dx = \int_{-\infty}^{0} \frac{1}{1+x^2} \, dx + \int_{0}^{\infty} \frac{1}{1+x^2} \, dx = \frac{\pi}{2} + \frac{\pi}{2} = \pi. \]
Remark 8.24. Notice that $\frac{1}{1+x^2}$ takes the same value at x = c and x = −c for any c ≥ 0. Therefore the graph of $\frac{1}{1+x^2}$ is symmetric about the y-axis. So the area under the graph on the interval (−∞, 0] equals the area under the graph on the interval [0, ∞). Using this, the previous example could be simplified by calculating
\[ \int_{-\infty}^{\infty} \frac{1}{1+x^2} \, dx = 2 \int_{0}^{\infty} \frac{1}{1+x^2} \, dx. \]
Another type of improper integral is of the form
\[ \int_a^b f(x) \, dx \]
if the function f exists on [a, b) but is unbounded, or if f exists on (a, b] but is unbounded. The same limiting ideas work.
Example 8.25. Find $\int_0^1 \frac{1}{\sqrt{x}} \, dx$.
Solution 8.26. Observe that $f(x) = \frac{1}{\sqrt{x}}$ exists on (0, 1] but does not exist at x = 0, and $\lim_{x \to 0^+} \frac{1}{\sqrt{x}} = \infty$, so f is unbounded on (0, 1]. Therefore we consider
\[ \int_0^1 \frac{1}{\sqrt{x}} \, dx = \lim_{a \to 0^+} \int_a^1 \frac{1}{\sqrt{x}} \, dx = \lim_{a \to 0^+} \Big[ 2\sqrt{x} \Big]_a^1 = \lim_{a \to 0^+} \big( 2 - 2\sqrt{a} \big) = 2. \]
Finally, sometimes an improper integral may be hard to calculate explicitly but its convergence or divergence can be determined by comparing it to another integral that is known to converge or diverge. This is known as the comparison test.
Theorem 8.27 (The Comparison Test). Let f and g be continuous functions such that 0 ≤ f(x) ≤ g(x) for all x ∈ [a, ∞). The following hold:
(i) if $\int_a^\infty g(x)\,dx$ converges then so does $\int_a^\infty f(x)\,dx$;
(ii) if $\int_a^\infty f(x)\,dx$ diverges then so does $\int_a^\infty g(x)\,dx$.
Example 8.28. Determine whether the improper integral
\[ \int_1^\infty \frac{1}{\sqrt{1+x^3}} \, dx \]
converges or diverges.
Solution 8.29. It is not easy to integrate $\frac{1}{\sqrt{1+x^3}}$, but it is easy to integrate $\frac{1}{\sqrt{x^3}} = \frac{1}{x^{3/2}}$. Observe that for all x ∈ [1, ∞) we have $1 + x^3 > x^3 > 0$, implying that $\sqrt{1+x^3} > \sqrt{x^3} > 0$, which in turn implies that $0 < \frac{1}{\sqrt{1+x^3}} < \frac{1}{\sqrt{x^3}}$. Write $\frac{1}{\sqrt{x^3}}$ as $x^{-3/2}$. Then
\[ 0 \le \int_1^\infty \frac{1}{\sqrt{1+x^3}} \, dx \le \int_1^\infty x^{-3/2} \, dx. \]
Since
\[ \int_1^\infty x^{-3/2} \, dx = \lim_{b \to \infty} \int_1^b x^{-3/2} \, dx = \lim_{b \to \infty} \Big[ -2x^{-1/2} \Big]_1^b = \lim_{b \to \infty} \Big( -\frac{2}{\sqrt{b}} + 2 \Big) = 2 \]
the integral $\int_1^\infty x^{-3/2}\,dx$ converges and therefore by the Comparison Test so does $\int_1^\infty \frac{1}{\sqrt{1+x^3}}\,dx$.
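To see the Comparison Test at work numerically, one can approximate the partial integrals $\int_1^b \frac{1}{\sqrt{1+x^3}}\,dx$ and compare them with the exact partial integrals $2 - \frac{2}{\sqrt{b}}$ of the larger function $x^{-3/2}$. The sketch below is an illustration, not a proof of convergence; the helper `simpson`, its step count, and the values of b are assumptions made for this example.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def partial_integral(b):
    """Approximate the integral of 1/sqrt(1 + x^3) over [1, b]."""
    return simpson(lambda x: 1.0 / math.sqrt(1.0 + x ** 3), 1.0, b)

def comparison_bound(b):
    """Exact integral of x^(-3/2) over [1, b]."""
    return 2.0 - 2.0 / math.sqrt(b)

for b in (10.0, 100.0):
    print(b, partial_integral(b), comparison_bound(b))
```

The partial integrals increase with b but stay below the bound, which itself stays below 2; this is the behaviour the theorem turns into a proof.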
9. Differential Equations
Let y be a function that depends on a variable x, like $y = \sin x$ or $y = 2x^2 - 3x + e^x$. An ordinary differential equation (ODE) is an equation involving y and its derivatives. For example:
• $y'' + xy' - e^x = xy$;
• $\big( \frac{dy}{dx} \big)^2 - \frac{dy}{dx} - y = \sin x$;
• $\frac{d^2y}{dx^2} = c \sqrt{1 + \big( \frac{dy}{dx} \big)^2}$.
Definition 9.1. The order of an ODE is the order of the highest derivative in the equation.
In the examples above, the first equation has order 2 since the highest derivative involved is the second; the second equation has order 1 since, despite the square, the highest derivative involved is the first; and the third equation has order 2.
There are usually many solutions to an ODE. For example, consider the ODE
\[ \frac{dy}{dx} = y. \]
We have no tools for solving this yet, but it’s not hard to see that $y = e^x$ is a solution since $\frac{dy}{dx} = e^x = y$. Another solution is $y = \frac{1}{2} e^x$, because $\frac{dy}{dx} = \frac{1}{2} e^x = y$.
Definition 9.2. The general solution of an ODE is the most general function y = f(x) that satisfies the ODE.
Sometimes you want to specify which of the solutions is the one you want. This can be done with initial conditions or boundary conditions.
Example 9.3. Find the solution of the ODE $\frac{dy}{dx} = y$ satisfying y(0) = 1.
Solution 9.4. Again, we don’t have tools yet to properly analyse this, but notice that for the two solutions we know, $y = e^x$ and $y = \frac{1}{2} e^x$, the condition y(0) = 1 holds only for $y = e^x$. So that’s the solution we want.
Definition 9.5. An initial value problem (IVP) is an ODE together with an initial condition. A boundary value problem (BVP) is an ODE together with boundary conditions.
Warning: There is no single method for dealing with all ODE’s. There are different methods to deal with different types of ODE’s.
First order ODE’s.
A first order ODE is of the form
\[ y' = f(y, x) \]
for some function f involving y and x. Equivalently, we may write
\[ \frac{dy}{dx} = f(y, x). \]
Separable ODE’s. A first order ODE is separable if it can be algebraically rearranged to look like
\[ g(y) \frac{dy}{dx} = h(x). \]
Example 9.6. The ODE
\[ \frac{dy}{dx} = x^2 e^{-y} \]
is separable because it can be rearranged to look like $e^y \frac{dy}{dx} = x^2$. The ODE
\[ \frac{dy}{dx} = y + x \]
is not separable.
The Method for Solution: Given the separable ODE $g(y) \frac{dy}{dx} = h(x)$:
• Step 1: Formally rewrite the ODE as $g(y)\,dy = h(x)\,dx$.
• Step 2: Integrate $\int g(y)\,dy = \int h(x)\,dx + c$.
Example 9.7. Solve the ODE $e^y \frac{dy}{dx} = x^2$.
Solution 9.8. Observe that this ODE is separable. So formally rewrite it as
\[ e^y \, dy = x^2 \, dx. \]
That is, put all the y’s on one side and all the x’s on the other. Now integrate to get
\[ \int e^y \, dy = \int x^2 \, dx + c \implies e^y = \frac{1}{3} x^3 + c. \]
Solving for y by taking logarithms gives
\[ y = \ln\Big( \frac{x^3}{3} + c \Big). \]
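A candidate solution of an ODE can be checked without redoing the integration: substitute it back and see whether the two sides agree. The sketch below does this numerically for $y = \ln\big(\frac{x^3}{3} + c\big)$ and the ODE $e^y \frac{dy}{dx} = x^2$, estimating the derivative with a central difference. The function names, the step size `h`, and the sample values of x and c are assumptions made for this illustration.

```python
import math

def candidate(x, c):
    """Candidate solution y = ln(x^3/3 + c); needs x^3/3 + c > 0."""
    return math.log(x ** 3 / 3.0 + c)

def residual(x, c, h=1e-6):
    """e^y * dy/dx - x^2, with dy/dx estimated by a central difference."""
    dydx = (candidate(x + h, c) - candidate(x - h, c)) / (2.0 * h)
    return math.exp(candidate(x, c)) * dydx - x ** 2

for x, c in ((0.5, 1.0), (1.0, 2.0), (2.0, 0.5)):
    print(x, c, residual(x, c))  # residuals should be close to zero
```

A residual that is small for several choices of x and c is good evidence (though not a proof) that the formula really is a solution for every value of the constant.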
Note that Step 1 in the Method makes no real sense, since $\frac{dy}{dx}$ is not a fraction. However, the Method does work: to check, just differentiate both sides of $\int g(y)\,dy = \int h(x)\,dx + c$ and see if we can recover the ODE. We have
\[ \frac{d}{dx} \int g(y) \, dy = \frac{d}{dx} \Big( \int h(x) \, dx + c \Big) \]
\[ \implies \Big( \frac{d}{dy} \int g(y) \, dy \Big) \frac{dy}{dx} = \frac{d}{dx} \int h(x) \, dx \quad \text{(Chain Rule)} \]
\[ \implies g(y) \frac{dy}{dx} = h(x) \quad \text{(Fundamental Theorem of Calculus).} \]
The next example is an important one.
Example 9.9. Find the general solution of the ODE
\[ \frac{dy}{dx} = ky \]
where k is a constant.
Solution 9.10. Notice that the ODE is separable, so (assuming $y \neq 0$) rewrite it as
\[ \frac{1}{y} \frac{dy}{dx} = k. \]
Apply the method of solution to get:
\[ \frac{1}{y} \, dy = k \, dx \implies \int \frac{1}{y} \, dy = \int k \, dx \implies \ln|y| = kx + c \implies |y| = e^{kx+c}. \]
Tidying up, note that $e^{kx+c} = e^{kx} e^c$. So if we let $A = e^c$ then we obtain the solution
\[ |y| = Ae^{kx}. \]
Here, if y > 0 then $y = Ae^{kx}$ and if y < 0 then $y = -Ae^{kx}$. In either case, after absorbing the sign into the constant, $y = Ae^{kx}$ for some constant A. Thus the general solution of the ODE $\frac{dy}{dx} = ky$ is $y = Ae^{kx}$.
Another example is the logistic equation:
\[ \frac{dy}{dx} = ky(M - y) \]
where k and M are constants. This is often used to model occurrences within a population, like the spread of a disease or flow of information. For example, if M is the total population and y is the number of people with a contagious disease, then the ODE says that the rate $\frac{dy}{dx}$ at which the disease will spread is proportional to the number of people y who have the disease and the number of people M − y who don’t have the disease.
Example 9.11. Find the general solution to the logistic equation $\frac{dy}{dx} = ky(M - y)$.
Solution 9.12. Rewriting the ODE as
\[ \frac{1}{y(M-y)} \frac{dy}{dx} = k \]
shows that it is separable. Integrating gives
\[ (\ast) \qquad \int \frac{1}{y(M-y)} \, dy = \int k \, dx. \]
For the left hand side, use partial fractions. Skipping the details (you should check them), we get
\[ \frac{1}{y(M-y)} = \frac{1}{M} \cdot \frac{1}{y} + \frac{1}{M} \cdot \frac{1}{M-y}. \]
Therefore
\[ \int \frac{1}{y(M-y)} \, dy = \frac{1}{M} \int \frac{1}{y} \, dy + \frac{1}{M} \int \frac{1}{M-y} \, dy = \frac{1}{M} \ln y - \frac{1}{M} \ln(M-y) = \frac{1}{M} \ln\Big( \frac{y}{M-y} \Big). \]
So from (∗) we get
\[ \frac{1}{M} \ln\Big( \frac{y}{M-y} \Big) = kx + c. \]
Now solve for y. We have
\[ \ln\Big( \frac{y}{M-y} \Big) = M(kx + c) \implies \frac{y}{M-y} = e^{M(kx+c)} = Ae^{Mkx} \]
where $A = e^{Mc}$. Solving the right equation for y now gives
\[ y = \frac{AM}{A + e^{-Mkx}} \]
which is the general solution to the logistic equation.
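One way to build confidence in a closed-form solution like this is to integrate the ODE numerically and compare. The sketch below uses a classical fourth-order Runge-Kutta step (a standard numerical method that is not covered in these notes) on $\frac{dy}{dx} = ky(M-y)$ and compares the result with $y = \frac{AM}{A + e^{-Mkx}}$, where A is fixed by the starting value via $\frac{y}{M-y} = A$ at x = 0. The parameter values k, M, the starting value, and all function names are hypothetical choices for this illustration.

```python
import math

def logistic_closed_form(x, k, M, y0):
    """y = A*M / (A + exp(-M*k*x)) with A chosen so that y(0) = y0."""
    A = y0 / (M - y0)  # from y/(M - y) = A*exp(M*k*x) at x = 0
    return A * M / (A + math.exp(-M * k * x))

def rk4(f, x0, y0, x1, steps):
    """Integrate dy/dx = f(x, y) from x0 to x1 with classical RK4 steps."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

# Hypothetical parameters: population M, rate constant k, 10 people infected at x = 0.
k, M, y0 = 0.002, 1000.0, 10.0
numeric = rk4(lambda x, y: k * y * (M - y), 0.0, y0, 5.0, 1000)
exact = logistic_closed_form(5.0, k, M, y0)
print(numeric, exact)
```

Both values should be close to each other and below M, reflecting the fact that the logistic curve levels off at the total population.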
First order linear ODE’s. A first order linear ODE is of the form
\[ \frac{dy}{dx} + p(x)y = q(x) \]
for some continuous functions p(x) and q(x). Note:
• if p(x) = 0 then we have to solve $\frac{dy}{dx} = q(x)$, that is, simply find an antiderivative of q(x);
• if q(x) = 0 then we have to solve $\frac{dy}{dx} = -p(x)y$ which is separable.
So if both $p(x) \neq 0$ and $q(x) \neq 0$ the equation is more subtle.
Definition 9.13. A first order linear ODE with q(x) = 0 is called homogeneous. A first order linear ODE with $q(x) \neq 0$ is called non-homogeneous.
As the homogeneous case is separable, we have a method for solving already. The method for the non-homogeneous case uses a neat trick called the integrating factor. Let
\[ F(x) = \int p(x) \, dx. \]
The integrating factor is
\[ e^{F(x)}. \]
Observe that
\[ \frac{d}{dx} \big( y e^{F(x)} \big) = \frac{dy}{dx} e^{F(x)} + y \frac{d e^{F(x)}}{dx} \quad \text{(Product Rule)} \]
\[ = \frac{dy}{dx} e^{F(x)} + y e^{F(x)} \frac{dF(x)}{dx} \quad \text{(Chain Rule)} \]
\[ = \frac{dy}{dx} e^{F(x)} + y e^{F(x)} p(x) \quad \text{(Definition of } F(x)\text{)} \]
\[ = e^{F(x)} \Big( \frac{dy}{dx} + p(x)y \Big). \]
Notice that the term $\frac{dy}{dx} + p(x)y$ at the end is the left hand side of the first order linear ODE. Therefore, if we multiply the ODE $\frac{dy}{dx} + p(x)y = q(x)$ by $e^{F(x)}$ we get
\[ e^{F(x)} \Big( \frac{dy}{dx} + p(x)y \Big) = e^{F(x)} q(x) \implies \frac{d}{dx} \big( y e^{F(x)} \big) = e^{F(x)} q(x). \]
This leads to the method for solving such ODE’s.
The Method for Solution. Given a first order linear ODE
\[ (\ast) \qquad \frac{dy}{dx} + p(x)y = q(x): \]
• Step 1: Find the integrating factor $e^{F(x)}$.
• Step 2: Multiply (∗) by $e^{F(x)}$ to get $\frac{d}{dx}\big( y e^{F(x)} \big) = e^{F(x)} q(x)$.
• Step 3: Integrate to get $y e^{F(x)} = \int e^{F(x)} q(x) \, dx$.
• Step 4: Solve for y to get $y = e^{-F(x)} \int e^{F(x)} q(x) \, dx$.
Example 9.14. Find the solution of the ODE $\frac{dy}{dx} - 2y = x$ given the initial condition y(0) = 1.
Solution 9.15. First find the general solution. Observe that $\frac{dy}{dx} - 2y = x$ is a first order linear non-homogeneous ODE. Here p(x) = −2 and q(x) = x. Let
\[ F(x) = \int p(x) \, dx = \int -2 \, dx = -2x. \]
So the integrating factor is
\[ e^{F(x)} = e^{-2x}. \]
Multiply the ODE by the integrating factor to get
\[ e^{-2x} \Big( \frac{dy}{dx} - 2y \Big) = xe^{-2x} \implies \frac{d}{dx} \big( ye^{-2x} \big) = xe^{-2x}. \]
Integrate to get
\[ \int \frac{d}{dx} \big( ye^{-2x} \big) \, dx = \int xe^{-2x} \, dx, \quad \text{that is,} \quad ye^{-2x} = \int xe^{-2x} \, dx. \]
For the right hand side, integrate by parts (you do the details) to get
\[ \int xe^{-2x} \, dx = -\frac{x}{2} e^{-2x} - \frac{1}{4} e^{-2x} + c. \]
Therefore
\[ ye^{-2x} = -\frac{x}{2} e^{-2x} - \frac{1}{4} e^{-2x} + c \]
and solving for y gives
\[ y = -\frac{x}{2} - \frac{1}{4} + ce^{2x}. \]
This is the general solution of the ODE. For the initial condition y(0) = 1 we get
\[ 1 = y(0) = -\frac{0}{2} - \frac{1}{4} + ce^0 = -\frac{1}{4} + c \]
implying that $c = \frac{5}{4}$. Hence the solution of the ODE with initial condition y(0) = 1 is
\[ y = -\frac{x}{2} - \frac{1}{4} + \frac{5}{4} e^{2x}. \]
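The integrating factor method can also be carried out numerically, which gives an independent check on a hand computation. The sketch below is an illustration under stated assumptions: taking $F(x) = \int_0^x p(t)\,dt$ (so that F(0) = 0), Steps 3 and 4 with the initial condition y(0) = y0 become $y(x) = e^{-F(x)} \big( y_0 + \int_0^x e^{F(t)} q(t)\,dt \big)$, and the remaining integral is approximated with Simpson's rule. The names `simpson` and `solve_linear_ivp` are hypothetical helpers, and the caller is assumed to supply F in closed form.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def solve_linear_ivp(F, q, y0, x):
    """Value y(x) for dy/dx + p*y = q with y(0) = y0, where F(x) is the
    integral of p from 0 to x (so F(0) = 0)."""
    integral = simpson(lambda t: math.exp(F(t)) * q(t), 0.0, x)
    return math.exp(-F(x)) * (y0 + integral)

# The example dy/dx - 2y = x with y(0) = 1: p(x) = -2, so F(x) = -2x, and q(x) = x.
numeric = solve_linear_ivp(lambda x: -2.0 * x, lambda x: x, 1.0, 1.0)
# General solution from the text with c = 5/4, evaluated at x = 1.
closed_form = -1.0 / 2 - 1.0 / 4 + (5.0 / 4) * math.exp(2.0)
print(numeric, closed_form)
```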
Bernoulli ODE’s. A Bernoulli ODE is of the form
\[ \frac{dy}{dx} + p(x)y = q(x)y^n \]
for some n. Note that if n = 0 or n = 1 then this equation is linear. If $n \notin \{0, 1\}$ the equation is nonlinear.
The key to solving a Bernoulli ODE is to craftily turn it into a linear ODE. This is done using the substitution $z = y^{1-n}$. By implicit differentiation
\[ \frac{dz}{dx} = (1-n) y^{-n} \frac{dy}{dx} \implies \frac{dy}{dx} = \frac{y^n}{1-n} \frac{dz}{dx}. \]
Now divide the Bernoulli ODE by $y^n$ to get
\[ \frac{1}{y^n} \frac{dy}{dx} + p(x) y^{1-n} = q(x) \]
and substitute for $\frac{dy}{dx}$ and $y^{1-n}$ to get
\[ \frac{1}{1-n} \frac{dz}{dx} + p(x)z = q(x). \]
This is a first order linear ODE and can be solved in the usual way.
Example 9.16. Find the general solution to the ODE $\frac{dy}{dx} - \frac{y}{x} = xy^2$.
Solution 9.17. Taking $p(x) = -\frac{1}{x}$ and q(x) = x the ODE is
\[ \frac{dy}{dx} + p(x)y = q(x)y^2 \]
which is a Bernoulli ODE with n = 2. Use the substitution $z = y^{1-n} = y^{-1}$. Then we get the linear ODE
\[ \frac{1}{1-2} \frac{dz}{dx} + p(x)z = q(x) \implies -\frac{dz}{dx} - \frac{1}{x} z = x. \]
Let’s rewrite this as
\[ \frac{dz}{dx} + \frac{1}{x} z = -x. \]
Solve this using the integrating factor method. Now $F(x) = \int \frac{1}{x} \, dx = \ln x$ so
\[ e^{F(x)} = e^{\ln x} = x. \]
Multiply the ODE (in the z variable) by the integrating factor to get
\[ x \Big( \frac{dz}{dx} + \frac{1}{x} z \Big) = -x^2 \implies \frac{d}{dx} (zx) = -x^2. \]
Integrate to get
\[ zx = -\frac{x^3}{3} + c \]
and solve for z to get
\[ z = -\frac{x^2}{3} + \frac{c}{x}. \]
This solves the linear ODE for the variable z but we want to get back to the variable y so reverse the substitution. As $z = y^{-1}$ we obtain
\[ y^{-1} = -\frac{x^2}{3} + \frac{c}{x} = \frac{-x^3 + 3c}{3x} = -\frac{x^3 + C}{3x} \]
where C = −3c. (As c is an arbitrary constant, so is −3c, so we may as well just say C is the arbitrary constant.) Inverting then gives
\[ y = -\frac{3x}{x^3 + C} \]
which is the general solution to the Bernoulli ODE.
Clairaut ODE’s. A Clairaut ODE is of the form
\[ (8) \qquad y = x \frac{dy}{dx} + f\Big( \frac{dy}{dx} \Big) \]
for some differentiable function f. For example, if $f(z) = \sin z + z^2 - 1$ then
\[ y = x \frac{dy}{dx} + \sin\Big( \frac{dy}{dx} \Big) + \Big( \frac{dy}{dx} \Big)^2 - 1 \]
is a Clairaut ODE.
This is a peculiar ODE in that, in order to solve it, one does not integrate but instead differentiates! Using the Product and Chain Rules, differentiating (8) gives
\[ \frac{dy}{dx} = \frac{dy}{dx} + x \frac{d^2y}{dx^2} + f'\Big( \frac{dy}{dx} \Big) \frac{d^2y}{dx^2}. \]
Cancelling out the $\frac{dy}{dx}$ on each side gives
\[ 0 = x \frac{d^2y}{dx^2} + f'\Big( \frac{dy}{dx} \Big) \frac{d^2y}{dx^2} \implies 0 = \Big( x + f'\Big( \frac{dy}{dx} \Big) \Big) \frac{d^2y}{dx^2}. \]
Thus there is a solution if either
\[ x + f'\Big( \frac{dy}{dx} \Big) = 0 \quad \text{or} \quad \frac{d^2y}{dx^2} = 0. \]
If $\frac{d^2y}{dx^2} = 0$ then $\frac{dy}{dx} = c$ for some constant c. Substituting this back into Clairaut’s ODE (8) gives the solution
\[ y = cx + f(c). \]
This is the general solution. Note that y is simply a line. The other case is when $x + f'\big( \frac{dy}{dx} \big) = 0$. This turns out to have a unique solution called the singular solution, and it is the envelope of all the general solutions, in the following sense.
[Figure: envelope]
The singular solution can be difficult to work out and we won’t go into it. We’ll simply restrict to the general solution.
Example 9.18. Find the general solution to the ODE
\[ y = x \frac{dy}{dx} + \Big( \frac{dy}{dx} \Big)^3 - 2 \frac{dy}{dx}. \]
Solution 9.19. Observe that this is a Clairaut ODE with
\[ f\Big( \frac{dy}{dx} \Big) = \Big( \frac{dy}{dx} \Big)^3 - 2 \frac{dy}{dx}. \]
That is, $f(z) = z^3 - 2z$. The general solution is of the form y = cx + f(c) for some constant c, so in this case the general solution is
\[ y = cx + c^3 - 2c. \]
Second order linear ODE’s with constant coefficients. This is our first type of second order ODE, involving the second derivative. A second order linear ODE with constant coefficients is of the form
\[ a \frac{d^2y}{dx^2} + b \frac{dy}{dx} + cy = f(x) \]
where a, b and c are constants and f is some function depending only on x.
Method of Solution.
• Step 1: Find the general solution of the associated homogeneous ODE
\[ a \frac{d^2y}{dx^2} + b \frac{dy}{dx} + cy = 0. \]
This is called the complementary function (CF).
• Step 2: Find any solution you can to the full nonhomogeneous ODE
\[ a \frac{d^2y}{dx^2} + b \frac{dy}{dx} + cy = f(x). \]
This is called a particular integral (PI).
• Step 3: The general solution to the full nonhomogeneous ODE
\[ a \frac{d^2y}{dx^2} + b \frac{dy}{dx} + cy = f(x) \]
is y = CF+PI.
Let’s check that this makes sense. We need to check: (i) that y = CF+PI is in fact a solution of the ODE, and (ii) that all solutions are found this way. Write g(x) for the complementary function, and p(x) for the particular integral.
To check (i), substitute y = g(x) + p(x) into the ODE and check that both sides are equal. We have
\[ a \frac{d^2}{dx^2} \big( g(x) + p(x) \big) + b \frac{d}{dx} \big( g(x) + p(x) \big) + c \big( g(x) + p(x) \big) \]
\[ = \Big( a \frac{d^2 g(x)}{dx^2} + b \frac{dg(x)}{dx} + cg(x) \Big) + \Big( a \frac{d^2 p(x)}{dx^2} + b \frac{dp(x)}{dx} + cp(x) \Big) \]
\[ = 0 + f(x) = f(x) \]
where the second equality comes from g(x) being a solution of the associated homogeneous equation and p(x) being a solution of the full nonhomogeneous ODE. In particular, g(x) + p(x) is a solution of the ODE.
To check (ii), suppose that h(x) is any solution of the full nonhomogeneous ODE. Consider h(x) − p(x). Substituting into the ODE we get
\[ a \frac{d^2}{dx^2} \big( h(x) - p(x) \big) + b \frac{d}{dx} \big( h(x) - p(x) \big) + c \big( h(x) - p(x) \big) \]
\[ = \Big( a \frac{d^2 h(x)}{dx^2} + b \frac{dh(x)}{dx} + ch(x) \Big) - \Big( a \frac{d^2 p(x)}{dx^2} + b \frac{dp(x)}{dx} + cp(x) \Big) \]
\[ = f(x) - f(x) = 0 \]
where the second equality comes from both h(x) and p(x) being solutions of the full nonhomogeneous ODE. This says that h(x) − p(x) is a solution of the associated homogeneous ODE, so it is a special case of the complementary function. That is, h(x) − p(x) is of the form CF, so h(x) is of the form CF + p(x) = CF+PI. Hence y = CF+PI really is the most general solution to the ODE.
The next question is: How do you find the complementary function and the particular integral? First consider the complementary function.
A basic solution of the associated homogeneous ODE is $y = e^{sx}$ for some constant s. To see which values of s work, substitute $y = e^{sx}$ into the associated homogeneous ODE and check when you get zero. We have
\[ \frac{dy}{dx} = se^{sx} \quad \text{and} \quad \frac{d^2y}{dx^2} = s^2 e^{sx}. \]
So we get
\[ a \frac{d^2y}{dx^2} + b \frac{dy}{dx} + cy = 0 \implies as^2 e^{sx} + bse^{sx} + ce^{sx} = 0 \implies (as^2 + bs + c) e^{sx} = 0 \implies as^2 + bs + c = 0 \]
where the last implication holds since $e^{sx} > 0$ for all x. The equation $as^2 + bs + c = 0$ is called the auxiliary equation and its roots are
\[ s = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
This gives three cases:
• Case 1: the auxiliary equation has two distinct real roots α and β. Then the complementary function is
\[ y = Ae^{\alpha x} + Be^{\beta x}. \]
• Case 2: the auxiliary equation has a repeated root α. Then the complementary function is
\[ y = (A + Bx) e^{\alpha x}. \]
• Case 3: the auxiliary equation has complex conjugate roots γ ± iδ. Then the complementary function is
\[ y = e^{\gamma x} (A \cos \delta x + B \sin \delta x). \]
In each case A and B are arbitrary constants.
Note: To check that these really are the complementary functions, just substitute them into the associated homogeneous ODE and see that you get zero.
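The three cases are decided entirely by the discriminant $b^2 - 4ac$ of the auxiliary equation, so they are easy to sketch in code. The function below is an illustration only: its name and the shape of its return value are assumptions made for this example. It reports which case applies together with the numbers needed to write down the complementary function (α and β, or α twice, or γ and δ). The coefficient triples used at the end are those of the three homogeneous examples that follow in the text.

```python
import cmath

def complementary_function_case(a, b, c):
    """Classify the auxiliary equation a*s^2 + b*s + c = 0 (with a != 0).

    Returns (case, first, second) where the numbers are
    (alpha, beta) for distinct real roots,
    (alpha, alpha) for a repeated root,
    (gamma, delta) for complex roots gamma +/- i*delta.
    """
    disc = b * b - 4 * a * c
    if disc > 0:
        root = disc ** 0.5
        return ("distinct real", (-b + root) / (2 * a), (-b - root) / (2 * a))
    if disc == 0:
        alpha = -b / (2 * a)
        return ("repeated", alpha, alpha)
    s = (-b + cmath.sqrt(disc)) / (2 * a)
    return ("complex", s.real, abs(s.imag))

print(complementary_function_case(1, -3, 2))   # y'' - 3y' + 2y = 0
print(complementary_function_case(1, -6, 13))  # y'' - 6y' + 13y = 0
print(complementary_function_case(1, -4, 4))   # y'' - 4y' + 4y = 0
```

With floating-point coefficients the test `disc == 0` is fragile, so for anything other than small integer coefficients a tolerance would be a sensible refinement.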
Example 9.20. Solve the initial value problem $\frac{d^2y}{dx^2} - 3\frac{dy}{dx} + 2y = 0$ where y(0) = 1 and y'(0) = −1.
Solution 9.21. Note that the ODE is second order with constant coefficients, and it is homogeneous. So to find the general solution we need only find the complementary function. The auxiliary equation is
\[ s^2 - 3s + 2 = 0. \]
This factors as (s − 1)(s − 2) so has two real roots 1 and 2. Thus the general solution to the ODE is
\[ y = Ae^x + Be^{2x}. \]
To solve the initial value problem first differentiate to get
\[ y' = Ae^x + 2Be^{2x}. \]
From these two equations, the initial value y(0) = 1 gives
\[ 1 = y(0) = Ae^0 + Be^0 = A + B \]
and the initial value y'(0) = −1 gives
\[ -1 = y'(0) = Ae^0 + 2Be^0 = A + 2B. \]
Now solve this system of two equations to get A = 3 and B = −2. Thus the solution to the initial value problem is
\[ y = 3e^x - 2e^{2x}. \]
Example 9.22. Find the general solution of the ODE $\frac{d^2y}{dx^2} - 6\frac{dy}{dx} + 13y = 0$.
Solution 9.23. Observe that this is a second order linear homogeneous ODE with constant coefficients. The auxiliary equation is
\[ s^2 - 6s + 13 = 0. \]
The roots are
\[ s = \frac{6 \pm \sqrt{6^2 - 4 \cdot 13}}{2} = \frac{6 \pm \sqrt{-16}}{2} = 3 \pm 2i. \]
As these are complex conjugate roots, the complementary function is
\[ y = e^{3x} (A \cos 2x + B \sin 2x). \]
As the ODE is homogeneous, the complementary function is the general solution.
Example 9.24. Solve the boundary value problem $\frac{d^2y}{dx^2} - 4\frac{dy}{dx} + 4y = 0$ with y(0) = −1 and y(1) = 1.
Solution 9.25. This is a second order linear homogeneous ODE with constant coefficients. The auxiliary equation is
\[ s^2 - 4s + 4 = 0. \]
This factors as $(s-2)^2 = 0$ so s = 2 is a repeated root. The complementary function is therefore
\[ y = (A + Bx) e^{2x}. \]
As the ODE is homogeneous, the complementary function is also the general solution. The boundary condition y(0) = −1 gives
\[ -1 = y(0) = (A + 0) e^0 = A \]
and the boundary condition y(1) = 1 gives
\[ 1 = y(1) = (A + B) e^2. \]
Substituting in A = −1 and solving for B gives
\[ B = 1 + e^{-2}. \]
Therefore the solution of the boundary value problem is
\[ y = \big( -1 + (1 + e^{-2}) x \big) e^{2x} = (-1 + x + e^{-2} x) e^{2x}. \]
We now know how to find the complementary function for a second order linear constant coefficient ODE. What about the particular integral? This can be tricky, but here is a method that usually works. Suppose the ODE is
\[ a \frac{d^2y}{dx^2} + b \frac{dy}{dx} + cy = f(x). \]
Then as the particular integral try a function of the same form as f(x). For example:
Sample f(x)        Try as PI
constant           y(x) = p
3x + 2             y(x) = px + q
x^2 − 2            y(x) = px^2 + qx + r
5e^{3x}            y(x) = pe^{3x}
2 sin 3x           y(x) = p sin 3x + q cos 3x
−4 cos 2x          y(x) = p sin 2x + q cos 2x
a sum of these     a sum of these.
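For a quadratic right-hand side the trial function in the table leads to a small triangular system, which the sketch below solves exactly. It is an illustration under stated assumptions: the function name is hypothetical, the coefficient c is assumed nonzero (otherwise the trial form in the table does not determine r), and the sample ODE $y'' + 3y' + 2y = x^2$ is a made-up example rather than one from these notes. Substituting $y = px^2 + qx + r$ into $ay'' + by' + cy = f_2x^2 + f_1x + f_0$ and comparing coefficients gives $cp = f_2$, $2bp + cq = f_1$ and $2ap + bq + cr = f_0$.

```python
from fractions import Fraction

def quadratic_particular_integral(a, b, c, f2, f1, f0):
    """Coefficients (p, q, r) of the trial PI y = p*x^2 + q*x + r for
    a*y'' + b*y' + c*y = f2*x^2 + f1*x + f0, assuming c != 0."""
    if c == 0:
        raise ValueError("this simple trial form needs c != 0")
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    p = Fraction(f2) / c                # x^2 terms: c*p = f2
    q = (f1 - 2 * b * p) / c            # x terms:   2*b*p + c*q = f1
    r = (f0 - 2 * a * p - b * q) / c    # constants: 2*a*p + b*q + c*r = f0
    return p, q, r

# Hypothetical example: y'' + 3y' + 2y = x^2.
p, q, r = quadratic_particular_integral(1, 3, 2, 1, 0, 0)
print(p, q, r)
```

The equations are solved from the highest power down, which is the same order one would use by hand.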
Example 9.26. Solve the ODE $\frac{d^2y}{dx^2} + 5\frac{dy}{dx} + 4y = 4x^2 - 2x + 3$ where y(0) = 1 and y'(0) = 0.
Solution 9.27. This is a second order linear nonhomogeneous ODE with constant coefficients. The general solution is of the form CF+PI. First find the complementary function. The associated homogeneous ODE is
\[ \frac{d^2y}{dx^2} + 5\frac{dy}{dx} + 4y = 0. \]
This has auxiliary equation
\[ s^2 + 5s + 4 = 0 \]
which factors as (s + 1)(s + 4) = 0, so the roots are s = −1 and s = −4. The complementary function is therefore
\[ \text{CF} = Ae^{-x} + Be^{-4x}. \]
For the particular integral, since f(x) is a quadratic polynomial, try $y(x) = px^2 + qx + r$. Then $\frac{dy}{dx} = 2px + q$ and $\frac{d^2y}{dx^2} = 2p$, so substituting these into the ODE gives
\[ 2p + 5(2px + q) + 4(px^2 + qx + r) = 4x^2 - 2x + 3. \]
Rearranging to collect like terms gives
\[ 4px^2 + (10p + 4q)x + (2p + 5q + 4r) = 4x^2 - 2x + 3. \]
Thus
\[ 4p = 4, \qquad 10p + 4q = -2, \qquad 2p + 5q + 4r = 3. \]
The first equation gives p = 1, substituting this into the second equation gives q = −3, and substituting both values for p and q into the third equation gives r = 4. Thus
\[ \text{PI} = x^2 - 3x + 4. \]
The general solution of the ODE is therefore
\[ y(x) = \text{CF+PI} = Ae^{-x} + Be^{-4x} + x^2 - 3x + 4. \]
Now consider the initial values. First differentiate the general solution to get
\[ y'(x) = -Ae^{-x} - 4Be^{-4x} + 2x - 3. \]
The initial value y(0) = 1 gives
\[ 1 = y(0) = A + B + 4. \]
The initial value y'(0) = 0 gives
\[ 0 = y'(0) = -A - 4B - 3. \]
Solving these equations gives A = −3 and B = 0. Hence the solution to the initial value problem is
\[ y(x) = -3e^{-x} + x^2 - 3x + 4. \]
Example 9.28. Find the general solution of the ODE $\frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 2y = x + e^{-3x}$.
Solution 9.29. Observe that this is a second order linear nonhomogeneous ODE with constant coefficients. The associated homogeneous equation is
\[ \frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 2y = 0. \]
This has auxiliary equation
\[ s^2 + 3s + 2 = 0 \]
which factors as (s + 1)(s + 2) = 0, and so has roots s = −1 and s = −2. The complementary function is therefore
\[ \text{CF} = Ae^{-x} + Be^{-2x}. \]
For the particular integral, since $x + e^{-3x}$ is a sum of the polynomial x and the exponential $e^{-3x}$, try $y(x) = px + q + re^{-3x}$. Then $\frac{dy}{dx} = p - 3re^{-3x}$ and $\frac{d^2y}{dx^2} = 9re^{-3x}$. Substituting these into the ODE gives
\[ 9re^{-3x} + 3(p - 3re^{-3x}) + 2(px + q + re^{-3x}) = x + e^{-3x}. \]
Rearranging to collect like terms gives
\[ (9r - 9r + 2r) e^{-3x} + 2px + (3p + 2q) = x + e^{-3x}. \]
Therefore
\[ 2r = 1, \qquad 2p = 1, \qquad 3p + 2q = 0. \]
This gives r = 1/2, p = 1/2 and q = −3/4. Thus the particular integral is
\[ \text{PI} = \frac{1}{2} x - \frac{3}{4} + \frac{1}{2} e^{-3x}. \]
Hence the general solution of the ODE is
\[ y(x) = \text{CF+PI} = Ae^{-x} + Be^{-2x} + \frac{1}{2} x - \frac{3}{4} + \frac{1}{2} e^{-3x}. \]
Challenge Problem 9.30. Show that a first order linear ODE can also be solved using the CF+PI method. That is, given the ODE
\[ (9) \qquad \frac{dy}{dx} + p(x)y = q(x), \]
suppose that g(x) is the general solution to the associated homogeneous ODE $\frac{dy}{dx} + p(x)y = 0$ and suppose that h(x) is any solution to the nonhomogeneous ODE. Show that:
• CF+PI, that is, g(x) + h(x), really is a solution to (9);
• any solution to (9) is of the form CF+PI.
Further, if the function p(x) in (9) is a constant, say p(x) = a, show that the complementary function is $g(x) = Ae^{-ax}$ for some constant A.
10. Appendix: Mathematical Induction
Mathematical Induction is a way of proving that a particular statement holds

for all positive integers, or all integers larger than some starting integer N .
Example 10.1. Let n be a positive integer. Show that the sum of all integers
from 1 to n is n(n+1)
2 . That is, show that
n
X n(n + 1)
k = 1 + 2 + · · · + (n − 1) + n = .
2
k=1
The method for doing induction is as follows.

• Step 1: Prove a base case directly, like the n = 1 case.
• Step 2: Assume the inductive hypothesis that the n − 1 case holds.
• Step 3: Use the inductive hypothesis to prove that the n case holds.
• Step 4: Deduce that, by induction, the statement works for all n.
In practice, Step 3 is usually the hard one, since it is where the real work of the
proof happens. To illustrate the method, let’s solve the example above.
Solution 10.2. The base case is when $n = 1$. Then the sum of all integers
from 1 to 1 is just 1, and this equals $\frac{1(1+1)}{2}$, so the base case holds.
Now assume that the formula works for all positive integers up to $n - 1$. In
particular, $\sum_{k=1}^{n-1} k = \frac{(n-1)n}{2}$. Consider the sum of all integers from 1 to $n$:
$1 + 2 + \cdots + (n - 1) + n$. The inductive hypothesis tells us that the sum
$1 + 2 + \cdots + (n - 1)$ equals $\frac{(n-1)n}{2}$. So using this, we get
\[ 1 + 2 + \cdots + (n - 1) + n = \bigl(1 + 2 + \cdots + (n - 1)\bigr) + n = \frac{(n - 1)n}{2} + n. \]
Observe that
\[ \frac{(n - 1)n}{2} + n = \frac{(n - 1)n + 2n}{2} = \frac{n^2 - n + 2n}{2} = \frac{n^2 + n}{2} = \frac{n(n + 1)}{2}. \]
Thus
\[ 1 + 2 + \cdots + (n - 1) + n = \frac{n(n + 1)}{2}. \]
Therefore, by induction, the formula $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ holds.
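The two ingredients of this argument, the base case and the inductive step, can also be spot-checked by computer. The short Python sketch below does this for small values of n; the function names and the range of n tested are illustrative choices only. Such a check can catch an algebra slip, but it is not a substitute for the proof, since it only ever tests finitely many cases.

```python
def sum_to(n):
    """Add up 1 + 2 + ... + n directly."""
    return sum(range(1, n + 1))


def closed_form(n):
    """The claimed formula n(n+1)/2; n(n+1) is always even, so // is exact."""
    return n * (n + 1) // 2


def check_base_case():
    """Step 1: the formula holds when n = 1."""
    return sum_to(1) == closed_form(1)


def check_inductive_step(n):
    """Step 3: (n-1)n/2 + n equals n(n+1)/2, the identity used in the solution."""
    return closed_form(n - 1) + n == closed_form(n)


if __name__ == "__main__":
    print("base case holds:", check_base_case())
    print("inductive step holds for n = 2..100:",
          all(check_inductive_step(n) for n in range(2, 101)))
    print("formula matches direct sum for n = 1..100:",
          all(sum_to(n) == closed_form(n) for n in range(1, 101)))
```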
Here is another example, more calculus related.

Example 10.3. Show that, for all positive integers $n$, the $n$th derivative of $x^n$
is $n!$. That is, $\frac{d^n}{dx^n} x^n = n!$.
Solution 10.4. The base case is $n = 1$. Then we want the first derivative
of the function $x^1 = x$, which is 1. On the other hand, with $n = 1$ we have
$n! = 1! = 1$. Therefore the asserted formula holds for $n = 1$.
Now assume that, by inductive hypothesis, $\frac{d^{n-1}}{dx^{n-1}} x^{n-1} = (n - 1)!$. Consider
the $n$th derivative of $x^n$. To make use of the inductive hypothesis, let’s think
of the $n$th derivative as the $(n - 1)$st derivative of the first derivative. Then
we get
\[ \frac{d^n}{dx^n} x^n = \frac{d^{n-1}}{dx^{n-1}} \left( \frac{d}{dx} x^n \right) = \frac{d^{n-1}}{dx^{n-1}} \left( n x^{n-1} \right). \]
The derivative is linear so we can pull the constant $n$ out, and use the inductive
hypothesis to get
\[ \frac{d^{n-1}}{dx^{n-1}} \left( n x^{n-1} \right) = n \cdot \frac{d^{n-1}}{dx^{n-1}} x^{n-1} = n \cdot (n - 1)! = n!. \]
Thus $\frac{d^n}{dx^n} x^n = n!$. Therefore, by induction, the asserted formula holds.
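The formula can be checked for small n by differentiating repeatedly with the power rule. The Python sketch below is one way to do that, under the assumption that a monomial $c x^k$ is stored as a pair (c, k); this representation and the function names are made up for the illustration. As with any finite check, it supports the result but does not replace the induction argument.

```python
import math


def differentiate(monomial):
    """Power rule: d/dx (c x^k) = (c k) x^(k-1); a constant differentiates to 0."""
    c, k = monomial
    if k == 0:
        return (0, 0)
    return (c * k, k - 1)


def nth_derivative(monomial, n):
    """Apply the power rule n times."""
    for _ in range(n):
        monomial = differentiate(monomial)
    return monomial


if __name__ == "__main__":
    for n in range(1, 8):
        coefficient, power = nth_derivative((1, n), n)
        # The nth derivative of x^n should be the constant n!.
        print(n, coefficient, power, coefficient == math.factorial(n))
```

Each pass through the loop mirrors one use of the inductive step: the first derivative brings down a factor of n, and the remaining n - 1 derivatives act on $x^{n-1}$.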
Challenge Problem 10.5. Guess a formula for $\frac{d^n}{dx^n} \sqrt{1 + x}$ and prove it by
induction.
Challenge Problem 10.6. Guess a formula for $\frac{d^n}{dx^n} \left( x e^{-x} \right)$ and prove it by
induction.
