Notes 184
STEVEN V SAM
Contents
1. Review and introduction
1.1. Bijections
1.2. Sum and product principle
1.3. 12-fold way, introduction
1.4. Induction
2. Fundamental counting problems
2.1. Permutations
2.2. Words
2.3. Choice problems
2.4. Compositions
3. Stirling numbers
3.1. Set partitions
3.2. Falling factorials
3.3. Cycles in permutations
4. Binomial theorem and generalizations
4.1. Binomial theorem
4.2. Multinomial theorem
4.3. Re-indexing sums
5. Formal power series
5.1. Definitions
5.2. Binomial theorem (general form)
6. Ordinary generating functions
6.1. Linear recurrence relations
6.2. Integer partitions
6.3. Catalan numbers
7. Exponential generating functions
7.1. Products of exponential generating functions
7.2. Compositions of exponential generating functions
7.3. Cayley’s enumeration of labeled trees
7.4. Lagrange inversion formula
8. Sieving methods
8.1. Inclusion-exclusion
8.2. Möbius inversion
Two situations have already been filled in and won’t be considered interesting. I’m not
going to emphasize this particular table. The point of bringing it up is to illustrate what
kinds of problems might be natural to consider. Some of these entries have simple formulas
in terms of mathematical notation we’re familiar with while others do not. Surprisingly just
changing the problem slightly can take you between these two cases. We’ll start working on
them soon, but not necessarily in a systematic way.
The perspective of the 12-fold way is due to Gian-Carlo Rota.
1.4. Induction. Induction is used when we have a sequence of statements P (0), P (1), P (2), . . .
labeled by non-negative integers that we’d like to prove. For example, P (n) could be the
statement:
\[ \sum_{i=0}^{n} i = \frac{n(n+1)}{2}. \]
In order to prove that all of the statements P (n) are true using induction, we need to do 2
things:
• Prove that P (0) is true.
• Assuming that P (0), . . . , P (n) are true, use them to prove that P (n + 1) is true. Sometimes
we only need P (n), sometimes we need all of them.
Let’s see how that works for our example:
• P (0) is the statement $\sum_{i=0}^{0} i = 0 \cdot 1/2$. Both sides are 0, so the equality is valid.
• Now we assume that P (n) is true, i.e., that $\sum_{i=0}^{n} i = n(n+1)/2$. Now we want to
prove that $\sum_{i=0}^{n+1} i = (n+1)(n+2)/2$.
Let’s start with the left hand side and simplify using everything we know:
\begin{align*}
\sum_{i=0}^{n+1} i &= \sum_{i=0}^{n} i + (n+1) \\
&= \frac{n(n+1)}{2} + (n+1) \\
&= \left( \frac{n}{2} + 1 \right)(n+1) \\
&= \frac{(n+1)(n+2)}{2}.
\end{align*}
The first line is just using what a sum is, the second line uses P (n) and the rest is
some algebra. So we’ve proven P (n + 1).
Since we’ve completed the two required steps, we have proven that the summation identity
holds for all n.
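A quick numerical sanity check can accompany the induction argument. The Python sketch below compares the direct sum with the closed form for small n. The function names are illustrative and not part of the notes, and the check is only evidence for finitely many cases; the proof is what covers all n.

```python
def triangular_sum(n: int) -> int:
    """Add up 0 + 1 + ... + n directly."""
    return sum(range(n + 1))


def closed_form(n: int) -> int:
    """The claimed closed form n(n+1)/2 (the product n(n+1) is always even)."""
    return n * (n + 1) // 2


# Spot-check P(0), ..., P(199).
assert all(triangular_sum(n) == closed_form(n) for n in range(200))
```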
Remark 1.4.1. We have labeled the statements starting from 0, but sometimes it’s more
natural to start counting from 1 instead, or even some larger integer. The same reasoning
as above will apply for these variations. The first step “Prove that P (0) is true” is then
replaced by “Prove that P (1) is true” or wherever the start of your indexing occurs. □
A subset T of a set S is another set all of whose elements belong to S. We write this as
T ⊆ S. We allow the possibility that T is empty and also the possibility that T = S.
Theorem 1.4.2. There are $2^n$ subsets of a set of size n.
NOTES FOR MATH 184 5
For example, if S = {1, ⋆, U }, then there are $2^3 = 8$ subsets, and we can list them:
∅, {1}, {⋆}, {U }, {1, ⋆}, {1, U }, {U, ⋆}, {1, ⋆, U }.
The following proof will use induction, the sum principle, and the bijection principle so is
a great example to study carefully.
Proof. Let P (n) be the statement “any set of size n has exactly $2^n$ subsets”.
We check P (0) directly: if S has 0 elements, then S = ∅, and the only subset is S itself,
which is consistent with $2^0 = 1$.
Now we assume P (n) holds and use it to show that P (n + 1) is also true. Let S be a set
of size n + 1. Pick an element x ∈ S and let S′ be the subset of S consisting of all elements
that are not equal to x, i.e., S′ = S \ {x}. Then S′ has size n, so by induction the number
of subsets of S′ is $2^n$.
Next, every subset of S either contains x or it does not. To make these kinds of arguments
systematic, call a subset “type I” if it does not contain x and “type II” if it does. A subset has
to be exactly one of these kinds so we can count both and add the answers (sum principle).
Type I: The subsets which do not contain x are the same thing as subsets of S′, so there
are $2^n$ of them because P (n) is true.
Type II: We will show there is a bijection between type I and type II subsets, given by a pair of maps
\[ f \colon \{\text{type I subsets}\} \to \{\text{type II subsets}\}, \qquad g \colon \{\text{type II subsets}\} \to \{\text{type I subsets}\} \]
(we’ve already seen this but we’ll redo it). If T is type I, then we define f (T ) = T ∪ {x},
which is type II. If U is type II, then we define g(U ) = U \ {x}, which is type I. Then f and
g are inverse to each other (g(f (T )) = T and f (g(U )) = U ), so they define a bijection. So there are also
$2^n$ type II subsets.
All together we have $2^n + 2^n = 2^{n+1}$ subsets of S, so P (n + 1) holds. □
Continuing with our example, if x = 1, then the subsets not containing x are ∅, {⋆}, {U }, {⋆, U },
while those that do contain x are {1}, {1, ⋆}, {1, U }, {1, ⋆, U }. There are $2^2 = 4$ of each kind.
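The type I / type II split in the proof translates directly into a recursive enumeration. Here is a minimal Python sketch of that idea; the function name `subsets` and the string labels standing in for ⋆ and U are assumptions made for illustration.

```python
def subsets(elements: list) -> list:
    """List all subsets, mirroring the induction: split on whether x is included."""
    if not elements:
        return [set()]  # P(0): the only subset of the empty set is itself
    x, rest = elements[0], elements[1:]
    type_one = subsets(rest)                # subsets that do not contain x
    type_two = [t | {x} for t in type_one]  # f(T) = T ∪ {x}
    return type_one + type_two


example = subsets([1, "star", "U"])
assert len(example) == 2 ** 3
```

Each recursive call doubles the number of subsets, which is exactly the step from $2^n$ to $2^{n+1}$ in the proof.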
A natural followup is to determine how many subsets have a given size. In our previous
example, there is 1 subset of size 0, 3 of size 1, 3 of size 2, and 1 of size 3. We’ll discuss this
problem in the next section.
Some more to think about:
• Show that $\sum_{i=0}^{n} i^2 = n(n+1)(2n+1)/6$ for all n ≥ 0.
• Show that $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$ for all n ≥ 0.
• Show that $4n < 2^n$ whenever n ≥ 5.
What happens with $\sum_{i=0}^{n} i^3$ or $\sum_{i=0}^{n} i^4$, or...? In the first two cases, we got polynomials
in n on the right side. This actually always happens, and we’ll see why later when we talk
about falling factorials.
our objects could be flowers and the types are colors). Also, objects of the same type are
considered identical. For convenience, we will label the “types” with numbers 1, 2, . . . , k and
let $a_i$ be the number of objects of type i (so $a_1 + a_2 + \cdots + a_k = n$).
Theorem 2.1.4. The number of ways to arrange the n objects in the above situation is
\[ \frac{n!}{a_1! \, a_2! \cdots a_k!}. \]
As an exercise, you should adapt the reasoning in (2) to give a proof of this theorem.
The quantity above will be used a lot, so we give it a symbol, called the multinomial
coefficient:
\[ \binom{n}{a_1, a_2, \dots, a_k} := \frac{n!}{a_1! \, a_2! \cdots a_k!}. \]
In the case when k = 2 (a very important case), it is called the binomial coefficient. Note
that in this case, $a_2 = n - a_1$, so for shorthand, one often just writes $\binom{n}{a_1}$ instead of $\binom{n}{a_1, a_2}$.
Example 2.2.4. How many pairs of subsets S, T ⊆ [n] satisfy S ⊆ T ? We can also encode
this problem as a problem about words. Before we do that, let me illustrate with an example
with n = 5.
Suppose our example is S = {1, 2} and T = {1, 2, 4}. We can record the pair of subsets
as a table:
     “in S and T ”   “in T but not S”   “not in T or S”
1         ✓
2         ✓
3                                             ✓
4                           ✓
5                                             ✓
Note that there is no column for “in S but not T ” because that would violate the assump-
tion S ⊆ T .
You should convince yourself that given a table like above, with exactly one checkmark
in each row, we can recover the information of S and T . Since there are $3^5$ choices for such
tables, we see that’s how many pairs of subsets we have.
We can also represent this visually using a Venn diagram.
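[Figure: Venn diagram. The inner circle contains 1, 2; the part of the outer circle outside the inner circle contains 4; the part of the square outside the outer circle contains 3, 5.]
The inner circle represents the elements in S, the outer circle represents the elements in T (it
contains the inner circle, which reflects the condition S ⊆ T ), and the square represents our
original set [5]. What we should take away from this is that there are exactly 3 regions in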
this picture, corresponding to the 3 columns in the previous table. Every way of assigning
numbers to these regions matches up with a table and also with a pair of subsets.
Finally, let’s make this a little more formal. Let A be the alphabet of size 3 whose elements
are: “in S and T ”, “in T but not S” and “not in T or S”. It can be helpful to visualize
these as the different regions in this diagram:
[Figure: a square labeled [n] containing a circle labeled T, which in turn contains a circle labeled S.]
Then each pair S ⊆ T gives a word of length n in A: the ith entry of the word is the element
which describes the position of i. So there are $3^n$ such pairs.
Quick: how many pairs of subsets S, T ⊆ [n] satisfy the condition that S is not a subset of
T ? Use the subtraction principle: it’s the complement of the above in the set of all possible
pairs of subsets, of which there are $2^n \cdot 2^n = 4^n$, so the answer is $4^n - 3^n$. □
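Both counts in this example are small enough to confirm by brute force. The following Python sketch enumerates all pairs of subsets of [n] for small n; the helper names `all_subsets` and `count_pairs` are illustrative assumptions, not notation from the notes.

```python
from itertools import combinations


def all_subsets(n: int) -> list:
    """Every subset of [n] = {1, ..., n}, as frozensets."""
    universe = range(1, n + 1)
    return [frozenset(c) for k in range(n + 1) for c in combinations(universe, k)]


def count_pairs(n: int) -> tuple:
    """Return (#pairs with S ⊆ T, #pairs with S not a subset of T)."""
    subs = all_subsets(n)
    nested = sum(1 for s in subs for t in subs if s <= t)
    return nested, len(subs) ** 2 - nested


for n in range(6):
    nested, not_nested = count_pairs(n)
    assert nested == 3 ** n
    assert not_nested == 4 ** n - 3 ** n
```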
How about words without repeating entries? We will call these injective words.
Define the falling factorial:
\[ (n)_k := n(n-1)(n-2) \cdots (n-k+1). \]
There are k numbers being multiplied in the above definition. When n = k, we have
$(n)_n = n!$, so this generalizes the factorial function.
Theorem 2.2.5. If |A| = n and n ≥ k, then there are $(n)_k$ injective words of length k in A.
Proof. Start with a permutation of A. The first k elements in that permutation give us an
injective word of length k. But we’ve overcounted because we don’t care how the remaining
n − k things we threw away are ordered. In particular, this process returns each word exactly
(n − k)! many times, so our desired quantity is
\[ \frac{n!}{(n-k)!} = (n)_k. \] □
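To see the theorem in action, one can enumerate injective words directly. This Python sketch uses `itertools.permutations`, which yields exactly the length-k arrangements without repeats; the function names are illustrative assumptions.

```python
from itertools import permutations


def falling_factorial(n: int, k: int) -> int:
    """(n)_k = n (n-1) ... (n-k+1), a product of k factors."""
    result = 1
    for j in range(k):
        result *= n - j
    return result


def count_injective_words(alphabet: str, k: int) -> int:
    """Count words of length k over the alphabet with no repeated letter."""
    return sum(1 for _ in permutations(alphabet, k))


assert count_injective_words("abcde", 3) == falling_factorial(5, 3) == 60
```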
Some further things to think about:
• A small city has 10 intersections. Each one could have a traffic light or gas station
(or both or neither). How many different configurations could this city have?
• Using that $(n)_k = n \cdot (n-1)_{k-1}$, can you find a proof for Theorem 2.2.5 that uses
induction?
• Which entries of the 12-fold way table can we fill in now?
2.3. Choice problems. We finish up with some related counting problems. Recall we
showed that an n-element set has exactly $2^n$ subsets. We can refine this problem by asking
about subsets of a given size.
Theorem 2.3.1. The number of k-element subsets of an n-element set is
\[ \binom{n}{k} = \frac{n!}{k!(n-k)!}. \]
There are many ways to prove this, but we’ll just do one for now:
Proof. It doesn’t matter which set of size n we’re dealing with, so we work with [n] for
convenience. In the last section on words, we identified subsets of [n] with words of length
n on {0, 1}, with a 1 in position i if and only if i belongs to the subset. So the number of
subsets of size k is exactly the number of words with exactly k instances of 1. This is the
same as arranging n − k 0’s and k 1’s from the section on permutations. In that case, we
saw that the answer is $\frac{n!}{(n-k)! \, k!}$. □
Corollary 2.3.2. $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.
Proof. The left hand side counts the number of subsets of [n] of some size k where k ranges
from 0 to n. But all subsets of [n] are accounted for and we’ve seen that $2^n$ is the number
of all subsets of [n]. □
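Theorem 2.3.1 and Corollary 2.3.2 are easy to spot-check numerically. The sketch below counts k-element subsets by enumeration and compares with Python's built-in `math.comb`; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def count_k_subsets(n: int, k: int) -> int:
    """Count the k-element subsets of [n] by listing them."""
    return sum(1 for _ in combinations(range(1, n + 1), k))


def row_sum(n: int) -> int:
    """Sum of the binomial coefficients C(n, 0) + ... + C(n, n)."""
    return sum(comb(n, k) for k in range(n + 1))


for n in range(8):
    assert all(count_k_subsets(n, k) == comb(n, k) for k in range(n + 1))
    assert row_sum(n) == 2 ** n
```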
Here’s an important identity for binomial coefficients (we interpret $\binom{n}{-1} = 0$):
1, . . . , n (the number of balls in box i represents how many times i appears in the multiset).
How do we encode this in a useful way? I’ll illustrate with an example.
Suppose n = 5 and k = 4. Our multiset is {1, 1, 3, 5}. Then box 1 has 2 balls, while boxes
3 and 5 each have 1 ball. I’ll encode that by the following picture:
◦ ◦ | | ◦ | | ◦
We have 4 vertical lines which separate the balls into 5 regions (the boxes). I’ll leave it to
you to convince yourself that every ordering of 4 balls and 4 vertical lines corresponds to
exactly one multiset (i.e., there is a bijection). There are a few subtle points: make sure you
understand why we aren’t putting vertical lines on the outside, for example. In particular,
we have 8 things and just need to know which 4 of them are vertical lines, so there are $\binom{8}{4}$
multisets of size 4 from a set of size 5. Finally, this works in general, but again, I’ll leave it
to you to fill in the details.
This method might be easier to remember, though if you think about it hard enough,
you’ll see that it’s pretty much the same thing as the previous proof. □
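The balls-and-lines encoding can be tried out on the case n = 5, k = 4 from the example. In this Python sketch, `encode` is a hypothetical helper that draws a multiset as a string of `o` (balls) and `|` (lines); the check confirms that distinct multisets give distinct pictures and that there are $\binom{8}{4}$ of them.

```python
from itertools import combinations_with_replacement
from math import comb


def encode(multiset: tuple, n: int) -> str:
    """Draw box i as one 'o' per copy of i, with '|' separating consecutive boxes."""
    boxes = ["o" * multiset.count(i) for i in range(1, n + 1)]
    return "|".join(boxes)


multisets = list(combinations_with_replacement(range(1, 6), 4))
pictures = {encode(m, 5) for m in multisets}

print(encode((1, 1, 3, 5), 5))  # oo||o||o
assert len(multisets) == len(pictures) == comb(8, 4) == 70
```

Joining 5 boxes with separators produces exactly 4 lines and none on the outside, which is the subtle point mentioned above.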
Example 2.3.7 (Counting poker hands). We’ll apply some of the ideas above to count the
number of ways to receive various kinds of poker hands. The setup is as follows: Each card
has one of 4 suits: ♣, ♡, ♠, ♢, and one of 13 values: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A.
Each possible pair of suit and value appears exactly once, so there are 52 cards total.
In each situation below, we want to count how many subsets of 5 out of the 52 cards have
certain special properties.
(1) (Four of a kind) This means that 4 of the 5 cards have the same value (and the 5th
necessarily has a different value). Since there are exactly 4 cards of a given value, the only
relevant information is the value that appears 4 times and the extra card. There are
13 choices for the value, and 48 cards leftover, so there are 13 · 48 ways to get a “four
of a kind”.
(2) (Full house) This means that 3 of the 5 cards have the same value and the other
2 also have the same value. These two values necessarily have to be different. The
relevant information is the two values (with order! why?) and then the suits that are
chosen. There are 13 · 12 ways to choose two different values with order. To choose 3
suits out of 4, there are $\binom{4}{3}$ ways, and to choose 2 suits out of 4, there are $\binom{4}{2}$ ways,
so in total we get $13 \cdot 12 \cdot \binom{4}{3} \binom{4}{2}$.
(3) (Two pairs) This means that 2 of the 5 cards have the same value, and 2 of the
remaining 3 cards have the same value. We also require that these values are different
(so it doesn’t overlap with (1)) and that the value of the 5th card is also different (so
it doesn’t overlap with (2)).
The two values of the pairs are chosen without order (why is this different?) so
there are $\binom{13}{2}$ ways. For each value, we choose 2 suits out of 4, so we pick up another
factor of $\binom{4}{2}^2$. We’ve removed 8 cards from the possibility of what the fifth card can be, so it
has 44 possibilities, which gives us a final answer of $\binom{13}{2} \binom{4}{2}^2 \cdot 44$.
(4) (Straight) This means that the values of the 5 cards can be put in consecutive order
(funny rule: A can either count as a 1 or as the value above K). There are no
conditions on the suits. So we need to choose the 5 consecutive values. The smallest
value can be one of: A, 2, 3, 4, 5, 6, 7, 8, 9, 10, and once that is chosen, all of the
other values are determined, so there are 10 possibilities here. For each of the 5 suits,
The answer suggests that we should be able to find a bijection between compositions of n
and subsets of [n − 1]. Can you find one?
3. Stirling numbers
3.1. Set partitions. (Weak) compositions were about indistinguishable objects into distin-
guishable boxes. Now we reverse the roles and consider distinguishable objects into indis-
tinguishable boxes.
Definition 3.1.1. Let X be a set. A partition of X is an unordered collection of nonempty
subsets $S_1, \dots, S_k$ of X such that every element of X belongs to exactly one of the $S_i$. An
ordered partition of X is the same, except the subsets are ordered. The $S_i$ are the blocks
of the partition. Partitions of sets are also called set partitions to distinguish from integer
partitions, which will be discussed in the next section. □
Example 3.1.2. Let X = {1, 2, 3}. There are 5 partitions of X:
{{1, 2, 3}}, {{1, 2}, {3}}, {{1, 3}, {2}}, {{2, 3}, {1}}, {{1}, {2}, {3}}.
When we say unordered collection of subsets, we mean that {{1, 2}, {3}} and {{3}, {1, 2}}
are to be considered the same partition.
The notation above is cumbersome, so we can also write the above partitions as follows:
123, 12|3, 13|2, 23|1, 1|2|3. □
The number of partitions of X with k blocks only depends on the number of elements of
X. So for concreteness, we will usually assume that X = [n].
Example 3.1.3. If we continue with our previous example of candy and children: imagine
the 20 pieces of candy are now labeled 1 through 20 and that the 4 children are all identical
clones. The number of ways to distribute candy to them so that each gets at least 1 piece
of candy is then the number of partitions of [20] into 4 blocks. □
Definition 3.1.4. We let S(n, k) be the number of partitions of a set of size n into k
blocks. These are called the Stirling numbers of the second kind. By convention, we
define S(0, 0) = 1. □
Note that S(n, k) = 0 if k > n.
The number of ordered partitions of a set of size n into k blocks is k!S(n, k): the extra data
we need is a way to order the blocks and this can be chosen independently of the partition.
So S(n, k) is, by definition, an answer to one of the 12-fold way entries: how many ways
to put n distinguishable objects into k indistinguishable boxes so that each box gets at least
one object. Similarly, k!S(n, k) is the number of ways to put n distinguishable objects into
k distinguishable boxes so that each box gets at least one object. Alternatively:
Theorem 3.1.5. k!S(n, k) is the number of surjective functions f : [n] → [k].
Unfortunately, it will be generally hard to get nice, exact formulas for S(n, k), but we can
do some special cases:
Example 3.1.6. For n ≥ 1, S(n, 1) = S(n, n) = 1. For n ≥ 2, $S(n, 2) = 2^{n-1} - 1$ and
$S(n, n-1) = \binom{n}{2}$.
Why? To compute S(n, 2), let’s first count ordered set partitions of [n] with 2 blocks.
This is almost the same as just picking a subset S, since then we can consider the partition
S, [n] \ S. The problem is that S is not allowed to be empty and neither is [n] \ S. So that
leaves us with $2^n - 2$ options for S, which is the number of ordered set partitions. To get
unordered partitions, we divide by 2!, or just 2.
To compute S(n, n − 1) think about what the blocks must look like. In order to split n
objects into n − 1 blocks, we need to have n − 2 blocks of size 1 and a single block of size
2. So the only relevant information is which elements go in that block of size 2. This can be
any subset of size 2, hence the $\binom{n}{2}$. □
We also have the following recursive formula:
Theorem 3.1.7. For n > 0, we have (interpret S(n, k) = 0 if either input is negative)
S(n, k) = S(n − 1, k − 1) + k · S(n − 1, k).
Proof. Consider two kinds of partitions of [n]. The first kind (type I) is when n is in its own
block. In that case, if we remove this block, then we obtain a partition of [n − 1] into k − 1
blocks. To reconstruct the original partition, we just add a block containing n by itself. So
the number of such partitions is S(n − 1, k − 1).
The second kind (type II) is when n is not in its own block. This time, if we remove n, we
get a partition of [n − 1] into k blocks. However, it’s not possible to reconstruct the original
partition because we can’t remember which block n belonged to. For the purposes of this proof
only, let’s define a marked partition to be a pair (σ, b) where σ is a set partition and b is one
of its blocks.
Then we can define a bijection
\[ f \colon \{\text{type II partitions of } [n] \text{ into } k \text{ blocks}\} \to \{\text{marked partitions of } [n-1] \text{ into } k \text{ blocks}\} \]
as follows: if τ is a type II partition of [n] into k blocks, then f (τ ) = (σ, b) where σ is the
same as τ except n is deleted, and b is whichever block originally contained n. The inverse
g is defined by adding back n into the block b. Finally, the number of marked partitions of
[n − 1] into k blocks is k · S(n − 1, k), since there are S(n − 1, k) choices for σ and k choices for the marked block b.
If we add both answers, we account for all possible partitions of [n], so we get the identity
we want. □
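The recurrence gives a practical way to tabulate S(n, k). Below is a minimal memoized Python sketch; the name `stirling2` is an assumption for illustration, and the base cases follow the conventions S(0, 0) = 1 and S(n, k) = 0 when exactly one of n, k is 0 or an input is negative. The assertions compare against the special cases from Example 3.1.6.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of partitions of an n-element set into k blocks."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    # Type I: n is alone in its block.  Type II: n joins one of k existing blocks.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


assert [stirling2(3, k) for k in range(4)] == [0, 1, 3, 1]
for n in range(2, 10):
    assert stirling2(n, 2) == 2 ** (n - 1) - 1
    assert stirling2(n, n - 1) == comb(n, 2)
```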
We define B(n) to be the number of partitions of [n] into any number of blocks. This is
the nth Bell number. By definition,
\[ B(n) = \sum_{k=0}^{n} S(n, k). \]
Proof. The left hand side counts the number of functions f : [n] → [d] (since such a function
is equivalent to the word f (1)f (2) · · · f (n) in the alphabet [d]). By Lemma 3.2.1, the right
side counts the number of functions whose image has size k for all possible values of k. But
that accounts for every function exactly once, so we have equality. □
It’s natural to try to guess what happens for $\sum_{i=0}^{n} i^r$ for general r, but the pattern is not
easy to guess. Falling factorials work much better.
Example 3.2.4. Since $(i)_1 = i$, we’ve already seen that
\[ \sum_{i=0}^{n} (i)_1 = \frac{1}{2}(n+1)n = \frac{1}{2}(n+1)_2. \]
For practice, you should prove this yourself. I won’t work it out in the interest of time and
because it’s a special case of the next identity. □
Given the above, there is a tempting guess for the general case. Let’s go ahead and prove
it:
Theorem 3.2.5. For any non-negative integer d, we have the identity
\[ \sum_{i=0}^{n} (i)_d = \frac{1}{d+1}(n+1)_{d+1}. \]
Proof. If d = 0, the identity just says that $\sum_{i=0}^{n} 1 = n + 1$, which is certainly true. So we
don’t need to consider this case anymore and we’re going to assume that d > 0. (I single
this out to avoid dealing with separate cases in the arguments below.)
Now we can try to prove the identity by induction on n. The statement P (n) is just the
identity above.
First the base case P (0): if n = 0, then since d > 0, the left side is 0 and the right side is
also 0 (the right side is $\frac{1}{d+1} \cdot 1 \cdot 0 \cdots (-d+1)$).
Now we assume that P (n) holds, i.e., the identity above is true, and need to prove P (n + 1):
\begin{align*}
\sum_{i=0}^{n+1} (i)_d &= \sum_{i=0}^{n} (i)_d + (n+1)_d \\
&= \frac{1}{d+1}(n+1)_{d+1} + (n+1)_d \\
&= (n+1)_d \cdot \frac{n-d+1}{d+1} + (n+1)_d \cdot 1 \\
&= (n+1)_d \left( \frac{n-d+1}{d+1} + \frac{d+1}{d+1} \right) \\
&= \frac{(n+1)_d \cdot (n+2)}{d+1} \\
&= \frac{1}{d+1}(n+2)_{d+1}.
\end{align*} □
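As a sanity check on Theorem 3.2.5, both sides can be computed for small n and d. In the Python sketch below, the function names are illustrative assumptions; the integer division is exact because $(n+1)_{d+1}$ is a product of d + 1 consecutive integers and hence divisible by d + 1.

```python
def falling_factorial(n: int, k: int) -> int:
    """(n)_k = n (n-1) ... (n-k+1); this is 0 whenever 0 <= n < k."""
    result = 1
    for j in range(k):
        result *= n - j
    return result


def left_side(n: int, d: int) -> int:
    """Sum of (i)_d for i = 0, ..., n."""
    return sum(falling_factorial(i, d) for i in range(n + 1))


def right_side(n: int, d: int) -> int:
    """(n+1)_{d+1} / (d+1), computed with exact integer division."""
    return falling_factorial(n + 1, d + 1) // (d + 1)


for d in range(6):
    for n in range(12):
        assert left_side(n, d) == right_side(n, d)
```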
We can try to use everything we’ve learned now to sum up general powers:
\begin{align*}
\sum_{i=0}^{n} i^r &= \sum_{i=0}^{n} \sum_{k=0}^{r} S(r, k)(i)_k \\
&= \sum_{k=0}^{r} S(r, k) \sum_{i=0}^{n} (i)_k \\
&= \sum_{k=0}^{r} \frac{S(r, k)}{k+1} (n+1)_{k+1}.
\end{align*}
It’s still somewhat complicated, but better than having no pattern at all. In the next
section we’ll see how to expand falling factorials into powers (opposite to what we’ve just
done).
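The final formula for power sums can be tested against a direct computation. This Python sketch recomputes S(r, k) from the recurrence in Theorem 3.1.7; the names `stirling2`, `falling_factorial`, and `power_sum` are illustrative assumptions. The r = 0 case relies on the convention $0^0 = 1$, which matches Python's `0 ** 0`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling numbers of the second kind via the recurrence."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def falling_factorial(n: int, k: int) -> int:
    """(n)_k = n (n-1) ... (n-k+1)."""
    result = 1
    for j in range(k):
        result *= n - j
    return result


def power_sum(n: int, r: int) -> int:
    """Sum of i^r for i = 0..n, using sum over k of S(r,k) (n+1)_{k+1} / (k+1)."""
    return sum(
        stirling2(r, k) * falling_factorial(n + 1, k + 1) // (k + 1)
        for k in range(r + 1)
    )


for r in range(6):
    for n in range(10):
        assert power_sum(n, r) == sum(i ** r for i in range(n + 1))
```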
This highlights the important concept for us: every permutation can be pictured as a disjoint
union of cycles. In our example, there are 3 cycles, and they are {1, 3}, {2, 4, 5}, and {6}.
Recall the cycle decomposition of a permutation $\sigma \in S_n$: starting with any 1 ≤ i ≤ n,
we consider the sequence $i, \sigma(i), \sigma^2(i), \dots, \sigma^{k-1}(i)$ where $\sigma^k(i) = i$ (there is guaranteed to be
such a k since σ has finite order). We write the cycle as $i \to \sigma(i) \to \cdots \to \sigma^{k-1}(i) \to i$. Note
that k could be 1, in which case the cycle has length 1 and also that there isn’t a unique
beginning (we could have started and ended with σ(i) instead of i).
In our running example, its cycle decomposition is 1 → 3 → 1, 2 → 4 → 5 → 2, 6 → 6.
The graph is probably the easiest way to think about it though for our purposes.
Let c(n, k) be the number of permutations in $S_n$ with exactly k different cycles. We use
the convention that c(0, 0) = 1. Note that c(n, 0) = 0 if n > 0. These are the (signless)
Stirling numbers of the first kind.
\[ f \colon \{\text{type II permutations of } [n] \text{ with } k \text{ cycles}\} \to \{\text{marked permutations of } [n-1] \text{ with } k \text{ cycles}\}. \]
\[ \sum_{k=0}^{n} c(n, k) x^k = x(x+1) \cdots (x+n-1), \]
\[ \sum_{k=0}^{n} (-1)^{n-k} c(n, k) x^k = (x)_n. \]
Proof. We prove the first identity by induction on n. For n = 0, both sides are 1. Similarly,
if n = 1, both sides are x. Now suppose n ≥ 2. Then c(n, 0) = c(n − 1, 0) = 0 and
\begin{align*}
\sum_{k=1}^{n} c(n, k) x^k &= x \sum_{k=1}^{n} c(n-1, k-1) x^{k-1} + (n-1) \sum_{k=1}^{n} c(n-1, k) x^k \\
&= x \sum_{k=0}^{n-1} c(n-1, k) x^k + (n-1) \sum_{k=0}^{n-1} c(n-1, k) x^k \\
&= (x+n-1) \sum_{k=0}^{n-1} c(n-1, k) x^k \\
&= (x+n-1) \cdot x(x+1) \cdots (x+n-2)
\end{align*}
where the last equality is by induction, and this proves what we claimed.
The second identity follows by doing the substitution $x \mapsto -x$ and multiplying by $(-1)^n$. □
The coefficients $(-1)^{n-k} c(n, k)$ are the Stirling numbers of the first kind, and are usually
denoted s(n, k).
Example 3.3.3. Some computations to discuss:
• c(n, n − 1) (assume n ≥ 2)
There’s only one possibility for cycle sizes: one of size 2 and n − 2 of size 1. There’s
nothing to do for the latter, so we need to pick the 2 numbers for the first cycle in $\binom{n}{2}$
many ways. Since there’s only one way to turn 2 numbers into a cycle, we’re done,
and $c(n, n-1) = \binom{n}{2}$.
• c(n, 1) (assume n ≥ 1):
We need to put all n numbers into a single cycle. Given a cycle, you can pick any
number and then list the rest in order to get a permutation (i.e., 1 → 2 → 3 → 1
can be thought of as 123 or 231 or 312). There are n! permutations, but we are
overcounting by a factor of n (the choice of where to start), so there are (n − 1)! ways
to put n things in a cycle and hence c(n, 1) = (n − 1)!.
• c(5, 3)
There are 2 cases for how big the cycles are: either sizes 3,1,1 or 2,2,1. In the first
case, we pick 3 numbers in $\binom{5}{3}$ ways and then pick a cycle in 2 ways, so there are 20
permutations in this case.
For the second case, we pick 2 numbers for one cycle in $\binom{5}{2}$ ways then pick another 2
numbers for the other cycle in $\binom{3}{2}$ ways. Now divide by 2 because we’re overcounting
(our choices lead to an order on the cycles), so we get 15 permutations in this case.
Thus, c(5, 3) = 35. □
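These three computations can be confirmed by brute force for small n. The Python sketch below lists all permutations and counts cycles directly; permutations are stored 0-indexed as tuples, and the names `cycle_count` and `c` are illustrative assumptions (the latter chosen to match the notation c(n, k)).

```python
from itertools import permutations
from math import comb, factorial


def cycle_count(perm: tuple) -> int:
    """Number of cycles of the permutation i -> perm[i] on {0, ..., n-1}."""
    seen = set()
    cycles = 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:  # walk around the cycle containing start
                seen.add(j)
                j = perm[j]
    return cycles


def c(n: int, k: int) -> int:
    """Count permutations of n elements with exactly k cycles by enumeration."""
    return sum(1 for p in permutations(range(n)) if cycle_count(p) == k)


assert c(5, 3) == 35
assert all(c(n, n - 1) == comb(n, 2) for n in range(2, 7))
assert all(c(n, 1) == factorial(n - 1) for n in range(1, 7))
```

Enumeration costs n! steps, so this is only practical for small n.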
It is possible to interpret this formula as the size of some set so that both sides are different
ways to count the number of elements in that set. Can you figure out how to do that? How
about if we took the derivative twice with respect to x? Or if we took it with respect to x
and then with respect to y?
4.2. Multinomial theorem. Below, we have sums with multiple lines below the summation
symbol. This usually means that we are summing over what is in the first line and the
following lines are conditions imposed on the things we sum. By default, the
variables represent integers. So for example,
\[ \sum_{\substack{i \\ 0 \le i \le 10}} \]
means the same thing as $\sum_{i=0}^{10}$.
To clarify, the sum is over all possible k-tuples of non-negative integers whose sum is n.
Proof. The proof is similar to the binomial theorem. Consider expanding the product $(x_1 + \cdots + x_k)^n$.
To do this, we first have to pick one of the $x_i$ from the first factor, pick another
one from the second factor, etc. To get the term $x_1^{a_1} x_2^{a_2} \cdots x_k^{a_k}$, we need to have picked $x_1$
exactly $a_1$ times, picked $x_2$ exactly $a_2$ times, etc. We can think of this as arranging n objects,
where $a_i$ of them have “type i”. In that case, we’ve already discussed that this is counted
by the multinomial coefficient $\binom{n}{a_1, a_2, \dots, a_k}$. □
By performing substitutions, we can get a bunch of identities that generalize the one from
the previous section. I’ll omit the proofs; try to fill them in.
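\begin{align*}
k^n &= \sum_{\substack{(a_1, a_2, \dots, a_k) \\ a_i \ge 0 \\ a_1 + \cdots + a_k = n}} \binom{n}{a_1, a_2, \dots, a_k}, \\
0 &= \sum_{\substack{(a_1, a_2, \dots, a_k) \\ a_i \ge 0 \\ a_1 + \cdots + a_k = n}} (1-k)^{a_1} \binom{n}{a_1, a_2, \dots, a_k}, \\
n k^{n-1} &= \sum_{\substack{(a_1, a_2, \dots, a_k) \\ a_i \ge 0 \\ a_1 + \cdots + a_k = n}} a_1 \binom{n}{a_1, a_2, \dots, a_k}.
\end{align*}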
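Before proving these identities, it can help to check them numerically. The Python sketch below sums over all k-tuples of non-negative integers adding up to n, for small n ≥ 1 and k ≥ 1; the helper names `multinomial` and `tuples_summing_to` are illustrative assumptions.

```python
from itertools import product
from math import factorial


def multinomial(parts: tuple) -> int:
    """n! / (a_1! a_2! ... a_k!) where n = a_1 + ... + a_k."""
    result = factorial(sum(parts))
    for a in parts:
        result //= factorial(a)
    return result


def tuples_summing_to(n: int, k: int) -> list:
    """All k-tuples of non-negative integers whose sum is n."""
    return [t for t in product(range(n + 1), repeat=k) if sum(t) == n]


for n in range(1, 5):
    for k in range(1, 4):
        ts = tuples_summing_to(n, k)
        assert sum(multinomial(t) for t in ts) == k ** n
        assert sum((1 - k) ** t[0] * multinomial(t) for t in ts) == 0
        assert sum(t[0] * multinomial(t) for t in ts) == n * k ** (n - 1)
```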
4.3. Re-indexing sums. The next chunk of the course heavily involves sums and ma-
nipulating them, so let me make a few remarks about re-indexing sums. There isn’t any
mathematical content here, it’s just working with notation, but it may be helpful to have
this spelled out.
Say we have a sum starting from 1 and going to some other quantity, like 10:
\[ \sum_{i=1}^{10} f(i). \]
For whatever reason, we might prefer that it starts at 0. You can do this by defining j = i−1.
If you substitute i = j + 1 everywhere, you get
\[ \sum_{j=0}^{9} f(j+1). \]
If you like, you can now replace j with i again to get $\sum_{i=0}^{9} f(i+1)$. This is a common thing
we’ll do, so it’s good to get used to it. This is especially useful if we want to combine sums
that don’t have the same starting and ending points:
\[ \sum_{i=1}^{10} f(i) + \sum_{k=0}^{9} g(k) = \sum_{i=0}^{9} f(i+1) + \sum_{k=0}^{9} g(k) = \sum_{i=0}^{9} (f(i+1) + g(i)). \]
Let $B(x) = \sum_{n \ge 0} b_n x^n$ be a formal power series. The sum of two formal power series is
defined by
\[ A(x) + B(x) = \sum_{n \ge 0} (a_n + b_n) x^n. \]
The product is defined by
\[ A(x)B(x) = \sum_{n \ge 0} c_n x^n, \qquad c_n = \sum_{i=0}^{n} a_i b_{n-i}. \]
This is what you get if you just distribute like normal. As a special case, if $a_i = 0$ for i > 0,
we just get
\[ a_0 B(x) = \sum_{n \ge 0} a_0 b_n x^n. \]
Addition and multiplication are commutative, so A(x)+B(x) = B(x)+A(x) and A(x)B(x) =
B(x)A(x). They are also associative, so it is unambiguous how to add or multiply 3 or more
power series.
Example 5.1.1. Let $A(x) = B(x) = \sum_{n \ge 0} x^n$. Then
\[ A(x) + B(x) = \sum_{n \ge 0} 2x^n, \]
\[ A(x)B(x) = \sum_{n \ge 0} (n+1) x^n. \] □
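Since the coefficient $c_n$ only involves $a_0, \dots, a_n$ and $b_0, \dots, b_n$, the product can be computed on truncated coefficient lists. Here is a minimal Python sketch under that assumption (a power series is represented by the list of its first N coefficients; the function names are illustrative), applied to the geometric series from the example.

```python
def add(a: list, b: list) -> list:
    """Coefficientwise sum of two truncated power series."""
    return [x + y for x, y in zip(a, b)]


def multiply(a: list, b: list) -> list:
    """Product c_n = sum_{i=0}^{n} a_i b_{n-i}, truncated to the shorter input."""
    n_terms = min(len(a), len(b))
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(n_terms)]


geometric = [1] * 8  # first 8 coefficients of 1 + x + x^2 + ...
assert add(geometric, geometric) == [2] * 8
assert multiply(geometric, geometric) == [1, 2, 3, 4, 5, 6, 7, 8]
```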
A formal power series A(x) is invertible if there is a power series B(x) such that A(x)B(x) =
1. In that case, we write $B(x) = A(x)^{-1} = 1/A(x)$ and call it the inverse of A(x). If it
exists, then B(x) is unique and also A(x) = 1/B(x).
Example 5.1.2. Let $A(x) = \sum_{n \ge 0} x^n$ and B(x) = 1 − x. Then A(x)B(x) = 1, so B(x) is
the inverse of A(x). For that reason, we will use the expression
\[ \frac{1}{1-x} = \sum_{n \ge 0} x^n. \]
Following the calculus terminology, we call this the geometric series. However, the formal
power series x is not invertible: the constant term of xB(x) is 0 no matter what B(x) is, so
there is no way that an inverse exists. □
Theorem 5.1.3. A formal power series A(x) is invertible if and only if its constant term is
nonzero.
Proof. Write $A(x) = \sum_{n \ge 0} a_n x^n$. We want to solve A(x)B(x) = 1 if possible. If we multiply
the left side out and equate coefficients, we get the following (infinite) system of equations:
\begin{align*}
a_0 b_0 &= 1 \\
a_0 b_1 + a_1 b_0 &= 0 \\
a_0 b_2 + a_1 b_1 + a_2 b_0 &= 0 \\
a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 &= 0 \\
&\;\;\vdots
\end{align*}
In the sum, we have i > 0, so $b_{n-i}$ is a coefficient we already solved for in a previous step.
Hence we get a formula for $b_n$ that makes the next equation valid as well. □
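The system of equations can be solved one coefficient at a time. The sketch below assumes the recursion obtained by solving the n-th equation for $b_n$, namely $b_0 = 1/a_0$ and $b_n = -\frac{1}{a_0} \sum_{i=1}^{n} a_i b_{n-i}$ for n ≥ 1; this is my reading of the system above rather than a formula quoted from the notes. It uses Python's `fractions.Fraction` for exact arithmetic, and the function name `inverse` is an assumption.

```python
from fractions import Fraction


def inverse(a: list, n_terms: int) -> list:
    """First n_terms coefficients of 1/A(x); coefficients of A past len(a) are 0."""
    if a[0] == 0:
        raise ValueError("not invertible: the constant term is zero")

    def coeff(i: int):
        return a[i] if i < len(a) else 0

    b = [Fraction(1) / a[0]]  # from a_0 b_0 = 1
    for n in range(1, n_terms):
        # a_0 b_n + a_1 b_{n-1} + ... + a_n b_0 = 0, solved for b_n
        tail = sum(coeff(i) * b[n - i] for i in range(1, n + 1))
        b.append(-tail / a[0])
    return b


# The inverse of 1 - x is the geometric series 1 + x + x^2 + ...
assert inverse([1, -1], 6) == [1, 1, 1, 1, 1, 1]
```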
It is important to emphasize that formal here means that we are not considering questions
of convergence. We can take infinite sums and infinite products of formal power series as
long as the coefficient of $x^n$ involves only finitely many multiplications and additions for each
n (adding 0 or multiplying by 1 infinitely many times is ok). I don’t want to spend much
time discussing these issues but they do come up, so let’s go over it briefly.
Given a nonzero power series A(x), define its minimum degree, denoted mdeg(A(x)) to
be the smallest n so that $[x^n]A(x) \ne 0$ (and define mdeg(0) = ∞). Infinite operations are
allowed whenever the computation of a given coefficient is finite.
Theorem 5.1.4. Let $A_1(x), A_2(x), \dots$ be formal power series such that $\lim_{i \to \infty} \mathrm{mdeg}(A_i(x)) = \infty$. Then the following
two expressions are well-defined (i.e., computing the coefficient of $x^n$ is always a finite
process):
(1) the infinite sum $\sum_{i=1}^{\infty} A_i(x) = A_1(x) + A_2(x) + \cdots$, and
(2) the infinite product $\prod_{i=1}^{\infty} (1 + A_i(x)) = (1 + A_1(x))(1 + A_2(x)) \cdots$.
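Example 5.1.5. I’ll give an example to illustrate the intuition for infinite sums. Let $A_i(x) = x^i + x^{i+1} + x^{i+2} + \cdots$,
so like the geometric series, but starting at $x^i$. Then $\mathrm{mdeg}(A_i(x)) = i$,
and here’s how we can think of the infinite sum $A_1(x) + A_2(x) + \cdots$:
\[
\begin{array}{ccccccccccc}
x & + & x^2 & + & x^3 & + & x^4 & + & x^5 & + & \cdots \\
  &   & x^2 & + & x^3 & + & x^4 & + & x^5 & + & \cdots \\
  &   &     &   & x^3 & + & x^4 & + & x^5 & + & \cdots \\
  &   &     &   &     &   & x^4 & + & x^5 & + & \cdots \\
  &   &     &   &     &   &     &   & x^5 & + & \cdots \\
  &   &     &   &     &   &     &   &     &   & \vdots \\
\hline
x & + & 2x^2 & + & 3x^3 & + & 4x^4 & + & 5x^5 & + & \cdots
\end{array}
\]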
Even though the sum is infinite, each column (coefficient) only requires a finite sum, so we
don’t run into issues. □
Given two formal power series A(x) and B(x), suppose that A(x) has no constant term.
Then we can define the composition by
\[ (B \circ A)(x) = B(A(x)) = \sum_{n \ge 0} b_n A(x)^n. \]
(1 − x)B(x) = 1
to get
\[ (1 - x^d) \sum_{n \ge 0} x^{dn} = 1, \]
from which we conclude that
\[ \frac{1}{1 - x^d} = \sum_{n \ge 0} x^{dn}. \] □
We can also take the derivative D of a formal power series. We define it as follows:
\[ (DA)(x) = A'(x) = \sum_{n \ge 0} n a_n x^{n-1} = \sum_{n \ge 0} (n+1) a_{n+1} x^n. \]
When m is a non-negative integer, this agrees with the ordinary binomial theorem with
y = 1. When m is a negative integer, the meaning is $(1 + x)^m = 1/(1 + x)^{-m}$. For fractional
m, we can also interpret them. For example, $(1 + x)^{1/2} = \sqrt{1 + x}$, which represents a formal
power series whose square is equal to 1 + x. In other words,
\[ \left( \sum_{n \ge 0} \binom{1/2}{n} x^n \right)^2 = 1 + x. \]
We won’t use the fractional case beyond m = 1/2 much so I’m not going to go into any
further details about their definition.
This will be useful in later calculations. Let’s work out a few cases.
Example 5.2.2. Consider m = −1. We know from before that
\[ \frac{1}{1-x} = \sum_{n \ge 0} x^n. \]
We should also be able to get this from the binomial theorem with m = −1. We have
\[ \binom{-1}{n} = \frac{(-1)(-2) \cdots (-1-n+1)}{n!} = \frac{(-1)^n n!}{n!} = (-1)^n. \]
More generally, consider m = −d for some positive integer d. Then from what we just did,
we have
\[ (1 + x)^{-d} = \left( \sum_{n \ge 0} (-1)^n x^n \right)^d. \]
The right side could be expanded, possibly by using induction on d, but we’d have to know
a pattern before we could proceed. Instead, let’s use the binomial theorem directly:
(−1)n (d + n − 1)(d + n − 2) · · · (d)
−d (−d)(−d − 1) · · · (−d − n + 1)
= =
n n! n!
(d + n − 1)! d+n−1
= (−1)n = (−1)n .
(d − 1)!n! n
This gives us the identities
\begin{align*}
\frac{1}{(1 + x)^d} &= \sum_{n \ge 0} (-1)^n \binom{d + n - 1}{n} x^n, \\
\frac{1}{(1 - x)^d} &= \sum_{n \ge 0} \binom{d + n - 1}{n} x^n.
\end{align*}
□
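As a quick numerical sanity check of the second identity, here is a minimal Python sketch. It assumes we represent a power series by the list of its first few coefficients; the function names `geometric_power` and `binomial_formula` are made up for this illustration.

```python
from math import comb

def geometric_power(d, terms):
    """First `terms` coefficients of 1/(1-x)^d, found by multiplying by 1/(1-x) d times."""
    coeffs = [1] + [0] * (terms - 1)  # the constant series 1
    for _ in range(d):
        # Multiplying by 1/(1-x) replaces each coefficient by a running sum.
        running = 0
        for n in range(terms):
            running += coeffs[n]
            coeffs[n] = running
    return coeffs

def binomial_formula(d, terms):
    """Coefficients predicted by the identity: binom(d+n-1, n)."""
    return [comb(d + n - 1, n) for n in range(terms)]

print(geometric_power(3, 6))   # [1, 3, 6, 10, 15, 21]
print(binomial_formula(3, 6))  # [1, 3, 6, 10, 15, 21]
```

The two lists agree for every $d$ and truncation length I tried, which is what the identity predicts.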
To check that by hand, we could expand the left side, but it would be a lot of work. □
6.1. Linear recurrence relations. Our first application of ordinary generating functions is
to solve linear recurrence relations. A sequence of numbers $(a_n)_{n \ge 0}$ is said to satisfy a homogeneous linear recurrence relation of order $d$ if there are scalars $c_1, \dots, c_d$ such that $c_d \neq 0$, and for all $n \ge d$, we have
\[ a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_d a_{n-d}. \]
We’ve seen this idea before, although in slightly different forms.
Example 6.1.1. The Fibonacci numbers $F_n$ are given by the sequence $1, 1, 2, 3, 5, 8, 13, 21, \dots$. This isn’t really telling you what the general $F_n$ is, so instead let me say that for all $n \ge 2$, we have
\[ F_n = F_{n-1} + F_{n-2}. \]
Together with the initial conditions $F_0 = 1$, $F_1 = 1$, this is enough information to calculate any $F_n$. So (by definition), the Fibonacci numbers satisfy a linear recurrence relation of order 2. □
In general, if we want to define a sequence using a linear recurrence relation of order $d$, we need to specify the first $d$ initial values $a_0, a_1, \dots, a_{d-1}$ to allow us to calculate all of the terms.
Our goal here is to get closed formulas for sequences that satisfy linear recurrence relations.
Example 6.1.2. When $d = 1$, this is easy to do:
\[ a_n = c_1 a_{n-1} = c_1^2 a_{n-2} = c_1^3 a_{n-3} = \cdots = c_1^n a_0. \]
□
28 STEVEN V SAM
Remember the recurrence is only valid for $n \ge 2$, so we have to separate out the first two terms. Now comes an important point: the last two sums are almost the same as $A(x)$ if we re-index them:
\begin{align*}
\sum_{n \ge 2} a_{n-1} x^n &= \sum_{n \ge 1} a_n x^{n+1} = x \sum_{n \ge 1} a_n x^n = xA(x) - a_0 x, \\
\sum_{n \ge 2} a_{n-2} x^n &= \sum_{n \ge 0} a_n x^{n+2} = x^2 A(x).
\end{align*}
In particular,
\[ A(x) = a_0 + a_1 x + c_1 x A(x) - c_1 a_0 x + c_2 x^2 A(x). \]
We can rewrite this as
\[ A(x) = \frac{a_0 + (a_1 - c_1 a_0) x}{1 - c_1 x - c_2 x^2}. \tag{6.1.6} \]
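To see formula (6.1.6) in action, the following Python sketch expands the rational function as a power series by clearing the denominator one coefficient at a time, and compares the result with the sequence produced directly by the recurrence. The function names are hypothetical and chosen only for this illustration.

```python
def series_from_gf(a0, a1, c1, c2, terms):
    """Coefficients of (a0 + (a1 - c1*a0) x) / (1 - c1 x - c2 x^2)."""
    numerator = [a0, a1 - c1 * a0]
    coeffs = []
    for n in range(terms):
        value = numerator[n] if n < 2 else 0
        # Clearing the denominator: b_n - c1*b_{n-1} - c2*b_{n-2} = numerator[n].
        if n >= 1:
            value += c1 * coeffs[n - 1]
        if n >= 2:
            value += c2 * coeffs[n - 2]
        coeffs.append(value)
    return coeffs

def recurrence(a0, a1, c1, c2, terms):
    """First `terms` values of a_n = c1*a_{n-1} + c2*a_{n-2}."""
    seq = [a0, a1]
    while len(seq) < terms:
        seq.append(c1 * seq[-1] + c2 * seq[-2])
    return seq[:terms]

# Fibonacci numbers: a0 = a1 = 1, c1 = c2 = 1.
print(series_from_gf(1, 1, 1, 1, 8))  # [1, 1, 2, 3, 5, 8, 13, 21]
```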
We want to factor the denominator. To do this, plug in $t \mapsto x^{-1}$ into (6.1.3) and multiply by $x^2$ to get
\[ 1 - c_1 x - c_2 x^2 = (1 - r_1 x)(1 - r_2 x). \]
Now we can apply partial fraction decomposition to (6.1.6) to write
\[ A(x) = \frac{\alpha_1}{1 - r_1 x} + \frac{\alpha_2}{1 - r_2 x} \]
for some constants $\alpha_1, \alpha_2$. But these terms are both geometric series, so we can further write
\[ A(x) = \alpha_1 \sum_{n \ge 0} r_1^n x^n + \alpha_2 \sum_{n \ge 0} r_2^n x^n. \]
The coefficient of $x^n$ on the left side is $a_n$ and the coefficient of $x^n$ on the right side is $\alpha_1 r_1^n + \alpha_2 r_2^n$.
\[ A(x) = \beta_1 \sum_{n \ge 0} r_1^n x^n + \beta_2 \sum_{n \ge 0} (n + 1) r_1^n x^n. \]
Example 6.1.8. We’ve only been dealing with homogeneous linear recurrence relations so far, i.e., $a_n$ is expressed as a linear combination of previous terms, but how about the inhomogeneous case? For example, consider the recurrence relation
\[ a_n = a_{n-1} + a_{n-2} + 2 \qquad (n \ge 2). \]
When we don’t know what to do, we can always try to find a formula for the generating function. In this case, setting $A(x) = \sum_{n \ge 0} a_n x^n$, we have
\begin{align*}
A(x) &= a_0 + a_1 x + \sum_{n \ge 2} a_n x^n \\
&= a_0 + a_1 x + \sum_{n \ge 2} (a_{n-1} + a_{n-2} + 2) x^n \\
&= a_0 + a_1 x + x(A(x) - a_0) + x^2 A(x) + \frac{2x^2}{1 - x}
\end{align*}
and then we can solve for $A(x)$ as before (I’ll stop here). Sometimes, there are shortcuts we can use to turn these into homogeneous linear recurrence relations (though of higher order). For example, if $n \ge 3$, then we know that $a_n = a_{n-1} + a_{n-2} + 2$ and $a_{n-1} = a_{n-2} + a_{n-3} + 2$, so taking the difference gives
\[ a_n = 2a_{n-1} - a_{n-3} \]
which is now order 3, but homogeneous. We originally had 2 initial values $a_0$ and $a_1$, so we should remember that $a_2$ can be determined by using the original equation $a_2 = a_1 + a_0 + 2$.
This works out for a lot of different kinds of inhomogeneous situations, but instead of taking a difference, we may have to take other linear combinations (for example, instead of a constant 2, we might have $2^n$) and repeating the process can be helpful too (for example, instead of a constant 2, we might have $2n$) as well as combining these ideas (for example, $n2^n$). □
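The reduction to a homogeneous recurrence in Example 6.1.8 is easy to test numerically. The sketch below (function names are made up for this illustration) builds the sequence both ways, from $a_n = a_{n-1} + a_{n-2} + 2$ and from $a_n = 2a_{n-1} - a_{n-3}$ with $a_2 = a_1 + a_0 + 2$, and checks that they agree.

```python
def inhomogeneous(a0, a1, terms):
    """Sequence defined by a_n = a_{n-1} + a_{n-2} + 2 for n >= 2."""
    seq = [a0, a1]
    while len(seq) < terms:
        seq.append(seq[-1] + seq[-2] + 2)
    return seq[:terms]

def homogeneous(a0, a1, terms):
    """Same sequence via a_n = 2*a_{n-1} - a_{n-3} for n >= 3."""
    seq = [a0, a1, a1 + a0 + 2]  # a_2 comes from the original equation
    while len(seq) < terms:
        seq.append(2 * seq[-1] - seq[-3])
    return seq[:terms]

print(inhomogeneous(0, 1, 6))  # [0, 1, 3, 6, 11, 19]
print(homogeneous(0, 1, 6))    # [0, 1, 3, 6, 11, 19]
```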
Remark 6.1.9. Finally, let me explain one thing to make the inhomogeneous case a little easier.
Start with a homogeneous recurrence relation
\[ a_n = c_1 a_{n-1} + \cdots + c_d a_{n-d}. \]
Its characteristic polynomial is $t^d - c_1 t^{d-1} - \cdots - c_d$. Given a constant $r$, it will be useful to know that the characteristic polynomial of the difference
\[
\begin{array}{rl}
 & a_n = c_1 a_{n-1} + \cdots + c_d a_{n-d} \\
-r\,( & a_{n-1} = c_1 a_{n-2} + \cdots + c_d a_{n-d-1}\,) \\
\hline
 & a_n - r a_{n-1} = c_1 (a_{n-1} - r a_{n-2}) + \cdots + c_d (a_{n-d} - r a_{n-d-1})
\end{array}
\]
is $(t - r)(t^d - c_1 t^{d-1} - \cdots - c_d)$, i.e., we pick up a factor of $t - r$.
I’ll just give an example of how you can use this. Say you had the inhomogeneous equation
\[ a_n = c_1 a_{n-1} + c_2 a_{n-2} + 2^n. \]
I’d want to take $r = 2$ above to get the difference
\[ a_n - 2a_{n-1} = c_1 (a_{n-1} - 2a_{n-2}) + c_2 (a_{n-2} - 2a_{n-3}). \]
This is now homogeneous and its characteristic polynomial is $(t - 2)(t^2 - c_1 t - c_2)$ (the second factor comes from ignoring the inhomogeneous part of the original equation). □
6.2. Integer partitions. Now we deal with the notion of an “unordered composition”.
These are much harder to study than compositions, which is why we’ve postponed it until
now.
Definition 6.2.1. Let $n$ be a positive integer. A partition $\lambda$ of $n$ is an unordered collection of positive integers $a_1, \dots, a_k$ such that $a_1 + \cdots + a_k = n$. The $a_i$ are the parts of $\lambda$. These are also called integer partitions to distinguish from set partitions. The number $k$ is called the length of the partition, and denoted $\ell(\lambda)$. We also say that it’s the number of parts of $\lambda$. Then $n$ is the size of the partition, and we denote this by $|\lambda| = n$.
The number of partitions of $n$ is denoted $p(n)$, the number of partitions of $n$ with $k$ parts is denoted $p_k(n)$, and the number of partitions of $n$ with at most $k$ parts is denoted $p_{\le k}(n)$.
By convention, there is exactly one partition of $n = 0$, and it has length 0; we denote it by the empty set $\emptyset$. □
In other words, 2, 3 represents the same partition as 3, 2 since we do not distinguish
between different orderings.
Definition 6.2.2. We can always write the numbers in decreasing order, and we call that
the normal form of the partition. This gives an unambiguous way to write each partition,
and we’ll denote it with tuple notation. □
In the previous example, the normal form for this partition is (3, 2). We will almost always write partitions in normal form.
Example 6.2.3. p(5) = 7 since there are 7 partitions of 5:
(5), (4, 1), (3, 2), (3, 1, 1), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1). □
Definition 6.2.4. There’s another convenient way to describe partitions. Given a partition $\lambda$ and a positive integer $k$, let $m_k(\lambda)$ be the number of times that $k$ appears in $\lambda$. This is the multiplicity of $k$ in $\lambda$. If we know all of the multiplicities, then we also know the partition, so the multiplicities can also be used to describe $\lambda$. □
Note that $|\lambda| = \sum_k m_k(\lambda) k$. The sum is always finite since $m_k$ only takes nonzero values for a finite number of $k$.
Example 6.2.5. For $\lambda = (4, 2, 2, 1)$, we have $m_4(\lambda) = 1$, $m_2(\lambda) = 2$, $m_1(\lambda) = 1$, and $m_k(\lambda) = 0$ for all other $k$. The above fact just says that $|\lambda| = 4 + 2 \cdot 2 + 1$. □
We can visualize partitions using Young diagrams. To illustrate, the Young diagram of $(4, 2, 1)$ is
$Y(\lambda) =$ [Figure: Young diagram with rows of 4, 2, and 1 boxes.]
In general, it is a left-justified collection of boxes with $\lambda_i$ boxes in the $i$th row (counting from top to bottom).
The transpose (or conjugate) of a partition λ is the partition whose Young diagram is
obtained by flipping Y (λ) across the main diagonal. For example, the transpose of (4, 2, 1)
is (3, 2, 1, 1):
Note that we get the parts of a partition from a Young diagram by reading off the row lengths. The transpose is obtained by instead reading off the column lengths. The notation is $\lambda^T$. If we want a formula: $\lambda^T_i = |\{j \mid \lambda_j \ge i\}|$.
Note that $(\lambda^T)^T = \lambda$. A partition $\lambda$ is self-conjugate if $\lambda = \lambda^T$.
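The formula for the transpose translates directly into code. Here is a small Python sketch, assuming a partition is stored as a tuple in normal form (weakly decreasing); the function name `transpose` is just a label for this illustration.

```python
def transpose(partition):
    """Transpose of a partition given in normal form (weakly decreasing tuple)."""
    if not partition:
        return ()
    # The i-th part of the transpose counts the parts of size at least i.
    return tuple(
        sum(1 for part in partition if part >= i)
        for i in range(1, partition[0] + 1)
    )

print(transpose((4, 2, 1)))     # (3, 2, 1, 1)
print(transpose((4, 2, 1, 1)))  # (4, 2, 1, 1), so this one is self-conjugate
```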
Example 6.2.6. Some self-conjugate partitions: (4, 3, 2, 1), (5, 1, 1, 1, 1), (4, 2, 1, 1):
[Figure: Young diagrams of these three partitions.]
□
Theorem 6.2.7. The number of partitions $\lambda$ of $n$ with $\ell(\lambda) \le k$ is the same as the number of partitions $\mu$ of $n$ such that all $\mu_i \le k$.
Proof. We get a bijection between the two sets by taking transpose. Details omitted. □
This tells us that $p_{\le k}(n)$, which is defined to be the number of partitions of $n$ with at most $k$ parts, is also the number of partitions of $n$ using only the parts $1, \dots, k$. We’ll use this second interpretation now.
We want a simple expression for $\sum_{n \ge 0} p_{\le k}(n) x^n$. When $k = 1$, we get $p_{\le 1}(n) = 1$ for all $n$, so
\[ \sum_{n \ge 0} p_{\le 1}(n) x^n = \frac{1}{1 - x}. \]
Now consider $k = 2$. We can think of partitions in terms of how many 1’s they use and how many 2’s they use, i.e., in terms of their multiplicities $(m_1(\lambda), m_2(\lambda))$ (there is no $m_i(\lambda)$ for $i \ge 3$). Then consider the product
\[ (1 + x + x^2 + x^3 + \cdots)(1 + x^2 + (x^2)^2 + (x^2)^3 + \cdots). \]
When we multiply this out, each term is of the form $x^a (x^2)^b = x^{a+2b}$, so we see that the total coefficient of $x^n$ is exactly the number of ways of writing $n$ as a sum of 1’s and 2’s since this specific term can be thought of as the partition $\lambda$ with $m_1(\lambda) = a$ and $m_2(\lambda) = b$. Both sums are geometric series, so we have
\[ \sum_{n \ge 0} p_{\le 2}(n) x^n = \frac{1}{(1 - x)(1 - x^2)}. \]
This same reasoning extends to any $k$, and we can prove that
\[ \sum_{n \ge 0} p_{\le k}(n) x^n = \prod_{i=1}^{k} \frac{1}{1 - x^i} = \frac{1}{(1 - x)(1 - x^2) \cdots (1 - x^k)}. \]
We can actually take $k \to \infty$ to guess the formula (due to Euler)
\[ \sum_{n \ge 0} p(n) x^n = \prod_{i \ge 1} \frac{1}{1 - x^i}. \]
Why is this correct? First, we specify that the meaning of an infinite product of terms of
the form 1 + · · · is to multiply out choices where something with a positive power of x is
only chosen a finite number of times (so that each term has finite degree and we’re otherwise
multiplying 1 infinitely many times).
Consider the coefficient of $x^d$ in the infinite product on the right. We have to consider the infinite product
\[ (1 + x + x^2 + \cdots)(1 + x^2 + x^4 + \cdots)(1 + x^3 + x^6 + \cdots) \cdots \]
and the only way to get $x^d$ is to choose 1 from $(1 + x^i + x^{2i} + \cdots)$ if $i > d$, so the coefficient of $x^d$ is the same as the coefficient of $x^d$ in $\prod_{i=1}^{d} \frac{1}{1 - x^i} = \sum_{n \ge 0} p_{\le d}(n) x^n$. Since $p_{\le d}(d) = p(d)$, the infinite product indeed has the right coefficients.
More generally, the same argument proves the following:
Theorem 6.2.8. For any subset $S$ of the positive integers, the generating function for the number of partitions that only use parts from $S$ is
\[ \prod_{i \in S} \frac{1}{1 - x^i}. \]
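Theorem 6.2.8 also gives a practical way to compute these counts. The Python sketch below multiplies in the factors $1/(1 - x^i)$ one at a time, keeping only the coefficients up to $x^{\text{max\_n}}$. It assumes the allowed parts are positive integers; the name `partition_counts` is made up for this illustration.

```python
def partition_counts(parts, max_n):
    """Coefficients of x^0..x^max_n in the product over i in parts of 1/(1 - x^i)."""
    coeffs = [1] + [0] * max_n
    for i in sorted(set(parts)):
        if i > max_n:
            continue  # such a factor only contributes 1 below x^(max_n + 1)
        # Multiplying by 1/(1 - x^i) means new[n] = old[n] + new[n - i].
        for n in range(i, max_n + 1):
            coeffs[n] += coeffs[n - i]
    return coeffs

print(partition_counts(range(1, 6), 5)[5])  # 7, matching p(5) = 7
print(partition_counts([1, 2], 6))          # [1, 1, 2, 2, 3, 3, 4]
```

Taking the parts to be $1, \dots, k$ recovers $p_{\le k}(n)$, and taking them to be $1, \dots, n$ recovers $p(n)$.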
The above formula lets us restrict which parts are allowed, but does not impose restrictions
on how many times each part can be used. We can actually restrict both. It’s probably easiest
to first examine this with examples. The general case follows the same idea but requires a
lot of notation to state, so I won’t attempt to do so.
Example 6.2.9. Let $a_n$ be the number of integer partitions of $n$ that only use the parts 1, 2, and 3, with the additional constraint that 3 can only be used at most 2 times. Examining how we understood the products above, the generating function for $a_n$ is the following product
\[ (1 + x + x^2 + x^3 + \cdots)(1 + x^2 + (x^2)^2 + (x^2)^3 + \cdots)(1 + x^3 + (x^3)^2) = \frac{1 + x^3 + x^6}{(1 - x)(1 - x^2)}. \]
As before, the first factor corresponds to how many times the part 1 is used, the second factor corresponds to how many times 2 gets used, and the third factor corresponds to how many times 3 gets used. Explicitly, if we multiply this out, then each term looks like $x^a (x^2)^b (x^3)^c$ where $a, b$ are not constrained, but $0 \le c \le 2$. This counts the partition where 1 appears $a$ times, 2 appears $b$ times, and 3 appears $c$ times. □
Example 6.2.10. Let $b_n$ be the number of integer partitions of $n$ that only use the parts 3 and 4, but the number of times that 4 appears has to be odd. Then its generating function is the following product:
\[ (1 + (x^3) + (x^3)^2 + (x^3)^3 + \cdots)((x^4) + (x^4)^3 + (x^4)^5 + \cdots) = \frac{x^4}{(1 - x^3)(1 - x^8)}. \]
□
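The two examples above follow one pattern: each part gets a factor recording which multiplicities are allowed. Here is a hedged Python sketch of that pattern using truncated coefficient lists; the helper names `factor` and `multiply` and the two wrapper functions are invented for this illustration.

```python
def factor(part, allowed, max_n):
    """Truncated series: sum of x^(part*m) over multiplicities m with allowed(m)."""
    coeffs = [0] * (max_n + 1)
    for m in range(max_n // part + 1):
        if allowed(m):
            coeffs[part * m] = 1
    return coeffs

def multiply(f, g, max_n):
    """Product of two truncated series, keeping terms up to x^max_n."""
    h = [0] * (max_n + 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j <= max_n:
                h[i + j] += fi * gj
    return h

def example_6_2_9(max_n):
    """Parts 1, 2, 3 with the part 3 used at most twice."""
    result = factor(1, lambda m: True, max_n)
    result = multiply(result, factor(2, lambda m: True, max_n), max_n)
    return multiply(result, factor(3, lambda m: m <= 2, max_n), max_n)

def example_6_2_10(max_n):
    """Parts 3 and 4 with the part 4 used an odd number of times."""
    threes = factor(3, lambda m: True, max_n)
    fours = factor(4, lambda m: m % 2 == 1, max_n)
    return multiply(threes, fours, max_n)

print(example_6_2_9(6))   # [1, 1, 2, 3, 4, 5, 7]
print(example_6_2_10(8))  # [0, 0, 0, 0, 1, 0, 0, 1, 0]
```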
Now let’s take this idea and prove an interesting identity due to Euler.
Let $p_{\mathrm{odd}}(n)$ be the number of partitions of $n$ such that all parts are odd. Let $p_{\mathrm{dist}}(n)$ be the number of partitions of $n$ such that all parts are distinct.
Theorem 6.2.11 (Euler). $p_{\mathrm{odd}}(n) = p_{\mathrm{dist}}(n)$.
For example, when $n = 5$, both quantities are 3 since we have (5), (3, 1, 1), (1, 1, 1, 1, 1) for $p_{\mathrm{odd}}(5)$ and (5), (4, 1), (3, 2) for $p_{\mathrm{dist}}(5)$.
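Before reading the proof, it can be reassuring to check Euler's identity numerically. The following Python sketch computes both counts from truncated products; the function names are assumptions of this illustration, not notation from the notes.

```python
def count_odd_parts(max_n):
    """Number of partitions of 0..max_n using only odd parts."""
    coeffs = [1] + [0] * max_n
    for i in range(1, max_n + 1, 2):
        # Multiply by 1/(1 - x^i): part i may be used any number of times.
        for n in range(i, max_n + 1):
            coeffs[n] += coeffs[n - i]
    return coeffs

def count_distinct_parts(max_n):
    """Number of partitions of 0..max_n into distinct parts."""
    coeffs = [1] + [0] * max_n
    for i in range(1, max_n + 1):
        # Multiply by (1 + x^i); going downward uses part i at most once.
        for n in range(max_n, i - 1, -1):
            coeffs[n] += coeffs[n - i]
    return coeffs

print(count_odd_parts(5)[5], count_distinct_parts(5)[5])  # 3 3
```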
Proof. There are ways to build bijections, but let’s prove this by showing that they have the
same generating function since the idea is a little surprising and could even be considered
fun.
To multiply out the right side, we either choose 1 or $x^i$ from the $i$th term, and we can only avoid choosing 1 finitely many times. What we get then is $x^N$ where $N$ is the sum of the $i$ where we chose $x^i$. But we get $x^N$ one time for every partition of $N$ into distinct parts, so the coefficient is $p_{\mathrm{dist}}(N)$.
Now we observe that $(1 + x^i) = \frac{1 - x^{2i}}{1 - x^i}$, so we can rewrite it as
\[ \sum_{n \ge 0} p_{\mathrm{dist}}(n) x^n = \frac{1 - x^2}{1 - x} \cdot \frac{1 - x^4}{1 - x^2} \cdot \frac{1 - x^6}{1 - x^3} \cdot \frac{1 - x^8}{1 - x^4} \cdot \frac{1 - x^{10}}{1 - x^5} \cdots \]
Finally, let’s take a step back and try to understand the significance of these product
formulas that we’re getting.
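Example 6.2.12. As before, let $p_{\le 2}(n)$ be the number of integer partitions of $n$ with at most 2 parts. To simplify notation, set $a_n = p_{\le 2}(n)$. We showed before that
\[ \sum_{n \ge 0} a_n x^n = \frac{1}{(1 - x)(1 - x^2)}. \]
\[
\begin{array}{lllll}
a_0 & {}+ a_1 x & {}+ a_2 x^2 & {}+ a_3 x^3 & {}+ a_4 x^4 + \cdots \\
    & {}- a_0 x & {}- a_1 x^2 & {}- a_2 x^3 & {}- a_3 x^4 - \cdots \\
    &           & {}- a_0 x^2 & {}- a_1 x^3 & {}- a_2 x^4 - \cdots \\
    &           &             & {}+ a_0 x^3 & {}+ a_1 x^4 + \cdots
\end{array}
\]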
Example 6.2.13. We can actually go one step further with the previous example and consider the sequence $a_n = p(n)$. In this case, we showed that
\[ \sum_{n \ge 0} a_n x^n = \frac{1}{(1 - x)(1 - x^2)(1 - x^3)(1 - x^4) \cdots} = \prod_{i \ge 1} \frac{1}{(1 - x^i)}. \]
We can again clear denominators, but we get an infinite number of lines this time since
\[ \prod_{i \ge 1} (1 - x^i) = 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + x^{22} + x^{26} - \cdots. \]
Nonetheless, this gives us an interesting recursive formula for $p(n)$ if we follow the same argument. As before, let’s adopt the convention that $p(n) = 0$ if $n$ is negative. Then by following the same reasoning from before, we see that for all $n \ge 1$, we have the recursion:
\[ p(n) = p(n - 1) + p(n - 2) - p(n - 5) - p(n - 7) + p(n - 12) + p(n - 15) - \cdots. \]
A few comments are in order. First, this is, roughly speaking, a homogeneous linear recurrence relation of “order $\infty$”. However, for any given input $n$, only finitely many terms on the right side actually contribute, so we do get a well-defined formula. For example, for $n = 10$, we have
\[ p(10) = p(9) + p(8) - p(5) - p(3). \]
Second, what is the actual pattern here? When we expand $\prod_{i \ge 1} (1 - x^i)$, it looks like all of the coefficients are $\pm 1$ (this is true in general), but what are the exponents that appear? It turns out that they are the numbers of the form $k(3k \pm 1)/2$ where $k$ is a non-negative integer (pentagonal numbers). I won’t go into any more detail, but if you’re interested, the relevant result here is “Euler’s pentagonal number theorem”. □
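The recursion above is a reasonably efficient way to tabulate $p(n)$. Here is a minimal Python sketch of it, assuming the exponents are the pentagonal numbers $k(3k \pm 1)/2$ with sign $(-1)^{k-1}$ as described; the function name is made up for this illustration.

```python
def partition_numbers(max_n):
    """p(0), ..., p(max_n) via the pentagonal number recursion."""
    p = [1] + [0] * max_n
    for n in range(1, max_n + 1):
        total = 0
        k = 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - k * (3 * k - 1) // 2]
            second = k * (3 * k + 1) // 2
            if second <= n:  # otherwise p(n - second) = 0 by convention
                total += sign * p[n - second]
            k += 1
        p[n] = total
    return p

print(partition_numbers(10))  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```

In particular $p(10) = 30 + 22 - 7 - 3 = 42$, in line with the $n = 10$ instance of the recursion.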
Example 6.2.14. We’ll end our discussion on integer partitions with a more interesting
bijection (I don’t know if there’s any easy trick to prove it using power series). We want to
show that “self-conjugate partitions of n” are in bijection with “partitions of n using only
distinct odd parts”. We’ll just do this informally and I’ll leave formulating it in a more
rigorous way to you.
Given a self-conjugate partition, take all of the boxes in the first row and column of its
Young diagram. Since it’s self-conjugate, there are an odd number of boxes. Use this as the
first part of a new partition. Now remove those boxes and repeat. For example:
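[Figure: two self-conjugate Young diagrams, each mapped ($\mapsto$) to its partition into distinct odd parts.]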
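The peeling procedure just described can be sketched in a few lines of Python. This is only an illustration under the assumption that the input really is a self-conjugate partition in normal form (the code does not check this), and the function name is hypothetical. After removing $i$ hooks, the remaining first row and first column each have $\lambda_{i+1} - i$ boxes and share one corner box.

```python
def hooks_of_self_conjugate(partition):
    """Sizes of the successive first-row-and-column hooks of a self-conjugate partition."""
    parts = []
    for i, row in enumerate(partition):  # i = number of hooks already removed
        if row <= i:
            break  # no diagonal box left, so nothing remains to peel
        parts.append(2 * (row - i) - 1)
    return tuple(parts)

print(hooks_of_self_conjugate((4, 3, 2, 1)))     # (7, 3)
print(hooks_of_self_conjugate((5, 1, 1, 1, 1)))  # (9,)
print(hooks_of_self_conjugate((4, 2, 1, 1)))     # (7, 1)
```

In each case the output consists of distinct odd parts with the same total size as the input.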
Proof. Every set of balanced parentheses must begin with (. Consider the ) which pairs with it. In between the two of them is another set of balanced parentheses (possibly empty) and to the right of them is another set of balanced parentheses (again, possibly empty). So the set on the inside consists of $i$ pairs, where $0 \le i \le n - 1$, while the set on the right consists of $n - 1 - i$ pairs. These sets can be chosen independently, so there are $C_i C_{n-i-1}$ ways for this to happen. Since the cases with different $i$ don’t overlap, we sum over all possibilities to get the identity above. □
Define
\[ C(x) = \sum_{n \ge 0} C_n x^n. \]
This means that $C(x)$ is a solution of the quadratic equation $xt^2 - t + 1 = 0$. Using the quadratic formula, we deduce that $C(x)$ is one of the solutions
\[ \frac{1 \pm \sqrt{1 - 4x}}{2x}. \]
Note that $x$ isn’t invertible as a power series, so we have to be careful here. Since $C(x)$ is a power series, it must be that $x$ divides the numerator, i.e., the numerator cannot have a constant term. Which choice of sign is correct? The constant term of $\sqrt{1 - 4x}$ is $\binom{1/2}{0} = 1$, so the correct choice is a negative sign, and so
\[ C(x) = \frac{1 - \sqrt{1 - 4x}}{2x}. \]
Theorem 6.3.4. $C_n = \dfrac{1}{n + 1} \dbinom{2n}{n}$.
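The closed formula can be compared against the recursion used in the proof above, $C_n = \sum_{i=0}^{n-1} C_i C_{n-1-i}$. The Python sketch below assumes the starting value $C_0 = 1$ (the empty set of parentheses); the function names are invented for this illustration.

```python
from math import comb

def catalan_by_recursion(count):
    """C_0, ..., C_{count-1} from C_n = sum of C_i * C_{n-1-i}, assuming C_0 = 1."""
    c = [1]
    for n in range(1, count):
        c.append(sum(c[i] * c[n - 1 - i] for i in range(n)))
    return c[:count]

def catalan_closed_form(n):
    """C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

print(catalan_by_recursion(8))                     # [1, 1, 2, 5, 14, 42, 132, 429]
print([catalan_closed_form(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```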
Here are a few other things that are counted by the Catalan numbers together with the 5
instances for n = 3:
• The number of ways to apply a binary operation ∗ to n + 1 elements:
a ∗ (b ∗ (c ∗ d)), a ∗ ((b ∗ c) ∗ d), (a ∗ b) ∗ (c ∗ d), ((a ∗ b) ∗ c) ∗ d, (a ∗ (b ∗ c)) ∗ d.
• The number of rooted binary trees with n + 1 leaves:
[Figure: the five rooted binary trees with 4 leaves, labeled a, b, c, d.]
• The number of paths from (0, 0) to (n, n) which never go above the diagonal x = y
and are made up of steps either moving in the direction (0, 1) or (1, 0). For n = 3:
It turns out that the Catalan recursion shows up a lot. There are more than 200 other
known interpretations for the Catalan numbers.
where recall that $n! = n(n - 1)(n - 2) \cdots 2 \cdot 1$ and $0! = 1$. When $a_n = 1$ for all $n$, we use the notation
\[ e^x = \exp(x) = \sum_{n \ge 0} \frac{x^n}{n!}. \]
You should just think of this as a renormalization of ordinary generating functions. When
written in the exponential format, the coefficients of a product take on a slightly different
form which is very convenient for certain kinds of counting problems:
Lemma 7.1.1. If $A(x) = \sum_{n \ge 0} a_n \frac{x^n}{n!}$ and $B(x) = \sum_{n \ge 0} b_n \frac{x^n}{n!}$, then $A(x)B(x) = \sum_{n \ge 0} c_n \frac{x^n}{n!}$ where $c_n = \sum_{i=0}^{n} \binom{n}{i} a_i b_{n-i}$.
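In code, Lemma 7.1.1 says that multiplying EGFs is a binomial convolution of their coefficient sequences. A minimal Python sketch, where each list holds the numbers $a_n$ (not $a_n/n!$) and the name `egf_product` is made up for this illustration:

```python
from math import comb

def egf_product(a, b):
    """Coefficients c_n of the product of two EGFs with coefficients a_n and b_n."""
    n_terms = min(len(a), len(b))
    return [
        sum(comb(n, i) * a[i] * b[n - i] for i in range(n + 1))
        for n in range(n_terms)
    ]

# e^x * e^x = e^(2x), whose EGF coefficients are 2^n.
print(egf_product([1] * 6, [1] * 6))  # [1, 2, 4, 8, 16, 32]
```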
The sum is just taking disjoint union. The product requires more explanation: we are taking
the disjoint union over all subsets T in S, picking an α-structure on T and a β-structure on
its complement. We’ll see in examples why this is a sensible thing to do, but first, we show
that these operations behave well with respect to EGFs:
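Theorem 7.1.3. We have
\[ E_{\alpha + \beta}(x) = E_\alpha(x) + E_\beta(x), \qquad E_{\alpha \cdot \beta}(x) = E_\alpha(x) E_\beta(x). \]
Proof. For the sum, we have $|(\alpha + \beta)([n])| = |\alpha([n])| + |\beta([n])|$ since we’re taking a disjoint union.
For the product, we have
\[ |(\alpha \cdot \beta)([n])| = \sum_{T \subseteq [n]} |\alpha(T)| \cdot |\beta([n] \setminus T)|. \]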
Since the size of $\alpha(T)$ only depends on $|T|$ and similarly for $\beta([n] \setminus T)$, we can just sum over possible sizes of $T$:
\[ \sum_{i=0}^{n} \binom{n}{i} |\alpha([i])| \cdot |\beta([n - i])| \]
which is the coefficient of $x^n/n!$ in $E_\alpha(x) E_\beta(x)$ by Lemma 7.1.1. □
Example 7.1.4. Consider a set of $n$ football players. We want to split them up into two groups. Both groups need to be assigned an ordering and the second group additionally needs to choose one of 3 colors for their uniform. Let $c_n$ be the number of ways to do this.
This scenario calls for a product of structures:
• Let $\alpha(S)$ be the set of orderings of $S$, so $|\alpha(S)| = |S|!$. We have
\[ E_\alpha(x) = \sum_{n \ge 0} n! \frac{x^n}{n!} = \frac{1}{1 - x}. \]
• Let $\beta(S)$ be the set of pairs $(\sigma, f)$ where $\sigma$ is an ordering of $S$ and $f \colon S \to [3]$ is an assignment of the 3 colors to each element. So $|\beta(S)| = |S|!\, 3^{|S|}$. We have
\[ E_\beta(x) = \sum_{n \ge 0} n!\, 3^n \frac{x^n}{n!} = \frac{1}{1 - 3x}. \]
Then $(\alpha \cdot \beta)([n])$ is the set of things we’re asking about (I glossed over it, but it’s important that the definitions above make sense and give the correct thing when $S = \emptyset$, otherwise our product interpretation will be incorrect when $T = \emptyset$, for example), so its EGF is
\[ E_{\alpha \cdot \beta}(x) = \frac{1}{(1 - x)(1 - 3x)}. \]
In particular,
\[ c_n / n! = [x^n] \frac{1}{(1 - x)(1 - 3x)} = [x^n] \left( \frac{3/2}{1 - 3x} - \frac{1/2}{1 - x} \right) = \frac{3}{2} 3^n - \frac{1}{2}, \]
and hence
\[ c_n = n! \left( \frac{3}{2} 3^n - \frac{1}{2} \right) = \frac{n!}{2} (3^{n+1} - 1). \]
□
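As a check on the closed formula, we can count directly from the product structure: choose which $i$ players go in the first group, order both groups, and color the second. The Python sketch below does this and compares with $\frac{n!}{2}(3^{n+1} - 1)$; the function names are made up for this illustration.

```python
from math import comb, factorial

def count_by_product(n):
    """Sum over the size i of the first group, following the product of structures."""
    return sum(
        comb(n, i) * factorial(i) * factorial(n - i) * 3 ** (n - i)
        for i in range(n + 1)
    )

def count_by_formula(n):
    """The closed form n! * (3^(n+1) - 1) / 2; the division is exact."""
    return factorial(n) * (3 ** (n + 1) - 1) // 2

print([count_by_product(n) for n in range(4)])  # [1, 4, 26, 240]
print([count_by_formula(n) for n in range(4)])  # [1, 4, 26, 240]
```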
Before continuing, I want to point out a useful identity.
Proposition 7.1.5. Let $A(x)$ and $B(x)$ be formal power series with no constant term. Then
\[ \exp(A(x)) \exp(B(x)) = \exp(A(x) + B(x)). \]
Proof. To check this, let’s expand the left side:
\[ \left( \sum_{n \ge 0} \frac{A(x)^n}{n!} \right) \left( \sum_{n \ge 0} \frac{B(x)^n}{n!} \right) = \sum_{n \ge 0} \left( \sum_{i=0}^{n} \frac{A(x)^i}{i!} \frac{B(x)^{n-i}}{(n - i)!} \right). \]
Now the right side (using the usual binomial theorem):
\[ \sum_{n \ge 0} \frac{(A(x) + B(x))^n}{n!} = \sum_{n \ge 0} \frac{1}{n!} \left( \sum_{i=0}^{n} \binom{n}{i} A(x)^i B(x)^{n-i} \right). \]
This is the same as the first one as soon as we cancel out the $n!$ from $\binom{n}{i}$. □
Example 7.1.6. We have n distinguishable telephone poles which are to be painted either
red or blue. The number which are blue must be even. Let cn be the number of ways to do
this.
Again we want to interpret this as counting the product of two structures (we’ll think of
the elements of sets as telephone poles):
• Let $\alpha(S)$ be the set of ways to paint the poles red according to our rules, so $|\alpha(S)| = 1$ for all $S$ (even $S = \emptyset$) and $E_\alpha(x) = e^x$.
• Let $\beta(S)$ be the set of ways to paint the poles blue according to our rules, so $|\beta(S)| = 1$ if $|S|$ is even and $|\beta(S)| = 0$ if $|S|$ is odd. Hence
\[ E_\beta(x) = \sum_{n \ge 0} \frac{x^{2n}}{(2n)!}. \]
Here we are deleting all of the odd powers of $x$ from $e^x$. To get a nice expression, note that this is the same as $(e^x + e^{-x})/2$. (How about if we wanted to delete the even terms instead?)
Hence we get:
\[ E_{\alpha \cdot \beta}(x) = e^x \cdot \frac{1}{2}(e^x + e^{-x}) = \frac{1}{2}(e^{2x} + 1) = \frac{1}{2} \sum_{n \ge 0} 2^n \frac{x^n}{n!} + \frac{1}{2}. \]
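Reading off the coefficient of $x^n/n!$ in this EGF gives $c_0 = 1$ and $c_n = 2^{n-1}$ for $n \ge 1$. The Python sketch below confirms this by listing every red/blue coloring of $n$ poles and keeping those with an even number of blue poles; the function name is made up for this illustration.

```python
from itertools import product

def count_colorings(n):
    """Brute force: red/blue colorings of n labeled poles with an even number of blue."""
    return sum(
        1 for colors in product("RB", repeat=n) if colors.count("B") % 2 == 0
    )

print([count_colorings(n) for n in range(6)])  # [1, 1, 2, 4, 8, 16]
```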
where the disjoint union is over all ways to write $S$ as a disjoint union of $k$ subsets $T_1, \dots, T_k$ (order of the $T_i$ matters). This is almost like an ordered set partition, except that the $T_i$ are allowed to be empty. Then
\[ E_{\alpha_1 \cdots \alpha_k}(x) = E_{\alpha_1}(x) \cdots E_{\alpha_k}(x). \]
Example 7.1.7. Continuing from the previous example, suppose we can also color some telephone poles green and there are no restrictions on how many are green. This introduces a third structure: let $\gamma(S)$ be the ways to paint the poles green, so $|\gamma(S)| = 1$ for all $S$. Our new EGF is
\[ E_{\alpha \cdot \beta \cdot \gamma}(x) = \frac{1}{2} e^x (e^x + e^{-x}) e^x = \frac{1}{2}(e^{3x} + e^x) = \frac{1}{2} \left( \sum_{n \ge 0} \frac{(3x)^n}{n!} + \sum_{n \ge 0} \frac{x^n}{n!} \right), \]
In other words, we consider all possible ways to partition $S$ into nonempty subsets and put the $\alpha$ structure on each block. Finally, we make the convention that $|e^\alpha(\emptyset)| = 1$. We’ll see some examples soon, but first let’s establish some basic properties.
Theorem 7.2.1 (Exponential formula). We have
\[ E_{e^\alpha}(x) = \exp(E_\alpha(x)). \]
Proof. Since $|\alpha(\emptyset)| = 0$, we have $[x^n] E_\alpha(x)^k = 0$ if $k > n$. So
\[ [x^n] \exp(E_\alpha(x)) = [x^n] \sum_{k \ge 0} \frac{E_\alpha(x)^k}{k!} = [x^n] \sum_{k=0}^{n} \frac{E_\alpha(x)^k}{k!}. \]
From our discussion on products of EGFs, for $n > 0$, $n!\,[x^n] E_\alpha(x)^k$ is the number of ways to pick an ordered set partition of $[n]$ into $k$ blocks and put structures of type $\alpha$ on each block (note that the property $\alpha(\emptyset) = \emptyset$ disallows picking empty blocks); if we divide by $k!$ we just remove the ordering. Hence $n!$ times the coefficient of $x^n$ above is exactly the size of $e^\alpha([n])$. Finally, the case $n = 0$ is ok by our convention that $|e^\alpha(\emptyset)| = 1$. □
One nice thing about this form of EGF is that we can employ the following identity, which will allow us to get recursive formulas for $h_n$, as we’ll see in some examples.
Then $e^\alpha([n])$ is the set of derangements on $n$ letters. Let’s use the notation $h_n = |e^\alpha([n])|$, so its EGF is $H(x) = \exp(E_\alpha(x))$. Using the derivative identity (Proposition 7.2.2), we have
\[ H'(x) = H(x) E_\alpha'(x) = H(x) \frac{x}{1 - x}. \]
Let’s rewrite this as
\[ H'(x) - xH'(x) = xH(x). \]
We’ll compare the coefficients, but first let’s expand them again to make it easier to see:
\[ \sum_{n \ge 1} h_n \frac{x^{n-1}}{(n - 1)!} - \sum_{n \ge 1} h_n \frac{x^n}{(n - 1)!} = \sum_{n \ge 0} h_n \frac{x^{n+1}}{n!}. \]
Now we’ve seen plenty of examples using the fact that permutations are built out of cycles
and how we can count permutations with restrictions on cycle lengths using the exponential
formula. Another important class of examples comes from set partitions, with “blocks” being
the literal building blocks.
Example 7.2.7. We continue with Example 7.1.8 and consider the selection structure
\[ \alpha(S) = \begin{cases} \{*\} & \text{if } |S| > 0 \\ \emptyset & \text{if } |S| = 0 \end{cases}. \]
Then $|e^\alpha(S)|$ is the number of set partitions of $S$, so we get the EGF for Bell numbers:
\[ \sum_{n \ge 0} B(n) \frac{x^n}{n!} = E_{e^\alpha}(x) = \exp(E_\alpha(x)) = \exp(e^x - 1). \]
Letting $H(x)$ be this EGF, we can extract a recursion by applying Proposition 7.2.2 (so $A(x) = e^x - 1$ and $A'(x) = e^x$):
\[ \sum_{n \ge 0} B(n + 1) \frac{x^n}{n!} = H'(x) = H(x) A'(x) = \left( \sum_{n \ge 0} B(n) \frac{x^n}{n!} \right) \left( \sum_{n \ge 0} \frac{x^n}{n!} \right). \]
The coefficient of $x^n$ on the left side is $B(n + 1)/n!$; the coefficient on the right side is $\sum_{i=0}^{n} \frac{B(i)}{i!} \frac{1}{(n - i)!}$. Multiply both by $n!$ to get
\[ B(n + 1) = \sum_{i=0}^{n} \binom{n}{i} B(i), \]
which is the identity from Example 3.1.8. □
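The recursion just derived is enough to tabulate Bell numbers. Here is a short Python sketch, assuming the starting value $B(0) = 1$ (the one set partition of the empty set); the function name is made up for this illustration.

```python
from math import comb

def bell_numbers(count):
    """B(0), ..., B(count-1) from B(n+1) = sum over i of binom(n, i) * B(i)."""
    bell = [1]  # B(0) = 1
    for n in range(count - 1):
        bell.append(sum(comb(n, i) * bell[i] for i in range(n + 1)))
    return bell[:count]

print(bell_numbers(7))  # [1, 1, 2, 5, 15, 52, 203]
```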
Example 7.2.8. The advantage of this approach is that we can easily modify the problem if we want to restrict the possible sizes of the blocks in our set partitions. For example, suppose we want to consider set partitions such that every block has either size 2 or 3. Let $h_n$ be the number of set partitions of $[n]$ satisfying this condition and let $H(x) = \sum_{n \ge 0} h_n \frac{x^n}{n!}$ be its EGF. Let’s define a structure $\alpha$ by
\[ \alpha(S) = \begin{cases} \{*\} & \text{if } |S| \in \{2, 3\} \\ \emptyset & \text{else} \end{cases}. \]
Then $h_n = |e^\alpha([n])|$, $H(x) = \exp(E_\alpha(x))$, and $E_\alpha(x) = x^2/2! + x^3/3!$. As usual, let’s apply Proposition 7.2.2 with $A(x) = E_\alpha(x)$. First, $A'(x) = x + x^2/2$ so we have
\[ H'(x) = H(x) \left( x + \frac{x^2}{2} \right), \]
which can be written as
\[ \sum_{n \ge 1} h_n \frac{x^{n-1}}{(n - 1)!} = \sum_{n \ge 0} h_n \frac{x^{n+1}}{n!} + \frac{1}{2} \sum_{n \ge 0} h_n \frac{x^{n+2}}{n!}. \]
[Figure: the labeled trees 1-2-3, 2-1-3, and 1-3-2.]
Example 7.3.7. Let’s illustrate the previous bijection with an example with n = 7:
[Figure: a tree on the vertices 1, ..., 7 (left) $\leftrightarrow$ its planted forest (right).]
The original tree is on the left, and its corresponding planted forest is on the right. Here
I’ve indicated the roots by shading in the vertices. □
Combining these two identities gives the equation
\[ R(x) = x e^{R(x)}. \]
We can try to solve this coefficient by coefficient: say that $R(x) = \sum_{n \ge 1} r_n x^n$ and we are trying to solve for the $r_i$ (by definition $R(x)$ has no constant term). So $\operatorname{mdeg}(R(x)) = 1$ and this tells us that $\operatorname{mdeg}(R(x)^n) = n$. Expanding the equation, we get
\[ R(x) = x \left( 1 + R(x) + \frac{R(x)^2}{2!} + \cdots \right). \]
So if we want to solve for $r_n$ we just need to consider $x \left( 1 + R(x) + \cdots + \frac{R(x)^{n-1}}{(n - 1)!} \right)$ since all other terms don’t have an $x^n$ term. In particular,
\begin{align*}
r_1 &= [x^1] R(x) = [x^1] x = 1, \\
r_2 &= [x^2] R(x) = [x^2] x(1 + R(x)) = 0 + r_1 = 1, \\
r_3 &= [x^3] R(x) = [x^3] x \left( 1 + R(x) + \frac{R(x)^2}{2!} \right) = 0 + r_2 + \frac{r_1^2}{2} = \frac{3}{2}, \\
r_4 &= [x^4] R(x) = [x^4] x \left( 1 + R(x) + \frac{R(x)^2}{2!} + \frac{R(x)^3}{3!} \right) = 0 + r_3 + \frac{r_1 r_2 + r_2 r_1}{2} + \frac{r_1^3}{6} = \frac{16}{6}, \\
&\ \ \vdots
\end{align*}
Remembering that $t_n = (n - 1)!\, r_n$, we get $t_1 = 1$, $t_2 = 1$, $t_3 = 3$, $t_4 = 16$, which is consistent so far.
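The coefficient-by-coefficient computation above can be automated. The Python sketch below solves $R(x) = x e^{R(x)}$ by repeatedly substituting the current approximation into the right side, using exact fractions; each pass fixes at least one more coefficient. The helper names `multiply` and `solve_r` are invented for this illustration, and this is only one of several ways to organize the computation.

```python
from fractions import Fraction
from math import factorial

def multiply(f, g, terms):
    """Product of two truncated series, keeping the first `terms` coefficients."""
    h = [Fraction(0)] * terms
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < terms:
                h[i + j] += fi * gj
    return h

def solve_r(terms):
    """Coefficients r_0, ..., r_{terms-1} of the solution of R(x) = x * exp(R(x))."""
    r = [Fraction(0)] * terms
    for _ in range(terms):
        # exp(R) truncated: sum of R^k / k! for k < terms (R has no constant term).
        exp_r = [Fraction(0)] * terms
        power = [Fraction(1)] + [Fraction(0)] * (terms - 1)
        for k in range(terms):
            for n in range(terms):
                exp_r[n] += power[n] / factorial(k)
            power = multiply(power, r, terms)
        # Multiplying by x shifts every coefficient up by one.
        r = [Fraction(0)] + exp_r[:terms - 1]
    return r

r = solve_r(5)
print([str(c) for c in r])                              # ['0', '1', '1', '3/2', '8/3']
print([factorial(n - 1) * r[n] for n in range(1, 5)])   # t_1..t_4 = 1, 1, 3, 16
```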
We can continue like this, but it would be nice to have a closed formula without having
to guess one. This can be done with the Lagrange inversion formula which we discuss next.
Proof of Cayley’s formula, Theorem 7.3.1. We take $A(x) = R(x)$ and $G(x) = e^x$. For $n > 0$, the Lagrange inversion formula tells us that
\[ [x^n] R(x) = \frac{1}{n} [x^{n-1}] e^{nx} = \frac{1}{n} [x^{n-1}] \sum_{d \ge 0} \frac{n^d}{d!} x^d = \frac{1}{n} \frac{n^{n-1}}{(n - 1)!} = \frac{n^{n-1}}{n!}. \]
The sum is over all weak compositions of $n - 1$ with $k$ parts. Here $i_j$ represents the number of internal vertices that are in the $j$th tree connected to our original root. As before, if $C(x) = \sum_{n \ge 0} c_n x^n$, this leads to the relation
\[ C(x) = 1 + x C(x)^k. \]
Now we don’t have a general method of solving this polynomial equation for general $k$, but we can use Lagrange inversion like in the previous example. Again, we set $A(x) = C(x) - 1$ to convert the relation into
\[ A(x) = x (A(x) + 1)^k. \]
8. Sieving methods
The topic of this section is how to systematically deal with overcounting. This could have
been done earlier in the course since it is basically independent of a lot of the other topics
we discussed, but we’ll draw on the previous sections for interesting examples.
8.1. Inclusion-exclusion.
Example 8.1.1. Suppose we have a room of students, and 14 of them play basketball, 10
of them play football. How many students play at least one of these? We can’t answer the
question because there might be students who play both. But we can say that the total
number is 24 minus the amount in the overlap.
[Figure: Venn diagram of two overlapping sets, B and F.]
Alternatively, let B be the set who play basketball and let F be the set who play football.
Then what we’ve said is:
|B ∪ F | = |B| + |F | − |B ∩ F |.
New situation: there are additionally 8 students who play hockey. Let H be the set of
students who play hockey. What information do we need to know how many total students
there are?
[Figure: Venn diagram of three overlapping sets, B, F, and H.]
Here the overlap region is more complicated: it has 4 regions, which suggests that we need 4 more pieces of information. The following formula works:
|B ∪ F ∪ H| = |B| + |F | + |H| − |B ∩ F | − |B ∩ H| − |F ∩ H| + |B ∩ F ∩ H|.
To see this, the total diagram has 7 regions and we need to make sure that students in each
region get counted exactly once in the right side expression. For example, consider students
who play basketball and football, but don’t play hockey. They get counted in B, F , B ∩ F
with signs +1, +1, −1, which sums up to 1. How about students who play all 3? They get
counted in all terms with 4 +1 signs and 3 −1 signs, again adding up to 1. You can check
the other 5 to make sure the count is right. □
The examples above have a generalization to n sets, though the diagram is harder to draw
beyond 3 (technically, you can’t draw it...)
Theorem 8.1.2 (Inclusion-Exclusion). Let $A_1, \dots, A_n$ be finite sets. Then
\[ |A_1 \cup \cdots \cup A_n| = \sum_{j=1}^{n} \ \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} (-1)^{j-1} |A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}|. \]
In words: to get the size of the union, first add up all of the sizes of the sets, then subtract
off the sizes of all 2-fold intersections, then add the sizes of all 3-fold intersections, ... and
keep going until you’ve intersected all of the sets.
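The formula in Theorem 8.1.2 can be transcribed almost literally into Python. The sketch below is only an illustration (the function name `union_size` is made up), and of course in code one could just take the union directly; the point is to see the alternating sum at work.

```python
from itertools import combinations

def union_size(sets):
    """Size of the union of finite sets, computed by inclusion-exclusion."""
    sets = [set(s) for s in sets]
    total = 0
    for j in range(1, len(sets) + 1):
        sign = (-1) ** (j - 1)
        # Sum over all j-fold intersections A_{i_1} ∩ ... ∩ A_{i_j}.
        for chosen in combinations(sets, j):
            total += sign * len(set.intersection(*chosen))
    return total

example = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
print(union_size(example))       # 5
print(len(set().union(*example)))  # 5
```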
Proof. We just need to make sure that every element $x \in A_1 \cup \cdots \cup A_n$ is counted exactly once on the right hand side. Let $S = \{s_1, \dots, s_k\}$ be all of the indices such that $x \in A_{s_r}$. Then $x$ belongs to $A_{i_1} \cap \cdots \cap A_{i_j}$ if and only if $\{i_1, \dots, i_j\} \subseteq S$. So the relevant contribution for $x$ is a sum over all of the nonempty subsets of $S$:
\[ \sum_{\emptyset \neq T \subseteq S} (-1)^{|T| - 1} = - \sum_{n=1}^{|S|} \binom{|S|}{n} (-1)^n. \]
However, since $|S| > 0$, we have shown before that $\sum_{n=0}^{|S|} \binom{|S|}{n} (-1)^n = 0$, so the sum above is $\binom{|S|}{0} = 1$. □
We can also prove this by induction on n. Can you see how?
Let’s start with some specific problems.
Example 8.1.3. Let’s do a warmup with n = 2.
How many numbers between 1 and 1000 are divisible by 3 or 5?
This is a typical inclusion-exclusion problem because OR translates to a union of two sets.
Namely, let A be the set of numbers between 1 and 1000 which are divisible by 3 and let B
be the set of numbers between 1 and 1000 which are divisible by 5. Our question is asking:
how big is A ∪ B?
To use inclusion-exclusion, we need 3 pieces of information: |A|, |B|, and |A ∩ B|.
First, let’s deal with A. We can write all of the multiples of 3: A = {3, 6, 9, . . . , 999}. How
big is this set? To see it easily, let’s divide all of them by 3: {1, 2, 3, . . . , 333}, so |A| = 333.
Next, let’s deal with B in the same way: B = {5, 10, 15, . . . , 1000}, and dividing each
number by 5 gives {1, 2, 3, . . . , 200}, so |B| = 200.
Finally, how do we deal with $A \cap B$? Remember that a number being divisible by both 3 and 5 is equivalent to being divisible by their least common multiple $\operatorname{lcm}(3, 5) = 15$. So we have $A \cap B = \{15, 30, 45, \dots, 990\}$, and again dividing by 15 gives $\{1, 2, 3, \dots, 66\}$, so $|A \cap B| = 66$.
So our desired answer is
|A ∪ B| = |A| + |B| − |A ∩ B| = 333 + 200 − 66 = 467. □
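Example 8.1.4. The above generalizes fairly well. For instance, let’s consider the numbers $1, \dots, N$ which are divisible by $a$ or $b$ or $c$. For any $x$, let’s define $A_{N,x}$ to be the set of multiples of $x$ that are in $1, \dots, N$. The general pattern is that $|A_{N,x}| = \lfloor N/x \rfloor$, where we’re using the floor function (rounding down to the nearest integer). In general, $A_{N,x} \cap A_{N,y} = A_{N,\operatorname{lcm}(x,y)}$ (and something similar for intersecting more than 2).
For a concrete example, let’s take $N = 200$ and $a = 4$, $b = 5$, $c = 6$. So our desired answer would be
\begin{align*}
|A_{200,4} \cup A_{200,5} \cup A_{200,6}| &= \left\lfloor \frac{200}{4} \right\rfloor + \left\lfloor \frac{200}{5} \right\rfloor + \left\lfloor \frac{200}{6} \right\rfloor - \left\lfloor \frac{200}{20} \right\rfloor \\
&\quad - \left\lfloor \frac{200}{30} \right\rfloor - \left\lfloor \frac{200}{12} \right\rfloor + \left\lfloor \frac{200}{60} \right\rfloor \\
&= 50 + 40 + 33 - 10 - 6 - 16 + 3 \\
&= 94.
\end{align*}
□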
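The pattern in Examples 8.1.3 and 8.1.4 works for any list of divisors. Here is a Python sketch of it, using `math.lcm` (available in Python 3.9 and later) for the intersections; the function names are made up for this illustration, and the brute-force count is included only as a cross-check.

```python
from itertools import combinations
from math import lcm

def count_multiples(limit, divisors):
    """How many of 1..limit are divisible by at least one divisor (inclusion-exclusion)."""
    total = 0
    for j in range(1, len(divisors) + 1):
        sign = (-1) ** (j - 1)
        for chosen in combinations(divisors, j):
            # Multiples of every chosen divisor are the multiples of their lcm.
            total += sign * (limit // lcm(*chosen))
    return total

def count_multiples_brute(limit, divisors):
    """The same count by checking each number directly."""
    return sum(1 for m in range(1, limit + 1) if any(m % d == 0 for d in divisors))

print(count_multiples(1000, [3, 5]))    # 467
print(count_multiples(200, [4, 5, 6]))  # 94
```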
Example 8.1.5. Let’s consider ways to arrange the letters of the word BARBER such that
no two consecutive letters are the same. For the sake of brevity, let’s call an arrangement
“good” if it satisfies this property and “bad” otherwise. So, for example, BBARER is bad
since the two B’s are consecutive.
Good is defined by two conditions: the two B’s are not consecutive AND the two R’s are
not consecutive. Inclusion-exclusion lets us take care of unions, which you should think of
as taking an OR, so to better handle it, let’s flip it around and count the number of bad
arrangements. Then we’ll subtract it from the total number of arrangements.
The set of bad arrangements is the union of two sets: let A1 be the set of arrangements
where the two B’s appear consecutively, and let A2 be the set of arrangements where the
two R’s appear consecutively.
To count the size of A1 , we can use the following trick: merge the two B’s into a single
character (we can denote it B) and ask how many ways there are to arrange the 5 characters
B, A, R, E, R. This goes back to the problem about arranging flowers, so the answer is
$\binom{5}{1,1,2,1}$, or 5!/2 = 60. A2 is handled the same way, so |A2 | = 60.
The intersection A1 ∩A2 is the set of arrangements where the two B’s appear consecutively
AND the two R’s appear consecutively. In that case, we can use the trick again and ask
about arrangements of B, A, R, E, so |A1 ∩ A2 | = 4! = 24.
So the number of bad arrangements is |A1 ∪ A2 | = 60 + 60 − 24 = 96. Our original problem
is about the opposite case, so we can subtract this from the total number of arrangements.
There are $\binom{6}{2,2,1,1} = 180$ total arrangements, so the number of good arrangements is
180 − 96 = 84. □
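Since BARBER only has 6 letters, we can also just list every arrangement and check it. Here is a small brute-force sketch (the name `good_arrangements` is made up for this illustration); it is far too slow for long words, but it gives an independent check of the inclusion-exclusion count.

```python
from itertools import permutations


def good_arrangements(word: str) -> int:
    """Count distinct arrangements of word with no two equal adjacent letters."""
    distinct = set(permutations(word))  # the set removes duplicate arrangements
    return sum(
        1
        for arr in distinct
        if all(a != b for a, b in zip(arr, arr[1:]))
    )


total = len(set(permutations("BARBER")))
good = good_arrangements("BARBER")
print(total, total - good, good)  # total, bad, good
```

The same function can be called on other short words, including ones where a letter appears 3 times.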
Example 8.1.6. If we have any word where each letter appears at most twice, then it’s
not difficult to generalize the work in the previous example to count the number of good
arrangements.
What about a word like TATTLE, where a letter appears 3 times? How many good
arrangements are there? We could try to do the same thing as before and count the bad
Proof. It turns out to be easier to count the number of permutations which are not derange-
ments and then subtract that from the total number of permutations. For i = 1, . . . , n,
let Ai be the set of bijections f such that f (i) = i. Then the set of non-derangements is
A1 ∪ · · · ∪ An .
To apply inclusion-exclusion, we need to count the size of $A_{i_1} \cap \cdots \cap A_{i_j}$ for some choice of
indices $i_1, \dots, i_j$. This is the set of bijections f : [n] → [n] such that $f(i_1) = i_1, \dots, f(i_j) = i_j$.
The remaining information needed to specify f is its values outside of $i_1, \dots, i_j$, which we can
interpret as a bijection of $[n] \setminus \{i_1, \dots, i_j\}$ to itself. So there are (n − j)! of them. So we get
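\[
\begin{aligned}
|A_1 \cup \cdots \cup A_n|
  &= \sum_{j=1}^{n} (-1)^{j-1} \sum_{1 \le i_1 < \cdots < i_j \le n} |A_{i_1} \cap \cdots \cap A_{i_j}| \\
  &= \sum_{j=1}^{n} (-1)^{j-1} \sum_{1 \le i_1 < \cdots < i_j \le n} (n-j)! \\
  &= \sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} (n-j)! \\
  &= \sum_{j=1}^{n} (-1)^{j-1} \frac{n!}{j!}.
\end{aligned}
\]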
Remember that we have to subtract this from n!. So the final answer simplifies as follows:
\[
n! - \sum_{j=1}^{n} (-1)^{j-1} \frac{n!}{j!} = \sum_{j=0}^{n} (-1)^{j} \frac{n!}{j!}. \qquad \square
\]
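To see the formula in action, here is a short sketch that evaluates the alternating sum and compares it with a brute-force count over all permutations. The function names are illustrative only.

```python
from itertools import permutations
from math import factorial


def derangements_formula(n: int) -> int:
    """Evaluate sum_{j=0}^{n} (-1)^j n!/j! using exact integer arithmetic."""
    # n!/j! is an integer whenever 0 <= j <= n, so // loses nothing.
    return sum((-1) ** j * (factorial(n) // factorial(j)) for j in range(n + 1))


def derangements_brute(n: int) -> int:
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(
        1
        for p in permutations(range(n))
        if all(p[i] != i for i in range(n))
    )


for n in range(8):
    print(n, derangements_formula(n), derangements_brute(n))
```

The two columns should agree for every n; the brute-force version is only practical for small n since it visits all n! permutations.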
If we’re willing to use some calculus, we can derive a more compact, although slightly
strange, formula for the number of derangements. First, recall that for any real number r, we
have an infinite sum formula for $e^r$ (now we’re doing calculus and not formal power series,
but it’s only for this discussion!):
\[
e^r = \sum_{i=0}^{\infty} \frac{r^i}{i!}.
\]
There are two things we can conclude from this. First, taking r = −1 and breaking up the
sum gives
\[
\frac{1}{e} = \sum_{i=0}^{n} \frac{(-1)^i}{i!} + \sum_{i=n+1}^{\infty} \frac{(-1)^i}{i!}.
\]
The first sum is the number of derangements on n letters divided by n!, or in words: the
fraction of permutations which are derangements.
We can bound the difference (for example, using Lagrange’s version of the Taylor remainder
formula; see footnote 1):
\[
\left| \sum_{i=n+1}^{\infty} \frac{(-1)^i}{i!} \right| \le \frac{1}{(n+1)!}.
\]
Footnote 1. It’s not crucial for this course, but let me remind you what (a special case of) it says: if f (x) is an
infinitely differentiable function whose Taylor series at 0 converges at r, then for each n, there exists ξ
between 0 and r such that
\[
f(r) - \sum_{i=0}^{n} \frac{f^{(i)}(0)}{i!} r^i = \frac{f^{(n+1)}(\xi)}{(n+1)!} r^{n+1}.
\]
For our purposes, r = −1, and we know that $e^{\xi} \le 1$ for all ξ ∈ [−1, 0].
Multiplying the formula for 1/e by n! gives
\[
\frac{n!}{e} = \sum_{i=0}^{n} (-1)^i \frac{n!}{i!} + \sum_{i=n+1}^{\infty} (-1)^i \frac{n!}{i!}.
\]
Now the first sum is the number of derangements of n objects and from what we just said,
the second term is at most n!/(n + 1)! = 1/(n + 1) in absolute value.
Hence the number of derangements is in the interval $\left[ \frac{n!}{e} - \frac{1}{n+1},\ \frac{n!}{e} + \frac{1}{n+1} \right]$. The width of
this interval is 2/(n + 1), which is strictly smaller than 1 for n ≥ 2, so it can’t contain more
than one integer. Hence the number of derangements is simply the closest integer to n!/e,
giving us the following surprising fact (accounting for n = 1 is easy to do directly, so we’ll
ignore it):
Theorem 8.1.8. The number of derangements of size n is round(n!/e) where round just
means round to the nearest integer.
Remark 8.1.9. This is pretty surprising: there’s no reason to expect that rounding should
ever provide an exact answer to a counting problem, especially something that involves a
transcendental number like e.
To give some sense of how this looks, here are the approximate values of n!/e for n =
1, . . . , 7 (just two decimal places):
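These values are easy to recompute. Below is a minimal sketch that prints n!/e to two decimal places next to the exact number of derangements; it uses floating-point arithmetic for n!/e, which is an assumption that is only safe for moderately small n.

```python
from math import e, factorial


def derangements(n: int) -> int:
    """Exact number of derangements via the alternating sum of n!/j!."""
    return sum((-1) ** j * (factorial(n) // factorial(j)) for j in range(n + 1))


for n in range(1, 8):
    approx = factorial(n) / e  # floating-point value of n!/e
    print(n, f"{approx:.2f}", round(approx), derangements(n))
```

For n = 1, . . . , 7 the values of n!/e come out to approximately 0.37, 0.74, 2.21, 8.83, 44.15, 264.87, 1854.11, while the numbers of derangements are 0, 1, 2, 9, 44, 265, 1854.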
We can also use inclusion-exclusion to get an alternating sum formula for Stirling numbers.
\[
S(n,k) = \frac{1}{k!} \sum_{i=0}^{k} (-1)^i \binom{k}{i} (k-i)^n = \sum_{i=0}^{k} (-1)^i \frac{(k-i)^n}{i!\,(k-i)!}.
\]
Proof. As we discussed before, k!S(n, k) is the number of ordered set partitions of [n] with
k blocks, and we interpreted that as the number of surjective functions f : [n] → [k] (the
blocks are just the preimages $f^{-1}(i)$). So we will count this quantity. For i = 1, . . . , k, let
$A_i$ be the set of functions f : [n] → [k] such that i is not in the image of f . The surjective
functions are the complement of $A_1 \cup \cdots \cup A_k$ in the set of all functions (there are $k^n$
total functions). To apply inclusion-exclusion, we need to count the size of $A_{i_1} \cap \cdots \cap A_{i_j}$ for
$1 \le i_1 < \cdots < i_j \le k$. This is the set of functions such that none of $i_1, \dots, i_j$ is in the image;
equivalently, this is identified with the set of functions $f : [n] \to [k] \setminus \{i_1, \dots, i_j\}$, so there
Now divide both sides by k! to get the first equality of the theorem statement. The second
equality of the theorem statement comes from canceling the k! from the binomial coefficient.
□
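As a check on the alternating sum, here is a small sketch that evaluates the first form of the formula and compares k!S(n, k) with a brute-force count of surjective functions [n] → [k]. The function names are hypothetical, and the brute force enumerates all $k^n$ functions, so it is only meant for small inputs.

```python
from itertools import product
from math import comb, factorial


def stirling2_formula(n: int, k: int) -> int:
    """S(n, k) via (1/k!) * sum_{i=0}^{k} (-1)^i C(k, i) (k - i)^n."""
    total = sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))
    return total // factorial(k)  # total equals k! * S(n, k), so this is exact


def count_surjections(n: int, k: int) -> int:
    """Brute force: count functions [n] -> [k] whose image is all of [k]."""
    return sum(1 for f in product(range(k), repeat=n) if len(set(f)) == k)


print(stirling2_formula(4, 2), count_surjections(4, 2))
```

Here each tuple produced by `product` lists the values f(1), . . . , f(n), and the function is surjective exactly when all k values appear.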
I’m not sure if there’s some calculus we can do to conclude something interesting like in
the previous example.
8.2. Möbius inversion. Let A be an alphabet of size k. We want to count the number of
words of length n in A up to cyclic symmetry. This means that two words are considered
the same if one is a cyclic shift of another. For example, for words of length 4, the following
4 words are all the same:
$a_1 a_2 a_3 a_4,\quad a_2 a_3 a_4 a_1,\quad a_3 a_4 a_1 a_2,\quad a_4 a_1 a_2 a_3.$
We can think of these as necklaces: the elements of A might be different beads we can put
on the necklace, but we would consider two to be the same if we can rotate one to get the
other. Naively, we might say that the number of necklaces of length n is $k^n/n$ since we have
n rotations for each necklace. However, there is a problem: the n rotations might not all be
distinct. For example, there are only 2 different rotations of 0101.
We have to separate necklaces into different groups based on their period: this is the
smallest positive integer d such that rotating by d positions gives back the same word. So for
n = 4, we can have necklaces of periods 1, 2, or 4, examples being 0000, 0101, 0001. There
aren’t any of period 3: the period must divide the length (this isn’t entirely obvious but we
will not try to prove it).
Here’s an important observation: a word of period d only depends on its first d letters
because we will just repeat this sequence of length d exactly n/d times. Hence, as long as d
divides n, the number of words of length n and period d does not depend on n.
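To make the notion of period concrete, here is a small sketch that computes the period of a word by trying each rotation in turn, and then tallies all binary words of length 4 by period. The function name `period` is just for this illustration.

```python
from collections import Counter
from itertools import product


def period(word):
    """Smallest d >= 1 such that rotating word by d positions gives word back."""
    n = len(word)
    for d in range(1, n + 1):
        if word[d:] + word[:d] == word:
            return d  # d = n always works, so we always return


# Tally the 16 binary words of length 4 by their period.
counts = Counter(period(w) for w in product("01", repeat=4))
print(sorted(counts.items()))
```

Every period that shows up divides 4, in line with the claim above, and the function works equally well on strings such as "0101".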
So it makes sense to define ω(d) to be the number of words of period d and length d (this
notation should also incorporate k, but we’ll assume k is fixed). Hence for necklaces of length
4, we get the following formula:
\[
\omega(1) + \frac{\omega(2)}{2} + \frac{\omega(4)}{4}.
\]
So we want a formula for the number of words of a given period. We have another identity:
\[
k^n = |\text{words of length } n| = \sum_{d \mid n} \omega(d).
\]
The second equality holds because if any $e_i \ge 2$ then $p_1^{e_1} \cdots p_r^{e_r}$ is divisible by the square
of a prime, namely $p_i^2$. The last sum is a sum over all products of subsets of the primes
$\{p_1, \dots, p_r\}$, so we get
\[
\sum_{S \subseteq \{p_1,\dots,p_r\}} \mu\Bigl( \prod_{p \in S} p \Bigr)
  = \sum_{S \subseteq \{p_1,\dots,p_r\}} (-1)^{|S|}
  = \sum_{k=0}^{r} \binom{r}{k} (-1)^k = 0.
\]
(Since n > 1, there is at least one prime in the factorization, so r > 0.) □
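The vanishing of this sum is easy to test numerically. The sketch below assumes the standard definition of the Möbius function (µ(n) = 0 if n is divisible by the square of a prime, and otherwise $(-1)^r$ where r is the number of prime factors); the helper names `mobius` and `divisors` are illustrative.

```python
def mobius(n: int) -> int:
    """Möbius function, assuming the standard definition via prime factorization."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result  # one more distinct prime factor
        p += 1
    if n > 1:
        result = -result  # whatever is left is a single prime factor
    return result


def divisors(n: int) -> list[int]:
    """All positive divisors of n (simple trial division)."""
    return [d for d in range(1, n + 1) if n % d == 0]


for n in range(1, 13):
    print(n, sum(mobius(d) for d in divisors(n)))
```

The printed sum should be 1 for n = 1 and 0 for every n > 1.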
Theorem 8.2.4. Let α and β be two complex-valued functions on the positive integers.
(1) If
\[
\alpha(d) = \sum_{e \mid d} \beta(e)
\]
for all positive integers d, then we also have
\[
\beta(d) = \sum_{e \mid d} \mu(d/e)\, \alpha(e).
\]
We have a function
φ : {e | e divides d and is divisible by f } → {g | g divides d/f }
defined by φ(e) = d/e, which is well-defined since (d/f )/(d/e) = e/f , which is an integer by
the properties of e. There is an inverse function ψ defined by ψ(g) = d/g, which is also well-
defined: (d/f )/g is an integer, and so d/g = f · (d/f )/g is divisible by f , and d/(d/g) = g
so it also divides d. Using this bijection, we can rewrite the last sum:
\[
= \sum_{f \mid d} \beta(f) \Bigl( \sum_{g \mid d/f} \mu(g) \Bigr).
\]
which is the left hand side of the identity we’re trying to prove. □
Corollary 8.2.5. For any positive integer d, we have
\[
\omega(d) = \sum_{e \mid d} \mu(d/e)\, k^e.
\]
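Here is a sketch that puts the corollary to work. It computes ω(d) from the formula and then counts necklaces of length n as the sum of ω(d)/d over the divisors d of n; that general sum is my extrapolation of the length-4 formula above, so treat it as an assumption. The result is checked against a brute-force count, and all function names are hypothetical.

```python
from itertools import product


def mobius(n: int) -> int:
    """Möbius function, assuming the standard definition."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # divisible by the square of a prime
            result = -result
        p += 1
    return -result if n > 1 else result


def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def omega(d: int, k: int) -> int:
    """Number of words of length d and period d over k letters (Corollary 8.2.5)."""
    return sum(mobius(d // e) * k ** e for e in divisors(d))


def necklaces(n: int, k: int) -> int:
    """Assumed generalization of the length-4 formula: sum of omega(d)/d over d | n."""
    return sum(omega(d, k) // d for d in divisors(n))


def necklaces_brute(n: int, k: int) -> int:
    """Count words up to rotation by picking the smallest rotation of each word."""
    seen = set()
    for w in product(range(k), repeat=n):
        seen.add(min(w[i:] + w[:i] for i in range(n)))
    return len(seen)


print(omega(1, 2), omega(2, 2), omega(4, 2))
print(necklaces(4, 2), necklaces_brute(4, 2))
```

For k = 2 the first line gives the number of binary words of period 1, 2, and 4, and the second line compares the two necklace counts for length 4.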
where the product is over all k such that k and n are relatively prime. Then from our
discussion, we conclude that
\[
x^n - 1 = \prod_{j \mid n} \Phi_j(x).
\]
Hence using the remark (because we can divide by polynomials in the world of general
functions), if we define $\alpha(d) = x^d - 1$ and $\beta(d) = \Phi_d(x)$, then we conclude that
\[
\Phi_n(x) = \prod_{j \mid n} (x^j - 1)^{\mu(n/j)}.
\]
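This product formula can be turned into a small computation. The sketch below represents a polynomial as a list of integer coefficients (constant term first), multiplies together the factors with exponent +1, and divides by the factors with exponent −1. It assumes the standard Möbius function, and the helper names are made up for this illustration.

```python
def mobius(n: int) -> int:
    """Möbius function, assuming the standard definition."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def poly_mul(a: list[int], b: list[int]) -> list[int]:
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_div_exact(a: list[int], b: list[int]) -> list[int]:
    """Long division of a by a monic polynomial b, assuming no remainder."""
    a = a[:]  # work on a copy
    quotient = [0] * (len(a) - len(b) + 1)
    for shift in range(len(quotient) - 1, -1, -1):
        coeff = a[shift + len(b) - 1]  # b is monic, so this is the next quotient coefficient
        quotient[shift] = coeff
        for j, bj in enumerate(b):
            a[shift + j] -= coeff * bj
    assert not any(a), "division was not exact"
    return quotient


def cyclotomic(n: int) -> list[int]:
    """Coefficients of Phi_n(x) from the product of (x^j - 1)^mu(n/j) over j | n."""
    numerator, denominator = [1], [1]
    for j in range(1, n + 1):
        if n % j == 0:
            factor = [-1] + [0] * (j - 1) + [1]  # the polynomial x^j - 1
            mu = mobius(n // j)
            if mu == 1:
                numerator = poly_mul(numerator, factor)
            elif mu == -1:
                denominator = poly_mul(denominator, factor)
    return poly_div_exact(numerator, denominator)


print(cyclotomic(6))   # coefficients of Phi_6(x), constant term first
print(cyclotomic(12))  # coefficients of Phi_12(x), constant term first
```

Collecting the factors into a numerator and a denominator first means only one exact division is needed, and the denominator is a product of monic polynomials, which keeps the long division in integers.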