NOTES FOR MATH 184

STEVEN V SAM

Contents
1. Review and introduction
1.1. Bijections
1.2. Sum and product principle
1.3. 12-fold way, introduction
1.4. Induction
2. Fundamental counting problems
2.1. Permutations
2.2. Words
2.3. Choice problems
2.4. Compositions
3. Stirling numbers
3.1. Set partitions
3.2. Falling factorials
3.3. Cycles in permutations
4. Binomial theorem and generalizations
4.1. Binomial theorem
4.2. Multinomial theorem
4.3. Re-indexing sums
5. Formal power series
5.1. Definitions
5.2. Binomial theorem (general form)
6. Ordinary generating functions
6.1. Linear recurrence relations
6.2. Integer partitions
6.3. Catalan numbers
7. Exponential generating functions
7.1. Products of exponential generating functions
7.2. Compositions of exponential generating functions
7.3. Cayley's enumeration of labeled trees
7.4. Lagrange inversion formula
8. Sieving methods
8.1. Inclusion-exclusion
8.2. Möbius inversion

Date: December 4, 2023. Fall 2023.

1. Review and introduction


The first time I taught combinatorics, I followed Bóna's book fairly closely. Each additional
time I’ve added or taken away content and reordered the material. So there are some
similarities and the book is useful for additional explanations or examples, but it’s slowly
becoming its own thing.
1.1. Bijections. Given two functions f : X → Y and g : Y → X, we say that they are
inverses (of each other) if:
• f ◦ g is the identity function on Y , i.e., f (g(y)) = y for all y ∈ Y , and
• g ◦ f is the identity function on X, i.e., g(f (x)) = x for all x ∈ X.
In that case, the functions f and g are called bijections.
The following is a very important principle in counting arguments:
Theorem 1.1.1. If there exists a bijection between X and Y , then |X| = |Y |.
We can think of a bijection f between X and Y as a way of matching the elements of X
with the elements of Y . In particular, x ∈ X gets matched with y = f (x) ∈ Y . Note that
if x′ ∈ X was also matched with y, i.e., f (x′ ) = f (x), then the existence of the inverse g
shows us that g(f (x′ )) = g(f (x)), or more simply x = x′ . In other words, f is forced to be
one-to-one (or injective). On the other hand, every element is matched with something, i.e.,
every y ∈ Y is of the form f (x) for some x because we can take x = g(y). In other words, f
is forced to be onto (or surjective).
Remark 1.1.2. Bijections tell us that two sets have the same size without having to know
how many elements are actually in the set.
Here’s a small example: imagine there is a theater filled with hundreds of people and
hundreds of seats. If we wanted to know if there are the same number of people as seats, we
could count both. However, it would probably be much easier to just have each person take
a seat and see if there are any empty seats or any standing people. □
For notation, for a positive integer n, we’ll let [n] denote the set {1, 2, . . . , n}. If n = 0,
then [0] is the empty set ∅.
Example 1.1.3. Let n be a non-negative integer. Let A be the collection of subsets of
[n], and let B be the collection of subsets of [n + 1] that contain n + 1. We will show that
|A| = |B| by finding a bijection.
First, let’s consider n = 2 to get some idea of how this works:
A = {∅, {1}, {2}, {1, 2}}
B = {{3}, {1, 3}, {2, 3}, {1, 2, 3}}.
Hopefully you can see how they might line up: given an element of A, we can insert 3 into it to get an element of B, and going in the other direction is just removing the 3. We
might picture it as follows:
∅ ↔ 3
1 ↔ 13
2 ↔ 23
12 ↔ 123

To make this more formal, we'll do general n. Define f : A → B by f(S) = S ∪ {n + 1} and define g : B → A by g(T) = T \ {n + 1}. Then f and g are inverses of each other (please check this to make sure you understand what it means) so we conclude that |A| = |B|. □
Remark 1.1.4. This illustrates another perspective: bijections like the one above are related
to enumerating, or listing, all of the objects of some specific kind.
To elaborate, suppose we wanted to write a computer program to print out all subsets
of [n]. There are 2 kinds of subsets (we’ll use this again later): those that contain n and
those that do not. Using the previous example, we know how to produce all of the first kind
starting with a list of all of the subsets of [n − 1]. The second kind are easy too (I’ll leave
you to think about it though). Since computers are great at recursion, we’ve solved the
problem as soon as we deal with the base case (also easy, but what would it be?). This is
also closely related to induction, but we’ll discuss that shortly and hopefully the connections
will become more clear.
To summarize: bijections help us with enumeration (i.e., listing), but they can also be
used to answer the question “how many?”, which we’ll start doing soon. □
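To make the recursion in the remark concrete, here is a minimal sketch in Python (the function name subsets is my own, not from these notes): the subsets of [n] are the subsets of [n − 1] together with those same subsets with n inserted.

    def subsets(n):
        # Base case: the empty set is the only subset of [0].
        if n == 0:
            return [set()]
        smaller = subsets(n - 1)                      # all subsets of [n-1]
        return smaller + [s | {n} for s in smaller]   # add n to each to get the rest

    # Sanity check against Theorem 1.4.2 below: there are 2^n subsets.
    assert len(subsets(4)) == 2 ** 4

Printing subsets(3) lists all 8 subsets of [3], produced in exactly the recursive order described above.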
We’ll see some other examples later on.
1.2. Sum and product principle. Given two sets X and Y without any overlap, we
have |X ∪ Y | = |X| + |Y |. We’ll just take this for granted, though you can call it the
sum principle if you’d like a name for it. There’s an important corollary: the subtraction
principle: suppose we have a subset A in a set B. Then A and its complement B \ A don’t
overlap and A ∪ (B \ A) = B, so |A| = |B| − |B \ A|. It sounds trivial, but it’s a useful
technique so keep it in mind.
Now let S and T be any sets (overlapping or not). The set of pairs of elements (x, y)
where x ∈ S and y ∈ T is the Cartesian product S × T . The related product principle says
that |S × T | = |S| · |T |. Again, we will usually take this for granted and not always refer to
it by name.
Example 1.2.1. How many 4-digit numbers do not end with a 3? □
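One possible solution sketch, using the product principle (the notes leave the answer as an exercise, so treat this only as an illustration): the first digit can be anything from 1 to 9, the middle two digits anything from 0 to 9, and the last digit anything except 3, so the count is
$$9 \cdot 10 \cdot 10 \cdot 9 = 8100.$$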
1.3. 12-fold way, introduction. We have k balls and n boxes. Roughly speaking, the first
part of the course is about counting the number of ways to put the balls into the boxes. We
can think of each assignment as a function from the set of balls to the set of boxes. Phrased
this way, we will be examining how many ways to do this if we require f to be injective,
or surjective, or completely arbitrary. Are the boxes supposed to be considered different or
interchangeable (we also use the terminology distinguishable and indistinguishable)? And
same with the balls, are they considered different or interchangeable? All in all, this will
give us 12 different problems to consider, which means we want to understand the following
table:

balls/boxes   | f arbitrary | f injective            | f surjective
dist/dist     |             |                        |
indist/dist   |             |                        |
dist/indist   |             | 1 if n ≥ k, 0 if n < k |
indist/indist |             | 1 if n ≥ k, 0 if n < k |

Two situations have already been filled in and won’t be considered interesting. I’m not
going to emphasize this particular table. The point of bringing it up is to illustrate what
kinds of problems might be natural to consider. Some of these entries have simple formulas
in terms of mathematical notation we’re familiar with while others do not. Surprisingly just
changing the problem slightly can take you between these two cases. We’ll start working on
them soon, but not necessarily in a systematic way.
The perspective of the 12-fold way is due to Gian-Carlo Rota.

1.4. Induction. Induction is used when we have a sequence of statements P (0), P (1), P (2), . . .
labeled by non-negative integers that we’d like to prove. For example, P (n) could be the
statement:
$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2}.$$
In order to prove that all of the statements P (n) are true using induction, we need to do 2
things:
• Prove that P (0) is true.
• Assuming that P (0), . . . , P (n) are true, use it to prove that P (n + 1) is true. Some-
times we only need P (n), sometimes we need all of them.
Let’s see how that works for our example:
• P(0) is the statement $\sum_{i=0}^{0} i = 0 \cdot 1/2$. Both sides are 0, so the equality is valid.
• Now we assume that P(n) is true, i.e., that $\sum_{i=0}^{n} i = n(n+1)/2$. Now we want to prove that $\sum_{i=0}^{n+1} i = (n+1)(n+2)/2$.
Let's start with the left hand side and simplify using everything we know:
\begin{align*}
\sum_{i=0}^{n+1} i &= \sum_{i=0}^{n} i + (n+1) \\
&= \frac{n(n+1)}{2} + (n+1) \\
&= \left(\frac{n}{2} + 1\right)(n+1) \\
&= \frac{(n+1)(n+2)}{2}
\end{align*}
The first line is just using what a sum is, the second line uses P (n) and the rest is
some algebra. So we’ve proven P (n + 1).
Since we’ve completed the two required steps, we have proven that the summation identity
holds for all n.
Remark 1.4.1. We have labeled the statements starting from 0, but sometimes it’s more
natural to start counting from 1 instead, or even some larger integer. The same reasoning
as above will apply for these variations. The first step “Prove that P (0) is true” is then
replaced by “Prove that P (1) is true” or wherever the start of your indexing occurs. □
A subset T of a set S is another set all of whose elements belong to S. We write this as
T ⊆ S. We allow the possibility that T is empty and also the possibility that T = S.
Theorem 1.4.2. There are $2^n$ subsets of a set of size n.

For example, if S = {1, ⋆, U}, then there are $2^3 = 8$ subsets, and we can list them:
∅, {1}, {⋆}, {U}, {1, ⋆}, {1, U}, {U, ⋆}, {1, ⋆, U}.
The following proof will use induction, the sum principle, and the bijection principle so is
a great example to study carefully.
Proof. Let P(n) be the statement "any set of size n has exactly $2^n$ subsets".
We check P(0) directly: if S has 0 elements, then S = ∅, and the only subset is S itself, which is consistent with $2^0 = 1$.
Now we assume P (n) holds and use it to show that P (n + 1) is also true. Let S be a set
of size n + 1. Pick an element x ∈ S and let S ′ be the subset of S consisting of all elements
that are not equal to x, i.e., S′ = S \ {x}. Then S′ has size n, so by induction the number of subsets of S′ is $2^n$.
Next, every subset of S either contains x or it does not. To make these kinds of arguments
systematic, call a subset “type I” if it does not contain x and “type II” if it does. A subset has
to be exactly one of these kinds so we can count both and add the answers (sum principle).
Type I: The subsets which do not contain x are the same thing as subsets of S′, so there are $2^n$ of them because P(n) is true.
Type II: We will show there is a bijection between type I and type II subsets, via a pair of inverse maps
f : {type I subsets} → {type II subsets}, g : {type II subsets} → {type I subsets}
(we've already seen this but we'll redo it). If T is type I, then we define f(T) = T ∪ {x}, which is type II. If U is type II, then we define g(U) = U \ {x}, which is type I. Then f and g define a bijection (we won't spell out every detail here, I hope it's clear). So there are also $2^n$ type II subsets.
All together we have $2^n + 2^n = 2^{n+1}$ subsets of S, so P(n + 1) holds. □
Continuing with our example, if x = 1, then the subsets not containing x are ∅, {⋆}, {U}, {⋆, U}, while those that do contain x are {1}, {1, ⋆}, {1, U}, {1, ⋆, U}. There are $2^2 = 4$ of each kind.
A natural followup is to determine how many subsets have a given size. In our previous
example, there is 1 subset of size 0, 3 of size 1, 3 of size 2, and 1 of size 3. We’ll discuss this
problem in the next section.
Some more to think about:
• Show that $\sum_{i=0}^{n} i^2 = n(n+1)(2n+1)/6$ for all n ≥ 0.
• Show that $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$ for all n ≥ 0.
• Show that $4n < 2^n$ whenever n ≥ 5.
What happens with $\sum_{i=0}^{n} i^3$ or $\sum_{i=0}^{n} i^4$, or...? In the first two cases, we got polynomials in n on the right side. This actually always happens, and we'll see why later when we talk about falling factorials.

2. Fundamental counting problems


2.1. Permutations. Given a set S of objects, a permutation of S is a way to put all of the
elements of S in order.
Example 2.1.1. There are 6 permutations of {1, 2, 3} which we list:
123, 132, 213, 231, 312, 321. □
To count permutations in general, we define the factorial as follows: 0! = 1 and if n is a positive integer, then n! = n · (n − 1)!. Here are the first few values:
0! = 1, 1! = 1, 2! = 2, 3! = 6, 4! = 24, 5! = 120, 6! = 720.
In the previous example, we had 6 permutations of 3 elements, and 6 = 3!. This holds more
generally:
Theorem 2.1.2. If S has n elements and n ≥ 1, then there are n! permutations of S.
Technically this works if n = 0 but I don’t want to confuse you with “permutations of
nothing”. So let’s just take it as a convention that the empty set has exactly one permutation
to match 0! = 1.
Proof. We do this by induction on n. Let P (n) be the statement that a set of size n has
exactly n! permutations.
The statement P (1) follows from the definition: there is exactly 1 way to order a single
element, and 1! = 1.
Now assume for our induction hypothesis that P (n) has been proven. Let S be a set of
size n + 1. To order the elements, we can first pick any element to be first, and then we
have to order the remaining n elements. There are n + 1 different elements that can be first,
and for each such choice, there are n! ways to order the remaining elements by our induction
hypothesis. So all together, we have (n + 1) · n! = (n + 1)! different ways to order all of them,
which proves P (n + 1). □
We can use factorials to answer related questions. For example, suppose that some of the
objects in our set can’t be distinguished from one another, so that some of the orderings end
up being the same.
Example 2.1.3. (1) Suppose we are given 2 red flowers and 1 yellow flower. Aside from
their color, the flowers look identical. We want to count how many ways we can
display them in a single row. There are 3 objects total, so we might say there are
3! = 6 such ways. But consider what the 6 different ways look like:
RRY, RRY, RYR, RYR, YRR, YRR.
Since the two red flowers look identical, we don’t actually care which one comes first.
So there are really only 3 different ways to do this – the answer 3! has included each
different way twice, but we only wanted to count them a single time.
(2) Consider a larger problem: 10 red flowers and 5 yellow flowers. There are too many
to list, so we consider a different approach. As above, if we naively count, then we
would get 15! permutations of the flowers. But note that for any given arrangement,
the 10 red flowers can be reordered in any way to get an identical arrangement, and
same with the yellow flowers. So in the list of 15! permutations, each arrangement is being counted 10! · 5! times. The number of distinct arrangements is then $\frac{15!}{10!\,5!}$.
(3) The same reasoning allows us to generalize. If we have r red flowers and y yellow flowers, then the number of different ways to arrange them is $\frac{(r+y)!}{r!\,y!}$.
(4) How about more than 2 colors of flowers? If we threw in b blue flowers, then again the same reasoning gives us $\frac{(r+y+b)!}{r!\,y!\,b!}$ different arrangements. □
Now we state a general formula, which again can be derived by the same reasoning as in
(2) above. Suppose we are given n objects, which have one of k different types (for example,

our objects could be flowers and the types are colors). Also, objects of the same type are
considered identical. For convenience, we will label the “types” with numbers 1, 2, . . . , k and
let ai be the number of objects of type i (so a1 + a2 + · · · + ak = n).
Theorem 2.1.4. The number of ways to arrange the n objects in the above situation is
$$\frac{n!}{a_1!\,a_2!\cdots a_k!}.$$
As an exercise, you should adapt the reasoning in (2) to give a proof of this theorem.
The quantity above will be used a lot, so we give it a symbol, called the multinomial coefficient:
$$\binom{n}{a_1, a_2, \ldots, a_k} := \frac{n!}{a_1!\,a_2!\cdots a_k!}.$$
In the case when k = 2 (a very important case), it is called the binomial coefficient. Note that in this case, $a_2 = n - a_1$, so for shorthand, one often just writes $\binom{n}{a_1}$ instead of $\binom{n}{a_1, a_2}$. For similar reasons, $\binom{n}{a_2}$ is also used as a shorthand. In particular,
$$\binom{n}{a} = \binom{n}{n-a}$$
which is a very important identity.
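As a quick computational companion to Theorem 2.1.4, here is a small Python sketch (the helper name multinomial is mine, not from the notes) that evaluates the formula above:

    from math import factorial

    def multinomial(*counts):
        # n! divided by a_1! a_2! ... a_k!, where counts = (a_1, ..., a_k)
        n = sum(counts)
        result = factorial(n)
        for a in counts:
            result //= factorial(a)   # each division is exact
        return result

    # Example 2.1.3: 2 red + 1 yellow flower, and 10 red + 5 yellow flowers.
    assert multinomial(2, 1) == 3
    assert multinomial(10, 5) == factorial(15) // (factorial(10) * factorial(5))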
2.2. Words. A word is a finite ordered sequence whose entries belong to some fixed set A
(which we call the alphabet). The length of the word is the number of entries that it has.
Entries may repeat, there is no restriction on that. Also, the empty sequence ∅ is considered
a word of length 0.
Example 2.2.1. Say our alphabet is A = {a, b}. The words of length ≤ 2 are:
∅, a, b, aa, ab, ba, bb. □
Theorem 2.2.2. If |A| = n, then the number of words on A of length k is $n^k$.
Proof. A sequence of length k with entries in A is an element in the product set $A^k = A \times A \times \cdots \times A$ and $|A^k| = |A|^k$.
Alternatively, we can think of this as follows. To specify a word, we pick each of its entries, but these can be done independently of the other choices. So for each of the k positions, we are choosing one of n different possibilities, which leads us to $n \cdot n \cdots n = n^k$ different choices for words. □
For a positive integer n, let [n] denote the set {1, . . . , n}.
Example 2.2.3. We use words to show that the number of subsets of [n] is $2^n$ (we've already
seen this result, so now we’re using a different proof method).
Given a subset S ⊆ [n], we define a word wS of length n in the alphabet {0, 1} as follows.
If i ∈ S, then the ith entry of wS is 1, and otherwise the entry is 0. This defines a function
f : {subsets of [n]} → {words of length n on {0, 1}}
by f (S) = wS . We can also define an inverse function g: given such a word w, g(w) is the
subset of positions where there is a 1 in w. We omit the check that these two functions are
inverse to one another. So f is a bijection, and the previous result tells us that there are $2^n$ words of length n on {0, 1}. □

Example 2.2.4. How many pairs of subsets S, T ⊆ [n] satisfy S ⊆ T ? We can also encode
this problem as a problem about words. Before we do that, let me illustrate with an example
with n = 5.
Suppose our example is S = {1, 2} and T = {1, 2, 4}. We can record the pair of subsets
as a table:
element | "in S and T" | "in T but not S" | "not in T or S"
1       | ✓            |                  |
2       | ✓            |                  |
3       |              |                  | ✓
4       |              | ✓                |
5       |              |                  | ✓
Note that there is no column for “in S but not T ” because that would violate the assump-
tion S ⊆ T .
You should convince yourself that given a table like above, with exactly one checkmark
in each row, we can recover the information of S and T. Since there are $3^5$ choices for such tables, we see that's how many pairs of subsets we have.
We can also represent this visually using a Venn diagram.

[Venn diagram: an inner circle containing 1, 2; an outer circle additionally containing 4; and the surrounding square additionally containing 3, 5.]

The inner circle represents the subset S, the outer circle represents the elements in T (it contains the inner circle, which reflects the condition S ⊆ T), and the square represents our original set [5]. What we should take away from this is that there are exactly 3 regions in this picture, corresponding to the 3 columns in the previous table. Every way of assigning numbers to these regions matches up with a table and also with a pair of subsets.
Finally, let’s make this a little more formal. Let A be the alphabet of size 3 whose elements
are: “in S and T ”, “in T but not S” and “not in T or S”. It can be helpful to visualize
these as the three regions in a nested picture: the region S, the region T containing it, and the surrounding box [n]. Then each pair S ⊆ T gives a word of length n in A: the ith entry of the word is the element which describes the position of i. So there are $3^n$ such pairs.
Quick: how many pairs of subsets S, T ⊆ [n] satisfy the condition that S is not a subset of T? Use the subtraction principle: it's the complement of the above in the set of all possible pairs of subsets, so the answer is $4^n - 3^n$. □
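For small n, the count $3^n$ can be checked by brute force. Here is a short Python sketch (function name mine) that enumerates all pairs S ⊆ T ⊆ [n] directly:

    from itertools import combinations

    def count_nested_pairs(n):
        # Build all subsets of [n], then count the pairs with S contained in T.
        elements = range(1, n + 1)
        subsets = [set(c) for k in range(n + 1) for c in combinations(elements, k)]
        return sum(1 for S in subsets for T in subsets if S <= T)

    assert count_nested_pairs(4) == 3 ** 4

Each element independently lands in one of the 3 regions, which is exactly the word encoding above.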

How about words without repeating entries? (We will define these to be injective words.) Define the falling factorial:
$$(n)_k := n(n-1)(n-2)\cdots(n-k+1).$$
There are k numbers being multiplied in the above definition. When n = k, we have $(n)_n = n!$, so this generalizes the factorial function.
Theorem 2.2.5. If |A| = n and n ≥ k, then there are $(n)_k$ injective words of length k in A.
Proof. Start with a permutation of A. The first k elements in that permutation give us an injective word of length k. But we've overcounted because we don't care how the remaining n − k things we threw away are ordered. In particular, this process returns each word exactly (n − k)! many times, so our desired quantity is
$$\frac{n!}{(n-k)!} = (n)_k. \qquad \square$$
Some further things to think about:
• A small city has 10 intersections. Each one could have a traffic light or gas station
(or both or neither). How many different configurations could this city have?
• Using that $(n)_k = n \cdot (n-1)_{k-1}$, can you find a proof for Theorem 2.2.5 that uses induction?
• Which entries of the 12-fold way table can we fill in now?

2.3. Choice problems. We finish up with some related counting problems. Recall we
showed that an n-element set has exactly $2^n$ subsets. We can refine this problem by asking
about subsets of a given size.
Theorem 2.3.1. The number of k-element subsets of an n-element set is
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
There are many ways to prove this, but we’ll just do one for now:
Proof. It doesn’t matter which set of size n we’re dealing with, so we work with [n] for
convenience. In the last section on words, we identified subsets of [n] with words of length
n on {0, 1}, with a 1 in position i if and only if i belongs to the subset. So the number of
subsets of size k are exactly the number of words with exactly k instances of 1. This is the
same as arranging n − k 0's and k 1's from the section on permutations. In that case, we saw that the answer is $\frac{n!}{(n-k)!\,k!}$. □
Corollary 2.3.2. $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.

Proof. The left hand side counts the number of subsets of [n] of some size k where k ranges
from 0 to n. But all subsets of [n] are accounted for and we've seen that $2^n$ is the number of all subsets of [n]. □
Here's an important identity for binomial coefficients (we interpret $\binom{n}{-1} = 0$):

Theorem 2.3.3 (Pascal's identity). For any k ≥ 0, we have
$$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}.$$
Proof. The right hand side is the number of subsets of [n + 1] of size k. There are 2 types
of such subsets: those that contain n + 1 and those that do not. Note that the subsets that
do contain n + 1 are naturally in bijection with the subsets of [n] of size k − 1: to get such
a subset, delete n + 1. Those that do not contain n + 1 are naturally already in bijection with the subsets of [n] of size k. The two sets don't overlap and their sizes are $\binom{n}{k-1}$ and $\binom{n}{k}$, respectively. □
Pascal’s triangle gives a nice way to visualize binomial coefficients using this identity.
Please look it up if you’re interested.
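Here is a small sketch of Pascal's triangle built directly from Theorem 2.3.3 (the function name pascal_rows is mine, not from the notes): each row starts and ends with 1, and every interior entry is the sum of the two entries above it.

    from math import comb

    def pascal_rows(n_max):
        rows = [[1]]
        for n in range(1, n_max + 1):
            prev = rows[-1]
            # Pascal's identity: binom(n, k) = binom(n-1, k-1) + binom(n-1, k)
            rows.append([1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1])
        return rows

    assert pascal_rows(6)[6] == [comb(6, k) for k in range(7)]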
An important variation of subset is the notion of a multiset. Given a set S, a multiset of
S is like a subset, but we allow elements to be repeated. Said another way, a subset of S
can be thought of as a way of assigning either a 0 or 1 to an element, based on whether it
gets included. A multiset is then a way to assign some non-negative integer to each element,
where numbers bigger than 1 mean we have picked them multiple times.
Example 2.3.4. There are 10 multisets of [3] of size 3:
{1, 1, 1}, {1, 1, 2}, {1, 1, 3}, {1, 2, 2}, {1, 2, 3},
{1, 3, 3}, {2, 2, 2}, {2, 2, 3}, {2, 3, 3}, {3, 3, 3}.
Aside from exhaustively checking, how do we know that’s all of them? Here’s a trick: given
a multiset, add 1 to the second smallest value and add 2 to the largest value. What happens
to the above:
{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5},
{1, 4, 5}, {2, 3, 4}, {2, 3, 5}, {2, 4, 5}, {3, 4, 5}.
We get all of the 3-element subsets of [5]. The process is reversible using subtraction, so
there is a more general fact here. □
Theorem 2.3.5. The number of k-element multisets of a set of size n is
$$\binom{n+k-1}{k}.$$
Proof. First, it doesn’t really matter which set of size n we consider, since given any two, we
can always relabel elements to get a bijection between their k-element multisets. So we will
take [n] as our set.
We adapt the example above to find a bijection f between k-element multisets of [n] and
k-element subsets of [n + k − 1]. Given a multiset S, sort the elements as s1 ≤ s2 ≤ · · · ≤ sk .
From this, we get a subset f (S) = {s1 , s2 + 1, s3 + 2, . . . , sk + (k − 1)} of [n + k − 1]. On the
other hand, given a subset T of [n + k − 1], sort the elements as t1 < t2 < · · · < tk . From
this, we get a multiset g(T ) = {t1 , t2 − 1, t3 − 2, . . . , tk − (k − 1)} of [n]. We will omit the
details that f and g are well-defined and inverse to one another. □
Remark 2.3.6. Here’s another way to count multisets. For simplicity, assume that our set
is [n]. We can think of a multiset as putting k indistinguishable balls into n boxes labeled
1, . . . , n (the number of balls in box i represents how many times i appears in the multiset).
How do we encode this in a useful way? I’ll illustrate with an example.
Suppose n = 5 and k = 4. Our multiset is {1, 1, 3, 5}. Then box 1 has 2 balls, while boxes
3 and 5 each have 1 ball. I’ll encode that by the following picture:
◦ ◦ || ◦ ||◦
We have 4 vertical lines which separate the balls into 5 regions (the boxes). I’ll leave it to
you to convince yourself that every ordering of 4 balls and 4 vertical lines corresponds to
exactly one multiset (i.e., there is a bijection). There are a few subtle points: make sure you
understand why we aren't putting vertical lines on the outside, for example. In particular, we have 8 things and just need to know which 4 of them are vertical lines, so there are $\binom{8}{4}$ multisets of size 4 from a set of size 5. Finally, this works in general, but again, I'll leave it to you to fill in the details.
This method might be easier to remember, though if you think about it hard enough,
you’ll see that it’s pretty much the same thing as the previous proof. □
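Either proof can be sanity-checked numerically. A small sketch (using only the standard library; not part of the original notes):

    from itertools import combinations_with_replacement
    from math import comb

    def count_multisets(n, k):
        # Enumerate all k-element multisets of [n] and count them.
        return sum(1 for _ in combinations_with_replacement(range(1, n + 1), k))

    assert count_multisets(3, 3) == comb(5, 3) == 10   # Example 2.3.4
    assert count_multisets(5, 4) == comb(8, 4)         # Remark 2.3.6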
Example 2.3.7 (Counting poker hands). We’ll apply some of the ideas above to count the
number of ways to receive various kinds of poker hands. The setup is as follows: Each card
has one of 4 suits: ♣, ♡, ♠, ♢, and one of 13 values: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A.
Each possible pair of suit and value appears exactly once, so there are 52 cards total.
In each situation below, we want to count how many subsets of 5 out of the 52 cards have
certain special properties.
(1) (Four of a kind) This means that 4 of the 5 cards have the same value (and the 5th
necessarily has a different value). Since there are only 4 cards with a given value, the only
relevant information is the value that appears 4 times and the extra card. There are
13 choices for the value, and 48 cards leftover, so there are 13 · 48 ways to get a “four
of a kind”.
(2) (Full house) This means that 3 of the 5 cards have the same value and the other
2 also have the same value. These two values necessarily have to be different. The
relevant information is the two values (with order! why?) and then the suits that are
chosen. There are 13 · 12 ways
 to choose two different values with order. To choose 3
suits out of 4, there are 43 ways, 4

and to choose 2 suits out of 4, there are 2
ways,
so in total we get 13 · 12 · 43 42 .

(3) (Two pairs) This means that 2 of the 5 cards have the same value, and 2 of the
remaining 5 cards have the same value. We will also impose these values are different
(so it doesn’t overlap with (1)) and that the value of the 5th card is also different (so
it doesn’t overlap with (2)).
The two values of the pairs are chosen without order (why is this different?), so there are $\binom{13}{2}$ ways. For each value, we choose 2 suits out of 4, so we pick up another $\binom{4}{2}^2$. We've removed 8 cards from the possibility of what the fifth card can be, so it has 44 possibilities, which gives us a final answer of $\binom{13}{2}\binom{4}{2}^2 \cdot 44$.
(4) (Straight) This means that the values of the 5 cards can be put in consecutive order
(funny rule: A can either count as a 1 or as the value above K). There are no
conditions on the suits. So we need to choose the 5 consecutive values. The smallest
value can be one of: A, 2, 3, 4, 5, 6, 7, 8, 9, 10, and once that is chosen, all of the
other values are determined, so there are 10 possibilities here. For each of the 5 cards, we need to choose 1 of the 4 suits, so we have another $4^5$ choices, giving us a final answer of $10 \cdot 4^5$. □
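The counts in (1)–(4) can also be double-checked by brute force for a single case; here is a sketch for the "four of a kind" count (the encoding of the deck is my own, and the loop over all $\binom{52}{5}$ hands takes a few seconds):

    from itertools import combinations

    values, suits = range(13), range(4)
    deck = [(v, s) for v in values for s in suits]

    four_of_a_kind = 0
    for hand in combinations(deck, 5):
        counts = {}
        for v, _ in hand:
            counts[v] = counts.get(v, 0) + 1
        if max(counts.values()) == 4:   # some value appears on 4 of the 5 cards
            four_of_a_kind += 1

    assert four_of_a_kind == 13 * 48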
Some additional things:
• From the formula, we see that $\binom{n}{k} = \binom{n}{n-k}$. This would also be implied if we could
construct a bijection between the k-element subsets and the (n − k)-element subsets
of [n]. Can you find one?
• What other entries of the 12-fold way table can be filled in now?
• Given variables x, y, z, we can form polynomials. A monomial is a product of the form $x^a y^b z^c$, and its degree is a + b + c. How many monomials in x, y, z are there of degree d? What if we have n variables $x_1, x_2, \ldots, x_n$?
• There are other special configurations of 5 cards which are significant in Poker. A
good test of your understanding is to look up the list and see if you can derive the
number of ways to get each. A further variation of this is to change the rules: either
look at 6-card hands (3 pairs, 2 triples, 4 of a kind plus a pair, etc.), 7-card hands...
or to change the number of suits or values.
2.4. Compositions. Below, n and k are positive integers.
Definition 2.4.1. A sequence of non-negative integers (a1 , . . . , ak ) is a weak composition
of n if a1 + · · · + ak = n. If all of the ai are positive, then it is a composition. We call k the
number of parts of the (weak) composition. □
Theorem 2.4.2. The number of weak compositions of n with k parts is $\binom{n+k-1}{n} = \binom{n+k-1}{k-1}$.
Proof. We will construct a bijection between weak compositions of n with k parts and n-
element multisets of [k]. First, given a weak composition (a1 , . . . , ak ), we get a multiset
which has the element i exactly ai many times. Since a1 + · · · + ak = n, this is an n-element
multiset of [k]. Conversely, given a n-element multiset S of [k], let ai be the number of times
that i appears in S, so that we get a weak composition (a1 , . . . , ak ) of n. □
Example 2.4.3. We want to distribute 20 pieces of candy (all identical) to 4 children. How
many ways can we do this? If we order the children and let ai be the number of pieces of
candy that the ith child receives, then (a1 , a2 , a3 , a4 ) is just a weak composition of 20 into 4
parts, so we can identify all ways with the set of all weak compositions. So we know that the number of ways is $\binom{20+4-1}{20} = \binom{23}{20}$.
What if we want to ensure that each child receives at least one piece of candy? First, hand
each child 1 piece of candy. We have 16 pieces left, and we can distribute them as we like, so we're counting weak compositions of 16 into 4 parts, or $\binom{19}{16}$. □
As we saw with the previous example, given a weak composition (a1 , . . . , ak ) of n, we can
think of it as an assignment of n indistinguishable objects into k distinguishable boxes, so
this fills in one of the entries in the 12-fold way. A composition is an assignment which is
required to be surjective, so actually this takes care of 2 of the entries.
Corollary 2.4.4. The number of compositions of n into k parts is $\binom{n-1}{k-1}$.
Proof. If we generalize the argument in the last example, we see that compositions of n into
k parts are in bijection with weak compositions of n − k into k parts. □
Corollary 2.4.5. The total number of compositions of n (into any number of parts) is $2^{n-1}$.

Proof. The possible number of parts of a composition of n is anywhere between k = 1 and k = n. So the total number of compositions possible is
$$\sum_{k=1}^{n} \binom{n-1}{k-1} = \sum_{k=0}^{n-1} \binom{n-1}{k} = 2^{n-1}. \qquad \square$$

The answer suggests that we should be able to find a bijection between compositions of n
and subsets of [n − 1]. Can you find one?
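Before looking for the bijection, here is a quick recursive enumeration (helper name mine) that confirms Corollary 2.4.5 for small n:

    def compositions(n):
        # A composition of n is a first part a with 1 <= a <= n followed by a
        # composition of n - a; the empty tuple is the unique composition of 0.
        if n == 0:
            return [()]
        return [(a,) + rest for a in range(1, n + 1) for rest in compositions(n - a)]

    assert len(compositions(6)) == 2 ** 5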

3. Stirling numbers
3.1. Set partitions. (Weak) compositions were about indistinguishable objects into distin-
guishable boxes. Now we reverse the roles and consider distinguishable objects into indis-
tinguishable boxes.
Definition 3.1.1. Let X be a set. A partition of X is an unordered collection of nonempty
subsets S1 , . . . , Sk of X such that every element of X belongs to exactly one of the Si . An
ordered partition of X is the same, except the subsets are ordered. The Si are the blocks
of the partition. Partitions of sets are also called set partitions to distinguish from integer
partitions, which will be discussed in the next section. □
Example 3.1.2. Let X = {1, 2, 3}. There are 5 partitions of X:
{{1, 2, 3}}, {{1, 2}, {3}}, {{1, 3}, {2}}, {{2, 3}, {1}}, {{1}, {2}, {3}}.
When we say unordered collection of subsets, we mean that {{1, 2}, {3}} and {{3}, {1, 2}}
are to be considered the same partition.
The notation above is cumbersome, so we can also write the above partitions as follows:
123, 12|3, 13|2, 23|1, 1|2|3. □
The number of partitions of X with k blocks only depends on the number of elements of
X. So for concreteness, we will usually assume that X = [n].
Example 3.1.3. If we continue with our previous example of candy and children: imagine
the 20 pieces of candy are now labeled 1 through 20 and that the 4 children are all identical
clones. The number of ways to distribute candy to them so that each gets at least 1 piece
of candy is then the number of partitions of [20] into 4 blocks. □
Definition 3.1.4. We let S(n, k) be the number of partitions of a set of size n into k
blocks. These are called the Stirling numbers of the second kind. By convention, we
define S(0, 0) = 1. □
Note that S(n, k) = 0 if k > n.
The number of ordered partitions of a set of size n into k blocks is k!S(n, k): the extra data
we need is a way to order the blocks and this can be chosen independently of the partition.
So S(n, k) is, by definition, an answer to one of the 12-fold way entries: how many ways
to put n distinguishable objects into k indistinguishable boxes so that each box gets at least
one object. Similarly, k!S(n, k) is the number of ways to put n distinguishable objects into
k distinguishable boxes so that each box gets at least one object. Alternatively:
Theorem 3.1.5. k!S(n, k) is the number of surjective functions f : [n] → [k].

Unfortunately, it will be generally hard to get nice, exact formulas for S(n, k), but we can
do some special cases:
Example 3.1.6. For n ≥ 1, S(n, 1) = S(n, n) = 1. For n ≥ 2, $S(n, 2) = 2^{n-1} - 1$ and $S(n, n-1) = \binom{n}{2}$.
Why? To compute S(n, 2), let's first count ordered set partitions of [n] with 2 blocks. This is almost the same as just picking a subset S, since then we can consider the partition S, [n] \ S. The problem is that S is not allowed to be empty and neither is [n] \ S. So that leaves us with $2^n - 2$ options for S, which is the number of ordered set partitions. To get unordered partitions, we divide by 2!, or just 2.
To compute S(n, n − 1), think about what the blocks must look like. In order to split n objects into n − 1 blocks, we need to have n − 2 blocks of size 1 and a single block of size 2. So the only relevant information is which elements go in that block of size 2. This can be any subset of size 2, hence the $\binom{n}{2}$. □
We also have the following recursive formula:
Theorem 3.1.7. For n > 0, we have (interpret S(n, k) = 0 if either input is negative)
S(n, k) = S(n − 1, k − 1) + k · S(n − 1, k).
Proof. Consider two kinds of partitions of [n]. The first kind (type I) is when n is in its own
block. In that case, if we remove this block, then we obtain a partition of [n − 1] into k − 1
blocks. To reconstruct the original partition, we just add a block containing n by itself. So
the number of such partitions is S(n − 1, k − 1).
The second kind (type II) is when n is not in its own block. This time, if we remove n, we get a partition of [n − 1] into k blocks. However, it's not possible to reconstruct the original partition because we don't remember which block n belonged to. For the purposes of this proof
only, let’s define a marked partition to be a pair (σ, b) where σ is a set partition and b is one
of its blocks.
Then we can define a bijection
f : {type II partitions of [n] into k blocks} → {marked partitions of [n − 1] into k blocks}
as follows: if τ is a type II partition of [n] into k blocks, then f (τ ) = (σ, b) where σ is the
same as τ except n is deleted, and b is whichever block originally contained n. The inverse
g is defined by adding back n into the block b. Finally, the number of marked partitions of
n − 1 into k blocks is k · S(n − 1, k).
If we add both answers, we account for all possible partitions of [n], so we get the identity
we want. □

Here’s a table of small values of S(n, k):


n\k 1 2 3 4 5
1 1 0 0 0 0
2 1 1 0 0 0
3 1 3 1 0 0
4 1 7 6 1 0
5 1 15 25 10 1
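The recursion in Theorem 3.1.7 translates directly into a short program; here is a sketch (function name mine) that reproduces the last row of the table:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling2(n, k):
        # S(n, k) = S(n-1, k-1) + k * S(n-1, k), with S(0, 0) = 1.
        if n == 0 and k == 0:
            return 1
        if n == 0 or k == 0:
            return 0
        return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

    assert [stirling2(5, k) for k in range(1, 6)] == [1, 15, 25, 10, 1]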

We define B(n) to be the number of partitions of [n] into any number of blocks. This is the nth Bell number. By definition,
$$B(n) = \sum_{k=0}^{n} S(n, k).$$

Example 3.1.8. The following recursion holds for Bell numbers:
$$B(n+1) = \sum_{i=0}^{n} \binom{n}{i} B(i).$$
To prove this, we separate all of the set partitions of [n + 1] based on the number of elements in the block that contains n + 1. Consider those where the size is j. To count the number of these, we need to first choose the other elements to occupy the same block as n + 1. These numbers come from [n] and there are j − 1 to be chosen, so there are $\binom{n}{j-1}$ ways to do this. We have to then choose a set partition of the remaining n + 1 − j elements, and there are B(n + 1 − j) many of these. So the number of such partitions is $\binom{n}{j-1} B(n+1-j)$. The possible values for j are between 1 and n + 1, so we get the identity
$$B(n+1) = \sum_{j=1}^{n+1} \binom{n}{j-1} B(n+1-j).$$
Re-index the sum by setting i = n + 1 − j and use the identity $\binom{n}{n-i} = \binom{n}{i}$ to get the desired identity. □
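The recursion can also be checked numerically; here is a sketch (function name mine) that generates Bell numbers from it, starting at B(0) = 1:

    from math import comb

    def bell_numbers(n_max):
        B = [1]
        for n in range(n_max):
            # B(n+1) = sum_i binom(n, i) * B(i), as proved above
            B.append(sum(comb(n, i) * B[i] for i in range(n + 1)))
        return B

    # B(3) = 5 matches the five partitions of {1, 2, 3} in Example 3.1.2.
    assert bell_numbers(5) == [1, 1, 2, 5, 15, 52]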
3.2. Falling factorials. Recall that the image of a function f is the set of values it actually
takes.
Lemma 3.2.1. The number of functions f : [n] → [d] whose image has size k is $S(n,k)(d)_k$.
Proof. To list all such functions, we can separate by their image. So pick a k-element subset T ⊆ [d] (there are $\binom{d}{k}$ many of them). Let $t_1 < \cdots < t_k$ be the elements of T, written in order. Let $A_T$ be the set of functions f : [n] → [d] whose image is T and let B be the set of ordered set partitions of [n] into k blocks. We will show that there is a bijection between $A_T$ and B.
Given $f \in A_T$, for i = 1, . . . , k, let $X_i = \{x \in [n] \mid f(x) = t_i\}$. Then each $X_i$ is nonempty and $(X_1, \ldots, X_k)$ is an ordered set partition of [n].
Conversely, given an ordered set partition $(Y_1, \ldots, Y_k)$ of [n], define a function $g \in A_T$ by $g(x) = t_i$ if $x \in Y_i$. These two processes give the bijection between $A_T$ and B, so $|A_T| = |B| = k!S(n, k)$.
Since this is the same for all T, we see that the number of functions whose image has size k is $\binom{d}{k} k! S(n,k) = (d)_k S(n,k)$. □
Theorem 3.2.2. Let d, n be positive integers such that d ≥ n. Then
$$d^n = \sum_{k=1}^{n} S(n,k)(d)_k.$$

Proof. The left hand side counts the number of functions f : [n] → [d] (since such a function
is equivalent to the word f (1)f (2) · · · f (n) in the alphabet [d]). By Lemma 3.2.1, the right
side counts the number of functions whose image has size k for all possible values of k. But
that accounts for every function exactly once, so we have equality. □

Corollary 3.2.3. For any non-negative integer n we have
$$x^n = \sum_{k=0}^{n} S(n,k)(x)_k.$$

Proof. If n = 0, then we have $x^0 = 1 = (x)_0$ and S(0, 0) = 1, so it works.
Otherwise, for n > 0, S(n, 0) = 0 and we can omit the k = 0 term from the sum. Now take the difference $f(x) = x^n - \sum_{k=1}^{n} S(n,k)(x)_k$, which is a polynomial in x. If we plug in x = d for any positive integer d ≥ n, then Theorem 3.2.2 tells us that f(d) = 0. This tells us that f(x) has infinitely many roots, which means that f(x) is the 0 polynomial. □

Why is this interesting? Consider the following 2 formulas:
$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2}, \qquad \sum_{i=0}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}.$$

It's natural to try to guess what happens for $\sum_{i=0}^{n} i^r$ for general r, but the pattern is not easy to guess. Falling factorials work much better.
Example 3.2.4. Since $(i)_1 = i$, we've already seen that
$$\sum_{i=0}^{n} (i)_1 = \frac{1}{2}(n+1)n = \frac{1}{2}(n+1)_2.$$
I've written it the second way to match the next identity:
$$\sum_{i=0}^{n} i(i-1) = \frac{1}{3}(n+1)n(n-1) = \frac{1}{3}(n+1)_3.$$

For practice, you should prove this yourself. I won’t work it out in the interest of time and
because it’s a special case of the next identity. □
Given the above, there is a tempting guess for the general case. Let’s go ahead and prove
it:
Theorem 3.2.5. For any non-negative integer d, we have the identity
$$\sum_{i=0}^{n} (i)_d = \frac{1}{d+1}(n+1)_{d+1}.$$

Proof. If d = 0, the identity just says that $\sum_{i=0}^{n} 1 = n+1$, which is certainly true. So we don't need to consider this case anymore and we're going to assume that d > 0. (I single this out to avoid dealing with separate cases in the arguments below.)
Now we can try to prove the identity by induction on n. The statement P(n) is just the identity above.
First the base case P(0): if n = 0, then since d > 0, the left side is 0 and the right side is also 0 (the right side is $\frac{1}{d+1} \cdot 1 \cdot 0 \cdots (-d+1)$).

Now we assume that P(n) holds, i.e., the identity above is true, and need to prove P(n + 1):
\begin{align*}
\sum_{i=0}^{n+1} (i)_d &= \sum_{i=0}^{n} (i)_d + (n+1)_d \\
&= \frac{1}{d+1}(n+1)_{d+1} + (n+1)_d \\
&= (n+1)_d \cdot \frac{n-d+1}{d+1} + (n+1)_d \cdot 1 \\
&= (n+1)_d \left( \frac{n-d+1}{d+1} + \frac{d+1}{d+1} \right) \\
&= \frac{(n+1)_d \cdot (n+2)}{d+1} \\
&= \frac{1}{d+1}(n+2)_{d+1}. \qquad \square
\end{align*}
We can try to use everything we’ve learned now to sum up general powers:
\begin{align*}
\sum_{i=0}^{n} i^r &= \sum_{i=0}^{n} \sum_{k=0}^{r} S(r,k)(i)_k \\
&= \sum_{k=0}^{r} S(r,k) \sum_{i=0}^{n} (i)_k \\
&= \sum_{k=0}^{r} \frac{S(r,k)}{k+1} (n+1)_{k+1}.
\end{align*}
It’s still somewhat complicated, but better than having no pattern at all. In the next
section we’ll see how to expand falling factorials into powers (opposite to what we’ve just
done).

3.3. Cycles in permutations. We're going to be discussing permutations of [n]. We can think of each one as a list of numbers, like 341526. Here are a few other ways to visualize
or think about permutations. First, we might think of this as a function σ : [n] → [n] that
sends i to whatever is in the ith position. In our example, that would mean
σ(1) = 3, σ(2) = 4, σ(3) = 1, σ(4) = 5, σ(5) = 2, σ(6) = 6.
We can also think of this as a directed graph where we draw an arrow going from i to σ(i);
again this example would be:
[Diagram: arrows 1 → 3 → 1, 2 → 4 → 5 → 2, and 6 → 6.]
This highlights the important concept for us: every permutation can be pictured as a disjoint
union of cycles. In our example, there are 3 cycles, and they are {1, 3}, {2, 4, 5}, and {6}.
Recall the cycle decomposition of a permutation σ ∈ $S_n$: starting with any 1 ≤ i ≤ n, we consider the sequence $i, \sigma(i), \sigma^2(i), \ldots, \sigma^{k-1}(i)$ where $\sigma^k(i) = i$ (there is guaranteed to be such a k since σ has finite order). We write the cycle as $i \to \sigma(i) \to \cdots \to \sigma^{k-1}(i) \to i$. Note that k could be 1, in which case the cycle has length 1, and also that there isn't a unique
beginning (we could have started and ended with σ(i) instead of i).
In our running example, its cycle decomposition is 1 → 3 → 1, 2 → 4 → 5 → 2, 6 → 6.
The graph is probably the easiest way to think about it though for our purposes.
Let c(n, k) be the number of permutations in Sn with exactly k different cycles. We use
the convention that c(0, 0) = 1. Note that c(n, 0) = 0 if n > 0. These are the (signless)
Stirling numbers of the first kind.

Proposition 3.3.1. If n ≥ k ≥ 1, we have

c(n, k) = c(n − 1, k − 1) + (n − 1)c(n − 1, k).

Proof. We break up the permutations with k cycles into 2 types.


The first type consists of permutations such that n is its own cycle. Removing this cycle
gives a bijection between such permutations and permutations of Sn−1 with k − 1 cycles, so
the total number is c(n − 1, k − 1).
The second type consists of permutations such that n is not in its own cycle. We deal
with this in a way similar to “marked partitions” in the previous section. Namely, define a
marked permutation of n − 1 to be a pair (i, τ ) where i is an element of [n − 1] and τ is a
permutation of n − 1. We will construct a bijection

   
f : {type II permutations of [n] with k cycles} → {marked permutations of [n − 1] with k cycles}.

Given a type II permutation σ, consider the portion i → n → j in its cycle decomposition.


Since n is not in its own cycle, we know that i ̸= n and j ̸= n. We define a permutation τ
of [n − 1] by τ (i) = j and τ (x) = σ(x) for all x ̸= i. Then we define f (σ) = (i, τ ). Note
that τ still has k cycles. Informally, we’re “cutting” n out of the graph and stitching the
rest together. This second description suggests how to define the inverse: given a marked
permutation (i, τ ), we replace the edge i → τ (i) with i → n → τ (i). So there are (n −
1)c(n − 1, k) many of type II permutations. □

Corollary 3.3.2. For n ≥ 0, we have
$$\sum_{k=0}^{n} c(n,k) x^k = x(x+1)\cdots(x+n-1),$$
where the right side is 1 if n = 0, and in particular,
$$\sum_{k=0}^{n} (-1)^{n-k} c(n,k) x^k = (x)_n.$$

Proof. We prove the first identity by induction on n. For n = 0, both sides are 1. Similarly, if n = 1, both sides are x. Now suppose n ≥ 2. Then c(n, 0) = c(n − 1, 0) = 0 and
\begin{align*}
\sum_{k=1}^{n} c(n,k)x^k &= x\sum_{k=1}^{n} c(n-1,k-1)x^{k-1} + (n-1)\sum_{k=1}^{n} c(n-1,k)x^k \\
&= x\sum_{k=0}^{n-1} c(n-1,k)x^k + (n-1)\sum_{k=0}^{n-1} c(n-1,k)x^k \\
&= (x+n-1)\sum_{k=0}^{n-1} c(n-1,k)x^k \\
&= (x+n-1) \cdot x(x+1)\cdots(x+n-2),
\end{align*}
where the last equality is by induction, and this proves what we claimed.
The second identity follows by doing the substitution $x \mapsto -x$ and multiplying by $(-1)^n$. □

The coefficients $(-1)^{n-k} c(n,k)$ are the Stirling numbers of the first kind, and are usually denoted s(n, k).
Example 3.3.3. Some computations to discuss:
• c(n, n − 1) (assume n ≥ 2):
There's only one possibility for the cycle sizes: one of size 2 and n − 2 of size 1. There's nothing to do for the latter, so we need to pick the 2 numbers for the first cycle in $\binom{n}{2}$ many ways. Since there's only one way to turn 2 numbers into a cycle, we're done, and $c(n, n-1) = \binom{n}{2}$.
• c(n, 1) (assume n ≥ 1):
We need to put all n numbers into a single cycle. Given a cycle, you can pick any
number and then list the rest in order to get a permutation (i.e., 1 → 2 → 3 → 1
can be thought of as 123 or 231 or 312). There are n! permutations, but we are
overcounting by a factor of n (the choice of where to start), so there are (n − 1)! ways
to put n things in a cycle and hence c(n, 1) = (n − 1)!.
• c(5, 3):
There are 2 cases for how big the cycles are: either sizes 3, 1, 1 or 2, 2, 1. In the first case, we pick 3 numbers in $\binom{5}{3}$ ways and then pick a cycle in 2 ways, so there are 20 permutations in this case.
For the second case, we pick 2 numbers for one cycle in $\binom{5}{2}$ ways then pick another 2 numbers for the other cycle in $\binom{3}{2}$ ways. Now divide by 2 because we're overcounting (our choices lead to an order on the cycles), so we get 15 permutations in this case. Thus, c(5, 3) = 35. □
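The recursion in Proposition 3.3.1 gives an easy way to double-check these computations; a sketch (function name mine):

    from functools import lru_cache
    from math import comb, factorial

    @lru_cache(maxsize=None)
    def cycle_count(n, k):
        # c(n, k) = c(n-1, k-1) + (n-1) * c(n-1, k), with c(0, 0) = 1.
        if n == 0 and k == 0:
            return 1
        if n == 0 or k == 0:
            return 0
        return cycle_count(n - 1, k - 1) + (n - 1) * cycle_count(n - 1, k)

    assert cycle_count(5, 3) == 35
    assert cycle_count(7, 6) == comb(7, 2)     # c(n, n-1) = binom(n, 2)
    assert cycle_count(6, 1) == factorial(5)   # c(n, 1) = (n-1)!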

4. Binomial theorem and generalizations


4.1. Binomial theorem. The binomial theorem is about expanding powers of x + y where
we think of x, y as variables. For example:
$$(x+y)^2 = x^2 + 2xy + y^2,$$
$$(x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3.$$

Theorem 4.1.1 (Binomial theorem). For any n ≥ 0, we have
$$(x+y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}.$$
Here’s the proof given in the book.
Proof. Consider how to expand the product $(x+y)^n = (x+y)(x+y)\cdots(x+y)$. To get a term, from each expression (x + y), we have to either pick x or y. The final term we get is $x^i y^{n-i}$ if the number of times we chose x is i (and hence the number of times we've chosen y is n − i). The number of times this term appears is therefore the number of different ways we could have chosen x exactly i times. For each way of doing this, we can associate to it a subset of [n] of size i: the number j is in the subset if and only if we chose x in the jth copy of (x + y). We have already seen that the number of subsets of [n] of size i is $\binom{n}{i}$. □
Here’s a proof using induction.
Proof. For n = 0, the formula becomes $(x+y)^0 = 1$, which is valid.
Now suppose the formula is valid for n. Then we have
$$(x+y)^{n+1} = (x+y)(x+y)^n = (x+y)\sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}.$$
For a given 0 ≤ k ≤ n + 1, there are at most 2 ways to get $x^k y^{n+1-k}$ on the right side: either we get it from $x \cdot \binom{n}{k-1} x^{k-1} y^{n-k+1}$ or from $y \cdot \binom{n}{k} x^k y^{n-k}$ (with the convention $\binom{n}{-1} = \binom{n}{n+1} = 0$). If we add these up, then we get $\binom{n+1}{k} x^k y^{(n+1)-k}$ by Pascal's identity. □
The second proof can also be used to derive Pascal’s identity as a consequence of the
binomial theorem.
We can manipulate the binomial theorem in a lot of different ways (taking derivatives with
respect to x or y, or doing substitutions). This will give us a lot of new identities. Here are
a few of particular interest (some are old):
Corollary 4.1.2. $2^n = \sum_{i=0}^{n} \binom{n}{i}$.
Proof. Substitute x = y = 1 into the binomial theorem. □
This says that the total number of subsets of [n] is $2^n$, which is a familiar fact from before.
Corollary 4.1.3. For n > 0, we have $0 = \sum_{i=0}^{n} (-1)^i \binom{n}{i}$.
Proof. Substitute x = −1 and y = 1 into the binomial theorem. □
Example 4.1.4. If we rewrite the previous identity we get
$$\sum_{\substack{0 \le i \le n \\ i \text{ odd}}} \binom{n}{i} = \sum_{\substack{0 \le i \le n \\ i \text{ even}}} \binom{n}{i}.$$
This says that the number of subsets of even size is the same as the number of subsets of odd
size. It is worth finding a more direct proof of this fact which does not rely on the binomial
theorem.

But we can keep going. We also know that
$$\sum_{\substack{0 \le i \le n \\ i \text{ odd}}} \binom{n}{i} + \sum_{\substack{0 \le i \le n \\ i \text{ even}}} \binom{n}{i} = \sum_{i=0}^{n} \binom{n}{i} = 2^n.$$
If we combine both identities, we conclude that
$$\sum_{\substack{0 \le i \le n \\ i \text{ odd}}} \binom{n}{i} = \sum_{\substack{0 \le i \le n \\ i \text{ even}}} \binom{n}{i} = 2^{n-1}. \qquad \square$$
Corollary 4.1.5. $n 2^{n-1} = \sum_{i=0}^{n} i \binom{n}{i}$.

Proof. Take the derivative of both sides of the binomial theorem with respect to x to get $n(x+y)^{n-1} = \sum_{i=0}^{n} i \binom{n}{i} x^{i-1} y^{n-i}$. Now substitute x = y = 1. □

It is possible to interpret this formula as the size of some set so that both sides are different
ways to count the number of elements in that set. Can you figure out how to do that? How
about if we took the derivative twice with respect to x? Or if we took it with respect to x
and then with respect to y?

4.2. Multinomial theorem. Below, we have sums with multiple lines below the summation symbol. This usually means that we are summing over what is in the first line, and the following lines are conditions imposed on the things we sum over. By default, the variables represent integers. So for example,
$$\sum_{\substack{i \\ 0 \le i \le 10}}$$
means the same thing as $\sum_{i=0}^{10}$.

Theorem 4.2.1 (Multinomial theorem). For n, k ≥ 0, we have
$$(x_1 + x_2 + \cdots + x_k)^n = \sum_{\substack{(a_1, a_2, \ldots, a_k) \\ a_i \ge 0 \\ a_1 + \cdots + a_k = n}} \binom{n}{a_1, a_2, \ldots, a_k} x_1^{a_1} x_2^{a_2} \cdots x_k^{a_k}.$$

To clarify, the sum is over all possible k-tuples of non-negative integers whose sum is n.

Proof. The proof is similar to the binomial theorem. Consider expanding the product $(x_1 + \cdots + x_k)^n$. To do this, we first have to pick one of the $x_i$ from the first factor, pick another one from the second factor, etc. To get the term $x_1^{a_1} x_2^{a_2} \cdots x_k^{a_k}$, we need to have picked $x_1$ exactly $a_1$ times, picked $x_2$ exactly $a_2$ times, etc. We can think of this as arranging n objects, where $a_i$ of them have "type i". In that case, we've already discussed that this is counted by the multinomial coefficient $\binom{n}{a_1, a_2, \ldots, a_k}$. □

By performing substitutions, we can get a bunch of identities that generalize the one from
the previous section. I’ll omit the proofs, try to fill them in.
$$k^n = \sum_{\substack{(a_1, a_2, \ldots, a_k) \\ a_i \ge 0 \\ a_1 + \cdots + a_k = n}} \binom{n}{a_1, a_2, \ldots, a_k},$$
$$0 = \sum_{\substack{(a_1, a_2, \ldots, a_k) \\ a_i \ge 0 \\ a_1 + \cdots + a_k = n}} (1-k)^{a_1} \binom{n}{a_1, a_2, \ldots, a_k},$$
$$n k^{n-1} = \sum_{\substack{(a_1, a_2, \ldots, a_k) \\ a_i \ge 0 \\ a_1 + \cdots + a_k = n}} a_1 \binom{n}{a_1, a_2, \ldots, a_k}.$$

4.3. Re-indexing sums. The next chunk of the course heavily involves sums and ma-
nipulating them, so let me make a few remarks about re-indexing sums. There isn’t any
mathematical content here, it’s just working with notation, but it may be helpful to have
this spelled out.
Say we have a sum starting from 1 and going to some other quantity, like 10:
$$\sum_{i=1}^{10} f(i).$$

For whatever reason, we might prefer that it starts at 0. You can do this by defining j = i−1.
If you substitute i = j + 1 everywhere, you get
$$\sum_{j=0}^{9} f(j+1).$$

If you like, you can now replace j with i again to get $\sum_{i=0}^{9} f(i+1)$. This is a common thing
we’ll do, so it’s good to get used to it. This is especially useful if we want to combine sums
that don’t have the same starting and ending points:
$$\sum_{i=1}^{10} f(i) + \sum_{k=0}^{9} g(k) = \sum_{i=0}^{9} f(i+1) + \sum_{k=0}^{9} g(k) = \sum_{i=0}^{9} \bigl(f(i+1) + g(i)\bigr).$$

5. Formal power series


5.1. Definitions. A formal power series (in the variable x) is an expression of the form $A(x) = \sum_{n=0}^{\infty} a_n x^n$ where the $a_n$ are scalars (usually integers or rational numbers). Instead of writing the sum from 0 to ∞, we will usually just write $A(x) = \sum_{n \ge 0} a_n x^n$. If A(x) is a formal power series, let $[x^n]A(x)$ denote the coefficient of $x^n$ in A(x), so in this case, $[x^n]A(x) = a_n$.
By definition, two formal power series are equal if and only if all of their coefficients match up:
$$A(x) = B(x) \text{ if and only if } a_n = b_n \text{ for all } n.$$
A good heuristic is that these are infinite degree polynomials.
Let $B(x) = \sum_{n \ge 0} b_n x^n$ be a formal power series. The sum of two formal power series is defined by
$$A(x) + B(x) = \sum_{n \ge 0} (a_n + b_n) x^n.$$
The product is defined by
$$A(x)B(x) = \sum_{n \ge 0} c_n x^n, \qquad c_n = \sum_{i=0}^{n} a_i b_{n-i}.$$

This is what you get if you just distribute like normal. As a special case, if $a_i = 0$ for i > 0, we just get
$$a_0 B(x) = \sum_{n \ge 0} a_0 b_n x^n.$$
Addition and multiplication are commutative, so A(x)+B(x) = B(x)+A(x) and A(x)B(x) =
B(x)A(x). They are also associative, so it is unambiguous how to add or multiply 3 or more
power series.
Example 5.1.1. Let $A(x) = B(x) = \sum_{n \ge 0} x^n$. Then
$$A(x) + B(x) = \sum_{n \ge 0} 2x^n, \qquad A(x)B(x) = \sum_{n \ge 0} (n+1)x^n. \qquad \square$$

A formal power series A(x) is invertible if there is a power series B(x) such that A(x)B(x) = 1. In that case, we write $B(x) = A(x)^{-1} = 1/A(x)$ and call it the inverse of A(x). If it exists, then B(x) is unique and also A(x) = 1/B(x).
Example 5.1.2. Let $A(x) = \sum_{n \ge 0} x^n$ and B(x) = 1 − x. Then A(x)B(x) = 1, so B(x) is the inverse of A(x). For that reason, we will use the expression
$$\frac{1}{1-x} = \sum_{n \ge 0} x^n.$$
Following the calculus terminology, we call this the geometric series. However, the formal
power series x is not invertible: the constant term of xB(x) is 0 no matter what B(x) is, so
there is no way that an inverse exists. □
Theorem 5.1.3. A formal power series A(x) is invertible if and only if its constant term is
nonzero.
Proof. Write $A(x) = \sum_{n \ge 0} a_n x^n$. We want to solve A(x)B(x) = 1 if possible. If we multiply the left side out and equate coefficients, we get the following (infinite) system of equations:
$$a_0 b_0 = 1, \qquad a_0 b_1 + a_1 b_0 = 0, \qquad a_0 b_2 + a_1 b_1 + a_2 b_0 = 0, \qquad a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 = 0, \qquad \ldots$$

If a0 = 0, then there is no solution to the first equation so A(x) is not invertible.


If a0 ̸= 0, then we can solve the equations one by one. Formally, we can prove by induction
on n that there exist coefficients b0 , . . . , bn that make the first n + 1 equations valid. For the
base case n = 0, we have $b_0 = 1/a_0$. So suppose we have found the coefficients $b_0, \ldots, b_{n-1}$ already. At the next step, we will have
$$b_n = -\frac{1}{a_0} \sum_{i=1}^{n} a_i b_{n-i}.$$
In the sum, we have i > 0, so $b_{n-i}$ is a coefficient we already solved for in a previous step. Hence we get a formula for $b_n$ that makes the next equation valid as well. □
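The proof is effectively an algorithm: it computes the coefficients of the inverse one at a time. A minimal sketch (function name mine, coefficients kept as exact fractions):

    from fractions import Fraction

    def power_series_inverse(a, num_terms):
        # a[i] is the coefficient of x^i in A(x); requires a[0] != 0.
        # Returns the first num_terms coefficients of B(x) with A(x) B(x) = 1.
        a = [Fraction(c) for c in a] + [Fraction(0)] * num_terms
        b = [Fraction(1) / a[0]]
        for n in range(1, num_terms):
            b.append(-sum(a[i] * b[n - i] for i in range(1, n + 1)) / a[0])
        return b

    # Example 5.1.2: the inverse of 1 - x is the geometric series 1 + x + x^2 + ...
    assert power_series_inverse([1, -1], 6) == [1, 1, 1, 1, 1, 1]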

It is important to emphasize that formal here means that we are not considering questions
of convergence. We can take infinite sums and infinite products of formal power series as
long as the coefficient of xn involves only finitely many multiplications and additions for each
n (adding 0 or multiplying by 1 infinitely many times is ok). I don’t want to spend much
time discussing these issues but they do come up, so let’s go over it briefly.
Given a nonzero power series A(x), define its minimum degree, denoted mdeg(A(x)) to
be the smallest n so that [xn ]A(x) ̸= 0 (and define mdeg(0) = ∞). Infinite operations are
allowed whenever the computation of a given coefficient is finite.
Theorem 5.1.4. Let $A_1(x), A_2(x), \ldots$ be such that $\lim_{i \to \infty}$ mdeg$(A_i(x)) = \infty$. Then the following two expressions are well-defined (i.e., computing the coefficient of $x^n$ is always a finite process):
(1) the infinite sum $\sum_{i=1}^{\infty} A_i(x) = A_1(x) + A_2(x) + \cdots$, and
(2) the infinite product $\prod_{i=1}^{\infty} (1 + A_i(x)) = (1 + A_1(x))(1 + A_2(x)) \cdots$.
Example 5.1.5. I'll give an example to illustrate the intuition for infinite sums. Let $A_i(x) = x^i + x^{i+1} + x^{i+2} + \cdots$, so like the geometric series, but starting at $x^i$. Then mdeg$(A_i(x)) = i$, and here's how we can think of the infinite sum $A_1(x) + A_2(x) + \cdots$:

    x +  x^2 +  x^3 +  x^4 +  x^5 + ···
         x^2 +  x^3 +  x^4 +  x^5 + ···
                x^3 +  x^4 +  x^5 + ···
                       x^4 +  x^5 + ···
                              x^5 + ···
                                    ...
    -----------------------------------
    x + 2x^2 + 3x^3 + 4x^4 + 5x^5 + ···
Even though the sum is infinite, each column (coefficient) only requires a finite sum, so we
don’t run into issues. □
Given two formal power series A(x) and B(x), suppose that A(x) has no constant term.
Then we can define the composition by
$$(B\circ A)(x) = B(A(x)) = \sum_{n\ge 0} b_n A(x)^n.$$
This is well-defined because $\mathrm{mdeg}(b_n A(x)^n) \ge n\cdot\mathrm{mdeg}(A(x))$ (equal if $b_n \ne 0$ and otherwise the left side is $\infty$). Since $A$ has no constant term, $\mathrm{mdeg}(A(x)) \ge 1$, so $\lim_{n\to\infty}\mathrm{mdeg}(b_n A(x)^n) = \infty$.
Example 5.1.6. Let $d$ be a positive integer, $A(x) = x^d$ and $B(x) = \sum_{n\ge 0} x^n$. Then $B(A(x)) = \sum_{n\ge 0} x^{dn}$. We can do this substitution into the identity
$$(1-x)B(x) = 1$$
to get
$$(1-x^d)\sum_{n\ge 0} x^{dn} = 1,$$
from which we conclude that
$$\frac{1}{1-x^d} = \sum_{n\ge 0} x^{dn}. \qquad\square$$
We can also take the derivative $D$ of a formal power series. We define it as follows:
$$(DA)(x) = A'(x) = \sum_{n\ge 0} na_n x^{n-1} = \sum_{n\ge 0}(n+1)a_{n+1}x^n.$$
All of the familiar properties of derivatives hold:
$$D(A + B) = DA + DB,$$
$$D(A\cdot B) = (DA)\cdot B + A\cdot(DB),$$
$$D(B\circ A) = (DA)\cdot(DB\circ A),$$
$$D(1/A) = -\frac{D(A)}{A^2},$$
$$D(A^n) = nD(A)A^{n-1}.$$
Example 5.1.7. We have $\frac{1}{1-x} = \sum_{n\ge 0} x^n$. Taking the derivative of the left side gives $\frac{1}{(1-x)^2}$. Taking the derivative of the right side gives $\sum_{n\ge 0} nx^{n-1} = \sum_{n\ge 0}(n+1)x^n$. We've already seen that these two expressions are equal.
How would we simplify $B(x) = \sum_{n\ge 0} nx^n$? We have a few options. First:
$$B(x) = \sum_{n\ge 0}(n+1)x^n - \sum_{n\ge 0} x^n = \frac{1}{(1-x)^2} - \frac{1}{1-x} = \frac{1-(1-x)}{(1-x)^2} = \frac{x}{(1-x)^2}.$$
Or more directly:
$$B(x) = x\sum_{n\ge 0} nx^{n-1} = x\cdot\frac{1}{(1-x)^2}. \qquad\square$$
5.2. Binomial theorem (general form). If $m$ is a rational number and $k$ is a non-negative integer, we define generalized binomial coefficients by
$$\binom{m}{0} = 1, \qquad \binom{m}{k} = \frac{m(m-1)(m-2)\cdots(m-k+1)}{k!} \quad (k > 0).$$
Note that when $m$ is a positive integer, this agrees with our previous formulas. An important difference: if $m$ is a non-negative integer and $k > m$, then $\binom{m}{k} = 0$. If $m$ is not a non-negative integer (i.e., a negative integer or a non-integer), then $\binom{m}{k} \ne 0$ for all $k$. This lets us formulate a generalized binomial theorem:
Theorem 5.2.1 (General binomial theorem). Let $m$ be a rational number. Then
$$(1+x)^m = \sum_{n\ge 0}\binom{m}{n}x^n.$$
When $m$ is a non-negative integer, this agrees with the ordinary binomial theorem with $y = 1$. When $m$ is a negative integer, the meaning is $(1+x)^m = 1/(1+x)^{-m}$. For fractional $m$, we can also interpret them. For example, $(1+x)^{1/2} = \sqrt{1+x}$, which represents a formal power series whose square is equal to $1+x$. In other words,
$$\left(\sum_{n\ge 0}\binom{1/2}{n}x^n\right)^2 = 1 + x.$$
We won't use the fractional case beyond $m = 1/2$ much, so I'm not going to go into any further details about their definition.
This will be useful in later calculations. Let’s work out a few cases.
Example 5.2.2. Consider $m = -1$. We know from before that
$$\frac{1}{1-x} = \sum_{n\ge 0} x^n.$$
If we substitute in $-x$ for $x$, then we get
$$\frac{1}{1+x} = \sum_{n\ge 0}(-1)^n x^n.$$
We should also be able to get this from the binomial theorem with $m = -1$. We have
$$\binom{-1}{n} = \frac{(-1)(-2)\cdots(-1-n+1)}{n!} = \frac{(-1)^n n!}{n!} = (-1)^n.$$
More generally, consider $m = -d$ for some positive integer $d$. Then from what we just did, we have
$$(1+x)^{-d} = \left(\sum_{n\ge 0}(-1)^n x^n\right)^d.$$
The right side could be expanded, possibly by using induction on $d$, but we'd have to know a pattern before we could proceed. Instead, let's use the binomial theorem directly:
$$\binom{-d}{n} = \frac{(-d)(-d-1)\cdots(-d-n+1)}{n!} = \frac{(-1)^n(d+n-1)(d+n-2)\cdots d}{n!} = (-1)^n\frac{(d+n-1)!}{(d-1)!\,n!} = (-1)^n\binom{d+n-1}{n}.$$
This gives us the identities
$$\frac{1}{(1+x)^d} = \sum_{n\ge 0}(-1)^n\binom{d+n-1}{n}x^n, \qquad \frac{1}{(1-x)^d} = \sum_{n\ge 0}\binom{d+n-1}{n}x^n. \qquad\square$$
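If you want to double-check these identities numerically, here is a small Python sketch (not from the notes) that computes generalized binomial coefficients directly from the definition and compares $[x^n]\,1/(1-x)^d$ with $\binom{d+n-1}{n}$:

```python
from fractions import Fraction
from math import comb, factorial

def gen_binom(m, k):
    """Generalized binomial coefficient binom(m, k) for rational m and integer k >= 0."""
    num = Fraction(1)
    for i in range(k):
        num *= Fraction(m) - i
    return num / factorial(k)

d = 4
for n in range(6):
    # coefficient of x^n in 1/(1-x)^d, obtained from (1+x)^(-d) by substituting x -> -x
    lhs = gen_binom(-d, n) * (-1) ** n
    print(n, lhs, comb(d + n - 1, n))  # the two values agree
```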
Example 5.2.3. Consider $m = 1/2$. Then
$$\binom{1/2}{n} = \frac{(1/2)(-1/2)(-3/2)\cdots(1/2-n+1)}{n!} = \frac{(-1)^{n-1}(2n-3)(2n-5)\cdots 3\cdot 1}{2^n n!}.$$
This doesn't simplify much further, so now is a good time to introduce the double factorial: if $n$ is a positive integer, we set $n!! = n(n-2)(n-4)\cdots$. In other words, if $n$ is odd, then $n!!$ is the product of all positive odd integers between 1 and $n$, and if $n$ is even, then $n!!$ is the product of all positive even integers between 2 and $n$. Keep in mind this does not mean we do the factorial twice. With our new notation, we have
$$\binom{1/2}{n} = \frac{(-1)^{n-1}(2n-3)!!}{2^n n!}.$$
Remember that this means that
$$\left(\sum_{n\ge 0}\frac{(-1)^{n-1}(2n-3)!!}{2^n n!}x^n\right)^2 = 1 + x.$$
To check that by hand, we could expand the left side, but it would be a lot of work. $\square$
6. Ordinary generating functions

Ordinary generating functions are just a way of encoding infinite sequences of numbers as formal power series. Formally, given a sequence of numbers $a_0, a_1, a_2, \dots$, its ordinary generating function is $\sum_{n\ge 0} a_n x^n$.
6.1. Linear recurrence relations. Our first application of ordinary generating functions is
to solve linear recurrence relations. A sequence of numbers is said to satisfy a homogeneous
linear recurrence relation of order d if there are scalars c1 , . . . , cd such that cd ̸= 0, and for
all n ≥ d, we have
an = c1 an−1 + c2 an−2 + · · · + cd an−d .
We’ve seen this idea before, although in slightly different forms.
Example 6.1.1. The Fibonacci numbers Fn are given by the sequence 1, 1, 2, 3, 5, 8, 13, 21, . . . .
This isn’t really telling you what the general Fn is, so instead let me say that for all n ≥ 2,
we have
Fn = Fn−1 + Fn−2 .
Together with the initial conditions F0 = 1, F1 = 1, this is enough information to calculate
any Fn . So (by definition), the Fibonacci numbers satisfy a linear recurrence relation of
order 2. □
In general, if we want to define a sequence using a linear recurrence relation of order d,
we need to specify the first d initial values a0 , a1 , . . . , ad−1 to allow us to calculate all of the
terms.
Our goal here is to get closed formulas for sequences that satisfy linear recurrence relations.
Example 6.1.2. When $d = 1$, this is easy to do:
$$a_n = c_1 a_{n-1} = c_1^2 a_{n-2} = c_1^3 a_{n-3} = \cdots = c_1^n a_0. \qquad\square$$
So now we'll focus on the case $d = 2$. So we have a sequence of numbers $a_0, a_1, a_2, \dots$ that satisfies a recurrence relation of the form
$$a_n = c_1 a_{n-1} + c_2 a_{n-2}$$
whenever $n \ge 2$ (here $c_1, c_2$ are some constants and $c_2 \ne 0$). We want to find a closed formula for $a_n$.
The characteristic polynomial of this recurrence relation is defined to be
$$t^2 - c_1 t - c_2.$$
The roots of this polynomial are $\dfrac{c_1 \pm \sqrt{c_1^2 + 4c_2}}{2}$. Call them $r_1$ and $r_2$. (They will be complex numbers if $c_1^2 + 4c_2 < 0$, but everything will still work.) So we can factor the characteristic polynomial as
$$(6.1.3)\qquad t^2 - c_1 t - c_2 = (t - r_1)(t - r_2).$$
Comparing constant terms, we get $r_1 r_2 = -c_2$, so $r_1 \ne 0$ and $r_2 \ne 0$ because we assumed that $c_2 \ne 0$.
Here is the first statement:
Theorem 6.1.4. If $r_1 \ne r_2$, then there are constants $\alpha_1$ and $\alpha_2$ such that
$$a_n = \alpha_1 r_1^n + \alpha_2 r_2^n$$
for all $n$.
To solve for the coefficients, plug in $n = 0$ and $n = 1$ to get
$$a_0 = \alpha_1 + \alpha_2, \qquad a_1 = r_1\alpha_1 + r_2\alpha_2.$$
Then you have to solve for α1 , α2 (a0 , a1 are part of the original sequence, so are given to
you).
Example 6.1.5. Let's finish with the example of the Fibonacci numbers $F_n$. These are defined by
$$F_0 = 1, \qquad F_1 = 1, \qquad F_n = F_{n-1} + F_{n-2} \text{ for } n \ge 2.$$
So the characteristic polynomial is $t^2 - t - 1$. Its roots are $\dfrac{1\pm\sqrt5}{2}$. Set $r_1 = (1+\sqrt5)/2$ and $r_2 = (1-\sqrt5)/2$. So we have
$$F_n = \alpha_1 r_1^n + \alpha_2 r_2^n$$
and we have to solve for $\alpha_1$ and $\alpha_2$. Plug in $n = 0, 1$ to get:
$$1 = \alpha_1 + \alpha_2, \qquad 1 = \alpha_1 r_1 + \alpha_2 r_2.$$
So $\alpha_1 = 1 - \alpha_2$; plug this into the second formula to get $1 = (1-\alpha_2)r_1 + \alpha_2 r_2$. Rewrite this as $1 - r_1 = \alpha_2(r_2 - r_1)$. We can simplify this: $r_2 - r_1 = -\sqrt5$ and $1 - r_1 = (1-\sqrt5)/2$. So
$$\alpha_2 = -\frac{1-\sqrt5}{2\sqrt5}, \qquad \alpha_1 = 1 - \alpha_2 = \frac{1+\sqrt5}{2\sqrt5}.$$
In conclusion:
$$F_n = \frac{1+\sqrt5}{2\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^n - \frac{1-\sqrt5}{2\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^n = \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n+1} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n+1}.$$
(The last step wasn't necessary, we just did that to reduce the number of radical signs.) $\square$
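As a sanity check, here is a short Python sketch (not from the notes) comparing the recurrence with the closed formula; rounding handles the floating-point error from the radicals:

```python
from math import sqrt

def fib_recursive(n):
    a, b = 1, 1   # F_0 = F_1 = 1, matching the convention in these notes
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):
    r1, r2 = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
    return round((r1 ** (n + 1) - r2 ** (n + 1)) / sqrt(5))

for n in range(20):
    assert fib_recursive(n) == fib_closed(n)
print([fib_recursive(n) for n in range(10)])  # 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
```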
Proof of Theorem 6.1.4. Define
$$A(x) = \sum_{n\ge 0} a_n x^n.$$
The recurrence relation says that we have an identity
$$A(x) = a_0 + a_1 x + \sum_{n\ge 2}(c_1 a_{n-1} + c_2 a_{n-2})x^n = a_0 + a_1 x + c_1\sum_{n\ge 2} a_{n-1}x^n + c_2\sum_{n\ge 2} a_{n-2}x^n.$$
Remember the recurrence is only valid for $n \ge 2$, so we have to separate out the first two terms. Now comes an important point: the last two sums are almost the same as $A(x)$ if we re-index them:
$$\sum_{n\ge 2} a_{n-1}x^n = \sum_{n\ge 1} a_n x^{n+1} = x\sum_{n\ge 1} a_n x^n = xA(x) - a_0 x,$$
$$\sum_{n\ge 2} a_{n-2}x^n = \sum_{n\ge 0} a_n x^{n+2} = x^2 A(x).$$
In particular,
$$A(x) = a_0 + a_1 x + c_1 xA(x) - c_1 a_0 x + c_2 x^2 A(x).$$
We can rewrite this as
$$(6.1.6)\qquad A(x) = \frac{a_0 + (a_1 - c_1 a_0)x}{1 - c_1 x - c_2 x^2}.$$
We want to factor the denominator. To do this, plug $t \mapsto x^{-1}$ into (6.1.3) and multiply by $x^2$ to get
$$1 - c_1 x - c_2 x^2 = (1 - r_1 x)(1 - r_2 x).$$
Now we can apply partial fraction decomposition to (6.1.6) to write
$$A(x) = \frac{\alpha_1}{1 - r_1 x} + \frac{\alpha_2}{1 - r_2 x}$$
for some constants $\alpha_1, \alpha_2$. But these terms are both geometric series, so we can further write
$$A(x) = \sum_{n\ge 0}\alpha_1 r_1^n x^n + \sum_{n\ge 0}\alpha_2 r_2^n x^n.$$
The coefficient of $x^n$ on the left side is $a_n$ and the coefficient of $x^n$ on the right side is $\alpha_1 r_1^n + \alpha_2 r_2^n$. So we have equality for all $n$. $\square$
There is a loose end: what if $r_1 = r_2$?

Theorem 6.1.7. If $r_1 = r_2$, then there are constants $\alpha_1$ and $\alpha_2$ such that
$$a_n = \alpha_1 r_1^n + \alpha_2 n r_1^n$$
for all $n$.
Again, to solve for $\alpha_1, \alpha_2$, just plug in $n = 0, 1$ to get a system of equations:
$$a_0 = \alpha_1, \qquad a_1 = \alpha_1 r_1 + \alpha_2 r_1.$$
(From this we could solve the general case, but I think it's easier to remember the way I've written it.)
Proof. We can start in the same way as in the previous proof. The only difference is that we are trying to take the partial fraction decomposition of
$$A(x) = \frac{a_0 + (a_1 - c_1 a_0)x}{(1 - r_1 x)^2}.$$
This can still be done, but now it looks like
$$\frac{\beta_1}{1 - r_1 x} + \frac{\beta_2}{(1 - r_1 x)^2}$$
for some constants $\beta_1, \beta_2$. The first is a geometric series, and the second we've seen: remember that $1/(1-x)^2 = \sum_{n\ge 0}(n+1)x^n$. So we get instead
$$A(x) = \sum_{n\ge 0}\beta_1 r_1^n x^n + \sum_{n\ge 0}\beta_2(n+1)r_1^n x^n.$$
Comparing coefficients, we get
$$a_n = \beta_1 r_1^n + \beta_2(n+1)r_1^n = (\beta_1 + \beta_2)r_1^n + \beta_2 n r_1^n.$$
So α1 = β1 + β2 and α2 = β2 . □
Higher degree recurrence relations
$$a_n = c_1 a_{n-1} + \cdots + c_d a_{n-d}$$
can be solved in the same way: one has to first find the roots of the characteristic polynomial $t^d - c_1 t^{d-1} - c_2 t^{d-2} - \cdots - c_d$ and apply partial fraction decomposition as in the two proofs above. The simplest case is when the roots $r_1, \dots, r_d$ are all distinct. In this case, we can say that there exist constants $\alpha_1, \dots, \alpha_d$ such that
$$a_n = \alpha_1 r_1^n + \cdots + \alpha_d r_d^n$$
for all $n$. In order to solve for $\alpha_1, \dots, \alpha_d$, we have to consider $n = 0, \dots, d-1$ separately to get a system of $d$ linear equations in $d$ variables. When the roots appear with multiplicities, we have to do something like we did in Theorem 6.1.7. For example, if $d = 5$ and the roots are $r_1$ with multiplicity 3 and $r_2$ with multiplicity 2 (and $r_1 \ne r_2$), then we would have
$$a_n = \alpha_1 r_1^n + \alpha_2 n r_1^n + \alpha_3 n^2 r_1^n + \alpha_4 r_2^n + \alpha_5 n r_2^n.$$
This should look familiar to you if you’ve ever solved a linear homogeneous differential
equation with constant coefficients.
I’ll leave it to you to formulate the general case.
Example 6.1.8. We've only been dealing with homogeneous linear recurrence relations so far, i.e., $a_n$ is expressed as a linear combination of previous terms, but how about the inhomogeneous case? For example, consider the recurrence relation
$$a_n = a_{n-1} + a_{n-2} + 2 \qquad (n \ge 2).$$
When we don't know what to do, we can always try to find a formula for the generating function. In this case, setting $A(x) = \sum_{n\ge 0} a_n x^n$, we have
$$A(x) = a_0 + a_1 x + \sum_{n\ge 2} a_n x^n = a_0 + a_1 x + \sum_{n\ge 2}(a_{n-1} + a_{n-2} + 2)x^n = a_0 + a_1 x + x(A(x) - a_0) + x^2 A(x) + \frac{2x^2}{1-x}$$
and then we can solve for A(x) as before (I’ll stop here). Sometimes, there are shortcuts we
can use to turn these into homogeneous linear recurrence relations (though of higher degree).
For example, if n ≥ 3, then we know that an = an−1 + an−2 + 2 and an−1 = an−2 + an−3 + 2,
so taking the difference gives
an = 2an−1 − an−3
which is now order 3, but homogeneous. We originally had 2 initial values a0 and a1 , so we
should remember that a2 can be determined by using the original equation a2 = a1 + a0 + 2.
This works out for a lot of different kinds of inhomogeneous situations, but instead of
taking a difference, we may have to take other linear combinations (for example, instead of
a constant 2, we might have 2n ) and repeating the process can be helpful too (for example,
instead of a constant 2, we might have 2n) as well as combining these ideas (for example,
n2n ). □
Remark 6.1.9. Finally, let me explain one thing to make the inhomogeneous case a little easier.
Start with a homogeneous recurrence relation
$$a_n = c_1 a_{n-1} + \cdots + c_d a_{n-d}.$$
Its characteristic polynomial is $t^d - c_1 t^{d-1} - \cdots - c_d$. Given a constant $r$, it will be useful to know that the characteristic polynomial of the difference
$$\begin{aligned}
a_n &= c_1 a_{n-1} + \cdots + c_d a_{n-d}\\
-\,r\,(\,a_{n-1} &= c_1 a_{n-2} + \cdots + c_d a_{n-d-1}\,)\\[2pt]
a_n - ra_{n-1} &= c_1(a_{n-1} - ra_{n-2}) + \cdots + c_d(a_{n-d} - ra_{n-d-1})
\end{aligned}$$
is $(t-r)(t^d - c_1 t^{d-1} - \cdots - c_d)$, i.e., we pick up a factor of $t - r$.
I'll just give an example of how you can use this. Say you had the inhomogeneous equation
$$a_n = c_1 a_{n-1} + c_2 a_{n-2} + 2^n.$$
I'd want to take $r = 2$ above to get the difference
$$a_n - 2a_{n-1} = c_1(a_{n-1} - 2a_{n-2}) + c_2(a_{n-2} - 2a_{n-3}).$$
This is now homogeneous and its characteristic polynomial is $(t-2)(t^2 - c_1 t - c_2)$ (the second factor comes from ignoring the inhomogeneous part of the original equation). $\square$
6.2. Integer partitions. Now we deal with the notion of an “unordered composition”.
These are much harder to study than compositions, which is why we’ve postponed it until
now.
Definition 6.2.1. Let n be a positive integer. A partition λ of n is an unordered collection
of positive integers a1 , . . . , ak such that a1 + · · · + ak = n. The ai are the parts of λ. These
are also called integer partitions to distinguish from set partitions. The number k is called
the length of the partition, and denoted ℓ(λ). We also say that it’s the number of parts of
λ. Then n is the size of the partition, and we denote this by |λ| = n.
The number of partitions of n is denoted p(n), the number of partitions of n with k parts
is denoted pk (n), and the number of partitions of n with at most k parts is denoted p≤k (n).
By convention, there is exactly one partition of n = 0, and it has length 0; we denote it
by the empty set ∅. □
In other words, 2, 3 represents the same partition as 3, 2 since we do not distinguish
between different orderings.
Definition 6.2.2. We can always write the numbers in decreasing order, and we call that
the normal form of the partition. This gives an unambiguous way to write each partition,
and we’ll denote it with tuple notation. □
In the previous example, the normal form for this partition is (3, 2). We will usually always
write partitions in normal form.
Example 6.2.3. p(5) = 7 since there are 7 partitions of 5:
(5), (4, 1), (3, 2), (3, 1, 1), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1). □
Definition 6.2.4. There's another convenient way to describe partitions. Given a partition $\lambda$ and a positive integer $k$, let $m_k(\lambda)$ be the number of times that $k$ appears in $\lambda$. This is the multiplicity of $k$ in $\lambda$. If we know all of the multiplicities, then we also know the partition, so they can also be used to describe $\lambda$. $\square$
Note that $|\lambda| = \sum_k m_k(\lambda)\,k$. The sum is always finite since $m_k$ only takes nonzero values for a finite number of $k$.
Example 6.2.5. For λ = (4, 2, 2, 1), we have m4 (λ) = 1, m2 (λ) = 2, m1 (λ) = 1, and
mk (λ) = 0 for all other k. The above fact just says that |λ| = 4 + 2 · 2 + 1. □
We can visualize partitions using Young diagrams. To illustrate, the Young diagram of
(4, 2, 1) is
$Y(\lambda) = $ [Young diagram with rows of 4, 2, and 1 boxes]
In general, it is a left-justified collection of boxes with λi boxes in the ith row (counting from
top to bottom).
The transpose (or conjugate) of a partition λ is the partition whose Young diagram is
obtained by flipping Y (λ) across the main diagonal. For example, the transpose of (4, 2, 1)
is (3, 2, 1, 1). [Young diagrams omitted.]
Note that we get the parts of a partition from a Young diagram by reading off the row lengths. The transpose is obtained by instead reading off the column lengths. The notation is $\lambda^T$. If we want a formula: $\lambda^T_i = |\{j \mid \lambda_j \ge i\}|$.
Note that $(\lambda^T)^T = \lambda$. A partition $\lambda$ is self-conjugate if $\lambda = \lambda^T$.
Example 6.2.6. Some self-conjugate partitions: (4, 3, 2, 1), (5, 1, 1, 1, 1), (4, 2, 1, 1). [Young diagrams omitted.]
Theorem 6.2.7. The number of partitions λ of n with ℓ(λ) ≤ k is the same as the number
of partitions µ of n such that all µi ≤ k.
Proof. We get a bijection between the two sets by taking transpose. Details omitted. □
This tells us that p≤k (n), which is defined to be the number of partitions of n with at
most k parts, is also the number of partitions of n using only the parts 1, . . . , k. We’ll use
this second interpretation now.
We want a simple expression for $\sum_{n\ge 0} p_{\le k}(n)x^n$. When $k = 1$, we get $p_{\le 1}(n) = 1$ for all $n$, so
$$\sum_{n\ge 0} p_{\le 1}(n)x^n = \frac{1}{1-x}.$$
Now consider $k = 2$. We can think of partitions in terms of how many 1's they use and how many 2's they use, i.e., in terms of their multiplicities $(m_1(\lambda), m_2(\lambda))$ (there is no $m_i(\lambda)$ for $i \ge 3$). Then consider the product
$$(1 + x + x^2 + x^3 + \cdots)(1 + x^2 + (x^2)^2 + (x^2)^3 + \cdots).$$
When we multiply this out, each term is of the form $x^a(x^2)^b = x^{a+2b}$, so we see that the total coefficient of $x^n$ is exactly the number of ways of writing $n$ as a sum of 1's and 2's, since this specific term can be thought of as the partition $\lambda$ with $m_1(\lambda) = a$ and $m_2(\lambda) = b$. Both sums are geometric series, so we have
$$\sum_{n\ge 0} p_{\le 2}(n)x^n = \frac{1}{(1-x)(1-x^2)}.$$
This same reasoning extends to any $k$, and we can prove that
$$\sum_{n\ge 0} p_{\le k}(n)x^n = \prod_{i=1}^k\frac{1}{1-x^i} = \frac{1}{(1-x)(1-x^2)\cdots(1-x^k)}.$$
We can actually take $k \to \infty$ to guess the formula (due to Euler)
$$\sum_{n\ge 0} p(n)x^n = \prod_{i\ge 1}\frac{1}{1-x^i}.$$
Why is this correct? First, we specify that the meaning of an infinite product of terms of the form $1 + \cdots$ is to multiply out choices where something with a positive power of $x$ is only chosen a finite number of times (so that each term has finite degree and we're otherwise multiplying 1 infinitely many times).
Consider the coefficient of $x^d$ in the infinite product on the right. We have to consider the infinite product
$$(1 + x + x^2 + \cdots)(1 + x^2 + x^4 + \cdots)(1 + x^3 + x^6 + \cdots)\cdots$$
and the only way to get $x^d$ is to choose 1 from $(1 + x^i + x^{2i} + \cdots)$ if $i > d$, so the coefficient of $x^d$ is the same as the coefficient of $x^d$ in $\prod_{i=1}^d\frac{1}{1-x^i} = \sum_{n\ge 0} p_{\le d}(n)x^n$. Since $p_{\le d}(d) = p(d)$, the infinite product indeed has the right coefficients.
More generally, the same argument proves the following:
Theorem 6.2.8. For any subset $S$ of the positive integers, the generating function for the number of partitions that only use parts from $S$ is
$$\prod_{i\in S}\frac{1}{1-x^i}.$$
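These products are also convenient computationally. Here is a short Python sketch (not from the notes): multiplying by $1/(1-x^i)$ amounts to the update `coeffs[n] += coeffs[n - i]`, so we can tabulate the coefficients up to any fixed degree:

```python
def partition_counts(N, parts):
    """Coefficients up to degree N of prod_{i in parts} 1/(1 - x^i):
    the number of partitions of n using only parts from `parts`."""
    coeffs = [1] + [0] * N              # start with the series 1
    for i in parts:
        # multiply by 1/(1 - x^i), i.e. allow the part i any number of times
        for n in range(i, N + 1):
            coeffs[n] += coeffs[n - i]
    return coeffs

N = 10
print(partition_counts(N, range(1, N + 1)))  # p(0), ..., p(10): 1,1,2,3,5,7,11,15,22,30,42
print(partition_counts(N, [1, 2]))           # p_{<=2}(n), using only the parts 1 and 2
```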
The above formula lets us restrict which parts are allowed, but does not impose restrictions
on how many times each part can be used. We can actually restrict both. It’s probably easiest
to first examine this with examples. The general case follows the same idea but requires a
lot of notation to state, so I won’t attempt to do so.
Example 6.2.9. Let $a_n$ be the number of integer partitions of $n$ that only use the parts 1, 2, and 3, with the additional constraint that 3 can be used at most 2 times. Examining how we understood the products above, the generating function for $a_n$ is the following product:
$$(1 + x + x^2 + x^3 + \cdots)(1 + x^2 + (x^2)^2 + (x^2)^3 + \cdots)(1 + x^3 + (x^3)^2) = \frac{1 + x^3 + x^6}{(1-x)(1-x^2)}.$$
As before, the first factor corresponds to how many times the part 1 is used, the second factor corresponds to how many times 2 gets used, and the third factor corresponds to how many times 3 gets used. Explicitly, if we multiply this out, then each term looks like $x^a(x^2)^b(x^3)^c$ where $a, b$ are not constrained, but $0 \le c \le 2$. This counts the partition where 1 appears $a$ times, 2 appears $b$ times, and 3 appears $c$ times. $\square$
Example 6.2.10. Let bn be the number of integer partitions of n that only use the parts 3
and 4, but the number of times that 4 appears has to be odd. Then its generating function
is the following product:
$$(1 + x^3 + (x^3)^2 + (x^3)^3 + \cdots)\,((x^4) + (x^4)^3 + (x^4)^5 + \cdots) = \frac{x^4}{(1-x^3)(1-x^8)}. \qquad\square$$
Now let’s take this idea and prove an interesting identity due to Euler.
Let podd (n) be the number of partitions of n such that all parts are odd. Let pdist (n) be
the number of partitions of n such that all parts are distinct.
Theorem 6.2.11 (Euler). podd (n) = pdist (n).
For example, when n = 5, both quantities are 3 since we have (5), (3, 1, 1), (1, 1, 1, 1, 1) for
podd (5) and (5), (4, 1), (3, 2) for pdist (5).
Proof. There are ways to build bijections, but let’s prove this by showing that they have the
same generating function since the idea is a little surprising and could even be considered
fun.
By Theorem 6.2.8, we have
$$\sum_{n\ge 0} p_{\mathrm{odd}}(n)x^n = \prod_{i\ge 0}\frac{1}{1-x^{2i+1}} = \frac{1}{(1-x)(1-x^3)(1-x^5)(1-x^7)\cdots}.$$
How about for $p_{\mathrm{dist}}(n)$? I claim that
$$\sum_{n\ge 0} p_{\mathrm{dist}}(n)x^n = \prod_{i\ge 1}(1 + x^i) = (1+x)(1+x^2)(1+x^3)(1+x^4)\cdots.$$
To multiply out the right side, we either choose 1 or $x^i$ from the $i$th term, and we can only avoid choosing 1 finitely many times. What we get then is $x^N$ where $N$ is the sum of the $i$ where we chose $x^i$. But we get $x^N$ one time for every partition of $N$ into distinct parts, so the coefficient is $p_{\mathrm{dist}}(N)$.
Now we observe that $1 + x^i = \frac{1-x^{2i}}{1-x^i}$, so we can rewrite it as
$$\sum_{n\ge 0} p_{\mathrm{dist}}(n)x^n = \frac{1-x^2}{1-x}\cdot\frac{1-x^4}{1-x^2}\cdot\frac{1-x^6}{1-x^3}\cdot\frac{1-x^8}{1-x^4}\cdot\frac{1-x^{10}}{1-x^5}\cdots$$
We can start canceling: each $1-x^{2i}$ on the top cancels with the corresponding $1-x^{2i}$ on the bottom. What we're left with is $\prod_{i\ge 0}\frac{1}{1-x^{2i+1}} = \sum_{n\ge 0} p_{\mathrm{odd}}(n)x^n$. $\square$
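If you find the cancellation suspicious, it is easy to test the identity for small $n$ by brute force. Here is a small Python sketch (not from the notes) that generates all partitions and compares the two counts:

```python
def partitions(n, max_part=None):
    """Generate all partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

for n in range(1, 13):
    odd = sum(1 for p in partitions(n) if all(part % 2 == 1 for part in p))
    dist = sum(1 for p in partitions(n) if len(set(p)) == len(p))
    assert odd == dist
    print(n, odd)
```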

Finally, let’s take a step back and try to understand the significance of these product
formulas that we’re getting.

Example 6.2.12. As before, let p≤2 (n) be the number of integer partitions of n with at
most 2 parts. To simplify notation, set an = p≤2 (n). We showed before that
X 1
an x n = .
n≥0
(1 − x)(1 − x2 )

What if we multiply both sides by (1 − x)(1 − P x2 ) = 1 − x − x2 + x3 to clear denominators?


Let’s write the left side as 4 lines (each line is n≥0 an xn multiplied by one of the 4 terms):

a0 + a1 x + a2 x 2 + a3 x 3 + a4 x 4 + · · ·
−a0 x − a1 x2 − a2 x3 − a3 x4 − · · ·
−a0 x2 − a1 x3 − a2 x4 − · · ·
+a0 x3 + a1 x4 + · · ·

The right side is just 1, so by equating coefficients, we conclude that a0 = 1, a1 = a0 ,


a2 = a1 + a0 , and for all n ≥ 3, we have an = an−1 + an−2 − an−3 so that p≤2 (n) satisfies a
homogeneous linear recurrence relation of order 3. We can make this slightly more uniform
if we adopt the convention that an = 0 whenever n is negative: then we can simply say that
an = an−1 + an−2 − an−3 for all n ≥ 1.
By a similar derivation, p≤3 (n) satisfies a homogeneous linear recurrence relation of order
6, and in general p≤k (n) satisfies a homogeneous linear recurrence relation of order 1 + 2 +
· · · + k = k(k + 1)/2. □
Example 6.2.13. We can actually go one step further with the previous example and consider the sequence $a_n = p(n)$. In this case, we showed that
$$\sum_{n\ge 0} a_n x^n = \frac{1}{(1-x)(1-x^2)(1-x^3)(1-x^4)\cdots} = \prod_{i\ge 1}\frac{1}{1-x^i}.$$
We can again clear denominators, but we get an infinite number of lines this time since
$$\prod_{i\ge 1}(1 - x^i) = 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + x^{22} + x^{26} - \cdots.$$
Nonetheless, this gives us an interesting recursive formula for p(n) if we follow the same
argument. As before, let’s adopt the convention that p(n) = 0 if n is negative. Then by
following the same reasoning from before, we see that for all n ≥ 1, we have the recursion:
p(n) = p(n − 1) + p(n − 2) − p(n − 5) − p(n − 7) + p(n − 12) + p(n − 15) − · · · .
A few comments are in order. First, this is, roughly speaking, a homogeneous linear recur-
rence relation of “order ∞”. However, for any given input n, only finitely many terms on the
right side actually contribute, so we do get a well-defined formula. For example, for n = 10,
we have
p(10) = p(9) + p(8) − p(5) − p(3).
Second, what is the actual pattern here? When we expand $\prod_{i\ge 1}(1-x^i)$, it looks like all of the nonzero coefficients are $\pm 1$ (this is true in general), but what are the exponents that appear?
It turns out that they are the numbers of the form k(3k ± 1)/2 where k is a non-negative
integer (pentagonal numbers). I won’t go into any more detail, but if you’re interested, the
relevant result here is “Euler’s pentagonal number theorem”. □
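The "order $\infty$" recursion is still perfectly usable on a computer, since only the pentagonal numbers up to $n$ contribute. Here is a short Python sketch (not from the notes) computing $p(n)$ this way:

```python
def partition_numbers(N):
    """Compute p(0), ..., p(N) using Euler's pentagonal number recursion."""
    p = [0] * (N + 1)
    p[0] = 1
    for n in range(1, N + 1):
        total, k = 0, 1
        while True:
            pent1 = k * (3 * k - 1) // 2
            pent2 = k * (3 * k + 1) // 2
            if pent1 > n and pent2 > n:
                break
            sign = 1 if k % 2 == 1 else -1   # signs come in pairs: +, +, -, -, ...
            if pent1 <= n:
                total += sign * p[n - pent1]
            if pent2 <= n:
                total += sign * p[n - pent2]
            k += 1
        p[n] = total
    return p

print(partition_numbers(10))  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```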
Example 6.2.14. We’ll end our discussion on integer partitions with a more interesting
bijection (I don’t know if there’s any easy trick to prove it using power series). We want to
show that “self-conjugate partitions of n” are in bijection with “partitions of n using only
distinct odd parts”. We’ll just do this informally and I’ll leave formulating it in a more
rigorous way to you.
Given a self-conjugate partition, take all of the boxes in the first row and column of its
Young diagram. Since it’s self-conjugate, there are an odd number of boxes. Use this as the
first part of a new partition. Now remove those boxes and repeat. For example:
[Young diagram illustrations omitted.]
The pictures suggest how to reverse the procedure. □
6.3. Catalan numbers. The Catalan numbers are denoted Cn and have a lot of different
interpretations. One of them is the number of ways to arrange n pairs of left and right
parentheses so that they are balanced: meaning that every ) pairs off with some ( that
comes before it. More formally, a word consisting of parentheses is balanced, if for every
initial segment, the number of ( is always greater than or equal to the number of ). Our
convention is that C0 = 1.
Example 6.3.1. For n = 3, there are 5 ways to balance 3 pairs of parentheses:
()()(), (())(), ((())), (()()), ()(()). $\square$
Some other interpretations will be given on homework. For now, we’ll see how we can use
generating functions to obtain a formula for Cn . First,
Theorem 6.3.2. For all positive integers $n$, we have
$$C_n = \sum_{i=0}^{n-1} C_i C_{n-i-1}.$$
Proof. Every set of balanced parentheses must begin with (. Consider the ) which pairs with
it. In between the two of them is another set of balanced parentheses (possibly empty) and
to the right of them is another set of balanced parentheses (again, possibly empty). So the
set on the inside consists of i pairs, where 0 ≤ i ≤ n − 1, while the set on the right consists
of n − 1 − i pairs. These sets can be chosen independently, so there are Ci Cn−i−1 ways for
this to happen. Since the cases with different i don’t overlap, we sum over all possibilities
to get the identity above. □
Define
$$C(x) = \sum_{n\ge 0} C_n x^n.$$
Corollary 6.3.3. We have
$$C(x) = 1 + xC(x)^2.$$
Proof. Note that $\sum_{i=0}^{n-1} C_i C_{n-i-1}$ is the coefficient of $x^{n-1}$ in $C(x)^2$. So we have
$$C(x) = 1 + \sum_{n\ge 1} C_n x^n = 1 + \sum_{n\ge 1}\left(\sum_{i=0}^{n-1} C_i C_{n-i-1}\right)x^n = 1 + x\sum_{n\ge 1}\left(\sum_{i=0}^{n-1} C_i C_{n-i-1}\right)x^{n-1} = 1 + xC(x)^2. \qquad\square$$
This means that $C(x)$ is a solution of the quadratic equation $xt^2 - t + 1 = 0$. Using the quadratic formula, we deduce that $C(x)$ is one of the solutions
$$\frac{1 \pm \sqrt{1-4x}}{2x}.$$
Note that $x$ isn't invertible as a power series, so we have to be careful here. Since $C(x)$ is a power series, it must be that $x$ divides the numerator, i.e., the numerator cannot have a constant term. Which choice of sign is correct? The constant term of $\sqrt{1-4x}$ is $\binom{1/2}{0} = 1$, so the correct choice is a negative sign, and so
$$C(x) = \frac{1 - \sqrt{1-4x}}{2x}.$$
Theorem 6.3.4. $C_n = \dfrac{1}{n+1}\dbinom{2n}{n}$.
Proof. We will use the binomial theorem. First, we have
$$(1-4x)^{1/2} = \sum_{n\ge 0}\binom{1/2}{n}(-4x)^n.$$
Let's simplify the coefficients (assuming $n > 0$):
$$(-1)^n 4^n\binom{1/2}{n} = (-1)^n 4^n\,\frac{\frac12\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}}{n!} = -2^n\,\frac{(2n-3)!!}{n!}.$$
Note that $(2n-3)!!(2n-2)!! = (2n-2)!$, so we can multiply top and bottom by $(2n-2)!!$ to get
$$-2^n\,\frac{(2n-2)!}{n!(2n-2)!!} = -2\,\frac{(2n-2)!}{n!(n-1)!} = -\frac{2}{n}\binom{2n-2}{n-1}.$$
Since $\binom{1/2}{0} = 1$, we can simplify:
$$C(x) = \frac{1-\sqrt{1-4x}}{2x} = \frac{\sum_{n\ge 1}\frac{2}{n}\binom{2n-2}{n-1}x^n}{2x} = \sum_{n\ge 1}\frac{1}{n}\binom{2n-2}{n-1}x^{n-1} = \sum_{n\ge 0}\frac{1}{n+1}\binom{2n}{n}x^n. \qquad\square$$
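Here is a short Python sketch (not from the notes) that computes the Catalan numbers from the recursion in Theorem 6.3.2 and checks them against the closed formula:

```python
from math import comb

def catalan_by_recursion(N):
    C = [1]  # C_0 = 1
    for n in range(1, N + 1):
        C.append(sum(C[i] * C[n - i - 1] for i in range(n)))
    return C

N = 10
rec = catalan_by_recursion(N)
closed = [comb(2 * n, n) // (n + 1) for n in range(N + 1)]
assert rec == closed
print(rec)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```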
Here are a few other things that are counted by the Catalan numbers together with the 5
instances for n = 3:
• The number of ways to apply a binary operation ∗ to n + 1 elements:
a ∗ (b ∗ (c ∗ d)), a ∗ ((b ∗ c) ∗ d), (a ∗ b) ∗ (c ∗ d), ((a ∗ b) ∗ c) ∗ d, (a ∗ (b ∗ c)) ∗ d.
• The number of rooted binary trees with n + 1 leaves:
[Tree diagrams omitted.]
• The number of paths from (0, 0) to (n, n) which never go above the diagonal x = y
and are made up of steps either moving in the direction (0, 1) or (1, 0). For n = 3:
[Lattice path diagrams omitted.]
It turns out that the Catalan recursion shows up a lot. There are more than 200 other
known interpretations for the Catalan numbers.
7. Exponential generating functions

7.1. Products of exponential generating functions. Let $a_0, a_1, a_2, \dots$ be a sequence of numbers. The associated exponential generating function (EGF) is the formal power series
$$A(x) = \sum_{n\ge 0} a_n\frac{x^n}{n!},$$
where recall that $n! = n(n-1)(n-2)\cdots 2\cdot 1$ and $0! = 1$. When $a_n = 1$ for all $n$, we use the notation
$$e^x = \exp(x) = \sum_{n\ge 0}\frac{x^n}{n!}.$$
You should just think of this as a renormalization of ordinary generating functions. When written in the exponential format, the coefficients of a product take on a slightly different form which is very convenient for certain kinds of counting problems:
Lemma 7.1.1. If $A(x) = \sum_{n\ge 0} a_n\frac{x^n}{n!}$ and $B(x) = \sum_{n\ge 0} b_n\frac{x^n}{n!}$, then $A(x)B(x) = \sum_{n\ge 0} c_n\frac{x^n}{n!}$ where $c_n = \sum_{i=0}^n\binom{n}{i}a_i b_{n-i}$.
Proof. The coefficient of $x^n$ in $A(x)B(x)$ is $\sum_{i=0}^n\frac{a_i}{i!}\frac{b_{n-i}}{(n-i)!}$. By definition it is also $c_n/n!$, so $c_n = \sum_{i=0}^n\binom{n}{i}a_i b_{n-i}$. $\square$
To apply this to counting problems, it will be convenient to think of the coefficients of
an EGF as counting the number of structures on a set. Formally, a structure is a function
α that takes as input a finite set S (including S = ∅) and outputs another finite set α(S),
with the key property that if |S| = |T |, then |α(S)| = |α(T )|. We’ve been studying many of
these already.
Example 7.1.2. Here are some examples of structures.
• α(S) is the set of 2-element subsets of S.
• α(S) is the set of set partitions of S.
• α(S) is the set of bijections from S to itself. □
We will call elements of $\alpha(S)$ structures of type $\alpha$, and the associated exponential generating function is
$$E_\alpha(x) = \sum_{n\ge 0}|\alpha([n])|\frac{x^n}{n!}.$$
Let $\alpha, \beta$ be structures. We can add and multiply structures:
$$(\alpha+\beta)(S) = \alpha(S)\amalg\beta(S), \qquad (\alpha\cdot\beta)(S) = \coprod_{T\subseteq S}\alpha(T)\times\beta(S\setminus T).$$
The sum is just taking disjoint union. The product requires more explanation: we are taking
the disjoint union over all subsets T in S, picking an α-structure on T and a β-structure on
its complement. We’ll see in examples why this is a sensible thing to do, but first, we show
that these operations behave well with respect to EGFs:
Theorem 7.1.3. We have
Eα+β (x) = Eα (x) + Eβ (x), Eα·β (x) = Eα (x)Eβ (x).
Proof. For the sum, we have |(α + β)([n])| = |α([n])| + |β([n])| since we’re taking a disjoint
union.
For the product, we have
$$|(\alpha\cdot\beta)([n])| = \sum_{T\subseteq[n]}|\alpha(T)|\cdot|\beta([n]\setminus T)|.$$
Since the size of $\alpha(T)$ only depends on $|T|$ and similarly for $\beta([n]\setminus T)$, we can just sum over possible sizes of $T$:
$$\sum_{i=0}^n\binom{n}{i}|\alpha([i])|\cdot|\beta([n-i])|,$$
which is the coefficient of $x^n/n!$ in $E_\alpha(x)E_\beta(x)$ by Lemma 7.1.1. $\square$
Example 7.1.4. Consider a set of $n$ football players. We want to split them up into two groups. Both groups need to be assigned an ordering and the second group additionally needs to choose one of 3 colors for their uniform. Let $c_n$ be the number of ways to do this. This scenario calls for a product of structures:
• Let $\alpha(S)$ be the set of orderings of $S$, so $|\alpha(S)| = |S|!$. We have
$$E_\alpha(x) = \sum_{n\ge 0} n!\frac{x^n}{n!} = \frac{1}{1-x}.$$
• Let $\beta(S)$ be the set of pairs $(\sigma, f)$ where $\sigma$ is an ordering of $S$ and $f\colon S\to[3]$ is an assignment of the 3 colors to each element. So $|\beta(S)| = |S|!\,3^{|S|}$. We have
$$E_\beta(x) = \sum_{n\ge 0} n!\,3^n\frac{x^n}{n!} = \frac{1}{1-3x}.$$
Then $(\alpha\cdot\beta)([n])$ is the set of things we're asking about (I glossed over it, but it's important that the definitions above make sense and give the correct thing when $S = \emptyset$, otherwise our product interpretation will be incorrect when $T = \emptyset$, for example), so its EGF is
$$E_{\alpha\cdot\beta}(x) = \frac{1}{(1-x)(1-3x)}.$$
In particular,
$$c_n/n! = [x^n]\frac{1}{(1-x)(1-3x)} = [x^n]\left(\frac{3/2}{1-3x} - \frac{1/2}{1-x}\right) = \frac{3}{2}\cdot 3^n - \frac{1}{2},$$
and hence
$$c_n = n!\left(\frac{3}{2}\cdot 3^n - \frac{1}{2}\right) = \frac{n!}{2}(3^{n+1} - 1). \qquad\square$$
Before continuing, I want to point out a useful identity.
Proposition 7.1.5. Let A(x) and B(x) be formal power series with no constant term. Then
exp(A(x)) exp(B(x)) = exp(A(x) + B(x)).
Proof. To check this, let's expand the left side:
$$\left(\sum_{n\ge 0}\frac{A(x)^n}{n!}\right)\left(\sum_{n\ge 0}\frac{B(x)^n}{n!}\right) = \sum_{n\ge 0}\left(\sum_{i=0}^n\frac{A(x)^i}{i!}\cdot\frac{B(x)^{n-i}}{(n-i)!}\right).$$
Now the right side (using the usual binomial theorem):
$$\sum_{n\ge 0}\frac{(A(x)+B(x))^n}{n!} = \sum_{n\ge 0}\frac{1}{n!}\sum_{i=0}^n\binom{n}{i}A(x)^i B(x)^{n-i}.$$
This is the same as the first one as soon as we cancel out the $n!$ from $\binom{n}{i}$. $\square$
Example 7.1.6. We have n distinguishable telephone poles which are to be painted either
red or blue. The number which are blue must be even. Let cn be the number of ways to do
this.
Again we want to interpret this as counting the product of two structures (we’ll think of
the elements of sets as telephone poles):
• Let α(S) be the set of ways to paint the poles red according to our rules, so |α(S)| = 1
for all S (even S = ∅) and Eα (x) = ex .
• Let β(S) be the set of ways to paint the poles blue according to our rules, so |β(S)| = 1
if |S| is even and |β(S)| = 0 if |S| is odd. Hence
$$E_\beta(x) = \sum_{n\ge 0}\frac{x^{2n}}{(2n)!}.$$
Here we are deleting all of the odd powers of $x$ from $e^x$. To get a nice expression, note that this is the same as $(e^x + e^{-x})/2$. (How about if we wanted to delete the even terms instead?)
Hence we get:
$$E_{\alpha\cdot\beta}(x) = \frac12 e^x(e^x + e^{-x}) = \frac12(e^{2x} + 1) = \frac12\sum_{n\ge 0}\frac{2^n x^n}{n!} + \frac12.$$
So $c_n = 2^{n-1}$ if $n > 0$ and $c_0 = 1$.
Actually we could have derived this formula using earlier stuff: we're just trying to pick a subset of even size to be painted blue. We know that half of the subsets of $[n]$ have even size and half have odd size, so we can also see $2^{n-1}$. However, the approach given here generalizes more easily if we introduce more colors, for example. $\square$
We can multiply more than 2 structures at once. By iterating the case of 2 structures, we come to the following definition and result. Let $\alpha_1, \dots, \alpha_k$ be structures. Then their product is
$$(\alpha_1\cdots\alpha_k)(S) = \coprod_{\substack{(T_1,\dots,T_k)\\ T_1\cup\cdots\cup T_k = S\\ T_i\cap T_j = \emptyset \text{ for } i\ne j}}\alpha_1(T_1)\times\cdots\times\alpha_k(T_k)$$
where the disjoint union is over all ways to write $S$ as a disjoint union of $k$ subsets $T_1, \dots, T_k$ (order of the $T_i$ matters). This is almost like an ordered set partition, except that the $T_i$ are allowed to be empty. Then
$$E_{\alpha_1\cdots\alpha_k}(x) = E_{\alpha_1}(x)\cdots E_{\alpha_k}(x).$$
Example 7.1.7. Continuing from the previous example, suppose we can also color some telephone poles green and there are no restrictions on how many are green. This introduces a third structure: let $\gamma(S)$ be the ways to paint the poles green, so $|\gamma(S)| = 1$ for all $S$. Our new EGF is
$$E_{\alpha\cdot\beta\cdot\gamma}(x) = \frac12 e^x(e^x + e^{-x})e^x = \frac12(e^{3x} + e^x) = \frac12\left(\sum_{n\ge 0}\frac{(3x)^n}{n!} + \sum_{n\ge 0}\frac{x^n}{n!}\right),$$
so the answer we want is $\frac12(3^n + 1)$. $\square$
Example 7.1.8. Consider the following structure:
$$\alpha(S) = \begin{cases}\{*\} & \text{if } |S| > 0\\ \emptyset & \text{if } |S| = 0\end{cases}.$$
We can think of this as a selection structure which picks out nonempty subsets (said another way, filters out the empty set) and $E_\alpha(x) = e^x - 1$. In particular, $|(\alpha\cdot\alpha)(S)|$ is the number of nonempty subsets $T\subseteq S$ such that $S\setminus T$ is also nonempty. In other words, we can think of an element of $(\alpha\cdot\alpha)(S)$ as an ordered set partition of $S$ with 2 blocks. More generally, $\alpha^k(S)$ is the set of ordered set partitions of $S$ with $k$ blocks and so $|\alpha^k([n])| = k!\,S(n,k)$ (recall that $S(n,k)$ is the Stirling number). Hence
$$\sum_{n\ge 0} k!\,S(n,k)\frac{x^n}{n!} = E_{\alpha^k}(x) = E_\alpha(x)^k = (e^x - 1)^k,$$
and also
$$\sum_{n\ge 0} S(n,k)\frac{x^n}{n!} = \frac{(e^x - 1)^k}{k!}.$$
By modifying the definition of $\alpha$ we can get formulas for set partitions with different conditions on the sizes of the blocks (or even using $k$ different modifications). $\square$
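We can extract the Stirling numbers numerically from this EGF. Here is a short Python sketch (not from the notes) that expands $(e^x - 1)^k/k!$ as a truncated power series and reads off $S(n, k) = n!\,[x^n]$:

```python
from fractions import Fraction
from math import factorial

def series_power(base, k, N):
    """Coefficients up to degree N of base(x)^k, where base is a coefficient list."""
    result = [Fraction(1)] + [Fraction(0)] * N
    for _ in range(k):
        new = [Fraction(0)] * (N + 1)
        for i in range(N + 1):
            for j in range(N + 1 - i):
                new[i + j] += result[i] * base[j]
        result = new
    return result

N, k = 8, 3
base = [Fraction(0)] + [Fraction(1, factorial(n)) for n in range(1, N + 1)]  # e^x - 1
coeffs = series_power(base, k, N)
stirling = [int(coeffs[n] * factorial(n) / factorial(k)) for n in range(N + 1)]
print(stirling)  # S(n, 3) for n = 0..8: [0, 0, 0, 1, 6, 25, 90, 301, 966]
```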
7.2. Compositions of exponential generating functions. Now we consider a structure
α such that α(∅) = ∅. For a finite nonempty set S, let ΠS be the set of set partitions of S.
Given a set partition π ∈ ΠS , define α(π) to be the set of ways to put a structure of type α
on each block of π. In particular, if the blocks of π are b1 , . . . , bk , then
|α(π)| = |α(b1 )| · |α(b2 )| · · · |α(bk )|.
We define $e^\alpha$ to be the following structure:
$$e^\alpha(S) = \coprod_{\pi\in\Pi_S}\alpha(\pi).$$
In other words, we consider all possible ways to partition S into nonempty subsets and put
the α structure on each block. Finally, we make the convention that |eα (∅)| = 1. We’ll see
some examples soon, but first let’s establish some basic properties.
Theorem 7.2.1 (Exponential formula). We have
$$E_{e^\alpha}(x) = \exp(E_\alpha(x)).$$
Proof. Since $|\alpha(\emptyset)| = 0$, we have $[x^n]E_\alpha(x)^k = 0$ if $k > n$. So
$$[x^n]\exp(E_\alpha(x)) = [x^n]\sum_{k\ge 0}\frac{E_\alpha(x)^k}{k!} = [x^n]\sum_{k=0}^n\frac{E_\alpha(x)^k}{k!}.$$
From our discussion on products of EGFs, for n > 0, [xn ]Eα (x)k is the number of ways to
pick an ordered set partition of [n] into k blocks and put structures of type α on each block
(note that the property α(∅) = ∅ disallows picking empty blocks); if we divide by k! we just
remove the ordering. Hence the coefficient of xn above is exactly the size of eα ([n]). Finally,
the case n = 0 is ok by our convention that |eα (∅)| = 1. □
One nice thing about this form of EGF is that we can employ the following identity, which
will allow us to get recursive formulas for hn , as we’ll see in some examples.
Proposition 7.2.2. If $H(x) = e^{A(x)}$, then
$$H'(x) = H(x)A'(x).$$
Proof. This follows from taking the derivative of $H(x) = e^{A(x)}$. $\square$
Example 7.2.3. A bijection f : [n] → [n] is an involution if f ◦ f is the identity function.
Let hn be the number of involutions on [n]. Note that an involution can be uniquely specified
by the following data: some elements that map to themselves, and otherwise we have pairs
of elements that get swapped. In other words, we break [n] up into 1 element and 2 ele-
ment subsets and put the identity involution on the 1 element subsets and the non-identity
involution on the 2 element.
Define a structure $\alpha$ such that
$$\alpha(S) = \begin{cases}\{\text{identity function on } S\} & \text{if } |S| = 1\\ \{\text{swapping function on } S\} & \text{if } |S| = 2\\ \emptyset & \text{otherwise}\end{cases}.$$
Then what we've said is that $e^\alpha$ is the structure that assigns to $S$ the set of involutions on $S$. Since $E_\alpha(x) = x + x^2/2$, we get
$$H(x) = \sum_{n\ge 0} h_n\frac{x^n}{n!} = E_{e^\alpha}(x) = \exp(E_\alpha(x)) = e^{x + x^2/2}.$$
In particular,
$$H'(x) = H(x)(1 + x).$$
Taking the coefficient of $x^n$ for $n \ge 1$ of this identity gives the identity
$$\frac{h_{n+1}}{n!} = \frac{h_n}{n!} + \frac{h_{n-1}}{(n-1)!},$$
which simplifies to $h_{n+1} = h_n + nh_{n-1}$. $\square$
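Here is a short Python sketch (not from the notes) that checks this recursion against a brute-force count of involutions for small $n$:

```python
from itertools import permutations

def involutions_brute(n):
    """Count permutations p of {0,...,n-1} with p composed with itself the identity."""
    return sum(1 for p in permutations(range(n)) if all(p[p[i]] == i for i in range(n)))

def involutions_recursion(N):
    h = [1, 1]  # h_0 = h_1 = 1
    for n in range(1, N):
        h.append(h[n] + n * h[n - 1])
    return h

h = involutions_recursion(7)
for n in range(1, 8):
    assert h[n] == involutions_brute(n)
print(h)  # [1, 1, 2, 4, 10, 26, 76, 232]
```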
Example 7.2.4. Let $h_n$ be the number of ways to divide $n$ people into nonempty groups and have each group sit in a circle. We consider arrangements that differ only by rotating some of the circles to be equivalent. Let $H(x) = \sum_{n\ge 0} h_n\frac{x^n}{n!}$.
Define a structure $\alpha$ so that $\alpha(\emptyset) = \emptyset$ and for nonempty sets $S$, $\alpha(S)$ is the set of ways to arrange the elements of $S$ into a circle. So $|\alpha(\emptyset)| = 0$ and for $n \ge 1$, $|\alpha([n])| = (n-1)!$ since there are $n!$ orderings, but each one is counted $n$ times (all of the possible rotations) and we only want to count them once. So
$$E_\alpha(x) = \sum_{n\ge 1}\frac{x^n}{n}.$$
From our description, $e^\alpha(S)$ is the set of ways of dividing $S$ into nonempty groups and arranging each in a circle, so $h_n = |e^\alpha([n])|$ and hence $H(x) = \exp(E_\alpha(x))$. Since $E_\alpha'(x)$ is the geometric series, we see that $(1-x)H'(x) = H(x)$, which translates to (for any $n \ge 1$)
$$\frac{h_{n+1}}{n!} - \frac{h_n}{(n-1)!} = \frac{h_n}{n!},$$
or more simply $h_{n+1} = (n+1)h_n$.
This, combined with h0 = 1, implies that hn = n!. Is there a way to see that more directly?
In fact, this is nothing more than the cycle decomposition of a permutation. □
Example 7.2.5. We can think of an involution of $[n]$ as a permutation on $n$ letters such that all of its cycles either have length 1 or 2. We can generalize that example if we want to put different restrictions on the lengths of the cycles by just changing $\alpha$. For instance, if we only want to allow cycles of length 2 or 3, we could define
$$\alpha(S) = \begin{cases}\{\text{ways to put elements of } S \text{ into a circle}\} & \text{if } |S| = 2 \text{ or } |S| = 3\\ \emptyset & \text{otherwise}\end{cases}.$$
Then we'd have $|\alpha([2])| = 1$ and $|\alpha([3])| = 2$, and so $E_\alpha(x) = x^2/2 + x^3/3$ and $e^\alpha([n])$ is the set of permutations on $n$ letters such that all cycles have length 2 or 3, and its EGF is $\exp(x^2/2 + x^3/3)$. $\square$
Example 7.2.6. One more variation: a permutation on n letters such that every cycle has
length ≥ 2 (i.e., no cycles of length 1) is called a derangement (on n letters). Alternatively,
a permutation σ is a derangement if and only if σ(i) ̸= i for all i = 1, . . . , n. Following the
previous example, let’s define
$$\alpha(S) = \begin{cases}\{\text{ways to put elements of } S \text{ into a circle}\} & \text{if } |S| \ge 2\\ \emptyset & \text{otherwise}\end{cases}.$$
Then $|\alpha([n])| = 0$ for $n = 0, 1$ and for $n \ge 2$, we have $|\alpha([n])| = (n-1)!$, so
$$E_\alpha(x) = \sum_{n\ge 2}\frac{x^n}{n}.$$
Since we'll use it soon, its derivative simplifies as follows:
$$E_\alpha'(x) = \sum_{n\ge 2} x^{n-1} = \sum_{n\ge 1} x^n = \frac{x}{1-x}.$$
Then $e^\alpha([n])$ is the set of derangements on $n$ letters; let's use the notation $h_n = |e^\alpha([n])|$, and so its EGF is $H(x) = \exp(E_\alpha(x))$. Using the derivative identity (Proposition 7.2.2), we have
$$H'(x) = H(x)E_\alpha'(x) = H(x)\frac{x}{1-x}.$$
Let's rewrite this as
$$H'(x) - xH'(x) = xH(x).$$
We'll compare the coefficients, but first let's expand them again to make it easier to see:
$$\sum_{n\ge 1} h_n\frac{x^{n-1}}{(n-1)!} - \sum_{n\ge 1} h_n\frac{x^n}{(n-1)!} = \sum_{n\ge 0} h_n\frac{x^{n+1}}{n!}.$$
Now take $k \ge 1$ and the coefficient of $x^k$ of both sides; we get
$$\frac{h_{k+1}}{k!} - \frac{h_k}{(k-1)!} = \frac{h_{k-1}}{(k-1)!}.$$
Finally, multiply both sides by $k!$ and rearrange to get a recursive formula for the number of derangements:
$$h_{k+1} = k(h_k + h_{k-1}).$$
We'll see a different way to understand these numbers when we discuss inclusion-exclusion. $\square$
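As a quick check of the recursion, here is a Python sketch (not from the notes) comparing it to a brute-force count over all permutations for small $n$:

```python
from itertools import permutations

def derangements_brute(n):
    return sum(1 for p in permutations(range(n)) if all(p[i] != i for i in range(n)))

def derangements_recursion(N):
    h = [1, 0]  # h_0 = 1 (the empty permutation), h_1 = 0
    for k in range(1, N):
        h.append(k * (h[k] + h[k - 1]))
    return h

h = derangements_recursion(8)
for n in range(8):
    assert h[n] == derangements_brute(n)
print(h)  # [1, 0, 1, 2, 9, 44, 265, 1854, 14833]
```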
Now we’ve seen plenty of examples using the fact that permutations are built out of cycles
and how we can count permutations with restrictions on cycle lengths using the exponential
formula. Another important class of examples comes from set partitions, with “blocks” being
the literal building blocks.
Example 7.2.7. We continue with Example 7.1.8 and consider the selection structure
$$\alpha(S) = \begin{cases}\{*\} & \text{if } |S| > 0\\ \emptyset & \text{if } |S| = 0\end{cases}.$$
Then $|e^\alpha(S)|$ is the number of set partitions of $S$, so we get the EGF for Bell numbers:
$$\sum_{n\ge 0} B(n)\frac{x^n}{n!} = E_{e^\alpha}(x) = \exp(E_\alpha(x)) = \exp(e^x - 1).$$
Letting $H(x)$ be this EGF, we can extract a recursion by applying Proposition 7.2.2 (so $A(x) = e^x - 1$ and $A'(x) = e^x$):
$$\sum_{n\ge 0} B(n+1)\frac{x^n}{n!} = H'(x) = H(x)A'(x) = \left(\sum_{n\ge 0} B(n)\frac{x^n}{n!}\right)\left(\sum_{n\ge 0}\frac{x^n}{n!}\right).$$
The coefficient of $x^n$ on the left side is $B(n+1)/n!$; the coefficient on the right side is $\sum_{i=0}^n\frac{B(i)}{i!}\cdot\frac{1}{(n-i)!}$. Multiply both by $n!$ to get
$$B(n+1) = \sum_{i=0}^n\binom{n}{i}B(i),$$
which is the identity from Example 3.1.8. $\square$
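This recursion makes the Bell numbers easy to tabulate. Here is a short Python sketch (not from the notes):

```python
from math import comb

def bell_numbers(N):
    """Compute B(0), ..., B(N) using B(n+1) = sum_i binom(n, i) B(i)."""
    B = [1]  # B(0) = 1
    for n in range(N):
        B.append(sum(comb(n, i) * B[i] for i in range(n + 1)))
    return B

print(bell_numbers(8))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140]
```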
Example 7.2.8. The advantage of this approach is that we can easily modify the problem if we want to restrict the possible sizes of the blocks in our set partitions. For example, suppose we want to consider set partitions such that every block has either size 2 or 3. Let $h_n$ be the number of set partitions of $[n]$ satisfying this condition and let $H(x) = \sum_{n\ge 0} h_n\frac{x^n}{n!}$ be its EGF. Let's define a structure $\alpha$ by
$$\alpha(S) = \begin{cases}\{*\} & \text{if } |S|\in\{2,3\}\\ \emptyset & \text{else}\end{cases}.$$
Then $h_n = |e^\alpha([n])|$, $H(x) = \exp(E_\alpha(x))$, and $E_\alpha(x) = x^2/2! + x^3/3!$. As usual, let's apply Proposition 7.2.2 with $A(x) = E_\alpha(x)$. First, $A'(x) = x + x^2/2$, so we have
$$H'(x) = H(x)\left(x + \frac{x^2}{2}\right),$$
which can be written as
$$\sum_{n\ge 1} h_n\frac{x^{n-1}}{(n-1)!} = \sum_{n\ge 0} h_n\frac{x^{n+1}}{n!} + \frac12\sum_{n\ge 0} h_n\frac{x^{n+2}}{n!}.$$
Now let's compare the coefficient of $x^k$ (assume $k \ge 2$ to avoid boundary issues):
$$\frac{h_{k+1}}{k!} = \frac{h_{k-1}}{(k-1)!} + \frac{h_{k-2}}{2(k-2)!}.$$
Clear denominators to get
$$h_{k+1} = k\cdot h_{k-1} + \binom{k}{2}h_{k-2},$$
which should look familiar from the homework. $\square$
Of course, there is nothing special about only having block sizes 2 or 3. You can repeat
the above example with any restriction on the block sizes (for example, requiring them only
to be even, or only to be odd, or being any size except 3, etc.).
Remark 7.2.9. Finally, given two structures α, β such that α(∅) = ∅, there is a nice way
to interpret the composition Eβ (Eα (x)): define (β ◦ α)(S) to be the set of ways to partition
S into nonempty subsets, put an α structure on each block, and then put a β structure on
the set of blocks (for example, you could imagine α being the selection structure and then β
assigns to each block the color red or blue and then β ◦ α is the set of ways to partition S
and also color each of the blocks). Then we have Eβ (Eα (x)) = Eβ◦α (x). But I don’t plan to
use this generalization. □
Hence we have two scenarios so far where the exponential formula works quite well: per-
mutations (which allows us to impose restrictions on allowed cycle lengths) and set partitions
(which allows us to impose restrictions on allowed block sizes). I want to discuss one more
scenario related to graphs (in the sense of graph theory). The point here is that general
graphs are built out of connected graphs (which will be our basic building blocks). This
meta-idea of building arbitrary structures out of “connected” structures goes much further,
but we’ll limit ourselves to graphs.
7.3. Cayley’s enumeration of labeled trees. The next 2 sections are going to continue
the theme of examining how far we can push the idea of getting closed formulas out of
complicated recursive formulas while also reinforcing the use of the exponential formula.
A labeled (simple) graph on a (nonempty) set S is a collection of 2-element subsets of
S. The elements of S are called vertices, and the 2-element subsets are called edges. We
visualize these by thinking of S as a set of points and drawing an edge between two points
if that edge is in our collection. Just keep in mind that this just a visualization tool: there
are many different ways to draw the same labeled graph. The number of labeled graphs is then $2^{\binom{n}{2}}$ by using what we already know about subsets, so we'll discuss a more interesting counting problem.
The graph has a cycle if there is a sequence of vertices v1 , . . . , vd (with d ≥ 3) such that
the vi are all distinct, {vi , vi+1 } is an edge for i = 1, . . . , d − 1 and so is {vd , v1 }. If the graph
has no cycles, it is called a labeled forest. If, in addition, it is connected (meaning we can
go from any point to any other by following edges), then it is a labeled tree. Let tn be the
number of labeled trees on [n]. Our goal is the following formula for tn .
Theorem 7.3.1 (Cayley). For $n \ge 1$, we have $t_n = n^{n-2}$.
There are a lot of different ways to prove this, but we will focus on using EGF.
Example 7.3.2. When n = 1 or n = 2, we get 1 labeled tree. When n = 3, we get 3,
corresponding to the following pictures:
[Pictures of the three labeled trees on {1, 2, 3} omitted.]
When n = 4, there are 2 types of unlabeled trees (the star and the path):
[Pictures omitted.]
There are 4 labelings of the first kind since it only matters what goes in the middle, and the
second has 12 = 4!/2 labelings since a labeling can be thought of as a permutation of size 4,
except that reversing the order gives the same tree. □
Remark 7.3.3. The simple form of the formula tn = nn−2 suggests that there should be a
bijection between the set of labeled trees on [n] and words of length n − 2 in the alphabet [n].
In fact, such a bijection is known; you can look up Prüfer sequences for more information.
We won’t go into it in this course, though it does have some nice uses, for example it gives
a way to generate uniformly distributed random labeled trees. □
Remark 7.3.4. As the previous example hints, we can also ask about how many unlabeled
trees on n vertices there are. These are essentially the underlying shapes that a tree can take.
For example, for n = 3, there’s just one type, and for n = 4 there are 2 types. Actually,
this problem is significantly more complicated than the labeled case, and there’s no known
closed formula. □
We need one more definition: a rooted labeled tree is a pair (T, i) where T is a labeled
tree and i is one of its vertices, which we call its root. Alternatively, we can think of it as a
labeled tree where one of the points has been colored or marked in some way. The number
of rooted labeled trees with n vertices is then ntn . Similarly, we define a planted labeled
forest to be a labeled forest in which each connected component is a rooted labeled tree.
For n > 0, let fn be the number of planted labeled forests with n vertices and define f0 = 1.
Define EGFs
$$F(x) = \sum_{n\ge 0} f_n\frac{x^n}{n!}, \qquad R(x) = \sum_{n\ge 1} nt_n\frac{x^n}{n!}.$$
Lemma 7.3.5. $F(x) = e^{R(x)}$.
Proof. Every planted labeled forest is a disjoint union of rooted labeled trees (in a unique
way), so this follows from the exponential formula. □
Lemma 7.3.6. R(x) = xF (x).
Proof. For n ≥ 2, we claim there is a bijection between the set of labeled trees on [n] and
planted labeled forests on [n − 1].
Given a labeled tree, delete the vertex n, then we are left with a labeled forest on [n − 1].
Each vertex that was previously connected to n is now in a separate component (if they were
still connected, then the original graph had a cycle because we could go through n and then
through go through whatever path remains in [n − 1]), so we can declare all of them to be
the roots of their respective components.
On the other hand, given a planted labeled forest on [n − 1], we get a labeled graph on [n]
by adding an edge between n and each of the roots of F . This won’t introduce cycles (let
me skip this explanation since it’s intuitive to understand with a picture and I don’t want
to make this too technical) and the result is connected, so we actually have a labeled tree.
In conclusion $t_n = f_{n-1}$ for $n \ge 2$, but also $t_1 = 1 = f_0$ by definition. Hence
$$R(x) = \sum_{n\ge 1} t_n\frac{x^n}{(n-1)!} = x\sum_{n\ge 1} f_{n-1}\frac{x^{n-1}}{(n-1)!} = xF(x). \qquad\square$$
Example 7.3.7. Let's illustrate the previous bijection with an example with n = 7. [Pictures omitted: the original tree is on the left, and its corresponding planted forest is on the right, with the roots indicated by shading in the vertices.] $\square$
Combining these two identities gives the equation
$$R(x) = xe^{R(x)}.$$
We can try to solve this coefficient by coefficient: say that $R(x) = \sum_{n\ge 1} r_n x^n$ and we are trying to solve for the $r_i$ (by definition $R(x)$ has no constant term). So $\mathrm{mdeg}(R(x)) = 1$ and this tells us that $\mathrm{mdeg}(R(x)^n) = n$. Expanding the equation, we get
$$R(x) = x\left(1 + R(x) + \frac{R(x)^2}{2!} + \cdots\right).$$
So if we want to solve for $r_n$ we just need to consider $x\left(1 + R(x) + \cdots + \frac{R(x)^{n-1}}{(n-1)!}\right)$ since all other terms don't have an $x^n$ term. In particular,
$$r_1 = [x^1]R(x) = [x^1]x = 1,$$
$$r_2 = [x^2]R(x) = [x^2]x(1 + R(x)) = 0 + r_1 = 1,$$
$$r_3 = [x^3]R(x) = [x^3]x\left(1 + R(x) + \frac{R(x)^2}{2!}\right) = 0 + r_2 + \frac{r_1^2}{2} = \frac32,$$
$$r_4 = [x^4]R(x) = [x^4]x\left(1 + R(x) + \frac{R(x)^2}{2!} + \frac{R(x)^3}{3!}\right) = 0 + r_3 + \frac{r_1 r_2 + r_2 r_1}{2} + \frac{r_1^3}{6} = \frac{16}{6},$$
$$\vdots$$
Remembering that tn = (n − 1)!rn , we get t1 = 1, t2 = 1, t3 = 3, t4 = 16, which is consistent
so far.
We can continue like this, but it would be nice to have a closed formula without having
to guess one. This can be done with the Lagrange inversion formula which we discuss next.
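The coefficient-by-coefficient procedure is mechanical enough to automate. Here is a Python sketch (not from the notes; the function name is just for illustration) that solves $R(x) = xe^{R(x)}$ with exact arithmetic and recovers $t_n = (n-1)!\,r_n$:

```python
from fractions import Fraction
from math import factorial

def solve_tree_series(N):
    """Solve R(x) = x*exp(R(x)) coefficient by coefficient; returns r with r[n] = [x^n] R(x)."""
    r = [Fraction(0)] * (N + 1)
    for n in range(1, N + 1):
        # compute [x^(n-1)] exp(R(x)) using only r[1..n-1], which is all we need
        expR = [Fraction(0)] * n
        expR[0] = Fraction(1)
        power = [Fraction(1)] + [Fraction(0)] * (n - 1)   # R(x)^k truncated to degree n-1
        for k in range(1, n):
            new = [Fraction(0)] * n
            for i in range(n):
                for j in range(1, n - i):
                    new[i + j] += power[i] * r[j]
            power = new
            for d in range(n):
                expR[d] += power[d] / factorial(k)
        r[n] = expR[n - 1]      # since [x^n] R(x) = [x^(n-1)] exp(R(x))
    return r

N = 6
r = solve_tree_series(N)
t = [int(r[n] * factorial(n - 1)) for n in range(1, N + 1)]
print(t)  # [1, 1, 3, 16, 125, 1296], i.e. t_n = n^(n-2)
```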
7.4. Lagrange inversion formula.

Theorem 7.4.1 (Lagrange inversion formula). Let $G(x)$ be a formal power series whose constant term is nonzero. Then there is a unique formal power series $A(x)$ such that
$$A(x) = xG(A(x)).$$
Furthermore, $A(x)$ has no constant term, and for $n > 0$, we have
$$[x^n]A(x) = \frac{1}{n}[x^{n-1}](G(x)^n).$$
A proof of this is doable in this course, but takes a bit of time and I’d prefer to skip it.
But it’s an interesting tool, so we’ll see some ways of how we can apply it.
Proof of Cayley's formula, Theorem 7.3.1. We take $A(x) = R(x)$ and $G(x) = e^x$. For $n > 0$, the Lagrange inversion formula tells us that
$$[x^n]R(x) = \frac1n[x^{n-1}]e^{nx} = \frac1n[x^{n-1}]\sum_{d\ge 0}\frac{n^d}{d!}x^d = \frac1n\cdot\frac{n^{n-1}}{(n-1)!} = \frac{n^{n-1}}{n!}.$$
Remember that $[x^n]R(x) = nt_n/n!$, so we conclude that $t_n = n^{n-2}$. $\square$


We’ll give a couple of other examples where this can be applied.
Example 7.4.2. Let's return to the problem of computing Catalan numbers from §6.3. Let $C(x) = \sum_{n\ge 0} C_n x^n$ where $C_n$ is the Catalan number. Recall that we proved that $C(x) = 1 + xC(x)^2$ and we solved this with the quadratic formula. Here's another way using the Lagrange inversion formula. First, this formula isn't of the right form, but if we define $A(x) = C(x) - 1$, then our relation becomes
$$A(x) + 1 = 1 + x(A(x) + 1)^2.$$
(Remember that the $A(x)$ that is solved for in Lagrange inversion has no constant term, so it was necessary to do some kind of change like above.) Subtracting 1 from both sides, this is of the right form where $G(x) = (x+1)^2$. Hence, we see that for $n > 0$, we have
$$[x^n]A(x) = \frac1n[x^{n-1}](x+1)^{2n} = \frac1n\binom{2n}{n-1},$$
where we used the binomial theorem. Since $[x^n]A(x) = [x^n]C(x)$ for $n > 0$, we conclude that $C_n = \frac1n\binom{2n}{n-1}$. This isn't quite the formula we derived, but
$$\frac1n\binom{2n}{n-1} = \frac1n\cdot\frac{(2n)!}{(n-1)!(n+1)!} = \frac{1}{n+1}\cdot\frac{(2n)!}{n!\,n!} = \frac{1}{n+1}\binom{2n}{n}. \qquad\square$$
Example 7.4.3. Continuing with the Catalan example, recall that we discussed why Catalan
numbers count the number of rooted binary trees with n + 1 leaves. Equivalently, this is
the number of rooted binary trees with n internal vertices. More generally, we can consider
rooted k-ary trees with n internal vertices. We’ll leave k out of the notation for simplicity,
and let cn be the number of rooted k-ary trees with n internal vertices. To build one when
n > 0, we start with a single node for our root, and then attach k rooted k-ary trees below
it. This gives us the relation
$$c_n = \sum_{\substack{(i_1,i_2,\dots,i_k)\\ i_1+\cdots+i_k = n-1}} c_{i_1}c_{i_2}\cdots c_{i_k} \qquad\text{for } n > 0.$$
The sum is over all weak compositions of $n-1$ with $k$ parts. Here $i_j$ represents the number of internal vertices that are in the $j$th tree connected to our original root. As before, if $C(x) = \sum_{n\ge 0} c_n x^n$, this leads to the relation
$$C(x) = 1 + xC(x)^k.$$
Now we don't have a general method of solving this polynomial equation for general $k$, but we can use Lagrange inversion like in the previous example. Again, we set $A(x) = C(x) - 1$ to convert the relation into
$$A(x) = x(A(x) + 1)^k.$$
So we take $G(x) = (x+1)^k$ and we conclude that
$$[x^n]A(x) = \frac1n[x^{n-1}](x+1)^{kn} = \frac1n\binom{kn}{n-1} = \frac{1}{(k-1)n+1}\binom{kn}{n}. \qquad\square$$
There are actually many applications of Lagrange inversion (and its generalizations) in different fields of mathematics. The direct applications to counting problems seem somewhat limited, but it's very useful when it does apply.
Example 7.4.4. Here's something not necessarily related to counting. Suppose $A(x)$ is a formal power series satisfying the identity
$$A(x) = \frac{x}{1 - A(x)}.$$
We could clear denominators and then we'd realize $A(x)$ as the root of a quadratic equation. But we can also use Lagrange inversion with $G(x) = \frac{1}{1-x}$. Then we get, for $n > 0$,
$$[x^n]A(x) = \frac1n[x^{n-1}](1-x)^{-n} = \frac1n(-1)^{n-1}\binom{-n}{n-1} = \frac1n\binom{2n-2}{n-1}. \qquad\square$$

8. Sieving methods
The topic of this section is how to systematically deal with overcounting. This could have
been done earlier in the course since it is basically independent of a lot of the other topics
we discussed, but we’ll draw on the previous sections for interesting examples.
8.1. Inclusion-exclusion.
Example 8.1.1. Suppose we have a room of students, and 14 of them play basketball, 10
of them play football. How many students play at least one of these? We can’t answer the
question because there might be students who play both. But we can say that the total
number is 24 minus the amount in the overlap.
[Venn diagram of B and F omitted.]
Alternatively, let B be the set who play basketball and let F be the set who play football.
Then what we’ve said is:
|B ∪ F | = |B| + |F | − |B ∩ F |.
New situation: there are additionally 8 students who play hockey. Let H be the set of
students who play hockey. What information do we need to know how many total students
there are?
[Venn diagram of B, F, and H omitted.]
Here the overlap region is more complicated: it has 4 regions, which suggest that we need 4
more pieces of information. The following formula works:
|B ∪ F ∪ H| = |B| + |F | + |H| − |B ∩ F | − |B ∩ H| − |F ∩ H| + |B ∩ F ∩ H|.
To see this, the total diagram has 7 regions and we need to make sure that students in each
region get counted exactly once in the right side expression. For example, consider students
who play basketball and football, but don’t play hockey. They get counted in B, F , B ∩ F
with signs +1, +1, −1, which sums up to 1. How about students who play all 3? They get
counted in all terms with 4 +1 signs and 3 −1 signs, again adding up to 1. You can check
the other 5 to make sure the count is right. □
The examples above have a generalization to n sets, though the diagram is harder to draw
beyond 3 (technically, you can’t draw it...)
Theorem 8.1.2 (Inclusion-Exclusion). Let $A_1, \ldots, A_n$ be finite sets. Then
\[
|A_1 \cup \cdots \cup A_n| = \sum_{j=1}^{n} (-1)^{j-1} \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} |A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}|.
\]
In words: to get the size of the union, first add up all of the sizes of the sets, then subtract
off the sizes of all 2-fold intersections, then add the sizes of all 3-fold intersections, ... and
keep going until you’ve intersected all of the sets.
Proof. We just need to make sure that every element x ∈ A1 ∪ · · · ∪ An is counted exactly
once on the right hand side. Let S = {s1 , . . . , sk } be all of the indices such that x ∈ Asr .
Then x belongs to $A_{i_1} \cap \cdots \cap A_{i_j}$ if and only if $\{i_1, \ldots, i_j\} \subseteq S$. So the relevant contribution
for x is a sum over all of the nonempty subsets of S:
\[
\sum_{\emptyset \neq T \subseteq S} (-1)^{|T|-1} = -\sum_{n=1}^{|S|} \binom{|S|}{n} (-1)^n.
\]
However, since |S| > 0, we have shown before that $\sum_{n=0}^{|S|} \binom{|S|}{n}(-1)^n = 0$, so the sum above
is $\binom{|S|}{0} = 1$. □
We can also prove this by induction on n. Can you see how?
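Before moving on, here is a small Python sketch (purely illustrative; the helper name is mine) that checks the inclusion-exclusion formula against a direct computation of the union on random sets.
\begin{verbatim}
from itertools import combinations
import random

def inclusion_exclusion(sets):
    """Size of the union computed from the sizes of all j-fold intersections."""
    total = 0
    for j in range(1, len(sets) + 1):
        for idx in combinations(range(len(sets)), j):
            inter = set.intersection(*(sets[i] for i in idx))
            total += (-1) ** (j - 1) * len(inter)
    return total

random.seed(0)
for _ in range(100):
    sets = [set(random.sample(range(30), random.randint(0, 15))) for _ in range(4)]
    assert inclusion_exclusion(sets) == len(set.union(*sets))
print("inclusion-exclusion agrees with directly computing the union")
\end{verbatim}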
Let’s start with some specific problems.
Example 8.1.3. Let’s do a warmup with n = 2.
How many numbers between 1 and 1000 are divisible by 3 or 5?
This is a typical inclusion-exclusion problem because OR translates to a union of two sets.
Namely, let A be the set of numbers between 1 and 1000 which are divisible by 3 and let B
be the set of numbers between 1 and 1000 which are divisible by 5. Our question is asking:
how big is A ∪ B?
To use inclusion-exclusion, we need 3 pieces of information: |A|, |B|, and |A ∩ B|.
First, let’s deal with A. We can write all of the multiples of 3: A = {3, 6, 9, . . . , 999}. How
big is this set? To see it easily, let’s divide all of them by 3: {1, 2, 3, . . . , 333}, so |A| = 333.
Next, let’s deal with B in the same way: B = {5, 10, 15, . . . , 1000}, and dividing each
number by 5 gives {1, 2, 3, . . . , 200}, so |B| = 200.
Finally, how do we deal with A ∩ B? Remember that a number being divisible by both
3 and 5 is equivalent to being divisible by their least common multiple lcm(3, 5) = 15. So

we have A ∩ B = {15, 30, 45, . . . , 990}, and again dividing by 15 gives {1, 2, 3, . . . , 66}, so
|A ∩ B| = 66.
So our desired answer is
|A ∪ B| = |A| + |B| − |A ∩ B| = 333 + 200 − 66 = 467. □
Example 8.1.4. The above generalizes fairly well. For instance, let's consider the numbers
1, . . . , N which are divisible by a or b or c. For any x, let's define $A_{N,x}$ to be the set of
multiples of x that are in 1, . . . , N. The general pattern is that $|A_{N,x}| = \lfloor N/x \rfloor$, where we're
using the floor function (rounding down to the nearest integer). In general, $A_{N,x} \cap A_{N,y} = A_{N,\operatorname{lcm}(x,y)}$
(and something similar for intersecting more than 2).
For a concrete example, let’s take N = 200 and a = 4, b = 5, c = 6. So our desired answer
would be
\begin{align*}
|A_{200,4} \cup A_{200,5} \cup A_{200,6}| &= \left\lfloor \tfrac{200}{4} \right\rfloor + \left\lfloor \tfrac{200}{5} \right\rfloor + \left\lfloor \tfrac{200}{6} \right\rfloor - \left\lfloor \tfrac{200}{20} \right\rfloor - \left\lfloor \tfrac{200}{30} \right\rfloor - \left\lfloor \tfrac{200}{12} \right\rfloor + \left\lfloor \tfrac{200}{60} \right\rfloor \\
&= 50 + 40 + 33 - 10 - 6 - 16 + 3 \\
&= 94. \qquad \square
\end{align*}
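This procedure is easy to mechanize; here is a Python sketch (illustrative names only) that applies the formula $|A_{N,x}| = \lfloor N/x \rfloor$ together with lcm's over all nonempty subsets of the divisors.
\begin{verbatim}
from itertools import combinations
from math import lcm

def count_divisible(N, divisors):
    """Integers in 1..N divisible by at least one of the divisors, by inclusion-exclusion."""
    total = 0
    for j in range(1, len(divisors) + 1):
        for subset in combinations(divisors, j):
            total += (-1) ** (j - 1) * (N // lcm(*subset))
    return total

print(count_divisible(1000, [3, 5]))    # 467, as in Example 8.1.3
print(count_divisible(200, [4, 5, 6]))  # 94, as in Example 8.1.4
\end{verbatim}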
Example 8.1.5. Let’s consider ways to arrange the letters of the word BARBER such that
no two consecutive letters are the same. For the sake of brevity, let’s call an arrangement
“good” if it satisfies this property and “bad” otherwise. So, for example, BBARER is bad
since the two B’s are consecutive.
Good is defined by two conditions: the two B’s are not consecutive AND the two R’s are
not consecutive. Inclusion-exclusion lets us take care of unions, which you should think of
as taking an OR, so to better handle it, let’s flip it around and count the number of bad
arrangements. Then we’ll subtract it from the total number of arrangements.
The set of bad arrangements is the union of two sets: let A1 be the set of arrangements
where the two B’s appear consecutively, and let A2 be the set of arrangements where the
two R’s appear consecutively.
To count the size of A1 , we can use the following trick: merge the two B’s into a single
character (we can denote it $\overline{B}$) and ask how many ways there are to arrange the 5 characters
$\overline{B}$, A, R, E, R. This goes back to the problem about arranging flowers, so the answer is $\binom{5}{1,1,2,1}$,
or 5!/2 = 60. A2 is handled the same way so |A2 | = 60.
The intersection A1 ∩A2 is the set of arrangements where the two B’s appear consecutively
AND the two R’s appear consecutively. In that case, we can use the trick again and ask
about arrangements of B, A, R, E, so |A1 ∩ A2 | = 4! = 24.
So the number of bad arrangements is |A1 ∪ A2 | = 60 + 60 − 24 = 96. Our original problem
is about the opposite case, so we can subtract this from the total number of arrangements.
There are $\binom{6}{2,2,1,1} = 180$ total arrangements, so the number of good arrangements is 180 −
96 = 84. □
Example 8.1.6. If we have any word where each letter appears at most twice, then it’s
not difficult to generalize the work in the previous example to count the number of good
arrangements.
What about a word like TATTLE, where a letter appears 3 times? How many good
arrangements are there? We could try to do the same thing as before and count the bad

arrangements. Importantly, a bad arrangement could have either 2 or 3 consecutive T's, so
it won't be enough to merge all 3 T's together. If we try to just merge two of them into a
single character $\overline{T}$, then we'd be asking about the number of arrangements of $\overline{T}$, A, T, L, E, of
which there are 5!. However, both $\overline{T}$TALE and T$\overline{T}$ALE really mean TTTALE, so we're overcounting
a bunch of cases. Rather than fix this (think about how you might do that), let me try
another approach.
First, let’s number the T’s, so our letters are now T1 AT2 T3 LE and we can think of them
as 6 distinct letters. Now we want to count arrangements such that at least two of the T’s
appear consecutively. Let’s define 3 sets
A1,2 = {arrangements where T1 and T2 appear consecutively},
A1,3 = {arrangements where T1 and T3 appear consecutively},
A2,3 = {arrangements where T2 and T3 appear consecutively}.
Then we’re asking about |A1,2 ∪ A1,3 ∪ A2,3 |, and the number of bad arrangements is this size
divided by 3! (to remove the ordering of the T’s).
First, let’s count the size of each of these sets. For A1,2 , we can either have T1 T2 or
T2 T1 . In each case, we could merge the letters into one and then we’re arranging 5 (distinct)
letters, so we get 5!. So |A1,2 | = 2 · 5!. This applies equally well to the other two sets, so
|A1,3 | = |A2,3 | = 2 · 5!.
How about the intersections? For A1,2 ∩ A1,3 , there are only two ways for both T1 and T2
to appear consecutively and T1 and T3 to appear consecutively: T2 T1 T3 or T3 T1 T2 . Again,
in each case, we can merge the 3 letters into one and we’re asking about arranging 4 distinct
letters, so we get 4! and so |A1,2 ∩ A1,3 | = 2 · 4!. As before, the same applies to the other two
intersections A1,2 ∩ A2,3 and A1,3 ∩ A2,3 .
Finally, what about A1,2 ∩ A1,3 ∩ A2,3 ? This asks that all pairs of the T’s are consecutive
at the same time, but that’s impossible, so this intersection is empty.
In conclusion:
|A1,2 ∪ A1,3 ∪ A2,3 | = 3 · 2 · 5! − 3 · 2 · 4! = 576.
But remember this is with an ordering of the T's. If we divide by 3!, we get 96, which is
the number of bad arrangements. Since there are $\binom{6}{3,1,1,1} = 120$ total ways to arrange these
letters, there are 24 good arrangements. □
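Both of these answers are easy to confirm by brute force; here is a short Python sketch (the function name is mine) that enumerates all distinct arrangements and counts the good ones.
\begin{verbatim}
from itertools import permutations

def count_good(word):
    """Number of distinct arrangements of `word` with no two equal adjacent letters."""
    good = 0
    for arr in set(permutations(word)):   # set() collapses duplicates from repeated letters
        if all(a != b for a, b in zip(arr, arr[1:])):
            good += 1
    return good

print(count_good("BARBER"))  # 84, as in Example 8.1.5
print(count_good("TATTLE"))  # 24, as in Example 8.1.6
\end{verbatim}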
We now use inclusion-exclusion to address two general counting problems: derangements
and Stirling numbers.
First, we can think of a permutation of [n] as the same thing as a bijection f : [n] → [n]
(given the bijection, f (i) is the position in the permutation where i is supposed to appear).
Recall that we defined derangements. To remind you, a derangement on n letters is a
permutation such that for all i, i does not appear in position i. Equivalently, it is a bijection
f such that f (i) ̸= i for all i.
Theorem 8.1.7. The number of derangements on n letters is
\[
\sum_{i=0}^{n} (-1)^i \frac{n!}{i!}.
\]
Proof. It turns out to be easier to count the number of permutations which are not derange-
ments and then subtract that from the total number of permutations. For i = 1, . . . , n,

let Ai be the set of bijections f such that f (i) = i. Then the set of non-derangements is
A1 ∪ · · · ∪ An .
To apply inclusion-exclusion, we need to count the size of Ai1 ∩ · · · ∩ Aij for some choice of
indices i1 , . . . , ij . This is the set of bijections f : [n] → [n] such that f (i1 ) = i1 , . . . , f (ij ) = ij .
The remaining information to specify f are its values outside of i1 , . . . , ij , which we can
interpret as a bijection of [n] \ {i1 , . . . , ij } to itself. So there are (n − j)! of them. So we get
\begin{align*}
|A_1 \cup \cdots \cup A_n| &= \sum_{j=1}^{n} (-1)^{j-1} \sum_{1 \le i_1 < \cdots < i_j \le n} |A_{i_1} \cap \cdots \cap A_{i_j}| \\
&= \sum_{j=1}^{n} (-1)^{j-1} \sum_{1 \le i_1 < \cdots < i_j \le n} (n - j)! \\
&= \sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} (n - j)! \\
&= \sum_{j=1}^{n} (-1)^{j-1} \frac{n!}{j!}.
\end{align*}
Remember that we have to subtract this from n!. So the final answer simplifies as follows:
\[
n! - \sum_{j=1}^{n} (-1)^{j-1} \frac{n!}{j!} = \sum_{j=0}^{n} (-1)^j \frac{n!}{j!}. \qquad \square
\]

If we’re willing to use some calculus, we can conclude a more compact, although slightly
strange formula for the number of derangements. First, recall that for any real number r, we
have an infinite sum formula for $e^r$ (now we're doing calculus and not formal power series,
but it's only for this discussion!):
\[
e^r = \sum_{i=0}^{\infty} \frac{r^i}{i!}.
\]
There are two things we can conclude from this. First, taking r = −1 and breaking up the
sum gives
\[
\frac{1}{e} = \sum_{i=0}^{n} \frac{(-1)^i}{i!} + \sum_{i=n+1}^{\infty} \frac{(-1)^i}{i!}.
\]
The first sum is the number of derangements on n letters divided by n!, or in words: the
percentage of permutations which are derangements.
We can bound the difference (for example, using Lagrange's version of the Taylor remainder
formula$^1$):
\[
\left| \sum_{i=n+1}^{\infty} \frac{(-1)^i}{i!} \right| \le \frac{1}{(n + 1)!}.
\]

$^1$It's not crucial for this course, but let me remind you what (a special case of) it says: if f(x) is an
infinitely differentiable function whose Taylor series at 0 converges at r, then for each n, there exists ξ
between 0 and r such that $f(r) - \sum_{i=0}^{n} \frac{f^{(i)}(0)}{i!} r^i = \frac{f^{(n+1)}(\xi)}{(n+1)!} r^{n+1}$. For our purposes, r = −1, and we know
that $e^\xi \le 1$ for all ξ ∈ [−1, 0].

In particular, we see that as n → ∞, the proportion of permutations that are derangements
limits to $e^{-1} \approx 0.368$, so roughly 36.8% of them are derangements when n is somewhat large.
Now go back to the formula for 1/e above and multiply it by n!:

\[
\frac{n!}{e} = \sum_{i=0}^{n} (-1)^i \frac{n!}{i!} + \sum_{i=n+1}^{\infty} (-1)^i \frac{n!}{i!}.
\]

Now the first sum is the number of derangements of n objects and from what we just said,
the second term is at most n!/(n + 1)! = 1/(n + 1) in absolute value.
Hence the number of derangements is in the interval $\left[\frac{n!}{e} - \frac{1}{n+1},\ \frac{n!}{e} + \frac{1}{n+1}\right]$. The width of
]. The width of
this interval is 2/(n + 1) which is strictly smaller than 1 for n ≥ 2, so it can’t contain more
than one integer. Hence the number of derangements is simply the closest integer to n!/e,
giving us the following surprising fact (accounting for n = 1 is easy to do directly, so we’ll
ignore it):

Theorem 8.1.8. The number of derangements of size n is round(n!/e) where round just
means round to the nearest integer.

Remark 8.1.9. This is pretty surprising: there’s no reason to expect that rounding should
ever provide an exact answer to a counting problem, especially something that involves a
transcendental number like e.
To give some sense of how this looks, here are the approximate values of n!/e for n =
1, . . . , 7 (just two decimal places):

.37, .74, 2.21, 8.83, 44.15, 264.87, 1854.11,

so the corresponding number of derangements is

0, 1, 2, 9, 44, 265, 1854. □
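Here is a short Python check of both formulas (illustrative only): the alternating sum from Theorem 8.1.7 and the rounding rule from Theorem 8.1.8 give the same values.
\begin{verbatim}
from math import e, factorial

def derangements(n):
    """Number of derangements of [n], via the alternating sum from Theorem 8.1.7."""
    return sum((-1) ** i * factorial(n) // factorial(i) for i in range(n + 1))

for n in range(1, 8):
    assert derangements(n) == round(factorial(n) / e)
print([derangements(n) for n in range(1, 8)])  # [0, 1, 2, 9, 44, 265, 1854]
\end{verbatim}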

We can also use inclusion-exclusion to get an alternating sum formula for Stirling numbers.

Theorem 8.1.10. For all n ≥ k > 0,
\[
S(n, k) = \frac{1}{k!} \sum_{i=0}^{k} (-1)^i \binom{k}{i} (k - i)^n = \sum_{i=0}^{k} (-1)^i \frac{(k - i)^n}{i!\,(k - i)!}.
\]

Proof. As we discussed before, k!S(n, k) is the number of ordered set partitions of [n] with
k blocks, and we interpreted that as the number of surjective functions f : [n] → [k] (the
blocks are just the preimages f −1 (i)). So we will count this quantity. For i = 1, . . . , k, let
Ai be the set of functions f : [n] → [k] such that i is not in the image of f . The surjective
functions are the complement of A1 ∪ · · · ∪ Ak from the set of all functions (there are k n
total functions). To apply inclusion-exclusion, we need to count the size of Ai1 ∩ · · · ∩ Aij for
1 ≤ i1 < · · · < ij ≤ k. This is the set of functions so that {i1 , . . . , ij } are not in the image;
equivalently, this is identified with the set of functions f : [n] → [k] \ {i1 , . . . , ij }, so there

are $(k - j)^n$ of them. So we can apply inclusion-exclusion to get
\begin{align*}
|A_1 \cup \cdots \cup A_k| &= \sum_{j=1}^{k} (-1)^{j-1} \sum_{1 \le i_1 < \cdots < i_j \le k} |A_{i_1} \cap \cdots \cap A_{i_j}| \\
&= \sum_{j=1}^{k} (-1)^{j-1} \sum_{1 \le i_1 < \cdots < i_j \le k} (k - j)^n \\
&= \sum_{j=1}^{k} (-1)^{j-1} \binom{k}{j} (k - j)^n.
\end{align*}
Remember we have to subtract:
\[
k! \, S(n, k) = k^n - \sum_{j=1}^{k} (-1)^{j-1} \binom{k}{j} (k - j)^n = \sum_{j=0}^{k} (-1)^j \binom{k}{j} (k - j)^n.
\]

Now divide both sides by k! to get the first equality of the theorem statement. The second
equality of the theorem statement comes from canceling the k! from the binomial coefficient. □

I’m not sure if there’s some calculus we can do to conclude something interesting like in
the previous example.
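As a check, here is a Python sketch (the names are mine) comparing the alternating sum with a brute-force count of surjections.
\begin{verbatim}
from itertools import product
from math import comb, factorial

def stirling2(n, k):
    """Stirling number S(n, k) via the alternating sum of Theorem 8.1.10."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1)) // factorial(k)

def surjection_count(n, k):
    """Brute force: number of surjective functions [n] -> [k]."""
    return sum(1 for f in product(range(k), repeat=n) if len(set(f)) == k)

n, k = 6, 3
print(stirling2(n, k))                          # 90
print(surjection_count(n, k) // factorial(k))   # 90, since there are k! S(n,k) surjections
\end{verbatim}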

8.2. Möbius inversion. Let A be an alphabet of size k. We want to count the number of
words of length n in A up to cyclic symmetry. This means that two words are considered
the same if one is a cyclic shift of another. For example, for words of length 4, the following
4 words are all the same:
$a_1a_2a_3a_4,\quad a_2a_3a_4a_1,\quad a_3a_4a_1a_2,\quad a_4a_1a_2a_3$.
We can think of these as necklaces: the elements of A might be different beads we can put
on the necklace, but we would consider two to be the same if we can rotate one to get the
other. Naively, we might say that the number of necklaces of length n is $k^n/n$ since we have
n rotations for each necklace. However, there is a problem: the n rotations might not all be
distinct. For example, there are only 2 different rotations of 0101.
We have to separate necklaces into different groups based on their period: this is the
smallest d such that rotating d times gives the same thing. So for n = 4, we can have
necklaces of periods 1, 2, or 4, examples being 0000, 0101, 0001. There aren’t any of period
3: the period must divide the length (this isn’t entirely obvious but we will not try to prove
it).
Here’s an important observation: a word of period d only depends on its first d letters
because we will just repeat this sequence of length d exactly n/d times. Hence, as long as d
divides n, the number of words of length n and period d does not depend on n.
So it makes sense to define ω(d) to be the number of words of period d and length d (this
notation should also incorporate k, but we’ll assume k is fixed). Hence for necklaces of length
4, we get the following formula:
\[
\omega(1) + \frac{\omega(2)}{2} + \frac{\omega(4)}{4}.
\]

For general n, we would have
\[
|\text{necklaces of length } n| = \sum_{d \mid n} \frac{\omega(d)}{d}.
\]

So we want a formula for the number of words of a given period. We have another identity:
\[
k^n = |\text{words of length } n| = \sum_{d \mid n} \omega(d).
\]

This gives a system of linear equations which we can solve.


Example 8.2.1. If we want the number of words of period 4, we start with
\[
k^4 = \omega(1) + \omega(2) + \omega(4).
\]
We want to subtract off ω(2), so use the next identity
\[
k^2 = \omega(1) + \omega(2)
\]
and this tells us $\omega(4) = k^4 - k^2$.
For words of period 6, we get
\[
k^6 = \omega(1) + \omega(2) + \omega(3) + \omega(6)
\]
and then we can subtract off
\[
k^3 = \omega(1) + \omega(3)
\]
which leaves us with
\[
\omega(6) + \omega(2) = k^6 - k^3.
\]
Now let's subtract off $k^2 = \omega(1) + \omega(2)$ to get
\[
\omega(6) - \omega(1) = k^6 - k^3 - k^2.
\]
Finally, we have ω(1) = k, so we conclude that
\[
\omega(6) = k^6 - k^3 - k^2 + k. \qquad \square
\]
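The subtraction procedure in this example works for any d; here is a small Python sketch (illustrative only) that peels off the contributions of the proper divisors.
\begin{verbatim}
def omega(d, k):
    """Words of length d over a k-letter alphabet whose period is exactly d:
    start from k^d and subtract the words of each smaller period."""
    total = k ** d
    for e in range(1, d):
        if d % e == 0:
            total -= omega(e, k)
    return total

# with k = 2: omega(4) = 2^4 - 2^2 = 12 and omega(6) = 2^6 - 2^3 - 2^2 + 2 = 54
print(omega(4, 2), omega(6, 2))  # 12 54
\end{verbatim}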
As we see, doing this calculation differed a lot for 4 and 6. It would be nice to have a
general formula for the coefficients that appear.
Definition 8.2.2. Define µ(1) = 1. Otherwise, for an integer n > 1, define the Möbius
function to be
\[
\mu(n) = \begin{cases} 0 & \text{if } n \text{ is divisible by the square of a prime number,} \\ (-1)^k & \text{if } n \text{ is a product of } k \text{ distinct prime numbers.} \end{cases} \qquad \square
\]
In other words, if any prime divides n more than once, then µ(n) = 0. Otherwise, we
count how many different prime numbers divide n; µ(n) = 1 if that number is even and
µ(n) = −1 if that number is odd.
Lemma 8.2.3. If n > 1, then $\sum_{d \mid n} \mu(d) = 0$.
Proof. Let $n = p_1^{a_1} \cdots p_r^{a_r}$ be its prime factorization. The sum can be rewritten
\[
\sum_{d \mid n} \mu(d) = \sum_{\substack{0 \le e_1 \le a_1 \\ \vdots \\ 0 \le e_r \le a_r}} \mu(p_1^{e_1} \cdots p_r^{e_r}) = \sum_{\substack{0 \le e_1 \le 1 \\ \vdots \\ 0 \le e_r \le 1}} \mu(p_1^{e_1} \cdots p_r^{e_r}).
\]

The second equality holds because if any $e_i \ge 2$ then $p_1^{e_1} \cdots p_r^{e_r}$ is divisible by the square
of a prime, namely $p_i^2$. The last sum is a sum over all products of subsets of the primes
$\{p_1, \ldots, p_r\}$, so we get
\[
\sum_{S \subseteq \{p_1, \ldots, p_r\}} \mu\Big(\prod_{p \in S} p\Big) = \sum_{S \subseteq \{p_1, \ldots, p_r\}} (-1)^{|S|} = \sum_{k=0}^{r} \binom{r}{k} (-1)^k = 0.
\]
(Since n > 1, there is at least one prime in the factorization, so r > 0.) □
Theorem 8.2.4. Let α and β be two complex-valued functions on the positive integers.
(1) If
\[
\alpha(d) = \sum_{e \mid d} \beta(e)
\]
for all positive integers d, then we also have
\[
\beta(d) = \sum_{e \mid d} \mu(d/e)\alpha(e)
\]
for all positive integers d.
(2) Similarly, if
\[
\alpha(d) = \prod_{e \mid d} \beta(e)
\]
for all positive integers d and β(e) ≠ 0 for all e, then
\[
\beta(d) = \prod_{e \mid d} \alpha(e)^{\mu(d/e)}
\]
for all positive integers d.


Proof. The second part is similar to the first, so we'll just focus on the first.
Start with the right hand side and use the equation $\alpha(e) = \sum_{f \mid e} \beta(f)$:
\begin{align*}
\sum_{e \mid d} \mu(d/e)\alpha(e) &= \sum_{e \mid d} \mu(d/e) \sum_{f \mid e} \beta(f) \\
&= \sum_{f \mid d} \beta(f) \Bigg( \sum_{\substack{e \text{ divides } d \text{ and} \\ \text{is divisible by } f}} \mu(d/e) \Bigg).
\end{align*}
We have a function
\[
\varphi \colon \{e \mid e \text{ divides } d \text{ and is divisible by } f\} \to \{g \mid g \text{ divides } d/f\}
\]
defined by $\varphi(e) = d/e$, which is well-defined since $(d/f)/(d/e) = e/f$, which is an integer by
the properties of e. There is an inverse function ψ defined by $\psi(g) = d/g$, which is also well-defined:
$(d/f)/g$ is an integer, and so $d/g = f \cdot (d/f)/g$ is divisible by f, and $d/(d/g) = g$,
so it also divides d. Using this bijection, we can rewrite the last sum:
\[
= \sum_{f \mid d} \beta(f) \Bigg( \sum_{g \mid \frac{d}{f}} \mu(g) \Bigg).
\]

By Lemma 8.2.3, the inner sum is 0 if d/f > 1, so it simplifies to
\[
= \beta(d) \sum_{g \mid 1} \mu(g) = \beta(d),
\]
which is the left hand side of the identity we're trying to prove. □
Corollary 8.2.5. For any positive integer d, we have
\[
\omega(d) = \sum_{e \mid d} \mu(d/e) k^e,
\]
where the sum is over all positive integers e that divide d.


Proof. Take β = ω and $\alpha(d) = k^d$ in the previous theorem. □
Example 8.2.6. Let's apply this to the case n = 4. Then we have the following formulas:
\begin{align*}
\omega(1) &= \mu(1/1)k = k \\
\omega(2) &= \mu(2/1)k + \mu(2/2)k^2 = -k + k^2 \\
\omega(4) &= \mu(4/1)k + \mu(4/2)k^2 + \mu(4/4)k^4 = 0 - k^2 + k^4.
\end{align*}
So the number of necklaces of length 4 is $k + \frac{k^2 - k}{2} + \frac{k^4 - k^2}{4} = (k^4 + k^2 + 2k)/4$. □
Example 8.2.7. We can more easily compute words of period 6:
$\omega(6) = \mu(6/1)k + \mu(6/2)k^2 + \mu(6/3)k^3 + \mu(6/6)k^6 = k - k^2 - k^3 + k^6$. □
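Putting the pieces together, here is a Python sketch (all names are mine) that computes ω(d) from Corollary 8.2.5, counts necklaces, and checks the answer against a brute-force enumeration of rotation classes.
\begin{verbatim}
from itertools import product

def mobius(n):
    """Möbius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0            # a squared prime divides n
            result = -result
        p += 1
    return -result if n > 1 else result

def necklaces(n, k):
    """Necklace count: sum over d | n of omega(d)/d with omega(d) = sum_{e|d} mu(d/e) k^e."""
    total = 0
    for d in (d for d in range(1, n + 1) if n % d == 0):
        omega_d = sum(mobius(d // e) * k ** e for e in range(1, d + 1) if d % e == 0)
        total += omega_d // d       # omega(d) is always divisible by d
    return total

def necklaces_brute(n, k):
    """Brute force: count rotation classes of words of length n over a k-letter alphabet."""
    seen, count = set(), 0
    for w in product(range(k), repeat=n):
        if w not in seen:
            count += 1
            seen.update(w[i:] + w[:i] for i in range(n))
    return count

print(necklaces(4, 2), necklaces_brute(4, 2))  # 6 6, matching (k^4 + k^2 + 2k)/4 at k = 2
print(necklaces(6, 3), necklaces_brute(6, 3))  # 130 130
\end{verbatim}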
Remark 8.2.8. There was nothing special about the functions being complex-valued. For
(1), the important thing is that we can subtract values, and for (2), the important thing is
that we can divide values. We could say this more succinctly by saying that the functions
take their values in an abelian group (but you aren’t expected to be familiar with this
terminology). □
Here’s another instance of the Möbius function. For the rest of the notes, let i denote one
of the square roots of −1.
Recall that $e^{2\pi i} = 1$. This tells us that the n complex numbers $\{e^{2\pi i k/n} \mid k = 1, \ldots, n\}$
are all of the solutions of the equation $x^n - 1 = 0$. They are usually called the nth roots of
unity. If k and n have a common factor r, then $e^{2\pi i k/n}$ is also a root of $x^{n/r} - 1$; if k and
n are relatively prime we call $e^{2\pi i k/n}$ a primitive nth root of unity. The nth cyclotomic
polynomial can be defined as
\[
\Phi_n(x) = \prod_{k} (x - e^{2\pi i k/n})
\]
where the product is over all k such that k and n are relatively prime. Then from our
discussion, we conclude that
\[
x^n - 1 = \prod_{j \mid n} \Phi_j(x).
\]

Hence using the remark (because we can divide by polynomials in the world of rational
functions), if we define $\alpha(d) = x^d - 1$ and $\beta(d) = \Phi_d(x)$, then we conclude that
\[
\Phi_n(x) = \prod_{j \mid n} (x^j - 1)^{\mu(n/j)}.
\]

Example 8.2.9. For n = 6 we have
\[
\Phi_6(x) = \frac{(x^6 - 1)(x - 1)}{(x^2 - 1)(x^3 - 1)} = x^2 - x + 1.
\]
For n = 8 we have
\[
\Phi_8(x) = \frac{x^8 - 1}{x^4 - 1} = x^4 + 1. \qquad \square
\]
To close, here is an offbeat appearance of the ideas used in this course. Consider the
distribution of the total of rolling two 6-sided dice:
total 2 3 4 5 6 7 8 9 10 11 12
frequency 1 2 3 4 5 6 5 4 3 2 1
The question is: could we label the dice in a different way to get the same frequency? The
rules are: the labels must be positive integers, but we won’t require them to be distinct. It
turns out there is exactly one other way to do this (called Sicherman dice): the first die has
labels {1, 2, 2, 3, 3, 4} and the second has labels {1, 3, 4, 5, 6, 8}.
To derive this (and allow generalizations from 6 sides to any number of sides), we make a
few observations. First, the frequency of n in a roll is the coefficient of $x^n$ in
\[
(x + x^2 + x^3 + x^4 + x^5 + x^6)^2.
\]
Here we think of $x + \cdots + x^6$ as being the generating function for the frequencies for one
standard 6-sided die. So the question becomes if we can write the above polynomial as a
product p(x)q(x) where the coefficients of p, q are non-negative integers (the coefficient of
$x^n$ is how many times n is used as a label), they have no constant term (since 0 is not an
allowed label), and the sum of their coefficients is 6 (to account for 6 sides). Here p, q would
then be the generating functions for these non-standard dice.
We’ll use two facts:
• polynomials with integer coefficients satisfy unique factorization, i.e., have unique ex-
pressions as products of irreducible polynomials (i.e., those which cannot be factored
further into integer coefficient polynomials), and
• cyclotomic polynomials have integer coefficients and are irreducible (although they do
factor if we allow complex numbers, it’s not possible if we only use integer coefficients).
Next,
\[
x + \cdots + x^6 = x \cdot \frac{x^6 - 1}{x - 1} = x\Phi_2(x)\Phi_3(x)\Phi_6(x) = x(x + 1)(x^2 + x + 1)(x^2 - x + 1).
\]
So we get to use the last 4 irreducible polynomials, each twice, and we’re asking to rearrange
them into p(x) and q(x) with the listed properties. There isn’t much flexibility:
• Each of p, q is divisible by x since they can't have a constant term, so each one gets
a factor of x.
• The sums of the coefficients of the other three factors are 2, 3, 1. The sum of the coefficients
of p is just the product of the sums of the coefficients of its factors, and the same for q.
The only way to get 6 both times is 2 · 3 · 1 both times (the original dice) or 2 · 3
and 2 · 3 · 1 · 1.

The second way leads to
\begin{align*}
p(x) &= x(x + 1)(x^2 + x + 1) = x^4 + 2x^3 + 2x^2 + x, \\
q(x) &= x(x + 1)(x^2 + x + 1)(x^2 - x + 1)^2 = x^8 + x^6 + x^5 + x^4 + x^3 + x.
\end{align*}
Luckily, these have non-negative coefficients, so we do get another solution!
If we do this for 8-sided dice, we actually get a lot of solutions. First, the generating
function for a standard 8-sided die is
\[
x + x^2 + \cdots + x^8 = x \cdot \frac{x^8 - 1}{x - 1} = x\Phi_2(x)\Phi_4(x)\Phi_8(x) = x(x + 1)(x^2 + 1)(x^4 + 1).
\]
All of these factors have non-negative coefficients (more generally, one can show that $\Phi_{p^n}(x)$
has non-negative coefficients for any prime p) and the sum of their coefficients is 2. So any
way of choosing 3 of them (using each factor at most twice) gives a valid solution:
\begin{align*}
p(x) &= x(x + 1)^2(x^2 + 1), & q(x) &= x(x^2 + 1)(x^4 + 1)^2 \\
p(x) &= x(x + 1)^2(x^4 + 1), & q(x) &= x(x^2 + 1)^2(x^4 + 1) \\
p(x) &= x(x + 1)(x^2 + 1)^2, & q(x) &= x(x + 1)(x^4 + 1)^2.
\end{align*}
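These claims can be verified directly; here is a tiny Python sketch (illustrative only) that compares the distribution of totals for a standard pair of dice with the Sicherman pair.
\begin{verbatim}
from collections import Counter
from itertools import product

def roll_distribution(die1, die2):
    """Frequency of each total for one roll of the two labeled dice."""
    return Counter(a + b for a, b in product(die1, die2))

standard = [1, 2, 3, 4, 5, 6]
sicherman = ([1, 2, 2, 3, 3, 4], [1, 3, 4, 5, 6, 8])

print(roll_distribution(standard, standard) == roll_distribution(*sicherman))  # True
\end{verbatim}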
