
An Introduction to Algebraic

Combinatorics
[Math 701, Spring 2021 lecture notes]

[Math 531, Winter 2024 lecture notes]

Darij Grinberg
April 6, 2024 (unfinished!)

Contents
1. What is this? 6

2. Before we start... 7
2.1. What is this? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2. Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3. Notations and elementary facts . . . . . . . . . . . . . . . . . . . . 7

3. Generating functions 10
3.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1. Example 1: The Fibonacci sequence . . . . . . . . . . . . . 11
3.1.2. Example 2: Dyck words and Catalan numbers . . . . . . . 13
3.1.3. Example 3: The Vandermonde convolution . . . . . . . . . 22
3.1.4. Example 4: Solving a recurrence . . . . . . . . . . . . . . . 23
3.2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1. Reminder: Commutative rings . . . . . . . . . . . . . . . . 26
3.2.2. The definition of formal power series . . . . . . . . . . . . 33
3.2.3. The Chu–Vandermonde identity . . . . . . . . . . . . . . . 46
3.2.4. What next? . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3. Dividing FPSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.1. Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.2. Inverses in commutative rings . . . . . . . . . . . . . . . . 49
3.3.3. Inverses in K [[ x ]] . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.4. Newton’s binomial formula . . . . . . . . . . . . . . . . . . 53

Math 701 Spring 2021, version April 6, 2024 page 2

3.3.5. Dividing by x . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.6. A few lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4. Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4.2. Reminders on rings and K-algebras . . . . . . . . . . . . . 65
3.4.3. Evaluation aka substitution into polynomials . . . . . . . 67
3.5. Substitution and evaluation of power series . . . . . . . . . . . . . 69
3.5.1. Defining substitution . . . . . . . . . . . . . . . . . . . . . . 69
3.5.2. Laws of substitution . . . . . . . . . . . . . . . . . . . . . . 74
3.6. Derivatives of FPSs . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.7. Exponentials and logarithms . . . . . . . . . . . . . . . . . . . . . 89
3.7.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.7.2. The exponential and the logarithm are inverse . . . . . . . 89
3.7.3. The exponential and the logarithm of an FPS . . . . . . . 95
3.7.4. Addition to multiplication . . . . . . . . . . . . . . . . . . 97
3.7.5. The logarithmic derivative . . . . . . . . . . . . . . . . . . 102
3.8. Non-integer powers . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.8.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.8.2. The Newton binomial formula for arbitrary exponents . . 108
3.8.3. Another application . . . . . . . . . . . . . . . . . . . . . . 115
3.9. Integer compositions . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.9.1. Compositions . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.9.2. Weak compositions . . . . . . . . . . . . . . . . . . . . . . . 123
3.9.3. Weak compositions with entries from {0, 1, . . . , p − 1} . . 124
3.10. x^n-equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.11. Infinite products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.11.1. An example . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.11.2. A rigorous definition . . . . . . . . . . . . . . . . . . . . . 134
3.11.3. Why ∏_{i∈N} (1 + x^{2^i}) works and ∏_{i∈N} (1 + ix) doesn’t . . . 140
3.11.4. A general criterion for multipliability . . . . . . . . . . . . 141
3.11.5. x^n-approximators . . . . . . . . . . . . . . . . . . . . . . . 143
3.11.6. Properties of infinite products . . . . . . . . . . . . . . . . 144
3.11.7. Product rules (generalized distributive laws) . . . . . . . . 150
3.11.8. Another example . . . . . . . . . . . . . . . . . . . . . . . . 157
3.11.9. Infinite products and substitution . . . . . . . . . . . . . . 161
3.11.10. Exponentials, logarithms and infinite products . . . . . . 161
3.12. The generating function of a weighted set . . . . . . . . . . . . . . 162
3.12.1. The theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
3.12.2. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.12.3. Domino tilings . . . . . . . . . . . . . . . . . . . . . . . . . 170
3.13. Limits of FPSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
3.13.1. Stabilization of scalars . . . . . . . . . . . . . . . . . . . . . 178
3.13.2. Coefficientwise stabilization of FPSs . . . . . . . . . . . . . 180

3.13.3. Some properties of limits . . . . . . . . . . . . . . . . . . . 181


3.14. Laurent power series . . . . . . . . . . . . . . . . . . . . . . . . . . 186
3.14.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
3.14.2. The space K[[x^±]] . . . . . . . . . . . . . . . . . . . . . . 190
3.14.3. Laurent polynomials . . . . . . . . . . . . . . . . . . . . . . 191
3.14.4. Laurent polynomials are not enough . . . . . . . . . . . . 192
3.14.5. Laurent series . . . . . . . . . . . . . . . . . . . . . . . . . . 194
3.14.6. A K[x^±]-module structure on K[[x^±]] . . . . . . . . . . . 196
3.15. Multivariate FPSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

4. Integer partitions and q-binomial coefficients 200


4.1. Partition basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
4.1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
4.1.2. Simple properties of partition numbers . . . . . . . . . . . 202
4.1.3. The generating function . . . . . . . . . . . . . . . . . . . . 205
4.1.4. Odd parts and distinct parts . . . . . . . . . . . . . . . . . 209
4.1.5. Partitions with a given largest part . . . . . . . . . . . . . . 212
4.1.6. Partition number vs. sums of divisors . . . . . . . . . . . . 215
4.2. Euler’s pentagonal number theorem . . . . . . . . . . . . . . . . . 220
4.3. Jacobi’s triple product identity . . . . . . . . . . . . . . . . . . . . 222
4.3.1. The identity . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
4.3.2. Jacobi implies Euler . . . . . . . . . . . . . . . . . . . . . . 224
4.3.3. Proof of Jacobi’s triple product identity . . . . . . . . . . . 226
4.3.4. Application: A recursion for the sum of divisors . . . . . 233
4.4. q-binomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . 237
4.4.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
4.4.2. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
4.4.3. Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . 242
4.4.4. q-binomial formulas . . . . . . . . . . . . . . . . . . . . . . 254
4.4.5. Counting subspaces of vector spaces . . . . . . . . . . . . 257
4.4.6. Limits of q-binomial coefficients . . . . . . . . . . . . . . . 264
4.5. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

5. Permutations 269
5.1. Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
5.2. Transpositions and cycles . . . . . . . . . . . . . . . . . . . . . . . 273
5.2.1. Transpositions . . . . . . . . . . . . . . . . . . . . . . . . . . 273
5.2.2. Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
5.3. Inversions, length and Lehmer codes . . . . . . . . . . . . . . . . . 276
5.3.1. Inversions and lengths . . . . . . . . . . . . . . . . . . . . . 276
5.3.2. Lehmer codes . . . . . . . . . . . . . . . . . . . . . . . . . . 279
5.3.3. More about lengths and simples . . . . . . . . . . . . . . . 286
5.4. Signs of permutations . . . . . . . . . . . . . . . . . . . . . . . . . 299
5.5. The cycle decomposition . . . . . . . . . . . . . . . . . . . . . . . . 303

5.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

6. Alternating sums, signed counting and determinants 311


6.1. Cancellations in alternating sums . . . . . . . . . . . . . . . . . . . 311
6.2. The principles of inclusion and exclusion . . . . . . . . . . . . . . 326
6.2.1. The size version . . . . . . . . . . . . . . . . . . . . . . . . . 326
6.2.2. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
6.2.3. The weighted version . . . . . . . . . . . . . . . . . . . . . 337
6.2.4. Boolean Möbius inversion . . . . . . . . . . . . . . . . . . . 338
6.3. More subtractive methods . . . . . . . . . . . . . . . . . . . . . . . 346
6.4. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
6.4.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
6.4.2. Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . 354
6.4.3. Cauchy–Binet . . . . . . . . . . . . . . . . . . . . . . . . . . 363
6.4.4. det ( A + B) . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
6.4.5. Factoring the matrix . . . . . . . . . . . . . . . . . . . . . . 377
6.4.6. Factor hunting . . . . . . . . . . . . . . . . . . . . . . . . . 379
6.4.7. Laplace expansion . . . . . . . . . . . . . . . . . . . . . . . 389
6.4.8. Desnanot–Jacobi and Dodgson condensation . . . . . . . . 394
6.5. The Lindström–Gessel–Viennot lemma . . . . . . . . . . . . . . . 397
6.5.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
6.5.2. Counting paths from ( a, b) to (c, d) . . . . . . . . . . . . . 400
6.5.3. Path tuples, nipats and ipats . . . . . . . . . . . . . . . . . 403
6.5.4. The LGV lemma for two paths . . . . . . . . . . . . . . . . 405
6.5.5. The LGV lemma for k paths . . . . . . . . . . . . . . . . . . 413
6.5.6. The weighted version . . . . . . . . . . . . . . . . . . . . . 417
6.5.7. Generalization to acyclic digraphs . . . . . . . . . . . . . . 418
6.5.8. The nonpermutable case . . . . . . . . . . . . . . . . . . . . 419

7. Symmetric functions 425


7.1. Definitions and examples of symmetric polynomials . . . . . . . 426
7.2. N-partitions and monomial symmetric polynomials . . . . . . . . 441
7.3. Schur polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
7.3.1. Alternants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
7.3.2. Young diagrams and Schur polynomials . . . . . . . . . . 447
7.3.3. Skew Young diagrams and skew Schur polynomials . . . 452
7.3.4. The Bender–Knuth involutions . . . . . . . . . . . . . . . . 456
7.3.5. The Littlewood–Richardson rule . . . . . . . . . . . . . . . 465
7.3.6. The Pieri rules . . . . . . . . . . . . . . . . . . . . . . . . . 491
7.3.7. The Jacobi–Trudi identities . . . . . . . . . . . . . . . . . . 494

A. Homework exercises 499


A.1. Before we start... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
A.1.1. Binomial coefficients and elementary counting . . . . . . . 499

A.2. Generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . 502


A.2.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
A.2.2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
A.2.3. Dividing FPSs . . . . . . . . . . . . . . . . . . . . . . . . . . 505
A.2.4. Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
A.2.5. Substitution and evaluation of power series . . . . . . . . 508
A.2.6. Derivatives of FPSs . . . . . . . . . . . . . . . . . . . . . . . 508
A.2.7. Exponentials and logarithms . . . . . . . . . . . . . . . . . 512
A.2.8. Non-integer powers . . . . . . . . . . . . . . . . . . . . . . 513
A.2.9. Integer compositions . . . . . . . . . . . . . . . . . . . . . . 514
A.2.10. x^n-equivalence . . . . . . . . . . . . . . . . . . . . . . . 515
A.2.11. Infinite products . . . . . . . . . . . . . . . . . . . . . . . . 515
A.2.12. The generating function of a weighted set . . . . . . . . . 516
A.2.13. Limits of FPSs . . . . . . . . . . . . . . . . . . . . . . . . . . 518
A.2.14. Laurent power series . . . . . . . . . . . . . . . . . . . . . . 519
A.2.15. Multivariate FPSs . . . . . . . . . . . . . . . . . . . . . . . . 522
A.3. Integer partitions and q-binomial coefficients . . . . . . . . . . . . 525
A.3.1. Partition basics . . . . . . . . . . . . . . . . . . . . . . . . . 525
A.3.2. Euler’s pentagonal number theorem . . . . . . . . . . . . . 530
A.3.3. Jacobi’s triple product identity . . . . . . . . . . . . . . . . 530
A.3.4. q-binomial coefficients . . . . . . . . . . . . . . . . . . . . . 531
A.4. Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
A.4.1. Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . 539
A.4.2. Transpositions and cycles . . . . . . . . . . . . . . . . . . . 539
A.4.3. Inversions, length and Lehmer codes . . . . . . . . . . . . 541
A.4.4. V-permutations . . . . . . . . . . . . . . . . . . . . . . . . . 541
A.4.5. Fixed points . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
A.4.6. More on inversions . . . . . . . . . . . . . . . . . . . . . . . 543
A.4.7. When transpositions generate SX . . . . . . . . . . . . . . 544
A.4.8. Pattern avoidance . . . . . . . . . . . . . . . . . . . . . . . . 544
A.4.9. The cycle decomposition . . . . . . . . . . . . . . . . . . . 550
A.4.10. Reduced words . . . . . . . . . . . . . . . . . . . . . . . . . 553
A.4.11. Descents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
A.5. Alternating sums, signed counting and determinants . . . . . . . 560
A.5.1. Cancellations in alternating sums . . . . . . . . . . . . . . 561
A.5.2. The principles of inclusion and exclusion . . . . . . . . . . 563
A.5.3. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . 570
A.5.4. The Lindström–Gessel–Viennot lemma . . . . . . . . . . . 582
A.6. Symmetric functions . . . . . . . . . . . . . . . . . . . . . . . . . . 586
A.6.1. Definitions and examples of symmetric polynomials . . . 586
A.6.2. N-partitions and monomial symmetric polynomials . . . 594
A.6.3. Schur polynomials . . . . . . . . . . . . . . . . . . . . . . . 594

B. Omitted details and proofs 601


B.1. x^n-equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 601
B.2. Infinite products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
B.3. Domino tilings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
B.4. Limits of FPSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
B.5. Laurent power series . . . . . . . . . . . . . . . . . . . . . . . . . . 663
B.6. Cancellations in alternating sums . . . . . . . . . . . . . . . . . . . 666
B.7. Determinants in combinatorics . . . . . . . . . . . . . . . . . . . . 668
B.8. Definitions and examples of symmetric polynomials . . . . . . . 670
B.9. N-partitions and monomial symmetric polynomials . . . . . . . . 671
B.10. Schur polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 672

This work is licensed under a Creative Commons “CC0 1.0 Universal” license.

1. What is this?
These are the notes for an introductory course on algebraic combinatorics held
in the Spring Quarter 2021 at Drexel University1 . The topics covered are

• formal power series and their use as generating functions (Chapter 3);

• integer partitions and q-binomial coefficients (Chapter 4);

• permutations and their lengths, inversions and cycles (Chapter 5);

• alternating sums, the use of sign-reversing involutions and the combina-


torial view on determinants (Chapter 6);

• the basics of symmetric polynomials, particularly Schur polynomials (Chap-


ter 7).

Most (but not all) of these chapters are in a finished state (the final few sec-
tions of Chapter 3 need details). However, they are improvable both in detail
and in coverage. They might grow in later iterations of this course (in particular,
various further topics could get included). Errors and confusions will be fixed
whenever I become aware of them (any assistance is greatly appreciated!2 ).
Exercises of varying difficulty appear at the end of the text (Chapter A).

1 The website of this course is https://www.cip.ifi.lmu.de/~grinberg/t/21s/
2 Please send comments to darijgrinberg@gmail.com
  Corrections have been received so far from Mikey Becht – thank you!

Acknowledgments
Thanks to the students in my Math 701 course for what was in essence an
alpha-test of these notes. Some exercises have been adapted from collections by
Richard P. Stanley, Martin Aigner, Donald Knuth, Miklós Bóna, Mark Wildon
and Igor Pak. A math.stackexchange user named Mindlack has contributed the
proof of Proposition 3.11.30. Andrew Solomon has reported typos.

2. Before we start...
2.1. What is this?
This is a course on algebraic combinatorics. This subject can be viewed either
as a continuation of enumerative combinatorics by other means (specifically, al-
gebraic ones), or as the part of algebra where one studies concrete polynomials
(more precisely, families of polynomials). For example, the Schur polynomi-
als can be viewed on the one hand as a tool for enumerating certain kinds of
tableaux (essentially, tabular arrangements of numbers that increase along rows
and columns), while on the other hand they form a family of polynomials with
a myriad surprising properties, generalizing (e.g.) the Vandermonde determi-
nant. I hope to cover both aspects of the subject to a reasonable amount in this
course.

2.2. Prerequisites
To understand this course, you are assumed to speak the language of rings
and fields (we will mostly need the basic properties of polynomials and linear
maps; we will define what we need about power series), and to have some basic
knowledge of enumerative combinatorics (see below). My notes [23wa], and
the references I gave therein, can help refresh your knowledge of the former.
As for the latter, there are dozens of sources available (I made a list at
https://math.stackexchange.com/a/1454420/ , focussing mostly on texts available
online), including my own notes [22fco].

2.3. Notations and elementary facts


We will use the following notations and conventions:

• The symbol N will denote the set {0, 1, 2, 3, . . .} of nonnegative integers.

• The size (i.e., cardinality) of a set A will be denoted by | A|.

• The symbol “#” means “number”. For example, the size | A| of a set A is
the # of elements of A.

We will need some basics from enumerative combinatorics (see, e.g., [Newste19,
§8.1] for details, and [19fco, Chapters 1 and 2] for more details):

• addition principle = sum rule: If A and B are two disjoint sets, then
| A ∪ B | = | A | + | B |.
• multiplication principle = product rule: If A and B are any two sets, then
| A × B | = | A | · | B |.
• bijection principle: There is a bijection (= bijective map = invertible map
= one-to-one correspondence) between two sets X and Y if and only if
| X | = |Y | .
 
• A set with n elements has 2^n subsets, and has \binom{n}{k} size-k subsets for any
k ∈ R.

• A set with n elements has n! permutations (= bijective maps from this set
to itself).

• dependent product rule: Consider a situation in which you have to make


n choices (sequentially). Assume that you have a1 options available in
choice 1, and then (after making choice 1) you have a2 options available in
choice 2 (no matter which option you chose in choice 1), and then (after
both choices 1 and 2) you have a3 options available in choice 3 (no matter
which options you chose in choices 1 and 2), and so on. Then, the total # of
ways to make all n choices is a1 a2 · · · an . (This is formalized in [Newste19,
Theorem 8.1.19].)
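These principles can all be sanity-checked on small sets by brute-force enumeration. Here is a quick Python illustration (not part of the notes; the sets A and B are arbitrary examples):

```python
from itertools import combinations, permutations, product
from math import factorial

A = {1, 2, 3}
B = {"a", "b"}  # disjoint from A

# Sum rule: |A ∪ B| = |A| + |B| for disjoint A and B.
assert len(A | B) == len(A) + len(B)

# Product rule: |A × B| = |A| · |B|.
assert len(list(product(A, B))) == len(A) * len(B)

# A set with n elements has 2^n subsets ...
n = len(A)
subsets = [S for k in range(n + 1) for S in combinations(A, k)]
assert len(subsets) == 2 ** n

# ... and n! permutations (bijective maps to itself).
assert len(list(permutations(A))) == factorial(n)
```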

A few words about binomial coefficients are in order:

Definition 2.3.1. For any numbers n and k, we set

\binom{n}{k} = ( n (n − 1) (n − 2) · · · · · (n − k + 1) ) / k!   if k ∈ N,   and   \binom{n}{k} = 0   else.   (1)

Note that “numbers” is to be understood fairly liberally here. In particular,
n can be any integer, rational, real or complex number (or, more generally,
any element in a Q-algebra), whereas k can be anything (although the only
nonzero values of \binom{n}{k} will be achieved for k ∈ N, by the above definition).

Example 2.3.2. For any k ∈ N, we have

\binom{−1}{k} = ( (−1) (−1 − 1) (−1 − 2) · · · · · (−1 − k + 1) ) / k!
            = ( (−1) (−2) (−3) · · · · · (−k) ) / k! = ( (−1)^k k! ) / k! = (−1)^k.

If n, k ∈ N and n ≥ k, then

\binom{n}{k} = n! / ( k! (n − k)! ).   (2)

But this formula only applies to the case when n, k ∈ N and n ≥ k. Our above
definition is more general than it. The combinatorial meaning of the binomial
coefficient \binom{n}{k} (as the # of k-element subsets of a given n-element set) also
cannot be used for negative or non-integer values of n.

Example 2.3.3. Let n ∈ N. Then, \binom{2n}{n} = ( 1 · 3 · 5 · · · · · (2n − 1) ) / n! · 2^n.

Proof of Example 2.3.3. We have

(2n)! = 1 · 2 · · · · · (2n)
      = (1 · 3 · 5 · · · · · (2n − 1)) · (2 · 4 · 6 · · · · · (2n))
      = (1 · 3 · 5 · · · · · (2n − 1)) · 2^n · (1 · 2 · · · · · n)
           (since 2 · 4 · 6 · · · · · (2n) = (2 · 1) · (2 · 2) · · · · · (2 · n) = 2^n (1 · 2 · · · · · n))
      = (1 · 3 · 5 · · · · · (2n − 1)) · 2^n · n!.

Now, (2) yields

\binom{2n}{n} = (2n)! / ( n! (2n − n)! ) = (2n)! / ( n! · n! )
            = ( (1 · 3 · 5 · · · · · (2n − 1)) · 2^n · n! ) / ( n! · n! )
                 (since (2n)! = (1 · 3 · 5 · · · · · (2n − 1)) · 2^n · n!)
            = ( (1 · 3 · 5 · · · · · (2n − 1)) · 2^n ) / n!
            = ( 1 · 3 · 5 · · · · · (2n − 1) ) / n! · 2^n.

This proves Example 2.3.3.
Entire books have been written about binomial coefficients and their proper-
ties. See [Spivey19] for a recent text (and [GrKnPa94, Chapter 5] and [Grinbe15,
Chapter 3] and [Knuth1, §1.2.6] and [Wildon19, Chapter 2] for elementary in-
troductions). Here are two more basic facts that we will need ([19fco, Theorem
1.3.8] and [19fco, Proposition 1.3.6], respectively):

Proposition 2.3.4 (Pascal’s identity, aka recurrence of the binomial coefficients).
We have

\binom{m}{n} = \binom{m − 1}{n − 1} + \binom{m − 1}{n}   (3)

for any numbers m and n.

Proposition 2.3.5. Let m, n ∈ N satisfy m < n. Then, \binom{m}{n} = 0.

Note that Proposition 2.3.5 really requires m ∈ N. For example, 1.5 < 2 but
\binom{1.5}{2} = 0.375 ≠ 0.
Yet another useful property of the binomial coefficients is the following ([19fco,
Theorem 1.3.11]):

Theorem 2.3.6 (Symmetry of the binomial coefficients). Let n ∈ N and k ∈ R.
Then,

\binom{n}{k} = \binom{n}{n − k}.

Note the n ∈ N requirement. Convince yourself that Theorem 2.3.6 would
fail for n = −1 and k = 0.
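For readers who like to experiment, Definition 2.3.1 is straightforward to implement. The following Python sketch (the function name binom is our own; it is not part of the notes) checks Example 2.3.2, Pascal’s identity (3), and the two counterexamples just mentioned, using exact rational arithmetic:

```python
from fractions import Fraction
from math import factorial

def binom(n, k):
    """The binomial coefficient of Definition 2.3.1:
    n(n-1)...(n-k+1) / k! if k is a nonnegative integer, and 0 otherwise.
    Here n may be any rational number (int or Fraction)."""
    if not (isinstance(k, int) and k >= 0):
        return Fraction(0)
    numerator = Fraction(1)
    for i in range(k):  # the falling product n (n-1) ... (n-k+1)
        numerator *= Fraction(n) - i
    return numerator / factorial(k)

# Example 2.3.2: binom(-1, k) = (-1)^k for all k in N.
assert all(binom(-1, k) == (-1) ** k for k in range(12))

# Pascal's identity (3) holds for arbitrary m, including negative
# and fractional values.
for m in [Fraction(-7, 2), -3, 0, Fraction(5, 3), 10]:
    for n in range(8):
        assert binom(m, n) == binom(m - 1, n - 1) + binom(m - 1, n)

# Proposition 2.3.5 really needs m in N: binom(1.5, 2) = 0.375 != 0.
assert binom(Fraction(3, 2), 2) == Fraction(3, 8)

# Theorem 2.3.6 (symmetry) really needs n in N:
# binom(-1, 0) = 1, but binom(-1, -1 - 0) = 0.
assert binom(-1, 0) == 1
assert binom(-1, -1) == 0
```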

3. Generating functions
In this first chapter, we will discuss generating functions: first informally, then
on a rigorous footing. You may have seen generating functions already, as their
usefulness extends far beyond combinatorics; but they are so important to this
course that they are worth covering twice in case of doubt.
Rigorous introductions to generating functions (and formal power series in
general) can also be found in [Loehr11, Chapter 7 (in the 1st edition)], in
[Henric74, Chapter 1], in [Sambal22], and (to some extent) in [19s, Chapter
7].3 A quick overview is given in [Niven69], and many applications are found
in [Wilf09]. There are furthermore numerous books that explore enumerative
combinatorics through the lens of generating functions ([GouJac83], [Wagner08],
[Lando03] and others).

3 Bourbaki’s [Bourba03, §IV.4] contains what might be the most rigorous and honest treatment
of formal power series available in the literature; however, it is not the most readable source,
as the notation is dense and heavily relies on other volumes by the same author.

3.1. Examples
Let me first show what generating functions are good for. Then, starting in the
next section, I will explain how to rigorously define them. For now, I will work
informally; please suspend your disbelief until the next section.
The idea behind generating functions is easy: Any sequence ( a0 , a1 , a2 , . . .) of
numbers gives rise to a “power series” a0 + a1 x + a2 x2 + · · · , which is called
the generating function of this sequence. This “power series” is an infinite sum
(an “infinite polynomial” in an indeterminate x), so it is not immediately clear
what it means and what we are allowed to do with it; but before we answer
such questions, let us first play around with these power series and hope for
the best. The following four examples show how they can be useful.

3.1.1. Example 1: The Fibonacci sequence


Example 1. The Fibonacci sequence is the sequence (f_0, f_1, f_2, . . .) of integers
defined recursively by

f_0 = 0,   f_1 = 1,   f_n = f_{n−1} + f_{n−2} for each n ≥ 2.

Its entries are known as the Fibonacci numbers. Here are the first few of them:

  n   | 0  1  2  3  4  5  6  7   8   9   10  11
  f_n | 0  1  1  2  3  5  8  13  21  34  55  89
Let us see what we can learn about this sequence by considering its generating
function

F(x) := f_0 + f_1 x + f_2 x^2 + f_3 x^3 + · · ·
      = 0 + 1x + 1x^2 + 2x^3 + 3x^4 + 5x^5 + · · · .

We have

F(x) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + f_4 x^4 + · · ·
     = 0 + 1x + (f_1 + f_0) x^2 + (f_2 + f_1) x^3 + (f_3 + f_2) x^4 + · · ·
          (since f_0 = 0 and f_1 = 1 and f_n = f_{n−1} + f_{n−2} for each n ≥ 2)
     = x + (f_1 x^2 + f_2 x^3 + f_3 x^4 + · · ·) + (f_0 x^2 + f_1 x^3 + f_2 x^4 + · · ·)
          (here we are hoping that this manipulation of infinite sums is indeed legitimate)
     = x + x (f_1 x + f_2 x^2 + f_3 x^3 + · · ·) + x^2 (f_0 + f_1 x + f_2 x^2 + f_3 x^3 + · · ·)
     = x + x (F(x) − f_0) + x^2 F(x)
     = x + x F(x) + x^2 F(x)          (since f_0 = 0)
     = x + (x + x^2) F(x).

Solving this equation for F(x) (assuming that we are allowed to divide by
1 − x − x^2), we get

F(x) = x / (1 − x − x^2) = x / ( (1 − φ₊x) (1 − φ₋x) ),

where φ₊ = (1 + √5)/2 and φ₋ = (1 − √5)/2 are the two roots of the quadratic
polynomial x^2 − x − 1, so that 1 − x − x^2 = (1 − φ₊x) (1 − φ₋x) (note that
φ₊ and φ₋ are sometimes known as the “golden ratios”; we have φ₊ ≈ 1.618 and
φ₋ ≈ −0.618). Hence,

F(x) = x / ( (1 − φ₊x) (1 − φ₋x) )
     = (1/√5) · 1/(1 − φ₊x) − (1/√5) · 1/(1 − φ₋x)   (4)

(by partial fraction decomposition).
Now, what are the coefficients of the power series 1/(1 − αx) for an α ∈ C? Let
me first answer this question for α = 1. Namely, I claim that

1/(1 − x) = 1 + x + x^2 + x^3 + · · · .   (5)

Indeed, this follows by observing that

(1 − x) (1 + x + x^2 + x^3 + · · ·)
= (1 + x + x^2 + x^3 + · · ·) − x (1 + x + x^2 + x^3 + · · ·)
= (1 + x + x^2 + x^3 + · · ·) − (x + x^2 + x^3 + x^4 + · · ·)
= 1

(again, we are hoping that these manipulations of infinite sums are allowed).
Note that the equality (5) is a version of the geometric series formula familiar from
real analysis. Now, for any α ∈ C, we can substitute αx for x in the equality (5),
and thus obtain

1/(1 − αx) = 1 + αx + (αx)^2 + (αx)^3 + · · ·
           = 1 + αx + α^2 x^2 + α^3 x^3 + · · · .   (6)
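Identities like (5) and (6) can be verified to any finite order by honest truncated multiplication of coefficient lists. The following Python sketch (the truncation order N and the helper mul are our own choices, not part of the notes) does exactly that:

```python
N = 12  # truncation order: we keep coefficients of x^0, ..., x^(N-1)

def mul(a, b):
    """Multiply two truncated power series, given as coefficient lists
    (index = exponent), discarding all terms x^N and beyond."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

one_minus_x = [1, -1] + [0] * (N - 2)
geom = [1] * N  # 1 + x + x^2 + ... , truncated

# (1 - x)(1 + x + x^2 + ...) = 1 up to the truncation order -- this is (5):
assert mul(one_minus_x, geom) == [1] + [0] * (N - 1)

# Substituting alpha*x into (5): 1/(1 - alpha x) has coefficients alpha^k,
# as in (6).  Here alpha = 3 is an arbitrary test value:
alpha = 3
geom_alpha = [alpha ** k for k in range(N)]
assert mul([1, -alpha] + [0] * (N - 2), geom_alpha) == [1] + [0] * (N - 1)
```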

Hence, our above formula (4) becomes

F(x) = (1/√5) · 1/(1 − φ₊x) − (1/√5) · 1/(1 − φ₋x)
     = (1/√5) · (1 + φ₊x + φ₊^2 x^2 + φ₊^3 x^3 + · · ·) − (1/√5) · (1 + φ₋x + φ₋^2 x^2 + φ₋^3 x^3 + · · ·)
          (by (6), applied to α = φ₊ and again to α = φ₋)
     = (1/√5) ∑_{k≥0} φ₊^k x^k − (1/√5) ∑_{k≥0} φ₋^k x^k
     = ∑_{k≥0} ( (1/√5) · φ₊^k − (1/√5) · φ₋^k ) x^k.

Now, for any given n ∈ N, the coefficient of x^n in the power series on the
left hand side of this equality is f_n (since F(x) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + · · ·),
whereas the coefficient of x^n on the right hand side is clearly (1/√5) · φ₊^n − (1/√5) · φ₋^n.
Thus, comparing coefficients before x^n, we obtain

f_n = (1/√5) · φ₊^n − (1/√5) · φ₋^n
          (assuming that “comparing coefficients” is allowed, i.e.,
          that equal power series really have equal coefficients)
    = (1/√5) · ( (1 + √5)/2 )^n − (1/√5) · ( (1 − √5)/2 )^n

for any n ∈ N. This formula is known as Binet’s formula. It has many con-
sequences; for example, it implies easily that lim_{n→∞} f_{n+1}/f_n = φ₊ =
(1 + √5)/2 ≈ 1.618 . . .. Thus, f_n ∼ (1/√5) · φ₊^n in the asymptotical sense.
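Binet’s formula is easy to test numerically against the recursive definition. Here is a small Python check (not part of the notes; it uses floating-point arithmetic, so the right-hand side is rounded to the nearest integer):

```python
def fib(n):
    """f_n via the recursion f_0 = 0, f_1 = 1, f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

sqrt5 = 5 ** 0.5
phi_plus = (1 + sqrt5) / 2
phi_minus = (1 - sqrt5) / 2

# Binet's formula, rounded to absorb floating-point error:
for n in range(40):
    assert round((phi_plus ** n - phi_minus ** n) / sqrt5) == fib(n)

# The ratio f_{n+1} / f_n converges to phi_plus ≈ 1.618:
assert abs(fib(31) / fib(30) - phi_plus) < 1e-9
```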

3.1.2. Example 2: Dyck words and Catalan numbers


Before the next example, let us address a warmup question: What is the number
of 2n-tuples that contain n entries equal to 0 and n entries equal to 1 ?
(For example, for n = 2, these 2n-tuples are (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1),
(0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1), so there are 6 of them.)
Answer: The number is \binom{2n}{n}, since choosing a 2n-tuple that contains n
entries equal to 0 and n entries equal to 1 is tantamount to choosing an n-
element subset of {1, 2, . . . , 2n} (and we know that there are \binom{2n}{n} ways to
choose the latter).
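This count is easy to confirm by brute force for small n. In the following Python sketch (the function name balanced_tuples is ours, not from the notes), we enumerate all 2n-tuples of 0’s and 1’s and keep those with exactly n ones:

```python
from itertools import product
from math import comb

def balanced_tuples(n):
    """All 2n-tuples of 0's and 1's with exactly n entries equal to 1
    (and hence n entries equal to 0)."""
    return [t for t in product((0, 1), repeat=2 * n) if sum(t) == n]

# For n = 2 we recover the six tuples listed above.
assert len(balanced_tuples(2)) == 6

# In general, the count is binom(2n, n):
for n in range(6):
    assert len(balanced_tuples(n)) == comb(2 * n, n)
```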

Example 2. A Dyck word of length 2n (where n ∈ N) is a 2n-tuple that


contains n entries equal to 0 and n entries equal to 1, and has the additional
property that for each k, we have

(# of 0’s among its first k entries) ≤ (# of 1’s among its first k entries).   (7)

(The symbol “#” means “number”.)


Some examples: The tuples

(1, 0, 1, 0) , (1, 1, 0, 0) , (1, 1, 0, 1, 0, 0) , () , (1, 0)

are Dyck words. The tuples

(0, 1, 1, 0) , (1, 0, 0, 1) , (1, 1, 0) , (1) , (1, 1, 1, 0)

are not Dyck words.


A Dyck path of length 2n is a path from the point (0, 0) to the point (2n, 0)
in the Cartesian plane that moves only using “NE-steps” (i.e., steps of the
form ( x, y) → ( x + 1, y + 1)) and “SE-steps” (i.e., steps of the form ( x, y) →
( x + 1, y − 1)) and never falls below the x-axis (i.e., does not contain any point
( x, y) with y < 0).
Examples: For n = 2, the two Dyck paths from (0, 0) to (4, 0) are: [figures not
reproduced in this extraction]

A Dyck path can be viewed as the “skyline” of a “mountain range”. For
example: [figure not reproduced in this extraction: a Dyck path and its
“mountain range”]

The names “NE-steps” and “SE-steps” in the definition of a Dyck path refer
to compass directions: If we treat the Cartesian plane as a map with the x-axis
directed eastwards and the y-axis directed northwards, then an NE-step moves
to the northeast, and an SE-step moves to the southeast.
Note that any NE-step and any SE-step increases the x-coordinate by 1 (that
is, the step goes from a point with x-coordinate k to a point with x-coordinate
k + 1). Thus, any Dyck path from (0, 0) to (2n, 0) has precisely 2n steps. Of these
2n steps, exactly n are NE-steps while the remaining n are SE-steps (because
any NE-step increases the y-coordinate by 1, while any SE-step decreases the
y-coordinate by 1). Since a Dyck path must never fall below the x-axis, we see
that the number of SE-steps up to any given point can never be larger than the
number of NE-steps up to this point. But this is exactly the condition (7) from
the definition of a Dyck word, except that we are talking about NE-steps and
SE-steps instead of 1’s and 0’s. Thus, there is a simple bijection between Dyck
words of length 2n and Dyck paths (0, 0) → (2n, 0):
• send each 1 in the Dyck word to a NE-step in the Dyck path;
• send each 0 in the Dyck word to a SE-step in the Dyck path.
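This bijection is easy to make concrete: recording the height after each step turns a Dyck word into the sequence of y-coordinates of the corresponding Dyck path. A short Python sketch (the function name word_to_path is ours, not from the notes):

```python
def word_to_path(word):
    """Map a Dyck word (tuple of 0's and 1's) to the y-coordinates of the
    lattice points of the corresponding path: 1 -> NE-step, 0 -> SE-step."""
    heights = [0]
    for entry in word:
        heights.append(heights[-1] + (1 if entry == 1 else -1))
    return heights

# The Dyck word 110010 yields a path that ends at height 0
# and never dips below the x-axis:
path = word_to_path((1, 1, 0, 0, 1, 0))
assert path == [0, 1, 2, 1, 0, 1, 0]
assert path[-1] == 0 and min(path) >= 0

# A non-Dyck word such as 1001 fails the "never below the x-axis" test:
assert min(word_to_path((1, 0, 0, 1))) < 0
```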
So the # of Dyck words (of length 2n) equals the # of Dyck paths (from (0, 0)
to (2n, 0)). But what is this number?
Example: For n = 3, this number is 5. Indeed, here are all Dyck paths from
(0, 0) to (6, 0), and their corresponding Dyck words:
Dyck path    Dyck word
[the five path figures are omitted here]
             (1, 1, 0, 0, 1, 0)
             (1, 1, 1, 0, 0, 0)
             (1, 0, 1, 0, 1, 0)
             (1, 0, 1, 1, 0, 0)
             (1, 1, 0, 1, 0, 0)
(We will soon stop writing the commas and parentheses when writing down
words. For example, the word (1, 1, 0, 0, 1, 0) will just become 110010.)
Back to the general question.
For each n ∈ N, we define

    c_n := (# of Dyck paths (0, 0) → (2n, 0))
         = (# of Dyck words of length 2n)        (as we have seen above).
Then, c0 = 1 (since the only Dyck path from (0, 0) to (0, 0) is the trivial path)
and c1 = 1 and c2 = 2 and c3 = 5 and c4 = 14 and so on. These numbers cn
are known as the Catalan numbers. Entire books have been written about them,
such as [Stanle15].
Let us first find a recurrence relation for cn . The argument below is best
understood by following an example; namely, consider the following Dyck path
from (0, 0) to (16, 0) (so the corresponding n is 8):

[Figure of a Dyck path from (0, 0) to (16, 0) omitted; its first return to the x-axis is at (6, 0).]
Fix a positive integer n. If D is a Dyck path from (0, 0) to (2n, 0), then the
first return of D (this is short for “first return of D to the x-axis”) shall mean
the first point on D that lies on the x-axis but is not the origin (i.e., that has the
form (i, 0) for some integer i > 0). For instance, in the example that we just
gave, the first return is the point (6, 0). If D is a Dyck path from (0, 0) to (2n, 0),
and if (i, 0) is its first return, then i is even4 , and therefore we have i = 2k for
some k ∈ {1, 2, . . . , n}. Hence, for any Dyck path from (0, 0) to (2n, 0), the first
return is a point of the form (2k, 0) for some k ∈ {1, 2, . . . , n}. Thus,

    (# of Dyck paths from (0, 0) to (2n, 0))
      = ∑_{k=1}^{n} (# of Dyck paths from (0, 0) to (2n, 0) whose first return is (2k, 0)) .

Now, let us fix some k ∈ {1, 2, . . . , n}. We shall compute the # of Dyck paths
from (0, 0) to (2n, 0) whose first return is (2k, 0). Any such Dyck path has a
natural “two-part” structure: Its first 2k steps form a path from (0, 0) to (2k, 0),
while its last (i.e., remaining) 2 (n − k ) steps form a path from (2k, 0) to (2n, 0).
Thus, in order to construct such a path, we
4 Proof. The number of NE-steps before the first return must equal the number of SE-steps before the first return (because these steps have altogether taken us from the origin to a point on the x-axis, and thus must have increased and decreased the y-coordinate an equal number of times). This shows that the total number of steps before the first return is even. In other words, i is even (because the total number of steps before the first return is i).
• first choose its first 2k steps: They have to form a Dyck path from (0, 0) to
(2k, 0) that never returns to the x-axis until (2k, 0). Hence, they begin with
an NE-step and end with an SE-step (since any other steps here would cause
the path to fall below the x-axis). Between these two steps, the remaining
2k − 2 = 2 (k − 1) steps form a path that not only never falls below the
x-axis, but also never touches it (since (2k, 0) is the first return of our
Dyck path, so that our Dyck path does not touch the x-axis between (0, 0)
and (2k, 0)). In other words, these 2 (k − 1) steps form a path from (1, 1)
to (2k − 1, 1) that never falls below the y = 1 line (i.e., below the x-axis
shifted by 1 upwards). This means that it is a Dyck path from (0, 0) to
(2 (k − 1) , 0) (shifted by (1, 1)). Thus, there are ck−1 possibilities for this
path. Hence, there are ck−1 choices for the first 2k steps of our Dyck path.

• then choose its last 2 (n − k) steps: They have to form a path from (2k, 0) to (2n, 0) that never falls below the x-axis (but is allowed to touch it any number of times). Thus, they form a Dyck path from (0, 0) to (2 (n − k) , 0) (shifted by (2k, 0)). So there are c_{n−k} choices for these last 2 (n − k) steps.

Thus, there are c_{k−1} c_{n−k} many options for such a Dyck path from (0, 0) to (2n, 0) (since choosing the first 2k steps and choosing the last 2 (n − k) steps are independent).
Let me illustrate this reasoning on the Dyck path from (0, 0) to (16, 0) shown
above. This Dyck path has first return (6, 0); thus, the corresponding k is 3.
Since this Dyck path does not return to the x-axis before (2k, 0) = (6, 0), its first
2k steps stay above (or on) the yellow trapezoid shown here: [figure omitted].

In particular, the first and the last of these 2k steps are uniquely determined, while the steps between them form a diagonally shifted Dyck path that is filled in green here: [figure omitted].

Finally, the last 2 (n − k) steps form a horizontally shifted Dyck path that is filled in purple here: [figure omitted].
Our above argument shows that there are c_{k−1} choices for the green Dyck path and c_{n−k} choices for the purple Dyck path, therefore c_{k−1} c_{n−k} options in total.
Forget that we fixed k. Our counting argument above shows that

    (# of Dyck paths from (0, 0) to (2n, 0) whose first return is (2k, 0)) = c_{k−1} c_{n−k}        (8)
for each k ∈ {1, 2, . . . , n}. Now,

    c_n = (# of Dyck paths from (0, 0) to (2n, 0))
        = ∑_{k=1}^{n} (# of Dyck paths from (0, 0) to (2n, 0) whose first return is (2k, 0))
        = ∑_{k=1}^{n} c_{k−1} c_{n−k}        (by (8))
        = c_0 c_{n−1} + c_1 c_{n−2} + c_2 c_{n−3} + · · · + c_{n−1} c_0 .
This is a recurrence equation for c_n. Combining it with c_0 = 1, we can use it to compute any value of c_n recursively. Let us, however, try to digest it using generating functions!
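The recurrence, together with c_0 = 1, already determines all the c_n; here is a short Python sketch of this tabulation (the function name is ours):

```python
def catalan_by_recurrence(N):
    """Tabulate c_0, c_1, ..., c_N from c_0 = 1 and
    c_n = c_0 c_{n-1} + c_1 c_{n-2} + ... + c_{n-1} c_0 for n > 0."""
    c = [1]  # c_0 = 1
    for n in range(1, N + 1):
        c.append(sum(c[k - 1] * c[n - k] for k in range(1, n + 1)))
    return c

print(catalan_by_recurrence(8))  # [1, 1, 2, 5, 14, 42, 132, 429, 1430]
```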
Let
    C(x) := ∑_{n≥0} c_n x^n = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + · · · .

Thus,

    C(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + · · ·
         = 1 + (c_0 c_0) x + (c_0 c_1 + c_1 c_0) x^2 + (c_0 c_2 + c_1 c_1 + c_2 c_0) x^3 + · · ·
               (since c_0 = 1 and c_n = c_0 c_{n−1} + c_1 c_{n−2} + c_2 c_{n−3} + · · · + c_{n−1} c_0 for each n > 0)
         = 1 + x ((c_0 c_0) + (c_0 c_1 + c_1 c_0) x + (c_0 c_2 + c_1 c_1 + c_2 c_0) x^2 + · · ·)
         = 1 + x (c_0 + c_1 x + c_2 x^2 + · · ·)^2
               (because if we multiply out (c_0 + c_1 x + c_2 x^2 + · · ·)^2 and collect like powers of x,
                we obtain exactly (c_0 c_0) + (c_0 c_1 + c_1 c_0) x + (c_0 c_2 + c_1 c_1 + c_2 c_0) x^2 + · · ·)
         = 1 + x (C(x))^2 .

This is a quadratic equation in C(x). Let us solve it by the quadratic formula (assuming for now that this is allowed – i.e., that the quadratic formula really does apply to our “power series”, whatever they are). Thus, we get

    C(x) = (1 ± √(1 − 4x)) / (2x) .        (9)

The ± sign here cannot be a + sign, because if it was a +, then the power series on top of the fraction would not be divisible by 2x (as its constant term would be 2 and thus nonzero5). Thus, (9) becomes

    C(x) = (1 − √(1 − 4x)) / (2x) = (1/(2x)) (1 − (1 − 4x)^{1/2}) .        (10)
How do we find the coefficients of the power series (1 − 4x)^{1/2}?
For each n ∈ N, the binomial formula yields

    (1 + x)^n = ∑_{k=0}^{n} \binom{n}{k} x^k = ∑_{k≥0} \binom{n}{k} x^k .        (11)

(Here, we have replaced the ∑_{k=0}^{n} sign by a ∑_{k≥0} sign, thus extending the summation from all k ∈ {0, 1, . . . , n} to all k ∈ N. This does not change the value of the sum, since all the newly appearing addends are 0, as you can easily check.)
5 If you find this unconvincing, here is a cleaner way to argue this: Multiplying the equality (9) by 2x, we obtain 2x C(x) = 1 ± √(1 − 4x). The left hand side of this equality has constant term 0, but the right hand side has constant term 1 ± 1 (here, we are making the assumption that √(1 − 4x) is a power series with constant term 1; this is plausible because √(1 − 4 · 0) = 1 and will also be justified further below). Thus, 0 = 1 ± 1; this shows that the ± sign is a − sign.

Let us pretend that the formula (11) holds not only for n ∈ N, but also for n = 1/2. That is, we have

    (1 + x)^{1/2} = ∑_{k≥0} \binom{1/2}{k} x^k .        (12)

Now, substitute −4x for x in this equality (here we are making the rather plausible assumption that we can substitute −4x for x in a power series); then, we get

    (1 − 4x)^{1/2} = ∑_{k≥0} \binom{1/2}{k} (−4x)^k = ∑_{k≥0} \binom{1/2}{k} (−4)^k x^k

                   = \binom{1/2}{0} (−4)^0 x^0 + ∑_{k≥1} \binom{1/2}{k} (−4)^k x^k

                   = 1 + ∑_{k≥1} \binom{1/2}{k} (−4)^k x^k

(since \binom{1/2}{0} = 1, (−4)^0 = 1 and x^0 = 1). Hence,

    1 − (1 − 4x)^{1/2} = 1 − (1 + ∑_{k≥1} \binom{1/2}{k} (−4)^k x^k) = − ∑_{k≥1} \binom{1/2}{k} (−4)^k x^k .

Thus, (10) becomes

    C(x) = (1/(2x)) (1 − (1 − 4x)^{1/2}) = (1/(2x)) (− ∑_{k≥1} \binom{1/2}{k} (−4)^k x^k)

         = ∑_{k≥1} \binom{1/2}{k} · (−(−4)^k x^k) / (2x) = ∑_{k≥1} \binom{1/2}{k} · 2 (−4)^{k−1} x^{k−1}

         = ∑_{k≥0} \binom{1/2}{k+1} · 2 (−4)^k x^k

(here, we have substituted k + 1 for k in the sum).
Comparing coefficients before x^n in this equality gives

    c_n = \binom{1/2}{n+1} · 2 (−4)^n .        (13)
This is an explicit formula for c_n (and makes computation of c_n pretty easy!), but it turns out that it can be simplified further. Indeed, the definition of \binom{1/2}{n+1} yields

    \binom{1/2}{n+1} = ((1/2) (1/2 − 1) (1/2 − 2) · · · (1/2 − n)) / (n + 1)!

                     = ((1/2) · (−1/2) · (−3/2) · (−5/2) · · · · · (−(2n − 1)/2)) / (n + 1)!

                     = ((1 · (−1) · (−3) · (−5) · · · · · (−(2n − 1))) / 2^{n+1}) / (n + 1)!

                     = ((−1)^n (1 · 3 · 5 · · · · · (2n − 1)) / 2^{n+1}) / (n + 1)! .
Thus, (13) rewrites as

    c_n = ((−1)^n (1 · 3 · 5 · · · · · (2n − 1)) / 2^{n+1}) / (n + 1)! · 2 (−4)^n

        = ((1 · 3 · 5 · · · · · (2n − 1)) / (n + 1)!) · ((−1)^n · 2 (−4)^n / 2^{n+1})

        = (1 / (n + 1)) · ((1 · 3 · 5 · · · · · (2n − 1)) / n!) · 2^n
              (since (−1)^n · 2 (−4)^n / 2^{n+1} = 2^n and (n + 1)! = (n + 1) · n!)

        = (1 / (n + 1)) \binom{2n}{n}
              (since (1 · 3 · 5 · · · · · (2n − 1)) / n! · 2^n = \binom{2n}{n} by Example 2.3.3).

Hence, we have shown that

    c_n = (1 / (n + 1)) \binom{2n}{n} .        (14)

Moreover, we can rewrite this further as

    c_n = \binom{2n}{n} − \binom{2n}{n−1}        (15)

(since another binomial coefficient manipulation6 yields (1 / (n + 1)) \binom{2n}{n} = \binom{2n}{n} − \binom{2n}{n−1}).
6 See Exercise A.2.1.2 (a) for this.

 
Here is the upshot: The # of Dyck words of length 2n is c_n = (1 / (n + 1)) \binom{2n}{n}. In other words, a 2n-tuple that consists of n entries equal to 0 and n entries equal to 1 (chosen uniformly at random) is a Dyck word with probability 1/(n + 1).
(There are also combinatorial ways to prove this; see, e.g., [GrKnPa94, §7.5,
discussion at the end of Example 4] or [Stanle15, §1.6] or [Martin13] or [22fco,
Lecture 29, Theorem 5.6.7] or [Loehr11, Theorem 1.56]7 or [Spivey19, §8.5,
proofs of Identity 244]8 .)
Here is a table of the first 12 Catalan numbers c_n:

    n   | 0  1  2  3  4   5   6    7    8     9     10     11
    c_n | 1  1  2  5  14  42  132  429  1430  4862  16796  58786
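The table (and the formulas (14) and (15)) can be spot-checked numerically with the standard-library function math.comb; a small Python sketch:

```python
from math import comb

def catalan(n):
    """Closed form (14): c_n = binom(2n, n) / (n + 1); the division is exact."""
    return comb(2 * n, n) // (n + 1)

table = [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786]
assert [catalan(n) for n in range(12)] == table

# Formula (15): c_n = binom(2n, n) - binom(2n, n-1) for n >= 1.
assert all(catalan(n) == comb(2 * n, n) - comb(2 * n, n - 1)
           for n in range(1, 12))
print("formulas (14) and (15) match the table")
```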

3.1.3. Example 3: The Vandermonde convolution


Example 3: The Vandermonde convolution identity (also known as the Chu–Vandermonde identity) says that

    \binom{a+b}{n} = ∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k}        for any numbers a, b and any n ∈ N

(where “numbers” can mean, e.g., “complex numbers”).


Let us prove this using generating functions. For now, we shall only prove
this for a, b ∈ N; later I will explain why it also holds for arbitrary (rational,
real or complex) numbers a, b as well.
Indeed, fix a, b ∈ N. Recall (from (11)) that

    (1 + x)^n = ∑_{k≥0} \binom{n}{k} x^k

7 Note that the “Dyck paths” in [Loehr11] differ from ours in that they use N-steps (i.e., steps (i, j) ↦ (i, j + 1)) and E-steps (i.e., steps (i, j) ↦ (i + 1, j)) instead of NE-steps and SE-steps, and stay above the x = y line instead of above the x-axis. But this notion of Dyck paths is equivalent to ours, since a clockwise rotation by 45◦ followed by a √2-homothety transforms it into ours.
8 Again, [Spivey19] works not directly with Dyck paths, but rather with paths that use E-steps (i.e., steps (i, j) ↦ (i + 1, j)) and N-steps (i.e., steps (i, j) ↦ (i, j + 1)) instead of NE-steps and SE-steps, and stay below the x = y line instead of above the x-axis. But this kind of Dyck paths is equivalent to our Dyck paths, since a reflection across the x = y line, followed by a clockwise rotation by 45◦ followed by a √2-homothety, transforms it into ours.

for each n ∈ N. Hence,

    (1 + x)^a = ∑_{k≥0} \binom{a}{k} x^k ,
    (1 + x)^b = ∑_{k≥0} \binom{b}{k} x^k ,        and
    (1 + x)^{a+b} = ∑_{k≥0} \binom{a+b}{k} x^k .

Thus,

    (1 + x)^a (1 + x)^b = (∑_{k≥0} \binom{a}{k} x^k) (∑_{ℓ≥0} \binom{b}{ℓ} x^ℓ)

                        = ∑_{k≥0} ∑_{ℓ≥0} \binom{a}{k} \binom{b}{ℓ} x^{k+ℓ}

                        = ∑_{n≥0} (∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k}) x^n

(here, we have merged addends in which x appears in the same power). Hence,

    ∑_{n≥0} (∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k}) x^n = (1 + x)^a (1 + x)^b = (1 + x)^{a+b} = ∑_{n≥0} \binom{a+b}{n} x^n .

Comparing coefficients in this equality, we obtain

    ∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k} = \binom{a+b}{n}        for each n ∈ N.

This completes the proof of the Vandermonde convolution identity for a, b ∈ N.
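For a, b ∈ N the identity is also easy to check numerically; a quick Python sketch (the helper name is ours):

```python
from math import comb

def vandermonde_ok(a, b, n):
    """Check binom(a+b, n) == sum_{k=0}^{n} binom(a, k) binom(b, n-k)."""
    return comb(a + b, n) == sum(comb(a, k) * comb(b, n - k)
                                 for k in range(n + 1))

# math.comb(a, k) is 0 for k > a, matching the convention used in the proof.
assert all(vandermonde_ok(a, b, n)
           for a in range(8) for b in range(8) for n in range(10))
print("Chu-Vandermonde verified for all a, b < 8 and n < 10")
```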

3.1.4. Example 4: Solving a recurrence


Example 4. The following example is from [Wilf04, §1.2]. Define a sequence
( a0 , a1 , a2 , . . .) of numbers recursively by
    a_0 = 1,        a_{n+1} = 2a_n + n    for all n ≥ 0.

Thus, the first entries of this sequence are 1, 2, 5, 12, 27, 58, 121, . . .. This se-
quence appears in the OEIS (= Online Encyclopedia of Integer Sequences) as
A000325, with index shifted.
Can we find an explicit formula for an (without looking it up in the OEIS)?
Again, generating functions are helpful. Set

A ( x ) = a0 + a1 x + a2 x 2 + a3 x 3 + · · · .

Then,

    A(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · ·
         = 1 + (2a_0 + 0) x + (2a_1 + 1) x^2 + (2a_2 + 2) x^3 + · · ·
               (since a_0 = 1 and a_{n+1} = 2a_n + n for all n ≥ 0)
         = 1 + 2 (a_0 x + a_1 x^2 + a_2 x^3 + · · ·) + (0x + 1x^2 + 2x^3 + · · ·)
         = 1 + 2x A(x) + x (0 + 1x + 2x^2 + 3x^3 + · · ·) .        (16)

Thus, it would clearly be helpful to find a simple expression for 0 + 1x + 2x^2 + 3x^3 + · · · . Here are two ways to do so:
First way: We assume that our power series (whatever they actually are) can
be differentiated (as if they were functions). We furthermore assume that these
derivatives satisfy the same basic rules (sum rule, product rule, quotient rule,
chain rule) as the derivatives in real analysis. (Again, these assumptions shall
be justified later on.)
Denoting the derivative of a power series f by f′, we then have

    (1 + x + x^2 + x^3 + · · ·)′ = 1 + 2x + 3x^2 + 4x^3 + · · · .

Hence,

    1 + 2x + 3x^2 + 4x^3 + · · · = (1 + x + x^2 + x^3 + · · ·)′ = (1/(1 − x))′

(since (5) yields 1 + x + x^2 + x^3 + · · · = 1/(1 − x)). Using the quotient rule, we can easily find that (1/(1 − x))′ = 1/(1 − x)^2, so that

    1 + 2x + 3x^2 + 4x^3 + · · · = (1/(1 − x))′ = 1/(1 − x)^2 .        (17)

The left hand side of this looks very similar to the power series 0 + 1x + 2x^2 + 3x^3 + · · · that we want to simplify. And indeed, we have the following:

    0 + 1x + 2x^2 + 3x^3 + · · · = x (1 + 2x + 3x^2 + 4x^3 + · · ·) = x · 1/(1 − x)^2 = x/(1 − x)^2 .        (18)

Second way: We rewrite 0 + 1x + 2x^2 + 3x^3 + · · · as an infinite sum of infinite sums:

    0 + 1x + 2x^2 + 3x^3 + · · ·
      =   x^1 + x^2 + x^3 + x^4 + · · ·
        +       x^2 + x^3 + x^4 + · · ·
        +             x^3 + x^4 + · · ·
        +                   x^4 + · · ·
        + · · ·
      = ∑_{k≥1} (x^k + x^{k+1} + x^{k+2} + · · ·)
      = ∑_{k≥1} x^k · 1/(1 − x)
            (since x^k + x^{k+1} + x^{k+2} + · · · = x^k · (1 + x + x^2 + x^3 + · · ·) = x^k · 1/(1 − x) by (5))
      = 1/(1 − x) · ∑_{k≥1} x^k
      = 1/(1 − x) · x · 1/(1 − x)
            (since ∑_{k≥1} x^k = x · (1 + x + x^2 + x^3 + · · ·) = x · 1/(1 − x) by (5))
      = x/(1 − x)^2 .        (19)

(We used some unstated assumptions here about infinite sums – specifically, we assumed that we can rearrange them without worrying about absolute convergence or similar issues – but we will later see that these assumptions are well justified. Besides, we obtained the same result as by our first way, which is reassuring.)
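Both ways assert that the coefficient of x^k in 1/(1 − x)^2 is k + 1. This can be spot-checked on truncated power series, where multiplication is just convolution of coefficient lists; a minimal Python sketch (the truncation length N is our choice):

```python
N = 10  # how many coefficients to keep

def multiply(a, b):
    """Multiply two truncated power series given as coefficient lists."""
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N)]

geom = [1] * N                 # 1/(1 - x) = 1 + x + x^2 + ...
square = multiply(geom, geom)  # 1/(1 - x)^2
assert square == [k + 1 for k in range(N)]  # coefficients 1, 2, 3, ...

x = [0, 1] + [0] * (N - 2)     # the series x
assert multiply(x, square) == list(range(N))  # x/(1 - x)^2 = 0 + 1x + 2x^2 + ...
print(square)
```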
Having computed 0 + 1x + 2x^2 + 3x^3 + · · · , we can now simplify (16), obtaining

    A(x) = 1 + 2x A(x) + x (0 + 1x + 2x^2 + 3x^3 + · · ·) = 1 + 2x A(x) + x · x/(1 − x)^2 .

This is a linear equation in A(x). Solving it yields

    A(x) = (1 − 2x + 2x^2) / ((1 − x)^2 (1 − 2x))

         = 2/(1 − 2x) − 1/(1 − x)^2        (by partial fraction decomposition)

         = 2 ∑_{k≥0} 2^k x^k − (1 + 2x + 3x^2 + 4x^3 + · · ·)
               (since 2/(1 − 2x) = 2 ∑_{k≥0} 2^k x^k by (6), and 1/(1 − x)^2 = 1 + 2x + 3x^2 + 4x^3 + · · ·
                by (17), or alternatively by dividing the equality (19) by x)

         = ∑_{k≥0} 2^{k+1} x^k − ∑_{k≥0} (k + 1) x^k = ∑_{k≥0} (2^{k+1} − (k + 1)) x^k .

Comparing coefficients, we obtain

    a_n = 2^{n+1} − (n + 1)        for each n ∈ N.

This is also easy to prove directly (by induction on n).
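A quick Python sketch confirming that this closed form satisfies the original recurrence (the helper name is ours):

```python
def a_by_recurrence(N):
    """a_0 = 1 and a_{n+1} = 2 a_n + n for all n >= 0."""
    a = [1]
    for n in range(N):
        a.append(2 * a[n] + n)
    return a

assert a_by_recurrence(20) == [2 ** (n + 1) - (n + 1) for n in range(21)]
print(a_by_recurrence(6))  # [1, 2, 5, 12, 27, 58, 121]
```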

3.2. Definitions
The four examples above should have convinced you that generating functions
can be useful. Thus, it is worthwhile to put them on a rigorous footing by
first defining generating functions and then justifying the manipulations we
have been doing to them in the previous section (e.g., dividing them, solving
quadratic equations, taking infinite sums, taking derivatives, ...). We are next
going to sketch how this can be done (see [Loehr11, Chapter 7 (in the 1st edi-
tion)] and [19s, Chapter 7] for some details).
First things first: Generating functions are not actually functions. They are
so-called formal power series (short FPSs). Roughly speaking, a formal power
series is a “formal” infinite sum of the form a0 + a1 x + a2 x2 + · · · , where x
is an “indeterminate” (we shall soon see what this all means). You cannot
substitute x = 2 into such a power series. (For example, substituting x = 2 into 1/(1 − x) = 1 + x + x^2 + x^3 + · · · would lead to the absurd equality 1/(−1) = 1 + 2 + 4 + 8 + 16 + · · · .) The word “function” in “generating function” is somewhat of a historical artifact.

3.2.1. Reminder: Commutative rings


In order to obtain a precise understanding of what FPSs are, we go back to
abstract algebra. We begin by recalling the concept of a commutative ring. This is defined in any textbook on abstract algebra, but we recall the definition here for the sake of completeness.
Informally, a commutative ring is a set K equipped with binary operations
⊕, ⊖ and ⊙ and elements 0 and 1 that “behave” like addition, subtraction
and multiplication (of numbers) and the numbers 0 and 1, respectively. For
example, they should satisfy rules like ( a ⊕ b) ⊙ c = ( a ⊙ c) ⊕ (b ⊙ c).
Formally, commutative rings are defined as follows:
Definition 3.2.1. A commutative ring means a set K equipped with three maps

⊕ : K × K → K,
⊖ : K × K → K,
⊙ : K×K → K

and two elements 0 ∈ K and 1 ∈ K satisfying the following axioms:

1. Commutativity of addition: We have a ⊕ b = b ⊕ a for all a, b ∈ K.


(Here and in the following, we write the three maps ⊕, ⊖ and ⊙ infix
– i.e., we denote the image of a pair ( a, b) ∈ K × K under the map ⊕ by
a ⊕ b rather than by ⊕ ( a, b).)

2. Associativity of addition: We have a ⊕ (b ⊕ c) = ( a ⊕ b) ⊕ c for all a, b, c ∈


K.

3. Neutrality of zero: We have a ⊕ 0 = 0 ⊕ a = a for all a ∈ K.

4. Subtraction undoes addition: Let a, b, c ∈ K. We have a ⊕ b = c if and only


if a = c ⊖ b.

5. Commutativity of multiplication: We have a ⊙ b = b ⊙ a for all a, b ∈ K.

6. Associativity of multiplication: We have a ⊙ (b ⊙ c) = ( a ⊙ b) ⊙ c for all


a, b, c ∈ K.

7. Distributivity: We have

a ⊙ (b ⊕ c) = ( a ⊙ b) ⊕ ( a ⊙ c) and ( a ⊕ b) ⊙ c = ( a ⊙ c) ⊕ (b ⊙ c)
for all a, b, c ∈ K.

8. Neutrality of one: We have a ⊙ 1 = 1 ⊙ a = a for all a ∈ K.

9. Annihilation: We have a ⊙ 0 = 0 ⊙ a = 0 for all a ∈ K.

[Note: Most authors do not include ⊖ in the definition of a commutative


ring, but instead require the existence of additive inverses for all a ∈ K. This is equivalent to our definition, because if additive inverses exist, then we can define a ⊖ b to be a ⊕ (−b), where −b is the additive inverse of b.]

The operations ⊕, ⊖ and ⊙ are called the addition, the subtraction and the
multiplication of the ring K. This does not imply that they have any connection
with the usual addition, subtraction and multiplication of numbers; it merely
means that they play similar roles to the latter and behave similarly. When
confusion is unlikely, we will denote these three operations ⊕, ⊖ and ⊙ by
+, − and ·, respectively, and we will abbreviate a ⊙ b = a · b by ab.
The elements 0 and 1 are called the zero and the unity (or the one) of the
ring K. Again, this does not imply that they equal the numbers 0 and 1, but
merely that they play analogous roles. We will simply call these elements 0
and 1 when confusion with the corresponding numbers is unlikely.
We will use PEMDAS conventions for the three operations ⊕, ⊖ and ⊙.
These imply that the operation ⊙ has higher precedence than ⊕ and ⊖, while
the operations ⊕ and ⊖ are left-associative. Thus, for example, “ab + ac”
means ( ab) + ( ac) (that is, ( a ⊙ b) ⊕ ( a ⊙ c)). Likewise, “a − b + c” means
( a − b) + c = ( a ⊖ b) ⊕ c.

Here are some examples of commutative rings:

• The sets Z, Q, R and C are commutative rings. (Of course, the operations
⊕, ⊖ and ⊙ of these rings are just the usual operations +, − and · known
from high school.)

• The set N is not a commutative ring, since it has no subtraction. (It is,
however, something called a commutative semiring.)

• The matrix ring Qm×m (this is the ring of all m × m-matrices with rational
entries) is not a commutative ring for m > 1 (because it fails the “commu-
tativity of multiplication” axiom). However, it satisfies all axioms other
than “commutativity of multiplication”. This makes it a noncommutative
ring.

• The set

      Z[√5] = { a + b√5 | a, b ∈ Z }

  is a commutative ring with operations +, − and · inherited from R. This is because any a, b, c, d ∈ Z satisfy

      (a + b√5) + (c + d√5) = (a + c) + (b + d)√5 ∈ Z[√5] ;
      (a + b√5) − (c + d√5) = (a − c) + (b − d)√5 ∈ Z[√5] ;
      (a + b√5) (c + d√5) = (ac + 5bd) + (ad + bc)√5 ∈ Z[√5] .

This is called a subring of R (i.e., a subset of R that is closed under the


operations +, − and · and therefore constitutes a commutative ring with
these operations inherited from R).

• For each m ∈ Z, the set

      Z/m = {all residue classes modulo m}
          = {equivalence classes of integers with respect to the equivalence
             “congruent modulo m” (that is, “differ by a multiple of m”)}

  is a commutative ring, with its operations defined by

      [a] + [b] = [a + b] ,        [a] − [b] = [a − b] ,        [a] · [b] = [ab]

  (where [a] denotes the residue class of an integer a). If m > 0, then this ring Z/m is finite and has size m. It is also known as Z/mZ or Z_m (careful with the latter notation; it can mean different things to different people). When m is prime, the ring Z/m is actually a finite field and is called F_m. (But, e.g., the ring Z/4 is not a field, and not the same as F_4.)

• In the examples we have seen so far, the elements of the commutative ring
either are numbers or (as in the case of matrices or residue classes) consist
of numbers. For a contrast, here is an example where they are sets:
For any two sets X and Y, we define the symmetric difference X △ Y of X
and Y to be the set

( X ∪ Y ) \ ( X ∩ Y ) = ( X \ Y ) ∪ (Y \ X )
= {all elements that belong to exactly one of X and Y } .

Fix a set S. Consider the power set P (S) of S (that is, the set of all
subsets of S). This power set P (S) is a commutative ring if we equip
it with the operation △ as addition (that is, X ⊕ Y = X △ Y for any
subsets X and Y of S), with the same operation △ as subtraction (that
is, X ⊖ Y = X △ Y), and with the operation ∩ as multiplication (that
is, X ⊙ Y = X ∩ Y), and with the elements ∅ and S as zero and unity
(that is, with 0 = ∅ and 1 = S). Indeed, it is straightforward to see
that all the axioms in Definition 3.2.1 hold for this ring. (For example,
distributivity holds because any three sets A, B, C satisfy A ∩ ( B △ C ) =
( A ∩ B) △ ( A ∩ C ) and ( A △ B) ∩ C = ( A ∩ C ) △ ( B ∩ C ).) This is an
example of a Boolean ring (i.e., a ring in which aa = a for each element a
of the ring).
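The ring axioms for (P(S), △, ∩) can be checked exhaustively for a small S with Python’s built-in set operations (^ is symmetric difference, & is intersection); a sketch:

```python
from itertools import chain, combinations

S = frozenset({1, 2, 3})
subsets = [frozenset(c) for c in
           chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def add(X, Y):
    return X ^ Y  # addition is the symmetric difference

def mul(X, Y):
    return X & Y  # multiplication is the intersection

for A in subsets:
    assert add(A, frozenset()) == A  # 0 = empty set is neutral
    assert mul(A, S) == A            # 1 = S is neutral
    assert mul(A, A) == A            # Boolean ring: aa = a
    for B in subsets:
        for C in subsets:
            # distributivity: A ∩ (B △ C) = (A ∩ B) △ (A ∩ C)
            assert mul(A, add(B, C)) == add(mul(A, B), mul(A, C))
print("ring axioms hold for all", len(subsets), "subsets of S")
```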

• Here is another example of a semiring, which is rather useful in combina-


torics. Let T be the set Z ∪ {−∞}, where −∞ is just some extra symbol.
Define two operations ⊕ and ⊙ on this set T by setting

a ⊕ b = max { a, b} (where max {n, −∞} := n for each n ∈ T)



and

a⊙b = a+b (where n + (−∞) := (−∞) + n := −∞ for each n ∈ T) .

Then, T is a commutative semiring (i.e., it would satisfy Definition 3.2.1


if not for the lack of subtraction). It is called the tropical semiring of Z.

More examples can be found in algebra textbooks (or in [Grinbe15, §6.1] or


[19s, §5.2] or [23wa, §2.1.2, §2.3.2, §2.3.3]).
Good news: In any commutative ring K, the standard rules of computation
apply:

• You can compute finite sums (of elements of K) without specifying the
order of summation or the placement of parentheses. For example, for
any a, b, c, d, e ∈ K, we have

(( a + b) + (c + d)) + e = ( a + (b + c)) + (d + e) ,

so you can write the sum a + b + c + d + e without putting parentheses


around anything. (This is called “general associativity”.)
Also, finite sums do not depend on the order of addends. For example,
for any a, b, c, d, e ∈ K, we have

a + b + c + d + e = d + b + a + e + c.

(This is called “general commutativity”.)


More formally: If (a_s)_{s∈S} is any finite family of elements of a commutative ring K (this means that S is a finite set, and a_s is an element of K for each s ∈ S), then the finite sum ∑_{s∈S} a_s is a well-defined element of K. Furthermore, such sums satisfy the usual rules of sums (see [Grinbe15, §1.4.2]); for instance:

– If S = X ∪ Y and X ∩ Y = ∅, then ∑_{s∈S} a_s = ∑_{s∈X} a_s + ∑_{s∈Y} a_s .
– We have ∑_{s∈S} (a_s + b_s) = ∑_{s∈S} a_s + ∑_{s∈S} b_s .
– If S and W are two finite sets, and if f : S → W is a map, then ∑_{s∈S} a_s = ∑_{w∈W} ∑_{s∈S; f(s)=w} a_s . (That is, you can subdivide a finite sum into a finite sum of finite sums by bunching its addends arbitrarily.)

(See [Grinbe15, §2.14] for very detailed proofs9 . Alternatively, you can
treat them as exercises on induction.)
If S = ∅, then ∑_{s∈S} a_s = 0 by definition. Such a sum ∑_{s∈S} a_s with S = ∅ is called an empty sum.

• The same holds for finite products. If S = ∅, then ∏_{s∈S} a_s = 1 by definition.

• If a ∈ K, then −a denotes 0 − a ∈ K.
• If n ∈ Z and a ∈ K, then we can define an element na ∈ K by

      na = a + a + · · · + a  (n addends),            if n ≥ 0;
      na = − (a + a + · · · + a)  (−n addends),       if n < 0.

  This generalizes the classical definition of multiplication (for integers) as repeated addition.
• If n ∈ N and a ∈ K, then we can define an element

      a^n = a a · · · a  (n factors)  ∈ K.

  In particular, a^0 = 1 (since an empty product is always 1).

• Standard rules hold:

      −(a + b) = (−a) + (−b)            for any a, b ∈ K;
      −(−a) = a                         for any a ∈ K;
      (n + m) a = na + ma               for any a ∈ K and n, m ∈ Z;
      (nm) a = n (ma)                   for any a ∈ K and n, m ∈ Z;
      a (b − c) = (ab) − (ac)           for any a, b, c ∈ K;
      (ab)^n = a^n b^n                  for any a, b ∈ K and n ∈ N;
      a^{n+m} = a^n a^m                 for any a ∈ K and n, m ∈ N;
      a^{nm} = (a^n)^m                  for any a ∈ K and n, m ∈ N;
      . . . ;
      (a + b)^n = ∑_{k=0}^{n} \binom{n}{k} a^k b^{n−k}    for any a, b ∈ K and n ∈ N.

  (The latter equality is known as the binomial theorem or binomial formula.)
9 These proofs are stated for numbers rather than elements of an arbitrary commutative ring K, but the exact same reasoning works in an arbitrary ring K.

A further concept will be useful. Namely, if K is a commutative ring, then


the notion of a K-module is the straightforward generalization of the concept of
a K-vector space to cases where K is not a field (but just a commutative ring).
Here is the definition of a K-module in detail:

Definition 3.2.2. Let K be a commutative ring.


A K-module means a set M equipped with three maps

⊕ : M × M → M,
⊖ : M × M → M,
⇀:K×M → M

(notice that the third map has domain K × M, not M × M) and an element


0⃗ ∈ M satisfying the following axioms:

1. Commutativity of addition: We have a ⊕ b = b ⊕ a for all a, b ∈ M.


(Here and in the following, we write the three maps ⊕, ⊖ and ⇀ infix,
just as for a commutative ring.)

2. Associativity of addition: We have a ⊕ (b ⊕ c) = ( a ⊕ b) ⊕ c for all a, b, c ∈


M.

3. Neutrality of zero: We have a ⊕ 0⃗ = 0⃗ ⊕ a = a for all a ∈ M.

4. Subtraction undoes addition: Let a, b, c ∈ M. We have a ⊕ b = c if and


only if a = c ⊖ b.

5. Associativity of scaling: We have u ⇀ (v ⇀ a) = (uv) ⇀ a for all u, v ∈ K


and a ∈ M.

6. Left distributivity: We have u ⇀ ( a ⊕ b) = (u ⇀ a) ⊕ (u ⇀ b) for all


u ∈ K and a, b ∈ M.

7. Right distributivity: We have (u + v) ⇀ a = (u ⇀ a) ⊕ (v ⇀ a) for all


u, v ∈ K and a ∈ M.

8. Neutrality of one: We have 1 ⇀ a = a for all a ∈ M.




9. Left annihilation: We have 0 ⇀ a = 0⃗ for all a ∈ M.

10. Right annihilation: We have u ⇀ 0⃗ = 0⃗ for all u ∈ K.

[Note: Most authors do not include ⊖ in the definition of a K-module. In-


stead, they may require the existence of additive inverses for all a ∈ M. Just
like for commutative rings, this is equivalent. Actually, even the existence of
additive inverses is not necessary, since the additive inverse of an a ∈ M can
be constructed using the operation ⇀ as (−1) ⇀ a. Thus, you will some-


times see the notion of a K-module defined entirely without any reference to
subtraction or additive inverses; it is still equivalent to ours.]
The operations ⊕, ⊖ and ⇀ are called the addition, the subtraction and the
scaling (or the K-action) of the K-module M. When confusion is unlikely, we
will denote these three operations ⊕, ⊖ and ⇀ by +, − and ·, respectively,
and we will abbreviate a ⇀ b = a · b by ab.


The element 0⃗ is called the zero (or the zero vector) of the K-module M. We will usually just call it 0.
When M is a K-module, the elements of M are called vectors, while the
elements of K are called scalars.
We will use PEMDAS conventions for the three operations ⊕, ⊖ and ⇀,
with the operation ⇀ having higher precedence than ⊕ and ⊖.

This all having been said, we can now define formal power series.

3.2.2. The definition of formal power series


Until the end of Chapter 3, the following convention will be in place:

Convention 3.2.3. Fix a commutative ring K. (For example, K can be Z or Q


or C.)

Definition 3.2.4. A formal power series (or, short, FPS) in 1 indeterminate over K means a sequence (a_0, a_1, a_2, . . .) = (a_n)_{n∈N} ∈ K^N of elements of K.

Examples of FPSs over Z are (0, 0, 0, . . .) and (1, 0, 0, 0, . . .) and (1, 1, 1, 1, . . .)


and (1, 2, 3, 4, . . .).
Definition 3.2.4 technically answers the question “what is an FPS”; however,
the questions “what can we do with an FPS” and “why do the examples in
Section 3.1 work” or “what is x” remain open. These questions will take us a
while.
First, let us define some operations on FPSs.

Definition 3.2.5. (a) The sum of two FPSs a = ( a0 , a1 , a2 , . . .) and b =


(b0 , b1 , b2 , . . .) is defined to be the FPS

( a0 + b0 , a1 + b1 , a2 + b2 , . . .) .

It is denoted by a + b.
(b) The difference of two FPSs a = ( a0 , a1 , a2 , . . .) and b = (b0 , b1 , b2 , . . .) is
defined to be the FPS

( a0 − b0 , a1 − b1 , a2 − b2 , . . .) .

It is denoted by a − b.
(c) If λ ∈ K and if a = ( a0 , a1 , a2 , . . .) is an FPS, then we define an FPS

λa := (λa0 , λa1 , λa2 , . . .) .

(d) The product of two FPSs a = ( a0 , a1 , a2 , . . .) and b = (b0 , b1 , b2 , . . .) is


defined to be the FPS (c_0, c_1, c_2, . . .), where

    c_n = ∑_{i=0}^{n} a_i b_{n−i} = ∑_{(i,j)∈N²; i+j=n} a_i b_j = a_0 b_n + a_1 b_{n−1} + a_2 b_{n−2} + · · · + a_n b_0        for each n ∈ N.

This product is denoted by a · b or just by ab.


(e) For each a ∈ K, we define a to be the FPS ( a, 0, 0, 0, . . .). An FPS of the
form a for some a ∈ K (that is, an FPS ( a0 , a1 , a2 , . . .) satisfying a1 = a2 =
a3 = · · · = 0) is said to be constant.
(f) The set of all FPSs (in 1 indeterminate over K) is denoted K [[ x ]].
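Definition 3.2.5 can be prototyped on truncated coefficient lists; here is a minimal Python sketch (the truncation length N and the function names are ours – a genuine FPS is an infinite sequence):

```python
N = 8  # number of coefficients kept

def fps_add(a, b):
    """Definition 3.2.5 (a): entrywise sum."""
    return [x + y for x, y in zip(a, b)]

def fps_mul(a, b):
    """Definition 3.2.5 (d): c_n = a_0 b_n + a_1 b_{n-1} + ... + a_n b_0."""
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N)]

one = [1] + [0] * (N - 1)  # the constant FPS 1 = (1, 0, 0, 0, ...)
geom = [1] * N             # (1, 1, 1, 1, ...)
ramp = list(range(N))      # (0, 1, 2, 3, ...)

assert fps_mul(one, geom) == geom                        # 1 is neutral
assert fps_add(geom, ramp) == [n + 1 for n in range(N)]
assert fps_mul(geom, geom) == [n + 1 for n in range(N)]  # (1,1,1,...)^2 = (1,2,3,...)
print("sample computations agree with Definition 3.2.5")
```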
The following theorem is crucial: essentially it says that the operations on
FPSs that we have just defined behave as such operations should:
Theorem 3.2.6. (a) The set K [[ x ]] is a commutative ring (with its operations
+, − and · defined in Definition 3.2.5). Its zero and its unity are the FPSs 0 =
(0, 0, 0, . . .) and 1 = (1, 0, 0, 0, . . .). This means, concretely, that the following
facts hold:

1. Commutativity of addition: We have a + b = b + a for all a, b ∈ K [[ x ]].


2. Associativity of addition: We have a + (b + c) = (a + b) + c for all
a, b, c ∈ K [[ x ]].
3. Neutrality of zero: We have a + 0 = 0 + a = a for all a ∈ K [[ x ]].
4. Subtraction undoes addition: Let a, b, c ∈ K [[ x ]]. We have a + b = c if
and only if a = c − b.
5. Commutativity of multiplication: We have ab = ba for all a, b ∈ K [[ x ]].
6. Associativity of multiplication: We have a (bc) = (ab) c for all a, b, c ∈
K [[ x ]].
7. Distributivity: We have
a (b + c) = ab + ac and (a + b) c = ac + bc
for all a, b, c ∈ K [[ x ]].

8. Neutrality of one: We have a · 1 = 1 · a = a for all a ∈ K [[ x ]].

9. Annihilation: We have a · 0 = 0 · a = 0 for all a ∈ K [[ x ]].

(b) Furthermore, K [[ x ]] is a K-module (with its scaling being the map that
sends each (λ, a) ∈ K × K [[ x ]] to the FPS λa defined in Definition 3.2.5 (c)).
Its zero vector is 0. Concretely, this means that:

10. Associativity of scaling: We have λ (µa) = (λµ) a for all λ, µ ∈ K and


a ∈ K [[ x ]].

11. Left distributivity: We have λ (a + b) = λa + λb for all λ ∈ K and


a, b ∈ K [[ x ]].

12. Right distributivity: We have (λ + µ) a = λa + µa for all λ, µ ∈ K and


a ∈ K [[ x ]].

13. Neutrality of one: We have 1a = a for all a ∈ K [[ x ]].

14. Left annihilation: We have 0a = 0 for all a ∈ K [[ x ]].

15. Right annihilation: We have λ0 = 0 for all λ ∈ K.

(Also, some of the facts from part (a) are included in this statement.)
(c) We have λ (a · b) = (λa) · b = a · (λb) for all λ ∈ K and a, b ∈ K [[ x ]].
(d) Finally, we have λa = λ · a for all λ ∈ K and a ∈ K [[ x ]].

Theorem 3.2.6 allows us to calculate with FPSs as we do with numbers, at


least as far as the operations +, − and · are concerned. Hence, e.g., we know
that:

• Sums and products in K [[ x ]] need no parentheses and do not depend on


the order of addends/factors. For example, for any a, b, c, d ∈ K [[ x ]], we
have ((ab) c) d = a ((bc) d) = (ab) (cd), so that we can write abcd for
each of these products; and moreover, we have abcd = bdac = dacb.
• Finite sums and products (such as ∑_{i=1}^{k} a_i or ∑_{i∈I} a_i or ∏_{i=1}^{k} a_i or ∏_{i∈I} a_i, where I is a finite set) make sense and behave as one would expect.

• Powers exist: that is, you can take an for each FPS a and each n ∈ N.

• Standard rules hold: e.g., we have an+m = an am and (ab)n = an bn for any
a, b ∈ K [[ x ]] and any n, m ∈ N.

• The binomial theorem holds: For any a, b ∈ K[[x]] and any n ∈ N, we have

  (a + b)^n = ∑_{k=0}^{n} \binom{n}{k} a^k b^{n−k}.

Before we prove Theorem 3.2.6, we introduce one more notation:

Definition 3.2.7. If n ∈ N, and if a = ( a0 , a1 , a2 , . . .) ∈ K [[ x ]] is an FPS, then


we define an element [ x n ] a ∈ K by

[ x n ] a := an .

This is called the coefficient of x n in a, or the n-th coefficient of a, or the x n -


coefficient of a.

Thus, the definition of the sum of two FPSs (Definition 3.2.5 (a)) rewrites as
follows: For any a, b ∈ K [[ x ]] and any n ∈ N, we have

[ x n ] (a + b) = [ x n ] a + [ x n ] b. (20)

Similarly, for any a, b ∈ K [[ x ]] and any n ∈ N, we have

[ x n ] (a − b) = [ x n ] a − [ x n ] b. (21)

Meanwhile, the definition of the product of two FPSs (Definition 3.2.5 (d))
rewrites as follows: For any a, b ∈ K [[ x ]] and any n ∈ N, we have

[x^n] (ab) = [x^0] a · [x^n] b + [x^1] a · [x^{n−1}] b + [x^2] a · [x^{n−2}] b + · · · + [x^n] a · [x^0] b

= ∑_{i=0}^{n} [x^i] a · [x^{n−i}] b        (22)

= ∑_{j=0}^{n} [x^{n−j}] a · [x^j] b        (23)

(here, we have substituted n − j for i in the sum). For n = 0, this equality simplifies to

[x^0] (ab) = [x^0] a · [x^0] b.        (24)

In other words, when we multiply two FPSs, their constant terms get multiplied. Here and in the following, the constant term of an FPS a ∈ K[[x]] is defined to be its 0-th coefficient [x^0] a.
Finally, Definition 3.2.5 (c) rewrites as follows: For any λ ∈ K and a ∈ K [[ x ]]
and any n ∈ N, we have
[ x n ] (λa) = λ · [ x n ] a. (25)

Proof of Theorem 3.2.6. Most parts of Theorem 3.2.6 are straightforward to verify.
Let us check the associativity of multiplication.
Let a, b, c ∈ K [[ x ]]. We must prove that a (bc) = (ab) c. Let n ∈ N. Consider
the two equalities
[x^n] ((ab) c) = ∑_{j=0}^{n} [x^{n−j}] (ab) · [x^j] c        (by (23), applied to ab and c instead of a and b)

= ∑_{j=0}^{n} ( ∑_{i=0}^{n−j} [x^i] a · [x^{n−j−i}] b ) · [x^j] c        (by (22), applied to n − j instead of n)

= ∑_{j=0}^{n} ∑_{i=0}^{n−j} [x^i] a · [x^{n−j−i}] b · [x^j] c

and

[x^n] (a (bc)) = ∑_{i=0}^{n} [x^i] a · [x^{n−i}] (bc)        (by (22), applied to bc instead of b)

= ∑_{i=0}^{n} [x^i] a · ( ∑_{j=0}^{n−i} [x^{n−i−j}] b · [x^j] c )        (by (23), applied to n − i, b and c instead of n, a and b)

= ∑_{i=0}^{n} ∑_{j=0}^{n−i} [x^i] a · [x^{n−i−j}] b · [x^j] c.

The right hand sides of these two equalities are equal, since¹⁰

∑_{j=0}^{n} ∑_{i=0}^{n−j} = ∑_{(i,j)∈N²; i+j≤n} = ∑_{i=0}^{n} ∑_{j=0}^{n−i}

¹⁰The first equality we are about to state is an equality of summation signs. Such an equality means that whatever you put inside the summation signs, they produce equal results. For example, ∑_{i∈{1,2,3}} = ∑_{i=1}^{3} and ∑_{i∈N} = ∑_{i≥0}.

and n − j − i = n − i − j. Thus, their left hand sides are equal as well. In other
words,
[ x n ] ((ab) c) = [ x n ] (a (bc)) .
Now, forget that we fixed n. We thus have shown that [ x n ] ((ab) c) = [ x n ] (a (bc))
for each n ∈ N. In other words, each entry of (ab) c equals the corresponding
entry of a (bc). This entails (ab) c = a (bc) (since an FPS is just the sequence
of its entries). In other words, a (bc) = (ab) c. This concludes the proof of
associativity of multiplication.
The remaining claims of Theorem 3.2.6 are LTTR11 (their proofs follow the
same pattern, but are easier to execute).
Since K [[ x ]] is a commutative ring, any finite sum of FPSs is well-defined.
Sometimes, however, infinite sums of FPSs make sense as well: for example, it
stands to reason that
(1, 1, 1, 1, . . .)
+ (0, 1, 1, 1, . . .)
+ (0, 0, 1, 1, . . .)
+ (0, 0, 0, 1, . . .)
+···
= (1, 2, 3, 4, . . .) , (26)
because FPSs are added entrywise. Let us rigorously define such sums. First,
we define “essentially finite” sums of elements of K:

Definition 3.2.8. (a) A family ( ai )i∈ I ∈ K I of elements of K is said to be


essentially finite if all but finitely many i ∈ I satisfy ai = 0 (in other words, if
the set {i ∈ I | ai ̸= 0} is finite).
(b) Let (a_i)_{i∈I} ∈ K^I be an essentially finite family of elements of K. Then, the infinite sum ∑_{i∈I} a_i is defined to equal the finite sum ∑_{i∈I; a_i≠0} a_i. Such an infinite sum is said to be essentially finite.
 
For example, the family (⌊5/2^n⌋)_{n∈N} of integers is essentially finite¹², and its sum is

∑_{n∈N} ⌊5/2^n⌋ = ⌊5/2^0⌋ + ⌊5/2^1⌋ + ⌊5/2^2⌋ + ⌊5/2^3⌋ + ⌊5/2^4⌋ + · · · = 5 + 2 + 1 + 0 + 0 + 0 + · · · = 5 + 2 + 1 = 8

(the addends equal to 0 can simply be thrown away).

11 “LTTR” means “left to the reader”.


12 Here and in the following, the notation ⌊ x ⌋ (where x is a real number) stands for the largest
integer that is ≤ x. For instance, ⌊π ⌋ = 3 and ⌊3⌋ = 3.
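The family (⌊5/2^n⌋)_{n∈N} above can also be checked numerically. This little computation is our own illustration, not part of the notes; it confirms that only finitely many addends are nonzero and that the sum is 8.

```python
# Sketch: the family (floor(5 / 2^n))_{n in N} is essentially finite,
# since floor(5 / 2^n) = 0 for all n >= 3.
values = [5 // 2**n for n in range(50)]   # floor(5 / 2^n) via integer division
nonzero = [v for v in values if v != 0]

print(nonzero)       # [5, 2, 1]  -- only finitely many nonzero addends
print(sum(nonzero))  # 8
```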

Note the following:

• A family ( ai )i∈ I ∈ K I is always essentially finite if I is finite; thus, essen-


tially finite families are a wider class than finite families.

• Any essentially finite sum of real or complex numbers is convergent in the sense of analysis, but the converse is not true; for instance, the infinite sum ∑_{n∈N} 1/2^n = 1/1 + 1/2 + 1/4 + 1/8 + · · · is convergent in the sense of analysis, but not essentially finite. Essential finiteness is a crude algebraic imitation of convergence. One of its advantages is that it works for sums of elements of any commutative ring, not just of R or C.

So the idea behind Definition 3.2.8 (b) is that addends that equal 0 can be
discarded in a sum, even when there are infinitely many of them.
Sums of essentially finite families satisfy the usual rules for sums (such as the breaking-apart rule ∑_{s∈S} a_s = ∑_{s∈X} a_s + ∑_{s∈Y} a_s when a set S is the union of two disjoint sets X and Y). See [Grinbe15, §2.14.15] for details¹³. There is only one caveat: Interchange of summation signs (e.g., replacing ∑_{i∈I} ∑_{j∈J} a_{i,j} by ∑_{j∈J} ∑_{i∈I} a_{i,j}) works only if the family (a_{i,j})_{(i,j)∈I×J} is essentially finite (i.e., all but finitely many pairs (i, j) ∈ I × J satisfy a_{i,j} = 0); it does not suffice that the sums ∑_{i∈I} ∑_{j∈J} a_{i,j} and ∑_{j∈J} ∑_{i∈I} a_{i,j} themselves are essentially finite (i.e., that the families (a_{i,j})_{j∈J} for all i ∈ I, the families (a_{i,j})_{i∈I} for all j ∈ J, and the families (∑_{j∈J} a_{i,j})_{i∈I} and (∑_{i∈I} a_{i,j})_{j∈J} are essentially finite).
For a counterexample, consider the family (a_{i,j})_{(i,j)∈I×J} of integers with I = {1, 2, 3, . . .} and J = {1, 2, 3, . . .}, where a_{i,j} = 1 if j = i, a_{i,j} = −1 if j = i + 1, and a_{i,j} = 0 otherwise; in table form:

    a_{i,j}   j=1   j=2   j=3   j=4   j=5   · · ·
    i=1        1    −1
    i=2              1    −1
    i=3                    1    −1
    i=4                          1    −1
    i=5                                1    · · ·
    ...

(where all the entries in the empty cells are 0). For this family (a_{i,j})_{(i,j)∈I×J}, both sums ∑_{i∈I} ∑_{j∈J} a_{i,j} and ∑_{j∈J} ∑_{i∈I} a_{i,j} are essentially finite, but they are not equal: indeed, the former sum is ∑_{i∈I} ∑_{j∈J} a_{i,j} = ∑_{i∈I} 0 = 0 (since each row sums to 0), whereas the latter sum is ∑_{i∈I} a_{i,1} + ∑_{j>1} ∑_{i∈I} a_{i,j} = 1 + ∑_{j>1} 0 = 1. And indeed, this family (a_{i,j})_{(i,j)∈I×J} is not essentially finite.

¹³Note that [Grinbe15, §2.14.15] uses the words “finitely supported” instead of “essentially finite”.

We have now made sense of infinite sums of elements of K when all but
finitely many addends are 0. Of course, we can do the same for K [[ x ]] instead
of K (since K [[ x ]], too, is a commutative ring). However, this does not help
make sense of the sum on the left hand side of (26), because this sum is not
essentially finite (it is a sum of infinitely many nonzero FPSs). Thus, for sums
of FPSs, we need a weaker version of essential finiteness. Here is its definition:
Definition 3.2.9. A (possibly infinite) family (ai )i∈ I of FPSs is said to be
summable (or entrywise essentially finite) if

for each n ∈ N, all but finitely many i ∈ I satisfy [ x n ] ai = 0.

In this case, the sum ∑_{i∈I} a_i is defined to be the FPS with

[x^n] (∑_{i∈I} a_i) = ∑_{i∈I} [x^n] a_i        for all n ∈ N        (27)

(the right hand side being an essentially finite sum).

Remark 3.2.10. The condition “all but finitely many i ∈ I satisfy [ x n ] ai = 0”


in Definition 3.2.9 is not equivalent to “infinitely many i ∈ I satisfy [ x n ] ai =
0”.
Any essentially finite family of FPSs is summable, but the converse is not
generally the case.
Example 3.2.11. Let us see how Definition 3.2.9 justifies (26). Consider the family (a_i)_{i∈N} ∈ K[[x]]^N of FPSs, where

a_i := (0, 0, . . . , 0, 1, 1, 1, . . .)   (with i zeroes at the front)   for each i ∈ N.

The left hand side of (26) is precisely a_0 + a_1 + a_2 + · · · = ∑_{i∈N} a_i, so let us check that the family (a_i)_{i∈N} is summable and that its sum ∑_{i∈N} a_i really equals the right hand side of (26).
For each n ∈ N, all but finitely many i ∈ N satisfy [x^n] a_i = 0 (indeed, all i > n satisfy this equality). Thus, the family (a_i)_{i∈N} is summable. For each n ∈ N, we have

[x^n] (∑_{i∈N} a_i) = ∑_{i∈N} [x^n] a_i        (by (27))

= ∑_{i=0}^{n} [x^n] a_i + ∑_{i>n} [x^n] a_i = ∑_{i=0}^{n} 1 + ∑_{i>n} 0        (since [x^n] a_i = 1 for i ≤ n, but [x^n] a_i = 0 for i > n)

= n + 1.

Thus, ∑_{i∈N} a_i = (1, 2, 3, 4, . . .). This is precisely the right hand side of (26). Thus, (26) has been justified rigorously.
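The computation in Example 3.2.11 can be mirrored on truncations. This is our own sketch (the helper `a` and the truncation order N are not from the notes): each coefficient of the sum is an essentially finite sum, since only the addends with i ≤ n contribute to coefficient n.

```python
# Sketch of Example 3.2.11 on truncations: keep the first N coefficients
# of each FPS a_i = (0, ..., 0, 1, 1, 1, ...) (i zeroes) and add them up.
N = 6

def a(i):
    # the first N coefficients of a_i
    return [0] * min(i, N) + [1] * max(N - i, 0)

# Coefficient n of the sum is essentially finite: only i <= n contribute,
# so summing over i in range(N) already captures every nonzero addend.
total = [sum(a(i)[n] for i in range(N)) for n in range(N)]
print(total)  # [1, 2, 3, 4, 5, 6]
```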

You can think of summable infinite sums of FPSs as a crude algebraic imitation of uniformly convergent infinite sums of holomorphic functions in complex analysis. (However, the former are a much simpler concept than the latter. In particular, complex analysis is completely unnecessary in their study.)
The following fact is nearly obvious:14

Proposition 3.2.12. Let (ai )i∈ I be a summable family of FPSs. Then, any
subfamily of (ai )i∈ I is summable as well.

Proof of Proposition 3.2.12. Let J be a subset of I. We must prove that the sub-
family (ai )i∈ J is summable.
Let n ∈ N. Then, all but finitely many i ∈ I satisfy [ x n ] ai = 0 (since the
family (ai )i∈ I is summable). Hence, all but finitely many i ∈ J satisfy [ x n ] ai =
0 (since J is a subset of I). Since we have proved this for each n ∈ N, we
thus conclude that the family (ai )i∈ J is summable. This proves Proposition
3.2.12.
Just as with essentially finite families, we can work with summable sums of
FPSs “as if they were finite” most of the time:

Proposition 3.2.13. Sums of summable families of FPSs satisfy the usual rules for sums (such as the breaking-apart rule ∑_{s∈S} a_s = ∑_{s∈X} a_s + ∑_{s∈Y} a_s when a set S is the union of two disjoint sets X and Y). See [19s, Proposition 7.2.11] for details. Again, the only caveat is about interchange of summation signs: The equality

∑_{i∈I} ∑_{j∈J} a_{i,j} = ∑_{j∈J} ∑_{i∈I} a_{i,j}

holds when the family (a_{i,j})_{(i,j)∈I×J} is summable (i.e., when for each n ∈ N, all but finitely many pairs (i, j) ∈ I × J satisfy [x^n] a_{i,j} = 0); it does not generally hold if we merely assume that the sums ∑_{i∈I} ∑_{j∈J} a_{i,j} and ∑_{j∈J} ∑_{i∈I} a_{i,j} are summable.

¹⁴A subfamily of a family (f_i)_{i∈I} means a family of the form (f_i)_{i∈J}, where J is a subset of I.
Proof of Proposition 3.2.13. The proof is tedious (as there are many rules to check),
but fairly straightforward (the idea is always to focus on a single coefficient,
and then to reduce the infinite sums to finite sums). For example, consider the
“discrete Fubini rule”, which says that

∑_{i∈I} ∑_{j∈J} a_{i,j} = ∑_{(i,j)∈I×J} a_{i,j} = ∑_{j∈J} ∑_{i∈I} a_{i,j}        (28)

whenever (a_{i,j})_{(i,j)∈I×J} is a summable family of FPSs. In order to prove this rule, we fix a summable family (a_{i,j})_{(i,j)∈I×J} of FPSs. It is easy to see that the families (a_{i,j})_{j∈J} for all i ∈ I are summable as well, as are the families (a_{i,j})_{i∈I} for all j ∈ J, and the families (∑_{j∈J} a_{i,j})_{i∈I} and (∑_{i∈I} a_{i,j})_{j∈J}. Hence, all sums in (28) are well-defined. Now, in order to prove (28), it suffices to check that

[x^n] (∑_{i∈I} ∑_{j∈J} a_{i,j}) = [x^n] (∑_{(i,j)∈I×J} a_{i,j}) = [x^n] (∑_{j∈J} ∑_{i∈I} a_{i,j})

for each n ∈ N. Fix n ∈ N; then, we have [x^n] a_{i,j} = 0 for all but finitely many (i, j) ∈ I × J (since the family (a_{i,j})_{(i,j)∈I×J} is summable). That is, the set of all pairs (i, j) ∈ I × J satisfying [x^n] a_{i,j} ≠ 0 is finite. Hence, the set I′ := {i | (i, j) ∈ I × J with [x^n] a_{i,j} ≠ 0} of the first entries of all these pairs is also finite, and so is the set J′ := {j | (i, j) ∈ I × J with [x^n] a_{i,j} ≠ 0} of the second entries of all these pairs. Now, the definitions of I′ and J′ ensure that any pair (i, j) ∈ I × J satisfies [x^n] a_{i,j} = 0 unless i ∈ I′ and j ∈ J′. Hence, we easily obtain the three equalities

[x^n] (∑_{i∈I} ∑_{j∈J} a_{i,j}) = ∑_{i∈I} ∑_{j∈J} [x^n] a_{i,j} = ∑_{i∈I′} ∑_{j∈J′} [x^n] a_{i,j}

and

[x^n] (∑_{(i,j)∈I×J} a_{i,j}) = ∑_{(i,j)∈I×J} [x^n] a_{i,j} = ∑_{(i,j)∈I′×J′} [x^n] a_{i,j}

and

[x^n] (∑_{j∈J} ∑_{i∈I} a_{i,j}) = ∑_{j∈J} ∑_{i∈I} [x^n] a_{i,j} = ∑_{j∈J′} ∑_{i∈I′} [x^n] a_{i,j}.

However, the right hand sides of these three equalities are equal (since the sums
appearing in them are finite sums, and thus satisfy the usual rules for sums).
Thus, the left hand sides are equal, exactly as we needed to show. See [19s,
proof of Proposition 7.2.11] for more details of this proof. Proving the other
properties of sums is easier.
A few conventions about infinite sums will be used rather often:

Convention 3.2.14. (a) For any given integer m ∈ Z, the summation sign ∑_{k≥m} is to be understood as ∑_{k∈{m,m+1,m+2,...}}. We also write ∑_{k=m}^{∞} for this summation sign.
(b) For any given integer m ∈ Z, the summation sign ∑_{k>m} is to be understood as ∑_{k∈{m+1,m+2,m+3,...}}.
(c) Let I be a set, and let A(i) be a logical statement for each i ∈ I. (For example, I can be N, and A(i) can be the statement “i is odd”.) Then, the summation sign ∑_{i∈I; A(i)} is to be understood as ∑_{i∈{j∈I | A(j)}}. (For example, the summation sign ∑_{i∈N; i is odd} means ∑_{i∈{j∈N | j is odd}}, that is, a sum over all odd elements of N.)

We can now define the x that figured so prominently in our informal explo-
ration of formal power series back in Section 3.1:

Definition 3.2.15. Let x denote the FPS (0, 1, 0, 0, 0, . . .). In other words, let x denote the FPS with [x^1] x = 1 and [x^i] x = 0 for all i ≠ 1.

The following simple lemma follows almost immediately from the definition
of multiplication of FPSs:

Lemma 3.2.16. Let a = ( a0 , a1 , a2 , . . .) be an FPS. Then, x · a =


(0, a0 , a1 , a2 , . . .).

In other words, multiplying an FPS a by x is tantamount to inserting a 0 at


the front of a (and shifting all the previously existing entries of a to the right
by one position).

Proof of Lemma 3.2.16. If n is a positive integer, then

[x^n] (x · a) = ∑_{i=0}^{n} [x^i] x · [x^{n−i}] a        (by (22), applied to x and a instead of a and b)

= ∑_{i=0}^{n} [x^i] x · a_{n−i}        (since [x^{n−i}] a = a_{n−i}, because a = (a_0, a_1, a_2, . . .))

= [x^0] x · a_{n−0} + [x^1] x · a_{n−1} + ∑_{i=2}^{n} [x^i] x · a_{n−i}        (here, we have split off the addends for i = 0 and i = 1 from the sum)

= 0 · a_{n−0} + 1 · a_{n−1} + ∑_{i=2}^{n} 0 · a_{n−i}        (since [x^0] x = 0 and [x^1] x = 1, while [x^i] x = 0 for all i ≥ 2 > 1)

= 1 · a_{n−1} = a_{n−1}.

A similar argument can be used for n = 0 (except that now, the sum ∑_{i=0}^{n} [x^i] x · a_{n−i} has no [x^1] x · a_{n−1} addend), and results in the conclusion that [x^n] (x · a) = 0 in this case. Thus, for each n ∈ N, we have

[x^n] (x · a) = a_{n−1} if n > 0,   and   [x^n] (x · a) = 0 if n = 0.

In other words, x · a = (0, a_0, a_1, a_2, . . .). This proves Lemma 3.2.16.


Recall that x^k = x x · · · x (with k factors) for each k ∈ N (by the definition of powers in any commutative ring). In particular, x^0 = 1 (since 1 is the unity of the ring K[[x]]). The following proposition describes x^k explicitly for each k ∈ N:

Proposition 3.2.17. We have

x^k = (0, 0, . . . , 0, 1, 0, 0, 0, . . .)   (with k zeroes before the 1)   for each k ∈ N.

Proof of Proposition 3.2.17. We induct on k.
Induction base: We have x^0 = 1 = (1, 0, 0, 0, 0, . . .), which is of the required form with k = 0 zeroes before the 1. In other words, Proposition 3.2.17 holds for k = 0.

Induction step: Let m ∈ N. Assume that Proposition 3.2.17 holds for k = m. We must prove that Proposition 3.2.17 holds for k = m + 1.
We have x^m = (0, 0, . . . , 0, 1, 0, 0, 0, . . .) with m zeroes before the 1 (since Proposition 3.2.17 holds for k = m). Thus, Lemma 3.2.16 (applied to a = x^m, so that (a_0, a_1, a_2, . . .) = (0, 0, . . . , 0, 1, 0, 0, 0, . . .) with m zeroes before the 1) yields

x · x^m = (0, 0, 0, . . . , 0, 1, 0, 0, 0, . . .)   (with m + 1 zeroes before the 1).

In other words, x^{m+1} = (0, 0, . . . , 0, 1, 0, 0, 0, . . .) with m + 1 zeroes before the 1 (since x · x^m = x^{m+1}). In other words, Proposition 3.2.17 holds for k = m + 1. This completes the induction step, thus proving Proposition 3.2.17.
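Proposition 3.2.17 (and the shifting behavior of Lemma 3.2.16 that drives its proof) can be double-checked on truncated FPSs. This is our own sketch; the helper `fps_mul` and the truncation order N are not from the notes.

```python
# Sketch: verify Proposition 3.2.17 on truncations — x^k has a single 1
# in position k. Coefficients are kept up to order N - 1.
N = 10

def fps_mul(a, b):
    # Cauchy product c_n = sum_{i=0}^{n} a_i b_{n-i}
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N)]

x = [0, 1] + [0] * (N - 2)   # the FPS x = (0, 1, 0, 0, ...)
p = [1] + [0] * (N - 1)      # start with x^0 = 1 = (1, 0, 0, ...)
for k in range(5):
    # x^k should have k zeroes, then a 1, then zeroes
    assert p == [0] * k + [1] + [0] * (N - k - 1)
    p = fps_mul(x, p)        # multiplying by x shifts the entries (Lemma 3.2.16)
print("x^k verified for k = 0, ..., 4")
```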
Finally, the following corollary allows us to rewrite any FPS (a_0, a_1, a_2, . . .) in the familiar form a_0 + a_1 x + a_2 x² + a_3 x³ + · · · :

Corollary 3.2.18. Any FPS (a_0, a_1, a_2, . . .) ∈ K[[x]] satisfies

(a_0, a_1, a_2, . . .) = a_0 + a_1 x + a_2 x² + a_3 x³ + · · · = ∑_{n∈N} a_n x^n.

In particular, the right hand side here is well-defined, i.e., the family (a_n x^n)_{n∈N} is summable.

Proof of Corollary 3.2.18 (sketched). (See [19s, Corollary 7.2.16] for details.) By

Proposition 3.2.17, we have

a_0 + a_1 x + a_2 x² + a_3 x³ + · · ·
= a_0 (1, 0, 0, 0, . . .) + a_1 (0, 1, 0, 0, . . .) + a_2 (0, 0, 1, 0, . . .) + a_3 (0, 0, 0, 1, . . .) + · · ·
= (a_0, 0, 0, 0, . . .) + (0, a_1, 0, 0, . . .) + (0, 0, a_2, 0, . . .) + (0, 0, 0, a_3, . . .) + · · ·
= (a_0, a_1, a_2, a_3, . . .).

This proves Corollary 3.2.18 (since a_0 + a_1 x + a_2 x² + a_3 x³ + · · · = ∑_{n∈N} a_n x^n holds for obvious reasons).
So we have “found” our x and given a rigorous justification for writing
( a0 , a1 , a2 , . . .) as a0 + a1 x + a2 x2 + a3 x3 + · · · . Note that we did not use any
analysis (real or complex) in the process; in particular, we did not have to
worry about convergence (we did have to worry about summability, but this is
much simpler than convergence). It is easy to come up with FPSs that don’t
converge when any nonzero real number is substituted for x (for example,
∑_{n∈N} n! x^n = 1 + x + 2x² + 6x³ + 24x⁴ + · · · is such an FPS); they are nevertheless completely legitimate FPSs.
We can now also answer the question “what is a generating function”, albeit
the answer is somewhat anticlimactic:
Definition 3.2.19. Let ( a0 , a1 , a2 , . . .) be a sequence of elements of K.
Then, the (ordinary) generating function of ( a0 , a1 , a2 , . . .) will mean the FPS
( a0 , a1 , a2 , . . . ) = a0 + a1 x + a2 x 2 + a3 x 3 + · · · .

3.2.3. The Chu–Vandermonde identity


What we have done so far is sufficient to justify Example 3 from Section 3.1.
Thus, let us record the result of Example 3 as a proposition:
Proposition 3.2.20. Let a, b ∈ N, and let n ∈ N. Then,

\binom{a+b}{n} = ∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k}.        (29)
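For a, b ∈ N, Proposition 3.2.20 is easy to spot-check with Python's `math.comb` (which returns 0 when the lower index exceeds the upper). This brute-force check is ours, not part of the notes.

```python
from math import comb

# Spot-check of the Vandermonde convolution (29) for small a, b, n in N.
for a in range(8):
    for b in range(8):
        for n in range(8):
            lhs = comb(a + b, n)
            rhs = sum(comb(a, k) * comb(b, n - k) for k in range(n + 1))
            assert lhs == rhs
print("ok")
```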

We have yet to justify Examples 1, 2 and 4; we shall do so later. First, however,


let us generalize Proposition 3.2.20 to arbitrary numbers a, b (as opposed to
merely a, b ∈ N). That is, we shall prove the following:

Theorem 3.2.21 (Vandermonde convolution identity, aka Chu–Vandermonde identity). Let a, b ∈ C, and let n ∈ N. Then,

\binom{a+b}{n} = ∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k}.        (30)

(Actually, the “Let a, b ∈ C” in Theorem 3.2.21 can be replaced by “Let a, b be


any numbers”, where “numbers” is understood appropriately15 . We just wrote
“Let a, b ∈ C” for simplicity.)
To recall, back in Example 3, we proved Proposition 3.2.20 by multiplying
out the identity (1 + x ) a+b = (1 + x ) a (1 + x )b using the binomial formula. If
we wanted to extend this argument to arbitrary a, b ∈ C, then we would need
to make sense of powers like (1 + x )−3 or (1 + x )1/2 or (1 + x )π . This is indeed
possible (in fact, we will briefly outline this later on), but there is a much shorter
way.
In fact, there is a slick trick to automatically extend a claim like (29) from
nonnegative integers to complex numbers. It is sometimes known as the poly-
nomial identity trick, and is used a lot in algebra. The proof of Theorem 3.2.21
that we shall sketch below should illustrate this trick; you can find more details
and further examples in [20f, §7.5.3].
Proof of Theorem 3.2.21 (sketched). Fix n ∈ N and b ∈ N, but forget that a was fixed. Then, Proposition 3.2.20 (which we have already proved) says that the equality

\binom{a+b}{n} = ∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k}        (31)

holds for each a ∈ N. However, both sides of this equality are polynomials (more precisely, polynomial functions) in a: indeed,

\binom{a+b}{n} = (a+b)(a+b−1)(a+b−2) · · · (a+b−n+1) / n!        and

∑_{k=0}^{n} \binom{a}{k} \binom{b}{n−k} = ∑_{k=0}^{n} ( a(a−1) · · · (a−k+1) / k! ) \binom{b}{n−k}.

If two univariate polynomials p and q (with rational, real or complex coefficients) are equal for each a ∈ N (that is: if we have p(a) = q(a) for each
¹⁵Namely, a “number” should here be viewed as an element of a commutative Q-algebra. This includes complex numbers, polynomials over complex numbers, power series over complex numbers and even commuting matrices with complex entries.

a ∈ N), then they must be identical (because two univariate polynomials that
are equal at infinitely many points must necessarily be identical16 ). Hence, the
two sides of (31) must be identical as polynomials in a. Thus, the equality (31)
holds not only for each a ∈ N, but also for each a ∈ C.
Now, forget that b was fixed. Instead, let us fix a ∈ C. As we just have
proved, the equality (31) holds for each b ∈ N. We want to show that it holds
for each b ∈ C. But this can be achieved by the same argument that we just
used to extend it from a ∈ N to a ∈ C: We view both sides of the equality
as polynomials (but this time in b, not in a), and argue that these polynomials
must be identical because they are equal at infinitely many points. The upshot
is that the equality (31) holds for all a, b ∈ C; thus, Theorem 3.2.21 is proven.
(See [20f, proofs of Lemma 7.5.8 and Theorem 7.5.3] or [19s, §2.17.3] for this
proof in more detail. Alternatively, see [Grinbe15, §3.3.2 and §3.3.3] for two
other proofs.)
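The extension of (31) beyond N can also be tested numerically: defining the generalized binomial coefficient via the falling factorial, one can check (30) with exact rational arithmetic, e.g. for a = −3 and b = 1/2. This sanity check is our own illustration, not part of the proof; the helper `binom` is our own.

```python
from fractions import Fraction

# Sanity check (ours): the generalized binomial coefficient
# binom(a, k) = a(a-1)...(a-k+1) / k! makes sense for any rational a,
# and the Chu-Vandermonde identity (30) still holds.

def binom(a, k):
    num = Fraction(1)
    for i in range(k):
        num *= a - i        # falling factorial a(a-1)...(a-k+1)
    denom = 1
    for i in range(1, k + 1):
        denom *= i          # k!
    return num / denom

a, b = Fraction(-3), Fraction(1, 2)
for n in range(7):
    lhs = binom(a + b, n)
    rhs = sum(binom(a, k) * binom(b, n - k) for k in range(n + 1))
    assert lhs == rhs
print("ok")
```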

3.2.4. What next?


Let us now return to our quest of justifying Examples 1, 2 and 4 from Section
3.1. In order to do so, we need to know

• what we can substitute into an FPS;

• when and why we can divide FPSs by FPSs;

• when and why we can take the square root of an FPS and solve a quadratic equation using the quadratic formula.

So we need to do more. The following sections are devoted to this.

3.3. Dividing FPSs


3.3.1. Conventions
We shall make ourselves at home in the ring K [[ x ]] a bit more. (Recall that K is
a fixed commutative ring.)

Convention 3.3.1. From now on, we identify each a ∈ K with the constant
FPS a = ( a, 0, 0, 0, 0, . . .) ∈ K [[ x ]].

16 Quick reminder on why this is true: If p and q are two univariate polynomials (with rational,
real or complex coefficients) that are equal at infinitely many points (i.e., if there exist
infinitely many numbers z satisfying p (z) = q (z)), then p = q (because the assumption
entails that the difference p − q has infinitely many roots, but this entails p − q = 0 and thus
p = q). See [20f, Corollary 7.5.7] for this argument in more detail.

This is motivated by the fact that a = a + 0x + 0x² + 0x³ + · · · for any a ∈ K.
Convention 3.3.1 does not cause any dangerous ambiguities, because for any a, b ∈ K, the constant FPS corresponding to a + b is the sum of the constant FPSs corresponding to a and b, and likewise for the difference a − b and the product a · b (check this!), and because the zero and the unity of the ring K[[x]] are the constant FPSs 0 and 1, respectively.
Furthermore, I will stop using boldfaced letters (like a, b, c) for FPSs. (I did
this above for the sake of convenience, but this is rarely done in the literature.)

3.3.2. Inverses in commutative rings


We recall the notion of an inverse in a commutative ring:

Definition 3.3.2. Let L be a commutative ring. Let a ∈ L. Then:


(a) An inverse (or multiplicative inverse) of a means an element b ∈ L such
that ab = ba = 1 (where the “1” means the unity of L).
(b) We say that a is invertible in L (or a unit of L) if a has an inverse.

Note that the condition “ab = ba = 1” in Definition 3.3.2 (a) can be restated
as “ab = 1”, because we automatically have ab = ba (since L is a commutative
ring). I have chosen to write “ab = ba = 1” in order to state the definition in a
form that applies verbatim to noncommutative rings as well.

Example 3.3.3. (a) In the ring Z, the only two invertible elements are 1 and
−1. Each of these two elements is its own inverse.
(b) In the ring Q, every nonzero element is invertible. The same holds for
the rings R and C (and, more generally, for any field).

Our next goal is to study inverses of FPSs in K [[ x ]], answering in particular


the natural question “which elements of K [[ x ]] have inverses”. But let us first
prove their uniqueness in the generality of an arbitrary commutative ring:

Theorem 3.3.4. Let L be a commutative ring. Let a ∈ L. Then, there is at


most one inverse of a.
Proof. Let b and c be two inverses of a. We must prove that b = c.
Since b is an inverse of a, we have ab = ba = 1. Since c is an inverse of a,
we have ac = ca = 1. Now, we have b(ac) = b · 1 = b (since ac = 1) and (ba)c = 1 · c = c (since ba = 1).
However, because of the “associativity of multiplication” axiom in Definition
3.2.1, we have b ( ac) = (ba) c. Hence, b = b ( ac) = (ba) c = c. This proves
Theorem 3.3.4.

Theorem 3.3.4 allows us to make the following definition:

Definition 3.3.5. Let L be a commutative ring. Let a ∈ L. Assume that a is invertible. Then:
(a) The inverse of a is called a^{−1}. (This notation is unambiguous, since Theorem 3.3.4 shows that the inverse of a is unique.)
(b) For any b ∈ L, the product b · a^{−1} is called b/a.
(c) For any negative integer n, we define a^n to be (a^{−1})^{−n}. Thus, the n-th power a^n is defined for each n ∈ Z.

The following facts are easy to check:

Proposition 3.3.6. Let L be a commutative ring. Then:
(a) Any invertible element a ∈ L satisfies a^{−1} = 1/a.
(b) For any invertible elements a, b ∈ L, the element ab is invertible as well, and satisfies (ab)^{−1} = b^{−1} a^{−1} = a^{−1} b^{−1}.
(c) If a ∈ L is invertible, and if n ∈ Z is arbitrary, then a^{−n} = (a^{−1})^n = (a^n)^{−1}.
(d) Laws of exponents hold for negative exponents as well: If a and b are invertible elements of L, then

a^{n+m} = a^n a^m        for all n, m ∈ Z;
(ab)^n = a^n b^n        for all n ∈ Z;
(a^n)^m = a^{nm}        for all n, m ∈ Z.

(e) Laws of fractions hold: If a and c are two invertible elements of L, and if b and d are any two elements of L, then

b/a + d/c = (bc + ad)/(ac)        and        (b/a) · (d/c) = (bd)/(ac).

(f) Division undoes multiplication: If a, b, c are three elements of L with a being invertible, then the equality c/a = b is equivalent to c = ab.

Proof. Exercise. (See, e.g., [19s, solution to Exercise 4.1.1] for a proof of parts (c)
and (d) in the special case where L = C; essentially the same argument works
in the general case. The remaining parts of Proposition 3.3.6 are even easier to
check. Note that parts (a) and (c) as well as the ( ab)−1 = b−1 a−1 part of part
(b) would hold even if L was a noncommutative ring.)

3.3.3. Inverses in K [[ x ]]
Now, which FPSs are invertible in the ring K [[ x ]] ? For example, we know
from (5) that the FPS 1 − x is invertible, with inverse 1 + x + x2 + x3 + · · · . On
the other hand, the FPS x is not invertible, since Lemma 3.2.16 shows that any
product of x with an FPS must begin with a 0 (but the unity of K [[ x ]] does not
begin with a 0). (Strictly speaking, this is only true if the ring K is nontrivial
– i.e., if not all elements of K are equal. If K is trivial, then K [[ x ]] is trivial,
and thus any FPS in K [[ x ]] is invertible, but this does not make an interesting
statement.)
It turns out that we can characterize invertible FPSs in K [[ x ]] in a rather
simple way:

Proposition 3.3.7. Let a ∈ K[[x]]. Then, the FPS a is invertible in K[[x]] if and only if its constant term [x^0] a is invertible in K.

Proof. =⇒: Assume that a is invertible in K[[x]]. That is, a has an inverse b ∈
K[[x]]. Consider this b. Since b is an inverse of a, we have ab = ba = 1 (where
“1” means the FPS 1 by Convention 3.3.1). However, (24) (applied to a = a and
b = b) yields [x^0] (ab) = [x^0] a · [x^0] b. Comparing this with [x^0] (ab) = [x^0] 1 = 1,
we find [x^0] a · [x^0] b = 1. Thus, [x^0] b is an inverse of [x^0] a in K (since
[x^0] a · [x^0] b = [x^0] b · [x^0] a = 1, by the commutativity of K).
Therefore, [x^0] a is invertible in K (with inverse [x^0] b). This proves the “=⇒”
direction of Proposition 3.3.7.
⇐=: Assume that [x^0] a is invertible in K. Write the FPS a in the form a =
(a_0, a_1, a_2, . . .). Thus, [x^0] a = a_0, so that a_0 is invertible in K (since [x^0] a is
invertible in K). Thus, its inverse a_0^{-1} is well-defined.
Now, we want to prove that a is invertible in K[[x]]. We thus try to find an
inverse of a.
We work backwards at first: We assume that b = (b_0, b_1, b_2, . . .) ∈ K[[x]] is an
inverse of a, and we try to figure out what this inverse looks like.
Since b is an inverse of a, we have ab = 1 = (1, 0, 0, 0, . . .). However, from
a = ( a0 , a1 , a2 , . . .) and b = (b0 , b1 , b2 , . . .), we have

ab = ( a0 , a1 , a2 , . . .) (b0 , b1 , b2 , . . .)
= ( a0 b0 , a0 b1 + a1 b0 , a0 b2 + a1 b1 + a2 b0 , a0 b3 + a1 b2 + a2 b1 + a3 b0 , . . .)

(by the definition of the product of FPSs). Comparing this with ab = (1, 0, 0, 0, . . .),
we obtain

(1, 0, 0, 0, . . .)
= ( a0 b0 , a0 b1 + a1 b0 , a0 b2 + a1 b1 + a2 b0 , a0 b3 + a1 b2 + a2 b1 + a3 b0 , . . .) .

This can be rewritten as the following system of equations:

    1 = a_0 b_0,
    0 = a_0 b_1 + a_1 b_0,
    0 = a_0 b_2 + a_1 b_1 + a_2 b_0,                                (32)
    0 = a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0,
    . . . .

I claim that this system of equations uniquely determines (b_0, b_1, b_2, . . .). Indeed,
we can solve the first equation (1 = a_0 b_0) for b_0, thus obtaining b_0 = a_0^{-1}
(since a_0 is invertible). Having thus found b_0, we can solve the second equation
(0 = a_0 b_1 + a_1 b_0) for b_1, thus obtaining b_1 = −a_0^{-1} (a_1 b_0) (again because a_0 is
invertible). Having thus found both b_0 and b_1, we can solve the third equation
(0 = a_0 b_2 + a_1 b_1 + a_2 b_0) for b_2, thus obtaining b_2 = −a_0^{-1} (a_1 b_1 + a_2 b_0). Proceeding
like this, we obtain recursive expressions for all coefficients b_0, b_1, b_2, . . . of
b, namely

    b_0 = a_0^{-1},
    b_1 = −a_0^{-1} (a_1 b_0),
    b_2 = −a_0^{-1} (a_1 b_1 + a_2 b_0),                            (33)
    b_3 = −a_0^{-1} (a_1 b_2 + a_2 b_1 + a_3 b_0),
    . . . .

(This procedure for solving systems of linear equations is well-known from lin-
ear algebra – it is a form of Gaussian elimination, but a particularly simple
one because our system is triangular with invertible coefficients on the diago-
nal. The only complication is that it has infinitely many variables and infinitely
many equations.)
So we have shown that if b is an inverse of a, then the entries bi of the FPS b
are given recursively by (33). This yields that b is unique; alas, this is not what
we want to prove. Instead, we want to prove that b exists.
Fortunately, we can achieve this by simply turning our above argument around:
Forget that we fixed b. Instead, we define a sequence (b0 , b1 , b2 , . . .) of elements
of K recursively by (33), and we define the FPS b = (b0 , b1 , b2 , . . .) ∈ K [[ x ]].
Then, the equalities (32) hold (because they are just equivalent restatements of
the equalities (33)). In other words, we have
(1, 0, 0, 0, . . .)
= ( a0 b0 , a0 b1 + a1 b0 , a0 b2 + a1 b1 + a2 b0 , a0 b3 + a1 b2 + a2 b1 + a3 b0 , . . .) .
However, as before, we can show that
ab = ( a0 b0 , a0 b1 + a1 b0 , a0 b2 + a1 b1 + a2 b0 , a0 b3 + a1 b2 + a2 b1 + a3 b0 , . . .) .
Comparing these two equalities, we find ab = (1, 0, 0, 0, . . .) = 1. Thus, ba =
ab = 1, so that ab = ba = 1. This shows that b is an inverse of a, so that a is
invertible. This proves the “⇐=” direction of Proposition 3.3.7.
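The recursion (33) is effectively an algorithm for computing inverses coefficient by coefficient. Here is a small Python sketch (our own illustration, not part of the notes; the name `fps_inverse` is ours) for the case K = Q, using exact rational arithmetic:

```python
from fractions import Fraction

def fps_inverse(a, num_coeffs):
    """First num_coeffs coefficients of the inverse of the FPS whose coefficient
    list begins with a (all later coefficients taken to be 0). Requires a[0] != 0."""
    a0_inv = Fraction(1) / Fraction(a[0])
    b = [a0_inv]  # b_0 = a_0^{-1}, the first equation of (33)
    for n in range(1, num_coeffs):
        # solve 0 = a_0 b_n + a_1 b_{n-1} + ... + a_n b_0 for b_n
        s = sum(Fraction(a[i]) * b[n - i]
                for i in range(1, min(n, len(a) - 1) + 1))
        b.append(-a0_inv * s)
    return b

# The inverse of 1 - x is 1 + x + x^2 + ..., matching (5):
print(fps_inverse([1, -1], 6))  # six coefficients, all equal to 1
```

For a = 1 + 2x, the same function returns the coefficients 1, −2, 4, −8, . . ., i.e. the geometric series ∑ (−2x)^n, as one would expect.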

We note a particularly simple corollary of Proposition 3.3.7 when K is a field:

Corollary 3.3.8. Assume that K is a field. Let a ∈ K[[x]]. Then, the FPS a is
invertible in K[[x]] if and only if [x^0] a ̸= 0.

Proof. An element of K is invertible in K if and only if it is nonzero (since K is
a field). Hence, Corollary 3.3.8 follows from Proposition 3.3.7.

3.3.4. Newton’s binomial formula


Let us now return to considering specific FPSs. We have already seen that the
FPS 1 − x is invertible, with inverse 1 + x + x2 + x3 + · · · . We shall now show
an analogous result for the FPS 1 + x. Its invertibility follows from Proposition
3.3.7, but it is better to derive it by hand, as this also gives a formula for the
inverse:

Proposition 3.3.9. The FPS 1 + x ∈ K[[x]] is invertible, and its inverse is

    (1 + x)^{-1} = 1 − x + x^2 − x^3 + x^4 − x^5 ± · · · = ∑_{n∈N} (−1)^n x^n.

First proof of Proposition 3.3.9. We have

    (1 + x) · (1 − x + x^2 − x^3 + x^4 − x^5 ± · · ·)
    = 1 · (1 − x + x^2 − x^3 + x^4 − x^5 ± · · ·) + x · (1 − x + x^2 − x^3 + x^4 − x^5 ± · · ·)
    = (1 − x + x^2 − x^3 + x^4 − x^5 ± · · ·) + (x − x^2 + x^3 − x^4 + x^5 − x^6 ± · · ·)
    = 1

(since all powers of x other than 1 cancel out). This shows that 1 − x + x^2 − x^3 +
x^4 − x^5 ± · · · is an inverse of 1 + x (since K[[x]] is a commutative ring). Thus,
1 + x is invertible, and its inverse is (1 + x)^{-1} = 1 − x + x^2 − x^3 + x^4 − x^5 ± · · · =
∑_{n∈N} (−1)^n x^n. This proves Proposition 3.3.9.
n ∈N

Second proof of Proposition 3.3.9. We have

    (1 + x) · (1 − x + x^2 − x^3 + x^4 − x^5 ± · · ·)
    = (1 + x) · 1 − (1 + x) · x + (1 + x) · x^2 − (1 + x) · x^3 + (1 + x) · x^4 − (1 + x) · x^5 ± · · ·
    = (1 + x) − (x + x^2) + (x^2 + x^3) − (x^3 + x^4) + (x^4 + x^5) − (x^5 + x^6) ± · · ·
    = 1

(since we have a telescoping sum in front of us, in which all powers of x other
than 1 cancel out). This shows that 1 − x + x^2 − x^3 + x^4 − x^5 ± · · · is an inverse
of 1 + x (since K[[x]] is a commutative ring). Thus, 1 + x is invertible, and its
inverse is (1 + x)^{-1} = 1 − x + x^2 − x^3 + x^4 − x^5 ± · · · = ∑_{n∈N} (−1)^n x^n. This
proves Proposition 3.3.9.
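The cancellation in both proofs can also be watched numerically. The following sketch (our own code; `fps_mul` is our name for truncated multiplication via the convolution rule (22)) multiplies 1 + x by the first ten coefficients of 1 − x + x^2 − x^3 ± · · ·:

```python
def fps_mul(a, b, num_coeffs):
    """First num_coeffs coefficients of the product of two FPSs, given by
    (finite prefixes of) their coefficient lists, via the convolution rule."""
    return [sum(a[i] * b[n - i]
                for i in range(n + 1)
                if i < len(a) and n - i < len(b))
            for n in range(num_coeffs)]

alternating = [(-1) ** n for n in range(10)]   # 1 - x + x^2 - x^3 +- ...
print(fps_mul([1, 1], alternating, 10))        # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```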
Proposition 3.3.9 shows that the FPS 1 + x is invertible; thus, its powers
(1 + x)^n are defined for all n ∈ Z (by Definition 3.3.5 (c)). The following for-
mula – known as Newton’s binomial theorem^{17} – describes these powers explicitly:

Theorem 3.3.10. For each n ∈ Z, we have

    (1 + x)^n = ∑_{k∈N} \binom{n}{k} x^k.

Note that the sum ∑_{k∈N} \binom{n}{k} x^k is summable for each n ∈ Z (indeed, it equals
the FPS (\binom{n}{0}, \binom{n}{1}, \binom{n}{2}, . . .)). If n ∈ N, then it is essentially finite.
The reader may want to check that the particular case n = −1 of Theorem
3.3.10 agrees with Proposition 3.3.9. (Recall Example 2.3.2!)
Of course, Theorem 3.3.10 should look familiar – an identical-looking for-
mula appears in real analysis under the same name. However, the result in real
analysis is concerned with infinite sums of real numbers, while our Theorem
3.3.10 is an identity between FPSs over an arbitrary commutative ring. Thus,
the two facts are not the same.
We will prove Theorem 3.3.10 in a somewhat roundabout way, since this
gives us an opportunity to establish some auxiliary results that are of separate
interest (and usefulness). The first of these auxiliary results is a fundamental
property of binomial coefficients, known as the upper negation formula (see, e.g.,
[19fco, Proposition 1.3.7]):

Theorem 3.3.11. Let n ∈ C and k ∈ Z. Then,

    \binom{−n}{k} = (−1)^k \binom{k + n − 1}{k}.

Proof of Theorem 3.3.11 (sketched). If k < 0, then this is trivial because both \binom{−n}{k}
and \binom{k + n − 1}{k} are 0 (by (1)). Thus, we WLOG assume that k ≥ 0. Hence,
17 or Newton’s binomial formula



k ∈ N. Thus, (1) yields

    \binom{−n}{k} = (−n) (−n − 1) (−n − 2) · · · (−n − k + 1) / k!     and
    \binom{k + n − 1}{k} = (k + n − 1) (k + n − 2) (k + n − 3) · · · (k + n − k) / k!.

A moment of thought reveals that the right hand sides of these two equalities
are equal up to a factor of (−1)^k (indeed, each factor −n − j of the first numerator
equals −(n + j), and the k numbers n, n + 1, . . . , n + k − 1 are precisely the factors
k + n − k, k + n − k + 1, . . . , k + n − 1 of the second numerator, listed in reverse
order). Thus, so are the left hand sides. In other words, \binom{−n}{k} = (−1)^k \binom{k + n − 1}{k}.
This proves Theorem 3.3.11.
(Quick exercise: Rederive Example 2.3.2 from Theorem 3.3.11.)
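Theorem 3.3.11 is easy to sanity-check by machine, at least for integer n. In the following sketch (our own, not from the notes; the helper `binom` implements the definition (1) for an arbitrary integer upper argument), we test the identity for a range of values:

```python
from math import factorial

def binom(n, k):
    """Binomial coefficient C(n, k) for an arbitrary integer n, as in (1):
    n(n-1)...(n-k+1)/k! for k >= 0, and 0 for k < 0."""
    if k < 0:
        return 0
    num = 1
    for j in range(k):
        num *= n - j
    # exact: any product of k consecutive integers is divisible by k!
    return num // factorial(k)

# Upper negation: C(-n, k) == (-1)^k * C(k + n - 1, k)
print(all(binom(-n, k) == (-1) ** k * binom(k + n - 1, k)
          for n in range(-5, 6) for k in range(8)))  # True
```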
Next, we show a formula for negative powers of 1 + x:

Proposition 3.3.12. For each n ∈ N, we have

    (1 + x)^{-n} = ∑_{k∈N} (−1)^k \binom{n + k − 1}{k} x^k.

Proof of Proposition 3.3.12. We proceed by induction on n:

Induction base: Comparing

    (1 + x)^{-0} = (1 + x)^0 = 1 = (1, 0, 0, 0, . . .)

with

    ∑_{k∈N} (−1)^k \binom{0 + k − 1}{k} x^k
    = (−1)^0 \binom{0 + 0 − 1}{0} x^0 + ∑_{k∈N; k>0} (−1)^k \binom{0 + k − 1}{k} x^k
          (here, we have split off the addend for k = 0 from the sum)
    = 1 + ∑_{k∈N; k>0} (−1)^k · 0 · x^k = 1

(since (−1)^0 = 1, \binom{0 + 0 − 1}{0} = 1 and x^0 = 1, whereas each k > 0 satisfies
\binom{0 + k − 1}{k} = \binom{k − 1}{k} = 0 (by Proposition 2.3.5, applied to m = k − 1
and n = k, since k − 1 < k and k − 1 ∈ N)),

we obtain (1 + x)^{-0} = ∑_{k∈N} (−1)^k \binom{0 + k − 1}{k} x^k. In other words, Proposition
3.3.12 holds for n = 0.

Induction step: Let j ∈ N. Assume that Proposition 3.3.12 holds for n = j. We
must prove that Proposition 3.3.12 holds for n = j + 1.
We have assumed that Proposition 3.3.12 holds for n = j. In other words, we
have

    (1 + x)^{-j} = ∑_{k∈N} (−1)^k \binom{j + k − 1}{k} x^k.                    (34)

Now, we want to prove that Proposition 3.3.12 holds for n = j + 1. In other
words, we want to prove that

    (1 + x)^{-(j+1)} = ∑_{k∈N} (−1)^k \binom{(j + 1) + k − 1}{k} x^k.

In view of (1 + x)^{-(j+1)} = (1 + x)^{-j} · (1 + x)^{-1} and (j + 1) + k − 1 = j + k, this
equality can be rewritten as

    (1 + x)^{-j} · (1 + x)^{-1} = ∑_{k∈N} (−1)^k \binom{j + k}{k} x^k.

Since 1 + x is invertible, we can equivalently transform this equality by multi-
plying both sides with 1 + x; thus, it becomes

    (1 + x)^{-j} = (∑_{k∈N} (−1)^k \binom{j + k}{k} x^k) · (1 + x).

So this is the equality we must prove.



We do this by simplifying its right hand side:

    (∑_{k∈N} (−1)^k \binom{j + k}{k} x^k) · (1 + x)
    = ∑_{k∈N} (−1)^k \binom{j + k}{k} x^k (1 + x)
    = ∑_{k∈N} ((−1)^k \binom{j + k}{k} x^k + (−1)^k \binom{j + k}{k} x^{k+1})
          (since x^k (1 + x) = x^k + x^{k+1})
    = ∑_{k∈N} (−1)^k \binom{j + k}{k} x^k + ∑_{k∈N} (−1)^k \binom{j + k}{k} x^{k+1}
    = ∑_{k∈N} (−1)^k (\binom{j + k − 1}{k − 1} + \binom{j + k − 1}{k}) x^k + ∑_{k≥1} (−1)^{k−1} \binom{j + (k − 1)}{k − 1} x^k

(here, we have rewritten \binom{j + k}{k} as \binom{j + k − 1}{k − 1} + \binom{j + k − 1}{k} in the
first sum (by Proposition 2.3.4, applied to m = j + k and n = k), and we have
substituted k − 1 for k in the second sum). Since (−1)^{k−1} = −(−1)^k and
j + (k − 1) = j + k − 1, this becomes

    = ∑_{k∈N} (−1)^k \binom{j + k − 1}{k − 1} x^k + ∑_{k∈N} (−1)^k \binom{j + k − 1}{k} x^k − ∑_{k≥1} (−1)^k \binom{j + k − 1}{k − 1} x^k
    = (∑_{k∈N} (−1)^k \binom{j + k − 1}{k − 1} x^k − ∑_{k≥1} (−1)^k \binom{j + k − 1}{k − 1} x^k) + ∑_{k∈N} (−1)^k \binom{j + k − 1}{k} x^k
    = (−1)^0 \binom{j + 0 − 1}{0 − 1} x^0 + (1 + x)^{-j}

(since the first two sums differ only in their k = 0 addend, while the last sum
equals (1 + x)^{-j} by (34))

    = 0 + (1 + x)^{-j} = (1 + x)^{-j}

(since \binom{j + 0 − 1}{0 − 1} = 0 by (1), because 0 − 1 ∉ N).

Multiplying both sides of this equality by (1 + x)^{-1}, we obtain

    ∑_{k∈N} (−1)^k \binom{j + k}{k} x^k = (1 + x)^{-j} · (1 + x)^{-1} = (1 + x)^{(−j)+(−1)} = (1 + x)^{-(j+1)},

so that

    (1 + x)^{-(j+1)} = ∑_{k∈N} (−1)^k \binom{j + k}{k} x^k = ∑_{k∈N} (−1)^k \binom{(j + 1) + k − 1}{k} x^k

(since j + k = (j + 1) + k − 1).

In other words, Proposition 3.3.12 holds for n = j + 1. This completes the
induction step. Thus, Proposition 3.3.12 is proved.
We can rewrite Proposition 3.3.12 using negative binomial coefficients:

Corollary 3.3.13. For each n ∈ N, we have

    (1 + x)^{-n} = ∑_{k∈N} \binom{−n}{k} x^k.

Proof of Corollary 3.3.13. Proposition 3.3.12 yields

    (1 + x)^{-n} = ∑_{k∈N} (−1)^k \binom{n + k − 1}{k} x^k = ∑_{k∈N} \binom{−n}{k} x^k

(since (−1)^k \binom{k + n − 1}{k} = \binom{−n}{k} by Theorem 3.3.11).

This proves Corollary 3.3.13.


We can now easily prove Newton’s binomial formula:
 
Proof of Theorem 3.3.10. Let n ∈ Z. We must prove that (1 + x)^n = ∑_{k∈N} \binom{n}{k} x^k.
If n ∈ N, then this follows by comparing

    (1 + x)^n = (x + 1)^n = ∑_{k=0}^{n} \binom{n}{k} x^k 1^{n−k}     (by the binomial theorem)
              = ∑_{k=0}^{n} \binom{n}{k} x^k

with

    ∑_{k∈N} \binom{n}{k} x^k = ∑_{k=0}^{n} \binom{n}{k} x^k + ∑_{k>n} \binom{n}{k} x^k = ∑_{k=0}^{n} \binom{n}{k} x^k + ∑_{k>n} 0 · x^k
                     = ∑_{k=0}^{n} \binom{n}{k} x^k

(since each k > n satisfies \binom{n}{k} = 0 by Proposition 2.3.5, because n < k).

Hence, for the rest of this proof, we WLOG assume that n ∉ N.
Hence, n is a negative integer, so that −n ∈ N. Thus, Corollary 3.3.13 (applied
to −n instead of n) yields

    (1 + x)^{−(−n)} = ∑_{k∈N} \binom{−(−n)}{k} x^k.

Since −(−n) = n, this rewrites as (1 + x)^n = ∑_{k∈N} \binom{n}{k} x^k. Thus, Theorem
3.3.10 is proven.
We thus have a formula for (1 + x)^n for each integer n. We don’t yet have
such a formula for (1 + x)^{1/2} (nor do we have a proper definition of (1 + x)^{1/2}),
but this was clearly a step forward.
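As a concrete check of Theorem 3.3.10 for a negative exponent (our own sketch, with helpers named by us): over K = Q, we compute (1 + x)^{-3} by inverting the coefficient list of (1 + x)^3 via the recursion (33), and compare the result with the coefficients \binom{-3}{k}.

```python
from fractions import Fraction
from math import comb, factorial

def binom(n, k):
    """Binomial coefficient C(n, k) for an arbitrary integer n and k >= 0, as in (1)."""
    num = 1
    for j in range(k):
        num *= n - j
    return num // factorial(k)

def fps_inverse(a, m):
    """First m coefficients of the inverse of the FPS with coefficient list a."""
    a0_inv = Fraction(1) / Fraction(a[0])
    b = [a0_inv]
    for n in range(1, m):
        s = sum(Fraction(a[i]) * b[n - i] for i in range(1, min(n, len(a) - 1) + 1))
        b.append(-a0_inv * s)
    return b

cube = [comb(3, k) for k in range(4)]   # (1 + x)^3 = 1 + 3x + 3x^2 + x^3
inv_cube = fps_inverse(cube, 8)         # first 8 coefficients of (1 + x)^{-3}
print(all(inv_cube[k] == binom(-3, k) for k in range(8)))  # True
```

The computed coefficients 1, −3, 6, −10, . . . are exactly (−1)^k \binom{k + 2}{k}, in agreement with Proposition 3.3.12 as well.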

3.3.5. Dividing by x
Let us see how this all helps us justify our arguments in Section 3.1. Proposition
3.3.7 justifies the fractions that appear in (4), but it does not justify dividing by
the FPS 2x in (10), since the constant term [x^0] (2x) is surely not invertible.
And indeed, the FPS 2x is not invertible; the fraction 1/(2x) is not a well-defined
FPS.
However, it is easy to see directly which FPSs can be divided by x (and thus
by 2x, if K = Q), and what it means to divide them by x. In fact, Lemma
3.2.16 shows that multiplying an FPS by x means moving all its entries by one
position to the right, and putting a 0 into the newly vacated starting position.
Thus, it is rather clear what dividing by x should be:

Definition 3.3.14. Let a = (a_0, a_1, a_2, . . .) be an FPS whose constant term a_0 is
0. Then, a/x is defined to be the FPS (a_1, a_2, a_3, . . .).

The following is almost trivial:



Proposition 3.3.15. Let a ∈ K[[x]] and b ∈ K[[x]] be two FPSs. Then, a = xb
if and only if [x^0] a = 0 and b = a/x.

Proof. Exercise.
Having defined a/x in Definition 3.3.14 (when a has constant term 0), we can
also define a/(2x) when 2 is invertible in K (just set a/(2x) = (1/2) · (a/x)). Thus,
the fraction (1 ± √(1 − 4x)) / (2x) in (10) makes sense when the ± sign is a − sign
(but not when it is a + sign), at least if we interpret the square root √(1 − 4x) as
∑_{k≥0} \binom{1/2}{k} (−4x)^k.
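On coefficient lists, Definition 3.3.14 is just a left shift. A minimal sketch (our own, not from the notes):

```python
def divide_by_x(a):
    """a/x for an FPS given by its coefficient list, per Definition 3.3.14.
    Only defined when the constant term a[0] is 0."""
    if a[0] != 0:
        raise ValueError("a/x is only defined when the constant term is 0")
    return a[1:]

print(divide_by_x([0, 5, 7, 2]))  # [5, 7, 2]
```

Prepending a 0 (i.e., multiplying by x) recovers the original list, which is the content of Proposition 3.3.15.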

3.3.6. A few lemmas


Let us use this occasion to state two simple lemmas (vaguely related to Defini-
tion 3.3.14) that will be used later on:

Lemma 3.3.16. Let a ∈ K[[x]] be an FPS with [x^0] a = 0. Then, there exists
an h ∈ K[[x]] such that a = xh.

Proof of Lemma 3.3.16. Write the FPS a in the form a = (a_0, a_1, a_2, . . .). Thus,
a_0 = [x^0] a = 0. Hence, the FPS a/x is well-defined. Moreover, it is easy to see
that a = x · (a/x).^{18} Hence, there exists an h ∈ K[[x]] such that a = xh (namely,
h = a/x). This proves Lemma 3.3.16.
Lemma 3.3.17. Let k ∈ N. Let a ∈ K[[x]] be any FPS. Then, the first k
coefficients of the FPS x^k a are 0.

Proof of Lemma 3.3.17. We must show that [x^m] (x^k a) = 0 for any nonnegative
integer m < k. But we can do this directly: If m is a nonnegative integer such
that m < k, then (22) (applied to x^k, a and m instead of a, b and n) yields

    [x^m] (x^k a) = ∑_{i=0}^{m} [x^i] (x^k) · [x^{m−i}] a = ∑_{i=0}^{m} 0 · [x^{m−i}] a = 0

(since each i ≤ m < k satisfies i ̸= k and thus [x^i] (x^k) = 0),

18 Proof. We have a/x = (a_1, a_2, a_3, . . .) (by Definition 3.3.14). Thus, Lemma 3.2.16 (applied to
a/x and a_{i−1} instead of a and a_i) yields

    x · (a/x) = (0, a_1, a_2, a_3, . . .) = (a_0, a_1, a_2, a_3, . . .)     (since 0 = a_0)
              = (a_0, a_1, a_2, . . .) = a.

Thus, a = x · (a/x).

which is exactly what we wanted to show. Thus, Lemma 3.3.17 is proved.


(Alternatively, we could prove Lemma  3.3.17 by writing a inthe form a =

( a0 , a1 , a2 , . . .) and observing that x k a = 0, 0, . . . , 0, a , a , a , . . .. This follows


| {z } 0 1 2
k times
by applying Lemma 3.2.16 a total of k times, or more formally by induction on
k.)
Lemma 3.3.17 has a converse; here is a statement that combines it with this
converse:

Lemma 3.3.18. Let k ∈ N. Let f ∈ K[[x]] be any FPS. Then, the first k
coefficients of the FPS f are 0 if and only if f is a multiple of x^k.

Here, we use the following notation:

Definition 3.3.19. Let g ∈ K [[ x ]] be an FPS. Then, a multiple of g means an


FPS of the form ga with a ∈ K [[ x ]].

(This is just a particular case of the usual concept of multiples in a commu-


tative ring.)
Proof of Lemma 3.3.18. The statement we are proving is an “if and only if”
statement. We shall prove its “only if” (i.e., “=⇒”) and its “if” (i.e., “⇐=”)
directions separately:
=⇒: Assume that the first k coefficients of the FPS f are 0. We must show
that f is a multiple of x^k.
Write f as f = (f_0, f_1, f_2, . . .). Then, the first k coefficients of the FPS f are
f_0, f_1, . . . , f_{k−1}. Hence, these k coefficients f_0, f_1, . . . , f_{k−1} are 0 (since we have
assumed that the first k coefficients of the FPS f are 0). In other words, f_n = 0
for each n ∈ {0, 1, . . . , k − 1}. Hence, ∑_{n=0}^{k−1} f_n x^n = ∑_{n=0}^{k−1} 0 · x^n = 0.
Now,

    f = (f_0, f_1, f_2, . . .) = ∑_{n∈N} f_n x^n = ∑_{n=0}^{k−1} f_n x^n + ∑_{n=k}^{∞} f_n x^n = ∑_{n=k}^{∞} f_n x^n
      = ∑_{n=k}^{∞} f_n x^k x^{n−k}     (since x^n = x^k x^{n−k} for each n ≥ k)
      = x^k ∑_{n=k}^{∞} f_n x^{n−k}.

In other words, f = x^k a for a = ∑_{n=k}^{∞} f_n x^{n−k}. This shows that f is a multiple of
x^k. Thus, the “=⇒” direction of Lemma 3.3.18 is proved.

⇐=: Assume that f is a multiple of x^k. In other words, f = x^k a for some a ∈
K[[x]]. Consider this a. Now, Lemma 3.3.17 yields that the first k coefficients
of the FPS x^k a are 0. In other words, the first k coefficients of the FPS f are 0
(since f = x^k a). This proves the “⇐=” direction of Lemma 3.3.18.
The proof of Lemma 3.3.18 is now complete, as both directions have been
proved.
Another lemma that will prove its usefulness much later concerns FPSs that
are equal up until a certain coefficient. It says that if f and g are two FPSs
whose first n + 1 coefficients agree (for a certain n ∈ N), then the same is true
of the FPSs a f and ag whenever a is any further FPS. In more detail:
Lemma 3.3.20. Let a, f , g ∈ K [[ x ]] be three FPSs. Let n ∈ N. Assume that

[ xm ] f = [ xm ] g for each m ∈ {0, 1, . . . , n} . (35)

Then,
[ x m ] ( a f ) = [ x m ] ( ag) for each m ∈ {0, 1, . . . , n} .

Proof of Lemma 3.3.20. Let m ∈ {0, 1, . . . , n}. Then, m ≤ n. Hence, each j ∈
{0, 1, . . . , m} satisfies j ≤ m ≤ n and thus j ∈ {0, 1, . . . , n} and therefore

    [x^j] f = [x^j] g                                              (36)

(by (35), applied to j instead of m). However, (23) (applied to m, a and f instead
of n, a and b) yields

    [x^m] (a f) = ∑_{j=0}^{m} [x^{m−j}] a · [x^j] f = ∑_{j=0}^{m} [x^{m−j}] a · [x^j] g     (by (36)).

On the other hand, (23) (applied to m, a and g instead of n, a and b) yields

    [x^m] (ag) = ∑_{j=0}^{m} [x^{m−j}] a · [x^j] g.

Comparing these two equalities, we obtain [x^m] (a f) = [x^m] (ag). This proves
Lemma 3.3.20.
A consequence of Lemma 3.3.20 is the following fact:
Lemma 3.3.21. Let u, v ∈ K [[ x ]] be two FPSs such that v is a multiple of u.
Let n ∈ N. Assume that
[ xm ] u = 0 for each m ∈ {0, 1, . . . , n} . (37)
Then,
[ xm ] v = 0 for each m ∈ {0, 1, . . . , n} .

Proof of Lemma 3.3.21. We have assumed that v is a multiple of u. In other
words, v = ua for some a ∈ K[[x]]. Consider this a.
For each m ∈ {0, 1, . . . , n}, we have

    [x^m] u = 0            (by (37))
            = [x^m] 0      (since the FPS 0 satisfies [x^m] 0 = 0).

Hence, Lemma 3.3.20 (applied to f = u and g = 0) yields that

    [x^m] (au) = [x^m] (a · 0)    for each m ∈ {0, 1, . . . , n}.      (38)

Now, for each m ∈ {0, 1, . . . , n}, we have

    [x^m] v = [x^m] (au)          (since v = ua = au)
            = [x^m] (a · 0)       (by (38))
            = [x^m] 0 = 0         (since a · 0 = 0).

This proves Lemma 3.3.21.
We can derive a further useful consequence from Lemma 3.3.21:

Lemma 3.3.22. Let a, b, c, d ∈ K [[ x ]] be four FPSs. Let n ∈ N. Assume that

[ xm ] a = [ xm ] b for each m ∈ {0, 1, . . . , n} . (39)

Assume further that

[ xm ] c = [ xm ] d for each m ∈ {0, 1, . . . , n} . (40)

Then,
[ x m ] ( ac) = [ x m ] (bd) for each m ∈ {0, 1, . . . , n} .

Proof of Lemma 3.3.22. For each m ∈ {0, 1, . . . , n}, we have

    [x^m] (a − b) = [x^m] a − [x^m] b = 0     (by (21) and (39)).

Moreover, the FPS ac − bc is a multiple of a − b (since ac − bc = (a − b) c).
Hence, Lemma 3.3.21 (applied to u = a − b and v = ac − bc) shows that

    [x^m] (ac − bc) = 0    for each m ∈ {0, 1, . . . , n}              (41)

(since we have [x^m] (a − b) = 0 for each m ∈ {0, 1, . . . , n}).
For each m ∈ {0, 1, . . . , n}, we have

    [x^m] (c − d) = [x^m] c − [x^m] d = 0     (by (21) and (40)).

Moreover, the FPS bc − bd is a multiple of c − d (since bc − bd = b (c − d) =


(c − d) b). Hence, Lemma 3.3.21 (applied to u = c − d and v = bc − bd) shows
that
[ x m ] (bc − bd) = 0 for each m ∈ {0, 1, . . . , n} (42)
(since we have [ x m ] (c − d) = 0 for each m ∈ {0, 1, . . . , n}).
Now, let m ∈ {0, 1, . . . , n}. Then, (21) yields [ x m ] ( ac − bc) = [ x m ] ( ac) −
[ x ] (bc). Comparing this with (41), we obtain [ x m ] ( ac) − [ x m ] (bc) = 0. In other
m

words, [ x m ] ( ac) = [ x m ] (bc). On the other hand, (21) yields [ x m ] (bc − bd) =
[ x m ] (bc) − [ x m ] (bd). Comparing this with (42), we obtain [ x m ] (bc) − [ x m ] (bd) =
0. In other words, [ x m ] (bc) = [ x m ] (bd). Hence, [ x m ] ( ac) = [ x m ] (bc) =
[ x m ] (bd). This proves Lemma 3.3.22.
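Lemma 3.3.22 says, in effect, that the first n + 1 coefficients of a product depend only on the first n + 1 coefficients of each factor. A quick randomized check of this (our own sketch, not from the notes):

```python
import random

def fps_mul_trunc(a, b, m):
    """First m coefficients of the product of two FPSs, per the convolution rule."""
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(m)]

random.seed(0)
n = 4
a = [random.randint(-9, 9) for _ in range(10)]
c = [random.randint(-9, 9) for _ in range(10)]
# b and d agree with a and c in their first n + 1 coefficients, then diverge:
b = a[: n + 1] + [random.randint(-9, 9) for _ in range(5)]
d = c[: n + 1] + [random.randint(-9, 9) for _ in range(5)]
print(fps_mul_trunc(a, c, n + 1) == fps_mul_trunc(b, d, n + 1))  # True
```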

3.4. Polynomials
3.4.1. Definition
Let us take a little side trip to relate FPSs to polynomials. As should be clear
enough from the definitions, we can think of an FPS as a “polynomial with
(potentially) infinitely many nonzero coefficients”. This can be easily made
precise. Indeed, we can define polynomials as FPSs that have only finitely
many nonzero coefficients:

Definition 3.4.1. (a) An FPS a ∈ K [[ x ]] is said to be a polynomial if all but


finitely many n ∈ N satisfy [ x n ] a = 0 (that is, if all but finitely many coeffi-
cients of a are 0).
(b) We let K [ x ] be the set of all polynomials a ∈ K [[ x ]]. This set K [ x ]
is a subring of K [[ x ]] (according to Theorem 3.4.2 below), and is called the
univariate polynomial ring over K.

For example, 2 + 3x + 7x5 is a polynomial, whereas 1 + x + x2 + x3 + · · · is


not (unless K is a trivial ring).
The definition of a “polynomial” that you have seen in your abstract alge-
bra course might be superficially different from that in Definition 3.4.1; but it
necessarily is equivalent. (In fact, Definition 3.4.1 (a) can be restated as “a poly-
nomial means a K-linear combination of the monomials x^0, x^1, x^2, . . .”, and it
is clear that the monomials x^0, x^1, x^2, . . . in K[[x]] are K-linearly independent;
thus, the polynomial ring K[x] as we have defined it in Definition 3.4.1 (b) is a
free K-module with basis (x^0, x^1, x^2, . . .). The same is true for the polynomial
ring K [ x ] that you know from abstract algebra. Moreover, the rules for adding,
subtracting and multiplying polynomials known from abstract algebra agree
with the formulas for a + b, a − b and a · b that we gave in Definition 3.2.5.)
We owe a theorem:

Theorem 3.4.2. The set K [ x ] is a subring of K [[ x ]] (that is, it is closed under


addition, subtraction and multiplication, and contains the zero 0 and the
unity 1) and is a K-submodule of K [[ x ]] (that is, it is closed under addition
and scaling by elements of K).

Proof of Theorem 3.4.2 (sketched). This is a rather easy exercise. The hardest part
is to show that K[x] is closed under multiplication. But this, too, is easy: Let
a, b ∈ K[x]. Then, all but finitely many n ∈ N satisfy [x^n] a = 0 (since a ∈ K[x]).
In other words, there exists a finite subset I of N such that

    [x^i] a = 0    for all i ∈ N \ I.                              (43)

Similarly, there exists a finite subset J of N such that

    [x^j] b = 0    for all j ∈ N \ J.                              (44)

Consider these I and J. Now, let S be the subset {i + j | i ∈ I and j ∈ J} of N.
This set S is again finite (since I and J are finite), and we can easily see (using
(22)) that

    [x^n] (ab) = 0    for all n ∈ N \ S

^{19}. Thus, all but finitely many n ∈ N satisfy [x^n] (ab) = 0 (since S is finite).
This shows that ab ∈ K[x]. Hence, we have shown that K[x] is closed under
multiplication. The remaining claims of Theorem 3.4.2 are similar but easier.

3.4.2. Reminders on rings and K-algebras


As we now know, polynomials are just a special case of FPSs. However, they
have some features that FPSs don’t have in general. The most important of
19 Proof. Let n ∈ N \ S. Thus, n ∈ N and n ∉ S. Now, (22) yields [x^n] (ab) = ∑_{i=0}^{n} [x^i] a · [x^{n−i}] b.
We shall next show that each addend in this sum is 0.
Indeed, let i ∈ {0, 1, . . . , n} be arbitrary. Let j = n − i. Thus, n = i + j. Hence, we
cannot have i ∈ I and j ∈ J simultaneously (because if we did, then we would have
n = i + j ∈ S (by the definition of S), which would contradict n ∉ S). Hence, we must
have i ∉ I or j ∉ J (or both). In the former case, we have [x^i] a = 0 (by (43), since i ∉ I
entails i ∈ N \ I). In the latter case, we have [x^j] b = 0 (by (44), since j ∉ J entails
j ∈ N \ J). Thus, in either case, at least one of the two coefficients [x^i] a and [x^j] b is 0,
so that their product [x^i] a · [x^j] b is 0. In other words, [x^i] a · [x^{n−i}] b is 0 (since j = n − i).
Forget that we fixed i. We thus have shown that [x^i] a · [x^{n−i}] b is 0 for each i ∈
{0, 1, . . . , n}. In other words, all addends of the sum ∑_{i=0}^{n} [x^i] a · [x^{n−i}] b are 0. Hence,
the whole sum is 0. In other words, [x^n] (ab) = 0 (since [x^n] (ab) = ∑_{i=0}^{n} [x^i] a · [x^{n−i}] b), qed.

these features is substitution. To wit, we can substitute an element of K, or


more generally an element of any K-algebra, into a polynomial (but generally
not into an FPS). Before we explain how, let us recall the notions of rings and
K-algebras (see [23wa, Chapter 2 and §3.11] for details and more about them):

Definition 3.4.3. The notion of a ring (also known as a noncommutative ring)


is defined in the exact same way as we defined the notion of a commutative
ring in Definition 3.2.1, except that the “Commutativity of multiplication”
axiom is removed.

Examples of noncommutative rings^{20} abound in linear algebra:

• For any n ∈ N, the matrix ring Rn×n (that is, the ring of all n × n-matrices
with real entries) is a ring. This ring is commutative if n ≤ 1, but not if
n > 1.
More generally, if K is any ring (commutative or not), then the matrix ring
K n×n is a ring for every n ∈ N.

• The ring H of quaternions is a ring that is not commutative.

• If M is an abelian group, then the ring of all endomorphisms of M (that


is, the ring of all Z-linear maps from M to M) is a noncommutative ring.
(Its multiplication is composition of endomorphisms.)

Next, let us recall the notion of a K-algebra ([23wa, §3.11]). Recall that K is a
fixed commutative ring.

Definition 3.4.4. A K-algebra is a set A equipped with four maps

    ⊕ : A × A → A,
    ⊖ : A × A → A,
    ⊙ : A × A → A,
    ⇀ : K × A → A

and two elements \vec{0} ∈ A and \vec{1} ∈ A satisfying the following properties:

1. The set A, equipped with the maps ⊕, ⊖ and ⊙ and the two elements
\vec{0} and \vec{1}, is a (noncommutative) ring.

2. The set A, equipped with the maps ⊕, ⊖ and ⇀ and the element \vec{0}, is
a K-module.

20 Note that the word “noncommutative ring” does not imply that the ring is not commuta-
tive; it merely means that commutativity is not required. Thus, any commutative ring is a
noncommutative ring.

3. We have
λ ⇀ ( a ⊙ b) = (λ ⇀ a) ⊙ b = a ⊙ (λ ⇀ b) (45)
for all λ ∈ K and a, b ∈ A.

(Thus, in a nutshell, a K-algebra is a set A that is simultaneously a ring


and a K-module, with the property that the ring A and the K-module A have
the same addition, the same subtraction and the same zero, and satisfy the
additional compatibility property (45).)
Consequently, a K-algebra is automatically a ring and a K-module. Thus,
all the notations and shorthands that we have introduced for rings and for
K-modules will also be used for K-algebras. For example, if A is a K-algebra,
then both maps ⊙ : A × A → A and ⇀ : K × A → A will be denoted by ·
unless there is a risk of confusion. (There is rarely a risk of confusion, since
the two maps act on different inputs: a · b means a ⊙ b if a belongs to A, and
means a ⇀ b if a belongs to K. Often, even when an element a belongs to
both A and K, the elements a ⊙ b and a ⇀ b are equal, so confusion cannot
arise.)

Examples of K-algebras include:

• the ring K itself;


• the ring K [[ x ]] of FPSs (we have defined the relevant maps in Definition
3.2.5, and claimed the relevant properties in Theorem 3.2.6);
• its subring K [ x ] (all its maps are inherited from K [[ x ]]);
• the matrix ring K n×n for each n ∈ N;
• any quotient ring of K (that is, any ring of the form K/I where I is an
ideal of K);
• any commutative ring that contains K as a subring.

Note that the axiom (45) in the definition of K-algebra can be rewritten as

λ ( ab) = (λa) b = a (λb) for all λ ∈ K and a, b ∈ A

using our conventions (to write ab for a ⊙ b and to write λc for λ ⇀ c). It says
that scaling a product in A by a scalar in λ ∈ K is equivalent to scaling either
of its two factors by λ.

3.4.3. Evaluation aka substitution into polynomials


We can now define what it means to substitute an element of a K-algebra into
a polynomial:

Definition 3.4.5. Let f ∈ K [ x ] be a polynomial. Let A be any K-algebra. Let


a ∈ A be any element. We then define an element f [ a] ∈ A as follows:
Write f in the form f = ∑_{n∈N} f_n x^n with f_0, f_1, f_2, . . . ∈ K. (That is, f_n = [x^n] f
for each n ∈ N.) Then, set

    f[a] := ∑_{n∈N} f_n a^n.

(This sum is essentially finite, since f is a polynomial.)


The element f [ a] is also denoted by f ◦ a and is called the value of f at a
(or the evaluation of f at a, or the result of substituting a for x in f ).

Many people write f ( a) for the value f [ a] we have just defined. Unfortu-
nately, this leads to ambiguities (for example, f ( x + 1) could mean either the
value of f at x + 1, or the product of f with x + 1). By writing f [ a] or f ◦ a
instead, I will avoid these ambiguities.
For example, if f = 4x^3 + 2x + 7, then f[a] = 4a^3 + 2a + 7. For another
example, if f = (x + 5)^3, then f[a] = (a + 5)^3 (although this is not obvious; it
follows from Theorem 3.4.6 below).
If f and g are two polynomials in K [ x ], then the value f [ g] = f ◦ g (this is
the value of f at g; it is well-defined because K [ x ] is a K-algebra) is also known
as the composition of f with g. We note that any polynomial f ∈ K [ x ] satisfies

    f[x] = f    and
    f[0] = [x^0] f = (the constant term of f)    and
    f[1] = [x^0] f + [x^1] f + [x^2] f + · · · = (the sum of all coefficients of f).

It is fairly common to write f [ x ] instead of f for a polynomial, just to stress the


fact that it is a polynomial in an indeterminate called x. The equality f [ x ] = f
justifies this.
Definition 3.4.5 is rather versatile. For example, if f ∈ Z [ x ] is a polynomial
with integer coefficients, then it allows evaluating f at integers, at complex
numbers, at residue classes in Z/n, at square matrices, at other polynomials
and at FPSs. More generally, a polynomial f ∈ Z [ x ] can be evaluated at any
element of any ring, since any ring is automatically a Z-algebra. Evaluating
polynomials at square matrices is an important idea in linear algebra (e.g., the
Cayley–Hamilton theorem is concerned with the characteristic polynomial of a
square matrix, evaluated at this matrix itself).
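To illustrate Definition 3.4.5 (our own sketch, not from the notes), take A to be the K-algebra of 2 × 2 integer matrices, and evaluate the characteristic polynomial x^2 − 5x − 2 of the matrix a = [[1, 2], [3, 4]] at a itself; by the Cayley–Hamilton theorem just mentioned, the result is the zero matrix.

```python
def mat_mul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def evaluate(coeffs, a):
    """f[a] = sum_n f_n a^n for f = coeffs[0] + coeffs[1] x + ... and a 2x2 matrix a."""
    result = [[0, 0], [0, 0]]
    power = [[1, 0], [0, 1]]  # a^0 = identity
    for c in coeffs:
        result = [[result[i][j] + c * power[i][j] for j in range(2)]
                  for i in range(2)]
        power = mat_mul(power, a)
    return result

a = [[1, 2], [3, 4]]
# characteristic polynomial of a: x^2 - (trace) x + (det) = x^2 - 5x - 2
print(evaluate([-2, -5, 1], a))  # [[0, 0], [0, 0]]
```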
The following theorem surveys the most basic properties of values of poly-
nomials:

Theorem 3.4.6. Let A be a K-algebra. Let a ∈ A. Then:

(a) Any f, g ∈ K[x] satisfy

    (f + g)[a] = f[a] + g[a]    and    (f g)[a] = f[a] · g[a].

(b) Any λ ∈ K and f ∈ K[x] satisfy

    (λ f)[a] = λ · f[a].

(c) Any λ ∈ K satisfies λ[a] = λ · 1_A, where 1_A is the unity of the ring A.
(This is often written as “λ[a] = λ”, but keep in mind that the “λ” on the
right hand side of this equality is understood to be “coerced into A”, so it
actually means “the element of A corresponding to λ”, which is λ · 1_A.)
(d) We have x[a] = a.
(e) We have x^i[a] = a^i for each i ∈ N.
(f) Any f, g ∈ K[x] satisfy f[g[a]] = (f[g])[a].

Proof. See [19s, Theorem 7.6.3] for parts (a), (b), (c), (d) and (e). Part (f) is [19s,
Proposition 7.6.14].
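Parts (a) and (f) of the theorem can also be spot-checked experimentally. The following sketch (my own, not from the notes; all helper names are made up) verifies them for K = Z on a small example, representing polynomials as coefficient lists.

```python
# A numeric spot-check (not from the notes) of Theorem 3.4.6 (a) and (f)
# over K = Z.  Polynomials are coefficient lists [c0, c1, ...].

def poly_eval(f, a):
    """The value f[a] at an integer a."""
    return sum(c * a**i for i, c in enumerate(f))

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            prod[i + j] += fi * gj
    return prod

def poly_compose(f, g):
    """The composition f[g], by Horner's rule in K[x]."""
    result = [0]
    for c in reversed(f):
        result = poly_mul(result, g)
        result[0] += c
    return result

f, g, a = [7, 2, 0, 4], [5, 1], 3       # f = 4x^3 + 2x + 7, g = x + 5
# part (a): (fg)[a] = f[a] * g[a]
assert poly_eval(poly_mul(f, g), a) == poly_eval(f, a) * poly_eval(g, a)
# part (f): f[g[a]] = (f[g])[a]
assert poly_eval(f, poly_eval(g, a)) == poly_eval(poly_compose(f, g), a)
print("Theorem 3.4.6 (a) and (f) hold on this example")
```

Of course, a single numeric check is no substitute for the proofs cited above; it merely illustrates what the identities say.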

3.5. Substitution and evaluation of power series


3.5.1. Defining substitution
Definition 3.4.5 shows that if f ∈ K [ x ] is a polynomial, then almost anything
(to be more precise: any element of a K-algebra) can be substituted into f .
In contrast, if f ∈ K[[x]] is an FPS, then there are far fewer things that can be
substituted into f . Even elements of K itself cannot always be substituted into
f. For example, if we try to substitute 1 for x in the FPS 1 + x + x^2 + x^3 + · · · ,
then we get

1 + 1 + 1^2 + 1^3 + · · · = 1 + 1 + 1 + 1 + · · · ,

which is undefined. Real analysis can help make sense of certain values of FPSs
(for example, substituting 1/2 for x into the FPS 1 + x + x^2 + x^3 + · · · yields the
convergent series 1 + 1/2 + 1/2^2 + 1/2^3 + · · · = 2), but this is subtle and specific to
certain numbers and certain FPSs.21
Thus, polynomials have a clear advantage over FPSs.

21 For instance, it is not hard to see that there is no nonzero complex number that can be
substituted into the FPS ∑_{n∈N} n! x^n to obtain a convergent result. Thus, even though some
complex numbers can be substituted into some FPSs, there is no complex number other
than 0 that can be substituted into every FPS.

However, let us not give up on FPSs yet. Some things can be substituted into
an FPS. For example:
• We can always substitute 0 for x in an FPS a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · .
The result is

a_0 + a_1 · 0 + a_2 · 0^2 + a_3 · 0^3 + · · · = a_0 + 0 + 0 + 0 + · · · = a_0.

• We can always substitute x for x in an FPS a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · .
The result is the same FPS a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · that we started
with (obviously).

• We can always substitute 2x for x in an FPS a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · .
The result is

a_0 + a_1 (2x) + a_2 (2x)^2 + a_3 (2x)^3 + · · · = a_0 + 2a_1 x + 4a_2 x^2 + 8a_3 x^3 + · · · .

• We can always substitute x^2 + x for x in an FPS a_0 + a_1 x + a_2 x^2 + a_3 x^3 +
· · · . This is less obvious, so let me explain why. If we try to substitute
x^2 + x for x in an FPS a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · , then we obtain

(a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · ·) [x + x^2]
= a_0 + a_1 (x + x^2) + a_2 (x + x^2)^2 + a_3 (x + x^2)^3 + · · ·
= a_0 + a_1 (x + x^2) + a_2 (x^2 + 2x^3 + x^4) + a_3 (x^3 + 3x^4 + 3x^5 + x^6) + · · ·
= a_0 + a_1 x + (a_1 + a_2) x^2 + (2a_2 + a_3) x^3 + (a_2 + 3a_3 + a_4) x^4 + · · · .

I claim that the right hand side here is well-defined. To prove this, I need
to show that for each n ∈ N, the coefficient of x^n on this right hand side
is a finite sum of a_i's. Indeed, fix n ∈ N. Recall that the right hand side
is obtained by expanding the infinite sum

a_0 + a_1 (x + x^2) + a_2 (x + x^2)^2 + a_3 (x + x^2)^3 + · · · .

Only the first n + 1 addends of this infinite sum (i.e., only the addends
a_k (x + x^2)^k with k ≤ n) can contribute to the coefficient of x^n, since any
of the remaining addends is a multiple of x^{n+1} (because it has the form
a_k (x + x^2)^k = a_k (x (1 + x))^k = a_k x^k (1 + x)^k with k ≥ n + 1) and thus has
a zero coefficient of x^n. Hence, the coefficient of x^n in this infinite sum
equals the coefficient of x^n in the finite sum

a_0 + a_1 (x + x^2) + a_2 (x + x^2)^2 + a_3 (x + x^2)^3 + · · · + a_n (x + x^2)^n.

But the latter coefficient is clearly a finite sum of a_i's. Thus, my claim is
proved, and it follows that the result of substituting x^2 + x for x in an FPS
a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · is well-defined.
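The finite-truncation argument above can be played out on a computer. The following sketch (my own illustration, not from the notes; the helper names are made up) truncates every FPS to its first five coefficients and checks the coefficient formula just computed, with the stand-in values a_n = n + 1.

```python
# Checking the expansion above on truncated FPSs (first N coefficients).
# compose(f, g) implements f[g] = sum_n f_n g^n; only g^0, ..., g^(N-1)
# can touch the first N coefficients, exactly as argued in the text.

N = 5

def mul(a, b):
    """Product of two truncated FPSs, keeping the first N coefficients."""
    c = [0] * N
    for i in range(min(N, len(a))):
        for j in range(min(N - i, len(b))):
            c[i + j] += a[i] * b[j]
    return c

def compose(f, g):
    """f[g] truncated to N coefficients; requires g's constant term to be 0."""
    assert g[0] == 0, "substitution needs constant term 0"
    result = [0] * N
    g_pow = [1] + [0] * (N - 1)         # g^0
    for n in range(N):
        fn = f[n] if n < len(f) else 0
        for k in range(N):
            result[k] += fn * g_pow[k]
        g_pow = mul(g_pow, g)
    return result

a = [1, 2, 3, 4, 5]                     # stand-ins for a_0, a_1, a_2, a_3, a_4
lhs = compose(a, [0, 1, 1])             # substitute g = x + x^2
# the formula a_0 + a_1 x + (a_1+a_2) x^2 + (2a_2+a_3) x^3 + (a_2+3a_3+a_4) x^4:
rhs = [a[0], a[1], a[1] + a[2], 2*a[2] + a[3], a[2] + 3*a[3] + a[4]]
print(lhs == rhs)                       # True
```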

The idea of the last example can be generalized; there was nothing special
about x + x^2 that we used other than the fact that x + x^2 is a multiple of x (that
is, an FPS whose constant term is 0). Thus, generalizing our reasoning from this
example, we can convince ourselves that any FPS g that is a multiple of x (that
is, whose constant term is 0) can be substituted into any FPS. Let us introduce
a notation for this, exactly like we did for substituting things into polynomials:

Definition 3.5.1. Let f and g be two FPSs in K[[x]]. Assume that [x^0] g = 0
(that is, g = g_1 x^1 + g_2 x^2 + g_3 x^3 + · · · for some g_1, g_2, g_3, . . . ∈ K).
We then define an FPS f[g] ∈ K[[x]] as follows:
Write f in the form f = ∑_{n∈N} f_n x^n with f_0, f_1, f_2, . . . ∈ K. (That is, f_n = [x^n] f
for each n ∈ N.) Then, set

f[g] := ∑_{n∈N} f_n g^n.    (46)

(This sum is well-defined, as we will see in Proposition 3.5.2 (b) below.)
This FPS f[g] is also denoted by f ◦ g, and is called the composition of f
with g, or the result of substituting g for x in f.

Once again, it is not uncommon to see this FPS f [ g] denoted by f ( g), but I
will eschew the latter notation (since it can be confused with a product).
In order to prove that Definition 3.5.1 makes sense, we need to ensure that the
infinite sum ∑_{n∈N} f_n g^n in (46) is well-defined. The proof of this fact is analogous
to the reasoning I used in the last example; let me present it again in the general
case:

Proposition 3.5.2. Let f and g be two FPSs in K[[x]]. Assume that [x^0] g = 0.
Write f in the form f = ∑_{n∈N} f_n x^n with f_0, f_1, f_2, . . . ∈ K. Then:

(a) For each n ∈ N, the first n coefficients of the FPS g^n are 0.

(b) The sum ∑_{n∈N} f_n g^n is well-defined, i.e., the family (f_n g^n)_{n∈N} is
summable.

(c) We have [x^0] (∑_{n∈N} f_n g^n) = f_0.

Proof of Proposition 3.5.2. (a) This is easily proved by induction on n. Here is a
shorter alternative argument:
The FPS g has constant term [x^0] g = 0. Hence, Lemma 3.3.16 (applied to
a = g) yields that there exists an h ∈ K[[x]] such that g = xh. Consider this h.
Now, let n ∈ N. From g = xh, we obtain g^n = (xh)^n = x^n h^n. However,
Lemma 3.3.17 (applied to k = n and a = h^n) yields that the first n coefficients
of the FPS x^n h^n are 0. In other words, the first n coefficients of the FPS g^n are 0
(since g^n = x^n h^n). Thus, Proposition 3.5.2 (a) is proved.
(b) This follows from part (a). Here are the details.
We must prove that the family (f_n g^n)_{n∈N} is summable. In other words, we
must prove that the family (f_i g^i)_{i∈N} is summable (since (f_i g^i)_{i∈N} = (f_n g^n)_{n∈N}).
In other words, we must prove that for each n ∈ N, all but finitely many i ∈ N
satisfy [x^n] (f_i g^i) = 0 (by the definition of "summable"). So let us prove this.
Fix n ∈ N. We must prove that all but finitely many i ∈ N satisfy [x^n] (f_i g^i) = 0.
Indeed, let i ∈ N satisfy i > n. Then, n < i. Now, the first i coefficients of the
FPS g^i are 0 (by Proposition 3.5.2 (a), applied to i instead of n). However, the
coefficient [x^n] (g^i) of g^i is one of these first i coefficients (because n < i). Thus,
this coefficient [x^n] (g^i) must be 0. Now, f_i ∈ K; thus, (25) (applied to λ = f_i
and a = g^i) yields [x^n] (f_i g^i) = f_i · [x^n] (g^i) = 0.
Forget that we fixed i. We thus have shown that all i ∈ N satisfying i > n
satisfy [x^n] (f_i g^i) = 0. Hence, all but finitely many i ∈ N satisfy [x^n] (f_i g^i) = 0
(because all but finitely many i ∈ N satisfy i > n). This is precisely what we
wanted to prove. Thus, Proposition 3.5.2 (b) is proved.
(c) Let n be a positive integer. We shall first show that [x^0] (f_n g^n) = 0.
Indeed, Proposition 3.5.2 (a) shows that the first n coefficients of the FPS
g^n are 0. However, the coefficient [x^0] (g^n) is one of these first n coefficients
(since n is positive). Thus, this coefficient [x^0] (g^n) must be 0. Now, f_n ∈ K;
thus, (25) (applied to f_n, g^n and 0 instead of λ, a and n) yields [x^0] (f_n g^n) =
f_n · [x^0] (g^n) = 0.
Forget that we fixed n. We thus have shown that

[x^0] (f_n g^n) = 0   for each positive integer n.    (47)

Now,

[x^0] (∑_{n∈N} f_n g^n) = ∑_{n∈N} [x^0] (f_n g^n)    (by (27))
= [x^0] (f_0 g^0) + ∑_{n>0} [x^0] (f_n g^n)    (here, we have split off the addend for n = 0 from the sum)
= [x^0] (f_0 · 1) + ∑_{n>0} 0    (since g^0 = 1, and by (47))
= [x^0] (f_0, 0, 0, 0, . . .) = f_0    (since f_0 · 1 = f_0 · (1, 0, 0, 0, . . .) = (f_0, 0, 0, 0, . . .)).

This proves Proposition 3.5.2 (c).
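Proposition 3.5.2 can also be observed numerically. The sketch below (my own, not part of the notes) works with FPSs truncated to eight coefficients: it checks that the first n coefficients of g^n vanish (part (a)) and that the constant term of f[g] is f_0 (part (c)).

```python
# Observing Proposition 3.5.2 (a) and (c) on truncated FPSs.

N = 8

def mul(a, b):
    """Product of two truncated FPSs, keeping the first N coefficients."""
    c = [0] * N
    for i in range(min(N, len(a))):
        for j in range(min(N - i, len(b))):
            c[i + j] += a[i] * b[j]
    return c

g = [0, 2, 5] + [0] * (N - 3)           # any FPS with [x^0] g = 0
f = [7, 1, 4, 1, 5, 9, 2, 6]

g_pow = [1] + [0] * (N - 1)             # g^0
total = [0] * N                          # will hold f[g] = sum_n f_n g^n
for n in range(N):
    # part (a): the first n coefficients of g^n are 0
    assert all(g_pow[k] == 0 for k in range(n))
    for k in range(N):
        total[k] += f[n] * g_pow[k]
    g_pow = mul(g_pow, g)

# part (c): the constant term of f[g] equals f_0
print(total[0] == f[0])                 # True
```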

Example 3.5.3. The FPS x + x^2 has constant term [x^0] (x + x^2) = 0. Hence,
according to Definition 3.5.1, we can substitute it for x into 1 + x + x^2 + x^3 +
· · · . The result is

(1 + x + x^2 + x^3 + · · ·) [x + x^2]
= 1 + (x + x^2) + (x + x^2)^2 + (x + x^2)^3 + (x + x^2)^4 + (x + x^2)^5 + · · ·
= 1 + x + 2x^2 + 3x^3 + 5x^4 + 8x^5 + · · · .

The right hand side appears to be f_1 + f_2 x + f_3 x^2 + f_4 x^3 + · · · , where
(f_0, f_1, f_2, . . .) is the Fibonacci sequence (as defined in Section 3.1). Let me
show that this is indeed the case.
In Example 1 in Section 3.1, we had shown that

f_0 + f_1 x + f_2 x^2 + f_3 x^3 + · · · = x / (1 − x − x^2).

Thus,

x / (1 − x − x^2) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + · · ·
= f_1 x + f_2 x^2 + f_3 x^3 + f_4 x^4 + · · ·    (since f_0 = 0)
= x (f_1 + f_2 x + f_3 x^2 + f_4 x^3 + · · ·).

Cancelling x from this equality (this is indeed allowed – make sure you un-
derstand why!), we obtain

1 / (1 − x − x^2) = f_1 + f_2 x + f_3 x^2 + f_4 x^3 + · · · .

However, it appears reasonable to expect that

1 / (1 − x − x^2) = (1 / (1 − x)) [x + x^2],    (48)

because substituting x + x^2 for x in the expression 1 / (1 − x) results in 1 / (1 − x − x^2).
This is plausible but not obvious – after all, we defined (1 / (1 − x)) [x + x^2] to be
the result of substituting x + x^2 for x into the expanded version of 1 / (1 − x)
(which is 1 + x + x^2 + x^3 + · · ·), not into the fractional expression 1 / (1 − x).
Nevertheless, (48) is true (and will soon be proved). If we take this fact for
granted, then our claim easily follows:

f_1 + f_2 x + f_3 x^2 + f_4 x^3 + · · · = 1 / (1 − x − x^2) = (1 / (1 − x)) [x + x^2]
= (1 + x + x^2 + x^3 + · · ·) [x + x^2]

(since 1 / (1 − x) = 1 + x + x^2 + x^3 + · · ·).
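The computation in this example is easy to replicate mechanically. Here is a sketch (my own, not from the notes) that composes the truncated geometric series with x + x^2 and compares the result with the Fibonacci numbers.

```python
# (1 + x + x^2 + ...)[x + x^2] should match 1 + x + 2x^2 + 3x^3 + 5x^4 + ...

N = 10

def mul(a, b):
    """Product of two truncated FPSs, keeping the first N coefficients."""
    c = [0] * N
    for i in range(min(N, len(a))):
        for j in range(min(N - i, len(b))):
            c[i + j] += a[i] * b[j]
    return c

def compose(f, g):
    """f[g] truncated to N coefficients; requires g's constant term to be 0."""
    assert g[0] == 0
    result = [0] * N
    g_pow = [1] + [0] * (N - 1)
    for n in range(N):
        fn = f[n] if n < len(f) else 0
        for k in range(N):
            result[k] += fn * g_pow[k]
        g_pow = mul(g_pow, g)
    return result

coeffs = compose([1] * N, [0, 1, 1])    # geometric series at x + x^2

fib = [1, 1]                             # the f_1, f_2, f_3, ... of the text
while len(fib) < N:
    fib.append(fib[-1] + fib[-2])
print(coeffs == fib)                    # True
```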

3.5.2. Laws of substitution


The plausible but nontrivial statement (48) that we have just used follows from
part (c) of the following proposition:22

Proposition 3.5.4. Composition of FPSs satisfies the rules you would expect
it to satisfy:

(a) If f_1, f_2, g ∈ K[[x]] satisfy [x^0] g = 0, then (f_1 + f_2) ◦ g = f_1 ◦ g + f_2 ◦ g.

(b) If f_1, f_2, g ∈ K[[x]] satisfy [x^0] g = 0, then (f_1 · f_2) ◦ g = (f_1 ◦ g) · (f_2 ◦ g).

(c) If f_1, f_2, g ∈ K[[x]] satisfy [x^0] g = 0, then (f_1 / f_2) ◦ g = (f_1 ◦ g) / (f_2 ◦ g), as long as
f_2 is invertible. (In particular, f_2 ◦ g is automatically invertible under these
assumptions.)

(d) If f, g ∈ K[[x]] satisfy [x^0] g = 0, then f^k ◦ g = (f ◦ g)^k for each k ∈ N.

(e) If f, g, h ∈ K[[x]] satisfy [x^0] g = 0 and [x^0] h = 0, then [x^0] (g ◦ h) = 0
and (f ◦ g) ◦ h = f ◦ (g ◦ h).

(f) We have a ◦ g = a for each a ∈ K and g ∈ K[[x]].

(g) We have x ◦ g = g ◦ x = g for each g ∈ K[[x]].

(h) If (f_i)_{i∈I} ∈ K[[x]]^I is a summable family of FPSs, and if g ∈ K[[x]] is an
FPS satisfying [x^0] g = 0, then the family (f_i ◦ g)_{i∈I} ∈ K[[x]]^I is summable
as well, and we have (∑_{i∈I} f_i) ◦ g = ∑_{i∈I} (f_i ◦ g).

22 We are treating the symbol "◦" similarly to the multiplication sign · in our PEMDAS conven-
tion. Thus, an expression like "f_1 ◦ g + f_2 ◦ g" is understood to mean (f_1 ◦ g) + (f_2 ◦ g).
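Before diving into the proof, one can gain confidence in these rules numerically. The following sketch (my own, not from the notes) checks parts (a), (b) and (d) on truncated integer FPSs; since g has constant term 0, truncating everything to N coefficients loses nothing in the first N coefficients of each side.

```python
# Spot-checking Proposition 3.5.4 (a), (b), (d) on truncated FPSs.

N = 12

def mul(a, b):
    """Product of two truncated FPSs, keeping the first N coefficients."""
    c = [0] * N
    for i in range(min(N, len(a))):
        for j in range(min(N - i, len(b))):
            c[i + j] += a[i] * b[j]
    return c

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def compose(f, g):
    """f[g] truncated to N coefficients; requires g's constant term to be 0."""
    assert g[0] == 0
    result = [0] * N
    g_pow = [1] + [0] * (N - 1)
    for n in range(N):
        fn = f[n] if n < len(f) else 0
        for k in range(N):
            result[k] += fn * g_pow[k]
        g_pow = mul(g_pow, g)
    return result

f1 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
f2 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5]
g = [0, 1, 0, 2, 0, 0, 3] + [0] * (N - 7)

# (a): (f1 + f2) o g = f1 o g + f2 o g
assert compose(add(f1, f2), g) == add(compose(f1, g), compose(f2, g))
# (b): (f1 f2) o g = (f1 o g)(f2 o g)
assert compose(mul(f1, f2), g) == mul(compose(f1, g), compose(f2, g))
# (d), with k = 2: f1^2 o g = (f1 o g)^2
assert compose(mul(f1, f1), g) == mul(compose(f1, g), compose(f1, g))
print("parts (a), (b), (d) verified on this example")
```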

For our proof of Proposition 3.5.4, we will need the following lemma:

Lemma 3.5.5. Let f, g ∈ K[[x]] satisfy [x^0] g = 0. Let k ∈ N be such that the
first k coefficients of f are 0. Then, the first k coefficients of f ◦ g are 0.

Proof of Lemma 3.5.5. This is very similar to the proof of Proposition 3.5.2 (a).
We have [x^0] g = 0. Hence, Lemma 3.3.16 (applied to a = g) yields that there
exists an h ∈ K[[x]] such that g = xh. Consider this h.
Write the FPS f in the form f = (f_0, f_1, f_2, . . .). Then, the first k coefficients
of f are f_0, f_1, . . . , f_{k−1}. Hence, these coefficients f_0, f_1, . . . , f_{k−1} are 0 (since the
first k coefficients of f are 0). In other words,

f_n = 0   for each n < k.    (49)

Now, f = (f_0, f_1, f_2, . . .) = ∑_{n∈N} f_n x^n with f_0, f_1, f_2, . . . ∈ K. Hence, Definition
3.5.1 yields

f[g] = ∑_{n∈N} f_n g^n = ∑_{n<k} f_n g^n + ∑_{n≥k} f_n g^n = ∑_{n<k} 0 · g^n + ∑_{n≥k} f_n (xh)^n    (by (49), and since g = xh)
= ∑_{n≥k} f_n (xh)^n = ∑_{n≥k} f_n x^n h^n.

Thus,

f ◦ g = f[g] = ∑_{n∈N, n≥k} f_n x^n h^n = ∑_{n∈N, n≥k} x^n f_n h^n.
But this ensures that the first k coefficients of f ◦ g are 0 23. Thus, Lemma
3.5.5 follows.

23 Proof. We must show that [x^m] (f ◦ g) = 0 for any nonnegative integer m < k. But we can do
this directly: If m is a nonnegative integer such that m < k, then

[x^m] (f ◦ g) = [x^m] (∑_{n∈N, n≥k} x^n f_n h^n)    (since f ◦ g = ∑_{n∈N, n≥k} x^n f_n h^n)
= ∑_{n∈N, n≥k} [x^m] (x^n f_n h^n)
= ∑_{n∈N, n≥k} ∑_{i=0}^{m} [x^i] (x^n) · [x^{m−i}] (f_n h^n)    (by (22), applied to x^n, f_n h^n and m instead of a, b and n)
= ∑_{n∈N, n≥k} ∑_{i=0}^{m} 0 · [x^{m−i}] (f_n h^n)    (since [x^i] (x^n) = 0, because i ≤ m < k ≤ n and thus i ≠ n)
= 0,

exactly as we wanted to show.

Our proof of Proposition 3.5.4 will furthermore use the Kronecker delta nota-
tion:

Definition 3.5.6. If i and j are any objects, then δ_{i,j} means the element
(1, if i = j; 0, if i ≠ j) of K.

For example, δ_{2,2} = 1 and δ_{3,8} = 0.

Proof of Proposition 3.5.4. The proof is long and not particularly combinatorial.
I am merely writing it down because it is so rarely explained in the literature.
(a) This is an easy consequence of the definitions, and also appears in [Loehr11,
Theorem 7.62] and [Brewer14, Proposition 2.2.2].
Here are the details: Let f_1, f_2, g ∈ K[[x]] satisfy [x^0] g = 0. Write the FPSs
f_1 and f_2 as

f_1 = ∑_{n∈N} f_{1,n} x^n   and   f_2 = ∑_{n∈N} f_{2,n} x^n    (50)

with f_{1,0}, f_{1,1}, f_{1,2}, . . . ∈ K and f_{2,0}, f_{2,1}, f_{2,2}, . . . ∈ K. Then, adding the two equal-
ities in (50) together, we find

f_1 + f_2 = ∑_{n∈N} f_{1,n} x^n + ∑_{n∈N} f_{2,n} x^n = ∑_{n∈N} (f_{1,n} x^n + f_{2,n} x^n) = ∑_{n∈N} (f_{1,n} + f_{2,n}) x^n.

Thus, Definition 3.5.1 (applied to f = f_1 + f_2) yields

(f_1 + f_2) [g] = ∑_{n∈N} (f_{1,n} + f_{2,n}) g^n    (51)

(since f_{1,n} + f_{2,n} ∈ K for each n ∈ N).



On the other hand, we have f_1 = ∑_{n∈N} f_{1,n} x^n. Thus, Definition 3.5.1 yields
f_1[g] = ∑_{n∈N} f_{1,n} g^n (since f_{1,n} ∈ K for each n ∈ N). Similarly, f_2[g] = ∑_{n∈N} f_{2,n} g^n.
Adding these two equalities together, we obtain

f_1[g] + f_2[g] = ∑_{n∈N} f_{1,n} g^n + ∑_{n∈N} f_{2,n} g^n = ∑_{n∈N} (f_{1,n} g^n + f_{2,n} g^n) = ∑_{n∈N} (f_{1,n} + f_{2,n}) g^n.

Comparing this with (51), we obtain (f_1 + f_2) [g] = f_1 [g] + f_2 [g]. In other
words, (f_1 + f_2) ◦ g = f_1 ◦ g + f_2 ◦ g (since the notation f ◦ g is synonymous to
f[g]). This proves Proposition 3.5.4 (a).

(f) This is a near-trivial consequence of the definitions. To wit: Let a ∈ K.
Then,

∑_{n∈N} a δ_{n,0} x^n = a δ_{0,0} x^0 + ∑_{n∈N, n≠0} a δ_{n,0} x^n    (here, we have split off the addend for n = 0)
= a · 1 + ∑_{n∈N, n≠0} a · 0 · x^n    (since δ_{0,0} = 1, and δ_{n,0} = 0 when n ≠ 0)
= a · 1 = a · (1, 0, 0, 0, . . .) = (a · 1, a · 0, a · 0, a · 0, . . .)
= (a, 0, 0, 0, . . .) = a.    (52)

Let g ∈ K[[x]]. We must prove that a ◦ g = a. From (52), we obtain
a = ∑_{n∈N} a δ_{n,0} x^n. Hence, Definition 3.5.1 (or Definition 3.4.5) yields a[g] =
∑_{n∈N} a δ_{n,0} g^n (since a δ_{n,0} ∈ K for each n ∈ N). Thus,

a[g] = ∑_{n∈N} a δ_{n,0} g^n = a δ_{0,0} g^0 + ∑_{n∈N, n≠0} a δ_{n,0} g^n = a · 1 + ∑_{n∈N, n≠0} a · 0 · g^n
= a · 1 = a.

In other words, a ◦ g = a (since a ◦ g is a synonym for a[g]). This proves
Proposition 3.5.4 (f).

(g) This is easy, too. Indeed, we have

∑_{n∈N} δ_{n,1} x^n = δ_{1,1} x^1 + ∑_{n∈N, n≠1} δ_{n,1} x^n = x + ∑_{n∈N, n≠1} 0 · x^n    (since δ_{1,1} = 1, and δ_{n,1} = 0 when n ≠ 1)
= x.    (53)

Let g ∈ K[[x]]. We must prove that x ◦ g = g ◦ x = g. From (53), we
obtain x = ∑_{n∈N} δ_{n,1} x^n. Hence, Definition 3.5.1 (or Definition 3.4.5) yields x[g] =
∑_{n∈N} δ_{n,1} g^n (since δ_{n,1} ∈ K for each n ∈ N). Thus,

x[g] = ∑_{n∈N} δ_{n,1} g^n = δ_{1,1} g^1 + ∑_{n∈N, n≠1} δ_{n,1} g^n = g + ∑_{n∈N, n≠1} 0 · g^n = g.

In other words, x ◦ g = g (since x ◦ g is a synonym for x[g]).
Next, let us write g in the form g = ∑_{n∈N} g_n x^n for some g_0, g_1, g_2, . . . ∈ K.
Then, Definition 3.5.1 yields g[x] = ∑_{n∈N} g_n x^n = g. Thus, g ◦ x = g[x] = g.
Combining this with x ◦ g = g, we obtain x ◦ g = g ◦ x = g. This proves
Proposition 3.5.4 (g).

(b) This appears in [Loehr11, Theorem 7.62] and [Brewer14, Proposition 2.2.2].
Here is the proof:
Let f_1, f_2, g ∈ K[[x]] satisfy [x^0] g = 0. Write the FPSs f_1 and f_2 as

f_1 = ∑_{n∈N} f_{1,n} x^n   and   f_2 = ∑_{n∈N} f_{2,n} x^n    (54)

with f_{1,0}, f_{1,1}, f_{1,2}, . . . ∈ K and f_{2,0}, f_{2,1}, f_{2,2}, . . . ∈ K. Thus, Definition 3.5.1 yields

f_1[x] = ∑_{n∈N} f_{1,n} x^n = f_1   and   f_2[x] = ∑_{n∈N} f_{2,n} x^n = f_2

and

f_1[g] = ∑_{n∈N} f_{1,n} g^n   and   f_2[g] = ∑_{n∈N} f_{2,n} g^n.

Hence,

f_1[g] = ∑_{n∈N} f_{1,n} g^n = ∑_{i∈N} f_{1,i} g^i   and
f_2[g] = ∑_{n∈N} f_{2,n} g^n = ∑_{j∈N} f_{2,j} g^j.

Multiplying these two equalities together, we find

f_1[g] · f_2[g] = (∑_{i∈N} f_{1,i} g^i) (∑_{j∈N} f_{2,j} g^j) = ∑_{i∈N} ∑_{j∈N} f_{1,i} f_{2,j} g^{i+j}
= ∑_{(i,j)∈N^2} f_{1,i} f_{2,j} g^{i+j} = ∑_{n∈N} ∑_{(i,j)∈N^2, i+j=n} f_{1,i} f_{2,j} g^n    (since g^{i+j} = g^n when i + j = n)
= ∑_{n∈N} (∑_{(i,j)∈N^2, i+j=n} f_{1,i} f_{2,j}) g^n.    (55)

However, we can apply the same computations to x instead of g (since x is
also an FPS with [x^0] x = 0). Thus, we obtain

f_1[x] · f_2[x] = ∑_{n∈N} (∑_{(i,j)∈N^2, i+j=n} f_{1,i} f_{2,j}) x^n.

In view of f_1[x] = f_1 and f_2[x] = f_2, this rewrites as

f_1 · f_2 = ∑_{n∈N} (∑_{(i,j)∈N^2, i+j=n} f_{1,i} f_{2,j}) x^n.

Hence, Definition 3.5.1 yields

(f_1 · f_2) [g] = ∑_{n∈N} (∑_{(i,j)∈N^2, i+j=n} f_{1,i} f_{2,j}) g^n

(since ∑_{(i,j)∈N^2, i+j=n} f_{1,i} f_{2,j} ∈ K for each n ∈ N). Comparing this with (55), we obtain

(f_1 · f_2) [g] = f_1 [g] · f_2 [g].

In other words, (f_1 · f_2) ◦ g = (f_1 ◦ g) · (f_2 ◦ g) (since the notation f ◦ g is syn-
onymous to f[g]). This proves Proposition 3.5.4 (b).
Did you notice it? I have cheated. The above proof of Proposition 3.5.4 (b) relied
on some manipulations of infinite sums that need to be justified. Namely, we replaced
"∑_{i∈N} ∑_{j∈N}" by "∑_{(i,j)∈N^2}". This is an application of the "discrete Fubini rule", and as we
said above, this rule can only be used if we know that the family (f_{1,i} f_{2,j} g^{i+j})_{(i,j)∈N×N}
is summable. In other words, we need to show the following statement:

Statement 1: For each m ∈ N, all but finitely many pairs (i, j) ∈ N × N
satisfy [x^m] (f_{1,i} f_{2,j} g^{i+j}) = 0.

We shall achieve this by proving the following statement:

Statement 2: For any three nonnegative integers m, i, j with m < i + j, we
have [x^m] (g^{i+j}) = 0.

[Proof of Statement 2: Let m, i, j be three nonnegative integers with m < i + j. We must
show that [x^m] (g^{i+j}) = 0.
We have [x^0] g = 0. Hence, Lemma 3.3.16 (applied to a = g) shows that there exists
an h ∈ K[[x]] such that g = xh. Consider this h.
Now, from g = xh, we obtain g^{i+j} = (xh)^{i+j} = x^{i+j} h^{i+j}. However, Lemma 3.3.17
(applied to i + j and h^{i+j} instead of k and a) shows that the first i + j coefficients of the
FPS x^{i+j} h^{i+j} are 0. In other words, the first i + j coefficients of the FPS g^{i+j} are 0 (since
g^{i+j} = x^{i+j} h^{i+j}). But [x^m] (g^{i+j}) is one of these first i + j coefficients (since m < i + j).
Thus, we conclude that [x^m] (g^{i+j}) = 0. This proves Statement 2.]

[Proof of Statement 1: Let m ∈ N. If (i, j) ∈ N × N is a pair satisfying m < i + j, then

[x^m] (f_{1,i} f_{2,j} g^{i+j}) = f_{1,i} f_{2,j} · [x^m] (g^{i+j}) = 0    (by (25) and Statement 2).

Thus, all but finitely many pairs (i, j) ∈ N × N satisfy [x^m] (f_{1,i} f_{2,j} g^{i+j}) = 0 (because
all but finitely many such pairs satisfy m < i + j). This proves Statement 1.]

As explained above, Statement 1 shows that the family
(f_{1,i} f_{2,j} g^{i+j})_{(i,j)∈N×N} is summable, and thus our interchange of summation signs made
above is justified. This completes our proof of Proposition 3.5.4 (b).

(c) This follows easily from parts (b) and (f). In detail: Let f_1, f_2, g ∈ K[[x]]
be such that [x^0] g = 0. Assume that f_2 is invertible. Let us first show that
f_2 ◦ g is invertible.
Indeed, consider the inverse f_2^{−1} of f_2. This inverse exists (since f_2 is in-
vertible) and satisfies f_2^{−1} · f_2 = 1. Now, Proposition 3.5.4 (b) (applied to f_2^{−1}
instead of f_1) yields (f_2^{−1} · f_2) ◦ g = (f_2^{−1} ◦ g) · (f_2 ◦ g). Hence,

(f_2^{−1} ◦ g) · (f_2 ◦ g) = (f_2^{−1} · f_2) ◦ g = 1 ◦ g = 1

(by Proposition 3.5.4 (f), applied to a = 1). Thus, the FPS f_2^{−1} ◦ g is an inverse of
f_2 ◦ g. Hence, f_2 ◦ g is invertible. The expression (f_1 ◦ g) / (f_2 ◦ g) is therefore well-defined.
It now remains to prove that (f_1 / f_2) ◦ g = (f_1 ◦ g) / (f_2 ◦ g). To this purpose, we argue as
follows: The expression f_1 / f_2 is well-defined, since f_2 is invertible. Proposition
3.5.4 (b) (applied to f_1 / f_2 instead of f_1) yields ((f_1 / f_2) · f_2) ◦ g = ((f_1 / f_2) ◦ g) · (f_2 ◦ g).
In view of (f_1 / f_2) · f_2 = f_1, this rewrites as f_1 ◦ g = ((f_1 / f_2) ◦ g) · (f_2 ◦ g). We can
divide both sides of this equality by f_2 ◦ g (since f_2 ◦ g is invertible), and thus
obtain (f_1 ◦ g) / (f_2 ◦ g) = (f_1 / f_2) ◦ g. In other words, (f_1 / f_2) ◦ g = (f_1 ◦ g) / (f_2 ◦ g).
Thus, Proposition 3.5.4 (c) is proven.

(d) Let f, g ∈ K[[x]] satisfy [x^0] g = 0. We must prove that f^k ◦ g = (f ◦ g)^k
for each k ∈ N.
We prove this by induction on k:
Induction base: We have f^0 ◦ g = 1 ◦ g = 1 (by Proposition 3.5.4 (f), applied
to a = 1). Comparing this with (f ◦ g)^0 = 1, we find f^0 ◦ g = (f ◦ g)^0. In other
words, f^k ◦ g = (f ◦ g)^k holds for k = 0.
Induction step: Let m ∈ N. Assume that f^k ◦ g = (f ◦ g)^k holds for k = m. We
must prove that f^k ◦ g = (f ◦ g)^k holds for k = m + 1.
We have assumed that f^k ◦ g = (f ◦ g)^k holds for k = m. In other words, we
have f^m ◦ g = (f ◦ g)^m. Now,

f^{m+1} ◦ g = (f · f^m) ◦ g = (f ◦ g) · (f^m ◦ g)    (by Proposition 3.5.4 (b), applied to f_1 = f and f_2 = f^m)
= (f ◦ g) · (f ◦ g)^m = (f ◦ g)^{m+1}.

In other words, f^k ◦ g = (f ◦ g)^k holds for k = m + 1. This completes the
induction step. Thus, we have proven that f^k ◦ g = (f ◦ g)^k for each k ∈ N.
Proposition 3.5.4 (d) is now proven.

(h) This is just a generalization of Proposition 3.5.4 (a) to (potentially) infinite
sums. The proof follows the same method, but unfortunately requires some
technical reasoning about summability. I will give the full proof for the sake of
completeness, but be warned that it contains nothing of interest.
Let (f_i)_{i∈I} ∈ K[[x]]^I be a summable family of FPSs. Let g ∈ K[[x]] be an FPS
satisfying [x^0] g = 0.

First, we shall prove that the family (f_i ◦ g)_{i∈I} ∈ K[[x]]^I is summable. Indeed,
we recall that the family (f_i)_{i∈I} ∈ K[[x]]^I is summable. In other words,

for each n ∈ N, all but finitely many i ∈ I satisfy [x^n] f_i = 0

(by the definition of "summable"). In other words, for each n ∈ N, there exists
a finite subset I_n of I such that

all i ∈ I \ I_n satisfy [x^n] f_i = 0.    (56)

Consider this subset I_n. Thus, all the sets I_0, I_1, I_2, . . . are finite subsets of I.
Now, let n ∈ N be arbitrary. The set I_0 ∪ I_1 ∪ · · · ∪ I_n is a union of n + 1 finite
subsets of I (because all the sets I_0, I_1, I_2, . . . are finite subsets of I), and thus
itself is a finite subset of I. Moreover,

all i ∈ I \ (I_0 ∪ I_1 ∪ · · · ∪ I_n) satisfy [x^n] (f_i ◦ g) = 0.    (57)

[Proof of (57): Let i ∈ I \ (I_0 ∪ I_1 ∪ · · · ∪ I_n). We must show that [x^n] (f_i ◦ g) = 0.
Let m ∈ {0, 1, . . . , n}. Then, I_m ⊆ I_0 ∪ I_1 ∪ · · · ∪ I_n, so that I \ (I_0 ∪ I_1 ∪ · · · ∪ I_n) ⊆ I \ I_m.
Hence, i ∈ I \ (I_0 ∪ I_1 ∪ · · · ∪ I_n) ⊆ I \ I_m.
Therefore, (56) (applied to m instead of n) yields [x^m] f_i = 0.
Forget that we fixed m. We thus have shown that [x^m] f_i = 0 for each m ∈ {0, 1, . . . , n}.
In other words, the first n + 1 coefficients of f_i are 0. Hence, Lemma 3.5.5 (applied to f_i
and n + 1 instead of f and k) shows that the first n + 1 coefficients of f_i ◦ g are 0. How-
ever, [x^n] (f_i ◦ g) is one of these first n + 1 coefficients (indeed, it is the last of them);
thus, this coefficient [x^n] (f_i ◦ g) must be 0. This proves (57).]
Now, recall that I_0 ∪ I_1 ∪ · · · ∪ I_n is a finite subset of I. Hence, thanks to (57),
we know that there exists a finite subset J of I such that all i ∈ I \ J satisfy
[x^n] (f_i ◦ g) = 0 (namely, J = I_0 ∪ I_1 ∪ · · · ∪ I_n). In other words, all but finitely
many i ∈ I satisfy [x^n] (f_i ◦ g) = 0.
Forget that we fixed n. We thus have shown that

for each n ∈ N, all but finitely many i ∈ I satisfy [x^n] (f_i ◦ g) = 0.

In other words, the family (f_i ◦ g)_{i∈I} ∈ K[[x]]^I is summable (by the definition
of "summable").
It now remains to prove that (∑_{i∈I} f_i) ◦ g = ∑_{i∈I} (f_i ◦ g).
For each i ∈ I, we write the FPS f_i in the form f_i = ∑_{n∈N} f_{i,n} x^n with
f_{i,0}, f_{i,1}, f_{i,2}, . . . ∈ K. First, we shall show that

the family (f_{i,m} g^m)_{(i,m)∈I×N} is summable.    (58)

[Proof of (58): Fix an n ∈ N. Let J denote the set I_0 ∪ I_1 ∪ · · · ∪ I_n. Hence, J is a union
of n + 1 finite subsets of I (because all the sets I_0, I_1, I_2, . . . are finite subsets of I), and

thus itself is a finite subset of I. The set J × {0, 1, . . . , n} must be finite (since it is the
product of the two finite sets J and {0, 1, . . . , n}).
Now, let (i, m) ∈ (I × N) \ (J × {0, 1, . . . , n}). We shall prove that [x^n] (f_{i,m} g^m) = 0.
We have (i, m) ∉ J × {0, 1, . . . , n} (since (i, m) ∈ (I × N) \ (J × {0, 1, . . . , n})).
We note that f_{i,m} ∈ K and thus [x^n] (f_{i,m} g^m) = f_{i,m} · [x^n] (g^m) (by (25)). However, we
have [x^0] g = 0. Thus, Proposition 3.5.2 (a) (applied to m instead of n) yields that the
first m coefficients of the FPS g^m are 0. In other words, we have

[x^k] (g^m) = 0   for each k ∈ {0, 1, . . . , m − 1}.    (59)

Now, if we have n ∈ {0, 1, . . . , m − 1}, then we have [x^n] (g^m) = 0 (by (59), applied
to k = n) and therefore [x^n] (f_{i,m} g^m) = f_{i,m} · [x^n] (g^m) = 0. Hence, [x^n] (f_{i,m} g^m) = 0
is proved in the case when n ∈ {0, 1, . . . , m − 1}. Thus, for the rest of this proof of
[x^n] (f_{i,m} g^m) = 0, we WLOG assume that n ∉ {0, 1, . . . , m − 1}. Hence, n > m − 1
(since n ∈ N), so that n ≥ m (since n and m are integers). Therefore, m ≤ n, so that
m ∈ {0, 1, . . . , n} and therefore I_m ⊆ I_0 ∪ I_1 ∪ · · · ∪ I_n = J (since we defined J to be
I_0 ∪ I_1 ∪ · · · ∪ I_n).
If we had i ∈ I_m, then we would have (i, m) ∈ J × {0, 1, . . . , n} (since i ∈ I_m ⊆ J
and m ∈ {0, 1, . . . , n}), which would contradict the fact that (i, m) ∉ J × {0, 1, . . . , n}.
Thus, we cannot have i ∈ I_m. Hence, i ∉ I_m, so that i ∈ I \ I_m (since i ∈ I). Thus, (56)
(applied to m instead of n) yields [x^m] f_i = 0. However, from f_i = ∑_{n∈N} f_{i,n} x^n, we see that
[x^m] f_i = f_{i,m}. Thus, f_{i,m} = [x^m] f_i = 0. Consequently, [x^n] (f_{i,m} g^m) = f_{i,m} · [x^n] (g^m) = 0.
Forget that we fixed (i, m). We thus have shown that

all (i, m) ∈ (I × N) \ (J × {0, 1, . . . , n}) satisfy [x^n] (f_{i,m} g^m) = 0.

Therefore, all but finitely many (i, m) ∈ I × N satisfy [x^n] (f_{i,m} g^m) = 0 (since J ×
{0, 1, . . . , n} is a finite subset of I × N).
Forget that we fixed n. We thus have shown that

for each n ∈ N, all but finitely many (i, m) ∈ I × N satisfy [x^n] (f_{i,m} g^m) = 0.

In other words, the family (f_{i,m} g^m)_{(i,m)∈I×N} is summable. This proves (58).]
Now, we have shown that the family (f_{i,m} g^m)_{(i,m)∈I×N} is summable. Renam-
ing the index (i, m) as (i, n), we thus conclude that the family (f_{i,n} g^n)_{(i,n)∈I×N}
is summable. The same argument (but with g replaced by x) shows that the
family (f_{i,n} x^n)_{(i,n)∈I×N} is summable (since the FPS x satisfies [x^0] x = 0).
The proof of (∑_{i∈I} f_i) ◦ g = ∑_{i∈I} (f_i ◦ g) is now just a matter of computation: summing
the equalities f_i = ∑_{n∈N} f_{i,n} x^n over all i ∈ I, we obtain

∑_{i∈I} f_i = ∑_{i∈I} ∑_{n∈N} f_{i,n} x^n = ∑_{n∈N} ∑_{i∈I} f_{i,n} x^n

(here, we have been able to interchange the summation signs, since the family
(f_{i,n} x^n)_{(i,n)∈I×N} is summable). Thus,

∑_{i∈I} f_i = ∑_{n∈N} ∑_{i∈I} f_{i,n} x^n = ∑_{n∈N} (∑_{i∈I} f_{i,n}) x^n.

Hence, Definition 3.5.1 (applied to f = ∑_{i∈I} f_i) yields

(∑_{i∈I} f_i) [g] = ∑_{n∈N} (∑_{i∈I} f_{i,n}) g^n    (60)

(since ∑_{i∈I} f_{i,n} ∈ K for each n ∈ N).
On the other hand, for each i ∈ I, we have f_i[g] = ∑_{n∈N} f_{i,n} g^n (by Definition
3.5.1, since f_i = ∑_{n∈N} f_{i,n} x^n with f_{i,0}, f_{i,1}, f_{i,2}, . . . ∈ K). Summing these equalities
over all i ∈ I, we find

∑_{i∈I} f_i[g] = ∑_{i∈I} ∑_{n∈N} f_{i,n} g^n = ∑_{n∈N} ∑_{i∈I} f_{i,n} g^n

(again, we have been able to interchange the summation signs, since the family
(f_{i,n} g^n)_{(i,n)∈I×N} is summable). Thus,

∑_{i∈I} f_i[g] = ∑_{n∈N} ∑_{i∈I} f_{i,n} g^n = ∑_{n∈N} (∑_{i∈I} f_{i,n}) g^n.

Comparing this with (60), we find (∑_{i∈I} f_i) [g] = ∑_{i∈I} f_i[g]. In other words,
(∑_{i∈I} f_i) ◦ g = ∑_{i∈I} (f_i ◦ g) (since the notation f ◦ g is synonymous to f[g]). Thus,
Proposition 3.5.4 (h) is proven.

(e) This is [Loehr11, Theorem 7.63] and [Brewer14, Proposition 2.2.5].24 Again,
let us give the proof:
Write the FPS g in the form g = ∑_{n∈N} g_n x^n for some g_0, g_1, g_2, . . . ∈ K. Then,
g_0 = [x^0] g = 0. Moreover, g ◦ h = g[h] = ∑_{n∈N} g_n h^n (by Definition 3.5.1, because
g = ∑_{n∈N} g_n x^n with g_0, g_1, g_2, . . . ∈ K). But Proposition 3.5.2 (c) (applied to g,
h and g_n instead of f, g and f_n) yields [x^0] (∑_{n∈N} g_n h^n) = g_0 = 0. In view
24 See also [19s, Proposition 7.6.14] for a similar property for polynomials.

of g ◦ h = ∑_{n∈N} g_n h^n, this rewrites as [x^0] (g ◦ h) = 0. Hence, the composition
f ◦ (g ◦ h) is well-defined.
It remains to show that (f ◦ g) ◦ h = f ◦ (g ◦ h).
Write the FPS f in the form f = ∑_{n∈N} f_n x^n for some f_0, f_1, f_2, . . . ∈ K. Thus,
Definition 3.5.1 yields

f[g] = ∑_{n∈N} f_n g^n   and   f[g ◦ h] = ∑_{n∈N} f_n · (g ◦ h)^n.

Moreover, the family (f_n g^n)_{n∈N} is summable (by Proposition 3.5.2 (b)). Hence,
Proposition 3.5.4 (h) (applied to (f_n g^n)_{n∈N} and h instead of (f_i)_{i∈I} and g) yields
that the family ((f_n g^n) ◦ h)_{n∈N} ∈ K[[x]]^N is summable as well and that we have

(∑_{n∈N} f_n g^n) ◦ h = ∑_{n∈N} (f_n g^n) ◦ h.    (61)

In view of f ◦ g = f[g] = ∑_{n∈N} f_n g^n, we can rewrite (61) as

(f ◦ g) ◦ h = ∑_{n∈N} (f_n g^n) ◦ h.    (62)

However, for each n ∈ N, we have f_n g^n = f_n · g^n (by Theorem 3.2.6 (d), applied
to λ = f_n and a = g^n) and thus

(f_n g^n) ◦ h = (f_n · g^n) ◦ h = (f_n ◦ h) · (g^n ◦ h) = f_n · (g ◦ h)^n    (63)

(here, (f_n · g^n) ◦ h = (f_n ◦ h) · (g^n ◦ h) by Proposition 3.5.4 (b), applied to f_n, g^n
and h instead of f_1, f_2 and g; furthermore, f_n ◦ h = f_n by Proposition 3.5.4 (f),
applied to f_n and h instead of a and g, and g^n ◦ h = (g ◦ h)^n by Proposition
3.5.4 (d), applied to g and h instead of f and g; finally, Theorem 3.2.6 (d) (applied
to λ = f_n and a = (g ◦ h)^n) yields f_n · (g ◦ h)^n = f_n (g ◦ h)^n). Thus, (62) becomes

(f ◦ g) ◦ h = ∑_{n∈N} (f_n g^n) ◦ h = ∑_{n∈N} f_n · (g ◦ h)^n
= f[g ◦ h]    (since f[g ◦ h] = ∑_{n∈N} f_n · (g ◦ h)^n)
= f ◦ (g ◦ h).

This completes the proof of Proposition 3.5.4 (e).
Example 3.5.7. Let us use Proposition 3.5.4 (c) to justify the equality (48) that we used in Example 3.5.3. Indeed, we know that the FPS 1 − x is invertible. Thus, applying Proposition 3.5.4 (c) to f_1 = 1 and f_2 = 1 − x and g = x + x^2, we obtain

    (1/(1 − x)) ◦ (x + x^2) = (1 ◦ (x + x^2)) / ((1 − x) ◦ (x + x^2)).

Using the notation f [g] instead of f ◦ g, we can rewrite this as

    (1/(1 − x)) [x + x^2] = (1 [x + x^2]) / ((1 − x) [x + x^2]).

In view of 1 [x + x^2] = 1 and (1 − x) [x + x^2] = 1 − (x + x^2) = 1 − x − x^2, this rewrites as

    (1/(1 − x)) [x + x^2] = 1/(1 − x − x^2).

Thus, (48) is proved.
Let us summarize: If f, g ∈ K[[x]] are two FPSs, then the composition f [g] is not always defined. However, it is defined at least in the following two cases:

• in the case when f is a polynomial (that is, f ∈ K[x]), and

• in the case when g has constant term 0 (that is, [x^0] g = 0).

This justifies some more of the things we did back in Section 3.1; in particular, Example 1 from that section is now fully justified. But we still have not defined (e.g.) the square root of an FPS, which we used in Example 2.
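Compositions of this kind are easy to experiment with numerically on truncated coefficient lists. The following Python sketch is our own illustration (the helper names `mul` and `compose` and the truncation order `N` are not from the text); it redoes Example 3.5.7, composing 1/(1 − x) with x + x^2 and recovering the Fibonacci generating function 1/(1 − x − x^2).

```python
N = 10  # truncate every series modulo x^N

def mul(a, b):
    """Product of two coefficient lists, truncated to length N."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def compose(f, g):
    """f[g] = sum_n f_n g^n; requires [x^0] g = 0."""
    assert g[0] == 0, "composition needs constant term 0"
    result = [0] * N
    gpow = [1] + [0] * (N - 1)  # g^0
    for fn in f:                # g^n = 0 mod x^N for n >= N, so n < N suffices
        for k in range(N):
            result[k] += fn * gpow[k]
        gpow = mul(gpow, g)
    return result

geom = [1] * N                  # 1/(1 - x) = 1 + x + x^2 + ...
g = [0, 1, 1] + [0] * (N - 3)   # x + x^2

fib_gf = compose(geom, g)       # should be 1/(1 - x - x^2)
print(fib_gf)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Since g = x + x^2 has constant term 0, only the powers g^0, g^1, . . . , g^{N−1} contribute modulo x^N, which is why the loop over n can stop at N.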
Before I explain square roots, let me quickly survey differentiation of FPSs.
3.6. Derivatives of FPSs

Our definition of the derivative of an FPS copycats the well-known formula for the derivative of a power series in analysis:

Definition 3.6.1. Let f ∈ K[[x]] be an FPS. Then, the derivative f′ of f is an FPS defined as follows: Write f as f = ∑_{n∈N} f_n x^n (with f_0, f_1, f_2, . . . ∈ K), and set

    f′ := ∑_{n>0} n f_n x^{n−1}.
To make sure that this derivative behaves nicely, we need to check that it
satisfies the familiar properties of derivatives. And indeed, it does:
Theorem 3.6.2. (a) We have (f + g)′ = f′ + g′ for any f, g ∈ K[[x]].

(b) If (f_i)_{i∈I} is a summable family of FPSs, then the family (f_i′)_{i∈I} is summable as well, and we have

    (∑_{i∈I} f_i)′ = ∑_{i∈I} f_i′.

(c) We have (c f)′ = c f′ for any c ∈ K and f ∈ K[[x]].

(d) We have (f g)′ = f′ g + f g′ for any f, g ∈ K[[x]]. (This is known as the Leibniz rule.)

(e) If f, g ∈ K[[x]] are two FPSs such that g is invertible, then

    (f/g)′ = (f′ g − f g′) / g^2.

(This is known as the quotient rule.)

(f) If g ∈ K[[x]] is an FPS, then (g^n)′ = n g′ g^{n−1} for any n ∈ N (where the expression n g′ g^{n−1} is to be understood as 0 if n = 0).

(g) Given two FPSs f, g ∈ K[[x]], we have

    (f ◦ g)′ = (f′ ◦ g) · g′

if f is a polynomial or if [x^0] g = 0. (This is known as the chain rule.)

(h) If K is a Q-algebra, and if two FPSs f, g ∈ K[[x]] satisfy f′ = g′, then f − g is constant.

Theorem 3.6.2 justifies Example 4 in Section 3.1 (specifically, Theorem 3.6.2 (e) is the quotient rule that we used to compute (1/(1 − x))′).
Proof of Theorem 3.6.2 (sketched). (a) This is part of [19s-mt3s, Exercise 5 (b)] (specif-
ically, it is Statement 1 in [19s-mt3s, solution to Exercise 5 (b)]). Anyway, the
proof is very easy.
(b) This is just the natural generalization of Theorem 3.6.2 (a) to (potentially)
infinite sums. The proof follows the same idea, but requires some straightfor-
ward technical verifications (mainly to check that the summation signs can be
interchanged).
(c) This is part of [19s-mt3s, Exercise 5 (b)] (specifically, it is Statement 3 in
[19s-mt3s, solution to Exercise 5 (b)]).
(d) This is [19s-mt3s, Exercise 5 (c)] and [Grinbe17, Proposition 0.2 (c)].
(e) Let f, g ∈ K[[x]] be two FPSs such that g is invertible. Then, Theorem 3.6.2 (d) (applied to f/g instead of f) yields ((f/g) · g)′ = (f/g)′ · g + (f/g) · g′. In view of (f/g) · g = f, this rewrites as f′ = (f/g)′ · g + (f/g) · g′. Solving this for (f/g)′, we find

    (f/g)′ = (f′ g − f g′) / g^2.

This proves Theorem 3.6.2 (e).
(f) This follows by induction on n, using part (d) (in the induction step) and 1′ = 0 (in the induction base).
(g) Let f, g ∈ K[[x]] be two FPSs such that f is a polynomial or [x^0] g = 0. Write the FPS f in the form f = ∑_{n∈N} f_n x^n with f_0, f_1, f_2, . . . ∈ K. Then, either Definition 3.5.1 or Definition 3.4.5 (depending on whether we have [x^0] g = 0 or f is a polynomial) yields f [g] = ∑_{n∈N} f_n g^n. Hence,

    (f [g])′ = (∑_{n∈N} f_n g^n)′
             = ∑_{n∈N} (f_n g^n)′    (by Theorem 3.6.2 (b))
             = ∑_{n∈N} f_n (g^n)′    (by Theorem 3.6.2 (c))
             = f_0 (g^0)′ + ∑_{n>0} f_n (g^n)′
             = f_0 · 0 + ∑_{n>0} f_n n g′ g^{n−1}    (by Theorem 3.6.2 (f), since (g^0)′ = 1′ = 0)
             = ∑_{n>0} n f_n g^{n−1} g′.    (64)

On the other hand, from f = ∑_{n∈N} f_n x^n, we obtain f′ = ∑_{n>0} n f_n x^{n−1} = ∑_{m∈N} (m + 1) f_{m+1} x^m (here, we have substituted m + 1 for n in the sum). Hence, Definition 3.5.1 or Definition 3.4.5 (depending on whether we have [x^0] g = 0 or f is a polynomial) yields

    f′ [g] = ∑_{m∈N} (m + 1) f_{m+1} g^m = ∑_{n>0} n f_n g^{n−1}

(here, we have substituted n − 1 for m in the sum). Multiplying both sides of this equality by g′, we find

    f′ [g] · g′ = (∑_{n>0} n f_n g^{n−1}) · g′ = ∑_{n>0} n f_n g^{n−1} g′.

Comparing this with (64), we find (f [g])′ = f′ [g] · g′. In other words, (f ◦ g)′ = (f′ ◦ g) · g′ (since f ◦ g is a synonym for f [g]). This proves Theorem 3.6.2 (g). (Note that this proof is done in [Loehr11, proof of Theorem 7.57 (d)] in the case when f is a polynomial.)
(h) The proof is easy and can be found in [Grinbe17, Lemma 0.3]. Note
that the condition that K be a Q-algebra is crucial, since it allows dividing by
positive integers. (For example, if K could be Z/2, then we would easily get a
counterexample, e.g., by taking f = x2 and g = 0.)
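The rules of Theorem 3.6.2 lend themselves to a quick numerical sanity check on truncated series. The Python sketch below is our own illustration (exact rational coefficients; the helpers `mul`, `deriv`, `compose` and the sample series are not from the text). One caveat: differentiating a series that is only known modulo x^N determines the derivative only modulo x^{N−1}, so the comparisons below exclude the top coefficient.

```python
from fractions import Fraction as F

N = 9

def mul(a, b):
    # product of coefficient lists, truncated to length N
    c = [F(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def deriv(f):
    # f' = sum_{n>0} n f_n x^{n-1}; the x^{N-1} coefficient of f' is lost
    return [F(n) * f[n] for n in range(1, N)] + [F(0)]

def compose(f, g):
    # f[g] = sum_n f_n g^n, valid since [x^0] g = 0
    assert g[0] == 0
    result, gpow = [F(0)] * N, [F(1)] + [F(0)] * (N - 1)
    for fn in f:
        for k in range(N):
            result[k] += fn * gpow[k]
        gpow = mul(gpow, g)
    return result

f = [F(c) for c in [1, 2, 0, 3, 1, 0, 4, 1, 2]]  # arbitrary test series
g = [F(c) for c in [0, 1, 1, 2, 0, 5, 1, 3, 1]]  # constant term 0, so f ∘ g is defined

# Leibniz rule (d): (fg)' = f'g + fg'
lhs1 = deriv(mul(f, g))
rhs1 = [a + b for a, b in zip(mul(deriv(f), g), mul(f, deriv(g)))]
print(lhs1[:N - 1] == rhs1[:N - 1])  # True

# chain rule (g): (f ∘ g)' = (f' ∘ g) · g'
lhs2 = deriv(compose(f, g))
rhs2 = mul(compose(deriv(f), g), deriv(g))
print(lhs2[:N - 1] == rhs2[:N - 1])  # True
```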
3.7. Exponentials and logarithms

Convention 3.7.1. Throughout Section 3.7, we assume that K is not just a commutative ring, but actually a commutative Q-algebra.

3.7.1. Definitions

Convention 3.7.1 entails that elements of K can be divided by the positive integers 1, 2, 3, . . .. We can use this to define three specific (and particularly important) FPSs over K:

Definition 3.7.2. Define three FPSs exp, log and exp̄ (read “exp-bar”) in K[[x]] by

    exp := ∑_{n∈N} (1/n!) x^n,
    log := ∑_{n≥1} ((−1)^{n−1}/n) x^n,
    exp̄ := exp − 1 = ∑_{n≥1} (1/n!) x^n.

(The last equality sign here follows from exp = ∑_{n∈N} (1/n!) x^n = (1/0!) x^0 + ∑_{n≥1} (1/n!) x^n = 1 + ∑_{n≥1} (1/n!) x^n, since 1/0! = 1.)

Note that the FPS exp is the usual exponential series from analysis, but now manifesting itself as an FPS. Likewise, log is the Mercator series for log(1 + x), where log stands for the natural logarithm function. The natural logarithm function itself cannot be interpreted as an FPS, since log 0 is undefined.
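For concreteness, the first few coefficients of these three series can be computed with exact rational arithmetic (so K = Q here). The short Python sketch below is our own illustration, not part of the text.

```python
from fractions import Fraction as F
from math import factorial

N = 7  # number of coefficients to display

exp_series = [F(1, factorial(n)) for n in range(N)]               # exp
log_series = [F(0)] + [F((-1) ** (n - 1), n) for n in range(1, N)]  # log
expbar_series = [exp_series[0] - 1] + exp_series[1:]              # exp̄ = exp − 1

print(", ".join(str(c) for c in exp_series))     # 1, 1, 1/2, 1/6, 1/24, 1/120, 1/720
print(", ".join(str(c) for c in log_series))     # 0, 1, -1/2, 1/3, -1/4, 1/5, -1/6
print(", ".join(str(c) for c in expbar_series))  # 0, 1, 1/2, 1/6, 1/24, 1/120, 1/720
```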
3.7.2. The exponential and the logarithm are inverse
I will prove that

    exp̄ ◦ log = log ◦ exp̄ = x.    (65)

This is an algebraic analogue of the well-known fact from analysis which states that the exponential and logarithm functions are mutually inverse.
There is a short way of proving (65), which I will not take: Namely, one can show that any equality between holomorphic functions on an open disk around the origin leads to an equality between their Taylor series (viewed as FPSs). Thus, if you have proved in complex analysis that log ◦ exp = id on an open disk around 0 and exp ◦ log = id on an open disk around 1, then you automatically get (65) (indeed, exp̄ and log are the Taylor series of the functions exp and log around 1, with the reservation that the point 1 has been moved to the origin by a shift). This approach uses nontrivial results from complex analysis, so I will not follow it and instead start from scratch.
The main tool in the proof of (65) will be the following useful proposition
([Grinbe17, Lemma 0.4]):
Proposition 3.7.3. Let g ∈ K[[x]] with [x^0] g = 0. Then:
(a) We have

    (exp ◦ g)′ = (exp̄ ◦ g)′ = (exp ◦ g) · g′.

(b) We have

    (log ◦ g)′ = (1 + g)^{−1} · g′.
Proof of Proposition 3.7.3. (a) Let us first show that exp̄′ = exp′ = exp. Indeed, exp̄ = exp − 1, so that exp = exp̄ + 1 and therefore

    exp′ = (exp̄ + 1)′ = exp̄′ + 1′ = exp̄′    (66)

(by Theorem 3.6.2 (a), and since 1′ = 0). Next, we recall that exp = ∑_{n∈N} (1/n!) x^n. Hence, the definition of the derivative yields

    exp′ = ∑_{n≥1} n · (1/n!) x^{n−1} = ∑_{n≥1} (1/(n − 1)!) x^{n−1}    (since n! = n · (n − 1)!)
         = ∑_{n∈N} (1/n!) x^n    (here, we have substituted n for n − 1 in the sum)
         = exp.    (67)

Comparing this with (66), we find

    exp̄′ = exp.    (68)

Now, we can apply the chain rule (Theorem 3.6.2 (g)) to f = exp (since [x^0] g = 0), and thus obtain

    (exp ◦ g)′ = (exp′ ◦ g) · g′ = (exp ◦ g) · g′    (since exp′ = exp).

The same computation (but with exp replaced by exp̄, and using (68) instead of exp′ = exp) yields (exp̄ ◦ g)′ = (exp ◦ g) · g′. Combining these two formulas, we obtain (exp ◦ g)′ = (exp̄ ◦ g)′ = (exp ◦ g) · g′. Thus, we have proved Proposition 3.7.3 (a).
(b) We have log = ∑_{n≥1} ((−1)^{n−1}/n) x^n. Thus,

    log′ = (∑_{n≥1} ((−1)^{n−1}/n) x^n)′
         = ∑_{n≥1} ((−1)^{n−1}/n) (x^n)′    (by Theorem 3.6.2 (b) and (c))
         = ∑_{n≥1} ((−1)^{n−1}/n) · n x′ x^{n−1}    (by Theorem 3.6.2 (f), applied to x instead of g)
         = ∑_{n≥1} (−1)^{n−1} x^{n−1}    (since x′ = 1)
         = ∑_{n∈N} (−1)^n x^n

(here, we have substituted n for n − 1 in the sum). On the other hand, Proposition 3.3.7 yields (1 + x)^{−1} = ∑_{n∈N} (−1)^n x^n. Comparing these two equalities, we find

    log′ = (1 + x)^{−1}.    (69)

Now, we can apply the chain rule (Theorem 3.6.2 (g)) to f = log (since [x^0] g = 0), and thus obtain

    (log ◦ g)′ = (log′ ◦ g) · g′ = ((1 + x)^{−1} ◦ g) · g′.    (70)

However, we claim that (1 + x)^{−1} ◦ g = (1 + g)^{−1}. Indeed, Proposition 3.5.4 (c) (applied to f_1 = 1 and f_2 = 1 + x) yields

    (1/(1 + x)) ◦ g = (1 ◦ g) / ((1 + x) ◦ g)

(since the FPS 1 + x is invertible). In view of 1/(1 + x) = (1 + x)^{−1} and

    1 ◦ g = 1    (by Proposition 3.5.4 (f), applied to a = 1)

and

    (1 + x) ◦ g = 1 + g    (this follows easily from Definition 3.5.1),

this rewrites as (1 + x)^{−1} ◦ g = 1/(1 + g) = (1 + g)^{−1}. Hence, (70) becomes

    (log ◦ g)′ = ((1 + x)^{−1} ◦ g) · g′ = (1 + g)^{−1} · g′.

Thus, we have proved Proposition 3.7.3 (b).
We will need a very simple lemma, which says (in particular) that if two FPSs have constant terms 0, then so does their composition:

Lemma 3.7.4. Let f, g ∈ K[[x]] be two FPSs with [x^0] g = 0. Then, [x^0] (f ◦ g) = [x^0] f.

Proof of Lemma 3.7.4. Write f in the form f = ∑_{n∈N} f_n x^n with f_0, f_1, f_2, . . . ∈ K. Thus, f_0 = [x^0] f. Now, Definition 3.5.1 yields f [g] = ∑_{n∈N} f_n g^n. However, Proposition 3.5.2 (c) yields [x^0] (∑_{n∈N} f_n g^n) = f_0 = [x^0] f. In view of f ◦ g = f [g] = ∑_{n∈N} f_n g^n, we can rewrite this as [x^0] (f ◦ g) = [x^0] f. This proves Lemma 3.7.4.
Now, we can prove the equalities we promised ([Grinbe17, Theorem 0.1]):

Theorem 3.7.5. We have

    exp̄ ◦ log = x    and    log ◦ exp̄ = x.
Proof of Theorem 3.7.5. Let us first prove that log ◦ exp̄ = x.
Indeed, the idea of this proof is to show that log ◦ exp̄ and x are two FPSs with the same constant term (namely, 0) and with the same derivative. Once this is proved, Theorem 3.6.2 (h) will easily yield that they are equal.
Let us fill in the details. We have [x^0] exp̄ = 0 (since exp̄ = ∑_{n≥1} (1/n!) x^n). Hence, Lemma 3.7.4 (applied to f = log and g = exp̄) yields [x^0] (log ◦ exp̄) = [x^0] log = 0 (since log = ∑_{n≥1} ((−1)^{n−1}/n) x^n). Now, (21) yields

    [x^0] (log ◦ exp̄ − x) = [x^0] (log ◦ exp̄) − [x^0] x = 0 − 0 = 0.    (71)

However, exp̄ = exp − 1 and thus 1 + exp̄ = exp. Now, Proposition 3.7.3 (b) (applied to g = exp̄) yields

    (log ◦ exp̄)′ = (1 + exp̄)^{−1} · exp̄′ = exp^{−1} · exp = 1 = x′

(since 1 + exp̄ = exp, since exp̄′ = exp (by (68)), and since x′ = 1). Hence, Theorem 3.6.2 (h) (applied to f = log ◦ exp̄ and g = x) yields that log ◦ exp̄ − x is constant. In other words, log ◦ exp̄ − x = a for some a ∈ K. Consider this a. From log ◦ exp̄ − x = a, we obtain [x^0] (log ◦ exp̄ − x) = [x^0] a = a. Comparing this with (71), we find a = 0. Hence, log ◦ exp̄ − x = a rewrites as log ◦ exp̄ − x = 0. In other words, log ◦ exp̄ = x.
Now it remains to prove that exp̄ ◦ log = x. There are (at least) two ways to do this:

• 1st way: A homework exercise (Exercise A.2.5.2) says that any FPS f with [x^0] f = 0 and with [x^1] f invertible has a unique compositional inverse (i.e., there is a unique FPS g with [x^0] g = 0 and f ◦ g = g ◦ f = x). We can apply this to f = log (since [x^0] log = 0 and since [x^1] log = 1 is invertible), and thus see that log has a unique compositional inverse g. This compositional inverse g must be exp̄, since log ◦ exp̄ = x (indeed, comparing

    (g ◦ log) ◦ exp̄ = x ◦ exp̄ = exp̄    (since g ◦ log = x)

with

    (g ◦ log) ◦ exp̄ = g ◦ (log ◦ exp̄)    (by Proposition 3.5.4 (e))
                    = g ◦ x = g    (since log ◦ exp̄ = x)

yields g = exp̄). But this entails that exp̄ ◦ log = x as well.

• 2nd way: Here is a more direct argument. We shall first show that exp ◦ log = 1 + x.
To wit: The FPS 1 + x is invertible (by Proposition 3.3.9). Thus, applying the quotient rule (Theorem 3.6.2 (e)) to f = exp ◦ log and g = 1 + x, we obtain

    ((exp ◦ log) / (1 + x))′ = ((exp ◦ log)′ · (1 + x) − (exp ◦ log) · (1 + x)′) / (1 + x)^2.

In view of

    (exp ◦ log)′ = (exp ◦ log) · log′    (by Proposition 3.7.3 (a), applied to g = log)
                 = (exp ◦ log) · (1 + x)^{−1}    (by (69))

and (1 + x)′ = 1, we can rewrite this as

    ((exp ◦ log) / (1 + x))′ = ((exp ◦ log) · (1 + x)^{−1} · (1 + x) − (exp ◦ log) · 1) / (1 + x)^2
                             = ((exp ◦ log) − (exp ◦ log)) / (1 + x)^2 = 0 = 0′.

Thus, Theorem 3.6.2 (h) (applied to f = (exp ◦ log) / (1 + x) and g = 0) yields that (exp ◦ log) / (1 + x) − 0 is constant. In other words, (exp ◦ log) / (1 + x) is constant. In other words, (exp ◦ log) / (1 + x) = a for some a ∈ K. Consider this a. From (exp ◦ log) / (1 + x) = a, we obtain exp ◦ log = a (1 + x). Thus,

    [x^0] (exp ◦ log) = [x^0] (a (1 + x)) = a.

However, it is easy to see that [x^0] (exp ◦ log) = 1 (see footnote 25). Comparing these two equalities, we find a = 1. Thus, exp ◦ log = a (1 + x) = 1 + x.
Now, exp̄ = exp − 1 = exp + (−1). Hence,

    exp̄ ◦ log = (exp + (−1)) ◦ log
              = exp ◦ log + (−1) ◦ log    (by Proposition 3.5.4 (a), applied to f_1 = exp and f_2 = −1 and g = log)
              = (1 + x) + (−1)    (since exp ◦ log = 1 + x, and since (−1) ◦ log = −1 by Proposition 3.5.4 (f), applied to a = −1)
              = x.

Either way, we have shown that exp̄ ◦ log = x. Thus, the proof of Theorem 3.7.5 is complete.

25 Proof. Recall that [x^0] log = 0. Hence, Lemma 3.7.4 (applied to f = exp and g = log) yields

    [x^0] (exp ◦ log) = [x^0] exp = 1/0!    (since exp = ∑_{n∈N} (1/n!) x^n)
                      = 1.
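Theorem 3.7.5 can also be double-checked numerically modulo x^N: composing the truncated series for exp − 1 and for log(1 + x) in either order returns x on the nose. The Python sketch below is our own illustration (exact fractions, so K = Q; the helpers are not from the text).

```python
from fractions import Fraction as F
from math import factorial

N = 8

def mul(a, b):
    c = [F(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def compose(f, g):
    # f[g] = sum_n f_n g^n; valid since [x^0] g = 0
    assert g[0] == 0
    result, gpow = [F(0)] * N, [F(1)] + [F(0)] * (N - 1)
    for fn in f:
        for k in range(N):
            result[k] += fn * gpow[k]
        gpow = mul(gpow, g)
    return result

expbar = [F(0)] + [F(1, factorial(n)) for n in range(1, N)]     # exp − 1
log_s  = [F(0)] + [F((-1) ** (n - 1), n) for n in range(1, N)]  # log(1 + x)
x      = [F(0), F(1)] + [F(0)] * (N - 2)

print(compose(expbar, log_s) == x)  # True
print(compose(log_s, expbar) == x)  # True
```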
3.7.3. The exponential and the logarithm of an FPS

In Definition 3.7.2, we have found algebraic versions of the exponential and logarithm functions as FPSs. Next, we shall define analogues of these functions as operators acting on FPSs (i.e., analogues not of the functions exp and log themselves, but rather of composition with these functions):

Definition 3.7.6. (a) We let K[[x]]_0 denote the set of all FPSs f ∈ K[[x]] with [x^0] f = 0.
(b) We let K[[x]]_1 denote the set of all FPSs f ∈ K[[x]] with [x^0] f = 1.
(c) We define two maps

    Exp : K[[x]]_0 → K[[x]]_1,    g ↦ exp ◦ g

and

    Log : K[[x]]_1 → K[[x]]_0,    f ↦ log ◦ (f − 1).

(These two maps are well-defined according to parts (c) and (d) of Lemma 3.7.7 below.)

The maps Exp and Log are algebraic analogues of the maps in complex analysis that take any holomorphic function f to its exponential and logarithm, respectively (at least within certain regions in which these things are well-defined). As one would hope, and as we will soon see, they are mutually inverse. Let us first check that their definition is justified:
Lemma 3.7.7. (a) For any f, g ∈ K[[x]]_0, we have f ◦ g ∈ K[[x]]_0.
(b) For any f ∈ K[[x]]_1 and g ∈ K[[x]]_0, we have f ◦ g ∈ K[[x]]_1.
(c) For any g ∈ K[[x]]_0, we have exp ◦ g ∈ K[[x]]_1.
(d) For any f ∈ K[[x]]_1, we have f − 1 ∈ K[[x]]_0 and log ◦ (f − 1) ∈ K[[x]]_0.

Proof of Lemma 3.7.7. (a) Let f, g ∈ K[[x]]_0. In view of the definition of K[[x]]_0, this entails that [x^0] f = 0 and [x^0] g = 0. Hence, Lemma 3.7.4 yields [x^0] (f ◦ g) = [x^0] f = 0. In other words, f ◦ g ∈ K[[x]]_0 (by the definition of K[[x]]_0). This proves Lemma 3.7.7 (a).
(b) This is analogous to the proof of Lemma 3.7.7 (a).
(c) Let g ∈ K[[x]]_0. From exp = ∑_{n∈N} (1/n!) x^n, we obtain [x^0] exp = 1/0! = 1, so that exp ∈ K[[x]]_1. Hence, Lemma 3.7.7 (b) (applied to f = exp) yields exp ◦ g ∈ K[[x]]_1. This proves Lemma 3.7.7 (c).
(d) Let f ∈ K[[x]]_1. Thus, [x^0] f = 1. Now, (21) yields [x^0] (f − 1) = [x^0] f − [x^0] 1 = 1 − 1 = 0, so that f − 1 ∈ K[[x]]_0. Furthermore, [x^0] log = 0 (since log = ∑_{n≥1} ((−1)^{n−1}/n) x^n) and thus log ∈ K[[x]]_0. Hence, Lemma 3.7.7 (a) (applied to log and f − 1 instead of f and g) yields log ◦ (f − 1) ∈ K[[x]]_0. Thus, Lemma 3.7.7 (d) is proven.
Lemma 3.7.8. The maps Exp and Log are mutually inverse bijections between K[[x]]_0 and K[[x]]_1.

Proof of Lemma 3.7.8. First, we make a simple auxiliary observation: Each g ∈ K[[x]]_0 satisfies

    exp ◦ g = exp̄ ◦ g + 1    (72)

(as before, the “◦” operation behaves like multiplication in the sense of PEMDAS conventions; thus, the expression “exp̄ ◦ g + 1” means (exp̄ ◦ g) + 1).
[Proof of (72): Let g ∈ K[[x]]_0. Recall that exp̄ = exp − 1, so that exp = exp̄ + 1. Hence,

    exp ◦ g = (exp̄ + 1) ◦ g = exp̄ ◦ g + 1 ◦ g

(by Proposition 3.5.4 (a), applied to f_1 = exp̄ and f_2 = 1). However, Proposition 3.5.4 (f) (applied to a = 1) yields 1 ◦ g = 1. Hence, exp ◦ g = exp̄ ◦ g + 1 ◦ g = exp̄ ◦ g + 1. This proves (72).]

Now, let us show that Exp ◦ Log = id. Indeed, we fix some f ∈ K[[x]]_1. Then, f − 1 ∈ K[[x]]_0 (by Lemma 3.7.7 (d)). Hence, Proposition 3.5.4 (e) (applied to exp̄, log and f − 1 instead of f, g and h) yields exp̄ ◦ (log ◦ (f − 1)) = (exp̄ ◦ log) ◦ (f − 1). Thus,

    exp̄ ◦ (log ◦ (f − 1)) = (exp̄ ◦ log) ◦ (f − 1) = x ◦ (f − 1)    (since exp̄ ◦ log = x by Theorem 3.7.5)
                          = f − 1    (73)

(by Proposition 3.5.4 (g), applied to g = f − 1). However,

    (Exp ◦ Log) (f) = Exp (Log f)
                    = exp ◦ (Log f)    (by the definition of Exp)
                    = exp̄ ◦ (Log f) + 1    (by (72), applied to g = Log f)
                    = exp̄ ◦ (log ◦ (f − 1)) + 1    (by the definition of Log)
                    = (f − 1) + 1    (by (73))
                    = f = id (f).

Forget that we fixed f. We thus have shown that (Exp ◦ Log) (f) = id (f) for each f ∈ K[[x]]_1. In other words, Exp ◦ Log = id.
Using a similar argument, we can show that Log ◦ Exp = id. Indeed, let us fix some g ∈ K[[x]]_0. Hence, Proposition 3.5.4 (e) (applied to log, exp̄ and g instead of f, g and h) yields log ◦ (exp̄ ◦ g) = (log ◦ exp̄) ◦ g. Thus,

    log ◦ (exp̄ ◦ g) = (log ◦ exp̄) ◦ g = x ◦ g    (since log ◦ exp̄ = x by Theorem 3.7.5)
                    = g    (74)

(by Proposition 3.5.4 (g)). But the definition of Exp yields Exp g = exp ◦ g = exp̄ ◦ g + 1 (by (72)). Hence, Exp g − 1 = exp̄ ◦ g. Now,

    (Log ◦ Exp) (g) = Log (Exp g)
                    = log ◦ (Exp g − 1)    (by the definition of Log)
                    = log ◦ (exp̄ ◦ g)    (since Exp g − 1 = exp̄ ◦ g)
                    = g    (by (74))
                    = id (g).

Forget that we fixed g. We thus have shown that (Log ◦ Exp) (g) = id (g) for each g ∈ K[[x]]_0. In other words, Log ◦ Exp = id. Combining this with Exp ◦ Log = id, we see that the maps Exp and Log are mutually inverse bijections between K[[x]]_0 and K[[x]]_1. This proves Lemma 3.7.8.
3.7.4. Addition to multiplication
We will now prove another lemma, which says that the Exp and Log maps
deserve their names:
Lemma 3.7.9. (a) For any f, g ∈ K[[x]]_0, we have

    Exp (f + g) = Exp f · Exp g.

(b) For any f, g ∈ K[[x]]_1, we have

    Log (f g) = Log f + Log g.
Proof of Lemma 3.7.9 (sketched). (a) Like many of our arguments involving FPSs, this will be a short computation followed by lengthy technical arguments justifying the interchanges of summation signs. (In this aspect, our algebraic replica of the analysis of infinite sums doesn’t differ that much from the original.) We begin with the computation; the justifying arguments will be sketched afterwards.
Let f, g ∈ K[[x]]_0. Thus, [x^0] f = 0 and [x^0] g = 0. Hence, f + g ∈ K[[x]]_0 (since (20) yields [x^0] (f + g) = [x^0] f + [x^0] g = 0).
By the definition of Exp, we have

    Exp f = exp ◦ f = exp [f] = ∑_{n∈N} (1/n!) f^n

(by Definition 3.5.1, since exp = ∑_{n∈N} (1/n!) x^n). Similarly,

    Exp g = ∑_{n∈N} (1/n!) g^n    and    Exp (f + g) = ∑_{n∈N} (1/n!) (f + g)^n.

Now, the latter equality becomes

    Exp (f + g) = ∑_{n∈N} (1/n!) (f + g)^n
                = ∑_{n∈N} (1/n!) ∑_{k=0}^{n} (n choose k) f^k g^{n−k}    (by the binomial theorem)
                = ∑_{n∈N} ∑_{k=0}^{n} (1/n!) (n choose k) f^k g^{n−k}
                = ∑_{n∈N} ∑_{k=0}^{n} (1/(k! (n − k)!)) f^k g^{n−k}    (by (2))
                = ∑_{k∈N} ∑_{n≥k} (1/(k! (n − k)!)) f^k g^{n−k}
                = ∑_{k∈N} ∑_{ℓ∈N} (1/(k! ℓ!)) f^k g^ℓ

(here, we have substituted ℓ for n − k in the second sum; the step from “∑_{n∈N} ∑_{k=0}^{n}” to “∑_{k∈N} ∑_{n≥k}” rewrites both double sums as a single sum over all pairs (n, k) ∈ N × N satisfying k ≤ n). Comparing this with

    Exp f · Exp g = (∑_{k∈N} (1/k!) f^k) · (∑_{ℓ∈N} (1/ℓ!) g^ℓ)
                        (since Exp f = ∑_{n∈N} (1/n!) f^n = ∑_{k∈N} (1/k!) f^k
                         and Exp g = ∑_{n∈N} (1/n!) g^n = ∑_{ℓ∈N} (1/ℓ!) g^ℓ)
                  = ∑_{k∈N} ∑_{ℓ∈N} (1/(k! ℓ!)) f^k g^ℓ,

we obtain Exp (f + g) = Exp f · Exp g.

This is sufficient to prove Lemma 3.7.9 (a) if we can justify the above manipulations
of infinite sums. Actually, there is just one manipulation that we need to justify, and
n
that is our replacement of “ ∑ ∑ ” by “ ∑ ∑ ”. This is an application of the
n ∈N k =0 k ∈N n≥k
“discrete Fubini rule” (specifically, of a version thereof in which the summation is over
all pairs (n, k ) ∈ N × N satisfying
 k ≤ n). In  order to justify this manipulation, we
1
need to show that the family f k gn−k is summable. In
k! (n − k )! (n,k)∈N×N satisfying k≤n
other words, we need to show the following statement:

Statement 1: For each m ∈ N,  all but finitely many


 pairs (n, k ) ∈ N × N
1
satisfying k ≤ n satisfy [ x m ] f k gn−k = 0.
k! (n − k )!

We shall achieve this by proving the following statement:


Math 701 Spring 2021, version April 6, 2024 page 100

Statement 2: For any three nonnegative integers m, k, ℓ with m < k + ℓ, we


have [ x m ] f k gℓ = 0.
[Proof of Statement 2: Let
m k ℓ
 m, k, ℓ be three nonnegative integers with m < k + ℓ. We
must show that [ x ] f g = 0.
We have x0 f = 0. Hence, Lemma 3.3.16 (applied to a = f ) shows that there exists
an h ∈ K [[ x ]] such that f = xh. Consider this h and denote it by u. Thus, u ∈ K [[ x ]]
and f = xu. 
We have x0 g = 0. Hence, Lemma 3.3.16 (applied to a = g) shows that there exists
an h ∈ K [[ x ]] such that g = xh. Consider this h and denote it by v. Thus, v ∈ K [[ x ]]
and g = xv.
Now, from f = xu and g = xv, we obtain f k gℓ = ( xu)k ( xv)ℓ = x k uk x ℓ vℓ = x k+ℓ uk vℓ .
However, Lemma 3.3.17 (applied to k + ℓ and uk vℓ instead of k and a) shows that the
first k + ℓ coefficients of the FPS x k+ℓ uk vℓ are 0. In other words,  the first k + ℓ coefficients
of the FPS f k gℓ are 0 (since f k gℓ = x k+ℓ uk vℓ ). But [ x m ] f k gℓ is one of these first k + ℓ
coefficients (since m < k + ℓ). Thus, we conclude that [ x m ] f k gℓ = 0. This proves


Statement 2.]
Note that Statement 2 entails that the family f k gℓ (k,ℓ)∈N×N is summable (because


when m ∈ N is given, all but finitely many pairs (k, ℓ) ∈ N × N satisfy m < k + ℓ).
However, we need to prove Statement 1, so let us do this:
[Proof of Statement 1: Let m ∈ N. If (n, k ) ∈ N × N is a pair satisfying k ≤ n and
m < n, then
 
1 k n−k 1  
m
[x ] f g = [ x m ] f k gn−k (by (25))
k! (n − k )! k! (n − k )! | {z }
=0
(by Statement 2
(applied to ℓ=n−k),
since m<n=k +(n−k ))

= 0.
Thus, all but finitely many  pairs (n, k ) ∈ N × N satisfying k ≤ n satisfy
1
[ xm ] f k gn−k = 0 (because all but finitely many such pairs satisfy m < n).
k! (n − k )!
This proves Statement 1.]
 As explained above,  Statement 1 shows that the family
1
f k gn−k is summable, and thus our interchange of
k! (n − k )! (n,k)∈N×N satisfying k≤n
summation signs made above is justified. This completes our proof of Lemma 3.7.9
(a).

(b) This easily follows from part (a), since we know that Log is inverse to Exp. Here are the details:
Let f, g ∈ K[[x]]_1. Set u = Log f and v = Log g; then, u, v ∈ K[[x]]_0 (since Log is a map from K[[x]]_1 to K[[x]]_0). Hence, Lemma 3.7.9 (a) (applied to u and v instead of f and g) yields Exp (u + v) = Exp u · Exp v.
However, Lemma 3.7.8 says that the maps Exp and Log are mutually inverse bijections between K[[x]]_0 and K[[x]]_1. Hence, Exp ◦ Log = id and Log ◦ Exp = id.
Now, from u = Log f, we obtain Exp u = Exp (Log f) = (Exp ◦ Log) (f) = id (f) = f. Similarly, Exp v = g. Multiplying these two equalities, we find Exp u · Exp v = f g. Now, we have

    Log (Exp (u + v)) = (Log ◦ Exp) (u + v) = id (u + v) = u + v = Log f + Log g

(since u = Log f and v = Log g). In view of Exp (u + v) = Exp u · Exp v = f g, this rewrites as Log (f g) = Log f + Log g. This proves Lemma 3.7.9 (b).
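Lemma 3.7.9 again admits a quick numerical check: with Exp g = exp ∘ g and Log f = log ∘ (f − 1) implemented on truncated series, Exp turns sums into products and Log turns products into sums. The following Python sketch is our own illustration (helper names and sample series ours; K = Q via exact fractions).

```python
from fractions import Fraction as F
from math import factorial

N = 8

def mul(a, b):
    c = [F(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def compose(f, g):
    assert g[0] == 0  # composition needs [x^0] g = 0
    result, gpow = [F(0)] * N, [F(1)] + [F(0)] * (N - 1)
    for fn in f:
        for k in range(N):
            result[k] += fn * gpow[k]
        gpow = mul(gpow, g)
    return result

exp_s = [F(1, factorial(n)) for n in range(N)]
log_s = [F(0)] + [F((-1) ** (n - 1), n) for n in range(1, N)]

def Exp(g):                     # Exp g = exp ∘ g for g ∈ K[[x]]_0
    return compose(exp_s, g)

def Log(f):                     # Log f = log ∘ (f − 1) for f ∈ K[[x]]_1
    return compose(log_s, [f[0] - 1] + f[1:])

f = [F(0), F(1), F(2), F(0), F(1), F(0), F(3), F(0)]  # constant term 0
g = [F(0), F(3), F(0), F(1), F(0), F(2), F(0), F(1)]  # constant term 0

print(Exp([a + b for a, b in zip(f, g)]) == mul(Exp(f), Exp(g)))  # True

F1, G1 = Exp(f), Exp(g)  # two elements of K[[x]]_1
print(Log(mul(F1, G1)) == [a + b for a, b in zip(Log(F1), Log(G1))])  # True
```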
We can neatly pack the last few lemmas into a single theorem through the
use of group isomorphisms. To this purpose, we need to observe that K [[ x ]]0 is
a group under addition and K [[ x ]]1 is a group under multiplication:

Proposition 3.7.10. (a) The subset K[[x]]_0 of K[[x]] is closed under addition and subtraction and contains 0, and thus forms a group (K[[x]]_0, +, 0).
(b) The subset K[[x]]_1 of K[[x]] is closed under multiplication and division and contains 1, and thus forms a group (K[[x]]_1, ·, 1).
Proof of Proposition 3.7.10. (a) It is clear that the set K[[x]]_0 contains the FPS 0 (since [x^0] 0 = 0). Thus, it remains to show that K[[x]]_0 is closed under addition and subtraction. But this is easy: If f, g ∈ K[[x]]_0, then [x^0] f = 0 and [x^0] g = 0, and therefore f + g ∈ K[[x]]_0 (since (20) yields [x^0] (f + g) = [x^0] f + [x^0] g = 0) and f − g ∈ K[[x]]_0 (by a similar argument using (21)). Thus, K[[x]]_0 is closed under addition and subtraction. This proves Proposition 3.7.10 (a).
(b) Any a ∈ K[[x]]_1 is invertible in K[[x]] (indeed, a ∈ K[[x]]_1 shows that [x^0] a = 1; thus, [x^0] a is invertible in K; therefore, Proposition 3.3.7 entails that a is invertible in K[[x]]). Hence, f/g is well-defined for any f, g ∈ K[[x]]_1.
Next, we claim that K[[x]]_1 is closed under multiplication. Indeed, if f, g ∈ K[[x]]_1, then [x^0] f = 1 and [x^0] g = 1, and therefore f g ∈ K[[x]]_1 (since (24) yields [x^0] (f g) = [x^0] f · [x^0] g = 1). This shows that K[[x]]_1 is closed under multiplication.
It remains to prove that K[[x]]_1 is closed under division. Indeed, if f, g ∈ K[[x]]_1, then [x^0] f = 1 and [x^0] g = 1, and therefore f/g ∈ K[[x]]_1 (because we have f = (f/g) · g and thus

    [x^0] f = [x^0] ((f/g) · g) = [x^0] (f/g) · [x^0] g    (by (24))
            = [x^0] (f/g)    (since [x^0] g = 1),

and thus [x^0] (f/g) = [x^0] f = 1, so that f/g ∈ K[[x]]_1). This shows that K[[x]]_1 is closed under division. Thus, Proposition 3.7.10 (b) is proven.
The two groups in Proposition 3.7.10 can now be connected through Exp and
Log:

Theorem 3.7.11. The maps

Exp : (K [[ x ]]0 , +, 0) → (K [[ x ]]1 , ·, 1)

and
Log : (K [[ x ]]1 , ·, 1) → (K [[ x ]]0 , +, 0)
are mutually inverse group isomorphisms.

Proof of Theorem 3.7.11 (sketched). Lemma 3.7.9 yields that these two maps are
group homomorphisms27 . Lemma 3.7.8 shows that they are mutually inverse.
Combining these results, we conclude that these two maps are mutually inverse
group isomorphisms. This proves Theorem 3.7.11.
Theorem 3.7.11 helps us turn addition into multiplication and vice versa when it comes to FPSs, at least if the constant terms are the right ones. This will come in useful rather soon.

27 Here, we are using the following fact: If (G, ∗, e_G) and (H, ∗, e_H) are any two groups, and if Φ : G → H is a map such that every f, g ∈ G satisfy Φ (f ∗ g) = Φ (f) ∗ Φ (g), then Φ is a group homomorphism.

3.7.5. The logarithmic derivative

For future use, we shall define and briefly study one more concept related to logarithms:

Definition 3.7.12. In this definition, we do not use Convention 3.7.1; thus, K can be an arbitrary commutative ring. However, we set K[[x]]_1 = {f ∈ K[[x]] | [x^0] f = 1}.
For any FPS f ∈ K[[x]]_1, we define the logarithmic derivative loder f ∈ K[[x]] to be the FPS f′/f. (This is well-defined, since f is easily seen to be invertible; see footnote 28.)
The reason why this FPS loder f is called the logarithmic derivative of f is made clear by the following simple fact:

Proposition 3.7.13. Let K be a commutative Q-algebra. Let f ∈ K[[x]]_1 be an FPS. Then, loder f = (Log f)′.

Proof of Proposition 3.7.13. The definition of Log yields Log f = log ◦ (f − 1). From f ∈ K[[x]]_1, we obtain [x^0] f = 1 = [x^0] 1. Now, [x^0] (f − 1) = [x^0] f − [x^0] 1 = 0 (since [x^0] f = [x^0] 1). Hence, Proposition 3.7.3 (b) (applied to g = f − 1) yields

    (log ◦ (f − 1))′ = (1 + (f − 1))^{−1} · (f − 1)′ = f^{−1} · (f − 1)′ = (f − 1)′ / f.

In view of Log f = log ◦ (f − 1), we can rewrite this as

    (Log f)′ = (f − 1)′ / f.

Since (f − 1)′ = f′ (which is easy to see; see footnote 29), we can rewrite this further as (Log f)′ = f′/f = loder f (since loder f is defined as f′/f). This proves Proposition 3.7.13.
Note that Proposition 3.7.13 only makes sense when K is a Q-algebra (oth-
erwise, Log f would make no sense), which is why we are not using it as a
definition of loder f . The logarithmic derivative is defined even when the loga-
rithm is not!
If you have seen the logarithmic derivative in analysis, you will likely expect
the following property:

28 Indeed: Let f ∈ K[[x]]_1. Then, [x^0] f = 1. Thus, [x^0] f is invertible. Hence, Proposition 3.3.7 (applied to a = f) yields that f is invertible.
29 Proof. Theorem 3.6.2 (a) (applied to g = −1) yields (f + (−1))′ = f′ + (−1)′ = f′ (since (−1)′ = 0). In other words, (f − 1)′ = f′.
Proposition 3.7.14. Let f, g ∈ K[[x]]_1 be two FPSs. Then, loder (f g) = loder f + loder g.
(Here, we do not use Convention 3.7.1; thus, K can be an arbitrary commutative ring.)

Proof of Proposition 3.7.14. The definition of the logarithmic derivative yields loder f = f′/f and loder g = g′/g, but it also yields

    loder (f g) = (f g)′ / (f g) = (f′ g + f g′) / (f g)    (since Theorem 3.6.2 (d) says that (f g)′ = f′ g + f g′)
                = f′/f + g′/g = loder f + loder g.

This proves Proposition 3.7.14.
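Both Proposition 3.7.13 and Proposition 3.7.14 can be confirmed on truncated series. In the sketch below (Python with exact fractions; the helpers `mul`, `deriv`, `inv`, `compose`, `loder` and the sample series are our own, not from the text), comparisons stop one coefficient short of the truncation order, since the top coefficient of a derivative is lost under truncation.

```python
from fractions import Fraction as F

N = 9

def mul(a, b):
    c = [F(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def deriv(f):
    # the x^{N-1} coefficient of f' is not recoverable from a mod-x^N truncation
    return [F(n) * f[n] for n in range(1, N)] + [F(0)]

def inv(f):
    # multiplicative inverse; needs an invertible constant term
    b = [F(0)] * N
    b[0] = F(1) / f[0]
    for k in range(1, N):
        b[k] = -b[0] * sum(f[j] * b[k - j] for j in range(1, k + 1))
    return b

def compose(f, g):
    assert g[0] == 0
    result, gpow = [F(0)] * N, [F(1)] + [F(0)] * (N - 1)
    for fn in f:
        for k in range(N):
            result[k] += fn * gpow[k]
        gpow = mul(gpow, g)
    return result

def loder(f):
    return mul(deriv(f), inv(f))   # f'/f

f = [F(c) for c in [1, 2, 0, 3, 1, 0, 4, 1, 2]]  # constant term 1
g = [F(c) for c in [1, 1, 5, 0, 2, 1, 0, 3, 1]]  # constant term 1

# Proposition 3.7.14: loder(fg) = loder f + loder g
lhs = loder(mul(f, g))
rhs = [a + b for a, b in zip(loder(f), loder(g))]
print(lhs[:N - 1] == rhs[:N - 1])  # True

# Proposition 3.7.13 (over Q): loder f = (Log f)' with Log f = log ∘ (f − 1)
log_s = [F(0)] + [F((-1) ** (n - 1), n) for n in range(1, N)]
Logf = compose(log_s, [f[0] - 1] + f[1:])
print(deriv(Logf)[:N - 1] == loder(f)[:N - 1])  # True
```

Note that the product-rule check runs on exact fractions but would work verbatim over any coefficient ring in which the constant terms are invertible, mirroring the remark that loder needs no Q-algebra assumption.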
Corollary 3.7.15. Let f_1, f_2, . . . , f_k be any k FPSs in K[[x]]_1. Then,

    loder (f_1 f_2 · · · f_k) = loder (f_1) + loder (f_2) + · · · + loder (f_k).

(Here, we do not use Convention 3.7.1; thus, K can be an arbitrary commutative ring.)

Proof of Corollary 3.7.15. Induct on k. The base case (k = 0) requires showing that loder 1 = 0, which is easy to see from the definition of the logarithmic derivative (since 1′ = 0). The induction step follows easily from Proposition 3.7.14.
Corollary 3.7.16. Let f be any FPS in K [[x]]_1 . Then, loder ( f^{−1} ) = − loder f .
(Here, we do not use Convention 3.7.1; thus, K can be an arbitrary
commutative ring.)

Proof of Corollary 3.7.16. First of all, f is invertible (as we showed in Definition
3.7.12). This shows that f^{−1} is well-defined.
Proposition 3.7.14 (applied to g = f^{−1}) yields loder ( f f^{−1} ) = loder f +
loder ( f^{−1} ). However, we also have loder ( f f^{−1} ) = loder 1 = 0 (since
f f^{−1} = 1 and 1′ = 0). Comparing these two equalities, we obtain
loder f + loder ( f^{−1} ) = 0. In other words, loder ( f^{−1} ) = − loder f .
This proves Corollary 3.7.16.
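For readers who like to experiment, the additivity of loder can be checked numerically on truncated power series. The following Python sketch (an illustration added here, not part of the theory; the encoding of an FPS as the list of its first N coefficients, and the two sample series, are our own choices) verifies Proposition 3.7.14 on two concrete FPSs with constant term 1:

```python
from fractions import Fraction

N = 8  # truncation order: we work with series modulo x^N

def mul(f, g):
    """Truncated product of two coefficient lists (index = exponent)."""
    h = [Fraction(0)] * N
    for i in range(N):
        for j in range(N - i):
            h[i + j] += f[i] * g[j]
    return h

def deriv(f):
    """Formal derivative; the top coefficient is lost, so pad with 0."""
    return [Fraction(k) * f[k] for k in range(1, N)] + [Fraction(0)]

def inv(f):
    """Multiplicative inverse of a series whose constant term is invertible."""
    g = [Fraction(0)] * N
    g[0] = 1 / f[0]
    for n in range(1, N):
        g[n] = -g[0] * sum(f[i] * g[n - i] for i in range(1, n + 1))
    return g

def loder(f):
    """Logarithmic derivative f'/f."""
    return mul(deriv(f), inv(f))

# two arbitrarily chosen FPSs with constant term 1
f = [Fraction(c) for c in [1, 1, 2, 0, 5, 0, 0, 0]]
g = [Fraction(c) for c in [1, -3, 0, 1, 0, 0, 0, 0]]

lhs = loder(mul(f, g))
rhs = [a + b for a, b in zip(loder(f), loder(g))]
```

Only the coefficients up to x^{N−2} are compared below, since truncating at x^N loses the top coefficient of the derivative.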




3.8. Non-integer powers


3.8.1. Definition
Now, let us again recall Example 2 from Section 3.1. In order to fully justify
that example, we still need to explain what √(1 − 4x) is.

More generally, let us try to define non-integer powers of FPSs (since square
roots are just 1/2-th powers). Thus, we are trying to solve the following prob-
lem:

Problem: Devise a reasonable definition of the c-th power f^c for any
FPS f ∈ K [[x]] and any c ∈ K.

Here, “reasonable” means that it should have some of the properties we


would expect:

• It should not conflict with the existing notion of f^c for c ∈ N. That is,
if c ∈ N, then our new definition of f^c should yield the same result as
the existing meaning that f^c has in this case (namely, the product f f · · · f
of c factors). The same should hold for c ∈ Z when f is invertible.

• Rules of exponents should hold: i.e., we should have

    f^{a+b} = f^a f^b ,    ( f g)^a = f^a g^a ,    ( f^a )^b = f^{ab}    (75)

for all a, b ∈ K and f , g ∈ K [[x]].

• For any positive integer n and any FPS f ∈ K [[x]], the 1/n-th power f^{1/n}
should be an n-th root of f (that is, an FPS whose n-th power is f ). (This
actually follows from the previous two properties, since we can apply the
rule ( f^a )^b = f^{ab} to a = 1/n and b = n.)

Clearly, we cannot solve the above problem in full generality:

• The power 0^{−1} cannot be reasonably defined (unless K is trivial). Indeed,
0^{−1} · 0^1 would have to equal 0^{−1+1} = 0^0 = 1, but this would contradict
0^{−1} · 0^1 = 0^{−1} · 0 = 0.

• The power x^{1/2} cannot be reasonably defined either (unless K is trivial).
Indeed, there is no FPS whose square is x. This will be proved in Exercise
A.2.8.1 (a).

• Even the power (−1)^{1/2} cannot always be defined: There is no guarantee
that K contains a square root of −1 (and if K does not, then it is easy to
see that K [[x]] does not either).

However, all we want is to make sense of √(1 − 4x), so let us restrict ourselves
to FPSs whose constant term is 1. Using the notation from Definition 3.7.6 (b),
we are thus moving on to the following problem:

More realistic problem: Devise a reasonable definition of the c-th power
f^c for any FPS f ∈ K [[x]]_1 and any c ∈ K.
Math 701 Spring 2021, version April 6, 2024 page 106

Besides imposing the above wishlist of properties, we want this c-th power f^c
itself to belong to K [[x]]_1 , since otherwise the iterated power ( f^a )^b in our rules
of exponents might be undefined.
It turns out that this is still too much to ask. Indeed, if K = Z/2, then the
FPS 1 + x ∈ K [[ x ]]1 has no square root (you get to prove this in Exercise A.2.8.1
(c)), so its 1/2-th power (1 + x )1/2 cannot be reasonably defined.
However, if we assume (as in Convention 3.7.1) that K is a commutative Q-
algebra, then we get lucky: Our “more realistic problem” can be solved in (at
least) two ways:
1st solution: We define

    (1 + x )^c := ∑_{k∈N} \binom{c}{k} x^k    for each c ∈ K,

in order to make Newton's binomial formula (Theorem 3.3.10) hold for arbitrary
exponents30. Subsequently, we define

    f^c := ((1 + x )^c) [ f − 1]    for any f ∈ K [[x]]_1 and c ∈ K    (76)

(in order to have (1 + g)^c = ∑_{k∈N} \binom{c}{k} g^k hold not only for g = x,
but also for all g ∈ K [[x]]_0 ).
It is clear that the FPS f^c is well-defined in this way. However, proving that
this definition satisfies all the items on our wishlist (particularly the rules of
exponents (75)) is highly nontrivial. Some of this is done in [Loehr11, §7.12],
but it is still a lot of work.
Thus, we shall discard this definition of f c , and instead take a different way:
2nd solution: Recall the mutually inverse group isomorphisms

Exp : (K [[ x ]]0 , +, 0) → (K [[ x ]]1 , ·, 1) and


Log : (K [[ x ]]1 , ·, 1) → (K [[ x ]]0 , +, 0)

from Theorem 3.7.11. Thus, for any f ∈ K [[ x ]]1 and any c ∈ Z, the equation

Log ( f c ) = c Log f

holds (since Log is a group homomorphism). This suggests that we define f c


for all c ∈ K by the same equation. In other words, we define f c for all c ∈ K
by setting f c = Exp (c Log f ) (since the map Exp is inverse to Log). And this is
what we shall do now:

30 Note that \binom{c}{k} = ( c (c − 1) (c − 2) · · · (c − k + 1) ) / k! is well-defined since K is a
commutative Q-algebra.

Definition 3.8.1. Assume that K is a commutative Q-algebra. Let f ∈ K [[x]]_1
and c ∈ K. Then, we define an FPS

    f^c := Exp (c Log f ) ∈ K [[x]]_1 .

This definition of f^c does not conflict with our original definition of f^c when
c ∈ Z, because (as we said) the original definition of f^c already satisfies
Log ( f^c ) = c Log f and therefore f^c = Exp (c Log f ).
Moreover, Definition 3.8.1 makes the rules of exponents hold:
Theorem 3.8.2. Assume that K is a commutative Q-algebra. For any a, b ∈ K
and f , g ∈ K [[x]]_1 , we have

    f^{a+b} = f^a f^b ,    ( f g)^a = f^a g^a ,    ( f^a )^b = f^{ab} .
Proof. Easy exercise (Exercise A.2.8.2).


Now, let us return to Example 2 from Section 3.1. In that example, we had to
solve the quadratic equation

    C ( x ) = 1 + x (C ( x ))^2    for an FPS C ( x ) ∈ Q [[x]] .

Let us write C for C ( x ); thus, this quadratic equation becomes

    C = 1 + xC^2 .

By completing the square, we can rewrite this equation in the equivalent form

    (1 − 2xC )^2 = 1 − 4x.

Taking both sides of this equation to the 1/2-th power, we obtain

    ((1 − 2xC )^2)^{1/2} = (1 − 4x )^{1/2}

(since both sides are FPSs with constant term 1). However, the FPS 1 − 2xC
has constant term 1; thus, the rules of exponents yield ((1 − 2xC )^2)^{1/2} =
(1 − 2xC )^{2 · 1/2} = 1 − 2xC. Hence,

    1 − 2xC = ((1 − 2xC )^2)^{1/2} = (1 − 4x )^{1/2} .

This is a linear equation in C; solving it for C yields

    C = (1 / (2x)) ( 1 − (1 − 4x )^{1/2} ) .

This is precisely the "square-root" expression for C = C ( x ) that we have
obtained back in Section 3.1, but now we have proved it rigorously.
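As a numerical sanity check (our own illustration; it anticipates the expansion (1 + y)^c = ∑_{k∈N} \binom{c}{k} y^k that will be proved as Theorem 3.8.3 below), one can compute the coefficients of (1 − 4x)^{1/2} and confirm that (1 − (1 − 4x)^{1/2})/(2x) has the Catalan numbers \binom{2n}{n}/(n + 1) as coefficients:

```python
from fractions import Fraction

def binom_gen(c, k):
    """Generalized binomial coefficient c(c-1)...(c-k+1)/k!."""
    num = Fraction(1)
    for i in range(k):
        num *= c - i
    den = 1
    for i in range(1, k + 1):
        den *= i
    return num / den

N = 10
# coefficients of (1 - 4x)^(1/2), using the expansion with y = -4x
sqrt_coeffs = [binom_gen(Fraction(1, 2), k) * (-4) ** k for k in range(N + 1)]

# C = (1 - (1 - 4x)^(1/2)) / (2x): the x^n-coefficient of C is minus the
# x^(n+1)-coefficient of (1 - 4x)^(1/2), divided by 2
catalan = [-sqrt_coeffs[n + 1] / 2 for n in range(N)]
```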

3.8.2. The Newton binomial formula for arbitrary exponents


Is Example 2 from Section 3.1 fully justified now? No, because we still need to
prove the identity (12) that we used back there. Since we are defining powers
in the 2nd way (i.e., using Definition 3.8.1 rather than using (76)), it is not
immediately obvious. Nevertheless, it can be proved. More generally, we can
prove the following:

Theorem 3.8.3 (Generalized Newton binomial formula). Assume that K is a
commutative Q-algebra. Let c ∈ K. Then,

    (1 + x )^c = ∑_{k∈N} \binom{c}{k} x^k .

The following proof illustrates a technique that will probably appear prepos-
terous if you are seeing it for the first time, but is in fact both legitimate and
rather useful.
Proof of Theorem 3.8.3 (sketched). The definition of Log yields

    Log (1 + x ) = log ◦ ((1 + x ) − 1) = log ◦ x = log

(by Proposition 3.5.4 (g), applied to g = log; here we used (1 + x ) − 1 = x).

Now, let us obstinately compute (1 + x )^c using Definition 3.8.1 and the
definitions of Exp and Log. To wit: Let P denote the set {1, 2, 3, . . .}. By
Definition 3.8.1, we have

    (1 + x )^c
    = Exp (c Log (1 + x )) = Exp (c log)    (since Log (1 + x ) = log)
    = exp ◦ (c log)    (by the definition of Exp)
    = exp ◦ ( c ∑_{n≥1} ((−1)^{n−1} / n) x^n )    (since log = ∑_{n≥1} ((−1)^{n−1} / n) x^n)
    = exp ◦ ( ∑_{n≥1} ((−1)^{n−1} / n) c x^n )
    = ∑_{m∈N} (1/m!) ( ∑_{n≥1} ((−1)^{n−1} / n) c x^n )^m    (77)

(by Definition 3.5.1, since exp = ∑_{n∈N} (1/n!) x^n = ∑_{m∈N} (1/m!) x^m).

Now, fix m ∈ N. We shall expand ( ∑_{n≥1} ((−1)^{n−1} / n) c x^n )^m . Indeed, we can
replace the "∑_{n≥1}" sign by an "∑_{n∈P}" sign, since P = {1, 2, 3, . . .}. Thus,

    ( ∑_{n≥1} ((−1)^{n−1} / n) c x^n )^m
    = ( ∑_{n∈P} ((−1)^{n−1} / n) c x^n )^m
    = ( ∑_{n∈P} ((−1)^{n−1} / n) c x^n ) ( ∑_{n∈P} ((−1)^{n−1} / n) c x^n ) · · · ( ∑_{n∈P} ((−1)^{n−1} / n) c x^n )    (m factors)
    = ( ∑_{n_1∈P} ((−1)^{n_1−1} / n_1) c x^{n_1} ) ( ∑_{n_2∈P} ((−1)^{n_2−1} / n_2) c x^{n_2} ) · · · ( ∑_{n_m∈P} ((−1)^{n_m−1} / n_m) c x^{n_m} )
        (here, we have renamed the summation indices)
    = ∑_{(n_1,n_2,...,n_m)∈P^m} ( ((−1)^{n_1−1} / n_1) c x^{n_1} ) ( ((−1)^{n_2−1} / n_2) c x^{n_2} ) · · · ( ((−1)^{n_m−1} / n_m) c x^{n_m} )

(by a product rule for the product of m sums31). Hence,

    ( ∑_{n≥1} ((−1)^{n−1} / n) c x^n )^m
    = ∑_{(n_1,n_2,...,n_m)∈P^m} ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m x^{n_1+n_2+···+n_m}
    = ∑_{k∈N} ∑_{(n_1,n_2,...,n_m)∈P^m; n_1+n_2+···+n_m=k} ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m x^k    (78)

(here, we first collected the signs, the denominators and the powers of c and
of x in each addend, and then split the sum according to the value of
k = n_1 + n_2 + · · · + n_m ; the x^{n_1+n_2+···+n_m} became x^k since
n_1 + n_2 + · · · + n_m = k).

Now, forget that we fixed m. We thus have proved (78) for each m ∈ N.
Now, (77) becomes

    (1 + x )^c
    = ∑_{m∈N} (1/m!) ( ∑_{n≥1} ((−1)^{n−1} / n) c x^n )^m
    = ∑_{m∈N} (1/m!) ∑_{k∈N} ∑_{(n_1,n_2,...,n_m)∈P^m; n_1+n_2+···+n_m=k} ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m x^k    (by (78))
    = ∑_{k∈N} ∑_{m∈N} ∑_{(n_1,n_2,...,n_m)∈P^m; n_1+n_2+···+n_m=k} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m x^k
    = ∑_{k∈N} ( ∑_{m∈N} ∑_{(n_1,n_2,...,n_m)∈P^m; n_1+n_2+···+n_m=k} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m ) x^k .    (79)

31 This product rule says that

    ( ∑_{n_1∈A_1} a_{1,n_1} ) ( ∑_{n_2∈A_2} a_{2,n_2} ) · · · ( ∑_{n_m∈A_m} a_{m,n_m} ) = ∑_{(n_1,n_2,...,n_m)∈A_1×A_2×···×A_m} a_{1,n_1} a_{2,n_2} · · · a_{m,n_m}

for any m sets A_1 , A_2 , . . . , A_m and any elements a_{i,j} ∈ K, provided that all the sums on the
left hand side of this equality are summable. We leave it to the reader to convince himself
of this rule (intuitively, it just says that we can expand a product of sums in the usual way,
even when the sums are infinite) and to check that the sums we are applying it to are indeed
summable.

Now, let k ∈ N. Let us rewrite the "middle sum"
∑_{m∈N} ∑_{(n_1,n_2,...,n_m)∈P^m; n_1+n_2+···+n_m=k} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m
on the right hand side as a finite sum. Indeed, a composition of k shall mean a
tuple (n_1 , n_2 , . . . , n_m ) of positive integers satisfying n_1 + n_2 + · · · + n_m = k.
(For example, (1, 3, 1) is a composition of 5. We will study compositions in
more detail in Section 3.9.) Let Comp (k ) denote the set of all compositions
of k. It is easy to see that this set Comp (k ) is finite32.
Now, we can rewrite the double summation sign
"∑_{m∈N} ∑_{(n_1,n_2,...,n_m)∈P^m; n_1+n_2+···+n_m=k}" as a single summation
sign "∑_{(n_1,n_2,...,n_m)∈Comp(k)}" (since Comp (k ) is precisely the set of all
tuples (n_1 , n_2 , . . . , n_m ) ∈ P^m satisfying n_1 + n_2 + · · · + n_m = k). Hence,
we obtain

    ∑_{m∈N} ∑_{(n_1,n_2,...,n_m)∈P^m; n_1+n_2+···+n_m=k} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m
    = ∑_{(n_1,n_2,...,n_m)∈Comp(k)} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m .    (80)

Forget that we fixed k. Thus, for each k ∈ N, we have defined a finite set
Comp (k ) and shown that (80) holds.
32 Proof. Let (n_1 , n_2 , . . . , n_m ) ∈ Comp (k ). Thus, (n_1 , n_2 , . . . , n_m ) is a composition of k. In other
words, (n_1 , n_2 , . . . , n_m ) is a finite tuple of positive integers satisfying n_1 + n_2 + · · · + n_m = k.
Hence, all its m entries n_1 , n_2 , . . . , n_m are positive integers and thus are ≥ 1; therefore,
n_1 + n_2 + · · · + n_m ≥ 1 + 1 + · · · + 1 = m (a sum of m many 1's), so that
m ≤ n_1 + n_2 + · · · + n_m = k. Thus, m ∈ {0, 1, . . . , k }.
Furthermore, the sum n_1 + n_2 + · · · + n_m is ≥ to each of its m addends (since its m
addends n_1 , n_2 , . . . , n_m are positive). In other words, we have n_1 + n_2 + · · · + n_m ≥ n_i for
each i ∈ {1, 2, . . . , m}. Thus, for each i ∈ {1, 2, . . . , m}, we have n_i ≤ n_1 + n_2 + · · · + n_m = k
and therefore n_i ∈ {1, 2, . . . , k } (since n_i is a positive integer). Hence,

    (n_1 , n_2 , . . . , n_m ) ∈ {1, 2, . . . , k }^m ⊆ ⋃_{ℓ∈{0,1,...,k}} {1, 2, . . . , k }^ℓ

(since m ∈ {0, 1, . . . , k }).
Now, forget that we fixed (n_1 , n_2 , . . . , n_m ). We thus have shown that
(n_1 , n_2 , . . . , n_m ) ∈ ⋃_{ℓ∈{0,1,...,k}} {1, 2, . . . , k }^ℓ for each (n_1 , n_2 , . . . , n_m ) ∈ Comp (k ).
In other words, Comp (k ) ⊆ ⋃_{ℓ∈{0,1,...,k}} {1, 2, . . . , k }^ℓ . Since the set
⋃_{ℓ∈{0,1,...,k}} {1, 2, . . . , k }^ℓ is clearly finite (having size ∑_{ℓ∈{0,1,...,k}} k^ℓ ), this
entails that the set Comp (k ) is finite as well, qed.
(Incidentally, we will see in Section 3.9 that this set Comp (k ) has size 2^{k−1} for k ≥ 1, and
size 1 for k = 0.)

Using (80), we can rewrite (79) as

    (1 + x )^c = ∑_{k∈N} ( ∑_{(n_1,n_2,...,n_m)∈Comp(k)} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m ) x^k .    (81)

Now, recall that our goal is to prove that this equals

    ∑_{k∈N} \binom{c}{k} x^k .

This is equivalent to proving that the equality

    ∑_{(n_1,n_2,...,n_m)∈Comp(k)} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m = \binom{c}{k}    (82)

holds for each k ∈ N (because two FPSs u = ∑_{k∈N} u_k x^k and v = ∑_{k∈N} v_k x^k (with
u_k ∈ K and v_k ∈ K) are equal if and only if the equality u_k = v_k holds for each
k ∈ N).
Thus, we have reduced our original goal (which was to prove (1 + x )^c =
∑_{k∈N} \binom{c}{k} x^k ) to the auxiliary goal of proving the equality (82) for each k ∈ N.
However, this doesn't look very useful, since (82) is too messy an equality to
have a simple proof. We are seemingly stuck.
However, it turns out that we are almost there – we just need to take a bird's
eye view. Here is the plan: We fix k ∈ N. Instead of trying to prove the
equality (82) directly, we observe that both sides of this equality are polynomials
(with rational coefficients) in c. (Indeed, the left hand side is clearly a
polynomial in c, since it is a finite sum of "rational number times a power
of c" expressions. The right hand side is a polynomial in c because \binom{c}{k} =
( c (c − 1) (c − 2) · · · (c − k + 1) ) / k! .) Thus, the polynomial identity trick (which we
learnt in Subsection 3.2.3) tells us that if we can prove this equality (82) for each
c ∈ N, then it will automatically hold for each c ∈ K (since the two polynomials
that yield its left and right hand sides will have to be equal, having infinitely
many equal values). Hence, in order to prove (82) for each c ∈ K, it suffices
to prove it for each c ∈ N. Now, how can we prove it for each c ∈ N ? We
forget that we fixed k, and we remember that the equality (82) (for all k ∈ N) is
just an equivalent restatement of the FPS equality (1 + x )^c = ∑_{k∈N} \binom{c}{k} x^k (that
is, the equality we have originally set out to prove). However, we know for sure
that this equality holds for each c ∈ N (by Theorem 3.3.10, applied to n = c).

Hence, the equality (82) also holds for each c ∈ N (and each k ∈ N). And this
is precisely what we needed to show!
Let me explain this argument in detail now, as it is somewhat vertigo-inducing.
We forget that we fixed K and c. Now, fix c ∈ N. Thus, c ∈ N ⊆ Z. Hence, in
the ring Q [[x]], we have

    (1 + x )^c = ∑_{k∈N} \binom{c}{k} x^k    (by Theorem 3.3.10, applied to n = c) .

However, (81) (applied to K = Q) shows that

    (1 + x )^c = ∑_{k∈N} ( ∑_{(n_1,n_2,...,n_m)∈Comp(k)} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m ) x^k .
Comparing these two equalities, we obtain


 
n1 +n2 +···+nm −m  
1 (− 1 ) c k
∑ ∑ m!
·
n 1 n 2 · · · n m
c x = ∑
m k
k
x .
k∈N (n ,n ,...,n )∈Comp(k)
1 2 m k ∈N

Comparing coefficients in this equality, we see that

1 (−1)n1 +n2 +···+nm −m m


 
c
∑ m!
·
n 1 n 2 · · · n m
c =
k
(83)
(n1 ,n2 ,...,n )∈Comp(k)
m

for each k ∈ N. This is an equality between two rational numbers.


Now, forget that we fixed c. We thus have shown that (83) holds for each
k ∈ N and each c ∈ N.
Let us now fix k ∈ N. We have just shown that the equality (83) holds for
each c ∈ N. In other words, the two polynomials

    f := ∑_{(n_1,n_2,...,n_m)∈Comp(k)} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) x^m ∈ Q [x]

and

    g := \binom{x}{k} = ( x (x − 1) (x − 2) · · · (x − k + 1) ) / k! ∈ Q [x]

satisfy f [c] = g [c] for each c ∈ N (because f [c] is the left hand side of (83),
while g [c] is the right hand side of (83)). Thus, each c ∈ N satisfies ( f − g) [c] =
f [c] − g [c] = g [c] − g [c] = 0 (since f [c] = g [c]). In other words, each c ∈ N
is a root of f − g.
Hence, the polynomial f − g has infinitely many roots in Q (since there are
infinitely many c ∈ N). Since f − g is a polynomial with rational coefficients,

this is impossible unless f − g = 0. We thus must have f − g = 0, so that f = g.
In other words,

    ∑_{(n_1,n_2,...,n_m)∈Comp(k)} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) x^m = \binom{x}{k}    (84)

holds in the polynomial ring Q [x].
Now, forget that we fixed k. We thus have shown that the equality (84) holds
for each k ∈ N.
Now, fix a commutative Q-algebra K and an arbitrary element c ∈ K. For
each k ∈ N, we then have

    ∑_{(n_1,n_2,...,n_m)∈Comp(k)} (1/m!) · ((−1)^{n_1+n_2+···+n_m−m} / (n_1 n_2 · · · n_m)) c^m = \binom{c}{k}    (85)

(by substituting c for x on both sides of the equality (84)). Consequently, we
can rewrite (81) as

    (1 + x )^c = ∑_{k∈N} \binom{c}{k} x^k .

This proves Theorem 3.8.3.
The method we used in the above proof is worth recapitulating in broad
strokes:

• We had to prove a fairly abstract statement (namely, the identity (1 + x )^c =
∑_{k∈N} \binom{c}{k} x^k ).

• We translated this statement into an awkward but more concrete statement
(namely, the equality (82)).

• We then argued that this concrete statement needs only to be proven in a
special case (viz., for all c ∈ N rather than for all c ∈ K), because it is an
equality between two polynomials with rational coefficients.

• To prove this concrete statement in this special case, we translated it back
into the abstract language of FPSs, and realized that in this special case it
is already known (as a consequence of Theorem 3.3.10).

Thus, by strategically switching between the abstract and the concrete, we
have managed to use the advantages of both sides.
Now that Theorem 3.8.3 is proved, Example 2 from Section 3.1 is fully justified
(since we can obtain (12) by applying Theorem 3.8.3 to K = Q and c = 1/2).
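The whole chain Exp (c Log (1 + x)) can also be recomputed numerically with truncated series; the following Python sketch (our own illustration, with the arbitrarily chosen exponent c = 1/3) reproduces the coefficients \binom{c}{k}:

```python
from fractions import Fraction

N = 8                   # truncation order: series modulo x^N
c = Fraction(1, 3)      # an arbitrarily chosen non-integer exponent

def mul(f, g):
    """Truncated product of two coefficient lists."""
    h = [Fraction(0)] * N
    for i in range(N):
        for j in range(N - i):
            h[i + j] += f[i] * g[j]
    return h

def binom_gen(c, k):
    """Generalized binomial coefficient c(c-1)...(c-k+1)/k!."""
    num = Fraction(1)
    for i in range(k):
        num *= c - i
    den = 1
    for i in range(1, k + 1):
        den *= i
    return num / den

# Log(1 + x) = log = sum_{n>=1} (-1)^(n-1)/n x^n, truncated
log1px = [Fraction(0)] + [Fraction((-1) ** (n - 1), n) for n in range(1, N)]
g = [c * a for a in log1px]            # c Log(1 + x)

# Exp(g) = sum_{m>=0} g^m / m!; this sum is finite mod x^N, since g has
# no constant term (so g^m contributes nothing below degree m)
result = [Fraction(0)] * N
power = [Fraction(0)] * N
power[0] = Fraction(1)                 # g^0 = 1
fact = 1
for m in range(N):
    if m > 0:
        power = mul(power, g)
        fact *= m
    for k in range(N):
        result[k] += power[k] / fact
```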

3.8.3. Another application

Let us show yet another application of powers with non-integer exponents and
the generalized Newton formula. We shall show the following binomial identity:

Proposition 3.8.4. Let n ∈ C and k ∈ N. Then,

    ∑_{i=0}^{k} \binom{n+i−1}{i} \binom{n}{k−2i} = \binom{n+k−1}{k} .

Proposition 3.8.4 can be proved in various ways. For example, a mostly
combinatorial proof is found in [19fco, Exercise 2.10.7 and Exercise 2.10.8]33.
We shall give a proof using generating functions instead.
Proof of Proposition 3.8.4. Define two FPSs f , g ∈ C [[x]] by

    f = ∑_{i∈N} \binom{n+i−1}{i} x^{2i}    (86)

and

    g = ∑_{j∈N} \binom{n}{j} x^j .    (87)

(We will soon see why we chose to define them this way.) Multiplying the two

33 Specifically, [19fco, Exercise 2.10.7] proves Proposition 3.8.4 in the particular case when n ∈
N; then, [19fco, Exercise 2.10.8] extends it to the case when n ∈ R. However, the latter
argument can just as well be used to extend it to arbitrary n ∈ C.

equalities (86) and (87), we find

    f g = ( ∑_{i∈N} \binom{n+i−1}{i} x^{2i} ) ( ∑_{j∈N} \binom{n}{j} x^j )
        = ∑_{i∈N} ∑_{j∈N} \binom{n+i−1}{i} x^{2i} \binom{n}{j} x^j
        = ∑_{(i,j)∈N×N} \binom{n+i−1}{i} \binom{n}{j} x^{2i+j}
        = ∑_{m∈N} ∑_{(i,j)∈N×N; 2i+j=m} \binom{n+i−1}{i} \binom{n}{j} x^m    (since x^{2i+j} = x^m when 2i + j = m)
        = ∑_{m∈N} ( ∑_{(i,j)∈N×N; 2i+j=m} \binom{n+i−1}{i} \binom{n}{j} ) x^m .

Hence, the x^k-coefficient of this FPS f g is

    [x^k] ( f g) = ∑_{(i,j)∈N×N; 2i+j=k} \binom{n+i−1}{i} \binom{n}{j} .    (88)

Now, a pair (i, j) ∈ N × N satisfying 2i + j = k is uniquely determined
by its first entry i, since its second entry j is given by j = k − 2i. Hence, we
can substitute (i, k − 2i ) for (i, j) in the sum ∑_{(i,j)∈N×N; 2i+j=k} \binom{n+i−1}{i} \binom{n}{j} , thus

rewriting this sum as ∑_{i∈N; k−2i∈N} \binom{n+i−1}{i} \binom{n}{k−2i} . Hence,

    ∑_{(i,j)∈N×N; 2i+j=k} \binom{n+i−1}{i} \binom{n}{j}
    = ∑_{i∈N; k−2i∈N} \binom{n+i−1}{i} \binom{n}{k−2i}
    = ∑_{i∈N; 2i≤k} \binom{n+i−1}{i} \binom{n}{k−2i}    (since an i ∈ N satisfies k − 2i ∈ N if and only if it satisfies 2i ≤ k)
    = ∑_{i∈N; i≤k} \binom{n+i−1}{i} \binom{n}{k−2i}

(here, we have replaced the condition "2i ≤ k" under the summation sign by
the weaker condition "i ≤ k", thus extending the range of the sum; but this did
not change the sum, since all the newly introduced addends are 0 because of
the vanishing factor \binom{n}{k−2i} ). Thus, (88) becomes

    [x^k] ( f g) = ∑_{(i,j)∈N×N; 2i+j=k} \binom{n+i−1}{i} \binom{n}{j}
                = ∑_{i∈N; i≤k} \binom{n+i−1}{i} \binom{n}{k−2i}
                = ∑_{i=0}^{k} \binom{n+i−1}{i} \binom{n}{k−2i} .    (89)

Note that the right hand side here is precisely the left hand side of the identity
we are trying to prove. This is why we defined f and g as we did. With a bit
of experience, the computation above can easily be reverse-engineered, and the
definitions of f and g are essentially forced by the goal of making (89) hold.
Anyway, it is now clear that a simple expression for f g would move us
forward. So let us try to simplify f and g. For g, the answer is easiest: We have

    g = ∑_{j∈N} \binom{n}{j} x^j = (1 + x )^n ,

because Theorem 3.8.3 (applied to c = n) yields (1 + x )^n = ∑_{j∈N} \binom{n}{j} x^j . For f ,

we need a few more steps. Proposition 3.3.12 yields

    (1 + x )^{−n} = ∑_{i∈N} (−1)^i \binom{n+i−1}{i} x^i .    (90)

Substituting − x^2 for x on both sides of this equality, we obtain

    (1 − x^2 )^{−n} = ∑_{i∈N} (−1)^i \binom{n+i−1}{i} (− x^2 )^i = ∑_{i∈N} (−1)^i \binom{n+i−1}{i} (−1)^i x^{2i}
                   = ∑_{i∈N} (−1)^i (−1)^i \binom{n+i−1}{i} x^{2i} = ∑_{i∈N} \binom{n+i−1}{i} x^{2i} = f

(since (−1)^i (−1)^i = 1), and thus f = (1 − x^2 )^{−n} . Multiplying this equality by
g = (1 + x )^n , we obtain

    f g = (1 − x^2 )^{−n} (1 + x )^n = (1 + x )^n / (1 − x^2 )^n = ( (1 − x^2 ) / (1 + x ) )^{−n}
        = (1 − x )^{−n}    (since (1 − x^2 ) / (1 + x ) = ((1 − x ) (1 + x )) / (1 + x ) = 1 − x)
        = ∑_{i∈N} (−1)^i \binom{n+i−1}{i} (− x )^i    (this follows by substituting − x for x on both sides of (90))
        = ∑_{i∈N} (−1)^i \binom{n+i−1}{i} (−1)^i x^i = ∑_{i∈N} (−1)^i (−1)^i \binom{n+i−1}{i} x^i
        = ∑_{i∈N} \binom{n+i−1}{i} x^i

(since (−1)^i (−1)^i = 1). Hence, the x^k-coefficient in f g is [x^k] ( f g) = \binom{n+k−1}{k} .
Comparing this with (89), we obtain

    ∑_{i=0}^{k} \binom{n+i−1}{i} \binom{n}{k−2i} = \binom{n+k−1}{k} .

Thus, Proposition 3.8.4 is proved.
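In the spirit of the polynomial identity trick from the previous proof, it is enough to check Proposition 3.8.4 for nonnegative integers n; the following brute-force Python check (our own illustration) does this for small n and k:

```python
from math import comb

def lhs(n, k):
    """Left hand side of Proposition 3.8.4 for integer n >= 1.

    Terms with k - 2i < 0 are omitted; terms with k - 2i > n vanish
    automatically, since math.comb(n, j) returns 0 for j > n.
    """
    return sum(
        comb(n + i - 1, i) * comb(n, k - 2 * i)
        for i in range(k + 1)
        if k - 2 * i >= 0
    )

def rhs(n, k):
    """Right hand side of Proposition 3.8.4."""
    return comb(n + k - 1, k)
```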

3.9. Integer compositions


3.9.1. Compositions
Next, let us count certain simple combinatorial objects known as integer compo-
sitions. There are easy combinatorial ways to do this (see, e.g., [19fco, §2.10.1]),
but we shall employ generating functions, in order to see one more example of
how these can be used.

Definition 3.9.1. (a) An (integer) composition means a (finite) tuple of positive


integers.
(b) The size of an integer composition α = (α1 , α2 , . . . , αm ) is defined to be
the integer α1 + α2 + · · · + αm . It is written |α|.
(c) The length of an integer composition α = (α1 , α2 , . . . , αm ) is defined to
be the integer m. It is written ℓ (α).
(d) Let n ∈ N. A composition of n means a composition whose size is n.
(e) Let n ∈ N and k ∈ N. A composition of n into k parts is a composition
whose size is n and whose length is k.

Example 3.9.2. The tuple (3, 8, 6) is a composition with size 3 + 8 + 6 = 17


and length 3. In other words, it is a composition of 17 into 3 parts.
The empty tuple () is a composition of 0 into 0 parts. It is the only compo-
sition of 0, and also is the only composition with length 0.

The following questions arise quite naturally:

1. How many compositions of n exist for a given n ∈ N ?

2. How many compositions of n into k parts exist for given n, k ∈ N ?

Let us use generating functions to answer question 2.


Approach to question 2. Fix k ∈ N, but don't fix n. Let

    a_{n,k} = (# of compositions of n into k parts) .    (91)

We want to find a_{n,k} . We define the generating function

    A_k := ∑_{n∈N} a_{n,k} x^n = ( a_{0,k} , a_{1,k} , a_{2,k} , . . . ) ∈ Q [[x]] .    (92)

Let us write P for the set {1, 2, 3, . . .}. Then, a composition of n into k parts
is nothing but a k-tuple (α_1 , α_2 , . . . , α_k ) ∈ P^k satisfying α_1 + α_2 + · · · + α_k = n.
Hence, (91) can be rewritten as

    a_{n,k} = ( # of all k-tuples (α_1 , α_2 , . . . , α_k ) ∈ P^k satisfying α_1 + α_2 + · · · + α_k = n )
            = ∑_{(α_1,α_2,...,α_k)∈P^k; α_1+α_2+···+α_k=n} 1.    (93)

Thus, we can rewrite the equality A_k = ∑_{n∈N} a_{n,k} x^n as

    A_k = ∑_{n∈N} ( ∑_{(α_1,α_2,...,α_k)∈P^k; α_1+α_2+···+α_k=n} 1 ) x^n
        = ∑_{n∈N} ∑_{(α_1,α_2,...,α_k)∈P^k; α_1+α_2+···+α_k=n} x^n
        = ∑_{n∈N} ∑_{(α_1,α_2,...,α_k)∈P^k; α_1+α_2+···+α_k=n} x^{α_1+α_2+···+α_k}    (since x^n = x^{α_1+α_2+···+α_k} when α_1 + α_2 + · · · + α_k = n)
        = ∑_{(α_1,α_2,...,α_k)∈P^k} x^{α_1} x^{α_2} · · · x^{α_k}
        = ( ∑_{α_1∈P} x^{α_1} ) ( ∑_{α_2∈P} x^{α_2} ) · · · ( ∑_{α_k∈P} x^{α_k} )    (by the same product rule that we used back in the proof of Theorem 3.8.3)
        = ( ∑_{n∈P} x^n )^k

(here, we have renamed all the k summation indices as n, and realized that all
k sums are identical). Since

    ∑_{n∈P} x^n = x^1 + x^2 + x^3 + · · · = x (1 + x + x^2 + · · ·) = x · 1/(1 − x ) = x/(1 − x )

(since 1 + x + x^2 + · · · = 1/(1 − x )), this can be rewritten further as

    A_k = ( x/(1 − x ) )^k = x^k (1 − x )^{−k} .    (94)

In order to simplify this, we need to expand (1 − x )^{−k} . This is routine by
now: Theorem 3.3.10 (applied to −k instead of n) yields

    (1 + x )^{−k} = ∑_{j∈N} \binom{−k}{j} x^j .

Substituting − x for x on both sides of this equality (i.e., applying the map

f ↦ f ◦ (− x )), we obtain

    (1 − x )^{−k} = ∑_{j∈N} \binom{−k}{j} (− x )^j = ∑_{j∈N} (−1)^j \binom{j+k−1}{j} · (−1)^j x^j
        (by Theorem 3.3.11, applied to k and j instead of n and k, which yields \binom{−k}{j} = (−1)^j \binom{j+k−1}{j})
    = ∑_{j∈N} ((−1)^j)^2 \binom{j+k−1}{j} x^j = ∑_{j∈N} \binom{j+k−1}{j} x^j    (95)
    = ∑_{n≥k} \binom{n−1}{n−k} x^{n−k}

(here, we have substituted n − k for j in the sum). Hence, our above computation
of A_k can be completed as follows:

    A_k = x^k (1 − x )^{−k} = x^k ∑_{n≥k} \binom{n−1}{n−k} x^{n−k} = ∑_{n≥k} \binom{n−1}{n−k} x^k x^{n−k}
        = ∑_{n≥k} \binom{n−1}{n−k} x^n = ∑_{n∈N} \binom{n−1}{n−k} x^n

(here, we have extended the range of the summation from all n ≥ k to all n ∈ N;
this did not change the sum, since all the newly introduced addends with n < k
are 0). Comparing coefficients, we thus obtain

    a_{n,k} = \binom{n−1}{n−k}    for each n ∈ N.    (96)
If n > 0, then we can rewrite the right hand side of this equality as \binom{n−1}{k−1}
(using Theorem 2.3.6). However, if n = 0, then this right hand side equals δ_{k,0}
instead (where we are using Definition 3.5.6). Thus, we can rewrite (96) as

    a_{n,k} = { \binom{n−1}{k−1} , if n > 0;  δ_{k,0} , if n = 0 }    for each n ∈ N.    (97)

We have thus answered our Question 2. Let us summarize the two answers
we have found ((96) and (97)) in the following theorem ([19fco, Theorem
2.10.1]):

Theorem 3.9.3. Let n, k ∈ N. Then, the # of compositions of n into k parts is

    \binom{n−1}{n−k} = { \binom{n−1}{k−1} , if n > 0;  δ_{k,0} , if n = 0 }.

This theorem has other proofs as well. See [19fco, Proof of Theorem 2.10.1]
for a proof by bijection and [19fco, solution to Exercise 2.10.2] for a proof by
induction.
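Theorem 3.9.3 is also easy to confirm by brute force. Here is a short Python check (our own illustration) that enumerates all k-tuples of positive integers summing to n:

```python
from itertools import product
from math import comb

def compositions(n, k):
    """All compositions of n into k parts, by brute-force enumeration."""
    return [t for t in product(range(1, n + 1), repeat=k) if sum(t) == n]
```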
As an easy consequence of Theorem 3.9.3, we can now answer Question 1 as
well:

Theorem 3.9.4. Let n ∈ N. Then, the # of compositions of n is

    { 2^{n−1} , if n > 0;  1 , if n = 0 }.

Proof of Theorem 3.9.4 (sketched). If n = 0, then the # of compositions of n is 1
(since the empty tuple () is the only composition of 0). Thus, for the rest of
this proof, we WLOG assume that n ̸= 0. Hence, we must prove that the # of
compositions of n is 2^{n−1}.
If (n_1 , n_2 , . . . , n_m ) is a composition of n, then m ∈ {1, 2, . . . , n} (why?). In
other words, any composition of n is a composition of n into k parts for some
k ∈ {1, 2, . . . , n}. Hence,

    (# of compositions of n)
    = ∑_{k∈{1,2,...,n}} (# of compositions of n into k parts)
    = ∑_{k∈{1,2,...,n}} \binom{n−1}{n−k}    (by Theorem 3.9.3)
    = ∑_{k=1}^{n} \binom{n−1}{n−k} = ∑_{k=0}^{n−1} \binom{n−1}{k}

(here, we have substituted k for n − k in the sum). Comparing this with

    2^{n−1} = (1 + 1)^{n−1} = ∑_{k=0}^{n−1} \binom{n−1}{k} 1^k 1^{n−1−k}    (by the binomial theorem)
            = ∑_{k=0}^{n−1} \binom{n−1}{k} ,

we obtain (# of compositions of n) = 2^{n−1}. This proves Theorem 3.9.4.
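A brute-force count (again our own illustration) confirms the 2^{n−1} formula for small n:

```python
from itertools import product

def num_compositions(n):
    """Count all compositions of n (tuples of positive integers of any length)."""
    return sum(
        1
        for m in range(n + 1)  # a composition of n has at most n parts
        for t in product(range(1, n + 1), repeat=m)
        if sum(t) == n
    )
```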



3.9.2. Weak compositions


A variant of compositions are the weak compositions. These are like composi-
tions, but their entries have to only be nonnegative rather than positive. For the
sake of completeness, let us give their definition in full:34

Definition 3.9.5. (a) An (integer) weak composition means a (finite) tuple of


nonnegative integers.
(b) The size of a weak composition α = (α1 , α2 , . . . , αm ) is defined to be the
integer α1 + α2 + · · · + αm . It is written |α|.
(c) The length of a weak composition α = (α1 , α2 , . . . , αm ) is defined to be
the integer m. It is written ℓ (α).
(d) Let n ∈ N. A weak composition of n means a weak composition whose
size is n.
(e) Let n ∈ N and k ∈ N. A weak composition of n into k parts is a weak
composition whose size is n and whose length is k.

Example 3.9.6. The tuple (3, 0, 1, 2) is a weak composition with size 3 + 0 +


1 + 2 = 6 and length 4. In other words, it is a weak composition of 6 into 4
parts. It is not a composition, since one of its entries is a 0.

Weak compositions are a rather natural analogue of compositions, but behave


dissimilarly in one important way: Any n ∈ N has infinitely many weak com-
positions. Indeed, all the tuples (n) , (n, 0) , (n, 0, 0) , (n, 0, 0, 0) , . . . are weak
compositions of n (and of course, there are many more weak compositions of n,
unless n = 0). Thus, it makes no sense to look for an analogue of Theorem 3.9.4
for weak compositions. However, an analogue of Theorem 3.9.3 does exist:

Theorem 3.9.7. Let n, k ∈ N. Then, the # of weak compositions of n into k
parts is

    \binom{n+k−1}{n} = { \binom{n+k−1}{k−1} , if k > 0;  δ_{n,0} , if k = 0 }.

Proof of Theorem 3.9.7 (sketched). (See [19fco, Theorem 2.10.5] for details and al-
ternative proofs.) Adding 1 to a nonnegative integer yields a positive integer.
Furthermore, if we add 1 to each entry of a k-tuple, then the sum of all entries
of the k-tuple increases by k.
Thus, if (α1 , α2 , . . . , αk ) is a weak composition of n into k parts, then
(α1 + 1, α2 + 1, . . . , αk + 1) is a composition of n + k into k parts. Hence, we can

34 Beware that the word “weak composition” does not have a unique meaning in the literature.
Math 701 Spring 2021, version April 6, 2024 page 124

define a map
{weak compositions of n into k parts} → {compositions of n + k into k parts} ,
(α1 , α2 , . . . , αk ) 7→ (α1 + 1, α2 + 1, . . . , αk + 1) .
Furthermore, it is easy to see that this map is a bijection (indeed, its inverse is
rather easy to construct). Thus, by the bijection principle, we have
|{weak compositions of n into k parts}|
= |{compositions of n + k into k parts}|
= (# of compositions of n + k into k parts)
n+k−1
   
by the first equality sign in Theorem 3.9.3,
=
n+k−k applied to n + k instead of n
n+k−1
 
= .
n
Thus, we have shown that the # of weak compositions  of n into k parts is
n+k−1


n+k−1
   , if k > 0;
. It remains to prove that this equals k−1 as
n 
δ ,
n,0 if k = 0
well. This is similar to how we obtained (97): If k = 0, then it is clear by
inspection; otherwise it follows from Theorem 2.3.6. Theorem 3.9.7 is proven.
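Theorem 3.9.7 is easy to sanity-check numerically. The following sketch (Python and the helper name are our additions, not part of the notes) enumerates all k-tuples with entries in {0, 1, . . . , n} — a weak composition of n cannot have an entry exceeding n — and compares the count with the binomial coefficient:

```python
from itertools import product
from math import comb

def count_weak_compositions(n, k):
    """Brute-force count of weak compositions of n into k parts."""
    return sum(1 for t in product(range(n + 1), repeat=k) if sum(t) == n)

# The count matches binom(n + k - 1, n) for all small n and k >= 1.
for n in range(6):
    for k in range(1, 5):
        assert count_weak_compositions(n, k) == comb(n + k - 1, n)
```

For k = 0, the enumeration yields δ_{n,0} (only the empty tuple, which is a weak composition of 0), matching the second case of the theorem.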

3.9.3. Weak compositions with entries from {0, 1, . . . , p − 1}

Theorems 3.9.3 and 3.9.7 may stir up hopes that other tuple-counting problems also have simple answers. Let us see if this hope holds up.

An attempt. We fix three nonnegative integers n, k and p. We are looking for the # of k-tuples (α1, α2, . . . , αk) ∈ {0, 1, . . . , p − 1}^k satisfying α1 + α2 + · · · + αk = n. In other words, we are looking for the # of all weak compositions of n into k parts with the property that each entry is < p. Let us denote this # by w_{n,k,p}.
Just as when counting compositions, we invoke a generating function. Forget that we fixed n, and define the FPS
$$W_{k,p} := \sum_{n \in \mathbb{N}} w_{n,k,p} x^n \in \mathbb{Q}[[x]].$$

For each n ∈ N, we have
$$w_{n,k,p} = \left(\text{\# of all } k\text{-tuples } (\alpha_1, \alpha_2, \ldots, \alpha_k) \in \{0, 1, \ldots, p-1\}^k \text{ satisfying } \alpha_1 + \alpha_2 + \cdots + \alpha_k = n\right) = \sum_{\substack{(\alpha_1, \alpha_2, \ldots, \alpha_k) \in \{0, 1, \ldots, p-1\}^k; \\ \alpha_1 + \alpha_2 + \cdots + \alpha_k = n}} 1.$$

Thus, we can rewrite the equality $W_{k,p} = \sum_{n \in \mathbb{N}} w_{n,k,p} x^n$ as
\begin{align*}
W_{k,p} &= \sum_{n \in \mathbb{N}} \Biggl( \sum_{\substack{(\alpha_1, \alpha_2, \ldots, \alpha_k) \in \{0,1,\ldots,p-1\}^k; \\ \alpha_1 + \alpha_2 + \cdots + \alpha_k = n}} 1 \Biggr) x^n
= \sum_{n \in \mathbb{N}} \sum_{\substack{(\alpha_1, \alpha_2, \ldots, \alpha_k) \in \{0,1,\ldots,p-1\}^k; \\ \alpha_1 + \alpha_2 + \cdots + \alpha_k = n}} \underbrace{x^n}_{\substack{= x^{\alpha_1 + \alpha_2 + \cdots + \alpha_k} \\ \text{(since } \alpha_1 + \alpha_2 + \cdots + \alpha_k = n\text{)}}} \\
&= \underbrace{\sum_{n \in \mathbb{N}} \sum_{\substack{(\alpha_1, \alpha_2, \ldots, \alpha_k) \in \{0,1,\ldots,p-1\}^k; \\ \alpha_1 + \alpha_2 + \cdots + \alpha_k = n}}}_{= \sum_{(\alpha_1, \alpha_2, \ldots, \alpha_k) \in \{0,1,\ldots,p-1\}^k}} \underbrace{x^{\alpha_1 + \alpha_2 + \cdots + \alpha_k}}_{= x^{\alpha_1} x^{\alpha_2} \cdots x^{\alpha_k}}
= \sum_{(\alpha_1, \alpha_2, \ldots, \alpha_k) \in \{0,1,\ldots,p-1\}^k} x^{\alpha_1} x^{\alpha_2} \cdots x^{\alpha_k} \\
&= \Biggl( \sum_{\alpha_1 \in \{0,1,\ldots,p-1\}} x^{\alpha_1} \Biggr) \Biggl( \sum_{\alpha_2 \in \{0,1,\ldots,p-1\}} x^{\alpha_2} \Biggr) \cdots \Biggl( \sum_{\alpha_k \in \{0,1,\ldots,p-1\}} x^{\alpha_k} \Biggr) \qquad \left(\begin{array}{c}\text{by the same product rule that we used back}\\ \text{in the proof of Theorem 3.8.3}\end{array}\right) \\
&= \Biggl( \sum_{n \in \{0,1,\ldots,p-1\}} x^n \Biggr)^k
\end{align*}
(here, we have renamed all the k summation indices as n, and realized that all k sums are identical). Since
$$\sum_{n \in \{0,1,\ldots,p-1\}} x^n = x^0 + x^1 + \cdots + x^{p-1} = \frac{1 - x^p}{1 - x}$$
(the last equality sign here is easy to check^35), this can be rewritten further as
$$W_{k,p} = \left( \frac{1 - x^p}{1 - x} \right)^k = (1 - x^p)^k (1 - x)^{-k}. \tag{98}$$

In order to expand the right hand side, let us expand $(1 - x^p)^k$ and $(1 - x)^{-k}$ separately.
The binomial theorem yields
$$(1 - x^p)^k = \sum_{j=0}^{k} (-1)^j \binom{k}{j} \underbrace{(x^p)^j}_{= x^{pj}} = \sum_{j=0}^{k} (-1)^j \binom{k}{j} x^{pj} = \sum_{j \in \mathbb{N}} (-1)^j \binom{k}{j} x^{pj}$$

^35 Just show that $(1 - x)\left(x^0 + x^1 + \cdots + x^{p-1}\right) = 1 - x^p$.

(here, we have extended the range of the summation from j ∈ {0, 1, . . . , k} to all j ∈ N; this did not change the sum, since all the newly introduced addends are 0). Multiplying this by (95), we obtain
\begin{align*}
&(1 - x^p)^k (1 - x)^{-k} \\
&= \Biggl( \sum_{j \in \mathbb{N}} (-1)^j \binom{k}{j} x^{pj} \Biggr) \Biggl( \sum_{j \in \mathbb{N}} \binom{j+k-1}{j} x^j \Biggr) \\
&= \Biggl( \sum_{j \in \mathbb{N}} (-1)^j \binom{k}{j} x^{pj} \Biggr) \Biggl( \sum_{i \in \mathbb{N}} \binom{i+k-1}{i} x^i \Biggr) \qquad \left(\begin{array}{c}\text{here, we have renamed the summation index } j \\ \text{as } i \text{ in the second sum}\end{array}\right) \\
&= \underbrace{\sum_{(i,j) \in \mathbb{N} \times \mathbb{N}}}_{= \sum_{n \in \mathbb{N}} \sum_{\substack{(i,j) \in \mathbb{N} \times \mathbb{N}; \\ pj + i = n}}} (-1)^j \binom{k}{j} \binom{i+k-1}{i} x^{pj+i} \\
&= \sum_{n \in \mathbb{N}} \sum_{\substack{(i,j) \in \mathbb{N} \times \mathbb{N}; \\ pj + i = n}} (-1)^j \binom{k}{j} \binom{i+k-1}{i} \underbrace{x^{pj+i}}_{\substack{= x^n \\ \text{(since } pj + i = n\text{)}}} \\
&= \sum_{n \in \mathbb{N}} \sum_{\substack{(i,j) \in \mathbb{N} \times \mathbb{N}; \\ pj + i = n}} (-1)^j \binom{k}{j} \binom{i+k-1}{i} x^n \\
&= \sum_{n \in \mathbb{N}} \Biggl( \sum_{\substack{(i,j) \in \mathbb{N} \times \mathbb{N}; \\ pj + i = n}} (-1)^j \binom{k}{j} \binom{i+k-1}{i} \Biggr) x^n.
\end{align*}
Thus, (98) becomes
$$W_{k,p} = (1 - x^p)^k (1 - x)^{-k} = \sum_{n \in \mathbb{N}} \Biggl( \sum_{\substack{(i,j) \in \mathbb{N} \times \mathbb{N}; \\ pj + i = n}} (-1)^j \binom{k}{j} \binom{i+k-1}{i} \Biggr) x^n.$$

Comparing coefficients in this equality, we find that each n ∈ N satisfies
\begin{align*}
w_{n,k,p} &= \sum_{\substack{(i,j) \in \mathbb{N} \times \mathbb{N}; \\ pj + i = n}} (-1)^j \binom{k}{j} \binom{i+k-1}{i} \\
&= \sum_{\substack{j \in \mathbb{N}; \\ pj \le n}} (-1)^j \binom{k}{j} \binom{n - pj + k - 1}{n - pj} \qquad \left(\begin{array}{c}\text{here, we have substituted } (n - pj,\, j) \text{ for } (i, j) \text{ in the sum,} \\ \text{since any pair } (i,j) \in \mathbb{N} \times \mathbb{N} \text{ satisfying } pj + i = n \\ \text{is uniquely determined by its second entry } j\end{array}\right) \\
&= \sum_{j \in \mathbb{N}} (-1)^j \binom{k}{j} \binom{n - pj + k - 1}{n - pj} \qquad \left(\begin{array}{c}\text{here, we have extended the range of summation by} \\ \text{dropping the ``} pj \le n \text{'' requirement; this does not change} \\ \text{the sum, since all newly introduced addends are } 0\end{array}\right) \\
&= \sum_{j=0}^{k} (-1)^j \binom{k}{j} \binom{n - pj + k - 1}{n - pj}
\end{align*}
(here, we have removed all addends with j > k from the sum; this does not change the sum, since all these addends are 0).
Thus, we have proved the following fact:

Theorem 3.9.8. Let n, k, p ∈ N. Then, the # of k-tuples (α1, α2, . . . , αk) ∈ {0, 1, . . . , p − 1}^k satisfying α1 + α2 + · · · + αk = n is
$$\sum_{j=0}^{k} (-1)^j \binom{k}{j} \binom{n - pj + k - 1}{n - pj}.$$

In general, this expression is the simplest we can get. A combinatorial proof of Theorem 3.9.8 can be found in [19fco, Exercise 2.10.6].
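Theorem 3.9.8 can also be checked numerically on small inputs. Here is a brute-force sketch (Python and the helper names are ours, not from the notes; we restrict to k ≥ 1 to avoid an edge case of `math.comb` with negative arguments at k = 0):

```python
from itertools import product
from math import comb

def count_bounded(n, k, p):
    """# of k-tuples in {0, ..., p-1}^k with entry sum n, by enumeration."""
    return sum(1 for t in product(range(p), repeat=k) if sum(t) == n)

def alternating_sum(n, k, p):
    """The formula of Theorem 3.9.8; addends with n - pj < 0 vanish."""
    return sum((-1) ** j * comb(k, j) * comb(n - p * j + k - 1, n - p * j)
               for j in range(k + 1) if n - p * j >= 0)

assert all(count_bounded(n, k, p) == alternating_sum(n, k, p)
           for n in range(7) for k in range(1, 5) for p in range(1, 4))
```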
However, the particular case when p = 2 is worth exploring, as it allows for a much simpler expression. Indeed, the k-tuples (α1, α2, . . . , αk) ∈ {0, 1}^k are just the "binary k-strings", i.e., the k-tuples formed of 0s and 1s. Imposing the condition α1 + α2 + · · · + αk = n on such a k-tuple is tantamount to requiring that it contain exactly n many 1s. Therefore, the # of all k-tuples (α1, α2, . . . , αk) ∈ {0, 1}^k satisfying α1 + α2 + · · · + αk = n is $\dbinom{k}{n}$, since we have to choose which n of its k positions will be occupied by 1s (and then all remaining k − n positions will be occupied by 0s). However, Theorem 3.9.8 (applied to p = 2) yields that this # equals $\sum_{j=0}^{k} (-1)^j \dbinom{k}{j} \dbinom{n - 2j + k - 1}{n - 2j}$. Comparing these two results, we obtain the following identity:

Proposition 3.9.9. Let n, k ∈ N. Then,
$$\binom{k}{n} = \sum_{j=0}^{k} (-1)^j \binom{k}{j} \binom{n - 2j + k - 1}{n - 2j}.$$
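Proposition 3.9.9 is likewise easy to test numerically. The following check (a Python sketch of ours, relying on `math.comb(k, n)` returning 0 whenever n > k) compares both sides on small inputs:

```python
from math import comb

def rhs(n, k):
    """Right-hand side of Proposition 3.9.9 (addends with n - 2j < 0 vanish)."""
    return sum((-1) ** j * comb(k, j) * comb(n - 2 * j + k - 1, n - 2 * j)
               for j in range(k + 1) if n - 2 * j >= 0)

assert all(comb(k, n) == rhs(n, k) for k in range(1, 8) for n in range(8))
```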

3.10. x^n-equivalence

We now return to general properties of FPSs.

Definition 3.10.1. Let n ∈ N. Let f, g ∈ K [[x]] be two FPSs. We write $f \overset{x^n}{\equiv} g$ if and only if

each m ∈ {0, 1, . . . , n} satisfies [x^m] f = [x^m] g.

Thus, we have defined a binary relation $\overset{x^n}{\equiv}$ on the set K [[x]]. We say that an FPS f is x^n-equivalent to an FPS g if and only if $f \overset{x^n}{\equiv} g$.

Thus, an FPS f is x^n-equivalent to an FPS g if and only if the first n + 1 coefficients of f agree with the first n + 1 coefficients of g. Here are some examples:

Example 3.10.2. (a) Consider the two FPSs
$$(1 + x)^3 = 1 + 3x + 3x^2 + x^3 = x^0 + 3x^1 + 3x^2 + 1x^3 + 0x^4 + 0x^5 + \cdots$$
and
$$\frac{1}{1 - 3x} = 1 + 3x + (3x)^2 + (3x)^3 + \cdots = x^0 + 3x^1 + 9x^2 + 27x^3 + \cdots.$$
Thus, $(1 + x)^3 \overset{x^1}{\equiv} \dfrac{1}{1 - 3x}$, since each m ∈ {0, 1} satisfies $[x^m]\left((1 + x)^3\right) = [x^m]\dfrac{1}{1 - 3x}$ (indeed, we have $[x^0]\left((1 + x)^3\right) = 1 = [x^0]\dfrac{1}{1 - 3x}$ and $[x^1]\left((1 + x)^3\right) = 3 = [x^1]\dfrac{1}{1 - 3x}$). Of course, this also shows that $(1 + x)^3 \overset{x^0}{\equiv} \dfrac{1}{1 - 3x}$. However, we don't have $(1 + x)^3 \overset{x^2}{\equiv} \dfrac{1}{1 - 3x}$ (at least not for K = Z), since $[x^2]\left((1 + x)^3\right) = 3$ does not equal $[x^2]\dfrac{1}{1 - 3x} = 9$.

(b) More generally, $(1 + x)^n \overset{x^1}{\equiv} \dfrac{1}{1 - nx}$ and $(1 + x)^n \overset{x^1}{\equiv} 1 + nx$ for each n ∈ Z.
(c) The generating function F = F(x) = 0 + 1x + 1x^2 + 2x^3 + 3x^4 + 5x^5 + · · · from Section 3.1 satisfies $F \overset{x^2}{\equiv} x + x^2$ and $F \overset{x^3}{\equiv} x + x^2 + 2x^3$.
(d) If f ∈ K [[x]] is any FPS, and if n ∈ N, then there exists a polynomial p ∈ K [x] such that $f \overset{x^n}{\equiv} p$. Indeed, we can take $p = \sum_{k=0}^{n} \left( [x^k] f \right) \cdot x^k$.
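Since Definition 3.10.1 compares only the first n + 1 coefficients, x^n-equivalence can be tested mechanically once an FPS is represented by a long enough prefix of its coefficient sequence. A small Python sketch (the representation and helper names are ours, not from the notes), replaying Example 3.10.2 (a):

```python
def coeff(f, m):
    """m-th coefficient of an FPS given as a (long enough) prefix list."""
    return f[m] if m < len(f) else 0

def x_equiv(f, g, n):
    """x^n-equivalence: the first n + 1 coefficients of f and g agree."""
    return all(coeff(f, m) == coeff(g, m) for m in range(n + 1))

cube = [1, 3, 3, 1]                  # (1 + x)^3
geom = [3 ** i for i in range(10)]   # 1/(1 - 3x) over Z
assert x_equiv(cube, geom, 1)        # agree up to x^1
assert not x_equiv(cube, geom, 2)    # but [x^2] gives 3 vs 9
```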

One way to get an intuition for the relation $\overset{x^n}{\equiv}$ is to think of it as a kind of "approximate equality" up to degree n. (This makes the most sense if one thinks of x as an infinitesimal quantity, in which case a term λx^k (with λ ∈ K) is the more "important" the lower k is. From this viewpoint, $f \overset{x^n}{\equiv} g$ means that the FPSs f and g agree in their n + 1 most "important" terms and differ at most in their "error terms".) For this reason, the statement "$f \overset{x^n}{\equiv} g$" is sometimes written as "f = g + o(x^n)" (an algebraic imitation of Landau's little-o notation from asymptotic analysis). Another intuition comes from elementary number theory: The relation $\overset{x^n}{\equiv}$ is similar to congruence of integers modulo a given integer. This is more than a similarity; the relation $\overset{x^n}{\equiv}$ can in fact be restated as a divisibility in the same fashion as for congruences of integers (see Proposition 3.10.4 below). For this reason, the statement "$f \overset{x^n}{\equiv} g$" is sometimes written as "f ≡ g mod x^{n+1}". We shall, however, eschew both of these alternative notations, and use the original notation "$f \overset{x^n}{\equiv} g$" from Definition 3.10.1, as both intuitions (while useful) would distract from the simplicity of Definition 3.10.1^36.
Here are some basic properties of the relation $\overset{x^n}{\equiv}$ (some of which will be used without explicit reference):

Theorem 3.10.3. Let n ∈ N.
(a) The relation $\overset{x^n}{\equiv}$ on K [[x]] is an equivalence relation. In other words:
• This relation is reflexive (i.e., we have $f \overset{x^n}{\equiv} f$ for each f ∈ K [[x]]).
• This relation is transitive (i.e., if three FPSs f, g, h ∈ K [[x]] satisfy $f \overset{x^n}{\equiv} g$ and $g \overset{x^n}{\equiv} h$, then $f \overset{x^n}{\equiv} h$).
• This relation is symmetric (i.e., if two FPSs f, g ∈ K [[x]] satisfy $f \overset{x^n}{\equiv} g$, then $g \overset{x^n}{\equiv} f$).

(b) If a, b, c, d ∈ K [[x]] are four FPSs satisfying $a \overset{x^n}{\equiv} b$ and $c \overset{x^n}{\equiv} d$, then we also have
$$a + c \overset{x^n}{\equiv} b + d; \tag{99}$$
$$a - c \overset{x^n}{\equiv} b - d; \tag{100}$$
$$ac \overset{x^n}{\equiv} bd. \tag{101}$$

(c) If a, b ∈ K [[x]] are two FPSs satisfying $a \overset{x^n}{\equiv} b$, then $\lambda a \overset{x^n}{\equiv} \lambda b$ for each λ ∈ K.

(d) If a, b ∈ K [[x]] are two invertible FPSs satisfying $a \overset{x^n}{\equiv} b$, then $a^{-1} \overset{x^n}{\equiv} b^{-1}$.

(e) If a, b, c, d ∈ K [[x]] are four FPSs satisfying $a \overset{x^n}{\equiv} b$ and $c \overset{x^n}{\equiv} d$, and if the FPSs c and d are invertible, then we also have
$$\frac{a}{c} \overset{x^n}{\equiv} \frac{b}{d}. \tag{102}$$

(f) Let S be a finite set. Let $(a_s)_{s \in S} \in K[[x]]^S$ and $(b_s)_{s \in S} \in K[[x]]^S$ be two families of FPSs such that
$$\text{each } s \in S \text{ satisfies } a_s \overset{x^n}{\equiv} b_s. \tag{103}$$
Then, we have
$$\sum_{s \in S} a_s \overset{x^n}{\equiv} \sum_{s \in S} b_s; \tag{104}$$
$$\prod_{s \in S} a_s \overset{x^n}{\equiv} \prod_{s \in S} b_s. \tag{105}$$

^36 Case in point: Definition 3.10.1 can be generalized to multivariate FPSs, but the two intuitions are no longer available (or, worse, give the "wrong" concepts) when extended to this generality.

Proof of Theorem 3.10.3 (sketched). All of these properties are analogous to familiar properties of integer congruences, except for Theorem 3.10.3 (d), which is moot for integers (since there are not many integers that are invertible in Z). The proofs are similarly simple (using (20), (21), (22) and (25)). Thus, we shall only give some hints for the proof of Theorem 3.10.3 (d) here; detailed proofs of all parts of Theorem 3.10.3 can be found in Section B.1.

(d) Let a, b ∈ K [[x]] be two invertible FPSs satisfying $a \overset{x^n}{\equiv} b$. We want to show that $a^{-1} \overset{x^n}{\equiv} b^{-1}$.
The FPS a is invertible; thus, its constant term $[x^0] a$ is invertible in K (by Proposition 3.3.7).
Recall that $a \overset{x^n}{\equiv} b$. In other words,
$$\text{each } m \in \{0, 1, \ldots, n\} \text{ satisfies } [x^m] a = [x^m] b. \tag{106}$$
Now, we want to prove that $a^{-1} \overset{x^n}{\equiv} b^{-1}$. In other words, we want to prove that each m ∈ {0, 1, . . . , n} satisfies $[x^m]\left(a^{-1}\right) = [x^m]\left(b^{-1}\right)$. We shall prove this by strong induction on m: We fix some k ∈ {0, 1, . . . , n}, and we assume (as an induction hypothesis) that
$$[x^m]\left(a^{-1}\right) = [x^m]\left(b^{-1}\right) \qquad \text{for each } m \in \{0, 1, \ldots, k-1\}. \tag{107}$$
We must now prove that $[x^k]\left(a^{-1}\right) = [x^k]\left(b^{-1}\right)$. We know that
\begin{align*}
[x^k]\left(a a^{-1}\right) &= \sum_{i=0}^{k} \left( [x^i] a \right) \cdot \left( [x^{k-i}]\left(a^{-1}\right) \right) \qquad (\text{by (22)}) \\
&= \left( [x^0] a \right) \cdot \left( [x^k]\left(a^{-1}\right) \right) + \sum_{i=1}^{k} \left( [x^i] a \right) \cdot \left( [x^{k-i}]\left(a^{-1}\right) \right)
\end{align*}
(here, we have split off the addend for i = 0 from the sum). Thus,
$$\left( [x^0] a \right) \cdot \left( [x^k]\left(a^{-1}\right) \right) + \sum_{i=1}^{k} \left( [x^i] a \right) \cdot \left( [x^{k-i}]\left(a^{-1}\right) \right) = [x^k]\underbrace{\left(a a^{-1}\right)}_{=1} = [x^k] 1.$$
We can solve this equation for $[x^k]\left(a^{-1}\right)$ (since $[x^0] a$ is invertible), and thus obtain
$$[x^k]\left(a^{-1}\right) = \frac{1}{[x^0] a} \Biggl( [x^k] 1 - \sum_{i=1}^{k} \left( [x^i] a \right) \cdot \left( [x^{k-i}]\left(a^{-1}\right) \right) \Biggr).$$
The same argument (applied to b instead of a) yields
$$[x^k]\left(b^{-1}\right) = \frac{1}{[x^0] b} \Biggl( [x^k] 1 - \sum_{i=1}^{k} \left( [x^i] b \right) \cdot \left( [x^{k-i}]\left(b^{-1}\right) \right) \Biggr).$$
The right hand sides of the latter two equalities are equal (since each i ∈ {1, 2, . . . , k} satisfies $[x^i] a = [x^i] b$ as a consequence of (106), and satisfies $[x^{k-i}]\left(a^{-1}\right) = [x^{k-i}]\left(b^{-1}\right)$ as a consequence of (107), and since we have $[x^0] a = [x^0] b$ as a consequence of (106)). Hence, the left hand sides must also be equal. In other words, $[x^k]\left(a^{-1}\right) = [x^k]\left(b^{-1}\right)$, which is precisely what we wanted to prove. Thus, the induction step is complete, so that $a^{-1} \overset{x^n}{\equiv} b^{-1}$ is proved. Thus, Theorem 3.10.3 (d) is proved. (See Section B.1 for more details.)
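The recurrence in this proof doubles as an algorithm: it computes the coefficients of $a^{-1}$ one at a time, and it makes Theorem 3.10.3 (d) visible, since only the first n + 1 coefficients of a are ever read. A Python sketch of ours (exact arithmetic via `fractions`, with K = Q in mind; helper names are not from the notes):

```python
from fractions import Fraction

def inverse_coeffs(a, n):
    """First n + 1 coefficients of a^{-1}, from the recurrence
    [x^k] a^{-1} = (1/[x^0]a) * ([x^k]1 - sum_{i=1}^{k} [x^i]a * [x^{k-i}]a^{-1}).
    Only [x^0]a, ..., [x^n]a are read, so FPSs agreeing up to x^n give equal output."""
    coeff = lambda m: a[m] if m < len(a) else 0
    inv = []
    for k in range(n + 1):
        s = sum(coeff(i) * inv[k - i] for i in range(1, k + 1))
        inv.append(((1 if k == 0 else 0) - s) / Fraction(a[0]))
    return inv

# a = 1 - x + 7x^4 and b = 1 - x agree up to x^3, hence so do their inverses:
assert inverse_coeffs([1, -1, 0, 0, 7], 3) == inverse_coeffs([1, -1], 3) == [1, 1, 1, 1]
```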

Let us next characterize x^n-equivalence of FPSs in terms of divisibility:

Proposition 3.10.4. Let n ∈ N. Let f, g ∈ K [[x]] be two FPSs. Then, we have $f \overset{x^n}{\equiv} g$ if and only if the FPS f − g is a multiple of x^{n+1}.

Proof of Proposition 3.10.4. See Section B.1 for this proof (a simple consequence of Lemma 3.3.18).
Finally, here is a subtler property of x^n-equivalence similar to the ones in Theorem 3.10.3 (b):

Proposition 3.10.5. Let n ∈ N. Let a, b, c, d ∈ K [[x]] be four FPSs satisfying $a \overset{x^n}{\equiv} b$ and $c \overset{x^n}{\equiv} d$ and $[x^0] c = 0$ and $[x^0] d = 0$. Then,
$$a \circ c \overset{x^n}{\equiv} b \circ d.$$

Proof of Proposition 3.10.5 (sketched). Write a and b as $a = \sum_{i \in \mathbb{N}} a_i x^i$ and $b = \sum_{i \in \mathbb{N}} b_i x^i$ (with $a_i, b_i \in K$). Then, $a \overset{x^n}{\equiv} b$ means that $a_i = b_i$ for all i ≤ n. Combine this with $c^i \overset{x^n}{\equiv} d^i$ (which holds for all i ∈ N as a consequence of $c \overset{x^n}{\equiv} d$ and of (105)) to obtain the relation $a_i c^i \overset{x^n}{\equiv} b_i d^i$ for all i ≤ n. But this relation also holds for all i > n, since all such i satisfy $[x^m]\left(c^i\right) = [x^m]\left(d^i\right) = 0$ for all m ∈ {0, 1, . . . , n} (a consequence of Lemma 3.3.18 using $[x^0] c = 0$ and $[x^0] d = 0$). Thus, the relation $a_i c^i \overset{x^n}{\equiv} b_i d^i$ holds for all i ∈ N. Summing over all i, we find $a \circ c \overset{x^n}{\equiv} b \circ d$. See Section B.1 for the details of this argument.

3.11. Infinite products

Let us now extend our FPS playground somewhat. We have made sense of infinite sums. What about infinite products?

3.11.1. An example

We start with a motivating example (due to Euler, in [Euler48, §328–329]), which we shall first discuss informally.
Assume (for the time being) that the infinite product
$$\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right) = \left(1 + x^1\right)\left(1 + x^2\right)\left(1 + x^4\right)\left(1 + x^8\right) \cdots \tag{108}$$
in the ring K [[x]] is meaningful, and that such products behave as nicely as finite products. Can we simplify this product?

We can observe that each i ∈ N satisfies $1 + x^{2^i} = \dfrac{1 - x^{2^{i+1}}}{1 - x^{2^i}}$ (since $1 - x^{2^{i+1}} = 1 - \left(x^{2^i}\right)^2 = \left(1 - x^{2^i}\right)\left(1 + x^{2^i}\right)$). Multiplying these equalities over all i ∈ N, we obtain
$$\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right) = \prod_{i \in \mathbb{N}} \frac{1 - x^{2^{i+1}}}{1 - x^{2^i}} = \frac{1 - x^2}{1 - x^1} \cdot \frac{1 - x^4}{1 - x^2} \cdot \frac{1 - x^8}{1 - x^4} \cdot \frac{1 - x^{16}}{1 - x^8} \cdot \cdots.$$
The product on the right hand side here is a telescoping product – meaning that each numerator is cancelled by the denominator of the following fraction. Assuming (somewhat plausibly, but far from rigorously) that we are allowed to cancel infinitely many factors from an infinite product, we thus end up with a single $1 - x^1$ factor in the denominator. That is, our product simplifies to $\dfrac{1}{1 - x^1}$. Thus, we obtain
$$\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right) = \frac{1}{1 - x^1} = \frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots. \tag{109}$$
This was not very rigorous, so let us try to compute the product $\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right)$ in a different way. Namely, we recall a simple fact about finite products: If $a_0, a_1, \ldots, a_m$ are finitely many elements of a commutative ring, then the product
$$\prod_{i=0}^{m} (1 + a_i) = (1 + a_0)(1 + a_1) \cdots (1 + a_m) \tag{110}$$
equals the sum^37
$$\sum_{i_1 < i_2 < \cdots < i_k \le m} a_{i_1} a_{i_2} \cdots a_{i_k}$$
of all the $2^{m+1}$ "sub-products" of the product $a_0 a_1 \cdots a_m$ (because this sum is what we obtain if we expand $\prod_{i=0}^{m} (1 + a_i)$ by repeatedly applying distributivity). For example, for m = 2, this is saying that
$$(1 + a_0)(1 + a_1)(1 + a_2) = 1 + a_0 + a_1 + a_2 + a_0 a_1 + a_0 a_2 + a_1 a_2 + a_0 a_1 a_2.$$
Now, it is plausible to expect the same formula $\prod_{i=0}^{m} (1 + a_i) = \sum_{I \subseteq \{0,1,\ldots,m\}} \prod_{i \in I} a_i$ to hold if "m is ∞" (that is, if the product ranges over all i ∈ N), provided that the product is meaningful. In other words, it is plausible to expect that
$$\prod_{i \in \mathbb{N}} (1 + a_i) = \sum_{i_1 < i_2 < \cdots < i_k} a_{i_1} a_{i_2} \cdots a_{i_k} \tag{111}$$

^37 The indices $i_1, i_2, \ldots, i_k$ of the sum are supposed to be nonnegative integers.

for any infinite sequence $a_0, a_1, a_2, \ldots$ as long as $\prod_{i \in \mathbb{N}} (1 + a_i)$ makes sense. (It's a little bit more complicated than that, but we aren't trying to be fully rigorous yet. The correct condition is that the sequence $(a_0, a_1, a_2, \ldots)$ is summable.) If we now apply (111) to $a_i = x^{2^i}$, then we obtain
$$\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right) = \sum_{i_1 < i_2 < \cdots < i_k} x^{2^{i_1}} x^{2^{i_2}} \cdots x^{2^{i_k}} = \sum_{i_1 < i_2 < \cdots < i_k} x^{2^{i_1} + 2^{i_2} + \cdots + 2^{i_k}} = \sum_{n \in \mathbb{N}} q_n x^n, \tag{112}$$
where $q_n$ is the # of ways to write the integer n as a sum $2^{i_1} + 2^{i_2} + \cdots + 2^{i_k}$ with nonnegative integers $i_1, i_2, \ldots, i_k$ satisfying $i_1 < i_2 < \cdots < i_k$. Comparing this with (109), we obtain
$$\sum_{n \in \mathbb{N}} q_n x^n = 1 + x + x^2 + x^3 + \cdots = \sum_{n \in \mathbb{N}} x^n,$$
at least if our assumptions were valid. Comparing coefficients, this would mean that $q_n = 1$ for each n ∈ N. In other words, each n ∈ N can be written in exactly one way as a sum $2^{i_1} + 2^{i_2} + \cdots + 2^{i_k}$ with nonnegative integers $i_1, i_2, \ldots, i_k$ satisfying $i_1 < i_2 < \cdots < i_k$. In other words, each n ∈ N can be written uniquely as a finite sum of distinct powers of 2.
Is this true? Yes, because this is just saying that each n ∈ N has a unique binary representation. For example, $21 = 2^4 + 2^2 + 2^0$ corresponds to the binary representation $21 = (10101)_2$.
Thus, the two results we have obtained in (109) and (112) are actually equal, which is reassuring. Yet, this does not replace a formal definition of infinite products that rigorously justifies the above arguments.
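The claim $q_n = 1$ can be double-checked by truncated multiplication: working modulo $x^N$, only the factors $1 + x^{2^i}$ with $2^i < N$ matter. A Python sketch of ours (not part of the notes):

```python
def binary_product_coeffs(N):
    """Coefficients up to x^(N-1) of the product of (1 + x^{2^i}) over all i
    with 2^i < N; the remaining factors cannot affect these coefficients."""
    prod = [1] + [0] * (N - 1)
    step = 1
    while step < N:
        # multiply prod by (1 + x^step), truncating at degree N - 1
        prod = [prod[m] + (prod[m - step] if m >= step else 0) for m in range(N)]
        step *= 2
    return prod

# q_n = 1 for every n: each n has a unique binary representation.
assert binary_product_coeffs(64) == [1] * 64
```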

3.11.2. A rigorous definition

One way of rigorously defining infinite products of FPSs can be found in [Loehr11, §7.5]. However, this definition only defines infinite products of the form $\prod_{i \in \mathbb{N}}$ or $\prod_{i=1}^{\infty}$, but not (for example) of the form $\prod_{I \subseteq \mathbb{N}}$ or $\prod_{(i,j) \in \mathbb{N} \times \mathbb{N}}$. Another definition of infinite products uses the Log and Exp bijections from Definition 3.7.6 to turn products into sums; but this requires K to be a Q-algebra (since Log and Exp aren't defined otherwise). Thus, we shall give a different definition here.
We recall our definition of infinite sums of FPSs (Definition 3.2.9):

Definition 3.2.9 (repeated). A (possibly infinite) family $(a_i)_{i \in I}$ of FPSs is said to be summable if

for each n ∈ N, all but finitely many i ∈ I satisfy $[x^n] a_i = 0$.

In this case, the sum $\sum_{i \in I} a_i$ is defined to be the FPS with
$$[x^n] \Biggl( \sum_{i \in I} a_i \Biggr) = \underbrace{\sum_{i \in I} [x^n] a_i}_{\substack{\text{an essentially} \\ \text{finite sum}}} \qquad \text{for all } n \in \mathbb{N}.$$

This is how we defined infinite sums of FPSs. We cannot use the same definition for infinite products, because usually we don't expect to have
$$[x^n] \Biggl( \prod_{i \in I} a_i \Biggr) = \prod_{i \in I} [x^n] a_i$$
(after all, multiplication of FPSs is not defined coefficientwise). The condition "all but finitely many i ∈ I satisfy $[x^n] a_i = 0$" is therefore not what we are looking for.
Let us instead go back to the idea behind Definition 3.2.9. Let us fix some n ∈ N. What was the actual purpose of the "all but finitely many i ∈ I satisfy $[x^n] a_i = 0$" condition? The purpose was to ensure that the coefficient $[x^n] \left( \sum_{i \in I} a_i \right)$ is determined by finitely many of the $a_i$'s. In other words, the purpose was to ensure that there is a finite partial sum of $\sum_{i \in I} a_i$ such that if we add any further $a_i$'s to this partial sum, then the coefficient of $x^n$ does not change any more. Here is a way to restate this condition more rigorously: There is a finite subset M of I such that every finite subset J of I satisfying M ⊆ J ⊆ I satisfies
$$[x^n] \Biggl( \sum_{i \in M} a_i \Biggr) = [x^n] \Biggl( \sum_{i \in J} a_i \Biggr).$$
(The subset M here is the indexing set of our finite partial sum, and the set J is what it becomes if we add some further $a_i$'s to this partial sum.)
This condition is a mouthful; this is why we found it easier to boil it down to the simple "all but finitely many i ∈ I satisfy $[x^n] a_i = 0$" condition in the case of infinite sums. However, in the case of infinite products, we cannot boil it down to something this simple; thus, we have to live with it.
Fortunately, we can simplify our life by giving this condition a name. To highlight the analogy, let us define it both for sums and for products:

Definition 3.11.1. Let $(a_i)_{i \in I} \in K[[x]]^I$ be a (possibly infinite) family of FPSs. Let n ∈ N. Let M be a finite subset of I.
(a) We say that M determines the x^n-coefficient in the sum of $(a_i)_{i \in I}$ if every finite subset J of I satisfying M ⊆ J ⊆ I satisfies
$$[x^n] \Biggl( \sum_{i \in J} a_i \Biggr) = [x^n] \Biggl( \sum_{i \in M} a_i \Biggr).$$
(You can think of this condition as saying "If you add any further $a_i$s to the sum $\sum_{i \in M} a_i$, then the x^n-coefficient stays unchanged", or, more informally: "If you want to know the x^n-coefficient of $\sum_{i \in I} a_i$, it suffices to take the partial sum over all i ∈ M".)
(b) We say that M determines the x^n-coefficient in the product of $(a_i)_{i \in I}$ if every finite subset J of I satisfying M ⊆ J ⊆ I satisfies
$$[x^n] \Biggl( \prod_{i \in J} a_i \Biggr) = [x^n] \Biggl( \prod_{i \in M} a_i \Biggr).$$
(You can think of this condition as saying "If you multiply any further $a_i$s to the product $\prod_{i \in M} a_i$, then the x^n-coefficient stays unchanged", or, more informally: "If you want to know the x^n-coefficient of $\prod_{i \in I} a_i$, it suffices to take the partial product over all i ∈ M".)

Example 3.11.2. (a) Consider the family
$$\left( \left( x + x^2 \right)^i \right)_{i \in \mathbb{N}} = \left( \left( x + x^2 \right)^0, \left( x + x^2 \right)^1, \left( x + x^2 \right)^2, \left( x + x^2 \right)^3, \left( x + x^2 \right)^4, \ldots \right) = \left( 1,\; x + x^2,\; x^2 + 2x^3 + x^4,\; x^3 + 3x^4 + 3x^5 + x^6,\; x^4 + 4x^5 + 6x^6 + 4x^7 + x^8,\; \ldots \right)$$
of FPSs. The subset {2, 3} of N determines the x^3-coefficient in the sum of this family $\left( \left( x + x^2 \right)^i \right)_{i \in \mathbb{N}}$, because every finite subset J of N satisfying {2, 3} ⊆ J ⊆ N satisfies
$$[x^3] \Biggl( \sum_{i \in J} \left( x + x^2 \right)^i \Biggr) = [x^3] \Biggl( \sum_{i \in \{2,3\}} \left( x + x^2 \right)^i \Biggr)$$
(this is simply a consequence of the fact that the only two entries of our family that have a nonzero x^3-coefficient are the entries $\left( x + x^2 \right)^i$ for i ∈ {2, 3}). Thus, any finite subset of N that contains {2, 3} as a subset determines the x^3-coefficient in the sum of this family $\left( \left( x + x^2 \right)^i \right)_{i \in \mathbb{N}}$.
(b) Consider the family
$$\left( 1 + x^i \right)_{i \in \mathbb{N}} = \left( 1 + 1,\; 1 + x,\; 1 + x^2,\; 1 + x^3,\; 1 + x^4,\; \ldots \right)$$
of FPSs over K = Z. The subset {0, 1, 2, 3} of N determines the x^3-coefficient in the product of this family $\left( 1 + x^i \right)_{i \in \mathbb{N}}$, because every finite subset J of N satisfying {0, 1, 2, 3} ⊆ J ⊆ N satisfies
$$[x^3] \Biggl( \prod_{i \in J} \left( 1 + x^i \right) \Biggr) = [x^3] \Biggl( \prod_{i \in \{0,1,2,3\}} \left( 1 + x^i \right) \Biggr).$$
(This is because multiplying an FPS by any of the polynomials $1 + x^4, 1 + x^5, 1 + x^6, \ldots$ leaves its x^3-coefficient unchanged.) Thus, any finite subset of N that contains {0, 1, 2, 3} as a subset determines the x^3-coefficient in the product of this family $\left( 1 + x^i \right)_{i \in \mathbb{N}}$.
On the other hand, the subset {0, 3} of N does not determine the x^3-coefficient in the product of $\left( 1 + x^i \right)_{i \in \mathbb{N}}$. To see this, it suffices to notice that
$$\underbrace{[x^3] \Biggl( \prod_{i \in \{0,1,2,3\}} \left( 1 + x^i \right) \Biggr)}_{= [x^3]\left( \left(1 + x^0\right)\left(1 + x^1\right)\left(1 + x^2\right)\left(1 + x^3\right) \right) = 4} \neq \underbrace{[x^3] \Biggl( \prod_{i \in \{0,3\}} \left( 1 + x^i \right) \Biggr)}_{= [x^3]\left( \left(1 + x^0\right)\left(1 + x^3\right) \right) = 2}.$$
(The philosophical reason is that, even though the monomial $x^3$ itself does not appear in any of the entries $1 + x^1$ and $1 + x^2$, it does emerge in the product of these two entries with the constant term of $\prod_{i \in \{0,3\}} \left( 1 + x^i \right) = (1 + 1)\left(1 + x^3\right)$.)
(c) Here is a simple but somewhat slippery example: Let I = {1, 2, 3}, and define the FPS $a_i = 1 + (-1)^i x$ over K = Z for each i ∈ I (so that $a_1 = a_3 = 1 - x$ and $a_2 = 1 + x$). Consider the finite family $(a_i)_{i \in I}$. Its product is already well-defined by dint of its finiteness:
$$\prod_{i \in I} a_i = (1 - x)(1 + x)(1 - x) = 1 - x - x^2 + x^3.$$
But let us nevertheless see which subsets M of I determine the x^1-coefficient in the product of $(a_i)_{i \in I}$. The subset I itself clearly does (since the only finite subset J of I satisfying I ⊆ J ⊆ I is I itself). The subset {1} of I, however, does not, even though the product $\prod_{i \in I} a_i$ has the same x^1-coefficient as its subproduct $\prod_{i \in \{1\}} a_i$. (To see why {1} does not determine the x^1-coefficient in the product of $(a_i)_{i \in I}$, it suffices to check that not every finite subset J of I satisfying {1} ⊆ J ⊆ I satisfies the equality
$$[x^1] \Biggl( \prod_{i \in J} a_i \Biggr) = [x^1] \Biggl( \prod_{i \in \{1\}} a_i \Biggr).$$
For example, J = {1, 2} does not, since $[x^1] \left( \prod_{i \in \{1\}} a_i \right) = -1$ but $[x^1] \left( \prod_{i \in \{1,2\}} a_i \right) = 0$.)
This shows that, in order for a subset M of I to determine the x^n-coefficient in the product of a family $(a_i)_{i \in I}$, it does not suffice to check that $[x^n] \left( \prod_{i \in I} a_i \right) = [x^n] \left( \prod_{i \in M} a_i \right)$; it rather needs to be shown that $[x^n] \left( \prod_{i \in J} a_i \right) = [x^n] \left( \prod_{i \in M} a_i \right)$ for each finite subset J of I satisfying M ⊆ J ⊆ I.

Definition 3.11.3. Let $(a_i)_{i \in I} \in K[[x]]^I$ be a (possibly infinite) family of FPSs. Let n ∈ N.
(a) We say that the x^n-coefficient in the sum of $(a_i)_{i \in I}$ is finitely determined if there is a finite subset M of I that determines the x^n-coefficient in the sum of $(a_i)_{i \in I}$.
(b) We say that the x^n-coefficient in the product of $(a_i)_{i \in I}$ is finitely determined if there is a finite subset M of I that determines the x^n-coefficient in the product of $(a_i)_{i \in I}$.

Using these concepts, we can now reword our definition of infinite sums as follows:

Proposition 3.11.4. Let $(a_i)_{i \in I} \in K[[x]]^I$ be a (possibly infinite) family of FPSs. Then:
(a) The family $(a_i)_{i \in I}$ is summable if and only if each coefficient in its sum is finitely determined (i.e., for each n ∈ N, the x^n-coefficient in the sum of $(a_i)_{i \in I}$ is finitely determined).
(b) If the family $(a_i)_{i \in I}$ is summable, then its sum $\sum_{i \in I} a_i$ is the FPS whose x^n-coefficient (for any n ∈ N) can be computed as follows: If n ∈ N, and if M is a finite subset of I that determines the x^n-coefficient in the sum of $(a_i)_{i \in I}$, then
$$[x^n] \Biggl( \sum_{i \in I} a_i \Biggr) = [x^n] \Biggl( \sum_{i \in M} a_i \Biggr).$$

Proof. Easy and LTTR.


Inspired by Proposition 3.11.4, we can now define infinite products of FPSs
at last:

Definition 3.11.5. Let $(a_i)_{i \in I}$ be a (possibly infinite) family of FPSs. Then:
(a) The family $(a_i)_{i \in I}$ is said to be multipliable if and only if each coefficient in its product is finitely determined.
(b) If the family $(a_i)_{i \in I}$ is multipliable, then its product $\prod_{i \in I} a_i$ is defined to be the FPS whose x^n-coefficient (for any n ∈ N) can be computed as follows: If n ∈ N, and if M is a finite subset of I that determines the x^n-coefficient in the product of $(a_i)_{i \in I}$, then
$$[x^n] \Biggl( \prod_{i \in I} a_i \Biggr) = [x^n] \Biggl( \prod_{i \in M} a_i \Biggr).$$

Proposition 3.11.6. This definition of $\prod_{i \in I} a_i$ is well-defined – i.e., the coefficient $[x^n] \left( \prod_{i \in M} a_i \right)$ does not depend on M (as long as M is a finite subset of I that determines the x^n-coefficient in the product of $(a_i)_{i \in I}$).

Proof. Let n ∈ N. We need to check that the coefficient $[x^n] \left( \prod_{i \in M} a_i \right)$ does not depend on M (as long as M is a finite subset of I that determines the x^n-coefficient in the product of $(a_i)_{i \in I}$). In other words, we need to check that if $M_1$ and $M_2$ are two finite subsets of I that each determine the x^n-coefficient in the product of $(a_i)_{i \in I}$, then
$$[x^n] \Biggl( \prod_{i \in M_1} a_i \Biggr) = [x^n] \Biggl( \prod_{i \in M_2} a_i \Biggr). \tag{113}$$
So let us prove this. Let $M_1$ and $M_2$ be two finite subsets of I that each determine the x^n-coefficient in the product of $(a_i)_{i \in I}$. Thus, in particular, $M_1$ determines the x^n-coefficient in the product of $(a_i)_{i \in I}$. In other words, every finite subset J of I satisfying $M_1 \subseteq J \subseteq I$ satisfies
$$[x^n] \Biggl( \prod_{i \in J} a_i \Biggr) = [x^n] \Biggl( \prod_{i \in M_1} a_i \Biggr).$$
Applying this to $J = M_1 \cup M_2$, we obtain
$$[x^n] \Biggl( \prod_{i \in M_1 \cup M_2} a_i \Biggr) = [x^n] \Biggl( \prod_{i \in M_1} a_i \Biggr) \tag{114}$$
(since $M_1 \cup M_2$ is a subset of I satisfying $M_1 \subseteq M_1 \cup M_2 \subseteq I$). The same argument (with the roles of $M_1$ and $M_2$ swapped) yields
$$[x^n] \Biggl( \prod_{i \in M_2 \cup M_1} a_i \Biggr) = [x^n] \Biggl( \prod_{i \in M_2} a_i \Biggr). \tag{115}$$
The left hand sides of the equalities (114) and (115) are equal (since $M_1 \cup M_2 = M_2 \cup M_1$). Thus, the right hand sides are equal as well. In other words, $[x^n] \left( \prod_{i \in M_1} a_i \right) = [x^n] \left( \prod_{i \in M_2} a_i \right)$. Thus, we have proved (113), and with it Proposition 3.11.6.
The attentive (and pedantic) reader might notice that there is one more thing that needs to be checked in order to make sure that Definition 3.11.5 (b) is legitimate. In fact, this definition does not merely define (some) infinite products $\prod_{i \in I} a_i$ of FPSs, but also "accidentally" gives a new meaning to finite products $\prod_{i \in I} a_i$ (since a finite family $(a_i)_{i \in I}$ of FPSs is always multipliable). We therefore need to check that this new meaning does not conflict with the original definition of a finite product of elements of a commutative ring. In other words, we need to prove the following:

Proposition 3.11.7. Let $(a_i)_{i \in I}$ be a finite family of FPSs. Then, the product $\prod_{i \in I} a_i$ defined according to Definition 3.11.5 (b) equals the finite product $\prod_{i \in I} a_i$ defined in the usual way (i.e., defined as in any commutative ring).

Proof. Argue that I itself is a subset of I that determines all coefficients in the product of $(a_i)_{i \in I}$. See Section B.2 for a detailed proof.

We shall apply Convention 3.2.14 to infinite products just like we have been applying it to infinite sums. For instance, the product sign $\prod_{k=m}^{\infty}$ (for a fixed m ∈ Z) means $\prod_{k \in \{m, m+1, m+2, \ldots\}}$.

3.11.3. Why $\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right)$ works and $\prod_{i \in \mathbb{N}} (1 + ix)$ doesn't

Let us now see how Definition 3.11.5 legitimizes our product $\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right)$ from Subsection 3.11.1. Indeed,
$$\prod_{i \in \mathbb{N}} \left(1 + x^{2^i}\right) = \left(1 + x^1\right)\left(1 + x^2\right)\left(1 + x^4\right)\left(1 + x^8\right) \cdots.$$
If you want to compute the x^6-coefficient in this product, you only need to multiply the first 3 factors $\left(1 + x^1\right)\left(1 + x^2\right)\left(1 + x^4\right)$; none of the other factors will change this coefficient in any way, because multiplying an FPS by $1 + x^m$ (for some m > 0) does not change its first m coefficients^38. Likewise, if you want to compute the x^13-coefficient of the above product, then you only need to multiply the first 4 factors; none of the others will have any effect on this coefficient. The same logic applies to the x^n-coefficient for any n ∈ N; it is determined by the first $\lfloor \log_2 n \rfloor + 1$ factors of the product. Thus, each coefficient in the product is finitely determined. This means that the family is multipliable; thus, its product makes sense.
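This finite determination is easy to watch in action with truncated polynomial arithmetic. In the Python sketch below (ours, not from the notes), the x^6-coefficient computed from the first 3 factors is unchanged by multiplying in any number of further factors:

```python
def mul_trunc(f, g, N):
    """Product of two coefficient lists, truncated to degree < N."""
    h = [0] * N
    for i, fi in enumerate(f[:N]):
        for j, gj in enumerate(g[:N - i]):
            h[i + j] += fi * gj
    return h

def partial_product(num_factors, N):
    """Product of (1 + x^{2^i}) for i = 0, ..., num_factors - 1, mod x^N."""
    prod = [1] + [0] * (N - 1)
    for i in range(num_factors):
        factor = [0] * N
        factor[0] = 1
        if 2 ** i < N:
            factor[2 ** i] = 1
        prod = mul_trunc(prod, factor, N)
    return prod

# The first 3 factors (1 + x)(1 + x^2)(1 + x^4) already determine the x^6-coefficient:
assert partial_product(3, 7)[6] == partial_product(10, 7)[6] == 1
```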
In contrast, the product
$$(1 + 0x)(1 + 1x)(1 + 2x)(1 + 3x)(1 + 4x) \cdots = \prod_{i \in \mathbb{N}} (1 + ix)$$
does not make sense. Indeed, its x^1-coefficient is not finitely determined (any of the factors other than 1 + 0x affects it), so the family $(1 + ix)_{i \in \mathbb{N}}$ is not multipliable.
Likewise, the product
$$\left(1 + \frac{x}{2^1}\right)\left(1 + \frac{x}{2^2}\right)\left(1 + \frac{x}{2^3}\right) \cdots = \prod_{i \in \{1,2,3,\ldots\}} \left(1 + \frac{x}{2^i}\right)$$
does not make sense in K [[x]] (although the analogous product in complex analysis defines a holomorphic function).

3.11.4. A general criterion for multipliability

Recall our reasoning that we used above to prove that the family $\left(1 + x^{2^i}\right)_{i \in \mathbb{N}}$ is multipliable. The core of this reasoning was the observation that multiplying an FPS by $1 + x^m$ (for some m > 0) does not change its first m coefficients. This can be generalized: If f ∈ K [[x]] is an FPS whose first m coefficients are 0 (for example, f can be $x^m$, in which case we recover the statement in our preceding sentence), then multiplying an FPS a by 1 + f does not change its first m coefficients (that is, the first m coefficients of a(1 + f) are the first m coefficients of a). This is a useful fact, so let us state it as a lemma (renaming m as n + 1):

Lemma 3.11.8. Let a, f ∈ K [[x]] be two FPSs. Let n ∈ N. Assume that
$$[x^m] f = 0 \qquad \text{for each } m \in \{0, 1, \ldots, n\}. \tag{116}$$

^38 For example, let us check this for m = 3: If we multiply an FPS $a_0 x^0 + a_1 x^1 + a_2 x^2 + \cdots$ by $1 + x^3$, then we obtain
$$\left(a_0 x^0 + a_1 x^1 + a_2 x^2 + \cdots\right)\left(1 + x^3\right) = a_0 x^0 + a_1 x^1 + a_2 x^2 + (a_3 + a_0) x^3 + (a_4 + a_1) x^4 + (a_5 + a_2) x^5 + \cdots,$$
and so the first 3 coefficients are left unchanged.



Then,
$$[x^m]\left(a(1+f)\right) = [x^m]\, a \qquad \text{for each } m \in \{0, 1, \ldots, n\}.$$

Proof of Lemma 3.11.8. The FPS $af$ is a multiple of $f$ (since $af = fa$). Hence, Lemma 3.3.21 (applied to $u = f$ and $v = af$) yields that
$$[x^m]\,(af) = 0 \qquad \text{for each } m \in \{0, 1, \ldots, n\} \tag{117}$$
(since we have assumed that $[x^m]\, f = 0$ for each $m \in \{0, 1, \ldots, n\}$).
Now, for each $m \in \{0, 1, \ldots, n\}$, we have
$$[x^m]\left(a(1+f)\right) = [x^m]\,(a + af) = [x^m]\, a + [x^m]\,(af) \qquad \text{(by (20))}$$
$$= [x^m]\, a + 0 = [x^m]\, a \qquad \text{(by (117))}.$$
This proves Lemma 3.11.8.
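Lemma 3.11.8 is easy to check experimentally (our own illustration, with truncated series and arbitrary sample coefficients): if the first $n+1$ coefficients of $f$ vanish, then $a(1+f)$ has the same first $n+1$ coefficients as $a$.

```python
N = 12

def mul_trunc(a, b):
    """Multiply two truncated power series (lists of N coefficients)."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

n = 4
a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]   # an arbitrary FPS a (truncated)
f = [0] * (n + 1) + [7, 2, 0, 1, 0, 0, 4]  # [x^m] f = 0 for m = 0, ..., n

one_plus_f = [1 + f[0]] + f[1:]
prod = mul_trunc(a, one_plus_f)

# The first n+1 coefficients of a(1+f) agree with those of a:
print(prod[: n + 1] == a[: n + 1])  # → True
```

Beyond the $x^n$-coefficient the two series do differ (here, the $x^5$-coefficient of $a(1+f)$ picks up the extra term $7a_0$).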
For convenience, let us extend Lemma 3.11.8 to products of several factors:

Lemma 3.11.9. Let $a \in K[[x]]$ be an FPS. Let $(f_i)_{i\in J} \in K[[x]]^J$ be a finite family of FPSs. Let $n \in \mathbb{N}$. Assume that each $i \in J$ satisfies
$$[x^m]\,(f_i) = 0 \qquad \text{for each } m \in \{0, 1, \ldots, n\}. \tag{118}$$
Then,
$$[x^m]\left(a \prod_{i\in J}(1+f_i)\right) = [x^m]\, a \qquad \text{for each } m \in \{0, 1, \ldots, n\}.$$

Proof of Lemma 3.11.9. This is just Lemma 3.11.8, applied several times (specifi-
cally, | J | many times). See Section B.2 for a detailed proof.
Now, using Lemma 3.11.9, we can obtain the following convenient criterion
for multipliability:

Theorem 3.11.10. Let $(f_i)_{i\in I} \in K[[x]]^I$ be a (possibly infinite) summable family of FPSs. Then, the family $(1+f_i)_{i\in I}$ is multipliable.

Proof of Theorem 3.11.10. This is an easy consequence of Lemma 3.11.9. See Sec-
tion B.2 for a detailed proof.
We notice two simple sufficient (if rarely satisfied) criteria for multipliability:

Proposition 3.11.11. If all but finitely many entries of a family (ai )i∈ I ∈
K [[ x ]] I equal 1 (that is, if all but finitely many i ∈ I satisfy ai = 1), then
this family is multipliable.

Proof. LTTR. (See Section B.2 for a detailed proof.)

Remark 3.11.12. If a family (ai )i∈ I ∈ K [[ x ]] I contains 0 as an entry (i.e., if


there exists an i ∈ I such that ai = 0), then this family is automatically
multipliable, and its product is 0.

Proof. Assume that the family (ai )i∈ I contains 0 as an entry. That is, there exists
some j ∈ I such that a j = 0. Consider this j. Now, it is easy to see that the
subset { j} of I determines all coefficients in the product of (ai )i∈ I . The details
are LTTR.

3.11.5. x n -approximators
Working with multipliable families gets slightly easier using the following no-
tion:

Definition 3.11.13. Let (ai )i∈ I ∈ K [[ x ]] I be a family of FPSs. Let n ∈ N. An


x n -approximator for (ai )i∈ I means a finite subset M of I that determines the
first n + 1 coefficients in the product of (ai )i∈ I . (In other words, M has to de-
termine the x m -coefficient in the product of (ai )i∈ I for each m ∈ {0, 1, . . . , n}.)

The name "$x^n$-approximator" is supposed to hint at the fact that if $M$ is an $x^n$-approximator for a multipliable family $(a_i)_{i\in I}$, then the (finite) subproduct $\prod_{i\in M} a_i$ "approximates" the full product $\prod_{i\in I} a_i$ up until the $x^n$-coefficient (i.e., the first $n+1$ coefficients of $\prod_{i\in M} a_i$ equal the respective coefficients of $\prod_{i\in I} a_i$). See Proposition 3.11.16 (b) below for the precise statement of this fact.
Clearly, an $x^n$-approximator for a family $(a_i)_{i\in I}$ always determines the $x^n$-coefficient in the product of $(a_i)_{i\in I}$. But the converse is not true, as the following example shows:
Clearly, an x n -approximator for a family (ai )i∈ I always determines the x n -
coefficient in the product of (ai )i∈ I . But the converse is not true, as the following
example shows:

Example 3.11.14. Consider the family
$$\left(1+x^{2^i}\right)_{i\in\mathbb{N}} = \left(1+x^1,\ 1+x^2,\ 1+x^4,\ 1+x^8,\ \ldots\right)$$
of FPSs. The finite subset $\{1, 2\}$ of $\mathbb{N}$ determines the $x^6$-coefficient in the product of this family (indeed, the $x^6$-coefficient of the product $\left(1+x^2\right)\left(1+x^4\right)$ is 1, and this does not change if we multiply any further factors onto this product), but is not an $x^6$-approximator for this family (since, e.g., it does not determine the $x^5$-coefficient in its product).
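This example can also be replayed numerically (our own toy check, with truncated series and the hypothetical helpers `mul_trunc` and `factor`): the subset $\{1, 2\}$ pins down the $x^6$-coefficient of $\prod_{i\in\mathbb{N}}\left(1+x^{2^i}\right)$ but not the $x^5$-coefficient.

```python
N = 16

def mul_trunc(a, b):
    """Multiply two truncated power series (lists of N coefficients)."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def factor(i):
    """The factor 1 + x^(2^i), truncated modulo x^N."""
    f = [0] * N
    f[0] = 1
    if 2 ** i < N:
        f[2 ** i] = 1
    return f

# Subproduct over {1, 2}:  (1 + x^2)(1 + x^4)
sub = mul_trunc(factor(1), factor(2))
# Larger product over {0, 1, 2, 3}:  (1 + x)(1 + x^2)(1 + x^4)(1 + x^8)
full = mul_trunc(mul_trunc(sub, factor(0)), factor(3))

print(sub[6], full[6])  # → 1 1  (the x^6-coefficient is already settled)
print(sub[5], full[5])  # → 0 1  (the x^5-coefficient is not)
```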

Lemma 3.11.15. Let $(a_i)_{i\in I} \in K[[x]]^I$ be a multipliable family of FPSs. Let $n \in \mathbb{N}$. Then, there exists an $x^n$-approximator for $(a_i)_{i\in I}$.

Proof of Lemma 3.11.15 (sketched). This is an easy consequence of the fact that a
union of finitely many finite sets is finite. A detailed proof can be found in
Section B.2.
As promised above, we can use x n -approximators to “approximate” infinite
products of FPSs (in the sense of: compute the first n + 1 coefficients of these
products). Here is why this works:39

Proposition 3.11.16. Let $(a_i)_{i\in I} \in K[[x]]^I$ be a family of FPSs. Let $n \in \mathbb{N}$. Let $M$ be an $x^n$-approximator for $(a_i)_{i\in I}$. Then:
(a) Every finite subset $J$ of $I$ satisfying $M \subseteq J \subseteq I$ satisfies
$$\prod_{i\in J} a_i \overset{x^n}{\equiv} \prod_{i\in M} a_i.$$
(b) If the family $(a_i)_{i\in I}$ is multipliable, then
$$\prod_{i\in I} a_i \overset{x^n}{\equiv} \prod_{i\in M} a_i.$$

Proof. This follows easily from Definition 3.11.13 and Definition 3.11.5 (b). See
Section B.2 for a detailed proof.

3.11.6. Properties of infinite products


We shall now establish some general properties of infinite products. By and
large, these properties are analogous to corresponding properties of finite prod-
ucts, although they need a few technical requirements (see Remark 3.11.20 for
why).
We begin with the first property, which says in essence that a product can be
broken into two parts:

Proposition 3.11.17. Let $(a_i)_{i\in I} \in K[[x]]^I$ be a family of FPSs. Let $J$ be a subset of $I$. Assume that the subfamilies $(a_i)_{i\in J}$ and $(a_i)_{i\in I\setminus J}$ are multipliable. Then:
(a) The entire family $(a_i)_{i\in I}$ is multipliable.

$^{39}$See Definition 3.10.1 for the meaning of the symbol "$\overset{x^n}{\equiv}$" appearing in this proposition.

(b) We have
$$\prod_{i\in I} a_i = \left(\prod_{i\in J} a_i\right) \cdot \left(\prod_{i\in I\setminus J} a_i\right).$$

Proof of Proposition 3.11.17 (sketched). Here is the idea: Fix $n \in \mathbb{N}$. Lemma 3.11.15 (applied to $J$ instead of $I$) shows that there exists an $x^n$-approximator $U$ for $(a_i)_{i\in J}$. Consider this $U$. Lemma 3.11.15 (applied to $I \setminus J$ instead of $I$) shows that there exists an $x^n$-approximator $V$ for $(a_i)_{i\in I\setminus J}$. Consider this $V$. Note that $U \cup V$ is finite (since $U$ and $V$ are finite). Now, it is not hard to see that $U \cup V$ determines the $x^n$-coefficient in the product of $(a_i)_{i\in I}$ (indeed, it is not much harder to see that $U \cup V$ is an $x^n$-approximator for $(a_i)_{i\in I}$). Hence, the $x^n$-coefficient in the product of $(a_i)_{i\in I}$ is finitely determined (since $U \cup V$ is finite). Now, forget that we fixed $n$, and conclude that the family $(a_i)_{i\in I}$ is multipliable. This proves part (a). Part (b) easily follows using Proposition 3.11.16 (b).
The details of this proof can be found in Section B.2.
A useful particular case of Proposition 3.11.17 is obtained when the subset $J$ is a single-element set $\{j\}$. In this case, the proposition says that
$$\prod_{i\in I} a_i = a_j \cdot \prod_{i\in I\setminus\{j\}} a_i$$
(assuming that the family $(a_i)_{i\in I\setminus\{j\}}$ is multipliable$^{40}$). This rule allows us to split off any factor from a multipliable product, as long as the rest of the product is still multipliable.
   
Our next property generalizes the classical rule $\prod_{i\in I}(a_i b_i) = \left(\prod_{i\in I} a_i\right) \cdot \left(\prod_{i\in I} b_i\right)$ of finite products to infinite ones:

Proposition 3.11.18. Let $(a_i)_{i\in I} \in K[[x]]^I$ and $(b_i)_{i\in I} \in K[[x]]^I$ be two multipliable families of FPSs. Then:
(a) The family $(a_i b_i)_{i\in I}$ is multipliable.
(b) We have
$$\prod_{i\in I}(a_i b_i) = \left(\prod_{i\in I} a_i\right) \cdot \left(\prod_{i\in I} b_i\right).$$

Proof of Proposition 3.11.18 (sketched). Here is the idea: Fix $n \in \mathbb{N}$. Lemma 3.11.15 shows that there exists an $x^n$-approximator $U$ for $(a_i)_{i\in I}$. Consider this $U$. Lemma 3.11.15 (applied to $b_i$ instead of $a_i$) shows that there exists an $x^n$-approximator $V$ for $(b_i)_{i\in I}$. Consider this $V$. Note that $U \cup V$ is finite (since $U$ and $V$ are finite). Now, it is not hard to see that $U \cup V$ is an $x^n$-approximator for $(a_i b_i)_{i\in I}$. From here, proceed as in the proof of Proposition 3.11.17.

$^{40}$The one-element family $(a_i)_{i\in\{j\}}$ is, of course, always multipliable.
The details of this proof can be found in Section B.2.
Next comes an analogous property for quotients instead of products:

Proposition 3.11.19. Let $(a_i)_{i\in I} \in K[[x]]^I$ and $(b_i)_{i\in I} \in K[[x]]^I$ be two multipliable families of FPSs. Assume that the FPS $b_i$ is invertible for each $i \in I$. Then:
(a) The family $\left(\dfrac{a_i}{b_i}\right)_{i\in I}$ is multipliable.
(b) We have
$$\prod_{i\in I} \frac{a_i}{b_i} = \frac{\prod_{i\in I} a_i}{\prod_{i\in I} b_i}.$$

Proof of Proposition 3.11.19 (sketch). This is similar to the proof of Proposition


3.11.18, but using (102) instead of Lemma 3.3.22. The details of this proof can
be found in Section B.2.
Now we come to an annoying technicality. In Proposition 3.2.12, we have
learnt that any subfamily of a summable family is again summable. Alas, the
analogous fact for multipliability is false:

Remark 3.11.20. Not every subfamily of a multipliable family is itself multi-


pliable. For example, for K = Z, the family (0, 1, 2, 3, . . .) is multipliable, but
its subfamily (1, 2, 3, . . .) is not.

Nevertheless, if a family of invertible FPSs is multipliable, then so is any


subfamily of it. In other words:

Proposition 3.11.21. Let $(a_i)_{i\in I} \in K[[x]]^I$ be a multipliable family of invertible FPSs. Then, any subfamily of $(a_i)_{i\in I}$ is multipliable.

Proof of Proposition 3.11.21 (sketched). This is another proof in the tradition of the proofs of Proposition 3.11.17 and Proposition 3.11.18. We must show that the family $(a_i)_{i\in J}$ is multipliable whenever $J$ is a subset of $I$. The idea is to show that if $U$ is an $x^n$-approximator for $(a_i)_{i\in I}$, then $U \cap J$ determines the $x^n$-coefficient in the product of $(a_i)_{i\in J}$ (and, in fact, is an $x^n$-approximator for $(a_i)_{i\in J}$). This relies on the invertibility of $\prod_{i\in U\setminus J} a_i$; this is why the FPSs $a_i$ are required to be invertible in the proposition.
The details of this proof can be found in Section B.2.

As Remark 3.11.20 shows, Proposition 3.11.21 would not hold without the
word “invertible”.
Now let us state a trivial rule that is nevertheless worth stating. Recall that finite products can be reindexed using a bijection – i.e., if $f : S \to T$ is a bijection between two finite sets $S$ and $T$, then any product $\prod_{t\in T} a_t$ can be rewritten as $\prod_{s\in S} a_{f(s)}$. The same holds for infinite products:

Proposition 3.11.22. Let $S$ and $T$ be two sets. Let $f : S \to T$ be a bijection. Let $(a_t)_{t\in T} \in K[[x]]^T$ be a multipliable family of FPSs. Then,
$$\prod_{t\in T} a_t = \prod_{s\in S} a_{f(s)}$$
(and, in particular, the product on the right hand side is well-defined, i.e., the family $\left(a_{f(s)}\right)_{s\in S}$ is multipliable).

In other words, if we reindex a multipliable family of FPSs (using a bijection),


then the resulting family will still be multipliable and have the same product
as the original family.
Proof of Proposition 3.11.22 (sketched). This is a trivial consequence of the defini-
tions of multipliability and infinite products.
Next comes a rule that allows us to break an infinite product into any amount
(possibly infinite!) of subproducts:

Proposition 3.11.23. Let $(a_s)_{s\in S} \in K[[x]]^S$ be a multipliable family of FPSs. Let $W$ be a set. Let $f : S \to W$ be a map. Assume that for each $w \in W$, the family $(a_s)_{s\in S;\ f(s)=w}$ is multipliable.$^{41}$ Then,
$$\prod_{s\in S} a_s = \prod_{w\in W} \prod_{\substack{s\in S;\\ f(s)=w}} a_s. \tag{119}$$
(In particular, the right hand side is well-defined – i.e., the family $\left(\prod_{\substack{s\in S;\\ f(s)=w}} a_s\right)_{w\in W}$ is multipliable.)

$^{41}$Note that this assumption automatically holds if we assume that all FPSs $a_s$ are invertible. Indeed, the family $(a_s)_{s\in S;\ f(s)=w}$ is a subfamily of the multipliable family $(a_s)_{s\in S}$ and thus must itself be multipliable if all the $a_s$ are invertible (by Proposition 3.11.21).

Proof of Proposition 3.11.23 (sketched). This can be derived from the analogous
property of finite products, since all coefficients in a multipliable product are
finitely determined (conveniently using x n -approximators, which determine
several coefficients at the same time). See Section B.2 for the details of this
proof.
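The grouping rule (119) is easy to watch in action on a finite family (our own toy check over the integers, with an arbitrary sample family): grouping the factors by the fibers of a map $f$ does not change the product.

```python
from math import prod as mproduct

a = {s: 1 + s for s in range(1, 7)}  # a finite family a_1, ..., a_6
f = lambda s: s % 2                  # the grouping map f : S -> W = {0, 1}

lhs = mproduct(a.values())           # product over all of S
rhs = mproduct(
    mproduct(a[s] for s in a if f(s) == w)  # subproduct over the fiber of w
    for w in (0, 1)
)
print(lhs, rhs)  # → 5040 5040
```

For infinite products the same regrouping holds, but only under the multipliability hypotheses of the proposition.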
The next rule allows (under certain technical conditions) to interchange prod-
uct signs, and to rewrite a nested product as a product over pairs:

Proposition 3.11.24 (Fubini rule for infinite products of FPSs). Let $I$ and $J$ be two sets. Let $\left(a_{(i,j)}\right)_{(i,j)\in I\times J} \in K[[x]]^{I\times J}$ be a multipliable family of FPSs. Assume that for each $i \in I$, the family $\left(a_{(i,j)}\right)_{j\in J}$ is multipliable. Assume that for each $j \in J$, the family $\left(a_{(i,j)}\right)_{i\in I}$ is multipliable. Then,
$$\prod_{i\in I} \prod_{j\in J} a_{(i,j)} = \prod_{(i,j)\in I\times J} a_{(i,j)} = \prod_{j\in J} \prod_{i\in I} a_{(i,j)}.$$
(In particular, all the products appearing in this equality are well-defined.)

Proof of Proposition 3.11.24 (sketched). The first equality follows by applying Proposition 3.11.23 to $S = I \times J$ and $W = I$ and $f(i,j) = i$ (and appropriately reindexing the products). The second equality is analogous. See Section B.2 for the details of this proof.
The above rules (Proposition 3.11.17, Proposition 3.11.18, Proposition 3.11.19, Proposition 3.11.21, Proposition 3.11.22, Proposition 3.11.23, Proposition 3.11.24) show that infinite products (as we have defined them) are well-behaved – i.e., they satisfy the usual rules that finite products satisfy, with only one minor caveat: Subfamilies of multipliable families might fail to be multipliable (as we saw in Remark 3.11.20). If we restrict ourselves to multipliable families of invertible FPSs, then even this caveat is avoided:

Proposition 3.11.25 (Fubini rule for infinite products of FPSs, invertible case). Let $I$ and $J$ be two sets. Let $\left(a_{(i,j)}\right)_{(i,j)\in I\times J} \in K[[x]]^{I\times J}$ be a multipliable family of invertible FPSs. Then,
$$\prod_{i\in I} \prod_{j\in J} a_{(i,j)} = \prod_{(i,j)\in I\times J} a_{(i,j)} = \prod_{j\in J} \prod_{i\in I} a_{(i,j)}.$$
(In particular, all the products appearing in this equality are well-defined.)

Proof of Proposition 3.11.25 (sketched). See Section B.2.



These rules (and some similar ones, which the reader can easily invent and prove$^{42}$) allow us to work with infinite products almost as comfortably as with finite ones. Of course, we need to check that our families are multipliable, and occasionally verify that a few other technical requirements are met (such as multipliability of subfamilies), but usually such verifications are straightforward and easy and can be done in one's head.
Does this justify our manipulations in Subsection 3.11.1? To some extent. We need to be careful with the telescope principle, whose infinite analogue is rather subtle and needs some qualifications. Here is an example of how not to use the telescope principle:
$$\frac{1}{2} \cdot \frac{2}{2} \cdot \frac{2}{2} \cdot \frac{2}{2} \cdots \neq 1.$$
It is tempting to argue that the infinitely many 2's in these fractions cancel each other out, and yet the 1 that remains is not the right result. See Exercise A.2.13.4 (b) for what an infinite telescope principle should actually look like.
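The fallacy is plain once we look at partial products (our own toy check, using exact rational arithmetic): every partial product of $\frac{1}{2}\cdot\frac{2}{2}\cdot\frac{2}{2}\cdots$ equals $\frac{1}{2}$, so the infinite product is $\frac{1}{2}$, not the 1 that the naive cancellation suggests.

```python
from fractions import Fraction

partial = Fraction(1, 2)        # the first factor, 1/2
values = [partial]
for _ in range(10):             # multiply on factors 2/2 one by one
    partial *= Fraction(2, 2)
    values.append(partial)

print(values[-1])  # → 1/2
```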
Anyway, our computations in Subsection 3.11.1 did not truly need the telescope principle; they can just as well be made using more fundamental rules$^{43}$.

$^{42}$For example, if $(a_i)_{i\in I}$ is a multipliable family of FPSs, and if $k \in \mathbb{N}$, then $\prod_{i\in I} a_i^k = \left(\prod_{i\in I} a_i\right)^k$. If the $a_i$ are moreover invertible, then this equality also holds for all $k \in \mathbb{Z}$.
There is one more rule that we have used in Subsection 3.11.1 and have not
justified yet: the equality (111). We will justify it next.

3.11.7. Product rules (generalized distributive laws)


The equality (111) is an instance of a product rule – a statement of the form "a product of sums can be expanded into one big sum". The simplest product rules are the distributive laws $a(b+c) = ab + ac$ and $(a+b)c = ac + bc$ (here, one of the sums being multiplied is a one-addend sum); one of the next-simplest is $(a+b)(c+d) = ac + ad + bc + bd$. As far as finite sums and finite products are concerned, the following product rule is one of the most general:$^{44}$

Proposition 3.11.26. Let $L$ be a commutative ring. For every $n \in \mathbb{N}$, let $[n]$ denote the set $\{1, 2, \ldots, n\}$.
Let $n \in \mathbb{N}$. For every $i \in [n]$, let $p_{i,1}, p_{i,2}, \ldots, p_{i,m_i}$ be finitely many elements

$^{43}$To wit, we can argue as follows: We have
$$\prod_{i\in\mathbb{N}}\left(1+x^{2^i}\right) = \prod_{i\in\mathbb{N}}\frac{1-x^{2^{i+1}}}{1-x^{2^i}} = \frac{\prod_{i\in\mathbb{N}}\left(1-x^{2^{i+1}}\right)}{\prod_{i\in\mathbb{N}}\left(1-x^{2^i}\right)},$$
where the last step used Proposition 3.11.19 and relied on the fact that both families $\left(1-x^{2^{i+1}}\right)_{i\in\mathbb{N}}$ and $\left(1-x^{2^i}\right)_{i\in\mathbb{N}}$ are multipliable (this is important, but very easy to check in this case) and that each FPS $1-x^{2^i}$ is invertible (because its constant term is 1). However, splitting off the factor for $i = 0$ from the product $\prod_{i\in\mathbb{N}}\left(1-x^{2^i}\right)$, we obtain
$$\prod_{i\in\mathbb{N}}\left(1-x^{2^i}\right) = \underbrace{\left(1-x^{2^0}\right)}_{=1-x^1=1-x} \cdot \underbrace{\prod_{i>0}\left(1-x^{2^i}\right)}_{\substack{=\prod_{i\in\mathbb{N}}\left(1-x^{2^{i+1}}\right)\\ \text{(here, we substituted } i+1 \text{ for } i \text{ in the product)}}} = (1-x) \cdot \prod_{i\in\mathbb{N}}\left(1-x^{2^{i+1}}\right),$$
so that
$$\frac{\prod_{i\in\mathbb{N}}\left(1-x^{2^{i+1}}\right)}{\prod_{i\in\mathbb{N}}\left(1-x^{2^i}\right)} = \frac{1}{1-x}.$$
Hence, $\prod_{i\in\mathbb{N}}\left(1+x^{2^i}\right) = \dfrac{\prod_{i\in\mathbb{N}}\left(1-x^{2^{i+1}}\right)}{\prod_{i\in\mathbb{N}}\left(1-x^{2^i}\right)} = \dfrac{1}{1-x}$. So we don't need the telescope principle to justify this equality.

$^{44}$Keep in mind that an empty Cartesian product (i.e., a Cartesian product of 0 sets) is always a 1-element set; its only element is the 0-tuple $()$. Thus, a sum ranging over an empty Cartesian product has exactly 1 addend.

of $L$. Then,
$$\prod_{i=1}^{n} \sum_{k=1}^{m_i} p_{i,k} = \sum_{(k_1,k_2,\ldots,k_n)\in[m_1]\times[m_2]\times\cdots\times[m_n]} \prod_{i=1}^{n} p_{i,k_i}. \tag{120}$$

We can rewrite (120) in a less abstract way as follows:
$$\left(p_{1,1} + p_{1,2} + \cdots + p_{1,m_1}\right)\left(p_{2,1} + p_{2,2} + \cdots + p_{2,m_2}\right)\cdots\left(p_{n,1} + p_{n,2} + \cdots + p_{n,m_n}\right)$$
$$= p_{1,1} p_{2,1} \cdots p_{n,1} + p_{1,1} p_{2,1} \cdots p_{n-1,1} p_{n,2} + \cdots + p_{1,m_1} p_{2,m_2} \cdots p_{n,m_n},$$
where the right hand side is the sum of all $m_1 m_2 \cdots m_n$ many ways to multiply one addend from each of the factors on the left hand side.
See [Grinbe15, solution to Exercise 6.9] for a formal proof of Proposition
3.11.26. (The idea is to reduce it to the case n = 2 by induction, then to use the
discrete Fubini rule.)
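Proposition 3.11.26 is easy to confirm on a small example (our own illustration, over the integers, with arbitrary sample addends): expanding a product of finite sums gives the sum over all ways to pick one addend from each factor.

```python
from itertools import product
from math import prod as mproduct

# p[i] lists the addends p_{i,1}, p_{i,2}, ... of the i-th factor
p = [[2, 3], [5, 7, 11], [13, 17]]

lhs = mproduct(sum(row) for row in p)                  # (2+3)(5+7+11)(13+17)
rhs = sum(mproduct(choice) for choice in product(*p))  # sum over all 2*3*2 picks
print(lhs, rhs)  # → 3450 3450
```

The right hand side really has $m_1 m_2 m_3 = 12$ addends here, one per element of $[m_1]\times[m_2]\times[m_3]$.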
Let us now move on to product rules for infinite sums and products. First,
let us extend Proposition 3.11.26 to a finite product of infinite sums (which are
now required to be in K [[ x ]] in order to have a notion of summability):

Proposition 3.11.27. For every $n \in \mathbb{N}$, let $[n]$ denote the set $\{1, 2, \ldots, n\}$.
Let $n \in \mathbb{N}$. For every $i \in [n]$, let $(p_{i,k})_{k\in S_i}$ be a summable family of elements of $K[[x]]$. Then,
$$\prod_{i=1}^{n} \sum_{k\in S_i} p_{i,k} = \sum_{(k_1,k_2,\ldots,k_n)\in S_1\times S_2\times\cdots\times S_n} \prod_{i=1}^{n} p_{i,k_i}. \tag{121}$$
In particular, the family $\left(\prod_{i=1}^{n} p_{i,k_i}\right)_{(k_1,k_2,\ldots,k_n)\in S_1\times S_2\times\cdots\times S_n}$ is summable.

Proof. Same method as for Proposition 3.11.26, but now using the discrete Fubini rule for infinite sums.
Proposition 3.11.27 is rather general, but is only concerned with finite products. Thus, it cannot be directly used to justify (111), since the product in (111) is infinite. Thus, we need a product rule for infinite products of sums. Such rules are subtle and require particular care: Not only do our sums have to be summable and our product multipliable, but we also must avoid cases like $(1-1)(1-1)(1-1)\cdots$, which would produce non-summable infinite sums when expanded (despite being multipliable). We also need to be clear about what addends we get when we expand our products: For example, when expanding the product
$$(1+a_0)(1+a_1)(1+a_2)(1+a_3)(1+a_4)\cdots,$$
we should get addends like $a_0 \cdot 1 \cdot a_2 \cdot \underbrace{1 \cdot 1 \cdot 1 \cdots}_{\text{infinitely many 1s}}$ or $1 \cdot a_1 \cdot a_2 \cdot 1 \cdot a_4 \cdot \underbrace{1 \cdot 1 \cdot 1 \cdots}_{\text{infinitely many 1s}}$, but not addends like $a_0 \cdot a_1 \cdot a_2 \cdot a_3 \cdots$; otherwise, the right hand side of (111) would have to include infinite products of $a_i$'s. To filter out the latter kind of addends, let us define the notion of "essentially finite" sequences or families:

Definition 3.11.28. (a) A sequence (k1 , k2 , k3 , . . .) is said to be essentially finite


if all but finitely many i ∈ {1, 2, 3, . . .} satisfy k i = 0.
(b) A family (k i )i∈ I is said to be essentially finite if all but finitely many i ∈ I
satisfy k i = 0.

For example, the sequence (2, 4, 1, 0, 0, 0, 0, . . .) is essentially finite, whereas


the sequence (0, 1, 0, 1, 0, 1, . . .) (which alternates between 0s and 1s) is not.
Of course, Definition 3.11.28 (a) is a particular case of Definition 3.11.28 (b),
since a sequence (k1 , k2 , k3 , . . .) is the same as a family (k i )i∈{1,2,3,...} indexed by
the positive integers. We also remark that Definition 3.2.8 (a) is a particular case
of Definition 3.11.28 (b), since a family of elements of K is one particular type
of family.
Now, we can finally state a version of the product rule for infinite products of
potentially infinite sums. This will help us derive (111) (even though the sums
being multiplied in (111) are finite).

Proposition 3.11.29. Let $S_1, S_2, S_3, \ldots$ be infinitely many sets that all contain the number 0. Set
$$S = \{(i,k) \mid i \in \{1,2,3,\ldots\} \text{ and } k \in S_i \text{ and } k \neq 0\}.$$
For any $i \in \{1,2,3,\ldots\}$ and any $k \in S_i$, let $p_{i,k}$ be an element of $K[[x]]$. Assume that
$$p_{i,0} = 1 \qquad \text{for any } i \in \{1,2,3,\ldots\}. \tag{122}$$
Assume further that the family $(p_{i,k})_{(i,k)\in S}$ is summable. Then, the product $\prod_{i=1}^{\infty} \sum_{k\in S_i} p_{i,k}$ is well-defined (i.e., the family $(p_{i,k})_{k\in S_i}$ is summable for each $i \in \{1,2,3,\ldots\}$, and the family $\left(\sum_{k\in S_i} p_{i,k}\right)_{i\in\{1,2,3,\ldots\}}$ is multipliable), and we have
$$\prod_{i=1}^{\infty} \sum_{k\in S_i} p_{i,k} = \sum_{\substack{(k_1,k_2,k_3,\ldots)\in S_1\times S_2\times S_3\times\cdots\\ \text{is essentially finite}}} \prod_{i=1}^{\infty} p_{i,k_i}. \tag{123}$$
In particular, the family $\left(\prod_{i=1}^{\infty} p_{i,k_i}\right)_{(k_1,k_2,k_3,\ldots)\in S_1\times S_2\times S_3\times\cdots \text{ is essentially finite}}$ is summable.

Note that the assumption (122) in Proposition 3.11.29 ensures that each of the sums being multiplied contains an addend that equals 1 (and that this addend is named $p_{i,0}$; but this clearly does not restrict the generality of the proposition). The equality (123) shows how we can expand an infinite product of such sums. The result is a huge sum of products (the right hand side of (123)); each of these products is formed by picking an addend from each sum $\sum_{k\in S_i} p_{i,k}$ (just as in the finite case). The picking has to be done in such a way that the addend 1 gets picked from all but finitely many of our sums (for instance, when expanding $\left(1+x^1\right)\left(1+x^2\right)\left(1+x^3\right)\cdots$, we don't want to pick the $x^i$'s from all sums, as this would lead to $x^1 x^2 x^3 \cdots = $ "$x^{\infty}$"); this is why the sum on the right hand side of (123) is ranging only over the essentially finite sequences $(k_1,k_2,k_3,\ldots) \in S_1\times S_2\times S_3\times\cdots$. Finally, the condition that the family $(p_{i,k})_{(i,k)\in S}$ be summable is our way of ruling out cases like $(1-1)(1-1)(1-1)\cdots$, which cannot be expanded. Proposition 3.11.29 is not the most general version of the product rule, but it is sufficient for many of our needs (and we will see some more general versions below).
The rest of this subsection should probably be skipped at first reading – it is largely a succession of technical arguments about finite and infinite sets in service of making rigorous what is already intuitively clear.
Before we sketch a proof of Proposition 3.11.29, let us show how (111) can be derived from it:

Proof of (111) using Proposition 3.11.29. Let $(a_0, a_1, a_2, \ldots) = (a_n)_{n\in\mathbb{N}}$ be a summable sequence of FPSs in $K[[x]]$. We must prove the equality (111).
Set $S_i = \{0,1\}$ for each $i \in \{1,2,3,\ldots\}$. Thus,
$$S_1 \times S_2 \times S_3 \times \cdots = \{0,1\} \times \{0,1\} \times \{0,1\} \times \cdots = \{0,1\}^{\infty}.$$
Define the set $S$ as in Proposition 3.11.29. Then, $S = \{(1,1),\ (2,1),\ (3,1),\ \ldots\}$. Now, set $p_{i,k} = a_{i-1}^k$ for each $i \in \{1,2,3,\ldots\}$ and each $k \in \{0,1\}$. Thus, the family $(p_{i,k})_{(i,k)\in S}$ is summable$^{45}$, and the statement (122) holds as well (indeed, we have $p_{i,0} = a_{i-1}^0 = 1$ for any $i \in \{1,2,3,\ldots\}$). Hence, Proposition 3.11.29 can be applied.
Moreover, for each $i \in \{1,2,3,\ldots\}$, we have $S_i = \{0,1\}$. Thus, for each $i \in \{1,2,3,\ldots\}$, we have
$$\sum_{k\in S_i} p_{i,k} = \sum_{k\in\{0,1\}} p_{i,k} = \underbrace{p_{i,0}}_{=a_{i-1}^0=1} + \underbrace{p_{i,1}}_{=a_{i-1}^1=a_{i-1}} = 1 + a_{i-1}.$$

$^{45}$Indeed, this family can be rewritten as $(p_{1,1}, p_{2,1}, p_{3,1}, \ldots) = \left(a_0^1, a_1^1, a_2^1, \ldots\right) = (a_0, a_1, a_2, \ldots)$, but we have assumed that the latter family is summable.

Hence,
$$\prod_{i=1}^{\infty} \sum_{k\in S_i} p_{i,k} = \prod_{i=1}^{\infty} (1+a_{i-1}) = \prod_{i\in\mathbb{N}} (1+a_i)$$
(here, we have substituted $i$ for $i-1$ in the product). Thus,
\begin{align*}
\prod_{i\in\mathbb{N}} (1+a_i) &= \prod_{i=1}^{\infty} \sum_{k\in S_i} p_{i,k} \\
&= \sum_{\substack{(k_1,k_2,k_3,\ldots)\in S_1\times S_2\times S_3\times\cdots\\ \text{is essentially finite}}} \prod_{i=1}^{\infty} \underbrace{p_{i,k_i}}_{=a_{i-1}^{k_i}} \qquad \text{(by (123))} \\
&= \sum_{\substack{(k_1,k_2,k_3,\ldots)\in\{0,1\}^{\infty}\\ \text{is essentially finite}}} \prod_{i=1}^{\infty} a_{i-1}^{k_i} \qquad \left(\text{since } S_1\times S_2\times S_3\times\cdots = \{0,1\}^{\infty}\right) \\
&= \sum_{\substack{(k_0,k_1,k_2,\ldots)\in\{0,1\}^{\infty}\\ \text{is essentially finite}}} \prod_{i\in\mathbb{N}} a_i^{k_i} \qquad \left(\begin{array}{c}\text{here, we have renamed the summation}\\ \text{index } (k_1,k_2,k_3,\ldots) \text{ as } (k_0,k_1,k_2,\ldots)\end{array}\right). \tag{124}
\end{align*}

Now, an essentially finite sequence $(k_0,k_1,k_2,\ldots) \in \{0,1\}^{\infty}$ is just an infinite sequence of 0s and 1s that contains only finitely many 1s. Thus, such a sequence is uniquely determined by the positions of the 1s in it, and these positions form a finite set, so they can be uniquely labeled as $i_1, i_2, \ldots, i_k$ with $i_1 < i_2 < \cdots < i_k$. To put this more formally: There is a bijection
$$\text{from } \left\{\text{essentially finite sequences } (k_0,k_1,k_2,\ldots) \in \{0,1\}^{\infty}\right\}$$
$$\text{to } \left\{\text{finite lists } (i_1,i_2,\ldots,i_k) \text{ of nonnegative integers such that } i_1 < i_2 < \cdots < i_k\right\}$$
that sends each sequence $(k_0,k_1,k_2,\ldots) \in \{0,1\}^{\infty}$ to the list of all $i \in \mathbb{N}$ satisfying $k_i = 1$ (written in increasing order). Furthermore, if this bijection sends an essentially finite sequence $(k_0,k_1,k_2,\ldots) \in \{0,1\}^{\infty}$ to a finite list $(i_1,i_2,\ldots,i_k)$,

then
$$\prod_{i\in\mathbb{N}} a_i^{k_i} = \left(\prod_{\substack{i\in\mathbb{N};\\ k_i=0}} \underbrace{a_i^{k_i}}_{=a_i^0=1}\right) \left(\prod_{\substack{i\in\mathbb{N};\\ k_i=1}} \underbrace{a_i^{k_i}}_{=a_i^1=a_i}\right) \qquad \text{(since each } k_i \text{ is either 0 or 1)}$$
$$= \underbrace{\left(\prod_{\substack{i\in\mathbb{N};\\ k_i=0}} 1\right)}_{=1} \left(\prod_{\substack{i\in\mathbb{N};\\ k_i=1}} a_i\right) = \prod_{\substack{i\in\mathbb{N};\\ k_i=1}} a_i = a_{i_1} a_{i_2} \cdots a_{i_k}$$
(since $(i_1,i_2,\ldots,i_k)$ is the list of all $i \in \mathbb{N}$ satisfying $k_i = 1$). Thus, we can use this bijection to re-index the sum on the right hand side of (124); we obtain
$$\sum_{\substack{(k_0,k_1,k_2,\ldots)\in\{0,1\}^{\infty}\\ \text{is essentially finite}}} \prod_{i\in\mathbb{N}} a_i^{k_i} = \sum_{i_1<i_2<\cdots<i_k} a_{i_1} a_{i_2} \cdots a_{i_k}.$$
Thus, (124) rewrites as
$$\prod_{i\in\mathbb{N}} (1+a_i) = \sum_{i_1<i_2<\cdots<i_k} a_{i_1} a_{i_2} \cdots a_{i_k}.$$
This proves (111).


We note that (111) can be rewritten as
$$\prod_{i\in\mathbb{N}} (1+a_i) = \sum_{\substack{J \text{ is a finite}\\ \text{subset of } \mathbb{N}}} \prod_{i\in J} a_i. \tag{125}$$
Indeed, the right hand side of this equality is precisely the right hand side of (111).
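The identity (125) can be confirmed by brute force for a family with only finitely many nonzero entries (our own sketch, with $a_i = (i+2)\,x^{i+1}$ for $i = 0,\ldots,3$ and $a_i = 0$ beyond, so both sides are exact; the helper `mul_trunc` is ours):

```python
from itertools import combinations

N = 16

def mul_trunc(a, b):
    """Multiply two truncated power series (lists of N coefficients)."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

# The family: a_i = (i+2) x^(i+1) for i = 0, 1, 2, 3
fam = []
for i in range(4):
    a = [0] * N
    a[i + 1] = i + 2
    fam.append(a)

# Left hand side: the product of the (1 + a_i)
lhs = [1] + [0] * (N - 1)
for a in fam:
    lhs = mul_trunc(lhs, [1 + a[0]] + a[1:])

# Right hand side: the sum over all (finite) subsets J of prod_{i in J} a_i
rhs = [0] * N
for k in range(len(fam) + 1):
    for J in combinations(range(len(fam)), k):
        term = [1] + [0] * (N - 1)
        for i in J:
            term = mul_trunc(term, fam[i])
        rhs = [u + v for u, v in zip(rhs, term)]

print(lhs == rhs)  # → True
```

The subset $J$ plays exactly the role of the set of positions where the addend $a_i$ (rather than 1) was picked.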
We will prove Proposition 3.11.29 soon. First, however, let us generalize
Proposition 3.11.29 to products indexed by arbitrary sets instead of {1, 2, 3, . . .}:

Proposition 3.11.30. Let $I$ be a set. For any $i \in I$, let $S_i$ be a set that contains the number 0. Set
$$S = \{(i,k) \mid i \in I \text{ and } k \in S_i \text{ and } k \neq 0\}.$$
For any $i \in I$ and any $k \in S_i$, let $p_{i,k}$ be an element of $K[[x]]$. Assume that
$$p_{i,0} = 1 \qquad \text{for any } i \in I. \tag{126}$$
Assume further that the family $(p_{i,k})_{(i,k)\in S}$ is summable. Then, the product $\prod_{i\in I} \sum_{k\in S_i} p_{i,k}$ is well-defined (i.e., the family $(p_{i,k})_{k\in S_i}$ is summable for each $i \in I$, and the family $\left(\sum_{k\in S_i} p_{i,k}\right)_{i\in I}$ is multipliable), and we have
$$\prod_{i\in I} \sum_{k\in S_i} p_{i,k} = \sum_{\substack{(k_i)_{i\in I}\in \prod_{i\in I} S_i\\ \text{is essentially finite}}} \prod_{i\in I} p_{i,k_i}. \tag{127}$$
In particular, the family $\left(\prod_{i\in I} p_{i,k_i}\right)_{(k_i)_{i\in I}\in \prod_{i\in I} S_i \text{ is essentially finite}}$ is summable.
Clearly, Proposition 3.11.29 is the particular case of Proposition 3.11.30 for $I = \{1,2,3,\ldots\}$. The proof of Proposition 3.11.30 is a long-winded reduction to the finite case; nothing substantial is going on in it. Thus, I recommend skipping it unless you are specifically interested. For the sake of convenience, before proving Proposition 3.11.30, let me restate Proposition 3.11.27 in a form that makes it particularly easy to use:
Proposition 3.11.31. Let $N$ be a finite set. For any $i \in N$, let $(p_{i,k})_{k\in S_i}$ be a summable family of elements of $K[[x]]$. Then,
$$\prod_{i\in N} \sum_{k\in S_i} p_{i,k} = \sum_{(k_i)_{i\in N}\in \prod_{i\in N} S_i} \prod_{i\in N} p_{i,k_i}. \tag{128}$$
In particular, the family $\left(\prod_{i\in N} p_{i,k_i}\right)_{(k_i)_{i\in N}\in \prod_{i\in N} S_i}$ is summable.

Proof of Proposition 3.11.31. This is just Proposition 3.11.27, with the indexing set $\{1,2,\ldots,n\}$ replaced by $N$. It can be proved by reindexing the products (or directly by induction on $|N|$).
Let me furthermore state an infinite version of Lemma 3.11.9, which will be used in the proof of Proposition 3.11.30 given below:

Lemma 3.11.32. Let $a \in K[[x]]$ be an FPS. Let $(f_i)_{i\in J} \in K[[x]]^J$ be a summable family of FPSs. Let $n \in \mathbb{N}$. Assume that each $i \in J$ satisfies
$$[x^m]\,(f_i) = 0 \qquad \text{for each } m \in \{0, 1, \ldots, n\}. \tag{129}$$
Then,
$$[x^m]\left(a \prod_{i\in J}(1+f_i)\right) = [x^m]\, a \qquad \text{for each } m \in \{0, 1, \ldots, n\}.$$

Proof of Lemma 3.11.32 (sketched). The idea here is to argue that the first $n+1$ coefficients of $a \prod_{i\in J}(1+f_i)$ agree with those of $a \prod_{i\in M}(1+f_i)$ for some finite subset $M$ of $J$. Then, apply Lemma 3.11.9 to this subset $M$. The details of this argument can be found in Section B.2.
With this lemma proved, we have all necessary prerequisites for the proof of Proposition 3.11.30. The proof, however, is rather long due to the bookkeeping required, and therefore has been banished to the appendix (Section B.2, to be specific).
We notice that Proposition 3.11.30 (and thus also Proposition 3.11.29) can be generalized slightly by lifting the requirement that all sets $S_i$ contain 0 (this means that the sums being multiplied no longer need to contain 1 as an addend). See Exercise A.2.11.5 for this generalization. This lets us expand products such as $x\left(1+x^1\right)\left(1+x^2\right)\left(1+x^3\right)\cdots$ (but of course, we could just as well expand this particular product by splitting off the $x$ factor and applying Proposition 3.11.29).

3.11.8. Another example

As another bit of practice with infinite products of FPSs, let us prove the following identity by Euler ([Euler48, §326]):

Proposition 3.11.33 (Euler). We have
$$\prod_{i>0}\left(1-x^{2i-1}\right)^{-1} = \prod_{k>0}\left(1+x^k\right).$$

Proof. First, it is easy to see that the families $\left(1+x^k\right)_{k>0}$ and $\left(1-x^k\right)_{k>0}$ and $\left(1-x^{2k}\right)_{k>0}$ and $\left(1-x^{2i-1}\right)_{i>0}$ are multipliable$^{46}$. This shows that the product on the right hand side of Proposition 3.11.33 is well-defined.
Proposition 3.11.19 (a) shows that the family $\left(\dfrac{1}{1-x^{2i-1}}\right)_{i>0}$ is multipliable (since the families $(1)_{i>0}$ and $\left(1-x^{2i-1}\right)_{i>0}$ are multipliable, and since the FPS $1-x^{2i-1}$ is invertible for each $i > 0$). In other words, the family $\left(\left(1-x^{2i-1}\right)^{-1}\right)_{i>0}$ is multipliable. Thus, the product on the left hand side of Proposition 3.11.33 is well-defined.

$^{46}$Indeed, this follows from Theorem 3.11.10, since the families $\left(x^k\right)_{k>0}$ and $\left(-x^k\right)_{k>0}$ and $\left(-x^{2k}\right)_{k>0}$ and $\left(-x^{2i-1}\right)_{i>0}$ are summable.

For each $k > 0$, we have $1+x^k = \dfrac{1-x^{2k}}{1-x^k}$ (since $1-x^{2k} = \left(1-x^k\right)\left(1+x^k\right)$). Thus,
\begin{align*}
\prod_{k>0}\left(1+x^k\right) &= \prod_{k>0}\frac{1-x^{2k}}{1-x^k} = \frac{\prod_{k>0}\left(1-x^{2k}\right)}{\prod_{k>0}\left(1-x^k\right)} \qquad \text{(by Proposition 3.11.19 (b))} \\
&= \frac{\left(1-x^2\right)\left(1-x^4\right)\left(1-x^6\right)\left(1-x^8\right)\cdots}{\left(1-x^1\right)\left(1-x^2\right)\left(1-x^3\right)\left(1-x^4\right)\cdots} \\
&= \frac{1}{\left(1-x^1\right)\left(1-x^3\right)\left(1-x^5\right)\left(1-x^7\right)\cdots} \qquad \left(\begin{array}{c}\text{since all } 1-x^i \text{ factors with } i \text{ even}\\ \text{cancel out (see below for details)}\end{array}\right) \\
&= \frac{1}{1-x^1} \cdot \frac{1}{1-x^3} \cdot \frac{1}{1-x^5} \cdot \frac{1}{1-x^7} \cdots \qquad \text{(by Proposition 3.11.19 (b))} \\
&= \prod_{i>0} \frac{1}{1-x^{2i-1}} = \prod_{i>0}\left(1-x^{2i-1}\right)^{-1},
\end{align*}
which is precisely the claim of Proposition 3.11.33.
The step where we cancelled out the $1-x^i$ factors with $i$ even can be justified as follows: We know that the family $\left(1-x^k\right)_{k>0}$ as well as its subfamilies $\left(1-x^k\right)_{k>0 \text{ is even}}$ and $\left(1-x^k\right)_{k>0 \text{ is odd}}$ are multipliable. Thus, Proposition 3.11.17 (b) yields
$$\prod_{k>0}\left(1-x^k\right) = \left(\prod_{\substack{k>0\\ \text{is even}}}\left(1-x^k\right)\right) \cdot \left(\prod_{\substack{k>0\\ \text{is odd}}}\left(1-x^k\right)\right).$$
In other words,
$$\left(1-x^1\right)\left(1-x^2\right)\left(1-x^3\right)\left(1-x^4\right)\cdots = \left(\left(1-x^2\right)\left(1-x^4\right)\left(1-x^6\right)\left(1-x^8\right)\cdots\right) \cdot \left(\left(1-x^1\right)\left(1-x^3\right)\left(1-x^5\right)\left(1-x^7\right)\cdots\right).$$
Hence,
$$\frac{\left(1-x^1\right)\left(1-x^2\right)\left(1-x^3\right)\left(1-x^4\right)\cdots}{\left(1-x^2\right)\left(1-x^4\right)\left(1-x^6\right)\left(1-x^8\right)\cdots} = \left(1-x^1\right)\left(1-x^3\right)\left(1-x^5\right)\left(1-x^7\right)\cdots.$$
Taking reciprocals, we thus obtain
$$\frac{\left(1-x^2\right)\left(1-x^4\right)\left(1-x^6\right)\left(1-x^8\right)\cdots}{\left(1-x^1\right)\left(1-x^2\right)\left(1-x^3\right)\left(1-x^4\right)\cdots} = \frac{1}{\left(1-x^1\right)\left(1-x^3\right)\left(1-x^5\right)\left(1-x^7\right)\cdots},$$
just as we claimed. Thus, the proof of Proposition 3.11.33 is complete.


Let us try to interpret Proposition 3.11.33 combinatorially, by expanding both
sides.
First, we use (111) to expand the right hand side:
\begin{align*}
\prod_{k>0}\left(1+x^k\right) &= \left(1+x^1\right)\left(1+x^2\right)\left(1+x^3\right)\left(1+x^4\right)\cdots \\
&= \sum_{\substack{i_1,i_2,\ldots,i_k\in\{1,2,3,\ldots\};\\ i_1<i_2<\cdots<i_k}} \underbrace{x^{i_1}x^{i_2}\cdots x^{i_k}}_{=x^{i_1+i_2+\cdots+i_k}} \qquad \text{(by (111))} \\
&= \sum_{\substack{i_1,i_2,\ldots,i_k\in\{1,2,3,\ldots\};\\ i_1<i_2<\cdots<i_k}} x^{i_1+i_2+\cdots+i_k} \\
&= \sum_{n\in\mathbb{N}} d_n x^n, \tag{130}
\end{align*}
where $d_n$ is the # of all strictly increasing tuples $(i_1 < i_2 < \cdots < i_k)$ of positive integers such that $n = i_1+i_2+\cdots+i_k$. We can rewrite this definition as follows: $d_n$ is the # of ways to write $n$ as a sum of distinct positive integers, with no regard for the order⁴⁷.
Next, let us expand the left hand side: For each positive integer $i$, we have
\begin{align*}
\left(1-x^{2i-1}\right)^{-1} &= 1 + x^{2i-1} + \left(x^{2i-1}\right)^2 + \left(x^{2i-1}\right)^3 + \cdots
\qquad \left(\begin{array}{c}\text{by substituting } x^{2i-1} \text{ for } x \text{ in the} \\ \text{equality } (1-x)^{-1} = 1+x+x^2+x^3+\cdots\end{array}\right) \\
&= 1 + x^{2i-1} + x^{2(2i-1)} + x^{3(2i-1)} + \cdots = \sum_{k\in\mathbb{N}} x^{k(2i-1)}.
\end{align*}

⁴⁷“No regard for the order” means that, for example, $3+4+1$ and $1+3+4$ count as the same way of writing $8$ as a sum of distinct integers.

Multiplying these equalities over all $i > 0$, we find
\begin{align*}
\prod_{i>0}\left(1-x^{2i-1}\right)^{-1}
&= \prod_{i>0}\sum_{k\in\mathbb{N}} x^{k(2i-1)} \\
&= \sum_{\substack{(k_1,k_2,k_3,\ldots)\in\mathbb{N}^\infty \\ \text{is essentially finite}}} \underbrace{\prod_{i>0} x^{k_i(2i-1)}}_{=x^{k_1\cdot 1}x^{k_2\cdot 3}x^{k_3\cdot 5}\cdots = x^{k_1\cdot 1+k_2\cdot 3+k_3\cdot 5+\cdots}}
\qquad \left(\begin{array}{c}\text{by Proposition 3.11.29,} \\ \text{applied to } S_i=\mathbb{N} \text{ and } p_{i,k}=x^{k(2i-1)}\end{array}\right) \\
&= \sum_{\substack{(k_1,k_2,k_3,\ldots)\in\mathbb{N}^\infty \\ \text{is essentially finite}}} x^{k_1\cdot 1+k_2\cdot 3+k_3\cdot 5+\cdots} \\
&= \sum_{n\in\mathbb{N}} o_n x^n, \tag{131}
\end{align*}
where $o_n$ is the # of all essentially finite sequences $(k_1,k_2,k_3,\ldots)\in\mathbb{N}^\infty$ such that $k_1\cdot 1 + k_2\cdot 3 + k_3\cdot 5 + \cdots = n$. I claim that $o_n$ is the # of ways to write $n$ as a sum of (not necessarily distinct) odd positive integers, without regard to the order. (Why? Because if we can write $n$ as a sum of $k_1$ many 1s, $k_2$ many 3s, $k_3$ many 5s and so on, then $(k_1,k_2,k_3,\ldots)\in\mathbb{N}^\infty$ is an essentially finite sequence such that $k_1\cdot 1 + k_2\cdot 3 + k_3\cdot 5 + \cdots = n$; and conversely, if $(k_1,k_2,k_3,\ldots)\in\mathbb{N}^\infty$ is such a sequence, then $n$ can be written as a sum of $k_1$ many 1s, $k_2$ many 3s, $k_3$ many 5s and so on.)
Proposition 3.11.33 tells us that $\prod_{k>0}\left(1+x^k\right) = \prod_{i>0}\left(1-x^{2i-1}\right)^{-1}$. In view of (130) and (131), we can rewrite this as
$$\sum_{n\in\mathbb{N}} d_n x^n = \sum_{n\in\mathbb{N}} o_n x^n.$$
Therefore, $d_n = o_n$ for each $n \in \mathbb{N}$. Thus, we have proven the following purely combinatorial statement:

Theorem 3.11.34 (Euler). Let $n \in \mathbb{N}$. Then, $d_n = o_n$, where

• $d_n$ is the # of ways to write $n$ as a sum of distinct positive integers, without regard to the order;

• $o_n$ is the # of ways to write $n$ as a sum of (not necessarily distinct) odd positive integers, without regard to the order.

Example 3.11.35. Let $n = 6$. Then, $d_n = 4$, because the ways to write $n = 6$ as a sum of distinct positive integers are
$$6, \qquad 1+5, \qquad 2+4, \qquad 1+2+3$$
(don’t forget the first of these ways, trivial as it looks!). On the other hand, $o_n = 4$, because the ways to write $n = 6$ as a sum of odd positive integers are
$$1+5, \qquad 3+3, \qquad 3+1+1+1, \qquad 1+1+1+1+1+1.$$

We will soon learn a bijective proof of Theorem 3.11.34 as well (see the Second
proof of Theorem 4.1.14 below).
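Theorem 3.11.34 is easy to check numerically. The following sketch (the function names `d` and `o` are our own, not from the text) counts both kinds of decompositions by brute-force recursion, listing the parts of each sum in increasing order so that reorderings are not counted twice:

```python
def d(n):
    """# of ways to write n as a sum of *distinct* positive integers
    (order disregarded: parts are generated in strictly increasing order)."""
    def go(remaining, min_part):
        if remaining == 0:
            return 1
        return sum(go(remaining - p, p + 1)
                   for p in range(min_part, remaining + 1))
    return go(n, 1)

def o(n):
    """# of ways to write n as a sum of (not necessarily distinct) *odd*
    positive integers (order disregarded: parts are weakly increasing)."""
    def go(remaining, min_part):
        if remaining == 0:
            return 1
        return sum(go(remaining - p, p)
                   for p in range(min_part, remaining + 1, 2))
    return go(n, 1)

assert d(6) == 4 and o(6) == 4               # as in Example 3.11.35
assert all(d(n) == o(n) for n in range(16))  # Theorem 3.11.34, for small n
```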

3.11.9. Infinite products and substitution


Proposition 3.5.4 (h) has an analogue for products instead of sums:

Proposition 3.11.36. If $(f_i)_{i\in I} \in K[[x]]^I$ is a multipliable family of FPSs, and if $g \in K[[x]]$ is an FPS satisfying $\left[x^0\right] g = 0$, then the family $(f_i \circ g)_{i\in I} \in K[[x]]^I$ is multipliable as well, and we have $\left(\prod_{i\in I} f_i\right) \circ g = \prod_{i\in I} \left(f_i \circ g\right)$.

Proof of Proposition 3.11.36 (sketched). See Section B.2.

3.11.10. Exponentials, logarithms and infinite products


Recall Definition 3.7.6. As we saw in Lemma 3.7.9, the exponential map Exp
and the logarithm map Log can be used to convert sums into products and vice
versa (under certain conditions). This also extends to infinite sums and infinite
products:

Proposition 3.11.37. Assume that $K$ is a commutative $\mathbb{Q}$-algebra. Let $(f_i)_{i\in I} \in K[[x]]^I$ be a summable family of FPSs in $K[[x]]_0$. Then, $(\operatorname{Exp} f_i)_{i\in I}$ is a multipliable family of FPSs in $K[[x]]_1$, and we have $\sum_{i\in I} f_i \in K[[x]]_0$ and
$$\operatorname{Exp}\left(\sum_{i\in I} f_i\right) = \prod_{i\in I} \left(\operatorname{Exp} f_i\right).$$

Proposition 3.11.38. Assume that $K$ is a commutative $\mathbb{Q}$-algebra. Let $(f_i)_{i\in I} \in K[[x]]^I$ be a multipliable family of FPSs in $K[[x]]_1$. Then, $(\operatorname{Log} f_i)_{i\in I}$ is a summable family of FPSs in $K[[x]]_0$, and we have $\prod_{i\in I} f_i \in K[[x]]_1$ and
$$\operatorname{Log}\left(\prod_{i\in I} f_i\right) = \sum_{i\in I} \left(\operatorname{Log} f_i\right).$$

We leave the proofs of these two propositions as an exercise (Exercise A.2.11.6).
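For a finite family and $K = \mathbb{Q}$, Proposition 3.11.37 can be sanity-checked by computing with truncated FPSs. The sketch below (the names `mul`, `exp_series` and the truncation bound `N` are our own choices) represents an FPS by its first `N` coefficients; since `exp_series` is only applied to series with no constant term, truncating the defining sum $\sum_{k\ge 0} f^k/k!$ at $k = N-1$ is exact modulo $x^N$:

```python
from fractions import Fraction

N = 8  # we work with FPSs truncated modulo x^N

def mul(f, g):
    """Product of two truncated FPSs (coefficient lists of length N)."""
    h = [Fraction(0)] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < N:
                h[i + j] += a * b
    return h

def exp_series(f):
    """Exp f = sum_{k>=0} f^k / k!  for an FPS f with [x^0] f = 0.
    Each f^k has no coefficients below x^k, so stopping at k = N-1 is exact."""
    assert f[0] == 0
    result = [Fraction(0)] * N
    result[0] = Fraction(1)
    power = result[:]   # holds f^k, starting with f^0 = 1
    fact = 1            # holds k!
    for k in range(1, N):
        power = mul(power, f)
        fact *= k
        for m in range(N):
            result[m] += power[m] / fact
    return result

f = [Fraction(0)] * N; f[1] = Fraction(1)   # f = x
g = [Fraction(0)] * N; g[2] = Fraction(1)   # g = x^2
s = [a + b for a, b in zip(f, g)]           # f + g

# Exp(f + g) = (Exp f) * (Exp g), as Proposition 3.11.37 predicts:
assert exp_series(s) == mul(exp_series(f), exp_series(g))
```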

3.12. The generating function of a weighted set


So far, we have built a theory of FPSs, and we have occasionally applied them to
counting problems. However, so far, the latter applications were either limited
to counting tuples of integers (because such tuples naturally appear as indexes
when we multiply several sums) or somewhat indirect, mediated by integer
sequences (i.e., we could not immediately transform our counting problem into
an FPS but rather had to first define an integer sequence and then take its
generating function). For instance, our above proof of Theorem 3.11.34 was an
application of the former type, while Example 2 in Section 3.1 was one of the
latter type.
I shall now explain a more direct approach to transforming counting problems into FPSs. This approach is the theory of so-called “combinatorial classes” or “finite-type weighted sets”, and can be used (particularly in some of its more advanced variants) to prove deep results. However, I will not use it much in these notes, so I will only give a brief introduction. Much more can be learned from [FlaSed09, Part A]; introductions can also be found in [Fink17, §3.3–§3.4], [Melcze24, Parts 1 and 2] and [Loehr11, Chapter 6]. The theory is not magic, and in principle can always be eschewed (any proof using it can be rewritten without its use), but it offers the major advantages of systematizing the combinatorial objects being counted and making the proofs more insightful. In a sense, it allows us to perform certain computations not just with the generating functions of the objects, but with the objects themselves – a middle ground between algebra and combinatorics.

I will now outline the beginnings of this theory, following Fink’s introduction [Fink17, §3.3–§3.4] (but speaking of finite-type weighted sets where [Fink17], [Melcze24] and [FlaSed09] speak of “combinatorial classes”).

3.12.1. The theory

Definition 3.12.1. (a) A weighted set is a set $A$ equipped with a function $w : A \to \mathbb{N}$, which is called the weight function of this weighted set. For each $a \in A$, the value $w(a)$ is denoted $|a|$ and is called the weight of $a$ (in our weighted set).

Usually, instead of explicitly specifying the weight function $w$ as a function, we will simply specify the weight $|a|$ for each $a \in A$. The weighted set consisting of the set $A$ and the weight function $w$ will be called $(A, w)$ or simply $A$ when the weight function $w$ is clear from the context.
(b) A weighted set $A$ is said to be finite-type if for each $n \in \mathbb{N}$, there are only finitely many $a \in A$ having weight $|a| = n$.

(c) If $A$ is a finite-type weighted set, then the weight generating function of $A$ is defined to be the FPS
$$\sum_{a\in A} x^{|a|} = \sum_{n\in\mathbb{N}} \left(\text{\# of } a \in A \text{ having weight } n\right) \cdot x^n \in \mathbb{Z}[[x]].$$
This FPS is denoted by $\overline{A}$.


(d) An isomorphism between two weighted sets A and B means a bijection
ρ : A → B that preserves the weight (i.e., each a ∈ A satisfies |ρ ( a)| = | a|).
(e) We say that two weighted sets $A$ and $B$ are isomorphic (this is written $A \cong B$) if there exists an isomorphism between $A$ and $B$.

Example 3.12.2. Let $B$ be the weighted set of all binary strings, i.e., finite tuples consisting of 0s and 1s. Thus,
$$B = \{(),\ (0),\ (1),\ (0,0),\ (0,1),\ (1,0),\ (1,1),\ (0,0,0),\ (0,0,1),\ \ldots\}.$$
The weight of an $n$-tuple is defined to be $n$. In other words, the weight of a binary string is the length of this string. This weighted set $B$ is finite-type, since for each $n \in \mathbb{N}$, there are only finitely many binary strings of length $n$ (namely, there are $2^n$ such strings). The weight generating function of $B$ is
$$\overline{B} = \sum_{n\in\mathbb{N}} \underbrace{\left(\text{\# of } a \in B \text{ having weight } n\right)}_{=(\text{\# of binary strings of length } n) = 2^n} \cdot\, x^n = \sum_{n\in\mathbb{N}} 2^n x^n = \frac{1}{1-2x}.$$
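The identity $\sum_n 2^n x^n = 1/(1-2x)$ can be double-checked by machine: formally invert the polynomial $1 - 2x$ coefficient by coefficient, and compare with a brute-force count of binary strings. (The helper `series_inverse` below is our own; it just solves $[x^n](f \cdot g) = 0$ for the coefficients of $g = 1/f$.)

```python
from itertools import product

def series_inverse(f, N):
    """Coefficients of 1/f up to degree N-1, where f is a polynomial
    given as a coefficient list with f[0] = 1."""
    g = [0] * N
    g[0] = 1
    for n in range(1, N):
        # [x^n] (f * g) = 0  gives  g[n] = -sum_{k>=1} f[k] * g[n-k]
        g[n] = -sum(f[k] * g[n - k]
                    for k in range(1, min(n, len(f) - 1) + 1))
    return g

N = 8
# [x^n] 1/(1-2x), computed by formally inverting 1 - 2x:
coeffs = series_inverse([1, -2], N)
# the same numbers, by brute-force listing of all binary strings:
counts = [sum(1 for _ in product([0, 1], repeat=n)) for n in range(N)]

assert coeffs == counts == [2 ** n for n in range(N)]
```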

Example 3.12.3. Let B′ be the weighted set of all binary strings, i.e., finite
tuples consisting of 0s and 1s. The weight of an n-tuple ( a1 , a2 , . . . , an ) ∈ B′ is
defined to be a1 + a2 + · · · + an . (Thus, B′ is the same set as the B in Example
3.12.2, but equipped with a different weight function.) This weighted set
B′ is not finite-type, since there are infinitely many a ∈ B′ having weight 0
(namely, all tuples of the form (0, 0, . . . , 0), with any number of zeroes.)

Those familiar with graded vector spaces (or graded modules) will recognize weighted sets as their combinatorial analogues: Weighted sets are to sets what graded vector spaces are to vector spaces. (In particular, the weight generating function $\overline{A}$ of a finite-type weighted set $A$ is an analogue of the Hilbert series of a finite-type graded vector space.)

Let us now prove a few basic properties of weighted sets and their weight generating functions.
Let us now prove a few basic properties of weighted sets and their weight
generating functions.

Proposition 3.12.4. Let $A$ and $B$ be two isomorphic finite-type weighted sets. Then, $\overline{A} = \overline{B}$.

Proof. This is almost trivial: The weighted sets $A$ and $B$ are isomorphic; thus, there exists an isomorphism $\rho : A \to B$. Consider this $\rho$. Then, $\rho$ is a bijection and preserves the weight (since $\rho$ is an isomorphism of weighted sets). The latter property says that we have
$$|\rho(a)| = |a| \qquad \text{for each } a \in A. \tag{132}$$
Now, the definition of $\overline{B}$ yields
$$\overline{B} = \sum_{b\in B} x^{|b|} = \sum_{a\in A} \underbrace{x^{|\rho(a)|}}_{\substack{=x^{|a|} \\ \text{(by (132))}}} \qquad \left(\begin{array}{c}\text{here, we have substituted } \rho(a) \text{ for } b \\ \text{in the sum, since } \rho \text{ is a bijection}\end{array}\right)$$
$$= \sum_{a\in A} x^{|a|}.$$
Comparing this with
$$\overline{A} = \sum_{a\in A} x^{|a|} \qquad \left(\text{by the definition of } \overline{A}\right),$$
we obtain $\overline{A} = \overline{B}$. This proves Proposition 3.12.4.


Note that Proposition 3.12.4 has a converse: If $\overline{A} = \overline{B}$, then $A \cong B$.
Recall that the disjoint union of two sets A and B is informally understood to
be “the union of A and B where we pretend that the sets A and B are disjoint
(even if they aren’t)”. Formally, it is defined as the set ({0} × A) ∪ ({1} × B),
but we think of the elements (0, a) of this set as copies of the respective elements
a ∈ A, and we think of the elements (1, b) of this set as copies of the respective
elements b ∈ B. (Thus, even if the original sets A and B have some elements
in common, say c ∈ A ∩ B, the two copies (0, c) and (1, c) of such common
elements will not coincide.)
If A and B are two finite sets, then their disjoint union always has size | A| +
| B |.
Let us now extend the concept of a disjoint union to weighted sets:

Definition 3.12.5. Let A and B be two weighted sets. Then, the weighted set
A + B is defined to be the disjoint union of A and B, with the weight function
inherited from A and B (meaning that each element of A has the same weight
that it had in A, and each element of B has the same weight that it had in B).
Formally speaking, this means that A + B is the set ({0} × A) ∪ ({1} × B),
with the weight function given by

|(0, a)| = | a| for each a ∈ A (133)

and
|(1, b)| = |b| for each b ∈ B. (134)

Proposition 3.12.6. Let $A$ and $B$ be two finite-type weighted sets. Then, $A+B$ is finite-type, too, and satisfies $\overline{A+B} = \overline{A} + \overline{B}$.

Proof. The formal definition of $A+B$ says that $A+B = (\{0\}\times A) \cup (\{1\}\times B)$. Thus, the set $A+B$ is the union of the two disjoint sets $\{0\}\times A$ and $\{1\}\times B$ (indeed, these two sets are clearly disjoint, since $0 \neq 1$).
Let us first check that A + B is finite-type.
For each n ∈ N, there are only finitely many a ∈ A having weight | a| = n
(since A is finite-type), and there are only finitely many b ∈ B having weight
|b| = n (since B is finite-type). Hence, there are only finitely many c ∈ A + B
having weight |c| = n (because any such c either has the form (0, a) for some
a ∈ A having weight | a| = n, or has the form (1, b) for some b ∈ B having
weight |b| = n). In other words, the weighted set A + B is finite-type.
Let us now check that $\overline{A+B} = \overline{A} + \overline{B}$. Indeed, the definition of $\overline{A+B}$ yields
\begin{align*}
\overline{A+B} &= \sum_{c\in A+B} x^{|c|} \\
&= \sum_{c\in\{0\}\times A} x^{|c|} + \sum_{c\in\{1\}\times B} x^{|c|}
\qquad \left(\begin{array}{c}\text{here, we have split the sum, since the} \\ \text{set } A+B \text{ is the union of the two disjoint} \\ \text{sets } \{0\}\times A \text{ and } \{1\}\times B\end{array}\right) \\
&= \sum_{a\in A} \underbrace{x^{|(0,a)|}}_{\substack{=x^{|a|} \\ \text{(by (133))}}} + \sum_{b\in B} \underbrace{x^{|(1,b)|}}_{\substack{=x^{|b|} \\ \text{(by (134))}}}
\qquad \left(\begin{array}{c}\text{here, we have substituted } (0,a) \text{ and } (1,b) \text{ for } c \\ \text{in the two sums, since the maps } A \to \{0\}\times A,\ a\mapsto(0,a) \\ \text{and } B \to \{1\}\times B,\ b\mapsto(1,b) \text{ are bijections}\end{array}\right) \\
&= \underbrace{\sum_{a\in A} x^{|a|}}_{=\overline{A}} + \underbrace{\sum_{b\in B} x^{|b|}}_{=\overline{B}} = \overline{A} + \overline{B}.
\end{align*}
Thus, the proof of Proposition 3.12.6 is complete.



We can easily extend Definition 3.12.5 and Proposition 3.12.6 to disjoint unions
of any number (even infinite) of weighted sets. (However, in the case of infinite
disjoint unions, we need to require the disjoint union to be finite-type in order
for Proposition 3.12.6 to make sense.)
We note that disjoint unions of weighted sets do not satisfy associativity
“on the nose”: If A, B and C are three weighted sets, then the weighted sets
A + B + C and ( A + B) + C and A + ( B + C ) are not literally equal but rather
are isomorphic via canonical isomorphisms. But Proposition 3.12.4 shows that
their weight generating functions are the same, so that we need not distinguish
between them if we are only interested in these functions.
Another operation on weighted sets is the Cartesian product:

Definition 3.12.7. Let A and B be two weighted sets. Then, the weighted set
A × B is defined to be the Cartesian product of the sets A and B (that is, the
set {( a, b) | a ∈ A and b ∈ B}), with the weight function defined as follows:
For any ( a, b) ∈ A × B, we set

|( a, b)| = | a| + |b| . (135)

Proposition 3.12.8. Let $A$ and $B$ be two finite-type weighted sets. Then, $A\times B$ is finite-type, too, and satisfies $\overline{A\times B} = \overline{A} \cdot \overline{B}$.

Proof of Proposition 3.12.8 (sketched). The proof that $A\times B$ is finite-type is easy⁴⁸. Now, the definition of $\overline{A\times B}$ yields
\begin{align*}
\overline{A\times B} &= \sum_{(a,b)\in A\times B} \underbrace{x^{|(a,b)|}}_{\substack{=x^{|a|+|b|} \\ \text{(by (135))}}}
\qquad \left(\begin{array}{c}\text{since all elements of } A\times B \text{ have} \\ \text{the form } (a,b) \text{ for some } a \text{ and } b\end{array}\right) \\
&= \sum_{(a,b)\in A\times B} \underbrace{x^{|a|+|b|}}_{=x^{|a|}\cdot x^{|b|}}
= \sum_{(a,b)\in A\times B} x^{|a|}\cdot x^{|b|}
= \underbrace{\left(\sum_{a\in A} x^{|a|}\right)}_{=\overline{A}} \underbrace{\left(\sum_{b\in B} x^{|b|}\right)}_{=\overline{B}} = \overline{A}\cdot\overline{B}.
\end{align*}
The proof of Proposition 3.12.8 is thus complete.

⁴⁸Here is the idea: Let $n \in \mathbb{N}$. We must prove that there are only finitely many pairs $(a,b) \in A\times B$ having weight $|(a,b)| = n$. Since $|(a,b)| = |a| + |b|$, these pairs have the property that $a \in A$ has weight $k$ for some $k \in \{0,1,\ldots,n\}$, and that $b \in B$ has weight $n-k$ for the same $k$. This leaves only finitely many options for $k$, only finitely many options for $a$ (since $A$ is finite-type and thus has only finitely many elements of weight $k$), and only finitely many options for $b$ (since $B$ is finite-type and thus has only finitely many elements of weight $n-k$). Altogether, we thus obtain only finitely many options for $(a,b)$.
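Propositions 3.12.6 and 3.12.8 can be illustrated concretely by representing a finite weighted set as a list of (element, weight) pairs and a weight generating function as a truncated coefficient list. The following sketch (all function names and the toy sets are our own) checks both identities on a small example:

```python
def wgf(ws, N):
    """Truncated weight generating function of a finite weighted set,
    given as a list of (element, weight) pairs: coefficients up to x^(N-1)."""
    coeffs = [0] * N
    for _, w in ws:
        if w < N:
            coeffs[w] += 1
    return coeffs

def disjoint_union(A, B):
    """A + B: tag elements with 0/1 so that common elements stay distinct."""
    return [((0, a), w) for a, w in A] + [((1, b), w) for b, w in B]

def cartesian_product(A, B):
    """A x B: weights add, as in (135)."""
    return [((a, b), wa + wb) for a, wa in A for b, wb in B]

def poly_mul(f, g):
    """Truncated product of two coefficient lists of equal length."""
    h = [0] * len(f)
    for i, ci in enumerate(f):
        for j, cj in enumerate(g):
            if i + j < len(h):
                h[i + j] += ci * cj
    return h

N = 6
A = [("a1", 1), ("a2", 2), ("a3", 2)]   # a toy finite weighted set
B = [("b1", 0), ("b2", 1)]

# weight generating function of A + B is the sum, of A x B the product:
assert wgf(disjoint_union(A, B), N) == [p + q for p, q in zip(wgf(A, N), wgf(B, N))]
assert wgf(cartesian_product(A, B), N) == poly_mul(wgf(A, N), wgf(B, N))
```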

We can easily extend Definition 3.12.7 and Proposition 3.12.8 to Cartesian products of $k$ weighted sets. The weight function on such a Cartesian product $A_1 \times A_2 \times \cdots \times A_k$ is defined by
$$|(a_1,a_2,\ldots,a_k)| = |a_1| + |a_2| + \cdots + |a_k|. \tag{136}$$
As a particular case of such Cartesian products, we obtain Cartesian powers when we multiply $k$ copies of the same weighted set:

Definition 3.12.9. Let $A$ be a weighted set. Then, $A^k$ (for $k \in \mathbb{N}$) means the weighted set $\underbrace{A \times A \times \cdots \times A}_{k \text{ times}}$.

The analogue of Proposition 3.12.8 for Cartesian products of $k$ weighted sets then yields the following:

Proposition 3.12.10. Let $A$ be a finite-type weighted set. Let $k \in \mathbb{N}$. Then, $A^k$ is finite-type, too, and satisfies $\overline{A^k} = \overline{A}^{\,k}$.

Note that the 0-th Cartesian power $A^0$ of a weighted set $A$ always consists of a single element – namely, the empty 0-tuple $()$, which has weight 0.

If $A$ is a weighted set, then the infinite disjoint union $A^0 + A^1 + A^2 + \cdots$ consists of all (finite) tuples of elements of $A$ (including the 0-tuple, the 1-tuples, and so on). The weight of a tuple is the sum of the weights of its entries (indeed, this is just what (136) says).

3.12.2. Examples
Now, let us use this theory to revisit some of the things we have already
counted:

• Fix $k \in \mathbb{N}$, and let
\begin{align*}
C_k &:= \{\text{compositions of length } k\} \\
&= \{(a_1,a_2,\ldots,a_k) \mid a_1,a_2,\ldots,a_k \text{ are positive integers}\} \\
&= P^k, \qquad \text{where } P = \{1,2,3,\ldots\}.
\end{align*}
This set $C_k$ becomes a finite-type weighted set if we set $|(a_1,a_2,\ldots,a_k)| = a_1+a_2+\cdots+a_k$ for every $(a_1,a_2,\ldots,a_k) \in C_k$. What is its weight generating function $\overline{C_k}$? We can turn $P$ itself into a weighted set, by defining the weight of a positive integer $n$ by $|n| = n$. Then, $C_k = P^k$ not just as sets, but as weighted sets⁴⁹. Hence,
\begin{align*}
\overline{C_k} = \overline{P^k} = \overline{P}^{\,k} \qquad &\text{(by Proposition 3.12.10)} \\
= \left(x^1 + x^2 + x^3 + \cdots\right)^k \qquad &\left(\text{since } \overline{P} = x^1 + x^2 + x^3 + \cdots\right) \\
= \left(\frac{x}{1-x}\right)^k \qquad &\left(\text{since } x^1+x^2+x^3+\cdots = \frac{x}{1-x}\right).
\end{align*}
This recovers the equality (94).

• Recall the notion of Dyck paths (as defined in Example 2 in Section 3.1), as well as the Catalan numbers $c_0, c_1, c_2, \ldots$ (defined in the same place). Let
$$D := \{\text{Dyck paths from } (0,0) \text{ to } (2n,0) \text{ for some } n \in \mathbb{N}\}.$$
This set $D$ becomes a finite-type weighted set if we set
$$|P| = n \qquad \text{whenever } P \text{ is a Dyck path from } (0,0) \text{ to } (2n,0).$$
The weight generating function $\overline{D}$ of this weighted set $D$ is
$$\overline{D} = \sum_{n\in\mathbb{N}} \underbrace{\left(\text{\# of Dyck paths from } (0,0) \text{ to } (2n,0)\right)}_{\substack{=c_n \\ \text{(by the definition of } c_n\text{)}}} x^n = \sum_{n\in\mathbb{N}} c_n x^n.$$
This is the generating function we called $C(x)$ in Example 2 of Section 3.1.

Recall that there is only one Dyck path from $(0,0)$ to $(0,0)$, namely the trivial path. All the other Dyck paths in $D$ are nontrivial. We let
\begin{align*}
D_{\mathrm{triv}} &:= \{\text{trivial Dyck paths in } D\} \qquad \text{and} \\
D_{\mathrm{non}} &:= \{\text{nontrivial Dyck paths in } D\}.
\end{align*}
These two sets $D_{\mathrm{triv}}$ and $D_{\mathrm{non}}$ are subsets of $D$, and thus are weighted sets themselves (we define their weight functions by restricting the one of $D$).
⁴⁹Proof. The weight of a composition $(a_1,a_2,\ldots,a_k) \in C_k$ in $C_k$ is
$$|(a_1,a_2,\ldots,a_k)| = a_1+a_2+\cdots+a_k \qquad \text{(by definition)}.$$
The weight of a composition $(a_1,a_2,\ldots,a_k) \in P^k$ in $P^k$ is
\begin{align*}
|(a_1,a_2,\ldots,a_k)| &= |a_1|+|a_2|+\cdots+|a_k| \qquad \text{(by (136))} \\
&= a_1+a_2+\cdots+a_k \qquad \text{(since } |n| = n \text{ for each } n \in P\text{)}.
\end{align*}
These two weights are clearly equal. Thus, the weight functions of $C_k$ and $P^k$ agree. Hence, $C_k = P^k$ as weighted sets.

The set $D_{\mathrm{triv}}$ consists of a single Dyck path, which has weight 0; thus, its weight generating function is
$$\overline{D_{\mathrm{triv}}} = x^0 = 1.$$
In Example 2 of Section 3.1, we have seen that any nontrivial Dyck path $\pi$ has the following structure:⁵⁰

– a NE-step,
– followed by a (diagonally shifted) Dyck path (drawn in green),
– followed by a SE-step,
– followed by another (horizontally shifted) Dyck path (drawn in purple).

If we denote the green Dyck path by $\alpha$ and the purple Dyck path by $\beta$, then we obtain a pair $(\alpha,\beta) \in D \times D$ of two Dyck paths. Thus, each nontrivial Dyck path $\pi \in D_{\mathrm{non}}$ gives rise to a pair $(\alpha,\beta) \in D \times D$ of two Dyck paths. This yields a map
\begin{align*}
D_{\mathrm{non}} &\to D \times D, \\
\pi &\mapsto (\alpha, \beta),
\end{align*}
which is easily seen to be a bijection (since any pair $(\alpha,\beta) \in D \times D$ can be assembled into a single nontrivial Dyck path $\pi$ that starts with a NE-step, is followed by a shifted copy of $\alpha$, then by a SE-step, then by a shifted copy of $\beta$).

Alas, this bijection is not an isomorphism of weighted sets, since it fails to preserve the weight. Indeed, $|(\alpha,\beta)| = |\alpha| + |\beta| = |\pi| - 1 \neq |\pi|$.

Fortunately, we can fix this rather easily. Define a weighted set
$$X := \{1\} \qquad \text{with } |1| = 1.$$
This is a one-element set, so the only real difference between the weighted sets $X \times D \times D$ and $D \times D$ is in the weights. Indeed, the sets $D \times D$ and $X \times D \times D$ are in bijection (any pair $(\alpha,\beta) \in D \times D$ corresponds to the triple $(1,\alpha,\beta) \in X \times D \times D$), but the weights of corresponding elements differ by 1 (namely, $|(1,\alpha,\beta)| = \underbrace{|1|}_{=1} + \underbrace{|\alpha| + |\beta|}_{=|(\alpha,\beta)|} = 1 + |(\alpha,\beta)|$).

Thus, by replacing $D \times D$ by $X \times D \times D$, we can fix the degrees in our above bijection. We thus obtain a bijection
\begin{align*}
D_{\mathrm{non}} &\to X \times D \times D, \\
\pi &\mapsto (1, \alpha, \beta),
\end{align*}
which does preserve the weight. This bijection is thus an isomorphism of weighted sets. Hence, $D_{\mathrm{non}} \cong X \times D \times D$.

Each Dyck path is either trivial or nontrivial. Hence,
$$D \cong D_{\mathrm{triv}} + D_{\mathrm{non}} \cong D_{\mathrm{triv}} + X \times D \times D,$$
so that
\begin{align*}
\overline{D} &= \overline{D_{\mathrm{triv}} + X \times D \times D} \qquad \text{(by Proposition 3.12.4)} \\
&= \underbrace{\overline{D_{\mathrm{triv}}}}_{=1} + \underbrace{\overline{X}}_{=x} \cdot \overline{D} \cdot \overline{D} \qquad \text{(by Proposition 3.12.6 and Proposition 3.12.8)} \\
&= 1 + x \cdot \overline{D}^{\,2}.
\end{align*}
This is precisely the quadratic equation $C(x) = 1 + x\,(C(x))^2$ that we obtained in Section 3.1. But this time, we obtained it in a more conceptual way, through a combinatorially defined isomorphism $D \cong D_{\mathrm{triv}} + X \times D \times D$ rather than by ad-hoc manipulation of FPSs.

⁵⁰The colors are referring to the following picture: [figure: a nontrivial Dyck path decomposed into a NE-step, a green Dyck path, a SE-step and a purple Dyck path]
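The quadratic equation $C(x) = 1 + x\,(C(x))^2$ determines the Catalan numbers recursively: comparing coefficients of $x^n$ gives $c_0 = 1$ and $c_n = \sum_{i=0}^{n-1} c_i c_{n-1-i}$ for $n \ge 1$. A quick machine check (assuming the familiar closed form $c_n = \binom{2n}{n}/(n+1)$ from Example 2 of Section 3.1):

```python
from math import comb

N = 10
# Read c_n off from D = 1 + x*D^2:
#   c_0 = 1,  c_n = sum_{i=0}^{n-1} c_i * c_{n-1-i}  for n >= 1.
c = [0] * N
c[0] = 1
for n in range(1, N):
    c[n] = sum(c[i] * c[n - 1 - i] for i in range(n))

# These are exactly the Catalan numbers:
assert c == [comb(2 * n, n) // (n + 1) for n in range(N)]
# c = [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```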

3.12.3. Domino tilings


Let us now apply the theory of weight generating functions to something we haven’t already counted. Namely, we shall count the domino tilings of a rectangle. Informally, these are defined as follows:

• For any n, m ∈ N, we let Rn,m be a rectangle with width n and height m.

• A domino means a rectangle that is an R1,2 or an R2,1 .

• A domino tiling of a shape S means a tiling of S by dominos (i.e., a set of


dominos that cover S and whose interiors don’t intersect).

For example, here is a domino tiling of $R_{8,8}$:

[figure: a domino tiling of the $8\times 8$ square]

(note that the colors are purely ornamental here: we are coloring a domino pink if it lies horizontally and green if it stands vertically, for the sake of convenience).
This sounds geometric, but actually is a combinatorial object hiding behind geometric language. Our rectangles and dominos all align to a square grid. Thus, rectangles can be modeled simply as finite sets of grid squares, and dominos are unordered pairs of adjacent grid squares. Grid squares, in turn, can be modeled as pairs $(i,j)$ of integers (corresponding to the Cartesian coordinates of their centers). Thus, we redefine domino tilings combinatorially as follows:

Definition 3.12.11. (a) A shape means a subset of $\mathbb{Z}^2$.

We draw each $(i,j) \in \mathbb{Z}^2$ as a unit square with center at the point $(i,j)$ (in Cartesian coordinates); thus, a shape can be drawn as a cluster of squares.

(b) For any $n, m \in \mathbb{N}$, the shape $R_{n,m}$ (called the $n \times m$-rectangle) is defined to be
$$\{1,2,\ldots,n\} \times \{1,2,\ldots,m\} = \left\{(i,j) \in \mathbb{Z}^2 \mid 1 \le i \le n \text{ and } 1 \le j \le m\right\}.$$

(c) A domino means a size-2 shape of the form
\begin{align*}
&\{(i,j),\ (i+1,j)\} \qquad \text{(a “horizontal domino”)} \qquad \text{or} \\
&\{(i,j),\ (i,j+1)\} \qquad \text{(a “vertical domino”)}
\end{align*}
for some $(i,j) \in \mathbb{Z}^2$.

(d) A domino tiling of a shape $S$ is a set partition of $S$ into dominos (i.e., a set of disjoint dominos whose union is $S$).

(e) For any $n, m \in \mathbb{N}$, let $d_{n,m}$ be the # of domino tilings of $R_{n,m}$.

For example, the domino tiling

[figure: $R_{3,2}$ tiled by one vertical and two horizontal dominos]

of the rectangle $R_{3,2}$ is the set partition
$$\{\{(1,1),(1,2)\},\ \{(2,1),(3,1)\},\ \{(2,2),(3,2)\}\}$$
of $R_{3,2}$ (here, $\{(1,1),(1,2)\}$ is the vertical domino, whereas $\{(2,1),(3,1)\}$ is the bottom horizontal domino, and $\{(2,2),(3,2)\}$ is the top horizontal domino).

Can we compute dn,m ?


The case m = 1 is a bit too simple (do it!), so let us start with the case m = 2.
Here are the dn,2 for n ∈ {0, 1, . . . , 4}:

$$\begin{array}{c|c|c}
n & d_{n,2} & \text{domino tilings of } R_{n,2} \\ \hline
0 & d_{0,2} = 1 & \text{(the empty tiling)} \\
1 & d_{1,2} = 1 & \text{[picture omitted]} \\
2 & d_{2,2} = 2 & \text{[pictures omitted]} \\
3 & d_{3,2} = 3 & \text{[pictures omitted]} \\
4 & d_{4,2} = 5 & \text{[pictures omitted]}
\end{array}$$

Let us try to compute dn,2 in general.



A height-2 rectangle shall mean a rectangle of the form $R_{n,2}$ with $n \in \mathbb{N}$. Let us define the weighted set
\begin{align*}
D &:= \{\text{domino tilings of height-2 rectangles}\} \\
&= \{\text{domino tilings of } R_{n,2} \text{ with } n \in \mathbb{N}\}.
\end{align*}
We define the weight of a tiling $T$ of $R_{n,2}$ to be $|T| := n$ (that is, the width of the rectangle tiled by this tiling). Thus, $D$ is a finite-type weighted set, with weight generating function
$$\overline{D} = \sum_{n\in\mathbb{N}} d_{n,2}\, x^n.$$
So we want to compute $\overline{D}$. Let us define a new weighted set that will help us at that.
Namely, we say that a fault of a domino tiling T is a vertical line ℓ such that

• each domino of T lies either left of ℓ or right of ℓ (but does not straddle
ℓ), and
• there is at least one domino of T that lies left of ℓ, and at least one domino
of T that lies right of ℓ.

For example, here are two domino tilings:

[figure: two domino tilings, one with a fault and one without]

The tiling on the left has a fault (namely, the vertical line separating the 2nd from the 3rd column), but the tiling on the right has none (a fault must be a vertical line by definition; a horizontal line doesn’t count).
A domino tiling will be called faultfree if it is nonempty and has no fault.
Thus, the tiling on the right (in the above example) is faultfree.
We now observe a crucial (but trivial) lemma:

Lemma 3.12.12. Any domino tiling of a height-2 rectangle can be decomposed uniquely into a tuple of faultfree tilings of (usually smaller) height-2 rectangles, by cutting it along its faults. For example:

[figure: a height-2 tiling decomposing into a 4-tuple of faultfree tilings]

(Note that if the original tiling was faultfree, then it will decompose into a 1-tuple. If the original tiling was empty, then it will decompose into a 0-tuple.)

Moreover, the sum of the weights of the faultfree tilings in the tuple is the weight of the original tiling. (In other words, if a tiling $T$ decomposes into the tuple $(T_1, T_2, \ldots, T_k)$, then $|T| = |T_1| + |T_2| + \cdots + |T_k|$.)

Thus, if we define a new weighted set
$$F := \{\text{faultfree domino tilings of } R_{n,2} \text{ with } n \in \mathbb{N}\}$$
(with the same weights as in $D$), then we obtain an isomorphism (i.e., weight-preserving bijection)
$$D \to \underbrace{F^0 + F^1 + F^2 + F^3 + \cdots}_{\substack{\text{an infinite disjoint union} \\ \text{of weighted sets}}}$$
that sends each tiling to the tuple it decomposes into. Hence,
$$D \cong F^0 + F^1 + F^2 + F^3 + \cdots,$$
and therefore
\begin{align*}
\overline{D} &= \overline{F^0 + F^1 + F^2 + F^3 + \cdots} = \overline{F^0} + \overline{F^1} + \overline{F^2} + \overline{F^3} + \cdots \qquad \text{(by the infinite analogue of Proposition 3.12.6)} \\
&= \overline{F}^{\,0} + \overline{F}^{\,1} + \overline{F}^{\,2} + \overline{F}^{\,3} + \cdots \qquad \text{(by Proposition 3.12.10)} \\
&= \frac{1}{1-\overline{F}} \qquad \left(\text{by (5), with } \overline{F} \text{ substituted for } x\right).
\end{align*}
Thus, if we can compute $\overline{F}$, then we can compute $\overline{D}$.


In order to compute $\overline{F}$, let us see what a faultfree domino tiling of a height-2 rectangle looks like. Here are two such tilings:

[figure: a single vertical domino, and two horizontal dominos stacked atop each other]

I claim that these two tilings are the only faultfree tilings of height-2 rectangles. Indeed, consider any faultfree tiling of a height-2 rectangle. In this tiling, look at the domino that covers the box $(1,1)$. If it is a vertical domino, then this vertical domino must constitute the entire tiling, since otherwise there would be a fault to its right. If it is a horizontal domino, then there must be a second horizontal domino stacked atop it, and these two dominos must then constitute the entire tiling, since otherwise there would be a fault to their right. This leads to the two options we just named.

Thus, the weighted set $F$ consists of just the two tilings shown above: one tiling of weight 1 and one tiling of weight 2. Hence, its weight generating function is $\overline{F} = x + x^2$. So
$$\overline{D} = \frac{1}{1-\overline{F}} = \frac{1}{1-(x+x^2)} = \frac{1}{1-x-x^2} = f_1 + f_2 x + f_3 x^2 + f_4 x^3 + \cdots,$$
where $(f_0, f_1, f_2, \ldots)$ is the Fibonacci sequence. Thus, comparing coefficients, we find
$$d_{n,2} = f_{n+1} \qquad \text{for each } n \in \mathbb{N}.$$
There are, of course, more elementary proofs of this (see [19fco, Proposition 1.1.11]).
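The conclusion $d_{n,2} = f_{n+1}$ can be verified independently by brute force. The sketch below (our own code, not from the text) counts domino tilings of an arbitrary finite shape by always covering the lexicographically smallest uncovered square, which can only belong to two possible dominos:

```python
def count_tilings(cells):
    """# of domino tilings of a finite set of grid squares: cover the
    smallest uncovered square (i, j); the domino covering it must also
    contain (i+1, j) or (i, j+1), since any smaller neighbor would have
    been the minimum instead."""
    cells = frozenset(cells)
    if not cells:
        return 1
    (i, j) = min(cells)
    total = 0
    for other in [(i + 1, j), (i, j + 1)]:  # horizontal / vertical domino
        if other in cells:
            total += count_tilings(cells - {(i, j), other})
    return total

def rectangle(n, m):
    """The rectangle R_{n,m} as a set of grid squares."""
    return {(i, j) for i in range(1, n + 1) for j in range(1, m + 1)}

fib = [0, 1]                      # f_0 = 0, f_1 = 1, f_{n} = f_{n-1} + f_{n-2}
for _ in range(12):
    fib.append(fib[-1] + fib[-2])

assert all(count_tilings(rectangle(n, 2)) == fib[n + 1] for n in range(9))
```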

Remark 3.12.13. Here is an outline of an alternative proof of $\overline{D} = \dfrac{1}{1-\overline{F}}$: Any tiling in $D$ is either empty, or can be uniquely split into a pair of a faultfree tiling and an arbitrary tiling (just split it along its leftmost fault, or along the right end if there is no fault). Thus,
$$D \cong \{0\} + F \times D,$$
where the set $\{0\}$ here is viewed as a weighted set with $|0| = 0$. Hence,
$$\overline{D} = \overline{\{0\} + F \times D} = \overline{\{0\}} + \overline{F} \cdot \overline{D} = 1 + \overline{F} \cdot \overline{D}.$$
Solving this for $\overline{D}$, we find $\overline{D} = \dfrac{1}{1-\overline{F}}$.

Now, let us try to solve the analogous problem for height-3 rectangles. Forget about the $D$ and $F$ we defined above. Instead, define a new weighted set
\begin{align*}
D &:= \{\text{domino tilings of height-3 rectangles}\} \\
&= \{\text{domino tilings of } R_{n,3} \text{ with } n \in \mathbb{N}\}.
\end{align*}
The weight of a tiling $T$ of $R_{n,3}$ is defined as before (i.e., it is $|T| = n$). Thus, $D$ is a finite-type weighted set, with generating function
$$\overline{D} = \sum_{n\in\mathbb{N}} d_{n,3}\, x^n.$$

We want to compute this $\overline{D}$. Set
$$F := \{\text{faultfree domino tilings of } R_{n,3} \text{ with } n \in \mathbb{N}\}.$$
Again, we can show $\overline{D} = \dfrac{1}{1-\overline{F}}$ (as we did above in the case of $R_{n,2}$). We thus need to compute $\overline{F}$.

What does a faultfree domino tiling of a height-3 rectangle look like? Let us classify such tilings according to the kind of dominos that occupy the first column of the tiling. The proof of the following classification can be found in the Appendix (Proposition B.3.2 in Section B.3):

• The faultfree domino tilings of a height-3 rectangle that contain a vertical domino in the top two squares of the first column are

[figure: an infinite sequence of tilings of widths $2, 4, 6, \ldots$]

(this is an infinite sequence of tilings, each obtained from the previous by inserting two columns in the middle by a fairly self-explanatory procedure). The weights of these tilings are $2, 4, 6, \ldots$, so their total contribution to the weight generating function $\overline{F}$ of $F$ is $x^2 + x^4 + x^6 + \cdots$.

• The faultfree domino tilings of a height-3 rectangle that contain a vertical domino in the bottom two squares of the first column are

[figure: the top-down mirror images of the tilings from the previous bullet point]

(these are the top-down mirror images of the tilings from the previous bullet point). The weights of these tilings are $2, 4, 6, \ldots$, so their total contribution to the weight generating function $\overline{F}$ of $F$ is $x^2 + x^4 + x^6 + \cdots$.

• The faultfree domino tilings of a height-3 rectangle that contain no vertical domino in the first column are

[figure: $R_{2,3}$ tiled by three horizontal dominos]

(yes, there is only one such tiling). The weight of this tiling is 2, so its total contribution to the weight generating function $\overline{F}$ of $F$ is $x^2$.
This classification of faultfree domino tilings entails
\begin{align*}
\overline{F} &= \left(x^2+x^4+x^6+\cdots\right) + \left(x^2+x^4+x^6+\cdots\right) + x^2 \\
&= x^2\cdot\frac{1}{1-x^2} + x^2\cdot\frac{1}{1-x^2} + x^2 \qquad \left(\text{since } x^2+x^4+x^6+\cdots = x^2\cdot\frac{1}{1-x^2}\right) \\
&= \frac{3x^2 - x^4}{1-x^2}.
\end{align*}
Thus,
$$\overline{D} = \frac{1}{1-\overline{F}} = \frac{1}{1 - \dfrac{3x^2-x^4}{1-x^2}} = \frac{1-x^2}{1-4x^2+x^4} = 1 + 3x^2 + 11x^4 + 41x^6 + 153x^8 + \cdots.$$

You will notice that only even powers of x appear in this FPS. In other words,
dn,3 = 0 when n is odd.
This is not surprising, because if n is odd, then the rectangle Rn,3 has an odd #
of squares, and thus cannot be tiled by dominos.
But we can also compute d_{n,3} for even n. Indeed, using the same method
(partial fractions) that we used for the Fibonacci sequence in Section 3.1, we
can expand \frac{1 − x^2}{1 − 4x^2 + x^4} as a sum of geometric series:

\frac{1 − x^2}{1 − 4x^2 + x^4} = \frac{3 + \sqrt{3}}{6} · \frac{1}{1 − ( 2 + \sqrt{3} ) x^2} + \frac{3 − \sqrt{3}}{6} · \frac{1}{1 − ( 2 − \sqrt{3} ) x^2} .

Thus, we find

d_{n,3} = \frac{3 + \sqrt{3}}{6} ( 2 + \sqrt{3} )^{n/2} + \frac{3 − \sqrt{3}}{6} ( 2 − \sqrt{3} )^{n/2}        for any even n.
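A quick floating-point sanity check of this closed form (my own sketch, not part of the text) against the recurrence d_{n,3} = 4 d_{n−2,3} − d_{n−4,3} that the denominator 1 − 4x^2 + x^4 encodes:

```python
from math import sqrt, isclose

def d3_closed(n):
    """The closed form above (n even), evaluated in floating point."""
    s = sqrt(3)
    return (3 + s) / 6 * (2 + s) ** (n // 2) + (3 - s) / 6 * (2 - s) ** (n // 2)

# the denominator 1 - 4x^2 + x^4 gives the recurrence d[n] = 4 d[n-2] - d[n-4]
d = {0: 1, 2: 3}
for n in range(4, 21, 2):
    d[n] = 4 * d[n - 2] - d[n - 4]
for n in range(0, 21, 2):
    assert isclose(d3_closed(n), d[n])
print(d[20])  # 413403
```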
Now, what about computing d_{n,m} in general? The above reasoning leading up
to D = \frac{1}{1 − F} can be applied for any m ∈ N, but describing F becomes harder
and harder as m grows larger. The generating function D is still a quotient of
two polynomials for any m (see, e.g., [KlaPol79]), but this requires more insight
to prove. For m ≥ 6, it appears that there is no formula for d_{n,m} that requires
only quadratic irrationalities.
It is worth mentioning a different formula for dn,m , found by Kasteleyn in
1961 (motivated by a theoretical physics model):
Theorem 3.12.14 (Kasteleyn’s formula). Let n, m ∈ N be such that m is even
and n ≥ 1. Then,

d_{n,m} = 2^{mn/2} \prod_{j=1}^{m/2} \prod_{k=1}^{n} \sqrt{ \cos^2 \frac{jπ}{m+1} + \cos^2 \frac{kπ}{n+1} } .

See [Loehr11, Theorem 12.85] or [Stucky15] for proofs of this rather surprising
formula (which, alas, require some more advanced methods than those
introduced in this text). Note that it can indeed be used for exact computation
of d_{n,m} (as there are algorithms for exact manipulation of “trigonometric
irrationals”^{51} such as \cos \frac{π}{m+1} ); for example, it yields d_{8,8} = 12 988 816.
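The formula is also easy to evaluate numerically. The sketch below (my own illustration; it works in floating point, so the result is rounded at the end) reproduces the values mentioned in the text:

```python
from math import cos, pi, prod

def kasteleyn(n, m):
    """Numerical evaluation of the product formula above (m even, n >= 1)."""
    assert m % 2 == 0 and n >= 1
    p = prod((cos(j * pi / (m + 1)) ** 2 + cos(k * pi / (n + 1)) ** 2) ** 0.5
             for j in range(1, m // 2 + 1)
             for k in range(1, n + 1))
    return 2 ** (m * n // 2) * p

print(round(kasteleyn(8, 8)))  # 12988816, as stated in the text
print(round(kasteleyn(2, 2)))  # the two tilings of a 2x2 square
```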

3.13. Limits of FPSs


Much of what we have been doing with FPSs has been an analogue (a mockery,
as some might say) of analysis, particularly of complex analysis. We have de-
fined FPSs, derivatives, infinite sums and products, exponentials and various
related concepts, all by mimicking the respective analytic notions but avoiding
any reference to real numbers or any other peculiarities that our base ring K
may or may not have.
Let me introduce yet another such formal analogue of a classical concept
from analysis: that of a limit. Specifically, I will define (coefficientwise) limits
of sequences of FPSs. The exposition will follow [Loehr11, §7.5]. These limits
are very easy to define, yet quite useful. (In particular, they can be used to give
a simpler definition of infinite products, although a less natural one.)

3.13.1. Stabilization of scalars


We start with the most trivial kind of limit of all: the limit of an eventually
constant sequence. Its main advantage is its generality and the simplicity
of its definition:

Definition 3.13.1. Let ( a_i )_{i∈N} = ( a_0 , a_1 , a_2 , . . . ) ∈ K^N be a sequence of
elements of K. Let a ∈ K.
We say that the sequence ( a_i )_{i∈N} stabilizes to a if there exists some N ∈ N
such that

all integers i ≥ N satisfy a_i = a.
Instead of saying “( ai )i∈N stabilizes to a”, we can also say “ai stabilizes to
a as i → ∞”. (The words “as i → ∞” here signify that we are talking not
about a single term ai but about the entire sequence ( ai )i∈N .)
51 The technical term is “elements of cyclotomic fields”.
If a_i stabilizes to a as i → ∞, then we write lim_{i→∞} a_i = a and say that a is
the limit (or eventual value) of ( a_i )_{i∈N} (or, less precisely, that a is the limit of
the a_i ). This is legitimate, since a is uniquely determined by the sequence
( a_i )_{i∈N} .
We can replace N by Z_{≥q} = {q, q + 1, q + 2, . . .} in this definition, where
q is any integer. That is, everything we said applies not just to sequences of
the form ( a_i )_{i∈N} = ( a_0 , a_1 , a_2 , . . . ), but also to sequences of the form
( a_i )_{i≥q} = ( a_q , a_{q+1} , a_{q+2} , . . . ) .

Here are some examples and non-examples of stabilization:

• The sequence

      ( ⌊5/i⌋ )_{i≥1} = ( ⌊5/1⌋ , ⌊5/2⌋ , ⌊5/3⌋ , . . . ) = (5, 2, 1, 1, 1, 0, 0, 0, 0, . . .) ∈ Z^N

  stabilizes to 0, since all integers i ≥ 6 satisfy ⌊5/i⌋ = 0.

• On the other hand, the sequence ( 5/i )_{i≥1} ∈ Q^N does not stabilize to
  anything. (It does converge to 0 in the sense of real analysis, but this is a
  weaker statement than stabilization.)

• Any constant sequence ( a, a, a, . . .) ∈ KN stabilizes to a.

• A well-known (if non-constructive) fact says that every weakly decreasing


sequence of nonnegative integers stabilizes to some nonnegative integer
(since it cannot decrease infinitely often).

• If s is a nilpotent element of a ring K (that is, an element satisfying s^k = 0
  for some k ≥ 0), then the sequence ( s^i )_{i∈N} = ( s^0 , s^1 , s^2 , . . . ) stabilizes to
  0 ∈ K. For example, the element 2 of the ring Z/8Z is nilpotent (with
  2^3 = 0), so that the sequence ( 2^i )_{i∈N} = ( 1, 2, 4, 0, 0, 0, . . . ) stabilizes to 0.

• If s is an idempotent element of a ring K (that is, an element satisfying
  s^2 = s), then the sequence ( s^i )_{i∈N} = ( s^0 , s^1 , s^2 , . . . ) stabilizes to s (in fact,
  s^1 = s^2 = s^3 = · · · = s).
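Two of the examples above can be checked by brute force (a throwaway sketch, not part of the text; `stabilizes_to` only inspects a finite prefix, so it is a heuristic check, not a proof):

```python
def stabilizes_to(seq, a, tail=5):
    """Heuristic check on a finite prefix: the last `tail` terms all equal a."""
    return all(t == a for t in seq[-tail:])

floor_seq = [5 // i for i in range(1, 20)]       # 5, 2, 1, 1, 1, 0, 0, ...
assert floor_seq[:6] == [5, 2, 1, 1, 1, 0]
assert stabilizes_to(floor_seq, 0)

powers_mod8 = [pow(2, i, 8) for i in range(10)]  # 1, 2, 4, 0, 0, ... in Z/8Z
assert powers_mod8[:4] == [1, 2, 4, 0]
assert stabilizes_to(powers_mod8, 0)
print("ok")
```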

Remark 3.13.2. If you are familiar with point-set topology, then you will rec-
ognize Definition 3.13.1 as an instance of topological convergence: Namely,
a sequence ( ai )i∈N stabilizes to some a ∈ K if and only if ( ai )i∈N converges
to a in the discrete topology.
3.13.2. Coefficientwise stabilization of FPSs


Now we define a somewhat subtler notion of limits, one specific to FPSs (as
opposed to elements of an arbitrary commutative ring). This is the notion that
we will use in what follows:

Definition 3.13.3. Let ( f i )i∈N ∈ K [[ x ]]N be a sequence of FPSs over K. Let


f ∈ K [[ x ]] be an FPS.
We say that ( f i )i∈N coefficientwise stabilizes to f if for each n ∈ N,

the sequence ([ x n ] f i )i∈N stabilizes to [ x n ] f .

Instead of saying “( f i )i∈N coefficientwise stabilizes to f ”, we can also say


“ f i coefficientwise stabilizes to f as i → ∞” or write “ f i → f as i → ∞”.
If f i coefficientwise stabilizes to f as i → ∞, then we write lim f i = f and
i→∞
say that f is the limit of ( f i )i∈N (or, less precisely, that f is the limit of the f i ).
This is legitimate, because f is uniquely determined by the sequence ( f i )i∈N .
Again, we can replace N by Z≥q = {q, q + 1, q + 2, . . .} in this definition,
where q is any integer.

Example 3.13.4. (a) The sequence ( x^i )_{i∈N} of FPSs coefficientwise stabilizes
to the FPS 0. (Indeed, for each n ∈ N, the sequence ( [x^n] x^i )_{i∈N} consists of
a single 1 and infinitely many 0s; thus, this sequence stabilizes to 0.) In other
words, we have lim_{i→∞} x^i = 0.

(b) The sequence ( \frac{1}{i} x )_{i≥1} of FPSs does not stabilize (since the x^1-coefficients
of these FPSs \frac{1}{i} x never stabilize). Thus, lim_{i→∞} \frac{1}{i} x does not exist.

(c) We have

lim_{i→∞} ( 1 + x^1 ) ( 1 + x^2 ) · · · ( 1 + x^i ) = \prod_{k=1}^{∞} ( 1 + x^k ) .

Indeed, the x^n-coefficient of the product ( 1 + x^1 ) ( 1 + x^2 ) · · · ( 1 + x^i ) depends
only on those factors in which the exponent is ≤ n, and thus stops
changing after i surpasses n; therefore, it stabilizes to the x^n-coefficient of the
infinite product \prod_{k=1}^{∞} ( 1 + x^k ) .

(d) It would be nice to have lim_{n→∞} ( 1 + \frac{x}{n} )^n = exp, as in real analysis.
Unfortunately, this is not the case. In fact, the binomial formula yields

( 1 + \frac{x}{n} )^n = 1 + n · \frac{x}{n} + \binom{n}{2} · ( \frac{x}{n} )^2 + · · · = 1 + x + \frac{\binom{n}{2}}{n^2} x^2 + · · · .

This shows that the x^0-coefficient and the x^1-coefficient of ( 1 + \frac{x}{n} )^n stabilize
as n → ∞, but the x^2-coefficient does not. Thus, lim_{n→∞} ( 1 + \frac{x}{n} )^n does not exist
(according to our definition of limit).

(e) Coefficientwise stabilization is weaker than stabilization: If a sequence
( f_0 , f_1 , f_2 , . . . ) of FPSs stabilizes to an FPS f (meaning that f_i = f for all
sufficiently high i ∈ N), then it coefficientwise stabilizes to f as well. In
particular, any constant sequence ( f , f , f , . . . ) of FPSs stabilizes to f .
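Example (d) can be made concrete by computing the coefficients exactly over K = Q (my own sketch, not part of the text; `math.comb` requires Python 3.8+):

```python
from fractions import Fraction
from math import comb

# exact x^k-coefficient of (1 + x/n)^n:  C(n, k) / n^k
def coeff(n, k):
    return Fraction(comb(n, k), n ** k)

# the x^1-coefficient is 1 for every n, so it stabilizes...
assert all(coeff(n, 1) == 1 for n in range(1, 50))

# ...but the x^2-coefficient C(n,2)/n^2 = (n-1)/(2n) keeps changing forever
c2 = [coeff(n, 2) for n in range(1, 50)]
assert len(set(c2[-10:])) == 10
print(c2[1], c2[2], c2[3])  # 1/4 1/3 3/8
```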

Remark 3.13.5. If you are familiar with point-set topology, then you will
again recognize Definition 3.13.3 as an instance of topological convergence:
Namely, we recall that each FPS is an infinite sequence of elements of K (its
coefficients). Thus, the set K [[ x ]] is the infinite Cartesian product K × K ×
K × · · · . If we equip each factor K in this product with the discrete topology,
then the entire product K [[ x ]] becomes a product space equipped with the
product topology. Now, a sequence ( f i )i∈N of FPSs in K [[ x ]] coefficientwise
stabilizes to some FPS f ∈ K [[ x ]] if and only if ( f i )i∈N converges to f in this
product topology. (This is not surprising: After all, the product topology
is also known as the “topology of pointwise convergence”, and in our case
“pointwise” means “coefficientwise”.)

3.13.3. Some properties of limits


We shall now state some basic properties of limits of FPSs. All the proofs are
simple and therefore omitted; some can be found in the Appendix (Section B.4).

Theorem 3.13.6. Let ( f_i )_{i∈N} ∈ K[[x]]^N be a sequence of FPSs. Assume that
for each n ∈ N, there exists some g_n ∈ K such that the sequence ( [x^n] f_i )_{i∈N}
stabilizes to g_n . Then, the sequence ( f_i )_{i∈N} coefficientwise stabilizes to
\sum_{n∈N} g_n x^n . (That is, lim_{i→∞} f_i = \sum_{n∈N} g_n x^n .)

Proof. Obvious.
The following easy lemma connects limits with the notion of x n -equivalence
(as introduced in Definition 3.10.1):

Lemma 3.13.7. Assume that ( f_i )_{i∈N} is a sequence of FPSs, and that f is an
FPS such that lim_{i→∞} f_i = f . Then, for each n ∈ N, there exists some integer
N ∈ N such that

all integers i ≥ N satisfy f_i \overset{x^n}{≡} f .

Proof of Lemma 3.13.7 (sketched). Let n ∈ N. For each k ∈ {0, 1, . . . , n}, we pick
an N_k ∈ N such that all i ≥ N_k satisfy [x^k] f_i = [x^k] f (such an N_k exists, since
lim_{i→∞} f_i = f ). Then, all integers i ≥ max { N_0 , N_1 , . . . , N_n } satisfy f_i \overset{x^n}{≡} f . See
Section B.4 for the details of this proof.
The following proposition is an analogue of the classical “limits respect sums
and products” theorem from real analysis:
Proposition 3.13.8. Assume that ( f_i )_{i∈N} and ( g_i )_{i∈N} are two sequences of
FPSs, and that f and g are two FPSs such that

lim_{i→∞} f_i = f        and        lim_{i→∞} g_i = g.

Then,

lim_{i→∞} ( f_i + g_i ) = f + g        and        lim_{i→∞} ( f_i g_i ) = f g.

(In other words, the sequences ( f_i + g_i )_{i∈N} and ( f_i g_i )_{i∈N} coefficientwise
stabilize to f + g and f g, respectively.)

Proof of Proposition 3.13.8 (sketched). Let n ∈ N. Then, Lemma 3.13.7 shows that
there exists some integer N ∈ N such that

all integers i ≥ N satisfy f_i \overset{x^n}{≡} f .

Similarly, there exists some integer M ∈ N such that

all integers i ≥ M satisfy g_i \overset{x^n}{≡} g .

Pick such integers N and M, and set P := max { N, M }. Then, all integers i ≥ P
satisfy both f_i \overset{x^n}{≡} f and g_i \overset{x^n}{≡} g , and therefore f_i g_i \overset{x^n}{≡} f g (by (101)), and thus
[x^n] ( f_i g_i ) = [x^n] ( f g ). This shows that the sequence ( [x^n] ( f_i g_i ) )_{i∈N} stabilizes
to [x^n] ( f g ). Since this holds for each n ∈ N, we conclude that lim_{i→∞} ( f_i g_i ) = f g.
The proof of lim_{i→∞} ( f_i + g_i ) = f + g is analogous.
See Section B.4 for the details of this proof.
Corollary 3.13.9. Let k ∈ N. For each i ∈ {1, 2, . . . , k}, let f_i be an FPS, and
let ( f_{i,n} )_{n∈N} be a sequence of FPSs such that

lim_{n→∞} f_{i,n} = f_i

(note that it is n, not i, that goes to ∞ here!). Then,

lim_{n→∞} \sum_{i=1}^{k} f_{i,n} = \sum_{i=1}^{k} f_i        and        lim_{n→∞} \prod_{i=1}^{k} f_{i,n} = \prod_{i=1}^{k} f_i .
Proof. Follows by induction on k, using Proposition 3.13.8.

Remark 3.13.10. The analogue of Corollary 3.13.9 for infinite sums (or products)
is not true without further requirements. For instance, for each i ∈ N,
we can define a constant FPS f_{i,n} = δ_{i,n} (using Definition 3.5.6). Then, we
have lim_{n→∞} f_{i,n} = 0 for any fixed i ∈ N, but

lim_{n→∞} \sum_{i=0}^{∞} f_{i,n} = lim_{n→∞} 1 = 1 ≠ 0.

The following proposition says that limits respect quotients:

Proposition 3.13.11. Assume that ( f_i )_{i∈N} and ( g_i )_{i∈N} are two sequences of
FPSs, and that f and g are two FPSs such that

lim_{i→∞} f_i = f        and        lim_{i→∞} g_i = g.

Assume that each FPS g_i is invertible. Then, g is also invertible, and we have

lim_{i→∞} \frac{f_i}{g_i} = \frac{f}{g} .

Proof of Proposition 3.13.11 (sketched). First use Proposition 3.3.7 to show that g
is invertible; then use Theorem 3.10.3 (e) to prove lim_{i→∞} \frac{f_i}{g_i} = \frac{f}{g} . A detailed proof
can be found in Section B.4.
Limits of FPSs furthermore respect composition:

Proposition 3.13.12. Assume that ( f_i )_{i∈N} and ( g_i )_{i∈N} are two sequences of
FPSs, and that f and g are two FPSs such that

lim_{i→∞} f_i = f        and        lim_{i→∞} g_i = g.

Assume that [x^0] g_i = 0 for each i ∈ N. Then, [x^0] g = 0 and

lim_{i→∞} ( f_i ◦ g_i ) = f ◦ g.

Proof of Proposition 3.13.12 (sketched). Similar to Proposition 3.13.11, but now
using Proposition 3.10.5. See Exercise A.2.13.1 (b) for details.
Limits of FPSs also respect derivatives (unlike in real analysis, where this
holds only under subtle additional conditions):
Proposition 3.13.13. Let ( f_i )_{i∈N} be a sequence of FPSs, and let f be an FPS
such that

lim_{i→∞} f_i = f .

Then,

lim_{i→∞} f_i′ = f ′ .

Proof. This is nearly trivial (see Exercise A.2.13.1 (a)).


Next, let us see how infinite sums and infinite products can be written as
limits of finite sums and products, at least as long as they are indexed by N.
The following two theorems and one corollary are easy to prove; detailed proofs
can be found in Section B.4.

Theorem 3.13.14. Let ( f_n )_{n∈N} be a summable sequence of FPSs. Then,

lim_{i→∞} \sum_{n=0}^{i} f_n = \sum_{n∈N} f_n .

In other words, the infinite sum \sum_{n∈N} f_n is the limit of the finite partial sums
\sum_{n=0}^{i} f_n .

Theorem 3.13.15. Let ( f_n )_{n∈N} be a multipliable sequence of FPSs. Then,

lim_{i→∞} \prod_{n=0}^{i} f_n = \prod_{n∈N} f_n .

In other words, the infinite product \prod_{n∈N} f_n is the limit of the finite partial
products \prod_{n=0}^{i} f_n .
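Theorem 3.13.15 can be watched in action on the product \prod_{k≥1} ( 1 + x^k ) from Example 3.13.4 (c) (a throwaway sketch over K = Z, not part of the text; FPSs are truncated to degree < N, stored as coefficient lists):

```python
N = 8

def mul_trunc(f, g):
    """Product of two truncated FPSs, keeping only degrees < N."""
    h = [0] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < N:
                h[i + j] += a * b
    return h

partial = [1] + [0] * (N - 1)
history = []
for k in range(1, 12):
    factor = [0] * N
    factor[0] = 1
    if k < N:
        factor[k] = 1           # the factor 1 + x^k (truncated)
    partial = mul_trunc(partial, factor)
    history.append(list(partial))

# once k surpasses N, the truncated partial products stop changing
assert history[-1] == history[-2] == history[-3]
print(history[-1])  # [1, 1, 1, 2, 2, 3, 4, 5]: partitions into distinct parts
```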

Corollary 3.13.16. Each FPS is a limit of a sequence of polynomials. Indeed,
if a = \sum_{n∈N} a_n x^n (with a_n ∈ K), then

a = lim_{i→∞} \sum_{n=0}^{i} a_n x^n .

This corollary can be restated as “the polynomials are dense in the FPSs”
(more formally: K[x] is dense in K[[x]]). This fact is useful, as it allows you
to restrict yourself to polynomials when proving some properties of FPSs. For
example, if you want to prove that some identity (such as the Leibniz rule
( f g )′ = f ′ g + f g′ ) holds for all FPSs, it suffices to prove it for polynomials, and
then conclude the general case using a limiting argument.
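As an illustration of the polynomial half of this strategy (my own sketch over K = Z, with a polynomial stored as its coefficient list; the helper names are invented), one can verify the Leibniz rule on concrete polynomials:

```python
def pmul(f, g):
    """Product of two polynomials given as coefficient lists."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def pdiff(f):
    """Formal derivative."""
    return [i * a for i, a in enumerate(f)][1:] or [0]

def padd(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

f = [1, 2, 0, 5]   # 1 + 2x + 5x^3
g = [3, 0, 1]      # 3 + x^2
assert pdiff(pmul(f, g)) == padd(pmul(pdiff(f), g), pmul(f, pdiff(g)))
print("ok")
```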
Both Theorem 3.13.14 and Theorem 3.13.15 have converses (which are some-
what harder to prove):

Theorem 3.13.17. Let ( f_0 , f_1 , f_2 , . . . ) be a sequence of FPSs such that
lim_{i→∞} \sum_{n=0}^{i} f_n exists. Then, the family ( f_n )_{n∈N} is summable, and satisfies

\sum_{n∈N} f_n = lim_{i→∞} \sum_{n=0}^{i} f_n .

Theorem 3.13.18. Let ( f_0 , f_1 , f_2 , . . . ) be a sequence of FPSs such that
lim_{i→∞} \prod_{n=0}^{i} f_n exists. Then, the family ( f_n )_{n∈N} is multipliable, and satisfies

\prod_{n∈N} f_n = lim_{i→∞} \prod_{n=0}^{i} f_n .

We refer to Section B.4 for the proofs of these two theorems.


Theorem 3.13.14 and Theorem 3.13.17 combined show that we could have
just as well defined infinite sums of FPSs as limits of finite partial sums, as
long as we were happy with sums of the form \sum_{n∈N} f_n (as opposed to sums of
the form \sum_{(n,m)∈N×N} or other kinds). Likewise, Theorem 3.13.15 and Theorem
3.13.18 show the same for products. However, I consider my original definitions
to be more natural.
It should be clear that all the above-stated properties of limits remain true
if the sequences involved are no longer indexed by N but rather indexed by
Z≥q = {q, q + 1, q + 2, . . .} for some integer q. Even better: The limit of a
sequence ( f n )n≥q does not depend on any chosen finite piece of this sequence,
so it is simultaneously the limit of the sequence ( f n )n≥r for any integer r (as
long as f n is well-defined for all n ≥ r). In this regard, limits behave exactly as
in real analysis.
3.14. Laurent power series


3.14.1. Motivation
Next, we shall try to extend the concept of FPSs to allow negative powers of x.
First, however, let me motivate this by showing an example where the need for
such negative powers arises.
First, we recall the binary positional system for nonnegative integers:

Definition 3.14.1. A binary representation of an integer n means an essentially
finite sequence ( b_i )_{i∈N} = ( b_0 , b_1 , b_2 , . . . ) ∈ {0, 1}^N such that

n = \sum_{i∈N} b_i 2^i .

(Recall that “essentially finite” means “all but finitely many i ∈ N satisfy
b_i = 0”.)

The following theorem is well-known (and has been proved in Subsection


3.11.1):

Theorem 3.14.2. Each n ∈ N has a unique binary representation.

Note that we are encoding the digits (actually, bits) of a binary representation
as essentially finite sequences instead of finite tuples. This way, we don’t have
to worry about leading zeros breaking the uniqueness in Theorem 3.14.2.
Let us now define a variation of binary representation:

Definition 3.14.3. A balanced ternary representation of an integer n means an
essentially finite sequence ( b_i )_{i∈N} = ( b_0 , b_1 , b_2 , . . . ) ∈ {0, 1, −1}^N such that

n = \sum_{i∈N} b_i 3^i .

Here are some examples:

• The integer 19 has a balanced ternary representation (1, 0, −1, 1, 0, 0, 0, . . .),
  because

      19 = 1 − 9 + 27 = 3^0 − 3^2 + 3^3 = 1 · 3^0 + (−1) · 3^2 + 1 · 3^3 .

• The integer 42 has a balanced ternary representation (0, −1, −1, −1, 1, 0, 0, 0, . . .),
  because

      42 = 81 − 27 − 9 − 3 = 3^4 − 3^3 − 3^2 − 3^1 .
• Note that (unlike with binary representations) even negative integers can
have balanced ternary representations. For example, the integer −11 has
a balanced ternary representation (1, −1, −1, 0, 0, 0, . . .).
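Balanced ternary representations are easy to compute greedily: repeatedly take the remainder of n modulo 3, using the digit −1 in place of remainder 2. This sketch (mine, not from the text) reproduces the three examples above:

```python
def balanced_ternary(n):
    """Balanced ternary digits of n, lowest power of 3 first."""
    digits = []
    while n != 0:
        r = n % 3               # Python's % returns 0, 1 or 2 here, even for n < 0
        if r == 2:
            r = -1
        digits.append(r)
        n = (n - r) // 3
    return digits               # digit i is the coefficient of 3^i

assert balanced_ternary(19) == [1, 0, -1, 1]
assert balanced_ternary(42) == [0, -1, -1, -1, 1]
assert balanced_ternary(-11) == [1, -1, -1]
# sanity check: the digits really sum back to n
for n in range(-121, 122):
    assert sum(b * 3 ** i for i, b in enumerate(balanced_ternary(n))) == n
print("ok")
```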

In the Soviet Union of the 1960s/70s, balanced ternary representations were
used as a foundation for computers (see the Setun computer). The idea
was shelved in the 1970s, but a noticeable number of algorithms have been
invented for working with balanced ternary representations. (See [Knuth2, §4.1]
for some discussion of these.) The following theorem (which goes back to
Fibonacci) is crucial for making balanced ternary representations useful:

Theorem 3.14.4. Each integer n has a unique balanced ternary representation.

There are various ways to prove this (see, e.g., [20f, solution to Exercise 3.7.8]
for an elementary one). Let us here try to prove it using FPSs (imitating our
proof of Theorem 3.14.2 in Subsection 3.11.1). Since the bi can be −1s, we must
allow for negative powers of x.
Let us first argue informally; we will later see whether we can make sense of
what we have done. The following informal argument was proposed by Euler
([Euler48, §331]). We shall compute the product

( 1 + x + x^{−1} ) ( 1 + x^3 + x^{−3} ) ( 1 + x^9 + x^{−9} ) · · · = \prod_{i≥0} ( 1 + x^{3^i} + x^{−3^i} )

in two ways:

• On the one hand, this product equals

  \prod_{i≥0} ( 1 + x^{3^i} + x^{−3^i} )
      = \prod_{i≥0} \sum_{b∈{0,1,−1}} x^{b·3^i}
      = \sum_{(b_0,b_1,b_2,...)∈{0,1,−1}^N essentially finite} x^{b_0 3^0} x^{b_1 3^1} x^{b_2 3^2} · · ·
            ( here, we have just expanded the product using (127),
              hoping that (127) still works in our setting )
      = \sum_{(b_0,b_1,b_2,...)∈{0,1,−1}^N essentially finite} x^{b_0 3^0 + b_1 3^1 + b_2 3^2 + ···}
      = \sum_{n∈Z} \sum_{(b_0,b_1,b_2,...)∈{0,1,−1}^N essentially finite; b_0 3^0 + b_1 3^1 + b_2 3^2 + ··· = n} x^n
      = \sum_{n∈Z} (# of balanced ternary representations of n) · x^n ,        (137)

  since a balanced ternary representation of n is precisely an essentially
  finite sequence ( b_0 , b_1 , b_2 , . . . ) ∈ {0, 1, −1}^N satisfying b_0 3^0 + b_1 3^1 + b_2 3^2 +
  · · · = n.

• On the other hand, we have 1 + x + x^{−1} = \frac{1 − x^3}{x ( 1 − x )} . Substituting x^{3^i} for
  x in this equality, we obtain

  1 + x^{3^i} + x^{−3^i} = \frac{1 − ( x^{3^i} )^3}{x^{3^i} ( 1 − x^{3^i} )} = \frac{1 − x^{3^{i+1}}}{x^{3^i} ( 1 − x^{3^i} )}        for each i ≥ 0.
  Hence,

  \prod_{i≥0} ( 1 + x^{3^i} + x^{−3^i} )
      = \prod_{i≥0} \frac{1 − x^{3^{i+1}}}{x^{3^i} ( 1 − x^{3^i} )}
      = \frac{1 − x^3}{x ( 1 − x )} · \frac{1 − x^9}{x^3 ( 1 − x^3 )} · \frac{1 − x^{27}}{x^9 ( 1 − x^9 )} · \frac{1 − x^{81}}{x^{27} ( 1 − x^{27} )} · · · ·
      = \frac{1}{x x^3 x^9 x^{27} · · ·} · \frac{1}{1 − x}
            ( here we cancelled the factors 1 − x^3 , 1 − x^9 , 1 − x^{27} , . . . by a
              somewhat daring use of the telescope principle; the first factor
              is “x^{−∞}”, whatever this means, and the second is 1 + x + x^2 + x^3 + · · · )
      = x^{−∞} ( 1 + x + x^2 + x^3 + · · · )
      = · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · ·
            ( with some artistic license, since
              x^i ( 1 + x + x^2 + x^3 + · · · ) = x^i + x^{i+1} + x^{i+2} + · · · for each i ∈ Z )
      = \sum_{n∈Z} x^n .        (138)

Comparing (137) with (138), we find

\sum_{n∈Z} (# of balanced ternary representations of n) · x^n = \sum_{n∈Z} x^n .

Comparing coefficients, we thus conclude that

(# of balanced ternary representations of n) = 1

for each n ∈ Z. This “proves” Theorem 3.14.4; we just need to make our
computations rigorous – i.e., define the ring in which we have been computing,
explain what x is, and justify the well-definedness of our infinite products and
sums.
Let us first play around a bit further. We have

( 1 − x ) ( · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · )
    = ( · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · ) − x ( · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · )
    = ( · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · ) − ( · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · )
          ( since x ( · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · ) = · · · + x^{−1} + x^0 + x^1 + x^2 + x^3 + · · ·
            is the same series again )
    = 0.

Thus, dividing by 1 − x, we obtain

· · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · = 0.

Comparing coefficients, we conclude that

1 = 0        for each n ∈ Z.

Oops! Looks like we have overtaxed our artistic license.


So we need to be careful with negative powers of x. Not everything that
looks like a valid computation actually is one. Thus, we need to be rigorous
and delimit what can and what cannot be done with negative powers of x.

3.14.2. The space K [[ x ± ]]


Let us try to define “FPSs with negative powers of x” formally. First, we define
the largest possible space of such FPSs:

Definition 3.14.5. Let K[[x^±]] be the K-module K^Z of all families ( a_n )_{n∈Z} =
( . . . , a_{−2} , a_{−1} , a_0 , a_1 , a_2 , . . . ) of elements of K. Its addition and its scaling
are defined entrywise:

( a_n )_{n∈Z} + ( b_n )_{n∈Z} = ( a_n + b_n )_{n∈Z} ;
λ ( a_n )_{n∈Z} = ( λ a_n )_{n∈Z}        for each λ ∈ K.

An element of K[[x^±]] will be called a doubly infinite power series. This name
is justified by the fact that we will later use the notation \sum_{n∈Z} a_n x^n for a family
( a_n )_{n∈Z} ∈ K[[x^±]].

Now, let us try to define a multiplication on this K-module K[[x^±]], in order
to turn it into a K-algebra (like K[[x]]). This multiplication should satisfy

( a_n )_{n∈Z} · ( b_n )_{n∈Z} = ( c_n )_{n∈Z} ,        where c_n = \sum_{i∈Z} a_i b_{n−i}

(since this is what we would get if we expanded ( \sum_{n∈Z} a_n x^n ) ( \sum_{n∈Z} b_n x^n ) and
combined like powers of x). Unfortunately, the sum \sum_{i∈Z} a_i b_{n−i} is now infinite
(unlike for K[[x]]), and is not always well-defined. Thus, the product ( a_n )_{n∈Z} ·
( b_n )_{n∈Z} of two elements of K[[x^±]] does not always exist^{52}. Therefore, the
K-module K[[x^±]] is not a K-algebra. This explains why our above computations
have led us astray.

^{52} The simplest example for this is (1)_{n∈Z} · (1)_{n∈Z} . If this product existed, then it should be
( c_n )_{n∈Z} , where c_n = \sum_{i∈Z} 1 · 1, but the latter sum clearly does not exist.
3.14.3. Laurent polynomials


If we cannot multiply two arbitrary elements of K [[ x ± ]], can we perhaps re-
strict ourselves to a smaller K-submodule of K [[ x ± ]] whose elements can be
multiplied?
One such submodule is K [[ x ]], which is embedded in K [[ x ± ]] in the “ob-
vious” way (by identifying each FPS ( an )n∈N with the doubly infinite power
series ( an )n∈Z , where we set an := 0 for all n < 0). But there are also some
others. Here is one:53

Definition 3.14.6. Let K[x^±] be the K-submodule of K[[x^±]] consisting of all
essentially finite families ( a_n )_{n∈Z} . This is indeed a K-submodule (check it!).
It should be thought of as an analogue of the ring of polynomials K[x], but
now allowing for negative powers of x.
The elements of K[x^±] are called Laurent polynomials in the indeterminate
x over K.
We define a multiplication on K[x^±] by setting

( a_n )_{n∈Z} · ( b_n )_{n∈Z} = ( c_n )_{n∈Z} ,        where c_n = \sum_{i∈Z} a_i b_{n−i} .

Note that the sum \sum_{i∈Z} a_i b_{n−i} is now well-defined, because it is essentially
finite.
We define an element x ∈ K[x^±] by

x = ( δ_{i,1} )_{i∈Z}

(where we are using the Kronecker delta notation again, as in Definition
3.5.6).

Theorem 3.14.7. The K-module K[x^±], equipped with the multiplication we
just defined, is a commutative K-algebra. Its unity is ( δ_{i,0} )_{i∈Z} . The element x
is invertible in this K-algebra.

This K-algebra K[x^±] is called the Laurent polynomial ring in one indeterminate
x over K. It is often denoted by K[x^{±1}] or K[x, x^{−1}] as well.
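The multiplication rule of K[x^±] is just a finite convolution, which makes it easy to model. Here is a minimal sketch (my own illustration over K = Z, not the text's formal construction), storing a Laurent polynomial as a dict from exponents to nonzero coefficients:

```python
def lmul(a, b):
    """Product of two Laurent polynomials {exponent: coefficient} over Z."""
    c = {}
    for i, ai in a.items():
        for j, bj in b.items():
            c[i + j] = c.get(i + j, 0) + ai * bj
    return {n: cn for n, cn in c.items() if cn != 0}

x = {1: 1}
x_inv = {-1: 1}
assert lmul(x, x_inv) == {0: 1}            # x is invertible, as Theorem 3.14.7 says

p = {-1: 1, 0: 1, 1: 1}                     # 1 + x + x^{-1}
q = {0: 1, 1: -1}                           # 1 - x
# (1 + x + x^{-1}) * x * (1 - x) = 1 - x^3, matching the identity used earlier
assert lmul(lmul(p, x), q) == {0: 1, 3: -1}
print("ok")
```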

Proposition 3.14.8. Any doubly infinite power series a = ( a_i )_{i∈Z} ∈ K[[x^±]]
satisfies

a = \sum_{i∈Z} a_i x^i .

Here, the powers x^i are taken in the Laurent polynomial ring K[x^±], but
the infinite sum \sum_{i∈Z} a_i x^i is taken in the K-module K[[x^±]]. (The notions of
summable families and infinite sums are defined in K[[x^±]] in the same way
as they are defined in K[[x]].)

^{53} For proofs of the statements made (implicitly and explicitly) in the following (or at least for
the less obvious among these proofs), we refer to Section B.5.

Examples of Laurent polynomials are

• any polynomial in K[x];

• x^{−15} ;

• x^2 + 3 + 7x^{−3} .

There are other, equivalent ways to define the Laurent polynomial ring K [ x ± ]:

• as the group algebra of the cyclic group Z over K;

• as the localization of the polynomial ring K [ x ] at the powers of x.

(These are done in some textbooks on abstract algebra – e.g., see [Ford21,
Exercise 3.6.31] for a quick overview.)

3.14.4. Laurent polynomials are not enough


Now let us see if we can make our above proof of Theorem 3.14.4 rigorous
using Laurent polynomials. Unfortunately, K[x^±] is “too small” to contain
the infinite product \prod_{i≥0} ( 1 + x^{3^i} + x^{−3^i} ) . However, we can try using its partial
products, which are finite. For each k ∈ N, we have

\prod_{i=0}^{k} ( 1 + x^{3^i} + x^{−3^i} )
    = ( 1 + x + x^{−1} ) ( 1 + x^3 + x^{−3} ) · · · ( 1 + x^{3^k} + x^{−3^k} )
    = \frac{1 − x^3}{x ( 1 − x )} · \frac{1 − x^9}{x^3 ( 1 − x^3 )} · · · · · \frac{1 − x^{3^{k+1}}}{x^{3^k} ( 1 − x^{3^k} )}
          ( this is somewhat unrigorous, since the 1 − x^{3^i} are not invertible
            in K[x^±] , but this will soon be made rigorous )
    = \frac{1}{x x^3 x^9 · · · x^{3^k}} · \frac{1 − x^{3^{k+1}}}{1 − x}        (by cancelling factors)
    = x^{−(3^0 + 3^1 + ··· + 3^k)} · ( 1 + x + x^2 + · · · + x^{3^{k+1} − 1} )
    = x^{−(3^0 + 3^1 + ··· + 3^k)} · ( 1 + x + x^2 + · · · + x^{2 (3^0 + 3^1 + ··· + 3^k)} )
          ( since 3^{k+1} − 1 = 2 · ( 3^0 + 3^1 + · · · + 3^k ) (check this!) )
    = x^{−(3^0 + 3^1 + ··· + 3^k)} + x^{−(3^0 + 3^1 + ··· + 3^k) + 1} + x^{−(3^0 + 3^1 + ··· + 3^k) + 2} + · · · + x^{3^0 + 3^1 + ··· + 3^k}
    = \sum_{n∈Z; |n| ≤ 3^0 + 3^1 + ··· + 3^k} x^n .

On the other hand,

\prod_{i=0}^{k} ( 1 + x^{3^i} + x^{−3^i} )
    = \prod_{i=0}^{k} \sum_{b∈{0,1,−1}} x^{b·3^i}
    = \sum_{(b_0,b_1,...,b_k)∈{0,1,−1}^{k+1}} x^{b_0 3^0} x^{b_1 3^1} · · · x^{b_k 3^k}
          ( by expanding the product using Proposition 3.11.26 )
    = \sum_{(b_0,b_1,...,b_k)∈{0,1,−1}^{k+1}} x^{b_0 3^0 + b_1 3^1 + ··· + b_k 3^k}
    = \sum_{n∈Z} \sum_{(b_0,b_1,...,b_k)∈{0,1,−1}^{k+1}; b_0 3^0 + b_1 3^1 + ··· + b_k 3^k = n} x^n
    = \sum_{n∈Z} (# of k-bounded balanced ternary representations of n) · x^n ,

where a balanced ternary representation ( b_0 , b_1 , b_2 , . . . ) is said to be k-bounded if
b_{k+1} = b_{k+2} = b_{k+3} = · · · = 0. Comparing the two results, we find

\sum_{n∈Z} (# of k-bounded balanced ternary representations of n) · x^n = \sum_{n∈Z; |n| ≤ 3^0 + 3^1 + ··· + 3^k} x^n .

Comparing coefficients, we thus see that each n ∈ Z satisfying |n| ≤ 3^0 +
3^1 + · · · + 3^k has a unique k-bounded balanced ternary representation. Letting
k → ∞ now quickly yields Theorem 3.14.4 (because any balanced ternary
representation is k-bounded for any sufficiently large k).
Thus we have proved Theorem 3.14.4 up to the fact that we have divided by
the polynomials 1 − x, 1 − x^3 , 1 − x^9 , . . ., which are not invertible in the Laurent
polynomial ring K[x^±]. In order to fill this gap, we need a new K-algebra: The
Laurent polynomial ring K[x^±] is too small, whereas the original K-module
K[[x^±]] is not a ring. We need some kind of middle ground: some K-module
lying between K[x^±] and K[[x^±]] that is a ring but allows division by 1 − x,
1 − x^3 , 1 − x^9 , . . . (and, more generally, by 1 − x^i for each positive integer i).

3.14.5. Laurent series


This middle ground is called K (( x )) and is defined as follows:
Definition 3.14.9. We let K((x)) be the subset of K[[x^±]] consisting of all
families ( a_i )_{i∈Z} ∈ K[[x^±]] such that the sequence ( a_{−1} , a_{−2} , a_{−3} , . . . ) is
essentially finite – i.e., such that all sufficiently low i ∈ Z satisfy a_i = 0.
The elements of K((x)) are called Laurent series in one indeterminate x
over K.

For example (for K = Z):

• the “series” x^{−3} + x^{−2} + x^{−1} + x^0 + x^1 + · · · belongs to K((x));

• the “series” 1 + x^{−1} + x^{−2} + x^{−3} + · · · does not belong to K((x));

• the “series” \sum_{n∈Z} x^n = · · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · · does not belong
  to K((x)).

The subset K((x)) turns out to behave as nicely as we might hope for:^{54}

Theorem 3.14.10. The subset K((x)) is a K-submodule of K[[x^±]]. But it has
a multiplication (unlike K[[x^±]]). This multiplication is given by the same
rule as the multiplication of K[x^±]: namely,

( a_n )_{n∈Z} · ( b_n )_{n∈Z} = ( c_n )_{n∈Z} ,        where c_n = \sum_{i∈Z} a_i b_{n−i} .

The sum \sum_{i∈Z} a_i b_{n−i} here is well-defined, because it is essentially finite (indeed,
for all sufficiently low i ∈ Z, we have a_i = 0 and thus a_i b_{n−i} = 0; on the other
hand, for all sufficiently high i ∈ Z, we have b_{n−i} = 0 and thus a_i b_{n−i} = 0).
Equipped with the multiplication we just defined, the K-module K((x))
becomes a commutative K-algebra with unity ( δ_{i,0} )_{i∈Z} .

Now, the ring K (( x )) contains both the FPS ring K [[ x ]] and the Laurent
polynomial ring K [ x ± ] as subrings (and actually as K-subalgebras). This makes
it one of the most convenient places for formal manipulation of FPSs. (However,
it has a disadvantage compared to the FPS ring K [[ x ]]: Namely, you cannot
easily substitute something for x in a Laurent series f ∈ K (( x )).)
Now, our above computation of \prod_{i=0}^{k} ( 1 + x^{3^i} + x^{−3^i} ) makes perfect sense in
the Laurent series ring K((x)): Indeed, for each positive integer i, the power
series 1 − x^i is invertible in K[[x]] and thus also invertible in K((x)). Hence, at
last, we have a rigorous proof of Theorem 3.14.4.
Actually, you could also make sense of our original argument for proving
Theorem 3.14.4, with the infinite product \prod_{i≥0} ( 1 + x^{3^i} + x^{−3^i} ) , as long as
you made sure to interpret it correctly: First, compute the finite products
\prod_{i=0}^{k} ( 1 + x^{3^i} + x^{−3^i} ) in the ring K((x)). Then, take their limit
lim_{k→∞} \prod_{i=0}^{k} ( 1 + x^{3^i} + x^{−3^i} ) in K[[x^±]] (this is not a ring, but the notion
of a limit in K[[x^±]] is defined just as it was in K[[x]]).

^{54} See Section B.5 for a proof of Theorem 3.14.10.
More can be said about the K-algebra K (( x )) when K is a field: Indeed, in
this case, it is itself a field! This fact (whose proof is Exercise A.2.14.2) is not
very useful in combinatorics, but quite so in abstract algebra.

3.14.6. A K [ x ± ]-module structure on K [[ x ± ]]


One more remark is in order. As we have explained, K[[x^±]] is not a ring. However, some semblance of multiplication can be defined in K[[x^±]]. Namely, a "doubly infinite power series" ∑_{n∈Z} a_n x^n = (a_n)_{n∈Z} ∈ K[[x^±]] can be multiplied by a Laurent polynomial ∑_{n∈Z} b_n x^n = (b_n)_{n∈Z} ∈ K[x^±], because in this case the sums ∑_{i∈Z} a_i b_{n−i} are essentially finite. Thus, while we cannot always multiply two elements of K[[x^±]], we can always multiply an element of K[[x^±]] with an element of K[x^±]. This makes K[[x^±]] into a K[x^±]-module. The equality

(1 − x) · (· · · + x^{−2} + x^{−1} + x^0 + x^1 + x^2 + · · ·) = 0

reveals that this module has torsion (i.e., the product of a nonzero Laurent polynomial with a nonzero doubly infinite power series can be zero). Thus, while we can multiply a doubly infinite power series by 1 − x, we cannot (in general) divide it by 1 − x.
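This partial multiplication is easy to model concretely. Below is a minimal Python sketch (the names `laurent_multiply` and `all_ones` are mine, purely for illustration): a doubly infinite power series is represented by a coefficient function on the integers, a Laurent polynomial by a finite dict of exponents, and the product coefficient is the essentially finite sum from the formula above.

```python
def laurent_multiply(b, a):
    """Multiply the Laurent polynomial b (a dict {exponent: coefficient}
    with finitely many entries) by the doubly infinite power series a
    (a function from Z to the coefficient ring).  The result is again a
    function from Z to the ring: its n-th coefficient is the *finite* sum
    of b[m] * a(n - m) over the finitely many exponents m of b."""
    return lambda n: sum(c * a(n - m) for m, c in b.items())

# The doubly infinite series ... + x^(-2) + x^(-1) + 1 + x + x^2 + ...
# has every coefficient equal to 1:
all_ones = lambda n: 1

# Multiplying it by the Laurent polynomial 1 - x kills every coefficient,
# illustrating the torsion phenomenon described above:
product = laurent_multiply({0: 1, 1: -1}, all_ones)
print([product(n) for n in range(-3, 4)])  # → [0, 0, 0, 0, 0, 0, 0]
```

Note that the result is again only a coefficient function; no attempt is made (or possible, in general) to divide back by 1 − x.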

3.15. Multivariate FPSs


The rest of Chapter 3 needs more details.
TODO: Push this border stone further down this chapter.
Multivariate FPSs are just FPSs in several variables. Their theory is mostly analogous to the theory of univariate FPSs (which is what we have been studying above), but requires more subscripts. I will just discuss the main differences.
For instance, FPSs in two variables x and y have the form ∑_{i,j∈N} a_{i,j} x^i y^j, where the summation sign ∑_{i,j∈N} means ∑_{(i,j)∈N^2} of course. Formally, such an FPS is a family (a_{i,j})_{(i,j)∈N^2} of elements of K. The indeterminates x and y are defined by

x = (δ_{(i,j),(1,0)})_{(i,j)∈N^2}    and    y = (δ_{(i,j),(0,1)})_{(i,j)∈N^2}

(so that each indeterminate has exactly one coefficient equal to 1, while all other coefficients are 0). The rules for addition, subtraction, scaling and multiplication are essentially as they are for univariate FPSs, except that now the indexing set is N^2 instead of N. For example, multiplication of FPSs in x and y is defined by the formula

[x^n y^m] (ab) = ∑_{(i,j),(k,ℓ)∈N^2; i+k=n; j+ℓ=m} ([x^i y^j] a) · ([x^k y^ℓ] b)

(for any two FPSs a and b and any n, m ∈ N).
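For concreteness, this defining formula can be implemented directly on finitely-supported coefficient families. Here is a minimal sketch (the name `mul2` is mine, purely for illustration); a bivariate FPS with only finitely many nonzero coefficients is stored as a dict mapping pairs (i, j) to coefficients, with all absent pairs understood to be 0.

```python
def mul2(a, b):
    """Multiply two bivariate FPSs given as dicts {(i, j): coefficient}.
    The coefficient of x^n y^m in the product is the sum of
    a[(i, j)] * b[(k, l)] over all i + k = n and j + l = m,
    exactly as in the defining formula above."""
    result = {}
    for (i, j), ca in a.items():
        for (k, l), cb in b.items():
            key = (i + k, j + l)
            result[key] = result.get(key, 0) + ca * cb
    return result

# (1 + x) * (1 + y) = 1 + x + y + xy:
p = mul2({(0, 0): 1, (1, 0): 1}, {(0, 0): 1, (0, 1): 1})
assert p == {(0, 0): 1, (1, 0): 1, (0, 1): 1, (1, 1): 1}
```

A genuine FPS has infinitely many coefficients, of course; a faithful implementation would work with truncations, but the combinatorics of the index set N^2 is the same.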


More generally, for any k ∈ N, the FPSs in k variables x_1, x_2, . . . , x_k are defined to be the families (a_i)_{i∈N^k} of elements of K indexed by k-tuples i = (i_1, i_2, . . . , i_k) ∈ N^k. Addition, subtraction and scaling of such families is defined entrywise. Multiplication is defined by the formula

[x^n] (ab) = ∑_{i,j∈N^k; i+j=n} ([x^i] a) · ([x^j] b)

(for any two FPSs a and b and any n ∈ N^k), where

• the sum i + j means the entrywise sum of the k-tuples i and j (that is, i + j = (i_1 + j_1, i_2 + j_2, . . . , i_k + j_k), where i = (i_1, i_2, . . . , i_k) and j = (j_1, j_2, . . . , j_k));

• if m ∈ N^k and h is an FPS in x_1, x_2, . . . , x_k, then [x^m] h is the m-th entry of the family h. Just as in the univariate case, this entry [x^m] h is called the coefficient of the monomial x^m := x_1^{m_1} x_2^{m_2} · · · x_k^{m_k} in h (where m = (m_1, m_2, . . . , m_k)).

With this notation, the formula for ab doesn’t look any more complicated
than the analogous formula in the univariate case. The only difference is that
the monomials and the coefficients are now indexed not by the nonnegative
integers, but by the k-tuples of nonnegative integers instead.
The indeterminates x_1, x_2, . . . , x_k are defined by

x_i = (δ_{n,(0,0,...,0,1,0,0,...,0)})_{n∈N^k},

where the tuple (0, 0, . . . , 0, 1, 0, 0, . . . , 0) is a k-tuple with a lone 1 in its i-th position. Thus, if m = (m_1, m_2, . . . , m_k) ∈ N^k, then

x_1^{m_1} x_2^{m_2} · · · x_k^{m_k} = (δ_{n,m})_{n∈N^k},

so that each FPS f = (f_m)_{m∈N^k} satisfies

f = ∑_{m=(m_1,m_2,...,m_k)∈N^k} f_m x_1^{m_1} x_2^{m_2} · · · x_k^{m_k}.

Most of what we have said about FPSs in one variable applies similarly to
FPSs in multiple variables. The proofs are similar but more laborious due to
the need for subscripts. Instead of the derivative of an FPS, there are now k
derivatives (one for each variable); they are called partial derivatives. One needs
to be somewhat careful with substitution – e.g., one cannot substitute non-commuting elements into a multivariate polynomial. For example, you cannot substitute two non-commuting matrices A and B for x and y into the polynomial xy, at least not without sacrificing the rule that a value of the product of two polynomials should be the product of their values (since x[A, B] · y[A, B] = AB would have to equal y[A, B] · x[A, B] = BA). But you can still substitute k
commuting elements for the k indeterminates in a k-variable polynomial. You
can also compose multivariate FPSs as long as appropriate summability condi-
tions are satisfied.
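The failure described here is easy to witness with 2 × 2 matrices. The following small sketch (mine, not from the text) evaluates the monomial xy at two non-commuting matrices in both possible orders:

```python
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

# Substituting (A, B) for (x, y) in the monomial xy could mean A*B or B*A,
# and these genuinely differ, so the "value" of xy at (A, B) is ambiguous:
print(matmul(A, B))  # A*B = [[2, 1], [1, 1]]
print(matmul(B, A))  # B*A = [[1, 1], [1, 2]]
```

With commuting substitutes (e.g., two powers of the same matrix), this ambiguity disappears, which is exactly the point made above.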

Definition 3.15.1. Let k ∈ N. The K-algebra of all FPSs in k variables


x1 , x2 , . . . , xk over K will be denoted by K [[ x1 , x2 , . . . , xk ]].

Sometimes we will use different names for our variables. For example, if we
work with 2 variables, we will commonly call them x and y instead of x1 and
x2 . Correspondingly, we will use the notation K [[ x, y]] (instead of K [[ x1 , x2 ]])
for the K-algebra of FPSs in these two variables.
Let me give an example of working with multivariate FPSs.

Let us work in K[[x, y]]. On the one hand, we have

∑_{n,k∈N} \binom{n}{k} x^n y^k = ∑_{n∈N} ∑_{k∈N} \binom{n}{k} x^n y^k = ∑_{n∈N} x^n ∑_{k∈N} \binom{n}{k} y^k
= ∑_{n∈N} x^n (1 + y)^n    (by the binomial formula)
= ∑_{n∈N} (x (1 + y))^n = 1 / (1 − x (1 + y))
    (here, we have substituted x (1 + y) for x in the formula ∑_{n∈N} x^n = 1 / (1 − x); this is allowed since x (1 + y) has constant term 0)
= 1 / (1 − x) · 1 / (1 − xy / (1 − x))    (easy to check by computation)
= 1 / (1 − x) · ∑_{k∈N} (x / (1 − x))^k y^k
    (here, we have substituted (x / (1 − x)) y for x in the formula 1 / (1 − x) = ∑_{k∈N} x^k)
= 1 / (1 − x) · ∑_{k∈N} x^k / (1 − x)^k · y^k = ∑_{k∈N} x^k / (1 − x)^{k+1} · y^k.

On the other hand, we have

∑_{n,k∈N} \binom{n}{k} x^n y^k = ∑_{k∈N} ∑_{n∈N} \binom{n}{k} x^n y^k = ∑_{k∈N} (∑_{n∈N} \binom{n}{k} x^n) y^k.

Comparing these two equalities, we find

∑_{k∈N} x^k / (1 − x)^{k+1} · y^k = ∑_{k∈N} (∑_{n∈N} \binom{n}{k} x^n) y^k.    (139)

Now, comparing coefficients in front of x^i y^k in (139), we conclude that

x^k / (1 − x)^{k+1} = ∑_{n∈N} \binom{n}{k} x^n    for each k ∈ N.    (140)

Let me explain in a bit more detail what I mean by "comparing coefficients in front of x^i y^k". What we have used is the following simple fact:

Proposition 3.15.2. Let f_0, f_1, f_2, . . . and g_0, g_1, g_2, . . . be FPSs in a single variable x such that

∑_{k∈N} f_k y^k = ∑_{k∈N} g_k y^k    in K[[x, y]].    (141)

Then, f_k = g_k for each k ∈ N.

Proof. For each k ∈ N, let us write the two FPSs f_k and g_k as f_k = ∑_{n∈N} f_{k,n} x^n and g_k = ∑_{n∈N} g_{k,n} x^n with f_{k,n}, g_{k,n} ∈ K. Then, the equality (141) can be rewritten as

∑_{k∈N} ∑_{n∈N} f_{k,n} x^n y^k = ∑_{k∈N} ∑_{n∈N} g_{k,n} x^n y^k.

Now, comparing coefficients in front of x^n y^k in this equality, we obtain f_{k,n} = g_{k,n} for each k, n ∈ N. Therefore, f_k = g_k for each k ∈ N. This proves Proposition 3.15.2.
Now, because of (139), we can apply Proposition 3.15.2 to f_k = x^k / (1 − x)^{k+1} and g_k = ∑_{n∈N} \binom{n}{k} x^n. Thus, we obtain the equality

x^k / (1 − x)^{k+1} = ∑_{n∈N} \binom{n}{k} x^n    for each k ∈ N.

This is an equality between univariate FPSs, even though we have obtained


it by manipulating bivariate FPSs (i.e., FPSs in two variables x and y). As a
homework exercise (Exercise A.2.15.1), you can prove this equality in a more
elementary, univariate way. But the idea to introduce extra variables (in our
case, in order to discover an equality) is highly useful in many situations (some
of which we will see later in this course).
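This identity is also easy to spot-check on truncated power series. In the sketch below (the helper names `poly_mul` and `poly_inv` are mine), the inverse of (1 − x)^{k+1} is computed by the standard recursive formula for inverting an FPS with constant term 1, and the coefficients of x^k/(1 − x)^{k+1} are compared with the binomial coefficients.

```python
from math import comb

def poly_mul(a, b, N):
    """Multiply two truncated FPSs (coefficient lists), keeping x^0..x^(N-1)."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def poly_inv(p, N):
    """Invert a truncated FPS with p[0] == 1, modulo x^N:
    q[0] = 1 and q[n] = -sum_{i=1}^{n} p[i] * q[n-i]."""
    q = [0] * N
    q[0] = 1
    for n in range(1, N):
        q[n] = -sum(p[i] * q[n - i] for i in range(1, min(n, len(p) - 1) + 1))
    return q

N, k = 20, 3
denom = [1]
for _ in range(k + 1):                 # build (1 - x)^(k+1), truncated
    denom = poly_mul(denom, [1, -1], N)
series = poly_inv(denom, N)            # 1 / (1 - x)^(k+1)
lhs = [0] * k + series[: N - k]        # multiply by x^k
assert lhs == [comb(n, k) for n in range(N)]
```

(Here `comb(n, k)` returns 0 when n < k, matching the convention that the coefficient of x^n on the left vanishes for n < k.)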

4. Integer partitions and q-binomial coefficients


We have previously counted compositions of an n ∈ N. These are (roughly
speaking) ways to write n as a sum of finitely many positive integers, where
the order matters. For example, 3 = 3 = 2 + 1 = 1 + 2 = 1 + 1 + 1, so 3 has 4
compositions. Formally, compositions are tuples of positive integers.
Now, let us disregard the order. There are two ways to make this rigorous:
either we replace tuples by multisets, or we require the tuples to be weakly
decreasing. These result in the same count, but we will use the 2nd way, just
because tuples are easier to work with than multisets. This will lead to the
notion of integer partitions.

4.1. Partition basics


4.1.1. Definitions
The following definition is built in analogy to Definition 3.9.1:

Definition 4.1.1. (a) An (integer) partition means a (finite) weakly decreasing tuple of positive integers – i.e., a finite tuple (λ_1, λ_2, . . . , λ_m) of positive integers such that λ_1 ≥ λ_2 ≥ · · · ≥ λ_m.
Thus, partitions are the same as weakly decreasing compositions. Hence,
the notions of size and length of a partition are automatically defined, since
we have defined them for compositions (in Definition 3.9.1).
(b) The parts of a partition (λ1 , λ2 , . . . , λm ) are simply its entries
λ1 , λ2 , . . . , λ m .
(c) Let n ∈ Z. A partition of n means a partition whose size is n.
(d) Let n ∈ Z and k ∈ N. A partition of n into k parts is a partition whose
size is n and whose length is k.

Example 4.1.2. The partitions of 5 are

(5) , (4, 1) , (3, 2) , (3, 1, 1) , (2, 2, 1) , (2, 1, 1, 1) , (1, 1, 1, 1, 1) .

Definition 4.1.3. (a) Let n ∈ Z and k ∈ N. Then, we set

pk (n) := (# of partitions of n into k parts) .

(b) Let n ∈ Z. Then, we set

p (n) := (# of partitions of n) .

This is called the n-th partition number.

Example 4.1.4. Our above list of partitions of 5 reveals that

p0 (5) = 0;
p1 (5) = 1;
p2 (5) = 2;
p3 (5) = 2;
p4 (5) = 1;
p5 (5) = 1;
p k (5) =0 for any k > 5;

and finally p (5) = 7.



Here are the values of p (n) for the first 15 nonnegative integers n:

n      0  1  2  3  4  5  6   7   8   9   10  11  12  13   14
p(n)   1  1  2  3  5  7  11  15  22  30  42  56  77  101  135

The sequence ( p (0) , p (1) , p (2) , . . .) is remarkable for being an integer se-
quence that grows faster than polynomially, but still considerably slower than
exponentially. (See (168) for an asymptotic expansion.) This not-too-fast growth
(for instance, p (100) = 190 569 292 is far smaller than 2100 ) makes integer par-
titions rather convenient for computer experiments.

4.1.2. Simple properties of partition numbers


We will next state some elementary properties of pk (n) and p (n), but first we
introduce a few very basic notations:
Definition 4.1.5. We will use the Iverson bracket notation: If A is a logical statement, then [A] means the truth value of A; this is the integer 1 if A is true, and 0 if A is false.

For example, [2 + 2 = 4] = 1 and [2 + 2 = 5] = 0.


Note that the Kronecker delta notation is a particular case of the Iverson
bracket: We have
δi,j = [i = j] for any objects i and j.

Definition 4.1.6. Let a be a real number.


Then, ⌊ a⌋ (called the floor of a) means the largest integer that is ≤ a.
Likewise, ⌈ a⌉ (called the ceiling of a) means the smallest integer that is ≥ a.

For example, the number π ≈ 3.14 satisfies ⌊π ⌋ = 3 and ⌈π ⌉ = 4 and


⌊−π ⌋ = −4 and ⌈−π ⌉ = −3. For another example, ⌊n⌋ = ⌈n⌉ = n for each
n ∈ Z.
The following proposition collects various basic properties of the numbers
introduced in Definition 4.1.3:
Proposition 4.1.7. Let n ∈ Z and k ∈ N.
(a) We have pk (n) = 0 whenever n < 0 and k ∈ N.
(b) We have pk (n) = 0 whenever k > n.
(c) We have p0 (n) = [n = 0].
(d) We have p1 (n) = [n > 0].
(e) We have pk (n) = pk (n − k ) + pk−1 (n − 1) whenever k > 0.
(f) We have p2 (n) = ⌊n/2⌋ whenever n ∈ N.
(g) We have p (n) = p0 (n) + p1 (n) + · · · + pn (n) whenever n ∈ N.
(h) We have p (n) = 0 whenever n < 0.

Proof of Proposition 4.1.7 (sketched). (a) The size of a partition is always nonneg-
ative (being a sum of positive integers). Thus, a negative number n has no
partitions whatsoever. Thus, pk (n) = 0 whenever n < 0 and k ∈ N.
(b) If (λ1 , λ2 , . . . , λk ) is a partition, then λi ≥ 1 for each i ∈ {1, 2, . . . , k }
(because a partition is a tuple of positive integers, i.e., of integers ≥ 1). Hence,
if (λ1 , λ2 , . . . , λk ) is a partition of n into k parts, then

n = λ_1 + λ_2 + · · · + λ_k    (since (λ_1, λ_2, . . . , λ_k) is a partition of n)
  ≥ 1 + 1 + · · · + 1  (k times)    (since λ_i ≥ 1 for each i ∈ {1, 2, . . . , k})
  = k.

Thus, a partition of n into k parts cannot satisfy k > n. Thus, no such partitions
exist if k > n. In other words, pk (n) = 0 if k > n.
(c) The integer 0 has a unique partition into 0 parts, namely the empty tuple
(). A nonzero integer n cannot have any partitions into 0 parts, since the empty
tuple has size 0 ̸= n. Thus, p0 (n) equals 1 for n = 0 and equals 0 for n ̸= 0. In
other words, p0 (n) = [n = 0].
(d) Any positive integer n has a unique partition into 1 part – namely, the
1-tuple (n). On the other hand, if n is not positive, then this 1-tuple is not a
partition, so in this case n has no partition into 1 part. Thus, p1 (n) equals 1 if
n is positive and equals 0 otherwise. In other words, p1 (n) = [n > 0].
(e) Assume that k > 0. We must prove that pk (n) = pk (n − k) + pk−1 (n − 1).
We consider all partitions of n into k parts. We classify these partitions into
two types:

• Type 1 consists of all partitions that have 1 as a part.

• Type 2 consists of all partitions that don’t.

For example, here are the partitions of 5 along with their types:

Type 1:  (4, 1), (3, 1, 1), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1);
Type 2:  (5), (3, 2).

Let us count the type-1 partitions and the type-2 partitions separately.
Any type-1 partition has 1 as a part, therefore as its last part (because it is weakly decreasing). Hence, any type-1 partition has the form (λ_1, λ_2, . . . , λ_{k−1}, 1). If (λ_1, λ_2, . . . , λ_{k−1}, 1) is a type-1 partition (of n into k parts), then (λ_1, λ_2, . . . , λ_{k−1}) is a partition of n − 1 into k − 1 parts. Thus, we have a map

{type-1 partitions of n into k parts} → {partitions of n − 1 into k − 1 parts},
(λ_1, λ_2, . . . , λ_{k−1}, 1) ↦ (λ_1, λ_2, . . . , λ_{k−1}).

This map is a bijection (since it has an inverse map, which simply inserts a 1 at
the end of a partition). Thus, the bijection principle shows that

(# of type-1 partitions of n into k parts)


= (# of partitions of n − 1 into k − 1 parts) = pk−1 (n − 1)

(by the definition of pk−1 (n − 1)).


Now let us count type-2 partitions. A type-2 partition does not have 1 as a
part; hence, all its parts are larger than 1 (because all its parts are positive inte-
gers), and therefore we can subtract 1 from each part and still have a partition
in front of us. To be more specific: If (λ1 , λ2 , . . . , λk ) is a type-2 partition of
n into k parts, then subtracting 1 from each of its parts produces the k-tuple
(λ1 − 1, λ2 − 1, . . . , λk − 1), which is a partition of n − k into k parts. Hence, we
have a map

{type-2 partitions of n into k parts} → {partitions of n − k into k parts},
(λ_1, λ_2, . . . , λ_k) ↦ (λ_1 − 1, λ_2 − 1, . . . , λ_k − 1).

This map is a bijection (since it has an inverse map, which simply adds 1 to
each entry of a partition). Thus, the bijection principle shows that

(# of type-2 partitions of n into k parts)


= (# of partitions of n − k into k parts) = pk (n − k)

(by the definition of pk (n − k )).


Since any partition of n into k parts is either type-1 or type-2 (but not both at
the same time), we now have

(# of partitions of n into k parts)
= (# of type-1 partitions of n into k parts) + (# of type-2 partitions of n into k parts)
= p_{k−1}(n − 1) + p_k(n − k) = p_k(n − k) + p_{k−1}(n − 1).

Since the left hand side of this equality is p_k(n), we thus have proved that p_k(n) = p_k(n − k) + p_{k−1}(n − 1).
(f) Let n ∈ N. The partitions of n into 2 parts are

(n − 1, 1), (n − 2, 2), (n − 3, 3), . . . , (n − ⌊n/2⌋, ⌊n/2⌋)

(note that n − ⌊n/2⌋ = ⌈n/2⌉). Thus there are ⌊n/2⌋ of them. In other words, p_2(n) = ⌊n/2⌋.



(g) Let n ∈ N. Any partition of n must have k parts for some k ∈ N. Thus,

p(n) = ∑_{k∈N} p_k(n) = ∑_{k=0}^{n} p_k(n) + ∑_{k=n+1}^{∞} p_k(n)
     = ∑_{k=0}^{n} p_k(n)    (since p_k(n) = 0 for k > n, by Proposition 4.1.7 (b))
     = p_0(n) + p_1(n) + · · · + p_n(n).

(h) Same argument as for (a).
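The recurrences just proved translate directly into a short memoized computation of p_k(n) and p(n). The following sketch (function names are mine) reproduces the table of partition numbers given above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def pk(k, n):
    """# of partitions of n into exactly k parts, via Proposition 4.1.7:
    p_k(n) = p_k(n - k) + p_{k-1}(n - 1) for k > 0, with p_k(n) = 0 for
    n < 0 (part (a)) and p_0(n) = [n = 0] (part (c))."""
    if n < 0:
        return 0
    if k == 0:
        return 1 if n == 0 else 0
    return pk(k, n - k) + pk(k - 1, n - 1)

def p(n):
    """# of partitions of n, via p(n) = p_0(n) + ... + p_n(n) (part (g))."""
    return sum(pk(k, n) for k in range(n + 1))

print([p(n) for n in range(15)])
# → [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 101, 135]
```

The "not-too-fast growth" mentioned above is visible here as well: the same code computes p(100) = 190 569 292 essentially instantly.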

4.1.3. The generating function


Proposition 4.1.7 (e) is a recursive formula that makes it not too hard to com-
pute pk (n) for reasonably small values of n and k. Then, using Proposition
4.1.7 (g), we can compute p (n) from these pk (n)’s. However, one might want
a better, faster method.
To get there, let me first express the generating function of the numbers p(n):

Theorem 4.1.8. In the FPS ring Z[[x]], we have

∑_{n∈N} p(n) x^n = ∏_{k=1}^{∞} 1/(1 − x^k).

(The product on the right hand side is well-defined, since multiplying a FPS by 1/(1 − x^k) does not affect its first k coefficients.)

Example 4.1.9. Let us check the above equality "up to x^5", i.e., let us compare the coefficients of x^i for i < 5. (In doing so, we can ignore all powers of x higher than x^4.) We have

∏_{k=1}^{∞} 1/(1 − x^k)
= 1/(1 − x) · 1/(1 − x^2) · 1/(1 − x^3) · 1/(1 − x^4) · · · ·
= (1 + x + x^2 + x^3 + x^4 + · · ·)
  · (1 + x^2 + x^4 + · · ·)
  · (1 + x^3 + · · ·)
  · (1 + x^4 + · · ·)
  · (1 + · · ·)
  · (1 + · · ·)
  · · · ·
= 1 + x + 2x^2 + 3x^3 + 5x^4 + · · ·
= p(0) + p(1) x + p(2) x^2 + p(3) x^3 + p(4) x^4 + · · · .
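Such a check is easily automated: multiply out a truncated version of the product and compare it with a brute-force partition count. A minimal sketch (names mine):

```python
def product_coeffs(N):
    """Coefficients of prod_{k=1}^{N-1} 1/(1 - x^k), truncated to x^0..x^(N-1).
    Each factor 1/(1 - x^k) only affects coefficients from x^k on, so
    factors with k >= N can be ignored.  Multiplying an FPS by 1/(1 - x^k)
    amounts to the in-place partial-sum loop with step k below."""
    coeffs = [1] + [0] * (N - 1)
    for k in range(1, N):
        for n in range(k, N):
            coeffs[n] += coeffs[n - k]
    return coeffs

def count_partitions(n, largest=None):
    """Brute-force count of partitions of n with all parts <= largest."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    return sum(count_partitions(n - part, part)
               for part in range(1, min(largest, n) + 1))

print(product_coeffs(8))  # → [1, 1, 2, 3, 5, 7, 11, 15]
```

The printed coefficients agree with p(0), . . . , p(7) from the table above, as Theorem 4.1.8 predicts.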

Proof of Theorem 4.1.8. We have

∏_{k=1}^{∞} 1/(1 − x^k)
= ∏_{k=1}^{∞} (1 + x^k + x^{2k} + x^{3k} + · · ·) = ∏_{k=1}^{∞} ∑_{u∈N} x^{ku}
= ∑_{(u_1,u_2,u_3,...)∈N^∞ essentially finite} x^{1u_1} x^{2u_2} x^{3u_3} · · ·    (here, we expanded the product using Proposition 3.11.29)
= ∑_{(u_1,u_2,u_3,...)∈N^∞ essentially finite} x^{1u_1 + 2u_2 + 3u_3 + · · ·}
= ∑_{n∈N} |Q_n| x^n,

where

Q_n = {(u_1, u_2, u_3, . . .) ∈ N^∞ essentially finite | 1u_1 + 2u_2 + 3u_3 + · · · = n}.

Thus, it will suffice to show that

|Q_n| = p(n)    for each n ∈ N.



Let us fix n ∈ N. We want to construct a bijection from Q_n to {partitions of n}. Here is how to do this: For any (u_1, u_2, u_3, . . .) ∈ Q_n, define a partition

π(u_1, u_2, u_3, . . .) := (the partition that contains each i exactly u_i times)
                        = (. . . , 3, 3, . . . , 3, 2, 2, . . . , 2, 1, 1, . . . , 1)

(with u_3 many 3's, u_2 many 2's and u_1 many 1's). This is a partition of n, since its size is 1u_1 + 2u_2 + 3u_3 + · · · = n (because (u_1, u_2, u_3, . . .) ∈ Q_n). Thus, we have defined a partition π(u_1, u_2, u_3, . . .) of n for each (u_1, u_2, u_3, . . .) ∈ Q_n. In other words, we have defined a map

π : Q_n → {partitions of n}.

It remains to show that this map π is a bijection. We define a map

ρ : {partitions of n} → Q_n

that sends each partition λ of n to the sequence

(# of 1's in λ, # of 2's in λ, # of 3's in λ, . . .) ∈ Q_n

(this is indeed a sequence in Q_n, since

∑_{i=1}^{∞} i · (# of i's in λ) = (the sum of all entries of λ) = |λ| = n

because λ is a partition of n). It is now easy to check that the maps π and ρ are mutually inverse, so that π is a bijection. The bijection principle therefore yields |Q_n| = (# of partitions of n) = p(n); but this is precisely what we wanted to show. The proof of Theorem 4.1.8 is thus complete.
Theorem 4.1.8 has a "finite" analogue (finite in the sense that the product ∏_{k=1}^{∞} 1/(1 − x^k) is replaced by a finite product; the FPSs are still infinite):

Theorem 4.1.10. Let m ∈ N. For each n ∈ N, let p_{parts≤m}(n) be the # of partitions λ of n such that all parts of λ are ≤ m. Then,

∑_{n∈N} p_{parts≤m}(n) x^n = ∏_{k=1}^{m} 1/(1 − x^k).

Proof of Theorem 4.1.10. This proof is mostly analogous to the above proof of Theorem 4.1.8, and to some extent even simpler because it uses m-tuples instead of infinite sequences.
We have

∏_{k=1}^{m} 1/(1 − x^k)
= ∏_{k=1}^{m} (1 + x^k + x^{2k} + x^{3k} + · · ·) = ∏_{k=1}^{m} ∑_{u∈N} x^{ku}
= ∑_{(u_1,u_2,...,u_m)∈N^m} x^{1u_1} x^{2u_2} · · · x^{mu_m}    (here, we expanded the product using Proposition 3.11.27)
= ∑_{(u_1,u_2,...,u_m)∈N^m} x^{1u_1 + 2u_2 + · · · + mu_m} = ∑_{n∈N} |Q_n| x^n,

where

Q_n = {(u_1, u_2, . . . , u_m) ∈ N^m | 1u_1 + 2u_2 + · · · + mu_m = n}.

Thus, it will suffice to show that

|Q_n| = p_{parts≤m}(n)    for each n ∈ N.

Let us fix n ∈ N. We want to construct a bijection from Q_n to the set {partitions λ of n such that all parts of λ are ≤ m}.
Here is how to do this: For any (u_1, u_2, . . . , u_m) ∈ Q_n, define a partition

π(u_1, u_2, . . . , u_m) := (the partition that contains each i exactly u_i times)
                         = (m, m, . . . , m, . . . , 2, 2, . . . , 2, 1, 1, . . . , 1)

(with u_m many m's, . . . , u_2 many 2's and u_1 many 1's). Thus, we have defined a map

π : Q_n → {partitions λ of n such that all parts of λ are ≤ m}.

It is easy to see that this map π is a bijection^{55}. The bijection principle therefore yields

|Q_n| = (# of partitions λ of n such that all parts of λ are ≤ m) = p_{parts≤m}(n);

but this is precisely what we wanted to show. The proof of Theorem 4.1.10 is thus complete.
Theorem 4.1.10 can be generalized further: Instead of counting the partitions
of n whose all parts are ≤ m, we can count the partitions of n whose parts all
belong to a given set I of positive integers. This leads to the following formula
(which specializes back to Theorem 4.1.10 when we take I = {1, 2, . . . , m}):

55 The argument is analogous to the one used in the proof of Theorem 4.1.8.

Theorem 4.1.11. Let I be a subset of {1, 2, 3, . . .}. For each n ∈ N, let p_I(n) be the # of partitions λ of n such that all parts of λ belong to I. Then,

∑_{n∈N} p_I(n) x^n = ∏_{k∈I} 1/(1 − x^k).

Proof of Theorem 4.1.11 (sketched). This is analogous to the proof of Theorem 4.1.10, with some minor changes: The ∏_{k=1}^{m} sign has to be replaced by ∏_{k∈I}; the m-tuples (u_1, u_2, . . . , u_m) ∈ N^m must be replaced by the essentially finite families (u_i)_{i∈I} ∈ N^I; the bijection π has to be replaced by the new bijection

π : Q_n → {partitions λ of n such that all parts of λ are ∈ I}

(where Q_n is now the set of all essentially finite families (u_i)_{i∈I} ∈ N^I satisfying ∑_{i∈I} i u_i = n) defined by

π((u_i)_{i∈I}) := (the partition that contains each i ∈ I exactly u_i times)
               = (. . . , i_3, i_3, . . . , i_3, i_2, i_2, . . . , i_2, i_1, i_1, . . . , i_1)

(with u_{i_3} many i_3's, u_{i_2} many i_2's and u_{i_1} many i_1's), where i_1, i_2, i_3, . . . are the elements of I listed in increasing order. The details are left to the reader.
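Theorem 4.1.11 can likewise be spot-checked numerically. The sketch below (names mine) takes I to be the set of odd positive integers and compares the truncated product ∏_{k∈I} 1/(1 − x^k) with a brute-force count of partitions into parts from I:

```python
def restricted_product(allowed, N):
    """Coefficients of prod_{k in allowed} 1/(1 - x^k), truncated below x^N.
    Multiplying by 1/(1 - x^k) is the in-place partial-sum loop with step k."""
    coeffs = [1] + [0] * (N - 1)
    for k in allowed:
        if k >= N:
            continue
        for n in range(k, N):
            coeffs[n] += coeffs[n - k]
    return coeffs

def count_with_parts_in(n, allowed, cap=None):
    """Brute-force count of partitions of n with all parts in `allowed`
    (parts chosen weakly decreasing, each bounded by `cap`)."""
    if cap is None:
        cap = n
    if n == 0:
        return 1
    return sum(count_with_parts_in(n - k, allowed, k)
               for k in allowed if k <= min(cap, n))

N = 12
odds = range(1, N, 2)
lhs = restricted_product(odds, N)
assert lhs == [count_with_parts_in(n, set(odds)) for n in range(N)]
```

Taking `allowed = range(1, m + 1)` instead recovers the check for Theorem 4.1.10.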

4.1.4. Odd parts and distinct parts


Next, we shall state a result of Euler that we have already discovered in a
different language.

Definition 4.1.12. Let n ∈ Z.


(a) A partition of n into odd parts means a partition of n whose all parts are
odd.
(b) A partition of n into distinct parts means a partition of n whose parts are
distinct.
(c) Let

podd (n) := (# of partitions of n into odd parts) and


pdist (n) := (# of partitions of n into distinct parts) .

Example 4.1.13. We have

podd (7) = |{(7) , (5, 1, 1) , (3, 3, 1) , (3, 1, 1, 1, 1) , (1, 1, 1, 1, 1, 1, 1)}| = 5;


pdist (7) = |{(7) , (6, 1) , (5, 2) , (4, 3) , (4, 2, 1)}| = 5.

Theorem 4.1.14 (Euler’s odd-distinct identity). We have podd (n) = pdist (n)
for each n ∈ N.

We have already encountered this theorem before (as Theorem 3.11.34, albeit in less precise language), and we have proved it using the generating function identity

∏_{i>0} (1 − x^{2i−1})^{−1} = ∏_{k>0} (1 + x^k).

Let me outline a different, bijective proof.
Second proof of Theorem 4.1.14 (sketched). Let n ∈ N. We want to construct a bi-
jection

A : {partitions of n into odd parts} → {partitions of n into distinct parts} .

We shall do this as follows: Given a partition λ of n into odd parts, we repeat-


edly merge pairs of equal parts in λ until no more equal parts appear. The final
result will be A (λ). Here are two examples:

• To compute A(5, 5, 3, 1, 1, 1), we compute^{56}

  (5, 5, 3, 1, 1, 1) → (10, 3, 1, 1, 1) → (10, 3, 2, 1).

  Thus, A(5, 5, 3, 1, 1, 1) = (10, 3, 2, 1).

• To compute A(5, 3, 1, 1, 1, 1), we compute

  (5, 3, 1, 1, 1, 1) → (5, 3, 2, 1, 1) → (5, 3, 2, 2) → (5, 4, 3).

  Thus, A(5, 3, 1, 1, 1, 1) = (5, 4, 3).

Why is this map A well-defined? We only specified the sort of steps we are allowed to take when computing A(λ); however, there is often a choice involved in taking these steps (since there are often several pairs of equal parts).^{57} So

56 In each step, we merge one pair of equal entries into their sum. Note that there are usually several candidate pairs, and we just pick one pair at will.
57 For example, we could have also computed A(5, 5, 3, 1, 1, 1) as follows:

   (5, 5, 3, 1, 1, 1) → (5, 5, 3, 2, 1) → (10, 3, 2, 1).

we have specified a non-deterministic algorithm. Why is the resulting partition


independent of the choices we make?
One way to prove this is using the diamond lemma, which is a general tool
for proving that certain non-deterministic algorithms have unique final out-
comes (independent of the choices taken). See, e.g., https://mathoverflow.
net/questions/289300/ for a list of references on this lemma.
For the map A, we can also proceed differently, by analyzing the algorithm
that we used to define A. Namely, we observe what is really going on when
we are merging equal parts. Let us say our original partition λ has p many 1s.
Let us first merge them in pairs, so that we get ⌊ p/2⌋ many 2s and maybe one
single 1. Then, let us merge the 2s in pairs, so that we get ⌊⌊ p/2⌋ /2⌋ many 4s,
maybe a single 2, and maybe a single 1. Proceed until no more than one 1, no
more than one 2, no more than one 4, no more than one 8, and so on remain.
This clears out any duplicate parts of the form 2k . Next do the same with parts
of the form 3 · 2k (that is, with parts equal to 3, 6, 12, 24, and so on), then with
parts of the form 5 · 2k , and so on.
The nice thing about this way of proceeding is that we can explicitly describe
the final outcome. Indeed, if the original partition λ (a partition of n into odd parts) contains an odd part k precisely m many times, and if the binary representation of m is m = (m_i m_{i−1} · · · m_1 m_0)_2 (that is, if m_0, m_1, . . . , m_i ∈ {0, 1} satisfy m = ∑_{j=0}^{i} m_j 2^j), then the partition A(λ) will contain the number 2^0 k exactly m_0 times, the number 2^1 k exactly m_1 times, the number 2^2 k exactly m_2 times, and so on. Since the binary digits m_0, m_1, . . . , m_i are all ≤ 1, this partition A(λ) will therefore not contain any number more than once, i.e., it will be a partition into distinct parts.
It is not hard to check that this map A is indeed a bijection. Indeed, in order
to see this, we construct a map B that will turn out to be its inverse. Here, we
start with a partition λ of n into distinct parts. Let us represent each part of
this partition in the form k · 2i for some odd k ≥ 1 and some integer i ≥ 0.
(Recall that any positive integer can be represented uniquely in this form.)
Now, replace this part k · 2i by 2i many k’s. The resulting partition (once all
parts have been replaced) will usually have many equal parts, but all its parts
are odd. We define B (λ) to be this resulting partition. Alternatively, B (λ)
can also be constructed step-by-step by a non-deterministic algorithm: Starting with λ, keep "breaking even parts into halves" (i.e., whenever you see an even part m, replace it by two parts m/2 and m/2), until no even parts remain any more.
The result is B (λ). It is not hard to see that both descriptions of B (λ) describe
the same partition. It is furthermore easy to see that this map B is indeed an
inverse of A, so that A is indeed a bijection. Thus, the bijection principle yields

|{partitions of n into odd parts}| = |{partitions of n into distinct parts}| .


In other words, podd (n) = pdist (n). This proves Theorem 4.1.14.
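The maps A and B are short enough to implement directly. The sketch below (function names mine) uses the explicit binary-expansion description of A and the "write each part as k · 2^i with k odd" description of B, and checks them on the examples above:

```python
from collections import Counter

def A(partition):
    """Map a partition into odd parts to a partition into distinct parts:
    if the odd number k occurs m times, expand m in binary and contribute
    one part 2^j * k for each binary digit m_j = 1."""
    parts = []
    for k, m in Counter(partition).items():
        j = 0
        while m:
            if m & 1:
                parts.append(k << j)       # the part 2^j * k
            m >>= 1
            j += 1
    return tuple(sorted(parts, reverse=True))

def B(partition):
    """Inverse map: write each part as k * 2^i with k odd,
    and replace it by 2^i copies of k."""
    parts = []
    for part in partition:
        i = 0
        while part % 2 == 0:
            part //= 2
            i += 1
        parts.extend([part] * (1 << i))    # 2^i copies of the odd number k
    return tuple(sorted(parts, reverse=True))

assert A((5, 5, 3, 1, 1, 1)) == (10, 3, 2, 1)
assert A((5, 3, 1, 1, 1, 1)) == (5, 4, 3)
assert B((10, 3, 2, 1)) == (5, 5, 3, 1, 1, 1)
```

Running B after A on the five odd partitions of 7 listed in Example 4.1.13 returns each of them unchanged, confirming the mutual inverseness on that sample.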

Yet another proof of Theorem 4.1.14 will be given in Subsection 6.2.2.

4.1.5. Partitions with a given largest part


Here is another situation in which two kinds of partitions are equinumerous:

Proposition 4.1.15. Let n ∈ N and k ∈ N. Then,

pk (n) = (# of partitions of n whose largest part is k ) .

Here and in the following, we use the following convention:

Convention 4.1.16. We agree to say that the largest part of the empty parti-
tion () is 0 (even though this partition has no parts).

Example 4.1.17. For n = 4 and k = 3, we have

p_k(n) = p_3(4) = 1    (due to the partition (2, 1, 1))

and

(# of partitions of n whose largest part is k)
= (# of partitions of 4 whose largest part is 3)
= 1    (due to the partition (3, 1)).

Thus, Proposition 4.1.15 holds for n = 4 and k = 3.

Proof of Proposition 4.1.15 (sketched). We do a “proof by picture” (it can be made


rigorous – see Exercise A.3.1.1 for this). We pick n = 14 and k = 4 for example,
and we start with the partition λ = (5, 4, 4, 1) of n into k parts.
We draw a table of k left-aligned rows, where the length of each row equals the corresponding part of λ (that is, the i-th row from the top has λ_i boxes, where λ = (λ_1, λ_2, . . . , λ_k)). For λ = (5, 4, 4, 1), this is a table whose four rows have 5, 4, 4, 1 boxes, respectively.

[Figure: the table of left-aligned rows of boxes for λ = (5, 4, 4, 1), with rows of lengths 5, 4, 4, 1.]

Now, let us flip this table across the "main diagonal" (i.e., the diagonal that goes from the top-left corner to the bottom-right corner)^{58}:

[Figure: the flipped table; its columns are the old rows, so its columns have lengths 5, 4, 4, 1, and its rows have lengths 4, 3, 3, 3, 1.]

The lengths of the rows of the resulting table again form a partition of n. (In our case, this new partition is (4, 3, 3, 3, 1).) Moreover, the largest part of this new partition is k (because the original table had k rows, so the flipped table has k columns, and this means that its top row has k boxes). This procedure (i.e., turning a partition into a table, then flipping the table across the "main diagonal", and then reading the lengths of the rows of the resulting table again as a partition) therefore gives a map from

{partitions of n into k parts}

to
{partitions of n whose largest part is k} .
Moreover, this map is a bijection (indeed, its inverse can be effected in the exact
same way, by flipping the table). This bijection is called conjugation of partitions,
and will be studied in more detail later.
Here are some pointers to how this proof can be formalized (see Exercise
A.3.1.1 for much more): For any partition λ = (λ1 , λ2 , . . . , λk ), we define the
Young diagram of λ to be the set
n o
Y (λ) := (i, j) ∈ Z2 | 1 ≤ i ≤ k and 1 ≤ j ≤ λi .

This Young diagram is precisely the table that we drew above, as long as we
agree to identify each pair (i, j) ∈ Y (λ) with the box in row i and column j.
Now, the conjugate of the partition λ is the partition λ^t uniquely determined by

Y(λ^t) = flip(Y(λ)) = {(j, i) | (i, j) ∈ Y(λ)}.

Explicitly, λ^t can be defined by λ^t = (μ_1, μ_2, . . . , μ_p), where p is the largest part of λ and where

μ_i = (# of parts of λ that are ≥ i)    for each i ∈ {1, 2, . . . , p}.

58 This kind of flip is precisely how you would transpose a matrix.



(This conjugate λ^t is also often called λ′, and is also known as the transpose of λ.) Now, it is not hard to show that |λ^t| = |λ| and (λ^t)^t = λ for each partition λ, and that the largest part of λ^t equals the length of λ. Using these observations (which are proved in Exercise A.3.1.1), we see that the map

{partitions of n into k parts} → {partitions of n whose largest part is k},
λ ↦ λ^t

is well-defined and is a bijection; thus, the above proof of Proposition 4.1.15


becomes fully rigorous.
The word "Young" in "Young diagram" (and, later, "Young tableau") does not imply any novelty (Young diagrams have been around in some form or another since the 19th century – if often in the superficially different guise of "Ferrers diagrams"), but rather honors Alfred Young, who built up the representation theory of symmetric groups (and significantly advanced invariant theory) using these objects.
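Conjugation itself is a one-liner using the explicit description μ_i = (# of parts of λ that are ≥ i), and Proposition 4.1.15 can then be verified mechanically for small n. A sketch (names mine):

```python
def conjugate(lam):
    """Conjugate (transpose) of a partition lam (a weakly decreasing tuple):
    the i-th entry of the conjugate counts the parts of lam that are >= i."""
    if not lam:
        return ()
    return tuple(sum(1 for part in lam if part >= i)
                 for i in range(1, lam[0] + 1))

def partitions(n, cap=None):
    """Generate all partitions of n as weakly decreasing tuples, parts <= cap."""
    if cap is None:
        cap = n
    if n == 0:
        yield ()
        return
    for first in range(min(cap, n), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

assert conjugate((5, 4, 4, 1)) == (4, 3, 3, 3, 1)   # the example above

# Conjugation maps {partitions of n into k parts} bijectively onto
# {partitions of n with largest part k}, for each k:
n = 9
for k in range(n + 1):
    into_k = [lam for lam in partitions(n) if len(lam) == k]
    top_k = [lam for lam in partitions(n) if (lam[0] if lam else 0) == k]
    assert sorted(conjugate(lam) for lam in into_k) == sorted(top_k)
```

The same loop with `conjugate(conjugate(lam)) == lam` checks the involution property (λ^t)^t = λ mentioned above.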

Corollary 4.1.18. Let n ∈ N and k ∈ N. Then,

p0 ( n ) + p1 ( n ) + · · · + p k ( n )
= (# of partitions of n whose largest part is ≤ k) .

Proof of Corollary 4.1.18. We have

p_0(n) + p_1(n) + · · · + p_k(n)
= ∑_{i=0}^{k} p_i(n)
= ∑_{i=0}^{k} (# of partitions of n whose largest part is i)    (by Proposition 4.1.15, applied to i instead of k)
= (# of partitions of n whose largest part is ≤ k).

This proves Corollary 4.1.18.


Corollary 4.1.18 leads to yet another FPS identity:

Theorem 4.1.19. Let $m \in \mathbb{N}$. Then,

$$\sum_{n \in \mathbb{N}} \left( p_0(n) + p_1(n) + \cdots + p_m(n) \right) x^n = \prod_{k=1}^{m} \frac{1}{1 - x^k}.$$

Proof of Theorem 4.1.19. For each $n \in \mathbb{N}$, we have

\begin{align*}
p_0(n) + p_1(n) + \cdots + p_m(n)
&= (\text{\# of partitions of } n \text{ whose largest part is} \le m) \qquad (\text{by Corollary 4.1.18}) \\
&= (\text{\# of partitions of } n \text{ all of whose parts are} \le m) \\
&\qquad \left(\begin{array}{c}\text{because the condition ``the largest part is} \le m\text{'' for a}\\ \text{partition is clearly equivalent to ``all parts are} \le m\text{''}\end{array}\right) \\
&= p_{\mathrm{parts} \le m}(n),
\end{align*}

where $p_{\mathrm{parts} \le m}(n)$ is defined as in Theorem 4.1.10. Hence,

$$\sum_{n \in \mathbb{N}} \left( p_0(n) + p_1(n) + \cdots + p_m(n) \right) x^n
= \sum_{n \in \mathbb{N}} p_{\mathrm{parts} \le m}(n)\, x^n
= \prod_{k=1}^{m} \frac{1}{1 - x^k}$$

(by Theorem 4.1.10). This proves Theorem 4.1.19.
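Theorem 4.1.19 is easy to check numerically. The following sketch (an illustrative script, not part of the notes) compares the coefficients of the truncated product $\prod_{k=1}^{m} \frac{1}{1-x^k}$ with a brute-force count of partitions with all parts $\le m$:

```python
def partitions_with_max_part(n, k):
    """# of partitions of n all of whose parts are <= k
    (equivalently: whose largest part is <= k)."""
    if n == 0:
        return 1
    if n < 0 or k == 0:
        return 0
    return partitions_with_max_part(n, k - 1) + partitions_with_max_part(n - k, k)

def product_coeffs(m, N):
    """Coefficients of prod_{k=1}^{m} 1/(1 - x^k), up to degree N."""
    coeffs = [1] + [0] * N
    for k in range(1, m + 1):
        # multiplying by 1/(1 - x^k) is the prefix recurrence c[d] += c[d - k]
        for d in range(k, N + 1):
            coeffs[d] += coeffs[d - k]
    return coeffs

m, N = 4, 20
coeffs = product_coeffs(m, N)
for n in range(N + 1):
    assert coeffs[n] == partitions_with_max_part(n, m)
```

By Corollary 4.1.18, the counted quantity also equals $p_0(n) + p_1(n) + \cdots + p_m(n)$, which is exactly the $x^n$-coefficient on the left hand side of Theorem 4.1.19.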

4.1.6. Partition number vs. sums of divisors


We shall now prove a curious combinatorial identity (due, I think, to Euler),
illustrating the usefulness of generating functions and logarithmic derivatives.
It connects the partition numbers p (n) with a further sequence, which counts
the positive divisors of a given positive integer:

Theorem 4.1.20. For any positive integer n, let σ (n) denote the sum of all
positive divisors of n. (For example, σ (6) = 1 + 2 + 3 + 6 = 12 and σ (7) =
1 + 7 = 8.)
For any $n \in \mathbb{N}$, we have

$$n\, p(n) = \sum_{k=1}^{n} \sigma(k)\, p(n-k).$$

For example, for $n = 3$, this says that

$$3\, p(3) = \sigma(1)\, p(2) + \sigma(2)\, p(1) + \sigma(3)\, p(0),$$

which indeed holds, since $p(3) = 3$, $\sigma(1) = 1$, $p(2) = 2$, $\sigma(2) = 3$, $p(1) = 1$, $\sigma(3) = 4$ and $p(0) = 1$ (so both sides equal $9$).

For reference, let us give a table of the first 15 values of $\sigma(n)$ defined in Theorem
4.1.20 (see Sequence A000203 in the OEIS for more values):

n     | 1  2  3  4  5  6   7  8   9   10  11  12  13  14  15
σ(n)  | 1  3  4  7  6  12  8  15  13  18  12  28  14  24  24
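Before proving Theorem 4.1.20, one can check it numerically. Here is an illustrative script (not part of the notes) that verifies the recursion for small $n$, computing $p(n)$ by the standard bounded-largest-part count:

```python
from functools import lru_cache

def sigma(n):
    """sigma(n) = sum of all positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

@lru_cache(maxsize=None)
def count(n, k):
    """# of partitions of n into parts of size <= k."""
    if n == 0:
        return 1
    if n < 0 or k == 0:
        return 0
    return count(n, k - 1) + count(n - k, k)

def p(n):
    """Partition number p(n)."""
    return count(n, n)

# Verify n * p(n) = sum_{k=1}^{n} sigma(k) * p(n - k) for small n:
for n in range(1, 20):
    assert n * p(n) == sum(sigma(k) * p(n - k) for k in range(1, n + 1))
```

For instance, $n = 3$ reproduces the worked example above: $3 \cdot 3 = 1 \cdot 2 + 3 \cdot 1 + 4 \cdot 1$.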

Proof of Theorem 4.1.20 (sketched). Define the two FPSs

$$P := \sum_{n \in \mathbb{N}} p(n)\, x^n \in \mathbb{Z}[[x]] \qquad \text{and} \qquad S := \sum_{k > 0} \sigma(k)\, x^k \in \mathbb{Z}[[x]].$$

Hence,

\begin{align*}
SP &= \left( \sum_{k > 0} \sigma(k)\, x^k \right) \left( \sum_{n \in \mathbb{N}} p(n)\, x^n \right)
= \sum_{k > 0} \sum_{n \in \mathbb{N}} \sigma(k)\, p(n)\, x^{k+n} \\
&= \sum_{k > 0} \sum_{\substack{m \in \mathbb{N};\\ m \ge k}} \sigma(k)\, p(m-k)\, x^m
\qquad (\text{here, we have substituted } m - k \text{ for } n \text{ in the inner sum}) \\
&= \sum_{m \in \mathbb{N}} \sum_{k=1}^{m} \sigma(k)\, p(m-k)\, x^m. \tag{142}
\end{align*}

From $P = \sum_{n \in \mathbb{N}} p(n)\, x^n$, we obtain

$$P' = \sum_{n > 0} n\, p(n)\, x^{n-1}. \tag{143}$$

Multiplying this equality by $x$, we obtain the somewhat nicer

$$xP' = \sum_{n > 0} n\, p(n)\, x^n = \sum_{n \in \mathbb{N}} n\, p(n)\, x^n \tag{144}$$

(here, we have extended the range of the sum to include $n = 0$, which caused
no change to the value of the sum, since the newly added $n = 0$ addend is
$0\, p(0)\, x^0 = 0$).
Let us now recall the notion of a logarithmic derivative (as defined in Definition 3.7.12). The FPS $P$ has constant coefficient $\left[x^0\right] P = p(0) = 1$, thus belongs
to $\mathbb{Z}[[x]]_1$. Hence, its logarithmic derivative $\operatorname{loder} P$ is well-defined.
We shall now use $\operatorname{loder} P$ to compute $P'$ in a roundabout way. Namely, we
have

$$P = \sum_{n \in \mathbb{N}} p(n)\, x^n = \prod_{k=1}^{\infty} \frac{1}{1 - x^k}$$

(by Theorem 4.1.8). In other words,

$$P = \prod_{k > 0} \frac{1}{1 - x^k} \tag{145}$$
(since the product sign $\prod_{k=1}^{\infty}$ is synonymous with $\prod_{k > 0}$).
But Corollary 3.7.15 says that any $k$ FPSs $f_1, f_2, \ldots, f_k \in K[[x]]_1$ (for any
commutative ring $K$) satisfy

$$\operatorname{loder}(f_1 f_2 \cdots f_k) = \operatorname{loder}(f_1) + \operatorname{loder}(f_2) + \cdots + \operatorname{loder}(f_k).$$

In other words, the logarithmic derivative of a product of $k$ FPSs (which belong
to $K[[x]]_1$) equals the sum of the logarithmic derivatives of these $k$ FPSs. By
a simple limit argument (using Theorem 3.13.15, Theorem 3.13.14, Proposition
3.13.11 and Proposition 3.13.13), we can extend this fact to an infinite product
(as long as it is multipliable). Thus, if $(f_1, f_2, f_3, \ldots)$ is a multipliable sequence
of FPSs in $K[[x]]_1$ (for any commutative ring $K$), then

$$\operatorname{loder}(f_1 f_2 f_3 \cdots) = \operatorname{loder}(f_1) + \operatorname{loder}(f_2) + \operatorname{loder}(f_3) + \cdots;$$

equivalently,

$$\operatorname{loder}\left( \prod_{k > 0} f_k \right) = \sum_{k > 0} \operatorname{loder}(f_k).$$

Applying this to $K = \mathbb{Z}$ and $f_k = \dfrac{1}{1 - x^k}$, we obtain

$$\operatorname{loder}\left( \prod_{k > 0} \frac{1}{1 - x^k} \right) = \sum_{k > 0} \operatorname{loder} \frac{1}{1 - x^k}.$$

In view of (145), we can rewrite this as

$$\operatorname{loder} P = \sum_{k > 0} \operatorname{loder} \frac{1}{1 - x^k}. \tag{146}$$

However, every positive integer $k$ satisfies

\begin{align*}
\operatorname{loder} \frac{1}{1 - x^k}
&= \operatorname{loder}\left( \left(1 - x^k\right)^{-1} \right)
= -\operatorname{loder}\left(1 - x^k\right)
\qquad \left(\text{by Corollary 3.7.16, applied to } f = 1 - x^k\right) \\
&= -\frac{\left(1 - x^k\right)'}{1 - x^k}
\qquad (\text{by the definition of a logarithmic derivative}) \\
&= -\frac{-k x^{k-1}}{1 - x^k}
\qquad \left(\text{since } \left(1 - x^k\right)' = -k x^{k-1}\right) \\
&= \frac{k x^{k-1}}{1 - x^k}
= k x^{k-1} \sum_{m \in \mathbb{N}} \left(x^k\right)^m
\qquad (\text{by the geometric series formula}) \\
&= \sum_{m \in \mathbb{N}} k x^{k(m+1)-1}
\qquad \left(\text{since } x^{k-1} \left(x^k\right)^m = x^{k-1+km} = x^{k(m+1)-1}\right) \\
&= \sum_{i > 0} k x^{ki-1}
\qquad \left(\begin{array}{c}\text{here, we have substituted } i\\ \text{for } m+1 \text{ in the sum}\end{array}\right) \\
&= \sum_{\substack{n > 0;\\ k \mid n}} k x^{n-1}
\qquad \left(\begin{array}{c}\text{here, we have substituted } n\\ \text{for } ki \text{ in the sum}\end{array}\right).
\end{align*}

Hence, we can rewrite (146) as

$$\operatorname{loder} P
= \sum_{k > 0} \sum_{\substack{n > 0;\\ k \mid n}} k x^{n-1}
= \sum_{n > 0} \sum_{\substack{k > 0;\\ k \mid n}} k x^{n-1}
= \sum_{n > 0} \underbrace{\left( \sum_{\substack{k > 0;\\ k \mid n}} k \right)}_{\substack{= (\text{sum of all positive divisors of } n)\\ = \sigma(n)\\ \text{(by the definition of } \sigma(n))}} x^{n-1}
= \sum_{n > 0} \sigma(n)\, x^{n-1}.$$

Comparing this with

$$\operatorname{loder} P = \frac{P'}{P} \qquad (\text{by Definition 3.7.12}),$$

we obtain

$$\frac{P'}{P} = \sum_{n > 0} \sigma(n)\, x^{n-1}.$$

Multiplying both sides of this equality by $xP$, we find

$$xP' = xP \cdot \sum_{n > 0} \sigma(n)\, x^{n-1} = P \cdot \sum_{n > 0} \sigma(n)\, x^n = P \cdot S = SP$$

(since $\sum_{n > 0} \sigma(n)\, x^n = \sum_{k > 0} \sigma(k)\, x^k = S$). Hence,

$$xP' = SP = \sum_{m \in \mathbb{N}} \sum_{k=1}^{m} \sigma(k)\, p(m-k)\, x^m
\qquad (\text{by (142)})
\qquad = \sum_{n \in \mathbb{N}} \sum_{k=1}^{n} \sigma(k)\, p(n-k)\, x^n$$

(here, we have renamed the summation index $m$ as $n$).
Comparing this with (144), we find

$$\sum_{n \in \mathbb{N}} n\, p(n)\, x^n = \sum_{n \in \mathbb{N}} \sum_{k=1}^{n} \sigma(k)\, p(n-k)\, x^n.$$

Comparing coefficients in front of $x^n$ on both sides of this equality, we find that

$$n\, p(n) = \sum_{k=1}^{n} \sigma(k)\, p(n-k) \qquad \text{for each } n \in \mathbb{N}.$$

This proves Theorem 4.1.20.


More generally, the following holds:

Theorem 4.1.21. Let I be a subset of {1, 2, 3, . . .}. For each n ∈ N, let p I (n)
be the # of partitions λ of n such that all parts of λ belong to I.
For any positive integer n, let σI (n) denote the sum of all positive divisors
of n that belong to I. (For example, if O = {all odd positive integers} and
E = {all even positive integers}, then σO (6) = 1 + 3 = 4 and σE (6) = 2 +
6 = 8.)
For any $n \in \mathbb{N}$, we have

$$n\, p_I(n) = \sum_{k=1}^{n} \sigma_I(k)\, p_I(n-k).$$

Of course, Theorem 4.1.20 is the particular case of Theorem 4.1.21 for $I = \{1, 2, 3, \ldots\}$.

Proof of Theorem 4.1.21 (sketched). Argue as in our above proof of Theorem 4.1.20,
making some replacements: In particular, symbols like $\sum_{k > 0}$ and $\prod_{k > 0}$ must be
replaced by $\sum_{k \in I}$ and $\prod_{k \in I}$ (respectively), and Theorem 4.1.11 must be used instead
of Theorem 4.1.8. We leave the details to the reader.
We note that Theorem 4.1.20 and even the more general Theorem 4.1.21 are
not hard to prove by completely elementary ways (see, e.g., Exercise A.3.1.11).
Nevertheless, our above proof using generating functions has some interesting
consequences, one of which we might soon explore.
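Theorem 4.1.21 is likewise easy to test by brute force. The following illustrative script (not part of the notes) checks it for $I$ = the odd positive integers, matching the document's example values $\sigma_O(6) = 4$:

```python
def p_I(n, allowed):
    """# of partitions of n all of whose parts belong to the set `allowed`."""
    parts = sorted(k for k in allowed if 0 < k <= n)
    def go(rem, max_idx):
        # count partitions of rem using parts[0..max_idx] (non-increasing choice)
        if rem == 0:
            return 1
        return sum(go(rem - parts[j], j)
                   for j in range(max_idx + 1) if parts[j] <= rem)
    return go(n, len(parts) - 1)

def sigma_I(n, allowed):
    """Sum of the positive divisors of n that belong to `allowed`."""
    return sum(d for d in range(1, n + 1) if n % d == 0 and d in allowed)

# Check Theorem 4.1.21 for I = {odd positive integers} (truncated), small n:
odds = set(range(1, 40, 2))
for n in range(1, 16):
    assert n * p_I(n, odds) == sum(sigma_I(k, odds) * p_I(n - k, odds)
                                   for k in range(1, n + 1))
```

For instance, at $n = 4$ this reads $4 \cdot 2 = 1 \cdot 2 + 1 \cdot 1 + 4 \cdot 1 + 1 \cdot 1$.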

4.2. Euler’s pentagonal number theorem


The following definition looks somewhat quaint; why define a notation for a
specific quadratic function?

Definition 4.2.1. For any $k \in \mathbb{Z}$, define a nonnegative integer $w_k \in \mathbb{N}$ by

$$w_k = \frac{(3k-1)\,k}{2}.$$

This is called the $k$-th pentagonal number.

Here is a table of these pentagonal numbers:

k    | ···  −5  −4  −3  −2  −1  0  1  2   3   4   5  ···
w_k  | ···  40  26  15   7   2  0  1  5  12  22  35  ···

Note that $w_k$ really is a nonnegative integer for any $k \in \mathbb{Z}$ (check this!). The
name “pentagonal numbers” is historically motivated (see the Wikipedia page
for details); the only thing we need to know about them (beside their definition)
is the fact that they are nonnegative integers and grow quadratically with $k$ in
both directions (i.e., when $k \to \infty$ and when $k \to -\infty$). The latter fact ensures
that the infinite sum $\sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}$ is a well-defined FPS in $\mathbb{Z}[[x]]$. Rather surprisingly, this infinite sum coincides with a particularly simple infinite product:

Theorem 4.2.2 (Euler’s pentagonal number theorem). We have

$$\prod_{k=1}^{\infty} \left(1 - x^k\right) = \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}.$$

Let us write this out concretely:

\begin{align*}
\prod_{k=1}^{\infty} \left(1 - x^k\right)
&= \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k} \\
&= \cdots + x^{w_{-4}} - x^{w_{-3}} + x^{w_{-2}} - x^{w_{-1}} + x^{w_0} - x^{w_1} + x^{w_2} - x^{w_3} + x^{w_4} - x^{w_5} \pm \cdots \\
&= \cdots + x^{26} - x^{15} + x^{7} - x^{2} + 1 - x + x^{5} - x^{12} + x^{22} - x^{35} \pm \cdots \\
&= 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + x^{22} + x^{26} \pm \cdots.
\end{align*}
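The identity can be checked to any desired order by direct computation. Here is an illustrative script (not part of the notes) comparing coefficients of both sides up to degree 50:

```python
N = 50  # truncation order: work modulo x^(N+1)

# Left side: coefficients of prod_{k=1}^{N} (1 - x^k), truncated at degree N
lhs = [0] * (N + 1)
lhs[0] = 1
for k in range(1, N + 1):
    # multiply by (1 - x^k) in place (descending d uses old values correctly)
    for d in range(N, k - 1, -1):
        lhs[d] -= lhs[d - k]

# Right side: sum_{k in Z} (-1)^k x^{w_k} with w_k = (3k - 1)k/2
rhs = [0] * (N + 1)
for k in range(-N, N + 1):   # |k| <= w_k for k != 0, so this range suffices
    w = (3 * k - 1) * k // 2
    if 0 <= w <= N:
        rhs[w] += 1 if k % 2 == 0 else -1

assert lhs == rhs
```

The first few coefficients reproduce the expansion above: $1, -1, -1, 0, 0, 1, 0, 1, \ldots$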

We will prove Theorem 4.2.2 in the next section (as a particular case of Ja-
cobi’s Triple Product Identity).59 First, let us use it to derive the following
recursive formula for the partition numbers p (n):

Corollary 4.2.3. For each positive integer $n$, we have

\begin{align*}
p(n) &= \sum_{\substack{k \in \mathbb{Z};\\ k \neq 0}} (-1)^{k-1}\, p(n - w_k) \\
&= p(n-1) + p(n-2) - p(n-5) - p(n-7) \\
&\qquad + p(n-12) + p(n-15) - p(n-22) - p(n-26) \pm \cdots.
\end{align*}

Proof of Corollary 4.2.3 using Theorem 4.2.2. We have

$$\sum_{m \in \mathbb{N}} p(m)\, x^m = \sum_{n \in \mathbb{N}} p(n)\, x^n = \prod_{k=1}^{\infty} \frac{1}{1 - x^k}$$

(by Theorem 4.1.8) and

$$\sum_{k \in \mathbb{Z}} (-1)^k x^{w_k} = \prod_{k=1}^{\infty} \left(1 - x^k\right)$$

(by Theorem 4.2.2). Multiplying these two equalities, we obtain

$$\left( \sum_{m \in \mathbb{N}} p(m)\, x^m \right) \cdot \left( \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k} \right)
= \left( \prod_{k=1}^{\infty} \frac{1}{1 - x^k} \right) \cdot \left( \prod_{k=1}^{\infty} \left(1 - x^k\right) \right)
= \prod_{k=1}^{\infty} \underbrace{\left( \frac{1}{1 - x^k} \cdot \left(1 - x^k\right) \right)}_{=1}
= 1. \tag{147}$$
59 See [Bell06] for the history of Theorem 4.2.2.

Now, let us fix a positive integer $n$. We shall compare the $x^n$-coefficients on
both sides of (147).
The $x^n$-coefficient on the left hand side of (147) is

\begin{align*}
\sum_{\substack{m \in \mathbb{N};\ k \in \mathbb{Z};\\ m + w_k = n}} p(m) \cdot (-1)^k
&= \sum_{\substack{k \in \mathbb{Z};\\ n - w_k \ge 0}} p(n - w_k) \cdot (-1)^k \\
&\qquad \left(\begin{array}{c}\text{here, we have replaced } m \text{ by } n - w_k\\ \text{in the sum, since the condition } m + w_k = n\\ \text{forces } m \text{ to be } n - w_k\end{array}\right) \\
&= \sum_{k \in \mathbb{Z}} p(n - w_k) \cdot (-1)^k \\
&\qquad \left(\begin{array}{c}\text{here, we have extended the range of}\\ \text{summation; this does not change the sum,}\\ \text{since } p(n - w_k) = 0 \text{ whenever } n - w_k < 0\end{array}\right) \\
&= \sum_{k \in \mathbb{Z}} (-1)^k p(n - w_k) \\
&= \underbrace{(-1)^0}_{=1} p\Big(n - \underbrace{w_0}_{=0}\Big) + \sum_{\substack{k \in \mathbb{Z};\\ k \neq 0}} (-1)^k p(n - w_k) \\
&= p(n) + \sum_{\substack{k \in \mathbb{Z};\\ k \neq 0}} (-1)^k p(n - w_k).
\end{align*}

But the $x^n$-coefficient on the right hand side of (147) is 0 (since $n$ is positive).
Hence, comparing the coefficients yields

$$p(n) + \sum_{\substack{k \in \mathbb{Z};\\ k \neq 0}} (-1)^k p(n - w_k) = 0.$$

Solving this for $p(n)$, we find

$$p(n) = -\sum_{\substack{k \in \mathbb{Z};\\ k \neq 0}} (-1)^k p(n - w_k) = \sum_{\substack{k \in \mathbb{Z};\\ k \neq 0}} (-1)^{k-1} p(n - w_k).$$

Corollary 4.2.3 is thus proved.
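Corollary 4.2.3 yields a practical way of tabulating partition numbers (only $O(\sqrt{n})$ earlier values enter each step). Here is an illustrative implementation (not part of the notes):

```python
def partition_numbers(N):
    """Compute p(0), ..., p(N) via the pentagonal-number recursion
    of Corollary 4.2.3, with p of a negative argument treated as 0."""
    p = [0] * (N + 1)
    p[0] = 1
    for n in range(1, N + 1):
        total = 0
        k = 1
        while True:
            progressed = False
            for kk in (k, -k):                      # k runs over nonzero integers
                w = (3 * kk - 1) * kk // 2          # the pentagonal number w_k
                if w <= n:
                    progressed = True
                    sign = 1 if kk % 2 else -1      # (-1)^(k-1): +1 for odd k
                    total += sign * p[n - w]
            if not progressed:                      # both w_k and w_{-k} exceed n
                break
            k += 1
        p[n] = total
    return p
```

Usage: `partition_numbers(10)` returns `[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]`, matching the familiar values of $p(n)$.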

4.3. Jacobi’s triple product identity


4.3.1. The identity
Instead of proving Theorem 4.2.2 directly, we shall prove a stronger result:
Jacobi’s triple product identity. This identity can be stated as follows:
   
$$\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right) \left(1 - q^{2n}\right) = \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell}. \tag{148}$$

What are q and z here? It appears that (148) should be an identity between
multivariate Laurent series (in the indeterminates q and z), but we have never
defined such a concept. Multivariate Laurent series can indeed be defined,
but this is not as easy as the univariate case and involves some choices (see
[ApaKau13] for details).
A simpler ring in which the identity (148) can be placed is (Z [z± ]) [[q]] (that
is, the ring of FPSs in the indeterminate q whose coefficients are Laurent poly-
nomials over Z in the indeterminate z). In other words, we state the following:
Theorem 4.3.1 (Jacobi’s triple product identity, take 1). In the ring
$\left(\mathbb{Z}\left[z^{\pm}\right]\right)[[q]]$, we have

$$\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right) \left(1 - q^{2n}\right) = \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell}.$$

However, we aren’t just planning to view this identity as a formal identity between power series; instead, we will later evaluate both sides at certain
powers of another indeterminate $x$ (i.e., we will set $q = x^a$ and $z = x^b$ for
some positive integers $a$ and $b$). Alas, this is not an operation defined on the
whole ring $\left(\mathbb{Z}\left[z^{\pm}\right]\right)[[q]]$. For example, setting $q = x$ and $z = x$ in the sum
$\sum_{\ell \in \mathbb{N}} q^{\ell} z^{-\ell} \in \left(\mathbb{Z}\left[z^{\pm}\right]\right)[[q]]$ yields the nonsensical sum $\sum_{\ell \in \mathbb{N}} x^{\ell} x^{-\ell} = \sum_{\ell \in \mathbb{N}} 1$.
Thus, Theorem 4.3.1 is not a version of Jacobi’s triple product identity that
we can use for our purposes. Instead, let us interpret the identity (148) in a
different way: Instead of treating $q$ and $z$ as indeterminates, I will set them to
be powers of a single indeterminate $x$ (more precisely, scalar multiples of such
powers). This leads us to the following version of the identity:

Theorem 4.3.2 (Jacobi’s triple product identity, take 2). Let $a$ and $b$ be two
integers such that $a > 0$ and $a \ge |b|$. Let $u, v \in \mathbb{Q}$ be rational numbers with
$v \neq 0$. In the ring $\mathbb{Q}((x))$, set $q = u x^a$ and $z = v x^b$. Then,

$$\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right) \left(1 - q^{2n}\right) = \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell}.$$

Before we start proving this theorem, let us check that the infinite product on
its left hand side and the infinite sum on its right are well-defined:

• The infinite product is

\begin{align*}
&\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right) \left(1 - q^{2n}\right) \\
&= \prod_{n > 0} \left(1 + (u x^a)^{2n-1} \left(v x^b\right)\right) \left(1 + (u x^a)^{2n-1} \left(v x^b\right)^{-1}\right) \left(1 - (u x^a)^{2n}\right) \\
&= \prod_{n > 0} \left(1 + u^{2n-1} v\, x^{(2n-1)a + b}\right) \left(1 + u^{2n-1} v^{-1} x^{(2n-1)a - b}\right) \left(1 - u^{2n} x^{2na}\right).
\end{align*}

All factors in this product belong to the ring $\mathbb{Q}[[x]]$ (not just to $\mathbb{Q}((x))$),
since the exponents $(2n-1)a + b$ and $(2n-1)a - b$ and $2na$ are always
nonnegative for any $n > 0$ (indeed, for any $n > 0$, we have $(2n-1)a + b \ge |b| + b \ge 0$
(since $2n - 1 \ge 1$ and $a \ge |b|$), and similarly $(2n-1)a - b \ge |b| - b \ge 0$ and $2na > 0$
(since $a > 0$)). Moreover, this product is multipliable, because

– the number $(2n-1)a + b$ grows linearly when $n \to \infty$ (since $a > 0$);

– the number $(2n-1)a - b$ grows linearly when $n \to \infty$ (since $a > 0$);

– the number $2na$ grows linearly when $n \to \infty$ (since $a > 0$).

Thus, the infinite product is well-defined.
This argument also shows that the three subproducts $\prod_{n > 0} \left(1 + q^{2n-1} z\right)$ and
$\prod_{n > 0} \left(1 + q^{2n-1} z^{-1}\right)$ and $\prod_{n > 0} \left(1 - q^{2n}\right)$ are well-defined.

• The infinite sum is

$$\sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell} = \sum_{\ell \in \mathbb{Z}} (u x^a)^{\ell^2} \left(v x^b\right)^{\ell} = \sum_{\ell \in \mathbb{Z}} u^{\ell^2} v^{\ell} x^{a \ell^2 + b \ell}.$$

All addends in this sum belong to the ring $\mathbb{Q}[[x]]$ (not just to $\mathbb{Q}((x))$),
since the exponent $a \ell^2 + b \ell$ is always nonnegative for any $\ell \in \mathbb{Z}$ (indeed,
we have $a \ell^2 + b \ell \ge |b| \cdot |\ell|^2 - |b| \cdot |\ell| = |b| \cdot |\ell| \cdot (|\ell| - 1) \ge 0$
for any $\ell \in \mathbb{Z}$, using $a \ge |b|$ and $b\ell \ge -|b| \cdot |\ell|$). Moreover, this sum is summable, because $a \ell^2 + b \ell$ grows
quadratically when $\ell \to +\infty$ or $\ell \to -\infty$ (since $a > 0$).

4.3.2. Jacobi implies Euler

We will give a proof of Jacobi’s triple product identity that works equally for
both versions of it (Theorem 4.3.1 and Theorem 4.3.2). But first, let us see how
it yields Euler’s pentagonal number theorem as a particular case.

Proof of Theorem 4.2.2 using Theorem 4.3.2. Set $q = x^3$ and $z = -x$ in Theorem
4.3.2. (This means that we apply Theorem 4.3.2 to $a = 3$ and $b = 1$ and $u = 1$
and $v = -1$.) We get

$$\prod_{n > 0} \left(1 + \left(x^3\right)^{2n-1} (-x)\right) \left(1 + \left(x^3\right)^{2n-1} (-x)^{-1}\right) \left(1 - \left(x^3\right)^{2n}\right)
= \sum_{\ell \in \mathbb{Z}} \left(x^3\right)^{\ell^2} (-x)^{\ell}. \tag{149}$$

The left hand side of this equality simplifies as follows:

\begin{align*}
&\prod_{n > 0} \underbrace{\left(1 + \left(x^3\right)^{2n-1} (-x)\right)}_{= 1 - x^{3(2n-1)+1} = 1 - x^{6n-2}}
\underbrace{\left(1 + \left(x^3\right)^{2n-1} (-x)^{-1}\right)}_{= 1 - x^{3(2n-1)-1} = 1 - x^{6n-4}}
\underbrace{\left(1 - \left(x^3\right)^{2n}\right)}_{= 1 - x^{6n}} \\
&= \prod_{n > 0} \left(1 - \underbrace{x^{6n-2}}_{= \left(x^2\right)^{3n-1}}\right) \left(1 - \underbrace{x^{6n-4}}_{= \left(x^2\right)^{3n-2}}\right) \left(1 - \underbrace{x^{6n}}_{= \left(x^2\right)^{3n}}\right) \\
&= \prod_{k > 0} \left(1 - \left(x^2\right)^k\right),
\end{align*}

since each positive integer $k$ can be uniquely represented as $3n - 1$ or $3n - 2$ or
$3n$ for some positive integer $n$.
Comparing this with (149), we obtain

\begin{align*}
\prod_{k > 0} \left(1 - \left(x^2\right)^k\right)
&= \sum_{\ell \in \mathbb{Z}} \left(x^3\right)^{\ell^2} (-x)^{\ell}
= \sum_{\ell \in \mathbb{Z}} (-1)^{\ell} x^{3\ell^2 + \ell} \\
&= \sum_{\ell \in \mathbb{Z}} (-1)^{\ell} \left(x^2\right)^{w_{-\ell}}
\qquad \left(\begin{array}{c}\text{since } 3\ell^2 + \ell = (3\ell + 1)\ell = (3(-\ell) - 1)(-\ell) = 2 w_{-\ell}\\ \text{(because } w_{-\ell} \text{ is defined as } (3(-\ell)-1)(-\ell)/2\text{)}\end{array}\right) \\
&= \sum_{k \in \mathbb{Z}} (-1)^k \left(x^2\right)^{w_k} \tag{150}
\end{align*}

(here, we have substituted $k$ for $-\ell$ in the sum, using $(-1)^{-\ell} = (-1)^{\ell}$).


Now, let us “substitute $x$ for $x^2$” in this equality (see below for how this
works). As a result, we obtain

$$\prod_{k > 0} \left(1 - x^k\right) = \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}.$$

This is Euler’s pentagonal number theorem (Theorem 4.2.2).

What did I mean by “substituting $x$ for $x^2$”? I meant using the following
simple fact:

Lemma 4.3.3. Let $K$ be a commutative ring. Let $f$ and $g$ be two FPSs in $K[[x]]$.
Assume that $f\left(x^2\right) = g\left(x^2\right)$. Then, $f = g$.

Proof. This is easy: Write $f$ and $g$ as $f = \sum_{n \in \mathbb{N}} f_n x^n$ and $g = \sum_{n \in \mathbb{N}} g_n x^n$, where
$f_0, f_1, f_2, \ldots \in K$ and $g_0, g_1, g_2, \ldots \in K$. Then, $f\left(x^2\right) = \sum_{n \in \mathbb{N}} f_n \left(x^2\right)^n = \sum_{n \in \mathbb{N}} f_n x^{2n}$,
and similarly $g\left(x^2\right) = \sum_{n \in \mathbb{N}} g_n x^{2n}$. Thus, our assumption $f\left(x^2\right) = g\left(x^2\right)$ rewrites
as $\sum_{n \in \mathbb{N}} f_n x^{2n} = \sum_{n \in \mathbb{N}} g_n x^{2n}$. Comparing $x^{2n}$-coefficients in this equality, we conclude that $f_n = g_n$ for each $n \in \mathbb{N}$. Hence, $\sum_{n \in \mathbb{N}} f_n x^n = \sum_{n \in \mathbb{N}} g_n x^n$. In other
words, $f = g$ (since $f = \sum_{n \in \mathbb{N}} f_n x^n$ and $g = \sum_{n \in \mathbb{N}} g_n x^n$). This proves Lemma
4.3.3.
Lemma 4.3.3 justifies our “substituting $x$ for $x^2$” in the above proof; indeed,
we can apply Lemma 4.3.3 to $K = \mathbb{Q}$ and $f = \prod_{k > 0} \left(1 - x^k\right)$ and $g = \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}$
(because (150) says that these two FPSs $f$ and $g$ satisfy $f\left(x^2\right) = g\left(x^2\right)$), and consequently obtain $\prod_{k > 0} \left(1 - x^k\right) = \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}$. Thus, Theorem 4.2.2 is proved
using Theorem 4.3.2. It therefore remains to prove the latter.

4.3.3. Proof of Jacobi’s triple product identity


The following proof is due to Borcherds, and I have taken it from [Camero16,
§8.3] (note that [Loehr11, §11.2] gives essentially the same proof, albeit in a
different language).
Proof of Theorem 4.3.1 and Theorem 4.3.2. The following argument applies equally
to Theorem 4.3.1 and to Theorem 4.3.2. (The meanings of q and z differ between
these two theorems, but all the infinite sums and products considered below
are well-defined in either case.)
We will use a somewhat physics-inspired language:
• A level will mean a number of the form $p + \dfrac{1}{2}$ with $p \in \mathbb{Z}$. (Thus, there is
exactly one level midway between any two consecutive integers.)

• A state will mean a set of levels that contains

– all but finitely many negative levels, and

– only finitely many positive levels.

Here is an example of a state:

$$\left\{ -\frac{5}{2},\ -\frac{1}{2},\ \frac{1}{2},\ \frac{3}{2},\ \frac{7}{2},\ \frac{13}{2} \right\} \cup \left\{ \text{all levels} \le -\frac{9}{2} \right\}.$$
Visually, a state can be represented as a row of circles, one for each level (negative
levels to the left of 0, positive levels to the right; far enough to the left there are
only white circles, and far enough to the right only black circles), where

• A white (=hollow) circle means a level that is contained in the state
(you can think of it as an “electron”).

• A black (=filled) circle means a level that is not contained in the state
(think of it as a “hole”).

For any state $S$,

• we define the energy of $S$ to be

$$\operatorname{energy} S := \sum_{\substack{p > 0;\\ p \in S}} 2p \ -\ \sum_{\substack{p < 0;\\ p \notin S}} 2p \ \in \mathbb{N}$$

(where the summation index $p$ in the first sum runs over the finitely many
positive levels contained in $S$, while the summation index $p$ in the second
sum runs over the finitely many negative levels not contained in $S$).

• we define the particle number of $S$ to be

$$\operatorname{parnum} S := (\text{\# of levels } p > 0 \text{ such that } p \in S)
- (\text{\# of levels } p < 0 \text{ such that } p \notin S) \ \in \mathbb{Z}.$$

For instance, in the above example, we have

$$\operatorname{energy} S = 1 + 3 + 7 + 13 - (-3) - (-7) = 34$$

and

$$\operatorname{parnum} S = 4 - 2 = 2.$$

We want to prove the identity

$$\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right) \left(1 - q^{2n}\right) = \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell}.$$

We will first transform this identity into an equivalent one: Namely, we move
the $1 - q^{2n}$ factors from the left hand side to the right hand side by multiplying both sides with $\prod_{n > 0} \left(1 - q^{2n}\right)^{-1}$. Thus, we can rewrite our identity as

$$\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right)
= \left( \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell} \right) \prod_{n > 0} \left(1 - q^{2n}\right)^{-1}.$$

We will prove this new identity by showing that both of its sides are

$$\sum_{S \text{ is a state}} q^{\operatorname{energy} S} z^{\operatorname{parnum} S}.$$

Left hand side: We have

\begin{align*}
&\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right) \\
&= \left( \prod_{n > 0} \left(1 + q^{2n-1} z\right) \right) \left( \prod_{n > 0} \left(1 + q^{2n-1} z^{-1}\right) \right) \\
&= \left( \prod_{p \text{ is a positive level}} \left(1 + q^{2p} z\right) \right) \left( \prod_{p \text{ is a negative level}} \left(1 + q^{-2p} z^{-1}\right) \right) \\
&\qquad \left(\begin{array}{c}\text{here, we have substituted } p + \frac{1}{2} \text{ for } n \text{ in the first product,}\\ \text{and have substituted } -p + \frac{1}{2} \text{ for } n \text{ in the second product}\end{array}\right) \\
&= \left( \sum_{\substack{P \text{ is a finite set}\\ \text{of positive levels}}} \prod_{p \in P} q^{2p} z \right) \left( \sum_{\substack{N \text{ is a finite set}\\ \text{of negative levels}}} \prod_{p \in N} q^{-2p} z^{-1} \right) \\
&\qquad (\text{here, we have expanded both products using (125)}) \\
&= \sum_{\substack{P \text{ is a finite set}\\ \text{of positive levels}}} \sum_{\substack{N \text{ is a finite set}\\ \text{of negative levels}}} \underbrace{\left( \prod_{p \in P} q^{2p} z \right) \left( \prod_{p \in N} q^{-2p} z^{-1} \right)}_{= q^{2(\text{sum of elements of } P) - 2(\text{sum of elements of } N)}\, z^{|P| - |N|}} \\
&= \sum_{S \text{ is a state}} \underbrace{q^{2(\text{sum of positive levels in } S) - 2(\text{sum of negative levels not in } S)}}_{\substack{= q^{\operatorname{energy} S}\\ \text{(by the definition of energy } S)}}\ \underbrace{z^{(\text{\# of positive levels in } S) - (\text{\# of negative levels not in } S)}}_{\substack{= z^{\operatorname{parnum} S}\\ \text{(by the definition of parnum } S)}} \\
&\qquad \left(\begin{array}{c}\text{here, we have combined } P \text{ and } N \text{ into a single state } S := P \cup \overline{N},\\ \text{where } \overline{N} = \{\text{all negative levels}\} \setminus N\end{array}\right) \\
&= \sum_{S \text{ is a state}} q^{\operatorname{energy} S} z^{\operatorname{parnum} S}. \tag{151}
\end{align*}

Right hand side: Recall that

$$\prod_{n > 0} (1 - x^n)^{-1} = \prod_{n > 0} \frac{1}{1 - x^n} = \sum_{n \in \mathbb{N}} p(n)\, x^n
\qquad (\text{by Theorem 4.1.8})
\qquad = \sum_{\lambda \text{ is a partition}} x^{|\lambda|}$$

(because the sum $\sum_{\lambda \text{ is a partition}} x^{|\lambda|}$ contains each monomial $x^n$ precisely $p(n)$ times).
Substituting $q^2$ for $x$ in this equality, we find

$$\prod_{n > 0} \left(1 - \left(q^2\right)^n\right)^{-1} = \sum_{\lambda \text{ is a partition}} \left(q^2\right)^{|\lambda|}.$$

In other words,

$$\prod_{n > 0} \left(1 - q^{2n}\right)^{-1} = \sum_{\lambda \text{ is a partition}} q^{2|\lambda|}.$$

Multiplying both sides of this equality by $\sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell}$, we obtain

$$\left( \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell} \right) \prod_{n > 0} \left(1 - q^{2n}\right)^{-1}
= \left( \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell} \right) \left( \sum_{\lambda \text{ is a partition}} q^{2|\lambda|} \right)
= \sum_{\ell \in \mathbb{Z}}\ \sum_{\lambda \text{ is a partition}} q^{\ell^2 + 2|\lambda|} z^{\ell}. \tag{152}$$

We want to show that this equals $\sum_{S \text{ is a state}} q^{\operatorname{energy} S} z^{\operatorname{parnum} S}$. In order to do this,
we will find a bijection

$$\Phi_{\ell} : \{\text{partitions}\} \to \{\text{states with particle number } \ell\}$$

for each $\ell \in \mathbb{Z}$, and we will show that this bijection satisfies

$$\operatorname{energy}(\Phi_{\ell}(\lambda)) = \ell^2 + 2|\lambda|
\qquad \text{for each } \ell \in \mathbb{Z} \text{ and } \lambda \in \{\text{partitions}\}.$$

Let us do this. Fix $\ell \in \mathbb{Z}$. We define the state $G_{\ell}$ (called the “$\ell$-ground state”)
by

$$G_{\ell} := \{\text{all levels} < \ell\} = \left\{ \ell - \frac{1}{2},\ \ell - \frac{3}{2},\ \ell - \frac{5}{2},\ \ldots \right\}.$$

(In the circle picture, $G_{\ell}$ has white circles at all levels to the left of $\ell$ and black
circles at all levels to the right of $\ell$.)

If $\ell \ge 0$, then this state $G_{\ell}$ has energy

$$\operatorname{energy} G_{\ell} = 1 + 3 + 5 + \cdots + (2\ell - 1) = \ell^2$$

and particle number

$$\operatorname{parnum} G_{\ell} = \ell - 0 = \ell.$$

If $\ell < 0$, then it has energy

$$\operatorname{energy} G_{\ell} = -(-1) - (-3) - (-5) - \cdots - (2\ell + 1) = \ell^2$$

and particle number

$$\operatorname{parnum} G_{\ell} = 0 - (-\ell) = \ell.$$

Note that the answers are the same in both cases. Thus, whatever sign $\ell$ has,
we have

$$\operatorname{energy} G_{\ell} = \ell^2 \qquad \text{and} \qquad \operatorname{parnum} G_{\ell} = \ell.$$
If $S$ is a state, and if $p \in S$, and if $q$ is a positive integer such that $p + q \notin S$,
then we define a new state

$$\operatorname{jump}_{p,q} S := (S \setminus \{p\}) \cup \{p + q\}.$$

We say that this state $\operatorname{jump}_{p,q} S$ is obtained from $S$ by letting the electron at
level $p$ jump $q$ steps to the right. Note that $\operatorname{jump}_{p,q} S$ has the same particle
number as $S$ (check this!⁶⁰), whereas its energy is $2q$ higher than that of $S$
(check this!⁶¹). Thus, a jumping electron raises the energy but keeps the particle
number unchanged.
For any partition λ = (λ1 , λ2 , . . . , λk ), we define the state Eℓ,λ (called an
“excited state”) by starting with the ℓ-ground state Gℓ , and then successively
letting the k electrons at the highest levels (which are – from highest to lowest

60 There are three cases:


Case 1: We have p > 0 (that is, the particle jumps from a positive level to a positive level).
Case 2: We have p < 0 and p + q > 0 (that is, the particle jumps from a negative level to a
positive level).
Case 3: We have p + q < 0 (that is, the particle jumps from a negative level to a negative
level).
Each case is easy to check.
61 Again, the same three cases are to be considered.

– the levels $\ell - 1 + \frac{1}{2},\ \ell - 2 + \frac{1}{2},\ \ldots,\ \ell - k + \frac{1}{2}$) jump $\lambda_1, \lambda_2, \ldots, \lambda_k$ steps to
the right, respectively (starting with the rightmost electron). In other words,

\begin{align*}
E_{\ell,\lambda} &:= \operatorname{jump}_{\ell-k+1/2,\, \lambda_k} \left( \cdots \left( \operatorname{jump}_{\ell-2+1/2,\, \lambda_2} \left( \operatorname{jump}_{\ell-1+1/2,\, \lambda_1} (G_{\ell}) \right) \right) \right) \\
&= \{\text{all levels} < \ell - k\} \cup \left\{ \ell - i + \frac{1}{2} + \lambda_i \ \middle|\ i \in \{1, 2, \ldots, k\} \right\}.
\end{align*}

(Check that these jumps are well-defined – i.e., that each electron jumps to an
unoccupied level.)
[Example: Let $\ell = 3$ and $k = 4$ and $\lambda = (\lambda_1, \lambda_2, \lambda_3, \lambda_4) = (4, 2, 2, 1)$. Then,

$$E_{\ell,\lambda} = \{\text{all levels} < -1\} \cup \left\{ \frac{1}{2},\ \frac{5}{2},\ \frac{7}{2},\ \frac{13}{2} \right\}.$$

Here is a picture of this state $E_{\ell,\lambda}$ and how it is constructed by a sequence of
electron jumps: starting from $G_{\ell}$ (top row of the picture), one applies
$\operatorname{jump}_{\ell-1+1/2,\, \lambda_1}$, then $\operatorname{jump}_{\ell-2+1/2,\, \lambda_2}$, then $\operatorname{jump}_{\ell-3+1/2,\, \lambda_3}$, then
$\operatorname{jump}_{\ell-4+1/2,\, \lambda_4}$, arriving at $E_{\ell,\lambda}$ (bottom row). [Picture not reproduced here.]

Note how the order of the electrons after the jumps is the same as before – i.e.,
the electron in the rightmost position before the jumps is still the rightmost one
after the jumps, etc.]
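The jump construction is easy to simulate. Here is an illustrative script (not part of the notes) that builds $E_{\ell,\lambda}$ for the example above by performing the jumps on a truncated ground state, and checks both the closed-form description and the energy formula $\operatorname{energy} E_{\ell,\lambda} = \ell^2 + 2|\lambda|$:

```python
from fractions import Fraction

half = Fraction(1, 2)

def excited_state(ell, lam, depth=30):
    """Simulate the electron jumps defining E_{ell,lam}, keeping only the
    `depth` highest levels of the ground state G_ell (deeper levels never move)."""
    state = {ell - i + half for i in range(1, depth + 1)}  # truncation of {all levels < ell}
    for i, part in enumerate(lam, start=1):
        p = ell - i + half                           # the i-th highest electron of G_ell
        assert p in state and p + part not in state  # the jump is well-defined
        state = (state - {p}) | {p + part}
    return state

ell, lam = 3, (4, 2, 2, 1)
S = excited_state(ell, lam)

# Closed-form description of the occupied levels above ell - k:
k = len(lam)
assert {ell - i + half + lam[i - 1] for i in range(1, k + 1)} <= S

# energy(E_{ell,lam}) = ell^2 + 2|lam|, computed within the truncation:
energy = sum(2 * p for p in S if p > 0)
p = -half
while p >= ell - 30 + half:       # negative levels kept by the truncation
    if p not in S:                # a hole at negative level p contributes -2p
        energy -= 2 * p
    p -= 1
assert energy == ell ** 2 + 2 * sum(lam)
```

For this example the energy comes out as $9 + 18 = 27$, in agreement with the general claim proved next.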
Recall that the $\ell$-ground state $G_{\ell}$ has energy $\ell^2$ and particle number $\ell$. The
state $E_{\ell,\lambda}$ that we have just defined is obtained from this $\ell$-ground state $G_{\ell}$ by $k$
jumps, which are jumps by $\lambda_1, \lambda_2, \ldots, \lambda_k$ steps respectively. Recall that a jump
by $q$ steps raises the energy by $2q$ but keeps the particle number unchanged.
Thus, the “excited state” $E_{\ell,\lambda}$ has energy $\ell^2 + \underbrace{2\lambda_1 + 2\lambda_2 + \cdots + 2\lambda_k}_{= 2|\lambda|} = \ell^2 + 2|\lambda|$
and particle number $\ell$. Furthermore, every state with particle number $\ell$ can be
written as $E_{\ell,\lambda}$ for a unique partition $\lambda$ (check this!⁶²).

⁶² Here is how to obtain this $\lambda$: For each $i \in \{1, 2, 3, \ldots\}$, let $u_i$ be the $i$-th largest level in the

Thus, we obtain a bijection

$$\Phi_{\ell} : \{\text{partitions}\} \to \{\text{states with particle number } \ell\},
\qquad \lambda \mapsto E_{\ell,\lambda}.$$

This bijection satisfies

$$\operatorname{energy}(\Phi_{\ell}(\lambda)) = \operatorname{energy} E_{\ell,\lambda} = \ell^2 + 2|\lambda|
\qquad \text{for every partition } \lambda.$$

Hence,

$$\sum_{\lambda \text{ is a partition}} q^{\ell^2 + 2|\lambda|}
= \sum_{\lambda \text{ is a partition}} q^{\operatorname{energy}(\Phi_{\ell}(\lambda))}
= \sum_{\substack{S \text{ is a state with}\\ \text{particle number } \ell}} q^{\operatorname{energy} S} \tag{153}$$

(here, we have substituted $S$ for $\Phi_{\ell}(\lambda)$ in the sum, since the map $\Phi_{\ell}$ is a bijection).
Forget that we fixed $\ell$. We thus have proved (153) for each $\ell \in \mathbb{Z}$.
Now, (152) becomes

\begin{align*}
\left( \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell} \right) \prod_{n > 0} \left(1 - q^{2n}\right)^{-1}
&= \sum_{\ell \in \mathbb{Z}} \underbrace{\left( \sum_{\lambda \text{ is a partition}} q^{\ell^2 + 2|\lambda|} \right)}_{\substack{= \sum\limits_{\substack{S \text{ is a state with}\\ \text{particle number } \ell}} q^{\operatorname{energy} S}\\ \text{(by (153))}}} z^{\ell} \\
&= \sum_{\ell \in \mathbb{Z}}\ \sum_{\substack{S \text{ is a state with}\\ \text{particle number } \ell}} q^{\operatorname{energy} S} \underbrace{z^{\ell}}_{\substack{= z^{\operatorname{parnum} S}\\ \text{(since } \ell = \operatorname{parnum} S)}} \\
&= \sum_{S \text{ is a state}} q^{\operatorname{energy} S} z^{\operatorname{parnum} S}.
\end{align*}

state, and let $\lambda_i = u_i - \ell + i - \frac{1}{2}$. Then, $(\lambda_1, \lambda_2, \lambda_3, \ldots)$ is a weakly decreasing sequence
of nonnegative integers, and all but finitely many of its entries are 0 (since the state has
particle number $\ell$, so that it is not hard to see that $u_i = \ell - i + \frac{1}{2}$ for any sufficiently
large $i$). Removing all 0s from this sequence $(\lambda_1, \lambda_2, \lambda_3, \ldots)$ thus results in a finite tuple
$(\lambda_1, \lambda_2, \ldots, \lambda_k)$, which is precisely the partition $\lambda$ whose corresponding $E_{\ell,\lambda}$ is our state.

Comparing this with (151), we obtain

$$\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right)
= \left( \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell} \right) \prod_{n > 0} \left(1 - q^{2n}\right)^{-1}.$$

Multiplying both sides of this identity with $\prod_{n > 0} \left(1 - q^{2n}\right)$, we find

$$\prod_{n > 0} \left(1 + q^{2n-1} z\right) \left(1 + q^{2n-1} z^{-1}\right) \left(1 - q^{2n}\right) = \sum_{\ell \in \mathbb{Z}} q^{\ell^2} z^{\ell}.$$

This proves Jacobi’s Triple Product Identity (Theorem 4.3.1 and Theorem 4.3.2).

Other proofs of Jacobi’s Triple Product Identity can be found in [Aigner07,
§3.4], [Wagner08, Theorem 10.2] and [Hirsch17, §1.3 and §1.4].
We note that proving Theorem 4.2.2 using Theorem 4.3.2 can be viewed as a
kind of overkill; there are more direct proofs of Theorem 4.2.2 as well (see, e.g.,
[Zabroc03] or [Koch16, §10] or [18f-mt3s, Exercise 5] or [Bell06, §3]).
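Like the pentagonal number theorem, the triple product identity can be verified to any order by machine. The following illustrative script (not part of the notes) represents both sides as Laurent polynomials in $q, z$ (dictionaries keyed by the pair of exponents) and compares them modulo $q^{13}$; since the $q$-exponents in all factors are nonnegative, factors with $q$-degree above the truncation bound may simply be omitted:

```python
from collections import defaultdict

QMAX = 12  # truncate: keep only terms with q-degree <= QMAX

def mul(f, g):
    """Multiply Laurent polynomials in q, z stored as {(deg_q, deg_z): coeff},
    discarding q-degrees above QMAX."""
    h = defaultdict(int)
    for (a, b), c in f.items():
        for (a2, b2), c2 in g.items():
            if a + a2 <= QMAX:
                h[(a + a2, b + b2)] += c * c2
    return {key: v for key, v in h.items() if v}

# Left side: factors with q-degree > QMAX are congruent to 1 and are skipped.
lhs = {(0, 0): 1}
n = 1
while 2 * n - 1 <= QMAX:
    lhs = mul(lhs, {(0, 0): 1, (2 * n - 1, 1): 1})    # 1 + q^{2n-1} z
    lhs = mul(lhs, {(0, 0): 1, (2 * n - 1, -1): 1})   # 1 + q^{2n-1} z^{-1}
    if 2 * n <= QMAX:
        lhs = mul(lhs, {(0, 0): 1, (2 * n, 0): -1})   # 1 - q^{2n}
    n += 1

# Right side: sum of q^{l^2} z^l with l^2 <= QMAX
rhs = {(l * l, l): 1 for l in range(-QMAX, QMAX + 1) if l * l <= QMAX}

assert lhs == rhs
```

Already at order $q^2$ one sees the mechanism at work: the term $q^2$ coming from $(qz)(qz^{-1})$ cancels against $-q^2$ from the factor $1 - q^2$.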

4.3.4. Application: A recursion for the sum of divisors


Now that Theorem 4.2.2 is proven, we use the occasion to spotlight a truly
unexpected application of it: Euler’s recursion for the sum of divisors. We state
it in the following form:

Theorem 4.3.4. For any positive integer $n$, let $\sigma(n)$ denote the sum of all
positive divisors of $n$. (For example, $\sigma(6) = 1 + 2 + 3 + 6 = 12$ and $\sigma(7) = 1 + 7 = 8$.)
Then, for each positive integer $n$, we have

$$\sum_{\substack{k \in \mathbb{Z};\\ w_k < n}} (-1)^k \sigma(n - w_k)
= \begin{cases} (-1)^{k-1}\, n, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z};\\ 0, & \text{if not.} \end{cases}$$

(Here, $w_k$ is the $k$-th pentagonal number, defined in Definition 4.2.1.)

The left hand side of the equality in Theorem 4.3.4 looks as follows:

$$\sigma(n) - \sigma(n-1) - \sigma(n-2) + \sigma(n-5) + \sigma(n-7) - \sigma(n-12) - \sigma(n-15) \pm \cdots$$

(the sum goes only as far as the arguments remain positive integers, so it is
a finite sum). Thus, by solving this equality for $\sigma(n)$, we obtain a recursive
formula for $\sigma(n)$. (In this form, Theorem 4.3.4 appears in many places, such as
[Ness61, §23] and [Johnso20, Theorem 37].) This does not give an efficient algorithm for computing $\sigma(n)$ (it is much slower than the formula in [19s, Exercise

2.18.1 (b)], even though the latter requires computing the prime factorization of
n), but its beauty and element of surprise (why would you expect the pentago-
nal numbers to have anything to do with sums of divisors?) make up for this
practical uselessness. Even more surprisingly, it can be proved using Euler’s
pentagonal number theorem, even though it has seemingly nothing to do with
partitions!
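Before proving the theorem, one can of course convince oneself of it numerically. Here is an illustrative check (not part of the notes):

```python
def sigma(n):
    """sigma(n) = sum of all positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def w(k):
    """The k-th pentagonal number w_k = (3k - 1) k / 2."""
    return (3 * k - 1) * k // 2

for n in range(1, 41):
    # left hand side: sum over all k in Z with w_k < n (|k| <= n suffices)
    lhs = sum((1 if k % 2 == 0 else -1) * sigma(n - w(k))
              for k in range(-n, n + 1) if w(k) < n)
    # right hand side: (-1)^(k-1) * n if n = w_k for some (unique) k, else 0
    hits = [k for k in range(-n, n + 1) if k != 0 and w(k) == n]
    rhs = (n if (hits[0] - 1) % 2 == 0 else -n) if hits else 0
    assert lhs == rhs
```

For instance, $n = 3$ is not a pentagonal number, and indeed $\sigma(3) - \sigma(2) - \sigma(1) = 4 - 3 - 1 = 0$.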
Proof of Theorem 4.3.4 (sketched). First, let us show that the right hand side in
Theorem 4.3.4 is well-defined. Indeed, it is easy to see that

$$w_0 < w_1 < w_{-1} < w_2 < w_{-2} < w_3 < w_{-3} < \cdots;$$

thus, the pentagonal numbers $w_k$ for all $k \in \mathbb{Z}$ are distinct. Hence, if a positive integer $n$ satisfies $n = w_k$ for some $k \in \mathbb{Z}$, then this latter $k$ is uniquely
determined by $n$, and therefore the expression

$$\begin{cases} (-1)^{k-1}\, n, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z};\\ 0, & \text{if not} \end{cases}$$

is well-defined. In other words, the right hand side in Theorem 4.3.4 is well-defined.
Now, define the two FPSs $P$ and $S$ as in our above proof of Theorem 4.1.20.
In that very proof, we have shown that these FPSs satisfy

$$xP' = SP. \tag{154}$$

Next, we define a further FPS

$$Q := \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k} \in \mathbb{Z}[[x]].$$

Multiplying the equalities

$$P = \sum_{n \in \mathbb{N}} p(n)\, x^n = \prod_{k=1}^{\infty} \frac{1}{1 - x^k} \qquad (\text{by Theorem 4.1.8})$$

and

$$Q = \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k} = \prod_{k=1}^{\infty} \left(1 - x^k\right) \qquad (\text{by Theorem 4.2.2}),$$

we obtain

$$PQ = \left( \prod_{k=1}^{\infty} \frac{1}{1 - x^k} \right) \left( \prod_{k=1}^{\infty} \left(1 - x^k\right) \right)
= \prod_{k=1}^{\infty} \underbrace{\left( \frac{1}{1 - x^k} \cdot \left(1 - x^k\right) \right)}_{=1} = 1.$$

Combining this equality with (154), we can easily see that $xQ' = -QS$. Indeed, here are two (essentially equivalent) ways to show this:

• From $PQ = 1$, we obtain $Q = P^{-1}$. But $P \in \mathbb{Z}[[x]]_1$ (since the constant
coefficient of $P$ is $p(0) = 1$). Hence, Corollary 3.7.16 (applied to $f = P$)
yields $\operatorname{loder}\left(P^{-1}\right) = -\operatorname{loder} P$. In view of $P^{-1} = Q$, this rewrites as
$\operatorname{loder} Q = -\operatorname{loder} P$. But the definition of a logarithmic derivative yields
$\operatorname{loder} Q = \dfrac{Q'}{Q}$ and $\operatorname{loder} P = \dfrac{P'}{P}$. Hence,

$$\frac{Q'}{Q} = \operatorname{loder} Q = -\operatorname{loder} P = -\frac{P'}{P}.$$

Multiplying this equality by $x$, we find

$$\frac{xQ'}{Q} = -\frac{xP'}{P} = -\frac{SP}{P} \qquad (\text{by (154)}) \qquad = -S,$$

so that $xQ' = -QS$.

• A less conceptual but slicker way to draw the same conclusion is the
following: Taking derivatives in the equality $PQ = 1$, we find $(PQ)' = 1' = 0$,
so that $0 = (PQ)' = P'Q + PQ'$ (by Theorem 3.6.2 (d)). Thus, $PQ' = -P'Q$.
Multiplying this equality by $x$, we find $PQ' \cdot x = -P'Q \cdot x = -xP' \cdot Q = -SP \cdot Q$
(by (154)) $= -S$ (since $PQ = 1$). Multiplying this further by $Q$, we obtain
$PQ' \cdot xQ = -SQ = -QS$. In view of $PQ' \cdot xQ = xQ' \cdot PQ = xQ'$ (since $PQ = 1$),
we can rewrite this as $xQ' = -QS$.

Either way, we now know that xQ′ = −QS. However, from $Q = \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}$, we obtain

$$Q' = \left(\sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}\right)' = \sum_{k \in \mathbb{Z}} (-1)^k \left(x^{w_k}\right)' = \sum_{k \in \mathbb{Z}} (-1)^k w_k x^{w_k - 1}$$

(where we understand the expression $w_k x^{w_k - 1}$ to mean 0 if $w_k = 0$).

Upon multiplication by x, this becomes

$$xQ' = x \sum_{k \in \mathbb{Z}} (-1)^k w_k x^{w_k - 1} = \sum_{k \in \mathbb{Z}} (-1)^k w_k \underbrace{x \cdot x^{w_k - 1}}_{= x^{w_k}} = \sum_{k \in \mathbb{Z}} (-1)^k w_k x^{w_k}.$$

Hence,

$$\sum_{k \in \mathbb{Z}} (-1)^k w_k x^{w_k} = xQ' = -\underbrace{Q}_{= \sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}} \underbrace{S}_{= \sum_{i > 0} \sigma(i)\, x^i} = -\left(\sum_{k \in \mathbb{Z}} (-1)^k x^{w_k}\right) \left(\sum_{i > 0} \sigma(i)\, x^i\right)$$
$$= \sum_{k \in \mathbb{Z}} \sum_{i > 0} \underbrace{\left(-(-1)^k\right)}_{= (-1)^{k-1}} \sigma(i)\, x^{w_k + i} = \sum_{k \in \mathbb{Z}} \sum_{m > w_k} (-1)^{k-1} \sigma(m - w_k)\, \underbrace{x^{w_k + (m - w_k)}}_{= x^m}$$
$$\left(\text{here, we have substituted } m - w_k \text{ for } i \text{ in the inner sum}\right)$$
$$= \sum_{m > 0} \sum_{\substack{k \in \mathbb{Z}; \\ m > w_k}} (-1)^{k-1} \sigma(m - w_k)\, x^m. \tag{155}$$

Now, let n be a positive integer. Let us compare the coefficients of x^n on the left and right hand sides of (155). On the left hand side, the coefficient of x^n is

$$\begin{cases} (-1)^k w_k, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z}; \\ 0, & \text{if not} \end{cases}$$

(since the pentagonal numbers $w_k$ for all k ∈ Z are distinct, and thus the different addends on the left hand side of (155) contribute to different monomials). On the right hand side, the coefficient of x^n is obviously

$$\sum_{\substack{k \in \mathbb{Z}; \\ n > w_k}} (-1)^{k-1} \sigma(n - w_k).$$

Since these two coefficients are equal (because (155) is an identity), we thus conclude that

$$\begin{cases} (-1)^k w_k, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z}; \\ 0, & \text{if not} \end{cases} = \sum_{\substack{k \in \mathbb{Z}; \\ n > w_k}} \underbrace{(-1)^{k-1}}_{= -(-1)^k} \sigma(n - w_k) = -\sum_{\substack{k \in \mathbb{Z}; \\ w_k < n}} (-1)^k \sigma(n - w_k).$$

Thus,

$$\sum_{\substack{k \in \mathbb{Z}; \\ w_k < n}} (-1)^k \sigma(n - w_k) = -\begin{cases} (-1)^k w_k, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z}; \\ 0, & \text{if not} \end{cases}$$
$$= \begin{cases} -(-1)^k w_k, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z}; \\ 0, & \text{if not} \end{cases} = \begin{cases} (-1)^{k-1} w_k, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z}; \\ 0, & \text{if not} \end{cases} \qquad \left(\text{since } -(-1)^k = (-1)^{k-1}\right)$$
$$= \begin{cases} (-1)^{k-1} n, & \text{if } n = w_k \text{ for some } k \in \mathbb{Z}; \\ 0, & \text{if not} \end{cases}$$

(because if n = w_k for some k ∈ Z, then w_k = n). This proves Theorem 4.3.4.
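As a sanity check (not part of the original notes), here is a short Python sketch that verifies this pentagonal recurrence for the sum-of-divisors function σ for small n, assuming w_k = k(3k − 1)/2:

```python
# Numerical check of Theorem 4.3.4: for every positive integer n,
#   sum over k in Z with w_k < n of (-1)^k * sigma(n - w_k)
# equals (-1)^(k-1) * n if n = w_k for some k, and 0 otherwise.
# Here w_k = k(3k-1)/2 and sigma(m) is the sum of positive divisors of m.

def sigma(m):
    return sum(d for d in range(1, m + 1) if m % d == 0)

def w(k):
    return k * (3 * k - 1) // 2

def sign(j):  # (-1)^j, valid for negative j as well
    return 1 if j % 2 == 0 else -1

K = range(-20, 21)  # enough values of k to cover all w_k < 60
for n in range(1, 60):
    lhs = sum(sign(k) * sigma(n - w(k)) for k in K if w(k) < n)
    rhs = next((sign(k - 1) * n for k in K if n == w(k)), 0)
    assert lhs == rhs, (n, lhs, rhs)
print("Theorem 4.3.4 checked for n = 1, ..., 59")
```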

4.4. q-binomial coefficients

Next, we shall study q-binomial coefficients (also known as Gaussian binomial coefficients, due to their origins in Gauss’s number-theoretical research [Gauss08, §5]). While we will define them as generating functions for certain kinds of partitions, they are sufficiently elementary to have relevance to various other subjects. We will scratch the surface; more can be found in [KacChe02, Chapters 5–7], [KliSch97, Chapter 2], [Wagner08, Chapter 5], [Wagner20, §11.3–11.11], [Stanle11, spread across the text], [GouJac83, §2.6], [Johnso20] and other sources. (The book [Johnso20] is particularly recommended as a leisurely introduction to q-binomial coefficients and related power series.)

4.4.1. Motivation
For any n ∈ N, we have
p (n) = (# of partitions of n) .

For any n, k ∈ N, we have

pk (n) = (# of partitions of n into k parts)


= (# of partitions of n with largest part k) (by Theorem 4.1.15) .

Thus, for any n, k ∈ N, we have

p0 (n) + p1 (n) + · · · + pk (n) = (# of partitions of n into at most k parts)


= (# of partitions of n with largest part ≤ k)

(by Corollary 4.1.18).


So far, so good. But how to count partitions of n that both have a fixed # of
parts (say, k parts) and a fixed largest part (say, largest part ℓ) ?
Let us first drop the size requirement – i.e., we replace “partitions of n” by
just “partitions”.
For example, how many partitions have 4 parts and largest part 6 ?
As in the proof of Theorem 4.1.15, let us draw the Young diagram of such a partition: For example, the partition (6, 3, 3, 2) has Young diagram

[Figure: the Young diagram of (6, 3, 3, 2)]

Consider the lower boundary of this Young diagram – i.e., the “irregular” southeastern border between what is in the diagram and what is outside of it. Let me mark it in thick red:

[Figure: the same Young diagram with its lower boundary marked]

This lower boundary can be viewed as a lattice path from the point (0, 0) to the
point (6, 4) (where we are using Cartesian coordinates to label the intersections
of grid lines, so that the southwesternmost point in our diagram is (0, 0); note
that this is completely unrelated to our labeling of cells used in defining the

Young diagram!63 ). This lattice path consists of east-steps (i.e., steps (i, j) →
(i + 1, j)) and north-steps (i.e., steps (i, j) → (i, j + 1)); moreover, it begins with
an east-step (since otherwise, our partition would have fewer than 4 parts)
and ends with a north-step (since otherwise, our partition would have largest
part < 6). Moreover, the Young diagram (and thus the partition) is uniquely
determined by this lattice path, since its cells are precisely the cells “northwest”
of this lattice path. Conversely, any lattice path from (0, 0) to (6, 4) that consists
of east-steps and north-steps and begins with an east step and ends with a
north-step uniquely determines a Young diagram and therefore a partition.
Therefore, in order to count the partitions that have 4 parts and largest part 6,
we only need to count such lattice paths.
To count them, we notice that any such lattice path has precisely 10 steps
(since any step increases the sum of the coordinates by 1; but this sum must
increase from 0 + 0 = 0 to 6 + 4 = 10). The first and the last steps are predetermined; it remains to decide which of the remaining 8 steps are north-steps. The # of ways to do this is $\binom{8}{3}$, because we want precisely 3 of our 8 non-predetermined steps to be north-steps (in order to end up at (6, 4) rather than some other point).
As a consequence of this all, we find

$$(\text{\# of partitions with 4 parts and largest part 6}) = \binom{8}{3}.$$

More generally, by the same argument, we obtain the following:

Proposition 4.4.1. For any positive integers k and ℓ, we have

$$(\text{\# of partitions with } k \text{ parts and largest part } \ell) = \binom{k+\ell-2}{k-1}.$$

63 For additional clarity, here are the Cartesian coordinates of all grid points on our lattice path, in order: (0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (3, 3), (4, 3), (5, 3), (6, 3), (6, 4).

Note two things:

• This is a finite number, even without fixing the size of the partition. This
is not surprising, since you have only finitely many parts and only finitely
many options for each part.
• The number is symmetric in k and ℓ. This, too, is not surprising, because
conjugation (as defined in the proof of Theorem 4.1.15) gives a bijection
from {partitions with k parts and largest part ℓ}
to {partitions with ℓ parts and largest part k} .
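Proposition 4.4.1 is easy to test by brute force; here is a quick Python sketch (not part of the original notes) for small k and ℓ:

```python
# Count partitions with exactly k parts and largest part exactly ell by brute
# force, and compare with binomial(k + ell - 2, k - 1) as in Proposition 4.4.1.

from itertools import combinations_with_replacement
from math import comb

def count_partitions(k, ell):
    # weakly increasing k-tuples of parts from {1, ..., ell} whose largest
    # entry is exactly ell; each such tuple encodes one partition
    return sum(1 for parts in combinations_with_replacement(range(1, ell + 1), k)
               if parts[-1] == ell)

for k in range(1, 7):
    for ell in range(1, 7):
        assert count_partitions(k, ell) == comb(k + ell - 2, k - 1), (k, ell)
print(count_partitions(4, 6))  # 56 = binomial(8, 3), as computed above
```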

Now, let us integrate the size of the partition back into our count – i.e., let us
try to count the partitions of a given n ∈ N with k parts and largest part ℓ. No
simple formula (like Proposition 4.4.1) exists for this number any more, so we
switch our focus to the generating function of such numbers (for fixed k and ℓ).
In other words, we try to compute the FPS

$$\sum_{n \in \mathbb{N}} (\text{\# of partitions of } n \text{ with } k \text{ parts and largest part } \ell)\, x^n = \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part } \ell \\ \text{and length } k}} x^{|\lambda|}.$$

For reasons of convenience, history and simplicity, we modify this problem slightly (without changing its essence). To wit,

• we rename ℓ as n − k (note that n will no longer stand for the size of the
partition);
• we replace “largest part n − k and length k” by “largest part ≤ n − k and
length ≤ k” (this changes the results of our counts, but we can easily re-
cover the answer to the original question from an answer to the new ques-
tion; e.g., in order to count the length-k partitions, it suffices to subtract
the # of length-(≤ k − 1)-partitions from the # of length-(≤ k) partitions);
• we rename the indeterminate x as q.

4.4.2. Definition

Convention 4.4.2. In this section, we will mostly be using FPSs in the indeterminate q. That is, we call the indeterminate q rather than x. Thus, e.g., our formula

$$\prod_{n>0} (1-x^n)^{-1} = \prod_{n>0} \frac{1}{1-x^n} = \sum_{n \in \mathbb{N}} p(n)\, x^n = \sum_{\lambda \text{ is a partition}} x^{|\lambda|}$$

becomes

$$\prod_{n>0} (1-q^n)^{-1} = \prod_{n>0} \frac{1}{1-q^n} = \sum_{n \in \mathbb{N}} p(n)\, q^n = \sum_{\lambda \text{ is a partition}} q^{|\lambda|}.$$

The ring of FPSs in the indeterminate q over a commutative ring K will be denoted by K[[q]]. The ring of polynomials in the indeterminate q over K will be denoted by K[q].

Definition 4.4.3. Let n ∈ N and k ∈ N.

(a) The q-binomial coefficient (or Gaussian binomial coefficient) $\binom{n}{k}_q$ is defined to be the polynomial

$$\binom{n}{k}_q := \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length} \le k}} q^{|\lambda|} \in \mathbb{Z}[q].$$

This is also denoted by $\binom{n}{k}$ (but this notation has other meanings, too, and suppresses q).

(b) If a is any element of a ring A, then we set $\binom{n}{k}_a$ to be the result of substituting a for q in $\binom{n}{k}_q$; explicitly,

$$\binom{n}{k}_a = \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length} \le k}} a^{|\lambda|} \in A.$$

 
Remark 4.4.4. The $\binom{n}{k}_q$ we defined in Definition 4.4.3 (a) is really a polynomial, not merely a FPS, because (for any given n and k) there are only finitely many partitions with largest part ≤ n − k and length ≤ k.
 
Remark 4.4.5. The notation $\binom{n}{k}_q$ (and the name “q-binomial coefficient”) suggests a similarity to the usual binomial coefficient $\binom{n}{k}$. And indeed, we will soon see that $\binom{n}{k}_1 = \binom{n}{k}$.
Note, however, that $\binom{n}{k}_q$ is only defined for n, k ∈ N (unlike $\binom{n}{k}$, which we defined for arbitrary n, k ∈ C). It is possible to extend it to negative integers n, but this will result in a Laurent polynomial. (See Exercise A.3.4.4 for this extension.)

Example 4.4.6. We have

$$\binom{3}{2}_q = \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le 1 \\ \text{and length} \le 2}} q^{|\lambda|} = q^{|(1,1)|} + q^{|(1)|} + q^{|()|}$$

(since the partitions with largest part ≤ 1 and length ≤ 2 are (1, 1), (1) and ())

$$= q^2 + q^1 + q^0 = q^2 + q + 1$$

and

$$\binom{4}{2}_q = \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le 2 \\ \text{and length} \le 2}} q^{|\lambda|} = q^{|(2,2)|} + q^{|(2,1)|} + q^{|(2)|} + q^{|(1,1)|} + q^{|(1)|} + q^{|()|} = q^4 + q^3 + q^2 + q^2 + q^1 + q^0 = q^4 + q^3 + 2q^2 + q + 1.$$
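Definition 4.4.3 (a) can be checked directly on a computer. The following Python sketch (not from the notes) computes the coefficient list of $\binom{n}{k}_q$ by enumerating the relevant partitions:

```python
# Compute [n choose k]_q from Definition 4.4.3 (a): the sum of q^{|lambda|}
# over partitions with largest part <= n-k and length <= k. We enumerate these
# partitions as weakly increasing k-tuples over {0, ..., n-k} (a zero entry
# stands for an omitted part). Result: coeffs[j] = coefficient of q^j.

from itertools import combinations_with_replacement

def q_binomial(n, k):
    if k > n:
        return [0]
    coeffs = [0] * (k * (n - k) + 1)
    for tup in combinations_with_replacement(range(n - k + 1), k):
        coeffs[sum(tup)] += 1
    return coeffs

print(q_binomial(3, 2))  # [1, 1, 1]        i.e. 1 + q + q^2
print(q_binomial(4, 2))  # [1, 1, 2, 1, 1]  i.e. 1 + q + 2q^2 + q^3 + q^4
```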

4.4.3. Basic properties


Let us show two slightly different (but equivalent) ways to express q-binomial
coefficients:

Proposition 4.4.7. Let n, k ∈ N.

(a) We have
$$\binom{n}{k}_q = \sum_{0 \le i_1 \le i_2 \le \cdots \le i_k \le n-k} q^{i_1 + i_2 + \cdots + i_k}.$$
Here, the sum ranges over all weakly increasing k-tuples (i1, i2, . . . , ik) ∈ {0, 1, . . . , n − k}^k. If k > n, then this is an empty sum (since the set {0, 1, . . . , n − k} is empty in this case, and thus its k-th power {0, 1, . . . , n − k}^k is also empty because k > n ≥ 0).

(b) Set sum S = ∑_{s∈S} s for any finite set S of integers. (For example, sum {2, 4, 5} = 2 + 4 + 5 = 11.) Then, we have
$$\binom{n}{k}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S| = k}} q^{\operatorname{sum} S - (1+2+\cdots+k)}.$$

(c) We have
$$\binom{n}{k}_1 = \binom{n}{k}.$$

 
Example 4.4.8. For example, let us compute $\binom{5}{2}_q$ using Proposition 4.4.7 (b). Namely, applying Proposition 4.4.7 (b) to n = 5 and k = 2, we obtain

$$\binom{5}{2}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,5\}; \\ |S|=2}} q^{\operatorname{sum} S - (1+2)}$$
$$= q^{(1+2)-(1+2)} + q^{(1+3)-(1+2)} + q^{(1+4)-(1+2)} + q^{(1+5)-(1+2)} + q^{(2+3)-(1+2)} + q^{(2+4)-(1+2)} + q^{(2+5)-(1+2)} + q^{(3+4)-(1+2)} + q^{(3+5)-(1+2)} + q^{(4+5)-(1+2)}$$

(since the 2-element subsets of {1, 2, . . . , 5} are {1, 2}, {1, 3}, {1, 4}, {1, 5}, {2, 3}, {2, 4}, {2, 5}, {3, 4}, {3, 5}, {4, 5})

$$= q^0 + q^1 + q^2 + q^3 + q^2 + q^3 + q^4 + q^4 + q^5 + q^6 = 1 + q + 2q^2 + 2q^3 + 2q^4 + q^5 + q^6.$$
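The subset formula of Proposition 4.4.7 (b) is equally easy to test; this short Python sketch (not from the notes) reproduces the computation of Example 4.4.8:

```python
# [n choose k]_q via Proposition 4.4.7 (b) (assumes k <= n): the coefficient
# of q^j counts k-element subsets S of {1, ..., n} with sum(S) - (1+...+k) = j.

from itertools import combinations

def q_binomial_subsets(n, k):
    coeffs = [0] * (k * (n - k) + 1)
    for S in combinations(range(1, n + 1), k):
        coeffs[sum(S) - k * (k + 1) // 2] += 1
    return coeffs

print(q_binomial_subsets(5, 2))  # [1, 1, 2, 2, 2, 1, 1]
```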

 
Proof of Proposition 4.4.7. (a) The definition of $\binom{n}{k}_q$ yields

$$\binom{n}{k}_q = \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length} \le k}} q^{|\lambda|} = \sum_{\ell=0}^{k} \; \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length } \ell}} q^{|\lambda|}. \tag{156}$$

Now, let us simplify the inner sum on the right hand side.

Fix ℓ ∈ {0, 1, . . . , k}. Then, any partition λ with length ℓ has the form (λ1, λ2, . . . , λℓ) for some nonnegative integers λ1, λ2, . . . , λℓ satisfying λ1 ≥ λ2 ≥ · · · ≥ λℓ > 0 (by the definitions of “partition” and “length”). Moreover, this partition λ has largest part ≤ n − k if and only if its entries satisfy n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λℓ > 0. Finally, the size |λ| of this partition equals λ1 + λ2 + · · · + λℓ. Hence, we can rewrite the inner sum as a sum over ℓ-tuples. In other words, we have

$$\sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length } \ell}} q^{|\lambda|} = \sum_{\substack{(\lambda_1, \lambda_2, \ldots, \lambda_\ell) \in \mathbb{N}^\ell; \\ n-k \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_\ell > 0}} q^{\lambda_1 + \lambda_2 + \cdots + \lambda_\ell}. \tag{157}$$

Next, for any k-tuple (λ1 , λ2 , . . . , λk ) ∈ Nk , let us define numpos (λ1 , λ2 , . . . , λk )


to be the number of positive entries of this k-tuple (i.e., the number of i ∈
{1, 2, . . . , k} satisfying λi > 0). For example, numpos (4, 2, 2, 0) = 3 and
numpos (4, 2, 2, 1) = 4 and numpos (0, 0, 0, 0) = 0. The following is easy but
important:

Observation 1: Let (λ1 , λ2 , . . . , λk ) ∈ Nk be any k-tuple satisfying


λ1 ≥ λ2 ≥ · · · ≥ λk ≥ 0 and numpos (λ1 , λ2 , . . . , λk ) = ℓ. Then:
(a) The first ℓ entries of (λ1 , λ2 , . . . , λk ) are positive (i.e., we have
λi > 0 for all i ∈ {1, 2, . . . , ℓ}).
(b) The last k − ℓ entries of (λ1 , λ2 , . . . , λk ) are 0 (i.e., we have λi = 0
for all i ∈ {ℓ + 1, ℓ + 2, . . . , k }).
(c) We have λ1 + λ2 + · · · + λℓ = λ1 + λ2 + · · · + λk .
(d) The ℓ-tuple (λ1 , λ2 , . . . , λℓ ) is a partition.

[Proof of Observation 1: We have numpos (λ1 , λ2 , . . . , λk ) = ℓ. In other words, the


k-tuple (λ1 , λ2 , . . . , λk ) has exactly ℓ positive entries. Since this k-tuple is weakly de-
creasing (because λ1 ≥ λ2 ≥ · · · ≥ λk ), these ℓ positive entries must be concentrated at
the beginning of the k-tuple; i.e., they must be the first ℓ entries of the k-tuple. Hence,
the first ℓ entries of (λ1 , λ2 , . . . , λk ) are positive. This proves Observation 1 (a).
We have shown that the k-tuple (λ1 , λ2 , . . . , λk ) has exactly ℓ positive entries, and
they are the first ℓ entries of this k-tuple. Hence, the remaining k − ℓ entries of this
k-tuple are nonpositive. Since these entries are nonnegative as well (because λ1 ≥ λ2 ≥
· · · ≥ λk ≥ 0), we thus conclude that they are 0. In other words, the last k − ℓ entries
of (λ1 , λ2 , . . . , λk ) are 0. This proves Observation 1 (b).
Furthermore,

$$\lambda_1 + \lambda_2 + \cdots + \lambda_k = (\lambda_1 + \lambda_2 + \cdots + \lambda_\ell) + \underbrace{(\lambda_{\ell+1} + \lambda_{\ell+2} + \cdots + \lambda_k)}_{= 0 + 0 + \cdots + 0 \text{ (by Observation 1 (b))}} = \lambda_1 + \lambda_2 + \cdots + \lambda_\ell.$$

This proves Observation 1 (c).
(d) The ℓ-tuple (λ1, λ2, . . . , λℓ) is weakly decreasing (since λ1 ≥ λ2 ≥ · · · ≥ λk entails λ1 ≥ λ2 ≥ · · · ≥ λℓ) and consists of positive integers (since Observation 1 (a) says that we have λi > 0 for all i ∈ {1, 2, . . . , ℓ}). Thus, it is a weakly decreasing tuple of positive integers, i.e., a partition. This proves Observation 1 (d).]
Let us recall that ℓ ∈ {0, 1, . . . , k}, so that ℓ ≤ k. Hence, any ℓ-tuple (λ1, λ2, . . . , λℓ) ∈ N^ℓ can be extended to a k-tuple (λ1, λ2, . . . , λk) ∈ N^k by inserting k − ℓ zeroes at the end (i.e., by setting λℓ+1 = λℓ+2 = · · · = λk = 0).64 If the original ℓ-tuple (λ1, λ2, . . . , λℓ) ∈ N^ℓ was a partition with largest part ≤ n − k, then the extended k-tuple (λ1, λ2, . . . , λk) = (λ1, λ2, . . . , λℓ, 0, 0, . . . , 0) (with k − ℓ trailing zeroes) will satisfy n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λk ≥ 0 (since n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λℓ and λℓ ≥ 0 = λℓ+1 = λℓ+2 = · · · = λk ≥ 0) and numpos (λ1, λ2, . . . , λk) = ℓ (since its first ℓ entries are positive whereas its remaining k − ℓ entries are 0).
Conversely, if (λ1, λ2, . . . , λk) ∈ N^k is a k-tuple satisfying n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λk ≥ 0 and numpos (λ1, λ2, . . . , λk) = ℓ, then (λ1, λ2, . . . , λℓ) is a partition65 with largest part ≤ n − k 66 and length ℓ. Thus, we obtain a map

from {k-tuples (λ1, λ2, . . . , λk) ∈ N^k satisfying n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λk ≥ 0 and numpos (λ1, λ2, . . . , λk) = ℓ}
to {partitions with largest part ≤ n − k and length ℓ}

which sends any k-tuple (λ1, λ2, . . . , λk) to the partition (λ1, λ2, . . . , λℓ). On the other hand, we have a map

from {partitions with largest part ≤ n − k and length ℓ}
to {k-tuples (λ1, λ2, . . . , λk) ∈ N^k satisfying n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λk ≥ 0 and numpos (λ1, λ2, . . . , λk) = ℓ}

which sends any partition (λ1, λ2, . . . , λℓ) to the k-tuple (λ1, λ2, . . . , λℓ, 0, 0, . . . , 0) (with k − ℓ trailing zeroes)67.

64 For example, if ℓ = 3 and k = 5, then the ℓ-tuple (4, 2, 2) gets extended to the k-tuple (4, 2, 2, 0, 0).
65 because of Observation 1 (d)
66 since all parts λ1, λ2, . . . , λℓ of this partition are ≤ n − k (because n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λk)
These two maps are mutually inverse68, and therefore are bijections. In particular, this shows that the first map is a bijection. This bijection allows us to replace our partitions (λ1, λ2, . . . , λℓ) by the corresponding k-tuples (λ1, λ2, . . . , λk) ∈ N^k in the sum on the right hand side of (157); we thus find

$$\sum_{\substack{(\lambda_1, \lambda_2, \ldots, \lambda_\ell) \in \mathbb{N}^\ell; \\ n-k \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_\ell > 0}} q^{\lambda_1 + \lambda_2 + \cdots + \lambda_\ell} = \sum_{\substack{(\lambda_1, \lambda_2, \ldots, \lambda_k) \in \mathbb{N}^k; \\ n-k \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge 0; \\ \operatorname{numpos}(\lambda_1, \lambda_2, \ldots, \lambda_k) = \ell}} q^{\lambda_1 + \lambda_2 + \cdots + \lambda_k}$$

(where we have used Observation 1 (c) to rewrite each exponent λ1 + λ2 + · · · + λℓ as λ1 + λ2 + · · · + λk).

Now, (157) becomes

$$\sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length } \ell}} q^{|\lambda|} = \sum_{\substack{(\lambda_1, \lambda_2, \ldots, \lambda_k) \in \mathbb{N}^k; \\ n-k \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge 0; \\ \operatorname{numpos}(\lambda_1, \lambda_2, \ldots, \lambda_k) = \ell}} q^{\lambda_1 + \lambda_2 + \cdots + \lambda_k}. \tag{158}$$

Now, forget that we fixed ℓ. We thus have proved (158) for each ℓ ∈ {0, 1, . . . , k}.

67 because we have shown in the previous paragraph that if (λ1, λ2, . . . , λℓ) is a partition with largest part ≤ n − k, then we can extend it to a k-tuple (λ1, λ2, . . . , λk) = (λ1, λ2, . . . , λℓ, 0, 0, . . . , 0) by inserting k − ℓ zeroes at the end, and this extended k-tuple will satisfy n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λk ≥ 0 and numpos (λ1, λ2, . . . , λk) = ℓ
68 Indeed, the first map removes the last k − ℓ entries from a k-tuple, whereas the second map inserts k − ℓ zeroes at the end of the partition. Thus, if we apply the first map after the second map, we clearly recover the partition that we started with. If we apply the second map after the first map, then we end up replacing the last k − ℓ entries of our k-tuple by zeroes. However, if (λ1, λ2, . . . , λk) ∈ N^k is any k-tuple satisfying n − k ≥ λ1 ≥ λ2 ≥ · · · ≥ λk ≥ 0 and numpos (λ1, λ2, . . . , λk) = ℓ, then the last k − ℓ entries of this k-tuple are 0 (by Observation 1 (b)), and therefore the k-tuple does not change if we replace these k − ℓ entries by zeroes.

Now, (156) becomes

$$\binom{n}{k}_q = \sum_{\ell=0}^{k} \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length } \ell}} q^{|\lambda|} = \sum_{\ell=0}^{k} \sum_{\substack{(\lambda_1, \ldots, \lambda_k) \in \mathbb{N}^k; \\ n-k \ge \lambda_1 \ge \cdots \ge \lambda_k \ge 0; \\ \operatorname{numpos}(\lambda_1, \ldots, \lambda_k) = \ell}} q^{\lambda_1 + \lambda_2 + \cdots + \lambda_k} \qquad \text{(by (158))}$$
$$= \sum_{\substack{(\lambda_1, \ldots, \lambda_k) \in \mathbb{N}^k; \\ n-k \ge \lambda_1 \ge \cdots \ge \lambda_k \ge 0}} q^{\lambda_1 + \lambda_2 + \cdots + \lambda_k} = \sum_{\substack{(i_1, \ldots, i_k) \in \mathbb{N}^k; \\ 0 \le i_1 \le \cdots \le i_k \le n-k}} q^{i_k + i_{k-1} + \cdots + i_1}$$

(here, we have reversed the k-tuple (λ1, λ2, . . . , λk), i.e., we have substituted (ik, ik−1, . . . , i1) for (λ1, λ2, . . . , λk) in our sum)

$$= \sum_{0 \le i_1 \le i_2 \le \cdots \le i_k \le n-k} q^{i_1 + i_2 + \cdots + i_k}.$$

This proves Proposition 4.4.7 (a).


(b) There is a bijection

from {weakly increasing k-tuples (i1, i2, . . . , ik) ∈ {0, 1, . . . , n − k}^k}
to {strictly increasing k-tuples (s1, s2, . . . , sk) ∈ {1, 2, . . . , n}^k}

that sends each weakly increasing k-tuple (i1, i2, . . . , ik) to (i1 + 1, i2 + 2, . . . , ik + k) (you can think of it as “spacing the ij s apart”, i.e., increasing the distance between any two consecutive ij ’s by 1 and also increasing i1 by 1). The inverse of this bijection sends each strictly increasing k-tuple (s1, s2, . . . , sk) to (s1 − 1, s2 − 2, . . . , sk − k). Thus, we can substitute (s1 − 1, s2 − 2, . . . , sk − k) for (i1, i2, . . . , ik) in the sum

$$\sum_{0 \le i_1 \le i_2 \le \cdots \le i_k \le n-k} q^{i_1 + i_2 + \cdots + i_k}.$$

Hence we obtain

$$\sum_{0 \le i_1 \le \cdots \le i_k \le n-k} q^{i_1 + \cdots + i_k} = \sum_{1 \le s_1 < s_2 < \cdots < s_k \le n} \underbrace{q^{(s_1 - 1) + (s_2 - 2) + \cdots + (s_k - k)}}_{= q^{(s_1 + s_2 + \cdots + s_k) - (1+2+\cdots+k)}} = \sum_{1 \le s_1 < s_2 < \cdots < s_k \le n} q^{(s_1 + s_2 + \cdots + s_k) - (1+2+\cdots+k)}.$$

On the other hand, there is a bijection

from {strictly increasing k-tuples (s1, s2, . . . , sk) ∈ {1, 2, . . . , n}^k}
to {k-element subsets of {1, 2, . . . , n}}

that sends each k-tuple (s1, s2, . . . , sk) to the subset {s1, s2, . . . , sk}. (This map is indeed a bijection, because any k-element subset of {1, 2, . . . , n} can be written as {s1, s2, . . . , sk} for a unique strictly increasing k-tuple (s1, s2, . . . , sk) ∈ {1, 2, . . . , n}^k; in fact, this is simply saying that there is a unique way of listing the elements of this subset in increasing order.)
Because of this bijection, we have

$$\sum_{1 \le s_1 < s_2 < \cdots < s_k \le n} q^{(s_1 + \cdots + s_k) - (1+2+\cdots+k)} = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)}$$

(because s1 + s2 + · · · + sk = sum {s1, s2, . . . , sk} for any strictly increasing k-tuple (s1, s2, . . . , sk) ∈ {1, 2, . . . , n}^k).
Now, Proposition 4.4.7 (a) yields

$$\binom{n}{k}_q = \sum_{0 \le i_1 \le \cdots \le i_k \le n-k} q^{i_1 + \cdots + i_k} = \sum_{1 \le s_1 < \cdots < s_k \le n} q^{(s_1 + \cdots + s_k) - (1+2+\cdots+k)} = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)}.$$

This proves Proposition 4.4.7 (b).


(c) Proposition 4.4.7 (b) yields

$$\binom{n}{k}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)}.$$

Substituting 1 for q in this equality, we find

$$\binom{n}{k}_1 = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} \underbrace{1^{\operatorname{sum} S - (1+2+\cdots+k)}}_{=1} = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} 1 = (\text{\# of subsets } S \subseteq \{1,2,\ldots,n\} \text{ satisfying } |S| = k) = (\text{\# of } k\text{-element subsets of } \{1,2,\ldots,n\}) = \binom{n}{k}.$$

This proves Proposition 4.4.7 (c).
The following property of q-binomial coefficients generalizes Proposition 2.3.5:

 
Proposition 4.4.9. Let n, k ∈ N satisfy k > n. Then, $\binom{n}{k}_q = 0$.

Proof. From k > n, we obtain n − k < 0. The definition of $\binom{n}{k}_q$ yields

$$\binom{n}{k}_q = \sum_{\substack{\lambda \text{ is a partition} \\ \text{with largest part} \le n-k \\ \text{and length} \le k}} q^{|\lambda|}. \tag{159}$$

The sum on the right hand side is an empty sum, since there exists no partition with largest part ≤ n − k (because n − k < 0). Thus, (159) rewrites as $\binom{n}{k}_q = (\text{empty sum}) = 0$, and this proves Proposition 4.4.9.
   
Proposition 4.4.10. We have $\binom{n}{0}_q = \binom{n}{n}_q = 1$ for each n ∈ N.

Proof. This is easy and left as a homework exercise (Exercise A.3.4.1 (a)).
The next convention mirrors a convention we made for the (usual) binomial coefficients:

Convention 4.4.11. Let n ∈ N. For any k ∉ N, we set $\binom{n}{k}_q := 0$.

The following theorem gives not one, but two analogues (“q-analogues”) of the recurrence relation $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$ (from Proposition 2.3.4):

Theorem 4.4.12. Let n be a positive integer. Let k ∈ N. Then:

(a) We have
$$\binom{n}{k}_q = q^{n-k} \binom{n-1}{k-1}_q + \binom{n-1}{k}_q.$$

(b) We have
$$\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k \binom{n-1}{k}_q.$$

Proof. (a) This is similar to the combinatorial proof of the recurrence relation for binomial coefficients.
If k = 0, then the claim we are proving boils down to 1 = q^{n−k} · 0 + 1 (because Proposition 4.4.10 yields $\binom{n}{0}_q = 1$ and $\binom{n-1}{0}_q = 1$, and because Convention 4.4.11 yields $\binom{n-1}{-1}_q = 0$). Hence, we WLOG assume that k > 0. Thus, k − 1 ∈ N.
Proposition 4.4.7 (b) says that

$$\binom{n}{k}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)}. \tag{160}$$

Proposition 4.4.7 (b) (applied to n − 1 and k − 1 instead of n and k) yields

$$\binom{n-1}{k-1}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,n-1\}; \\ |S|=k-1}} q^{\operatorname{sum} S - (1+2+\cdots+(k-1))}. \tag{161}$$

Proposition 4.4.7 (b) (applied to n − 1 instead of n) yields

$$\binom{n-1}{k}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,n-1\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)}. \tag{162}$$

Let us now make two definitions:

• A type-1 subset will mean a k-element subset of {1, 2, . . . , n} that contains


n;

• A type-2 subset will mean a k-element subset of {1, 2, . . . , n} that does not
contain n.

Each k-element subset of {1, 2, . . . , n} is either type-1 or type-2 (but not both at the same time). Thus,

$$\sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)} = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k; \\ S \text{ is type-1}}} q^{\operatorname{sum} S - (1+2+\cdots+k)} + \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k; \\ S \text{ is type-2}}} q^{\operatorname{sum} S - (1+2+\cdots+k)}.$$

The type-2 subsets are precisely the k-element subsets of {1, 2, . . . , n − 1}. Hence,

$$\sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k; \\ S \text{ is type-2}}} q^{\operatorname{sum} S - (1+2+\cdots+k)} = \sum_{\substack{S \subseteq \{1,2,\ldots,n-1\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)} = \binom{n-1}{k}_q$$

(by (162)).
The type-1 subsets are just the (k − 1)-element subsets of {1, 2, . . . , n − 1} with an n inserted into them; i.e., the map

{(k − 1)-element subsets of {1, 2, . . . , n − 1}} → {type-1 subsets},    S ↦ S ∪ {n}

is a bijection. Hence, substituting S ∪ {n} for S in the sum, we find

$$\sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k; \\ S \text{ is type-1}}} q^{\operatorname{sum} S - (1+2+\cdots+k)} = \sum_{\substack{S \subseteq \{1,2,\ldots,n-1\}; \\ |S|=k-1}} \underbrace{q^{\operatorname{sum}(S \cup \{n\}) - (1+2+\cdots+k)}}_{= q^{\operatorname{sum} S + n - (1+2+\cdots+k)}}$$

(since S ⊆ {1, 2, . . . , n − 1} entails n ∉ S and thus sum (S ∪ {n}) = sum S + n). In view of sum S + n − (1 + 2 + · · · + k) = (n − k) + (sum S − (1 + 2 + · · · + (k − 1))), this becomes

$$\sum_{\substack{S \subseteq \{1,2,\ldots,n-1\}; \\ |S|=k-1}} q^{n-k} q^{\operatorname{sum} S - (1+2+\cdots+(k-1))} = q^{n-k} \underbrace{\sum_{\substack{S \subseteq \{1,2,\ldots,n-1\}; \\ |S|=k-1}} q^{\operatorname{sum} S - (1+2+\cdots+(k-1))}}_{= \binom{n-1}{k-1}_q \text{ (by (161))}} = q^{n-k} \binom{n-1}{k-1}_q.$$

All that’s left to do is combining what we have found:

$$\binom{n}{k}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)} \qquad \text{(by (160))}$$
$$= \underbrace{\sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k; \\ S \text{ is type-1}}} q^{\operatorname{sum} S - (1+2+\cdots+k)}}_{= q^{n-k} \binom{n-1}{k-1}_q} + \underbrace{\sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S|=k; \\ S \text{ is type-2}}} q^{\operatorname{sum} S - (1+2+\cdots+k)}}_{= \binom{n-1}{k}_q} = q^{n-k} \binom{n-1}{k-1}_q + \binom{n-1}{k}_q.$$

This proves Theorem 4.4.12 (a).


(b) This is somewhat similar to Theorem 4.4.12 (a) (but a little bit more complicated). It is left as a homework exercise (Exercise A.3.4.1 (b)).
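Both recurrences of Theorem 4.4.12 can be verified mechanically for small n and k. Here is a Python sketch (not part of the notes), representing polynomials in q as coefficient lists:

```python
# Check both q-Pascal recurrences of Theorem 4.4.12, computing q-binomials
# via the subset formula (Proposition 4.4.7 (b)). Polynomials in q are
# coefficient lists: p[j] = coefficient of q^j.

from itertools import combinations

def qbin(n, k):
    if k < 0 or k > n:
        return [0]
    coeffs = [0] * (k * (n - k) + 1)
    for S in combinations(range(1, n + 1), k):
        coeffs[sum(S) - k * (k + 1) // 2] += 1
    return coeffs

def padd(p, r):
    m = max(len(p), len(r))
    return [(p[j] if j < len(p) else 0) + (r[j] if j < len(r) else 0) for j in range(m)]

def shift(p, d):  # multiplication by q^d
    return [0] * d + p

def trim(p):  # strip trailing zero coefficients
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

for n in range(1, 8):
    for k in range(0, n + 1):
        target = trim(qbin(n, k))
        part_a = trim(padd(shift(qbin(n - 1, k - 1), n - k), qbin(n - 1, k)))  # (a)
        part_b = trim(padd(qbin(n - 1, k - 1), shift(qbin(n - 1, k), k)))      # (b)
        assert target == part_a == part_b, (n, k)
print("Theorem 4.4.12 (a) and (b) checked for n < 8")
```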
Next, we shall derive a q-analogue of the formula $\binom{n}{k} = \dfrac{n(n-1)\cdots(n-k+1)}{k!} = \dfrac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1}$:

Theorem 4.4.13. Let n, k ∈ N satisfy n ≥ k. Then:

(a) We have
$$\left(1-q^k\right)\left(1-q^{k-1}\right)\cdots\left(1-q^1\right) \cdot \binom{n}{k}_q = (1-q^n)\left(1-q^{n-1}\right)\cdots\left(1-q^{n-k+1}\right).$$

(b) We have
$$\binom{n}{k}_q = \frac{(1-q^n)\left(1-q^{n-1}\right)\cdots\left(1-q^{n-k+1}\right)}{\left(1-q^k\right)\left(1-q^{k-1}\right)\cdots\left(1-q^1\right)}$$
(in the ring Z[[q]] or in the field of rational functions over Q).

Note that part (b) of Theorem 4.4.13 is the more intuitive statement, but part (a) is easier to substitute things in (because substituting something for q in part (b) requires showing that the denominator remains invertible, whereas part (a) has no denominators and thus requires no such diligence).

Proof of Theorem 4.4.13. This is left as a homework exercise (Exercise A.3.4.1 (c)). (Use induction on n and Theorem 4.4.12.)

Remark 4.4.14. If I just gave you the fraction
$$\frac{(1-q^n)\left(1-q^{n-1}\right)\cdots\left(1-q^{n-k+1}\right)}{\left(1-q^k\right)\left(1-q^{k-1}\right)\cdots\left(1-q^1\right)},$$
you would be surprised to hear that it is a polynomial (i.e., that the denominator divides the numerator) and has nonnegative coefficients. But given the way we defined $\binom{n}{k}_q$, you are now getting this for free from Theorem 4.4.13.
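Theorem 4.4.13 (a) is a polynomial identity and can be checked by multiplying out both sides; a Python sketch (not in the notes):

```python
# Check Theorem 4.4.13 (a):
#   (1-q^k)(1-q^{k-1})...(1-q^1) * [n choose k]_q
#     = (1-q^n)(1-q^{n-1})...(1-q^{n-k+1})
# as polynomials in q, represented by coefficient lists.

from itertools import combinations
from functools import reduce

def qbin(n, k):
    coeffs = [0] * (k * (n - k) + 1)
    for S in combinations(range(1, n + 1), k):
        coeffs[sum(S) - k * (k + 1) // 2] += 1
    return coeffs

def pmul(p, r):
    out = [0] * (len(p) + len(r) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(r):
            out[i + j] += c * d
    return out

def one_minus_q_pow(m):  # the polynomial 1 - q^m (for m >= 1)
    return [1] + [0] * (m - 1) + [-1]

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

for n in range(0, 8):
    for k in range(0, n + 1):
        lhs = reduce(pmul, [one_minus_q_pow(j) for j in range(1, k + 1)], [1])
        lhs = pmul(lhs, qbin(n, k))
        rhs = reduce(pmul, [one_minus_q_pow(n - j) for j in range(k)], [1])
        assert trim(lhs) == trim(rhs), (n, k)
print("Theorem 4.4.13 (a) checked for n < 8")
```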

Theorem 4.4.13 (b) can be rewritten in a somewhat simpler way using the following notations:

Definition 4.4.15. (a) For any n ∈ N, define the q-integer $[n]_q$ to be

$$[n]_q := q^0 + q^1 + \cdots + q^{n-1} \in \mathbb{Z}[q].$$

(b) For any n ∈ N, define the q-factorial $[n]_q!$ to be

$$[n]_q! := [1]_q [2]_q \cdots [n]_q \in \mathbb{Z}[q].$$

(c) As usual, if a is an element of a ring A, then $[n]_a$ and $[n]_a!$ will mean the results of substituting a for q in $[n]_q$ and $[n]_q!$, respectively. Thus, explicitly, $[n]_a = a^0 + a^1 + \cdots + a^{n-1}$ and $[n]_a! = [1]_a [2]_a \cdots [n]_a$.

Remark 4.4.16. For any n ∈ N, we have

$$[n]_q = \frac{1-q^n}{1-q} \qquad \text{(in } \mathbb{Z}[[q]] \text{ or in the ring of rational functions over } \mathbb{Q}\text{)}$$

and

$$[n]_1 = n \qquad \text{and} \qquad [n]_1! = n!.$$

Proof of Remark 4.4.16. Let n ∈ N. We have

$$[n]_q := q^0 + q^1 + \cdots + q^{n-1} = \frac{1-q^n}{1-q},$$

since

$$(1-q)\left(q^0 + q^1 + \cdots + q^{n-1}\right) = \left(q^0 + q^1 + \cdots + q^{n-1}\right) - q\left(q^0 + q^1 + \cdots + q^{n-1}\right) = \left(q^0 + q^1 + \cdots + q^{n-1}\right) - \left(q^1 + q^2 + \cdots + q^n\right) = \underbrace{q^0}_{=1} - q^n = 1 - q^n.$$

Furthermore, substituting 1 for q in the equality $[n]_q = q^0 + q^1 + \cdots + q^{n-1}$, we obtain

$$[n]_1 = 1^0 + 1^1 + \cdots + 1^{n-1} = \underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} = n. \tag{163}$$

Substituting 1 for q in the equality $[n]_q! = [1]_q [2]_q \cdots [n]_q$, we obtain

$$[n]_1! = \underbrace{[1]_1}_{=1} \underbrace{[2]_1}_{=2} \cdots \underbrace{[n]_1}_{=n} = 1 \cdot 2 \cdot \cdots \cdot n = n! \qquad \text{(by (163))}.$$

Theorem 4.4.17. Let n, k ∈ N with n ≥ k. Then,

$$\binom{n}{k}_q = \frac{[n]_q [n-1]_q \cdots [n-k+1]_q}{[k]_q!} = \frac{[n]_q!}{[k]_q! \cdot [n-k]_q!}$$

(in the ring Z[[q]] or in the ring of rational functions over Q).

Proof. This is left as a homework exercise (Exercise A.3.4.1 (d)).
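A numeric spot-check of Theorem 4.4.17 (not from the notes): substituting an integer for q, the q-factorial formula should hold as an identity of integers (written in multiplied-out form to avoid fractions).

```python
# Check [n choose k]_q * [k]_q! * [n-k]_q! = [n]_q! at q = 2 and q = 3,
# with [n choose k]_q evaluated via the subset formula of Proposition 4.4.7 (b).

from itertools import combinations

def qbin_at(n, k, q):
    return sum(q ** (sum(S) - k * (k + 1) // 2) for S in combinations(range(1, n + 1), k))

def qint(m, q):   # the q-integer [m]_q = q^0 + q^1 + ... + q^(m-1)
    return sum(q ** i for i in range(m))

def qfact(m, q):  # the q-factorial [m]_q!
    out = 1
    for i in range(1, m + 1):
        out *= qint(i, q)
    return out

for q in (2, 3):
    for n in range(0, 9):
        for k in range(0, n + 1):
            assert qbin_at(n, k, q) * qfact(k, q) * qfact(n - k, q) == qfact(n, q)
print("Theorem 4.4.17 spot-checked at q = 2 and q = 3")
```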


A consequence of this theorem is the following symmetry property of q-
binomial coefficients:

Proposition 4.4.18. Let n, k ∈ N. Then,


   
n n
= .
k q n−k q

Proof. This is left as a homework exercise (Exercise A.3.4.1 (e)).

4.4.4. q-binomial formulas

The properties we have seen so far suggest that q-binomial coefficients not only generalize binomial coefficients, but also share most of their properties in a somewhat modified form. In other words, we start expecting most properties of binomial coefficients to generalize to q-binomial coefficients, often in several ways (e.g., the recurrence of the binomial coefficients generalized in two ways).
Let us see how this expectation holds up for the most famous property of binomial coefficients: the binomial formula

$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.$$

This formula holds whenever a and b are two elements of a commutative ring, or even more generally, whenever a and b are two commuting elements of an arbitrary ring. If we want to integrate a q into this formula, we need to

• either change the structure of the formula,

• or modify the commutativity assumption.

This gives rise to two different “q-analogues” of the binomial formula. Both are important (one for the theory of partitions, and another for the theory of quantum groups). Here is the first one:

Theorem 4.4.19 (1st q-binomial theorem). Let K be a commutative ring. Let a, b ∈ K and n ∈ N. In the polynomial ring K[q], we have

$$\left(aq^0 + b\right)\left(aq^1 + b\right)\cdots\left(aq^{n-1} + b\right) = \sum_{k=0}^{n} q^{k(k-1)/2} \binom{n}{k}_q a^k b^{n-k}.$$

Note that setting q = 1 in Theorem 4.4.19 (i.e., substituting 1 for q) recovers


the good old binomial formula, since all the n factors on the left hand side
become a + b.
There is a straightforward way to prove Theorem 4.4.19 by induction on n
(see Exercise A.3.4.2 (a)). Let us instead give a nicer argument. This argument
will rely on the following general fact:

Lemma 4.4.20. Let L be a commutative ring. Let n ∈ N. Let [n] denote the set {1, 2, . . . , n}. Let a1, a2, . . . , an be n elements of L. Let b1, b2, . . . , bn be n further elements of L. Then,

$$\prod_{i=1}^{n} (a_i + b_i) = \sum_{S \subseteq [n]} \left(\prod_{i \in S} a_i\right) \left(\prod_{i \in [n] \setminus S} b_i\right). \tag{164}$$

Lemma 4.4.20 is well-known and intuitively clear: When expanding the product $\prod_{i=1}^{n} (a_i + b_i) = (a_1 + b_1)(a_2 + b_2)\cdots(a_n + b_n)$, you obtain a sum of $2^n$ terms, each of which is a product of one addend chosen from each of the n sums a1 + b1, a2 + b2, . . . , an + bn. This is precisely what the right hand side of (164) is. A rigorous proof of Lemma 4.4.20 can be found in [Grinbe15, Exercise 6.1 (a)].

Proof of Theorem 4.4.19. Let [n] denote the set {1, 2, . . . , n}. We have
$$\left(aq^0+b\right)\left(aq^1+b\right)\cdots\left(aq^{n-1}+b\right) = \prod_{i=1}^{n}\left(aq^{i-1}+b\right) = \sum_{S\subseteq[n]} \left(\prod_{i\in S} aq^{i-1}\right)\left(\prod_{i\in[n]\setminus S} b\right)$$
(by Lemma 4.4.20, applied to L = K [q], $a_i = aq^{i-1}$ and $b_i = b$). Here,
$\prod_{i\in S} aq^{i-1} = a^{|S|} q^{\operatorname{sum} S - |S|}$ (since the sum of the exponents i − 1 over
all i ∈ S is precisely $\operatorname{sum} S - |S|$), whereas $\prod_{i\in[n]\setminus S} b = b^{|[n]\setminus S|}$. Splitting the
sum according to the value of k = |S|, and noting that a k-element subset S of
the n-element set [n] satisfies |[n] \ S| = n − k, we thus obtain
$$\left(aq^0+b\right)\left(aq^1+b\right)\cdots\left(aq^{n-1}+b\right) = \sum_{k=0}^{n} \ \sum_{\substack{S\subseteq[n];\\ |S|=k}} a^k q^{\operatorname{sum} S - k} b^{n-k}.$$
Now, each k-element subset S of [n] satisfies
$$q^{\operatorname{sum} S - k} = q^{\operatorname{sum} S - (1+2+\cdots+k)} \cdot q^{1+2+\cdots+(k-1)} = q^{\operatorname{sum} S - (1+2+\cdots+k)} \cdot q^{k(k-1)/2}$$
(since 1 + 2 + · · · + (k − 1) = k (k − 1) /2). Hence, the above becomes
$$\sum_{k=0}^{n} q^{k(k-1)/2} \left( \sum_{\substack{S\subseteq[n];\\ |S|=k}} q^{\operatorname{sum} S - (1+2+\cdots+k)} \right) a^k b^{n-k} = \sum_{k=0}^{n} q^{k(k-1)/2} \binom{n}{k}_q a^k b^{n-k}$$
(by Proposition 4.4.7 (b), since [n] = {1, 2, . . . , n}).
This proves Theorem 4.4.19.
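For the computationally inclined, Theorem 4.4.19 can be verified for small n by representing polynomials in q as plain coefficient lists. The sketch below (all helper names are ours) computes the q-binomial coefficient via the standard Pascal-type recurrence $\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k \binom{n-1}{k}_q$, which we take as given here.

```python
def polymul(p, q):
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def polyadd(p, q):
    r = [0] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += c
    return r

def gauss(n, k):
    """[n choose k]_q as a coefficient list, via the Pascal-type recurrence
    [n,k]_q = [n-1,k-1]_q + q^k [n-1,k]_q."""
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    return polyadd(gauss(n - 1, k - 1), [0] * k + gauss(n - 1, k))

def check(a, b, n):
    """Compare both sides of the 1st q-binomial theorem for integers a, b."""
    lhs = [1]
    for i in range(n):                 # multiply by (a q^i + b)
        factor = [0] * (i + 1)
        factor[0] += b
        factor[i] += a
        lhs = polymul(lhs, factor)
    rhs = [0]
    for k in range(n + 1):
        coeff = a ** k * b ** (n - k)
        term = [0] * (k * (k - 1) // 2) + [coeff * c for c in gauss(n, k)]
        rhs = polyadd(rhs, term)
    pad = max(len(lhs), len(rhs))
    return lhs + [0] * (pad - len(lhs)) == rhs + [0] * (pad - len(rhs))

assert all(check(2, 3, n) for n in range(7))
assert all(check(-1, 5, n) for n in range(7))
```

The same `gauss` helper reappears below whenever a q-binomial coefficient needs to be evaluated concretely.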


The 2nd q-binomial theorem grows out of noncommutativity:

Theorem 4.4.21 (2nd q-binomial theorem, aka Potter's binomial theorem).
Let L be a commutative ring. Let ω ∈ L. Let A be a noncommutative L-
algebra. Let a, b ∈ A be such that ba = ωab. Let n ∈ N. Then,
$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k}_\omega a^k b^{n-k}.$$

The condition ba = ωab looks somewhat artificial – do such elements a, b
actually exist in the wild? Indeed they do, as the following examples show:

Example 4.4.22. Let L = Z and ω = −1 and $A = \mathbb{Z}^{2\times 2}$ (the ring of 2 × 2-
matrices with integer entries). Let
$$a = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad\text{and}\qquad b = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
It is easy to check that these two matrices satisfy ba = −ab, that is, ba = ωab.
Thus, Theorem 4.4.21 predicts that
$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k}_\omega a^k b^{n-k}.$$
And this is indeed true (check it for n = 3).
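Example 4.4.22 invites a direct computation. The following Python sketch (helper names are ours) checks ba = −ab for the two matrices above and then verifies the conclusion of Theorem 4.4.21 for several n, evaluating the ω-binomial coefficient at ω = −1 via the Pascal-type recurrence for q-binomial coefficients.

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def matpow(X, n):
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = matmul(R, X)
    return R

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def matscale(c, X):
    return [[c * X[i][j] for j in range(2)] for i in range(2)]

def gauss_at(n, k, w):
    """Gaussian binomial [n,k]_q evaluated at q = w, via the recurrence
    [n,k]_q = [n-1,k-1]_q + q^k [n-1,k]_q."""
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:
        return 1
    return gauss_at(n - 1, k - 1, w) + w ** k * gauss_at(n - 1, k, w)

a = [[0, 1], [1, 0]]
b = [[1, 0], [0, -1]]
w = -1

# ba = ω ab, the hypothesis of Theorem 4.4.21
assert matmul(b, a) == matscale(w, matmul(a, b))

for n in range(7):
    lhs = matpow(matadd(a, b), n)
    rhs = [[0, 0], [0, 0]]
    for k in range(n + 1):
        term = matscale(gauss_at(n, k, w),
                        matmul(matpow(a, k), matpow(b, n - k)))
        rhs = matadd(rhs, term)
    assert lhs == rhs
```

Note how the order a^k b^{n-k} matters here: the matrices do not commute, and swapping the factors breaks the identity.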

Example 4.4.23. Let L = R. Let A be the ring of R-linear operators on
C∞ (R) = {smooth functions from R to R}. Let ω be any real number.
Let b ∈ A be the differentiation operator (sending each f ∈ C∞ (R) to f ′).
Let a ∈ A be the operator that substitutes ωx for x in the function (in other
words, it shrinks the plot of the function by ω in the x-direction).
Then, you can check that ba = ωab. (Indeed, (f (ωx))′ = ω f ′ (ωx).)

The proof of Theorem 4.4.21 is again a homework exercise (Exercise A.3.4.2 (b)).
The two q-binomial theorems are not entirely unrelated: Theorem 4.4.19 can
be obtained from Theorem 4.4.21. (See Exercise A.3.4.15 for the details.)

4.4.5. Counting subspaces of vector spaces

We have introduced the q-binomial coefficient $\binom{n}{k}_q$ as a generating function
for a certain sort of partitions – i.e., a “weighted number” of partitions, where
each partition λ has weight $q^{|\lambda|}$. However, for certain integers a, the number
$\binom{n}{k}_a$ has other interpretations, too. A particularly striking one can be found
when a is the size of a finite field.

Let us recall a few things about finite fields:

• For any prime power $p^k$, there is a finite field of size $p^k$; it is unique up
to isomorphism, and is therefore often called the “Galois field of size $p^k$”
and denoted by $\mathbb{F}_{p^k}$. The finite fields $\mathbb{F}_p$ are easiest to construct – they are
just the quotient rings Z/pZ = Z/p (that is, the rings of integers modulo
p). Higher prime powers are more complicated. For example, the finite
field $\mathbb{F}_{p^2}$ can be obtained by starting with Z/p and adjoining a square
root of an element that is not a square. It is not $\mathbb{Z}/p^2$, since $\mathbb{Z}/p^2$ is not a
field!

• Linear algebra (i.e., the notions of vector spaces, subspaces, linear inde-
pendence, bases, matrices, Gaussian elimination, etc.) can be done over
any field. In fact, many of its concepts can be defined over any commu-
tative ring, but only over fields do they behave as nicely as they do over
the real numbers. Thus, much of the linear algebra that you have learned
over the real numbers remains valid over any field. (Exceptions are some
properties that rely on positivity or on characteristic 0.)

Thus, it makes sense to talk about finite-dimensional vector spaces over finite
fields. Such spaces are finite as sets, and thus can be viewed as combinatorial
objects. An n-dimensional vector space over a finite field F has size | F |n .
Now, we might wonder how many k-dimensional subspaces such an n-dimensional
vector space has. The answer is given by the following theorem:

Theorem 4.4.24. Let F be a finite field. Let n, k ∈ N. Let V be an n-
dimensional F-vector space. Then,
$$\binom{n}{k}_{|F|} = (\text{\# of } k\text{-dimensional vector subspaces of } V).$$

Compare this with the classical fact that if S is an n-element set, then
$$\binom{n}{k} = (\text{\# of } k\text{-element subsets of } S).$$

This hints at an analogy between finite sets and finite-dimensional vector spaces.
Such an analogy does indeed exist; the expository paper [Cohn04] gives a great
overview over its reach.
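Theorem 4.4.24 can also be sanity-checked by brute force over the smallest field F = F_2, encoding the vectors of F_2^n as bitmasks so that vector addition becomes XOR. The sketch below (helper names are ours) collects the distinct spans of all linearly independent k-element sets of nonzero vectors and compares the count with the quotient of products that will emerge in the proof.

```python
from itertools import combinations

def span_f2(vecs):
    """Span over F_2 of vectors encoded as bitmasks: close {0} under XOR."""
    s = {0}
    for v in vecs:
        s |= {x ^ v for x in s}
    return frozenset(s)

def count_subspaces(n, k):
    """# of k-dimensional subspaces of F_2^n: collect the distinct spans of
    all linearly independent k-element sets of nonzero vectors."""
    found = set()
    for vecs in combinations(range(1, 2 ** n), k):
        sp = span_f2(vecs)
        if len(sp) == 2 ** k:      # independent iff the span has 2^k elements
            found.add(sp)
    return len(found)

def q_binomial(n, k, q):
    """[n choose k]_q via the quotient of products appearing in the proof of
    Theorem 4.4.24 (an integer whenever q is a prime power)."""
    num = den = 1
    for i in range(k):
        num *= q ** n - q ** i
        den *= q ** k - q ** i
    return num // den

for n in range(5):
    for k in range(n + 1):
        assert count_subspaces(n, k) == q_binomial(n, k, 2)
```

For instance, F_2^4 has exactly 35 two-dimensional subspaces, matching $\binom{4}{2}_2 = 35$.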
The easiest proof of Theorem 4.4.24 uses three lemmas. The first one is a clas-
sical fact from linear algebra, which holds for any vector space (not necessarily
finite-dimensional) over any field (not necessarily finite):
Lemma 4.4.25. Let F be a field. Let V be an F-vector space. Let (v1 , v2 , . . . , vk )
be a k-tuple of vectors in V. Then, (v1 , v2 , . . . , vk ) is linearly independent if
and only if each i ∈ {1, 2, . . . , k} satisfies vi ∉ span (v1 , v2 , . . . , vi−1 ) (where
the span span () of an empty list is understood to be the set {0} consisting
only of the zero vector 0). In other words, (v1 , v2 , . . . , vk ) is linearly indepen-
dent if and only if we have

v1 ∉ span () = {0},   v2 ∉ span (v1 ),   v3 ∉ span (v1 , v2 ),   . . . ,   vk ∉ span (v1 , v2 , . . . , vk−1 ).

Proof of Lemma 4.4.25. We must prove that (v1 , v2 , . . . , vk ) is linearly indepen-
dent if and only if each i ∈ {1, 2, . . . , k} satisfies vi ∉ span (v1 , v2 , . . . , vi−1 ).
This is an “if and only if” statement; we shall prove its “only if” (i.e., “=⇒”)
and “if” (i.e., “⇐=”) directions separately:

=⇒: Assume that the k-tuple (v1 , v2 , . . . , vk ) is linearly independent. Let i ∈
{1, 2, . . . , k}. If we had vi ∈ span (v1 , v2 , . . . , vi−1 ), then we could write vi in the
form vi = α1 v1 + α2 v2 + · · · + αi−1 vi−1 for some coefficients α1 , α2 , . . . , αi−1 ∈ F,
and therefore these coefficients would satisfy
$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_{i-1} v_{i-1} + (-1) v_i + 0 v_{i+1} + 0 v_{i+2} + \cdots + 0 v_k = v_i + (-v_i) + 0 = 0,$$
which would be a nontrivial linear dependence relation between (v1 , v2 , . . . , vk )
(nontrivial because vi appears in it with the nonzero coefficient −1); this would
contradict the linear independence of (v1 , v2 , . . . , vk ). Hence, we cannot have
vi ∈ span (v1 , v2 , . . . , vi−1 ). In other words, we have vi ∉ span (v1 , v2 , . . . , vi−1 ).
Forget that we fixed i. We thus have shown that each i ∈ {1, 2, . . . , k} satisfies
vi ∉ span (v1 , v2 , . . . , vi−1 ). This proves the “=⇒” direction of our claim.

⇐=: Assume that each i ∈ {1, 2, . . . , k} satisfies vi ∉ span (v1 , v2 , . . . , vi−1 ).
We must prove that the k-tuple (v1 , v2 , . . . , vk ) is linearly independent. Indeed,
assume the contrary. Thus, this k-tuple is linearly dependent. In other words,
there exist coefficients β1 , β2 , . . . , βk ∈ F that satisfy β1 v1 + β2 v2 + · · · + βk vk =
0 and that are not all zero. Consider these β1 , β2 , . . . , βk . At least one i ∈
{1, 2, . . . , k} satisfies βi ≠ 0 (since the coefficients β1 , β2 , . . . , βk are not all zero).
Pick the largest such i. Thus, βi ≠ 0 but βi+1 = βi+2 = · · · = βk = 0. Hence,
$$\beta_1 v_1 + \beta_2 v_2 + \cdots + \beta_k v_k = (\beta_1 v_1 + \cdots + \beta_{i-1} v_{i-1}) + \beta_i v_i + \underbrace{(\beta_{i+1} v_{i+1} + \beta_{i+2} v_{i+2} + \cdots + \beta_k v_k)}_{= 0 v_{i+1} + 0 v_{i+2} + \cdots + 0 v_k = 0} = (\beta_1 v_1 + \cdots + \beta_{i-1} v_{i-1}) + \beta_i v_i,$$
so that
$$\beta_i v_i = \underbrace{(\beta_1 v_1 + \beta_2 v_2 + \cdots + \beta_k v_k)}_{=0} - (\beta_1 v_1 + \cdots + \beta_{i-1} v_{i-1}) = (-\beta_1) v_1 + (-\beta_2) v_2 + \cdots + (-\beta_{i-1}) v_{i-1} \in \operatorname{span}(v_1 , v_2 , \ldots, v_{i-1}).$$
Since βi ≠ 0, we thus obtain vi ∈ span (v1 , v2 , . . . , vi−1 ) (since span (v1 , v2 , . . . , vi−1 )
is an F-vector subspace of V and thus preserved under scaling, here by $\beta_i^{-1}$). This contra-
dicts our assumption that vi ∉ span (v1 , v2 , . . . , vi−1 ). This contradiction shows
that our assumption was wrong, and thus completes our proof of the “⇐=”
direction of our claim.
Thus, both directions of our claim are proved. This concludes the proof of
Lemma 4.4.25.
The next lemma we are going to use is itself an answer to a rather natural
counting problem. Indeed, it is well-known (see, e.g., [19fco, Proposition
2.7.2]) that if X is an n-element set, and if k ∈ N, then
$$(\text{\# of $k$-tuples of distinct elements of } X) = n(n-1)(n-2)\cdots(n-k+1) = \prod_{i=0}^{k-1} (n-i). \tag{165}$$

The following lemma is a “linear analogue” of this combinatorial fact: The n-


element set X is replaced by an n-dimensional vector space V, and “distinct
elements” are replaced by “linearly independent vectors”. The answer is rather
similar:

Lemma 4.4.26. Let F be a finite field. Let n, k ∈ N. Let V be an n-dimensional
F-vector space. Then,
$$(\text{\# of linearly independent $k$-tuples of vectors in } V) = \left(|F|^n - |F|^0\right)\left(|F|^n - |F|^1\right)\cdots\left(|F|^n - |F|^{k-1}\right) = \prod_{i=0}^{k-1} \left(|F|^n - |F|^i\right).$$

Proof of Lemma 4.4.26. We have $|V| = |F|^n$ (since V is an n-dimensional F-vector
space).
Lemma 4.4.25 says that a k-tuple (v1 , v2 , . . . , vk ) of vectors in V is linearly
independent if and only if it satisfies

v1 ∉ span () = {0},   v2 ∉ span (v1 ),   v3 ∉ span (v1 , v2 ),   . . . ,   vk ∉ span (v1 , v2 , . . . , vk−1 ).

Thus, we can construct a linearly independent k-tuple (v1 , v2 , . . . , vk ) of vec-
tors in V as follows, proceeding entry by entry:

• First, we choose v1 . This has to be a vector in V \ span () (because it has
to satisfy v1 ∉ span ()); thus, there are $|V \setminus \operatorname{span}()| = |V| - |\operatorname{span}()| = |V| - 1$
options for it (since |span ()| = |{0}| = 1).
Once v1 has been chosen, we have obtained a linearly independent sin-
gleton list (v1 ). Hence, its span span (v1 ) has dimension 1 (as an F-vector
space) and thus size $|F|^1$. In other words, $|\operatorname{span}(v_1)| = |F|^1$.

• Next, we choose v2 . This has to be a vector in V \ span (v1 ) (because it
has to satisfy v2 ∉ span (v1 )); thus, there are $|V| - |\operatorname{span}(v_1)| = |V| - |F|^1$
options for it.
Once v2 has been chosen, we have obtained a linearly independent list
(v1 , v2 ). Hence, its span span (v1 , v2 ) has dimension 2 (as an F-vector
space) and thus size $|F|^2$. In other words, $|\operatorname{span}(v_1, v_2)| = |F|^2$.

• Next, we choose v3 . This has to be a vector in V \ span (v1 , v2 ) (because
it has to satisfy v3 ∉ span (v1 , v2 )); thus, there are $|V| - |\operatorname{span}(v_1, v_2)| = |V| - |F|^2$
options for it.
Once v3 has been chosen, we have obtained a linearly independent list
(v1 , v2 , v3 ). Hence, its span span (v1 , v2 , v3 ) has dimension 3 (as an F-
vector space) and thus size $|F|^3$. In other words, $|\operatorname{span}(v_1, v_2, v_3)| = |F|^3$.

• And so on, until the last vector vk in our list has been chosen.

The total # of ways to perform this construction is
$$(|V| - 1)\left(|V| - |F|^1\right)\left(|V| - |F|^2\right)\cdots\left(|V| - |F|^{k-1}\right).$$
Hence,
$$(\text{\# of linearly independent $k$-tuples of vectors in } V) = (|V| - 1)\left(|V| - |F|^1\right)\cdots\left(|V| - |F|^{k-1}\right) = \prod_{i=0}^{k-1}\left(|V| - |F|^i\right) = \prod_{i=0}^{k-1}\left(|F|^n - |F|^i\right)$$
(since $|V| = |F|^n$). This proves Lemma 4.4.26.
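The count in Lemma 4.4.26 can likewise be confirmed by exhaustive enumeration over F = F_2, with vectors encoded as bitmasks and vector addition as XOR; the helper names below are ours.

```python
from itertools import product

def span_f2(vecs):
    """Span over F_2 of vectors encoded as bitmasks: close {0} under XOR."""
    s = {0}
    for v in vecs:
        s |= {x ^ v for x in s}
    return s

def count_linind_tuples(n, k):
    """# of linearly independent k-tuples of vectors in F_2^n, brute force:
    a k-tuple is independent iff its span has the full size 2^k."""
    return sum(1 for vecs in product(range(2 ** n), repeat=k)
               if len(span_f2(vecs)) == 2 ** k)

for n in range(4):
    for k in range(n + 2):       # k = n+1 included: both sides are 0 there
        predicted = 1
        for i in range(k):
            predicted *= 2 ** n - 2 ** i
        assert count_linind_tuples(n, k) == predicted
```

For example, F_2^3 has (8 − 1)(8 − 2) = 42 linearly independent pairs of vectors.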


Another lemma we will need is a basic combinatorial principle (often illus-
trated by the saying “to count a flock of sheep, count the legs and divide by
4”):

Lemma 4.4.27 (Multijection principle). Let A and B be two finite sets. Let
m ∈ N. Let f : A → B be any map. Assume that each b ∈ B has exactly m
preimages under f (that is, for each b ∈ B, there are exactly m many elements
a ∈ A such that f ( a) = b). Then,

| A| = m · | B| .

Proof. Easy and LTTR.


We note that a map f : A → B satisfying the assumption of Lemma 4.4.27 is
often called an m-to-1 map.

Proof of Theorem 4.4.24. First of all, we notice that if k > n, then $\binom{n}{k}_{|F|} = 0$ (by
Proposition 4.4.9) and (# of k-dimensional vector subspaces of V) = 0 (since
the dimension of a subspace of V is never larger than the dimension of V).
Thus, Theorem 4.4.24 is true when k > n. Hence, for the rest of this proof, we
WLOG assume that k ≤ n.
We will use the shorthand “linind” for the words “linearly independent”.
If (v1 , v2 , . . . , vk ) is a linind k-tuple of vectors in V, then span (v1 , v2 , . . . , vk )
is a k-dimensional vector subspace of V. Hence, we can define a map

f : {linind k-tuples of vectors in V} → {k-dimensional vector subspaces of V} ,
(v1 , v2 , . . . , vk ) ↦ span (v1 , v2 , . . . , vk ) .

Consider this map f . We claim the following:

Observation 1: Each k-dimensional vector subspace of V has exactly
$$\left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right)$$
preimages under f .

[Proof of Observation 1: Let W be a k-dimensional vector subspace of V. We
must prove that
$$(\text{\# of preimages of } W \text{ under } f) = \left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right).$$
A preimage of W under f is a k-tuple of vectors in V that spans W (by the
very definition of f ). Obviously, all the k vectors in such a k-tuple must belong
to W. Thus, a preimage of W under f is a k-tuple of vectors in W that spans
W. However, the vector space W is k-dimensional. Thus, a k-tuple of vectors in
W spans W if and only if this k-tuple is linind69 . Therefore, a preimage of W
under f is a linind k-tuple of vectors in W. Hence,
$$(\text{\# of preimages of } W \text{ under } f) = (\text{\# of linind $k$-tuples of vectors in } W) = \left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right)$$
(by Lemma 4.4.26, applied to W and k instead of V and n). This proves Obser-
vation 1.]

Now, Observation 1 shows that each k-dimensional vector subspace of V
has exactly $\left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right)$ preimages under f .
Hence, Lemma 4.4.27 (applied to A = {linind k-tuples of vectors in V},
B = {k-dimensional vector subspaces of V} and
$m = \left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right)$) shows that
$$(\text{\# of linind $k$-tuples of vectors in } V) = \left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right) \cdot (\text{\# of $k$-dimensional vector subspaces of } V).$$
However, Lemma 4.4.26 yields
$$(\text{\# of linind $k$-tuples of vectors in } V) = \left(|F|^n - |F|^0\right)\left(|F|^n - |F|^1\right)\cdots\left(|F|^n - |F|^{k-1}\right).$$

69 Again, we are using a simple fact from linear algebra here, which is true over any field (not
necessarily finite): A k-tuple of vectors in a k-dimensional vector space spans the space if
and only if it is linind.
Comparing these two equalities, we obtain
$$\left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right) \cdot (\text{\# of $k$-dimensional vector subspaces of } V) = \left(|F|^n - |F|^0\right)\left(|F|^n - |F|^1\right)\cdots\left(|F|^n - |F|^{k-1}\right).$$
Therefore,
$$(\text{\# of $k$-dimensional vector subspaces of } V) = \frac{\left(|F|^n - |F|^0\right)\left(|F|^n - |F|^1\right)\cdots\left(|F|^n - |F|^{k-1}\right)}{\left(|F|^k - |F|^0\right)\left(|F|^k - |F|^1\right)\cdots\left(|F|^k - |F|^{k-1}\right)} = \prod_{i=0}^{k-1} \frac{|F|^n - |F|^i}{|F|^k - |F|^i} = \prod_{i=0}^{k-1} \frac{1 - |F|^{n-i}}{1 - |F|^{k-i}}$$
(since each i ∈ {0, 1, . . . , k − 1} satisfies
$\frac{|F|^n - |F|^i}{|F|^k - |F|^i} = \frac{|F|^i \left(|F|^{n-i} - 1\right)}{|F|^i \left(|F|^{k-i} - 1\right)} = \frac{|F|^{n-i} - 1}{|F|^{k-i} - 1} = \frac{1 - |F|^{n-i}}{1 - |F|^{k-i}}$). Hence,
$$(\text{\# of $k$-dimensional vector subspaces of } V) = \frac{\left(1 - |F|^n\right)\left(1 - |F|^{n-1}\right)\cdots\left(1 - |F|^{n-k+1}\right)}{\left(1 - |F|^k\right)\left(1 - |F|^{k-1}\right)\cdots\left(1 - |F|^1\right)} = \binom{n}{k}_{|F|}$$
(since substituting |F| for q in Theorem 4.4.13 (b) yields
$\binom{n}{k}_{|F|} = \frac{\left(1 - |F|^n\right)\left(1 - |F|^{n-1}\right)\cdots\left(1 - |F|^{n-k+1}\right)}{\left(1 - |F|^k\right)\left(1 - |F|^{k-1}\right)\cdots\left(1 - |F|^1\right)}$). This proves Theorem
4.4.24.

4.4.6. Limits of q-binomial coefficients

There is much more to say about q-binomial coefficients, but let us just briefly
focus on their limiting behavior. This is not analogous to anything known from
usual binomial coefficients; indeed, the limit $\lim_{n\to\infty} \binom{n}{k}$ does not exist for any
positive integer k. However, q-binomial coefficients behave much better in this
regard.
Indeed, consider the q-binomial coefficients $\binom{n}{2}_q$ for various values of n:
$$\binom{0}{2}_q = 0, \qquad \binom{1}{2}_q = 0, \qquad \binom{2}{2}_q = 1, \qquad \binom{3}{2}_q = 1 + q + q^2,$$
$$\binom{4}{2}_q = 1 + q + 2q^2 + q^3 + q^4, \qquad \binom{5}{2}_q = 1 + q + 2q^2 + 2q^3 + 2q^4 + q^5 + q^6,$$
$$\binom{6}{2}_q = 1 + q + 2q^2 + 2q^3 + 3q^4 + 2q^5 + 2q^6 + q^7 + q^8.$$
It appears from these examples that the sequence $\left(\binom{n}{2}_q\right)_{n\in\mathbb{N}}$ coefficientwise
stabilizes70 to
$$1 + q + 2q^2 + 2q^3 + 3q^4 + 3q^5 + \cdots = \sum_{n\in\mathbb{N}} \left(1 + \left\lfloor \frac{n}{2} \right\rfloor\right) q^n.$$

And this is indeed the case:

Proposition 4.4.28. Let k ∈ N be fixed. Then,
$$\lim_{n\to\infty} \binom{n}{k}_q = \sum_{n\in\mathbb{N}} \left(p_0(n) + p_1(n) + \cdots + p_k(n)\right) q^n = \prod_{i=1}^{k} \frac{1}{1 - q^i}.$$
(See Definition 4.1.3 (a) for the meaning of $p_i(n)$.)

70 Recall Definition 3.13.3 for the notion of “coefficientwise stabilizing”.

First proof of Proposition 4.4.28 (sketched). For each integer n ≥ k, we have
$$\binom{n}{k}_q = \frac{(1-q^n)\left(1-q^{n-1}\right)\cdots\left(1-q^{n-k+1}\right)}{\left(1-q^k\right)\left(1-q^{k-1}\right)\cdots\left(1-q^1\right)} \qquad \text{(by Theorem 4.4.13 (b))}$$
$$= \frac{\left(1-q^{n-k+1}\right)\left(1-q^{n-k+2}\right)\cdots(1-q^n)}{\left(1-q^1\right)\left(1-q^2\right)\cdots\left(1-q^k\right)} \qquad \text{(here, we have turned both products upside down)}$$
$$= \frac{\prod_{i=1}^{k}\left(1-q^{n-k+i}\right)}{\prod_{i=1}^{k}\left(1-q^i\right)} = \frac{1}{\prod_{i=1}^{k}\left(1-q^i\right)} \cdot \prod_{i=1}^{k}\left(1-q^{n-k+i}\right).$$
However, we have $\lim_{n\to\infty} q^n = 0$ (check this!71 ). Thus, for each i ∈ {1, 2, . . . , k},
we have $\lim_{n\to\infty} q^{n-k+i} = 0$ (since the family $\left(q^{n-k+i}\right)_{n \geq k-i}$ is just a reindexing of
the family $(q^n)_{n \geq 0}$), and therefore
$$\lim_{n\to\infty} \left(1 - q^{n-k+i}\right) = 1.$$
Hence, Corollary 3.13.9 (applied to $f_{i,n} = 1 - q^{n-k+i}$ and $f_i = 1$) yields that
$$\lim_{n\to\infty} \sum_{i=1}^{k} \left(1-q^{n-k+i}\right) = \sum_{i=1}^{k} 1 \qquad\text{and}\qquad \lim_{n\to\infty} \prod_{i=1}^{k} \left(1-q^{n-k+i}\right) = \prod_{i=1}^{k} 1.$$
Thus, in particular,
$$\lim_{n\to\infty} \prod_{i=1}^{k} \left(1-q^{n-k+i}\right) = \prod_{i=1}^{k} 1 = 1.$$
Now, recall that each integer n ≥ k satisfies
$$\binom{n}{k}_q = \frac{1}{\prod_{i=1}^{k}\left(1-q^i\right)} \cdot \prod_{i=1}^{k}\left(1-q^{n-k+i}\right).$$
Hence,
$$\lim_{n\to\infty} \binom{n}{k}_q = \frac{1}{\prod_{i=1}^{k}\left(1-q^i\right)} \cdot \underbrace{\lim_{n\to\infty} \prod_{i=1}^{k}\left(1-q^{n-k+i}\right)}_{=1} = \frac{1}{\prod_{i=1}^{k}\left(1-q^i\right)} = \prod_{i=1}^{k} \frac{1}{1-q^i}. \tag{166}$$
Finally, Theorem 4.1.19 (with the letters x, m and k renamed as q, k and i)
says that
$$\sum_{n\in\mathbb{N}} \left(p_0(n) + p_1(n) + \cdots + p_k(n)\right) q^n = \prod_{i=1}^{k} \frac{1}{1-q^i}.$$
Combining this with (166), we obtain
$$\lim_{n\to\infty} \binom{n}{k}_q = \sum_{n\in\mathbb{N}} \left(p_0(n) + p_1(n) + \cdots + p_k(n)\right) q^n = \prod_{i=1}^{k} \frac{1}{1-q^i}.$$
This proves Proposition 4.4.28.

71 This is a matter of understanding Definition 3.13.3.


Second proof of Proposition 4.4.28 (sketched). For each n ∈ N, we have
$$\binom{n}{k}_q = \sum_{\substack{\lambda \text{ is a partition}\\ \text{with largest part} \leq n-k\\ \text{and length} \leq k}} q^{|\lambda|} \tag{167}$$
(by the definition of $\binom{n}{k}_q$).
However, for each n ∈ N, the sum on the right hand side of (167) is a partial
sum of the sum $\sum_{\substack{\lambda \text{ is a partition}\\ \text{with length} \leq k}} q^{|\lambda|}$, and this partial sum grows by more and
more addends as n increases; each addend of the latter sum gets eventually
included in this partial sum (for sufficiently large n). From these observations,
it is easy to obtain that
$$\lim_{n\to\infty} \sum_{\substack{\lambda \text{ is a partition}\\ \text{with largest part} \leq n-k\\ \text{and length} \leq k}} q^{|\lambda|} = \sum_{\substack{\lambda \text{ is a partition}\\ \text{with length} \leq k}} q^{|\lambda|}.$$
In view of (167), this rewrites as
$$\lim_{n\to\infty} \binom{n}{k}_q = \sum_{\substack{\lambda \text{ is a partition}\\ \text{with length} \leq k}} q^{|\lambda|} = \sum_{n\in\mathbb{N}} (\text{\# of partitions of $n$ having length} \leq k)\, q^n = \sum_{n\in\mathbb{N}} \left(p_0(n) + p_1(n) + \cdots + p_k(n)\right) q^n = \prod_{i=1}^{k} \frac{1}{1-q^i}$$
(here, we rewrote the # of partitions of n having length ≤ k as $p_0(n) + p_1(n) + \cdots + p_k(n)$
using Definition 4.1.3 (a), and then applied Theorem 4.1.19, with the letters x,
m and k renamed as q, k and i). Thus, Proposition 4.4.28 is proved again.
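Proposition 4.4.28 can be illustrated numerically: for a large n, the low-degree coefficients of $\binom{n}{k}_q$ already agree with the partition counts $p_0(m) + p_1(m) + \cdots + p_k(m)$. A sketch (helper names are ours), using the fact that partitions of length ≤ k correspond, by conjugation, to partitions into parts of size ≤ k:

```python
def polyadd(p, q):
    r = [0] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += c
    return r

def gauss(n, k):
    """[n choose k]_q as a coefficient list in q, via the recurrence
    [n,k]_q = [n-1,k-1]_q + q^k [n-1,k]_q."""
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    return polyadd(gauss(n - 1, k - 1), [0] * k + gauss(n - 1, k))

def partitions_with_at_most_k_parts(m, k):
    """# of partitions of m into at most k parts. By conjugation this equals
    the # of partitions of m into parts of size <= k, which a standard
    coin-counting DP computes."""
    dp = [1] + [0] * m
    for part in range(1, k + 1):
        for total in range(part, m + 1):
            dp[total] += dp[total - part]
    return dp[m]

k, N = 2, 8
stable = gauss(40, k)      # n = 40 is far beyond the degrees we inspect
for m in range(N + 1):
    assert stable[m] == partitions_with_at_most_k_parts(m, k)
```

The coefficient of q^m in $\binom{n}{k}_q$ stops changing once n − k ≥ m, which is exactly the coefficientwise stabilization the proposition describes.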

4.5. References
Thus ends our foray into integer partitions and related FPSs. We will partially
revisit this topic later, as we discuss symmetric functions. Here are just a few
things we are omitting:

• In 1919, Ramanujan discovered the following three congruences for p (n):

p (n) ≡ 0 mod 5 if n ≡ 4 mod 5;


p (n) ≡ 0 mod 7 if n ≡ 5 mod 7;
p (n) ≡ 0 mod 11 if n ≡ 6 mod 11.

The first of these follows from the FPS equality
$$\sum_{n\in\mathbb{N}} p(5n+4)\, x^n = 5 \prod_{i=1}^{\infty} \frac{\left(1-x^{5i}\right)^5}{\left(1-x^i\right)^6},$$
whose proof is far from straightforward. All of these results (and some
rather subtle generalizations) are shown in [Berndt06, Chapter 2] and
[Hirsch17, Chapters 3 and 5]; see also [Aigner07, Chapter 3, Highlight]
for a proof of the latter equality.
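The three congruences themselves are easy to test by machine for small n; the sketch below (helper names are ours) computes p(n) by a standard dynamic program rather than by any of the cited methods.

```python
def partition_numbers(N):
    """The list p(0), p(1), ..., p(N), computed by the standard DP that
    adds in the parts 1, 2, ..., N one size at a time."""
    dp = [1] + [0] * N
    for part in range(1, N + 1):
        for total in range(part, N + 1):
            dp[total] += dp[total - part]
    return dp

p = partition_numbers(120)
assert all(p[5 * n + 4] % 5 == 0 for n in range(24))    # up to p(119)
assert all(p[7 * n + 5] % 7 == 0 for n in range(17))    # up to p(117)
assert all(p[11 * n + 6] % 11 == 0 for n in range(11))  # up to p(116)
```

For instance, p(4) = 5 and p(9) = 30 are both divisible by 5, as the first congruence predicts.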

• An asymptotic expansion for p (n) (found by Hardy and Ramanujan in


1918) is
$$p(n) \sim \frac{1}{4n\sqrt{3}} \exp\left(\pi\sqrt{\frac{2n}{3}}\right) \qquad \text{as } n \to \infty. \tag{168}$$
See [Erdos42] for a proof.

• In 1770, Lagrange proved that every nonnegative integer n can be written


as a sum of four perfect squares. In 1829, Jacobi strengthened this to a
counting formula: If n is a positive integer, then the number of quadru-
ples ( a, b, c, d) of integers satisfying n = a2 + b2 + c2 + d2 is 8 times the
sum of positive divisors of n that are not divisible by 4. The most elemen-
tary proofs of this striking result use partition-related FPSs and the Jacobi
Triple Product Identity. (See [Hirsch87] or [Sambal22, Theorem 7.20] for
self-contained proofs; see also [Hirsch17, Chapter 2] and [Berndt06, Chap-
ter 3] for various related results.)
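Jacobi's count is easy to confirm for small n by sheer enumeration; the sketch below (helper names are ours) is of course no substitute for the FPS proofs cited.

```python
from math import isqrt

def r4(n):
    """# of quadruples (a, b, c, d) of integers with a^2+b^2+c^2+d^2 = n,
    counted by brute force over the box [-sqrt(n), sqrt(n)]^4."""
    m = isqrt(n)
    count = 0
    for a in range(-m, m + 1):
        for b in range(-m, m + 1):
            for c in range(-m, m + 1):
                for d in range(-m, m + 1):
                    if a * a + b * b + c * c + d * d == n:
                        count += 1
    return count

def jacobi(n):
    """8 times the sum of the positive divisors of n not divisible by 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)

for n in range(1, 16):
    assert r4(n) == jacobi(n)
```

For example, 4 = (±2)^2 + 0 + 0 + 0 in 8 ways and 4 = (±1)^2 + (±1)^2 + (±1)^2 + (±1)^2 in 16 ways, for a total of 24 = 8 · (1 + 2).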

• The Rogers–Ramanujan identities
$$\sum_{k\in\mathbb{N}} \frac{x^{k^2}}{(1-x^1)(1-x^2)\cdots(1-x^k)} = \prod_{i\in\mathbb{N}} \frac{1}{\left(1-x^{5i+1}\right)\left(1-x^{5i+4}\right)} \qquad\text{and}$$
$$\sum_{k\in\mathbb{N}} \frac{x^{k(k+1)}}{(1-x^1)(1-x^2)\cdots(1-x^k)} = \prod_{i\in\mathbb{N}} \frac{1}{\left(1-x^{5i+2}\right)\left(1-x^{5i+3}\right)}$$

can be used to count partitions into parts that are congruent to ±1 mod 5
or congruent to ±2 mod 5, respectively. These surprising identities can
be proved using Proposition 4.4.28 and the Jacobi Triple Product Iden-
tity; see [Doyle19] for a self-contained writeup of this proof. A whole
book [Sills18] has been written about these two identities and their many
variants.
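Both identities can be checked coefficientwise by machine, working with power series truncated at some fixed degree; the sketch below (helper names and the truncation degree are ours) verifies them up to degree 40.

```python
N = 40  # all power series are truncated after degree N

def mul(p, q):
    """Multiply two truncated power series (coefficient lists)."""
    r = [0] * (N + 1)
    for i, pi in enumerate(p):
        if pi:
            for j in range(min(len(q), N + 1 - i)):
                r[i + j] += pi * q[j]
    return r

def geom(m):
    """Truncation of 1/(1 - x^m) = 1 + x^m + x^{2m} + ..."""
    r = [0] * (N + 1)
    for j in range(0, N + 1, m):
        r[j] = 1
    return r

def rr_sum(exponent):
    """Truncation of the sum over k >= 0 of
    x^{exponent(k)} / ((1-x)(1-x^2)...(1-x^k))."""
    total = [1] + [0] * N          # the k = 0 addend is 1
    prod = [1] + [0] * N           # running value of 1/((1-x)...(1-x^k))
    k = 0
    while exponent(k + 1) <= N:
        k += 1
        prod = mul(prod, geom(k))
        e = exponent(k)
        for i in range(N + 1 - e):
            total[i + e] += prod[i]
    return total

def rr_product(residues):
    """Truncation of the product of 1/(1-x^m) over all m >= 1
    with m mod 5 in the given residue set."""
    r = [1] + [0] * N
    for m in range(1, N + 1):
        if m % 5 in residues:
            r = mul(r, geom(m))
    return r

assert rr_sum(lambda k: k * k) == rr_product({1, 4})
assert rr_sum(lambda k: k * (k + 1)) == rr_product({2, 3})
```

A finite truncation proves nothing, but it makes the shape of both identities tangible: each coefficient on the left counts certain restricted partitions, matching the product side.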

Here is a list of references for further reading on partitions:

• The book [AndEri04] by Andrews and Eriksson is a beautiful (if not al-
ways fully precise) introduction to integer partitions and related topics.

• Pak’s [Pak06] is a survey of identities between partition numbers (and


related FPSs) with occasionally outlined proofs. (Beware: the writing is
very terse and teems with typos.)

• Hirschhorn’s [Hirsch17] (subtitled “a personal journey”, not meant to be


comprehensive) studies partitions through the lens of (mostly purely al-
gebraic) manipulation of FPSs.

• Berndt’s [Berndt06] is another (more analytic and number-theoretical)


study of partition-related FPSs, with applications to number theory.

5. Permutations
We now come back to the foundations of combinatorics: We will study permu-
tations of finite sets. I will assume that you know their most basic properties

(see, e.g., [Strick13, Appendix B] and [Goodma15, §1.5] for refreshers; see also
[Grinbe15, Chapter 5] for many more details on inversions), and will show
some more advanced results. For deeper treatments, see [Bona12], [Sagan01]
and [Stanle11, Chapter 1].

5.1. Basic definitions


Definition 5.1.1. Let X be a set.
(a) A permutation of X means a bijection from X to X.
(b) It is known that the set of all permutations of X is a group under
composition. This group is called the symmetric group of X, and is denoted
by SX . Its neutral element is the identity map idX : X → X. Its size is | X |!
when X is finite.
(Alternative notations for $S_X$ include $\operatorname{Sym}(X)$ and $\Sigma_X$ and $\mathfrak{S}_X$ and $\mathbb{S}_X$ .)
(c) As usual in group theory, we will write αβ for the composition α ◦ β
when α, β ∈ SX . This is the map that sends each x ∈ X to α ( β ( x )).
(d) If α ∈ SX and i ∈ Z, then $\alpha^i$ shall denote the i-th power of α in the
group SX . Note that $\alpha^i = \underbrace{\alpha \circ \alpha \circ \cdots \circ \alpha}_{i \text{ times}}$ if i ≥ 0. Also, $\alpha^0 = \mathrm{id}_X$. Also, $\alpha^{-1}$ is
the inverse of α in the group SX , that is, the inverse of the map α.

Definition 5.1.2. Let n ∈ Z. Then, [n] shall mean the set {1, 2, . . . , n}. This is
an n-element set if n ≥ 0, and is an empty set if n ≤ 0.
The symmetric group S[n] (consisting of all permutations of [n]) will be
denoted Sn and called the n-th symmetric group. Its size is n! (when n ≥ 0).

For instance, S3 is the group of all 6 permutations of the set [3] = {1, 2, 3}.
If two sets X and Y are in bijection, then their symmetric groups SX and SY
are isomorphic. Intuitively, this is clear (just think of Y as a “copy” of X with
all elements relabelled, and use this to reinterpret each permutation of X as a
permutation of Y). We can formalize this as the following proposition:

Proposition 5.1.3. Let X and Y be two sets, and let f : X → Y be a bijection.
Then, for each permutation σ of X, the map $f \circ \sigma \circ f^{-1} : Y \to Y$ is a
permutation of Y. Furthermore, the map
$$S_f : S_X \to S_Y, \qquad \sigma \mapsto f \circ \sigma \circ f^{-1}$$
is a group isomorphism; thus, we obtain $S_X \cong S_Y$.

Proof. Easy and LTTR.



Because of Proposition 5.1.3, if you want to understand the symmetric groups


of finite sets, you only need to understand Sn for all n ∈ N (because if X is a
finite set of size n, then there is a bijection f : X → [n] and therefore a group
isomorphism S f : SX → S[n] ). Thus, we will focus mostly on Sn in this chapter.

Remark 5.1.4. If Y = X in Proposition 5.1.3, then the group isomorphism S f


is conjugation by f in the group SX .

Next, let us define three ways to represent a permutation:

Definition 5.1.5. Let n ∈ N and σ ∈ Sn . We introduce three notations for σ:

(a) A two-line notation of σ means a 2 × n-array
$$\begin{pmatrix} p_1 & p_2 & \cdots & p_n \\ \sigma(p_1) & \sigma(p_2) & \cdots & \sigma(p_n) \end{pmatrix},$$
where the entries p1 , p2 , . . . , pn of the top row are the n elements of [n] in
some order. Note that this is a standard notation for any kind of map from a
finite set. Commonly, we pick pi = i, so we get the array
$$\begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n) \end{pmatrix}.$$

(b) The one-line notation (short, OLN) of σ means the n-tuple


(σ (1) , σ (2) , . . . , σ (n)).
It is common to omit the commas and the parentheses when writing down
the OLN of σ. Thus, one simply writes σ (1) σ (2) · · · σ (n) instead of
(σ (1) , σ (2) , . . . , σ (n)). Note that this omission can make the notation am-
biguous if some of the σ (i ) have more than one digit (for example, the OLN
1112345678910 can mean two different permutations of [11], depending on
whether you read the “111” part as “1, 11” or as “11, 1”). However, if n ≤ 10,
then this ambiguity does not occur, and the notation is unproblematic (even
without commas and parentheses).
(c) The cycle digraph of σ is defined (informally) as follows:

• For each i ∈ [n], draw a point (“node”) labelled i.

• For each i ∈ [n], draw an arrow (“arc”) from the node labelled i to the
node labelled σ (i ).

The resulting picture is called the cycle digraph of σ.


Using the concept of digraphs (= directed graphs), this definition can be
restated formally as follows: The cycle digraph of σ is the directed graph with
vertices 1, 2, . . . , n and arcs i → σ (i ) for all i ∈ [n].

Example 5.1.6. Let σ : [4] → [4] be the map that sends the elements 1, 2, 3, 4
to 2, 4, 3, 1, respectively. Then, σ is a bijection, thus a permutation of [4].
(a) A two-line notation of σ is $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 3 & 1 \end{pmatrix}$. Another is $\begin{pmatrix} 3 & 1 & 4 & 2 \\ 3 & 2 & 1 & 4 \end{pmatrix}$.
Another is $\begin{pmatrix} 4 & 1 & 3 & 2 \\ 1 & 2 & 3 & 4 \end{pmatrix}$. There are 24 two-line notations of σ in total, since
we can freely choose the order in which the elements of [4] appear in the top
row.
(b) The one-line notation of σ is (2, 4, 3, 1). Omitting the commas and the
parentheses, we can rewrite this as 2431.
(c) One way to draw the cycle digraph of σ is
[figure: the nodes 1, 2, 3, 4 in a row, with arcs 1 → 2, 2 → 4, 4 → 1 and a loop at 3]
Another is
[figure: the 3-cycle 1 → 2 → 4 → 1 drawn as a closed loop, with the node 3 and its loop drawn separately]
(When drawing cycle digraphs, one commonly tries to place the nodes in
such a way as to make the arcs as short as possible. Thus, it is natural to
keep the cycles separate in the picture. But formally speaking, any picture is
fine, as long as the nodes and arcs don’t overlap.)

Example 5.1.7. Let σ : [10] → [10] be the map that sends the elements
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to 5, 4, 3, 2, 6, 10, 1, 9, 8, 7, respectively. Then, σ is a
bijection, hence a permutation of [10]. The one-line notation of σ is
(5, 4, 3, 2, 6, 10, 1, 9, 8, 7). If we omit the commas and the parentheses, then
this becomes
5 4 3 2 6 (10) 1 9 8 7.
(We have put the 10 in parentheses to make its place clearer.) The cycle
digraph of σ is
[figure: the 5-cycle 1 → 5 → 6 → 10 → 7 → 1, the 2-cycle 2 → 4 → 2, the 2-cycle 8 → 9 → 8, and the fixed point 3 with a loop]
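The cycles visible in such digraphs can be computed mechanically from the one-line notation. A Python sketch (the function name is ours):

```python
def cycles(one_line):
    """Cycle decomposition of a permutation of [n] given in one-line
    notation (a tuple whose i-th entry is sigma(i+1))."""
    n = len(one_line)
    sigma = {i + 1: one_line[i] for i in range(n)}
    seen = set()
    result = []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        j = sigma[start]
        while j != start:           # follow the arcs until we return to start
            cycle.append(j)
            seen.add(j)
            j = sigma[j]
        result.append(tuple(cycle))
    return result

# the permutation from Example 5.1.7
print(cycles((5, 4, 3, 2, 6, 10, 1, 9, 8, 7)))
# → [(1, 5, 6, 10, 7), (2, 4), (3,), (8, 9)]
```

Each tuple in the output is one cycle of the digraph; fixed points show up as 1-element cycles.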

5.2. Transpositions and cycles


We shall now define some important families of permutations.

5.2.1. Transpositions

Definition 5.2.1. Let i and j be two distinct elements of a set X.


Then, the transposition ti,j is the permutation of X that sends i to j, sends j
to i, and leaves all other elements of X unchanged.

Strictly speaking, the notation ti,j is somewhat ambiguous, since it suppresses


X. However, most of the times we will use it, the set X will be either clear from
the context or irrelevant.

Example 5.2.2. The permutation t2,4 of the set [7] sends the elements
1, 2, 3, 4, 5, 6, 7 to 1, 4, 3, 2, 5, 6, 7, respectively. Its one-line notation (with com-
mas and parentheses omitted) is therefore 1432567.

Note that ti,j = t j,i for any two distinct elements i and j of a set X.

Definition 5.2.3. Let n ∈ N and i ∈ [n − 1]. Then, the simple transposition si


is defined by
si := ti,i+1 ∈ Sn .

Thus, a simple transposition is a transposition that swaps two consecutive


integers. Again, the notation si suppresses n, but this won’t usually be a prob-
lem.

Example 5.2.4. The permutation s2 of the set [7] sends the elements
1, 2, 3, 4, 5, 6, 7 to 1, 3, 2, 4, 5, 6, 7, respectively. Its one-line notation is therefore
1324567.
Here are some very basic properties of simple transpositions:72

Proposition 5.2.5. Let n ∈ N.
(a) We have $s_i^2 = \mathrm{id}$ for all i ∈ [n − 1]. In other words, we have $s_i = s_i^{-1}$ for
all i ∈ [n − 1].
(b) We have $s_i s_j = s_j s_i$ for any i, j ∈ [n − 1] with |i − j| > 1.
(c) We have $s_i s_{i+1} s_i = s_{i+1} s_i s_{i+1}$ for any i ∈ [n − 2].

Proof. To prove that two permutations α and β of [n] are identical, it suffices
to show that α (k) = β (k ) for each k ∈ [n]. Using this strategy, we can prove
all three parts of Proposition 5.2.5 straightforwardly (distinguishing cases cor-
responding to the relative positions of k, i, i + 1, j and j + 1). This is done in
detail for Proposition 5.2.5 (c) in [Grinbe15, solution to Exercise 5.1 (a)]; the
proofs of parts (a) and (b) are easier and LTTR.
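Parts (a), (b) and (c) are also easy to confirm by machine for a small n, representing permutations by their one-line notations. The helper names below are ours; `compose` implements the convention αβ = α ∘ β of Definition 5.1.1 (c).

```python
def s(i, n):
    """The simple transposition s_i in S_n, as a one-line tuple."""
    perm = list(range(1, n + 1))
    perm[i - 1], perm[i] = perm[i], perm[i - 1]
    return tuple(perm)

def compose(alpha, beta):
    """alpha beta = alpha after beta: (alpha beta)(x) = alpha(beta(x))."""
    return tuple(alpha[beta[x] - 1] for x in range(len(alpha)))

n = 5
identity = tuple(range(1, n + 1))

for i in range(1, n):                                   # part (a)
    assert compose(s(i, n), s(i, n)) == identity

for i in range(1, n):                                   # part (b)
    for j in range(1, n):
        if abs(i - j) > 1:
            assert compose(s(i, n), s(j, n)) == compose(s(j, n), s(i, n))

for i in range(1, n - 1):                               # part (c)
    lhs = compose(s(i, n), compose(s(i + 1, n), s(i, n)))
    rhs = compose(s(i + 1, n), compose(s(i, n), s(i + 1, n)))
    assert lhs == rhs
```

These are exactly the Coxeter relations of the symmetric group, which will reappear later when permutations are written as products of simple transpositions.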

5.2.2. Cycles
The following definition can be viewed as a generalization of Definition 5.2.1:

Definition 5.2.6. Let X be a set. Let i1 , i2 , . . . , ik be k distinct elements of X.


Then,
cyci1 ,i2 ,...,ik
means the permutation of X that sends

i1 to i2 ,
i2 to i3 ,
i3 to i4 ,
...,
ik−1 to ik ,
ik to i1

and leaves all other elements of X unchanged. In other words, cyci1 ,i2 ,...,ik
means the permutation of X that satisfies
$$\operatorname{cyc}_{i_1,i_2,\ldots,i_k}(p) = \begin{cases} i_{j+1}, & \text{if } p = i_j \text{ for some } j \in \{1, 2, \ldots, k\}; \\ p, & \text{otherwise} \end{cases}$$
for every p ∈ X, where $i_{k+1}$ means $i_1$ .

72 Recall Definition 5.1.1. Thus, for example, $s_i^2$ means $s_i s_i = s_i \circ s_i$ , whereas $s_i s_j$ means $s_i \circ s_j$ .


This permutation is called a k-cycle.

The name “k-cycle” harkens back to the cycle digraph of cyci1 ,i2 ,...,ik , which
consists of a cycle of length k (containing the nodes i1 , i2 , . . . , ik in this order)
along with | X | − k isolated nodes (more precisely, each of the elements of X \
{i1 , i2 , . . . , ik } has an arrow from itself to itself in the cycle digraph of cyci1 ,i2 ,...,ik ).
Here is an example:

Example 5.2.7. Let X = [8]. Then, the permutation cyc2,6,5 of X sends

2 to 6,
6 to 5,
5 to 2

and leaves all other elements of X unchanged. Thus, this permutation has
OLN 16342578 and cycle digraph

[Figure: the cycle digraph of cyc2,6,5 , consisting of the 3-cycle 2 → 6 → 5 → 2 together with loops at the fixed nodes 1, 3, 4, 7, 8.]

Example 5.2.8. Let X be a set. If i and j are two distinct elements of X, then
cyci,j = ti,j . Thus, the 2-cycles in SX are precisely the transpositions in SX , so
there are \binom{|X|}{2} many of them (since any 2-element subset {i, j} of X gives
rise to a transposition ti,j , and this assignment of transpositions to 2-element
subsets is bijective).
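Definition 5.2.6 translates directly into code. The following sketch (the function name cyc and the dictionary representation are our choices, not notation from the notes) builds a k-cycle and reproduces Examples 5.2.7 and 5.2.8:

```python
def cyc(elements, X):
    """The k-cycle cyc_{i1,...,ik} on the ground set X, as a dict p -> image.
    Each i_j is sent to i_{j+1} (indices taken mod k); all other points of X
    are left unchanged."""
    perm = {p: p for p in X}
    k = len(elements)
    for j in range(k):
        perm[elements[j]] = elements[(j + 1) % k]
    return perm

# Example 5.2.7: cyc_{2,6,5} on X = [8] sends 2 -> 6, 6 -> 5, 5 -> 2 and has
# one-line notation 16342578.
sigma = cyc([2, 6, 5], range(1, 9))
assert "".join(str(sigma[p]) for p in range(1, 9)) == "16342578"

# Example 5.2.8: a 2-cycle is just a transposition.
assert cyc([1, 3], range(1, 5)) == {1: 3, 2: 2, 3: 1, 4: 4}
```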

Note that the k-cycle cyci1 ,i2 ,...,ik is often denoted by (i1 , i2 , . . . , ik ), but I will not
use this notation here, since it clashes with the standard notation for k-tuples.

Exercise 5.2.2.1. Let n ∈ N and let k ∈ [n]. Let X be an n-element set. How
many k-cycles exist in SX ?

Solution to Exercise 5.2.2.1 (sketched). First, we note that there is exactly one 1-
cycle in SX (for n > 0), since a 1-cycle is just the identity map. This should be
viewed as a degenerate case; thus, we WLOG assume that k > 1.

For any k distinct elements i1 , i2 , . . . , ik of X, we have


cyci1 ,i2 ,...,ik = cyci2 ,i3 ,...,ik ,i1 = cyci3 ,i4 ,...,ik ,i1 ,i2 = · · · = cycik ,i1 ,i2 ,...,ik−1 .

That is, cyci1 ,i2 ,...,ik does not change if we cyclically rotate the list (i1 , i2 , . . . , ik ).
Any k-cycle cyci1 ,i2 ,...,ik uniquely determines the elements i1 , i2 , . . . , ik up to
cyclic rotation (since k > 1). Indeed, if σ = cyci1 ,i2 ,...,ik is a k-cycle, then the
elements i1 , i2 , . . . , ik are precisely the elements of X that are not fixed by σ (it is
here that we use our assumption k > 1), and furthermore, if we know which of
these elements is i1 , then we can reconstruct the remaining elements i2 , i3 , . . . , ik
recursively by
i2 = σ ( i1 ) , i3 = σ ( i2 ) , i4 = σ ( i3 ) , ..., i k = σ ( i k −1 )
(that is, i2 , i3 , . . . , ik are obtained by iteratively applying σ to i1 ). Therefore,
if σ ∈ SX is a k-cycle, then there are precisely k lists (i1 , i2 , . . . , ik ) for which
σ = cyci1 ,i2 ,...,ik (coming from the k possibilities for which of the k non-fixed
points of σ should be i1 ).
Hence, the map
f : {k-tuples of distinct elements of X } → {k-cycles in SX } ,
(i1 , i2 , . . . , ik ) 7→ cyci1 ,i2 ,...,ik
is a k-to-1 map (i.e., each k-cycle in SX has precisely k preimages under this
map). Therefore, Lemma 4.4.27 (applied to m = k and
A = {k-tuples of distinct elements of X } and B = {k-cycles in SX }) yields
(# of k-tuples of distinct elements of X ) = k · (# of k-cycles in SX ) .
Therefore,

(# of k-cycles in SX ) = (1/k) · (# of k-tuples of distinct elements of X )
                       = (1/k) · n (n − 1) (n − 2) · · · (n − k + 1)
                         (by (165), since X is an n-element set)
                       = \binom{n}{k} · (k − 1)!      (by a bit of simple algebra) .
This is the answer to Exercise 5.2.2.1 in the case k > 1. Hence, Exercise 5.2.2.1
is solved.
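The count \binom{n}{k} · (k − 1)! can be verified by exhaustive enumeration for small n. The following sketch (the helper is_k_cycle is ours; it tests the orbit-based characterization used in the solution above) compares the brute-force count with the formula:

```python
from itertools import permutations
from math import comb, factorial

def is_k_cycle(w, k):
    """Whether the permutation w (tuple in one-line notation on [n]) is a
    k-cycle: it moves exactly k points (none for k = 1), and those points
    form a single orbit of length k."""
    moved = [i for i in range(1, len(w) + 1) if w[i - 1] != i]
    if k == 1:
        return not moved          # the unique 1-cycle is the identity
    if len(moved) != k:
        return False
    p, steps = w[moved[0] - 1], 1  # follow the orbit of one moved point
    while p != moved[0]:
        p, steps = w[p - 1], steps + 1
    return steps == k

# Compare the brute-force count with binom(n, k) * (k - 1)!.
n = 5
for k in range(2, n + 1):
    count = sum(is_k_cycle(w, k) for w in permutations(range(1, n + 1)))
    assert count == comb(n, k) * factorial(k - 1)
```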

5.3. Inversions, length and Lehmer codes


5.3.1. Inversions and lengths
Let us define some features of arbitrary permutations of [n]:

Definition 5.3.1. Let n ∈ N and σ ∈ Sn .


(a) An inversion of σ means a pair (i, j) of elements of [n] such that i < j
and σ (i ) > σ ( j).
(b) The length (also known as the Coxeter length) of σ is the # of inversions
of σ. It is denoted ℓ (σ). (Some authors call it inv σ instead.)

(In LaTeX, the symbol “ℓ” is obtained by typing “\ell”. If you just type “l”,
you will get “l”.)
An inversion of a permutation σ can thus be viewed as a pair of elements of
[n] whose relative order changes when σ is applied to them. (We require this
pair (i, j) to satisfy i < j in order not to count each such pair doubly.)

Example 5.3.2. Let π ∈ S4 be the permutation with OLN 3142. The inversions
of π are

(1, 2)      (since 1 < 2 and π (1) = 3 > 1 = π (2)) and
(1, 4)      (since 1 < 4 and π (1) = 3 > 2 = π (4)) and
(3, 4)      (since 3 < 4 and π (3) = 4 > 2 = π (4)) .

Thus, the length of π is 3.
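Definition 5.3.1 is easy to implement. Here is a short Python sketch (the names inversions and length are ours), again representing permutations as tuples in one-line notation with 1-based positions:

```python
def inversions(w):
    """All inversions (i, j) of w (tuple in one-line notation, 1-based
    positions): pairs with i < j and w(i) > w(j), per Definition 5.3.1 (a)."""
    n = len(w)
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if w[i - 1] > w[j - 1]]

def length(w):
    """The (Coxeter) length ℓ(w) = number of inversions of w."""
    return len(inversions(w))

# Example 5.3.2: π = 3142 has inversions (1,2), (1,4), (3,4) and length 3.
assert inversions((3, 1, 4, 2)) == [(1, 2), (1, 4), (3, 4)]
assert length((3, 1, 4, 2)) == 3
```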

For a given n ∈ N and a given k ∈ N, how many permutations σ ∈ Sn have


length k ? The following proposition gives a partial answer:

Proposition 5.3.3. Let n ∈ N.


  
(a) For any σ ∈ Sn , we have ℓ (σ) ∈ {0, 1, . . . , \binom{n}{2}}.
(b) We have
(# of σ ∈ Sn with ℓ (σ) = 0) = 1.
Indeed, the only permutation σ ∈ Sn with ℓ (σ) = 0 is the identity map id.
(c) We have

(# of σ ∈ Sn with ℓ (σ) = \binom{n}{2}) = 1.

Indeed, the only permutation σ ∈ Sn with ℓ (σ) = \binom{n}{2} is the permutation
with OLN n (n − 1) (n − 2) · · · 21. (This permutation is often called w0 .)

(d) If n ≥ 1, then

(# of σ ∈ Sn with ℓ (σ) = 1) = n − 1.

Indeed, the only permutations σ ∈ Sn with ℓ (σ ) = 1 are the simple transpo-


sitions si with i ∈ [n − 1].
(e) If n ≥ 2, then

(# of σ ∈ Sn with ℓ (σ) = 2) = (n − 2) (n + 1) / 2.

Indeed, the only permutations σ ∈ Sn with ℓ (σ) = 2 are the products si s j
with 1 ≤ i < j < n as well as the products si si−1 with i ∈ {2, 3, . . . , n − 1}. If
n ≥ 2, then there are (n − 2) (n + 1) / 2 such products (and they are all distinct).

(f) For any k ∈ Z, we have

(# of σ ∈ Sn with ℓ (σ) = k) = (# of σ ∈ Sn with ℓ (σ) = \binom{n}{2} − k) .

Proof. Exercise A.4.3.1.


What about the general case? Alas, there is no explicit formula for the # of
σ ∈ Sn with ℓ (σ) = k. However, there is a nice formula for the generating
function
∑_{k∈N} (# of σ ∈ Sn with ℓ (σ) = k) x^k = ∑_{σ∈Sn} x^{ℓ(σ)} .
Let us first sound it out on the case n = 3:

Example 5.3.4. Written in one-line notation, the permutations of the set [3]
are 123, 132, 213, 231, 312, and 321. Their lengths are

ℓ (123) = 0, ℓ (132) = 1, ℓ (213) = 1,


ℓ (231) = 2, ℓ (312) = 2, ℓ (321) = 3.

Thus,

∑_{σ∈S3} x^{ℓ(σ)} = x^{ℓ(123)} + x^{ℓ(132)} + x^{ℓ(213)} + x^{ℓ(231)} + x^{ℓ(312)} + x^{ℓ(321)}
(where we are writing each σ ∈ S3 in OLN)
= x^0 + x^1 + x^1 + x^2 + x^2 + x^3 = 1 + 2x + 2x^2 + x^3
= (1 + x) (1 + x + x^2) .

This suggests the following general result:

Proposition 5.3.5. Let n ∈ N. Then,

∑_{σ∈Sn} x^{ℓ(σ)} = ∏_{i=1}^{n−1} (1 + x + x^2 + · · · + x^i)
= (1 + x) (1 + x + x^2) (1 + x + x^2 + x^3) · · · (1 + x + x^2 + · · · + x^{n−1})
= [n]x !.

(Here, we are using Definition 4.4.15, so that [n] x ! means the result of substi-
tuting x for q in the q-factorial [n]q !.)
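Before proving Proposition 5.3.5, we can sanity-check it numerically for small n. The following Python sketch (helper names are ours) represents polynomials as coefficient lists and compares both sides:

```python
from itertools import permutations

def length(w):
    """ℓ(w): number of inversions of the permutation w (one-line tuple)."""
    n = len(w)
    return sum(1 for i in range(n) for j in range(i + 1, n) if w[i] > w[j])

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists (index = exponent)."""
    r = [0] * (len(p) + len(q) - 1)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            r[a + b] += ca * cb
    return r

for n in range(1, 7):
    # left side: the coefficient of x^k counts the permutations of length k
    lhs = [0] * (n * (n - 1) // 2 + 1)
    for w in permutations(range(1, n + 1)):
        lhs[length(w)] += 1
    # right side: (1+x)(1+x+x^2)...(1+x+...+x^{n-1}), i.e. the q-factorial [n]_x!
    rhs = [1]
    for i in range(1, n):
        rhs = poly_mul(rhs, [1] * (i + 1))
    assert lhs == rhs
```

For n = 3 this reproduces the coefficient list [1, 2, 2, 1] found in Example 5.3.4.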

5.3.2. Lehmer codes


We will prove this proposition using the so-called Lehmer code of a permutation,
which is defined as follows:

Definition 5.3.6. Let n ∈ N. The following notations will be used throughout


Section 5.3:
(a) For each σ ∈ Sn and i ∈ [n], we set

ℓi (σ) := (# of all j ∈ [n] satisfying i < j and σ (i ) > σ ( j))


= (# of all j ∈ {i + 1, i + 2, . . . , n} such that σ (i ) > σ ( j)) .

(The last equality sign here is clear, since the j ∈ [n] satisfying i < j are
precisely the j ∈ {i + 1, i + 2, . . . , n}.)
(b) For each m ∈ Z, we let [m]0 denote the set {0, 1, . . . , m}. (This is an
empty set when m < 0.)
(c) We let Hn denote the set

[ n − 1]0 × [ n − 2]0 × · · · × [ n − n ]0
= {( j1 , j2 , . . . , jn ) ∈ Nn | ji ≤ n − i for each i ∈ [n]} .

This set Hn has size

| Hn | = |[n − 1]0 × [n − 2]0 × · · · × [n − n]0 |
       = |[n − 1]0 | · |[n − 2]0 | · · · · · |[n − n]0 |
       = n (n − 1) · · · 1 = n!.

(d) We define the map

L : Sn → Hn ,
σ 7→ (ℓ1 (σ) , ℓ2 (σ) , . . . , ℓn (σ)) .

(This map is well-defined, since each σ ∈ Sn and each i ∈ [n] satisfy ℓi (σ) ∈
{0, 1, . . . , n − i } = [n − i ]0 .)
(e) If σ ∈ Sn is a permutation, then the n-tuple L (σ ) =
(ℓ1 (σ) , ℓ2 (σ) , . . . , ℓn (σ)) is called the Lehmer code (or just the code) of σ.

Example 5.3.7. (a) If n = 6, and if σ ∈ S6 is the permutation with one-line


notation 451263, then ℓ2 (σ) = 3 (because the numbers j ∈ [n] satisfying
2 < j and σ (2) > σ ( j) are precisely 3, 4 and 6, so that there are 3 of them)
and likewise ℓ1 (σ) = 3 and ℓ3 (σ ) = 0 and ℓ4 (σ) = 0 and ℓ5 (σ) = 1 and
ℓ6 (σ) = 0, and thus L (σ) = (3, 3, 0, 0, 1, 0) ∈ H6 .
(b) Here is a table of all 6 permutations σ ∈ S3 (written in one-line nota-
tion) and their respective Lehmer codes L (σ):

σ L (σ)
123 (0, 0, 0)
132 (0, 1, 0)
213 (1, 0, 0) .
231 (1, 1, 0)
312 (2, 0, 0)
321 (2, 1, 0)
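Definition 5.3.6 (a) and (e) give a one-line computation. Here is a Python sketch (the function name lehmer_code is ours) that reproduces both parts of Example 5.3.7:

```python
def lehmer_code(w):
    """The Lehmer code L(w) = (ℓ_1(w), ..., ℓ_n(w)) of a permutation w given
    as a tuple in one-line notation: ℓ_i(w) counts the entries appearing
    after w(i) that are smaller than w(i)."""
    n = len(w)
    return tuple(sum(1 for j in range(i + 1, n) if w[i] > w[j])
                 for i in range(n))

# Example 5.3.7 (a): the permutation 451263 has Lehmer code (3, 3, 0, 0, 1, 0).
assert lehmer_code((4, 5, 1, 2, 6, 3)) == (3, 3, 0, 0, 1, 0)

# Example 5.3.7 (b): two rows of the S_3 table.
assert lehmer_code((2, 3, 1)) == (1, 1, 0)
assert lehmer_code((3, 2, 1)) == (2, 1, 0)
```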

The Lehmer code L (σ) of a permutation σ is a refinement of its length ℓ (σ),


in the sense that it gives finer information (i.e., we can reconstruct ℓ (σ) from
L (σ)):

Proposition 5.3.8. Let n ∈ N and σ ∈ Sn . Then, ℓ (σ ) = ℓ1 (σ) + ℓ2 (σ) +


· · · + ℓ n ( σ ).

Proof. This follows from the definitions of ℓ (σ) and ℓi (σ).


The main property of Lehmer codes is that they uniquely determine permu-
tations, and in fact are in bijection with them (cf. Example 5.3.7 (b)):

Theorem 5.3.9. Let n ∈ N. Then, the map L : Sn → Hn is a bijection.

We shall sketch two ways of proving this theorem.


First proof of Theorem 5.3.9 (sketched). Let σ ∈ Sn . Let i ∈ [n]. Recall that the
OLN of σ is the n-tuple σ (1) σ (2) · · · σ (n). The definition of ℓi (σ) can be
rewritten as follows:

ℓi (σ) = (# of all j ∈ {i + 1, i + 2, . . . , n} such that σ (i ) > σ ( j))


= (# of all entries in the OLN of σ that appear
after σ (i ) but are smaller than σ (i ))
= (# of elements of [n] \ {σ (1) , σ (2) , . . . , σ (i )}
that are smaller than σ (i ))

(since the elements of [n] \ {σ (1) , σ (2) , . . . , σ (i )} are precisely the entries that
appear after σ (i ) in the OLN of σ). We can replace the set [n] \ {σ (1) , σ (2) , . . . , σ (i )}
by [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)} in this equality73 (this will not change the
# of elements of this set that are smaller than σ (i ), because it only inserts the
element σ (i ) into the set, and obviously this element σ (i ) is not smaller than
σ (i )). Thus, we obtain

ℓi (σ) = (# of elements of [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)}


that are smaller than σ (i )) . (169)

Therefore,

σ (i ) = (the (ℓi (σ) + 1) -st smallest element of


the set [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)}) (170)

(since σ (i ) is an element of [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)}).


Forget that we fixed i. We thus have proved the equality (170) for each i ∈ [n].
This equality (170) allows us to find σ (i ) if ℓi (σ) and σ (1) , σ (2) , . . . , σ (i − 1)
are known. This can be used to recover σ from L (σ). Let us perform this
recovery in an example: Let n = 5, and let σ ∈ S5 satisfy L (σ ) = (3, 1, 2, 1, 0).
What is σ ?
From L (σ ) = (3, 1, 2, 1, 0), we obtain ℓ1 (σ) = 3 and ℓ2 (σ) = 1 and ℓ3 (σ) = 2
and ℓ4 (σ ) = 1 and ℓ5 (σ) = 0.

73 If i = 1, then the set {σ (1) , σ (2) , . . . , σ (i − 1)} is empty, so that we have [n] \
{σ (1) , σ (2) , . . . , σ (i − 1)} = [n] in this case.

Applying (170) to i = 1, we find

σ (1) = (the (ℓ1 (σ) + 1) -st smallest element of the set [n] \ {σ (1) , σ (2) , . . . , σ (1 − 1)} = [n] \ ∅)
      = (the (3 + 1) -st smallest element of the set [5])      (since ℓ1 (σ) = 3 and n = 5)
      = (the 4-th smallest element of the set [5]) = 4.

Applying (170) to i = 2, we find

σ (2) = (the (ℓ2 (σ) + 1) -st smallest element of the set [n] \ {σ (1) , σ (2) , . . . , σ (2 − 1)} = [n] \ {σ (1)})
      = (the (1 + 1) -st smallest element of the set [5] \ {4})      (since ℓ2 (σ) = 1 and n = 5 and σ (1) = 4)
      = (the 2-nd smallest element of the set [5] \ {4}) = 2.

Applying (170) to i = 3, we find

σ (3) = (the (ℓ3 (σ) + 1) -st smallest element of the set [n] \ {σ (1) , σ (2) , . . . , σ (3 − 1)} = [n] \ {σ (1) , σ (2)})
      = (the (2 + 1) -st smallest element of the set [5] \ {4, 2})      (since ℓ3 (σ) = 2 and n = 5 and σ (1) = 4 and σ (2) = 2)
      = (the 3-rd smallest element of the set [5] \ {4, 2}) = 5      (since [5] \ {4, 2} = {1, 3, 5}).

Continuing like this, we find σ (4) = 3 and σ (5) = 1. Thus, the OLN of σ is
σ (1) σ (2) σ (3) σ (4) σ (5) = 42531.
This method allows us to reconstruct any σ ∈ Sn from L (σ) (and thus shows
that the map L is injective). We shall now see what happens if we apply it to
an arbitrary n-tuple j = ( j1 , j2 , . . . , jn ) ∈ Hn instead of L (σ) (that is, we replace
ℓi (σ) by ji ).
Thus, we define a map
M : Hn → Sn
as follows: If j = ( j1 , j2 , . . . , jn ) ∈ Hn , then M (j) is the map σ : [n] → [n] whose
values σ (1) , σ (2) , . . . , σ (n) are defined recursively by the rule
σ (i ) = (the ( ji + 1) -st smallest element of
the set [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)}) . (171)

This map σ
• is always well-defined (indeed, we never run out of values in the process of
constructing σ, because of the following argument: each i ∈ [n] satisfies
ji ≤ n − i 74 , and thus the (n − i + 1)-element set [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)}
has a ( ji + 1)-st smallest element75 ), and


• always is a permutation of [n] (indeed, our definition of σ (i ) ensures that
σ (i ) ∈ [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)}, so that σ (i ) ∉ {σ (1) , σ (2) , . . . , σ (i − 1)},
and therefore the map σ has no two equal values; thus, σ is injective; there-
fore, by the Pigeonhole Principle for Injections76 , the map σ must also be
bijective, i.e., a permutation of [n]).
Thus, the map M is well-defined.
We claim that the maps L and M are mutually inverse. Indeed, we already
know that M undoes L (since applying M to L (σ) produces precisely our
above-discussed algorithm for reconstructing σ from L (σ)); in other words,
we have M ◦ L = id. It is also easy to see that L ◦ M = id. Thus, the maps L
and M are inverse, so that L is bijective. This proves Theorem 5.3.9.
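The reconstruction rule (171) is exactly a "pop the (j_i + 1)-st smallest unused value" loop. Here is a Python sketch of the map M (the function names decode and lehmer_code are ours), which also checks M ∘ L = id on a small symmetric group:

```python
from itertools import permutations

def decode(code):
    """The map M from the proof: given (j_1, ..., j_n) in H_n, rule (171)
    says σ(i) is the (j_i + 1)-st smallest value not yet used."""
    n = len(code)
    remaining = list(range(1, n + 1))   # stays sorted as values are removed
    return tuple(remaining.pop(j) for j in code)

def lehmer_code(w):
    """The Lehmer code L(w) of a permutation w in one-line notation."""
    n = len(w)
    return tuple(sum(1 for j in range(i + 1, n) if w[i] > w[j])
                 for i in range(n))

# The worked example above: L(σ) = (3, 1, 2, 1, 0) gives σ = 42531.
assert decode((3, 1, 2, 1, 0)) == (4, 2, 5, 3, 1)

# M undoes L on all of S_4, illustrating the bijection of Theorem 5.3.9.
assert all(decode(lehmer_code(w)) == w
           for w in permutations(range(1, 5)))
```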
Our second proof of Theorem 5.3.9 will be less algorithmic, but it provides a
good illustration for the use of total orders. We will only outline it; the details
can be found in [Grinbe15, solution to Exercise 5.18].
This second proof relies on a total order that can be defined on the set Zn of
n-tuples of integers:

Definition 5.3.10. Let ( a1 , a2 , . . . , an ) and (b1 , b2 , . . . , bn ) be two n-tuples of


integers. We say that

( a1 , a2 , . . . , an ) <lex (b1 , b2 , . . . , bn ) (172)

if and only if

• there exists some k ∈ [n] such that ak ̸= bk , and

• the smallest such k satisfies ak < bk .

74 because ( j1 , j2 , . . . , jn ) ∈ Hn = [n − 1]0 × [n − 2]0 × · · · × [n − n]0 entails ji ∈ [n − i ]0


75 To be fully precise: We don’t know yet that [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)} is an (n − i + 1)-
element set, since we haven’t yet shown that the i − 1 elements σ (1) , σ (2) , . . . , σ (i − 1) are
distinct. However, if they are not distinct, then the set [n] \ {σ (1) , σ (2) , . . . , σ (i − 1)} has
more than n − i + 1 elements, which is just as good for our argument.
76 The Pigeonhole Principle for Injections says the following two things:

– If f : X → Y is an injective map between two finite sets X and Y, then | X | ≤ |Y |.


– If f : X → Y is an injective map between two finite sets X and Y of the same size, then f
is bijective.
We are here using the second statement.

For example, (4, 1, 2, 5) <lex (4, 1, 3, 0) and (1, 1, 0, 1) <lex (2, 0, 0, 0). The re-
lation (172) is usually pronounced as “( a1 , a2 , . . . , an ) is lexicographically smaller
than (b1 , b2 , . . . , bn )”; the word “lexicographic” comes from the idea that if num-
bers were letters, then a “word” a1 a2 · · · an would appear earlier in a dictionary
than b1 b2 · · · bn if and only if ( a1 , a2 , . . . , an ) <lex (b1 , b2 , . . . , bn ).
Now, the following is easy to see:

Proposition 5.3.11. If a and b are two distinct n-tuples of integers, then we


have either a <lex b or b <lex a.

Actually, it is not hard to show that the relation <lex is a total order on the
set Zn (known as the lexicographic order); however, Proposition 5.3.11 is the only
part of this statement that we will need.

Proposition 5.3.12. Let σ ∈ Sn and τ ∈ Sn be such that

(σ (1) , σ (2) , . . . , σ (n)) <lex (τ (1) , τ (2) , . . . , τ (n)) . (173)

Then,

(ℓ1 (σ) , ℓ2 (σ) , . . . , ℓn (σ)) <lex (ℓ1 (τ ) , ℓ2 (τ ) , . . . , ℓn (τ )) . (174)

(In other words, L (σ) <lex L (τ ).)

Proof of Proposition 5.3.12 (sketched). (See [Grinbe15, solution to Exercise 5.18,


proof of Proposition 5.50] for details.) The assumption (173) shows that there
exists some k ∈ [n] such that σ (k ) ̸= τ (k ), and that the smallest such k satisfies
σ (k) < τ (k). Consider this smallest k. Thus, σ (i ) = τ (i ) for each i ∈ [k − 1]
(since k is smallest with σ (k ) ̸= τ (k )). Hence, using (169), we can easily see
that
ℓi ( σ ) = ℓi ( τ ) for each i ∈ [k − 1] . (175)
Let Z be the set

[n] \ {σ (1) , σ (2) , . . . , σ (k − 1)} = [n] \ {τ (1) , τ (2) , . . . , τ (k − 1)} .

Then, (169) (applied to i = k) yields that

ℓk (σ) = (# of elements of Z that are smaller than σ (k)) ,

and similarly we have

ℓk (τ ) = (# of elements of Z that are smaller than τ (k)) .

From these two equalities, we can easily see that ℓk (σ ) < ℓk (τ ). (In fact, any
element of Z that is smaller than σ (k) must also be smaller than τ (k ) (since
σ (k ) < τ (k)), but there is at least one element of Z that is smaller than τ (k )

but not smaller than σ (k) (namely, the element σ (k )). Hence, there are fewer
elements of Z that are smaller than σ (k ) than there are elements of Z that are
smaller than τ (k).)
Combining (175) with ℓk (σ ) < ℓk (τ ), we obtain (ℓ1 (σ) , ℓ2 (σ) , . . . , ℓn (σ)) <lex
(ℓ1 (τ ) , ℓ2 (τ ) , . . . , ℓn (τ )) (by Definition 5.3.10). This proves Proposition 5.3.12.
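Incidentally, Python's built-in comparison of integer tuples of equal length is exactly the relation <lex of Definition 5.3.10, so Proposition 5.3.12 can be spot-checked directly (the name lehmer_code is ours):

```python
from itertools import permutations

# Python's tuple comparison agrees with <_lex from Definition 5.3.10:
assert (4, 1, 2, 5) < (4, 1, 3, 0)
assert (1, 1, 0, 1) < (2, 0, 0, 0)

def lehmer_code(w):
    """The Lehmer code L(w) of a permutation w in one-line notation."""
    n = len(w)
    return tuple(sum(1 for j in range(i + 1, n) if w[i] > w[j])
                 for i in range(n))

# Proposition 5.3.12: L is strictly monotone with respect to <_lex (here
# checked on all of S_4).
perms = list(permutations(range(1, 5)))
assert all(lehmer_code(a) < lehmer_code(b)
           for a in perms for b in perms if a < b)
```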

Now, we can easily finish our second proof of Theorem 5.3.9:


Second proof of Theorem 5.3.9 (sketched). We shall first show that the map L is in-
jective.
Indeed, let σ and τ be two distinct permutations in Sn . Then, the two n-
tuples (σ (1) , σ (2) , . . . , σ (n)) and (τ (1) , τ (2) , . . . , τ (n)) (that is, the OLNs of
σ and τ) are distinct as well. Hence, Proposition 5.3.11 yields that we have either
(σ (1) , σ (2) , . . . , σ (n)) <lex (τ (1) , τ (2) , . . . , τ (n)) or (τ (1) , τ (2) , . . . , τ (n)) <lex
(σ (1) , σ (2) , . . . , σ (n)). In the first case, we obtain L (σ) <lex L (τ ) (by Propo-
sition 5.3.12); in the second case, we likewise obtain L (τ ) <lex L (σ). In either
case, we thus conclude that L (σ) ̸= L (τ ).
Forget that we fixed σ and τ. We thus have shown that if σ and τ are two
distinct permutations in Sn , then L (σ ) ̸= L (τ ). In other words, the map L :
Sn → Hn is injective. However, L is a map between two finite sets of the same
size (indeed, |Sn | = n! = | Hn |). Thus, the Pigeonhole Principle for Injections
shows that L is bijective (since L is injective). This proves Theorem 5.3.9 again.

Now, we can prove Proposition 5.3.5 at last:


Proof of Proposition 5.3.5. Each σ ∈ Sn satisfies

ℓ ( σ ) = ℓ1 ( σ ) + ℓ2 ( σ ) + · · · + ℓ n ( σ ) (by Proposition 5.3.8)


= (sum of the entries of L (σ))

(since L (σ ) = (ℓ1 (σ ) , ℓ2 (σ ) , . . . , ℓn (σ ))). Thus,

∑_{σ∈Sn} x^{ℓ(σ)} = ∑_{σ∈Sn} x^{(sum of the entries of L(σ))}

= ∑_{(j1 ,j2 ,...,jn )∈Hn} x^{j1 + j2 +···+ jn} = ∑_{(j1 ,j2 ,...,jn )∈Hn} x^{j1} x^{j2} · · · x^{jn}

(here, we have substituted ( j1 , j2 , . . . , jn ) for L (σ) in
the sum, since the map L : Sn → Hn is a bijection)

= ∑_{(j1 ,j2 ,...,jn )∈[n−1]0 ×[n−2]0 ×···×[n−n]0} x^{j1} x^{j2} · · · x^{jn}

(since Hn = [n − 1]0 × [n − 2]0 × · · · × [n − n]0 )

= (∑_{j1 ∈[n−1]0} x^{j1}) (∑_{j2 ∈[n−2]0} x^{j2}) · · · (∑_{jn ∈[n−n]0} x^{jn})

(by the product rule (120))

= (∑_{j1 =0}^{n−1} x^{j1}) (∑_{j2 =0}^{n−2} x^{j2}) · · · (∑_{jn =0}^{n−n} x^{jn})

(since [m]0 = {0, 1, . . . , m} for any m ∈ Z)

= (1 + x + x^2 + · · · + x^{n−1}) (1 + x + x^2 + · · · + x^{n−2}) · · · (1 + x) · 1

= (1 + x) (1 + x + x^2) (1 + x + x^2 + x^3) · · · (1 + x + x^2 + · · · + x^{n−1})

= ∏_{i=1}^{n−1} (1 + x + x^2 + · · · + x^i) = [n]x !

(the last equality sign here is easy to check). This proves Proposition 5.3.5.

5.3.3. More about lengths and simples


Let us continue studying lengths of permutations.

Proposition 5.3.13. Let n ∈ N and σ ∈ Sn . Then, ℓ (σ−1) = ℓ (σ).




Proof of Proposition 5.3.13 (sketched). (See [Grinbe15, Exercise 5.2 (f)] for details.)
Recall that ℓ (σ) is the # of inversions of σ, while ℓ (σ−1) is the # of inversions
of σ−1 .

Recall also that an inversion of σ is a pair (i, j) ∈ [n] × [n] such that i < j and
σ (i ) > σ ( j). Likewise, an inversion of σ−1 is a pair (u, v) ∈ [n] × [n] such that
u < v and σ−1 (u) > σ−1 (v).
Thus, if (i, j) is an inversion of σ, then (σ ( j) , σ (i )) is an inversion of σ−1 .

Hence, we obtain a map

{inversions of σ} → {inversions of σ−1} ,
(i, j) 7→ (σ ( j) , σ (i )) .

This map is furthermore bijective (indeed, it has an inverse map, which sends
each (u, v) ∈ {inversions of σ−1} to (σ−1 (v) , σ−1 (u))). Thus, the bijection
principle yields

(# of inversions of σ) = (# of inversions of σ−1) .

In other words, ℓ (σ) = ℓ (σ−1). This proves Proposition 5.3.13.
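Proposition 5.3.13 is easy to test exhaustively for small n. A minimal Python sketch (the names length and inverse are ours):

```python
from itertools import permutations

def length(w):
    """ℓ(w): number of inversions of the permutation w (one-line tuple)."""
    n = len(w)
    return sum(1 for i in range(n) for j in range(i + 1, n) if w[i] > w[j])

def inverse(w):
    """Inverse permutation: if w(i) = v, then w^{-1}(v) = i."""
    inv = [0] * len(w)
    for i, v in enumerate(w, start=1):
        inv[v - 1] = i
    return tuple(inv)

# Proposition 5.3.13 on all of S_5:
assert all(length(inverse(w)) == length(w)
           for w in permutations(range(1, 6)))
```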




The following lemma is crucial for understanding lengths of permutations:

Lemma 5.3.14 (single swap lemma). Let n ∈ N, σ ∈ Sn and k ∈ [n − 1]. Then:

(a) We have

ℓ (σsk ) = { ℓ (σ) + 1, if σ (k) < σ (k + 1) ; ℓ (σ) − 1, if σ (k) > σ (k + 1) } .

(b) We have

ℓ (sk σ) = { ℓ (σ) + 1, if σ−1 (k) < σ−1 (k + 1) ; ℓ (σ) − 1, if σ−1 (k) > σ−1 (k + 1) } .

[Note: If i ∈ [n], then σ (i ) is the entry in position i of the one-line notation


of σ, whereas σ−1 (i ) is the position in which the number i appears in the
one-line notation of σ. For example, if σ = 512364 in one-line notation, then
σ (6) = 4 and σ−1 (6) = 5.]

We will only outline the proof of Lemma 5.3.14; a detailed proof can be found
in [Grinbe15, Exercise 5.2 (a)] (although not completely the same proof).
Proof of Lemma 5.3.14 (sketched). (b) The OLN77 of sk σ is obtained from the OLN
of σ by swapping the two entries k and k + 1. This is best seen on an example:
For example, if σ = 512364 (in OLN), then s3 σ = 512463. In general, this
follows by observing that

(sk σ) (i ) = sk (σ (i )) = { k + 1, if σ (i ) = k ; k, if σ (i ) = k + 1 ; σ (i ) , otherwise }      for each i ∈ [n] .

77 Recall that “OLN” means “one-line notation”.



Let us now use this observation to see how the inversions of sk σ differ from
the inversions of σ. Indeed, let us call an inversion (i, j) of a permutation τ
exceptional if we have τ (i ) = k + 1 and τ ( j) = k. All other inversions of τ will
be called non-exceptional.
Now, we make the following observation:

Observation 1: If (i, j) is any non-exceptional inversion of σ, then (i, j)


is still a non-exceptional inversion of sk σ.

[Proof of Observation 1: Let (i, j) be a non-exceptional inversion of σ. Thus,


we have the inequality σ (i ) > σ ( j) (since (i, j) is an inversion of σ). This
inequality cannot get reversed by applying sk to both its sides (i.e., we cannot
have sk (σ (i )) < sk (σ ( j))), since the only pair (u, v) ∈ [n] × [n] satisfying u > v
and sk (u) < sk (v) is the pair (k + 1, k) (but our pair (σ (i ) , σ ( j)) cannot equal
this pair (k + 1, k ), since the inversion (i, j) of σ is non-exceptional). Hence,
we have sk (σ (i )) ≥ sk (σ ( j)). In other words, (sk σ ) (i ) ≥ (sk σ) ( j). Since the
map sk σ is injective (being a permutation of [n]), we thus obtain (sk σ) (i ) >
(sk σ) ( j) (since i ̸= j). Thus, the pair (i, j) is an inversion of sk σ. Moreover, this
inversion (i, j) is non-exceptional (since otherwise we would have (sk σ) (i ) =
k + 1 and (sk σ) ( j) = k, which would lead to σ (i ) = k and σ ( j) = k + 1, which
would contradict σ (i ) > σ ( j)). Thus, we have shown that (i, j) is still a non-
exceptional inversion of sk σ. This proves Observation 1.]
Similarly to Observation 1, we can prove the following:

Observation 2: If (i, j) is any non-exceptional inversion of sk σ, then


(i, j) is still a non-exceptional inversion of σ.

(Alternatively, we can obtain Observation 2 by applying Observation 1 to sk σ
instead of σ, since we have sk (sk σ) = s2k σ = σ.)
Combining Observation 1 with Observation 2, we see that the non-exceptional
inversions of sk σ are precisely the non-exceptional inversions of σ. Hence,

(# of non-exceptional inversions of sk σ)
= (# of non-exceptional inversions of σ) . (176)

What about the exceptional inversions? A permutation τ ∈ Sn has a unique


exceptional inversion if k appears after k + 1 in the OLN of τ (that is, if we have
τ −1 (k ) > τ −1 (k + 1)); otherwise, it has none. Thus:

• If σ−1 (k ) < σ−1 (k + 1), then the permutation sk σ has a unique excep-
tional inversion, whereas the permutation σ has none.

• If σ−1 (k) > σ−1 (k + 1), then the permutation σ has a unique exceptional
inversion, whereas the permutation sk σ has none.

Thus,

(# of exceptional inversions of sk σ)
= (# of exceptional inversions of σ)
+ { 1, if σ−1 (k) < σ−1 (k + 1) ; −1, if σ−1 (k) > σ−1 (k + 1) } .      (177)

Now, recall that each inversion of a permutation τ ∈ Sn is either exceptional


or non-exceptional (and cannot be both at the same time). Thus, adding to-
gether the two equalities (177) and (176), we obtain

(# of inversions of sk σ)
= (# of inversions of σ) + { 1, if σ−1 (k) < σ−1 (k + 1) ; −1, if σ−1 (k) > σ−1 (k + 1) }
= { (# of inversions of σ) + 1, if σ−1 (k) < σ−1 (k + 1) ; (# of inversions of σ) − 1, if σ−1 (k) > σ−1 (k + 1) } .

In other words,

ℓ (sk σ) = { ℓ (σ) + 1, if σ−1 (k) < σ−1 (k + 1) ; ℓ (σ) − 1, if σ−1 (k) > σ−1 (k + 1) }

(since ℓ (τ ) denotes the # of inversions of any permutation τ). This proves


Lemma 5.3.14 (b).
(a) Applying Lemma 5.3.14 (b) to σ−1 instead of σ, we obtain

ℓ (sk σ−1) = { ℓ (σ−1) + 1, if (σ−1)−1 (k) < (σ−1)−1 (k + 1) ; ℓ (σ−1) − 1, if (σ−1)−1 (k) > (σ−1)−1 (k + 1) }
           = { ℓ (σ) + 1, if σ (k) < σ (k + 1) ; ℓ (σ) − 1, if σ (k) > σ (k + 1) }      (178)

(since (σ−1)−1 = σ, and since Proposition 5.3.13 yields ℓ (σ−1) = ℓ (σ)).
However, Proposition 5.3.13 (applied to σsk instead of σ) yields ℓ ((σsk )−1) = ℓ (σsk ).
In view of (σsk )−1 = sk−1 σ−1 = sk σ−1 (since sk−1 = sk ), this rewrites as
ℓ (sk σ−1) = ℓ (σsk ). Comparing this with (178), we obtain

ℓ (σsk ) = { ℓ (σ) + 1, if σ (k) < σ (k + 1) ; ℓ (σ) − 1, if σ (k) > σ (k + 1) } .

This proves Lemma 5.3.14 (a).
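Both parts of the single swap lemma can be checked exhaustively for a small n. A Python sketch (helper names are ours):

```python
from itertools import permutations

def length(w):
    """ℓ(w): number of inversions of w (one-line tuple)."""
    n = len(w)
    return sum(1 for i in range(n) for j in range(i + 1, n) if w[i] > w[j])

def compose(a, b):
    """(a ∘ b)(k) = a(b(k))."""
    return tuple(a[b[k] - 1] for k in range(len(a)))

def s(i, n):
    """The simple transposition s_i in S_n."""
    w = list(range(1, n + 1))
    w[i - 1], w[i] = w[i], w[i - 1]
    return tuple(w)

def inverse(w):
    inv = [0] * len(w)
    for i, v in enumerate(w, start=1):
        inv[v - 1] = i
    return tuple(inv)

n = 5
for w in permutations(range(1, n + 1)):
    for k in range(1, n):
        # part (a): ℓ(w s_k) = ℓ(w) ± 1 according to whether w(k) < w(k+1)
        assert length(compose(w, s(k, n))) == \
            length(w) + (1 if w[k - 1] < w[k] else -1)
        # part (b): ℓ(s_k w) = ℓ(w) ± 1 according to w^{-1}(k) vs w^{-1}(k+1)
        wi = inverse(w)
        assert length(compose(s(k, n), w)) == \
            length(w) + (1 if wi[k - 1] < wi[k] else -1)
```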



Lemma 5.3.14 answers what happens to the length of a permutation when


we compose it (from the left or the right) with a simple transposition sk . What
happens when we compose it with a non-simple transposition? The situation
is more complicated, but it is still true that the length decreases or increases
depending on whether the two entries that are being swapped formed an in-
version or not. Here is the exact answer (stated only for σti,j , but a version for
ti,j σ can easily be derived from it):

Proposition 5.3.15. Let n ∈ N and σ ∈ Sn . Let i and j be two elements of [n]


such that i < j. Then:
 
(a) We have ℓ (σti,j ) < ℓ (σ) if σ (i ) > σ ( j). We have ℓ (σti,j ) > ℓ (σ) if
σ ( i ) < σ ( j ).
(b) We have

ℓ (σti,j ) = { ℓ (σ) − 2 | Q| − 1, if σ (i ) > σ ( j) ; ℓ (σ) + 2 | R| + 1, if σ (i ) < σ ( j) } ,

where

Q = {k ∈ {i + 1, i + 2, . . . , j − 1} | σ (i ) > σ (k) > σ ( j)} and


R = {k ∈ {i + 1, i + 2, . . . , j − 1} | σ (i ) < σ (k) < σ ( j)} .

Proof of Proposition 5.3.15 (sketched). (b) This follows by a diligent analysis of the
possible interactions between an inversion and composition by ti,j . To be more
concrete:

• The fact that ℓ (σti,j ) = ℓ (σ) − 2 | Q| − 1 when σ (i ) > σ ( j) is [Grinbe15,
Exercise 5.20]. A straightforward solution was given by Elafandi in [18f-hw4se].
(The solution given in [Grinbe15, Exercise 5.20] is more circuitous, as it
uses summation tricks to bypass case distinctions.)

• The fact that ℓ (σti,j ) = ℓ (σ) + 2 | R| + 1 when σ (i ) < σ ( j) follows by
applying the previous fact to σti,j instead of σ. (Indeed, if σ (i ) < σ ( j),
then (σti,j ) (i ) = σ ( j) > σ (i ) = (σti,j ) ( j) and (σti,j ) ti,j = σ (since t2i,j = id).
Moreover, when we replace σ by σti,j , the sets Q and R trade places.)

(a) This follows immediately from part (b).
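Proposition 5.3.15 (b) can also be verified exhaustively for a small n. The following Python sketch (helper names are ours; note that right-multiplying by ti,j swaps the entries in positions i and j of the one-line notation) checks both cases:

```python
from itertools import permutations

def length(w):
    """ℓ(w): number of inversions of w (one-line tuple)."""
    n = len(w)
    return sum(1 for a in range(n) for b in range(a + 1, n) if w[a] > w[b])

def right_mult_t(w, i, j):
    """w ∘ t_{i,j}: swaps the entries in positions i and j of the OLN of w."""
    v = list(w)
    v[i - 1], v[j - 1] = v[j - 1], v[i - 1]
    return tuple(v)

n = 5
for w in permutations(range(1, n + 1)):
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            Q = sum(1 for k in range(i + 1, j)
                    if w[i - 1] > w[k - 1] > w[j - 1])
            R = sum(1 for k in range(i + 1, j)
                    if w[i - 1] < w[k - 1] < w[j - 1])
            lhs = length(right_mult_t(w, i, j))
            if w[i - 1] > w[j - 1]:
                assert lhs == length(w) - 2 * Q - 1
            else:
                assert lhs == length(w) + 2 * R + 1
```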


Now we come to one of the main facts about permutations of a finite set.

Convention 5.3.16. We recall that a simple transposition in Sn means one of the


n − 1 transpositions s1 , s2 , . . . , sn−1 . We shall occasionally abbreviate “simple
transposition” as “simple”.

Theorem 5.3.17 (1st reduced word theorem for the symmetric group). Let
n ∈ N and σ ∈ Sn . Then:
(a) We can write σ as a composition (i.e., product) of ℓ (σ ) simples.
(b) The number ℓ (σ ) is the smallest p ∈ N such that we can write σ as a
composition of p simples.
[Keep in mind: The composition of 0 simples is id, since id is the neutral
element of the group Sn .]

Example 5.3.18. Let σ ∈ S4 be the permutation 4132 (in OLN). How can we
write σ as a composition of simples? There are several ways to do this; for
example,

σ = s2 s3 s2 s1 = s3 s2 s3 s1 = s3 s2 s1 s3 = s2 s1 s1 s3 s2 s1 = s2 s1 s3 s1 s2 s1 = · · ·

(using, for instance, the braid relation s2 s3 s2 = s3 s2 s3 and the commutation s3 s1 = s1 s3 ).

The shortest of these representations involve 4 simples, precisely as predicted


by Theorem 5.3.17 (since ℓ (σ) = 4).
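The induction in the upcoming proof of Theorem 5.3.17 (a) is essentially bubble sort: repeatedly right-multiply by s_k at a descent, which swaps two adjacent entries of the one-line notation. Here is a Python sketch of that procedure (function names are ours), which recovers exactly the decomposition s2 s3 s2 s1 of Example 5.3.18:

```python
from functools import reduce
from itertools import permutations

def length(w):
    n = len(w)
    return sum(1 for i in range(n) for j in range(i + 1, n) if w[i] > w[j])

def compose(a, b):
    """(a ∘ b)(k) = a(b(k))."""
    return tuple(a[b[k] - 1] for k in range(len(a)))

def s(i, n):
    w = list(range(1, n + 1))
    w[i - 1], w[i] = w[i], w[i - 1]
    return tuple(w)

def reduced_word(w):
    """Indices (i_1, ..., i_p) with w = s_{i_1} ... s_{i_p} and p = ℓ(w):
    repeatedly right-multiply by s_k at a descent w(k) > w(k+1) (a swap of
    two adjacent OLN entries) until the identity is reached."""
    v, swaps = list(w), []
    done = False
    while not done:
        done = True
        for k in range(len(v) - 1):
            if v[k] > v[k + 1]:
                v[k], v[k + 1] = v[k + 1], v[k]
                swaps.append(k + 1)
                done = False
    # w s_{k_1} ... s_{k_m} = id, hence w = s_{k_m} ... s_{k_1}
    return swaps[::-1]

# Example 5.3.18: σ = 4132 = s_2 s_3 s_2 s_1, a product of ℓ(σ) = 4 simples.
assert reduced_word((4, 1, 3, 2)) == [2, 3, 2, 1]

# Theorem 5.3.17 (a) on all of S_4: the word has exactly ℓ(w) letters and
# multiplies back to w.
n = 4
identity = tuple(range(1, n + 1))
for w in permutations(range(1, n + 1)):
    word = reduced_word(w)
    assert len(word) == length(w)
    assert reduce(compose, (s(i, n) for i in word), identity) == w
```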

Before we prove Theorem 5.3.17, let me mention a geometric visualization


of the symmetric group that will not be used in what follows, but sheds some
light on the theorem and on the role of simple transpositions:

Remark 5.3.19. Let n ∈ N. Then, the permutations of [n] can be represented


as the vertices of an (n − 1)-dimensional polytope in n-dimensional space.
Namely, each permutation σ of [n] gives rise to the point

Vσ := (σ (1) , σ (2) , . . . , σ (n)) ∈ Rn .

The convex hull of all these n! many points Vσ (for σ ∈ Sn ) is a polytope (i.e.,
a bounded convex polyhedron in Rn ). This polytope is known as the per-
mutahedron (corresponding to n). It is actually (n − 1)-dimensional, since
all its vertices lie on the hyperplane with equation x1 + x2 + · · · + xn =
1 + 2 + · · · + n. It can be shown (see, e.g., [GaiGup77]) that:

• The vertices of this polytope are precisely the n! points Vσ with σ ∈ Sn .

• Two vertices Vσ and Vτ are joined by an edge if and only if σ = sk τ for


some k ∈ [n − 1].

The (intuitively obvious) fact that any two vertices of a polytope can be con-
nected by a sequence of edges therefore yields that any σ ∈ Sn can be written
as a product of simples. This is a weaker version of Theorem 5.3.17 (a).

We refer to textbooks on discrete geometry and geometric combinatorics


for more about polytopes and permutahedra in particular. Let me here just
show the permutahedra for n = 3 and for n = 4 (note that the permutahe-
dron for n = 2 is a boring line segment in R2 ):

• The permutahedron for n = 3 is a regular hexagon:

[Figure: two pictures of a regular hexagon with vertices (1, 2, 3), (2, 1, 3), (1, 3, 2), (2, 3, 1), (3, 1, 2), (3, 2, 1).]

The picture on the left (courtesy of tex.stackexchange user Jake, re-


leased under the MIT license) shows the permutahedron embedded in
R3 ; the picture on the right is a view from an orthogonal direction.

• The permutahedron for n = 4 is a truncated octahedron:

[Figure: a truncated octahedron whose 24 vertices are labeled by the permutations of (1, 2, 3, 4); picture courtesy of David Eppstein on the Wikipedia.]

Let us now outline a proof of Theorem 5.3.17.


Proof of Theorem 5.3.17 (sketched). (a) (See [Grinbe15, Exercise 5.2 (e)] for details.)
We proceed by induction on ℓ (σ):
Induction base: If ℓ (σ) = 0, then σ = id, so that we can write σ as a composi-
tion of 0 simples.
Induction step: Fix h ∈ N. Assume (as the IH78 ) that Theorem 5.3.17 (a) holds
for ℓ (σ) = h.
Now, let σ ∈ Sn be such that ℓ (σ ) = h + 1. We must prove that we can write
σ as a composition of ℓ (σ) simples.
We have ℓ (σ) = h + 1 > h ≥ 0; hence, σ has at least one inversion. Thus,
we cannot have σ (1) ≤ σ (2) ≤ · · · ≤ σ (n). In other words, there exists some
k ∈ [n − 1] such that σ (k ) > σ (k + 1). Let us fix such a k.

78 “IH” means “induction hypothesis”.



Lemma 5.3.14 (a) yields


    ℓ (σsk ) = { ℓ (σ ) + 1, if σ (k ) < σ (k + 1) ;
              { ℓ (σ ) − 1, if σ (k ) > σ (k + 1)
             = ℓ (σ ) − 1    (since σ (k ) > σ (k + 1))
             = h    (since ℓ (σ ) = h + 1) .

Thus, the IH tells us that we can apply Theorem 5.3.17 (a) to σsk instead of σ.
We conclude that we can write σsk as a composition of ℓ (σsk ) simples. In other
words, we can write σsk as a composition of h simples (since ℓ (σsk ) = h). That
is, we have

σsk = si1 si2 · · · sih for some i1 , i2 , . . . , ih ∈ [n − 1] .

Consider these i1 , i2 , . . . , ih . Now,

    σ = (σsk ) sk⁻¹ = si1 si2 · · · sih sk    (since σsk = si1 si2 · · · sih and sk⁻¹ = sk ) .

This shows that we can write σ as a composition of h + 1 simples. In other


words, we can write σ as a composition of ℓ (σ) simples (since ℓ (σ ) = h + 1).
Thus, Theorem 5.3.17 (a) holds for ℓ (σ ) = h + 1. This completes the induction
step, and so Theorem 5.3.17 (a) is proved by induction.
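The induction step above is constructive: starting from σ, repeatedly pick a descent k (a position with σ (k) > σ (k + 1)) and pass to σsk, recording the simples used. Here is a minimal Python sketch of this procedure (the function names are mine, not from the text); permutations are given in one-line notation as lists.

```python
from functools import reduce

def length(sigma):
    """ell(sigma): the number of inversions of sigma,
    given in one-line notation (sigma[i] is the image of i + 1)."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])

def simple(n, k):
    """One-line notation of the simple transposition s_k in S_n (1 <= k <= n-1)."""
    t = list(range(1, n + 1))
    t[k - 1], t[k] = t[k], t[k - 1]
    return t

def compose(a, b):
    """(a o b)(i) = a(b(i)), in one-line notation."""
    return [a[b[i] - 1] for i in range(len(a))]

def reduced_word(sigma):
    """Indices k_1, ..., k_h with sigma = s_{k_1} s_{k_2} ... s_{k_h}
    and h = length(sigma), found as in the induction step of the proof:
    at each descent, right-multiply by the corresponding simple."""
    sigma = list(sigma)
    word = []
    while True:
        for k in range(len(sigma) - 1):
            if sigma[k] > sigma[k + 1]:       # descent at position k + 1
                # right-multiplying by s_{k+1} swaps these two entries
                sigma[k], sigma[k + 1] = sigma[k + 1], sigma[k]
                word.append(k + 1)
                break
        else:                                  # no descent left: sigma = id
            # sigma * s_{k_1} * ... * s_{k_m} = id, so sigma = s_{k_m} ... s_{k_1}
            return word[::-1]

sigma = [3, 1, 4, 2]
word = reduced_word(sigma)                     # [2, 3, 1]: sigma = s2 s3 s1
assert len(word) == length(sigma)
assert reduce(compose, (simple(4, k) for k in word), [1, 2, 3, 4]) == sigma
```

By Theorem 5.3.17 (b), the word produced has the minimum possible number of letters, namely ℓ (σ).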
(b) (See [Grinbe15, Exercise 5.2 (g)] for details.)
We already know from Theorem 5.3.17 (a) that we can write σ as a compo-
sition of ℓ (σ ) simples. It thus remains to show that we cannot write σ as a
composition of fewer than ℓ (σ) simples.
This will clearly follow if we can show that
 
    ℓ (si1 si2 · · · sig ) ≤ g    (179)

for any i1 , i2 , . . . , i g ∈ [n − 1].


In order to prove (179), we first make a simple observation: For any σ ∈ Sn
and any k ∈ [n − 1], we have

ℓ (σsk ) ≤ ℓ (σ) + 1. (180)

(This follows from Lemma 5.3.14 (a), since both numbers ℓ (σ ) + 1 and ℓ (σ) − 1

are ≤ ℓ (σ ) + 1.) Now, for any i1 , i2 , . . . , i g ∈ [n − 1], we have


 
    ℓ (si1 si2 · · · sig )
      ≤ ℓ (si1 si2 · · · sig−1 ) + 1    (by (180))
      ≤ ℓ (si1 si2 · · · sig−2 ) + 2    (since (180) yields ℓ (si1 si2 · · · sig−1 ) ≤ ℓ (si1 si2 · · · sig−2 ) + 1)
      ≤ ℓ (si1 si2 · · · sig−3 ) + 3    (since (180) yields ℓ (si1 si2 · · · sig−2 ) ≤ ℓ (si1 si2 · · · sig−3 ) + 1)
      ≤ · · ·
      ≤ ℓ (id) + g    (since the product of 0 simples is id)
      = 0 + g = g    (since ℓ (id) = 0) .

This proves (179), and thus concludes our proof of Theorem 5.3.17 (b).

Corollary 5.3.20. Let n ∈ N.


(a) We have ℓ (στ ) ≡ ℓ (σ) + ℓ (τ ) mod 2 for all σ ∈ Sn and τ ∈ Sn .
(b) We have ℓ (στ ) ≤ ℓ (σ) + ℓ (τ ) for all σ ∈ Sn and τ ∈ Sn .
(c) Let k1 , k2 , . . . , k q ∈ [n − 1], and let σ = sk1 sk2 · · · skq . Then, q ≥ ℓ (σ ) and
q ≡ ℓ (σ) mod 2.

Example 5.3.21. Let n = 4. Consider the two permutations σ = 3214 and


τ = 3142 (both written in one-line notation). Then, ℓ (σ) = 3 and ℓ (τ ) = 3.
Now, the permutation στ = 1342 has length ℓ (στ ) = 2.
Corollary 5.3.20 (a) says ℓ (στ ) ≡ ℓ (σ ) + ℓ (τ ) mod 2. In other words, 2 ≡
3 + 3 mod 2.
Corollary 5.3.20 (b) says ℓ (στ ) ≤ ℓ (σ ) + ℓ (τ ). In other words, 2 ≤ 3 + 3.
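For readers who want to experiment, here is a short Python check of parts (a) and (b) of Corollary 5.3.20 by brute force over S4, together with the example above; the helper names are mine, not from the text.

```python
from itertools import permutations

def length(sigma):
    """Number of inversions of sigma (one-line notation)."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])

def compose(a, b):
    """(a o b)(i) = a(b(i)), in one-line notation."""
    return tuple(a[b[i] - 1] for i in range(len(a)))

# Parts (a) and (b) of Corollary 5.3.20, brute-forced over all of S_4:
for s in permutations(range(1, 5)):
    for t in permutations(range(1, 5)):
        st = compose(s, t)
        assert length(st) <= length(s) + length(t)            # part (b)
        assert (length(s) + length(t) - length(st)) % 2 == 0  # part (a)

# The example above: sigma = 3214 and tau = 3142 give sigma tau = 1342.
sigma, tau = (3, 2, 1, 4), (3, 1, 4, 2)
print(compose(sigma, tau), length(compose(sigma, tau)))  # (1, 3, 4, 2) 2
```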

Proof of Corollary 5.3.20 (sketched). (a) (See [Grinbe15, Exercise 5.2 (b)] for de-
tails.)
For any σ ∈ Sn and any k ∈ [n − 1], we have

ℓ (σsk ) ≡ ℓ (σ) + 1 mod 2. (181)

(This follows from Lemma 5.3.14 (a), since both numbers ℓ (σ) + 1 and ℓ (σ) − 1
are congruent to ℓ (σ) + 1 modulo 2.)
Now, let σ ∈ Sn and τ ∈ Sn . Theorem 5.3.17 (a) yields that we can write τ as a
composition of ℓ (τ ) simples. In other words, we can write τ as τ = sk1 sk2 · · · skq

for some k1 , k2 , . . . , k q ∈ [n − 1], where q = ℓ (τ ). Consider these k1 , k2 , . . . , k q .


Now,
   
    ℓ (στ ) = ℓ (σsk1 sk2 · · · skq )    (since τ = sk1 sk2 · · · skq )
      ≡ ℓ (σsk1 sk2 · · · skq−1 ) + 1 mod 2    (by (181))
      ≡ ℓ (σsk1 sk2 · · · skq−2 ) + 2 mod 2    (since (181) yields ℓ (σsk1 sk2 · · · skq−1 ) ≡ ℓ (σsk1 sk2 · · · skq−2 ) + 1 mod 2)
      ≡ ℓ (σsk1 sk2 · · · skq−3 ) + 3 mod 2    (since (181) yields ℓ (σsk1 sk2 · · · skq−2 ) ≡ ℓ (σsk1 sk2 · · · skq−3 ) + 1 mod 2)
      ≡ · · ·
      ≡ ℓ (σ ) + q mod 2 = ℓ (σ ) + ℓ (τ ) mod 2    (since q = ℓ (τ )) .

This proves Corollary 5.3.20 (a).


(b) (See [Grinbe15, Exercise 5.2 (c)] for details.)
This is analogous to the proof of Corollary 5.3.20 (a) (but using inequalities
instead of congruences, and using (180) instead of (181)).
(c) Let k1 , k2 , . . . , k q ∈ [n − 1], and let σ = sk1 sk2 · · · skq . We must prove that
q ≥ ℓ (σ) and q ≡ ℓ (σ) mod 2.
From σ = sk1 sk2 · · · skq , we obtain
 
    ℓ (σ ) = ℓ (sk1 sk2 · · · skq )
      ≡ ℓ (sk1 sk2 · · · skq−1 ) + 1 mod 2    (by (181))
      ≡ ℓ (sk1 sk2 · · · skq−2 ) + 2 mod 2    (since (181) yields ℓ (sk1 sk2 · · · skq−1 ) ≡ ℓ (sk1 sk2 · · · skq−2 ) + 1 mod 2)
      ≡ ℓ (sk1 sk2 · · · skq−3 ) + 3 mod 2    (since (181) yields ℓ (sk1 sk2 · · · skq−2 ) ≡ ℓ (sk1 sk2 · · · skq−3 ) + 1 mod 2)
      ≡ · · ·
      ≡ ℓ (id) + q mod 2    (since the product of 0 simples is id)
      = 0 + q = q mod 2    (since ℓ (id) = 0) .
In other words, q ≡ ℓ (σ ) mod 2. A similar argument (but using inequalities
instead of congruences, and using (180) instead of (181)) shows that q ≥ ℓ (σ).
Thus, Corollary 5.3.20 (c) is proved.

Corollary 5.3.22. Let n ∈ N. Then, the group Sn is generated by the simples


s1 , s2 , . . . , sn−1 .

Proof. This follows directly from Theorem 5.3.17 (a).


Theorem 5.3.17 (a) shows that every permutation σ ∈ Sn can be represented
as a product of ℓ (σ) simples (and in most cases, this can be done in many
different ways). It turns out that there is a rather explicit way to find such a
representation:

Remark 5.3.23. Let n ∈ N and σ ∈ Sn . Let us represent the Lehmer code of


σ visually as follows:
We draw an (empty) n × n-matrix.
For each i ∈ [n], we put a cross × into the cell (i, σ (i )) of the matrix.
In the following, I will use the case n = 6 and σ = 513462 (in one-line
notation) as a running example. In this case, the matrix looks as follows:
[Figure: an empty 6 × 6 matrix with a × in each of the cells (1, 5), (2, 1), (3, 3), (4, 4), (5, 6), (6, 2).]
Now, starting from each ×, we draw a vertical ray downwards and a hori-
zontal ray eastwards. I will call these two rays the Lehmer lasers. Here is how
the rays look in our running example:

Now, we draw a little circle ◦ into each cell that is not hit by any laser.

Here is where the circles end up in our example:
[Figure: the matrix above with a ◦ in each cell not hit by a laser; there are ℓ (σ ) = 7 such cells.]
This picture is called the Rothe diagram of σ.


Explicitly, a cell (i, j) has a ◦ in it if and only if
σ (i ) > j and σ−1 ( j) > i
(indeed, the vertical laser in column j hits cell (i, j) if and only if σ−1 ( j) ≤ i,
whereas the horizontal laser in row i hits cell (i, j) if and only if σ (i ) ≤ j).
If we substitute σ ( j) for j in this statement, then we obtain the following:
A cell (i, σ ( j)) has a ◦ in it if and only if
σ (i ) > σ ( j) and j > i.
In other words, a cell (i, σ ( j)) has a ◦ in it if and only if (i, j) is an inversion
of σ.
Thus,
ℓ (σ) = (# of inversions of σ) = (# of ◦ ’s) ,
and
ℓi (σ) = (# of ◦ ’s in row i ) for each i ∈ [n] .
Finally, let us label the ◦’s as follows: For each i ∈ [n], we label the ◦’s
in row i from right to left by the simple transpositions si , si+1 , si+2 , . . . , si′ −1
where i′ = i + ℓi (σ). (This works, since there are precisely ℓi (σ) = i′ − i
many ◦’s in row i.) Here is how this labeling looks in our running example:

[Figure: the Rothe diagram of σ = 513462, with the ◦’s in row 1 labeled s4 , s3 , s2 , s1 from left to right, and the single ◦’s in rows 3, 4, 5 labeled s3 , s4 , s5 .]

Finally, read the matrix row by row, starting with the top row, and reading
each row from left to right. The result, in our running example, is

s4 s3 s2 s1 s3 s4 s5 .

Reading this as a product, we obtain a product of ℓ (σ ) simples that equals


σ.

The claim of Remark 5.3.23 can be restated in a more direct (if less visual)
fashion:

Proposition 5.3.24. Let n ∈ N. Let σ ∈ Sn . For each i ∈ [n], we set

ai := cyci′ ,i′ −1,i′ −2,...,i = si′ −1 si′ −2 si′ −3 · · · si , (182)

where i′ = i + ℓi (σ). Then, σ = a1 a2 · · · an . (The second equality sign in (182)


is not hard to check. Note that an = id.)

We refer to [Grinbe15, Exercise 5.21 parts (b) and (c)] for a (detailed, but an-
noyingly long) proof of Proposition 5.3.24. (You are probably better off proving
it yourself.)
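The construction of Remark 5.3.23 is easy to mechanize: compute the Lehmer code (ℓ1 (σ) , ℓ2 (σ) , . . . , ℓn (σ)), and let row i contribute the factor si′−1 si′−2 · · · si with i′ = i + ℓi (σ). Here is a Python sketch (function names are mine) that reproduces the word s4 s3 s2 s1 s3 s4 s5 for our running example σ = 513462.

```python
from functools import reduce

def lehmer_code(sigma):
    """(ell_1(sigma), ..., ell_n(sigma)): ell_i is the number of j > i
    with sigma(j) < sigma(i) (one-line notation)."""
    n = len(sigma)
    return [sum(1 for j in range(i + 1, n) if sigma[j] < sigma[i])
            for i in range(n)]

def rothe_word(sigma):
    """The word read off the labeled Rothe diagram: row i contributes
    the simples s_{i'-1}, s_{i'-2}, ..., s_i, where i' = i + ell_i(sigma)."""
    word = []
    for i, ell in enumerate(lehmer_code(sigma), start=1):
        word.extend(range(i + ell - 1, i - 1, -1))   # i' - 1 down to i
    return word

def simple(n, k):
    """One-line notation of s_k in S_n."""
    t = list(range(1, n + 1))
    t[k - 1], t[k] = t[k], t[k - 1]
    return t

def compose(a, b):
    """(a o b)(i) = a(b(i)), in one-line notation."""
    return [a[b[i] - 1] for i in range(len(a))]

sigma = [5, 1, 3, 4, 6, 2]                   # our running example, 513462
word = rothe_word(sigma)
print(word)                                  # [4, 3, 2, 1, 3, 4, 5]
assert reduce(compose, (simple(6, k) for k in word),
              list(range(1, 7))) == sigma    # the product is indeed sigma
```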

5.4. Signs of permutations


The notion of the sign (aka signature) of a permutation is a simple consequence
of that of its length; moreover, it is rather well-known, due to its role in the
definition of a determinant. Thus we will survey its properties quickly and
without proofs. More can be found in [Grinbe15, §5.3 and §5.6] and [Strick13,
Appendix B].

Definition 5.4.1. Let n ∈ N. The sign of a permutation σ ∈ Sn is defined to


be the integer (−1)ℓ(σ) .
It is denoted by (−1)σ or sgn (σ ) or sign (σ) or ε (σ ). It is also known as
the signature of σ.

Proposition 5.4.2. Let n ∈ N.


(a) The sign of the permutation id ∈ Sn is (−1)id = 1.
(b) For any two distinct elements i and j of [n], the transposition ti,j ∈ Sn
has sign (−1)ti,j = −1.
(c) For any positive integer k and any distinct elements i1 , i2 , . . . , ik ∈ [n], the k-cycle cyci1 ,i2 ,...,ik has sign (−1)^(cyci1 ,i2 ,...,ik ) = (−1)^(k−1) .
(d) We have (−1)στ = (−1)σ · (−1)τ for any σ ∈ Sn and τ ∈ Sn .

(e) We have (−1)σ1 σ2 ···σp = (−1)σ1 (−1)σ2 · · · (−1)σp for any σ1 , σ2 , . . . , σp ∈


Sn .
(f) We have (−1)^(σ⁻¹) = (−1)^σ for any σ ∈ Sn . (The left hand side here is the sign of the inverse permutation σ⁻¹ .)
(g) We have

    (−1)^σ = ∏_{1≤i<j≤n} (σ (i ) − σ ( j )) / (i − j )    for each σ ∈ Sn .

(The product sign “∏_{1≤i<j≤n}” means a product over all pairs (i, j ) of integers satisfying 1 ≤ i < j ≤ n. There are (n choose 2) such pairs.)
(h) If x1 , x2 , . . . , xn are any elements of some commutative ring, and if σ ∈
Sn , then

    ∏_{1≤i<j≤n} (xσ(i) − xσ(j) ) = (−1)^σ · ∏_{1≤i<j≤n} (xi − xj ) .

Proof of Proposition 5.4.2 (sketched). Most of this follows easily from what we
have proved above, but here are references to complete proofs:
(a) This is [Grinbe15, Proposition 5.15 (a)], and follows easily from ℓ (id) = 0.
(d) This is [Grinbe15, Proposition 5.15 (c)], and follows easily from Corollary
5.3.20 (a). A different proof appears in [Strick13, Proposition B.13].
(b) This is [Grinbe15, Exercise 5.10 (b)], and follows easily from Exercise
A.4.3.2 (a).
(c) This is [Grinbe15, Exercise 5.17 (d)], and follows easily from Exercise
A.4.2.1 (a) and Exercise A.4.3.2 (b) using Proposition 5.4.2 (d).
(e) This is [Grinbe15, Proposition 5.28], and follows by induction from Propo-
sition 5.4.2 (d).
(f) This is [Grinbe15, Proposition 5.15 (d)], and follows easily from Proposi-
tion 5.4.2 (d) or from Proposition 5.3.13.
(h) This is [Grinbe15, Exercise 5.13 (a)] (or, rather, the straightforward gener-
alization of [Grinbe15, Exercise 5.13 (a)] to arbitrary commutative rings). The
proof is fairly easy: Each factor xσ(i) − xσ( j) on the left hand side appears also
on the right hand side, albeit with a different sign if (i, j) is an inversion of
σ. Thus, the products on both sides agree up to a sign, which is precisely
(−1)ℓ(σ) = (−1)σ .
(g) This is [Grinbe15, Exercise 5.13 (c)], and is a particular case of Proposition
5.4.2 (h).

Corollary 5.4.3. Let n ∈ N. The map

Sn → {1, −1} ,
σ ↦ (−1)^σ

is a group homomorphism from the symmetric group Sn to the order-2 group


{1, −1}. (Of course, {1, −1} is a group with respect to multiplication.)

This map is known as the sign homomorphism.


Proof of Corollary 5.4.3 (sketched). Proposition 5.4.2 (d) shows that this map re-
spects multiplication (i.e., sends products to products). However, if a map
between two groups respects multiplication, then it is automatically a group
homomorphism. Thus, Corollary 5.4.3 follows.

Definition 5.4.4. Let n ∈ N. A permutation σ ∈ Sn is said to be

• even if (−1)σ = 1 (that is, if ℓ (σ) is even);

• odd if (−1)σ = −1 (that is, if ℓ (σ) is odd).

The sign and the “parity” (i.e., evenness/oddness) of a permutation have


applications throughout mathematics (in the definition of determinants and
the construction of exterior powers) as well as in the solution of permutation
puzzles (such as Rubik’s cube and the 15-puzzle; see [Mulhol21, Chapters 7–8
and Theorem 20.2.1] for example). Even permutations are also crucial in group
theory, as they form a group:

Corollary 5.4.5. Let n ∈ N. The set of all even permutations in Sn is a normal


subgroup of Sn .

This subgroup is known as the n-th alternating group (commonly called An ).


If n ≥ 5, then this group is a simple group (meaning that it has no normal
subgroups besides itself and the trivial group)79 , and this fact has been used
by Galois to prove that the general 5-th degree polynomial equation cannot be
solved using radicals.
Proof of Corollary 5.4.5 (sketched). The set of all even permutations in Sn is the
kernel of the group homomorphism Sn → {1, −1} from Corollary 5.4.3. Thus,
it is a normal subgroup of Sn (since any kernel is a normal subgroup).

79 See, e.g., https://groupprops.subwiki.org/wiki/Alternating_groups_are_simple or [Goodma15, Theorem 10.3.4] for a proof.

Corollary 5.4.6. Let n ≥ 2. Then,

(# of even permutations σ ∈ Sn ) = (# of odd permutations σ ∈ Sn ) = n!/2.

Proof of Corollary 5.4.6 (sketched). The symmetric group Sn contains the simple
transposition s1 (since n ≥ 2). If σ ∈ Sn , then

    (−1)^(σs1 ) = (−1)^σ · (−1)^(s1 )    (by Proposition 5.4.2 (d))
                = (−1)^σ · (−1)    (by Proposition 5.4.2 (b), since s1 = t1,2 )
                = − (−1)^σ .

Hence, a permutation σ ∈ Sn is even if and only if the permutation σs1 is odd.


Hence, the map

{even permutations σ ∈ Sn } → {odd permutations σ ∈ Sn } ,
σ ↦ σs1

is well-defined. This map is furthermore a bijection (since Sn is a group). Thus,


the bijection principle yields

(# of even permutations σ ∈ Sn ) = (# of odd permutations σ ∈ Sn ) .

Both sides of this equality must furthermore equal to n!/2, since they add up
to |Sn | = n!. This proves Corollary 5.4.6. (See [Grinbe15, Exercise 5.4] for
details.)
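Both Corollary 5.4.6 and the cancellation of signs that it implies are easy to confirm by brute force. Here is a Python sketch for n = 4 (helper names mine), with signs computed from inversion counts.

```python
from itertools import permutations
from math import factorial

def sign(sigma):
    """(-1)^{ell(sigma)}, where ell(sigma) is the number of inversions."""
    n = len(sigma)
    inv = sum(1 for i in range(n) for j in range(i + 1, n)
              if sigma[i] > sigma[j])
    return (-1) ** inv

n = 4
signs = [sign(p) for p in permutations(range(1, n + 1))]
even, odd = signs.count(1), signs.count(-1)
print(even, odd, sum(signs))              # 12 12 0
assert even == odd == factorial(n) // 2   # Corollary 5.4.6
assert sum(signs) == 0                    # the even and odd addends cancel
```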
As a consequence of Corollary 5.4.6, we see that

    ∑_{σ∈Sn} (−1)^σ = 0    for each n ≥ 2.    (183)

(Indeed, the sum ∑_{σ∈Sn} (−1)^σ can be rewritten as

(# of even permutations σ ∈ Sn ) − (# of odd permutations σ ∈ Sn ) ,

since the addends corresponding to the even permutations σ ∈ Sn are equal


to 1 whereas the addends corresponding to the odd permutations σ ∈ Sn are
equal to −1.)
We note that the sign can be defined not only for a permutation σ ∈ Sn , but
also for any permutation of any finite set X (even if the set X has no chosen
total order on it, as the set [n] has). Here is one way to do so:

Proposition 5.4.7. Let X be a finite set. We want to define the sign of any
permutation of X.
Fix a bijection ϕ : X → [n] for some n ∈ N. (Such a bijection always exists,
since X is finite.) For every permutation σ of X, set
(−1)σϕ := (−1)^(ϕ◦σ◦ϕ⁻¹) .

Here, the right hand side is well-defined, since ϕ ◦ σ ◦ ϕ−1 is a permutation


of [n]. Now:
(a) This number (−1)σϕ depends only on the permutation σ, but not on the
bijection ϕ. (In other words, if ϕ1 and ϕ2 are two bijections from X to [n],
then (−1)σϕ1 = (−1)σϕ2 .)
Thus, we shall denote (−1)σϕ by (−1)σ from now on. We refer to this
number (−1)σ as the sign of the permutation σ ∈ SX . (When X = [n], this
notation does not clash with Definition 5.4.1, since we can pick the bijection
ϕ = id and obtain (−1)σϕ = (−1)^(id◦σ◦id⁻¹) = (−1)^σ .)

(b) The identity permutation id : X → X satisfies (−1)id = 1.


(c) We have (−1)στ = (−1)σ · (−1)τ for any two permutations σ and τ of
X.
Proof of Proposition 5.4.7 (sketched). This all follows quite easily from Proposition
5.4.2 (d). See [Grinbe15, Exercise 5.12] for a detailed proof.

5.5. The cycle decomposition


Next, we shall discuss the cycle decomposition (or disjoint cycle decomposition) of a
permutation. Again, this is a fairly well-known and elementary tool, so we will
restrict ourselves to the basic properties and omit the details.
We begin with an introductory example:
Example 5.5.1. Let σ ∈ S9 be the permutation with one-line notation
461352987. Here is its cycle digraph:

[Figure: the cycle digraph of σ, consisting of the five cycles 1 → 4 → 3 → 1, 2 → 6 → 2, 5 → 5, 7 → 9 → 7 and 8 → 8.]

(where we have strategically arranged the cycles apart horizontally). This


digraph consists of five node-disjoint cycles (i.e., cycles that share no nodes);

thus, we can view the permutation σ as acting on the five subsets {1, 4, 3},
{2, 6}, {5}, {7, 9} and {8} of [9] separately. On the first of these five subsets,
σ acts as the 3-cycle cyc1,4,3 (in the sense that σ (k) = cyc1,4,3 (k ) for each
k ∈ {1, 4, 3}). On the second, it acts as the 2-cycle cyc2,6 . On the third, it acts
as the 1-cycle cyc5 (which, of course, is the identity map). On the fourth, it
acts as the 2-cycle cyc7,9 . On the fifth, it acts as the 1-cycle cyc8 (which, again,
is just the identity map). Combining these observations, we conclude that
 
    σ (k ) = (cyc1,4,3 ◦ cyc2,6 ◦ cyc5 ◦ cyc7,9 ◦ cyc8 ) (k )    for each k ∈ [9]

(because, when we apply the composed permutation


cyc1,4,3 ◦ cyc2,6 ◦ cyc5 ◦ cyc7,9 ◦ cyc8 to an element of [9], then four of the
five cycles cyc1,4,3 , cyc2,6 , cyc5 , cyc7,9 , cyc8 will leave this element unchanged,
whereas the remaining one will move it one step forward along the appro-
priate cycle of the above digraph – which is precisely what the permutation
σ does to our element). This entails that

σ = cyc1,4,3 ◦ cyc2,6 ◦ cyc5 ◦ cyc7,9 ◦ cyc8 . (184)

In Example 5.5.1, we have represented our permutation σ ∈ S9 as a composi-


tion of five cycles with the property that each element of [9] appears in exactly
one of these cycles. This is not specific to the permutation σ chosen in Example
5.5.1. Indeed, for any finite set X, any permutation σ ∈ SX can be written as a
composition of finitely many cycles cyci1 ,i2 ,...,ik with the property that each ele-
ment of X appears in exactly one of these cycles. Moreover, this representation
of σ is unique up to
• swapping the cycles (for example, we could have replaced cyc1,4,3 ◦ cyc2,6
by cyc2,6 ◦ cyc1,4,3 in (184)), and
• rotating each cycle (for example, we could have replaced cyc1,4,3 by cyc4,3,1
or by cyc3,1,4 in (184)).
Let us state this in a more rigorous fashion:
Theorem 5.5.2 (disjoint cycle decomposition of permutations). Let X be a
finite set. Let σ be a permutation of X. Then:
(a) There is a list
 
(a1,1 , a1,2 , . . . , a1,n1 ) ,
(a2,1 , a2,2 , . . . , a2,n2 ) ,
. . . ,
(ak,1 , ak,2 , . . . , ak,nk )

of nonempty lists of elements of X such that:

• each element of X appears exactly once in the composite list

( a1,1 , a1,2 , . . . , a1,n1 ,


a2,1 , a2,2 , . . . , a2,n2 ,
...,
ak,1 , ak,2 , . . . , ak,nk ),

and

• we have

    σ = cyca1,1 ,a1,2 ,...,a1,n1 ◦ cyca2,1 ,a2,2 ,...,a2,n2 ◦ · · · ◦ cycak,1 ,ak,2 ,...,ak,nk .

Such a list is called a disjoint cycle decomposition (or short DCD) of σ. Its
entries (which themselves are lists of elements of X) are called the cycles of
σ.
(b) Any two DCDs of σ can be obtained from each other by (repeatedly)
swapping the cycles with each other, and rotating each cycle (i.e., replacing
(ai,1 , ai,2 , . . . , ai,ni ) by (ai,2 , ai,3 , . . . , ai,ni , ai,1 )).
(c) Now assume that X is a set of integers (or, more generally, any totally
ordered finite set). Then, there is a unique DCD
 
(a1,1 , a1,2 , . . . , a1,n1 ) ,
(a2,1 , a2,2 , . . . , a2,n2 ) ,
. . . ,
(ak,1 , ak,2 , . . . , ak,nk )

of σ that satisfies the additional requirements that

• we have ai,1 ≤ ai,p for each i ∈ [k ] and each p ∈ [ni ] (that is, each cycle
in this DCD is written with its smallest entry first), and

• we have a1,1 > a2,1 > · · · > ak,1 (that is, the cycles appear in this DCD
in the order of decreasing first entries).

Example 5.5.3. Let σ ∈ S9 be the permutation from Example 5.5.1. Then, the
representation (184) shows that
 
    ((1, 4, 3) , (2, 6) , (5) , (7, 9) , (8))

is a DCD of σ. By swapping the five cycles of this DCD, and by rotating each
cycle, we can produce various other DCDs of σ, such as
 
    ((7, 9) , (6, 2) , (3, 1, 4) , (8) , (5)) .

The unique DCD of σ that satisfies the two additional requirements of Theo-
rem 5.5.2 (c) is  
    ((8) , (7, 9) , (5) , (2, 6) , (1, 4, 3)) .

Proof of Theorem 5.5.2 (sketched). This is a classical result with an easy proof;
sadly, this easy proof does not present well in writing. I will try to be as clear
as the situation allows. Some familiarity with digraphs (= directed graphs) is
recommended80 .
(a) Let D be the cycle digraph of σ, as in Example 5.5.1. This cycle digraph
D has the following two properties:

• Outbound uniqueness: For each node i, there is exactly one arc outgoing
from i. (Indeed, this is the arc from i to σ (i ), as should be clear from the
construction of D .)

• Inbound uniqueness: For each node i, there is exactly one arc incoming into
i. (Indeed, this is the arc from σ−1 (i ) to i, since σ is a permutation and
therefore invertible.)

Using these two properties, we will now show that the cycle digraph D con-
sists of several node-disjoint cycles (i.e., several cycles that pairwise share no
nodes).
Indeed, let us first observe the following: If two cycles C and D of D have a
node in common, then they are identical81 (because the outbound uniqueness
property prevents these cycles from ever separating after meeting at the com-
mon node). In other words, any two cycles C and D of D are either identical or
node-disjoint (i.e., share no nodes with each other).
Now, let i be any node of D . Then, if we start at i and follow the outgoing
arcs, then we obtain an infinite walk

σ0 (i ) → σ1 (i ) → σ2 (i ) → σ3 (i ) → · · ·

along our digraph D . Since X is finite, the Pigeonhole Principle guarantees that
this walk will eventually revisit a node it has already been to; i.e., there exist

80 See, e.g., [Guicha20, §5.11] or [Loehr11, §3.5] for brief introductions to digraphs.
81 Here, we identify any cycle with its cyclic rotations. For example, if a → b → c → a is a
cycle, then we consider b → c → a → b to be the same cycle.

two integers u, v ∈ N with u < v and σu (i ) = σv (i ). Let us pick two such


integers u, v with the smallest possible v. Thus,

the v nodes σ0 (i ) , σ1 (i ) , . . . , σv−1 (i ) are distinct (185)

(since otherwise, v would not be smallest possible). Now, σ is a permutation


and thus has an inverse σ−1 . Applying the map σ−1 to both sides of the equality
σu (i ) = σv (i ), we obtain σu−1 (i ) = σv−1 (i ). However, if we had u ≥ 1, then
(185) would entail σu−1 (i ) ≠ σv−1 (i ) (because 0 ≤ u − 1 < v − 1, as u < v), which
would contradict σu−1 (i ) = σv−1 (i ). Thus, we cannot have u ≥ 1. Hence,
u < 1, so that u = 0. Therefore, σu (i ) = σ0 (i ), so that σ0 (i ) = σu (i ) = σv (i ).
This shows that our walk σ0 (i ) → σ1 (i ) → σ2 (i ) → σ3 (i ) → · · · is circular: it
comes back to its starting node σ0 (i ) = i after v steps. We have thus found a
cycle in our digraph D :

σ0 (i ) → σ1 (i ) → σ2 (i ) → · · · → σ v (i ) = σ0 (i ) .

(This is indeed a cycle, since (185) shows that its first v nodes are distinct.) This
shows that the node i lies on a cycle Ci of D (namely, the cycle that we just
found).
Now, forget that we fixed i. We thus have shown that each node i of D lies
on a cycle Ci . The cycles Ci for all nodes i ∈ X will be called the chosen cycles.
Any arc of our digraph D must belong to one of these chosen cycles. Indeed,
if a is an arc from a node i to a node j, then a must be the only arc outgoing
from i (by the outbound uniqueness property); but this means that this arc a
belongs to the chosen cycle Ci .
Now, let us look back at what we have shown:

• Any node i of D lies on one of our chosen cycles (namely, on Ci ).

• Some of the chosen cycles may be identical, but apart from that, the cho-
sen cycles are pairwise node-disjoint (since any two cycles of D are either
identical or node-disjoint).

• Any arc of D must belong to one of these chosen cycles.

Combining these facts, we conclude that D consists of several node-disjoint


cycles. Let us label these cycles as

(a1,1 → a1,2 → · · · → a1,n1 → a1,1 ) ,
(a2,1 → a2,2 → · · · → a2,n2 → a2,1 ) ,
. . . ,
(ak,1 → ak,2 → · · · → ak,nk → ak,1 )

(making sure to label each cycle only once). Then, each element of X appears
exactly once in the composite list
( a1,1 , a1,2 , . . . , a1,n1 ,
a2,1 , a2,2 , . . . , a2,n2 ,
...,
ak,1 , ak,2 , . . . , ak,nk ),
and we have
    σ = cyca1,1 ,a1,2 ,...,a1,n1 ◦ cyca2,1 ,a2,2 ,...,a2,n2 ◦ · · · ◦ cycak,1 ,ak,2 ,...,ak,nk

(since σ moves any node i ∈ X one step forward along its chosen cycle). This
proves Theorem 5.5.2 (a).
Alternative proofs of Theorem 5.5.2 (a) can be found (e.g.) in [Goodma15,
Theorem 1.5.3] or in [Knapp16, §I.4, Proposition 1.21] or in [Bourba74, Chap-
ter I, §5.7, Proposition 7] or in [Sagan19, §1.9, proof of Theorem 1.5.1] (this
is essentially our proof) or in https://proofwiki.org/wiki/Existence_and_Uniqueness_of_Cycle_Decomposition (see also [17f-hw7s, Exercise 7 (e) and
(d)] for a rather formalized proof). Note that some of these sources work with
a slightly modified concept of a DCD, in which they throw away the 1-cycles
(i.e., they replace “appears exactly once” by “appears at most once”, and re-
quire all cycle lengths n1 , n2 , . . . , nk to be > 1). For instance, the DCD (184)
becomes
σ = cyc1,4,3 ◦ cyc2,6 ◦ cyc7,9
if we use this modified notion of a DCD.
(b) See [Goodma15, Theorem 1.5.3] or [Bourba74, Chapter I, §5.7, Proposition
7]. The idea is fairly simple: Let
 
(a1,1 , a1,2 , . . . , a1,n1 ) ,
(a2,1 , a2,2 , . . . , a2,n2 ) ,
. . . ,
(ak,1 , ak,2 , . . . , ak,nk )

be a DCD of σ. Then, for each i ∈ X, the cycle of this DCD that contains i
is uniquely determined by σ and i up to cyclic rotation (indeed, it is a rotated
version of the list (i, σ (i ) , σ^2 (i ) , . . . , σ^(r−1) (i )), where r is the smallest positive
integer satisfying σr (i ) = i). Therefore, all cycles of this DCD are uniquely
determined by σ up to cyclic rotation and up to the relative order in which
these cycles appear in the DCD. But this is precisely the claim of Theorem 5.5.2
(b).
(c) In order to obtain a DCD of σ that satisfies these two requirements, it
suffices to

• start with an arbitrary DCD of σ,

• then rotate each cycle of this DCD so that it begins with its smallest entry,
and

• then repeatedly swap these cycles so they appear in the order of decreas-
ing first entries.

It is clear that the result of this procedure is uniquely determined (a conse-


quence of Theorem 5.5.2 (b)). Thus, Theorem 5.5.2 (c) is proven.

Definition 5.5.4. Let X be a finite set. Let σ be a permutation of X.


(a) The cycles of σ are defined to be the cycles in the DCD of σ (as defined
in Theorem 5.5.2 (a)). (This includes 1-cycles, if there are any in the DCD of
σ.)
We shall equate a cycle of σ with any of its cyclic rotations; thus, for ex-
ample, (3, 1, 4) and (1, 4, 3) shall be regarded as being the same cycle (but
(3, 1, 4) and (3, 4, 1) shall not).
(b) The cycle lengths partition of σ shall denote the partition of | X | obtained
by writing down the lengths of the cycles of σ in weakly decreasing order.

Example 5.5.5. Let σ ∈ S9 be the permutation from Example 5.5.1. Then, the
cycles of σ are
(1, 4, 3) , (2, 6) , (5) , (7, 9) , (8) .
Their lengths are 3, 2, 1, 2, 1. Hence, the cycle lengths partition of σ is
(3, 2, 2, 1, 1).
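The canonical DCD of Theorem 5.5.2 (c) is straightforward to compute: walk along each cycle starting from the smallest element not yet visited. A minimal Python sketch (names mine), checked on the permutation of the example:

```python
def cycles(sigma):
    """The canonical DCD of Theorem 5.5.2 (c): each cycle is written with
    its smallest entry first, and the cycles are listed in decreasing
    order of their first entries.  sigma is in one-line notation on [n]."""
    n = len(sigma)
    seen = set()
    result = []
    for start in range(1, n + 1):
        if start not in seen:
            cyc = [start]                 # start is the smallest entry
            seen.add(start)
            j = sigma[start - 1]
            while j != start:             # walk along the cycle of start
                cyc.append(j)
                seen.add(j)
                j = sigma[j - 1]
            result.append(tuple(cyc))
    return sorted(result, key=lambda c: -c[0])

sigma = [4, 6, 1, 3, 5, 2, 9, 8, 7]       # 461352987 in one-line notation
print(cycles(sigma))                      # [(8,), (7, 9), (5,), (2, 6), (1, 4, 3)]
# the cycle lengths partition:
print(sorted((len(c) for c in cycles(sigma)), reverse=True))  # [3, 2, 2, 1, 1]
```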

The following is obvious:

Proposition 5.5.6. Let X be a finite set. Let i, j ∈ X. Let σ be a permutation


of X. Then, we have the following chain of equivalences:

(i and j belong to the same cycle of σ)


⇐⇒ (i = σ p ( j) for some p ∈ N)
⇐⇒ ( j = σ p (i ) for some p ∈ N) .

The number of cycles of a permutation determines its sign. Let us state this
for permutations of [n] in particular (the reader can easily extend this to the
general case using Proposition 5.4.7):

Proposition 5.5.7. Let n ∈ N. Let σ ∈ Sn . Let k ∈ N be such that σ has


exactly k cycles (including the 1-cycles). Then, (−1)σ = (−1)n−k .

Proof of Proposition 5.5.7 (sketched). Let



(a1,1 , a1,2 , . . . , a1,n1 ) ,
(a2,1 , a2,2 , . . . , a2,n2 ) ,
. . . ,
(ak,1 , ak,2 , . . . , ak,nk )
be the k cycles of σ. Thus,
 
(a1,1 , a1,2 , . . . , a1,n1 ) ,
(a2,1 , a2,2 , . . . , a2,n2 ) ,
. . . ,
(ak,1 , ak,2 , . . . , ak,nk )    (186)

is the DCD of σ. Therefore,


    σ = cyca1,1 ,a1,2 ,...,a1,n1 ◦ cyca2,1 ,a2,2 ,...,a2,n2 ◦ · · · ◦ cycak,1 ,ak,2 ,...,ak,nk
      = (an n1 -cycle) ◦ (an n2 -cycle) ◦ · · · ◦ (an nk -cycle) ,

so that

    (−1)^σ = (−1)^((an n1 -cycle)◦(an n2 -cycle)◦···◦(an nk -cycle))
           = (−1)^(an n1 -cycle) · (−1)^(an n2 -cycle) · · · · · (−1)^(an nk -cycle)    (by Proposition 5.4.2 (e))
           = (−1)^(n1 −1) · (−1)^(n2 −1) · · · · · (−1)^(nk −1)
             (since Proposition 5.4.2 (c) yields (−1)^(a p-cycle) = (−1)^(p−1) for any p > 0)
           = (−1)^((n1 −1)+(n2 −1)+···+(nk −1)) .    (187)
However, recall that (186) is a DCD of σ. Thus, each element of [n] appears
exactly once in the composite list
( a1,1 , a1,2 , . . . , a1,n1 ,
a2,1 , a2,2 , . . . , a2,n2 ,
...,
ak,1 , ak,2 , . . . , ak,nk ).
Therefore, the length n1 + n2 + · · · + nk of this composite list equals the size
|[n]| = n of the set [n]. In other words, n1 + n2 + · · · + nk = n. Hence,
(n1 − 1) + (n2 − 1) + · · · + (nk − 1) = (n1 + n2 + · · · + nk ) − k = n − k.

Thus, (187) rewrites as (−1)σ = (−1)n−k . This proves Proposition 5.5.7.
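Proposition 5.5.7 is easily verified by brute force, comparing the inversion-count definition of the sign with the cycle count. Here is a Python sketch (names mine):

```python
from itertools import permutations

def sign(sigma):
    """(-1)^{ell(sigma)}, computed by counting inversions."""
    n = len(sigma)
    inv = sum(1 for i in range(n) for j in range(i + 1, n)
              if sigma[i] > sigma[j])
    return (-1) ** inv

def num_cycles(sigma):
    """The number k of cycles of sigma, 1-cycles included."""
    seen, k = set(), 0
    for i in range(1, len(sigma) + 1):
        if i not in seen:
            k += 1
            while i not in seen:       # walk along the cycle of i
                seen.add(i)
                i = sigma[i - 1]
    return k

# (-1)^sigma = (-1)^{n - k}, checked over all of S_5:
n = 5
assert all(sign(p) == (-1) ** (n - num_cycles(p))
           for p in permutations(range(1, n + 1)))
```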



5.6. References
We end our discussion of permutations here, although we will revisit it every
once in a while. Much more about permutations can be found in [Bona12],
[Kitaev11] (focussing on permutation patterns), [Sagan01] (focussing on the
representation theory of the symmetric group) and various other texts.
It is worth mentioning that the symmetric groups Sn are a particular case of
Coxeter groups – a class of groups highly significant to algebra, combinatorics
and geometry. One of the most combinatorial introductions to this subject
(which sheds new light on the combinatorics of symmetric groups) is the highly
readable text [BjoBre05]. Other texts include [Cohen08] and (for the particularly
resolute) [Bourba02, Chapter IV].

6. Alternating sums, signed counting and determinants
This chapter is not concerned with any specific combinatorial objects like par-
titions or permutations, but rather with a set of simple ideas that appear often
(and not just in combinatorics). The main objects of study here are alternat-
ing sums – i.e., sums with a (−1)something factor in them. A poster child is the
determinant of a matrix. Such sums are often simplified by the cancellation
that occurs in them, with positive addends cancelling negative addends. Fre-
quently, understanding this cancellation is key to computing the sums. As a
rule of thumb, alternating sums are more likely to have simple closed-form
answers than non-alternating sums. For example, each n ∈ N satisfies

n   3 (−1)n/2 (3n/2)! , if n is even;

k n
∑ (−1) k =  (n/2)!3
k =0  0, if n is odd

(see Exercise A.2.15.7 (g)), but there is no closed form for


    ∑_{k=0}^{n} (n choose k)^3 .
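Both claims are easy to test numerically; the following Python sketch (names mine) checks the alternating cube sum against its closed form for small n.

```python
from math import comb, factorial

def alt_cube_sum(n):
    """sum_{k=0}^{n} (-1)^k * C(n, k)^3."""
    return sum((-1) ** k * comb(n, k) ** 3 for k in range(n + 1))

def closed_form(n):
    """The right hand side: 0 for odd n, and
    (-1)^{n/2} * (3n/2)! / ((n/2)!)^3 for even n."""
    if n % 2 == 1:
        return 0
    m = n // 2
    return (-1) ** m * (factorial(3 * m) // factorial(m) ** 3)

print([alt_cube_sum(n) for n in range(6)])   # [1, 0, -6, 0, 90, 0]
assert all(alt_cube_sum(n) == closed_form(n) for n in range(20))
```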

The use of cancellations to simplify alternating sums is old, but systematic surveys of applications of this technique have not appeared until recently
([Stanle11, Chapter 2], [Aigner07, Chapter 5], [BenQui03, Chapter 6], [BenQui08],
[Sagan19, Chapter 2], [Grinbe20]).

6.1. Cancellations in alternating sums


We begin with a simple binomial identity:

Proposition 6.1.1 (Negative hockey-stick identity). Let n ∈ C and m ∈ N. Then,
\[
\sum_{k=0}^{m} (-1)^k \binom{n}{k} = (-1)^m \binom{n-1}{m} . \tag{188}
\]
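Before looking for a proof, one can at least test (188) for small positive integers n (a quick check of our own; `math.comb` only handles nonnegative integer n, matching the WLOG reduction used in the combinatorial proof below):

```python
from math import comb

def lhs(n, m):
    # sum_{k=0}^{m} (-1)^k * C(n,k)
    return sum((-1) ** k * comb(n, k) for k in range(m + 1))

def rhs(n, m):
    # (-1)^m * C(n-1, m)
    return (-1) ** m * comb(n - 1, m)

# (188) for small positive integers n and all m up to 10
assert all(lhs(n, m) == rhs(n, m) for n in range(1, 11) for m in range(11))
```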

There are easy proofs of this proposition by induction on m or by the telescope principle (see, e.g., [18f-hw2s, Exercise 4] and [19fco, Exercise 2.1.1 (a) and §7.27]). However, let us try to prove the proposition combinatorially. For a bijective proof, the $(-1)^k$ and $(-1)^m$ factors would be showstoppers, as there
is no way to get a negative number (let alone a number of variable sign) by
counting something. However, if we think of the (−1)k as an opportunity for
cancelling addends, then we can use combinatorics pretty well:
Combinatorial proof of Proposition 6.1.1 (sketched). We need to prove the equality
(188). Both sides of this equality are polynomial functions in n. Thus, we can
WLOG assume that n is a positive integer (because of the polynomial identity
trick that we saw in Subsection 3.2.3). Assume this.
Set [n] := {1, 2, . . . , n}. We shall now introduce some notations tailored for
this particular proof.
Define an acceptable set to be a subset of [n] that has size ≤ m. Clearly,
\[
(\text{\# of acceptable sets}) = \sum_{k=0}^{m} \binom{n}{k} \tag{189}
\]
(since the # of k-element subsets of [n] equals $\binom{n}{k}$ for each k ∈ Z). Incidentally, we note that there is no closed form for this sum – another instance of the phenomenon in which alternating sums are simpler than non-alternating ones.
Define the sign of a finite set I to be $(-1)^{|I|}$. Then,
\[
(\text{the sum of the signs of all acceptable sets}) = \sum_{k=0}^{m} (-1)^k \binom{n}{k} . \tag{190}
\]
(This can be shown just like (189).)


However, the sum of the signs of all acceptable sets is a sum of 1s and −1s. Let us try to cancel as many of these 1s and −1s against each other as we can, hoping that what remains will be precisely $(-1)^m \binom{n-1}{m}$.
In order to cancel two addends, we need to pair up two finite sets I that
have opposite signs. How do we find such pairs? One way to do so is to pick
some set I that does not contain the element 1, and pair it up with I ∪ {1}.
Alternatively, we can pick some set I that contains the element 1, and pair it

up with I \ {1}. In other words, we pair up a finite set I with either I \ {1} or I ∪ {1}, depending on whether 1 ∈ I or 1 ∉ I.
Let us do this systematically for all finite sets: For each finite set I, we define
the partner of I to be the set
\[
I' := \begin{cases} I \setminus \{1\}, & \text{if } 1 \in I; \\ I \cup \{1\}, & \text{if } 1 \notin I \end{cases}
\;=\; I \mathbin{\triangle} \{1\},
\]

where the notation X △ Y means the symmetric difference ( X ∪ Y ) \ ( X ∩ Y ) of


two sets X and Y (as in Subsection 3.2.1). It is easy to see that each finite set I satisfies I′′ = I and |I′| = |I| ± 1, so that $(-1)^{|I'|} = -(-1)^{|I|}$. Thus, if both I
and I ′ are acceptable sets, then their contributions to the sum of the signs of all
acceptable sets cancel each other out.
This does not mean that all addends in this sum cancel. Indeed, while each fi-
nite set has a partner, it may happen that an acceptable set has a non-acceptable
partner, and then the contribution of the former to the sum does not get can-
celled (since the partner does not contribute to the sum). Thus, in order to see
what remains of the sum after the cancellations, we need to study the accept-
able sets that have non-acceptable partners.
Fortunately, this is easy: In order for an acceptable set I to have a non-acceptable partner, it needs to satisfy 1 ∉ I and |I| = m. Better yet, this is an “if and only if” statement:

Claim 1: Let I be an acceptable set. Then, the partner I′ of I is non-acceptable if and only if (1 ∉ I and |I| = m).

[Proof of Claim 1: The “if” direction is easy: If 1 ∉ I and |I| = m, then the partner I′ of I is defined by I′ = I ∪ {1} and thus has size |I′| = |I| + 1 > |I| = m, which shows that it is non-acceptable.


It remains to prove the converse, i.e., the “only if” direction. Thus, we assume that the partner I′ of I is non-acceptable. We must show that 1 ∉ I and |I| = m.
Since I is acceptable, we have I ⊆ [n] and |I| ≤ m. If we had 1 ∈ I, then we would have I′ = I \ {1} ⊆ I ⊆ [n] and furthermore |I′| ≤ |I| (since I′ ⊆ I), so that |I′| ≤ |I| ≤ m. This would entail that I′ is acceptable (since I′ ⊆ [n] and |I′| ≤ m), which would contradict our assumption that I′ be non-acceptable. Hence, we cannot have 1 ∈ I. Thus, 1 ∉ I is proved. Hence, I′ = I ∪ {1}.
However, 1 ∈ [n] (since n ≥ 1), and I′ = I ∪ {1} ⊆ [n] (since I ⊆ [n] and 1 ∈ [n]). Moreover, from I′ = I ∪ {1}, we obtain |I′| = |I ∪ {1}| = |I| + 1.
Now, if we had |I| ≤ m − 1, then we would obtain |I′| = |I| + 1 ≤ m (since |I| ≤ m − 1), which would entail that I′ is acceptable (since I′ ⊆ [n]), which again would contradict our assumption. Thus, we cannot have |I| ≤ m − 1. Hence, |I| > m − 1, so that |I| ≥ m and therefore |I| = m (since |I| ≤ m). Thus, we have shown that 1 ∉ I and |I| = m. This proves the “only if” direction and thus completes the proof of Claim 1.]

Now, recall our line of reasoning: We start with the sum of the signs of all acceptable sets, and we cancel any two addends that correspond to an acceptable set and its acceptable partner. What remains are the addends corresponding to the acceptable sets that have non-acceptable partners. According to Claim 1, these are precisely the acceptable sets I that satisfy (1 ∉ I and |I| = m). In other words, these are precisely the m-element subsets of [n] that do not contain 1. In other words, these are precisely the m-element subsets of [n] \ {1} (since a subset of [n] that does not contain 1 is the same as a subset of [n] \ {1}). Thus, there are precisely $\binom{n-1}{m}$ of these subsets (since [n] \ {1} is an (n − 1)-element set), and each of them has sign $(-1)^m$. Hence, there are precisely $\binom{n-1}{m}$ addends left in the sum after our cancellations, and each of these addends is $(-1)^m$. Hence,
\[
(\text{the sum of the signs of all acceptable sets}) = (-1)^m \binom{n-1}{m} .
\]
Comparing this with (190), we obtain
\[
\sum_{k=0}^{m} (-1)^k \binom{n}{k} = (-1)^m \binom{n-1}{m} .
\]
This proves Proposition 6.1.1.


Let me outline how to formalize this argument without using vague notions
like “cancelling” and “pairing up”. We let

A := {acceptable sets}

and

X := {acceptable sets whose partner is acceptable}
   = {I ⊆ [n] | |I| ≤ m, but not (|I| = m and 1 ∉ I)}

(by Claim 1 in the above proof). Now, we define a map
\[
f : \mathcal{X} \to \mathcal{X}, \qquad I \mapsto I' .
\]

This map f is a bijection, since each I ∈ X satisfies I′′ = I and thus I′ ∈ X. This bijection f is furthermore sign-reversing, meaning that $(-1)^{|f(I)|} = -(-1)^{|I|}$ for all I ∈ X. We claim that this automatically guarantees

(the sum of the signs of all acceptable sets)


= (the sum of the signs of all acceptable sets not belonging to X ) .

The reason for this equality is that in the sum of the signs of all acceptable sets,
the contributions of the sets that belong to X (that is, of the acceptable sets
that have acceptable partners) cancel each other out. This principle is worth
generalizing and stating as a lemma:

Lemma 6.1.2 (Cancellation principle, take 1). Let A be a finite set. Let X be
a subset of A.
For each I ∈ A, let sign I be a real number (not necessarily 1 or −1). Let
f : X → X be a bijection with the property that

sign ( f ( I )) = − sign I for all I ∈ X . (191)

(Such a bijection f is called sign-reversing.) Then,

\[
\sum_{I \in \mathcal{A}} \operatorname{sign} I = \sum_{I \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} I .
\]

Note that we did not require that f ◦ f = id in Lemma 6.1.2; we only required
that f is a bijection. That said, most examples that I know do have f ◦ f = id.
Proof of Lemma 6.1.2. Intuitively, this is clear: The contributions of all I ∈ X to the sum $\sum_{I \in \mathcal{A}} \operatorname{sign} I$ cancel out, to the extent they are not already zero. However, rather than formalize this cancellation, let us give an even slicker argument:
We have
\[
\sum_{I \in \mathcal{X}} \operatorname{sign} I
= \sum_{I \in \mathcal{X}} \underbrace{\operatorname{sign} (f(I))}_{\substack{= -\operatorname{sign} I \\ \text{(by (191))}}}
\qquad \left( \begin{array}{c} \text{here, we have substituted } f(I) \text{ for } I \\ \text{in the sum, since } f : \mathcal{X} \to \mathcal{X} \text{ is a bijection} \end{array} \right)
\]
\[
= \sum_{I \in \mathcal{X}} (-\operatorname{sign} I) = -\sum_{I \in \mathcal{X}} \operatorname{sign} I .
\]
Adding $\sum_{I \in \mathcal{X}} \operatorname{sign} I$ to both sides of this equality, we obtain $2 \cdot \sum_{I \in \mathcal{X}} \operatorname{sign} I = 0$. Hence, $\sum_{I \in \mathcal{X}} \operatorname{sign} I = 0$ (since any real number a satisfying 2a = 0 must satisfy a = 0).
Now, X ⊆ A; hence, we can split the sum $\sum_{I \in \mathcal{A}} \operatorname{sign} I$ as follows:
\[
\sum_{I \in \mathcal{A}} \operatorname{sign} I
= \underbrace{\sum_{I \in \mathcal{X}} \operatorname{sign} I}_{=0} + \sum_{I \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} I
= \sum_{I \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} I .
\]
This proves Lemma 6.1.2.
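As a concrete illustration, here is Lemma 6.1.2 applied to the proof of Proposition 6.1.1, for the sample parameters n = 6 and m = 3 (a Python sketch of our own; `partner` is the map I ↦ I △ {1}):

```python
from itertools import combinations
from math import comb

n, m = 6, 3  # sample parameters (our choice)

# A = all "acceptable sets": subsets of [n] of size <= m
A = [frozenset(c) for k in range(m + 1)
     for c in combinations(range(1, n + 1), k)]

def sign(I):
    return (-1) ** len(I)

def partner(I):
    # I' = I symmetric-difference {1}
    return I ^ frozenset({1})

# X = acceptable sets whose partner is acceptable
X = {I for I in A if not (len(I) == m and 1 not in I)}

# partner restricted to X is a sign-reversing bijection X -> X
assert all(partner(I) in X and sign(partner(I)) == -sign(I) for I in X)

# Lemma 6.1.2: the whole sum equals the sum over A \ X,
# which here is (-1)^m * C(n-1, m)
total = sum(sign(I) for I in A)
leftover = sum(sign(I) for I in A if I not in X)
assert total == leftover == (-1) ** m * comb(n - 1, m)
```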



In the proof of Proposition 6.1.1, we applied Lemma 6.1.2 to

A = {acceptable sets} and
X = {acceptable sets whose partner is acceptable} and
sign I = $(-1)^{|I|}$.

But there are many other situations in which Lemma 6.1.2 can be applied. For
example, A can be some set of permutations, and sign σ can be the sign of σ
(as in Definition 5.4.1).
Let us observe that Lemma 6.1.2 can be generalized. Indeed, in Lemma 6.1.2,
we can replace “real number” by “element of any Q-vector space” or even by
“element of any additive abelian group with the property that 2a = 0 implies
a = 0”. We cannot, however, remove this requirement entirely. Indeed, if all the
signs sign I were the element 1 of Z/2, then the sign-reversing condition (191)
would hold automatically (since 1 = −1 in Z/2), but the claim of Lemma 6.1.2
would not necessarily be true.
However, if we replace the word “bijection” by “involution with no fixed
points”, then Lemma 6.1.2 holds even without any requirements on the group:

Lemma 6.1.3 (Cancellation principle, take 2). Let A be a finite set. Let X be
a subset of A.
For each I ∈ A, let sign I be an element of some additive abelian group.
Let f : X → X be an involution (i.e., a map satisfying f ◦ f = id) that has no
fixed points. Assume that

sign ( f ( I )) = − sign I for all I ∈ X .

Then,
\[
\sum_{I \in \mathcal{A}} \operatorname{sign} I = \sum_{I \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} I .
\]

Proof. The idea is that all addends corresponding to the I ∈ X cancel out from the sum $\sum_{I \in \mathcal{A}} \operatorname{sign} I$ (because they come in pairs of addends with opposite signs). See Section B.6 for a detailed proof.
A more general version of Lemma 6.1.3 allows for f to have fixed points, as
long as these fixed points have sign 0:

Lemma 6.1.4 (Cancellation principle, take 3). Let A be a finite set. Let X be
a subset of A.
For each I ∈ A, let sign I be an element of some additive abelian group.
Let f : X → X be an involution (i.e., a map satisfying f ◦ f = id). Assume
that
sign ( f ( I )) = − sign I for all I ∈ X .

Assume furthermore that

sign I = 0 for all I ∈ X satisfying f ( I ) = I.

Then,
\[
\sum_{I \in \mathcal{A}} \operatorname{sign} I = \sum_{I \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} I .
\]

Proof. This is similar to Lemma 6.1.3, except that the addends corresponding to
the I ∈ X satisfying f ( I ) = I don’t cancel (but are already zero and thus can
be removed right away). See Section B.6 for a detailed proof.
Let us try to use this idea in another setting. Recall the notion of q-binomial
coefficients, and specifically their values (Definition 4.4.3 (b)).
 
Exercise 6.1.0.1. Let n, k ∈ N. Simplify $\binom{n}{k}_{-1}$.
 
Example 6.1.5. Let us compute $\binom{4}{2}_{-1}$. Theorem 4.4.13 (b) yields
\[
\binom{4}{2}_q = \frac{\left(1-q^4\right)\left(1-q^3\right)}{\left(1-q^2\right)\left(1-q^1\right)} .
\]
We cannot substitute −1 for q in this formula directly, since both numerator and denominator would become 0 if we did. However, we can first simplify the fraction and then substitute −1 for q: We have
\[
\binom{4}{2}_q = \frac{\left(1-q^4\right)\left(1-q^3\right)}{\left(1-q^2\right)\left(1-q^1\right)} = q^4 + q^3 + 2q^2 + q + 1,
\]
so that (by substituting −1 for q) we obtain
\[
\binom{4}{2}_{-1} = (-1)^4 + (-1)^3 + 2(-1)^2 + (-1) + 1 = 2 .
\]

Solution of Exercise 6.1.0.1 (sketched). Proposition 4.4.7 (b) yields
\[
\binom{n}{k}_q = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S| = k}} q^{\operatorname{sum} S - (1+2+\cdots+k)} ,
\]

where sum S denotes the sum of the elements of a finite set S of integers. Substituting −1 for q in this equality, we find
\[
\binom{n}{k}_{-1} = \sum_{\substack{S \subseteq \{1,2,\ldots,n\}; \\ |S| = k}} (-1)^{\operatorname{sum} S - (1+2+\cdots+k)} .
\]
Using the shorthand [n] for the set {1, 2, . . . , n}, we can rewrite this as
\[
\binom{n}{k}_{-1} = \sum_{\substack{S \subseteq [n]; \\ |S| = k}} (-1)^{\operatorname{sum} S - (1+2+\cdots+k)} . \tag{192}
\]

Let us analyze the sum on the right hand side using sign-reversing involutions.
Thus, we set

A := {S ⊆ [n] | |S| = k} = {k-element subsets of [n]}
and
sign S := $(-1)^{\operatorname{sum} S - (1+2+\cdots+k)}$ for every S ∈ A.
Hence, (192) rewrites as
\[
\binom{n}{k}_{-1} = \sum_{S \in \mathcal{A}} \operatorname{sign} S . \tag{193}
\]
Now, we seek a reasonable subset X ⊆ A and a sign-reversing bijection f : X → X in order to cancel addends in the sum $\sum_{S \in \mathcal{A}} \operatorname{sign} S$.
To wit, let us try to construct f as a partial map first, and then (as an afterthought) define X to be the set of all S ∈ A for which f(S) is defined.
Consider a k-element subset S of [n]. What is a way to transform S that leaves its size |S| = k unchanged, but flips its sign (i.e., flips the parity of sum S)? One thing we can do is switching 1 with 2. By this I mean the following operation:

• If 1 ∈ S and 2 ∉ S, then we replace 1 by 2 in S.

• Otherwise, if 2 ∈ S and 1 ∉ S, then we replace 2 by 1 in S.

• If none of 1 and 2 is in S, or if both are in S, then we leave S unchanged for now.

Thus, switching 1 with 2 means replacing S by
\[
\operatorname{switch}_{1,2}(S) := \begin{cases} (S \setminus \{1\}) \cup \{2\}, & \text{if } 1 \in S \text{ and } 2 \notin S; \\ (S \setminus \{2\}) \cup \{1\}, & \text{if } 1 \notin S \text{ and } 2 \in S; \\ S, & \text{otherwise.} \end{cases}
\]

For example,

switch1,2 ({1, 3, 5}) = {2, 3, 5} ;


switch1,2 ({2, 3, 5}) = {1, 3, 5} ;
switch1,2 ({1, 2, 5}) = {1, 2, 5} ;
switch1,2 ({3, 4, 5}) = {3, 4, 5} .

Notice that the definition of switching 1 with 2 can be restated in a somewhat simpler way using symmetric differences (see Subsection 3.2.1 for the definition of symmetric differences):
\[
\operatorname{switch}_{1,2}(S) := \begin{cases} S \mathbin{\triangle} \{1,2\}, & \text{if } |S \cap \{1,2\}| = 1; \\ S, & \text{otherwise.} \end{cases}
\]
Indeed, the condition “|S ∩ {1, 2}| = 1” is equivalent to having either (1 ∈ S and 2 ∉ S) or (1 ∉ S and 2 ∈ S); and in this case, the symmetric difference S △ {1, 2} is precisely the set we need (i.e., the set (S \ {1}) ∪ {2} if we have (1 ∈ S and 2 ∉ S), and the set (S \ {2}) ∪ {1} if we have (1 ∉ S and 2 ∈ S)).
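In code, the symmetric-difference form makes switching 1 with 2 a one-liner (a small sketch of our own, reproducing the four examples above):

```python
def switch_1_2(S):
    # S triangle {1, 2} if exactly one of 1, 2 lies in S; otherwise S unchanged
    return S ^ {1, 2} if len(S & {1, 2}) == 1 else S

assert switch_1_2({1, 3, 5}) == {2, 3, 5}
assert switch_1_2({2, 3, 5}) == {1, 3, 5}
assert switch_1_2({1, 2, 5}) == {1, 2, 5}
assert switch_1_2({3, 4, 5}) == {3, 4, 5}
```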
This map switch_{1,2} : A → A is certainly a bijection (and, in fact, an involution). It is not sign-reversing on the entire set A; however, it has the property that the sign of switch_{1,2}(S) is opposite to the sign of S whenever we have (1 ∈ S and 2 ∉ S) or (1 ∉ S and 2 ∈ S) (because in these two cases, sum S either increases by 1 or decreases by 1, respectively). We can restate this property as follows: The sign of switch_{1,2}(S) is opposite to the sign of S whenever we have |S ∩ {1, 2}| = 1. Thus, we can use switch_{1,2} to cancel many addends from our sum $\sum_{S \in \mathcal{A}} \operatorname{sign} S$. Still, many other addends (of different signs) remain, and the result is far from simple.
Thus, we need a “Plan B” if the map switch1,2 does not succeed. Assuming
that |S ∩ {1, 2}| ≠ 1 (that is, the set S ∈ A contains none or both of 1 and 2),
we gain nothing by switching 1 with 2 in S, but maybe we get lucky switching
2 with 3 in S (which is defined in the same way as switching 1 with 2, but with
the obvious changes)? If that, too, fails, we can try to switch 3 with 4. If that
fails as well, we can try to switch 4 with 5, and so on, until we get to the end of
the set [n].
In other words, we try to define a bijection f : A → A as follows: For any
S ∈ A, we pick the smallest i ∈ [n − 1] such that |S ∩ {i, i + 1}| = 1 (in other
words, the smallest i ∈ [n − 1] such that exactly one of the two elements i and
i + 1 belongs to S); and we switch i with i + 1 in S (that is, we replace S by
S △ {i, i + 1}). This produces a new subset S′ of [n] that has the same size as S
but has the opposite sign (actually, we have sum S′ = sum S ± 1), except for the
two cases when S = ∅ and when S = [n] (these are the cases where we cannot
find any i ∈ [n − 1] such that |S ∩ {i, i + 1}| = 1). We set f (S) := S △ {i, i + 1}.

Here are some examples (for n = 4 and k = 2):

f ({1, 3}) = {2, 3} (here, the smallest i is 1) ;


f ({1, 4}) = {2, 4} (here, the smallest i is 1) ;
f ({3, 4}) = {2, 4} (here, the smallest i is 2) .

Alas, the last two of these examples show that f is not injective (as f ({1, 4}) =
{2, 4} = f ({3, 4})). Thus, f is not a bijection. The underlying problem is that
the i that was picked in the construction of f (S) is not uniquely recoverable
from f (S). Hence, our map f does not work for us – we cannot use it to cancel
addends, since we cannot cancel (e.g.) a single 1 against multiple −1s.
How can we salvage this argument? We change our map f to “space” our
switches apart. That is, we again start by trying to switch 1 with 2; if this fails,
we jump straight to trying to switch 3 with 4; if this fails too, we jump further
to trying to switch 5 with 6; and so on, until we either succeed at some switch
or run out of pairs to switch. For the explicit description of f , this means that
instead of picking the smallest i ∈ [n − 1] such that |S ∩ {i, i + 1}| = 1, we
pick the smallest odd i ∈ [n − 1] such that |S ∩ {i, i + 1}| = 1; and then we set
f (S) := S △ {i, i + 1} as before.
In other words, we define our new map f : A → A as follows: For any
S ∈ A, we set
f (S) := S △ {i, i + 1} ,
where i is the smallest odd element of [n − 1] such that |S ∩ {i, i + 1}| = 1. If
no such i exists, we just set f (S) := S. (We will soon see when this happens.)
Here are some examples (for n = 8 and k = 3):

f ({1, 3, 4}) = {2, 3, 4} (here, the smallest odd i is 1) ;


f ({2, 4, 5}) = {1, 4, 5} (here, the smallest odd i is 1) ;
f ({1, 2, 3}) = {1, 2, 4} (here, the smallest odd i is 3) ;
f ({3, 5, 7}) = {4, 5, 7} (here, the smallest odd i is 3) ;
f ({3, 4, 5}) = {3, 4, 6} (here, the smallest odd i is 5) ;
f ({5, 6, 7}) = {5, 6, 8} (here, the smallest odd i is 7) .

And here are two more examples (for n = 8 and k = 4):

f ({1, 2, 5, 7}) = {1, 2, 6, 7} (here, the smallest odd i is 5) ;


f ({1, 2, 5, 6}) = {1, 2, 5, 6} (here, there is no appropriate odd i ) .
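The spaced-out map f is easy to implement, and the involution and sign-flipping claims below can be checked exhaustively for small n and k (a sketch of our own; our `f` takes n as an extra argument):

```python
from itertools import combinations

def f(S, n):
    # switch i with i+1 for the smallest odd i in [n-1] with |S ∩ {i, i+1}| = 1
    for i in range(1, n, 2):
        if len(S & {i, i + 1}) == 1:
            return S ^ {i, i + 1}
    return S  # no such i exists: S is a fixed point

assert f({1, 3, 4}, 8) == {2, 3, 4}
assert f({5, 6, 7}, 8) == {5, 6, 8}
assert f({1, 2, 5, 6}, 8) == {1, 2, 5, 6}

# f is an involution on the k-element subsets of [n], and it flips
# the parity of sum S whenever f(S) != S
n, k = 8, 3
for c in combinations(range(1, n + 1), k):
    S = set(c)
    assert f(f(S, n), n) == S
    if f(S, n) != S:
        assert sum(f(S, n)) % 2 != sum(S) % 2
```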

Once again, it is clear that the set f (S) has size k whenever S does. Hence,
f : A → A is at least a well-defined map. This time, the map f is furthermore
an involution (that is, f ◦ f = id). Here is a quick argument for this (details
are left to the reader): Since we have “spaced” the switches apart, they don’t
interfere with each other. Thus, the i that gets chosen in the construction of

f (S) will again get chosen in the construction of f ( f (S)) (since the elements
of f (S) that are smaller than this i will not have changed from S). Thus, the
switch that happens in the construction of f ( f (S)) undoes the switch made in
the construction of f (S), and as a result, the set f ( f (S)) will be S again. This
shows that f ◦ f = id.
Thus, f is an involution, hence a bijection. Moreover, sign(f(S)) = − sign S holds whenever f(S) ≠ S (because f(S) ≠ S implies that f(S) = S △ {i, i + 1} for some i satisfying |S ∩ {i, i + 1}| = 1, and therefore sum(f(S)) = sum S ± 1). Thus, we set
X := {S ∈ A | f(S) ≠ S},
and we restrict f to a map X → X (this is well-defined, since it is easy to see
from f ◦ f = id that f (S) ∈ X for each S ∈ X ). Then, the map f becomes a
sign-reversing bijection from X to X . Hence, Lemma 6.1.2 yields

\[
\sum_{I \in \mathcal{A}} \operatorname{sign} I = \sum_{I \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} I .
\]

Renaming the index I as S, we can rewrite this equality as

\[
\sum_{S \in \mathcal{A}} \operatorname{sign} S = \sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S .
\]

Hence, (193) becomes
\[
\binom{n}{k}_{-1} = \sum_{S \in \mathcal{A}} \operatorname{sign} S = \sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S . \tag{194}
\]

Now, what is A \ X? In other words, what addends are left behind uncancelled?
In order to answer this question, we need to consider the case when n is even
and the case when n is odd separately. We begin with the case when n is even.
A k-element subset S of [n] belongs to A \ X if and only if it satisfies f (S) =
S. In other words, S belongs to A \ X if and only if there exists no odd i ∈
[n − 1] such that |S ∩ {i, i + 1}| = 1 (because f has been defined in such a way
that f(S) = S in this case, while f(S) = S △ {i, i + 1} ≠ S in the other case).
In other words, S belongs to A \ X if and only if for each odd i ∈ [n − 1], the
size |S ∩ {i, i + 1}| is either 0 or 2. This is equivalent to saying that if we break
up the n elements 1, 2, . . . , n into n/2 “blocks”

{1, 2} , {3, 4} , {5, 6} , . . . , {n − 1, n}

(this can be done, since n is even), then the intersection of S with each block
has size 0 or 2. In other words, this is saying that the set S consists of entire
blocks (i.e., each block is either fully included in S or is disjoint from S). In
other words, this is saying that the set S is a union (possibly empty) of blocks.

We call a subset S of [n] blocky if it satisfies this condition.82 Thus, a k-element


subset S of [n] belongs to A \ X if and only if it is blocky. In other words, A \ X
is the set of all blocky k-element subsets of [n].
How many blocky k-element subsets does [n] have, and what are their signs? Any blocky subset of [n] has the form83
\[
\{i_1, i_1+1\} \sqcup \{i_2, i_2+1\} \sqcup \cdots \sqcup \{i_p, i_p+1\}
\]
for some odd elements $i_1 < i_2 < \cdots < i_p$ of [n − 1]. Thus, this subset has
size 2p, which entails that its size is even. Hence, if k is odd, there are no blocky k-element subsets of [n] at all. In other words, if k is odd, then there are no S ∈ A \ X (since A \ X is the set of all blocky k-element subsets of [n]). Therefore, if k is odd, then the sum $\sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S$ is empty, and thus (194) simplifies to
\[
\binom{n}{k}_{-1} = \sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S = (\text{empty sum}) = 0 . \tag{195}
\]
Let us now consider the case when k is even. In this case, again, any blocky subset S of [n] has the form
\[
\{i_1, i_1+1\} \sqcup \{i_2, i_2+1\} \sqcup \cdots \sqcup \{i_p, i_p+1\}
\]
for some odd elements $i_1 < i_2 < \cdots < i_p$ of [n − 1]. Moreover, again, this subset has size 2p. Thus, if S has size k, then we must have 2p = k, so that p = k/2. Thus, any blocky k-element subset S of [n] has the form
\[
\{i_1, i_1+1\} \sqcup \{i_2, i_2+1\} \sqcup \cdots \sqcup \{i_{k/2}, i_{k/2}+1\}
\]
for some odd elements $i_1 < i_2 < \cdots < i_{k/2}$ of [n − 1]. Hence, there are $\binom{n/2}{k/2}$ such subsets S (since there are precisely $\binom{n/2}{k/2}$ choices for these odd elements $i_1 < i_2 < \cdots < i_{k/2}$ 84). Moreover, any such subset S satisfies
\[
\operatorname{sum} S = \underbrace{i_1 + (i_1+1)}_{\equiv 1 \bmod 2} + \underbrace{i_2 + (i_2+1)}_{\equiv 1 \bmod 2} + \cdots + \underbrace{i_{k/2} + (i_{k/2}+1)}_{\equiv 1 \bmod 2}
\equiv \underbrace{1 + 1 + \cdots + 1}_{k/2 \text{ times}} = k/2 \bmod 2
\]
and therefore
\[
\operatorname{sum} S - \underbrace{(1 + 2 + \cdots + k)}_{= \frac{k(k+1)}{2}} \equiv k/2 - \frac{k(k+1)}{2} = -k^2/2 \equiv 0 \bmod 2
\]
82 For example, {3, 4, 5, 6, 9, 10} is a blocky subset of [10], whereas {2, 3, 5, 6} is not (since it
neither fully includes nor is disjoint from the block {1, 2}).
83 The symbol “⊔” means “disjoint union” (in our case, a union of disjoint sets).
84 because the set [ n − 1] has n/2 odd elements

(since k is even, so that −k²/2 is even), and
\[
\operatorname{sign} S = (-1)^{\operatorname{sum} S - (1+2+\cdots+k)} = 1 \tag{196}
\]
(since sum S − (1 + 2 + · · · + k) ≡ 0 mod 2). Hence, the sum $\sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S$ has $\binom{n/2}{k/2}$ addends (because A \ X is the set of all blocky k-element subsets of [n], and we have just shown that there are $\binom{n/2}{k/2}$ such subsets), and each of these addends is 1 (by (196)). Thus, this sum simplifies to
\[
\sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S = \binom{n/2}{k/2} \cdot 1 = \binom{n/2}{k/2} .
\]

Hence, (194) becomes
\[
\binom{n}{k}_{-1} = \sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S = \binom{n/2}{k/2} . \tag{197}
\]
Thus, we have computed $\binom{n}{k}_{-1}$

• in the case when n is even and k is odd (obtaining (195)), and

• in the case when n is even and k is even (obtaining (197)).

It remains to handle the case when n is odd. This case is different in that the
n elements 1, 2, . . . , n are now subdivided into (n + 1) /2 “blocks”

{1, 2} , {3, 4} , {5, 6} , . . . , {n − 2, n − 1} , {n} ,

with the last of these blocks having size 1. As a consequence, this time, a blocky
subset of [n] can have odd size. Moreover, the parity of k determines whether
a blocky k-element subset of [n] will contain n:

• If k is even, then no blocky k-element subset of [n] can contain n (because


if it did, then it would have odd size, since all non-{n} blocks have even
size).

• If k is odd, then every blocky k-element subset of [n] must contain n


(because if it didn’t, then it would have even size, since all non-{n} blocks
have even size).

Thus, when classifying the blocky k-element subsets of [n], we can either dismiss n immediately (if k is even) or take n for granted (if k is odd); in either case, the problem gets reduced to classifying the blocky k-element or (k − 1)-element subsets of [n − 1], which we already know how to do (since n − 1 is even). The result is that the # of blocky k-element subsets of [n] (in the case when n is odd) is
\[
\begin{cases} \dbinom{(n-1)/2}{k/2}, & \text{if } k \text{ is even}; \\[2mm] \dbinom{(n-1)/2}{(k-1)/2}, & \text{if } k \text{ is odd} \end{cases}
= \binom{(n-1)/2}{\lfloor k/2 \rfloor},
\]
and their signs are always 1. Hence, we obtain
\[
\sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S = \binom{(n-1)/2}{\lfloor k/2 \rfloor} \cdot 1 = \binom{(n-1)/2}{\lfloor k/2 \rfloor} .
\]
Thus, (194) becomes
\[
\binom{n}{k}_{-1} = \sum_{S \in \mathcal{A} \setminus \mathcal{X}} \operatorname{sign} S = \binom{(n-1)/2}{\lfloor k/2 \rfloor} . \tag{198}
\]
This is the answer to our exercise in the case when n is odd.
Returning to the general case, we can now combine the formulas (197), (195) and (198) into a single equality that holds in all cases:
\[
\binom{n}{k}_{-1} = \begin{cases} 0, & \text{if } n \text{ is even and } k \text{ is odd}; \\[2mm] \dbinom{\lfloor n/2 \rfloor}{\lfloor k/2 \rfloor}, & \text{otherwise.} \end{cases} \tag{199}
\]
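The values $\binom{n}{k}_{-1}$ can be computed independently via the q-Pascal recurrence $\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k \binom{n-1}{k}_q$, which holds as a polynomial identity and thus can be evaluated at q = −1 termwise. The following quick check of (199) is our own:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def qbin_at_minus1(n, k):
    # the Gaussian binomial [n choose k]_q evaluated at q = -1,
    # computed via the q-Pascal recurrence
    if k < 0 or k > n:
        return 0
    if k == 0:
        return 1
    return qbin_at_minus1(n - 1, k - 1) + (-1) ** k * qbin_at_minus1(n - 1, k)

# check formula (199) for all 0 <= k <= n <= 14
for n in range(15):
    for k in range(n + 1):
        expected = 0 if (n % 2 == 0 and k % 2 == 1) else comb(n // 2, k // 2)
        assert qbin_at_minus1(n, k) == expected
```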

This formula (199) can be generalized. Indeed, here is a generalization of the


number −1:
Definition 6.1.6. Let K be a field. Let d be a positive integer.
(a) A d-th root of unity in K means an element ω of K satisfying $\omega^d = 1$. In other words, a d-th root of unity in K means an element of K whose d-th power is 1.
(b) A primitive d-th root of unity in K means an element ω of K satisfying
\[
\omega^d = 1 \qquad \text{but} \qquad \omega^i \neq 1 \ \text{ for each } i \in \{1, 2, \ldots, d-1\} .
\]
In other words, a primitive d-th root of unity in K means an element of the
multiplicative group K × whose order is d.
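Definition 6.1.6 makes sense over any field, not just C. For instance, in the finite field Z/7 (an example of our own choosing), the unit 3 is a primitive 6-th root of unity:

```python
p = 7  # we work in the field Z/7

def order(w):
    # multiplicative order of w in the group (Z/p)^x
    k, x = 1, w % p
    while x != 1:
        x = x * w % p
        k += 1
    return k

# 3 generates (Z/7)^x, so it is a primitive 6-th root of unity in Z/7
assert pow(3, 6, p) == 1 and order(3) == 6

# the 2-nd roots of unity in Z/7 are 1 and 6 = -1; only -1 is primitive
assert [w for w in range(1, p) if w * w % p == 1] == [1, 6]
assert order(6) == 2
```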

For K = C, the d-th roots of unity are the d complex numbers
\[
e^{2\pi i \cdot 0/d}, \ e^{2\pi i \cdot 1/d}, \ e^{2\pi i \cdot 2/d}, \ \ldots, \ e^{2\pi i (d-1)/d}
\]
(which are the vertices of a regular d-gon inscribed in the unit circle, with one vertex at 1), whereas the primitive d-th roots of unity are the numbers $e^{2\pi i g/d}$ for all g ∈ [d] satisfying gcd(g, d) = 1. In particular, the 2-nd roots of unity in C are 1 and −1, and the only primitive 2-nd root of unity is −1. The following picture shows the six 6-th roots of unity in C (with the two primitive 6-th roots colored blue, and the remaining four roots colored red):
[Figure: the points $e^{2\pi i g/6}$ for g = 0, 1, . . . , 5 on the unit circle; the primitive ones are $e^{2\pi i \cdot 1/6}$ and $e^{2\pi i \cdot 5/6}$.]

We can now generalize (199) by replacing −1 by primitive roots of unity:

Theorem 6.1.7 (q-Lucas theorem). Let K be a field. Let d be a positive integer. Let ω be a primitive d-th root of unity in K. Let n, k ∈ N. Let q and u be the quotients obtained when dividing n and k by d with remainder, and let r and v be the respective remainders. Then,
\[
\binom{n}{k}_\omega = \binom{q}{u} \cdot \binom{r}{v}_\omega . \tag{200}
\]

Note that the equality (200) contains two ω-binomial coefficients and one regular binomial coefficient.
It is not hard to check that (199) is the particular case of Theorem 6.1.7 for d = 2 and ω = −1. Indeed, the only possible remainders of an integer upon division by 2 are 0 and 1, and the ω-binomial coefficients $\binom{r}{v}_\omega$ for r, v ∈ {0, 1} are
\[
\binom{0}{0}_\omega = 1, \qquad \binom{0}{1}_\omega = 0, \qquad \binom{1}{0}_\omega = 1, \qquad \binom{1}{1}_\omega = 1 .
\]

Theorem 6.1.7 can be proved using a generalization of sign-reversing involutions to d-cycles instead of pairs85. A more algebraic proof – using “noncommutative generating functions” – is given in Exercise A.5.1.3.
85 In our proofs so far, we have been using the equality 1 + (−1) = 0 to cancel addends in

6.2. The principles of inclusion and exclusion


We have so far been applying the “yoga” of sign-reversing involutions directly
to alternating sums. However, some of its essence can also be crystallized into
useful theorems. Most famous among such theorems are the principles of inclu-
sion and exclusion (also known as Sylvester sieve theorems or Poincaré’s theorems)86 .

6.2.1. The size version


The simplest such principle answers the following question: Assume that you
have a finite set U, and some subsets A1 , A2 , . . . , An of U. Assume that, for
each selection of some of these subsets, you know how many elements of U
belong to all selected sets at the same time. (For instance, you know how many
elements of U belong to A2 , A3 and A5 at the same time.) Does this help you
count the elements of U that belong to none of the n subsets (i.e., that don’t
belong to A1 ∪ A2 ∪ · · · ∪ An ) ?
The answer is “yes”, and there is an explicit formula for this count:

Theorem 6.2.1 (size version of the PIE). Let n ∈ N. Let U be a finite set. Let
A1 , A2 , . . . , An be n subsets of U. Then,

\[
(\text{\# of } u \in U \text{ that satisfy } u \notin A_i \text{ for all } i \in [n])
= \sum_{I \subseteq [n]} (-1)^{|I|} \, (\text{\# of } u \in U \text{ that satisfy } u \in A_i \text{ for all } i \in I) .
\]

Some explanations are in order:

• Here and in the following, we are using the notation [n] for the set {1, 2, . . . , n},
as defined in Definition 5.1.2.

• The summation sign “$\sum_{I \subseteq [n]}$” means a sum over all subsets I of [n]. More generally, if S is a given set, then the summation sign “$\sum_{I \subseteq S}$” shall always mean a sum over all subsets I of S.

• The shorthand “PIE” in the name of Theorem 6.2.1 is short for “Principle
of Inclusion and Exclusion”.

alternating sums. This equality can be generalized as follows: If ω is a primitive d-th


root of unity for d > 1, then 1 + ω + ω 2 + · · · + ω d−1 = 0. This equality can be used to
cancel addends in sums that involve powers of ω. The discrete Fourier transform (see, e.g.,
[OlvSha18, §5.6]) and the roots-of-unity filter (see, e.g., [Knuth1, §1.2.9.D]) are applications of
this idea.
86 People typically use the singular forms (“principle” and “theorem” rather than “principles”

and “theorems”), but often mean different things.



In one form or another, Theorem 6.2.1 appears in almost any text on com-
binatorics (e.g., in [Sagan19, Theorem 2.1.1], in [Loehr11, §4.11], in [Strick20,
Theorem 5.3], in [19fco, Theorem 2.9.7], or – in almost the same form as above
– in [20f, Theorem 7.8.6]). Most commonly, its claim is stated in the shorter (if
less transparent) form

\[
|U \setminus (A_1 \cup A_2 \cup \cdots \cup A_n)| = \sum_{I \subseteq [n]} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right| ,
\]
where $\bigcap_{i \in I} A_i$ denotes the set {u ∈ U | u ∈ A_i for all i ∈ I} whenever I is a subset of [n] 87. This form is indeed equivalent to the claim of Theorem 6.2.1, since we have

\[
|U \setminus (A_1 \cup A_2 \cup \cdots \cup A_n)|
= (\text{\# of } u \in U \text{ that satisfy } u \notin A_1 \cup A_2 \cup \cdots \cup A_n)
= (\text{\# of } u \in U \text{ that satisfy } u \notin A_i \text{ for all } i \in [n])
\]
and since (for each subset I of [n]) we have
\[
\left| \bigcap_{i \in I} A_i \right| = (\text{\# of } u \in U \text{ that satisfy } u \in A_i \text{ for all } i \in I)
\]
(by the definition of $\bigcap_{i \in I} A_i$ that we just gave).
Rather than prove Theorem 6.2.1 directly, we shall soon derive it from more
general results (in order to avoid duplicating arguments). First, however, let us
give an interpretation that makes Theorem 6.2.1 a little bit more intuitive, and
sketch four applications (more can be found in textbooks – e.g., [Sagan19, §2.1],
[Stanle11, Chapter 2], [Wildon19, Chapter 3], [AndFen04, Chapter 6], [19fco,
§2.9]).
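As a quick numeric illustration (an example of our own, with U = {1, . . . , 60} and A_i the multiples of 2, 3, 5 respectively), both sides of Theorem 6.2.1 can be compared directly:

```python
from itertools import combinations

U = range(1, 61)
divisors = {1: 2, 2: 3, 3: 5}   # A_i = multiples of divisors[i] in U
n = len(divisors)

# left-hand side: # of u in U lying in none of the sets A_i
lhs = sum(1 for u in U if all(u % d != 0 for d in divisors.values()))

# right-hand side: the inclusion-exclusion sum over all subsets I of [n]
rhs = sum(
    (-1) ** len(I) * sum(1 for u in U if all(u % divisors[i] == 0 for i in I))
    for size in range(n + 1)
    for I in combinations(range(1, n + 1), size)
)

assert lhs == rhs  # both count the 16 elements of U coprime to 30
```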

“Rule-breaking” interpretation of Theorem 6.2.1. Assume that we


are given a finite set U, and we are given n rules (labelled 1, 2, . . . , n)
87 This notation should be taken with a grain of salt. When I is a nonempty subset of [n], it is indeed true that the set $\bigcap_{i \in I} A_i$ as we just defined it (i.e., the set {u ∈ U | u ∈ A_i for all i ∈ I}) is the intersection of the sets A_i over all i ∈ I (that is, if I = {i_1, i_2, . . . , i_k}, then $\bigcap_{i \in I} A_i = A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}$). However, if I is the empty set, then the literal intersection of the sets A_i over all i ∈ I is not well-defined (indeed, by common sense, such an intersection should contain every object whatsoever; but there is no set that does this), whereas the set we just called $\bigcap_{i \in I} A_i$ is simply the set U. This is still justified, since we should think of the sets A_i not as arbitrary sets but rather as subsets of U (so that any intersections are to be taken within U). This, incidentally, is the reason for the choice of the letter “U”: We think of U as the “universe” in which our objects live.

that each element of U may or may not satisfy. (For instance, one
element of U might satisfy all of these rules; another might satisfy
none; yet another might satisfy rules 1 and 3 only. A rule can be
something like “thou shalt be divisible by 5” (if the elements of U
are numbers) or “thou shalt be a nonempty set” (if the elements of
U are sets).)
Assume that, for each I ⊆ [n], we know how many elements u ∈ U
satisfy all rules in I (but may or may not satisfy the remaining rules).
For example, this means that we know how many elements u ∈ U
satisfy rules 2, 3, 5 (simultaneously). Then, we can compute the # of
elements u ∈ U that violate all n rules 1, 2, . . . , n by the following
formula:
(# of elements $u \in U$ that violate all $n$ rules $1, 2, \ldots, n$)
$$= \sum_{I \subseteq [n]} (-1)^{|I|} \, (\#\text{ of elements } u \in U \text{ that satisfy all rules in } I).$$

Indeed, this formula is precisely what we obtain if we apply Theo-


rem 6.2.1 to the n subsets A1 , A2 , . . . , An defined by setting
Ai := {u ∈ U | u satisfies rule i } for each i ∈ [n] .

Thus, if you have a counting problem that can be restated as “count things
that violate a bunch of rules”, then you can apply Theorem 6.2.1 (in the inter-
pretation we just gave) to “turn the problem positive”, i.e., to make it about
counting rule-followers instead of rule-violators. If the “positive” problem is
easier, then this is a useful technique. We will now witness this on four exam-
ples.

6.2.2. Examples
Example 1. Let n, m ∈ N. Let us compute the # of surjective maps from [m] to
[n]. (We will outline the argument here; details can be found in [20f, §7.8.2] or
[19fco, §2.9.4].)
What are surjective maps? They are maps that take each element of the target
set as a value. Thus, in particular, a map f : [m] → [n] is surjective if and only
if it takes each i ∈ [n] as a value.
Hence, if we impose n rules 1, 2, . . . , n on a map f : [m] → [n], where rule i
says “thou shalt not take i as a value”, then the surjective maps f : [m] → [n]
are precisely the maps f : [m] → [n] that violate all n rules. Hence,
(# of surjective maps $f : [m] \to [n]$)
= (# of maps $f : [m] \to [n]$ that violate all $n$ rules $1, 2, \ldots, n$)
$$= \sum_{I \subseteq [n]} (-1)^{|I|} \, (\#\text{ of maps } f : [m] \to [n] \text{ that satisfy all rules in } I)$$

(by the “rule-breaking” interpretation of Theorem 6.2.1).


Now, fix some subset I of [n]. What is the # of maps f : [m] → [n] that satisfy
all rules in I ? A map f : [m] → [n] satisfies all rules in I if and only if it takes
none of the i ∈ I as a value, i.e., if all its values belong to [n] \ I. Thus, the maps
f : [m] → [n] that satisfy all rules in I are nothing but the maps from [m] to
$[n] \setminus I$. The # of such maps is therefore $|[n] \setminus I|^{|[m]|} = (n - |I|)^m$ (since $I \subseteq [n]$ entails $|[n] \setminus I| = |[n]| - |I| = n - |I|$, and since $|[m]| = m$).
Forget that we fixed I. We thus have shown that for each subset I of [n], we
have

(# of maps $f : [m] \to [n]$ that satisfy all rules in $I$)
$$= (n - |I|)^m. \tag{201}$$

Substituting this into the above computation, we find

$$(\#\text{ of surjective maps } f : [m] \to [n])
= \sum_{I \subseteq [n]} (-1)^{|I|} \underbrace{(\#\text{ of maps } f : [m] \to [n] \text{ that satisfy all rules in } I)}_{= (n - |I|)^m \text{ (by (201))}}$$
$$= \sum_{I \subseteq [n]} (-1)^{|I|} (n - |I|)^m = \sum_{k=0}^{n} \sum_{\substack{I \subseteq [n];\\ |I| = k}} (-1)^{k} (n - k)^m$$

(here, we have split the sum according to the value of $|I|$)

$$= \sum_{k=0}^{n} \binom{n}{k} (-1)^{k} (n - k)^m$$

(since, for each $k$, the inner sum is a sum of $\binom{n}{k}$ many equal addends).

Thus, we have proved the following theorem:

Theorem 6.2.2. Let $n, m \in \mathbb{N}$. Then,

$$(\#\text{ of surjective maps } f : [m] \to [n]) = \sum_{k=0}^{n} (-1)^{k} \binom{n}{k} (n - k)^m.$$

This is the simplest expression for this number. It has no product formula
(unlike the # of injective maps f : [m] → [n], which is n (n − 1) (n − 2) · · · (n − m + 1)).
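Theorem 6.2.2 lends itself to a quick machine check. The following Python sketch (the function names are my own, not from the text) compares the alternating sum against a brute-force enumeration of surjective maps for small m and n:

```python
from itertools import product
from math import comb

def surjections_formula(m, n):
    # The alternating sum from Theorem 6.2.2
    return sum((-1)**k * comb(n, k) * (n - k)**m for k in range(n + 1))

def surjections_bruteforce(m, n):
    # Count the maps f : [m] -> [n] that attain every value in [n]
    return sum(1 for f in product(range(1, n + 1), repeat=m)
               if set(f) == set(range(1, n + 1)))

for m in range(6):
    for n in range(6):
        assert surjections_formula(m, n) == surjections_bruteforce(m, n)
```

(Note that Python's convention `0**0 == 1` matches the combinatorial convention used here: there is exactly one map from the empty set to the empty set.)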
Before we move on to the next example, let us draw a few consequences from
Theorem 6.2.2:

Corollary 6.2.3. Let $n \in \mathbb{N}$. Then:

(a) We have $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^m = 0$ for any $m \in \mathbb{N}$ satisfying $m < n$.

(b) We have $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^n = n!$.

(c) We have $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^m \geq 0$ for each $m \in \mathbb{N}$.

(d) We have $n! \mid \sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^m$ for each $m \in \mathbb{N}$.

Proof of Corollary 6.2.3 (sketched). (a) Let $m \in \mathbb{N}$ satisfy $m < n$. Theorem 6.2.2 shows that the sum $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^m$ equals the # of surjective maps $f : [m] \to [n]$. However, there are no such maps (by the Pigeonhole Principle for Surjections)88; hence, this # is $0$. Therefore, the sum is $0$. In other words, $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^m = 0$. This proves Corollary 6.2.3 (a).

(b) Theorem 6.2.2 (applied to $m = n$) shows that the sum $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^n$ equals the # of surjective maps $f : [n] \to [n]$. However, these maps are precisely the permutations of $[n]$ (by the Pigeonhole Principle for Surjections)89; therefore, this # is $n!$. Therefore, this sum is $n!$. This proves Corollary 6.2.3 (b).

(c) Let $m \in \mathbb{N}$. Theorem 6.2.2 shows that the sum $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^m$ equals the # of surjective maps $f : [m] \to [n]$. Hence, this sum is $\geq 0$. This proves Corollary 6.2.3 (c).

(d) Let $m \in \mathbb{N}$. Theorem 6.2.2 shows that the sum $\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n-k)^m$
88 Indeed, the Pigeonhole Principle for Surjections shows that there are no surjective maps from
a smaller set to a larger set. Thus, in particular, there are no surjective maps from [m] to [n]
(since m < n).
89 Indeed, the Pigeonhole Principle for Surjections shows that any surjective map between two

finite sets of the same size is bijective. Thus, any surjective map f : [n] → [n] is bijective,
and hence is a permutation of [n]. The converse is true as well (i.e., any permutation of [n]
is a surjective map f : [n] → [n]). Thus, the surjective maps f : [n] → [n] are precisely the
permutations of [n].

equals the # of surjective maps f : [m] → [n]. Let

U := {surjective maps f : [m] → [n]} .

Thus, this sum equals |U |. Hence, it remains to show that n! | |U |.


The argument that follows will use the language of group actions (although
it could be restated combinatorially)90 . If σ ∈ Sn is any permutation and f :
[m] → [n] is a surjective map, then the composition σ ◦ f : [m] → [n] is again a
surjective map. In other words, any σ ∈ Sn and f ∈ U satisfy σ ◦ f ∈ U. Hence,
the symmetric group Sn acts on the set U by the rule91

σ· f = σ◦ f for all σ ∈ Sn and f ∈ U.

(This is a well-defined group action, since composition of maps is associative.)


This turns U into an Sn -set. It is not hard to see that this action of Sn on U
is free – i.e., the stabilizer of each f ∈ U is the trivial subgroup {id} of Sn 92 .
Thus, the Orbit-Stabilizer Theorem (see, e.g., [Quinla21, Theorem 3.2.5]) shows
that each orbit of this action has size

$$[S_n : \{\mathrm{id}\}] = \underbrace{|S_n|}_{= n!} \,/\, \underbrace{|\{\mathrm{id}\}|}_{= 1} = n!/1 = n!.$$

However, the orbits of this action form a partition of U (since each element of
U belongs to exactly one orbit). Thus, we have

|U | = (the sum of the sizes of the orbits) = (# of orbits) · n!

(since each orbit has size n!). This entails n! | |U |, which is exactly what we
wanted to show. This proves Corollary 6.2.3 (d).
We note that parts (a) and (b) of Corollary 6.2.3 can also be proved alge-
braically (see, e.g., [20f, Exercise 5.4.2 (d)] for an algebraic generalization of
Corollary 6.2.3 (a)); but this is harder for Corollary 6.2.3 (d) and (to my knowl-
edge) impossible for Corollary 6.2.3 (c).
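All four parts of Corollary 6.2.3 can at least be spot-checked numerically. The following Python sketch (helper name `alt_sum` is mine) verifies the claims for small n and m:

```python
from math import comb, factorial

def alt_sum(n, m):
    # The sum appearing in Corollary 6.2.3
    return sum((-1)**k * comb(n, k) * (n - k)**m for k in range(n + 1))

for n in range(7):
    for m in range(10):
        s = alt_sum(n, m)
        if m < n:
            assert s == 0             # part (a)
        assert s >= 0                 # part (c)
        assert s % factorial(n) == 0  # part (d)
    assert alt_sum(n, n) == factorial(n)  # part (b)
```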
90 For a refresher on group actions, see (e.g.) [Quinla21, §3.2] or [Loehr11, §9.11–§9.15] or
[Artin10, §6.7–§6.9] or [Armstr19, Fall 2018, Weeks 8–10] or [Aigner07, §6.1]. When G is a
group, we use the word “G-set” to mean a set on which G acts.
91 This is called “acting by post-composition” (since σ ◦ f is obtained from f by composing with

σ “after” f ).
92 Proof. Let f ∈ U. We must prove that the stabilizer of f is {id}.

Let σ belong to the stabilizer of f . Thus, σ ◦ f = f . Now, let j ∈ [n]. Recall that f ∈ U;
hence, f is surjective. Thus, there exists some i ∈ [m] such that j = f (i ). Consider this i.
Now, from $j = f(i)$, we obtain $\sigma(j) = \sigma(f(i)) = (\sigma \circ f)(i) = f(i) = j$ (since $\sigma \circ f = f$). Now, forget that
we fixed j. We thus have shown that σ ( j) = j for each j ∈ [n]. In other words, σ = id.
Forget that we fixed σ. We thus have shown that any σ in the stabilizer of f satisfies
σ = id. In other words, the stabilizer of f is a subset of {id}. Therefore, the stabilizer of f
must be {id} (since id is clearly in the stabilizer of f ).

Example 2. (See [19fco, §2.9.5] for details.) We will use the following defini-
tion:

Definition 6.2.4. A derangement of a set X means a permutation of X that has


no fixed points.

Now, let n ∈ N. How many derangements does [n] have?


Before answering this question, we establish notations and a few examples.
Let Dn be the # of derangements of [n]. For example:

• The identity permutation id ∈ S0 is a derangement, since it has no fixed


points (since [0] = ∅ has no elements to begin with). Thus, D0 = 1.

• The identity permutation id ∈ S1 is not a derangement, since id (1) = 1.


Thus, D1 = 0.

• In the symmetric group S2 , the identity is not a derangement, but the


transposition s1 = t1,2 is one. Thus, D2 = 1.

• In the symmetric group S3 , the derangements are the 3-cycles cyc1,2,3 and
cyc1,3,2 . Thus, D3 = 2.

Here is a table of early values of Dn :

n   : 0   1   2   3   4   5    6     7      8       9        10
D_n : 1   0   1   2   9   44   265   1854   14833   133496   1334961

Note: The number Dn is also called the subfactorial of n and is sometimes


denoted by !n. Be careful with that notation: what is 2!2 ?

Let us now try to compute Dn in general. Fix n ∈ N, and set U := Sn =


{permutations of [n]}. We impose n rules 1, 2, . . . , n on a permutation σ ∈ U,
with rule i being “thou shalt leave the element i fixed” (in other words, rule i
requires a permutation σ ∈ U to satisfy σ (i ) = i). Now,

$D_n$ = (# of derangements of $[n]$)
= (# of permutations $u \in U$ that violate all $n$ rules $1, 2, \ldots, n$)
$$= \sum_{I \subseteq [n]} (-1)^{|I|} \, (\#\text{ of permutations } u \in U \text{ that satisfy all rules in } I)$$

(by the “rule-breaking” interpretation of Theorem 6.2.1).


Now, let I be a subset of [n]. What is the # of permutations u ∈ U that satisfy
all rules in I ? These permutations are the permutations of [n] that leave each
i ∈ I fixed (but can do whatever they want with the remaining elements of [n]).
Clearly, there are (n − | I |)! such permutations, since they are essentially the

permutations of the (n − | I |)-element set [n] \ I (with the elements of I tacked


on as fixed points). (See [19fco, Corollary 2.9.16] for the technical details of this
intuitively obvious argument.)
Forget that we fixed I. We thus have shown that

(# of permutations u ∈ U that satisfy all rules in I )


= (n − | I |)! (202)

for any subset I of [n].


Thus, the above computation becomes

$$D_n = \sum_{I \subseteq [n]} (-1)^{|I|} \underbrace{(\#\text{ of permutations } u \in U \text{ that satisfy all rules in } I)}_{= (n - |I|)! \text{ (by (202))}}$$
$$= \sum_{I \subseteq [n]} (-1)^{|I|} (n - |I|)! = \sum_{k=0}^{n} \sum_{\substack{I \subseteq [n];\\ |I| = k}} (-1)^{k} (n - k)!$$

(here, we have split the sum according to the value of $|I|$)

$$= \sum_{k=0}^{n} \binom{n}{k} (-1)^{k} (n - k)! = \sum_{k=0}^{n} (-1)^{k} \underbrace{\binom{n}{k} (n - k)!}_{= n!/k! \text{ (by (2))}} = \sum_{k=0}^{n} (-1)^{k} \frac{n!}{k!} = n! \cdot \sum_{k=0}^{n} \frac{(-1)^{k}}{k!}.$$

Let us summarize our results as a theorem:

Theorem 6.2.5. Let $n \in \mathbb{N}$. Then, the # of derangements of $[n]$ is

$$D_n = \sum_{k=0}^{n} (-1)^{k} \binom{n}{k} (n - k)! = n! \cdot \sum_{k=0}^{n} \frac{(-1)^{k}}{k!}.$$

Remark 6.2.6. The sum on the right hand side is a partial sum of the well-known infinite series $\sum_{k=0}^{\infty} \frac{(-1)^k}{k!} = e^{-1}$ (where $e = 2.718\ldots$ is Euler's number). This is quite helpful in approximating $D_n$; indeed, it is easy to see

(using some simple estimates) that

$$D_n = \operatorname{round}\left(\frac{n!}{e}\right) \qquad \text{for each } n > 0,$$
where round x means the result of rounding a real number x to the nearest
integer (fortunately, since e is irrational, we never get a tie).

See Exercise A.5.2.3 and [Wildon19, Chapter 1] for more about derangements.
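Both formulas of Theorem 6.2.5, as well as the rounding formula of Remark 6.2.6, are easy to confirm by machine. A Python sketch (function names are mine):

```python
from itertools import permutations
from math import comb, e, factorial

def derangements_formula(n):
    # D_n via Theorem 6.2.5
    return sum((-1)**k * comb(n, k) * factorial(n - k) for k in range(n + 1))

def derangements_bruteforce(n):
    # Count the permutations of {0, ..., n-1} with no fixed point
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(8):
    assert derangements_formula(n) == derangements_bruteforce(n)
    if n > 0:
        # Remark 6.2.6: D_n is n!/e rounded to the nearest integer
        assert derangements_formula(n) == round(factorial(n) / e)
```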

Example 3. Like many other things in these lectures, the following elemen-
tary number-theoretical result is due to Euler:

Theorem 6.2.7. Let $c$ be a positive integer with prime factorization $c = p_1^{a_1} p_2^{a_2} \cdots p_n^{a_n}$, where $p_1, p_2, \ldots, p_n$ are distinct primes, and where $a_1, a_2, \ldots, a_n$ are positive integers. Then,

$$(\#\text{ of all } u \in [c] \text{ that are coprime to } c) = c \cdot \prod_{i=1}^{n} \left(1 - \frac{1}{p_i}\right) = \prod_{i=1}^{n} \left(p_i^{a_i} - p_i^{a_i - 1}\right).$$

Note that the # of all u ∈ [c] that are coprime to c is usually denoted by ϕ (c)
in number theory, and the map ϕ : {1, 2, 3, . . .} → N that sends each c to ϕ (c)
is called Euler’s totient function.
Theorem 6.2.7 can be proved in many ways (and a proof can be found in
almost any text on elementary number theory). Probably the most transparent
proof relies on the PIE:
Proof of Theorem 6.2.7 (sketched). (See [19fco, proof of Theorem 2.9.19] for the de-
tails of this argument.) Let U = [c]. A number u ∈ U is coprime to c if and
only if it is not divisible by any of the prime factors p1 , p2 , . . . , pn of c. Again,
this means that u breaks all n rules 1, 2, . . . , n, where rule i says “thou shalt be
divisible by pi ”. Thus, by the “rule-breaking” interpretation of Theorem 6.2.1,

we obtain

$$(\#\text{ of all } u \in [c] \text{ that are coprime to } c) = \sum_{I \subseteq [n]} (-1)^{|I|} \underbrace{(\#\text{ of all } u \in [c] \text{ that are divisible by all } p_i \text{ with } i \in I)}_{= c / \prod_{i \in I} p_i \text{ (this is not hard to prove)}}$$
$$= \sum_{I \subseteq [n]} (-1)^{|I|} \frac{c}{\prod_{i \in I} p_i} = c \cdot \underbrace{\sum_{I \subseteq [n]} (-1)^{|I|} \prod_{i \in I} \frac{1}{p_i}}_{= \prod_{i=1}^{n} \left(1 - \frac{1}{p_i}\right) \text{ (an easy consequence of (164))}}$$
$$= \underbrace{c}_{= p_1^{a_1} p_2^{a_2} \cdots p_n^{a_n} = \prod_{i=1}^{n} p_i^{a_i}} \cdot \; \prod_{i=1}^{n} \left(1 - \frac{1}{p_i}\right) = \prod_{i=1}^{n} \underbrace{p_i^{a_i} \left(1 - \frac{1}{p_i}\right)}_{= p_i^{a_i} - p_i^{a_i - 1}} = \prod_{i=1}^{n} \left(p_i^{a_i} - p_i^{a_i - 1}\right).$$

This proves Theorem 6.2.7.
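Theorem 6.2.7 is also easy to test numerically. The following Python sketch (names are mine; the trial-division factorization is just the simplest thing that works) compares the product formula with the definition of ϕ as a count of coprime residues:

```python
from math import gcd, prod

def prime_factorization(c):
    # Return {p: a} for the prime factorization of c, by trial division
    factors = {}
    d = 2
    while d * d <= c:
        while c % d == 0:
            factors[d] = factors.get(d, 0) + 1
            c //= d
        d += 1
    if c > 1:
        factors[c] = factors.get(c, 0) + 1
    return factors

def phi_formula(c):
    # Theorem 6.2.7: the product of p^a - p^(a-1) over the prime powers of c
    return prod(p**a - p**(a - 1) for p, a in prime_factorization(c).items())

def phi_direct(c):
    # The definition: # of u in [c] coprime to c
    return sum(1 for u in range(1, c + 1) if gcd(u, c) == 1)

for c in range(1, 200):
    assert phi_formula(c) == phi_direct(c)
```

(For $c = 1$ the factorization is empty, and the empty product is $1$, which agrees with $\varphi(1) = 1$.)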

Example 4. (This one is taken from [Sagan19, Theorem 2.3.3].) Recall Theo-
rem 4.1.14, which states that each n ∈ N satisfies

podd (n) = pdist (n) ,

where

podd (n) := (# of partitions of n into odd parts) and


pdist (n) := (# of partitions of n into distinct parts) .

We have already proved this twice, but let us prove it again.


Third proof of Theorem 4.1.14 (sketched). Let n ∈ N. We set U := {partitions of n}.
In this proof, the word “partition” will always mean “partition of n”. Thus,
a partition cannot contain any of the entries n + 1, n + 2, n + 3, . . . (because any
of these entries would cause the partition to have size > n).

We want to frame the partitions of $n$ into distinct parts as rule-breakers. We observe that

{partitions of $n$ into distinct parts}
= {partitions that contain none of the entries $1, 2, 3, \ldots$ twice}
(here and in the following, the word “twice” means “at least twice”)
= {partitions that contain none of the entries $1, 2, \ldots, n$ twice}
(since a partition cannot contain any of the entries n + 1, n + 2, n + 3, . . .). In
other words, the partitions of n into distinct parts are precisely the partitions
that break all rules 1, 2, . . . , n, where rule i says “thou shalt contain the entry i
twice”93 .
Thus, applying the PIE (specifically, the “rule-breaking” interpretation of
Theorem 6.2.1), we obtain
$$p_{\mathrm{dist}}(n) = \sum_{I \subseteq [n]} (-1)^{|I|} \, (\#\text{ of partitions that satisfy all rules in } I)$$
$$= \sum_{I \subseteq [n]} (-1)^{|I|} \, (\#\text{ of partitions that contain each of the entries } i \in I \text{ twice}). \tag{203}$$
We can play the same game with podd (n). This time, rule i says “thou shalt
contain the entry 2i”. Thus, again applying the PIE, we obtain

$$p_{\mathrm{odd}}(n) = \sum_{I \subseteq [n]} (-1)^{|I|} \, (\#\text{ of partitions that contain the entry } 2i \text{ for each } i \in I). \tag{204}$$
Now, comparing these two equalities, we see that in order to prove that
pdist (n) = podd (n), it will suffice to show that
(# of partitions that contain each of the entries i ∈ I twice)
= (# of partitions that contain the entry 2i for each i ∈ I )
for any subset I of [n].
So let I be a subset of [n]. We are looking for a bijection
from {partitions that contain each of the entries i ∈ I twice}
to {partitions that contain the entry 2i for each i ∈ I } .
Such a bijection can be obtained as follows: For each i ∈ I (from highest to
lowest94 ), we remove two copies of i from the partition, and insert a 2i into
93 Once again, “twice” means “at least twice”.
94 Actually, a bit of thought reveals that the order in which we go through the $i \in I$ does not affect the result; this becomes particularly clear if we identify each partition with the multiset of its entries. Thus, my saying “from highest to lowest” is unnecessary.

the partition in their stead. For example, if I = {2, 4, 5} and n = 33, then our
bijection sends the partition
(5, 5, 4, 4, 3, 3, 2, 2, 2, 2, 1) to (10, 8, 4, 3, 3, 2, 2, 1) .
(Note that the 4 in the resulting partition is not one of the original two 4s, but
rather a new 4 that was inserted when we removed two copies of 2. On the
other hand, the two 2s in the resulting partition are inherited from the original
partition, because (unlike the bijection A in our Second proof of Theorem 4.1.14
above) our bijection only removes two copies of each i ∈ I.)
It is easy to see that this purported bijection really is a bijection95 . Thus, we
have found our bijection. The bijection principle therefore yields
(# of partitions that contain each of the entries i ∈ I twice)
= (# of partitions that contain the entry 2i for each i ∈ I ) .
We have proved this equality for all I ⊆ [n]. Hence, the right hand sides of
the equalities (203) and (204) are equal. Thus, their left hand sides are equal as
well. In other words, pdist (n) = podd (n). This proves Theorem 4.1.14 again.
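The equality $p_{\mathrm{odd}}(n) = p_{\mathrm{dist}}(n)$ can also be confirmed by brute force. Here is a Python sketch (names are mine) that enumerates partitions directly:

```python
def partitions(n, max_part=None):
    # Generate all partitions of n as weakly decreasing tuples
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def p_odd(n):
    # Partitions of n into odd parts
    return sum(1 for lam in partitions(n) if all(part % 2 == 1 for part in lam))

def p_dist(n):
    # Partitions of n into distinct parts
    return sum(1 for lam in partitions(n) if len(set(lam)) == len(lam))

for n in range(15):
    assert p_odd(n) == p_dist(n)
```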
Remark 6.2.8. It is worth contrasting the above four examples (in which we
applied the PIE to solve a counting problem, obtaining an alternating sum as
a result) with our arguments in Section 6.1 (in which we computed alternat-
ing sums using sign-reversing involutions). Sign-reversing involutions help
turn alternating sums into combinatorial problems, while the PIE moves us
in the opposite direction. The two techniques are thus, in some way, inverse
to each other. This will become less mysterious once we prove the PIE itself
using a sign-reversing involution. The PIE can also be used backwards, to
turn an alternating sum into a counting problem, which is how we proved
Corollary 6.2.3 above.

6.2.3. The weighted version


The main rule of algebra is to never turn down nature’s gifts. The PIE (in the
shape of Theorem 6.2.1) can be generalized with zero effort, so let us do it ([20f,
Theorem 7.8.9]):
Theorem 6.2.9 (weighted version of the PIE). Let n ∈ N, and let U be a finite
set. Let A1 , A2 , . . . , An be n subsets of U. Let A be any additive abelian group
(such as R, or any vector space, or any ring). Let w : U → A be any map
(i.e., let w (u) be an element of A for each u ∈ U). Then,

$$\sum_{\substack{u \in U;\\ u \notin A_i \text{ for all } i \in [n]}} w(u) = \sum_{I \subseteq [n]} (-1)^{|I|} \sum_{\substack{u \in U;\\ u \in A_i \text{ for all } i \in I}} w(u). \tag{205}$$

95 Itsinverse, of course, does what you would expect: For each i ∈ I, we remove a 2i from the
partition, and insert two copies of i in its stead.

We can think of each value $w(u)$ in Theorem 6.2.9 as a kind of “weight” of the respective element $u$. Thus, the left hand side of the equality (205) is the total weight of all “rule-breaking” $u \in U$ (that is, of all $u \in U$ that satisfy ($u \notin A_i$ for all $i \in [n]$)), whereas the inner sum $\sum_{u \in U;\ u \in A_i \text{ for all } i \in I} w(u)$ on the right hand side is the total weight of all $u \in U$ that satisfy ($u \in A_i$ for all $i \in I$). This is why we call Theorem 6.2.9 the weighted version of the PIE (or just the weighted PIE).
Theorem 6.2.1 can be obtained from Theorem 6.2.9 by taking w to be con-
stantly 1 (that is, by setting w (u) = 1 for each u ∈ U). Indeed, a sum of a
bunch of 1s equals the # of 1s being summed, so that sums generalize cardinal-
ities.
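To see the weighted PIE in action, here is a small Python sketch (names are mine) checking (205) on a toy example: U = [12], with $A_1$ the multiples of 2, $A_2$ the multiples of 3, and weight w(u) = u.

```python
from itertools import combinations

def weighted_pie_rhs(U, subsets, w):
    # Right hand side of (205): the outer sum runs over all index sets I
    # into `subsets`; the inner sum runs over the u lying in every A_i, i in I
    n = len(subsets)
    total = 0
    for r in range(n + 1):
        for I in combinations(range(n), r):
            total += (-1)**r * sum(w(u) for u in U
                                   if all(u in subsets[i] for i in I))
    return total

U = list(range(1, 13))
subsets = [{u for u in U if u % 2 == 0},   # A_1: multiples of 2
           {u for u in U if u % 3 == 0}]   # A_2: multiples of 3
# Left hand side of (205): total weight of the rule-breakers 1, 5, 7, 11
lhs = sum(u for u in U if all(u not in A for A in subsets))
assert weighted_pie_rhs(U, subsets, lambda u: u) == lhs == 1 + 5 + 7 + 11
```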
With Theorem 6.2.9, we can take any of our above four examples from Sub-
section 6.2.2, and introduce weights into them – i.e., instead of asking for a
number, we sum certain “weights”. Little question (cf. the homework): What
do you get?
But we have no time for this now. Another generalization is calling!

6.2.4. Boolean Möbius inversion


This generalization (we will soon see why it is a generalization) is our third
“Principle of Inclusion and Exclusion”, but is probably best referred to as the
Boolean Möbius inversion formula (or the Möbius inversion formula for the Boolean
lattice):
Theorem 6.2.10 (Boolean Möbius inversion). Let S be a finite set. Let A be
any additive abelian group.
For each subset I of S, let a I and b I be two elements of A.
Assume that
$$b_I = \sum_{J \subseteq I} a_J \qquad \text{for all } I \subseteq S. \tag{206}$$

Then, we also have

$$a_I = \sum_{J \subseteq I} (-1)^{|I \setminus J|} b_J \qquad \text{for all } I \subseteq S. \tag{207}$$

Example 6.2.11. Let S = [2] = {1, 2}. Then, the assumptions of Theorem
6.2.10 state that

$b_\varnothing = a_\varnothing$;
$b_{\{1\}} = a_\varnothing + a_{\{1\}}$;
$b_{\{2\}} = a_\varnothing + a_{\{2\}}$;
$b_{\{1,2\}} = a_\varnothing + a_{\{1\}} + a_{\{2\}} + a_{\{1,2\}}$.

The claim of Theorem 6.2.10 then states that

$a_\varnothing = b_\varnothing$;
$a_{\{1\}} = -b_\varnothing + b_{\{1\}}$;
$a_{\{2\}} = -b_\varnothing + b_{\{2\}}$;
$a_{\{1,2\}} = b_\varnothing - b_{\{1\}} - b_{\{2\}} + b_{\{1,2\}}$.

These four equalities can be verified easily. For instance, let us check the last
of them:

$$b_\varnothing - b_{\{1\}} - b_{\{2\}} + b_{\{1,2\}}
= a_\varnothing - \left(a_\varnothing + a_{\{1\}}\right) - \left(a_\varnothing + a_{\{2\}}\right) + \left(a_\varnothing + a_{\{1\}} + a_{\{2\}} + a_{\{1,2\}}\right)
= a_{\{1,2\}}.$$
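Theorem 6.2.10 can be tested mechanically: pick arbitrary values $a_I$, compute the $b_I$ by (206), and recover the $a_I$ by (207). A Python sketch (names are mine):

```python
from itertools import chain, combinations
import random

def subsets_of(S):
    # All subsets of S, as frozensets
    S = tuple(S)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

S = frozenset({1, 2, 3})
rng = random.Random(0)
a = {I: rng.randint(-10, 10) for I in subsets_of(S)}

# (206): b_I is the sum of the a_J over all subsets J of I
b = {I: sum(a[J] for J in subsets_of(I)) for I in subsets_of(S)}

# (207): each a_I is recovered as an alternating sum of the b_J
for I in subsets_of(S):
    assert a[I] == sum((-1)**len(I - J) * b[J] for J in subsets_of(I))
```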

Before we prove Theorem 6.2.10, let us show that the weighted PIE (Theorem
6.2.9) is a particular case of it:
Proof of Theorem 6.2.9 using Theorem 6.2.10. Let S = [n]. We note that the map
{subsets of S} → {subsets of S} ,
J 7→ S \ J
is a bijection. (Indeed, this map is an involution, since each subset J of S satisfies
S \ (S \ J ) = J.)
For each u ∈ U, define a subset Viol u of S by
Viol u := {i ∈ S | u ∈
/ Ai } .
(In terms of the “rule-breaking” interpretation, Viol u is the set of all rules that
u violates.) Now, for each subset I of [n], we set
$$a_I := \sum_{\substack{u \in U;\\ \operatorname{Viol} u = I}} w(u) \qquad \text{and} \qquad b_I := \sum_{\substack{u \in U;\\ \operatorname{Viol} u \subseteq I}} w(u).$$

Then, for each subset $I$ of $S$, we have

$$b_I = \sum_{\substack{u \in U;\\ \operatorname{Viol} u \subseteq I}} w(u) = \sum_{J \subseteq I} \underbrace{\sum_{\substack{u \in U;\\ \operatorname{Viol} u = J}} w(u)}_{= a_J \text{ (by the definition of } a_J)} = \sum_{J \subseteq I} a_J$$

(here, we have split the sum according to the value of $\operatorname{Viol} u$).

Thus, we can apply Theorem 6.2.10. This gives us

$$a_I = \sum_{J \subseteq I} (-1)^{|I \setminus J|} b_J \qquad \text{for all } I \subseteq S. \tag{208}$$

However, if $J$ is any subset of $S$, then the definition of $b_J$ yields

$$b_J = \sum_{\substack{u \in U;\\ \operatorname{Viol} u \subseteq J}} w(u) = \sum_{\substack{u \in U;\\ u \in A_i \text{ for all } i \in S \setminus J}} w(u) \tag{209}$$

(because the elements u ∈ U that satisfy Viol u ⊆ J are precisely the elements
u ∈ U that satisfy (u ∈ Ai for all i ∈ S \ J ) 96 ).
Now, applying (208) to $I = S$, we obtain

$$a_S = \sum_{J \subseteq S} (-1)^{|S \setminus J|} b_J = \sum_{J \subseteq S} (-1)^{|S \setminus J|} \sum_{\substack{u \in U;\\ u \in A_i \text{ for all } i \in S \setminus J}} w(u) \qquad (\text{by (209)})$$
$$= \sum_{I \subseteq S} (-1)^{|I|} \sum_{\substack{u \in U;\\ u \in A_i \text{ for all } i \in I}} w(u)$$

(here, we have substituted $I$ for $S \setminus J$ in the sum, since the map $\{\text{subsets of } S\} \to \{\text{subsets of } S\},\ J \mapsto S \setminus J$ is a bijection)

$$= \sum_{I \subseteq [n]} (-1)^{|I|} \sum_{\substack{u \in U;\\ u \in A_i \text{ for all } i \in I}} w(u) \qquad (\text{since } S = [n]).$$

On the other hand, the definition of $a_S$ yields

$$a_S = \sum_{\substack{u \in U;\\ \operatorname{Viol} u = S}} w(u) = \sum_{\substack{u \in U;\\ u \notin A_i \text{ for all } i \in [n]}} w(u)$$

96 Proof. Let $u \in U$. We must prove that the condition “$\operatorname{Viol} u \subseteq J$” is equivalent to “$u \in A_i$ for all $i \in S \setminus J$”.
We have $\operatorname{Viol} u = \{i \in S \mid u \notin A_i\}$ (by the definition of $\operatorname{Viol} u$). Hence, we have the following chain of equivalences:
$$(\operatorname{Viol} u \subseteq J) \iff (\{i \in S \mid u \notin A_i\} \subseteq J)$$
$$\iff (\text{each } i \in S \text{ satisfying } u \notin A_i \text{ belongs to } J)$$
$$\iff (\text{each } i \in S \text{ that does not belong to } J \text{ must satisfy } u \in A_i) \qquad (\text{by contraposition})$$
$$\iff (\text{each } i \in S \setminus J \text{ must satisfy } u \in A_i)$$
$$\iff (u \in A_i \text{ for all } i \in S \setminus J).$$
Hence, the condition “$\operatorname{Viol} u \subseteq J$” is equivalent to “$u \in A_i$ for all $i \in S \setminus J$”.



(since the elements $u \in U$ that satisfy $\operatorname{Viol} u = S$ are precisely the elements $u \in U$ that satisfy ($u \notin A_i$ for all $i \in [n]$) 97 ).
Comparing these two equalities, we obtain

$$\sum_{\substack{u \in U;\\ u \notin A_i \text{ for all } i \in [n]}} w(u) = \sum_{I \subseteq [n]} (-1)^{|I|} \sum_{\substack{u \in U;\\ u \in A_i \text{ for all } i \in I}} w(u).$$

Thus, Theorem 6.2.9 has been proved, assuming Theorem 6.2.10.


It remains to prove the latter:
Proof of Theorem 6.2.10. Fix a subset Q of S. We shall prove that

$$a_Q = \sum_{I \subseteq Q} (-1)^{|Q \setminus I|} b_I.$$

We begin by rewriting the right hand side:

$$\sum_{I \subseteq Q} (-1)^{|Q \setminus I|} b_I = \sum_{I \subseteq Q} (-1)^{|Q \setminus I|} \sum_{J \subseteq I} a_J \qquad (\text{by (206)})$$
$$= \sum_{I \subseteq Q} (-1)^{|Q \setminus I|} \sum_{P \subseteq I} a_P = \sum_{I \subseteq Q} \sum_{P \subseteq I} (-1)^{|Q \setminus I|} a_P \tag{210}$$

(in the second step, we renamed the summation index $J$ as $P$).

The two summation signs “$\sum_{I \subseteq Q} \sum_{P \subseteq I}$” on the right hand side of this equality result in a sum over all pairs $(I, P)$ of subsets of $Q$ satisfying $P \subseteq I \subseteq Q$. The same result can be obtained by the two summation signs “$\sum_{P \subseteq Q} \sum_{I \subseteq Q;\, P \subseteq I}$” (indeed, the only difference between “$\sum_{I \subseteq Q} \sum_{P \subseteq I}$” and “$\sum_{P \subseteq Q} \sum_{I \subseteq Q;\, P \subseteq I}$” is the order in which the two subsets $I$ and $P$ are chosen). Thus, we can replace “$\sum_{I \subseteq Q} \sum_{P \subseteq I}$” by

97 Proof. Let $u \in U$. We must prove that the condition “$\operatorname{Viol} u = S$” is equivalent to “$u \notin A_i$ for all $i \in [n]$”.
We have $\operatorname{Viol} u = \{i \in S \mid u \notin A_i\}$ (by the definition of $\operatorname{Viol} u$). Hence, we have the following chain of equivalences:
$$(\operatorname{Viol} u = S) \iff (\{i \in S \mid u \notin A_i\} = S) \iff (\text{each } i \in S \text{ satisfies } u \notin A_i)$$
$$\iff (u \notin A_i \text{ for all } i \in S) \iff (u \notin A_i \text{ for all } i \in [n]) \qquad (\text{since } S = [n]).$$
Hence, the condition “$\operatorname{Viol} u = S$” is equivalent to “$u \notin A_i$ for all $i \in [n]$”.

“$\sum_{P \subseteq Q} \sum_{I \subseteq Q;\, P \subseteq I}$” on the right hand side of (210). Hence, (210) rewrites as follows:

$$\sum_{I \subseteq Q} (-1)^{|Q \setminus I|} b_I = \sum_{P \subseteq Q} \sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|Q \setminus I|} a_P = \sum_{P \subseteq Q} \left( \sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|Q \setminus I|} \right) a_P. \tag{211}$$

We want to prove that this equals $a_Q$. Since the $a_P$'s are arbitrary elements of an abelian group, the only way this can possibly be achieved is by showing that the sum on the right hand side simplifies to $a_Q$ formally – i.e., that the coefficient $\sum_{I \subseteq Q;\, P \subseteq I} (-1)^{|Q \setminus I|}$ in front of $a_P$ is $0$ whenever $P \neq Q$, and is $1$ whenever $P = Q$. Thus, we now set out to prove this. Using Definition A.1.5, we can restate this goal as follows: We want to prove that every subset $P$ of $Q$ satisfies

$$\sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|Q \setminus I|} = [P = Q]. \tag{212}$$

We shall prove this soon (in Lemma 6.2.12 (b) below). For now, let us explain
how the proof of Theorem 6.2.10 can be completed if (212) is known to be true.
Indeed, (211) becomes

$$\sum_{I \subseteq Q} (-1)^{|Q \setminus I|} b_I = \sum_{P \subseteq Q} \underbrace{\left( \sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|Q \setminus I|} \right)}_{= [P = Q] \text{ (by (212))}} a_P = \sum_{P \subseteq Q} [P = Q] \, a_P$$
$$= \underbrace{[Q = Q]}_{\substack{= 1\\ (\text{since } Q = Q)}} a_Q + \sum_{\substack{P \subseteq Q;\\ P \neq Q}} \underbrace{[P = Q]}_{\substack{= 0\\ (\text{since } P \neq Q)}} a_P$$

(here, we have split off the addend for $P = Q$ from the sum (since $Q \subseteq Q$))

$$= a_Q + \underbrace{\sum_{\substack{P \subseteq Q;\\ P \neq Q}} 0 a_P}_{= 0} = a_Q.$$

In other words, $a_Q = \sum_{I \subseteq Q} (-1)^{|Q \setminus I|} b_I$.

Forget that we fixed $Q$. We thus have shown that

$$a_Q = \sum_{I \subseteq Q} (-1)^{|Q \setminus I|} b_I \qquad \text{for all } Q \subseteq S.$$

Renaming the indices $Q$ and $I$ as $I$ and $J$ in this statement, we obtain the following:

$$a_I = \sum_{J \subseteq I} (-1)^{|I \setminus J|} b_J \qquad \text{for all } I \subseteq S.$$

Thus, Theorem 6.2.10 is proved (assuming that (212) is known to be true).


It now remains to prove (212). We shall do this as part of the following
lemma:

Lemma 6.2.12. Let $Q$ be a finite set. Let $P$ be a subset of $Q$. Then:

(a) We have
$$\sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|I|} = (-1)^{|P|} [P = Q]. \tag{213}$$

(b) We have
$$\sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|Q \setminus I|} = [P = Q]. \tag{214}$$

As promised, Lemma 6.2.12 (b) (once proved) will yield (212) and thus will
complete our above proof of Theorem 6.2.10 (and, with it, the proofs of Theorem
6.2.9 and Theorem 6.2.1).
Proof of Lemma 6.2.12. (a) There are many ways to prove this (in particular, a
simple one using the binomial theorem – do you see it?); but staying true to the
spirit of this chapter, we pick one using a sign-reversing involution. (A variant
of this proof can be found in [19fco, solution to Exercise 2.9.1].98 )
We must prove the equality (213). If P = Q, then this equality is easily seen
to hold99 . Hence, for the rest of this proof, we WLOG assume that P ̸= Q.
Thus, [ P = Q] = 0.
Now, $P$ is a proper subset of $Q$ (since $P$ is a subset of $Q$ and satisfies $P \neq Q$). Hence, there exists some $q \in Q$ such that $q \notin P$. Fix such a $q$.
98 Our sets Q and P are called S and T in [19fco, solution to Exercise 2.9.1].
99 Proof. Assume that $P = Q$. Then, the only subset $I$ of $Q$ that satisfies $P \subseteq I$ is the set $Q$ itself (since any such subset $I$ has to satisfy both $Q = P \subseteq I$ and $I \subseteq Q$, which in combination entail $I = Q$). Thus, the sum $\sum_{I \subseteq Q;\, P \subseteq I} (-1)^{|I|}$ has only one addend, namely the addend for $I = Q$. Consequently, this sum simplifies as follows:

$$\sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|I|} = (-1)^{|Q|}. \tag{215}$$

Let
$$\mathcal{A} := \{ I \subseteq Q \mid P \subseteq I \},$$
and let
$$\operatorname{sign} I := (-1)^{|I|} \qquad \text{for each } I \in \mathcal{A}.$$
Then,
$$\sum_{I \in \mathcal{A}} \operatorname{sign} I = \sum_{\substack{I \subseteq Q;\\ P \subseteq I}} (-1)^{|I|}. \tag{216}$$

We shall now construct an involution $f : \mathcal{A} \to \mathcal{A}$ on the set $\mathcal{A}$; this will allow us to apply Lemma 6.1.3 (to $X = \mathcal{A}$), and easily conclude that $\sum_{I \in \mathcal{A}} \operatorname{sign} I = 0$ (see below for the details).
Indeed, if I is a subset of Q, then I ∪ {q} is also a subset of Q (because q ∈ Q).
Thus, if I ∈ A, then I ∪ {q} ∈ A (because P ⊆ I implies P ⊆ I ⊆ I ∪ {q}).
On the other hand, if I is a set satisfying P ⊆ I, then the set I \ {q} also
satisfies P ⊆ I \ {q} (since q ∈/ P). Thus, if I ∈ A, then I \ {q} ∈ A (since I ⊆ Q
implies I \ {q} ⊆ I ⊆ Q).
Now, we define a map $f : \mathcal{A} \to \mathcal{A}$ by setting100

$$f(I) := I \mathbin{\triangle} \{q\} = \begin{cases} I \setminus \{q\}, & \text{if } q \in I; \\ I \cup \{q\}, & \text{if } q \notin I \end{cases} \qquad \text{for each } I \in \mathcal{A}.$$

This map f is well-defined, because (as we have just shown in the two para-
graphs above) every I ∈ A satisfies I ∪ {q} ∈ A and I \ {q} ∈ A. Moreover,
this map $f$ is an involution101. This involution $f$ has no fixed points (because if $I \in \mathcal{A}$, then $f(I) = I \mathbin{\triangle} \{q\} \neq I$). Furthermore, if $I \in \mathcal{A}$, then the set $f(I) = I \mathbin{\triangle} \{q\}$ differs from $I$ in exactly one element (namely, $q$), and thus satisfies $|f(I)| = |I| \pm 1$, so that

$$(-1)^{|f(I)|} = -(-1)^{|I|},$$

or, equivalently,
$$\operatorname{sign}(f(I)) = -\operatorname{sign} I$$
On the other hand, from $P = Q$, we obtain
$$(-1)^{|P|} [P = Q] = (-1)^{|Q|} \underbrace{[Q = Q]}_{\substack{= 1\\ (\text{since } Q = Q)}} = (-1)^{|Q|}.$$
Comparing this with (215), we obtain $\sum_{I \subseteq Q;\, P \subseteq I} (-1)^{|I|} = (-1)^{|P|} [P = Q]$. Thus, we have shown that (213) holds under the assumption that $P = Q$.
that (213) holds under the assumption that P = Q.
100 Here, the notation X △ Y means the symmetric difference ( X ∪ Y ) \ ( X ∩ Y ) of two sets X
and Y (as in Subsection 3.2.1).
101 Indeed, the map f merely removes q from a set I if q is contained in I, and inserts it into I

otherwise; but this is clearly an operation that undoes itself when performed a second time.

(since the definition of sign I yields sign I = (−1)| I | , and similarly sign ( f ( I )) =
(−1)| f ( I )| ). Thus, Lemma 6.1.3 (applied to X = A) shows that

∑ sign I = ∑ sign I = (empty sum) (since A \ A = ∅)


I ∈A I ∈A\A
= 0.

Comparing this with (216), we find

∑_{I ⊆ Q; P ⊆ I} (−1)^{|I|} = 0 = (−1)^{|P|} [P = Q]      (since (−1)^{|P|} [P = Q] = 0).

Thus, (213) is proved. This proves Lemma 6.2.12 (a).
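The identity (213) is easy to check by brute force on a small ground set. The following Python snippet is our illustration (not part of the original notes); it verifies ∑_{I ⊆ Q; P ⊆ I} (−1)^{|I|} = (−1)^{|P|} [P = Q] for a 4-element set Q and every subset P:

```python
from itertools import combinations

def subsets(S):
    # all subsets of S, as frozensets
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def alternating_sum(P, Q):
    # left hand side of (213): sum of (-1)^{|I|} over all I with P <= I <= Q
    return sum((-1) ** len(I) for I in subsets(Q) if P <= I)

Q = frozenset({1, 2, 3, 4})
# (213) predicts (-1)^{|P|} when P = Q, and 0 for every proper subset P of Q
all_match = all(
    alternating_sum(P, Q) == ((-1) ** len(P) if P == Q else 0)
    for P in subsets(Q)
)
```

The cancellation driving this is exactly the involution I ↦ I △ {q} used above.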


(b) If I is any subset of Q, then |Q \ I| = |Q| − |I| ≡ |Q| + |I| mod 2 and thus

(−1)^{|Q \ I|} = (−1)^{|Q|+|I|} = (−1)^{|Q|} (−1)^{|I|}.

Hence,

∑_{I ⊆ Q; P ⊆ I} (−1)^{|Q \ I|} = ∑_{I ⊆ Q; P ⊆ I} (−1)^{|Q|} (−1)^{|I|} = (−1)^{|Q|} ∑_{I ⊆ Q; P ⊆ I} (−1)^{|I|}
= (−1)^{|Q|} (−1)^{|P|} [P = Q]      (by Lemma 6.2.12 (a)).

However, it is easy to see that

(−1)^{|Q|} (−1)^{|P|} [P = Q] = [P = Q]      (217)

^102. Thus,

∑_{I ⊆ Q; P ⊆ I} (−1)^{|Q \ I|} = (−1)^{|Q|} (−1)^{|P|} [P = Q] = [P = Q].

This proves Lemma 6.2.12 (b).

^102 Proof of (217): If P ≠ Q, then the equality (217) boils down to (−1)^{|Q|} (−1)^{|P|} · 0 = 0
(since P ≠ Q entails [P = Q] = 0), which is obviously true. Hence, (217) is proved
if P ≠ Q. Thus, for the rest of this proof, we WLOG assume that P = Q. Hence,
(−1)^{|Q|} (−1)^{|P|} = (−1)^{|Q|} (−1)^{|Q|} = (−1)^{|Q|+|Q|} = 1 (since |Q| + |Q| = 2 |Q| is even).
Therefore, (−1)^{|Q|} (−1)^{|P|} [P = Q] = [P = Q]. This proves (217).

As said above, this completes the proofs of Theorem 6.2.10, of Theorem 6.2.9
and of Theorem 6.2.1.
While Theorem 6.2.10 has played the part of the ultimate generalization to
us, it can be generalized further. Indeed, it is merely a particular case of
Möbius inversion for arbitrary posets (see, e.g., [Stanle11, Proposition 3.7.1] or
[Martin21, Theorem 2.3.1] or [Sagan19, Theorem 5.5.5] or [Sam21, Theorem
6.10] or [Wagner20, Theorem 14.6.4]).

6.3. More subtractive methods


TODO: Here should be a proof of the # of all-even d-tuples, and more generally
of d-tuples in which each number appears with a given parity. For now, see
[18f-hw4s, solution to Exercise 7] for this.

6.4. Determinants
Determinants were introduced by Leibniz in the 17th century, and quickly be-
came one of the most powerful tools in mathematics. They remained so until
the early 20th century. There is a 5-volume book by Thomas Muir [Muir30] that
merely summarizes the results found on determinants... until 1920.
Most of these old results are still interesting and nontrivial. The relative role
of determinants in mathematics has declined mainly because other parts of
mathematics have “caught up” and have produced easier ways to many of the
places that were previously only accessible through the study of determinants.
As with anything else, we will just present some of the most basic results
and methods related to determinants. For more, see [MuiMet60], [Zeilbe85],
[Grinbe15, Chapter 6], [Prasol94, Chapter I], [BruRys91, Chapter 9] and vari-
ous other sources. A good introduction to the most fundamental properties is
[Strick13].
Convention 6.4.1. For the rest of Section 6.4, we fix a commutative ring K. In
most examples, K will be Z or Q or a polynomial ring.

Convention 6.4.2. Let n, m ∈ N.


(a) If A is an n × m-matrix, then Ai,j shall mean the (i, j)-th entry of A, that
is, the entry of A in row i and column j.
(b) If a_{i,j} is an element of K for each i ∈ [n] and each j ∈ [m], then

(a_{i,j})_{1≤i≤n, 1≤j≤m}

shall denote the n × m-matrix whose (i, j)-th entry is a_{i,j} for all i ∈ [n] and
j ∈ [m]. Explicitly:

(a_{i,j})_{1≤i≤n, 1≤j≤m} = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,m} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,m} \end{pmatrix}.


Note that the letters "i" and "j" in the notation "(a_{i,j})_{1≤i≤n, 1≤j≤m}" are not
carved in stone. We could just as well use any other letters instead, and write
(a_{x,y})_{1≤x≤n, 1≤y≤m} or (somewhat misleadingly, but technically correctly)
(a_{j,i})_{1≤j≤n, 1≤i≤m} for the exact same matrix. (However, (a_{j,i})_{1≤i≤m, 1≤j≤n} is
a different matrix. Whichever index is mentioned first in the subscript after
the closing parenthesis is used to index rows; the other index is used to index
columns.)
(c) We let K^{n×m} denote the set of all n × m-matrices with entries in K. This
is a K-module. If n = m, this is also a K-algebra.
(d) Let A ∈ K^{n×m} be an n × m-matrix. The transpose A^T of A is defined to
be the m × n-matrix whose entries are given by

(A^T)_{i,j} = A_{j,i}      (for all i ∈ [m] and j ∈ [n]).

6.4.1. Definition
There are several ways to define the determinant of a square matrix. The fol-
lowing is the most direct one:

Definition 6.4.3. Let n ∈ N. Let A ∈ K^{n×n} be an n × n-matrix. The determinant det A of A is defined to be the element

∑_{σ ∈ S_n} (−1)^σ A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)} = ∑_{σ ∈ S_n} (−1)^σ ∏_{i=1}^{n} A_{i,σ(i)}

of K. Here, as before:

• we let S_n denote the n-th symmetric group (i.e., the group of permutations of [n] = {1, 2, . . . , n});

• we let (−1)^σ denote the sign of the permutation σ (as defined in Definition 5.4.1).
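Definition 6.4.3 translates directly into code. The following Python sketch is our illustration (not part of the notes); it implements det by literally summing over all n! permutations, which is hopelessly slow for large n but convenient for checking small examples:

```python
from itertools import permutations
from math import prod

def perm_sign(p):
    # (-1)^sigma, computed from the number of inversions of the 0-indexed tuple p
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # det A = sum over sigma in S_n of (-1)^sigma * A[0][sigma(0)] * ... * A[n-1][sigma(n-1)]
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

assert det([]) == 1                              # n = 0: empty product
assert det([[7]]) == 7                           # n = 1
assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3    # matches the 2x2 formula below
```

The helper names `perm_sign` and `det` are ours; any linear-algebra library would of course compute determinants far more efficiently via row reduction.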

Example 6.4.4. For n = 2, we have

det A = ∑_{σ ∈ S_2} (−1)^σ A_{1,σ(1)} A_{2,σ(2)}
= (−1)^{id} A_{1,id(1)} A_{2,id(2)} + (−1)^{s_1} A_{1,s_1(1)} A_{2,s_1(2)}
      (since the two elements of S_2 are the identity map id and the simple transposition s_1 = t_{1,2})
= A_{1,1} A_{2,2} − A_{1,2} A_{2,1}

(here we used (−1)^{id} = 1 and (−1)^{s_1} = −1).

Using less cumbersome notations, we can rewrite this as follows: For any a, b, a′, b′ ∈ K, we have

det \begin{pmatrix} a & b \\ a′ & b′ \end{pmatrix} = ab′ − ba′.

Similarly, for n = 3, we obtain

det \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \\ a′′ & b′′ & c′′ \end{pmatrix} = ab′c′′ − ac′b′′ − ba′c′′ + bc′a′′ + ca′b′′ − cb′a′′.

(The six addends on the right hand side here correspond to the six permutations in S_3, which in one-line notation are 123, 132, 213, 231, 312, and 321, respectively.)

Similarly, for n = 1, we obtain that the determinant of the 1 × 1-matrix (a) is

det (a) = a.

Here, the "(a)" on the left hand side is a 1 × 1-matrix.

Finally, for n = 0, we obtain that the determinant of the 0 × 0-matrix () (this is an empty matrix, with no rows and no columns) is

det () = (empty product) = 1.

Some (particularly, older) texts use the notation | A| instead of det A for the
determinant of the matrix A.
The above definition of the determinant is purely combinatorial: it is an
alternating sum over the n-th symmetric group Sn . Typically, when computing
determinants, this definition is not in itself very useful (e.g., because Sn gets
rather large when n is large). However, in some cases, it suffices. Here are a
few examples:

Example 6.4.5. Let a, b, c, d, e, . . . , p be 16 elements of K. Prove that

det \begin{pmatrix} a & b & c & d & e \\ p & 0 & 0 & 0 & f \\ o & 0 & 0 & 0 & g \\ n & 0 & 0 & 0 & h \\ m & l & k & j & i \end{pmatrix} = 0.

(The "o" is a letter "oh", not a zero. Not that it matters much...)
Proof of Example 6.4.5. Let A be the 5 × 5-matrix whose determinant we are trying to identify as 0; thus, A_{1,1} = a and A_{1,2} = b and A_{3,2} = 0 and so on. Notice that

A_{i,j} = 0 whenever i, j ∈ {2, 3, 4}      (218)

(since A has a "hollow core", i.e., a 3 × 3-square consisting entirely of zeroes in its middle). We must prove that det A = 0.

Our definition of det A yields

det A = ∑_{σ ∈ S_5} (−1)^σ ∏_{i=1}^{5} A_{i,σ(i)}.      (219)

Now, I claim that each of the addends in the sum on the right hand side is 0. In other words, I claim that ∏_{i=1}^{5} A_{i,σ(i)} = 0 for each σ ∈ S_5.

To prove this, fix σ ∈ S_5. The three numbers σ(2), σ(3), σ(4) are three distinct elements of [5] (distinct because σ is injective), so they cannot all belong to the 2-element set {1, 5} (since there are no three distinct elements in a 2-element set). Hence, at least one of them must belong to the complement {2, 3, 4} of this set. In other words, there exists some i ∈ {2, 3, 4} such that σ(i) ∈ {2, 3, 4}. This i must then satisfy A_{i,σ(i)} = 0 (by (218), applied to j = σ(i)). Thus, we have shown that there exists some i ∈ {2, 3, 4} such that A_{i,σ(i)} = 0.

This shows that at least one factor of the product ∏_{i=1}^{5} A_{i,σ(i)} is 0. Thus, the entire product is 0.

Forget that we fixed σ. We thus have proved that ∏_{i=1}^{5} A_{i,σ(i)} = 0 for each σ ∈ S_5. Hence,

det A = ∑_{σ ∈ S_5} (−1)^σ ∏_{i=1}^{5} A_{i,σ(i)} = 0,

and thus Example 6.4.5 is proved.
[We note that there are various alternative proofs, e.g., using Laplace expansion. Also, if K is a field, you can argue that det A = 0 using rank arguments. See [Grinbe15, Exercise 6.47 (a)] for a generalization of Example 6.4.5.]
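As a numerical sanity check of Example 6.4.5 (our illustration, not part of the notes), we can fill the border entries with random integers, zero out the middle 3 × 3 block, and evaluate the determinant via the permutation-sum definition:

```python
from itertools import permutations
from math import prod
import random

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

random.seed(0)
A = [[random.randint(-9, 9) for _ in range(5)] for _ in range(5)]
for i in (1, 2, 3):        # rows 2, 3, 4 in 1-based indexing
    for j in (1, 2, 3):    # columns 2, 3, 4 in 1-based indexing
        A[i][j] = 0        # carve out the "hollow core"
hollow_det = det(A)
```

Whatever the border values, the determinant comes out as 0, since every one of the 120 products in (219) hits the zero block.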

Example 6.4.6. Let n ∈ N, and let x_1, x_2, . . . , x_n ∈ K and y_1, y_2, . . . , y_n ∈ K. Compute

det ((x_i y_j)_{1≤i≤n, 1≤j≤n}) = det \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n y_1 & x_n y_2 & \cdots & x_n y_n \end{pmatrix}.

Let us experiment with small n's:

det () = 1;
det (x_1 y_1) = x_1 y_1;
det \begin{pmatrix} x_1 y_1 & x_1 y_2 \\ x_2 y_1 & x_2 y_2 \end{pmatrix} = 0;
det \begin{pmatrix} x_1 y_1 & x_1 y_2 & x_1 y_3 \\ x_2 y_1 & x_2 y_2 & x_2 y_3 \\ x_3 y_1 & x_3 y_2 & x_3 y_3 \end{pmatrix} = 0.

This makes us suspect the following:

Proposition 6.4.7. Let n ∈ N be such that n ≥ 2. Let x_1, x_2, . . . , x_n ∈ K and y_1, y_2, . . . , y_n ∈ K. Then,

det ((x_i y_j)_{1≤i≤n, 1≤j≤n}) = 0.

Proof of Proposition 6.4.7 (sketched). The definition of the determinant yields

det ((x_i y_j)_{1≤i≤n, 1≤j≤n}) = ∑_{σ ∈ S_n} (−1)^σ (x_1 y_{σ(1)}) (x_2 y_{σ(2)}) · · · (x_n y_{σ(n)})
= ∑_{σ ∈ S_n} (−1)^σ (x_1 x_2 · · · x_n) (y_{σ(1)} y_{σ(2)} · · · y_{σ(n)})
= ∑_{σ ∈ S_n} (−1)^σ (x_1 x_2 · · · x_n) (y_1 y_2 · · · y_n)      (since each σ is a bijection [n] → [n], so that y_{σ(1)} y_{σ(2)} · · · y_{σ(n)} = y_1 y_2 · · · y_n)
= (x_1 x_2 · · · x_n) (y_1 y_2 · · · y_n) ∑_{σ ∈ S_n} (−1)^σ = 0

(since ∑_{σ ∈ S_n} (−1)^σ = 0 by (183)). Thus, Proposition 6.4.7 is proved.
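Proposition 6.4.7 is easy to confirm numerically. The following Python sketch (our illustration, not part of the notes) checks det((x_i y_j)) = 0 for random integer choices of the x_i and y_j:

```python
from itertools import permutations
from math import prod
import random

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

random.seed(1)
vanishes = True
for n in (2, 3, 4):
    x = [random.randint(-5, 5) for _ in range(n)]
    y = [random.randint(-5, 5) for _ in range(n)]
    A = [[x[i] * y[j] for j in range(n)] for i in range(n)]
    vanishes = vanishes and det(A) == 0
```
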



As a consequence of Proposition 6.4.7 (applied to x_i = x and y_j = 1), we see that

det \begin{pmatrix} x & x & \cdots & x \\ x & x & \cdots & x \\ \vdots & \vdots & \ddots & \vdots \\ x & x & \cdots & x \end{pmatrix} = 0      (220)

(where the matrix is an n × n-matrix with n ≥ 2) for any x ∈ K. In other words, if all entries of a square matrix of size ≥ 2 are equal, then the determinant of this matrix is 0.

Example 6.4.8. Let n ∈ N, and let x_1, x_2, . . . , x_n ∈ K and y_1, y_2, . . . , y_n ∈ K. Compute

det ((x_i + y_j)_{1≤i≤n, 1≤j≤n}) = det \begin{pmatrix} x_1 + y_1 & x_1 + y_2 & \cdots & x_1 + y_n \\ x_2 + y_1 & x_2 + y_2 & \cdots & x_2 + y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n + y_1 & x_n + y_2 & \cdots & x_n + y_n \end{pmatrix}.

Let us experiment with small n's:

det () = 1;
det (x_1 + y_1) = x_1 + y_1;
det \begin{pmatrix} x_1 + y_1 & x_1 + y_2 \\ x_2 + y_1 & x_2 + y_2 \end{pmatrix} = −(x_1 − x_2)(y_1 − y_2);
det \begin{pmatrix} x_1 + y_1 & x_1 + y_2 & x_1 + y_3 \\ x_2 + y_1 & x_2 + y_2 & x_2 + y_3 \\ x_3 + y_1 & x_3 + y_2 & x_3 + y_3 \end{pmatrix} = 0.

So we suspect the following:

Proposition 6.4.9. Let n ∈ N be such that n ≥ 3. Let x_1, x_2, . . . , x_n ∈ K and y_1, y_2, . . . , y_n ∈ K. Then,

det ((x_i + y_j)_{1≤i≤n, 1≤j≤n}) = 0.

Proof of Proposition 6.4.9 (sketched). (See [Grinbe15, Example 6.7] for more details.) The definition of the determinant yields

det ((x_i + y_j)_{1≤i≤n, 1≤j≤n})
= ∑_{σ ∈ S_n} (−1)^σ (x_1 + y_{σ(1)}) (x_2 + y_{σ(2)}) · · · (x_n + y_{σ(n)})
= ∑_{σ ∈ S_n} (−1)^σ ∑_{I ⊆ [n]} (∏_{i ∈ I} x_i) (∏_{i ∈ [n]\I} y_{σ(i)})      (by (164), applied to a_i = x_i and b_i = y_{σ(i)})
= ∑_{I ⊆ [n]} ∑_{σ ∈ S_n} (−1)^σ (∏_{i ∈ I} x_i) (∏_{i ∈ [n]\I} y_{σ(i)})
= ∑_{I ⊆ [n]} (∏_{i ∈ I} x_i) ∑_{σ ∈ S_n} (−1)^σ ∏_{i ∈ [n]\I} y_{σ(i)}.

Now, I claim that the inner sum is 0 for each I. In other words, I claim that

∑_{σ ∈ S_n} (−1)^σ ∏_{i ∈ [n]\I} y_{σ(i)} = 0      for each I ⊆ [n].      (221)

For instance, for I = {1, 2}, this is claiming that

∑_{σ ∈ S_n} (−1)^σ y_{σ(3)} y_{σ(4)} · · · y_{σ(n)} = 0.

[Proof of (221): Fix a subset I of [n]. We shall show that all addends in the sum ∑_{σ ∈ S_n} (−1)^σ ∏_{i ∈ [n]\I} y_{σ(i)} cancel each other – i.e., that for each addend in this sum, there is a different addend with the same product of y_j's but a different sign (−1)^σ. To achieve this, we need to pair up each σ ∈ S_n with a different permutation σ′ = σ t_{u,v} ∈ S_n that satisfies ∏_{i ∈ [n]\I} y_{σ′(i)} = ∏_{i ∈ [n]\I} y_{σ(i)} but (−1)^{σ′} = −(−1)^σ. (Indeed, this pairing will then produce the required cancellations: the addend for each σ will cancel the addend for the corresponding σ′. To be more rigorous, we are here applying Lemma 6.1.3 to A = S_n, X = S_n and sign σ = (−1)^σ ∏_{i ∈ [n]\I} y_{σ(i)} (of course, this should not be confused for the notation sign σ for (−1)^σ) and f = (the map S_n → S_n that sends each σ ∈ S_n to the corresponding σ′).)

So let us construct our pairing. Indeed, from I ⊆ [n], we obtain |I| + |[n] \ I| = |[n]| = n ≥ 3; hence, at least one of the two sets I and [n] \ I has size > 1. In other words, we must be in one of the following two cases:

Case 1: We have |I| > 1.

Case 2: We have |[n] \ I| > 1.

Let us first consider Case 1. In this case, we have |I| > 1. Thus, |I| ≥ 2. Pick two distinct elements u and v of I. (These exist, since |I| ≥ 2.) Now, for each permutation σ ∈ S_n, we set σ′ := σ t_{u,v} ∈ S_n. Then, each σ ∈ S_n satisfies σ′′ = σ′ t_{u,v} = σ t_{u,v} t_{u,v} = σ (since t_{u,v}^2 = id). Hence, we can pair up each σ ∈ S_n with σ′ ∈ S_n. Any two permutations σ and σ′ that are paired with each other have different signs (indeed, if σ ∈ S_n, then σ′ = σ t_{u,v} and thus (−1)^{σ′} = (−1)^{σ t_{u,v}} = (−1)^σ (−1)^{t_{u,v}} = −(−1)^σ, since (−1)^{t_{u,v}} = −1), but the corresponding products ∏_{i ∈ [n]\I} y_{σ(i)} and ∏_{i ∈ [n]\I} y_{σ′(i)} are equal (indeed, the permutations σ and σ′ = σ t_{u,v} differ only in their values at u and v, but neither of these two values appears in any of our two products, since u, v ∈ I). This shows that our pairing has precisely the properties we want: Each σ ∈ S_n satisfies ∏_{i ∈ [n]\I} y_{σ′(i)} = ∏_{i ∈ [n]\I} y_{σ(i)} but (−1)^{σ′} = −(−1)^σ (and thus σ′ ≠ σ). As explained above, this completes the proof of (221) in Case 1.

Let us now consider Case 2. In this case, we have |[n] \ I| > 1. Thus, |[n] \ I| ≥ 2. Pick two distinct elements u and v of [n] \ I. (These exist, since |[n] \ I| ≥ 2.) We now proceed just as we did in Case 1: For each permutation σ ∈ S_n, we set σ′ := σ t_{u,v} ∈ S_n. Then, each σ ∈ S_n satisfies σ′′ = σ′ t_{u,v} = σ t_{u,v} t_{u,v} = σ. Hence, we can pair up each σ ∈ S_n with σ′ ∈ S_n. Any two permutations σ and σ′ that are paired with each other have different signs (this can be seen as in Case 1), but the corresponding products ∏_{i ∈ [n]\I} y_{σ(i)} and ∏_{i ∈ [n]\I} y_{σ′(i)} are equal (indeed, the permutation σ′ = σ t_{u,v} can be obtained from σ by swapping the values at u and v (so that σ′(u) = σ(v) and σ′(v) = σ(u)); thus, the products ∏_{i ∈ [n]\I} y_{σ(i)} and ∏_{i ∈ [n]\I} y_{σ′(i)} differ only in the order in which their factors y_{σ(u)} and y_{σ(v)} appear in them^{103}; hence, these products are equal, since K is commutative). Again, this shows that our pairing has the properties we want, and thus the proof of (221) is complete in Case 2.

Thus, (221) is proved in both cases.]

^103 Both factors y_{σ(u)} and y_{σ(v)} do indeed appear in these products, since u and v belong to [n] \ I.

Now, we can finish our computation of the original determinant:

det ((x_i + y_j)_{1≤i≤n, 1≤j≤n}) = ∑_{I ⊆ [n]} (∏_{i ∈ I} x_i) ∑_{σ ∈ S_n} (−1)^σ ∏_{i ∈ [n]\I} y_{σ(i)} = 0

(since each inner sum is 0 by (221)). This proves Proposition 6.4.9.
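Proposition 6.4.9 can also be spot-checked numerically. The following Python sketch (our illustration, not part of the notes) checks the vanishing for several n ≥ 3, and also confirms that the n = 2 determinant is the generally nonzero value −(x_1 − x_2)(y_1 − y_2) from Example 6.4.8:

```python
from itertools import permutations
from math import prod
import random

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

random.seed(2)
vanishes = True
for n in (3, 4, 5):
    x = [random.randint(-5, 5) for _ in range(n)]
    y = [random.randint(-5, 5) for _ in range(n)]
    vanishes = vanishes and det([[x[i] + y[j] for j in range(n)] for i in range(n)]) == 0

# n = 2 does NOT vanish in general: the determinant is -(x_1 - x_2)(y_1 - y_2)
x, y = [2, 5], [1, 4]
n2 = det([[x[i] + y[j] for j in range(2)] for i in range(2)])
```
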

6.4.2. Basic properties


The examples above show that pedestrian proofs of facts about determinants
(using just the definition) are possible, but become cumbersome fairly quickly.
Fortunately, determinants have a lot of properties that, once proved, provide
new methods for computing determinants.
Let us first recall a few basic facts that should (ideally) be known from a good
course on linear algebra. Recall that the transpose of a matrix A is denoted by A^T.

Theorem 6.4.10 (Transposes preserve determinants). Let n ∈ N. If A ∈ K^{n×n} is any n × n-matrix, then det (A^T) = det A.

Proof. See [Strick13, Corollary B.16] or [Grinbe15, Exercise 6.4] or [Laue15,


§5.3.2].

Theorem 6.4.11 (Determinants of triangular matrices). Let n ∈ N. Let A ∈ K^{n×n} be a triangular (i.e., lower-triangular or upper-triangular) n × n-matrix. Then, the determinant of the matrix A is the product of its diagonal entries. That is,

det A = A_{1,1} A_{2,2} · · · A_{n,n}.

Proof. See [Strick13, Proposition B.11] or [Grinbe15, Exercise 6.3 and the para-
graph after Exercise 6.4].
As a consequence of Theorem 6.4.11, we see that the determinant of a diag-
onal matrix is the product of its diagonal entries (since any diagonal matrix is
triangular).

Theorem 6.4.12 (Row operation properties). Let n ∈ N. Let A ∈ K^{n×n} be an n × n-matrix. Then:
(a) If we swap two rows of A, then det A gets multiplied by −1.
(b) If A has a zero row (i.e., a row that consists entirely of zeroes), then
det A = 0.
(c) If A has two equal rows, then det A = 0.

(d) Let λ ∈ K. If we multiply a row of A by λ (that is, we multiply all


entries of this one row by λ, while leaving all other entries of A unchanged),
then det A gets multiplied by λ.
(e) If we add a row of A to another row of A (that is, we add each entry
of the former row to the corresponding entry of the latter), then det A stays
unchanged.
(f) Let λ ∈ K. If we add λ times a row of A to another row of A (that is,
we add λ times each entry of the former row to the corresponding entry of
the latter), then det A stays unchanged.
(g) Let B, C ∈ K^{n×n} be two further n × n-matrices. Let k ∈ [n]. Assume that

(the k-th row of C ) = (the k-th row of A) + (the k-th row of B) ,

whereas each i ̸= k satisfies

(the i-th row of C ) = (the i-th row of A) = (the i-th row of B) .

Then,
det C = det A + det B.

Example 6.4.13. Let us see what Theorem 6.4.12 is saying in some particular cases (specifically, for 3 × 3-matrices):

(a) One instance of Theorem 6.4.12 (a) is

det \begin{pmatrix} a & b & c \\ a′′ & b′′ & c′′ \\ a′ & b′ & c′ \end{pmatrix} = − det \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \\ a′′ & b′′ & c′′ \end{pmatrix}.

(b) One instance of Theorem 6.4.12 (b) is

det \begin{pmatrix} a & b & c \\ 0 & 0 & 0 \\ a′′ & b′′ & c′′ \end{pmatrix} = 0.

(c) One instance of Theorem 6.4.12 (c) is

det \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \\ a & b & c \end{pmatrix} = 0.

(d) One instance of Theorem 6.4.12 (d) is

det \begin{pmatrix} a & b & c \\ λa′ & λb′ & λc′ \\ a′′ & b′′ & c′′ \end{pmatrix} = λ det \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \\ a′′ & b′′ & c′′ \end{pmatrix}.

(e) One instance of Theorem 6.4.12 (e) is

det \begin{pmatrix} a & b & c \\ a′ + a′′ & b′ + b′′ & c′ + c′′ \\ a′′ & b′′ & c′′ \end{pmatrix} = det \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \\ a′′ & b′′ & c′′ \end{pmatrix}.

(f) One instance of Theorem 6.4.12 (f) is

det \begin{pmatrix} a & b & c \\ a′ + λa′′ & b′ + λb′′ & c′ + λc′′ \\ a′′ & b′′ & c′′ \end{pmatrix} = det \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \\ a′′ & b′′ & c′′ \end{pmatrix}.

(g) One instance of Theorem 6.4.12 (g) is

det \begin{pmatrix} a & b & c \\ d + d′ & e + e′ & f + f′ \\ g & h & i \end{pmatrix} = det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} + det \begin{pmatrix} a & b & c \\ d′ & e′ & f′ \\ g & h & i \end{pmatrix}

(the first matrix here playing the role of C, the second that of A, and the third that of B). (Specifically, this is the particular case of Theorem 6.4.12 (g) for n = 3 and k = 2.)

Parts (b), (d) and (g) of Theorem 6.4.12 are commonly summarized under the
mantle of “multilinearity of the determinant” or “linearity of the determinant in the
k-th row”. In fact, they say that (for any given n ∈ N and k ∈ [n]) if we hold all
rows other than the k-th row of an n × n-matrix A fixed, then det A depends
K-linearly on the k-th row of A.
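The row operation properties can likewise be spot-checked numerically. The following Python sketch (our illustration, not part of the notes) verifies parts (a), (d) and (f) of Theorem 6.4.12 on a concrete 3 × 3 integer matrix:

```python
from itertools import permutations
from math import prod

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]   # a 3x3 matrix with nonzero determinant
d = det(A)
lam = 7
# (a): swapping two rows multiplies the determinant by -1
swapped_ok = det([A[1], A[0], A[2]]) == -d
# (d): multiplying a row by lam multiplies the determinant by lam
scaled_ok = det([A[0], [lam * t for t in A[1]], A[2]]) == lam * d
# (f): adding lam times one row to another leaves the determinant unchanged
added_ok = det([A[0], [A[1][j] + lam * A[2][j] for j in range(3)], A[2]]) == d
```
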
Proof of Theorem 6.4.12. (a) See [Grinbe15, Exercise 6.7 (a)]. This is also a particular case of [Strick13, Corollary B.19].
(b) See [Grinbe15, Exercise 6.7 (c)]. This is also near-obvious from Definition 6.4.3.
(c) See [Grinbe15, Exercise 6.7 (e)] or [Laue15, §5.3.3, property (iii)] or [19fla, 2019-10-23 blackboard notes, Theorem 1.3.3].^104
(d) See [Grinbe15, Exercise 6.7 (g)] or [Laue15, §5.3.3, property (ii)]. This is also a particular case of [Strick13, Corollary B.19].

^104 Warning: Several authors claim to give an easy proof of part (c) using part (a). This "proof" goes as follows: If A has two equal rows, then swapping these rows leaves A unchanged, but (because of Theorem 6.4.12 (a)) flips the sign of det A. Hence, in this case, we have det A = − det A, so that 2 det A = 0 and therefore det A = 0, right? Not so fast! In order to obtain det A = 0 from 2 det A = 0, we need the element 2 of K to be invertible or at least be a non-zero-divisor (since we have to divide by 2). This is true when K is one of the "high-school rings" Z, Q, R and C, but it is not true when K is the field F_2 with 2 elements (or, more generally, any field of characteristic 2). This slick argument can be salvaged, but in the form just given it is incomplete.

(f) See [Grinbe15, Exercise 6.8 (a)]. This is also a particular case of [Strick13,
Corollary B.19].
(e) This is the particular case of part (f) for λ = 1.
(g) See [Grinbe15, Exercise 6.7 (i)] or [Laue15, §5.3.3, property (i)] or [19fla,
2019-10-30 blackboard notes, Theorem 1.2.3].

Theorem 6.4.14 (Column operation properties). Theorem 6.4.12 also holds if


we replace “row” by “column” throughout it.

Proof. Theorem 6.4.10 shows that the determinant of a matrix does not change when we replace it by its transpose; however, the rows of this transpose A^T are the transposes of the columns of A. Thus, Theorem 6.4.14 follows by applying Theorem 6.4.12 to the transposes of all the matrices involved. (See [Grinbe15, Exercises 6.7 and 6.8] for the details.)

Corollary 6.4.15. Let n ∈ N. Let A ∈ K^{n×n} and τ ∈ S_n. Then,

det ((A_{τ(i),j})_{1≤i≤n, 1≤j≤n}) = (−1)^τ · det A      (222)

and

det ((A_{i,τ(j)})_{1≤i≤n, 1≤j≤n}) = (−1)^τ · det A.      (223)

In words: When we permute the rows or the columns of a matrix, its determinant gets multiplied by the sign of the permutation.
Proof of Corollary 6.4.15. Let us first prove (223).
The definition of det A yields

det A = ∑_{σ ∈ S_n} (−1)^σ A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)} = ∑_{σ ∈ S_n} (−1)^σ ∏_{i=1}^{n} A_{i,σ(i)}.      (224)

The definition of det ((A_{i,τ(j)})_{1≤i≤n, 1≤j≤n}) yields

det ((A_{i,τ(j)})_{1≤i≤n, 1≤j≤n}) = ∑_{σ ∈ S_n} (−1)^σ A_{1,τ(σ(1))} A_{2,τ(σ(2))} · · · A_{n,τ(σ(n))} = ∑_{σ ∈ S_n} (−1)^σ ∏_{i=1}^{n} A_{i,τ(σ(i))}.      (225)

However, S_n is a group. Thus, the map

S_n → S_n,      σ ↦ τ^{−1}σ

is a bijection. Hence, we can substitute τ^{−1}σ for σ in the sum ∑_{σ ∈ S_n} (−1)^σ ∏_{i=1}^{n} A_{i,τ(σ(i))}. Thus, we obtain

∑_{σ ∈ S_n} (−1)^σ ∏_{i=1}^{n} A_{i,τ(σ(i))}
= ∑_{σ ∈ S_n} (−1)^{τ^{−1}σ} ∏_{i=1}^{n} A_{i,τ((τ^{−1}σ)(i))}
= ∑_{σ ∈ S_n} (−1)^{τ^{−1}} · (−1)^σ ∏_{i=1}^{n} A_{i,σ(i)}
      (here we used (−1)^{τ^{−1}σ} = (−1)^{τ^{−1}} · (−1)^σ, which follows from Proposition 5.4.2 (d) (applied to τ^{−1} and σ instead of σ and τ), as well as τ((τ^{−1}σ)(i)) = (ττ^{−1}σ)(i) = σ(i) (because ττ^{−1}σ = σ))
= (−1)^{τ^{−1}} · ∑_{σ ∈ S_n} (−1)^σ ∏_{i=1}^{n} A_{i,σ(i)}
= (−1)^τ · det A

(since (−1)^{τ^{−1}} = (−1)^τ by Proposition 5.4.2 (f), applied to τ^{−1} instead of σ, and since ∑_{σ ∈ S_n} (−1)^σ ∏_{i=1}^{n} A_{i,σ(i)} = det A by (224)).

In view of this, we can rewrite (225) as

det ((A_{i,τ(j)})_{1≤i≤n, 1≤j≤n}) = (−1)^τ · det A.

This proves (223).


Now, we can easily derive (222) by using the transpose. Indeed, applying (223) to A^T instead of A, we obtain

det (((A^T)_{i,τ(j)})_{1≤i≤n, 1≤j≤n}) = (−1)^τ · det (A^T) = (−1)^τ · det A      (226)

(since det (A^T) = det A by Theorem 6.4.10). However, each (i, j) ∈ [n]^2 satisfies

(A^T)_{i,τ(j)} = A_{τ(j),i}      (by the definition of A^T).

Thus,

((A^T)_{i,τ(j)})_{1≤i≤n, 1≤j≤n} = (A_{τ(j),i})_{1≤i≤n, 1≤j≤n} = ((A_{τ(i),j})_{1≤i≤n, 1≤j≤n})^T

(again by the definition of the transpose). Therefore,

det (((A^T)_{i,τ(j)})_{1≤i≤n, 1≤j≤n}) = det (((A_{τ(i),j})_{1≤i≤n, 1≤j≤n})^T) = det ((A_{τ(i),j})_{1≤i≤n, 1≤j≤n})

(by Theorem 6.4.10, applied to (A_{τ(i),j})_{1≤i≤n, 1≤j≤n} instead of A). Hence,

det ((A_{τ(i),j})_{1≤i≤n, 1≤j≤n}) = det (((A^T)_{i,τ(j)})_{1≤i≤n, 1≤j≤n}) = (−1)^τ · det A

(by (226)). This proves (222). Thus, the proof of Corollary 6.4.15 is complete.
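Corollary 6.4.15 can also be spot-checked exhaustively for small n. The following Python sketch (our illustration, not part of the notes) verifies both (222) and (223) for every permutation τ ∈ S_4 on a random 4 × 4 integer matrix:

```python
from itertools import permutations
from math import prod
import random

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

random.seed(3)
n = 4
A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
d = det(A)
# (222): permuting the rows of A by tau multiplies the determinant by (-1)^tau
rows_ok = all(det([A[tau[i]] for i in range(n)]) == perm_sign(tau) * d
              for tau in permutations(range(n)))
# (223): likewise for columns
cols_ok = all(det([[A[i][tau[j]] for j in range(n)] for i in range(n)]) == perm_sign(tau) * d
              for tau in permutations(range(n)))
```
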
The following is probably the most remarkable property of determinants:

Theorem 6.4.16 (Multiplicativity of the determinant). Let n ∈ N. Let A, B ∈ K^{n×n} be two n × n-matrices. Then,

det (AB) = det A · det B.

Proof. See [Strick13, Theorem B.17] or [Grinbe15, Theorem 6.23] or [Zeilbe85,


§5] or [Ford21, Lemma 4.5.5 (1)] or [Laue15, Theorem 5.7]. (Many of these
proofs are at least partly combinatorial, but the one in [Zeilbe85, §5] is fully so,
constructing quite explicitly a sign-reversing involution.)
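Here is a quick numerical spot-check of the multiplicativity (our illustration, not part of the notes), using the permutation-sum definition of the determinant:

```python
from itertools import permutations
from math import prod
import random

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(A, B):
    # product of an n x m and an m x p matrix
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

random.seed(4)
n = 3
A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
B = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
multiplicative = det(matmul(A, B)) == det(A) * det(B)
```
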
As an application of these properties, let us reprove Proposition 6.4.7 and
Proposition 6.4.9:
Second proof of Proposition 6.4.7. Define two n × n-matrices

A := \begin{pmatrix} x_1 & 0 & 0 & \cdots & 0 \\ x_2 & 0 & 0 & \cdots & 0 \\ x_3 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & 0 & 0 & \cdots & 0 \end{pmatrix}      and      B := \begin{pmatrix} y_1 & y_2 & y_3 & \cdots & y_n \\ 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}.

(Only the first column of A and the first row of B have any nonzero entries.)

Now, a straightforward computation shows that

(x_i y_j)_{1≤i≤n, 1≤j≤n} = AB,

so that

det ((x_i y_j)_{1≤i≤n, 1≤j≤n}) = det (AB) = det A · det B

(by Theorem 6.4.16). However, the matrix A has a zero column (since n ≥ 2), and thus satisfies det A = 0 (by Theorem 6.4.14 (b)^105). Hence,

det ((x_i y_j)_{1≤i≤n, 1≤j≤n}) = det A · det B = 0      (since det A = 0).

Thus, Proposition 6.4.7 is proven again.


Second proof of Proposition 6.4.9. Define two n × n-matrices

A := \begin{pmatrix} x_1 & 1 & 0 & \cdots & 0 \\ x_2 & 1 & 0 & \cdots & 0 \\ x_3 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & 1 & 0 & \cdots & 0 \end{pmatrix}      and      B := \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ y_1 & y_2 & y_3 & \cdots & y_n \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}.

(Only the first two columns of A and the first two rows of B have any nonzero entries.)

Now, a straightforward computation shows that

(x_i + y_j)_{1≤i≤n, 1≤j≤n} = AB,

so that

det ((x_i + y_j)_{1≤i≤n, 1≤j≤n}) = det (AB) = det A · det B

(by Theorem 6.4.16). However, the matrix A has a zero column (since n ≥ 3), and thus satisfies det A = 0 (by Theorem 6.4.14 (b)). Hence,

det ((x_i + y_j)_{1≤i≤n, 1≤j≤n}) = det A · det B = 0      (since det A = 0).

Thus, Proposition 6.4.9 is proven again.

^105 Of course, by "Theorem 6.4.14 (b)", we mean "the analogue of Theorem 6.4.12 (b) for columns instead of rows".


The following fact follows equally easily from everything we know so far
about determinants:

Corollary 6.4.17. Let n ∈ N. Let A ∈ K^{n×n} and d_1, d_2, . . . , d_n ∈ K. Then,

det ((d_i A_{i,j})_{1≤i≤n, 1≤j≤n}) = d_1 d_2 · · · d_n · det A      (227)

and

det ((d_j A_{i,j})_{1≤i≤n, 1≤j≤n}) = d_1 d_2 · · · d_n · det A.      (228)


First proof of Corollary 6.4.17 (sketched). The matrix (d_i A_{i,j})_{1≤i≤n, 1≤j≤n} is obtained from the matrix A by multiplying the 1-st row by d_1, multiplying the 2-nd row by d_2, multiplying the 3-rd row by d_3, and so on. Theorem 6.4.12 (d) (applied repeatedly – once for each row) shows that these multiplications result in the determinant of A getting multiplied by d_1, d_2, . . . , d_n (in succession). Hence,

det ((d_i A_{i,j})_{1≤i≤n, 1≤j≤n}) = d_1 d_2 · · · d_n · det A.

Thus, (227) is proved. The proof of (228) is analogous (using columns instead of rows). Corollary 6.4.17 is proven.

Second proof of Corollary 6.4.17 (sketched). Let D be the diagonal matrix

\begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix} ∈ K^{n×n}.

Then, D is upper-triangular; hence, Theorem 6.4.11 shows that its determinant is det D = d_1 d_2 · · · d_n.

However, it is easy to see that (d_i A_{i,j})_{1≤i≤n, 1≤j≤n} = DA. Hence,

det ((d_i A_{i,j})_{1≤i≤n, 1≤j≤n}) = det (DA) = det D · det A      (by Theorem 6.4.16)
= d_1 d_2 · · · d_n · det A      (since det D = d_1 d_2 · · · d_n).

This proves (227). A similar argument using (d_j A_{i,j})_{1≤i≤n, 1≤j≤n} = AD proves (228). Thus, Corollary 6.4.17 is proven again.
Third proof of Corollary 6.4.17. Let us now proceed completely elementarily. The definition of a determinant yields

det A = ∑_{σ ∈ S_n} (−1)^σ A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)}      (229)

and

det ((d_i A_{i,j})_{1≤i≤n, 1≤j≤n}) = ∑_{σ ∈ S_n} (−1)^σ (d_1 A_{1,σ(1)}) (d_2 A_{2,σ(2)}) · · · (d_n A_{n,σ(n)})
= ∑_{σ ∈ S_n} (−1)^σ (d_1 d_2 · · · d_n) (A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)})
= d_1 d_2 · · · d_n · ∑_{σ ∈ S_n} (−1)^σ A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)}
= d_1 d_2 · · · d_n · det A      (by (229))

and

det ((d_j A_{i,j})_{1≤i≤n, 1≤j≤n}) = ∑_{σ ∈ S_n} (−1)^σ (d_{σ(1)} A_{1,σ(1)}) (d_{σ(2)} A_{2,σ(2)}) · · · (d_{σ(n)} A_{n,σ(n)})
= ∑_{σ ∈ S_n} (−1)^σ (d_{σ(1)} d_{σ(2)} · · · d_{σ(n)}) (A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)})
= ∑_{σ ∈ S_n} (−1)^σ (d_1 d_2 · · · d_n) (A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)})      (since σ is a permutation of the set [n], so that d_{σ(1)} d_{σ(2)} · · · d_{σ(n)} = d_1 d_2 · · · d_n)
= d_1 d_2 · · · d_n · det A      (as we have seen above).

Once again, this proves Corollary 6.4.17.
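Both equalities of Corollary 6.4.17 are easy to spot-check numerically. The following Python sketch (our illustration, not part of the notes) scales the rows and then the columns of a random integer matrix:

```python
from itertools import permutations
from math import prod
import random

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

random.seed(5)
n = 4
A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
d = [random.randint(-5, 5) for _ in range(n)]
row_scaled = [[d[i] * A[i][j] for j in range(n)] for i in range(n)]   # (227)
col_scaled = [[d[j] * A[i][j] for j in range(n)] for i in range(n)]   # (228)
scaling_ok = (det(row_scaled) == prod(d) * det(A)
              and det(col_scaled) == prod(d) * det(A))
```
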

6.4.3. Cauchy–Binet
The multiplicativity of the determinant generalizes to non-square matrices A
and B, but the general statement is subtler and less famous:

Theorem 6.4.18 (Cauchy–Binet formula). Let n, m ∈ N. Let A ∈ K^{n×m} be an n × m-matrix, and let B ∈ K^{m×n} be an m × n-matrix. Then,

det (AB) = ∑_{(g_1, g_2, ..., g_n) ∈ [m]^n; g_1 < g_2 < ··· < g_n} det (cols_{g_1,g_2,...,g_n} A) · det (rows_{g_1,g_2,...,g_n} B).

Here, we are using the following notations:

• We let cols_{g_1,g_2,...,g_n} A be the n × n-matrix obtained from A by removing all columns other than the g_1-st, the g_2-nd, the g_3-rd, etc. In other words,

cols_{g_1,g_2,...,g_n} A := (A_{i,g_j})_{1≤i≤n, 1≤j≤n}.

• We let rows_{g_1,g_2,...,g_n} B be the n × n-matrix obtained from B by removing all rows other than the g_1-st, the g_2-nd, the g_3-rd, etc. In other words,

rows_{g_1,g_2,...,g_n} B := (B_{g_i,j})_{1≤i≤n, 1≤j≤n}.
Informally, we can rewrite the claim of Theorem 6.4.18 as follows:

det (AB) = ∑ det (some n columns of A) · det (the corresponding n rows of B).

The sum runs over all ways to form an n × n-matrix by picking n columns of A (in increasing order, with no repetitions). The corresponding n rows of B form an n × n-matrix as well.
 
Example 6.4.19. Let n = 2 and m = 3, and let A = \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \end{pmatrix} and B = \begin{pmatrix} x & x′ \\ y & y′ \\ z & z′ \end{pmatrix}. Then, Theorem 6.4.18 yields

det (AB) = ∑_{(g_1, g_2) ∈ [3]^2; g_1 < g_2} det (cols_{g_1,g_2} A) · det (rows_{g_1,g_2} B)
= det (cols_{1,2} A) · det (rows_{1,2} B)
+ det (cols_{1,3} A) · det (rows_{1,3} B)
+ det (cols_{2,3} A) · det (rows_{2,3} B)
      (since the only 2-tuples (g_1, g_2) ∈ [3]^2 satisfying g_1 < g_2 are (1, 2) and (1, 3) and (2, 3))
= det \begin{pmatrix} a & b \\ a′ & b′ \end{pmatrix} · det \begin{pmatrix} x & x′ \\ y & y′ \end{pmatrix} + det \begin{pmatrix} a & c \\ a′ & c′ \end{pmatrix} · det \begin{pmatrix} x & x′ \\ z & z′ \end{pmatrix} + det \begin{pmatrix} b & c \\ b′ & c′ \end{pmatrix} · det \begin{pmatrix} y & y′ \\ z & z′ \end{pmatrix}
= (ab′ − ba′)(xy′ − yx′) + (ac′ − ca′)(xz′ − zx′) + (bc′ − cb′)(yz′ − zy′).

You can check this against the equality

det (AB) = (ax + by + cz)(a′x′ + b′y′ + c′z′) − (ax′ + by′ + cz′)(a′x + b′y + c′z),

which is obtained by directly computing

AB = \begin{pmatrix} a & b & c \\ a′ & b′ & c′ \end{pmatrix} \begin{pmatrix} x & x′ \\ y & y′ \\ z & z′ \end{pmatrix} = \begin{pmatrix} ax + by + cz & ax′ + by′ + cz′ \\ a′x + b′y + c′z & a′x′ + b′y′ + c′z′ \end{pmatrix}.
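The Cauchy–Binet formula can be spot-checked numerically as well. The following Python sketch (our illustration, not part of the notes) compares both sides of Theorem 6.4.18 for random integer matrices with n = 2 and m = 4:

```python
from itertools import combinations, permutations
from math import prod
import random

def perm_sign(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det(A):
    # permutation-sum definition of the determinant (Definition 6.4.3)
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

random.seed(6)
n, m = 2, 4
A = [[random.randint(-5, 5) for _ in range(m)] for _ in range(n)]   # n x m
B = [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]   # m x n
lhs = det(matmul(A, B))
rhs = sum(det([[A[i][g] for g in gs] for i in range(n)])     # cols_{g_1,...,g_n} A
          * det([[B[g][j] for j in range(n)] for g in gs])   # rows_{g_1,...,g_n} B
          for gs in combinations(range(m), n))               # g_1 < g_2 < ... < g_n
```

Here `combinations(range(m), n)` enumerates exactly the increasing tuples (g_1, ..., g_n) in the theorem (shifted to 0-based indices).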

Remark 6.4.20. If m < n, then the claim of Theorem 6.4.18 becomes

det (AB) = ∑_{(g_1, g_2, ..., g_n) ∈ [m]^n; g_1 < g_2 < ··· < g_n} det (cols_{g_1,g_2,...,g_n} A) · det (rows_{g_1,g_2,...,g_n} B)
= (empty sum)      (since the set [m] has no n distinct elements)
= 0.

When K is a field, this can also be seen trivially from rank considerations (to wit, the matrix A has rank ≤ m < n, and thus the product AB has rank < n as well). When K is not a field, the notion of a rank is not available, so we do need Theorem 6.4.18 to obtain this (although there are ways around this).

If m = n, then the claim of Theorem 6.4.18 becomes

det (AB) = ∑_{(g_1, g_2, ..., g_n) ∈ [n]^n; g_1 < g_2 < ··· < g_n} det (cols_{g_1,g_2,...,g_n} A) · det (rows_{g_1,g_2,...,g_n} B)
= det (cols_{1,2,...,n} A) · det (rows_{1,2,...,n} B)
      (since the only n-tuple (g_1, g_2, ..., g_n) ∈ [n]^n satisfying g_1 < g_2 < ··· < g_n is (1, 2, ..., n))
= det A · det B      (since cols_{1,2,...,n} A = A and rows_{1,2,...,n} B = B),

so that we recover the multiplicativity of the determinant (Theorem 6.4.16).

Proof of Theorem 6.4.18. See [Grinbe15, Theorem 6.32] or [Schwar16] or [Knuth1, §1.2.3, Exercise 46] or [Loehr11, Theorem 9.53] or [Stanle18, Theorem 9.4] or [Zeng93, §2] (a fully combinatorial proof) or https://math.stackexchange.com/questions/3243063/ .

6.4.4. det ( A + B)
So much for det ( AB). What can we say about det ( A + B) ? The answer is
somewhat cumbersome, but still rather useful. We need some notation to state
it:

Definition 6.4.21. Let $n, m \in \mathbb{N}$. Let $A$ be an $n \times m$-matrix.
Let $U$ be a subset of $[n]$. Let $V$ be a subset of $[m]$.
Then, $\operatorname{sub}_U^V A$ is the $|U| \times |V|$-matrix defined as follows: Writing the two sets $U$ and $V$ as
\[
U = \left\{u_1, u_2, \ldots, u_p\right\} \qquad\text{and}\qquad V = \left\{v_1, v_2, \ldots, v_q\right\}
\]
with
\[
u_1 < u_2 < \cdots < u_p \qquad\text{and}\qquad v_1 < v_2 < \cdots < v_q,
\]
we set
\[
\operatorname{sub}_U^V A := \left(A_{u_i, v_j}\right)_{1 \le i \le p,\ 1 \le j \le q}.
\]
Roughly speaking, $\operatorname{sub}_U^V A$ is the matrix obtained from $A$ by focusing only on the $i$-th rows for $i \in U$ (that is, removing all the other rows) and only on the $j$-th columns for $j \in V$ (that is, removing all the other columns).
This matrix $\operatorname{sub}_U^V A$ is called the submatrix of $A$ obtained by restricting to the $U$-rows and the $V$-columns. If this matrix is square (i.e., if $|U| = |V|$), then its determinant $\det\left(\operatorname{sub}_U^V A\right)$ is called a minor of $A$.

Example 6.4.22. We have
\[
\operatorname{sub}_{\{1,2\}}^{\{1,3\}} \begin{pmatrix} a & b & c \\ a' & b' & c' \\ a'' & b'' & c'' \end{pmatrix}
= \begin{pmatrix} a & c \\ a' & c' \end{pmatrix}
\qquad\text{and}\qquad
\operatorname{sub}_{\{2\}}^{\{3\}} \begin{pmatrix} a & b & c \\ a' & b' & c' \\ a'' & b'' & c'' \end{pmatrix}
= \begin{pmatrix} c' \end{pmatrix}.
\]

Theorem 6.4.23. Let $n \in \mathbb{N}$. For any subset $I$ of $[n]$, we let $\widetilde{I}$ be the complement $[n] \setminus I$ of $I$. (For example, if $n = 4$ and $I = \{1, 4\}$, then $\widetilde{I} = \{2, 3\}$.)
For any finite set $S$ of integers, define $\operatorname{sum} S := \sum_{s \in S} s$.
Let $A$ and $B$ be two $n \times n$-matrices in $K^{n \times n}$. Then,
\[
\det(A + B) = \sum_{P \subseteq [n]} \ \sum_{\substack{Q \subseteq [n];\\ |P| = |Q|}} (-1)^{\operatorname{sum} P + \operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right) \cdot \det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} B\right).
\]

Example 6.4.24. For $n = 2$, this is saying that
\begin{align*}
\det(A+B)
&= (-1)^{\operatorname{sum}\varnothing+\operatorname{sum}\varnothing}\,\det\left(\operatorname{sub}_{\varnothing}^{\varnothing} A\right)\cdot\det\left(\operatorname{sub}_{\{1,2\}}^{\{1,2\}} B\right) \\
&\quad + (-1)^{\operatorname{sum}\{1\}+\operatorname{sum}\{1\}}\,\det\left(\operatorname{sub}_{\{1\}}^{\{1\}} A\right)\cdot\det\left(\operatorname{sub}_{\{2\}}^{\{2\}} B\right) \\
&\quad + (-1)^{\operatorname{sum}\{1\}+\operatorname{sum}\{2\}}\,\det\left(\operatorname{sub}_{\{1\}}^{\{2\}} A\right)\cdot\det\left(\operatorname{sub}_{\{2\}}^{\{1\}} B\right) \\
&\quad + (-1)^{\operatorname{sum}\{2\}+\operatorname{sum}\{1\}}\,\det\left(\operatorname{sub}_{\{2\}}^{\{1\}} A\right)\cdot\det\left(\operatorname{sub}_{\{1\}}^{\{2\}} B\right) \\
&\quad + (-1)^{\operatorname{sum}\{2\}+\operatorname{sum}\{2\}}\,\det\left(\operatorname{sub}_{\{2\}}^{\{2\}} A\right)\cdot\det\left(\operatorname{sub}_{\{1\}}^{\{1\}} B\right) \\
&\quad + (-1)^{\operatorname{sum}\{1,2\}+\operatorname{sum}\{1,2\}}\,\det\left(\operatorname{sub}_{\{1,2\}}^{\{1,2\}} A\right)\cdot\det\left(\operatorname{sub}_{\varnothing}^{\varnothing} B\right) \\
&= \det B + A_{1,1} B_{2,2} - A_{1,2} B_{2,1} - A_{2,1} B_{1,2} + A_{2,2} B_{1,1} + \det A \\
&= \det A + \det B - A_{1,2} B_{2,1} - A_{2,1} B_{1,2} + A_{1,1} B_{2,2} + A_{2,2} B_{1,1}.
\end{align*}
(Here we used that the determinant $\det(\,)$ of the empty $0 \times 0$-matrix is $1$, that $\operatorname{sub}_{\{i\}}^{\{j\}} A$ is the $1 \times 1$-matrix $\left(A_{i,j}\right)$, and that the signs are $(-1)^{0+0} = (-1)^{1+1} = (-1)^{2+2} = (-1)^{3+3} = 1$ and $(-1)^{1+2} = (-1)^{2+1} = -1$.)

Theorem 6.4.23 can be thought of as a kind of "binomial theorem" for determinants: On its right hand side (for $n > 0$) is a sum that contains both $\det A$ and $\det B$ as addends (in fact, $\det A$ is the addend for $P = Q = [n]$, whereas $\det B$ is the addend for $P = Q = \varnothing$) as well as many "mixed" addends that contain both a part of $A$ and a part of $B$.
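For a small $n$, Theorem 6.4.23 can be brute-forced on a computer. The following Python sketch (my addition, not part of the notes) checks the double sum over subsets for two concrete $3 \times 3$ integer matrices; note that the indices are 0-based in the code, so the 1-based quantities $\operatorname{sum} P$ and $\operatorname{sum} Q$ become `sum(i + 1 ...)`.

```python
from itertools import chain, combinations, permutations
from math import prod

def det(M):
    """Determinant via the Leibniz expansion (det of the 0x0-matrix is 1)."""
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def sub(M, rows, cols):
    """sub_U^V M: restrict to the rows in U and the columns in V."""
    return [[M[i][j] for j in sorted(cols)] for i in sorted(rows)]

def subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

n = 3
A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
B = [[1, 1, 0], [0, 2, 1], [1, 0, 3]]
ground = range(n)

lhs = det([[A[i][j] + B[i][j] for j in ground] for i in ground])
rhs = 0
for P in subsets(ground):
    for Q in subsets(ground):
        if len(P) != len(Q):
            continue
        Pc = [i for i in ground if i not in P]   # complement P-tilde
        Qc = [j for j in ground if j not in Q]   # complement Q-tilde
        s = sum(i + 1 for i in P) + sum(j + 1 for j in Q)  # sum P + sum Q (1-based)
        rhs += (-1) ** s * det(sub(A, P, Q)) * det(sub(B, Pc, Qc))
print(lhs, rhs)  # both equal 92
```

Replacing $A$ and $B$ by any other integer matrices (or raising $n$) leaves the two sides equal, as the theorem predicts.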

Proof of Theorem 6.4.23. See [Grinbe15, Theorem 6.160]. (Note that $\operatorname{sub}_P^Q A$ is called $\operatorname{sub}_{w(P)}^{w(Q)} A$ in [Grinbe15].) The main difficulty of the proof is bookkeeping; the underlying idea is simple (expand everything and regroup).
Here is a rough outline of the argument. If σ ∈ Sn , and if P is a subset of [n], then
σ ( P) = {σ (i ) | i ∈ P} is a subset of [n] as well, and has the same size as P (since σ is
a permutation and therefore injective); thus, it satisfies | P| = |σ ( P)|.

The definition of $\det(A+B)$ yields
\begin{align*}
\det(A+B)
&= \sum_{\sigma\in S_n} (-1)^\sigma \left(A_{1,\sigma(1)}+B_{1,\sigma(1)}\right)\left(A_{2,\sigma(2)}+B_{2,\sigma(2)}\right)\cdots\left(A_{n,\sigma(n)}+B_{n,\sigma(n)}\right) \\
&= \sum_{\sigma\in S_n} (-1)^\sigma \sum_{P\subseteq[n]} \left(\prod_{i\in P} A_{i,\sigma(i)}\right)\left(\prod_{i\in\widetilde{P}} B_{i,\sigma(i)}\right) \\
&\qquad \left(\text{since } \prod_{i=1}^n \left(A_{i,\sigma(i)}+B_{i,\sigma(i)}\right) = \sum_{P\subseteq[n]} \left(\prod_{i\in P} A_{i,\sigma(i)}\right)\left(\prod_{i\in\widetilde{P}} B_{i,\sigma(i)}\right) \text{ by (164), applied to } a_i = A_{i,\sigma(i)} \text{ and } b_i = B_{i,\sigma(i)}\right) \\
&= \sum_{P\subseteq[n]} \sum_{\sigma\in S_n} (-1)^\sigma \left(\prod_{i\in P} A_{i,\sigma(i)}\right)\left(\prod_{i\in\widetilde{P}} B_{i,\sigma(i)}\right) \\
&= \sum_{P\subseteq[n]} \sum_{\substack{Q\subseteq[n];\\ |P|=|Q|}} \sum_{\substack{\sigma\in S_n;\\ \sigma(P)=Q}} (-1)^\sigma \left(\prod_{i\in P} A_{i,\sigma(i)}\right)\left(\prod_{i\in\widetilde{P}} B_{i,\sigma(i)}\right)
\end{align*}
(here, we have split the inner sum according to the value of the subset $\sigma(P) = \{\sigma(i) \mid i \in P\}$, recalling that it satisfies $|P| = |\sigma(P)|$).
Now, fix two subsets $P$ and $Q$ of $[n]$ satisfying $|P| = |Q|$. Thus, $\left|\widetilde{P}\right| = \left|\widetilde{Q}\right|$ as well.
Write the sets $P$, $Q$, $\widetilde{P}$ and $\widetilde{Q}$ as
\[
P = \{p_1 < p_2 < \cdots < p_k\} \qquad\text{and}\qquad Q = \{q_1 < q_2 < \cdots < q_k\} \qquad\text{and}
\]
\[
\widetilde{P} = \left\{p'_1 < p'_2 < \cdots < p'_\ell\right\} \qquad\text{and}\qquad \widetilde{Q} = \left\{q'_1 < q'_2 < \cdots < q'_\ell\right\},
\]
where the notation "$U = \{u_1 < u_2 < \cdots < u_a\}$" is just a shorthand way to say "$U = \{u_1, u_2, \ldots, u_a\}$ and $u_1 < u_2 < \cdots < u_a$" (or, equivalently, "the elements of $U$ in strictly increasing order are $u_1, u_2, \ldots, u_a$"). Now, for each permutation $\sigma \in S_n$ satisfying $\sigma(P) = Q$, we see that:
• The elements $\sigma(p_1), \sigma(p_2), \ldots, \sigma(p_k)$ are the elements $q_1, q_2, \ldots, q_k$ in some order (since $\sigma(P) = Q$), and thus there exists a unique permutation $\alpha \in S_k$ such that
\[
\sigma(p_i) = q_{\alpha(i)} \qquad \text{for each } i \in [k].
\]
We denote this $\alpha$ by $\alpha_\sigma$.
• The elements $\sigma(p'_1), \sigma(p'_2), \ldots, \sigma(p'_\ell)$ are the elements $q'_1, q'_2, \ldots, q'_\ell$ in some order (since $\sigma(P) = Q$ entails $\sigma\left(\widetilde{P}\right) = \widetilde{Q}$, because $\sigma$ is a permutation of $[n]$), and thus there exists a unique permutation $\beta \in S_\ell$ such that
\[
\sigma\left(p'_i\right) = q'_{\beta(i)} \qquad \text{for each } i \in [\ell].
\]
We denote this $\beta$ by $\beta_\sigma$.
Thus, for each permutation $\sigma \in S_n$ satisfying $\sigma(P) = Q$, we have defined two permutations $\alpha_\sigma \in S_k$ and $\beta_\sigma \in S_\ell$ that (roughly speaking) describe the actions of $\sigma$ on the subsets $P$ and $\widetilde{P}$, respectively. It is not hard to see that the map
\begin{align*}
\{\text{permutations } \sigma \in S_n \text{ satisfying } \sigma(P) = Q\} &\to S_k \times S_\ell, \\
\sigma &\mapsto (\alpha_\sigma, \beta_\sigma)
\end{align*}

is a bijection. Moreover, for any permutation $\sigma \in S_n$ satisfying $\sigma(P) = Q$, we have
\[
(-1)^\sigma = (-1)^{\operatorname{sum} P + \operatorname{sum} Q} (-1)^{\alpha_\sigma} (-1)^{\beta_\sigma}. \tag{230}
\]
(Proving this is perhaps the least pleasant part of this proof, but it is pure combinatorics. It is probably easiest to reduce this to the case when $P = [k]$ and $Q = [k]$ by a reduction procedure that involves multiplying $\sigma$ by $\operatorname{sum} P + \operatorname{sum} Q - 2 \cdot (1 + 2 + \cdots + k)$ many transpositions${}^{106}$. Once we are in the case $P = [k]$ and $Q = [k]$, we can prove (230) by directly counting the inversions of $\sigma$ (showing that $\ell(\sigma) = \ell(\alpha_\sigma) + \ell(\beta_\sigma)$).
Note that the proof I give in [Grinbe15, Theorem 6.160] avoids proving (230), opting
106 Specifically: Recall the simple transpositions $s_i$ defined for all $i \in [n-1]$ in Definition 5.2.3. By replacing $\sigma$ by $\sigma s_i$ (for some $i \in [n-1]$), we can swap two adjacent values of $\sigma$ (namely, $\sigma(i)$ and $\sigma(i+1)$). Furthermore, $\sigma(P) = Q$ implies $(\sigma s_i)(s_i(P)) = Q$ (since $s_i s_i = s_i^2 = \operatorname{id}$). Thus, the equality $\sigma(P) = Q$ is preserved if we simultaneously replace $\sigma$ by $\sigma s_i$ and replace $P$ by $s_i(P)$. Such a simultaneous replacement will be called a shift. Furthermore, if $i$ is chosen in such a way that $i \notin P$ and $i + 1 \in P$, then this shift will be called a left shift.
Let us see what happens to $\sigma$, $P$, $\alpha_\sigma$ and $\beta_\sigma$ when we perform a left shift. Indeed, consider a left shift which replaces $\sigma$ by $\sigma s_i$ and replaces $P$ by $s_i(P)$, where $i \in [n-1]$ is chosen in such a way that $i \notin P$ and $i + 1 \in P$. The set $s_i(P)$ is the set $P$ with the element $i + 1$ replaced by $i$. Thus, $\operatorname{sum}(s_i(P)) = \operatorname{sum} P - 1$. In other words, our left shift has decremented $\operatorname{sum} P$ by $1$. Thus, our left shift has flipped the sign of $(-1)^{\operatorname{sum} P + \operatorname{sum} Q}$ (since $\operatorname{sum} Q$ obviously stays unchanged). The permutation $\sigma$ has been replaced by $\sigma s_i$, which is the same permutation as $\sigma$ but with the values $\sigma(i)$ and $\sigma(i+1)$ swapped. Since $i \notin P$ and $i + 1 \in P$, this swap has not disturbed the relative order of the elements of $P$ and $\widetilde{P}$ (but merely replaced $i + 1$ by $i$ in $P$ and replaced $i$ by $i + 1$ in $\widetilde{P}$), so that the permutations $\alpha_\sigma$ and $\beta_\sigma$ have not changed. Our left shift thus has left $\alpha_\sigma$ and $\beta_\sigma$ unchanged. It has, however, flipped the sign of $(-1)^\sigma$ (because $(-1)^{\sigma s_i} = (-1)^\sigma (-1)^{s_i} = -(-1)^\sigma$, since $(-1)^{s_i} = -1$).
Let us summarize: Each left shift leaves ασ and β σ unchanged, while flipping the signs of
(−1)sum P+sum Q and (−1)σ . However, by performing left shifts, we can move the smallest
element of P one step to the left, or (if the smallest element of P is already 1) we can move
the second-smallest element of P one step to the left, or (if the two smallest elements of P
are already 1 and 2) we can move the third-smallest element of P one step to the left, and so
on. After sufficiently many such left shifts, the set P will have become [k], whereas ασ and
β σ will not have changed (because we have seen above that each left shift leaves ασ and β σ
unchanged). The total number of left shifts we need for this is ( p1 − 1) + ( p2 − 2) + · · · +
( pk − k) = sum P − (1 + 2 + · · · + k).
Likewise, we can define left co-shifts, which are operations similar to left shifts but act-
ing on the values rather than positions of σ and acting on Q rather than P. Explicitly, a
left co-shift replaces σ by si σ and replaces Q by si ( Q), where i ∈ [n − 1] is chosen such
that i ∈/ Q and i + 1 ∈ Q. Again, we can see that each left co-shift leaves ασ and β σ
unchanged, while flipping the signs of (−1)sum P+sum Q and (−1)σ . After a sequence of
sum Q − (1 + 2 + · · · + k ) left co-shifts, the set Q will have become [k].
Each left shift and each left co-shift multiplies the permutation σ by a transposition.
Hence, after our sum P − (1 + 2 + · · · + k) left shifts and our sum Q − (1 + 2 + · · · + k ) left
co-shifts, we will have multiplied σ by altogether

(sum P − (1 + 2 + · · · + k)) + (sum Q − (1 + 2 + · · · + k))


= sum P + sum Q − 2 · (1 + 2 + · · · + k)

many transpositions. At the end of this process, we have P = [k ] and Q = [k].



instead for an argument using row permutations; see [Grinbe15, solution to Exercise
6.44] for the details.)
Now,
\begin{align*}
&\sum_{\substack{\sigma\in S_n;\\ \sigma(P)=Q}} (-1)^\sigma \left(\prod_{i\in P} A_{i,\sigma(i)}\right) \left(\prod_{i\in\widetilde{P}} B_{i,\sigma(i)}\right) \\
&= \sum_{\substack{\sigma\in S_n;\\ \sigma(P)=Q}} (-1)^{\operatorname{sum} P+\operatorname{sum} Q} (-1)^{\alpha_\sigma} (-1)^{\beta_\sigma} \left(\prod_{i=1}^k A_{p_i, q_{\alpha_\sigma(i)}}\right) \left(\prod_{i=1}^\ell B_{p'_i, q'_{\beta_\sigma(i)}}\right) \\
&\qquad \left(\text{by (230) and the definitions of } \alpha_\sigma \text{ and } \beta_\sigma\right) \\
&= \sum_{(\alpha,\beta)\in S_k\times S_\ell} (-1)^{\operatorname{sum} P+\operatorname{sum} Q} (-1)^{\alpha} (-1)^{\beta} \left(\prod_{i=1}^k A_{p_i, q_{\alpha(i)}}\right) \left(\prod_{i=1}^\ell B_{p'_i, q'_{\beta(i)}}\right) \\
&\qquad \left(\text{here, we have substituted } (\alpha,\beta) \text{ for } (\alpha_\sigma,\beta_\sigma) \text{ in the sum, since the map } \sigma\mapsto(\alpha_\sigma,\beta_\sigma) \text{ is a bijection}\right) \\
&= (-1)^{\operatorname{sum} P+\operatorname{sum} Q} \left(\sum_{\alpha\in S_k} (-1)^\alpha \prod_{i=1}^k A_{p_i, q_{\alpha(i)}}\right) \left(\sum_{\beta\in S_\ell} (-1)^\beta \prod_{i=1}^\ell B_{p'_i, q'_{\beta(i)}}\right) \\
&= (-1)^{\operatorname{sum} P+\operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right) \cdot \det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} B\right) \tag{231}
\end{align*}
(by the definitions of $\operatorname{sub}_P^Q A$ and $\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} B$ and their determinants).

Forget that we fixed $P$ and $Q$. We thus have proved (231) for any two subsets $P$ and $Q$ of $[n]$ satisfying $|P| = |Q|$. Thus, our original computation of $\det(A+B)$ becomes
\begin{align*}
\det(A+B)
&= \sum_{P\subseteq[n]} \sum_{\substack{Q\subseteq[n];\\ |P|=|Q|}} \underbrace{\sum_{\substack{\sigma\in S_n;\\ \sigma(P)=Q}} (-1)^\sigma \left(\prod_{i\in P} A_{i,\sigma(i)}\right) \left(\prod_{i\in\widetilde{P}} B_{i,\sigma(i)}\right)}_{= (-1)^{\operatorname{sum} P+\operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right)\cdot\det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} B\right) \text{ (by (231))}} \\
&= \sum_{P\subseteq[n]} \sum_{\substack{Q\subseteq[n];\\ |P|=|Q|}} (-1)^{\operatorname{sum} P+\operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right)\cdot\det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} B\right).
\end{align*}
This proves Theorem 6.4.23.

We shall soon see a few applications of Theorem 6.4.23. First, let us observe
a simple property of diagonal matrices:107

107 We are using the Iverson bracket notation (see Definition A.1.5) again.

Lemma 6.4.25. Let $n \in \mathbb{N}$. Let $d_1, d_2, \ldots, d_n \in K$. Let
\[
D := \left(d_i [i = j]\right)_{1 \le i \le n,\ 1 \le j \le n}
= \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix} \in K^{n \times n}
\]
be the diagonal $n \times n$-matrix with diagonal entries $d_1, d_2, \ldots, d_n$. Then:

(a) We have $\det\left(\operatorname{sub}_P^P D\right) = \prod_{i \in P} d_i$ for any subset $P$ of $[n]$.

(b) Let $P$ and $Q$ be two distinct subsets of $[n]$ satisfying $|P| = |Q|$. Then, $\det\left(\operatorname{sub}_P^Q D\right) = 0$.

Proof of Lemma 6.4.25. This is [Grinbe15, Lemma 6.163] (slightly rewritten); see [Grinbe15, Exercise 6.49] for a detailed proof. (That said, it is almost obvious: In part (a), the submatrix $\operatorname{sub}_P^P D$ is itself diagonal, and its diagonal entries are precisely the $d_i$ for $i \in P$. In part (b), the submatrix $\operatorname{sub}_P^Q D$ has a zero row (indeed, from $|P| = |Q|$ and $P \ne Q$, we see that there exists some $i \in P \setminus Q$, and then the corresponding row of $\operatorname{sub}_P^Q D$ is zero) and thus has determinant $0$.)
Lemma 6.4.25 gives very simple formulas for minors of diagonal matrices.
Thus, the formula of Theorem 6.4.23 becomes simpler when the matrix B is
diagonal:

Theorem 6.4.26. Let $n \in \mathbb{N}$. Let $A$ and $D$ be two $n \times n$-matrices in $K^{n \times n}$ such that the matrix $D$ is diagonal. Let $d_1, d_2, \ldots, d_n$ be the diagonal entries of the diagonal matrix $D$, so that
\[
D = \left(d_i [i = j]\right)_{1 \le i \le n,\ 1 \le j \le n}
= \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix} \in K^{n \times n}.
\]
Then,
\[
\det(A + D) = \sum_{P \subseteq [n]} \det\left(\operatorname{sub}_P^P A\right) \cdot \prod_{i \in [n] \setminus P} d_i.
\]

The minors $\det\left(\operatorname{sub}_P^P A\right)$ of an $n \times n$-matrix $A$ are known as the principal minors of $A$.

Example 6.4.27. For $n = 3$, the claim of Theorem 6.4.26 is
\begin{align*}
\det(A+D)
&= \sum_{P\subseteq[3]} \det\left(\operatorname{sub}_P^P A\right)\cdot\prod_{i\in[3]\setminus P} d_i \\
&= d_1 d_2 d_3 + A_{1,1}\, d_2 d_3 + A_{2,2}\, d_1 d_3 + A_{3,3}\, d_1 d_2 \\
&\quad + \det\begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}\cdot d_3
       + \det\begin{pmatrix} A_{1,1} & A_{1,3} \\ A_{3,1} & A_{3,3} \end{pmatrix}\cdot d_2 \\
&\quad + \det\begin{pmatrix} A_{2,2} & A_{2,3} \\ A_{3,2} & A_{3,3} \end{pmatrix}\cdot d_1
       + \det A.
\end{align*}
Here, the eight subsets $P$ of $[3]$ contribute as follows: $P = \varnothing$ gives $\det(\,)\cdot d_1 d_2 d_3 = d_1 d_2 d_3$; the singletons $\{1\}$, $\{2\}$, $\{3\}$ give $A_{1,1} d_2 d_3$, $A_{2,2} d_1 d_3$, $A_{3,3} d_1 d_2$; the two-element subsets give the three $2 \times 2$ principal minors times the remaining $d_i$; and $P = \{1,2,3\}$ gives $\det A$ (times the empty product, which is $1$).
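Theorem 6.4.26 is just as easy to test numerically. The Python sketch below (my own illustration, with the same ad-hoc Leibniz determinant as before) checks the principal-minor expansion for a concrete $3 \times 3$ integer matrix $A$ and diagonal entries $d_1, d_2, d_3$.

```python
from itertools import chain, combinations, permutations
from math import prod

def det(M):
    """Determinant via the Leibniz expansion (det of the 0x0-matrix is 1)."""
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

n = 3
A = [[2, 5, 1], [3, 0, 4], [1, 2, 6]]
d = [7, 8, 9]

# A + D, where D = diag(d_1, ..., d_n)
ApD = [[A[i][j] + (d[i] if i == j else 0) for j in range(n)] for i in range(n)]
lhs = det(ApD)

rhs = 0
for P in chain.from_iterable(combinations(range(n), r) for r in range(n + 1)):
    principal_minor = det([[A[i][j] for j in P] for i in P])  # det(sub_P^P A)
    rhs += principal_minor * prod(d[i] for i in range(n) if i not in P)
print(lhs, rhs)  # both equal 801
```

The loop runs over all $2^3 = 8$ subsets $P$ of $\{0, 1, 2\}$, exactly mirroring the eight addends listed in Example 6.4.27.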

Proof of Theorem 6.4.26 (sketched). (See [Grinbe15, Corollary 6.162] for details.) We shall use the notations $\widetilde{I}$ and $\operatorname{sum} S$ as defined in Theorem 6.4.23. If $P$ and $Q$ are two subsets of $[n]$ satisfying $|P| = |Q|$ but $P \ne Q$, then their complements $\widetilde{P}$ and $\widetilde{Q}$ are also distinct (since $P \ne Q$) and satisfy $\left|\widetilde{P}\right| = \left|\widetilde{Q}\right|$ (since $|P| = |Q|$), and therefore Lemma 6.4.25 (b) (applied to $\widetilde{P}$ and $\widetilde{Q}$ instead of $P$ and $Q$) yields
\[
\det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} D\right) = 0. \tag{232}
\]
Now, Theorem 6.4.23 (applied to $B = D$) yields
\begin{align*}
\det(A+D)
&= \sum_{P\subseteq[n]} \sum_{\substack{Q\subseteq[n];\\ |P|=|Q|}} (-1)^{\operatorname{sum} P+\operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right)\cdot\underbrace{\det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} D\right)}_{\text{this is } 0 \text{ if } P\ne Q \text{ (by (232))}} \\
&= \sum_{P\subseteq[n]} \underbrace{(-1)^{\operatorname{sum} P+\operatorname{sum} P}}_{=1} \det\left(\operatorname{sub}_P^P A\right)\cdot\underbrace{\det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{P}} D\right)}_{=\prod_{i\in\widetilde{P}} d_i \text{ (by Lemma 6.4.25 (a))}} \\
&\qquad \left(\text{here, we have removed all addends with } P\ne Q \text{ from the double sum, since these addends are } 0\right) \\
&= \sum_{P\subseteq[n]} \det\left(\operatorname{sub}_P^P A\right)\cdot\prod_{i\in\widetilde{P}} d_i.
\end{align*}
This proves Theorem 6.4.26 (since $\widetilde{P} = [n] \setminus P$).


As a particular case of Theorem 6.4.26, we quickly obtain the following formula for a class of determinants that frequently appear in graph theory:
Proposition 6.4.28. Let $n \in \mathbb{N}$. Let $d_1, d_2, \ldots, d_n \in K$ and $x \in K$. Let $F$ be the $n \times n$-matrix
\[
F = \left(x + d_i [i = j]\right)_{1 \le i \le n,\ 1 \le j \le n}
= \begin{pmatrix} x + d_1 & x & \cdots & x \\ x & x + d_2 & \cdots & x \\ \vdots & \vdots & \ddots & \vdots \\ x & x & \cdots & x + d_n \end{pmatrix} \in K^{n \times n}.
\]
Then,
\[
\det F = d_1 d_2 \cdots d_n + x \sum_{i=1}^n d_1 d_2 \cdots \widehat{d_i} \cdots d_n,
\]
where the hat over the "$d_i$" means "omit the $d_i$ factor" (that is, the expression "$d_1 d_2 \cdots \widehat{d_i} \cdots d_n$" is to be understood as "$d_1 d_2 \cdots d_{i-1} d_{i+1} d_{i+2} \cdots d_n$").
Proof of Proposition 6.4.28 (sketched). Define the two $n \times n$-matrices
\[
A := (x)_{1 \le i \le n,\ 1 \le j \le n}
= \begin{pmatrix} x & x & \cdots & x \\ x & x & \cdots & x \\ \vdots & \vdots & \ddots & \vdots \\ x & x & \cdots & x \end{pmatrix} \in K^{n \times n}
\qquad\text{and}\qquad
D := \left(d_i [i=j]\right)_{1 \le i \le n,\ 1 \le j \le n}
= \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix} \in K^{n \times n}.
\]
Then, it is clear that $F = A + D$. Moreover, the matrix $D$ is diagonal, and its diagonal entries are $d_1, d_2, \ldots, d_n$. Hence, Theorem 6.4.26 yields
\[
\det(A+D) = \sum_{P\subseteq[n]} \det\left(\operatorname{sub}_P^P A\right)\cdot\prod_{i\in[n]\setminus P} d_i. \tag{233}
\]
However, if $P$ is a subset of $[n]$, then $\operatorname{sub}_P^P A$ is a submatrix of $A$ and thus has all its entries equal to $x$ (since all entries of $A$ equal $x$). If the subset $P$ of $[n]$ has size $\ge 2$, then this submatrix $\operatorname{sub}_P^P A$ has size $|P| \ge 2$ and therefore has determinant $0$ (by (220), applied to $|P|$ instead of $n$). In other words, we have
\[
\det\left(\operatorname{sub}_P^P A\right) = 0 \tag{234}
\]
whenever $P \subseteq [n]$ satisfies $|P| \ge 2$.
Now, (233) becomes
\begin{align*}
\det(A+D)
&= \sum_{P\subseteq[n]} \det\left(\operatorname{sub}_P^P A\right)\cdot\prod_{i\in[n]\setminus P} d_i \\
&= \sum_{\substack{P\subseteq[n];\\ |P|\le 1}} \det\left(\operatorname{sub}_P^P A\right)\cdot\prod_{i\in[n]\setminus P} d_i
 + \sum_{\substack{P\subseteq[n];\\ |P|\ge 2}} \underbrace{\det\left(\operatorname{sub}_P^P A\right)}_{=0 \text{ (by (234))}}\cdot\prod_{i\in[n]\setminus P} d_i \\
&\qquad \left(\text{since each } P\subseteq[n] \text{ satisfies either } |P|\le 1 \text{ or } |P|\ge 2\right) \\
&= \sum_{\substack{P\subseteq[n];\\ |P|\le 1}} \det\left(\operatorname{sub}_P^P A\right)\cdot\prod_{i\in[n]\setminus P} d_i \\
&= \underbrace{\det\left(\operatorname{sub}_\varnothing^\varnothing A\right)}_{\substack{=1\\ \text{(since } \operatorname{sub}_\varnothing^\varnothing A \text{ is a } 0\times 0\text{-matrix)}}}\cdot\underbrace{\prod_{i\in[n]\setminus\varnothing} d_i}_{=d_1 d_2\cdots d_n}
 + \sum_{p=1}^n \underbrace{\det\left(\operatorname{sub}_{\{p\}}^{\{p\}} A\right)}_{=x \text{ (since } \operatorname{sub}_{\{p\}}^{\{p\}} A = (x))}\cdot\underbrace{\prod_{i\in[n]\setminus\{p\}} d_i}_{=d_1 d_2\cdots\widehat{d_p}\cdots d_n} \\
&\qquad \left(\text{since the subsets } P \text{ of } [n] \text{ satisfying } |P|\le 1 \text{ are the } n+1 \text{ subsets } \varnothing, \{1\}, \{2\}, \ldots, \{n\}\right) \\
&= d_1 d_2\cdots d_n + \sum_{p=1}^n x\cdot d_1 d_2\cdots\widehat{d_p}\cdots d_n
 = d_1 d_2\cdots d_n + x\sum_{i=1}^n d_1 d_2\cdots\widehat{d_i}\cdots d_n
\end{align*}
(here, we have renamed the summation index $p$ as $i$). In view of $F = A + D$, this rewrites as
\[
\det F = d_1 d_2\cdots d_n + x\sum_{i=1}^n d_1 d_2\cdots\widehat{d_i}\cdots d_n.
\]
Thus, Proposition 6.4.28 is proven.


(See https://math.stackexchange.com/questions/2110766/ for some different proofs of Proposition 6.4.28.)
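Proposition 6.4.28 can likewise be confirmed on concrete numbers. The following Python sketch (my own illustration) compares a brute-force determinant of $F$ against the closed formula.

```python
from itertools import permutations
from math import prod

def det(M):
    """Determinant via the Leibniz expansion; fine for tiny matrices."""
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

d = [3, 5, 7, 2]
x = 4
n = len(d)
F = [[x + (d[i] if i == j else 0) for j in range(n)] for i in range(n)]

lhs = det(F)
# d_1 d_2 ... d_n  +  x * sum_i (d_1 ... omit d_i ... d_n)
rhs = prod(d) + x * sum(prod(d[:i] + d[i + 1:]) for i in range(n))
print(lhs, rhs)  # both equal 1198
```

Here `d[:i] + d[i + 1:]` is exactly the list $d_1, \ldots, \widehat{d_i}, \ldots, d_n$ with the $i$-th entry omitted.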
Another application of Theorem 6.4.26 is an explicit formula for the characteristic polynomial of a matrix. We recall that the characteristic polynomial of an $n \times n$-matrix $A \in K^{n \times n}$ is defined to be the polynomial${}^{108}$ $\det(xI_n - A) \in K[x]$ (some authors define it to be $\det(A - xI_n)$ instead, but this is the same up to sign). This is a polynomial of degree $n$, whose leading term is $x^n$, whose next-highest term is $-\operatorname{Tr} A \cdot x^{n-1}$ (where $\operatorname{Tr} A := \sum_{i=1}^n A_{i,i}$ is the trace of $A$), and whose constant term is $(-1)^n \det A$. We shall extend this by explicitly computing all coefficients of this polynomial. For the sake of simplicity, we will compute $\det(A + xI_n)$ instead of $\det(xI_n - A)$ (this is tantamount to replacing $x$ by $-x$), and we will take $x$ to be an element of $K$ rather than an indeterminate (but this setting is more general, since we can take $K$ itself to be a polynomial ring and then choose $x$ to be its indeterminate). Here is our formula:

108 Here, $I_n$ denotes the $n \times n$ identity matrix.

Proposition 6.4.29. Let $n \in \mathbb{N}$. Let $A \in K^{n \times n}$ be an $n \times n$-matrix. Let $x \in K$. Let $I_n$ denote the $n \times n$ identity matrix. Then,
\[
\det(A + xI_n)
= \sum_{P \subseteq [n]} \det\left(\operatorname{sub}_P^P A\right) \cdot x^{n-|P|}
= \sum_{k=0}^n \left(\sum_{\substack{P \subseteq [n];\\ |P| = n-k}} \det\left(\operatorname{sub}_P^P A\right)\right) x^k.
\]

Proof of Proposition 6.4.29 (sketched). (See [Grinbe15, Corollary 6.164] for details.) The matrix $xI_n$ is diagonal, and its diagonal entries are $x, x, \ldots, x$; in fact,
\[
xI_n = (x[i=j])_{1\le i\le n,\ 1\le j\le n}
= \begin{pmatrix} x & 0 & \cdots & 0 \\ 0 & x & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x \end{pmatrix}.
\]
Hence, Theorem 6.4.26 (applied to $D = xI_n$ and $d_i = x$) yields
\begin{align*}
\det(A + xI_n)
&= \sum_{P\subseteq[n]} \det\left(\operatorname{sub}_P^P A\right)\cdot\underbrace{\prod_{i\in[n]\setminus P} x}_{= x^{|[n]\setminus P|} = x^{n-|P|}}
 = \sum_{P\subseteq[n]} \det\left(\operatorname{sub}_P^P A\right)\cdot x^{n-|P|} \\
&= \sum_{k=0}^n \sum_{\substack{P\subseteq[n];\\ |P|=k}} \det\left(\operatorname{sub}_P^P A\right) x^{n-k}
\qquad \left(\text{here, we have split the sum according to the value of } |P|\right) \\
&= \sum_{k=0}^n \sum_{\substack{P\subseteq[n];\\ |P|=n-k}} \det\left(\operatorname{sub}_P^P A\right) x^{k}
\qquad \left(\text{here, we have substituted } n-k \text{ for } k \text{ in the sum}\right) \\
&= \sum_{k=0}^n \left(\sum_{\substack{P\subseteq[n];\\ |P|=n-k}} \det\left(\operatorname{sub}_P^P A\right)\right) x^k.
\end{align*}
Proposition 6.4.29 is proven.
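As a quick consistency check of Proposition 6.4.29 (again a sketch of my own, not from the notes), one can collect the sums of principal minors of each size into coefficients and compare $\det(A + xI_n)$ against the resulting polynomial for several values of $x$.

```python
from itertools import combinations, permutations
from math import prod

def det(M):
    """Determinant via the Leibniz expansion (det of the 0x0-matrix is 1)."""
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

n = 3
A = [[1, 2, 0], [3, 1, 4], [0, 2, 5]]

# coeff[k] = sum of principal minors det(sub_P^P A) over subsets P with |P| = n - k
coeff = [sum(det([[A[i][j] for j in P] for i in P])
             for P in combinations(range(n), n - k))
         for k in range(n + 1)]

ok = True
for x in (-2, 0, 1, 3):
    AxI = [[A[i][j] + (x if i == j else 0) for j in range(n)] for i in range(n)]
    ok = ok and det(AxI) == sum(coeff[k] * x ** k for k in range(n + 1))
print(ok)  # True
```

Note that `coeff[n]` is automatically $1$ (the empty principal minor), matching the leading term $x^n$, and `coeff[0]` is $\det A$, matching the constant term of $\det(A + xI_n)$.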



6.4.5. Factoring the matrix


Next, we will see some tricks for computing determinants.
Let us compute a determinant that recently went viral on the internet after Timothy Gowers livestreamed himself computing it${}^{109}$ ([Grinbe15, Exercise 6.11], [EdeStr04]):

Proposition 6.4.30. Let $n \in \mathbb{N}$. Let $A$ be the $n \times n$-matrix
\[
A = \left(\binom{i+j-2}{i-1}\right)_{1 \le i \le n,\ 1 \le j \le n}
= \begin{pmatrix}
\binom{0}{0} & \binom{1}{0} & \cdots & \binom{n-1}{0} \\
\binom{1}{1} & \binom{2}{1} & \cdots & \binom{n}{1} \\
\vdots & \vdots & \ddots & \vdots \\
\binom{n-1}{n-1} & \binom{n}{n-1} & \cdots & \binom{2n-2}{n-1}
\end{pmatrix}.
\]
(For example, for $n = 4$, we have $A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{pmatrix}$.)
Then, $\det A = 1$.

There are many ways to prove Proposition 6.4.30, but here is a particularly
simple one:
Proof of Proposition 6.4.30 (sketched). (See [Grinbe15, Exercise 6.11] for details.)

109 See https://www.youtube.com/watch?v=byjhpzEoXFs and https://www.youtube.com/watch?v=frvBdaqLgLo and https://www.youtube.com/watch?v=m8R9rVb0M5o .

For any $i, j \in [n]$, we have
\begin{align*}
A_{i,j} &= \binom{i+j-2}{i-1} \qquad (\text{by the definition of } A) \\
&= \binom{(i-1)+(j-1)}{i-1} = \binom{(i-1)+(j-1)}{j-1} \qquad (\text{by Theorem 2.3.6}) \\
&= \sum_{k=0}^{i-1} \binom{i-1}{k}\binom{j-1}{j-1-k}
\qquad \left(\text{by Proposition 3.2.20, applied to } i-1,\ j-1 \text{ and } j-1 \text{ instead of } a,\ b \text{ and } n\right) \\
&= \sum_{k=0}^{i-1} \binom{i-1}{k}\binom{j-1}{k}
\qquad \left(\text{since } \binom{j-1}{j-1-k} = \binom{j-1}{k} \text{ by Theorem 2.3.6}\right) \\
&= \sum_{k=0}^{n-1} \binom{i-1}{k}\binom{j-1}{k} \\
&\qquad \left(\begin{array}{c}\text{here, we extended the sum upwards to } k = n-1, \text{ but this has not changed} \\ \text{the value of the sum, since all newly introduced addends are } 0 \\ \left(\text{because } \binom{i-1}{k} = 0 \text{ whenever } k > i-1\right)\end{array}\right) \\
&= \sum_{k=1}^{n} \binom{i-1}{k-1}\binom{j-1}{k-1}
\qquad (\text{here, we have substituted } k-1 \text{ for } k \text{ in the sum}).
\end{align*}
If we define two $n \times n$-matrices
\[
L := \left(\binom{i-1}{k-1}\right)_{1 \le i \le n,\ 1 \le k \le n}
\qquad\text{and}\qquad
U := \left(\binom{j-1}{k-1}\right)_{1 \le k \le n,\ 1 \le j \le n},
\]
then this rewrites as
\[
A_{i,j} = \sum_{k=1}^n L_{i,k} U_{k,j} = (LU)_{i,j}
\]
(by the definition of the matrix product $LU$). Since this equality holds for all $i, j \in [n]$, we thus conclude that $A = LU$.
Notice, however, that the matrix $L$ is lower-triangular (because if $i < k$, then $i - 1 < k - 1$ and therefore $L_{i,k} = \binom{i-1}{k-1} = 0$), and thus (by Theorem 6.4.11) its determinant is the product of its diagonal entries. In other words,
\[
\det L = \binom{1-1}{1-1}\binom{2-1}{2-1}\cdots\binom{n-1}{n-1} = \prod_{k=1}^n \underbrace{\binom{k-1}{k-1}}_{=1} = 1.
\]

Similarly, the matrix $U$ is upper-triangular, and its determinant is $\det U = 1$ as well.
Now, from $A = LU$, we obtain
\[
\det A = \det(LU) = \underbrace{\det L}_{=1} \cdot \underbrace{\det U}_{=1} \qquad (\text{by Theorem 6.4.16})
= 1;
\]
this proves Proposition 6.4.30.
How can you discover such a proof? Our serendipitous factorization of A as
LU might appear unmotivated, but from the viewpoint of linear algebra it is an
instance of a well-known and well-understood kind of factorization, known as
the LU-decomposition or the LU-factorization. Over a field, almost every square
matrix110 has an LU-decomposition (i.e., a factorization as a product of a lower-
triangular matrix with an upper-triangular matrix). This LU-decomposition is
unique if you require (e.g.) the diagonal entries of the lower-triangular factor
to all equal 1. It can furthermore be algorithmically computed using Gaussian
elimination (see, e.g., [OlvSha18, §1.3, Theorem 1.3]). Now, computing the LU-decomposition of the matrix $A$ from Proposition 6.4.30 for $n = 4$, we find
\[
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{pmatrix}
= \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{pmatrix}}_{\text{this is the lower-triangular factor}}
\underbrace{\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{\text{this is the upper-triangular factor}}.
\]
The entries of both factors appear to be the binomial coefficients familiar from Pascal's triangle. This suggests that we might have
\[
L = \left(\binom{i-1}{k-1}\right)_{1 \le i \le n,\ 1 \le k \le n}
\qquad\text{and}\qquad
U = \left(\binom{j-1}{k-1}\right)_{1 \le k \le n,\ 1 \le j \le n},
\]
not just for $n = 4$ but also for arbitrary $n \in \mathbb{N}$. And once this guess has been made, it is easy to prove that $A = LU$ (our proof above is not the only one possible; four proofs appear in [EdeStr04]).
This is not the only example where LU-decomposition helps compute a de-
terminant (see, e.g., [Kratte99, §2.6] for examples). Sometimes it is helpful to
transpose a matrix, or to permute its rows or columns to obtain a matrix with
a good LU-decomposition.
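The factorization $A = LU$ from the proof is easy to replicate in code. This Python sketch (my addition) builds $A$, $L$ and $U$ out of binomial coefficients for $n = 5$ and checks both $A = LU$ and $\det A = 1$; the 1-based indices $i, j, k$ of the notes become 0-based indices here.

```python
from itertools import permutations
from math import comb, prod

def det(M):
    """Determinant via the Leibniz expansion; fine for a 5x5 matrix."""
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

n = 5
A = [[comb(i + j, i) for j in range(n)] for i in range(n)]  # binom(i+j-2, i-1), 0-based
L = [[comb(i, k) for k in range(n)] for i in range(n)]      # lower-triangular factor
U = [[comb(j, k) for j in range(n)] for k in range(n)]      # upper-triangular factor

print(matmul(L, U) == A, det(A))  # True 1
```

The equality `matmul(L, U) == A` is precisely the Vandermonde-convolution computation $\sum_k \binom{i-1}{k-1}\binom{j-1}{k-1} = \binom{i+j-2}{i-1}$ from the proof, now verified entry by entry.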

6.4.6. Factor hunting


The next trick – known as factor hunting – works not only for determinants;
however, determinants provide some of the simplest examples.
110 I will not go into details as to what “almost every” means here.

Theorem 6.4.31 (Vandermonde determinant). Let $n \in \mathbb{N}$. Let $a_1, a_2, \ldots, a_n$ be $n$ elements of $K$. Then:

(a) We have
\[
\det\left(\left(a_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le i < j \le n} \left(a_i - a_j\right).
\]

(b) We have
\[
\det\left(\left(a_j^{n-i}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le i < j \le n} \left(a_i - a_j\right).
\]

(c) We have
\[
\det\left(\left(a_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le j < i \le n} \left(a_i - a_j\right).
\]

(d) We have
\[
\det\left(\left(a_j^{i-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le j < i \le n} \left(a_i - a_j\right).
\]

Example 6.4.32. Here is what the four parts of Theorem 6.4.31 say for $n = 3$:

(a) We have $\det\begin{pmatrix} a_1^2 & a_1 & 1 \\ a_2^2 & a_2 & 1 \\ a_3^2 & a_3 & 1 \end{pmatrix} = (a_1 - a_2)(a_1 - a_3)(a_2 - a_3)$.

(b) We have $\det\begin{pmatrix} a_1^2 & a_2^2 & a_3^2 \\ a_1 & a_2 & a_3 \\ 1 & 1 & 1 \end{pmatrix} = (a_1 - a_2)(a_1 - a_3)(a_2 - a_3)$.

(c) We have $\det\begin{pmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ 1 & a_3 & a_3^2 \end{pmatrix} = (a_2 - a_1)(a_3 - a_1)(a_3 - a_2)$.

(d) We have $\det\begin{pmatrix} 1 & 1 & 1 \\ a_1 & a_2 & a_3 \\ a_1^2 & a_2^2 & a_3^2 \end{pmatrix} = (a_2 - a_1)(a_3 - a_1)(a_3 - a_2)$.
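Parts (a) and (d) can be spot-checked numerically before reading the proof. Here is a small Python sketch (my own illustration, with integers substituted for the $a_i$):

```python
from itertools import permutations
from math import prod

def det(M):
    """Determinant via the Leibniz expansion; fine for tiny matrices."""
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

a = [2, 5, 7, 11]
n = len(a)

# part (a): entry a_i^{n-j} in 1-based indices, i.e. exponent n-1-j with 0-based j
Va = [[a[i] ** (n - 1 - j) for j in range(n)] for i in range(n)]
prod_a = prod(a[i] - a[j] for i in range(n) for j in range(i + 1, n))  # prod over i < j

# part (d): entry a_j^{i-1} in 1-based indices, i.e. a[j] ** i with 0-based i
Vd = [[a[j] ** i for j in range(n)] for i in range(n)]
prod_d = prod(a[i] - a[j] for j in range(n) for i in range(j + 1, n))  # prod over j < i

print(det(Va) == prod_a, det(Vd) == prod_d)  # True True
```

For this choice of $a$, both sides of (a) equal $6480$, and both sides of (d) equal $6480$ as well.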

Theorem 6.4.31 is a classical and important result, known as the Vandermonde determinant. Many different proofs are known (see, e.g., [Grinbe15, Theorem 6.46] or [Aigner07, §5.3] or [Bourba74, Section III.8.6, Example (1)] or [Grinbe10, Theorem 1]; a combinatorial proof can also be found in Exercise A.5.3.6; two more proofs are obtained in Exercise A.5.3.8 and Exercise A.5.3.9). We will now sketch a proof using factor hunting and polynomials. We will first focus on proving part (a) of Theorem 6.4.31, and afterwards derive the other parts from it.
The first step of our proof is reducing Theorem 6.4.31 (a) to the "particular case" in which $K$ is the polynomial ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$ and the elements $a_1, a_2, \ldots, a_n$ are the indeterminates $x_1, x_2, \ldots, x_n$. This is merely a particular case (one possible choice of $K$ and $a_1, a_2, \ldots, a_n$ among many); however, as we will soon see, proving Theorem 6.4.31 (a) in this particular case will quickly entail that Theorem 6.4.31 (a) holds in the general case. Let us elaborate on this argument. First, let us state Theorem 6.4.31 (a) in this particular case as a lemma:

Lemma 6.4.33. Let $n \in \mathbb{N}$. Consider the polynomial ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$ in $n$ indeterminates $x_1, x_2, \ldots, x_n$ with integer coefficients. In this ring, we have
\[
\det\left(\left(x_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le i < j \le n} \left(x_i - x_j\right). \tag{235}
\]

We can derive Theorem 6.4.31 (a) from Lemma 6.4.33 as follows:

Proof of Theorem 6.4.31 (a) using Lemma 6.4.33. The equality
\[
\det\left(\left(a_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le i < j \le n} \left(a_i - a_j\right) \tag{236}
\]
follows from the equality (235) by substituting $a_1, a_2, \ldots, a_n$ for $x_1, x_2, \ldots, x_n$.


This is sufficiently clear to be considered a complete proof, but just in case, here are
a few details.
We can substitute a1 , a2 , . . . , an for x1 , x2 , . . . , xn in any polynomial f ∈ Z [ x1 , x2 , . . . , xn ],
since a1 , a2 , . . . , an are n elements of a commutative ring (namely, of K). It is obvi-
ous that ∏ xi − x j becomes ∏

ai − a j when we substitute a1 , a2 , . . . , an for
1≤ i < j ≤ n 1≤ i < j ≤ n
  
n− j
x1 , x2 , . . . , xn . It is perhaps a bit less obvious that det xi becomes
1≤i ≤n, 1≤ j≤n
  
n− j
det ai when we substitute a1 , a2 , . . . , an for x1 , x2 , . . . , xn . To con-
1≤i ≤n, 1≤ j≤n
vince our skeptical selves of this, we expand both determinants: The definition of the
determinant yields
  
= ∑ (−1)σ x1
n− j n − σ (1) n − σ (2) n−σ(n)
det xi x2 · · · xn and
1≤i ≤n, 1≤ j≤n
σ ∈ Sn
  

n− j n − σ (1) n − σ (2) n−σ(n)
det ai = (−1)σ a1 a2 · · · an ,
1≤i ≤n, 1≤ j≤n
σ ∈ Sn
Math 701 Spring 2021, version April 6, 2024 page 382

and it is clear that substituting a1 , a2 , . . . , an for x1 , x2 , . . . , xn transforms


σ n − σ (1) n − σ (2) n−σ(n) n − σ (1) n − σ (2) n−σ(n)
∑ (−1) x1 x2 · · · xn into ∑ (−1)σ a1 a2 · · · an .
σ ∈ Sn σ ∈ Sn
Thus, (236) follows from (235). In other words, Theorem 6.4.31 (a) follows
from Lemma 6.4.33.
Arguments like the one we just used are frequently applied in algebra; see
[Conrad-UI] for some more examples.
It now remains to prove Lemma 6.4.33.
Proof of Lemma 6.4.33 (sketched). We set
\[
f := \det\left(\left(x_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
\qquad\text{and}\qquad
g := \prod_{1 \le i < j \le n} \left(x_i - x_j\right).
\]
Thus, we must prove that $f = g$.
We have
\[
f = \det\left(\left(x_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
= \sum_{\sigma \in S_n} (-1)^\sigma x_1^{n-\sigma(1)} x_2^{n-\sigma(2)} \cdots x_n^{n-\sigma(n)} \tag{237}
\]
(by the definition of a determinant). The right hand side of this equality is a homogeneous polynomial in $x_1, x_2, \ldots, x_n$ of degree $\dfrac{n(n-1)}{2}$ ${}^{111}$. Thus, $f$ is a homogeneous polynomial in $x_1, x_2, \ldots, x_n$ of degree $\dfrac{n(n-1)}{2}$. Furthermore, the monomial $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$ appears with coefficient $1$ on the right hand side of (237) (indeed, all the $n!$ addends in the sum on this right hand side contain distinct monomials, and thus only the addend for $\sigma = \operatorname{id}$ makes any contribution to the coefficient of the monomial $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$). Hence,
\[
\left[x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}\right] f = 1. \tag{238}
\]

111 because it is a $\mathbb{Z}$-linear combination of the monomials $x_1^{n-\sigma(1)} x_2^{n-\sigma(2)} \cdots x_n^{n-\sigma(n)}$, each of which has degree
\[
(n - \sigma(1)) + (n - \sigma(2)) + \cdots + (n - \sigma(n))
= n \cdot n - \underbrace{(\sigma(1) + \sigma(2) + \cdots + \sigma(n))}_{\substack{= 1+2+\cdots+n \\ \text{(since } \sigma \text{ is a permutation of } [n])}}
= n \cdot n - \underbrace{(1 + 2 + \cdots + n)}_{= n(n+1)/2}
= n \cdot n - \frac{n(n+1)}{2}
= \frac{n(n-1)}{2}.
\]

Therefore, $f \ne 0$.
Now, let $u$ and $v$ be two elements of $[n]$ satisfying $u < v$. Then, the polynomial $f$ becomes $0$ when we set $x_u$ equal to $x_v$ (that is, when we substitute $x_v$ for $x_u$). Indeed, when we set $x_u$ equal to $x_v$, the matrix $\left(x_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}$ becomes a matrix that has two equal rows (namely, its $u$-th and its $v$-th row both become equal to $\left(x_v^{n-1}, x_v^{n-2}, \ldots, x_v^{n-n}\right)$), and thus its determinant becomes $0$ (by Theorem 6.4.12 (c)); but this means precisely that $f$ becomes $0$ (since $f$ is the determinant of this matrix).
Now, we recall the following well-known property of univariate polynomials:

Root factoring-off theorem: Let $R$ be a commutative ring. Let $p \in R[t]$ be a univariate polynomial that has a root $r \in R$. Then, this polynomial $p$ is divisible by $t - r$ (in the ring $R[t]$).

Using this property, we can easily see that if a polynomial $p \in \mathbb{Z}[x_1, x_2, \ldots, x_n]$ becomes $0$ when we set $x_u$ equal to $x_v$, then this polynomial $p$ is a multiple of $x_u - x_v$ (in the ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$). (Indeed, viewing $p$ as a polynomial in the single indeterminate $x_u$ over the ring $\mathbb{Z}[x_1, x_2, \ldots, \widehat{x_u}, \ldots, x_n]$ ${}^{112}$, we see that $x_v$ is a root of $p$, and therefore the root factoring-off theorem yields that $p$ is divisible by $x_u - x_v$. ${}^{113}$)
Applying this observation to $p = f$, we conclude that $f$ is a multiple of $x_u - x_v$ (in the ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$), since we know that $f$ becomes $0$ when we set $x_u$ equal to $x_v$.
Forget that we fixed $u$ and $v$. We thus have shown that $f$ is a multiple of $x_u - x_v$ (in the ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$) whenever $u$ and $v$ are two elements of $[n]$ satisfying $u < v$. In other words, $f$ is a multiple of each of the linear polynomials
\[
x_1 - x_2,\ x_1 - x_3,\ \ldots,\ x_1 - x_n,\quad
x_2 - x_3,\ \ldots,\ x_2 - x_n,\quad
\ldots,\quad
x_{n-1} - x_n
\]
(in the ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$). However, these linear polynomials are irreducible${}^{114}$
112 The meaning of the hat over the "$x_u$" is as in Proposition 6.4.28: It signifies that the entry $x_u$ is omitted from the list (that is, "$x_1, x_2, \ldots, \widehat{x_u}, \ldots, x_n$" means "$x_1, x_2, \ldots, x_{u-1}, x_{u+1}, x_{u+2}, \ldots, x_n$").
113 Here, we are tacitly using the canonical isomorphism between the polynomial rings $\mathbb{Z}[x_1, x_2, \ldots, x_n]$ and $\left(\mathbb{Z}[x_1, x_2, \ldots, \widehat{x_u}, \ldots, x_n]\right)[x_u]$. This isomorphism allows us to treat multivariate polynomials in $\mathbb{Z}[x_1, x_2, \ldots, x_n]$ as univariate polynomials in the indeterminate $x_u$ over the ring $\mathbb{Z}[x_1, x_2, \ldots, \widehat{x_u}, \ldots, x_n]$, and vice versa (and ensures that the notion of divisibility does not change between the former and the latter).
114 Check this! (Actually, any linear polynomial over $\mathbb{Z}$ is irreducible if the gcd of its coefficients is 1.)
Math 701 Spring 2021, version April 6, 2024 page 384

and mutually non-associate (i.e., no two of them are associate$^{115}$) $^{116}$. Since the ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$ is a unique factorization domain$^{117}$, we thus conclude that any polynomial $p \in \mathbb{Z}[x_1, x_2, \ldots, x_n]$ that is a multiple of each of these linear polynomials must necessarily be a multiple of their product $\prod_{1 \le i < j \le n} (x_i - x_j)$.

Hence, the polynomial $f$ must be a multiple of this product $\prod_{1 \le i < j \le n} (x_i - x_j)$ (since $f$ is a multiple of each of the linear polynomials above). In other words, the polynomial $f$ must be a multiple of $g$ (since $g = \prod_{1 \le i < j \le n} (x_i - x_j)$). In other words, $f = gq$ for some $q \in \mathbb{Z}[x_1, x_2, \ldots, x_n]$. Consider this $q$. Clearly, $gq = f \neq 0$ and thus $q \neq 0$ and $g \neq 0$.

The ring $\mathbb{Z}$ is an integral domain. Thus, any two nonzero polynomials $a$ and $b$ over $\mathbb{Z}$ satisfy $\deg(ab) = \deg a + \deg b$. Applying this to $a = g$ and $b = q$, we find $\deg(gq) = \deg g + \deg q$. In other words, $\deg f = \deg g + \deg q$ (since $gq = f$).

Now, $g = \prod_{1 \le i < j \le n} (x_i - x_j)$ and thus
\[
\deg g = \deg \prod_{1 \le i < j \le n} (x_i - x_j) = \sum_{1 \le i < j \le n} \underbrace{\deg(x_i - x_j)}_{=1} = \sum_{1 \le i < j \le n} 1
= \left(\text{\# of pairs } (i,j) \in [n]^2 \text{ satisfying } i < j\right) = \binom{n}{2} = \frac{n(n-1)}{2}.
\]
However, we have $\deg f \le \dfrac{n(n-1)}{2}$ (since $f$ is a homogeneous polynomial of degree $\dfrac{n(n-1)}{2}$). In view of $\deg g = \dfrac{n(n-1)}{2}$, this rewrites as $\deg f \le \deg g$. Hence,
\[
\deg g \ge \deg f = \deg g + \deg q.
\]
Therefore, $0 \ge \deg q$. This shows that the polynomial $q$ is constant. In other words, $q \in \mathbb{Z}$.

It remains to show that this constant $q$ is $1$. However, this can be done by comparing some coefficients of $f$ and $g$. Indeed, let us look at the coefficient of $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$ (as we already know this coefficient for $f$ to be $1$). Expanding the product $\prod_{1 \le i < j \le n} (x_i - x_j)$, we obtain a sum of several (in fact, $2^{n(n-1)/2}$ many) monomials with $+$ and $-$ signs. I claim that among these

$^{115}$ Recall that two elements $a$ and $b$ of a commutative ring $R$ are said to be associate if there exists some unit $u$ of $R$ such that $a = ub$. Being associate is known (and easily verified) to be an equivalence relation.

$^{116}$ Check this! (Recall that the only units of the polynomial ring $\mathbb{Z}[x_1, x_2, \ldots, x_n]$ are $1$ and $-1$.)

$^{117}$ This is a nontrivial, but rather well-known result in abstract algebra. Proofs can be found in [Ford21, Theorem 3.7.4], [Knapp16, Remark after Corollary 8.21], [MiRiRu87, Chapter IV, Theorems 4.8 and 4.9] and [Edward05, Essay 1.4, Corollary of Theorem 1 and Corollary 1 of Theorem 2].
monomials, the monomial $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$ will appear exactly once, and with a $+$ sign. Indeed, in order to obtain $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$ when expanding the product
\[
\prod_{1 \le i < j \le n} (x_i - x_j) =
\begin{array}{cccc}
(x_1 - x_2) & (x_1 - x_3) & \cdots & (x_1 - x_n) \\
& (x_2 - x_3) & \cdots & (x_2 - x_n) \\
& & \ddots & \vdots \\
& & & (x_{n-1} - x_n),
\end{array}
\]
it is necessary to pick the $x_1$ minuends from all $n-1$ factors in the first row (since none of the other factors contain any $x_1$), then to pick the $x_2$ minuends from all $n-2$ factors in the second row (since none of the remaining factors contain any $x_2$), and so on -- i.e., to take the minuend (rather than the subtrahend) from each factor. Thus, only one of the monomials obtained by the expansion will be $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$, and it will appear with a $+$ sign. Hence, the coefficient of $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$ in the product $\prod_{1 \le i < j \le n} (x_i - x_j)$ is $1$. In other words, the coefficient of $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}$ in $g$ is $1$ (since $g = \prod_{1 \le i < j \le n} (x_i - x_j)$). In other words,
\[
\left[x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}\right] g = 1.
\]

Now, recall that $f = gq$. Hence,
\[
\left[x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}\right] f
= \left[x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}\right] (gq)
= q \cdot \underbrace{\left[x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}\right] g}_{=1}
\qquad (\text{since } q \in \mathbb{Z})
= q,
\]
so that $q = \left[x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}\right] f = 1$ (by (238)). Hence, $f = g \underbrace{q}_{=1} = g$. As we said, this completes the proof of Lemma 6.4.33.
The technique used in the above proof of Lemma 6.4.33 may appear somewhat underhanded: Instead of computing our polynomial $f$ upfront, we have kept lopping off linear factors from it until a constant polynomial remained (for degree reasons). This technique is called identification of factors or factor hunting, and is used in various different places, but particularly often in the computation of determinants (multiple examples are given in [Kratte99, §2.4 and further below]). While I consider it to be aesthetically inferior to sufficiently direct approaches, it has proven useful in situations in which no direct approaches are known to work.
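As a quick numerical sanity check of the Vandermonde determinant (not part of the proof above), one can compare $\det\left(\left(x_i^{n-j}\right)\right)$ with $\prod_{1 \le i < j \le n} (x_i - x_j)$ for concrete integer values. A minimal Python sketch, with helper names of our own choosing:

```python
def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if not M:
        return 1  # the determinant of the 0x0 matrix is 1
    return sum((-1) ** q * M[0][q] * det([row[:q] + row[q + 1:] for row in M[1:]])
               for q in range(len(M)))

def vandermonde_sides(xs):
    """Return (det((x_i^(n-j))_{i,j}), prod_{i<j} (x_i - x_j)) for the given values."""
    n = len(xs)
    lhs = det([[xs[i] ** (n - 1 - j) for j in range(n)] for i in range(n)])  # 0-indexed n-j
    rhs = 1
    for i in range(n):
        for j in range(i + 1, n):
            rhs *= xs[i] - xs[j]
    return lhs, rhs

print(vandermonde_sides([2, 5, 7, 11]))  # the two components agree
```

Of course, such a check for a few integer values proves nothing by itself; it merely guards against misremembering the sign or the exponents.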
Proof of Theorem 6.4.31 (sketched). (a) We have already given a proof of Theorem 6.4.31 (a) (and with Lemma 6.4.33 established, this proof is now complete).

(b) The matrix $\left(a_j^{n-i}\right)_{1 \le i \le n,\ 1 \le j \le n}$ is the transpose of the matrix $\left(a_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}$. Thus, Theorem 6.4.31 (b) follows from Theorem 6.4.31 (a) using Theorem 6.4.10.

(c) Let $A$ be the $n \times n$-matrix $\left(a_i^{n-j}\right)_{1 \le i \le n,\ 1 \le j \le n}$. Then, Theorem 6.4.31 (a) says that $\det A = \prod_{1 \le i < j \le n} (a_i - a_j)$.

Let $\tau \in S_n$ be the permutation of $[n]$ that sends each $j \in [n]$ to $n + 1 - j$. Then, each $i, j \in [n]$ satisfy $j - 1 = n - \tau(j)$ and therefore
\[
a_i^{j-1} = a_i^{n - \tau(j)} = A_{i, \tau(j)} \qquad (\text{by the definition of } A).
\]
Hence, $\left(a_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n} = \left(A_{i, \tau(j)}\right)_{1 \le i \le n,\ 1 \le j \le n}$ and thus
\[
\det\left(\left(a_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \det\left(\left(A_{i, \tau(j)}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = (-1)^{\tau} \cdot \det A
\]
(by (223)).

However, the permutation $\tau$ has OLN $(n, n-1, n-2, \ldots, 1)$. Thus, each pair $(i,j) \in [n]^2$ satisfying $i < j$ is an inversion of $\tau$. Therefore, the Coxeter length $\ell(\tau)$ of $\tau$ is the \# of all pairs $(i,j) \in [n]^2$ satisfying $i < j$. Thus, the sign of $\tau$ is
\[
(-1)^{\tau} = (-1)^{\ell(\tau)} = (-1)^{\left(\text{\# of all pairs } (i,j) \in [n]^2 \text{ satisfying } i < j\right)} = \prod_{1 \le i < j \le n} (-1)
\]
(by the definition of the sign of a permutation, and since $\ell(\tau)$ is the \# of all pairs $(i,j) \in [n]^2$ satisfying $i < j$).

Now,
\[
\det\left(\left(a_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
= \underbrace{(-1)^{\tau}}_{= \prod_{1 \le i < j \le n} (-1)} \cdot \underbrace{\det A}_{= \prod_{1 \le i < j \le n} (a_i - a_j)}
= \prod_{1 \le i < j \le n} \underbrace{(-1)\left(a_i - a_j\right)}_{= a_j - a_i}
= \prod_{1 \le i < j \le n} \left(a_j - a_i\right)
= \prod_{1 \le j < i \le n} \left(a_i - a_j\right)
\]
(here, we have renamed the index $(i,j)$ as $(j,i)$ in the product). This proves Theorem 6.4.31 (c).

(d) The matrix $\left(a_j^{i-1}\right)_{1 \le i \le n,\ 1 \le j \le n}$ is the transpose of the matrix $\left(a_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n}$. Thus, Theorem 6.4.31 (d) follows from Theorem 6.4.31 (c) using Theorem 6.4.10.

The Vandermonde determinant is itself a useful tool in the computation of various other determinants. Here is an example:

Proposition 6.4.34. Let $n \in \mathbb{N}$. Let $x_1, x_2, \ldots, x_n \in K$ and $y_1, y_2, \ldots, y_n \in K$. Then,
\[
\det\left(\left(\left(x_i + y_j\right)^{n-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
= \det\begin{pmatrix}
(x_1 + y_1)^{n-1} & (x_1 + y_2)^{n-1} & \cdots & (x_1 + y_n)^{n-1} \\
(x_2 + y_1)^{n-1} & (x_2 + y_2)^{n-1} & \cdots & (x_2 + y_n)^{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
(x_n + y_1)^{n-1} & (x_n + y_2)^{n-1} & \cdots & (x_n + y_n)^{n-1}
\end{pmatrix}
= \left(\prod_{k=0}^{n-1} \binom{n-1}{k}\right) \left(\prod_{1 \le i < j \le n} \left(x_i - x_j\right)\right) \left(\prod_{1 \le i < j \le n} \left(y_j - y_i\right)\right).
\]

First proof of Proposition 6.4.34 (sketched). Here is a rough outline of a proof that uses factor hunting (in the same way as in our above proofs of Theorem 6.4.31 (a) and Lemma 6.4.33). We WLOG assume that $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ are distinct indeterminates in a polynomial ring over $\mathbb{Z}$. (This is again an assumption that we can make, because the argument that we used to derive Theorem 6.4.31 (a) from Lemma 6.4.33 can be applied here as well.) Then, we can easily see that
\[
\det\left(\left(\left(x_i + y_j\right)^{n-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
\]
is a homogeneous polynomial of degree $n(n-1)$. This polynomial vanishes if we set any $x_u$ equal to any $x_v$ (for $u < v$), and also vanishes if we set any $y_u$ equal to any $y_v$ (for $u < v$). Thus we have identified $n(n-1)$ linear factors of this polynomial (namely, the differences $x_u - x_v$ and $y_v - y_u$ for $u < v$), and we can again conclude (since any polynomial ring over $\mathbb{Z}$ is a unique factorization domain) that
\[
\det\left(\left(\left(x_i + y_j\right)^{n-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
= \left(\prod_{1 \le i < j \le n} \left(x_i - x_j\right)\right) \left(\prod_{1 \le i < j \le n} \left(y_j - y_i\right)\right) \cdot q
\]
for a constant $q \in \mathbb{Z}$. It remains to prove that this constant $q$ equals $\prod_{k=0}^{n-1} \binom{n-1}{k}$. This can be done by studying the coefficients of the monomial $x_1^{n-1} x_2^{n-2} \cdots x_n^{n-n}\, y_1^0 y_2^1 \cdots y_n^{n-1}$ in $\det\left(\left(\left(x_i + y_j\right)^{n-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)$ and in $\left(\prod_{1 \le i < j \le n} \left(x_i - x_j\right)\right) \left(\prod_{1 \le i < j \le n} \left(y_j - y_i\right)\right)$. We leave the details to the reader.
Second proof of Proposition 6.4.34. (See [Grinbe15, Exercise 6.17 (b)] for details.) Let
\[
C := \left(\left(x_i + y_j\right)^{n-1}\right)_{1 \le i \le n,\ 1 \le j \le n} \in K^{n \times n}.
\]
For any $i, j \in [n]$, we have
\[
C_{i,j} = \left(x_i + y_j\right)^{n-1} = \sum_{k=0}^{n-1} \binom{n-1}{k} x_i^k y_j^{n-1-k} \qquad (\text{by the binomial formula})
\]
\[
= \sum_{k=1}^{n} \binom{n-1}{k-1} x_i^{k-1} y_j^{n-k}
\]
(here, we have substituted $k - 1$ for $k$ in the sum). If we define two $n \times n$-matrices $P$ and $Q$ by
\[
P := \left(\binom{n-1}{k-1} x_i^{k-1}\right)_{1 \le i \le n,\ 1 \le k \le n} \qquad \text{and} \qquad Q := \left(y_j^{n-k}\right)_{1 \le k \le n,\ 1 \le j \le n},
\]
then we can rewrite this as
\[
C_{i,j} = \sum_{k=1}^{n} P_{i,k} Q_{k,j} = (PQ)_{i,j}
\]
(by the definition of the matrix product). Since this holds for all $i, j \in [n]$, we thus obtain $C = PQ$. Hence,
\[
\det C = \det(PQ) = \det P \cdot \det Q \tag{239}
\]
(by Theorem 6.4.16). It now remains to compute $\det P$ and $\det Q$.

From $Q = \left(y_j^{n-k}\right)_{1 \le k \le n,\ 1 \le j \le n} = \left(y_j^{n-i}\right)_{1 \le i \le n,\ 1 \le j \le n}$, we obtain
\[
\det Q = \det\left(\left(y_j^{n-i}\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le i < j \le n} \left(y_i - y_j\right) \tag{240}
\]
(by Theorem 6.4.31 (b), applied to $a_i = y_i$).

From $P = \left(\binom{n-1}{k-1} x_i^{k-1}\right)_{1 \le i \le n,\ 1 \le k \le n} = \left(\binom{n-1}{j-1} x_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n}$, we obtain
\[
\det P = \det\left(\left(\binom{n-1}{j-1} x_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
= \underbrace{\binom{n-1}{1-1} \binom{n-1}{2-1} \cdots \binom{n-1}{n-1}}_{= \prod_{k=0}^{n-1} \binom{n-1}{k}} \cdot \underbrace{\det\left(\left(x_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)}_{\substack{= \prod_{1 \le j < i \le n} (x_i - x_j) \\ \text{(by Theorem 6.4.31 (c), applied to } a_i = x_i\text{)}}}
\]
\[
\left(\text{by (228), applied to } A = \left(x_i^{j-1}\right)_{1 \le i \le n,\ 1 \le j \le n} \text{ and } d_i = \binom{n-1}{i-1}\right)
\]
\[
= \left(\prod_{k=0}^{n-1} \binom{n-1}{k}\right) \cdot \prod_{1 \le j < i \le n} \left(x_i - x_j\right)
= \left(\prod_{k=0}^{n-1} \binom{n-1}{k}\right) \cdot \prod_{1 \le i < j \le n} \left(x_j - x_i\right) \tag{241}
\]
(here, we have renamed the index $(j,i)$ as $(i,j)$ in the second product).

Now, (239) becomes
\[
\det C = \det P \cdot \det Q
= \left(\prod_{k=0}^{n-1} \binom{n-1}{k}\right) \cdot \left(\prod_{1 \le i < j \le n} \left(x_j - x_i\right)\right) \cdot \left(\prod_{1 \le i < j \le n} \left(y_i - y_j\right)\right)
\]
(by (241) and (240))
\[
= \left(\prod_{k=0}^{n-1} \binom{n-1}{k}\right) \cdot \prod_{1 \le i < j \le n} \underbrace{\left(x_j - x_i\right)\left(y_i - y_j\right)}_{= \left(x_i - x_j\right)\left(y_j - y_i\right)}
= \left(\prod_{k=0}^{n-1} \binom{n-1}{k}\right) \left(\prod_{1 \le i < j \le n} \left(x_i - x_j\right)\right) \left(\prod_{1 \le i < j \le n} \left(y_j - y_i\right)\right).
\]
This proves Proposition 6.4.34 (since $C = \left(\left(x_i + y_j\right)^{n-1}\right)_{1 \le i \le n,\ 1 \le j \le n}$).
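Both the identity of Proposition 6.4.34 and the factorization $C = PQ$ at the heart of the second proof can be checked numerically for small $n$. A Python sketch (stdlib only; `math.comb` gives binomial coefficients; the function name is ours):

```python
from math import comb

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** q * M[0][q] * det([row[:q] + row[q + 1:] for row in M[1:]])
               for q in range(len(M)))

def check_6_4_34(xs, ys):
    """Return (det C, right hand side of Prop. 6.4.34, whether C == PQ)."""
    n = len(xs)
    C = [[(xs[i] + ys[j]) ** (n - 1) for j in range(n)] for i in range(n)]
    # The matrices P and Q from the second proof (shifted to 0-indexing):
    P = [[comb(n - 1, k) * xs[i] ** k for k in range(n)] for i in range(n)]
    Q = [[ys[j] ** (n - 1 - k) for j in range(n)] for k in range(n)]
    PQ = [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    rhs = 1
    for k in range(n):
        rhs *= comb(n - 1, k)
    for i in range(n):
        for j in range(i + 1, n):
            rhs *= (xs[i] - xs[j]) * (ys[j] - ys[i])
    return det(C), rhs, C == PQ

print(check_6_4_34([1, 2, 4], [3, 5, 9]))
```

The third component confirms the binomial-formula factorization $C = PQ$; the first two confirm the determinant identity itself.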

6.4.7. Laplace expansion

Let us next recall another fundamental property of determinants: Laplace expansion. We will use the following notation:
Convention 6.4.35. Let $n \in \mathbb{N}$. Let $A$ be an $n \times n$-matrix. Let $i, j \in [n]$. Then, we set
\[
A_{\sim i, \sim j} := \operatorname{sub}_{[n] \setminus \{i\}}^{[n] \setminus \{j\}} A \qquad (\text{using the notation from Definition 6.4.21}).
\]
This is the $(n-1) \times (n-1)$-matrix obtained from $A$ by removing its $i$-th row and its $j$-th column.

Example 6.4.36. If $A = \begin{pmatrix} a & b & c \\ a' & b' & c' \\ a'' & b'' & c'' \end{pmatrix}$, then $A_{\sim 1, \sim 2} = \begin{pmatrix} a' & c' \\ a'' & c'' \end{pmatrix}$.

Now, we can state the theorem underlying Laplace expansion:

Theorem 6.4.37. Let $n \in \mathbb{N}$. Let $A \in K^{n \times n}$ be an $n \times n$-matrix.
(a) For every $p \in [n]$, we have
\[
\det A = \sum_{q=1}^{n} (-1)^{p+q} A_{p,q} \det\left(A_{\sim p, \sim q}\right).
\]
(b) For every $q \in [n]$, we have
\[
\det A = \sum_{p=1}^{n} (-1)^{p+q} A_{p,q} \det\left(A_{\sim p, \sim q}\right).
\]

Note that some authors denote the minors $\det\left(A_{\sim p, \sim q}\right)$ in Theorem 6.4.37 by $A_{p,q}$. This is, of course, totally incompatible with our notations.

Proof of Theorem 6.4.37. See [Grinbe15, Theorem 6.82] or [Ford21, Lemma 4.5.7] or [Laue15, 5.8 and 5.8'] or [Strick13, Proposition B.24 and Proposition B.25] or [Loehr11, Theorem 9.48].

Theorem 6.4.37 yields several ways to compute the determinant of a matrix$^{118}$. When we compute a determinant $\det A$ using Theorem 6.4.37 (a), we say that we expand this determinant along the $p$-th row. When we compute a determinant $\det A$ using Theorem 6.4.37 (b), we say that we expand this determinant along the $q$-th column.

Theorem 6.4.37 has many applications, some of which you have probably seen in your course on linear algebra. (A few might appear on the homework.)

$^{118}$ In fact, some authors use Theorem 6.4.37 as a definition of the determinant. (However, this is somewhat tricky, as it requires proving that all the values obtained for $\det A$ by applying Theorem 6.4.37 are actually equal.)
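Theorem 6.4.37 (a) translates directly into a recursive determinant routine, and expanding along different rows must give the same answer. A minimal Python sketch (0-indexed, so the sign is $(-1)^{p+q}$ with $p$ and $q$ starting from 0; names are ours):

```python
def minor(M, i, j):
    """The matrix M_{~i,~j}: remove row i and column j (0-indexed)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det_along_row(M, p=0):
    """Expand det(M) along row p, as in Theorem 6.4.37 (a)."""
    n = len(M)
    if n == 0:
        return 1
    return sum((-1) ** (p + q) * M[p][q] * det_along_row(minor(M, p, q))
               for q in range(n))

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print([det_along_row(A, p) for p in range(3)])  # → [18, 18, 18]
```

This recursive approach takes roughly $n!$ steps and is only meant for tiny matrices; for actual computation one would use Gaussian elimination instead.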
The main theoretical application of Theorem 6.4.37 is the concept of the adjugate (or classical adjoint) of a matrix, which we shall introduce in a few moments. First, let us see what happens if we replace the $A_{p,q}$ in Theorem 6.4.37 by entries from another row or another column. In fact, the respective sums become $0$ (instead of $\det A$), as the following proposition shows:

Proposition 6.4.38. Let $n \in \mathbb{N}$. Let $A \in K^{n \times n}$ be an $n \times n$-matrix. Let $r \in [n]$.
(a) For every $p \in [n]$ satisfying $p \neq r$, we have
\[
0 = \sum_{q=1}^{n} (-1)^{p+q} A_{r,q} \det\left(A_{\sim p, \sim q}\right).
\]
(b) For every $q \in [n]$ satisfying $q \neq r$, we have
\[
0 = \sum_{p=1}^{n} (-1)^{p+q} A_{p,r} \det\left(A_{\sim p, \sim q}\right).
\]

Proof. See [Grinbe15, Proposition 6.96]. This is also implicit in [Strick13, proof of Proposition B.28] and [Loehr11, proof of Theorem 9.50].
Here is a sketch of the proof:
(a) Let $p \in [n]$ satisfy $p \neq r$. Let $C$ be the matrix $A$ with its $p$-th row replaced by its $r$-th row. Then, the matrix $C$ has two equal rows, so that $\det C = 0$ (by Theorem 6.4.12 (c)). On the other hand, expanding $\det C$ along the $p$-th row (i.e., applying Theorem 6.4.37 (a) to $C$ instead of $A$) yields
\[
\det C = \sum_{q=1}^{n} (-1)^{p+q} \underbrace{C_{p,q}}_{= A_{r,q}} \det\Big(\underbrace{C_{\sim p, \sim q}}_{= A_{\sim p, \sim q}}\Big) = \sum_{q=1}^{n} (-1)^{p+q} A_{r,q} \det\left(A_{\sim p, \sim q}\right).
\]
Comparing these two equalities, we obtain the claim of Proposition 6.4.38 (a). A similar argument proves Proposition 6.4.38 (b).
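Proposition 6.4.38 (a) says that pairing the cofactors of row $p$ with the entries of a different row $r$ kills the sum. A quick numerical check (0-indexed; helper names are ours):

```python
def minor(M, i, j):
    """Remove row i and column j (0-indexed)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** q * M[0][q] * det(minor(M, 0, q)) for q in range(len(M)))

def row_cofactor_sum(A, p, r):
    """sum_q (-1)^(p+q) A[r][q] det(A_{~p,~q}); equals det A if r == p, else 0."""
    return sum((-1) ** (p + q) * A[r][q] * det(minor(A, p, q))
               for q in range(len(A)))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print([row_cofactor_sum(A, p, r) for p in range(3) for r in range(3) if p != r])
# → [0, 0, 0, 0, 0, 0]  (all "alien cofactor" sums vanish)
```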

We can now define the adjugate of a matrix:

Definition 6.4.39. Let $n \in \mathbb{N}$. Let $A \in K^{n \times n}$ be an $n \times n$-matrix. We define a new $n \times n$-matrix $\operatorname{adj} A \in K^{n \times n}$ by
\[
\operatorname{adj} A = \left((-1)^{i+j} \det\left(A_{\sim j, \sim i}\right)\right)_{1 \le i \le n,\ 1 \le j \le n}.
\]
This matrix $\operatorname{adj} A$ is called the adjugate of the matrix $A$. (Some older texts call it the adjoint, but this name has since been conquered by a different notion. As a compromise, some still call $\operatorname{adj} A$ the classical adjoint of $A$.)
Example 6.4.40. The adjugate of the $0 \times 0$-matrix is the $0 \times 0$-matrix.
The adjugate of a $1 \times 1$-matrix $\begin{pmatrix} a \end{pmatrix}$ is $\operatorname{adj} \begin{pmatrix} a \end{pmatrix} = \begin{pmatrix} 1 \end{pmatrix}$.
The adjugate of a $2 \times 2$-matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is
\[
\operatorname{adj} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]
The adjugate of a $3 \times 3$-matrix $\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$ is
\[
\operatorname{adj} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
= \begin{pmatrix}
ei - fh & ch - bi & bf - ce \\
fg - di & ai - cg & cd - af \\
dh - ge & bg - ah & ae - bd
\end{pmatrix}.
\]

The main property of the adjugate $\operatorname{adj} A$ is its connection to the inverse $A^{-1}$ of a matrix $A$. Indeed, if an $n \times n$-matrix $A$ is invertible, then its inverse $A^{-1}$ is $\dfrac{1}{\det A} \cdot \operatorname{adj} A$. More generally, even if $A$ is not invertible, the product of $\operatorname{adj} A$ with $A$ (in either order) equals $(\det A) \cdot I_n$ (where $I_n$ is the $n \times n$ identity matrix). Let us state this as a theorem:

Theorem 6.4.41. Let $n \in \mathbb{N}$. Let $A \in K^{n \times n}$ be an $n \times n$-matrix. Let $I_n$ denote the $n \times n$ identity matrix. Then,
\[
A \cdot (\operatorname{adj} A) = (\operatorname{adj} A) \cdot A = (\det A) \cdot I_n.
\]

Proof. See [Grinbe15, Theorem 6.100] or [Ford21, Lemma 4.5.9] or [Loehr11, Theorem 9.50] or [Strick13, Proposition B.28].
Here is a sketch of the argument: In order to show that $A \cdot (\operatorname{adj} A) = (\det A) \cdot I_n$, it suffices to check that the $(i,j)$-th entry of $A \cdot (\operatorname{adj} A)$ equals $\det A$ whenever $i = j$ and equals $0$ otherwise. However, this follows from Theorem 6.4.37 (a) (in the case $i = j$) and Proposition 6.4.38 (a) (in the case $i \neq j$). Thus, $A \cdot (\operatorname{adj} A) = (\det A) \cdot I_n$ is proved. Similarly, $(\operatorname{adj} A) \cdot A = (\det A) \cdot I_n$ can be shown.
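Definition 6.4.39 and Theorem 6.4.41 are easy to verify on small matrices. A short Python sketch (note the swapped indices $\sim j, \sim i$ in the adjugate entry; names are ours):

```python
def minor(M, i, j):
    """Remove row i and column j (0-indexed)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** q * M[0][q] * det(minor(M, 0, q)) for q in range(len(M)))

def adjugate(A):
    """adj A: entry (i, j) is (-1)^(i+j) det(A_{~j,~i}) -- note the transposed indices."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
print(adjugate(A))             # → [[4, -2], [-3, 1]]
print(matmul(A, adjugate(A)))  # → [[-2, 0], [0, -2]], i.e. (det A) * I_2
```

Forgetting to transpose (using $A_{\sim i, \sim j}$ instead of $A_{\sim j, \sim i}$) is a classical mistake; the check above catches it immediately for non-symmetric $A$.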

More about the adjugate matrix can be found in [Grinbe15, §6.15] and [Grinbe19,
§5.4–§5.6]; see also [Robins05] for some applications.
There is also a generalization of Theorem 6.4.37, called Laplace expansion along
multiple rows (or columns):
Theorem 6.4.42. Let $n \in \mathbb{N}$. Let $A \in K^{n \times n}$ be an $n \times n$-matrix. We shall use the notations $\widetilde{I}$ and $\operatorname{sum} S$ as defined in Theorem 6.4.23.
(a) For every subset $P$ of $[n]$, we have
\[
\det A = \sum_{\substack{Q \subseteq [n];\\ |Q| = |P|}} (-1)^{\operatorname{sum} P + \operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right) \det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} A\right).
\]
(b) For every subset $Q$ of $[n]$, we have
\[
\det A = \sum_{\substack{P \subseteq [n];\\ |P| = |Q|}} (-1)^{\operatorname{sum} P + \operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right) \det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} A\right).
\]

Example 6.4.43. Let $n = 4$ and $A = \begin{pmatrix} a & b & c & d \\ a' & b' & c' & d' \\ a'' & b'' & c'' & d'' \\ a''' & b''' & c''' & d''' \end{pmatrix}$ and $P = \{3, 4\}$.
Then, Theorem 6.4.42 (a) says that
\[
\det A = \sum_{\substack{Q \subseteq [n];\\ |Q| = |P|}} (-1)^{\operatorname{sum} P + \operatorname{sum} Q} \det\left(\operatorname{sub}_P^Q A\right) \det\left(\operatorname{sub}_{\widetilde{P}}^{\widetilde{Q}} A\right)
\]
\[
\begin{aligned}
&= (-1)^{\operatorname{sum}\{3,4\} + \operatorname{sum}\{1,2\}} \det\left(\operatorname{sub}_{\{3,4\}}^{\{1,2\}} A\right) \det\left(\operatorname{sub}_{\widetilde{\{3,4\}}}^{\widetilde{\{1,2\}}} A\right)
+ (-1)^{\operatorname{sum}\{3,4\} + \operatorname{sum}\{1,3\}} \det\left(\operatorname{sub}_{\{3,4\}}^{\{1,3\}} A\right) \det\left(\operatorname{sub}_{\widetilde{\{3,4\}}}^{\widetilde{\{1,3\}}} A\right) \\
&\quad + (-1)^{\operatorname{sum}\{3,4\} + \operatorname{sum}\{1,4\}} \det\left(\operatorname{sub}_{\{3,4\}}^{\{1,4\}} A\right) \det\left(\operatorname{sub}_{\widetilde{\{3,4\}}}^{\widetilde{\{1,4\}}} A\right)
+ (-1)^{\operatorname{sum}\{3,4\} + \operatorname{sum}\{2,3\}} \det\left(\operatorname{sub}_{\{3,4\}}^{\{2,3\}} A\right) \det\left(\operatorname{sub}_{\widetilde{\{3,4\}}}^{\widetilde{\{2,3\}}} A\right) \\
&\quad + (-1)^{\operatorname{sum}\{3,4\} + \operatorname{sum}\{2,4\}} \det\left(\operatorname{sub}_{\{3,4\}}^{\{2,4\}} A\right) \det\left(\operatorname{sub}_{\widetilde{\{3,4\}}}^{\widetilde{\{2,4\}}} A\right)
+ (-1)^{\operatorname{sum}\{3,4\} + \operatorname{sum}\{3,4\}} \det\left(\operatorname{sub}_{\{3,4\}}^{\{3,4\}} A\right) \det\left(\operatorname{sub}_{\widetilde{\{3,4\}}}^{\widetilde{\{3,4\}}} A\right)
\end{aligned}
\]
\[
\begin{aligned}
&= \det\begin{pmatrix} a'' & b'' \\ a''' & b''' \end{pmatrix} \det\begin{pmatrix} c & d \\ c' & d' \end{pmatrix}
- \det\begin{pmatrix} a'' & c'' \\ a''' & c''' \end{pmatrix} \det\begin{pmatrix} b & d \\ b' & d' \end{pmatrix} \\
&\quad + \det\begin{pmatrix} a'' & d'' \\ a''' & d''' \end{pmatrix} \det\begin{pmatrix} b & c \\ b' & c' \end{pmatrix}
+ \det\begin{pmatrix} b'' & c'' \\ b''' & c''' \end{pmatrix} \det\begin{pmatrix} a & d \\ a' & d' \end{pmatrix} \\
&\quad - \det\begin{pmatrix} b'' & d'' \\ b''' & d''' \end{pmatrix} \det\begin{pmatrix} a & c \\ a' & c' \end{pmatrix}
+ \det\begin{pmatrix} c'' & d'' \\ c''' & d''' \end{pmatrix} \det\begin{pmatrix} a & b \\ a' & b' \end{pmatrix}.
\end{aligned}
\]

Proof of Theorem 6.4.42. See [Grinbe15, Theorem 6.156].
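Theorem 6.4.42 (a) can likewise be checked by brute force: for a fixed row set $P$, sum over all column sets $Q$ of the same size. A Python sketch (0-indexed; the parity of $\operatorname{sum} P + \operatorname{sum} Q$ is unaffected by the shift from 1-indexing, since $|P| = |Q|$; names are ours):

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** q * M[0][q] * det([row[:q] + row[q + 1:] for row in M[1:]])
               for q in range(len(M)))

def sub(A, rows, cols):
    """sub^{cols}_{rows} A: keep the given rows and columns (in increasing order)."""
    return [[A[i][j] for j in cols] for i in rows]

def det_multirow(A, P):
    """Laplace expansion along the rows in P (Theorem 6.4.42 (a), 0-indexed)."""
    n = len(A)
    P = sorted(P)
    Pc = [i for i in range(n) if i not in P]
    total = 0
    for Q in combinations(range(n), len(P)):
        Qc = [j for j in range(n) if j not in Q]
        total += ((-1) ** (sum(P) + sum(Q))
                  * det(sub(A, P, Q)) * det(sub(A, Pc, Qc)))
    return total

A = [[1, 2, 0, 1], [0, 1, 3, 2], [2, 1, 1, 0], [1, 0, 2, 1]]
print(det_multirow(A, [2, 3]), det(A))  # both give det A
```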

6.4.8. Desnanot–Jacobi and Dodgson condensation

We come to more exotic results. The following theorem is one of several versions of the Desnanot–Jacobi formula:

Theorem 6.4.44 (Desnanot–Jacobi formula, take 1). Let $n \in \mathbb{N}$ be such that $n \ge 2$. Let $A \in K^{n \times n}$ be an $n \times n$-matrix.
Let $A'$ be the $(n-2) \times (n-2)$-matrix
\[
\operatorname{sub}_{\{2,3,\ldots,n-1\}}^{\{2,3,\ldots,n-1\}} A = \left(A_{i+1,j+1}\right)_{1 \le i \le n-2,\ 1 \le j \le n-2}.
\]
(This is precisely what remains of the matrix $A$ when we remove the first row, the last row, the first column and the last column.) Then,
\[
\det A \cdot \det A' = \det\left(A_{\sim 1, \sim 1}\right) \cdot \det\left(A_{\sim n, \sim n}\right) - \det\left(A_{\sim 1, \sim n}\right) \cdot \det\left(A_{\sim n, \sim 1}\right)
= \det\begin{pmatrix} \det\left(A_{\sim 1, \sim 1}\right) & \det\left(A_{\sim 1, \sim n}\right) \\ \det\left(A_{\sim n, \sim 1}\right) & \det\left(A_{\sim n, \sim n}\right) \end{pmatrix}.
\]

Example 6.4.45. For $n = 3$, this is saying that
\[
\det\begin{pmatrix} a & b & c \\ a' & b' & c' \\ a'' & b'' & c'' \end{pmatrix} \cdot \det\begin{pmatrix} b' \end{pmatrix}
= \det\begin{pmatrix} b' & c' \\ b'' & c'' \end{pmatrix} \cdot \det\begin{pmatrix} a & b \\ a' & b' \end{pmatrix}
- \det\begin{pmatrix} a' & b' \\ a'' & b'' \end{pmatrix} \cdot \det\begin{pmatrix} b & c \\ b' & c' \end{pmatrix}.
\]

Proof of Theorem 6.4.44. See [Grinbe15, Proposition 6.122] or [Bresso99, §3.5, proof of Theorem 3.12] or [Zeilbe98].

Theorem 6.4.44 provides a recursive way of computing determinants: Indeed, in the setting of Theorem 6.4.44, if $\det(A')$ is invertible (which, when $K$ is a field, simply means that $\det(A') \neq 0$), then Theorem 6.4.44 yields
\[
\det A = \frac{\det\left(A_{\sim 1, \sim 1}\right) \cdot \det\left(A_{\sim n, \sim n}\right) - \det\left(A_{\sim 1, \sim n}\right) \cdot \det\left(A_{\sim n, \sim 1}\right)}{\det(A')}. \tag{242}
\]
The five matrices appearing on the right hand side of this are smaller than $A$, so their determinants are often easier to compute than $\det A$. In particular, if you are proving something by strong induction on $n$, you will occasionally be able to use the induction hypothesis to compute these determinants. This method of recursively simplifying determinants is often known as Dodgson condensation, as it was popularized (perhaps even discovered) by Charles Lutwidge Dodgson (aka Lewis Carroll) in [Dodgso67, Appendix II]. We outline a sample application:

Theorem 6.4.46 (Cauchy determinant). Let $n \in \mathbb{N}$. Let $x_1, x_2, \ldots, x_n$ be $n$ elements of $K$. Let $y_1, y_2, \ldots, y_n$ be $n$ elements of $K$. Assume that $x_i + y_j$ is invertible in $K$ for each $(i,j) \in [n]^2$. Then,
\[
\det\left(\left(\frac{1}{x_i + y_j}\right)_{1 \le i \le n,\ 1 \le j \le n}\right)
= \frac{\prod_{1 \le i < j \le n} \left(x_i - x_j\right)\left(y_i - y_j\right)}{\prod_{(i,j) \in [n]^2} \left(x_i + y_j\right)}.
\]
Once again, there are many ways to prove this (see, e.g., [Grinbe15, Exercise 6.18], [Prasol94, §1.3], [Grinbe09, Theorem 2], or https://proofwiki.org/wiki/Value_of_Cauchy_Determinant ). But using the Desnanot–Jacobi identity, there is a rather straightforward proof of Theorem 6.4.46. Indeed, if $A$ is a Cauchy matrix (i.e., a matrix of the form $\left(\dfrac{1}{x_i + y_j}\right)_{1 \le i \le n,\ 1 \le j \le n}$), then so is each submatrix of $A$. Thus, if we proceed by strong induction on $n$, we can use the induction hypothesis to compute all five determinants on the right hand side of (242). The only difficulty is making sure that $\det(A')$ is invertible. To achieve this, we again have to WLOG assume that our $x$'s and $y$'s are indeterminates in a polynomial ring, and we have to rewrite the claim of Theorem 6.4.46 in the form
\[
\det\left(\left(\prod_{k \neq j} \left(x_i + y_k\right)\right)_{1 \le i \le n,\ 1 \le j \le n}\right) = \prod_{1 \le i < j \le n} \left(x_i - x_j\right)\left(y_i - y_j\right)
\]
in order for both sides to actually be polynomials in $\mathbb{Z}[x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n]$. The details are left to the reader.
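The condensation recursion (242) is only a few lines of code, and on a Cauchy matrix with distinct parameters every minor that arises is again a Cauchy determinant, hence nonzero, so the recursion never divides by zero. A Python sketch with exact rational arithmetic, comparing against the product formula of Theorem 6.4.46 (function names are ours; the recursion is exponential-time and meant only for tiny matrices):

```python
from fractions import Fraction

def dodgson(A):
    """Compute det A by the Desnanot-Jacobi recursion (242); assumes the inner
    determinant det(A') is nonzero at every stage (true for Cauchy matrices)."""
    n = len(A)
    if n == 0:
        return Fraction(1)
    if n == 1:
        return A[0][0]
    drop = lambda M, i, j: [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
    inner = [row[1:-1] for row in A[1:-1]]  # A' (empty when n == 2, det = 1)
    return ((dodgson(drop(A, 0, 0)) * dodgson(drop(A, n - 1, n - 1))
             - dodgson(drop(A, 0, n - 1)) * dodgson(drop(A, n - 1, 0)))
            / dodgson(inner))

xs = [Fraction(v) for v in (1, 2, 4, 8)]
ys = [Fraction(v) for v in (3, 5, 9, 17)]
A = [[1 / (x + y) for y in ys] for x in xs]

# Right hand side of Theorem 6.4.46:
num, den = Fraction(1), Fraction(1)
for i in range(4):
    for j in range(i + 1, 4):
        num *= (xs[i] - xs[j]) * (ys[i] - ys[j])
for x in xs:
    for y in ys:
        den *= x + y
print(dodgson(A) == num / den)  # → True
```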
Theorem 6.4.44 can be generalized significantly. Here is the simplest generalization, in which the special role played by the first and last rows and the first and last columns is instead given to any two rows and any two columns:

Theorem 6.4.47. Let $n \in \mathbb{N}$ be such that $n \ge 2$. Let $p$, $q$, $u$ and $v$ be four elements of $[n]$ such that $p < q$ and $u < v$. Let $A$ be an $n \times n$-matrix. Then,
\[
\det A \cdot \det\left(\operatorname{sub}_{[n] \setminus \{p,q\}}^{[n] \setminus \{u,v\}} A\right)
= \det\left(A_{\sim p, \sim u}\right) \cdot \det\left(A_{\sim q, \sim v}\right) - \det\left(A_{\sim p, \sim v}\right) \cdot \det\left(A_{\sim q, \sim u}\right).
\]

Proof. See [Grinbe15, Theorem 6.126].

Even more generally, Jacobi's complementary minor theorem for adjugates (appearing, e.g., in [Grinbe19, Theorem 5.22], or in equivalent forms in [Prasol94, Theorem 2.5.2] and [BruSch83, (13)]) says the following:
Theorem 6.4.48 (Jacobi’s complementary minor theorem for adjugates). Let
n ∈ N. For any subset I of [n], we let e
I denote the complement [n] \ I of I. Set
sum S = ∑ s for any finite set S of integers. (For example, sum {2, 4, 5} =
s∈S
2 + 4 + 5 = 11.)
Let A ∈ K n×n be an n × n-matrix. Let P and Q be two subsets of [n] such
that | P| = | Q| ≥ 1. Then,
   
Q sum P+sum Q | Q|−1 Pe
det subP (adj A) = (−1) (det A) det subQe A .
Theorem 6.4.47 is the particular case of Theorem 6.4.48 for $P = \{u, v\}$ and $Q = \{p, q\}$. $^{119}$ Theorem 6.4.44 is, of course, the particular case of Theorem 6.4.47 for $p = 1$ and $q = n$ and $u = 1$ and $v = n$.
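Theorem 6.4.48 is also easy to test numerically. The sketch below (0-indexed; again the parity of $\operatorname{sum} P + \operatorname{sum} Q$ is unchanged by the index shift, since $|P| = |Q|$) builds both sides for a concrete matrix; all names are ours:

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** q * M[0][q] * det([row[:q] + row[q + 1:] for row in M[1:]])
               for q in range(len(M)))

def minor(M, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def adjugate(A):
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

def sub(A, rows, cols):
    """sub^{cols}_{rows} A: keep the given rows and columns."""
    return [[A[i][j] for j in cols] for i in rows]

def jacobi_sides(A, P, Q):
    """Both sides of Theorem 6.4.48 for subsets P, Q of equal size (0-indexed)."""
    n = len(A)
    Pc = [i for i in range(n) if i not in P]
    Qc = [j for j in range(n) if j not in Q]
    lhs = det(sub(adjugate(A), P, Q))
    rhs = (-1) ** (sum(P) + sum(Q)) * det(A) ** (len(Q) - 1) * det(sub(A, Qc, Pc))
    return lhs, rhs

A = [[1, 2, 3, 4], [2, 1, 0, 1], [0, 3, 1, 2], [1, 0, 2, 1]]
print(jacobi_sides(A, [0, 2], [1, 3]))  # the two sides agree
```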

6.5. The Lindström–Gessel–Viennot lemma


We have so far mostly been discussing algebraic properties of determinants, if often from a combinatorial point of view. We shall now see a situation where determinants naturally appear in combinatorics.
Namely, we will see an application of determinants to lattice path enumeration -- i.e., to the counting of paths on a certain infinite digraph called the integer lattice. A survey of this subject can be found in [Kratte17]; we will restrict ourselves to one of the most accessible highlights: the Lindström–Gessel–Viennot lemma (for short: LGV lemma). This "lemma" is by now so classical and popular that it is often deservedly called the Lindström–Gessel–Viennot theorem. Alternate treatments of this theorem can be found in [Sagan19, §2.5], in [Stanle11, §2.7] and in [GesVie89, §2]. Some applications predating the general statement of the theorem can be found in [GesVie85], and a surprising recent generalization in [Talask12, Theorem 2.5].

6.5.1. Definitions
We have already seen lattice paths (and even counted them in Subsection 4.4.1).
We shall now introduce them formally and study them in greater depth.

$^{119}$ Indeed, if we set $P = \{u, v\}$ and $Q = \{p, q\}$ in the situation of Theorem 6.4.47, then
\[
\operatorname{sub}_P^Q (\operatorname{adj} A) = \begin{pmatrix}
(-1)^{u+p} \det\left(A_{\sim p, \sim u}\right) & (-1)^{u+q} \det\left(A_{\sim q, \sim u}\right) \\
(-1)^{v+p} \det\left(A_{\sim p, \sim v}\right) & (-1)^{v+q} \det\left(A_{\sim q, \sim v}\right)
\end{pmatrix}
\qquad \text{and} \qquad
\operatorname{sub}_{\widetilde{Q}}^{\widetilde{P}} A = \operatorname{sub}_{[n] \setminus \{p,q\}}^{[n] \setminus \{u,v\}} A;
\]
thus, Theorem 6.4.48 is easily seen to boil down to Theorem 6.4.47 in this case (the powers of $-1$ all cancel).
Convention 6.5.1. (a) Recall that "digraph" means "directed graph", i.e., a graph whose edges are directed (and are called arcs). Against a widespread convention, we will allow our digraphs to be infinite (i.e., to have infinitely many vertices and arcs).
(b) A digraph $D$ will be called path-finite if it has the property that for any two vertices $u$ and $v$, there are only finitely many paths from $u$ to $v$. (Thus, in particular, such paths can be counted.)
(c) A digraph $D$ will be called acyclic if it has no directed cycles. [Figure omitted: an example of an acyclic digraph on the vertices 1, 2, 3, 4, and of a non-acyclic one.]
(d) A simple digraph $D$ means a digraph whose arcs are merely pairs of distinct vertices (i.e., each arc is a pair $(u, v)$ of two vertices $u$ and $v$ with $u \neq v$).

We note that a path may contain 0 arcs (in which case its starting and ending point are identical).

Definition 6.5.2. We consider the infinite simple digraph with vertex set $\mathbb{Z}^2$ (so the vertices are pairs of integers) and arcs
\[
(i, j) \to (i + 1, j) \qquad \text{for all } (i, j) \in \mathbb{Z}^2 \tag{243}
\]
and
\[
(i, j) \to (i, j + 1) \qquad \text{for all } (i, j) \in \mathbb{Z}^2. \tag{244}
\]
The arcs of the form (243) are called "east-steps" or "right-steps"; the arcs of the form (244) are called "north-steps" or "up-steps".
The vertices of this digraph will be called lattice points or grid points or simply points. They will be represented as points in the Cartesian plane (in the usual way: the vertex $(i, j) \in \mathbb{Z}^2$ is represented as the point with $x$-coordinate $i$ and $y$-coordinate $j$).
The entire digraph will be denoted by $\mathbb{Z}^2$ and called the integer lattice or integer grid (or, to be short, just lattice or grid). [Figure omitted: a picture of a small part of this digraph $\mathbb{Z}^2$, with east-steps colored blue and north-steps colored dark-red.] Of course, the digraph continues indefinitely in all directions. In the following, we will not draw the vertices as circles, nor will we draw the arcs as arrows; we will simply draw the grid lines in order to avoid crowding our pictures.
However, $\mathbb{Z}^2$ is also an abelian group under addition. Thus, points can be added and subtracted entrywise; e.g., for any $(a, b) \in \mathbb{Z}^2$ and $(c, d) \in \mathbb{Z}^2$, we have
\[
(a, b) + (c, d) = (a + c, b + d) \qquad \text{and} \qquad (a, b) - (c, d) = (a - c, b - d).
\]
Thus, the digraph $\mathbb{Z}^2$ has an arc from a vertex $u$ to a vertex $v$ if and only if $v - u \in \{(0, 1), (1, 0)\}$.
The digraph $\mathbb{Z}^2$ is acyclic (i.e., it has no directed cycles). Thus, its paths are the same as its walks. We call these paths the lattice paths (or just paths). Thus, a lattice path is a finite tuple $(v_0, v_1, \ldots, v_n)$ of points $v_i \in \mathbb{Z}^2$ with the property that
\[
v_i - v_{i-1} \in \{(0, 1), (1, 0)\} \qquad \text{for each } i \in [n]. \tag{245}
\]
The step sequence of a path $(v_0, v_1, \ldots, v_n)$ is defined to be the $n$-tuple $(v_1 - v_0,\ v_2 - v_1,\ \ldots,\ v_n - v_{n-1})$. Each entry of this $n$-tuple is either $(0, 1)$ or $(1, 0)$ (because of (245)). We will often write U and R for the pairs $(0, 1)$ and $(1, 0)$, respectively (as they correspond to an up-step and a right-step). Informally speaking, the step sequence of a path records the directions (i.e., east or north) of all steps of the path.
Example 6.5.3. Here is a path from $(0, 0)$ to $(5, 3)$. [Figure omitted: the path drawn on the grid.] Formally speaking, this path is the 9-tuple
\[
((0,0),\ (0,1),\ (1,1),\ (2,1),\ (3,1),\ (3,2),\ (4,2),\ (4,3),\ (5,3)).
\]
Its step sequence (i.e., the sequence of the directions of its steps) is URRRURUR (meaning that its first step is an up-step, its second step is a right-step, its third step is a right-step, and so on).

Clearly, any path is uniquely determined by its starting point and its step sequence.
Note that we are considering one of the simplest possible notions of a lattice path here. In more advanced texts, the word "lattice path" is often used for paths in digraphs more complicated than $\mathbb{Z}^2$ (for instance, a digraph with the same vertex set $\mathbb{Z}^2$ but allowing steps in all four directions). However, the digraph we are considering is perhaps the most useful for algebraic combinatorics.
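The claim that a path is determined by its starting point and its step sequence can be made concrete: the snippet below rebuilds the 9-tuple of Example 6.5.3 from the string URRRURUR, and brute-forces the number of paths from $(0,0)$ to $(5,3)$ by trying all $2^8$ step sequences (helper names are ours):

```python
from itertools import product

STEPS = {"U": (0, 1), "R": (1, 0)}  # up-step and right-step

def path_from(start, word):
    """Rebuild a lattice path (tuple of points) from its start and step sequence."""
    path = [start]
    for letter in word:
        dx, dy = STEPS[letter]
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return tuple(path)

print(path_from((0, 0), "URRRURUR"))
# → ((0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (3, 2), (4, 2), (4, 3), (5, 3))

# Brute-force count of all paths from (0, 0) to (5, 3): try every 8-letter step word.
count = sum(1 for word in product("UR", repeat=8)
            if path_from((0, 0), word)[-1] == (5, 3))
print(count)  # → 56, i.e. C(8, 5): choose which 5 of the 8 steps are right-steps
```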

6.5.2. Counting paths from $(a, b)$ to $(c, d)$

In Subsection 4.4.1, we have counted the lattice paths from $(0, 0)$ to $(6, 4)$ that begin with an east-step and end with a north-step. These are, of course, in bijection with the paths from $(1, 0)$ to $(6, 3)$ (since the first and last step are predetermined and thus can be ignored). Let us now generalize this by counting paths between any two lattice points:

Proposition 6.5.4. Let $(a, b) \in \mathbb{Z}^2$ and $(c, d) \in \mathbb{Z}^2$ be two points. Then,
\[
(\text{\# of paths from } (a, b) \text{ to } (c, d)) =
\begin{cases}
\dbinom{c + d - a - b}{c - a}, & \text{if } c + d \ge a + b; \\
0, & \text{if } c + d < a + b.
\end{cases}
\]

Proof of Proposition 6.5.4. This is just a formalization (and generalization) of the reasoning we used in Subsection 4.4.1.
We shall first show the following two observations:

Observation 1: Each path from $(a, b)$ to $(c, d)$ has exactly $c + d - a - b$ steps$^{120}$.

Observation 2: Each path from $(a, b)$ to $(c, d)$ has exactly $c - a$ east-steps.

[Proof of Observation 1: We define the coordinate sum of a point $(x, y) \in \mathbb{Z}^2$ to be $x + y$. We shall denote this coordinate sum by $\operatorname{cs}(x, y)$. We observe that the coordinate sum of a point increases by exactly 1 along each arc of $\mathbb{Z}^2$: That is, if $u \to v$ is an arc of $\mathbb{Z}^2$, then
\[
\operatorname{cs}(v) - \operatorname{cs}(u) = 1. \tag{246}
\]
(This is because we can write $u$ in the form $u = (i, j)$ and then must have either $v = (i + 1, j)$ or $v = (i, j + 1)$; but this entails $\operatorname{cs}(v) = i + j + 1 = \operatorname{cs}(u) + 1$ in either case.)
Let $(v_0, v_1, \ldots, v_n)$ be a path from $(a, b)$ to $(c, d)$. Thus, $v_0 = (a, b)$ and $v_n = (c, d)$. Moreover, for each $i \in [n]$, we know that $v_{i-1} \to v_i$ is an arc of $\mathbb{Z}^2$, and thus we have
\[
\operatorname{cs}(v_i) - \operatorname{cs}(v_{i-1}) = 1
\]
(by (246), applied to $u = v_{i-1}$ and $v = v_i$). Summing these equalities over all $i \in [n]$, we obtain
\[
\sum_{i=1}^{n} \left(\operatorname{cs}(v_i) - \operatorname{cs}(v_{i-1})\right) = \sum_{i=1}^{n} 1 = n.
\]
Hence,
\[
n = \sum_{i=1}^{n} \left(\operatorname{cs}(v_i) - \operatorname{cs}(v_{i-1})\right) = \operatorname{cs}\big(\underbrace{v_n}_{=(c,d)}\big) - \operatorname{cs}\big(\underbrace{v_0}_{=(a,b)}\big) \qquad (\text{by the telescope principle})
\]
\[
= \underbrace{\operatorname{cs}(c, d)}_{=c+d} - \underbrace{\operatorname{cs}(a, b)}_{=a+b} = c + d - (a + b) = c + d - a - b.
\]
In other words, the path $(v_0, v_1, \ldots, v_n)$ has exactly $c + d - a - b$ steps (since this path clearly has $n$ steps).
Forget that we fixed $(v_0, v_1, \ldots, v_n)$. We thus have shown that each path $(v_0, v_1, \ldots, v_n)$ from $(a, b)$ to $(c, d)$ has exactly $c + d - a - b$ steps. This proves Observation 1.]

[Proof of Observation 2: For any point $v \in \mathbb{Z}^2$, we define $x(v)$ to be the $x$-coordinate of $v$. (Thus, $x(x, y) = x$ for each $(x, y) \in \mathbb{Z}^2$.)
Obviously, the $x$-coordinate of a point increases by exactly 1 along each east-step and stays unchanged along each north-step: That is, if $u \to v$ is an arc of $\mathbb{Z}^2$, then
\[
x(v) - x(u) = \begin{cases} 1, & \text{if } u \to v \text{ is an east-step;} \\ 0, & \text{if } u \to v \text{ is a north-step.} \end{cases} \tag{247}
\]

$^{120}$ A "step" of a path means an arc of this path.
Let (v0 , v1 , . . . , vn ) be a path from ( a, b) to (c, d). Thus, v0 = ( a, b) and vn =


(c, d). Moreover, for each i ∈ [n], we know that vi−1 → vi is an arc of Z2 , and
thus we have
(
1, if vi−1 → vi is an east-step;
x ( v i ) − x ( v i −1 ) =
0, if vi−1 → vi is a north-step

(by (247), applied to u = vi−1 and v = vi ). Summing these equalities over all
i ∈ [n], we obtain
(
n n
1, if vi−1 → vi is an east-step;
∑ (x (vi ) − x (vi−1 )) = ∑ 0, if v → v is a north-step
i =1 i =1 i −1 i
= (# of i ∈ [n] such that vi−1 → vi is an east-step)
= (# of east-steps in the path (v0 , v1 , . . . , vn )) .

Hence,

    (# of east-steps in the path (v_0, v_1, . . . , v_n))
      = ∑_{i=1}^{n} (x (v_i) − x (v_{i−1}))
      = x (v_n) − x (v_0)          (by the telescope principle)
      = x (c, d) − x (a, b)        (since v_n = (c, d) and v_0 = (a, b))
      = c − a                      (since x (c, d) = c and x (a, b) = a).

In other words, the path (v_0, v_1, . . . , v_n) has exactly c − a east-steps.
Forget that we fixed (v0 , v1 , . . . , vn ). We thus have shown that each path
(v0 , v1 , . . . , vn ) from ( a, b) to (c, d) has exactly c − a east-steps. This proves
Observation 2.]
Observation 1 immediately shows that no path from ( a, b) to (c, d) exists
when c + d − a − b < 0. In other words, no path from ( a, b) to (c, d) exists
when c + d < a + b. In other words, (# of paths from ( a, b) to (c, d)) = 0 when
c + d < a + b. This proves Proposition 6.5.4 in the case when c + d < a + b.
Hence, for the rest of this proof of Proposition 6.5.4, we WLOG assume that
c + d ≥ a + b. Thus, c + d − a − b ≥ 0.
Observations 1 and 2 have a sort of (common) converse:

Observation 3: Let p be a path that starts at the point (a, b) and has
exactly c + d − a − b steps. Assume that exactly c − a of these steps
are east-steps. Then, this path p ends at (c, d).
[Proof of Observation 3: Let (c′ , d′ ) be the point at which this path p ends. Then,
we can apply Observation 1 to (c′ , d′ ) instead of (c, d), and hence conclude that this
path has exactly c′ + d′ − a − b steps. Since we already know that this path has exactly
c + d − a − b steps, we therefore conclude that c′ + d′ − a − b = c + d − a − b. In
other words, c′ + d′ = c + d. Similarly, using Observation 2, we can find that c′ = c.
Subtracting this equality from c′ + d′ = c + d, we obtain d′ = d. Combining c′ = c with
d′ = d, we find (c′ , d′ ) = (c, d). Thus, our path p ends at (c, d) (since we know that it
ends at (c′ , d′ )). This proves Observation 3.]
Now, combining Observations 1, 2 and 3, we see that the paths from ( a, b) to
(c, d) are precisely the paths that start at ( a, b) and have exactly c + d − a − b
steps and exactly c − a east-steps (among these c + d − a − b steps). Such a path
is therefore uniquely determined if we know which c − a of its c + d − a − b
steps are east-steps. Thus, specifying such a path is equivalent to specifying a
(c − a)-element subset of the (c + d − a − b)-element set121 [c + d − a − b]. The
bijection principle thus yields122

    (# of paths from (a, b) to (c, d))
      = (# of (c − a)-element subsets of [c + d − a − b])
      = \binom{c + d − a − b}{c − a}.
This proves Proposition 6.5.4 (since we have assumed that c + d ≥ a + b).
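The count in Proposition 6.5.4 is easy to confirm by machine. Here is a small sketch (the helper name `count_paths` is ours, not from the text) that enumerates the paths from (a, b) to (c, d) by recursion on the first step and compares the result with the binomial coefficient:

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def count_paths(a, b, c, d):
    # Count the lattice paths from (a, b) to (c, d) using only
    # east-steps (x, y) -> (x + 1, y) and north-steps (x, y) -> (x, y + 1).
    if (a, b) == (c, d):
        return 1
    if a > c or b > d:
        return 0
    return count_paths(a + 1, b, c, d) + count_paths(a, b + 1, c, d)

# Proposition 6.5.4: this count equals binom(c + d - a - b, c - a).
for (a, b, c, d) in [(0, 0, 3, 2), (1, 2, 4, 6), (2, 2, 2, 2), (0, 0, 5, 0)]:
    assert count_paths(a, b, c, d) == comb(c + d - a - b, c - a)
```

(The memoization via `lru_cache` is only there to keep the double recursion cheap; the formula itself is the point.)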

6.5.3. Path tuples, nipats and ipats
Now, let us try to count something more interesting: tuples of non-intersecting
paths.

Definition 6.5.5. Let k ∈ N.

(a) A k-vertex means a k-tuple of lattice points (i.e., a k-tuple of vertices).
For example, ((1, 2), (4, 5), (7, 4)) is a 3-vertex.

(b) If A = (A_1, A_2, . . . , A_k) is a k-vertex, and if σ ∈ S_k is a permutation,
then σ(A) shall denote the k-vertex (A_{σ(1)}, A_{σ(2)}, . . . , A_{σ(k)}). For instance,
for the simple transposition s_1 ∈ S_3, we have s_1 (A, B, C) = (B, A, C) for any
3-vertex (A, B, C).
(c) If A = ( A1 , A2 , . . . , Ak ) and B = ( B1 , B2 , . . . , Bk ) are two k-vertices, then
a path tuple from A to B means a k-tuple ( p1 , p2 , . . . , pk ), where each pi is a
path from Ai to Bi .
(d) A path tuple ( p1 , p2 , . . . , pk ) is said to be non-intersecting if no two of
the paths p1 , p2 , . . . , pk have any vertex in common. (Visually speaking, this
121 Here we are tacitly using c + d − a − b ≥ 0.
122 To make this more formal: We are saying that the map

    {paths from (a, b) to (c, d)} → {(c − a)-element subsets of [c + d − a − b]},
    (v_0, v_1, . . . , v_{c+d−a−b}) ↦ {i ∈ [c + d − a − b] | the arc v_{i−1} → v_i is an east-step}
is a bijection, and we are applying the bijection principle to this bijection.
not only forbids them from crossing each other, but also forbids them from
touching or bouncing off each other, or starting or ending at the same point.)
We shall abbreviate “non-intersecting path tuple” as “nipat”. (Histori-
cally, the more common abbreviation is “NILP”, for “non-intersecting lattice
paths”, but I prefer “nipat” as it stresses the tupleness.)
(e) A path tuple ( p1 , p2 , . . . , pk ) is said to be intersecting if it is not non-
intersecting (i.e., if two of its paths have a vertex in common).
We shall abbreviate “intersecting path tuple” as “ipat”.

Example 6.5.6. Here are some path tuples for k = 3:

(a) The following path tuple is a nipat:

[Figure: three paths p_1, p_2, p_3 from A_1, A_2, A_3 to B_1, B_2, B_3 that pairwise share no vertex.]
(b) The following path tuple is an ipat:

[Figure: the same endpoints as in (a), but now two of the paths share a vertex.]

(c) The following path tuple is an ipat, too (for several reasons):

[Figure: a path tuple in which several of the paths meet.]

(In this tuple, the paths p_1 and p_2 even have an arc in common. Don’t let the
picture confuse you: The two curved arcs are actually one and the same arc
of Z² appearing in two paths, not two different arcs.)
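For experimentation, the non-intersecting condition of Definition 6.5.5 (d) is straightforward to test in code. A minimal sketch (the name `is_nipat` is ours), representing each path by the list of its vertices:

```python
def is_nipat(paths):
    # Definition 6.5.5 (d): a path tuple is non-intersecting (a "nipat")
    # if no two of its paths have any vertex in common.
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            if set(paths[i]) & set(paths[j]):
                return False
    return True

p1 = [(0, 0), (1, 0), (1, 1)]
p2 = [(0, 1), (0, 2), (1, 2)]
p3 = [(0, 0), (0, 1)]
assert is_nipat([p1, p2])        # vertex-disjoint paths
assert not is_nipat([p1, p3])    # p1 and p3 share the vertex (0, 0)
```

Note that, exactly as in the definition, sharing even a single vertex (not necessarily an arc) already makes the tuple an ipat.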

6.5.4. The LGV lemma for two paths
As I mentioned, we want to count nipats. Here is a first result on the case k = 2:
Proposition 6.5.7 (LGV lemma for two paths). Let (A, A′) and (B, B′) be two
2-vertices (i.e., let A, A′, B, B′ be four lattice points). Then,

    det ( (# of paths from A to B)     (# of paths from A to B′)  )
        ( (# of paths from A′ to B)    (# of paths from A′ to B′) )
      = (# of nipats from (A, A′) to (B, B′))
      − (# of nipats from (A, A′) to (B′, B)).
Example 6.5.8. Let A = (0, 0) and A′ = (1, 1) and B = (2, 2) and B′ = (3, 3).
Then, Proposition 6.5.7 says that

    det ( \binom{4}{2}  \binom{6}{3} )     = (# of nipats from (A, A′) to (B, B′))
        ( \binom{2}{1}  \binom{4}{2} )       − (# of nipats from (A, A′) to (B′, B)).

(The matrix entries on the left hand side have been computed using Proposition 6.5.4.)

And indeed, this equality is easily verified. There are 2 nipats from (A, A′)
to (B, B′), one of which is
[Figure: one of the 2 nipats from (A, A′) to (B, B′).]
while the other is its reflection in the x = y diagonal. There are 6 nipats from
( A, A′ ) to ( B′ , B), three of which are

[Figure: three of the 6 nipats from (A, A′) to (B′, B).]
while the other three are their reflections in the x = y diagonal. The right
hand side of the above equality is thus 2 − 6 = −4, which is also the left
hand side.

Example 6.5.9. Let A = (0, 0) and A′ = (−1, 1) and B = (2, 2)
and B′ = (0, 3). Then, the claim of Proposition 6.5.7 simplifies, since
(# of nipats from (A, A′) to (B′, B)) = 0 in this case.

[Figure: the four points; A′ lies weakly northwest of A, and B′ lies weakly northwest of B, which should make this visually clear.]
Proof of Proposition 6.5.7. We have

    det ( (# of paths from A to B)     (# of paths from A to B′)  )
        ( (# of paths from A′ to B)    (# of paths from A′ to B′) )
      = (# of paths from A to B) · (# of paths from A′ to B′)
        − (# of paths from A to B′) · (# of paths from A′ to B)
      = (# of path tuples from (A, A′) to (B, B′))
        − (# of path tuples from (A, A′) to (B′, B))        (248)

(by the product rule: a path tuple from (A, A′) to (B, B′) is just a pair
consisting of a path from A to B and a path from A′ to B′, and similarly for
path tuples from (A, A′) to (B′, B)).
We need to show that on the right hand side, all the intersecting path tuples
cancel each other out (so that only the nipats remain).
Our k-vertices are 2-vertices; thus, our path tuples are pairs. Hence, such a
path tuple ( p, p′ ) is intersecting if and only if p and p′ have a vertex in common.
We shall use these common vertices to define a sign-reversing involution on the
intersecting path tuples. Specifically, we do the following:
Define the set

    A := {path tuples from (A, A′) to (B, B′)} ⊔ {path tuples from (A, A′) to (B′, B)}.
Here, the symbol “⊔” means “disjoint union (of sets)”, which is a way of unit-
ing two sets without removing duplicates (i.e., even if the sets are not disjoint,
we treat them as disjoint for the purpose of the union, and therefore include
two copies of each common element). As a consequence of us taking the dis-
joint union, each path tuple in A “remembers” whether it comes from the set
{path tuples from ( A, A′ ) to ( B, B′ )} or from the set
{path tuples from ( A, A′ ) to ( B′ , B)} (and if these two sets have a path tuple
in common, then A has two copies of it, each of which remembers from which
set it comes). However, in practice, this is barely relevant: Indeed, the only case
in which the sets
{path tuples from ( A, A′ ) to ( B, B′ )} and {path tuples from ( A, A′ ) to ( B′ , B)}
can fail to be disjoint is the case when B = B′ ; however, in this case, the claim
we are proving is trivial anyway, since there are no nipats123 , and our matrix
has determinant 0 (since it has two equal columns).
Define a subset X of A by

    X := {ipats in A} = {(p, p′) ∈ A | p and p′ have a vertex in common}.

Hence, A \ X = {nipats in A}.
For each (p, p′) ∈ A, we set

    sign (p, p′) := {  1, if (p, p′) is a path tuple from (A, A′) to (B, B′);
                    { −1, if (p, p′) is a path tuple from (A, A′) to (B′, B).
(This is well-defined, because each (p, p′) ∈ A is either a path tuple from
(A, A′) to (B, B′) or a path tuple from (A, A′) to (B′, B), but never both at the
same time124 .) Thus,

    (# of path tuples from (A, A′) to (B, B′))
      − (# of path tuples from (A, A′) to (B′, B))
      = ∑_{(p,p′)∈A} sign (p, p′)

and

    (# of nipats from (A, A′) to (B, B′))
      − (# of nipats from (A, A′) to (B′, B))
      = ∑_{(p,p′)∈A\X} sign (p, p′)        (since A \ X = {nipats in A}).
123 Indeed, if p and p′ are two paths with the same destination, then p and p′ automatically have
a vertex in common.
124 because we took the disjoint union of {path tuples from (A, A′) to (B, B′)} and
{path tuples from (A, A′) to (B′, B)}

We want to prove that the left hand sides of these two equalities are equal.
Thus, it clearly suffices to show that the right hand sides are equal. By Lemma
6.1.3, it suffices to find a sign-reversing involution f : X → X that has no fixed
points.
So let us define our sign-reversing involution f : X → X . For each path
tuple ( p, p′ ) ∈ X , we define f ( p, p′ ) as follows:

• Since ( p, p′ ) ∈ X , the paths p and p′ have a vertex in common. There
might be several; let v be the first one. (The first one on p or the first one
on p′ ? Doesn’t matter, because these are the same thing. Indeed, if the
first vertex on p that is contained in p′ was different from the first vertex
on p′ that is contained in p, then we could obtain a nontrivial circuit of our
digraph by walking from the former vertex to the latter vertex along p and
then back along p′ ; but this is impossible, since our digraph is acyclic.)
We call this vertex v the first intersection of ( p, p′ ).

• Call the part of p that comes after v the tail of p. Call the part of p that
comes before v the head of p.
Call the part of p′ that comes after v the tail of p′ . Call the part of p′ that
comes before v the head of p′ .

• Now, we exchange the tails of the paths p and p′ . That is, we set

    q := (head of p) ∪ (tail of p′)        and        q′ := (head of p′) ∪ (tail of p)
(where the symbol “∪” means combining a path ending at v with a path
starting at v in the obvious way), and set f ( p, p′ ) := (q, q′ ).

Thus, we have defined a map f : X → X (in a moment, we will explain why
it is well-defined). Here is an example:

[Figure: a pair (p, p′) of intersecting paths from (A, A′) to (B, B′), together with the pair (q, q′) = f (p, p′) obtained from it by exchanging the tails; the paths q and q′ run from (A, A′) to (B′, B).]
Here is the same configuration, with the point v marked and with the tails of
the two paths drawn extra-thick:

[Figure: the pair (p, p′) with the first intersection v marked and the tails drawn extra-thick.]
We note that if ( p, p′ ) ∈ X is an ipat from ( A, A′ ) to ( B, B′ ), then f ( p, p′ ) =
(q, q′ ) is an ipat from ( A, A′ ) to ( B′ , B) (because by exchanging the tails of p
and p′ , we have caused the two paths to exchange their destinations as well),
and vice versa. Thus, f ( p, p′ ) ∈ X whenever ( p, p′ ) ∈ X . This shows that the
map f : X → X is well-defined.
Furthermore, let ( p, p′ ) ∈ X be any ipat, and let (q, q′ ) = f ( p, p′ ). Then,
the vertex v chosen in the definition of f ( p, p′ ) (that is, the first intersection of
( p, p′ )) is still the first intersection of (q, q′ ) (because when we exchange the tails
of p and p′ , we do not change their heads, and thus the resulting paths q and
q′ still do not intersect until v), and therefore this vertex gets chosen again if
we apply our map f to (q, q′ ). As a consequence, f (q, q′ ) is again ( p, p′ ) (since
exchanging the tails of q and q′ simply undoes the changes incurred when we
exchanged the tails of p and p′ ). Thus,
 

    (f ◦ f) (p, p′) = f (f (p, p′)) = f (q, q′) = (p, p′)        (since f (p, p′) = (q, q′)).

Forget that we fixed (p, p′). We thus have shown that (f ◦ f) (p, p′) = (p, p′)
for each (p, p′) ∈ X . Hence, f ◦ f = id. In other words, f is an involution on
X . Moreover, this involution f is sign-reversing (i.e., satisfies sign ( f ( p, p′ )) =
− sign ( p, p′ ) for any ( p, p′ ) ∈ X ) 125 . As a consequence of the latter fact,
we see that f has no fixed points (i.e., that we have f ( p, p′ ) ̸= ( p, p′ ) for any
( p, p′ ) ∈ X ). Hence, Lemma 6.1.3 yields

    ∑_{(p,p′)∈A} sign (p, p′) = ∑_{(p,p′)∈A\X} sign (p, p′).        (249)

125 This is because we have observed above that if (p, p′) ∈ X is an ipat from (A, A′) to (B, B′),
then f (p, p′) = (q, q′) is an ipat from (A, A′) to (B′, B), and vice versa.
As we have explained above, the left hand side of this equality is

    (# of path tuples from (A, A′) to (B, B′))
      − (# of path tuples from (A, A′) to (B′, B)),

whereas its right hand side is

    (# of nipats from (A, A′) to (B, B′))
      − (# of nipats from (A, A′) to (B′, B))

(since A \ X = {nipats in A}). Hence, (249) rewrites as

    (# of path tuples from (A, A′) to (B, B′))
      − (# of path tuples from (A, A′) to (B′, B))
      = (# of nipats from (A, A′) to (B, B′))
        − (# of nipats from (A, A′) to (B′, B)).
In view of (248), this rewrites as

    det ( (# of paths from A to B)     (# of paths from A to B′)  )
        ( (# of paths from A′ to B)    (# of paths from A′ to B′) )
      = (# of nipats from (A, A′) to (B, B′))
      − (# of nipats from (A, A′) to (B′, B)).
This completes our proof of Proposition 6.5.7.
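Proposition 6.5.7 can also be checked numerically. The following sketch (all helper names are ours) brute-forces the four path families for the points of Example 6.5.8 and compares the 2×2 determinant with the difference of nipat counts:

```python
from itertools import product

def paths_between(a, b):
    # All east/north lattice paths from a to b, as lists of vertices.
    if a == b:
        return [[a]]
    if a[0] > b[0] or a[1] > b[1]:
        return []
    result = []
    for step in [(a[0] + 1, a[1]), (a[0], a[1] + 1)]:
        for tail in paths_between(step, b):
            result.append([a] + tail)
    return result

def disjoint(p, q):
    # For k = 2, a path tuple (p, q) is a nipat iff p and q are vertex-disjoint.
    return not (set(p) & set(q))

A, A2, B, B2 = (0, 0), (1, 1), (2, 2), (3, 3)
det = (len(paths_between(A, B)) * len(paths_between(A2, B2))
       - len(paths_between(A, B2)) * len(paths_between(A2, B)))
nipats_BB2 = sum(disjoint(p, q) for p, q in
                 product(paths_between(A, B), paths_between(A2, B2)))
nipats_B2B = sum(disjoint(p, q) for p, q in
                 product(paths_between(A, B2), paths_between(A2, B)))
assert det == nipats_BB2 - nipats_B2B == -4   # as computed in Example 6.5.8
```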


As in Example 6.5.9, the proposition becomes particularly nice when we have
(# of nipats from ( A, A′ ) to ( B′ , B)) = 0. Here is a sufficient criterion for when
this happens:

Proposition 6.5.10 (baby Jordan curve theorem). Let A, B, A′ and B′ be four
lattice points satisfying

    x (A′) ≤ x (A),        y (A′) ≥ y (A),        (250)
    x (B′) ≤ x (B),        y (B′) ≥ y (B).        (251)

Here, x (P) and y (P) denote the two coordinates of any point P ∈ Z².

Let p be any path from A to B′. Let p′ be any path from A′ to B. Then, p
and p′ have a vertex in common.

Note that the condition (250) can be restated as “the point A′ lies weakly
northwest of A”, where “weakly northwest of A” allows for the options “due
north of A”, “due west of A” and “at A”. Likewise, (251) can be restated as
“the point B′ lies weakly northwest of B”. The following picture illustrates the
situation of Proposition 6.5.10:

[Figure: a path p from A to B′ and a path p′ from A′ to B; the two paths are forced to meet.]

Proposition 6.5.10 has an intuitive plausibility to it (one can think of the path
p as creating a “river” that the path p′ must necessarily cross somewhere), but
it is not obvious from a mathematical perspective. We give a rigorous proof in
Section B.7.
Proposition 6.5.7 is just the k = 2 case of a more general theorem that we will
soon derive; however, it already has a nice application:
Corollary 6.5.11. Let n, k ∈ N. Then, \binom{n}{k}^2 ≥ \binom{n}{k−1} · \binom{n}{k+1}.
Proof of Corollary 6.5.11 (sketched). This is easy to see algebraically, but here is a
combinatorial proof: Define four lattice points A = (1, 0) and A′ = (0, 1) and
B = (k + 1, n − k ) and B′ = (k, n − k + 1). Then, Proposition 6.5.7 yields

    det ( (# of paths from A to B)     (# of paths from A to B′)  )
        ( (# of paths from A′ to B)    (# of paths from A′ to B′) )
      = (# of nipats from (A, A′) to (B, B′)) − (# of nipats from (A, A′) to (B′, B))
      = (# of nipats from (A, A′) to (B, B′)) ≥ 0

(here, (# of nipats from (A, A′) to (B′, B)) = 0, since Proposition 6.5.10 yields that
any path from A to B′ and any path from A′ to B have a vertex in common).
However, Proposition 6.5.4 yields

    ( (# of paths from A to B)     (# of paths from A to B′)  )     ( \binom{n}{k}    \binom{n}{k−1} )
    ( (# of paths from A′ to B)    (# of paths from A′ to B′) )  =  ( \binom{n}{k+1}  \binom{n}{k}   ),

so the determinant on the left hand side is \binom{n}{k}^2 − \binom{n}{k−1} · \binom{n}{k+1}. Thus,
we have obtained

    \binom{n}{k}^2 − \binom{n}{k−1} · \binom{n}{k+1} ≥ 0,

and this proves Corollary 6.5.11.
     
Corollary 6.5.11 is often stated as “the sequence \binom{n}{0}, \binom{n}{1}, . . . , \binom{n}{n} is log-concave”.
There are many more log-concave sequences in combinatorics (see,
e.g., [Sagan19], [Stanle89] and [Brande14] for more).
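Corollary 6.5.11 is cheap to sanity-check numerically. A sketch (our helper `binom` encodes the convention \binom{n}{k} = 0 for k < 0 or k > n, since `math.comb` would raise an error for negative arguments):

```python
from math import comb

def binom(n, k):
    # Convention: binom(n, k) = 0 unless 0 <= k <= n.
    return comb(n, k) if 0 <= k <= n else 0

# Corollary 6.5.11: binom(n, k)^2 >= binom(n, k-1) * binom(n, k+1).
for n in range(12):
    for k in range(n + 1):
        assert binom(n, k) ** 2 >= binom(n, k - 1) * binom(n, k + 1)
```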

6.5.5. The LGV lemma for k paths

The following proposition – which is one of the weakest forms of the LGV lemma
(short for Lindström–Gessel–Viennot lemma) – extends the logic of Proposition
6.5.7 to nipats between k-vertices for general k.

Proposition 6.5.12 (LGV lemma, lattice counting version). Let k ∈ N. Let
A = (A_1, A_2, . . . , A_k) and B = (B_1, B_2, . . . , B_k) be two k-vertices. Then,

    det ( (# of paths from A_i to B_j) )_{1≤i≤k, 1≤j≤k}
      = ∑_{σ∈S_k} (−1)^σ (# of nipats from A to σ(B)).

The right hand side of this equality can be viewed as a signed count of all
k-tuples of paths that start at the points A1 , A2 , . . . , Ak in this order, but end
at the points B1 , B2 , . . . , Bk in some order. For example, for k = 3, the claim of
Proposition 6.5.12 takes the form

    det ( (# of paths from A_i to B_j) )_{1≤i≤3, 1≤j≤3}
= (# of nipats from ( A1 , A2 , A3 ) to ( B1 , B2 , B3 ))
− (# of nipats from ( A1 , A2 , A3 ) to ( B1 , B3 , B2 ))
− (# of nipats from ( A1 , A2 , A3 ) to ( B2 , B1 , B3 ))
+ (# of nipats from ( A1 , A2 , A3 ) to ( B2 , B3 , B1 ))
+ (# of nipats from ( A1 , A2 , A3 ) to ( B3 , B1 , B2 ))
− (# of nipats from ( A1 , A2 , A3 ) to ( B3 , B2 , B1 )) .
Proof of Proposition 6.5.12 (sketched). We adapt the idea of our proof of Proposi-
tion 6.5.7, but we have to be more systematic now. Define a set
    A := {(σ, p) | σ ∈ S_k, and p is a path tuple from A to σ(B)} 126 .

Define a subset X of A by 127

    X := {ipats in A} = {(σ, p) ∈ A | p is intersecting}.

Set

    sign (σ, p) := (−1)^σ        for each (σ, p) ∈ A.
Again, we need to find a sign-reversing involution f : X → X that has no
fixed points.
Again, we construct this involution by exchanging the tails of two intersecting
paths128 in our path tuple. There is a complication now, due to the fact that
there might be several pairs of intersecting paths. We have to come up with
a rule for picking one such pair so that when we apply f again to the result
of the exchange, then we again pick the same pair. Otherwise, f won’t be an
involution!
There are different ways to do this. Here is one: If (σ, ( p1 , p2 , . . . , pk )) ∈ X ,
then we construct f (σ, ( p1 , p2 , . . . , pk )) ∈ X as follows:
126 This set A is meant to generalize the set A that was used in the proof of Proposition 6.5.7.
The reason why we are defining it to be
{(σ, p) | σ ∈ Sk , and p is a path tuple from A to σ (B)}
instead of {p | p is a path tuple from A to σ (B) for some σ ∈ Sk }
is to make sure that each path tuple in A “remembers” which permutation σ ∈ Sk it comes
from. (This is the same rationale that caused us to take the disjoint union in the proof of
Proposition 6.5.7; but now we are explicitly inserting the σ into the elements of A rather
than handwaving about disjoint unions.)
127 Here we are saying that a pair ( σ, p ) ∈ A is an ipat if the path tuple p is an ipat, and we are

saying that a pair (σ, p) ∈ A is a nipat if the path tuple p is a nipat. This is a bit sloppy (σ
has nothing to do with whether p is an ipat or a nipat), but we hope that no confusion will
ensue.
128 This is probably obvious, but just in case: We say that two paths intersect if they have a vertex

in common.
• We say that a point u is crowded if it is contained in at least two of our
paths p_1, p_2, . . . , p_k. Since (p_1, p_2, . . . , p_k) is intersecting, there exists at
least one crowded point.

• We pick the smallest i ∈ [k ] such that pi contains a crowded point.

• Then, we pick the first crowded point v on pi .

• Then, we pick the largest j ∈ [k] such that v belongs to p j . (Note that j > i,
since v is crowded.)

• Call the part of pi that comes after v the tail of pi . Call the part of pi that
comes before v the head of pi .
Call the part of p j that comes after v the tail of p j . Call the part of p j that
comes before v the head of p j .

• Then, we exchange the tails of the paths pi and p j (while leaving all other
paths unchanged).

• We let (q1 , q2 , . . . , qk ) be the resulting path tuple129 .

• We set σ′ := σ t_{i,j}, where t_{i,j} is the transposition in S_k that swaps i with j.
Thus, (q_1, q_2, . . . , q_k) is a path tuple from A to σ′(B) (because exchanging
the tails of the paths pi and p j has switched their ending points Bσ(i) and
Bσ( j) to Bσ( j) = Bσ′ (i) and Bσ(i) = Bσ′ ( j) , respectively).

• Finally, we set

    f (σ, (p_1, p_2, . . . , p_k)) := (σ′, (q_1, q_2, . . . , q_k)).

This defines a map f : X → X (again, it is not hard to see that it is well-defined).

129 Thus,

    q_i = (head of p_i) ∪ (tail of p_j)        and        q_j = (head of p_j) ∪ (tail of p_i)

(where “head” means “part until v”, and “tail” means “part after v”). Furthermore, we have
q_m = p_m for any m ∈ [k] \ {i, j}, since we have left all paths other than p_i and p_j unchanged.

Here is an example: If k = 5 and if (p_1, p_2, . . . , p_k) is

[Figure: five paths p_1, . . . , p_5 from A_1, . . . , A_5 to B′_1, . . . , B′_5 (where B′_m is shorthand for B_{σ(m)}).]

then v is the point where paths p_2 and p_4 intersect, and we have i = 2 and j = 4, and
therefore (q_1, q_2, . . . , q_k) is

[Figure: the resulting tuple (q_1, q_2, . . . , q_5), where we again have drawn the exchanged tails extra-thick.]


Convince yourself that the map f defined above really is a sign-reversing in-
volution from X to X . (This means showing that applying f twice in succession
to any given (σ, p) ∈ X returns (σ, p), and that sign ( f (σ, p)) = − sign (σ, p)
for any (σ, p) ∈ X . The proof of the second claim, of course, relies on parts (b)
and (d) of Proposition 5.4.2.)
Thus, we have defined a sign-reversing involution f : X → X . This involution
f has no fixed points (since it is sign-reversing). It is now easy to complete
the proof: Lemma 6.1.3 yields

    ∑_{(σ,p)∈A} sign (σ, p) = ∑_{(σ,p)∈A\X} sign (σ, p).

In view of our definition of sign (σ, p), this rewrites as

    ∑_{(σ,p)∈A} (−1)^σ = ∑_{(σ,p)∈A\X} (−1)^σ.        (252)

The left hand side of this equality is

    ∑_{(σ,p)∈A} (−1)^σ
      = ∑_{σ∈S_k} (−1)^σ (# of path tuples from A to σ(B))        (by the definition of A)
      = ∑_{σ∈S_k} (−1)^σ ∏_{i=1}^{k} (# of paths from A_i to B_{σ(i)})

(by the product rule, since a path tuple from A to σ(B) is just a tuple
(p_1, p_2, . . . , p_k), where each p_i is a path from A_i to B_{σ(i)})

      = det ( (# of paths from A_i to B_j) )_{1≤i≤k, 1≤j≤k}

(by the definition of the determinant), whereas the right hand side is

    ∑_{(σ,p)∈A\X} (−1)^σ = ∑_{(σ,p)∈A; p is a nipat} (−1)^σ        (since X = {ipats in A} entails A \ X = {nipats in A})
      = ∑_{σ∈S_k} (−1)^σ (# of nipats from A to σ(B))

(by the definition of A). Thus, (252) rewrites as

    det ( (# of paths from A_i to B_j) )_{1≤i≤k, 1≤j≤k}
      = ∑_{σ∈S_k} (−1)^σ (# of nipats from A to σ(B)).

This proves Proposition 6.5.12.
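Proposition 6.5.12 can be verified by brute force for small k. The following sketch (all helper names and the choice of points are ours) checks the k = 3 case for one pair of k-vertices by enumerating every path tuple for every permutation:

```python
from itertools import permutations, product

def paths(a, b):
    # All east/north lattice paths from a to b, as tuples of vertices.
    if a == b:
        return [(a,)]
    if a[0] > b[0] or a[1] > b[1]:
        return []
    out = []
    for nxt in [(a[0] + 1, a[1]), (a[0], a[1] + 1)]:
        out.extend((a,) + t for t in paths(nxt, b))
    return out

def is_nipat(tup):
    # Definition 6.5.5 (d): no two paths of the tuple share a vertex.
    return all(not (set(p) & set(q))
               for i, p in enumerate(tup) for q in tup[i + 1:])

def sign(perm):
    # (-1)^sigma, computed from the number of inversions.
    inv = sum(perm[i] > perm[j]
              for i in range(len(perm)) for j in range(i + 1, len(perm)))
    return -1 if inv % 2 else 1

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

A = [(0, 0), (-1, 1), (-2, 2)]
B = [(3, 1), (2, 2), (1, 3)]

lhs = det3([[len(paths(A[i], B[j])) for j in range(3)] for i in range(3)])
rhs = sum(sign(perm)
          for perm in permutations(range(3))
          for tup in product(*(paths(A[i], B[perm[i]]) for i in range(3)))
          if is_nipat(tup))
assert lhs == rhs
```

(These particular A and B also satisfy the "nonpermutable" conditions of Corollary 6.5.15 below, so the signed sum here actually reduces to the plain count of nipats from A to B.)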

6.5.6. The weighted version
So far we have just been counting paths; but we can easily introduce weights to
obtain a more general result:
Theorem 6.5.13 (LGV lemma, lattice weight version). Let K be a commutative
ring.

For each arc a of the digraph Z², let w (a) be an element of K. We call this
element w (a) the weight of a.

For each path p of Z², define the weight w (p) of p by

    w (p) := ∏_{a is an arc of p} w (a).

For each path tuple p = (p_1, p_2, . . . , p_k), define the weight w (p) of p by

    w (p) := w (p_1) w (p_2) · · · w (p_k).

Let k ∈ N. Let A = (A_1, A_2, . . . , A_k) and B = (B_1, B_2, . . . , B_k) be two
k-vertices. Then,

    det ( ∑_{p : A_i → B_j} w (p) )_{1≤i≤k, 1≤j≤k} = ∑_{σ∈S_k} (−1)^σ ∑_{p is a nipat from A to σ(B)} w (p).

Here, “p : A_i → B_j” means “p is a path from A_i to B_j”.

Clearly, Proposition 6.5.12 is the particular case of Theorem 6.5.13 when K =
Z and w (a) = 1 for all arcs a (because in this case, all the weights w (p) and
w (p) of paths and path tuples equal 1, and therefore the sums over paths or
nipats become the #s of paths or nipats).
Proof of Theorem 6.5.13. The same argument as for Proposition 6.5.12 can be
used here; just replace sign (σ, p) := (−1)^σ by sign (σ, p) := (−1)^σ · w (p). (The
only new observation required is that when we exchange the tails of two paths
in our path tuple, the weight of the path tuple does not change. This is rather
clear: The weight of a path tuple is the product of the weights of all arcs in all
paths of the tuple130 . When we exchange the tails of two paths, some arcs get
moved from one path to the other, but the total product stays unchanged.)

6.5.7. Generalization to acyclic digraphs
We can generalize Theorem 6.5.13 further. Indeed, we have barely used any-
thing specific to Z2 in our proofs; all we used is that Z2 is a path-finite acyclic
digraph. Thus, Theorem 6.5.13 remains true if we replace Z2 by an arbitrary
such digraph. We thus obtain the following more general result:

130 An arc will appear multiple times in the product if it appears in multiple paths.
Theorem 6.5.14 (LGV lemma, digraph weight version). Let K be a commutative
ring.

Let D be a path-finite (but possibly infinite) acyclic digraph. We extend
Definition 6.5.5 to D instead of Z² (with the obvious changes: “lattice points”
becomes “vertices of D”).

For each arc a of the digraph D, let w (a) be an element of K. We call this
element w (a) the weight of a.

For each path p of D, define the weight w (p) of p by

    w (p) := ∏_{a is an arc of p} w (a).

For each path tuple p = (p_1, p_2, . . . , p_k), define the weight w (p) of p by

    w (p) := w (p_1) w (p_2) · · · w (p_k).

Let k ∈ N. Let A = (A_1, A_2, . . . , A_k) and B = (B_1, B_2, . . . , B_k) be two
k-vertices (i.e., two k-tuples of vertices of D). Then,

    det ( ∑_{p : A_i → B_j} w (p) )_{1≤i≤k, 1≤j≤k} = ∑_{σ∈S_k} (−1)^σ ∑_{p is a nipat from A to σ(B)} w (p).

Here, “p : A_i → B_j” means “p is a path from A_i to B_j”.

Proof. Completely analogous to the proof of Theorem 6.5.13.

6.5.8. The nonpermutable case

One nice thing about the digraph Z², however, is that in many cases, the sum

    ∑_{σ∈S_k} (−1)^σ ∑_{p is a nipat from A to σ(B)} w (p)
has only one nonzero addend. We have already seen this happen often in the
k = 2 case (thanks to Proposition 6.5.10). Here is the analogous statement for
general k:
Corollary 6.5.15 (LGV lemma, nonpermutable lattice weight version). Con-
sider the setting of Theorem 6.5.13, but additionally assume that

x ( A1 ) ≥ x ( A2 ) ≥ · · · ≥ x ( A k ) ; (253)
y ( A1 ) ≤ y ( A2 ) ≤ · · · ≤ y ( A k ) ; (254)
x ( B1 ) ≥ x ( B2 ) ≥ · · · ≥ x ( Bk ) ; (255)
y ( B1 ) ≤ y ( B2 ) ≤ · · · ≤ y ( Bk ) . (256)
Here, x (P) and y (P) denote the two coordinates of any point P ∈ Z².

Then, there are no nipats from A to σ(B) when σ ∈ S_k is not the identity
permutation id ∈ S_k. Therefore, the claim of Theorem 6.5.13 simplifies to

    det ( ∑_{p : A_i → B_j} w (p) )_{1≤i≤k, 1≤j≤k} = ∑_{p is a nipat from A to B} w (p).        (257)

Proof of Corollary 6.5.15 (sketched). This is easy using Proposition 6.5.10. Here
are the details:
Let σ ∈ Sk be a permutation that is not the identity permutation id ∈ Sk . Then, we
don’t have σ (1) ≤ σ (2) ≤ · · · ≤ σ (k ) (since σ is not id). In other words, there exists
some i ∈ [k − 1] such that σ (i ) > σ (i + 1). Consider this i.
Now, let p be a nipat from A to σ (B). Write p in the form p = ( p1 , p2 , . . . , pk ). Thus,
pi is a path from Ai to Bσ(i) , whereas pi+1 is a path from Ai+1 to Bσ(i+1) . Moreover, pi
and pi+1 have no vertex in common (since p is a nipat).
The sequence (x (B_1), x (B_2), . . . , x (B_k)) is weakly decreasing (by (255)). In other
words, if m and n are two elements of [k] satisfying m > n, then x (B_m) ≤ x (B_n).
Applying this to m = σ(i) and n = σ(i + 1), we obtain x (B_{σ(i)}) ≤ x (B_{σ(i+1)}) (since
σ(i) > σ(i + 1)). Likewise, using (256), we can obtain y (B_{σ(i)}) ≥ y (B_{σ(i+1)}). However,
(253) shows that x (A_i) ≥ x (A_{i+1}). In other words, x (A_{i+1}) ≤ x (A_i). Furthermore,
(254) shows that y (A_i) ≤ y (A_{i+1}). In other words, y (A_{i+1}) ≥ y (A_i).
Hence, Proposition 6.5.10 (applied to A = Ai , B = Bσ(i+1) , A′ = Ai+1 , B′ = Bσ(i) ,
p = pi and p′ = pi+1 ) yields that pi and pi+1 have a vertex in common. This contradicts
the fact that pi and pi+1 have no vertex in common.
Forget that we fixed p. We thus have found a contradiction for each nipat p from A
to σ (B). Hence, there are no nipats from A to σ (B).
Forget that we fixed σ. We thus have proved that there are no nipats from A to σ (B)
when σ ∈ Sk is not the identity permutation id ∈ Sk . Hence, if σ ∈ Sk is not the identity
permutation id ∈ Sk , then

    ∑_{p is a nipat from A to σ(B)} w (p) = (empty sum) = 0.        (258)

Now, Theorem 6.5.13 yields

    det ( ∑_{p : A_i → B_j} w (p) )_{1≤i≤k, 1≤j≤k}
      = ∑_{σ∈S_k} (−1)^σ ∑_{p is a nipat from A to σ(B)} w (p)
      = (−1)^id ∑_{p is a nipat from A to id(B)} w (p)
        + ∑_{σ∈S_k; σ≠id} (−1)^σ ∑_{p is a nipat from A to σ(B)} w (p)

(here, we have split off the addend for σ = id from the sum)

      = ∑_{p is a nipat from A to id(B)} w (p) + ∑_{σ∈S_k; σ≠id} (−1)^σ · 0
        (since (−1)^id = 1, and since each inner sum with σ ≠ id is 0 by (258))
      = ∑_{p is a nipat from A to id(B)} w (p)
      = ∑_{p is a nipat from A to B} w (p)

(since id(B) = B). The proof of Corollary 6.5.15 is now complete.

Corollary 6.5.16. Let k ∈ N. Let a_1, a_2, . . . , a_k and b_1, b_2, . . . , b_k be nonnegative
integers such that

    a_1 ≥ a_2 ≥ · · · ≥ a_k        and        b_1 ≥ b_2 ≥ · · · ≥ b_k.

Then,

    det ( \binom{a_i}{b_j} )_{1≤i≤k, 1≤j≤k} ≥ 0.

For example, if a_1 ≥ a_2 ≥ a_3 ≥ 0 and b_1 ≥ b_2 ≥ b_3 ≥ 0, then

    det ( \binom{a_1}{b_1}  \binom{a_1}{b_2}  \binom{a_1}{b_3} )
        ( \binom{a_2}{b_1}  \binom{a_2}{b_2}  \binom{a_2}{b_3} )  ≥ 0.
        ( \binom{a_3}{b_1}  \binom{a_3}{b_2}  \binom{a_3}{b_3} )

Proof of Corollary 6.5.16 (sketched). Set K = Z, and set w (a) := 1 for each arc a
of Z². Define the lattice points

    A_i := (0, −a_i)        and        B_i := (b_i, −b_i)

for all i ∈ [k]. These lattice points satisfy the assumptions of Corollary 6.5.15.
Hence, (257) entails

    det ( ∑_{p : A_i → B_j} w (p) )_{1≤i≤k, 1≤j≤k} = ∑_{p is a nipat from A to B} w (p).

Since all the weights w (p) and w (p) are 1 in our situation, we can rewrite this as

    det ( (# of paths from A_i to B_j) )_{1≤i≤k, 1≤j≤k} = (# of nipats from A to B).

Using Proposition 6.5.4, we can easily see that the matrix on the left hand
side of this equality is ( \binom{a_i}{b_j} )_{1≤i≤k, 1≤j≤k}. Thus, this equality rewrites as

    det ( \binom{a_i}{b_j} )_{1≤i≤k, 1≤j≤k} = (# of nipats from A to B).

Its left hand side is therefore ≥ 0 (since its right hand side is ≥ 0). This proves
Corollary 6.5.16.
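This nonnegativity is easy to experiment with. Here is a quick computational check in Python (a sketch for experimentation only; the helper functions `det` and `binomial_det` are ours, not part of the notes), using exact arithmetic:

```python
from fractions import Fraction
from math import comb

def det(M):
    # Gaussian elimination over the rationals
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for i in range(n):
        piv = next((r for r in range(i, n) if M[r][i] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != i:
            M[i], M[piv] = M[piv], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= factor * M[i][c]
    return d

def binomial_det(a, b):
    # determinant of the k x k matrix with (i, j) entry C(a_i, b_j)
    k = len(a)
    return det([[comb(a[i], b[j]) for j in range(k)] for i in range(k)])

print(binomial_det([5, 3, 2], [4, 2, 1]))  # → 15 (and indeed >= 0)
```

Any weakly decreasing choice of a and b can be tried this way; the determinant always comes out nonnegative, as the corollary predicts.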
 
Corollary 6.5.17. Let k ∈ N. Recall the Catalan numbers c_n = \frac{1}{n+1} \binom{2n}{n} for all n ∈ N. Then,
\[
\det\left( (c_{i+j-2})_{1 \le i \le k,\ 1 \le j \le k} \right)
= \det \begin{pmatrix}
c_0 & c_1 & \cdots & c_{k-1} \\
c_1 & c_2 & \cdots & c_k \\
\vdots & \vdots & \ddots & \vdots \\
c_{k-1} & c_k & \cdots & c_{2k-2}
\end{pmatrix} = 1.
\]

Proof of Corollary 6.5.17 (sketched). We will use not the lattice Z2 , but a different
digraph. Namely, we use the simple digraph with vertex set Z × N (that is, the
vertices are the lattice points that lie on the x-axis or above it) and arcs

(i, j) → (i + 1, j + 1) for all (i, j) ∈ Z × N

and
(i, j) → (i + 1, j − 1) for all (i, j) ∈ Z × P,

where P := {1, 2, 3, . . .}. Here is a picture of a small part of this digraph:

[Picture: a small portion of this digraph, with the x-axis labeled 0 through 5.]

As we know, the Catalan number cn counts the paths from (0, 0) to (2n, 0) on
this digraph (indeed, these are just the Dyck paths131 ). Hence, cn also counts
the paths from (i, 0) to (2n + i, 0) whenever i ∈ N (because these are just the
Dyck paths shifted by i in the x-direction). It is easy to see that this digraph is
acyclic and path-finite.
Now, define two k-vertices A = ( A1 , A2 , . . . , Ak ) and B = ( B1 , B2 , . . . , Bk ) by
setting
Ai := (−2 (i − 1) , 0) and Bi := (2 (i − 1) , 0)
for all i ∈ [k ]. It is not hard to show (see Exercise A.5.4.2 (a)) that there is
only one nipat from A to B, which is shown in the case k = 4 on the following

131 See Example 2 in Section 3.1 for the definition of a Dyck path.

picture:132

[Picture: the unique nipat from A to B for k = 4, with the points A_4, A_3, A_2, A_1 = B_1, B_2, B_3, B_4 lying on the x-axis.]

(the point A1 coincides with B1 , and the path from A1 to B1 is invisible, since it
has no arcs). Moreover, it can be shown (see Exercise A.5.4.2 (b)) that there are
no nipats from A to σ (B) when σ ∈ Sk is not the identity permutation id ∈ Sk .
(This is analogous to Corollary 6.5.15.) Hence, if we set K = Z and w ( a) = 1
for each arc a of our digraph, then (257) entails
  

det  ∑ w ( p) = ∑ w (p)


 
p:Ai → Bj p is a nipat
1≤i ≤k, 1≤ j≤k from A to B

(by the same reasoning as in the proof of Corollary 6.5.15). The right hand side of this equality is 1 (since there is only one nipat from A to B), while the matrix on the left hand side is easily seen to be (c_{i+j-2})_{1 \le i \le k,\ 1 \le j \le k} (since the # of paths from A_i to B_j is the Catalan number c_{i+j-2}). This yields the claim of Corollary 6.5.17. The details are LTTR.
The LGV lemma in all its variants is one major place in which combinatorial
questions reduce to the computation of determinants. Other such places are
the matrix-tree theorem (see, e.g., [Zeilbe85, §4], [Loehr11, §3.17], [Stanle18, Theo-
rems 9.8 and 10.4], [Grinbe23, §5.14]) and the enumeration of perfect matchings

132 In this picture, we are drawing only “half” of the grid. Indeed, the vertices (i, j) of our
digraph Z × N can be classified into even vertices (i.e., the ones for which i + j is even) and
odd vertices (i.e., the ones for which i + j is odd). Any arc either connects two even vertices or
connects two odd vertices. Hence, a path starting at an even vertex cannot contain any odd
vertex (and vice versa). Since all our vertices A1 , A2 , . . . , Ak and B1 , B2 , . . . , Bk are even, we
thus don’t have to bother even drawing the odd vertices (as they have no chance to appear
in any paths between our vertices). As a consequence, we are drawing only the grid lines
containing the even vertices.

or domino tilings (see, e.g., [Stucky15], [Loehr11, §12.12–§12.13], [Aigner07,


§10.1]). Soon, we will also encounter determinants in the study of symmetric
functions.
See also [BruRys91] for a selection of other intersections between combina-
torics and linear algebra.

7. Symmetric functions
This final chapter is devoted to the theory of symmetric functions. Specifically,
we will restrict ourselves to symmetric polynomials (the “functions” part is a
technical tweak that makes the theory neater but we won’t have time to intro-
duce). Serious treatments of the subject can be found in [Wildon20], [Loehr11,
Chapters 10–11], [Egge19], [MenRem15], [Macdon95], [Aigner07, Chapter 8],
[Stanle23, Chapter 7], [Sagan19, Chapter 7], [Sagan01, Chapter 4], [Krishn86],
[FoaHan04, Chapters 14–19], [Savage22], [GriRei20, Chapter 2] and [LLPT95].
We begin with some oversimplified historical motivation.
Symmetric polynomials first(?) appeared in the study of roots of polynomi-
als. Consider a monic univariate polynomial

\[
f = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n x^0 \in \mathbb{C}[x]
\]

(note the nonstandard labeling of coefficients). Let r1 , r2 , . . . , rn ∈ C be the


roots of this polynomial (listed with multiplicities). Then, François Viète (aka
Franciscus Vieta) noticed that

f = ( x − r1 ) ( x − r2 ) · · · ( x − r n ) ,

so that (by comparing coefficients) we see that

\begin{align*}
r_1 + r_2 + \cdots + r_n &= -a_1; \\
\sum_{i<j} r_i r_j &= a_2; \\
\sum_{i<j<k} r_i r_j r_k &= -a_3; \\
&\;\;\vdots \\
r_1 r_2 \cdots r_n &= (-1)^n a_n.
\end{align*}

These equalities are now known as Viète’s formulas. They allow computing
certain expressions in the ri ’s without having to compute the ri ’s themselves.
For instance, we can compute r12 + r22 + · · · + rn2 (that is, the sum of the squares
of all roots of f ) by observing that
 
\[
(r_1 + r_2 + \cdots + r_n)^2 = \left( r_1^2 + r_2^2 + \cdots + r_n^2 \right) + 2 \sum_{i<j} r_i r_j,
\]

so that
\[
r_1^2 + r_2^2 + \cdots + r_n^2
= \Big( \underbrace{r_1 + r_2 + \cdots + r_n}_{=-a_1} \Big)^2 - 2 \underbrace{\sum_{i<j} r_i r_j}_{=a_2}
= a_1^2 - 2a_2.
\]

This shows, among other things, that r12 + r22 + · · · + rn2 is an integer if the co-
efficients of f are integers. Newton and others found similar formulas for
r13 + r23 + · · · + rn3 and other such polynomials. (These formulas are now known
as the Newton–Girard identities – see Theorem 7.1.12 below.) Gauss extended
this to arbitrary symmetric polynomials in r1 , r2 , . . . , rn (by algorithmically ex-
pressing them as polynomials in a1 , a2 , . . . , an ), and used it in one of his proofs
of the Fundamental Theorem of Algebra [Gauss16]; Galois used this to build
what is now known as Galois theory (even though modern treatments of Ga-
lois theory often avoid symmetric polynomials); some harbingers of this can be
seen in Cardano’s solution of the cubic equation. See [Tignol16] and [Armstr19]
for the real history.
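The computation above is easy to test numerically. Here is a small Python illustration (the choice of roots is ours): expand (x − 1)(x − 2)(x − 3), read off a_1 and a_2 in the labeling f = x^n + a_1 x^{n-1} + ... + a_n, and compare a_1² − 2a_2 with the actual sum of squares of the roots.

```python
# Expand (x - 1)(x - 2)(x - 3) and compare a_1^2 - 2 a_2 with the sum of
# the squares of the roots. (Illustration only; the roots are our choice.)
roots = [1, 2, 3]

coeffs = [1]  # coefficients of the product so far, highest degree first
for r in roots:
    shifted = coeffs + [0]                  # x * p(x)
    scaled = [0] + [r * c for c in coeffs]  # r * p(x), aligned
    coeffs = [s - t for s, t in zip(shifted, scaled)]  # (x - r) * p(x)

a1, a2 = coeffs[1], coeffs[2]
power_sum = sum(r * r for r in roots)
print(coeffs)                       # → [1, -6, 11, -6]
print(power_sum, a1 * a1 - 2 * a2)  # → 14 14
```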
Here is a simple modern application of the same ideas: Let A ∈ Cn×n be a
matrix with eigenvalues λ1 , λ2 , . . . , λn (listed with algebraic multiplicities). Let
f ∈ C [ x ] be a univariate polynomial. The spectral mapping theorem says that the
eigenvalues of the matrix f [ A] are f [λ1 ] , f [λ2 ] , . . . , f [λn ] (here, I am using the
notation f [ a] for the value of f at some element a; this is usually written f ( a)).
Thus, the characteristic polynomial of f [ A] is

\[
\chi_{f[A]} = (x - f[\lambda_1])(x - f[\lambda_2]) \cdots (x - f[\lambda_n])
= x^n - (f[\lambda_1] + f[\lambda_2] + \cdots + f[\lambda_n]) x^{n-1}
+ \left( \sum_{i<j} f[\lambda_i] f[\lambda_j] \right) x^{n-2} \pm \cdots.
\]

Hence, all coefficients of χ_{f[A]} are symmetric polynomials in the λ_i's that depend only on f (not on A). In particular, this shows that χ_{f[A]} is uniquely determined by f and χ_A. But can you compute χ_{f[A]} exactly in terms of f and
χ A without computing the roots λ1 , λ2 , . . . , λn ? Yes, if you know how to ex-
press any symmetric polynomial in the λi s in terms of the coefficients of χ A .
This is the same problem that Gauss solved with his algorithm for expressing
an arbitrary symmetric polynomial in the roots of a polynomial in terms of the
coefficients of the polynomial. Incidentally, this algorithm also becomes help-
ful when one tries to generalize the spectral mapping theorem to matrices over
arbitrary commutative rings. Here, eigenvalues don’t always exist (let alone n
of them), so it becomes necessary to restate the theorem in a language that does
not rely on them. Again, symmetric polynomials provide the way to do this.
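A 2×2 sanity check of the spectral mapping theorem is easy to run by hand or by machine. In the following Python sketch (the matrix A and the choice of f are ours, purely for illustration), we take f(x) = x² and verify that χ_{f[A]} has the predicted roots:

```python
# 2x2 sanity check of the spectral mapping theorem for f(x) = x^2.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def charpoly_2x2(A):
    # characteristic polynomial x^2 - (tr A) x + (det A), as coefficients
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return (1, -tr, det)

A = [[2, 1], [0, 3]]   # eigenvalues 2 and 3
fA = matmul(A, A)      # f(A) for f(x) = x^2; its eigenvalues should be 4 and 9
print(charpoly_2x2(fA))  # → (1, -13, 36), i.e. (x - 4)(x - 9)
```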

7.1. Definitions and examples of symmetric polynomials



Convention 7.1.1. Fix a commutative ring K. Fix an N ∈ N. (Perhaps n


would be more conventional, but lowercase letters are chronically in short
supply in this subject.)
Throughout this chapter, we will keep K and N fixed. We will use Defini-
tion 5.1.2.

Recall that S N denotes the N-th symmetric group, i.e., the group of all permu-
tations of the set [ N ] := {1, 2, . . . , N }.

Definition 7.1.2. (a) Let P be the polynomial ring K [ x1 , x2 , . . . , x N ] in N vari-


ables over K. This is not just a ring; it is a commutative K-algebra.
(b) The symmetric group S N acts on the set P according to the formula
\[
\sigma \cdot f = f\left[ x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(N)} \right] \qquad \text{for any } \sigma \in S_N \text{ and any } f \in P.
\]

Here, f [ a1 , a2 , . . . , a N ] means the result of substituting a1 , a2 , . . . , a N for the


indeterminates x1 , x2 , . . . , x N in a polynomial f ∈ P .
(For example, if N = 4 and σ = cyc_{1,2,3} ∈ S_4, then σ · f = f[x_{σ(1)}, x_{σ(2)}, x_{σ(3)}, x_{σ(4)}] = f[x_2, x_3, x_1, x_4] for any f ∈ P, so that, for example,
\[
\sigma \cdot \left( 2x_1 + 3x_2^2 + 4x_3 - x_4^{15} \right) = 2x_2 + 3x_3^2 + 4x_1 - x_4^{15}
\]
and σ · (x_1 − x_3 x_4) = x_2 − x_1 x_4.)
Roughly speaking, the group S N is thus acting on P by permuting vari-
ables: A permutation σ ∈ S N transforms a polynomial f by substituting xσ(i)
for each xi .
Note that this action of S N on P is a well-defined group action (as we will
see in Proposition 7.1.4 below).
(c) A polynomial f ∈ P is said to be symmetric if it satisfies

σ· f = f for all σ ∈ S N .

(d) We let S be the set of all symmetric polynomials f ∈ P .

Example 7.1.3. Let N = 3 and K = Q, and let us rename the indeterminates


x1 , x2 , x3 as x, y, z. Then:
(a) We have x + y + z ∈ S (since, for example, the simple transposition
s1 ∈ S3 satisfies s1 · ( x + y + z) = y + x + z = x + y + z, and similarly any
other σ ∈ S3 also satisfies σ · ( x + y + z) = x + y + z).
(b) We have x + y ∉ S (since the transposition t_{1,3} ∈ S_3 satisfies t_{1,3} · (x + y) = z + y ≠ x + y).

(c) We have (x − y)(y − z)(z − x) ∉ S (since the simple transposition s_1 ∈ S_3 transforms (x − y)(y − z)(z − x) into

s1 · (( x − y) (y − z) (z − x )) = (y − x ) ( x − z) (z − y)
= − ( x − y) (y − z) (z − x )
̸= ( x − y) (y − z) (z − x ) ,

because −1 ̸= 1 in Q). Actually, the polynomial ( x − y) (y − z) (z − x ) is an


example of an antisymmetric polynomial – i.e., a polynomial f ∈ P such that
σ · f = (−1)σ f for all σ ∈ S N . However, if K = Z/2 (or if K is a Z/2-
algebra), then antisymmetric polynomials and symmetric polynomials are
the same thing.
(d) We have (( x − y) (y − z) (z − x ))2 ∈ S . More generally, if f ∈ P is
antisymmetric, then f 2 is symmetric, so that f 2 ∈ S .
(e) We have 37 ∈ S . More generally, any constant polynomial f ∈ P is
symmetric.
(f) We have (1 − x ) (1 − y) (1 − z) ∈ S .
(g) We have \frac{1}{(1-x)(1-y)(1-z)} ∉ S, because this is not a polynomial. It is an example of a symmetric power series.
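Parts (c) and (d) of this example are easy to test at a concrete point: swapping two variables negates (x − y)(y − z)(z − x), but leaves its square unchanged. A quick Python check (the test point is our choice):

```python
# Numeric check of Example 7.1.3 (c) and (d).
def g(x, y, z):
    return (x - y) * (y - z) * (z - x)

x, y, z = 2, 5, 11  # an arbitrary test point
print(g(y, x, z) == -g(x, y, z))           # s_1 negates g → True
print(g(y, x, z) ** 2 == g(x, y, z) ** 2)  # but fixes g^2 → True
```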

Some basic properties of our current setup are worth mentioning:


Proposition 7.1.4. The action of S N on P is a well-defined group action. In
other words, the following holds:
(a) We have id[ N ] · f = f for every f ∈ P .
(b) We have (στ ) · f = σ · (τ · f ) for every σ, τ ∈ S N and f ∈ P .

The proof of this proposition is straightforward, but due to its somewhat


slippery nature (the two substitutions in part (b) are a particularly frequent
source of confusion), we present it in full:
Proof of Proposition 7.1.4. (a) If f ∈ P , then the definition of id[ N ] · f yields
\[
\operatorname{id}_{[N]} \cdot f = f\left[ x_{\operatorname{id}(1)}, x_{\operatorname{id}(2)}, \ldots, x_{\operatorname{id}(N)} \right] = f[x_1, x_2, \ldots, x_N] = f.
\]

This proves Proposition 7.1.4 (a).


(b) Let σ, τ ∈ S N and f ∈ P . The definition of σ · (τ · f ) yields
\[
\sigma \cdot (\tau \cdot f) = (\tau \cdot f)\left[ x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(N)} \right]. \tag{259}
\]

Write the polynomial f in the form


\[
f = \sum_{(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N} f_{(a_1, a_2, \ldots, a_N)} x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}, \tag{260}
\]

where f_{(a_1, a_2, \ldots, a_N)} ∈ K are its coefficients. The definition of τ · f yields
\[
\tau \cdot f = f\left[ x_{\tau(1)}, x_{\tau(2)}, \ldots, x_{\tau(N)} \right]
= \sum_{(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N} f_{(a_1, a_2, \ldots, a_N)} x_{\tau(1)}^{a_1} x_{\tau(2)}^{a_2} \cdots x_{\tau(N)}^{a_N}
\]
(here, we have substituted x_{τ(1)}, x_{τ(2)}, \ldots, x_{τ(N)} for x_1, x_2, \ldots, x_N on both sides of (260)).


Substituting x_{σ(1)}, x_{σ(2)}, \ldots, x_{σ(N)} for x_1, x_2, \ldots, x_N on both sides of this equality, we obtain
\[
(\tau \cdot f)\left[ x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(N)} \right]
= \sum_{(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N} f_{(a_1, a_2, \ldots, a_N)}
\underbrace{x_{\sigma(\tau(1))}^{a_1} x_{\sigma(\tau(2))}^{a_2} \cdots x_{\sigma(\tau(N))}^{a_N}}_{\substack{= x_{(\sigma\tau)(1)}^{a_1} x_{(\sigma\tau)(2)}^{a_2} \cdots x_{(\sigma\tau)(N)}^{a_N} \\ \text{(since } \sigma(\tau(i)) = (\sigma\tau)(i) \text{ for all } i \in [N])}}
\]
(since our substitution replaces each x_i by x_{σ(i)})
\[
= \sum_{(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N} f_{(a_1, a_2, \ldots, a_N)} x_{(\sigma\tau)(1)}^{a_1} x_{(\sigma\tau)(2)}^{a_2} \cdots x_{(\sigma\tau)(N)}^{a_N}.
\]

On the other hand, the definition of the action of S_N on P yields
\[
(\sigma\tau) \cdot f = f\left[ x_{(\sigma\tau)(1)}, x_{(\sigma\tau)(2)}, \ldots, x_{(\sigma\tau)(N)} \right]
= \sum_{(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N} f_{(a_1, a_2, \ldots, a_N)} x_{(\sigma\tau)(1)}^{a_1} x_{(\sigma\tau)(2)}^{a_2} \cdots x_{(\sigma\tau)(N)}^{a_N}
\]
(here, we have substituted x_{(στ)(1)}, x_{(στ)(2)}, \ldots, x_{(στ)(N)} for x_1, x_2, \ldots, x_N on both sides of (260)). Comparing these two equalities, we obtain
\[
(\sigma\tau) \cdot f = (\tau \cdot f)\left[ x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(N)} \right] = \sigma \cdot (\tau \cdot f)
\]

(by (259)). This proves Proposition 7.1.4 (b).

We recall the following notions from abstract algebra:


• A K-algebra isomorphism from a K-algebra A to a K-algebra B means an
invertible K-algebra morphism f : A → B such that its inverse f −1 :
B → A is a K-algebra morphism from B to A. (Actually, any invertible
K-algebra morphism is an isomorphism.)
• A K-algebra automorphism of a K-algebra A means a K-algebra isomor-
phism from A to A.

Proposition 7.1.5. The group S N acts on P by K-algebra automorphisms. In


other words, for each σ ∈ S N , the map

P → P,
f 7→ σ · f

is a K-algebra automorphism of P (that is, a K-algebra isomorphism from P


to P ).

Proof of Proposition 7.1.5 (sketched). Fix σ ∈ S N . For any f , g ∈ P , we have


\[
\sigma \cdot (fg) = (fg)\left[ x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(N)} \right]
= f\left[ x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(N)} \right] \cdot g\left[ x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(N)} \right]
= (\sigma \cdot f) \cdot (\sigma \cdot g)
\]
(by the definition of the action of S_N on P).

Thus, the map

P → P,
f 7→ σ · f

respects multiplication. Similarly, this map respects addition, respects scaling, respects
the zero and respects the unity. Hence, this map is a K-algebra morphism from P to
P . Furthermore, this map is invertible, since its inverse is the map

P → P,
f 7 → σ −1 · f .

Thus, this map is an invertible K-algebra morphism from P to P , and therefore a


K-algebra isomorphism from P to P . In other words, this map is a K-algebra automor-
phism of P . This proves Proposition 7.1.5.

Theorem 7.1.6. The subset S is a K-subalgebra of P .

Proof of Theorem 7.1.6 (sketched). We need to show that S is closed under addition, mul-
tiplication and scaling, and that S contains the zero and the unity of P . Let me just
show that S is closed under multiplication (since all the other claims are equally easy):
Let f , g ∈ S . We must show that f g ∈ S .
The polynomial f is symmetric (since f ∈ S ); in other words, σ · f = f for each
σ ∈ S N . Similarly, σ · g = g for each σ ∈ S N . Now, for each σ ∈ S N , we have

\[
\sigma \cdot (fg) = (\sigma \cdot f) \cdot (\sigma \cdot g) = fg
\]
(since σ · f = f and σ · g = g, as we have seen).

This shows that f g is symmetric, i.e., we have f g ∈ S .


Forget that we fixed f , g. We thus have shown that f g ∈ S for any f , g ∈ S . This
shows that S is closed under multiplication. As explained above, this concludes our
proof of Theorem 7.1.6.

Definition 7.1.7. The K-subalgebra S of P is called the ring of symmetric poly-


nomials in N variables over K.

Some more terminology is worth defining:



Definition 7.1.8. (a) A monomial is an expression of the form x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} with a_1, a_2, \ldots, a_N ∈ N.
(b) The degree deg m of a monomial m = x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} is defined to be a_1 + a_2 + \cdots + a_N ∈ N.
(c) A monomial m = x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} is said to be squarefree if a_1, a_2, \ldots, a_N ∈ {0, 1}. (This is saying that no square or higher power of an indeterminate appears in m; thus the name “squarefree”.)
(d) A monomial m = x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} is said to be primal if there is at most one i ∈ [N] satisfying a_i > 0. (This is saying that the monomial m contains no two distinct indeterminates. Thus, a primal monomial is just 1 or a power of an indeterminate.)

Now we can define some specific symmetric polynomials:

Definition 7.1.9. (a) For each n ∈ Z, define a symmetric polynomial e_n ∈ S by
\[
e_n = \sum_{\substack{(i_1, i_2, \ldots, i_n) \in [N]^n; \\ i_1 < i_2 < \cdots < i_n}} x_{i_1} x_{i_2} \cdots x_{i_n}
= (\text{sum of all squarefree monomials of degree } n).
\]
This e_n is called the n-th elementary symmetric polynomial in x_1, x_2, \ldots, x_N.
(b) For each n ∈ Z, define a symmetric polynomial h_n ∈ S by
\[
h_n = \sum_{\substack{(i_1, i_2, \ldots, i_n) \in [N]^n; \\ i_1 \le i_2 \le \cdots \le i_n}} x_{i_1} x_{i_2} \cdots x_{i_n}
= (\text{sum of all monomials of degree } n).
\]
This h_n is called the n-th complete homogeneous symmetric polynomial in x_1, x_2, \ldots, x_N.
(c) For each n ∈ Z, define a symmetric polynomial p_n ∈ S by
\[
p_n =
\begin{cases}
x_1^n + x_2^n + \cdots + x_N^n, & \text{if } n > 0; \\
1, & \text{if } n = 0; \\
0, & \text{if } n < 0
\end{cases}
= (\text{sum of all primal monomials of degree } n).
\]
This p_n is called the n-th power sum in x_1, x_2, \ldots, x_N.


Example 7.1.10. (a) The 2-nd elementary symmetric polynomial is
\[
e_2 = \sum_{\substack{(i_1, i_2) \in [N]^2; \\ i_1 < i_2}} x_{i_1} x_{i_2}
= \sum_{\substack{(i,j) \in [N]^2; \\ i < j}} x_i x_j
= x_1 x_2 + x_1 x_3 + \cdots + x_1 x_N + x_2 x_3 + \cdots + x_2 x_N + \cdots + x_{N-1} x_N.
\]
(b) The 2-nd complete homogeneous symmetric polynomial is
\[
h_2 = \sum_{\substack{(i_1, i_2) \in [N]^2; \\ i_1 \le i_2}} x_{i_1} x_{i_2}
= \sum_{\substack{(i,j) \in [N]^2; \\ i \le j}} x_i x_j
= x_1^2 + x_1 x_2 + \cdots + x_1 x_N + x_2^2 + x_2 x_3 + \cdots + x_2 x_N + \cdots + x_{N-1}^2 + x_{N-1} x_N + x_N^2.
\]
(c) The 2-nd power sum is
\[
p_2 = x_1^2 + x_2^2 + \cdots + x_N^2.
\]
In light of the above two formulas for e_2 and h_2, we thus have h_2 = p_2 + e_2.
(d) We have
\[
e_1 = h_1 = p_1 = x_1 + x_2 + \cdots + x_N.
\]
(e) We have
\[
e_0 = h_0 = p_0 = 1.
\]
(f) If n < 0, then e_n = h_n = p_n = 0.
(g) If N = 0 (that is, if P is a polynomial ring in 0 variables), then e_n = h_n = p_n = 0 for all n > 0 (because there are no monomials of positive degree in this case), but e_0 = h_0 = p_0 = 1 still holds. This is a boring border case which, however, is important to get right.
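The three families are easy to evaluate at a concrete point directly from their definitions as sums of monomials. The following Python sketch (the helper names and test values are ours) does this for N = 3 and confirms the identity h_2 = p_2 + e_2 from part (c):

```python
from itertools import combinations, combinations_with_replacement
from math import prod

# Evaluate e_n, h_n, p_n at a concrete point (x_1, ..., x_N).
def e(n, xs):
    return sum(prod(c) for c in combinations(xs, n))

def h(n, xs):
    return sum(prod(c) for c in combinations_with_replacement(xs, n))

def p(n, xs):
    return sum(x ** n for x in xs) if n > 0 else 1

xs = [2, 3, 5]  # N = 3
print(e(2, xs), h(2, xs), p(2, xs))     # → 31 69 38
print(h(2, xs) == p(2, xs) + e(2, xs))  # h_2 = p_2 + e_2 → True
```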

Proposition 7.1.11. For each integer n > N, we have en = 0.

Proof. Let n > N be an integer. Then, the set [ N ] has no n distinct elements.
Thus, there exists no n-tuple (i1 , i2 , . . . , in ) ∈ [ N ]n satisfying i1 < i2 < · · · <
in (because if (i1 , i2 , . . . , in ) was such an n-tuple, then its n entries i1 , i2 , . . . , in
would be n distinct elements of [ N ]).

Now, the definition of e_n yields
\[
e_n = \sum_{\substack{(i_1, i_2, \ldots, i_n) \in [N]^n; \\ i_1 < i_2 < \cdots < i_n}} x_{i_1} x_{i_2} \cdots x_{i_n}
= (\text{empty sum}) = 0
\]
(since there exists no n-tuple (i_1, i_2, \ldots, i_n) ∈ [N]^n satisfying i_1 < i_2 < \cdots < i_n).

This proves Proposition 7.1.11.


Thus, there are only N “interesting” elementary symmetric polynomials:
namely, e1 , e2 , . . . , e N . All other en ’s are either 1 or 0.
In contrast, there are infinitely many “interesting” complete homogeneous
symmetric polynomials and power sums (provided that N > 0). For example,
for N = 2, we have h5 = x15 + x14 x2 + x13 x22 + x12 x23 + x1 x24 + x25 and p5 = x15 + x25 .
We have so far defined three sequences (e0 , e1 , e2 , . . .), (h0 , h1 , h2 , . . .) and
( p0 , p1 , p2 , . . .) of symmetric polynomials. The following theorem (known as
the Newton–Girard formulas or the Newton–Girard identities) relates these three
sequences:

Theorem 7.1.12 (Newton–Girard formulas). For any positive integer n, we have
\[
\sum_{j=0}^{n} (-1)^j e_j h_{n-j} = 0; \tag{261}
\]
\[
\sum_{j=1}^{n} (-1)^{j-1} e_{n-j} p_j = n e_n; \tag{262}
\]
\[
\sum_{j=1}^{n} h_{n-j} p_j = n h_n. \tag{263}
\]

Example 7.1.13. The formula (262), applied to n = 2, says that e_1 p_1 − p_2 = 2e_2. Therefore,
\[
p_2 = e_1 p_1 - 2e_2 = (x_1 + x_2 + \cdots + x_N)(x_1 + x_2 + \cdots + x_N) - 2 \sum_{i<j} x_i x_j.
\]
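All three Newton–Girard formulas can be checked numerically at a sample point, with e_n, h_n, p_n computed straight from their monomial definitions. A Python sketch (helpers and test values are ours):

```python
from itertools import combinations, combinations_with_replacement
from math import prod

# Numeric check of (261), (262), (263) at a sample point with N = 4.
def e(n, xs):
    return sum(prod(c) for c in combinations(xs, n))

def h(n, xs):
    return sum(prod(c) for c in combinations_with_replacement(xs, n))

def p(n, xs):
    return sum(x ** n for x in xs) if n > 0 else 1

xs = [2, 3, 5, 7]  # N = 4
for n in range(1, 7):
    chk261 = sum((-1) ** j * e(j, xs) * h(n - j, xs) for j in range(n + 1)) == 0
    chk262 = sum((-1) ** (j - 1) * e(n - j, xs) * p(j, xs)
                 for j in range(1, n + 1)) == n * e(n, xs)
    chk263 = sum(h(n - j, xs) * p(j, xs) for j in range(1, n + 1)) == n * h(n, xs)
    print(n, chk261, chk262, chk263)  # each line prints: n True True True
```

Note that the check also covers n > N, where e_n vanishes but h_n and p_n do not.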

Before we prove (part of) Theorem 7.1.12, we establish some equalities in the
polynomial rings P [t] and P [u, v] (here, t, u, v are three new indeterminates)
and in the FPS ring P [[t]]:

Proposition 7.1.14. (a) In the polynomial ring P[t], we have
\[
\prod_{i=1}^{N} (1 - t x_i) = \sum_{n \in \mathbb{N}} (-1)^n t^n e_n.
\]
(b) In the polynomial ring P[u, v], we have
\[
\prod_{i=1}^{N} (u - v x_i) = \sum_{n=0}^{N} (-1)^n u^{N-n} v^n e_n.
\]
(c) In the FPS ring P[[t]], we have
\[
\prod_{i=1}^{N} \frac{1}{1 - t x_i} = \sum_{n \in \mathbb{N}} t^n h_n.
\]

Proof of Proposition 7.1.14 (sketched). (a) For each i ∈ {1, 2, . . . , N}, we have
\[
1 + t x_i = \sum_{a \in \{0,1\}} (t x_i)^a.
\]
Multiplying these equalities over all i ∈ {1, 2, . . . , N}, we obtain
\[
\prod_{i=1}^{N} (1 + t x_i) = \prod_{i=1}^{N} \sum_{a \in \{0,1\}} (t x_i)^a
= \sum_{(a_1, a_2, \ldots, a_N) \in \{0,1\}^N} \underbrace{(t x_1)^{a_1} (t x_2)^{a_2} \cdots (t x_N)^{a_N}}_{= t^{a_1 + a_2 + \cdots + a_N} x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}}
\qquad \text{(by Proposition 3.11.26)}
\]
\[
= \sum_{(a_1, a_2, \ldots, a_N) \in \{0,1\}^N} t^{a_1 + a_2 + \cdots + a_N} x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}
= \sum_{m \text{ is a squarefree monomial}} t^{\deg m} m
\]
(since the squarefree monomials are precisely the x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} with (a_1, a_2, \ldots, a_N) ∈ {0, 1}^N, and since the degree of such a monomial is precisely a_1 + a_2 + \cdots + a_N)
\[
= \sum_{n \in \mathbb{N}} t^n \underbrace{\sum_{\substack{m \text{ is a squarefree} \\ \text{monomial of degree } n}} m}_{\substack{= (\text{sum of all squarefree monomials of degree } n) = e_n \\ \text{(by the definition of } e_n)}}
= \sum_{n \in \mathbb{N}} t^n e_n
\]
(here, we have split the sum according to the value of deg m).
Substituting −t for t on both sides of this equality, we obtain
\[
\prod_{i=1}^{N} (1 - t x_i) = \sum_{n \in \mathbb{N}} \underbrace{(-t)^n}_{= (-1)^n t^n} e_n = \sum_{n \in \mathbb{N}} (-1)^n t^n e_n.
\]
This proves Proposition 7.1.14 (a).


(b) This is similar to part (a). (See Exercise A.6.1.1 for details.)
(c) For each i ∈ {1, 2, . . . , N}, we have
\[
\frac{1}{1 - t x_i} = 1 + t x_i + (t x_i)^2 + (t x_i)^3 + \cdots = \sum_{a \in \mathbb{N}} (t x_i)^a
\]
(by substituting t x_i for x in the geometric series formula (5)).
Multiplying these equalities over all i ∈ {1, 2, . . . , N}, we obtain
\[
\prod_{i=1}^{N} \frac{1}{1 - t x_i} = \prod_{i=1}^{N} \sum_{a \in \mathbb{N}} (t x_i)^a
= \sum_{(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N} \underbrace{(t x_1)^{a_1} (t x_2)^{a_2} \cdots (t x_N)^{a_N}}_{= t^{a_1 + a_2 + \cdots + a_N} x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}}
\qquad \text{(by Proposition 3.11.27)}
\]
\[
= \sum_{(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N} t^{a_1 + a_2 + \cdots + a_N} x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}
= \sum_{m \text{ is a monomial}} t^{\deg m} m
\]
(since the monomials are precisely the x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} with (a_1, a_2, \ldots, a_N) ∈ N^N, and since the degree of such a monomial is precisely a_1 + a_2 + \cdots + a_N)
\[
= \sum_{n \in \mathbb{N}} t^n \underbrace{\sum_{\substack{m \text{ is a monomial} \\ \text{of degree } n}} m}_{\substack{= (\text{sum of all monomials of degree } n) = h_n \\ \text{(by the definition of } h_n)}}
= \sum_{n \in \mathbb{N}} t^n h_n
\]
(here, we have split the sum according to the value of deg m).
This proves Proposition 7.1.14 (c).
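Part (a) is easy to confirm at a sample point by expanding the product as a polynomial in t and comparing coefficients with (−1)^n e_n. A Python sketch (helpers and values are ours):

```python
from itertools import combinations
from math import prod

# Coefficient check of Proposition 7.1.14 (a) at a sample point.
def poly_mul(P, Q):
    # coefficient lists in t, lowest degree first
    R = [0] * (len(P) + len(Q) - 1)
    for i, a in enumerate(P):
        for j, b in enumerate(Q):
            R[i + j] += a * b
    return R

def e(n, xs):
    return sum(prod(c) for c in combinations(xs, n))

xs = [2, 3, 5]
lhs = [1]
for x in xs:
    lhs = poly_mul(lhs, [1, -x])  # multiply by (1 - t x)
rhs = [(-1) ** n * e(n, xs) for n in range(len(xs) + 1)]
print(lhs, lhs == rhs)  # → [1, -10, 31, -30] True
```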


Let us now prove the first Newton–Girard formula (261):

Proof of the 1st Newton–Girard formula (261). In the FPS ring P[[t]], we have
\[
\prod_{i=1}^{N} (1 - t x_i) = \sum_{n \in \mathbb{N}} (-1)^n t^n e_n \qquad \text{(by Proposition 7.1.14 (a))}
\]
and
\[
\prod_{i=1}^{N} \frac{1}{1 - t x_i} = \sum_{n \in \mathbb{N}} t^n h_n \qquad \text{(by Proposition 7.1.14 (c))}.
\]
Multiplying these two equalities, we obtain
\[
\left( \prod_{i=1}^{N} (1 - t x_i) \right) \left( \prod_{i=1}^{N} \frac{1}{1 - t x_i} \right)
= \left( \sum_{n \in \mathbb{N}} (-1)^n t^n e_n \right) \left( \sum_{n \in \mathbb{N}} t^n h_n \right)
= \left( \sum_{j \in \mathbb{N}} (-1)^j t^j e_j \right) \left( \sum_{k \in \mathbb{N}} t^k h_k \right)
\]
(here, we have renamed the summation indices n and n (in the two sums) as j and k)
\[
= \sum_{j \in \mathbb{N}} \sum_{k \in \mathbb{N}} \underbrace{(-1)^j t^j e_j t^k h_k}_{= (-1)^j e_j h_k t^{j+k}}
= \sum_{(j,k) \in \mathbb{N}^2} (-1)^j e_j h_k t^{j+k}
= \sum_{n \in \mathbb{N}} \left( \sum_{\substack{(j,k) \in \mathbb{N}^2; \\ j+k=n}} (-1)^j e_j h_k \right) t^n
\]
(here, we have split the sum according to the value of j + k). Comparing this with
\[
\left( \prod_{i=1}^{N} (1 - t x_i) \right) \left( \prod_{i=1}^{N} \frac{1}{1 - t x_i} \right)
= \prod_{i=1}^{N} \underbrace{\left( (1 - t x_i) \cdot \frac{1}{1 - t x_i} \right)}_{=1}
= \prod_{i=1}^{N} 1 = 1,
\]
we obtain
\[
1 = \sum_{n \in \mathbb{N}} \left( \sum_{\substack{(j,k) \in \mathbb{N}^2; \\ j+k=n}} (-1)^j e_j h_k \right) t^n.
\]
This is an equality between two FPSs in P[[t]]. Comparing coefficients in front of t^n, we conclude that each positive integer n satisfies
\[
0 = \sum_{\substack{(j,k) \in \mathbb{N}^2; \\ j+k=n}} (-1)^j e_j h_k = \sum_{j=0}^{n} (-1)^j e_j h_{n-j}
\]
(here, we have substituted (j, n − j) for (j, k) in the sum, since the map {0, 1, . . . , n} → {(j, k) ∈ N² | j + k = n} that sends each j ∈ {0, 1, . . . , n} to the pair (j, n − j) is a bijection). This proves the 1st Newton–Girard formula (261).


Proving the other two formulas in Theorem 7.1.12 is a homework problem
(Exercise A.6.1.3). Note that there exist proofs of different kinds: FPS manipu-
lations; induction; sign-reversing involutions.
Note that our above proof of Theorem 7.1.12 attests to the usefulness of gen-
erating functions: Even though the polynomials f ∈ P already involve N vari-
ables x1 , x2 , . . . , x N , the proof proceeds by adjoining yet another variable t (to
form the ring P [[t]]).
The Newton–Girard formulas can be used to express the ei ’s and the hi ’s in
terms of each other, and the pi ’s in terms of the ei ’s and the hi ’s, and finally the
ei ’s and the hi ’s in terms of the pi ’s when K is a commutative Q-algebra (i.e.,
when the numbers 1, 2, 3, . . . have inverses in K). More generally, it turns out
that we can express any symmetric polynomial in terms of ei ’s or of hi ’s or (if
K is a commutative Q-algebra) of pi ’s:
Theorem 7.1.15 (Fundamental Theorem of Symmetric Polynomials, due to
Gauss et al.). (a) The elementary symmetric polynomials e1 , e2 , . . . , e N are
algebraically independent (over K) and generate the K-algebra S .
In other words, each f ∈ S can be uniquely written as a polynomial in
e1 , e2 , . . . , e N .
In yet other words, the map
\[
\underbrace{K[y_1, y_2, \ldots, y_N]}_{\substack{\text{a polynomial ring} \\ \text{in } N \text{ variables}}} \to S,
\qquad g \mapsto g[e_1, e_2, \ldots, e_N]
\]
is a K-algebra isomorphism.
(b) The complete homogeneous symmetric polynomials h_1, h_2, \ldots, h_N are algebraically independent (over K) and generate the K-algebra S.
In other words, each f ∈ S can be uniquely written as a polynomial in h_1, h_2, \ldots, h_N.
In yet other words, the map
\[
\underbrace{K[y_1, y_2, \ldots, y_N]}_{\substack{\text{a polynomial ring} \\ \text{in } N \text{ variables}}} \to S,
\qquad g \mapsto g[h_1, h_2, \ldots, h_N]
\]
is a K-algebra isomorphism.
(c) Now assume that K is a commutative Q-algebra (e.g., a field of characteristic 0). Then, the power sums p_1, p_2, \ldots, p_N are algebraically independent (over K) and generate the K-algebra S.
In other words, each f ∈ S can be uniquely written as a polynomial in p_1, p_2, \ldots, p_N.
In yet other words, the map
\[
\underbrace{K[y_1, y_2, \ldots, y_N]}_{\substack{\text{a polynomial ring} \\ \text{in } N \text{ variables}}} \to S,
\qquad g \mapsto g[p_1, p_2, \ldots, p_N]
\]
is a K-algebra isomorphism.

Example 7.1.16. (a) Theorem 7.1.15 (a) yields that p3 can be uniquely written
as a polynomial in e1 , e2 , . . . , e N . How to write it this way?
Here is a method that (more generally) can be used to express pn (for any
given n > 0) as a polynomial in e1 , e2 , . . . , en . This method is recursive, so we
assume that all the “smaller” power sums p1 , p2 , . . . , pn−1 have already been
expressed in this way. Now, the 2nd Newton–Girard formula (262) yields
\[
n e_n = \sum_{j=1}^{n} (-1)^{j-1} e_{n-j} p_j
= \sum_{j=1}^{n-1} (-1)^{j-1} e_{n-j} p_j + (-1)^{n-1} \underbrace{e_{n-n}}_{= e_0 = 1} p_n
\]
(here, we have split off the addend for j = n from the sum)
\[
= \sum_{j=1}^{n-1} (-1)^{j-1} e_{n-j} p_j + (-1)^{n-1} p_n.
\]
Solving this equality for p_n, we obtain
\[
p_n = (-1)^{n-1} \left( n e_n - \sum_{j=1}^{n-1} (-1)^{j-1} e_{n-j} p_j \right).
\]
The right hand side can now be expressed in terms of e_1, e_2, \ldots, e_n (since the only power sums appearing in it are p_1, p_2, \ldots, p_{n-1}, which we already know how to express in these terms); therefore, we obtain an expression of p_n as a polynomial in e_1, e_2, \ldots, e_n.
For example, here is what we obtain for n ∈ [4] by following this method:
\begin{align*}
p_1 &= e_1; \\
p_2 &= e_1^2 - 2e_2; \\
p_3 &= e_1^3 - 3e_2 e_1 + 3e_3; \\
p_4 &= e_1^4 - 4e_2 e_1^2 + 2e_2^2 + 4e_3 e_1 - 4e_4.
\end{align*}
If N < n, then this expression of p_n as a polynomial in e_1, e_2, \ldots, e_n becomes an expression as a polynomial in e_1, e_2, \ldots, e_N if we throw away all addends that contain one of e_{N+1}, e_{N+2}, \ldots, e_n as factor (we are allowed to do this, because Proposition 7.1.11 shows that all these addends are 0). For example, if N = 2, then the expression p_4 = e_1^4 − 4e_2 e_1^2 + 2e_2^2 + 4e_3 e_1 − 4e_4 becomes p_4 = e_1^4 − 4e_2 e_1^2 + 2e_2^2 this way.
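The displayed expressions of p_2, p_3, p_4 through the e_i's can be confirmed at a sample point. A Python sketch (helpers and values are ours):

```python
from itertools import combinations
from math import prod

# Check p_2, p_3, p_4 in terms of e_1, ..., e_4 at a sample point (N = 4).
def e(n, xs):
    return sum(prod(c) for c in combinations(xs, n))

def p(n, xs):
    return sum(x ** n for x in xs)

xs = [1, 2, 3, 4]
e1, e2, e3, e4 = (e(n, xs) for n in range(1, 5))
print(p(2, xs) == e1 ** 2 - 2 * e2)                # → True
print(p(3, xs) == e1 ** 3 - 3 * e2 * e1 + 3 * e3)  # → True
print(p(4, xs) == e1 ** 4 - 4 * e2 * e1 ** 2
      + 2 * e2 ** 2 + 4 * e3 * e1 - 4 * e4)        # → True
```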
(b) Theorem 7.1.15 (b) yields that p3 can be uniquely written as a polyno-
mial in h1 , h2 , . . . , h N . How to write it this way?
In part (a), we have given a method to express pn (for any given n > 0)
as a polynomial in e1 , e2 , . . . , en . A similar method (but using (263) instead
of (262)) can be used to express pn (for any given n > 0) as a polynomial in
h1 , h2 , . . . , hn . For example, for n ∈ [4], this method produces

\begin{align*}
p_1 &= h_1; \\
p_2 &= -h_1^2 + 2h_2; \\
p_3 &= h_1^3 - 3h_2 h_1 + 3h_3; \\
p_4 &= -h_1^4 + 4h_2 h_1^2 - 2h_2^2 - 4h_3 h_1 + 4h_4.
\end{align*}

(The similarity with the analogous formulas expressing pn in terms of


e1 , e2 , . . . , en is not accidental – the formulas are indeed identical when n is
odd and differ in all signs when n is even. Proving this is Exercise A.6.1.7
(b).)
So we can express p_n as a polynomial in h_1, h_2, \ldots, h_n. However, expressing p_n as a polynomial in h_1, h_2, \ldots, h_N is harder when N < n. For example, if N = 2, then the former expression is p_4 = −h_1^4 + 4h_2 h_1^2 − 2h_2^2 − 4h_3 h_1 + 4h_4, while the latter is p_4 = −h_1^4 + 2h_2^2; there is no easy way to get the latter from the former.
(c) Assume that K is a Q-algebra. Theorem 7.1.15 (c) yields that e3 can be
uniquely written as a polynomial in p1 , p2 , . . . , p N . How to write it this way?
In part (a), we have given a method to express p_n (for any given n > 0) as a polynomial in e_1, e_2, \ldots, e_n. The crux of this method was to solve the equation (262) for p_n. If we instead solve it for e_n (which is almost immediate: it gives e_n = \frac{1}{n} \sum_{j=1}^{n} (-1)^{j-1} e_{n-j} p_j), then we obtain a method for expressing e_n (for any given n > 0) as a polynomial in p_1, p_2, \ldots, p_n. Applied to all n ∈ [4], this method produces
\begin{align*}
e_1 &= p_1; \\
e_2 &= \frac{1}{2} p_1^2 - \frac{1}{2} p_2; \\
e_3 &= \frac{1}{6} p_1^3 - \frac{1}{2} p_2 p_1 + \frac{1}{3} p_3; \\
e_4 &= \frac{1}{24} p_1^4 - \frac{1}{4} p_2 p_1^2 + \frac{1}{8} p_2^2 + \frac{1}{3} p_3 p_1 - \frac{1}{4} p_4.
\end{align*}

Note the fractions on the right hand sides! This is why we required K to be
a Q-algebra in Theorem 7.1.15 (c). In general, we cannot express en in terms
of p1 , p2 , . . . , pn if the integer n is not invertible in K.
The question of expressing en as a polynomial in p1 , p2 , . . . , p N (as opposed
to p1 , p2 , . . . , pn ) is easily reduced to what we just have done: If n ≤ N,
then we have answered it already; if n > N, then the answer is en = 0 (by
Proposition 7.1.11).
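The expression of e_3 through power sums can be checked in exact rational arithmetic (the fractions are precisely the reason K must be a Q-algebra here). A Python sketch (helpers and values are ours):

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Check e_3 = p_1^3/6 - p_2 p_1/2 + p_3/3 at a sample point.
def e(n, xs):
    return sum(prod(c) for c in combinations(xs, n))

def p(n, xs):
    return sum(x ** n for x in xs)

xs = [2, 3, 5, 7]
p1, p2, p3 = (Fraction(p(n, xs)) for n in (1, 2, 3))
rhs = Fraction(1, 6) * p1 ** 3 - Fraction(1, 2) * p2 * p1 + Fraction(1, 3) * p3
print(rhs, rhs == e(3, xs))  # → 247 True
```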
(d) Not even algebraic independence of p_1, p_2, \ldots, p_N is true in general (if we don’t assume that K is a Q-algebra)! Indeed, if K = Z/2, then
\[
p_1^2 = (x_1 + x_2 + \cdots + x_N)^2
= \underbrace{x_1^2 + x_2^2 + \cdots + x_N^2}_{= p_2} + \underbrace{2 \sum_{i<j} x_i x_j}_{\substack{= 0 \\ \text{if } K = \mathbb{Z}/2}}
= p_2.
\]
More generally, if K = Z/p for some prime p, then the Idiot’s Binomial Formula (i.e., the formula (x + y)^p = x^p + y^p that holds in any commutative Z/p-algebra) yields p_1^p = p_p. (Did I mention that lowercase letters are in short supply in the theory of symmetric polynomials?)
(e) If N = 3, then Theorem 7.1.15 (b) yields that h_4 can be written as a polynomial in h_1, h_2, h_3. Here is how this looks:
\[
h_4 = h_1^4 - 3h_2 h_1^2 + h_2^2 + 2h_3 h_1.
\]
(f) If N = 3, then
\[
((x - y)(y - z)(z - x))^2 = e_2^2 e_1^2 - 4e_2^3 - 4e_3 e_1^3 + 18 e_3 e_2 e_1 - 27 e_3^2
\]
(where we are again denoting x_1, x_2, x_3 by x, y, z).

We omit the proof of Theorem 7.1.15 for now.
Let us record a useful criterion for showing that a polynomial is symmetric:

Lemma 7.1.17. For each i ∈ [N − 1], we consider the simple transposition
$s_i \in S_N$ defined in Definition 5.2.3 (applied to n = N).
Let $f \in \mathcal{P}$. Assume that
\[
s_k \cdot f = f \qquad \text{for each } k \in [N-1] . \tag{264}
\]
Then, the polynomial f is symmetric.

In plainer terms, Lemma 7.1.17 says that if a polynomial $f \in \mathcal{P}$ remains
unchanged whenever we swap two adjacent indeterminates (i.e., it remains
unchanged if we swap x1 with x2; it remains unchanged if we swap x2 with
x3; it remains unchanged if we swap x3 with x4; etc.), then this polynomial f is
symmetric. For example, for N = 3, it says that if a polynomial f ∈ K[x1, x2, x3]
satisfies f[x2, x1, x3] = f and f[x1, x3, x2] = f, then f is symmetric.

Proof of Lemma 7.1.17. This follows from Corollary 5.3.22 or from Theorem 5.3.17
(a). See Section B.8 for the details of this proof.
7.2. N-partitions and monomial symmetric polynomials
Recall that an (integer) partition means a weakly decreasing finite tuple of posi-
tive integers – such as (5, 3, 3, 2, 1).
Let us define a variant of this notion:
Definition 7.2.1. An N-partition will mean a weakly decreasing N-tuple of
nonnegative integers. In other words, an N-partition means an N-tuple
$(\lambda_1, \lambda_2, \ldots, \lambda_N) \in \mathbb{N}^N$ with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N$.

For example, (5, 3, 3, 2, 1, 0, 0) is a 7-partition.

Per se, an N-partition can contain zeroes and thus is not always a parti-
tion. However, the N-partitions are “more or less the same” as the partitions of
length ≤ N. Indeed:

Proposition 7.2.2. There is a bijection
\[
\{\text{partitions of length} \leq N\} \to \{N\text{-partitions}\} ,
\qquad
(\lambda_1, \lambda_2, \ldots, \lambda_\ell) \mapsto \Big( \lambda_1, \lambda_2, \ldots, \lambda_\ell, \underbrace{0, 0, \ldots, 0}_{N - \ell \text{ zeroes}} \Big) .
\]

Proof. Straightforward. (We essentially did this back in our proof of Proposition
4.4.7 (a), although we used the letter k instead of N back then.)
The N-partitions turn out to be closely connected to the ring S . Indeed, we
will soon see various bases of the K-module S , all of which are indexed by the
N-partitions. We shall construct the simplest one in a moment. First, we define
some auxiliary notations:
Definition 7.2.3. Let $a = (a_1, a_2, \ldots, a_N) \in \mathbb{N}^N$. Then:

(a) We let $x^a$ denote the monomial $x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}$.
(b) We let sort a mean the N-partition obtained from a by sorting the entries
of a in weakly decreasing order.

For example, if N = 5, then $x^{(1,5,0,4,4)} = x_1^1 x_2^5 x_3^0 x_4^4 x_5^4 = x_1 x_2^5 x_4^4 x_5^4$ and
sort (1, 5, 0, 4, 4) = (5, 4, 4, 1, 0).
Definition 7.2.4. Let λ be any N-partition. Then, we define a symmetric
polynomial $m_\lambda \in \mathcal{S}$ by
\[
m_\lambda := \sum_{\substack{a \in \mathbb{N}^N; \\ \operatorname{sort} a = \lambda}} x^a .
\]
This is called the monomial symmetric polynomial corresponding to λ.

Example 7.2.5. Let N = 3. Then,
\begin{align*}
m_{(2,1,0)} &= \sum_{\substack{a \in \mathbb{N}^3; \\ \operatorname{sort} a = (2,1,0)}} x^a
= x^{(2,1,0)} + x^{(2,0,1)} + x^{(1,2,0)} + x^{(1,0,2)} + x^{(0,2,1)} + x^{(0,1,2)} \\
&= x_1^2 x_2 + x_1^2 x_3 + x_1 x_2^2 + x_1 x_3^2 + x_2^2 x_3 + x_2 x_3^2
\end{align*}
and
\begin{align*}
m_{(3,2,1)} &= \sum_{\substack{a \in \mathbb{N}^3; \\ \operatorname{sort} a = (3,2,1)}} x^a
= x^{(3,2,1)} + x^{(3,1,2)} + x^{(2,3,1)} + x^{(2,1,3)} + x^{(1,3,2)} + x^{(1,2,3)} \\
&= x_1^3 x_2^2 x_3 + x_1^3 x_2 x_3^2 + x_1^2 x_2^3 x_3 + x_1^2 x_2 x_3^3 + x_1 x_2^3 x_3^2 + x_1 x_2^2 x_3^3 \\
&= x_1^3 x_2^2 x_3 + (\text{all other 5 permutations of this monomial})
\end{align*}
and
\begin{align*}
m_{(2,2,1)} &= \sum_{\substack{a \in \mathbb{N}^3; \\ \operatorname{sort} a = (2,2,1)}} x^a
= x^{(2,2,1)} + x^{(2,1,2)} + x^{(1,2,2)} \\
&= x_1^2 x_2^2 x_3 + x_1^2 x_2 x_3^2 + x_1 x_2^2 x_3^2
\end{align*}
and
\[
m_{(2,2,2)} = \sum_{\substack{a \in \mathbb{N}^3; \\ \operatorname{sort} a = (2,2,2)}} x^a = x_1^2 x_2^2 x_3^2 .
\]
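The definition of $m_\lambda$ is easy to experiment with on a computer. The following sketch (not part of the notes) lists the exponent vectors $a$ with sort $a = \lambda$, i.e., the monomials of $m_\lambda$, and reproduces the monomial counts of Example 7.2.5.

```python
from itertools import permutations

def m_support(lam):
    """All exponent vectors a in N^N with sort(a) = lam, where N = len(lam);
    these index the monomials of the monomial symmetric polynomial m_lam."""
    return set(permutations(lam))

# Example 7.2.5 (N = 3): m_(2,1,0) and m_(3,2,1) have 6 monomials each,
# m_(2,2,1) has 3, and m_(2,2,2) has just one.
```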

Our symmetric polynomials $e_n$, $h_n$ and $p_n$ so far can be easily expressed in
terms of monomial symmetric polynomials:

Proposition 7.2.6. (a) For each n ∈ {0, 1, . . . , N}, we have
\[
e_n = m_{(1,1,\ldots,1,0,0,\ldots,0)} ,
\]
where (1, 1, . . . , 1, 0, 0, . . . , 0) is the N-tuple that begins with n many 1’s and
ends with N − n many 0’s.
(b) For each n ∈ N, we have
\[
h_n = \sum_{\substack{\lambda \text{ is an } N\text{-partition}; \\ |\lambda| = n}} m_\lambda ,
\]
where the size $|\lambda|$ of an N-partition λ is defined to be the sum of its entries
(i.e., if $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$, then $|\lambda| := \lambda_1 + \lambda_2 + \cdots + \lambda_N$).
(c) Assume that N > 0. For each n ∈ N, we have
\[
p_n = m_{(n,0,0,\ldots,0)} ,
\]
where (n, 0, 0, . . . , 0) is the N-tuple that begins with an n and ends with N − 1
zeroes.

Proof. Easy and LTTR.
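Part (b) can be spot-checked by brute force for small N and n: the monomials of $h_n$ (all degree-n monomials) are partitioned among the $m_\lambda$'s. A quick sketch (not from the notes; N = 3, n = 4 is an arbitrary test case):

```python
from itertools import permutations, product

def h_support(N, n):
    """Exponent vectors of h_n in N variables: all a in N^N with |a| = n."""
    return {a for a in product(range(n + 1), repeat=N) if sum(a) == n}

def m_support(lam):
    """Exponent vectors of m_lam: all rearrangements of lam."""
    return set(permutations(lam))

N, n = 3, 4
partitions = {tuple(sorted(a, reverse=True)) for a in h_support(N, n)}
union = set().union(*(m_support(lam) for lam in partitions))
# union should equal h_support(N, n), and the m-supports are pairwise
# disjoint (distinct lam give distinct sorted exponent vectors).
```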
The monomial symmetric polynomials mλ are the “building blocks” of sym-
metric polynomials, in the same way as the monomials are the “building blocks”
of polynomials. Here is a way to make this precise:

Theorem 7.2.7. (a) The family $(m_\lambda)_{\lambda \text{ is an } N\text{-partition}}$ is a basis of the K-module
$\mathcal{S}$.
(b) Each symmetric polynomial $f \in \mathcal{S}$ satisfies
\[
f = \sum_{\substack{\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N) \\ \text{is an } N\text{-partition}}}
\left( \left[ x_1^{\lambda_1} x_2^{\lambda_2} \cdots x_N^{\lambda_N} \right] f \right) m_\lambda .
\]
(c) Let n ∈ N. Let
\[
\mathcal{S}_n := \{\text{homogeneous symmetric polynomials } f \in \mathcal{P} \text{ of degree } n\}
\]
(where we understand the zero polynomial $0 \in \mathcal{P}$ to be homogeneous of
every degree). Then, $\mathcal{S}_n$ is a K-submodule of $\mathcal{S}$.
(d) Define the size of any N-partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ to be the number
$\lambda_1 + \lambda_2 + \cdots + \lambda_N \in \mathbb{N}$. Then, the family $(m_\lambda)_{\lambda \text{ is an } N\text{-partition of size } n}$ is a basis
of the K-module $\mathcal{S}_n$.
Example 7.2.8. Let N = 3, and let us rename the indeterminates x1, x2, x3 as
x, y, z. The polynomial (x + y)(y + z)(z + x) is symmetric, thus belongs to
$\mathcal{S}$. Expanding it, we find
\[
(x + y)(y + z)(z + x)
= \underbrace{x^2 y + x^2 z + y^2 x + y^2 z + z^2 x + z^2 y}_{= m_{(2,1,0)}}
+ 2 \underbrace{xyz}_{= m_{(1,1,1)}}
= m_{(2,1,0)} + 2 m_{(1,1,1)} .
\]
Thus, we have written (x + y)(y + z)(z + x) as a K-linear combination of
$m_\lambda$’s for various N-partitions λ. The same procedure (i.e., expanding, and
then collecting monomials that differ only in the order of their exponents,
such as the monomials $x^2 y, x^2 z, y^2 x, y^2 z, z^2 x, z^2 y$ in our example) can be
applied to any symmetric polynomial $f \in \mathcal{S}$, and always results in a
representation of f as a K-linear combination of $m_\lambda$’s (because the symmetry of
f ensures that monomials that differ only in the order of their exponents
appear in f with equal coefficients).
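The collecting procedure of Example 7.2.8 is entirely mechanical, so here is a small sketch of it (my own illustration, not from the notes): a polynomial is stored as a dict from exponent tuples to coefficients, and grouping by sorted exponents yields its expansion in the $m_\lambda$ basis.

```python
def to_m_basis(poly):
    """Expand a symmetric polynomial in the m_lambda basis.
    poly: dict {exponent tuple a: coefficient of x^a}.
    Returns: dict {N-partition lam: coefficient of m_lam}.
    (By symmetry, all monomials with sort a = lam share one coefficient.)"""
    return {tuple(sorted(a, reverse=True)): c for a, c in poly.items()}

# (x + y)(y + z)(z + x) expanded, with N = 3:
poly = {(2, 1, 0): 1, (2, 0, 1): 1, (1, 2, 0): 1, (0, 2, 1): 1,
        (1, 0, 2): 1, (0, 1, 2): 1, (1, 1, 1): 2}
# to_m_basis(poly) == {(2, 1, 0): 1, (1, 1, 1): 2},
# i.e. m_(2,1,0) + 2 m_(1,1,1), as in Example 7.2.8.
```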
The proof of Theorem 7.2.7 will rely on a simple proposition that expresses
how a permutation σ ∈ S_N transforms the coefficients of a polynomial f ∈ P
(guess what: it permutes these coefficients):

Proposition 7.2.9. Let $\sigma \in S_N$ and $f \in \mathcal{P}$. Then,
\[
\left[ x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} \right] (\sigma \cdot f)
= \left[ x_1^{a_{\sigma(1)}} x_2^{a_{\sigma(2)}} \cdots x_N^{a_{\sigma(N)}} \right] f
\]
for any $(a_1, a_2, \ldots, a_N) \in \mathbb{N}^N$.

Here, as in Section 3.15, we are using the notation $\left[ x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N} \right] g$ for the
coefficient of a monomial $x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}$ in a polynomial $g \in \mathcal{P}$.

The proof of Proposition 7.2.9 is quite easy and can be found in Section B.9.
Proof of Theorem 7.2.7 (sketched). Here is a rough outline of the proof; we leave
the details to the reader.
(a) The method shown in Example 7.2.8 shows that each f ∈ S is a K-linear
combination of the family (mλ )λ is an N-partition . Thus, this family spans S . It
remains to show that this family is K-linearly independent.
To show this, we observe that if you expand a linear combination
$\sum_{\lambda \text{ is an } N\text{-partition}} a_\lambda m_\lambda$
(where $a_\lambda \in K$), then none of the addends can cancel (since no two $m_\lambda$’s have
any monomial in common133 ); thus, the linear combination cannot be 0 un-
less all the $a_\lambda$’s are 0. This proves the K-linear independence of the family
$(m_\lambda)_{\lambda \text{ is an } N\text{-partition}}$. Thus, the proof of Theorem 7.2.7 (a) is complete.
(b) This should be clear from Example 7.2.8 as well.
(c) This is rather obvious: Any K-linear combination of homogeneous poly-
nomials of degree n is again homogeneous of degree n.
(d) This follows by the same argument as part (a), except that we now need to
observe that homogeneous polynomials of degree n are K-linear combinations
of degree-n monomials (rather than arbitrary monomials).
7.3. Schur polynomials
7.3.1. Alternants
Here is one way to generate symmetric polynomials:
133 and since each mλ contains at least one monomial (trivial observation, but should not be
forgotten)
Example 7.3.1. Let N = 3, and let us again abbreviate the indeterminates
x1, x2, x3 as x, y, z. For simplicity, we assume that K is a field. As we know
(from the Vandermonde determinant – specifically, Theorem 6.4.31 (a)), we
have
\[
\det \begin{pmatrix} x^2 & x & 1 \\ y^2 & y & 1 \\ z^2 & z & 1 \end{pmatrix}
= \prod_{i<j} \left( x_i - x_j \right) = (x-y)(x-z)(y-z) .
\]
What about similar determinants, such as $\det \begin{pmatrix} x^5 & x^3 & 1 \\ y^5 & y^3 & 1 \\ z^5 & z^3 & 1 \end{pmatrix}$ ? Just as in
the proof of Lemma 6.4.33 (in which we computed the original Vandermonde
determinant), we can argue that this is a polynomial in x, y, z that is divisible
by each of x − y and x − z and y − z (since it becomes 0 if we set one of x, y, z
equal to another). Hence,
\[
\det \begin{pmatrix} x^5 & x^3 & 1 \\ y^5 & y^3 & 1 \\ z^5 & z^3 & 1 \end{pmatrix}
= (x-y)(x-z)(y-z) \cdot q
\]
for some $q \in K[x, y, z]$. However, this time, degree considerations yield
deg q = 8 − 3 = 5, so q is no longer just a constant. What is q ? Using
computer algebra, we see that
\begin{align*}
q &= \frac{\det \begin{pmatrix} x^5 & x^3 & 1 \\ y^5 & y^3 & 1 \\ z^5 & z^3 & 1 \end{pmatrix}}{(x-y)(x-z)(y-z)}
= \frac{-x^5 y^3 + x^5 z^3 + x^3 y^5 - x^3 z^5 - y^5 z^3 + y^3 z^5}{-x^2 y + x^2 z + x y^2 - x z^2 - y^2 z + y z^2} \\
&= x^2 y^3 + x^3 y^2 + x^2 z^3 + x^3 z^2 + y^2 z^3 + y^3 z^2 + x y z^3
+ x y^3 z + x^3 y z + 2 x y^2 z^2 + 2 x^2 y z^2 + 2 x^2 y^2 z \\
&= m_{(3,2,0)} + m_{(3,1,1)} + 2 m_{(2,2,1)} \in \mathcal{S} .
\end{align*}
Note that $q \in \mathcal{S}$ can be easily seen without computing q. Indeed, if
we swap two of our variables x, y, z, then both $\det \begin{pmatrix} x^5 & x^3 & 1 \\ y^5 & y^3 & 1 \\ z^5 & z^3 & 1 \end{pmatrix}$ and
(x − y)(x − z)(y − z) get multiplied by −1, so their ratio q stays unchanged.
This shows that σ · q = q whenever σ ∈ S3 is a transposition. Since the
transpositions generate the group S3 (indeed, Corollary 5.3.22 yields that the
simple transpositions s1, s2 generate S3), this entails that σ · q = q for any
σ ∈ S3 (not just for transpositions). This means that q is symmetric.
There is nothing special about the exponents 5 and 3 and 0 in the above
determinant. More generally, for any a, b, c ∈ N, we can define the so-called
alternant
\[
\det \begin{pmatrix} x^a & x^b & x^c \\ y^a & y^b & y^c \\ z^a & z^b & z^c \end{pmatrix} \in \mathcal{P} .
\]
When studying this alternant, we can WLOG assume that a, b, c are distinct
(since otherwise, the alternant is just 0) and furthermore assume that a >
b > c (since the general case is reduced to this one by swapping the columns
around). The alternant is then a polynomial divisible by x − y and x − z and
y − z (since it becomes 0 if we set one of x, y, z equal to another), and thus
divisible by $(x-y)(x-z)(y-z) = \det \begin{pmatrix} x^2 & x & 1 \\ y^2 & y & 1 \\ z^2 & z & 1 \end{pmatrix}$ (the simplest nonzero
alternant). Moreover, the ratio
\[
\frac{\det \begin{pmatrix} x^a & x^b & x^c \\ y^a & y^b & y^c \\ z^a & z^b & z^c \end{pmatrix}}{(x-y)(x-z)(y-z)}
= \frac{\det \begin{pmatrix} x^a & x^b & x^c \\ y^a & y^b & y^c \\ z^a & z^b & z^c \end{pmatrix}}{\det \begin{pmatrix} x^2 & x & 1 \\ y^2 & y & 1 \\ z^2 & z & 1 \end{pmatrix}}
\]
is a symmetric polynomial in x, y, z (by the same argument that we used be-
fore). Some experimentation suggests that all coefficients of this polynomial
are positive integers (to be rigorous, nonnegative integers). There is probably
no way of showing this without explicitly finding this polynomial – and the
best way to do so is by defining this polynomial combinatorially.
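The “experimentation” mentioned above is easy to replicate numerically. The following sketch (my own illustration; the sample point $(2, 3, 5)$ is an arbitrary choice) evaluates the alternant ratio for the exponents $(5, 3, 0)$ and checks it against the m-expansion of q found in Example 7.3.1.

```python
from fractions import Fraction
from itertools import permutations

def det3(M):
    """3x3 determinant, by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def alternant(exps, vals):
    """det(x_i^(a_j)) for exponents exps = (a, b, c) at the point vals = (x, y, z)."""
    return det3([[v ** e for e in exps] for v in vals])

def m_eval(lam, vals):
    """The monomial symmetric polynomial m_lam evaluated at the point vals."""
    total = 0
    for a in set(permutations(lam)):
        term = 1
        for v, e in zip(vals, a):
            term *= v ** e
        total += term
    return total

# At the sample point (x, y, z) = (2, 3, 5):
pt = (2, 3, 5)
q = Fraction(alternant((5, 3, 0), pt), alternant((2, 1, 0), pt))
# q should agree with m_(3,2,0) + m_(3,1,1) + 2 m_(2,2,1) at this point.
```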
Let us prepare for doing this.
Definition 7.3.2. (a) We let ρ be the N-tuple (N − 1, N − 2, . . . , N − N) ∈ N^N.
(b) For any N-tuple $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N) \in \mathbb{N}^N$, we define
\[
a_\alpha := \det \Big( \underbrace{\left( x_i^{\alpha_j} \right)_{1 \le i \le N,\ 1 \le j \le N}}_{\in \mathcal{P}^{N \times N}} \Big) \in \mathcal{P} .
\]
This is called the α-alternant (of x1, x2, . . . , xN).
For example, for N = 3, we have
\[
a_{(5,3,0)} = \det \begin{pmatrix} x^5 & x^3 & 1 \\ y^5 & y^3 & 1 \\ z^5 & z^3 & 1 \end{pmatrix}
\qquad (\text{where } (x, y, z) = (x_1, x_2, x_3)) .
\]
Note that the definition of $a_\rho$ yields
\begin{align*}
a_\rho &= \det \left( \left( x_i^{\rho_j} \right)_{1 \le i \le N,\ 1 \le j \le N} \right)
= \det \left( \left( x_i^{N-j} \right)_{1 \le i \le N,\ 1 \le j \le N} \right)
\qquad \left( \text{since } \rho_j = N - j \text{ for each } j \in [N] \right) \\
&= \prod_{1 \le i < j \le N} \left( x_i - x_j \right) \tag{265}
\end{align*}
(by Theorem 6.4.31 (a), applied to N, $\mathcal{P}$ and $x_i$ instead of n, K and $a_i$).
Thus, we suspect:

Conjecture 7.3.3. For every $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N) \in \mathbb{N}^N$, the alternant $a_\alpha$ is a
multiple of $a_\rho$ in the polynomial ring $\mathcal{P}$. Furthermore, if $\alpha_1 > \alpha_2 > \cdots > \alpha_N$,
then the polynomial $a_\alpha / a_\rho$ has positive (more precisely, nonnegative) integer
coefficients.
We will prove this by explicitly constructing aα /aρ combinatorially.
7.3.2. Young diagrams and Schur polynomials
Let us first define the Young diagram of an N-partition. This is analogous to the
definition of the Young diagram of a partition (which we did back in the proof
of Proposition 4.1.15):
Definition 7.3.4. Let λ be an N-partition.
The Young diagram of λ is defined to be a table of N left-aligned rows, with
the i-th row (counted from the top, as always) having λi boxes. Formally, the
Young diagram of λ is defined as the set
\[
\{(i, j) \mid i \in [N] \text{ and } j \in [\lambda_i]\} \subseteq \{1, 2, 3, \ldots\}^2 .
\]
We visually represent each element (i, j) of this Young diagram as a box in
row i and column j; thus we obtain a table with N left-aligned rows (some
of which might be empty).
We denote the Young diagram of λ by Y(λ).
For example, the 3-partition (4, 1, 0) has Young diagram

    ☐ ☐ ☐ ☐
    ☐
(The 3-rd row is invisible since it has length 0.) The four boxes in the 1-st (i.e.,
topmost) row of this diagram are (1, 1), (1, 2), (1, 3) and (1, 4) (from left to
right), while the single box in its 2-nd row is (2, 1).
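Definition 7.3.4 translates directly into code; here is a minimal sketch (not from the notes):

```python
def young_diagram(lam):
    """The Young diagram Y(lam) of an N-partition lam, as a set of boxes (i, j)
    with 1-based row index i and column index j, as in Definition 7.3.4."""
    return {(i, j) for i, rowlen in enumerate(lam, start=1)
                   for j in range(1, rowlen + 1)}

# Y((4, 1, 0)) = the four boxes (1,1), ..., (1,4) and the box (2,1),
# matching the picture above.
```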
Now, we are going to fill our Young diagrams – i.e., to put numbers in the
boxes:
Definition 7.3.5. Let λ be an N-partition.
A Young tableau of shape λ means a way of filling the boxes of Y (λ) with
elements of [ N ] (one element per box). Formally speaking, it is defined as
a map T : Y (λ) → [ N ]. We visually represent such a map by filling in the
number T (i, j) into each box (i, j).
We often abbreviate “Young tableau” as “tableau”. The plural of “tableau”
is “tableaux”.
For instance, here is a Young tableau of shape (4, 3, 3, 0, 0, 0, 0, . . . , 0) (defined
for any N ≥ 7):

    1 7 2 4
    3 3 6
    2 6 1

Formally speaking, this is a map T : Y(4, 3, 3, 0, 0, 0, 0, . . . , 0) → [N] that sends
the pairs (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), . . . to 1, 7, 2, 4, 3, . . ., respectively.
We will use some visually inspired language when talking about Young dia-
grams and tableaux:
• For instance, the entry of a tableau T in box (i, j) will mean the value
T (i, j).
• Also, the u-th row of a tableau T (for a given u ≥ 1) will mean the sequence
of all entries of T in the boxes (i, j) with i = u.
• Likewise, the v-th column of a tableau T (for a given v ≥ 1) will mean the
sequence of all entries of T in the boxes (i, j) with j = v.
• If T is a Young tableau of shape λ, then the boxes of Y (λ) will also be
called the boxes of T.
• Two boxes of a Young diagram (or of a tableau) are said to be adjacent
if they have an edge in common when drawn on the picture (i.e., when
one of them has the form (i, j), while the other has the form (i, j + 1) or
(i + 1, j)).
• The words “north”, “west”, “south” and “east” are to be understood ac-
cording to the picture of a Young diagram: e.g., the box (2, 4) lies one step
north and three steps west of the box (3, 7).
Some tableaux are better than others:
Definition 7.3.6. Let λ be an N-partition.
A Young tableau T of shape λ is said to be semistandard if its entries

• increase weakly along each row (from left to right);

• increase strictly down each column (from top to bottom).

Formally speaking, this means that a Young tableau T : Y(λ) → [N] is
semistandard if and only if

• we have T(i, j) ≤ T(i, j + 1) for any (i, j) ∈ Y(λ) satisfying (i, j + 1) ∈ Y(λ);

• we have T(i, j) < T(i + 1, j) for any (i, j) ∈ Y(λ) satisfying (i + 1, j) ∈ Y(λ).

We let SSYT(λ) denote the set of all semistandard Young tableaux of shape
λ. (This depends on N as well, but N is fixed, so we omit it from our no-
tation.) We will usually say “semistandard tableau” instead of “semistandard
Young tableau”.
Example 7.3.7. Consider the following 6 Young tableaux of shape
(4, 3, 1, 0, 0, 0, . . . , 0):

    1 3 3 4     2 1 3 4     1 1 2 3        (266)
    2 3 5       3 4 5       2 4 5
    4           6           6

    1 2 3 4     1 1 1 1     1 2 3 4
    5 6 7       2 2 2       1 2 3
    8           3           1

Which of these 6 tableaux are semistandard? The first one is not semistan-
dard, since the entries in its second column do not strictly increase down the
column. The second one is not semistandard, since the entries in its first row
do not weakly increase along the row. The third one is semistandard. The
fourth one is semistandard, too. The fifth one is semistandard, too (actually
it has a special property: each of its entries is the smallest possible value that
an entry of a semistandard tableau could have in its box). The sixth one is
not semistandard, again because of the columns.
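The semistandardness conditions are easy to check by machine. Here is a small sketch (my own, with a tableau of straight shape given as a list of its rows) that reproduces the verdicts of Example 7.3.7:

```python
def is_semistandard(rows):
    """Check Definition 7.3.6 for a tableau of straight shape, given as a list
    of its rows (each row a list of entries, rows listed top to bottom)."""
    for r in rows:                      # rows increase weakly left to right
        if any(r[j] > r[j + 1] for j in range(len(r) - 1)):
            return False
    for i in range(len(rows) - 1):      # columns increase strictly top to bottom
        upper, lower = rows[i], rows[i + 1]
        if any(upper[j] >= lower[j] for j in range(min(len(upper), len(lower)))):
            return False
    return True

tableaux = [
    [[1,3,3,4], [2,3,5], [4]], [[2,1,3,4], [3,4,5], [6]], [[1,1,2,3], [2,4,5], [6]],
    [[1,2,3,4], [5,6,7], [8]], [[1,1,1,1], [2,2,2], [3]], [[1,2,3,4], [1,2,3], [1]],
]
# Exactly the third, fourth and fifth tableaux are semistandard.
```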
Definition 7.3.8. Let λ be an N-partition. If T is any Young tableau of shape
λ, then we define the corresponding monomial
\[
x_T := \prod_{c \text{ is a box of } Y(\lambda)} x_{T(c)}
= \prod_{(i,j) \in Y(\lambda)} x_{T(i,j)}
= \prod_{k=1}^{N} x_k^{(\# \text{ of times } k \text{ appears in } T)} .
\]

For example, the first three Young tableaux in (266) have corresponding
monomials
\begin{align*}
& x_1 x_3 x_3 x_4 x_2 x_3 x_5 x_4 = x_1 x_2 x_3^3 x_4^2 x_5 , \\
& x_2 x_1 x_3 x_4 x_3 x_4 x_5 x_6 , \\
& x_1 x_1 x_2 x_3 x_2 x_4 x_5 x_6 .
\end{align*}
Definition 7.3.9. Let λ be an N-partition. We define the Schur polynomial
$s_\lambda \in \mathcal{P}$ by
\[
s_\lambda := \sum_{T \in \operatorname{SSYT}(\lambda)} x_T .
\]
Example 7.3.10. (a) Let n ∈ N. Consider the N-partition (n, 0, 0, . . . , 0). The
semistandard tableaux T of shape (n, 0, 0, . . . , 0) are simply the fillings of a
single row with n elements of [N] that weakly increase from left to right:

    T = i1 i2 · · · in        with i1 ≤ i2 ≤ · · · ≤ in .

Thus,
\[
s_{(n,0,0,\ldots,0)} = \sum_{i_1 \le i_2 \le \cdots \le i_n} x_{i_1} x_{i_2} \cdots x_{i_n} = h_n .
\]
(b) Let n ∈ {0, 1, . . . , N}. Consider the N-partition (1, 1, . . . , 1, 0, 0, . . . , 0)
(with n ones and N − n zeroes). The semistandard tableaux T of shape
(1, 1, . . . , 1, 0, 0, . . . , 0) are simply the fillings of a single column with n ele-
ments of [N] that strictly increase from top to bottom:

    T = i1        with i1 < i2 < · · · < in .
        i2
        ...
        in

Hence,
\[
s_{(1,1,\ldots,1,0,0,\ldots,0) \text{ (with } n \text{ ones and } N-n \text{ zeroes)}}
= \sum_{i_1 < i_2 < \cdots < i_n} x_{i_1} x_{i_2} \cdots x_{i_n} = e_n .
\]
(c) Assume that N ≥ 2. Consider the N-partition (2, 1, 0, 0, 0, . . . , 0) (with
all entries from the third on being 0). The semistandard tableaux T of shape
(2, 1, 0, 0, 0, . . . , 0) all have the form

    T = i j        with i ≤ j and i < k.
        k

Hence,
\[
s_{(2,1,0,0,0,\ldots,0)}
= \sum_{\substack{i \le j; \\ i < k}} x_i x_j x_k
= \underbrace{\sum_{\substack{i < k; \\ j = i}} x_i x_j x_k}_{= \sum_{i<k} x_i x_i x_k}
+ \underbrace{\sum_{i < j < k} x_i x_j x_k}_{= e_3}
+ \underbrace{\sum_{\substack{i < k; \\ j = k}} x_i x_j x_k}_{= \sum_{i<k} x_i x_k x_k}
+ \underbrace{\sum_{i < k < j} x_i x_j x_k}_{= e_3}
\]
(since each triple (i, j, k) of elements of [N] that satisfies i ≤ j and i < k must
satisfy exactly one of the four conditions (i < k and j = i) and i < j < k and
(i < k and j = k) and i < k < j, and conversely, each triple satisfying one of
the latter four conditions must satisfy i ≤ j and i < k). Therefore,
\begin{align*}
s_{(2,1,0,0,0,\ldots,0)}
&= \sum_{i<k} \underbrace{x_i x_i}_{= x_i^2} x_k + e_3 + \sum_{i<k} x_i \underbrace{x_k x_k}_{= x_k^2} + e_3
= 2e_3 + \underbrace{\sum_{i<k} x_i^2 x_k + \sum_{i<k} x_i x_k^2}_{= m_{(2,1,0,0,\ldots,0)} = e_2 e_1 - 3e_3 \text{ (check this!)}} \\
&= 2e_3 + e_2 e_1 - 3e_3 = e_2 e_1 - e_3 .
\end{align*}
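For small shapes, SSYT(λ) can be enumerated by brute force, which gives a quick numerical check of the identities in this example. The sketch below (my own; the evaluation point (2, 3, 5) is an arbitrary choice) confirms parts (a), (b) and (c) for N = 3.

```python
from itertools import product

def ssyt(shape, N):
    """Brute-force list of semistandard Young tableaux of the given (straight)
    shape -- a tuple of row lengths -- with entries in [N]."""
    boxes = [(i, j) for i, rowlen in enumerate(shape) for j in range(rowlen)]
    result = []
    for fill in product(range(1, N + 1), repeat=len(boxes)):
        T = dict(zip(boxes, fill))
        rows_ok = all(T[i, j] <= T[i, j + 1] for (i, j) in boxes if (i, j + 1) in T)
        cols_ok = all(T[i, j] < T[i + 1, j] for (i, j) in boxes if (i + 1, j) in T)
        if rows_ok and cols_ok:
            result.append(T)
    return result

def s_eval(shape, xs):
    """The Schur polynomial s_shape evaluated at the point xs = (x_1, ..., x_N)."""
    total = 0
    for T in ssyt(shape, len(xs)):
        term = 1
        for entry in T.values():
            term *= xs[entry - 1]
        total += term
    return total

# Check Example 7.3.10 (c) for N = 3 at the sample point (2, 3, 5):
xs = (2, 3, 5)
e1, e2, e3 = 10, 31, 30   # elementary symmetric values at (2, 3, 5)
# s_(2,1,0) should equal e2*e1 - e3 = 280.
```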
Theorem 7.3.11. Let λ be an N-partition. Then:
(a) The polynomial $s_\lambda$ is symmetric.
(b) We have
\[
a_{\lambda + \rho} = a_\rho \cdot s_\lambda .
\]
Here, the addition on $\mathbb{N}^N$ is defined entrywise: that is, if $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)$
and $\beta = (\beta_1, \beta_2, \ldots, \beta_N)$ are two N-tuples in $\mathbb{N}^N$, then we set
\[
\alpha + \beta := (\alpha_1 + \beta_1,\ \alpha_2 + \beta_2,\ \ldots,\ \alpha_N + \beta_N) .
\]
This theorem (once proved) will yield Conjecture 7.3.3 in the case
when α1 > α2 > · · · > αN . Indeed, if $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N) \in \mathbb{N}^N$ is an N-tuple
satisfying α1 > α2 > · · · > αN , then we can write α as α = λ + ρ for some N-
partition λ (namely, for λ = α − ρ = (α1 − (N − 1), α2 − (N − 2), . . . , αN − (N − N))),
and then Theorem 7.3.11 (b) will yield aα = aρ · sλ , so that aα /aρ = sλ , which is
a symmetric polynomial (by Theorem 7.3.11 (a)) and furthermore has positive
coefficients (by its combinatorial definition). Once Conjecture 7.3.3 is proved in
the case when α1 > α2 > · · · > αN , the validity of its first claim in the general
case easily follows (because the alternant aα is 0 when two of α1 , α2 , . . . , αN are
equal134 , and otherwise can be reduced to the case α1 > α2 > · · · > αN by
swapping the columns around135 ).
We will prove Theorem 7.3.11 (a) soon and Theorem 7.3.11 (b) later.
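Although the proof of part (b) comes only later, the identity is easy to test numerically. Here is a sketch (my own; the choices N = 3, λ = (2, 1, 0) and the sample point (2, 3, 5) are arbitrary), with λ + ρ = (4, 2, 0):

```python
def det3(M):
    """3x3 determinant (cofactor expansion along the first row)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def alternant(alpha, xs):
    """a_alpha evaluated at the point xs, for N = 3."""
    return det3([[x ** a for a in alpha] for x in xs])

def schur_21(xs):
    """s_(2,1,0) at xs, summed over its semistandard tableaux (i j / k),
    which are indexed by i <= j and i < k (Example 7.3.10 (c))."""
    N = len(xs)
    return sum(xs[i - 1] * xs[j - 1] * xs[k - 1]
               for i in range(1, N + 1)
               for j in range(i, N + 1)
               for k in range(i + 1, N + 1))

# Theorem 7.3.11 (b): a_{lambda+rho} = a_rho * s_lambda,
# i.e. a_(4,2,0) = a_(2,1,0) * s_(2,1,0) for lambda = (2,1,0), rho = (2,1,0).
xs = (2, 3, 5)
```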
7.3.3. Skew Young diagrams and skew Schur polynomials
Before we get to the proof, let us generalize the situation a bit:
Definition 7.3.12. Let λ and µ be two N-partitions.
We say that µ ⊆ λ if and only if Y(µ) ⊆ Y(λ). Equivalently, µ ⊆ λ if and
only if each i ∈ [N] satisfies µi ≤ λi
(where we write µ and λ as µ = (µ1, µ2, . . . , µN) and λ = (λ1, λ2, . . . , λN)).
Thus we have defined a partial order ⊆ on the set of all N-partitions.
For example, we have (3, 2, 0) ⊆ (4, 2, 1) (since 3 ≤ 4 and 2 ≤ 2 and 0 ≤ 1),
but we don’t have (2, 2, 0) ⊆ (3, 1, 0) (since 2 > 1). We can see this on the Young
diagrams: the Young diagram of (3, 2, 0) fits inside the Young diagram of
(4, 2, 1), whereas the Young diagram of (2, 2, 0) does not fit inside the Young
diagram of (3, 1, 0).
Definition 7.3.13. Let λ and µ be two N-partitions such that µ ⊆ λ. Then,
we define the skew Young diagram Y(λ/µ) to be the set difference
\begin{align*}
Y(\lambda) \setminus Y(\mu)
&= \{(i, j) \mid i \in [N] \text{ and } j \in [\lambda_i] \setminus [\mu_i]\} \\
&= \{(i, j) \mid i \in [N] \text{ and } j \in \mathbb{Z} \text{ and } \mu_i < j \le \lambda_i\} .
\end{align*}
134 This is easy to prove (but see Lemma 7.3.39 (a) below for a proof).
135 See Lemma 7.3.39 (b) below for the details of this.
For example, Y((4, 3, 1) / (2, 1, 0)) consists of the five boxes (1, 3), (1, 4),
(2, 2), (2, 3) and (3, 1).
We note that any row or column of a skew Young diagram Y (λ/µ) is con-
tiguous, i.e., has no holes between boxes. Better yet, if ( a, b) and (e, f ) are two
boxes of Y (λ/µ), then any box (c, d) that lies “between” them (i.e., that satisfies
a ≤ c ≤ e and b ≤ d ≤ f ) must also belong to Y (λ/µ). Let us state this as a
lemma:
Lemma 7.3.14 (Convexity of skew Young diagrams). Let λ and µ be two N-
partitions such that µ ⊆ λ. Let ( a, b) and (e, f ) be two elements of Y (λ/µ).
Let (c, d) ∈ Z2 satisfy a ≤ c ≤ e and b ≤ d ≤ f . Then, (c, d) ∈ Y (λ/µ).
Hints to the proof of Lemma 7.3.14. This follows easily from the definition of Y (λ/µ)
using the fact that λ and µ are weakly decreasing N-tuples. A detailed proof
can be found in Section B.10.
Lemma 7.3.14 is known as the convexity of Y (λ/µ) (albeit in a very specific
sense of the word “convexity”).
Next, we can define Young tableaux of shape λ/µ whenever λ and µ are two
N-partitions satisfying µ ⊆ λ. The definition is analogous to Definition 7.3.5,
except that we are only filling the boxes of Y (λ/µ) (rather than all boxes of
Y (λ)) this time:

Definition 7.3.15. Let λ and µ be two N-partitions such that µ ⊆ λ. A
Young tableau of shape λ/µ means a way of filling the boxes of Y (λ/µ) with
elements of [ N ] (one element per box). Formally speaking, it is defined as a
map T : Y (λ/µ) → [ N ]. We visually represent such a map by filling in the
number T (i, j) into each box (i, j).
Young tableaux of shape λ/µ are often called skew Young tableaux.
If we don’t have µ ⊆ λ, then we agree that there are no Young tableaux of
shape λ/µ.
The notion of a semistandard tableau of shape λ/µ is, again, defined in the
same way as for shape λ:
Definition 7.3.16. Let λ and µ be two N-partitions.
A Young tableau T of shape λ/µ is said to be semistandard if its entries

• increase weakly along each row (from left to right);

• increase strictly down each column (from top to bottom).

Formally speaking, this means that a Young tableau T : Y(λ/µ) → [N] is
semistandard if and only if

• we have T(i, j) ≤ T(i, j + 1) for any (i, j) ∈ Y(λ/µ) satisfying
(i, j + 1) ∈ Y(λ/µ);

• we have T(i, j) < T(i + 1, j) for any (i, j) ∈ Y(λ/µ) satisfying
(i + 1, j) ∈ Y(λ/µ).

We let SSYT(λ/µ) denote the set of all semistandard Young tableaux of
shape λ/µ. We will usually say “semistandard tableau” instead of “semistandard
Young tableau”.
For example, here is a semistandard Young tableau of shape (4, 3, 1) / (2, 1, 0):

        1 3
      2 2
    1

Meanwhile, there are no Young tableaux of shape (3, 2, 1) / (2, 2, 2) (since we
don’t have (2, 2, 2) ⊆ (3, 2, 1)), and thus the set SSYT ((3, 2, 1) / (2, 2, 2)) is
empty.
The phrases “increase weakly along each row” and “increase strictly down
each column” in Definition 7.3.16 have been formalized in terms of adjacent
entries: e.g., we have declared “increase weakly along each row” to mean
“T(i, j) ≤ T(i, j + 1)” rather than “T(i, j1) ≤ T(i, j2) whenever j1 ≤ j2”. How-
ever, since any row or column of Y(λ/µ) is contiguous, the latter stronger
meaning actually follows from the former. To wit:
Lemma 7.3.17. Let λ and µ be two N-partitions. Let T be a semistandard
Young tableau of shape λ/µ. Then:
(a) If (i, j1 ) and (i, j2 ) are two elements of Y (λ/µ) satisfying j1 ≤ j2 , then
T (i, j1 ) ≤ T (i, j2 ).
(b) If (i1 , j) and (i2 , j) are two elements of Y (λ/µ) satisfying i1 ≤ i2 , then
T ( i1 , j ) ≤ T ( i2 , j ).
(c) If (i1 , j) and (i2 , j) are two elements of Y (λ/µ) satisfying i1 < i2 , then
T ( i1 , j ) < T ( i2 , j ).
(d) If (i1 , j1 ) and (i2 , j2 ) are two elements of Y (λ/µ) satisfying i1 ≤ i2 and
j1 ≤ j2 , then T (i1 , j1 ) ≤ T (i2 , j2 ).
(e) If (i1 , j1 ) and (i2 , j2 ) are two elements of Y (λ/µ) satisfying i1 < i2 and
j1 ≤ j2 , then T (i1 , j1 ) < T (i2 , j2 ).
Hints to the proof of Lemma 7.3.17. This is easy using Lemma 7.3.14. A detailed
proof can be found in Section B.10.
We extend Definition 7.3.8 to skew tableaux:

Definition 7.3.18. Let λ and µ be two N-partitions. If T is any Young tableau
of shape λ/µ, then we define the corresponding monomial
\[
x_T := \prod_{c \text{ is a box of } Y(\lambda/\mu)} x_{T(c)}
= \prod_{(i,j) \in Y(\lambda/\mu)} x_{T(i,j)}
= \prod_{k=1}^{N} x_k^{(\# \text{ of times } k \text{ appears in } T)} .
\]
For example,

    if  T =     1 3    ,  then  x_T = x_1 x_3 x_2 x_2 x_3 x_4 x_4 = x_1 x_2^2 x_3^2 x_4^2 .
              2 2
            3 4 4
Finally, we generalize Definition 7.3.9 to λ/µ:

Definition 7.3.19. Let λ and µ be two N-partitions. We define the skew Schur
polynomial $s_{\lambda/\mu} \in \mathcal{P}$ by
\[
s_{\lambda/\mu} := \sum_{T \in \operatorname{SSYT}(\lambda/\mu)} x_T .
\]
Example 7.3.20. (a) We have
\[
s_{(3,3,2,0,0,0,\ldots,0)/(2,1,0,0,0,\ldots,0)} = \sum_{i < j \ge k < \ell \ge m} x_i x_j x_k x_\ell x_m ,
\]
since the semistandard tableaux of shape
(3, 3, 2, 0, 0, 0, . . . , 0) / (2, 1, 0, 0, 0, . . . , 0) have the form

        i
      k j
    m ℓ

for i, j, k, ℓ, m ∈ [N] satisfying i < j ≥ k < ℓ ≥ m.
(b) We have
\[
s_{(3,2,1,0,0,0,\ldots,0)/(2,1,0,0,0,\ldots,0)} = \sum_{i,j,k} x_i x_j x_k ,
\]
since the semistandard tableaux of shape
(3, 2, 1, 0, 0, 0, . . . , 0) / (2, 1, 0, 0, 0, . . . , 0) have the form

        i
      j
    k

for i, j, k ∈ [N] satisfying no special requirements (as there are never two entries
in the same row or in the same column of the tableau). Thus,
\[
s_{(3,2,1,0,0,0,\ldots,0)/(2,1,0,0,0,\ldots,0)}
= \sum_{i,j,k} x_i x_j x_k
= \left( \sum_i x_i \right)^3
= (x_1 + x_2 + \cdots + x_N)^3 .
\]
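Skew SSYT can also be enumerated by brute force, which makes claims like part (b) easy to spot-check. A sketch (my own) that counts the tableaux; setting all $x_i = 1$ in part (b) predicts exactly $N^3$ of them:

```python
from itertools import product

def skew_ssyt_count(lam, mu, N):
    """Count semistandard tableaux of shape lam/mu with entries in [N],
    by brute force over all fillings of the skew diagram Y(lam/mu)."""
    boxes = [(i, j) for i in range(len(lam)) for j in range(mu[i], lam[i])]
    count = 0
    for fill in product(range(1, N + 1), repeat=len(boxes)):
        T = dict(zip(boxes, fill))
        rows_ok = all(T[i, j] <= T[i, j + 1] for (i, j) in boxes if (i, j + 1) in T)
        cols_ok = all(T[i, j] < T[i + 1, j] for (i, j) in boxes if (i + 1, j) in T)
        if rows_ok and cols_ok:
            count += 1
    return count

# Example 7.3.20 (b): s_{(3,2,1)/(2,1,0)} = (x_1 + ... + x_N)^3, so there are
# N^3 semistandard tableaux of this skew shape.
```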
Theorem 7.3.21. Let λ and µ be any two N-partitions. Then, the polynomial
sλ/µ is symmetric.
Remark 7.3.22. If we set 0 = (0, 0, . . . , 0) ∈ N N , then sλ/0 = sλ (since the
semistandard tableaux of shape λ/0 are precisely the semistandard tableaux
of shape λ). Hence, Theorem 7.3.21 generalizes Theorem 7.3.11 (a).
We will now prove Theorem 7.3.21 bijectively, using a beautiful set of combi-
natorial bijections known as the Bender–Knuth involutions.
7.3.4. The Bender–Knuth involutions
Proof of Theorem 7.3.21. For each i ∈ [ N − 1], we consider the simple transposi-
tion si ∈ S N defined in Definition 5.2.3 (applied to n = N). As we recall, this
is the transposition that swaps i with i + 1. (The notation si for this transposi-
tion is unfortunately similar to the notation sλ for Schur polynomials; however,
this should not cause any confusion, since the only Schur polynomial that will
appear in this proof is sλ/µ , which cannot be mistaken for a transposition.)
We must show that the polynomial sλ/µ is symmetric. According to Lemma
7.1.17, it will suffice to show that sk · sλ/µ = sλ/µ for each k ∈ [ N − 1].
So let us fix some k ∈ [ N − 1]. In order to prove sk · sλ/µ = sλ/µ , we will
construct a bijection
\[
\beta_k : \operatorname{SSYT}(\lambda/\mu) \to \operatorname{SSYT}(\lambda/\mu)
\]
(called the k-th Bender–Knuth involution) that

• interchanges the # of k’s with the # of (k + 1)’s in a tableau (that is, if a
tableau T has a many k’s and b many (k + 1)’s, then β_k(T) will have b
many k’s and a many (k + 1)’s);

• leaves all the other entries of the tableau unchanged.
Indeed, once such a bijection β_k is constructed, we can easily see that
\[
x_{\beta_k(T)} = s_k \cdot x_T \qquad \text{for each } T \in \operatorname{SSYT}(\lambda/\mu) ,
\]
so that applying s_k to $s_{\lambda/\mu} = \sum_{T \in \operatorname{SSYT}(\lambda/\mu)} x_T$ will amount to permuting the
addends in the sum.136
We construct the map β k as follows: Let T ∈ SSYT (λ/µ). We focus on the k’s
and the (k + 1)’s in T (that is, on the entries of T that are equal to k or to k + 1).
An entry k in T will be called matched if there is a k + 1 directly underneath it
(in the same column). An entry k + 1 in T will be called matched if there is a k
directly above it (in the same column). All other k’s and (k + 1)’s in T will be
called free. Let us see an example of this first:
Example 7.3.23. For this example, let k = 2. Here is a semistandard tableau
T ∈ SSYT ((9, 8, 5, 2, 0, 0) / (3, 1, 1, 0, 0, 0)) with the free entries printed in
boldface and the matched entries printed on a grey background:

    T =       1 2 2 2 2 3
        1 1 2 3 3 4 6
      2 3 3 5
    2 4

(We have color-coded the entries so that 2’s are red, 3’s are blue, and all
other entries are black. You can mostly forget about the black entries, since
our construction of β_k(T) will neither change them nor depend on them.
In this plain-text rendering, the highlighting is lost; the free entries are the
last two 2’s and the 3 in the 1-st row, the 2 and the leftmost 3 in the 3-rd row,
and the 2 in the 4-th row, while all other 2’s and 3’s are matched.)
We note that the entries of T increase weakly along each row (since T is
semistandard), and increase strictly down each column (for the same reason).
Now, an entry k in T is matched if and only if there is a k + 1 anywhere in its
column (because if there is a k + 1 anywhere in its column, then this k + 1 must
be directly underneath the k 137 , and therefore the k is matched). Likewise, an
entry k + 1 in T is matched if and only if there is a k anywhere in its column.
Thus, matched entries come in pairs: If a k in T is matched, then the k + 1
directly underneath it is also matched, and conversely, if a k + 1 in T is matched,
then the k directly above it is also matched. Hence, there is an obvious bijection
between the sets {matched k’s in T } and {matched (k + 1) ’s in T } 138 . Thus,
by the bijection principle, we have
(# of matched k’s in T )
= (# of matched (k + 1) ’s in T ) . (267)
Each column of T that contains a matched entry must contain exactly two
matched entries (one k and one k + 1); we shall refer to these two entries as
each other’s “partners”.
136 We will explain this argument in more detail at the end of this proof.
137 since the entries of each column of T increase down this column
138 Strictly speaking, these sets should consist not of the entries, but rather of the boxes in which
they are located.
Our goal is to modify some of the entries k and k + 1 in such a way that we
obtain a new semistandard tableau that has as many k’s as our original tableau
T had (k + 1)’s, and has as many (k + 1)’s as our original tableau T had k’s. We
do not want to change any entries other than k’s and (k + 1)’s; nor do we want
to replace any k’s or (k + 1)’s by entries other than k and k + 1.
These requirements force us to leave all matched entries (both k’s and (k + 1)’s)
unchanged. Indeed, if an entry is matched, then its column contains both a k
and a k + 1, and thus neither of these two entries can be changed without
breaking the “entries increase strictly down each column” condition in Defini-
tion 7.3.16. Thus, the matched entries will have to stay unchanged.
On the other hand, we can arbitrarily replace the free entries by k’s or (k + 1)’s,
as long as we make sure to keep the rows weakly increasing; the columns will
stay strictly increasing no matter what we do (because a column containing a
free k does not contain any k + 1, and a column containing a free k + 1 does not
contain any k), so our tableau will remain semistandard.
In view of these observations, let us perform the following procedure:

• For each row of T, if there are a free k’s and b free (k + 1)’s in this row, we
replace them by b free k’s and a free (k + 1)’s (placed in this order, from
left to right).

We define β k ( T ) to be the result of this procedure.
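The procedure above is easy to mechanize. Here is a minimal sketch (not part of these notes), which assumes a tableau is encoded as a Python dict mapping boxes (row, col) to entries:

```python
def bender_knuth(T, k):
    """Apply the k-th Bender-Knuth involution to a semistandard tableau,
    encoded as a dict mapping boxes (row, col) (both >= 1) to entries."""
    # An entry k is matched iff a k+1 sits directly underneath it,
    # in which case that k+1 is matched as well.
    matched = set()
    for (i, j), e in T.items():
        if e == k and T.get((i + 1, j)) == k + 1:
            matched.update({(i, j), (i + 1, j)})
    result = dict(T)
    for i in {row for (row, _) in T}:
        # the free k's and free (k+1)'s of row i, listed from left to right
        free = sorted(j for (r, j), e in T.items()
                      if r == i and e in (k, k + 1) and (r, j) not in matched)
        b = sum(1 for j in free if T[(i, j)] == k + 1)  # number of free (k+1)'s
        # replace a free k's and b free (k+1)'s by b free k's and a free (k+1)'s
        for pos, j in enumerate(free):
            result[(i, j)] = k if pos < b else k + 1
    return result
```

For instance, on the straight-shape tableau {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 3} with k = 2, the free 2 in box (2, 1) becomes a 3 while the matched pair in column 2 is left alone, and applying the map twice recovers the original tableau.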


Example 7.3.24. If k and T are as in Example 7.3.23, then

βk (T) =       1 2 2 2 3 3
           1 1 2 3 3 4 6
           2 3 3 5
         3 4

Indeed:

• The 1-st row of T had 2 free 2’s and 1 free 3, so we replaced them by 1
free 2 and 2 free 3’s.

• The 2-nd row had 0 free 2’s and 0 free 3’s, so we replaced them by 0
free 2’s and 0 free 3’s. (Of course, this did not change anything.)

• The 3-rd row had 1 free 2 and 1 free 3, so we replaced them by 1 free 2
and 1 free 3. (Of course, this did not change anything.)

• The 4-th row had 1 free 2 and 0 free 3’s, so we replaced them by 0 free
2’s and 1 free 3.

Thus, β k ( T ) is obtained from T by “flipping the imbalance between free k’s


and free (k + 1)’s” in each row of T (so that a row that was heavy on free
k’s becomes equally heavy on free (k + 1)’s, and vice versa). Rows that have
equally many free k’s and free (k + 1)’s stay unchanged.
In order to make sure that β k is a well-defined map from SSYT (λ/µ) to
SSYT (λ/µ), we need to show that the tableau β k ( T ) is semistandard. As we
have already explained, the columns are not at issue (we have only changed
free entries, so the columns remain strictly increasing), but we need to convince
ourselves that the rows are still weakly increasing. It is clear that the free entries
in each row are in the right order (i.e., any free k stands further left than any
free k + 1), and it is also clear that the matched entries in each row are in the
right order (since they are unchanged from T); however, it is imaginable that
the order between free and matched entries has gotten messed up (e.g., a larger
free entry stands further left than a smaller matched entry in β k ( T )). We need
to prove that this does not happen.
To prove this, we make the following observation:

Observation 1: Each row of T can be subdivided into the following


six blocks:

entries < k | matched k’s | free k’s | free (k + 1)’s | matched (k + 1)’s | entries > k + 1 ,

which appear in this order from left to right. (Each of these blocks
can be empty.)

[Proof of Observation 1: Consider some row of T. The entries in this row


increase weakly from left to right (since T is semistandard), so it is clear that
all entries < k stand further left than all k’s, which in turn stand further left
than all (k + 1)’s, which in turn stand further left than all entries > k + 1. It
therefore remains to show that all matched k’s stand further left than all free
k’s, and that all free (k + 1)’s stand further left than all matched (k + 1)’s. We
will prove the first of these two statements only (since the proof of the second
is largely analogous).
So we need to show that all matched k’s in our row stand further left than all
free k’s. Assume the contrary. Thus, there is some matched k that stands further
right than some free k. Let the former matched k stand in box (u, v), and let the
latter free k stand in box (u, v′ ); thus, v > v′ (since the matched k stands further
right than the free k) and T (u, v′ ) = k (since there is a k in box (u, v′ )). Since
the k in box (u, v) is matched, there is a k + 1 directly underneath it; in other
words, the box (u + 1, v) belongs to Y (λ/µ) and we have T (u + 1, v) = k + 1.

Here is an illustration of this situation:

            column          column
              v′              v
              ↓               ↓

row u    →    k     · · ·     k

row u+1  →                   k+1

Now, the Young diagram of λ/µ contains the box (u, v′ ), but also contains
the box (u + 1, v), which lies one row south and some number of columns east
of the former box (since v > v′ ). Hence, the Young diagram of λ/µ must have
a box (u + 1, v′ ) directly underneath the box (u, v′ ) (that is, the box (u + 1, v′ )
must belong to Y (λ/µ) 139 ). Here is an illustration of this:

            column          column
              v′              v
              ↓               ↓

row u    →    k     · · ·     k

row u+1  →    ?     · · ·    k+1

Thus, we know that there is a box (u + 1, v′ ) in the Young diagram of λ/µ.


The entry of the tableau T in this box (u + 1, v′ ) must satisfy T (u + 1, v′ ) >
T (u, v′ ) (because the entries of T increase strictly down each column) and
T (u + 1, v′ ) ≤ T (u + 1, v) (because the entries of T increase weakly along
each row, and because v > v′ shows that the entry T (u + 1, v′ ) lies further
left than T (u + 1, v) 140 ). These two inequalities rewrite as T (u + 1, v′ ) > k
and T (u + 1, v′ ) ≤ k + 1 (since T (u, v′ ) = k and T (u + 1, v) = k + 1). Thus, the
number T (u + 1, v′ ) lies in the half-open interval (k, k + 1]. Since T (u + 1, v′ )
is an integer, we must thus have T (u + 1, v′ ) = k + 1. In other words, the entry
k in box (u, v′ ) of our tableau T has a k + 1 directly underneath it. Therefore,
139 Here is a rigorous proof: We know that (u, v′ ) and (u + 1, v) are two elements of Y (λ/µ).
Moreover, (u + 1, v′ ) ∈ Z2 satisfies u ≤ u + 1 ≤ u + 1 and v′ ≤ v′ ≤ v (since v > v′ ). Thus,
Lemma 7.3.14 (applied to (u, v′ ), (u + 1, v) and (u + 1, v′ ) instead of ( a, b), (e, f ) and (c, d))
yields (u + 1, v′ ) ∈ Y (λ/µ). Qed.
140 Strictly speaking, we are using Lemma 7.3.17 (a) here (applying it to (u + 1, v′ ) and (u + 1, v) instead of (i, j1 ) and (i, j2 )).



this entry k is matched. This contradicts our assumption that this k is free.
This contradiction shows that our assumption was false. Thus, we have shown
that all matched k’s in our row stand further left than all free k’s. Similarly,
we can show that all free (k + 1)’s stand further left than all matched (k + 1)’s
(the argument is analogous, but it uses the (u − 1)-st row of T rather than the
(u + 1)-st one). As explained above, this completes the proof of Observation 1.]

Observation 1 entails that the free entries in each row of T are stuck together
between the rightmost matched k and the leftmost matched k + 1. Hence, re-
placing these free entries as in our above definition of β k ( T ) does not mess up
the weakly increasing order of the entries in this row. This completes our proof
that β k ( T ) is a semistandard tableau.
Hence, the map β k : SSYT (λ/µ) → SSYT (λ/µ) is well-defined. This map β k
is called the k-th Bender–Knuth involution.
We shall now show that this map β k is a bijection. Better yet, we will show
that it is an involution (i.e., that β k ◦ β k = id):

Observation 2: We have β k ◦ β k = id.

[Proof of Observation 2: We need to check that β k ( β k ( T )) = T for each T ∈


SSYT (λ/µ). So let T ∈ SSYT (λ/µ) be arbitrary. Recall our above definition
of free and matched k’s and (k + 1)’s in T. The construction of β k ( T ) replaced
some free entries while leaving the matched ones unchanged.
Now, we claim that the matched entries of β k ( T ) stand in the exact same
boxes as the matched entries of T, whereas the free entries of β k ( T ) stand in
the exact same boxes as the free entries of T. Indeed, the matched entries of
T remain matched in β k ( T ) (since neither these entries themselves, nor their
“partners” have changed in the construction of β k ( T )), whereas the free en-
tries of T remain free in β k ( T ) (since the construction of β k ( T ) cannot have
produced any “partners” for them141 ).
This has the consequence that if we apply our definition of β k to the semis-
tandard tableau β k ( T ) (to construct β k ( β k ( T ))), then we end up undoing the
very changes that transformed T into β k ( T ) (indeed, in each row, the original
imbalance between free k’s and free (k + 1)’s that was flipped in the construc-
tion of β k ( T ) gets flipped again, and thus gets restored to its original state142 ).
Hence, β k ( β k ( T )) = T.
141 Why not? Let’s take, for example, a free k. This free k is the only k in its column (since the
entries of T increase strictly down each column), and there is no k + 1 in its column (since
otherwise, the k would be matched, not free). Thus, there is no entry in its column that
could become a “partner” for it in β k ( T ). An analogous argument applies to a free k + 1.
142 In more detail: If some row of T had a free k’s and b free ( k + 1)’s, then the same row of

β k ( T ) has b free k’s and a free (k + 1)’s, and therefore the same row of β k ( β k ( T )) will, in
turn, have a free k’s and b free (k + 1)’s again; but this means that its free entries are the
same as in T.

Forget that we fixed T. We thus have shown that β k ( β k ( T )) = T for each


T ∈ SSYT (λ/µ). In other words, β k ◦ β k = id. This proves Observation 2.]
Observation 2 shows that β k is an involution. Hence, β k is a bijection.
The map β k leaves all entries of the tableau other than k’s and (k + 1)’s un-
changed (because of how we defined β k ). Let us now show that β k interchanges
the # of k’s with the # of (k + 1)’s in a tableau (that is, if a tableau T has a many
k’s and b many (k + 1)’s, then β k ( T ) will have b many k’s and a many (k + 1)’s).
More precisely, we shall show the following:

Observation 3: Let T ∈ SSYT (λ/µ). Then,

(# of k’s in β k ( T )) = (# of (k + 1) ’s in T ) (268)

and
(# of (k + 1) ’s in β k ( T )) = (# of k’s in T ) . (269)
Moreover, if i ∈ [ N ] satisfies i ̸= k and i ̸= k + 1, then

(# of i’s in β k ( T )) = (# of i’s in T ) . (270)

[Proof of Observation 3: We recall a simple fact we noticed during our proof


of Observation 2 above: The matched entries of β k ( T ) stand in the exact same
boxes as the matched entries of T, whereas the free entries of β k ( T ) stand in
the exact same boxes as the free entries of T. Thus, the matched entries of T
remain matched in β k ( T ), whereas the free entries of T remain free in β k ( T )
(even if some of them change their values).
Since the matched entries of T remain unchanged under the map β k , we
therefore have

(# of matched k’s in β k ( T ))
= (# of matched k’s in T ) (271)

and

(# of matched (k + 1) ’s in β k ( T ))
= (# of matched (k + 1) ’s in T ) . (272)

On the other hand, the map β k flips the imbalance between free k’s and
free (k + 1)’s in each row (but all these free entries remain free, whereas the
matched entries of T remain matched in β k ( T )); therefore, it also flips the total
imbalance between free k’s and free (k + 1)’s in the entire tableau143 . Thus,

(# of free k’s in β k ( T ))
= (# of free (k + 1) ’s in T ) (273)
143 since the total # of free k’s in a tableau equals the sum of the #s of free k’s in all rows (and the same holds for free (k + 1)’s)

and

(# of free (k + 1) ’s in β k ( T ))
= (# of free k’s in T ) . (274)

Now, each k in β k ( T ) is either free or matched (but not both at the same
time). Hence,

    (# of k’s in β k ( T ))
    = (# of free k’s in β k ( T )) + (# of matched k’s in β k ( T ))
    = (# of free (k + 1) ’s in T ) + (# of matched k’s in T )            (by (273) and (271))
    = (# of free (k + 1) ’s in T ) + (# of matched (k + 1) ’s in T )     (by (267))
    = (# of (k + 1) ’s in T )

(since each k + 1 in T is either free or matched, but not both at the same time).
Also, each k + 1 in β k ( T ) is either free or matched (but not both at the same
time). Hence,

    (# of (k + 1) ’s in β k ( T ))
    = (# of free (k + 1) ’s in β k ( T )) + (# of matched (k + 1) ’s in β k ( T ))
    = (# of free k’s in T ) + (# of matched (k + 1) ’s in T )            (by (274) and (272))
    = (# of free k’s in T ) + (# of matched k’s in T )                   (by (267))
    = (# of k’s in T )

(since each k in T is either free or matched, but not both at the same time).
Moreover, if i ∈ [ N ] satisfies i ̸= k and i ̸= k + 1, then

(# of i’s in β k ( T )) = (# of i’s in T )

(since the map β k leaves all i’s in T unchanged144 , and does not replace any
other entries by i’s). Thus, Observation 3 is proved.]
From Observation 3, we can easily conclude the following:

Observation 4: We have x β k (T ) = sk · x T for each T ∈ SSYT (λ/µ).


144 because i ̸= k and i ̸= k + 1

[Proof of Observation 4: For the sake of completeness, here is a detailed proof. Let
T ∈ SSYT (λ/µ). Then, Definition 7.3.18 yields

    x^T = ∏_{i=1}^{N} x_i^{(# of times i appears in T)} = ∏_{i=1}^{N} x_i^{(# of i’s in T)}

(since (# of times i appears in T ) = (# of i’s in T ) for each i ∈ [ N ]). Splitting off
the factors for i = k and for i = k + 1 from this product, we obtain

    x^T = x_k^{(# of k’s in T)} · x_{k+1}^{(# of (k+1)’s in T)} · ∏_{i ∈ [N]; i ≠ k and i ≠ k+1} x_i^{(# of i’s in T)} .        (275)

The same argument (applied to β k ( T ) instead of T) yields

    x^{β_k(T)} = x_k^{(# of k’s in β_k(T))} · x_{k+1}^{(# of (k+1)’s in β_k(T))} · ∏_{i ∈ [N]; i ≠ k and i ≠ k+1} x_i^{(# of i’s in β_k(T))}
               = x_k^{(# of (k+1)’s in T)} · x_{k+1}^{(# of k’s in T)} · ∏_{i ∈ [N]; i ≠ k and i ≠ k+1} x_i^{(# of i’s in T)}
                     (by (268), (269) and (270))
               = x_{k+1}^{(# of k’s in T)} · x_k^{(# of (k+1)’s in T)} · ∏_{i ∈ [N]; i ≠ k and i ≠ k+1} x_i^{(# of i’s in T)} .

On the other hand, applying the transposition s_k (or, more precisely, the action of this
transposition s_k ∈ S_N on the ring P ) to both sides of the equality (275), we obtain

    s_k · x^T = s_k · ( x_k^{(# of k’s in T)} · x_{k+1}^{(# of (k+1)’s in T)} · ∏_{i ∈ [N]; i ≠ k and i ≠ k+1} x_i^{(# of i’s in T)} )
              = x_{k+1}^{(# of k’s in T)} · x_k^{(# of (k+1)’s in T)} · ∏_{i ∈ [N]; i ≠ k and i ≠ k+1} x_i^{(# of i’s in T)}

(since the action of s_k on P swaps the indeterminates x_k and x_{k+1} while leaving all
other indeterminates x_i unchanged). Comparing the last two equalities, we obtain
x^{β_k(T)} = s_k · x^T . This proves Observation 4.]
Now, the definition of sλ/µ yields

sλ/µ = ∑ xT . (276)
T ∈SSYT(λ/µ)

Applying the permutation sk ∈ S N (or, rather, the action of this permutation on



the ring P ) to both sides of this equality, we obtain


    s_k · s_{λ/µ} = s_k ·     ∑        x^T =       ∑        s_k · x^T
                         T ∈SSYT(λ/µ)         T ∈SSYT(λ/µ)

        (since the group S_N acts on the ring P by K-algebra automorphisms,
         and thus the action of s_k on P is K-linear)

                  =       ∑        x^{β_k(T)}              (by Observation 4)
                     T ∈SSYT(λ/µ)

                  =       ∑        x^T
                     T ∈SSYT(λ/µ)

        (here, we have substituted T for β k ( T ) in the sum, since the map
         β k : SSYT (λ/µ) → SSYT (λ/µ) is a bijection)

                  = s_{λ/µ}                                (by (276)) .
Now, forget that we fixed k. We thus have shown that
sk · sλ/µ = sλ/µ for each k ∈ [ N − 1] . (277)
Hence, Lemma 7.1.17 (applied to f = sλ/µ ) shows that the polynomial sλ/µ is
symmetric. This proves Theorem 7.3.21.
As we already mentioned, Theorem 7.3.11 (a) is a particular case of Theorem
7.3.21 (namely, the one obtained when we set µ = 0 = (0, 0, . . . , 0)), because
sλ/0 = sλ .

7.3.5. The Littlewood–Richardson rule


The Bender–Knuth involutions have served us well in the above proof of Theo-
rem 7.3.21, but they have much more to offer. We will soon see them prove one
of the most famous results in the theory of symmetric polynomials, namely
the Littlewood–Richardson rule. In the process, we will also (finally) prove
Theorem 7.3.11 (b).
The Littlewood–Richardson rule has its roots in the representation theory of
the classical groups (specifically, GL N (C)). We shall say a few words about
this motivation before we move on to stating the rule itself (which is purely
combinatorial, as is its proof). At the simplest level, the Littlewood–Richardson
rule is about expanding the product sν sλ of two Schur polynomials as a sum of
other Schur polynomials. For instance, for N = 4, we have
s2100 s1100 = s2111 + s2210 + s3110 + s3200 ,
where we are omitting commas and parentheses for brevity (i.e., we are writing
2100 for the N-partition (2, 1, 0, 0), and likewise for the other N-partitions).
Likewise, for N = 3, we have
s210 s210 = s222 + 2s321 + s330 + s411 + s420 .

I think it was Hermann Weyl who originally proved the existence of such an
expansion (i.e., that any product sν sλ of two Schur polynomials is a sum of
Schur polynomials). The original proof used Lie group representations. The
idea of the proof, in a nutshell, is the following (skip this paragraph if you are
unfamiliar with representation theory): The irreducible polynomial represen-
tations of the classical group GL N (C) are (more or less) in bijection with the
N-partitions, meaning that there is an irreducible polynomial representation
Vλ for each N-partition λ, and all irreps (= irreducible polynomial representa-
tions) of GL N (C) have this form145 . These Vλ ’s are known as the Weyl modules,
or in a slightly more general form as the Schur functors. The tensor product of
two such irreps can be decomposed as a direct sum of irreps (since polynomial
representations of GL N (C) are completely reducible):
    Vν ⊗ Vλ ≅          ⊕          Vω^{⊕ c(ν,λ,ω)}
              ω is an N-partition

(where Vω^{⊕ c(ν,λ,ω)} denotes a direct sum of c (ν, λ, ω ) many Vω ’s).

The multiplicities c (ν, λ, ω ) in this decomposition are precisely the coefficients


that you get when you decompose the product of the Schur polynomials sν and
sλ as a sum of Schur polynomials:

sν sλ = ∑ c (ν, λ, ω ) sω .
ω is an N-partition

In fact, the Schur polynomials sλ are the so-called characters of the irreps Vλ ,
and it is known that tensor products of representations correspond to products
of their characters.
All of this, in the detail it deserves, is commonly taught in a 1st or 2nd course
on representation theory (e.g., [Proces07] or [EGHetc11] or [Prasad15, Chapter
6]). But we are here for something else: we want to know these c (ν, λ, ω )’s. In
other words, we want a formula that expands a product sν sλ as a finite sum of
Schur polynomials.
Such a formula was first conjectured by Dudley Ernest Littlewood and Archibald
Read Richardson in 1934. It remained unproven for 40 years, not least because
the statement was not very clear. In the 1970s, proofs were found independently
by Marcel-Paul Schützenberger and Glanffrwd Thomas. Since then, at least a
dozen different proofs have appeared. The proof that I will show was pub-
lished by Stembridge in 1997 (in [Stembr02], perhaps one of the most readable
papers in all of mathematics), and crystallizes decades of work by many au-
thors (Gasharov’s somewhat similar proof [Gashar98] probably being the main
harbinger). It will prove not just an expansion for sν sλ , but also a generalization
(replacing sλ by a skew Schur polynomial sλ/µ ) found by Zelevinsky in 1981

145 At least if one uses the “right” definition of a polynomial representation. See [KraPro10, §5
and §6] or [Prasad15, §6.1] for details.

([Zelevi81]), as well as Theorem 7.3.11 (b). My presentation of this proof will


follow [GriRei20, §2.6] (which, in turn, elaborates on [Stembr02]).
To state the Littlewood–Richardson rule, we need some notions and nota-
tions. We begin with the notations:

Definition 7.3.25. (a) We let 0 denote the N-tuple (0, 0, . . . , 0) ∈ N N .


(b) Let α = (α1 , α2 , . . . , α N ) and β = ( β 1 , β 2 , . . . , β N ) be two N-tuples in
N N . Then, we set

α + β : = ( α1 + β 1 , α2 + β 2 , . . . , α N + β N ) and
α − β : = ( α1 − β 1 , α2 − β 2 , . . . , α N − β N ) .

Note that α + β ∈ N N , whereas α − β ∈ Z N .


Of course, the addition operation + that we just defined on the set N N is
associative and commutative, and the N-tuple 0 is its neutral element. The sub-
traction operation − undoes +. Note that the operation + defined in Definition
7.3.25 is precisely the one that we used in Theorem 7.3.11 (b). We notice that
(using the notation of Definition 7.2.3 (a)) we have
x α x β = x α+ β (278)
for any two N-tuples α, β ∈ N N (check this!).
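As a quick numeric sanity check of (278) (a throwaway sketch; the sample tuples and evaluation point below are chosen here for illustration only): evaluating both sides at any point turns addition of exponent tuples into multiplication of monomial values.

```python
alpha, beta = (2, 0, 1), (1, 1, 0)   # two N-tuples with N = 3 (illustrative values)
point = (2, 3, 5)                    # sample numeric values for x1, x2, x3

def monomial(expo):
    """Evaluate x^expo at the chosen sample point."""
    value = 1
    for x, e in zip(point, expo):
        value *= x ** e
    return value

# x^alpha * x^beta = x^(alpha + beta), as in (278)
total = tuple(a + b for a, b in zip(alpha, beta))
assert monomial(alpha) * monomial(beta) == monomial(total)
```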
Our next piece of notation is mostly a bookkeeping device:
Definition 7.3.26. Let λ and µ be two N-partitions. Let T be a tableau of
shape λ/µ. We define the content of T to be the N-tuple ( a1 , a2 , . . . , a N ),
where

ai := (# of i’s in T ) = (# of boxes c of T such that T (c) = i ) .

We denote this N-tuple by cont T.

    For instance, if N = 5, then cont   1 1 2   = (2, 1, 0, 1, 0) .
                                        4
Note that

    x^T = x^{cont T}            for any tableau T.             (279)

(Indeed, both sides of this equality equal ∏_{i=1}^{N} x_i^{(# of i’s in T)} .)
Another notation lets us cut certain columns out of a tableau:
Definition 7.3.27. Let λ and µ be two N-partitions. Let T be a tableau of
shape λ/µ. Let j be a positive integer. Then, col≥ j T means the restriction of
T to columns j, j + 1, j + 2, . . . (that is, the result of removing the first j − 1
columns from T). Formally speaking, this means the restriction of the map
T to the set {(u, v) ∈ Y (λ/µ) | v ≥ j}.

For example,

              1 1 2           1 2
    col≥3     2 3       =     3              and
            1 3 5             5
            2 2

              1 1 2
    col≥5     2 3       =     (empty tableau) .
            1 3 5
            2 2
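Both cont and col≥j are immediate to compute. The sketch below is an illustration only (it again assumes the dict encoding of a tableau by boxes (row, col), which is not part of the notes):

```python
def cont(T, N):
    """The content of a tableau T as an N-tuple (Definition 7.3.26)."""
    counts = [0] * N
    for entry in T.values():
        counts[entry - 1] += 1
    return tuple(counts)

def col_geq(T, j):
    """The restriction of T to columns j, j+1, j+2, ... (Definition 7.3.27)."""
    return {(r, c): e for (r, c), e in T.items() if c >= j}
```

On the tableau from the example after Definition 7.3.26 (first row 1 1 2, second row 4, with N = 5), cont returns (2, 1, 0, 1, 0), and col_geq(T, 1) returns T itself.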

Remark 7.3.28. What shape does the tableau col≥ j T in Definition 7.3.27 have?
We don’t care, since we will only need this tableau for its content cont (col≥ j T )
(which is defined independently of the shape). However, the answer is not
hard to give: If λ = (λ1 , λ2 , . . . , λ N ) and µ = (µ1 , µ2 , . . . , µ N ), then col≥ j T is
a skew Young tableau of shape λ/µ′ , where

    µ′ = (max {µ1 , min { j − 1, λ1 }} , max {µ2 , min { j − 1, λ2 }} , . . . , max {µ N , min { j − 1, λ N }}) .

(Thus, the first j − 1 columns of col≥ j T are empty, i.e., have no boxes.)

Note that col≥1 T = T for any tableau T.


Now we are ready to define a nontrivial notion:

Definition 7.3.29. Let λ, µ, ν be three N-partitions. A semistandard tableau


T of shape λ/µ is said to be ν-Yamanouchi (this is an adjective) if, for each
positive integer j, the N-tuple ν + cont (col≥ j T ) ∈ N^N is an N-partition (i.e.,
weakly decreasing).

This is a complex and somewhat confusing notion; before we move on, let us
thus give a metaphor that might help clarify it, and several examples.

Remark 7.3.30. Definition 7.3.29 becomes somewhat easier to conceptualize


(and memorize) through a voting metaphor (which, incidentally, is the reason
why 0-Yamanouchi tableaux are sometimes called “ballot tableaux”):
Let λ, µ, ν and T be as in Definition 7.3.29. Consider an election between N
candidates numbered 1, 2, . . . , N. Regard each entry i of T as a single vote for
candidate i. Thus, for example, the tableau

    1 2
    2 5

has one vote for candidate 1, two votes for candidate 2, and one for candidate
5. Now, we count

the votes by keeping a “tally board”, i.e., an N-tuple ( a1 , a2 , . . . , a N ) ∈ N^N


that records how many votes each candidate has received (namely, candidate
i has received ai votes). Assume that, at the beginning of our counting pro-
cess, the tally board is ν (as a consequence of ballot stuffing, or because some
votes have already been counted on the previous day). Now, we process the
votes from the tableau T, column by column, starting with the rightmost col-
umn and moving left. Each time a column is processed, all the votes from
this column are simultaneously added to our tally board. Thus, after the
rightmost column is processed, our tally board is ν + cont (col≥ j T ), where
j is the index of the rightmost column (i.e., the rightmost column is the j-th
column). Then, the second-to-rightmost column gets processed, and the tally
board becomes ν + cont (col≥ j−1 T ). And so on, until all columns have been
processed.
Now, the tableau T is ν-Yamanouchi if and only if the tally board has
stayed weakly decreasing (i.e., candidate 1 has at least as many votes as
candidate 2, who in turn has at least as many votes as candidate 3, who in
turn has at least as many votes as candidate 4, and so on) throughout the
vote counting process. This is just a trivial restatement of the definition of
“ν-Yamanouchi”, but in my impression it is conducive to understanding.
One takeaway from this interpretation is the following useful feature of
the vote counting process: No candidate gains more than one vote at a single
time (because no column of T has two equal entries). Thus, the number of
votes for any given candidate increases only in small steps (viz., not at all or
by only 1 vote).

Example 7.3.31. (a) Let N = 3 and ν = 0 = (0, 0, 0). Which of the following
six tableaux are 0-Yamanouchi?

T1 =   1 1 ,        T2 =   1 1 ,        T3 =   1 2 ,
     2 2                 2 3                 2 2

T4 =   1 ,          T5 =   1 ,          T6 =       1 1 .
     1                   1                     1 2 2
     2                   3                   2 3
Note that all six of these tableaux are semistandard.
Let us check whether T1 is 0-Yamanouchi. Indeed, we compute the N-tuple
ν + cont (col≥ j T1 ) ∈ N^N for each positive integer j, obtaining


ν + cont (col≥1 T1 ) = 0 + (2, 2, 0) = (2, 2, 0) ;


ν + cont (col≥2 T1 ) = 0 + (2, 1, 0) = (2, 1, 0) ;
ν + cont (col≥3 T1 ) = 0 + (1, 0, 0) = (1, 0, 0) ;

ν + cont (col≥ j T1 ) = 0 + (0, 0, 0) = (0, 0, 0)        for each j ≥ 4.

All of the results (2, 2, 0), (2, 1, 0), (1, 0, 0) and (0, 0, 0) are N-partitions. Thus,
T1 is 0-Yamanouchi.
Let us check whether T2 is 0-Yamanouchi. Indeed,

ν + cont (col≥1 T2 ) = 0 + (2, 1, 1) = (2, 1, 1) is an N-partition;


ν + cont (col≥2 T2 ) = 0 + (2, 0, 1) = (2, 0, 1) is not an N-partition.

Thus, T2 is not 0-Yamanouchi.


The tableau T3 is not 0-Yamanouchi, since ν + cont (col≥1 T3 ) = (1, 3, 0) is
not an N-partition.
The tableau T4 is 0-Yamanouchi.
The tableau T5 is not 0-Yamanouchi, since ν + cont (col≥1 T5 ) = (2, 0, 1) is
not an N-partition.
The tableau T6 is 0-Yamanouchi.
(b) So we know that T2 , T3 , T5 are not 0-Yamanouchi. However, they are
ν-Yamanouchi for some other N-partitions ν. For example:

• The tableau T2 becomes ν-Yamanouchi for ν = (1, 1, 0).

• The tableau T3 becomes ν-Yamanouchi for ν = (2, 0, 0).

• The tableau T5 becomes ν-Yamanouchi for ν = (1, 1, 0).

These are, in a sense, the “minimal” choices of ν for this to happen, but of
course there are many other choices of ν that work.
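The vote-counting process translates directly into a checker. This sketch is illustrative only: it processes the occupied columns from right to left and tests the tally after each one, and the box coordinates used below are one possible encoding of the shapes in Example 7.3.31 (the dict encoding is an assumption of this sketch, not of the notes):

```python
def is_yamanouchi(T, nu):
    """Check whether the tableau T (a dict {(row, col): entry}) is
    nu-Yamanouchi: starting the tally at nu and adding the votes column
    by column from the right, the tally must stay weakly decreasing."""
    tally = list(nu)
    for j in sorted({c for (_, c) in T}, reverse=True):   # rightmost column first
        for (r, c), e in T.items():
            if c == j:
                tally[e - 1] += 1                         # one vote for candidate e
        if any(tally[t] < tally[t + 1] for t in range(len(tally) - 1)):
            return False                                  # tally stopped decreasing
    return True
```

With T1 and T2 encoded with their first rows shifted one box to the right, the checker reproduces the verdicts of Example 7.3.31: T1 is 0-Yamanouchi, T2 is not, and T2 is (1, 1, 0)-Yamanouchi.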

We can now state the Littlewood–Richardson rule:

Theorem 7.3.32 (Zelevinsky’s generalized Littlewood–Richardson rule, in Ya-


manouchi form). Let λ, µ, ν be three N-partitions. Then,

sν · sλ/µ = ∑ sν+cont T . (280)


T is a ν-Yamanouchi
semistandard tableau
of shape λ/µ

Some comments are in order:

• In the sum on the right hand side of (280), the Schur polynomial sν+cont T
  is always well-defined. Indeed, if T is a ν-Yamanouchi semistandard
  tableau of shape λ/µ, then ν + cont T = ν + cont (col≥1 T ) (since
  T = col≥1 T) is an N-partition (by the definition of “ν-Yamanouchi”),
  so that sν+cont T is a well-defined Schur polynomial.

• Theorem 7.3.32 expresses a product of a regular Schur polynomial sν with


a skew Schur polynomial sλ/µ as a sum of Schur polynomials. You can
get a similar formula for the product of two regular Schur polynomials
by setting µ = 0 = (0, 0, . . . , 0) in Theorem 7.3.32, so that sλ/µ becomes
sλ/0 = sλ .

Example 7.3.33. Let us apply Theorem 7.3.32 to N = 3 and ν = (1, 0, 0) and


λ = (2, 1, 0) and µ = 0 = (0, 0, 0). Thus we get

s(1,0,0) · s(2,1,0) = ∑ s(1,0,0)+cont T . (281)


T is a (1,0,0)-Yamanouchi
semistandard tableau
of shape (2,1,0)/0

What are the T’s in the sum? The (1, 0, 0)-Yamanouchi semistandard tableaux
of shape (2, 1, 0) /0 are

    1 1 ,        1 2 ,        1 2 ,
    2            2            3

and the corresponding addends of our sum are

s(1,0,0)+(2,1,0) = s(3,1,0) ,
s(1,0,0)+(1,2,0) = s(2,2,0) ,
s(1,0,0)+(1,1,1) = s(2,1,1) .

Thus, the equality (281) rewrites as

s(1,0,0) · s(2,1,0) = s(3,1,0) + s(2,2,0) + s(2,1,1) .

Note that s(1,0,0) = x1 + x2 + x3 and s(2,1,0) = ∑_{i ≤ j and i < k} xi x j xk .
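The two expansions in this example can be double-checked by brute force: enumerate all semistandard tableaux of each straight shape, collect the Schur polynomials as dictionaries of monomials, and compare. The sketch below (helper names and encodings invented here; this is a small-case check, not an efficient algorithm) verifies the expansion of s(1,0,0) · s(2,1,0):

```python
from collections import Counter

def ssyt(shape, N):
    """Enumerate the semistandard tableaux of a straight shape
    (a tuple of weakly decreasing row lengths) with entries in [N]."""
    boxes = [(i, j) for i, length in enumerate(shape) for j in range(length)]
    def fill(idx, F):
        if idx == len(boxes):
            yield dict(F)
            return
        i, j = boxes[idx]
        lo = 1
        if j > 0:
            lo = max(lo, F[(i, j - 1)])       # rows increase weakly
        if i > 0:
            lo = max(lo, F[(i - 1, j)] + 1)   # columns increase strictly
        for e in range(lo, N + 1):
            F[(i, j)] = e
            yield from fill(idx + 1, F)
            del F[(i, j)]
    return fill(0, {})

def schur(shape, N):
    """The Schur polynomial s_shape in N variables, stored as a Counter
    mapping exponent N-tuples to coefficients."""
    poly = Counter()
    for T in ssyt(shape, N):
        expo = [0] * N
        for e in T.values():
            expo[e - 1] += 1
        poly[tuple(expo)] += 1
    return poly

def poly_mul(p, q):
    """Multiply two polynomials given as Counters of exponent tuples."""
    prod = Counter()
    for a, ca in p.items():
        for b, cb in q.items():
            prod[tuple(x + y for x, y in zip(a, b))] += ca * cb
    return prod

# the expansion from Example 7.3.33, for N = 3:
N = 3
lhs = poly_mul(schur((1,), N), schur((2, 1), N))
rhs = schur((3, 1), N) + schur((2, 2), N) + schur((2, 1, 1), N)
assert lhs == rhs
```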

We will prove the Littlewood–Richardson rule as a consequence of the fol-


lowing lemma:
Lemma 7.3.34 (Stembridge’s Lemma). Let λ, µ, ν be three N-partitions. Then,

aν+ρ · sλ/µ = ∑ aν+cont T +ρ .


T is a ν-Yamanouchi
semistandard tableau
of shape λ/µ

Before we prove this lemma, let us explore its consequences. One of them is
the Littlewood–Richardson rule; another is Theorem 7.3.11 (b). Let us first see
how the latter can be derived from the lemma. This derivation, in turn, relies
on another (simple) lemma:

Lemma 7.3.35. Let λ be any N-partition. Let T be a semistandard tableau of


shape λ. Then, T (i, j) ≥ i for each (i, j) ∈ Y (λ).

Proof of Lemma 7.3.35. This lemma is an easy consequence of the fact that the
entries of a semistandard tableau increase strictly down each column. A de-
tailed proof is given in Section B.10.
Proof of Theorem 7.3.11 (b) using Lemma 7.3.34. Recall that 0 = (0, 0, . . . , 0) ∈ N N .
Applying Lemma 7.3.34 to µ = 0 and ν = 0, we obtain

a0+ρ · sλ/0 = ∑ a0+cont T +ρ .


T is a 0-Yamanouchi
semistandard tableau
of shape λ/0

This rewrites as
aρ · sλ = ∑ acont T +ρ (282)
T is a 0-Yamanouchi
semistandard tableau
of shape λ/0

(since 0 + ρ = ρ and sλ/0 = sλ and 0 + cont T = cont T).


Now, we shall analyze the sum on the right hand side. What are the 0-
Yamanouchi semistandard tableaux of shape λ/0 ? One such tableau is easy to
construct: namely, the one tableau (of shape λ/0) whose all entries in the 1-st
row are 1’s, all entries in the 2-nd row are 2’s, all entries in the 3-rd row are 3’s,
and so on. Let us call this tableau minimalistic, and denote it by T0 . Formally
speaking, this minimalistic tableau T0 is defined to be the map Y (λ/0) → [ N ]
that sends each (i, j) ∈ Y (λ/0) to i. Here is how this minimalistic tableau looks
for N = 4 and λ = (4, 2, 2, 1):

    T0 =   1 1 1 1 .
           2 2
           3 3
           4

It turns out that this minimalistic tableau is the only T on the right hand side
of (282). This will follow from the following two observations:

Observation 1: The minimalistic tableau T0 is a 0-Yamanouchi semis-


tandard tableau of shape λ/0.

Observation 2: If T is a 0-Yamanouchi semistandard tableau of shape


λ/0, then T = T0 .
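For a small λ, both observations can be confirmed by exhaustive search. The following self-contained sketch (using the same illustrative dict encoding as the earlier sketches; none of these helpers appear in the notes) enumerates every semistandard tableau of shape λ/0 and keeps the 0-Yamanouchi ones:

```python
def ssyt(shape, N):
    """Enumerate semistandard tableaux of a straight shape, entries in [N]."""
    boxes = [(i, j) for i, length in enumerate(shape) for j in range(length)]
    def fill(idx, F):
        if idx == len(boxes):
            yield dict(F)
            return
        i, j = boxes[idx]
        lo = 1
        if j > 0:
            lo = max(lo, F[(i, j - 1)])       # rows increase weakly
        if i > 0:
            lo = max(lo, F[(i - 1, j)] + 1)   # columns increase strictly
        for e in range(lo, N + 1):
            F[(i, j)] = e
            yield from fill(idx + 1, F)
            del F[(i, j)]
    return fill(0, {})

def is_yamanouchi(T, nu):
    """Right-to-left, column-by-column tally check of Definition 7.3.29."""
    tally = list(nu)
    for j in sorted({c for (_, c) in T}, reverse=True):
        for (r, c), e in T.items():
            if c == j:
                tally[e - 1] += 1
        if any(tally[t] < tally[t + 1] for t in range(len(tally) - 1)):
            return False
    return True

N, lam = 4, (4, 2, 2, 1)
# the minimalistic tableau: every box in row i carries the entry i
T0 = {(i, j): i + 1 for i, length in enumerate(lam) for j in range(length)}
winners = [T for T in ssyt(lam, N) if is_yamanouchi(T, [0] * N)]
assert winners == [T0]
```

For λ = (4, 2, 2, 1) and N = 4, the only survivor is the minimalistic tableau, exactly as Observations 1 and 2 claim.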

[Proof of Observation 1: It is clear that T0 is a semistandard tableau of shape λ/0.


Thus, we only need to show that it is 0-Yamanouchi. In other words, we need to show

that for each positive integer j, the N-tuple 0 + cont (col≥ j T0 ) ∈ N^N is an N-partition


(i.e., weakly decreasing).


This can be done directly: Write λ in the form λ = (λ1 , λ2 , . . . , λ N ). Thus, λ1 ≥ λ2 ≥
· · · ≥ λ N (since λ is an N-partition). Let j be a positive integer. Recall that the tableau
T0 is minimalistic; hence, the restricted tableau col≥ j T0 is itself minimalistic (meaning
that all its entries in the 1-st row are 1’s, all entries in the 2-nd row are 2’s, all entries
in the 3-rd row are 3’s, and so on). Therefore, for each i ∈ [ N ], we have

    (# of i’s in col≥ j T0 )
    = (# of boxes in the i-th row of col≥ j T0 )
    = max {λi − ( j − 1) , 0}                                  (283)

(since the i-th row of T0 has λi many boxes, and thus the i-th row of col≥ j T0 has
max {λi − ( j − 1) , 0} many boxes). Now, the N-tuple

0 + cont (col≥ j T0 )
= cont (col≥ j T0 )
= (# of 1’s in col≥ j T0 , # of 2’s in col≥ j T0 , . . . , # of N’s in col≥ j T0 )
= (max {λ1 − ( j − 1) , 0} , max {λ2 − ( j − 1) , 0} , . . . , max {λ N − ( j − 1) , 0})
(by (283))

is weakly decreasing (since λ1 ≥ λ2 ≥ · · · ≥ λ N quickly yields max {λ1 − ( j − 1) , 0} ≥
max {λ2 − ( j − 1) , 0} ≥ · · · ≥ max {λ N − ( j − 1) , 0}), and thus is an N-partition.
Forget that we fixed j. Thus, we have shown that for each positive integer j, the N-tuple
0 + cont (col≥ j T0 ) ∈ N^N is an N-partition. This proves that the tableau T0 is
0-Yamanouchi. This completes the proof of Observation 1.]


[Proof of Observation 2: Let T be a 0-Yamanouchi semistandard tableau of shape λ/0.
We must prove that T = T0 .
Let us assume the contrary. Thus, T ̸= T0 . Hence, there exists some (i, j) ∈ Y (λ/0)
satisfying T (i, j) ̸= T0 (i, j). Choose such an (i, j) with maximum possible j. More
precisely, among all such pairs (i, j) with maximum possible j, we choose one with the
minimum possible i.
Thus, for each (i′ , j′ ) ∈ Y (λ/0) satisfying j′ > j, we have

T i′ , j′ = T0 i′ , j′
 
(284)

(since we have chosen (i, j) to have maximum possible j among the pairs satisfying
T (i, j) ̸= T0 (i, j)). Furthermore, for each (i′ , j′ ) ∈ Y (λ/0) satisfying i′ < i and j′ = j,
we have
T (i′ , j′ ) = T0 (i′ , j′ )    (285)
(since we have chosen (i, j) to have minimum possible i among the maximum-j pairs
satisfying T (i, j) ̸= T0 (i, j)).
The definition of the minimalistic tableau T0 yields T0 (i, j) = i. Set p := T (i, j).
Hence, p = T (i, j) ̸= T0 (i, j) = i. 146

The number p appears at least once in the j-th column of T (since p = T (i, j)), and
thus appears at least once in the restricted tableau col≥ j T (since this restricted tableau
contains the j-th column of T).
The definition of Y (λ/0) yields Y (λ/0) = Y (λ) \ Y (0) = Y (λ) (since Y (0) = ∅). Hence, a tableau
of shape λ/0 is the same as a tableau of shape λ. Thus, T is a tableau of shape λ (since
T is a tableau of shape λ/0). Since T is semistandard, we can thus apply Lemma 7.3.35,
and conclude that T (i, j) ≥ i. Hence, p = T (i, j) ≥ i. Combining this with p ̸= i, we
obtain p > i. In other words, i < p.
Now, recall that T is 0-Yamanouchi; hence, 0 + cont (col≥ j T ) is an N-partition (by
the definition of “0-Yamanouchi”). In other words, cont (col≥ j T ) is an N-partition
(since 0 + cont (col≥ j T ) = cont (col≥ j T )). Write this N-partition cont (col≥ j T ) as
( a1 , a2 , . . . , a N ). For each k ∈ [ N ], its entry ak is the # of k’s in col≥ j T (by the
definition of cont (col≥ j T )). Applying this to k = i, we see that ai is the # of i’s in col≥ j T.
Similarly, a p is the # of p’s in col≥ j T. Hence, a p ≥ 1 (since we know that the number
p appears at least once in the restricted tableau col≥ j T). However, a1 ≥ a2 ≥ · · · ≥ a N
(since ( a1 , a2 , . . . , a N ) is an N-partition), and thus ai ≥ a p (since i < p). Hence, ai ≥
a p ≥ 1. In other words, the number i appears at least once in the restricted tableau
col≥ j T (since ai is the # of i’s in col≥ j T). In other words, the number i appears at least
once in one of the columns j, j + 1, j + 2, . . . of the tableau T. In other words, there
exists some (i′ , j′ ) ∈ Y (λ/0) satisfying j′ ≥ j and T (i′ , j′ ) = i. Consider this (i′ , j′ ).
Let us first assume (for the sake of contradiction) that j′ > j. Thus, (284) yields
T (i′ , j′ ) = T0 (i′ , j′ ) = i′ (by the definition of the minimalistic tableau T0 ). There-
fore, i′ = T (i′ , j′ ) = i. Hence, we can rewrite (i′ , j′ ) ∈ Y (λ/0) and T (i′ , j′ ) = i as
(i, j′ ) ∈ Y (λ/0) and T (i, j′ ) = i. Also, j < j′ (since j′ > j). However, the tableau T is
semistandard; thus, its entries increase weakly along each row. Therefore, from j < j′ ,
we obtain T (i, j) ≤ T (i, j′ ) 147 . Thus, p = T (i, j) ≤ T (i, j′ ) = i. But this contradicts
p > i.
This contradiction shows that our assumption (that j′ > j) was false. Hence, we
must have j′ ≤ j. Combined with j′ ≥ j, this yields j′ = j. Thus, we can rewrite
(i′ , j′ ) ∈ Y (λ/0) and T (i′ , j′ ) = i as (i′ , j) ∈ Y (λ/0) and T (i′ , j) = i.
We assume (for the sake of contradiction) that i′ < i. Hence, (285) yields T (i′ , j′ ) =
T0 (i′ , j′ ) = i′ (by the definition of the minimalistic tableau T0 ), so that i′ = T (i′ , j′ ) = i;

146 Here is an example of how our tableau T can look at this point (for N = 6 and
λ = (6, 5, 5, 2, 2, 1) and (i, j) = (3, 2)):

? 1 1 1 1 1
? 2 2 2 2
? p 3 3 3
? ?
? ?
?

Here, the known entries come from (284) and (285) (since the definition of the minimalistic
tableau T0 shows that T0 (i′ , j′ ) = i′ for each (i′ , j′ ) ∈ Y (λ/0)).
147 Strictly speaking, this follows by applying Lemma 7.3.17 (a) to (i, j) and (i, j′ ) instead of (i, j1 )
and (i, j2 ).

but this contradicts i′ < i.


This contradiction shows that our assumption (that i′ < i) was false. Hence, we
must have i′ ≥ i. In other words, i ≤ i′ . However, the tableau T is semistandard;
thus, its entries increase strictly down each column. Therefore, from i ≤ i′ , we obtain
T (i, j) ≤ T (i′ , j) 148 . Thus, T (i′ , j) ≥ T (i, j) = p > i, so that i < T (i′ , j) = i. Thus,
we have obtained a contradiction again. This contradiction shows that our assumption
(that T ̸= T0 ) was false; hence, T = T0 . This proves Observation 2.]
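Observations 1 and 2 can also be confirmed by brute force for small shapes. The following Python sketch (ad hoc helper names, not from the text) enumerates all semistandard tableaux of shape (2, 1) with entries in [3] and keeps the 0-Yamanouchi ones; only the minimalistic tableau survives:

```python
from itertools import product

def ssyt_of_shape(lam, N):
    """Brute-force all semistandard tableaux of straight shape lam, entries in [N]."""
    boxes = [(i, j) for i, l in enumerate(lam, start=1) for j in range(1, l + 1)]
    for values in product(range(1, N + 1), repeat=len(boxes)):
        T = dict(zip(boxes, values))
        rows_ok = all(T[i, j] <= T[i, j + 1] for (i, j) in boxes if (i, j + 1) in T)
        cols_ok = all(T[i, j] < T[i + 1, j] for (i, j) in boxes if (i + 1, j) in T)
        if rows_ok and cols_ok:
            yield T

def is_zero_yamanouchi(T, N):
    """cont(col>=j T) must be weakly decreasing for every j (the nu = 0 case)."""
    maxcol = max(j for (_, j) in T)
    for j in range(1, maxcol + 2):
        c = [0] * N
        for (_, col), e in T.items():
            if col >= j:
                c[e - 1] += 1
        if any(c[i] < c[i + 1] for i in range(N - 1)):
            return False
    return True

lam, N = (2, 1), 3
yam = [T for T in ssyt_of_shape(lam, N) if is_zero_yamanouchi(T, N)]
print(len(yam))                                     # 1
print(yam[0] == {(1, 1): 1, (1, 2): 1, (2, 1): 2})  # True: the minimalistic tableau
```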
Combining Observation 1 with Observation 2, we see that the minimalis-
tic tableau T0 is the only 0-Yamanouchi semistandard tableau of shape λ/0.
Hence, the sum on the right hand side of (282) has only one addend, namely
the addend for T = T0 . Thus, (282) simplifies to

aρ · sλ = acont(T0 )+ρ = aλ+ρ ,

since it is easy to see that cont ( T0 ) = λ. This proves Theorem 7.3.11 (b) (using
Lemma 7.3.34).
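Theorem 7.3.11 (b) can be sanity-checked numerically by evaluating both sides at sample points. The sketch below (ad hoc names; a Leibniz-formula determinant, adequate for N = 3) does this for ν = (2, 1, 0):

```python
from itertools import permutations, product

def det(M):
    """Determinant via the Leibniz formula (fine for tiny matrices)."""
    n, total = len(M), 0
    for sigma in permutations(range(n)):
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if sigma[a] > sigma[b]:
                    sign = -sign
        term = sign
        for i in range(n):
            term *= M[i][sigma[i]]
        total += term
    return total

def alternant(beta, xs):
    """a_beta evaluated at the points xs: det of the matrix (xs[i] ** beta[j])."""
    return det([[x ** b for b in beta] for x in xs])

def schur_value(lam, xs):
    """s_lam(xs), computed as the sum of x^{cont T} over all SSYT T of shape lam."""
    N = len(xs)
    boxes = [(i, j) for i, l in enumerate(lam, start=1) for j in range(1, l + 1)]
    total = 0
    for values in product(range(1, N + 1), repeat=len(boxes)):
        T = dict(zip(boxes, values))
        if all(T[i, j] <= T[i, j + 1] for (i, j) in boxes if (i, j + 1) in T) \
           and all(T[i, j] < T[i + 1, j] for (i, j) in boxes if (i + 1, j) in T):
            term = 1
            for e in T.values():
                term *= xs[e - 1]
            total += term
    return total

xs = (2, 3, 5)          # arbitrary evaluation points, N = 3
rho = (2, 1, 0)         # rho = (N - 1, N - 2, ..., 0)
nu = (2, 1, 0)
nu_plus_rho = tuple(a + b for a, b in zip(nu, rho))   # (4, 2, 0)
print(alternant(nu_plus_rho, xs))                     # -1680
print(alternant(rho, xs) * schur_value((2, 1), xs))   # (-6) * 280 = -1680
```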
Let us furthermore derive Theorem 7.3.32 from Lemma 7.3.34. This relies on
some elementary properties of certain polynomials. The underlying notion is
defined in an arbitrary commutative ring:

Definition 7.3.36. Let L be a commutative ring. Let a ∈ L. The element a of


L is said to be regular if and only if every x ∈ L satisfying ax = 0 satisfies
x = 0.
Regular elements of a commutative ring are often called “non-zero-divisors”149
or cancellable elements. The latter word is explained by the following simple
fact:

Lemma 7.3.37. Let L be a commutative ring. Let a, u, v ∈ L be such that a is


regular. Assume that au = av. Then, u = v.

Proof of Lemma 7.3.37. We have a (u − v) = au − av = 0 (since au = av). How-


ever, a is regular; in other words, every x ∈ L satisfying ax = 0 satisfies x = 0
(by the definition of “regular”). Applying this to x = u − v, we obtain u − v = 0
(since a (u − v) = 0). Thus, u = v. This proves Lemma 7.3.37.
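A quick illustration of regularity in a familiar ring: in Z/n, the regular elements are exactly the units. This tiny Python check (the name regular_elements is ours) makes that concrete for n = 6 and n = 7:

```python
def regular_elements(n):
    """Elements a of Z/n such that a*x = 0 (mod n) forces x = 0 (mod n)."""
    return [a for a in range(n) if all(a * x % n != 0 for x in range(1, n))]

print(regular_elements(6))  # [1, 5]; e.g. 2 is not regular, since 2 * 3 = 0 in Z/6
print(regular_elements(7))  # [1, 2, 3, 4, 5, 6]; Z/7 is a field
```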
Lemma 7.3.37 shows that regular elements of a commutative ring can be
cancelled when they appear as factors on both sides of an equality. To make
use of this, we need to actually find nontrivial regular elements. Here is one:
148 Strictly speaking, this follows by applying Lemma 7.3.17 (b) to (i, j) and (i′ , j) instead of
(i1 , j) and (i2 , j).
149 This name is somewhat murky in the literature (and is best avoided). In fact, many authors

prefer to consider 0 to be a non-zero-divisor as well (so that they can say that an integral
domain has no zero-divisors, rather than saying that the only zero-divisor in an integral
domain is 0), even though 0 is not a regular element (unless the ring L is trivial). This
exception tends to make the notion of a non-zero-divisor fickle and unreliable.

Lemma 7.3.38. The element aρ of the polynomial ring P is regular.


Proof of Lemma 7.3.38 (sketched). There are different ways to prove this. One is to
define a lexicographic order on the monomials in P , and to argue that the lead-
ing coefficient of aρ with respect to this order is 1 (which is a regular element
of R); this uses a bit of multivariate polynomial theory (see [23wa, Proposition
6.2.10] for the properties of polynomials that are used here).150
Here is a more elementary proof. First, we observe that each of the indetermi-
nates x1 , x2 , . . . , x N is regular (as an element of P ). Indeed, multiplying a poly-
nomial f by an indeterminate xi merely shifts the coefficients of f to different
monomials; thus, if xi f = 0, then f = 0. Next, we conclude that the polynomial
xi − x j ∈ P is regular whenever 1 ≤ i < j ≤ N. Indeed, this polynomial xi − x j
is the image of the indeterminate xi under a certain K-algebra automorphism
of P (namely, under the automorphism that sends xi to xi − x j while leaving
all other indeterminates unchanged151 ), and therefore is regular because xi is
regular (and because any ring automorphism sends regular elements to regu-
lar elements). However, it is easy to see (see [Grinbe21, Proposition 2.3] for a
proof) that any finite product of regular elements is again regular. Thus, the


element xi − x j ∈ P is regular (since it is the product of the regular
1≤ i < j ≤ N
elements xi − x j for 1 ≤ i < j ≤ N). In view of (265), this rewrites as follows:
The element aρ ∈ P is regular. This proves Lemma 7.3.38.
We can now derive Theorem 7.3.32 from Lemma 7.3.34:
Proof of Theorem 7.3.32 using Lemma 7.3.34. Theorem 7.3.11 (b) (applied to ν instead
of λ) tells us that aν+ρ = aρ · sν . However, Lemma 7.3.34 says that

aν+ρ · sλ/µ = ∑_{T is a ν-Yamanouchi semistandard tableau of shape λ/µ} aν+cont T+ρ
           = ∑_{T is a ν-Yamanouchi semistandard tableau of shape λ/µ} aρ · sν+cont T
           = aρ · ∑_{T is a ν-Yamanouchi semistandard tableau of shape λ/µ} sν+cont T

(here, the second equality sign follows from Theorem 7.3.11 (b), applied to ν + cont T
instead of λ; this is allowed since ν + cont T is an N-partition, as we have shown in a
comment after Theorem 7.3.32).

In view of aν+ρ = aρ · sν , this equality rewrites as

aρ · sν · sλ/µ = aρ · ∑_{T is a ν-Yamanouchi semistandard tableau of shape λ/µ} sν+cont T .
150 This argument can in fact be used to show a more general statement: Namely, for any N-
tuple α = (α1 , α2 , . . . , α N ) ∈ N N satisfying α1 > α2 > · · · > α N , the alternant aα is regular
in P . (But we won’t need this statement.)
151 This is an automorphism, because its inverse is easily constructed (namely, it sends x to
i
xi + x j while leaving all other indeterminates unchanged).

Since the element aρ ∈ P is regular (by Lemma 7.3.38), we can cancel aρ from
this equality (i.e., we can apply Lemma 7.3.37 to L = P and a = aρ and u = sν · sλ/µ
and v = ∑_{T is a ν-Yamanouchi semistandard tableau of shape λ/µ} sν+cont T ). As a result, we obtain

sν · sλ/µ = ∑_{T is a ν-Yamanouchi semistandard tableau of shape λ/µ} sν+cont T .

Thus, Theorem 7.3.32 is proven.
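Theorem 7.3.32 itself can likewise be checked numerically in a small case. The following brute-force Python sketch (ad hoc helper names; N = 3, λ = (2, 1), µ = (1), ν = (1, 0, 0)) evaluates both sides of the theorem at the sample points (2, 3, 5):

```python
from itertools import product

def skew_ssyt(lam, mu, N):
    """All semistandard tableaux of skew shape lam/mu, entries in [N] (brute force)."""
    mu = tuple(mu) + (0,) * (len(lam) - len(mu))
    boxes = [(i, j) for i, l in enumerate(lam, start=1)
             for j in range(mu[i - 1] + 1, l + 1)]
    for values in product(range(1, N + 1), repeat=len(boxes)):
        T = dict(zip(boxes, values))
        if all(T[i, j] <= T[i, j + 1] for (i, j) in boxes if (i, j + 1) in T) \
           and all(T[i, j] < T[i + 1, j] for (i, j) in boxes if (i + 1, j) in T):
            yield T

def cont(T, N):
    c = [0] * N
    for e in T.values():
        c[e - 1] += 1
    return tuple(c)

def is_nu_yamanouchi(T, nu, N):
    """nu + cont(col>=j T) must be weakly decreasing for every j >= 1."""
    maxcol = max(j for (_, j) in T)
    for j in range(1, maxcol + 2):
        c = list(nu)
        for (_, col), e in T.items():
            if col >= j:
                c[e - 1] += 1
        if any(c[i] < c[i + 1] for i in range(N - 1)):
            return False
    return True

def value(T, xs):
    """The monomial x^{cont T}, evaluated at the points xs."""
    term = 1
    for e in T.values():
        term *= xs[e - 1]
    return term

N, xs = 3, (2, 3, 5)
lam, mu, nu = (2, 1), (1,), (1, 0, 0)

lhs = sum(value(T, xs) for T in skew_ssyt(nu, (), N)) \
    * sum(value(T, xs) for T in skew_ssyt(lam, mu, N))      # s_nu * s_{lam/mu}
rhs = 0
for T in skew_ssyt(lam, mu, N):
    if is_nu_yamanouchi(T, nu, N):
        shape = tuple(a + b for a, b in zip(nu, cont(T, N)))  # nu + cont T
        rhs += sum(value(S, xs) for S in skew_ssyt(shape, (), N))
print(lhs == rhs, lhs)  # True 1000
```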


Thus, it remains to prove Stembridge’s lemma (Lemma 7.3.34). Before we do so, let us spell
out two simple properties of alternants that will be used in the proof:

Lemma 7.3.39. Let α ∈ N^N .
(a) If the N-tuple α has two equal entries, then aα = 0.
(b) Let β ∈ N^N be an N-tuple obtained from α by swapping two entries.
Then, a β = − aα .

Proof of Lemma 7.3.39. This is an easy consequence of Definition 7.3.2 (b). See
Section B.10 for a detailed proof.
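Both parts of Lemma 7.3.39 can be observed numerically: swapping two entries of α swaps two columns of the defining matrix (flipping the sign of the determinant), while two equal entries give two equal columns (making it 0). A small Python check (ad hoc names):

```python
from itertools import permutations

def det(M):
    """Determinant via the Leibniz formula (fine for tiny matrices)."""
    n, total = len(M), 0
    for sigma in permutations(range(n)):
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if sigma[a] > sigma[b]:
                    sign = -sign
        term = sign
        for i in range(n):
            term *= M[i][sigma[i]]
        total += term
    return total

def alternant(beta, xs):
    """a_beta evaluated at the points xs."""
    return det([[x ** b for b in beta] for x in xs])

xs = (2, 3, 5)
alpha = (4, 2, 1)
beta = (2, 4, 1)   # alpha with its first two entries swapped
print(alternant(beta, xs) == -alternant(alpha, xs))  # True: part (b)
print(alternant((3, 3, 1), xs))                      # 0: part (a), two equal entries
```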
Proof of Lemma 7.3.34. For any β ∈ N N and any i ∈ [ N ], we let β i denote the
i-th entry of β. Thus, for example, ρk = N − k for each k ∈ [ N ] (since ρ =
( N − 1, N − 2, . . . , N − N )).
Since addition on N N is defined entrywise, we have ( β + γ)i = β i + γi for
any β, γ ∈ N N and i ∈ [ N ].
The group S N acts on P by K-algebra automorphisms. Hence, in particular,
we have
σ · ( f g) = (σ · f ) · (σ · g)
for any σ ∈ S N and any f , g ∈ P . In other words, we have

(σ · f ) · (σ · g) = σ · ( f g) (286)

for any σ ∈ S N and any f , g ∈ P .


The polynomial sλ/µ is symmetric (by Theorem 7.3.21). The definition of sλ/µ
yields

sλ/µ = ∑_{T ∈ SSYT(λ/µ)} x_T = ∑_{T ∈ SSYT(λ/µ)} x^{cont T}    (by (279)).    (287)

For any β ∈ N^N , we have

aβ = det ( (xi^{β_j}) 1≤i≤N, 1≤j≤N )    (by the definition of the alternant aβ)
   = det ( (xj^{β_i}) 1≤i≤N, 1≤j≤N )    (by Theorem 6.4.10, since the matrix
                                        (xj^{β_i}) 1≤i≤N, 1≤j≤N is the transpose
                                        of the matrix (xi^{β_j}) 1≤i≤N, 1≤j≤N )
   = ∑_{σ∈S_N} (−1)^σ xσ(1)^{β_1} xσ(2)^{β_2} · · · xσ(N)^{β_N}    (by the definition of a determinant)
   = ∑_{σ∈S_N} (−1)^σ σ · (x1^{β_1} x2^{β_2} · · · xN^{β_N})    (because the action of σ ∈ S_N
                                                               on P substitutes xσ(i) for each xi )
   = ∑_{σ∈S_N} (−1)^σ σ · x^β    (since x1^{β_1} x2^{β_2} · · · xN^{β_N} = x^β ).    (288)
Applying this to β = ν + ρ, we obtain

aν+ρ = ∑_{σ∈S_N} (−1)^σ σ · x^{ν+ρ} .

Multiplying both sides of this equality by sλ/µ , we find

aν+ρ · sλ/µ = ( ∑_{σ∈S_N} (−1)^σ σ · x^{ν+ρ} ) · sλ/µ
            = ∑_{σ∈S_N} (−1)^σ (σ · x^{ν+ρ}) · sλ/µ
            = ∑_{σ∈S_N} (−1)^σ (σ · x^{ν+ρ}) · (σ · sλ/µ)
              (since sλ/µ is symmetric, so that σ · sλ/µ = sλ/µ )
            = ∑_{σ∈S_N} (−1)^σ σ · (x^{ν+ρ} sλ/µ)    (by (286)).    (289)

However, multiplying both sides of (287) by x^{ν+ρ} , we find

x^{ν+ρ} sλ/µ = x^{ν+ρ} ∑_{T ∈ SSYT(λ/µ)} x^{cont T} = ∑_{T ∈ SSYT(λ/µ)} x^{ν+ρ} x^{cont T}
            = ∑_{T ∈ SSYT(λ/µ)} x^{ν+ρ+cont T}    (by (278))
            = ∑_{T ∈ SSYT(λ/µ)} x^{ν+cont T+ρ} .

Thus, any σ ∈ S_N satisfies

σ · (x^{ν+ρ} sλ/µ) = σ · ∑_{T ∈ SSYT(λ/µ)} x^{ν+cont T+ρ} = ∑_{T ∈ SSYT(λ/µ)} σ · x^{ν+cont T+ρ}

(since the action of σ on P is a K-algebra automorphism of P , and thus in
particular is K-linear). Therefore, (289) becomes

aν+ρ · sλ/µ = ∑_{σ∈S_N} (−1)^σ σ · (x^{ν+ρ} sλ/µ)
            = ∑_{σ∈S_N} (−1)^σ ∑_{T ∈ SSYT(λ/µ)} σ · x^{ν+cont T+ρ}
            = ∑_{T ∈ SSYT(λ/µ)} ∑_{σ∈S_N} (−1)^σ σ · x^{ν+cont T+ρ}
            = ∑_{T ∈ SSYT(λ/µ)} aν+cont T+ρ    (by (288), applied to β = ν + cont T + ρ).    (290)

This almost looks like the claim we want to prove, but the sum on the right
hand side is too big: It runs over all semistandard tableaux of shape λ/µ, while
we only want it to run over the ones that are ν-Yamanouchi. Thus, we will now
try to cancel the extraneous addends (i.e., the addends corresponding to the
T’s that are not ν-Yamanouchi).
Let us first make this a bit more precise. We define two sets

A := SSYT (λ/µ)    and
X := { T ∈ SSYT (λ/µ) | T is not ν-Yamanouchi} .

For each T ∈ A, we define an element sign T ∈ P by

sign T := aν+cont T+ρ .

Thus, (290) rewrites as

aν+ρ · sλ/µ = ∑_{T ∈ A} sign T .    (291)
We shall now construct a sign-reversing involution f : X → X .
Indeed, let T ∈ X . Thus, T is a semistandard tableau of shape λ/µ that
is not ν-Yamanouchi (by the definition of X ). Hence, there exists at least one
j ≥ 1 such that the N-tuple ν + cont col≥ j T is not an N-partition (by the
definition of “ν-Yamanouchi”). Any such j will be called a violator of T. Thus,
there exists at least one violator of T. In other words, the set of all violators of

T is nonempty. On the other hand, this set is finite152 . Hence, this set has a
maximum element. In other words, the largest violator of T exists. 153

Let j be the largest violator of T. Then, ν + cont (col≥ j T ) is not an N-
partition, but ν + cont (col≥ j+1 T ) is an N-partition (since j is the largest vi-
olator of T).
Define two N-tuples b ∈ N^N and c ∈ N^N by b := ν + cont (col≥ j T ) and c :=
ν + cont (col≥ j+1 T ). Thus, b is not an N-partition154 , but c is an N-partition155 .
ν + cont col≥ j+1 T . Thus, b is not an N-partition154 , but c is an N-partition155 .




Since b is not an N-partition, there exists some k ∈ [ N − 1] such that bk <


bk+1 . Such a k will be called a misstep of T. Thus, there exists a misstep of T.
Let k be the smallest misstep of T. Then, bk < bk+1 (since k is a misstep of T).
Furthermore, ck ≥ ck+1 (since c is an N-partition).
Example 7.3.40. For this example, let N = 7 and ν = (4, 2, 2, 0, 0, 0, 0) and
λ = (7, 7, 6, 5, 4, 0, 0) and µ = (6, 2, 2, 0, 0, 0, 0). Let T be the following semis-
tandard tableau of shape λ/µ:

T =             2
        1 1 2 2 3
        2 2 3 4
    1 3 3 5 6
    2 4 5 6  .

152 Proof.
Let j ≥ 1 be larger than each entry of λ. Then, the restricted tableau col≥ j T is empty
 
and thus satisfies cont (col≥ j T ) = 0. Hence, ν + cont (col≥ j T ) = ν + 0 = ν, which is an
N-partition by assumption. Thus, j is not a violator of T (by the definition of a “violator”).
Forget that we fixed j. We thus have shown that if j ≥ 1 is larger than each entry of λ,
then j is not a violator of T. Hence, if j ≥ 1 is sufficiently high, then j is not a violator of T.
Thus, the set of violators of T is bounded from above, and therefore finite (since it is a set
of positive integers).
153 For example, if ν = 0, then the tableaux

T2 = 1 1 , T3 = 1 2 , T5 = 1
2 3 2 2 1
3

from Example 7.3.31 have largest violators 2, 3, 1, respectively.


154 since ν + cont (col≥ j T ) is not an N-partition
155 since ν + cont (col≥ j+1 T ) is an N-partition

We have

ν + cont (col≥ j T ) = ν + 0 = ν = (4, 2, 2, 0, 0, 0, 0)    for each j ≥ 8;
ν + cont (col≥7 T ) = ν + (0, 1, 1, 0, 0, 0, 0) = (4, 3, 3, 0, 0, 0, 0) ;
ν + cont (col≥6 T ) = ν + (0, 2, 1, 1, 0, 0, 0) = (4, 4, 3, 1, 0, 0, 0) ;
ν + cont (col≥5 T ) = ν + (0, 3, 2, 1, 0, 1, 0) = (4, 5, 4, 1, 0, 1, 0) .

Thus, ν + cont (col≥5 T ) is not an N-partition. This shows that T is not ν-
Yamanouchi (so that T ∈ X ), and in fact 5 is the largest violator of T. Thus,
according to our above instructions, we set

j := 5    and
b := ν + cont (col≥ j T ) = ν + cont (col≥5 T ) = (4, 5, 4, 1, 0, 1, 0)    and
c := ν + cont (col≥ j+1 T ) = ν + cont (col≥6 T ) = (4, 4, 3, 1, 0, 0, 0) .

The missteps of T are the numbers k ∈ [ N − 1] such that bk < bk+1 ; these
numbers are 2 and 5 (since b2 < b3 and b5 < b6 ). Thus, the smallest misstep
of T is 2. Hence, we set k := 2.
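The bookkeeping in this example is easy to replay mechanically. The following Python sketch (ad hoc names; T is entered as a dict of boxes (row, column)) recomputes the largest violator j and the tuples b and c:

```python
nu, N = (4, 2, 2, 0, 0, 0, 0), 7
T = {(1, 7): 2,
     (2, 3): 1, (2, 4): 1, (2, 5): 2, (2, 6): 2, (2, 7): 3,
     (3, 3): 2, (3, 4): 2, (3, 5): 3, (3, 6): 4,
     (4, 1): 1, (4, 2): 3, (4, 3): 3, (4, 4): 5, (4, 5): 6,
     (5, 1): 2, (5, 2): 4, (5, 3): 5, (5, 4): 6}

def nu_plus_cont_col_ge(j):
    """The N-tuple nu + cont(col>=j T)."""
    c = list(nu)
    for (_, col), e in T.items():
        if col >= j:
            c[e - 1] += 1
    return tuple(c)

def is_partition(t):
    return all(t[i] >= t[i + 1] for i in range(len(t) - 1))

violators = [j for j in range(1, 9) if not is_partition(nu_plus_cont_col_ge(j))]
print(max(violators))          # 5, the largest violator
print(nu_plus_cont_col_ge(5))  # b = (4, 5, 4, 1, 0, 1, 0)
print(nu_plus_cont_col_ge(6))  # c = (4, 4, 3, 1, 0, 0, 0)
```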
Let us next make a few general observations about b and c.
The restrictions col≥ j T and col≥ j+1 T of T are “almost the same”: The only
difference between them is that the j-th column of T is included in col≥ j T but
not in col≥ j+1 T. Hence,

cont (col≥ j T ) = cont (col≥ j+1 T ) + cont (col_j T ) ,

where col_j T denotes the j-th column of T (or, to be more precise, the restriction
of T to the j-th column). Now,

b = ν + cont (col≥ j T ) = ν + cont (col≥ j+1 T ) + cont (col_j T ) = c + cont (col_j T ) .    (292)

Now, recall that the tableau T is semistandard; thus, its entries increase
strictly down each column. Hence, in particular, the entries of the j-th column
of T increase strictly down this column. Therefore, any given number
i ∈ [ N ] appears at most once in this column. In other words, any given number
i ∈ [ N ] appears at most once in col_j T. In other words, (cont (col_j T ))_i ≤ 1
for each i ∈ [ N ] (because (cont (col_j T ))_i counts how often i appears in col_j T).
Applying this inequality to i = k + 1, we obtain (cont (col_j T ))_{k+1} ≤ 1. Now,
from (292), we obtain

b_{k+1} = (c + cont (col_j T ))_{k+1} = c_{k+1} + (cont (col_j T ))_{k+1} ≤ c_{k+1} + 1,

so that b_k < b_{k+1} ≤ c_{k+1} + 1. Since b_k and c_{k+1} + 1 are integers, this entails
b_k ≤ (c_{k+1} + 1) − 1 = c_{k+1} . However, (292) also yields

b_k = (c + cont (col_j T ))_k = c_k + (cont (col_j T ))_k ≥ c_k

(since (cont (col_j T ))_k counts how often k appears in col_j T, and thus is ≥ 0),
so that c_k ≤ b_k ≤ c_{k+1} . Combining this with c_k ≥ c_{k+1} , we obtain c_k =
c_{k+1} . Now, combining b_k ≤ c_{k+1} = c_k with c_k ≤ b_k , we
obtain b_k = c_k . Comparing this with b_k = c_k + (cont (col_j T ))_k , we obtain
c_k + (cont (col_j T ))_k = c_k , so that (cont (col_j T ))_k = 0. In other words, the
number k appears 0 times in col_j T (since (cont (col_j T ))_k counts how often k
appears in col_j T). In other words, the number k does not appear in col_j T. In
other words, the number k does not appear in the j-th column of T.
Let us make one more simple observation, which we will not use until later:
We have b_k < b_{k+1} , so that b_k ≤ b_{k+1} − 1 (since b_k and b_{k+1} are integers). Thus,
b_k + 1 ≤ b_{k+1} . Combining this with b_{k+1} ≤ c_{k+1} + 1 = c_k + 1 ≤ b_k + 1 (since
c_{k+1} = c_k ≤ b_k ), we obtain

b_{k+1} = b_k + 1 .    (293)

Now, let col< j T be the restriction of T to columns 1, 2, . . . , j − 1 (that is, the


result of removing all but the first j − 1 columns from T). Formally speaking,
this means the restriction of the map T to the set {(u, v) ∈ Y (λ/µ) | v < j}.
This restriction col< j T is a semistandard skew tableau of a certain (skew) shape;
thus, we can apply the Bender–Knuth involution β k (from our above proof of
Theorem 7.3.21) to this tableau col< j T instead of T. Let T ∗ be the tableau
obtained from T by applying β k only to the columns 1, 2, . . . , j − 1 of T (that
is, replacing col< j T by β k (col< j T )), while leaving the columns j, j + 1, j + 2, . . .
unchanged. Thus, formally, T ∗ is the tableau of shape Y (λ/µ) defined by

col< j ( T ∗ ) = β k (col< j T )    and    (294)
col≥ j ( T ∗ ) = col≥ j T    (295)

(where col< j ( T ∗ ) is defined just as col< j T was defined, except that we are using
T ∗ instead of T).

Example 7.3.41. Let N, ν, λ, µ and T be as in Example 7.3.40. Let us now


compute T ∗ . As we know, j = 5 and k = 2. Thus, in order to obtain T ∗ ,
we need to apply the Bender–Knuth involution β k = β 2 only to the columns
1, 2, . . . , j − 1 of T (that is, only to the first j − 1 = 4 columns of T), while
leaving the columns 5, 6, 7, . . . unchanged. Here is how this looks:

T =             2        =⇒    T ∗ =             2
        1 1 2 2 3                      1 1 2 2 3
        2 2 3 4                        2 3 3 4
    1 3 3 5 6                      1 2 3 5 6
    2 4 5 6                        3 4 5 6

(where we have grayed out all boxes in columns 5, 6, 7, . . ., because the en-
tries in these boxes stay unchanged and are ignored by the Bender–Knuth
involution).
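The passage from T to T ∗ can be implemented directly. The following Python sketch (ad hoc names; one standard description of the Bender–Knuth involution via locked and free entries) reproduces the T ∗ of this example and confirms that applying β k twice restores col< j T:

```python
def bender_knuth(T, k):
    """Bender-Knuth involution beta_k on a semistandard tableau {(i, j): entry}:
    an entry k is 'locked' if k+1 sits directly below it, an entry k+1 is
    locked if k sits directly above it; in each row, the a free k's and b free
    (k+1)'s are replaced by b k's followed by a (k+1)'s."""
    T = dict(T)
    for i in sorted({r for (r, _) in T}):
        free = [j for (r, j) in sorted(T) if r == i
                and ((T[r, j] == k and T.get((r + 1, j)) != k + 1)
                     or (T[r, j] == k + 1 and T.get((r - 1, j)) != k))]
        b = sum(1 for j in free if T[i, j] == k + 1)
        for idx, j in enumerate(free):
            T[i, j] = k if idx < b else k + 1
    return T

# The tableau T of Example 7.3.40, as a dict of boxes (row, column):
T = {(1, 7): 2,
     (2, 3): 1, (2, 4): 1, (2, 5): 2, (2, 6): 2, (2, 7): 3,
     (3, 3): 2, (3, 4): 2, (3, 5): 3, (3, 6): 4,
     (4, 1): 1, (4, 2): 3, (4, 3): 3, (4, 4): 5, (4, 5): 6,
     (5, 1): 2, (5, 2): 4, (5, 3): 5, (5, 4): 6}

j, k = 5, 2
left = {box: e for box, e in T.items() if box[1] < j}    # columns 1, ..., j-1
right = {box: e for box, e in T.items() if box[1] >= j}  # columns j, j+1, ...
T_star = {**bender_knuth(left, k), **right}

print(sorted(box for box in T if T_star[box] != T[box]))  # [(3, 4), (4, 2), (5, 1)]
print(bender_knuth(bender_knuth(left, k), k) == left)     # True: beta_k is an involution
```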
We shall now show that T ∗ ∈ X . Indeed, let us first check that the tableau T ∗
is semistandard. We know that the tableau T is semistandard, so that its restric-
tions col< j T and col≥ j T are semistandard; thus, β k (col< j T ) is semistandard
as well (since the Bender–Knuth involution β k sends semistandard tableaux to
semistandard tableaux). Now, recall that the tableau T ∗ is obtained from T by
applying β k to columns 1, 2, . . . , j − 1 only; thus, T ∗ is obtained by glueing the
tableaux β k (col< j T ) and col≥ j T together (along a vertical line). Hence:

• Each column of T ∗ is either a column of β k (col< j T ) or a column of
col≥ j T (depending on whether it is one of columns 1, 2, . . . , j − 1 or one
of columns j, j + 1, j + 2, . . .). In either case, the entries of this column
increase strictly down this column (because the tableaux β k (col< j T ) and
col≥ j T are semistandard). Thus, we have shown that the entries of T ∗
increase strictly down each column.

• It is not hard to see that the entries of T ∗ increase weakly along each
row156 .
156 Proof. Let i ∈ [ N ]. We must prove that the entries of T ∗ increase weakly along the i-th row of
T∗. Assume the contrary. Thus, there exist two adjacent entries in the i-th row of T ∗ that are
out of order (in the sense that the one lying further left is larger than the one lying further
right). In other words, there exists some positive integer u such that (i, u) ∈ Y (λ/µ) and
(i, u + 1) ∈ Y (λ/µ) and T ∗ (i, u) > T ∗ (i, u + 1). Consider this u.
Recall that T ∗ is obtained by glueing the tableaux β k col< j T and col≥ j T together (along
a vertical line). Thus, the i-th row of T ∗ is obtained by glueing the i-th row of β k col< j T


together with the i-th row of col≥ j T. In other words, this row consists of two blocks, looking
as follows:

( i-th row of β k (col< j T ) | i-th row of col≥ j T ) .

Combining these two conclusions, we conclude that T ∗ is a semistandard
tableau. In other words, T ∗ ∈ SSYT (λ/µ).
Recall that ν + cont (col≥ j T ) is not an N-partition. In view of (295), this

We shall refer to these two blocks as the left block and the right block (so the left block is the
i-th row of β k col< j T , whereas the right block is the i-th row of col≥ j T). The boundary
between the two blocks falls between the ( j − 1)-st and j-th columns; the left block covers
columns 1, 2, . . . , j − 1, while the right block covers columns j, j + 1, j + 2, . . ..
The entries of the left block increase
 weakly from left to right (since this left block is
a row of the tableau β k col< j T , which is semistandard). Thus, if both boxes (i, u) and
(i, u + 1) belonged to the left block, then we would have T ∗ (i, u) ≤ T ∗ (i, u + 1), which
would contradict T ∗ (i, u) > T ∗ (i, u + 1). Hence, it is impossible for both boxes (i, u) and
(i, u + 1) to belong to the left block; thus, at least one of these boxes must belong to the
right block. Therefore, (i, u + 1) belongs to the right block (since the right block is further
right than the left block).
The entries of the right block also increase weakly from left to right (since this right
block is a row of the tableau col≥ j T, which is semistandard). Thus, if both boxes (i, u) and
(i, u + 1) belonged to the right block, then we would have T ∗ (i, u) ≤ T ∗ (i, u + 1), which
would contradict T ∗ (i, u) > T ∗ (i, u + 1). Hence, it is impossible for both boxes (i, u) and
(i, u + 1) to belong to the right block; thus, at least one of these boxes must belong to the left
block. Therefore, (i, u) belongs to the left block (since (i, u + 1) belongs to the right block).
Thus, the boxes (i, u) and (i, u + 1) straddle the boundary between the left block and the
right block. Since this boundary falls between the ( j − 1)-st and j-th columns, this entails
that the box (i, u) lies on the ( j − 1)-st column, while the box (i, u + 1) lies on the j-th
column. In other words, u = j − 1 and u + 1 = j. Thus, the inequality T ∗ (i, u) > T ∗ (i, u + 1)
can be rewritten as T ∗ (i, j − 1) > T ∗ (i, j). Moreover, from u = j − 1, we obtain j − 1 = u
and thus (i, j − 1) = (i, u) ∈ Y (λ/µ). Furthermore, from u + 1 = j, we obtain j = u + 1 and
thus (i, j) = (i, u + 1) ∈ Y (λ/µ).
The equality (295) shows that the entries of T ∗ in columns j, j + 1, j + 2, . . . equal the
corresponding entries of T. Thus, in particular, we have T ∗ (i, j) = T (i, j) (since the box
(i, j) lies in column j). Thus, T ∗ (i, j − 1) > T ∗ (i, j) = T (i, j).
On the other hand, the tableau T is semistandard, so that its entries increase weakly along
each row. Hence, T (i, j − 1) ≤ T (i, j). Therefore, T (i, j) ≥ T (i, j − 1), so that T ∗ (i, j − 1) >
T (i, j) ≥ T (i, j − 1).
We know that the number k does not appear in the j-th column of T; thus, T (i, j) ̸= k
(since T (i, j) is an entry in the j-th column of T).
We recall a simple property of the Bender-Knuth involution β k (which follows directly
from the construction of β k ): When we apply β k to a semistandard tableau,
– some k’s get replaced by (k + 1)’s,
– some (k + 1)’s get replaced by k’s, and
– all other entries remain unchanged.
Thus, in particular, when we apply β k to a semistandard tableau, the only entries that
can get replaced by larger entries are k’s, and in that case they can only be replaced by
(k + 1)’s. In other words, if some entry of a semistandard tableau gets replaced by a larger
entry when we apply β k to the tableau, then this entry must have been k before applying
β k , and must get replaced by k + 1 when β k is applied.
Since T ∗ is obtained from T by applying β k to columns 1, 2, . . . , j − 1 (while all other
columns remain unchanged), we thus conclude that if some entry of T gets replaced by a
larger entry when we pass from T to T ∗ , then this entry must have been k in T, and must
get replaced by k + 1 in T ∗ . Let us restate this in a more formal language: If ( p, q) ∈ Y (λ/µ)

rewrites as follows: ν + cont (col≥ j ( T ∗ )) is not an N-partition. Hence, the
tableau T ∗ is not ν-Yamanouchi. Thus, T ∗ ∈ X (by the definition of X , since
T ∗ ∈ SSYT (λ/µ)).
Forget that we fixed T. Thus, for each tableau T ∈ X , we have constructed a
tableau T ∗ ∈ X . Let f : X → X be the map that sends each T ∈ X to T ∗ . We
shall now show that f is a sign-reversing involution. First, we shall show the
following:

Observation 1: The map f is an involution.

[Proof of Observation 1: We must show that f ◦ f = id. In other words, we


must prove that f ( f ( T )) = T for each T ∈ X .
Let T ∈ X . Then, the definition of f yields f ( T ) = T ∗ and f ( T ∗ ) = ( T ∗ )∗ .
Recall how T ∗ is constructed from T:

• We let j be the largest violator of T. This is the largest j ≥ 1 such that
ν + cont (col≥ j T ) is not an N-partition.

• We let k be the smallest misstep of T. This is the smallest k ∈ [ N − 1]
such that (ν + cont (col≥ j T ))_k < (ν + cont (col≥ j T ))_{k+1} . 157

• We apply the Bender–Knuth involution β k only to the columns 1, 2, . . . , j −
1 of T, while leaving the columns j, j + 1, j + 2, . . . unchanged. The result
is T ∗ .

The construction of ( T ∗ )∗ from T ∗ proceeds similarly:

• We let j′ be the largest violator of T ∗ .

• We let k′ be the smallest misstep of T ∗ .

• We apply the Bender–Knuth involution β k′ only to the columns 1, 2, . . . , j′ −
1 of T ∗ , while leaving the columns j′ , j′ + 1, j′ + 2, . . . unchanged. The result
is ( T ∗ )∗ .
satisfies T ∗ ( p, q) > T ( p, q), then

T ( p, q) = k and T ∗ ( p, q) = k + 1.

We can apply this to ( p, q) = (i, j − 1) (since T ∗ (i, j − 1) > T (i, j − 1)), and thus conclude
that T (i, j − 1) = k and T ∗ (i, j − 1) = k + 1.
Now, from T (i, j − 1) = k, we obtain k = T (i, j − 1) ≤ T (i, j). On the other hand,
T ∗ (i, j − 1) > T (i, j), so that T (i, j) < T ∗ (i, j − 1) = k + 1. Since T (i, j) and k + 1 are
integers, this entails T (i, j) ≤ (k + 1) − 1 = k. Combining this with k ≤ T (i, j), we obtain
T (i, j) = k. This contradicts T (i, j) ̸= k. This contradiction shows that our assumption was
wrong. Hence, we have shown that the entries of T ∗ increase weakly along the i-th row of
T ∗ . Qed.
157 Indeed, a misstep of T was defined to be a k ∈ [ N − 1] such that b_k < b_{k+1} , where
b = ν + cont (col≥ j T ). In other words, a misstep of T means a k ∈ [ N − 1] such that
(ν + cont (col≥ j T ))_k < (ν + cont (col≥ j T ))_{k+1} .

We claim that this construction undoes the previous construction and recov-
ers T (so that ( T ∗ )∗ = T). To see this, we argue as follows:

• We know that col≥ j ( T ∗ ) = col≥ j T, so that j is the largest violator of T ∗


(since j is the largest violator of T) 158 . Therefore, j′ = j.

• Knowing that j′ = j and col≥ j ( T ∗ ) = col≥ j T, we now conclude that k is


the smallest misstep of T ∗ (since k is the smallest misstep of T). Therefore,
k′ = k.

• Knowing that j′ = j and k′ = k, we conclude that ( T ∗ )∗ is obtained


from T ∗ by the exact same operation that we used to obtain T ∗ from T:
namely, by applying the Bender–Knuth involution β k only to the columns
1, 2, . . . , j − 1 (while leaving the columns j, j + 1, j + 2, . . . unchanged).
However, this operation undoes itself when applied a second time, be-
cause the Bender–Knuth involution β k is an involution159 . Thus, we con-
clude that ( T ∗ )∗ = T.
 

Thus, f  f ( T ) = f ( T ∗ ) = ( T ∗ )∗ = T.
| {z }
=T∗
Forget that we fixed T. We thus have proved that f ( f ( T )) = T for each
T ∈ X . As explained, this completes the proof of Observation 1.]
Next, we shall show two observations about the effect of the map f on the
sign of a tableau:

Observation 2: We have sign ( f ( T )) = − sign T for all T ∈ X .

[Proof of Observation 2: Let T ∈ X . We must show that sign ( f ( T )) = − sign T.

158 Here is the argument in some more detail:
We have
col≥ j ( T ∗ ) = col≥ j T, (296)
and therefore we also have

col≥ p ( T ∗ ) = col≥ p T for any integer p > j (297)

(since the tableau col≥ p T is obtained by removing some columns from col≥ j T, whereas the
tableau col≥ p ( T ∗ ) is obtained in the same fashion from col≥ j ( T ∗ )).

We know that j is the largest violator of T. In other words, the N-tuple ν + cont (col≥ j T )
is not an N-partition, but ν + cont (col≥ p T ) is an N-partition for any integer p > j. In view
of (296) and (297), we can rewrite this as follows: The N-tuple ν + cont (col≥ j ( T ∗ )) is not
an N-partition, but ν + cont (col≥ p ( T ∗ )) is an N-partition for any integer p > j. In other
words, j is the largest violator of T ∗ .


159 This has been shown during our proof of Theorem 7.3.21.

Define two N-tuples α ∈ N^N and γ ∈ N^N by α := ν + cont T + ρ and
γ := ν + cont ( T ∗ ) + ρ. Then, f ( T ) = T ∗ (by the definition of f ), and thus

sign ( f ( T )) = sign ( T ∗ ) = a_{ν + cont( T ∗ ) + ρ}   (by the definition of sign ( T ∗ ))
             = a_γ   (since ν + cont ( T ∗ ) + ρ = γ) .

Also, the definition of sign T yields

sign T = a_{ν + cont T + ρ} = a_α   (since ν + cont T + ρ = α) .

Now, we are going to show that the N-tuple γ is obtained from α by swapping
two entries. Once this is shown, we will easily conclude sign ( f ( T )) = − sign T
by applying Lemma 7.3.39 (b).
We recall the notations from the construction of T ∗ : Let j be the largest violator
of T. Let k be the smallest misstep of T. Define an N-tuple b ∈ N^N
by b := ν + cont (col≥ j T ). Then, b_k + 1 = b_{k+1} (as we have proved in (293)).
However,

b_k = (ν + cont (col≥ j T ))_k   (since b = ν + cont (col≥ j T ))
    = ν_k + (cont (col≥ j T ))_k
    = ν_k + (# of k’s in col≥ j T )   (by Definition 7.3.26).   (298)

The same argument (applied to k + 1 instead of k) yields

b_{k+1} = ν_{k+1} + (# of (k + 1)’s in col≥ j T ).   (299)

We shall now show that γ_k = α_{k+1}. Indeed, the vertical line that separates
the ( j − 1)-st and j-th columns cuts the tableau T into its two parts col< j T and
col≥ j T. Thus, every i ∈ [ N ] satisfies

(# of i’s in T ) = (# of i’s in col< j T ) + (# of i’s in col≥ j T ).   (300)

The same argument (applied to T ∗ instead of T) shows that every i ∈ [ N ]
satisfies

(# of i’s in T ∗ ) = (# of i’s in col< j ( T ∗ )) + (# of i’s in col≥ j ( T ∗ )).   (301)
Now, the definition of cont ( T ∗ ) yields

(cont ( T ∗ ))_k = (# of k’s in T ∗ )
  = (# of k’s in col< j ( T ∗ )) + (# of k’s in col≥ j ( T ∗ ))   (by (301), applied to i = k)
  = (# of k’s in β_k (col< j T )) + (# of k’s in col≥ j T )   (by (294) and (295))
  = (# of (k + 1)’s in col< j T ) + (b_k − ν_k)   (by (268), applied to col< j T instead of T, and by (298))
  = (# of (k + 1)’s in col< j T ) + b_{k+1} − 1 − ν_k   (since b_k + 1 = b_{k+1}, so that b_k = b_{k+1} − 1).

However, γ = ν + cont ( T ∗ ) + ρ, so that

γ_k = (ν + cont ( T ∗ ) + ρ)_k = ν_k + (cont ( T ∗ ))_k + ρ_k
    = ν_k + ((# of (k + 1)’s in col< j T ) + b_{k+1} − 1 − ν_k) + ( N − k)   (since ρ_k = N − k by the definition of ρ)
    = (# of (k + 1)’s in col< j T ) + b_{k+1} − 1 + N − k.   (302)

On the other hand, the definition of cont T yields

(cont T )_{k+1} = (# of (k + 1)’s in T )
  = (# of (k + 1)’s in col< j T ) + (# of (k + 1)’s in col≥ j T )   (by (300), applied to i = k + 1)
  = (# of (k + 1)’s in col< j T ) + (b_{k+1} − ν_{k+1})   (by (299)).

However, α = ν + cont T + ρ, so that

α_{k+1} = (ν + cont T + ρ)_{k+1} = ν_{k+1} + (cont T )_{k+1} + ρ_{k+1}
        = ν_{k+1} + ((# of (k + 1)’s in col< j T ) + b_{k+1} − ν_{k+1}) + ( N − (k + 1))   (by the definition of ρ)
        = (# of (k + 1)’s in col< j T ) + b_{k+1} − 1 + N − k.
Comparing this with (302), we obtain

γ_k = α_{k+1}.   (303)

A similar argument (using (269) instead of (268)) can be used to show that

γ_{k+1} = α_k.   (304)

(See Section B.10 for the details of this argument.)


A further argument of this form (using (270) instead of (268)) can be used to
show that

γi = α i for each i ∈ [ N ] satisfying i ̸= k and i ̸= k + 1. (305)

(See Section B.10 for the details of this argument.)


Combining the three equalities (303), (304) and (305), we see that the N-
tuple γ is obtained from α by swapping two entries (namely, the k-th and the
(k + 1)-st entry). Thus, Lemma 7.3.39 (b) (applied to γ instead of β) yields that
aγ = − aα . This rewrites as sign ( f ( T )) = − sign T (since sign ( f ( T )) = aγ and
sign T = aα ). This proves Observation 2.]

Observation 3: We have sign T = 0 for all T ∈ X satisfying f ( T ) = T.

[Proof of Observation 3: Let T ∈ X be such that f ( T ) = T.


We shall use all the notations that we have introduced in the proof of Ob-
servation 2 above. In particular, we define two N-tuples α ∈ N N and γ ∈ N N
by α := ν + cont T + ρ and γ := ν + cont ( T ∗ ) + ρ, and we let k be the smallest
misstep of T.
The definition of f yields f ( T ) = T ∗ , so that T ∗ = f ( T ) = T. Thus,

γ = ν + cont ( T ∗ ) + ρ = ν + cont T + ρ = α   (since cont ( T ∗ ) = cont T ).

Hence, γk = αk . However, the equality (303) (which we have shown in the proof
of Observation 2) yields γk = αk+1 . Comparing these two equalities, we obtain
αk = αk+1 . Therefore, the N-tuple α ∈ N N has two equal entries (namely, its
k-th and its (k + 1)-st entry). Thus, Lemma 7.3.39 (a) yields aα = 0.
However, the definition of sign T yields

sign T = a_{ν + cont T + ρ} = a_α   (since ν + cont T + ρ = α)
       = 0.

This proves Observation 3.]


Now, let us combine what we have shown. We know that the map f : X → X
is an involution (by Observation 1). Moreover, we have

sign ( f ( I )) = − sign I for all I ∈ X


(by Observation 2, applied to T = I). Furthermore,

sign I = 0   for all I ∈ X satisfying f ( I ) = I

(by Observation 3, applied to T = I). Therefore, Lemma 6.1.4 (applied to the
additive abelian group P ) yields

∑_{I ∈ A} sign I = ∑_{I ∈ A \ X} sign I.

Renaming the summation index I as T on both sides of this equality, we obtain

∑_{T ∈ A} sign T = ∑_{T ∈ A \ X} sign T.

Hence, (291) rewrites as

a_{ν+ρ} · s_{λ/µ} = ∑_{T ∈ A \ X} sign T = ∑_{T ∈ A \ X} a_{ν + cont T + ρ}   (by the definition of sign T )
                 = ∑_{T is a ν-Yamanouchi semistandard tableau of shape λ/µ} a_{ν + cont T + ρ}

(since A \ X is the set of all ν-Yamanouchi semistandard tableaux of shape λ/µ
160 ). This proves Lemma 7.3.34.

Remark 7.3.42. All the above properties of skew Schur polynomials sλ/µ can
be generalized further by taking an arbitrary M ∈ N and allowing λ and
µ to be M-partitions (rather than N-partitions). Thus, the Young diagram
Y (λ/µ) is now defined to be the set {(i, j) | i ∈ [ M] and j ∈ [λi ] \ [µi ]}; in
particular, it may have more than N rows (if M > N). In this generalized
setup, the tableaux of shape λ/µ are defined just as they were in Definition
7.3.15 (in particular, they may have more than N rows, but their entries still
have to be elements of [ N ]); the same applies to the notions of semistandard
tableaux and the skew Schur polynomials (which are still polynomials in
P = K [ x1 , x2 , . . . , x N ]). All of our above results (particularly, Theorem 7.3.21
and Theorem 7.3.32) still hold in this generalized setup (note that the ν in
Theorem 7.3.32 must still be an N-partition, not an M-partition), and the
proofs given above still work.

160 because

A = SSYT (λ/µ) and X = { T ∈ SSYT (λ/µ) | T is not ν-Yamanouchi}



7.3.6. The Pieri rules


Having proved the Littlewood–Richardson rule, let us discuss a few more of its
consequences. As we know from Example 7.3.10, the complete homogeneous
symmetric polynomials hn and the elementary symmetric polynomials en (for
n ∈ {0, 1, . . . , N }) are instances of Schur polynomials. Thus, by appropriately
specializing Theorem 7.3.32, we can obtain rules for expressing products of the
form hn sµ and en sµ as sums of Schur polynomials. These rules are known as
the Pieri rules. To formulate them, we need some more notations:

Definition 7.3.43. Let λ and µ be two N-partitions.


(a) We write λ/µ for the pair (µ, λ). Such a pair is called a skew partition.
(b) We say that λ/µ is a horizontal strip if we have µ ⊆ λ and the Young
diagram Y (λ/µ) has no two boxes lying in the same column.
(c) We say that λ/µ is a vertical strip if we have µ ⊆ λ and the Young
diagram Y (λ/µ) has no two boxes lying in the same row.
Now, let n ∈ N.
(d) We say that λ/µ is a horizontal n-strip if λ/µ is a horizontal strip and
satisfies |Y (λ/µ)| = n.
(e) We say that λ/µ is a vertical n-strip if λ/µ is a vertical strip and satisfies
|Y (λ/µ)| = n.

Example 7.3.44. Let N = 4.


(a) If λ = (8, 7, 4, 3) and µ = (7, 4, 4, 1), then we have µ ⊆ λ, and the Young
diagram Y (λ/µ) looks as follows:

Y (λ/µ) = [Young diagram picture not rendered in this text version].

From this picture, it is clear that this skew partition λ/µ is a horizontal strip
(and, in fact, a horizontal 6-strip, since |Y (λ/µ)| = 6), but not a vertical strip
(since, e.g., there are 3 boxes in the second row of Y (λ/µ)).
(b) If λ = (3, 3, 2, 1) and µ = (2, 2, 1, 0), then we have µ ⊆ λ, and the Young
diagram Y (λ/µ) looks as follows:

Y (λ/µ) = [Young diagram picture not rendered in this text version].

From this picture, it is clear that this skew partition λ/µ is a vertical strip
(and, in fact, a vertical 4-strip, since |Y (λ/µ)| = 4), but not a horizontal strip
(since there are 2 boxes in the third column of Y (λ/µ)).
(c) If λ = (4, 3, 1, 1) and µ = (3, 2, 1, 0), then we have µ ⊆ λ, and the Young
diagram Y (λ/µ) looks as follows:

Y (λ/µ) = [Young diagram picture not rendered in this text version].

From this picture, it is clear that this skew partition λ/µ is both a horizontal
strip (and, in fact, a horizontal 3-strip) and a vertical strip (and, in fact, a
vertical 3-strip).
(d) If λ = (3, 3, 2, 1) and µ = (1, 1, 1, 1), then we have µ ⊆ λ, and the Young
diagram Y (λ/µ) looks as follows:

Y (λ/µ) = [Young diagram picture not rendered in this text version].

From this picture, it is clear that this skew partition λ/µ is neither a horizon-
tal strip nor a vertical strip.
Horizontal and vertical strips can also be characterized in terms of the entries
of the partitions:
Proposition 7.3.45. Let λ = (λ1 , λ2 , . . . , λ N ) and µ = (µ1 , µ2 , . . . , µ N ) be two
N-partitions.
(a) The skew partition λ/µ is a horizontal strip if and only if we have

λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λ N ≥ µ N .

(b) The skew partition λ/µ is a vertical strip if and only if we have

µi ≤ λi ≤ µi + 1 for each i ∈ [ N ] .

Proof. See Exercise A.6.3.8.
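The criteria of Proposition 7.3.45 are easy to test by machine. The following sketch (not part of the original notes; the helper names are our own) encodes both conditions and checks them against the four skew shapes of Example 7.3.44.

```python
def is_horizontal_strip(lam, mu):
    # Proposition 7.3.45 (a): lam_1 >= mu_1 >= lam_2 >= mu_2 >= ... >= lam_N >= mu_N
    N = len(lam)
    return (all(lam[i] >= mu[i] for i in range(N))
            and all(mu[i] >= lam[i + 1] for i in range(N - 1)))

def is_vertical_strip(lam, mu):
    # Proposition 7.3.45 (b): mu_i <= lam_i <= mu_i + 1 for each i
    return all(m <= l <= m + 1 for l, m in zip(lam, mu))

# The four skew shapes from Example 7.3.44:
assert is_horizontal_strip((8, 7, 4, 3), (7, 4, 4, 1))       # (a): horizontal ...
assert not is_vertical_strip((8, 7, 4, 3), (7, 4, 4, 1))     # ... but not vertical
assert is_vertical_strip((3, 3, 2, 1), (2, 2, 1, 0))         # (b): vertical ...
assert not is_horizontal_strip((3, 3, 2, 1), (2, 2, 1, 0))   # ... but not horizontal
assert is_horizontal_strip((4, 3, 1, 1), (3, 2, 1, 0))       # (c): both
assert is_vertical_strip((4, 3, 1, 1), (3, 2, 1, 0))
assert not is_horizontal_strip((3, 3, 2, 1), (1, 1, 1, 1))   # (d): neither
assert not is_vertical_strip((3, 3, 2, 1), (1, 1, 1, 1))
```

Note that condition (a) packages the containment µ ⊆ λ and the "no two boxes in a column" requirement into a single interlacing chain.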


We can now state the Pieri rules:
Theorem 7.3.46 (Pieri rules). Let n ∈ N. Let µ be an N-partition. Then:
(a) We have

h_n s_µ = ∑_{λ is an N-partition; λ/µ is a horizontal n-strip} s_λ .

(b) We have

e_n s_µ = ∑_{λ is an N-partition; λ/µ is a vertical n-strip} s_λ .

Example 7.3.47. Let N = 4 and µ = (2, 1, 1, 0). Then:


(a) Theorem 7.3.46 (a) (applied to n = 2) yields
h_2 s_µ = ∑_{λ is an N-partition; λ/µ is a horizontal 2-strip} s_λ = s_{(2,2,1,1)} + s_{(3,1,1,1)} + s_{(3,2,1,0)} + s_{(4,1,1,0)} ,

since the N-partitions λ for which λ/µ is a horizontal 2-strip are precisely
the four N-partitions (2, 2, 1, 1), (3, 1, 1, 1), (3, 2, 1, 0) and (4, 1, 1, 0). Here
are the Young diagrams Y (λ) of these four N-partitions λ (with the Y (µ)
subdiagram colored red each time):

[four Young diagram pictures not rendered in this text version].

(b) Theorem 7.3.46 (b) (applied to n = 2) yields

e_2 s_µ = ∑_{λ is an N-partition; λ/µ is a vertical 2-strip} s_λ = s_{(2,2,1,1)} + s_{(2,2,2,0)} + s_{(3,1,1,1)} + s_{(3,2,1,0)} ,

since the N-partitions λ for which λ/µ is a vertical 2-strip are precisely the
four N-partitions (2, 2, 1, 1), (2, 2, 2, 0), (3, 1, 1, 1) and (3, 2, 1, 0). Here are the
Young diagrams Y (λ) of these four N-partitions λ (with the Y (µ) subdia-
gram colored red each time):

[four Young diagram pictures not rendered in this text version].

Proof of Theorem 7.3.46. See Exercise A.6.3.9.
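The enumeration in Example 7.3.47 (a) can be reproduced mechanically. Here is a small sketch (not part of the original notes; the helper name is ours) that lists all N-partitions λ with λ/µ a horizontal n-strip, using the interlacing criterion of Proposition 7.3.45 (a): λ_1 ranges over µ_1, . . . , µ_1 + n, each later λ_i over µ_i, . . . , µ_{i−1}, and the total size of λ/µ must be n.

```python
from itertools import product

def horizontal_n_strips(mu, n):
    """All N-partitions lam such that lam/mu is a horizontal n-strip."""
    N = len(mu)
    ranges = [range(mu[0], mu[0] + n + 1)] + \
             [range(mu[i], mu[i - 1] + 1) for i in range(1, N)]
    # the interlacing built into `ranges` forces lam to be weakly decreasing
    return sorted(lam for lam in product(*ranges) if sum(lam) - sum(mu) == n)

result = horizontal_n_strips((2, 1, 1, 0), 2)
print(result)  # [(2, 2, 1, 1), (3, 1, 1, 1), (3, 2, 1, 0), (4, 1, 1, 0)]
```

These are exactly the four partitions appearing in Example 7.3.47 (a); the analogous routine for vertical strips (λ_i ∈ {µ_i, µ_i + 1}) recovers part (b).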

7.3.7. The Jacobi–Trudi identities


The Jacobi–Trudi identities are determinantal formulas expressing a skew Schur
polynomial sλ/µ in terms of complete homogeneous symmetric polynomials hn
or elementary symmetric polynomials en . We begin with the former:

Theorem 7.3.48 (First Jacobi–Trudi formula). Let M ∈ N. Let λ =


(λ1 , λ2 , . . . , λ M ) and µ = (µ1 , µ2 , . . . , µ M ) be two M-partitions (i.e., weakly
decreasing M-tuples of nonnegative integers). Then,
  
s_{λ/µ} = det ( ( h_{λ_i − µ_j − i + j} )_{1≤i≤M, 1≤j≤M} ).

(Here, sλ/µ is defined as in Definition 7.3.19, where the semistandard


tableaux of shape λ/µ are defined as certain fillings of Y (λ/µ) :=
{(i, j) | i ∈ [ M] and j ∈ [λi ] \ [µi ]}, but their entries are still supposed to be
elements of [ N ]. Compare with Remark 7.3.42.)

Example 7.3.49. If M = 3, then Theorem 7.3.48 says that

s_{λ/µ} = det ( ( h_{λ_i − µ_j − i + j} )_{1≤i≤3, 1≤j≤3} ) = det
  [ h_{λ_1 − µ_1}      h_{λ_1 − µ_2 + 1}  h_{λ_1 − µ_3 + 2} ]
  [ h_{λ_2 − µ_1 − 1}  h_{λ_2 − µ_2}      h_{λ_2 − µ_3 + 1} ]
  [ h_{λ_3 − µ_1 − 2}  h_{λ_3 − µ_2 − 1}  h_{λ_3 − µ_3}     ] .

For instance, if λ = (4, 2, 1) and µ = (1, 0, 0), then

s_{λ/µ} = det
  [ h_{4−1}    h_{4−0+1}  h_{4−0+2} ]
  [ h_{2−1−1}  h_{2−0}    h_{2−0+1} ]
  [ h_{1−1−2}  h_{1−0−1}  h_{1−0}   ]
= det
  [ h_3     h_5  h_6 ]
  [ h_0     h_2  h_3 ]
  [ h_{−2}  h_0  h_1 ]
= det
  [ h_3  h_5  h_6 ]
  [ 1    h_2  h_3 ]
  [ 0    1    h_1 ]   (since h_0 = 1 and h_{−2} = 0) .
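The worked example above can be checked by brute force. The following sketch (not part of the original notes; all helper names are ours) evaluates both sides of the first Jacobi–Trudi formula for λ = (4, 2, 1), µ = (1, 0, 0) and N = 3 variables at an arbitrary integer point: the left side by enumerating all semistandard tableaux of shape λ/µ directly, the right side as the 3 × 3 determinant of complete homogeneous symmetric polynomials.

```python
from itertools import combinations_with_replacement, product
from math import prod

# Setup from the example: N = 3 variables, lambda = (4, 2, 1), mu = (1, 0, 0).
N = 3
lam, mu = (4, 2, 1), (1, 0, 0)
x = [2, 3, 5]  # an arbitrary integer point (x1, x2, x3)

def h(n):
    """h_n(x1, ..., xN): sum of all degree-n monomials; h_0 = 1, h_n = 0 for n < 0."""
    if n < 0:
        return 0
    return sum(prod(c) for c in combinations_with_replacement(x, n))

# Left hand side: s_{lam/mu}(x) by enumerating all fillings of Y(lam/mu) with
# entries in [N], keeping those with weakly increasing rows, strictly increasing columns.
cells = [(i, j) for i in range(3) for j in range(mu[i], lam[i])]
s = 0
for filling in product(range(1, N + 1), repeat=len(cells)):
    T = dict(zip(cells, filling))
    rows_ok = all(T[i, j] <= T[i, j + 1] for (i, j) in cells if (i, j + 1) in T)
    cols_ok = all(T[i, j] < T[i + 1, j] for (i, j) in cells if (i + 1, j) in T)
    if rows_ok and cols_ok:
        s += prod(x[v - 1] for v in T.values())

# Right hand side: det((h_{lam_i - mu_j - i + j})) with 0-indexed i, j
# (the shift -i + j is unchanged by 0-indexing).
M = [[h(lam[i] - mu[j] - i + j) for j in range(3)] for i in range(3)]
det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
     - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
     + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

assert s == det  # Theorem 7.3.48 at this sample point
```

Note that `M[2][0]` is h_{−2} = 0 and `M[1][0]` is h_0 = 1, matching the matrix displayed above.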

The proof of Theorem 7.3.48 is not too hard using what we have learnt about
lattice paths in Section 6.5. Here is an outline (with some details left to exer-
cises):
Proof of Theorem 7.3.48 (sketched). Let us follow Convention 6.5.1, Definition 6.5.2
and Definition 6.5.5. We will work with the digraph Z2 . For each arc a of the
digraph Z2 , we define an element w ( a) ∈ P (called the weight of a) as follows:

• If a is an east-step (i, j) → (i + 1, j) with j ∈ [ N ], then we set w ( a) := x j .

• If a is any other arc, then we set w ( a) := 1.

For each path p of Z2 , define the weight w ( p) of p by

w ( p) := ∏ w ( a) .
a is an arc of p

Now, it is not hard to see the following (compare with Proposition 6.5.4):

Observation 1: Let a and c be two integers. Then,

∑_{p is a path from ( a, 1) to (c, N )} w ( p ) = h_{c−a} .

See Exercise A.6.3.10 (a) for a proof of Observation 1.


Next, for each path tuple p = ( p1 , p2 , . . . , pk ), let us define the weight w (p)
of p by
w ( p ) : = w ( p1 ) w ( p2 ) · · · w ( p k ) .
Set k := M (for the sake of convenience). Thus, λ = (λ1 , λ2 , . . . , λk ) and
µ = ( µ1 , µ2 , . . . , µ k ).
Define two k-vertices A = ( A1 , A2 , . . . , Ak ) and B = ( B1 , B2 , . . . , Bk ) by setting

Ai := (µi − i, 1) and Bi := (λi − i, N ) for each i ∈ [k ] .


It is easy to see that the conditions (253), (254), (255) and (256) of Corollary
6.5.15 are satisfied. Hence, (257) yields

det ( ( ∑_{p : A_i → B_j} w ( p) )_{1≤i≤k, 1≤j≤k} ) = ∑_{p is a nipat from A to B} w (p) ,   (306)

where “p : A_i → B_j ” means “p is a path from A_i to B_j ”. The left hand side of
this equality is

det ( ( ∑_{p : A_i → B_j} w ( p) )_{1≤i≤k, 1≤j≤k} )
  = det ( ( h_{(λ_j − j) − (µ_i − i)} )_{1≤i≤k, 1≤j≤k} )   (by Observation 1)
  = det ( ( h_{(λ_i − i) − (µ_j − j)} )_{1≤i≤k, 1≤j≤k} )   (by Theorem 6.4.10)
  = det ( ( h_{λ_i − µ_j − i + j} )_{1≤i≤k, 1≤j≤k} )
  = det ( ( h_{λ_i − µ_j − i + j} )_{1≤i≤M, 1≤j≤M} )   (307)

(since k = M). We shall now analyze the right hand side of (306). To that
purpose, we need to understand the nipats from A to B.
We define the height of an east-step (i, j) → (i + 1, j) to be the number j. We
define the height sequence of a path p to be the sequence of the heights of the
east-steps of p (going from the starting point to the ending point of p). For
example, the path shown in Example 6.5.3 has height sequence (1, 1, 1, 2, 3). It
is clear that the height sequence of a path is always weakly increasing.
If p = ( p1 , p2 , . . . , pk ) is a nipat from A to B, we let T (p) be the tableau of
shape Y (λ/µ) such that the entries in the i-th row of T (p) (for each i ∈ [k]) are
the entries of the height sequence of pi .
Example 7.3.50. Let N = 6 and M = 3 (so that k = M = 3) and λ = (4, 2, 1)
and µ = (1, 0, 0). Here is a nipat p from A to B, and the corresponding
tableau T (p): [picture of the nipat p = ( p_1 , p_2 , p_3 ) not rendered in this text
version; the tableau T (p) has first row 1, 2, 5, second row 2, 5, and third row 4].

We could have defined the tableau T (p) just as easily for any path tuple p
from A to B (not just for a nipat); however, the case of a nipat is particularly
useful, because it turns out that the tableau T (p) is semistandard if and only if
p is a nipat. Moreover, the following stronger statement holds:
Observation 2: There is a bijection
{nipats from A to B} → SSYT (λ/µ) ,
p 7→ T (p) .

See Exercise A.6.3.10 (b) for a proof of Observation 2.


It is easy to see that w (p) = x^{T (p)} for any nipat p from A to B. Hence,

∑_{p is a nipat from A to B} w (p) = ∑_{p is a nipat from A to B} x^{T (p)} = ∑_{T ∈ SSYT(λ/µ)} x^T

(here, we have substituted T for T (p) in the sum, since the map in
Observation 2 is a bijection)

= s_{λ/µ}   (since s_{λ/µ} is defined to be ∑_{T ∈ SSYT(λ/µ)} x^T ).
Thus,

s_{λ/µ} = ∑_{p is a nipat from A to B} w (p) = det ( ( ∑_{p : A_i → B_j} w ( p) )_{1≤i≤k, 1≤j≤k} )   (by (306))
        = det ( ( h_{λ_i − µ_j − i + j} )_{1≤i≤M, 1≤j≤M} )   (by (307)) .

This proves Theorem 7.3.48.



Our above proof of Theorem 7.3.48 is essentially taken from [Stanle23, First
proof of Theorem 7.16.1]; other proofs can be found in [GriRei20, Exercise
2.7.13] (see also [GriRei20, paragraph after Theorem 2.4.6] for several refer-
ences).
The second Jacobi–Trudi formula involves elementary symmetric polynomials
en (instead of hn ) and transpose partitions (as in Exercise A.3.1.1):

Theorem 7.3.51 (Second Jacobi–Trudi formula). Let λ and µ be two parti-


tions. Let λt and µt be the transposes of λ and µ. Let M ∈ N be such that
both λ^t and µ^t have length ≤ M. We extend the partitions λ^t and µ^t to M-tuples
(by inserting zeroes at the end). Write these M-tuples λ^t and µ^t as
λ^t = (λ^t_1 , λ^t_2 , . . . , λ^t_M ) and µ^t = (µ^t_1 , µ^t_2 , . . . , µ^t_M ). Then,

s_{λ/µ} = det ( ( e_{λ^t_i − µ^t_j − i + j} )_{1≤i≤M, 1≤j≤M} ).

Example 7.3.52. If M = 3, then Theorem 7.3.51 says that

s_{λ/µ} = det ( ( e_{λ^t_i − µ^t_j − i + j} )_{1≤i≤3, 1≤j≤3} ) = det
  [ e_{λ^t_1 − µ^t_1}      e_{λ^t_1 − µ^t_2 + 1}  e_{λ^t_1 − µ^t_3 + 2} ]
  [ e_{λ^t_2 − µ^t_1 − 1}  e_{λ^t_2 − µ^t_2}      e_{λ^t_2 − µ^t_3 + 1} ]
  [ e_{λ^t_3 − µ^t_1 − 2}  e_{λ^t_3 − µ^t_2 − 1}  e_{λ^t_3 − µ^t_3}     ] .

For instance, if λ = (3, 2, 2) and µ = (1, 1, 0), then λ^t = (3, 3, 1) and µ^t =
(2) = (2, 0, 0) (here, we have extended the partition µ^t to an M-tuple by
inserting zeroes at the end), so that this becomes

s_{λ/µ} = det
  [ e_{3−2}    e_{3−0+1}  e_{3−0+2} ]
  [ e_{3−2−1}  e_{3−0}    e_{3−0+1} ]
  [ e_{1−2−2}  e_{1−0−1}  e_{1−0}   ]
= det
  [ e_1     e_4  e_5 ]
  [ e_0     e_3  e_4 ]
  [ e_{−3}  e_0  e_1 ]
= det
  [ e_1  e_4  e_5 ]
  [ 1    e_3  e_4 ]
  [ 0    1    e_1 ]   (since e_0 = 1 and e_{−3} = 0) .

Proof of Theorem 7.3.51. See Exercise A.6.3.11.



A. Homework exercises
What follows is a collection of problems (of varying difficulty) that are meant
to illuminate, expand upon and otherwise complement the above text.
The numbers in the squares (like 3 ) are the experience points you gain for
solving the problems. They are a mix of difficulty rating and relevance score:
The harder or more important the problem, the larger is the number in the
square. I believe a 5 represents a good graduate-level homework problem
that requires thinking and work. A 3 usually requires some thinking or work.
A 1 is a warm-up question. A 7 should be somewhat too hard for regular
homework. Anything above 10 is not really meant as homework, but I’d be
excited to hear your ideas. Multi-part exercises sometimes have points split
between the parts – i.e., if parts (b) and (c) of an exercise are solved using the
same idea, then they may both be assigned 3 points even if each for itself
would be a 5 .
In solving an exercise, you can freely use (without proof) the claims of all
exercises above it.
Your goal (for an A grade in the 2024 iteration of Math 531) is to gain at least
20 experience points from each of the Chapters 3–7 (counting Chapter 2 as part
of Chapter 3).

A.1. Before we start...


A.1.1. Binomial coefficients and elementary counting

Definition A.1.1. Let x ∈ R. Then:

• We let ⌊ x ⌋ denote the largest integer ≤ x. This integer ⌊ x ⌋ is called the


floor of x, or the result of “rounding down x”.

• We let ⌈ x ⌉ denote the smallest integer ≥ x. This integer ⌈ x ⌉ is called


the ceiling of x, or the result of “rounding up x”.

Example A.1.2. We have


⌊3⌋ = 3;   ⌊√2⌋ = 1;   ⌊π⌋ = 3;   ⌊−π⌋ = −4;
⌈3⌉ = 3;   ⌈√2⌉ = 2;   ⌈π⌉ = 4;   ⌈−π⌉ = −3.

Let us note that each x ∈ R satisfies ⌊ x ⌋ ≤ x ≤ ⌈ x ⌉.


Exercise A.1.1.1. 3 Let n ∈ N. Prove that

∑_{k=0}^{n} \binom{−2}{k} = (−1)^n ⌊(n + 2)/2⌋ .

The next exercise is concerned with the notion of lacunar sets. This notion
appears all over combinatorics (particularly in connection with Fibonacci num-
bers), so chances are we will meet it again.

Definition A.1.3. A set S of integers is said to be lacunar if it contains no


two consecutive integers (i.e., there is no integer i such that both i ∈ S and
i + 1 ∈ S).

For example, the set {1, 5, 7} is lacunar, but {1, 5, 6} is not. Any 1-element
subset of Z is lacunar, and so is the empty set.
Some people say “sparse” instead of “lacunar”, but the word “sparse” also
has other meanings.

Example A.1.4. The lacunar subsets of {1, 2, 3, 4, 5} are

∅, {1} , {2} , {3} , {4} , {5} , {1, 3} ,


{1, 4} , {1, 5} , {2, 4} , {2, 5} , {3, 5} , {1, 3, 5} .
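Example A.1.4 is easy to reproduce by machine. Here is a small sketch (not part of the original notes; the helper name is ours) that enumerates the lacunar subsets of {1, 2, . . . , n} and checks the count against Example A.1.4 and the Fibonacci count claimed in Exercise A.1.1.2 (a) below.

```python
from itertools import combinations

def lacunar_subsets(n):
    """All lacunar subsets of {1, 2, ..., n}, as sorted tuples."""
    return [S for k in range(n + 1) for S in combinations(range(1, n + 1), k)
            if all(S[i + 1] - S[i] >= 2 for i in range(len(S) - 1))]

assert len(lacunar_subsets(5)) == 13  # the 13 subsets listed in Example A.1.4

# Sanity check of the claim of Exercise A.1.1.2 (a): the count is f_{n+2},
# with the Fibonacci convention f_0 = 0, f_1 = 1.
fib = [0, 1]
for _ in range(12):
    fib.append(fib[-1] + fib[-2])
assert all(len(lacunar_subsets(n)) == fib[n + 2] for n in range(9))
```

This does not prove the exercise, of course; it only confirms the statement for small n.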

Exercise A.1.1.2. Let n ∈ N.


(a) 1 Prove that the total # of lacunar subsets of {1, 2, . . . , n} is the Fi-
bonacci number f n+2 .
(b) 1 Let k ∈ {0, 1, . . . , n + 1}. Prove that the total # of k-element lacunar
subsets of {1, 2, . . . , n} equals \binom{n + 1 − k}{k}.

(c) 1 What goes wrong with the claim of part (b) if k > n + 1 ?

(d) 1 Find the largest possible size of a lacunar subset of {1, 2, . . . , n}.

(e) 1 Prove that f_{n+1} = ∑_{k=0}^{n} \binom{n − k}{k} for each n ∈ {−1, 0, 1, . . .}.

Exercise A.1.1.3. Let n be a positive integer.


(a) 2 Prove that

\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \binom{n}{6} + · · · = ∑_{k ∈ N} \binom{n}{2k} = 2^{n−1} .
(b) 3 Prove that

\binom{n}{0} + \binom{n}{4} + \binom{n}{8} + \binom{n}{12} + · · · = ∑_{k ∈ N} \binom{n}{4k} = 2^{n−2} + 2^{n/2−1} cos (πn/4) .

[Hint: For part (a), compute (1 + 1)^n + (1 − 1)^n . For part (b), compute
(1 + 1)^n + (1 + i )^n + (1 − 1)^n + (1 − i )^n , where i = √−1 ∈ C is the imaginary
unit.]
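Both identities of this exercise are easy to test numerically before proving them. The following check (not part of the original notes) verifies parts (a) and (b) for n = 1, . . . , 12; part (b) involves a real cosine, so we compare with a floating-point tolerance.

```python
from math import comb, cos, pi, isclose

for n in range(1, 13):
    # part (a): sum of binom(n, 2k) over all k is 2^(n-1)
    even_sum = sum(comb(n, 2 * k) for k in range(n + 1))
    assert even_sum == 2 ** (n - 1)

    # part (b): sum of binom(n, 4k) over all k is 2^(n-2) + 2^(n/2-1) cos(pi n / 4)
    quad_sum = sum(comb(n, 4 * k) for k in range(n + 1))
    rhs = 2 ** (n - 2) + 2 ** (n / 2 - 1) * cos(pi * n / 4)
    assert isclose(quad_sum, rhs)
```

The ranges over k can safely run up to n, since `comb(n, j)` is 0 once j exceeds n.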

From now on, we shall use the so-called Iverson bracket notation:

Definition A.1.5. If A is any logical statement, then we define an integer
[A] ∈ {0, 1} by

[A] = 1 if A is true, and [A] = 0 if A is false.

For example, [1 + 1 = 2] = 1 (since 1 + 1 = 2 is true), whereas [1 + 1 = 1] = 0
(since 1 + 1 = 1 is false).
If A is any logical statement, then the integer [A] is known as the truth
value of A.

Exercise A.1.1.4. (a) 3 Prove that

\binom{n}{a} \binom{a}{b} = \binom{n}{b} \binom{n − b}{a − b}   for any n, a, b ∈ C.

(b) 3 Let N ∈ N.
For each c ∈ C, let L_c ∈ C^{N×N} be the N × N-matrix whose rows are
indexed 0, 1, . . . , N − 1 and whose columns are indexed 0, 1, . . . , N − 1, and
whose (i, j)-th entry is \binom{i}{j} c^{i−j} for each i, j ∈ {0, 1, . . . , N − 1}. (The expression
“\binom{i}{j} c^{i−j}” should be understood as 0 if i < j, even if c itself is 0.)

[For example, if N = 5, then

L_c =
  [ 1    0     0     0   0 ]
  [ c    1     0     0   0 ]
  [ c^2  2c    1     0   0 ]
  [ c^3  3c^2  3c    1   0 ]
  [ c^4  4c^3  6c^2  4c  1 ] .]

Prove that

L_c L_d = L_{c+d}   for any c, d ∈ C.
(c) 1 Prove that the matrices L1 and L−1 are mutually inverse.
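The matrix identities of parts (b) and (c) can be checked numerically before proving them. A quick sketch (not part of the original notes; the helper names `L` and `matmul` are ours), with N = 5 and integer values of c, d:

```python
from math import comb

def L(c, N=5):
    """The N x N matrix with (i, j) entry binom(i, j) * c^(i - j), and 0 above the diagonal."""
    return [[comb(i, j) * c ** (i - j) if i >= j else 0 for j in range(N)]
            for i in range(N)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

assert matmul(L(2), L(3)) == L(5)          # part (b): L_c L_d = L_{c+d} with c = 2, d = 3
identity = [[int(i == j) for j in range(5)] for i in range(5)]
assert matmul(L(1), L(-1)) == identity     # part (c): L_1 and L_{-1} are mutually inverse
```

The guard `if i >= j else 0` implements the convention stated in the exercise (and avoids negative exponents in Python).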

Exercise A.1.1.5. Let p be a prime number.


 
(a) 2 Prove that p | \binom{p}{k} for each k ∈ {1, 2, . . . , p − 1}.

(b) 3 Let a, b ∈ N. Prove that

\binom{ap}{bp} ≡ \binom{a}{b}   mod p^2 .

(c) 3 Prove the claim of part (b) still holds if we replace “a, b ∈ N” by
“a, b ∈ Z”.
[Hint: The following suggests a combinatorial solution (algebraic solutions
also exist).
Consider the cyclic group C p = Z/pZ with p elements.
For part (a), let U be the set of all p-tuples of elements of {0, 1} with the
property that exactly k entries of the p-tuple are 1. The group C p acts on U
by cyclic rotation. Argue that each orbit of this action has size divisible by p.
For part (b), let W be the set of all p × a-matrices with entries in {0, 1}
and having the property that the sum of all entries of the matrix is bp
(that is, exactly bp entries are 1). Construct an action of the group C_p^a =
C_p × C_p × · · · × C_p (with a factors) on W in which the k-th C_p factor cyclically rotates the
entries of the k-th row of the matrix. Argue that all but \binom{a}{b} orbits of this
action have size divisible by p^2, and conclude by writing |W | as the sum of
the sizes of the orbits.
For part (c), fix b ∈ N and p (why is it enough to consider b ∈ N?),
and show that the remainders of \binom{ap}{bp} − \binom{a}{b} modulo p^2 are periodic as a
function in a.]
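The congruence in part (b) is easy to test numerically for small primes before attempting a proof. A quick check (not part of the original notes):

```python
from math import comb

# binom(ap, bp) == binom(a, b) (mod p^2) for small primes p and small a, b
for p in (2, 3, 5, 7):
    for a in range(5):
        for b in range(a + 1):
            assert (comb(a * p, b * p) - comb(a, b)) % (p * p) == 0
```

(For primes p ≥ 5 the congruence is in fact known to hold modulo p^3, but the exercise only asks for p^2.)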

A.2. Generating functions


The notations of Chapter 3 shall be used here. In particular, we fix a commuta-
tive ring K.

A.2.1. Examples
All the properties of generating functions that have been used without proof in
Section 3.1 can also be used in the following exercises.

Exercise A.2.1.1. 2 The Lucas sequence is the sequence (ℓ0 , ℓ1 , ℓ2 , . . .) of inte-


gers defined recursively by

ℓ0 = 2, ℓ1 = 1, ℓn = ℓn−1 + ℓn−2 for each n ≥ 2.

(Thus, ℓ2 = 3 and ℓ3 = 4 and ℓ4 = 7 and so on.)


Find an explicit formula for ℓn analogous to Binet’s formula for the Fi-
bonacci numbers.
     
Exercise A.2.1.2. 1 Prove that \frac{1}{n + 1} \binom{2n}{n} = \binom{2n}{n} − \binom{2n}{n − 1} for any n ∈ N.

Exercise A.2.1.3. 3 Let q and d be any two real numbers. Let ( a0 , a1 , a2 , . . .)


be a sequence of real numbers such that each n ≥ 1 satisfies an = qan−1 + d.
(This can be viewed as a common generalization of arithmetic and geometric
sequences.)
Find an explicit formula for an in terms of q, d and a0 . (The formula may
depend on whether q is 1 or not.)

Exercise A.2.1.4. 5 Find and prove an explicit formula for the coefficient of
x^n in the formal power series \frac{1}{1 − x − x^2 + x^3} .

Exercise A.2.1.5. Recall the Fibonacci sequence ( f 0 , f 1 , f 2 , . . .).


(a) 2 Prove that

f_0 + f_2 x + f_4 x^2 + f_6 x^3 + · · · = ∑_{k ≥ 0} f_{2k} x^k = \frac{x}{x^2 − 3x + 1} .

(b) 2 Find a degree-2 linear recurrence relation for the sequence


( f 0 , f 2 , f 4 , f 6 , . . .). That is, find two numbers a and b such that each n ≥ 2
satisfies f 2n = a f 2n−2 + b f 2n−4 .
[Hint: For part (a), start with the generating function F ( x ) from Section
3.1, and compute the “average” ( F ( x ) + F (− x )) / 2 in two different ways: On the
one hand, this “average” is f_0 + f_2 x^2 + f_4 x^4 + f_6 x^6 + · · · ; on the other hand,
it is a sum of two fractions. Compare the results, and “substitute x^{1/2} for x”
(that is, replace each x^{2n} by x^n ).]

Exercise A.2.1.6. A Motzkin word of length n (where n ∈ N) is an n-tuple


whose entries belong to the set {0, 1, −1} and sum up to 0 (that is, it contains

equally many 1’s and −1’s), and that has the additional property that for
each k, we have
(# of −1 ’s among its first k entries)
≤ (# of 1’s among its first k entries) .
A Motzkin path of length n is a path from the point (0, 0) to the point (n, 0)
in the Cartesian plane that moves only using “NE-steps” (i.e., steps of the
form ( x, y) → ( x + 1, y + 1)), “SE-steps” (i.e., steps of the form ( x, y) →
( x + 1, y − 1)) and “E-steps” (i.e., steps of the form ( x, y) → ( x + 1, y)) and
never falls below the x-axis (i.e., does not contain any point ( x, y) with y < 0).
For example, here is a Motzkin path from (0, 0) to (6, 0):

[picture of a Motzkin path not rendered in this text version].
For each n ∈ N, we define the Motzkin number mn by
mn := (# of Motzkin paths (0, 0) → (n, 0)) .
Here is a table of the first 12 Motzkin numbers mn :

n 0 1 2 3 4 5 6 7 8 9 10 11
.
mn 1 1 2 4 9 21 51 127 323 835 2188 5798

(a) 1 Prove that mn is also the # of Motzkin words of length n for each
n ∈ N.
(b) 2 Let cn be the n-th Catalan number (defined in Section 3.1) for each
n ∈ N. Prove that
m_n = ∑_{k=0}^{n} \binom{n}{2k} c_k   for each n ∈ N.

(The sum here could just as well range from k = 0 to ⌊n/2⌋ or range over all
k ∈ N, since \binom{n}{2k} = 0 when 2k > n.)
(c) 2 Prove that

m_n = ∑_{k=0}^{n} \frac{1}{k + 1} \binom{n}{k} \binom{n − k}{k} = \frac{1}{n + 1} ∑_{k=0}^{n} \binom{n + 1}{k + 1} \binom{n − k}{k}

for each n ∈ N.
[Despite the analogy between the Motzkin numbers mn and the Catalan
numbers cn , there is no formula for mn as simple as (14) or (15).]
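The table of Motzkin numbers above can be reproduced from the part (b) formula. Here is a small check (not part of the original notes; helper names are ours), which also cross-checks one of the part (c) formulas as we read it:

```python
from math import comb

def catalan(k):
    """The k-th Catalan number c_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

def motzkin(n):
    # part (b): m_n = sum over k of binom(n, 2k) * c_k
    return sum(comb(n, 2 * k) * catalan(k) for k in range(n // 2 + 1))

table = [1, 1, 2, 4, 9, 21, 51, 127, 323, 835, 2188, 5798]
assert [motzkin(n) for n in range(12)] == table

# cross-check: (n+1) m_n = sum over k of binom(n+1, k+1) * binom(n-k, k)
assert all((n + 1) * motzkin(n)
           == sum(comb(n + 1, k + 1) * comb(n - k, k) for k in range(n + 1))
           for n in range(12))
```

A direct path-counting implementation of m_n (tracking heights of partial Motzkin paths) gives the same values, as part (a) suggests.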

A.2.2. Definitions

Exercise A.2.2.1. 4 Let n ∈ N and m ∈ C. Prove that

∑_{k=0}^{n} (−1)^k \binom{m}{k} \binom{m}{n − k} =
  (−1)^{n/2} \binom{m}{n/2},  if n is even;
  0,                          if n is odd.

Exercise A.2.2.2. 3 Recall that a commutative ring L is said to be an integral


domain if it is nontrivial (i.e., its zero and its unity are distinct) and has the
property that if a, b ∈ L satisfy ab = 0, then a = 0 or b = 0.
Let K be an integral domain. Prove that the ring K [[ x ]] is an integral
domain.
[Hint: The analogous fact for the polynomial ring K [ x ] is well-known. It is
commonly proved by noticing that if two polynomials are nonzero, then the
leading term of their product equals the product of their leading terms. This
argument does not immediately apply to FPSs, since nonzero FPSs usually
have no leading term. What do nonzero FPSs have, though?]

Exercise A.2.2.3. An FPS a ∈ K [[ x ]] will be called even if it satisfies
[ x^1 ] a = [ x^3 ] a = [ x^5 ] a = · · · = 0 (that is, if it satisfies [ x^n ] a = 0 for all odd n ∈ N).

Let f ∈ K [[ x ]] be any FPS. Write f in the form f = ∑_{n ∈ N} f_n x^n with
f_0 , f_1 , f_2 , . . . ∈ K (so that f_n = [ x^n ] f for all n ∈ N). We define f̃ to be
the FPS ∑_{n ∈ N} f_n (− x )^n = ∑_{n ∈ N} (−1)^n f_n x^n . (Using the notations of Definition
3.5.1, this FPS f̃ is the composition f [− x ] = f ◦ (− x ).)

(a) 1 Show that the FPS f + f̃ is even.

(b) 2 Show that the FPS f · f̃ is even.
(b) 2 Show that the FPS f · fe is even.

A.2.3. Dividing FPSs

Exercise A.2.3.1. (a) 4 Prove that

∑_{n ∈ N} \frac{2^n x^{2^n}}{1 + x^{2^n}} = \frac{x}{1 − x} .

(In particular, show that the sum on the left hand side is well-defined.)
(b) 3 For each positive integer n, let ν2 (n) be the highest k ∈ N such
that 2k | n. (Equivalently, ν2 (n) is the exponent with which 2 appears in
the prime factorization of n; when n is odd, this is understood to be 0. For
example, ν2 (40) = 3 and ν2 (41) = 0.)
Prove that

∑_{n ∈ N} \frac{x^{2^n}}{1 − x^{2^n}} = ∑_{n > 0} (ν_2 (n) + 1) x^n .
(In particular, show that the sum on the left hand side is well-defined.)
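Both identities can be checked coefficient-by-coefficient up to some truncation degree, using the geometric expansions x^s/(1 − x^s) = x^s + x^{2s} + · · · and s·x^s/(1 + x^s) = s(x^s − x^{2s} + x^{3s} − · · ·). A sketch (not part of the original notes; helper names are ours):

```python
def nu2(n):
    """2-adic valuation: the highest k with 2^k dividing n (for n > 0)."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

D = 64  # truncation degree
part_a = [0] * (D + 1)  # coefficients of sum_n 2^n x^(2^n) / (1 + x^(2^n))
part_b = [0] * (D + 1)  # coefficients of sum_n x^(2^n) / (1 - x^(2^n))
step = 1                # runs through 2^0, 2^1, 2^2, ...
while step <= D:
    for j, m in enumerate(range(step, D + 1, step), start=1):
        part_b[m] += 1                      # x^s + x^(2s) + x^(3s) + ...
        part_a[m] += step * (-1) ** (j - 1)  # s(x^s - x^(2s) + x^(3s) - ...)
    step *= 2

assert all(part_a[m] == 1 for m in range(1, D + 1))           # part (a): x/(1-x)
assert all(part_b[m] == nu2(m) + 1 for m in range(1, D + 1))  # part (b)
```

The part (b) check also makes the combinatorial content visible: the coefficient of x^m counts the n ∈ N with 2^n | m, and there are exactly ν_2(m) + 1 of those.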

A.2.4. Polynomials
The following exercise is a generalization of Binet’s formula for the Fibonacci
sequence (Example 1 in Section 3.1):

Exercise A.2.4.1. Let F be a field of characteristic 0 (that is, a field that is a


Q-algebra). Let d be a positive integer, and let p1 , p2 , . . . , pd be d elements of
F. Let p ∈ F [ x ] be the polynomial 1 − ∑_{i=1}^{d} p_i x^i .
Let ( a0 , a1 , a2 , . . .) be a sequence of elements of F with the property that
each integer n ≥ d satisfies

a_n = ∑_{i=1}^{d} p_i a_{n−i} .

(Such a sequence is said to be a linearly recursive sequence with constant coef-


ficients. For example, if d = 2 and p1 = 1 and p2 = 1, then each n ≥ 2
must satisfy an = an−1 + an−2 , that is, the recursive equation of the Fibonacci
sequence. Of course, the starting values a0 , a1 , . . . , ad−1 of the sequence can
be arbitrary.)
(a) 3 Prove that there is some polynomial q ∈ F [ x ] of degree < d (this
allows q = 0) such that
a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · · = \frac{q}{p}   in F [[ x ]] .

(b) 2 Assume that the polynomial p ∈ F [ x ] can be factored as

p = (1 − r1 x ) (1 − r2 x ) · · · (1 − r d x )

for some distinct elements r1 , r2 , . . . , rd of F. Prove that there exist d scalars


λ1 , λ2 , . . . , λd ∈ F such that each n ∈ N satisfies
a_n = ∑_{i=1}^{d} λ_i r_i^n .

(c) 3 Now, assume instead that the polynomial p ∈ F [ x ] can be factored
as

p = (1 − r_1 x )^{m_1} (1 − r_2 x )^{m_2} · · · (1 − r_k x )^{m_k}

for some distinct elements r_1 , r_2 , . . . , r_k of F and some nonnegative integers
m_1 , m_2 , . . . , m_k . Prove that there exist k polynomials u_1 , u_2 , . . . , u_k ∈ F [ x ] such
that deg u_i < m_i for each i ∈ {1, 2, . . . , k }, and such that each n ∈ N satisfies

a_n = ∑_{i=1}^{k} u_i (n) r_i^n .

The next exercise reveals an application of FPSs to number theory (more such
applications will appear later on):
Exercise A.2.4.2. Let p and q be two coprime positive integers. We define the
set
S ( p, q) := { ap + bq | ( a, b) ∈ N × N} .
(For example, if p = 3 and q = 5, then S ( p, q) = {0, 3, 5, 6, 8, 9, 10, 11, . . .},
where the “. . .” is saying that all integers ≥ 8 belong to S ( p, q). The set
S ( p, q) can be viewed as the set of all denominations that can be paid with
p-cent coins and q-cent coins, without getting change.)
(a) 3 Prove that

∑_{n∈S(p,q)} x^n = (1 − x^{pq}) / ((1 − x^p)(1 − x^q)).

(b) 3 Prove that every integer n > pq − p − q belongs to S ( p, q), whereas


the integer pq − p − q itself does not.
[Hint: For part (a), describe the coefficient of x^m in

(1 − x^p)(1 − x^q) ∑_{n∈S(p,q)} x^n = ∑_{n∈S(p,q)} (x^n − x^{n+p} − x^{n+q} + x^{n+p+q})

in a form revealing that it is 0 unless m = 0 or m = pq. Now, part (b) can


be solved as follows: First, show that every sufficiently high n ∈ N belongs to S(p,q). Hence,

∑_{n∈N; n∉S(p,q)} x^n = 1/(1 − x) − (1 − x^{pq}) / ((1 − x^p)(1 − x^q))

is a polynomial. Finding the largest integer that doesn’t belong to S(p,q) means finding the degree of this polynomial.]
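Part (b) is the classical “Chicken McNugget” fact. A brute-force check in Python for the example p = 3, q = 5 from the exercise (so pq − p − q = 7; the cutoff `limit` is an arbitrary choice):

```python
# Brute-force check of Exercise A.2.4.2 (b) for p = 3, q = 5.
p, q = 3, 5
limit = 3 * p * q
S = {a * p + b * q for a in range(limit) for b in range(limit)
     if a * p + b * q <= limit}

frobenius = p * q - p - q            # = 7 here
assert frobenius not in S            # pq - p - q is not representable
assert all(n in S for n in range(frobenius + 1, limit + 1))  # everything above it is
print(sorted(n for n in range(limit + 1) if n not in S))  # prints [1, 2, 4, 7]
```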

Exercise A.2.4.3. Let N ∈ N. Let PN denote the C-vector space of all polyno-
mials f ∈ C [ x ] of degree < N. Consider the matrices Lc for all c ∈ C defined
in Exercise A.1.1.4 (b).
For each c ∈ C, let B_c be the basis ((x − c)^0, (x − c)^1, . . . , (x − c)^{N−1}) of P_N. (This is a basis, since it is the image of the monomial basis (x^0, x^1, . . . , x^{N−1}) under the “substitute x − c for x” automorphism.)

(a) 2 Let c, d ∈ C. Prove that (L_c)^T (that is, the transpose of L_c) is the change-of-basis matrix from the basis B_{c+d} to the basis B_d. (This means that

(x − d)^j = ∑_{i=0}^{N−1} ((L_c)^T)_{i,j} (x − (c + d))^i for any j ∈ {0, 1, . . . , N − 1},

where ((L_c)^T)_{i,j} denotes the (i, j)-th entry of the matrix (L_c)^T.)

(b) 2 Use this to give a new solution to Exercise A.1.1.4 (b) (without using
Exercise A.1.1.4 (a)).

A.2.5. Substitution and evaluation of power series

Exercise A.2.5.1. 1 Let K be a commutative ring. Let a ∈ K[[x]] be the FPS x/(x − 1). Prove that a ◦ a = x.

Exercise A.2.5.2. Let K be a commutative ring. Let a ∈ K[[x]] be an FPS such that [x^0] a = 0.
A compositional inverse of a shall mean an FPS b ∈ K[[x]] such that [x^0] b = 0 and a ◦ b = x and b ◦ a = x.
Prove the following:
(a) 1 If a compositional inverse of a exists, then it is unique.
(b) 4 A compositional inverse of a exists if and only if [x^1] a is invertible in K.
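Both exercises can be explored with truncated power series. The following Python sketch (helper names ad hoc; the truncation order N is arbitrary) implements composition of truncated FPSs and verifies the identity a ◦ a = x from Exercise A.2.5.1, using a = x/(x − 1) = −(x + x² + x³ + · · ·):

```python
# Truncated-power-series composition; checks a ∘ a = x for a = x/(x-1).
N = 12  # truncation order: we track coefficients of x^0 .. x^{N-1}

def mul(f, g):
    # product of two truncated FPSs (coefficient lists of length N)
    h = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < N:
                h[i + j] += fi * gj
    return h

def compose(f, g):
    # f ∘ g, assuming g has constant term 0 (so truncation is consistent)
    result, power = [0] * N, [1] + [0] * (N - 1)  # power holds g^k
    for fk in f:
        result = [r + fk * p for r, p in zip(result, power)]
        power = mul(power, g)
    return result

a = [0] + [-1] * (N - 1)          # -(x + x^2 + x^3 + ...)
x = [0, 1] + [0] * (N - 2)
assert compose(a, a) == x
print("a ∘ a = x, as claimed")
```

So a is its own compositional inverse in the sense of Exercise A.2.5.2.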
A.2.6. Derivatives of FPSs

Exercise A.2.6.1. 2 Let f ∈ K [[ x ]] be an FPS. Let p and q be two coprime


nonnegative integers. Prove that the coefficient [ x q ] ( f p ) is a multiple of p
(that is, there exists some c ∈ K such that [ x q ] ( f p ) = pc).
[Hint: More generally, prove that q · [ x q ] ( f p ) is a multiple of p whether or
not p and q are coprime. (Think about the coefficients of ( f p )′ .)]

The next exercise is concerned with generalizing the two equalities


1 + x + x^2 + x^3 + · · · = 1/(1 − x)     and
0 + 1x + 2x^2 + 3x^3 + · · · = x/(1 − x)^2
that we have encountered in Section 3.1 (as (5) and (18), respectively).

Exercise A.2.6.2. For any m ∈ N, we define an FPS

Q_m := ∑_{n∈N} n^m x^n = 0^m x^0 + 1^m x^1 + 2^m x^2 + · · · ∈ Z[[x]].

For example,
Q_0 = x^0 + x^1 + x^2 + x^3 + · · · = 1/(1 − x);
Q_1 = 0x^0 + 1x^1 + 2x^2 + 3x^3 + · · · = x/(1 − x)^2     (by (18));
it can furthermore be shown that
Q_2 = 0x^0 + 1x^1 + 4x^2 + 9x^3 + · · · = x(1 + x)/(1 − x)^3;
Q_3 = 0x^0 + 1x^1 + 8x^2 + 27x^3 + · · · = x(1 + 4x + x^2)/(1 − x)^4;
Q_4 = 0x^0 + 1x^1 + 16x^2 + 81x^3 + · · · = x(1 + 11x + 11x^2 + x^3)/(1 − x)^5.
The expressions become more complicated as m increases, but one will still notice that each Q_m has the form A_m/(1 − x)^{m+1}, where A_m is a polynomial of degree m that has constant term 0 (unless m = 0) and whose coefficients have a “palindromic” symmetry (in the sense that the sequence of coefficients is symmetric across its middle). Let us prove this.
For each m ∈ N, we define an FPS

A_m := (1 − x)^{m+1} Q_m ∈ Z[[x]].

(Thus, Q_m = A_m/(1 − x)^{m+1}, so that the A_m we just defined are the A_m we are interested in – but we don’t yet know that they are polynomials.)
Let ϑ : Z [[ x ]] → Z [[ x ]] be the Z-linear map that sends each FPS f ∈ Z [[ x ]]
to x f ′ . (That is, ϑ takes the derivative of an FPS and then multiplies it by x.)
(a) 1 Prove that ϑ(fg) = ϑ(f) · g + f · ϑ(g) for any f, g ∈ Z[[x]]. (In the lingo of algebraists, this is saying that ϑ is a derivation of Z[[x]].)
(b) 1 Prove that ϑ((1 − x)^k) = −kx(1 − x)^{k−1} for each k ∈ Z.
(c) 1 Prove that Q_m = ϑ(Q_{m−1}) for each m > 0.
(d) 2 Prove that A_m = mxA_{m−1} + x(1 − x)A′_{m−1} for each m > 0.
(e) 1 Conclude that A_m is a polynomial of degree ≤ m for each m ∈ N.

(f) 1 Show that [x^0](A_m) = 0 for each m > 0.
(g) 2 Show that [x^i](A_m) = (m − i + 1) · [x^{i−1}](A_{m−1}) + i · [x^i](A_{m−1}) for each m > 0 and each i > 0.
(h) 3 Show that [x^i](A_m) = [x^{m+1−i}](A_m) for each m > 0 and each i ∈ {0, 1, . . . , m + 1}.
The polynomials A0 , A1 , A2 , . . . are known as the Eulerian polynomials.
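The recursion in part (d) makes the Eulerian polynomials easy to compute. The following Python sketch (all names ad hoc) does so and checks the palindromy of part (h) for small m:

```python
# Eulerian polynomials via A_m = m*x*A_{m-1} + x*(1-x)*A'_{m-1},
# with polynomials stored as coefficient lists.
def next_eulerian(A, m):
    new = [0] * (len(A) + 2)
    for i, c in enumerate(A):
        new[i + 1] += m * c      # contribution of m*x*A_{m-1}
        new[i] += i * c          # x*A'_{m-1} contributes i*c*x^i ...
        new[i + 1] -= i * c      # ... and -x^2*A'_{m-1} contributes -i*c*x^{i+1}
    while new and new[-1] == 0:  # strip trailing zeros
        new.pop()
    return new

eulerians = [[1]]                # A_0 = 1
for m in range(1, 6):
    eulerians.append(next_eulerian(eulerians[-1], m))

print(eulerians[4])              # prints [0, 1, 11, 11, 1], matching Q_4 above

# palindromy of part (h): [x^i] A_m = [x^{m+1-i}] A_m
for m in range(1, 6):
    Am = eulerians[m] + [0]      # pad to the coefficient range 0..m+1
    assert all(Am[i] == Am[m + 1 - i] for i in range(m + 2))
```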

Exercise A.2.6.3. For any nonzero FPS f ∈ K[[x]], define the order ord(f) of f to be the smallest m ∈ N such that [x^m] f ̸= 0. Further define the norm ||f|| of an FPS f ∈ K[[x]] to be the rational number 1/2^{ord(f)} if f is nonzero. If f is zero, set ||f|| := 0.
This norm on K [[ x ]] gives rise to a metric d : K [[ x ]] × K [[ x ]] → Q on
K [[ x ]], defined by

d ( f , g) = || f − g|| for any f , g ∈ K [[ x ]] .

This metric, in turn, induces a topology on K [[ x ]].


[Note that the norm we have defined is not a norm in the sense of func-
tional analysis, even if K is R or C, since (for example) ||2 f || equals || f ||
rather than 2 || f ||. However, it is used in the same way to define a metric and
thus a topology.]
(a) 3 Prove that K [[ x ]] is a complete metric space with respect to this
metric.
(b) 3 Prove that the maps

K [[ x ]] × K [[ x ]] → K [[ x ]] ,
( f , g) 7→ f + g

and

K [[ x ]] × K [[ x ]] → K [[ x ]] ,
( f , g) 7→ f g

and

K[[x]]_0 × K[[x]] → K[[x]],
(f, g) ↦ f ◦ g

are continuous with respect to the topologies induced by this metric. (Recall that K[[x]]_0 denotes the subset of K[[x]] consisting of all FPSs g ∈ K[[x]] satisfying [x^0] g = 0. This subset becomes a topological space by inheriting a subspace topology from K[[x]].)

(c) 1 Prove that the map

K [[ x ]] → K [[ x ]] ,
f 7→ f ′

is Lipschitz continuous with Lipschitz constant 2.


(d) 1 Assume that K is a commutative Q-algebra. Let ∫ denote the K-linear map from K[[x]] to K[[x]] that sends each FPS ∑_{n∈N} a_n x^n to ∑_{n∈N} a_n/(n+1) · x^{n+1}. (This map ∫ is an algebraic analogue of the antiderivative.) Prove that this map ∫ is Lipschitz continuous with Lipschitz constant 1/2.
[Hint: For part (b), the topology on the product of two metric spaces is
induced by the sup metric, which is given by

dsup (( f 1 , g1 ) , ( f 2 , g2 )) = max {d ( f 1 , f 2 ) , d ( g1 , g2 )} .

Show that all three maps are Lipschitz continuous with Lipschitz constant 1
– i.e., that any ( f 1 , g1 ) and ( f 2 , g2 ) in the respective product spaces satisfy

d(f_1 + g_1, f_2 + g_2) ≤ dsup((f_1, g_1), (f_2, g_2))     and
d(f_1 g_1, f_2 g_2) ≤ dsup((f_1, g_1), (f_2, g_2))     and
d(f_1 ◦ g_1, f_2 ◦ g_2) ≤ dsup((f_1, g_1), (f_2, g_2)).]

Exercise A.2.6.4. 3 Let K be a commutative Q-algebra. Let f ∈ K[[x]] be any FPS. Prove that there exists a unique FPS g ∈ K[[x]] satisfying [x^0] g = 0 and g′ = f ◦ g.
[Hint: This is an algebraic version of local existence and uniqueness of a solution of an ODE. There is an elementary recursive way to prove this. However, a more elegant way is to rewrite the ODE g′ = f ◦ g as an integral equation g = ∫(f ◦ g), where ∫ is the map defined in Exercise A.2.6.3 (d). This integral equation says that g is a fixed point of the (nonlinear) operator K[[x]] → K[[x]], h ↦ ∫(f ◦ h). Now, apply the Banach fixed-point theorem.]

Exercise A.2.6.5. A set partition of a set U means a set {S_1, S_2, . . . , S_k} of disjoint nonempty subsets S_1, S_2, . . . , S_k of U that satisfy S_1 ∪ S_2 ∪ · · · ∪ S_k = U. These sets S_1, S_2, . . . , S_k are called the blocks (or the parts) of this set partition.
For any n, k ∈ N, we define S (n, k ) to be the number of set partitions of

[n] that have k blocks. For example, S (4, 3) = 6, since the set partitions of [4]
that have 3 blocks are

{{1, 2} , {3} , {4}} , {{1, 3} , {2} , {4}} , {{1, 4} , {2} , {3}} ,


{{2, 3} , {1} , {4}} , {{2, 4} , {1} , {3}} , {{3, 4} , {1} , {2}} .

The number S (n, k ) is called a Stirling number of the 2nd kind.


(a) 1 Show that every positive integer n satisfies

S(n, n) = 1;     S(n, 0) = 0;     S(n, 1) = 1;
S(n, 2) = 2^{n−1} − 1;     S(n, n − 1) = \binom{n}{2};
S(n, k) = 0 for all k > n.

(The first and the last equalities here hold for n = 0 as well. However, S(0, 0) = 1 and S(0, 1) = 0.)
(b) 3 Show that every n ∈ N satisfies

x^n = ∑_{k=0}^{n} S(n, k) x^{\underline{k}} in K[x],

where x^{\underline{k}} denotes the falling-factorial polynomial ∏_{i=0}^{k−1} (x − i) = x(x − 1)(x − 2) · · · (x − k + 1).
(c) 1 Show that every n ∈ N satisfies

∑_{k=0}^{n} (−1)^k k! · S(n, k) = (−1)^n.

(d) 2 Show that every n ∈ N satisfies

∑_{k=1}^{n} (−1)^{k−1} (k − 1)! · S(n, k) = 1 if n = 1, and = 0 if n ̸= 1.

[Hint: For part (d), take the derivative of part (b) and evaluate at x = 0.]
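The recurrence S(n, k) = k · S(n − 1, k) + S(n − 1, k − 1) (which is part (a) of Exercise A.2.7.2 below) gives a quick way to tabulate Stirling numbers and spot-check parts (a) and (c) numerically:

```python
from math import comb, factorial

# Stirling numbers of the 2nd kind via S(n,k) = k*S(n-1,k) + S(n-1,k-1).
def stirling2(N):
    S = [[0] * (N + 2) for _ in range(N + 1)]
    S[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return S

S = stirling2(10)
assert S[4][3] == 6                      # the example in the exercise
for n in range(1, 11):
    assert S[n][2] == 2 ** (n - 1) - 1   # part (a)
    assert S[n][n - 1] == comb(n, 2)     # part (a)
    # part (c): alternating sum equals (-1)^n
    assert sum((-1) ** k * factorial(k) * S[n][k] for k in range(n + 1)) == (-1) ** n
print(S[5][:6])  # prints [0, 1, 15, 25, 10, 1]
```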

A.2.7. Exponentials and logarithms


Recall that K[[x]]_1 = { f ∈ K[[x]] | [x^0] f = 1 }, and that loder f = f′/f for any f ∈ K[[x]]_1.

Exercise A.2.7.1. 1 For this exercise, let K be any commutative ring (not
necessarily a Q-algebra). Let f ∈ K [[ x ]]1 and g ∈ K [[ x ]]0 be two FPSs. Note
that f ◦ g ∈ K [[ x ]]1 (by Lemma 3.7.7 (b)), so that loder ( f ◦ g) is well-defined.
Prove that
loder ( f ◦ g) = ((loder f ) ◦ g) · g′ .

Exercise A.2.7.2. Recall the Stirling numbers of the 2nd kind S (n, k ) defined
in Exercise A.2.6.5.
(a) 2 Show that all positive integers n and k satisfy

S(n, k) = k · S(n − 1, k) + S(n − 1, k − 1).

(b) 3 Show that every k ∈ N satisfies

∑_{n∈N} S(n, k)/n! · x^n = (exp[x] − 1)^k / k!.

[Hint: For part (b), denote the left hand side by f_k, and show that f_k′ = k f_k + f_{k−1} for each k ≥ 1.]

A.2.8. Non-integer powers

Exercise A.2.8.1. Let K be a nontrivial commutative ring.


(a) 2 Prove that there exists no FPS f ∈ K [[ x ]] such that f 2 = x. (Do not
assume that K is a field or an integral domain!)
(b) 2 More generally: Let f ∈ K[[x]] and n ∈ N. Assume that

f^n = a x^m + ∑_{i>m} a_i x^i

for some m ∈ N, some invertible a ∈ K and some elements a_{m+1}, a_{m+2}, a_{m+3}, . . . ∈ K. Prove that n | m. [This might require a little bit of commutative algebra – specifically the fact that any nontrivial commutative ring has a maximal ideal.]
(c) 1 Now assume that K = Z/2 is the field with 2 elements. Prove that
there exists no FPS f ∈ K [[ x ]] such that f 2 = 1 + x.

Exercise A.2.8.2. 1 Prove Theorem 3.8.2.



Exercise A.2.8.3. 5 Recall the Catalan numbers c_0, c_1, c_2, . . . introduced in Example 2 in Section 3.1. Prove that

∑_{k=0}^{n} c_{2k} c_{2(n−k)} = 4^n c_n for each n ∈ N.

Exercise A.2.8.4. 4 (a) Prove that there exists a unique sequence (a_0, a_1, a_2, . . .) of rational numbers that satisfies a_0 = 1 and

∑_{k=0}^{n} a_k a_{n−k} = 1 for all n ∈ N.

(b) Find an explicit formula for the n-th entry an of this sequence (in terms
of binomial coefficients).
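The expected answer to part (b) is a_n = \binom{2n}{n}/4^n (the coefficients of (1 − x)^{−1/2}). The following Python sketch treats this closed form as a conjecture and verifies it against the defining convolution for small n, using exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

# Solve the convolution sum_{k=0}^n a_k a_{n-k} = 1 for a_n, given a_0 = 1.
a = [Fraction(1)]
for n in range(1, 12):
    s = sum(a[k] * a[n - k] for k in range(1, n))  # the terms not involving a_n
    a.append((1 - s) / (2 * a[0]))                 # 2*a_0*a_n + s = 1

# Compare with the conjectured closed form binom(2n, n) / 4^n.
for n in range(12):
    assert a[n] == Fraction(comb(2 * n, n), 4 ** n)
print(a[:4])  # the first four terms: 1, 1/2, 3/8, 5/16
```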

A.2.9. Integer compositions

Exercise A.2.9.1. 3 Let n be a positive integer. Let m ∈ N. Prove that

∑_{(a_1, a_2, . . . , a_n) ∈ N^n; a_1 + a_2 + · · · + a_n = m} a_1 a_2 · · · a_n = \binom{n + m − 1}{2n − 1}.

Exercise A.2.9.2. 5 Let n be a positive integer. Recall the Fibonacci sequence


( f 0 , f 1 , f 2 , . . .). Prove that:
(a) The # of compositions (α1 , α2 , . . . , αm ) of n such that α1 , α2 , . . . , αm are
odd is f n .
(b) The # of compositions (α1 , α2 , . . . , αm ) of n such that αi ≥ 2 for each
i ∈ {1, 2, . . . , m} is f n−1 .
(c) The # of compositions (α1 , α2 , . . . , αm ) of n such that αi ≤ 2 for each
i ∈ {1, 2, . . . , m} is f n+1 .

(Note that Exercise A.2.9.2 is behind many appearances of the Fibonacci num-
bers in the research literature, e.g., in the theory of “peak algebras”.)
It is surprising that one and the same sequence (the Fibonacci sequence)
answers the three different counting questions in Exercise A.2.9.2. Even more
surprisingly, this generalizes:

Exercise A.2.9.3. 4 Let n and k be two positive integers such that k > 1.
Let u be the # of compositions α of n such that each entry of α is congruent
to 1 modulo k.
Let v be the # of compositions β of n + k − 1 such that each entry of β is
≥ k.
Let w be the # of compositions γ of n − 1 such that each entry of γ is either
1 or k.
Prove that u = v = w.
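All of these counts are easy to confirm by brute force. The following Python sketch (ranges are arbitrary) checks u = v = w for k = 3 and the three Fibonacci counts from Exercise A.2.9.2 for small n:

```python
# Brute-force composition counting for Exercises A.2.9.2 and A.2.9.3.
def count_compositions(n, allowed):
    # number of compositions of n whose parts all satisfy the predicate
    if n == 0:
        return 1
    return sum(count_compositions(n - p, allowed)
               for p in range(1, n + 1) if allowed(p))

k = 3
for n in range(1, 12):
    u = count_compositions(n, lambda p: p % k == 1)        # parts ≡ 1 mod k
    v = count_compositions(n + k - 1, lambda p: p >= k)    # parts ≥ k
    w = count_compositions(n - 1, lambda p: p in (1, k))   # parts in {1, k}
    assert u == v == w

# Fibonacci checks from Exercise A.2.9.2 (with f_0 = 0, f_1 = 1):
fib = [0, 1]
for i in range(2, 15):
    fib.append(fib[-1] + fib[-2])
for n in range(1, 12):
    assert count_compositions(n, lambda p: p % 2 == 1) == fib[n]
    assert count_compositions(n, lambda p: p >= 2) == fib[n - 1]
    assert count_compositions(n, lambda p: p <= 2) == fib[n + 1]
print("all composition counts match")
```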

A.2.10. x n -equivalence

Exercise A.2.10.1. 2 Assume that K is a commutative Q-algebra. Let n ∈ N. Let c ∈ K and a, b ∈ K[[x]]_1 satisfy a ≡ b mod x^n. Prove that a^c ≡ b^c mod x^n. (See Definition 3.7.6 and Definition 3.8.1 for the meanings of K[[x]]_1, a^c and b^c.)

Exercise A.2.10.2. 3 Let a, b ∈ K[[x]] be two FPSs that have compositional inverses. (See Exercise A.2.5.2 for the meaning of “compositional inverse”.) Let ã and b̃ be the compositional inverses of a and b. Let n ∈ N be such that a ≡ b mod x^n. Prove that ã ≡ b̃ mod x^n.
A.2.11. Infinite products

Exercise A.2.11.1. 2 Prove that each nonnegative integer can be written uniquely in the form ∑_{k≥1} a_k · k!, for some sequence (a_1, a_2, a_3, . . .) of integers satisfying (0 ≤ a_k ≤ k for each k ≥ 1) and (a_k = 0 for all but finitely many k ≥ 1).
[Hint: Simplify the FPS ∏_{k≥1} (1 + x^{k!} + x^{2·k!} + · · · + x^{k·k!}).]
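The uniqueness statement is the “factorial number system”. The greedy algorithm below recovers the digits a_k, and the script verifies existence, uniqueness and the digit bounds for all n < 6! (a small ad hoc cutoff):

```python
from math import factorial

# Greedy "factorial base" digits: n = sum_{k=1}^{K} a_k * k!  with 0 <= a_k <= k.
def factorial_digits(n, K=5):
    digits = []
    for k in range(K, 0, -1):
        digits.append(n // factorial(k))
        n %= factorial(k)
    return digits[::-1]           # (a_1, a_2, ..., a_K)

seen = set()
for n in range(factorial(6)):     # every n < 6! needs only digits a_1..a_5
    d = factorial_digits(n)
    assert all(0 <= d[k - 1] <= k for k in range(1, 6))               # digit bounds
    assert sum(d[k - 1] * factorial(k) for k in range(1, 6)) == n     # existence
    assert tuple(d) not in seen                                       # uniqueness
    seen.add(tuple(d))
print(factorial_digits(719))  # prints [1, 2, 3, 4, 5], since 719 = 6! - 1
```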

Exercise A.2.11.2. (a) 1 Prove that the family (1 − a_i x^i)_{i∈{1,2,3,...}} is multipliable whenever (a_1, a_2, a_3, . . .) ∈ K^{{1,2,3,...}} is a sequence of elements of K.
(b) 3 Let f ∈ K[[x]] be an FPS with constant term [x^0] f = 1. Prove that there is a unique sequence (a_1, a_2, a_3, . . .) ∈ K^{{1,2,3,...}} such that

f = ∏_{i=1}^{∞} (1 − a_i x^i).

We call this sequence (a_1, a_2, a_3, . . .) the Witt coordinate sequence of f.
(c) 1 Find the Witt coordinate sequence of the FPS 1/(1 − x).

Next come some more exercises on the technicalities of multipliability and in-
finite products. The first one is a (partial) converse to Theorem 3.11.10:

Exercise A.2.11.3. 5 Let (ai )i∈ I be a multipliable family of FPSs such that
each ai is invertible (in K [[ x ]]). Prove that the family (ai − 1)i∈ I is summable.

Exercise A.2.11.4. Let (a_i)_{i∈I} and (b_i)_{i∈I} be two families of FPSs.
(a) 1 If (a_i)_{i∈I} and (b_i)_{i∈I} are multipliable, is it necessarily true that the family (a_i + b_i)_{i∈I} is multipliable?
(b) 1 If (a_i)_{i∈I} is summable and (b_i)_{i∈I} is multipliable, is it necessarily true that the family (a_i + b_i)_{i∈I} is multipliable?
(c) 1 Does the answer to part (b) change if we additionally assume that b_i is invertible for each i ∈ I?

Exercise A.2.11.5. 5 Prove the following generalization of Proposition


3.11.30:
Let I be a set. For any i ∈ I, let Si be a set. Set

S = {(i, k ) | i ∈ I and k ∈ Si and k ̸= 0} .

For any i ∈ I and any k ∈ S_i, let p_{i,k} be an element of K[[x]]. Assume that

p_{i,0} = 1 for any i ∈ I satisfying 0 ∈ S_i.

Assume further that the family (p_{i,k})_{(i,k)∈S} is summable. Then, the product ∏_{i∈I} ∑_{k∈S_i} p_{i,k} is well-defined (i.e., the family (p_{i,k})_{k∈S_i} is summable for each i ∈ I, and the family (∑_{k∈S_i} p_{i,k})_{i∈I} is multipliable), and we have

∏_{i∈I} ∑_{k∈S_i} p_{i,k} = ∑_{(k_i)_{i∈I} ∈ ∏_{i∈I} S_i essentially finite} ∏_{i∈I} p_{i,k_i}.     (308)
Exercise A.2.11.6. 3 Prove Proposition 3.11.37 and Proposition 3.11.38.

A.2.12. The generating function of a weighted set

Exercise A.2.12.1. Recall the Motzkin paths and the Motzkin numbers mn ,
defined in Exercise A.2.1.6.

(a) 3 Consider the weighted set

M := {Motzkin paths from (0, 0) to (n, 0) with n ∈ N},

where the weight |P| of a Motzkin path P from (0, 0) to (n, 0) is defined to be n. Also consider the weighted set X = {1} (with weight |1| = 1) and the weighted set N (with weights given by |n| = n for each n ∈ N). Note that N ≅ 1 + X + X^2 + X^3 + · · · (an infinite disjoint union). Prove that

M ≅ N × (1 + X^2 × M × M).

(b) 2 Conclude that the weight generating function M = ∑_{n∈N} m_n x^n of M is

M = (1 − x − √((1 + x)(1 − 3x))) / (2x^2).

(c) 2 Prove that

m_n = −(1/2) ∑_{k=0}^{n+2} (−3)^{n+2−k} \binom{1/2}{k} \binom{1/2}{n+2−k} for each n ∈ N.
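The isomorphism in part (a) amounts to the functional equation M = 1 + xM + x²M² for the generating function. The Python sketch below computes Motzkin numbers from this equation and checks the binomial formula of part (c) with exact rational arithmetic (truncation order arbitrary):

```python
from fractions import Fraction

# Generalized binomial coefficient binom(1/2, k) as an exact rational.
def binom_half(k):
    r = Fraction(1)
    for i in range(k):
        r = r * (Fraction(1, 2) - i) / (i + 1)
    return r

# Motzkin numbers from M = 1 + x*M + x^2*M^2, coefficientwise:
# m_n = m_{n-1} + sum_{k=0}^{n-2} m_k m_{n-2-k}.
N = 10
m = [1]
for n in range(1, N):
    m.append(m[n - 1] + sum(m[k] * m[n - 2 - k] for k in range(n - 1)))

# Compare with the binomial-sum formula of part (c).
for n in range(N):
    formula = -Fraction(1, 2) * sum(
        Fraction(-3) ** (n + 2 - k) * binom_half(k) * binom_half(n + 2 - k)
        for k in range(n + 3)
    )
    assert formula == m[n]
print(m)  # prints [1, 1, 2, 4, 9, 21, 51, 127, 323, 835]
```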

Exercise A.2.12.2. (a) 3 Extend the analysis in Subsection 3.12.3 to domino tilings of height-4 rectangles by defining the weighted set

F := {faultfree domino tilings of R_{n,4} with n ∈ N}

and showing that F = x + x^2 + x^2/(1 − x^2) + 2 · x^2/(1 − x).
(b) 3 Use this to find an explicit formula for d_{n,4} (as defined in Definition 3.12.11 (e)) that uses only quadratic irrationalities. (Note that the formula will be rather intricate and contain nested square roots.)

Exercise A.2.12.3. 5 This exercise is about a variation on domino tilings. We


define “shapes” and the specific shapes Rn,m as in Definition 3.12.11. Fix a
positive integer k.
A k-omino means a size-k shape of the form

{(i + 1, j) , (i + 2, j) , . . . , (i + k, j)} (a “horizontal k-omino”) or


{(i, j + 1) , (i, j + 2) , . . . , (i, j + k)} (a “vertical k-omino”)

for some (i, j) ∈ Z2 .


A k-omino tiling of a shape S is a set partition of S into k-ominos (i.e., a set
of disjoint k-ominos whose union is S).

Prove that the shape R_{n,m} has a k-omino tiling if and only if we have k | n or k | m.
[Hint: Consider R_{n,m} as a weighted set, where the weight of a square (i, j) ∈ N^2 is defined to be i + j. If R_{n,m} has a k-omino tiling, then show that the weight generating function R_{n,m} = ∑_{(i,j)∈R_{n,m}} x^{i+j} must be divisible by 1 + x + x^2 + · · · + x^{k−1} (as a polynomial in Q[x], for example). However, R_{n,m} has a simple form.]

A.2.13. Limits of FPSs

Exercise A.2.13.1. (a) 1 Prove Proposition 3.13.13.
(b) 2 Prove Proposition 3.13.12.
(c) 1 Let (f_0, f_1, f_2, . . .) be a sequence of FPSs that have compositional inverses. (See Exercise A.2.5.2 for the definition of a compositional inverse.) Let f be a further FPS such that lim_{i→∞} f_i = f. Prove that f also has a compositional inverse, and furthermore that

lim_{i→∞} f̃_i = f̃,

where g̃ denotes the compositional inverse of an FPS g.

Exercise A.2.13.2. (a) 2 Let f ∈ K[[x]] be any FPS whose constant term [x^0] f is nilpotent. (An element u ∈ K is said to be nilpotent if there exists some m ∈ N such that u^m = 0.) Prove that lim_{i→∞} f^i = 0.
(b) 2 Assume that K is an integral domain. Let g ∈ K[[x]] be any FPS. Prove that lim_{i→∞} g^i exists if and only if g = 1 or [x^0] g = 0.

Exercise A.2.13.3. 5 Recall the Catalan numbers c_0, c_1, c_2, . . . introduced in Example 2 in Section 3.1, and the corresponding FPS C(x) = ∑_{n∈N} c_n x^n. Prove that

1 − 1/C(x) = x / (1 − x / (1 − x / (1 − · · · ))),

where the continued fraction on the right hand side is to be understood as the limit, for n → ∞, of the n-layered finite continued fraction x / (1 − x / (1 − · · · / (1 − x / (1 − x)) · · · )) with n layers.
(This requires checking that the n-layered finite continued fractions are well-defined and converge to a limit in K[[x]].)

Exercise A.2.13.4. Let (a_0, a_1, a_2, . . .) be an infinite sequence of FPSs in K[[x]] such that lim_{n→∞} a_n exists. Prove the following:
(a) 2 We have

∑_{i∈N} (a_i − a_{i+1}) = a_0 − lim_{n→∞} a_n

(and, in particular, the family (a_i − a_{i+1})_{i∈N} is summable).
(b) 2 If all the FPSs a_1, a_2, a_3, . . . are invertible, then

∏_{i∈N} (a_i / a_{i+1}) = a_0 / lim_{n→∞} a_n

(and, in particular, the family (a_i / a_{i+1})_{i∈N} is multipliable).
(These are infinite analogues of the telescope rules for sums and products.)

A.2.14. Laurent power series

Exercise A.2.14.1. 2 While K[[x^±]] is not a ring, some elements of K[[x^±]] can still be multiplied. For instance, define three elements a, b, c ∈ K[[x^±]]

by

a = 1 + x^{−1} + x^{−2} + x^{−3} + · · · ,
b = 1 − x,
c = 1 + x + x^2 + x^3 + · · · .

(a) Find ab and bc and a (bc) and ( ab) c.


(b) Why is it not surprising that a (bc) ̸= ( ab) c ?

Exercise A.2.14.2. 2 Let K be a field. Prove that the K-algebra K (( x )) is a


field.
[Hint: You can take it for granted that K (( x )) is a commutative K-algebra,
as the proof is a mutatis-mutandis variant of the analogous proof for usual
FPSs.]

Here are some applications of Laurent polynomials:

Exercise A.2.14.3. Let n ∈ N.
(a) 5 Prove that

∑_{k=0}^{n} (−2)^{n−k} \binom{n}{k} \binom{2k}{k} = \binom{n}{n/2}.

(Recall that \binom{n}{i} = 0 whenever i ∉ N.)
(b) 2 More generally, prove that

∑_{k=0}^{n} (−2)^{n−k} \binom{n}{k} \binom{2k}{k+p} = \binom{n}{(n+p)/2} for any p ∈ Z.

[Hint: Compute the coefficients of the Laurent polynomial ((x + x^{−1})^2 − 2)^n in two ways.]
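Identities of this kind are easy to spot-check before proving them. The following Python sketch verifies part (b) for all n < 12 and |p| ≤ 3 (the ranges are arbitrary), with the convention that a binomial coefficient with an out-of-range lower index is 0:

```python
from math import comb

# Numerical check of Exercise A.2.14.3 (b); part (a) is the case p = 0.
def C(n, i):
    # binomial coefficient, extended by 0 for negative lower index
    return comb(n, i) if i >= 0 else 0

for n in range(12):
    for p in range(-3, 4):
        lhs = sum((-2) ** (n - k) * comb(n, k) * C(2 * k, k + p)
                  for k in range(n + 1))
        # binom(n, (n+p)/2) vanishes unless (n+p)/2 is a nonnegative integer
        rhs = C(n, (n + p) // 2) if (n + p) % 2 == 0 else 0
        assert lhs == rhs
print("identity verified for n < 12, |p| <= 3")
```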

Exercise A.2.14.4. For any Laurent series f = ∑_{n∈Z} f_n x^n ∈ K((x)) (with f_n ∈ K), we define the residue of f to be its x^{−1}-coefficient f_{−1}. We denote this residue by Res f. (This is an algebraic analogue of the “residue at 0” from complex analysis.)
The order ord f of a nonzero Laurent series f = ∑_{n∈Z} f_n x^n ∈ K((x)) (with f_n ∈ K) shall mean the smallest n ∈ Z satisfying f_n ̸= 0. (This is well-defined, since all sufficiently low n ∈ Z satisfy f_n = 0 by the definition of a Laurent series.) The trailing coefficient of a nonzero Laurent series f ∈ K((x)) means

the coefficient [x^{ord f}] f (that is, the x^{ord f}-coefficient of f). For example, the Laurent series −x^{−2} + 3 + 7x has order −2 and trailing coefficient −1.
The derivative f′ of a Laurent series f ∈ K((x)) is defined as follows: If f = ∑_{n∈Z} f_n x^n with f_n ∈ K, then f′ := ∑_{n∈Z} n f_n x^{n−1}.
Prove the following:
(a) 1 Any f ∈ K((x)) satisfies Res(f′) = 0.
(b) 3 Any n ∈ N and any f ∈ K((x)) satisfy Res(f^n f′) = 0.
(c) 1 If f ∈ K((x)) is a nonzero Laurent series whose trailing coefficient is invertible (in K), then f is invertible in K((x)). (Keep in mind that the word “invertible” refers to multiplicative inverses, not compositional inverses.)
(d) 3 If f ∈ K((x)) is a nonzero Laurent series whose trailing coefficient is invertible (in K), then each n ∈ Z satisfies

Res(f^n f′) = 0 if n ̸= −1, and Res(f^n f′) = ord f if n = −1.

(To be fully precise, “ord f” here means the element (ord f) · 1_K of the ring K.)
[Hint: In parts of this exercise, it may be expedient to first prove the claim
under the assumption that K is a Q-algebra (so that 1, 2, 3, . . . can be divided
by in K), and then to argue that the assumption can be lifted.]

Exercise A.2.14.5. Let f = ∑_{n>0} f_n x^n (with f_1, f_2, f_3, . . . ∈ K) be an FPS in K[[x]] whose constant term is 0. Assume that f has a compositional inverse g = ∑_{n>0} g_n x^n (with g_1, g_2, g_3, . . . ∈ K).
(a) 2 Prove that there exists a unique FPS h ∈ K[[x]] with x = f h. (This FPS h is usually denoted by x/f, but this notation is not an instance of Definition 3.3.5 (b), since f is not invertible.)
(b) 4 Prove the Lagrange inversion formula, which says that

n · g_n = [x^{n−1}](h^n) for any positive integer n.

(c) 2 The Lambert W series is defined to be the compositional inverse of the FPS x · exp[x] = ∑_{n∈N} x^{n+1}/n!. Find an explicit formula for the x^n-coefficient of this series.
(d) 2 Consider again the FPS C(x) = c_0 + c_1 x + c_2 x^2 + · · · ∈ Q[[x]] from Example 2 in Section 3.1. Let us rename it as C. We proved the equality

C = 1 + xC2 in that example (albeit we wrote it as C ( x ) = 1 + x (C ( x ))2 ). Set


f = x − x2 and g = xC. Show that the FPS g is a compositional inverseof  f.
1 2n
Use the Lagrange inversion formula to reprove the formula cn =
n+1 n
without the quadratic formula.
(e) 2 Let m be a positive integer. Let D ∈ Q [[ x ]] be an FPS with constant
term 1 that satisfies D = 1 + x m−1 D m . Find an explicit formula for the x n -
coefficient of D. (This generalizes part (d), which is obtained for m = 2.)

[Hint: For part (b), take derivatives of both sides in g ◦ f = x to obtain


( g′
◦ f ) · f ′ = 1; then divide by f n in the Laurent series ring K (( x )), and
rewrite g′ ◦ f as ∑ kgk f k−1 . Now take residues and use Exercise A.2.14.4
k >0
(d).]
[Remarks: Part (b) is remarkable for connecting the compositional inverse
g of f with the multiplicative inverse h of f /x. Since multiplicative inverses
are usually easier to compute, it is a helpful tool for the computation of
compositional inverses.
The Lambert W series in part (c) is the Taylor series of the Lambert W
function.]
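The following Python sketch illustrates parts (b) and (d) numerically: it computes the compositional inverse g of f = x − x² coefficient by coefficient (using that f_1 = 1) and checks n · g_n = [x^{n−1}](h^n), where h = x/f = 1/(1 − x) and hence [x^{n−1}](h^n) = \binom{2n−2}{n−1}:

```python
from math import comb

# Lagrange inversion check for f = x - x^2, with truncated series.
N = 12

def mul(f, g):
    h = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < N:
                h[i + j] += fi * gj
    return h

f = [0, 1, -1] + [0] * (N - 3)
g = [0, 1] + [0] * (N - 2)       # start with g_1 = 1/f_1 = 1
for n in range(2, N):
    # choose g_n so that [x^n](f ∘ g) = 0; with g_n still 0, the defect
    # is comp[n], and it enters the composition with coefficient f_1 = 1
    comp, power = [0] * N, [1] + [0] * (N - 1)
    for fk in f:
        comp = [c + fk * p for c, p in zip(comp, power)]
        power = mul(power, g)
    g[n] = -comp[n]

for n in range(1, N):
    assert n * g[n] == comb(2 * n - 2, n - 1)       # Lagrange inversion
    assert g[n] == comb(2 * (n - 1), n - 1) // n    # so g_n = c_{n-1}
print(g[1:7])  # prints [1, 1, 2, 5, 14, 42], the Catalan numbers shifted
```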

The following exercise tells a cautionary tale about applying some of our
results past their stated assumptions:

Exercise A.2.14.6. 2 Extending Definition 3.13.3 to the K-module K[[x^±]], we obtain the notion of a limit of a sequence of doubly infinite power series.
(a) Prove that lim_{n→∞} (x^n + x^{−n}) = 0.
(b) Prove that lim_{n→∞} ((x^n + x^{−n})^2) = 2.
(c) Can Theorem 3.13.8 be generalized to Laurent series instead of FPSs?
[Note: This does not mean that the notion of limits of sequences of Laurent series is completely useless. They behave reasonably as long as multiplication is not involved.]

A.2.15. Multivariate FPSs

Exercise A.2.15.1. 2 Let k ∈ N. Prove that

∑_{n∈N} \binom{n}{k} x^n = x^k / (1 − x)^{k+1}

without using multivariate power series.

Exercise A.2.15.2. 2 Prove that

∑_{n∈N} x^n / (1 − y q^n) = ∑_{n∈N} y^n / (1 − x q^n) in the ring K[[x, y, q]].

The next two exercises are concerned with FPSs in two indeterminates x and y.

Exercise A.2.15.3. 4 For any n ∈ N and m ∈ N, we let f(m, n) denote the # of n-tuples (α_1, α_2, . . . , α_n) of integers satisfying |α_1| + |α_2| + · · · + |α_n| ≤ m.
(a) Prove that ∑_{(n,m)∈N×N} f(m, n) x^m y^n = 1/(1 − x − y − xy) in K[[x, y]].
(b) Prove that f(m, n) = f(n, m) for all n, m ∈ N.

Exercise A.2.15.4. 4 Let f ∈ K[[x, y]] be an FPS such that each positive integer b satisfies f(x, x^b) = 0. Prove that f = 0.

Exercise A.2.15.5. 1 Recall the Stirling numbers of the 2nd kind S(n, k) studied in Exercise A.2.6.5 and Exercise A.2.7.2. Show that in Q[[x, y]], we have

∑_{n∈N} ∑_{k∈N} S(n, k)/n! · x^n y^k = exp[y · (exp[x] − 1)].

Exercise A.2.15.6. 3 Recall the Eulerian polynomials A_m ∈ Z[x] from Exercise A.2.6.2. Prove that in Q[[x, y]], we have

∑_{n∈N} A_n · y^n/n! = (1 − x) / (1 − x exp[(1 − x)y]).

Next comes an application of multivariate polynomials to proving a famous


binomial identity:

Exercise A.2.15.7. In this exercise, we shall prove Dixon’s identity, which states that

∑_{k∈Z} (−1)^k \binom{b+c}{c+k} \binom{c+a}{a+k} \binom{a+b}{b+k} = (a + b + c)! / (a! b! c!)     (309)

for any a, b, c ∈ N.

(a) 1 Set

F(a, b, c) := ∑_{k∈Z} (−1)^k \binom{b+c}{c+k} \binom{c+a}{a+k} \binom{a+b}{b+k}

for any a, b, c ∈ N. Prove that F(a, b, c) is well-defined (i.e., the sum in this definition is summable).
(b) 1 Prove that (309) holds whenever a = 0 or b = 0 or c = 0.
(c) 3 Prove that every a, b, c ∈ N satisfy

F(a, b, c) = (−1)^{a+b+c} · [x^{2a} y^{2b} z^{2c}] ((y − z)^{b+c} (z − x)^{c+a} (x − y)^{a+b}).

(Here, we are using polynomials in three indeterminates x, y, z.)
(d) 3 Prove that every three positive integers a, b, c satisfy

F(a, b, c) = F(a − 1, b, c) + F(a, b − 1, c) + F(a, b, c − 1).

(e) 1 Prove that every three positive integers a, b, c satisfy

(a + b + c)! / (a! b! c!) = (a − 1 + b + c)! / ((a − 1)! b! c!) + (a + b − 1 + c)! / (a! (b − 1)! c!) + (a + b + c − 1)! / (a! b! (c − 1)!).

(f) 1 Prove (309).
(g) 2 Show that each n ∈ N satisfies

∑_{k=0}^{n} (−1)^k \binom{n}{k}^3 = (−1)^{n/2} (3n/2)! / ((n/2)!)^3 if n is even, and = 0 if n is odd.

(h) 2 Show that

∑_{k∈Z} (−1)^k \binom{2a}{a+k} \binom{2b}{b+k} \binom{2c}{c+k} = (a + b + c)! / (a! b! c!) · (2a)! (2b)! (2c)! / ((b + c)! (c + a)! (a + b)!)

for any a, b, c ∈ N.
[Hint: For part (d), use the fact that

x^2 (y − z) + y^2 (z − x) + z^2 (x − y) = −(y − z)(z − x)(x − y),

and keep in mind that [x^i y^j z^k](x^m p) = [x^{i−m} y^j z^k] p for any i, j, k, m, p with i ≥ m.]
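Dixon’s identity is easy to test by brute force before proving it. The following Python sketch checks (309) for all a, b, c < 5; the sum is restricted to |k| ≤ min(a, b, c), outside of which the summand vanishes:

```python
from math import comb, factorial

# Brute-force verification of Dixon's identity (309) for small a, b, c.
def C(n, k):
    return comb(n, k) if 0 <= k <= n else 0

def F(a, b, c):
    m = min(a, b, c)  # the summand vanishes for |k| > m
    return sum((-1) ** k * C(b + c, c + k) * C(c + a, a + k) * C(a + b, b + k)
               for k in range(-m, m + 1))

for a in range(5):
    for b in range(5):
        for c in range(5):
            rhs = factorial(a + b + c) // (factorial(a) * factorial(b) * factorial(c))
            assert F(a, b, c) == rhs
print(F(2, 2, 2))  # prints 90, which is 6!/(2!2!2!)
```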

A.3. Integer partitions and q-binomial coefficients


The notations of Chapter 4 shall be used here.

A.3.1. Partition basics

Exercise A.3.1.1. The purpose of this exercise is to make the proof of Propo-
sition 4.1.15 rigorous.
For any partition λ = (λ1 , λ2 , . . . , λk ), we define the Young diagram Y (λ) of
λ to be the finite set

{(i, j) | i ∈ {1, 2, . . . , k} and j ∈ {1, 2, . . . , λi }} .

Visually, this set Y(λ) is represented by drawing each (i, j) ∈ Y(λ) as a cell of an (invisible) matrix, namely as the cell in row i and in column j. The resulting picture is a table of k left-aligned rows, where the i-th row (counted from the top) has exactly λ_i cells. For example, if λ = (4, 2, 1), then the Young diagram Y(λ) of λ is drawn as three left-aligned rows of 4, 2 and 1 cells, respectively.

(This is only one way to draw Young diagrams; it is known as English notation
or matrix notation, since our labeling of cells matches the way the cells of a
matrix are commonly labeled. If we flip our pictures across a horizontal axis,
we would get French notation aka Cartesian notation, as the labeling of cells
would then match the Cartesian coordinates of their centers.)
(a) 1 Prove that |Y (λ)| = |λ| for any partition λ.
(b) 1 Prove that the Young diagram Y (λ) uniquely determines the parti-
tion λ.
A NW-set shall mean a subset S of {1, 2, 3, . . .}2 with the following prop-
erty: If (i, j) ∈ S and (i′ , j′ ) ∈ {1, 2, 3, . . .}2 satisfy i′ ≤ i and j′ ≤ j, then
(i′ , j′ ) ∈ S as well. (In terms of our above visual model, this means that walk-
ing northwest from a cell of S never moves you out of S, unless you walk out
of the matrix. For example, any set whose topmost row does not begin in column 1 is not a NW-set, since the left neighbor of the leftmost cell in that topmost row is not in the set.)
(c) 1 Prove that Y (λ) is a NW-set for each partition λ.

(d) 2 Prove that any finite NW-set has the form Y (λ) for a unique parti-
tion λ.
Now, let flip : {1, 2, 3, . . .}2 → {1, 2, 3, . . .}2 be the map that sends each
(i, j) ∈ {1, 2, 3, . . .}2 to ( j, i ). Visually, this map flip is a reflection in the
“main diagonal” (the diagonal going from the northwest to the southeast).
We can apply flip to a subset of {1, 2, 3, . . .}2 by applying flip to each element
of this subset. (For instance, flip sends the Young diagram of (4, 2, 1) to the Young diagram of (3, 2, 1, 1).)

(e) 1 Prove that for any partition λ, there is a unique partition λ^t such that Y(λ^t) = flip(Y(λ)).
This partition λ^t is called the transpose (or conjugate) of λ.
(f) 1 Prove that if λ = (λ_1, λ_2, . . . , λ_k) is a partition, then the partition λ^t has exactly λ_1 parts. (Here, we set λ_1 = 0 if k = 0.)
(g) 1 Prove that if λ = (λ_1, λ_2, . . . , λ_k), then the i-th part of the partition λ^t equals the # of all j ∈ {1, 2, . . . , k} such that λ_j ≥ i.
(h) 1 Prove that |λ^t| = |λ| and (λ^t)^t = λ for any partition λ.
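The description of λ^t in part (g) doubles as an algorithm. The Python sketch below (partitions as lists; names ad hoc) uses it to transpose partitions and checks parts (f) and (h) on the example λ = (4, 2, 1):

```python
# Transpose of a partition via part (g): the i-th part of λ^t is
# the number of indices j with λ_j >= i.
def transpose(lam):
    if not lam:
        return []
    return [sum(1 for part in lam if part >= i) for i in range(1, lam[0] + 1)]

lam = [4, 2, 1]
assert transpose(lam) == [3, 2, 1, 1]
assert transpose(transpose(lam)) == lam      # part (h): (λ^t)^t = λ
assert sum(transpose(lam)) == sum(lam)       # part (h): |λ^t| = |λ|
assert len(transpose(lam)) == lam[0]         # part (f): λ^t has λ_1 parts
print(transpose(lam))  # prints [3, 2, 1, 1]
```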

Exercise A.3.1.2. 5 A partition will be called binarial if all its parts are powers
of 2. For instance, (8, 2, 2, 1) is a binarial partition of 13. Recall that the length
of a partition λ is denoted by ℓ (λ).
Let n > 1 be an integer. Prove that

∑_{λ is a binarial partition of n} (−1)^{ℓ(λ)} = 0.

In other words, prove that the # of binarial partitions of n having even length
equals the # of binarial partitions of n having odd length.

Exercise A.3.1.3. 3 A partition will be called trapezoidal if it has the form


( j, j − 1, j − 2, . . . , i ) for some integers i ≤ j. (For instance, (4, 3, 2) and (5)
are trapezoidal partitions.)
Let n be a positive integer. Prove that

    (# of trapezoidal partitions of n) = (# of odd positive divisors of n) .
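Both counts are easy to compare by brute force; here is a small sketch (not part of the notes):

```python
def num_trapezoidal(n):
    # Count runs j, j-1, ..., i of consecutive integers (i <= j) summing to n.
    count = 0
    for i in range(1, n + 1):
        total = 0
        for j in range(i, n + 1):
            total += j
            if total == n:
                count += 1
            if total >= n:
                break
    return count

def num_odd_divisors(n):
    return sum(1 for d in range(1, n + 1, 2) if n % d == 0)

for n in range(1, 200):
    assert num_trapezoidal(n) == num_odd_divisors(n)
```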
Convention A.3.1. Let n be an integer. The notation “λ ⊢ n” shall mean “λ
is a partition of n”. Thus, for example, the summation sign “∑_{λ⊢n}” means a
sum over all partitions λ of n.
Exercise A.3.1.4. 5 If λ is a partition, and if i is a positive integer, then mi (λ)
shall mean the # of parts of λ that are equal to i. For instance, m3 (5, 3, 3, 2) =
2 and m4 (5, 3, 3, 2) = 0.
Fix an n ∈ N.
(a) Prove that

    ∏_{λ⊢n} ∏_{i=1}^{∞} (mi (λ))! = ∏_{λ⊢n} ∏_{i=1}^{∞} i^{mi (λ)} .
(b) More generally, prove that

    ∏_{λ⊢n} ∏_{(i,j)∈{1,2,3,...}^2 ; j≤mi (λ)} x_j = ∏_{λ⊢n} ∏_{i=1}^{∞} x_i^{mi (λ)}

as monomials in x1 , x2 , x3 , . . ..
(For example, for n = 3, this is saying that

    (x1) · (x1 x1) · (x1 x2 x3) = (x3^1) · (x1^1 x2^1) · (x1^3),

where on each side the three factors correspond to the partitions λ = (3),
λ = (2, 1) and λ = (1, 1, 1) of 3, respectively.
Make sure you understand why part (a) is a particular case of (b).)
Exercise A.3.1.5. Recall the notations from Exercise A.3.1.1.
We say that a partition λ is a single-cell upgrade of a partition µ if we have
Y (µ) ⊆ Y (λ) and |Y (λ) \ Y (µ)| = 1. (This is just saying that the Young
diagram of λ is obtained from that of µ by adding one single cell.)
For instance, the single-cell upgrades of the partition (2, 2, 1) are (3, 2, 1),
(2, 2, 2) and (2, 2, 1, 1). On the other hand, the partition (2, 2, 1) is a single-cell
upgrade of each of the partitions (2, 2) and (2, 1, 1).
For any partition λ, let γ (λ) denote the # of distinct parts of λ. For in-
stance, γ (5, 5, 3, 2, 1, 1) = 4.
Prove the following:
(a) 2 For any partition λ, we have

    (# of partitions µ such that λ is a single-cell upgrade of µ) = γ (λ) .
(b) 3 For any partition λ, we have
(# of single-cell upgrades of λ)
= (# of partitions µ such that λ is a single-cell upgrade of µ) + 1.
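Parts (a) and (b) can be verified exhaustively for small n. In the following sketch (not part of the notes; all names are ours), downgrades(lam) lists the partitions µ such that λ is a single-cell upgrade of µ, while upgrades(lam) lists the single-cell upgrades of λ:

```python
def partitions(n, max_part=None):
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for p in range(min(n, max_part), 0, -1):
        for rest in partitions(n - p, p):
            yield (p,) + rest

def upgrades(lam):
    # partitions obtained from lam by adding one cell to its Young diagram
    out = set()
    rows = list(lam) + [0]
    for i in range(len(rows)):
        if i == 0 or rows[i] < rows[i - 1]:
            mu = rows.copy()
            mu[i] += 1
            out.add(tuple(p for p in mu if p > 0))
    return out

def downgrades(lam):
    # partitions obtained from lam by removing one cell from its Young diagram
    out = set()
    rows = list(lam)
    for i in range(len(rows)):
        if i == len(rows) - 1 or rows[i] > rows[i + 1]:
            mu = rows.copy()
            mu[i] -= 1
            out.add(tuple(p for p in mu if p > 0))
    return out

for n in range(0, 11):
    for lam in partitions(n):
        gamma = len(set(lam))                    # number of distinct parts
        assert len(downgrades(lam)) == gamma     # part (a)
        assert len(upgrades(lam)) == gamma + 1   # part (b)
```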

(c) 3 We have

    ∑_{λ is a partition} γ (λ) x^{|λ|} = (x/(1 − x)) · ∏_{i=1}^{∞} 1/(1 − x^i)   in Z [[ x ]] .
Exercise A.3.1.6. 4 Let d be a positive integer. Let n ∈ N. Prove that

    (# of partitions of n that have no part divisible by d)
    = (# of partitions of n that have no d equal parts) .
(For instance, the partition (4, 2, 2, 2, 2) has no part divisible by 3, but it has
3 equal parts.)
[Remark: Theorem 4.1.14 is the particular case of this exercise for d = 2.]
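A brute-force check of this claim for small n and several values of d (a sketch, not part of the notes):

```python
from collections import Counter

def partitions(n, max_part=None):
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for p in range(min(n, max_part), 0, -1):
        for rest in partitions(n - p, p):
            yield (p,) + rest

for d in [2, 3, 4]:
    for n in range(0, 16):
        no_div = sum(1 for lam in partitions(n)
                     if all(p % d != 0 for p in lam))
        no_rep = sum(1 for lam in partitions(n)
                     if all(c < d for c in Counter(lam).values()))
        assert no_div == no_rep
```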
Exercise A.3.1.7. 4 Let n ∈ N. Let pdist odd (n) be the # of partitions λ of n
such that all parts of λ are distinct and odd. (For example, (7, 3, 1) is such
a partition.) Let p+ (n) be the # of partitions of n that have an even # of
even parts. (For example, (5, 4, 2) is such a partition.) Let p− (n) be the # of
partitions of n that have an odd # of even parts. (For example, (5, 4, 1) is such
a partition.) Prove that

    p+ (n) − p− (n) = pdist odd (n) .
Exercise A.3.1.8. For any partition λ = (λ1 , λ2 , . . . , λk ), we define the integer

    alt λ = ∑_{i=1}^{k} (−1)^{i−1} λi = λ1 − λ2 + λ3 − λ4 ± · · · + (−1)^{k−1} λk .

(a) 1 Prove that alt λ ∈ N for each partition λ.
(b) 1 Define the transpose λt of a partition λ as in Exercise A.3.1.1. Show
that alt λt = (# of odd parts of λ) for any partition λ.
(c) 8 Prove that each n, k ∈ N satisfy

    (# of partitions λ of n into odd parts such that ℓ (λ) = k)
    = (# of partitions λ of n into distinct parts such that alt λ = k) .

(d) 1 Derive a new proof of Theorem 4.1.14 from this.
Exercise A.3.1.9. Let n ∈ N. Let Parn denote the set of all partitions of n.
We define a partial order ≼ on the set Parn as follows: For two partitions
λ = (λ1 , λ2 , . . . , λk ) and µ = (µ1 , µ2 , . . . , µℓ ), we set λ ≼ µ if and only if each
positive integer i satisfies

λ1 + λ2 + · · · + λ i ≤ µ1 + µ2 + · · · + µ i . (310)

Here, we set λ j := 0 for each j > k, and we set µ j := 0 for each j > ℓ.
(For example, for n = 5, we have (2, 1, 1, 1) ≼ (2, 2, 1), since we have

2 ≤ 2,
2 + 1 ≤ 2 + 2,
2 + 1 + 1 ≤ 2 + 2 + 1,
2 + 1 + 1 + 1 ≤ 2 + 2 + 1 + 0,
2 + 1 + 1 + 1 + 0 ≤ 2 + 2 + 1 + 0 + 0,

and so on. Note that there are infinitely many inequalities to be checked,
but only finitely many of them are relevant, since both sides of (310) are
essentially finite sums that stop growing at some point.)
(a) 1 For any n ≥ 6, find two partitions λ and µ of n satisfying neither
λ ≼ µ nor µ ≼ λ. (This shows that ≼ is not a total order for n ≥ 6.)
(b) 2 Prove that two partitions λ = (λ1 , λ2 , . . . , λk ) and µ =
(µ1 , µ2 , . . . , µℓ ) of n satisfy λ ≼ µ if and only if each i ∈ {1, 2, . . . , k} sat-
isfies (310).
(c) 2 Prove that two partitions λ = (λ1 , λ2 , . . . , λk ) and µ = (µ1 , µ2 , . . . , µℓ )
of n satisfy λ ≼ µ if and only if each i ∈ {1, 2, . . . , ℓ} satisfies (310).
(d) 4 Prove that two partitions λ and µ of n satisfy λ ≼ µ if and only if
they satisfy µt ≼ λt . (See Exercise A.3.1.1 for the definition of λt and µt .)
[Note: The partial order ≼ is called the dominance order or the majorization
order; it is rather important in the theory of symmetric functions.]
[Hint: It is helpful to identify a partition λ = (λ1 , λ2 , . . . , λk ) with the
weakly decreasing essentially finite sequence λ̂ = (λ1 , λ2 , . . . , λk , 0, 0, 0, . . .).]

Exercise A.3.1.10. For any two partitions λ = (λ1 , λ2 , . . . , λk ) and µ =
(µ1 , µ2 , . . . , µℓ ), we define two partitions λ + µ and λ ⊔ µ as follows:
• We let λ + µ be the partition (λ1 + µ1 , λ2 + µ2 , . . . , λm + µm ), where
we set m = max {k, ℓ}, and where we set λ j := 0 for each j > k, and
where we set µ j := 0 for each j > ℓ.
• We let λ ⊔ µ be the partition obtained by sorting the entries of the list
(λ1 , λ2 , . . . , λk , µ1 , µ2 , . . . , µℓ ) in weakly decreasing order.
For example, (3, 2, 1) + (4, 2) = (3 + 4, 2 + 2, 1 + 0) = (7, 4, 1) and
(3, 2, 1) ⊔ (4, 2) = (4, 3, 2, 2, 1).
Let λ and µ be two partitions. Recall the definition of the transpose of a
partition (Exercise A.3.1.1).
(a) 2 Prove that (λ ⊔ µ)t = λt + µt .
(b) 2 Prove that (λ + µ)t = λt ⊔ µt .
[Hint: Same as for Exercise A.3.1.9.]

Exercise A.3.1.11. 3 Prove Theorem 4.1.21 without any use of FPSs.
[Hint: Theorem 4.1.21 can be easily reduced to the case when I is finite;
then, induction on | I | becomes available as a technique. A bijective proof
also exists.]

A.3.2. Euler’s pentagonal number theorem
Euler’s pentagonal number theorem can be used freely in the following exer-
cises.

Exercise A.3.2.1. (a) 2 Prove that p (n) ≤ p (n − 1) + p (n − 2) for any n > 0.

(b) 3 Prove that p (n) ≤ p (n − 1) + p (n − 2) − p (n − 5) for any n > 0.
(c) 2 Prove that p (n) ≥ p (n − 1) + p (n − 2) − p (n − 5) − p (n − 7) for
any n ∈ N.
(d) 2 Use part (a) to obtain an upper bound for p (n) in terms of Fibonacci
numbers.
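The following sketch (not part of the notes) computes p (n) via Euler's pentagonal number theorem, which this subsection allows us to use freely, and then tests the inequalities of parts (a), (b) and (c) numerically:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(n):
    # partition numbers via Euler's pentagonal number recurrence
    if n < 0:
        return 0
    if n == 0:
        return 1
    total, k = 0, 1
    while k * (3 * k - 1) // 2 <= n:
        sign = 1 if k % 2 == 1 else -1
        total += sign * (p(n - k * (3 * k - 1) // 2)
                         + p(n - k * (3 * k + 1) // 2))
        k += 1
    return total

assert [p(n) for n in range(10)] == [1, 1, 2, 3, 5, 7, 11, 15, 22, 30]
for n in range(1, 80):
    assert p(n) <= p(n - 1) + p(n - 2)                         # (a)
    assert p(n) <= p(n - 1) + p(n - 2) - p(n - 5)              # (b)
for n in range(0, 80):
    assert p(n) >= p(n - 1) + p(n - 2) - p(n - 5) - p(n - 7)   # (c)
```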

A.3.3. Jacobi’s triple product identity

Exercise A.3.3.1. (a) 3 Prove that

    ∏_{m=1}^{∞} (1 − x^m)/(1 + x^m) = ∑_{k∈Z} (−1)^k x^{k^2} .

(b) 3 Prove that

    ∏_{m=1}^{∞} (1 − x^{2m})/(1 − x^{2m−1}) = ∑_{k∈N} x^{k(k+1)/2} .

[Hint: Both times, start by substituting appropriate values for q and x in
the Jacobi Triple Product Identity.]
Exercise A.3.3.2. 4 Prove that

    ∏_{n=1}^{∞} (1 − u^n v^{n−1}) (1 − u^{n−1} v^n) (1 − u^n v^n) = ∑_{k∈Z} (−1)^k u^{k(k−1)/2} v^{k(k+1)/2}

in the FPS ring K [[u, v]].
[Hint: This is the Jacobi Triple Product Identity after (or, better, before) a
substitution.]

Exercise A.3.3.3. (a) 6 Prove that

    ∏_{n=1}^{∞} (1 − x^n)^3 = ∑_{k=0}^{∞} (−1)^k (2k + 1) x^{k(k+1)/2}   in K [[ x ]] .

(b) 3 Conclude that

    ∏_{n=1}^{∞} (1 − x^n)^3 = ∑_{k∈Z} (4k + 1) x^{k(2k+1)}   in K [[ x ]] .
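Identities like part (a) can be checked coefficient-by-coefficient on truncated power series. A small sketch (not part of the notes; here K = Z, and we compare coefficients of x^0 through x^59):

```python
N = 60  # truncation order

def mul(f, g):
    # product of two coefficient lists, truncated to length N
    h = [0] * N
    for i, a in enumerate(f):
        if a:
            for j, b in enumerate(g):
                if i + j >= N:
                    break
                h[i + j] += a * b
    return h

lhs = [1] + [0] * (N - 1)
for n in range(1, N):
    fac = [0] * N
    fac[0], fac[n] = 1, -1   # 1 - x^n
    for _ in range(3):       # multiply its cube into the running product
        lhs = mul(lhs, fac)

rhs = [0] * N
k = 0
while k * (k + 1) // 2 < N:
    rhs[k * (k + 1) // 2] += (-1) ** k * (2 * k + 1)
    k += 1
assert lhs == rhs
```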

A.3.4. q-binomial coefficients

Exercise A.3.4.1. (a) 1 Prove Proposition 4.4.10.
(b) 3 Prove Theorem 4.4.12 (b).
(c) 3 Prove Theorem 4.4.13.
(d) 2 Prove Theorem 4.4.17.
(e) 1 Prove Proposition 4.4.18. (Don’t forget the case k > n.)

Exercise A.3.4.2. (a) 2 Prove Theorem 4.4.19 by induction on n.
(b) 2 Prove Theorem 4.4.21.

Exercise A.3.4.3. 4 Let n, k ∈ N satisfy n ≥ k. Prove the following:
(a) The polynomial \binom{n}{k}_q ∈ Z [q] has degree k (n − k).
(b) For each i ∈ {0, 1, . . . , k (n − k )}, we have

    [q^i] \binom{n}{k}_q = [q^{k(n−k)−i}] \binom{n}{k}_q .

(That is, the sequence of coefficients of the polynomial \binom{n}{k}_q is palindromic.)
(c) We have \binom{n}{k}_{q^{−1}} = q^{−k(n−k)} \binom{n}{k}_q in the Laurent polynomial
ring Z [q± ].
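Parts (a) and (b) can be observed experimentally. The sketch below (not part of the notes) builds the coefficient lists of the q-binomial coefficients from a standard q-Pascal recurrence, \binom{n}{k}_q = \binom{n−1}{k−1}_q + q^k \binom{n−1}{k}_q, and checks the degree and palindromicity claims:

```python
def q_binomial(n, k):
    # Coefficient list (in q) of the Gaussian binomial coefficient,
    # built from the q-Pascal recurrence
    # binom(n,k)_q = binom(n-1,k-1)_q + q^k * binom(n-1,k)_q.
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    a = q_binomial(n - 1, k - 1)
    b = q_binomial(n - 1, k)
    c = [0] * (k * (n - k) + 1)
    for i, x in enumerate(a):
        c[i] += x
    for i, x in enumerate(b):
        c[i + k] += x
    return c

for n in range(0, 10):
    for k in range(0, n + 1):
        c = q_binomial(n, k)
        assert len(c) - 1 == k * (n - k) and c[-1] != 0   # (a): degree k(n-k)
        assert c == c[::-1]                               # (b): palindromic
```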

Exercise A.3.4.4. Let us extend the definition of the q-integer [n]q (Definition
4.4.15 (a)) to the case when n is a negative integer as follows: If n is a negative
integer, then we set

    [n]q := −q^{−1} − q^{−2} − · · · − q^{n} = − ∑_{k=n}^{−1} q^k ∈ Z [q± ] .

(This is a Laurent polynomial, not a polynomial any more.)
Furthermore, inspired by Theorem 4.4.17, let us extend the definition of
\binom{n}{k}_q to the case when n is a negative integer by setting

    \binom{n}{k}_q = [n]q [n − 1]q · · · [n − k + 1]q / [k]q !   for all n ∈ Z and k ∈ N

and

    \binom{n}{k}_q = 0   for all n ∈ Z and k ∉ N.

The right hand sides of these two equalities are to be understood in the ring
Z ((q)) of Laurent series. (From Theorem 4.4.17 and Convention 4.4.11, we
know that this definition does not conflict with our existing definitions of
\binom{n}{k}_q for n ∈ N.)
k q
(a) 1 Prove that [n]1 = n for any n ∈ Z.
(b) 1 Prove that \binom{n}{k}_1 = \binom{n}{k} for any n ∈ Z and k ∈ Z.
(c) 1 Prove that [n]q = (1 − q^n)/(1 − q) in the ring Z ((q)) of Laurent series
for any n ∈ Z.
(d) 1 Prove that [−n]q = −q^{−n} [n]q for any n ∈ Z.
(e) 1 Prove the “q-upper negation formula” (a q-analogue of Theorem 3.3.11):
If n ∈ Z and k ∈ Z, then

    \binom{−n}{k}_q = (−1)^k q^{−kn−k(k−1)/2} \binom{k + n − 1}{k}_q .
(f) 1 Prove that \binom{n}{k}_q ∈ Z [q± ] (that is, \binom{n}{k}_q is a Laurent polynomial,
not just a Laurent series) for any n ∈ Z and k ∈ Z.
(g) 2 Prove that Theorem 4.4.12 holds for any n ∈ Z (not just for positive
integers n).
(h) 1 Prove that Proposition 4.4.18 does not hold for negative n (in gen-
eral).

Exercise A.3.4.5. Consider the ring Z [[z, q]] of FPSs in two indeterminates z
and q. Let n ∈ N.
(a) 1 Prove that

    ∏_{i=0}^{n−1} (1 + zq^i) = ∑_{k=0}^{n} q^{k(k−1)/2} \binom{n}{k}_q z^k .

(b) 4 Prove that

    ∏_{i=0}^{n−1} 1/(1 − zq^i) = ∑_{k∈N} \binom{n + k − 1}{k}_q z^k ,

where we set \binom{−1}{0}_q := 1 (this is consistent with the definition in Exercise
A.3.4.4). (Up to sign, this is a q-analogue of Proposition 3.3.12.)

Exercise A.3.4.6. 5 Let n ∈ N. Prove the Gauss formula

    ∑_{k=0}^{n} (−1)^k \binom{n}{k}_q
        = 0, if n is odd;
        = (1 − q^1) (1 − q^3) (1 − q^5) · · · (1 − q^{n−1}), if n is even.
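Here is a small polynomial-arithmetic check of the Gauss formula for n ≤ 10 (a sketch, not part of the notes; polynomials in q are encoded as coefficient lists):

```python
def padd(a, b, sign=1):
    m = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + sign * (b[i] if i < len(b) else 0)
            for i in range(m)]

def pmul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def q_binomial(n, k):
    # coefficient list of the Gaussian binomial, via the q-Pascal recurrence
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    return padd(q_binomial(n - 1, k - 1),
                [0] * k + q_binomial(n - 1, k))

def strip(p):
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

for n in range(0, 11):
    total = [0]
    for k in range(0, n + 1):
        total = padd(total, q_binomial(n, k), (-1) ** k)
    total = strip(total)
    if n % 2 == 1:
        assert total == [0]
    else:
        prod = [1]
        for m in range(1, n, 2):
            prod = pmul(prod, [1] + [0] * (m - 1) + [-1])   # times 1 - q^m
        assert total == strip(prod)
```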

Exercise A.3.4.7. 5 Let n ∈ N. Prove that

    ∑_{k=0}^{n} \binom{n}{k}_{q^2} q^k = (1 + q^1) (1 + q^2) · · · (1 + q^n) .

Exercise A.3.4.8. Prove the following q-analogues of the Vandermonde con-
volution identity:
(a) 3 If a, b ∈ N and n ∈ N, then

    \binom{a + b}{n}_q = ∑_{k=0}^{n} q^{k(b−n+k)} \binom{a}{k}_q \binom{b}{n − k}_q .

(Note that the exponent k (b − n + k) might occasionally be negative, but in
those cases we have \binom{b}{n − k}_q = 0.)
(b) 3 If a, b ∈ N and n ∈ N, then

    \binom{a + b}{n}_q = ∑_{k=0}^{n} q^{(a−k)(n−k)} \binom{a}{k}_q \binom{b}{n − k}_q .

(Note that the exponent (a − k) (n − k) might occasionally be negative, but
in those cases we have \binom{a}{k}_q = 0.)

Exercise A.3.4.9. 3 Let r, s, u, v ∈ Z with r > s > 0. Prove that

    lim_{n→∞} \binom{rn + u}{sn + v}_q = ∏_{k=1}^{∞} 1/(1 − q^k)   in Z [[q]] .

(See Definition 3.13.3 for the meaning of “lim_{n→∞}” used here.)

Exercise A.3.4.10. We shall work in the ring (Z [z± ]) [[q]].
(a) 4 Prove MacMahon's identity, which says that

    ∏_{i=1}^{m} (1 + q^{2i−1} z) · ∏_{j=1}^{n} (1 + q^{2j−1} z^{−1}) = ∑_{k=−n}^{m} q^{k^2} \binom{m + n}{k + n}_{q^2} z^k

for any m, n ∈ N.
(b) 4 Recover Theorem 4.3.1 by taking the limit m → ∞ and then n → ∞.
(This yields a new proof of Theorem 4.3.1.)
The following exercise does not explicitly involve q-binomial coefficients, but
they can be used profitably in its solution:
Exercise A.3.4.11. We work in the ring Z [[z, q]] of FPSs in two indeterminates
z and q.
(a) 2 Prove that

    ∏_{i=1}^{∞} (1 + zq^i) = ∑_{k∈N} z^k q^{k(k+1)/2} / ((1 − q^1) (1 − q^2) · · · (1 − q^k)) .

(b) 2 Prove that

    ∏_{i=1}^{∞} 1/(1 − zq^i) = ∑_{k∈N} z^k q^k / ((1 − q^1) (1 − q^2) · · · (1 − q^k)) .

(c) 5 Prove that

    ∏_{i=1}^{∞} 1/(1 − zq^i)
    = ∑_{k∈N} z^k q^{k^2} / ((1 − q^1) (1 − q^2) · · · (1 − q^k) · (1 − zq^1) (1 − zq^2) · · · (1 − zq^k)) .

[Hint: For part (c), define the h-index h (λ) of a partition λ to be the largest
i ∈ N such that λ has at least i parts that are ≥ i. For instance, h (4, 3, 1, 1) = 2
and h (4, 3, 3, 1) = 3. Note that h (λ) is the size of the largest square that fits
into the Young diagram of λ. What remains if this square is removed from
the Young diagram of λ ? See also the Hirsch index.]

Exercise A.3.4.12. Let F be a finite field. Let n ∈ N and k ∈ N. Prove the
following:
(a) 3 If W is a k-dimensional F-vector space, then

    (# of n-tuples (w1 , w2 , . . . , wn ) of vectors in W that span W )
    = (| F |^n − 1) · (| F |^n − | F |) · · · · · (| F |^n − | F |^{k−1}) = ∏_{i=0}^{k−1} (| F |^n − | F |^i) .

(b) 3 Let m ∈ N. Then,

    (# of m × n-matrices A ∈ F^{m×n} satisfying rank A = k)
    = \binom{m}{k}_{|F|} · (| F |^n − 1) · (| F |^n − | F |) · · · · · (| F |^n − | F |^{k−1})
    = \binom{m}{k}_{|F|} · \binom{n}{k}_{|F|} · | F |^{k(k−1)/2} (| F | − 1)^k [k]!_{|F|} .

[Hint: For part (a), what is the connection between injective linear maps
and surjective linear maps (between finite-dimensional vector spaces)?]

Exercise A.3.4.13. 3 Let m, n ∈ N. Prove that

    ∑_{k=0}^{m} \binom{m}{k}_q \binom{n}{k}_q q^{k(k−1)/2} (q − 1)^k [k]!_q = q^{mn} .

[Hint: This is a polynomial identity in q, and there are infinitely many
prime numbers. What does this suggest?]
Exercise A.3.4.14. For any a, b ∈ K and any n ∈ N, let us set

    Qn (a, b) := (aq^0 + b) (aq^1 + b) · · · (aq^{n−1} + b) .

(a) 3 Prove that any n ∈ N and any a, b, c, d ∈ K satisfy

    ∑_{k=0}^{n} \binom{n}{k}_q Qk (a, b) · Qn−k (c, d) = ∑_{k=0}^{n} \binom{n}{k}_q Qk (c, b) · Qn−k (a, d) .

(b) 1 Prove that any n ∈ N and any a, b, c ∈ K satisfy

    Qn (a, b) = ∑_{k=0}^{n} \binom{n}{k}_q Qk (a, c) · Qn−k (−c, b) .

[Note that this generalizes Theorem 4.4.19.]
The following exercise gives a way to derive Theorem 4.4.19 from Theorem
4.4.21:
Exercise A.3.4.15. Let L be a commutative ring, and let a, b, q ∈ L. (Note that
if we let L be the polynomial ring K [q], then this setting becomes the setting
of Theorem 4.4.19.)
Consider the polynomial ring L [u]. Let A be the (noncommutative) L-
algebra End L ( L [u]) of all endomorphisms of the L-module L [u]. (Its mul-
tiplication is composition of endomorphisms. Note that L [u] is a free L-
module of infinite rank, with basis u0 , u1 , u2 , . . . ; thus, the elements of A
can be viewed as ∞ × ∞-matrices with each column having only finitely
many nonzero entries. But it is easier to just think of the elements of A as
L-linear maps L [u] → L [u], just as we defined A.)
Let α ∈ A = End L ( L [u]) be the L-module endomorphism of L [u] that
satisfies α (u^i) = aq^i u^{i+1} for any i ∈ N.
(Thus, α sends each polynomial f ∈ L [u] to au · f [qu].)
Let β ∈ A = End L ( L [u]) be the L-module endomorphism of L [u] that
satisfies β (u^i) = bu^{i+1} for any i ∈ N.
(Thus, β multiplies each polynomial f ∈ L [u] by bu.)
(a) 1 Prove that αβ = qβα.
(b) 2 Prove that ( β + α)^k (1) = (aq^0 + b) (aq^1 + b) · · · (aq^{k−1} + b) u^k for
each k ∈ N.
(c) 2 Rederive Theorem 4.4.19 by applying Theorem 4.4.21 to β, α and q
instead of a, b and ω.
The following exercise shows an application of q-binomial coefficients using
a technique called the roots of unity filter (see also Exercise A.1.1.3 for a similar
technique):
Exercise A.3.4.16. Let p be a prime. Let Ω be the set of all complex numbers
z satisfying z^p = 1. It is well-known that

    Ω = { e^{2πig/p} | g ∈ {0, 1, . . . , p − 1} }
(where the letters e and i have the usual meanings they have in complex
analysis) and, in particular, |Ω| = p and 1 ∈ Ω.
Let n be a positive integer.
(a) 1 Prove that \binom{np − 1}{p − 1}_ω = 1 for each ω ∈ Ω \ {1}.
(b) 2 Prove that \binom{np}{p}_ω = n for each ω ∈ Ω \ {1}.
(c) 1 Prove that for any k ∈ N, we have ∑_{ω∈Ω} ω^k = p if p | k, and
∑_{ω∈Ω} ω^k = 0 if p ∤ k.
(d) 3 Prove that

    (# of subsets S of [np] satisfying p | sum S) = ( \binom{np}{p} + ( p − 1) n ) / p .

Here, sum S is defined as in Proposition 4.4.7 (b).
[Hint: For part (d), compute ∑_{ω∈Ω} \binom{np}{p}_ω in two ways.]
[Remark: The claim of part (d) generalizes Problem 6 from the Interna-
tional Mathematical Olympiad 1995.]
We have seen q-analogues of nonnegative integers and of binomial coeffi-
cients. The following exercise shows that quite a few more things have q-
analogues:
Exercise A.3.4.17. Let R be the ring K [[q, x ]] of FPSs in the two indetermi-
nates q and x. In this exercise, we shall pretend that q is a constant, so that
f [ a] (for a given FPS f ∈ R and a given element a of a K [[q]]-algebra) shall
denote what is normally called f [q, a] (that is, the result of substituting a for
x in f while leaving q unchanged).
Define the map Dq : R → R by letting
    Dq f = ( f − f [qx ]) / ((1 − q) x)   for each f ∈ R.
(a) 1 Prove that this is well-defined, i.e., that f − f [qx ] is really divisible
by (1 − q) x in R for any f ∈ R.
The map Dq is known as the q-derivative or the Jackson derivative. It is easily
seen to be K [[q]]-linear.
(b) 1 Prove that Dq ( x n ) = [n]q x n−1 for each n ∈ N. (For n = 0, read the
right hand side as 0.)
(c) 1 Prove the “q-Leibniz rule”: For any f , g ∈ R, we have

Dq ( f g) = Dq f · g + f [qx ] · Dq g.
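Parts (b) and (c) can be sanity-checked with exact rational arithmetic, by evaluating both sides at a generic-looking rational point (a sketch, not a proof, and not part of the notes):

```python
from fractions import Fraction

def qder(f, q, x):
    # the q-derivative of f, evaluated numerically at the point x
    return (f(x) - f(q * x)) / ((1 - q) * x)

q = Fraction(3, 2)
x = Fraction(5, 7)

def qint(n):
    # the q-integer [n]_q = 1 + q + ... + q^(n-1), evaluated at our q
    return sum(q ** i for i in range(n))

# part (b): D_q(x^n) = [n]_q x^(n-1)
for n in range(1, 9):
    assert qder(lambda t: t ** n, q, x) == qint(n) * x ** (n - 1)

# part (c), the q-Leibniz rule, on two sample polynomials
f = lambda t: 3 * t ** 4 + t + 1
g = lambda t: t ** 3 - 2 * t ** 2 + 5
assert (qder(lambda t: f(t) * g(t), q, x)
        == qder(f, q, x) * g(x) + f(q * x) * qder(g, q, x))
```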
(d) 1 Let expq ∈ R be the FPS ∑_{n∈N} x^n / [n]q ! (called the “q-exponential”). Prove
that Dq expq = expq . (Note that, unlike the usual exponential exp, this does
not require that K be a Q-algebra, since [n]q ! is a FPS with constant term 1
and thus always invertible.)
(e) 2 Prove that each m ∈ N satisfies

    Dq ( 1 / ∏_{i=0}^{m−1} (1 − q^i x) ) = [m]q / ∏_{i=0}^{m} (1 − q^i x) .

Now, we shall discuss a q-analogue of the Eulerian polynomials from Ex-
ercise A.2.6.2.
Let K = Z. For any m ∈ N, we define an FPS

    Qq,m := ∑_{n∈N} [n]q^m x^n = [0]q^m x^0 + [1]q^m x^1 + [2]q^m x^2 + · · · ∈ R.

For example,

    Qq,0 = ∑_{n∈N} [n]q^0 x^n = ∑_{n∈N} x^n = 1/(1 − x) ;

    Qq,1 = ∑_{n∈N} [n]q^1 x^n = ∑_{n∈N} [n]q x^n = x / ((1 − x) (1 − qx))   (why?) ;

it can furthermore be shown that

    Qq,2 = ∑_{n∈N} [n]q^2 x^n = x (1 + qx) / ((1 − x) (1 − qx) (1 − q^2 x)) .

It appears that each Qq,m has the form Aq,m / ((1 − x) (1 − qx) · · · (1 − q^m x))
for some polynomial Aq,m ∈ Z [q, x ] of degree m. Let us prove this.
For each m ∈ N, we define an FPS

    Aq,m := ( ∏_{i=0}^{m} (1 − q^i x) ) Qq,m ∈ R.

(Thus, Qq,m = Aq,m / ((1 − x) (1 − qx) · · · (1 − q^m x)).)
Let ϑq : R → R be the Z-linear map that sends each FPS f ∈ R to x · Dq f .

(f) 1 Prove that Qq,m = ϑq Qq,m−1 for each m > 0.
(g) 2 Prove that Aq,m = [m]q xAq,m−1 + x (1 − x ) · Dq Aq,m−1 for each m >
0.
(h) 1 Conclude that Aq,m is a polynomial in x and q for each m ∈ N.
(i) 2 Show that this polynomial Aq,m has the form Aq,m = x^m q^{m(m−1)/2} +
Rq,m , where Rq,m is a Z-linear combination of x^i q^j with i < m and j <
m (m − 1) /2.
The polynomials Aq,m are called Carlitz’s q-Eulerian polynomials.

A.4. Permutations
The notations of Chapter 5 shall be used here. In particular, if X is a set, then
SX shall mean the symmetric group of X; and if n is a nonnegative integer, then
Sn shall mean the symmetric group S[n] of the set [n] := {1, 2, . . . , n}.

A.4.1. Basic definitions

Exercise A.4.1.1. 1 Let n ∈ N and σ ∈ Sn . What is the easiest way to obtain

(a) a two-line notation of σ−1 from a two-line notation of σ ?
(b) the one-line notation of σ−1 from the one-line notation of σ ?
(c) the cycle digraph of σ−1 from the cycle digraph of σ ?
A.4.2. Transpositions and cycles
Exercise A.4.2.1. Let X be a set. Prove the following:
(a) 2 For any k distinct elements i1 , i2 , . . . , ik of X, we have

    cyci1 ,i2 ,...,ik = ti1 ,i2 ti2 ,i3 · · · tik−1 ,ik   (a product of k − 1 transpositions).

(b) 1 For any k distinct elements i1 , i2 , . . . , ik of X and any σ ∈ SX , we have

    σ cyci1 ,i2 ,...,ik σ^{−1} = cycσ(i1 ),σ(i2 ),...,σ(ik ) .
Definition A.4.1. Let X be a set. An involution of X means a map f : X → X
that satisfies f ◦ f = id. Clearly, an involution is always a permutation, and
equals its own inverse.
For example, the identity map idX is an involution, and any transposition
ti,j ∈ SX is an involution, whereas k-cycles cyci1 ,i2 ,...,ik with k > 2 are never
involutions.
Exercise A.4.2.2. Let n ∈ N. Let w0 ∈ Sn be the permutation that sends
each k ∈ [n] to n + 1 − k. Thus, w0 is the permutation that “reflects”
all numbers from 1 to n across the middle of the interval [n]. It is the
unique strictly decreasing permutation of [n]. In one-line notation, w0 is
(n, n − 1, n − 2, . . . , 2, 1).
(a) 1 Prove that w0 is an involution of [n].
(b) 1 Prove that w0 = t1,n t2,n−1 · · · tk,n+1−k , where k = ⌊n/2⌋.
(c) 3 Prove that

    w0 = cyc1,2,...,n cyc1,2,...,n−1 cyc1,2,...,n−2 · · · cyc1
       = cyc1 cyc2,1 cyc3,2,1 · · · cycn,n−1,...,1 .

Exercise A.4.2.3. Let p be a prime number. Let Z be the set of all p-cycles in
the symmetric group S p .
Let ζ be the specific p-cycle cyc1,2,...,p ∈ S p . Note that ζ has order p in the
group S p , and thus generates a cyclic subgroup ⟨ζ ⟩ of order p.
(a) 2 Prove that a permutation σ ∈ S p satisfies σζ = ζσ if and only if
σ ∈ ⟨ζ ⟩ (that is, if and only if σ is a power of ζ).
(b) 2 Prove that |⟨ζ ⟩ ∩ Z | = p − 1.
(c) 1 Prove that the cyclic group ⟨ζ ⟩ acts on the set Z by conjugation:

α ⇀ σ = ασα−1 for any α ∈ ⟨ζ ⟩ and σ ∈ Z

(where the symbol “⇀” means the action of a group G on a G-set X – i.e., we
let g ⇀ x denote the result of a group element g ∈ G acting on some x ∈ X).

(d) 1 Find the fixed points of this action.
(e) 1 Prove Wilson’s theorem from elementary number theory, which states
that
( p − 1)! ≡ −1 mod p.
A.4.3. Inversions, length and Lehmer codes

Exercise A.4.3.1. 4 Prove Proposition 5.3.3.

Exercise A.4.3.2. Let n ∈ N. Prove the following:


(a) 1 We have ℓ (ti,j ) = 2 |i − j| − 1 for any distinct i, j ∈ [n].
(b) 2 We have ℓ (cyci+1,i+2,...,i+k ) = k − 1 for any integers i and k with
0 ≤ i < i + k ≤ n.
(c) 4 We have ℓ (cyci1 ,i2 ,...,ik ) ≥ k − 1 for any k distinct elements
i1 , i2 , . . . , ik ∈ [n].
(d) 1 Are the k-cycles of the form cyci+1,i+2,...,i+k the only k-cycles whose
length is k − 1 ?
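Recalling that ℓ (σ) is the # of inversions of σ, parts (a) and (b) can be verified by brute force (a sketch, not part of the notes; permutations are encoded in one-line notation):

```python
def inversions(sigma):
    # number of inversions of a permutation given in one-line notation
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])

def transposition(n, i, j):
    s = list(range(1, n + 1))
    s[i - 1], s[j - 1] = s[j - 1], s[i - 1]
    return tuple(s)

def cycle(n, elems):
    # cyc_{i1,...,ik}: sends i1 -> i2 -> ... -> ik -> i1, fixes everything else
    s = list(range(1, n + 1))
    for a, b in zip(elems, list(elems[1:]) + [elems[0]]):
        s[a - 1] = b
    return tuple(s)

n = 8
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if i != j:
            assert inversions(transposition(n, i, j)) == 2 * abs(i - j) - 1  # (a)
for i in range(0, n):
    for k in range(1, n - i + 1):
        c = cycle(n, list(range(i + 1, i + k + 1)))
        assert inversions(c) == k - 1                                        # (b)
```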

Exercise A.4.3.3. Let σ ∈ Sn and i ∈ [n]. Prove the following (using the
notation of Definition 5.3.6 (a)):
(a) 1 We have ℓi (σ) = |[σ (i ) − 1] \ σ ([i ])|.
(b) 1 We have ℓi (σ) = |[σ (i ) − 1] \ σ ([i − 1])|.
(c) 1 We have σ (i ) ≤ i + ℓi (σ).
(d) 2 Assume that i ∈ [n − 1]. We have σ (i ) > σ (i + 1) if and only if
ℓ i ( σ ) > ℓ i +1 ( σ ).

A.4.4. V-permutations

Exercise A.4.4.1. 5 Let n ∈ N. For each r ∈ [n], let cr denote the permutation
cycr,r−1,...,2,1 ∈ Sn . (Thus, c1 = cyc1 = id and c2 = cyc2,1 = s1 .)

Let G = { g1 , g2 , . . . , g p } be a subset of [n], with g1 < g2 < · · · < g p . Let
σ ∈ Sn be the permutation c g1 c g2 · · · c g p .
[Example: If n = 6 and p = 2 and G = {2, 5}, then σ = c2 c5 =
cyc2,1 cyc5,4,3,2,1 . In one-line notation, this permutation σ is 521346.]
Prove the following:
(a) We have σ (1) > σ (2) > · · · > σ ( p).
(b) We have σ ([ p]) = G.
(c) We have σ ( p + 1) < σ ( p + 2) < · · · < σ (n).
(Note that a chain of inequalities that involves less than two numbers is
considered to be vacuously true. For example, Exercise A.4.4.1 (c) is vacu-
ously true when p = n − 1 and also when p = n.)
Exercise A.4.4.2. 8 Let n ∈ N. Define the permutations cr as in Exercise
A.4.4.1.
Let σ ∈ Sn . We will use the notations from Definition 5.3.6.
(a) Prove that the following five statements are equivalent:

• Statement 1: We have σ (1) > σ (2) > · · · > σ ( p) and σ ( p + 1) <
σ ( p + 2) < · · · < σ (n) for some p ∈ {0, 1, . . . , n}. (In other words, the
one-line notation of σ is decreasing at first, then increasing.)

• Statement 2: We have σ = c g1 c g2 · · · c g p for some elements g1 < g2 <
· · · < g p of [n].

• Statement 3: For each i ∈ [n], the set σ−1 ([i ]) is an integer interval (i.e.,
there exist integers u and v such that σ−1 ([i ]) = {u, u + 1, u + 2, . . . , v}).

• Statement 4: If i, j, k ∈ [n] satisfy i < j < k, then we have σ (i ) > σ ( j) or
σ ( k ) > σ ( j ).

• Statement 5: We have ℓ1 (σ) > ℓ2 (σ ) > · · · > ℓ p (σ) and ℓ p+1 (σ) =
ℓ p+2 (σ) = · · · = ℓn (σ) = 0 for some p ∈ {0, 1, . . . , n}. (In other
words, the n-tuple L (σ) is strictly decreasing until it reaches 0, and
then remains at 0.)

(b) Permutations σ ∈ Sn satisfying the above five statements are known as
“V-permutations” (as their plot looks somewhat like the letter “V”: decreasing
at first, then increasing).
Assume that n > 0. Prove that the # of V-permutations in Sn is 2^{n−1} .
[Example: If n = 3, then the V-permutations in Sn are (in one-line notation)
123 and 213 and 312 and 321.]
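A brute-force count confirming part (b) for small n, using Statement 1 as the test for being a V-permutation (a sketch, not part of the notes):

```python
from itertools import permutations

def is_V(sigma):
    # Statement 1: the one-line notation decreases up to some position p,
    # then increases from position p+1 on.
    n = len(sigma)
    return any(all(sigma[i] > sigma[i + 1] for i in range(p - 1)) and
               all(sigma[i] < sigma[i + 1] for i in range(p, n - 1))
               for p in range(n + 1))

assert sorted(s for s in permutations((1, 2, 3)) if is_V(s)) == \
    [(1, 2, 3), (2, 1, 3), (3, 1, 2), (3, 2, 1)]
for n in range(1, 8):
    assert sum(1 for s in permutations(range(1, n + 1)) if is_V(s)) == 2 ** (n - 1)
```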

Exercise A.4.4.3. 4 Let n ∈ N. Let K be a commutative ring, and let a, b ∈ K.
Recall the q-binomial coefficients from Definition 4.4.3 (a). Prove that in the
polynomial ring K [q], we have
    ∑_{p=0}^{n} a^p b^{n−p} ( ∑_{σ∈Sn ; σ(1)>σ(2)>···>σ( p); σ( p+1)<σ( p+2)<···<σ(n)} q^{ℓ(σ)} )
        = (aq^0 + b) (aq^1 + b) · · · (aq^{n−1} + b)

and

    ∑_{p=0}^{n} a^p b^{n−p} ( ∑_{σ∈Sn ; σ(1)>σ(2)>···>σ( p); σ( p+1)<σ( p+2)<···<σ(n)} q^{ℓ(σ)} )
        = ∑_{k=0}^{n} q^{k(k−1)/2} \binom{n}{k}_q a^k b^{n−k} .

[Both equalities can be proved combinatorially, thus leading to a combina-
torial proof of Theorem 4.4.19.]
A.4.5. Fixed points
The next two exercises are concerned with the fixed points of maps (not only
of permutations).

Definition A.4.2. Let X be a set. Let f : X → X be a map. Then:
(a) A fixed point of f means an x ∈ X satisfying f ( x ) = x.
(b) We let Fix f denote the set of all fixed points of f . (This is the set
{ x ∈ X | f ( x ) = x }.)

Exercise A.4.5.1. 4 Let X and Y be two finite sets. Let f : X → Y and
g : Y → X be two maps. Prove that

    |Fix ( f ◦ g)| = |Fix ( g ◦ f )| .
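A randomized sanity check of this identity (a sketch, not part of the notes; maps between finite sets are encoded as lists of images):

```python
import random

random.seed(42)
for _ in range(1000):
    nx, ny = random.randint(1, 7), random.randint(1, 7)
    f = [random.randrange(ny) for _ in range(nx)]   # a map f : X -> Y
    g = [random.randrange(nx) for _ in range(ny)]   # a map g : Y -> X
    fix_fg = sum(1 for y in range(ny) if f[g[y]] == y)   # |Fix(f o g)|, on Y
    fix_gf = sum(1 for x in range(nx) if g[f[x]] == x)   # |Fix(g o f)|, on X
    assert fix_fg == fix_gf
```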

Exercise A.4.5.2. Let X be a finite set.
(a) 4 Prove that each permutation σ ∈ SX is a composition of two involu-
tions of X.
(b) 2 Prove that each permutation σ ∈ SX is conjugate to its inverse σ−1
in the symmetric group SX .
[Hint: For part (a), it is easier to show the following stronger claim: If
σ ∈ SX and Y ⊆ Fix σ and p ∈ X \ Y, then there exist two involutions
α, β ∈ SX such that σ = α ◦ β and Y ⊆ Fix α and Y ∪ { p} ⊆ Fix β.
For part (b), you can use part (a).]

A.4.6. More on inversions
The next two exercises concern the inversions of a permutation. They use the
following definition:

Definition A.4.3. Let n ∈ N. For every σ ∈ Sn , we let Inv σ denote the set of
all inversions of σ.

We know from Corollary 5.3.20 (b) that any n ∈ N and any two permutations
σ and τ in Sn satisfy the inequality ℓ (στ ) ≤ ℓ (σ ) + ℓ (τ ). In the following
exercise, we will see when this inequality becomes an equality:

Exercise A.4.6.1. 6 Let n ∈ N. Let σ ∈ Sn and τ ∈ Sn .
(a) Prove that ℓ (στ ) = ℓ (σ) + ℓ (τ ) holds if and only if Inv τ ⊆ Inv (στ ).
(b) Prove that ℓ (στ ) = ℓ (σ) + ℓ (τ ) holds if and only if Inv (σ^{−1}) ⊆
Inv (τ^{−1} σ^{−1}).
(c) Prove that Inv σ ⊆ Inv τ holds if and only if ℓ (τ ) = ℓ (τσ^{−1}) + ℓ (σ ).
(d) Prove that if Inv σ = Inv τ, then σ = τ.
(e) Prove that ℓ (στ ) = ℓ (σ) + ℓ (τ ) holds if and only if
(Inv σ) ∩ (Inv (τ^{−1})) = ∅.

Exercise A.4.6.1 (d) shows that if two permutations in Sn have the same set
of inversions, then they are equal. In other words, a permutation in Sn is
uniquely determined by its set of inversions. The next exercise shows what set
of inversions a permutation can have:

Exercise A.4.6.2. 7 Let n ∈ N. Let G = {(i, j) ∈ Z^2 | 1 ≤ i < j ≤ n}.
A subset U of G is said to be transitive if every a, b, c ∈ [n] satisfying
( a, b) ∈ U and (b, c) ∈ U also satisfy ( a, c) ∈ U.
A subset U of G is said to be inversive if there exists a σ ∈ Sn such that
U = Inv σ.
Let U be a subset of G. Prove that U is inversive if and only if both U and
G \ U are transitive.

A.4.7. When transpositions generate SX
The next exercise uses a tiny bit of graph theory (the notion of connectedness
of a graph):

Exercise A.4.7.1. Let X be a nonempty finite set. Let G be a loopless undi-
rected graph with vertex set X. For each edge e of G, we let te denote the
transposition ti,j ∈ SX , where i and j are the two endpoints of e. These
transpositions te for all edges e of G will be called the G-edge transpositions.
(a) 5 Prove that the G-edge transpositions generate the symmetric group
SX if and only if the graph G is connected.

[Example: If G is the path graph on the vertex set X = [n], with edges
{1, 2}, {2, 3}, . . . , {n − 1, n}, then the G-edge transpositions are precisely the simple
transpositions s1 , s2 , . . . , sn−1 . In this case, the claim of this exercise becomes
the claim of Corollary 5.3.22.]
(b) 5 Let σ ∈ SX be a permutation that can be written as a product of
G-edge transpositions. Prove that σ can be written as a product of at most
\binom{n}{2} many G-edge transpositions, where n = | X |.

A.4.8. Pattern avoidance
A warm-up exercise for this subsection:
Exercise A.4.8.1. Let n ∈ N.
(a) 1 Find the # of permutations σ ∈ Sn such that each i ∈ [n] satisfies
σ (i ) ≤ i + 1.
(b) 1 Find the # of permutations σ ∈ Sn such that each i ∈ [n] satisfies
i − 1 ≤ σ ( i ).
(c) 2 Find the # of permutations σ ∈ Sn such that each i ∈ [n] satisfies
i − 1 ≤ σ (i ) ≤ i + 1.

The next few exercises cover some of the most basic results in the theory of
pattern avoidance (see [Bona12, Chapter 4] and [Kitaev11] for much more161 ).
This can be viewed as one possible way of generalizing monotonicity (i.e., in-
creasingness and decreasingness). We begin by defining some basic concepts:

Definition A.4.4. Let t = (t1 , t2 , . . . , tn ) be an arbitrary (finite) tuple. A subse-
quence of t means a tuple of the form (ti1 , ti2 , . . . , tim ), where i1 , i2 , . . . , im are
m elements of [n] satisfying i1 < i2 < · · · < im .

Example A.4.5. Let t = (5, 1, 6, 2, 3, 4). Then, (1, 3) and (5, 6, 2) are subse-
quences of t (indeed, if we write t as (t1 , t2 , . . . , t6 ), then (1, 3) = (t2 , t5 ) and
(5, 6, 2) = (t1 , t3 , t4 )), whereas (1, 5) and (2, 1, 6) and (1, 1, 6) are not.

Remark A.4.6. Let t = (t1 , t2 , . . . , tn ) be an arbitrary (finite) tuple. Then, the
1-tuple (ti ) is a subsequence of t for any i ∈ [n]. So is the empty 0-tuple
(). Also, the tuple t itself is a subsequence of t (and is the only length-n
subsequence of t).

Next, we need to define the notion of equally ordered tuples. Roughly speaking,
these are tuples of the same length that might differ in their values, but agree
in the relative order of their values (e.g., if one tuple has a smaller value in
position 2 than in position 5, then so does the other tuple). Here is the formal
definition:

Definition A.4.7. Let a = ( a1 , a2 , . . . , ak ) and b = (b1 , b2 , . . . , bk ) be two k-
tuples of integers. We say that a and b are equally ordered (to each other) if
for every pair (i, j) ∈ [k ] × [k ], we have the logical equivalence
 
ai < a j ⇐⇒ bi < b j .

This relation is clearly symmetric in a and b (that is, a and b are equally
ordered if and only if b and a are equally ordered).
We agree that a k-tuple and an ℓ-tuple are never equally ordered when
k ̸ = ℓ.
161 There is a yearly conference on this subject!
Example A.4.8. (a) The two triples (3, 1, 6) and (1, 0, 2) are equally ordered.
(b) The two quadruples (3, 1, 1, 2) and (4, 1, 1, 3) are equally ordered.
(c) The two triples (3, 1, 2) and (2, 1, 3) are not equally ordered (indeed, we
don’t have 3 < 2, but we do have 2 < 3).

Now, we can define the notion of a pattern in a tuple:

Definition A.4.9. Let s = (s1, s2, . . . , sn) and u = (u1, u2, . . . , um) be two tuples of integers.
A u-pattern in s means a subsequence of s that is equally ordered to the
tuple u. (In particular, this subsequence must have the same length as u.)
In the following, when we talk about u-patterns, we will often write the
tuple u without commas and parentheses. (For example, we shall abbreviate
“(2, 3, 1)-pattern” as “231-pattern”.)

Example A.4.10. Let s = (s1, s2, . . . , sn) be a tuple of integers. Let us see what u-patterns in s mean for various specific tuples u:

(a) A 21-pattern in s is a subsequence (si, sj) of s with si > sj (and, of course, i < j, by the definition of a subsequence). For example, the tuple (4, 5, 2, 1) has five 21-patterns (namely, (4, 2), (4, 1), (5, 2), (5, 1) and (2, 1)).

(b) A 123-pattern in s is a subsequence (si, sj, sk) with si < sj < sk (and, of course, i < j < k, by the definition of a subsequence). For example, the tuple (2, 1, 3, 5) has two 123-patterns (namely, (1, 3, 5) and (2, 3, 5)).

(c) A 231-pattern in s is a subsequence (si, sj, sk) with sk < si < sj (and, of course, i < j < k, by the definition of a subsequence). For example, the tuple (1, 2, 5, 1) has one 231-pattern (namely, (2, 5, 1)).

Finally, we can define pattern avoidance:

Definition A.4.11. Let s and u be two tuples of integers. We say that s is u-avoiding if there is no u-pattern in s.

Example A.4.12. The tuple (2, 1, 5, 3, 4) is

• not 123-avoiding, since it contains the 123-pattern (1, 3, 4) (and also the
123-pattern (2, 3, 4));

• not 132-avoiding, since it contains the 132-pattern (1, 5, 3);

• not 321-avoiding, since it contains the 321-pattern (5, 3, 4);

• 231-avoiding (check this!).



Example A.4.13. (a) A tuple s = (s1, s2, . . . , sn) is 21-avoiding if and only if it is weakly increasing (i.e., satisfies s1 ≤ s2 ≤ · · · ≤ sn). Indeed, a 21-pattern in s is a subsequence (si, sj) of s with si > sj; thus, the non-existence of such 21-patterns is equivalent to s1 ≤ s2 ≤ · · · ≤ sn.
(b) Likewise, a tuple s = (s1, s2, . . . , sn) is 12-avoiding if and only if it is weakly decreasing (i.e., satisfies s1 ≥ s2 ≥ · · · ≥ sn).
Finally, our concept of pattern avoidance can be extended from tuples to
permutations in the most obvious manner:
Definition A.4.14. Let u be a tuple of integers. Let n ∈ N and σ ∈ Sn . We
say that the permutation σ is u-avoiding if the OLN of σ (that is, the n-tuple
(σ (1) , σ (2) , . . . , σ (n))) is u-avoiding.

Example A.4.15. Let n ∈ N. The only 21-avoiding permutation σ ∈ Sn is the
identity permutation id ∈ Sn (since it is the only permutation whose OLN
is weakly increasing). Likewise, the only 12-avoiding permutation σ ∈ Sn is
the permutation w0 ∈ Sn from Exercise A.4.2.2.
After all this build-up, we can now study u-avoidance for more complicated
patterns u than 21 and 12:

Exercise A.4.8.2. 6 Let n ∈ N. Let cn denote the n-th Catalan number (from
Example 2 in Section 3.1).
(a) Prove that

(# of 132-avoiding permutations in Sn ) = cn .
(b) Prove that

(# of 231-avoiding permutations in Sn ) = cn .
(c) Prove that

(# of 213-avoiding permutations in Sn ) = cn .
(d) Prove that

(# of 312-avoiding permutations in Sn ) = cn .
[Hint: Easy bijections show that parts (a), (b), (c) and (d) are equivalent.
For part (b), proceed recursively: Assume that n > 0, and let σ ∈ Sn , and let
i = σ−1 (n). Show that the permutation σ is 231-avoiding if and only if the
two tuples (σ (1) , σ (2) , . . . , σ (i − 1)) and (σ (i + 1) , σ (i + 2) , . . . , σ (n)) are
231-avoiding and satisfy {σ (1) , σ (2) , . . . , σ (i − 1)} = {1, 2, . . . , i − 1} and
{σ (i + 1) , σ (i + 2) , . . . , σ (n)} = {i, i + 1, . . . , n − 1}. This yields a recursive
equation for the # of 231-avoiding permutations in Sn .]
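All four counts can be confirmed exhaustively for small n before attempting the proofs. A quick Python sanity check (an illustration, not part of the notes; it uses the closed formula $c_n = \frac{1}{n+1}\binom{2n}{n}$ for the Catalan numbers):

```python
from itertools import combinations, permutations
from math import comb

def has_pattern(s, u):
    """True if s contains a u-pattern (a subsequence equally ordered to u)."""
    m = len(u)
    return any(
        all((sub[i] < sub[j]) == (u[i] < u[j]) for i in range(m) for j in range(m))
        for sub in combinations(s, m)
    )

def catalan(n):
    return comb(2 * n, n) // (n + 1)   # c_n = binom(2n, n) / (n + 1)

for n in range(7):
    perms = list(permutations(range(1, n + 1)))
    for u in [(1, 3, 2), (2, 3, 1), (2, 1, 3), (3, 1, 2)]:
        count = sum(1 for s in perms if not has_pattern(s, u))
        assert count == catalan(n), (n, u, count)
```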

Exercise A.4.8.3. 8 Let n ∈ N. Let cn denote the n-th Catalan number (from
Example 2 in Section 3.1).
(a) Prove that

(# of 123-avoiding permutations in Sn ) = cn .
(b) Prove that

(# of 321-avoiding permutations in Sn ) = cn .
[Hint: Consider any 321-avoiding permutation σ ∈ Sn . A record of σ means
a value σ (i ) for some i ∈ [n] satisfying

σ (i ) > σ ( j ) for all j ∈ [i − 1] .

(Equivalently, it is an entry in the OLN of σ that is larger than all entries


further left.) Let b1 , b2 , . . . , bk be the records of a permutation σ ∈ Sn , written
in increasing order (or, equivalently, in the order of their appearance in the
OLN of σ). (For example, if σ = 14672385, then these are 1, 4, 6, 7, 8.) Ar-
gue first that the OLN of σ becomes weakly increasing when all records are
removed from it. (For example, 14672385 becomes 235 this way.) Write the
OLN of σ in the form

$$\left(\sigma(1), \sigma(2), \ldots, \sigma(n)\right) = \Bigl(b_1, \underbrace{\ldots}_{\substack{\text{some } i_1 \\ \text{entries}}}, \; b_2, \underbrace{\ldots}_{\substack{\text{some } i_2 \\ \text{entries}}}, \; \ldots, \; b_{k-1}, \underbrace{\ldots}_{\substack{\text{some } i_{k-1} \\ \text{entries}}}, \; b_k, \underbrace{\ldots}_{\substack{\text{some } i_k \\ \text{entries}}}\Bigr),$$

where i1 , i2 , . . . , ik ∈ N. Set ci := bi − bi−1 for each i ∈ [k ], where b0 := 0.


Now, define B(σ) to be the Dyck path

$$N^{c_1} S^{i_1+1} N^{c_2} S^{i_2+1} \cdots N^{c_k} S^{i_k+1},$$

where an “$N^j$” means j consecutive NE-steps, and where an “$S^j$” means j consecutive SE-steps. For example, σ = 14672385 leads to

$$B(\sigma) = N^1 S^1 N^3 S^1 N^2 S^1 N^1 S^3 N^1 S^2 = NSNNNSNNSNSSSNSS.$$

[A picture of this Dyck path appeared here.]

Prove that the map

$$B : \{\text{321-avoiding permutations in } S_n\} \to \{\text{Dyck paths from } (0,0) \text{ to } (2n,0)\}, \qquad \sigma \mapsto B(\sigma)$$

is well-defined and a bijection. (In proving surjectivity, don’t forget to check
that the bi’s really are the records of σ !)]
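The construction of B(σ) from the hint is easy to prototype. The following Python sketch (an illustration; the helper name `dyck_word_B` is ad hoc) recomputes the example σ = 14672385:

```python
def dyck_word_B(oln):
    """Build B(sigma) from the one-line notation, following the hint:
    records b_1 < ... < b_k, gap sizes i_1, ..., i_k, rise sizes
    c_j = b_j - b_{j-1} (with b_0 = 0)."""
    n = len(oln)
    # positions of records: entries larger than everything to their left
    rec_pos = [p for p in range(n) if all(oln[q] < oln[p] for q in range(p))]
    b = [oln[p] for p in rec_pos]
    bounds = rec_pos + [n]
    # i_j = number of non-record entries between record j and the next record
    gaps = [bounds[j + 1] - bounds[j] - 1 for j in range(len(rec_pos))]
    word, prev = "", 0
    for bj, ij in zip(b, gaps):
        word += "N" * (bj - prev) + "S" * (ij + 1)
        prev = bj
    return word

# The example from the hint: sigma = 14672385 has records 1, 4, 6, 7, 8
assert dyck_word_B([1, 4, 6, 7, 2, 3, 8, 5]) == "NSNNNSNNSNSSSNSS"
```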

Exercises A.4.8.2 and A.4.8.3 can be combined into a single statement, which
says that for any τ ∈ S3 and any n ∈ N, the # of (τ (1) , τ (2) , τ (3))-avoiding
permutations in Sn equals the Catalan number cn , independently of τ. The
independence appears almost too good to be true. Parts of this miracle survive
even for τ ∈ S4 ; for example, for any n ∈ N, we have
(# of 4132-avoiding permutations in Sn )
= (# of 3142-avoiding permutations in Sn )
(a result of Stankova [Stanko94, Theorem 3.1]), but this number does not equal
the # of 1324-avoiding permutations in Sn (in general), nor does it have any
simple formula. A Wikipedia page collects known results about these and
similar numbers.
We state a few simple results about permutations avoiding several patterns:
Definition A.4.16. Let u and v be two tuples of integers. A permutation in Sn
(or a tuple of integers) is said to be (u, v)-avoiding if and only if it is both u-
avoiding and v-avoiding. Similarly we define (u, v, w)-avoiding permutations
(where u, v and w are three tuples of integers).

Exercise A.4.8.4. Let n be a positive integer.


(a) 3 Prove that

(# of (231, 321) -avoiding permutations in Sn ) = 2n−1 .

(b) 2 Prove that

(# of (132, 231) -avoiding permutations in Sn ) = 2n−1 .

(c) 2 Prove that

(# of (123, 321) -avoiding permutations in Sn ) = 0 if n > 4.

(d) 2 Prove that

(# of (231, 321, 312) -avoiding permutations in Sn ) = f n+1 ,

where ( f 0 , f 1 , f 2 , . . .) is the Fibonacci sequence (as in Section 3.1).



(e) 2 Prove that

(# of (123, 132, 231) -avoiding permutations in Sn ) = n.

[Hint: In parts (a), (b) and (d), the permutations you are counting have
already appeared in one of the previous problems under a different guise.]
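Parts (a)–(e) can be confirmed exhaustively for small n. The Python sketch below (an illustration, not part of the notes) assumes the Fibonacci convention f0 = 0, f1 = 1 from Section 3.1:

```python
from itertools import combinations, permutations

def matches(sub, u):
    m = len(u)
    return all((sub[i] < sub[j]) == (u[i] < u[j])
               for i in range(m) for j in range(m))

def avoids(s, patterns):
    return not any(matches(sub, u)
                   for u in patterns for sub in combinations(s, len(u)))

fib = [0, 1]                          # Fibonacci: f_0 = 0, f_1 = 1
while len(fib) < 10:
    fib.append(fib[-1] + fib[-2])

for n in range(1, 7):
    perms = list(permutations(range(1, n + 1)))

    def count(pats):
        return sum(1 for s in perms if avoids(s, pats))

    assert count([(2, 3, 1), (3, 2, 1)]) == 2 ** (n - 1)              # (a)
    assert count([(1, 3, 2), (2, 3, 1)]) == 2 ** (n - 1)              # (b)
    if n > 4:
        assert count([(1, 2, 3), (3, 2, 1)]) == 0                     # (c)
    assert count([(2, 3, 1), (3, 2, 1), (3, 1, 2)]) == fib[n + 1]     # (d)
    assert count([(1, 2, 3), (1, 3, 2), (2, 3, 1)]) == n              # (e)
```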

Finally, let us connect 132-avoidance with Lehmer codes:

Exercise A.4.8.5. 3 Let σ ∈ Sn . Recall Definition 5.3.6 (a). Prove that σ is


132-avoiding if and only if

ℓ1 ( σ ) ≥ ℓ2 ( σ ) ≥ · · · ≥ ℓ n ( σ ) .
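This equivalence is also easy to check by brute force. The sketch below assumes the convention ℓi(σ) = #{ j > i : σ(j) < σ(i) } for the Lehmer code (one common reading of Definition 5.3.6 (a); it is an assumption here, since that definition is outside this excerpt):

```python
from itertools import combinations, permutations

def lehmer(oln):
    """Lehmer code, assuming l_i(sigma) = #{ j > i : sigma(j) < sigma(i) }."""
    n = len(oln)
    return [sum(1 for j in range(i + 1, n) if oln[j] < oln[i])
            for i in range(n)]

def avoids_132(s):
    # a 132-pattern is a subsequence (a, b, c) with a < c < b
    return not any(a < c < b for (a, b, c) in combinations(s, 3))

for n in range(1, 7):
    for s in permutations(range(1, n + 1)):
        code = lehmer(s)
        weakly_decreasing = all(code[i] >= code[i + 1] for i in range(n - 1))
        assert avoids_132(s) == weakly_decreasing, s
```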

A.4.9. The cycle decomposition

Exercise A.4.9.1. 1 Let X be a finite set. Let σ be a permutation of X. Prove


that the order of σ in the symmetric group SX equals the lcm of the lengths
of all cycles of σ.

Exercise A.4.9.2. 3 Let X be a finite set. Let σ and τ be two permutations of


X. Prove that σ and τ are conjugate in the symmetric group SX if and only if
the cycle lengths partition of σ equals the cycle lengths partition of τ.

The disjoint cycle decomposition of a permutation allows us to define its reflec-


tion length, which is an analogue of the Coxeter length that we have defined in
Definition 5.3.1 (b):

Exercise A.4.9.3. Let X be a finite set. Let n = | X |.


The reflection length (aka absolute length) of a permutation σ ∈ SX is defined
to be n − i, where i is the number of cycles (including the 1-cycles) in the dis-
joint cycle decomposition of σ. (For example, any k-cycle in SX has reflection
length k − 1, since it has 1 cycle of length k and n − k cycles of length 1.) The
reflection length of a permutation σ ∈ SX is denoted by ℓr (σ).
Prove the following:
(a) 1 For any σ ∈ SX , we have ℓr(σ−1) = ℓr(σ).


(b) 1 If σ and τ are two conjugate elements in the group SX , then ℓr (σ) =
ℓr ( τ ).
(c) 3 For any σ ∈ SX and any two distinct elements i and j of X, we have

$$\ell_r\left(\sigma t_{i,j}\right) = \ell_r\left(t_{i,j}\sigma\right) = \begin{cases} \ell_r(\sigma) + 1, & \text{if we don't have } i \overset{\sigma}{\sim} j; \\ \ell_r(\sigma) - 1, & \text{if } i \overset{\sigma}{\sim} j. \end{cases}$$

Here, the notation “$i \overset{\sigma}{\sim} j$” means “i and j belong to the same cycle of σ” (that is, “there exists some p ∈ N such that i = σ^p(j)”).
(d) 2 If σ ∈ SX , then the number ℓr (σ) is the smallest p ∈ N such that we
can write σ as a composition of p transpositions.
(e) 2 For any σ ∈ SX and τ ∈ SX , we have ℓr (στ ) ≤ ℓr (σ ) + ℓr (τ ).
(f) 2 For any σ ∈ SX and τ ∈ SX , we have ℓr (στ ) ≡ ℓr (σ ) + ℓr (τ ) mod 2.
(g) 1 If σ ∈ SX , then ℓr (σ) ≤ ℓ (σ).
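Several of these claims are easy to confirm by machine for small n, computing ℓr(σ) as n minus the number of cycles and ℓ(σ) as the number of inversions. A quick check (an illustration, not part of the notes):

```python
from itertools import permutations

def num_cycles(p):
    """# of cycles (including 1-cycles) of a permutation of {0, ..., n-1},
    encoded as a tuple with p[i] = image of i."""
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cycles += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return cycles

def refl_length(p):                   # l_r(sigma) = n - (# of cycles)
    return len(p) - num_cycles(p)

def coxeter_length(p):                # l(sigma) = # of inversions
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

for p in permutations(range(4)):
    inv_p = tuple(p.index(i) for i in range(4))          # p^{-1}
    assert refl_length(inv_p) == refl_length(p)          # part (a)
    assert refl_length(p) <= coxeter_length(p)           # part (g)
    # both lengths have the parity of the sign of p (consistent with part (f))
    assert refl_length(p) % 2 == coxeter_length(p) % 2
```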

Exercise A.4.9.4. Let n ∈ N. Let σ ∈ Sn . Define ℓr (σ) as in Exercise A.4.9.3


(setting X = [n]).
(a) 4 Prove that there is a unique n-tuple (i1 , i2 , . . . , in ) ∈ [1] × [2] × · · · ×
[n] such that
σ = t1,i1 ◦ t2,i2 ◦ · · · ◦ tn,in .
Here, we define ti,i to be the identity permutation id ∈ Sn for each i ∈ [n].
(b) 3 Consider this n-tuple (i1 , i2 , . . . , in ). Prove that

ℓr (σ) = (# of all k ∈ [n] satisfying ik ̸= k) .

Exercise A.4.9.5. Fix a commutative ring K and a nonnegative integer n ∈ N.


For each σ ∈ Sn , we define the permutation matrix Pσ to be the n × n-matrix

([i = σ ( j)])i,j∈[n] ∈ K n×n .

(This is the n × n-matrix whose (i, j)-th entry is [i = σ ( j)], where we are
using the notation of Definition 4.1.5.) For instance, if n = 4 and σ = 3124 in
one-line notation, then

$$P_\sigma = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

(a) 1 Let GLn (K ) be the group of all invertible n × n-matrices over K.


Prove that the map

Sn → GLn (K ) ,
σ 7→ Pσ

is a group homomorphism.

(b) 2 Assuming that K is a field, prove that each σ ∈ Sn satisfies


rank ( Pσ − In ) = ℓr (σ), where ℓr (σ) is defined as in Exercise A.4.9.3 (set-
ting X = [n]). (Here, In denotes the n × n identity matrix.)
(c) 7 Assuming that K is a field, prove that two permutations σ, τ ∈ Sn
are conjugate in the group Sn if and only if their permutation matrices Pσ
and Pτ are similar (i.e., conjugate in the group GLn (K )).
(d) 2 Prove the claim of part (c) more generally if K is any nontrivial
commutative ring.
[Hint: For part (c), it helps to show that the cycle lengths partition of a
permutation σ can be uniquely recovered if one knows the number of cycles
of each of its powers σ1 , σ2 , σ3 , . . ..]
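Parts (a) and (b) can be verified for n = 4 by direct computation; the sketch below (an illustration, not part of the notes) works over Q with exact `Fraction` arithmetic:

```python
from fractions import Fraction
from itertools import permutations

def perm_matrix(p):
    """P_sigma, with (i, j) entry [i == sigma(j)] (0-indexed)."""
    n = len(p)
    return [[1 if i == p[j] else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank over Q, by Gaussian elimination on a copy of M."""
    M = [[Fraction(x) for x in row] for row in M]
    n, r = len(M), 0
    for col in range(n):
        piv = next((i for i in range(r, n) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(n):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def refl_length(p):
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cycles += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return len(p) - cycles

n = 4
mats = {p: perm_matrix(p) for p in permutations(range(n))}
for s, Ps in mats.items():
    for t, Pt in mats.items():
        st = tuple(s[t[j]] for j in range(n))   # (sigma tau)(j) = sigma(tau(j))
        assert mat_mul(Ps, Pt) == mats[st]      # part (a): homomorphism
    PmI = [[Ps[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    assert rank(PmI) == refl_length(s)          # part (b)
```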

Exercise A.4.9.6. For each n ∈ N, let an denote the # of all permutations


σ ∈ Sn that have odd order in the symmetric group Sn . (By Exercise A.4.9.1,
these are precisely the permutations whose all cycles have odd length.) Also
set an := 0 for all n < 0. Prove the following:
(a) 1 If X is any finite set, then the # of all permutations σ ∈ SX that have
odd order in SX equals a|X | .
(b) 3 We have $a_n = \sum_{k \in \mathbb{N} \text{ even}} \binom{n-1}{k}\, k!\, a_{n-1-k}$ for each n > 0.
Set $b_n := \dfrac{a_n}{n!}$ for each n ∈ N. Also, set $b_n := 0$ for all n < 0. Prove that:
(c) 1 We have $n b_n = \sum_{k \in \mathbb{N} \text{ even}} b_{n-1-k} = b_{n-1} + b_{n-3} + b_{n-5} + \cdots$ for each n > 0.
Let $b \in \mathbb{Q}[[x]]$ be the FPS $\sum_{n \in \mathbb{N}} b_n x^n$. Prove the following:
(d) 1 We have $b' = \left(1 + x^2 + x^4 + x^6 + \cdots\right) \cdot b = \Bigl(\sum_{k \in \mathbb{N}} x^{2k}\Bigr) \cdot b$.

(e) 2 We have $\operatorname{Log} b = \frac{1}{2}\left(\operatorname{Log}(1+x) - \operatorname{Log}(1-x)\right)$.
(f) 1 We have $b = \left(\dfrac{1+x}{1-x}\right)^{1/2} = (1+x) \cdot \left(1-x^2\right)^{-1/2}$.
(g) 2 We have $b_n = (-1)^{\lfloor n/2\rfloor} \dbinom{-1/2}{\lfloor n/2\rfloor}$ for each n ∈ N.
(h) 2 For each n ∈ N, we have

$$a_n = n! \cdot (-1)^{\lfloor n/2\rfloor} \binom{-1/2}{\lfloor n/2\rfloor} = \begin{cases} c_n^2, & \text{if } n \text{ is even}; \\ n c_n^2, & \text{if } n \text{ is odd}, \end{cases}$$

where $c_n = 1 \cdot 3 \cdot 5 \cdots (2\lfloor n/2\rfloor - 1)$ is the product of all odd positive
integers smaller than n.
(i) 3 For each n ∈ N, let a′n be the # of all permutations σ ∈ Sn whose all
cycles have even length. Show that
$$a'_n = \begin{cases} a_n, & \text{if } n \text{ is even}; \\ 0, & \text{if } n \text{ is odd} \end{cases} \qquad \text{for each } n \in \mathbb{N}.$$

[Hint: In part (b), first choose the cycle of σ that contains the element 1.]
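The counts a_n and the closed formula of part (h) can be compared by brute force for n ≤ 6 (an illustration, not part of the notes):

```python
from itertools import permutations

def all_cycles_odd(p):
    """True if every cycle of p (a permutation of {0, ..., n-1}) has odd
    length, i.e. (by Exercise A.4.9.1) p has odd order."""
    seen = set()
    for i in range(len(p)):
        if i not in seen:
            length, j = 0, i
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            if length % 2 == 0:
                return False
    return True

def a_formula(n):
    c = 1
    for odd in range(1, n, 2):   # c_n = product of odd positive integers < n
        c *= odd
    return c * c if n % 2 == 0 else n * c * c

for n in range(7):
    a_n = sum(1 for p in permutations(range(n)) if all_cycles_odd(p))
    assert a_n == a_formula(n), (n, a_n)
```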

A.4.10. Reduced words


In this subsection, we shall take a closer look at how permutations in the
symmetric groups Sn can be represented as products of simple transpositions
si . Most exercises here are particular cases of standard results about Coxeter
groups (see, e.g., [BjoBre05] and [Bourba02] for introductions), but it is worth
seeing them in the special yet rather intuitive setting of symmetric groups.
We recall Definition 5.2.3.

Definition A.4.17. Let n ∈ N. Let σ ∈ Sn .


(a) A Coxeter word for σ shall mean a tuple (i1 , i2 , . . . , ik ) ∈ [n − 1]k satisfy-
ing σ = si1 si2 · · · sik .
(b) A reduced word for σ shall mean a Coxeter word for σ that has the
smallest length among all Coxeter words for σ.

Note that Theorem 5.3.17 (a) shows that any permutation σ ∈ Sn has a Cox-
eter word. Furthermore, Theorem 5.3.17 (b) says that the length ℓ (σ) of a
permutation σ ∈ Sn is the smallest length of a Coxeter word for σ. Thus, a
reduced word for a permutation σ ∈ Sn is the same as a Coxeter word for σ
that has length ℓ (σ). (This is the reason for the name “length”.)

Example A.4.18. Let n = 5, and let σ ∈ S5 be the permutation whose


OLN is 32415. Then, σ = s1 s3 s2 s1 ; thus, (1, 3, 2, 1) is a Coxeter word for
σ. Other Coxeter words for σ are (3, 1, 2, 1) and (3, 2, 1, 2) and (1, 3, 1, 1, 2, 1)
and (3, 1, 2, 3, 1, 3). The reduced words for σ are (1, 3, 2, 1) and (3, 1, 2, 1) and
(3, 2, 1, 2).

Note that each permutation σ ∈ Sn has finitely many reduced words, but
infinitely many Coxeter words (unless n ≤ 1). The identity permutation id has
only one reduced word – namely, the 0-tuple () – but usually many Coxeter
words, such as (1, 2, 3, 3, 2, 1).
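Reduced words can be enumerated recursively by repeatedly stripping a descent. The sketch below (an illustration, not part of the notes) does this; depending on the composition convention used for products of the si, it may list each word reversed, but the number of reduced words is convention-independent. It confirms the count of 3 in Example A.4.18:

```python
def reduced_words(oln):
    """All reduced words for the permutation with the given one-line
    notation, found by stripping one descent at a time.
    (Up to possibly reversing each word, depending on conventions.)"""
    oln = list(oln)
    if all(oln[i] < oln[i + 1] for i in range(len(oln) - 1)):
        return [()]                      # the identity has only the 0-tuple
    words = []
    for k in range(len(oln) - 1):
        if oln[k] > oln[k + 1]:          # descent at position k+1
            shorter = oln[:]
            shorter[k], shorter[k + 1] = shorter[k + 1], shorter[k]
            for w in reduced_words(shorter):
                words.append(w + (k + 1,))
    return words

# Example A.4.18: the permutation with OLN 32415 has exactly 3 reduced words.
assert len(reduced_words([3, 2, 4, 1, 5])) == 3

# The permutation with OLN 4321 (the w0 of S4) has exactly 16 reduced words
# (cf. Example A.4.21 below and Exercise A.4.10.8).
assert len(reduced_words([4, 3, 2, 1])) == 16
```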
We shall now study the combinatorics of Coxeter and reduced words of a
permutation σ ∈ Sn in more depth. First, let us view them from a different
perspective:

Definition A.4.19. Let n ∈ N and σ ∈ Sn . A sorting sequence for σ shall mean


a sequence (σ0 , σ1 , . . . , σk ) of permutations σi ∈ Sn with the property that
σ0 = σ and σk = id and that for each i ∈ [k ], the permutation σi is obtained
from σi−1 by swapping two consecutive entries σi−1 (h) and σi−1 (h + 1) into
the correct order (i.e., σi−1 (h) > σi−1 (h + 1), but σi (h) < σi (h + 1)).
Thus, intuitively, a sorting sequence for σ is a way of sorting its OLN (i.e.,
the list σ (1) σ (2) · · · σ (n)) into increasing order by repeatedly swapping two
out-of-order consecutive entries. For example, if n = 5, and if σ ∈ S5 is the
permutation whose OLN is 32415, then
(32415, 32145, 31245, 13245, 12345)
is a sorting sequence for σ (one of three such sequences).

Exercise A.4.10.1. 2 Let n ∈ N and σ ∈ Sn . Find a bijection between


{reduced words for σ} and {sorting sequences for σ}.

Exercise A.4.10.2. 1 Let n ∈ N. Let σ ∈ Sn . Let (i1 , i2 , . . . , ik ) be a re-


duced word for σ. Let u, v ∈ {0, 1, . . . , k } be such that u ≤ v. Prove that
(iu+1 , iu+2 , . . . , iv ) is a reduced word for siu+1 siu+2 · · · siv .

Exercise A.4.10.3. 2 Let n ∈ N. Let σ ∈ Sn . Let k ∈ [n − 1]. Prove that σ has


a reduced word whose last entry is k if and only if σ (k) > σ (k + 1).

Exercise A.4.10.4. 4 Let n ∈ N. Let σ ∈ Sn . Let u, v ∈ [n − 1]. Assume that


σ has a reduced word whose last entry is u. Assume further that σ has a
reduced word whose last entry is v. Prove the following:
(a) If |u − v| > 1, then σ has a reduced word whose last two entries are u
and v (in this order).
(b) If |u − v| = 1, then σ has a reduced word whose last three entries are
u, v, u (in this order).
Let us now discuss some ways to transform Coxeter words. For instance:
• If a tuple of the form (. . . , 2, 5, . . .) (that is, a tuple that has two adjacent
entries 2 and 5) is a Coxeter word for some permutation σ ∈ Sn , then
the tuple (. . . , 5, 2, . . .) (that is, the result of swapping these two adjacent
entries) is a Coxeter word for the same permutation σ, since Proposition
5.2.5 (b) yields s2 s5 = s5 s2 .
• If a tuple of the form (. . . , 2, 3, 2, . . .) is a Coxeter word for some permuta-
tion σ ∈ Sn , then the tuple (. . . , 3, 2, 3, . . .) (that is, the result of replacing
the three adjacent entries 2, 3, 2 by 3, 2, 3, while leaving all remaining en-
tries unchanged) is a Coxeter word for the same permutation σ, since
Proposition 5.2.5 (c) yields s2 s3 s2 = s3 s2 s3 .

• If a Coxeter word for σ ∈ Sn has two adjacent entries that are equal, then
we can remove these two entries and still have a Coxeter word for σ, since
Proposition 5.2.5 (a) yields si si = s2i = id for each i ∈ [n − 1].

Generalizing these three observations, we obtain the following ways to change


a Coxeter word:

Definition A.4.20. Let n ∈ N. Let σ ∈ Sn .


Given a Coxeter word i = (i1 , i2 , . . . , ik ) for σ, we can obtain other Coxeter
words for σ by the following three kinds of transformations:
(a) We can pick two adjacent entries iu and iu+1 of i that satisfy |iu − iu+1 | >
1, and swap them (that is, replace the u-th and (u + 1)-st entries of i by iu+1
and iu , respectively). This is called a commutation move, and results in a new
Coxeter word for σ, since Proposition 5.2.5 (b) yields siu siu+1 = siu+1 siu .
For example, we can use such a move to transform the Coxeter word
(1, 2, 3, 1, 2) into (1, 2, 1, 3, 2).
(b) We can pick three adjacent entries iu , iu+1 and iu+2 of i that satisfy iu =
iu+2 = iu+1 ± 1 (by which we mean that we have either iu = iu+2 = iu+1 + 1
or iu = iu+2 = iu+1 − 1), and replace these three entries by iu+1 , iu and iu+1 ,
respectively. This is called a braid move, and results in a new Coxeter word
for σ, since we have $s_{i_u} s_{i_{u+1}} s_{i_{u+2}} = s_{i_u} s_{i_{u+1}} s_{i_u} = s_{i_{u+1}} s_{i_u} s_{i_{u+1}}$ (using $s_{i_{u+2}} = s_{i_u}$) by Proposition 5.2.5 (c).
For example, we can use such a move to transform the Coxeter word
(1, 2, 1, 3, 2) into (2, 1, 2, 3, 2), and we can use another such move to trans-
form this result further into (2, 1, 3, 2, 3).
(c) We can pick two adjacent entries iu and iu+1 of i that are equal, and
remove both of them from i. This is called a contraction move, and results
in a new Coxeter word for σ, since we have $s_{i_u} s_{i_{u+1}} = s_{i_u} s_{i_u} = s_{i_u}^2 = \operatorname{id}$ (using $s_{i_{u+1}} = s_{i_u}$) by Proposition 5.2.5 (a).
For example, we can use such a move to transform the Coxeter word
(3, 2, 2, 1) into (3, 1).

Of course, these transformations can only be applied to a Coxeter word when


the respective requirements are met. For example, none of these transforma-
tions can be applied to the Coxeter word (1, 2), since it has neither two adjacent
entries iu and iu+1 that satisfy |iu − iu+1 | > 1, nor three adjacent entries iu , iu+1
and iu+2 that satisfy iu = iu+2 = iu+1 ± 1, nor two adjacent entries iu and iu+1
that are equal. On the other hand, we can apply any of the three kinds of
transformation to the Coxeter word (4, 1, 2, 1, 2, 2, 4, 4), and we even have mul-
tiple choices for each of them (e.g., we can apply a braid move to replace the
“1, 2, 1” by “2, 1, 2”, but we can also apply a braid move to replace the “2, 1, 2”

by “1, 2, 1” instead). Thus, starting with a single Coxeter word, we obtain a


whole tapestry of Coxeter words by applying moves to it. Note that each com-
mutation move and each braid move can be undone by a move of the same
kind, while contraction moves cannot be undone.
We notice that commutation moves and braid moves don’t change the length
of a Coxeter word. Thus, if they are applied to a reduced word, they result in
another reduced word.

Example A.4.21. Let n = 4, and let σ ∈ S4 be the permutation whose OLN is 4321. It is easy to see that (1, 2, 1, 3, 2, 1) is a reduced word for σ. Omitting commas and parentheses, we shorten this reduced word to 121321. By repeatedly applying commutation moves and braid moves, we can transform this reduced word into 212321, then further into 213231, then into 231231, and so on, and in other directions too (as there are often several moves available). Let us draw the result as a graph, with the nodes being all the reduced words that we obtain, and the edges signifying braid moves and commutation moves (the thick edges stand for commutation moves):

[Figure: a graph whose 16 nodes are the reduced words 121321, 123121, 212321, 123212, 213231, 132312, 213213, 231231, 312312, 132132, 231213, 312132, 232123, 321232, 323123, 321323, with edges given by braid moves and commutation moves.]

It turns out that each reduced word for σ appears as a node on this graph.
In other words, each reduced word for σ can be obtained from 121321 by a
sequence of commutation moves and braid moves. This is not a coincidence,
but a general result, known as Matsumoto’s theorem for the symmetric group:

Exercise A.4.10.5. 6 Let n ∈ N. Let σ ∈ Sn . Let i and j be two reduced words


for σ. Prove that i can be transformed into j by a sequence of commutation
moves and braid moves.
[Hint: Induct on ℓ (σ).]

The graph in Example A.4.21 is furthermore bipartite; better yet, any cycle
has an even # of thick edges and an even # of thin edges. This, too, is not a
coincidence:

Exercise A.4.10.6. Let n ∈ N. Let σ ∈ Sn . Let i be a reduced word for σ.


Assume that we have transformed i into itself by a sequence of commutation
moves and braid moves.
(a) 3 Prove that the # of braid moves in this sequence must be even.
(b) 7 Prove that the # of commutation moves in this sequence must be
even.
By throwing contraction moves into the mix, we can furthermore reduce non-
reduced Coxeter words:

Exercise A.4.10.7. 6 Let n ∈ N. Let σ ∈ Sn . Let i be a Coxeter word for σ,


and let j be a reduced word for σ. Prove that i can be transformed into j by
a sequence of commutation moves, braid moves and contraction moves.

How many reduced words does a given permutation σ ∈ Sn have? For most
σ, there is no nice formula for the answer. However, in at least one specific case,
a surprising (and deep) formula exists, which I am here mentioning less as a
reasonable exercise than as a curiosity:

Exercise A.4.10.8. 50 Let n ∈ N. Define the permutation w0 ∈ Sn as in


Exercise A.4.2.2. (Note that the σ in Example A.4.21 is the w0 for n = 4.)
Prove that the # of reduced words for w0 is

$$\frac{\dbinom{n}{2}!}{\prod\limits_{i=1}^{n} (2n-2i+1)^{i-1}} = \binom{n}{2}! \cdot \prod_{1 \le i < j \le n} \frac{2}{i+j-1}.$$

( 2 Prove the equality sign here.)

This number, incidentally, is the largest # of reduced words that a permuta-


tion in Sn can have. On the other extreme (both of the # of reduced words and
the difficulty of the proof), here is a characterization of permutations that have
a unique reduced word:

Exercise A.4.10.9. 3 Let n be a positive integer. Let σ ∈ Sn . Prove that there


is only one reduced word for σ if and only if σ has the form cyci,i+1,i+2,...,j or
cyc j,j−1,j−2,...,i for some i, j ∈ [n] satisfying i ≤ j. (If i = j, then these cycles
are just id.)

There is a connection between braid moves and pattern avoidance (Exercise


A.4.8.2):

Exercise A.4.10.10. 4 Let n ∈ N. Let σ ∈ Sn . Prove that σ is 321-avoiding


if and only if every two reduced words for σ can be transformed into each
other by a sequence of commutation moves (without using any braid moves).

As an application of reduced words, the following group-theoretical charac-


terization of the symmetric group Sn easily follows:

Exercise A.4.10.11. 3 Let n ∈ N. Prove that the group Sn is isomorphic to


the group with generators g1 , g2 , . . . , gn−1 and relations

$$\begin{aligned} g_i^2 &= 1 && \text{for all } i \in [n-1]; \\ g_i g_j &= g_j g_i && \text{for all } i, j \in [n-1] \text{ satisfying } |i-j| > 1; \\ g_i g_{i+1} g_i &= g_{i+1} g_i g_{i+1} && \text{for all } i \in [n-2]. \end{aligned}$$

(The isomorphism sends each gi to si ∈ Sn .)

This is known as the Coxeter-Moore presentation of Sn .

A.4.11. Descents
Descents are one of the most elementary features of a permutation σ ∈ Sn : they
are just the positions at which σ decreases (from that position to the next).
Formally, they are defined as follows:

Definition A.4.22. Let n ∈ N. Let σ ∈ Sn be a permutation.


(a) A descent of σ means an i ∈ [n − 1] such that σ (i ) > σ (i + 1).
(b) The descent set of σ is defined to be the set of all descents of σ. This set
is denoted by Des σ.

Example A.4.23. The permutation σ ∈ S7 with OLN 3146275 has descents 1


(since σ (1) > σ (2)) and 4 (since σ (4) > σ (5)) and 6 (since σ (6) > σ (7)).
Thus, it has descent set Des σ = {1, 4, 6}.

Exercise A.4.11.1. 2 Let n ∈ N.


(a) How many σ ∈ Sn have exactly 0 descents?
(b) How many σ ∈ Sn have exactly 1 descent?
(c) How many σ ∈ Sn have exactly n − 1 descents?
(d) Prove that the # of all σ ∈ Sn satisfying 1 ∈ Des σ (that is, σ(1) > σ(2)) is n!/2. (Here, we assume that n ≥ 2.)
(e) Prove that the # of all σ ∈ Sn satisfying 1, 2 ∈ Des σ (that is, σ(1) > σ(2) > σ(3)) is n!/6. (Here, we assume that n ≥ 3.)
(f) How many σ ∈ Sn satisfy 1, 3 ∈ Des σ (that is, σ(1) > σ(2) and σ(3) > σ(4))? (Here, we assume that n ≥ 4.)
The following exercise generalizes parts (d), (e) and (f) of Exercise A.4.11.1:
Exercise A.4.11.2. Let n ∈ N. Let I be a subset of [n − 1]. Write I in the form
I = {c1 , c2 , . . . , ck } with c1 < c2 < · · · < ck . Set c0 := 0 and ck+1 := n. For
each i ∈ [k + 1], set di := ci − ci−1 . Note that the k + 1 numbers d1 , d2 , . . . , dk+1
are precisely the lengths of the intervals into which the elements of I subdi-
vide the interval [0, n].
(a) 3 Prove that

$$(\text{\# of } \sigma \in S_n \text{ satisfying } \operatorname{Des} \sigma \subseteq I) = \frac{n!}{d_1!\, d_2! \cdots d_{k+1}!}.$$

(b) 5 Let us use the notations from Definition 4.4.15 (b). Prove that

$$\sum_{\substack{\sigma \in S_n; \\ \operatorname{Des} \sigma \subseteq I}} q^{\ell(\sigma)} = \frac{[n]_q!}{[d_1]_q!\, [d_2]_q! \cdots [d_{k+1}]_q!} \qquad \text{in the ring } \mathbb{Z}[q].$$

Note that Exercise A.4.11.2 (b) generalizes both Exercise A.4.11.2 (a) (obtained
by setting q = 1) and Proposition 5.3.5 (obtained by setting I = [n − 1] and
q = x).
What about permutations σ ∈ Sn satisfying Des σ = I rather than Des σ ⊆ I
? See Exercise A.5.4.3 further below for this.
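The multinomial count in Exercise A.4.11.2 (a) is easy to confirm exhaustively for n = 6 (an illustration, not part of the notes):

```python
from itertools import permutations
from math import factorial

def descent_set(p):
    return {i + 1 for i in range(len(p) - 1) if p[i] > p[i + 1]}

n = 6
perms = list(permutations(range(1, n + 1)))
for mask in range(2 ** (n - 1)):                 # all subsets I of [n-1]
    I = {k + 1 for k in range(n - 1) if (mask >> k) & 1}
    cs = [0] + sorted(I) + [n]
    ds = [cs[j + 1] - cs[j] for j in range(len(cs) - 1)]
    predicted = factorial(n)
    for d in ds:
        predicted //= factorial(d)               # n! / (d_1! d_2! ... d_{k+1}!)
    count = sum(1 for p in perms if descent_set(p) <= I)
    assert count == predicted, I
```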
Meanwhile, let us connect descents with Eulerian polynomials:

Exercise A.4.11.3. 5 Let n be a positive integer. Consider the polynomials


Am ∈ Z[x] defined for all m ∈ N in Exercise A.2.6.2. Prove that

$$\sum_{\sigma \in S_n} x^{|\operatorname{Des} \sigma| + 1} = A_n \qquad \text{in the ring } \mathbb{Z}[x].$$

(For example, for n = 4, we have $\sum_{\sigma \in S_4} x^{|\operatorname{Des}\sigma|+1} = x + 11x^2 + 11x^3 + x^4 = A_4$.)
This exercise can also be q-generalized:

Exercise A.4.11.4. 7 Let n be a positive integer. Consider the polynomials


Aq,m ∈ Z [q, x ] defined for all m ∈ N in Exercise A.3.4.17. (We know from
Exercise A.3.4.17 (h) that they are polynomials.) Prove that

$$\sum_{\sigma \in S_n} x^{|\operatorname{Des}\sigma|+1} q^{\operatorname{sum}(\operatorname{Des}\sigma)} = A_{q,n} \qquad \text{in the ring } \mathbb{Z}[q, x].$$

Here, sum S is defined as in Proposition 4.4.7 (b).


The following exercise is a celebrated result of Foata:
Exercise A.4.11.5. Let n ∈ N. For any permutation σ, we let maj σ denote
the sum of all descents of σ. (This is known as the major index of σ, and
is precisely the number sum (Des σ) from Exercise A.4.11.4. For example, if
σ ∈ S5 has OLN 34152, then Des σ = {2, 4} and maj σ = 2 + 4 = 6.)
(a) 2 Prove that

$$\sum_{\sigma \in S_n} x^{\ell(\sigma)} = \sum_{\sigma \in S_n} x^{\operatorname{maj}\sigma} \qquad \text{in } \mathbb{Z}[x].$$

In other words, prove that each i ∈ N satisfies

(# of permutations σ ∈ Sn with ℓ (σ) = i )


= (# of permutations σ ∈ Sn with maj σ = i ) .

(b) 8 Prove that


$$\sum_{\sigma \in S_n} x^{\ell(\sigma)} y^{\operatorname{maj}\sigma} = \sum_{\sigma \in S_n} x^{\operatorname{maj}\sigma} y^{\ell(\sigma)} = \sum_{\sigma \in S_n} x^{\operatorname{maj}\sigma} y^{\operatorname{maj}\left(\sigma^{-1}\right)} \qquad \text{in } \mathbb{Z}[x, y].$$

In other words, prove that each i ∈ N and j ∈ N satisfy

(# of permutations σ ∈ Sn with ℓ (σ) = i and maj σ = j)


= (# of permutations σ ∈ Sn with maj σ = i and ℓ(σ) = j)
= (# of permutations σ ∈ Sn with maj σ = i and maj(σ^{−1}) = j).
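Both parts of Foata's equidistribution result can be confirmed by exhaustive search for n ≤ 6 (an illustration, not part of the notes):

```python
from collections import Counter
from itertools import permutations

def coxeter_length(p):      # l(sigma) = number of inversions
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def maj(p):                 # major index = sum of all descents
    return sum(i + 1 for i in range(len(p) - 1) if p[i] > p[i + 1])

assert maj((3, 4, 1, 5, 2)) == 2 + 4     # the example sigma = 34152

for n in range(7):
    perms = list(permutations(range(1, n + 1)))
    # part (a): l and maj are equidistributed on S_n
    assert Counter(map(coxeter_length, perms)) == Counter(map(maj, perms))
    # part (b): the joint distribution of (l, maj) is symmetric
    assert (Counter((coxeter_length(p), maj(p)) for p in perms)
            == Counter((maj(p), coxeter_length(p)) for p in perms))
```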

A.5. Alternating sums, signed counting and determinants


The notations of Chapter 6 shall be used here.

A.5.1. Cancellations in alternating sums

Exercise A.5.1.1. 3 Prove a generalization of Lemma 6.1.4 in which f is only


required to be a bijection, not an involution, but the assumption “sign I = 0
for all I ∈ X satisfying f ( I ) = I” is replaced by the stronger assumption
“sign I = 0 for all I ∈ X and all odd k ∈ N satisfying f k ( I ) = I”.

Exercise A.5.1.2. Recall the concepts of Dyck words and Dyck paths defined
in Example 2 in Section 3.1.
Let n ∈ N.
If w ∈ {0, 1}2n is a 2n-tuple, and if k ∈ {0, 1, . . . , 2n}, then we define the
k-height hk (w) of w to be the number

(# of 1’s among the first k entries of w)


− (# of 0’s among the first k entries of w) .

If w is a Dyck word, then this k-height hk (w) is a nonnegative integer.


[For example, if n = 2 and w = (1, 0, 0, 1), then h3(w) = 1 − 2 = −1 < 0,
which shows that w is not a Dyck word.]
Furthermore, if w ∈ {0, 1}2n is a 2n-tuple, then we define the area area (w)
of w to be the number
2n
area (w) := ∑ hk (w) ,
k =0

and we define the sign sign(w) of w to be the number $(-1)^{(\operatorname{area}(w)-n)/2}$ (we
will soon see that this is well-defined).
[For example, if n = 5 and w = (1, 1, 0, 1, 1, 0, 0, 0, 1, 0), then

$$\operatorname{area}(w) = \sum_{k=0}^{10} h_k(w) = 0 + 1 + 2 + 1 + 2 + 3 + 2 + 1 + 0 + 1 + 0 = 13$$

and $\operatorname{sign}(w) = (-1)^{(\operatorname{area}(w)-n)/2} = (-1)^{(13-5)/2} = 1$. The names “k-height”


and “area” are not accidental: If w is a Dyck word, then the “heights”
h0 (w) , h1 (w) , . . . , h2n (w) really are the heights (i.e., the y-coordinates) of
the points on the Dyck path corresponding to the Dyck word w; further-
more, the number area (w) really is the area of the “mountain range” under
the Dyck path.]

(a) 1 Prove that any w ∈ {0, 1}^{2n} satisfies (area(w) − n)/2 ∈ Z (so that
$(-1)^{(\operatorname{area}(w)-n)/2}$ really is well-defined).
(b) 2 Prove that a 2n-tuple w ∈ {0, 1}2n is a Dyck word of length 2n if and
only if it satisfies

h2i−1 (w) ≥ 0 for all i ∈ {1, 2, . . . , n}

and h2n (w) = 0.


(c) 4 Recall the Catalan numbers c0 , c1 , c2 , . . . as introduced in Section 3.1.
Assume that n is a positive integer. Prove that
$$\sum_{\substack{w \text{ is a Dyck word} \\ \text{of length } 2n}} \operatorname{sign}(w) = \begin{cases} (-1)^{(n-1)/2}\, c_{(n-1)/2}, & \text{if } n \text{ is odd}; \\ 0, & \text{if } n \text{ is even}. \end{cases}$$

[Hint: In part (c), find a sign-reversing involution on a certain


set of Dyck words of length 2n that preserves all the “odd heights”
h1 (w) , h3 (w) , . . . , h2n−1 (w) while changing one of the “even heights” hk (w)
by 2.]
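Part (c) can be confirmed by enumerating all of {0, 1}^{2n} for small n (an illustration, not part of the notes; it uses the closed formula for the Catalan numbers):

```python
from itertools import product
from math import comb

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def signed_sum(n):
    """Sum of sign(w) over all Dyck words w of length 2n."""
    total = 0
    for w in product((1, 0), repeat=2 * n):
        h, heights, ok = 0, [0], True
        for step in w:
            h += 1 if step == 1 else -1
            ok = ok and h >= 0
            heights.append(h)
        if ok and h == 0:                    # w is a Dyck word
            area = sum(heights)              # area(w) = h_0(w) + ... + h_{2n}(w)
            total += (-1) ** ((area - n) // 2)
    return total

for n in range(1, 8):
    expected = (-1) ** ((n - 1) // 2) * catalan((n - 1) // 2) if n % 2 == 1 else 0
    assert signed_sum(n) == expected, n
```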

The next exercise is not about alternating sums, but rather about proving the
q-Lucas theorem (Theorem 6.1.7):
Exercise A.5.1.3. Let K be a field. Let d be a positive integer. Let ω be a
primitive d-th root of unity in K.
 
(a) 2 Prove that $\dbinom{d}{k}_\omega = 0$ for each k ∈ {1, 2, . . . , d − 1}.
Now, let A be a noncommutative K-algebra, and let a, b ∈ A be such that
ba = ωab.
(b) 1 Prove that $(a+b)^d = a^d + b^d$.
(c) 3 Prove that $a^d$ and $b^d$ commute with both a and b (that is, we have uv = vu for each $u \in \{a^d, b^d\}$ and each $v \in \{a, b\}$).
(d) 5 Prove Theorem 6.1.7.
 
[Hint: For part (a), show that $\dbinom{n}{k}_q = \dfrac{[n]_q}{[k]_q} \dbinom{n-1}{k-1}_q$ for all n > 0 and k > 0. For part (d), first construct a noncommutative K-algebra A and two elements a, b ∈ A satisfying ba = ωab and such that all the monomials $a^i b^j$ are K-linearly independent. Use Exercise A.3.4.15 for this. In this K-algebra, expand both sides of $(a+b)^n = \left((a+b)^d\right)^q (a+b)^r$. Alternatively, there is a commutative approach using Theorem 4.4.19.]

A.5.2. The principles of inclusion and exclusion

Exercise A.5.2.1. 3 Let n ∈ N. Prove that

    ∑_{k=0}^{n} (−1)^k \binom{n}{k} (n − k)^{n+1} = \binom{n+1}{2} · n!.

Exercise A.5.2.2. 1 Let n, m ∈ N. Prove that

(# of surjective maps f : [m] → [n]) = n! · S (m, n) ,

where S (m, n) is the Stirling number of the 2nd kind (as defined in Exercise
A.2.6.5).
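The equality of Exercise A.5.2.2 is easy to test numerically. Here is a small Python sketch (a sanity check, not a proof) that counts surjections by brute force; it computes S(m, n) by the standard recurrence S(i, j) = j·S(i−1, j) + S(i−1, j−1), which we assume agrees with the definition from Exercise A.2.6.5:

```python
from itertools import product
from math import factorial

def count_surjections(m, n):
    # brute force: count maps f : [m] -> [n] that attain every value in [n]
    return sum(1 for f in product(range(n), repeat=m)
               if set(f) == set(range(n)))

def stirling2(m, n):
    # Stirling numbers of the 2nd kind via the standard recurrence
    S = [[0] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[m][n]

assert all(count_surjections(m, n) == factorial(n) * stirling2(m, n)
           for m in range(6) for n in range(5))
```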
The next exercise is concerned with the derangement numbers D_n from Defini-
tion 6.2.4.

Exercise A.5.2.3. (a) 1 Prove that D_n = nD_{n−1} + (−1)^n for all n ≥ 1.

(b) 1 Prove that D_n = (n − 1)(D_{n−1} + D_{n−2}) for all n ≥ 2.

(c) 2 Prove that n! = ∑_{k=0}^{n} \binom{n}{k} D_{n−k} for all n ∈ N.

(d) 1 Show that ∑_{n∈N} (D_n / n!) x^n = exp[−x] / (1 − x) in the FPS ring Q[[x]].
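Parts (a)–(c) of this exercise can be sanity-checked in a few lines of Python (assuming the standard convention D_0 = 1):

```python
from math import comb, factorial

# derangement numbers via part (a): D_0 = 1, D_n = n*D_{n-1} + (-1)^n
D = [1]
for n in range(1, 11):
    D.append(n * D[n - 1] + (-1) ** n)

# (b): D_n = (n-1)(D_{n-1} + D_{n-2})
assert all(D[n] == (n - 1) * (D[n - 1] + D[n - 2]) for n in range(2, 11))

# (c): n! = sum_k binom(n,k) * D_{n-k}
assert all(factorial(n) == sum(comb(n, k) * D[n - k] for k in range(n + 1))
           for n in range(11))
```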

Exercise A.5.2.4. 3 Reprove Theorem 3.9.8 using the PIE.

Exercise A.5.2.5. 4 For any n, m ∈ N, we define a polynomial Z_{m,n} ∈ Z[x]
by

    Z_{m,n} = ∑_{k=0}^{n} (−1)^k \binom{n}{k} (x^{n−k} − 1)^m.

Prove that Z_{m,n} = Z_{n,m} for all m, n ∈ N.

Exercise A.5.2.6. Let n be a positive integer. Let a_1, a_2, . . . , a_n be any n inte-
gers.

(a) 4 Show that

    max{a_1, a_2, . . . , a_n} = ∑_{k=1}^{n} (−1)^{k−1} ∑_{1≤i_1<i_2<···<i_k≤n} min{a_{i_1}, a_{i_2}, . . . , a_{i_k}}.

(b) 2 More generally: Show that

    F(max{a_1, a_2, . . . , a_n}) = ∑_{k=1}^{n} (−1)^{k−1} ∑_{1≤i_1<i_2<···<i_k≤n} F(min{a_{i_1}, a_{i_2}, . . . , a_{i_k}})

for any function F : Z → R.
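Part (a) is easy to check numerically; here is a short Python sketch (a sanity check only):

```python
from itertools import combinations
import random

def pie_max(a):
    # right-hand side of part (a): alternating sum of minima over nonempty subsets
    n = len(a)
    return sum((-1) ** (k - 1) * sum(min(a[i] for i in S)
                                     for S in combinations(range(n), k))
               for k in range(1, n + 1))

random.seed(0)
for _ in range(200):
    a = [random.randint(-10, 10) for _ in range(random.randint(1, 6))]
    assert pie_max(a) == max(a)
```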


The following exercise is about a sequence of rather useful identities, some-
times known as the polarization identities:

Exercise A.5.2.7. Let n ∈ N. Let A be a commutative ring. Let v_1, v_2, . . . , v_n ∈
A and w ∈ A. Prove the following:

(a) 4 For each m ∈ N, we have

    ∑_{I⊆[n]} (−1)^{n−|I|} (w + ∑_{i∈I} v_i)^m = ∑_{(i_1,i_2,...,i_m)∈{0,1,...,n}^m; [n]⊆{i_1,i_2,...,i_m}} v_{i_1} v_{i_2} · · · v_{i_m},

where we set v_0 := w.

(b) 1 For each m ∈ {0, 1, . . . , n − 1}, we have

    ∑_{I⊆[n]} (−1)^{n−|I|} (w + ∑_{i∈I} v_i)^m = 0.

(c) 1 We have

    ∑_{I⊆[n]} (−1)^{n−|I|} (w + ∑_{i∈I} v_i)^n = n! v_1 v_2 · · · v_n.

(d) 2 We have

    ∑_{I⊆[n]} (−1)^{n−|I|} (∑_{i∈I} v_i − ∑_{i∈[n]\I} v_i)^n = 2^n n! v_1 v_2 · · · v_n.
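Parts (c) and (d) can be verified numerically over Z (a sanity check, not a proof; the particular values of v_1, . . . , v_4 and w are arbitrary):

```python
from itertools import combinations, chain
from math import factorial

def subsets(n):
    return chain.from_iterable(combinations(range(n), k) for k in range(n + 1))

n = 4
v = [2, 3, 5, 7]
w = 11

# (c): alternating sum of (w + sum_{i in I} v_i)^n equals n! v_1 ... v_n
lhs_c = sum((-1) ** (n - len(I)) * (w + sum(v[i] for i in I)) ** n
            for I in subsets(n))
assert lhs_c == factorial(n) * 2 * 3 * 5 * 7

# (d): with the difference of the two partial sums, an extra factor 2^n appears
lhs = sum((-1) ** (n - len(I))
          * (sum(v[i] for i in I)
             - sum(v[i] for i in range(n) if i not in I)) ** n
          for I in subsets(n))
assert lhs == 2 ** n * factorial(n) * 2 * 3 * 5 * 7
```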

Exercise A.5.2.8. 5 Let A and B be two finite sets. Let R be a subset of A × B.


For any subset X of A, we define M ( X ) to be the set

{b ∈ B | there exists some x ∈ X such that ( x, b) ∈ R} .

For any subset Y of B, we define N (Y ) to be the set

{ a ∈ A | there exists some y ∈ Y such that ( a, y) ∈ R} .

Prove that

    ∑_{X⊆A; M(X)=B} (−1)^{|X|} = ∑_{Y⊆B; N(Y)=A} (−1)^{|Y|}.

[Remark: Those familiar with graph theory can think of A, B and R as


forming a bipartite graph (with vertex set A ⊔ B and edge set R). In that
case, M ( X ) is the “neighbor set” of X (that is, the set of all vertices that have
at least one neighbor in X), and likewise N (Y ) is the “neighbor set” of Y.]

Exercise A.5.2.9. 5 Let n > 1 be an integer. Consider n people standing in a
circle. Each of them looks down at someone else’s feet (i.e., at the feet of one
of the other n − 1 persons). A bell sounds, and every person (simultaneously)
looks up at the eyes of the person whose feet they have been ogling. If two
people make eye contact, they scream. Show that the probability that no one
screams is

    ∑_{k=0}^{n} (−1)^k · (n(n − 1) · · · (n − 2k + 1)) / ((n − 1)^{2k} · 2^k · k!).

Here is a combinatorial restatement of the question (if you prefer not to
deal with probabilities): A pair (i, j) of elements of [n] is said to scream at a
map f : [n] → [n] if it satisfies f(i) = j and f(j) = i. A map f : [n] → [n]
is silent if no pair (i, j) ∈ [n] × [n] screams at f. Prove that the # of all silent
maps f : [n] → [n] is

    ∑_{k=0}^{n} (−1)^k · (n(n − 1) · · · (n − 2k + 1)) / (2^k · k!) · (n − 1)^{n−2k}.
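The combinatorial restatement lends itself to a brute-force check for small n (a sanity check, not part of the exercise):

```python
from itertools import product
from math import factorial

def silent_count(n):
    # brute force over all maps f : [n] -> [n]; a pair (i,j) (possibly i = j)
    # screams if f(i) = j and f(j) = i
    cnt = 0
    for f in product(range(n), repeat=n):
        if not any(f[i] == j and f[j] == i
                   for i in range(n) for j in range(n)):
            cnt += 1
    return cnt

def formula(n):
    total = 0
    for k in range(n // 2 + 1):
        falling = 1
        for r in range(2 * k):
            falling *= n - r
        pairs = falling // (2 ** k * factorial(k))  # ways to pick k disjoint pairs
        total += (-1) ** k * pairs * (n - 1) ** (n - 2 * k)
    return total

assert all(silent_count(n) == formula(n) for n in range(2, 7))
```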

The following two exercises show some applications of the methods of Chap-
ter 6 to graph theory.

Exercise A.5.2.10. Let G be a finite undirected graph with vertex set V and
edge set E. Fix n ∈ N.
An n-coloring of G means a map c : V → [n]. If c : V → [n] is an n-coloring,
then we regard the values c (v) of c as the “colors” of the respective vertices
v.
An n-coloring c of G is said to be proper if there exists no edge of G whose
two endpoints v and w satisfy c (v) = c (w). (In other words, an n-coloring
of G is said to be proper if and only if there is no edge whose two endpoints
have the same color.)
Let χG (n) denote the # of proper n-colorings of G.
(a) 5 Prove that

    χ_G(n) = ∑_{F⊆E} (−1)^{|F|} n^{conn(V,F)},

where conn(V, F) denotes the # of connected components of the graph with
vertex set V and edge set F.

This shows, in particular, that χG (n) is a polynomial function in n. (The


corresponding polynomial is known as the chromatic polynomial of G.)
(b) 1 Find an explicit formula for χG (n) if G is a path graph
1 2 3 ··· m with m vertices.

(c) 2 Find an explicit formula for χG (n) if G is a cycle graph with m


vertices.
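The formula of part (a) can be tested on a concrete graph. The following Python sketch (a sanity check; the graph chosen here is a 4-cycle with a chord, not one from the text) compares it against a brute-force count of proper colorings:

```python
from itertools import product, combinations

def conn(vertices, edge_list):
    # number of connected components, via a small union-find
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edge_list:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})

def chromatic_brute(vertices, edges, n):
    return sum(1 for c in product(range(n), repeat=len(vertices))
               if all(c[u] != c[v] for u, v in edges))

def chromatic_pie(vertices, edges, n):
    total = 0
    for k in range(len(edges) + 1):
        for F in combinations(edges, k):
            total += (-1) ** k * n ** conn(vertices, F)
    return total

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # 4-cycle plus chord 0-2
assert all(chromatic_brute(V, E, n) == chromatic_pie(V, E, n) for n in range(5))
```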

Exercise A.5.2.11. 7 Let G be an undirected graph with vertex set V and


edge set E. Fix a vertex v ∈ V.
Given any subset F of E, we define an F-path to be a path of G whose edges
all belong to F.
A subset F of E is said to infect an edge e ∈ E if there is an F-path leading
from v to some endpoint of e. (Note that this is automatically satisfied if v is
an endpoint of e, since the empty path is always an F-path.)
A subset F of E is said to be pandemic if it infects each edge e ∈ E.
Prove that

    ∑_{F⊆E is pandemic} (−1)^{|F|} = [E = ∅].

[Example: Let G be the graph with vertex set {v, w, t, p, q, r} and with edge
set E = {1, 2, . . . , 8}, where the edges are

    1 = vp,   2 = pq,   3 = qr,   4 = rt,
    5 = wt,   6 = vw,   7 = wr,   8 = wp

(picture omitted here). Then, for example, the set
{1, 2} ⊆ E infects edges 1, 2, 3, 6, 8 (but none of the other edges). The set
{1, 2, 5} infects the same edges as {1, 2} (indeed, the additional edge 5 does
not increase its infectiousness, since it is not on any {1, 2, 5}-path from v).
The set {1, 2, 3} infects every edge other than 5. The set {1, 2, 3, 4} infects
each edge, and thus is pandemic.]

Exercise A.5.2.12. 3 Let K be a commutative ring. Let n ∈ N. Let A =
(a_{i,j})_{1≤i≤n, 1≤j≤n} ∈ K^{n×n} be an n × n-matrix. Then, the permanent per A of A
is defined to be the element

    ∑_{σ∈S_n} a_{1,σ(1)} a_{2,σ(2)} · · · a_{n,σ(n)}

of K (where S_n is the n-th symmetric group). Prove the Ryser formula

    per A = (−1)^n ∑_{I⊆[n]} (−1)^{|I|} ∏_{j=1}^{n} ∑_{i∈I} a_{i,j}.
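The Ryser formula is easy to test numerically (a sanity check, not a proof; the matrix below is an arbitrary example):

```python
from itertools import permutations, combinations

def per_direct(A):
    # permanent via the defining sum over permutations
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        p = 1
        for i, j in enumerate(sigma):
            p *= A[i][j]
        total += p
    return total

def per_ryser(A):
    # Ryser formula: per A = (-1)^n sum_I (-1)^{|I|} prod_j sum_{i in I} a_{i,j}
    n = len(A)
    total = 0
    for k in range(n + 1):
        for I in combinations(range(n), k):
            prod = 1
            for j in range(n):
                prod *= sum(A[i][j] for i in I)
            total += (-1) ** k * prod
    return (-1) ** n * total

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert per_direct(A) == per_ryser(A)
```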

The following exercise is a variant of Theorem 6.2.10:

Exercise A.5.2.13. 2 Let S be a finite set. Let A be any additive abelian
group. For each subset I of S, let a_I and b_I be two elements of A.
Assume that

    b_I = ∑_{J⊆I} (−1)^{|J|} a_J    for all I ⊆ S.

Then, prove that we also have

    a_I = ∑_{J⊆I} (−1)^{|J|} b_J    for all I ⊆ S.

The next exercise is an analogue of Exercise A.5.2.13 with sets replaced by


numbers:

Exercise A.5.2.14. 2 Let A be any additive abelian group. Let n ∈ N. Let
(a_0, a_1, . . . , a_n) and (b_0, b_1, . . . , b_n) be two (n + 1)-tuples of elements of A.
Assume that

    b_m = ∑_{i=0}^{m} (−1)^i \binom{m}{i} a_i    for all m ∈ {0, 1, . . . , n}.

Prove that we also have

    a_m = ∑_{i=0}^{m} (−1)^i \binom{m}{i} b_i    for all m ∈ {0, 1, . . . , n}.

[Hint: There is a direct proof, but it is perhaps neater to derive this from
Exercise A.5.2.13.]

[Remark: The (n + 1)-tuple (b_0, b_1, . . . , b_n) is called the binomial transform
of (a_0, a_1, . . . , a_n).]
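The claim (for A = Z) can be checked numerically — the binomial transform is an involution:

```python
from math import comb
import random

random.seed(1)
n = 7
a = [random.randint(-20, 20) for _ in range(n + 1)]

# binomial transform
b = [sum((-1) ** i * comb(m, i) * a[i] for i in range(m + 1))
     for m in range(n + 1)]

# applying the same transform again recovers the original tuple
a_back = [sum((-1) ** i * comb(m, i) * b[i] for i in range(m + 1))
          for m in range(n + 1)]
assert a_back == a
```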

The next few exercises show some ways of generalizing the Principle of In-
clusion and Exclusion (in its original form – Theorem 6.2.1). The first one
replaces the question “how many elements of U belong to none of the n sub-
sets A1 , A2 , . . . , An ” by “how many elements of U belong to exactly k of the n
subsets A1 , A2 , . . . , An ”:

Exercise A.5.2.15. 4 Let n ∈ N, and let U be a finite set. Let A1 , A2 , . . . , An


be n subsets of U. Let k ∈ N. Let

Sk := {u ∈ U | the # of all i ∈ [n] satisfying u ∈ Ai is k} .

Show that

    |S_k| = ∑_{I⊆[n]} (−1)^{|I|−k} \binom{|I|}{k} (# of u ∈ U that satisfy u ∈ A_i for all i ∈ I).

Exercise A.5.2.16. 5 Let n ∈ N, and let U be a finite set. Let A1 , A2 , . . . , An


be n subsets of U. Let m ∈ N.
(a) For each u ∈ U, let c(u) be the # of all i ∈ [n] satisfying u ∈ A_i. Show
that

    ∑_{I⊆[n]; |I|≤m} (−1)^{|I|} (# of u ∈ U that satisfy u ∈ A_i for all i ∈ I)
        = (−1)^m ∑_{u∈U} \binom{c(u) − 1}{m}.

(b) Conclude the Bonferroni inequalities, which say that

    ∑_{I⊆[n]; |I|≤m} (−1)^{m−|I|} (# of u ∈ U that satisfy u ∈ A_i for all i ∈ I) ≥ 0

if U = A_1 ∪ A_2 ∪ · · · ∪ A_n.

Exercise A.5.2.17. (a) 4 Find a common generalization of Exercise A.5.2.15


and Exercise A.5.2.16 that has the form

    ∑_{I⊆[n]; |I|≤m} (−1)^{|I|} \binom{|I|}{k} (# of u ∈ U that satisfy u ∈ A_i for all i ∈ I)
        = ∑_{u∈U} (some expression involving c(u)).

(b) 2 Generalize this further by including weights on the elements of U


(similarly to how Theorem 6.2.9 generalizes Theorem 6.2.1).

The next exercise generalizes Theorem 6.2.9 in a similar way as q-binomial


coefficients generalize binomial coefficients:

Exercise A.5.2.18. (a) 4 Let n ∈ N, and let U be a finite set. Let
A_1, A_2, . . . , A_n be n subsets of U. Let K be any commutative ring. Let
w : U → K be any map (i.e., let w(u) be an element of K for each u ∈ U). Let
q ∈ K. Prove that

    ∑_{u∈U} (1 + q)^{(# of i∈[n] satisfying u∈A_i)} w(u) = ∑_{I⊆[n]} q^{|I|} ∑_{u∈U; u∈A_i for all i∈I} w(u)

in K.

(b) 1 Derive Theorem 6.2.9 as a particular case of part (a).

(c) 2 Prove that each n ∈ N satisfies

    ∑_{σ∈S_n} q^{|Fix σ|} = ∑_{k=0}^{n} (n!/k!) · (q − 1)^k
in the polynomial ring Z [q]. (See Definition A.4.2 for the definition of Fix σ.)
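The identity of part (c) can be verified by brute force for a small n, evaluating both sides at several integer values of q (a sanity check only):

```python
from itertools import permutations
from math import factorial

n = 5
for q in range(-3, 4):
    # left side: sum over all permutations of q^{number of fixed points}
    lhs = sum(q ** sum(1 for i in range(n) if sigma[i] == i)
              for sigma in permutations(range(n)))
    # right side: sum_{k=0}^{n} (n!/k!) (q-1)^k
    rhs = sum(factorial(n) // factorial(k) * (q - 1) ** k
              for k in range(n + 1))
    assert lhs == rhs
```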

Next comes another counting problem that can be solved in many ways:

Exercise A.5.2.19. 3 Let A be a finite additive abelian group (with its neutral
element denoted by 0). Let n ∈ N. Show that

    (# of n-tuples (a_1, a_2, . . . , a_n) ∈ (A \ {0})^n such that a_1 + a_2 + · · · + a_n = 0)
        = ((|A| − 1)^n + (−1)^n (|A| − 1)) / |A|.
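Taking A = Z/m, the count can be verified by brute force (a sanity check, not part of the exercise):

```python
from itertools import product

def count_zero_sum(m, n):
    # A = Z/m; count n-tuples of nonzero residues summing to 0 mod m
    return sum(1 for t in product(range(1, m), repeat=n)
               if sum(t) % m == 0)

for m in range(2, 6):
    for n in range(6):
        expected = ((m - 1) ** n + (-1) ** n * (m - 1)) // m
        assert count_zero_sum(m, n) == expected
```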

Next comes a generalization of Theorem 4.1.14:

Exercise A.5.2.20. 3 Let n ∈ N and k ∈ N.


Let podd,k (n) be the # of partitions of n that have exactly k distinct even
parts. (For instance, the partition (7, 5, 4, 4, 3, 2) has exactly 2 distinct even
parts, namely 4 and 2.)
Let pdist,k (n) be the # of partitions λ of n that have exactly k numbers
appear in λ more than once (in the sense that there are exactly k distinct in-
tegers i such that i appears more than once in λ). (For instance, the partition
(7, 4, 2, 2, 1, 1, 1) has exactly 2 numbers appear more than once, namely 2 and
1.)
Prove that
podd,k (n) = pdist,k (n) .

Exercise A.5.2.21. 5 Solve Exercise A.2.14.3 (b) again using the Principle of
Inclusion and Exclusion.

A.5.3. Determinants
We fix a commutative ring K.

Exercise A.5.3.1. 3 Let n be a positive integer. Let a_1, a_2, . . . , a_n ∈ K and
b_1, b_2, . . . , b_{n−1} ∈ K and c_1, c_2, . . . , c_{n−1} ∈ K. Let A be the n × n-matrix

    ⎛ a_1  0    · · ·  0        c_1     ⎞
    ⎜ 0    a_2  · · ·  0        c_2     ⎟
    ⎜ ⋮    ⋮    ⋱      ⋮        ⋮       ⎟
    ⎜ 0    0    · · ·  a_{n−1}  c_{n−1} ⎟
    ⎝ b_1  b_2  · · ·  b_{n−1}  a_n     ⎠

(This is the matrix whose (i, j)-th entry is

    a_i, if i = j;
    b_j, if i = n and j ≠ n;
    c_i, if i ≠ n and j = n;
    0,   if i ≠ n and j ≠ n and i ≠ j

for all i ∈ [n] and j ∈ [n].) Prove that

    det A = a_1 a_2 · · · a_n − ∑_{i=1}^{n−1} b_i c_i ∏_{j∈[n−1]; j≠i} a_j.

Exercise A.5.3.2. 3 Let n ∈ N. Let A be an n × n-matrix. Let b_1, b_2, . . . , b_n be
n elements of K. Prove that

    ∑_{k=1}^{n} det((A_{i,j} b_i^{[j=k]})_{1≤i≤n, 1≤j≤n}) = (b_1 + b_2 + · · · + b_n) det A

(where we are using Definition A.1.5). Equivalently (rewritten in a friendlier
but longer form): Prove that

      det ⎛ A_{1,1}b_1  A_{1,2}  · · ·  A_{1,n} ⎞         ⎛ A_{1,1}  A_{1,2}b_1  · · ·  A_{1,n} ⎞
          ⎜ A_{2,1}b_2  A_{2,2}  · · ·  A_{2,n} ⎟  + det  ⎜ A_{2,1}  A_{2,2}b_2  · · ·  A_{2,n} ⎟
          ⎜     ⋮          ⋮      ⋱        ⋮    ⎟         ⎜    ⋮         ⋮        ⋱        ⋮    ⎟
          ⎝ A_{n,1}b_n  A_{n,2}  · · ·  A_{n,n} ⎠         ⎝ A_{n,1}  A_{n,2}b_n  · · ·  A_{n,n} ⎠

                        ⎛ A_{1,1}  A_{1,2}  · · ·  A_{1,n}b_1 ⎞
          + · · · + det ⎜ A_{2,1}  A_{2,2}  · · ·  A_{2,n}b_2 ⎟
                        ⎜    ⋮        ⋮      ⋱         ⋮      ⎟
                        ⎝ A_{n,1}  A_{n,2}  · · ·  A_{n,n}b_n ⎠

                                      ⎛ A_{1,1}  A_{1,2}  · · ·  A_{1,n} ⎞
      = (b_1 + b_2 + · · · + b_n) det ⎜ A_{2,1}  A_{2,2}  · · ·  A_{2,n} ⎟ .
                                      ⎜    ⮟        ⋮      ⋱        ⋮    ⎟
                                      ⎝ A_{n,1}  A_{n,2}  · · ·  A_{n,n} ⎠

Exercise A.5.3.3. Let n be a positive integer. Let A ∈ K^{n×n} be an n × n-matrix.

(a) 4 Prove that the equality

    det((A_{i,j} A_{n,n} − A_{i,n} A_{n,j})_{1≤i≤n−1, 1≤j≤n−1}) = A_{n,n}^{n−2} · det A

holds if the element A_{n,n} of K is invertible.

(b) 2 Prove that this equality also holds if n ≥ 2 (whether or not A_{n,n} is
invertible).

[Hint: For part (a), observe that A_{i,j} A_{n,n} − A_{i,n} A_{n,j} = A_{n,n} · (A_{i,j} − (A_{i,n} / A_{n,n}) A_{n,j}).]
The following two exercises give some applications of determinants:
Exercise A.5.3.4. Let n ∈ N. Let a_1, a_2, . . . , a_n ∈ K and b_1, b_2, . . . , b_n ∈ K.

(a) 2 Use the Cauchy–Binet identity (Theorem 6.4.18, applied to appro-
priate 2 × n- and n × 2-matrices) to show that

    (∑_{k=1}^{n} a_k²)(∑_{k=1}^{n} b_k²) − (∑_{k=1}^{n} a_k b_k)² = ∑_{1≤i<j≤n} (a_i b_j − a_j b_i)².

(b) 1 If K = R, then conclude the Cauchy–Schwarz inequality

    (∑_{k=1}^{n} a_k²)(∑_{k=1}^{n} b_k²) ≥ (∑_{k=1}^{n} a_k b_k)².

Exercise A.5.3.5. Let n be a positive integer.

(a) 2 Prove that

    ∑_{σ∈S_n is a derangement} (−1)^σ = (−1)^{n−1} (n − 1).

(See Definition 6.2.4 for the notion of a derangement.)

(b) 2 Prove that

    ∑_{σ∈S_n} (−1)^σ x^{|Fix σ|} = (x + n − 1)(x − 1)^{n−1}

for any x ∈ K. (See Definition A.4.2 (b) for the definition of |Fix σ|.)

(c) 3 Prove that

    ∑_{σ∈S_n} (−1)^σ / (|Fix σ| + 1) = (−1)^{n+1} · n/(n + 1).

(This is Problem B6 on the Putnam competition 2005.)
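All three parts can be sanity-checked by brute force over S_n for a small n:

```python
from itertools import permutations
from fractions import Fraction

def sign(sigma):
    # sign via the number of inversions
    n = len(sigma)
    inv = sum(1 for i in range(n) for j in range(i + 1, n)
              if sigma[i] > sigma[j])
    return (-1) ** inv

n = 5
perms = list(permutations(range(n)))
fix = [sum(1 for i in range(n) if s[i] == i) for s in perms]

# (a): signed count of derangements
assert sum(sign(s) for s, f in zip(perms, fix) if f == 0) \
       == (-1) ** (n - 1) * (n - 1)

# (b): evaluated at x = 3
x = 3
assert sum(sign(s) * x ** f for s, f in zip(perms, fix)) \
       == (x + n - 1) * (x - 1) ** (n - 1)

# (c): exact rational arithmetic via Fraction
assert sum(Fraction(sign(s), f + 1) for s, f in zip(perms, fix)) \
       == Fraction((-1) ** (n + 1) * n, n + 1)
```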


In the next exercise, you are asked to reconstruct a proof of the Vandermonde
determinant (specifically, of Theorem 6.4.31 (d)) using a special kind of directed
graphs – the tournaments. This is far from the easiest proof of Theorem 6.4.31
(d), but it is perhaps the most combinatorial.
Exercise A.5.3.6. We define a tournament to be a simple directed graph with
the property that for any two distinct vertices i and j, exactly one of the arcs
(i, j) and ( j, i ) belongs to the graph. For example, there are 8 tournaments
with vertex set [3], namely

    [a table of eight triangle diagrams on the vertex set {1, 2, 3}; picture omitted].        (311)

(Note that “simple graph” implies that any arc is merely a pair of two distinct

vertices; thus, in particular, there are no arcs of the form (i, i ).)
Fix n ∈ N. Let T be the set of all tournaments with vertex set [n]. It is easy
to see that |T| = 2^{n(n−1)/2}.
For any permutation σ ∈ Sn , we define Pσ ∈ T to be the tournament with
vertex set [n] and with arcs

(σ (i ) , σ ( j)) for all i ∈ [n] and j ∈ [n] satisfying i < j.

(For example, in the above table (311) of tournaments with vertex set [3],
the first tournament is Pid , while the second tournament is Ps2 .)
We define the scoreboard scb D of a tournament D ∈ T to be the n-tuple
(s1 , s2 , . . . , sn ) ∈ Nn , where

s j := (# of arcs of D that end at j)


= (# of i ∈ [n] such that (i, j) is an arc of D )

for each j ∈ [n].


We say that a tournament D ∈ T is injective if all n entries of its scoreboard
scb D are distinct.
(a) 2 Prove that a tournament D ∈ T is injective if and only if it has the
form Pσ for some σ ∈ Sn .
(b) 2 Prove that the tournaments Pσ for all σ ∈ Sn are distinct.
Now, let a_1, a_2, . . . , a_n be n elements of K. For each tournament D ∈ T, we
define the following:

• For each arc e = (i, j) of D, we define the weight w(e) of e to be
  (−1)^{[i>j]} a_j (where we are using Definition A.1.5).

• We define the weight w(D) of D to be

      ∏_{e is an arc of D} w(e) = ∏_{(i,j) is an arc of D} (−1)^{[i>j]} a_j.

(c) 2 Prove that ∏_{1≤j<i≤n} (a_i − a_j) = ∑_{D∈T} w(D).

(d) 2 Prove that det((a_i^{j−1})_{1≤i≤n, 1≤j≤n}) = ∑_{D∈T is injective} w(D).

(e) 3 Prove that ∑_{D∈T is not injective} w(D) = 0.

(f) 1 Conclude that Theorem 6.4.31 (d) holds.



[Hint: In part (e), use a sign-reversing involution. Namely, if D ∈ T is


not injective, then its scoreboard scb D = (s1 , s2 , . . . , sn ) has two equal entries
– i.e., there exists a pair (u, v) of two integers u, v ∈ [n] such that u < v
and su = sv . Pick such a pair (u, v) with smallest possible v (the u is then
uniquely determined (why?)), and relabel the vertices u and v of D as v and
u (so that any arcs of the forms (u, k ), (v, k ), (k, u) or (k, v) become (v, k ),
(u, k), (k, v) or (k, u), respectively). Argue that the new tournament D ′ is still
not injective and satisfies w ( D ′ ) = −w ( D ).]

Another proof of the Vandermonde determinant (Theorem 6.4.31 (c) to be


specific) proceeds through a generalization:

Exercise A.5.3.7. Let n ∈ N. Let a_1, a_2, . . . , a_n ∈ K. Let p_1, p_2, . . . , p_n be n
polynomials in K[x] with the property that

    deg p_j ≤ j − 1    for each j ∈ [n].

(In particular, p_1 is constant.)

(a) 3 Prove that

    det((p_j(a_i))_{1≤i≤n, 1≤j≤n}) = (∏_{j=1}^{n} [x^{j−1}] p_j) · det((a_i^{j−1})_{1≤i≤n, 1≤j≤n}).

(Here, p_j(a_i) means the evaluation p_j[a_i], of course, and [x^{j−1}] p_j denotes
the coefficient of x^{j−1} in p_j.)

(b) 2 By applying this to the polynomials p_j :=
(x − a_1)(x − a_2) · · · (x − a_{j−1}), obtain a new proof of Theorem 6.4.31
(c).

(c) 1 Conclude that

    det((p_j(a_i))_{1≤i≤n, 1≤j≤n}) = (∏_{j=1}^{n} [x^{j−1}] p_j) · ∏_{1≤j<i≤n} (a_i − a_j).

[Hint: Part (a) can be done in many ways, but the simplest is probably by
factoring the matrix (p_j(a_i))_{1≤i≤n, 1≤j≤n} as a product.]

And here is yet another generalization of the Vandermonde determinant:

Exercise A.5.3.8. Let n ∈ N. Let a_1, a_2, . . . , a_n ∈ K and b_1, b_2, . . . , b_n ∈ K.
Define n polynomials q_1, q_2, . . . , q_n ∈ K[x] by setting

    q_j = (x − b_{j+1})(x − b_{j+2}) · · · (x − b_n)    for each j ∈ [n].

(In particular, q_n = (empty product) = 1.) Furthermore, let p_1, p_2, . . . , p_n be
n polynomials in K[x] with the property that

    deg p_j ≤ j − 1    for each j ∈ [n].

(In particular, p_1 is constant.)

(a) 6 Prove that

    det((p_j(a_i) q_j(a_i))_{1≤i≤n, 1≤j≤n}) = (∏_{j=1}^{n} p_j(b_j)) · ∏_{1≤i<j≤n} (a_i − a_j).

(Again, f(a_i) means the evaluation f[a_i].)

(b) 1 Use this to obtain a new proof of Theorem 6.4.31 (a).

(c) 3 Use this to prove Theorem 6.4.46.

As applications of Exercise A.5.3.7 (c), several determinants consisting of bi-


nomial coefficients can be computed:

Exercise A.5.3.9. Let n ∈ N. Let a_1, a_2, . . . , a_n ∈ C. Let H(n) denote the
positive integer (n − 1)! · (n − 2)! · · · · · 1!; this is known as the hyperfactorial
of n.

(a) 1 Prove that

    det((\binom{a_i}{j−1})_{1≤i≤n, 1≤j≤n}) = (∏_{1≤j<i≤n} (a_i − a_j)) / H(n).

(b) 1 Conclude that H(n) | ∏_{1≤j<i≤n} (a_i − a_j) for any n integers
a_1, a_2, . . . , a_n.

(c) 1 Prove that H(n) = ∏_{1≤j<i≤n} (i − j).

(d) 3 Prove that

    det((\binom{a_i + j}{j−1})_{1≤i≤n, 1≤j≤n}) = (∏_{1≤j<i≤n} (a_i − a_j)) / H(n).

(e) 2 Assume that a_1, a_2, . . . , a_n ∈ N. Prove that

    det(((a_i + j)!)_{1≤i≤n, 1≤j≤n}) = (∏_{i=1}^{n} (a_i + 1)!) · (∏_{1≤j<i≤n} (a_i − a_j)).

(f) 3 Assume that a_1, a_2, . . . , a_n ∈ N. Prove that

    det((1 / (a_i + j)!)_{1≤i≤n, 1≤j≤n}) = (∏_{1≤i<j≤n} (a_i − a_j)) / ∏_{i=1}^{n} (a_i + n)!.

(g) 2 Prove that

    det((\binom{a_i + j}{i−1})_{1≤i≤n, 1≤j≤n}) = 1.

Combining parts (b) and (c) of Exercise A.5.3.9, we see that any n integers
a_1, a_2, . . . , a_n satisfy

    ∏_{1≤j<i≤n} (i − j) | ∏_{1≤j<i≤n} (a_i − a_j).

In other words, the product of the pairwise differences between n given inte-
gers a1 , a2 , . . . , an is always divisible by the product of the pairwise differences
between 1, 2, . . . , n. This curious fact is one of the beginnings of Bhargava’s the-
ory of generalized factorials ([Bharga00]). We will not elaborate further on this
theory here, but let us point out that our curious fact has a similarly curious
analogue, in which differences are replaced by differences of squares, and the
numbers 1, 2, . . . , n are replaced by 0, 1, . . . , n − 1. This analogue, too, can be
proved using determinants:

Exercise A.5.3.10. For each n ∈ N, we let H′(n) denote the number

    ∏_{k=0}^{n−1} (2k)!/2 = (0!/2) · (2!/2) · (4!/2) · · · · · ((2(n − 1))!/2).

For each a ∈ R and k ∈ N, we define a real number

    \binom{a}{k}′ := ((a(a + 1)(a + 2) · · · (a + k − 1)) · (a(a − 1)(a − 2) · · · (a − k + 1))) / ((2k)!/2).

Prove the following:

(a) 2 We have H′(n) = (1/2) ∏_{0≤j<i≤n−1} (i² − j²) for each positive integer n.

(b) 1 We have \binom{a}{k}′ = \binom{a+k−1}{2k} + \binom{a+k}{2k} ∈ Z for each a ∈ Z and
k ∈ N.

(c) 1 We have \binom{a}{k}′ = (2/(2k)!) · ∏_{r=0}^{k−1} (a² − r²) for each a ∈ R and k ∈ N.

(d) 3 For any n integers a_1, a_2, . . . , a_n, we have

    det((\binom{a_i}{j−1}′)_{1≤i≤n, 1≤j≤n}) = (∏_{1≤j<i≤n} (a_i² − a_j²)) / H′(n).

(e) 1 For any n integers a_1, a_2, . . . , a_n, we have H′(n) | ∏_{1≤j<i≤n} (a_i² − a_j²).

(f) 1 Conclude that ∏_{0≤j<i≤n−1} (i² − j²) | ∏_{1≤j<i≤n} (a_i² − a_j²) for any n inte-
gers a_1, a_2, . . . , a_n.

(g) 1 Is it true that ∏_{0≤j<i≤n−1} (i³ − j³) | ∏_{1≤j<i≤n} (a_i³ − a_j³) for any n inte-
gers a_1, a_2, . . . , a_n?

[Hint: In part (d), use Exercise A.5.3.7 (c) again, but this time apply it to
a_1², a_2², . . . , a_n² instead of a_1, a_2, . . . , a_n.]
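The two divisibility facts — Exercise A.5.3.9 (b)/(c) for plain differences and Exercise A.5.3.10 (f) for differences of squares — are easy to test on random integers (a sanity check only):

```python
import random
from math import prod

random.seed(2)
for n in range(1, 6):
    base1 = prod(i - j for i in range(1, n + 1) for j in range(1, i))          # H(n)
    base2 = prod(i * i - j * j for i in range(n) for j in range(i))            # squares analogue
    for _ in range(50):
        a = [random.randint(-30, 30) for _ in range(n)]
        assert prod(a[i] - a[j] for i in range(n) for j in range(i)) % base1 == 0
        assert prod(a[i] ** 2 - a[j] ** 2 for i in range(n) for j in range(i)) % base2 == 0
```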
The following exercise gives some variations on Proposition 6.4.34:
Exercise A.5.3.11. Let n be a positive integer. Let x_1, x_2, . . . , x_n be n elements
of K. Let y_1, y_2, . . . , y_n be n elements of K.

(a) 2 For every m ∈ {0, 1, . . . , n − 2}, prove that

    det(((x_i + y_j)^m)_{1≤i≤n, 1≤j≤n}) = 0.

(b) 3 Let p_0, p_1, . . . , p_{n−1} be n elements of K. Let P ∈ K[x] be the polyno-
mial ∑_{k=0}^{n−1} p_k x^k. Prove that

    det((P(x_i + y_j))_{1≤i≤n, 1≤j≤n})
        = p_{n−1}^n · (∏_{k=0}^{n−1} \binom{n−1}{k}) · (∏_{1≤i<j≤n} (x_i − x_j)) · (∏_{1≤i<j≤n} (y_j − y_i)).

(c) 2 Let p_0, p_1, . . . , p_{n−1} be n elements of K. Let P ∈ K[x] be the polyno-
mial ∑_{k=0}^{n−1} p_k x^k. Prove that

    det((P(x_i y_j))_{1≤i≤n, 1≤j≤n})
        = (∏_{k=0}^{n−1} p_k) · (∏_{1≤i<j≤n} (x_i − x_j)) · (∏_{1≤i<j≤n} (y_i − y_j)).

(d) 3 Prove that

    det(((x_i + y_j)^n)_{1≤i≤n, 1≤j≤n})
        = (∏_{1≤i<j≤n} (x_i − x_j)) · (∏_{1≤i<j≤n} (y_j − y_i))
          · ∑_{r=0}^{n} (∏_{k∈{0,1,...,n}\{r}} \binom{n}{k}) · e_r[x_1, x_2, . . . , x_n] · e_{n−r}[y_1, y_2, . . . , y_n],

where we are using the following notation (a particular case of Definition
7.1.9 (a)): For any r ∈ N and any n scalars z_1, z_2, . . . , z_n, we set

    e_r[z_1, z_2, . . . , z_n] := ∑_{(i_1,i_2,...,i_r)∈[n]^r; i_1<i_2<···<i_r} z_{i_1} z_{i_2} · · · z_{i_r}

(so that e_0[z_1, z_2, . . . , z_n] = 1 and e_1[z_1, z_2, . . . , z_n] = z_1 + z_2 + · · · + z_n and
e_2[z_1, z_2, . . . , z_n] = ∑_{1≤i<j≤n} z_i z_j and so on).

Exercise A.5.3.12. 4 Let n ∈ N. Let u_1, u_2, . . . , u_n ∈ K and a_1, a_2, . . . , a_n ∈ K.
For each k ∈ N, we set

    z_k := u_1 a_1^k + u_2 a_2^k + · · · + u_n a_n^k.

Prove that

                                          ⎛ z_0      z_1    · · ·  z_{n−1}  ⎞
    det((z_{i+j−2})_{1≤i≤n, 1≤j≤n}) = det ⎜ z_1      z_2    · · ·  z_n      ⎟
                                          ⎜  ⋮        ⋮      ⋱        ⋮     ⎟
                                          ⎝ z_{n−1}  z_n    · · ·  z_{2n−2} ⎠

        = u_1 u_2 · · · u_n · ∏_{1≤i<j≤n} (a_i − a_j)².

Exercise A.5.3.13. 5 Let n ∈ N. Let A ∈ K n×n be a matrix with the property


that
Ai,i+1 = 1 for every i ∈ [n − 1]
and
Ai,j = 0 for every i, j ∈ [n] satisfying j > i + 1.
(Such a matrix A is called a normalized lower Hessenberg matrix. For example,

 
for n = 4, such a matrix has the form

    ⎛ ∗ 1 0 0 ⎞
    ⎜ ∗ ∗ 1 0 ⎟
    ⎜ ∗ ∗ ∗ 1 ⎟
    ⎝ ∗ ∗ ∗ ∗ ⎠ ,

where each asterisk ∗ stands for an arbitrary entry.)

For each subset I of [n − 1], we define an element p_I(A) ∈ K as follows:
Write the subset I in the form {i_1, i_2, . . . , i_k} with i_1 < i_2 < · · · < i_k. Addi-
tionally, set i_0 := 0 and i_{k+1} := n. Then, set

    p_I(A) := A_{i_1, i_0+1} A_{i_2, i_1+1} · · · A_{i_{k+1}, i_k+1} = ∏_{u=1}^{k+1} A_{i_u, i_{u−1}+1}.

Prove that

    det A = ∑_{I⊆[n−1]} (−1)^{n−1−|I|} p_I(A).

Normalized lower Hessenberg matrices can be used to provide an “explicit”


determinantal formula for the coefficients of the inverse of an FPS:

Exercise A.5.3.14. 4 Let f = ∑_{k∈N} f_k x^k be a FPS in K[[x]] (with f_0, f_1, f_2, . . . ∈
K). Assume that f_0 = 1. Set f_k := 0 for all negative k ∈ Z. Let g = f^{−1} be the
multiplicative inverse of f in K[[x]], and let g_0, g_1, g_2, . . . be the coefficients of
g (so that g = ∑_{k∈N} g_k x^k). Prove that each n ∈ N satisfies

    g_n = (−1)^n det((f_{i−j+1})_{1≤i≤n, 1≤j≤n})

                     ⎛ f_1  1        0        0        · · ·  0   ⎞
                     ⎜ f_2  f_1      1        0        · · ·  0   ⎟
       = (−1)^n det  ⎜ f_3  f_2      f_1      1        · · ·  0   ⎟ .
                     ⎜ f_4  f_3      f_2      f_1      · · ·  0   ⎟
                     ⎜  ⋮    ⋮        ⋮        ⋮        ⋱     ⋮   ⎟
                     ⎝ f_n  f_{n−1}  f_{n−2}  f_{n−3}  · · ·  f_1 ⎠

Our next exercise is concerned with tridiagonal matrices. These are matrices
all of whose entries are zero except for those on the diagonal and “its neighbors”.
For example, a 3 × 3-matrix is tridiagonal if it has the form

    ⎛ ∗ ∗ 0 ⎞
    ⎜ ∗ ∗ ∗ ⎟
    ⎝ 0 ∗ ∗ ⎠

(where each asterisk ∗ means an arbitrary entry).

Exercise A.5.3.15. Let n ∈ N. Let a_1, a_2, . . . , a_n ∈ K and b_1, b_2, . . . , b_{n−1} ∈ K
and c_1, c_2, . . . , c_{n−1} ∈ K. Set

        ⎛ a_1  b_1  0    · · ·  0        0        0       ⎞
        ⎜ c_1  a_2  b_2  · · ·  0        0        0       ⎟
        ⎜ 0    c_2  a_3  · · ·  0        0        0       ⎟
    A = ⎜  ⋮    ⋮    ⋮    ⋱      ⋮        ⋮        ⋮      ⎟ ∈ K^{n×n}.
        ⎜ 0    0    0    · · ·  a_{n−2}  b_{n−2}  0       ⎟
        ⎜ 0    0    0    · · ·  c_{n−2}  a_{n−1}  b_{n−1} ⎟
        ⎝ 0    0    0    · · ·  0        c_{n−1}  a_n     ⎠

Formally speaking, this matrix A is defined to be the matrix whose (i, j)-th
entry is

    a_i, if i = j;
    b_i, if i = j − 1;
    c_j, if i = j + 1;
    0,   otherwise

for all i ∈ [n] and j ∈ [n].

The matrix A is called a tridiagonal matrix, and its determinant is known as
the continuant of the numbers a_i, b_i, c_i.

(a) 2 Prove that every m ∈ {2, 3, . . . , n} satisfies

    det(A_{:m}) = a_m det(A_{:m−1}) − b_{m−1} c_{m−1} det(A_{:m−2}),

where we set A_{:k} := sub^{1,2,...,k}_{1,2,...,k} A = (A_{i,j})_{1≤i≤k, 1≤j≤k} for each k ∈ {0, 1, . . . , n}.

(b) 3 Recall the notion of a “lacunar set” as defined in Definition A.1.3. If
I is a set of integers, then we let I^+ := {i + 1 | i ∈ I}. Prove that

    det A = ∑_{I⊆[n−1] is lacunar} (∏_{i∈[n]\(I∪I^+)} a_i) · (∏_{i∈I} (−b_i c_i)).

(c) 1 Compute det A in the case when

ai = 1 (for all i ∈ [n]) and


bi = 1 (for all i ∈ [n − 1]) and
c i = −1 (for all i ∈ [n − 1]) .

(d) 1 Compute det A in the case when

ai = 2 (for all i ∈ [n]) and


bi = c i = 1 (for all i ∈ [n − 1]) .

(e) 3 If d_0, d_1, . . . , d_n ∈ K are arbitrary elements, and if we have

    a_i = d_{i−1} + d_i    for all i ∈ [n], and
    b_i = c_i = −d_i       for all i ∈ [n − 1],

then prove that

    det A = ∑_{k=0}^{n} d_0 d_1 · · · d̂_k · · · d_n,
where the hat over the “dk ” is defined as in Proposition 6.4.28. (The matrix
A in this case has a physical interpretation as the stiffness matrix of a mass-
spring chain; see [OlvSha18, §6.1]. Note that we could replace the condition
bi = ci = −di by bi = ci = di without changing det A, but the minus signs
come from the physical backstory.)
(f) 2 We return to the general case. Define A_{:k} as in part (a). Prove that

    det A / det(A_{:n−1})
        = a_n − (b_{n−1}c_{n−1}) / (a_{n−1} − (b_{n−2}c_{n−2}) / (a_{n−2} − (b_{n−3}c_{n−3}) / (a_{n−3} − · · · − (b_2 c_2) / (a_2 − (b_1 c_1) / a_1) · · · ))),

provided that all denominators in this equality are invertible.


[Remark: If we set ai = 2x for all i ∈ [n] and bi = ci = −1 for all i ∈
[n − 1], then det A becomes the n-th Chebyshev polynomial of the first kind,
commonly denoted Tn ( x ). Some properties of Chebyshev polynomials can
be generalized to determinants of arbitrary tridiagonal matrices.
Part (f) can be seen as a formula for expressing a continued fraction as
a ratio of determinants. The connections between continued fractions and
determinants run much deeper.]
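Parts (a) and (b) of Exercise A.5.3.15 can be played off against each other numerically. The sketch below (a sanity check; it assumes that “lacunar” means “containing no two consecutive integers”, as in Definition A.1.3) computes det A once by the three-term recurrence and once by the lacunar-set formula:

```python
from itertools import combinations
import random

random.seed(3)
n = 6
a = [random.randint(-5, 5) for _ in range(n)]      # a_1..a_n   (0-indexed)
b = [random.randint(-5, 5) for _ in range(n - 1)]  # b_1..b_{n-1}
c = [random.randint(-5, 5) for _ in range(n - 1)]  # c_1..c_{n-1}

# det via the three-term recurrence of part (a); d[m] = det(A_:m)
d = [1, a[0]]
for m in range(2, n + 1):
    d.append(a[m - 1] * d[m - 1] - b[m - 2] * c[m - 2] * d[m - 2])

# det via the lacunar-set formula of part (b)
def lacunar_subsets(m):
    # subsets of [m] = {1,...,m} with no two consecutive elements
    for k in range(m // 2 + 2):
        for I in combinations(range(1, m + 1), k):
            if all(y - x >= 2 for x, y in zip(I, I[1:])):
                yield I

total = 0
for I in lacunar_subsets(n - 1):
    used = set(I) | {i + 1 for i in I}   # I union I^+
    pa = 1
    for i in range(1, n + 1):
        if i not in used:
            pa *= a[i - 1]
    pb = 1
    for i in I:
        pb *= -b[i - 1] * c[i - 1]
    total += pa * pb

assert total == d[n]
```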

Exercise A.5.3.16. Let a ∈ K be arbitrary. Let n ∈ N. Define a tridiagonal


n × n-matrix A ∈ K n×n as in Exercise A.5.3.15, after setting

ai = a (for all i ∈ [n]) and


bi = n − i (for all i ∈ [n − 1]) and
ci = i (for all i ∈ [n − 1]) .

Thus,

        ⎛ a    n−1  0    · · ·  0    0    0 ⎞
        ⎜ 1    a    n−2  · · ·  0    0    0 ⎟
        ⎜ 0    2    a    · · ·  0    0    0 ⎟
    A = ⎜  ⋮    ⋮    ⋮    ⋱      ⋮    ⋮   ⋮ ⎟ .
        ⎜ 0    0    0    · · ·  a    2    0 ⎟
        ⎜ 0    0    0    · · ·  n−2  a    1 ⎟
        ⎝ 0    0    0    · · ·  0    n−1  a ⎠

(a) 6 Prove that

    det A = ∏_{k=0}^{n−1} (a − 2k + n − 1).
(This particular matrix A, for a = 0, is called the Kac matrix; thus, our formula
for det A computes its characteristic polynomial.)
(b) ? Can you find a bijective proof using the formula in Exercise A.5.3.15
(b)?
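The formula of part (a) can be verified numerically via the three-term recurrence from Exercise A.5.3.15 (a) (a sanity check only):

```python
def kac_det(n, a):
    # det of the tridiagonal matrix with diagonal a, superdiagonal b_i = n - i,
    # subdiagonal c_i = i, via det(A_:m) = a*det(A_:m-1) - b_{m-1}c_{m-1}*det(A_:m-2)
    d0, d1 = 1, a
    for m in range(2, n + 1):
        b = n - (m - 1)   # b_{m-1}
        c = m - 1         # c_{m-1}
        d0, d1 = d1, a * d1 - b * c * d0
    return d1 if n >= 1 else 1

for n in range(1, 8):
    for a in range(-5, 6):
        expected = 1
        for k in range(n):
            expected *= a - 2 * k + n - 1
        assert kac_det(n, a) == expected
```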
Another variation on the Vandermonde determinant:

Exercise A.5.3.17. 3 Let n ∈ N. Let a_1, a_2, . . . , a_n ∈ K and b_1, b_2, . . . , b_n ∈ K.
Prove that

    det((a_i^{n−j} b_i^{j−1})_{1≤i≤n, 1≤j≤n}) = ∏_{1≤i<j≤n} (a_i b_j − a_j b_i).

A.5.4. The Lindström–Gessel–Viennot lemma


The following exercises are concerned with Section 6.5.

Exercise A.5.4.1. 5 Consider the following variants of the definition of the


map f in the proof of Proposition 6.5.12:
(a) Define v to be the last (rather than the first) crowded point on pi .
(b) Define j to be the smallest (rather than the largest) element of [k ] such
that v belongs to p j (not counting i, of course).
(c) Instead of choosing v first and j later, choose j and v as follows: First,
pick j to be the largest element of [k ] such that the path pi intersects p j ; then,
define v to be the first point where pi intersects p j .
(d) Instead of defining v to be the first crowded point on pi , choose some
arbitrary total order on the vertex set of the digraph, and define v to be the
largest crowded point on pi with respect to this order. (The total order needs
to be chosen in advance; it must not depend on p or σ.)

Which of these variants “work” (i.e., lead to well-defined sign-reversing


involutions f : X → X ) ?
(You are not required to try out all combinations of these variants; just
analyze each variant for itself.)

Exercise A.5.4.2. Complete the proof of Corollary 6.5.17 sketched above:


(a) 2 Show that there is only one nipat from A to B.
(b) 3 Show that there are no nipats from A to σ (B) when σ ∈ Sk is not
the identity permutation id ∈ Sk .
Furthermore:
(c) 3 Prove the analogue of Corollary 6.5.17 that says that each k ∈ N
satisfies

                                          ⎛ c_1      c_2      · · ·  c_k      ⎞
    det((c_{i+j−1})_{1≤i≤k, 1≤j≤k}) = det ⎜ c_2      c_3      · · ·  c_{k+1}  ⎟ = 1.
                                          ⎜  ⋮        ⋮        ⋱        ⋮     ⎟
                                          ⎝ c_k      c_{k+1}  · · ·  c_{2k−1} ⎠

(d) 2 Show that the Catalan sequence (c_0, c_1, c_2, . . .) is the only sequence
(a_0, a_1, a_2, . . .) of real numbers that satisfies

    det((a_{i+j−2})_{1≤i≤k, 1≤j≤k}) = det((a_{i+j−1})_{1≤i≤k, 1≤j≤k}) = 1    for each k ∈ N.

(e) 3 Compute det((c_{i+j})_{1≤i≤k, 1≤j≤k}) for each k ∈ N.
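Part (c) — and the companion identity det((c_{i+j−2})) = 1 from Corollary 6.5.17 — can be sanity-checked numerically for small k, using the closed form c_n = (1/(n+1))·\binom{2n}{n}:

```python
from math import comb
from itertools import permutations

def det(M):
    # Leibniz expansion; fine for the small sizes used here
    n = len(M)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if sigma[i] > sigma[j])
        p = (-1) ** inv
        for i in range(n):
            p *= M[i][sigma[i]]
        total += p
    return total

cat = [comb(2 * n, n) // (n + 1) for n in range(12)]  # Catalan numbers

for k in range(1, 6):
    H0 = [[cat[i + j] for j in range(k)] for i in range(k)]      # (c_{i+j-2})
    H1 = [[cat[i + j + 1] for j in range(k)] for i in range(k)]  # (c_{i+j-1})
    assert det(H0) == 1 and det(H1) == 1
```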

The next exercise should be contrasted with Exercise A.4.11.2.


Exercise A.5.4.3. Let n ∈ N. Let I be a subset of [n − 1]. Write I in the form
I = {c1 , c2 , . . . , ck } with c1 < c2 < · · · < ck . Set c0 := 0 and ck+1 := n. Recall
Definition A.4.22.
(a) 3 Prove that

    (# of σ ∈ S_n satisfying Des σ = I) = det((\binom{n − c_{i−1}}{c_j − c_{i−1}})_{1≤i≤k+1, 1≤j≤k+1}).

(b) 5 Let us use the notations from Definition 4.4.3. Prove that

    ∑_{σ∈S_n; Des σ=I} q^{ℓ(σ)} = det((\binom{n − c_{i−1}}{c_j − c_{i−1}}_q)_{1≤i≤k+1, 1≤j≤k+1})    in the ring Z[q].

[Hint: One way to approach this is by observing that the matrices in ques-
tion are transposes of normalized lower Hessenberg matrices as in Exercise
A.5.3.13. Another is to apply the LGV lemma (in its weighted version for part
(b)) to the (k + 1)-vertices A = ( A1 , A2 , . . . , Ak+1 ) and B = ( B1 , B2 , . . . , Bk+1 )
defined by

Ai := (0, ci−1 ) and Bi := (n − ci , ci ) .

Here is an example, for n = 10 and I = {5, 6, 9} and σ ∈ S_10 with OLN
(2, 3, 5, 7, 10, 9, 1, 6, 8, 4) (the picture of the corresponding non-intersecting
lattice paths from A_1, A_2, A_3, A_4 to B_1, B_2, B_3, B_4 is omitted here).
Here, the x-coordinates of the north-steps of the paths (from bottom to


top: 1, 1, 2, 3, 5, 4, 0, 1, 1, 0) are the entries of the Lehmer code L (σ) =
(1, 1, 2, 3, 5, 4, 0, 1, 1, 0) of σ.]

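Part (a) of the preceding exercise can be checked by brute force for small n. The snippet below is an editorial sanity check (the helpers `binom`, `det`, `descents` are ad hoc; `binom(a, b)` is 0 outside 0 ≤ b ≤ a, matching the usual convention): it compares the determinant with a direct count over S_5 for every descent set I.

```python
from itertools import combinations, permutations
from math import comb

def binom(a, b):
    # binomial coefficient, with binom(a, b) = 0 for b < 0 or b > a
    return comb(a, b) if 0 <= b <= a else 0

def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def descents(p):
    return {i + 1 for i in range(len(p) - 1) if p[i] > p[i + 1]}

n = 5
for k in range(n):
    for I in combinations(range(1, n), k):
        c = (0,) + I + (n,)  # c_0 = 0 and c_{k+1} = n, as in the exercise
        # (i, j)-entry binom(n - c_{i-1}, c_j - c_{i-1}), for 1 <= i, j <= k + 1
        m = [[binom(n - c[i], c[j + 1] - c[i]) for j in range(k + 1)]
             for i in range(k + 1)]
        count = sum(1 for p in permutations(range(1, n + 1))
                    if descents(p) == set(I))
        assert det(m) == count
print("ok")
```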
The next exercise sketches out a visual proof of the Cauchy–Binet formula
(Theorem 6.4.18) using the LGV lemma:

Exercise A.5.4.4. Let K be a commutative ring. Let n, m ∈ N. Let A ∈ K n×m


be an n × m-matrix, and let B ∈ K m×n be an m × n-matrix.
Let D be the digraph with 2n + m vertices labeled

1, 2, . . . , n, 1′ , 2′ , . . . , m ′ , 1′′ , 2′′ , . . . , n′′ ,



and with arcs


i → j′ for all i ∈ [n] and j ∈ [m]
and
i′ → j′′ for all i ∈ [m] and j ∈ [n] .
Here is how D looks for n = 2 and m = 4:

[Figure: the three-layer digraph D, with top vertices 1, 2, middle vertices 1′, 2′, 3′, 4′, bottom vertices 1′′, 2′′, and all arcs going from each layer to the next.]
(a) 1 Prove that this digraph D is acyclic.
Now, for each arc a of D, we define a weight w ( a) ∈ K as follows:
• If a is the arc i → j′ for some i ∈ [n] and j ∈ [m], then we set w ( a) :=
Ai,j .
• If a is the arc i′ → j′′ for some i ∈ [m] and j ∈ [n], then we set w ( a) :=
Bi,j .

(b) 1 Prove that

$$\left( \sum_{p:\, i \to j''} w(p) \right)_{1 \le i \le n,\ 1 \le j \le n} = AB.$$

Here, “p : u → v” means “p is a path from u to v” whenever u and v are two


vertices of D.
(c) 3 Define two n-vertices A and B by A = (1, 2, . . . , n) and B =
(1′′ , 2′′ , . . . , n′′ ).
Prove that

$$\sum_{\sigma \in S_n} (-1)^{\sigma} \sum_{\substack{p \text{ is a nipat}\\ \text{from } \mathbf{A} \text{ to } \sigma(\mathbf{B})}} w(p) = \sum_{\substack{(g_1, g_2, \ldots, g_n) \in [m]^n;\\ g_1 < g_2 < \cdots < g_n}} \det\left( \operatorname{cols}_{g_1, g_2, \ldots, g_n} A \right) \cdot \det\left( \operatorname{rows}_{g_1, g_2, \ldots, g_n} B \right),$$

where we are using the notations of Theorem 6.5.14 (with k = n) and of


Theorem 6.4.18.
(d) 1 Prove Theorem 6.4.18.
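
The Cauchy–Binet formula (Theorem 6.4.18) that this exercise derives is easy to test numerically. The snippet below is an editorial sanity check with randomly chosen small integer matrices, for n = 2 and m = 4:

```python
import random
from itertools import combinations

def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

random.seed(0)
n, m = 2, 4
A = [[random.randint(-3, 3) for _ in range(m)] for _ in range(n)]
B = [[random.randint(-3, 3) for _ in range(n)] for _ in range(m)]
AB = [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
      for i in range(n)]

rhs = 0
for g in combinations(range(m), n):                    # g_1 < g_2 < ... < g_n
    colsA = [[A[i][j] for j in g] for i in range(n)]   # columns g of A
    rowsB = [[B[i][j] for j in range(n)] for i in g]   # rows g of B
    rhs += det(colsA) * det(rowsB)

assert det(AB) == rhs
print("ok")
```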

Exercise A.5.4.5. Consider the situation of Theorem 6.5.14. Assume that our
digraph D has vertex set [n] for some n ∈ N. Let E be the set of all arcs of D.
Let M ∈ K n×n be the n × n-matrix whose (i, j)-th entry is given by

$$M_{i,j} = \sum_{\substack{a \in E \text{ is an arc}\\ \text{from } i \text{ to } j}} w(a) \qquad \text{for all } i, j \in [n].$$

(Note that if D is a simple digraph, then the sum on the right hand side of
this equality has at most one addend.)
(a) 2 Prove that each k ∈ N satisfies
 

$$M^k = \left( \sum_{\substack{p:\, i \to j \text{ is a path}\\ \text{with } k \text{ steps}}} w(p) \right)_{1 \le i \le n,\ 1 \le j \le n}.$$

Here, “p : i → j” means “p is a path from i to j”.


(b) 2 Prove that Mn = 0n×n (the zero matrix).
(c) 2 Let $I_n \in K^{n \times n}$ denote the n × n identity matrix. Prove that

$$\left( I_n - M \right)^{-1} = \left( \sum_{p:\, i \to j} w(p) \right)_{1 \le i \le n,\ 1 \le j \le n}.$$

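Parts (b) and (c) can be illustrated on a tiny example. The snippet below is an editorial sanity check (the digraph and weights are our own choice, with all arc weights equal to 1 so that path weights count paths): it confirms that M^n = 0 and that I + M + M² + · · · records the number of paths between each pair of vertices, including the one-vertex empty paths on the diagonal.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# adjacency matrix of the acyclic digraph on [3] with arcs 1 -> 2, 2 -> 3, 1 -> 3,
# all arc weights equal to 1 (so the entries of M^k count paths with k steps)
M = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
n = 3

power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # M^0 = I
total = [[0] * n for _ in range(n)]
for _ in range(n):  # I + M + M^2 suffices once M^n = 0
    for i in range(n):
        for j in range(n):
            total[i][j] += power[i][j]
    power = matmul(power, M)

assert power == [[0] * n for _ in range(n)]        # part (b): M^n = 0
# part (c): (I - M)^{-1} = I + M + M^2 + ... counts all paths, empty ones included
assert total == [[1, 1, 2], [0, 1, 1], [0, 0, 1]]
print("ok")
```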
A.6. Symmetric functions


The notations of Chapter 7 shall be used here. In particular, an integer N ∈ N
and a commutative ring K are fixed.

A.6.1. Definitions and examples of symmetric polynomials

Exercise A.6.1.1. 2 Prove Proposition 7.1.14 (b).

Exercise A.6.1.2. 2 Let M ∈ {0, 1, . . . , N } and n ∈ N. Prove that


$$e_n[x_1, x_2, \ldots, x_N] = \sum_{i=0}^{n} e_i[x_1, x_2, \ldots, x_M] \cdot e_{n-i}[x_{M+1}, x_{M+2}, \ldots, x_N]$$

and

$$h_n[x_1, x_2, \ldots, x_N] = \sum_{i=0}^{n} h_i[x_1, x_2, \ldots, x_M] \cdot h_{n-i}[x_{M+1}, x_{M+2}, \ldots, x_N]$$

and (if n is positive)

$$p_n[x_1, x_2, \ldots, x_N] = p_n[x_1, x_2, \ldots, x_M] + p_n[x_{M+1}, x_{M+2}, \ldots, x_N].$$

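All three splitting identities can be verified numerically by evaluating at an arbitrary point. The snippet below is an editorial sanity check (the evaluation point (2, 3, 5, 7) and the split M = 2 are arbitrary choices):

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def e(n, xs):  # elementary symmetric polynomial, evaluated at xs
    return sum(prod(t) for t in combinations(xs, n))

def h(n, xs):  # complete homogeneous symmetric polynomial, evaluated at xs
    return sum(prod(t) for t in combinations_with_replacement(xs, n))

def p(n, xs):  # power sum, evaluated at xs
    return sum(x ** n for x in xs)

xs = [2, 3, 5, 7]                 # N = 4, arbitrary values
left, right = xs[:2], xs[2:]      # M = 2
for n in range(5):
    assert e(n, xs) == sum(e(i, left) * e(n - i, right) for i in range(n + 1))
    assert h(n, xs) == sum(h(i, left) * h(n - i, right) for i in range(n + 1))
for n in range(1, 5):
    assert p(n, xs) == p(n, left) + p(n, right)
print("ok")
```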
Exercise A.6.1.3. 5 Finish our proof of Theorem 7.1.12 by proving the re-
maining two Newton–Girard formulas (262) and (263).

Exercise A.6.1.4. 5 Prove that each positive integer n satisfies

$$\sum_{j=1}^{n} (-1)^{j-1} j e_j h_{n-j} = p_n \qquad \text{and} \qquad \sum_{j=1}^{n} (-1)^{n-j} j h_j e_{n-j} = p_n.$$

Exercise A.6.1.5. Let n ∈ N.


(a) 1 Prove that

$$e_n\big( \underbrace{1, 1, \ldots, 1}_{N \text{ times}} \big) = \binom{N}{n} \qquad \text{and} \qquad h_n\big( \underbrace{1, 1, \ldots, 1}_{N \text{ times}} \big) = \binom{N + n - 1}{n}.$$

(b) 2 Prove that

$$e_n\big[ q^0, q^1, \ldots, q^{N-1} \big] = q^{n(n-1)/2} \binom{N}{n}_q \qquad \text{and} \qquad h_n\big[ q^0, q^1, \ldots, q^{N-1} \big] = \binom{N + n - 1}{n}_q$$

in the ring Z[q], where we are using the notation of Definition 4.4.3.
(c) 2 Recover a nontrivial identity between q-binomial coefficients by sub-
stituting q0 , q1 , . . . , q N −1 into an identity between symmetric polynomials.
(There are several valid answers here.)
(d) 3 For any m, k ∈ N, the unsigned Stirling number of the 1st kind c (m, k ) ∈
N is defined to be the # of all permutations σ ∈ Sm that have exactly k cycles
(see Definition 5.5.4 (a)). Prove that

en [1, 2, . . . , N ] = c ( N + 1, N + 1 − n) .

(e) 3 For any m, k ∈ N, the Stirling number of the 2nd kind S (m, k ) ∈ N is
defined to be the # of all set partitions of the set [m] into k parts (i.e., the # of
sets {U1 , U2 , . . . , Uk } consisting of k disjoint nonempty subsets U1 , U2 , . . . , Uk
of [m] such that U1 ∪ U2 ∪ · · · ∪ Uk = [m]). Prove that

hn [1, 2, . . . , N ] = S ( N + n, N ) .

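Parts (d) and (e) of the preceding exercise can be sanity-checked for a small N. The snippet below is editorial (the Stirling numbers are generated from their standard recurrences, an assumption independent of the exercise):

```python
from itertools import combinations, combinations_with_replacement
from math import prod
from functools import lru_cache

def e(n, xs):
    return sum(prod(t) for t in combinations(xs, n))

def h(n, xs):
    return sum(prod(t) for t in combinations_with_replacement(xs, n))

@lru_cache(maxsize=None)
def c1(m, k):
    # unsigned Stirling numbers of the 1st kind c(m, k), via the usual recurrence
    if m == 0:
        return 1 if k == 0 else 0
    if not 0 <= k <= m:
        return 0
    return c1(m - 1, k - 1) + (m - 1) * c1(m - 1, k)

@lru_cache(maxsize=None)
def s2(m, k):
    # Stirling numbers of the 2nd kind S(m, k), via the usual recurrence
    if m == 0:
        return 1 if k == 0 else 0
    if not 0 <= k <= m:
        return 0
    return s2(m - 1, k - 1) + k * s2(m - 1, k)

N = 4
xs = list(range(1, N + 1))
for n in range(N + 1):
    assert e(n, xs) == c1(N + 1, N + 1 - n)   # part (d)
    assert h(n, xs) == s2(N + n, N)           # part (e)
print("ok")
```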
Exercise A.6.1.6. For each n ∈ N and each positive integer k, we define the
(n, k)-th Petrie symmetric polynomial gk,n ∈ S by
$$g_{k,n} = \sum_{\substack{(i_1, i_2, \ldots, i_n) \in [N]^n;\\ i_1 \le i_2 \le \cdots \le i_n;\\ \text{no } k \text{ of the numbers } i_1, i_2, \ldots, i_n \text{ are equal}}} x_{i_1} x_{i_2} \cdots x_{i_n} = \sum_{\substack{(a_1, a_2, \ldots, a_N) \in \{0, 1, \ldots, k-1\}^N;\\ a_1 + a_2 + \cdots + a_N = n}} x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}.$$

(a) 1 Prove that g2,n = en for each n ∈ N.


(b) 1 Prove that gk,n = hn for each n ∈ N and each k > n.
(c) 1 Prove that gn,n = hn − pn for each n > 0.
(d) 3 Prove that each n ∈ N and each k > 0 satisfy

$$g_{k,n} = \sum_{i \in \mathbb{N}} (-1)^i e_i\big[ x_1^k, x_2^k, \ldots, x_N^k \big] \cdot h_{n-ki}.$$

(The sum on the right hand side is well-defined, since $h_{n-ki} = 0$ whenever $i > n/k$.)
(e) 3 Set

$$c_{i,j} := \begin{cases} 2, & \text{if } i \equiv j \bmod 3; \\ -1, & \text{if } i \not\equiv j \bmod 3 \end{cases} \qquad \text{for any } i, j \in \mathbb{Z}.$$

Prove that each even n ∈ N satisfies

$$g_{3,n} = e_{n/2}^2 + \sum_{i=0}^{(n-2)/2} c_{i,n-i} e_i e_{n-i},$$

and that each odd n ∈ N satisfies

$$g_{3,n} = -\sum_{i=0}^{(n-1)/2} c_{i,n-i} e_i e_{n-i}.$$

Exercise A.6.1.7. (a) 3 Prove that there exists a family of polynomials


( P1 , P2 , P3 , . . .), with each Pn being a polynomial in the ring Z [ x1 , x2 , . . . , xn ],
such that every positive integer n satisfies

en = Pn [h1 , h2 , . . . , hn ] and hn = Pn [e1 , e2 , . . . , en ] .

(This family begins with $P_1 = x_1$ and $P_2 = x_1^2 - x_2$ and $P_3 = x_1^3 - 2x_1x_2 + x_3$.)

(b) 2 Prove that there exists a family of polynomials ( Q1 , Q2 , Q3 , . . .), with


each Qn being a polynomial in the ring Z [ x1 , x2 , . . . , xn ], such that every
positive integer n satisfies

$$p_n = (-1)^{n-1} Q_n[h_1, h_2, \ldots, h_n] \qquad \text{and} \qquad p_n = Q_n[e_1, e_2, \ldots, e_n].$$

(This family begins with $Q_1 = x_1$ and $Q_2 = x_1^2 - 2x_2$ and $Q_3 = x_1^3 - 3x_1x_2 + 3x_3$.)
(c) 3 Express the polynomials Pn explicitly as determinants of certain
matrices.
(d) 3 Express the polynomials Qn explicitly as determinants of certain
matrices.
(e) 1 What is the coefficient of xi in the polynomial Pi ?
(f) 1 What is the coefficient of xi in the polynomial Qi ?

The following exercise is a symmetric-functions analogue of Exercise A.5.2.10:

Exercise A.6.1.8. 3 Let us use the notations of Exercise A.5.2.10. (Thus, G is


a finite undirected graph with vertex set V and edge set E.)
Define the chromatic symmetric polynomial XG (in N variables x1 , x2 , . . . , x N )
to be the polynomial

$$X_G := \sum_{\substack{c:\, V \to [N] \text{ is a}\\ \text{proper } N\text{-coloring}}} \; \prod_{v \in V} x_{c(v)} \in \mathcal{P}.$$

For instance, if the graph G is the length-2 path graph 1 – 2 – 3,
then a proper N-coloring of G is a map c : [3] → [N] satisfying c(1) ≠ c(2)
and c(2) ≠ c(3), and therefore its chromatic symmetric polynomial X_G is

$$\sum_{\substack{c:\,[3] \to [N];\\ c(1) \ne c(2);\\ c(2) \ne c(3)}} \; \prod_{v \in [3]} x_{c(v)} = \sum_{\substack{i,j,k \in [N];\\ i \ne j;\\ j \ne k}} x_i x_j x_k = \underbrace{\sum_{\substack{i,j \in [N];\\ i \ne j}} x_i^2 x_j}_{= p_2 e_1 - p_3} + \underbrace{\sum_{\substack{i,j,k \in [N];\\ i,j,k \text{ are distinct}}} x_i x_j x_k}_{= 6 e_3} = p_2 e_1 - p_3 + 6 e_3 = e_2 e_1 + 3 e_3 \quad \text{(by some computation)} = p_1^3 - 2 p_1 p_2 + p_3 \quad \text{(by some computation)}.$$

(a) 1 Prove that XG ∈ S .


(b) 3 Prove that

$$X_G = \sum_{F \subseteq E} (-1)^{|F|} \prod_{\substack{C \text{ is a connected component}\\ \text{of the graph with vertex set } V\\ \text{and edge set } F}} p_{|C|}.$$

(Here, |C| denotes the number of vertices in the connected component C.)


(c) 1 Prove that

$$X_G\big( \underbrace{1, 1, \ldots, 1}_{N \text{ ones}} \big) = \chi_G(N)$$

(where χ_G(N) is as defined in Exercise A.5.2.10).


Next, let us generalize XG : For each vertex v ∈ V, let w (v) be a positive
integer; we shall call w (v) the weight of v. We define the weighted chromatic
symmetric polynomial XG,w to be

$$X_{G,w} := \sum_{\substack{c:\, V \to [N] \text{ is a}\\ \text{proper } N\text{-coloring}}} \; \prod_{v \in V} x_{c(v)}^{w(v)} \in \mathcal{P}.$$

Thus, if all weights w (v) equal 1, then XG,w = XG .


(d) 2 Generalize the claim of part (b) to XG,w .

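The worked example above can be confirmed numerically: evaluate X_G for the path graph at an arbitrary point and compare with the stated p- and e-expansions. This is an editorial sanity check (the evaluation point (2, 3, 5) is arbitrary).

```python
from itertools import product, combinations
from math import prod

xs = [2, 3, 5]                 # an arbitrary evaluation point; N = 3 variables
edges = [(0, 1), (1, 2)]       # the length-2 path graph (vertices 0, 1, 2)

# brute force over all proper N-colorings c : [3] -> [N]
XG = sum(prod(xs[c[v]] for v in range(3))
         for c in product(range(len(xs)), repeat=3)
         if all(c[u] != c[v] for u, v in edges))

p = lambda n: sum(x ** n for x in xs)                       # power sums
e = lambda n: sum(prod(t) for t in combinations(xs, n))     # elementary symm.

assert XG == p(1) ** 3 - 2 * p(1) * p(2) + p(3)
assert XG == e(2) * e(1) + 3 * e(3)
print(XG)
```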
Exercise A.6.1.9. Let n be a positive integer. If w ∈ [ N ]n is an n-tuple, then

• we let w1 , w2 , . . . , wn denote the n entries of w (so that w =


(w1 , w2 , . . . , wn ));
• we define the descent set Des w of w by

Des w := {i ∈ [n − 1] | wi > wi+1 } ;



• we define the stagnation set Stag w of w by

Stag w := {i ∈ [n − 1] | wi = wi+1 } ;

• we define the monomial xw to be xw1 xw2 · · · xwn .

For instance, the 7-tuple (2, 2, 4, 1, 4, 4, 2) has descent set {3, 6} and stagnation
set {1, 5}.
(a) 1 Fix s ∈ N. Prove that

$$\sum_{\substack{w \in [N]^n;\\ |\operatorname{Stag} w| = s}} x_w \in \mathcal{S}.$$

(b) 1 Identify this sum as a sum of weighted chromatic symmetric poly-


nomials of path graphs (see Exercise A.6.1.8).
(c) 3 Fix d ∈ N and s ∈ N. Prove that

$$\sum_{\substack{w \in [N]^n;\\ |\operatorname{Des} w| = d;\\ |\operatorname{Stag} w| = s}} x_w \in \mathcal{S}.$$

(d) 6 Fix d ∈ N and s ∈ N. Prove that the three polynomials

$$\sum_{\substack{w \in [N]^n;\\ |\operatorname{Des} w| = d;\\ |\operatorname{Stag} w| = s;\\ w_1 < w_n}} x_w, \qquad \sum_{\substack{w \in [N]^n;\\ |\operatorname{Des} w| = d;\\ |\operatorname{Stag} w| = s;\\ w_1 = w_n}} x_w, \qquad \sum_{\substack{w \in [N]^n;\\ |\operatorname{Des} w| = d;\\ |\operatorname{Stag} w| = s;\\ w_1 > w_n}} x_w$$

all belong to S .

Exercise A.6.1.10. 4 Prove that every m ∈ N satisfies


$$\sum_{k=1}^{N} \frac{x_k^m}{\prod_{i \in [N] \setminus \{k\}} (x_k - x_i)} = h_{m-N+1}.$$

(This is an equality in the localization of the polynomial ring P at the multi-


plicative subset generated by the pairwise differences xi − x j for all i < j. If K
is a field, you can also view it as an equality in the field of rational functions
K ( x1 , x2 , . . . , x N ).)
[Example: For instance, if N = 3 (and if we rename x1 , x2 , x3 as x, y, z),
then this exercise is claiming that
$$\frac{x^m}{(x - y)(x - z)} + \frac{y^m}{(y - z)(y - x)} + \frac{z^m}{(z - x)(z - y)} = h_{m-2}[x, y, z].$$

Note that the right hand side is 0 when m < 2.]

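The identity of Exercise A.6.1.10 can be tested with exact rational arithmetic at any point with pairwise distinct coordinates. The snippet below is an editorial sanity check (the point (2, 3, 5) is arbitrary):

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod

def h(n, xs):
    # complete homogeneous symmetric polynomial; h_n = 0 for n < 0
    if n < 0:
        return 0
    return sum(prod(t) for t in combinations_with_replacement(xs, n))

def lhs(m, xs):
    # the left hand side of Exercise A.6.1.10, as an exact rational number
    N = len(xs)
    return sum(Fraction(xs[k] ** m,
                        prod(xs[k] - xs[i] for i in range(N) if i != k))
               for k in range(N))

xs = [2, 3, 5]   # any pairwise distinct values work
for m in range(8):
    assert lhs(m, xs) == h(m - len(xs) + 1, xs)
print("ok")
```

In particular, the sum really does vanish for m < N − 1, matching the remark that the right hand side is 0 when m < 2 in the three-variable example.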
Exercise A.6.1.11. 2 Assume that N > 0. Prove that


$$\sum_{k=1}^{N} \frac{1}{x_k \prod_{i \in [N] \setminus \{k\}} (x_k - x_i)} = \frac{(-1)^{N-1}}{x_1 x_2 \cdots x_N}.$$

(This is an equality in the localization of the polynomial ring P at the mul-


tiplicative subset generated by the variables x1 , x2 , . . . , x N and their pairwise
differences xi − x j for all i < j. If K is a field, you can also view it as an
equality in the field of rational functions K ( x1 , x2 , . . . , x N ).)
[Hint: This can be derived from Exercise A.6.1.10 via a neat trick.]

Exercise A.6.1.12. 4 Prove that

$$\sum_{k=1}^{N} \frac{x_k \prod_{i \in [N] \setminus \{k\}} (x_k + x_i)}{\prod_{i \in [N] \setminus \{k\}} (x_k - x_i)} = x_1 + x_2 + \cdots + x_N.$$

(The same comments as in Exercise A.6.1.10 apply.)


[Hint: It suffices to prove this in the field Q(x_1, x_2, . . . , x_N) of rational functions over Q (why?). Define the polynomial $P(t) := \prod_{i \in [N]} (t + x_i)$ (in the indeterminate t over this field). Observe that the numerator $x_k \prod_{i \in [N] \setminus \{k\}} (x_k + x_i)$ can be rewritten as $\frac{1}{2} P(x_k)$ (why?). On the other hand, $P(t) = t^N + e_1 t^{N-1} + e_2 t^{N-2} + \cdots + e_N t^0$. Now apply Exercise A.6.1.10.]

Exercise A.6.1.13. 5 Prove that

$$\sum_{k=1}^{N} \frac{\prod_{i \in [N] \setminus \{k\}} (x_k + x_i)}{\prod_{i \in [N] \setminus \{k\}} (x_k - x_i)} = \begin{cases} 0, & \text{if } N \text{ is even}; \\ 1, & \text{if } N \text{ is odd}. \end{cases}$$

(The same comments as in Exercise A.6.1.10 apply.)


[Hint: It suffices to prove this in the field Q ( x1 , x2 , . . . , x N ) of rational
functions over Q (why?). Define the polynomial P (t) := ∏ ( xi + t) −
i ∈[ N ]
∏ ( xi − t) (in the indeterminate t over this field). This polynomial P (t)
i ∈[ N ]
is divisible by 2t. Let Q (t) be the quotient. What then?]
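
Both of the last two identities (Exercises A.6.1.12 and A.6.1.13) can likewise be tested with exact rational arithmetic. The snippet below is an editorial sanity check (the evaluation points are arbitrary but pairwise distinct):

```python
from fractions import Fraction
from math import prod

def lhs12(xs):  # the sum from Exercise A.6.1.12
    N = len(xs)
    return sum(Fraction(xs[k] * prod(xs[k] + xs[i] for i in range(N) if i != k),
                        prod(xs[k] - xs[i] for i in range(N) if i != k))
               for k in range(N))

def lhs13(xs):  # the sum from Exercise A.6.1.13
    N = len(xs)
    return sum(Fraction(prod(xs[k] + xs[i] for i in range(N) if i != k),
                        prod(xs[k] - xs[i] for i in range(N) if i != k))
               for k in range(N))

assert lhs12([2, 3]) == 5 and lhs12([2, 3, 5]) == 10 and lhs12([1, 4, 9, 16]) == 30
assert lhs13([2, 3]) == 0 and lhs13([2, 3, 5]) == 1 and lhs13([1, 4, 9, 16]) == 0
print("ok")
```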

Exercise A.6.1.14. Let n ∈ N.


(a) 3 Prove that

$$h_n\big[ x_1^2, x_2^2, \ldots, x_N^2 \big] = \sum_{i=0}^{2n} (-1)^i h_i h_{2n-i}.$$

(b) 3 Prove that

$$e_n\big[ x_1^2, x_2^2, \ldots, x_N^2 \big] = \sum_{i=0}^{2n} (-1)^{n-i} e_i e_{2n-i}.$$

(c) 2 Solve Exercise A.2.2.1 again using part (b).

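Both identities in the preceding exercise can be checked numerically at an arbitrary point. This is an editorial sanity check (the point (2, 3, 5) is arbitrary):

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def e(n, xs):
    return sum(prod(t) for t in combinations(xs, n))

def h(n, xs):
    return sum(prod(t) for t in combinations_with_replacement(xs, n))

xs = [2, 3, 5]
sq = [x * x for x in xs]   # the point (x_1^2, x_2^2, x_3^2)
for n in range(4):
    assert h(n, sq) == sum((-1) ** i * h(i, xs) * h(2 * n - i, xs)
                           for i in range(2 * n + 1))
    assert e(n, sq) == sum((-1) ** (n - i) * e(i, xs) * e(2 * n - i, xs)
                           for i in range(2 * n + 1))
print("ok")
```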
Exercise A.6.1.15. 4 Let i ∈ [N + 1] and p ∈ N. Prove that

$$h_p[x_i, x_{i+1}, \ldots, x_N] = \sum_{t=0}^{i-1} (-1)^t e_t[x_1, x_2, \ldots, x_{i-1}] \cdot h_{p-t}.$$

(The "$h_{p-t}$" at the end of the right hand side means $h_{p-t}[x_1, x_2, \ldots, x_N]$.)

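This identity, too, is easy to test numerically for all i ∈ [N + 1] at once. This is an editorial sanity check (the point (2, 3, 5, 7) is arbitrary):

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def e(n, xs):
    return sum(prod(t) for t in combinations(xs, n))

def h(n, xs):
    if n < 0:
        return 0
    return sum(prod(t) for t in combinations_with_replacement(xs, n))

xs = [2, 3, 5, 7]          # N = 4, arbitrary values
N = len(xs)
for i in range(1, N + 2):  # i ranges over [N + 1]
    for p in range(5):
        lhs = h(p, xs[i - 1:])      # h_p[x_i, ..., x_N]
        rhs = sum((-1) ** t * e(t, xs[:i - 1]) * h(p - t, xs)
                  for t in range(i))
        assert lhs == rhs
print("ok")
```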
Exercise A.6.1.16. (a) 2 Prove that each i ∈ [ N ] and j ∈ [ N ] satisfy

$$\frac{\partial e_j}{\partial x_i} = e_{j-1}[x_1, x_2, \ldots, \widehat{x_i}, \ldots, x_N],$$
where the hat over the “xi ” means “omit the xi entry” (that
is, the expression “x1 , x2 , . . . , xbi , . . . , x N ” is to be understood as
“x1 , x2 , . . . , xi−1 , xi+1 , xi+2 , . . . , x N ”).
(b) 3 Prove that

$$\det\left( \left( \frac{\partial e_j}{\partial x_i} \right)_{1 \le i \le N,\ 1 \le j \le N} \right) = \prod_{1 \le i < j \le N} (x_i - x_j).$$

(c) 5 Prove that

$$\det\left( \left( \frac{\partial h_j}{\partial x_i} \right)_{1 \le i \le N,\ 1 \le j \le N} \right) = \prod_{1 \le i < j \le N} (x_j - x_i).$$

[Hint: For part (c), it is helpful to first prove the following more general
result:
Let u1 , u2 , . . . , u N be any N polynomials in P . Let v1 , v2 , . . . , v N be any
N polynomials in the polynomial ring K [y1 , y2 , . . . , y N ] (in N indeterminates

y1 , y2 , . . . , y N ). For each j ∈ [ N ], let w j := v j [u1 , u2 , . . . , u N ] ∈ P be the


polynomial obtained from v j by substituting u1 , u2 , . . . , u N for y1 , y2 , . . . , y N .
Then,
     
$$\left( \frac{\partial w_j}{\partial x_i} \right)_{1 \le i \le N,\ 1 \le j \le N} = \left( \frac{\partial u_j}{\partial x_i} \right)_{1 \le i \le N,\ 1 \le j \le N} \cdot \left( \frac{\partial v_j}{\partial y_i} \right)_{1 \le i \le N,\ 1 \le j \le N}.$$

This fact is a generalization of the chain rule and can be proved, e.g., by
decomposing v j into monomials and considering each monomial separately.
Now, use parts (a) and (e) of Exercise A.6.1.7 to apply this to our setting.]

A.6.2. N-partitions and monomial symmetric polynomials

Exercise A.6.2.1. 3 Let M ∈ {0, 1, . . . , N }. For any M-partition


µ = (µ1 , µ2 , . . . , µ M ) and any ( N − M)-partition ν = (ν1 , ν2 , . . . , νN − M ),
we let µ ⊔ ν denote the N-partition obtained by sorting the N-tuple
(µ1 , µ2 , . . . , µ M , ν1 , ν2 , . . . , νN − M ) in weakly decreasing order. (For example,
(3, 2, 0) ⊔ (4, 2, 1, 1) = (4, 3, 2, 2, 1, 1, 0).)
Let λ be any N-partition. Prove that

$$m_\lambda[x_1, x_2, \ldots, x_N] = \sum_{\substack{\mu \text{ is an } M\text{-partition};\\ \nu \text{ is an } (N-M)\text{-partition};\\ \mu \sqcup \nu = \lambda}} m_\mu[x_1, x_2, \ldots, x_M] \cdot m_\nu[x_{M+1}, x_{M+2}, \ldots, x_N].$$

Exercise A.6.2.2. 2 We shall use the notion of a tournament, as defined in


Exercise A.5.3.6. Let T be the set of all tournaments with vertex set [ N ].
We define the scoreboard scb D of a tournament D ∈ T to be the N-tuple
(s1 , s2 , . . . , s N ) ∈ N N , where

s j := (# of arcs of D that end at j)


= (# of i ∈ [ N ] such that (i, j) is an arc of D )

for each j ∈ [ N ].
Prove that
$$\prod_{1 \le i < j \le N} (x_i + x_j) = \sum_{\substack{\lambda \text{ is an } N\text{-partition}\\ \text{of } N(N-1)/2}} t_\lambda m_\lambda,$$

where tλ denotes the # of tournaments D ∈ T with scoreboard scb D = λ.

A.6.3. Schur polynomials



Exercise A.6.3.1. 2 Assume that N > 1. Without using the Jacobi–Trudi


identities, prove that s(n,1,0,0,...,0) = hn h1 − hn+1 for each positive integer n.
(Here, (n, 1, 0, 0, . . . , 0) denotes the N-partition whose first two entries are n
and 1 while all remaining entries equal 0.)

Exercise A.6.3.2. Let λ = (λ1 , λ2 , . . . , λ N ) be any N-partition.


(a) 2 Let k ∈ N. Show that

s(λ1 +k,λ2 +k,...,λ N +k) = ( x1 x2 · · · x N )k · sλ .

(b) 4 Let k ∈ N be such that k ≥ λ1 . Let µ be the N-partition


(k − λ N , k − λ N −1 , . . . , k − λ1 ). Prove that
$$s_\mu = (x_1 x_2 \cdots x_N)^k \cdot s_\lambda\big[ x_1^{-1}, x_2^{-1}, \ldots, x_N^{-1} \big]$$

in the Laurent polynomial ring $K\big[x_1^{\pm}, x_2^{\pm}, \ldots, x_N^{\pm}\big]$. (We have never formally defined this ring, but it should suffice to know that the elements of $K\big[x_1^{\pm}, x_2^{\pm}, \ldots, x_N^{\pm}\big]$ are formal K-linear combinations of "Laurent monomials" $x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}$ with a_1, a_2, . . . , a_N ∈ Z.)

Exercise A.6.3.3. Let λ = (λ1 , λ2 , . . . , λ N ) and µ = (µ1 , µ2 , . . . , µ N ) be any


two N-partitions.
(a) 1 Show that each k ∈ N satisfies

s(λ1 +k,λ2 +k,...,λ N +k)/(µ1 +k,µ2 +k,...,µ N +k) = sλ/µ .

(b) 4 Let p ∈ N be such that p ≥ λ1 and p ≥ µ1 . Show that

s( p−µ N ,p−µ N −1 ,...,p−µ1 )/( p−λ N ,p−λ N −1 ,...,p−λ1 ) = sλ/µ .

[Note: The Young diagram of


(λ1 + k, λ2 + k, . . . , λ N + k) / (µ1 + k, µ2 + k, . . . , µ N + k) is obtained
from Y (λ/µ) by a parallel shift, whereas the Young diagram of
( p − µ N , p − µ N −1 , . . . , p − µ1 ) / ( p − λ N , p − λ N −1 , . . . , p − λ1 ) is obtained
from Y (λ/µ) by a 180◦ -rotation.]

Exercise A.6.3.4. Let λ and µ be two N-partitions with µ ⊆ λ. Recall the


Bender–Knuth involutions β k : SSYT (λ/µ) → SSYT (λ/µ) defined for all
k ∈ [n − 1] in the proof of Theorem 7.3.21.
(a) 1 Prove that β i ◦ β j = β j ◦ β i whenever i and j are two elements of
[ N − 1] satisfying |i − j| > 1.

(b) 4 Prove that β 1 ◦ β 2 ◦ β 1 = β 2 ◦ β 1 ◦ β 2 if µ = 0 (where 0 = (0, 0, . . . , 0)


as in Remark 7.3.22).
(c) 2 Find an example where β 1 ◦ β 2 ◦ β 1 ̸= β 2 ◦ β 1 ◦ β 2 for µ ̸= 0.

The next exercise answers a rather natural question about the definition of a
semistandard tableau (Definition 7.3.6): What happens when one replaces “in-
crease strictly down each column” by “increase weakly down each column”?
This replacement gives rise to a more liberal notion of semistandard tableaux
(which I call “semi-semistandard tableaux”), and to a variant of the Schur poly-
nomial that correspondingly has more terms. As the following exercise shows,
alas, this new polynomial is rarely ever symmetric:

Exercise A.6.3.5. Let λ be an N-partition, or (more generally) a partition of


any length. A Young tableau T of shape λ will be called semi-semistandard if
its entries

• increase weakly along each row;

• increase weakly down each column.

For instance, the tableau of shape (2, 2) with first row 1 1 and second row 1 2 is semi-semistandard but not semistandard. (Semi-semistandard tableaux are actually known as reverse plane partitions for historical reasons: If you replace the words "increase" by "decrease",
then they become “2-dimensional” analogues of partitions, in that they are
tables of positive integers that weakly decrease in two directions.)
We define the "fool's Schur polynomial" $\widehat{s}_\lambda \in \mathcal{P}$ by

$$\widehat{s}_\lambda := \sum_{T \in \operatorname{SSSYT}(\lambda)} x_T,$$

where SSSYT(λ) is the set of all semi-semistandard tableaux of shape λ.


(a) 2 Assume that N ≥ 3 and K = Z. Prove that the polynomial $\widehat{s}_\lambda$ is
symmetric if and only if the Young diagram Y(λ) either consists of a single
row (i.e., we have λ = (n, 0, 0, . . . , 0) for some n ∈ N) or consists of a single
column (i.e., we have λ = (1, 1, . . . , 1, 0, 0, . . . , 0) for some number of 1's).
(b) 2 Now, replace the assumption N ≥ 3 by N = 2. Prove that the polynomial $\widehat{s}_\lambda$ is symmetric if and only if the Young diagram Y(λ) is a rectangle
(i.e., all nonzero entries of λ are equal).
[Hint: Compare the coefficients of x1 x2k and x1k x2 , as well as the coefficients
of x1 x2 x3k and x1 x2k x3 .]

The next exercise asks you to explore some possible (or impossible) variations

on the proof of Lemma 7.3.34.

Exercise A.6.3.6. Recall the definition of the sign-reversing involution f :


X → X in the proof of Lemma 7.3.34.
(a) 2 Would f still be a sign-reversing involution if we defined j to be the
smallest (rather than largest) violator of T ?
(b) 1 Would f still be a sign-reversing involution if we defined k to be the
largest (rather than smallest) misstep of T ?

Here are some more properties of Schur polynomials:

Exercise A.6.3.7. 3 Let ρ := (N − 1, N − 2, . . . , N − N) ∈ N^N. Prove that

$$s_\rho = \prod_{1 \le i < j \le N} (x_i + x_j).$$

Exercise A.6.3.8. 3 Prove Proposition 7.3.45.

Exercise A.6.3.9. (a) 4 Prove Theorem 7.3.46 (a).


(b) 4 Prove Theorem 7.3.46 (b).
[Hint: For part (a), apply Theorem 7.3.32 to (n, 0, 0, . . . , 0), 0 and µ instead
of λ, µ and ν, and characterize the µ-Yamanouchi semistandard tableaux of
shape (n, 0, 0, . . . , 0) /0. Proceed likewise for part (b).]

Exercise A.6.3.10. Complete the proof of Theorem 7.3.48 sketched above:


(a) 2 Prove Observation 1.
(b) 3 Prove Observation 2.

Exercise A.6.3.11. 6 Prove Theorem 7.3.51.

Exercise A.6.3.12. 5 Let n be a positive integer. Prove that

$$p_n = \sum_{i=0}^{\min\{n, N\} - 1} (-1)^i s_{Q(i)},$$

where Q(i) is the N-partition $\big( n - i, \underbrace{1, 1, \ldots, 1}_{i \text{ many } 1\text{'s}}, \underbrace{0, 0, \ldots, 0}_{N - i - 1 \text{ many } 0\text{'s}} \big)$.

Exercise A.6.3.13. 5 Let λ = (λ1 , λ2 , . . . , λ N ) and µ = (µ1 , µ2 , . . . , µ N ) be


two N-partitions. Prove that
  
$$s_\lambda s_\mu = \det\left( \left( h_{\lambda_i + \mu_{N+1-j} - i + j} \right)_{1 \le i \le N,\ 1 \le j \le N} \right).$$

[Hint: Fix some m ∈ N that is larger than all of µ1 , µ2 , . . . , µ N , and consider


the skew partition ν/κ = (ν1 , ν2 , . . . , νN ) / (κ1 , κ2 , . . . , κ N ), where νi := λi + m
and κi := m − µ N +1−i for all i ∈ [ N ]. What are the semistandard tableaux of
shape ν/κ, and what does this mean for sν/κ ?]

Exercise A.6.3.14. 5 Prove the flagged first Jacobi–Trudi formula: Let M ∈ N.


Let λ = (λ1 , λ2 , . . . , λ M ) and µ = (µ1 , µ2 , . . . , µ M ) be two M-partitions
(i.e., weakly decreasing M-tuples of nonnegative integers). Let α =
(α1 , α2 , . . . , α M ) and β = ( β 1 , β 2 , . . . , β M ) be two weakly increasing sequences
of elements of {0, 1, . . . , N }. (As usual, “weakly increasing” means that
α1 ≤ α2 ≤ · · · ≤ α M and β 1 ≤ β 2 ≤ · · · ≤ β M .) Let

$$s_{\lambda/\mu,\, \alpha/\beta} := \sum_{\substack{T \in \operatorname{SSYT}(\lambda/\mu);\\ \alpha_i < T(i,j) \le \beta_i \text{ for all } (i,j) \in Y(\lambda/\mu)}} x_T \in \mathcal{P}.$$

(This is the same sum as sλ/µ , but restricted to those semistandard tableaux
whose entries in the i-th row belong to the half-open interval (αi , β i ] for each
i ∈ [ M ]. This polynomial is known as a (row-)flagged Schur polynomial; in
general, it is not symmetric.) Then,
$$s_{\lambda/\mu,\, \alpha/\beta} = \det\left( \left( h_{\lambda_i - \mu_j - i + j}\big[ x_{\alpha_j + 1}, x_{\alpha_j + 2}, \ldots, x_{\beta_i} \big] \right)_{1 \le i \le M,\ 1 \le j \le M} \right).$$

(If α j ≥ β i , then “xα j +1 , xα j +2 , . . . , x βi ” is understood to be an empty list, so


h i
that hλi −µ j −i+ j xα j +1 , xα j +2 , . . . , x βi is the complete homogeneous symmetric
polynomial hλi −µ j −i+ j of 0 variables; this equals 1 if λi − µ j − i + j = 0 and 0
otherwise.)

Exercise A.6.3.15. Let d denote the differential operator

$$\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2} + \cdots + \frac{\partial}{\partial x_N} \qquad \text{on } \mathcal{P}.$$

Explicitly, this operator d is the K-linear map from P to P that sends each monomial $x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}$ to

$$\sum_{\substack{i \in [N];\\ a_i > 0}} a_i \underbrace{x_1^{a_1} x_2^{a_2} \cdots x_{i-1}^{a_{i-1}} x_i^{a_i - 1} x_{i+1}^{a_{i+1}} x_{i+2}^{a_{i+2}} \cdots x_N^{a_N}}_{\substack{\text{This is just the monomial } x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N},\\ \text{with the exponent on } x_i \text{ decreased by } 1}}.$$

As usual for linear operators, we abbreviate d ( f ) by d f (when f ∈ P ).



Prove the following:


(a) 1 We have d ( f g) = (d f ) · g + f · dg for any f , g ∈ P . (In the lingo of
algebraists, this is saying that d is a derivation on P .)
(b) 1 We have d (en ) = ( N − n + 1) en−1 for any n ∈ Z.
(c) 1 We have d (hn ) = ( N + n − 1) hn−1 for any n ∈ Z.
(d) 1 We have d ( pn ) = npn−1 for any n > 1.
(e) 1 We have d (σ · f ) = σ · (d f ) for any f ∈ P and any σ ∈ S N .
(f) 1 We have d f ∈ S for each f ∈ S .

(g) 2 We have d aρ = 0. (Here, aρ is as in Definition 7.3.2.)
(h) 5 Let λ = (λ1 , λ2 , . . . , λ N ) be an N-partition. Set λ N +1 := 0. Let D be
the set of all i ∈ [ N ] such that λi > λi+1 . For each i ∈ D, let λ − ei denote the
N-partition obtained from λ by subtracting 1 from the i-th entry; that is, we
let
λ − ei := (λ1 , λ2 , . . . , λi−1 , λi − 1, λi+1 , λi+2 , . . . , λ N ) .
(This is an N-partition, since i ∈ D entails λi > λi+1 and thus λi − 1 ≥ λi+1 .)
Then,
$$d(s_\lambda) = \sum_{i \in D} (\lambda_i + N - i)\, s_{\lambda - e_i}. \tag{312}$$
[Hint: For part (h), study how d acts on the alternant $a_{\lambda + \rho}$, and show that
$d(a_\rho f) = a_\rho \cdot d(f)$ for any f ∈ P.]
[Remark: The equality (312) can be restated in terms of Young diagrams
as follows:
d (sλ ) = ∑ k λ,µ sµ ,
where

• the sum ranges over all N-partitions µ such that the Young diagram
Y (µ) can be obtained from Y (λ) by removing a single box (without
shifting the remaining boxes);

• the coefficient k λ,µ is defined to be j + N − i if (i, j) is the box that must


be removed from Y (λ) to obtain Y (µ).

For example, for N = 4 and λ = (3, 1, 1, 0), this says that

$$d\big( s_{(3,1,1,0)} \big) = \underbrace{(\lambda_1 + N - 1)}_{= 3 + 4 - 1 = 6} s_{(2,1,1,0)} + \underbrace{(\lambda_3 + N - 3)}_{= 1 + 4 - 3 = 2} s_{(3,1,0,0)} = 6 s_{(2,1,1,0)} + 2 s_{(3,1,0,0)}. \quad ]$$

Exercise A.6.3.16. 7 (a) Let n ∈ N, and let k be a positive integer. Recall


the (n, k)-Petrie symmetric polynomial gk,n defined in Exercise A.6.1.6. Show
that gk,n can be written in the form

$$g_{k,n} = \sum_{\lambda \vdash n} d_\lambda s_\lambda \qquad \text{for some } d_\lambda \in \{0, 1, -1\}.$$

(The notation “λ ⊢ n” means “λ is a partition of n”, so that the sum ranges


over all partitions of n.)
(b) Show that the numbers dλ in this expression are explicitly given by
 
$$d_\lambda = \det\left( \big( [0 \le \lambda_i - i + j < k] \big)_{1 \le i \le M,\ 1 \le j \le M} \right),$$

where the partition λ has been written as λ = (λ1 , λ2 , . . . , λ M ) for some


M ∈ N, and where we are using the Iverson bracket notation (see Definition
4.1.5).

Exercise A.6.3.17. Let I be the ideal of the ring P generated by the N poly-
nomials e1 , e2 , . . . , e N .
(a) 2 Prove that any homogeneous symmetric polynomial f ∈ S of posi-
tive degree is contained in I.
(b) 9 Prove that the quotient ring P /I is a free K-module with basis
 
$$\left( \overline{x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}} \right)_{(a_1, a_2, \ldots, a_N) \in H_N},$$

where the N!-element set HN is defined as in Definition 5.3.6 (c) (for n = N).
(Here, the notation f denotes the projection of a polynomial f ∈ P onto the
quotient ring P /I.)
(c) 5 More generally: Let u1 , u2 , . . . , u N be N polynomials in P such that

deg ui < i for each i ∈ [ N ] .

(The polynomials ui need not be symmetric or homogeneous.) Let I ′ be the


ideal of the ring P generated by the N polynomials

e1 − u 1 , e2 − u 2 , . . . , e N − u N .

Prove that the quotient ring P /I ′ is a free K-module with basis


 
$$\left( \overline{x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}} \right)_{(a_1, a_2, \ldots, a_N) \in H_N},$$

where the N!-element set HN is defined as in Definition 5.3.6 (c) (for n = N).
(Here, the notation f denotes the projection of a polynomial f ∈ P onto the
quotient ring P /I ′ .)

B. Omitted details and proofs


This chapter contains some proofs (and parts of proofs) that have been omit-
ted from the text above – usually because they are technical arguments or of
tangential interest only.
Each of the proofs given below uses the notations and conventions of the
chapter and section in which the respective claim appears.

B.1. x n -equivalence
Detailed proof of Theorem 3.10.3. (a) We claim the following:

xn
Claim 1: We have f ≡ f for each f ∈ K [[ x ]].

[Proof of Claim 1: Let f ∈ K [[ x ]]. Obviously, each m ∈ {0, 1, . . . , n} satisfies [ x m ] f =


xn
[ x m ] f . In other words, we have f ≡ f (by Definition 3.10.1). This proves Claim 1.]
xn xn xn
Claim 2: If three FPSs f , g, h ∈ K [[ x ]] satisfy f ≡ g and g ≡ h, then f ≡ h.

xn xn
[Proof of Claim 2: Let f , g, h ∈ K [[ x ]] be three FPSs satisfying f ≡ g and g ≡ h. We
xn
must show that f ≡ h.
xn
We have f ≡ g. In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] f = [ x m ] g (313)

(by Definition 3.10.1).


xn
We have g ≡ h. In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] g = [ x m ] h (314)

(by Definition 3.10.1).


Now, each m ∈ {0, 1, . . . , n} satisfies

[ xm ] f = [ xm ] g (by (313))
= [ xm ] h (by (314)) .
xn
In other words, we have f ≡ h (by Definition 3.10.1). This proves Claim 2.]

xn xn
Claim 3: If two FPSs f , g ∈ K [[ x ]] satisfy f ≡ g, then g ≡ f .

xn
[Proof of Claim 3: Let f , g ∈ K [[ x ]] be two FPSs satisfying f ≡ g. We must show that
xn
g ≡ f.
xn
We have f ≡ g. In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] f = [ x m ] g



(by Definition 3.10.1). In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] g = [ x m ] f .


xn
In other words, we have g ≡ f (by Definition 3.10.1). This proves Claim 3.]
xn
Now, the relation ≡ is reflexive (by Claim 1), transitive (by Claim 2) and symmetric
xn
(by Claim 3). In other words, this relation ≡ is an equivalence relation. This proves
Theorem 3.10.3 (a).
xn xn
(b) Let a, b, c, d ∈ K [[ x ]] be four FPSs satisfying a ≡ b and c ≡ d.
xn
We have a ≡ b. In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] a = [ x m ] b (315)

(by Definition 3.10.1).


xn
We have c ≡ d. In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] c = [ x m ] d (316)

(by Definition 3.10.1).


Now, every m ∈ {0, 1, . . . , n} satisfies

[ xm ] ( a + c) = [ xm ] a + [ xm ] c (317)

(by (20), applied to m, a and c instead of n, a and b) and

[ x m ] (b + d) = [ x m ] b + [ x m ] d (318)

(by (20), applied to m, b and d instead of n, a and b). Hence, every m ∈ {0, 1, . . . , n}
satisfies

$$[x^m](a + c) = [x^m] a + [x^m] c \quad (\text{by (317)}) = [x^m] b + [x^m] d \quad (\text{by (315) and (316)}) = [x^m](b + d) \quad (\text{by (318)}).$$
xn
In other words, a + c ≡ b + d (by Definition 3.10.1). Thus, we have proved (99). The
same argument (but with all “+” signs replaced by “−” signs, and with all references
to (20) replaced by references to (21)) can be used to prove (100). It remains to prove
(101).
Let m ∈ {0, 1, . . . , n}. Then, m ≤ n.
Now, let i ∈ {0, 1, . . . , m}. Then, i ∈ {0, 1, . . . , m} ⊆ {0, 1, . . . , n} (since m ≤ n).
Hence, (315) (applied to i instead of m) yields $[x^i] a = [x^i] b$. Furthermore, from i ∈
{0, 1, . . . , m}, we obtain m − i ∈ {0, 1, . . . , m} ⊆ {0, 1, . . . , n} (since m ≤ n). Hence, (316)
(applied to m − i instead of m) yields $[x^{m-i}] c = [x^{m-i}] d$. Multiplying the equalities
$[x^i] a = [x^i] b$ and $[x^{m-i}] c = [x^{m-i}] d$, we obtain

$$[x^i] a \cdot [x^{m-i}] c = [x^i] b \cdot [x^{m-i}] d. \tag{319}$$

Forget that we fixed i. We thus have proved (319) for each i ∈ {0, 1, . . . , m}. Now,
(22) (applied to m, a and c instead of n, a and b) yields
$$[x^m](ac) = \sum_{i=0}^{m} [x^i] a \cdot [x^{m-i}] c = \sum_{i=0}^{m} [x^i] b \cdot [x^{m-i}] d$$

(by (319)).

On the other hand, (22) (applied to m, b and d instead of n, a and b) yields


$$[x^m](bd) = \sum_{i=0}^{m} [x^i] b \cdot [x^{m-i}] d.$$

Comparing these two equalities, we find [ x m ] ( ac) = [ x m ] (bd).


Forget that we fixed m. We thus have shown that each m ∈ {0, 1, . . . , n} satisfies
xn
[ x m ] ( ac) = [ x m ] (bd). In other words, ac ≡ bd (by Definition 3.10.1). Thus we have
proved (101), so that the proof of Theorem 3.10.3 (b) is complete.
xn xn
(c) Let a, b ∈ K [[ x ]] be two FPSs satisfying a ≡ b. We have a ≡ b. In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] a = [ x m ] b (320)

(by Definition 3.10.1). However, each m ∈ {0, 1, . . . , n} satisfies

[ x m ] (λa) = λ · [ x m ] a (321)

(by (25), applied to m and a instead of n and a) and

[ x m ] (λb) = λ · [ x m ] b (322)

(by (25), applied to m and b instead of n and a). Now, each m ∈ {0, 1, . . . , n} satisfies

$$[x^m](\lambda a) = \lambda \cdot [x^m] a \quad (\text{by (321)}) = \lambda \cdot [x^m] b \quad (\text{by (320)}) = [x^m](\lambda b) \quad (\text{by (322)}).$$

xn
In other words, λa ≡ λb (by Definition 3.10.1). This proves Theorem 3.10.3 (c).
xn xn
(d) Let a, b ∈ K [[ x ]] be two FPSs satisfying a ≡ b. We have a ≡ b. In other words,

each m ∈ {0, 1, . . . , n} satisfies [ x m ] a = [ x m ] b (323)

(by Definition 3.10.1).


Now, we shall show that each m ∈ {0, 1, . . . , n} satisfies
   
$$[x^m]\left( a^{-1} \right) = [x^m]\left( b^{-1} \right). \tag{324}$$

[Proof of (324): We shall prove (324) by strong induction on m:



Induction step: Fix some k ∈ {0, 1, . . . , n}. We assume (as an induction hypothesis)
that (324) is true for any m < k. In other words, for any m ∈ {0, 1, . . . , n} satisfying
m < k, we have

$$[x^m]\left( a^{-1} \right) = [x^m]\left( b^{-1} \right). \tag{325}$$

We shall now prove that (324) is true for m = k. In other words, we shall prove that
$[x^k]\left( a^{-1} \right) = [x^k]\left( b^{-1} \right)$.
Proposition 3.3.7 shows that the FPS a is invertible in K[[x]] if and only if its constant
term $[x^0] a$ is invertible in K. Hence, its constant term $[x^0] a$ is invertible in K (since a
is invertible in K[[x]]). Note that k ≤ n (since k ∈ {0, 1, . . . , n}).


Applying (22) to k, a and $a^{-1}$ instead of n, a and b, we obtain

$$[x^k]\left( a a^{-1} \right) = \sum_{i=0}^{k} [x^i] a \cdot [x^{k-i}]\left( a^{-1} \right) = [x^0] a \cdot [x^k]\left( a^{-1} \right) + \sum_{i=1}^{k} [x^i] a \cdot [x^{k-i}]\left( a^{-1} \right)$$

(here, we have split off the addend for i = 0 from the sum). Thus,

$$[x^0] a \cdot [x^k]\left( a^{-1} \right) + \sum_{i=1}^{k} [x^i] a \cdot [x^{k-i}]\left( a^{-1} \right) = [x^k]\underbrace{\left( a a^{-1} \right)}_{= 1} = [x^k] 1,$$

so that

$$[x^0] a \cdot [x^k]\left( a^{-1} \right) = [x^k] 1 - \sum_{i=1}^{k} [x^i] a \cdot [x^{k-i}]\left( a^{-1} \right).$$

We can divide this equality by $[x^0] a$ (since $[x^0] a$ is invertible in K), and thus obtain

$$[x^k]\left( a^{-1} \right) = \left( [x^0] a \right)^{-1} \cdot \left( [x^k] 1 - \sum_{i=1}^{k} [x^i] a \cdot [x^{k-i}]\left( a^{-1} \right) \right). \tag{326}$$

The same argument (applied to b instead of a) yields

$$[x^k]\left( b^{-1} \right) = \left( [x^0] b \right)^{-1} \cdot \left( [x^k] 1 - \sum_{i=1}^{k} [x^i] b \cdot [x^{k-i}]\left( b^{-1} \right) \right). \tag{327}$$

However, we observe the following:

• We have 0 ∈ {0, 1, . . . , n} (since n ∈ N) and thus x0 a = x0 b (by (323), applied


   

to m = 0).

• Each i ∈ {1, 2, . . . , k } satisfies i ∈ {1, 2, . . . , k } ⊆ {0, 1, . . . , n} (since 1 ≥ 0 and


k ≤ n) and therefore h i h i
xi a = xi b (328)

(by (320), applied to m = i).


Math 701 Spring 2021, version April 6, 2024 page 605

• Each i ∈ {1, 2, . . . , k } satisfies k − i ∈ {0, 1, . . . , k − 1} ⊆ {0, 1, . . . , n} (since k −


1 ≤ k ≤ n) and k − i < k (since k − i ∈ {0, 1, . . . , k − 1}, so that k − i ≤ k − 1 < k),
and therefore h i  h i 
x k − i a −1 = x k − i b −1 (329)

(by (325), applied to m = k − i).

Hence, (326) becomes


 
  −1  
k
h i  h i h i h i   
k −1
=  x a ·  x 1 − ∑ x a · x k −i −1 
 0    k i

x a  a 
| {z }  i =1 | {z } | {z }
0
=[ x ]b
=[ xi ]b =[ x k−i ](b−1 )
 
(by (328)) (by (329))
!
   −1 h i k h i h i 
= x0 b · x k 1 − ∑ x i b · x k − i b −1
i =1
h i 
= x k b −1 (by (327)) .

In other words, (324) is true for m = k. This completes the induction step. Thus, (324)
is proved.]
Thus, we have shown that each m ∈ {0, 1, . . . , n} satisfies [ x m ] a−1 = [ x m ] b−1 . In
 
xn
other words, a−1 ≡ b−1 (by Definition 3.10.1). This proves Theorem 3.10.3 (d).
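The recurrence (326) is effectively an algorithm: the first n + 1 coefficients of a^{-1} are computed from the first n + 1 coefficients of a alone. The following is a minimal numerical sketch of this fact (not part of the text; the function names are our own), over K = Q:

```python
# Sketch of Theorem 3.10.3 (d) via formula (326): the first n+1 coefficients
# of a^(-1) depend only on the first n+1 coefficients of a, so congruent
# FPSs have congruent inverses. Coefficients are stored as Python lists.

from fractions import Fraction


def inverse_coeffs(a, n):
    """Return [x^0]..[x^n] of a^(-1), where a is the coefficient list
    [x^0]a, [x^1]a, ... (with [x^0]a invertible), via the recurrence (326)."""
    a0_inv = Fraction(1) / Fraction(a[0])
    inv = [a0_inv]  # [x^0](a^(-1)) = ([x^0]a)^(-1)
    for k in range(1, n + 1):
        # [x^k](a^(-1)) = -([x^0]a)^(-1) * sum_{i=1}^{k} [x^i]a * [x^{k-i}](a^(-1))
        s = sum(Fraction(a[i]) * inv[k - i] for i in range(1, k + 1) if i < len(a))
        inv.append(-a0_inv * s)
    return inv


n = 4
a = [1, -1]              # a = 1 - x, whose inverse is 1 + x + x^2 + ...
b = [1, -1, 0, 0, 0, 7]  # agrees with a up to x^n; differs at x^5
assert inverse_coeffs(a, n) == inverse_coeffs(b, n)  # inverses agree mod x^(n+1)
```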
(e) Let a, b, c, d ∈ K[[x]] be four FPSs satisfying a \overset{x^n}{\equiv} b and c \overset{x^n}{\equiv} d. Assume that the FPSs c and d are invertible. Then, Theorem 3.10.3 (d) (applied to c and d instead of a and b) yields c^{-1} \overset{x^n}{\equiv} d^{-1}. Hence, (101) (applied to c^{-1} and d^{-1} instead of c and d) yields ac^{-1} \overset{x^n}{\equiv} bd^{-1}. In other words, a/c \overset{x^n}{\equiv} b/d (since ac^{-1} = a/c and bd^{-1} = b/d). This proves Theorem 3.10.3 (e).
(f) Let us first prove (104):
[Proof of (104): We proceed by induction on |S|:
Induction base: It is easy to see that (104) holds for |S| = 0.^{162}
Induction step: Let k ∈ N. Assume (as the induction hypothesis) that (104) holds for |S| = k. We must prove that (104) holds for |S| = k + 1.

^{162} Proof. Let S, (a_s)_{s∈S} and (b_s)_{s∈S} be as in Theorem 3.10.3 (f), and assume that |S| = 0. From |S| = 0, we obtain S = ∅. Hence,

    ∑_{s∈S} a_s = (empty sum) = 0   and   ∑_{s∈S} b_s = (empty sum) = 0.

Comparing these two equalities, we obtain ∑_{s∈S} a_s = ∑_{s∈S} b_s. Hence, ∑_{s∈S} a_s \overset{x^n}{\equiv} ∑_{s∈S} b_s (since the relation \overset{x^n}{\equiv} is an equivalence relation and thus is reflexive). Thus, we have proved (104) under the assumption that |S| = 0.

So let S, (a_s)_{s∈S} and (b_s)_{s∈S} be as in Theorem 3.10.3 (f), and assume that |S| = k + 1. Then, |S| = k + 1 > k ≥ 0, so that the set S is nonempty. In other words, there exists some t ∈ S. Consider this t. Each s ∈ S \ {t} satisfies s ∈ S \ {t} ⊆ S and thus a_s \overset{x^n}{\equiv} b_s (by (103)). Moreover, from t ∈ S, we obtain |S \ {t}| = |S| − 1 = k (since |S| = k + 1). Hence, we can apply (104) to S \ {t} instead of S (since our induction hypothesis says that (104) holds for |S| = k). As a result, we obtain

    ∑_{s∈S\{t}} a_s \overset{x^n}{\equiv} ∑_{s∈S\{t}} b_s.

On the other hand, a_t \overset{x^n}{\equiv} b_t (by (103), applied to s = t). Hence, (99) (applied to a = a_t and b = b_t and c = ∑_{s∈S\{t}} a_s and d = ∑_{s∈S\{t}} b_s) yields

    a_t + ∑_{s∈S\{t}} a_s \overset{x^n}{\equiv} b_t + ∑_{s∈S\{t}} b_s.

In view of

    ∑_{s∈S} a_s = a_t + ∑_{s∈S\{t}} a_s   (here, we have split off the addend for s = t from the sum)

and

    ∑_{s∈S} b_s = b_t + ∑_{s∈S\{t}} b_s   (here, we have split off the addend for s = t from the sum),

this rewrites as

    ∑_{s∈S} a_s \overset{x^n}{\equiv} ∑_{s∈S} b_s.

Hence, we have shown that (104) holds for |S| = k + 1. This completes the induction step. Thus, the induction proof of (104) is complete.]
We have now proved (104). The exact same argument (but with all sums replaced by
products, and with the reference to (99) replaced by a reference to (101)) can be used
to prove (105). Hence, the proof of Theorem 3.10.3 (f) is complete.

Detailed proof of Proposition 3.10.4. For each m ∈ {0, 1, . . . , n}, we have

[ xm ] ( f − g) = [ xm ] f − [ xm ] g (330)

(by (21), applied to m, f and g instead of n, a and b).


Lemma 3.3.18 (applied to f − g and n + 1 instead of f and k) shows that the first
n + 1 coefficients of the FPS f − g are 0 if and only if f − g is a multiple of x n+1 .

Now, we have the following chain of logical equivalences:

    f \overset{x^n}{\equiv} g
    ⟺ (each m ∈ {0, 1, . . . , n} satisfies [x^m] f = [x^m] g)          (by Definition 3.10.1)
    ⟺ (each m ∈ {0, 1, . . . , n} satisfies [x^m] f − [x^m] g = 0)
    ⟺ (each m ∈ {0, 1, . . . , n} satisfies [x^m] (f − g) = 0)          (by (330))
    ⟺ (the first n + 1 coefficients of the FPS f − g are 0)
    ⟺ (f − g is a multiple of x^{n+1})

(since we have seen that the first n + 1 coefficients of the FPS f − g are 0 if and only if f − g is a multiple of x^{n+1}). In other words, we have f \overset{x^n}{\equiv} g if and only if the FPS f − g is a multiple of x^{n+1}. This proves Proposition 3.10.4.
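Proposition 3.10.4 can be sanity-checked on coefficient lists; the following small sketch (the helper names are ours, not the text's) compares the two characterizations of the congruence:

```python
# Check that "first n+1 coefficients agree" (Definition 3.10.1) is the same
# as "f - g is a multiple of x^(n+1)" (Proposition 3.10.4), on finite
# coefficient lists standing in for FPSs.

def congruent_mod_xn(f, g, n):
    """True iff [x^m]f = [x^m]g for all m in {0, 1, ..., n}."""
    coeff = lambda h, m: h[m] if m < len(h) else 0
    return all(coeff(f, m) == coeff(g, m) for m in range(n + 1))

def multiple_of_x_pow(h, k):
    """True iff the coefficient list h is a multiple of x^k,
    i.e. its first k coefficients vanish (Lemma 3.3.18)."""
    return all(c == 0 for c in h[:k])

f = [3, 1, 4, 1, 5, 9]
g = [3, 1, 4, 1, 5, 2]  # differs from f only in the x^5 coefficient
n = 4
diff = [u - v for u, v in zip(f, g)]
# The two characterizations agree:
assert congruent_mod_xn(f, g, n) == multiple_of_x_pow(diff, n + 1)
```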
Detailed proof of Proposition 3.10.5. Write the FPS a in the form a = ∑_{i∈N} a_i x^i with a_0, a_1, a_2, . . . ∈ K. Thus,

    a_m = [x^m] a   for each m ∈ N   (331)

(by the definition of [x^m] a). Furthermore, Definition 3.5.1 yields

    a ∘ c = ∑_{i∈N} a_i c^i   (332)

(since a = ∑_{i∈N} a_i x^i with a_0, a_1, a_2, . . . ∈ K).
Write the FPS b in the form b = ∑_{i∈N} b_i x^i with b_0, b_1, b_2, . . . ∈ K. Thus,

    b_m = [x^m] b   for each m ∈ N   (333)

(by the definition of [x^m] b). Furthermore, Definition 3.5.1 yields

    b ∘ d = ∑_{i∈N} b_i d^i   (334)

(since b = ∑_{i∈N} b_i x^i with b_0, b_1, b_2, . . . ∈ K).
We have a \overset{x^n}{\equiv} b. In other words,

    each m ∈ {0, 1, . . . , n} satisfies [x^m] a = [x^m] b   (335)

(by the definition of "a \overset{x^n}{\equiv} b"). Hence, each m ∈ {0, 1, . . . , n} satisfies

    a_m = [x^m] a = [x^m] b = b_m   (336)

(by (331), (335) and (333), respectively). Now, we claim the following:

Claim 1: Let i ∈ {0, 1, . . . , n}. Then, each m ∈ {0, 1, . . . , n} satisfies

    [x^m] (a_i c^i) = [x^m] (b_i d^i).

[Proof of Claim 1: Let S be the set {1, 2, . . . , i}. This set S is finite, and satisfies |S| = i. Moreover, we have c \overset{x^n}{\equiv} d for each s ∈ S (by assumption). Hence, (105) (applied to a_s = c and b_s = d) yields

    ∏_{s∈S} c \overset{x^n}{\equiv} ∏_{s∈S} d.

In view of

    ∏_{s∈S} c = c^{|S|} = c^i   (since |S| = i)   and   ∏_{s∈S} d = d^{|S|} = d^i   (since |S| = i),

we can rewrite this as c^i \overset{x^n}{\equiv} d^i. In other words,

    each m ∈ {0, 1, . . . , n} satisfies [x^m] (c^i) = [x^m] (d^i)   (337)

(by the definition of "c^i \overset{x^n}{\equiv} d^i").
Now, let m ∈ {0, 1, . . . , n}. Then, (25) (applied to m, a_i and c^i instead of n, λ and a) yields [x^m] (a_i c^i) = a_i · [x^m] (c^i). Similarly, [x^m] (b_i d^i) = b_i · [x^m] (d^i). On the other hand, (336) (applied to i instead of m) yields a_i = b_i. Hence,

    [x^m] (a_i c^i) = a_i · [x^m] (c^i) = b_i · [x^m] (d^i)   (by (337) and a_i = b_i)
                    = [x^m] (b_i d^i).

This proves Claim 1.]


Next, we claim the following:

Claim 2: Let m ∈ N. Let i ∈ N \ {0, 1, . . . , m}. Then,

    [x^m] (c^i) = 0   (338)

and

    [x^m] (d^i) = 0.   (339)

[Proof of Claim 2: We have i ∈ N \ {0, 1, . . . , m} = {m + 1, m + 2, m + 3, . . .}, so that i ≥ m + 1 and therefore m ≤ i − 1. Hence, m ∈ {0, 1, . . . , i − 1} (since m ∈ N).
By assumption, we have [x^0] c = 0. In other words, the 0-th coefficient of c is 0. In other words, the first 1 coefficient of the FPS c is 0. However, Lemma 3.3.18 (applied to k = 1 and f = c) yields that the first 1 coefficient of the FPS c is 0 if and only if c is a multiple of x^1. Hence, c is a multiple of x^1 (since the first 1 coefficient of the FPS c is 0). In other words, c = x^1 h for some h ∈ K[[x]]. Consider this h. Now, c = x^1 h = xh, so that c^i = (xh)^i = x^i h^i. However, Lemma 3.3.17 (applied to i and h^i instead of k and a) yields that the first i coefficients of the FPS x^i h^i are 0. In other words, the first i coefficients of the FPS c^i are 0 (since c^i = x^i h^i). In other words, [x^j] (c^i) = 0 for all j ∈ {0, 1, . . . , i − 1}. We can apply this to j = m (since m ∈ {0, 1, . . . , i − 1}), and thus obtain [x^m] (c^i) = 0. This proves (338). The proof of (339) is analogous (but uses the FPS d instead of c). Thus, Claim 2 is proven.]
Next, we generalize Claim 1 to all i ∈ N:

Claim 3: Let i ∈ N. Let m ∈ {0, 1, . . . , n}. Then,

    [x^m] (a_i c^i) = [x^m] (b_i d^i).

[Proof of Claim 3: If i ∈ {0, 1, . . . , m}, then this follows from Claim 1. Thus, for the rest of this proof, we WLOG assume that we don't have i ∈ {0, 1, . . . , m}. Hence, i ∉ {0, 1, . . . , m}, so that i ∈ N \ {0, 1, . . . , m} (since i ∈ N). Thus, Claim 2 applies, and therefore (338) and (339) hold.
Now, (25) (applied to m, a_i and c^i instead of n, λ and a) yields [x^m] (a_i c^i) = a_i · [x^m] (c^i) = 0 (by (338)). The same argument (applied to b_i and d instead of a_i and c) yields [x^m] (b_i d^i) = 0. Comparing these two equalities, we obtain [x^m] (a_i c^i) = [x^m] (b_i d^i). This proves Claim 3.]


Now, each m ∈ {0, 1, . . . , n} satisfies

    [x^m] (a ∘ c) = [x^m] ( ∑_{i∈N} a_i c^i )   (by (332))
                  = ∑_{i∈N} [x^m] (a_i c^i)
                  = ∑_{i∈N} [x^m] (b_i d^i)     (by Claim 3)
                  = [x^m] ( ∑_{i∈N} b_i d^i )
                  = [x^m] (b ∘ d)               (by (334)).

In other words, a ∘ c \overset{x^n}{\equiv} b ∘ d (by the definition of "a ∘ c \overset{x^n}{\equiv} b ∘ d"). This proves Proposition 3.10.5.
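Since only a_0, . . . , a_n and the coefficients of c up to x^n enter the first n + 1 coefficients of a ∘ c (by Claims 2 and 3), compositions can be computed by truncated arithmetic. A small illustrative sketch (the helper names are ours, not the text's):

```python
# Truncated composition of FPSs as coefficient lists, illustrating
# Proposition 3.10.5: if a ≡ b and c ≡ d up to x^n (with c, d having
# constant term 0), then a∘c ≡ b∘d up to x^n.

def mul_trunc(f, g, n):
    """Product of coefficient lists f, g, truncated after x^n."""
    h = [0] * (n + 1)
    for i, fi in enumerate(f[: n + 1]):
        for j, gj in enumerate(g[: n + 1 - i]):
            h[i + j] += fi * gj
    return h

def compose_trunc(a, c, n):
    """Coefficients [x^0]..[x^n] of a∘c = sum_i a_i c^i, assuming [x^0]c = 0
    (so only the terms with i <= n contribute to the first n+1 coefficients)."""
    result = [0] * (n + 1)
    power = [1] + [0] * n  # c^0 = 1, truncated
    for i in range(n + 1):
        ai = a[i] if i < len(a) else 0
        result = [r + ai * p for r, p in zip(result, power)]
        power = mul_trunc(power, c, n)
    return result

n = 3
a, b = [1, 2, 3, 4], [1, 2, 3, 4, 99]   # a ≡ b up to x^3
c, d = [0, 1, 1, 0], [0, 1, 1, 0, 5]    # c ≡ d up to x^3, constant terms 0
assert compose_trunc(a, c, n) == compose_trunc(b, d, n)
```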

B.2. Infinite products


Detailed proof of Proposition 3.11.7. Let us (temporarily) use two different notations for our two different definitions of a product: We let ∏_{i∈I} a_i denote the finite product defined in the usual way (i.e., defined as in any commutative ring), whereas \widetilde{∏}_{i∈I} a_i shall mean the product defined according to Definition 3.11.5 (b). Thus, our goal is to show that \widetilde{∏}_{i∈I} a_i = ∏_{i∈I} a_i.
Definition 3.11.5 (b) shows that if n ∈ N, and if M is a finite subset of I that determines the x^n-coefficient in the product of (a_i)_{i∈I}, then

    [x^n] ( \widetilde{∏}_{i∈I} a_i ) = [x^n] ( ∏_{i∈M} a_i ).   (340)

Now, let n ∈ N. The set I is finite (since the family (a_i)_{i∈I} is finite), and thus is a finite subset of I. Moreover, every finite subset J of I satisfying I ⊆ J ⊆ I satisfies

    [x^n] ( ∏_{i∈J} a_i ) = [x^n] ( ∏_{i∈I} a_i )

(because combining I ⊆ J and J ⊆ I yields J = I). In other words, I determines the x^n-coefficient in the product of (a_i)_{i∈I} (by the definition of "determining the x^n-coefficient in a product"). Hence, we can apply (340) to M = I. As a consequence, we obtain

    [x^n] ( \widetilde{∏}_{i∈I} a_i ) = [x^n] ( ∏_{i∈I} a_i ).

Now, forget that we fixed n. We thus have proved

    [x^n] ( \widetilde{∏}_{i∈I} a_i ) = [x^n] ( ∏_{i∈I} a_i )   for each n ∈ N.

In other words, each coefficient of the FPS \widetilde{∏}_{i∈I} a_i equals the corresponding coefficient of ∏_{i∈I} a_i. Hence, \widetilde{∏}_{i∈I} a_i = ∏_{i∈I} a_i. This completes the proof of Proposition 3.11.7.
i∈ I i∈ I i∈ I

Detailed proof of Lemma 3.11.9. We shall prove Lemma 3.11.9 by induction on |J| (this is allowed, since the set J is supposed to be finite):
Induction base: Lemma 3.11.9 is true in the case when |J| = 0.^{163}
Induction step: Let k ∈ N. Assume (as the induction hypothesis) that Lemma 3.11.9 is true in the case when |J| = k. We must prove that Lemma 3.11.9 is true in the case when |J| = k + 1.

^{163} Proof. Let a, (f_i)_{i∈J} and n be as in Lemma 3.11.9. Assume that |J| = 0. Thus, J = ∅, so that the product ∏_{i∈J} (1 + f_i) is empty. Hence, ∏_{i∈J} (1 + f_i) = (empty product) = 1. Hence, we have

    [x^m] ( a ∏_{i∈J} (1 + f_i) ) = [x^m] a   for each m ∈ {0, 1, . . . , n}.

Thus, we have proved Lemma 3.11.9 under the assumption that |J| = 0. Therefore, Lemma 3.11.9 is true in the case when |J| = 0.

Let a, (f_i)_{i∈J} and n be as in Lemma 3.11.9. Assume that |J| = k + 1. Thus, |J| = k + 1 > k ≥ 0; hence, the set J is nonempty. In other words, there exists some j ∈ J. Consider this j. We have j ∈ J and therefore

    [x^m] (f_j) = 0   for each m ∈ {0, 1, . . . , n}   (341)

(by (118), applied to i = j). Hence, Lemma 3.11.8 (applied to a ∏_{i∈J\{j}} (1 + f_i) and f_j instead of a and f) yields that

    [x^m] ( ( a ∏_{i∈J\{j}} (1 + f_i) ) (1 + f_j) ) = [x^m] ( a ∏_{i∈J\{j}} (1 + f_i) )   for each m ∈ {0, 1, . . . , n}.

In view of

    a ∏_{i∈J} (1 + f_i) = a (1 + f_j) ∏_{i∈J\{j}} (1 + f_i) = ( a ∏_{i∈J\{j}} (1 + f_i) ) (1 + f_j)

(here, we have split off the factor for i = j from the product), we can rewrite this as follows: We have

    [x^m] ( a ∏_{i∈J} (1 + f_i) ) = [x^m] ( a ∏_{i∈J\{j}} (1 + f_i) )   (342)

for each m ∈ {0, 1, . . . , n}.
On the other hand, each i ∈ J \ {j} satisfies i ∈ J \ {j} ⊆ J and therefore

    [x^m] (f_i) = 0   for each m ∈ {0, 1, . . . , n}

(by (118)). Furthermore, from j ∈ J, we obtain |J \ {j}| = |J| − 1 = k (since |J| = k + 1). Hence, we can apply Lemma 3.11.9 to J \ {j} instead of J (because our induction hypothesis tells us that Lemma 3.11.9 is true in the case when |J| = k). We thus conclude that

    [x^m] ( a ∏_{i∈J\{j}} (1 + f_i) ) = [x^m] a   for each m ∈ {0, 1, . . . , n}.   (343)

Now, for each m ∈ {0, 1, . . . , n}, we obtain

    [x^m] ( a ∏_{i∈J} (1 + f_i) ) = [x^m] ( a ∏_{i∈J\{j}} (1 + f_i) )   (by (342))
                                  = [x^m] a                             (by (343)).

This is precisely the claim of Lemma 3.11.9. Thus, we have proved that Lemma 3.11.9 is true in the case when |J| = k + 1. This completes the induction step. Thus, the induction proof of Lemma 3.11.9 is complete.

Detailed proof of Theorem 3.11.10. The family ( f i )i∈ I is summable. In other words,

for each n ∈ N, all but finitely many i ∈ I satisfy [ x n ] f i = 0

(by the definition of “summable”). In other words, for each n ∈ N, there exists a finite
subset In of I such that
all i ∈ I \ In satisfy [ x n ] f i = 0. (344)
Consider this subset In . Thus, all the sets I0 , I1 , I2 , . . . are finite subsets of I.
Now, let n ∈ N be arbitrary. Let M := I0 ∪ I1 ∪ · · · ∪ In . Then, M is a union of n + 1
finite subsets of I (because all the sets I0 , I1 , I2 , . . . are finite subsets of I), and thus itself
is a finite subset of I. Moreover,

all m ∈ {0, 1, . . . , n} and all i ∈ I \ M satisfy [ x m ] f i = 0. (345)

[Proof of (345): Let m ∈ {0, 1, . . . , n} and i ∈ I \ M. We must show that [x^m] f_i = 0.
From m ∈ {0, 1, . . . , n}, we obtain I_m ⊆ I_0 ∪ I_1 ∪ · · · ∪ I_n = M, so that M ⊇ I_m and thus I \ M ⊆ I \ I_m. Hence, i ∈ I \ M ⊆ I \ I_m. Therefore, (344) (applied to m instead of n) yields [x^m] f_i = 0. This proves (345).]
Now, we shall prove that the set M determines the x n -coefficient in the product of
(1 + f i )i∈ I . Indeed, let J be a finite subset of I satisfying M ⊆ J ⊆ I. Let a be the
FPS ∏ (1 + f i ) (this is well-defined, since the set M is finite). We have J \ M ⊆ J, so
i∈ M
that the set J \ M is finite (since J is finite). Hence, ( f i )i∈ J \ M is a finite family of FPSs.
Moreover, each i ∈ J \ M satisfies

[ xm ] f i = 0 for each m ∈ {0, 1, . . . , n}

(by (345), because i ∈ J \ M ⊆ I \ M). Thus, Lemma 3.11.9 (applied to J \ M instead of J) yields

    [x^m] ( a ∏_{i∈J\M} (1 + f_i) ) = [x^m] a   for each m ∈ {0, 1, . . . , n}.

Applying this to m = n, we find


    [x^n] ( a ∏_{i∈J\M} (1 + f_i) ) = [x^n] a = [x^n] ( ∏_{i∈M} (1 + f_i) )   (346)

(since a = ∏_{i∈M} (1 + f_i)). However, the finite set J is the union of the two disjoint sets M and J \ M (since M ⊆ J). Hence, the product ∏_{i∈J} (1 + f_i) can be split as follows:

    ∏_{i∈J} (1 + f_i) = ( ∏_{i∈M} (1 + f_i) ) ( ∏_{i∈J\M} (1 + f_i) ) = a ∏_{i∈J\M} (1 + f_i)

(by the definition of a).

In view of this, we can rewrite (346) as

    [x^n] ( ∏_{i∈J} (1 + f_i) ) = [x^n] ( ∏_{i∈M} (1 + f_i) ).

Forget that we fixed J. We thus have shown that every finite subset J of I satisfying M ⊆ J ⊆ I satisfies

    [x^n] ( ∏_{i∈J} (1 + f_i) ) = [x^n] ( ∏_{i∈M} (1 + f_i) ).

In other words, the set M determines the x n -coefficient in the product of (1 + f i )i∈ I (ac-
cording to Definition 3.11.1 (b)). Hence, the x n -coefficient in the product of (1 + f i )i∈ I
is finitely determined (according to Definition 3.11.3 (b), since the set M is finite).
Forget that we fixed n. Hence, we have shown that for each n ∈ N, the x n -coefficient
in the product of (1 + f i )i∈ I is finitely determined. In other words, each coefficient in
this product is finitely determined. In other words, the family (1 + f i )i∈ I is multipliable
(by the definition of “multipliable”). This proves Theorem 3.11.10.
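For a concrete instance of Theorem 3.11.10, take f_i = x^i for i ≥ 1: the family (1 + x^i)_{i≥1} is multipliable, and the factors with i > n are congruent to 1 and so do not affect the first n + 1 coefficients. A small numeric sketch (our own code, not part of the text; the coefficients it produces count partitions into distinct parts):

```python
# Truncated evaluation of the infinite product (1+x)(1+x^2)(1+x^3)...:
# each coefficient is determined by a finite set of factors, as in the
# proof of Theorem 3.11.10.

def mul_trunc(f, g, n):
    """Product of coefficient lists f, g, truncated after x^n."""
    h = [0] * (n + 1)
    for i, fi in enumerate(f[: n + 1]):
        for j, gj in enumerate(g[: n + 1 - i]):
            h[i + j] += fi * gj
    return h

def product_1_plus_xi(n, num_factors):
    """First n+1 coefficients of (1+x)(1+x^2)...(1+x^num_factors)."""
    prod = [1] + [0] * n
    for i in range(1, num_factors + 1):
        fi = [0] * (n + 1)
        fi[0] = 1
        if i <= n:
            fi[i] = 1  # factors with i > n are ≡ 1 mod x^(n+1)
        prod = mul_trunc(prod, fi, n)
    return prod

n = 6
# M = {1, ..., n} already determines the first n+1 coefficients:
assert product_1_plus_xi(n, n) == product_1_plus_xi(n, 50)
```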

Detailed proof of Proposition 3.11.11. Let (ai )i∈ I ∈ K [[ x ]] I be a family of FPSs. Assume
that all but finitely many entries of this family (ai )i∈ I equal 1 (that is, all but finitely
many i ∈ I satisfy ai = 1). We must prove that this family is multipliable.
We have assumed that all but finitely many i ∈ I satisfy ai = 1. In other words, there
exists a finite subset M of I such that
all i ∈ I \ M satisfy ai = 1. (347)
Consider this M.
Let n ∈ N. Every finite subset J of I satisfying M ⊆ J ⊆ I satisfies

    [x^n] ( ∏_{i∈J} a_i ) = [x^n] ( ∏_{i∈M} a_i )

^{164}. In other words, the set M determines the x^n-coefficient in the product of (a_i)_{i∈I} (by the definition of "determining the x^n-coefficient in a product"). Hence, the x^n-coefficient in the product of (a_i)_{i∈I} is finitely determined (by the definition of "finitely determined").

^{164} Proof. Let J be a finite subset of I satisfying M ⊆ J ⊆ I. Then, each i ∈ J \ M satisfies i ∈ J \ M ⊆ I \ M and therefore

    a_i = 1   (348)

(by (347)). However, the set J is the union of the two disjoint sets M and J \ M (since M ⊆ J). Hence, we can split the product ∏_{i∈J} a_i as follows:

    ∏_{i∈J} a_i = ( ∏_{i∈M} a_i ) ( ∏_{i∈J\M} a_i ) = ( ∏_{i∈M} a_i ) ( ∏_{i∈J\M} 1 ) = ∏_{i∈M} a_i.

Therefore, [x^n] ( ∏_{i∈J} a_i ) = [x^n] ( ∏_{i∈M} a_i ), qed.
Forget that we fixed n. We thus have proved that for each n ∈ N, the x n -coefficient
in the product of (ai )i∈ I is finitely determined. In other words, each coefficient in the
product of (ai )i∈ I is finitely determined. In other words, the family (ai )i∈ I is multipli-
able (by the definition of “multipliable”). This proves Proposition 3.11.11.

Detailed proof of Lemma 3.11.15. Fix m ∈ {0, 1, . . . , n}. Recall that the family (ai )i∈ I is
multipliable. In other words, each coefficient in its product is finitely determined.
Hence, in particular, the x m -coefficient in the product of (ai )i∈ I is finitely determined.
In other words, there is a finite subset M of I that determines the x m -coefficient in the
product of (ai )i∈ I . Consider this subset M, and denote it by Mm . Thus, Mm is a finite
subset of I that determines the x m -coefficient in the product of (ai )i∈ I .
Forget that we fixed m. Thus, for each m ∈ {0, 1, . . . , n}, we have defined a finite
subset Mm of I. Let M be the union M0 ∪ M1 ∪ · · · ∪ Mn of these (altogether n + 1)
subsets. Thus, M is a union of finitely many finite subsets of I; hence, M itself is a
finite subset of I.
Now, let m ∈ {0, 1, . . . , n}. We shall show that M determines the x m -coefficient in
the product of (ai )i∈ I .
Indeed, let N be a finite subset of I satisfying M ⊆ N ⊆ I. We have Mm ⊆ M (since
M is the union M0 ∪ M1 ∪ · · · ∪ Mn , while Mm is one of the sets in this union). Hence,
Mm ⊆ M ⊆ N. Now, recall that the set Mm determines the x m -coefficient in the product
of (ai )i∈ I (by the definition of Mm ). In other words, every finite subset J of I satisfying
M_m ⊆ J ⊆ I satisfies

    [x^m] ( ∏_{i∈J} a_i ) = [x^m] ( ∏_{i∈M_m} a_i )   (349)

(by the definition of what it means to “determine the x m -coefficient in the product of
(ai )i∈ I ”). Applying this to J = N, we obtain
    [x^m] ( ∏_{i∈N} a_i ) = [x^m] ( ∏_{i∈M_m} a_i )

(since N is a finite subset of I satisfying Mm ⊆ N ⊆ I). On the other hand, we can


apply (349) to J = M (since M is a finite subset of I satisfying Mm ⊆ M ⊆ I), and thus
obtain

    [x^m] ( ∏_{i∈M} a_i ) = [x^m] ( ∏_{i∈M_m} a_i ).

Comparing these two equalities, we obtain


    [x^m] ( ∏_{i∈N} a_i ) = [x^m] ( ∏_{i∈M} a_i ).

Forget that we fixed N. We thus have shown that every finite subset N of I satisfying M ⊆ N ⊆ I satisfies [x^m] ( ∏_{i∈N} a_i ) = [x^m] ( ∏_{i∈M} a_i ). Renaming N as J in this statement, we obtain the following: Every finite subset J of I satisfying M ⊆ J ⊆ I satisfies [x^m] ( ∏_{i∈J} a_i ) = [x^m] ( ∏_{i∈M} a_i ). In other words, M determines the x^m-coefficient in the product of (a_i)_{i∈I} (by the definition of what it means to "determine the x^m-coefficient in the product of (a_i)_{i∈I}").
Forget that we fixed m. We thus have shown that M determines the x m -coefficient in
the product of (ai )i∈ I for each m ∈ {0, 1, . . . , n}. In other words, M determines the first
n + 1 coefficients in the product of (ai )i∈ I . In other words, M is an x n -approximator
for (ai )i∈ I (by the definition of an “x n -approximator”). Hence, there exists an x n -
approximator for (ai )i∈ I . This proves Lemma 3.11.15.

Detailed proof of Proposition 3.11.16. The set M is an x n -approximator for (ai )i∈ I . In
other words, M is a finite subset of I that determines the first n + 1 coefficients in
the product of (ai )i∈ I (by the definition of an “x n -approximator”).
(a) Let m ∈ {0, 1, . . . , n}. Recall that M determines the first n + 1 coefficients in the
product of (ai )i∈ I . Thus, in particular, M determines the x m -coefficient in the product
of (ai )i∈ I (since m ∈ {0, 1, . . . , n}). In other words, every finite subset J of I satisfying
M ⊆ J ⊆ I satisfies

    [x^m] ( ∏_{i∈J} a_i ) = [x^m] ( ∏_{i∈M} a_i )   (350)

(by the definition of “determining the x m -coefficient in the product of (ai )i∈ I ”).
Forget that we fixed m. We thus have proved that every m ∈ {0, 1, . . . , n} and every
finite subset J of I satisfying M ⊆ J ⊆ I satisfy (350).
Now, let J be a finite subset of I satisfying M ⊆ J ⊆ I. Then, each m ∈ {0, 1, . . . , n} satisfies [x^m] ( ∏_{i∈J} a_i ) = [x^m] ( ∏_{i∈M} a_i ) (by (350)). In other words, we have ∏_{i∈J} a_i \overset{x^n}{\equiv} ∏_{i∈M} a_i (by Definition 3.10.1). This proves Proposition 3.11.16 (a).
(b) Assume that the family (ai )i∈ I is multipliable. Let m ∈ {0, 1, . . . , n}. Thus, M
determines the x m -coefficient in the product of (ai )i∈ I (since M determines the first
n + 1 coefficients in the product of (ai )i∈ I ).
The product ∏_{i∈I} a_i is defined according to Definition 3.11.5 (b). Specifically, Definition 3.11.5 (b) (with n and M renamed as k and N) shows that the product ∏_{i∈I} a_i is defined to be the FPS whose x^k-coefficient (for any k ∈ N) can be computed as follows: If k ∈ N, and if N is a finite subset of I that determines the x^k-coefficient in the product of (a_i)_{i∈I}, then

    [x^k] ( ∏_{i∈I} a_i ) = [x^k] ( ∏_{i∈N} a_i ).

We can apply this to k = m and N = M (since M is a finite subset of I that determines the x^m-coefficient in the product of (a_i)_{i∈I}), and thus obtain

    [x^m] ( ∏_{i∈I} a_i ) = [x^m] ( ∏_{i∈M} a_i ).

Forget that we fixed m. We thus have proved that each m ∈ {0, 1, . . . , n} satisfies [x^m] ( ∏_{i∈I} a_i ) = [x^m] ( ∏_{i∈M} a_i ). In other words, ∏_{i∈I} a_i \overset{x^n}{\equiv} ∏_{i∈M} a_i (by Definition 3.10.1). This proves Proposition 3.11.16 (b).

Detailed proof of Proposition 3.11.17. (a) Fix n ∈ N. We know that the family (ai )i∈ J is
multipliable. Hence, there exists an x n -approximator U for (ai )i∈ J (by Lemma 3.11.15,
applied to J instead of I). Consider this U.
We also know that the family (ai )i∈ I \ J is multipliable. Hence, there exists an x n -
approximator V for (ai )i∈ I \ J (by Lemma 3.11.15, applied to I \ J instead of I). Consider
this V.
We know that U is an x n -approximator for (ai )i∈ J . In other words, U is a finite
subset of J that determines the first n + 1 coefficients in the product of (ai )i∈ J (by the
definition of an x n -approximator). Hence, in particular, U is finite. Similarly, V is finite.
Moreover, U ⊆ J (since U is a subset of J); similarly, V ⊆ I \ J.
Let M = U ∪ V. This set M is finite (since U and V are finite). Moreover, using U ⊆ J ⊆ I and V ⊆ I \ J ⊆ I, we obtain M = U ∪ V ⊆ I ∪ I = I. Hence, M is a finite subset of I. Note that the sets U and V are disjoint^{165}. Hence, the set M is the union of its two disjoint subsets U and V (since M = U ∪ V).
Now, let N be a finite subset of I satisfying M ⊆ N ⊆ I. We shall show that
    [x^n] ( ∏_{i∈N} a_i ) = [x^n] ( ∏_{i∈M} a_i ).

Indeed, let m ∈ {0, 1, . . . , n}. The set N ∩ J is a finite subset of J (since N is a finite
subset of I) and satisfies U ⊆ N ∩ J (since U ⊆ U ∪ V = M ⊆ N and U ⊆ J). Now,
recall that U determines the first n + 1 coefficients in the product of (ai )i∈ J . Hence, U
determines the x m -coefficient in the product of (ai )i∈ J (since m ∈ {0, 1, . . . , n}). In other
words, every finite subset T of J satisfying U ⊆ T ⊆ J satisfies
    [x^m] ( ∏_{i∈T} a_i ) = [x^m] ( ∏_{i∈U} a_i )

(by the definition of what it means to “determine the x m -coefficient in the product of
(ai )i∈ J ”). We can apply this to T = N ∩ J (since N ∩ J is a finite subset of J satisfying
U ⊆ N ∩ J ⊆ J), and thus obtain
    [x^m] ( ∏_{i∈N∩J} a_i ) = [x^m] ( ∏_{i∈U} a_i ).   (351)

The same argument (applied to I \ J and V instead of J and U) yields

    [x^m] ( ∏_{i∈N∩(I\J)} a_i ) = [x^m] ( ∏_{i∈V} a_i ).

In view of N ∩ (I \ J) = (N ∩ I) \ J = N \ J (since N ⊆ I), this rewrites as

    [x^m] ( ∏_{i∈N\J} a_i ) = [x^m] ( ∏_{i∈V} a_i ).   (352)

^{165} Indeed, U ∩ V ⊆ J ∩ (I \ J) = ∅ (since U ⊆ J and V ⊆ I \ J), so that U ∩ V = ∅.

Forget that we fixed m. We thus have proved the equalities (351) and (352) for each m ∈ {0, 1, . . . , n}. Hence, Lemma 3.3.22 (applied to a = ∏_{i∈N∩J} a_i and b = ∏_{i∈U} a_i and c = ∏_{i∈N\J} a_i and d = ∏_{i∈V} a_i) yields that

    [x^m] ( ( ∏_{i∈N∩J} a_i ) ( ∏_{i∈N\J} a_i ) ) = [x^m] ( ( ∏_{i∈U} a_i ) ( ∏_{i∈V} a_i ) )

for each m ∈ {0, 1, . . . , n}. In view of

    ∏_{i∈N} a_i = ( ∏_{i∈N∩J} a_i ) ( ∏_{i∈N\J} a_i )   (since the set N is the union of its two disjoint subsets N ∩ J and N \ J)

and

    ∏_{i∈M} a_i = ( ∏_{i∈U} a_i ) ( ∏_{i∈V} a_i )   (since the set M is the union of its two disjoint subsets U and V),

this rewrites as follows: We have

    [x^m] ( ∏_{i∈N} a_i ) = [x^m] ( ∏_{i∈M} a_i )

for each m ∈ {0, 1, . . . , n}. Applying this to m = n, we obtain

    [x^n] ( ∏_{i∈N} a_i ) = [x^n] ( ∏_{i∈M} a_i ).

Forget that we fixed N. We thus have shown that every finite subset N of I satisfying M ⊆ N ⊆ I satisfies [x^n] ( ∏_{i∈N} a_i ) = [x^n] ( ∏_{i∈M} a_i ). In other words, M determines the x^n-coefficient in the product of (a_i)_{i∈I} (by the definition of what it means to "determine the x^n-coefficient in the product of (a_i)_{i∈I}"). Hence, the x^n-coefficient in the product of (a_i)_{i∈I} is finitely determined (by the definition of "finitely determined", since M is a finite subset of I).
Forget that we fixed n. We thus have proved that for each n ∈ N, the x n -coefficient
in the product of (ai )i∈ I is finitely determined. In other words, each coefficient in the
product of (ai )i∈ I is finitely determined. In other words, the family (ai )i∈ I is multipli-
able (by the definition of “multipliable”). This proves Proposition 3.11.17 (a).
(b) Proposition 3.11.17 (a) shows that the family (ai )i∈ I is multipliable.
Let n ∈ N. In our above proof of Proposition 3.11.17 (a), we have seen the following:

• There exists an x n -approximator U for (ai )i∈ J .

• There exists an x n -approximator V for (ai )i∈ I \ J .

Consider these U and V. Let M = U ∪ V. In our above proof of Proposition 3.11.17


(a), we have seen the following:

• The set M is a finite subset of I.

• The set M is the union of its two disjoint subsets U and V.

• The set M determines the x n -coefficient in the product of (ai )i∈ I .

Now, the definition of the infinite product ∏_{i∈I} a_i (namely, Definition 3.11.5 (b)) yields that

    [x^n] ( ∏_{i∈I} a_i ) = [x^n] ( ∏_{i∈M} a_i )   (353)

(since M is a finite subset of I that determines the x^n-coefficient in the product of (a_i)_{i∈I}). On the other hand, the set U is an x^n-approximator for (a_i)_{i∈J}. Thus, Proposition 3.11.16 (b) (applied to J and U instead of I and M) yields

    ∏_{i∈J} a_i \overset{x^n}{\equiv} ∏_{i∈U} a_i   (354)

(since the family (a_i)_{i∈J} is multipliable). Furthermore, the set V is an x^n-approximator for (a_i)_{i∈I\J}. Thus, Proposition 3.11.16 (b) (applied to I \ J and V instead of I and M) yields

    ∏_{i∈I\J} a_i \overset{x^n}{\equiv} ∏_{i∈V} a_i   (355)

(since the family (a_i)_{i∈I\J} is multipliable). From (354) and (355), we obtain

    ( ∏_{i∈J} a_i ) ( ∏_{i∈I\J} a_i ) \overset{x^n}{\equiv} ( ∏_{i∈U} a_i ) ( ∏_{i∈V} a_i )

(by (101), applied to a = ∏_{i∈J} a_i and b = ∏_{i∈U} a_i and c = ∏_{i∈I\J} a_i and d = ∏_{i∈V} a_i). In view of

    ∏_{i∈M} a_i = ( ∏_{i∈U} a_i ) ( ∏_{i∈V} a_i )   (since the set M is the union of its two disjoint subsets U and V),

this rewrites as

    ( ∏_{i∈J} a_i ) ( ∏_{i∈I\J} a_i ) \overset{x^n}{\equiv} ∏_{i∈M} a_i.

In other words,

    each m ∈ {0, 1, . . . , n} satisfies [x^m] ( ( ∏_{i∈J} a_i ) ( ∏_{i∈I\J} a_i ) ) = [x^m] ( ∏_{i∈M} a_i )

(by Definition 3.10.1). Applying this to m = n, we obtain

    [x^n] ( ( ∏_{i∈J} a_i ) ( ∏_{i∈I\J} a_i ) ) = [x^n] ( ∏_{i∈M} a_i ).

Comparing this with (353), we obtain

    [x^n] ( ∏_{i∈I} a_i ) = [x^n] ( ( ∏_{i∈J} a_i ) ( ∏_{i∈I\J} a_i ) ).   (356)

Now, forget that we fixed n. We thus have proved that each n ∈ N satisfies (356). In other words, each coefficient of the FPS ∏_{i∈I} a_i equals the corresponding coefficient of ( ∏_{i∈J} a_i ) ( ∏_{i∈I\J} a_i ). Hence, we have

    ∏_{i∈I} a_i = ( ∏_{i∈J} a_i ) · ( ∏_{i∈I\J} a_i ).

This proves Proposition 3.11.17 (b).
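Proposition 3.11.17 (b) can be illustrated numerically for a finite family, where both sides are honest finite products. The sketch below (helper names are ours, not the text's) splits a truncated product over I into the products over J and I \ J:

```python
# Numerical check of the product-splitting identity of Proposition 3.11.17 (b)
# for the finite family a_i = 1 + x^i, i in I = {1, ..., 8}, with J the even
# indices, computing everything truncated after x^n.

def mul_trunc(f, g, n):
    """Product of coefficient lists f, g, truncated after x^n."""
    h = [0] * (n + 1)
    for i, fi in enumerate(f[: n + 1]):
        for j, gj in enumerate(g[: n + 1 - i]):
            h[i + j] += fi * gj
    return h

def prod_trunc(factors, n):
    """Truncated product of a list of coefficient lists."""
    out = [1] + [0] * n
    for f in factors:
        out = mul_trunc(out, f, n)
    return out

n = 5
I = range(1, 9)
fam = {i: [1] + [0] * (i - 1) + [1] for i in I}   # a_i = 1 + x^i
J = [i for i in I if i % 2 == 0]
complement = [i for i in I if i % 2 != 0]          # I \ J
lhs = prod_trunc([fam[i] for i in I], n)
rhs = mul_trunc(prod_trunc([fam[i] for i in J], n),
                prod_trunc([fam[i] for i in complement], n), n)
assert lhs == rhs  # product over I = (product over J) * (product over I \ J)
```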

Detailed proof of Proposition 3.11.18. (a) Fix n ∈ N. We know that the family (ai )i∈ I is
multipliable. Hence, there exists an x n -approximator U for (ai )i∈ I (by Lemma 3.11.15).
Consider this U.
We also know that the family (bi )i∈ I is multipliable. Hence, there exists an x n -
approximator V for (bi )i∈ I (by Lemma 3.11.15, applied to bi instead of ai ). Consider
this V.
We know that U is an x n -approximator for (ai )i∈ I . In other words, U is a finite
subset of I that determines the first n + 1 coefficients in the product of (ai )i∈ I (by the
definition of an x n -approximator). Hence, in particular, U is finite. Similarly, V is finite.
Moreover, U ⊆ I (since U is a subset of I); similarly, V ⊆ I.
Let M = U ∪ V. This set M is finite (since U and V are finite). Moreover, using U ⊆ I and V ⊆ I, we obtain M = U ∪ V ⊆ I ∪ I = I. Hence, M is a finite subset of I.
Now, let N be a finite subset of I satisfying M ⊆ N ⊆ I. We shall show that
    [x^n] ( ∏_{i∈N} (a_i b_i) ) = [x^n] ( ∏_{i∈M} (a_i b_i) ).

Indeed, let m ∈ {0, 1, . . . , n}. We have U ⊆ U ∪ V = M ⊆ N. The set N is a


finite subset of I and satisfies U ⊆ N. Now, recall that U determines the first n + 1
coefficients in the product of (ai )i∈ I . Hence, U determines the x m -coefficient in the
product of (ai )i∈ I (since m ∈ {0, 1, . . . , n}). In other words, every finite subset T of I
satisfying U ⊆ T ⊆ I satisfies
    [x^m] ( ∏_{i∈T} a_i ) = [x^m] ( ∏_{i∈U} a_i )   (357)
Math 701 Spring 2021, version April 6, 2024 page 620

(by the definition of what it means to “determine the x m -coefficient in the product
of (ai )i∈ I ”). We can apply this to T = N (since N is a finite subset of I satisfying
U ⊆ N ⊆ I), and thus obtain
! !
[ xm ] ∏ ai = [ xm ] ∏ ai .
i∈ N i ∈U

However, we can also apply (357) to T = M (since M is a finite subset of I satisfying


U ⊆ M ⊆ I), and thus obtain
! !
[ xm ] ∏ ai = [ xm ] ∏ ai .
i∈ M i ∈U

Comparing these two equalities, we obtain


! !
[ xm ] ∏ ai = [ xm ] ∏ ai . (358)
i∈ N i∈ M

The same argument (applied to bi and V instead of ai and U) yields


! !
[ xm ] ∏ bi = [ xm ] ∏ bi . (359)
i∈ N i∈ M

Forget that we fixed m. We thus have proved the equalities (358) and (359) for each m ∈ {0, 1, . . . , n}. Hence, Lemma 3.3.22 (applied to a = ∏_{i∈N} a_i and b = ∏_{i∈M} a_i and c = ∏_{i∈N} b_i and d = ∏_{i∈M} b_i) yields that
$$ [x^m] \left( \left( \prod_{i \in N} a_i \right) \left( \prod_{i \in N} b_i \right) \right) = [x^m] \left( \left( \prod_{i \in M} a_i \right) \left( \prod_{i \in M} b_i \right) \right) $$
for each m ∈ {0, 1, . . . , n}. Applying this to m = n, we obtain
$$ [x^n] \left( \left( \prod_{i \in N} a_i \right) \left( \prod_{i \in N} b_i \right) \right) = [x^n] \left( \left( \prod_{i \in M} a_i \right) \left( \prod_{i \in M} b_i \right) \right). $$
In view of
$$ \prod_{i \in N} (a_i b_i) = \left( \prod_{i \in N} a_i \right) \left( \prod_{i \in N} b_i \right) \qquad \text{(by the standard rules for finite products, since N is a finite set)} $$
and
$$ \prod_{i \in M} (a_i b_i) = \left( \prod_{i \in M} a_i \right) \left( \prod_{i \in M} b_i \right) \qquad \text{(by the standard rules for finite products, since M is a finite set)}, $$
this rewrites as
$$ [x^n] \left( \prod_{i \in N} (a_i b_i) \right) = [x^n] \left( \prod_{i \in M} (a_i b_i) \right). $$

Forget that we fixed N. We thus have shown that every finite subset N of I satisfying M ⊆ N ⊆ I satisfies
$$ [x^n] \left( \prod_{i \in N} (a_i b_i) \right) = [x^n] \left( \prod_{i \in M} (a_i b_i) \right). $$
In other words, M determines the x^n-coefficient in the product of (a_i b_i)_{i∈I} (by the definition of what it means to “determine the x^n-coefficient in the product of (a_i b_i)_{i∈I}”). Hence, the x^n-coefficient in the product of (a_i b_i)_{i∈I} is finitely determined (by the definition of “finitely determined”, since M is a finite subset of I).
Forget that we fixed n. We thus have proved that for each n ∈ N, the x n -coefficient
in the product of (ai bi )i∈ I is finitely determined. In other words, each coefficient in
the product of (ai bi )i∈ I is finitely determined. In other words, the family (ai bi )i∈ I is
multipliable (by the definition of “multipliable”). This proves Proposition 3.11.18 (a).
(b) Proposition 3.11.18 (a) shows that the family (ai bi )i∈ I is multipliable.
Let n ∈ N. In our above proof of Proposition 3.11.18 (a), we have seen the following:

• There exists an x n -approximator U for (ai )i∈ I .


• There exists an x n -approximator V for (bi )i∈ I .

Consider these U and V. Let M = U ∪ V. Thus, M = U ∪ V ⊇ U and M = U ∪ V ⊇ V. In our above proof of Proposition 3.11.18 (a), we have seen the following:

• The set M is a finite subset of I.


• The set M determines the x^n-coefficient in the product of (a_i b_i)_{i∈I}.

We recall that the relation $\overset{x^n}{\equiv}$ on K[[x]] is an equivalence relation (by Theorem 3.10.3 (a)). Thus, this relation $\overset{x^n}{\equiv}$ is transitive and symmetric.
Now, the definition of the infinite product ∏_{i∈I} (a_i b_i) (namely, Definition 3.11.5 (b)) yields that
$$ [x^n] \left( \prod_{i \in I} (a_i b_i) \right) = [x^n] \left( \prod_{i \in M} (a_i b_i) \right) \tag{360} $$

(since M is a finite subset of I that determines the x^n-coefficient in the product of (a_i b_i)_{i∈I}). On the other hand, the set U is an x^n-approximator for (a_i)_{i∈I}. Thus, Proposition 3.11.16 (b) (applied to U instead of M) yields
$$ \prod_{i \in I} a_i \overset{x^n}{\equiv} \prod_{i \in U} a_i $$
(since the family (a_i)_{i∈I} is multipliable). Since the relation $\overset{x^n}{\equiv}$ is symmetric, we thus obtain
$$ \prod_{i \in U} a_i \overset{x^n}{\equiv} \prod_{i \in I} a_i. \tag{361} $$
Moreover, M is a finite subset of I satisfying U ⊆ M (since M ⊇ U) and therefore U ⊆ M ⊆ I; hence, Proposition 3.11.16 (a) (applied to U and M instead of M and J) yields
$$ \prod_{i \in M} a_i \overset{x^n}{\equiv} \prod_{i \in U} a_i \overset{x^n}{\equiv} \prod_{i \in I} a_i \qquad \text{(by (361))}. $$

Hence,
$$ \prod_{i \in M} a_i \overset{x^n}{\equiv} \prod_{i \in I} a_i \tag{362} $$
(since the relation $\overset{x^n}{\equiv}$ is transitive). The same argument (applied to (b_i)_{i∈I} and V instead of (a_i)_{i∈I} and U) yields
$$ \prod_{i \in M} b_i \overset{x^n}{\equiv} \prod_{i \in I} b_i. \tag{363} $$
From (362) and (363), we obtain
$$ \left( \prod_{i \in M} a_i \right) \left( \prod_{i \in M} b_i \right) \overset{x^n}{\equiv} \left( \prod_{i \in I} a_i \right) \left( \prod_{i \in I} b_i \right) $$
(by (101), applied to a = ∏_{i∈M} a_i and b = ∏_{i∈I} a_i and c = ∏_{i∈M} b_i and d = ∏_{i∈I} b_i). In view of
$$ \left( \prod_{i \in M} a_i \right) \left( \prod_{i \in M} b_i \right) = \prod_{i \in M} (a_i b_i) \qquad \text{(by the properties of finite products, since the set M is finite)}, $$
this rewrites as
$$ \prod_{i \in M} (a_i b_i) \overset{x^n}{\equiv} \left( \prod_{i \in I} a_i \right) \left( \prod_{i \in I} b_i \right). $$
In other words, each m ∈ {0, 1, . . . , n} satisfies
$$ [x^m] \left( \prod_{i \in M} (a_i b_i) \right) = [x^m] \left( \left( \prod_{i \in I} a_i \right) \left( \prod_{i \in I} b_i \right) \right) $$
(by Definition 3.10.1). Applying this to m = n, we obtain
$$ [x^n] \left( \prod_{i \in M} (a_i b_i) \right) = [x^n] \left( \left( \prod_{i \in I} a_i \right) \left( \prod_{i \in I} b_i \right) \right). $$
In view of (360), this rewrites as
$$ [x^n] \left( \prod_{i \in I} (a_i b_i) \right) = [x^n] \left( \left( \prod_{i \in I} a_i \right) \left( \prod_{i \in I} b_i \right) \right). \tag{364} $$
Now, forget that we fixed n. We thus have proved that each n ∈ N satisfies (364). In other words, each coefficient of the FPS ∏_{i∈I} (a_i b_i) equals the corresponding coefficient of (∏_{i∈I} a_i)(∏_{i∈I} b_i). Hence, we have
$$ \prod_{i \in I} (a_i b_i) = \left( \prod_{i \in I} a_i \right) \left( \prod_{i \in I} b_i \right). $$
This proves Proposition 3.11.18 (b).
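The identity of Proposition 3.11.18 (b) can be sanity-checked numerically by working modulo x^{N+1}, i.e., with truncated coefficient lists. The following Python sketch (the helper names `mul`, `prod`, `one_plus` are ours, not from the notes) uses the families a_i = 1 + x^i and b_i = 1 - x^i, for which only the factors with i ≤ N matter modulo x^{N+1}:

```python
# Numerical sanity check of Proposition 3.11.18 (b), truncated at degree N:
# modulo x^(N+1), the factors with i > N are congruent to 1, so finitely
# many factors suffice. All helper names are ours, not the notes'.
N = 10  # we track the coefficients of x^0, ..., x^N

def mul(a, b):
    """Product of two truncated FPSs (coefficient lists of length N+1)."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

def prod(factors):
    """Product of a finite list of truncated FPSs."""
    result = [1] + [0] * N
    for f in factors:
        result = mul(result, f)
    return result

def one_plus(coeff, deg):
    """The truncated FPS 1 + coeff * x^deg."""
    f = [1] + [0] * N
    if 1 <= deg <= N:
        f[deg] += coeff
    return f

# a_i = 1 + x^i and b_i = 1 - x^i; only i = 1, ..., N matter mod x^(N+1).
a = [one_plus(1, i) for i in range(1, N + 1)]
b = [one_plus(-1, i) for i in range(1, N + 1)]
ab = [mul(ai, bi) for ai, bi in zip(a, b)]  # the termwise products a_i b_i

lhs = prod(ab)               # first N+1 coefficients of prod (a_i b_i)
rhs = mul(prod(a), prod(b))  # first N+1 coefficients of (prod a_i)(prod b_i)
print(lhs == rhs)  # True
```

Since truncated multiplication is just multiplication in K[x]/(x^{N+1}), the two sides are products of the same finite multiset of factors and agree exactly.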


In order to prove Proposition 3.11.19, we need a lemma (an analogue of Lemma
3.3.22 for division instead of multiplication):

Lemma B.2.1. Let a, b, c, d ∈ K [[ x ]] be four FPSs such that c and d are invertible. Let
n ∈ N. Assume that

[ xm ] a = [ xm ] b for each m ∈ {0, 1, . . . , n} .

Assume further that

[ xm ] c = [ xm ] d for each m ∈ {0, 1, . . . , n} .

Then,
$$ [x^m] \frac{a}{c} = [x^m] \frac{b}{d} \qquad \text{for each } m ∈ \{0, 1, \ldots, n\}. $$

Proof of Lemma B.2.1. We have assumed that

[ xm ] a = [ xm ] b for each m ∈ {0, 1, . . . , n} .


In other words, $a \overset{x^n}{\equiv} b$ (by the definition of the relation $\overset{x^n}{\equiv}$). Similarly, $c \overset{x^n}{\equiv} d$. Hence, (102) yields $\frac{a}{c} \overset{x^n}{\equiv} \frac{b}{d}$. In other words,
$$ [x^m] \frac{a}{c} = [x^m] \frac{b}{d} \qquad \text{for each } m ∈ \{0, 1, \ldots, n\} $$
(by the definition of the relation $\overset{x^n}{\equiv}$). This proves Lemma B.2.1.

Now we can easily prove Proposition 3.11.19:

Detailed proof of Proposition 3.11.19. This can be proved similarly to Proposition 3.11.18,
with the obvious changes (replacing multiplication by division, and requiring each bi
to be invertible). Of course, instead of Lemma 3.3.22, we need to use Lemma B.2.1.
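Lemma B.2.1 can likewise be illustrated numerically: the first n + 1 coefficients of a quotient a/c depend only on the first n + 1 coefficients of a and c. The sketch below uses our own helper names; `inv` is the standard recursive inversion of a truncated power series over the rationals. It compares two quotients whose numerators and denominators agree up to x^n:

```python
# Sanity check of Lemma B.2.1, truncated at degree N: the first n+1
# coefficients of a/c depend only on the first n+1 coefficients of a and c.
# Helper names are ours (standard truncated-FPS arithmetic over Q).
from fractions import Fraction

N, n = 8, 4  # track x^0..x^N; compare only x^0..x^n

def mul(a, b):
    c = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

def inv(c):
    """Inverse of a truncated FPS c with c[0] != 0: solve c * d = 1."""
    d = [Fraction(0)] * (N + 1)
    d[0] = 1 / Fraction(c[0])
    for m in range(1, N + 1):
        d[m] = -d[0] * sum(Fraction(c[k]) * d[m - k] for k in range(1, m + 1))
    return d

a = [Fraction(v) for v in [1, 2, 3, 4, 5, 9, 9, 9, 9]]
b = [Fraction(v) for v in [1, 2, 3, 4, 5, 7, 7, 7, 7]]  # agrees with a up to x^n
c = [Fraction(v) for v in [1, 1, 1, 1, 1, 2, 2, 2, 2]]
d = [Fraction(v) for v in [1, 1, 1, 1, 1, 6, 6, 6, 6]]  # agrees with c up to x^n

q1 = mul(a, inv(c))  # a/c, truncated
q2 = mul(b, inv(d))  # b/d, truncated
print(q1[: n + 1] == q2[: n + 1])  # True
print(q1 == q2)                    # False: higher coefficients may differ
```

The recursion for `inv` shows why the lemma holds: the m-th coefficient of the inverse depends only on c_0, . . . , c_m.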

In order to eventually prove Proposition 3.11.21, we shall first prove a slightly stronger
auxiliary statement:

Lemma B.2.2. Let (ai )i∈ I ∈ K [[ x ]] I be a family of invertible FPSs. Let J be a subset
of I. Let n ∈ N. Let U be an x n -approximator for (ai )i∈ I . Then, U ∩ J is an x n -
approximator for (ai )i∈ J .

Proof of Lemma B.2.2. The set U is an x n -approximator for (ai )i∈ I . In other words, U is
a finite subset of I that determines the first n + 1 coefficients in the product of (ai )i∈ I
(by the definition of an x n -approximator). Hence, in particular, U is finite. Moreover,
U ⊆ I (since U is a subset of I).
Let M = U ∩ J. Thus, M = U ∩ J ⊆ U, so that the set M is finite (since U is finite).
Moreover, M is a subset of J (since M = U ∩ J ⊆ J).
Now, let N be a finite subset of J satisfying M ⊆ N ⊆ J. We shall show that
$$ \prod_{i \in N} a_i \overset{x^n}{\equiv} \prod_{i \in M} a_i. $$

Indeed, the set N ∪ U is finite (since N and U are finite) and is a subset of I (since N ∪ U ⊆ I ∪ I = I, because N ⊆ J ⊆ I and U ⊆ I); it also satisfies U ⊆ N ∪ U ⊆ I. Now, we recall that U is an x^n-approximator for (a_i)_{i∈I}. Hence, Proposition 3.11.16 (a) (applied to U and N ∪ U instead of M and J) yields
$$ \prod_{i \in N \cup U} a_i \overset{x^n}{\equiv} \prod_{i \in U} a_i \tag{365} $$
(since N ∪ U is a finite subset of I satisfying U ⊆ N ∪ U ⊆ I).


On the other hand, we have assumed that (a_i)_{i∈I} is a family of invertible FPSs. Thus, for each i ∈ I, the FPS a_i is invertible, so that its inverse a_i^{-1} is well-defined. We obviously have
$$ \prod_{i \in U \setminus J} a_i^{-1} \overset{x^n}{\equiv} \prod_{i \in U \setminus J} a_i^{-1} \tag{366} $$
(since Theorem 3.10.3 (a) yields that the relation $\overset{x^n}{\equiv}$ is reflexive).
We have now proved (365) and (366). Hence, (101) (applied to a = ∏_{i∈N∪U} a_i and b = ∏_{i∈U} a_i and c = ∏_{i∈U\J} a_i^{-1} and d = ∏_{i∈U\J} a_i^{-1}) yields
$$ \left( \prod_{i \in N \cup U} a_i \right) \left( \prod_{i \in U \setminus J} a_i^{-1} \right) \overset{x^n}{\equiv} \left( \prod_{i \in U} a_i \right) \left( \prod_{i \in U \setminus J} a_i^{-1} \right). \tag{367} $$

On the other hand, the sets N and U \ J are disjoint (since N ⊆ J yields N ∩ (U \ J) ⊆ J ∩ (U \ J) = ∅ and thus N ∩ (U \ J) = ∅), and their union is N ∪ (U \ J) = N ∪ U (this follows from N ∪ U = N ∪ (U ∩ J) ∪ (U \ J) ⊆ N ∪ M ∪ (U \ J) = N ∪ (U \ J), since U ∩ J = M and M ⊆ N). Hence, the set N ∪ U is the union of its two disjoint subsets N and U \ J. Thus, we can split the product ∏_{i∈N∪U} a_i as follows:
$$ \prod_{i \in N \cup U} a_i = \left( \prod_{i \in N} a_i \right) \left( \prod_{i \in U \setminus J} a_i \right). $$

Multiplying both sides of this equality by ∏_{i∈U\J} a_i^{-1}, we obtain
$$ \left( \prod_{i \in N \cup U} a_i \right) \left( \prod_{i \in U \setminus J} a_i^{-1} \right) = \left( \prod_{i \in N} a_i \right) \underbrace{\prod_{i \in U \setminus J} (a_i a_i^{-1})}_{=1} = \prod_{i \in N} a_i. \tag{368} $$

However, the set U is the union of its two disjoint subsets U ∩ J and U \ J. Thus, we can split the product ∏_{i∈U} a_i as follows:
$$ \prod_{i \in U} a_i = \left( \prod_{i \in U \cap J} a_i \right) \left( \prod_{i \in U \setminus J} a_i \right) = \left( \prod_{i \in M} a_i \right) \left( \prod_{i \in U \setminus J} a_i \right) \qquad \text{(since U ∩ J = M)}. $$
Multiplying both sides of this equality by ∏_{i∈U\J} a_i^{-1}, we obtain
$$ \left( \prod_{i \in U} a_i \right) \left( \prod_{i \in U \setminus J} a_i^{-1} \right) = \left( \prod_{i \in M} a_i \right) \underbrace{\prod_{i \in U \setminus J} (a_i a_i^{-1})}_{=1} = \prod_{i \in M} a_i. \tag{369} $$

In view of (368) and (369), we can rewrite the relation (367) as follows:
$$ \prod_{i \in N} a_i \overset{x^n}{\equiv} \prod_{i \in M} a_i. $$
In other words, each m ∈ {0, 1, . . . , n} satisfies
$$ [x^m] \left( \prod_{i \in N} a_i \right) = [x^m] \left( \prod_{i \in M} a_i \right) \tag{370} $$
(by Definition 3.10.1).



Forget that we fixed N. We have thus shown that every finite subset N of J satisfying M ⊆ N ⊆ J satisfies (370) for each m ∈ {0, 1, . . . , n}.
Now, let m ∈ {0, 1, . . . , n}. Then, every finite subset N of J satisfying M ⊆ N ⊆ J satisfies
$$ [x^m] \left( \prod_{i \in N} a_i \right) = [x^m] \left( \prod_{i \in M} a_i \right) $$

(by (370)). In other words, the set M determines the x m -coefficient in the product of
(ai )i∈ J (by the definition of “determining the x m -coefficient in a product”).
Forget that we fixed m. We thus have shown that M determines the x m -coefficient in
the product of (ai )i∈ J for each m ∈ {0, 1, . . . , n}. In other words, M determines the first
n + 1 coefficients in the product of (ai )i∈ J . In other words, M is an x n -approximator
for (ai )i∈ J (by the definition of an “x n -approximator”, since M is a finite subset of J).
In other words, U ∩ J is an x n -approximator for (ai )i∈ J (since M = U ∩ J). This proves
Lemma B.2.2.

Detailed proof of Proposition 3.11.21. Let J be a subset of I. We shall show that the family
(ai )i∈ J is multipliable.
Fix n ∈ N. We know that the family (ai )i∈ I is multipliable. Hence, there exists an
x n -approximator U for (ai )i∈ I (by Lemma 3.11.15). Consider this U.
Set M = U ∩ J. Lemma B.2.2 yields that U ∩ J is an x n -approximator for (ai )i∈ J . In
other words, M is an x n -approximator for (ai )i∈ J (since M = U ∩ J). In other words,
M is a finite subset of J that determines the first n + 1 coefficients in the product of
(ai )i∈ J (by the definition of an x n -approximator). Thus, the set M determines the first
n + 1 coefficients in the product of (ai )i∈ J . Hence, in particular, this set M determines
the x n -coefficient in the product of (ai )i∈ J . Therefore, the x n -coefficient in the product
of (ai )i∈ J is finitely determined (by the definition of “finitely determined”, since M is
a finite subset of J).
Forget that we fixed n. We thus have proved that for each n ∈ N, the x n -coefficient
in the product of (ai )i∈ J is finitely determined. In other words, each coefficient in the
product of (ai )i∈ J is finitely determined. In other words, the family (ai )i∈ J is multipli-
able (by the definition of “multipliable”).
Forget that we fixed J. We thus have shown that the family (ai )i∈ J is multipliable
whenever J is a subset of I. In other words, any subfamily of (ai )i∈ I is multipliable.
This proves Proposition 3.11.21.
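The notion of an x^n-approximator, and the fact (Lemma B.2.2) that intersecting an approximator with J yields an approximator for the subfamily, can be seen concretely for the family (1 + x^i)_{i≥1}. A small sketch with hypothetical helper names of our own:

```python
# Illustration of x^n-approximators (cf. Proposition 3.11.16 and Lemma
# B.2.2) for the family (1 + x^i)_{i >= 1}. Helper names are ours.
N = 8  # truncation order

def mul(a, b):
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

def partial_product(index_set):
    """prod over the index set of (1 + x^i), truncated at x^N."""
    result = [1] + [0] * N
    for i in index_set:
        factor = [1] + [0] * N
        if i <= N:
            factor[i] += 1
        result = mul(result, factor)
    return result

# U = {1, ..., N} is an x^N-approximator: any finite superset of U
# (here {1, ..., 49}) yields the same first N+1 coefficients.
U = range(1, N + 1)
print(partial_product(U) == partial_product(range(1, 50)))  # True

# For the subfamily indexed by J = even numbers, U ∩ J approximates too.
J = [i for i in range(1, 50) if i % 2 == 0]
U_cap_J = [i for i in U if i % 2 == 0]
big_J = partial_product(J)
print(partial_product(U_cap_J) == big_J)  # True
```

The coefficients stabilize because every factor with index i > N is congruent to 1 modulo x^{N+1}.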

Our next goal is to prove Proposition 3.11.23. First, however, let us restate a piece of
Theorem 3.10.3 (f) in more convenient language:

Lemma B.2.3. Let n ∈ N. Let V be a finite set. Let (c_w)_{w∈V} ∈ K[[x]]^V and (d_w)_{w∈V} ∈ K[[x]]^V be two families of FPSs such that
$$ \text{each } w ∈ V \text{ satisfies } c_w \overset{x^n}{\equiv} d_w. $$
Then, we have
$$ \prod_{w \in V} c_w \overset{x^n}{\equiv} \prod_{w \in V} d_w. $$

Proof of Lemma B.2.3. This is just (105), with the letters S, s, a_s and b_s renamed as V, w, c_w and d_w.
Detailed proof of Proposition 3.11.23. We shall subdivide our proof into several claims:

Claim 1: Let w ∈ W. Then, the family (as )s∈S; f (s)=w is multipliable.

[Proof of Claim 1: This was an assumption of Proposition 3.11.23.]


Let us set
$$ b_w := \prod_{\substack{s \in S; \\ f(s) = w}} a_s \qquad \text{for each } w ∈ W. \tag{371} $$
This is well-defined, because for each w ∈ W, the product ∏_{s∈S; f(s)=w} a_s is well-defined (since Claim 1 shows that the family (a_s)_{s∈S; f(s)=w} is multipliable).
Now, let n ∈ N. Lemma 3.11.15 (applied to S and (as )s∈S instead of I and (ai )i∈ I )
shows that there exists an x n -approximator for (as )s∈S . Pick such an x n -approximator,
and call it U. Then, U is an x n -approximator for (as )s∈S ; in other words, U is a finite
subset of S that determines the first n + 1 coefficients in the product of (as )s∈S (by the
definition of an x n -approximator).
The set U is finite. Thus, its image f (U ) = { f (u) | u ∈ U } is finite as well. Now,
we claim the following:

Claim 2: For each w ∈ W, we have
$$ b_w \overset{x^n}{\equiv} \prod_{\substack{s \in U; \\ f(s) = w}} a_s. $$

[Proof of Claim 2: Let w ∈ W. Let J be the subset {s ∈ S | f(s) = w} of S. Then, the family (a_s)_{s∈J} is just the family (a_s)_{s∈S; f(s)=w}, and thus is multipliable (by Claim 1). Furthermore, Lemma B.2.2 (applied to S and (a_s)_{s∈S} instead of I and (a_i)_{i∈I}) yields that U ∩ J is an x^n-approximator for (a_s)_{s∈J} (since U is an x^n-approximator for (a_s)_{s∈S}). Hence, Proposition 3.11.16 (b) (applied to J, (a_s)_{s∈J} and U ∩ J instead of I, (a_i)_{i∈I} and M) yields
$$ \prod_{i \in J} a_i \overset{x^n}{\equiv} \prod_{i \in U \cap J} a_i. $$
Renaming the indices i as s on both sides of this relation, we obtain
$$ \prod_{s \in J} a_s \overset{x^n}{\equiv} \prod_{s \in U \cap J} a_s. \tag{372} $$
However, we have J = {s ∈ S | f(s) = w}. Thus, the product sign “∏_{s∈J}” is equivalent to “∏_{s∈S; f(s)=w}”. Thus, we obtain
$$ \prod_{s \in J} a_s = \prod_{\substack{s \in S; \\ f(s) = w}} a_s = b_w \tag{373} $$

(by (371)).
On the other hand, from J = {s ∈ S | f(s) = w}, we obtain
$$ U \cap J = U \cap \{s ∈ S \mid f(s) = w\} = \{s ∈ U \mid f(s) = w\} \qquad \text{(since U ⊆ S)}. $$
Hence, the product sign “∏_{s∈U∩J}” is equivalent to “∏_{s∈U; f(s)=w}”. Thus, we obtain
$$ \prod_{s \in U \cap J} a_s = \prod_{\substack{s \in U; \\ f(s) = w}} a_s. \tag{374} $$
In view of (373) and (374), we can rewrite the relation (372) as
$$ b_w \overset{x^n}{\equiv} \prod_{\substack{s \in U; \\ f(s) = w}} a_s. $$
This proves Claim 2.]

Claim 3: The set f (U ) is an x n -approximator for the family (bw )w∈W .

[Proof of Claim 3: Let V be a finite subset of W satisfying f(U) ⊆ V ⊆ W. We shall show that
$$ \prod_{w \in f(U)} b_w \overset{x^n}{\equiv} \prod_{w \in V} b_w. $$
Indeed, each w ∈ V satisfies w ∈ V ⊆ W and therefore $b_w \overset{x^n}{\equiv} \prod_{s∈U; f(s)=w} a_s$ (by Claim 2). Hence, Lemma B.2.3 (applied to c_w = b_w and d_w = ∏_{s∈U; f(s)=w} a_s) yields
$$ \prod_{w \in V} b_w \overset{x^n}{\equiv} \prod_{w \in V} \prod_{\substack{s \in U; \\ f(s) = w}} a_s. \tag{375} $$
However, each s ∈ U satisfies f(s) ∈ V (because f(s) ∈ f(U) ⊆ V for s ∈ U). Hence, we can split the product ∏_{s∈U} a_s according to the value of f(s) (since both sets U and V are finite); we thus obtain
$$ \prod_{s \in U} a_s = \prod_{w \in V} \prod_{\substack{s \in U; \\ f(s) = w}} a_s. $$
Therefore, (375) rewrites as
$$ \prod_{w \in V} b_w \overset{x^n}{\equiv} \prod_{s \in U} a_s. \tag{376} $$

The same argument (applied to f(U) instead of V) shows that
$$ \prod_{w \in f(U)} b_w \overset{x^n}{\equiv} \prod_{s \in U} a_s \tag{377} $$
(since f(U) is a finite subset of W satisfying f(U) ⊆ f(U) ⊆ W).
However, the relation $\overset{x^n}{\equiv}$ on K[[x]] is symmetric (by Theorem 3.10.3 (a)); thus, (376) entails
$$ \prod_{s \in U} a_s \overset{x^n}{\equiv} \prod_{w \in V} b_w. $$
Therefore, (377) becomes
$$ \prod_{w \in f(U)} b_w \overset{x^n}{\equiv} \prod_{s \in U} a_s \overset{x^n}{\equiv} \prod_{w \in V} b_w. $$
Since the relation $\overset{x^n}{\equiv}$ on K[[x]] is transitive (by Theorem 3.10.3 (a)), we thus obtain
$$ \prod_{w \in f(U)} b_w \overset{x^n}{\equiv} \prod_{w \in V} b_w. $$
In other words, each m ∈ {0, 1, . . . , n} satisfies
$$ [x^m] \left( \prod_{w \in f(U)} b_w \right) = [x^m] \left( \prod_{w \in V} b_w \right) \tag{378} $$
(by Definition 3.10.1).


Forget that we fixed V. We thus have shown that if V is a finite subset of W satisfying
f (U ) ⊆ V ⊆ W, then each m ∈ {0, 1, . . . , n} satisfies (378).
Now, let m ∈ {0, 1, . . . , n} be arbitrary. Then, every finite subset V of W satisfying f(U) ⊆ V ⊆ W satisfies
$$ [x^m] \left( \prod_{w \in f(U)} b_w \right) = [x^m] \left( \prod_{w \in V} b_w \right) \qquad \text{(by (378))}. $$

In other words, the set f (U ) determines the x m -coefficient in the product of (bw )w∈W
(by the definition of “determining the x m -coefficient in a product”, since f (U ) is a
finite subset of W).
Forget that we fixed m. We thus have shown that the set f (U ) determines the x m -
coefficient in the product of (bw )w∈W for each m ∈ {0, 1, . . . , n}. In other words, the set
f (U ) determines the first n + 1 coefficients in the product of (bw )w∈W . In other words,
f (U ) is an x n -approximator for (bw )w∈W (by the definition of an “x n -approximator”,
since f (U ) is a finite subset of W). This proves Claim 3.]

Claim 4: The x^n-coefficient in the product of (b_w)_{w∈W} is finitely determined.

[Proof of Claim 4: Claim 3 shows that f (U ) is an x n -approximator for (bw )w∈W . Thus,
the set f (U ) determines the first n + 1 coefficients in the product of (bw )w∈W (by the
definition of an “x n -approximator”). Hence, in particular, this set f (U ) determines
the x n -coefficient in the product of (bw )w∈W . Thus, the x n -coefficient in the product of
(bw )w∈W is finitely determined (by the definition of “finitely determined”, since f (U )
is a finite subset of W). This proves Claim 4.]
Now, forget that we fixed n. We thus have shown that the x n -coefficient in the
product of (bw )w∈W is finitely determined for each n ∈ N. In other words, each
coefficient in the product of (bw )w∈W is finitely determined. In other words, the family
(b_w)_{w∈W} is multipliable (by the definition of “multipliable”). In view of (371), we can restate this as follows: The family $\left( \prod_{s∈S;\, f(s)=w} a_s \right)_{w∈W}$ is multipliable.
It remains to prove the equality (119).
Let n ∈ N. Lemma 3.11.15 (applied to S and (as )s∈S instead of I and (ai )i∈ I ) shows
that there exists an x n -approximator for (as )s∈S . Pick such an x n -approximator, and
call it U. Then, U is a finite subset of S (by the definition of an x n -approximator).
Furthermore, Proposition 3.11.16 (b) (applied to S, (a_s)_{s∈S} and U instead of I, (a_i)_{i∈I} and M) yields
$$ \prod_{i \in S} a_i \overset{x^n}{\equiv} \prod_{i \in U} a_i. $$
Renaming the index i as s on both sides of this relation, we obtain
$$ \prod_{s \in S} a_s \overset{x^n}{\equiv} \prod_{s \in U} a_s. \tag{379} $$
However, the relation $\overset{x^n}{\equiv}$ on K[[x]] is symmetric (by Theorem 3.10.3 (a)); thus, (379) entails
$$ \prod_{s \in U} a_s \overset{x^n}{\equiv} \prod_{s \in S} a_s. \tag{380} $$
Let V be the set f(U) = {f(u) | u ∈ U}. Thus, V = f(U) ⊆ W and f(U) = V ⊆ V, so that f(U) ⊆ V ⊆ W. Moreover, the set f(U) is finite (since U is finite); in other words, the set V is finite (since V = f(U)).
We have already proved (in Claim 3) that the set f(U) is an x^n-approximator for the family (b_w)_{w∈W}. In other words, the set V is an x^n-approximator for the family (b_w)_{w∈W} (since V = f(U)). Hence, Proposition 3.11.16 (b) (applied to W, (b_w)_{w∈W} and V instead of I, (a_i)_{i∈I} and M) yields
$$ \prod_{i \in W} b_i \overset{x^n}{\equiv} \prod_{i \in V} b_i $$
(since the family (b_w)_{w∈W} is multipliable). Renaming the index i as w on both sides of this relation, we obtain
$$ \prod_{w \in W} b_w \overset{x^n}{\equiv} \prod_{w \in V} b_w. \tag{381} $$

On the other hand,
$$ \prod_{w \in V} b_w \overset{x^n}{\equiv} \prod_{s \in U} a_s. \tag{382} $$
(Indeed, this is precisely the relation (376) that was shown during the proof of Claim 3, and its proof applies here just as well.)
Now, (381) becomes
$$ \prod_{w \in W} b_w \overset{x^n}{\equiv} \prod_{w \in V} b_w \overset{x^n}{\equiv} \prod_{s \in U} a_s \overset{x^n}{\equiv} \prod_{s \in S} a_s $$
(by (381), (382), and (380), respectively). Since the relation $\overset{x^n}{\equiv}$ on K[[x]] is transitive (by Theorem 3.10.3 (a)), we thus obtain
$$ \prod_{w \in W} b_w \overset{x^n}{\equiv} \prod_{s \in S} a_s. $$
In other words, each m ∈ {0, 1, . . . , n} satisfies
$$ [x^m] \left( \prod_{w \in W} b_w \right) = [x^m] \left( \prod_{s \in S} a_s \right) $$
(by Definition 3.10.1). Applying this to m = n, we obtain
$$ [x^n] \left( \prod_{w \in W} b_w \right) = [x^n] \left( \prod_{s \in S} a_s \right). $$
Forget that we fixed n. We thus have shown that
$$ [x^n] \left( \prod_{w \in W} b_w \right) = [x^n] \left( \prod_{s \in S} a_s \right) \qquad \text{for each } n ∈ N. $$
In other words, each coefficient of the FPS ∏_{w∈W} b_w equals the corresponding coefficient of ∏_{s∈S} a_s. Therefore,
$$ \prod_{w \in W} b_w = \prod_{s \in S} a_s. $$
In view of (371), we can rewrite this as
$$ \prod_{w \in W} \prod_{\substack{s \in S; \\ f(s) = w}} a_s = \prod_{s \in S} a_s. $$
Thus, (119) is proven, and the proof of Proposition 3.11.23 is complete.
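The grouping identity (119) just proved can be tested in a truncated setting, where all products become finite. A sketch (helper names are ours) grouping the factors of a product of (1 + x^s) along the fibers of the map s ↦ s mod 3:

```python
# Numerical check of the grouping identity (119) of Proposition 3.11.23,
# truncated at degree N: grouping the factors of a product along the
# fibers of a map f changes nothing. Helper names are ours.
N = 7

def mul(a, b):
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

def prod(factors):
    result = [1] + [0] * N
    for f in factors:
        result = mul(result, f)
    return result

def a(s):
    """a_s = 1 + x^s, truncated at x^N."""
    f = [1] + [0] * N
    if s <= N:
        f[s] += 1
    return f

S = range(1, 13)
f = lambda s: s % 3          # group the index set S by residues mod 3

flat = prod([a(s) for s in S])
grouped = prod([prod([a(s) for s in S if f(s) == w]) for w in range(3)])
print(flat == grouped)  # True
```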

Detailed proof of Proposition 3.11.24. Let f : I × J → I be the map that sends each pair
(i, j) to i. We first prove two easy claims:

Claim 1: Let w ∈ I. Let J' be the subset {s ∈ I × J | f(s) = w} of I × J. Then, there is a bijection
$$ J → J', \qquad j ↦ (w, j). $$

Proof of Claim 1. We have
$$ J' = \{s ∈ I × J \mid f(s) = w\} = \{(i, j) ∈ I × J \mid f(i, j) = w\} \qquad \text{(here, we have renamed the index s as (i, j))} $$
$$ = \{(i, j) ∈ I × J \mid i = w\} \qquad \text{(since f(i, j) = i by the definition of f)} $$
$$ = \{(w, j) \mid j ∈ J\} \qquad \text{(since w ∈ I)}. $$
In other words, the set J' consists of all pairs (w, j) with j ∈ J. Hence, there is a bijection
$$ J → J', \qquad j ↦ (w, j). $$
This proves Claim 1.

Claim 2: For each w ∈ I, the family (a_s)_{s∈I×J; f(s)=w} is multipliable.

Proof of Claim 2. Let w ∈ I. We have assumed that for each i ∈ I, the family (a_{(i,j)})_{j∈J} is multipliable. Applying this to i = w, we see that the family (a_{(w,j)})_{j∈J} is multipliable.
Let J' be the subset {s ∈ I × J | f(s) = w} of I × J. Thus, the family (a_s)_{s∈J'} is the family (a_s)_{s∈I×J; f(s)=w}.
Claim 1 yields that there is a bijection
$$ J → J', \qquad j ↦ (w, j). $$
Hence, the family (a_s)_{s∈J'} is a reindexing of the family (a_{(w,j)})_{j∈J}. Since the latter family (a_{(w,j)})_{j∈J} is multipliable, we thus conclude that the former family (a_s)_{s∈J'} is also multipliable (since a reindexing of a multipliable family is still multipliable; this is part of Proposition 3.11.22). In other words, the family (a_s)_{s∈I×J; f(s)=w} is multipliable (since the family (a_s)_{s∈J'} is the family (a_s)_{s∈I×J; f(s)=w}). This proves Claim 2.



Thanks to Claim 2, we can apply Proposition 3.11.23 to S = I × J and W = I. This application yields that
$$ \prod_{s \in I × J} a_s = \prod_{w \in I} \prod_{\substack{s \in I × J; \\ f(s) = w}} a_s; \tag{383} $$
in particular, it yields that the right hand side of (383) is well-defined; i.e., the family $\left( \prod_{s∈I×J;\, f(s)=w} a_s \right)_{w∈I}$ is also multipliable.
Now, fix w ∈ I. Let J' be the subset {s ∈ I × J | f(s) = w} of I × J. Thus, the family (a_s)_{s∈J'} is the family (a_s)_{s∈I×J; f(s)=w}, and therefore is multipliable (by Claim 2). Hence, the product ∏_{s∈J'} a_s is well-defined. Furthermore, Claim 1 yields that there is a bijection
$$ J → J', \qquad j ↦ (w, j). $$
Thus, we can substitute (w, j) for s in the product ∏_{s∈J'} a_s (because any bijection allows us to substitute the index in a product; this is just the claim of Proposition 3.11.22). We thus obtain
$$ \prod_{s \in J'} a_s = \prod_{j \in J} a_{(w,j)} \tag{384} $$
(and, in particular, the product on the right hand side of this equality is well-defined, i.e., the family (a_{(w,j)})_{j∈J} is multipliable). However, we can replace the product sign “∏_{s∈J'}” by “∏_{s∈I×J; f(s)=w}” (since J' = {s ∈ I × J | f(s) = w}). Hence, we can rewrite (384) as
$$ \prod_{\substack{s \in I × J; \\ f(s) = w}} a_s = \prod_{j \in J} a_{(w,j)}. \tag{385} $$
Forget that we fixed w. We thus have proved (385) for each w ∈ I.
Now, (383) becomes
$$ \prod_{s \in I × J} a_s = \prod_{w \in I} \prod_{\substack{s \in I × J; \\ f(s) = w}} a_s = \prod_{w \in I} \prod_{j \in J} a_{(w,j)} = \prod_{i \in I} \prod_{j \in J} a_{(i,j)} $$
(here, we have used (385), and then renamed the index w as i in the outer product). Renaming the index s as (i, j) on the left hand side of this equality, we can rewrite it as
$$ \prod_{(i,j) \in I × J} a_{(i,j)} = \prod_{i \in I} \prod_{j \in J} a_{(i,j)}. $$

A similar argument (but using the map I × J → J, (i, j) ↦ j instead of our map f : I × J → I, (i, j) ↦ i) shows that
$$ \prod_{(i,j) \in I × J} a_{(i,j)} = \prod_{j \in J} \prod_{i \in I} a_{(i,j)}. $$
Combining these two equalities, we obtain
$$ \prod_{i \in I} \prod_{j \in J} a_{(i,j)} = \prod_{(i,j) \in I × J} a_{(i,j)} = \prod_{j \in J} \prod_{i \in I} a_{(i,j)}. $$
(Tracing back our above argument, we see that all products appearing in this equality are well-defined; indeed, their well-definedness has been shown the moment they first appeared in our proof.) Proposition 3.11.24 is thus proved.
 
Detailed proof of Proposition 3.11.25. We have assumed that (a_{(i,j)})_{(i,j)∈I×J} ∈ K[[x]]^{I×J} is a multipliable family of invertible FPSs. In other words, (a_s)_{s∈I×J} ∈ K[[x]]^{I×J} is a multipliable family of invertible FPSs (here, we have renamed the index (i, j) as s). Hence, any subfamily of (a_s)_{s∈I×J} is multipliable (by Proposition 3.11.21, applied to I × J and (a_s)_{s∈I×J} instead of I and (a_i)_{i∈I}).
Let f : I × J → I be the map that sends each pair (i, j) to i. Let us show three easy
claims:

Claim 1: Let w ∈ I. Let J' be the subset {s ∈ I × J | f(s) = w} of I × J. Then, there is a bijection
$$ J → J', \qquad j ↦ (w, j). $$

Proof of Claim 1. This is proved in the exact same way as Claim 1 in our above proof of Proposition 3.11.24.
 
Claim 2: For each i ∈ I, the family (a_{(i,j)})_{j∈J} is multipliable.

Proof of Claim 2. Let i ∈ I. Let J' be the subset {s ∈ I × J | f(s) = i} of I × J. Then, the family (a_s)_{s∈J'} is a subfamily of (a_s)_{s∈I×J}. Hence, this family (a_s)_{s∈J'} is multipliable (since any subfamily of (a_s)_{s∈I×J} is multipliable).
However, Claim 1 (applied to w = i) shows that there is a bijection
$$ J → J', \qquad j ↦ (i, j). $$
Hence, the family (a_s)_{s∈J'} is a reindexing of the family (a_{(i,j)})_{j∈J}. Since the former family (a_s)_{s∈J'} is multipliable, we thus conclude that the latter family (a_{(i,j)})_{j∈J} is also multipliable (since a reindexing of a multipliable family is still multipliable; this is part of Proposition 3.11.22). This proves Claim 2.

 
Claim 3: For each j ∈ J, the family (a_{(i,j)})_{i∈I} is multipliable.

Proof of Claim 3. This is analogous to Claim 2.


Thanks to Claim 2 and Claim 3, we can apply Proposition 3.11.24, and conclude that
$$ \prod_{i \in I} \prod_{j \in J} a_{(i,j)} = \prod_{(i,j) \in I × J} a_{(i,j)} = \prod_{j \in J} \prod_{i \in I} a_{(i,j)} $$
(and, in particular, all the products appearing in this equality are well-defined). This proves Proposition 3.11.25.
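The interchange of the two iterated products can likewise be checked after truncation, where everything becomes a finite product. A sketch with the family a_{(i,j)} = 1 + x^{i+j} (helper names are ours):

```python
# Numerical check of the product-interchange identity of Propositions
# 3.11.24/3.11.25, truncated at degree N: the two iterated products and
# the flat product over I x J all agree. Helper names are ours.
N = 6

def mul(a, b):
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

def prod(factors):
    result = [1] + [0] * N
    for f in factors:
        result = mul(result, f)
    return result

def a(i, j):
    """a_{(i,j)} = 1 + x^(i+j), truncated at x^N."""
    f = [1] + [0] * N
    if i + j <= N:
        f[i + j] += 1
    return f

I = range(1, N + 1)
J = range(1, N + 1)
rows = prod([prod([a(i, j) for j in J]) for i in I])   # prod_i prod_j
cols = prod([prod([a(i, j) for i in I]) for j in J])   # prod_j prod_i
flat = prod([a(i, j) for i in I for j in J])           # prod over I x J
print(rows == cols == flat)  # True
```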
Detailed proof of Lemma 3.11.32. Theorem 3.11.10 (applied to I = J) shows that the fam-
ily (1 + f i )i∈ J is multipliable. In other words, each coefficient in the product of this
family (1 + f i )i∈ J is finitely determined. In other words, for each m ∈ N, the x m -
coefficient in the product of (1 + f i )i∈ J is finitely determined. In other words, for each
m ∈ N, there is a finite subset Mm of J that determines the x m -coefficient in the product
of (1 + f i )i∈ J . Consider this Mm .
Let M = M_0 ∪ M_1 ∪ ⋯ ∪ M_n. Then, M is a finite subset of J (since M_0, M_1, . . . , M_n are finite subsets of J). Moreover, we claim that
$$ [x^m] \left( \prod_{i \in J} (1 + f_i) \right) = [x^m] \left( \prod_{i \in M} (1 + f_i) \right) \tag{386} $$
for each m ∈ {0, 1, . . . , n}.


[Proof of (386): Let m ∈ {0, 1, . . . , n}. Then, M_m is one of the n + 1 sets in the union M_0 ∪ M_1 ∪ ⋯ ∪ M_n. Hence, M_m ⊆ M_0 ∪ M_1 ∪ ⋯ ∪ M_n = M.
However, the subset M_m of J determines the x^m-coefficient in the product of (1 + f_i)_{i∈J} (by the definition of M_m). In other words, every finite subset J' of J satisfying M_m ⊆ J' ⊆ J satisfies
$$ [x^m] \left( \prod_{i \in J'} (1 + f_i) \right) = [x^m] \left( \prod_{i \in M_m} (1 + f_i) \right) $$
(by the definition of “determining the x^m-coefficient in a product”). Applying this to J' = M, we obtain
$$ [x^m] \left( \prod_{i \in M} (1 + f_i) \right) = [x^m] \left( \prod_{i \in M_m} (1 + f_i) \right) $$
(since M is a finite subset of J satisfying M_m ⊆ M ⊆ J). On the other hand, the definition of the product ∏_{i∈J} (1 + f_i) yields that
$$ [x^m] \left( \prod_{i \in J} (1 + f_i) \right) = [x^m] \left( \prod_{i \in M_m} (1 + f_i) \right) $$
(since M_m is a finite subset of J that determines the x^m-coefficient in the product of (1 + f_i)_{i∈J}). Comparing these two equalities, we obtain
$$ [x^m] \left( \prod_{i \in J} (1 + f_i) \right) = [x^m] \left( \prod_{i \in M} (1 + f_i) \right). $$

This proves (386).]


Now, we know that (386) holds for each m ∈ {0, 1, . . . , n}. Thus, we can apply Lemma 3.3.20 to f = ∏_{i∈J} (1 + f_i) and g = ∏_{i∈M} (1 + f_i). We thus obtain that
$$ [x^m] \left( a \prod_{i \in J} (1 + f_i) \right) = [x^m] \left( a \prod_{i \in M} (1 + f_i) \right) \tag{387} $$
for each m ∈ {0, 1, . . . , n}.
On the other hand, M is a finite set, so that (f_i)_{i∈M} is a finite family. Furthermore, each i ∈ M satisfies [x^m] (f_i) = 0 for each m ∈ {0, 1, . . . , n} (by (129), since i ∈ M ⊆ J). Thus, we can apply Lemma 3.11.9 to M instead of J. We therefore obtain that
$$ [x^m] \left( a \prod_{i \in M} (1 + f_i) \right) = [x^m] a \tag{388} $$
for each m ∈ {0, 1, . . . , n}.
Hence, for each m ∈ {0, 1, . . . , n}, we have
$$ [x^m] \left( a \prod_{i \in J} (1 + f_i) \right) = [x^m] \left( a \prod_{i \in M} (1 + f_i) \right) = [x^m] a \qquad \text{(by (387) and (388))}. $$
This proves Lemma 3.11.32.
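The content of Lemma 3.11.32 (factors 1 + f_i whose f_i vanish in degrees 0, . . . , n cannot affect the coefficients of x^0, . . . , x^n) can be observed directly in a truncated computation (helper names are ours):

```python
# Sanity check of Lemma 3.11.32's conclusion in a truncated setting: if
# each f_i has zero coefficients in degrees 0..n, then multiplying a by
# the product of the (1 + f_i) leaves the coefficients of x^0..x^n
# unchanged. Helper names are ours.
N, n = 9, 3

def mul(a, b):
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Three factors 1 + f_i with [x^m] f_i = 0 for all m <= n:
fs = [[0] * (n + 1) + [2, 0, 1, 0, 0, 0][: N - n],
      [0] * (n + 1) + [0, 3, 0, 0, 0, 0][: N - n],
      [0] * (n + 1) + [1, 1, 1, 1, 1, 1][: N - n]]

result = a
for f in fs:
    one_plus_f = [1 + c if m == 0 else c for m, c in enumerate(f)]
    result = mul(result, one_plus_f)

print(result[: n + 1] == a[: n + 1])  # True
print(result == a)                    # False: higher coefficients do change
```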

Detailed proof of Proposition 3.11.30. The following proof is an expanded version of the
argument given by Mindlack at https://math.stackexchange.com/a/4123658/ .
This will be a long grind; we thus break it up into several claims. First, however, let
us introduce a few notations:

• If J is a subset of I, then S^J shall denote the Cartesian product ∏_{i∈J} S_i. This Cartesian product ∏_{i∈J} S_i consists of families (s_i)_{i∈J}, where each s_i belongs to the respective set S_i.
  The notation S^J should not be misconstrued as being an actual power. (However, in the particular case when all S_i equal one and the same set S, the Cartesian product S^J we just defined is indeed the Cartesian power commonly known as S^J.)

• If J is a subset of I, then S^I_J shall denote the set of all families (s_i)_{i∈I} ∈ S^I that satisfy
  (s_i = 0 for all i ∈ I \ J).
  This set S^I_J is in a canonical bijection with S^J, as elements of both sets consist of “essentially the same data”. To wit, an element of S^J is a family that only has i-th entries for i ∈ J, whereas an element of S^I_J is a family that has i-th entries for all i ∈ I, but subject to the requirement that the i-th entries for all i ∈ I \ J are 0 (so that only the i-th entries for i ∈ J carry any information). More rigorously: The map
  S^I_J → S^J, (s_i)_{i∈I} ↦ (s_i)_{i∈J}
  is a bijection (since it merely shrinks the family by removing entries that are required to be 0 anyway). We denote this bijection by reduce_J.

• We define S^I_fin to be the set of all essentially finite families (s_i)_{i∈I} ∈ S^I. It is easy to see that S^I_fin is the union of the sets S^I_J over all finite subsets J of I.

Now, we can begin with our claims:

Claim 1: The family ( pi,k )k∈Si is summable for each i ∈ I.

[Proof of Claim 1: Let j ∈ I. Then, the pairs (i, k) ∈ S with i = j are precisely the pairs of the form (j, k) with k ∈ S_j and k ≠ 0. In other words, the pairs (i, k) ∈ S with i = j are precisely the pairs of the form (j, k) with k ∈ S_j \ {0}.
We assumed that the family (p_{i,k})_{(i,k)∈S} is summable. Hence, its subfamily (p_{i,k})_{(i,k)∈S with i=j} is summable as well (since a subfamily of a summable family is always summable). In other words, the family (p_{j,k})_{k∈S_j\{0}} is summable (since this family is just a reindexing of the family (p_{i,k})_{(i,k)∈S with i=j}, because the pairs (i, k) ∈ S with i = j are precisely the pairs of the form (j, k) with k ∈ S_j \ {0}). Thus, the family (p_{j,k})_{k∈S_j} is summable as well (since the summability of a family does not change if we insert a single entry into it: summability is an “all but finitely many k satisfy something” type of statement, and inserting a single entry does not change its validity).
Forget that we fixed j. We thus have shown that the family (p_{j,k})_{k∈S_j} is summable for each j ∈ I. Renaming j as i in this statement, we obtain the following: The family (p_{i,k})_{k∈S_i} is summable for each i ∈ I. This proves Claim 1.]
Claim 1 shows that the sum ∑_{k∈S_i} p_{i,k} is well-defined for each i ∈ I. Moreover, for each
i ∈ I, we have

    ∑_{k∈S_i} p_{i,k} = p_{i,0} + ∑_{k∈S_i\{0}} p_{i,k}   (here, we have split off the addend for k = 0 from the sum, since 0 ∈ S_i)
                      = 1 + ∑_{k∈S_i\{0}} p_{i,k}   (389)

(since p_{i,0} = 1 by (126)).

Next, we claim the following:

Claim 2: The family (∑_{k∈S_i} p_{i,k})_{i∈I} is multipliable.

171 Indeed, the summability of a family is an “all but finitely many k satisfy something” type of
statement. If we insert a single entry into the family, such a statement does not change its
validity.

[Proof of Claim 2: The family (p_{i,k})_{(i,k)∈S} is summable (by assumption). We can split
its sum into subsums as follows:

    ∑_{(i,k)∈S} p_{i,k} = ∑_{i∈I} ∑_{k∈S_i\{0}} p_{i,k}

(since a pair (i, k) belongs to S if and only if it satisfies i ∈ I and k ∈ S_i \ {0}).
This shows that the family (∑_{k∈S_i\{0}} p_{i,k})_{i∈I} is summable. Hence, Theorem 3.11.10
(applied to f_i = ∑_{k∈S_i\{0}} p_{i,k}) yields that the family (1 + ∑_{k∈S_i\{0}} p_{i,k})_{i∈I} is multipliable.
In other words, the family (∑_{k∈S_i} p_{i,k})_{i∈I} is multipliable (since (389) shows that this
family is precisely the family (1 + ∑_{k∈S_i\{0}} p_{i,k})_{i∈I}). This proves Claim 2.]
Claim 2 shows that the product ∏_{i∈I} ∑_{k∈S_i} p_{i,k} is well-defined.

The family (p_{i,k})_{(i,k)∈S} is summable (by assumption). In other words, for any m ∈ N,
all but finitely many (i, k) ∈ S satisfy [x^m] p_{i,k} = 0. In other words, for any m ∈ N,
there exists a finite subset T_m of S such that

    all (i, k) ∈ S \ T_m satisfy [x^m] p_{i,k} = 0.   (390)

Consider these finite subsets T_m.
For each n ∈ N, we let T'_n be the subset T_0 ∪ T_1 ∪ ⋯ ∪ T_n of S. This subset T'_n is finite
(since T_0, T_1, …, T_n are finite).
For each n ∈ N, we let I_n be the subset

    {i | (i, k) ∈ T'_n}

of I, and we let K_n be the set

    {k | (i, k) ∈ T'_n}.

These two sets I_n and K_n are finite (since T'_n is finite).
The definition of T'_n shows the following:
Claim 3: Let n ∈ N and (i, k) ∈ S \ T'_n. Then, we have

    [x^m] p_{i,k} = 0   for each m ∈ {0, 1, …, n}.

[Proof of Claim 3: Let m ∈ {0, 1, …, n}. Then, T_m ⊆ T'_n (since T'_n is defined as the
union T_0 ∪ T_1 ∪ ⋯ ∪ T_n, whereas T_m is one of the n + 1 sets appearing in this union).
Hence, T'_n ⊇ T_m, so that S \ T'_n ⊆ S \ T_m. Now, (i, k) ∈ S \ T'_n ⊆ S \ T_m. Therefore,
(390) shows that [x^m] p_{i,k} = 0. This proves Claim 3.]
The following is easy to see:

Claim 4: If (k_i)_{i∈I} ∈ S^I_fin, then the family (p_{i,k_i})_{i∈I} is multipliable.

[Proof of Claim 4: Let (k_i)_{i∈I} ∈ S^I_fin. Thus, (k_i)_{i∈I} is an essentially finite family in S^I.
Now, all but finitely many i ∈ I satisfy k_i = 0 (since (k_i)_{i∈I} is essentially finite) and thus
p_{i,k_i} = p_{i,0} = 1 (by (126)). Hence, all but finitely many entries of the family (p_{i,k_i})_{i∈I}
equal 1. Thus, this family is multipliable (by Proposition 3.11.11). This proves Claim 4.]
Claim 4 shows that the product ∏_{i∈I} p_{i,k_i} is well-defined whenever (k_i)_{i∈I} ∈ S^I_fin. Next,
we claim the following:

Claim 5: Let n ∈ N. Let (k_i)_{i∈I} ∈ S^I_fin. Assume that some j ∈ I satisfies
(j, k_j) ∈ S \ T'_n. Then, [x^n] (∏_{i∈I} p_{i,k_i}) = 0.

[Proof of Claim 5: We have assumed that some j ∈ I satisfies (j, k_j) ∈ S \ T'_n. Consider
this j. Then, the product ∏_{i∈I} p_{i,k_i} is a multiple of p_{j,k_j} (since p_{j,k_j} is one of the factors of
this product). Moreover, we have [x^m] p_{j,k_j} = 0 for each m ∈ {0, 1, …, n} (by Claim 3,
applied to (i, k) = (j, k_j)). Hence, Lemma 3.3.21 (applied to u = p_{j,k_j} and v = ∏_{i∈I} p_{i,k_i})
shows that we have [x^m] (∏_{i∈I} p_{i,k_i}) = 0 for each m ∈ {0, 1, …, n} (since ∏_{i∈I} p_{i,k_i} is a
multiple of p_{j,k_j}). Applying this to m = n, we obtain [x^n] (∏_{i∈I} p_{i,k_i}) = 0. This proves
Claim 5.]
 
Claim 6: Let n ∈ N. Let (k_i)_{i∈I} ∈ S^I_fin \ S^I_{I_n}. Then, [x^n] (∏_{i∈I} p_{i,k_i}) = 0.

[Proof of Claim 6: We have (k_i)_{i∈I} ∈ S^I_fin \ S^I_{I_n}. In other words, we have (k_i)_{i∈I} ∈ S^I_fin
but (k_i)_{i∈I} ∉ S^I_{I_n}. From (k_i)_{i∈I} ∉ S^I_{I_n}, we see that there exists some j ∈ I \ I_n satisfying
k_j ≠ 0. Consider this j. We have k_j ∈ S_j (since (k_i)_{i∈I} ∈ S^I_fin ⊆ S^I) and j ∉ I_n (since
j ∈ I \ I_n). We have (j, k_j) ∈ S (since k_j ∈ S_j and k_j ≠ 0) and (j, k_j) ∉ T'_n (because
if we had (j, k_j) ∈ T'_n, then we would have j ∈ I_n by the definition of I_n; but this
would contradict j ∉ I_n). Hence, we have (j, k_j) ∈ S \ T'_n. Therefore, Claim 5 yields
[x^n] (∏_{i∈I} p_{i,k_i}) = 0. This proves Claim 6.]
 
Claim 7: The family (∏_{i∈I} p_{i,k_i})_{(k_i)_{i∈I} ∈ S^I_fin} is summable.

[Proof of Claim 7: Let n ∈ N. We shall show that all but finitely many families
(k_i)_{i∈I} ∈ S^I_fin satisfy

    [x^n] (∏_{i∈I} p_{i,k_i}) = 0.

Indeed, let (k_i)_{i∈I} ∈ S^I_fin be a family such that

    [x^n] (∏_{i∈I} p_{i,k_i}) ≠ 0.   (391)

We are going to show that (k_i)_{i∈I} satisfies the following two properties:

• Property 1: All i ∈ I \ I_n satisfy k_i = 0.

• Property 2: All i ∈ I_n satisfy k_i ∈ K_n ∪ {0}.

These two properties together will restrict the family (k_i)_{i∈I} to finitely many possibilities
(since the sets I_n and K_n are finite).
If we had (k_i)_{i∈I} ∉ S^I_{I_n}, then we would have (k_i)_{i∈I} ∈ S^I_fin \ S^I_{I_n} (since (k_i)_{i∈I} ∈ S^I_fin)
and thus [x^n] (∏_{i∈I} p_{i,k_i}) = 0 (by Claim 6), which would contradict (391). Hence, we
cannot have (k_i)_{i∈I} ∉ S^I_{I_n}. Thus, we have (k_i)_{i∈I} ∈ S^I_{I_n}. In other words, all i ∈ I \ I_n
satisfy k_i = 0. This proves Property 1.
If there was some j ∈ I_n that satisfies k_j ∉ K_n ∪ {0}, then this k_j would satisfy
(j, k_j) ∈ S \ T'_n ^{172}, and therefore we would have [x^n] (∏_{i∈I} p_{i,k_i}) = 0 (by Claim 5, since
j ∈ I_n ⊆ I); but this would contradict (391). Hence, there exists no j ∈ I_n that satisfies
k_j ∉ K_n ∪ {0}. In other words, all i ∈ I_n satisfy k_i ∈ K_n ∪ {0}. This proves Property 2.
Now, we have shown that our family (k_i)_{i∈I} satisfies Property 1 and Property 2.
Forget that we fixed (k_i)_{i∈I}. We thus have shown that any family (k_i)_{i∈I} ∈ S^I_fin that
satisfies [x^n] (∏_{i∈I} p_{i,k_i}) ≠ 0 must satisfy Property 1 and Property 2. In other words, any
such family must belong to the set of all families (k_i)_{i∈I} ∈ S^I_fin that satisfy Property 1
and Property 2. However, the latter set is finite^{173}. Hence, there are only finitely many
families (k_i)_{i∈I} ∈ S^I_fin that satisfy [x^n] (∏_{i∈I} p_{i,k_i}) ≠ 0 (since we have shown that any such
family must belong to the finite set of all families (k_i)_{i∈I} ∈ S^I_fin that satisfy Property
^{172} Proof. Let j ∈ I_n be such that k_j ∉ K_n ∪ {0}. We must show that (j, k_j) ∈ S \ T'_n.
Indeed, we have k_j ∉ K_n ∪ {0}; thus, k_j ∉ K_n and k_j ≠ 0. However, j ∈ I_n ⊆ I and k_j ∈ S_j
(since (k_i)_{i∈I} ∈ S^I_fin ⊆ S^I) and therefore (j, k_j) ∈ S (by the definition of S, since k_j ≠ 0). If
we had (j, k_j) ∈ T'_n, then we would have k_j ∈ K_n (by the definition of K_n), which would
contradict k_j ∉ K_n. Thus, we have (j, k_j) ∉ T'_n. Combining this with (j, k_j) ∈ S, we obtain
(j, k_j) ∈ S \ T'_n. This completes our proof.

^{173} Proof. We must show that Property 1 and Property 2 leave only finitely many options for
the family (k_i)_{i∈I}. Indeed, Property 1 shows that all entries k_i with i ∈ I \ I_n are uniquely
determined; meanwhile, Property 2 ensures that the remaining entries (of which there are
only finitely many, since the set I_n is finite) must belong to the finite set K_n ∪ {0} (this set is
finite, since K_n is finite). Therefore, a family (k_i)_{i∈I} that satisfies Property 1 and Property 2
is uniquely determined by finitely many of its entries (namely, by its entries k_i with i ∈ I_n),
and there are finitely many choices for each of them (since they must belong to the finite set
K_n ∪ {0}). Hence, there are only finitely many such families (namely, at most |K_n ∪ {0}|^{|I_n|}
many options). In other words, the set of all families (k_i)_{i∈I} ∈ S^I_fin that satisfy Property 1
and Property 2 is finite.



1 and Property 2). In other words, all but finitely many families (k_i)_{i∈I} ∈ S^I_fin satisfy
[x^n] (∏_{i∈I} p_{i,k_i}) = 0.
Forget that we fixed n. We thus have shown that for each n ∈ N, all but finitely
many families (k_i)_{i∈I} ∈ S^I_fin satisfy [x^n] (∏_{i∈I} p_{i,k_i}) = 0. In other words, the family
(∏_{i∈I} p_{i,k_i})_{(k_i)_{i∈I} ∈ S^I_fin} is summable. This proves Claim 7.]
Claim 7 shows that the sum ∑_{(k_i)_{i∈I} ∈ S^I_fin} ∏_{i∈I} p_{i,k_i} is well-defined. We shall next focus on
proving (127).

Claim 8: Let n ∈ N. Then,

    [x^n] (∑_{(k_i)_{i∈I} ∈ S^I_fin} ∏_{i∈I} p_{i,k_i}) = [x^n] (∑_{(k_i)_{i∈I_n} ∈ S^{I_n}} ∏_{i∈I_n} p_{i,k_i}).

[Proof of Claim 8: The set I_n is finite (as we know). Thus, S^I_{I_n} ⊆ S^I_fin ^{174}. Hence, the
set S^I_fin is the union of its two disjoint subsets S^I_{I_n} and S^I_fin \ S^I_{I_n}.
Furthermore, for each (k_i)_{i∈I} ∈ S^I_{I_n}, we have

    k_i = 0   for all i ∈ I \ I_n

(by the definition of S^I_{I_n}) and therefore

    p_{i,k_i} = p_{i,0} = 1   for all i ∈ I \ I_n

(by (126)) and thus

    ∏_{i∈I\I_n} p_{i,k_i} = ∏_{i∈I\I_n} 1 = 1

and therefore

    ∏_{i∈I} p_{i,k_i} = (∏_{i∈I_n} p_{i,k_i}) (∏_{i∈I\I_n} p_{i,k_i}) = ∏_{i∈I_n} p_{i,k_i}   (392)

(here, we have split the product into two parts, since the set I_n is a subset of I).

^{174} Proof. Let (k_i)_{i∈I} ∈ S^I_{I_n}. Thus, (k_i)_{i∈I} is a family in S^I that satisfies k_i = 0 for all i ∈ I \ I_n
(by the definition of S^I_{I_n}). However, I_n is finite. Thus, k_i = 0 for all but finitely many i ∈ I
(since k_i = 0 for all i ∈ I \ I_n). In other words, the family (k_i)_{i∈I} is essentially finite. In other
words, (k_i)_{i∈I} ∈ S^I_fin (by the definition of S^I_fin).
Forget that we fixed (k_i)_{i∈I}. We thus have shown that (k_i)_{i∈I} ∈ S^I_fin for each (k_i)_{i∈I} ∈ S^I_{I_n}.
In other words, S^I_{I_n} ⊆ S^I_fin.

Now,

    [x^n] (∑_{(k_i)_{i∈I} ∈ S^I_fin} ∏_{i∈I} p_{i,k_i})
      = ∑_{(k_i)_{i∈I} ∈ S^I_fin} [x^n] (∏_{i∈I} p_{i,k_i})
      = ∑_{(k_i)_{i∈I} ∈ S^I_{I_n}} [x^n] (∏_{i∈I} p_{i,k_i}) + ∑_{(k_i)_{i∈I} ∈ S^I_fin \ S^I_{I_n}} [x^n] (∏_{i∈I} p_{i,k_i})
            (here, we have split the sum, since the set S^I_fin is the union of its two
            disjoint subsets S^I_{I_n} and S^I_fin \ S^I_{I_n})
      = ∑_{(k_i)_{i∈I} ∈ S^I_{I_n}} [x^n] (∏_{i∈I_n} p_{i,k_i}) + ∑_{(k_i)_{i∈I} ∈ S^I_fin \ S^I_{I_n}} 0
            (by (392), and since each addend of the second sum is 0 by Claim 6)
      = ∑_{(k_i)_{i∈I} ∈ S^I_{I_n}} [x^n] (∏_{i∈I_n} p_{i,k_i})
      = ∑_{(k_i)_{i∈I_n} ∈ S^{I_n}} [x^n] (∏_{i∈I_n} p_{i,k_i})
            (here, we have substituted (k_i)_{i∈I_n} for (k_i)_{i∈I} in the sum, since the map
            S^I_{I_n} → S^{I_n}, (k_i)_{i∈I} ↦ (k_i)_{i∈I_n} is a bijection (indeed, this map is the
            map we have called reduce_{I_n}))
      = [x^n] (∑_{(k_i)_{i∈I_n} ∈ S^{I_n}} ∏_{i∈I_n} p_{i,k_i}).

This proves Claim 8.]

Claim 9: Let n ∈ N. Let i ∈ I \ I_n. Then,

    [x^m] (∑_{k∈S_i\{0}} p_{i,k}) = 0   for each m ∈ {0, 1, …, n}.

[Proof of Claim 9: We have i ∈ I \ I_n. In other words, i ∈ I and i ∉ I_n.
Let m ∈ {0, 1, …, n}.
Let k ∈ S_i \ {0}. Thus, by the definition of S, we have (i, k) ∈ S (since k ∈ S_i \ {0}
entails k ∈ S_i and k ≠ 0). On the other hand, (i, k) ∉ T'_n (since otherwise, we would
have (i, k) ∈ T'_n and thus i ∈ I_n (by the definition of I_n), which would contradict i ∉ I_n).
Combining (i, k) ∈ S with (i, k) ∉ T'_n, we obtain (i, k) ∈ S \ T'_n. Therefore, Claim 3
yields [x^m] p_{i,k} = 0.
Now, forget that we fixed k. We thus have shown that

    [x^m] p_{i,k} = 0   for each k ∈ S_i \ {0}.   (393)

Now,

    [x^m] (∑_{k∈S_i\{0}} p_{i,k}) = ∑_{k∈S_i\{0}} [x^m] p_{i,k} = ∑_{k∈S_i\{0}} 0 = 0

(by (393)). This proves Claim 9.]

Claim 10: Let n ∈ N. Then,

    [x^n] (∏_{i∈I} ∑_{k∈S_i} p_{i,k}) = [x^n] (∏_{i∈I_n} ∑_{k∈S_i} p_{i,k}).

[Proof of Claim 10: The set I_n is a subset of I. Hence, the set I is the union of its two
disjoint subsets I_n and I \ I_n. Thus, we can split the product ∏_{i∈I} ∑_{k∈S_i} p_{i,k} as follows:^{175}

    ∏_{i∈I} ∑_{k∈S_i} p_{i,k} = (∏_{i∈I_n} ∑_{k∈S_i} p_{i,k}) (∏_{i∈I\I_n} ∑_{k∈S_i} p_{i,k})
                            = (∏_{i∈I_n} ∑_{k∈S_i} p_{i,k}) (∏_{i∈I\I_n} (1 + ∑_{k∈S_i\{0}} p_{i,k}))   (394)

(here we have rewritten each factor with i ∈ I \ I_n using (389)).
However, the family (∑_{k∈S_i\{0}} p_{i,k})_{i∈I} is summable (as we have seen in the proof
of Claim 2). Hence, its subfamily (∑_{k∈S_i\{0}} p_{i,k})_{i∈I\I_n} is summable as well (since a
subfamily of a summable family is always summable). Moreover, Claim 9 shows that
each i ∈ I \ I_n satisfies [x^m] (∑_{k∈S_i\{0}} p_{i,k}) = 0 for each m ∈ {0, 1, …, n}. Hence,
Lemma 3.11.32 (applied to a = ∏_{i∈I_n} ∑_{k∈S_i} p_{i,k} and J = I \ I_n and f_i = ∑_{k∈S_i\{0}} p_{i,k}) yields
that

    [x^m] ((∏_{i∈I_n} ∑_{k∈S_i} p_{i,k}) (∏_{i∈I\I_n} (1 + ∑_{k∈S_i\{0}} p_{i,k}))) = [x^m] (∏_{i∈I_n} ∑_{k∈S_i} p_{i,k})
        for each m ∈ {0, 1, …, n}.

^{175} Here, we are tacitly using the fact that any subfamily of the family (∑_{k∈S_i} p_{i,k})_{i∈I} is multipliable.
But this can be proved in the exact same way as we proved Claim 2.

Applying this to m = n, we find

    [x^n] ((∏_{i∈I_n} ∑_{k∈S_i} p_{i,k}) (∏_{i∈I\I_n} (1 + ∑_{k∈S_i\{0}} p_{i,k}))) = [x^n] (∏_{i∈I_n} ∑_{k∈S_i} p_{i,k}).

In view of (394), this rewrites as

    [x^n] (∏_{i∈I} ∑_{k∈S_i} p_{i,k}) = [x^n] (∏_{i∈I_n} ∑_{k∈S_i} p_{i,k}).

This proves Claim 10.]

Claim 11: Let n ∈ N. Then,

    ∏_{i∈I_n} ∑_{k∈S_i} p_{i,k} = ∑_{(k_i)_{i∈I_n} ∈ S^{I_n}} ∏_{i∈I_n} p_{i,k_i}.

[Proof of Claim 11: The set I_n is finite. For any i ∈ I_n, the family (p_{i,k})_{k∈S_i} is summable
(by Claim 1). Hence, Proposition 3.11.31 (applied to N = I_n) yields

    ∏_{i∈I_n} ∑_{k∈S_i} p_{i,k} = ∑_{(k_i)_{i∈I_n} ∈ ∏_{i∈I_n} S_i} ∏_{i∈I_n} p_{i,k_i} = ∑_{(k_i)_{i∈I_n} ∈ S^{I_n}} ∏_{i∈I_n} p_{i,k_i}

(since ∏_{i∈I_n} S_i = S^{I_n}). This proves Claim 11.]

Now, for each n ∈ N, we have

    [x^n] (∏_{i∈I} ∑_{k∈S_i} p_{i,k})
      = [x^n] (∏_{i∈I_n} ∑_{k∈S_i} p_{i,k})   (by Claim 10)
      = [x^n] (∑_{(k_i)_{i∈I_n} ∈ S^{I_n}} ∏_{i∈I_n} p_{i,k_i})
            (since Claim 11 yields ∏_{i∈I_n} ∑_{k∈S_i} p_{i,k} = ∑_{(k_i)_{i∈I_n} ∈ S^{I_n}} ∏_{i∈I_n} p_{i,k_i})
      = [x^n] (∑_{(k_i)_{i∈I} ∈ S^I_fin} ∏_{i∈I} p_{i,k_i})   (by Claim 8).

That is, any coefficient of the FPS ∏_{i∈I} ∑_{k∈S_i} p_{i,k} equals the corresponding coefficient of
∑_{(k_i)_{i∈I} ∈ S^I_fin} ∏_{i∈I} p_{i,k_i}. Hence,

    ∏_{i∈I} ∑_{k∈S_i} p_{i,k} = ∑_{(k_i)_{i∈I} ∈ S^I_fin} ∏_{i∈I} p_{i,k_i} = ∑_{(k_i)_{i∈I} ∈ ∏_{i∈I} S_i essentially finite} ∏_{i∈I} p_{i,k_i}

(since S^I_fin is the set of all essentially finite families (k_i)_{i∈I} ∈ ∏_{i∈I} S_i). In particular, the
family (∏_{i∈I} p_{i,k_i})_{(k_i)_{i∈I} ∈ ∏_{i∈I} S_i essentially finite} is summable. This proves Proposition 3.11.30.
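As a finite sanity check of Proposition 3.11.30 (a sketch of ours, not part of the proof), take p_{i,k} = x^{ik}; then ∑_{k} p_{i,k} is the geometric series 1/(1 − x^i), the left-hand side becomes the Euler product for partitions, and the right-hand side counts partitions directly. Working modulo x^{N+1} with plain coefficient lists:

```python
# Finite check of prod_i sum_k p_{i,k} = sum over families of prod_i p_{i,k_i}
# for the choice p_{i,k} = x^{i*k} (our example).  Modulo x^(N+1), only the
# indices i <= N and exponents k <= N matter.
from itertools import product

N = 8  # work with power series modulo x^(N+1)

def mul(a, b):
    """Multiply two truncated power series (lists of coefficients)."""
    c = [0] * (N + 1)
    for m, am in enumerate(a):
        for n, bn in enumerate(b):
            if m + n <= N:
                c[m + n] += am * bn
    return c

# Left-hand side: prod_{i=1}^{N} sum_{k>=0} x^{i*k}, truncated.
lhs = [1] + [0] * N
for i in range(1, N + 1):
    s = [0] * (N + 1)
    for k in range(0, N // i + 1):
        s[i * k] += 1          # the addend p_{i,k} = x^{i*k}
    lhs = mul(lhs, s)

# Right-hand side: sum over essentially finite families (k_1, ..., k_N)
# of x^{sum i*k_i}; families of larger degree contribute nothing mod x^(N+1).
rhs = [0] * (N + 1)
for ks in product(*(range(N // i + 1) for i in range(1, N + 1))):
    deg = sum(i * k for i, k in enumerate(ks, start=1))
    if deg <= N:
        rhs[deg] += 1

assert lhs == rhs
print(lhs)  # the partition numbers p(0), ..., p(8): [1, 1, 2, 3, 5, 7, 11, 15, 22]
```

Both sides come out as the partition numbers, as Euler's product formula predicts.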

Our proof of Proposition 3.11.36 will use the finite analogue of Proposition 3.11.36,
which is easy:

Lemma B.2.4. Let I be a finite set. If (f_i)_{i∈I} ∈ (K[[x]])^I is a family of FPSs, and if
g ∈ K[[x]] is an FPS satisfying [x^0] g = 0, then (∏_{i∈I} f_i) ∘ g = ∏_{i∈I} (f_i ∘ g).

Proof of Lemma B.2.4. This follows by a straightforward induction on | I |. (The base case
is the case when | I | = 0, and relies on the fact that 1 ◦ g = 1 for any g ∈ K [[ x ]]. The
induction step relies on Proposition 3.5.4 (b). The details are left to the reader, who
must have seen dozens of such proofs by now.)
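The two-factor case of Lemma B.2.4 can also be checked numerically on truncated FPSs. In the sketch below, the series f1, f2, g and the truncation degree N are arbitrary choices of ours (not from the text); the only hypothesis used is [x^0] g = 0:

```python
# Numerical check of (f1 * f2) o g = (f1 o g) * (f2 o g) for truncated
# power series; the concrete series are arbitrary choices.
N = 6

def mul(a, b):
    """Multiply two truncated power series (lists of coefficients)."""
    c = [0] * (N + 1)
    for m, am in enumerate(a):
        for n, bn in enumerate(b):
            if m + n <= N:
                c[m + n] += am * bn
    return c

def compose(f, g):
    """f o g for truncated series, assuming g[0] = 0 (so g^m has
    valuation >= m, and the loop below suffices modulo x^(N+1))."""
    assert g[0] == 0
    result = [0] * (N + 1)
    power = [1] + [0] * N          # g^0
    for coeff in f:
        result = [r + coeff * p for r, p in zip(result, power)]
        power = mul(power, g)      # next power of g
    return result

f1 = [1, 2, 0, 1, 0, 0, 0]         # 1 + 2x + x^3
f2 = [3, 0, 1, 0, 0, 0, 0]         # 3 + x^2
g  = [0, 1, 1, 0, 0, 0, 0]         # x + x^2  (constant term 0)

assert compose(mul(f1, f2), g) == mul(compose(f1, g), compose(f2, g))
```

The assertion is exactly the |I| = 2 case of the lemma, coefficient by coefficient up to degree N.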

Detailed proof of Proposition 3.11.36. Let (f_i)_{i∈I} ∈ (K[[x]])^I be a multipliable family of
FPSs. Let g ∈ K[[x]] be an FPS satisfying [x^0] g = 0.
We shall first show the following auxiliary claim:

Claim 1: Let n ∈ N. Let M be an x^n-approximator for (f_i)_{i∈I}. Then, the set M
determines the x^n-coefficient in the product of (f_i ∘ g)_{i∈I}.

[Proof of Claim 1: The set M is an x^n-approximator for (f_i)_{i∈I}. In other words, M is a
finite subset of I that determines the first n + 1 coefficients in the product of (f_i)_{i∈I} (by
the definition of “x^n-approximator”).
Let J be a finite subset of I satisfying M ⊆ J ⊆ I. Then, Lemma B.2.4 (applied to J
instead of I) yields

    (∏_{i∈J} f_i) ∘ g = ∏_{i∈J} (f_i ∘ g).   (395)

Also, Lemma B.2.4 (applied to M instead of I) yields

    (∏_{i∈M} f_i) ∘ g = ∏_{i∈M} (f_i ∘ g).   (396)

However, Proposition 3.11.16 (a) (applied to a_i = f_i) yields

    ∏_{i∈J} f_i ≡_{x^n} ∏_{i∈M} f_i

(where ≡_{x^n} denotes x^n-equivalence). We also have g ≡_{x^n} g (since the relation ≡_{x^n} is an
equivalence relation). Hence, Proposition 3.10.5 (applied to a = ∏_{i∈J} f_i and b = ∏_{i∈M} f_i
and c = g and d = g) yields

    (∏_{i∈J} f_i) ∘ g ≡_{x^n} (∏_{i∈M} f_i) ∘ g.

In view of (395) and (396), this rewrites as

    ∏_{i∈J} (f_i ∘ g) ≡_{x^n} ∏_{i∈M} (f_i ∘ g).

In other words,

    each m ∈ {0, 1, …, n} satisfies [x^m] (∏_{i∈J} (f_i ∘ g)) = [x^m] (∏_{i∈M} (f_i ∘ g))

(by the definition of the relation ≡_{x^n}). Applying this to m = n, we obtain

    [x^n] (∏_{i∈J} (f_i ∘ g)) = [x^n] (∏_{i∈M} (f_i ∘ g)).

Forget that we fixed J. We thus have shown that every finite subset J of I satisfying
M ⊆ J ⊆ I satisfies

    [x^n] (∏_{i∈J} (f_i ∘ g)) = [x^n] (∏_{i∈M} (f_i ∘ g)).

In other words, the set M determines the x^n-coefficient in the product of (f_i ∘ g)_{i∈I}
(by the definition of what it means to “determine the x^n-coefficient in the product of
(f_i ∘ g)_{i∈I}”). This proves Claim 1.]
Now, let n ∈ N. Lemma 3.11.15 (applied to a_i = f_i) shows that there exists an x^n-approximator
for (f_i)_{i∈I}. Consider this x^n-approximator for (f_i)_{i∈I}, and denote it by
M. Thus, M is an x^n-approximator for (f_i)_{i∈I}; in other words, M is a finite subset of
I that determines the first n + 1 coefficients in the product of (f_i)_{i∈I} (by the definition
of “x^n-approximator”). Claim 1 shows that the set M determines the x^n-coefficient in
the product of (f_i ∘ g)_{i∈I}. Hence, there is a finite subset of I that determines the x^n-coefficient
in the product of (f_i ∘ g)_{i∈I} (namely, M). In other words, the x^n-coefficient
in the product of (f_i ∘ g)_{i∈I} is finitely determined.
Forget that we fixed n. We thus have shown that for each n ∈ N, the x^n-coefficient
in the product of (f_i ∘ g)_{i∈I} is finitely determined. In other words, each coefficient in
the product of (f_i ∘ g)_{i∈I} is finitely determined. In other words, the family (f_i ∘ g)_{i∈I} is
multipliable (by the definition of “multipliable”).

It remains to prove that (∏_{i∈I} f_i) ∘ g = ∏_{i∈I} (f_i ∘ g).
In order to do so, we again fix n ∈ N. Lemma 3.11.15 (applied to a_i = f_i) shows that
there exists an x^n-approximator for (f_i)_{i∈I}. Consider this x^n-approximator for (f_i)_{i∈I},
and denote it by M. Thus, M is an x^n-approximator for (f_i)_{i∈I}; in other words, M is
a finite subset of I that determines the first n + 1 coefficients in the product of (f_i)_{i∈I}
(by the definition of “x^n-approximator”). Moreover, Proposition 3.11.16 (b) (applied to
a_i = f_i) yields

    ∏_{i∈I} f_i ≡_{x^n} ∏_{i∈M} f_i.   (397)

We also have g ≡_{x^n} g (since the relation ≡_{x^n} is an equivalence relation). Hence, Proposition
3.10.5 (applied to a = ∏_{i∈I} f_i and b = ∏_{i∈M} f_i and c = g and d = g) yields

    (∏_{i∈I} f_i) ∘ g ≡_{x^n} (∏_{i∈M} f_i) ∘ g.   (398)

However, Lemma B.2.4 (applied to M instead of I) yields

    (∏_{i∈M} f_i) ∘ g = ∏_{i∈M} (f_i ∘ g).

In view of this, we can rewrite (398) as

    (∏_{i∈I} f_i) ∘ g ≡_{x^n} ∏_{i∈M} (f_i ∘ g).

In other words,

    each m ∈ {0, 1, …, n} satisfies [x^m] ((∏_{i∈I} f_i) ∘ g) = [x^m] (∏_{i∈M} (f_i ∘ g))

(by the definition of the relation ≡_{x^n}). Applying this to m = n, we obtain

    [x^n] ((∏_{i∈I} f_i) ∘ g) = [x^n] (∏_{i∈M} (f_i ∘ g)).   (399)

However, Claim 1 shows that the set M determines the x^n-coefficient in the product
of (f_i ∘ g)_{i∈I}. Hence, the definition of the infinite product ∏_{i∈I} (f_i ∘ g) (specifically,
Definition 3.11.5 (b)) yields

    [x^n] (∏_{i∈I} (f_i ∘ g)) = [x^n] (∏_{i∈M} (f_i ∘ g)).

Comparing this with (399), we obtain

    [x^n] ((∏_{i∈I} f_i) ∘ g) = [x^n] (∏_{i∈I} (f_i ∘ g)).

Forget that we fixed n. We thus have shown that each n ∈ N satisfies

    [x^n] ((∏_{i∈I} f_i) ∘ g) = [x^n] (∏_{i∈I} (f_i ∘ g)).
 
In other words, the FPSs (∏_{i∈I} f_i) ∘ g and ∏_{i∈I} (f_i ∘ g) agree in all their coefficients. Hence,
(∏_{i∈I} f_i) ∘ g = ∏_{i∈I} (f_i ∘ g). This completes our proof of Proposition 3.11.36.
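To see Proposition 3.11.36 in action on one concrete multipliable family (our own toy example, not from the text), take f_i = 1 + x^i and g = x^2. Then (∏_{i≥1} f_i) ∘ g should equal ∏_{i≥1} (1 + x^{2i}), and modulo x^{N+1} only finitely many factors matter, so we can compare coefficient lists:

```python
# Check (prod_{i>=1} (1 + x^i)) o (x^2) = prod_{i>=1} (1 + x^(2i))
# modulo x^(N+1); our own example of Proposition 3.11.36.
N = 12

def mul(a, b):
    c = [0] * (N + 1)
    for m, am in enumerate(a):
        for n, bn in enumerate(b):
            if m + n <= N:
                c[m + n] += am * bn
    return c

def prod(factors):
    result = [1] + [0] * N
    for f in factors:
        result = mul(result, f)
    return result

def one_plus_x_power(e):
    """The FPS 1 + x^e, truncated (so it is just 1 when e > N)."""
    f = [0] * (N + 1)
    f[0] = 1
    if e <= N:
        f[e] = 1
    return f

# Left-hand side: substitute x -> x^2 into the truncated product.
P = prod(one_plus_x_power(i) for i in range(1, N + 1))
lhs = [0] * (N + 1)
for n, c in enumerate(P):
    if 2 * n <= N:
        lhs[2 * n] = c

# Right-hand side: compose factor by factor, i.e. prod_i (1 + x^(2i)).
rhs = prod(one_plus_x_power(2 * i) for i in range(1, N + 1))

assert lhs == rhs
```

Both sides have x^{2n}-coefficient equal to the number of partitions of n into distinct parts, and all odd coefficients equal to 0.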

B.3. Domino tilings


In Subsection 3.12.3, we have given a classification of faultfree domino tilings of the
rectangle Rn,3 . Let us now outline how this classification can be proved. First, we
introduce names for the faultfree domino tilings that appear in our classification:
Definition B.3.1. (a) For each even positive integer n, we let An be the following domino tiling
of Rn,3 [picture omitted; the dominos are listed formally below].
Formally speaking, this domino tiling is the set partition of Rn,3 consisting of the
following dominos:

• the horizontal dominos {(2i − 1, 1) , (2i, 1)} for all i ∈ [n/2], which fill the
bottom row of Rn,3 , and which we call the basement dominos;
• the vertical domino {(1, 2) , (1, 3)} in the first column, which we call the left
wall;
• the vertical domino {(n, 2) , (n, 3)} in the last column, which we call the right
wall;
• the horizontal dominos {(2i, 2) , (2i + 1, 2)} for all i ∈ [n/2 − 1], which fill
the middle row of Rn,3 (except for the first and last columns), and which we
call the middle dominos;
• the horizontal dominos {(2i, 3) , (2i + 1, 3)} for all i ∈ [n/2 − 1], which fill
the top row of Rn,3 (except for the first and last columns), and which we call
the top dominos.

(b) For each even positive integer n, we let Bn be the domino tiling of Rn,3 [picture
omitted] that is the reflection of the tiling An across the horizontal axis of symmetry of Rn,3.
(c) We let C denote the domino tiling of R2,3 consisting of three horizontal dominos
[picture omitted].

Our classification now can be stated as follows:

Proposition B.3.2. The faultfree domino tilings of height-3 rectangles are precisely
the tilings
A2 , A4 , A6 , A8 , . . . , B2 , B4 , B6 , B8 , . . . , C
we have just defined. More concretely:
(a) The faultfree domino tilings of a height-3 rectangle that contain a vertical
domino in the top two squares of the first column are A2 , A4 , A6 , A8 , . . ..
(b) The faultfree domino tilings of a height-3 rectangle that contain a vertical
domino in the bottom two squares of the first column are B2 , B4 , B6 , B8 , . . ..
(c) The only faultfree domino tiling of a height-3 rectangle that contains no vertical
domino in the first column is C.

Proof of Proposition B.3.2 (sketched). It clearly suffices to prove parts (a), (b) and (c). We
begin with the easiest part, which is (c):
(c) Clearly, C is a faultfree domino tiling of R2,3 that contains no vertical domino in
the first column. It remains to show that C is the only such tiling.
Indeed, let T be any faultfree domino tiling of a height-3 rectangle that contains no vertical
domino in the first column. Thus, its first column must be filled with three horizontal dominos,
which all must protrude into the second column and thus cover that second column as
well. If T had any further column, then T would have a fault between its 2-nd and its
3-rd column, which is impossible for a faultfree tiling. Thus, T must consist only of the
three horizontal dominos already mentioned. In other words, T = C. This completes
the proof of Proposition B.3.2 (c).
(a) It is straightforward to check that the tilings A2, A4, A6, A8, . . . are faultfree
(indeed, the basement dominos prevent faults between the (2i − 1)-st and (2i)-th columns,
whereas the top dominos prevent faults between the (2i)-th and (2i + 1)-st columns).
Thus, they are faultfree domino tilings of a height-3 rectangle that contain a vertical
domino in the top two squares of the first column. It remains to show that they are the
only such tilings.
Indeed, let T be any faultfree domino tiling of a height-3 rectangle that contains a
vertical domino in the top two squares of the first column. Let n be the width of this
rectangle (so that the rectangle is Rn,3 ). We shall show that n is even, and that T = An .
We know that T contains a vertical domino in the top two squares of the first column.
In other words, T contains the left wall (where we are using the terminology from
Definition B.3.1 (a)). The remaining square in the first column is the square (1, 1), and
it must thus be covered by the basement domino {(1, 1) , (2, 1)} (since no other domino
would fit). Hence, T contains the basement domino {(1, 1) , (2, 1)}.
We shall now prove the following claims:

Claim 1: For each positive integer i < n/2, the tiling T contains the basement
domino {(2i − 1, 1), (2i, 1)}, the middle domino {(2i, 2), (2i + 1, 2)}
and the top domino {(2i, 3), (2i + 1, 3)}.

Proof of Claim 1. We induct on i:



Base case: We must prove that Claim 1 holds for i = 1, provided that 1 < n/2. So
let us assume that 1 < n/2. Thus, 2 < n, so that n ≥ 3. We must prove that Claim 1
holds for i = 1; in other words, we must prove that T contains the basement domino
{(1, 1), (2, 1)}, the middle domino {(2, 2), (3, 2)} and the top domino {(2, 3), (3, 3)}.
For the basement domino, we have already proved this. For the middle and the top
domino, we argue as follows: If T had a vertical domino in the 2-nd column, then this
domino would cover the top two squares of that column (since the bottom square is
already covered by the basement domino {(1, 1), (2, 1)}), and thus T would have a
fault between the 2-nd and 3-rd columns (since the basement domino {(1, 1), (2, 1)}
also ends at the 2-nd column), which would contradict the faultfreeness of T. Hence,
T has no vertical domino in the 2-nd column. Thus, the top two squares of the 2-nd
column of T must be covered by horizontal dominos. These horizontal dominos
must both protrude into the 3-rd column (since the corresponding squares in the 1-st
column are already covered by the left wall), and thus must be the middle domino
{(2, 2), (3, 2)} and the top domino {(2, 3), (3, 3)}. Hence, we have shown that T
contains the middle domino {(2, 2), (3, 2)} and the top domino {(2, 3), (3, 3)}. This
completes the base case.
Induction step: Let j be a positive integer such that j + 1 < n/2. Assume (as the induction
hypothesis) that Claim 1 holds for i = j. In other words, T contains the basement
domino {(2j − 1, 1), (2j, 1)}, the middle domino {(2j, 2), (2j + 1, 2)} and the top
domino {(2j, 3), (2j + 1, 3)}. We must now show that Claim 1 holds for i = j + 1 as
well, i.e., that T also contains the basement domino {(2j + 1, 1), (2j + 2, 1)}, the middle
domino {(2j + 2, 2), (2j + 3, 2)} and the top domino {(2j + 2, 3), (2j + 3, 3)}.
Indeed, we first recall that j + 1 < n/2, so that 2 (j + 1) < n and therefore n >
2 (j + 1) = 2j + 2, so that n ≥ 2j + 3. This shows that the rectangle Rn,3 has a (2j + 3)-th
column (along with all columns to its left).
Now, the square (2j + 1, 1) cannot be covered by a vertical domino in T, since this
vertical domino would collide with the middle domino {(2j, 2), (2j + 1, 2)} (which
we already know to belong to T). Thus, this square must be covered by a horizontal
domino. This horizontal domino cannot protrude into the (2j)-th column (since it
would then collide with the basement domino {(2j − 1, 1), (2j, 1)}, which we already
know to belong to T), and thus must be the basement domino {(2j + 1, 1), (2j + 2, 1)}.
So we have shown that T contains the basement domino {(2j + 1, 1), (2j + 2, 1)}. It
remains to show that T also contains the middle domino {(2j + 2, 2), (2j + 3, 2)} and
the top domino {(2j + 2, 3), (2j + 3, 3)}.
If T had a vertical domino in the (2j + 2)-nd column, then this domino would
cover the top two squares of that column (since the bottom square is already covered
by the basement domino {(2j + 1, 1), (2j + 2, 1)}), and thus T would have a
fault between the (2j + 2)-nd and (2j + 3)-rd columns (since the basement domino
{(2j + 1, 1), (2j + 2, 1)} also ends at the (2j + 2)-nd column), which would contradict
the faultfreeness of T. Hence, T has no vertical domino in the (2j + 2)-nd column.
Thus, the top two squares of the (2j + 2)-nd column of T must be covered by horizontal
dominos. These horizontal dominos must both protrude into the (2j + 3)-rd column
(since the corresponding squares in the (2j + 1)-st column are already covered by
the middle domino {(2j, 2), (2j + 1, 2)} and the top domino {(2j, 3), (2j + 1, 3)}),
and thus must be the middle domino {(2j + 2, 2), (2j + 3, 2)} and the top domino
{(2j + 2, 3), (2j + 3, 3)}. Hence, we have shown that T contains the middle domino

{(2j + 2, 2), (2j + 3, 2)} and the top domino {(2j + 2, 3), (2j + 3, 3)}. This completes
the induction step.
Thus, Claim 1 is proved by induction.

Claim 2: The number n is even.

Proof of Claim 2. Assume the contrary. Thus, n is odd. But n > 1 (since R1,3 cannot be
tiled by dominos), so that n − 1 > 0. Hence, (n − 1) /2 is a positive integer (since n is
odd). Therefore, Claim 1 (applied to i = (n − 1) /2) shows that the tiling T contains the
basement domino {(n − 2, 1) , (n − 1, 1)}, the middle domino {(n − 1, 2) , (n, 2)}
and the top domino {(n − 1, 3) , (n, 3)}. But T must also include a domino that
contains the square (n, 1) (since (n, 1) ∈ Rn,3 ). This domino cannot be vertical (since
it would then collide with the basement domino {(n − 2, 1) , (n − 1, 1)}, which we
know to belong to T), and cannot be horizontal either (since it would then collide with
the middle domino {(n − 1, 2) , (n, 2)}). This is clearly absurd. This contradiction
shows that our assumption was wrong, so that Claim 2 is proved.
(Alternatively, Claim 2 can be obtained from a parity argument: Since T is a tiling of
Rn,3 by dominos, the total # of squares of Rn,3 must be even (since each domino covers
exactly 2 squares). But this total # is 3n. Thus, 3n must be even, so that n must be
even.)

Now, Claim 2 shows that n is even, so that n/2 is a positive integer. Furthermore,
we know that the tiling T contains the left wall, the first n/2 − 1 basement dominos

{(1, 1) , (2, 1)} , {(3, 1) , (4, 1)} , ..., {(n − 3, 1) , (n − 2, 1)}

(by Claim 1), all the middle dominos

{(2, 2) , (3, 2)} , {(4, 2) , (5, 2)} , ..., {(n − 2, 2) , (n − 1, 2)}

(also by Claim 1) and all the top dominos

{(2, 3) , (3, 3)} , {(4, 3) , (5, 3)} , ..., {(n − 2, 3) , (n − 1, 3)}

(also by Claim 1). This leaves only the four squares (n − 1, 1), (n, 1), (n, 2) and (n, 3)
unaccounted for, but there is only one way to tile them: namely, with the last basement
domino {(n − 1, 1) , (n, 1)} and the right wall {(n, 2) , (n, 3)}. Thus, T must contain
all the basement dominos, all the middle dominos, all the top dominos and both walls
(left and right). Since these dominos cover all the squares of Rn,3 , this entails that T
consists precisely of these dominos. In other words, T = An . The proof of Proposition
B.3.2 (a) is thus complete.
(b) Proposition B.3.2 (b) is just Proposition B.3.2 (a) reflected across the horizontal
axis of symmetry of Rn,3 .
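Proposition B.3.2 can also be corroborated by brute force for small widths. The sketch below (written for this appendix; the function names and data representation are ours) enumerates all domino tilings of Rn,3 and counts the faultfree ones; by the proposition, the counts should be 3 for n = 2 (namely A2, B2, C), 2 for even n ≥ 4 (namely An, Bn), and 0 for odd n:

```python
# Brute-force count of faultfree domino tilings of the n x 3 rectangle R_{n,3}.

def tilings(n, height=3):
    """All domino tilings of the n x height grid, as frozensets of dominos."""
    cells = [(x, y) for x in range(1, n + 1) for y in range(1, height + 1)]
    results = []

    def backtrack(uncovered, tiling):
        if not uncovered:
            results.append(frozenset(tiling))
            return
        x, y = min(uncovered)                     # first uncovered square
        for other in ((x + 1, y), (x, y + 1)):    # horizontal / vertical domino
            if other in uncovered:
                backtrack(uncovered - {(x, y), other},
                          tiling + [((x, y), other)])

    backtrack(frozenset(cells), [])
    return results

def is_faultfree(tiling, n):
    """Every vertical line between columns j and j+1 is crossed by a domino."""
    crossed = {min(x1, x2) for ((x1, y1), (x2, y2)) in tiling if x1 != x2}
    return all(j in crossed for j in range(1, n))

counts = {n: sum(is_faultfree(t, n) for t in tilings(n)) for n in range(1, 9)}
print(counts)  # by Proposition B.3.2: {1: 0, 2: 3, 3: 0, 4: 2, 5: 0, 6: 2, 7: 0, 8: 2}
```

The backtracking always covers the lexicographically smallest uncovered square, whose only possible partners are its right and upper neighbours, so every tiling is generated exactly once.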

B.4. Limits of FPSs


Let us now prove a few facts stated in Section 3.13.

Detailed proof of Lemma 3.13.7. We have lim_{i→∞} f_i = f. In other words, the sequence (f_i)_{i∈N}
coefficientwise stabilizes to f. In other words, for each n ∈ N,

    the sequence ([x^n] f_i)_{i∈N} stabilizes to [x^n] f   (400)

(by the definition of “coefficientwise stabilizing”).
Now, let n ∈ N. Furthermore, let k ∈ {0, 1, …, n}. Then, the sequence ([x^k] f_i)_{i∈N}
stabilizes to [x^k] f (by (400), applied to k instead of n). In other words, there exists
some N ∈ N such that

    all integers i ≥ N satisfy [x^k] f_i = [x^k] f

(by the definition of “stabilizing”). Let us denote this N by N_k. Thus,

    all integers i ≥ N_k satisfy [x^k] f_i = [x^k] f.   (401)

Forget that we fixed k. Thus, for each k ∈ {0, 1, …, n}, we have defined an integer
N_k ∈ N for which (401) holds. Altogether, we have thus defined n + 1 integers
N_0, N_1, …, N_n ∈ N.
Let us set P := max {N_0, N_1, …, N_n}. Then, of course, P ∈ N.
Now, let i ≥ P be an integer. Then, for each k ∈ {0, 1, . . . , n}, we have i ≥  P =
k k
max { N0 , N1 , . . . , Nn } ≥ Nk (since k ∈ {0, 1, . . . , n}) and therefore x f i = x f (by
(401)). In other words, each k ∈ {0, 1, . . . , n} satisfies x k f i = x k f . Renaming the
   

variable k as m in this result, we obtain the following:

Each m ∈ {0, 1, . . . , n} satisfies [ x m ] f i = [ x m ] f .

xn
In other words, f i ≡ f (by the definition of x n -equivalence176 ).
xn
Forget that we fixed i. We thus have shown that all integers i ≥ P satisfy f i ≡ f .
Hence, there exists some integer N ∈ N such that

xn
all integers i ≥ N satisfy f i ≡ f

(namely, N = P). This proves Lemma 3.13.7.
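The threshold $P = \max\{N_0, N_1, \ldots, N_n\}$ from this proof can be computed in a concrete case. The following Python sketch is an illustration only (the helper names `f_i` and `N_k`, and the choice of the geometric series, are assumptions of this sketch, not from the text): it models an FPS by its coefficient function, takes $f_i = \sum_{k=0}^{i} x^k$ converging to $\frac{1}{1-x}$, and finds the index from which $f_i$ agrees with the limit in all coefficients up to $x^n$.

```python
# Model an FPS by its coefficient function n -> [x^n] f.
def f(n):          # f = 1/(1 - x) = 1 + x + x^2 + ...
    return 1

def f_i(i, n):     # f_i = x^0 + x^1 + ... + x^i  (a polynomial truncation)
    return 1 if n <= i else 0

def N_k(k, search_bound=100):
    """Smallest N with [x^k] f_i = [x^k] f for all i >= N (checked up to a bound)."""
    for N in range(search_bound):
        if all(f_i(i, k) == f(k) for i in range(N, search_bound)):
            return N
    raise ValueError("no stabilization found within bound")

n = 5
P = max(N_k(k) for k in range(n + 1))
# From index P on, f_i agrees with f in all coefficients of x^0, ..., x^n:
assert all(f_i(i, m) == f(m) for i in range(P, 50) for m in range(n + 1))
```

Here each $N_k$ equals $k$ (the coefficient of $x^k$ appears in $f_i$ exactly when $i \geq k$), so $P = n$.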

Detailed proof of Proposition 3.13.8. Recall that $\lim_{i\to\infty} f_i = f$. Let $n \in \mathbb{N}$ be arbitrary. Then, Lemma 3.13.7 shows that there exists some integer $N \in \mathbb{N}$ such that

\[ \text{all integers } i \geq N \text{ satisfy } f_i \overset{x^n}{\equiv} f. \]

Let us denote this $N$ by $K$. Hence, all integers $i \geq K$ satisfy $f_i \overset{x^n}{\equiv} f$. Thus, we have found an integer $K \in \mathbb{N}$ such that

\[ \text{all integers } i \geq K \text{ satisfy } f_i \overset{x^n}{\equiv} f. \tag{402} \]

$^{176}$ i.e., Definition 3.10.1



Similarly, using $\lim_{i\to\infty} g_i = g$, we can find an integer $L \in \mathbb{N}$ such that

\[ \text{all integers } i \geq L \text{ satisfy } g_i \overset{x^n}{\equiv} g. \tag{403} \]

Consider these $K$ and $L$. Let us furthermore set $P := \max\{K, L\}$. Thus, $P \in \mathbb{N}$.

We shall now show that each integer $i \geq P$ satisfies $[x^n](f_i g_i) = [x^n](fg)$.

Indeed, let $i \geq P$ be an integer. Then, (402) yields $f_i \overset{x^n}{\equiv} f$ (since $i \geq P = \max\{K, L\} \geq K$), whereas (403) yields $g_i \overset{x^n}{\equiv} g$ (since $i \geq P = \max\{K, L\} \geq L$). Hence, we obtain

\[ f_i g_i \overset{x^n}{\equiv} fg \]

(by (101), applied to $a = f_i$, $b = f$, $c = g_i$ and $d = g$). In other words,

\[ \text{each } m \in \{0, 1, \ldots, n\} \text{ satisfies } [x^m](f_i g_i) = [x^m](fg) \]

(by the definition of $x^n$-equivalence). Applying this to $m = n$, we find

\[ [x^n](f_i g_i) = [x^n](fg). \]

Forget that we fixed $i$. We thus have shown that all integers $i \geq P$ satisfy $[x^n](f_i g_i) = [x^n](fg)$. Hence, there exists some $N \in \mathbb{N}$ such that

\[ \text{all integers } i \geq N \text{ satisfy } [x^n](f_i g_i) = [x^n](fg) \]

(namely, $N = P$). In other words,

\[ \text{the sequence } ([x^n](f_i g_i))_{i\in\mathbb{N}} \text{ stabilizes to } [x^n](fg) \]

(by the definition of "stabilizes").

Forget that we fixed $n$. We thus have shown that for each $n \in \mathbb{N}$, the sequence $([x^n](f_i g_i))_{i\in\mathbb{N}}$ stabilizes to $[x^n](fg)$. In other words, the sequence $(f_i g_i)_{i\in\mathbb{N}}$ coefficientwise stabilizes to $fg$ (by the definition of "coefficientwise stabilizing"). In other words, $\lim_{i\to\infty}(f_i g_i) = fg$.

The same argument (but using (99) instead of (101)) shows that $\lim_{i\to\infty}(f_i + g_i) = f + g$. Thus, the proof of Proposition 3.13.8 is complete.
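Proposition 3.13.8 can be observed on a small computation. The following Python sketch is illustrative only (the helpers `mul` and `coeff` and the choice $f = g = \frac{1}{1-x}$ are assumptions of this sketch, not from the text): the $x^n$-coefficients of the products $f_i g_i$ of the truncations stabilize to those of $fg = \frac{1}{(1-x)^2}$, whose $x^n$-coefficient is $n + 1$.

```python
def mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            r[a + b] += pa * qb
    return r

def coeff(p, n):
    return p[n] if n < len(p) else 0

# f_i = g_i = 1 + x + ... + x^i, converging to 1/(1-x);
# the product fg = 1/(1-x)^2 has [x^n](fg) = n + 1.
n = 7
values = [coeff(mul([1] * (i + 1), [1] * (i + 1)), n) for i in range(20)]
# The sequence ([x^n](f_i g_i))_i stabilizes to n + 1 once i >= n:
assert all(v == n + 1 for v in values[n:])
```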

Detailed proof of Proposition 3.13.11. First, let us show that $g$ is invertible:

Claim 1: The FPS $g \in K[[x]]$ is invertible.

Proof of Claim 1. We have assumed that $\lim_{i\to\infty} g_i = g$. In other words, the sequence $(g_i)_{i\in\mathbb{N}}$ coefficientwise stabilizes to $g$. In other words, for each $n \in \mathbb{N}$, the sequence $([x^n] g_i)_{i\in\mathbb{N}}$ stabilizes to $[x^n] g$ (by the definition of "coefficientwise stabilizing"). Applying this to $n = 0$, we see that the sequence $([x^0] g_i)_{i\in\mathbb{N}}$ stabilizes to $[x^0] g$. In other words, there exists some $N \in \mathbb{N}$ such that

\[ \text{all integers } i \geq N \text{ satisfy } [x^0] g_i = [x^0] g \]

(by the definition of "stabilizes"). Consider this $N$. Thus, all integers $i \geq N$ satisfy $[x^0] g_i = [x^0] g$. Applying this to $i = N$, we obtain $[x^0] g_N = [x^0] g$.

However, each FPS $g_i$ is invertible (by assumption). Hence, in particular, the FPS $g_N$ is invertible. By Proposition 3.3.7 (applied to $a = g_N$), this entails that its constant term $[x^0] g_N$ is invertible in $K$. In other words, $[x^0] g$ is invertible in $K$ (since $[x^0] g_N = [x^0] g$).

But the FPS $g$ is invertible if and only if its constant term $[x^0] g$ is invertible in $K$ (by Proposition 3.3.7, applied to $a = g$). Hence, $g$ is invertible (since its constant term $[x^0] g$ is invertible in $K$). This proves Claim 1.
It remains to prove that $\lim_{i\to\infty} \frac{f_i}{g_i} = \frac{f}{g}$.

Recall that $\lim_{i\to\infty} f_i = f$. Let $n \in \mathbb{N}$ be arbitrary. Then, Lemma 3.13.7 shows that there exists some integer $N \in \mathbb{N}$ such that all integers $i \geq N$ satisfy $f_i \overset{x^n}{\equiv} f$. Let us denote this $N$ by $K$. Thus, we have found an integer $K \in \mathbb{N}$ such that

\[ \text{all integers } i \geq K \text{ satisfy } f_i \overset{x^n}{\equiv} f. \tag{404} \]

Similarly, using $\lim_{i\to\infty} g_i = g$, we can find an integer $L \in \mathbb{N}$ such that

\[ \text{all integers } i \geq L \text{ satisfy } g_i \overset{x^n}{\equiv} g. \tag{405} \]

Consider these $K$ and $L$. Let us furthermore set $P := \max\{K, L\}$. Thus, $P \in \mathbb{N}$.

We shall now show that each integer $i \geq P$ satisfies $[x^n] \frac{f_i}{g_i} = [x^n] \frac{f}{g}$.

Indeed, let $i \geq P$ be an integer. Then, (404) yields $f_i \overset{x^n}{\equiv} f$ (since $i \geq P = \max\{K, L\} \geq K$), whereas (405) yields $g_i \overset{x^n}{\equiv} g$ (since $i \geq P = \max\{K, L\} \geq L$). Since both FPSs $g_i$ and $g$ are invertible (the former by assumption, the latter by Claim 1), we thus obtain

\[ \frac{f_i}{g_i} \overset{x^n}{\equiv} \frac{f}{g} \]

(by Theorem 3.10.3 (e), applied to $a = f_i$, $b = f$, $c = g_i$ and $d = g$). In other words,

\[ \text{each } m \in \{0, 1, \ldots, n\} \text{ satisfies } [x^m] \frac{f_i}{g_i} = [x^m] \frac{f}{g} \]

(by the definition of $x^n$-equivalence). Applying this to $m = n$, we find

\[ [x^n] \frac{f_i}{g_i} = [x^n] \frac{f}{g}. \]

Forget that we fixed $i$. We thus have shown that all integers $i \geq P$ satisfy $[x^n] \frac{f_i}{g_i} = [x^n] \frac{f}{g}$. Hence, there exists some $N \in \mathbb{N}$ such that

\[ \text{all integers } i \geq N \text{ satisfy } [x^n] \frac{f_i}{g_i} = [x^n] \frac{f}{g} \]

(namely, $N = P$). In other words,

\[ \text{the sequence } \left( [x^n] \frac{f_i}{g_i} \right)_{i\in\mathbb{N}} \text{ stabilizes to } [x^n] \frac{f}{g} \]

(by the definition of "stabilizes").

Forget that we fixed $n$. We thus have shown that for each $n \in \mathbb{N}$, the sequence $\left( [x^n] \frac{f_i}{g_i} \right)_{i\in\mathbb{N}}$ stabilizes to $[x^n] \frac{f}{g}$. In other words, the sequence $\left( \frac{f_i}{g_i} \right)_{i\in\mathbb{N}}$ coefficientwise stabilizes to $\frac{f}{g}$ (by the definition of "coefficientwise stabilizing"). In other words, $\lim_{i\to\infty} \frac{f_i}{g_i} = \frac{f}{g}$. This proves Proposition 3.13.11 (since we already have shown that $g$ is invertible).

Proof of Theorem 3.13.14. The family $(f_n)_{n\in\mathbb{N}}$ is summable. In other words, the family $(f_k)_{k\in\mathbb{N}}$ is summable (since this is the same family as $(f_n)_{n\in\mathbb{N}}$). In other words, for each $n \in \mathbb{N}$,

\[ \text{all but finitely many } k \in \mathbb{N} \text{ satisfy } [x^n] f_k = 0 \tag{406} \]

(by the definition of "summable").

Let us define

\[ g_i := \sum_{k=0}^{i} f_k \qquad \text{for each } i \in \mathbb{N}. \tag{407} \]

Let us furthermore set

\[ g := \sum_{k\in\mathbb{N}} f_k. \tag{408} \]

Now, fix $n \in \mathbb{N}$. Then, all but finitely many $k \in \mathbb{N}$ satisfy $[x^n] f_k = 0$ (by (406)). In other words, there exists a finite subset $J$ of $\mathbb{N}$ such that

\[ \text{all } k \in \mathbb{N} \setminus J \text{ satisfy } [x^n] f_k = 0. \tag{409} \]

Consider this subset $J$. The set $J$ is a finite set of nonnegative integers, and thus has an upper bound (since any finite set of nonnegative integers has an upper bound). In other words, there exists some $m \in \mathbb{N}$ such that

\[ \text{all } k \in J \text{ satisfy } k \leq m. \tag{410} \]

Consider this $m$.

Let $k \in \mathbb{N}$ be such that $k \geq m + 1$. If we had $k \in J$, then we would have $k \leq m$ (by (410)), which would contradict $k \geq m + 1 > m$. Thus, we cannot have $k \in J$. Hence, $k \in \mathbb{N} \setminus J$ (since $k \in \mathbb{N}$ but not $k \in J$). Therefore, (409) yields $[x^n] f_k = 0$.

Forget that we fixed $k$. We thus have shown that

\[ [x^n] f_k = 0 \qquad \text{for each } k \in \mathbb{N} \text{ satisfying } k \geq m + 1. \tag{411} \]



Now, let $i$ be an integer such that $i \geq m$. Then, $i \in \mathbb{N}$ (since $i \geq m \geq 0$) and $i + 1 \geq m + 1$ (since $i \geq m$). From $g = \sum_{k\in\mathbb{N}} f_k$, we obtain

\[ [x^n] g = [x^n] \left( \sum_{k\in\mathbb{N}} f_k \right) = \sum_{k\in\mathbb{N}} [x^n] f_k \qquad \text{(by (27))} \]

\[ = \sum_{k=0}^{i} [x^n] f_k + \underbrace{\sum_{k=i+1}^{\infty} [x^n] f_k}_{\substack{=0 \\ \text{(by (411), since } k \geq i+1 \geq m+1\text{)}}} = \sum_{k=0}^{i} [x^n] f_k. \tag{412} \]

On the other hand, from $g_i = \sum_{k=0}^{i} f_k$, we obtain

\[ [x^n] g_i = [x^n] \left( \sum_{k=0}^{i} f_k \right) = \sum_{k=0}^{i} [x^n] f_k. \]

Comparing this with (412), we obtain $[x^n] g_i = [x^n] g$.


Forget that we fixed i. We thus have shown that

all integers i ≥ m satisfy [ x n ] gi = [ x n ] g.

Hence, there exists some N ∈ N such that

all integers i ≥ N satisfy [ x n ] gi = [ x n ] g

(namely, N = m). In other words, the sequence ([ x n ] gi )i∈N stabilizes to [ x n ] g (by the
definition of “stabilizes”).
Forget that we fixed n. We thus have shown that for each n ∈ N,

the sequence ([ x n ] gi )i∈N stabilizes to [ x n ] g.

In other words, the sequence $(g_i)_{i\in\mathbb{N}}$ coefficientwise stabilizes to $g$. In other words, $\lim_{i\to\infty} g_i = g$. In view of (407) and (408), we can rewrite this as

\[ \lim_{i\to\infty} \sum_{k=0}^{i} f_k = \sum_{k\in\mathbb{N}} f_k. \]

Renaming the summation index $k$ as $n$ everywhere in this equality, we can rewrite it as

\[ \lim_{i\to\infty} \sum_{n=0}^{i} f_n = \sum_{n\in\mathbb{N}} f_n. \]

This proves Theorem 3.13.14.
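A concrete instance of Theorem 3.13.14 (illustrative only; the family $f_k = (k+1)x^k$ and the helper names `f` and `g_i` are chosen here, not taken from the text): since $[x^n] f_k = 0$ unless $k = n$, the family is summable, and the coefficients of the partial sums stabilize to those of $\sum_{k\in\mathbb{N}} (k+1)x^k = \frac{1}{(1-x)^2}$.

```python
# A summable family: f_k = (k+1) * x^k, so [x^n] f_k = 0 unless k = n.
def f(k, n):                 # [x^n] f_k
    return k + 1 if n == k else 0

def g_i(i, n):               # [x^n] (f_0 + f_1 + ... + f_i)
    return sum(f(k, n) for k in range(i + 1))

n = 4
# With m = n playing the role of the bound from (410),
# [x^n] g_i = [x^n] g = n + 1 holds for all i >= m:
assert all(g_i(i, n) == n + 1 for i in range(n, 30))
```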



Proof of Theorem 3.13.15. The family $(f_n)_{n\in\mathbb{N}}$ is multipliable. In other words, the family $(f_k)_{k\in\mathbb{N}}$ is multipliable (since this is the same family as $(f_n)_{n\in\mathbb{N}}$). In other words, each coefficient in the product of this family is finitely determined (by the definition of "multipliable").

Let us define

\[ g_i := \prod_{k=0}^{i} f_k \qquad \text{for each } i \in \mathbb{N}. \tag{413} \]

Let us furthermore set

\[ g := \prod_{k\in\mathbb{N}} f_k. \tag{414} \]

Now, fix $n \in \mathbb{N}$. Then, the $x^n$-coefficient in the product of the family $(f_k)_{k\in\mathbb{N}}$ is finitely determined (since each coefficient in the product of this family is finitely determined). In other words, there is a finite subset $M$ of $\mathbb{N}$ that determines the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$ (by the definition of "finitely determined"). Consider this subset $M$.

The set $M$ is a finite set of nonnegative integers, and thus has an upper bound (since any finite set of nonnegative integers has an upper bound). In other words, there exists some $m \in \mathbb{N}$ such that

\[ \text{all } k \in M \text{ satisfy } k \leq m. \tag{415} \]

Consider this $m$.

Now, let $i$ be an integer such that $i \geq m$. Then, $i \in \mathbb{N}$ (since $i \geq m \geq 0$) and $m \leq i$. From (415), we see that all $k \in M$ satisfy $k \leq m \leq i$ and therefore $k \in \{0, 1, \ldots, i\}$. In other words, $M \subseteq \{0, 1, \ldots, i\}$.

However, the set $M$ determines the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$. In other words, every finite subset $J$ of $\mathbb{N}$ satisfying $M \subseteq J \subseteq \mathbb{N}$ satisfies

\[ [x^n] \left( \prod_{k\in J} f_k \right) = [x^n] \left( \prod_{k\in M} f_k \right) \]

(by the definition of "determines the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$"). Applying this to $J = \{0, 1, \ldots, i\}$, we obtain

\[ [x^n] \left( \prod_{k\in\{0,1,\ldots,i\}} f_k \right) = [x^n] \left( \prod_{k\in M} f_k \right) \tag{416} \]

(since $\{0, 1, \ldots, i\}$ is a finite subset of $\mathbb{N}$ satisfying $M \subseteq \{0, 1, \ldots, i\} \subseteq \mathbb{N}$).

Moreover, the definition of the product of the multipliable family $(f_k)_{k\in\mathbb{N}}$ (Definition 3.11.5 (b)) shows that

\[ [x^n] \left( \prod_{k\in\mathbb{N}} f_k \right) = [x^n] \left( \prod_{k\in M} f_k \right) \]

(since $M$ is a finite subset of $\mathbb{N}$ that determines the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$). Comparing this with (416), we find

\[ [x^n] \left( \prod_{k\in\{0,1,\ldots,i\}} f_k \right) = [x^n] \left( \prod_{k\in\mathbb{N}} f_k \right). \]

In view of $\prod_{k\in\{0,1,\ldots,i\}} f_k = \prod_{k=0}^{i} f_k = g_i$ (by (413)) and $\prod_{k\in\mathbb{N}} f_k = g$ (by (414)), we can rewrite this as

\[ [x^n] g_i = [x^n] g. \]

Forget that we fixed $i$. We thus have shown that

\[ \text{all integers } i \geq m \text{ satisfy } [x^n] g_i = [x^n] g. \]

Hence, there exists some $N \in \mathbb{N}$ such that

\[ \text{all integers } i \geq N \text{ satisfy } [x^n] g_i = [x^n] g \]

(namely, $N = m$). In other words, the sequence $([x^n] g_i)_{i\in\mathbb{N}}$ stabilizes to $[x^n] g$ (by the definition of "stabilizes").

Forget that we fixed $n$. We thus have shown that for each $n \in \mathbb{N}$,

\[ \text{the sequence } ([x^n] g_i)_{i\in\mathbb{N}} \text{ stabilizes to } [x^n] g. \]

In other words, the sequence $(g_i)_{i\in\mathbb{N}}$ coefficientwise stabilizes to $g$. In other words, $\lim_{i\to\infty} g_i = g$. In view of (413) and (414), we can rewrite this as

\[ \lim_{i\to\infty} \prod_{k=0}^{i} f_k = \prod_{k\in\mathbb{N}} f_k. \]

Renaming the product index $k$ as $n$ everywhere in this equality, we can rewrite it as

\[ \lim_{i\to\infty} \prod_{n=0}^{i} f_n = \prod_{n\in\mathbb{N}} f_n. \]

This proves Theorem 3.13.15.
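A concrete instance of Theorem 3.13.15 (an illustration with assumed helpers `mul`, `g_i` and `coeff`; not from the text): the family $f_k = 1 + x^{k+1}$ is multipliable, since factors with $k + 1 > n$ cannot affect the $x^n$-coefficient, so each coefficient of the product is finitely determined. Here $[x^6]$ of the full product $\prod_{k\in\mathbb{N}} (1 + x^{k+1})$ counts the partitions of $6$ into distinct parts, of which there are $4$.

```python
def mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            r[a + b] += pa * qb
    return r

def g_i(i):
    """Partial product (1 + x^1)(1 + x^2) ... (1 + x^(i+1)) as a coefficient list."""
    p = [1]
    for k in range(i + 1):
        p = mul(p, [1] + [0] * k + [1])   # the factor 1 + x^(k+1)
    return p

def coeff(p, n):
    return p[n] if n < len(p) else 0

n = 6
values = [coeff(g_i(i), n) for i in range(15)]
# The sequence ([x^n] g_i)_i stabilizes (here to 4, the number of
# partitions of 6 into distinct parts: 6, 5+1, 4+2, 3+2+1):
assert all(v == 4 for v in values[n:])
```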

Proof of Corollary 3.13.16. Let an FPS $a \in K[[x]]$ be written in the form $a = \sum_{n\in\mathbb{N}} a_n x^n$ with $a_n \in K$. Then, Corollary 3.2.18 shows that the family $(a_n x^n)_{n\in\mathbb{N}}$ is summable. Hence, Theorem 3.13.14 (applied to $f_n = a_n x^n$) yields

\[ \lim_{i\to\infty} \sum_{n=0}^{i} a_n x^n = \sum_{n\in\mathbb{N}} a_n x^n = a. \]

In other words, $a = \lim_{i\to\infty} \sum_{n=0}^{i} a_n x^n$. This proves Corollary 3.13.16.

Proof of Theorem 3.13.17. Let us define

\[ g_i := \sum_{k=0}^{i} f_k \qquad \text{for each } i \in \mathbb{N}. \tag{417} \]

We have assumed that the limit $\lim_{i\to\infty} \sum_{n=0}^{i} f_n$ exists. Let us denote this limit by $g$. Thus,

\[ g = \lim_{i\to\infty} \sum_{n=0}^{i} f_n = \lim_{i\to\infty} \sum_{k=0}^{i} f_k = \lim_{i\to\infty} g_i \tag{418} \]

(here, we have renamed the summation index $n$ as $k$, and then used (417)).

In other words, the sequence $(g_i)_{i\in\mathbb{N}}$ coefficientwise stabilizes to $g$. In other words, for each $n \in \mathbb{N}$,

\[ \text{the sequence } ([x^n] g_i)_{i\in\mathbb{N}} \text{ stabilizes to } [x^n] g. \tag{419} \]

Let $n \in \mathbb{N}$. Then, the sequence $([x^n] g_i)_{i\in\mathbb{N}}$ stabilizes to $[x^n] g$ (by (419)). In other words, there exists some $N \in \mathbb{N}$ such that

\[ \text{all integers } i \geq N \text{ satisfy } [x^n] g_i = [x^n] g. \tag{420} \]

Consider this $N$. Let $M$ be the set $\{0, 1, \ldots, N\}$; this is a finite subset of $\mathbb{N}$.

Let $i \in \mathbb{N} \setminus M$. Thus,

\[ i \in \mathbb{N} \setminus M = \mathbb{N} \setminus \{0, 1, \ldots, N\} = \{N+1, N+2, N+3, \ldots\} \]

(since $M = \{0, 1, \ldots, N\}$). In other words, $i$ is a nonnegative integer with $i \geq N + 1$. Hence, $i - 1 \geq N$. Therefore, (420) (applied to $i - 1$ instead of $i$) yields $[x^n] g_{i-1} = [x^n] g$. But (420) also yields $[x^n] g_i = [x^n] g$ (since $i \geq N + 1 \geq N$). Comparing these two equalities, we find

\[ [x^n] g_{i-1} = [x^n] g_i. \tag{421} \]

However, (417) yields

\[ g_i = \sum_{k=0}^{i} f_k = f_i + \sum_{k=0}^{i-1} f_k. \tag{422} \]

Moreover, (417) (applied to $i - 1$ instead of $i$) yields $g_{i-1} = \sum_{k=0}^{i-1} f_k$. In view of this, we can rewrite (422) as

\[ g_i = f_i + g_{i-1}. \]

Hence, $[x^n] g_i = [x^n](f_i + g_{i-1}) = [x^n] f_i + [x^n] g_{i-1} = [x^n] f_i + [x^n] g_i$ (by (421)). Subtracting $[x^n] g_i$ from both sides of this equality, we find $0 = [x^n] f_i$. In other words, $[x^n] f_i = 0$.

Forget that we fixed $i$. We thus have shown that all $i \in \mathbb{N} \setminus M$ satisfy $[x^n] f_i = 0$. Hence, all but finitely many $i \in \mathbb{N}$ satisfy $[x^n] f_i = 0$ (since all but finitely many $i \in \mathbb{N}$ belong to $\mathbb{N} \setminus M$ (because the set $M$ is finite)). Renaming the index $i$ as $k$ in this statement, we obtain the following: All but finitely many $k \in \mathbb{N}$ satisfy $[x^n] f_k = 0$.

Forget that we fixed $n$. We thus have shown that for each $n \in \mathbb{N}$,

\[ \text{all but finitely many } k \in \mathbb{N} \text{ satisfy } [x^n] f_k = 0. \]



In other words, the family $(f_n)_{n\in\mathbb{N}}$ is summable (by the definition of "summable"). Hence, Theorem 3.13.14 yields $\lim_{i\to\infty} \sum_{n=0}^{i} f_n = \sum_{n\in\mathbb{N}} f_n$. In other words, $\sum_{n\in\mathbb{N}} f_n = \lim_{i\to\infty} \sum_{n=0}^{i} f_n$. This completes the proof of Theorem 3.13.17.

Proof of Theorem 3.13.18. Let us define

\[ g_i := \prod_{k=0}^{i} f_k \qquad \text{for each } i \in \mathbb{N}. \tag{423} \]

We have assumed that the limit $\lim_{i\to\infty} \prod_{n=0}^{i} f_n$ exists. Let us denote this limit by $g$. Thus,

\[ g = \lim_{i\to\infty} \prod_{n=0}^{i} f_n = \lim_{i\to\infty} \prod_{k=0}^{i} f_k = \lim_{i\to\infty} g_i \tag{424} \]

(here, we have renamed the product index $n$ as $k$, and then used (423)).

Let $n \in \mathbb{N}$ be arbitrary. Lemma 3.13.7 (applied to $g_i$ and $g$ instead of $f_i$ and $f$) shows that there exists some integer $N \in \mathbb{N}$ such that

\[ \text{all integers } i \geq N \text{ satisfy } g_i \overset{x^n}{\equiv} g \tag{425} \]

(since $g = \lim_{i\to\infty} g_i$). Consider this $N$.

Let $M = \{0, 1, \ldots, N\}$. Thus, $M$ is a finite subset of $\mathbb{N}$. We shall show that this subset $M$ determines the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$.

For this purpose, we first observe that $M = \{0, 1, \ldots, N\}$, so that

\[ \prod_{k\in M} f_k = \prod_{k\in\{0,1,\ldots,N\}} f_k = \prod_{k=0}^{N} f_k = g_N \tag{426} \]

(since $g_N$ was defined to be $\prod_{k=0}^{N} f_k$). Applying (425) to $i = N$, we obtain $g_N \overset{x^n}{\equiv} g$ (since $N \geq N$). Therefore, $g \overset{x^n}{\equiv} g_N$ (since $\overset{x^n}{\equiv}$ is an equivalence relation and thus symmetric).

Next, we shall show two claims:

Claim 1: For each integer $j > N$, we have $g \overset{x^n}{\equiv} g f_j$.

Proof of Claim 1. Let $j > N$ be an integer. Thus, $j \geq N + 1$, so that $j - 1 \geq N$. Hence, (425) (applied to $i = j - 1$) yields $g_{j-1} \overset{x^n}{\equiv} g$. However, (425) (applied to $i = j$) yields $g_j \overset{x^n}{\equiv} g$ (since $j \geq N + 1 \geq N$). Thus, $g \overset{x^n}{\equiv} g_j$ (since $\overset{x^n}{\equiv}$ is an equivalence relation and thus symmetric).

Also, we obviously have $f_j \overset{x^n}{\equiv} f_j$ (since $\overset{x^n}{\equiv}$ is an equivalence relation and thus reflexive).

However, the definition of $g_{j-1}$ yields $g_{j-1} = \prod_{k=0}^{j-1} f_k$. Furthermore, the definition of $g_j$ yields

\[ g_j = \prod_{k=0}^{j} f_k = \left( \prod_{k=0}^{j-1} f_k \right) f_j = g_{j-1} f_j. \]

But recall that $g_{j-1} \overset{x^n}{\equiv} g$ and $f_j \overset{x^n}{\equiv} f_j$. Hence, $g_{j-1} f_j \overset{x^n}{\equiv} g f_j$ (by (101), applied to $a = g_{j-1}$, $b = g$, $c = f_j$ and $d = f_j$). In view of $g_j = g_{j-1} f_j$, we can rewrite this as $g_j \overset{x^n}{\equiv} g f_j$.

Now, recall that $g \overset{x^n}{\equiv} g_j$. Thus, $g \overset{x^n}{\equiv} g_j \overset{x^n}{\equiv} g f_j$, so that $g \overset{x^n}{\equiv} g f_j$ (since $\overset{x^n}{\equiv}$ is an equivalence relation and thus transitive). This proves Claim 1.

Claim 2: If $U$ is any finite subset of $\mathbb{N}$ satisfying $M \subseteq U$, then $g \overset{x^n}{\equiv} \prod_{k\in U} f_k$.

Proof of Claim 2. We induct on the nonnegative integer $|U \setminus M|$:

Base case: Let us check that Claim 2 holds when $|U \setminus M| = 0$.

Indeed, let $U$ be any finite subset of $\mathbb{N}$ satisfying $M \subseteq U$ and $|U \setminus M| = 0$. Then, $U \setminus M = \varnothing$ (since $|U \setminus M| = 0$), so that $U \subseteq M$. Combined with $M \subseteq U$, this yields $U = M$. Thus, $\prod_{k\in U} f_k = \prod_{k\in M} f_k = g_N$ (by (426)). But we have already proved that $g \overset{x^n}{\equiv} g_N$. In other words, $g \overset{x^n}{\equiv} \prod_{k\in U} f_k$ (since $\prod_{k\in U} f_k = g_N$). Thus, Claim 2 is proved under the assumption that $|U \setminus M| = 0$. This completes the base case.

Induction step: Let $s \in \mathbb{N}$. Assume (as the induction hypothesis) that Claim 2 holds when $|U \setminus M| = s$. We must now prove that Claim 2 holds when $|U \setminus M| = s + 1$.

Indeed, let $U$ be any finite subset of $\mathbb{N}$ satisfying $M \subseteq U$ and $|U \setminus M| = s + 1$. Then, the set $U \setminus M$ is nonempty (since $|U \setminus M| = s + 1 > s \geq 0$). Hence, there exists some $u \in U \setminus M$. Consider this $u$. From $u \in U \setminus M$, we obtain $|(U \setminus M) \setminus \{u\}| = |U \setminus M| - 1 = s$ (since $|U \setminus M| = s + 1$). In view of $(U \setminus M) \setminus \{u\} = (U \setminus \{u\}) \setminus M$, we can rewrite this as $|(U \setminus \{u\}) \setminus M| = s$. Moreover, $U \setminus \{u\}$ is a finite subset of $\mathbb{N}$ (since $U$ is a finite subset of $\mathbb{N}$), and satisfies $M \subseteq U \setminus \{u\}$ (since $u \in U \setminus M$ and thus $u \notin M$ and therefore $M \setminus \{u\} = M$, so that $M = M \setminus \{u\} \subseteq U \setminus \{u\}$). Hence, by the induction hypothesis, we can apply Claim 2 to $U \setminus \{u\}$ instead of $U$ (since $|(U \setminus \{u\}) \setminus M| = s$). As a result, we obtain $g \overset{x^n}{\equiv} \prod_{k\in U\setminus\{u\}} f_k$.

However, $u \in U \setminus M \subseteq U$, so that $\prod_{k\in U} f_k = \left( \prod_{k\in U\setminus\{u\}} f_k \right) f_u$ (here, we have split off the factor for $k = u$ from the product).

We have $g \overset{x^n}{\equiv} \prod_{k\in U\setminus\{u\}} f_k$ (as we have proved above) and $f_u \overset{x^n}{\equiv} f_u$ (since $\overset{x^n}{\equiv}$ is an equivalence relation and thus reflexive). Hence, $g f_u \overset{x^n}{\equiv} \left( \prod_{k\in U\setminus\{u\}} f_k \right) f_u$ (by (101), applied to $a = g$, $b = \prod_{k\in U\setminus\{u\}} f_k$, $c = f_u$ and $d = f_u$). In view of $\prod_{k\in U} f_k = \left( \prod_{k\in U\setminus\{u\}} f_k \right) f_u$, we can rewrite this as $g f_u \overset{x^n}{\equiv} \prod_{k\in U} f_k$.

But $u \in U \setminus M \subseteq \mathbb{N} \setminus \{0, 1, \ldots, N\} = \{N+1, N+2, N+3, \ldots\}$ (since $M = \{0, 1, \ldots, N\}$), so that $u \geq N + 1 > N$. Hence, Claim 1 (applied to $j = u$) yields $g \overset{x^n}{\equiv} g f_u$. Therefore, $g \overset{x^n}{\equiv} g f_u \overset{x^n}{\equiv} \prod_{k\in U} f_k$. Therefore, $g \overset{x^n}{\equiv} \prod_{k\in U} f_k$ (since $\overset{x^n}{\equiv}$ is an equivalence relation and thus transitive).

Forget that we fixed $U$. We thus have shown that if $U$ is any finite subset of $\mathbb{N}$ satisfying $M \subseteq U$ and $|U \setminus M| = s + 1$, then $g \overset{x^n}{\equiv} \prod_{k\in U} f_k$. In other words, Claim 2 holds when $|U \setminus M| = s + 1$. This completes the induction step. Thus, Claim 2 is proved by induction.
Now, let $J$ be a finite subset of $\mathbb{N}$ satisfying $M \subseteq J \subseteq \mathbb{N}$. Then, $g \overset{x^n}{\equiv} \prod_{k\in J} f_k$ (by Claim 2, applied to $U = J$). In other words,

\[ \text{each } m \in \{0, 1, \ldots, n\} \text{ satisfies } [x^m] g = [x^m] \left( \prod_{k\in J} f_k \right) \]

(by the definition of $x^n$-equivalence). Applying this to $m = n$, we find

\[ [x^n] g = [x^n] \left( \prod_{k\in J} f_k \right). \]

The same argument can be applied to $M$ instead of $J$ (since $M \subseteq M \subseteq \mathbb{N}$), and thus yields

\[ [x^n] g = [x^n] \left( \prod_{k\in M} f_k \right). \]

Comparing these two equalities, we find

\[ [x^n] \left( \prod_{k\in J} f_k \right) = [x^n] \left( \prod_{k\in M} f_k \right). \]

Forget that we fixed $J$. We have now shown that every finite subset $J$ of $\mathbb{N}$ satisfying $M \subseteq J \subseteq \mathbb{N}$ satisfies

\[ [x^n] \left( \prod_{k\in J} f_k \right) = [x^n] \left( \prod_{k\in M} f_k \right). \]

In other words, the set $M$ determines the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$ (by the definition of "determines the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$"). Hence, the $x^n$-coefficient in the product of $(f_k)_{k\in\mathbb{N}}$ is finitely determined (by the definition of "finitely determined", since $M$ is a finite subset of $\mathbb{N}$).

Forget that we fixed n. We thus have shown that for each n ∈ N, the x n -coefficient
in the product of ( f k )k∈N is finitely determined. In other words, each coefficient in
the product of ( f k )k∈N is finitely determined. In other words, the family ( f k )k∈N is
multipliable (by the definition of “multipliable”). In other words, the family ( f n )n∈N is
multipliable (since this is the same family as ( f k )k∈N ).
Hence, Theorem 3.13.15 yields $\lim_{i\to\infty} \prod_{n=0}^{i} f_n = \prod_{n\in\mathbb{N}} f_n$. In other words, $\prod_{n\in\mathbb{N}} f_n = \lim_{i\to\infty} \prod_{n=0}^{i} f_n$. This completes the proof of Theorem 3.13.18.

B.5. Laurent power series


Next, we shall prove some claims made in Section 3.14. We shall not go into too many
details here, since the proofs are mostly analogous to the corresponding proofs in the
theory of “usual” FPSs (with nonnegative exponents), which we have already seen.
However, every once in a while, some minor change is required; we will mainly focus
on these changes.

Proof of Theorem 3.14.7 (sketched). First, we need to prove that the set $K[x^{\pm}]$ is closed under addition, under scaling, and under the multiplication introduced in Definition 3.14.6. The proof of this is analogous to Theorem 3.4.2 (with the obvious changes, such as replacing the $\sum_{i=0}^{n}$ sums by $\sum_{i\in\mathbb{Z}}$ sums).

Next, we need to prove that the multiplication on $K[x^{\pm}]$ is associative, commutative, distributive and $K$-bilinear. This is analogous to the corresponding parts of Theorem 3.2.6. Likewise, we can show that the element $(\delta_{i,0})_{i\in\mathbb{Z}}$ (which is our analogue of the FPS $1$) is a neutral element for this multiplication. Hence, $K[x^{\pm}]$ is a commutative $K$-algebra with unity $(\delta_{i,0})_{i\in\mathbb{Z}}$.

It remains to show that the element $x$ is invertible in this $K$-algebra. But this is easy: Set $\overline{x} := (\delta_{i,-1})_{i\in\mathbb{Z}}$, and show (by direct calculation) that $x\overline{x} = \overline{x}x = 1$. (For example, let us prove that $x\overline{x} = 1$. Indeed, from $x = (\delta_{i,1})_{i\in\mathbb{Z}}$ and $\overline{x} = (\delta_{i,-1})_{i\in\mathbb{Z}}$, we obtain

\[ x\overline{x} = (\delta_{i,1})_{i\in\mathbb{Z}} \cdot (\delta_{i,-1})_{i\in\mathbb{Z}} = (c_n)_{n\in\mathbb{Z}}, \qquad \text{where } c_n = \sum_{i\in\mathbb{Z}} \delta_{i,1} \delta_{n-i,-1} \]

(by Definition 3.14.6). However, for each $n \in \mathbb{Z}$, we have

\[ c_n = \sum_{i\in\mathbb{Z}} \delta_{i,1} \delta_{n-i,-1} = \underbrace{\delta_{1,1}}_{=1} \delta_{n-1,-1} + \sum_{\substack{i\in\mathbb{Z};\\ i\neq 1}} \underbrace{\delta_{i,1}}_{\substack{=0 \\ \text{(since } i\neq 1\text{)}}} \delta_{n-i,-1} \qquad \text{(here, we have split off the addend for } i = 1 \text{ from the sum)} \]

\[ = \delta_{n-1,-1} + \underbrace{\sum_{\substack{i\in\mathbb{Z};\\ i\neq 1}} 0 \delta_{n-i,-1}}_{=0} = \delta_{n-1,-1} = \delta_{n,0} \qquad \text{(since } n - 1 = -1 \text{ holds if and only if } n = 0\text{)}. \]

Thus, $(c_n)_{n\in\mathbb{Z}} = (\delta_{n,0})_{n\in\mathbb{Z}} = 1$, so that $x\overline{x} = (c_n)_{n\in\mathbb{Z}} = 1$, as we wanted to prove. The proof of $\overline{x}x = 1$ is analogous, or follows from $x\overline{x} = 1$ by commutativity.)
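This computation can be checked mechanically. The following Python sketch (the dict representation and the names `mul`, `x_inv` are assumptions of this illustration, not from the text) implements the convolution product of Definition 3.14.6 on Laurent polynomials with finitely many nonzero coefficients, and verifies $x \cdot x^{-1} = 1$.

```python
# Laurent polynomials as dicts {exponent: coefficient}, exponents in Z.
def mul(p, q):
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0) + pa * qb
    return {e: c for e, c in r.items() if c != 0}

x     = {1: 1}    # the element x = (delta_{i,1})_{i in Z}
x_inv = {-1: 1}   # its inverse  (delta_{i,-1})_{i in Z}

assert mul(x, x_inv) == {0: 1}          # x * x^{-1} = 1 = (delta_{i,0})_{i in Z}
assert mul(x_inv, x) == {0: 1}
assert mul({3: 1}, {-5: 2}) == {-2: 2}  # x^3 * (2 x^{-5}) = 2 x^{-2}
```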

Proposition 3.14.8 is an analogue of Corollary 3.2.18. Recall that we proved the


latter corollary using Lemma 3.2.16 and Proposition 3.2.17. Thus, in order to prove the
former proposition, we need to find analogues of Lemma 3.2.16 and Proposition 3.2.17
for Laurent polynomials.
The analogue of Lemma 3.2.16 for Laurent polynomials is the following:

Lemma B.5.1. Let $a = (a_n)_{n\in\mathbb{Z}}$ be a Laurent polynomial in $K[x^{\pm}]$. Then,

\[ x \cdot a = (a_{n-1})_{n\in\mathbb{Z}} \qquad \text{and} \qquad x^{-1} \cdot a = (a_{n+1})_{n\in\mathbb{Z}}. \]

Proof of Lemma B.5.1 (sketched). From $x = (\delta_{i,1})_{i\in\mathbb{Z}}$ and $a = (a_n)_{n\in\mathbb{Z}} = (a_i)_{i\in\mathbb{Z}}$, we obtain

\[ x \cdot a = (\delta_{i,1})_{i\in\mathbb{Z}} \cdot (a_i)_{i\in\mathbb{Z}} = (c_n)_{n\in\mathbb{Z}}, \qquad \text{where } c_n = \sum_{i\in\mathbb{Z}} \delta_{i,1} a_{n-i} \]

(by Definition 3.14.6). However, for each $n \in \mathbb{Z}$, we have

\[ c_n = \sum_{i\in\mathbb{Z}} \delta_{i,1} a_{n-i} = \underbrace{\delta_{1,1}}_{=1} a_{n-1} + \sum_{\substack{i\in\mathbb{Z};\\ i\neq 1}} \underbrace{\delta_{i,1}}_{\substack{=0 \\ \text{(since } i\neq 1\text{)}}} a_{n-i} \qquad \text{(here, we have split off the addend for } i = 1 \text{ from the sum)} \]

\[ = a_{n-1} + \underbrace{\sum_{\substack{i\in\mathbb{Z};\\ i\neq 1}} 0 a_{n-i}}_{=0} = a_{n-1}. \]

Thus, $(c_n)_{n\in\mathbb{Z}} = (a_{n-1})_{n\in\mathbb{Z}}$, so that $x \cdot a = (c_n)_{n\in\mathbb{Z}} = (a_{n-1})_{n\in\mathbb{Z}}$. Thus we have proved

\[ x \cdot a = (a_{n-1})_{n\in\mathbb{Z}}. \tag{427} \]

It remains to prove that $x^{-1} \cdot a = (a_{n+1})_{n\in\mathbb{Z}}$. Here we use a trick: Let $b = (a_{n+1})_{n\in\mathbb{Z}}$; this is again a Laurent polynomial in $K[x^{\pm}]$. Hence, (427) (applied to $b$ and $a_{n+1}$ instead of $a$ and $a_n$) yields

\[ x \cdot b = \left( a_{(n-1)+1} \right)_{n\in\mathbb{Z}} = (a_n)_{n\in\mathbb{Z}} \]

(since $a_{(n-1)+1} = a_n$ for each $n \in \mathbb{Z}$). In other words, $x \cdot b = a$ (since $a = (a_n)_{n\in\mathbb{Z}}$). Dividing this equality by the invertible element $x$, we find $b = x^{-1} \cdot a$, so that $x^{-1} \cdot a = b = (a_{n+1})_{n\in\mathbb{Z}}$. This completes the proof of Lemma B.5.1.

The analogue of Proposition 3.2.17 for Laurent polynomials is the following:

Proposition B.5.2. We have

\[ x^k = (\delta_{i,k})_{i\in\mathbb{Z}} \qquad \text{for each } k \in \mathbb{Z}. \]



Proof of Proposition B.5.2 (sketched). This is similar to Proposition 3.2.17, which we proved
by induction on k. Here, too, we can use induction, but (since k can be negative) we
have to use “two-sided induction”, which contains both an induction step from k to
k + 1 and an induction step from k to k − 1. (See [Grinbe15, §2.15] for a detailed expla-
nation of two-sided induction.)
Both induction steps rely on Lemma B.5.1. (Specifically, the step from k to k + 1 uses
the x · a = ( an−1 )n∈Z part of Lemma B.5.1, whereas the step from k to k − 1 uses the
x −1 · a = ( an+1 )n∈Z part.)
Now, Proposition 3.14.8 is easy:
Proof of Proposition 3.14.8 (sketched). Analogous to Corollary 3.2.18, but using Proposi-
tion B.5.2 instead of Proposition 3.2.17.
Next, let us prove Theorem 3.14.10:
Proof of Theorem 3.14.10 (sketched). This is analogous to the proof of Theorem 3.14.7. The only (slightly) different piece is the proof that the set $K((x))$ is closed under multiplication. So let us prove this:

Let $(a_n)_{n\in\mathbb{Z}}$ and $(b_n)_{n\in\mathbb{Z}}$ be two elements of $K((x))$. We must prove that their product $(a_n)_{n\in\mathbb{Z}} \cdot (b_n)_{n\in\mathbb{Z}}$ belongs to $K((x))$ as well. This product is defined by

\[ (a_n)_{n\in\mathbb{Z}} \cdot (b_n)_{n\in\mathbb{Z}} = (c_n)_{n\in\mathbb{Z}}, \qquad \text{where } c_n = \sum_{i\in\mathbb{Z}} a_i b_{n-i}. \]

Thus, we need to prove that $(c_n)_{n\in\mathbb{Z}}$ belongs to $K((x))$. In other words, we must prove that the sequence $(c_{-1}, c_{-2}, c_{-3}, \ldots)$ is essentially finite (by the definition of $K((x))$).

So let us prove this now. We know that the sequence $(a_{-1}, a_{-2}, a_{-3}, \ldots)$ is essentially finite (since $(a_n)_{n\in\mathbb{Z}} \in K((x))$). Thus, there exists some negative integer $p$ such that

\[ \text{all } i \leq p \text{ satisfy } a_i = 0. \tag{428} \]

Similarly, there exists some negative integer $q$ such that

\[ \text{all } j \leq q \text{ satisfy } b_j = 0 \tag{429} \]

(since $(b_n)_{n\in\mathbb{Z}} \in K((x))$). Consider these $p$ and $q$. Now, set $r := p + q$. We shall show that all negative integers $n \leq r$ satisfy $c_n = 0$.

Indeed, let $n \leq r$ be any integer. Then, each integer $i \geq p$ satisfies $n - i \leq r - p = p + q - p = q$ (since $n \leq r = p + q$ and $i \geq p$) and therefore

\[ b_{n-i} = 0 \tag{430} \]

(by (429), applied to $j = n - i$). Now,

\[ c_n = \sum_{i\in\mathbb{Z}} a_i b_{n-i} = \sum_{\substack{i\in\mathbb{Z};\\ i < p}} \underbrace{a_i}_{\substack{=0 \\ \text{(by (428))}}} b_{n-i} + \sum_{\substack{i\in\mathbb{Z};\\ i \geq p}} a_i \underbrace{b_{n-i}}_{\substack{=0 \\ \text{(by (430))}}} = \underbrace{\sum_{\substack{i\in\mathbb{Z};\\ i < p}} 0 b_{n-i}}_{=0} + \underbrace{\sum_{\substack{i\in\mathbb{Z};\\ i \geq p}} a_i 0}_{=0} = 0. \]

Forget that we fixed $n$. We thus have shown that all $n \leq r$ satisfy $c_n = 0$. Hence, the sequence $(c_{-1}, c_{-2}, c_{-3}, \ldots)$ is essentially finite. As explained, this completes our proof of the claim that the set $K((x))$ is closed under multiplication.

The rest of Theorem 3.14.10 is proved just like Theorem 3.14.7.
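The bound $r = p + q$ from this closure proof can be observed on a small example (a sketch with assumed helpers `mul`, `a`, `b`; not from the text): in the product of two Laurent series with lowest exponents $p$ and $q$, no exponent below $p + q$ can carry a nonzero coefficient.

```python
# Truncated Laurent series as dicts {exponent: coefficient}; only finitely
# many negative exponents may carry nonzero coefficients.
def mul(p, q):
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0) + pa * qb
    return {e: c for e, c in r.items() if c != 0}

a = {-2: 3, 0: 1, 5: 2}   # lowest exponent p = -2
b = {-3: 1, 1: 4}         # lowest exponent q = -3
c = mul(a, b)
# Every exponent below r = p + q = -5 has zero coefficient in the product:
assert min(c) >= -5
```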

B.6. Cancellations in alternating sums


We shall now prove Lemma 6.1.3 and Lemma 6.1.4. We start with the latter lemma,
since the former will then follow trivially from it.

Detailed proof of Lemma 6.1.4. The set X is finite (since it is a subset of the finite set A).
Thus, |X | = n for some n ∈ N. Consider this n.
Let [n] be the set {1, 2, . . . , n}. Then, |[n]| = n. Comparing this with |X | = n, we
obtain |X | = |[n]|. Hence, there exists a bijection α : X → [n]. Consider this α.
Now, define two subsets U and W of X by

U := { I ∈ X | α ( f ( I )) < α ( I )} ;
W := { I ∈ X | α ( f ( I )) > α ( I )} .

Then, f (U ) ⊆ W 177 and f (W ) ⊆ U 178 . Now, the map

g : U → W,
I 7→ f ( I )

is well-defined (since each I ∈ U satisfies f ( I ) ∈ f (U ) ⊆ W ), and the map

h : W → U,
I 7→ f ( I )

is also well-defined (since each I ∈ W satisfies f ( I ) ∈ f (W ) ⊆ U ). Consider these two

$^{177}$ Proof. Let $J \in f(U)$. Thus, $J = f(K)$ for some $K \in U$. Consider this $K$. We have $K \in U = \{I \in X \mid \alpha(f(I)) < \alpha(I)\}$; in other words, $K$ is an $I \in X$ satisfying $\alpha(f(I)) < \alpha(I)$. In other words, $K$ is an element of $X$ and satisfies $\alpha(f(K)) < \alpha(K)$. However, we have $f \circ f = \mathrm{id}$ (since $f$ is an involution) and thus $(f \circ f)(K) = \mathrm{id}(K) = K$, so that $K = (f \circ f)(K) = f(f(K)) = f(J)$ (since $f(K) = J$). However, from $J = f(K)$, we obtain $\alpha(J) = \alpha(f(K)) < \alpha(K) = \alpha(f(J))$ (since $K = f(J)$), so that $\alpha(f(J)) > \alpha(J)$.

Now, $J$ is an element of $X$ and satisfies $\alpha(f(J)) > \alpha(J)$. In other words, $J$ is an $I \in X$ satisfying $\alpha(f(I)) > \alpha(I)$. In other words, $J \in \{I \in X \mid \alpha(f(I)) > \alpha(I)\}$. In other words, $J \in W$ (since $W = \{I \in X \mid \alpha(f(I)) > \alpha(I)\}$).

Forget that we fixed $J$. We thus have shown that $J \in W$ for each $J \in f(U)$. In other words, $f(U) \subseteq W$.

$^{178}$ The proof of $f(W) \subseteq U$ is completely analogous to the proof of $f(U) \subseteq W$ we just gave; the only changes are that all "$<$" signs have to be replaced by "$>$" signs and vice versa, and that all "$U$"s have to be replaced by "$W$"s and vice versa.

maps $g$ and $h$. It is clear that $g \circ h = \mathrm{id}$ $^{179}$ and $h \circ g = \mathrm{id}$ $^{180}$. Hence, the maps $g$ and $h$ are mutually inverse, and thus are bijections.

We have

\[ \operatorname{sign} I + \operatorname{sign}(g(I)) = 0 \qquad \text{for all } I \in U \tag{431} \]

$^{181}$. Furthermore, we have

\[ \operatorname{sign} I = 0 \qquad \text{for all } I \in X \text{ satisfying } \alpha(f(I)) = \alpha(I) \tag{432} \]

$^{182}$.

However, each $I \in X$ satisfies exactly one of the three conditions "$\alpha(f(I)) < \alpha(I)$", "$\alpha(f(I)) = \alpha(I)$" and "$\alpha(f(I)) > \alpha(I)$" (because $\alpha(f(I))$ and $\alpha(I)$ are two integers).

$^{179}$ Proof. Let $I \in W$. Recall that $f$ is an involution; thus, $f \circ f = \mathrm{id}$. Now, $(g \circ h)(I) = g(h(I)) = f(h(I))$ (by the definition of $g$). However, $h(I) = f(I)$ (by the definition of $h$). Thus, $(g \circ h)(I) = f(h(I)) = f(f(I)) = (f \circ f)(I) = \mathrm{id}(I) = I = \mathrm{id}(I)$.

Forget that we fixed $I$. We thus have shown that $(g \circ h)(I) = \mathrm{id}(I)$ for each $I \in W$. In other words, $g \circ h = \mathrm{id}$.

$^{180}$ The proof of this is analogous to the proof of $g \circ h = \mathrm{id}$ we just gave.

$^{181}$ Proof of (431): Let $J \in U$. Then, $J \in U \subseteq X$ and $g(J) = f(J)$ (by the definition of $g$). Now, recall our assumption saying that $\operatorname{sign}(f(I)) = -\operatorname{sign} I$ for all $I \in X$. Applying this to $I = J$, we obtain $\operatorname{sign}(f(J)) = -\operatorname{sign} J$. In view of $g(J) = f(J)$, this rewrites as $\operatorname{sign}(g(J)) = -\operatorname{sign} J$. In other words, $\operatorname{sign} J + \operatorname{sign}(g(J)) = 0$.

Forget that we fixed $J$. We thus have shown that $\operatorname{sign} J + \operatorname{sign}(g(J)) = 0$ for all $J \in U$. Renaming $J$ as $I$ in this statement, we obtain that $\operatorname{sign} I + \operatorname{sign}(g(I)) = 0$ for all $I \in U$. This proves (431).

$^{182}$ Proof of (432): Recall our assumption saying that

\[ \operatorname{sign} I = 0 \qquad \text{for all } I \in X \text{ satisfying } f(I) = I. \tag{433} \]

Now, let $I \in X$ satisfy $\alpha(f(I)) = \alpha(I)$. The map $\alpha$ is injective (since $\alpha$ is a bijection). Thus, from $\alpha(f(I)) = \alpha(I)$, we obtain $f(I) = I$. Therefore, (433) yields $\operatorname{sign} I = 0$.

Forget that we fixed $I$. We thus have shown that $\operatorname{sign} I = 0$ for all $I \in X$ satisfying $\alpha(f(I)) = \alpha(I)$. This proves (432).

Hence, we can split the sum ∑_{I∈X} sign I as follows:

∑_{I∈X} sign I
   = ∑_{I∈X ; α( f ( I ))<α( I )} sign I + ∑_{I∈X ; α( f ( I ))=α( I )} sign I + ∑_{I∈X ; α( f ( I ))>α( I )} sign I
   = ∑_{I∈X ; α( f ( I ))<α( I )} sign I + ∑_{I∈X ; α( f ( I ))=α( I )} 0 + ∑_{I∈X ; α( f ( I ))>α( I )} sign I
        (since each addend of the middle sum equals 0, by (432))
   = ∑_{I∈X ; α( f ( I ))<α( I )} sign I + ∑_{I∈X ; α( f ( I ))>α( I )} sign I        (since the middle sum vanishes)
   = ∑_{I∈U} sign I + ∑_{I∈W} sign I
        (since { I ∈ X | α ( f ( I )) < α ( I )} = U and { I ∈ X | α ( f ( I )) > α ( I )} = W )
   = ∑_{I∈U} sign I + ∑_{I∈U} sign ( g ( I ))
        (here, we have substituted g ( I ) for I in the second sum, since the map g : U → W is a bijection)
   = ∑_{I∈U} (sign I + sign ( g ( I ))) = ∑_{I∈U} 0 = 0        (by (431)).

However, the set A is the union of its two disjoint subsets X and A \ X (since X ⊆ A). Thus, we can split the sum ∑_{I∈A} sign I as follows:

∑_{I∈A} sign I = ∑_{I∈X} sign I + ∑_{I∈A\X} sign I = ∑_{I∈A\X} sign I

(since ∑_{I∈X} sign I = 0, as shown above).

This proves Lemma 6.1.4.

Detailed proof of Lemma 6.1.3. The map f has no fixed points (by assumption). In other
words, there exist no I ∈ X satisfying f ( I ) = I. Hence, we have sign I = 0 for all I ∈ X
satisfying f ( I ) = I (because non-existing objects satisfy any possible claim; this is
known as being “vacuously true”). Thus, Lemma 6.1.4 yields ∑ sign I = ∑ sign I.
I ∈A I ∈A\X
This proves Lemma 6.1.3.
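The cancellation mechanism of Lemmas 6.1.3 and 6.1.4 can be checked on a toy instance. The following sketch is not from the notes; it uses the classical fixed-point-free sign-reversing involution that toggles the element 1 in a subset of [4], and verifies that the signed sum over all objects vanishes, exactly as Lemma 6.1.3 (with X = A) predicts.

```python
from itertools import combinations

def subsets(n):
    """All subsets of {1, ..., n}, as frozensets."""
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(1, n + 1), r)]

def sign(S):
    return (-1) ** len(S)

def f(S):
    """Toggle the element 1: an involution with no fixed points."""
    return S ^ frozenset({1})

A = subsets(4)
assert all(f(f(S)) == S for S in A)            # f is an involution
assert all(sign(f(S)) == -sign(S) for S in A)  # f reverses signs
assert all(f(S) != S for S in A)               # f has no fixed points
# Lemma 6.1.3 (with X = A): the whole signed sum cancels.
print("signed sum over all subsets of [4]:", sum(sign(S) for S in A))
```

Pairing each subset with its image under f matches a +1-object with a −1-object, which is exactly why the sum collapses to 0.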

B.7. Determinants in combinatorics


Proof of Proposition 6.5.10. If q is any path, then the length ℓ (q) of q is defined to be the
number of arcs of q.
We shall now prove Proposition 6.5.10 by strong induction on ℓ ( p) + ℓ ( p′ ):
Induction step: Fix a nonnegative integer N. Assume (as the induction hypothesis)
that Proposition 6.5.10 holds whenever ℓ ( p) + ℓ ( p′ ) < N. We must now prove that
Proposition 6.5.10 holds when ℓ ( p) + ℓ ( p′ ) = N.

So let A, B, A′ , B′ , p and p′ be as in Proposition 6.5.10, and let us assume that


ℓ ( p) + ℓ ( p′ ) = N. We must prove that p and p′ have a vertex in common.
Assume the contrary. Thus, p and p′ have no vertex in common.
Recall that each arc of the lattice is either an east-step or a north-step. Thus, the x-
coordinates of the vertices of a path are always weakly increasing (i.e., if (v0 , v1 , . . . , vn )
is a path, then x (v0 ) ≤ x (v1 ) ≤ · · · ≤ x (vn )), and so are the y-coordinates. Hence,
the existence of a path p from A to B′ shows that x ( A) ≤ x ( B′ ) and y ( A) ≤ y ( B′ ).
Similarly, the existence of a path p′ from A′ to B yields x ( A′ ) ≤ x ( B) and y ( A′ ) ≤
y ( B ).
We are in one of the following three cases:
Case 1: We have y ( A′ ) > y ( A).
Case 2: We have x ( A′ ) < x ( A).
Case 3: We have neither y ( A′ ) > y ( A) nor x ( A′ ) < x ( A).
We shall derive a contradiction in each of these cases.
Let us first consider Case 1. In this case, we have y ( A′ ) > y ( A). Thus, y ( A) <
y ( A′ ) ≤ y ( B) ≤ y ( B′ ) (since y ( B′ ) ≥ y ( B)), so that y ( A) ̸= y ( B′ ) and therefore
A ̸= B′ . This shows that the path p has at least two vertices (since p is a path from A
to B′ ). Let P be its second vertex. Hence, P lies on a path from A to B′ (namely, on the
path p). Therefore, x ( A) ≤ x ( P) ≤ x ( B′ ) (since the x-coordinates of the vertices of a
path are always weakly increasing) and y ( A) ≤ y ( P) ≤ y ( B′ ) (since the y-coordinates
of the vertices of a path are always weakly increasing). Let r be the path from P to B′
obtained by removing the first arc from p (in other words, let r be the part of p from
the point P onwards)183 . Thus, every vertex of r is a vertex of p. Hence, the paths r
and p′ have no vertex in common (since p and p′ have no vertex in common). Also,
ℓ (r ) = ℓ ( p) − 1 < ℓ ( p) and thus ℓ (r ) + ℓ ( p′ ) < ℓ ( p) + ℓ ( p′ ) = N. Moreover, x ( A′ ) ≤

183 Here is an illustration (with r drawn extra-thick): [Figure: a lattice grid with x-axis 0–7 and y-axis 0–6, showing the path p from A (whose second vertex is P) up to B′ , the path p′ from A′ to B, and the subpath r of p from P onwards to B′ .]

x ( A) ≤ x ( P). Thus, if we had y ( A′ ) ≥ y ( P), then we could apply Proposition 6.5.10


to P and r instead of A and p (by the induction hypothesis, since ℓ (r ) + ℓ ( p′ ) < N).
We would consequently conclude that the paths r and p′ have a vertex in common; this
would contradict the fact that the paths r and p′ have no vertex in common. Hence,
we cannot have y ( A′ ) ≥ y ( P). Thus, y ( A′ ) < y ( P). Hence, y ( A′ ) ≤ y ( P) − 1
(since y ( A′ ) and y ( P) are integers). But P is the next vertex after A on the path p.
Hence, there is an arc from A to P. If this arc was an east-step, then we would have
y ( P) = y ( A), which would contradict y ( A) ≤ y ( A′ ) < y ( P). Hence, this arc cannot
be an east-step. Thus, this arc must be a north-step. Therefore, y ( P) = y ( A) + 1.
Hence, y ( A′ ) ≤ y ( P) − 1 = y ( A) (since y ( P) = y ( A) + 1). But this contradicts
y ( A′ ) > y ( A). Thus, we have found a contradiction in Case 1.
An analogous argument can be used to find a contradiction in Case 2. In fact, there
is a symmetry inherent in Proposition 6.5.10, which interchanges Case 1 with Case 2.
Namely, if we reflect all points and paths across the x = y line (i.e., if we replace each
point (i, j) by ( j, i )), and if we rename A, B, A′ , B′ , p and p′ as A′ , A, B′ , B, p′ and p
(respectively), then Case 1 becomes Case 2 and vice versa184 . Thus, Case 2 spawns a
contradiction just like Case 1 did.
Finally, let us consider Case 3. In this case, we have neither y ( A′ ) > y ( A) nor
x ( A′ ) < x ( A). In other words, we have y ( A′ ) ≤ y ( A) and x ( A′ ) ≥ x ( A). Combining
x ( A′ ) ≥ x ( A) with x ( A′ ) ≤ x ( A), we obtain x ( A′ ) = x ( A). Combining y ( A′ ) ≤
y ( A) with y ( A′ ) ≥ y ( A), we obtain y ( A′ ) = y ( A). Now, the vertices A and A′
have the same x-coordinate (since x ( A′ ) = x ( A)) and the same y-coordinate (since
y ( A′ ) = y ( A)). Hence, these two vertices are equal. In other words, A = A′ . Hence,
the vertex A belongs to the path p′ (since the vertex A′ belongs to the path p′ ). However,
the vertex A belongs to the path p as well. Thus, the paths p and p′ have a vertex in
common (namely, A). This contradicts the fact that p and p′ have no vertex in common.
Thus, we have found a contradiction in Case 3.
We have now found contradictions in all three Cases 1, 2 and 3. Hence, our assump-
tion must have been false. We thus conclude that p and p′ have a vertex in common.
Now, forget that we fixed A, B, A′ , B′ , p and p′ . We thus have proven that if A, B, A′ ,
B′ , p and p′ are as in Proposition 6.5.10, and if ℓ ( p) + ℓ ( p′ ) = N, then p and p′ have
a vertex in common. In other words, Proposition 6.5.10 holds when ℓ ( p) + ℓ ( p′ ) = N.
This completes the induction step. Hence, Proposition 6.5.10 is proven.
(This proof is essentially the first proof from https://math.stackexchange.com/questions/2870640/ .)
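A small instance of Proposition 6.5.10 can be brute-forced. The configuration below is a hypothetical one, chosen to satisfy the hypotheses as reconstructed from the proof above (A′ weakly northwest of A, and B′ weakly northwest of B); the proposition then predicts that every path from A to B′ shares a vertex with every path from A′ to B.

```python
from itertools import combinations

def paths(src, dst):
    """All lattice paths from src to dst using east- and north-steps,
    returned as tuples of vertices."""
    (x0, y0), (x1, y1) = src, dst
    dx, dy = x1 - x0, y1 - y0
    out = []
    for east in combinations(range(dx + dy), dx):  # which steps go east
        x, y = x0, y0
        verts = [(x, y)]
        for s in range(dx + dy):
            if s in east:
                x += 1
            else:
                y += 1
            verts.append((x, y))
        out.append(tuple(verts))
    return out

# A' is weakly northwest of A; B' is weakly northwest of B.
A, Bprime = (1, 0), (2, 3)
Aprime, B = (0, 1), (3, 2)
for p in paths(A, Bprime):
    for pp in paths(Aprime, B):
        assert set(p) & set(pp), "found a disjoint pair of paths"
print("every path from A to B' meets every path from A' to B")
```

Intuitively, p must cross the horizontal band 1 ≤ y ≤ 2 within the columns 1 ≤ x ≤ 2, while p′ traverses every column of that band, so the two paths cannot avoid each other.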

B.8. Definitions and examples of symmetric polynomials


Detailed proof of Lemma 7.1.17. Let σ ∈ S N . We shall prove that σ · f = f .
Let us follow Convention 5.3.16; thus, we shall refer to the simple transpositions
s1 , s2 , . . . , s N −1 in S N as “simples”.
Theorem 5.3.17 (a) (applied to n = N) shows that we can write σ as a composition
(i.e., product) of ℓ (σ) simples. Thus, in particular, we can write σ as a finite product

184 Importantly,this reflection preserves our digraph (in fact, it transforms north-steps into east-
steps and vice versa).

of simples.185 In other words, there exist finitely many elements k_1 , k_2 , . . . , k_p ∈ [ N − 1] such that σ = s_{k_1} s_{k_2} · · · s_{k_p} . Consider these k_1 , k_2 , . . . , k_p .
Now,

σ · f = (s_{k_1} s_{k_2} · · · s_{k_p}) · f = s_{k_1} · (s_{k_2} · · · · · s_{k_{p−1}} · (s_{k_p} · f ))
   = s_{k_1} · (s_{k_2} · · · · · s_{k_{p−2}} · (s_{k_{p−1}} · f ))        (since s_{k_p} · f = f by (264))
   = s_{k_1} · (s_{k_2} · · · · · s_{k_{p−3}} · (s_{k_{p−2}} · f ))        (by (264) again)
   = · · · = f .

(To be fully rigorous, this is really an induction argument: We are showing (by induction on i) that s_{k_1} s_{k_2} · · · s_{k_i} f = f for each i ∈ {0, 1, . . . , p}; the induction base is obvious (since s_{k_1} s_{k_2} · · · s_{k_0} = (empty product in S_N ) = id), while the induction step relies on (264). It is straightforward to fill in the details of this induction.)
Forget that we fixed σ. We thus have proved that σ · f = f for all σ ∈ S N . In other
words, the polynomial f is symmetric. This proves Lemma 7.1.17.
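Lemma 7.1.17 reduces checking symmetry to checking invariance under the simples. The following sketch is not from the notes; it stores polynomials as dictionaries from exponent tuples to coefficients (0-indexed) and illustrates the lemma for N = 3.

```python
from itertools import permutations

N = 3

def act(sigma, f):
    """sigma . f substitutes x_{sigma(i)} for x_i; on a polynomial stored as
    {exponent-tuple: coefficient}, the exponent in slot i moves to slot sigma(i)."""
    g = {}
    for expo, c in f.items():
        new = [0] * N
        for i in range(N):
            new[sigma[i]] = expo[i]
        key = tuple(new)
        g[key] = g.get(key, 0) + c
    return g

# f = x1 x2 + x1 x3 + x2 x3 (the elementary symmetric polynomial e_2)
f = {(1, 1, 0): 1, (1, 0, 1): 1, (0, 1, 1): 1}

# hypothesis of Lemma 7.1.17: f is fixed by the simples s_1 and s_2
simples = [(1, 0, 2), (0, 2, 1)]
assert all(act(s, f) == f for s in simples)

# conclusion: f is then fixed by every permutation in S_3
assert all(act(sigma, f) == f for sigma in permutations(range(N)))
print("f is symmetric")
```

Only the two adjacent transpositions were checked by hand; the invariance under all 3! = 6 permutations follows because the simples generate S_3.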

B.9. N-partitions and monomial symmetric polynomials


Proof of Proposition 7.2.9. Let us write the polynomial f as

f = ∑_{(b_1 ,b_2 ,...,b_N )∈N^N} f_{b_1 ,b_2 ,...,b_N} x_1^{b_1} x_2^{b_2} · · · x_N^{b_N} ,        (434)

where the coefficients f_{b_1 ,b_2 ,...,b_N} belong to K. Thus, the coefficient of any monomial x_1^{b_1} x_2^{b_2} · · · x_N^{b_N} in f is f_{b_1 ,b_2 ,...,b_N} . In other words,

[ x_1^{b_1} x_2^{b_2} · · · x_N^{b_N} ] f = f_{b_1 ,b_2 ,...,b_N}        (435)

for each (b_1 , b_2 , . . . , b_N ) ∈ N^N .


The map σ is a permutation of [ N ] (since σ ∈ S_N ). Hence, the map

N^N → N^N ,
(b_1 , b_2 , . . . , b_N ) 7→ (b_{σ(1)} , b_{σ(2)} , . . . , b_{σ(N)})        (436)

is a bijection (indeed, this map simply permutes the entries of any given N-tuple using the permutation σ; thus, it can be undone by permuting them using σ^{−1} ). Moreover, the map σ itself is a bijection (since it is a permutation).
185 Alternatively, we can derive this from Corollary 5.3.22 (which is more well-known than The-
orem 5.3.17 (a)):
Corollary 5.3.22 (applied to n = N) shows that the symmetric group S N is generated
by the simples s1 , s2 , . . . , s N −1 . Hence, each element of S N is a (finite) product of simples
and their inverses. Since the inverses of the simples are simply these simples themselves
(because each i ∈ [ N − 1] satisfies si−1 = si ), we can simplify this statement as follows: Each
element of S N is a (finite) product of simples. Applying this to the element σ, we conclude
that σ is a (finite) product of simples.

The definition of the action of S_N on P yields

σ · f = f (x_{σ(1)} , x_{σ(2)} , . . . , x_{σ(N)})
   = ∑_{(b_1 ,b_2 ,...,b_N )∈N^N} f_{b_1 ,b_2 ,...,b_N} x_{σ(1)}^{b_1} x_{σ(2)}^{b_2} · · · x_{σ(N)}^{b_N}
        (here, we have substituted x_{σ(1)} , x_{σ(2)} , . . . , x_{σ(N)} for x_1 , x_2 , . . . , x_N on both sides of (434))
   = ∑_{(b_1 ,b_2 ,...,b_N )∈N^N} f_{b_{σ(1)} ,b_{σ(2)} ,...,b_{σ(N)}} x_{σ(1)}^{b_{σ(1)}} x_{σ(2)}^{b_{σ(2)}} · · · x_{σ(N)}^{b_{σ(N)}}

(here, we have substituted (b_{σ(1)} , b_{σ(2)} , . . . , b_{σ(N)}) for the summation index (b_1 , b_2 , . . . , b_N ) in the sum, since the map (436) is a bijection). However, for each (b_1 , b_2 , . . . , b_N ) ∈ N^N , we have

x_{σ(1)}^{b_{σ(1)}} x_{σ(2)}^{b_{σ(2)}} · · · x_{σ(N)}^{b_{σ(N)}} = ∏_{i∈{1,2,...,N}} x_{σ(i)}^{b_{σ(i)}} = ∏_{i∈{1,2,...,N}} x_i^{b_i} = x_1^{b_1} x_2^{b_2} · · · x_N^{b_N}

(here, we have substituted i for σ (i ) in the product, since the map σ : [ N ] → [ N ] is a bijection). Thus,

σ · f = ∑_{(b_1 ,b_2 ,...,b_N )∈N^N} f_{b_{σ(1)} ,b_{σ(2)} ,...,b_{σ(N)}} x_1^{b_1} x_2^{b_2} · · · x_N^{b_N} .

Hence, for each (b_1 , b_2 , . . . , b_N ) ∈ N^N , we have

f_{b_{σ(1)} ,b_{σ(2)} ,...,b_{σ(N)}} = (the coefficient of x_1^{b_1} x_2^{b_2} · · · x_N^{b_N} in σ · f ) = [ x_1^{b_1} x_2^{b_2} · · · x_N^{b_N} ] (σ · f ) .        (437)

Now, let ( a_1 , a_2 , . . . , a_N ) ∈ N^N be arbitrary. Then, (435) (applied to (b_1 , b_2 , . . . , b_N ) = (a_{σ(1)} , a_{σ(2)} , . . . , a_{σ(N)})) yields

[ x_1^{a_{σ(1)}} x_2^{a_{σ(2)}} · · · x_N^{a_{σ(N)}} ] f = f_{a_{σ(1)} ,a_{σ(2)} ,...,a_{σ(N)}} = [ x_1^{a_1} x_2^{a_2} · · · x_N^{a_N} ] (σ · f )

(by (437), applied to (b_1 , b_2 , . . . , b_N ) = ( a_1 , a_2 , . . . , a_N )). This proves Proposition 7.2.9.
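The coefficient rule just proved can be tested mechanically. In the sketch below (not from the notes; 0-indexed), the action of σ is implemented by moving the exponent in slot i of each monomial to slot σ(i), and the identity [x_1^{a_{σ(1)}} · · · x_N^{a_{σ(N)}}] f = [x_1^{a_1} · · · x_N^{a_N}] (σ · f ) is verified for a random polynomial and all σ ∈ S_3.

```python
from itertools import permutations, product
import random

N = 3

def act(sigma, f):
    """sigma . f = f(x_{sigma(1)}, ..., x_{sigma(N)}) on a polynomial stored as
    {exponent-tuple: coefficient}: the exponent in slot i moves to slot sigma(i)."""
    g = {}
    for expo, c in f.items():
        new = [0] * N
        for i in range(N):
            new[sigma[i]] = expo[i]
        g[tuple(new)] = g.get(tuple(new), 0) + c
    return g

def coeff(f, b):
    """The coefficient of x_1^{b_1} ... x_N^{b_N} in f (0 if absent)."""
    return f.get(tuple(b), 0)

random.seed(0)
f = {tuple(random.randrange(3) for _ in range(N)): random.randrange(1, 5)
     for _ in range(6)}
for sigma in permutations(range(N)):
    g = act(sigma, f)
    for a in product(range(3), repeat=N):
        # the claim: [x^{(a_{sigma(1)}, ..., a_{sigma(N)})}] f = [x^a] (sigma . f)
        assert coeff(f, tuple(a[sigma[i]] for i in range(N))) == coeff(g, a)
print("coefficient rule of Proposition 7.2.9 verified over S_3")
```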

B.10. Schur polynomials


Detailed proof of Lemma 7.3.14. Write the N-partitions λ and µ as λ = (λ1 , λ2 , . . . , λ N )
and µ = (µ1 , µ2 , . . . , µ N ). Then, the definition of Y (λ/µ) yields

Y (λ/µ) = {(i, j) | i ∈ [ N ] and j ∈ Z and µi < j ≤ λi } . (438)



We know that ( a, b) is an element of Y (λ/µ). Hence,

( a, b) ∈ Y (λ/µ) = {(i, j) | i ∈ [ N ] and j ∈ Z and µi < j ≤ λi } .

From this, we obtain a ∈ [ N ] and b ∈ Z and µ a < b ≤ λ a .


We know that (e, f ) is an element of Y (λ/µ). Hence,

(e, f ) ∈ Y (λ/µ) = {(i, j) | i ∈ [ N ] and j ∈ Z and µi < j ≤ λi } .

From this, we obtain e ∈ [ N ] and f ∈ Z and µe < f ≤ λe .


Now, from a ≤ c, we obtain c ≥ a ≥ 1 (since a ∈ [ N ]). Also, we have c ≤ e ≤ N
(since e ∈ [ N ]). Combining this with c ≥ 1, we obtain 1 ≤ c ≤ N, so that c ∈ [ N ] (since
c ∈ Z).
Now, µ1 ≥ µ2 ≥ · · · ≥ µ N (since µ is an N-partition). In other words, if u and v are
two elements of [ N ] satisfying u ≤ v, then µu ≥ µv . Applying this to u = a and v = c,
we obtain µ a ≥ µc (since a ≤ c). Hence, µc ≤ µ a < b ≤ d.
Also, λ1 ≥ λ2 ≥ · · · ≥ λ N (since λ is an N-partition). In other words, if u and v are
two elements of [ N ] satisfying u ≤ v, then λu ≥ λv . Applying this to u = c and v = e,
we obtain λc ≥ λe (since c ≤ e). Hence, d ≤ f ≤ λe ≤ λc (since λc ≥ λe ).
Combining this with µc < d, we obtain µc < d ≤ λc .
Thus, we know that c ∈ [ N ] and d ∈ Z and µc < d ≤ λc . In other words, (c, d) ∈
{(i, j) | i ∈ [ N ] and j ∈ Z and µi < j ≤ λi }. In view of (438), this rewrites as (c, d) ∈
Y (λ/µ). This proves Lemma 7.3.14.
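Lemma 7.3.14 says that the skew Young diagram Y (λ/µ) is "box-convex": any box lying weakly between two of its boxes belongs to it. The sketch below brute-forces this on one example (the particular λ and µ are arbitrary choices, not from the notes).

```python
from itertools import product

N = 4
lam = (5, 4, 4, 2)  # an N-partition λ (weakly decreasing)
mu = (3, 2, 1, 0)   # an N-partition µ with µ_i <= λ_i

# Y(λ/µ) = {(i, j) | i in [N], j in Z, µ_i < j <= λ_i}, as in (438)
diagram = {(i, j) for i in range(1, N + 1)
                  for j in range(mu[i - 1] + 1, lam[i - 1] + 1)}
for (a, b), (e, f) in product(diagram, repeat=2):
    for c in range(a, e + 1):
        for d in range(b, f + 1):
            # Lemma 7.3.14: any (c, d) with a <= c <= e and b <= d <= f is a box
            assert (c, d) in diagram
print("box-convexity holds for this λ/µ")
```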

Detailed proof of Lemma 7.3.17. We shall first prove parts (a) and (c), and then quickly
derive the rest from them.
(a) The skew tableau T is semistandard. Hence, we have

T (i, j) ≤ T (i, j + 1) (439)

for any (i, j) ∈ Y (λ/µ) satisfying (i, j + 1) ∈ Y (λ/µ). (Indeed, this is one of the
requirements placed on T in Definition 7.3.16.)
Now, let (i, j1 ) and (i, j2 ) be two elements of Y (λ/µ) satisfying j1 ≤ j2 . We must
prove that T (i, j1 ) ≤ T (i, j2 ).
Let k ∈ { j1 , j1 + 1, . . . , j2 − 1} be arbitrary. Then, j1 ≤ k ≤ j2 − 1. From k ≤ j2 − 1, we
obtain k + 1 ≤ j2 , so that k ≤ k + 1 ≤ j2 . Also, j1 ≤ k ≤ k + 1.
Thus, we have i ≤ i ≤ i and j1 ≤ k ≤ j2 . Hence, Lemma 7.3.14 (applied to ( a, b) =
(i, j1 ) and (e, f ) = (i, j2 ) and (c, d) = (i, k)) yields (i, k) ∈ Y (λ/µ). Therefore, the entry
T (i, k ) of T is well-defined.
Also, we have i ≤ i ≤ i and j1 ≤ k + 1 ≤ j2 . Hence, Lemma 7.3.14 (applied to
( a, b) = (i, j1 ) and (e, f ) = (i, j2 ) and (c, d) = (i, k + 1)) yields (i, k + 1) ∈ Y (λ/µ).
Therefore, the entry T (i, k + 1) of T is well-defined.
Now, (439) (applied to j = k) yields T (i, k ) ≤ T (i, k + 1) (since (i, k) ∈ Y (λ/µ) and
(i, k + 1) ∈ Y (λ/µ)).
Forget that we fixed k. We thus have shown that for each k ∈ { j1 , j1 + 1, . . . , j2 − 1},
the inequality T (i, k ) ≤ T (i, k + 1) holds (and both entries T (i, k ) and T (i, k + 1) are
well-defined). In other words, we have

T (i, j1 ) ≤ T (i, j1 + 1) ≤ T (i, j1 + 2) ≤ · · · ≤ T (i, j2 − 1) ≤ T (i, j2 ) .



Hence, T (i, j1 ) ≤ T (i, j2 ). This proves Lemma 7.3.17 (a).


(c) The skew tableau T is semistandard. Hence, we have

T (i, j) < T (i + 1, j) (440)

for any (i, j) ∈ Y (λ/µ) satisfying (i + 1, j) ∈ Y (λ/µ). (Indeed, this is one of the
requirements placed on T in Definition 7.3.16.)
Now, let (i1 , j) and (i2 , j) be two elements of Y (λ/µ) satisfying i1 < i2 . We must
prove that T (i1 , j) < T (i2 , j).
Let k ∈ {i1 , i1 + 1, . . . , i2 − 1} be arbitrary. Then, i1 ≤ k ≤ i2 − 1. From k ≤ i2 − 1, we
obtain k + 1 ≤ i2 , so that k ≤ k + 1 ≤ i2 . Also, i1 ≤ k ≤ k + 1.
Thus, we have i1 ≤ k ≤ i2 and j ≤ j ≤ j. Hence, Lemma 7.3.14 (applied to ( a, b) =
(i1 , j) and (e, f ) = (i2 , j) and (c, d) = (k, j)) yields (k, j) ∈ Y (λ/µ). Therefore, the entry
T (k, j) of T is well-defined.
Also, we have i1 ≤ k + 1 ≤ i2 and j ≤ j ≤ j. Hence, Lemma 7.3.14 (applied to
( a, b) = (i1 , j) and (e, f ) = (i2 , j) and (c, d) = (k + 1, j)) yields (k + 1, j) ∈ Y (λ/µ).
Therefore, the entry T (k + 1, j) of T is well-defined.
Now, (440) (applied to i = k) yields T (k, j) < T (k + 1, j) (since (k, j) ∈ Y (λ/µ) and
(k + 1, j) ∈ Y (λ/µ)).
Forget that we fixed k. We thus have shown that for each k ∈ {i1 , i1 + 1, . . . , i2 − 1},
the inequality T (k, j) < T (k + 1, j) holds (and both entries T (k, j) and T (k + 1, j) are
well-defined). In other words, we have

T (i1 , j) < T (i1 + 1, j) < T (i1 + 2, j) < · · · < T (i2 − 1, j) < T (i2 , j) .

Hence, T (i1 , j) < T (i2 , j) (since i1 < i2 ). This proves Lemma 7.3.17 (c).
(b) Let (i1 , j) and (i2 , j) be two elements of Y (λ/µ) satisfying i1 ≤ i2 . We must prove
that T (i1 , j) ≤ T (i2 , j). If i1 < i2 , then this follows from Lemma 7.3.17 (c). Hence, for
the rest of this proof, we WLOG assume that we don’t have i1 < i2 . Thus, we have
i1 ≥ i2 . Combining this with i1 ≤ i2 , we obtain i1 = i2 . Thus, T (i1 , j) = T (i2 , j), so that
T (i1 , j) ≤ T (i2 , j). This proves Lemma 7.3.17 (b).
(d) Let (i1 , j1 ) and (i2 , j2 ) be two elements of Y (λ/µ) satisfying i1 ≤ i2 and j1 ≤
j2 . Then, i1 ≤ i2 ≤ i2 and j1 ≤ j1 ≤ j2 . Hence, Lemma 7.3.14 (applied to ( a, b) =
(i1 , j1 ) and (e, f ) = (i2 , j2 ) and (c, d) = (i2 , j1 )) yields (i2 , j1 ) ∈ Y (λ/µ). Therefore,
the entry T (i2 , j1 ) of T is well-defined. Hence, Lemma 7.3.17 (b) (applied to j = j1 )
yields T (i1 , j1 ) ≤ T (i2 , j1 ). Furthermore, Lemma 7.3.17 (a) (applied to i = i2 ) yields
T (i2 , j1 ) ≤ T (i2 , j2 ). Thus,

T (i1 , j1 ) ≤ T (i2 , j1 ) ≤ T (i2 , j2 ) .

This proves Lemma 7.3.17 (d).


(e) Let (i1 , j1 ) and (i2 , j2 ) be two elements of Y (λ/µ) satisfying i1 < i2 and j1 ≤
j2 . Then, i1 ≤ i2 ≤ i2 and j1 ≤ j1 ≤ j2 . Hence, Lemma 7.3.14 (applied to ( a, b) =
(i1 , j1 ) and (e, f ) = (i2 , j2 ) and (c, d) = (i2 , j1 )) yields (i2 , j1 ) ∈ Y (λ/µ). Therefore,
the entry T (i2 , j1 ) of T is well-defined. Hence, Lemma 7.3.17 (c) (applied to j = j1 )

yields T (i1 , j1 ) < T (i2 , j1 ). Furthermore, Lemma 7.3.17 (a) (applied to i = i2 ) yields
T (i2 , j1 ) ≤ T (i2 , j2 ). Thus,

T (i1 , j1 ) < T (i2 , j1 ) ≤ T (i2 , j2 ) .

This proves Lemma 7.3.17 (e).

Detailed proof of Lemma 7.3.35. Let (i, j) ∈ Y (λ). Set p := T (i, j).
All entries of T are elements of [ N ] (by the definition of a tableau), and thus are pos-
itive integers. Thus, in particular, the i entries T (1, j) , T (2, j) , . . . , T (i, j) are positive
integers186 . Moreover, the tableau T is semistandard; thus, its entries increase strictly
down each column. Hence, in particular, we have

T (1, j) < T (2, j) < · · · < T (i, j) .

Thus, the i numbers T (1, j) , T (2, j) , . . . , T (i, j) are distinct. Moreover, all these num-
bers are positive integers (as we have seen above) and are ≤ p (since T (1, j) < T (2, j) <
· · · < T (i, j) = p); thus, they all belong to the set [ p]. This shows that there are i dis-
tinct numbers in the set [ p] (namely, the i numbers T (1, j) , T (2, j) , . . . , T (i, j)); in other
words, the set [ p] has at least i elements. In other words, we have |[ p]| ≥ i. Since
|[ p]| = p = T (i, j), this rewrites as T (i, j) ≥ i. This proves Lemma 7.3.35.
Detailed proof of Lemma 7.3.39. Write the N-tuple α ∈ N^N as α = (α_1 , α_2 , . . . , α_N ). Then, Definition 7.3.2 (b) yields

a_α = det ( (x_i^{α_j})_{1≤i≤N, 1≤j≤N} ) .        (441)

(a) Assume that the N-tuple α has two equal entries. In other words, the N-tuple
(α1 , α2 , . . . , α N ) has two equal entries (since α = (α1 , α2 , . . . , α N )). In other words, there
exist two elements u, v ∈ [ N ] such that u < v and αu = αv . Consider these u, v.
Now, from  α αu = αv , we conclude that the u-th and the v-th columns  α  of the N × N-
j
matrix xi are equal. Hence, this N × N-matrix xi j has
1≤i ≤ N, 1≤ j≤ N 1≤i ≤ N, 1≤ j≤ N
two equal columns (since u < v).
However, if an N × N-matrix A has  equal columns, then det A = 0 (by Theorem
 two α
6.4.14 (c)187 ). Applying this to A = xi j , we obtain
1≤i ≤ N, 1≤ j≤ N
  
α
det xi j =0
1≤i ≤ N, 1≤ j≤ N

186 Here, we are tacitly using the fact that the boxes (1, j) , (2, j) , . . . , (i, j) all belong to Y (λ) (so
that the corresponding entries T (1, j) , T (2, j) , . . . , T (i, j) are well-defined). This fact can be
checked as follows: Let u ∈ [i ]. Thus, u ≤ i. Now, write λ in the form λ = (λ1 , λ2 , . . . , λ N ).
Thus, λ1 ≥ λ2 ≥ · · · ≥ λ N (since λ is an N-partition). Hence, λu ≥ λi (since u ≤ i), so
that λi ≤ λu and therefore [λi ] ⊆ [λu ]. However, (i, j) ∈ Y (λ). In other words, i ∈ [ N ] and
j ∈ [λi ] (by the definition of the Young diagram Y (λ)). Hence, u ≤ i ≤ N (since i ∈ [ N ]),
so that u ∈ [ N ], and furthermore j ∈ [λi ] ⊆ [λu ]. Now, from u ∈ [ N ] and j ∈ [λu ], we
obtain (u, j) ∈ Y (λ). Forget that we fixed u. We thus have shown that (u, j) ∈ Y (λ) for
each u ∈ [i ]. In other words, the boxes (1, j) , (2, j) , . . . , (i, j) all belong to Y (λ), qed.
187 more precisely: by the analogue of Theorem 6.4.12 (c) for columns instead of rows

(since the N × N-matrix (x_i^{α_j})_{1≤i≤N, 1≤j≤N} has two equal columns). In view of (441), this rewrites as a_α = 0. This proves Lemma 7.3.39 (a).
(b) Write the N-tuple β ∈ N^N as β = ( β_1 , β_2 , . . . , β_N ). Then, Definition 7.3.2 (b) yields

a_β = det ( (x_i^{β_j})_{1≤i≤N, 1≤j≤N} ) .        (442)

However, the N-tuple β is obtained from α by swapping two entries. In other words, the N-tuple ( β_1 , β_2 , . . . , β_N ) is obtained from (α_1 , α_2 , . . . , α_N ) by swapping two entries (since β = ( β_1 , β_2 , . . . , β_N ) and α = (α_1 , α_2 , . . . , α_N )). Thus, the matrix (x_i^{β_j})_{1≤i≤N, 1≤j≤N} is obtained from the matrix (x_i^{α_j})_{1≤i≤N, 1≤j≤N} by swapping two columns188 .
However, if we swap two columns of an N × N-matrix A, then det A gets multiplied by −1 (by Theorem 6.4.14 (b)189 ). In other words, if A and B are two N × N-matrices such that B is obtained from A by swapping two columns, then det B = − det A. Applying this to A = (x_i^{α_j})_{1≤i≤N, 1≤j≤N} and B = (x_i^{β_j})_{1≤i≤N, 1≤j≤N} , we obtain

det ( (x_i^{β_j})_{1≤i≤N, 1≤j≤N} ) = − det ( (x_i^{α_j})_{1≤i≤N, 1≤j≤N} )

(since the N × N-matrix (x_i^{β_j})_{1≤i≤N, 1≤j≤N} is obtained from the matrix (x_i^{α_j})_{1≤i≤N, 1≤j≤N} by swapping two columns). In view of (441) and (442), this rewrites as a_β = − a_α . This proves Lemma 7.3.39 (b).

Some details omitted from the proof of Lemma 7.3.34. In our above proof of Lemma 7.3.34,
we have omitted certain arguments – namely, the proofs of the equalities (304) and
(305) in the proof of Observation 2. Let us now show these proofs:190

188 Indeed, if the N-tuple ( β_1 , β_2 , . . . , β_N ) is obtained from (α_1 , α_2 , . . . , α_N ) by swapping the u-th and the v-th entries, then the matrix (x_i^{β_j})_{1≤i≤N, 1≤j≤N} is obtained from the matrix (x_i^{α_j})_{1≤i≤N, 1≤j≤N} by swapping the u-th and the v-th columns.
189 more precisely: by the analogue of Theorem 6.4.12 (b) for columns instead of rows
190 In both of these proofs, we will use the notations that were introduced in the proof of Observation 2 in the proof of Lemma 7.3.34.



[Proof of (304): The definition of cont ( T ∗ ) yields

(cont ( T ∗ ))_{k+1} = (# of (k + 1)’s in T ∗ )
   = (# of (k + 1)’s in col_{<j} ( T ∗ )) + (# of (k + 1)’s in col_{≥j} ( T ∗ ))
        (by (301), applied to i = k + 1)
   = (# of (k + 1)’s in β_k (col_{<j} T )) + (# of (k + 1)’s in col_{≥j} T )
        (since col_{<j} ( T ∗ ) = β_k (col_{<j} T ) by (294), and col_{≥j} ( T ∗ ) = col_{≥j} T by (295))
   = (# of k’s in col_{<j} T ) + (b_{k+1} − ν_{k+1})
        (by (269), applied to col_{<j} T instead of T, and by (299))
   = (# of k’s in col_{<j} T ) + b_k + 1 − ν_{k+1}        (since b_{k+1} = b_k + 1).

However, γ = ν + cont ( T ∗ ) + ρ, so that

γ_{k+1} = (ν + cont ( T ∗ ) + ρ)_{k+1} = ν_{k+1} + (cont ( T ∗ ))_{k+1} + ρ_{k+1}
   = ν_{k+1} + ((# of k’s in col_{<j} T ) + b_k + 1 − ν_{k+1}) + ( N − (k + 1))
        (since ρ_{k+1} = N − (k + 1) by the definition of ρ)
   = (# of k’s in col_{<j} T ) + b_k + N − k.        (443)

On the other hand, the definition of cont T yields

(cont T )_k = (# of k’s in T ) = (# of k’s in col_{<j} T ) + (# of k’s in col_{≥j} T )
        (by (300), applied to i = k)
   = (# of k’s in col_{<j} T ) + b_k − ν_k        (by (298)).

However, α = ν + cont T + ρ, so that

α_k = (ν + cont T + ρ)_k = ν_k + (cont T )_k + ρ_k
   = ν_k + ((# of k’s in col_{<j} T ) + b_k − ν_k ) + ( N − k)        (since ρ_k = N − k by the definition of ρ)
   = (# of k’s in col_{<j} T ) + b_k + N − k.

Comparing this with (443), we obtain γ_{k+1} = α_k . This proves (304).]



[Proof of (305): Let i ∈ [ N ] be such that i ̸= k and i ̸= k + 1. We must prove that γ_i = α_i .
The definition of cont ( T ∗ ) yields

(cont ( T ∗ ))_i = (# of i’s in T ∗ )
   = (# of i’s in col_{<j} ( T ∗ )) + (# of i’s in col_{≥j} ( T ∗ ))        (by (301))
   = (# of i’s in β_k (col_{<j} T )) + (# of i’s in col_{≥j} T )
        (since col_{<j} ( T ∗ ) = β_k (col_{<j} T ) by (294), and col_{≥j} ( T ∗ ) = col_{≥j} T by (295))
   = (# of i’s in col_{<j} T ) + (# of i’s in col_{≥j} T )
        (by (270), applied to col_{<j} T instead of T).

On the other hand, the definition of cont T yields

(cont T )_i = (# of i’s in T ) = (# of i’s in col_{<j} T ) + (# of i’s in col_{≥j} T )

(by (300)). Comparing these two equalities, we obtain (cont ( T ∗ ))_i = (cont T )_i .
However, γ = ν + cont ( T ∗ ) + ρ, so that

γ_i = (ν + cont ( T ∗ ) + ρ)_i = ν_i + (cont ( T ∗ ))_i + ρ_i = ν_i + (cont T )_i + ρ_i .

However, α = ν + cont T + ρ, so that

α_i = (ν + cont T + ρ)_i = ν_i + (cont T )_i + ρ_i .

Comparing these two equalities, we obtain γ_i = α_i . This proves (305).]

References
[17f-hw7s] Darij Grinberg, UMN Fall 2017 Math 4990 homework set #7 with solu-
tions, http://www.cip.ifi.lmu.de/~grinberg/t/17f/hw7os.pdf

[18f-hw2s] Darij Grinberg, UMN Fall 2018 Math 5705 homework set #2 with so-
lutions, http://www.cip.ifi.lmu.de/~grinberg/t/18f/hw2s.pdf

[18f-hw4s] Darij Grinberg, UMN Fall 2018 Math 5705 homework set #4 with so-
lutions, http://www.cip.ifi.lmu.de/~grinberg/t/18f/hw4s.pdf
Math 701 Spring 2021, version April 6, 2024 page 679

[18f-hw4se] Jacob Elafandi, Math 5705: Enumerative Combinatorics, Fall 2018:


Homework 4: solutions to exercises 1, 2, 3, 7.
http://www.cip.ifi.lmu.de/~grinberg/t/18f/hw4s-elafandi.
pdf
[18f-mt3s] Darij Grinberg, Math 5705: Enumerative Combinatorics, Fall 2018:
Midterm 3 with solutions.
https://www.cip.ifi.lmu.de/~grinberg/t/18f/mt3s.pdf
[19fla] Darij Grinberg, Math 201-003: Linear Algebra, Fall 2019.
http://www.cip.ifi.lmu.de/~grinberg/t/19fla/
[19s] Darij Grinberg, Introduction to Modern Algebra (UMN Spring 2019
Math 4281 notes), 29 June 2019.
http://www.cip.ifi.lmu.de/~grinberg/t/19s/notes.pdf
[19s-mt3s] Darij Grinberg, Math 4281: Introduction to Modern Algebra, Spring
2019: Midterm 3 with solutions.
https://www.cip.ifi.lmu.de/~grinberg/t/19s/mt3s.pdf
[19fco] Darij Grinberg, Enumerative Combinatorics: class notes (Drexel Fall
2019 Math 222 notes), 11 September 2022.
http://www.cip.ifi.lmu.de/~grinberg/t/19fco/n/n.pdf
[20f] Darij Grinberg, Math 235: Mathematical Problem Solving, 22 March
2021.
http://www.cip.ifi.lmu.de/~grinberg/t/20f/mps.pdf
[22fco] Darij Grinberg, Math 220: Enumerative Combinatorics, Fall 2022.
https://www.cip.ifi.lmu.de/~grinberg/t/22fco/
[23wa] Darij Grinberg, An introduction to the algebra of rings and fields, 17
August 2023.
https://www.cip.ifi.lmu.de/~grinberg/t/23wa/23wa.pdf
[Aigner07] Martin Aigner, A Course in Enumeration, Graduate Texts in Mathe-
matics #238, Springer 2007.
[AndEri04] George E. Andrews, Kimmo Eriksson, Integer Partitions, Cam-
bridge University Press 2004.
[AndFen04] Titu Andreescu, Zuming Feng, A Path to Combinatorics for Under-
graduates: Counting Strategies, Springer 2004.
[Andrew16] George E. Andrews, Euler’s Partition Identity – Finite Version, 2016.
http://www.personal.psu.edu/gea1/pdf/317.pdf
[ApaKau13] Ainhoa Aparicio Monforte, Manuel Kauers, Formal Laurent series
in several variables, Expositiones Mathematicae, 31(4), pp. 350–367.
Math 701 Spring 2021, version April 6, 2024 page 680

[Armstr19] Drew Armstrong, Abstract Algebra I (Fall 2018) and Abstract Algebra
II (Spring 2019) lecture notes, 2019.
https://www.math.miami.edu/~armstrong/561fa18.php
https://www.math.miami.edu/~armstrong/562sp19.php

[Artin10] Michael Artin, Algebra, 2nd edition, Pearson 2010.

[Bell06] Jordan Bell, Euler and the pentagonal number theorem,


arXiv:math/0510054v2.

[BenQui03] Arthur T. Benjamin and Jennifer J. Quinn, Proofs that Really Count:
The Art of Combinatorial Proof, The Mathematical Association of
America, 2003.

[BenQui04] Arthur T. Benjamin and Jennifer J. Quinn, Proofs that Really Count:
The Magic of Fibonacci Numbers and More, Mathematical Adventures
for Students and Amateurs, (David F. Hayes and Tatiana Shubin,
editors), Spectrum Series of MAA, pp. 83–98, 2004.

[BenQui08] Arthur T. Benjamin and Jennifer J. Quinn, An Alternate Approach


to Alternating Sums: A Method to DIE for, The College Mathematics
Journal, Volume 39, Number 3, May 2008, pp. 191-202(12).

[Berndt06] Bruce C. Berndt, Number Theory in the Spirit of Ramanujan, Student


Mathematical Library #34, AMS 2006.
See https://faculty.math.illinois.edu/~berndt/
spiritcorrections.pdf for errata.

[Berndt17] Bruce C. Berndt, Spring 2017, MATH 595. Theory of Partitions, lec-
ture notes, 2017.
https://conf.math.illinois.edu/~berndt/math595-tp.html

[Bharga00] Manjul Bhargava, The Factorial Function and Generalizations, The


American Mathematical Monthly 107, No. 9 (Nov., 2000), pp. 783–
799.

[BjoBre05] Anders Bjorner, Francesco Brenti, Combinatorics of Coxeter Groups,


Springer 2005.
See https://www.mat.uniroma2.it/~brenti/correct.ps for er-
rata.

[Bona12] Miklos Bóna, Combinatorics of Permutations, 2nd edition, Tay-


lor&Francis 2012.
https://doi.org/10.1201/b12210

[Bourba02] Nicolas Bourbaki, Lie Groups and Lie Algebras: Chapters 4–6,
Springer 2002.
Math 701 Spring 2021, version April 6, 2024 page 681

[Bourba03] Nicolas Bourbaki, Algebra II: Chapters 4–7, Springer 2003.

[Bourba68] Nicolas Bourbaki, Theory of Sets, Springer 1968.

[Bourba74] Nicolas Bourbaki, Algebra I: Chapters 1–3, Addison-Wesley 1974.

[Brande14] Petter Brändén, Unimodality, log-concavity, real-rootedness and be-


yond, arXiv:1410.6601v1.

[Bresso99] David M. Bressoud, Proofs and Confirmations: The Story of the


Alternating Sign Matrix Conjecture, Cambridge University Press
1999.
See https://www.macalester.edu/~bressoud/books/PnC/
PnCcorrect.html for errata.

[Brewer14] Thomas S. Brewer, Algebraic properties of formal power series compo-


sition, PhD thesis at University of Kentucky, 2014.
https://uknowledge.uky.edu/cgi/viewcontent.cgi?article=
1021&context=math_etds

[BruRys91] Richard A. Brualdi and Herbert J. Ryser, Combinatorial Matrix The-


ory, Cambridge University Press 1991.

[BruSch83] Richard A. Brualdi and Hans Schneider, Determinantal Identities:


Gauss, Schur, Cauchy, Sylvester, Kronecker, Jacobi, Binet, Laplace, Muir,
and Cayley, Linear Algebra and its Applications 52–53, July 1983,
pp. 769–791.

[Camero16] Peter J. Cameron, Combinatorics 1: The art of counting (vol. 1 of


St Andrews Notes on Advanced Combinatorics), 28 March 2016,
https://cameroncounts.wordpress.com/lecture-notes/ .
See http://www.cip.ifi.lmu.de/~grinberg/algebra/
acnotes1-errata.pdf for corrections.

[Cohen08] Arjeh M. Cohen, Coxeter groups: Notes of a MasterMath course, Fall


2007, January 24, 2008.
http://arpeg.nl/wp-content/uploads/2016/01/CoxNotes.pdf

[Cohn04] Henry Cohn, Projective geometry over F1 and the Gaussian binomial
coefficients, American Mathematical Monthly 111 (2004), pp. 487–
495, arXiv:math/0407093v1.

[Comtet74] Louis Comtet, Advanced Combinatorics: The Art of Finite and Infinite
Expansions, D. Reidel Publishing Company, 1974.

[Conrad-UI] Keith Conrad, Universal identities, 13 February 2021.


https://kconrad.math.uconn.edu/blurbs/linmultialg/univid.
pdf
Math 701 Spring 2021, version April 6, 2024 page 682

[Dodgso67] Charles L. Dodgson, Elementary Treatise on Determinants with their
Applications to simultaneous linear equations and algebraical geometry,
Macmillan 1867.

[Doyle19] Peter G. Doyle, Frobenius’s last proof, arXiv:1904.06573v1.
See http://www.cip.ifi.lmu.de/~grinberg/algebra/doyle-frob-errata.pdf
for corrections.

[DumFoo04] David S. Dummit, Richard M. Foote, Abstract Algebra, 3rd edition,
Wiley 2004. ISBN: 978-0-471-43334-7.
See http://www.cems.uvm.edu/~rfoote/errata_3rd_edition.pdf for errata.

[EdeStr04] Alan Edelman and Gilbert Strang, Pascal Matrices, American
Mathematical Monthly, Vol. 111, No. 3 (March 2004), pp. 189–197.

[Edward05] Harold M. Edwards, Essays in Constructive Mathematics, Springer
2005.

[Egge19] Eric S. Egge, An Introduction to Symmetric Functions and Their
Combinatorics, AMS 2019.
See https://www.ericegge.net/cofsf/index.html for corrections and
addenda.

[EGHetc11] Pavel Etingof, Oleg Golberg, Sebastian Hensel, Tiankai Liu, Alex
Schwendner, Dmitry Vaintrob, Elena Yudovina, Introduction to
Representation Theory, with historical interludes by Slava Gerovitch,
Student Mathematical Library 59, AMS 2011, updated version 2018.

[Erdos42] P. Erdős, On an elementary proof of some asymptotic formulas in the
theory of partitions, Annals of Mathematics 43 (1942), pp. 437–450.

[Euler48] Leonhard Euler, Introductio in analysin infinitorum, tomus 1,
Lausannæ 1748.

[Fink17] Alex Fink, Enumerative Combinatorics, module taught at the London
Taught Course Centre, 2017.
http://www.maths.qmul.ac.uk/~fink/enumcombi/

[FlaSed09] Philippe Flajolet, Robert Sedgewick, Analytic Combinatorics,
Cambridge University Press 2009.
https://algo.inria.fr/flajolet/Publications/book.pdf

[FoaHan04] Dominique Foata, Guo-Niu Han, The q-series in combinatorics;
permutation statistics, preliminary version, 5 May 2011.
https://irma.math.unistra.fr/~guoniu/papers/p56lectnotes2.pdf

[Ford21] Timothy J. Ford, Abstract Algebra, draft of a book, 10 October 2021.
http://math.fau.edu/ford/preprints/Algebra_Book/Algebra_Book.pdf

[Fulton97] William Fulton, Young Tableaux, With Applications to Representation
Theory and Geometry, London Mathematical Society Student Texts 35,
Cambridge University Press 1997.

[GaiGup77] P. Gaiha, S. K. Gupta, Adjacent Vertices on a Permutohedron, SIAM
Journal on Applied Mathematics 32(2) (1977), pp. 323–327.

[Galvin17] David Galvin, Basic discrete mathematics, 13 December 2017.
https://www3.nd.edu/~dgalvin1/60610/60610_S21/index.html
(Follow the Overleaf link and compile main.tex and Course-notes.tex.
See also https://web.archive.org/web/20180205122609/http://www-users.math.umn.edu/~dgrinber/comb/60610lectures2017-Galvin.pdf
for an archived old version.)

[Gashar98] Vesselin Gasharov, A Short Proof of the Littlewood–Richardson Rule,
Europ. J. Combinatorics 19 (1998), pp. 451–453.

[Gauss08] Carl Friedrich Gauß, Summatio quarumdam serierum singularium,
Comm. soc. reg. sci. Gottingensis rec. 1 (1811).

[Gauss16] C. F. Gauss, Demonstratio nova altera theorematis omnem functionem
algebraicam rationalem integram unius variabilis in factores reales primi
vel secundi gradus resolvi posse, Comm. Recentiores 3 (1816),
pp. 107–142.

[GesVie85] Ira Gessel, Gérard Viennot, Binomial Determinants, Paths, and Hook
Length Formulae, Advances in Mathematics 58 (1985), pp. 300–321.

[GesVie89] Ira M. Gessel, X. G. Viennot, Determinants, Paths, and Plane
Partitions, 1989 preprint.
https://peeps.unet.brandeis.edu/~gessel/homepage/papers/pp.pdf

[Godsil06] Chris Godsil, Lecture Notes on Combinatorics, version 5 December
2006.
https://web.archive.org/web/20070824060559/http://www.math.uwaterloo.ca/~dgwagner/MATH249/enum.pdf

[Goodma15] Frederick M. Goodman, Algebra: Abstract and Concrete, edition 2.6,
1 May 2015.
http://homepage.math.uiowa.edu/~goodman/algebrabook.dir/book.2.6.pdf

[GouJac83] I. P. Goulden, D. M. Jackson, Combinatorial Enumeration, John Wiley
& Sons 1983, reprinted by Dover 2004.

[Grinbe09] Darij Grinberg, Solution to Problem 19.9 from “Problems from the
Book”.
http://www.cip.ifi.lmu.de/~grinberg/solutions.html

[Grinbe10] Darij Grinberg, A hyperfactorial divisibility, version of 27 July
2015.
http://www.cip.ifi.lmu.de/~grinberg/

[Grinbe15] Darij Grinberg, Notes on the combinatorial fundamentals of algebra,
15 September 2022.
http://www.cip.ifi.lmu.de/~grinberg/primes2015/sols.pdf
The numbering of theorems and formulas in this link might shift when the
project gets updated; for a “frozen” version whose numbering is
guaranteed to match that in the citations above, see
https://github.com/darijgr/detnotes/releases/tag/2022-09-15c or
arXiv:2008.09862v3.

[Grinbe17] Darij Grinberg, Why the log and exp series are mutually inverse,
11 May 2018.
https://www.cip.ifi.lmu.de/~grinberg/t/17f/logexp.pdf

[Grinbe18] Darij Grinberg, The diamond lemma and its applications (talk),
20 May 2018.
https://www.cip.ifi.lmu.de/~grinberg/algebra/diamond-talk.pdf

[Grinbe19] Darij Grinberg, The trace Cayley-Hamilton theorem, 17 October 2022.
https://www.cip.ifi.lmu.de/~grinberg/algebra/trach.pdf

[Grinbe20] Darij Grinberg, Alternierende Summen: Aufgaben und Lösungen,
28 June 2022.
https://www.cip.ifi.lmu.de/~grinberg/algebra/aimo2020-altsum-lsg.pdf

[Grinbe21] Darij Grinberg, Regular elements of a ring, monic polynomials and
“lcm-coprimality”, 22 May 2021.
https://www.cip.ifi.lmu.de/~grinberg/algebra/regpol.pdf

[Grinbe23] Darij Grinberg, An introduction to graph theory, arXiv:2308.04512v1.

[GriRei20] Darij Grinberg, Victor Reiner, Hopf algebras in Combinatorics,
version of 27 July 2020, arXiv:1409.8356v7.
See also http://www.cip.ifi.lmu.de/~grinberg/algebra/HopfComb-sols.pdf
for a version that gets updated.

[GrKnPa94] Ronald L. Graham, Donald E. Knuth, Oren Patashnik, Concrete
Mathematics, Second Edition, Addison-Wesley 1994.
See https://www-cs-faculty.stanford.edu/~knuth/gkp.html for errata.

[Guicha20] David Guichard, An Introduction to Combinatorics and Graph Theory,
23 April 2021.
https://www.whitman.edu/mathematics/cgt_online/book/

[Hellel08] Geir T. Helleloid, Algebraic Combinatorics, 11 November 2008.
http://libgen.rs/book/index.php?md5=421E2DABBCD43E900BC280AF5A122FE6

[Henric74] Peter Henrici, Applied and Computational Complex Analysis, volume 1,
Wiley 1974.

[Hirsch17] Michael D. Hirschhorn, The Power of q: A Personal Journey, Springer
2017.
See https://link.springer.com/chapter/10.1007/978-3-319-57762-3_44
for errata.

[Hirsch87] Michael D. Hirschhorn, A simple proof of Jacobi’s four-square
theorem, Proceedings of the American Mathematical Society 101 (1987),
pp. 436–438.

[Hopkin17] Sam Hopkins, RSK via local transformations, 13 September 2022.
https://www.samuelfhopkins.com/docs/rsk.pdf

[Johnso20] Warren Pierstorff Johnson, Introduction to q-analysis, American
Mathematical Society 2020.

[Joyner08] W. D. Joyner, Mathematics of the Rubik’s cube, 19 August 2008.
https://web.archive.org/web/20160304122348/http://www.permutationpuzzles.org/rubik/webnotes/
(link to the PDF at the bottom).

[KacChe02] Victor Kac, Pokman Cheung, Quantum Calculus, Springer 2002.

[Kitaev11] Sergey Kitaev, Patterns in Permutations and Words, Springer 2011.

[KlaPol79] David Klarner, Jordan Pollack, Domino tilings of rectangles with
fixed width, Discrete Mathematics 32(1), pp. 45–52.

[KliSch97] Anatoli Klimyk, Konrad Schmüdgen, Quantum groups and their
representations, Springer 1997.

[Knapp16] Anthony W. Knapp, Basic Algebra, Digital Second Editions By
Anthony W. Knapp, 2017.
http://www.math.stonybrook.edu/~aknapp/download.html

[Knuth1] Donald Ervin Knuth, The Art of Computer Programming, volume 1:
Fundamental Algorithms, 3rd edition, Addison–Wesley 1997.
See https://www-cs-faculty.stanford.edu/~knuth/taocp.html for errata.

[Knuth2] Donald Ervin Knuth, The Art of Computer Programming, volume 2:
Seminumerical Algorithms, 3rd edition, Addison–Wesley 1998.
See https://www-cs-faculty.stanford.edu/~knuth/taocp.html for errata.

[Knuth3] Donald Ervin Knuth, The Art of Computer Programming, volume 3:
Sorting and Searching, 2nd edition, Addison–Wesley 1998.
See https://www-cs-faculty.stanford.edu/~knuth/taocp.html for errata.

[Koch16] Dick Koch, The Pentagonal Number Theorem and All That, 26 August
2016.
https://darkwing.uoregon.edu/~koch/PentagonalNumbers.pdf

[KraPro10] Hanspeter Kraft, Claudio Procesi, Classical invariant theory: A
primer, July 1996.
https://kraftadmin.wixsite.com/hpkraft
See http://www.cip.ifi.lmu.de/~grinberg/algebra/KP-errata-web.pdf
for unofficial errata.

[Kratte17] Christian Krattenthaler, Lattice Path Enumeration,
arXiv:1503.05930v3, published in: Handbook of Enumerative Combinatorics,
M. Bóna (ed.), Discrete Math. and Its Appl., CRC Press, Boca
Raton-London-New York, 2015, pp. 589–678.
https://arxiv.org/abs/1503.05930v3

[Kratte99] Christian Krattenthaler, Advanced Determinant Calculus, Séminaire
Lotharingien Combin. 42 (1999) (The Andrews Festschrift), paper B42q,
67 pp., arXiv:math/9902004v3.

[Krishn86] V. Krishnamurthy, Combinatorics: Theory and Applications, Ellis
Horwood Ltd. 1986.

[Krob95] Daniel Krob, Eléments de combinatoire, version 1.0, 1995.
http://krob.cesames.net/IMG/ps/combi.ps

[Lando03] Sergei K. Lando, Lectures on Generating Functions, Student
Mathematical Library 23, AMS 2003.

[Laue15] Hartmut Laue, Determinants, version 17 May 2015.
http://www.math.uni-kiel.de/algebra/laue/homepagetexte/det.pdf

[LLPT95] D. Laksov, A. Lascoux, P. Pragacz, and A. Thorup, The LLPT Notes,
edited by A. Thorup, 28 March 2018.
http://web.math.ku.dk/noter/filer/sympol.pdf

[Loehr11] Nicholas A. Loehr, Bijective Combinatorics, Chapman & Hall/CRC
2011.

[Macdon95] Ian G. Macdonald, Symmetric Functions and Hall Polynomials, Oxford
Mathematical Monographs, 2nd edition, Oxford Science Publications 1995.

[Martin13] Jeremy L. Martin, Counting Dyck Paths, 11 September 2013.
https://jlmartin.ku.edu/~jlmartin/courses/math724-F13/count-dyck.pdf

[Martin21] Jeremy L. Martin, Lecture Notes on Algebraic Combinatorics,
23 August 2023.
https://jlmartin.ku.edu/LectureNotes.pdf

[Melcze24] Stephen Melczer, An Invitation to Enumeration, lecture notes 2024.
https://enumeration.ca/

[MenRem15] Anthony Mendes, Jeffrey Remmel, Counting with Symmetric Functions,
Springer 2015.

[MiRiRu87] Ray Mines, Fred Richman, Wim Ruitenburg, A Course in Constructive
Algebra, Springer 1988.

[Muir30] Thomas Muir, The theory of determinants in the historical order of
development, 5 volumes (1906–1930), later reprinted by Dover.
http://www-igm.univ-mlv.fr/~al/

[MuiMet60] Thomas Muir, A Treatise on the Theory of Determinants, revised and
enlarged by William H. Metzler, Dover 1960.

[Mulhol21] Jamie Mulholland, Permutation Puzzles: A Mathematical Perspective,
12 January 2021.
http://www.sfu.ca/~jtmulhol/math302/notes/permutation-puzzles-book.pdf

[Ness61] Wilhelm Ness, Proben aus der elementaren additiven Zahlentheorie,
Otto Salle Verlag, Frankfurt am Main / Hamburg 1961.

[Newste19] Clive Newstead, An Infinite Descent into Pure Mathematics,
version 0.4, 1 January 2020.
https://infinitedescent.xyz

[Niven69] Ivan Niven, Formal Power Series, The American Mathematical Monthly
76, No. 8 (Oct., 1969), pp. 871–889.
https://www.maa.org/programs/maa-awards/writing-awards/formal-power-series

[OlvSha18] Peter J. Olver, Chehrzad Shakiban, Applied Linear Algebra, 2nd
edition, Springer 2018.
https://doi.org/10.1007/978-3-319-91041-3
See http://www.math.umn.edu/~olver/ala.html for errata.

[Pak06] Igor Pak, Partition bijections, a survey, Ramanujan J 12 (2006),
pp. 5–75.
See https://www.math.ucla.edu/~pak/papers/research.htm for a preprint
and updates.

[Prasad15] Amritanshu Prasad, Representation Theory: A Combinatorial
Viewpoint, Cambridge University Press 2015.

[Prasol94] Viktor V. Prasolov, Problems and Theorems in Linear Algebra,
Translations of Mathematical Monographs, vol. #134, AMS 1994.

[Proces07] Claudio Procesi, Lie Groups: An Approach through Invariants and
Representations, Springer 2007.

[Quinla21] Rachel Quinlan, MA3343 Groups, Semester 2020-2021.
http://www.maths.nuigalway.ie/~rquinlan/groups/

[Robins05] Donald W. Robinson, The classical adjoint, Linear Algebra and its
Applications 411 (2005), pp. 254–276.

[Sagan01] Bruce Sagan, The Symmetric Group, Graduate Texts in Mathematics 203,
2nd edition 2001.
https://doi.org/10.1007/978-1-4757-6804-6
See https://users.math.msu.edu/users/bsagan/Books/Sym/errata.pdf for
errata.

[Sagan19] Bruce Sagan, Combinatorics: The Art of Counting, Graduate Studies in
Mathematics 210, 21 September 2020.
https://users.math.msu.edu/users/bsagan/Books/Aoc/final.pdf
See https://users.math.msu.edu/users/bsagan/Books/Aoc/errata.pdf for
errata.

[Sam19] Steven V. Sam, Notes for Math 184: Combinatorics, 9 December 2019.
https://math.ucsd.edu/~ssam/old/19F-184/notes.pdf

[Sam21] Steven V. Sam, Notes for Math 188: Algebraic Combinatorics, 17 May
2021.
https://math.ucsd.edu/~ssam/188/notes-188.pdf

[Sambal22] Benjamin Sambale, An invitation to formal power series,
arXiv:2205.00879v5.

[Savage22] Alistair Savage, Symmetric Functions, lecture notes, 2 May 2022.
https://alistairsavage.ca/symfunc/notes/Savage-SymmetricFunctions.pdf

[Schwar16] Rich Schwartz, The Cauchy-Binet Theorem, 9 February 2016.
https://www.math.brown.edu/reschwar/M123/cauchy.pdf

[Sills18] Andrew V. Sills, An Invitation to the Rogers–Ramanujan Identities,
CRC Press 2018.
See http://home.dimacs.rutgers.edu/~asills/ for errata.

[Smid09] Vita Smid, Inclusion-Exclusion Principle: Proof by Mathematical
Induction, 2 December 2009.
https://faculty.math.illinois.edu/~nirobles/files453/iep_proof.pdf

[Spivey19] Michael Z. Spivey, The Art of Proving Binomial Identities, CRC
Press 2019.
See https://mathcs.pugetsound.edu/~mspivey/Errata.html for errata.

[Stanko94] Zvezdelina E. Stankova, Forbidden subsequences, Discrete
Mathematics 132 (1994), pp. 291–316.

[Stanle11] Richard P. Stanley, Enumerative Combinatorics, volume 1, Second
edition, Cambridge University Press 2012.
See http://math.mit.edu/~rstan/ec/ for a draft (2021) and errata.

[Stanle15] Richard P. Stanley, Catalan Numbers, 1st edition, Cambridge
University Press 2015.
See http://math.mit.edu/~rstan/catalan/ for errata.

[Stanle18] Richard P. Stanley, Algebraic Combinatorics: Walks, Trees,
Tableaux, and More, 2nd edition, Springer 2018.
See http://www-math.mit.edu/~rstan/algcomb/index.html for errata.

[Stanle23] Richard P. Stanley, Enumerative Combinatorics, volume 2, Second
edition, Cambridge University Press 2023.
See http://math.mit.edu/~rstan/ec/ for errata.

[Stanle89] Richard P. Stanley, Log-Concave and Unimodal Sequences in Algebra,
Combinatorics, and Geometry, Annals of the New York Academy of Sciences,
576 (1 Graph Theory), pp. 500–535.

[Stembr02] John R. Stembridge, A Concise Proof of the Littlewood–Richardson
Rule, Electronic Journal of Combinatorics 9 (2002), #N5.

[Strick13] Neil Strickland, MAS201 Linear Mathematics for Applications,
lecture notes, 28 September 2013.
http://neil-strickland.staff.shef.ac.uk/courses/MAS201/
See http://www.cip.ifi.lmu.de/~grinberg/t/19fla/MAS201.pdf for an
updated version of the lecture notes.

[Strick20] Neil Strickland, MAS334 Combinatorics, lecture notes and solutions,
6 December 2020.
https://strickland1.org/courses/MAS334/

[Stucky15] Eric Stucky, An Exposition of Kasteleyn’s Solution of the Dimer
Model, senior thesis at Harvey Mudd College, 2015.
https://scholarship.claremont.edu/hmc_theses/89/

[Talask12] Kelli Talaska, Determinants of weighted path matrices,
arXiv:1202.3128v1.

[Tignol16] Jean-Pierre Tignol, Galois’ Theory of Algebraic Equations, 2nd
edition, World Scientific 2016.

[Uecker17] Torsten Ueckerdt, Lecture Notes Combinatorics (2017), 30 May 2017.
http://www.math.kit.edu/iag6/lehre/combinatorics2017s/media/script.pdf
See http://www.cip.ifi.lmu.de/~grinberg/algebra/ueckerdt-script2017-errata.pdf
for an unofficial list of errata.

[Vorobi02] Nicolai N. Vorobiev, Fibonacci Numbers, Translated from the Russian
by Mircea Martin, Springer 2002 (translation of the 6th Russian edition).

[Wagner05] Carl G. Wagner, Basic Combinatorics, 14 February 2005.
http://www.math.utk.edu/~wagner/papers/comb.pdf

[Wagner08] David G. Wagner, C&O 330: Introduction to Combinatorial
Enumeration, version 20 June 2012.
https://melczer.ca/330/WagnerNotes.pdf

[Wagner17] Stephan Wagner, Combinatorics, 19 June 2017.
https://web.archive.org/web/20221222034640/https://math.sun.ac.za/swagner/NotesComb.pdf

[Wagner20] Carl G. Wagner, A First Course in Enumerative Combinatorics, Pure
and Applied Undergraduate Texts 49, AMS 2020.

[White10] Dennis White, Math 4707: Inclusion-Exclusion and Derangements,
18 October 2010.
https://www-users.cse.umn.edu/~reiner/Classes/Derangements.pdf

[Wildon19] Mark Wildon, Introduction to Combinatorics, 14 September 2020.
http://www.ma.rhul.ac.uk/~uvah099/Maths/CombinatoricsWeb.pdf

[Wildon20] Mark Wildon, An involutive introduction to symmetric functions,
8 May 2020.
http://www.ma.rhul.ac.uk/~uvah099/Maths/Sym/SymFuncs2020.pdf
See http://www.cip.ifi.lmu.de/~grinberg/algebra/symfuncs2017-2020-05-08-errata.pdf
for an unofficial list of errata.

[Wilf04] Herbert S. Wilf, generatingfunctionology, 2nd edition 2004.
https://www.math.upenn.edu/~wilf/DownldGF.html

[Wilf09] Herbert S. Wilf, Lectures on Integer Partitions, 2009.
https://www.math.upenn.edu/~wilf/PIMS/PIMSLectures.pdf

[Zabroc03] Mike Zabrocki, F. Franklin’s proof of Euler’s pentagonal number
theorem, 28 February 2003.
https://garsia.math.yorku.ca/~zabrocki/math4160w03/eulerpnt.pdf

[Zeilbe85] Doron Zeilberger, A combinatorial approach to matrix algebra,
Discrete Mathematics 56 (1985), pp. 61–72.

[Zeilbe98] Doron Zeilberger, Dodgson’s Determinant-Evaluation Rule proved by
Two-Timing Men and Women, The Electronic Journal of Combinatorics,
vol. 4, issue 2 (1997) (The Wilf Festschrift volume), R22.
http://www.combinatorics.org/ojs/index.php/eljc/article/view/v4i2r22
Also available as arXiv:math/9808079v1.
http://arxiv.org/abs/math/9808079v1

[Zelevi81] A. V. Zelevinsky, A generalization of the Littlewood–Richardson
rule and the Robinson–Schensted–Knuth correspondence, J. Algebra 69
(1981), pp. 82–94.

[Zeng93] Jiang Zeng, A bijective proof of Muir’s identity and the Cauchy-Binet
formula, Linear Algebra and its Applications 184, 15 April 1993,
pp. 79–82.
