
Graph Theory and Additive Combinatorics

Yufei Zhao
Massachusetts Institute of Technology

This is a book in progress. It has not been carefully proofread. Your feedback is appreciated in
improving this draft.
Please report errors, suggestions, and comments to
https://bit.ly/gtac-form
Current students of MIT 18.225 should use the Piazza forum instead.

This version was last compiled: November 1, 2021


Table of contents

Asymptotic notation convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v


Chapter 0. Appetizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1. Schur’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.2. Progressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Chapter 1. Forbidding a subgraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1. Forbidding a triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2. Forbidding a clique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3. Turán density and supersaturation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4. Forbidding a complete bipartite graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5. Forbidding a general subgraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6. Forbidding cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.7. Forbidding a sparse bipartite graph (and dependent random choice) . . . . . . . . . . . 29
1.8. Lower bound constructions: overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.9. Randomized constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.10. Algebraic constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.11. Randomized algebraic constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 2. The graph regularity method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1. Szemerédi’s regularity lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2. Triangle counting lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3. Triangle removal lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.4. Graph theoretic proof of Roth’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.5. Constructing large 3-AP-free sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.6. Graph counting and removal lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.7. Exercises on applying graph regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.8. Induced graph removal and strong regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.9. Graph property testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.10. Hypergraph removal and Szemerédi’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.11. Hypergraph regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Chapter 3. Pseudorandom graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.1. Quasirandom graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2. Expander mixing lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.3. Cayley graphs on Z/pZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4. Quasirandom groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

3.5. Quasirandom Cayley graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97


3.6. Second eigenvalue bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Chapter 4. Graph limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.1. Graphons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.2. Cut distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.3. Homomorphism density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.4. W-random graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.5. Counting lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.6. Weak regularity lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.7. Martingale convergence theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.8. Compactness of the space of graphons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.9. Equivalence of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Chapter 5. Graph homomorphism inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.1. Edge versus triangle densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.2. Cauchy–Schwarz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.3. Hölder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.4. Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.5. Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.6. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Chapter 6. Forbidding 3-term arithmetic progressions . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.1. Fourier analysis in finite field vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.2. Roth’s theorem in the finite field model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.3. Fourier analysis in the integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.4. Roth’s theorem in the integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.5. The polynomial method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
6.6. Arithmetic regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.7. Popular common difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

Asymptotic notation convention


Each line below has the same meaning for positive functions f and g (as some parameter,
usually n, tends to infinity)
• f ≲ g, f = O(g), g = Ω(f), f ≤ Cg (for some constant C > 0)
• f/g → 0, f = o(g) (and sometimes g = ω(f))
• f = Θ(g), f ≍ g, g ≲ f ≲ g
• f ∼ g, f = (1 + o(1))g
• whp (= with high probability) means with probability 1 − o(1)
We will avoid using ≪ since this notation has different meanings in different communities and
by different authors. In analytic number theory, f ≪ g is standard for f = O(g) (this is called
Vinogradov notation). In combinatorics and probability, f ≪ g sometimes means f = o(g), and
sometimes means that f is “much smaller” than g (e.g., smaller than some function of g).
Subscripts (e.g., O_s(f), ≲_s) are used to emphasize that the hidden constants may depend on the
subscripted parameters.
Sometimes o_{s; n→∞}(f) is used to emphasize that s is fixed and n → ∞.
When asymptotic notation is used in the hypothesis of a statement, it should be interpreted as
being applied to a sequence rather than a single object. For example, given functions f and g, we
write
if G has f (G) = o(1), then g(G) = o(1)
to mean
whenever a sequence Gn satisfies f (Gn ) = o(1), then g(Gn ) = o(1),
which is also equivalent to saying that
for every ε > 0 there is some δ > 0 such that if |f(G)| ≤ δ then |g(G)| ≤ ε.
CHAPTER 0

Appetizer

0.1. Schur’s theorem


Schur (1916) considered the approach of trying to prove Fermat’s Last Theorem by reducing
the equation X^n + Y^n = Z^n modulo a prime p. However, it turns out this approach can never
work, as Dickson had already shown that the mod p reduction of Fermat’s equation can always be solved for
sufficiently large primes p, no matter what n is. Schur gave a simpler proof of this result by proving
the following theorem, showing that Dickson’s result is much more about combinatorics than about
number theory.

Theorem 0.1.1 (Schur’s theorem). If the positive integers are colored with finitely many colors,
then there is always a monochromatic solution to x + y = z (i.e., x, y, z all have the same color).
We will prove Schur’s theorem shortly. But first, let us show how to deduce the existence of
solutions to X n + Y n ≡ Z n (mod p) using Schur’s theorem.
Schur’s theorem is stated above in its “infinitary” form. It is equivalent to a “finitary” formulation
below.
We write [N] := {1, 2, . . . , N }.

Theorem 0.1.2 (Schur’s theorem, finitary version). For every positive integer r, there exists a
positive integer N = N(r) such that if the elements of [N] are colored with r colors, then there is a
monochromatic solution to x + y = z with x, y, z ∈ [N].
With the finitary version, we can also ask quantitative questions such as how big does N(r)
have to be as a function of r. For most questions of this type, we do not know the answer, even
approximately.
Let us show that the two formulations, Theorems 0.1.1 and 0.1.2, are equivalent. It is clear
that the finitary version of Schur’s theorem implies the infinitary version. To see that the infinitary
version implies the finitary version, fix r, and suppose that for every N there is some coloring
φ N : [N] → [r] that avoids monochromatic solutions to x + y = z. We can take an infinite
subsequence of (φ N ) such that, for every k ∈ N, the value of φ N (k) stabilizes to a constant as N
increases along this subsequence. Then the φ_N’s, along this subsequence, converge pointwise to
some coloring φ : N → [r] avoiding monochromatic solutions to x + y = z, but this contradicts the
infinitary statement.
Let us now deduce the claim about the modular Fermat’s equation discussed at the beginning.

Theorem 0.1.3. Let n be a positive integer. For all sufficiently large primes p, there are X, Y, Z ∈
{1, . . . , p − 1} such that X n + Y n ≡ Z n (mod p).
Proof of Theorem 0.1.3 assuming Schur’s theorem (Theorem 0.1.2). We write (Z/pZ)^× for the group
of nonzero residues mod p under multiplication. Let H = {x^n : x ∈ (Z/pZ)^×} be the subgroup of
n-th powers in (Z/pZ)^×. Since (Z/pZ)^× is a cyclic group of order p − 1 (due to the existence of
primitive roots mod p, a fact from elementary number theory), the index of H in (Z/pZ)^× is equal
to gcd(n, p − 1) ≤ n. So the cosets of H partition {1, 2, . . . , p − 1} into at most n sets. By the finitary
statement of Schur’s theorem (Theorem 0.1.2), for p large enough, there is a solution to
x + y = z (in Z)
in one of the cosets of H, say aH for some a ∈ (Z/pZ)^×. Since H consists of n-th powers, we have
x = aX^n, y = aY^n, and z = aZ^n for some X, Y, Z ∈ (Z/pZ)^×. Thus
aX^n + aY^n ≡ aZ^n (mod p).
Hence
X^n + Y^n ≡ Z^n (mod p)
as desired. □
Now let us prove Theorem 0.1.2 by deducing it from a similar sounding result about coloring
the edges of a complete graph. The next result is a special case of Ramsey’s theorem.

Figure 0.1.1. Frank Ramsey (1903–1930) had made major contributions to mathematical logic, philosophy, and economics, before his untimely death at age 26 after suffering from chronic liver problems.

Theorem 0.1.4 (Multicolor triangle Ramsey theorem). For every positive integer r, there is some
integer N = N(r) such that if the edges of KN , the complete graph on N vertices, are colored with
r colors, then there is always a monochromatic triangle.
Proof. Define
N_1 = 3, and N_r = r(N_{r−1} − 1) + 2 for all r ≥ 2. (0.1.1)
We will show by induction on r that every coloring of the edges of K_{N_r} by r colors has a monochromatic triangle. The case r = 1 holds trivially. Suppose the claim is true for r − 1 colors. Consider any coloring of the edges of K_{N_r} by r colors. Pick an arbitrary vertex v.
Of the N_r − 1 = r(N_{r−1} − 1) + 1 edges incident to v, by the pigeonhole principle, at least N_{r−1} edges incident to v have the same color, say red. Let V_0 be the vertices joined to v by a red edge. If there is a red edge inside V_0, we obtain a red triangle. Otherwise, there are at most r − 1 colors appearing among |V_0| ≥ N_{r−1} vertices, and we have a monochromatic triangle by induction. □
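To get a feel for the recursion (0.1.1), here is a tiny Python sketch (our own illustration) that unfolds it for small r and compares with the closed-form bound ⌈r! e⌉ from Exercise 0.1.5 below:

```python
import math

# Unfold N_1 = 3, N_r = r*(N_{r-1} - 1) + 2 from (0.1.1) and compare with ceil(r! * e).
N = 3
for r in range(1, 8):
    if r > 1:
        N = r * (N - 1) + 2
    print(r, N, math.ceil(math.factorial(r) * math.e))  # e.g. r = 3 gives 17, 17
```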

Exercise 0.1.5. Show that the N_r defined in (0.1.1) satisfies N_r = 1 + r! Σ_{i=0}^{r} 1/i!. Deduce that
N_r ≤ ⌈r! e⌉.
The smallest N(r) in Theorem 0.1.4 is also known as the multicolor triangle Ramsey number, and
usually denoted R(3, 3, . . . , 3) with 3 repeated r times. This number grows at least exponentially
quickly in r. For example, there is a coloring of the edges of K_{2^r} using r colors that avoids a
monochromatic triangle: writing the vertices as elements of {0, 1}^r, assign an edge the color i if i
is the smallest index such that the two endpoints differ on coordinate i (check that this coloring
has no monochromatic triangle!). Schur himself actually gave an even better lower bound: see
Exercise 0.1.7. Even better lower bounds have been found in modern times using a computer, e.g.,
N(r) ≥ c · 321^{r/5} for some constant c > 0 (Exoo 1994).
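The coloring of K_{2^r} just described is easy to check by machine for small r; here is a short Python sketch (our own illustration) verifying that it has no monochromatic triangle:

```python
from itertools import combinations

# Vertices are 0/1 vectors of length r; an edge gets color i, where i is the first
# coordinate at which its two endpoints differ.
def color(x, y):
    return next(i for i in range(len(x)) if x[i] != y[i])

def has_monochromatic_triangle(r):
    vertices = [tuple((v >> i) & 1 for i in range(r)) for v in range(2 ** r)]
    return any(color(x, y) == color(y, z) == color(x, z)
               for x, y, z in combinations(vertices, 3))

for r in range(1, 6):
    print(r, 2 ** r, has_monochromatic_triangle(r))  # always False
```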
It is a major open problem in Ramsey theory to determine the rate of growth of this Ramsey
number. It is related to other important topics in combinatorics such as the Shannon capacity of
graphs (see, e.g., Nešetřil and Rosenfeld 2001). In particular, the following still-open question is
one of Erdős’ favorite problems.
Open problem 0.1.6. Is there a constant C > 0 such that Theorem 0.1.4 holds with N(r) ≤ C^r for
all r?
We are now ready to prove Schur’s theorem by setting up a graph whose triangles correspond
to solutions to x + y = z, thereby allowing us to “transfer” the above result to the integers.


Proof of Schur’s theorem (Theorem 0.1.2). Let φ : [N] → [r] be a coloring. Color the edges of a
complete graph with vertices {1, . . . , N + 1} by giving the edge {i, j} with i < j the color φ( j − i).
By Theorem 0.1.4, if N is large enough, then there is a monochromatic triangle, say on vertices
i < j < k. So φ( j − i) = φ(k − j) = φ(k − i). Take x = j − i, y = k − j, and z = k − i. Then
φ(x) = φ(y) = φ(z) and x + y = z, as desired. □
Notice how we solved a number theory problem by moving over to a graph theoretic setup.
We gained greater flexibility by considering graphs, since the induction on vertices does not
correspond to a natural operator for the integers that we started with. Later on we will see other
more sophisticated examples of this idea, where taking a number theoretic problem to the land of
graph theory gives us a new perspective.
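For very small r one can even brute-force the finitary statement directly. The Python sketch below (our own illustration) finds the least N that forces a monochromatic solution for r = 2 colors; it prints 5, matching the coloring of [4] with color classes {1, 4} and {2, 3}.

```python
from itertools import product

# Least N such that every 2-coloring of [N] has a monochromatic solution to x + y = z.
def forces_schur_triple(N, r=2):
    for coloring in product(range(r), repeat=N):   # coloring[i-1] is the color of i
        if not any(coloring[x - 1] == coloring[y - 1] == coloring[x + y - 1]
                   for x in range(1, N + 1) for y in range(x, N + 1) if x + y <= N):
            return False   # this coloring avoids monochromatic solutions
    return True

print(next(N for N in range(1, 10) if forces_schur_triple(N)))  # prints 5
```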
Exercise 0.1.7 (Schur’s lower bound). Let N(r) denote the smallest positive integer in Schur’s
theorem, Theorem 0.1.2. Show that N(r) ≥ 3N(r − 1) − 1 for every r.
Deduce that N(r) ≥ (3^r + 1)/2 for every r. Also deduce that there exists a coloring of the edges
of K_{(3^r+1)/2} with r colors so that there are no monochromatic triangles.
Exercise 0.1.8 (Upper bound on Ramsey numbers). Let s and t be positive integers. Show that if
the edges of a complete graph on \binom{s+t−2}{s−1} vertices are colored with red and blue, then there must be
either a red K_s or a blue K_t.
Exercise 0.1.9 (Ramsey’s theorem). Show that for every k and r there exists some N = N(k, r)
such that every coloring of the edges of K_N using r colors contains a monochromatic clique on
k vertices.

Generalize this statement to hypergraphs.


Exercise 0.1.10 (Monochromatic triangles compared to random coloring).
(a) True or false: If the edges of Kn are colored using 2 colors, then at least 1/4 − o(1) fraction of
all triangles are monochromatic. (Note that 1/4 is the fraction one expects if the edges were
colored uniformly at random.)
(b) True or false: if the edges of Kn are colored using 3 colors, then at least 1/9 − o(1) fraction of
all triangles are monochromatic.

(c) True or false: if the edges of Kn are colored using 2 colors, then at least a 1/32 − o(1) fraction
of all copies of K_4 are monochromatic.
0.2. Progressions
Schur’s theorem above is one of the earliest examples of an area now known as additive
combinatorics, which is a term coined by Terry Tao in the early 2000’s to describe a rapidly growing
body of mathematics motivated by simple-to-state questions about addition and multiplication
of integers. The problems and methods in additive combinatorics are deep and far-reaching,
connecting many different areas of mathematics such as graph theory, harmonic analysis, ergodic
theory, discrete geometry, and model theory. Let us highlight some important developments in
additive combinatorics, especially concerning progressions. The ideas behind the proofs of these
results form some of the core themes of this book.
Van der Waerden (1927) proved the following result about monochromatic arithmetic progres-
sions in the integers.

Theorem 0.2.1 (van der Waerden’s theorem). If the integers are colored with finitely many colors,
then one of the color classes must contain arbitrarily long arithmetic progressions.
Note that having arbitrarily long arithmetic progressions is very different from having infinitely
long arithmetic progressions, as seen in the next exercise.
Exercise 0.2.2. Show that Z may be colored using two colors so that it contains no infinitely long
arithmetic progressions.
Erdős and Turán (1936) conjectured a stronger statement, that any subset of the integers with
positive density contains arbitrarily long arithmetic progressions. To be precise, we say that A ⊂ Z
has positive upper density if
lim sup_{N→∞} |A ∩ {−N, . . . , N}| / (2N + 1) > 0. (0.2.1)
(There are several variations of this definition—the exact formulation is not crucial.)
Roth (1953) proved the conjecture for 3-term arithmetic progression using Fourier analytic
methods. Szemerédi (1975) fully settled the conjecture using combinatorial techniques. These are
landmark theorems in the field. Much of what we will discuss are motivated by these results and
the developments around them.

Theorem 0.2.3 (Roth’s theorem). Every subset of the integers with positive upper density contains
a 3-term arithmetic progression.

Theorem 0.2.4 (Szemerédi’s theorem). Every subset of the integers with positive upper density
contains arbitrarily long arithmetic progressions.

Szemerédi’s theorem is deep and intricate. This important work led to many subsequent
developments in additive combinatorics. Several different proofs of Szemerédi’s theorem have
since been discovered, and some of them have blossomed into rich areas of mathematical research.
Here are some of the most influential modern proofs of Szemerédi’s theorem (in historical order):
• The ergodic theoretic approach by Furstenberg (1977);
• Higher-order Fourier analysis by Gowers (2001);
• The hypergraph regularity lemma, developed independently by Rödl et al. (2005) and Gowers (2001).
Another modern proof of Szemerédi’s theorem results from the density Hales–Jewett theorem,
which was originally proved by Furstenberg and Katznelson using ergodic theory, and subsequently
a new combinatorial proof was found in the first successful Polymath Project (Polymath 2012), an
online collaborative project initiated by Gowers.
These different approaches have distinct advantages and disadvantages. The ideas they introduced
led to other applications. For example, the ergodic approach led to multidimensional and
polynomial generalizations of Szemerédi’s theorem, which we discuss below. On the other hand,
the ergodic approach does not give any concrete quantitative bounds. Fourier analytic methods
give the best quantitative bounds for Szemerédi’s theorem, and they have led to deep results about counting
patterns in the prime numbers. However, there appear to be difficulties and obstructions to extending
Fourier analytic methods to higher dimensions.
The relationships between these different approaches to Szemerédi’s theorem are not yet com-
pletely understood. A unifying theme underlying all known approaches to Szemerédi’s theorem is
the dichotomy between structure and pseudorandomness, a term popularized by Tao (2007b) and
others. We will later see different facets of this dichotomy both in the context of graph theory as
well as in number theory.
There is also much interest in obtaining better quantitative bounds on Szemerédi’s theorem.
Roth’s initial proof showed that every subset of [N] avoiding 3-term arithmetic progressions has
size O(N/log log N). We will see this proof in Chapter 6. Roth’s upper bound has been improved
steadily over time, all via refinement of his Fourier analytic technique. At the time of this writing,
the current best claimed upper bound is N/(log N)^{1+c} for some constant c > 0 (Bloom and Sisask
2020). For 4-term arithmetic progressions, the best known upper bound is N/(log N)^c (Green and
Tao 2017). For k-term arithmetic progressions, with fixed k ≥ 5, the best known upper bound
is N/(log log N)^{c_k}. As for lower bounds, Behrend constructed a subset of [N] of size N e^{−c√(log N)}
avoiding 3-term arithmetic progressions. There is some evidence leading to a common opinion
among researchers that this lower bound is closer to the truth, since for a certain variant of Roth’s
theorem (namely avoiding solutions to x + y + z = 3w), Behrend’s construction is quite close to
the truth (Schoen and Shkredov 2014; Schoen and Sisask 2016).
Erdős famously conjectured the following.

Conjecture 0.2.5 (Erdős conjecture on arithmetic progressions). Every subset A of the integers with
Σ_{a∈A} 1/a = ∞ contains arbitrarily long arithmetic progressions.

This is a strengthening of the Erdős–Turán conjecture (= Szemerédi’s theorem), since every


subset of integers with positive density necessarily has divergent harmonic sum. Erdős’ conjecture
was motivated by the case of the primes (see the Green–Tao theorem below). It has an attractive
statement and is widely publicized. The supposed connection between divergent harmonic series
and arithmetic progressions seems magical. However, this connection is perhaps somewhat mis-
leading. The hypothesis on divergent harmonic series implies that there are infinitely many N for

which |A ∩ [N]| ≥ N/(log N (log log N)^2). Morally, the hypothesis is roughly a statement that
the density of A is not too much smaller than that of the primes. So the Erdős conjecture is really
a conjecture about an upper bound on Szemerédi’s theorem. Although many believe that a much
stronger statement than the Erdős conjecture is true, as discussed in the last part of the previous
paragraph, the “logarithmic barrier” has a special symbolic status. Erdős’ conjecture for k-term
arithmetic progressions is now proved for k = 3 thanks to the new N/(log N)^{1+c} upper bound
(Bloom and Sisask 2020), but it remains very much open for all k ≥ 4.
Perhaps by the time you read this book (or when I update it to a future edition), these bounds
will have been improved.

Here are a few other important subsequent developments to Szemerédi’s theorem.


Instead of working over subsets of integers, let us consider subsets of a higher dimensional
lattice Z^d. We say that A ⊂ Z^d has positive upper density if
lim sup_{N→∞} |A ∩ [−N, N]^d| / (2N + 1)^d > 0

(as before, other similar definitions are possible). We say that A contains arbitrary constellations
if for every finite set F ⊂ Zd , there is some a ∈ Zd and t ∈ Z>0 such that a+t ·F = {a+t x : x ∈ F} is
contained in A. In other words, A contains every finite pattern, each consisting of some finite subset
of the integer grid allowing dilation and translation. The following multidimensional generalization
of Szemerédi’s theorem was proved by Furstenberg and Katznelson (1978) initially using ergodic
theory, though a combinatorial proof was later discovered as a consequence of the hypergraph
regularity method mentioned earlier.

Theorem 0.2.6 (Multidimensional Szemerédi theorem). Every subset of Zd of positive upper density
contains arbitrary constellations.

For example, the theorem implies that every subset of Zd of positive upper density contains a
10 × 10 set of points that form an axis-aligned square grid.
There is also a polynomial extension of Szemerédi’s theorem. Let us first state a special case,
originally conjectured by Lovász and proved independently by Furstenberg (1977) and Sárkőzy
(1978).

Theorem 0.2.7 (Furstenberg–Sárkőzy theorem). Any subset of the integers with positive upper
density contains two numbers differing by a perfect square.

In other words, the set always contains {x, x + y^2} for some x ∈ Z and y ∈ Z_{>0}. What about
other polynomial patterns? The following polynomial generalization was proved by Bergelson and
Leibman (1996).

Theorem 0.2.8 (Polynomial Szemerédi theorem). Suppose A ⊂ Z has positive upper density. If
P1, . . . , Pk ∈ Z[X] are polynomials with P1 (0) = · · · = Pk (0) = 0, then there exist x ∈ Z and
y ∈ Z>0 such that x + P1 (y), . . . , x + Pk (y) ∈ A.

We leave it as an exercise to formulate a common extension of the above two theorems (i.e., a
multidimensional polynomial Szemerédi theorem). Such a theorem was also proved by Bergelson
and Leibman.

We will not cover the proof of Theorems 0.2.6 and 0.2.8. In fact, currently the only known
general proof of the polynomial Szemerédi theorem uses ergodic theory, though there are some
recent exciting developments (Peluse 2020).
Building on Szemerédi’s theorem as well as other important developments in number theory,
Green and Tao (2008) proved their famous theorem that settled an old folklore conjecture about
prime numbers. Their theorem is considered one of the most celebrated mathematical achievements
of this century.

Theorem 0.2.9 (Green–Tao theorem). The primes contain arbitrarily long arithmetic progressions.

We will discuss many central ideas behind the proof of the Green–Tao theorem.

One of our goals is to understand two different proofs of Roth’s theorem, which can be rephrased
as:

Theorem 0.2.10 (Roth’s theorem). Every subset of [N] that does not contain 3-term arithmetic
progressions has size o(N).

Roth originally proved his result using Fourier analytic techniques, which we will see in the
second half of this book starting in Chapter 6. In the 1970’s, leading up to Szemerédi’s proof of
his landmark result, Szemerédi developed an important tool known as the graph regularity lemma.
Ruzsa and Szemerédi (1978) used the graph regularity lemma to give a new graph theoretic proof
of Roth’s theorem. One of our first goals is to understand this graph theoretic proof. Along the
way, we will also explore many related topics in graph theory, especially those that share a theme
with central topics in additive combinatorics.
As in the proof of Schur’s theorem, we will formulate a graph theoretic problem whose solution
implies Roth’s theorem. Once again, we will set up a graph whose triangles encode 3-term
arithmetic progressions.
Extremal graph theory, broadly speaking, concerns questions of the form: what is the maxi-
mum (or minimum) possible number of (something) in a graph with certain prescribed properties.
A starting point (historically and also pedagogically) in extremal graph theory is the following
question:

Question 0.2.11. What is the maximum number of edges in a triangle-free n-vertex graph?

This question is relatively easy, and it was answered by Mantel in the early 1900’s (and
subsequently rediscovered and generalized by Turán). It will be the first result that we shall prove
in the next chapter. However, even though this question/result sounds similar to Roth’s theorem, it
cannot be used to deduce Roth’s theorem. Later on, we will construct a graph that corresponds to
Roth’s theorem, and it turns out that the right question to ask is:

Question 0.2.12. What is the maximum number of edges in an n-vertex graph where every edge
is contained in a unique triangle?

This innocent looking question turns out to be incredibly mysterious. We are still far from
knowing the truth. We will later prove, using Szemerédi’s regularity lemma, that any such graph
must have o(n2 ) edges, and we will then deduce Roth’s theorem from this graph theoretic claim.

Further reading
The textbook Ramsey Theory by Graham, Rothschild, and Spencer (2013) is a wonderful
introduction to the subject. It has beautiful accounts of theorems of Ramsey, van der Waerden,
Hales–Jewett, Schur, Rado, and others, that form the foundation of Ramsey theory.
For a comprehensive survey of modern developments in additive combinatorics, check out
Green (2009a)’s review of the book Additive Combinatorics by Tao and Vu (2006).
CHAPTER 1

Forbidding a subgraph

In this chapter, we discuss problems such as:


Question 1.0.1. What is the maximum number of edges in an n-vertex graph that does not contain
a triangle?
We will see the answer shortly. More generally, we can ask what happens if we replace
“triangle” by an arbitrary subgraph. This is a foundational problem in extremal graph theory.
Definition 1.0.2 (Extremal number / Turán number). We write ex(n, H) for the maximum number
of edges in an n-vertex H-free graph. Here an H-free graph is a graph that does not contain H as
a subgraph.
Question 1.0.3 (Turán problem). Fix a graph H. What is the value of ex(n, H)? In particular, how
does it grow asymptotically as n → ∞?
There are also many other problems in extremal graph theory that we will explore in the rest of
this book. For now, in this chapter, we will focus on the Turán problem of determining or estimating
ex(n, H). This is one of the most basic problems in extremal graph theory. It is named after Turán for
his fundamental work on the subject and for recognizing its importance. Research on this problem has
led to many important techniques. We will see a fairly satisfactory answer to the Turán problem
for non-bipartite graphs H. We also know the answer for a small number of bipartite graphs H.
However, for nearly all bipartite graphs H, much mystery remains. Some of the most important
open problems in extremal graph theory take this form.
Remark 1.0.4 (Subgraph versus induced subgraph). Let us clarify two distinct notions of subgraphs
that frequently come up in graph theory. Given two graphs H and G, we say that H is a subgraph
of G if one can delete some vertices and edges from G to obtain H. We say that H is an induced
subgraph of G if one can delete some vertices of G (when we delete a vertex, we also remove
all edges incident to the vertex) to obtain H—note that in particular we are not allowed to remove
additional edges other than those incident to a deleted vertex. If S ⊂ V(G), we write G[S] to denote
the subgraph of G induced by the vertex set S, i.e., the subgraph with vertex set S keeping all edges
among S.
As an example, the following graph contains the 4-cycle as an induced subgraph. It contains
the 5-cycle as a subgraph but not as an induced subgraph.

In this book, when we say H-free, we always mean free of H as a subgraph. On the other hand,
we would say induced H-free to mean free of H as an induced subgraph. For a clique H = Kr
(e.g., a triangle), being Kr -free is the same as induced Kr -free.

In the first part of the chapter, we focus on techniques for upper bounding ex(n, H). In the last
few sections, we turn our attention to lower bounding ex(n, H) when H is a bipartite graph.

1.1. Forbidding a triangle


We begin by answering Question 1.0.1: what is the maximum number of edges in an n-vertex
triangle-free graph? This question was answered in the early 1900’s by Mantel, whose theorem is
considered the starting point of extremal graph theory.
Let us partition the n vertices into two equal halves (differing by one if n is odd), and then put in
all edges across the two parts. This is the complete bipartite graph K_{⌊n/2⌋,⌈n/2⌉}, and it is triangle-free
(for example, K_{4,4} when n = 8).
The graph K_{⌊n/2⌋,⌈n/2⌉} has ⌊n/2⌋⌈n/2⌉ = ⌊n^2/4⌋ edges (one can check this equality by separately
considering even and odd n).
Mantel (1907) proved that K_{⌊n/2⌋,⌈n/2⌉} has the greatest number of edges among all n-vertex triangle-free
graphs. This is considered the first result in extremal graph theory.

Theorem 1.1.1 (Mantel’s theorem). Every n-vertex triangle-free graph has at most ⌊n^2/4⌋ edges.

Using the notation of Definition 1.0.2, Mantel’s theorem says that
ex(n, K_3) = ⌊n^2/4⌋.
Moreover, we will see that K_{⌊n/2⌋,⌈n/2⌉} is the unique maximizer of the number of edges among
n-vertex triangle-free graphs.
We give two different proofs of Mantel’s theorem, each illustrating a different technique.
First proof of Mantel’s theorem. Let G = (V, E) be a triangle-free graph with |V| = n vertices and
|E| = m edges. For every edge xy of G, note that x and y have no common neighbors, or else it
would create a triangle (see Figure 1.1.1). Therefore, deg x + deg y ≤ n, which implies that
Σ_{xy∈E} (deg x + deg y) ≤ mn.


Figure 1.1.1. In the first proof of Mantel’s theorem, adjacent vertices have disjoint
neighborhoods.
1.1. FORBIDDING A TRIANGLE 11

On the other hand, note that for each vertex x, the term deg x appears once in the sum for each
edge incident to x, so it appears a total of deg x times. Hence
Σ_{xy∈E} (deg x + deg y) = Σ_{x∈V} (deg x)^2 ≥ (1/n) (Σ_{x∈V} deg x)^2 = (2m)^2/n,
where the middle inequality is the Cauchy–Schwarz inequality.
Comparing the two inequalities, we obtain (2m)^2/n ≤ mn, and hence m ≤ n^2/4. Since m is an
integer, we obtain m ≤ ⌊n^2/4⌋, as claimed. □

Second proof of Mantel’s theorem. Let G = (V, E) be a triangle-free graph. Let v be a vertex of
maximum degree in G. Since G is triangle-free, the neighborhood N(v) of v is an independent set
(see Figure 1.1.2).

Figure 1.1.2. In the second proof of Mantel’s theorem, the neighborhood of v is an independent set.

Partition V = A ∪ B where A = N(v) and B = V \ A. Since v is a vertex of maximum degree,
we have deg x ≤ deg v = |A| for all x ∈ V. Since A contains no edges, every edge of G has at least
one endpoint in B. Therefore,
|E| ≤ Σ_{x∈B} deg x ≤ |A| |B| ≤ ((|A| + |B|)/2)^2 = n^2/4, (1.1.1)
as claimed. □
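Mantel’s theorem is also easy to confirm exhaustively for very small n; here is a short Python sketch (our own illustration) that checks every graph on up to 6 vertices:

```python
from itertools import combinations

# Maximum number of edges over all triangle-free graphs on n vertices, by brute force.
def max_triangle_free_edges(n):
    all_edges = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(all_edges)):
        edges = {e for i, e in enumerate(all_edges) if mask >> i & 1}
        if any({(a, b), (a, c), (b, c)} <= edges for a, b, c in combinations(range(n), 3)):
            continue  # contains a triangle
        best = max(best, len(edges))
    return best

for n in range(2, 7):
    print(n, max_triangle_free_edges(n), n * n // 4)  # the last two columns agree
```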
Remark 1.1.2 (The equality case in Mantel’s theorem). The second proof above shows that every
n-vertex triangle-free graph with exactly ⌊n^2/4⌋ edges must be isomorphic to K_{⌊n/2⌋,⌈n/2⌉}. Indeed,
in (1.1.1), the first inequality |E| ≤ Σ_{x∈B} deg x is tight only if B is an independent set, the second
inequality is tight only if B is complete to A, and |A| |B| < ⌊n^2/4⌋ unless |A| = |B| (if n is even) or
||A| − |B|| = 1 (if n is odd). (Exercise: deduce the equality case from the first proof.)
In general, it is a good idea to keep the equality case in mind when following the proofs, or
when coming up with your own proofs, to make sure that the steps are not lossy.
The next several exercises explore extensions of Mantel’s theorem. It is useful to revisit the
proof techniques.
Exercise 1.1.3. Let G be a Kr+1 -free graph. Prove that there is another graph H on the same vertex
set as G such that χ(H) ≤ r and dH (x) ≥ dG (x) for every vertex x (here dH (x) is the degree of x in
H, and likewise with dG (x) for G). Give another proof of Turán’s theorem from this fact.

Exercise 1.1.4 (Many triangles). Show that a graph with n vertices and m edges has at least
(4m/(3n)) (m − n^2/4)
triangles.
Exercise 1.1.5. Prove that every n-vertex non-bipartite triangle-free graph has at most (n − 1)^2/4 + 1
edges.
Exercise 1.1.6 (Stability). Let G be an n-vertex triangle-free graph with at least n^2/4 − k edges.
Prove that G can be made bipartite by removing at most k edges.
Exercise 1.1.7. Show that every n-vertex triangle-free graph with minimum degree greater than
2n/5 is bipartite.
Exercise 1.1.8∗. Prove that every n-vertex graph with at least ⌊n^2/4⌋ + 1 edges contains at least
⌊n/2⌋ triangles.
Exercise 1.1.9∗. Prove that every n-vertex graph with at least ⌊n^2/4⌋ + 1 edges contains some edge
in at least (1/6 − o(1))n triangles, and that this constant 1/6 is best possible.
The next exercise can be solved by a neat application of Mantel’s theorem.
Exercise 1.1.10. Let X and Y be independent and identically distributed random vectors in R^d
according to some arbitrary probability distribution. Prove that
P(|X + Y| ≥ 1) ≥ (1/2) P(|X| ≥ 1)^2.
1.2. Forbidding a clique
In Mantel’s theorem, what happens if we replace the triangle by K4 , a clique on 4 vertices? Or
a clique of fixed given size?
Question 1.2.1. What is the maximum number of edges in a Kr+1 -free graph on n vertices?

Let us first consider the easiest case of the problem where we restrict to r-partite graphs, which
are automatically Kr+1 -free. This observation will be useful in the general proof later.

Lemma 1.2.2. Among n-vertex r-partite graphs, the graph with the maximum number of edges has
all r vertex parts as equal in size as possible (i.e., n mod r parts have size ⌈n/r⌉ and the rest have
size ⌊n/r⌋).
Proof. If two vertex parts differ in size by more than one, then moving a vertex from a larger part
to a smaller part would strictly increase the number of edges in the graph. □
Definition 1.2.3. The Turán graph T_{n,r} is defined to be the unique complete n-vertex r-partite graph with
part sizes differing by at most 1 (so each part has size ⌊n/r⌋ or ⌈n/r⌉).
Turán (1941) proved the following fundamental result, generalizing Mantel’s theorem (Theo-
rem 1.1.1) from triangles to arbitrary cliques. Turán’s work initiated the direction of extremal graph
theory.

Theorem 1.2.4 (Turán’s theorem). The Turán graph Tn,r maximizes the number of edges among all
n-vertex Kr+1 -free graphs. It is also the unique maximizer.

Figure 1.2.1. The Turán graph T10,3 = K3,3,4 .

In other words,
ex(n, Kr+1 ) = e(Tn,r ).
It is not too hard to give a precise formula for e(T_{n,r}), though there is a small annoying dependence
on the residue class of n mod r. The following bound is usually good enough for most purposes.

Corollary 1.2.5 (Turán’s theorem). The number of edges in an n-vertex K_{r+1}-free graph is at most
(1 − 1/r) n^2/2.

Exercise 1.2.6. Show that e(T_{n,r}) ≤ (1 − 1/r) n^2/2.

Note that if n is divisible by r, then (1 − 1/r)n^2/2 is exactly the number of edges in T_{n,r}. Even
when n is not divisible by r, the difference between e(T_{n,r}) and (1 − 1/r)n^2/2 is O(nr).
As we are generally interested in the regime when r is fixed, this difference is a negligible lower
order contribution, i.e.,
ex(n, K_{r+1}) = (1 − 1/r − o(1)) n^2/2, for fixed r. (1.2.1)
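To make (1.2.1) concrete, the following Python sketch (our own illustration) counts the edges of T_{n,r} directly from an equitable partition and compares with (1 − 1/r)n²/2:

```python
# Edge count of the Turán graph T_{n,r}: partition n vertices into r nearly equal parts
# and join two vertices exactly when they lie in different parts.
def turan_edge_count(n, r):
    sizes = [n // r + (1 if i < n % r else 0) for i in range(r)]
    return n * (n - 1) // 2 - sum(s * (s - 1) // 2 for s in sizes)

for n, r in [(10, 3), (100, 3), (1000, 7)]:
    print(n, r, turan_edge_count(n, r), (1 - 1 / r) * n * n / 2)
```

For example, T_{10,3} = K_{3,3,4} (Figure 1.2.1) has 33 edges, while (1 − 1/3) · 10²/2 ≈ 33.3.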
We now give four proofs of Turán’s theorem. The first proof extends our second proof of
Mantel’s theorem.
First proof of Turán’s theorem. We prove by induction on r. The case r = 1 is trivial. Now assume
r > 1.


Figure 1.2.2. In the first proof of Turán’s theorem, the neighborhood of v is Kr -free.

Let G = (V, E) be a Kr+1 -free graph. Let v be a vertex of maximum degree in G. Since G is
Kr+1 -free, the neighborhood A = N(v) of v is Kr -free, and hence contains at most e(T| A|,r−1 ) edges
by the induction hypothesis.
Let B = V \ A. Since v is a vertex of maximum degree, we have deg x ≤ deg(v) = | A| for all
x ∈ V. So the number of edges with at least one vertex in B is at most
Σ_{y∈B} deg(y) ≤ |A| |B|.

Thus the total number of edges of G is at most


| A| |B| + e(T| A|,r−1 ).
It remains to show that
| A| |B| + e(T| A|,r−1 ) ≤ e(Tn,r ).
The left-hand side is the number of edges in the complete r-partite graph with one part of size |B| and
the remaining vertices equitably partitioned into r − 1 parts, and we know from Lemma 1.2.2 that
the Turán graph T_{n,r} maximizes the number of edges among n-vertex r-partite graphs. This shows
that e(G) ≤ e(T_{n,r}). To have equality in every step above, B must be an independent set (or else
Σ_{y∈B} deg(y) < |A| |B|) and A must induce T_{|A|,r−1}, so that G is r-partite, and we already know that
T_{n,r} is the unique maximizer of the number of edges among n-vertex r-partite graphs. □
The second proof starts out similarly to our first proof of Mantel’s theorem. Recall that in
Mantel’s theorem, the initial observation was that in a triangle-free graph, given an edge, its two
endpoints must have no common neighbors (or else they form a triangle). Generalizing, in a K4 -free
graph, given a triangle, its three vertices have no common neighbor. The rest of the proof proceeds
somewhat differently from earlier. Instead of summing over all edges as we did before, we remove
the triangle and apply induction to the rest of the graph.
Second proof of Turán’s theorem. We fix r and proceed by induction on n.
The statement is trivial for n ≤ r, as the Turán graph is the complete graph Kn = Tn,r and thus
clearly maximizes the number of edges.
Now, assume that n > r and that Turán’s theorem holds for all graphs on fewer than n vertices.
Let G = (V, E) be an n-vertex, Kr+1 -free graph with the maximum possible number of edges. By
the maximality assumption, G contains Kr as a subgraph, or else we could add an edge in G and
still be Kr+1 -free. Let A be the vertex set of an r-clique in G, and let B := V \ A.
Since G is Kr+1 -free, every x ∈ B has at most r − 1 neighbors in A. So the number of edges
e(A, B) between A and B is at most
Õ Õ
e(A, B) ≤ deg(y, A) ≤ (r − 1) = (r − 1)(n − r).
y∈B y∈B

Letting e(A) denote the number of edges within A, and likewise e(B), we have
e(G) = e(A) + e(A, B) + e(B) ≤ \binom{r}{2} + (r − 1)(n − r) + ex(n − r, K_{r+1}), (1.2.2)
where in the final step we use that the subgraph induced by B is an Kr+1 -free graph on n −r vertices.
Applying the induction hypothesis, we have ex(n − r, K_{r+1}) = e(T_{n−r,r}). Thus
e(G) ≤ \binom{r}{2} + (r − 1)(n − r) + e(T_{n−r,r}).
It remains to check that the right-hand side equals e(T_{n,r}). Note that if we remove one vertex
from each part of T_{n,r}, we end up with T_{n−r,r} after removing exactly \binom{r}{2} + (r − 1)(n − r) edges. This
shows that e(G) ≤ e(T_{n,r}).
To see that T_{n,r} is the unique maximizer, we check when equality occurs in all steps of the proof.
Suppose G is a maximizer, i.e., G has e(T_{n,r}) edges. The subgraph induced on B must be T_{n−r,r} by
induction. To have e(A) = \binom{r}{2} in (1.2.2), A must induce a clique. To have e(A, B) = (r − 1)(n − r),
every vertex of B must be adjacent to all but one vertex in A. Also, two vertices x, y lying in distinct
parts of G[B] ≅ T_{n−r,r} cannot “miss” the same vertex v of A, or else A ∪ {x, y} \ {v} would be a
K_{r+1}-clique. This then forces G to be T_{n,r}. □

The third proof uses a method known as Zykov symmetrization. The idea here is that if a
K_{r+1}-free graph is not a Turán graph, then we should be able to make some local modifications
(namely replacing a vertex by a clone of another vertex) to get another K_{r+1}-free graph with strictly more
edges.

Third proof of Turán’s theorem. As before, let G be an n-vertex, Kr+1 -free graph with the maximum
possible number of edges.
We claim that if x and y are non-adjacent vertices, then deg x = deg y. Indeed, if deg x > deg y,
say, then we can modify G by replacing y by a “clone” of x (i.e., with the same neighbors as x).
The resulting graph would still be Kr+1 -free (since a clique cannot contain both x and its clone)
and has strictly more edges than G, thereby contradicting the assumption that G has the maximum
possible number of edges.


We claim that if x is non-adjacent to both y and z in G, then y and z must be non-adjacent.


Indeed, if yz is an edge, then by deleting y and z from G and adding two clones of x, we obtain a
Kr+1 -free graph with one more edge than G (note that the edge yz is counted twice in G if we sum
the degrees of y and z). This would contradict the maximality of G.


Therefore, non-adjacency is an equivalence relation among vertices of G. So the complement


of G is a union of cliques. Hence G is a complete multipartite graph, which has at most r parts
since G is Kr+1 -free. Among all complete r-partite graphs, the Turán graph Tn,r is the unique graph
that maximizes the number of edges, by Lemma 1.2.2. Therefore, G is isomorphic to T_{n,r}. □

The last proof we give in this section uses the probabilistic method. This probabilistic proof was
given in the book The probabilistic method by Alon and Spencer, though the key inequality is due
earlier to Caro and Wei. Below, we prove Turán’s theorem in the formulation of Corollary 1.2.5, i.e,
ex(n, Kr+1 ) ≤ (1 − 1/r)n2 /2. A more careful analysis of the proof can yield the stronger statement
of Theorem 1.2.4, which we omit.

Fourth proof of Turán’s theorem. Let G = (V, E) be an n-vertex, K_{r+1}-free graph. Consider a
uniform random ordering σ of the vertices. Let
X = {v ∈ V : v is adjacent to all earlier vertices in σ}.
Observe that the vertices in X form a clique. Since the permutation was chosen uniformly at
random, we have
P(v ∈ X) = P(v appears before all of its non-neighbors) = 1/(n − deg v).
Since G is K_{r+1}-free, we always have |X| ≤ r. So
r ≥ E|X| = Σ_{v∈V} P(v ∈ X) = Σ_{v∈V} 1/(n − deg v) ≥ n / (n − (Σ_{v∈V} deg v)/n) = n / (n − 2m/n),
where the inequality follows from the convexity of x ↦ 1/(n − x). Rearranging gives
m ≤ (1 − 1/r) n^2/2. □
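The key identity in this proof, E|X| = Σ_{v} 1/(n − deg v), can be checked empirically; here is a small Python sketch (our own illustration) that samples random orderings of the 5-cycle:

```python
import random

# Estimate E|X| by sampling random vertex orderings and compare with the exact
# value sum_v 1/(n - deg v) from the proof.
def estimate_EX(adj, trials=20000):
    n, total = len(adj), 0
    for _ in range(trials):
        order = list(range(n))
        random.shuffle(order)
        seen = set()
        for v in order:
            if seen <= adj[v]:   # v is adjacent to all earlier vertices
                total += 1
            seen.add(v)
    return total / trials

adj = [{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}]  # the 5-cycle
print(estimate_EX(adj), sum(1 / (len(adj) - len(adj[v])) for v in range(len(adj))))  # both ≈ 5/3
```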
In Chapter 5, we will see another more analytic proof of Turán’s theorem using a method known
as graph Lagrangians.
The following exercise is an extension of Exercise 1.1.6.
Exercise 1.2.7∗ (Stability). Let G be an n-vertex Kr+1 -free graph with at least e(Tn,r ) − k edges,
where Tn,r is the Turán graph. Prove that G can be made r-partite by removing at most k edges.
The next exercise is a neat geometric application of Turán’s theorem.
Exercise 1.2.8. Let S be a set of n points in the plane, with the property that no two points are
at distance greater than 1. Show that S has at most ⌊n^2/3⌋ pairs of points at distance greater than
1/√2. Also, show that the bound ⌊n^2/3⌋ is tight (i.e., cannot be improved).

1.3. Turán density and supersaturation


Turán’s theorem pinpoints the value of ex(n, H) when H is a clique. Such precise answers are
actually quite rare. For general H, we will be content with looser bounds, and we will mostly be
interested in asymptotics for large n given a fixed H. Before going on to bound ex(n, H) for other
values of H, let us take a short detour and think about the structure of the problem.
In this chapter, we will define the edge density of a graph G to be
e(G) / \binom{v(G)}{2}.
So the edge density of a clique is 1. In later chapters, we will use a different normalization
2e(G)/v(G)^2 for edge density, which is more convenient for other purposes. When v(G) is large,
there is no significant difference between the two choices.
Next, we observe that ex(n, H), when expressed as edge-densities, is non-increasing in n.

Proposition 1.3.1. For every graph H and positive integer n,
ex(n + 1, H) / \binom{n+1}{2} ≤ ex(n, H) / \binom{n}{2}.
Proof. Let G be an H-free graph on n + 1 vertices. For each n-element subset S of V(G), since G[S]
is also H-free, we have
e(G[S]) / \binom{n}{2} ≤ ex(n, H) / \binom{n}{2}.
Varying S uniformly over all n-element subsets of V(G), the left-hand side averages to the
edge density of G by linearity of expectation. It follows that
e(G) / \binom{n+1}{2} ≤ ex(n, H) / \binom{n}{2}.

The claim then follows. □

For every fixed H, the sequence ex(n, H)/\binom{n}{2} is non-increasing and bounded between 0 and 1.
It follows that it approaches a limit.

Definition 1.3.2. The Turán density of a graph H is defined to be
π(H) := lim_{n→∞} ex(n, H) / \binom{n}{2}.

Determining the Turán density π(H) is equivalent to determining ex(n, H) up to an o(n2 ) additive
error.
Turán’s theorem implies that
π(K_{r+1}) = 1 − 1/r.
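For H = K_3, Mantel’s theorem gives ex(n, K_3) = ⌊n²/4⌋, so the sequence from Proposition 1.3.1 and its limit π(K_3) = 1/2 can be computed directly; a tiny Python sketch (our own illustration):

```python
from math import comb

# The edge-density sequence ex(n, K_3) / binom(n, 2) = floor(n^2/4) / binom(n, 2):
# it is non-increasing (Proposition 1.3.1) and tends to pi(K_3) = 1/2.
densities = [(n * n // 4) / comb(n, 2) for n in range(2, 2000)]
assert all(a >= b for a, b in zip(densities, densities[1:]))
print(densities[0], densities[10], densities[100], densities[-1])
```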
In the next couple of sections we will determine the Turán density of every graph H. We will
see the Erdős–Stone–Simonovits theorem, which will tell us that
π(H) = 1 − 1/(χ(H) − 1),
where χ(H) is the chromatic number of H.
Here is an equivalent definition of the Turán density: π(H) is the smallest real number such that for every
ε > 0 there is some n_0 = n_0(H, ε) so that for every n ≥ n_0, every n-vertex graph with at least
(π(H) + ε)\binom{n}{2} edges contains H as a subgraph.
It turns out there is a general phenomenon in combinatorics where once some density crosses
an existence threshold (e.g., the Turán density is the threshold for H-freeness), it will be possible
to find not just one copy of the desired object, but in fact lots and lots of copies. This principle is
usually called supersaturation. It is a fundamental idea useful for many applications, including in
our upcoming determination of π(H) for general H.
The next statement is an instance of the supersaturation phenomenon for the Turán problem. It
converts an extremal result to a counting result. The proof technique is worth paying attention to,
as it can be used to prove similar results in many settings.

Theorem 1.3.3 (Supersaturation). For every ε > 0 and graph H there exist some δ > 0 and n_0
such that every graph on n ≥ n_0 vertices with at least (π(H) + ε)\binom{n}{2} edges contains at least δn^{v(H)}
copies of H as a subgraph.
Note that δn^{v(H)} has the best possible order of magnitude, since even the complete graph on n
vertices only has O_H(n^{v(H)}) copies of H. The precise dependence of the optimal δ versus ε is a
difficult problem already when H is a triangle; we will discuss it in Chapter 5.
The idea, sometimes called “subsampling,” is to sample a v(H)-vertex subset of G in two stages.
We first sample a random n_0-vertex subset S, where n_0 is a large constant so that with good
probability G[S] has enough edge density to guarantee one copy of H. If G[S] indeed has a copy
of H, then we can sample again a v(H)-vertex subset from S, and we would obtain a copy of H
with at least constant probability. The same argument can also be phrased non-probabilistically via
double-counting, though the probabilistic argument, once understood, can be helpful in seeing that
we should expect to obtain the right order of magnitude.
Here is a simple but useful lemma.

Lemma 1.3.4. Let X be a real random variable taking values in [0, 1]. Then P(X ≥ EX − ε) ≥ ε.

Proof. Let µ = EX. By separately considering what happens when X ≤ µ − ε versus when
X > µ − ε (in the latter case we bound by X ≤ 1), we have
µ ≤ (µ − ε)P(X ≤ µ − ε) + P(X > µ − ε) ≤ µ − ε + P(X > µ − ε).
Thus P(X > µ − ε) ≥ ε. □
Proof of supersaturation (Theorem 1.3.3). By the definition of the Turán density, there exists some
constant n_0 depending only on H and ε such that every n_0-vertex graph with at least (π(H) + ε/2)\binom{n_0}{2}
edges contains H as a subgraph.
Let n ≥ n_0 and let G be an n-vertex graph with at least (π(H) + ε)\binom{n}{2} edges. Let S be an n_0-element
subset of V(G), chosen uniformly at random. Let X denote the edge density of G[S]. By averaging,
EX equals the edge density of G, and so EX ≥ π(H) + ε. Then by Lemma 1.3.4, with probability
at least ε/2, X ≥ π(H) + ε/2, in which case G[S] contains a copy of H by the earlier paragraph.
Let T be a uniformly random v(H)-element subset of the n_0-element set S. Conditioned on the
edge density of G[S] being at least π(H) + ε/2, the probability that G[T] contains H as a subgraph
is thus at least 1/\binom{n_0}{v(H)} (a constant not depending on n). Thus the unconditional probability of G[T]
containing H as a subgraph is at least (ε/2)/\binom{n_0}{v(H)}. So there are at least \binom{n}{v(H)} (ε/2)/\binom{n_0}{v(H)} copies of
H in G, which implies the claim. □
As a corollary, we obtain the following supersaturation version of Turán’s theorem.

Corollary 1.3.5. For every ε > 0 and positive integer r, there is some δ > 0 such that every n-vertex
graph with at least (1 − 1/r + ε) n^2/2 edges contains at least δn^{r+1} copies of K_{r+1}.
Proof. Applying supersaturation with π(K_{r+1}) = 1 − 1/r, we deduce the corollary for all n > n_0 for
some n_0 = n_0(ε, r). Finally, by decreasing δ (if necessary) to below 1/n_0^{r+1}, the claim becomes true
for all n, since for n ≤ n_0 Turán’s theorem guarantees us at least one copy of K_{r+1}. □
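To see the order of magnitude δn^{r+1} in action for r = 2, here is a rough Python sketch (our own illustration, with arbitrary parameter choices): starting from the complete bipartite graph and adding about εn²/2 random edges inside one part, the number of triangles divided by n³ stays bounded away from zero as n grows.

```python
import random
from itertools import combinations

# Triangles in T_{n,2} plus about eps*n^2/2 random extra edges inside one part,
# normalized by n^3.
def triangle_density(n, eps, seed=0):
    random.seed(seed)
    A, B = range(n // 2), range(n // 2, n)
    adj = {v: set() for v in range(n)}
    for u in A:
        for v in B:
            adj[u].add(v); adj[v].add(u)
    for u, v in random.sample(list(combinations(A, 2)), int(eps * n * n / 2)):
        adj[u].add(v); adj[v].add(u)
    triangles = sum(len(adj[u] & adj[v]) for u in range(n) for v in adj[u] if u < v) // 3
    return triangles / n ** 3

for n in [60, 120, 240]:
    print(n, round(triangle_density(n, eps=0.05), 4))
```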
As mentioned earlier, we will soon determine the Turán density π(H) for every graph H. It
may seem like the Turán problem is essentially understood, but actually this would be very far
from the truth. We will see in the next section that π(H) = 0 for every bipartite graph H, i.e.,
ex(n, H) = o(n2 ), but actual asymptotics for ex(n, H) are often unknown.
In a different direction, the generalization to hypergraphs, while looking deceptively similar,
turns out to be much more difficult, and very little is known here.
Remark 1.3.6 (Hypergraph Turán problem). Generalizing from graphs to hypergraphs, given an
r-uniform hypergraph H, we write ex(n, H) for the maximum number of edges in an n-vertex
r-uniform hypergraph that does not contain H as a subgraph. A straightforward extension of
n
Proposition 1.3.1 gives that ex(n, H)/ r is a non-increasing function of n, for each fixed H. So we
can similarly define the hypergraph Turán density
ex(n, H)
π(H) := lim n .
n→∞
r
The exact value of π(H) is known in very few cases. It is a major open problem to determine π(H)
when H is the complete 3-uniform hypergraph on 4 vertices (also known as a tetrahedron), and
more generally when H is a complete hypergraph.
Exercise 1.3.7 (Density Ramsey). Prove that for every s and r, there is some constant c > 0 so that
for every sufficiently large n, if the edges of Kn are colored using r colors, then at least c fraction
of all copies of Ks are monochromatic.

Exercise 1.3.8 (Density Szemerédi). Let k ≥ 3. Assuming Szemerédi’s theorem for k-term
arithmetic progressions (i.e., every subset of [N] without a k-term arithmetic progression has size
o(N)), prove the following density version of Szemerédi’s theorem:
For every δ > 0 there exist c and N0 (both depending only on k and δ) such that for every
A ⊂ [N] with | A| ≥ δN and N ≥ N0 , the number of k-term arithmetic progressions in A is at least
cN 2 .

1.4. Forbidding a complete bipartite graph


In this section, we provide an upper bound on ex(n, Ks,t ), the maximum number of edges in an
n-vertex graph without a Ks,t .
However, it is a major open problem to determine the asymptotic growth of ex(n, Ks,t ) for most
values of (s, t), although the problem has been solved in some cases, as we will see in later sections.
Problem 1.4.1 (Zarankiewicz problem). Determine ex(n, Ks,t ), the maximum number of edges in
an n-vertex Ks,t -free graph.
Zarankiewicz (1951) originally asked a related problem: determine the maximum number of
1’s in an m × n matrix without an s × t submatrix with all entries 1.
The main theorem of this section, below, is due to Kővári, Sós, and Turán (1954). We will
refer to it as the KST theorem, which stands both for its discoverers, as well as for the forbidden
subgraph Ks,t .

Theorem 1.4.2 (Kővári–Sós–Turán “KST” theorem). For positive integers s ≤ t, there exists some
constant C = C(s, t), such that, for all n,
ex(n, Ks,t ) ≤ Cn2−1/s .
The proof proceeds by double counting.

Proof. Let G be a $K_{s,t}$-free n-vertex graph with m edges. Let X be the number of copies of $K_{s,1}$ (a star with s leaves) in G.

(When s = 1 we need to modify the definition of X slightly to count each copy twice.) The strategy
is to count X in two ways. First we count Ks,1 by first embedding the “left” s vertices of Ks,1 . Then
we count Ks,1 by first embedding the “right” single vertex of Ks,1 .
Upper bound on X. Every subset of s vertices in G has at most t − 1 common neighbors since
G is Ks,t -free. Therefore,
 
$$X \le \binom{n}{s}(t-1).$$
Lower bound on X. For each vertex v of G, there are exactly $\binom{\deg v}{s}$ ways to pick s of its neighbors to form a $K_{s,1}$ as a subgraph. Therefore
$$X = \sum_{v\in V(G)} \binom{\deg v}{s}.$$

To obtain a lower bound on this quantity in terms of the number of edges m of G, we use a standard trick: view $\binom{x}{s}$ as a convex function on the reals, namely, letting
$$f_s(x) = \begin{cases} x(x-1)\cdots(x-s+1)/s! & \text{if } x \ge s-1,\\ 0 & \text{if } x < s-1.\end{cases}$$

Then $f_s(x) = \binom{x}{s}$ for all nonnegative integers x. Furthermore $f_s$ is a convex function. Since the average degree of G is 2m/n, it follows by convexity that
$$X = \sum_{v\in V(G)} f_s(\deg v) \ge n\, f_s\!\left(\frac{2m}{n}\right).$$

Combining the upper bound and the lower bound. We find that
$$n\, f_s\!\left(\frac{2m}{n}\right) \le X \le \binom{n}{s}(t-1).$$
Since $f_s(x) = (1+o(1))\,x^s/s!$ as $x \to \infty$ for fixed s, we find that, as $n\to\infty$,
$$\frac{n}{s!}\left(\frac{2m}{n}\right)^s \le (1+o(1))\,\frac{n^s}{s!}(t-1).$$
Therefore,
$$m \le \left(\frac{(t-1)^{1/s}}{2} + o(1)\right) n^{2-1/s}. \qquad\square$$
The final bound in the proof gives us a somewhat more precise estimate than stated in Theo-
rem 1.4.2. Let us record it here for future reference.

Theorem 1.4.3 (KST theorem). Fix positive integers s ≤ t. Then, as n → ∞,
$$\mathrm{ex}(n, K_{s,t}) \le \left(\frac{(t-1)^{1/s}}{2} + o(1)\right) n^{2-1/s}.$$
It has been long conjectured that the KST theorem is tight up to a constant factor.

Conjecture 1.4.4. For positive integers s ≤ t, there exists a constant c = c(s, t) such that for all
n ≥ 2,
ex(n, Ks,t ) ≥ cn2−1/s .
In the final sections of this chapter, we will produce some constructions showing that Conjec-
ture 1.4.4 is true for K2,t and K3,t . We also know that the conjecture is true if t is much larger than
s, namely when t > (s − 1)!. The first open case of the conjecture is $K_{4,4}$, although opinion among researchers is divided on whether the conjecture should be true in this case.
Since every bipartite graph H is a subgraph of some Ks,t , and every H-free graph must be
Ks,t -free, we obtain the following corollary of the KST theorem.

Corollary 1.4.5. For every bipartite graph H, there exists some constant c > 0 so that ex(n, H) =
O(n2−c ).

In particular, the Turán density π(H) of every bipartite graph H is zero.


The KST theorem gives a constant c in the above corollary that depends on the number of
vertices on the smaller part of H. In Section 1.7, we will use the dependent random choice
technique to give a proof of the corollary showing that c only has to depend on the maximum
degree of H.

We give a geometric application of the KST theorem. The following famous problem was posed
by Erdős (1946).
Question 1.4.6 (Unit distance problem). What is the maximum number of unit distances formed
by a set of n points in R2 ?
In other words, given n distinct points in the plane, at most how many pairs of these points can
be exactly distance 1 apart. We can draw a graph with these n points as vertices, with edges joining
points exactly unit distance apart.
To get a feeling for the problem, let us play with some constructions. For small values of n, it is not hard to check by hand that the following configurations are optimal.
[Figure: optimal unit-distance configurations for n = 3, 4, 5, 6, 7.]
What about for larger values of n? If we line up the n points equally spaced on a line, we get n − 1 unit distances. We can be a bit more efficient by chaining up triangles; the following construction gives us 2n − 3 unit distances.
[Figures: a path of equally spaced points, and a chain of unit triangles.]
The construction for n = 6 looks like it was obtained by copying and translating a unit triangle. We can generalize this idea to obtain a recursive construction. Let f(n) denote the maximum number of unit distances formed by n points in the plane. Given a configuration P with $\lfloor n/2\rfloor$ points that has $f(\lfloor n/2\rfloor)$ unit distances, we can copy P and translate it by a generic unit vector to get P′. The configuration P ∪ P′ has at least $2f(\lfloor n/2\rfloor) + \lfloor n/2\rfloor$ unit distances. We can solve the recursion to get $f(n) \ge cn\log n$ for some constant c.
Now we take a different approach to obtain an even better construction. Take a square grid with $\lfloor\sqrt{n}\rfloor \times \lfloor\sqrt{n}\rfloor$ vertices. Instead of choosing the distance between adjacent points as the unit distance, we can scale the configuration so that $\sqrt{r}$ becomes the "unit" distance for some integer r. As an illustration, here is an example of a 5 × 5 grid with r = 10 (figure omitted).

It turns out that by choosing the optimal r as a function of n, we can get at least
$$n^{1+c/\log\log n}$$

unit distances, where c > 0 is some absolute constant. The proof uses analytic number theory,
which we omit as it would take us too far afield. The basic idea is to choose r to be a product of
many distinct primes that are congruent to 1 modulo 4, so that r can be represented as a sum of two
squares in many different ways, and then estimate the number of such ways.
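The following short Python sketch (our own illustration, not part of the text; the grid size and the values of r are arbitrary choices) counts, for an m × m grid, how many pairs of points are at squared distance exactly r. Values of r with several representations as a sum of two squares, such as $25 = 5^2 + 0^2 = 3^2 + 4^2$, give more pairs than r = 1.

```python
from itertools import combinations

def distances_in_grid(m, r):
    """Count pairs of points of the m x m integer grid at squared distance r."""
    points = [(x, y) for x in range(m) for y in range(m)]
    return sum(1 for (x1, y1), (x2, y2) in combinations(points, 2)
               if (x1 - x2) ** 2 + (y1 - y2) ** 2 == r)

m = 10  # a 10 x 10 grid, i.e., n = 100 points
for r in [1, 2, 5, 10, 25]:
    print(r, distances_in_grid(m, r))
# r = 1 recovers the 2 * m * (m - 1) = 180 adjacent pairs,
# while r = 25 already gives more pairs in the same grid.
```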
It is conjectured that the last construction above is close to optimal.

Conjecture 1.4.7. Every set of n points in R2 has at most n1+o(1) unit distances.

The KST theorem can be used to prove the following upper bound on the number of unit
distances.

Theorem 1.4.8. Every set of n points in R2 has O(n3/2 ) unit distances.


Figure 1.4.1. Two vertices p, q can have at most two common neighbors in the unit
distance graph.

Proof. The unit distance graph is $K_{2,3}$-free: for every pair of distinct points, there are at most two other points that are at unit distance from both (see Figure 1.4.1). So the number of edges is at most $\mathrm{ex}(n, K_{2,3}) = O(n^{3/2})$ by Theorem 1.4.2. □
Later in ?? we will use the crossing number inequality to prove a better bound of O(n4/3 ), which
is the best known upper bound to date.
Erdős (1946) also asked the following related question.
Question 1.4.9 (Distinct distance problem). What is the minimum number of distinct distances
formed by n points in R2 ?
Let g(n) denote the answer. The asymptotically best known construction for the minimum number of distinct distances is also a square grid, same as earlier. It can be shown that a square grid with $\lfloor\sqrt{n}\rfloor \times \lfloor\sqrt{n}\rfloor$ points has on the order of $n/\sqrt{\log n}$ distinct distances. This is conjectured to be optimal, i.e., g(n) is conjectured to be on the order of $n/\sqrt{\log n}$.
Let f(n) denote the maximum number of unit distances among n points in the plane, as before. We have $f(n)g(n) \ge \binom{n}{2}$, since each distance occurs at most f(n) times. So an upper bound on f(n) gives a lower bound on g(n) (but not conversely).
A breakthrough on the distinct distances problem was obtained by Guth and Katz (2015), showing that $g(n) \gtrsim n/\log n$. Their proof is quite sophisticated. It uses tools ranging from the polynomial method to algebraic geometry.

Exercise 1.4.10 (Density KST). Prove that for every pair of positive integers s ≤ t, there are constants C, c > 0 such that every n-vertex graph with $p\binom{n}{2}$ edges contains at least $cp^{st}n^{s+t}$ copies of $K_{s,t}$, provided that $p \ge Cn^{-1/s}$.
The next exercise asks you to think about the quantitative dependencies in the proof of the KST
theorem.

Exercise 1.4.11. Show that, for every $\epsilon > 0$, there exists $\delta > 0$ such that every graph with n vertices and at least $\epsilon n^2$ edges contains a copy of $K_{s,t}$ where $s \ge \delta\log n$ and $t \ge n^{0.99}$.
The next exercise shows a bad definition of density of a subset of Z2 (it always ends up being
either 0 or 1).
Exercise 1.4.12. Let $S \subseteq \mathbb{Z}^2$. Define
$$d_k(S) = \max_{\substack{A,B\subseteq\mathbb{Z}\\ |A|=|B|=k}} \frac{|S\cap(A\times B)|}{|A|\,|B|}.$$
Show that $\lim_{k\to\infty} d_k(S)$ exists and is always either 0 or 1.

1.5. Forbidding a general subgraph


Let us summarize what we know about extremal numbers ex(n, H) so far. Here we always treat
H as fixed and the asymptotics are given for n → ∞.
Turán's theorem tells us that
$$\mathrm{ex}(n, K_{r+1}) = \left(1 - \frac{1}{r} - o(1)\right)\binom{n}{2}. \tag{1.5.1}$$
The KST theorem implies that
ex(n, H) = o(n2 ) for any fixed bipartite graph H.
In this section, we extend these results and determine ex(n, H), up to an o(n2 ) error term, for every
graph H. In other words, we will compute the Turán density π(H).
Initially it seems possible that the Turán density π(H) might depend on H in some complicated
way. It turns out that it only depends on the chromatic number χ(H) of H, which is the smallest
number of colors needed to color the vertices of H such that no two adjacent vertices receive the
same color (such a coloring is called a proper coloring).
Note that if χ(H) > r, then H cannot be a subgraph of any r-partite graph. In particular, the
Turán graph Tn,r is H-free (recall from Definition 1.2.3 that Tn,r is the complete r-partite graph with
n vertices divided into nearly equal parts). Therefore,
$$\mathrm{ex}(n, H) \ge e(T_{n,r}) = \left(1 - \frac{1}{r} + o(1)\right)\binom{n}{2}.$$
The main theorem of this section, below, is a matching upper bound. It is due to Erdős and Stone (1946) and Erdős and Simonovits (1966).

Theorem 1.5.1 (Erdős–Stone–Simonovits theorem). Fix a graph H. As n → ∞, we have
$$\mathrm{ex}(n, H) = \left(1 - \frac{1}{\chi(H)-1} + o(1)\right)\binom{n}{2}.$$
In other words, the Turán density of H is
$$\pi(H) = 1 - \frac{1}{\chi(H)-1}.$$
Example 1.5.2. When H = Kr+1 , χ(H) = r + 1, and so Theorem 1.5.1 agrees with what we knew
earlier (1.5.1) from Turán’s theorem.

Example 1.5.3. When H is the Petersen graph, below, which has chromatic number 3, Theorem 1.5.1 tells us that $\mathrm{ex}(n, H) = (1/4 + o(1))n^2$. The Turán density of the Petersen graph is the same as that of a triangle, which may be somewhat surprising since the Petersen graph seems more complicated than the triangle.
[Figure: the Petersen graph with a proper 3-coloring.]

In the rest of this section, we prove the Erdős–Stone–Simonovits theorem. The proof given
here is due to Erdős (1971).
Note that when χ(H) = 2, i.e., H is bipartite, the Erdős–Stone–Simonovits theorem follows
from the KST theorem. We begin by proving an extension of the KST theorem to hypergraphs,
using the same double-counting techniques from the proof of the KST theorem.
Recall the hypergraph Turán problem (Remark 1.3.6). Given an r-uniform hypergraph H (also
known as an r-graph), we write ex(n, H) to be the maximum number of edges in an H-free r-graph.
The analog of a complete bipartite graph for a 3-graph is a complete tripartite 3-graph $K^{(3)}_{s_1,s_2,s_3}$, where one has three sets of vertices $S_1, S_2, S_3$ of sizes $s_1, s_2, s_3$, respectively, and every triple in $S_1 \times S_2 \times S_3$ is an edge. More generally, we write $K^{(r)}_{s_1,\dots,s_r}$ for a complete r-partite r-graph.
To help keep notation simple, we first consider what happens for 3-uniform hypergraphs.

Theorem 1.5.4 (KST for 3-graphs). For every s, there is some C such that
$$\mathrm{ex}(n, K^{(3)}_{s,s,s}) \le Cn^{3-1/s^2}.$$

Recall that the KST theorem (Theorem 1.4.2) was proved by counting the number of copies of $K_{s,1}$ in the graph in two different ways. For 3-graphs, we instead count the number of copies of $K^{(3)}_{s,1,1}$ in two different ways, one of which uses the KST theorem for $K_{s,s}$-free graphs.

Proof. Let G be a $K^{(3)}_{s,s,s}$-free 3-graph with n vertices and m edges. Let X denote the number of copies of $K^{(3)}_{s,1,1}$ in G (when s = 1, we count each copy three times).
Upper bound on X. Given a set S of s vertices, consider the set T of all unordered pairs of distinct vertices that would form a $K^{(3)}_{s,1,1}$ with S (with S in one part, and the two new vertices each in its own part). Note that T is the edge-set of a graph on the same n vertices. If T contains a $K_{s,s}$, then together with S we would have a $K^{(3)}_{s,s,s}$. Thus T is $K_{s,s}$-free, and hence by Theorem 1.4.2, $|T| = O_s(n^{2-1/s})$. Hence
$$X \lesssim_s \binom{n}{s} n^{2-1/s} \lesssim_s n^{s+2-1/s}.$$

Lower bound on X. We write deg(u, v) for the number of edges containing both u and v. Then, summing over all unordered pairs of distinct vertices u, v in G, we have
$$X = \sum_{\{u,v\}} \binom{\deg(u,v)}{s}.$$

As in the proof of Theorem 1.4.2, let
$$f_s(x) = \begin{cases} x(x-1)\cdots(x-s+1)/s! & \text{if } x \ge s-1,\\ 0 & \text{if } x < s-1.\end{cases}$$
Then $f_s$ is convex and $f_s(x) = \binom{x}{s}$ for all nonnegative integers x. Since the average of deg(u, v) is $3m/\binom{n}{2}$, we have
$$X = \sum_{\{u,v\}} f_s(\deg(u,v)) \ge \binom{n}{2} f_s\!\left(\frac{3m}{\binom{n}{2}}\right).$$

Combining the upper and lower bounds, we have
$$\binom{n}{2}\left(\frac{3m}{\binom{n}{2}}\right)^s \lesssim_s n^{s+2-1/s},$$
and hence
$$m = O_s(n^{3-1/s^2}). \qquad\square$$
Exercise 1.5.5. Show that for every r, s, t, there is some C such that $\mathrm{ex}(n, K^{(3)}_{r,s,t}) \le Cn^{3-1/(rs)}$.

We can iterate further, using the same technique, to prove an analogous result for every unifor-
mity.

Theorem 1.5.6 (Hypergraph KST). For every r ≥ 2 and s ≥ 1, there is some C such that
$$\mathrm{ex}(n, K^{(r)}_{s,\dots,s}) \le Cn^{r - s^{-r+1}},$$
where $K^{(r)}_{s,\dots,s}$ is the r-partite r-graph with s vertices in each of the r parts.
Proof. We prove by induction on r. The cases r = 2 and r = 3 were covered previously in
Theorem 1.4.2 and Theorem 1.5.4. Assume that r ≥ 3 and that the theorem has already been
established for smaller values of r. (Actually we could have started at r = 1 if we adjust the
definitions appropriately.)
Let G be a $K^{(r)}_{s,\dots,s}$-free r-graph with n vertices and m edges. Let X denote the number of copies of $K^{(r)}_{s,1,\dots,1}$ in G (when s = 1, we count each copy r times).
Upper bound on X. Given a set S of s vertices, consider the set T of all unordered (r − 1)-tuples of vertices that would form a $K^{(r)}_{s,1,\dots,1}$ with S (with S in one part, and the r − 1 new vertices each in its own part). Note that T is the edge-set of an (r − 1)-graph on the same n vertices. If T contains a $K^{(r-1)}_{s,\dots,s}$, then together with S we would have a $K^{(r)}_{s,\dots,s}$. Thus T is $K^{(r-1)}_{s,\dots,s}$-free, and by the induction hypothesis, $|T| = O_{r,s}(n^{r-1-s^{-r+2}})$. Hence
$$X \lesssim_{r,s} \binom{n}{s} n^{r-1-s^{-r+2}} \lesssim_{r,s} n^{r+s-1-s^{-r+2}}.$$
Lower bound on X. Given a set U of vertices, we write deg(U) for the number of edges containing all vertices in U. Then
$$X = \sum_{U\in\binom{V(G)}{r-1}} \binom{\deg(U)}{s}.$$
Let $f_s(x)$ be defined as in the previous proof. Since the average of deg(U) over all (r − 1)-element subsets U is $rm/\binom{n}{r-1}$, we have
$$X = \sum_{U\in\binom{V(G)}{r-1}} f_s(\deg(U)) \ge \binom{n}{r-1} f_s\!\left(\frac{rm}{\binom{n}{r-1}}\right).$$

Combining the upper and lower bounds, we have
$$\binom{n}{r-1} f_s\!\left(\frac{rm}{\binom{n}{r-1}}\right) \lesssim_{r,s} n^{s+r-1-s^{-r+2}},$$
and hence
$$m = O_{r,s}(n^{r - s^{-r+1}}). \qquad\square$$
Exercise 1.5.7. Prove that for every sequence of positive integers $s_1, \dots, s_r$, there exists C so that
$$\mathrm{ex}(n, K^{(r)}_{s_1,\dots,s_r}) \le Cn^{r-1/(s_1\cdots s_{r-1})}.$$
Now we are ready to prove the Erdős–Stone–Simonovits theorem. It suffices to establish the result for complete (r + 1)-partite graphs H, since every H with χ(H) = r + 1 is a subgraph of some complete (r + 1)-partite graph. This result is due to Erdős and Stone (1946).

Theorem 1.5.8 (Erdős–Stone theorem). Fix r ≥ 1 and s ≥ 1. Let $H = K_{s,\dots,s}$ be the complete (r + 1)-partite graph with s vertices in each part. Then
$$\mathrm{ex}(n, H) = \left(1 - \frac{1}{r} + o(1)\right)\binom{n}{2}.$$
In other words, using the notation $K_{r+1}[s]$ for the s-blow-up of $K_{r+1}$, obtained by replacing each vertex of $K_{r+1}$ by s duplicates of itself (so that $K_{r+1}[s] = H$ in the above theorem statement), the Erdős–Stone theorem says that
$$\pi(K_{r+1}[s]) = \pi(K_{r+1}) = 1 - \frac{1}{r}.$$
As earlier, the lower bound on ex(n, H) in the theorem comes from noting that the r-partite
Turán graph Tn,r is H-free.
The proof of the Erdős–Stone theorem will combine the hypergraph KST theorem with the
supersaturation theorem from Section 1.3. Recall the supersaturation result for subgraphs. Roughly,
it says that if the edge density of G significantly exceeds the Turán density of H, then G must have
many copies of H. The precise statement of Theorem 1.3.3 is copied below.
For every $\epsilon > 0$ and graph H there exist some $\delta > 0$ and $n_0$ such that every graph on $n \ge n_0$ vertices with at least $(\pi(H)+\epsilon)\binom{n}{2}$ edges contains at least $\delta n^{v(H)}$ copies of H as a subgraph.
We can rephrase this theorem equivalently as follows:
Fix H. Every n-vertex graph with o(nv(H) ) copies of H has edge density at most
π(H) + o(1).
(Above, the o(·) hypothesis should be interpreted as being applied to a sequence of graphs rather
than a single graph.) We will apply this supersaturation result for H = Kr+1 , combined with Turán’s
theorem, which tells us that π(Kr+1 ) = 1 − 1/r.

Proof of Theorem 1.5.8. Let G be an H-free graph. Consider the (r + 1)-graph F with the same vertices as G, and whose edges are the (r + 1)-cliques of G. Note that F is $K^{(r+1)}_{s,\dots,s}$-free, or else a copy of $K^{(r+1)}_{s,\dots,s}$ in F would be supported by a copy of H in G. Thus, by Theorem 1.5.6, F has $o(n^{r+1})$ edges. So G has $o(n^{r+1})$ copies of $K_{r+1}$, and thus by the supersaturation theorem quoted above, the edge density of G is at most $\pi(K_{r+1}) + o(1)$, which equals 1 − 1/r + o(1) by Turán's theorem. □
The proof technique illustrates another supersaturation principle: once above the Turán density threshold π(H), not only can you find one copy of H, but you can actually find a large blow-up of H. The following exercise illustrates this principle for hypergraphs.
Exercise 1.5.9 (Erdős–Stone for hypergraphs). Let H be an r-graph. Show that π(H[s]) = π(H),
where H[s], the s-blow-up of H, is obtained by replacing every vertex of H by s duplicates of itself.
In Section 2.6, we will give another proof of the Erdős–Stone–Simonovits theorem using the
graph regularity method.

1.6. Forbidding cycles


In this section, we consider the problem of determining ex(n, C` ), the maximum number of
edges in an n-vertex graph without an `-cycle.
First let us consider forbidding odd cycles. Let k be a positive integer. Then $C_{2k+1}$ has chromatic number 3, and so the Erdős–Stone–Simonovits theorem (Theorem 1.5.1) tells us that
$$\mathrm{ex}(n, C_{2k+1}) = (1 + o(1))\,\frac{n^2}{4}.$$
In fact, an even stronger statement is true. If n is large enough (as a function of k), then the complete
bipartite graph K bn/2c,dn/2e is always the extremal graph, just like in the triangle case.

Theorem 1.6.1 (Odd cycles). Let k be a positive integer. Then for all sufficiently large integers n (i.e., $n \ge n_0(k)$ for some $n_0(k)$), one has
$$\mathrm{ex}(n, C_{2k+1}) = \left\lfloor \frac{n^2}{4} \right\rfloor.$$
We will omit the proof of this theorem.
Let us now turn to forbidding even cycles. Since C2k is bipartite, we know from the KST theorem
that ex(n, C2k ) = o(n2 ). The following upper bound was determined by Bondy and Simonovits
(1974).

Theorem 1.6.2 (Even cycles). For every k ≥ 2, there exists a constant C so that

ex(n, C2k ) ≤ Cn1+1/k .


Remark 1.6.3 (Tightness). We will see in Section 1.10 a matching lower bound construction (up
to constant factors) for k = 2, 3, 5. For all other values of k, it is open whether a matching lower
bound construction exists.
We will not prove this theorem, but still we will prove a weaker result. This weaker result has
a short and neat proof, which hopefully gives some intuition as to why the above theorem should
be true.

Theorem 1.6.4. For any integer k ≥ 2, there exists a constant C so that every graph G with n
vertices and at least Cn1+1/k edges contains an even cycle of length at most 2k.
In other words, Theorem 1.6.4 says that
$$\mathrm{ex}(n, \{C_4, C_6, \dots, C_{2k}\}) = O_k(n^{1+1/k}).$$
Here, given a set $\mathcal{F}$ of graphs, $\mathrm{ex}(n, \mathcal{F})$ denotes the maximum number of edges in an n-vertex graph that does not contain any graph in $\mathcal{F}$ as a subgraph.
To prove this theorem, we first clean up the graph by removing some edges and vertices to get
a bipartite subgraph with large minimum degree.

Lemma 1.6.5. Every graph G has a bipartite subgraph with at least e(G)/2 edges.

Proof. Color every vertex with one of two colors uniformly at random. Then the expected number
of non-monochromatic edges is e(G)/2. Hence there exists a coloring that has at least e(G)/2
non-monochromatic edges, and these edges form the desired bipartite subgraph. 

Lemma 1.6.6. Let t ∈ R. Every graph with average degree 2t has a subgraph with minimum degree
greater than t.
Proof. Let G be a graph with average degree 2t. Removing a vertex of degree at most t cannot
decrease the average degree, since the total degree goes down by at most 2t and so the post-deletion
graph has average degree at least (2e(G) − 2t)/(v(G) − 1), which is at least 2e(G)/v(G) since
2e(G)/v(G) ≥ 2t. Let us repeatedly delete vertices of degree at most t in the remaining graph,
until every vertex has degree more than t. This algorithm must terminate with a non-empty graph
since every graph with at most 2t vertices has average degree less than 2t. 
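The proof of Lemma 1.6.6 is algorithmic: repeatedly delete a vertex of degree at most t. A minimal Python sketch of this peeling procedure (our own illustration; the adjacency-dictionary representation is an assumption) is:

```python
def min_degree_subgraph(adj, t):
    """Repeatedly delete vertices of degree <= t; return the surviving vertex set.

    adj: dict mapping each vertex to the set of its neighbors.
    If the input graph has average degree 2t, the survivors induce a
    non-empty subgraph of minimum degree greater than t, as in Lemma 1.6.6.
    """
    alive = set(adj)
    degree = {v: len(adj[v] & alive) for v in alive}
    stack = [v for v in alive if degree[v] <= t]
    while stack:
        v = stack.pop()
        if v not in alive or degree[v] > t:
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] <= t:
                    stack.append(u)
    return alive

# Example: a 5-cycle has average degree 2 = 2t with t = 1; every vertex survives.
C5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(min_degree_subgraph(C5, 1))   # {0, 1, 2, 3, 4}
```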

[Figure: layers $A_0 = \{u\}, A_1, A_2, \dots, A_k$ of vertices at increasing distance from u.]

Figure 1.6.1. Exploration from a vertex u in a C2k -free graph in the proof of
Theorem 1.6.4.

Proof of Theorem 1.6.4. Suppose G contains no even cycles of length at most 2k. Applying Lemma 1.6.5 followed by Lemma 1.6.6, we find a bipartite subgraph G′ of G with minimum degree greater than e(G)/(2v(G)) =: t. Let u be an arbitrary vertex of G′. For each i = 0, 1, . . . , k, let $A_i$ denote the set of vertices at distance exactly i from u (see Figure 1.6.1). For each i = 1, . . . , k − 1, every vertex of $A_i$ has
• no neighbors inside $A_i$ (or else G′ would not be bipartite),
• exactly one neighbor in $A_{i-1}$ (otherwise we could backtrack through two such neighbors, whose paths back to u must converge at some point, forming an even cycle of length at most 2k),
• and thus greater than t − 1 neighbors in $A_{i+1}$ (by the minimum degree assumption on G′).

Therefore, each layer $A_i$ expands to the next by a factor of at least t − 1, and so $|A_k| \ge (t-1)^k$. Hence
$$v(G) \ge |A_k| \ge (t-1)^k \ge \left(\frac{e(G)}{2v(G)} - 1\right)^k,$$
and thus
$$e(G) \le 2v(G)^{1+1/k} + 2v(G). \qquad\square$$
Exercise 1.6.7 (Extremal number of trees). Let T be a tree with k edges. Show that ex(n, T) ≤ kn.

1.7. Forbidding a sparse bipartite graph (and dependent random choice)


Any bipartite graph H is contained in $K_{s,t}$ for some s ≤ t, so by the KST theorem (Theorem 1.4.2), $\mathrm{ex}(n, H) \le \mathrm{ex}(n, K_{s,t}) = O_{s,t}(n^{2-1/s})$. The main result of this section,
below, gives a significant improvement when the maximum degree of H is small. The result was
first proved by Füredi (1991). The proof given here is due to Alon et al. (2003b), and uses a
probabilistic combinatorics technique known as dependent random choice.

Theorem 1.7.1. Let H be a bipartite graph with vertex bipartition A ∪ B such that every vertex in
A has degree at most r. Then there exists a constant C = CH such that
ex(n, H) ≤ Cn2−1/r .
Remark 1.7.2. The exponent 2 − 1/r is best possible as a function of r. Indeed, we will see in
the following section that for every r there exists some s so that ex(n, Kr,s ) ≥ cn2−1/r for some
c = c(r, s) > 0.
On the other hand, for specific graphs H, Theorem 1.7.1 may not be tight, e.g., $\mathrm{ex}(n, C_6) = \Theta(n^{4/3})$, whereas Theorem 1.7.1 only tells us that $\mathrm{ex}(n, C_6) = O(n^{3/2})$.
Given a graph G with many edges, the goal is to find a large subset U of vertices such that every
r-vertex subset of U has many common neighbors in G (even the case r = 2 is interesting). We can
then embed the B-vertices of H into U, and then extend the embedding to the whole H. The tricky
part is to find such a U.
Here is some intuition.
We want to host a party so that each pair of party-goers has many common friends. Whom
should we invite? Inviting people uniformly at random is not a good idea (why?). Perhaps we can
pick some random individual (Alice) to host a party inviting all her friends. Alice’s friends are
expected to share some common friends—at least they all know Alice.
We can take a step further, and pick a few people at random (Alice, Bob, Carol, David) and have
them host a party and invite all their common friends. This will likely be an even more sociable
crowd. At least all the party goers will know all the hosts, and likely even more. As long as the
social network is not too sparse, there should be lots of invitees.
Some invitees (e.g., Zack) might feel a bit out of place at the party: maybe they don't have many common friends with the other party-goers (they all know the hosts, but perhaps Zack doesn't know many others). To prevent such awkwardness, the hosts will cancel Zack's invitation. There shouldn't be too many people like Zack. The party must go on.
Here is the technical statement that we will prove. While there are many parameters, the specific
details are less important compared to the proof technique. This is quite a tricky proof.

Theorem 1.7.3 (Dependent random choice). Let n, r, m, t be positive integers and α > 0. Then every graph G with n vertices and at least $\alpha n^2/2$ edges contains a vertex subset U with
$$|U| \ge n\alpha^t - \binom{n}{r}\left(\frac{m}{n}\right)^t$$
such that every r-element subset S of U has more than m common neighbors in G.
In the theorem statement, t is an auxiliary parameter that does not appear in the conclusion. While one can optimize for t, it is instructive and convenient to leave it as is. The theorem is generally applied to graphs with at least $n^{2-c}$ edges, for some small c > 0, and we can play with the parameters to get |U| and m both as large as desired.
Proof. We say that an r-element subset of V(G) is "bad" if it has at most m common neighbors in G.
Let $u_1, \dots, u_t$ be vertices chosen uniformly and independently at random from V(G), and let A be their common neighborhood. (Keep in mind that $u_1, \dots, u_t$ and A are random. It may be a bit confusing in this proof what is random and what is not.)
Each fixed vertex v ∈ V(G) has probability $(\deg(v)/n)^t$ of being adjacent to all of $u_1, \dots, u_t$, and so by linearity of expectations and convexity,
$$\mathbb{E}|A| = \sum_{v\in V(G)} \mathbb{P}(v\in A) = \sum_{v\in V(G)} \left(\frac{\deg(v)}{n}\right)^t \ge n\left(\frac{1}{n}\sum_{v\in V(G)}\frac{\deg(v)}{n}\right)^t \ge n\alpha^t.$$

For any fixed $R \subseteq V(G)$,
$$\mathbb{P}(R\subseteq A) = \mathbb{P}(R \text{ is complete to } u_1,\dots,u_t) = \left(\frac{\#\text{ common neighbors of } R}{n}\right)^t.$$
If R is a bad r-element subset of vertices, then it has at most m common neighbors, and so
$$\mathbb{P}(R\subseteq A) \le \left(\frac{m}{n}\right)^t.$$
Therefore, summing over all $\binom{n}{r}$ possible r-vertex subsets $R \subseteq V(G)$, by linearity of expectation,
$$\mathbb{E}[\text{number of bad } r\text{-vertex subsets of } A] \le \binom{n}{r}\left(\frac{m}{n}\right)^t.$$
Let U be obtained from A by deleting an element from each bad r-vertex subset. So U has no bad r-vertex subsets. Also,
$$\mathbb{E}|U| \ge \mathbb{E}|A| - \mathbb{E}[\text{number of bad } r\text{-vertex subsets of } A] \ge n\alpha^t - \binom{n}{r}\left(\frac{m}{n}\right)^t.$$
Thus there exists some U with at least this size, with the property that all its r-vertex subsets have more than m common neighbors. □
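The proof is readily turned into a randomized procedure. The following Python sketch (our own illustration for small graphs; the function name and representation are not from the text) samples t random host vertices, takes their common neighborhood A, and deletes one vertex from each bad r-subset.

```python
import random
from itertools import combinations

def dependent_random_choice(adj, r, m, t):
    """One round of dependent random choice.

    adj: dict vertex -> set of neighbors.
    Returns a set U in which every r-subset has more than m common neighbors.
    Intended for small graphs: it examines every r-subset of A.
    """
    vertices = list(adj)
    hosts = [random.choice(vertices) for _ in range(t)]
    A = set(vertices)
    for h in hosts:                      # A = common neighborhood of the hosts
        A &= adj[h]
    U = set(A)
    for R in combinations(sorted(A), r):
        common = set.intersection(*(adj[x] for x in R))
        if len(common) <= m:             # R is "bad": remove one of its vertices
            U.discard(R[0])
    return U
```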

Now we are ready to show Theorem 1.7.1, which recall says that for a bipartite graph H with
vertex bipartition A ∪ B such that every vertex in A has degree at most r, one has ex(n, H) =
OH (n2−1/r ).
Proof of Theorem 1.7.1. Let G be a graph with n vertices and at least $Cn^{2-1/r}$ edges. By choosing C large enough (depending only on |A| + |B|), we have
$$n\left(2Cn^{-1/r}\right)^r - \binom{n}{r}\left(\frac{|A|+|B|}{n}\right)^r \ge |B|.$$
We want to show that G contains H as a subgraph. By dependent random choice (Theorem 1.7.3), applied with $\alpha = 2Cn^{-1/r}$, $t = r$, and $m = |A|+|B|$, we can embed the B-vertices of H into G so that every r-vertex subset of B (now viewed as a subset of V(G)) has more than |A| + |B| common neighbors.


Next, we embed the vertices of A one at a time. Suppose we need to embed v ∈ A (some previous vertices of A may have already been embedded at this point). Note that v has at most r neighbors in B, and these at most r vertices in B have more than |A| + |B| common neighbors in G. While some of these common neighbors may have already been used up in earlier steps to embed vertices of H, there are enough of them that they cannot all be used up, and thus we can embed v to some remaining common neighbor. This process ends with an embedding of H into G. □

Exercise 1.7.4. Let H be a bipartite graph with vertex bipartition A ∪ B, such that r vertices in A
are complete to B, and all remaining vertices in A have degree at most r. Prove that there is some
constant C = CH such that ex(n, H) ≤ Cn2−1/r for all n.

Exercise 1.7.5. Let $\epsilon > 0$. Show that, for sufficiently large n, every $K_4$-free graph with n vertices and at least $\epsilon n^2$ edges contains an independent set of size at least $n^{1-\epsilon}$.

Exercise 1.7.6 (Extremal numbers of degenerate graphs).


(a*) Prove that there is some absolute constant c > 0 so that for every positive integer r, every n-vertex graph with at least $n^{2-c/r}$ edges contains disjoint non-empty vertex subsets A and B such that every subset of at most r vertices in A has at least $n^c$ common neighbors in B and every subset of at most r vertices in B has at least $n^c$ common neighbors in A.
(b) We say that a graph H is r-degenerate if its vertices can be ordered so that every vertex has
at most r neighbors that appear before it in the ordering. Show that for every r-degenerate
bipartite graph H there is some constant C > 0 so that ex(n, H) ≤ Cn2−c/r , where c is the
same absolute constant from part (a) (c should not depend on H or r).

1.8. Lower bound constructions: overview


We proved various upper bounds on ex(n, H) in earlier sections. When H is non-bipartite,
the Turán graph construction (Definition 1.2.3) shows that the upper bound in the Erdős–Stone–
Simonovits theorem (Theorem 1.5.1) is tight to the first order. However, when H is bipartite, so
that ex(n, H) = o(n2 ), we have not seen any non-trivial lower bound constructions. In the remainder
of this chapter, we will see some methods for constructing H-free graphs for bipartite H. In some
cases, these constructions will have enough edges to match the upper bounds on ex(n, H) from
earlier sections. However, for most bipartite graphs H, there is a gap in known upper and lower
bounds on ex(n, H). It is a central problem in extremal graph theory to close this gap.
We will see three methods for constructing H-free graphs.
Randomized constructions. The idea is to take a random graph at a density that gives a small
number of copies of H, and then destroy these copies of H by removing some edges from the
random graph. The resulting graph is then H-free. This method is easy to implement and applies
quite generally to all H. For example, it will be shown that
$$\mathrm{ex}(n, H) = \Omega_H\!\left(n^{2-\frac{v(H)-2}{e(H)-1}}\right).$$

However, bounds arising from this method are usually not tight.
Algebraic constructions. The idea is to use (algebraic) geometry (over a finite field) to construct
a graph. Its vertices correspond to geometric objects such as points or lines. Its edges corresponds
to incidences or other algebraic relations. These constructions sometimes give tight bounds. They
work for a small number of graphs H, and usually require a different ad hoc idea for each H. They
work rarely, but when they do, they can appear quite mysterious, or even magical. Many important
tight lower bounds on bipartite extremal numbers arise this way. In particular it will be shown that
 
$$\mathrm{ex}(n, K_{s,t}) = \Omega_{s,t}\!\left(n^{2-1/s}\right) \quad \text{whenever } t \ge (s-1)!+1,$$

thereby matching the KST theorem for such s, t. Also, it will be shown that
 
$$\mathrm{ex}(n, C_{2k}) = \Omega_k\!\left(n^{1+1/k}\right) \quad \text{whenever } k \in \{2, 3, 5\},$$

thereby matching Theorem 1.6.2 for these values of k.


Randomized algebraic constructions. This is a recent invention that combines the above two
ideas. The vertex set is some finite field vector space (or more generally some algebraic set). We
choose a random polynomial, and use it to determine a binary relation on the points, which then
produces a graph.

1.9. Randomized constructions


We use the probabilistic method to construct an H-free graph. The Erdős–Rényi random graph
G(n, p) is the random graph on n vertices where every pair of vertices forms an edge independently
with probability p. We first take a G(n, p) with an appropriately chosen p. The number of copies
of H in G(n, p) is expected to be small, and we can destroy all such copies of H from the random
graph by removing some edges. The remaining graph will then be H-free.
The method of starting with a simple random object and then modifying it is sometimes called
“alteration method” or the “deletion method.”

Theorem 1.9.1. Let H be a graph with at least two edges. Then there exists a constant $c = c_H > 0$, so that for all n ≥ 2, there exists an H-free graph on n vertices with at least $cn^{2-\frac{v(H)-2}{e(H)-1}}$ edges. In other words,
$$\mathrm{ex}(n, H) \ge cn^{2-\frac{v(H)-2}{e(H)-1}}.$$
Proof. Let G be an instance of the Erdős–Rényi random graph G(n, p), with $p = \frac{1}{4} n^{-\frac{v(H)-2}{e(H)-1}}$ (chosen with hindsight). Let X denote the number of copies of H in G. Then our choice of p ensures that
$$\mathbb{E}X \le p^{e(H)} n^{v(H)} \le \frac{p}{2}\binom{n}{2} = \frac{1}{2}\,\mathbb{E}\,e(G).$$
Thus
$$\mathbb{E}[e(G) - X] \ge \frac{p}{2}\binom{n}{2} \gtrsim n^{2-\frac{v(H)-2}{e(H)-1}}.$$
Take a graph G such that e(G) − X is at least its expectation. Remove one edge from each copy of H in G, and we get an H-free graph with at least $e(G) - X \gtrsim n^{2-\frac{v(H)-2}{e(H)-1}}$ edges. □
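The following Python sketch (our own illustration, with $H = C_4$ and a small n of our choosing) carries out the alteration: sample G(n, p) with $p = \frac{1}{4}n^{-(v(H)-2)/(e(H)-1)} = \frac{1}{4}n^{-2/3}$, then delete one edge from each 4-cycle found.

```python
import random
from itertools import combinations

def random_c4_free_graph(n, seed=0):
    """Alteration method for H = C4 (v = 4, e = 4), so p = n^(-2/3) / 4."""
    random.seed(seed)
    p = 0.25 * n ** (-2.0 / 3.0)
    edges = {frozenset(e) for e in combinations(range(n), 2) if random.random() < p}
    # Destroy every 4-cycle by deleting one of its edges.
    for a, b, c, d in combinations(range(n), 4):
        # the three distinct 4-cycles on the vertex set {a, b, c, d}
        for cyc in [(a, b, c, d), (a, c, b, d), (a, b, d, c)]:
            cyc_edges = [frozenset((cyc[i], cyc[(i + 1) % 4])) for i in range(4)]
            if all(e in edges for e in cyc_edges):
                edges.discard(cyc_edges[0])
    return edges

G = random_c4_free_graph(60)
print(len(G))   # roughly (1/8) * 60^(4/3) edges remain, matching the n^{4/3} lower bound
```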
For some graphs H, we can bootstrap Theorem 1.9.1 to give an even better lower bound. For example, suppose H is a graph with v(H) = 10 and e(H) = 20 that contains $K_{4,4}$ as a subgraph (figure omitted). Applying Theorem 1.9.1 directly gives
$$\mathrm{ex}(n, H) \gtrsim n^{2-8/19}.$$
On the other hand, any $K_{4,4}$-free graph is automatically H-free. Applying Theorem 1.9.1 to $K_{4,4}$ (8 vertices, 16 edges) actually gives a better lower bound (since 2 − 6/15 > 2 − 8/19):
$$\mathrm{ex}(n, H) \ge \mathrm{ex}(n, K_{4,4}) \gtrsim n^{2-6/15}.$$
In general, given H, we should apply Theorem 1.9.1 to the subgraph of H with the maximum
(e(H) − 1)/(v(H) − 2) ratio. This gives the following corollary, which sometimes gives a better
lower bound than directly applying Theorem 1.9.1.
Definition 1.9.2. The 2-density of a graph H is defined by
$$m_2(H) := \max_{\substack{H'\subseteq H \\ e(H')\ge 2}} \frac{e(H')-1}{v(H')-2}.$$

Corollary 1.9.3. For any graph H with at least two edges, there exists a constant $c = c_H > 0$ such that
$$\mathrm{ex}(n, H) \ge cn^{2-1/m_2(H)}.$$
Proof. Let H′ be a subgraph of H with $m_2(H) = \frac{e(H')-1}{v(H')-2}$. Then $\mathrm{ex}(n, H) \ge \mathrm{ex}(n, H')$, and we can apply Theorem 1.9.1 to H′ to get $\mathrm{ex}(n, H) \ge cn^{2-1/m_2(H)}$. □
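For small graphs, the 2-density can be computed directly from the definition. A brute-force Python sketch (our own illustration) is below; it suffices to maximize over induced subgraphs, since for a fixed vertex set, taking all available edges only increases the ratio.

```python
from itertools import combinations

def two_density(vertices, edges):
    """m_2(H): max over subgraphs H' with e(H') >= 2 of (e(H') - 1)/(v(H') - 2)."""
    edge_set = {frozenset(e) for e in edges}
    best = 0.0
    for k in range(3, len(vertices) + 1):
        for S in combinations(vertices, k):
            e_S = sum(1 for e in combinations(S, 2) if frozenset(e) in edge_set)
            if e_S >= 2:
                best = max(best, (e_S - 1) / (k - 2))
    return best

# K_4 has m_2 = (6 - 1)/(4 - 2) = 2.5; C_4 has m_2 = (4 - 1)/(4 - 2) = 1.5.
K4 = [(a, b) for a, b in combinations(range(4), 2)]
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(two_density(range(4), K4), two_density(range(4), C4))   # 2.5 1.5
```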
Example 1.9.4. Theorem 1.9.1 combined with the upper bound from the KST theorem (Theorem 1.4.2) gives that for every fixed 2 ≤ s ≤ t,
$$n^{2-\frac{s+t-2}{st-1}} \lesssim \mathrm{ex}(n, K_{s,t}) \lesssim n^{2-\frac{1}{s}}.$$
When t is large compared to s, the exponents in the two bounds above are close to each other (but never equal). When t = s, the above bounds specialize to
$$n^{2-\frac{2}{s+1}} \lesssim \mathrm{ex}(n, K_{s,s}) \lesssim n^{2-\frac{1}{s}}.$$
In particular, for s = 2,
$$n^{4/3} \lesssim \mathrm{ex}(n, K_{2,2}) \lesssim n^{3/2}.$$
It turns out that the upper bound is tight. We will show this in the next section using an algebraic
construction.
Exercise 1.9.5. Find a graph H with χ(H) = 3 and $\mathrm{ex}(n, H) > \frac{1}{4}n^2 + n^{1.99}$ for all sufficiently large n.

1.10. Algebraic constructions


In this section, we use algebraic methods to construct Ks,t -free graphs for certain values of (s, t),
as well as C2k -free graphs for certain values of k. In both cases, the constructions are optimal in
that they match the upper bounds up to a constant factor.
We begin by constructing K2,2 -free graphs with the number of edges matching the KST theorem,
even in the constant factor. The construction is due to Erdős, Rényi, and Sós (1966) and Brown
(1966) independently.

Theorem 1.10.1. For every n,
$$\mathrm{ex}(n, K_{2,2}) \ge \left(\frac{1}{2} - o(1)\right) n^{3/2}.$$
Combining with the KST theorem, we obtain the corollary.

Corollary 1.10.2. For every n,
$$\mathrm{ex}(n, K_{2,2}) = \left(\frac{1}{2} - o(1)\right) n^{3/2}.$$
Before giving the proof of Theorem 1.10.1, let us first sketch the geometric intuition.
Consider a point-line incidence graph. This is a bipartite graph with two vertex parts P and
L, where P is a set of points in some geometric space, and L is a set of lines. We put in an edge
between p ∈ P and ` ∈ L if ` passes through p. Note that this graph is C4 -free. Indeed, a C4 would
correspond to two lines both passing through two distinct points, which is impossible.
We want to construct a set of points and a set of lines so that there are many incidences. It turns out that we should work in a plane over a finite field $\mathbb{F}_p$ (this is important; if we work in $\mathbb{R}^2$, it will not be possible to obtain enough incidences, due to the Szemerédi–Trotter theorem; see ??). In $\mathbb{F}_p^2$, we can actually take all points and all lines. There are $p^2$ points and $p^2 + p$ lines. Since every line contains p points, the graph has around $p^3$ edges, and so $\mathrm{ex}(2p^2 + p, K_{2,2}) \ge p^3$. By rounding down an integer n to the closest number of the form $2p^2 + p$ for a prime p, we already see that $\mathrm{ex}(n, K_{2,2}) \gtrsim n^{3/2}$ for all n. Here we use a theorem from number theory regarding large gaps between primes, which we quote below without proof.

Theorem 1.10.3 (Large gaps between primes). The largest prime below N has size N − o(N).

Remark 1.10.4. The best quantitative result of this form to date, due to Baker, Harman, and Pintz
(2001), says that there exists a prime in [N − N 0.525, N] for all sufficiently large N. Cramer’s
conjecture, which is wide open and based on a random model of the primes, speculates that the
o(N) in Theorem 1.10.3 may be replaced by O((log N)2 ).

To get a better constant in the above construction, we optimize somewhat by using the same
vertices to represent both points and lines. This pairing of points and lines is known as polarity in
projective geometry.

Proof of Theorem 1.10.1. Let p denote the largest prime such that $p^2 - 1 \le n$. Then $p = (1-o(1))\sqrt{n}$ by Theorem 1.10.3. Let G be a graph with vertex set $V(G) = \mathbb{F}_p^2 \setminus \{(0,0)\}$ and an edge between (x, y) and (a, b) if and only if ax + by = 1 in $\mathbb{F}_p$.
For any two distinct vertices (a, b) and (a′, b′) in V(G), they have at most one common neighbor, since there is at most one solution to the system ax + by = 1 and a′x + b′y = 1. Therefore, G is $K_{2,2}$-free.
For every (a, b) ∈ V(G), there are exactly p vertices (x, y) satisfying ax + by = 1. However, one of those vertices could be (a, b) itself. So every vertex in G has degree p or p − 1. Hence G has at least $(p^2-1)(p-1)/2 = (1/2 - o(1))n^{3/2}$ edges. □
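The construction above is easy to verify computationally. The Python sketch below (our own illustration, with a small prime of our choosing) builds the graph on $\mathbb{F}_p^2 \setminus \{(0,0)\}$, checks that no two vertices have two common neighbors, and reports the edge count.

```python
from itertools import combinations

def polarity_graph(p):
    """Vertices: F_p^2 minus the origin; (x, y) ~ (a, b) iff ax + by = 1 (mod p)."""
    V = [(x, y) for x in range(p) for y in range(p) if (x, y) != (0, 0)]
    adj = {v: set() for v in V}
    for (x, y), (a, b) in combinations(V, 2):
        if (a * x + b * y) % p == 1:
            adj[(x, y)].add((a, b))
            adj[(a, b)].add((x, y))
    return adj

p = 11
adj = polarity_graph(p)
num_edges = sum(len(s) for s in adj.values()) // 2
max_codeg = max(len(adj[u] & adj[v]) for u, v in combinations(adj, 2))
print(num_edges, max_codeg)
# The edge count lies between (p^2-1)(p-1)/2 = 600 and (p^2-1)p/2 = 660,
# and max_codeg is at most 1, confirming K_{2,2}-freeness.
```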
Next, we construct K3,3 -free graphs with the number of edges matching the KST theorem. This
is due to Brown (1966).

Theorem 1.10.5. For every n,
$$\mathrm{ex}(n, K_{3,3}) \ge \left(\frac{1}{2} - o(1)\right) n^{5/3}.$$
Consider the incidence graph between points in 3 dimensions and unit spheres. This graph is $K_{3,3}$-free since no three unit spheres can share three distinct common points. Again, one needs to work over a finite field to attain the desired bounds, but it is easier to visualize the setup in Euclidean space, where the claim is clearly true.
Remark 1.10.6. It is known that the constant 1/2 in Theorem 1.10.5 is the best constant possible.
Proof sketch. Let p be the largest prime less than $n^{1/3}$. Fix a nonzero element $d \in \mathbb{F}_p$, which we take to be a quadratic residue if p ≡ 3 (mod 4) and a quadratic non-residue if p ≢ 3 (mod 4). Construct a graph G with vertex set $V(G) = \mathbb{F}_p^3$, and an edge between (x, y, z) and (a, b, c) ∈ V(G) if and only if
$$(a-x)^2 + (b-y)^2 + (c-z)^2 = d.$$
It turns out that each vertex has $(1-o(1))p^2$ neighbors (the intuition here is that, for a fixed (a, b, c), if we choose $x, y, z \in \mathbb{F}_p$ independently and uniformly at random, then the resulting sum $(a-x)^2 + (b-y)^2 + (c-z)^2$ is roughly uniformly distributed, and hence equals d with probability close to 1/p). Furthermore, the graph is $K_{3,3}$-free (to see this, think about how one might prove the analogous claim in $\mathbb{R}^3$ via algebraic manipulations; by considering the radical planes between pairs of spheres, the claim boils down to the fact that no sphere contains three collinear points, which is true here due to the quadratic (non)residue hypothesis on d).
Thus G is a K3,3 -free graph on p3 ≤ n vertices and with at least (1/2−o(1))p5 = (1/2−o(1))n5/3
edges. 
It is unknown if the above ideas can be extended to construct K4,4 -free graphs with Ω(n7/4 )
edges. It is a major open problem to determine the asymptotics of ex(n, K4,4 ).

Conjecture 1.10.7. For every fixed s ≥ 4, one has

ex(n, Ks,s ) = Θs (n2−1/s ).


Now we present a substantial generalization of the above constructions, due to Kollár, Rónyai,
and Szabó (1996) and Alon, Rónyai, and Szabó (1999). It gives a matching lower bound (up to a
constant factor) to the KST theorem for Ks,t whenever t is sufficiently large compared to s.

Theorem 1.10.8. Fix a positive integer s ≥ 2. Then
$$\mathrm{ex}(n, K_{s,(s-1)!+1}) \ge \left(\frac{1}{2} - o(1)\right) n^{2-1/s}.$$

Corollary 1.10.9. If t > (s − 1)!, then

ex(n, Ks,t ) = Θs,t (n2−1/s ).


We first prove a slightly weaker version of Theorem 1.10.8, namely that ex(n, Ks,s!+1 ) ≥ (1/2 −
o(1))n2−1/s . This construction was first shown by Kollár, Rónyai, and Szabó (1996). It will then be
modified to give the full Theorem.
Let p be a prime. Recall that the norm map $N\colon \mathbb{F}_{p^s} \to \mathbb{F}_p$ is defined by
$$N(x) := x \cdot x^p \cdot x^{p^2} \cdots x^{p^{s-1}} = x^{\frac{p^s-1}{p-1}}.$$
Note that $N(x) \in \mathbb{F}_p$ for all $x \in \mathbb{F}_{p^s}$ since $N(x)^p = N(x)$ and $\mathbb{F}_p$ is the set of elements in $\mathbb{F}_{p^s}$ invariant under the automorphism $x \mapsto x^p$. Furthermore, since $\mathbb{F}_{p^s}^\times$ is a cyclic group of order $p^s - 1$, we know that
$$|\{x \in \mathbb{F}_{p^s} : N(x) = 1\}| = \frac{p^s-1}{p-1}. \tag{1.10.1}$$
We define NormGraph p,s to be the graph with vertex set F ps and an edge between distinct
a, b ∈ F ps if N(a + b) = 1.
By (1.10.1), every vertex in $\mathrm{NormGraph}_{p,s}$ has degree at least
$$\frac{p^s-1}{p-1} - 1 \ge p^{s-1}$$
(we had to subtract 1 in case N(x + x) = 1). And thus the number of edges is at least $p^{2s-1}/2$. It remains to establish that $\mathrm{NormGraph}_{p,s}$ is $K_{s,s!+1}$-free. Once this is done, we can take p to be the largest prime at most $n^{1/s}$, and then
$$\mathrm{ex}(n, K_{s,s!+1}) \ge \mathrm{ex}(p^s, K_{s,s!+1}) \ge \frac{p^{2s-1}}{2} \ge \left(\frac{1}{2} - o(1)\right) n^{2-1/s}.$$

Proposition 1.10.10. Let s ≥ 2. Then NormGraph p,s is Ks,s!+1 -free.

We wish to upper bound the number of common neighbors of a set of s vertices. This amounts to showing that a certain system of algebraic equations cannot have too many solutions. We quote
without proof the following key algebraic result from Kollár, Rónyai, and Szabó (1996), which can
be proved using algebraic geometry.

Theorem 1.10.11. Let F be any field and $a_{ij}, b_i \in F$ such that $a_{ij} \ne a_{i'j}$ whenever $i \ne i'$. Then the system of equations
$$\begin{aligned}
(x_1 - a_{11})(x_2 - a_{12})\cdots(x_s - a_{1s}) &= b_1\\
(x_1 - a_{21})(x_2 - a_{22})\cdots(x_s - a_{2s}) &= b_2\\
&\;\;\vdots\\
(x_1 - a_{s1})(x_2 - a_{s2})\cdots(x_s - a_{ss}) &= b_s
\end{aligned}$$
has at most s! solutions $(x_1, \dots, x_s) \in F^s$.
Remark 1.10.12. Consider the special case when all the bi are 0. In this case, since the ai j are
distinct for each fixed j, every solution to the system corresponds to a permutation π : [s] → [s],
setting xi = aiπ(i) . So there are exactly s! solutions in this special case. The difficult part of the
theorem says that the number of solutions cannot increase if we move b away from the origin.
Proof of Proposition 1.10.10. Consider distinct $y_1, y_2, \dots, y_s \in \mathbb{F}_{p^s}$. We wish to bound the number of common neighbors x. Recall that in a field of characteristic p, we have the identity $(x+y)^p = x^p + y^p$ for all x, y. So
$$1 = N(x + y_i) = (x+y_i)(x+y_i)^p \cdots (x+y_i)^{p^{s-1}} = (x+y_i)(x^p + y_i^p)\cdots(x^{p^{s-1}} + y_i^{p^{s-1}})$$
for all 1 ≤ i ≤ s. By Theorem 1.10.11 (viewing $x, x^p, \dots, x^{p^{s-1}}$ as the variables $x_1, \dots, x_s$), these s equations (as i ranges over [s]) have at most s! solutions in x. Note that the hypothesis of Theorem 1.10.11 is satisfied since $y_i^{p^k} = y_j^{p^k}$ if and only if $y_i = y_j$ in $\mathbb{F}_{p^s}$. □
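For s = 2, the norm is simply $N(x) = x \cdot x^p$, and writing $\mathbb{F}_{p^2} = \mathbb{F}_p(\omega)$ with $\omega^2$ a fixed non-residue, $N(a + b\omega) = a^2 - \omega^2 b^2$. The Python sketch below (our own illustration, with p = 7) builds $\mathrm{NormGraph}_{7,2}$ and verifies that every pair of vertices has at most $s! = 2$ common neighbors, i.e., the graph is $K_{2,3}$-free.

```python
from itertools import combinations

p, nr = 7, 3   # 3 is a quadratic non-residue mod 7 (squares mod 7 are 1, 2, 4)

def norm(a, b):
    """N(a + b*w) = (a + b*w)(a - b*w) = a^2 - nr * b^2, since w^p = -w."""
    return (a * a - nr * b * b) % p

V = [(a, b) for a in range(p) for b in range(p)]   # elements a + b*w of F_{p^2}
adj = {v: set() for v in V}
for (a1, b1), (a2, b2) in combinations(V, 2):
    if norm((a1 + a2) % p, (b1 + b2) % p) == 1:    # edge iff N(u + v) = 1
        adj[(a1, b1)].add((a2, b2))
        adj[(a2, b2)].add((a1, b1))

max_codeg = max(len(adj[u] & adj[v]) for u, v in combinations(V, 2))
print(len(V), sum(len(s) for s in adj.values()) // 2, max_codeg)
# max_codeg <= 2 = s!, so NormGraph_{7,2} is K_{2,3}-free, as Proposition 1.10.10 asserts.
```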
Now we modify the norm graph construction to forbid Ks,(s−1)!+1 , thereby yielding Theo-
rem 1.10.8.
Let $\mathrm{ProjNormGraph}_{p,s}$ be the graph with vertex set $\mathbb{F}_{p^{s-1}} \times \mathbb{F}_p^\times$, where two vertices $(X, x), (Y, y) \in \mathbb{F}_{p^{s-1}} \times \mathbb{F}_p^\times$ are adjacent if and only if
$$N(X + Y) = xy.$$
Every vertex (X, x) has degree $p^{s-1} - 1$ since its neighbors are $(Y, N(X+Y)/x)$ for all $Y \ne -X$. So $\mathrm{ProjNormGraph}_{p,s}$ has $(p^{s-1}-1)p^{s-1}(p-1)/2$ edges. As earlier, it remains to show that this graph is $K_{s,(s-1)!+1}$-free. Once we know this, by taking p to be the largest prime satisfying $p^{s-1}(p-1) \le n$, we obtain the desired lower bound
$$\mathrm{ex}(n, K_{s,(s-1)!+1}) \ge (p^{s-1}-1)p^{s-1}(p-1)/2 \ge \left(\frac{1}{2} - o(1)\right) n^{2-1/s}.$$

Proposition 1.10.13. ProjNormGraph p,s is Ks,(s−1)!+1 -free.

Proof. Fix distinct $(Y_1, y_1), \dots, (Y_s, y_s) \in \mathbb{F}_{p^{s-1}} \times \mathbb{F}_p^\times$. We wish to show that there are at most (s − 1)! solutions $(X, x) \in \mathbb{F}_{p^{s-1}} \times \mathbb{F}_p^\times$ to the system of equations
$$N(X + Y_i) = xy_i, \qquad i = 1, \dots, s.$$
Assume this system has at least one solution. Then if $Y_i = Y_j$ with $i \ne j$, we must have $y_i = y_j$. Therefore all the $Y_i$ are distinct. For each i < s, dividing $N(X + Y_i) = xy_i$ by $N(X + Y_s) = xy_s$ gives
$$N\!\left(\frac{X+Y_i}{X+Y_s}\right) = \frac{y_i}{y_s}, \qquad i = 1, \dots, s-1.$$

Dividing both sides by $N(Y_i - Y_s)$ gives
$$N\!\left(\frac{1}{X+Y_s} + \frac{1}{Y_i - Y_s}\right) = \frac{y_i}{N(Y_i - Y_s)\, y_s}, \qquad i = 1, \dots, s-1.$$
Now apply Theorem 1.10.11 (same as in the proof of Proposition 1.10.10). We deduce that there
are at most (s − 1)! choices for X, and each such X automatically determines x = N(X + Y1 )/y1 .
Thus there are at most (s − 1)! solutions (X, x). 
Finally, let us turn to constructions of $C_{2k}$-free graphs. We mentioned in Section 1.6 that $\mathrm{ex}(n, C_{2k}) = O_k(n^{1+1/k})$. We saw a matching lower bound construction for 4-cycles. Now we give
matching constructions for 6-cycles and 10-cycles. (It remains an open problem for other cycle
lengths.) The existence of such $C_{2k}$-free graphs for k ∈ {3, 5} was first established by Benson (1966) and Singleton (1966). The construction given here is due to Wenger (1991), with a simplified description due to Conlon (2021).

Theorem 1.10.14. Let k ∈ {2, 3, 5}. Then there is a constant c > 0 such that for every n,
ex(n, C2k ) ≥ cn1+1/k .
The following construction generalizes the point-line incidence graph construction earlier for
the C4 -free graph in Theorem 1.10.1. Here we consider a special set of lines in Fqk , whereas
previously we took all lines in F2q .
Construction 1.10.15. Let q be a prime power. Let L denote the set of all lines in Fqk whose
direction can be written as (1, t, . . . , t k−1 ) for some t ∈ Fq . Let G q,k denote the bipartite point-line
incidence graph with vertex sets Fqk and L, i.e., (p, `) ∈ Fqk × L is an edge if and only if p ∈ `.
We have $|L| = q^k$, since to specify a line in L we can provide a point on it with first coordinate equal to zero, along with a choice of $t \in \mathbb{F}_q$ giving the direction of the line. So the graph $G_{q,k}$ has $n = 2q^k$ vertices. Since each line contains exactly q points, there are exactly $q^{k+1} \asymp n^{1+1/k}$ edges in the graph. It remains to show that this graph is $C_{2k}$-free whenever k ∈ {2, 3, 5}. Then Theorem 1.10.14 would follow after the usual trick of taking q to be the largest prime with $2q^k < n$.

Lemma 1.10.16. For k ∈ {2, 3, 5}, the graph $G_{q,k}$ from Construction 1.10.15 is $C_{2k}$-free.

Proof. A 2k-cycle in $G_{q,k}$ would correspond to $p_1, \ell_1, \dots, p_k, \ell_k$ with distinct $p_1, \dots, p_k \in \mathbb{F}_q^k$ and distinct $\ell_1, \dots, \ell_k \in L$, and $p_i, p_{i+1} \in \ell_i$ for all i (indices taken mod k). Let $(1, t_i, \dots, t_i^{k-1})$ denote the direction of $\ell_i$.
Then
$$p_{i+1} - p_i = a_i(1, t_i, \dots, t_i^{k-1})$$
for some $a_i \in \mathbb{F}_q \setminus \{0\}$. Thus (recall that $p_{k+1} = p_1$)
$$\sum_{i=1}^{k} a_i(1, t_i, \dots, t_i^{k-1}) = \sum_{i=1}^{k} (p_{i+1} - p_i) = 0. \tag{1.10.2}$$

Due to the Vandermonde determinant
$$\begin{vmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{k-1}\\
1 & x_2 & x_2^2 & \cdots & x_2^{k-1}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & x_k & x_k^2 & \cdots & x_k^{k-1}
\end{vmatrix} = \prod_{i<j}(x_j - x_i),$$
we see that, after deleting duplicates, the vectors $(1, t_i, \dots, t_i^{k-1})$, i = 1, . . . , k, are linearly independent. For (1.10.2) to hold, each vector $(1, t_i, \dots, t_i^{k-1})$ must appear at least twice in the sum, with the corresponding coefficients $a_i$ adding up to zero.
Since the lines $\ell_1, \dots, \ell_k$ are distinct, for each i = 1, . . . , k (indices taken mod k), the lines $\ell_i$ and $\ell_{i+1}$ cannot be parallel (they share the point $p_{i+1}$). So $t_i \ne t_{i+1}$. When k ∈ {2, 3, 5}, it is impossible to select $t_1, \dots, t_k$ with no two consecutive terms equal (including wrap-around) and with each value repeated at least twice. Therefore the 2k-cycle cannot exist. □
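The case k = 2 of Construction 1.10.15 is small enough to check directly. The Python sketch below (our own illustration with q = 7) builds $G_{q,2}$ and verifies $C_4$-freeness by checking that no two vertices have two common neighbors.

```python
from itertools import combinations

q = 7
# Points of F_q^2, and lines with direction (1, t): the line through (0, c)
# is {(a, c + a*t) : a in F_q}, parameterized by (c, t).
points = [(x, y) for x in range(q) for y in range(q)]
lines = [(c, t) for c in range(q) for t in range(q)]

def on_line(point, line):
    (x, y), (c, t) = point, line
    return y % q == (c + x * t) % q

adj = {('p', pt): set() for pt in points}
adj.update({('l', ln): set() for ln in lines})
for pt in points:
    for ln in lines:
        if on_line(pt, ln):
            adj[('p', pt)].add(('l', ln))
            adj[('l', ln)].add(('p', pt))

num_edges = sum(len(s) for s in adj.values()) // 2
max_codeg = max(len(adj[u] & adj[v]) for u, v in combinations(adj, 2))
print(num_edges, max_codeg)   # q^{k+1} = 343 edges; max_codeg = 1, so no C_4
```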

1.11. Randomized algebraic constructions


In this section, we show how to add randomness to algebraic constructions, thereby combining
the power of both approaches. This idea is due to Bukh (2015).
The algebraic constructions in the previous section can be abstractly described as follows. Take
a graph whose vertices are points in some algebraic set (e.g., some finite field geometry), with
two vertices x and y being adjacent if some algebraic relationship, e.g., f (x, y) = 0, is satisfied.
Previously, this f was carefully chosen by hand. The new idea is to take f to be a random polynomial.
We illustrate this technique by giving another proof of the tightness of the KST bound on
extremal numbers for Ks,t when t is large compared to s. We will prove the following result in this
section.

Theorem 1.11.1. For every s ≥ 2, there exists some t so that
$$\mathrm{ex}(n, K_{s,t}) \ge \left(\frac{1}{2} - o(1)\right) n^{2-1/s}.$$
The construction we present here has a worse dependence of t on s than in Theorem 1.10.8. The
main purpose of this section is to illustrate the technique of randomized algebraic constructions.
Bukh (2021) later gave a significant extension of this technique which shows that $\mathrm{ex}(n, K_{s,t}) = \Omega_s(n^{2-1/s})$ for some t close to $9^s$, improving on Theorem 1.10.8, which required t > (s − 1)!.
Let q be the largest prime power satisfying q s ≤ n. Due to prime gaps (Theorem 1.10.3),
we have q = (1 − o(1))n1/s . So it suffices to construct a Ks,t -free graph on q s vertices with
(1/2 − o(1))q2s−1 edges.
Let $d = s^2$. Let
$$f \in \mathbb{F}_q[x_1, x_2, \dots, x_s, y_1, y_2, \dots, y_s]_{\le d}$$
be a polynomial chosen uniformly at random among all polynomials with degree at most d in each of $X = (X_1, X_2, \dots, X_s)$ and $Y = (Y_1, Y_2, \dots, Y_s)$ and furthermore satisfying f(X, Y) = f(Y, X). In other words,
$$f = \sum_{\substack{i_1+\cdots+i_s \le d\\ j_1+\cdots+j_s \le d}} a_{i_1,\dots,i_s,j_1,\dots,j_s}\, X_1^{i_1}\cdots X_s^{i_s}\, Y_1^{j_1}\cdots Y_s^{j_s},$$
where the coefficients $a_{i_1,\dots,i_s,j_1,\dots,j_s} \in \mathbb{F}_q$ are chosen subject to $a_{i_1,\dots,i_s,j_1,\dots,j_s} = a_{j_1,\dots,j_s,i_1,\dots,i_s}$, but otherwise independently and uniformly at random.
Let G be the graph with vertex set Fqs , with distinct x, y ∈ Fqs adjacent if and only if f (x, y) = 0.
Then G is a random graph. The next two lemmas show that G behaves in some ways like a
random graph with edges independently appearing with probability 1/q. Indeed, the next lemma
shows that every pair of vertices form an edge with probability 1/q.

Lemma 1.11.2. Suppose f is randomly chosen as above. For all $u, v \in \mathbb{F}_q^s$,
$$\mathbb{P}[f(u,v) = 0] = \frac{1}{q}.$$
Proof. Note that resampling the constant term of f does not change its distribution. Thus, f(u, v) is uniformly distributed in $\mathbb{F}_q$ for a fixed (u, v). Hence f(u, v) takes each value with probability 1/q. □
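The following Python sketch (our own illustration, with the small parameters q = 7 and s = 2, so d = s² = 4) samples a random symmetric polynomial as above and checks that the empirical edge density of the resulting graph is close to 1/q, as Lemma 1.11.2 predicts.

```python
import random
from itertools import combinations

q, s, d = 7, 2, 4                      # d = s^2
random.seed(0)

def exponents(s, d):
    """All exponent tuples (i_1, ..., i_s) with i_1 + ... + i_s <= d."""
    if s == 1:
        return [(i,) for i in range(d + 1)]
    return [(i,) + rest for i in range(d + 1) for rest in exponents(s - 1, d - i)]

E = exponents(s, d)
coeff = {}
for I in E:                            # symmetric coefficients: a[I, J] = a[J, I]
    for J in E:
        coeff[(I, J)] = coeff[(J, I)] if (J, I) in coeff else random.randrange(q)

def f(x, y):
    total = 0
    for (I, J), a in coeff.items():
        term = a
        for xi, i in zip(x, I):
            term = term * pow(xi, i, q) % q
        for yj, j in zip(y, J):
            term = term * pow(yj, j, q) % q
        total = (total + term) % q
    return total

V = [(a, b) for a in range(q) for b in range(q)]
edges = sum(1 for u, v in combinations(V, 2) if f(u, v) == 0)
print(edges / (len(V) * (len(V) - 1) / 2), 1 / q)   # empirical density vs 1/q
```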

More generally, we show below that the expected occurrence of small subgraphs mirrors that
of the usual random graph with independent edges. We write $\binom{U}{2}$ for the set of unordered pairs of elements of U.

Lemma 1.11.3. Suppose f is randomly chosen as above. Let $U \subseteq \mathbb{F}_q^s$ with $|U| \le d+1$. Then the vector $(f(u,v))_{\{u,v\}\in\binom{U}{2}}$ is uniformly distributed in $\mathbb{F}_q^{\binom{|U|}{2}}$. In particular, for any $E \subseteq \binom{U}{2}$, one has
$$\mathbb{P}[f(u,v) = 0 \text{ for all } \{u,v\}\in E] = q^{-|E|}.$$


Proof. We first perform multivariate Lagrange interpolation to show that $(f(u,v))_{\{u,v\}}$ can take all possible values. For each pair $u, v \in U$ with $u \ne v$, we can find some polynomial $\ell_{u,v} \in \mathbb{F}_q[X_1, \dots, X_s]$ of degree at most 1 such that $\ell_{u,v}(u) = 1$ and $\ell_{u,v}(v) = 0$. For each $u \in U$, let
$$q_u(X) = \prod_{v \in U\setminus\{u\}} \ell_{u,v}(X) \in \mathbb{F}_q[X_1, \dots, X_s],$$
which has degree at most $|U| - 1 \le d$. It satisfies $q_u(u) = 1$, and $q_u(v) = 0$ for all $v \in U\setminus\{u\}$. Let
$$p(X, Y) = \sum_{\{u,v\}\in\binom{U}{2}} c_{u,v}\left(q_u(X)q_v(Y) + q_v(X)q_u(Y)\right)$$
with $c_{u,v} \in \mathbb{F}_q$. Then $p(u, v) = c_{u,v}$ for all distinct $u, v \in U$.
Now let each $c_{u,v} \in \mathbb{F}_q$ above be chosen independently and uniformly at random, so that p(X, Y) is a random polynomial. Note that f(X, Y) and p(X, Y) are independent random polynomials, both with degree at most d in each of X and Y and satisfying p(X, Y) = p(Y, X). Since f is chosen uniformly at random, it has the same distribution as f + p. Since $(p(u,v))_{u,v} = (c_{u,v})_{u,v} \in \mathbb{F}_q^{\binom{|U|}{2}}$ is uniformly distributed, the same must be true for $(f(u,v))_{u,v}$ as well. □

Now fix U ⊂ Fqs with |U| = s. We want to show that it is rare for U to have many common
neighbors. We will use the method of moments. Let
ZU = {x ∈ Fqs \ U : f (x, u) = 0 for all u ∈ U}.

Then ZU is the set of common neighbors of U along with possibly some additional vertices in U.
Then, using Lemma 1.11.3,
$$\begin{aligned}
\mathbb{E}[|Z_U|^d] &= \mathbb{E}\Bigl[\Bigl(\sum_{v\in\mathbb{F}_q^s\setminus U} 1\{v\in Z_U\}\Bigr)^{\!d}\,\Bigr]\\
&= \sum_{v^{(1)},\dots,v^{(d)}\in\mathbb{F}_q^s\setminus U} \mathbb{P}[v^{(1)},\dots,v^{(d)}\in Z_U]\\
&= \sum_{v^{(1)},\dots,v^{(d)}\in\mathbb{F}_q^s\setminus U} \mathbb{P}[f(u,v)=0 \text{ for all } u\in U \text{ and } v\in\{v^{(1)},\dots,v^{(d)}\}]\\
&= \sum_{r\le d} \binom{q^s-|U|}{r}\, q^{-rs}\, \#\{\text{surjections } [d]\to[r]\}\\
&\le \sum_{r\le d} \#\{\text{surjections } [d]\to[r]\}\\
&= O_d(1).
\end{aligned}$$
Using Markov's inequality, we get (writing $X = |Z_U|$)
$$\mathbb{P}(X \ge \lambda) = \mathbb{P}(X^d \ge \lambda^d) \le \frac{\mathbb{E}[X^d]}{\lambda^d} \le \frac{O_d(1)}{\lambda^d}. \tag{1.11.1}$$
Remark 1.11.4. All the probabilistic arguments up to this point would be identical had we used a
random graph with independent edges appearing with probability p. In both settings, the X above
is a random variable with constant order expectation. However, their distributions are extremely
different, as we will soon see. For a random graph with independent edges, X behaves like a
Poisson random variable, and consequently, for any constant t, P(X ≥ t) is bounded from below by
a constant. Consequently, many s-element sets of vertices are expected to have at least t common
neighbors, which means that this method will not work. However, this is not the case with the
random algebraic construction. It is impossible for X to take on certain ranges of values—if X is
somewhat large, then it must be very large.
Note that ZU is defined by s polynomial equations. The next result tells us that the number of
points on such an algebraic variety must be either bounded or at least around q.

Lemma 1.11.5. For all s, d there exists a constant C such that if $f_1(X), \dots, f_s(X)$ are polynomials on $\mathbb{F}_q^s$ of degree at most d, then
$$\{x \in \mathbb{F}_q^s : f_1(x) = \cdots = f_s(x) = 0\}$$
has size either at most C or at least $q - C\sqrt{q}$.
The lemma can be deduced from the following important result from algebraic geometry, due to Lang and Weil (1954), which says that the number of points of an r-dimensional algebraic variety in $\mathbb{F}_q^s$ is roughly $q^r$, as long as certain irreducibility hypotheses are satisfied. We include here the statement of the Lang–Weil bound. Here $\overline{\mathbb{F}_q}$ denotes the algebraic closure of $\mathbb{F}_q$.

Theorem 1.11.6 (Lang–Weil bound). Let $g_1, \dots, g_m \in \mathbb{F}_q[X]$ be polynomials of degree at most d. Let
$$V = \{x \in \overline{\mathbb{F}_q}^{\,s} : g_1(x) = g_2(x) = \cdots = g_m(x) = 0\}.$$
Suppose V is an irreducible variety. Then
$$\left|V \cap \mathbb{F}_q^s\right| = q^{\dim V}\left(1 + O_{s,m,d}(q^{-1/2})\right).$$
The two cases in Lemma 1.11.5 then correspond to the zero dimension case and the positive
dimension case, though some care is needed to deal with what happens if the variety is reducible
in the field closure. We refer the reader to Bukh (2015) for details on how to deduce Lemma 1.11.5
from the Lang–Weil bound.
We now continue the proof of Theorem 1.11.1. Recall that f is the random polynomial from earlier. We fixed an s-element set $U \subseteq \mathbb{F}_q^s$ and defined $Z_U = \{x \in \mathbb{F}_q^s\setminus U : f(x,u) = 0 \text{ for all } u\in U\}$. Apply Lemma 1.11.5 to the polynomials f(X, u) as u ranges over the s elements of U. Then for large enough q, with C the constant from Lemma 1.11.5, one always has either $|Z_U| \le C$ or $|Z_U| > q/2$. Thus, by (1.11.1),
$$\mathbb{P}(|Z_U| > C) = \mathbb{P}\left(|Z_U| > \frac{q}{2}\right) \le \frac{O_d(1)}{(q/2)^d}.$$
So the expected number of s-element subsets U with $|Z_U| > C$ is at most (recall we set $d = s^2$ at the beginning of the proof)
$$\binom{q^s}{s}\, \frac{O_d(1)}{(q/2)^d} = O_s(1).$$
Remove from G a vertex from every s-element set U with $|Z_U| > C$. Then the resulting graph is $K_{s,C+1}$-free. Furthermore, we remove at most $q^s$ edges for each deleted vertex, so the expected number of remaining edges is at least
$$\frac{1}{q}\binom{q^s}{2} - O_s(q^s) = \left(\frac{1}{2} - o(1)\right) q^{2s-1}.$$
Finally, given n, we can take the largest prime q satisfying q s ≤ n to finish the proof of Theo-
rem 1.11.1.
Further reading
Graph theory is a large subject. There are many important topics that are quite far from the main
theme of this book. Several excellent graph theory textbooks are available: Bollobás (1998), Bondy
and Murty (2008), Diestel (2017), West (1996). The three-volume Combinatorial Optimization
by Schrijver (2003) is also an excellent reference for graph theory, with a focus on combinatorial
algorithms.
For more on the bipartite Turán problem, see the survey by Füredi and Simonovits (2013).
For the hypergraph Turán problem, see the survey by Keevash (2011).
For more on the dependent random choice technique, see the survey by Fox and Sudakov (2011).
CHAPTER 2

The graph regularity method

In this chapter, we discuss a powerful technique in extremal graph theory developed in the
1970’s, known as Szemerédi’s graph regularity lemma. The graph regularity method has wide
ranging applications, and is now considered a central technique in the field. The regularity lemma
produces a “rough structural” decomposition of an arbitrary graph (though it is mainly useful for
graphs with quadratically many edges). It then allows us to model an arbitrary graph by a certain
random graph model.
The regularity method introduces us to a central theme of the book:
the dichotomy of structure versus pseudorandomness.
This dichotomy is analogous to the more familiar concept of “signal versus noise”, namely that a
complex system can be decomposed into a structural piece with plenty of information content (the
signal) as well as a random-like residue (the noise). This idea will show up again later in Chapter 6
when we discuss Fourier analysis in additive combinatorics. It is often the case that the structural
piece and the random-like piece can be individually analyzed. However, it is often far from clear
that one can always come up with a useful decomposition.
We begin the chapter with the statement and the proof of the graph regularity lemma. Subse-
quently, we will prove Roth’s theorem using the regularity method. This proof, due to Ruzsa and
Szemerédi (1978), is not the original proof by Roth (1953), whose original Fourier analytic proof
we will see in Chapter 6. Nevertheless, it is important for being historically one of the first major
applications of the graph regularity method. Similar to the proof of Schur’s theorem in Chapter 0,
this graph theoretic proof of Roth’s theorem demonstrates a fruitful connection between graph
theory and additive combinatorics.
By the regularity method, we mean both the graph regularity lemma as well as methods for
applying it. Rather than viewing it as some specific theorem or set of theorems, graph regularity
should be viewed as a general technique malleable to various applications. The reader should avoid
getting bogged down by specific choices of parameters in the statements and proofs below, and
rather, focus on the main ideas and techniques. Students often find the regularity method difficult
to learn, perhaps because the technical details can obscure the intuition. Section 2.7 contains a
number of important exercises on applying the graph regularity method. It is useful to work through
these exercises carefully in order to practice applying the graph regularity method.

2.1. Szemerédi’s regularity lemma


In this section, we state and prove Szemerédi's regularity lemma, which tells us, roughly
speaking:
The vertex set of every graph can be partitioned into a bounded number of parts
so that the graph looks random-like between most pairs of parts.
Below is an illustration of what the outcome of the partition looks like. Here the vertex set of a
graph is partitioned into five parts. Between a pair of parts (including between a part and itself) is

a random-like graph with a certain edge-density (e.g., 0.4 between the first and second parts, and
0.7 between the first and third parts, etc.).

Definition 2.1.1. Let X and Y be sets of vertices in a graph G. Let eG (X, Y ) be the number of edges
between X and Y ; that is,
eG (X, Y ) := |{(x, y) ∈ X × Y : xy ∈ E(G)}| .
Define the edge density between X and Y in G by
d_G(X, Y) := e_G(X, Y) / (|X||Y|).
We drop the subscript G if context is clear.
We allow X and Y to overlap in the definition above. (It may be useful to picture the bipartite
setting, where X and Y are automatically disjoint.)
What should it mean for a graph to be random-like? For the current application, we want to say
that the edge-density between a pair of parts X and Y is similar to the “local” edge density between
subsets of X and Y . It is too restrictive to allow taking every subset of X and Y , e.g., by restricting
to single-vertex sets {x} ⊂ X and {y} ⊂ Y , the “local” edge density between {x} and {y} can vary
from 0 to 1. To avoid this issue, we should only consider densities between not-too-small subsets of
X and Y.
The next definition tells us what it means for a graph between a pair of vertex sets to be
“random-like.”


Definition 2.1.2. Let G be a graph and U, W ⊂ V(G). We call (U, W) an ε-regular pair in G if for
all A ⊂ U and B ⊂ W with |A| ≥ ε|U| and |B| ≥ ε|W|, one has
|d(A, B) − d(U, W)| ≤ ε.
If (U, W) is not ε-regular, then we say that their irregularity is witnessed by some A ⊂ U and B ⊂ W
satisfying |A| ≥ ε|U|, |B| ≥ ε|W|, and |d(A, B) − d(U, W)| > ε.
Exercise 2.1.3. Using the Chernoff bound, show that a random bipartite graph between two sets
is ε-regular with probability approaching 1 as the sizes of the vertex sets grow to infinity.
Remark 2.1.4. The ε in |A| ≥ ε|U| and |B| ≥ ε|W| plays a different role from the ε in
|d(A, B) − d(U, W)| ≤ ε. However, it is usually not important to distinguish these ε's. So we
use only one ε for convenience of notation.
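As a concrete, if computationally naive, illustration of Definition 2.1.2, here is a small Python sketch (not part of the text) that estimates whether a pair looks ε-regular by sampling random subsets of the allowed sizes; an exact check would require examining all large subsets, which is exponential, so this is only a heuristic with arbitrarily chosen parameters.

    import random

    def density(adj, A, B):
        """Edge density between vertex sets A and B; adj is a set of frozenset edges."""
        pairs = len(A) * len(B)
        edges = sum(1 for a in A for b in B if frozenset((a, b)) in adj)
        return edges / pairs if pairs else 0.0

    def looks_regular(adj, U, W, eps, samples=200):
        """Heuristic sampling check of eps-regularity (an exact check is exponential)."""
        d_UW = density(adj, U, W)
        for _ in range(samples):
            A = random.sample(U, max(1, int(eps * len(U))))
            B = random.sample(W, max(1, int(eps * len(W))))
            if abs(density(adj, A, B) - d_UW) > eps:
                return False          # found a witness to irregularity
        return True                   # no witness found among the sampled subsets

    # A random bipartite graph with edge probability 1/2 should pass the check.
    U, W = list(range(60)), list(range(60, 120))
    adj = {frozenset((u, w)) for u in U for w in W if random.random() < 0.5}
    print(looks_regular(adj, U, W, eps=0.2))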

Definition 2.1.5. Given a graph G, a partition P = {V_1, . . . , V_k} of its vertex set is an ε-regular
partition if
∑_{(i,j) ∈ [k]² : (V_i, V_j) not ε-regular} |V_i||V_j| ≤ ε|V(G)|².
In other words, an ε-regular partition is a partition of the vertex set where all but an ε-fraction of
the pairs of vertices of the graph lie between ε-regular parts.
Remark 2.1.6. When |V_1| = · · · = |V_k|, the inequality says that at most εk² of the pairs (V_i, V_j) are not
ε-regular.
Also, note that the summation includes i = j. If none of the V_i are too large, say |V_i| ≤ εn for each
i, then the terms with i = j contribute at most ∑_i |V_i|² ≤ εn ∑_i |V_i| = εn², which is negligible.

We are now ready to state Szemerédi’s graph regularity lemma.

Theorem 2.1.7 (Szemerédi's graph regularity lemma). For every ε > 0, there exists a constant M
such that every graph has an ε-regular partition into at most M parts.
Here is the proof idea. We will generate the desired vertex partition according to the following
algorithm:
(1) Start with the trivial partition of V(G), i.e., a single part containing all of V(G).
(2) While the current partition P is not ε-regular:
(a) For each (V_i, V_j) that is not ε-regular, find a pair witnessing the irregularity in V_i and V_j.
(b) Refine P using all the witnessing pairs. (Here given two partitions P and Q of the same
set, we say that Q refines P if each part of Q is contained in a part of P. In other words,
we divide each part of P further to obtain Q.)

We repeat step (2) until the partition is -regular, at which point the algorithm terminates. The
resulting partition is always -regular by design. It remains to show that the number of iterations
is bounded as a function of . To see this, we keep track of a quantity that necessarily increases at
each iteration of the procedure. This is called an energy increment argument. (The reason that
we call it an “energy” is that it is the mean squared density, i.e., an L 2 norm, and kinetic energy in
physics is also an L 2 norm.)
Definition 2.1.8 (Energy). Let G be an n-vertex graph (whose dependence we drop from the
notation). Let U, W ⊂ V(G). Define
q(U, W) := (|U||W| / n²) d(U, W)².
For partitions P_U = {U_1, . . . , U_k} of U and P_W = {W_1, . . . , W_l} of W, define
q(P_U, P_W) := ∑_{i=1}^{k} ∑_{j=1}^{l} q(U_i, W_j).

Finally, for a partition P = {V_1, . . . , V_k} of V(G), define its energy to be
q(P) := q(P, P) = ∑_{i=1}^{k} ∑_{j=1}^{k} q(V_i, V_j) = ∑_{i=1}^{k} ∑_{j=1}^{k} (|V_i||V_j| / n²) d(V_i, V_j)².

Since the edge density is always between 0 and 1, we have 0 ≤ q(P) ≤ 1 for all partitions P.
The following lemmas show that the energy cannot decrease upon refinement, and furthermore, it
must increase substantially at each step of the algorithm above.
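To make the definition concrete, here is a short Python sketch (not from the text) that computes the energy q(P) of a vertex partition of a graph given as a set of edges; the function names and the tiny example are illustrative only.

    def density(edges, X, Y):
        """Edge density between vertex sets X and Y, as in Definition 2.1.1."""
        count = sum(1 for x in X for y in Y if frozenset((x, y)) in edges)
        return count / (len(X) * len(Y))

    def energy(edges, parts, n):
        """q(P) = sum over ordered pairs of parts of (|Vi||Vj| / n^2) d(Vi, Vj)^2."""
        return sum(len(Vi) * len(Vj) / n ** 2 * density(edges, Vi, Vj) ** 2
                   for Vi in parts for Vj in parts)

    # Tiny example: a 6-vertex cycle split into two parts of size 3.
    n = 6
    edges = {frozenset((i, (i + 1) % n)) for i in range(n)}
    P = [list(range(3)), list(range(3, 6))]
    print(energy(edges, P, n))        # a number between 0 and 1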

Lemma 2.1.9 (Energy never decreases under refinement). Let G be a graph, U, W ⊂ V(G), PU a
partition of U, and PW a partition of W. Then q(PU, PW ) ≥ q(U, W).


Proof. Let n = v(G). Let P_U = {U_1, . . . , U_k} and P_W = {W_1, . . . , W_l}. Choose x ∈ U and y ∈ W
uniformly and independently at random. Let U_i be the part of P_U that contains x and W_j be the
part of P_W that contains y. Define the random variable Z := d(U_i, W_j). We have
E[Z] = ∑_{i=1}^{k} ∑_{j=1}^{l} (|U_i||W_j| / (|U||W|)) d(U_i, W_j) = d(U, W) = √( (n²/(|U||W|)) q(U, W) ).
We have
E[Z²] = ∑_{i=1}^{k} ∑_{j=1}^{l} (|U_i||W_j| / (|U||W|)) d(U_i, W_j)² = (n²/(|U||W|)) q(P_U, P_W).
By convexity, E[Z²] ≥ E[Z]², which implies q(P_U, P_W) ≥ q(U, W). □

Lemma 2.1.10 (Energy never decreases under refinement). Given two vertex partitions P and P′
of some graph, if P′ refines P, then q(P) ≤ q(P′).
Proof. The conclusion follows by applying Lemma 2.1.9 to each pair of parts of P. In more detail,
letting P = {V_1, . . . , V_m}, and supposing P′ refines each V_i into a partition P′_{V_i} = {V′_{i1}, . . . , V′_{ik_i}} of V_i,
so that P′ = P′_{V_1} ∪ · · · ∪ P′_{V_m}, we have
q(P) = ∑_{i,j} q(V_i, V_j) ≤ ∑_{i,j} q(P′_{V_i}, P′_{V_j}) = q(P′). □

Lemma 2.1.11 (Energy boost for an irregular pair). Let G be an n-vertex graph. If (U, W) is not
ε-regular, as witnessed by A ⊂ U and B ⊂ W, then
q({A, U \ A}, {B, W \ B}) > q(U, W) + ε⁴ (|U||W| / n²).

This is the “red bull lemma”, giving an energy boost when feeling irregular.
Proof. Define Z as in the proof of Lemma 2.1.9 for P_U = {A, U \ A} and P_W = {B, W \ B}. Then
Var(Z) = E[Z²] − E[Z]² = (q(P_U, P_W) − q(U, W)) · n²/(|U||W|).
We have Z = d(A, B) with probability ≥ |A||B|/(|U||W|) (corresponding to the event x ∈ A and
y ∈ B). So
Var(Z) = E[(Z − E[Z])²] ≥ (|A||B|/(|U||W|)) (d(A, B) − d(U, W))² > ε · ε · ε².
Putting the two inequalities together gives the claim. □
The next lemma, corresponding to step (2b) of the algorithm above, shows that we can put all
the witnessing pairs together to obtain an energy increment.

Lemma 2.1.12 (Energy boost for an irregular partition). If a partition P = {V_1, . . . , V_k} of V(G) is
not ε-regular, then there exists a refinement Q of P where every V_i is partitioned into at most 2^{k+1}
parts, and such that
q(Q) > q(P) + ε⁵.
Proof. Let
R = {(i, j) ∈ [k]² : (V_i, V_j) is ε-regular}  and  R̄ = [k]² \ R.
For each pair (V_i, V_j) that is not ε-regular, find a pair A_{i,j} ⊂ V_i and B_{i,j} ⊂ V_j that witnesses the
irregularity. Do this simultaneously for all (i, j) ∈ R̄. Note that for i ≠ j, we can take A_{i,j} = B_{j,i} due to
symmetry. When i = j, we should allow for the possibility of A_{i,i} and B_{i,i} being distinct.
Let Q be a common refinement of P by all the A_{i,j} and B_{i,j}'s. There are ≤ k + 1 such distinct
non-empty sets inside each V_i. So Q refines each V_i into at most 2^{k+1} parts. Let Q_i be the partition
of V_i given by Q. Then, using the monotonicity of energy under refinements (Lemma 2.1.9),
q(Q) = ∑_{(i,j)∈[k]²} q(Q_i, Q_j)
     = ∑_{(i,j)∈R} q(Q_i, Q_j) + ∑_{(i,j)∈R̄} q(Q_i, Q_j)
     ≥ ∑_{(i,j)∈R} q(V_i, V_j) + ∑_{(i,j)∈R̄} q({A_{i,j}, V_i \ A_{i,j}}, {B_{i,j}, V_j \ B_{i,j}}).
By Lemma 2.1.11, the energy boost lemma, the above sum is
     > ∑_{(i,j)∈[k]²} q(V_i, V_j) + ∑_{(i,j)∈R̄} ε⁴ |V_i||V_j| / n².
The first sum equals q(P). Since P is not ε-regular, we have ∑_{(i,j)∈R̄} |V_i||V_j| > εn², so the second
sum is > ε⁵, and we deduce the desired inequality. □

Remark 2.1.13. There is a subtlety in the above proof that might be easy to get wrong if you try to
re-do the proof yourself. The refinement Q must be obtained in a single step by refining P using
all the witnessing sets Ai, j simultaneously. If instead one picks out a pair Ai, j ⊂ Vi and A j,i ⊂ Vj ,
refines the partition using just this pair, and then tries to iterate using another irregular pair (Vi 0, Vj 0 ),
the energy boost step would not work, since -regularity (or lack thereof) is not well-preserved
under taking refinements.
Proof of the graph regularity lemma (Theorem 2.1.7). Start with a trivial partition of the vertex set
of the graph. Repeatedly apply Lemma 2.1.12 whenever the current partition is not ε-regular. By
Lemma 2.1.12, the energy of the partition increases by more than ε⁵ at each iteration. Since the
energy of the partition is ≤ 1, we must stop after < ε^{−5} iterations, terminating in an ε-regular
partition.
If a partition has k parts, then Lemma 2.1.12 produces a refinement with ≤ k·2^{k+1} parts. We
start with a trivial partition with one part, and then refine < ε^{−5} times. Observe the crude bound
k·2^{k+1} ≤ 2^{2^k}. So the total number of parts at the end is ≤ tower(⌈2ε^{−5}⌉), where
tower(k) := 2^{2^{···^2}}, a tower of 2's of height k. □
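To get a feel for how quickly this bound grows, here is a tiny Python snippet (an illustration, not part of the proof) computing the first few values of the tower function.

    def tower(k):
        """Tower of 2's of height k: tower(0) = 1 and tower(k) = 2 ** tower(k - 1)."""
        return 1 if k == 0 else 2 ** tower(k - 1)

    for k in range(1, 5):
        print(k, tower(k))            # 2, 4, 16, 65536; tower(5) already has 19729 digits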
Remark 2.1.14. Let us stress what the proof is not saying. It is not saying that the partition gets
more and more regular under each refinement. Also, it is not saying that the partition gets more regular
as the energy gets higher. Rather, the role of the energy quantity is simply to bound the number of
iterations of the refinements.
The bound on the number of parts guaranteed by the proof is a constant for each fixed  > 0,
but it grows extremely quickly as  gets smaller. Is the poor quantitative dependence somehow
due to a suboptimal proof strategy? Surprisingly, the tower-type bound is necessary, as shown by
Gowers (1997).

Theorem 2.1.15 (Lower bound on the number of parts in a regularity partition). There exists a
constant c > 0 such that for all sufficiently small ε > 0, there exists a graph with no ε-regular
partition into fewer than tower(⌈ε^{−c}⌉) parts.
We do not include the proof here. The idea is to construct a graph that roughly reverse engineers
the proof of the regularity lemma, so there is essentially a unique -regular partition, which must
have many parts. This tells us that the proof of the regularity lemma we just saw is the natural
proof.
Recall that in Definition 2.1.5 of an ε-regular partition, we are allowed to have some irregular
pairs. Is it necessary to permit irregular pairs? It turns out that we must permit them. Exercise 2.1.21
gives an example of a graph where every regularity partition must have irregular pairs.
The regularity lemma is quite flexible. For example, we can start with an arbitrary partition of
V(G) instead of the trivial partition in the proof, in order to obtain a partition that is a refinement
of a given partition. The exact same proof with this modification yields the following.

Theorem 2.1.16 (Regularity starting with an arbitrary initial partition). For every ε > 0, there exists
a constant M such that for every graph G and a partition P_0 of V(G), there exists an ε-regular
partition P of V(G) that is a refinement of P_0, and such that each part of P_0 is refined into at most
M parts.

Here is another strengthening of the regularity lemma where we impose the additional require-
ment that vertex parts should be as equal in size as possible. We say that a partition is equitable if
all part sizes are within one of each other. In other words, a partition of a set of size n into k parts
is equitable if every part has size bn/kc or dn/ke.

Theorem 2.1.17 (Equitable regularity partition). For all ε > 0 and m_0, there exists a constant
M such that every graph has an ε-regular equitable partition of its vertex set into k parts with
m_0 ≤ k ≤ M.
Remark 2.1.18. The lower bound m0 requirement on the number of parts is somewhat superficial.
The reason for including it here is that it is often convenient to discard all the edges that lie within
individual parts of the partition, and since there are at most n²/k such edges, they contribute negligibly
if k is not too small, e.g., if we require m_0 ≥ 1/ε.
There are several ways to guarantee equitability. One method is sketched below. We equitize
the partition at every step of the refinement iteration, so that at each step in the proof, we both
obtain an energy increment and also end up with an equitable partition. Here we omit detailed
choices of parameters and calculations, which are mostly straightforward but can get a bit messy.
One can see the original paper of Szemerédi (1978) for details.
Proof sketch of Theorem 2.1.17. Here is a modified algorithm:
(1) Start with an arbitrary equitable partition of the graph into m_0 parts.
(2) While the current equitable partition P is not ε-regular:
(a) (Refinement/energy boost) Refine the partition using pairs that witness irregularity (as in
the earlier proof). The new partition P′ divides each part of P into ≤ 2^{|P|+1} parts (as in Lemma 2.1.12).
(b) (Equitization) Modify P′ into an equitable partition by arbitrarily chopping each part
of P′ into parts of size |V(G)|/m (for some appropriately chosen m = m(|P′|, ε)) plus
some leftover pieces, which are then combined together and then divided into parts of
size |V(G)|/m.
The refinement step (2a) increases the energy by ≥ ε⁵ as before. The energy might go down in the
equitization step (2b), but it should not decrease by much, provided that the m chosen in that step
is large enough (say, m = 100|P′|ε^{−5}). So overall, we still have an energy increment of ≥ ε⁵/2
at each step, and hence the process still terminates after O(ε^{−5}) steps. The total number of parts at
the end is ≤ m_0 tower(O(ε^{−5})). □
Exercise 2.1.19 (Basic inheritance of regularity). Let G be a graph and X, Y ⊂ V(G). If (X, Y ) is
an η-regular pair, then (X 0, Y 0) is -regular for all X 0 ⊂ X with |X 0 | ≥ η|X | and Y 0 ⊂ Y with
|Y 0 | ≥ η|Y |.
Exercise 2.1.20 (An alternate definition of regular pairs). Let G be a graph and X, Y ⊂ V(G). Say
that (X, Y) is ε-homogeneous if for all A ⊂ X and B ⊂ Y, one has
|e(A, B) − |A||B| d(X, Y)| ≤ ε|X||Y|.
Show that if (X, Y) is ε-regular, then it is ε-homogeneous. Also, show that if (X, Y) is ε³-
homogeneous, then it is ε-regular.
The next exercise shows why we must allow for irregular pairs in the graph regularity lemma.
Exercise 2.1.21 (Unavoidability of irregular pairs). Let the half-graph Hn be the bipartite graph on
2n vertices {a1, . . . , an, b1, . . . , bn } with edges {ai b j : i ≤ j}.

(a) For every ε > 0, explicitly construct an ε-regular partition of H_n into O(1/ε) parts.
(b) Show that there is some c > 0 such that for every ε ∈ (0, c), every positive integer k and
sufficiently large multiple n of k, every partition of the vertices of H_n into k equal-sized parts
contains at least ck pairs of parts which are not ε-regular.

Exercise 2.1.22 (Existence of a regular pair of subsets). Show that there is some absolute constant
C > 0 such that for every 0 < ε < 1/2, every graph on n vertices contains an ε-regular pair of
vertex subsets each with size at least δn, where δ = 2^{−ε^{−C}}.

Exercise 2.1.23 (Existence of a regular subset). Given a graph G, we say that X ⊂ V(G) is
ε-regular if the pair (X, X) is ε-regular, i.e., for all A, B ⊂ X with |A|, |B| ≥ ε|X|, one has
|d(A, B) − d(X, X)| ≤ ε.
This exercise asks for two different proofs of the claim:
For every ε > 0, there exists δ > 0 such that every graph contains an ε-regular subset of vertices
that is at least a δ fraction of the vertex set.
(a) Prove the claim using Szemerédi's regularity lemma, showing that one can obtain the ε-regular
subset by combining a suitable sub-collection of parts from some regularity partition.
(b*) Give an alternative proof of the claim showing that one can take δ = exp(− exp(ε^{−C})) for some
constant C.

Exercise 2.1.24∗ (Regularity partition into regular sets). Prove or disprove: for every ε > 0 there
exists M so that every graph has an ε-regular partition into at most M parts, with every part being
ε-regular with itself.

2.2. Triangle counting lemmas


Szemerédi’s regularity lemma gave us a vertex partition of a graph. How can we use this
partition?
In this section, we begin by establishing the triangle counting lemma. Given three vertex sets
X, Y, Z that are pairwise ε-regular in G, we can approximate G by a random tripartite graph on X, Y, Z with
the same edge densities between parts. By comparing G to its random model approximation, we
expect the number of triples (x, y, z) ∈ X × Y × Z forming a triangle in G to be roughly

d(X, Y )d(X, Z)d(Y, Z)|X ||Y ||Z |.

The triangle counting lemma makes this intuition precise.


Theorem 2.2.1 (Triangle counting lemma). Let G be a graph and X, Y, Z be subsets of the vertices of
G such that (X, Y), (Y, Z), (Z, X) are all ε-regular pairs for some ε > 0. If d(X, Y), d(X, Z), d(Y, Z) ≥
2ε, then
|{(x, y, z) ∈ X × Y × Z : xyz is a triangle in G}|
≥ (1 − 2ε)(d(X, Y) − ε)(d(X, Z) − ε)(d(Y, Z) − ε) |X||Y||Z|.
Remark 2.2.2. The vertex sets X, Y, Z do not have to be disjoint, but one does not lose any generality
by assuming that they are disjoint in this statement. Indeed, starting with X, Y, Z ⊂ V(G), one can
always create an auxiliary tripartite graph G′ with vertex parts being disjoint replicas of X, Y, Z
and the edge relations in X × Y being the same for G and G′, and likewise for X × Z and Y × Z.
Under this auxiliary construction, a triple in X × Y × Z forms a triangle in G if and only if it forms a
triangle in G′.

We first begin with a simple but very useful consequence of ε-regularity. It says that in an
ε-regular pair (X, Y), almost all vertices of X have roughly the same number of neighbors in Y.

Lemma 2.2.3. Let (X, Y) be an ε-regular pair. Then < ε|X| vertices in X have < (d(X, Y) − ε)|Y|
neighbors in Y. Likewise, < ε|Y| vertices in Y have < (d(X, Y) − ε)|X| neighbors in X.
Proof. Let A be the vertices in X with < (d(X, Y) − ε)|Y| neighbors in Y. Then d(A, Y) < d(X, Y) − ε,
and thus |A| < ε|X| by Definition 2.1.2 as (X, Y) is an ε-regular pair. The other claim is similar. □

Proof of Theorem 2.2.1. By Lemma 2.2.3, we can find X′ ⊂ X with |X′| ≥ (1 − 2ε)|X| such that
every vertex x ∈ X′ has ≥ (d(X, Y) − ε)|Y| neighbors in Y and ≥ (d(X, Z) − ε)|Z| neighbors in Z.
For each such x ∈ X′, we have |N(x) ∩ Y| ≥ (d(X, Y) − ε)|Y| ≥ ε|Y|. Likewise, |N(x) ∩ Z| ≥
ε|Z|. Since (Y, Z) is ε-regular, the edge density between N(x) ∩ Y and N(x) ∩ Z is ≥ d(Y, Z) − ε.
So for each x ∈ X′, the number of edges between N(x) ∩ Y and N(x) ∩ Z is
≥ (d(Y, Z) − ε)|N(x) ∩ Y||N(x) ∩ Z| ≥ (d(X, Y) − ε)(d(X, Z) − ε)(d(Y, Z) − ε)|Y||Z|.
Multiplying by |X′| ≥ (1 − 2ε)|X|, we obtain the desired lower bound on the number of triangles. □
Remark 2.2.4. We only need the lower bound on the triangle count for our applications in this
chapter, but the same proof can also be modified to give an upper bound, which we leave as an
exercise.
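As a quick sanity check on this estimate (not part of the text), one can compare the actual triangle count across three parts of a random tripartite graph with the prediction d(X, Y)d(X, Z)d(Y, Z)|X||Y||Z|; the parameters in the Python sketch below are arbitrary choices.

    import random

    def rand_bipartite(A, B, p):
        return {frozenset((a, b)) for a in A for b in B if random.random() < p}

    def density(E, A, B):
        return sum(1 for a in A for b in B if frozenset((a, b)) in E) / (len(A) * len(B))

    n, p = 40, 0.3
    X, Y, Z = range(n), range(n, 2 * n), range(2 * n, 3 * n)
    E = rand_bipartite(X, Y, p) | rand_bipartite(X, Z, p) | rand_bipartite(Y, Z, p)
    triangles = sum(1 for x in X for y in Y for z in Z
                    if {frozenset((x, y)), frozenset((x, z)), frozenset((y, z))} <= E)
    predicted = density(E, X, Y) * density(E, X, Z) * density(E, Y, Z) * n ** 3
    print(triangles, round(predicted))    # the two numbers should be close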

2.3. Triangle removal lemma


Now we are ready to deduce our first major application of the regularity lemma: the triangle
removal lemma, due to Ruzsa and Szemerédi (1978). Informally, the triangle removal lemma
says that a graph with few triangles can be made triangle-free by removing a few edges. Here,
“few triangles” means a subcubic number of triangles (i.e., asymptotically less than the maximum
possible number) and “few edges” means a subquadratic number of edges.

Theorem 2.3.1 (Triangle removal lemma). For all ε > 0, there exists δ > 0 such that any graph
on n vertices with fewer than δn³ triangles can be made triangle-free by removing fewer than εn²
edges.

The triangle removal lemma can be equivalently stated as:


An n-vertex graph with o(n3 ) triangles can be made triangle-free by removing
o(n2 ) edges.
Our proof of Theorem 2.3.1 demonstrates how to apply the graph regularity lemma. Here is a
representative “recipe” for the regularity method.

Remark 2.3.2 (Regularity method recipe). Typical applications of the regularity method proceed
in the following steps:
(1) Partition the vertex set of a graph using the regularity lemma.
(2) Clean the graph by removing edges that behave poorly in the regularity partition. Most
commonly, we remove edges that lie between pairs of parts with
(a) irregularity, or
(b) low-density, or
(c) one of the parts too small
This ends up removing a negligible number of edges.
(3) Count a certain pattern in the cleaned graph using a counting lemma.

To prove the triangle removal lemma, after cleaning the graph (which removes few edges), we
claim that the resulting cleaned graph must be triangle-free, or else the triangle counting lemma
would find many triangles, contradicting the hypothesis.

Proof of the triangle removal lemma (Theorem 2.3.1). Suppose we are given a graph on n vertices
with < δn³ triangles, for some parameter δ we will choose later. Apply the graph regularity lemma,
Theorem 2.1.7, to obtain an ε/4-regular partition of the graph with parts V_1, V_2, . . . , V_m. Next, for
each (i, j) ∈ [m]², remove all edges between V_i and V_j if
(a) (V_i, V_j) is not ε/4-regular, or
(b) d(V_i, V_j) < ε/2, or
(c) min{|V_i|, |V_j|} < εn/(4m).
Since the partition is ε/4-regular (recall Definition 2.1.5), the number of edges removed in (a)
from irregular pairs is
≤ ∑_{(i,j) : (V_i,V_j) not ε/4-regular} |V_i||V_j| ≤ (ε/4) n².

The number of edges removed in (b) from low-density pairs is
≤ ∑_{(i,j) : d(V_i,V_j) < ε/2} d(V_i, V_j)|V_i||V_j| ≤ (ε/2) ∑_{i,j} |V_i||V_j| = (ε/2) n².
The number of edges removed in (c) with an endpoint in a small part is
< m · (εn/(4m)) · n = (ε/4) n².
In total, we removed < εn² edges from the graph.
We claim that the remaining graph is triangle-free, provided that δ was chosen appropriately
small. Indeed, suppose there remains a triangle whose three vertices lie in Vi, Vj , Vk (not necessarily
distinct parts).
[Figure: a single surviving triangle across V_i, V_j, V_k yields cubically many triangles, via the triangle counting lemma.]

Because edges between the pairs described in (a) and (b) were removed, V_i, V_j, V_k satisfy the
hypotheses of the triangle counting lemma (Theorem 2.2.1), so
#{triangles in V_i × V_j × V_k} ≥ (1 − ε/2)(ε/4)³ |V_i||V_j||V_k| ≥ (1 − ε/2)(ε/4)³ (εn/(4m))³,
where the final step uses (c) above. Then as long as
δ < (1/6)(1 − ε/2)(ε/4)³ (ε/(4m))³,
we would contradict the hypothesis that the original graph has < δn³ triangles (the extra factor of
6 above is there to account for the possibility that V_i = V_j = V_k). Since m is bounded for each fixed
ε, we see that δ can be chosen to depend only on ε. □

The next corollary of the triangle removal lemma will soon be used to prove Roth’s theorem.

Corollary 2.3.3. Let G be an n-vertex graph where every edge lies in a unique triangle. Then G
has o(n2 ) edges.
Proof. Let G have m edges. Because each edge lies in exactly one triangle, the number of triangles in
G is m/3 = O(n2 ) = o(n3 ). By the triangle removal lemma (see the statement after Theorem 2.3.1),
we can remove o(n²) edges to make G triangle-free. However, deleting an edge removes at most one
triangle from the graph by assumption, so at least m/3 edges must be removed to make G triangle-free.
Thus m = o(n²). □

Remark 2.3.4 (Quantitative dependencies in the triangle removal lemma). Since the above proof
of the triangle removal lemma applies the graph regularity lemma, the resulting bounds from
the proof are quite poor: it shows that one can pick δ = 1/tower(ε^{−O(1)}). Using a different but
related method, Fox (2011) proved the triangle removal lemma with a slightly better dependence
δ = 1/tower(O(log(1/ε))). In the other direction, we know that the triangle removal lemma does
not hold with δ = ε^{c log(1/ε)} for a sufficiently small constant c > 0. The construction comes from the
Behrend construction of large 3-AP-free sets that we will soon see in Section 2.5. Our knowledge of
the quantitative dependence in Corollary 2.3.3 comes from the same source: specifically, we know
that the o(n²) can be sharpened to n²/e^{Ω(log∗ n)} (where log∗, the iterated logarithm function, is the
number of iterations of log that one needs to take to bring a number to at most 1), but the statement
is false if the o(n²) is replaced by n²e^{−C√(log n)} for some sufficiently large constant C. It is a major
open problem to close the gap between the upper and lower bounds in these problems.

The triangle removal lemma was historically first considered in the equivalent formulation.

Theorem 2.3.5 ((6, 3)-theorem). Let H be an n-vertex 3-uniform hypergraph without a subgraph
having 6 vertices and 3 edges. Then H has o(n2 ) edges.

Exercise 2.3.6. Deduce the (6, 3)-theorem from Corollary 2.3.3, and vice-versa.

The following influential conjecture due to Brown, Erdős, and Sós (1973) is a major open
problem in extremal combinatorics. It is a natural generalization of Theorem 2.3.5.

Conjecture 2.3.7 ((7, 4)-conjecture). Let H be an n-vertex 3-uniform hypergraph without a sub-
graph having 7 vertices and 4 edges. Then H has o(n2 ) edges.

2.4. Graph theoretic proof of Roth’s theorem


We will now prove Roth’s theorem, which we saw in Chapter 0 and restated below. The proof
below, due to Ruzsa and Szemerédi (1978), draws upon the connection between graph theory and
additive combinatorics that we first saw in the proof of Schur’s theorem in Chapter 0.
We write 3-AP for "3-term arithmetic progression", and say that A is 3-AP-free if there are no
x, x + y, x + 2y ∈ A with y ≠ 0.

Theorem 2.4.1 (Roth’s theorem). Let A ⊂ [N] be 3-AP-free. Then | A| = o(N).

Proof. Embed A into Z/MZ with M = 2N + 1 (to avoid wraparounds). Since A is 3-AP-free in Z, it
is 3-AP-free in Z/MZ as well.
Now, we construct a tripartite graph G whose parts X, Y, Z are all copies of Z/MZ. The edges
of the graph are (since M is odd, we are allowed to divide by 2 in Z/MZ):

• (x, y) ∈ X × Y whenever y − x ∈ A;
• (y, z) ∈ Y × Z whenever z − y ∈ A;
• (x, z) ∈ X × Z whenever (z − x)/2 ∈ A.


In this graph, (x, y, z) ∈ X × Y × Z is a triangle if and only if
y − x, (z − x)/2, z − y ∈ A.
The graph was designed so that the above three numbers form an arithmetic progression in the
listed order. Since A is 3-AP-free, these three numbers must all be equal. So every edge of G lies
in a unique triangle, formed by setting the three numbers above to be equal.
The graph G has exactly 3M = 6N + 3 vertices and 3M|A| edges. Corollary 2.3.3 implies that
G has o(N²) edges. So |A| = o(N). □
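The construction above is easy to experiment with. Here is a small Python sketch (illustrative only, not from the text) that builds this tripartite graph from a 3-AP-free set A inside Z/MZ and verifies that every X–Z edge lies in exactly one triangle; the particular set A and the value of N are arbitrary choices.

    N = 20
    M = 2 * N + 1
    A = {1, 3, 4, 9, 10, 12}            # a small 3-AP-free set, viewed inside Z/MZ
    half = pow(2, -1, M)                # the inverse of 2 mod M (M is odd; needs Python 3.8+)

    XY = {(x, y) for x in range(M) for y in range(M) if (y - x) % M in A}
    YZ = {(y, z) for y in range(M) for z in range(M) if (z - y) % M in A}
    XZ = {(x, z) for x in range(M) for z in range(M) if (z - x) * half % M in A}

    # Every X-Z edge should lie in exactly one triangle (x, y, z).
    unique = all(sum(1 for y in range(M) if (x, y) in XY and (y, z) in YZ) == 1
                 for (x, z) in XZ)
    print(unique)                       # expected output: True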
Now we prove a higher dimensional generalization of Roth’s theorem.
A corner in Z2 is a three-element set of the form {(x, y), (x + d, y), (x, y + d)} with d > 0.

(Note that one could relax the assumption d > 0 to d ≠ 0, allowing "negative" corners. As shown
in the first step of the proof below, the assumption d > 0 is inconsequential.)

Theorem 2.4.2. Every corner-free subset of [N]2 has size o(N 2 ).

This theorem is due to Ajtai and Szemerédi (1974), who originally proved it by invoking the full power
of Szemerédi’s theorem. Here we present a much simpler proof using the triangle removal lemma
due to Solymosi (2003).
Proof. First we show how to relax the assumption in the definition of a corner from d > 0 to d ≠ 0.
Let A ⊂ [N]² be a corner-free set. For each z ∈ Z², let A_z = A ∩ (z − A). Then |A_z| is the
number of ways that one can write z = a + b for some (a, b) ∈ A × A. So ∑_{z∈[2N]²} |A_z| = |A|², and hence
there is some z ∈ [2N]² with |A_z| ≥ |A|²/(2N)². To show that |A| = o(N²), it suffices to show that
|A_z| = o(N²). Moreover, since A_z ⊂ A is corner-free and A_z = z − A_z, the set A_z does not contain three
points {(x, y), (x + d, y), (x, y + d)} with d ≠ 0.
Write A = Az from now on. Build a tripartite graph G with parts X = {x1, . . . , xN }, Y =
{y1, . . . , yN } and Z = {z1, . . . , z2N }, where each vertex xi corresponds to a vertical line {x = i} ⊂
Z2 , each vertex y j corresponds to a horizontal line {y = j}, and each vertex z k corresponds to a
slanted line {y = −x + k} with slope −1. Join two distinct vertices of G with an edge if and only
if the corresponding lines intersect at a point belonging to A. Then, each triangle in the graph G

corresponds to a set of three lines of slopes 0, ∞, −1 pairwise intersecting at a point of A.



Since A is corner-free in the sense stated at the end of the previous paragraph, xi , y j , z k form a
triangle in G if and only if the three corresponding lines pass through the same point of A (i.e.,
forming a trivial corner with d = 0). Since there is exactly one line of each direction passing
through every point of A, it follows that each edge of G belongs to exactly one triangle. Thus, by
Corollary 2.3.3, 3 | A| = e(G) = o(N 2 ). 
The upper bound on corner-free sets actually implies Roth’s theorem, as shown below. So we
now have a second proof of Roth’s theorem (though, this second proof is secretly the same as the
first proof).

Proposition 2.4.3 (Corner-free sets versus 3-AP-free sets). Let r3 (N) be the size of the largest
subset of [N] which contains no 3-term arithmetic progression, and rx (N) be the size of the largest
subset of [N]² which contains no corner. Then r_3(N) · N ≤ r_x(2N).


Proof. Given a 3-AP-free set A ⊂ [N] of size r_3(N), define a set
B := {(x, y) ∈ [2N]² : x − y ∈ A}.
Each element a ∈ A gives rise to ≥ N different elements (x, y) of B with x − y = a. So
|B| ≥ N|A|. Furthermore, B is corner-free, since each corner (x + d, y), (x, y), (x, y + d) in B gives
rise to a 3-AP x − y − d, x − y, x − y + d with common difference d. So |B| ≤ r_x(2N). Thus
r_3(N)N ≤ |A|N ≤ |B| ≤ r_x(2N). □
Remark 2.4.4 (Quantitative bounds). Both proofs above rely on the graph regularity lemma, and
hence give poor quantitative bounds. They tell us that a 3-AP-free A ⊂ [N] has | A| ≤ N/(log∗ N)c ,
where log∗ N is the iterated logarithm (the number of times the logarithm function must be applied
to bring N to less than or equal to 1). Later in Chapter 6 we discuss Roth’s original Fourier analytic
proof, which uses different methods (though sharing the structure versus randomness dichotomy
theme) and gives much better quantitative bounds.
The current best upper bound on the size of corner-free subsets of [N]2 is N 2 /(log log N)c for
some constant c > 0, proved by Shkredov (2006).

Exercise 2.4.5∗ (Arithmetic triangle removal lemma). Show that for every ε > 0, there exists δ > 0
such that if A ⊂ [n] has fewer than δn² many triples (x, y, z) ∈ A³ with x + y = z, then there is some
B ⊂ A with |A \ B| ≤ εn such that B is sum-free, i.e., there do not exist x, y, z ∈ B with x + y = z.

2.5. Constructing large 3-AP-free sets


How can we construct a large subset of [N] without a 3-term arithmetic progression (3-AP)?
We can do it greedily. Starting with 0 (which produces a nicer pattern), we successively put
in each positive integer if adding it does not create a 3-AP with the already chosen integers. This
would produce the following sequence:
0 1 3 4 9 10 12 13 27 28 30 31 ··· .
The above sequence is known as a Stanley sequence. It consists of all nonnegative integers whose
ternary representations have only the digits 0 and 1 (why?). Up to N = 3^k, the subset A ⊂ [N] so
constructed has size |A| = 2^k = N^{log_3 2}.
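Here is a short Python sketch of this greedy procedure (for illustration; not part of the text), which reproduces the sequence above.

    def greedy_3ap_free(limit):
        """Greedily add 0, 1, 2, ... while keeping the set free of 3-term arithmetic progressions."""
        chosen, S = [], set()
        for n in range(limit):
            # Since n exceeds everything chosen so far, n can only be the largest term of a 3-AP,
            # i.e. we must avoid having some chosen x whose midpoint (x + n) / 2 is also chosen.
            if not any((x + n) % 2 == 0 and (x + n) // 2 in S for x in chosen):
                chosen.append(n)
                S.add(n)
        return chosen

    print(greedy_3ap_free(32))   # [0, 1, 3, 4, 9, 10, 12, 13, 27, 28, 30, 31]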
For quite some time, people thought the above example was close to the optimal. However,
Salem and Spencer (1942) showed that one can actually construct a 3-AP-free subset of [N]
of size N 1−o(1) . Their result was further improved by Behrend (1946), whose construction we
present below. Behrend’s construction plays a central role and has been widely used. It has some
surprising applications, including the design of fast matrix multiplication algorithms (Coppersmith
and Winograd 1990). It has not yet been substantially improved (see Elkin (2011) and Green and
Wolf (2010) for some improvements on the constant C below).

Theorem 2.5.1 (Behrend's construction). There exists a constant C > 0 such that for every positive
integer N, there exists a 3-AP-free A ⊂ [N] with |A| > Ne^{−C√(log N)}.
The rough idea is to choose a high-dimensional sphere containing many lattice points (this
can be done using the pigeonhole principle). The sphere contains no 3-AP due to convexity. We
then project these lattice points onto Z in a way that creates no additional 3-APs. This is done by
treating the coordinates as the digit expansion of an integer in some large base.

Proof. Let m and d be two positive integers depending on N, to be specified later. Consider the
lattice points of X = {0, 1, . . . , m − 1}^d that lie on a sphere of radius √L:
X_L := {(x_1, . . . , x_d) ∈ X : x_1² + · · · + x_d² = L}.
Then X \ {0} = ⋃_{L=1}^{dm²} X_L. So by the pigeonhole principle, there exists an L ∈ [dm²] such that
|X_L| ≥ (m^d − 1)/(dm²). Define the base-2m digit expansion
φ(x_1, . . . , x_d) := ∑_{i=1}^{d} x_i (2m)^{i−1}.
Then φ is injective on X. Furthermore, x, y, z ∈ [m]^d satisfy x + z = 2y if and only if φ(x) + φ(z) =
2φ(y) (basically, there are no wraparounds in base 2m if all the digits are less than m). Since X_L
is a subset of a sphere, it is 3-AP-free. Thus φ(X_L) ⊂ [(2m)^d] is a 3-AP-free set of size ≥ (m^d − 1)/(dm²).
We can optimize the parameters and take m = ⌊e^{√(log N)}/2⌋ and d = ⌊√(log N)⌋, thereby producing a
3-AP-free subset of [N] of size ≥ Ne^{−C√(log N)}, where C is some absolute constant. □
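Here is a small Python sketch of Behrend's construction (illustrative only; the values of m and d below are hard-coded rather than optimized as in the proof).

    from itertools import product

    def behrend(m, d):
        """Return a 3-AP-free subset of [(2m)^d] via lattice points on a sphere."""
        # Group the points of {0, ..., m-1}^d by squared length and keep a largest sphere.
        spheres = {}
        for x in product(range(m), repeat=d):
            spheres.setdefault(sum(c * c for c in x), []).append(x)
        best = max((pts for L, pts in spheres.items() if L > 0), key=len)
        # Project to integers by reading coordinates as digits in base 2m (no carries occur).
        return sorted(sum(c * (2 * m) ** i for i, c in enumerate(x)) for x in best)

    A = behrend(m=5, d=3)        # a 3-AP-free subset of [1000]
    assert all(2 * y != x + z for x in A for y in A for z in A if x < y < z)
    print(len(A), max(A))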

The Behrend construction also implies lower bound constructions for the other problems we
saw earlier. For example, since we used Corollary 2.3.3 to deduce an upper bound on the size of
3-AP-free set, turning this implication around, we see that having a large 3-AP-free set implies a
quantitative limitation on Corollary 2.3.3. Let us spell it out here.

Corollary 2.5.2. For every n ≥ 3, there is some n-vertex graph with at least n²e^{−C√(log n)} edges
where every edge lies in a unique triangle. Here C is some absolute constant.
Proof. In the proof of Theorem 2.4.1, starting from a 3-AP-free set A ⊂ [N], we constructed a
graph with 6N + 3 vertices and (6N + 3)|A| edges such that every edge lies in a unique triangle.
Choosing N = ⌊(n − 3)/6⌋ and letting A be the Behrend construction of Theorem 2.5.1 with
|A| ≥ Ne^{−C√(log N)}, we obtain the desired graph. □
The same graph construction also shows, after examining the proof of Corollary 2.3.3, that in
the triangle removal lemma, Theorem 2.3.1, one cannot take δ = e^{−c(log(1/ε))²} if the constant c > 0
is too small.
In Proposition 2.4.3 we deduced an upper bound r_3(N)N ≤ r_x(2N) on corner-free sets using
3-AP-free sets. The Behrend construction then also gives a corner-free subset of [N]² of size
≥ N²e^{−C√(log N)}.

2.6. Graph counting and removal lemmas


In this section, we generalize the triangle counting lemma from triangles to other graphs, which
can then be combined with the graph regularity lemma to produce additional applications.
Let us first illustrate the technique for K4 . Similar to the triangle counting lemma, we embed
the vertices of K4 one at a time, and at each stage ensure that many eligible vertices remain for the
yet to be embedded vertices.

Proposition 2.6.1 (K_4 counting lemma). Let 0 < ε < 1. Let X_1, . . . , X_4 be vertex subsets of a graph
G such that (X_i, X_j) is ε-regular with edge density d_{ij} := d(X_i, X_j) ≥ 3√ε for each pair i < j. Then
|{(x_1, x_2, x_3, x_4) ∈ X_1 × X_2 × X_3 × X_4 : x_1x_2x_3x_4 is a clique in G}|
≥ (1 − 3ε)(d_{12} − 3ε)(d_{13} − ε)(d_{14} − ε)(d_{23} − ε)(d_{24} − ε)(d_{34} − ε) |X_1||X_2||X_3||X_4|.

Proof. We repeatedly apply the following statement, which is a simple consequence of the definition
of ε-regularity (and a small extension of Lemma 2.2.3):

Given an ε-regular pair (X, Y), and B ⊂ Y with |B| ≥ ε|Y|, the number of vertices
in X with < (d(X, Y) − ε)|B| neighbors in B is < ε|X|.
The number of vertices in X_1 with ≥ (d_{1i} − ε)|X_i| neighbors in X_i for each i = 2, 3, 4 is ≥
(1 − 3ε)|X_1|. Fix a choice of such an x_1 ∈ X_1. For each i = 2, 3, 4, let Y_i be the set of neighbors of x_1 in
X_i, so that |Y_i| ≥ (d_{1i} − ε)|X_i|.
The number of vertices in Y_2 with ≥ (d_{2i} − ε)|Y_i| neighbors in Y_i for each i = 3, 4 is
≥ |Y_2| − 2ε|X_2| ≥ (d_{12} − 3ε)|X_2|. Fix a choice of such an x_2 ∈ Y_2. For each i = 3, 4, let Z_i be the
set of neighbors of x_2 in Y_i.
For each i = 3, 4, |Z_i| ≥ (d_{1i} − ε)(d_{2i} − ε)|X_i| ≥ ε|X_i|, and so
e(Z_3, Z_4) ≥ (d_{34} − ε)|Z_3||Z_4|
≥ (d_{34} − ε) · (d_{13} − ε)(d_{23} − ε)|X_3| · (d_{14} − ε)(d_{24} − ε)|X_4|.
Any edge between Z_3 and Z_4 forms a K_4 together with x_1 and x_2. Multiplying the above quantity
with the earlier lower bounds on the number of choices of x_1 and x_2 gives the result. □

The same strategy works more generally for counting any graph. To find copies of H, we embed
vertices of H one at a time.

Theorem 2.6.2 (Graph counting lemma). For every graph H and real δ > 0, there exists an ε > 0
such that the following is true.
Let G be a graph, and X_i ⊂ V(G) for each i ∈ V(H) such that for each ij ∈ E(H), (X_i, X_j) is an
ε-regular pair with edge density d_{ij} := d(X_i, X_j) ≥ δ. Then the number of graph homomorphisms
H → G where each i ∈ V(H) is mapped to X_i is
≥ (1 − δ) ∏_{ij∈E(H)} (d_{ij} − δ) ∏_{i∈V(H)} |X_i|.

Remark 2.6.3. (a) For a fixed H, as |Xi | → ∞ for each i, all but a negligible fraction of such
homomorphisms from H are injective (i.e., yielding a copy of H as a subgraph).
(b) It is useful (and in fact equivalent) to think about the setting where G is a multipartite graph
with parts Xi , as illustrated below.


In the multipartite setting, we see that the graph counting lemma can be adapted to variants such as
counting induced copies of H. Indeed, induced copies of H correspond to embedding a v(H)-clique
in an auxiliary graph G′ obtained by replacing the bipartite graph in G between X_i and X_j by its
complementary bipartite graph between X_i and X_j for each ij ∉ E(H).

(c) We will see a different proof in Section 4.5. There, instead of embedding H one vertex at a
time, we consider what happens to the H-count when we remove an edge from H.
We will prove the following stronger statement, which has the additional advantage that one
can choose the regularity parameter  to depend on the maximum degree of H rather than H itself.

Theorem 2.6.4 (Graph counting lemma). Let H be a graph with maximum degree ∆ ≥ 1 and c(H)
connected components. Let ε > 0. Let G be a graph, and X_i ⊂ V(G) for each i ∈ V(H) such that,
for each ij ∈ E(H), (X_i, X_j) is an ε-regular pair with edge density d_{ij} := d(X_i, X_j) ≥ (∆ + 1)ε^{1/∆}.
Then the number of graph homomorphisms H → G where each i ∈ V(H) is mapped to X_i is
≥ (1 − ∆ε)^{c(H)} ∏_{ij∈E(H)} (d_{ij} − ∆ε^{1/∆}) · ∏_{i∈V(H)} |X_i|.

Furthermore, if |X_i| ≥ v(H)/ε for each i, then there exists such a homomorphism H → G that is
injective (i.e., an embedding of H as a subgraph).
Proof. Let us order and label the vertices of H by 1, . . . , v(H) arbitrarily. We will select vertices
x_1 ∈ X_1, x_2 ∈ X_2, . . . in order. The idea is to always make sure that they have enough neighbors in
G so that there are many ways to continue the embedding of H. We say that a partial embedding
x_1, . . . , x_{s−1} (here partial embedding means that x_ix_j ∈ E(G) whenever ij ∈ E(H) for all the x_i's
chosen so far) is abundant if for each j ≥ s, the number of valid extensions x_j ∈ X_j (meaning that
x_ix_j ∈ E(G) whenever i < s and ij ∈ E(H)) is ≥ |X_j| ∏_{i<s: ij∈E(H)} (d_{ij} − ε).
For each s = 1, 2, . . . , v(H) in order, suppose we have already fixed an abundant partial embedding
x_1, . . . , x_{s−1}. For each j ≥ s, let
Y_j = {x_j ∈ X_j : x_ix_j ∈ E(G) whenever i < s and ij ∈ E(H)}
be the set of valid extensions of the j-th vertex in X_j given the partial embedding of x_1, . . . , x_{s−1},
so that the abundance hypothesis gives
|Y_j| ≥ |X_j| ∏_{i<s: ij∈E(H)} (d_{ij} − ε) ≥ (ε^{1/∆})^{|{i<s: ij∈E(H)}|} |X_j| ≥ ε|X_j|.
Thus, as in the proof of Proposition 2.6.1 for K_4, the number of choices of x_s ∈ X_s that would extend
x_1, . . . , x_{s−1} to an abundant partial embedding is
≥ |Y_s| − |{i > s : si ∈ E(H)}| ε|X_s|
≥ |X_s| ∏_{i<s: is∈E(H)} (d_{is} − ε) − |{i > s : si ∈ E(H)}| ε|X_s|.   (†)

If none of 1, . . . , s − 1 is a neighbor of s in H, then the first term in (†) is |X_s|, and so
(†) ≥ (1 − ∆ε)|X_s|.
Otherwise we can absorb the second term into the product and obtain
(†) ≥ |X_s| ∏_{i<s: is∈E(H)} (d_{is} − ε) − (∆ − 1)ε|X_s| ≥ |X_s| ∏_{i<s: is∈E(H)} (d_{is} − ∆ε^{1/∆}).
Fix such a choice of x_s. And now we move on to embedding the next vertex x_{s+1}.
Multiplying together these lower bounds for the number of choices of each x_s over all s =
1, . . . , v(H), we obtain the lower bound on the number of homomorphisms H → G.
Finally, note that in both cases (†) ≥ ε|X_s|, and so if |X_s| ≥ v(H)/ε, then (†) ≥ v(H), and so
we can choose each x_s to be distinct from the previously embedded vertices x_1, . . . , x_{s−1}, thereby
yielding an injective homomorphism. □
As an application, we have the following graph removal lemma, generalizing the triangle
removal lemma, Theorem 2.3.1. The proof is basically the same as Theorem 2.3.1 except with the
above graph counting lemma taking the role of the triangle counting lemma, so we will not repeat
the proof here.

Theorem 2.6.5 (Graph removal lemma). For every graph H and constant ε > 0, there exists a
constant δ = δ(H, ε) > 0 such that every n-vertex graph G with < δn^{v(H)} copies of H can be made
H-free by removing < εn² edges.
The next exercise asks you to show that, if H is bipartite, then one can prove the H-removal
lemma without using regularity, and thereby getting a much better bound.
Exercise 2.6.6 (Removal lemma for bipartite graphs with polynomial bounds). Prove that for every
bipartite graph H, there is a constant C such that for every ε > 0, every n-vertex graph with fewer
than ε^C n^{v(H)} copies of H can be made H-free by removing at most εn² edges.
As another application, let us give a different proof of the Erdős–Stone–Simonovits theorem.
We saw a proof earlier in Section 1.5 using supersaturation and the hypergraph KST theorem.
The proof below follows the partition–clean–count strategy in Remark 2.3.2 combined with an
application of Turán’s theorem. A common feature of many regularity applications is that they
“boost” an exact extremal graph theoretic result (e.g., Turán’s theorem) to an asymptotic result
involving more complex derived structures (e.g., from the existence of a copy of Kr to embedding
a complete r-partite graph).

Theorem 2.6.7 (Erdős–Stone–Simonovits theorem). Fix a graph H with at least one edge. Then
ex(n, H) = (1 − 1/(χ(H) − 1) + o(1)) (n choose 2).
Proof. Fix ε > 0. Let G be any n-vertex graph with at least (1 − 1/(χ(H) − 1) + ε)(n choose 2) edges. The theorem
is equivalent to the claim that for n = n(ε, H) sufficiently large, G contains H as a subgraph.
Apply the graph regularity lemma to obtain an η-regular partition V(G) = V_1 ∪ · · · ∪ V_m for
some sufficiently small η > 0 only depending on ε and H, to be decided later. Then the number m
of parts is also bounded for fixed H and ε.
Remove an edge (x, y) ∈ Vi × Vj if

(a) (V_i, V_j) is not η-regular, or
(b) d(V_i, V_j) < ε/8, or
(c) min{|V_i|, |V_j|} < εn/(8m).
Then, as in Theorem 2.3.1, the number of edges in (a) is ≤ ηn² ≤ εn²/8, the number of edges in
(b) is < εn²/8, and the number of edges in (c) is < m · (εn/(8m)) · n = εn²/8. Thus, the total number of
edges removed is ≤ (3ε/8)n². After removing all these edges, the resulting graph G′ still has
> (1 − 1/(χ(H) − 1) + ε/4) (n choose 2)
edges.
By Turán's theorem (Corollary 1.2.5), G′ contains a copy of K_{χ(H)}. Suppose that the χ(H)
vertices of this K_{χ(H)} land in V_{i_1}, . . . , V_{i_{χ(H)}} (allowing repeated indices). Since each pair of these
sets is η-regular, has edge density ≥ ε/8, and each has size ≥ εn/(8m), applying the graph counting
lemma, Theorem 2.6.2, we see that as long as η is sufficiently small in terms of ε and H, and n is
sufficiently large, there exists an injective embedding of H into G′ where the vertices of H in the
r-th color class are mapped into V_{i_r}. So G contains H as a subgraph. □

2.7. Exercises on applying graph regularity


The regularity method can be difficult at first to grasp conceptually. The following exercises
are useful for gaining familiarity in applying the regularity lemma. For these exercises, you are
welcome to use the equitable form of the graph regularity lemma (Theorem 2.1.17).
Exercise 2.7.1 (Ramsey–Turán).
(a) Show that for every ε > 0, there exists δ > 0 such that every n-vertex K_4-free graph with at
least (1/8 + ε)n² edges contains an independent set of size at least δn.
(b) Show that for every ε > 0, there exists δ > 0 such that every n-vertex K_4-free graph with at
least (1/8 − δ)n² edges and independence number at most δn can be made bipartite by removing
at most εn² edges.
Exercise 2.7.2. Show that the number of n-vertex triangle-free graphs is 2^{(1/4+o(1))n²}.
Exercise 2.7.3. Show that for every H and ε > 0 there exists δ > 0 such that every graph on n
vertices without an induced copy of H contains an induced subgraph on at least δn vertices whose
edge density is at most ε or at least 1 − ε.
Exercise 2.7.4 (Ramsey numbers of bounded degree graphs). Show that for every ∆ there exists
a constant C∆ so that if H is a graph with maximum degree at most ∆, then every 2-edge-coloring
of a complete graph on at least C∆ v(H) vertices contains a monochromatic copy of H.
Exercise 2.7.5 (Induced Ramsey). Show that for every graph H there is some graph G such that if
the edges of G are colored with two colors, then some induced subgraph of G is a monochromatic
copy of H.

Exercise 2.7.6. Show that for every α > 0, there exists β > 0 such that every graph on n vertices
with at least αn2 edges contains a d-regular subgraph for some d ≥ βn (here d-regular refers to
every vertex having degree d).

2.8. Induced graph removal and strong regularity


Recall from the beginning of Chapter 0 that H is an induced subgraph of G if one can obtain
H from G by deleting vertices from G (but you are not allowed to simply remove edges from G).
We say that G is induced H-free if G contains no induced subgraph isomorphic to H.
The following removal lemma for induced graphs is due to Alon, Fischer, Krivelevich, and
Szegedy (2000).

Theorem 2.8.1 (Induced graph removal lemma). For any graph H and ε > 0, there exists δ > 0
such that if an n-vertex graph has fewer than δn^{v(H)} induced copies of H, then it can be made induced H-free
by adding and/or deleting fewer than εn² edges.
Remark 2.8.2. Given two graphs on the same vertex set, the minimum number of edges that one
needs to add/delete to obtain the second graph from the first graph is called the edit distance
between the two graphs. The induced graph removal lemma can be rephrased as saying that every
graph with few induced copies of H is close in edit distance to an induced-H-free graph.
Unlike the previous graph removal lemma, for the induced version, it is important that we allow
both adding and deleting edges. The statement would be false if we only allow edge deletion but
not addition. For example, suppose G = K_n \ K_3, i.e., a complete graph on n vertices with the three
edges of a single triangle removed. If H is the empty graph on three vertices, then G has exactly
one induced copy of H, but G cannot be made induced-H-free by only deleting edges.
To see why the earlier proof of the graph removal lemma (Theorem 2.6.5) does not apply in a
straightforward way to prove the induced graph removal lemma, let us attempt to follow the earlier
strategy and see where things go wrong.
First we apply the graph regularity lemma. Then we need to clean up the graph. In the induced
graph removal lemma, edges and non-edges play symmetric roles (alternatively, we can rephrase
the problem in terms of red/blue edge-coloring of cliques). We can handle low density pairs (edge
density less than ε) by removing edges between such pairs. Naturally, for the induced graph removal
lemma, we also need to handle high density pairs (density more than 1 − ε), and we can add all
the edges between such pairs. However, it is not clear what to do with irregular pairs. Earlier, we
just removed all edges between irregular pairs. The problem is that this may create many induced
copies of H that were not present previously, e.g., below. Likewise, we cannot simply add all edges
between irregular pairs.
[Figure: deleting all edges between an irregular pair (V_1, V_2) may create many new induced copies of H.]
Perhaps we can always find a regularity partition without irregular pairs? Unfortunately, this is
false, as shown in Exercise 2.1.21. One must allow for the possibility of irregular pairs.

We will iterate the regularity partitioning lemma to obtain a stronger form of the regularity
lemma. Recall the energy q(P) of a partition (Definition 2.1.8) as the mean-squared edge density
between parts.

Theorem 2.8.3 (Strong regularity lemma). For any sequence of constants ε_0 ≥ ε_1 ≥ ε_2 ≥ · · · > 0,
there exists an integer M so that every graph has two vertex partitions P and Q so that
(a) Q refines P,
(b) P is ε_0-regular and Q is ε_{|P|}-regular,
(c) q(Q) ≤ q(P) + ε_0, and
(d) |Q| ≤ M.
Remark 2.8.4. One should think of the sequence ε_1, ε_2, . . . as rapidly decreasing. This strong regularity
lemma outputs a refining pair of partitions P and Q such that P is regular, Q is extremely
regular, and P and Q are close to each other (as captured by q(P) ≤ q(Q) ≤ q(P) + ε_0; see
Lemma 2.8.7 below). A key point here is that we demand Q to be extremely regular relative to the
number of parts of P. The more parts P has, the more regular Q should be.
Proof. We repeatedly apply the following version of Szemerédi's regularity lemma (Theorem 2.1.16):
For all ε > 0, there exists an integer M_0 = M_0(ε) so that for all partitions P of
V(G), there exists a refinement P′ of P with each part of P refined into ≤ M_0
parts so that P′ is ε-regular.
By iteratively applying the above regularity partition, we obtain a sequence of partitions
P_0, P_1, . . . of V(G) starting with P_0 = {V(G)} being the trivial partition. Each P_{i+1} is ε_{|P_i|}-regular
and refines P_i. The regularity lemma guarantees that we can have |P_{i+1}| ≤ |P_i| M_0(ε_{|P_i|}).
Since 0 ≤ q(·) ≤ 1, there exists i ≤ ε_0^{−1} so that q(P_{i+1}) ≤ q(P_i) + ε_0. Then setting P = P_i
and Q = P_{i+1} satisfies the desired requirements. Indeed, the number of parts of Q is bounded by
a function of the sequence (ε_0, ε_1, . . . ), since there are a bounded number of iterations, and each
iteration produced a refining partition with a bounded number of parts. □
Remark 2.8.5 (Bounds in the strong regularity lemma). The bound on M produced by the proof
depends on the sequence (ε_0, ε_1, . . . ). In the application below, we use ε_i = ε_0/poly(i). Then the
size of M is comparable to applying M_0 to ε_0 in succession 1/ε_0 times. Note that M_0 is a tower
function, and this makes M a tower function iterated about 1/ε_0 times. This iterated tower function is called
the wowzer function: wowzer(k) = tower(tower(· · · (tower(k)) · · · )) (with k applications of tower).
The wowzer function is one step up from tower in the Ackermann hierarchy.
Remark 2.8.6. We can in fact further guarantee equitability of parts. This can be done by adapting
the ideas sketched in the proof sketch of Theorem 2.1.17.
The following lemma explains the significance of the inequality q(Q) ≤ q(P) + ε from earlier.
Lemma 2.8.7. Let P and Q both be vertex partitions of a graph G, with Q refining P. For each
x ∈ V(G), write V_x for the part of P that x lies in and W_x for the part of Q that x lies in. If
q(Q) ≤ q(P) + ε³, then |d(V_x, V_y) − d(W_x, W_y)| ≤ ε for all but εn² pairs (x, y) ∈ V(G)².
Proof. Let x, y ∈ V(G) be chosen uniformly at random. As in the proof of Lemma 2.1.9, we have
q(P) = E[ZP2 ], where ZP = d(Vx, Vy ). Likewise, q(Q) = E[ZQ2 ], where ZQ = d(W x, W y ).
We have
q(Q) − q(P) = E[ZQ2 ] − E[ZP2 ] = E[(ZQ − ZP )2 ],

where the final step above is a “Pythagorean identity.”



Indeed, the identity E[Z_Q²] − E[Z_P²] = E[(Z_Q − Z_P)²] is equivalent to E[Z_P(Z_Q − Z_P)] = 0, which
is true since, as x and y each vary over their own parts of P, the expression Z_Q − Z_P averages to
zero.
So q(Q) ≤ q(P) + ε³ is equivalent to E[(Z_Q − Z_P)²] ≤ ε³, which in turn implies, by Markov's
inequality, that P(|Z_Q − Z_P| > ε) ≤ ε, which is the same as the desired conclusion. □
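For readers who would like the averaging step spelled out, here is the computation behind the orthogonality claim (a routine verification added for clarity):
E[Z_P(Z_Q − Z_P)] = E[ Z_P · E[Z_Q − Z_P | V_x, V_y] ] = E[ Z_P (d(V_x, V_y) − Z_P) ] = 0,
since, conditioned on the parts V_x and V_y, the vertex x is uniform in V_x and y is uniform in V_y, so
E[Z_Q | V_x, V_y] = ∑_{W∈Q: W⊂V_x} ∑_{W′∈Q: W′⊂V_y} (|W||W′|/(|V_x||V_y|)) d(W, W′) = e(V_x, V_y)/(|V_x||V_y|) = d(V_x, V_y) = Z_P.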
Remark 2.8.8. Conversely, if |d(V_x, V_y) − d(W_x, W_y)| ≤ ε for all but εn² pairs (x, y) ∈ V(G)², then
q(Q) ≤ q(P) + 2ε (Exercise!).
We now deduce the following form of the strong regularity lemma, which considers only select
subsets of vertex parts but does not require irregular pairs.

Theorem 2.8.9 (Strong regularity lemma). For any sequence of constants ε_0 ≥ ε_1 ≥ ε_2 ≥ · · · > 0,
there exists a constant δ > 0 so that every n-vertex graph has an equitable vertex partition
V_1 ∪ · · · ∪ V_k and a subset W_i ⊂ V_i for each i satisfying
(a) |W_i| ≥ δn,
(b) (W_i, W_j) is ε_k-regular for all 1 ≤ i ≤ j ≤ k, and
(c) |d(V_i, V_j) − d(W_i, W_j)| ≤ ε_0 for all but < ε_0 k² pairs (i, j) ∈ [k]².


Remark 2.8.10. It is significant that all (rather than nearly all) pairs (Wi, W j ) are regular. We will
need this fact in our applications below.
Proof sketch. Here we show how to prove a slightly weaker result where i ≤ j in (b) is replaced by i < j. In other words, this proof does not promise that each W_i is ε_k-regular with itself. To obtain the stronger conclusion as stated (requiring each W_i to be regular with itself), we can adapt the ideas in Exercise 2.1.23. We omit the details.
By decreasing the ε_i's if needed (we can do this since a smaller sequence of ε_i's yields a stronger conclusion), we may assume that ε_i ≤ 1/(10i^2) and ε_i ≤ ε_0/4 for every i ≥ 1.
Let us apply the strong regularity lemma, Theorem 2.8.3, with equitable partitions (see Remark 2.8.6 above). That is, we have (we make the simplifying assumption that all partitions are exactly equitable, to avoid unimportant technicalities):
• an equitable ε_0-regular partition P = {V_1, . . . , V_k} of V(G) and
• an equitable ε_k-regular partition Q refining P

satisfying
• q(Q) ≤ q(P) + ε_0^3/8, and
• |Q| ≤ M = M(ε_0, ε_1, . . . ).
Inside each part V_i, let us choose a part W_i of Q uniformly at random. Since |Q| ≤ M, the equitability assumption implies that each part of Q has size ≥ δn for some constant δ = δ(ε_0, ε_1, . . . ). So (a) is satisfied.
Since Q is ε_k-regular, all but an ε_k-fraction of pairs of parts of Q are ε_k-regular. Summing over all i < j and using linearity of expectation, the expected number of pairs (W_i, W_j) that are not ε_k-regular is ≤ ε_k k^2 ≤ 1/10. It follows that with probability ≥ 9/10, (W_i, W_j) is ε_k-regular for all i < j, so (b) is satisfied (in the simpler setting ignoring i = j, as mentioned earlier).
Let X denote the number of pairs (i, j) ∈ [k]^2 with |d(V_i, V_j) − d(W_i, W_j)| > ε_0. Since q(Q) ≤ q(P) + (ε_0/2)^3, by Lemma 2.8.7 and linearity of expectation, EX ≤ (ε_0/2)k^2. So by Markov's inequality, X ≤ ε_0 k^2 with probability ≥ 1/2, so that (c) is satisfied.
It follows that (b) and (c) are both satisfied simultaneously with probability ≥ 1 − 1/10 − 1/2 > 0. Therefore, there exist valid choices of the W_i's. □
Proof of the induced graph removal lemma (Theorem 2.8.1). As usual, we have the three steps in the regularity method recipe.
First, we apply Theorem 2.8.9 to obtain a partition V_1 ∪ · · · ∪ V_k of the vertex set of the graph, along with W_i ⊂ V_i for each i, so that the following hold.
(a) (W_i, W_j) is ε′-regular for every i ≤ j, with some sufficiently small constant ε′ > 0 depending on ε and H,
(b) |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8 for all but < εk^2/8 pairs (i, j) ∈ [k]^2, and
(c) |W_i| ≥ δ_0 n, for some constant δ_0 depending only on ε and H.
Next, we clean the graph as follows: for each pair i ≤ j (including i = j):
• if d(W_i, W_j) ≤ ε/8, then remove all edges between (V_i, V_j);
• if d(W_i, W_j) ≥ 1 − ε/8, then add all edges between (V_i, V_j).
Note that we are not simply adding/removing edges within each pair (W_i, W_j), but rather all of (V_i, V_j). To bound the number of edges added/deleted, recall (b) from the previous paragraph. If d(W_i, W_j) ≤ ε/8 and |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8, then d(V_i, V_j) ≤ ε/4, and the number of edges in all such (V_i, V_j) is at most εn^2/4. Likewise for d(W_i, W_j) ≥ 1 − ε/8. For the remaining < εk^2/8 pairs (i, j) not satisfying |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8, the total number of edges among all such pairs is at most εn^2/8. Altogether, we added/deleted < εn^2 edges from G. Call the resulting graph G′. There are no irregular pairs (W_i, W_j) for us to worry about.
It remains to show that G′ is induced-H-free. Suppose otherwise. Let us count induced copies of H in G as in the proof of the graph removal lemma, Theorem 2.6.5. We have some induced copy of H in G′, with each vertex v ∈ V(H) embedded in V_{φ(v)} for some φ : V(H) → [k].
Consider a pair of distinct vertices u, v of H. If uv ∈ E(H), there must be an edge in G′ between V_{φ(u)} and V_{φ(v)} (here φ(u) and φ(v) are not necessarily different). So we must not have deleted all the edges in G between V_{φ(u)} and V_{φ(v)} in the cleaning step. By the cleaning algorithm above, this means that d_G(W_{φ(u)}, W_{φ(v)}) > ε/8.
Likewise, if uv ∉ E(H) for a pair of distinct u, v ∈ V(H), we have d_G(W_{φ(u)}, W_{φ(v)}) < 1 − ε/8.
Since (W_i, W_j) is ε′-regular in G for every i ≤ j, provided that ε′ is small enough (in terms of ε and H), the graph counting lemma, Theorem 2.6.2 (with the induced variation as in Remark 2.6.3(b)), applied to G tells us that the number of induced copies of H in G is

$$\ge (1 - \varepsilon)(\varepsilon/10)^{\binom{v(H)}{2}} (\delta_0 n)^{v(H)}$$
(recall |W_i| ≥ δ_0 n). We are then done with $\delta = (1 - \varepsilon)(\varepsilon/10)^{\binom{v(H)}{2}} \delta_0^{v(H)}$, since the hypothesis says that G has < δn^{v(H)} induced copies of H, a contradiction. □

Finally, let us prove a graph removal lemma with an infinite number of forbidden induced subgraphs (Alon and Shapira 2008). Given a (possibly infinite) set ℋ of graphs, we say that G is induced-ℋ-free if G is induced-H-free for every H ∈ ℋ.

Theorem 2.8.11 (Infinite graph removal lemma). For each (possibly infinite) set of graphs ℋ and ε > 0, there exist h_0 and δ > 0 so that if G is an n-vertex graph with fewer than δn^{v(H)} induced copies of H for every H ∈ ℋ with at most h_0 vertices, then G can be made induced-ℋ-free by adding/removing fewer than εn^2 edges.

Remark 2.8.12. The presence of h0 may seem a bit strange at first. In the next section, we will see
a reformulation of this theorem in the language of property testing, where h0 comes up naturally.

Proof. The proof is mostly the same as the proof of the induced graph removal lemma that we just saw. The main (and only) tricky issue here is how to choose the regularity parameter ε′ for every pair (W_i, W_j) in condition (a) of the earlier proof. Previously, we did not use the full strength of Theorem 2.8.9, which allows ε′ to depend on k, but now we are going to use it. Recall that we had to make sure that this ε′ was chosen to be small enough for the H-counting lemma to work. Now that there are possibly infinitely many graphs in ℋ, we cannot naively choose ε′ to be sufficiently small. The main point of the proof is to reduce the problem to a finite subset of ℋ for each k.
Define a template T to be an edge-coloring of the looped k-clique (i.e., a complete graph on k vertices along with a loop at every vertex) where each edge is colored by one of {white, black, gray}. We say that a graph H is compatible with a template T if there exists a map φ : V(H) → V(T) such that for every distinct pair u, v of vertices of H:
• if uv ∈ E(H), then φ(u)φ(v) is colored black or gray in T; and
• if uv ∉ E(H), then φ(u)φ(v) is colored white or gray in T.
That is, a black edge in a template means an edge of H, a white edge means a non-edge of H, and
a gray edge is a wildcard (the most flexible). An example is shown below.

As another example, every graph is compatible with every completely gray template.
For every template T, pick some representative H_T ∈ ℋ compatible with T, as long as such a representative exists (and ignore T otherwise). A graph in ℋ is allowed to be the representative of more than one template. Let ℋ_k be the set of all H ∈ ℋ that arise as the representative of some k-vertex template. Note that ℋ_k is finite since there are finitely many k-vertex templates. We can pick each ε_k > 0 to be small enough so that the conclusion of the counting step later can be guaranteed for all elements of ℋ_k.

Now we proceed nearly identically as in the proof of the induced removal lemma, Theorem 2.8.1, that we just saw. In applying Theorem 2.8.9 to obtain the partition V_1 ∪ · · · ∪ V_k and finding W_i ⊂ V_i, we ensure the following condition instead of the earlier (a):
(a) (W_i, W_j) is ε_k-regular for every i ≤ j.
We set h_0 to be the maximum number of vertices of a graph in ℋ_k.
Now we do the cleaning step. Along the way, we create a k-vertex template T with vertex set [k] corresponding to the parts {V_1, . . . , V_k} of the partition. For each 1 ≤ i ≤ j ≤ k,
• if d(W_i, W_j) ≤ ε/4, then remove all edges between (V_i, V_j) from G, and color the edge ij in template T white;
• if d(W_i, W_j) ≥ 1 − ε/4, then add all edges between (V_i, V_j), and color the edge ij in template T black;
• otherwise, color the edge ij in template T gray.
Finally, suppose some induced copy of some H ∈ ℋ remains in G′. Due to our cleaning procedure, H must be compatible with the template T. Then the representative H_T ∈ ℋ_k of T is a graph on at most h_0 vertices, and furthermore, the counting lemma guarantees that, provided ε_k > 0 is small enough (a finite number of pre-chosen constraints, one for each element of ℋ_k), the number of induced copies of H_T in G is ≥ δn^{v(H_T)} for some constant δ > 0 that only depends on ε and ℋ, contradicting the hypothesis. □
All the techniques above work nearly verbatim for a generalization to colored graphs.

Theorem 2.8.13 (Infinite edge-colored graph removal lemma). For every ε > 0, r ∈ N, and a (possibly infinite) set ℋ of r-edge-colored graphs, there exist some h_0 and δ > 0 such that if G is an r-edge-coloring of the complete graph on n vertices with < δn^{v(H)} copies of H for every H ∈ ℋ with at most h_0 vertices, then G can be made ℋ-free by recoloring < εn^2 edges (using the same palette of r colors throughout).
The induced graph removal lemma corresponds to the special case r = 2, with the two colors
representing edges and non-edges respectively.

2.9. Graph property testing


We are given access to a very large graph, where we can sample vertices uniformly at random
and read out the edges between them. The graph may be too large for us to see every vertex or
edge. What can you infer about the graph under such a model?
For example, we will not be able to detect very small perturbations in the graph (e.g., we cannot
distinguish two graphs if they only differ on a small number of vertices or edges). We will also
need to have some error tolerance.
A graph property P is simply a set of isomorphism classes of graphs. The graph properties
that we usually encounter have some nice name and/or compact description, such as triangle-free,
planar, 3-colorable, etc.
We say that an n-vertex graph G is ε-far from property P if one cannot change G into a graph in P by adding/deleting εn^2 edges.
The following theorem gives a straightforward algorithm, with a probabilistic guarantee, on
testing triangle-freeness. It turns out to be essentially a rephrasing of the triangle removal lemma.

Theorem 2.9.1 (Triangle-freeness is testable). For all constants ε > 0, there exists a constant K = K(ε) so that the following algorithm satisfies the probabilistic guarantees below.
Algorithm. Input: a graph G.

Sample K vertices from G uniformly at random without replacement (if G has fewer than K vertices, then read the entire graph). If G has no triangles among these K vertices, then output that G is triangle-free; else output that G is ε-far from triangle-free.
(a) If the input graph G is triangle-free, then the algorithm always correctly outputs that G is triangle-free;
(b) If the input graph G is ε-far from triangle-free, then with probability ≥ 0.99 the algorithm outputs that G is ε-far from triangle-free;
(c) We do not make any guarantees when the input graph is neither triangle-free nor ε-far from triangle-free.
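To make the sampling procedure concrete, here is a minimal Python sketch of the tester; the adjacency-set input format, the function name, and the use of the standard random and itertools modules are illustrative choices and not prescribed by the theorem.

```python
import random
from itertools import combinations

def triangle_free_tester(adj, K, rng=random):
    """One-sided tester sketch: sample K vertices without replacement and look
    for a triangle among them.  `adj` maps each vertex to a set of neighbors
    (a hypothetical input format chosen for this illustration)."""
    vertices = list(adj)
    if len(vertices) <= K:
        sample = vertices                  # small graph: inspect all of it
    else:
        sample = rng.sample(vertices, K)
    for u, v, w in combinations(sample, 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            return "epsilon-far from triangle-free"   # a triangle was found
    return "triangle-free"
```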
Remark 2.9.2. This is an example of a one-sided tester, meaning that it always (non-probabilistically) outputs a correct answer when G satisfies property P and only has a probabilistic guarantee when G does not satisfy property P. In other words, it only has false positives and not false negatives. (In contrast, a two-sided tester would have probabilistic guarantees in both situations.)
For a one-sided tester, there is nothing special about the number 0.99 above in (b). It can be any positive constant δ > 0: if we run the algorithm m times, then the probability of success improves from ≥ δ to ≥ 1 − (1 − δ)^m, which can be made arbitrarily close to 1 if we choose m large enough.
Proof. If the graph G is triangle-free, the algorithm clearly always outputs correctly. On the other hand, if G is ε-far from triangle-free, then by the triangle removal lemma (Theorem 2.3.1), G has $\ge \delta\binom{n}{3}$ triangles for some constant δ = δ(ε) > 0. If we sample three vertices from G uniformly at random, then they form a triangle with probability ≥ δ. And if we run K/3 independent trials, then the probability that we see a triangle is ≥ 1 − (1 − δ)^{K/3}, which is ≥ 0.99 as long as K is a sufficiently large constant (depending on δ, which in turn depends on ε).
In the algorithm as stated in the theorem, K vertices are sampled without replacement, whereas above we had K/3 independent trials of picking a triple of vertices at random. But this difference hardly matters: we can couple the two processes by adding additional random vertices to the latter process until we see K distinct vertices. □
Just as the guarantee of the above algorithm is essentially a rephrasing of the triangle removal lemma, other graph removal lemmas can be rephrased as graph property testing theorems. For the infinite induced graph removal lemma, Theorem 2.8.11, we can rephrase the result in terms of graph property testing for hereditary properties.
A graph property P is hereditary if it is closed under vertex-deletion. That is, if G ∈ P, then every induced subgraph of G is in P. Many common examples of graph properties are hereditary, e.g., H-free, induced H-free, planar, 3-colorable, perfect. Every hereditary property P is the same as the set of induced-ℋ-free graphs for some (possibly infinite) family of graphs ℋ; e.g., we can take ℋ = {H : H ∉ P}.

Theorem 2.9.3 (Every hereditary graph property is testable). For every hereditary graph property P and constant ε > 0, there exists a constant K = K(P, ε) so that the following algorithm satisfies the probabilistic guarantees listed below.
Algorithm. Input: a graph G.
Sample K vertices from G uniformly at random without replacement and let H be the induced subgraph on these K vertices. If H ∈ P, then output that G satisfies P; else output that G is ε-far from P.
(a) If the input graph G satisfies P, then the algorithm always correctly outputs that G satisfies
P;

(b) If the input graph G is ε-far from P, then with probability ≥ 0.99 the algorithm outputs that G is ε-far from P;
(c) We do not make any guarantees when the input graph is neither in P nor ε-far from P.
Proof. If G ∈ P, then since P is hereditary, H ∈ P, and so the algorithm always correctly outputs that G ∈ P. So suppose G is ε-far from P. Let ℋ be such that P is the set of induced-ℋ-free graphs. By the infinite induced graph removal lemma, there are some h_0 and δ > 0 so that G has ≥ δn^{v(H)} induced copies of some H ∈ ℋ with at most h_0 vertices. So with probability ≥ δ, a sample of h_0 vertices sees an induced subgraph not satisfying P. Running K/h_0 independent trials, we see some induced subgraph not satisfying P with probability ≥ 1 − (1 − δ)^{K/h_0}, which can be made arbitrarily close to 1 by choosing K to be sufficiently large. As earlier, this implies the result about choosing K random vertices without replacement. □
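The same sampling template works for any hereditary property, given some way of deciding membership in P for small graphs. Here is a hedged Python sketch; the predicate in_property, the adjacency-set input format, and all names are hypothetical placeholders introduced purely for illustration.

```python
import random

def hereditary_property_tester(adj, K, in_property, rng=random):
    """Sketch of the tester from Theorem 2.9.3.  `in_property` is a hypothetical
    user-supplied predicate deciding whether a small graph (given as a dict of
    neighbor sets) belongs to the hereditary property P."""
    vertices = list(adj)
    sample = vertices if len(vertices) <= K else rng.sample(vertices, K)
    induced = {v: adj[v] & set(sample) for v in sample}  # induced subgraph on the sample
    return "satisfies P" if in_property(induced) else "epsilon-far from P"
```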

2.10. Hypergraph removal and Szemerédi’s theorem


We showed earlier how to deduce Roth's theorem from the triangle removal lemma. However, the graph removal lemma, or the graph regularity method more generally, is insufficient for understanding longer arithmetic progressions.
Szemerédi’s theorem follows as a corollary of a hypergraph generalization of the triangle
removal lemma. (Note that historically, Szemerédi’s theorem was initially shown using other
methods; see the discussion in Section 0.2). The hypergraph removal lemma turns out to be
substantially more difficult. The following theorem was proved by Rödl et al. (2005) and Gowers
(2007). The special case of the tetrahedron removal lemma in 3-graphs was proved earlier by Frankl
and Rödl (2002).

Theorem 2.10.1 (Hypergraph removal lemma). For every r-graph H and ε > 0, there exists δ > 0 so that every n-vertex r-graph with < δn^{v(H)} copies of H can be made H-free by removing < εn^r edges.
Recall Szemerédi’s theorem says that for every fixed k ≥ 3, every k-AP-free subset of [N]
has size o(N). We will prove it as a corollary of the hypergraph removal lemma for $H = K_k^{(k-1)}$,
the complete (k − 1)-graph on k vertices (also known as a simplex; when k = 3 it is called a
tetrahedron). For concreteness, we will show how the deduction works in the case k = 4 (it is
straightforward to generalize).
Here is a corollary of the tetrahedron removal lemma. It is analogous to Corollary 2.3.3.

Corollary 2.10.2. If G is a 3-graph such that every edge is contained in a unique tetrahedron (i.e.,
a clique on four vertices), then G has o(n3 ) edges.
Proof of Szemerédi’s theorem for 4-APs. Let A ⊂ [N] be 4-AP-free. Let M = 6N + 1. Then A is
also a 4-AP-free subset of Z/MZ (there are no wrap-arounds). Build a 4-partite 3-graph G with
parts W, X, Y , Z, all of which are M-vertex sets indexed by the elements of Z/MZ. We define
edges as follows, where w, x, y, z range over elements of W, X, Y , Z, respectively:
wxy ∈ E(G) ⟺ 3w + 2x + y ∈ A,
wxz ∈ E(G) ⟺ 2w + x − z ∈ A,
wyz ∈ E(G) ⟺ w − y − 2z ∈ A,
xyz ∈ E(G) ⟺ −x − 2y − 3z ∈ A.

What is important here is that the i-th expression does not contain the i-th variable.
The vertices w, x, y, z form a tetrahedron if and only if
3w + 2x + y, 2w + x − z, w − y − 2z, −x − 2y − 3z ∈ A.
However, these values form a 4-AP with common difference −x − y − z − w. Since A is 4-AP-free, the only tetrahedra in G correspond to trivial 4-APs (those with common difference zero). For each triple (w, x, y) ∈ W × X × Y, there is exactly one z ∈ Z/MZ such that x + y + z + w = 0. Thus, every edge of the hypergraph lies in exactly one tetrahedron.
By Corollary 2.10.2, the number of edges in the hypergraph is o(M^3). On the other hand, the number of edges is exactly 4M^2|A| (since, e.g., for every a ∈ A, there are exactly M^2 triples (w, x, y) ∈ (Z/MZ)^3 with 3w + 2x + y = a). Therefore |A| = o(M) = o(N). □
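The 4-AP identity behind this argument is a routine algebraic check; the following short Python script (purely illustrative, not part of the proof) verifies on random integer inputs that the four linear forms above always form a 4-term arithmetic progression with common difference −(w + x + y + z).

```python
import random

# Sanity check: the four forms 3w+2x+y, 2w+x-z, w-y-2z, -x-2y-3z are in
# arithmetic progression with common difference -(w + x + y + z).
for _ in range(1000):
    w, x, y, z = (random.randrange(-100, 100) for _ in range(4))
    vals = [3*w + 2*x + y, 2*w + x - z, w - y - 2*z, -x - 2*y - 3*z]
    d = -(w + x + y + z)
    assert all(vals[i + 1] - vals[i] == d for i in range(3))
```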
The hypergraph removal lemma is proved using a substantial and difficult generalization of the
graph regularity method to hypergraphs. We will not be able to prove it in this book. In the next
section, we sketch some key ideas in hypergraph regularity.
It is instructive to work out the proof in the special cases below.
For the next two exercises, you should assume Corollary 2.10.2.
Exercise 2.10.3 (3-dimensional corners). Suppose A ⊂ [N]3 contains no four points of the form
(x, y, z), (x + d, y, z), (x, y + d, z), (x, y, z + d), with d > 0.
Show that | A| = o(N 3 ).
Exercise 2.10.4 (Multidimensional Szemerédi for axis-aligned squares). Suppose A ⊂ [N]2 con-
tains no four points of the form
(x, y), (x + d, y), (x, y + d), (x + d, y + d), with d ≠ 0.
Show that | A| = o(N 2 ).
Try generalizing this technique to prove the multidimensional Szemerédi theorem (Theo-
rem 0.2.6) using the hypergraph removal lemma.
2.11. Hypergraph regularity
Hypergraph regularity is substantially more difficult to prove than graph regularity. We only
sketch some key ideas here. For concreteness, we focus our discussion on 3-graphs. Throughout
this section, G will be a 3-graph with vertex set V.
What should correspond to an “ε-regular pair” from the graph regularity lemma? Here is an
initial attempt.
Definition 2.11.1 (Initial attempt at 3-graph regularity). Given vertex subsets V_1, V_2, V_3 ⊂ V, we say that (V_1, V_2, V_3) is ε-regular if, for all A_i ⊂ V_i such that |A_i| ≥ ε|V_i|, we have
|d(V_1, V_2, V_3) − d(A_1, A_2, A_3)| ≤ ε.
Here, the edge density d(X, Y, Z) is the fraction of elements of X × Y × Z that are edges of G.
By following the proof of the graph regularity lemma nearly verbatim, we can show the
following.

Proposition 2.11.2 (Initial attempt at 3-graph regularity partition). For all ε > 0, there exists M = M(ε) such that every 3-graph has a vertex partition into at most M parts so that all but at most an ε-fraction of triples of vertices lie in ε-regular triples of vertex parts.

Can this result be used to prove the hypergraph removal lemma? Unfortunately, no.
Recall that our graph regularity recipe (Remark 2.3.2) involves three steps: partition, clean, and
count. It turns out that no counting lemma is possible for the above notion of 3-graph regularity.
The notion of ε-regularity is supposed to model pseudorandomness. So why don't we try
truly random hypergraphs and see what happens? Let us consider two different random 3-graph
constructions:
(1) First pick constants p, q ∈ [0, 1]. Build a random graph G^{(2)} = G(n, p), an ordinary Erdős–Rényi graph. Then construct G^{(3)} by including each triangle of G^{(2)} as an edge of G^{(3)} with probability q. Call this 3-graph X.
(2) For each possible edge (i.e., triple of vertices), include the edge with probability p^3 q, independently of all other edges. Call this 3-graph Y.
The edge densities of both X and Y are close to p^3 q, even when restricted to linearly sized triples of vertex subsets. So both 3-graphs satisfy our above notion of ε-regularity with high probability. However, we can compute the tetrahedron densities in both of these 3-graphs and see that they do not match.
The tetrahedron density in X is around q^4 times the K_4 density in the underlying random graph G^{(2)}. The K_4 density in G^{(2)} is around p^6. So the tetrahedron density in X is around p^6 q^4.
On the other hand, the tetrahedron density in Y is around (p^3 q)^4 = p^{12} q^4, different from the p^6 q^4 earlier. So we should not expect a counting lemma with this notion of ε-regularity. (Unless the 3-graph we are counting is linear, as in the exercise below.)
Exercise 2.11.3. Under the notion of 3-graph regularity in Definition 2.11.1, formulate and prove
an H-counting lemma for every linear 3-graph H. Here a hypergraph is said to be linear if every
pair of its edges intersects in at most one vertex.
As hinted by the first random hypergraph above, a more useful notion of hypergraph regularity should involve both vertex subsets as well as subsets of vertex-pairs (i.e., an underlying 2-graph). Given a 3-graph G, a regularity decomposition will consist of
(1) a partition of the pairs of vertices $\binom{V}{2}$ into 2-graphs $G^{(2)}_1 \cup \cdots \cup G^{(2)}_l$ so that G sits in a random-like way on top of most triples of these 2-graphs (we won't try to make this precise), and
(2) a partition of V that gives an extremely regular partition for all the 2-graphs $G^{(2)}_1, \ldots, G^{(2)}_l$ (this should be somewhat reminiscent of the strong graph regularity lemma from Section 2.8).
For such a decomposition to be applicable, it should come with a corresponding counting
lemma.
There are several ways to make the above notions precise. Certain formulations make the regularity partition easier to prove but the counting lemma harder, and vice versa. The
interested readers should consult Rödl et al. (2005), Gowers (2007) (see Gowers (2006) for an
exposition of the case of 3-uniform hypergraphs), and Tao (2006) for three different approaches to
the hypergraph regularity lemma.
Remark 2.11.4 (Quantitative bounds). Whereas the proof of the graph regularity lemma gives tower-type bounds tower(ε^{−O(1)}), the proof of the 3-graph regularity lemma has wowzer-type bounds. The 4-graph regularity lemma moves us one more step up in the Ackermann hierarchy, i.e., iterating wowzer, and so on. Just as with the tower-type lower bound (Theorem 2.1.15) for the graph regularity lemma, Ackermann-type bounds are necessary for hypergraph regularity as well (Moshkovitz and Shapira 2019).

Further reading
For surveys on the graph regularity method and applications, see Komlós and Simonovits (1996)
and Komlós, Shokoufandeh, Simonovits, and Szemerédi (2002).
For a survey on the graph removal lemma, including many variants, extensions, and proof
techniques, see Conlon and Fox (2013).
For a well-motivated introduction to the hypergraph regularity lemma, see Gowers (2006).
CHAPTER 3

Pseudorandom graphs

In the previous chapter, we saw that the graph regularity lemma partitions an arbitrary graph
into a bounded number of pieces so that the graph looks “random-like” between most pairs of parts.
In this chapter, we dive further into how a graph can be random-like.
Pseudorandomness is a concept prevalent in combinatorics, theoretical computer science, and
in many other areas. It specifies how a non-random object can behave like a truly random object.
Suppose you want to generate a random number on a computer. In most systems and program-
ming languages, you can do this easily with a single command (e.g., rand()). The output is not
actually truly random. Instead, the output came from a pseudorandom generator, which is some
function/algorithm that takes a seed as input, and passes it through some sophisticated function,
so that there is no practical way to distinguish the output from a truly random object. In other
words, the output is not actually truly random, but for all practical purposes the output cannot be
distinguished from a true random output.
In number theory, the prime numbers behave like a random sequence in many ways. The
celebrated Riemann hypothesis and its generalizations give quantitative predictions about how
closely the primes behave in a certain specific way like a random sequence. There is also something
called Cramér’s random model for the primes that allows one to make predictions about the
asymptotic density of certain patterns in the primes (e.g., how many twin primes up to N are
there?). Empirical data support these predictions, and they have been proved in certain cases, but
there are still notorious open problems such as the twin prime and Goldbach conjectures. Despite
their pseudorandom behavior, the primes are not random!
It is very much believed that the digits of π behave in a random-like way, where every digit or block of digits appears with frequency similar to that of a truly random number. Such numbers are called normal. It is widely believed that numbers such as √2, π, and e are normal, but proofs remain elusive. Again, the digits of π are deterministic, not random, but they are believed to behave pseudorandomly.
Coming back to graph theory, we have the Erdős–Rényi model of random graphs, where every
edge occurs independently with some probability. Now, given some specific graph (perhaps an
instance of the random graph model, or perhaps generated via some other means), we can ask
whether this graph, for the purpose of some intended application, behaves similarly to that of a
typical random graph. What are some way to “measure” pseudorandomness? These questions were
studied systematically starting in the late 1980’s in the foundational works of Thomason (1987)
and Chung, Graham, and Wilson (1989). It had an important impact in the field. This is the main
theme that we explore in this chapter.

3.1. Quasirandom graphs


Chung, Graham, and Wilson (1989) showed that several seemingly different graph pseudorandomness properties are surprisingly equivalent. They used the term quasirandom
graphs to describe graphs (or rather sequences of graphs) satisfying these properties.

Theorem 3.1.1 (Quasirandom graphs). Let p ∈ [0, 1] be fixed. Let (G_n) be a sequence of graphs with G_n having n vertices and $(p + o(1))\binom{n}{2}$ edges (here n → ∞ along some subsequence of integers, i.e., n is allowed to skip integers). Denote G_n by G. The following properties are all equivalent:
DISC (discrepancy):
e(X, Y) = p|X||Y| + o(n^2) for all X, Y ⊂ V(G).
Here e(X, Y) = |{(x, y) ∈ X × Y : xy ∈ E(G)}|. The asymptotic notation means that there is some function f(n) with f(n)/n^2 → 0 as n → ∞ (f may depend on the sequence of graphs) such that e(X, Y) is always within f(n) of p|X||Y|.
DISC′:
$e(X) = p\binom{|X|}{2} + o(n^2)$ for all X ⊂ V(G).
Here e(X) is the number of edges of G contained in X.
COUNT: For every graph H, the number of labeled copies of H in G is $(p^{e(H)} + o(1))n^{v(H)}$. Here a labeled copy of H is the same as an injective map V(H) → V(G) that sends every edge of H to an edge of G. Here the rate at which the o(1) goes to zero is allowed to depend on H.
C4 (4-cycle): The number of labeled 4-cycles is at most (p^4 + o(1))n^4.
CODEG (codegree): Letting codeg(u, v) denote the number of common neighbors of u and v,
$$\sum_{u,v \in V(G)} \bigl|\operatorname{codeg}(u, v) - p^2 n\bigr| = o(n^3).$$
EIG (eigenvalue): If λ_1 ≥ λ_2 ≥ · · · ≥ λ_n are the eigenvalues of the adjacency matrix of G, then λ_1 = pn + o(n) and max_{i≠1} |λ_i| = o(n).
Here the adjacency matrix of G is defined as the matrix with rows and columns both indexed by V(G), whose (u, v)-entry is 1 if uv ∈ E(G) and 0 otherwise.
Definition 3.1.2. We say a sequence of graphs is quasirandom (at edge density p) if it satisfies
the above conditions for some constant p ∈ [0, 1].
Remark 3.1.3. Strictly speaking, it does not make sense to say whether a single graph is quasiran-
dom, but we will abuse the definition as such when it is clear that the graph we are referring to is
part of a sequence.
The C4 condition may be surprising. It says that the 4-cycle density, a single statistic, is equivalent to all the other quasirandomness conditions. We will soon see below in Proposition 3.1.12 that the C4 condition can be replaced by the equivalent condition that the number of labeled 4-cycles is (p^4 + o(1))n^4 (rather than at most this quantity).
The discrepancy conditions are hard to verify since they involve checking exponentially many
sets. The other conditions can all be checked in time polynomial in the size of the graph. So the
equivalence gives us an algorithmically efficient way to certify the discrepancy condition.
Remark 3.1.4 (Quantitative equivalences). Rather than stating these properties for a sequence of graphs using a decaying error term o(1), we can state a quantitative quasirandomness hypothesis for a specific graph using an error tolerance parameter ε. For example, we can restate the discrepancy condition as
DISC(ε): For all X, Y ⊂ V(G), |e(X, Y) − p|X||Y|| < εn^2.
And likewise with the other quasirandom graph notions. The proofs below show that these notions are equivalent up to a polynomial change in ε, i.e., for each pair of properties, Prop1(ε) implies Prop2(ε^c) for some constant c > 0, provided that 0 < ε < 1/2.

Now we give some examples of quasirandom graphs. First let us check that random graphs are
quasirandom (hence justifying the name).
Recall the following basic tail bound for a sum of independent random variables.

Theorem 3.1.5 (Chernoff bound). Let X be a sum of m independent Bernoulli random variables (not necessarily identically distributed). Then for every t > 0,
$$\mathbb{P}(|X - \mathbb{E}X| \ge t) \le 2e^{-t^2/(2m)}.$$
Proposition 3.1.6. Let p ∈ [0, 1] and ε > 0. With probability at least $1 - 2^{n+1} e^{-\varepsilon^2 n^2}$, the Erdős–Rényi random graph G(n, p) has the property that for every vertex subset X,
$$\Bigl| e(X) - p\binom{|X|}{2} \Bigr| \le \varepsilon n^2.$$
Proof. Applying the Chernoff bound to e(X), which is a sum of $\binom{|X|}{2}$ independent Bernoulli random variables, we see that
$$\mathbb{P}\Bigl( \Bigl| e(X) - p\binom{|X|}{2} \Bigr| > \varepsilon n^2 \Bigr) \le 2\exp\Bigl( \frac{-(\varepsilon n^2)^2}{2\binom{|X|}{2}} \Bigr) \le 2\exp(-\varepsilon^2 n^2).$$
The result then follows by taking a union bound over all 2^n subsets X of the n-vertex graph. □
Applying the Borel–Cantelli lemma with the above bound, we obtain the following consequence.

Corollary 3.1.7 (Random graphs are quasirandom). Fix p ∈ [0, 1]. With probability 1, a sequence
of random graphs Gn ∼ G(n, p) is quasirandom at edge density p.
It would be somewhat disappointing if the only interesting examples of quasirandom graphs were actual random graphs. Fortunately, we have more explicit constructions. In the rest of the chapter,
we will see several constructions using Cayley graphs on groups. A notable example, which we
will prove in Section 3.3, is that the Paley graph is quasirandom.
Example 3.1.8 (Paley graph). Let p ≡ 1 (mod 4) be a prime. Form a graph with vertex set F p ,
with two vertices x, y joined if x − y is a quadratic residue. Then this graph is quasirandom at edge
density 1/2 as p → ∞. (By a standard fact from elementary number theory, since p ≡ 1 (mod 4),
−1 is a quadratic residue, and hence x − y is a quadratic residue if and only if y − x is. So the graph
is well defined.)
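For a small numerical sanity check of this example via the EIG condition, one can compute the spectrum of a Paley graph directly. The sketch below assumes Python with numpy and is purely illustrative; the prime is denoted q to avoid clashing with the edge density 1/2.

```python
import numpy as np

q = 101                                        # a prime congruent to 1 mod 4
squares = {(x * x) % q for x in range(1, q)}   # nonzero quadratic residues mod q
A = np.array([[1.0 if (x - y) % q in squares else 0.0 for y in range(q)]
              for x in range(q)])              # Paley graph adjacency matrix
eigs = np.linalg.eigvalsh(A)                   # eigenvalues in increasing order
print(eigs[-1], (q - 1) / 2)                   # top eigenvalue equals (q - 1)/2, about q/2
print(eigs[0], eigs[-2])                       # all others are (-1 ± sqrt(q))/2 = o(q)
```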
In Section 3.4, we will show that for certain sequences of groups, every sequence of Cayley graphs on them is quasirandom provided that the edge densities converge. We will call such groups quasirandom. We will later prove the following important example.
Example 3.1.9 (PSL(2, p)). Let p be a prime. Let S ⊂ PSL(2, p) be a set of non-identity elements with S = S^{−1}. Let G be the Cayley graph on PSL(2, p) with generator S, meaning that the vertices are elements of PSL(2, p), and two vertices x, y are adjacent if x^{−1}y ∈ S. Then G is quasirandom as p → ∞ as long as |S|/p^3 converges.
Finally, here is an explicit construction using finite geometry. We leave it as an exercise to
verify its quasirandomness using the conditions given earlier.
Example 3.1.10. Let p be a prime. Let S ⊂ F_p ∪ {∞}. Let G be a graph on vertex set F_p^2 where two points are joined if the slope of the line connecting them lies in S. Then G is quasirandom as p → ∞ as long as |S|/p converges.

Exercise 3.1.11. Prove that the construction in Example 3.1.10 is quasirandom.


We will now start to prove Theorem 3.1.1. Let us begin with a warm-up on how to apply the Cauchy–Schwarz inequality in graph theory, since it will come up several times in the proof (we will revisit this topic in Section 5.2).
The following statement says that the 4-cycle density is always at least roughly as large as in a random graph of the same edge density. Later in Chapter 5, we will see Sidorenko's conjecture, which says that all bipartite graphs have this property.
As a consequence, the C4 condition is equivalent to saying that the number of labeled 4-cycles is (p^4 + o(1))n^4 (rather than at most).

Proposition 3.1.12 (Minimum 4-cycle density). Every n-vertex graph with at least pn^2/2 edges has at least p^4 n^4 labeled closed walks of length 4.
Remark 3.1.13. Since all but O(n^3) such closed walks use four distinct vertices, the above statement implies that the number of labeled 4-cycles is at least (p^4 − o(1))n^4.
Proof. The number of closed walks of length 4 is
$$|\{(w, x, y, z) \text{ closed walk}\}| = \sum_{w,y} |\{x : w \sim x \sim y\}|^2 \ge \frac{1}{n^2}\Bigl(\sum_{w,y}|\{x : w \sim x \sim y\}|\Bigr)^2 = \frac{1}{n^2}\Bigl(\sum_{x}|\{(w, y) : w \sim x \sim y\}|\Bigr)^2 = \frac{1}{n^2}\Bigl(\sum_{x}(\deg x)^2\Bigr)^2 \ge \frac{1}{n^4}\Bigl(\sum_{x}\deg x\Bigr)^4 = \frac{(2e(G))^4}{n^4} \ge p^4 n^4.$$
Here both inequality steps are due to Cauchy–Schwarz. It is helpful to keep a pictorial depiction of what is being counted by each inner sum (first the common neighbors x of a fixed pair w, y; after swapping the order of summation, the pairs of neighbors of a fixed x). Such diagrams are a useful way to keep track of graph inequalities, especially when dealing with much larger graphs, where the algebraic expressions get unwieldy. Note that each application of the Cauchy–Schwarz inequality corresponds to “folding” the graph along a line of reflection. □
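The inequality tr A^4 ≥ (2e(G))^4/n^4 implicit in this proof is easy to check numerically; the following sketch (assuming Python with numpy, purely for illustration) does so on a sample of G(n, p).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 0.3
A = np.triu(rng.random((n, n)) < p, 1).astype(float)
A = A + A.T                                              # adjacency matrix of a sample of G(n, p)
closed_walks_4 = np.trace(np.linalg.matrix_power(A, 4))  # number of closed walks of length 4
lower_bound = A.sum() ** 4 / n ** 4                      # (2 e(G))^4 / n^4, since A.sum() = 2 e(G)
assert closed_walks_4 >= lower_bound
```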
We shall prove the equivalences of Theorem 3.1.1 in the following way: DISC′ ⟺ DISC ⟹ COUNT ⟹ C4 ⟹ CODEG ⟹ DISC, together with C4 ⟺ EIG.
Proof that DISC implies DISC′. Take Y = X in DISC. (Note that e(X, X) = 2e(X) and $\binom{|X|}{2} = |X|^2/2 - O(n)$.) □

Proof that DISC′ implies DISC. We have the following “polarization identity” (which also admits a proof by picture; recall 2e(X) = e(X, X)):
$$e(X, Y) = e(X \cup Y) + e(X \cap Y) - e(X \setminus Y) - e(Y \setminus X).$$
If DISC′ holds, then the right-hand side above equals
$$p\binom{|X \cup Y|}{2} + p\binom{|X \cap Y|}{2} - p\binom{|X \setminus Y|}{2} - p\binom{|Y \setminus X|}{2} + o(n^2) = p|X||Y| + o(n^2),$$
where the final step applies the polarization identity again, this time to the complete graph. So we have e(X, Y) = p|X||Y| + o(n^2), thereby confirming DISC. □
Proof (deferred) that DISC implies COUNT. This is essentially a counting lemma. In Section 2.6
we proved a version of the counting lemma but for lower bounds. The same proof can be modified
to a two-sided bound. We will see another proof of a counting lemma (Theorem 4.5.1) in the next
chapter on graph limits, which gives us a convenient language to set up a more streamlined proof.
So we will defer this proof until then. 
Proof that COUNT implies C4 . C4 is a special case of COUNT. 
Proof that C4 implies CODEG. Assuming C4, we have
$$\sum_{u,v} \operatorname{codeg}(u, v) = \sum_{x \in V(G)} \deg(x)^2 \ge \frac{1}{n}\Bigl(\sum_{x \in V(G)} \deg(x)\Bigr)^2 = \frac{1}{n}\bigl(pn^2 + o(n^2)\bigr)^2 = p^2 n^3 + o(n^3).$$
We also have (below, the O(n^3) error term is due to closed walks of length 4 that use repeated vertices)
$$\sum_{u,v} \operatorname{codeg}(u, v)^2 = \#\{\text{labeled } C_4\} + O(n^3) \le p^4 n^4 + o(n^4).$$
Thus, by Cauchy–Schwarz,
$$\frac{1}{n^2}\Bigl(\sum_{u,v} \bigl|\operatorname{codeg}(u, v) - p^2 n\bigr|\Bigr)^2 \le \sum_{u,v} \bigl(\operatorname{codeg}(u, v) - p^2 n\bigr)^2 = \sum_{u,v} \operatorname{codeg}(u, v)^2 - 2p^2 n \sum_{u,v} \operatorname{codeg}(u, v) + p^4 n^4 \le p^4 n^4 - 2p^2 n \cdot p^2 n^3 + p^4 n^4 + o(n^4) = o(n^4). \qquad □$$
Remark 3.1.14. These calculations share the spirit of the second moment method in probabilistic
combinatorics. The condition C4 says that the variance of the codegree of two random vertices is
small.
Exercise 3.1.15. Show that if we modify the CODEG condition to $\bigl|\sum_{u,v \in V(G)} (\operatorname{codeg}(u, v) - p^2 n)\bigr| = o(n^3)$ (i.e., without absolute values inside the sum), then it would not be enough to imply quasirandomness.



Proof that CODEG implies DISC. We first show that the codegree condition implies concentration of the degrees:
$$\frac{1}{n}\Bigl(\sum_u |\deg u - pn|\Bigr)^2 \le \sum_u (\deg u - pn)^2 = \sum_u (\deg u)^2 - 2pn \sum_u \deg u + p^2 n^3 = \sum_{x,y} \operatorname{codeg}(x, y) - 4pn\, e(G) + p^2 n^3 = p^2 n^3 - 2p^2 n^3 + p^2 n^3 + o(n^3) = o(n^3). \tag{3.1.1}$$
Now we bound the expression in DISC. We have
$$\frac{1}{n}\bigl|e(X, Y) - p|X||Y|\bigr|^2 = \frac{1}{n}\Bigl(\sum_{x \in X} (\deg(x, Y) - p|Y|)\Bigr)^2 \le \sum_{x \in X} (\deg(x, Y) - p|Y|)^2.$$
The above Cauchy–Schwarz step turned all the summands nonnegative, which affords us the next step, expanding the domain of summation from X to all of V = V(G). Continuing,
$$\le \sum_{x \in V} (\deg(x, Y) - p|Y|)^2 = \sum_{x \in V} \deg(x, Y)^2 - 2p|Y| \sum_{x \in V} \deg(x, Y) + p^2 n |Y|^2 = \sum_{y, y' \in Y} \operatorname{codeg}(y, y') - 2p|Y| \sum_{y \in Y} \deg y + p^2 n |Y|^2 = |Y|^2 p^2 n - 2p|Y| \cdot |Y| pn + p^2 n |Y|^2 + o(n^3) = o(n^3),$$
where the penultimate step uses CODEG and (3.1.1). □

Finally, let us consider the graph spectrum, i.e., the multiset of eigenvalues of the graph
adjacency matrix, accounting for eigenvalue multiplicities. Eigenvalues are core to the study of
pseudorandomness and they will play a central role in the rest of this chapter.
In this book, when we talk about the eigenvalues of a graph, we always mean the eigenvalues
of the adjacency matrix of the graph. In other contexts, it may be useful to consider other related
matrices, such as the Laplacian matrix, or a normalized adjacency matrix.
We will generally only consider real symmetric matrices, whose eigenvalues are always all real
(Hermitian matrices also have this property). Our usual convention is to list all the eigenvalues
in order (including multiplicities): λ1 ≥ λ2 ≥ · · · ≥ λn . We refer to λ1 as the top eigenvalue
(or largest eigenvalue), and λi as the i-th eigenvalue (or the i-th largest eigenvalue). The second
eigenvalue plays an important role. We write λi (A) for the i-th eigenvalue of the matrix A and
λi (G) = λi (AG ) where AG is the adjacency matrix of G.
Remark 3.1.16 (Linear algebra review). For every n × n real symmetric matrix A with eigenvalues
λ1 ≥ · · · ≥ λn , we can choose an eigenvector vi ∈ Rn for each eigenvalue λi (so that Avi = λi vi )
and such that {v1, . . . , vn } is an orthogonal basis of Rn (this is false for general non-symmetric
matrices).
The Courant–Fischer min-max theorem is an important characterization of eigenvalues in terms of a variational problem. Here we only state some consequences most useful for us. We have
$$\lambda_1 = \max_{v \in \mathbb{R}^n \setminus \{0\}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
Once we have fixed a choice of an eigenvector v_1 for the top eigenvalue λ_1, we have
$$\lambda_2 = \max_{\substack{v \perp v_1 \\ v \in \mathbb{R}^n \setminus \{0\}}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
In particular, if G is a d-regular graph, then the all-1 vector, denoted 1 ∈ Rv(G) , is an eigenvector
for the top eigenvalue d.
The Perron–Frobenius theorem tells us some important information about the top eigenvector and eigenvalue of a nonnegative matrix. For every connected graph G, the top eigenvalue is simple (i.e., has multiplicity one), so that λ_i < λ_1 for all i > 1. We also have |λ_i| ≤ λ_1 for all i (one has λ_n = −λ_1 if and only if G is bipartite; see Remark 3.1.20 below). Also, the top eigenvector v_1 (which is unique up to scalar multiplication) has all coordinates positive.
If G has multiple connected components G1, . . . , G k , then the eigenvalues of G (with multi-
plicities) are obtained by taking a multiset union of the eigenvalues of its connected components.
An orthogonal system of eigenvectors can also be derived as such, by extending each eigenvector
of Gi to an eigenvector of G via padding the eigenvector by zeros outside the vertices of Gi .
Here is a useful formula:
$$\operatorname{tr} A^k = \lambda_1^k + \cdots + \lambda_n^k.$$
When A is the adjacency matrix of a graph G, tr A^k counts the number of closed walks of length k. In particular, tr A^2 = 2e(G).
Proof that EIG implies C4. Let A denote the adjacency matrix of G. The number of labeled 4-cycles is within O(n^3) of the number of closed walks of length 4, and the latter equals
$$\operatorname{tr} A^4 = \lambda_1^4 + \cdots + \lambda_n^4 = p^4 n^4 + o(n^4) + \sum_{i=2}^{n} \lambda_i^4.$$
Since tr A^2 = 2e(G) ≤ n^2, we have
$$\sum_{i=2}^{n} \lambda_i^4 \le \max_{i \ne 1} \lambda_i^2 \cdot \sum_{i=1}^{n} \lambda_i^2 = o(n^2) \cdot \operatorname{tr} A^2 = o(n^4).$$
So tr A^4 ≤ p^4 n^4 + o(n^4). □

Remark 3.1.17. A rookie error would be to bound $\sum_{i \ge 2} \lambda_i^4$ by $n \max_{i \ge 2} \lambda_i^4 = o(n^5)$, but this would not be enough. (Where do we save in the above proof?) We will see a similar situation later in ?? in the Fourier analytic proof of Roth's theorem.

Lemma 3.1.18. The top eigenvalue of the adjacency matrix of a graph is always at least its average
degree.

Proof. Let 1 ∈ R^n be the all-1 vector. By the Courant–Fischer min-max theorem, the adjacency matrix A of the graph G has top eigenvalue
$$\lambda_1 = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\langle x, Ax \rangle}{\langle x, x \rangle} \ge \frac{\langle \mathbf{1}, A\mathbf{1} \rangle}{\langle \mathbf{1}, \mathbf{1} \rangle} = \frac{2e(G)}{v(G)} = \operatorname{avgdeg}(G). \qquad □$$

Proof that C4 implies EIG. Again writing A for the adjacency matrix,
$$\sum_{i=1}^{n} \lambda_i^4 = \operatorname{tr} A^4 = \#\{\text{closed walks of length } 4\} \le p^4 n^4 + o(n^4).$$
On the other hand, by Lemma 3.1.18 above, we have λ_1 ≥ pn + o(n). So we must have λ_1 = pn + o(n) and max_{i ≥ 2} |λ_i| = o(n). □
This completes all the implications in the proof of Theorem 3.1.1.
Remark 3.1.19 (Forcing graphs). The C4 hypothesis says that having 4-cycle density asymptotically the same as random implies quasirandomness. Which other graphs besides C_4 have this property? Chung, Graham, and Wilson (1989) called a graph F forcing if every graph with edge density p + o(1) and F-density $p^{e(F)} + o(1)$ (i.e., asymptotically the same as random) is automatically quasirandom. Theorem 3.1.1 implies that C_4 is forcing. It remains an open problem to determine which graphs are forcing. The forcing conjecture says that F is forcing if and only if F is bipartite and not a tree (Skokan and Thoma 2004; Conlon, Fox, and Sudakov 2010). We will revisit this conjecture in Chapter 5, where we will reformulate it using the language of graphons.
More generally, one says that a family of graphs ℱ is forcing if having F-density $p^{e(F)} + o(1)$ for each F ∈ ℱ implies quasirandomness. So {K_2, C_4} is forcing. It seems to be a difficult problem to classify forcing families.
Even though many other graphs can potentially play the role of the 4-cycle, the 4-cycle never-
theless occupies an important role in the study of quasirandomness. The 4-cycle comes up naturally
in the proofs, as we will see below. It also is closely tied to other important pseudorandomness
measurements such as the Gowers U 2 uniformity norm in additive combinatorics.
Let us formulate a bipartite analogue of Theorem 3.1.1 since we will need it later. It is easy
to adapt the above proofs to the bipartite version—we encourage the readers to think about the
differences between the two settings.
Remark 3.1.20 (Eigenvalues of bipartite graphs). Given a bipartite graph G with vertex bipartition V ∪ W, we can write its adjacency matrix as
$$A = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \tag{3.1.2}$$
where B is a |V| × |W| matrix with rows indexed by V and columns indexed by W. The eigenvalues λ_1 ≥ · · · ≥ λ_n of A always satisfy
$$\lambda_i = -\lambda_{n+1-i} \quad \text{for every } 1 \le i \le n.$$
In other words, the eigenvalues are symmetric around zero. One way to see this is that if x = (v, w) is an eigenvector of A with eigenvalue λ, where v ∈ R^V is the restriction of x to the first |V| coordinates, and w is the restriction of x to the last |W| coordinates, then
$$\lambda \begin{pmatrix} v \\ w \end{pmatrix} = \lambda x = Ax = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} Bw \\ B^{\intercal} v \end{pmatrix},$$
so that
$$Bw = \lambda v \quad \text{and} \quad B^{\intercal} v = \lambda w.$$
Then the vector x′ = (v, −w) satisfies
$$Ax' = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \begin{pmatrix} v \\ -w \end{pmatrix} = \begin{pmatrix} -Bw \\ B^{\intercal} v \end{pmatrix} = \begin{pmatrix} -\lambda v \\ \lambda w \end{pmatrix} = -\lambda x'.$$
So we can pair each eigenvalue of A with its negation.
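This ± pairing is easy to observe numerically; the following sketch (assuming Python with numpy, purely illustrative) checks that the spectrum of a random bipartite adjacency matrix is symmetric about zero.

```python
import numpy as np

rng = np.random.default_rng(3)
B = (rng.random((6, 9)) < 0.4).astype(float)                 # random bipartite biadjacency matrix
A = np.block([[np.zeros((6, 6)), B], [B.T, np.zeros((9, 9))]])  # full adjacency matrix as in (3.1.2)
eigs = np.linalg.eigvalsh(A)                                 # sorted eigenvalues
assert np.allclose(eigs, -eigs[::-1])                        # spectrum is symmetric about zero
```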
Exercise 3.1.21. Using the notation from (3.1.2), show that the positive eigenvalues of the adjacency matrix A coincide with the positive singular values of B (the singular values of B are the nonnegative square roots of the eigenvalues of $B^{\intercal}B$).

Theorem 3.1.22 (Bipartite quasirandom graphs). Fix p ∈ [0, 1]. Let (G_n)_{n ≥ 1} be a sequence of bipartite graphs. Write G_n as G, with vertex bipartition V ∪ W. Suppose |V|, |W| → ∞ and |E| = (p + o(1))|V||W| as n → ∞. The following properties are all equivalent:
DISC: e(X, Y) = p|X||Y| + o(|V||W|) for all X ⊂ V and Y ⊂ W.
COUNT: For every bipartite graph H with vertex bipartition (S, T), the number of labeled copies of H in G with S embedded in V and T embedded in W is $(p^{e(H)} + o(1))|V|^{|S|}|W|^{|T|}$.
C4: The number of closed walks of length 4 in G starting in V is at most (p^4 + o(1))|V|^2|W|^2.
Left-CODEG: $\sum_{x,y \in V} \bigl|\operatorname{codeg}(x, y) - p^2|W|\bigr| = o(|V|^2|W|)$.
Right-CODEG: $\sum_{x,y \in W} \bigl|\operatorname{codeg}(x, y) - p^2|V|\bigr| = o(|V||W|^2)$.
EIG: The adjacency matrix of G has top eigenvalue $(p + o(1))\sqrt{|V||W|}$ and second largest eigenvalue $o(\sqrt{|V||W|})$.
The bipartite discrepancy condition DISC is equivalent to being an o(1)-regular pair (Defini-
tion 2.1.2, Exercise 2.1.20).
Remark 3.1.23 (Bipartite double cover). Theorem 3.1.22 implies the non-bipartite version, Theorem 3.1.1, since every graph G can be transformed into a bipartite graph G × K_2 (a graph tensor product) whose two vertex parts are both copies of V(G). Each edge u ∼ v of G lifts to two edges (u, 0) ∼ (v, 1) and (u, 1) ∼ (v, 0) in G × K_2. It is not hard to check that G satisfies each property in Theorem 3.1.1 if and only if G × K_2 satisfies the corresponding bipartite property in Theorem 3.1.22 (exercise).
Like earlier, random bipartite graphs are bipartite quasirandom. The proof (omitted) is essen-
tially the same as Proposition 3.1.6 and Corollary 3.1.7.

Proposition 3.1.24. Fix p ∈ [0, 1]. With probability 1, a sequence of bipartite random graphs
Gn ∼ G(n, n, p) (obtained by keeping every edge of Kn,n with probability p independently) is
quasirandom in the sense of Theorem 3.1.22.
Remark 3.1.25 (Sparse graphs). We stated quasirandom properties so far only for graphs of con-
stant order density (i.e., p is a constant). Let us think about what happens if we allow p = pn
to depend on n and decaying to zero as n → ∞. Such graphs are sometimes called sparse (al-
though some other authors reserve the word “sparse” for bounded degree graphs). Theorems 3.1.1
and 3.1.22 as stated do hold for a constant p = 0, but the results are not as informative as we would
like. For example, the error tolerance on the DISC is o(n2 ), which does not tell us much since the
graph already has much fewer edges due to its sparseness anyway.
To remedy the situation, the natural thing to do is to adjust the error tolerance relative to the
edge density p = pn → 0. Here are some representative examples (all of these properties should
also depend on p):
Sparse-DISC: |e(X, Y ) − p |X | |Y || = o(pn2 ) for all X, Y ⊂ V(G).
Sparse-COUNT_H: The number of labeled copies of H is $(1 + o(1))p^{e(H)} n^{v(H)}$.
Sparse-C4: The number of labeled 4-cycles is at most (1 + o(1))p^4 n^4.
Sparse-EIG: λ_1 = (1 + o(1))pn and max_{i≠1} |λ_i| = o(pn).
Warning: these sparse pseudorandomness conditions are not all equivalent to each other. Some
of the implications still hold (the reader is encouraged to think about which ones). However, some
crucial implications such as the counting lemma fail quite miserably. For example:
Sparse-DISC does not imply Sparse-COUNT.
Indeed, suppose p = n^{−c} for some constant 1/2 < c < 1. In a typical random graph G(n, p), the number of triangles is close to $\binom{n}{3}p^3$, while the number of edges is close to $\binom{n}{2}p$. We have p^3 n^3 = o(pn^2) as long as p = o(n^{−1/2}), so there are significantly fewer triangles than there are edges. Now remove an edge from every triangle in this random graph. We will have removed o(pn^2) edges, a negligible fraction of the $(p + o(1))\binom{n}{2}$ edges, and this edge removal should not significantly affect Sparse-DISC. However, we have changed the triangle count significantly as a result.
Fortunately, this is not the end of the story. With additional hypotheses on the sparse graph, we
can sometimes salvage a counting lemma. Sparse counting lemmas play an important role in the
proof of the Green–Tao theorem on arithmetic progressions in the primes, as we will explain in ??.
The next several exercises ask you to prove additional equivalent quasirandomness properties.
It is easy to verify that the quasirandom graphs indeed satisfy each of the properties below.
Exercise 3.1.26∗ (Quasirandomness through fixed sized subsets). Fix p ∈ [0, 1]. Let (G_n) be a sequence of graphs with v(G_n) = n (here n → ∞ along a subsequence of integers).
(1) Fix a single α ∈ (0, 1). Suppose
$$e(S) = \frac{p\alpha^2 n^2}{2} + o(n^2) \quad \text{for all } S \subset V(G) \text{ with } |S| = \lfloor \alpha n \rfloor.$$
Prove that G is quasirandom.
(2) Fix a single α ∈ (0, 1/2). Suppose
$$e(S, V(G) \setminus S) = p\alpha(1 - \alpha)n^2 + o(n^2) \quad \text{for all } S \subset V(G) \text{ with } |S| = \lfloor \alpha n \rfloor.$$
Prove that G is quasirandom. Furthermore, show that the conclusion is false for α = 1/2.
Exercise 3.1.27 (Quasirandomness and regularity partitions). Fix p ∈ [0, 1]. Let (G_n) be a sequence of graphs with v(G_n) → ∞. Suppose that for every ε > 0, there exists M = M(ε) so that each G_n has an ε-regular partition into at most M parts where all but an ε-fraction of vertex pairs lie between pairs of parts with edge density p + o(1) (as n → ∞). Prove that G_n is quasirandom.
Exercise 3.1.28∗ (Triangle counts on induced subgraphs). Fix p ∈ [0, 1]. Let (G_n) be a sequence of graphs with v(G_n) = n. Let G = G_n. Suppose that for every S ⊂ V(G), the number of triangles in the induced subgraph G[S] is $p^3\binom{|S|}{3} + o(n^3)$. Prove that G is quasirandom.


Exercise 3.1.29∗ (Perfect matchings). Prove that there are constants β, ε > 0 such that for every positive even integer n and real p ≥ n^{−β}, if G is an n-vertex graph where every vertex has degree (1 ± ε)pn (meaning within εpn of pn) and every pair of vertices has codegree (1 ± ε)p^2 n, then G has a perfect matching.
3.2. Expander mixing lemma
We dive further into the relationship between graph eigenvalues and its pseudorandomness
properties. We focus on d-regular graphs since they occur often in practice (e.g., from Cayley
graphs), and they are also cleaner to work with. Unlike the previous section, the results here are
effective for any value of d (not just when d is on the same order as n).
As we saw earlier, the magnitudes of eigenvalues are related to the pseudorandomness of a
graph. In a d-regular graph, the top eigenvalue is always exactly d. The following condition says
that all other eigenvalues are bounded by λ in absolute value.
Definition 3.2.1. An (n, d, λ)-graph is an n-vertex, d-regular graph whose adjacency matrix eigenvalues d = λ_1 ≥ · · · ≥ λ_n satisfy
$$\max_{i \ne 1} |\lambda_i| \le \lambda.$$

Remark 3.2.2 (Notation). Rather than saying, e.g., “an (n, 7, 6)-graph,” we prefer to say “an (n, d, λ)-
graph with d = 7 and λ = 6” for clarity as the name “(n, d, λ)” is quite standard and recognizable.
Remark 3.2.3 (Linear algebra review). The operator norm of a matrix A ∈ R^{m×n} is defined by
$$\|A\| = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{|Ax|}{|x|} = \sup_{\substack{x \in \mathbb{R}^n \setminus \{0\} \\ y \in \mathbb{R}^m \setminus \{0\}}} \frac{\langle y, Ax \rangle}{|x||y|}.$$
Here $|x| = \sqrt{\langle x, x \rangle}$ denotes the length of the vector x. The operator norm of A is the maximum ratio by which A can amplify the length of a vector. If A is a real symmetric matrix, then
$$\|A\| = \max_i |\lambda_i(A)|.$$
For general matrices, the operator norm of A equals the largest singular value of A.
Here is the main result of this section.

Theorem 3.2.4 (Expander mixing lemma). If G is an (n, d, λ)-graph, then
$$\Bigl| e(X, Y) - \frac{d}{n}|X||Y| \Bigr| \le \lambda\sqrt{|X||Y|} \quad \text{for all } X, Y \subset V(G).$$
On the left-hand side, (d/n) |X | |Y | is the number of edges that one should expect between X
and Y purely based on the edge density d/n of the graph and the sizes of X and Y . Note that unlike
the discrepancy condition (DISC) from quasirandom graphs (Theorem 3.1.1), the error bound on the right-hand side depends on the sizes of X and Y. We can apply the expander mixing lemma to
small subsets X and Y and still obtain useful estimates on e(X, Y ), unlike the dense quasirandom
graph conditions.
Proof. Let J be the n × n all-1 matrix. Since the all-1 vector 1 ∈ R^n is an eigenvector of A_G with eigenvalue d, we see that 1 is an eigenvector of A_G − (d/n)J with eigenvalue 0. Any other eigenvector v of A_G, with v ⊥ 1, satisfies Jv = 0, and thus v is also an eigenvector of A_G − (d/n)J with the same eigenvalue as in A_G. Therefore, the eigenvalues of A_G − (d/n)J are obtained by taking the eigenvalues of A_G and then replacing the one top eigenvalue d by zero. All the eigenvalues of A_G − (d/n)J are therefore at most λ in absolute value, so $\|A_G - \frac{d}{n}J\| \le \lambda$. Therefore,
$$\Bigl| e(X, Y) - \frac{d}{n}|X||Y| \Bigr| = \Bigl| \Bigl\langle \mathbf{1}_X, \Bigl(A_G - \frac{d}{n}J\Bigr)\mathbf{1}_Y \Bigr\rangle \Bigr| \le \Bigl\| A_G - \frac{d}{n}J \Bigr\| \, |\mathbf{1}_X|\,|\mathbf{1}_Y| \le \lambda\sqrt{|X||Y|}. \qquad □$$
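As an illustration, one can verify the expander mixing lemma numerically on a Paley graph, using the standard fact (consistent with the numerical check after Example 3.1.8) that the Paley graph on q vertices is an (n, d, λ)-graph with n = q, d = (q − 1)/2, and λ = (1 + √q)/2. The sketch below assumes Python with numpy and is purely illustrative.

```python
import numpy as np

q = 101                                        # Paley graph: n = q, d = (q - 1)/2
squares = {(x * x) % q for x in range(1, q)}
A = np.array([[1.0 if (x - y) % q in squares else 0.0 for y in range(q)]
              for x in range(q)])
d, lam = (q - 1) / 2, (1 + q ** 0.5) / 2       # lambda = (1 + sqrt(q))/2 for the Paley graph
rng = np.random.default_rng(2)
for _ in range(100):
    X = rng.choice(q, size=rng.integers(1, q), replace=False)
    Y = rng.choice(q, size=rng.integers(1, q), replace=False)
    eXY = A[np.ix_(X, Y)].sum()                # e(X, Y): pairs in X x Y joined by an edge
    assert abs(eXY - d * len(X) * len(Y) / q) <= lam * (len(X) * len(Y)) ** 0.5 + 1e-9
```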
Exercise 3.2.5. Prove the following strengthening of the expander mixing lemma.

Theorem 3.2.6. If G is an (n, d, λ)-graph, then
$$\Bigl| e(X, Y) - \frac{d}{n}|X||Y| \Bigr| \le \frac{\lambda}{n}\sqrt{|X|(n - |X|)\,|Y|(n - |Y|)} \quad \text{for all } X, Y \subset V(G).$$
We also have a bipartite analogue (the nomenclature used here is less standard). Recall from
Remark 3.1.20 that the eigenvalues of a bipartite graph are symmetric around zero.
Definition 3.2.7. A bipartite-(n, d, λ)-graph is a d-regular bipartite graph with n vertices in each part, such that its second largest eigenvalue is at most λ.
Exercise 3.2.8. Show that G is an (n, d, λ)-graph if and only if G × K2 is a bipartite-(n, d, λ)-graph.

Theorem 3.2.9 (Bipartite expander mixing lemma). Let G be a bipartite-(n, d, λ)-graph with vertex bipartition V ∪ W. Then
$$\Bigl| e(X, Y) - \frac{d}{n}|X||Y| \Bigr| \le \lambda\sqrt{|X||Y|} \quad \text{for all } X \subset V \text{ and } Y \subset W.$$
Exercise 3.2.10. Prove Theorem 3.2.9.
Remark 3.2.11. The following partial converse to the expander mixing lemma was shown by Bilu
and Linial (2006). The extra log factor turns out to be necessary.

Theorem 3.2.12 (Converse to expander mixing lemma). There exists an absolute constant C such that if G is a d-regular graph and β satisfies
$$\Bigl| e(X, Y) - \frac{d}{n}|X||Y| \Bigr| \le \beta\sqrt{|X||Y|} \quad \text{for all } X, Y \subset V(G),$$
then G is an (n, d, λ)-graph with λ ≤ Cβ log(2d/β).
Remark 3.2.13 (Edge expansion versus spectral gap). Let us mention another important theorem
relating the eigenvalues and expansion. The spectral gap is defined to be the difference between
the two most significant eigenvalues, i.e., λ1 − λ2 for the adjacency matrix of a graph. This quantity
turns out to be closely related to expansion in graphs. We define the edge-expansion ratio of a
graph G = (V, E) to be the quantity
$$h(G) := \min_{\substack{S \subset V \\ 0 < |S| \le |V|/2}} \frac{e_G(S, V \setminus S)}{|S|}.$$
In other words, a graph with edge-expansion ratio at least h has the property that for every nonempty subset of vertices S with |S| ≤ |V|/2, there are at least h|S| edges leaving S.

Cheeger's inequality, stated below, tells us that among d-regular graphs for a fixed d, having spectral gap bounded away from zero is equivalent to having edge-expansion ratio bounded away from zero. Cheeger (1970) originally developed this inequality for Riemannian manifolds. The graph theoretic analogue was proved by Dodziuk (1984), and independently by Alon and Milman (1985) and Alon (1986).

Theorem 3.2.14 (Cheeger's inequality). Let G be an n-vertex d-regular graph with adjacency matrix spectral gap κ = d − λ_2. Then its edge-expansion ratio h = h(G) satisfies
$$\kappa/2 \le h \le \sqrt{2d\kappa}.$$
The two bounds of Cheeger's inequality are tight up to constant factors. For the lower bound, taking G to be the skeleton of the d-dimensional cube with vertex set {0, 1}^d gives h = 1 (achieved by a (d − 1)-dimensional subcube) and κ = 2. For the upper bound, taking G to be an n-cycle gives h = 2/(n/2) = Θ(1/n) while d = 2 and κ = 2 − 2 cos(2π/n) = Θ(1/n^2).
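Both quantities in Cheeger's inequality can be computed by brute force for small graphs; the following sketch (assuming Python with numpy, purely illustrative) checks the inequality on the 8-cycle.

```python
import numpy as np
from itertools import combinations

n, d = 8, 2
A = np.zeros((n, n))
for i in range(n):                           # adjacency matrix of the 8-cycle
    A[i][(i + 1) % n] = A[(i + 1) % n][i] = 1.0
eigs = np.linalg.eigvalsh(A)
kappa = d - eigs[-2]                         # spectral gap d - lambda_2 = 2 - sqrt(2)
h = min(                                     # edge-expansion ratio by brute force
    sum(A[i][j] for i in S for j in range(n) if j not in S) / len(S)
    for k in range(1, n // 2 + 1)
    for S in combinations(range(n), k)
)
assert kappa / 2 <= h <= (2 * d * kappa) ** 0.5
```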
We call a family of d-regular graphs expanders if there is some constant κ0 > 0 so that each
graph in the family has spectral gap ≥ κ0 ; by Cheeger’s inequality, this is equivalent to the existence
of some h0 > 0 so that each graph in the family has edge expansion ratio ≥ h0 . Expander graphs are
important objects in mathematics and computer science. For example, expander graphs have rapid
mixing properties, which are useful for designing efficient Monte Carlo algorithms for sampling
and estimation.
The following direction of Cheeger's inequality is easier to prove. It is similar to the expander mixing lemma.
Exercise 3.2.15 (Spectral gap implies expansion). Prove the κ/2 ≤ h part of Cheeger's inequality.
The other direction, h ≤ √(2dκ), is more difficult and interesting. The proof is outlined in the following exercise.
Exercise 3.2.16 (Expansion implies spectral gap). Let G = (V, E) be a d-regular graph with
spectral gap κ. Let x = (xv )v∈V ∈ RV be an eigenvector associated to the second largest eigenvalue
λ2 = d − κ of the adjacency matrix of G. Assume that xv > 0 on at most half of the vertex set (or
else we replace x by −x). Let y = (yv )v∈V ∈ RV be obtained from x by replacing all its negative
coordinates by zero.
(a) Prove that
hy, Ayi
d− ≤ κ.
hy, yi
Hint: recall that λ2 xv =
Í
u∼v xu .
(b) Let Õ
Θ= yu2 − yv2 .
uv∈E
Prove that
Θ2 ≤ 2d(d hy, yi − hy, Ayi) hy, yi .
Hint: yu2 − yv2 = (yu − yv )(yu + yv ). Apply Cauchy–Schwarz.
(c) Relabel the vertex set V by [n] so that y1 ≥ y2 · · · ≥ yt > 0 = yt+1 = · · · = yn . Prove
t
Õ
Θ= (y k2 − y k+1
2
) e([k], [n] \ [k]).
k=1
88 3. PSEUDORANDOM GRAPHS

(d) Prove that for some 1 ≤ k ≤ t,


e([k], [n] \ [k]) Θ
≤ .
k hy, yi

(e) Prove the h ≤ 2dκ claim of Cheeger’s inequality.
Exercise 3.2.17 (Independence numbers). Prove that every independent set in a (n, d, λ)-graph has
size at most nλ/(d + λ).
Exercise 3.2.18 (Diameter). Prove that the diameter of an (n, d, λ)–graph is at most dlog n/log(d/λ)e.
(The diameter of a graph is the maximum distance between a pair of vertices.)
Exercise 3.2.19 (Counting cliques). For each part below, prove that for every  > 0, there exists
δ > 0 such that the conclusion holds for every (n, d, λ)-graph G with d = pn.
(a) If λ ≤ δp2 n, then the number of triangles of G is within a 1 ±  factor of p3 n3 /3.
(b*) If λ ≤ δp3 n, then the number of K4 ’s in G is within a 1 ±  factor of p6 n4 /6.
3.3. Cayley graphs on Z/pZ
Many important constructions of pseudorandom graphs come from groups.
Definition 3.3.1. Let Γ be a finite group, and let S ⊂ Γ be a subset with S = S −1 (i.e., s−1 ∈ S for
all s ∈ S) and not containing the identity element. We write Cay(Γ, S) to denote the Cayley graph
on Γ generated by S, which has elements of Γ as vertices, and
g ∼ gs for all g ∈ Γ and s ∈ S.
as edges.
In this section, we only consider abelian groups, particularly Z/pZ. For abelian groups, we
write the group operation additively, i.e., g + s. So edges join elements whose difference lies in S.
Remark 3.3.2. In later sections when we consider a non-abelian group Γ, one needs to make a
choice whether to define edges by left- or right-multiplication (i.e., gs or sg; we chose gs here). It
does not matter which choice one makes (as long as one is consistent) since the resulting Cayley
graphs are isomorphic (why?). However, some careful bookkeeping is sometimes required to make
sure that later computations are consistent with initial choice.
Example 3.3.3. Cay(Z/nZ, {−1, 1}) is a cycle of length n.

Cay(Z/8Z, {−1, 1})

Example 3.3.4. Cay(F2n, {e1, . . . , en }) is the skeleton of an n-dimensional cube. Here ei is the i-th
standard basis vector. The graphs for n = 1, 2, 3, 4 are illustrated below..
3.3. CAYLEY GRAPHS ON Z/pZ 89

Here is an explicitly constructed family of quasirandom graphs with edge density 1/2 + o(1).
Definition 3.3.5. Given prime p ≡ 1 (mod 4), the Paley graph of order p is Cay(Z/pZ, S), where
S is the set of non-zero quadratic residues in Z/pZ (here Z/pZ is viewed as an additive group).
Example 3.3.6. The Paley graphs for p = 5 and p = 13 are shown below.
0 0
12 1
11 2
4 1
10 3

9 4

3 2 8 5
7 6

Cay(Z/5Z, {±1}) Cay(Z/13Z, {±1, ±3, ±4})

 Here we recall some facts from elementary number theory.


Remark 3.3.7 (Quadratic residues).
For every odd prime p, the set S = a2 : a ∈ F×p of quadratic residues is a multiplicative subgroup
of F×p with index two. In particular, |S| = (p − 1)/2. We have −1 ∈ S if and only if p ≡ 1 (mod 4)
(which is required to define a Cayley graph, as the generating set needs to be a symmetric set, i.e.,
S = −S).
We will show that Paley graphs are quasirandom by verifying the EIG condition, i.e., all
eigenvalues, except the top one, are small. Here is a general formula for computing the eigenvalues
of any Cayley graph on Z/pZ.

Theorem 3.3.8 (Eigenvalues of abelian Cayley graphs on Z/nZ). Let n be a positive integer. Let
S ⊂ Z/nZ with 0 , S and S = −S. Let
ω = exp(2πi/n).
Then we have an orthonormal basis v0, . . . , vn−1 ∈ Cn of eigenvectors of Cay(Z/nZ, S) where

v j ∈ Cn has x-coordinate ω j x / n, for each x ∈ Z/nZ.
The eigenvalue associated to the eigenvector v j equals to
Õ
λj = ω js .
s∈S

In particular, λ0 = |S| and v0 has all coordinates 1/ n.
Remark 3.3.9 (Eigenvalues and the Fourier transform). The coordinates of the eigenvectors are
shown below.
Z/nZ
√ 0 1 ··· n−1
2
√n v0 1 1 ···
1 1
n v1 1 ω ω2
· · · ωn−1

n v2 1 ω2 ω4
· · · ω2(n−1)
.. .. .. ..
.. ..
. . . . . .
√ 2
n vn−1 1 ω n−1 ω 2(n−1) ··· ω (n−1)

Viewed as a matrix, this is sometimes known as the discrete Fourier transform matrix. We will
study the Fourier transform in Chapter 6. These two topics are closely tied. The eigenvalues of an
90 3. PSEUDORANDOM GRAPHS

abelian Cayley graph Cay(Γ, S) are precisely the Fourier transform in Γ of the generating set S, up
to normalizing factors:
eigenvalues of Cay(Γ, S) ←→ Fourier transform 1bS in Γ.
We will say more about this in Remark 3.3.11 below.
Proof. Let A be the adjacency matrix of Cay(Z/nZ, S). First we check that each v j is an eigenvector

of A with eigenvalue λ j . The coordinate of nAv j at x ∈ Z/nZ equals to
Õ Õ 
ω j(x+s)
= ω ωjx = λjωjx.
js

s∈S s∈S
So Av j = λ j v j .
Next we check that {v0, . . . , vn−1 } is an orthonormal basis. We have the inner product
1 
hv j , vk i = 1 · 1 + ω j ω k + ω2 j ω2k + · · · + ω(n−1) j ω(n−1)k
n (
1  1 if j = k,
= 1 + ω k− j + ω2(k− j) + · · · + ω(n−1)(k− j) = .
n 0 if j , k.

For the i , j case, we use that for any m-th root of unity ζ , 1, m−1 j=0 ζ = 0. So {v0, . . . , vn−1 } is
j
Í
an orthonormal basis. 
Remark 3.3.10 (Real vs complex eigenbases). The adjacency matrix of a graph is a real symmetric
matrix, so all its eigenvalues are real, and it always has a real orthogonal eigenbasis. The eigenbasis
given in Theorem 3.3.8 is complex, but it can always be made real. Looking at the formulas in
Theorem 3.3.8, we have λ j = λn− j , and v j is the complex conjugate of vn− j . So we can form √ a real
√ eigenbasis by replacing, for each j < {0, n/2}, the pair (v j , vn− j ) by ((v j +vn− j )/ 2, i(v j −
orthogonal
vn− j )/ 2). Equivalently, we can separate the real and imaginary parts of each v j , which are both
eigenvectors with eigenvalue λ j . All the real eigenvalues and eigenvectors can be expressed in
terms of sines and cosines.
Remark 3.3.11 (Every abelian Cayley graph has an eigenbasis independent of the generators). The
above theorem and its proof generalizes to all finite abelian groups, not just Z/nZ. For every finite
abelian group Γ, we have a set b Γ of characters, i.e., homomorphisms χ : Γ → C× . Then b Γ turns
out to be a group isomorphic to Γ (one can check this by first writing Γ as a direct product of
cyclicpgroups). For each χ ∈ bΓ, define the vector v χ ∈ CΓ by setting the coordinate at g ∈ Γ to be
χ(g)/ |Γ|. Then {v χ : χ ∈ b Γ} is an orthonormal basis forÍthe adjacency matrix of every Cayley
graph on Γ. The eigenvalue corresponding to v χ is λ χ (S) = s∈S χ(s). Up to normalization, λ χ (S)
is the Fourier transform of the indicator function of S on the abelian group Γ (Theorem 3.3.8 is
a special case of this construction). In particular, this eigenbasis {v χ : χ ∈ bΓ} depends only on
the finite abelian group and not on the generating set S. In other words, we have a simultaneous
diagonalization for all adjacency matrices of Cayley graphs on a fixed finite abelian group.
If Γ is a non-abelian group, then there does not exist a simultaneous eigenbasis for all Cayley
graphs on Γ. There is a corresponding theory of non-abelian Fourier analysis, which uses group
representation theory. We will discuss more about non-abelian Cayley graphs in Section 3.4.
Now we apply the above formula to compute eigenvalues of Paley graphs. In particular, the
following tells us that Paley graphs satisfy the quasirandomness condition EIG from Theorem 3.1.1.
3.3. CAYLEY GRAPHS ON Z/pZ 91

Theorem 3.3.12 (Eigenvalues of Paley graphs). Let p ≡ 1 (mod 4) be a prime. The adjacency of
matrix of the Paley graph of order p has top eigenvalue (p − 1)/2, and all other eigenvalues are
√ √
either ( p + 1)/2 or (− p + 1)/2.
Proof. Applying Theorem 3.3.8, we see that the eigenvalues are given by, for j = 0, 1, . . . , p − 1,
 
Õ 1 Õ
j x2
λj = ω = −1 + js
ω ,
s∈S
2 x∈F p

since each quadratic residue s appears as x 2 for exactly two non-zero x. Clearly λ0 = (p − 1)/2. For

j , 0, the next result shows that the inner sum on the right-hand side is ± p (note that the above
sum is real when p ≡ 1 (mod 4) since S = S −1 and so the sum equals to its own complex conjugate;
alternatively, the sum must be real since all eigenvalues of a symmetric matrix are real). 
Remark 3.3.13. Since the trace of the adjacency matrix is zero, and equals the sum of eigenvalues,
√ √
we see that the non-top eigenvalues are equally split between ( p + 1)/2 and (− p + 1)/2.

Theorem 3.3.14 (Gauss sum). Let p be an odd prime, ω = exp(2πi/p), and j ∈ F p \ {0}. Then
Õ 2 √
ωjx = p.
x∈F p

Proof. We have
Õ 2 Õ Õ
j x2 2 −x 2 ) 2
ω = ω j((x+y) = ω j(2xy+y ) .
x∈F p x,y∈Z/pZ x,y∈Z/pZ
For each fixed y, we have
(
Õ
j(2xy+y 2 ) p if y = 0,
ω =
x∈Z/pZ
0 if y , 0.
Summing over y yields the claim. 
Remark 3.3.15 (Sign of the Gauss sum). The determination of this sign is a more difficult problem.
Gauss conjectured the sign in 1801 and it took him four years to prove it. When j is a nonzero

quadratic residue mod p, the inner sum above turns out to equal to p if p ≡ 1 (mod 4) and
√ √ √
i p if p ≡ 3 (mod 4). When j is a quadratic non-residue, it is − p and −i p in the two cases
respectively. For a proof, see, e.g., Ireland and Rosen (1990, Section 6.4).
Exercise 3.3.16. Let p be an odd prime and A, B ⊂ Z/pZ. Show that
Õ Õ a + b p
≤ p | A| |B|
a∈A b∈B
p
where (a/p) is the Legendre symbol defined by
  
0
 if a ≡ 0 (mod p)
a

= 1

if a is a nonzero quadratic residue mod p
p 
 −1 if a is a quadratic nonresidue mod p



Exercise 3.3.17. Prove that in a Paley graph of order p, every clique has size at most p.
92 3. PSEUDORANDOM GRAPHS

Exercise 3.3.18 (No spectral gap if too few generators). Prove that for every  > 0 there is some
c > 0 such that for every S ⊂ Z/nZ with 0 < S = −S and |S| ≤ c log n, the second largest eigenvalue
of the adjacency matrix of Cay(Z/nZ, S) is at least (1 − ) |S|.
Exercise 3.3.19∗. Let p be a prime and let S be a multiplicative subgroup of F×p . Suppose −1 ∈ S.
Prove that all eigenvalues of the adjacency matrix of Cay(Z/pZ, S), other than the top one, are at

most p in absolute value.

3.4. Quasirandom groups


In the previous section, we saw that certain Cayley graphs on cyclic groups are quasirandom.
In this section, we will see that for certain families of non-abelian groups, every Cayley graph
on the group is quasirandom, regardless of the Cayley graph generators. Gowers (2008) called
such groups quasirandom groups, and showed that they are precisely groups with no small non-
trivial representations. He came up with this notion while solving the following problem about
product-free sets in groups.
Question 3.4.1 (Product-free subset of groups). Given a group of order n, what is the size of its
largest product-free subset? Is it always ≥ cn for some constant c > 0?
Remark 3.4.2 (Representations of finite groups). Representation theory allows us to study groups
in terms of linear transformations on a vector space. Here, we only need the very basic concepts
of group representation theory. Let Γ be a finite group. A representation of a Γ is a group
homomorphism ρ : Γ → GL(V), where V is a complex vector space (everything will take place
over C). We say that Γ acts on V via ρ. For each g ∈ Γ and v ∈ V, we write gv = ρ(g)v for the
image of the g-action on v. We write dim ρ = dim V for the dimension of the representation. We
say that ρ is trivial if gv = v for all g ∈ Γ and v ∈ V (i.e., ρ sends Γ to the identity element of
GL(V)), and non-trivial otherwise.
Recall from Definition 3.2.1 that an (n, d, λ)-graph is an n-vertex d-regular graph all of whose
eigenvalues, except the top one, are at most λ in absolute value.
The main theorem of this section, below, says that a group with no small non-trivial represen-
tations always produces quasirandom Cayley graphs (Gowers 2008).

Theorem 3.4.3. Let Γ be a group of order n with no non-trivial representations of dimension less
than K. Then every d-regular Cayley graph on Γ is an (n, d, λ)-graph for some λ <
p
dn/K.
More generally we will prove the result for vertex-transitive groups, of which Cayley graphs is
a special case.
Definition 3.4.4 (vertex-transitive graphs). Let G be a graph. An automorphism of G is a permu-
tation of V(G) that induces an isomorphism of G to itself (i.e., sending edges to edges). Let Γ be a
group of automorphisms of G (not necessarily the whole automorphism group). We say that Γ acts
vertex-transitively on G if for every pair v, w ∈ V(G) there is some g ∈ Γ such that gv = w. We
that G is a vertex-transitive graph if the automorphism group of G acts vertex-transitively on G.
In particular, every group Γ acts vertex-transitively on its Cayley graph Cay(Γ, S) by left-
multiplication: the action of g ∈ Γ sends each vertex x ∈ Γ to gx ∈ Γ, which sends each edge
(x, xs) to (gx, gxs), for all x ∈ Γ and s ∈ S.
3.4. QUASIRANDOM GROUPS 93

Theorem 3.4.5. Let Γ be a finite group with no non-trivial representations of dimension less than
p n-vertex d-regular graph that admits a vertex-transitive Γ action is an (n, d, λ)-graph
K. Then every
with λ < dn/K.
p √
Note that dn/K ≤ n/ K, so that a sequence of such Cayley graphs is quasirandom (Defini-
tion 3.1.2) as long as K → ∞ as n → ∞.
Proof. Let A denote the adjacency matrix of the graph, whose vertices are indexed by {1, . . . , n}.
Each g ∈ Γ gives a permutation (g(1), . . . , g(n)) of the vertex set, which induces a representation
of Γ on Cn given by permuting coordinates, sending v = (v1, . . . , vn ) ∈ Cn to gv = (vg(1), . . . , vg(n) ).
We know that the all-1 vector 1 is an eigenvector of A with eigenvalue d. Let v ∈ Rn be
an eigenvector of A with eigenvalue µ such that v ⊥ 1. Since each g ∈ Γ induces a graph
automorphism, Av = µv implies A(gv) = µgv (since g relabels vertices in an isomorphically
indistinguishable way).
Since Γv = {gv : g ∈ Γ} is Γ-invariant, its C-span W is a Γ-invariant subspace (i.e., gW ⊂ W
for all g ∈ Γ), and hence a sub-representation of Γ. Since v is not a constant vector, the Γ-action
on v is non-trivial. So W is a non-trivial representation of Γ. Hence dim W ≥ K by hypothesis.
Every nonzero vector in W is an eigenvector of A with eigenvalue µ. It follows that µ appears as
an eigenvalue of A with multiplicity at least K. Recall that we also have an eigenvalue d from the
eigenvector 1. Thus
Õn
d 2 + K µ2 ≤ λ j (A)2 = tr A2 = nd.
j=1
Therefore r r
d(n − d) dn
| µ| ≤ < .
K K

The above proof can be modified to prove a bipartite version, which will be useful for certain
applications.
Given a finite group Γ and a subset S ⊂ Γ (not necessarily symmetric), we define the bipartite
Cayley graph BiCay(Γ, S) as the bipartite graph with vertex set Γ on both parts, with an edge
joining g on the left with gs on the right for every g ∈ Γ and s ∈ S.

Theorem 3.4.6. Let Γ be a group of order n with no non-trivial representations of dimension less
p |S| = d. Then the bipartite Cayley graph BiCay(Γ, S) is a bipartite-(n, d, λ)-
than K. Let S ⊂ Γ with
graph for some λ < nd/K.
In other words,p the second largest eigenvalue of the adjacency matrix of this bipartite Cayley
graph is less than nd/K.
Exercise 3.4.7. Prove Theorem 3.4.6.

As an application of the expander mixing lemma, we show that in a quasirandom group, the
number of solutions to x y = z with x, y, z lying in three given sets X, Y, Z ⊂ Γ is close to what
one should predict from density alone. Note that the right-hand side expression below is relatively
small if K 2 is large compared to |X | |Y | |Z | /|Γ| 3 (e.g., if X, Y, Z each occupy at least a constant
proportion of the group, and K tends to infinity).
94 3. PSEUDORANDOM GRAPHS

Theorem 3.4.8 (Mixing in quasirandom groups). Let Γ be a finite group with no non-trivial repre-
sentations of dimension less than K. Let X, Y, Z ⊂ Γ. Then
r
|X | |Y | |Z | |X | |Y | |Z | |Γ|
# {(x, y, z) ∈ X × Y × Z : x y = z} − < .
|Γ| K
Proof. Every solution to x y = z, with (x, y, z) ∈ X × Y × Z corresponds to an edge (x, z) in
BiCay(Γ, Y ) between vertex subset X on the left and vertex subset Z on the right.

X
z
Γ y = x −1 z ∈ Y Γ

x Z
BiCay(Γ, Y )

By Theorem 3.4.6, BiCay(Γ, Y ) is a bipartite-(n, d, λ)-graph with n = |Γ|, d = |Y |, and some


λ < |Γ| |Y | /|K |. The above inequality then follows from applying the bipartite expander mixing
p

lemma, Theorem 3.2.9, to BiCay(Γ, Y ). 

Corollary 3.4.9 (Product-free sets). Let Γ be a finite group with no non-trivial representations of
dimension less than K. Let X, Y, Z ⊂ Γ. If there is no solution to x y = z with (x, y, z) ∈ X × Y × Z,
then
|Γ| 3
|X | |Y | |Z | < .
K
In particular, every product-free X ⊂ Γ (product-free meaning that there is no solution to xy = z
with x, y, z ∈ X) has size less than |Γ| /K 1/3 .
Proof. If there is no solution to xy = z, then the left-hand side of the inequality in Theorem 3.4.8
is |X | |Y | |Z | /|Γ|. Rearranging gives the result. 
The above result already shows that all product-free subsets of a quasirandom group must be
small. This sharply contrasts the abelian setting. For example, in Z/nZ (written additively), there
is a sum-free subset of size around n/3 consisting of all group elements strictly between n/3 and
2n/3.
Exercise 3.4.10 (Growth and expansion in quasirandom groups). Let Γ be a finite group with no
non-trivial representations of dimension less than K. Let X, Y, Z ⊂ Γ. Suppose |X | |Y | |Z | ≥
|Γ| 3 /K. Then XY Z = Γ (i.e., every element of Γ can be expressed as x yz for some (x, y, z) ∈
X × Y × Z).
Now let us see some examples of quasirandom groups.
Example 3.4.11 (Quasirandom groups). Here are some examples of groups with no small non-
trivial representations.
(a) A classic result of Frobenius from around 1900 shows that every non-trivial representation
of PSL(2, p) has dimension at least (p − 1)/2 for all prime p. A short proof is included
below. Jordan (1907) and Schur (1907) computed the character tables for PSL(2, q) for all
prime power q. In particular, we know that every non-trivial representation of PSL(2, q) has
dimension ≥ (q − 1)/2 for all prime power q.
3.4. QUASIRANDOM GROUPS 95

(b) The alternating group Am for m ≥ 2 has order m!/2, and its smallest non-trivial representation
has dimension m − 1 = Θ(log n/log log n). The representations of symmetric and alternating
groups have a nice combinatorial description using Young diagrams. See, e.g., Sagan (2001)
or Fulton and Harris (1991) for expository accounts of this theory.
(c) Gowers (2008, Theorem 4.7) gives an elementary proof that in every non-cyclic
p simple group
of order n, the smallest non-trivial representation has dimension at least log n/2.
Recall that the special linear group SL(2, p) is the group of 2 × 2 matrices (under multiplication)
with determinant 1:
  
a b
SL(2, p) = : a, b, c, d ∈ F p, ad − bc = 1 .
c d
The projective special linear group PSL(2, p) is a quotient of SL(2, p) by all scalars, i.e.,
PSL(2, p) = SL(2, p)/{±I} .
The following result is due to Frobenius

Theorem 3.4.12 (PSL(2, p) is quasirandom). Let p be a prime. Then all non-trivial representations
of SL(2, p) and PSL(2, p) have dimension at least (p − 1)/2.
Proof. The claim is trivial for p = 2, so we can assume that p is odd. It suffices to prove the claim
for SL(2, p). Indeed, any non-trivial representation of PSL(2, p) can be made into a representation
of SL(2, p) by first passing through the quotient SL(2, p) → SL(2, p)/{±I} = PSL(2, p).
Now suppose ρ is a non-trivial representation of SL(2, p). The group SL(2, p) is generated by
the elements (Exercise: check!)
   
1 1 1 0
g= and h= .
0 1 −1 1
These two elements are conjugate in SL(2, p) via z = 11 −1 0 as gz = zh. If ρ(g) = I, then ρ(h) = I

by conjugation, and ρ would be trivial since g and h generate the group. So, ρ(g) , I. Since g p = I,
we have ρ(g) p = I. So ρ(g) is diagonalizable (here we use that a polynomial is diagonalizable if an
only if its minimal polynomial has distinct roots, and that the minimal polynomial of ρ(g) divides
X p − 1). Since ρ(g) , I, ρ(g) has an eigenvalue λ , 1. Since ρ(g) p = I, λ is a primitive p-th root
of unity.
For every a ∈ F×p , g is conjugate to
     
a 0 1 1 a−1 0 1 a2 2
−1 = = ga .
0 a 0 1 0 a 0 1
2
Thus ρ(g) is conjugate to ρ(g)a . Hence these two matrices have same set of eigenvalues. So
2
λ a is an eigenvalue of ρ(g) for every a ∈ F×p , and by ranging over all a ∈ F×p , this gives
(p − 1)/2 distinct eigenvalues of ρ(g) (recall that λ is a primitive p-th root of unity). It follows that
dim ρ ≥ (p − 1)/2. 
Applying Corollary 3.4.9 with Theorem 3.4.12 yields the following corollary (Gowers 2008).
Note that the order of PSL(2, p) is (p3 − p)/2.

Corollary 3.4.13. The largest product-free subset of PSL(2, p) has size O(p3−1/3 ).
In particular, there exist infinitely many groups of order n whose largest product-free subset
has size O(n8/9 ).
96 3. PSEUDORANDOM GRAPHS

Before Gowers’ work, it was not known whether every order n group has a product-free subset
of size ≥ cn for some absolute constant c > 0 (this was Question 3.4.1, asked by Babai and Sós).
Gowers’ result shows that the answer is no.
In the other direction, Kedlaya (1997; 1998) showed that every finite group of order n has a
product-free subset of size & n11/14 . In fact, he showed that if the group has a proper subgroup H
of index m, then there is a product-free subset that is a union of & m1/2 cosets of H.
The above results tell us that having no small non-trivial representations is a useful property of
groups. Gowers further showed that this group representation theoretic property is equivalent to
several other characterizations of the group.

Theorem 3.4.14 (Quasirandom groups). Let Γn be a sequence of finite groups of increasing order.
The following are equivalent:
REP: The dimension of the smallest non-trivial representation of Γn tends to infinity.
GRAPH: Every sequence of bipartite Cayley graphs on Γn , as n → ∞, is quasirandom in the
sense of Theorem 3.1.22.
PRODFREE: The largest product-free subset of Γn has size o(|Γn |).
(X ⊂ Γn is product-free if there is no solution to x y = z with x, y, z ∈ X)
QUOTIENT: For every proper normal subgroup H of Γn , the quotient Γn /H is nonabelian and
has order tending to infinity as n → ∞.

Let us comment on the various implications.


By Theorem 3.4.6, REP implies GRAPH. For the converse, we need to construct a non-
quasirandom Cayley graph on each group with a non-trivial representation of bounded dimension.
One can first construct a weighted analogue of a bipartite Cayley graph with large eigenvalues by
appealing to formulas from non-abelian Fourier transform (see Remark 3.4.16 below). And then
one can sample a genuine bipartite Cayley graph from the weighted version.
By Corollary 3.4.9, REP implies PRODFREE. The converse is proved in Gowers (2008) using
elementary methods. It was later proved with better polynomial quantitative dependence in Nikolov
and Pyber (2011), who proved the following result.

Theorem 3.4.15. Let Γ be a group with a non-trivial representation of dimension K. Then Γ has
a product-free subset of size at least c |Γ| /K, where c > 0 is some absolute constant.

To see that REP implies QUOTIENT, note that any non-trivial representation of Γ/H is
automatically a representation of Γ after passing through the quotient. Furthermore, every non-
√ representation, and every group of order m > 1
trivial abelian group has a non-trivial 1-dimensional
has a non-trivial representation of dimension < m. For the proof of the converse, see Gowers
(2008, Theorem 4.8). (This implication has an exponential dependence of parameters.)

Remark 3.4.16 (Non-abelian Fourier analysis). (This is an advanced remark and can be skipped
over.) Section 3.3 discussed the Fourier transform on finite abelian groups. The topic of this section
can be alternatively viewed through the lenses of the non-abelian Fourier transform. We refer to,
e.g., Wigderson (2012), for a tutorial on the non-abelian Fourier transform from a combinatorial
perspective.
Let us give here the recipe for computing the eigenvalues and an orthonormal basis of eigen-
vectors of Cay(Γ, S).
3.5. QUASIRANDOM CAYLEY GRAPHS 97

For each irreducible representation ρ of Γ (always working over C), let


Õ
Mρ := ρ(s),
s∈S

viewed as a dim ρ × dim ρ matrix over C. Then Mρ has dim ρ eigenvalues λ ρ,1, . . . , λ ρ,dim ρ .
Here is how to list all the eigenvalues of the adjacency matrix of Cay(Γ, S): repeating each λ ρ,i
with multiplicity dim ρ, ranging over all irreducible representations ρ and all 1 ≤ i ≤ dim ρ.
To emphasize, the eigenvalues always come in bundles with multiplicities determined by the the
dimensions of the irreducible representations of Γ (although it is possible for there to be additional
coalescence of eigenvalues).
One can additionally recover a system of eigenvectors of Cay(Γ, S). For each eigenvector v with
eigenvalue λ of Mρ , and every w ∈ Cdim ρ , set x ρ,v,w ∈ CΓ with coordinates
ρ,v,w
xg = hρ(g)v, wi
for all g ∈ Γ. Then x is an eigenvector of Cay(Γ, S) with eigenvalue λ. Now let ρ range over all
irreducible representations of Γ, and let v range over an orthonormal basis of eigenvectors of Mρ
(let λ be the corresponding eigenvalue), and let w range over an orthonormal basis of eigenvectors
of Cdim ρ , then x ρ,v,w ranges over an orthogonal system of eigenvectors of Cay(Γ, S). The eigenvalue
associated to x ρ,v,w is λ.
A basic theorem in representation theory tells us that the regular representation decomposes
into a direct sum of dim ρ copies of ρ ranging over every irreducible representation ρ of Γ.
This decomposition then corresponds to a block diagonalization (simultaneously for all S) of the
adjacency matrix of Cay(Γ, S) into blocks Mρ , repeated dim ρ times, for each ρ. The above statement
comes from interpreting this block diagonalization.
The matrix Mρ , appropriately normalized, is the non-abelian Fourier transform of the indi-
cator vector of S at ρ. Many basic and important formulas for Fourier analysis over abelian groups,
e.g, inversion and Parseval (which we will see in Chapter 6) have nonabelian analogs.

3.5. Quasirandom Cayley graphs


Let us examine the following two sparse quasirandom graph conditions (c.f. Remark 3.1.25).
Definition 3.5.1. Let G be an n-vertex d-regular graph. We say that G satisfies property
Sparse-DISC(): if e(X, Y ) − dn |X | |Y | ≤  dn for all X, Y ⊂ V(G);
Sparse-EIG(): if G is an (n, d, λ)-graph for some λ ≤  d.
In Section 3.1, we saw that when d grows linearly in n, then these two conditions are equivalent
up to a polynomial change in the constant . As discussed in Remark 3.1.25, many quasirandomness
equivalences break down for sparse graphs, meaning d = o(n) here. Some still holds, for example:

Proposition 3.5.2. Every regular graph satisfying Sparse-EIG() also satisfies Sparse-DISC().

Proof. In an (n, d, λ) graph with λ ≤  d, by the expander mixing lemma (Theorem 3.2.4), for every
vertex subsets X and Y ,
d p p
e(X, Y ) − |X | |Y | ≤ λ |X | |Y | ≤  d |X | |Y | ≤  dn.
n
So the graph satisfies Sparse-DISC(). 
98 3. PSEUDORANDOM GRAPHS

The converse fails badly. Consider the disjoint union of a large random d-regular graph and a
Kd+1 (here d = o(n)).

large random
Kd+1
d-regular graph

This graph satisfies Sparse-DISC(o(1)) since it is satisfied by the large component, and the small
component Kd+1 contributes negligibly to discrepancy due to its size. On the other hand, each
connected component contributes a eigenvalue of d (by taking the all-1 vector supported on each
component), and so Sparse-EIG() fails for any  < 1.
The main result of this section is that despite the above example, if we restrict ourselves to
Cayley graphs (abelian or non-abelian), Sparse-DISC() and Sparse-EIG() are always equivalent
up to a linear change in . This result is due to Conlon and Zhao (2017).

Theorem 3.5.3. If a Cayley graph satisfies Sparse-DISC(), then it satisfies Sparse-EIG(8).

As in Section 3.4, we prove the result more generally for vertex-transitive graphs.

Theorem 3.5.4. If a vertex-transitive graph satisfies Sparse-DISC(), then it satisfies Sparse-EIG(8).

A pedagogical reason to show the proof of this theorem is to showcase the following important
inequality from functional analysis due to Grothendieck (1953).
Given a matrix A = (ai, j ) ∈ Rm×n , we can consider its ` ∞ → ` 1 norm
sup k Ayk `1 ,
k yk ∞ ≤1

which can also be written as (exercise: check! Also see Lemma 4.5.3 for a related fact about the
cut norm of graphons)
n Õ
Õ n
sup hx, Ayi = sup ai, j xi y j . (3.5.1)
x∈{−1,1} m x1,··· ,xm ∈{−1,1} i=1 j=1
y∈{−1,1} n y1,...,yn ∈{−1,1}

This quantity is closely related to discrepancy.


One can consider a semidefinite relaxation of the above quantity:
m Õ
Õ n
sup ai, j hxi, yi i , (3.5.2)
k x1 k,...,k xm k≤1 i=1 j=1
k y1 k,...,k yn k≤1

where the surpremum is taken over vectors x1, . . . , xm, y1, . . . , yn in the unit ball of some real
Hilbert space, whose norm is denoted by k k. Without loss of generality, we can take assume that
these vectors lie in Rm+n with the usual Euclidean norm (here m + n dimensions are enough since
x1, . . . , xm, y1, . . . , yn span a real subspace of dimension at most m + n).
We always have
(3.5.1) ≤ (3.5.2)
by restricting the vectors in (3.5.2) to R. The latter expression
Í (3.5.2) is called a semidefinite
relaxation since it can be also written as the supremum of i, j ai, j Mi, j over all positive semidefinite
3.5. QUASIRANDOM CAYLEY GRAPHS 99

matrices M. So (3.5.2) can be efficiently computed using semidefinite programming, whereas no


efficient algorithm is believed to exist for computing (3.5.1) (Alon and Naor 2006).
Grothendieck’s inequality says that this semidefinite relaxation never loses lose more than a
constant factor.

Theorem 3.5.5 (Grothendieck’s inequality). There exists a constant K > 0 (K = 1.8 works) such
that for all matrices A = (ai, j ) ∈ Rm×n ,
m Õ
Õ n m Õ
Õ n
sup ai, j hxi, yi i ≤ K sup ai, j xi y j ,
k xi k, k y j k ≤1 i=1 j=1 xi,y j ∈{±1} i=1 j=1

where the left-hand side supremum is taken over vectors x1, . . . , xn, y1, . . . , ym in the unit ball of
some real Hilbert space.
Remark 3.5.6. The optimal constant K is known as the real Grothendieck’s constant. Its exact
value is unknown. It is known to lie within [1.676, 1.783]. There is also a complex version of
Grothendieck’s inequality, where the left-hand side uses a complex Hilbert space (and place an
absolute value around the final sum). The corresponding complex Grothendieck’s constant is
known to lie within [1.338, 1.405].
We will not prove Grothendieck’s inequality here. See Alon and Naor (2006) for three proofs
of the inequality, along with algorithmic discussions.
Now let us use Grothendieck’s inequality to show that Sparse-DISC() implies Sparse-EIG(8)
for vertex-transitive graphs.
Proof of Theorem 3.5.3. Let G be an n-vertex d-regular graph with a vertex-transitive group Γ of
automorphisms. Suppose G satisfies Sparse-DISC(). Let A be the adjacency matrix of G. Write
d
J B = A−
n
where J is the n × n all-1 matrix. To show that G is an (n, d, λ)-graph with λ ≤  d, it suffices
to show that B has operator norm kBk ≤  d (here we are using that G is d-regular, so the all-1
eigenvector of A with eigenvalue d becomes an eigenvector of B with eigenvalue zero 0).
For any X, Y ⊂ V(G), the corresponding indicator vectors x = 1 X ∈ Rn and y = 1Y ∈ Rn satisfy,
by Sparse-DISC(),
d
|hx, Byi| = e(X, Y ) − |X | |Y | ≤  dn.
n
Then, for any x, y ∈ {−1, 1}n , we can write x = x + −x − and y = y + −y − with x +, x −, y +, y − ∈ {0, 1}n .
Since,
hx, Byi = hx +, By + i − hx +, By − i − hx −, By + i + hx −, By − i,
and each term on the right-hand side is at most  dn in absolute value, we have
|hx, Byi| ≤ 4 dn for all x, y ∈ {−1, 1}n . (3.5.3)
For any graph automorphism g ∈ Γ and any x = (x1, . . . , xn ) ∈ Rn and i ∈ [n], write
r 
n
x =
j
xg( j) : g ∈ Γ ∈ RΓ .
|Γ|
100 3. PSEUDORANDOM GRAPHS

For every unit vector x ∈ Rn , the vector x j ∈ RΓ is a unit vector since x12 + · · · + xn2 = 1 and the map
g 7→ g( j) is n/|Γ|-to-1 for each j. Similarly define y j for any y ∈ Rn and j ∈ [n]. Furthermore,
Bi, j = Bg(i),g( j) for any g ∈ Γ and j ∈ [n] due to g being a graph automorphism.
To prove the operator norm bound kBk ≤ 8 d, it suffices to show that hx, Byi ≤ 8 d for every
pair of unit vectors x, y ∈ Rn . We have
n n
Õ 1 ÕÕ
hx, Byi = Bi, j xi y j = Bg(i),g( j) xg(i) yg( j)
i, j=1
|Γ| g∈Γ i, j=1
n n
1 ÕÕ 1Õ
= Bi, j xg(i) yg( j) = Bi, j hx i, y j i ≤ 8 d.
|Γ| g∈Γ i, j=1 n i, j=1

The final step follows from Grothendieck’s inequality (applied with K ≤ 2) along with (3.5.3).
This completes the proof of Sparse-EIG(8). 

3.6. Second eigenvalue bound


The expander mixing lemma tells us that in an (n, d, λ)-graph, a smaller value of λ guarantees
stronger pseudorandomness properties. In this chapter, we explore the following natural extremal
question.
Question 3.6.1. Fix a positive integer d. What is the smallest possible λ (as a function of d alone)
such that there exist infinitely many (n, d, λ + o(1))-graphs, where the o(1) is some quantity that
goes to zero as n → ∞?
The following result (Alon 1986) gives a lower bound on λ. As we will see later, it turns out to
be tight.

Theorem 3.6.2 (Alon–Boppana bound). Fix d. Let G be an n-vertex d-regular graph. If λ1 ≥


· · · ≥ λn are the eigenvalues of its adjacency matrix, then

λ2 ≥ 2 d − 1 − o(1),
where o(1) → 0 as n → ∞.

In particular, the Alon–Boppana bound implies that max {|λ2 | , |λn |} ≥ 2 d − 1 − o(1), which
can be restated as below.

Corollary 3.6.3. For every fixed d and λ < 2 d − 1, there are only finitely many (n, d, λ)-graphs.

We will see two different proofs. The first proof (Nilli 1991) constructs an eigenvector explicitly.
The second proof (only for Corollary 3.6.3) uses the trace method to bound moments of the
eigenvalues via counting closed walks.

Lemma 3.6.4. Let G = (V, E) be a d-regular graph. Let A be the adjacency matrix of G. Let r be
a positive integer. Let st be an edge of G. For each i ≥ 0, let Vi denote the set of all vertices at
distance exactly i from {s, t} (so that in particular V0 = {s, t}). Let x = (xv )v∈V ∈ RV be a vector
with coordinates (
(d − 1)−i/2 if v ∈ Vi and i ≤ r,
xv =
0 otherwise, i.e., dist(v, {s, t}) > r.
3.6. SECOND EIGENVALUE BOUND 101

Then

 
hx, Axi 1
≥ 2 d−1 1−
hx, xi r +1

s t xv
V0 1

V1 (d − 1)−1/2

V2 (d − 1)−1

V3 (d − 1)−3/2

Proof. Let L = dI − A (this is called the Laplacian matrix of G). The claim can be rephrased as
an upper bound on hx, L xi /hx, xi. Here is an important and convenient formula (it can be easily
proved by expanding):
Õ
hx, L xi = (xu − xv )2 .
uv∈E

Since xv is constant for all v in the same Vi , we only need to consider edges spanning consecutive
Vi ’s. Using the formula for x, we obtain
r−1 2
e(Vr , Vr+1 )

Õ 1 1
hx, L xi = e(Vi, Vi+1 ) − +
i=0
(d − 1)i/2 (d − 1)(i+1)/2 (d − 1)r

For each i ≥ 0, each vertex in Vi has at most d − 1 neighbors in Vi+1 , so e(Vi, Vi+1 ) ≤ (d − 1) |Vi |.
Thus continuing from above,
r−1  2
Õ 1 1 |Vr | (d − 1)
≤ |Vi | (d − 1) − +
i=0
(d − 1)i/2 (d − 1)(i+1)/2 (d − 1)r
√ r−1
2 Õ
|Vi | |Vr | (d − 1)
= d−1−1 +
i=0
(d − 1) i (d − 1)r
√ r  √
 Õ |Vi |  |V |
r
= d−2 d−1 + 2 d − 1 − 1 .
i=0
(d − 1) i (d − 1)r

We have |Vi+1 | ≤ (d − 1) |Vi | for every i ≥ 0, so that |Vr | (d − 1)−r ≤ |Vi | (d − 1)−i for each i ≤ r.
So continuing,

√r
!

2 d − 1 − 1 Õ |Vi |
≤ d−2 d−1+
r +1 i=0
(d − 1)i
√ !
√ 2 d−1−1
= d−2 d−1+ hx, xi ,
r +1
102 3. PSEUDORANDOM GRAPHS

It follows that
√ !
hx, Axi hx, L xi √ 2 d−1−1
=d− ≥ 2 d−1−
hx, xi hx, xi r +1

 
1
≥ 1− 2 d − 1. 
r +1
Proof of the Alon–Boppana bound (Theorem 3.6.2). Let V = V(G). Let 1 be the all-1’s vector,
which is an eigenvector with eigenvalue d. To prove the theorem, it suffices to exhibit a a nonzero
vector z ⊥ 1 such that
hz, Azi √
≥ 2 d − 1 − o(1).
hz, zi
Let r be an arbitrary positive integer. When n is sufficiently large, there exist two edges st and s0t 0
in the graph with distance at least 2r + 2 apart (indeed, since the number of vertices within distance
k of and edge is ≤ 2(1 + (d − 1) + (d − 1)2 + · · · + (d − 1) k )). Let x ∈ RV be the vector constructed
as in Lemma 3.6.4 for st, and let y ∈ RV be the corresponding vector constructed for s0t 0. Recall
that x is supported on vertices within distance r from st, and likewise with y and s0t 0. Since st and
s0t 0 are at distance at least 2r + 2 apart, the support of x is at distance at least 2 from the support of
y. Thus
hx, yi = 0 and hx, Ayi = 0.
Choose a constant c ∈ R such that z = x − cy has sum of its entries equal to zero (this is possible
since hy, 1i > 0). Then
hz, zi = hx, xi + c2 hy, yi
and so by Lemma 3.6.4
hz, Azi = hx, Axi + c2 hy, Ayi

 
1  
≥ 1− 2 d − 1 hx, xi + c2 hy, yi
r +1

 
1
= 1− 2 d − 1 hz, zi .
r +1
Taking r → ∞ as n → ∞ gives the theorem. 
Remark 3.6.5. The above proof cleverly considers distance from an edge rather than from a single
vertex. This is important for a rather subtle reason. Why does the proof fail if we had instead
considered distance from a vertex?
Now let us give another proof—actually we will only prove the slightly weaker statement of
Corollary 3.6.3, which is equivalent to

max {|λ2 | , |λn |} ≥ 2 d − 1 − o(1). (3.6.1)

As a warmup, let us first prove (3.6.1) with d − o(1) on the right-hand side. We have
Õn
dn = 2e(G) = tr A2 = λi2 ≤ d 2 + (n − 1) max {|λ2 | , |λn |}2 .
i=1
So r
d(n − d) √
max {|λ2 | , |λn |} ≥ = d − o(1)
n−1
3.6. SECOND EIGENVALUE BOUND 103

as n → ∞ for fixed d.
To prove (3.6.1), we consider higher moments tr Ak . This is a useful technique, sometimes
called the trace method or the moment method.
Alternative proof of (3.6.1). The quantity
n
Õ
tr A 2k
= λi2k
i=1

counts the number of closed walks of length 2k on G. Let Td denote the infinite d-regular tree.
Observe that
# closed length-2k walks in G starting from a fixed vertex
≥ # closed length-2k walks in Td starting from a fixed vertex.
Indeed, at each vertex, for both G and Td , we can label its d incident edges arbitrarily from 1 to d
(the labels assigned from the two endpoints of the same edge do not have to match). Then every
closed length-2k walk in Td corresponds to a distinct closed length-2k walk in G by tracing the
same outgoing edges at each step (why?). Note that not all closed walks in G arise this way (e.g.,
walks that go around cycles in G).
The number of closed walks of length2k on an infinite d-regular graph starting at a fixed root
is at least (d − 1) k Ck , where Ck = k+1
1 2k
k is the k-th Catalan number. To see this, note that each
step in the walk is either “away from the root” or “towards the root.” We record a sequence by
denoting steps of the former type by + and of the latter type by −.

If t.tt
i

t t t

Then the number of valid sequences permuting k +’s and k −’s is exactly counted by the Catalan
number Ck , as the only constraint is that there can never be more −’s than +’s up to any point in
the sequence. Finally, there are at least d − 1 choices for where to step in the walk at any + (there
are d choices at the root), and exactly one choice for each −.
Thus, the number of closed walks of length 2k in G is at least
 
n 2k
2k k
tr A ≥ n(d − 1) Ck ≥ (d − 1) k .
k +1 k
On the other hand, we have
n
Õ
tr A2k
= λi2k ≤ d 2k + (n − 1) max {|λ2 | , |λn |}2k .
i=1

Thus,
d 2k
 
1 2k
max {|λ2 | , |λn |} 2k
≥ (d − 1) k − .
k +1 k n−1
104 3. PSEUDORANDOM GRAPHS

1 2k 2k as k → ∞. Letting k → ∞ slowly (e.g., k = o(log n)) as n → ∞



The term k+1 k is (2 − o(1)) √
gives us max {|λ2 | , |λn |} ≥ 2 d − 1 − o(1). 
Remark 3.6.6. The infinite d-regular graph Td is the universal cover of all d-regular
√ graphs (this
fact is used in the first step of the argument). The spectral radius of Td is 2 d − 1, which is the
fundamental reason why this number arises in the Alon–Boppana bound.
Let us return to Question 3.6.1: what is the smallest possible λ2 for n-vertex d-regular graphs,
with d fixed and n large? Alon’s second eigenvalue conjecture says that random d-regular graphs
match the Alon–Boppana bound. This was proved by Friedman (2008), a rather difficult result.

Theorem 3.6.7 (Friedman’s second eigenvalue theorem). Fix positive integer d and λ > 2 d − 1.
With probability 1 − o(1) as n → ∞ (with n even if d is odd), a uniformly chosen random n-vertex
d-regular graph is an (n, d, λ)-graph.

What about for λ = 2 d − 1 exactly? Much mystery remains.

Definition 3.6.8. A Ramanujan graph is an (n, d, λ)-graph with λ = 2 d − 1. In other words, √ it is
a d-regular graph whose adjacency matrix has all eigenvalues, except the top one, at most 2 d − 1
in absolute value.
A major open problem is to show the existence of infinite families of d-regular Ramanujan
graphs.

Conjecture 3.6.9 (Existence of Ramanujan graphs). For every positive integer d ≥ 3, there exist
infinitely many d-regular Ramanujan graphs.
While it is not too hard to construct small Ramanujan graphs, e.g., Kd+1 has eigenvalues λ1 = d
and λ2 = · · · = λn = −1, it is a difficult problem to construct infinitely many d-regular Ramanujan
graphs for each d.
The term Ramanujan graphs was coined by Lubotzky, Phillips, and Sarnak (1988), who
constructed infinite families of d-regular Ramanujan graphs when d − 1 is an odd prime. The same
result was independently proved by Margulis (1988). The proof of the eigenvalue bounds uses
deep results from number theory, namely solutions to the Ramanujan conjecture (hence the name).
These constructions were later extended by Morgenstern (1994) whenever d − 1 is a prime power.
The current state of Conjecture 3.6.9 is given below, and it remains open for all other d, with the
smallest open case being d = 7.

Theorem 3.6.10. If d − 1 is a prime power, then there exist infinitely many d-regular Ramanujan
graphs.
All known results are based on explicit constructions using Cayley graphs on PSL(2, q) or
related groups. We refer the reader to the book Davidoff, Sarnak, and Valette (2003) for a gentle
exposition of the construction.
Theorem 3.6.7 says that random d-regular graphs are “nearly-Ramanujan.” Empirical evidence
suggests that for each fixed d, a uniform random n-vertex d-regular graph is Ramanujan with
probability bounded away from 0 and 1, for large n. If this were true, it would prove Conjecture 3.6.9
on the existence of Ramanujan graphs. However, no rigorous results are known in this vein.
One can formulate a bipartite analog.
FURTHER READING 105

Definition
√ 3.6.11. A bipartite Ramanujan graph is some bipartite-(n, d, λ)-graph with λ =
2 d − 1.
Given a Ramanujan graph G, we can turn it into a bipartite Ramanujan graph G × K2 . So the
existence of bipartite Ramanujan graphs is weaker than of Ramanujan graphs. Nevertheless, for
a long time, it was not known how to construct infinite families of bipartite Ramanujan graphs
other than using Ramanujan graphs. A breakthrough by Marcus, Spielman, and Srivastava (2015)
completely settled the bipartite version of the problem. Unlike earlier construction of Ramanujan
graphs, their proof is existential (i.e., non-constructive) and introduces an important technique of
interlacing families of polynomials.

Theorem 3.6.12 (Bipartite Ramanujan graphs of every degree). For every d ≥ 3, there exist infin-
itely many d-regular bipartite Ramanujan graphs.
Exercise 3.6.13 (Alon–Boppana bound with multiplicity). Prove that for every positive integer d and
real  > 0, there is some√constant c > 0 so that every n-vertex d-regular graph has at least cn
eigenvalues greater than 2 d − 1 − .
Exercise 3.6.14∗ (Net removal decreases top eigenvalue). Show that for every d and r, there is
some  > 0 such that if G is a d-regular graph, and S ⊂ V(G) is such that every vertex of G is
within distance r of S, then the top eigenvalue of the adjacency matrix of G − S (i.e., remove S and
its incident edges from G) is at most d − .
Further reading
The survey Pseudo-random Graphs by Krivelevich and Sudakov (2006) discusses many com-
binatorial aspects of this topic.
Expander graphs are a large and intensely studied topic, partly due to many important appli-
cations in computer science. See the survey Expander Graphs and Their Applications by Hoory,
Linial, and Wigderson (2006). The survey Expander Graphs in Pure and Applied Mathematics by
Lubotzky (2012) as well as his book Discrete Groups, Expanding Graphs and Invariant Measures
(1994) go more in-depth into graph expansion and connections to algebra.
For spectral graph theory, see the book Spectral Graph Theory by Chung (1997), or the book
draft currently in progress by Spielman titled Spectral and Algebraic Graph Theory.
The textbook Elementary Number Theory, Group Theory and Ramanujan Graphs by Davidoff,
Sarnak, and Valette (2003) gives a gentle introduction to the construction of Ramanujan graphs.
The breakthrough by Marcus, Spielman, and Srivastava (2015) constructing bipartite Ramanu-
jan graphs via interlacing polynomials is an instant classic.
CHAPTER 4

Graph limits

The theory of graphs limits was developed by Lovász and his collaborators in a series of works
starting around 2003. The researchers were motivated by questions about very large graphs from
several different angles, including from combinatorics, statistical physics, computer science, and
applied math. Graph limits give an analytic framework for analyzing large graphs. The theory
offers both a convenient mathematical language as well as powerful theorems.
Suppose we lived in a hypothetical world where we only had access to rational numbers and
had no language for irrational numbers. We are given the following optimization problem:
minimize x 3 − x subject to 0 ≤ x ≤ 1.

The minimum occurs at x = 1/ 3, but this answer does not make sense over the rationals. With
only access to rationals, we can state a progressively improving sequence of answers that converge
to the optimum. This is rather cumbersome. It is much easier to write down a single real number
expressing the answer.
Now consider an analogous questions for graphs. Fix some real p ∈ [0, 1]. We want to
minimize (# closed walks of length 4)/n4
among n-vertex graphs with edge density ≥ p.
We know from Proposition 3.1.12 every n-vertex graph with edge density ≥ p has at least n4 p4
closed walks of length 4. On the other hand, every sequence of quasirandom graphs with edge
density p + o(1) has p4 n4 + o(n4 ) closed walks of length 4. It follows that the minimum (or rather,
infimum) is p4 , and is attained not by any single graph, but rather by a sequence of quasirandom
graphs.
One of the purposes of graph limits is to provide an easy-to-use mathematical object that
captures the limit of such graph sequences. The central object in theory of graph limits is called a
graphon (the word comes from combining graph and function), to be defined shortly. Graphons
can be viewed as an analytic generalization of graphs.
Here are some questions that we will consider:
(1) What does it mean for a sequence of graphs (or graphons) to converge?
(2) Are different notions of convergence equivalent?
(3) Does every convergent sequence of graphs (or graphons) have a limit?
Note that it is possible to talk about convergence without a limit. In a first real analysis
course, one learns about a Cauchy sequence in a metric space (X, d), which is some sequence
x1, x2, · · · ∈ X such that for every  > 0, there is some N so that d(xm, xn ) <  for all m, n ≥ N. For
instance, Q one can have a Cauchy sequence without a limit in Q. A metric space is complete if
every Cauchy sequence has a limit. The completion of X is some complete metric space X e such
that X is isometrically embedded in X as a dense subset. The completion of X is in some sense
e
the smallest complete space containing X. For example, R is the completion of Q. Intuitively, the
107
108 4. GRAPH LIMITS

completion of a space fills in all its the gaps. A basic result in analysis says that every space has a
unique completion.
Here is a key result about graph limits that we will prove:
The space of graphons is compact, and is the completion of graphs.
To make this statement precise, we also need to define a notion of similarity (i.e., distance) between
graphs, and also between graphons. We will see two different notions, one based on the cut metric,
and another based on subgraph densities. Another important result in the theory of graph limits
is that these two notions are equivalent. We will prove it at the end of the chapter once we have
developed some tools.

4.1. Graphons
Here is the central object in the theory of dense graph limits.
Definition 4.1.1. A graphon is a symmetric measurable function W : [0, 1]2 → [0, 1]. Here
symmetric means W(x, y) = W(y, x) for all x, y.
Remark 4.1.2. More generally, we can consider an arbitrary probability space Ω and study sym-
metric measurable functions Ω × Ω → [0, 1]. In practice, we do not lose much by restricting to
[0, 1].
We will also sometimes consider symmetric measurable functions [0, 1]2 → R (e.g., arising as
the difference between two graphons). Such an object is sometimes called a kernel in the literature.
Remark 4.1.3 (Measure theoretic technicalities). We try to sweep measure theoretic technicalities
under the rug so that we can focus on the key ideas. We always ignore measure zero differences.
For example, we shall treat two graphons as the same if they only differ on a measure zero subset
of the domain.
Here is a procedure to turn any graph G into a graphon WG :
(1) Write down the adjacency matrix AG of the graph;
(2) Replace the matrix by a black and white pixelated picture on [0, 1]2 , by turning every 1 entry
into a black square and every 0 entry into a white square.
(3) View the resulting picture as a graphon WG : [0, 1]2 → [0, 1] (with the axis labeled like a
matrix, i.e., x ∈ [0, 1] running from top to bottom and y ∈ [0, 1] running from left to right),
where we write WG (x, y) = 1 if (x, y) is black and WG (x, y) = 0 if (x, y) is white.
An equivalent definition is given below. As with everything in this chapter, we ignore measure
zero differences, and so it does not matter what we do with boundaries of the pixels.
Definition 4.1.4. Given a graph G with n vertices labeled 1, . . . , n, we define its associated graphon
WG : [0, 1]2 → [0, 1] by first partitioning [0, 1] into n equal-length intervals I1, . . . , In and setting
WG to be 1 on all Ii × I j where i j is an edge of G, and 0 on all other Ii × I j ’s.
More generally, we can encode nonnegative vertex and edge weights in a graphon.
Definition 4.1.5. A step-graphon W with k steps consists of first partitioning [0, 1] into k intervals
I1, . . . , Ik , and then setting W to be a constant on each Ii × I j .
Example 4.1.6 (Half-graph). Consider the bipartite graph on 2n vertices, with one vertex part
{v1, . . . , vn } and the other vertex part {w1, . . . , wn }, and edges vi w j whenever i ≤ j. Its adjacency
4.1. GRAPHONS 109

matrix and associated graphon are illustrated below.


1 7 000000111111
000000011111
2 8 000000001111
000000000111
3 9 000000000011
000000000001
4 10 100000000000
110000000000
5 11 111000000000
111100000000
6 12 111110000000
111111000000

As n → ∞, the associated graphons converge pointwise almost everywhere to the graphon


0 1
( 0
1 if x + y ≤ 1/2 or x + y ≥ 3/2,
W(x, y) =
0 otherwise.
1

In general, pointwise convergence turns out to be too restrictive. We will need a more flexible
notion of convergence, which we will discuss more in depth in the next section. Let us first give
some more examples to motivate subsequent definitions.
Example 4.1.7 (Quasirandom graphs). Let G n be a sequence of quasirandom graphs with edge
density approaching 1/2, and v(Gn ) → ∞. The constant graphon W ≡ 1/2 seems like a reasonable
candidate for its limit, and later we will see that this is indeed the case.

−→ 1
2

Example 4.1.8 (Stochastic block model). Consider an n vertex graph with two types of vertices:
red and blue. Half of the vertices are red and half of the vertices are blue. Two red vertices are
adjacent with probability pr , two blue vertices are adjacent with probability pb , and finally, a red
vertex and a blue vertex are adjacent with probability pr b , all independently. Then as n → ∞, the
graphs converge to the step-graphon shown below.

pr pr b
−→
pr b pb

The above examples suggest that the limiting graphon looks like a blurry image of the adjacency
matrix. However, there is an important caveat as illustrated in the next example.
Example 4.1.9 (Checkerboard). Consider the 2n × 2n “checkerboard” graphon shown below (for
n = 4).
1 2
3 4
5 6
7 8
Since the 0 and 1’s in the adjacency matrix are evenly spaced, one might suspect that the constant
1/2 graphon would be a limit of the sequence of graphons as the number of vertices tends to infinity.
However, this is not so. The checkerboard graphon is associated to the complete bipartite graph
110 4. GRAPH LIMITS

Kn,n , with the two vertex parts interleaved. By relabeling the vertices, we see that below is another
representation of the associated graphon of the same graph.
1 5
2 6
3 7
4 8

So the graphon is the same for all n. So the graphon shown on the right, which is also WK2 , must
be the limit of the sequence, and not the constant 1/2 graphon.
This example tells us that we must be careful about the possibility of rearranging vertices when
studying graph limits.
A graphon is an infinite dimensional object. We would like some ways to measure the similarity
between two graphons. We will explain two different approaches:
(1) cut distance, and
(2) homomorphism densities.
One of the main results in the theory of graph limits is that these two approaches are equivalent—we
will show this later in the chapter.

4.2. Cut distance


There are many ways to measure the distance between two graphs. Different methods may
be useful for different applications. For example, we can consider the edit distance between
two graphs (say on the same set of vertices), defined to be the number of edges added/deleted
to obtain one graph from the other. The notion of edit distance arose when discussing induced
graph removal lemmas in Section 2.8. However, edit distance is not suitable for graph limits since
it is incompatible with (quasi)random graphs. For example, given two n-vertex random graphs,
independently generated with edge-probability 1/2, we would like to say that they are similar as
these graphs will end up converging to the constant 1/2 graphon as n → ∞ (e.g., Example 4.1.7).
However, two independent random graphs typically only agree on around half of their edges (even
if we allow permuting vertices), and so it takes (1/4 + o(1))n2 edge addition/deletion to obtain one
from the other.
A more suitable notion of distance is motivated by the discrepancy condition from Theorem 3.1.1
on quasirandom graphs. Inspired by the condition DISC, we would want to say that a graph G is
-close to the constant p graphon if
|eG (X, Y ) − p |X | |Y || ≤  |V(G)| 2 for all X, Y ⊂ V(G).
Inspired by this notion, we now compare a pair of graphs G and G0 on a common vertex set
V = V(G) = V(G0). We say that G and G0 are -close in cut norm if
|eG (X, Y ) − eG 0 (X, Y )| ≤  |V | 2 for all X, Y ⊂ V . (4.2.1)
(This term “cut” is often used to refer to the set of edges in a graph G between some X ⊂ V(G) and
its complement. The cut norm builds on this concept.) With this notion, two independent n-vertex
random graphs with the same edge-probability are o(1)-close in cut norm as n → ∞.
As illustrated in Example 4.1.9, we also need to consider possible relabeling of vertices.
Intuitively, the cut distance between two graphs will come from the relabeling of vertices that
gives the greatest alignment. The actual definition will be a bit more subtle, allowing vertex
4.2. CUT DISTANCE 111

fractionalization. The general definition of cut distance will allow us to compare graphs with
different numbers of vertices. It is conceptually easier to define cut distance using graphons.
The edit distance of graphs corresponds to the L 1 distance for graphons. For every p ≥ 1, we
define the L p norm of a function W : [0, 1]2 → R by
∫  1/p
kW k p := p
|W(x, y)| dxdy ,
[0,1]2
and the L ∞ norm by
kW k ∞ := sup t : W −1 ([t, ∞)) has positive measure .


(This is not simply the supremum of W; the definition should be invariant under measure zero
changes of W.)
Definition 4.2.1. The cut norm of a measurable $W \colon [0,1]^2 \to \mathbb{R}$ is defined as
$$\|W\|_\square := \sup_{S, T \subseteq [0,1]} \left| \int_{S \times T} W \right|,$$
where $S$ and $T$ range over measurable sets.


Let $G$ and $G'$ be two graphs sharing a common vertex set. Let $W_G$ and $W_{G'}$ be their associated graphons (using the same ordering of vertices when constructing the graphons). Then $G$ and $G'$ are $\epsilon$-close in cut norm (see (4.2.1)) if and only if
$$\|W_G - W_{G'}\|_\square \le \epsilon.$$
(There is a subtlety in this claim that is worth thinking about: should we be worried about sets $S, T \subseteq [0,1]$ in Definition 4.2.1 of the cut norm that contain fractions of some intervals that represent vertices? See Lemma 4.5.3 for a reformulation of the cut norm that may shed some light.)
We need a concept for an analog of a vertex set permutation for graphons. We write
λ(A) = Lebesgue measure of A.
Intuitively, this is the “length” of A. We will always be referring to Lebesgue measurable sets
(measure theoretic technicalities are not central to the discussions here, so feel free to ignore them).
Definition 4.2.2. We say that φ : [0, 1] → [0, 1] is a measure preserving map if
λ(A) = λ(φ−1 (A)) for all measurable A ⊂ [0, 1].
We say that φ is an invertible measure preserving map if there is another measure preserving map
ψ : [0, 1] → [0, 1] such that φ ◦ ψ and ψ ◦ φ are both identity maps outside a set of measure zero.
Example 4.2.3. For any constant $\alpha \in \mathbb{R}$, the function $\varphi(x) = x + \alpha \bmod 1$ is measure-preserving (this map rotates the circle $\mathbb{R}/\mathbb{Z}$ by $\alpha$).
A more interesting example is $\varphi(x) = 2x \bmod 1$, illustrated below.
[Figure: the graph of $f(x) = 2x \bmod 1$ on $[0,1]$, with a set $A$ marked on the vertical axis and its preimage $f^{-1}(A)$, a union of two intervals, marked on the horizontal axis.]
This map is also measure-preserving. This might not seem to be the case at first, since f seems
to shrink some intervals by half. However, the definition of measure-preserving actually says
λ( f −1 (A)) = λ(A) and not λ( f (A)) = λ(A). For any interval [a, b] ⊂ [0, 1], we have f −1 ([a, b]) =
[a/2, b/2] ∪ [1/2 + a/2, 1/2 + b/2], which does have the same measure as [a, b]. This map is 2-to-1,
and it is not invertible.
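A quick numerical sanity check of this measure-preserving property (an illustration, not from the text): estimate $\lambda(A)$ and $\lambda(f^{-1}(A))$ for an interval $A$ by uniform sampling; both come out to about the same value, while $\lambda(f(A))$ would not behave this way.

```python
import random

f = lambda x: (2 * x) % 1.0        # the doubling map on [0, 1)
a, b = 0.3, 0.7                    # the interval A = [0.3, 0.7], of measure 0.4
N = 200_000
frac_in_A = sum(a <= random.random() <= b for _ in range(N)) / N
frac_in_preimage = sum(a <= f(random.random()) <= b for _ in range(N)) / N
print(frac_in_A, frac_in_preimage)  # both approximately 0.4
```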
Given $W \colon [0,1]^2 \to \mathbb{R}$ and an invertible measure preserving map $\varphi \colon [0,1] \to [0,1]$, we write
$$W^\varphi(x, y) := W(\varphi(x), \varphi(y)).$$
Intuitively, this operation relabels the vertex set.
Definition 4.2.4 (Cut metric). Given two symmetric measurable functions $U, W \colon [0,1]^2 \to \mathbb{R}$, we define their cut distance (or cut metric) to be
$$\delta_\square(U, W) := \inf_\varphi \|U - W^\varphi\|_\square = \inf_\varphi \sup_{S, T \subseteq [0,1]} \left| \int_{S \times T} \bigl( U(x,y) - W(\varphi(x), \varphi(y)) \bigr) \, dx \, dy \right|,$$
where the infimum is taken over all invertible measure preserving maps $\varphi \colon [0,1] \to [0,1]$. Define the cut distance between two graphs $G$ and $G'$ by the cut distance of their associated graphons:
$$\delta_\square(G, G') := \delta_\square(W_G, W_{G'}).$$
Likewise, we can also define the cut distance between a graph $G$ and a graphon $U$:
$$\delta_\square(G, U) := \delta_\square(W_G, U).$$
Definition 4.2.5 (Convergence in cut metric). We say that a sequence of graphs or graphons converges in cut metric if they form a Cauchy sequence with respect to $\delta_\square$. Furthermore, we say that $W_n$ converges to $W$ in cut metric if $\delta_\square(W_n, W) \to 0$ as $n \to \infty$.
Note that in $\delta_\square(G, G')$, we are doing more than just permuting vertices. A measure preserving map on $[0,1]$ is also allowed to split a single vertex into fractions.
It is possible for two different graphons to have cut distance zero. For example, they could differ on a measure-zero set, or could be related via measure-preserving maps. We can form a metric space by identifying graphons at cut distance zero.
Definition 4.2.6 (Graphon space). Let $\widetilde{\mathcal{W}}_0$ be the set of graphons (i.e., symmetric measurable functions $[0,1]^2 \to [0,1]$) where any pair of graphons with cut distance zero are considered the same point in the space. This is a metric space under the cut distance $\delta_\square$.
We view every graph $G$ as a point in $\widetilde{\mathcal{W}}_0$ via its associated graphon (note that several graphons can be identified as the same point in $\widetilde{\mathcal{W}}_0$).
(The subscript 0 in $\widetilde{\mathcal{W}}_0$ is conventional. Sometimes, without the subscript, $\widetilde{\mathcal{W}}$ is used to denote the space of symmetric measurable functions $[0,1]^2 \to \mathbb{R}$.)
Here is a central theorem in the theory of graph limits, proved by Lovász and Szegedy (2007).

Theorem 4.2.7 (Compactness of graphon space). The metric space $(\widetilde{\mathcal{W}}_0, \delta_\square)$ is compact.
One of the main goals of this chapter is to prove this theorem and show its applications.
The compactness of graphon space is related to the graph regularity lemma. In fact, we will use
the regularity method to prove compactness. Both compactness and the graph regularity lemma
tell us that despite the infinite variability of graphs, every graph can be $\epsilon$-approximated by a graph from a finite set of templates.
We close this section with the following observation.

Theorem 4.2.8 (Graphs are dense in graphons). The set of graphs is dense in $(\widetilde{\mathcal{W}}_0, \delta_\square)$.

Proof. Let $\epsilon > 0$. It suffices to show that for every graphon $W$ there exists a graph $G$ such that $\delta_\square(G, W) < \epsilon$.
We approximate $W$ in several steps, illustrated below.
[Figure: the graphon $W$, its discretization $W_1$, and a step graphon $W_2$ approximating $W_1$.]
First, by rounding down the values of $W(x,y)$, we construct a graphon $W_1$ whose values are all integer multiples of $\epsilon/3$, and such that
$$\|W - W_1\|_\infty \le \epsilon/3.$$
Next, since every Lebesgue measurable subset of $[0,1]^2$ can be arbitrarily well approximated using a union of boxes, we can find a step graphon $W_2$ approximating $W_1$ in $L^1$ norm:
$$\|W_1 - W_2\|_1 \le \epsilon/3.$$
Finally, by replacing each block of $W_2$ by a sufficiently large quasirandom (bipartite) graph of edge density equal to the value of $W_2$ (cf. Example 4.1.8), we find a graph $G$ so that
$$\|W_2 - W_G\|_\square \le \epsilon/3.$$
Then $\delta_\square(W, G) < \epsilon$.
Remark 4.2.9. In the above proof, to obtain $\|W_1 - W_2\|_1 \le \epsilon/3$, the number of steps of $W_2$ cannot be uniformly bounded as a function of $\epsilon$, i.e., it must depend on $W$ as well (why? think about a random graph). Consequently the number of vertices of the final graph $G$ produced by this proof is not bounded by a function of $\epsilon$.
Later on, we will see a different proof showing that for every $\epsilon > 0$, there is some $N(\epsilon)$ so that every graphon lies within cut distance $\epsilon$ of some graph with at most $N(\epsilon)$ vertices (Proposition 4.8.1).
Since every compact metric space is complete, we have the following corollary.

Corollary 4.2.10. The graphon space $(\widetilde{\mathcal{W}}_0, \delta_\square)$ is the completion of the space of graphs with respect to the cut metric.
Exercise 4.2.11 (Zero-one valued graphons). Let $W$ be a $\{0,1\}$-valued graphon. Suppose graphons $W_n$ satisfy $\|W_n - W\|_\square \to 0$ as $n \to \infty$. Show that $\|W_n - W\|_1 \to 0$ as $n \to \infty$.

4.3. Homomorphism density


Subgraph densities give another way of measuring graphs. It will be technically more convenient to work with graph homomorphisms instead of subgraphs.
Definition 4.3.1. A graph homomorphism from $F$ to $G$ is a map $\varphi \colon V(F) \to V(G)$ such that if $uv \in E(F)$ then $\varphi(u)\varphi(v) \in E(G)$ (i.e., $\varphi$ maps edges to edges). Define
$$\operatorname{Hom}(F, G) := \{\text{homomorphisms from } F \text{ to } G\}$$
and
$$\hom(F, G) := |\operatorname{Hom}(F, G)|.$$
Define the $F$-homomorphism density in $G$ (or $F$-density in $G$ for short) as
$$t(F, G) := \frac{\hom(F, G)}{v(G)^{v(F)}}.$$
This is also the probability that a uniformly random map $V(F) \to V(G)$ is a graph homomorphism from $F$ to $G$.
Example 4.3.2.
• hom(K1, G) = v(G).
• hom(K2, G) = 2e(G).
• $\hom(K_3, G) = 6 \cdot \#\{\text{triangles in } G\}$.
• $\hom(G, K_3)$ is the number of proper colorings of $G$ using three labeled colors, e.g., {red, green, blue} (corresponding to the vertices of $K_3$).
Remark 4.3.3 (Subgraphs versus homomorphisms). Note that the homomorphisms from F to G
do not quite correspond to copies of subgraphs F inside G, because the homomorphisms can be
non-injective. Define the injective homomorphism density
$$t_{\mathrm{inj}}(F, G) := \frac{\#\{\text{injective homomorphisms from } F \text{ to } G\}}{v(G)(v(G)-1)\cdots(v(G)-v(F)+1)}.$$
Equivalently, this is the fraction of injective maps $V(F) \to V(G)$ that are graph homomorphisms (i.e., send edges to edges). The fraction of maps $V(F) \to V(G)$ that are non-injective is at most $\binom{v(F)}{2}/v(G)$ (for every fixed pair of vertices of $F$, the probability that they collide is exactly $1/v(G)$). So
$$\left| t(F, G) - t_{\mathrm{inj}}(F, G) \right| \le \frac{1}{v(G)} \binom{v(F)}{2}.$$
If F is fixed, the right-hand side tends to zero as v(G) → ∞. So all but a negligible fraction of
such homomorphisms correspond to subgraphs. This is why we often treat subgraph densities
interchangeably with homomorphism densities as they agree in the limit.
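For small graphs, the quantities above can be computed directly by brute force. The sketch below (hypothetical helper code, not part of the text) enumerates all maps and all injective maps $V(F) \to V(G)$; for $F = K_3$ and $G = K_4$ it returns $t = 24/4^3 = 0.375$ and $t_{\mathrm{inj}} = 24/(4 \cdot 3 \cdot 2) = 1$, illustrating how the two densities can differ substantially when $v(G)$ is small.

```python
from itertools import product, permutations

def is_hom(phi, F_edges, G_adj):
    # phi: dict mapping V(F) -> V(G); check every edge of F lands on an edge of G.
    return all(G_adj[phi[u]][phi[v]] for (u, v) in F_edges)

def hom_density(F_vertices, F_edges, G_adj):
    # t(F, G): fraction of all maps V(F) -> V(G) that are homomorphisms.
    n = len(G_adj)
    count = sum(is_hom(dict(zip(F_vertices, img)), F_edges, G_adj)
                for img in product(range(n), repeat=len(F_vertices)))
    return count / n ** len(F_vertices)

def inj_hom_density(F_vertices, F_edges, G_adj):
    # t_inj(F, G): fraction of injective maps V(F) -> V(G) that are homomorphisms.
    n = len(G_adj)
    images = list(permutations(range(n), len(F_vertices)))
    count = sum(is_hom(dict(zip(F_vertices, img)), F_edges, G_adj) for img in images)
    return count / len(images)

F_vertices = [0, 1, 2]
F_edges = [(0, 1), (1, 2), (2, 0)]                           # F = K_3
G_adj = [[int(i != j) for j in range(4)] for i in range(4)]  # G = K_4
print(hom_density(F_vertices, F_edges, G_adj))      # 0.375
print(inj_hom_density(F_vertices, F_edges, G_adj))  # 1.0
```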
Now we define the corresponding notion of homomorphism density in graphons. We first give
an example and then the general formula.
Example 4.3.4 (Triangle density in graphons). The following quantity is the triangle density in a graphon $W$:
$$t(K_3, W) = \int_{[0,1]^3} W(x,y) W(y,z) W(z,x) \, dx \, dy \, dz.$$
This definition agrees with Definition 4.3.1 for the triangle density in graphs. Indeed, for every graph $G$, the triangle density in $G$ equals the triangle density in the associated graphon $W_G$, i.e., $t(K_3, W_G) = t(K_3, G)$.
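For intuition, such integrals can be estimated by Monte Carlo sampling. The snippet below (an illustration, not part of the text) uses the graphon $W(x,y) = xy$, for which $t(K_3, W) = (\int_0^1 x^2 \, dx)^3 = 1/27 \approx 0.037$.

```python
import random

def triangle_density(W, samples=200_000):
    # Monte Carlo estimate of t(K_3, W) = E[W(x,y) W(y,z) W(z,x)]
    # for x, y, z independent and uniform on [0, 1].
    total = 0.0
    for _ in range(samples):
        x, y, z = random.random(), random.random(), random.random()
        total += W(x, y) * W(y, z) * W(z, x)
    return total / samples

W = lambda x, y: x * y       # a simple graphon with t(K_3, W) = 1/27
print(triangle_density(W))   # approximately 0.037
```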
Definition 4.3.5. Let $F$ be a graph and $W$ a graphon. The $F$-density in $W$ is defined to be
$$t(F, W) = \int_{[0,1]^{V(F)}} \prod_{ij \in E(F)} W(x_i, x_j) \prod_{i \in V(F)} dx_i.$$
We also use the same formula when $W$ is a symmetric measurable function.

Note that for all graphs F and G, letting WG be the graphon associated to G,
t(F, G) = t(F, WG ). (4.3.1)
So the two definitions of F-density agree.
Definition 4.3.6. We say that a sequence of graphons $W_n$ is left-convergent if for every graph $F$, $t(F, W_n)$ converges as $n \to \infty$. We say that this sequence left-converges to a graphon $W$ if $\lim_{n\to\infty} t(F, W_n) = t(F, W)$ for every graph $F$.
For a sequence of graphs, we say that it is left-convergent if the sequence of associated graphons $W_n = W_{G_n}$ is left-convergent, and that it left-converges to $W$ if $W_n$ does.
One usually has v(Gn ) → ∞, but it is not strictly necessary for this definition. Note that when
v(Gn ) → ∞, homomorphism densities and subgraph densities coincide (see Remark 4.3.3).
It turns out that left-convergence is equivalent to convergence in cut metric. This foundational
result in graph limits is due to Borgs, Chayes, Lovász, Sós, and Vesztergombi (2008).

Theorem 4.3.7 (Equivalence of convergence). A sequence of graphons is left-convergent if and only if it is a Cauchy sequence with respect to the cut metric $\delta_\square$.
The sequence left-converges to some graphon $W$ if and only if it converges to $W$ in cut metric.
The implication that convergence in cut metric implies left-convergence is easier; it follows
from the counting lemma (Section 4.5). The converse is more difficult, and we will establish it at
the end of the chapter.
This allows us to talk about convergent sequences of graphs or graphons without specifying
whether we are referring to left-convergence or convergence in cut metric. However, since a major
goal of this chapter is to prove the equivalence between these two notions, we will be more specific
about the notion of convergence.
From the compactness of the space of graphons and the equivalence of convergence (actually
only needing the easier implication), we will be able to quickly deduce the existence of limit for
left-convergence, which was first proved by Lovász and Szegedy (2006).

Theorem 4.3.8 (Existence of limit for left-convergence). Every left-convergent sequence of graphs
or graphons left-converges to some graphon.
Remark 4.3.9. One can artificially define a metric that coincides with left-convergence. Let $(F_n)_{n \ge 1}$ enumerate all graphs. One can define a distance between graphons $U$ and $W$ by
$$\sum_{k \ge 1} 2^{-k} \left| t(F_k, W) - t(F_k, U) \right|.$$
We see that a sequence of graphons converges under this notion of distance if and only if it is left-convergent. This shows that left-convergence defines a metric topology on the space of graphons, but in practice the above distance is pretty useless.
Exercise 4.3.10. Define W : [0, 1]2 → R by W(x, y) = 2 cos(2π(x − y)). Let F be a graph. Show
that t(F, W) is the number of ways to orient all edges of F so that every vertex has the same number
of incoming edges as outgoing edges.

4.4. W-random graphs


In this section, we explain how to use a graphon to create a random graph model. This hopefully
gives more intuition about graphons.
The most common random graph model is the Erdős–Rényi random graph $G(n, p)$: an $n$-vertex graph in which each pair of vertices forms an edge independently with probability $p$.
The stochastic block model is a random graph model that generalizes the Erdős–Rényi random
graph. We already saw an example in Example 4.1.8. Let us first illustrate the two-block model,
which has several parameters:
$$\begin{array}{c|cc} & q_r & q_b \\ \hline q_r & p_{rr} & p_{rb} \\ q_b & p_{rb} & p_{bb} \end{array}$$

with all the numbers lying in $[0,1]$, and subject to $q_r + q_b = 1$. We form an $n$-vertex random graph
as follows:
(1) Color each vertex red with probability $q_r$ and blue with probability $q_b$, independently at random. These vertex colors are "hidden states" and are not part of the data of the output random graph (this step is slightly different from Example 4.1.8 in an unimportant way);
(2) For every pair of vertices, independently place an edge between them with probability
• $p_{rr}$ if both vertices are red,
• $p_{bb}$ if both vertices are blue, and
• $p_{rb}$ if one vertex is red and the other is blue.
One can easily generalize the above to a k-block model, where vertices have one of k hidden
states, with q1, . . . , qk (adding up to 1) being the vertex state probabilities, and a symmetric k × k
matrix (pi j )1≤i, j≤k of edge probabilities for pairs of vertices between various states.
The W-random graph is a further generalization of stochastic block models, which correspond
to step-graphons W.
Definition 4.4.1 (W -random graph). Let W be a graphon. The n-vertex W-random graph G(n, W)
denotes the n-vertex random graph (with vertices labeled 1, . . . , n) obtained by first picking
x1, . . . , xn uniformly at random from [0, 1], and then putting an edge between vertices i and j
with probability W(xi, x j ), independently for all 1 ≤ i < j ≤ n.

[Figure: a graphon $W$ with sample points $x_1, \dots, x_5$ marked on both axes, and the resulting $W$-random graph on vertices $1, \dots, 5$.]
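A minimal sampler following Definition 4.4.1 directly (an illustrative sketch; the particular step graphon used in the demonstration is an arbitrary choice). Taking $W$ to be a step graphon recovers the stochastic block model described above.

```python
import random

def sample_W_random_graph(n, W):
    # Sample G(n, W): pick x_1, ..., x_n uniformly in [0, 1], then join i and j
    # independently with probability W(x_i, x_j).
    xs = [random.random() for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if random.random() < W(xs[i], xs[j])]
    return xs, edges

def W_block(x, y):
    # A two-block step graphon: a stochastic block model with q_r = q_b = 1/2
    # and edge probabilities p_rr = 0.8, p_bb = 0.6, p_rb = 0.1.
    if x < 0.5 and y < 0.5:
        return 0.8
    if x >= 0.5 and y >= 0.5:
        return 0.6
    return 0.1

xs, edges = sample_W_random_graph(100, W_block)
print(len(edges))   # expected value is 0.4 * (100 choose 2) = 1980
```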
Let us show that these $W$-random graphs left-converge to $W$ with probability 1.

Theorem 4.4.2 ($W$-random graphs left-converge to $W$). Let $W$ be a graphon. For each $n$, let $G_n$ be a random graph distributed as $G(n, W)$. Then $G_n$ left-converges to $W$ with probability 1.
Remark 4.4.3. The theorem does not require each $G_n$ to be sampled independently. For example, we can construct the sequence of random graphs, with $G_n$ distributed as $G(n, W)$, by revealing one vertex at a time without resampling the previous vertices and edges. In this case, each $G_n$ is a subgraph of the next graph $G_{n+1}$.
We will need the following standard result about concentration of Lipschitz functions. This can
be proved using Azuma’s inequality. See, e.g., (Alon and Spencer 2016, Chapter 7).

Theorem 4.4.4 (Bounded difference inequality). Let $X_1 \in \Omega_1, \dots, X_n \in \Omega_n$ be independent random variables. Suppose $f \colon \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$ is $L$-Lipschitz for some constant $L$ in the sense of satisfying
$$|f(x_1, \dots, x_n) - f(x_1', \dots, x_n')| \le L \tag{4.4.1}$$
whenever $(x_1, \dots, x_n)$ and $(x_1', \dots, x_n')$ differ on exactly one coordinate. Then the random variable $Z = f(X_1, \dots, X_n)$ satisfies, for every $\lambda \ge 0$,
$$\mathbb{P}(Z - \mathbb{E}Z \ge \lambda L) \le e^{-2\lambda^2/n} \quad \text{and} \quad \mathbb{P}(Z - \mathbb{E}Z \le -\lambda L) \le e^{-2\lambda^2/n}.$$
Let us show that the F-density in a W-random graph rarely differs significantly from t(F, W).

Theorem 4.4.5 (Sample concentration for graphons). For every $\epsilon > 0$, positive integer $n$, graph $F$, and graphon $W$, we have
$$\mathbb{P}\bigl( |t(F, G(n,W)) - t(F, W)| > \epsilon \bigr) \le 2 \exp\left( \frac{-\epsilon^2 n}{8 v(F)^2} \right). \tag{4.4.2}$$
Proof. Recall from Remark 4.3.3 that the injective homomorphism density $t_{\mathrm{inj}}(F, G)$ is defined to be the fraction of injective maps $V(F) \to V(G)$ that carry every edge of $F$ to an edge of $G$. We will first prove that
$$\mathbb{P}\bigl( |t_{\mathrm{inj}}(F, G(n,W)) - t(F, W)| > \epsilon \bigr) \le 2 \exp\left( \frac{-\epsilon^2 n}{2 v(F)^2} \right). \tag{4.4.3}$$
Let $y_1, \dots, y_n$, and $z_{ij}$ for each $1 \le i < j \le n$, be independent uniform random variables in $[0,1]$. Let $G$ be the graph on vertices $\{1, \dots, n\}$ with an edge between $i$ and $j$ if and only if $z_{ij} \le W(y_i, y_j)$, for every $i < j$. Then $G$ has the same distribution as $G(n, W)$. Let us group the variables $y_i, z_{ij}$ into $x_1, x_2, \dots, x_n$ where
$$x_1 = (y_1), \quad x_2 = (y_2, z_{12}), \quad x_3 = (y_3, z_{13}, z_{23}), \quad x_4 = (y_4, z_{14}, z_{24}, z_{34}), \quad \dots$$
This amounts to exposing the graph $G$ one vertex at a time. Define the function $f(x_1, \dots, x_n) = t_{\mathrm{inj}}(F, G)$. Note that $\mathbb{E} f = \mathbb{E}\, t_{\mathrm{inj}}(F, G(n,W)) = t(F, W)$ by linearity of expectation (in this step, it is important that we are using the injective variant of homomorphism densities). Note that changing a single coordinate $x_i$ changes the value of $f$ by at most $v(F)/n$, since exactly a $v(F)/n$ fraction of injective maps $V(F) \to V(G)$ hit any specific $v \in V(G)$ in the image. Then (4.4.3) follows from the bounded difference inequality, Theorem 4.4.4.
To deduce the theorem from (4.4.3), recall from Remark 4.3.3 that
$$|t(F, G) - t_{\mathrm{inj}}(F, G)| \le v(F)^2 / (2 v(G)).$$
If $\epsilon < v(F)^2/n$, then the right-hand side of (4.4.2) is at least $2 e^{-\epsilon/8} \ge 1$, and so the inequality trivially holds. Otherwise, $|t(F, G(n,W)) - t(F, W)| > \epsilon$ implies $|t_{\mathrm{inj}}(F, G(n,W)) - t(F, W)| > \epsilon - v(F)^2/(2n) \ge \epsilon/2$, and then we can apply (4.4.3) with $\epsilon/2$ in place of $\epsilon$ to conclude.
Theorem 4.4.2 then follows from the Borel–Cantelli lemma, stated below, applied to Theorem 4.4.5 together with a union bound over all graphs $F$ and all rational $\epsilon > 0$.

Theorem 4.4.6 (Borel–Cantelli lemma). Given a sequence of events $E_1, E_2, \dots$, if $\sum_n \mathbb{P}(E_n) < \infty$, then with probability 1, only finitely many of them occur.

4.5. Counting lemma


In Chapter 2 on the graph regularity lemma, we proved a counting lemma to lower bound the number of copies of some fixed graph $H$ in a graph with a regularity partition. The same techniques can be modified to give a similar upper bound. Here we prove another graph counting lemma. The proof is more analytic, whereas the previous proofs in Chapter 2 were more combinatorial (embedding one vertex at a time).

Theorem 4.5.1 (Counting lemma). Let $F$ be a graph. Let $W$ and $U$ be graphons. Then
$$|t(F, W) - t(F, U)| \le e(F)\, \delta_\square(W, U).$$
Qualitatively, the counting lemma tells us that for every graph $F$, the function $t(F, \cdot)$ is continuous in $(\widetilde{\mathcal{W}}_0, \delta_\square)$, the graphon space with respect to the cut metric. It implies the easier direction of the equivalence in Theorem 4.3.7, namely that convergence in cut metric implies left-convergence.

Corollary 4.5.2. Every Cauchy sequence of graphons with respect to the cut metric is left-
convergent.
In the rest of this section, we prove Theorem 4.5.1. It suffices to prove that
$$|t(F, W) - t(F, U)| \le e(F) \|W - U\|_\square. \tag{4.5.1}$$
Indeed, for every invertible measure preserving map $\varphi \colon [0,1] \to [0,1]$, we have $t(F, U) = t(F, U^\varphi)$. By considering the above inequality with $U$ replaced by $U^\varphi$, and taking the infimum over all $U^\varphi$, we obtain Theorem 4.5.1.
The following reformulation of the cut norm is often quite useful.

Lemma 4.5.3 (Reformulation of cut norm). For every measurable $W \colon [0,1]^2 \to \mathbb{R}$,
$$\|W\|_\square = \sup_{\substack{u, v \colon [0,1] \to [0,1] \\ \text{measurable}}} \left| \int_{[0,1]^2} W(x,y) u(x) v(y) \, dx \, dy \right|.$$
Proof. We want to show (the left-hand side below is how we defined the cut norm in Definition 4.2.1)
$$\sup_{\substack{S, T \subseteq [0,1] \\ \text{measurable}}} \left| \int_{[0,1]^2} W(x,y) 1_S(x) 1_T(y) \, dx \, dy \right| = \sup_{\substack{u, v \colon [0,1] \to [0,1] \\ \text{measurable}}} \left| \int_{[0,1]^2} W(x,y) u(x) v(y) \, dx \, dy \right|.$$
The right-hand side is at least as large as the left-hand side since we can take u = 1S and v = 1T .
On the other hand, the integral on the right-hand side is bilinear in u and v, and so it is always
possible to change u and v to {0, 1}-valued functions without decreasing the value of the integral
(e.g., think about what is the best choice for v with u held fixed, and vice versa). If u and v are
restricted to {0, 1}-valued functions, then the two sides are identical. 

As a warm up, let us illustrate the proof of the triangle counting lemma, which has all the ideas
of the general proof but with simpler notation.

Proposition 4.5.4. Let $W$ and $U$ be graphons. Then
$$|t(K_3, W) - t(K_3, U)| \le 3 \|W - U\|_\square.$$
Proof. Given three graphons $W_{12}$, $W_{13}$, $W_{23}$, define
$$t(W_{12}, W_{13}, W_{23}) = \int_{[0,1]^3} W_{12}(x,y) W_{13}(x,z) W_{23}(y,z) \, dx \, dy \, dz.$$

So
$$t(K_3, W) = t(W, W, W) \quad \text{and} \quad t(K_3, U) = t(U, U, U).$$
Note that $t(W_{12}, W_{13}, W_{23})$ is trilinear in $W_{12}, W_{13}, W_{23}$. In particular, for any graphons $W$ and $U$, we have
$$t(W, W, W) - t(U, W, W) = \int_{[0,1]^3} (W - U)(x,y) W(x,z) W(y,z) \, dx \, dy \, dz.$$
For any fixed $z$, note that $x \mapsto W(x,z)$ and $y \mapsto W(y,z)$ are both measurable functions $[0,1] \to [0,1]$. So applying Lemma 4.5.3 gives
$$\left| \int_{[0,1]^2} (W - U)(x,y) W(x,z) W(y,z) \, dx \, dy \right| \le \|W - U\|_\square$$
for every $z$. Now integrating over all $z$ and applying the triangle inequality, we obtain
$$|t(W, W, W) - t(U, W, W)| \le \|W - U\|_\square.$$
We have similar inequalities in the other two coordinates. We can write
$$t(W, W, W) - t(U, U, U) = t(W, W, W - U) + t(W, W - U, U) + t(W - U, U, U).$$
By the above argument, each term on the right-hand side is at most $\|W - U\|_\square$ in absolute value. So the result follows.

The above proof generalizes in a straightforward way to a general graph counting lemma.

Proof. Given a collection of graphons $W_e$ indexed by the edges $e$ of $F$, define
$$t_F(W_e : e \in E(F)) = \int_{[0,1]^{V(F)}} \prod_{ij \in E(F)} W_{ij}(x_i, x_j) \prod_{i \in V(F)} dx_i.$$
In particular, this quantity equals $t(F, W)$ if $W_e = W$ for all $e \in E(F)$. A straightforward generalization of the triangle case shows that if we change exactly one argument in the above function from $W$ to $U$, then the value changes by at most $\|W - U\|_\square$ in absolute value. Thus, starting with $t_F(W_e : e \in E(F))$ with every $W_e = W$, we can change each argument from $W$ to $U$, one by one, resulting in a total change of at most $e(F) \|W - U\|_\square$. This proves (4.5.1), and hence the theorem.
4.6. Weak regularity lemma


In Chapter 2, we defined an $\epsilon$-regular vertex partition of a graph to be a partition such that all but an $\epsilon$-fraction of pairs of vertices lie between $\epsilon$-regular pairs of vertex parts. The number of parts is at most an exponential tower of height $O(\epsilon^{-5})$.
The goal of this section is to introduce a weaker version of regularity, requiring substantially fewer parts for the partition. The guarantee provided by the partition can be captured by the cut norm.
Let us first state this notion for a graph and then for a graphon.
Definition 4.6.1 (Weak regular partition for graphs). Given a graph $G$, a partition $\mathcal{P} = \{V_1, \dots, V_k\}$ of $V(G)$ is called weak $\epsilon$-regular if for all $A, B \subseteq V(G)$,
$$\left| e(A, B) - \sum_{i,j=1}^{k} d(V_i, V_j)\, |A \cap V_i|\, |B \cap V_j| \right| \le \epsilon\, v(G)^2.$$

Remark 4.6.2 (Interpreting weak regularity). Given $A, B \subseteq V(G)$, suppose one only knew how many vertices of $A$ and $B$ lie in each part of the partition (and not specifically which vertices), and was asked to predict the number of edges between $A$ and $B$. The sum above is the number of edges between $A$ and $B$ that one would naturally expect based on the edge densities between vertex parts. Being weak $\epsilon$-regular says that this prediction is roughly correct.
Weak regularity is more "global" compared to the notion of $\epsilon$-regular partition from Chapter 2. Here $A$ and $B$ can have size a constant fraction of the entire vertex set, rather than being subsets of individual parts of the partition. The edge densities between certain pairs $A \cap V_i$ and $B \cap V_j$ could differ significantly from that of $V_i$ and $V_j$. All we ask is that on average these discrepancies mostly cancel out.
The following weak regularity lemma was proved by Frieze and Kannan (1999), initially
motivated by algorithmic applications that we will mention in Remark 4.6.11.

Theorem 4.6.3 (Weak regularity lemma for graphs). Let $0 < \epsilon < 1$. Every graph has a weak $\epsilon$-regular partition into at most $4^{1/\epsilon^2}$ vertex parts.
Now let us state the corresponding notions for graphons.
Definition 4.6.4 (Stepping operator). Given a symmetric measurable function $W \colon [0,1]^2 \to \mathbb{R}$ and a measurable partition $\mathcal{P} = \{S_1, \dots, S_k\}$ of $[0,1]$, define a symmetric measurable function $W_{\mathcal{P}} \colon [0,1]^2 \to \mathbb{R}$ by setting its value on each $S_i \times S_j$ to be the average value of $W$ over $S_i \times S_j$ (since we only care about functions up to measure zero sets, we can ignore all parts $S_i$ with measure zero).
In other words, $W_{\mathcal{P}}$ is a step-graphon with steps given by $\mathcal{P}$ and values given by averaging $W$ over the steps.
Remark 4.6.5. The stepping operator is the orthogonal projection in the Hilbert space $L^2([0,1]^2)$ onto the subspace of functions constant on each step $S_i \times S_j$. It can also be viewed as the conditional expectation with respect to the $\sigma$-algebra generated by the sets $S_i \times S_j$.
Definition 4.6.6 (Weak regular partition for graphons). Given a graphon $W$, we say that a measurable partition $\mathcal{P}$ of $[0,1]$ into finitely many parts is weak $\epsilon$-regular if
$$\|W - W_{\mathcal{P}}\|_\square \le \epsilon.$$
Theorem 4.6.7 (Weak regularity lemma for graphons). Let $0 < \epsilon < 1$. Then every graphon has a weak $\epsilon$-regular partition into at most $4^{1/\epsilon^2}$ parts.
Remark 4.6.8. Technically speaking, Theorem 4.6.3 does not follow from Theorem 4.6.7 since the
partition of [0, 1] for WG could split intervals corresponding to individual vertices of G. However,
the proofs of the two claims are exactly the same. Alternatively, one can allow a more flexible
definition of a graphon as symmetric measurable functions W : Ω × Ω → [0, 1], and then take Ω to
be the discrete probability space V(G) endowed with the uniform measure.
Like the proof of the regularity lemma in Section 2.1, we use an energy increment strategy.
Recall from Definition 2.1.8 that the energy of a vertex partition is the mean-squared edge-density
between parts. Given a graphon W, we define the energy of a measurable partition P = {S1, . . . , Sk }
of [0, 1] by
$$\|W_{\mathcal{P}}\|_2^2 = \int_{[0,1]^2} W_{\mathcal{P}}(x,y)^2 \, dx \, dy = \sum_{i,j=1}^{k} \lambda(S_i) \lambda(S_j) \bigl( \text{average of } W \text{ on } S_i \times S_j \bigr)^2.$$
Given $W, U \colon [0,1]^2 \to \mathbb{R}$, we write
$$\langle W, U \rangle := \int_{[0,1]^2} W(x,y) U(x,y) \, dx \, dy.$$

Lemma 4.6.9 ($L^2$ energy increment). Let $W$ be a graphon. Let $\mathcal{P}$ be a finite measurable partition of $[0,1]$ that is not weak $\epsilon$-regular for $W$. Then there is a measurable refinement $\mathcal{P}'$ of $\mathcal{P}$, dividing each part of $\mathcal{P}$ into at most 4 parts, such that
$$\|W_{\mathcal{P}'}\|_2^2 > \|W_{\mathcal{P}}\|_2^2 + \epsilon^2.$$
Proof. Because $\|W - W_{\mathcal{P}}\|_\square > \epsilon$, there exist measurable subsets $S, T \subseteq [0,1]$ such that
$$|\langle W - W_{\mathcal{P}}, 1_{S \times T} \rangle| > \epsilon.$$
Let $\mathcal{P}'$ be the refinement of $\mathcal{P}$ by introducing $S$ and $T$, dividing each part of $\mathcal{P}$ into at most 4 sub-parts. We know that
$$\langle W_{\mathcal{P}}, W_{\mathcal{P}} \rangle = \langle W_{\mathcal{P}'}, W_{\mathcal{P}} \rangle$$
because $W_{\mathcal{P}}$ is constant on each step of $\mathcal{P}$, and $\mathcal{P}'$ is a refinement of $\mathcal{P}$. Thus
$$\langle W_{\mathcal{P}'} - W_{\mathcal{P}}, W_{\mathcal{P}} \rangle = 0.$$
By the Pythagorean theorem (in the Hilbert space $L^2([0,1]^2)$),
$$\|W_{\mathcal{P}'}\|_2^2 = \|W_{\mathcal{P}}\|_2^2 + \|W_{\mathcal{P}'} - W_{\mathcal{P}}\|_2^2 > \|W_{\mathcal{P}}\|_2^2 + \epsilon^2,$$
where the final step is due to
$$\|W_{\mathcal{P}'} - W_{\mathcal{P}}\|_2 \ge |\langle W_{\mathcal{P}'} - W_{\mathcal{P}}, 1_{S \times T} \rangle| = |\langle W - W_{\mathcal{P}}, 1_{S \times T} \rangle| > \epsilon.$$
We will prove the following slight generalization of Theorem 4.6.7, allowing an arbitrary
starting partition (this will be useful later).

Theorem 4.6.10 (Weak regularity lemma for graphons). Let $0 < \epsilon < 1$. Let $W$ be a graphon. Let $\mathcal{P}_0$ be a finite measurable partition of $[0,1]$. Then $W$ has a weak $\epsilon$-regular partition $\mathcal{P}$ such that $\mathcal{P}$ refines $\mathcal{P}_0$, and each part of $\mathcal{P}_0$ is partitioned into at most $4^{1/\epsilon^2}$ parts under $\mathcal{P}$.
122 4. GRAPH LIMITS

This statement tells us that, starting with any given partition, the regularity argument still works.
Proof. Starting with $i = 0$:
(1) If $\mathcal{P}_i$ is weak $\epsilon$-regular, then STOP.
(2) Else, by Lemma 4.6.9, there exists a measurable partition $\mathcal{P}_{i+1}$ refining each part of $\mathcal{P}_i$ into at most 4 parts, such that $\|W_{\mathcal{P}_{i+1}}\|_2^2 > \|W_{\mathcal{P}_i}\|_2^2 + \epsilon^2$.
(3) Increase $i$ by 1 and go back to Step (1).
Since $0 \le \|W_{\mathcal{P}}\|_2^2 \le 1$ for every $\mathcal{P}$, the process terminates with $i < 1/\epsilon^2$, resulting in a terminal $\mathcal{P}_i$ with the desired properties.
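The proof above is essentially an algorithm, and for a tiny graph one can run it verbatim by brute force (an illustrative sketch only: enumerating all pairs of subsets to find a violating pair is exponential, and efficient versions of this step are a separate matter; the ordered-pair convention for $e(A,B)$ is also an assumption of the sketch). At each round it finds the pair $A, B$ of vertex sets with the largest normalized discrepancy and, if the discrepancy exceeds $\epsilon$, refines every part by $A$ and $B$.

```python
from itertools import combinations

def e(adj, A, B):
    # Ordered-pair edge count between (possibly overlapping) vertex sets.
    return sum(adj[a][b] for a in A for b in B)

def worst_pair(adj, parts):
    # Brute force over all A, B: return (normalized discrepancy, A, B) for the worst pair.
    n = len(adj)
    d = [[e(adj, Vi, Vj) / (len(Vi) * len(Vj)) for Vj in parts] for Vi in parts]
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    best = (0.0, (), ())
    for A in subsets:
        for B in subsets:
            predicted = sum(d[i][j] * len(set(A) & set(Vi)) * len(set(B) & set(Vj))
                            for i, Vi in enumerate(parts) for j, Vj in enumerate(parts))
            disc = abs(e(adj, A, B) - predicted) / n ** 2
            if disc > best[0]:
                best = (disc, A, B)
    return best

def weak_regular_partition(adj, eps):
    # Energy-increment iteration: refine by the witnessing sets until weak eps-regular.
    parts = [list(range(len(adj)))]          # start with the trivial partition
    while True:
        disc, A, B = worst_pair(adj, parts)
        if disc <= eps:
            return parts
        refined = []
        for V in parts:                       # refine each part by A, A^c, B, B^c
            Vs, As, Bs = set(V), set(A), set(B)
            for piece in (Vs & As & Bs, (Vs & As) - Bs, (Vs - As) & Bs, (Vs - As) - Bs):
                if piece:
                    refined.append(sorted(piece))
        parts = refined

# Example: K_{4,4}; the algorithm quickly finds the bipartition (or a refinement of it).
n = 8
adj = [[1 if (i < 4) != (j < 4) else 0 for j in range(n)] for i in range(n)]
print(weak_regular_partition(adj, eps=0.1))
```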
Remark 4.6.11 (Additive approximation of maximum cut). One of the initial motivations for developing the weak regularity lemma was to obtain a general efficient algorithm for estimating the maximum cut in a dense graph. The maximum cut problem is a central problem in algorithms and combinatorial optimization:
MAX CUT: Given a graph $G$, find an $S \subseteq V(G)$ that maximizes $e(S, V(G) \setminus S)$.
Goemans and Williamson (1995) found an efficient 0.878-approximation algorithm (this means that the algorithm outputs some $S$ with $e(S, V(G) \setminus S)$ at least a factor 0.878 of the optimum). Their seminal algorithm uses a semidefinite relaxation. The Unique Games Conjecture would imply that it is not possible to obtain a better approximation ratio than the Goemans–Williamson algorithm (Khot, Kindler, Mossel, and O'Donnell 2007). It is also known that approximating beyond $16/17 \approx 0.941$ is NP-hard (Håstad 2001).
On the other hand, an algorithmic version of the weak regularity lemma gives us an efficient algorithm to approximate the maximum cut for dense graphs, i.e., finding a cut within an $\epsilon n^2$ additive error of the optimum, for any constant $\epsilon > 0$. The basic idea is to find a weak regular partition $V(G) = V_1 \cup \cdots \cup V_k$, and then do a brute-force search through all possible sizes $|S \cap V_i|$, as in the sketch below. See Frieze and Kannan (1999) for more details. These ideas have been further developed into efficient sampling algorithms, sampling only $\mathrm{poly}(1/\epsilon)$ random vertices, for estimating the maximum cut in a dense graph, e.g., see Alon, Fernandez de la Vega, Kannan, and Karpinski (2003a).
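A sketch of the brute-force search just described (an illustration under stated assumptions, not the Frieze–Kannan algorithm itself): given the part sizes and the pairwise edge densities of a weak $\epsilon$-regular partition, enumerate all tuples of intersection sizes $|S \cap V_i|$ and return the largest predicted cut value, which by weak regularity is within $O(\epsilon n^2)$ of the true maximum cut. The number of tuples is $\prod_i (|V_i| + 1) \le (n+1)^k$, polynomial in $n$ for a bounded number of parts.

```python
from itertools import product

def approx_max_cut(part_sizes, d):
    # part_sizes[i] = |V_i|; d[i][j] = edge density between V_i and V_j.
    # For a choice s with s[i] = |S intersect V_i|, the predicted cut value is
    #   sum_{i,j} d[i][j] * s[i] * (|V_j| - s[j]),
    # counting each potential cut edge once (ordered-pair convention).
    k = len(part_sizes)
    best = 0.0
    for s in product(*(range(m + 1) for m in part_sizes)):
        value = sum(d[i][j] * s[i] * (part_sizes[j] - s[j])
                    for i in range(k) for j in range(k))
        best = max(best, value)
    return best

# Toy input: three parts of 10 vertices each with made-up pairwise densities.
sizes = [10, 10, 10]
d = [[0.1, 0.9, 0.5],
     [0.9, 0.1, 0.5],
     [0.5, 0.5, 0.1]]
print(approx_max_cut(sizes, d))
```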
The following exercise offers an alternate approach to the weak regularity lemma. It gives an approximation of a graphon as a linear combination of at most $\epsilon^{-2}$ indicator functions of boxes. The polynomial dependence on $\epsilon$ is important for designing efficient approximation algorithms.
Exercise 4.6.12 (Weak regularity decomposition). (a) Let $\epsilon > 0$. Show that for every graphon $W$, there exist measurable $S_1, \dots, S_k, T_1, \dots, T_k \subseteq [0,1]$ and reals $a_1, \dots, a_k \in \mathbb{R}$, with $k < \epsilon^{-2}$, such that
$$\left\| W - \sum_{i=1}^{k} a_i 1_{S_i \times T_i} \right\|_\square \le \epsilon.$$
The rest of the exercise shows how to recover a regularity partition from the above approxi-
mation.
(b) Show that the stepping operator is contractive with respect to the cut norm, in the sense that if $W \colon [0,1]^2 \to \mathbb{R}$ is a measurable symmetric function, then $\|W_{\mathcal{P}}\|_\square \le \|W\|_\square$.
(c) Let $\mathcal{P}$ be a partition of $[0,1]$ into measurable sets. Let $U$ be a graphon that is constant on $S \times T$ for each $S, T \in \mathcal{P}$. Show that for every graphon $W$, one has
$$\|W - W_{\mathcal{P}}\|_\square \le 2 \|W - U\|_\square.$$
(d) Use (a) and (c) to give a different proof of the weak regularity lemma (with slightly worse bounds than the one given in class): show that for every $\epsilon > 0$ and every graphon $W$, there exists a partition $\mathcal{P}$ of $[0,1]$ into $2^{O(1/\epsilon^2)}$ measurable sets such that $\|W - W_{\mathcal{P}}\|_\square \le \epsilon$.
Exercise 4.6.13* (Second neighborhood distance). Let $0 < \epsilon < 1/2$. Let $W$ be a graphon. Define $\tau_{W,x} \colon [0,1] \to [0,1]$ by
$$\tau_{W,x}(z) = \int_{[0,1]} W(x,y) W(y,z) \, dy.$$
(This models the second neighborhood of $x$.) Prove that if a finite set $S \subseteq [0,1]$ satisfies
$$\|\tau_{W,s} - \tau_{W,t}\|_1 > \epsilon \quad \text{for all distinct } s, t \in S,$$
then $|S| \le (1/\epsilon)^{C/\epsilon^2}$, where $C$ is some absolute constant.
Exercise 4.6.14 (Strong regularity lemma). Let $\boldsymbol{\epsilon} = (\epsilon_1, \epsilon_2, \dots)$ be a sequence of positive reals. By repeatedly applying the weak regularity lemma, show that there is some $M = M(\boldsymbol{\epsilon})$ such that for every graphon $W$, there is a pair of partitions $\mathcal{P}$ and $\mathcal{Q}$ of $[0,1]$ into measurable sets, such that $\mathcal{Q}$ refines $\mathcal{P}$, $|\mathcal{Q}| \le M$ (here $|\mathcal{Q}|$ denotes the number of parts of $\mathcal{Q}$),
$$\|W - W_{\mathcal{Q}}\|_\square \le \epsilon_{|\mathcal{P}|} \quad \text{and} \quad \|W_{\mathcal{Q}}\|_2^2 \le \|W_{\mathcal{P}}\|_2^2 + \epsilon_1^2.$$
Furthermore, deduce the strong regularity lemma in the following form: one can write
$$W = W_{\mathrm{str}} + W_{\mathrm{psr}} + W_{\mathrm{sml}}$$
where $W_{\mathrm{str}}$ is a $k$-step-graphon with $k \le M$, $\|W_{\mathrm{psr}}\|_\square \le \epsilon_k$, and $\|W_{\mathrm{sml}}\|_1 \le \epsilon_1$. State your bounds on $M$ explicitly in terms of $\boldsymbol{\epsilon}$. (Note: the parameter choice $\epsilon_k = \epsilon/k^2$ roughly corresponds to Szemerédi's regularity lemma, in which case your bound on $M$ should be an exponential tower of 2's of height $\epsilon^{-O(1)}$; if not then you are doing something wrong.)

4.7. Martingale convergence theorem


In this section we prove a result about martingales that will be used in the proof of the compactness of the graphon space.
Martingales are a standard notion in probability theory. A martingale is a random sequence in which the expected change at each step is zero, even conditioned on all prior values of the sequence.
Definition 4.7.1. A martingale is a random real sequence X0, X1, X2, . . . such that for all n ≥ 0,
E |Xn | < ∞, and
E[Xn+1 |X0, . . . , Xn ] = Xn .
Remark 4.7.2. The above definition is sufficient for our purposes. In order to give a more formal definition of a martingale, we need to introduce the notion of a filtration. See any standard measure-theory-based introduction to probability; for example, Williams (1991, Chapters 10–11) has a particularly lucid discussion of martingales and the convergence theorem discussed below. The martingales here are indexed by the integers, and hence called "discrete-time." There are also continuous-time martingales (e.g., Brownian motion), which we will not discuss here.
Example 4.7.3 (Partial sum of independent mean zero random variables). Let Z1, Z2, . . . be a
sequence of independent mean zero random variables (e.g., ±1 with equal probability). Then
Xn = Z1 + · · · + Zn , n ≥ 0, is a martingale.
Example 4.7.4 (Betting strategy). Consider any betting strategy in a “fair” casino, where the ex-
pected value of each bet is zero. Let Xn be the balance after n rounds of betting. Then Xn is a
martingale regardless of the betting strategy. So every betting strategy has zero expected gain after
n rounds. Also see the optional stopping theorem for a more general statement, e.g., Williams
(1991, Chapter 10).
The original meaning of the word "martingale" refers to the following betting strategy on a sequence of fair coin tosses. Each round the bettor is allowed to bet an arbitrary amount $Z$: if heads, the bettor gains $Z$ dollars, and if tails the bettor loses $Z$ dollars.
Start by betting 1 dollar. If one wins, stop. If one loses, then double one's bet for the next coin toss. And then repeat (i.e., keep doubling one's bet until the first win, at which point one stops).
A "fallacy" is that this strategy always results in a final net gain of one dollar, the supposed reason being that with probability 1 one eventually sees a head. This initially appears to contradict the earlier claim that all betting strategies have zero expected gain. Thankfully there is no contradiction. In real life, one starts with a finite budget and could possibly go bankrupt with this betting strategy, thereby leading to a forced stop. In the optional stopping theorem, there are some boundedness hypotheses that are violated by the above strategy.
The following construction of martingales is the one most relevant for our purposes.
Example 4.7.5 (Doob martingale). Let X be some “hidden” random variable. Partial information
is revealed about X gradually over time. For example, X is some fixed function of some random
inputs. So the exact value of X is unknown but its distribution can be derived from the distribution
of the inputs. Initially one does not know any of the inputs. Over time, some of the inputs are
revealed. Let
Xn = E[X | all information revealed up to time n].
Then X0, X1, . . . is a martingale (why?). Informally, Xn is the best guess (in expectation) of X based
on all the information available up to time n. We have X0 = EX (when no information is revealed).
All the information is revealed as $n \to \infty$, and the martingale $X_n$ converges to the random variable $X$ with probability 1.
Here is a real-life example. Let X ∈ {0, 1} be whether a candidate wins in a presidential election.
Let Xn be the inferred probability that the candidate wins, given all the information known at time tn .
Then $X_n$ converges to the "truth", a $\{0,1\}$-value, eventually becoming deterministic when the election result is finalized.
Then Xn is a martingale. At time tn , knowing Xn , if the expectation for Xn+1 (conditioned on
everything known at time tn ) were different from Xn , then one should have adjusted Xn accordingly
in the first place.
The precise notion of “information” in the above formula can be formalized using the notion of
filtration in probability theory.
Here is the main result of this section.

Theorem 4.7.6 (Martingale convergence theorem). Every bounded martingale converges with
probability 1.
In other words, if $X_0, X_1, \dots$ is a martingale with $X_n \in [0,1]$ for every $n$, then the sequence is convergent with probability 1.
Remark 4.7.7. The proof actually shows that the boundedness condition can be replaced by the weaker $L^1$-boundedness condition, i.e., $\sup_n \mathbb{E}|X_n| < \infty$. Even more generally, uniform integrability is enough.
Some boundedness condition is necessary. For example, in Example 4.7.3, a running sum of independent uniform $\pm 1$ random variables is an unbounded martingale, and it never converges.
Proof. If a sequence $X_0, X_1, \dots$ taking values in $[0,1]$ does not converge, then there exists a pair of rational numbers $0 < a < b < 1$ such that $X_n$ "up-crosses" $[a,b]$ infinitely many times, meaning that there is an infinite sequence $s_1 < t_1 < s_2 < t_2 < \cdots$ such that $X_{s_i} < a < b < X_{t_i}$ for all $i$.

[Figure: a sequence up-crossing the interval $[a,b]$ at times $s_1 < t_1 < s_2 < t_2 < s_3 < t_3$.]

We will show that for each $a < b$, the probability that a bounded martingale $X_0, X_1, \dots \in [0,1]$ up-crosses $[a,b]$ infinitely many times is zero (by rescaling, we may assume that the martingale is bounded between 0 and 1 without loss of generality). Then, by taking a union bound over the countably many such pairs $(a,b)$ of rationals, we deduce that the martingale converges with probability 1.
Consider the following betting strategy; imagine that $X_n$ is a stock price. Whenever $X_n$ dips below $a$, we buy one share and hold it until $X_n$ rises above $b$, at which point we sell this share. (Note that we always hold either zero or one share; we do not buy more until we have sold the currently held share.) Start with a budget of 1 (so we will never go bankrupt). Let $Y_n$ be the value of our portfolio (cash on hand plus the value of the share if held) at time $n$. Then $Y_n$ is a martingale (why?). So $\mathbb{E} Y_n = Y_0 = 1$. Also $Y_n \ge 0$ for all $n$. If one buys and sells at least $k$ times up to time $n$, then $Y_n \ge k(b-a)$ (this is only the net profit from buying and selling; the actual $Y_n$ may be higher due to the initial cash balance and the value of the current share held). So, by
Markov's inequality, for every $n$,
$$\mathbb{P}(\ge k \text{ up-crossings up to time } n) \le \mathbb{P}\bigl( Y_n \ge k(b-a) \bigr) \le \frac{\mathbb{E} Y_n}{k(b-a)} = \frac{1}{k(b-a)}.$$
By the monotone convergence theorem,
$$\mathbb{P}(\ge k \text{ up-crossings}) = \lim_{n \to \infty} \mathbb{P}(\ge k \text{ up-crossings up to time } n) \le \frac{1}{k(b-a)}.$$
Letting k → ∞, the probability of having infinitely many up-crossings is zero. 

4.8. Compactness of the space of graphons


Now we prove Theorem 4.2.7: the graphon space $(\widetilde{\mathcal{W}}_0, \delta_\square)$ is compact.
Proof of Theorem 4.2.7. As $\widetilde{\mathcal{W}}_0$ is a metric space, it suffices to prove sequential compactness. Fix a sequence $W_1, W_2, \dots$ of graphons. We want to show that there is a subsequence which converges (with respect to $\delta_\square$) to some limit graphon.
Step 1. Regularize.
For each $n$, apply the weak regularity lemma repeatedly (using Theorem 4.6.10, so that each partition refines the previous one) to obtain a sequence of partitions $\mathcal{P}_{n,1}, \mathcal{P}_{n,2}, \mathcal{P}_{n,3}, \dots$ (everything in this proof is measurable, and we will stop repeatedly mentioning it) such that
(a) $\mathcal{P}_{n,k+1}$ refines $\mathcal{P}_{n,k}$ for all $n, k$,
(b) $|\mathcal{P}_{n,k}| = m_k$ where $m_k$ is a function of only $k$, and
(c) $\|W_n - W_{n,k}\|_\square \le 1/k$ where $W_{n,k} = (W_n)_{\mathcal{P}_{n,k}}$.
The weak regularity lemma only guarantees that |Pn,k | ≤ mk , but if we allow empty parts then we
can achieve equality in (b).
Step 2. Passing to a subsequence.
Initially, each $\mathcal{P}_{n,k}$ partitions $[0,1]$ into arbitrary measurable sets. Since eventually we only care about $\delta_\square$, which allows rearranging $[0,1]$, we can replace $W_n$ by an appropriate rearrangement $W_n^{\varphi_n}$ (some measure theoretic details are omitted) so that every $\mathcal{P}_{n,k}$ is a partition of $[0,1]$ into intervals.
By repeatedly passing to subsequences, we can replace $W_n$ by a subsequence so that
(1) for each $k$, the endpoints of the intervals in $\mathcal{P}_{n,k}$ all converge as $n \to \infty$, and
(2) the values of $W_{n,k}$ on each block converge individually as $n \to \infty$.
Then, for each k, there is some graphon Uk such that
Wn,k → Uk pointwise almost everywhere as n → ∞.
The relationships between the various sequences (after passing to a subsequence) are illustrated below:
$$\begin{array}{lccccl}
& W_1 & W_2 & W_3 & \cdots & \\
k=1: & W_{1,1} & W_{2,1} & W_{3,1} & \cdots & \to U_1 \text{ pointwise a.e.} \\
k=2: & W_{1,2} & W_{2,2} & W_{3,2} & \cdots & \to U_2 \text{ pointwise a.e.} \\
k=3: & W_{1,3} & W_{2,3} & W_{3,3} & \cdots & \to U_3 \text{ pointwise a.e.} \\
& \vdots & \vdots & \vdots & & \vdots
\end{array}$$
Similarly, for each k, the partition Pn,k of [0, 1] into mk intervals converges to a partition Pk of
[0, 1] into intervals. Each Pk+1 refines Pk . Since Wn,k = (Wn,k+1 )Pn,k , taking n → ∞, we get
Uk = (Uk+1 )Pk .

[Figure: step graphons $U_1$, $U_2$, $U_3$ on increasingly fine interval partitions, where each $U_k$ is obtained from $U_{k+1}$ by averaging over the steps of $\mathcal{P}_k$.]

Step 3. Finding the limit.


Now each $U_k$ can be thought of as a random variable on the probability space $[0,1]^2$ (i.e., $U_k(X, Y)$ for independent uniform random variables $X, Y$ in $[0,1]$). The condition $U_k = (U_{k+1})_{\mathcal{P}_k}$ implies that the sequence $U_1, U_2, \dots$ is a martingale. Since each $U_k$ is bounded between 0 and 1, the martingale is bounded. By the martingale convergence theorem (Theorem 4.7.6), there exists a graphon $U$ such that $U_k \to U$ pointwise almost everywhere as $k \to \infty$.
Recall that our goal was to find a convergent subsequence of $W_1, W_2, \dots$ under $\delta_\square$. We have passed to a subsequence by the above diagonalization argument, and we claim that it converges to $U$ under $\delta_\square$. That is, we want to show that $\delta_\square(W_n, U) \to 0$ as $n \to \infty$.
Let $\epsilon > 0$. Then there exists some $k > 3/\epsilon$ such that $\|U - U_k\|_1 < \epsilon/3$, by pointwise convergence and the dominated convergence theorem. Since $W_{n,k} \to U_k$ pointwise almost everywhere (and by another application of the dominated convergence theorem), there exists some $n_0 \in \mathbb{N}$ such
that $\|U_k - W_{n,k}\|_1 < \epsilon/3$ for all $n > n_0$. Finally, since we chose $k > 3/\epsilon$, we already know that $\delta_\square(W_n, W_{n,k}) < \epsilon/3$ for all $n$. We conclude that
$$\delta_\square(U, W_n) \le \delta_\square(U, U_k) + \delta_\square(U_k, W_{n,k}) + \delta_\square(W_{n,k}, W_n) \le \|U - U_k\|_1 + \|U_k - W_{n,k}\|_1 + \delta_\square(W_{n,k}, W_n) \le \epsilon.$$
The second inequality uses the general bound
$$\delta_\square(W_1, W_2) \le \|W_1 - W_2\|_\square \le \|W_1 - W_2\|_1$$
for graphons $W_1, W_2$.

The compactness of $(\widetilde{\mathcal{W}}_0, \delta_\square)$ is a powerful statement. We will spend the remainder of the chapter exploring its applications. We close this section with a couple of quick applications.
First, let us show how to use compactness to deduce the existence of the limit for a left-convergent sequence of graphons.
Proof of Theorem 4.3.8 (existence of limit of a left-convergent sequence of graphons). Let $W_1, W_2, \dots$ be a sequence of graphons such that the sequence of $F$-densities $\{t(F, W_n)\}_n$ converges for every graph $F$. Since $(\widetilde{\mathcal{W}}_0, \delta_\square)$ is a compact metric space by Theorem 4.2.7, it is also sequentially compact, and so there is a subsequence $(n_i)_{i=1}^\infty$ and a graphon $W$ such that $\delta_\square(W_{n_i}, W) \to 0$ as $i \to \infty$. Fix any graph $F$. By the counting lemma, Theorem 4.5.1, it follows that $t(F, W_{n_i}) \to t(F, W)$. But by assumption, the sequence $\{t(F, W_n)\}_n$ converges. Therefore $t(F, W_n) \to t(F, W)$ as $n \to \infty$. Thus $W_n$ left-converges to $W$.
Let us now examine a different aspect of compactness. Recall that by definition, a set is compact
if every open cover has a finite subcover.
Recall from Theorem 4.2.8 that the set of graphs is dense in the space of graphons with respect to the cut metric. This was proved by showing that for every $\epsilon > 0$ and graphon $W$, one can find a graph $G$ such that $\delta_\square(G, W) < \epsilon$. However, the size of $G$ produced by this proof depends on both $\epsilon$ and $W$, since the proof proceeds by first taking a discrete $L^1$ approximation of $W$, which could involve an unbounded number of steps. In contrast, we show below that the number of vertices of $G$ needs to depend only on $\epsilon$ and not on $W$.

Proposition 4.8.1. For every $\epsilon > 0$ there is some positive integer $N = N(\epsilon)$ such that every graphon lies within cut distance $\epsilon$ of a graph on at most $N$ vertices.
Proof. Let $\epsilon > 0$. For a graph $G$, define the open $\epsilon$-ball (with respect to the cut metric) around $G$:
$$B_\epsilon(G) = \{ W \in \widetilde{\mathcal{W}}_0 : \delta_\square(G, W) < \epsilon \}.$$
Since every graphon lies within cut distance $\epsilon$ of some graph (Theorem 4.2.8), the balls $B_\epsilon(G)$ cover $\widetilde{\mathcal{W}}_0$ as $G$ ranges over all graphs. By compactness, this open cover has a finite subcover. So there is some $N$ such that the subcover only uses graphs $G$ with at most $N$ vertices.
The following exercise asks you to make the above proof quantitative. (Hint: use the weak regularity lemma.)
Exercise 4.8.2. Show that for every $\epsilon > 0$, every graphon lies within cut distance at most $\epsilon$ of some graph on at most $C^{1/\epsilon^2}$ vertices, where $C$ is some absolute constant.
Remark 4.8.3 (Ineffective bounds from compactness). Arguments using compactness usually do not generate quantitative bounds, meaning, for example, that the proof of Proposition 4.8.1 does not give any specific function $N(\epsilon)$, only that such a function always exists. In cases where one does not have an explicit bound, we call the bound ineffective. Ineffective bounds also often arise from arguments involving ergodic theory and non-standard analysis. Sometimes a different argument can be found that generates a quantitative bound (e.g., Exercise 4.8.2), but it is not always known how to do this. Here we illustrate a simple example of a compactness application (unrelated to dense graph limits) that gives an ineffective bound, where it remains an open problem to make the bound effective.
This example concerns bounded degree graphs. It is sometimes called a “regularity lemma” for
bounded degree graphs, but it is very different from the regularity lemmas we have encountered so
far.
A rooted graph is a graph with a special vertex designated as a root, i.e., a pair (G, v) with
v ∈ V(G) as the root. Given a graph G and r ∈ N, we can obtain a random rooted graph by first
picking a vertex v of G as the root uniformly at random, and then removing all vertices more than
distance r from v. We define the r-neighborhood-profile of G to be the probability distribution
on rooted graphs generated by this process.
Recall that the total variation distance between two probability distributions $\mu$ and $\lambda$ is defined by
$$d_{\mathrm{TV}}(\mu, \lambda) = \sup_E |\mu(E) - \lambda(E)|,$$
where $E$ ranges over all events. In the case of two discrete probability distributions $\mu$ and $\lambda$, the above definition can be written as half the $\ell^1$ distance between the two probability distributions:
$$d_{\mathrm{TV}}(\mu, \lambda) = \frac{1}{2} \sum_x |\mu(x) - \lambda(x)|.$$

The following observation is due to Alon (unpublished).

Theorem 4.8.4 ("Regularity lemma" for bounded degree graphs). For every $\epsilon > 0$ and positive integers $\Delta$ and $r$ there exists a positive integer $N = N(\epsilon, \Delta, r)$ such that for every graph $G$ with maximum degree at most $\Delta$, there exists a graph $G'$ with at most $N$ vertices, so that the total variation distance between the $r$-neighborhood-profiles of $G$ and $G'$ is at most $\epsilon$.

Proof. Let $\mathcal{G} = \mathcal{G}_{\Delta,r}$ be the set of all possible rooted graphs with maximum degree at most $\Delta$ and radius at most $r$ around the root. Then $|\mathcal{G}| < \infty$. The $r$-neighborhood-profile of any graph $G$ with maximum degree at most $\Delta$ can be represented as a point $p_G \in [0,1]^{\mathcal{G}}$ with coordinate sum 1; let $A = \{p_G : G \text{ a graph with maximum degree at most } \Delta\} \subseteq [0,1]^{\mathcal{G}}$ be the set of all points that can arise this way. Since $[0,1]^{\mathcal{G}}$ is compact, the closure of $A$ is compact. Since the union of the open $\epsilon$-neighborhoods (with respect to $d_{\mathrm{TV}}$) of $p_G$, ranging over all such graphs $G$, covers the closure of $A$, by compactness there is a finite subcover, i.e., a finite collection $X$ of graphs so that for every graph $G$, $p_G$ lies within total variation distance $\epsilon$ of some $p_{G'}$ with $G' \in X$. We conclude by letting $N$ be the maximum number of vertices of a graph in $X$.

Despite the short proof using compactness, it remains an open problem to make the above result quantitative.

Open problem 4.8.5. Find some specific $N(\epsilon, \Delta, r)$ so that Theorem 4.8.4 holds.
4.9. Equivalence of convergence


In this section, we prove Theorem 4.3.7, that left-convergence is equivalent to convergence in cut metric. Recall that the counting lemma (Theorem 4.5.1) already showed that cut metric convergence implies left-convergence. It remains to show the converse. In other words, we need to show that if $W_1, W_2, \dots$ is a sequence of graphons such that $t(F, W_n)$ converges as $n \to \infty$ for every graph $F$, then $W_n$ is a Cauchy sequence in $(\widetilde{\mathcal{W}}_0, \delta_\square)$.
By the compactness of the graphon space, there is always some limit point $W$ of the sequence $W_n$ under the cut metric. We want to show that this limit point is unique. Suppose $U$ is another limit point. It remains to show that $W$ and $U$ are in fact the same point in $\widetilde{\mathcal{W}}_0$.
Let $(n_i)_{i=1}^\infty$ be a subsequence such that $W_{n_i} \to W$ in cut metric. By the counting lemma, $t(F, W_{n_i}) \to t(F, W)$ for all graphs $F$, and by convergence of the $F$-densities, $t(F, W_n) \to t(F, W)$ for all graphs $F$. Similarly, $t(F, W_n) \to t(F, U)$ for all $F$. Hence $t(F, U) = t(F, W)$ for all $F$. All that remains is to prove the following claim.

Theorem 4.9.1 (Uniqueness of moments). Let $U$ and $W$ be graphons such that $t(F, W) = t(F, U)$ for every graph $F$. Then $\delta_\square(U, W) = 0$.
Remark 4.9.2. The result is reminiscent of results from probability theory on the uniqueness of moments, which roughly say that if two "sufficiently well-behaved" real random variables $X$ and $Y$ share the same moments, i.e., $\mathbb{E}[X^k] = \mathbb{E}[Y^k]$ for all nonnegative integers $k$, then $X$ and $Y$ must be identically distributed. One needs some technical conditions for the conclusion to hold. For example, Carleman's condition says that if the moments of $X$ satisfy $\sum_{k=1}^{\infty} \mathbb{E}[X^{2k}]^{-1/(2k)} = \infty$, then the distribution of $X$ is uniquely determined by its moments. This sufficient condition holds as long as the $k$-th moment of $X$ does not grow too quickly with $k$. It holds for many distributions in practice.
We need some preparation before proving the uniqueness of moments theorem.

Lemma 4.9.3 (Tail bounds for $U$-statistics). Let $U \colon [0,1]^2 \to [-1,1]$ be a symmetric measurable function. Let $x_1, \dots, x_k \in [0,1]$ be chosen independently and uniformly at random. Let $\epsilon > 0$. Then
$$\mathbb{P}\left( \left| \binom{k}{2}^{-1} \sum_{i < j} U(x_i, x_j) - \int_{[0,1]^2} U \right| \ge \epsilon \right) \le 2 e^{-k\epsilon^2/8}.$$
Proof. Let $f(x_1, \dots, x_k)$ denote the expression inside the absolute value. Then $\mathbb{E} f = 0$. Also $f$ changes by at most $2(k-1)/\binom{k}{2} = 4/k$ whenever we change exactly one coordinate of the input. By the bounded difference inequality, Theorem 4.4.4, we obtain
$$\mathbb{P}(|f| \ge \epsilon) \le 2 \exp\left( \frac{-2\epsilon^2}{(4/k)^2\, k} \right) = 2 e^{-k\epsilon^2/8}.$$
(4/k) k
Let us now consider a variation of the $W$-random graph model from Section 4.4. Let $x_1, \dots, x_k \in [0,1]$ be chosen independently and uniformly at random. Let $H(k, W)$ be an edge-weighted random graph on vertex set $[k]$ with edge $ij$ having weight $W(x_i, x_j)$, for each $1 \le i < j \le k$. Note that this definition makes sense for any symmetric measurable $W \colon [0,1]^2 \to \mathbb{R}$. Furthermore, when $W$ is a graphon, i.e., $W \colon [0,1]^2 \to [0,1]$, the $W$-random graph $G(k, W)$ can be obtained by independently sampling each edge of $H(k, W)$ with probability equal to its edge weight. We shall study the joint distribution of $G(k, W)$ and $H(k, W)$ coupled through the above two-step process.
[Figure: a graphon $W$ with sample points $x_1, \dots, x_5$, the edge-weighted random graph $H(k, W)$, and the $W$-random graph $G(k, W)$ obtained by sampling each edge of $H(k, W)$ independently with probability equal to its weight.]

Similar to Definition 4.2.4 of the cut distance $\delta_\square$, define the distance based on the $L^1$ norm:
$$\delta_1(W, U) := \inf_\varphi \|W - U^\varphi\|_1,$$
where the infimum is taken over all invertible measure preserving maps $\varphi \colon [0,1] \to [0,1]$. Since $\|\cdot\|_\square \le \|\cdot\|_1$, we have $\delta_\square \le \delta_1$.

Lemma 4.9.4. Let W be a graphon. Then δ1 (H(k, W), W) → 0 as k → ∞ with probability 1.


Proof. First we prove the result for step-graphons $W$. In this case, with probability 1 the fraction of vertices of $H(k, W)$ that fall in each step of $W$ converges to the length of that step by the law of large numbers, and so the graphon associated to $H(k, W)$ is obtained from $W$ by changing the step sizes by $o(1)$ as $k \to \infty$. It follows that $\delta_1(H(k, W), W) \to 0$ with probability 1.
Now let $W$ be any graphon. For any other graphon $W'$, by using the same random vertices for $H(k, W)$ and $H(k, W')$, the two random graphs are coupled so that with probability 1,
$$\|H(k, W) - H(k, W')\|_1 = \|H(k, W - W')\|_1 = \|W - W'\|_1 + o(1) \quad \text{as } k \to \infty$$
by Lemma 4.9.3 applied to $U = |W - W'|$.
For every $\epsilon > 0$, we can find some step-graphon $W'$ so that $\|W - W'\|_1 \le \epsilon$ (by approximating the Lebesgue measure using boxes). We saw earlier that $\delta_1(H(k, W'), W') \to 0$. It follows that with probability 1,
$$\delta_1(H(k, W), W) \le \|H(k, W) - H(k, W')\|_1 + \delta_1(H(k, W'), W') + \|W' - W\|_1 = 2\|W' - W\|_1 + o(1) \le 2\epsilon + o(1)$$
as $k \to \infty$. Since $\epsilon > 0$ can be chosen to be arbitrarily small, we have $\delta_1(H(k, W), W) \to 0$ with probability 1.
Proof of Theorem 4.9.1 (uniqueness of moments). By inclusion-exclusion, for any $k$-vertex labeled graph $F$,
$$\Pr[G(k, W) = F \text{ as labeled graphs}] = \sum_{F'} (-1)^{e(F') - e(F)} \Pr[G(k, W) \supseteq F' \text{ as labeled graphs}],$$
summing over all graphs $F'$ on the same vertex set obtained from $F$ by adding some subset of the non-edges of $F$. Since
$$t(F', W) = \Pr[G(k, W) \supseteq F' \text{ as labeled graphs}],$$
we see that the distribution of $G(k, W)$ is determined by the values of $t(F, W)$ over all $F$. Since $t(F, W) = t(F, U)$ for all $F$, the random graphs $G(k, W)$ and $G(k, U)$ are identically distributed.
Our strategy is to prove
$$W \overset{\delta_1}{\approx} H(k, W) \overset{\delta_\square}{\approx} G(k, W) \overset{d}{=} G(k, U) \overset{\delta_\square}{\approx} H(k, U) \overset{\delta_1}{\approx} U.$$
By Lemma 4.9.4, δ1 (H(k, W), W) → 0 with probability 1.


By coupling $H(k, W)$ and $G(k, W)$ using the same random vertices as noted earlier, so that $G(k, W)$ is generated from $H(k, W)$ by independently sampling each edge with probability equal to its edge-weight, we have (for some constant $c > 0$)
$$\mathbb{P}\bigl( \delta_\square(G(k, W), H(k, W)) \ge \epsilon \bigr) \le 2^{2k} e^{-c\epsilon^2 k^2}$$
by the Chernoff bound (Theorem 3.1.5) for each pair of vertex subsets and then taking a union bound, similar to the proof of Proposition 3.1.6. In particular, this implies that, with probability 1, $\delta_\square(H(k, W), G(k, W)) \to 0$ as $k \to \infty$.
Since $\delta_\square \le \delta_1$, we have, with probability 1,
$$\delta_\square(W, G(k, W)) \le \delta_1(W, H(k, W)) + \delta_\square(H(k, W), G(k, W)) = o(1).$$
Likewise $\delta_\square(U, G(k, U)) = o(1)$ with probability 1. Since $G(k, W)$ is identically distributed as $G(k, U)$, we deduce that $\delta_\square(W, U) = 0$.
This finishes the proof of the equivalence between left-convergence and cut metric convergence.
This equivalence can be recast as counting and inverse counting lemmas. We state the inverse
counting lemma below, and leave the proof as an instructive exercise in applying the compactness
of the graphon space.

Corollary 4.9.5 (Inverse counting lemma). For every $\epsilon > 0$ there is some $\eta > 0$ and integer $k > 0$ such that if $U$ and $W$ are graphons with
$$|t(F, U) - t(F, W)| \le \eta \quad \text{whenever } v(F) \le k,$$
then $\delta_\square(U, W) \le \epsilon$.
Exercise 4.9.6. Prove the inverse counting lemma, Corollary 4.9.5, using the compactness of the graphon space (Theorem 4.2.7) and the uniqueness of moments (Theorem 4.9.1).
Remark 4.9.7. The inverse counting lemma was first proved by Borgs, Chayes, Lovász, Sós, and
Vesztergombi (2008) in the following quantitative form:

Theorem 4.9.8 (Inverse counting lemma). Let $k$ be a positive integer. If $U$ and $W$ are graphons with
$$|t(F, U) - t(F, W)| \le 2^{-k^2} \quad \text{whenever } v(F) \le k,$$
then (here $C$ is some absolute constant)
$$\delta_\square(U, W) \le \frac{C}{\sqrt{\log k}}.$$
Exercise 4.9.9* (Generalized maximum cut). For symmetric measurable functions $W, U \colon [0,1]^2 \to \mathbb{R}$, define
$$C(W, U) := \sup_\varphi \langle W, U^\varphi \rangle = \sup_\varphi \int W(x, y) U(\varphi(x), \varphi(y)) \, dx \, dy,$$
where $\varphi$ ranges over all invertible measure preserving maps $[0,1] \to [0,1]$. Extend the definition of $C(\cdot, \cdot)$ to graphs by $C(G, \cdot) := C(W_G, \cdot)$, etc.
(a) Is C(U, W) continuous jointly in (U, W) with respect to the cut norm? Is it continuous in U if
W is held fixed?
(b) Show that if $W_1$ and $W_2$ are graphons such that $C(W_1, U) = C(W_2, U)$ for all graphons $U$, then $\delta_\square(W_1, W_2) = 0$.
(c) Let G1, G2, . . . be a sequence of graphs such that C(Gn, U) converges as n → ∞ for every
graphon U. Show that G1, G2, . . . is convergent.
(d) Can the hypothesis in (c) be replaced by “C(Gn, H) converges as n → ∞ for every graph H”?
Further reading
The book Large Networks and Graph Limits by Lovász (2012) is the authoritative reference on
the subject. His survey article titled Very Large Graphs (2009) also gives an excellent overview.
CHAPTER 5

Graph homomorphism inequalities

In this chapter, we study inequalities between graph homomorphism densities $t(F, G) = \hom(F, G)/v(G)^{v(F)}$. The most common such inequalities can be written in the form
$$c_1 t(F_1, G) + c_2 t(F_2, G) + \cdots + c_k t(F_k, G) \ge 0. \tag{5.0.1}$$
Although the left-hand side is a linear combination of various graph homomorphism densities in $G$, polynomials can also be written this way. Indeed, $t(F_1, G)\, t(F_2, G) = t(F_1 \sqcup F_2, G)$, with $F_1 \sqcup F_2$ being the disjoint union of the two graphs.
More generally, we would like to understand constrained optimization problems in terms of graph homomorphism densities. Many problems in extremal graph theory can be cast in this framework. For example, Turán's theorem from Chapter 1 on the maximum edge density of a $K_r$-free graph can be phrased in terms of the optimization problem
$$\text{maximize } t(K_2, G) \quad \text{subject to } t(K_r, G) = 0.$$
Turán's theorem (Corollary 1.2.5) says that the answer is $1 - \frac{1}{r-1}$, achieved by $G = K_{r-1}$. We will see additional proofs of these results in this chapter via inequalities of the form (5.0.1).
Remark 5.0.1 (Undecidability). Perhaps surprisingly, it is an undecidable problem to determine
whether an inequality of the form (5.0.1) (with rational coefficients, say) holds for all graphs G
(Hatami and Norine 2011). This undecidability stands in stark contrast to the decidability of polynomial inequalities over the reals, which follows from a classic result of Tarski (1948) that
the first order theory of real numbers is decidable (via quantifier elimination). This undecidability
of graph homomorphism inequalities is related to Matiyasevich’s theorem (1970) (also known
as the Matiyasevich–Robinson–Davis–Putnam theorem) giving a negative solution to Hilbert’s
10th Problem, showing that diophantine equations are undecidable (equivalently: polynomial
inequalities over the integers are undecidable). In fact, the proof of the former proceeds by
converting polynomial inequalities over the integers by inequalities between t(F, G) for various F.
As in the case of diophantine equations, the undecidability of graph homomorphism inequalities
should be positively viewed as evidence of the richness of this space of problems. There are still
many open problems, such as Sidorenko’s inequality that we will see shortly.
Remark 5.0.2 (Graphs versus graphons). In the space of graphons with respect to the cut norm,
W 7→ t(F, W) is continuous (by the counting lemma, Theorem 4.5.1), and graphs are dense subset
(Theorem 4.2.8). It follows that inequalities such as (5.0.1) hold for all graphs G if and only if they
hold for all graphons W (this is true for inequalities between continuous functions of F-densities
over various F). Furthermore, due to the compactness of the space of graphons, the extremum of
continuous functions of F-densities is always attained at some graphon. The graphon formulation
of the results can be often succinct and attractive.
For example, consider the following extremal problem (already mentioned in Chapter 4), where
p ∈ [0, 1] is a given constant,
minimize t(C4, G) subject to t(K2, G) ≥ p.
133
134 5. GRAPH HOMOMORPHISM INEQUALITIES

The minimum (or rather infimum) p4 is not attained by any single graph, but rather by a sequence of
quasirandom graphs (see Section 3.1). However, if we enlarge the space from graphs G to graphons
W, then the minimizer is attained, in this case by the constant graphon p.
There are many important open problems on graph homomorphism inequalities. A major
conjecture in extremal combinatorics is Sidorenko’s conjecture (1993) (an equivalent conjecture
was given earlier by Erdős and Simonovits).
Definition 5.0.3. We say that a graph F is Sidorenko if for every graph G,
t(F, G) ≥ t(K2, G)e(F) .

Conjecture 5.0.4 (Sidorenko’s conjecture). Every bipartite graph is Sidorenko.


In other words, the conjecture says that for a fixed bipartite graph F, the F-density in a graph of
a given edge density is asymptotically minimized by a random graph. We will develop techniques
in this chapter to prove several interesting special cases of Sidorenko’s conjecture.
Every Sidorenko graph is necessarily bipartite. Indeed, given a non-bipartite F, we can take a
non-empty bipartite G to get t(F, G) = 0 while t(K2, G) > 0.
A notable open case of Sidorenko’s conjecture is F = K5,5 \ C10 (below left). This F is called
the Möbius graph since it is the point-face incidence graph of a minimum simplicial decomposition
of a Möbius strip (below right).
a F
b d a
b G
c H J
H G I F
d I
a c e b
e J

Sidorenko’s conjecture has the equivalent graphon formulation: for every bipartite graph F and
graphon W,
t(F, W) ≥ t(K2, W)e(F) .
Note that equality occurs when W ≡ p, the constant graphon. One can think of Sidorenko’s
conjecture
∫ as a separate problem for each F, and asking to minimize t(F, W) among graphons W
with W ≥ p. Whether the constant graphon is the unique minimizer is the subject of an even
stronger conjecture known as the forcing conjecture.
Definition 5.0.5. We say that a graph F is forcing if every graphon W with t(F, W) = t(K2, W)e(F)
is a constant graphon (up to a set of measure zero)
By translating back and forth between graph limits and sequences of graphs, being forcing
is equivalent to the quasirandomness condition. Thus any forcing graph can play the role of C4
in Theorem 3.1.1. This is what led Chung, Graham, and Wilson to consider forcing graphs. In
particular, C4 is forcing.

Proposition 5.0.6 (Forcing and quasirandomness). A graph F is forcing if and only if for every
constant p ∈ [0, 1], every sequence of graphs G = Gn with
t(K2, G) = p + o(1) and t(F, G) = pe(F) + o(1)
is quasirandom in the sense of Definition 3.1.2.
Exercise 5.0.7. Prove Proposition 5.0.6.
5.1. EDGE VERSUS TRIANGLE DENSITIES 135

t ks W

AK W
34 F
Figure 5.1.1. The edge triangle-region. (Figure adapted from Lovász (2012))

The forcing conjecture, below, states a complete characterization of forcing graphs (Skokan
and Thoma 2004; Conlon, Fox, and Sudakov 2010).

Conjecture 5.0.8 (Forcing conjecture). A graph is forcing if and only if it is bipartite and not a
tree.
Exercise 5.0.9. Prove the “only if” direction of the forcing conjecture.
Exercise 5.0.10. Prove that every forcing graph is Sidorenko.
Exercise 5.0.11 (Forcing and stability). Show that a graph F is forcing if and only if for every  > 0,
there exists δ > 0 such that if a graph G satisfies t(F, G) ≤ t(K2, G)e(F) + δ, then δ (G, p) ≤ .
Exercise 5.0.12. Let F be a bipartite graph. Suppose there is some constant c > 0 such that
t(F, G) ≥ ct(K2, G)e(F) for all graphs G. Show that F is Sidorenko.
5.1. Edge versus triangle densities
What are all the pairs of edge and triangles densities that can occur in a graph (or graphon)?
We would like to determine the
edge-triangle region := {(t(K2, W), t(K3, W)) : W graphon} ⊂ [0, 1]2 . (5.1.1)
This is a closed subset of [0, 1]2 , due to the compactness of the space of graphons. This set has
been completely determined, and it is illustrated in Figure 5.1.1. We will discuss its features in this
section.
The upper and lower boundaries of this region correspond to the answers of the following
question.
Question 5.1.1. Fix p ∈ [0, 1]. What are the minimum and maximum possible t(K3, W) among all
graphons with t(K2, W) = p?
136 5. GRAPH HOMOMORPHISM INEQUALITIES

For a given p ∈ [0, 1], the set {t(K3, W) : t(K2, W) = p} is a closed interval. Indeed, if W0
achieves the minimum triangle density, and W1 achieves the maximum, then their linear interpolation
Wt = (1 − t)W0 + tW1 , ranging over 0 ≤ t ≤ 1, must have triangle density continuously interpolating
between those of W0 and W1 , and therefore achieves every intermediate value.
The maximization part of Question 5.1.1 is easier. The answer is p3/2 .

Theorem 5.1.2 (Max triangle density). For every graph G,


t(K3, G) ≤ t(K2, G)3/2 .
This inequality is asymptotically tight for G being a clique on a subset of vertices. The
equivalent graphon inequality t(K3, W) ≤ t(K2, W)3/2 attains equality for the clique graphon
0 a 1
(
1 if x, y ≤ a, 1
W(x, y) = a (5.1.2)
0 otherwise. 0
1

For the above W, we have t(K3, G) = a3 while t(K2, G) = a2 .


Proof. The quantities hom(K3, G) and hom(K2, G) these count the number of closed walks in the
graph of length 3 and 2, respectively. Let λ1 ≥ · · · ≥ λn be the eigenvalues of the adjacency matrix
AG of G, then
Õ k Õk
hom(K3, G) = tr AG =
3
λi
3
and hom(K2, G) = tr AG =
2
λi2
i=1 i=1
Then (see lemma below)
n n
! 3/2
Õ Õ
hom(K3, G) = λi3 ≤ λi2 = hom(K2, G)3/2 .
i=1 i=1
After dividing by v(G)3 on both sides, the result follows. 

Lemma 5.1.3. Let t ≥ 1, and a1, · · · , an ≥ 0. Then,


a1t + · · · + ant ≤ (a1 + · · · + an )t .
Proof. Assume at least one ai is positive, or else both sides equal to zero. Then
n  t Õ n
LHS Õ ai ai
= ≤ = 1. 
RHS i=1 a1 + · · · + an i=1
a1 + · · · + a n

Remark 5.1.4. We will see additional proofs of Theorem 5.1.2 not invoking eigenvalues later in
Exercise 5.2.13 and in Section 5.3. Theorem 5.1.2 is an inequality in “physical space” (as opposed
to going into the “frequency space” of the spectrum), and it is a good idea to think about how to
prove it while staying in the physical space.
More generally, the clique graphon (5.1.2) also maximizes Kr -densities among all graphon of
given edge density.

Theorem 5.1.5 (Maximum clique density). For any graphon W and integer k ≥ 3,
t(Kk , W) ≤ t(K2, W) k/2 .
5.1. EDGE VERSUS TRIANGLE DENSITIES 137

Proof. There exist integers a, b ≥ 0 such that k = 3a + 2b (e.g., take a = 1 if k is odd and a = 0 if
k is even). Then aK3 + bK2 (a disjoint union of a triangles and b isolated edges) is a subgraph of
Kk . So
t(Kk , W) ≤ t(aK3 + bK2, W) = t(K3, W)a t(K2, W)b ≤ t(K2, W)3a/2+b = t(K2, W) k/2 . 
Remark 5.1.6 (Kruskal–Katona theorem). Thanks to a theorem of Kruskal (1963) and Katona
(1968), the exact answer to the following non-asymptotic question is completely known:
What is the maximum number of copies of Kk ’s in an n-vertex graph with m edges?
When m = a2 for some integer a, the optimal graph is a clique on a vertices. More generally,

for any value of m, the optimal graph is obtained by adding edges in colexicographic order:
12, 13, 23, 14, 24, 34, 15, 25, 35, 45, . . .
This is stronger than Theorem 5.1.5, which only gives an asymptotically tight answer as n → ∞.
The full Kruskal–Katona theorem also answers:
What is the maximum number of k-cliques in an r-graph with n vertices and m edges?
When m = r , the optimal r-graph is a clique on a vertices. (An asymptotic version of this
a

statement can be proved using techniques in Section 5.3.) More generally, the optimal r-graph is
obtained by adding the edges in colexicographic order. For example, for 3-graphs, the edges should
be added in the following order:
123, 124, 134, 234, 125, 135, 235, 145, 245, 345, . . .
Here a1 . . . ar < b1 . . . br in colexicographic order if ai < bi at the last i where ai , bi (i.e.,
dictionary order when read from right to left). Here we sort the elements of each r-tuple in
increasing order.
The Kruskal–Katona theorem can be proved by a compression/shifting argument. The idea is
to repeatedly modifying the graph so that we eventually end up at the optimal graph. At each step,
we “push” all the edges towards a clique along some “direction” in a way that does not reduce the
number of k-cliques in the graph.
Now we turn to the lower boundary of the edge-triangle region. What is the minimum triangle
density in a graph of given edge density p?
For p ≤ 1/2, we can have complete bipartite graphs of density p + o(1), which are triangle-free.
For p > 1/2, the triangle density must be positive due to Mantel’s theorem (Theorem 1.1.1) and
supersaturation (Theorem 1.3.3). It turns out that among graphs with edge density p + o(1), the
triangle density is asymptotically minimized by certain complete multipartite graphs, although this
is not easy to prove.
For each positive integer k, we have
  
1 1 2
t(K2, Kk ) = 1 − and t(K3, Kk ) = 1 − 1− .
k k k
As k ranges over all positive integers, these pairs form special points on the lower boundary of
the edge-triangle region, as illustrated in Figure 5.1.1. (Recall that Kk is associated to the same
graphon as a complete k-partite graph with equal parts.)
Now suppose the given edge density p lies strictly between 1 − 1/(k − 1) and 1 − 1/k for some
integer k ≥ 2. To obtain the graphon with edge density p and minimum triangle density, we first
start with Kk with all vertices having equal weight. And then shrink the relative weight of exactly
one of the k vertices (while keeping the remaining k − 1 vertices to have the same vertex weight).
138 5. GRAPH HOMOMORPHISM INEQUALITIES

For example, the graphon illustrated below is obtained by starting with K4 and shrinking the weight
on one vertex.
I1 I2 I3 I4

I1 0 1 1 1

I2 1 0 1 1

I3 1 1 0 1

I4 1 1 1 0

During this process, the total edge density (account for vertex weights) decreases continuously
from 1 − 1/k to 1 − 1/(k − 1). At some point, the edge density is equal to p. This vertex-weighted
k-clique W turns out minimize triangle density among all graphons with edge density p.
The above claim is much more difficult to prove than the maximum triangle density result. This
theorem, stated below, due to Razborov (2008), was proved using an involved Cauchy–Schwarz
calculus that he coined flag algebra. We will say a bit more about this method in Section 5.2.

Theorem 5.1.7 (Minimum triangle density). Fix 0 ≤ p ≤ 1 and k = d1/(1 − p)e. The minimum
of t(K3, W) among graphons W with t(K2, W) = p is attained by the stepfunction W associated
to a k-clique with node weights a1, a2, · · · , a k with sum equal to 1, a1 = · · · = a k−1 ≥ a k , and
t(K2, W) = p.
We will not prove this theorem in full here. See Lovász (2012, Section 16.3.2) for a presentation
of the proof of Theorem 5.1.7. Later in this Chapter, we give lower bounds that match the edge-
triangle region at the cliques. In particular, Theorem 5.4.4 will allow us to determine the convex
hull of the region.
The graphon described in Theorem 5.1.7 turns out to be not unique unless p = 1 − 1/k for some
positive integer k Indeed, suppose 1−1/(k −1) < p < 1−1/k. Let I1, . . . , Ik be the partition of [0, 1]
into the intervals corresponding to the vertices of the vertex-weighted k-clique, with I1, . . . , Ik−1 all
having equal length, and Ik strictly smaller length. We can replace the graphon on some Ik−1 ∪ Ik
by any triangle-free graphon without changing the edge density (why is this possible?).
I1 I2 I3 I4

I1 0 1 1 1

I2 1 0 1 1

any
I3 1 1 0 1
triangle-free
graphon
I4 1 1 1 0

This operation does not change the edge-density or the triangle-density of the graphon (check!).
The non-uniqueness of the minimizer hints at the difficulty of the result.
This completes our discussion of the edge-triangle region (Figure 5.1.1).
5.2. CAUCHY–SCHWARZ 139

Theorem 5.1.7 was generalized from K3 to K4 (Nikiforov 2011), and then to all cliques Kr
(Reiher 2016). The construction for the minimizing graphon is the same as for the triangle case.

Theorem 5.1.8 (Minimum clique density). Fix 0 ≤ p ≤ 1 and k = d1/(1 − p)e. The minimum
of t(Kr , W) among graphons W with t(K2, W) = p is attained by the stepfunction W associated
to a k-clique with node weights a1, a2, · · · , a k with sum equal to 1, a1 = · · · = a k−1 ≥ a k , and
t(K2, W) = p.

5.2. Cauchy–Schwarz
We will apply the Cauchy–Schwarz inequality in the following form: given real-valued functions
f and g on the same space (always assuming the usual measurability assumptions without further
comments), we have
∫  2 ∫  ∫ 
fg ≤ f 2
g .
2
X X X
It is one of the most versatile inequalities in combinatorics.
To better emphasize the variables being integrated, we write write below the integral sign. The
domain of integration (usually [0, 1] for each variable) is omitted to avoid clutter. We write
∫ ∫
f (x, y, . . . ) for f (x, y, . . . ) dxdy · · · .
x,y,...
In practice, we will often apply the Cauchy–Schwarz inequality by changing the order of
integration, and separating an integral into an outer integral and an inner integral.
A typical application of the Cauchy–Schwarz inequality is demonstrated in the following cal-
culation (here one should think of x, y, z each as collections of variables):
∫ ∫ ∫  ∫ 
f (x, y)g(x, z) = f (x, y) g(x, z)
x,y,z x y z
∫ ∫  2 ! 1/2 ∫ ∫  2 ! 1/2
≤ f (x, y) g(x, z)
x y x z
∫  1/2 ∫  1/2
= f (x, y) f (x, y )0
g(x, z)g(x, z ) 0
x,y,y 0 x,z,z 0
Note that in the final step, “expanding a square” has the effect of “duplicating a variable.” It is
useful to recognize expressions with duplicated variables that can be folded back into a square.
Let us warm up by proving that K2,2 is Sidorenko. We actually already proved this statement in
Proposition 3.1.12 in the context of the Chung–Graham–Wilson theorem on quasirandom graphs.
We repeat the same calculations here to demonstrate the integral notation.

Theorem 5.2.1. t(K2,2, W) ≥ t(K2, W)4 .

Lemma 5.2.2. t(K1,2, W) ≥ t(K2, W)2 .

Proof.
∫ ∫ ∫ 2 ∫ 2
t(K1,2, W) = W(x, y)W(x, y ) = 0
W(x, y) ≥ W(x, y) = t(K2, W)2 . 
x,y,y 0 x y x,y
140 5. GRAPH HOMOMORPHISM INEQUALITIES

Lemma 5.2.3. t(K2,2, W) ≥ t(K1,2, W)2 .

Proof.

t(K2,2, W) = W(x, z)W(x, z0)W(y, z)W(y, z0)
x,y,z,z 0
∫ ∫ 2 ∫ 2
= W(x, z)W(y, z) ≥ W(x, z)W(y, z) = t(K1,2, W). 
x,y z x,y,z

Proofs involving Cauchy–Schwarz are sometimes called “sum-of-square” proofs. The Cauchy–
Schwarz inequality can be proved by writing the difference between the two sides as a sum of square
quantity:
∫  ∫  ∫ 2 ∫
1
f 2 2
g − fg = ( f (x)g(y) − f (y)g(x))2 .
2 x,y
Commonly, g = 1, in which case we can also write
∫  ∫  2 ∫  ∫ 2
2
f − f = f (x) − f (y) .
x y

For example, We can write the proof of Lemma 5.2.3 as


∫ ∫ 2
1
t(K1,2, W) − t(K2, W) ≥
2
(W(x, y) − W(x, y )) .
0
2 x y,y 0

Exercise 5.2.4. Write t(K2,2, W) − t(K2, W)4 as a single sum-of-squares expression.

The next inequality tells us that if we color the edges of Kn using two colors, then at least
1/4 + o(1) fraction of all triangles are monochromatic (Goodman 1959). Note that this 1/4 constant
is tight since it is obtained by a uniform random coloring.

Theorem 5.2.5. t(K3, W) + t(K3, 1 − W) ≥ 1/4

Proof. Expanding, we have



t(K3, 1 − W) = (1 − W(x, y))(1 − W(x, z))(1 − W(y, z)) dxdydz
= 1 − 3t(K2, W) + 3t(K1,2, W) − t(K3, W).
So
t(K3, W) + t(K3, 1 − W) = 1 − 3t(K2, W) + 3t(K1,2, W)
 2
1 1 1
≥ 1 − 3t(K2, W) + 3t(K2, W) = + 3 t(K3, W) −
2
≥ . 
4 2 4
Which graphs, other than triangles, have the above property? We do not know the full answer.
Definition 5.2.6. We say that a graph F is common if for all graphons W,

t(F, W) + t(F, 1 − W) ≥ 2−e(F)+1 .


In other words, the left-hand side is minimized by the constant 1/2 graphon.
5.2. CAUCHY–SCHWARZ 141

Although it was initially conjectured that all graphs are common, this turns out to be false. In
particular, Kt is not common for all t ≥ 4 (Thomason 1989).
It is not too hard to show that every Sidorenko graph is common. Recall that every Sidorenko
graph is bipartite, and it is conjectured that every bipartite graph is Sidorenko. On the other hand,
the triangle is common but not bipartite.

Proposition 5.2.7. Every Sidorenko graph is common.

Proof. Suppose F were Sidorenko. Let p = t(K2, W). Then t(F, W) ≥ pe(F) and t(F, 1 − W) ≥
t(K2, 1 − W)e(F) = (1 − p)e(F) . Adding up and using convexity,

t(F, W) + t(F, 1 − W) ≥ pe(F) + (1 − p)e(F) ≥ 2−e(F)+1 . 

We also have the following lower bound on the minimum triangle density given edge density
(Goodman 1959).

Theorem 5.2.8 (Lower bound on triangle density).

t(K3, W) ≥ t(K2, W)(2t(K2, W) − 1).

Here is plot of Goodman’s bound against the true edge triangle region (figure from Lovász
(2012)). The inequality is tight whenever W is graphon for Kn , so that t(K3, W) = 3n /n3 =
(1 − 1/n)(1 − 2/n) and t(K2, W) = 1 − 1/n. In particular, Goodman’s bound implies Mantel’s
theorem: t(K2, W) > 1/2 implies t(K3, W) > 0.

(Figure from Lovász (2012).)

Proof. Since 0 ≤ W ≤ 1, we have (1 − W(x, z))(1 − W(y, z)) ≥ 0, and so

W(x, z)W(y, z) ≥ W(x, z) + W(y, z) − 1.


142 5. GRAPH HOMOMORPHISM INEQUALITIES

Thus

t(K3, G) = W(x, y)W(x, z)W(y, z)
x,y,z

≥ W(x, y)(W(x, z) + W(y, z) − 1)
x,y,z
= 2t(K1,2, W) − t(K2, W)
≥ 2t(K2, W)2 − t(K2, W). 
Remark 5.2.9 (Flag algebra). The above examples were all simple enough to be found by hand.
As mentioned earlier, every application of the Cauchy–Schwarz inequality can be rewritten in the
form of a sum of a squares. One could actually search for these sum-of-squares proofs more
systematically using a computer program. This idea, first introduced by Razborov (2007), can be
combined with other sophisticated method to determine the lower boundary of the edge-triangle
region (Razborov 2008). Razborov coined the term flag algebra to describe a formalization of such
calculations. The technique is also sometimes called graph algebra, Cauchy–Schwarz calculus,
sum-of-squares proof.
Conceptually, the idea is that we are looking for all the ways to obtain nonnegative linear
combinations of squared expressions. In a typical application, one is asked to solve an extremal
problem of the form
Minimize t(F0, W)
Subject to t(F1, W) = q1, . . ., t(F`, W) = q`,
W a graphon.
The technique is very flexible. The objectives and constraints could be any linear combinations
of densities. It could be maximize instead of minimize. Extensions of the techniques can handle
wider classes of extremal problems, such as for hypergraphs, directed graphs, edge-colored graphs,
permutations, and more.
To demonstrate the technique for graphons, note that we obtain for “free” inequalities of such
as
∫ ∫ 2
W(x, y)W(x, z) (aW(x, u)W(y, u) − bW(x, w)W(w, u)W(u, z) + c) ≥ 0
x,y,z u,w
due to the nonnegativity of squares. Here a, b, c ∈ R are constants (to be chosen). Expand the
above expression, by first
∫ 2 ∫
replacing G x,y,z (u, w) by G x,y,z (u, w)G x,y,z (u0, w0),
u,w u,w,u 0,w 0

we obtain a nonnegative linear combination of t(F, W) over various F with undetermined real
coefficients.
The idea is to now consider all such nonnegative expressions (in practice, on a computer,
we consider a large but finite set of such inequalities). Then we try to optimize the previously
undetermined real coefficients (a, b, c above), so that by adding together an optimized nonnegative
linear combination of all such inequalities, and when combined with the given constraints, we
obtain t(F0, W) ≥ α for some real α. We can find such coefficient and nonnegative combinations
efficiently using a semidefinite program (SDP) solver. This would then prove a bound on the
5.2. CAUCHY–SCHWARZ 143

desired extremal problem. If we also happen to have an example of W satisfying the constraints
and matching the bound, i.e., t(F0, W) = α, then we have solved the extremal problem.
The flag algebra method, with computer assistance, has successfully solved many interesting
extremal problems in graph theory. For example, a conjecture of Erdős (1984) on the maximum
pentagon density in a triangle-free graph was solved using flag algebra methods; the extremal
construction is a blow-up of a 5-cycle (Grzesik 2012; Hatami, Hladký, Kráľ, Norine, and Razborov
2013).

Theorem 5.2.10. Every n-vertex triangle-free graph has at most (n/5)5 cycles of length 5.

Let us mention another nice result obtained using the flag algebra method. Pippenger and
Golumbic (1975) asked to determine the maximum possible number of induced copies of a given
n 
graph H among all n-vertex graphs. The optimal limiting density (as a fraction of v(H) , as n → ∞)
is called the inducibility of graph H. They conjectured that for every k ≥ 5, the inducibility of a
k-cycle is k!/(k k − k), obtained by an iterated blow-up of a k-cycle (k = 5 illustrated below; in the
limit the should be infinitely many fractal-like iterations).

The conjecture for 5-cycles was proved by Balogh, Hu, Lidický, and Pfender (2016) using flag
algebra methods combined with additional “stability” methods.

Theorem 5.2.11. Every n-vertex graph has at most n5 /(55 − 5) induced 5-cycles.

Although the flag algebra method has successfully solved several extremal problems, in many
interesting cases, the method does not give a tight bound. Nevertheless, for many open extremal
problems, such as the tetrahedron hypergraph Turán problem, the best known bound comes from
this approach.
Remark 5.2.12 (Incompleteness). Can every true linear inequality for graph homomorphism den-
sities proved via Cauchy–Schwarz/sum-of-squares?
144 5. GRAPH HOMOMORPHISM INEQUALITIES

Before giving the answer, we first discuss classical results about real polynomials. Suppose
p(x1, . . . , xn ) is a real polynomial such that p(x1, . . . , xn ) ≥ 0 for all x1, . . . , xn ∈ R. Can such a
nonnegative polynomial always be written as a sum of squares? Hilbert (1888; 1893) proved that
the answer is yes for n ≥ 2 and no in general for n ≥ 3. The first explicit counterexample was given
by Motzkin (1967):
p(x, y) = x 4 y 2 + x 2 y 4 + 1 − 3x 2 y 2
is always nonnegative due to the AM-GM inequality, but it cannot be written as a nonnegative sum
of squares. Solving Hilbert’s 17th problem, Artin (1927) proved that every p(x1, . . . , xn ) ≥ 0 can
be written as a sum of squares of rational functions, i.e., there is some nonzero polynomial q such
that pq2 can be written as a sum of squares of polynomials. For the earlier example,
x 2 y 2 (x 2 + y 2 + 1)(x 2 + y 2 − 2)2 + (x 2 − y 2 )2
p(x, y) = .
(x 2 + y 2 )2
Turning back to inequalities between graph homomorphism densities, if f (W) = i ci t(Fi, W)
Í
is nonnegative for every graphon W, can f always be written as a nonnegative sum of squares of
rational functions in t(F, W)? In other words, can every true inequality can be proved using a finite
number of Cauchy–Schwarz inequalities (i.e., via vanilla flag algebra calculations).
It turns out that the answer is no (Hatami and Norine 2011). Indeed, if there were always a
sum-of-squares proof, then we could obtain an algorithm for deciding whether f (W) ≥ 0 (with
rational coefficients, say) holds for all graphons W, thereby contradicting the undecidability of
the problem (Remark 5.0.1). Consider the algorithm that enumerates over all possible forms of
sum-of-squares expressions (with undetermined coefficients that can then be solved for) and in
parallel enumerates over all graphs G and checks whether f (G) ≥ 0. If every true inequality had
a sum-of-squares proof, then this algorithm would always terminate and tell us whether f (W) ≥ 0
for all graphons W.
It turns out some simple looking inequalities, such as the fact that the 3-edge-path is Sidorenko,
cannot be written as a sum of squares (Blekherman, Raymond, Singh, and Thomas 2020).
Exercise 5.2.13 (Another proof of maximum triangle density). Let W : [0, 1]2 → R be a symmetric
measurable function. Write W 2 for the function taking value W 2 (x, y) = W(x, y)2 .
(1) Show that t(C4, W) ≤ t(K2, W 2 )2 .
(2) Show that t(K3, W) ≤ t(K2, W 2 )1/2 t(C4, W).
Combining the two inequalities we deduce t(K3, W) ≤ t(K2, W 2 )3/2 , which is somewhat stronger
than Theorem 5.1.2. We will see another proof below in Corollary 5.3.7.

5.3. Hölder
Hölder’s inequality is a generalization of the Cauchy–Schwarz inequality. It says that given
p1, . . . , p k ≥ 1 with 1/p1 + · · · + 1/p k = 1, and real-valued functions f1, . . . , fk on a common space,
we have ∫
f1 f2 · · · fk ≤ k f1 k p1 · · · k fk k pk ,
where the p-norm of a function f is defined by
∫  1/p
k f k p := |f| p
.

In practice, the case p1 = · · · = p k = k of Hölder’s inequality is used often.


5.3. HÖLDER 145

We can apply Hölder’s inequality to show that Kr,r is Sidorenko. The proof is essentially
verbatim to the proof of Theorem 5.2.1 that t(K2,2, W) ≥ t(K2, W)4 from the previous section,
except that we now apply Hölder’s inequality instead of the Cauchy–Schwarz inequality. We
outline the steps below and leave the details as an exercise.
2
Theorem 5.3.1. t(Kr,r , W) ≥ t(K2, W)r 

Lemma 5.3.2. t(Kr,1, W) ≥ t(K2, W)r 

Lemma 5.3.3. t(Kr,r , W) ≥ t(Kr,1, W)r 


Now we discuss a powerful variant of Hölder’s inequality due to Finner (1992), which is related
more generally to Brascamp–Lieb inequalities. Here is a representative example.

Theorem 5.3.4. Let X, Y, Z be measure spaces. Let f : X × Y → R, g : X × Z → R, and


h : Y × Z → R be measurable functions (assuming integrability whenever needed). Then

f (x, y)g(x, z)h(y, z) ≤ k f k 2 kgk 2 khk 2 .
x,y,z

Note that a∫straightforward ∫ application of Hölder’s inequality, when X, Y, Z are probability


spaces (so that x,y,z f (x, y) = x,y f (x, y)) would yield

f (x, y)g(x, z)h(y, z) ≤ k f k 3 kgk 3 khk 3
x,y,z

which is implied by Theorem 5.3.4. Indeed, in a probability space, k f k p is nondecreasing as a


function of p, which follows as a simple corollary of Höler’s inequality.
Proof of Theorem 5.3.4. We apply the Cauchy–Schwarz inequality three times. First to the integral
over x (this affects f and g while leaving h intact):
∫ ∫ ∫  1/2 ∫  1/2
2 2
f (x, y)g(x, z)h(y, z) ≤ f (x, y) g(x, z) h(y, z).
x,y,z y,z x x

Next, we apply the Cauchy–Schwarz inequality to the variable y (this affects f and h while leaving
g intact). Continuing the above inequality,
∫ ∫  1/2 ∫  1/2 ∫  1/2
≤ f (x, y)2
g(x, z)2
h(y, z)2
.
z x,y x y

Finally, we apply the Cauchy–Schwarz inequality to the variable z (this affects g and h while leaving
x intact). Continuing the above inequality,
∫  1/2 ∫  1/2 ∫  1/2
≤ f (x, y)2
g(x, z)2
h(y, z)2
.
x,y x,z y,z

This completes the proof of Theorem 5.3.4. 


Remark 5.3.5 (Projection inequalities). What is the maximum volume of a body K ⊂ R3 whose
projection on each coordinate plane is at most 1? A unit cube has volume 1, but is this the largest
possible?
146 5. GRAPH HOMOMORPHISM INEQUALITIES

Letting |·| denote both volume and area (depending on the dimension) and π xy (K) denote
the project of K onto the x y-plane, and likewise with the other planes. Using 1K (x, y, z) ≤
f (x, y)g(x, z)h(y, z), Theorem 5.3.4 implies
|K | 2 ≤ π x y (K) |π xz (K)| π yz (K) . (5.3.1)
This shows that if all three projections have volume at most 1, then |K | ≤ 1.
The inequality (5.3.1), which holds more generally in higher dimensions, is due to Loomis and
Whitney (1949). It has important applications in combinatorics. A powerful generalization known
as Shearer’s entropy inequality will be discussed in Section 5.5.
Now let us state a more general form of Theorem 5.3.4, which can be proved using the same
techniques. The key point of the inequality in Theorem 5.3.4 is that each variables (x, y, z) is
contained in exactly 2 of the factors ( f , g, h). Everything works the same way as long as each
variable is contained in exactly k factors, as long as we use L k norms on the right-hand side.
For example,
∫ 9
Ö
f1 (u, v) f2 (v, w) f3 (w, z) f4 (x, y) f5 (y, z) f6 (z, u) f7 (u, x) f8 (u, z) f9 (w, y) ≤ k fi k 3 .
u,v,w,x,y,z i=1

Here the factors in the integral correspond to edges of a 3-regular graph, below. In particular, every
variable lies in exactly 3 factors.
v w

u x

z y

More generally, each function fi can take as input any number of variables, as long as every variable
appears in exactly k functions. For example

f (w, x, y)g(w, y, z)h(x, z) ≤ k f k2 kgk2 khk2 .
w,x,y,z

The inequality is stated moreÎ generally below. Given x = (x1, . . . , xm ) ∈ X1 × · · · × Xm and I ⊂ [m],
we write π I (x) = (xi )i∈I ∈ i∈I Xi for the projection onto the coordinate subspace of I.

Theorem 5.3.6 (Generalized Hölder). Let X1, . . . , Xm be measure spaces. Let A1, . . . , A` ⊂ [m]
such that each element of [m] appears in exactly k different A0i s. For each i ∈ [m], let fi :
Î
j∈Ai X j →
R. Then ∫
f1 (π A1 (x)) · · · f` (π A` (x)) dx ≤ k f1 k k · · · k f` k k .
X1 ×···×X`
Furthermore, if every Xi is a probability space, then we can relax the hypothesis to “each element
of [m] appears in at most k different Ai ’s.
The version of Theorem 5.3.6 with each Xi being a probability space is useful for graphons.

Corollary 5.3.7. For any graph F with maximum degree at most k, and graphon W,

t(F, W) ≤ kW k e(F)
k
.
5.3. HÖLDER 147

In particular, since ∫
kW k kk = W k ≤ t(K2, W),
the inequality implies that
t(F, W) ≤ t(K2, W)e(F)/k .
This implies the upper bound on clique densities (Theorems 5.1.2 and 5.1.5). The stronger statement
of Corollary 5.3.7 with the L k norm of W on the right-hand side has no direct interpretations for
subgraph densities, but it is important for certain applications such as to understanding large
deviation rates in random graphs (Lubetzky and Zhao 2017).
More generally, using different L p norms for different factors in Hölder’s inequality, we have
the following statement (Finner 1992).

Theorem 5.3.8 (Generalized Hölder). Let X1, . . . , Xm be measure spaces. For each i ∈ [`], let
Î
pi ≥ 1, let Ai ⊂ [m], and fi : j∈Ai X j → R. If either
(1) i: j∈Ai 1/pi = 1 for each j ∈ [m],
Í
OR Í
(2) each Xi is a probability space and i: j∈Ai 1/pi ≤ 1 for each j ∈ [m],
then ∫
f1 (π A1 (x)) · · · f` (π A` (x)) dx ≤ k f1 k p1 · · · k f` k p` .
X1 ×···×X`
The proof proceeds by applying Hölder’s inequality k times in succession, once for each variable
xi ∈ Xi , nearly identically to the proof of Theorem 5.3.4.
Now we turn to another graph inequality that where the above generalization of Hölder’s
inequality plays a key role.
Question 5.3.9. Fix d. Among d-regular graphs, which graph G maximizes i(G)1/v(G) , where i(G)
denotes the number of independent sets of G.
The answer turns out to be G = Kd,d . We can also take G to be a disjoint union of copies of
Kd,d ’s, and this would not change i(G)1/v(G) . This result, stated below, was shown by Kahn (2001)
for bipartite regular graphs G, and later extended by Zhao (2010) to all regular graphs G.

Theorem 5.3.10. For every n-vertex d-regular graph G,


i(G) ≤ i(Kd,d )n/(2d) = (2d+1 − 1)n/(2d) .
The set of independent sets of G is in bijection with the set of graph homomorphisms from G
to the following graph:

Indeed, a map between their vertex sets form a graph homomorphism if and only if the vertices of
G that map to the non-looped vertex is an independent set of G.
Let us first prove Theorem 5.3.10 for bipartite regular G. The following more general inequality
was shown by Galvin and Tetali (2004). It implies the bipartite case of Theorem 5.3.10 by the
above discussion.

Theorem 5.3.11. For every n-vertex d-regular graph G, and any graph H (allowing looped vertices
on H)
hom(G, H) ≤ hom(Kd,d, H)n/(2d) .
148 5. GRAPH HOMOMORPHISM INEQUALITIES

This is equivalent to the following statement.

Theorem 5.3.12. For any d-regular bipartite graph F,


2
t(F, W) ≤ t(Kd,d, W)e(F)/d
Let us prove this theorem in the case F = C6 to illustrate the technique more concretely. The
general proof is basically the same. Let

f (x1, x2 ) = W(x1, y)W(x2, y).
y

This function should be thought of the codegree of vertices x1 and x2 . Then, grouping the factors
in the integral according to their right-endpoint, we have
x1 y1
x2 y2
x3 y3


t(C6, W) = W(x1, y1 )W(x2, y1 )W(x1, y2 )W(x3, y2 )W(x2, y3 )W(x2, y3 )
x1,x2,x3,y1,y2,y3
∫ ∫  ∫  ∫ 
= W(x1, y1 )W(x2, y1 ) W(x1, y2 )W(x3, y2 ) W(x2, y3 )W(x2, y3 )
x1,x2,x3 y1 y2 y3

= f (x1, x2 ) f (x1, x3 ) f (x2, x3 )
x1,x2,x3
≤ k f k 32 [by generalized Hölder, Theorem 5.3.6]

On the other hand, we have



k f k2 =
2
f (x1, x2 )2
x ,x
∫ 1 2 ∫  ∫ 
= W(x1, y1 )W(x2, y1 ) W(x1, y2 )W(x2, y2 )
x1,x2 y1 y2

= W(x1, y1 )W(x2, y1 )W(x1, y2 )W(x2, y2 )
x1,x2,y1,y2
= t(C4, W).

x1 y1
x2 y2

This proves Theorem 5.3.12 in the case F = C6 . The theorem in general can be proved via a similar
calculation and left to the readers as an exercise.
Remark 5.3.13. Kahn (2001) first proved the bipartite case of Theorem 5.3.10 using Shearer’s
entropy inequality, which we will see in Section 5.5. His technique was extended by Galvin and
Tetali (2004) to prove Theorem 5.3.11. The proof using generalized Hölder’s inequality presented
here was given by Lubetzky and Zhao (2017).
5.3. HÖLDER 149

So far we proved Theorem 5.3.10 for bipartite regular graphs. To prove it for all regular graphs,
we apply the following inequality by Zhao (2010). Here G × K2 (tensor product) is the bipartite
double cover of G. An example is illustrated below:

G G × K2
The vertex set of G × K2 is V(G) × {0, 1}. Its vertices are labeled vi with v ∈ V(G) and i ∈ {0, 1}.
Its edges are u0 v1 for all uv ∈ E(G). Note that G × K2 is always a bipartite graph.

Theorem 5.3.14. For every graph G,

i(G)2 ≤ i(G × K2 ).
Assuming Theorem 5.3.14, we can now prove Theorem 5.3.10 by reducing the statement to the
bipartite case, which we proved earlier. Indeed, for every d-regular graph G,
i(G) ≤ i(G × K2 )1/2 ≤ i(Kd,d )n/(2d),
where the last step follows from applying Theorem 5.3.10 to the bipartite graph G × K2 .
Proof of Theorem 5.3.14. Let 2G denote a disjoint union of two copies of G. Label its vertices by
vi with v ∈ V and i ∈ {0, 1} so that its edges are ui vi with uv ∈ E(G) and i ∈ {0, 1}. We will
give an injection φ : I(2G) → I(G × K2 ). Recall that I(G) is the set of independent sets of G. The
injection would imply i(G)2 = i(2G) ≤ i(G × K2 ) as desired.
Fix an arbitrary order on all subsets of V(G). Let S be an independent set of 2G. Let
Ebad (S) := {uv ∈ E(G) : u0, v1 ∈ S}.
Note that Ebad (S) is a bipartite subgraph of G, since each edge of Ebad has exactly one endpoint
in {v ∈ V(G) : v0 ∈ S} but not both (or else S would not be independent). Let A denote the first
subset (in the previously fixed ordering) of V(G) such that all edges in Ebad (S) have one vertex in A
and the other outside A. Define φ(S) to be the subset of V(G) × {0, 1} obtained by “swapping” the
pairs in A, i.e., for all v ∈ A, vi ∈ φ(S) if and only if v1−i ∈ S for each i ∈ {0, 1}, and for all v < A,
vi ∈ φ(S) if and only if vi ∈ S for each i ∈ {0, 1}. It is not hard to verify that φ(S) is an independent
set in G × K2 . The swapping procedure fixes the “bad” edges.

bad edges swap to get


bolded indep set

2G G × K2 G × K2

It remains to verify that φ is an injection. For every S ∈ I(2G), once we know T = φ(S), we
can recover S by first setting
0
Ebad (T) = {uv ∈ E(G) : ui, vi ∈ T for some i ∈ {0, 1}},
so that Ebad (S) = Ebad
0 (T), and then finding A as earlier and swapping the pairs of A back. (Remark:

it follows that T ∈ I(G × K2 ) lies in the image of φ if and only if Ebad


0 (T) is bipartite.) 
150 5. GRAPH HOMOMORPHISM INEQUALITIES

Remark 5.3.15 (Reverse Sidorenko). Does Theorem 5.3.11 generalize to all regular graphs G like
Theorem 5.3.10? Unfortunately, no. For example, when H = consists of two isolated loops,
hom(G, H) = 2c(G) , with c(G) being the number of connected components of G. So hom(G, H)1/v(G)
is minimized among d-regular graphs G for G = Kd+1 , which is the connected d-regular graph
with the fewest vertices.
Theorem 5.3.11 actually extends to every triangle-free regular graph G. Furthermore, for every
non-triangle-free regular graph G, there is some graph H for which the inequality in Theorem 5.3.11
fails.
There are several families interesting graphs H where Theorem 5.3.11 is known to extend to all
regular bipartite G. Notably, this is true for H = Kq , which is significant since hom(G, Kq ) is the
number of proper q-colorings of G.
There are also generalizations of the above to non-regular graphs. For example, for a graph G
without isolated vertices, letting du denote the degree of u ∈ V(G), we have
Ö
i(G) ≤ i(Kdu,dv )1/(du dv ) .
uv∈E(G)

And similarly for the number of proper q-colorings. In fact, the results mentioned in this remark
about regular graphs are proved by induction on vertices of G, and thus require considering the
larger family of not necessarily regular graphs G.
The results discussed in this remark are due to Sah, Sawhney, Stoner, and Zhao (2019;
2020). They introduced the term reverse Sidorenko inequalities to describe these inequalities
2
t(F, W)1/e(F) ≤ t(Kd,d, W)1/d , which mirror the inequality t(F, W)1/e(F) ≥ t(K2, W) in Sidorenko’s
conjecture. Also see the earlier survey by Zhao (2017) for discussions of related results and open
problems.
Exercise 5.3.16. Let F be a bipartite graph with vertex bipartition A ∪ B such that every vertex in
B has degree d. Let du denote the degree of u in F. Prove that for every graphon W,
Ö
t(F, W) ≤ t(Kdu,dv , W)1/(du dv ) .
uv∈E(G)

Exercise 5.3.17. For a graph G, let fq (G) denote the number of maps V(G) → {0, 1, . . . , q} such
that f (u) + f (v) ≤ q for every uv ∈ E(G). Prove that for every n-vertex d-regular graph G (not
necessarily partite),
fq (G) ≤ fq (Kd,d )n/(2d) .

5.4. Lagrangian
Here is another proof of Turán’s theorem due to Motzkin and Straus (1965). It can be viewed
as a continuous/analytic analogue of the Zykov symmetrization proof of Turán’s theorem from
Section 1.2 (the third proof there).
  2
Theorem 5.4.1 (Turán theorem). Every n-vertex Kr+1 -free graph has at most 1 − 1r n2 edges.

Proof. Let G be a Kr+1 -free graph on vertex set [n]. Consider the function
Õ
f (x1, . . . , xn ) = xi x j .
i j∈E(G)
5.4. LAGRANGIAN 151

We want to show that    


1 1 1 1
f , . . ., ≤ 1− .
n n 2 r
In fact, we will show that
 
1 1
max f (x1, . . . , xn ) ≤ 1− .
x1,...,xn ≥0 2 r
x1 +···+xn =1

By compactness, the maximum is achieved at some x = (x1, . . . , xn ). Let us choose such a


maximizing vector with the minimum support (i.e., the number of nonzero coordinates).
Suppose i j < E(G) for some pair of distinct xi, x j > 0. If we replace (xi, x j ) by (s, xi + x j − s),
then f changes linearly in s (since xi x j does not come up as a summand in f ). Since f is already
maximized at x, it must not change with s. So we can replace (xi, x j ) by (xi + x j , 0), which keeps
f the same while decreasing the number of nonzero coordinates of x.
Thus the support of x is a clique in G. By labeling vertices, say that x1, . . . , x k > 0 and
x k+1 = x k+2 = · · · = xn = 0. Since G is Kr+1 -free, this clique has size k ≤ r. So
  k !2    
Õ 1 1 Õ 1 1 1 1
f (x) = xi x j ≤ 1− xi = 1− ≤ 1− . 
1≤i≤ j≤k
2 k i=1 2 k 2 r

Remark 5.4.2 (Hypergraph Lagrangians). The Lagrangian of a hypergraph H with vertex set [n]
is defined to be
Õ Ö
λ(H) := max f (x1, . . . , xn ), where f (x1, . . . , xn ) = xi .
x1,...,xn ≥0
x1 +···+xn =1 e∈E(H) i∈e

It is a useful tool for certain hypergraph Turán problems. The above proof of Turán’s theorem
shows that for every graph G, λ(G) = (1 − 1/ω(G))/2, where ω(G) is the size of the largest clique
in G. A maximizing x has coordinate 1/ω(G) on vertices of the clique and zero elsewhere.
As an alternate but equivalent perspective, the above proof can rephrased in terms of maximizing
the edge density among Kr+1 -free vertex-weighted graphs (vertex weights are given by the vector
x above). The proof shifts weights between non-adjacent vertices while not decreasing the edge
density, and this process preserves Kr+1 -freeness.
Using a similar technique, we show that to check whether a linear inequality in clique densities
in G holds, it suffices to check it for G being cliques. The next theorem is due to Bollobás (1976).
We first need the following lemma about the extrema of a symmetric polynomial over a simple.

Lemma 5.4.3. Let f (x1, . . . , xn ) be a symmetric polynomial with real coefficients. Suppose x =
(x1, . . . , xn ) minimizes f (x) among all vectors x ∈ Rn with x1, . . . , xn ≥ 0 and x1 + · · · + xn = 1,
and furthermore x has minimum support size among all such minimizers. Then, up to permuting
the coordinates of x, there is some 1 ≤ k ≤ n so that
x1 = · · · = x k = 1/k and x k+1 = · · · = xn = 0.
Proof. Suppose x1, . . . , x k > 0 and x k+1 = · · · = xn = 0 with k ≥ 2. Fixing x3, . . . , xn , we see that
as a function of (x1, x2 ), f has the form
Ax1 x2 + Bx1 + Bx2 + C
152 5. GRAPH HOMOMORPHISM INEQUALITIES

where A, B, C depend on x3, . . . , xn . Notably the coefficients of x1 and x2 agree due since f is a
symmetric polynomial. Holding x1 + x2 fixed, f has the form
Ax1 x2 + C 0 .
If A ≥ 0, then holding x1 + x2 fixed, we can set either x1 or x2 to be zero while not increasing f ,
which contradicts the hypothesis that the minimizing x has minimum support size. So A < 0, so
that with x1 + x2 held fixed, Ax1 x2 + C 0 is minimized uniquely at x1 = x2 . Thus x1 = x2 . Likewise,
x1 = · · · = x k , as claimed. 

Theorem 5.4.4 (Linear inequalities between clique densities). Let c1, · · · , c` ∈ R. The inequality
Õ̀
cr t(Kr , G) ≥ 0
r=1
is true for every graph G if and only if it is true with G = Kn for every positive integer n.
More explicitly, the above inequality holds for all graphs G if and only if
Õ̀ n(n − 1) · · · (n − r + 1)
cr · ≥0 for every n ∈ N.
r=1
nr
Since this is a single variable polynomial in m, it is usually easy to check this inequality. We will
see some examples right after the proof.
Proof. Suppose the displayed inequality holds for all cliques G. Let G be an arbitrary graph with
vertex set [n]. Let Õ
fr (x1, . . . , xn ) = xi1 · · · xir
{i1,...,ir }
r-clique in G
and
Õ̀
f (x1, . . . , xn ) = r!cr fr (x1, . . . , xn ).
r=1
So
Õ̀
f (1/n, . . . , 1/n) = cr t(Kr , G).
r=1
It suffices to prove that
min f (x1, . . . , xn ) ≥ 0.
x1,...,xn ≥0
x1 +···+xn =1
By compactness, we can assume that the minimum is attained at some x. Among all minimizing
x, choose one with the smallest support (i.e., the number of nonzero coordinates).
As in the previous proof, if i j < E(G) for some pair of distinct xi, x j > 0, then, replacing
(xi, x j ) by (s, xi + x j − s), f changes linearly in s. Since f is already maximized at x, it must not
change with s. So we can replace (xi, x j ) by (xi + x j , 0), which keeps f the same while decreasing
the number of nonzero coordinates of x. Thus the support of x is a clique in G. Suppose x is
supported on coordinates [k] So f is a symmetric polynomial in x1, . . . , x k . Lemma 5.4.3 implies
Í`
that x1 = · · · = x k = 1/k. Then f (x) = r=1 cr t(Kr , Kk ) ≥ 0 by hypothesis. 
Theorem 5.4.4 can be equivalently instated in terms of the convex hull of the region of all
possible clique density tuples.
5.6. EXERCISES 153

Corollary 5.4.5. Let ` ≥ 3. In R`−1 , the convex hull of


{(t(K2, W), t(K3, W), · · · , t(K`, W)) : graphons W }
is the same as the convex hull of
{(t(K2, Kn ), t(K3, Kn ), · · · , t(K`, Kn )) : n ∈ N} .
For ` = 3, the points
   
1 1 2
(t(K2, Kn ), t(K3, Kn )) = 1 − , 1 − 1− , n ∈ N,
n n n
are the extremal points of the convex hull of the edge-triangle region from (5.1.1). The actual
region, illustrated in Figure 5.1.1, has lower boundary consisting of concave curves connecting the
points (t(K2, Kn ), t(K3, Kn )).
This convex hull description easily implies Turán’s theorem (exercise).

5.5. Entropy
To be written

5.6. Exercises
Exercise 5.6.1. Prove that Ks,t is forcing whenever s, t ≥ 2.

Exercise 5.6.2∗. Let K4− be K4 with an edge removed. Prove that K4− is common. In other words,
show that for all graphons W,
t(K4−, W) + t(K4−, 1 − W) ≥ 2−4 .
The next exercise asks you to extend Goodman’s bound (Theorem 5.2.8). The inequality implies
Turán’s theorem (Theorem 1.2.4).
Exercise 5.6.3 (A lower bound on clique density). Show that for every positive integer r ≥ 3, and
graphon W,
t(Kr , W) ≥ p(2p − 1)(3p − 2) · · · ((r − 1)p − (r − 2)) .
Note that this inequality is tight when W is the associated graphon of a clique.
2
Exercise 5.6.4 (Cliquey edges). Show that every n-vertex graph with (1 − 1r ) n2 + t edges has at
least rt edges that belong to a Kr+1 .
Exercise 5.6.5∗ (Maximizing K1,2 density). Prove that, for every p ∈ [0, 1], among all graphons W
with t(K2, W) = p, the maximum possible value of t(K1,2, W) is attained by either a “clique” or a
“hub” graphon, illustrated below.
0 a 1 0 a 1
1
1 a
a 0
0
1 1
clique graphon hub graphon
W(x, y) = 1max{x,y } ≤a W(x, y) = 1min{x,y } ≤a
154 5. GRAPH HOMOMORPHISM INEQUALITIES

Further reading
The book Large Networks and Graph Limits by Lovász (2012) contains an excellent treatment
of graph homomorphism inequalities in Section 2.1 and Chapter 16.
The survey Flag Algebras: An Interim Report by Razborov (2013) contains a survey of results
obtained using the flag algebra method.
CHAPTER 6

Forbidding 3-term arithmetic progressions

In this chapter, we study Roth’s theorem, which says that every 3-AP-free subset of [N] has size
o(N).
Previously, in Section 2.4, we gave a proof of Roth’s theorem using the graph regularity lemma.
The main goal of this chapter is to give a Fourier analytic proof of Roth’s theorem. This is also
Roth’s original proof (1953).
We begin by proving Roth’s theorem in the finite field model. That is, we first prove an
analogue of Roth’s theorem in F3n . Finite field vector spaces serves as an excellent playground for
many additive combinatorics problems. Techniques such as Fourier analysis are often simpler to
carry out in the finite field model. After we develop the techniques in the finite field model, we
then prove Roth’s theorem in the integers. It can be a good idea to first try out ideas in the finite
field model before bringing them to the integers.
Later in Section 6.5, we will see a complete different proof of Roth’s theorem in F3n using the
polynomial method, which gives significantly better quantitative bounds. This proof surprised
many people at the time of its discovery. However, this polynomial method technique is only
applicable in the finite field setting, and it is not known how to apply it in the integers.

6.1. Fourier analysis in finite field vector spaces


We review some basic facts about Fourier analysis in Fnp for a prime p. Everything here can be
extended to arbitrary abelian groups. As we saw in Section 3.3, eigenvalues of Cayley graphs on
an abelian group and the Fourier transform are intimately related.
Throughout this section, we fix a prime p and let
ω = exp(2πi/p).

Definition 6.1.1 (Fourier transform in Fnp ). The Fourier transform f : Fnp → C is a function b
f : Fnp →
C defined by setting, for each r ∈ Fnp ,
1 Õ
f (r) := E x∈Fnp f (x)ω−r·x = n
b f (x)ω−r·x
p x∈Fn
p

where r · x = r1 x1 + · · · + rn xn .
In particular, bf (0) = E f is the average of f . This value often plays a special role compared to
other values f (r).
b
To simplify notation, it is generally understood that the variables being averaged or summed
over are varying uniformly in the domain Fnp .
Let us now state several important properties of the Fourier transform. We will see that all these
properties are consequences of the orthogonality of the Fourier basis.
The next result allows us to write f in terms of b f.
155
156 6. FORBIDDING 3-TERM ARITHMETIC PROGRESSIONS

Theorem 6.1.2 (Fourier inversion formula). Let f : Fnp → C. For every x ∈ Fnp ,
Õ
f (x) = f (r)ωr·x .
b
r∈Fnp

The next result tells us that the Fourier transform preserves inner products.

Theorem 6.1.3 (Parseval’s identity). Given f , g : Fnp → C, we have


Õ
E x∈F3n f (x)g(x) = f (r)b
b g (r).
r∈F3n

In particular, as a special case ( f = g),


Õ
E x∈F3n | f (x)| 2 = f (r)| 2 .
|b
r∈F3n

As is nowadays the standard in additive combinatorics, we adopt the following convention for
the Fourier transform in finite abelian groups:
average in physical space E f
Íb
and sum in frequency (Fourier) space f.
For example, following this convention, we define an “averaging” inner product for functions
f , g : Fnp → C by
h f , gi := E x∈Fnp f (x)g(x) and k f k := h f , f i 1/2 .
In the frequency/Fourier domain, we define the “summing” inner product for functions α, β : Fnp →
C by Õ
hα, βi `2 := α(x)β(x). and kαk `2 := hα, αi `1/2
2
x∈Fnp
Writing γr : Fnp → C for the function defined by
γr (x) := ωr·x
(this is a character of the group Fnp ), the Fourier transform can be written as
f (r) = E x f (x)γr (x) = h f , γr i .
b (6.1.1)
Parseval’s identity can be stated as
h f , gi = h b
f, b
g i`2 and kfk = kb
f k`2 .
With these conventions, we often do not need to keep track of normalization factors.
The above identities can be proved via direct verification, by plugging in the formula for the
Fourier transform. We give a more conceptual proof below.
Proof of the Fourier inversion formula (Theorem 6.1.2). Let γr (x) = ωr·x . Then the set of function
{γr : r ∈ Fnp }
forms an orthonormal basis for the space of functions Fnp → C with respect to the averaging inner
product h·, ·i. Indeed, (
1 if r = s,,
hγr , γs i = E x ω(r−s)·x =
0 if r , s
6.1. FOURIER ANALYSIS IN FINITE FIELD VECTOR SPACES 157

Furthermore, there are pn functions (as r ranges over Fnp ). So they form a basis of the pn -dimensional
vector space of all functions f : Fnp → C. We will call this basis the Fourier basis.
Now, given an arbitrary f : Fnp → C, the “coordinate” of f with respect to the basis vector γr
Fourier basis is h f , γr i = b
f (r) by (6.1.1). So
Õ
f = f (r)γr .
b
r

This is precisely the Fourier inversion formula. 


Proof of Parseval’s identity (Theorem 6.1.3). Continuing from the previous proof, since the Fourier
basis is orthonormal, we can evaluate h f , gi with respects to coordinates in this basis, thereby by
yielding
Õ Õ
h f , gi = h f , γr i hg, γr i = f (r)b
b g (r). 
r∈Fnp r∈Fnp

Remark 6.1.4. Parseval’s identity is sometimes also referred to by the name Plancheral. Parseval
derived the identity for the Fourier series of a periodic function on R, whereas Plancheral derived
it for the Fourier transform on R.
The convolution is an important operation.
Definition 6.1.5 (Convolution). Define f , g : Fnp → C, define f ∗ g : Fnp → C by

( f ∗ g)(x) := E y∈Fnp f (y)g(x − y).


In other words, ( f ∗ g)(x) is the average of f (y)g(z) over all pairs (y, z) with y + z = x.
Example 6.1.6. (a) If f is supported on A ⊂ Fnp and g is supported on B ⊂ Fnp , then f ∗ g is
supported on the sum set A + B = {a + b : a ∈ A, b ∈ B}.
(b) Let W be a subspace of Fnp . Let µW = (pn /|W |)1W be the indicator function on W normalized
so that EµW = 1. Then for any f : Fnp → C, the function f ∗ µW is obtained from f by replacing
its value at x by its average value on the coset x + W.
The second example suggests that convolution can be thought of as smoothing a function,
damping its potentially rough perturbations.
The Fourier transform conveniently converts convolutions to multiplication.

Theorem 6.1.7 (Convolution identity). For any f , g : Fnp → C and any r ∈ Fnp ,

f ∗ g(r) = b
š f (r)b
g (r).
Proof. We have

f ∗ g(r) = E x ( f ∗ g)(x)ω−r·x = E x E y,z:y+z=x f (y)g(z)ω−r·(y+z)


š
= E y,z f (y)g(z)ω−r·(y+z) = E y f (y)ω−r·y (Ez g(z)ω−r·z ) = b

f (r)b
g (r). 
By repeated applications of the convolution identity, we have
( f1 ∗ · · · ∗ fk )∧ = b f2 · · · b
f1 b fk
(here we write f ∧ for b
f for typographical reasons).
158 6. FORBIDDING 3-TERM ARITHMETIC PROGRESSIONS

Definition 6.1.8 (3-AP density). Given f , g, h : F3n → C, we write


Λ( f , g, h) := E x,y f (x)g(x + y)h(x + 2y), (6.1.2)
and
Λ3 ( f ) := Λ( f , f , f ), (6.1.3)
Note that for any A ⊂ F3n ,
Λ(1 A) = 3−2n |{(x, y) : x, x + y, x + 2y ∈ A}| = “3-AP density of A.”.
Here include “trivial” 3-APs (i.e., those with with y = 0).
The following identity, relating the Fourier transform and 3-APs, plays a central role in the
Fourier analytic proof of Roth’s theorem.

Proposition 6.1.9 (Fourier and 3-AP). Let p be an odd prime. If f , g, h : Fnp → C, then
Õ
Λ( f , g, h) = f (r)b
b g (−2r)b
h(r).
r

We will give two proofs of this proposition. The first proof is more mechanically straightforward.
It is similar to proof of the convolution identity earlier. The second proof directly applies the
convolution identity, and may be a bit more abstract/conceptual.
First proof. We expand the left-hand side using the formula for Fourier inversion.
! ! !
Õ Õ Õ
E x,y f (x)g(x + y)h(x + 2y) = E x,y f (r1 )ω
b r1 ·x
g (r2 )ω
b r2 ·(x+y)
h(r3 )ω
b r3 ·(x+2y)

r1 r2 r3

h(r3 )E x ω x·(r1 +r2 +r3 ) E y ω y·(r2 +2r3 )


Õ
= f (r1 )b
b g (r2 )b
r1,r2,r3
Õ
= f (r1 )b
b g (r2 )b
h(r3 )1r1 +r2 +r3 =0 1r2 +2r3 =0
r1,r2,r3
Õ
= f (r)b
b g (−2r)b
h(r).
r
The last due is to due r1 + r2 + r3 = 0 and r2 + 2r3 together imply r1 = r2 = r3 . 
Second proof. Write g1 (y) = g(−y/2). So gb1 (r) = b
g (−2r). Applying the convolution identity.
E x,y f (x)g(x + y)h(x + 2y) = E x,y,z:x−2y+z=0 f (x)g(y)h(z)
= E x,y,z:x+y+z=0 f (x)g1 (y)h(z)
= ( f ∗ g1 ∗ h)(0)
Õ
= fœ∗ g1 ∗ h(r) [Fourier inversion]
r
Õ
= f (r)gb1 (r)b
b h(r) [Convolution identity]
r

Õ
= f (r)b
b g (−2r)b
h(r). 
r
6.2. ROTH’S THEOREM IN THE FINITE FIELD MODEL 159

Remark 6.1.10. In the following section, we will work in F3n . Since −2 = 1 (and so g1 = g above),
the proof looks even simpler. In particular, by Fourier inversion and the convolution identity,
Λ3 (1 A) = 3−2n {(x, y, z) ∈ A3 : x + y + z = 0}
Õ Õ
= (1 A ∗ 1 A ∗ 1 A)(0) = (1 A ∗ 1 A ∗ 1 A)∧ (r) = 1cA(r)3 . (6.1.4)
r r

When A = −A, the eigenvalues of the adjacency matrix of the Cayley graph Cay(F3n, A) are 3n 1cA(r),
r ∈ F3n (c.f. Section 3.3). The quantity 32n Λ3 (1 A) is the number of closed walks of length 3 in the
Cayley graph Cay(Fnp, A). So the above identity is saying that the number of closed walks of length
3 in Cay(F3n, A) equals to the third moment of the eigenvalues of the adjacency matrix, which is a
general fact for every graph. (When A , −A, we can consider the directed or bipartite version of
this argument.)
The following exercise generalizes the above identity.
Exercise 6.1.11. Let a1, . . . , a k be nonzero integers, none divisible by the prime p. Let f1, . . . , fk : Fnp →
C. Show that
Õ
E x1,...,xk ∈Fnp :a1 x1 +···+ak xk =0 f1 (x1 ) · · · fk (x k ) = f1 (r) · · · b
b fk (r).
r∈Fnp

6.2. Roth’s theorem in the finite field model


In this section, we use Fourier analytic methods to prove the following finite field analogue of
Roth’s theorem (Meshulam 1995). Later in the chapter, we will convert this proof to the integer
setting.
In an abelian group, a set A is said to be 3-AP-free if A does not have three distinct elements of
the form x, x + y, x + 2y. A 3-AP-free subset of F3n is also called a cap set. The cap set problem
asks to determined the size of the largest cap set in F3n .

Theorem 6.2.1 (Roth’s theorem in F3n ). Every 3-AP-free subset of F3n has size O(3n /n).
Remark 6.2.2 (General finite fields). We work in $\mathbb{F}_3^n$ mainly for convenience. The argument presented in this section also shows that for every odd prime $p$, there is some constant $C_p$ so that every 3-AP-free subset of $\mathbb{F}_p^n$ has size $\le C_p p^n / n$.
In $\mathbb{F}_3^n$, there are several equivalent interpretations of $x, y, z \in \mathbb{F}_3^n$ forming a 3-AP (allowing the possibility of a trivial 3-AP with $x = y = z$):
• $(x, y, z) = (x, x + d, x + 2d)$ for some $d$;
• $x - 2y + z = 0$;
• $x + y + z = 0$;
• $x, y, z$ are all equal or are the three distinct points of a line in $\mathbb{F}_3^n$;
• for each $i$, the $i$-th coordinates of $x, y, z$ are all distinct or all equal.
Remark 6.2.3 (SET card game). The card game SET comes with a deck of 81 cards (see Figure 6.2.1). Each card has four features, and each feature takes one of three possible values:
• Number: 1, 2, 3;
• Symbol: diamond, squiggle, oval;
• Shading: solid, striped, open;
• Color: red, green, purple.

[Figure 6.2.1. The deck of 81 cards in the game SET, with a 20-card cap set highlighted. This is the maximum size of a cap set in $\mathbb{F}_3^4$. Image: Wikipedia.]

Each of the $3^4 = 81$ combinations appears exactly once as a card.
In this game, a combination of three cards is called a ``set'' if each of the four features shows up as all identical or all distinct among the three cards. For example, the three cards shown below form a ``set'': number (all distinct), symbol (all diamond), shading (all distinct), color (all red).
In a standard play of the game, the dealer lays down twelve cards on the table until some player finds a ``set'', in which case the player keeps the three cards of the ``set'' as their score. Then the
dealer replenishes the table by laying down more cards. If no set is found, then the dealer continues
to lay down more cards until a set is found.
The cards of the game correspond to points of $\mathbb{F}_3^4$. A ``set'' is precisely a 3-AP. The cap set problem in $\mathbb{F}_3^4$ asks for the largest number of cards containing no ``set.'' The size of the maximum cap set in $\mathbb{F}_3^4$ is 20 (Pellegrino 1970), and an example is shown in Figure 6.2.1.
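The correspondence between ``sets'' and 3-APs is easy to verify by computer; the following small Python sketch (ours, not part of the text) checks that three cards form a ``set'' exactly when their coordinate vectors sum to zero in $\mathbb{F}_3^4$, and that any two distinct cards complete to a unique ``set.''

from itertools import product

cards = list(product(range(3), repeat=4))   # the 81 cards, encoded as points of F_3^4

def is_set(a, b, c):
    # each feature all-equal or all-distinct <=> the three values sum to 0 mod 3
    return all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))

a, b = cards[0], cards[17]
completions = [c for c in cards if c not in (a, b) and is_set(a, b, c)]
assert completions == [tuple((-x - y) % 3 for x, y in zip(a, b))]
print(completions[0])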
Here is the proof strategy of Roth’s theorem in F3n :
(1) A 3-AP-free set has a large Fourier coefficient.
(2) A large Fourier coefficient implies density increment on some hyperplane.
(3) Iterate.
As in the proof of the graph regularity lemma (where we refined partitions to obtain an energy
increment), the above process must terminate in a bounded number of steps since the density of a
subset is always between 0 and 1.
Similar to what we saw in Chapter 3 on pseudorandom graphs, a set $A \subseteq \mathbb{F}_3^n$ has pseudorandom properties if and only if all its Fourier coefficients $\hat{1_A}(r)$, for $r \neq 0$, are small in absolute value.
When A is pseudorandom in this Fourier-uniform sense, the 3-AP-density of A is similar to that
of a random set with the same density. On the flip side, a large Fourier coefficient in A points
to non-uniformity along the direction of the Fourier character. Then we can restrict A to some
hyperplane and extract a density increment.
The following counting lemma shows that a Fourier-uniform subset of $\mathbb{F}_3^n$ has 3-AP density similar to that of a random set. It has a similar flavor to the proof that EIG implies C4 in Theorem 3.1.1. It is also related to the counting lemma for graphons (Theorem 4.5.1). Recall the 3-AP density $\Lambda_3$ from Definition 6.1.8.

Lemma 6.2.4 (3-AP counting lemma). Let $f \colon \mathbb{F}_3^n \to [0, 1]$. Then
\[ \bigl| \Lambda_3(f) - (\mathbb{E} f)^3 \bigr| \le \max_{r \neq 0} |\hat{f}(r)| \, \|f\|_2^2. \]

Proof. By Proposition 6.1.9 (also see (6.1.4)),
\[ \Lambda_3(f) = \sum_r \hat{f}(r)^3 = \hat{f}(0)^3 + \sum_{r \neq 0} \hat{f}(r)^3. \]
Since $\mathbb{E} f = \hat{f}(0)$, we have
\[ \bigl| \Lambda_3(f) - (\mathbb{E} f)^3 \bigr| \le \sum_{r \neq 0} |\hat{f}(r)|^3 \le \max_{r \neq 0} |\hat{f}(r)| \cdot \sum_r |\hat{f}(r)|^2 = \max_{r \neq 0} |\hat{f}(r)| \, \|f\|_2^2. \]
The final step is by Parseval. $\square$
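As an illustration (a Python sketch of ours, with made-up helper names), one can verify the inequality of Lemma 6.2.4 numerically for a random subset of $\mathbb{F}_3^3$, using $\|1_A\|_2^2 = |A|/3^n$:

import itertools, cmath, random

p, n = 3, 3
pts = list(itertools.product(range(p), repeat=n))
w = cmath.exp(2j * cmath.pi / p)
A = set(random.sample(pts, 10))
f = {x: 1.0 if x in A else 0.0 for x in pts}

hat = {r: sum(f[x] * w ** (-sum(a * b for a, b in zip(r, x)) % p) for x in pts) / p ** n
       for r in pts}
lam3 = sum(f[x] * f[tuple((a + b) % p for a, b in zip(x, y))]
                * f[tuple((a + 2 * b) % p for a, b in zip(x, y))]
           for x in pts for y in pts) / p ** (2 * n)
alpha = len(A) / p ** n
err = max(abs(hat[r]) for r in pts if r != (0,) * n) * alpha   # max |hat 1_A(r)| * ||1_A||_2^2
print(abs(lam3 - alpha ** 3) <= err + 1e-12)                   # True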

Remark 6.2.5. If we bound $|\hat{f}(r)|^3$ by $\|\hat{f}\|_3^3$ in the above inequality step, we would not be able to successfully conclude. Instead, Parseval comes to the rescue. See Remark 3.1.17 for a similar issue.

Step 1. A 3-AP-free set has a large Fourier coefficient.

Lemma 6.2.6. Let $A \subseteq \mathbb{F}_3^n$ and $\alpha = |A|/3^n$. If $A$ is 3-AP-free and $3^n \ge 2\alpha^{-2}$, then there is $r \neq 0$ such that $|\hat{1_A}(r)| \ge \alpha^2/2$.

Proof. Since $A$ is 3-AP-free, $\Lambda_3(1_A) = |A|/3^{2n} = \alpha/3^n$, as all 3-APs in $A$ are trivial, i.e., have common difference zero. By the counting lemma, Lemma 6.2.4,
\[ \alpha^3 - \frac{\alpha}{3^n} = \alpha^3 - \Lambda_3(1_A) \le \max_{r \neq 0} |\hat{1_A}(r)| \, \|1_A\|_2^2 = \max_{r \neq 0} |\hat{1_A}(r)| \, \alpha. \]
By the hypothesis $3^n \ge 2\alpha^{-2}$, the left-hand side above is at least $\alpha^3/2$. So there is some $r \neq 0$ with $|\hat{1_A}(r)| \ge \alpha^2/2$. $\square$

Step 2. A large Fourier coefficient implies density increment on some hyperplane.

Lemma 6.2.7. Let $A \subseteq \mathbb{F}_3^n$ with $\alpha = |A|/3^n$. Suppose $|\hat{1_A}(r)| \ge \delta > 0$ for some $r \neq 0$. Then $A$ has density at least $\alpha + \delta/2$ when restricted to some hyperplane.
Proof. We have
\[ \hat{1_A}(r) = \mathbb{E}_x 1_A(x) \omega^{-r \cdot x} = \frac{\alpha_0 + \alpha_1 \omega + \alpha_2 \omega^2}{3}, \]
where $\alpha_0, \alpha_1, \alpha_2$ are the densities of $A$ on the three cosets of $r^\perp$. We want to show that one of $\alpha_0, \alpha_1, \alpha_2$ is significantly larger than $\alpha$. This is easy to check directly, but let us introduce a trick that we will also use later in the integer setting.
We have $\alpha = (\alpha_0 + \alpha_1 + \alpha_2)/3$. By the triangle inequality,
\begin{align*}
3\delta &\le \bigl| \alpha_0 + \alpha_1 \omega + \alpha_2 \omega^2 \bigr| = \bigl| (\alpha_0 - \alpha) + (\alpha_1 - \alpha)\omega + (\alpha_2 - \alpha)\omega^2 \bigr| \\
&\le |\alpha_0 - \alpha| + |\alpha_1 - \alpha| + |\alpha_2 - \alpha| = \sum_{j=0}^{2} \bigl( |\alpha_j - \alpha| + (\alpha_j - \alpha) \bigr),
\end{align*}
where the final equality uses $\sum_j (\alpha_j - \alpha) = 0$. Consequently, there exists $j$ such that $|\alpha_j - \alpha| + (\alpha_j - \alpha) \ge \delta$. Note that $|t| + t$ equals $2t$ if $t > 0$ and $0$ if $t \le 0$. So we deduce that $\alpha_j - \alpha \ge \delta/2$, as desired. $\square$
Combining the previous two lemmas, here is what we have proved so far.

Lemma 6.2.8 (Density increment). Let $A \subseteq \mathbb{F}_3^n$ and $\alpha = |A|/3^n$. If $A$ is 3-AP-free and $3^n \ge 2\alpha^{-2}$, then $A$ has density at least $\alpha + \alpha^2/4$ when restricted to some hyperplane. $\square$

Step 3 : Iterate the density increment.


We start with a 3-AP-free $A \subseteq \mathbb{F}_3^n$. Let $V_0 := \mathbb{F}_3^n$, and let $\alpha_0 := \alpha = |A|/3^n$. Repeatedly apply Lemma 6.2.8. After $i$ rounds, we have restricted $A$ to a codimension-$i$ affine subspace $V_i$ (with $V_0 \supseteq V_1 \supseteq \cdots$). Let $\alpha_i = |A \cap V_i|/|V_i|$ be the density of $A$ in $V_i$. As long as $2\alpha_i^{-2} \le |V_i| = 3^{n-i}$, we can apply Lemma 6.2.8 to obtain a $V_{i+1}$ with density increment
\[ \alpha_{i+1} \ge \alpha_i + \alpha_i^2/4. \]
Since $\alpha = \alpha_0 \le \alpha_1 \le \cdots \le 1$, and $\alpha_i$ increases by at least $\alpha_i^2/4 \ge \alpha^2/4$ at each step, the process terminates after $m \le 4/\alpha^2$ rounds, at which point we must have $3^{n-m} < 2\alpha_m^{-2} \le 2\alpha^{-2}$ (or else we could continue via Lemma 6.2.8). So $n < m + \log_3(2\alpha^{-2}) = O(1/\alpha^2)$, i.e., $\alpha = O(1/\sqrt{n})$. This is just shy of the bound $\alpha = O(1/n)$ that we aim to prove. So let us redo the density increment analysis more carefully to analyze how quickly $\alpha_i$ grows.

In each round, $\alpha_i$ increases by at least $\alpha^2/4$. So it takes at most $\lceil 4/\alpha \rceil$ initial rounds for the density to double. Once $\alpha_i \ge 2\alpha$, it then increases by at least $\alpha_i^2/4 \ge \alpha^2$ each round, so the next doubling takes at most $\lceil 2/\alpha \rceil$ additional rounds. Continuing, the $k$-th doubling takes at most $\lceil 4 \cdot 2^{1-k}/\alpha \rceil$ rounds. Since the density is always at most 1, it can double at most $\log_2(1/\alpha)$ times. So the total number of rounds is at most
\[ \sum_{k \le \log_2(1/\alpha)} \left\lceil \frac{4 \cdot 2^{1-k}}{\alpha} \right\rceil = O\!\left(\frac{1}{\alpha}\right). \]

Suppose the process terminates after $m$ rounds with density $\alpha_m$. Then, examining the hypothesis of Lemma 6.2.8, we find that the size of the final subspace satisfies $|V_m| = 3^{n-m} < 2\alpha_m^{-2} \le 2\alpha^{-2}$. So $n \le m + O(\log(1/\alpha)) = O(1/\alpha)$. Thus $\alpha = |A|/3^n = O(1/n)$. This completes the proof of Roth's theorem in $\mathbb{F}_3^n$ (Theorem 6.2.1).
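The following quick simulation (a Python sketch of ours, not part of the proof) illustrates the bookkeeping: iterating the density increment $\alpha \mapsto \alpha + \alpha^2/4$ reaches density 1 after $O(1/\alpha)$ rounds, matching the bound above.

def rounds_to_one(alpha):
    steps = 0
    while alpha < 1:
        alpha += alpha ** 2 / 4   # the density increment of Lemma 6.2.8
        steps += 1
    return steps

for a in (0.1, 0.01, 0.001):
    print(a, rounds_to_one(a), a * rounds_to_one(a))   # the product stays bounded (around 4)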
Remark 6.2.9 (Quantitative bounds). The best published lower bound on the size of a cap set in $\mathbb{F}_3^n$ is $\ge 2.21^n$ (Edel 2004). This is obtained by constructing a cap set in $\mathbb{F}_3^{62}$ of size $m = 928 \cdot 112^9 \ge 2.21^{62}$, which then implies, by a product construction, a cap set in $\mathbb{F}_3^{62k}$ of size $m^k$ for each positive integer $k$.
It was an open problem of great interest whether an upper bound of the form $c^n$, with constant $c < 3$, was possible on the size of cap sets in $\mathbb{F}_3^n$. With significant effort, the Fourier analytic strategy above was extended to prove an upper bound of the form $3^n/n^{1+c}$ (Bateman and Katz 2012). So it came as quite a shock to the community when a very short polynomial method proof was discovered, giving an upper bound $O(2.76^n)$ (Croot, Lev, and Pach 2017; Ellenberg and Gijswijt 2017). We will discuss this proof in Section 6.5. However, the polynomial method proof appears to be specific to the finite field model, and it is not known how to extend it to the integers.
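The product construction mentioned above is easy to check in a toy case; the following Python sketch (ours, not part of the text) verifies that the product of a maximum cap in $\mathbb{F}_3^2$ with itself is a cap set of size 16 in $\mathbb{F}_3^4$ (not optimal there, since the maximum is 20).

from itertools import product, combinations

def is_cap(S):
    # no three distinct points summing to the zero vector
    return all(any((a + b + c) % 3 for a, b, c in zip(x, y, z))
               for x, y, z in combinations(S, 3))

A = [(0, 0), (0, 1), (1, 0), (1, 1)]        # a maximum cap in F_3^2
AA = [x + y for x, y in product(A, A)]      # 16 points of F_3^4
print(is_cap(A), is_cap(AA))                # True True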

6.3. Fourier analysis in the integers


Now we review the basic notions of Fourier analysis on the integers. In the next section, we adapt the proof of Roth's theorem from $\mathbb{F}_3^n$ to $\mathbb{Z}$. The notions that we introduce below are better known as Fourier series (albeit usually with a different notational convention).
Here R/Z is the set of reals mod 1.
Definition 6.3.1 (Fourier transform in $\mathbb{Z}$). Given a finitely supported $f \colon \mathbb{Z} \to \mathbb{C}$, define $\hat{f} \colon \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ by setting, for all $\theta \in \mathbb{R}$,
\[ \hat{f}(\theta) := \sum_{x \in \mathbb{Z}} f(x) e(-x\theta), \]
where
\[ e(t) := \exp(2\pi i t), \qquad t \in \mathbb{R}. \]

Note that $\hat{f}(\theta) = \hat{f}(\theta + n)$ for all integers $n$. So $\hat{f} \colon \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ is well defined.
The various identities in Section 6.1 have counterparts stated below. We leave the proofs as
exercises for the reader.

Theorem 6.3.2 (Fourier inversion formula). Given a finitely supported $f \colon \mathbb{Z} \to \mathbb{C}$, for any $x \in \mathbb{Z}$,
\[ f(x) = \int_0^1 \hat{f}(\theta) e(x\theta) \, d\theta. \]

Theorem 6.3.3 (Parseval's identity). Given finitely supported $f, g \colon \mathbb{Z} \to \mathbb{C}$,
\[ \sum_{x \in \mathbb{Z}} f(x) \overline{g(x)} = \int_0^1 \hat{f}(\theta) \overline{\hat{g}(\theta)} \, d\theta. \]
In particular, as a special case ($f = g$),
\[ \sum_{x \in \mathbb{Z}} |f(x)|^2 = \int_0^1 |\hat{f}(\theta)|^2 \, d\theta. \]

Note the normalization conventions: we sum in the physical space Z (there is no sensible way
to average in Z) and average in the frequency space R/Z.
Definition 6.3.4 (Convolution). Given finitely supported $f, g \colon \mathbb{Z} \to \mathbb{C}$, define $f * g \colon \mathbb{Z} \to \mathbb{C}$ by
\[ (f * g)(x) := \sum_{y \in \mathbb{Z}} f(y) g(x - y). \]

Theorem 6.3.5 (Convolution identity). Given finitely supported $f, g \colon \mathbb{Z} \to \mathbb{C}$, for any $\theta \in \mathbb{R}/\mathbb{Z}$,
\[ \widehat{f * g}(\theta) = \hat{f}(\theta) \hat{g}(\theta). \]
Given finitely supported $f, g, h \colon \mathbb{Z} \to \mathbb{C}$, define
\[ \Lambda(f, g, h) := \sum_{x, y \in \mathbb{Z}} f(x) g(x+y) h(x+2y) \]
and
\[ \Lambda_3(f) := \Lambda(f, f, f). \]
Then for any finite set $A$ of integers,
\[ \Lambda_3(1_A) = |\{(x, y) : x,\ x + y,\ x + 2y \in A\}| \]
counts the number of 3-APs in $A$, where each non-trivial 3-AP is counted twice (once forward and once backward), and each trivial 3-AP is counted once.

Proposition 6.3.6 (Fourier and 3-AP). Given finitely supported $f, g, h \colon \mathbb{Z} \to \mathbb{C}$,
\[ \Lambda(f, g, h) = \int_0^1 \hat{f}(\theta) \hat{g}(-2\theta) \hat{h}(\theta) \, d\theta. \]
Exercise 6.3.7. Prove all the identities above.

6.4. Roth’s theorem in the integers


In Section 6.2 we saw a Fourier analytic proof of Roth’s theorem in F3n . In this section, we adapt
the proof to the integers and obtain the following result. This is Roth’s original proof (1953).

Theorem 6.4.1 (Roth’s theorem). Every 3-AP-free subset of [N] = {1, . . . , N } has size O(N/log log N).

The proof of Roth’s theorem in F3n proceeded by density increment when restricting to subspaces.
An important difference between F3n and Z is that Z has no subspaces (more on this later). Instead,
we will proceed in Z by restricting to subprogressions. In this section, by a progression we mean
an arithmetic progression.

We have the following analogue of Lemma 6.2.4. It says that if $f$ and $g$ are ``Fourier-close,'' then they have similar 3-AP counts. We write
\[ \|\hat{f}\|_\infty := \sup_\theta |\hat{f}(\theta)| \qquad \text{and} \qquad \|f\|_{\ell^2} := \Bigl( \sum_{x \in \mathbb{Z}} |f(x)|^2 \Bigr)^{1/2}. \]

Proposition 6.4.2 (3-AP counting lemma). Let $f, g \colon \mathbb{Z} \to \mathbb{C}$ be finitely supported functions. Then
\[ |\Lambda_3(f) - \Lambda_3(g)| \le 3\, \|\widehat{f - g}\|_\infty \max\bigl( \|f\|_{\ell^2}^2, \|g\|_{\ell^2}^2 \bigr). \]
Proof. We have
\[ \Lambda_3(f) - \Lambda_3(g) = \Lambda(f - g, f, f) + \Lambda(g, f - g, f) + \Lambda(g, g, f - g). \]
Let us bound the first term on the right-hand side. We have
\begin{align*}
|\Lambda(f - g, f, f)| &= \Bigl| \int_0^1 \widehat{(f - g)}(\theta)\, \hat{f}(-2\theta)\, \hat{f}(\theta) \, d\theta \Bigr| & \text{[by Proposition 6.3.6]} \\
&\le \|\widehat{f - g}\|_\infty \int_0^1 \bigl| \hat{f}(-2\theta)\, \hat{f}(\theta) \bigr| \, d\theta & \text{[triangle inequality]} \\
&\le \|\widehat{f - g}\|_\infty \Bigl( \int_0^1 |\hat{f}(-2\theta)|^2 \, d\theta \Bigr)^{1/2} \Bigl( \int_0^1 |\hat{f}(\theta)|^2 \, d\theta \Bigr)^{1/2} & \text{[Cauchy--Schwarz]} \\
&\le \|\widehat{f - g}\|_\infty \|f\|_{\ell^2}^2. & \text{[Parseval]}
\end{align*}
By similar arguments, we have
\[ |\Lambda(g, f - g, f)| \le \|\widehat{f - g}\|_\infty \|f\|_{\ell^2} \|g\|_{\ell^2} \qquad \text{and} \qquad |\Lambda(g, g, f - g)| \le \|\widehat{f - g}\|_\infty \|g\|_{\ell^2}^2. \]
Combining the three bounds gives the result. $\square$
Now we prove Roth’s theorem by following the same steps as in Section 6.2 for the finite field
setting.
Step 1. A 3-AP-free set has a large Fourier coefficient.

Lemma 6.4.3. Let $A \subseteq [N]$ be a 3-AP-free set with $|A| = \alpha N$. If $N \ge 5\alpha^{-2}$, then there exists $\theta \in \mathbb{R}/\mathbb{Z}$ satisfying
\[ \Bigl| \sum_{x=1}^{N} (1_A - \alpha)(x) e(\theta x) \Bigr| \ge \frac{\alpha^2}{10} N. \]

Proof. Since $A$ is 3-AP-free, the quantity $1_A(x) 1_A(x+y) 1_A(x+2y)$ is nonzero only for trivial APs, i.e., when $y = 0$. Thus
\[ \Lambda_3(1_A) = |A| = \alpha N. \]
On the other hand, a 3-AP in $[N]$ can be counted by choosing a pair of integers of the same parity to form the first and third elements of the 3-AP, yielding
\[ \Lambda_3(1_{[N]}) = \lfloor N/2 \rfloor^2 + \lceil N/2 \rceil^2 \ge N^2/2. \]

Now apply the counting lemma (Proposition 6.4.2) to $f = 1_A$ and $g = \alpha 1_{[N]}$. We have $\|1_A\|_{\ell^2}^2 = |A| = \alpha N$ and $\|\alpha 1_{[N]}\|_{\ell^2}^2 = \alpha^2 N$. So
\[ \frac{\alpha^3 N^2}{2} - \alpha N \le \alpha^3 \Lambda_3(1_{[N]}) - \Lambda_3(1_A) \le 3 \alpha N\, \bigl\| (1_A - \alpha 1_{[N]})^{\wedge} \bigr\|_\infty. \]
Thus, using $N \ge 5/\alpha^2$, we have
\[ \bigl\| (1_A - \alpha 1_{[N]})^{\wedge} \bigr\|_\infty \ge \frac{\tfrac{1}{2}\alpha^3 N^2 - \alpha N}{3 \alpha N} = \frac{1}{6} \alpha^2 N - \frac{1}{3} \ge \frac{1}{10} \alpha^2 N. \]
Therefore there exists some $\theta \in \mathbb{R}$ with
\[ \Bigl| \sum_{x=1}^{N} (1_A - \alpha)(x) e(\theta x) \Bigr| = \bigl| (1_A - \alpha 1_{[N]})^{\wedge}(\theta) \bigr| \ge \frac{1}{10} \alpha^2 N. \qquad \square \]

Step 2: A large Fourier coefficient implies density increment on a subprogression.


In the finite field model, if $|\hat{1_A}(r)|$ is large for some $r \in \mathbb{F}_3^n \setminus \{0\}$, then we obtained a density increment by restricting $A$ to some coset of the hyperplane $r^\perp$.
How can we adapt this argument to the integers?
In the finite field model, we used that the Fourier character $\gamma_r(x) = \omega^{r \cdot x}$ is constant on each coset of the hyperplane $r^\perp \subseteq \mathbb{F}_3^n$. In the integer setting, we want to partition $[N]$ into subprogressions such that the character $\mathbb{Z} \to \mathbb{C} \colon x \mapsto e(x\theta)$ is roughly constant on each subprogression. As a simple example, assume that $\theta$ is a rational $a/b$ for some fairly small $b$. Then $x \mapsto e(x\theta)$ is constant on arithmetic progressions with common difference $b$. Thus we could partition $[N]$ into arithmetic progressions with common difference $b$. This is useful as long as $b$ is not too large. On the other hand, if $b$ is too large, or if $\theta$ is irrational, then we want to approximate $\theta$ by a rational number with small denominator.

Lemma 6.4.4 (Dirichlet's lemma). Let $\theta \in \mathbb{R}$ and $0 < \delta < 1$. Then there exists a positive integer $d \le 1/\delta$ such that $\|d\theta\|_{\mathbb{R}/\mathbb{Z}} \le \delta$. (Here $\|\theta\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance from $\theta$ to the nearest integer.)
Proof. Let $m = \lfloor 1/\delta \rfloor$. By the pigeonhole principle, among the $m+1$ numbers $0, \theta, \dots, m\theta$, we can find $0 \le i < j \le m$ such that the fractional parts of $i\theta$ and $j\theta$ differ by at most $\delta$. Set $d = j - i$. Then $\|d\theta\|_{\mathbb{R}/\mathbb{Z}} \le \delta$, as desired. $\square$
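Dirichlet's lemma is easy to see in action; the brute-force Python sketch below (ours, not part of the text) searches for the promised $d$ for a given $\theta$ and $\delta$.

import math

def dirichlet(theta, delta):
    for d in range(1, math.floor(1 / delta) + 1):
        if abs(d * theta - round(d * theta)) <= delta:   # ||d*theta||_{R/Z}
            return d
    return None   # should not happen, by Lemma 6.4.4

print(dirichlet(math.pi, 0.01))   # 7, since 7*pi = 21.99... is within 0.01 of 22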
Given $\theta$, we now partition $[N]$ into subprogressions on each of which $e(x\theta)$ is roughly constant. The constants appearing in the rest of this argument are mostly unimportant.

Lemma 6.4.5. Let $0 < \eta < 1$ and $\theta \in \mathbb{R}$. Suppose $N \ge (4\pi/\eta)^6$. Then one can partition $[N]$ into subprogressions $P_i$, each with length
\[ N^{1/3} \le |P_i| \le 2N^{1/3}, \]
such that
\[ \sup_{x, y \in P_i} |e(x\theta) - e(y\theta)| < \eta \quad \text{for each } i. \]
√ √
Proof. By Lemma 6.4.4, there is a positive integer $d \le \sqrt{N}$ such that $\|d\theta\|_{\mathbb{R}/\mathbb{Z}} \le 1/\sqrt{N}$. Partition $[N]$ greedily into progressions with common difference $d$ of lengths between $N^{1/3}$ and $2N^{1/3}$.

Then, for two elements $x, y$ within the same progression $P_i$, we have
\[ |e(x\theta) - e(y\theta)| \le |P_i| \, |e(d\theta) - 1| \le 2N^{1/3} \cdot 2\pi \cdot N^{-1/2} \le \eta. \]
Here we use the inequality $|e(d\theta) - 1| \le 2\pi \|d\theta\|_{\mathbb{R}/\mathbb{Z}}$, which follows from the fact that the length of a chord of a circle is at most the length of the corresponding arc. $\square$
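The greedy partition used in this proof is straightforward to carry out; here is a Python sketch of ours (with an arbitrary common difference $d$ in place of the one supplied by Dirichlet's lemma) that splits $[N]$ into progressions of common difference $d$ and length between $N^{1/3}$ and $2N^{1/3}$.

def greedy_partition(N, d):
    L = max(1, round(N ** (1 / 3)))
    parts = []
    for a in range(1, d + 1):                    # one residue class mod d at a time
        ap = list(range(a, N + 1, d))
        while len(ap) >= 2 * L:
            parts.append(ap[:L])
            ap = ap[L:]
        if ap:
            parts.append(ap)                     # leftover block, length in [L, 2L)
    return parts

parts = greedy_partition(1000, 7)
assert sorted(x for P in parts for x in P) == list(range(1, 1001))
print(len(parts), min(map(len, parts)), max(map(len, parts)))   # lengths all in [N^(1/3), 2N^(1/3))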

We can now apply this lemma to obtain a density increment.

Lemma 6.4.6 (Density increment). Let $A \subseteq [N]$ be 3-AP-free, with $|A| = \alpha N$ and $N \ge (16/\alpha)^{12}$. Then there exists a subprogression $P \subseteq [N]$ with $|P| \ge N^{1/3}$ and $|A \cap P| \ge (\alpha + \alpha^2/40)|P|$.
Proof. By Lemma 6.4.3, there exists $\theta$ satisfying
\[ \Bigl| \sum_{x=1}^{N} (1_A - \alpha)(x) e(x\theta) \Bigr| \ge \frac{\alpha^2}{10} N. \]
Next, apply Lemma 6.4.5 with $\eta = \alpha^2/20$ (the hypothesis $N \ge (4\pi/\eta)^6$ is satisfied since $(16/\alpha)^{12} \ge (80\pi/\alpha^2)^6 = (4\pi/\eta)^6$) to obtain a partition $P_1, \dots, P_k$ of $[N]$ satisfying $N^{1/3} \le |P_i| \le 2N^{1/3}$ and
\[ |e(x\theta) - e(y\theta)| \le \frac{\alpha^2}{20} \quad \text{for all } i \text{ and } x, y \in P_i. \]
So on each $P_i$,
\[ \Bigl| \sum_{x \in P_i} (1_A - \alpha)(x) e(x\theta) \Bigr| \le \Bigl| \sum_{x \in P_i} (1_A - \alpha)(x) \Bigr| + \frac{\alpha^2}{20} |P_i|. \]
Thus
\begin{align*}
\frac{\alpha^2}{10} N &\le \Bigl| \sum_{x=1}^{N} (1_A - \alpha)(x) e(x\theta) \Bigr| \le \sum_{i=1}^{k} \Bigl| \sum_{x \in P_i} (1_A - \alpha)(x) e(x\theta) \Bigr| \\
&\le \sum_{i=1}^{k} \Bigl( \Bigl| \sum_{x \in P_i} (1_A - \alpha)(x) \Bigr| + \frac{\alpha^2}{20} |P_i| \Bigr) = \sum_{i=1}^{k} \Bigl| \sum_{x \in P_i} (1_A - \alpha)(x) \Bigr| + \frac{\alpha^2}{20} N.
\end{align*}
Thus
\[ \frac{\alpha^2}{20} N \le \sum_{i=1}^{k} \Bigl| \sum_{x \in P_i} (1_A - \alpha)(x) \Bigr|, \]
and hence
\[ \frac{\alpha^2}{20} \sum_{i=1}^{k} |P_i| \le \sum_{i=1}^{k} \bigl| |A \cap P_i| - \alpha |P_i| \bigr|. \]

We want to show that there exists some $P_i$ such that $A$ has a density increment when restricted to $P_i$. The following trick is convenient. Note that
\[ \frac{\alpha^2}{20} \sum_{i=1}^{k} |P_i| \le \sum_{i=1}^{k} \bigl| |A \cap P_i| - \alpha |P_i| \bigr| = \sum_{i=1}^{k} \Bigl( \bigl| |A \cap P_i| - \alpha |P_i| \bigr| + \bigl( |A \cap P_i| - \alpha |P_i| \bigr) \Bigr), \]
where the final equality uses $\sum_i (|A \cap P_i| - \alpha |P_i|) = |A| - \alpha N = 0$. Thus there exists an $i$ such that
\[ \frac{\alpha^2}{20} |P_i| \le \bigl| |A \cap P_i| - \alpha |P_i| \bigr| + \bigl( |A \cap P_i| - \alpha |P_i| \bigr). \]
Since $|t| + t$ is $2t$ for $t > 0$ and $0$ for $t \le 0$, we deduce
\[ \frac{\alpha^2}{20} |P_i| \le 2 \bigl( |A \cap P_i| - \alpha |P_i| \bigr), \]
which yields
\[ |A \cap P_i| \ge \Bigl( \alpha + \frac{\alpha^2}{40} \Bigr) |P_i|. \qquad \square \]

Step 3: Iterate the density increment.


This step is nearly identical to the proof in the finite field model. Start with $\alpha_0 = \alpha$ and $N_0 = N$. After $i$ iterations, we arrive at a subprogression of length $N_i$ on which $A$ has density $\alpha_i$. As long as $N_i \ge (16/\alpha_i)^{12}$, we can apply Lemma 6.4.6 to pass down to a subprogression with
\[ N_{i+1} \ge N_i^{1/3} \qquad \text{and} \qquad \alpha_{i+1} \ge \alpha_i + \alpha_i^2/40. \]
The density doubles from $\alpha_0 = \alpha$ after at most $\lceil 40/\alpha \rceil$ iterations. Once the density reaches at least $2\alpha$, the next doubling takes at most $\lceil 20/\alpha \rceil$ iterations, and so on. In general, the $k$-th doubling requires at most $\lceil 40 \cdot 2^{1-k}/\alpha \rceil$ iterations. There are at most $\log_2(1/\alpha)$ doublings since the density is always at most 1. Summing up, the total number of iterations is
\[ m \le \sum_{k=1}^{\log_2(1/\alpha)} \bigl\lceil 40 \cdot 2^{1-k}/\alpha \bigr\rceil = O(1/\alpha). \]
When the process terminates after $m$ steps, we have $N^{1/3^m} \le N_m$ by Lemma 6.4.6, and so
\[ N^{1/3^m} \le N_m < (16/\alpha_m)^{12} \le (16/\alpha)^{12}. \]
Hence
\[ N \le (16/\alpha)^{12 \cdot 3^m} \le (16/\alpha)^{3^{O(1/\alpha)}}. \]
Therefore
\[ \frac{|A|}{N} = \alpha = O\Bigl( \frac{1}{\log \log N} \Bigr). \]
This completes the proof of Roth’s theorem (Theorem 6.4.1). 
We saw that the proofs in F3n
and Z have largely the same set of ideas, but the proof in Z is
somewhat more technically involved. The finite field model is often a good sandbox to try out
Fourier analytic ideas.

Remark 6.4.7 (Bohr sets). Let us compare the results in $\mathbb{F}_3^n$ and $[N]$. Write $N = 3^n$ for the size of the ambient group in both cases, for comparison. We obtained an upper bound of $O(N/\log N)$ for 3-AP-free sets in $\mathbb{F}_3^n$ and $O(N/\log\log N)$ in $[N] \subseteq \mathbb{Z}$. Where does the difference in quantitative bounds stem from?
In the density increment step for $\mathbb{F}_3^n$, at each step we pass down to a subset whose size is a constant factor (namely $1/3$) of the previous one. However, in $[N]$, each iteration gives us a subprogression whose size is only the cube root of the size of the previous subprogression. The extra log in Roth's theorem in the integers comes from this rapid reduction in the sizes of the subprogressions.
Can we do better? Perhaps by passing down to subsets of $[N]$ that look more like subspaces? Indeed, this is possible. Bourgain introduced the notion of Bohr sets, which mimic properties of subspaces in settings like $\mathbb{Z}$ where subspaces are not available. Given $\theta_1, \dots, \theta_k$ and some $\epsilon > 0$, a Bohr set has the form
\[ \bigl\{ x \in [N] : \|x \theta_j\|_{\mathbb{R}/\mathbb{Z}} \le \epsilon \text{ for each } j = 1, \dots, k \bigr\}. \]
To see why this is analogous to a subspace, note that we can define a subspace of $\mathbb{F}_3^n$ as a set of the form
\[ \bigl\{ x \in \mathbb{F}_3^n : r_j \cdot x = 0 \text{ for each } j = 1, \dots, k \bigr\}, \]
where $r_1, \dots, r_k \in \mathbb{F}_3^n \setminus \{0\}$. Bourgain (1999) introduced Bohr sets$^1$ and used them to improve the quantitative bound in Roth's theorem to $N/(\log N)^{1/2 + o(1)}$. Bohr sets are used widely in additive combinatorics, and in nearly all subsequent work on Roth's theorem in the integers, including the proof of the current best bound $N/(\log N)^{1+c}$ for some constant $c > 0$ (Bloom and Sisask 2020). We will see Bohr sets again in the proof of Freiman's theorem in ??.
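For concreteness, here is a tiny Python sketch (ours, not part of the text) that lists a Bohr set in $[N]$ for a couple of frequencies $\theta_j$ and a tolerance $\epsilon$:

def bohr_set(N, thetas, eps):
    dist = lambda t: abs(t - round(t))            # ||t||_{R/Z}
    return [x for x in range(1, N + 1) if all(dist(x * th) <= eps for th in thetas)]

B = bohr_set(1000, [2 ** 0.5, 3 ** 0.5], 0.05)
print(len(B), B[:5])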

6.5. The polynomial method


An important breakthrough of Croot, Lev, and Pach (2017) showed how to apply the polynomial method to Roth-type problems in the finite field model. Their method quickly found many applications. Less than a week after the Croot, Lev, and Pach paper was made public, Ellenberg and Gijswijt (2017) adapted their argument to prove the following power-saving bound on the cap set problem. The discovery came as quite a shock to the community, especially as the proof is so short. It led to many more applications and had a significant impact in additive combinatorics.

Theorem 6.5.1. Every 3-AP-free subset of F3n has size O(2.76n ).

The presentation of the proof below is due to ?.


Recall from linear algebra the usual rank of a matrix. Here we view an $|A| \times |A|$ matrix over a field $\mathbb{F}$ as a function $F \colon A \times A \to \mathbb{F}$. A function $F$ is said to have rank 1 if $F(x, y) = f(x) g(y)$ for some nonzero functions $f, g \colon A \to \mathbb{F}$. More generally, the rank of $F$ is the minimum $k$ so that $F$ can be written as a sum of $k$ rank 1 functions.
More generally, for other notions of rank, we can first define the set of rank 1 functions, and then define the rank of $F$ to be the minimum $k$ so that $F$ can be written as a sum of $k$ rank 1 functions.
Whereas a function $A \times A \to \mathbb{F}$ corresponds to a matrix, a function $A \times A \times A \to \mathbb{F}$ corresponds to a 3-tensor. There is a notion of tensor rank, where the rank 1 functions are those of the form $F(x, y, z) = f(x) g(y) h(z)$. This is a standard and important notion (which comes with a lot of mystery), but it is not the one that we shall use.
1Apparently named after Harald Bohr, a mathematical analyst and Olympic Silver Medal winning footballer who
is also the brother of the Nobel Prize-winning physicist Niels Bohr.

Definition 6.5.2 (Slice rank). A function F : A × A × A → F is said to have slice rank 1 if it can
be written as
f (x)g(y, z), f (y)g(x, z), or f (z)g(x, y),
for some nonzero functions f : A → F and g : A × A → F. The slice rank of a function
F : A × A × A → F is the minimum k so that F can be written as a sum of k slice rank 1 functions.
We need the following fact from linear algebra.

Lemma 6.5.3. Every k-dimensional subspace of an n-dimensional vector space (over any field)
contains a point with at least k nonzero coordinates.
Proof. Form a k × n matrix M whose rows are a basis of this k-dimensional subspace W. Then
M has rank k. So it has some invertible k × k submatrix with columns S ⊂ [n] with |S| = k.
Then for every z ∈ FS , there is some linear combination of the rows whose coordinates on S are
identical to those of z. In particular, there is some vector in the k-dimensional subspace W whose
S-coordinates are all nonzero. 
A diagonal matrix with nonzero diagonal entries has full rank. We show that a similar statement
holds true for the slice rank.

Lemma 6.5.4. Suppose $F \colon A \times A \times A \to \mathbb{F}$ satisfies $F(x, y, z) \neq 0$ if and only if $x = y = z$. Then $F$ has slice rank $|A|$.
Proof. Every $F \colon A \times A \times A \to \mathbb{F}$ has slice rank at most $|A|$, since we can write $F(x, y, z) = \sum_{a \in A} \delta_a(x) F(a, y, z)$, where $\delta_a$ denotes the function taking value 1 at $a$ and 0 elsewhere.
For the converse, suppose $F(x, y, z)$ can be written as a sum of functions of the form
\[ f(x) g(y, z), \qquad f(y) g(x, z), \qquad \text{and} \qquad f(z) g(x, y). \]
Suppose that there are $m_1$ summands of the first type, $m_2$ of the second type, and $m_3$ of the third type. By Lemma 6.5.3, there is some function $h \colon A \to \mathbb{F}$ that is orthogonal to all the $f$'s appearing in the third type of summands (i.e., $\sum_{z \in A} f(z) h(z) = 0$), and such that $|\operatorname{supp} h| \ge |A| - m_3$. Let
\[ G(x, y) = \sum_{z \in A} F(x, y, z) h(z). \]
Only summands of the first two types remain. Each summand of the first type turns into a rank 1 function (in the matrix sense of rank):
\[ (x, y) \mapsto \sum_{z} f(x) g(y, z) h(z) = f(x) \tilde{g}(y) \]
for some new function $\tilde{g} \colon A \to \mathbb{F}$. Similarly with summands of the second type. So $G$ (viewed as an $|A| \times |A|$ matrix) has rank $\le m_1 + m_2$. On the other hand,
\[ G(x, y) = \begin{cases} F(x, x, x) h(x) & \text{if } x = y, \\ 0 & \text{if } x \neq y. \end{cases} \]
This $G$ has rank $|\operatorname{supp} h| \ge |A| - m_3$. Combining, we get
\[ |A| - m_3 \le \operatorname{rank} G \le m_1 + m_2. \]
So $m_1 + m_2 + m_3 \ge |A|$. This shows that the slice rank of $F$ is at least $|A|$. $\square$
Now we prove an upper bound on the slice rank by invoking magical powers of polynomials.

Lemma 6.5.5. Let $A \subseteq \mathbb{F}_3^n$. Define $F \colon A \times A \times A \to \mathbb{F}_3$ by
\[ F(x, y, z) = \begin{cases} 1 & \text{if } x + y + z = 0, \\ 0 & \text{otherwise.} \end{cases} \]
Then the slice rank of $F$ is at most
\[ 3 \sum_{\substack{a, b, c \ge 0 \\ a + b + c = n \\ b + 2c \le 2n/3}} \frac{n!}{a! \, b! \, c!}. \]

Proof. In $\mathbb{F}_3$, one has
\[ 1 - x^2 = \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{if } x \neq 0. \end{cases} \]
So, writing $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$, and $z = (z_1, \dots, z_n)$, we have
\[ F(x, y, z) = \prod_{i=1}^{n} \bigl( 1 - (x_i + y_i + z_i)^2 \bigr). \tag{6.5.1} \]
If we expand the right-hand side, we obtain a polynomial in $3n$ variables of degree $2n$. This is a sum of monomials, each of the form
\[ x_1^{i_1} \cdots x_n^{i_n} \, y_1^{j_1} \cdots y_n^{j_n} \, z_1^{k_1} \cdots z_n^{k_n}, \]
where $i_1, \dots, i_n, j_1, \dots, j_n, k_1, \dots, k_n \in \{0, 1, 2\}$. For each such monomial, by the pigeonhole principle, at least one of $i_1 + \cdots + i_n$, $j_1 + \cdots + j_n$, $k_1 + \cdots + k_n$ is at most $2n/3$. So we can split these monomials into three groups:
\begin{align*}
\prod_{i=1}^{n} \bigl( 1 - (x_i + y_i + z_i)^2 \bigr)
&= \sum_{i_1 + \cdots + i_n \le 2n/3} x_1^{i_1} \cdots x_n^{i_n} \, f_{i_1, \dots, i_n}(y, z) \\
&\quad + \sum_{j_1 + \cdots + j_n \le 2n/3} y_1^{j_1} \cdots y_n^{j_n} \, g_{j_1, \dots, j_n}(x, z) \\
&\quad + \sum_{k_1 + \cdots + k_n \le 2n/3} z_1^{k_1} \cdots z_n^{k_n} \, h_{k_1, \dots, k_n}(x, y).
\end{align*}
Each summand has slice rank at most 1. The number of summands in the first sum is at most the number of monomials $x_1^{i_1} \cdots x_n^{i_n}$ with $i_1, \dots, i_n \in \{0, 1, 2\}$ and $i_1 + \cdots + i_n \le 2n/3$, which equals
\[ \sum_{\substack{a, b, c \ge 0 \\ a + b + c = n \\ b + 2c \le 2n/3}} \frac{n!}{a! \, b! \, c!} \]
(here $a, b, c$ correspond to the number of $i_*$'s that are equal to $0, 1, 2$, respectively). The same bound holds for the other two sums. The lemma then follows. $\square$

Here is a standard estimate. It is similar to the proof of the Chernoff bound.

Lemma 6.5.6. For every positive integer $n$,
\[ \sum_{\substack{a, b, c \ge 0 \\ a + b + c = n \\ b + 2c \le 2n/3}} \frac{n!}{a! \, b! \, c!} \le 2.76^n. \]

Proof. Let $0 < x < 1$. The sum equals the sum of the coefficients of the monomials $x^k$ with $k \le 2n/3$ in the expansion of $(1 + x + x^2)^n$. Since $x^k \ge x^{2n/3}$ for all such $k$, the sum is
\[ \sum_{\substack{a, b, c \ge 0 \\ a + b + c = n \\ b + 2c \le 2n/3}} \frac{n!}{a! \, b! \, c!} \le \frac{(1 + x + x^2)^n}{x^{2n/3}}. \]
Setting $x = 0.6$ gives the bound $\le (2.76)^n$. $\square$
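One can check Lemma 6.5.6 numerically for small $n$ (a Python sketch of ours, not part of the proof): the exact sum of the trinomial coefficients with $b + 2c \le 2n/3$ is computed and compared against $2.76^n$.

from math import factorial

def constrained_sum(n):
    # sum of n!/(a! b! c!) over a+b+c = n with b + 2c <= 2n/3
    return sum(factorial(n) // (factorial(a) * factorial(b) * factorial(c))
               for a in range(n + 1)
               for b in range(n + 1 - a)
               for c in [n - a - b]
               if b + 2 * c <= 2 * n / 3)

for n in (10, 20, 40):
    print(n, constrained_sum(n) <= 2.76 ** n)   # True for each n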



Remark 6.5.7. Taking the optimal value $x = (\sqrt{33} - 1)/8 = 0.59307\ldots$ in the final step, we obtain a bound of $\le (2.75510\ldots)^n$. This is the true exponential asymptotic of the sum in Lemma 6.5.6. See, e.g., Sanov's theorem from large deviation theory. Though we have no idea how close this is to the optimal bound for the cap set problem.
Proof of Theorem 6.5.1. Let $A \subseteq \mathbb{F}_3^n$ be 3-AP-free. Define $F \colon A \times A \times A \to \mathbb{F}_3$ by
\[ F(x, y, z) = \begin{cases} 1 & \text{if } x + y + z = 0, \\ 0 & \text{otherwise.} \end{cases} \]
Since $A$ is 3-AP-free, one has $F(x, y, z) \neq 0$ if and only if $x = y = z \in A$. By Lemma 6.5.4, $F$ has slice rank $|A|$. On the other hand, by Lemmas 6.5.5 and 6.5.6, $F$ has slice rank $\le 3(2.76)^n$. So $|A| \le 3(2.76)^n$. $\square$
It is straightforward to extend the above proof from $\mathbb{F}_3$ to any other fixed $\mathbb{F}_p$, resulting in the following.
Theorem 6.5.8. For every odd prime $p$, there is some $c_p < p$ so that every 3-AP-free subset of $\mathbb{F}_p^n$ has size at most $3 c_p^n$.
It remains an intriguing open problem to extend these techniques to other settings.
Open problem 6.5.9. Is there a constant $c < 5$ such that every 4-AP-free subset of $\mathbb{F}_5^n$ has size $O(c^n)$?
Open problem 6.5.10. Is there a constant $c < 2$ such that every corner-free subset of $\mathbb{F}_2^n \times \mathbb{F}_2^n$ has size $O(c^{2n})$? Here a corner is a configuration of the form $\{(x, y), (x + d, y), (x, y + d)\}$.
Finally, the proof technique in this section seems specific to the finite field model. It is an
intriguing open problem to apply the polynomial method for Roth’s theorem in the integers. Due
to the Behrend example (Section 2.5), we cannot expect power-saving bounds in the integers.

6.6. Arithmetic regularity


Here we develop an arithmetic analogue of Szemerédi's graph regularity lemma from Chapter 2. Just as the graph regularity method has powerful applications, so too does the arithmetic regularity lemma and the general strategy behind it.
First, we need a notion of what it means for a subset of $\mathbb{F}_p^n$ to be uniform, in a sense analogous to $\epsilon$-regular pairs from the graph regularity lemma. We also saw the following notion in the Fourier analytic proof of Roth's theorem.
Definition 6.6.1 (Fourier uniformity). We say that $A \subseteq \mathbb{F}_p^n$ is $\epsilon$-uniform if $|\hat{1_A}(r)| \le \epsilon$ for all $r \in \mathbb{F}_p^n \setminus \{0\}$.

The following exercise explains how Fourier uniformity is analogous to the discrepancy-type condition for $\epsilon$-regular pairs in the graph regularity lemma.
Exercise 6.6.2 (Uniformity and discrepancy). Let $A \subseteq \mathbb{F}_p^n$ with $|A| = \alpha p^n$. Let $\mathrm{HyperplaneDISC}(\eta)$ denote the property that for every hyperplane $W$ of $\mathbb{F}_p^n$,
\[ \Bigl| \frac{|A \cap W|}{|W|} - \alpha \Bigr| \le \eta. \]
(a) Prove that if $A$ satisfies $\mathrm{HyperplaneDISC}(\epsilon)$, then $A$ is $\epsilon$-uniform.
(b) Prove that if $A$ is $\epsilon$-uniform, then it satisfies $\mathrm{HyperplaneDISC}((p-1)\epsilon)$.
Definition 6.6.3 (Fourier uniformity on affine subspaces). For an affine subspace $W$ of $\mathbb{F}_p^n$ (i.e., a coset of a subspace), we say that $A$ is $\epsilon$-uniform on $W$ if $A \cap W$ is $\epsilon$-uniform when viewed as a subset of $W$.
Here is an arithmetic analogue of Szemerédi’s graph regularity lemma that we saw in Chapter 2.
It is due to Green (2005a).

Theorem 6.6.4 (Arithmetic regularity lemma). For every $\epsilon > 0$ and prime $p$, there exists $M$ so that for every $A \subseteq \mathbb{F}_p^n$, there is some subspace $W$ of $\mathbb{F}_p^n$ with codimension at most $M$ such that $A$ is $\epsilon$-uniform on all but at most an $\epsilon$-fraction of the cosets of $W$.
The proof is very similar to the proof of the graph regularity lemma in Chapter 2. Each subspace $W$ induces a partition of the whole space $\mathbb{F}_p^n$ into $W$-cosets, and we keep track of the energy (mean-squared density) of the partition. We show that if the conclusion of Theorem 6.6.4 does not hold for the current $W$, then we can replace $W$ by a smaller subspace so that the energy increases significantly. Since the energy is always bounded between 0 and 1, there are at most a bounded number of iterations.
Definition 6.6.5 (Energy). Given $A \subseteq \mathbb{F}_p^n$ and a subspace $W$ of $\mathbb{F}_p^n$, we define the energy of $W$ with respect to a fixed $A$ to be
\[ q_A(W) := \mathbb{E}_{x \in \mathbb{F}_p^n} \biggl[ \frac{|A \cap (W + x)|^2}{|W|^2} \biggr]. \]
Given a subspace $W$ of $\mathbb{F}_p^n$, define $\mu_W \colon \mathbb{F}_p^n \to \mathbb{R}$ by
\[ \mu_W := \frac{p^n}{|W|} 1_W. \]
(One can regard $\mu_W$ as the uniform probability distribution on $W$; it is normalized so that $\mathbb{E} \mu_W = 1$.) Then
\[ (1_A * \mu_W)(x) = \frac{|A \cap (W + x)|}{|W|} \quad \text{for every } x \in \mathbb{F}_p^n. \]
We have (check!)
\[ \hat{\mu_W}(r) = \begin{cases} 1 & \text{if } r \in W^\perp, \\ 0 & \text{if } r \notin W^\perp. \end{cases} \]
So by the convolution identity (Theorem 6.1.7),
\[ \widehat{1_A * \mu_W}(r) = \hat{1_A}(r) \hat{\mu_W}(r) = \begin{cases} \hat{1_A}(r) & \text{if } r \in W^\perp, \\ 0 & \text{if } r \notin W^\perp. \end{cases} \tag{6.6.1} \]

To summarize, convolving with $\mu_W$ averages $1_A$ along cosets of $W$ in the physical space, and restricts the Fourier transform to $W^\perp$ in the frequency space.
Energy interacts nicely with the Fourier transform. By Parseval's identity (Theorem 6.1.3), we have
\[ q_A(W) = \|1_A * \mu_W\|_2^2 = \sum_{r \in \mathbb{F}_p^n} \bigl| \widehat{1_A * \mu_W}(r) \bigr|^2 = \sum_{r \in W^\perp} |\hat{1_A}(r)|^2. \tag{6.6.2} \]
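Identity (6.6.2) is easy to verify numerically; the following Python sketch (ours, with made-up names) checks it for a random $A \subseteq \mathbb{F}_3^2$ and the hyperplane $W = r_0^\perp$.

import itertools, cmath, random

p, n = 3, 2
pts = list(itertools.product(range(p), repeat=n))
omega = cmath.exp(2j * cmath.pi / p)
A = set(random.sample(pts, 4))

def hat_indicator(r):
    return sum((x in A) * omega ** (-sum(a * b for a, b in zip(r, x)) % p)
               for x in pts) / p ** n

r0 = (1, 2)
W = [x for x in pts if sum(a * b for a, b in zip(r0, x)) % p == 0]           # hyperplane r0^perp
Wperp = [r for r in pts if all(sum(a * b for a, b in zip(r, x)) % p == 0 for x in W)]

energy = sum(len(A & {tuple((wi + xi) % p for wi, xi in zip(w, x)) for w in W}) ** 2
             for x in pts) / (p ** n * len(W) ** 2)                          # q_A(W)
print(abs(energy - sum(abs(hat_indicator(r)) ** 2 for r in Wperp)))          # ~ 0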
The next lemma is analogous to Lemma 2.1.10. It is an easy consequence of convexity. It also
directly follows from (6.6.2).

Lemma 6.6.6 (Energy never decreases under refinement). Let $A \subseteq \mathbb{F}_p^n$. For subspaces $U \le W \le \mathbb{F}_p^n$, we have $q_A(U) \ge q_A(W)$. $\square$
The next lemma is analogous to the energy boost lemma for irregular pairs in the proof of graph
regularity (Lemma 2.1.11).

Lemma 6.6.7 (Local energy increment). If $A \subseteq \mathbb{F}_p^n$ is not $\epsilon$-uniform, then there is some codimension-1 subspace $W$ with $q_A(W) > (|A|/p^n)^2 + \epsilon^2$.
Proof. Suppose $A$ is not $\epsilon$-uniform. Then there is some $r \neq 0$ such that $|\hat{1_A}(r)| > \epsilon$. Let $W = r^\perp$. Then by (6.6.2),
\begin{align*}
q_A(W) &= |\hat{1_A}(0)|^2 + |\hat{1_A}(r)|^2 + |\hat{1_A}(2r)|^2 + \cdots + |\hat{1_A}((p-1)r)|^2 \\
&\ge |\hat{1_A}(0)|^2 + |\hat{1_A}(r)|^2 > (|A|/p^n)^2 + \epsilon^2. \qquad \square
\end{align*}
By applying the above lemmas locally to each $W$-coset, we obtain the following global energy increment, analogous to Lemma 2.1.12.

Lemma 6.6.8 (Global energy increment). Let $A \subseteq \mathbb{F}_p^n$ and let $W$ be a subspace of $\mathbb{F}_p^n$. Suppose that $A$ is not $\epsilon$-uniform on more than an $\epsilon$-fraction of the $W$-cosets. Then there is some subspace $U$ of $W$ with $\operatorname{codim} U - \operatorname{codim} W \le p^{\operatorname{codim} W}$ such that
\[ q_A(U) > q_A(W) + \epsilon^3. \]
Proof. By Lemma 6.6.7, for each coset $W'$ of $W$ on which $A$ is not $\epsilon$-uniform, we can find some $r \in \mathbb{F}_p^n \setminus W^\perp$ so that replacing $W$ by its intersection with $r^\perp$ increases the energy on $W'$ by more than $\epsilon^2$. In other words,
\[ q_{A \cap W'}(W \cap r^\perp) > \frac{|A \cap W'|^2}{|W'|^2} + \epsilon^2. \]
Let $R$ be a set of such $r$'s, one for each $W$-coset on which $A$ is not $\epsilon$-uniform (allowing some $r$'s to be chosen repeatedly).
Let $U = W \cap R^\perp$. Then $\operatorname{codim} U - \operatorname{codim} W \le |R| \le |\mathbb{F}_p^n / W| = p^{\operatorname{codim} W}$.
Applying the monotonicity of energy (Lemma 6.6.6) on each $W$-coset and using the observation in the first paragraph of this proof, we see that the ``local'' energy of $U$ exceeds that of $W$ by more than $\epsilon^2$ on each of the more than $\epsilon$-fraction of the $W$-cosets on which $A$ is not $\epsilon$-uniform, and is at least as large as that of $W$ on each of the remaining $W$-cosets. Therefore the energy increases by more than $\epsilon^3$ when refining from $W$ to $U$. $\square$
Proof of the arithmetic regularity lemma (Theorem 6.6.4). Starting with $W_0 = \mathbb{F}_p^n$, we construct a sequence of subspaces $W_0 \ge W_1 \ge W_2 \ge \cdots$ where, at each step, unless $A$ is $\epsilon$-uniform on all but at most an
$\epsilon$-fraction of the $W_i$-cosets, we apply Lemma 6.6.8 to find $W_{i+1} \le W_i$. The energy increases by more than $\epsilon^3$ at each iteration, so there are fewer than $\epsilon^{-3}$ iterations. We have $\operatorname{codim} W_{i+1} \le \operatorname{codim} W_i + p^{\operatorname{codim} W_i}$ at each step, so the final $W = W_m$ has codimension at most some function of $p$ and $\epsilon$ (one can check that it is an exponential tower of $p$'s of height $O(\epsilon^{-3})$). This $W$ satisfies the desired properties. $\square$
Remark 6.6.9 (Lower bound). Recall that Gowers (1997) showed that there exist graphs whose $\epsilon$-regular partitions require at least $\operatorname{tower}(\Omega(\epsilon^{-c}))$ parts (Theorem 2.1.15). There is a similar tower-type lower bound for the arithmetic regularity lemma (Green 2005a; Hosseini, Lovett, Moshkovitz, and Shapira 2016).
Remark 6.6.10 (Abelian groups). Green (2005a) also established an arithmetic regularity lemma over arbitrary finite abelian groups. Instead of subspaces, one uses Bohr sets (see Remark 6.4.7).
Now let us give another proof of the arithmetic regularity-type result. It has the same spirit as
the above regularity lemma, but phrased in terms of a decomposition rather than a partition. This
perspective of regularity as decompositions, popularized by Tao, allows one to adapt the ideas of
regularity to more general settings where we cannot neatly partition the underlying space into easily
describable pieces. It is very useful and has many applications in additive combinatorics.

Theorem 6.6.11 (Arithmetic regularity decomposition). For every sequence $\epsilon_0 \ge \epsilon_1 \ge \epsilon_2 \ge \cdots > 0$, there exists $M$ so that every $f \colon \mathbb{F}_p^n \to [0, 1]$ can be written as
\[ f = f_{\mathrm{str}} + f_{\mathrm{psr}} + f_{\mathrm{sml}} \]
where
• (structured piece) $f_{\mathrm{str}} = f_W$ for some subspace $W$ of codimension at most $M$;
• (pseudorandom piece) $\|\hat{f}_{\mathrm{psr}}\|_\infty \le \epsilon_{\operatorname{codim} W}$;
• (small piece) $\|f_{\mathrm{sml}}\|_2 \le \epsilon_0$.
Remark 6.6.12. It is worth comparing Theorem 6.6.11 to the strong graph regularity lemma (Theorem 2.8.3). It is important that the uniformity requirement on the pseudorandom piece is allowed to depend on $\operatorname{codim} W$.
In other more advanced applications, we would like fstr to come from some structured class of
functions. For example, in higher order Fourier analysis, fstr is a nilsequence.
Proof. Let $k_0 = 0$ and $k_{i+1} = \max\{k_i, \lceil \epsilon_{k_i}^{-2} \rceil\}$ for each $i \ge 0$. Note that $k_0 \le k_1 \le \cdots$.
Let us label the elements $r_1, r_2, \dots, r_{p^n}$ of $\mathbb{F}_p^n$ so that
\[ |\hat{f}(r_1)| \ge |\hat{f}(r_2)| \ge \cdots. \]
By Parseval (Theorem 6.1.3), we have
\[ \sum_{j=1}^{p^n} |\hat{f}(r_j)|^2 = \mathbb{E} f^2 \le 1. \]

There is some positive integer $m \le \lceil \epsilon_0^{-2} \rceil$ so that
\[ \sum_{k_m < j \le k_{m+1}} |\hat{f}(r_j)|^2 \le \epsilon_0^2, \tag{6.6.3} \]

since otherwise adding up the sum over all $m \le \lceil \epsilon_0^{-2} \rceil$ would contradict $\sum_r |\hat{f}(r)|^2 \le 1$. Also, we have
\[ |\hat{f}(r_k)| \le \frac{1}{\sqrt{k}} \quad \text{for every } k. \tag{6.6.4} \]
The idea now is to split
\[ f(x) = \sum_{j=1}^{p^n} \hat{f}(r_j) \omega^{r_j \cdot x} \]
into
\[ f = f_{\mathrm{str}} + f_{\mathrm{sml}} + f_{\mathrm{psr}} \]
according to the sizes of the Fourier coefficients. Roughly speaking, the large spectrum will go into the structured piece $f_{\mathrm{str}}$, the very small spectrum will go into the pseudorandom piece $f_{\mathrm{psr}}$, and the remaining middle terms will form the small piece $f_{\mathrm{sml}}$ (which has small $L^2$ norm by (6.6.3)).
Let $W = \{r_1, \dots, r_{k_m}\}^\perp$ and set $f_{\mathrm{str}} = f_W$. Then, by (6.6.1),
\[ \hat{f}_{\mathrm{str}}(r) = \begin{cases} \hat{f}(r) & \text{if } r \in W^\perp, \\ 0 & \text{if } r \notin W^\perp. \end{cases} \]
Let us define $f_{\mathrm{psr}}$ and $f_{\mathrm{sml}}$ via their Fourier transforms (and we can recover the functions via the inverse Fourier transform). For each $j = 1, 2, \dots, p^n$, set
\[ \hat{f}_{\mathrm{psr}}(r_j) = \begin{cases} \hat{f}(r_j) & \text{if } j > k_{m+1} \text{ and } r_j \notin W^\perp, \\ 0 & \text{otherwise.} \end{cases} \]
Finally, let $f_{\mathrm{sml}} = f - f_{\mathrm{str}} - f_{\mathrm{psr}}$, so that
\[ \hat{f}_{\mathrm{sml}}(r_j) = \begin{cases} \hat{f}(r_j) & \text{if } k_m < j \le k_{m+1} \text{ and } r_j \notin W^\perp, \\ 0 & \text{otherwise.} \end{cases} \]
Now we check that all the conditions are satisfied.
Structured piece. We have $f_{\mathrm{str}} = f_W$ where $\operatorname{codim} W \le k_m \le k_{\lceil \epsilon_0^{-2} \rceil}$, which is bounded as a function of the sequence $\epsilon_0 \ge \epsilon_1 \ge \cdots$.
Pseudorandom piece. For every $j > k_{m+1}$, we have $|\hat{f}(r_j)| \le 1/\sqrt{k_{m+1}}$ by (6.6.4), which is in turn $\le \epsilon_{k_m} \le \epsilon_{\operatorname{codim} W}$ by the definition of $k_{m+1}$. It follows that $\|\hat{f}_{\mathrm{psr}}\|_\infty \le \epsilon_{\operatorname{codim} W}$.
Small piece. By Parseval and (6.6.3),
\[ \|f_{\mathrm{sml}}\|_2^2 = \sum_j |\hat{f}_{\mathrm{sml}}(r_j)|^2 \le \sum_{k_m < j \le k_{m+1}} |\hat{f}(r_j)|^2 \le \epsilon_0^2. \qquad \square \]

Exercise 6.6.13. Deduce Theorem 6.6.4 from Theorem 6.6.11 by using an appropriate sequence $\epsilon_i$ and the same $W$ guaranteed by Theorem 6.6.11.
Remark 6.6.14 (Spectral proof of the graph regularity lemma). The proof technique of Theo-
rem 6.6.11 can be adapted to give an alternate proof of the graph regularity lemma (along with
certain weak and strong variants). Instead of iteratively refining partitions and tracking energy
increments as we did in Chapter 2, we can first take a spectral decomposition of the adjacency matrix $A$ of a graph:
\[ A = \sum_{i=1}^{n} \lambda_i v_i v_i^{\top}, \]
where $v_1, \dots, v_n$ is an orthonormal system of eigenvectors with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. Then, as in the proof of Theorem 6.6.11, we can decompose $A$ as
\[ A = A_{\mathrm{str}} + A_{\mathrm{psr}} + A_{\mathrm{sml}} \]
with
\[ A_{\mathrm{str}} = \sum_{i \le k} \lambda_i v_i v_i^{\top}, \qquad A_{\mathrm{psr}} = \sum_{i > k'} \lambda_i v_i v_i^{\top}, \qquad \text{and} \qquad A_{\mathrm{sml}} = \sum_{k < i \le k'} \lambda_i v_i v_i^{\top}, \]
for some appropriately chosen $k$ and $k'$, similarly to the proof of Theorem 6.6.11.
We have
\[ \sum_{i=1}^{n} \lambda_i^2 = \operatorname{tr} A^2 \le n^2. \]
So $\lambda_i \le n/\sqrt{i}$ for each $i$. We can guarantee that the spectral norm of $A_{\mathrm{psr}}$ is small enough as a function of $k$ and $\epsilon$. Furthermore, we can guarantee that $\operatorname{tr} A_{\mathrm{sml}}^2 = \sum_{k < i \le k'} \lambda_i^2 \le \epsilon n^2$.
To turn $A_{\mathrm{str}}$ into a vertex partition, we can use the approximate level sets of the top $k$ eigenvectors $v_1, \dots, v_k$. Some bookkeeping calculations then show that this is a regularity partition. Intuitively, $A_{\mathrm{psr}}$ provides us with regular pairs. Some of these regular pairs may not stay regular after adding $A_{\mathrm{sml}}$, but since $A_{\mathrm{sml}}$ has at most $\epsilon$ mass (in terms of the $L^2$ norm), it destroys at most a negligible fraction of regular pairs.
See Tao (2007a, Lemma 2.11) or Tao’s blog post The Spectral Proof of the Szemerédi Regularity
Lemma (2012) for more details of the proof.

6.7. Popular common difference


Roth's theorem has the following qualitative strengthening. Given $A \subseteq \mathbb{F}_3^n$ with density $\alpha$, there is some ``popular common difference'' $y \neq 0$ so that the number of 3-APs in $A$ with common difference $y$ is at least $(\alpha^3 - o(1)) 3^n$, i.e., approximately at least as many as one would expect if $A$ were a random subset of density $\alpha$. This was proved by Green (2005a) as an application of his arithmetic regularity lemma (from the previous section).

Theorem 6.7.1 (Roth's theorem with popular common difference in $\mathbb{F}_3^n$). For all $\epsilon > 0$, there exists $n_0 = n_0(\epsilon)$ such that for all $n \ge n_0$ and every $A \subseteq \mathbb{F}_3^n$ with $|A| = \alpha 3^n$, there exists $y \neq 0$ such that
\[ \bigl| \{x \in \mathbb{F}_3^n : x,\ x + y,\ x + 2y \in A\} \bigr| \ge (\alpha^3 - \epsilon) 3^n. \]
In particular, Theorem 6.7.1 implies that every 3-AP-free subset of F3n has size o(3n ).
Exercise 6.7.2. Show that it is false that for every $A \subseteq \mathbb{F}_3^n$ with $|A| = \alpha 3^n$, the number of pairs $(x, y) \in \mathbb{F}_3^n \times \mathbb{F}_3^n$ with $x, x + y, x + 2y \in A$ is at least $(\alpha^3 - o(1)) 3^{2n}$, where $o(1) \to 0$ as $n \to \infty$.
We will prove Theorem 6.7.1 via the next result, which concerns the number of 3-APs with
common difference coming from some subspace of bounded codimension, which is picked via the
arithmetic regularity lemma.

Theorem 6.7.3 (Roth's theorem with common difference in some subspace). For every $\epsilon > 0$, there exists $M$ so that for every $A \subseteq \mathbb{F}_3^n$ with $|A| = \alpha 3^n$, there exists a subspace $W$ with codimension at most $M$ so that
\[ \bigl| \{(x, y) \in \mathbb{F}_3^n \times W : x,\ x + y,\ x + 2y \in A\} \bigr| \ge (\alpha^3 - \epsilon) 3^n |W|. \]


Proof. By the arithmetic regularity lemma (Theorem 6.6.4), there is some $M$ depending only on $\epsilon$ and a subspace $W$ of $\mathbb{F}_3^n$ of codimension at most $M$ so that $A$ is $\epsilon$-uniform on all but at most an $\epsilon$-fraction of the $W$-cosets.
Let $u + W$ be a $W$-coset on which $A$ is $\epsilon$-uniform. Denote the density of $A$ in $u + W$ by
\[ \alpha_u = \frac{|A \cap (u + W)|}{|W|}. \]
Restricting ourselves to $u + W$ for a moment, by the 3-AP counting lemma (Lemma 6.2.4), the number of 3-APs of $A$ (including trivial ones) that are contained in $u + W$ is
\[ \bigl| \{(x, y) \in (u + W) \times W : x,\ x + y,\ x + 2y \in A\} \bigr| \ge (\alpha_u^3 - \epsilon) |W|^2. \]
Since $A$ is $\epsilon$-uniform on all but at most an $\epsilon$-fraction of the $W$-cosets, by varying $u + W$ over all such cosets, we find that the total number of 3-APs in $A$ with common difference in $W$ is
\[ \bigl| \{(x, y) \in \mathbb{F}_3^n \times W : x,\ x + y,\ x + 2y \in A\} \bigr| \ge (1 - \epsilon)(\alpha^3 - \epsilon) 3^n |W| \ge (\alpha^3 - 2\epsilon) 3^n |W|. \]
This proves the theorem (with $\epsilon$ replaced by $2\epsilon$). $\square$
Exercise 6.7.4. Give another proof of Theorem 6.7.3 using Theorem 6.6.11 (arithmetic regularity
decomposition f = fstr + fpsr + fsml ).
Proof of Theorem 6.7.1. First apply Theorem 6.7.3 to find a subspace $W$ of codimension at most $M = M(\epsilon)$. Choose $n_0 = M + \log_3(1/\epsilon)$, so that $n \ge n_0$ guarantees $|W| \ge 1/\epsilon$.
We need to exclude 3-APs with common difference zero. We have
\begin{align*}
(\alpha^3 - \epsilon) 3^n |W| &\le \bigl| \{(x, y) \in \mathbb{F}_3^n \times W : x,\ x + y,\ x + 2y \in A\} \bigr| \\
&= \bigl| \{(x, y) \in \mathbb{F}_3^n \times (W \setminus \{0\}) : x,\ x + y,\ x + 2y \in A\} \bigr| + |A|.
\end{align*}
We have $|A| \le 3^n \le \epsilon 3^n |W|$, so
\[ (\alpha^3 - 2\epsilon) 3^n |W| \le \bigl| \{(x, y) \in \mathbb{F}_3^n \times (W \setminus \{0\}) : x,\ x + y,\ x + 2y \in A\} \bigr|. \]
By averaging, there exists $y \in W \setminus \{0\}$ satisfying
\[ \bigl| \{x \in \mathbb{F}_3^n : x,\ x + y,\ x + 2y \in A\} \bigr| \ge (\alpha^3 - 2\epsilon) 3^n. \]
This proves the theorem (with $\epsilon$ replaced by $2\epsilon$). $\square$

By adapting the above proof strategy with Bohr sets, Green (2005a) proved a version of Roth's theorem with popular differences in finite abelian groups of odd order, as well as in the integers.

Theorem 6.7.5 (Roth's theorem with popular difference in finite abelian groups). For all $\epsilon > 0$, there exists $N_0 = N_0(\epsilon)$ such that for all finite abelian groups $\Gamma$ of odd order $|\Gamma| \ge N_0$ and every $A \subseteq \Gamma$ with $|A| = \alpha |\Gamma|$, there exists $y \in \Gamma \setminus \{0\}$ such that
\[ |\{x \in \Gamma : x,\ x + y,\ x + 2y \in A\}| \ge (\alpha^3 - \epsilon) |\Gamma|. \]

Theorem 6.7.6 (Roth's theorem with popular difference in the integers). For all $\epsilon > 0$, there exists $N_0 = N_0(\epsilon)$ such that for every $N \ge N_0$ and every $A \subseteq [N]$ with $|A| = \alpha N$, there exists $y \neq 0$ such that
\[ |\{x \in [N] : x,\ x + y,\ x + 2y \in A\}| \ge (\alpha^3 - \epsilon) N. \]
See Tao’s blog post A Proof of Roth’s Theorem (2014) for a proof of Theorem 6.7.6 using Bohr
sets, following an arithmetic regularity decomposition in the spirit of Theorem 6.6.11.
Remark 6.7.7 (Bounds). The above proof of Theorem 6.7.1 gives $n_0 = \operatorname{tower}(\epsilon^{-O(1)})$. The bounds in Theorems 6.7.5 and 6.7.6 are also of tower type. What is the smallest $n_0(\epsilon)$ for which Theorem 6.7.1 holds? It turns out to be $\operatorname{tower}(\Theta(\log(1/\epsilon)))$, as proved by Fox and Pham (2019) over finite fields and Fox, Pham, and Zhao (2020) over the integers. Although it had been known since Gowers (1997) that tower-type bounds are necessary for the regularity lemmas themselves, Roth's theorem with popular differences is the first regularity application for which a tower-type bound has been shown to be necessary.
Using quadratic Fourier analysis, Green and Tao (2010) extended the popular difference result to 4-APs.

Theorem 6.7.8 (Popular difference for 4-APs). For all $\epsilon > 0$, there exists $N_0 = N_0(\epsilon)$ such that for every $N \ge N_0$ and $A \subseteq [N]$ with $|A| = \alpha N$, there exists $y \neq 0$ such that
\[ |\{x : x,\ x + y,\ x + 2y,\ x + 3y \in A\}| \ge (\alpha^4 - \epsilon) N. \]
It may be surprising that such a statement is false for APs of length 5 or longer. This was shown by Bergelson et al. (2005), with an appendix by Ruzsa giving a construction that is a clever modification of the Behrend construction (Section 2.5).

Theorem 6.7.9 (Popular difference fails for 5-APs). Let $0 < \alpha < 1/2$. For all sufficiently large $N$, there exists $A \subseteq [N]$ with $|A| \ge \alpha N$ such that for all $y \neq 0$,
\[ |\{x : x,\ x + y,\ x + 2y,\ x + 3y,\ x + 4y \in A\}| \le \alpha^{c \log(1/\alpha)} N. \]
Here $c > 0$ is some absolute constant.
For more on results of this type, as well as on popular differences for high-dimensional patterns, see Sah, Sawhney, and Zhao (2021).

Further reading
The book Fourier Analysis by Stein and Shakarchi (2003) provides an excellent undergraduate-
level introduction to Fourier analysis.
Green has several surveys and lecture notes on the topics covered in this and subsequent
chapters. In Finite Field Models in Additive Combinatorics (2005b), Green argues that one should
begin the study of many additive combinatorics problems in the finite field setting. His Montreal
Lecture Notes on Quadratic Fourier Analysis introduces quadratic Fourier analysis and explains
how to prove the popular common difference theorem for 4-APs in $\mathbb{F}_5^n$. His lecture notes from his Cambridge course Additive Combinatorics (2009b) also provide an excellent introduction to the subject.
Tao’s FOCS 2007 tutorial Structure and Randomness in Combinatorics (2007a) explains many
facets of arithmetic regularity and applications.

For more on algebraic methods in combinatorics (pre-dating methods in Section 6.5), see the
books Thirty-three Miniatures by Matoušek (2010), Linear Algebra Methods in Combinatorics by
Babai and Frankl, and Polynomial Methods in Combinatorics by Guth (2016).
References

M. Ajtai and E. Szemerédi, Sets of lattice points that form no squares, Studia Sci. Math. Hungar. 9 (1974), 9–11 (1975).
MR369299 ↑55
N. Alon, Eigenvalues and expanders, vol. 6, 1986, pp. 83–96. MR875835 doi:10.1007/BF02579166 ↑87, ↑100
N. Alon and V. D. Milman, λ1, isoperimetric inequalities for graphs, and superconcentrators, J. Combin. Theory Ser.
B 38 (1985), 73–88. MR782626 doi:10.1016/0095-8956(85)90092-9 ↑87
Noga Alon and Assaf Naor, Approximating the cut-norm via Grothendieck’s inequality, SIAM J. Comput. 35 (2006),
787–803. MR2203567 doi:10.1137/S0097539704441629 ↑99
Noga Alon and Asaf Shapira, A characterization of the (natural) graph properties testable with one-sided error, SIAM
J. Comput. 37 (2008), 1703–1727. MR2386211 doi:10.1137/06064888X ↑67
Noga Alon and Joel H. Spencer, The probabilistic method, fourth ed., John Wiley & Sons, Inc., 2016. MR3524748
↑15, ↑117
Noga Alon, Lajos Rónyai, and Tibor Szabó, Norm-graphs: variations and applications, J. Combin. Theory Ser. B 76
(1999), 280–290. MR1699238 doi:10.1006/jctb.1999.1906 ↑36
Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy, Efficient testing of large graphs, Combinatorica
20 (2000), 451–476. MR1804820 doi:10.1007/s004930070001 ↑63
Noga Alon, W. Fernandez de la Vega, Ravi Kannan, and Marek Karpinski, Random sampling and approxima-
tion of MAX-CSPs, vol. 67, 2003a, Special issue on STOC2002 (Montreal, QC), pp. 212–243. MR2022830
doi:10.1016/S0022-0000(03)00008-4 ↑122
Noga Alon, Michael Krivelevich, and Benny Sudakov, Turán numbers of bipartite graphs and related Ramsey-type ques-
tions, vol. 12, 2003b, Special issue on Ramsey theory, pp. 477–494. MR2037065 doi:10.1017/S0963548303005741
↑29
Emil Artin, Über die Zerlegung definiter Funktionen in Quadrate, Abh. Math. Sem. Univ. Hamburg 5 (1927), 100–115.
MR3069468 doi:10.1007/BF02952513 ↑144
Lászlo Babai and Péter Frankl, Linear algebra methods in combinatorics, 2020, book draft http://people.cs.
uchicago.edu/~laci/CLASS/HANDOUTS-COMB/BaFrNew.pdf. ↑180
R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive primes. II, Proc. London Math. Soc. (3) 83
(2001), 532–562. MR1851081 doi:10.1112/plms/83.3.532 ↑35
József Balogh, Ping Hu, Bernard Lidický, and Florian Pfender, Maximum density of induced 5-cycle is achieved by
an iterated blow-up of 5-cycle, European J. Combin. 52 (2016), 47–58. MR3425964 doi:10.1016/j.ejc.2015.08.006
↑143
Michael Bateman and Nets Hawk Katz, New bounds on cap sets, J. Amer. Math. Soc. 25 (2012), 585–613. MR2869028
doi:10.1090/S0894-0347-2011-00725-X ↑163
F. A. Behrend, On sets of integers which contain no three terms in arithmetical progression, Proc. Nat. Acad. Sci.
U.S.A. 32 (1946), 331–332. MR18694 doi:10.1073/pnas.32.12.331 ↑57
Clark T. Benson, Minimal regular graphs of girths eight and twelve, Canadian J. Math. 18 (1966), 1091–1094.
MR197342 doi:10.4153/CJM-1966-109-8 ↑38
V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden’s and Szemerédi’s theorems, J. Amer. Math.
Soc. 9 (1996), 725–753. MR1325795 doi:10.1090/S0894-0347-96-00194-4 ↑6
Vitaly Bergelson, Bernard Host, and Bryna Kra, Multiple recurrence and nilsequences, Invent. Math. 160 (2005),
261–303, With an appendix by Imre Ruzsa. MR2138068 doi:10.1007/s00222-004-0428-6 ↑179
Yonatan Bilu and Nathan Linial, Lifts, discrepancy and nearly optimal spectral gap, Combinatorica 26 (2006), 495–519.
MR2279667 doi:10.1007/s00493-006-0029-7 ↑86
Grigoriy Blekherman, Annie Raymond, Mohit Singh, and Rekha R. Thomas, Simple graph density inequalities with
no sum of squares proofs, Combinatorica 40 (2020), 455–471. MR4150878 doi:10.1007/s00493-019-4124-y ↑144


Thomas F. Bloom and Olof Sisask, Breaking the logarithmic barrier in Roth’s theorem on arithmetic progressions,
arXiv preprint (2020). arXiv:2007.03528 ↑5, ↑6, ↑169
Béla Bollobás, Relations between sets of complete subgraphs, Proceedings of the Fifth British Combinatorial Confer-
ence (Univ. Aberdeen, Aberdeen, 1975), 1976, pp. 79–84. MR0396327 ↑151
Béla Bollobás, Modern graph theory, Springer-Verlag, 1998. MR1633290 doi:10.1007/978-1-4612-0619-4 ↑42
J. A. Bondy and U. S. R. Murty, Graph theory, Springer, 2008. MR2368647 doi:10.1007/978-1-84628-970-5 ↑42
J. A. Bondy and M. Simonovits, Cycles of even length in graphs, J. Combinatorial Theory Ser. B 16 (1974), 97–105.
MR340095 doi:10.1016/0095-8956(74)90052-5 ↑27
C. Borgs, J. T. Chayes, L. Lovász, V. T. Sós, and K. Vesztergombi, Convergent sequences of dense graphs.
I. Subgraph frequencies, metric properties and testing, Adv. Math. 219 (2008), 1801–1851. MR2455626
doi:10.1016/j.aim.2008.07.008 ↑115, ↑131
J. Bourgain, On triples in arithmetic progression, Geom. Funct. Anal. 9 (1999), 968–984. MR1726234
doi:10.1007/s000390050105 ↑169
W. G. Brown, On graphs that do not contain a Thomsen graph, Canad. Math. Bull. 9 (1966), 281–285. MR200182
doi:10.4153/CMB-1966-036-2 ↑34, ↑35
W. G. Brown, P. Erdős, and V. T. Sós, Some extremal problems on r-graphs, New directions in the theory of graphs
(Proc. Third Ann Arbor Conf., Univ. Michigan, Ann Arbor, Mich, 1971), 1973, pp. 53–63. MR0351888 ↑54
Boris Bukh, Random algebraic construction of extremal graphs, Bull. Lond. Math. Soc. 47 (2015), 939–945.
MR3431574 doi:10.1112/blms/bdv062 ↑39, ↑42
Boris Bukh, Extremal graphs without exponentially-small bicliques, arXiv preprint (2021). arXiv:2107.04167 ↑39
Jeff Cheeger, A lower bound for the smallest eigenvalue of the Laplacian, Problems in analysis (Papers dedicated to
Salomon Bochner, 1969), 1970, pp. 195–199. MR0402831 ↑87
F. R. K. Chung, R. L. Graham, and R. M. Wilson, Quasi-random graphs, Combinatorica 9 (1989), 345–362.
MR1054011 doi:10.1007/BF02125347 ↑75, ↑82
Fan R. K. Chung, Spectral graph theory, American Mathematical Society, 1997. MR1421568 ↑105
David Conlon, Extremal numbers of cycles revisited, Amer. Math. Monthly 128 (2021), 464–466. MR4249723
doi:10.1080/00029890.2021.1886845 ↑38
David Conlon and Jacob Fox, Graph removal lemmas, Surveys in combinatorics 2013, London Math. Soc. Lecture
Note Ser., vol. 409, Cambridge Univ. Press, Cambridge, 2013, pp. 1–49. MR3156927 ↑73
David Conlon and Yufei Zhao, Quasirandom Cayley graphs, Discrete Anal. (2017), Paper No. 6, 14. MR3631610
doi:10.19086/da.1294 ↑98
David Conlon, Jacob Fox, and Benny Sudakov, An approximate version of Sidorenko’s conjecture, Geom. Funct. Anal.
20 (2010), 1354–1366. MR2738996 doi:10.1007/s00039-010-0097-0 ↑82, ↑135
Don Coppersmith and Shmuel Winograd, Matrix multiplication via arithmetic progressions, J. Symbolic Comput. 9
(1990), 251–280. MR1056627 doi:10.1016/S0747-7171(08)80013-2 ↑57
Ernie Croot, Vsevolod F. Lev, and Péter Pál Pach, Progression-free sets in Z4n are exponentially small, Ann. of Math.
(2) 185 (2017), 331–337. MR3583357 doi:10.4007/annals.2017.185.1.7 ↑163, ↑169
Giuliana Davidoff, Peter Sarnak, and Alain Valette, Elementary number theory, group theory, and Ramanujan graphs,
Cambridge University Press, 2003. MR1989434 doi:10.1017/CBO9780511615825 ↑104, ↑105
Reinhard Diestel, Graph theory, fifth ed., Springer, 2017. MR3644391 doi:10.1007/978-3-662-53622-3 ↑42
Jozef Dodziuk, Difference equations, isoperimetric inequality and transience of certain random walks, Trans. Amer.
Math. Soc. 284 (1984), 787–794. MR743744 doi:10.2307/1999107 ↑87
Yves Edel, Extensions of generalized product caps, Des. Codes Cryptogr. 31 (2004), 5–14. MR2031694
doi:10.1023/A:1027365901231 ↑163
Michael Elkin, An improved construction of progression-free sets, Israel J. Math. 184 (2011), 93–128. MR2823971
doi:10.1007/s11856-011-0061-1 ↑57
Jordan S. Ellenberg and Dion Gijswijt, On large subsets of Fqn with no three-term arithmetic progression, Ann. of Math.
(2) 185 (2017), 339–343. MR3583358 doi:10.4007/annals.2017.185.1.8 ↑163, ↑169
P. Erdős, On some extremal problems on r-graphs, Discrete Math. 1 (1971), 1–6. MR297602 doi:10.1016/0012-
365X(71)90002-1 ↑24
P. Erdős and M. Simonovits, A limit theorem in graph theory, Studia Sci. Math. Hungar. 1 (1966), 51–57. MR205876
↑23
P. Erdős, A. Rényi, and V. T. Sós, On a problem of graph theory, Studia Sci. Math. Hungar. 1 (1966), 215–235.
MR223262 ↑34

Paul Erdős, On some problems in graph theory, combinatorial analysis and combinatorial number theory, Graph
theory and combinatorics (Cambridge, 1983), Academic Press, London, 1984, pp. 1–17. MR777160 ↑143
P. Erdős, On sets of distances of n points, Amer. Math. Monthly 53 (1946), 248–250. MR15796 doi:10.2307/2305092
↑21, ↑22
P. Erdős and A. H. Stone, On the structure of linear graphs, Bull. Amer. Math. Soc. 52 (1946), 1087–1091. MR18807
doi:10.1090/S0002-9904-1946-08715-7 ↑23, ↑26
Paul Erdős and Paul Turán, On Some Sequences of Integers, J. London Math. Soc. 11 (1936), 261–264. MR1574918
doi:10.1112/jlms/s1-11.4.261 ↑4
Geoffrey Exoo, A lower bound for Schur numbers and multicolor Ramsey numbers of K3 , Electron. J. Combin. 1 (1994),
R8, 3 pp. MR1293398 ↑3
Helmut Finner, A generalization of Hölder’s inequality and some probability inequalities, Ann. Probab. 20 (1992),
1893–1901. MR1188047 ↑145, ↑147
Jacob Fox, A new proof of the graph removal lemma, Ann. of Math. (2) 174 (2011), 561–579. MR2811609
doi:10.4007/annals.2011.174.1.17 ↑54
Jacob Fox and Huy Tuan Pham, Popular progression differences in vector spaces II, Discrete Anal. (2019), Paper No.
16, 39. MR4042159 doi:10.19086/da ↑179
Jacob Fox and Benny Sudakov, Dependent random choice, Random Structures Algorithms 38 (2011), 68–99.
MR2768884 doi:10.1002/rsa.20344 ↑42
Jacob Fox, Huy Tuan Pham, and Yufei Zhao, Tower-type bounds for Roth’s theorem with popular differences, 2020.
arXiv:2004.13690 ↑179
Peter Frankl and Vojtěch Rödl, Extremal problems on set systems, Random Structures Algorithms 20 (2002), 131–164.
MR1884430 doi:10.1002/rsa.10017.abs ↑70
Joel Friedman, A proof of Alon’s second eigenvalue conjecture and related problems, Mem. Amer. Math. Soc. 195
(2008), viii+100. MR2437174 doi:10.1090/memo/0910 ↑104
Alan Frieze and Ravi Kannan, Quick approximation to matrices and applications, Combinatorica 19 (1999), 175–220.
MR1723039 doi:10.1007/s004930050052 ↑120, ↑122
William Fulton and Joe Harris, Representation theory, Graduate Texts in Mathematics, vol. 129, Springer-Verlag, New
York, 1991, A first course, Readings in Mathematics. MR1153249 doi:10.1007/978-1-4612-0979-9 ↑95
Zoltán Füredi, On a Turán type problem of Erdős, Combinatorica 11 (1991), 75–79. MR1112277
doi:10.1007/BF01375476 ↑29
Zoltán Füredi and Miklós Simonovits, The history of degenerate (bipartite) extremal graph problems, Erdös centennial,
János Bolyai Math. Soc., 2013, pp. 169–264. MR3203598 doi:10.1007/978-3-642-39286-3_7 ↑42
H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J.
Analyse Math. 31 (1977), 204–256. MR0498471 ↑5, ↑6
H. Furstenberg and Y. Katznelson, An ergodic Szemerédi theorem for commuting transformations, J. Analyse Math.
34 (1978), 275–291. MR531279 doi:10.1007/BF02790016 ↑6
David Galvin and Prasad Tetali, On weighted graph homomorphisms, Graphs, morphisms and statistical physics,
DIMACS Ser. Discrete Math. Theoret. Comput. Sci., vol. 63, Amer. Math. Soc., Providence, RI, 2004, pp. 97–104.
MR2056231 doi:10.1090/dimacs/063/07 ↑147, ↑148
Michel X. Goemans and David P. Williamson, Improved approximation algorithms for maximum cut and satisfia-
bility problems using semidefinite programming, J. Assoc. Comput. Mach. 42 (1995), 1115–1145. MR1412228
doi:10.1145/227683.227684 ↑122
A. W. Goodman, On sets of acquaintances and strangers at any party, Amer. Math. Monthly 66 (1959), 778–783.
MR107610 doi:10.2307/2310464 ↑140, ↑141
W. T. Gowers, Lower bounds of tower type for Szemerédi’s uniformity lemma, Geom. Funct. Anal. 7 (1997), 322–337.
MR1445389 doi:10.1007/PL00001621 ↑48, ↑175, ↑179
W. T. Gowers, A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11 (2001), 465–588. MR1844079
doi:10.1007/s00039-001-0332-9 ↑5
W. T. Gowers, Quasirandomness, counting and regularity for 3-uniform hypergraphs, Combin. Probab. Comput. 15
(2006), 143–184. MR2195580 doi:10.1017/S0963548305007236 ↑72, ↑73
W. T. Gowers, Hypergraph regularity and the multidimensional Szemerédi theorem, Ann. of Math. (2) 166 (2007),
897–946. MR2373376 doi:10.4007/annals.2007.166.897 ↑70, ↑72
W. T. Gowers, Quasirandom groups, Combin. Probab. Comput. 17 (2008), 363–387. MR2410393
doi:10.1017/S0963548307008826 ↑92, ↑95, ↑96
Ronald L. Graham, Bruce L. Rothschild, and Joel H. Spencer, Ramsey theory, John Wiley & Sons, Inc., 2013.
MR3288500 ↑8
B. Green, A Szemerédi-type regularity lemma in abelian groups, with applications, Geom. Funct. Anal. 15 (2005),
340–376. MR2153903 doi:10.1007/s00039-005-0509-8 ↑173, ↑175, ↑177, ↑178
Ben Green, Finite field models in additive combinatorics, Surveys in combinatorics 2005, Cambridge University Press,
2005b, pp. 1–27. MR2187732 doi:10.1017/CBO9780511734885.002 ↑179
Ben Green, Additive combinatorics (book review), Bull. Amer. Math. Soc. (N.S.) 46 (2009), 489–497. MR2507281
doi:10.1090/S0273-0979-09-01231-2 ↑8
Ben Green, Additive combinatorics, 2009b, lecture notes
http://people.maths.ox.ac.uk/greenbj/notes.html. ↑179
Ben Green and Terence Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008),
481–547. MR2415379 doi:10.4007/annals.2008.167.481 ↑7
Ben Green and Terence Tao, An arithmetic regularity lemma, an associated counting lemma, and applications, An
irregular mind, Bolyai Soc. Math. Stud., vol. 21, János Bolyai Math. Soc., Budapest, 2010, pp. 261–334. MR2815606
doi:10.1007/978-3-642-14444-8_7 ↑179
Ben Green and Terence Tao, New bounds for Szemerédi’s theorem, III: a polylogarithmic bound for $r_4(N)$, Mathematika
63 (2017), 944–1040. MR3731312 doi:10.1112/S0025579317000316 ↑5
Ben Green and Julia Wolf, A note on Elkin’s improvement of Behrend’s construction, Additive number theory, Springer,
New York, 2010, pp. 141–144. MR2744752 doi:10.1007/978-0-387-68361-4_9 ↑57
A. Grothendieck, Résumé de la théorie métrique des produits tensoriels topologiques, Bol. Soc. Mat. São Paulo 8
(1953), 1–79. MR94682 ↑98
Andrzej Grzesik, On the maximum number of five-cycles in a triangle-free graph, J. Combin. Theory Ser. B 102 (2012),
1061–1066. MR2959390 doi:10.1016/j.jctb.2012.04.001 ↑143
Larry Guth, Polynomial methods in combinatorics, University Lecture Series, vol. 64, American Mathematical Society,
Providence, RI, 2016. MR3495952 doi:10.1090/ulect/064 ↑180
Larry Guth and Nets Hawk Katz, On the Erdős distinct distances problem in the plane, Ann. of Math. (2) 181 (2015),
155–190. MR3272924 doi:10.4007/annals.2015.181.1.2 ↑22
Johan Håstad, Some optimal inapproximability results, J. ACM 48 (2001), 798–859. MR2144931
doi:10.1145/502090.502098 ↑122
Hamed Hatami and Serguei Norine, Undecidability of linear inequalities in graph homomorphism densities, J. Amer.
Math. Soc. 24 (2011), 547–565. MR2748400 doi:10.1090/S0894-0347-2010-00687-X ↑133, ↑144
Hamed Hatami, Jan Hladký, Daniel Kráľ, Serguei Norine, and Alexander Razborov, On the number of pentagons in
triangle-free graphs, J. Combin. Theory Ser. A 120 (2013), 722–732. MR3007147 doi:10.1016/j.jcta.2012.12.008
↑143
David Hilbert, Ueber die Darstellung definiter Formen als Summe von Formenquadraten, Math. Ann. 32 (1888),
342–350. MR1510517 doi:10.1007/BF01443605 ↑144
David Hilbert, Über ternäre definite Formen, Acta Math. 17 (1893), 169–197. MR1554835 doi:10.1007/BF02391990
↑144
Shlomo Hoory, Nathan Linial, and Avi Wigderson, Expander graphs and their applications, Bull. Amer. Math. Soc.
(N.S.) 43 (2006), 439–561. MR2247919 doi:10.1090/S0273-0979-06-01126-8 ↑105
Kaave Hosseini, Shachar Lovett, Guy Moshkovitz, and Asaf Shapira, An improved lower bound for arithmetic regularity,
Math. Proc. Cambridge Philos. Soc. 161 (2016), 193–197. MR3530502 doi:10.1017/S030500411600013X ↑175
Kenneth Ireland and Michael Rosen, A classical introduction to modern number theory, second ed., Graduate Texts in
Mathematics, vol. 84, Springer-Verlag, New York, 1990. MR1070716 doi:10.1007/978-1-4757-2103-4 ↑91
Herbert E. Jordan, Group-Characters of Various Types of Linear Groups, Amer. J. Math. 29 (1907), 387–405.
MR1506021 doi:10.2307/2370015 ↑94
Jeff Kahn, An entropy approach to the hard-core model on bipartite graphs, Combin. Probab. Comput. 10 (2001),
219–237. MR1841642 doi:10.1017/S0963548301004631 ↑147, ↑148
G. Katona, A theorem of finite sets, Theory of graphs (Proc. Colloq., Tihany, 1966), 1968, pp. 187–207. MR0290982
↑137
Kiran S. Kedlaya, Large product-free subsets of finite groups, J. Combin. Theory Ser. A 77 (1997), 339–343.
MR1429085 doi:10.1006/jcta.1997.2715 ↑96
Kiran S. Kedlaya, Product-free subsets of groups, Amer. Math. Monthly 105 (1998), 900–906. MR1656927
doi:10.2307/2589282 ↑96
Peter Keevash, Hypergraph Turán problems, Surveys in combinatorics 2011, London Math. Soc. Lecture Note Ser.,
vol. 392, Cambridge Univ. Press, Cambridge, 2011, pp. 83–139. MR2866732 ↑42
Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O’Donnell, Optimal inapproximability results for MAX-CUT
and other 2-variable CSPs?, SIAM J. Comput. 37 (2007), 319–357. MR2306295 doi:10.1137/S0097539705447372
↑122
János Kollár, Lajos Rónyai, and Tibor Szabó, Norm-graphs and bipartite Turán numbers, Combinatorica 16 (1996),
399–406. MR1417348 doi:10.1007/BF01261323 ↑36
J. Komlós and M. Simonovits, Szemerédi’s regularity lemma and its applications in graph theory, Combinatorics, Paul
Erdős is eighty, Vol. 2 (Keszthely, 1993), Bolyai Soc. Math. Stud., vol. 2, János Bolyai Math. Soc., Budapest, 1996,
pp. 295–352. MR1395865 ↑73
János Komlós, Ali Shokoufandeh, Miklós Simonovits, and Endre Szemerédi, The regularity lemma and its applications
in graph theory, Theoretical aspects of computer science (Tehran, 2000), Lecture Notes in Comput. Sci., vol. 2292,
Springer, Berlin, 2002, pp. 84–112. MR1966181 doi:10.1007/3-540-45878-6_3 ↑73
T. Kővári, V. T. Sós, and P. Turán, On a problem of K. Zarankiewicz, Colloq. Math. 3 (1954), 50–57. MR65617
doi:10.4064/cm-3-1-50-57 ↑19
M. Krivelevich and B. Sudakov, Pseudo-random graphs, More sets, graphs and numbers, Bolyai Soc. Math. Stud.,
vol. 15, Springer, Berlin, 2006, pp. 199–262. MR2223394 doi:10.1007/978-3-540-32439-3_10 ↑105
Joseph B. Kruskal, The number of simplices in a complex, Mathematical optimization techniques, Univ. California
Press, Berkeley, Calif., 1963, pp. 251–278. MR0154827 ↑137
L. H. Loomis and H. Whitney, An inequality related to the isoperimetric inequality, Bull. Amer. Math. Soc 55 (1949),
961–962. MR0031538 doi:10.1090/S0002-9904-1949-09320-5 ↑41, ↑146
László Lovász, Very large graphs, Current developments in mathematics, 2008, Int. Press, Somerville, MA, 2009,
pp. 67–128. MR2555927 ↑132
László Lovász, Large networks and graph limits, American Mathematical Society Colloquium Publications, vol. 60,
American Mathematical Society, Providence, RI, 2012. MR3012035 doi:10.1090/coll/060 ↑132, ↑135, ↑138, ↑141,
↑154
László Lovász and Balázs Szegedy, Limits of dense graph sequences, J. Combin. Theory Ser. B 96 (2006), 933–957.
MR2274085 doi:10.1016/j.jctb.2006.05.002 ↑115
László Lovász and Balázs Szegedy, Szemerédi’s lemma for the analyst, Geom. Funct. Anal. 17 (2007), 252–270.
MR2306658 doi:10.1007/s00039-007-0599-6 ↑112
Eyal Lubetzky and Yufei Zhao, On the variational problem for upper tails in sparse random graphs, Random Structures
Algorithms 50 (2017), 420–436. MR3632418 doi:10.1002/rsa.20658 ↑147, ↑148
A. Lubotzky, R. Phillips, and P. Sarnak, Ramanujan graphs, Combinatorica 8 (1988), 261–277. MR963118
doi:10.1007/BF02126799 ↑104
Alexander Lubotzky, Discrete groups, expanding graphs and invariant measures, Progress in Mathematics, vol. 125,
Birkhäuser Verlag, Basel, 1994, With an appendix by Jonathan D. Rogawski. MR1308046 doi:10.1007/978-3-0346-
0332-4 ↑105
Alexander Lubotzky, Expander graphs in pure and applied mathematics, Bull. Amer. Math. Soc. (N.S.) 49 (2012),
113–162. MR2869010 doi:10.1090/S0273-0979-2011-01359-3 ↑105
W. Mantel, Problem 28, Wiskundige Opgaven 10 (1907), 60–61. ↑10
Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava, Interlacing families I: Bipartite Ramanujan graphs of
all degrees, Ann. of Math. (2) 182 (2015), 307–325. MR3374962 doi:10.4007/annals.2015.182.1.7 ↑105
G. A. Margulis, Explicit group-theoretic constructions of combinatorial schemes and their applications in the con-
struction of expanders and concentrators, Problemy Peredachi Informatsii 24 (1988), 51–60. MR939574 ↑104
Ju. V. Matiyasevich, The Diophantineness of enumerable sets, Dokl. Akad. Nauk SSSR 191 (1970), 279–282.
MR0258744 ↑133
Jiří Matoušek, Thirty-three miniatures, American Mathematical Society, 2010, Mathematical and algorithmic applica-
tions of linear algebra. MR2656313 doi:10.1090/stml/053 ↑180
Roy Meshulam, On subsets of finite abelian groups with no 3-term arithmetic progressions, J. Combin. Theory Ser. A
71 (1995), 168–172. MR1335785 doi:10.1016/0097-3165(95)90024-1 ↑159
Moshe Morgenstern, Existence and explicit constructions of q + 1 regular Ramanujan graphs for every prime power
q, J. Combin. Theory Ser. B 62 (1994), 44–62. MR1290630 doi:10.1006/jctb.1994.1054 ↑104
Guy Moshkovitz and Asaf Shapira, A tight bound for hypergraph regularity, Geom. Funct. Anal. 29 (2019), 1531–1578.
MR4025519 doi:10.1007/s00039-019-00512-5 ↑72
T. S. Motzkin, The arithmetic-geometric inequality, Inequalities (Proc. Sympos. Wright-Patterson Air Force Base,
Ohio, 1965), Academic Press, New York, 1967, pp. 205–224. MR0223521 ↑144
T. S. Motzkin and E. G. Straus, Maxima for graphs and a new proof of a theorem of Turán, Canadian J. Math. 17
(1965), 533–540. MR175813 doi:10.4153/CJM-1965-053-6 ↑150
Jaroslav Nešetřil and Moshe Rosenfeld, I. Schur, C. E. Shannon and Ramsey numbers, a short story, Discrete Math. 229
(2001), 185–195, Combinatorics, graph theory, algorithms and applications. MR1815606 doi:10.1016/S0012-365X(00)00208-9 ↑3
V. Nikiforov, The number of cliques in graphs of given order and size, Trans. Amer. Math. Soc. 363 (2011), 1599–1618.
MR2737279 doi:10.1090/S0002-9947-2010-05189-X ↑139
N. Nikolov and L. Pyber, Product decompositions of quasirandom groups and a Jordan type theorem, J. Eur. Math.
Soc. (JEMS) 13 (2011), 1063–1077. MR2800484 doi:10.4171/JEMS/275 ↑96
A. Nilli, On the second eigenvalue of a graph, Discrete Math. 91 (1991), 207–210. MR1124768 doi:10.1016/0012-
365X(91)90112-F ↑100
Giuseppe Pellegrino, Sul massimo ordine delle calotte in $S_{4,3}$, Matematiche (Catania) 25 (1970), 149–157 (1971).
MR363952 ↑161
Sarah Peluse, Bounds for sets with no polynomial progressions, Forum Math. Pi 8 (2020), e16, 55. MR4199235
doi:10.1017/fmp.2020.11 ↑7
Nicholas Pippenger and Martin Charles Golumbic, The inducibility of graphs, J. Combinatorial Theory Ser. B 19
(1975), 189–203. MR401552 doi:10.1016/0095-8956(75)90084-2 ↑143
D. H. J. Polymath, A new proof of the density Hales-Jewett theorem, Ann. of Math. (2) 175 (2012), 1283–1327. MR2912706
doi:10.4007/annals.2012.175.3.6 ↑5
Alexander A. Razborov, Flag algebras, J. Symbolic Logic 72 (2007), 1239–1282. MR2371204
doi:10.2178/jsl/1203350785 ↑142
Alexander A. Razborov, On the minimal density of triangles in graphs, Combin. Probab. Comput. 17 (2008), 603–618.
MR2433944 doi:10.1017/S0963548308009085 ↑138, ↑142
Alexander A. Razborov, Flag algebras: an interim report, The mathematics of Paul Erdős. II, Springer, New York,
2013, pp. 207–232. MR3186665 doi:10.1007/978-1-4614-7254-4_16 ↑154
Christian Reiher, The clique density theorem, Ann. of Math. (2) 184 (2016), 683–707. MR3549620
doi:10.4007/annals.2016.184.3.1 ↑139
V. Rödl, B. Nagle, J. Skokan, M. Schacht, and Y. Kohayakawa, The hypergraph regularity method and its applications,
Proc. Natl. Acad. Sci. USA 102 (2005), 8109–8113. MR2167756 doi:10.1073/pnas.0502771102 ↑5, ↑70, ↑72
K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 104–109. MR51853 doi:10.1112/jlms/s1-
28.1.104 ↑4, ↑43, ↑155, ↑164
Imre Z. Ruzsa and Endre Szemerédi, Triple systems with no six points carrying three triangles, Combinatorics (Proc.
Fifth Hungarian Colloq., Keszthely, 1976), Vol. II, 1978, pp. 939–945. MR519318 ↑7, ↑43, ↑52, ↑54
Bruce E. Sagan, The symmetric group, second ed., Graduate Texts in Mathematics, vol. 203, Springer-Verlag, New
York, 2001, Representations, combinatorial algorithms, and symmetric functions. MR1824028 doi:10.1007/978-1-
4757-6804-6 ↑95
Ashwin Sah, Mehtaab Sawhney, David Stoner, and Yufei Zhao, The number of independent sets in an irregular graph,
J. Combin. Theory Ser. B 138 (2019), 172–195. MR3979229 doi:10.1016/j.jctb.2019.01.007 ↑150
Ashwin Sah, Mehtaab Sawhney, David Stoner, and Yufei Zhao, A reverse Sidorenko inequality, Invent. Math. 221
(2020), 665–711. MR4121160 doi:10.1007/s00222-020-00956-9 ↑150
Ashwin Sah, Mehtaab Sawhney, and Yufei Zhao, Patterns without a popular difference, Discrete Anal. (2021), Paper
No. 8, 30. MR4293329 doi:10.19086/da ↑179
R. Salem and D. C. Spencer, On sets of integers which contain no three terms in arithmetical progression, Proc. Nat.
Acad. Sci. U.S.A. 28 (1942), 561–563. MR7405 doi:10.1073/pnas.28.12.561 ↑57
A. Sárközy, On difference sets of sequences of integers. I, Acta Math. Acad. Sci. Hungar. 31 (1978), 125–149.
MR466059 doi:10.1007/BF01896079 ↑6
Tomasz Schoen and Ilya D. Shkredov, Roth’s theorem in many variables, Israel J. Math. 199 (2014), 287–308.
MR3219538 doi:10.1007/s11856-013-0049-0 ↑5
Tomasz Schoen and Olof Sisask, Roth’s theorem for four variables and additive structures in sums of sparse sets,
Forum Math. Sigma 4 (2016), e5, 28 pp. MR3482282 doi:10.1017/fms.2016.2 ↑5
Alexander Schrijver, Combinatorial optimization. Polyhedra and efficiency, Springer-Verlag, 2003. MR1956924 ↑42
I. Schur, Über die Kongruenz $x^m + y^m \equiv z^m \pmod{p}$, Jber. Deutsch. Math.-Verein. 25 (1916). ↑1
J. Schur, Untersuchungen über die Darstellung der endlichen Gruppen durch gebrochene lineare Substitutionen, J.
Reine Angew. Math. 132 (1907), 85–137. MR1580715 doi:10.1515/crll.1907.132.85 ↑94
I. D. Shkredov, On a generalization of Szemerédi’s theorem, Proc. London Math. Soc. (3) 93 (2006), 723–760.
MR2266965 doi:10.1017/S0024611506015991 ↑56
Alexander Sidorenko, A correlation inequality for bipartite graphs, Graphs Combin. 9 (1993), 201–204. MR1225933
doi:10.1007/BF02988307 ↑134
Robert Singleton, On minimal graphs of maximum even girth, J. Combinatorial Theory 1 (1966), 306–332. MR201347
↑38
Jozef Skokan and Lubos Thoma, Bipartite subgraphs and quasi-randomness, Graphs Combin. 20 (2004), 255–262.
MR2080111 doi:10.1007/s00373-004-0556-1 ↑82, ↑135
József Solymosi, Note on a generalization of Roth’s theorem, Discrete and computational geometry, Algorithms
Combin., vol. 25, Springer, Berlin, 2003, pp. 825–827. MR2038505 doi:10.1007/978-3-642-55566-4_39 ↑55
Daniel A. Spielman, Spectral and algebraic graph theory, 2019, textbook draft
http://cs-www.cs.yale.edu/homes/spielman/sagt/. ↑105
Elias M. Stein and Rami Shakarchi, Fourier analysis, Princeton Lectures in Analysis, vol. 1, Princeton University
Press, Princeton, NJ, 2003, An introduction. MR1970295 ↑179
E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 199–245.
MR369312 doi:10.4064/aa-27-1-199-245 ↑4
Endre Szemerédi, Regular partitions of graphs, Problèmes combinatoires et théorie des graphes (Colloq. Internat.
CNRS, Univ. Orsay, Orsay, 1976), Colloq. Internat. CNRS, vol. 260, CNRS, Paris, 1978, pp. 399–401. MR540024
↑49
Terence Tao, A variant of the hypergraph removal lemma, J. Combin. Theory Ser. A 113 (2006), 1257–1280.
MR2259060 doi:10.1016/j.jcta.2005.11.006 ↑72
Terence Tao, Structure and randomness in combinatorics, 48th Annual IEEE Symposium on Foundations of Computer
Science (FOCS’07), 2007a, pp. 3–15. doi:10.1109/FOCS.2007.17 ↑177, ↑179
Terence Tao, The dichotomy between structure and randomness, arithmetic progressions, and the primes, International
Congress of Mathematicians. Vol. I, Eur. Math. Soc., Zürich, 2007b, pp. 581–608. MR2334204 doi:10.4171/022-
1/22 ↑5
Terence Tao, The spectral proof of the Szemerédi regularity lemma, 2012, blog post
https://terrytao.wordpress.com/2012/12/03/the-spectral-proof-of-the-szemeredi-regularity-lemma/. ↑177
Terence Tao, A proof of Roth’s theorem, 2014, blog post
https://terrytao.wordpress.com/2014/04/24/a-proof-of-roths-theorem/. ↑179
Terence Tao and Van Vu, Additive combinatorics, Cambridge University Press, 2006. MR2289012
doi:10.1017/CBO9780511755149 ↑8
Alfred Tarski, A Decision Method for Elementary Algebra and Geometry, RAND Corporation, Santa Monica, Calif.,
1948. MR0028796 ↑133
Andrew Thomason, Pseudorandom graphs, Random graphs ’85 (Poznań, 1985), North-Holland Math. Stud., vol. 144,
North-Holland, Amsterdam, 1987, pp. 307–331. MR930498 ↑75
Andrew Thomason, A disproof of a conjecture of Erdős in Ramsey theory, J. London Math. Soc. (2) 39 (1989),
246–255. MR991659 doi:10.1112/jlms/s2-39.2.246 ↑141
Paul Turán, Eine Extremalaufgabe aus der Graphentheorie, Mat. Fiz. Lapok 48 (1941), 436–452 (Hungarian, with
German summary). ↑12
B. L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw Arch. Wisk. 15 (1927), 212–216. ↑4
R. Wenger, Extremal graphs with no $C_4$’s, $C_6$’s, or $C_{10}$’s, J. Combin. Theory Ser. B 52 (1991), 113–116. MR1109426
doi:10.1016/0095-8956(91)90097-4 ↑38
Douglas B. West, Introduction to graph theory, Prentice Hall, Inc., 1996. MR1367739 ↑42
Avi Wigderson, Representation theory of finite groups, and applications, Lecture notes for the 22nd McGill invitational
workshop on computational complexity, 2012, https://www.math.ias.edu/~avi/TALKS/Green_Wigderson_lecture.pdf. ↑96
David Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press,
Cambridge, 1991. MR1155402 doi:10.1017/CBO9780511813658 ↑123, ↑124
K. Zarankiewicz, Problem 101, Colloq. Math. 2 (1951), 201. ↑19
Yufei Zhao, The number of independent sets in a regular graph, Combin. Probab. Comput. 19 (2010), 315–320.
MR2593625 doi:10.1017/S0963548309990538 ↑147, ↑149
Yufei Zhao, Extremal regular graphs: independent sets and graph homomorphisms, Amer. Math. Monthly 124 (2017),
827–843. MR3722040 doi:10.4169/amer.math.monthly.124.9.827 ↑150