Gtacbook
Gtacbook
Yufei Zhao
Massachusetts Institute of Technology
i
This is a book in progress. It has not been carefully proofread. Your feedback is appreciated in
improving this draft.
Please report errors, suggestions, and comments to
https://bit.ly/gtac-form
Current students of MIT 18.225 should use the Piazza forum instead.
Appetizer
Theorem 0.1.1 (Schur’s theorem). If the positive integers are colored with finitely many colors,
then there is always a monochromatic solution to x + y = z (i.e., x, y, z all have the same color).
We will prove Schur’s theorem shortly. But first, let us show how to deduce the existence of
solutions to X n + Y n ≡ Z n (mod p) using Schur’s theorem.
Schur’s theorem is stated above in its “infinitary” form. It is equivalent to a “finitary” formulation
below.
We write [N] := {1, 2, . . . , N }.
Theorem 0.1.2 (Schur’s theorem, finitary version). For every positive integer r, there exists a
positive integer N = N(r) such that if the elements of [N] are colored with r colors, then there is a
monochromatic solution to x + y = z with x, y, z ∈ [N].
With the finitary version, we can also ask quantitative questions such as how big does N(r)
have to be as a function of r. For most questions of this type, we do not know the answer, even
approximately.
Let us show that the two formulations, Theorems 0.1.1 and 0.1.2, are equivalent. It is clear
that the finitary version of Schur’s theorem implies the infinitary version. To see that the infinitary
version implies the finitary version, fix r, and suppose that for every N there is some coloring
φ N : [N] → [r] that avoids monochromatic solutions to x + y = z. We can take an infinite
subsequence of (φ N ) such that, for every k ∈ N, the value of φ N (k) stabilizes to a constant as N
increases along this subsequence. Then the φ N ’s, along this subsequence, converges pointwise to
some coloring φ : N → [r] avoiding monochromatic solutions to x + y = z, but this contradicts the
infinitary statement.
Let us now deduce the claim about the modular Fermat’s equation discussed at the beginning.
Theorem 0.1.3. Let n be a positive integer. For all sufficiently large primes p, there are X, Y, Z ∈
{1, . . . , p − 1} such that X n + Y n ≡ Z n (mod p).
Proof of Theorem 0.1.3 assuming Schur’s theorem (Theorem 0.1.2). We write (Z/pZ)× for the group
of nonzero residues mod p under multiplication. Let H = {x n : x ∈ (Z/pZ)× } be the subgroup of
n-th powers in (Z/pZ)× . Since (Z/pZ)× is a cyclic group of order p − 1 (due to the existence of
1
2 0. APPETIZER
Figure 0.1.1. Frank Ramsey (1903–1930) had made major contributions to mathe-
matical logic, philosophy, and economics, before his untimely death at age 26 after
suffering from chronic liver problems.
primitive roots mod p, a fact from elementary number theory), the index of H in (Z/pZ)× is equal
to gcd(n, p − 1) ≤ n. So the cosets of H partition {1, 2, . . . , p − 1} into at most n sets. By the finitary
statement of Schur’s theorem (Theorem 0.1.2), for p large enough, there is a solution to
x+y=z in Z
in one of the cosets of H, say aH for some a ∈ (Z/pZ)× . Since H consists of n-th powers, we have
x = aX n , y = aY n , and z = aZ n for some X, Y, Z ∈ (Z/pZ)× . Thus
aX n + aY n ≡ aZ n (mod p).
Hence
Xn + Y n ≡ Zn (mod p)
as desired.
Now let us prove Theorem 0.1.2 by deducing it from a similar sounding result about coloring
the edges of a complete graph. The next result is a special case of Ramsey’s theorem.
Theorem 0.1.4 (Multicolor triangle Ramsey theorem). For every positive integer r, there is some
integer N = N(r) such that if the edges of KN , the complete graph on N vertices, are colored with
r colors, then there is always a monochromatic triangle.
Proof. Define
N1 = 3, and Nr = r(Nr−1 − 1) + 2 for all r ≥ 2. (0.1.1)
We will show by induction on r that every coloring of the edges of KNr by r colors has a monochro-
matic triangle. The case r = 1 holds trivially. Suppose the claim is true for r − 1 colors. Consider
any coloring of the edges of KNr by r colors. Pick an arbitrary vertex v.
V0
Of the Nr − 1 = r(Nr−1 − 1) + 1 edges incident to v, by the pigeonhole principle, at least Nr−1 edges
incident to v have the same color, say red. Let V0 be the vertices joined to v by a red edge. If there
is a red edge inside V0 , we obtain a red triangle. Otherwise, there are at most r − 1 colors appearing
among |V0 | ≥ Nr−1 vertices, and we have a monochromatic triangle by induction.
0.1. SCHUR’S THEOREM 3
φ(k − i)
i φ( j − i) j φ(k − j) k
Proof of Schur’s theorem (Theorem 0.1.2). Let φ : [N] → [r] be a coloring. Color the edges of a
complete graph with vertices {1, . . . , N + 1} by giving the edge {i, j} with i < j the color φ( j − i).
By Theorem 0.1.4, if N is large enough, then there is a monochromatic triangle, say on vertices
i < j < k. So φ( j − i) = φ(k − j) = φ(k − i). Take x = j − i, y = k − j, and z = k − i. Then
φ(x) = φ(y) = φ(z) and x + y = z, as desired.
Notice how we solved a number theory problem by moving over to a graph theoretic setup.
We gained greater flexibility by considering graphs, since the induction on vertices does not
correspond to a natural operator for the integers that we started with. Later on we will see other
more sophisticated examples of this idea, where taking a number theoretic problem to the land of
graph theory gives us a new perspective.
Exercise 0.1.7 (Schur’s lower bound). Let N(r) denote the smallest positive integer in Schur’s
theorem, Theorem 0.1.2. Show that N(r) ≥ 3N(r − 1) − 1 for every r.
Deduce that N(r) ≥ (3r + 1)/2 for every r. Also deduce that there exists a coloring of the edges
of K(3r +3)/2 with r colors so that there are no monochromatic triangles.
Exercise 0.1.8 (Upper bound on Ramsey numbers). Let s and t be positive integers. Show that if
s+t−2
the edges of a complete graph on s−1 vertices are colored with red and blue, then there must be
either a red Ks or a blue Kt .
Exercise 0.1.9 (Ramsey’s theorem). Show that for every k and r there exists some N = N(k, r)
such that every coloring of the edges of KN using r colors, there exists some monochromatic copy
clique on k vertices.
4 0. APPETIZER
Theorem 0.2.1 (van der Waerden’s theorem). If the integers are colored with finitely many colors,
then one of the color classes must contain arbitrarily long arithmetic progressions.
Note that having arbitrarily long arithmetic progressions is very different from having infinitely
long arithmetic progressions, as seen in the next exercise.
Exercise 0.2.2. Show that Z may be colored using two colors so that it contains no infinitely long
arithmetic progressions.
Erdős and Turán (1936) conjectured a stronger statement, that any subset of the integers with
positive density contains arbitrarily long arithmetic progressions. To be precise, we say that A ⊂ Z
has positive upper density if
| A ∩ {−N, . . . , N }|
lim sup > 0. (0.2.1)
N→∞ 2N + 1
(There are several variations of this definition—the exact formulation is not crucial.)
Roth (1953) proved the conjecture for 3-term arithmetic progression using Fourier analytic
methods. Szemerédi (1975) fully settled the conjecture using combinatorial techniques. These are
landmark theorems in the field. Much of what we will discuss are motivated by these results and
the developments around them.
Theorem 0.2.3 (Roth’s theorem). Every subset of the integers with positive upper density contains
a 3-term arithmetic progression.
Theorem 0.2.4 (Szemerédi’s theorem). Every subset of the integers with positive upper density
contains arbitrarily long arithmetic progressions.
0.2. PROGRESSIONS 5
Szemerédi’s theorem is deep and intricate. This important work led to many subsequent
developments in additive combinatorics. Several different proofs of Szemerédi’s theorem have
since been discovered, and some of them have blossomed into rich areas of mathematical research.
Here are some the most influential modern proofs of Szemerédi’s theorem (in historical order):
• The ergodic theoretic approach by Furstenberg (1977);
• Higher-order Fourier analysis by Gowers (2001);
• Hypergraph regularity lemma by independently Rödl et al. (2005) and Gowers (2001).
Another modern proof of Szemerédi’s theorem results from the density Hales–Jewett theorem,
which was originally proved by Furstenberg and Katznelson using ergodic theory, and subsequently
a new combinatorial proof was found in the first successful Polymath Project (Polymath 2012), an
online collaborative project initiated by Gowers.
These different approaches have distinct advantages and disadvantages. The ideas they intro-
duces led to other applications. For example, the ergodic approach led to multidimensional and
polynomial generalizations of Szemerédi’s theorem, which we discuss below. On the other hand,
the ergodic approach does not give any concrete quantitative bounds. Fourier analytic methods
give the best quantitative bounds to Szemerédi’s theorem, and it lead to deep results about counting
patterns in the prime numbers. However, there appear to be difficulties and obstructions extending
Fourier analytic methods to higher dimensions.
The relationships between these different approaches to Szemerédi’s theorem are not yet com-
pletely understood. A unifying theme underlying all known approaches to Szemerédi’s theorem is
the dichotomy between structure and pseudorandomness, a term popularized by Tao (2007b) and
others. We will later see different facets of this dichotomy both in the context of graph theory as
well as in number theory.
There is also much interest in obtaining better quantitative bounds on Szemerédi’s theorem.
Roth’s initial proof showed that every subset of [N] avoiding 3-term arithmetic progressions has
size O(N/log log N). We will see this proof in Chapter 6. Roth’s upper bound has been improved
steadily over time, all via refinement of his Fourier analytic technique. At the time of this writing,
the current best claimed upper bound is N/(log N)1+c for some constant c > 0 (Bloom and Sisask
2020). For 4-term arithmetic progressions, the best known upper bound is N/(log N)c (Green and
Tao 2017). For k-term arithmetic progressions, with fixed k ≥ 5, the best known upper bound √
is N/(log log N)ck . As for lower bounds, Behrend constructed a subset of [N] of size Ne−c log N
avoiding three term arithmetic progressions. There is some evidence leading to a common opinion
among researchers at this lower bound is closer to the truth, since for a certain variant of Roth’s
theorem (namely avoiding solutions to x + y + z = 3w), Behrend’s construction is quite close to
the truth (Schoen and Shkredov 2014; Schoen and Sisask 2016).
Erdős famously conjectured the following.
Conjecture 0.2.5 (Erdős conjecture on arithmetic progressions). Every subset A of integers with
= ∞ contains arbitrarily long arithmetic progressions.
Í
a∈A 1/a
which | A ∩ [N]| ≥ N/(log N(log log N)2 ). Morally the hypothesis is roughly a statement about
the density of A, that is not too much smaller than the primes. So the Erdős conjecture is really
a conjecture about an upper bound on Szemerédi’s theorem. Although many believe that much
stronger statement than the Erdős conjecture is true, as discussed in the last part of the previous
paragraph, the “logarithmic barrier” has a special symbolic status. Erdős’ conjecture for k-term
arithmetic progressions is now proved for k = 3 thanks for the new N/(log N)1+c upper bound
(Bloom and Sisask 2020), but it remains very much open for all k ≥ 3.
Perhaps by the time you read this book (or when I update it to a future edition), these bounds
will have been improved.
| A ∩ [−N, N]d |
lim sup >0
N→∞ (2N + 1)d
(as before, other similar definitions are possible). We say that A contains arbitrary constellations
if for every finite set F ⊂ Zd , there is some a ∈ Zd and t ∈ Z>0 such that a+t ·F = {a+t x : x ∈ F} is
contained in A. In other words, A contains every finite pattern, each consisting of some finite subset
of the integer grid allowing dilation and translation. The following multidimensional generalization
of Szemerédi’s theorem was proved by Furstenberg and Katznelson (1978) initially using ergodic
theory, though a combinatorial proof was later discovered as a consequence of the hypergraph
regularity method mentioned earlier.
Theorem 0.2.6 (Multidimensional Szemerédi theorem). Every subset of Zd of positive upper density
contains arbitrary constellations.
For example, the theorem implies that every subset of Zd of positive upper density contains a
10 × 10 set of points that form an axis-aligned square grid.
There is also a polynomial extension of Szemerédi’s theorem. Let us first state a special case,
originally conjectured by Lovász and proved independently by Furstenberg (1977) and Sárkőzy
(1978).
Theorem 0.2.7 (Furstenberg–Sárkőzy theorem). Any subset of the integers with positive upper
density contains two numbers differing by a perfect square.
In other words, the set always contains {x, x + y 2 } for some x ∈ Z and y ∈ Z>0 . What about
other polynomial patterns? The following polynomial generalization was proved by Bergelson and
Leibman (1996).
Theorem 0.2.8 (Polynomial Szemerédi theorem). Suppose A ⊂ Z has positive upper density. If
P1, . . . , Pk ∈ Z[X] are polynomials with P1 (0) = · · · = Pk (0) = 0, then there exist x ∈ Z and
y ∈ Z>0 such that x + P1 (y), . . . , x + Pk (y) ∈ A.
We leave it as an exercise to formulate a common extension of the above two theorems (i.e., a
multidimensional polynomial Szemerédi theorem). Such a theorem was also proved by Bergelson
and Leibman.
0.2. PROGRESSIONS 7
We will not cover the proof of Theorems 0.2.6 and 0.2.8. In fact, currently the only known
general proof of the polynomial Szemerédi theorem uses ergodic theory, though there are some
recent exciting developments (Peluse 2020).
Building on Szemerédi’s theorem as well as other important developments in number theory,
Green and Tao (2008) proved their famous theorem that settled an old folklore conjecture about
prime numbers. Their theorem is considered one of the most celebrated mathematical achievements
this century.
Theorem 0.2.9 (Green–Tao theorem). The primes contain arbitrarily long arithmetic progressions.
We will discuss many central ideas behind the proof of the Green–Tao theorem.
One of our goals is to understand two different proofs of Roth’s theorem, which can be rephrased
as:
Theorem 0.2.10 (Roth’s theorem). Every subset of [N] that does not contain 3-term arithmetic
progressions has size o(N).
Roth originally proved his result using Fourier analytic techniques, which we will see in the
second half of this book starting in Chapter 6. In the 1970’s, leading up to Szemerédi’s proof of
his landmark result, Szemerédi developed an important tool known as the graph regularity lemma.
Ruzsa and Szemerédi (1978) used the graph regularity lemma to give a new graph theoretic proof
of Roth’s theorem. One of our first goals is to understand this graph theoretic proof. Along the
way, we will also explore many related topics in graph theory, especially those that share a theme
with central topics in additive combinatorics.
As in the proof of Schur’s theorem, we will formulate a graph theoretic problem whose solution
implies Roth’s theorem. Once again, we will set up a graph whose triangles encode 3-term
arithmetic progressions.
Extremal graph theory, broadly speaking, concerns questions of the form: what is the maxi-
mum (or minimum) possible number of (something) in a graph with certain prescribed properties.
A starting point (historically and also pedagogically) in extremal graph theory is the following
question:
Question 0.2.11. What is the maximum number of edges in a triangle-free n-vertex graph?
This question is relatively easy, and it was answered by Mantel in the early 1900’s (and
subsequently rediscovered and generalized by Turán). It will be the first result that we shall prove
in the next chapter. However, even though this question/result sounds similar to Roth’s theorem, it
cannot be used to deduce Roth’s theorem. Later on, we will construct a graph that corresponds to
Roth’s theorem, and it turns out that the right question to ask is:
Question 0.2.12. What is the maximum number of edges in an n-vertex graph where every edge
is contained in a unique triangle?
This innocent looking question turns out to be incredibly mysterious. We are still far from
knowing the truth. We will later prove, using Szemerédi’s regularity lemma, that any such graph
must have o(n2 ) edges, and we will then deduce Roth’s theorem from this graph theoretic claim.
8 0. APPETIZER
Further reading
The textbook Ramsey Theory by Graham, Rothschild, and Spencer (2013) is a wonderful
introduction to the subject. It has beautiful accounts of theorems of Ramsey, van der Waerden,
Hales–Jewett, Schur, Rado, and others, that form the foundation of Ramsey theory.
For a comprehensive survey of modern developments in additive combinatorics, check out
Green (2009a)’s review of the book Additive Combinatorics by Tao and Vu (2006).
CHAPTER 1
Forbidding a subgraph
In this book, when we say H-free, we always mean free of H as a subgraph. On the other hand,
we would say induced H-free to mean free of H as an induced subgraph. For a clique H = Kr
(e.g., a triangle), being Kr -free is the same as induced Kr -free.
9
10 1. FORBIDDING A SUBGRAPH
In the first part of the chapter, we focus on techniques for upper bounding ex(n, H). In the last
few sections, we turn our attention to lower bounding ex(n, H) when H is a bipartite graph.
K4,4 = .
The graph K bn/2c,dn/2e has bn/2c dn/2e = n2 /4 edges (one can check this equality by separately
considering even and odd n).
Mantel (1907) proved that K bn/2c,dn/2e has the most number of edges among all triangle-free
graphs. This is considered the first result in extremal graph theory.
Theorem 1.1.1 (Mantel’s theorem). Every n-vertex triangle-free graph has at most bn2 /4c edges.
x
N(x)
N(y)
y
Figure 1.1.1. In the first proof of Mantel’s theorem, adjacent vertices have disjoint
neighborhoods.
1.1. FORBIDDING A TRIANGLE 11
On the other hand, note that for each vertex x, the term deg x appears once in the sum for each
edge incident to x, so it appears a total of deg x times. Hence
!2
Õ Õ 1 Õ (2m)2
(deg x + deg y) = d(x)2 ≥ d(x) ≥ .
xy∈E x∈V
n x∈V
n
Comparing the two inequalities, we obtain (2m)2 /n ≤ mn, and hence m ≤ n2 /4. Since m is an
integer, we obtain m ≤ bn2 /4c, as claimed.
Second proof of Mantel’s theorem. Let G = (V, E) be a triangle-free graph. Let v be a vertex of
maximum degree in G. Since G is triangle-free, the neighborhood N(v) of v is an independent set
(see Figure 1.2.2).
A B
(indep set)
as claimed.
Remark 1.1.2 (The equality case in Mantel’s theorem). The second proof above shows that every
2
Í bn /4c edges must be isomorphic to K bn/2c,dn/2e . Indeed,
n-vertex triangle-free graph with exactly
in (1.1.1), the first inequality |E | ≤ x∈B deg x is tight only if B is an independent set, the second
inequality is tight if B is complete to A, and | A| |B| < bn2 /4c unless | A| = |B| (if n is even) or
|| A| − |B|| = 1 (if n is odd). (Exercise: deduce the equality case from the first proof.)
In general, it is a good idea to keep the equality case in mind when following the proofs, or
when coming up with your own proofs, to make sure that the steps are not lossy.
The next several exercises explore extensions of Mantel’s theorem. It is useful to revisit the
proof techniques.
Exercise 1.1.3. Let G be a Kr+1 -free graph. Prove that there is another graph H on the same vertex
set as G such that χ(H) ≤ r and dH (x) ≥ dG (x) for every vertex x (here dH (x) is the degree of x in
H, and likewise with dG (x) for G). Give another proof of Turán’s theorem from this fact.
12 1. FORBIDDING A SUBGRAPH
Exercise 1.1.4 (Many triangles). Show that a graph with n vertices and m edges has at least
n2
4m
m− triangles.
3n 4
Exercise 1.1.5. Prove that every n-vertex non-bipartite triangle-free graph has at most (n−1)2 /4+1
edges.
Exercise 1.1.6 (Stability). Let G be an n-vertex triangle-free graph with at least n2 /4 − k edges.
Prove that G can be made bipartite by removing at most k edges.
Exercise 1.1.7. Show that every n-vertex triangle-free graph with minimum degree greater than
2n/5 is bipartite.
Exercise 1.1.8∗. Prove that every n-vertex graph with at least n2 /4 + 1 edges contains at least
bn/2c triangles.
Exercise 1.1.9∗. Prove that every n-vertex graph with at least n2 /4 + 1 edges contains some edge
in at least (1/6 − o(1))n triangles, and that this constant 1/6 is best possible.
The next exercise can be solved by a neat application of Mantel’s theorem.
Exercise 1.1.10. Let X and Y be independent and identically distributed random vectors in Rd
according to some arbitrary probability distribution. Prove that
1
P(|X + Y | ≥ 1) ≥ P(|X | ≥ 1)2 .
2
1.2. Forbidding a clique
In Mantel’s theorem, what happens if we replace the triangle by K4 , a clique on 4 vertices? Or
a clique of fixed given size?
Question 1.2.1. What is the maximum number of edges in a Kr+1 -free graph on n vertices?
Let us first consider the easiest case of the problem where we restrict to r-partite graphs, which
are automatically Kr+1 -free. This observation will be useful in the general proof later.
Lemma 1.2.2. Among n-vertex r-partite graphs, the graph with the maximum number of edges has
all r vertex parts as equal in size as possible (i.e., n mod r parts have size dn/re and the rest have
size bn/rc).
Proof. If two vertex parts differ in size by more than one, then moving a vertex from a larger part
to a smaller part would strictly increase the number of edges in the graph.
Definition 1.2.3. The Turán graph Tn,r is defined to be the unique n-vertex, r-partite graph, with
part sizes differing by at most 1 (so each part has size bn/rc or dn/re).
Turán (1941) proved the following fundamental result, generalizing Mantel’s theorem (Theo-
rem 1.1.1) from triangles to arbitrary cliques. Turán’s work initiated the direction of extremal graph
theory.
Theorem 1.2.4 (Turán’s theorem). The Turán graph Tn,r maximizes the number of edges among all
n-vertex Kr+1 -free graphs. It is also the unique maximizer.
1.2. FORBIDDING A CLIQUE 13
In other words,
ex(n, Kr+1 ) = e(Tn,r ).
It is not too hard to give precise formula for e(Tn,r ), though there is a small annoying dependence
on the residue class of n mod r. The following bound is usually good enough for most purposes.
Corollary 1.2.5 (Turán’s theorem). The number of edges in an n-vertex Kr+1 -free graph is at most
1 n2
1− .
r 2
2
Exercise 1.2.6. Show that e(Tn,r ) ≤ 1 − 1r n2 .
Note that if n is divisible by r, then (1 − 1/r)n2 /2 is exactly the number of edges in Tn,r . Even
when n is not divisible by r, the difference between e(Tn,r ) and (1 − 1/r)n2 /2 is O(nr). is O(nr).
As we are generally interested in the regime when r is fixed, this difference is a negligible lower
order contribution, i.e.,
2
1 n
ex(n, Kr+1 ) = 1 − − o(1) , for fixed r. (1.2.1)
r 2
We now give three proofs of Turán’s theorem. The first proof extends our second proof of
Mantel’s theorem.
First proof of Turán’s theorem. We prove by induction on r. The case r = 1 is trivial. Now assume
r > 1.
A B
(Kr -free)
Figure 1.2.2. In the first proof of Turán’s theorem, the neighborhood of v is Kr -free.
Let G = (V, E) be a Kr+1 -free graph. Let v be a vertex of maximum degree in G. Since G is
Kr+1 -free, the neighborhood A = N(v) of v is Kr -free, and hence contains at most e(T| A|,r−1 ) edges
by the induction hypothesis.
Let B = V \ A. Since v is a vertex of maximum degree, we have deg x ≤ deg(v) = | A| for all
x ∈ V. So the number of edges with at least one vertex in B is
Õ
≤ deg(y) ≤ | A| |B| .
y∈B
14 1. FORBIDDING A SUBGRAPH
Letting e(A) denote the number of edges within A, and likewise e(B), we have
e(G) = e(A) + e(A, B) + e(B)
r
≤ + (r − 1)(n − r) + ex(n − r, Kr+1 ), (1.2.2)
2
where in the final step we use that the subgraph induced by B is an Kr+1 -free graph on n −r vertices.
Applying the induction hypothesis, we have ex(n − r, Kr+1 ) = e(Tn−r,r ). Thus
r
e(G) ≤ + (r − 1)(n − r) + e(Tn−r,r ).
2
It remains to check that the right-hand side equals to e(Tn,r ). Note that
if we remove one vertex
from each part of Tn,r , we end up with Tn−r,r after removing exactly 2r + (r − 1)(n − r) edges. This
shows that e(G) ≤ e(Tn−r,r ).
To see that Tn,r is the unique maximizer, we check when equality occurs in all steps of the proof.
Suppose G is a maximizer, i.e., has e(Tn,r ) edges. The subgraph induced on B must be Tn−r,r by
induction. To have e(A) = 2r in (1.2.2), A must induce a clique. To have e(A, B) = (r − 1)(n − r),
every vertex of B must be adjacent to all but one vertex in A. Also, two vertices x, y lying in distinct
1.2. FORBIDDING A CLIQUE 15
parts of G[B] Tn−r,r cannot “miss” the same vertex v of A, or else A ∪ {x, y} \ {v} would be an
Kr+1 -clique. This then forces G to be Tn,r .
The third proof uses a method known as Zykov symmetrization. The idea here is that if a
Kr+1 -free graph is a not a Turán graph, then we should be able make some local modifications
(namely replacing a vertex by a clone of another vertex) to get another Kr+1 -free with strictly more
edges.
Third proof of Turán’s theorem. As before, let G be an n-vertex, Kr+1 -free graph with the maximum
possible number of edges.
We claim that if x and y are non-adjacent vertices, then deg x = deg y. Indeed, if deg x > deg y,
say, then we can modify G by replacing y by a “clone” of x (i.e., with the same neighbors as x).
The resulting graph would still be Kr+1 -free (since a clique cannot contain both x and its clone)
and has strictly more edges than G, thereby contradicting the assumption that G has the maximum
possible number of edges.
x y x x0
−→
x x
y −→
z x0 x 00
The last proof we give in this section uses the probabilistic method. This probabilistic proof was
given in the book The probabilistic method by Alon and Spencer, though the key inequality is due
earlier to Caro and Wei. Below, we prove Turán’s theorem in the formulation of Corollary 1.2.5, i.e,
ex(n, Kr+1 ) ≤ (1 − 1/r)n2 /2. A more careful analysis of the proof can yield the stronger statement
of Theorem 1.2.4, which we omit.
Fourth proof of Turán’s theorem. Let G = (V, E) be an n-vertex, Kr+1 -free graph. Consider a
uniform random ordering σ of the vertices. Let
X = {v ∈ V : v is adjacent to all earlier vertices in σ}.
Observe that the set of vertices in X form a clique. Since the permutation was chosen uniformly at
random, we have
1
P(v ∈ X) = P(v appears before all non-neighbors) = .
n − deg v
16 1. FORBIDDING A SUBGRAPH
Determining the Turán density π(H) is equivalent to determining ex(n, H) up to an o(n2 ) additive
error.
Turán’s theorem implies that
1
π(Kr+1 ) = 1 − .
r
In the next couple of sections we will determine the the Turán density for every graph H. We will
see the Erdős–Stone–Simonovits theorem, which will tell us that
1
π(H) = 1 −
χ(H) − 1
where χ(H) is the chromatic number of H.
Here is an equivalent definition of Turán density: π(H) is smallest real number so that for every
> 0 there is some n0 = n0 (H, ) so that for every n ≥ n0 , every n-vertex graph with at least
(π(H) + ) 2n edges contains H as a subgraph.
It turns out there is a general phenomenon in combinatorics where once some density crosses
an existence threshold (e.g., the Turán density is the threshold for H-freeness), it will be possible
to find not just one copy of the desired object, but in fact lots and lots of copies. This principle is
usually called supersaturation. It is a fundamental idea useful for many applications, including in
our upcoming determination of π(H) for general H.
The next statement is an instance of the supersaturation phenomenon for the Turán problem. It
converts an extremal result to a counting result. The proof technique is worth paying attention to,
as it can be used to prove similar results in many settings.
Theorem 1.3.3 (Supersaturation). For every > 0 and graph H there exist some δ > 0 and n0
such that every graph on n ≥ n0 vertices with at least (π(H) + )
n
2 edges contains at least δn
v(H)
copies of H as a subgraph.
Note that δnv(H) has the best possible order of magnitude, since even the complete graph on n
vertices only has OH (nv(H) ) copies of H. The precise dependence of the optimal δ versus is a
difficult problem already when H is a triangle; we will discuss it in Chapter 5.
The idea, sometimes called “subsampling,” is sample a v(H)-vertex subset of G in two stages.
We first sample a random n0 -vertex subset S, where n0 is a large constant so that with good
probability G[S] has enough edge density to guarantee one copy of H. If G[S] indeed has a copy
of H, then we can sample again a v(H)-vertex subset from S, and we would obtain a copy of H
with at least constant probability. The same argument can also be phrased non-probabilistically via
double-counting, though the probabilistic argument, once understood, can be helpful in seeing that
we should expect to obtain the right order of magnitude.
Here is a simple but useful lemma.
Lemma 1.3.4. Let X be a real random variable taking values in [0, 1]. Then P(X ≥ EX − ) ≥ .
18 1. FORBIDDING A SUBGRAPH
Proof. Let µ = EX. By separately considering what happens when X ≤ µ − versus when
X > µ − (in the latter case we bound by X ≤ 1), we have
µ ≤ (µ − )P(X ≤ µ − ) + P(X > µ − ) ≤ µ − + P(X > µ − ).
Thus P(X > µ − ) ≥ .
Proof of supersaturation (Theorem 1.3.3). By the definition of the Turán density, there exists some
constant n0 depending only on H and such that every n0 -vertex graph with at least (π(H)+/2) n20
edges contains H as a subgraph.
Let n ≥ n0 and G be n-vertex graph with at least (π(H) + ) 2n edges. Let S be an n0 -element
subset of V(G), chosen uniformly at random. Let X denote the edge density of G[S]. By averaging,
EX equals to the edge density of G, and so EX ≥ π(H) + . Then by Lemma 1.3.4, with probability
at least /2, X ≥ π(H) + /2, in which case G[S] contains a copy of H by the earlier paragraph.
Let T be a uniformly random v(H)-element subset of the n0 -element set S. Conditioned on the
edge density of G[S] being at least π(H) + /2, the probability that G[T] contains H as a subgraph
n0
is thus at least 1/ v(H) (a constant not depending on n). Thus the unconditional probability of G[T]
containing H as a subgraph is at least /(2 v(H) n0 n
). So there are at least v(H) /(2 v(H)
n0
) copies of
H in G, which implies the claim.
As a corollary, we obtain the following supersaturation version of Turán’s theorem.
every > 0 and positive integer r, there is some δ > 0 such that every n-vertex
Corollary 1.3.5. For
n2
graph with at least 1 − 1
r + 2 edges contains at least δnr+1 copies of Kr+1 .
Proof. Applying supersaturation with π(Kr+1 ) = 1 − 1/r, we deduce the corollary for all n > n0 for
some n0 = n0 (, r). Finally, by decreasing δ (if necessary) to below 1/nr+1
0 , the claim becomes true
for all n, since for n ≤ n0 Turán’s theorem guarantees us at least one copy of Kr+1 .
As mentioned earlier, we will soon determine the Turán density π(H) for every graph H. It
may seem like the Turán problem is essentially understood, but actually this would be very far
from the truth. We will see in the next section that π(H) = 0 for every bipartite graph H, i.e.,
ex(n, H) = o(n2 ), but actual asymptotics for ex(n, H) are often unknown.
In a different direction, the generalization to hypergraphs, while looking deceptively similar,
turns out to be much more difficult, and very little is known here.
Remark 1.3.6 (Hypergraph Turán problem). Generalizing from graphs to hypergraphs, given an
r-uniform hypergraph H, we write ex(n, H) for the maximum number of edges in an n-vertex
r-uniform hypergraph that does not contain H as a subgraph. A straightforward extension of
n
Proposition 1.3.1 gives that ex(n, H)/ r is a non-increasing function of n, for each fixed H. So we
can similarly define the hypergraph Turán density
ex(n, H)
π(H) := lim n .
n→∞
r
The exact value of π(H) is known in very few cases. It is a major open problem to determine π(H)
when H is the complete 3-uniform hypergraph on 4 vertices (also known as a tetrahedron), and
more generally when H is a complete hypergraph.
Exercise 1.3.7 (Density Ramsey). Prove that for every s and r, there is some constant c > 0 so that
for every sufficiently large n, if the edges of Kn are colored using r colors, then at least c fraction
of all copies of Ks are monochromatic.
1.4. FORBIDDING A COMPLETE BIPARTITE GRAPH 19
Exercise 1.3.8 (Density Szemerédi). Let k ≥ 3. Assuming Szemerédi’s theorem for k-term
arithmetic progressions (i.e., every subset of [N] without a k-term arithmetic progression has size
o(N)), prove the following density version of Szemerédi’s theorem:
For every δ > 0 there exist c and N0 (both depending only on k and δ) such that for every
A ⊂ [N] with | A| ≥ δN and N ≥ N0 , the number of k-term arithmetic progressions in A is at least
cN 2 .
Theorem 1.4.2 (Kővári–Sós–Turán “KST” theorem). For positive integers s ≤ t, there exists some
constant C = C(s, t), such that, for all n,
ex(n, Ks,t ) ≤ Cn2−1/s .
The proof proceeds by double counting.
..
X = number of copies of Ks,1 . in G.
(When s = 1 we need to modify the definition of X slightly to count each copy twice.) The strategy
is to count X in two ways. First we count Ks,1 by first embedding the “left” s vertices of Ks,1 . Then
we count Ks,1 by first embedding the “right” single vertex of Ks,1 .
Upper bound on X. Every subset of s vertices in G has at most t − 1 common neighbors since
G is Ks,t -free. Therefore,
n
X≤ (t − 1).
s
Lower bound on X. For each vertex v of G, there are exactly degs v ways to pick s of its
neighbors to form a Ks,1 as a subgraph. Therefore
Õ deg v
X=
s
v∈V(G)
20 1. FORBIDDING A SUBGRAPH
To obtain a lower bound on this quantity in terms of the number of edges m of G, we use a standard
trick by viewing xs as a convex function on the reals, namely, letting
(
x(x − 1) · · · (x − s + 1)/s! if x ≥ s − 1
fs (x) =
0 x < s − 1.
Then f (x) = xs for all nonnegative integers x. Furthermore fs is a convex function. Since the
average degree of G is 2m/n, it follows by convexity that
Õ 2m
X= fs (deg v) ≥ n fs .
n
v∈V(G)
Combining the upper bound and the lower bound. We find that
2m n
n fs ≤X≤ (t − 1).
n s
Since fs (x) = (1 + o(1))x s /s! for x → ∞ and fixed s, we find that, as n → ∞,
s
n 2m ns
≤ (1 + o(1)) (t − 1)
s! n s!
Therefore,
(t − 1)1/s
m≤ + o(1) n2−1/s .
2
The final bound in the proof gives us a somewhat more precise estimate than stated in Theo-
rem 1.4.2. Let us record it here for future reference.
(t − 1)1/s
ex(n, Ks,t ) ≤ + o(1) n2−1/s .
2
It has been long conjectured that the KST theorem is tight up to a constant factor.
Conjecture 1.4.4. For positive integers s ≤ t, there exists a constant c = c(s, t) such that for all
n ≥ 2,
ex(n, Ks,t ) ≥ cn2−1/s .
In the final sections of this chapter, we will produce some constructions showing that Conjec-
ture 1.4.4 is true for K2,t and K3,t . We also know that the conjecture is true if t is much larger than
s, namely when t > (s − 1)!. The first open case of the conjecture is K4,4 , although there is dividing
opinion among researchers whether they think the conjecture should be true in this case.
Since every bipartite graph H is a subgraph of some Ks,t , and every H-free graph must be
Ks,t -free, we obtain the following corollary of the KST theorem.
Corollary 1.4.5. For every bipartite graph H, there exists some constant c > 0 so that ex(n, H) =
O(n2−c ).
1.4. FORBIDDING A COMPLETE BIPARTITE GRAPH 21
We give a geometric application of the KST theorem. The following famous problem was posed
by Erdős (1946).
Question 1.4.6 (Unit distance problem). What is the maximum number of unit distances formed
by a set of n points in R2 ?
In other words, given n distinct points in the plane, at most how many pairs of these points can
be exactly distance 1 apart. We can draw a graph with these n points as vertices, with edges joining
points exactly unit distance apart.
To get a feeling for the problem, let us play with some constructions. For small values of n, it
is not hard to check by hand that the following configurations are optimal.
n= 3 4 5 6 7
What about for larger values of n? If we line up the n points equally spaced on a line, we get
n − 1 unit distances.
···
We can be a bit more efficient in by chaining up triangles. The following construction gives us
2n − 3 unit distances.
···
The construction for n = 6 looks like it was obtained by copying and translating a unit triangle. We
can generalize this idea to obtain a recursive construction. Let f (n) denote the maximum number
of unit distances formed by n points in the plane. Given a configuration P with bn/2c points that
has f (bn/2c) unit distances, we can copy P and translate it by a generic unit vector to get P0. The
configuration P ∪ P0 has at least 2 f (bn/2c) + bn/2c unit distances. We can solve the recursion to
get f (n) ≥ cn log n for some constant c.
P P0
1
Now
√ we √take a different approach to obtain an even better construction. Take a square grid with
b nc × b nc vertices. Instead of choosing√ the distance between adjacent points as the unit distance,
we can scale the configuration so that r becomes the “unit” distance for some integer r. As an
illustration, here is an example of a 5 × 5 grid with r = 10.
It turns out that by choosing the optimal r as a function of n, we can get at least
n1+c/log log n
22 1. FORBIDDING A SUBGRAPH
unit distances, where c > 0 is some absolute constant. The proof uses analytic number theory,
which we omit as it would take us too far afield. The basic idea is to choose r to be a product of
many distinct primes that are congruent to 1 modulo 4, so that r can be represented as a sum of two
squares in many different ways, and then estimate the number of such ways.
It is conjectured that the last construction above is close to optimal.
Conjecture 1.4.7. Every set of n points in R2 has at most n1+o(1) unit distances.
The KST theorem can be used to prove the following upper bound on the number of unit
distances.
p q
Figure 1.4.1. Two vertices p, q can have at most two common neighbors in the unit
distance graph.
Proof. The unit distance graph is K2,3 -free, for every pair of distinct points, there are at most two
other points that are at unit distance from both points (see Figure Figure 1.4.1. So the number of
edges is at most ex(n, K2,3 ) = O(n3/2 ) by Theorem 1.4.2.
Later in ?? we will use the crossing number inequality to prove a better bound of O(n4/3 ), which
is the best known upper bound to date.
Erdős (1946) also asked the following related question.
Question 1.4.9 (Distinct distance problem). What is the minimum number of distinct distances
formed by n points in R2 ?
Let g(n) denote the answer. The asymptotically best construction for the minimum number of
distinct distances is also a square grid, samepas earlier. It can be shown that a square grid with
√ √
b nc × b nc points has on the order of n/ log n distinct distances. This is conjectured to be
p
optimal, i.e., g(n) . n/ log n.
Let f (n) denote the maximum number of unit distances among n points in the answer. We have
f (n)g(n) ≥ 2n , since each distance occurs at most f (n) times. So an upper bound on f (n) gives a
lower bound on g(n) (but not conversely).
A breakthrough on the distinct distances problem was obtained by Guth and Katz (2015), show-
ing that g(n) & n/log n distinct distances for some constant c. Their proof is quite sophisticated. It
uses tools ranging from the polynomial method to algebraic geometry.
Exercise 1.4.10 (Density KST). Prove that for every pair of positive integers s ≤ t, there are
constants C, c > 0 such that every n-vertex graph with p n
2 edges contains at least cpst ns+t copies
of Ks,t , provided that p ≥ Cn−1/s .
The next exercise asks you to think about the quantitative dependencies in the proof of the KST
theorem.
1.5. FORBIDDING A GENERAL SUBGRAPH 23
Exercise 1.4.11. Show that, for every > 0, there exists δ > 0 such that every graph with n
vertices and at least n2 edges contains a copy of Ks,t where s ≥ δ log n and t ≥ n0.99 .
The next exercise shows a bad definition of density of a subset of Z2 (it always ends up being
either 0 or 1).
Exercise 1.4.12. Let S ⊂ Z2 . Define
|S ∩ (A × B)|
dk (S) = max .
A,B⊂Z | A||B|
| A|=|B|=k
Example 1.5.3. When H is the Petersen graph, below, which has chromatic number 3, Theo-
rem 1.5.1 tells us that ex(n, H) = (1/4 + o(1))n2 . The Turán density of the Petersen graph is the
same as that of a triangle, which may be somewhat surprising since the Petersen graph seems more
complicated than the triangle.
2
1
1 1
3 2
3 2
2 3
In the rest of this section, we prove the Erdős–Stone–Simonovits theorem. The proof given
here is due to Erdős (1971).
Note that when χ(H) = 2, i.e., H is bipartite, the Erdős–Stone–Simonovits theorem follows
from the KST theorem. We begin by proving an extension of the KST theorem to hypergraphs,
using the same double-counting techniques from the proof of the KST theorem.
Recall the hypergraph Turán problem (Remark 1.3.6). Given an r-uniform hypergraph H (also
known as an r-graph), we write ex(n, H) to be the maximum number of edges in an H-free r-graph.
The analog of a complete bipartite graph for a 3-graph is a complete tripartite 3-graph Ks(3) 1,s2,s3
,
where one has three sets of vertices S1 , S2 , S3 of sizes s1 , s2 , s3 , respectively, and every triple in
S1 × S2 × S3 is an edge. More generally, we write Ks(r) 1,...,sr
for a complete r-partite r-graph.
To help keep notation simple, we first consider what happens for 3-uniform hypergraphs.
Theorem 1.5.4 (KST for 3-graphs). For every s, there is some C such that
(3) 2
ex(n, Ks,s,s ) ≤ Cn3−1/s .
Recall that the KST theorem (Theorem 1.4.2) was proved by counting the number of copies of
Ks,1 in the graph in two different ways. For 3-graphs, we instead count the number of copies of
(3)
Ks,1,1 in two different ways, one of which uses the KST theorem for Ks,s -free graphs.
(3)
Proof. Let G be a Ks,s,s -free 3-graph with n vertices and m edges. Let X denote the number of
(3)
copies of Ks,1,1 in G (when s = 1, we count each copy three times).
Upper bound on X. Given a set S of s vertices, consider the set T of all unordered pairs of
(3)
distinct vertices that would form a Ks,1,1 with S (with S in one part, and the two new vertices each
in its own part). Note that T is the edge-set of a graph on the same n vertices. If T contains a
(3)
Ks,s , then together with S we would have a Ks,s,s . Thus T is Ks,s -free, and hence by Theorem 1.4.2,
|T | = Os (n 2−1/s ). Hence
n 2−1/s
X .s n .s ns+2−1/s .
s
Lower bound on X. We write deg(u, v) for the number of edges containing both u and v. Then,
summing over all unordered pairs of distinct vertices u, v in G, we have
Õ deg(u, v)
X= .
u,v
s
1.5. FORBIDDING A GENERAL SUBGRAPH 25
3m/ 2n .
!
Õ n 3m
X= fs (deg(u, v)) ≥ fs n .
u,v
2 2
We can iterate further, using the same technique, to prove an analogous result for every unifor-
mity.
Theorem 1.5.6 (Hypergraph KST). For every r ≥ 2 and s ≥ 1, there is some C such that
(r) −r+1
ex(n, Ks,...,s ) ≤ Cnr−s ,
(r)
where Ks,...,s is the r-partite r-graph with s vertices in each of the r parts.
Proof. We prove by induction on r. The cases r = 2 and r = 3 were covered previously in
Theorem 1.4.2 and Theorem 1.5.4. Assume that r ≥ 3 and that the theorem has already been
established for smaller values of r. (Actually we could have started at r = 1 if we adjust the
definitions appropriately.)
(r)
Let G be a Ks,...,s -free r-graph with n vertices and m edges. Let X denote the number of copies
(r)
of Ks,1,...,1 in G (when s = 1, we count each copy r times).
Upper bound on X. Given a set S of s vertices, consider the set T of all unordered (r − 1)-tuples
(r)
of vertices that would form a Ks,1,...,1 with S (with S in one part, and the r − 1 new vertices each in
its own part). Note that T is the edge-set of an (r − 1) graph on the same n vertices. If T contains
(r−1) (r) (r−1)
a Ks,...,s , then together with S we would have a Ks,...,s . Thus T is Ks,...,s -free, and by the induction
−r+2
hypothesis, |T | = Or,s (n r−1−s ). Hence
n r−1−s−r+2 −r+2
X .r,s n .r,s nr+s−1−s .
s
Lower bound on X. Given a set U of vertices, we write deg(U) for the number of edges
containing all vertices in U. Then
Õ deg(U)
X=
(G)
s
U∈(Vr−1 )
26 1. FORBIDDING A SUBGRAPH
Let fs (x) be defined as in the previous proof. Since the average of deg U over all (r − 1)-element
n
subsets U is rm/ r−1 , we have
!
Õ n rm
X= fs (deg(U)) ≥ fs n .
V (G)
r −1 r−1
U∈( r−1 )
ex(n, Ks(r)
1,...,sr
) ≤ Cnr−1/(s1 ···sr−1 ) .
Now we are ready to prove the Erdős–Stone–Simonovits theorem. It suffices to establish the
result for complete (r + 1)-partite graphs H, since every H with χ(H) is a subgraph of some
complete (r + 1)-partite graph. This result is due to Erdős and Stone (1946).
Theorem 1.5.8 (Erdős–Stone theorem). Fix r ≥ 1 and s ≥ 1. Let H = Ks,...,s be the complete
(r + 1)-partite graph with s vertices in each part. Then
2
1 n
ex(n, H) = 1 − + o(1) .
r 2
In other words, using the notation Kr+1 [s] for s-blow-up of Kr+1 , obtained by replacing each
vertex of Kr+1 by s duplicates of itself (so that Kr+1 [s] = H in the above theorem statement), the
Erdős–Stone theorem says that
1
π(Kr+1 [s]) = π(Kr+1 ) = 1 − ,
r
As earlier, the lower bound on ex(n, H) in the theorem comes from noting that the r-partite
Turán graph Tn,r is H-free.
The proof of the Erdős–Stone theorem will combine the hypergraph KST theorem with the
supersaturation theorem from Section 1.3. Recall the supersaturation result for subgraphs. Roughly,
it says that if the edge density of G significantly exceeds the Turán density of H, then G must have
many copies of H. The precise statement of Theorem 1.3.3 is copied below.
For every > 0 and graph H there exist some δ > 0 and n0 such that every graph
on n ≥ n0 vertices with at least (π(H) + ) 2 edges contains at least δnv(H) copies
n
of H as a subgraph.
We can rephrase this theorem equivalently as follows:
Fix H. Every n-vertex graph with o(nv(H) ) copies of H has edge density at most
π(H) + o(1).
(Above, the o(·) hypothesis should be interpreted as being applied to a sequence of graphs rather
than a single graph.) We will apply this supersaturation result for H = Kr+1 , combined with Turán’s
theorem, which tells us that π(Kr+1 ) = 1 − 1/r.
1.6. FORBIDDING CYCLES 27
Proof of Theorem 1.5.8. Let G be an H-free graph. Consider the (r + 1)-graph F with the same
(r+1)
vertices as G, and whose edges are (r + 1)-cliques in G. Note that F is Ks,...,s -free, or else a copy
(r+1)
of Ks,...,s in F would be supported by a copy of H in G. Thus, by Theorem 1.5.6, F has o(nr+1 )
edges. So G has o(nr+1 ) copies of Kr+1 , and thus by the supersaturation theorem quoted above, the
edge density of G is at most π(Kr+1 ) + o(1), which equals 1 − 1/r + o(1) by Turán’s theorem.
The proof technique illustrates another supersaturation principle: once above the Turán density
threshold π(H), not only can you find one copy of H, but you can actually find a large blow-up of
H. The following exercise illustrates this principle of hypergraphs.
Exercise 1.5.9 (Erdős–Stone for hypergraphs). Let H be an r-graph. Show that π(H[s]) = π(H),
where H[s], the s-blow-up of H, is obtained by replacing every vertex of H by s duplicates of itself.
In Section 2.6, we will give another proof of the Erdős–Stone–Simonovits theorem using the
graph regularity method.
Theorem 1.6.1 (Odd cycles). Let k be a positive integer. Then for all sufficiently large integer n
(i.e., n ≥ n0 (k) for some n0 (k)), one has
n2
ex(n, C2k+1 ) = .
4
We will omit the proof of this theorem.
Let us now turn to forbidding even cycles. Since C2k is bipartite, we know from the KST theorem
that ex(n, C2k ) = o(n2 ). The following upper bound was determined by Bondy and Simonovits
(1974).
Theorem 1.6.2 (Even cycles). For every k ≥ 2, there exists a constant C so that
Theorem 1.6.4. For any integer k ≥ 2, there exists a constant C so that every graph G with n
vertices and at least Cn1+1/k edges contains an even cycle of length at most 2k.
In other words, Theorem 1.6.4 says that
ex(n, {C2, C4, C6, . . . , C2k }) = O k (n1+1/k ).
Here, given a set F of graphs, ex(n, F ) denotes the maximum number of edges in an n-vertex graph
that does not contain any graph in F as a subgraph.
To prove this theorem, we first clean up the graph by removing some edges and vertices to get
a bipartite subgraph with large minimum degree.
Lemma 1.6.5. Every G has a bipartite subgraph with at least e(G)/2 edges.
Proof. Color every vertex with one of two colors uniformly at random. Then the expected number
of non-monochromatic edges is e(G)/2. Hence there exists a coloring that has at least e(G)/2
non-monochromatic edges, and these edges form the desired bipartite subgraph.
Lemma 1.6.6. Let t ∈ R. Every graph with average degree 2t has a subgraph with minimum degree
greater than t.
Proof. Let G be a graph with average degree 2t. Removing a vertex of degree at most t cannot
decrease the average degree, since the total degree goes down by at most 2t and so the post-deletion
graph has average degree at least (2e(G) − 2t)/(v(G) − 1), which is at least 2e(G)/v(G) since
2e(G)/v(G) ≥ 2t. Let us repeatedly delete vertices of degree at most t in the remaining graph,
until every vertex has degree more than t. This algorithm must terminate with a non-empty graph
since every graph with at most 2t vertices has average degree less than 2t.
···
u
A0
A1 ···
A2
A3 Ak
Figure 1.6.1. Exploration from a vertex u in a C2k -free graph in the proof of
Theorem 1.6.4.
Proof of Theorem 1.6.4. Suppose G contains no even cycles of length at most 2k. Applying
Lemma 1.6.5 followed by Lemma 1.6.6, we find a bipartite subgraph G0 of G with minimum degree
greater than e(G)/(2v(G)) =: t. Let u be an arbitrary vertex of G0. For each i = 0, 1, . . . , k, let Ak
denote the set of vertices at distance exactly i from u (see Section 1.6). For each i = 1, . . . , k − 1,
every vertex of Ai has
• no neighbors inside Ai (or else G0 would not be bipartite),
• exactly one neighbor in Ai−1 (else we can backtrace through two neighbors which must converge
at some point to form an even cycle of length at most 2k),
• and thus greater than t − 1 neighbors in Ai+1 (by the minimum degree assumption on G0).
1.7. FORBIDDING A SPARSE BIPARTITE GRAPH (AND DEPENDENT RANDOM CHOICE) 29
Theorem 1.7.1. Let H be a bipartite graph with vertex bipartition A ∪ B such that every vertex in
A has degree at most r. Then there exists a constant C = CH such that
ex(n, H) ≤ Cn2−1/r .
Remark 1.7.2. The exponent 2 − 1/r is best possible as a function of r. Indeed, we will see in
the following section that for every r there exists some s so that ex(n, Kr,s ) ≥ cn2−1/r for some
c = c(r, s) > 0.
On the other hand, for specific graphs G, Theorem 1.7.1 may not be tight, e.g., ex(n, C6 ) =
Θ(n4/3 ), whereas Theorem 1.7.1 only tells us that ex(n, C6 ) = O(n3/2 ).
Given a graph G with many edges, the goal is to find a large subset U of vertices such that every
r-vertex subset of U has many common neighbors in G (even the case r = 2 is interesting). We can
then embed the B-vertices of H into U, and then extend the embedding to the whole H. The tricky
part is to find such a U.
Here is some intuition.
We want to host a party so that each pair of party-goers has many common friends. Whom
should we invite? Inviting people uniformly at random is not a good idea (why?). Perhaps we can
pick some random individual (Alice) to host a party inviting all her friends. Alice’s friends are
expected to share some common friends—at least they all know Alice.
We can take a step further, and pick a few people at random (Alice, Bob, Carol, David) and have
them host a party and invite all their common friends. This will likely be an even more sociable
crowd. At least all the party goers will know all the hosts, and likely even more. As long as the
social network is not too sparse, there should be lots of invitees.
Some invitees (e.g., Zack) might be feel a bit out of place at the party—maybe they don’t have
many common friends with other party-goers (they all know the hosts but maybe Zack doesn’t
know many others). To prevent such awkwardness, the hosts will cancel Zack’s invitation. There
shouldn’t be too many people like Zack. The party must go on.
Here is the technical statement that we will prove. While there are many parameters, the specific
details are less important compared to the proof technique. This is quite a tricky proof.
30 1. FORBIDDING A SUBGRAPH
Theorem 1.7.3 (Dependent random choice). Let n, r, m, t be positive integers and α > 0. Then
every graph G with n vertices and at least αn2 /2 edges contains a vertex subset U with
t n m t
|U| ≥ nα −
r n
such that every r-element subset S of U has more than m common neighbors in G.
In the theorem statement, t is an auxiliary parameter that does not appear in the conclusion.
While one can optimize for t, it is instructive and convenient to leave it as is. The theorem is
generally applied to graphs with at least n1−c edges, for some small c > 0, and we can play with
the parameters to get |U| and m both large as desired.
Proof. We say that an r-element subset of V(G) is “bad” if it has at most m common neighbors in
G.
Let u1, . . . , ut be vertices chosen uniformly and independently at random from V(G), and let
A be their common neighborhood. (Keep in mind that u1, . . . , ut, A are random. It may be a bit
confusing in this proof what is random and what is not.)
u1 u2 . . . u t
Each fixed vertex v ∈ V(G) has probability (deg(v)/n)t of being adjacent to all of u1, . . . , ut , and so
by linearity of expectations and convexity,
!t
Õ Õ deg(v) t 1 Õ deg(v)
E | A| = P(v ∈ A) = ≥n ≥ nαt .
n n v∈V n
v∈V(G) v∈V(G)
Now we are ready to show Theorem 1.7.1, which recall says that for a bipartite graph H with
vertex bipartition A ∪ B such that every vertex in A has degree at most r, one has ex(n, H) =
OH (n2−1/r ).
1
Proof of Theorem 1.7.1. Let G be a graph with n vertices and at least Cn2− r edges. By choosing C
large enough (depending only on | A| + |B|), we have
r n | A| + |B| r
− r1
n 2Cn − ≥ |B|.
r n
We want to show that G contains H as a subgraph. By dependent random choice (Theorem 1.7.3),
we can embed the B-vertices of H into G so that every r-vertex subset of B (now viewed as a subset
of V(G)) has > | A| + |B| common neighbors.
nibinalbannabs
j be
b
A B j bibs t
Next, we embed the vertices of A one at a time. Suppose we need to embed v ∈ A (some previous
vertices of A may have already been embedded at this point). Note that v has at ≤ r neighbors in
B, and these ≤ r vertices in B have > | A| + |B| common neighbors in G. While some of these
common neighbors may have already been used up in earlier steps to embed vertices of H, there
are enough of them that they cannot all be used up, and thus we can embed v to some remaining
common neighbor. This process ends with an embedding of H into G.
Exercise 1.7.4. Let H be a bipartite graph with vertex bipartition A ∪ B, such that r vertices in A
are complete to B, and all remaining vertices in A have degree at most r. Prove that there is some
constant C = CH such that ex(n, H) ≤ Cn2−1/r for all n.
Exercise 1.7.5. Let > 0. Show that, for sufficiently large n, every K4 -free graph with n vertices
and at least n2 edges contains an independent set of size at least n1− .
However, bounds arising from this method are usually not tight.
Algebraic constructions. The idea is to use (algebraic) geometry (over a finite field) to construct
a graph. Its vertices correspond to geometric objects such as points or lines. Its edges corresponds
to incidences or other algebraic relations. These constructions sometimes give tight bounds. They
work for a small number of graphs H, and usually require a different ad hoc idea for each H. They
work rarely, but they when do, they can appear quite mysterious, or even magical. Many important
tight lower bounds on bipartite extremal numbers arise this way. In particular it will be shown that
ex(n, Ks,t ) = Ωs,t n2−1/s whenever t ≥ (s − 1)! + 1,
thereby matching the KST theorem for such s, t. Also, it will be shown that
ex(n, C2k ) = Ω k n1+1/k whenever k ∈ {2, 3, 5},
Theorem 1.9.1. Let H be a graph with at least two edges. Then there exists a constant c = cH > 0,
v(H)−2
so that for all n ≥ 2, there exists an H-free graph on n vertices with at least cn2− e(H)−1 edges. In
other words,
v(H)−2
ex(n, H) ≥ cn2− e(H)−1 .
Proof. Let G be an instance of the Erdős–Rényi random graph G(n, p), with p = (1/4) n^{−(v(H)−2)/(e(H)−1)} (chosen with hindsight). Let X denote the number of copies of H in G. Then our choice of p ensures that
EX ≤ p^{e(H)} n^{v(H)} ≤ (p/2) \binom{n}{2} = (1/2) E e(G).
Thus
E[e(G) − X] ≥ (p/2) \binom{n}{2} ≳ n^{2−(v(H)−2)/(e(H)−1)}.
Take a graph G such that e(G) − X is at least its expectation. Remove one edge from each copy of H in G, and we get an H-free graph with at least e(G) − X ≳ n^{2−(v(H)−2)/(e(H)−1)} edges.
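As a quick sanity check of the exponent, take H = K_3 (so v(H) = 3 and e(H) = 3): Theorem 1.9.1 produces an n-vertex triangle-free graph with at least cn^{2−1/2} = cn^{3/2} edges. This is indeed far from optimal (a complete bipartite graph already shows ex(n, K_3) ≥ ⌊n^2/4⌋), illustrating that the random deletion construction is generally not tight.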
For some graphs H, we can bootstrap Theorem 1.9.1 to give an even better lower bound. For example, suppose H is a graph with v(H) = 10 and e(H) = 20 that contains K_{4,4} as a subgraph (a specific such graph is drawn in the original figure). Applying Theorem 1.9.1 directly gives
ex(n, H) ≳ n^{2−8/19}.
On the other hand, any K_{4,4}-free graph is automatically H-free. Applying Theorem 1.9.1 to K_{4,4} (8 vertices, 16 edges) actually gives a better lower bound (since 2 − 6/15 > 2 − 8/19):
ex(n, H) ≥ ex(n, K_{4,4}) ≳ n^{2−6/15}.
In general, given H, we should apply Theorem 1.9.1 to the subgraph H′ of H with the maximum ratio (e(H′) − 1)/(v(H′) − 2). This gives the following corollary, which sometimes gives a better lower bound than directly applying Theorem 1.9.1.
Definition 1.9.2. The 2-density of a graph H is defined by
m_2(H) := max_{H′⊆H, e(H′)≥2} (e(H′) − 1)/(v(H′) − 2).
Corollary 1.9.3. For any graph H with at least two edges, there exists a constant c = c_H > 0 such that
ex(n, H) ≥ cn^{2−1/m_2(H)}.
Proof. Let H′ be a subgraph of H with m_2(H) = (e(H′) − 1)/(v(H′) − 2). Then ex(n, H) ≥ ex(n, H′), and we can apply Theorem 1.9.1 to H′ to get ex(n, H) ≥ cn^{2−1/m_2(H)}.
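For instance, for the 10-vertex, 20-edge graph H above, the subgraph H′ = K_{4,4} already shows m_2(H) ≥ (16 − 1)/(8 − 2) = 15/6, so Corollary 1.9.3 gives at least the bootstrapped bound ex(n, H) ≳ n^{2−6/15}.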
Example 1.9.4. Theorem 1.9.1 combined with the upper bound from the KST theorem (Theorem 1.4.2) gives that for every fixed 2 ≤ s ≤ t,
n^{2−(s+t−2)/(st−1)} ≲ ex(n, K_{s,t}) ≲ n^{2−1/s}.
When t is large compared to s, the exponents in the two bounds above are close to each other (but never equal). When t = s, the above bounds specialize to
n^{2−2/(s+1)} ≲ ex(n, K_{s,s}) ≲ n^{2−1/s}.
In particular, for s = 2,
n^{4/3} ≲ ex(n, K_{2,2}) ≲ n^{3/2}.
It turns out that the upper bound is tight. We will show this in the next section using an algebraic
construction.
Exercise 1.9.5. Find a graph H with χ(H) = 3 and ex(n, H) > n^2/4 + n^{1.99} for all sufficiently large n.
Theorem 1.10.3 (Large gaps between primes). The largest prime below N has size N − o(N).
Remark 1.10.4. The best quantitative result of this form to date, due to Baker, Harman, and Pintz
(2001), says that there exists a prime in [N − N^{0.525}, N] for all sufficiently large N. Cramér's conjecture, which is wide open and based on a random model of the primes, speculates that the o(N) in Theorem 1.10.3 may be replaced by O((log N)^2).
To get a better constant in the above construction, we optimize somewhat by using the same
vertices to represent both points and lines. This pairing of points and lines is known as polarity in
projective geometry.
Proof of Theorem 1.10.1. Let p denote the largest prime such that p^2 − 1 ≤ n. Then p = (1 − o(1))√n by Theorem 1.10.3. Let G be the graph with vertex set V(G) = F_p^2 \ {(0, 0)} and an edge between (x, y) and (a, b) if and only if ax + by = 1 in F_p.
Any two distinct vertices (a, b) and (a′, b′) in V(G) have at most one common neighbor, since there is at most one solution (x, y) to the system ax + by = 1 and a′x + b′y = 1. Therefore, G is K_{2,2}-free.
For every (a, b) ∈ V(G), there are exactly p vertices (x, y) satisfying ax + by = 1. However, one of those vertices could be (a, b) itself. So every vertex in G has degree p or p − 1. Hence G has at least (p^2 − 1)(p − 1)/2 = (1/2 − o(1)) n^{3/2} edges.
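As a computational illustration (not part of the text), the following sketch builds this graph for a small prime and verifies the key properties by brute force; the choice p = 7 is arbitrary.

```python
# Illustrative sketch: the polarity graph over F_p for a small prime p.
from itertools import combinations

p = 7  # a small prime, for illustration only
V = [(a, b) for a in range(p) for b in range(p) if (a, b) != (0, 0)]

def adjacent(u, v):
    # (a, b) ~ (x, y) if and only if ax + by = 1 in F_p
    (a, b), (x, y) = u, v
    return (a * x + b * y) % p == 1

edges = [(u, v) for u, v in combinations(V, 2) if adjacent(u, v)]
print(len(V), len(edges))  # p^2 - 1 vertices; every vertex has degree p or p - 1

# K_{2,2}-freeness: any two distinct vertices have at most one common neighbor
for u, v in combinations(V, 2):
    common = [w for w in V if w not in (u, v) and adjacent(u, w) and adjacent(w, v)]
    assert len(common) <= 1
```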
Next, we construct K3,3 -free graphs with the number of edges matching the KST theorem. This
is due to Brown (1966).
We wish to upper bound the number of common neighbors of a set of s vertices. This amounts to showing that a certain system of algebraic equations cannot have too many solutions. We quote without proof the following key algebraic result of Kollár, Rónyai, and Szabó (1996), which can be proved using algebraic geometry.
Theorem 1.10.11. Let F be any field and a_{ij}, b_i ∈ F such that a_{ij} ≠ a_{i′j} for all i ≠ i′. Then the system of equations
(x_1 − a_{11})(x_2 − a_{12}) · · · (x_s − a_{1s}) = b_1
(x_1 − a_{21})(x_2 − a_{22}) · · · (x_s − a_{2s}) = b_2
⋮
(x_1 − a_{s1})(x_2 − a_{s2}) · · · (x_s − a_{ss}) = b_s
has at most s! solutions (x_1, . . . , x_s) ∈ F^s.
Remark 1.10.12. Consider the special case when all the b_i are 0. In this case, since the a_{ij} within each fixed column j are distinct, every solution to the system corresponds to a permutation π : [s] → [s], obtained by setting x_{π(i)} = a_{iπ(i)} for each i. So there are exactly s! solutions in this special case. The difficult part of the theorem says that the number of solutions cannot increase if we move b away from the origin.
Proof of Proposition 1.10.10. Consider distinct y_1, y_2, . . . , y_s ∈ F_{p^s}. We wish to bound the number of common neighbors x. Recall that in a field of characteristic p, we have the identity (x + y)^p = x^p + y^p for all x, y. So
1 = N(x + y_i) = (x + y_i)(x + y_i)^p · · · (x + y_i)^{p^{s−1}}
= (x + y_i)(x^p + y_i^p) · · · (x^{p^{s−1}} + y_i^{p^{s−1}})
for all 1 ≤ i ≤ s. By Theorem 1.10.11, these s equations (as i ranges over [s]) have at most s! solutions in x. Note that the hypothesis of Theorem 1.10.11 is satisfied since y_i^{p^k} = y_j^{p^k} if and only if y_i = y_j in F_{p^s}.
Now we modify the norm graph construction to forbid K_{s,(s−1)!+1}, thereby yielding Theorem 1.10.8.
Let ProjNormGraph_{p,s} be the graph with vertex set F_{p^{s−1}} × F_p^×, where two vertices (X, x), (Y, y) ∈ F_{p^{s−1}} × F_p^× are adjacent if and only if
N(X + Y) = xy.
Every vertex (X, x) has degree p^{s−1} − 1, since its neighbors are (Y, N(X + Y)/x) for all Y ≠ −X. So ProjNormGraph_{p,s} has (p^{s−1} − 1)p^{s−1}(p − 1)/2 edges. As earlier, it remains to show that this graph is K_{s,(s−1)!+1}-free. Once we know this, by taking p to be the largest prime satisfying p^{s−1}(p − 1) ≤ n, we obtain the desired lower bound
ex(n, K_{s,(s−1)!+1}) ≥ (p^{s−1} − 1)p^{s−1}(p − 1)/2 ≥ (1/2 − o(1)) n^{2−1/s}.
Proof. Fix distinct (Y_1, y_1), . . . , (Y_s, y_s) ∈ F_{p^{s−1}} × F_p^×. We wish to show that there are at most (s − 1)! solutions (X, x) ∈ F_{p^{s−1}} × F_p^× to the system of equations
N(X + Y_i) = xy_i, i = 1, . . . , s.
Assume this system has at least one solution. Then if Y_i = Y_j with i ≠ j, we must have y_i = y_j. Therefore all the Y_i are distinct. For each i < s, dividing N(X + Y_i) = xy_i by N(X + Y_s) = xy_s gives
N((X + Y_i)/(X + Y_s)) = y_i/y_s, i = 1, . . . , s − 1.
Theorem 1.10.14. Let k ∈ {2, 3, 5}. Then there is a constant c > 0 such that for every n,
ex(n, C_{2k}) ≥ cn^{1+1/k}.
The following construction generalizes the point–line incidence graph construction used earlier for the C_4-free graph in Theorem 1.10.1. Here we consider a special set of lines in F_q^k, whereas previously we took all lines in F_q^2.
Construction 1.10.15. Let q be a prime power. Let L denote the set of all lines in F_q^k whose direction can be written as (1, t, . . . , t^{k−1}) for some t ∈ F_q. Let G_{q,k} denote the bipartite point–line incidence graph with vertex sets F_q^k and L, i.e., (p, ℓ) ∈ F_q^k × L is an edge if and only if p ∈ ℓ.
We have |L| = q^k, since to specify a line in L we can provide the point on it with first coordinate equal to zero, along with a choice of t ∈ F_q giving the direction of the line. So the graph G_{q,k} has n = 2q^k vertices. Since each line contains exactly q points, there are exactly q^{k+1} ≍ n^{1+1/k} edges in the graph. It remains to show that this graph is C_{2k}-free whenever k ∈ {2, 3, 5}. Then Theorem 1.10.14 would follow after the usual trick of taking q to be the largest prime with 2q^k < n.
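Here is a minimal computational sketch (an illustration, not part of the text) that builds G_{q,k} for q = 5 and k = 2 and checks |L| = q^k, that every line contains q points, and C_4-freeness in this smallest case.

```python
# Illustrative sketch of Construction 1.10.15 with q = 5, k = 2.
from itertools import combinations, product

q, k = 5, 2
points = list(product(range(q), repeat=k))

def line(p0, t):
    # the line through p0 with direction (1, t, ..., t^{k-1}), as a set of points
    d = [pow(t, i, q) for i in range(k)]
    return frozenset(tuple((p0[i] + s * d[i]) % q for i in range(k)) for s in range(q))

L = {line(p0, t) for p0 in points for t in range(q)}
assert len(L) == q ** k                  # |L| = q^k
assert all(len(l) == q for l in L)       # each line has exactly q points

# C_4-freeness of the incidence graph: two distinct points lie on at most one common line
for p1, p2 in combinations(points, 2):
    assert sum(1 for l in L if p1 in l and p2 in l) <= 1
```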
Lemma 1.10.16. For k ∈ {2, 3, 5}, the graph G_{q,k} from Construction 1.10.15 is C_{2k}-free.
Proof. A 2k-cycle in G_{q,k} would correspond to p_1, ℓ_1, . . . , p_k, ℓ_k with distinct p_1, . . . , p_k ∈ F_q^k and distinct ℓ_1, . . . , ℓ_k ∈ L, and p_i, p_{i+1} ∈ ℓ_i for all i (indices taken mod k). Let (1, t_i, . . . , t_i^{k−1}) denote the direction of ℓ_i.
Then
p_{i+1} − p_i = a_i (1, t_i, . . . , t_i^{k−1})
for some a_i ∈ F_q \ {0}. Thus (recall that p_{k+1} = p_1)
Σ_{i=1}^{k} a_i (1, t_i, . . . , t_i^{k−1}) = Σ_{i=1}^{k} (p_{i+1} − p_i) = 0.    (1.10.2)
where the coefficients a_{i_1,...,i_s, j_1,...,j_s} ∈ F_q are chosen subject to a_{i_1,...,i_s, j_1,...,j_s} = a_{j_1,...,j_s, i_1,...,i_s} (so that f is symmetric), but otherwise independently and uniformly at random.
Let G be the graph with vertex set F_q^s, with distinct x, y ∈ F_q^s adjacent if and only if f(x, y) = 0. Then G is a random graph. The next two lemmas show that G behaves in some ways like a random graph with edges independently appearing with probability 1/q. Indeed, the first of them shows that every pair of distinct vertices u, v forms an edge with probability exactly
P[f(u, v) = 0] = 1/q.
Proof. Note that resampling the constant term of f does not change its distribution. Thus, f (u, v)
is uniformly distributed in Fq for a fixed (u, v). Hence f (u, v) takes each value with probability
1/q.
More generally, we show below that the expected occurrence of small subgraphs mirrors that of the usual random graph with independent edges. We write \binom{U}{2} for the set of unordered pairs of elements of U.
Lemma 1.11.3. Suppose f is randomly chosen as above. Let U ⊂ F_q^s with |U| ≤ d + 1. Then the vector (f(u, v))_{{u,v}∈\binom{U}{2}} is uniformly distributed in F_q^{\binom{|U|}{2}}. In particular, for any E ⊆ \binom{U}{2}, one has
P[f(u, v) = 0 for all {u, v} ∈ E] = q^{−|E|}.
which has degree at most |U| − 1 ≤ d. It satisfies q_u(u) = 1 and q_u(v) = 0 for all v ∈ U \ {u}. Let
p(X, Y) = Σ_{{u,v}∈\binom{U}{2}} c_{u,v} (q_u(X) q_v(Y) + q_v(X) q_u(Y))
Now fix U ⊂ Fqs with |U| = s. We want to show that it is rare for U to have many common
neighbors. We will use the method of moments. Let
ZU = {x ∈ Fqs \ U : f (x, u) = 0 for all u ∈ U}.
Then ZU is the set of common neighbors of U along with possibly some additional vertices in U.
Then, using Lemma 1.11.3,
E[|Z_U|^d] = E[(Σ_{v∈F_q^s\U} 1{v ∈ Z_U})^d]
= Σ_{v^{(1)},...,v^{(d)} ∈ F_q^s\U} E[1{v^{(1)}, . . . , v^{(d)} ∈ Z_U}]
= Σ_{v^{(1)},...,v^{(d)} ∈ F_q^s\U} P[f(u, v) = 0 for all u ∈ U and v ∈ {v^{(1)}, . . . , v^{(d)}}]
= Σ_{r≤d} \binom{q^s − |U|}{r} #{surjections [d] → [r]} q^{−rs}
≤ Σ_{r≤d} #{surjections [d] → [r]}
= O_d(1),
Using Markov’s inequality we get
E[X d ] Od (1)
P(X ≥ λ) = P(X d ≥ λ d ) ≤ ≤ . (1.11.1)
λd λd
Remark 1.11.4. All the probabilistic arguments up to this point would be identical had we used a
random graph with independent edges appearing with probability p. In both settings, the X above
is a random variable with constant order expectation. However, their distributions are extremely
different, as we will soon see. For a random graph with independent edges, X behaves like a
Poisson random variable, and consequently, for any constant t, P(X ≥ t) is bounded from below by
a constant. Consequently, many s-element sets of vertices are expected to have at least t common
neighbors, which means that this method will not work. However, this is not the case with the
random algebraic construction. It is impossible for X to take on certain ranges of values—if X is
somewhat large, then it must be very large.
Note that ZU is defined by s polynomial equations. The next result tells us that the number of
points on such an algebraic variety must be either bounded or at least around q.
Lemma 1.11.5. For all s, d there exists a constant C such that if f1 (X), . . . , fs (X) are polynomials
on Fqs of degree at most d, then
{x ∈ Fqs : f1 (x) = . . . fs (x) = 0}
√
has size either at most C or at least q − C q.
The lemma can be deduced from the following important result from algebraic geometry, due to Lang and Weil (1954), which says that the number of points of an r-dimensional algebraic variety in F_q^s is roughly q^r, as long as certain irreducibility hypotheses are satisfied. We include here the statement of the Lang–Weil bound. Here F̄_q denotes the algebraic closure of F_q.
Theorem 1.11.6 (Lang–Weil bound). Let g_1, . . . , g_m ∈ F_q[X] be polynomials of degree at most d. Let
V = {x ∈ F̄_q^s : g_1(x) = g_2(x) = · · · = g_m(x) = 0}.
In this chapter, we discuss a powerful technique in extremal graph theory developed in the 1970s, known as Szemerédi's graph regularity lemma. The graph regularity method has wide-ranging applications, and is now considered a central technique in the field. The regularity lemma
produces a “rough structural” decomposition of an arbitrary graph (though it is mainly useful for
graphs with quadratically many edges). It then allows us to model an arbitrary graph by a certain
random graph model.
The regularity method introduces us to a central theme of the book:
the dichotomy of structure versus pseudorandomness.
This dichotomy is analogous to the more familiar concept of “signal versus noise”, namely that a
complex system can be decomposed into a structural piece with plenty of information content (the
signal) as well as a random-like residue (the noise). This idea will show up again later in Chapter 6
when we discuss Fourier analysis in additive combinatorics. It is often the case that the structural
piece and the random-like piece can be individually analyzed. However, it is often far from clear
that one can always come up with a useful decomposition.
We begin the chapter with the statement and the proof of the graph regularity lemma. Subse-
quently, we will prove Roth’s theorem using the regularity method. This proof, due to Ruzsa and
Szemerédi (1978), is not the original proof by Roth (1953), whose original Fourier analytic proof
we will see in Chapter 6. Nevertheless, it is important for being historically one of the first major
applications of the graph regularity method. Similar to the proof of Schur’s theorem in Chapter 0,
this graph theoretic proof of Roth’s theorem demonstrates a fruitful connection between graph
theory and additive combinatorics.
By the regularity method, we mean both the graph regularity lemma as well as methods for
applying it. Rather than viewing it as some specific theorem or set of theorems, graph regularity
should be viewed as a general technique malleable to various applications. The reader should avoid
getting bogged down by specific choices of parameters in the statements and proofs below, and
rather, focus on the main ideas and techniques. Students often find the regularity method difficult
to learn, perhaps because the technical details can obscure the intuition. Section 2.7 contains a
number of important exercises on applying the graph regularity method. It is useful to work through
these exercises carefully in order to practice applying the graph regularity method.
a random-like graph with a certain edge-density (e.g., 0.4 between the first and second parts, and
0.7 between the first and third parts, etc.).
Definition 2.1.1. Let X and Y be sets of vertices in a graph G. Let eG (X, Y ) be the number of edges
between X and Y ; that is,
eG (X, Y ) := |{(x, y) ∈ X × Y : xy ∈ E(G)}| .
Define the edge density between X and Y in G by
d_G(X, Y) := e_G(X, Y) / (|X| |Y|).
We drop the subscript G if context is clear.
We allow X and Y to overlap in the definition above. (It may be useful to picture the bipartite
setting, where X and Y are automatically disjoint.)
What should it mean for a graph to be random-like? For the current application, we want to say that the edge density between a pair of parts X and Y is similar to the "local" edge density between subsets of X and Y. It is too restrictive to allow taking every subset of X and Y: e.g., by restricting to single-vertex sets {x} ⊂ X and {y} ⊂ Y, the "local" edge density between {x} and {y} can vary from 0 to 1. To avoid this issue, we should only consider densities between not-too-small subsets of X and Y.
The next definition tells us what it means for a graph between a pair of vertex sets to be
“random-like.”
Definition 2.1.2. Let G be a graph and U, W ⊂ V(G). We call (U, W) an ε-regular pair in G if for all A ⊂ U and B ⊂ W with |A| ≥ ε|U| and |B| ≥ ε|W|, one has
|d(A, B) − d(U, W)| ≤ ε.
If (U, W) is not ε-regular, then we say that its irregularity is witnessed by some A ⊂ U and B ⊂ W satisfying |A| ≥ ε|U|, |B| ≥ ε|W|, and |d(A, B) − d(U, W)| > ε.
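As a computational aside (not from the text), edge density is immediate to compute, and ε-regularity can at least be probed heuristically by sampling subsets of the prescribed sizes; an exhaustive check over all subsets is exponential, so the function below is only a randomized test.

```python
# Illustrative helpers: edge density (Definition 2.1.1) and a heuristic regularity probe.
import math
import random

def density(adj, X, Y):
    # adj: set of ordered pairs (x, y); include both (x, y) and (y, x) for an undirected graph
    return sum((x, y) in adj for x in X for y in Y) / (len(X) * len(Y))

def probably_regular(adj, U, W, eps, trials=1000):
    # U, W: lists of vertices; sample A, B with |A| >= eps|U|, |B| >= eps|W|
    # and look for a witness of irregularity
    d = density(adj, U, W)
    a, b = math.ceil(eps * len(U)), math.ceil(eps * len(W))
    for _ in range(trials):
        A, B = random.sample(U, a), random.sample(W, b)
        if abs(density(adj, A, B) - d) > eps:
            return False  # (A, B) witnesses irregularity
    return True
```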
Exercise 2.1.3. Using the Chernoff bound, show that for every fixed ε > 0, a random bipartite graph between two vertex sets is ε-regular with probability approaching 1 as the sizes of the vertex sets grow to infinity.
Remark 2.1.4. The ε in |A| ≥ ε|U| and |B| ≥ ε|W| plays a different role from the ε in |d(A, B) − d(U, W)| ≤ ε. However, it is usually not important to distinguish these ε's, so we use only one ε for convenience of notation.
Definition 2.1.5. Given a graph G, a partition P = {V_1, . . . , V_k} of its vertex set is an ε-regular partition if
Σ_{(i,j)∈[k]^2 : (V_i,V_j) not ε-regular} |V_i| |V_j| ≤ ε |V(G)|^2.
In other words, an ε-regular partition is a partition of the vertex set where all but an ε-fraction of the pairs of vertices of the graph lie between ε-regular parts.
Remark 2.1.6. When |V_1| = · · · = |V_k|, the inequality says that at most εk^2 of the pairs (V_i, V_j) are not ε-regular.
Also, note that the summation includes i = j. If none of the V_i are too large, say |V_i| ≤ εn for each i, then the terms with i = j contribute at most Σ_i |V_i|^2 ≤ εn Σ_i |V_i| = εn^2, which is negligible.
Theorem 2.1.7 (Szemerédi's graph regularity lemma). For every ε > 0, there exists a constant M such that every graph has an ε-regular partition into at most M parts.
Here is the proof idea. We will generate the desired vertex partition according to the following
algorithm:
(1) Start with the trivial partition of V(G), i.e., a single part containing all of V(G).
(2) While the current partition P is not ε-regular:
(a) For each (V_i, V_j) that is not ε-regular, find a pair witnessing its irregularity, with subsets of V_i and V_j.
(b) Refine P using all the witnessing pairs. (Here given two partitions P and Q of the same
set, we say that Q refines P if each part of Q is contained in a part of P. In other words,
we divide each part of P further to obtain Q.)
We repeat step (2) until the partition is ε-regular, at which point the algorithm terminates. The resulting partition is always ε-regular by design. It remains to show that the number of iterations is bounded as a function of ε. To see this, we keep track of a quantity that necessarily increases at
each iteration of the procedure. This is called an energy increment argument. (The reason that
we call it an “energy” is that it is the mean squared density, i.e., an L 2 norm, and kinetic energy in
physics is also an L 2 norm.)
Definition 2.1.8 (Energy). Let G be an n-vertex graph (whose dependence we drop from the notation). Let U, W ⊂ V(G). Define
q(U, W) := (|U| |W| / n^2) d(U, W)^2.
For partitions P_U = {U_1, . . . , U_k} of U and P_W = {W_1, . . . , W_l} of W, define
q(P_U, P_W) := Σ_{i=1}^{k} Σ_{j=1}^{l} q(U_i, W_j).
Finally, for a partition P = {V_1, . . . , V_k} of V(G), define its energy q(P) := q(P, P) = Σ_{i,j∈[k]} q(V_i, V_j).
Since the edge density is always between 0 and 1, we have 0 ≤ q(P) ≤ 1 for all partitions P.
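In code, the energy of a vertex partition is a one-line extension of the density helper from the earlier snippet (again only an illustrative sketch):

```python
def energy(adj, parts, n):
    # q(P) = sum over ordered pairs of parts (Vi, Vj) of (|Vi||Vj|/n^2) * d(Vi, Vj)^2
    return sum(len(Vi) * len(Vj) / n ** 2 * density(adj, Vi, Vj) ** 2
               for Vi in parts for Vj in parts)
```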
The following lemmas show that the energy cannot decrease upon refinement, and furthermore, it
must increase substantially at each step of the algorithm above.
Lemma 2.1.9 (Energy never decreases under refinement). Let G be a graph, U, W ⊂ V(G), PU a
partition of U, and PW a partition of W. Then q(PU, PW ) ≥ q(U, W).
Lemma 2.1.10 (Energy never decreases under refinement). Given two vertex partitions P and P 0
of some graph, if P 0 refines P, then q(P) ≤ q(P 0).
Proof. The conclusion follows by applying Lemma 2.1.9 to each pair of parts of P. In more detail, letting P = {V_1, . . . , V_m} and letting P′ refine each V_i into a partition P′_{V_i} = {V′_{i1}, . . . , V′_{ik_i}} of V_i, so that P′ = P′_{V_1} ∪ · · · ∪ P′_{V_m}, we have
q(P) = Σ_{i,j} q(V_i, V_j) ≤ Σ_{i,j} q(P′_{V_i}, P′_{V_j}) = q(P′).
Lemma 2.1.11 (Energy boost for an irregular pair). Let G be an n-vertex graph. If (U, W) is not ε-regular, as witnessed by A ⊂ U and B ⊂ W, then
q({A, U \ A}, {B, W \ B}) > q(U, W) + ε^4 (|U| |W| / n^2).
This is the “red bull lemma”, giving an energy boost when feeling irregular.
Proof. Define Z as in the proof of Lemma 2.1.9 for P_U = {A, U \ A} and P_W = {B, W \ B}. Then
Var(Z) = E[Z^2] − E[Z]^2 = (n^2 / (|U| |W|)) (q(P_U, P_W) − q(U, W)).
We have Z = d(A, B) with probability ≥ |A| |B| / (|U| |W|) (corresponding to the event x ∈ A and y ∈ B). So
Var(Z) = E[(Z − E[Z])^2] ≥ (|A| |B| / (|U| |W|)) (d(A, B) − d(U, W))^2 > ε · ε · ε^2 = ε^4.
Putting the two inequalities together gives the claim.
The next lemma, corresponding to step (2b) of the algorithm above, shows that we can put all
the witnessing pairs together to obtain an energy increment.
Lemma 2.1.12 (Energy boost for an irregular partition). If a partition P = {V_1, . . . , V_k} of V(G) is not ε-regular, then there exists a refinement Q of P where every V_i is partitioned into at most 2^{k+1} parts, and such that
q(Q) > q(P) + ε^5.
Proof. Let
R = {(i, j) ∈ [k]^2 : (V_i, V_j) is ε-regular} and R̄ = [k]^2 \ R.
For each pair (V_i, V_j) that is not ε-regular, find a pair A_{i,j} ⊂ V_i and B_{i,j} ⊂ V_j that witnesses the irregularity. Do this simultaneously for all (i, j) ∈ R̄. Note that for i ≠ j, we can take A_{i,j} = B_{j,i} due to symmetry. When i = j, we should allow for the possibility of A_{i,i} and B_{i,i} being distinct.
Let Q be a common refinement of P by all the A_{i,j} and B_{i,j}'s. There are ≤ k + 1 such distinct non-empty sets inside each V_i. So Q refines each V_i into at most 2^{k+1} parts. Let Q_i be the partition of V_i given by Q. Then, using the monotonicity of energy under refinements (Lemma 2.1.9),
q(Q) = Σ_{(i,j)∈[k]^2} q(Q_i, Q_j)
= Σ_{(i,j)∈R} q(Q_i, Q_j) + Σ_{(i,j)∈R̄} q(Q_i, Q_j)
≥ Σ_{(i,j)∈R} q(V_i, V_j) + Σ_{(i,j)∈R̄} q({A_{i,j}, V_i \ A_{i,j}}, {B_{i,j}, V_j \ B_{i,j}}).
By Lemma 2.1.11, each term of the final sum exceeds q(V_i, V_j) + ε^4 |V_i| |V_j| / n^2. Hence q(Q) > q(P) + ε^4 Σ_{(i,j)∈R̄} |V_i| |V_j| / n^2 > q(P) + ε^5, where the last step uses that P is not ε-regular, so Σ_{(i,j)∈R̄} |V_i| |V_j| > εn^2. This is the desired inequality.
Remark 2.1.13. There is a subtlety in the above proof that might be easy to get wrong if you try to
re-do the proof yourself. The refinement Q must be obtained in a single step by refining P using
all the witnessing sets Ai, j simultaneously. If instead one picks out a pair Ai, j ⊂ Vi and A j,i ⊂ Vj ,
refines the partition using just this pair, and then tries to iterate using another irregular pair (Vi 0, Vj 0 ),
the energy boost step would not work, since ε-regularity (or the lack thereof) is not well-preserved under taking refinements.
Proof of the graph regularity lemma (Theorem 2.1.7). Start with the trivial partition of the vertex set of the graph. Repeatedly apply Lemma 2.1.12 whenever the current partition is not ε-regular. By Lemma 2.1.12, the energy of the partition increases by more than ε^5 at each iteration. Since the energy of the partition is ≤ 1, we must stop after < ε^{−5} iterations, terminating in an ε-regular partition.
If a partition has k parts, then Lemma 2.1.12 produces a refinement with ≤ k 2^{k+1} parts. We start with the trivial partition with one part, and then refine < ε^{−5} times. Observe the crude bound k 2^{k+1} ≤ 2^{2^k}. So the total number of parts at the end is ≤ tower(⌈2ε^{−5}⌉), where
tower(k) := 2^{2^{·^{·^{2}}}} (a tower of 2's of height k).
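To get a feel for how quickly this grows, here is the tower function in code (with the convention tower(0) = 1):

```python
def tower(k):
    # a tower of 2's of height k: tower(0) = 1, tower(k) = 2^tower(k - 1)
    return 1 if k == 0 else 2 ** tower(k - 1)

print([tower(k) for k in range(5)])  # [1, 2, 4, 16, 65536]
```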
Remark 2.1.14. Let us stress what the proof is not saying. It is not saying that the partition gets
more and more regular under each refinement. Also, it is not saying that the partition gets more regular
as the energy gets higher. Rather, the role of the energy quantity is simply to bound the number of
iterations of the refinements.
The bound on the number of parts guaranteed by the proof is a constant for each fixed ε > 0, but it grows extremely quickly as ε gets smaller. Is the poor quantitative dependence somehow due to a suboptimal proof strategy? Surprisingly, the tower-type bound is necessary, as shown by Gowers (1997).
Theorem 2.1.15 (Lower bound on the number of parts in a regularity partition). There exists a constant c > 0 such that for all sufficiently small ε > 0, there exists a graph with no ε-regular partition into fewer than tower(⌈ε^{−c}⌉) parts.
We do not include the proof here. The idea is to construct a graph that roughly reverse engineers
the proof of the regularity lemma, so there is essentially a unique ε-regular partition, which must
have many parts. This tells us that the proof of the regularity lemma we just saw is the natural
proof.
Recall that in Definition 2.1.5 of an ε-regular partition, we are allowed to have some irregular pairs. Is it necessary to permit irregular pairs? It turns out that we must permit them. Exercise 2.1.21 gives an example of a graph where every regularity partition must have irregular pairs.
The regularity lemma is quite flexible. For example, we can start with an arbitrary partition of
V(G) instead of the trivial partition in the proof, in order to obtain a partition that is a refinement
of a given partition. The exact same proof with this modification yields the following.
Theorem 2.1.16 (Regularity starting with an arbitrary initial partition). For every ε > 0, there exists a constant M such that for every graph G and every partition P_0 of V(G), there exists an ε-regular partition P of V(G) that is a refinement of P_0, and such that each part of P_0 is refined into at most M parts.
Here is another strengthening of the regularity lemma where we impose the additional require-
ment that vertex parts should be as equal in size as possible. We say that a partition is equitable if
all part sizes are within one of each other. In other words, a partition of a set of size n into k parts
is equitable if every part has size bn/kc or dn/ke.
Theorem 2.1.17 (Equitable regularity partition). For all ε > 0 and m_0, there exists a constant M such that every graph has an ε-regular equitable partition of its vertex set into k parts with m_0 ≤ k ≤ M.
Remark 2.1.18. The lower bound m_0 requirement on the number of parts is somewhat superficial. The reason for including it here is that it is often convenient to discard all the edges that lie within individual parts of the partition, and since there are at most n^2/k such edges, they contribute negligibly if k is not too small, e.g., if we require m_0 ≥ 1/ε.
There are several ways to guarantee equitability. One method is sketched below. We equitize
the partition at every step of the refinement iteration, so that at each step in the proof, we both
obtain an energy increment and also end up with an equitable partition. Here we omit detailed
choices of parameters and calculations, which are mostly straightforward but can get a bit messy.
One can see the original paper of Szemerédi (1978) for details.
Proof sketch of Theorem 2.1.17. Here is a modified algorithm:
(1) Start with an arbitrary equitable partition of the graph into m_0 parts.
(2) While the current equitable partition P is not ε-regular:
(a) (Refinement/energy boost) Refine the partition using pairs that witness irregularity (as in the earlier proof). The new partition P′ divides each part of P into ≤ 2^{|P|} parts.
(b) (Equitization) Modify P′ into an equitable partition by arbitrarily chopping each part of P′ into parts of size |V(G)|/m (for some appropriately chosen m = m(|P′|, ε)) plus some leftover pieces, which are then combined together and then divided into parts of size |V(G)|/m.
The refinement step (2a) increases the energy by ≥ ε^5 as before. The energy might go down in the equitization step (2b), but it should not decrease by much, provided that the m chosen in that step is large enough (say, m = 100 |P′| ε^{−5}). So overall, we still have an energy increment of ≥ ε^5/2 at each step, and hence the process still terminates after O(ε^{−5}) steps. The total number of parts at the end is ≤ m_0 tower(O(ε^{−5})).
Exercise 2.1.19 (Basic inheritance of regularity). Let G be a graph and X, Y ⊂ V(G). If (X, Y) is an η-regular pair, then (X′, Y′) is ε-regular for all X′ ⊂ X with |X′| ≥ η|X| and Y′ ⊂ Y with |Y′| ≥ η|Y|.
Exercise 2.1.20 (An alternate definition of regular pairs). Let G be a graph and X, Y ⊂ V(G). Say that (X, Y) is ε-homogeneous if for all A ⊂ X and B ⊂ Y, one has
|e(A, B) − |A| |B| d(X, Y)| ≤ ε |X| |Y|.
Show that if (X, Y) is ε-regular, then it is ε-homogeneous. Also, show that if (X, Y) is ε^3-homogeneous, then it is ε-regular.
The next exercise shows why we must allow for irregular pairs in the graph regularity lemma.
Exercise 2.1.21 (Unavoidability of irregular pairs). Let the half-graph Hn be the bipartite graph on
2n vertices {a1, . . . , an, b1, . . . , bn } with edges {ai b j : i ≤ j}.
(a) For every ε > 0, explicitly construct an ε-regular partition of H_n into O(1/ε) parts.
(b) Show that there is some c > 0 such that for every ε ∈ (0, c), every positive integer k, and every sufficiently large multiple n of k, every partition of the vertices of H_n into k equal-sized parts contains at least ck pairs of parts which are not ε-regular.
Exercise 2.1.22 (Existence of a regular pair of subsets). Show that there is some absolute constant C > 0 such that for every 0 < ε < 1/2, every graph on n vertices contains an ε-regular pair of vertex subsets each of size at least δn, where δ = 2^{−ε^{−C}}.
Exercise 2.1.23 (Existence of a regular subset). Given a graph G, we say that X ⊂ V(G) is ε-regular if the pair (X, X) is ε-regular, i.e., for all A, B ⊂ X with |A|, |B| ≥ ε|X|, one has |d(A, B) − d(X, X)| ≤ ε.
This exercise asks for two different proofs of the claim:
For every ε > 0, there exists δ > 0 such that every graph contains an ε-regular subset of vertices that forms at least a δ fraction of the vertex set.
(a) Prove the claim using Szemerédi's regularity lemma, showing that one can obtain the ε-regular subset by combining a suitable sub-collection of parts from some regularity partition.
(b*) Give an alternative proof of the claim showing that one can take δ = exp(−exp(ε^{−C})) for some constant C.
Exercise 2.1.24∗ (Regularity partition into regular sets). Prove or disprove: for every ε > 0 there exists M so that every graph has an ε-regular partition into at most M parts, with every part being ε-regular with itself.
Theorem 2.2.1 (Triangle counting lemma). Let G be a graph and X, Y, Z be subsets of the vertices of G such that (X, Y), (Y, Z), (Z, X) are all ε-regular pairs for some ε > 0. If d(X, Y), d(X, Z), d(Y, Z) ≥ 2ε, then
|{(x, y, z) ∈ X × Y × Z : xyz is a triangle in G}|
≥ (1 − 2ε)(d(X, Y) − ε)(d(X, Z) − ε)(d(Y, Z) − ε) |X| |Y| |Z|.
Remark 2.2.2. The vertex sets X, Y, Z do not have to be disjoint, but one does not lose any generality by assuming that they are disjoint in this statement. Indeed, starting with X, Y, Z ⊂ V(G), one can always create an auxiliary tripartite graph G′ with vertex parts being disjoint replicas of X, Y, Z and the edge relations in X × Y being the same for G and G′, and likewise for X × Z and Y × Z. Under this auxiliary construction, a triple in X × Y × Z forms a triangle in G if and only if it forms a triangle in G′.
We begin with a simple but very useful consequence of ε-regularity. It says that in an ε-regular pair (X, Y), almost all vertices of X have roughly the same number of neighbors in Y.
Lemma 2.2.3. Let (X, Y) be an ε-regular pair. Then fewer than ε|X| vertices in X have fewer than (d(X, Y) − ε)|Y| neighbors in Y. Likewise, fewer than ε|Y| vertices in Y have fewer than (d(X, Y) − ε)|X| neighbors in X.
Proof. Let A be the set of vertices in X with fewer than (d(X, Y) − ε)|Y| neighbors in Y. Then d(A, Y) < d(X, Y) − ε, and thus |A| < ε|X| by Definition 2.1.2, as (X, Y) is an ε-regular pair. The other claim is similar.
Proof of Theorem 2.2.1. By Lemma 2.2.3, we can find X′ ⊂ X with |X′| ≥ (1 − 2ε)|X| such that every vertex x ∈ X′ has ≥ (d(X, Y) − ε)|Y| neighbors in Y and ≥ (d(X, Z) − ε)|Z| neighbors in Z.
For each such x ∈ X′, we have |N(x) ∩ Y| ≥ (d(X, Y) − ε)|Y| ≥ ε|Y|. Likewise, |N(x) ∩ Z| ≥ ε|Z|. Since (Y, Z) is ε-regular, the edge density between N(x) ∩ Y and N(x) ∩ Z is ≥ d(Y, Z) − ε. So for each x ∈ X′, the number of edges between N(x) ∩ Y and N(x) ∩ Z is
≥ (d(Y, Z) − ε)|N(x) ∩ Y| |N(x) ∩ Z| ≥ (d(X, Y) − ε)(d(X, Z) − ε)(d(Y, Z) − ε)|Y| |Z|.
Multiplying by |X′| ≥ (1 − 2ε)|X|, we obtain the desired lower bound on the number of triangles.
Remark 2.2.4. We only need the lower bound on the triangle count for our applications in this
chapter, but the same proof can also be modified to give an upper bound, which we leave as an
exercise.
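For completeness, the quantity that the triangle counting lemma lower-bounds is the following brute-force count (in the same illustrative style as the earlier snippets, with adj the set of ordered adjacent pairs):

```python
def count_triangles(adj, X, Y, Z):
    # number of triples (x, y, z) in X x Y x Z with all three pairs adjacent
    return sum(1 for x in X for y in Y for z in Z
               if (x, y) in adj and (y, z) in adj and (x, z) in adj)
```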
Theorem 2.3.1 (Triangle removal lemma). For all ε > 0, there exists δ > 0 such that any graph on n vertices with fewer than δn^3 triangles can be made triangle-free by removing fewer than εn^2 edges.
Remark 2.3.2 (Regularity method recipe). Typical applications of the regularity method proceed
in the following steps:
(1) Partition the vertex set of a graph using the regularity lemma.
(2) Clean the graph by removing edges that behave poorly in the regularity partition. Most
commonly, we remove edges that lie between pairs of parts with
(a) irregularity, or
(b) low-density, or
(c) one of the parts too small
This ends up removing a negligible number of edges.
(3) Count a certain pattern in the cleaned graph using a counting lemma.
To prove the triangle removal lemma, after cleaning the graph (which removes few edges), we
claim that the resulting cleaned graph must be triangle-free, or else the triangle counting lemma
would find many triangles, contradicting the hypothesis.
Proof of the triangle removal lemma (Theorem 2.3.1). Suppose we are given a graph on n vertices with < δn^3 triangles, for some parameter δ we will choose later. Apply the graph regularity lemma, Theorem 2.1.7, to obtain an ε/4-regular partition of the graph with parts V_1, V_2, . . . , V_m. Next, for each (i, j) ∈ [m]^2, remove all edges between V_i and V_j if
(a) (V_i, V_j) is not ε/4-regular, or
(b) d(V_i, V_j) < ε/2, or
(c) min{|V_i|, |V_j|} < εn/(4m).
Since the partition is ε/4-regular (recall Definition 2.1.5), the number of edges removed in (a) from irregular pairs is
≤ Σ_{(i,j): (V_i,V_j) not (ε/4)-regular} |V_i| |V_j| ≤ (ε/4) n^2.
Because edges between the pairs described in (a) and (b) were removed, V_i, V_j, V_k satisfy the hypotheses of the triangle counting lemma (Theorem 2.2.1), so
#{triangles in V_i × V_j × V_k} ≥ (1 − ε/2)(ε/4)^3 |V_i| |V_j| |V_k| ≥ (1 − ε/2)(ε/4)^3 (εn/(4m))^3,
where the final step uses (c) above. Then as long as
δ < (1/6)(1 − ε/2)(ε/4)^3 (ε/(4m))^3,
we would contradict the hypothesis that the original graph has < δn^3 triangles (the extra factor of 6 above is there to account for the possibility that V_i = V_j = V_k). Since m is bounded for each fixed ε, we see that δ can be chosen to depend only on ε.
The next corollary of the triangle removal lemma will soon be used to prove Roth’s theorem.
Corollary 2.3.3. Let G be an n-vertex graph where every edge lies in a unique triangle. Then G
has o(n2 ) edges.
Proof. Let G have m edges. Because each edge lies in exactly one triangle, the number of triangles in
G is m/3 = O(n2 ) = o(n3 ). By the triangle removal lemma (see the statement after Theorem 2.3.1),
we can remove o(n^2) edges to make G triangle-free. However, deleting an edge removes at most one triangle from the graph by assumption, so at least m/3 edges need to be removed to make G triangle-free. Thus m = o(n^2).
Remark 2.3.4 (Quantitative dependencies in the triangle removal lemma). Since the above proof of the triangle removal lemma applies the graph regularity lemma, the resulting bounds from the proof are quite poor: it shows that one can pick δ = 1/tower(ε^{−O(1)}). Using a different but related method, Fox (2011) proved the triangle removal lemma with a slightly better dependence δ = 1/tower(O(log(1/ε))). In the other direction, we know that the triangle removal lemma does not hold with δ = ε^{c log(1/ε)} for a sufficiently small constant c > 0. The construction comes from the Behrend construction of large 3-AP-free sets that we will soon see in Section 2.5. Our knowledge of the quantitative dependence in Corollary 2.3.3 comes from the same source: specifically, we know that the o(n^2) can be sharpened to n^2/e^{Ω(log* n)} (where log*, the iterated logarithm function, is the number of iterations of log that one needs to take to bring a number to at most 1), but the statement is false if the o(n^2) is replaced by n^2 e^{−C√(log n)} for some sufficiently large constant C. It is a major open problem to close the gap between the upper and lower bounds in these problems.
The triangle removal lemma was historically first considered in the following equivalent formulation.
Theorem 2.3.5 ((6, 3)-theorem). Let H be an n-vertex 3-uniform hypergraph without a subgraph
having 6 vertices and 3 edges. Then H has o(n2 ) edges.
Exercise 2.3.6. Deduce the (6, 3)-theorem from Corollary 2.3.3, and vice-versa.
The following influential conjecture due to Brown, Erdős, and Sós (1973) is a major open
problem in extremal combinatorics. It is a natural generalization of Theorem 2.3.5.
Conjecture 2.3.7 ((7, 4)-conjecture). Let H be an n-vertex 3-uniform hypergraph without a sub-
graph having 7 vertices and 4 edges. Then H has o(n2 ) edges.
• (x, y) ∈ X × Y whenever y − x ∈ A;
• (y, z) ∈ Y × Z whenever z − y ∈ A;
• (x, z) ∈ X × Z whenever (z − x)/2 ∈ A.
(Note that one could relax the assumption d > 0 to d ≠ 0, allowing "negative" corners. As shown in the first step in the proof below, the assumption d > 0 is inconsequential.)
The corners theorem is due to Ajtai and Szemerédi (1974), who originally proved it by invoking the full power of Szemerédi's theorem. Here we present a much simpler proof, using the triangle removal lemma, due to Solymosi (2003).
Proof. First we show how to relax the assumption in the definition of a corner from d > 0 to d ≠ 0. Let A ⊂ [N]^2 be a corner-free set. For each z ∈ Z^2, let A_z = A ∩ (z − A). Then |A_z| is the number of ways that one can write z = a + b for some (a, b) ∈ A × A. So Σ_{z∈[2N]^2} |A_z| = |A|^2, so there is some z ∈ [2N]^2 with |A_z| ≥ |A|^2/(2N)^2. To show that |A| = o(N^2), it suffices to show that |A_z| = o(N^2). Moreover, since A_z = z − A_z, the set A_z being corner-free implies that it does not contain three points {(x, y), (x + d, y), (x, y + d)} with d ≠ 0.
Write A = A_z from now on. Build a tripartite graph G with parts X = {x_1, . . . , x_N}, Y = {y_1, . . . , y_N} and Z = {z_1, . . . , z_{2N}}, where each vertex x_i corresponds to a vertical line {x = i} ⊂ Z^2, each vertex y_j corresponds to a horizontal line {y = j}, and each vertex z_k corresponds to a slanted line {y = −x + k} with slope −1. Join two distinct vertices of G with an edge if and only if the corresponding lines intersect at a point belonging to A.
Since A is corner-free in the sense stated at the end of the previous paragraph, xi , y j , z k form a
triangle in G if and only if the three corresponding lines pass through the same point of A (i.e.,
forming a trivial corner with d = 0). Since there is exactly one line of each direction passing
through every point of A, it follows that each edge of G belongs to exactly one triangle. Thus, by
Corollary 2.3.3, 3 | A| = e(G) = o(N 2 ).
The upper bound on corner-free sets actually implies Roth’s theorem, as shown below. So we
now have a second proof of Roth’s theorem (though, this second proof is secretly the same as the
first proof).
Proposition 2.4.3 (Corner-free sets versus 3-AP-free sets). Let r_3(N) be the size of the largest subset of [N] which contains no 3-term arithmetic progression, and r_∠(N) be the size of the largest subset of [N]^2 which contains no corner. Then r_3(N) · N ≤ r_∠(2N).
Exercise 2.4.5∗ (Arithmetic triangle removal lemma). Show that for every ε > 0, there exists δ > 0 such that if A ⊂ [n] has fewer than δn^2 triples (x, y, z) ∈ A^3 with x + y = z, then there is some B ⊂ A with |A \ B| ≤ εn such that B is sum-free, i.e., there do not exist x, y, z ∈ B with x + y = z.
Proof. Let m and d be two positive integers depending on N, to be specified later. Consider the lattice points of X = {0, 1, . . . , m − 1}^d that lie on a sphere of radius √L:
X_L := {(x_1, . . . , x_d) ∈ X : x_1^2 + · · · + x_d^2 = L}.
Then X = ⋃_{i=1}^{dm^2} X_i. So by the pigeonhole principle, there exists an L ∈ [dm^2] such that |X_L| ≥ m^{d−2}/d. Define the base-2m digit expansion
φ(x_1, . . . , x_d) := Σ_{i=1}^{d} x_i (2m)^{i−1}.
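A small computational sketch of this construction (the helper below and the parameters m = 4, d = 3 are illustrative choices, not the optimized values from the proof):

```python
# Illustrative Behrend-type construction: encode a popular sphere in base 2m.
from itertools import product

def behrend_set(m, d):
    spheres = {}
    for x in product(range(m), repeat=d):
        spheres.setdefault(sum(c * c for c in x), []).append(x)
    X_L = max(spheres.values(), key=len)  # a popular radius, by pigeonhole
    return sorted(sum(c * (2 * m) ** i for i, c in enumerate(x)) for x in X_L)

A = behrend_set(4, 3)
S = set(A)
# no nontrivial 3-term AP: for distinct x, z in A, the midpoint (x + z)/2 is never in A
assert all(not ((x + z) % 2 == 0 and (x + z) // 2 in S)
           for x in A for z in A if x != z)
```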
The Behrend construction also implies lower bound constructions for the other problems we saw earlier. For example, since we used Corollary 2.3.3 to deduce an upper bound on the size of 3-AP-free sets, turning this implication around, we see that having a large 3-AP-free set implies a quantitative limitation on Corollary 2.3.3. Let us spell it out here.
Corollary 2.5.2. For every n ≥ 3, there is some n-vertex graph with at least n^2 e^{−C√(log n)} edges where every edge lies on a unique triangle. Here C is some absolute constant.
Proof. In the proof of Theorem 2.4.1, starting from a 3-AP-free set A ⊂ [N], we constructed a graph with 6N + 3 vertices and (6N + 3)|A| edges such that every edge lies in a unique triangle. Choosing N = ⌊(n − 3)/6⌋ and letting A be the Behrend construction of Theorem 2.5.1 with |A| ≥ N e^{−C√(log N)}, we obtain the desired graph.
The same graph construction also shows, after examining the proof of Corollary 2.3.3, that in the triangle removal lemma, Theorem 2.3.1, one cannot take δ = e^{−c(log(1/ε))^2} if the constant c > 0 is too small.
In Proposition 2.4.3 we proved the bound r_3(N) · N ≤ r_∠(2N) relating corner-free sets to 3-AP-free sets. The Behrend construction then also gives a corner-free subset of [N]^2 of size ≥ N^2 e^{−C√(log N)}.
Proposition 2.6.1 (K_4 counting lemma). Let 0 < ε < 1. Let X_1, . . . , X_4 be vertex subsets of a graph G such that (X_i, X_j) is ε-regular with edge density d_{ij} := d(X_i, X_j) ≥ 3√ε for each pair i < j. Then
|{(x_1, x_2, x_3, x_4) ∈ X_1 × X_2 × X_3 × X_4 : x_1x_2x_3x_4 is a clique in G}|
≥ (1 − 3ε)(d_{12} − 3ε)(d_{13} − ε)(d_{14} − ε)(d_{23} − ε)(d_{24} − ε)(d_{34} − ε) |X_1| |X_2| |X_3| |X_4|.
Proof. We repeatedly apply the following statement, which is a simple consequence of the definition of ε-regularity (and a small extension of Lemma 2.2.3):
Given an ε-regular pair (X, Y) and B ⊂ Y with |B| ≥ ε|Y|, the number of vertices in X with < (d(X, Y) − ε)|B| neighbors in B is < ε|X|.
The number of vertices in X_1 with ≥ (d_{1i} − ε)|X_i| neighbors in X_i for each i = 2, 3, 4 is ≥ (1 − 3ε)|X_1|. Fix a choice of such an x_1 ∈ X_1. For each i = 2, 3, 4, let Y_i be the neighbors of x_1 in X_i, so that |Y_i| ≥ (d_{1i} − ε)|X_i|.
The number of vertices in Y_2 with ≥ (d_{2i} − ε)|Y_i| neighbors in Y_i for each i = 3, 4 is ≥ |Y_2| − 2ε|X_2| ≥ (d_{12} − 3ε)|X_2|. Fix a choice of such an x_2 ∈ Y_2. For each i = 3, 4, let Z_i be the neighbors of x_2 in Y_i.
For each i = 3, 4, |Z_i| ≥ (d_{1i} − ε)(d_{2i} − ε)|X_i| ≥ ε|X_i|, and so the number of edges between Z_3 and Z_4 is
≥ (d_{34} − ε)|Z_3| |Z_4| ≥ (d_{34} − ε)(d_{13} − ε)(d_{23} − ε)(d_{14} − ε)(d_{24} − ε)|X_3| |X_4|.
Any edge between Z_3 and Z_4 forms a K_4 together with x_1 and x_2. Multiplying the above quantity with the earlier lower bounds on the number of choices of x_1 and x_2 gives the result.
The same strategy works more generally for counting any graph. To find copies of H, we embed
vertices of H one at a time.
Theorem 2.6.2 (Graph counting lemma). For every graph H and real δ > 0, there exists an ε > 0 such that the following is true.
Let G be a graph, and X_i ⊂ V(G) for each i ∈ V(H) such that for each ij ∈ E(H), (X_i, X_j) is an ε-regular pair with edge density d_{ij} := d(X_i, X_j) ≥ δ. Then the number of graph homomorphisms H → G where each i ∈ V(H) is mapped to X_i is
≥ (1 − δ) ∏_{ij∈E(H)} (d_{ij} − δ) ∏_{i∈V(H)} |X_i|.
Remark 2.6.3. (a) For a fixed H, as |Xi | → ∞ for each i, all but a negligible fraction of such
homomorphisms from H are injective (i.e., yielding a copy of H as a subgraph).
(b) It is useful (and in fact equivalent) to think about the setting where G is a multipartite graph
with parts Xi , as illustrated below.
In the multipartite setting, we see that the graph counting lemma can be adapted to variants such as counting induced copies of H. Indeed, induced copies of H correspond to embedding a v(H)-clique in an auxiliary graph G′ obtained by replacing the bipartite graph in G between X_i and X_j by its bipartite complement whenever ij is not an edge of H.
Theorem 2.6.4 (Graph counting lemma). Let H be a graph with maximum degree ∆ ≥ 1 and c(H) connected components. Let ε > 0. Let G be a graph, and X_i ⊂ V(G) for each i ∈ V(H) such that, for each ij ∈ E(H), (X_i, X_j) is an ε-regular pair with edge density d_{ij} := d(X_i, X_j) ≥ (∆ + 1)ε^{1/∆}. Then the number of graph homomorphisms H → G where each i ∈ V(H) is mapped to X_i is
≥ (1 − ∆ε)^{c(H)} ∏_{ij∈E(H)} (d_{ij} − ∆ε^{1/∆}) · ∏_{i∈V(H)} |X_i|.
Furthermore, if |X_i| ≥ v(H)/ε for each i, then there exists such a homomorphism H → G that is injective (i.e., an embedding of H as a subgraph).
Proof. Let us order and label the vertices of H by 1, . . . , v(H) arbitrarily. We will select vertices
x1 ∈ X1, x2 ∈ X2, . . . in order. The idea is to always make sure that they have enough neighbors in
G so that there are many ways to continue the embedding of H. We say that a partial embedding x_1, . . . , x_{s−1} (here "partial embedding" means that x_ix_j ∈ E(G) whenever ij ∈ E(H), for all the x_i's chosen so far) is abundant if for each j ≥ s, the number of valid extensions x_j ∈ X_j (meaning that x_ix_j ∈ E(G) whenever i < s and ij ∈ E(H)) is ≥ |X_j| ∏_{i<s: ij∈E(H)} (d_{ij} − ε).
For each s = 1, 2, . . . , v(H) in order, suppose we have already fixed an abundant partial embedding x_1, . . . , x_{s−1}. For each j ≥ s, let
Y_j = {x_j ∈ X_j : x_ix_j ∈ E(G) whenever i < s and ij ∈ E(H)}
be the set of valid extensions of the j-th vertex in X_j given the partial embedding x_1, . . . , x_{s−1}, so that the abundance hypothesis gives
|Y_j| ≥ |X_j| ∏_{i<s: ij∈E(H)} (d_{ij} − ε) ≥ (ε^{1/∆})^{|{i<s: ij∈E(H)}|} |X_j| ≥ ε|X_j|.
Thus, as in the proof of Proposition 2.6.1 for K_4, the number of choices of x_s ∈ X_s that would extend x_1, . . . , x_{s−1} to an abundant partial embedding is
≥ |Y_s| − |{i > s : si ∈ E(H)}| ε|X_s|
≥ |X_s| ∏_{i<s: is∈E(H)} (d_{is} − ε) − |{i > s : si ∈ E(H)}| ε|X_s|.    (†)
Fix such a choice of x_s, and move on to embedding the next vertex x_{s+1}.
Multiplying together these lower bounds for the number of choices of each x_s over all s = 1, . . . , v(H), we obtain the lower bound on the number of homomorphisms H → G.
Finally, note that in both cases (†) ≥ ε|X_s|, and so if |X_s| ≥ v(H)/ε, then (†) ≥ v(H), and so we can choose each x_s to be distinct from the previously embedded vertices x_1, . . . , x_{s−1}, thereby yielding an injective homomorphism.
As an application, we have the following graph removal lemma, generalizing the triangle
removal lemma, Theorem 2.3.1. The proof is basically the same as Theorem 2.3.1 except with the
above graph counting lemma taking the role of the triangle counting lemma, so we will not repeat
the proof here.
Theorem 2.6.5 (Graph removal lemma). For every graph H and constant ε > 0, there exists a constant δ = δ(H, ε) > 0 such that every n-vertex graph G with fewer than δn^{v(H)} copies of H can be made H-free by removing fewer than εn^2 edges.
The next exercise asks you to show that, if H is bipartite, then one can prove the H-removal
lemma without using regularity, and thereby getting a much better bound.
Exercise 2.6.6 (Removal lemma for bipartite graphs with polynomial bounds). Prove that for every bipartite graph H, there is a constant C such that for every ε > 0, every n-vertex graph with fewer than ε^C n^{v(H)} copies of H can be made H-free by removing at most εn^2 edges.
As another application, let us give a different proof of the Erdős–Stone–Simonovits theorem.
We saw a proof earlier in Section 1.5 using supersaturation and the hypergraph KST theorem.
The proof below follows the partition–clean–count strategy in Remark 2.3.2 combined with an
application of Turán’s theorem. A common feature of many regularity applications is that they
“boost” an exact extremal graph theoretic result (e.g., Turán’s theorem) to an asymptotic result
involving more complex derived structures (e.g., from the existence of a copy of Kr to embedding
a complete r-partite graph).
Theorem 2.6.7 (Erdős–Stone–Simonovits theorem). Fix a graph H with at least one edge. Then
ex(n, H) = (1 − 1/(χ(H) − 1) + o(1)) \binom{n}{2}.
Proof. Fix ε > 0. Let G be any n-vertex graph with at least (1 − 1/(χ(H) − 1) + ε) \binom{n}{2} edges. The theorem is equivalent to the claim that for n = n(ε, H) sufficiently large, G contains H as a subgraph.
Apply the graph regularity lemma to obtain an η-regular partition V(G) = V_1 ⊔ · · · ⊔ V_m for some sufficiently small η > 0 depending only on ε and H, to be decided later. Then the number m of parts is also bounded for fixed H and ε.
Remove an edge (x, y) ∈ Vi × Vj if
By Turán’s theorem (Corollary 1.2.5), G0 contains a copy of K χ(H) . Suppose that the χ(H)
vertices of this K χ(H) land in Vi1, · · · , Vi χ(H) (allowing repeated indices). Since each pair of these
sets is η-regular, has edge density ≥ /8, and each has size ≥ n/(8m), applying the graph counting
lemma, Theorem 2.6.2, we see that as long as η is sufficiently small in terms of and H, and n is
sufficiently large, there exists an injective embedding of H into G0 where the vertices of H in the
r-th color class are mapped into Vir . So G contains H as a subgraph.
Exercise 2.7.6. Show that for every α > 0, there exists β > 0 such that every graph on n vertices
with at least αn2 edges contains a d-regular subgraph for some d ≥ βn (here d-regular refers to
every vertex having degree d).
Theorem 2.8.1 (Induced graph removal lemma). For any graph H and ε > 0, there exists δ > 0 such that if an n-vertex graph has fewer than δn^{v(H)} induced copies of H, then it can be made induced-H-free by adding and/or deleting fewer than εn^2 edges.
Remark 2.8.2. Given two graphs on the same vertex set, the minimum number of edges that one
needs to add/delete to obtain the second graph from the first graph is called the edit distance
between the two graphs. The induced graph removal lemma can be rephrased as saying that every
graph with few induced copies of H is close in edit distance to an induced-H-free graph.
Unlike the previous graph removal lemma, for the induced version, it is important that we allow
both adding and deleting edges. The statement would be false if we only allow edge deletion but
not addition. For example, suppose G = Kn \ K3 , i.e., a complete graph on n vertices with three
edges of a single triangle removed. If H is an empty graph on three vertices, then G has exactly
one copy of H, but G cannot be made induced-H-free by only deleting edges.
To see why the earlier proof of the graph removal lemma (Theorem 2.6.5) does not apply in a
straightforward way to prove the induced graph removal lemma, let us attempt to follow the earlier
strategy and see where things go wrong.
First we apply the graph regularity lemma. Then we need to clean up the graph. In the induced
graph removal lemma, edges and non-edges play symmetric roles (alternatively, we can rephrase
the problem in terms of red/blue edge-coloring of cliques). We can handle low density pairs (edge
density less than ) by removing edges between such pairs. Naturally, for the induced graph removal
lemma, we also need to handle high density pairs (density more than 1 − ), and we can add all
the edges between such pairs. However, it is not clear what to do with irregular pairs. Earlier, we
just removed all edges between irregular pairs. The problem is that this may create many induced
copies of H that were not present previously, e.g., below. Likewise, we cannot simply add all edges
between irregular pairs.
Perhaps we can always find a regularity partition without irregular pairs? Unfortunately, this is
false, as shown in Exercise 2.1.21. One must allow for the possibility of irregular pairs.
We will iterate the regularity partitioning lemma to obtain a stronger form of the regularity
lemma. Recall the energy q(P) of a partition (Definition 2.1.8) as the mean-squared edge density
between parts.
Theorem 2.8.3 (Strong regularity lemma). For any sequence of constants ε_0 ≥ ε_1 ≥ ε_2 ≥ · · · > 0, there exists an integer M so that every graph has two vertex partitions P and Q so that
(a) Q refines P,
(b) P is ε_0-regular and Q is ε_{|P|}-regular,
(c) q(Q) ≤ q(P) + ε_0, and
(d) |Q| ≤ M.
Remark 2.8.4. One should think of the sequence ε_0, ε_1, . . . as rapidly decreasing. This strong regularity lemma outputs a refining pair of partitions P and Q such that P is regular, Q is extremely regular, and P and Q are close to each other (as captured by q(P) ≤ q(Q) ≤ q(P) + ε_0; see Lemma 2.8.7 below). A key point here is that we demand Q to be extremely regular relative to the number of parts of P. The more parts P has, the more regular Q should be.
Proof. We repeatedly apply the following version of Szemerédi's regularity lemma (Theorem 2.1.16):
For all ε > 0, there exists an integer M_0 = M_0(ε) so that for all partitions P of V(G), there exists a refinement P′ of P, with each part in P refined into ≤ M_0 parts, so that P′ is ε-regular.
By iteratively applying the above regularity partition, we obtain a sequence of partitions P_0, P_1, . . . of V(G) starting with P_0 = {V(G)} being the trivial partition. Each P_{i+1} is ε_{|P_i|}-regular and refines P_i. The regularity lemma guarantees that we can have |P_{i+1}| ≤ |P_i| M_0(ε_{|P_i|}).
Since 0 ≤ q(·) ≤ 1, there exists i ≤ ε_0^{−1} so that q(P_{i+1}) ≤ q(P_i) + ε_0. Then setting P = P_i and Q = P_{i+1} satisfies the desired requirements. Indeed, the number of parts of Q is bounded by a function of the sequence (ε_0, ε_1, . . .), since there are a bounded number of iterations, and each iteration produces a refining partition with a bounded number of parts.
Remark 2.8.5 (Bounds in the strong regularity lemma). The bound on M produced by the proof depends on the sequence (ε_0, ε_1, . . .). In the application below, we use ε_i = ε_0/poly(i). Then the size of M is comparable to applying M_0 in succession ε_0^{−1} times. Note that M_0 is a tower function, and this makes M a tower function iterated ε_0^{−1} times. This iterated tower function is called the wowzer function: wowzer(k) = tower(tower(· · · (tower(k)) · · · )) (with k applications of tower). The wowzer function is one step up from tower in the Ackermann hierarchy.
Remark 2.8.6. We can in fact further guarantee equitability of parts. This can be done by adapting
the ideas sketched in the proof sketch of Theorem 2.1.17.
The following lemma explains the significance of the inequality q(Q) ≤ q(P) + ε_0 from earlier.
Lemma 2.8.7. Let P and Q both be vertex partitions of a graph G, with Q refining P. For each x ∈ V(G), write V_x for the part of P that x lies in and W_x for the part of Q that x lies in. If q(Q) ≤ q(P) + ε^3, then |d(V_x, V_y) − d(W_x, W_y)| ≤ ε for all but εn^2 pairs (x, y) ∈ V(G)^2.
Proof. Let x, y ∈ V(G) be chosen uniformly at random. As in the proof of Lemma 2.1.9, we have q(P) = E[Z_P^2], where Z_P = d(V_x, V_y). Likewise, q(Q) = E[Z_Q^2], where Z_Q = d(W_x, W_y). We have
q(Q) − q(P) = E[Z_Q^2] − E[Z_P^2] = E[(Z_Q − Z_P)^2],
Indeed, the identity E[Z_Q^2] − E[Z_P^2] = E[(Z_Q − Z_P)^2] is equivalent to E[Z_P(Z_Q − Z_P)] = 0, which is true since, as x and y each vary over their own parts of P, the expression Z_Q − Z_P averages to zero.
So q(Q) ≤ q(P) + ε^3 is equivalent to E[(Z_Q − Z_P)^2] ≤ ε^3, which in turn implies, by Markov's inequality, that P(|Z_Q − Z_P| > ε) ≤ ε, which is the same as the desired conclusion.
Remark 2.8.8. Conversely, if |d(V_x, V_y) − d(W_x, W_y)| ≤ ε for all but εn^2 pairs (x, y) ∈ V(G)^2, then q(Q) ≤ q(P) + 2ε (Exercise!).
We now deduce the following form of the strong regularity lemma, which considers only select subsets of the vertex parts but does not require irregular pairs.
Theorem 2.8.9 (Strong regularity lemma). For any sequence of constants ε_0 ≥ ε_1 ≥ ε_2 ≥ · · · > 0, there exists a constant δ > 0 so that every n-vertex graph has an equitable vertex partition V_1 ∪ · · · ∪ V_k and a subset W_i ⊂ V_i for each i satisfying
(a) |W_i| ≥ δn,
(b) (W_i, W_j) is ε_k-regular for all 1 ≤ i ≤ j ≤ k, and
(c) |d(V_i, V_j) − d(W_i, W_j)| ≤ ε_0 for all but fewer than ε_0 k^2 pairs (i, j) ∈ [k]^2.
Remark 2.8.10. It is significant that all (rather than nearly all) pairs (Wi, W j ) are regular. We will
need this fact in our applications below.
Proof sketch. Here we show how to prove a slightly weaker result where i ≤ j in (b) is replaced by i < j. In other words, this proof does not promise that each W_i is ε_k-regular with itself. To obtain the stronger conclusion as stated (requiring each W_i to be regular with itself), we can adapt the ideas in Exercise 2.1.23. We omit the details.
By decreasing the ε_i's if needed (we can do this since a smaller sequence of ε_i's yields a stronger conclusion), we may assume that ε_i ≤ 1/(10i^2) and ε_i ≤ ε_0/4 for every i ≥ 1.
Let us apply the strong regularity lemma, Theorem 2.8.3, with equitable partitions (see Remark 2.8.6 above). That is, we have (making the simplifying assumption that all partitions are exactly equitable, to avoid unimportant technicalities):
• an equitable ε_0-regular partition P = {V_1, . . . , V_k} of V(G) and
• an equitable ε_k-regular partition Q refining P
satisfying
• q(Q) ≤ q(P) + ε_0^3/8, and
• |Q| ≤ M = M(ε_0, ε_1, . . .).
Inside each part V_i, let us choose a part W_i of Q uniformly at random. Since |Q| ≤ M, the equitability assumption implies that each part of Q has size ≥ δn for some constant δ = δ(ε_0, ε_1, . . .).
So (a) is satisfied.
Since Q is εk-regular, all but an εk-fraction of pairs of parts of Q are εk-regular. Summing over all i < j, using linearity of expectations, the expected number of pairs (Wi, Wj) that are not εk-regular is ≤ εk k² ≤ 1/10. It follows that with probability ≥ 9/10, (Wi, Wj) is εk-regular for all i < j, so (b) is satisfied (in the simpler setting ignoring i = j as mentioned earlier).
Let X denote the number of pairs (i, j) ∈ [k]² with |d(Vi, Vj) − d(Wi, Wj)| > ε0. Since q(Q) ≤ q(P) + (ε0/2)³, by Lemma 2.8.7 and linearity of expectations, EX ≤ (ε0/2)k². So by Markov's inequality, X ≤ ε0 k² with probability ≥ 1/2, so that (c) is satisfied.
It follows that (b) and (c) are both satisfied with probability ≥ 1 − 1/10 − 1/2 > 0. Therefore, there exist valid choices of the Wi's.
Proof of the induced graph removal lemma (Theorem 2.8.1). As usual, we have the three steps in the regularity method recipe.
First, we apply Theorem 2.8.9 to obtain a partition V1 ∪ · · · ∪ Vk of the vertex set of the graph, along with Wi ⊂ Vi, so that the following hold.
(a) (Wi, Wj) is ε′-regular for every i ≤ j, with some sufficiently small constant ε′ > 0 depending on ε and H,
(b) |d(Vi, Vj) − d(Wi, Wj)| ≤ ε/8 for all but < εk²/8 pairs (i, j) ∈ [k]², and
(c) |Wi| ≥ δ′n, for some constant δ′ depending only on ε and H.
Next, we clean the graph as follows: for each pair i ≤ j (including i = j):
• if d(Wi, Wj) ≤ ε/8, then remove all edges between (Vi, Vj);
• if d(Wi, Wj) ≥ 1 − ε/8, then add all edges between (Vi, Vj).
Note that we are not simply adding/removing edges within each pair (Wi, Wj), but rather all of (Vi, Vj). To bound the number of edges added/deleted, recall (b) from the previous paragraph. If d(Wi, Wj) ≤ ε/8 and |d(Vi, Vj) − d(Wi, Wj)| ≤ ε/8, then d(Vi, Vj) ≤ ε/4, and the number of edges in all such (Vi, Vj) is at most εn²/4. Likewise for d(Wi, Wj) ≥ 1 − ε/8. For the remaining < εk²/8 pairs (i, j) not satisfying |d(Vi, Vj) − d(Wi, Wj)| ≤ ε/8, the total number of edges among all such pairs is at most εn²/8. All together, we added/deleted < εn² edges from G. Call the resulting graph G′. There are no irregular pairs (Wi, Wj) for us to worry about.
It remains to show that G′ is induced-H-free. Suppose otherwise. Let us count induced copies of H in G as in the proof of the graph removal lemma, Theorem 2.6.5. We have some induced copy of H in G′, with each vertex v ∈ V(H) embedded in Vφ(v) for some φ : V(H) → [k].
Consider a pair of distinct vertices u, v of H. If uv ∈ E(H), there must be an edge in G′ between Vφ(u) and Vφ(v) (here φ(u) and φ(v) are not necessarily different). So we must not have deleted all the edges in G between Vφ(u) and Vφ(v) in the cleaning step. By the cleaning algorithm above, this means that dG(Wφ(u), Wφ(v)) > ε/8.
Likewise, if uv ∉ E(H) for any pair of distinct u, v ∈ V(H), we have dG(Wφ(u), Wφ(v)) < 1 − ε/8.
Since (Wi, Wj) is ε′-regular in G for every i ≤ j, provided that ε′ is small enough (in terms of ε and H), the graph counting lemma, Theorem 2.6.2 (with the induced variation as in Remark 2.6.3(b)), applied to G tells us that the number of induced copies of H in G is
≥ $(1-\varepsilon)(\varepsilon/10)^{\binom{v(H)}{2}}(\delta' n)^{v(H)}$ (recall |Wi| ≥ δ′n). We are then done with $\delta = (1-\varepsilon)(\varepsilon/10)^{\binom{v(H)}{2}}(\delta')^{v(H)}$, since from the hypothesis we knew that G has fewer than δn^{v(H)} induced copies of H.
Finally, let us prove a graph removal lemma with an infinite number of forbidden induced
subgraphs (Alon and Shapira 2008). Given a (possibly infinite) set H of graphs, we say that G is
induced-H -free if G is induced-H-free for every H ∈ H .
Theorem 2.8.11 (Infinite graph removal lemma). For each (possibly infinite) set of graphs H and ε > 0, there exist h0 and δ > 0 so that if G is an n-vertex graph with fewer than δn^{v(H)} induced copies of H for every H ∈ H with at most h0 vertices, then G can be made induced-H-free by adding/removing fewer than εn² edges.
Remark 2.8.12. The presence of h0 may seem a bit strange at first. In the next section, we will see
a reformulation of this theorem in the language of property testing, where h0 comes up naturally.
Proof. The proof is mostly the same as the proof of the induced graph removal lemma that we just saw. The main (and only) tricky issue here is how to choose the regularity parameter ε′ for every pair (Wi, Wj) in condition (a) of the earlier proof. Previously, we did not use the full strength of Theorem 2.8.9, which allows this regularity parameter to depend on k, but now we are going to use it. Recall that we had to make sure that this ε′ was chosen to be small enough for the H-counting lemma to work. Now that there are possibly infinitely many graphs in H, we cannot naively choose ε′ to be sufficiently small. The main point of the proof is to reduce the problem to a finite subset of H for each k.
Define a template T to be an edge-coloring of the looped k-clique (i.e., a complete graph on k vertices along with a loop at every vertex) where each edge is colored by one of {white, black, gray}. We say that a graph H is compatible with a template T if there exists a map φ : V(H) → V(T) such that for every distinct pair u, v of vertices of H:
• if uv ∈ E(H), then φ(u)φ(v) is colored black or gray in T; and
• if uv ∉ E(H), then φ(u)φ(v) is colored white or gray in T.
That is, a black edge in a template means an edge of H, a white edge means a non-edge of H, and
a gray edge is a wildcard (the most flexible). An example is shown below.
As another example, every graph is compatible with every completely gray template.
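As a concrete illustration of the compatibility check (a sketch of our own, not part of the original text), the following Python snippet tests compatibility by brute force over all maps φ : V(H) → [k]; the function name and data representation are our own choices.

```python
from itertools import product

def is_compatible(H_vertices, H_edges, k, template_color):
    """Check whether graph H is compatible with a template T on vertex set [k].

    H_edges: set of frozensets {u, v}; template_color: dict mapping frozenset {i, j}
    (including loops {i}) to 'white', 'black', or 'gray'.  Brute force over all maps
    phi: V(H) -> [k], so only suitable for small H and k.
    """
    H_vertices = list(H_vertices)
    for phi_values in product(range(k), repeat=len(H_vertices)):
        phi = dict(zip(H_vertices, phi_values))
        ok = True
        for i, u in enumerate(H_vertices):
            for v in H_vertices[i + 1:]:
                color = template_color[frozenset({phi[u], phi[v]})]
                if frozenset({u, v}) in H_edges:
                    if color == 'white':   # edges of H need black or gray
                        ok = False
                        break
                else:
                    if color == 'black':   # non-edges of H need white or gray
                        ok = False
                        break
            if not ok:
                break
        if ok:
            return True
    return False

# Every graph is compatible with the completely gray template, e.g. a triangle with k = 2:
gray = {frozenset(s): 'gray' for s in [(0,), (1,), (0, 1)]}
triangle_edges = {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})}
print(is_compatible([0, 1, 2], triangle_edges, 2, gray))  # True
```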
For every template T, pick some representative HT ∈ H compatible with T, as long as such a representative exists (and ignore T otherwise). A graph in H is allowed to be the representative of more than one template. Let Hk be the set of all H ∈ H that arise as the representative of some k-vertex template. Note that Hk is finite since there are finitely many k-vertex templates. We can pick each εk > 0 to be small enough so that the conclusion of the counting step later can be guaranteed for all elements of Hk.
Now we proceed nearly identically to the proof of the induced removal lemma, Theorem 2.8.1, that we just saw. In applying Theorem 2.8.9 to obtain the partition V1 ∪ · · · ∪ Vk and finding Wi ⊂ Vi, we ensure the following condition instead of the earlier (a):
(a) (Wi, Wj) is εk-regular for every i ≤ j.
We set h0 to be the maximum number of vertices of a graph in Hk.
Now we do the cleaning step. Along the way, we create a k-vertex template T with vertex set [k] corresponding to the parts {V1, . . . , Vk} of the partition. For each 1 ≤ i ≤ j ≤ k,
• if d(Wi, Wj) ≤ ε/4, then remove all edges between (Vi, Vj) from G, and color the edge ij in template T white;
• if d(Wi, Wj) ≥ 1 − ε/4, then add all edges between (Vi, Vj), and color the edge ij in template T black;
• otherwise, color the edge ij in template T gray.
Finally, suppose some induced H ∈ H remains in G′. Due to our cleaning procedure, H must be compatible with the template T. Then the representative HT ∈ Hk of T is a graph on at most h0 vertices, and furthermore, the counting lemma guarantees that, provided εk > 0 is small enough (a finite number of pre-chosen constraints, one for each element of Hk), the number of induced copies of HT in G is ≥ δn^{v(HT)} for some constant δ > 0 that depends only on ε and H.
All the techniques above work nearly verbatim for a generalization to colored graphs.
Theorem 2.8.13 (Infinite edge-colored graph removal lemma). For every ε > 0, r ∈ N, and a (possibly infinite) set H of r-edge-colored graphs, there exist some h0 and δ > 0 such that if G is an r-edge-coloring of the complete graph on n vertices with < δn^{v(H)} copies of H for every H ∈ H with at most h0 vertices, then G can be made H-free by recoloring < εn² edges (using the same palette of r colors throughout).
The induced graph removal lemma corresponds to the special case r = 2, with the two colors
representing edges and non-edges respectively.
2.9. Graph property testing
Theorem 2.9.1 (Triangle-freeness is testable). For all constants ε > 0, there exists a constant K = K(ε) so that the following algorithm satisfies the probabilistic guarantees below.
Algorithm. Input: a graph G.
Sample K vertices from G uniformly at random without replacement (if G has fewer than K vertices, then take the entire vertex set). If G has no triangles among these K vertices, then output that G is triangle-free; else output that G is ε-far from triangle-free.
(a) If the input graph G is triangle-free, then the algorithm always correctly outputs that G is triangle-free;
(b) If the input graph G is ε-far from triangle-free, then with probability ≥ 0.99 the algorithm outputs that G is ε-far from triangle-free;
(c) We do not make any guarantees when the input graph is neither triangle-free nor ε-far from triangle-free.
Remark 2.9.2. This is an example of a one-sided tester, meaning that it always (non-probabilistically) outputs a correct answer when G satisfies property P and only has a probabilistic guarantee when G does not satisfy property P. In other words, it only has false positives and not false negatives. (In contrast, a two-sided tester would have probabilistic guarantees for both situations.)
For a one-sided tester, there is nothing special about the number 0.99 above in (b). It can be any positive constant δ > 0. If we run the algorithm m times, then the probability of success improves from ≥ δ to ≥ 1 − (1 − δ)^m, which can be made arbitrarily close to 1 if we choose m large enough.
Proof. If the graph G is triangle-free, the algorithm clearly always outputs correctly. On the other hand, if G is ε-far from triangle-free, then by the triangle removal lemma (Theorem 2.3.1), G has ≥ δn³ triangles for some constant δ = δ(ε) > 0. If we sample three vertices from G uniformly at random, then they form a triangle with probability ≥ δ. And if we run K/3 independent trials, then the probability that we see a triangle is ≥ 1 − (1 − δ)^{K/3}, which is ≥ 0.99 as long as K is a sufficiently large constant (depending on δ, which in turn depends on ε).
In the algorithm as stated in the theorem, K vertices are sampled without replacement. Above we had K/3 independent trials of picking a triple of vertices at random. But this difference hardly matters. We can couple the two processes by adding additional random samples to the latter process until we have seen K distinct vertices.
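Here is a minimal Python sketch of the sampling algorithm above (our own illustration; the constant K would be supplied by the theorem, and the adjacency-list representation is an arbitrary choice).

```python
import random
from itertools import combinations

def sample_triangle_tester(adj, K):
    """One-sided tester: sample K vertices without replacement and look for a triangle.
    adj is a dict mapping each vertex to its set of neighbors."""
    vertices = list(adj)
    sample = vertices if len(vertices) <= K else random.sample(vertices, K)
    for a, b, c in combinations(sample, 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            return "eps-far from triangle-free"
    return "triangle-free"

# Toy usage: a 5-cycle (triangle-free) versus a 5-cycle plus one chord (has a triangle).
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(sample_triangle_tester(cycle, K=5))    # always "triangle-free"
chorded = {i: set(cycle[i]) for i in range(5)}
chorded[0].add(2); chorded[2].add(0)
print(sample_triangle_tester(chorded, K=5))  # finds the triangle {0, 1, 2}
```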
Just as how the guarantee of the above algorithm is essentially a rephrasing of the triangle
removal lemma, other graph removal lemmas can be rephrased as graph property testing theorems.
For the infinite induced graph removal lemma, Theorem 2.8.11, we can rephrase the result in terms
of graph property testing for hereditary properties.
A graph property P is hereditary if it is closed under vertex-deletion. That is, if G ∈ P, then every induced subgraph of G is in P. Many common examples of graph properties are hereditary, e.g., H-free, induced-H-free, planar, 3-colorable, perfect. Every hereditary property P is the same as the set of induced-H-free graphs for some (possibly infinite) family of graphs H, e.g., we can take H = {H : H ∉ P}.
Theorem 2.9.3 (Every hereditary graph property is testable). For every hereditary graph property P and constant ε > 0, there exists a constant K = K(P, ε) so that the following algorithm satisfies the probabilistic guarantees listed below.
Algorithm. Input: a graph G.
Sample K vertices from G uniformly at random without replacement and let H be the induced subgraph on these K vertices. If H ∈ P, then output that G satisfies P; else output that G is ε-far from P.
(a) If the input graph G satisfies P, then the algorithm always correctly outputs that G satisfies P;
(b) If the input graph G is ε-far from P, then with probability ≥ 0.99 the algorithm outputs that G is ε-far from P;
(c) We do not make any guarantees when the input graph is neither in P nor ε-far from P.
Proof. If G ∈ P, then since P is hereditary, H ∈ P, and so the algorithm always correctly outputs that G ∈ P. So suppose G is ε-far from P. Let H be such that P is the set of induced-H-free graphs. By the infinite induced graph removal lemma, there is some h0 and δ > 0 so that G has ≥ δn^{v(H)} induced copies of some H ∈ H with at most h0 vertices. So with probability ≥ δ, a sample of h0 vertices sees an induced subgraph not satisfying P. Running K/h0 independent trials, we see some induced subgraph not satisfying P with probability ≥ 1 − (1 − δ)^{K/h0}, which can be made arbitrarily close to 1 by choosing K to be sufficiently large. As before, this implies the result about choosing K random vertices without replacement.
Theorem 2.10.1 (Hypergraph removal lemma). For every r-graph H and ε > 0, there exists δ > 0 so that every n-vertex r-graph with < δn^{v(H)} copies of H can be made H-free by removing < εn^r edges.
Recall Szemerédi’s theorem says that for every fixed k ≥ 3, every k-AP-free subset of [N] has size o(N). We will prove it as a corollary of the hypergraph removal lemma for H = K_k^{(k−1)}, the complete (k − 1)-graph on k vertices (also known as a simplex; when k = 4 it is called a tetrahedron). For concreteness, we will show how the deduction works in the case k = 4 (it is straightforward to generalize).
Here is a corollary of the tetrahedron removal lemma. It is analogous to Corollary 2.3.3.
Corollary 2.10.2. If G is an n-vertex 3-graph such that every edge is contained in a unique tetrahedron (i.e., a clique on four vertices), then G has o(n³) edges.
Proof of Szemerédi’s theorem for 4-APs. Let A ⊂ [N] be 4-AP-free. Let M = 6N + 1. Then A is
also a 4-AP-free subset of Z/MZ (there are no wrap-arounds). Build a 4-partite 3-graph G with
parts W, X, Y , Z, all of which are M-vertex sets indexed by the elements of Z/MZ. We define
edges as follows, where w, x, y, z range over elements of W, X, Y , Z, respectively:
wxy ∈ E(G) ⟺ 3w + 2x + y ∈ A,
wxz ∈ E(G) ⟺ 2w + x − z ∈ A,
wyz ∈ E(G) ⟺ w − y − 2z ∈ A,
xyz ∈ E(G) ⟺ −x − 2y − 3z ∈ A.
What is important here is that the i-th expression does not contain the i-th variable.
The vertices w, x, y, z form a tetrahedron if and only if
3w + 2x + y, 2w + x − z, w − y − 2z, −x − 2y − 3z ∈ A.
However, these values form a 4-AP with common difference −x − y − z − w. Since A is 4-AP-free, the only tetrahedra in G correspond to trivial 4-APs (those with common difference zero). For each triple
(w, x, y) ∈ W × X × Y , there is exactly one z ∈ Z/MZ such that x + y + z + w = 0. Thus, every
edge of the hypergraph lies in exactly one tetrahedron.
By Corollary 2.10.2, the number of edges in the hypergraph is o(M³). On the other hand, the number of edges is exactly 4M²|A| (since, e.g., for every a ∈ A, there are exactly M² triples (w, x, y) ∈ (Z/MZ)³ with 3w + 2x + y = a). Therefore |A| = o(M) = o(N).
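The algebra above is easy to check numerically; the following short Python snippet (a sanity check of our own, with the arbitrary choice N = 100) verifies that the four linear forms always form a 4-AP with common difference −(w + x + y + z) mod M.

```python
import random

M = 6 * 100 + 1   # M = 6N + 1 with N = 100
random.seed(0)
for _ in range(1000):
    w, x, y, z = (random.randrange(M) for _ in range(4))
    values = [(3*w + 2*x + y) % M, (2*w + x - z) % M,
              (w - y - 2*z) % M, (-x - 2*y - 3*z) % M]
    d = (-(w + x + y + z)) % M
    assert all((values[i + 1] - values[i]) % M == d for i in range(3))
print("the four linear forms always form a 4-AP with common difference -(w+x+y+z) mod M")
```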
The hypergraph removal lemma is proved using a substantial and difficult generalization of the
graph regularity method to hypergraphs. We will not be able to prove it in this book. In the next
section, we sketch some key ideas in hypergraph regularity.
It is instructive to work out the proof in the special cases below. For the next two exercises, you may assume Corollary 2.10.2.
Exercise 2.10.3 (3-dimensional corners). Suppose A ⊂ [N]3 contains no four points of the form
(x, y, z), (x + d, y, z), (x, y + d, z), (x, y, z + d), with d > 0.
Show that | A| = o(N 3 ).
Exercise 2.10.4 (Multidimensional Szemerédi for axis-aligned squares). Suppose A ⊂ [N]2 con-
tains no four points of the form
(x, y), (x + d, y), (x, y + d), (x + d, y + d), with d ≠ 0.
Show that | A| = o(N 2 ).
Try generalizing this technique to prove the multidimensional Szemerédi theorem (Theo-
rem 0.2.6) using the hypergraph removal lemma.
2.11. Hypergraph regularity
Hypergraph regularity is substantially more difficult to prove than graph regularity. We only
sketch some key ideas here. For concreteness, we focus our discussion on 3-graphs. Throughout
this section, G will be a 3-graph with vertex set V.
What should correspond to an “-regular pair” from the graph regularity lemma? Here is an
initial attempt.
Definition 2.11.1 (Initial attempt at 3-graph regularity). Given vertex subsets V1, V2, V3 ⊂ V, we say that (V1, V2, V3) is ε-regular if, for all Ai ⊂ Vi such that |Ai| ≥ ε|Vi|, we have
|d(V1, V2, V3) − d(A1, A2, A3)| ≤ ε.
Here, the edge density d(X, Y, Z) is the fraction of elements of X × Y × Z that are edges of G.
By following the proof of the graph regularity lemma nearly verbatim, we can show the
following.
Proposition 2.11.2 (Initial attempt at 3-graph regularity partition). For all ε > 0, there exists M = M(ε) such that every 3-graph has a vertex partition into at most M parts so that all but at most an ε-fraction of triples of vertices lie in ε-regular triples of vertex parts.
Can this result be used to prove the hypergraph removal lemma? Unfortunately, no.
Recall that our graph regularity recipe (Remark 2.3.2) involves three steps: partition, clean, and
count. It turns out that no counting lemma is possible for the above notion of 3-graph regularity.
The notion of ε-regularity is supposed to model pseudorandomness. So why don’t we try truly random hypergraphs and see what happens? Let us consider two different random 3-graph constructions:
(1) First pick constants p, q ∈ [0, 1]. Build a random graph G^(2) = G(n, p), an ordinary Erdős–Rényi graph. Then construct G^(3) by including each triangle of G^(2) as an edge of G^(3) with probability q. Call this 3-graph X.
(2) For each possible edge (i.e., triple of vertices), include the edge with probability p³q, independent of all other edges. Call this 3-graph Y.
The edge density in both X and Y is close to p³q, even when restricted to linearly sized triples of vertex subsets. So both graphs satisfy our above notion of ε-regularity with high probability. However, we can compute the tetrahedron densities in both of these graphs and see that they do not match.
The tetrahedron density in X is around q⁴ times the K4 density in the underlying random graph G^(2). The K4 density in G^(2) is around p⁶. So the tetrahedron density in X is around p⁶q⁴.
On the other hand, the tetrahedron density in Y is around (p³q)⁴ = p¹²q⁴, different from p⁶q⁴ earlier. So we should not expect a counting lemma with this notion of ε-regularity. (Unless the 3-graph we are counting is linear, as in the exercise below.)
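Before turning to the exercise, here is a small Monte Carlo sketch in Python (our own illustration, with arbitrarily chosen parameters) that estimates the tetrahedron densities of the two models X and Y and compares them with p⁶q⁴ and p¹²q⁴.

```python
import random
from itertools import combinations

random.seed(1)
n, p, q = 60, 0.6, 0.7
vertices = range(n)

# Model X: Erdos-Renyi graph G(n, p), then keep each triangle as a 3-edge with prob q.
pair_edge = {frozenset(e): random.random() < p for e in combinations(vertices, 2)}
X = set()
for t in combinations(vertices, 3):
    if all(pair_edge[frozenset(pr)] for pr in combinations(t, 2)) and random.random() < q:
        X.add(frozenset(t))

# Model Y: include each triple independently with probability p^3 * q.
Y = {frozenset(t) for t in combinations(vertices, 3) if random.random() < p**3 * q}

def tetra_density(H):
    quads = list(combinations(vertices, 4))
    count = sum(all(frozenset(t) in H for t in combinations(quad, 3)) for quad in quads)
    return count / len(quads)

triples = len(list(combinations(vertices, 3)))
print("edge density X: %.4f, Y: %.4f (both near p^3*q = %.4f)" %
      (len(X) / triples, len(Y) / triples, p**3 * q))
print("tetrahedron density X: %.4f (near p^6*q^4  = %.4f)" % (tetra_density(X), p**6 * q**4))
print("tetrahedron density Y: %.4f (near p^12*q^4 = %.4f)" % (tetra_density(Y), p**12 * q**4))
```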
Exercise 2.11.3. Under the notion of 3-graph regularity in Definition 2.11.1, formulate and prove
an H-counting lemma for every linear 3-graph H. Here a hypergraph is said to be linear if every
pair of its edges intersects in at most one vertex.
As hinted by the first random hypergraph above, a more useful notion of hypergraph regularity
should involve both vertex subsets as well as subsets of vertex-pairs (i.e., an underlying 2-graph).
Given a 3-graph G, a regularity decomposition will consist of
(1) a partition of the pairs of vertices into 2-graphs G^(2)_1 ∪ · · · ∪ G^(2)_l so that G sits in a random-like way on top of most triples of these 2-graphs (we won’t try to make it precise), and
(2) a partition of V that gives an extremely regular partition for all of the 2-graphs G^(2)_1, . . . , G^(2)_l (this should be somewhat reminiscent of the strong graph regularity lemma from Section 2.8).
For such a decomposition to be applicable, it should come with a corresponding counting
lemma.
There are several ways to make the above notions precise. Certain formulations make the regularity partition easier to prove but the counting lemma harder, and vice versa. The interested reader should consult Rödl et al. (2005), Gowers (2007) (see Gowers (2006) for an
exposition of the case of 3-uniform hypergraphs), and Tao (2006) for three different approaches to
the hypergraph regularity lemma.
Remark 2.11.4 (Quantitative bounds). Whereas the proof of the graph regularity lemma gives tower-type bounds tower(ε^{−O(1)}), the proof of the 3-graph regularity lemma has wowzer-type bounds. The 4-graph regularity lemma moves us one more step up in the Ackermann hierarchy, i.e., iterating wowzer, and so on. Just as with the tower-type lower bound (Theorem 2.1.15) for the graph regularity lemma, Ackermann-type bounds are necessary for hypergraph regularity as well (Moshkovitz and Shapira 2019).
Further reading
For surveys on the graph regularity method and applications, see Komlós and Simonovits (1996)
and Komlós, Shokoufandeh, Simonovits, and Szemerédi (2002).
For a survey on the graph removal lemma, including many variants, extensions, and proof
techniques, see Conlon and Fox (2013).
For a well-motivated introduction to the hypergraph regularity lemma, see Gowers (2006).
CHAPTER 3
Pseudorandom graphs
In the previous chapter, we saw that the graph regularity lemma partitions an arbitrary graph into a bounded number of pieces so that the graph looks “random-like” between most pairs of parts. In this chapter, we dive further into how a graph can be random-like.
Pseudorandomness is a concept prevalent in combinatorics, theoretical computer science, and
in many other areas. It specifies how a non-random object can behave like a truly random object.
Suppose you want to generate a random number on a computer. In most systems and programming languages, you can do this easily with a single command (e.g., rand()). The output is not actually truly random. Instead, it comes from a pseudorandom generator: a function/algorithm that takes a seed as input and passes it through some sophisticated function, so that for all practical purposes the output cannot be distinguished from a truly random one.
In number theory, the prime numbers behave like a random sequence in many ways. The
celebrated Riemann hypothesis and its generalizations give quantitative predictions about how
closely the primes behave in a certain specific way like a random sequence. There is also something
called Cramér’s random model for the primes that allows one to make predictions about the
asymptotic density of certain patterns in the primes (e.g., how many twin primes up to N are
there?). Empirical data support these predictions, and they have been proved in certain cases, but
there are still notorious open problems such as the twin prime and Goldbach conjectures. Despite
their pseudorandom behavior, the primes are not random!
It is very much believed that the digits of π behave in a random-like way, where every digit or block of digits appears with frequency similar to that of a truly random number. Such numbers are called normal. It is widely believed that numbers such as √2, π, and e are normal, but proofs remain elusive. Again, the digits of π are deterministic, not random, but they are believed to behave pseudorandomly.
Coming back to graph theory, we have the Erdős–Rényi model of random graphs, where every
edge occurs independently with some probability. Now, given some specific graph (perhaps an
instance of the random graph model, or perhaps generated via some other means), we can ask
whether this graph, for the purpose of some intended application, behaves similarly to that of a
typical random graph. What are some way to “measure” pseudorandomness? These questions were
studied systematically starting in the late 1980’s in the foundational works of Thomason (1987)
and Chung, Graham, and Wilson (1989). It had an important impact in the field. This is the main
theme that we explore in this chapter.
3.1. Quasirandom graphs
Theorem 3.1.1 (Quasirandom graphs). Let p ∈ [0, 1] be fixed. Let (Gn) be a sequence of graphs with Gn having n vertices and (p + o(1))\binom{n}{2} edges (here n → ∞ along some subsequence of integers, i.e., n is allowed to skip integers). Denote Gn by G. The following properties are all equivalent:
DISC (discrepancy):
e(X, Y ) = p |X | |Y | + o(n2 ) for all X, Y ⊂ V(G).
Here e(X, Y ) = |{(x, y) ∈ X × Y : xy ∈ E(G)}|. The asymptotic notation means that there is some
function f (n) with f (n)/n2 → 0 as n → ∞ ( f may depend on the sequence of graphs) such that e(X, Y )
is always within f (n) of p|X ||Y |.
DISC′:
e(X) = p\binom{|X|}{2} + o(n²) for all X ⊂ V(G).
Here e(X) is the number of edges of G contained in X.
COUNT: For every graph H, the number of labeled copies of H in G is (p^{e(H)} + o(1))n^{v(H)}.
Here a labeled copy of H is the same as an injective map V(H) → V(G) that sends every edge of H to an edge of G. Here the rate at which the o(1) tends to zero is allowed to depend on H.
C4 (4-cycle): The number of labeled 4-cycles is at most (p⁴ + o(1))n⁴.
CODEG (codegree): Letting codeg(u, v) denote the number of common neighbors of u and v,
Σ_{u,v∈V(G)} |codeg(u, v) − p²n| = o(n³).
EIG (eigenvalue): If λ1 ≥ λ2 ≥ · · · ≥ λn are the eigenvalues of the adjacency matrix of G, then λ1 = pn + o(n) and max_{i≠1} |λi| = o(n).
Here the adjacency matrix of G is defined as the matrix with rows and columns both indexed by V(G), and the (u, v)-entry is 1 if uv ∈ E(G), and 0 otherwise.
Definition 3.1.2. We say a sequence of graphs is quasirandom (at edge density p) if it satisfies
the above conditions for some constant p ∈ [0, 1].
Remark 3.1.3. Strictly speaking, it does not make sense to say whether a single graph is quasiran-
dom, but we will abuse the definition as such when it is clear that the graph we are referring to is
part of a sequence.
The C4 condition may be surprising. It says that the 4-cycle density, a single statistic, is equivalent to all the other quasirandomness conditions. We will soon see in Proposition 3.1.12 that the C4 condition can be replaced by the equivalent condition that the number of labeled 4-cycles is (p⁴ + o(1))n⁴ (rather than at most this quantity).
The discrepancy conditions are hard to verify since they involve checking exponentially many
sets. The other conditions can all be checked in time polynomial in the size of the graph. So the
equivalence gives us an algorithmically efficient way to certify the discrepancy condition.
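For instance, here is a short numerical illustration (our own, using NumPy with arbitrarily chosen parameters) of how the C4, CODEG, and EIG statistics can be computed in polynomial time and compared with their quasirandom values on a sample of G(n, p).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 600, 0.3
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # adjacency matrix of a sample of G(n, p)

eig = np.sort(np.linalg.eigvalsh(A))[::-1]
print("lambda_1 / (pn)           =", eig[0] / (p * n))                        # EIG: near 1
print("max_{i>1} |lambda_i| / n  =", max(abs(eig[1]), abs(eig[-1])) / n)      # EIG: near 0

walks4 = np.trace(np.linalg.matrix_power(A, 4))
print("closed 4-walks / (p^4 n^4) =", walks4 / (p**4 * n**4))                 # C4: near 1

codeg = A @ A                                  # codeg[u, v] = number of common neighbors
dev = np.abs(codeg - p**2 * n).sum()
print("codegree deviation / n^3   =", dev / n**3)                             # CODEG: near 0
```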
Remark 3.1.4 (Quantitative equivalences). Rather than stating these properties for a sequence of graphs using a decaying error term o(1), we can state a quantitative quasirandomness hypothesis for a specific graph using an error tolerance parameter ε. For example, we can restate the discrepancy condition as
DISC(ε): For all X, Y ⊂ V(G), |e(X, Y) − p|X||Y|| < εn².
And likewise with the other quasirandom graph notions. The proofs below show that these notions are equivalent up to a polynomial change in ε, i.e., for each pair of properties, Prop1(ε) implies Prop2(ε^c) for some constant c > 0, provided that 0 < ε < 1/2.
Now we give some examples of quasirandom graphs. First let us check that random graphs are
quasirandom (hence justifying the name).
Recall the following basic tail bound for a sum of independent random variables.
Theorem 3.1.5 (Chernoff bound). Let X be a sum of m independent Bernoulli random variables (not necessarily identically distributed). Then for every t > 0,
$$\mathbb{P}(|X - \mathbb{E}X| \ge t) \le 2e^{-t^2/(2m)}.$$
Proposition 3.1.6. Let p ∈ [0, 1] and ε > 0. With probability at least $1 - 2^{n+1} e^{-\varepsilon^2 n^2}$, the Erdős–Rényi random graph G(n, p) has the property that for every vertex subset X,
$$\left| e(X) - p\binom{|X|}{2} \right| \le \varepsilon n^2.$$
Proof. Applying the Chernoff bound to e(X), we see that
$$\mathbb{P}\left( \left| e(X) - p\binom{|X|}{2} \right| > \varepsilon n^2 \right) \le 2\exp\left( \frac{-(\varepsilon n^2)^2}{2\binom{|X|}{2}} \right) \le 2\exp\left( -\varepsilon^2 n^2 \right).$$
The result then follows by taking a union bound over all 2^n subsets X of the n-vertex graph.
Applying the Borel–Cantelli lemma with the above bound, we obtain the following consequence.
Corollary 3.1.7 (Random graphs are quasirandom). Fix p ∈ [0, 1]. With probability 1, a sequence
of random graphs Gn ∼ G(n, p) is quasirandom at edge density p.
It would be somewhat disappointing if the only interesting examples of quasirandom graphs were actual random graphs. Fortunately, we have more explicit constructions. In the rest of the chapter,
we will see several constructions using Cayley graphs on groups. A notable example, which we
will prove in Section 3.3, is that the Paley graph is quasirandom.
Example 3.1.8 (Paley graph). Let p ≡ 1 (mod 4) be a prime. Form a graph with vertex set F p ,
with two vertices x, y joined if x − y is a quadratic residue. Then this graph is quasirandom at edge
density 1/2 as p → ∞. (By a standard fact from elementary number theory, since p ≡ 1 (mod 4),
−1 is a quadratic residue, and hence x − y is a quadratic residue if and only if y − x is. So the graph
is well defined.)
In Section 3.4, we will show that for certain sequence of groups, every sequence of Cayley
graphs on them is quasirandom provided that the edge densities converge. We will call such groups
quasirandom. We will later prove the following important example.
Example 3.1.9 (PSL(2, p)). Let p be a prime. Let S ⊂ PSL(2, p) be a subset of non-identity elements with S = S⁻¹. Let G be the Cayley graph on PSL(2, p) with generator set S, meaning that the vertices are elements of PSL(2, p), and two vertices x, y are adjacent if x⁻¹y ∈ S. Then G is quasirandom as p → ∞ as long as |S|/p³ converges.
Finally, here is an explicit construction using finite geometry. We leave it as an exercise to
verify its quasirandomness using the conditions given earlier.
Example 3.1.10. Let p be a prime. Let S ⊂ F_p ∪ {∞}. Let G be the graph on vertex set F_p² where two points are joined if the slope of the line connecting them lies in S. Then G is quasirandom as p → ∞ as long as |S|/p converges.
Proposition 3.1.12 (Minimum 4-cycle density). Every n-vertex graph with at least pn²/2 edges has at least p⁴n⁴ labeled closed walks of length 4.
Remark 3.1.13. Since all but O(n³) such closed walks use four distinct vertices, the above statement implies that the number of labeled 4-cycles is at least (p⁴ − o(1))n⁴.
Proof. The number of closed walks of length 4 is
$$|\{(w, x, y, z) \text{ closed walk}\}| = \sum_{w,y} |\{x : w \sim x \sim y\}|^2 \ge \frac{1}{n^2}\left( \sum_{w,y} |\{x : w \sim x \sim y\}| \right)^2 = \frac{1}{n^2}\left( \sum_{x} |\{(w, y) : w \sim x \sim y\}| \right)^2 = \frac{1}{n^2}\left( \sum_{x} (\deg x)^2 \right)^2 \ge \frac{1}{n^4}\left( \sum_{x} \deg x \right)^4 = (2e(G))^4 / n^4 \ge p^4 n^4.$$
Here both inequality steps are due to the Cauchy–Schwarz inequality. Each line of the computation can be accompanied by a pictorial depiction of what is being counted by the inner sum. These diagrams are a useful way to keep track of the graph inequalities, especially when dealing with much larger graphs, where the algebraic expressions get unwieldy. Note that each application of the Cauchy–Schwarz inequality corresponds to “folding” the graph along a line of reflection.
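As a quick numerical sanity check (our own, in Python with NumPy), the following snippet evaluates both sides of Proposition 3.1.12 for a decidedly non-random graph, two disjoint cliques, computing the closed-walk count as Σ_{w,y} codeg(w, y)² exactly as in the first line of the proof.

```python
import numpy as np

def closed_4_walks(A):
    # number of closed walks of length 4 = sum over (w, y) of (number of common neighbors)^2
    common = A @ A
    return float((common ** 2).sum())

n = 120
half = n // 2
A = np.zeros((n, n))
A[:half, :half] = 1
A[half:, half:] = 1
np.fill_diagonal(A, 0)            # two disjoint cliques of size n/2

e = A.sum() / 2
p = 2 * e / n**2
print("edge density p            =", p)
print("closed 4-walks            =", closed_4_walks(A))
print("lower bound p^4 n^4       =", (2 * e) ** 4 / n**4)   # the proposition's guarantee
```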
We shall prove the equivalences of Theorem 3.1.1 in the following way:
DISC′ ⟺ DISC ⟹ COUNT ⟹ C4 ⟺ EIG, and C4 ⟹ CODEG ⟹ DISC.
Proof that DISC implies DISC′. Take Y = X in DISC. (Note that e(X, X) = 2e(X) and \binom{|X|}{2} = |X|²/2 − O(n).)
Proof that DISC′ implies DISC. We have the following “polarization identity”, which can be checked by considering the contribution of each edge (recall 2e(X) = e(X, X)):
e(X, Y) = e(X ∪ Y) + e(X ∩ Y) − e(X \ Y) − e(Y \ X).
Applying DISC′ to each of the four terms on the right-hand side, and noting that \binom{|X ∪ Y|}{2} + \binom{|X ∩ Y|}{2} − \binom{|X \setminus Y|}{2} − \binom{|Y \setminus X|}{2} = |X||Y| + O(n), we obtain DISC.
Proof that C4 implies CODEG. We have
$$\sum_{u,v \in V(G)} \mathrm{codeg}(u, v) = \sum_{x \in V(G)} (\deg x)^2 \ge \frac{1}{n}\left( \sum_{x \in V(G)} \deg x \right)^2 = \frac{4e(G)^2}{n} \ge p^2 n^3 - o(n^3).$$
We also have (below, the O(n³) error term is due to walks of length 4 that use repeated vertices)
$$\sum_{u,v} \mathrm{codeg}(u, v)^2 = \#\{\text{labeled } C_4\} + O(n^3) \le p^4 n^4 + o(n^4).$$
Thus, by Cauchy–Schwarz,
$$\frac{1}{n^2}\left( \sum_{u,v} \left| \mathrm{codeg}(u, v) - p^2 n \right| \right)^2 \le \sum_{u,v} \left( \mathrm{codeg}(u, v) - p^2 n \right)^2 = \sum_{u,v} \mathrm{codeg}(u, v)^2 - 2p^2 n \sum_{u,v} \mathrm{codeg}(u, v) + p^4 n^4 \le p^4 n^4 - 2p^2 n \cdot p^2 n^3 + p^4 n^4 + o(n^4) = o(n^4).$$
Hence $\sum_{u,v} |\mathrm{codeg}(u, v) - p^2 n| = o(n^3)$, which is CODEG.
Remark 3.1.14. These calculations share the spirit of the second moment method in probabilistic
combinatorics. The condition C4 says that the variance of the codegree of two random vertices is
small.
Exercise 3.1.15. Show that if we modify the CODEG condition to Σ_{u,v∈V(G)} |codeg(u, v) − p²n| =
Proof that CODEG implies DISC. We first show that the codegree condition implies the concen-
tration of degrees:
$$\frac{1}{n}\left( \sum_{u} |\deg u - pn| \right)^2 \le \sum_{u} (\deg u - pn)^2 = \sum_{u} (\deg u)^2 - 2pn \sum_{u} \deg u + p^2 n^3 = \sum_{x,y} \mathrm{codeg}(x, y) - 4pn\, e(G) + p^2 n^3 = p^2 n^3 - 2p^2 n^3 + p^2 n^3 + o(n^3) = o(n^3). \tag{3.1.1}$$
Now we bound the expression in DISC. We have
$$\frac{1}{n}\,|e(X, Y) - p|X||Y||^2 = \frac{1}{n}\left( \sum_{x \in X} (\deg(x, Y) - p|Y|) \right)^2 \le \sum_{x \in X} (\deg(x, Y) - p|Y|)^2.$$
The above Cauchy–Schwarz step turned all the summands nonnegative, which affords us the next step, expanding the domain of summation from X to all of V = V(G). Continuing,
$$\le \sum_{x \in V} (\deg(x, Y) - p|Y|)^2 = \sum_{x \in V} \deg(x, Y)^2 - 2p|Y| \sum_{x \in V} \deg(x, Y) + p^2 n |Y|^2 = \sum_{y, y' \in Y} \mathrm{codeg}(y, y') - 2p|Y| \sum_{y \in Y} \deg y + p^2 n |Y|^2 = |Y|^2 p^2 n - 2p|Y| \cdot |Y| p n + p^2 n |Y|^2 + o(n^3) = o(n^3),$$
where the penultimate step uses CODEG and (3.1.1). Hence |e(X, Y) − p|X||Y||² ≤ n · o(n³) = o(n⁴), i.e., e(X, Y) = p|X||Y| + o(n²), which is DISC.
Finally, let us consider the graph spectrum, i.e., the multiset of eigenvalues of the graph
adjacency matrix, accounting for eigenvalue multiplicities. Eigenvalues are core to the study of
pseudorandomness and they will play a central role in the rest of this chapter.
In this book, when we talk about the eigenvalues of a graph, we always mean the eigenvalues
of the adjacency matrix of the graph. In other contexts, it may be useful to consider other related
matrices, such as the Laplacian matrix, or a normalized adjacency matrix.
We will generally only consider real symmetric matrices, whose eigenvalues are always all real
(Hermitian matrices also have this property). Our usual convention is to list all the eigenvalues
in order (including multiplicities): λ1 ≥ λ2 ≥ · · · ≥ λn . We refer to λ1 as the top eigenvalue
(or largest eigenvalue), and λi as the i-th eigenvalue (or the i-th largest eigenvalue). The second
eigenvalue plays an important role. We write λi (A) for the i-th eigenvalue of the matrix A and
λi (G) = λi (AG ) where AG is the adjacency matrix of G.
Remark 3.1.16 (Linear algebra review). For every n × n real symmetric matrix A with eigenvalues
λ1 ≥ · · · ≥ λn , we can choose an eigenvector vi ∈ Rn for each eigenvalue λi (so that Avi = λi vi )
and such that {v1, . . . , vn } is an orthogonal basis of Rn (this is false for general non-symmetric
matrices).
The Courant–Fischer min-max theorem is an important characterization of eigenvalues in
terms of a variational problem. Here we only state some consequences most useful for us. We have
$$\lambda_1 = \max_{v \in \mathbb{R}^n \setminus \{0\}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
Once we have fixed a choice of an eigenvector v1 for the top eigenvalue λ1, we have
$$\lambda_2 = \max_{\substack{v \in \mathbb{R}^n \setminus \{0\} \\ v \perp v_1}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
In particular, if G is a d-regular graph, then the all-1 vector, denoted 1 ∈ Rv(G) , is an eigenvector
for the top eigenvalue d.
The Perron–Frobenius theorem tells us some important information about the top eigenvector
and eigenvalue of a nonnegative matrix. For every connected graph G, the top eigenvalue is simple (i.e., has multiplicity one), so that λi < λ1 for all i > 1. We also have |λi| ≤ λ1 for all i (one has
λn = −λ1 if and only if G is bipartite; see Remark 3.1.20 below). Also, the top eigenvector v1
(which is unique up to scalar multiplication) has all coordinates positive.
If G has multiple connected components G1, . . . , G k , then the eigenvalues of G (with multi-
plicities) are obtained by taking a multiset union of the eigenvalues of its connected components.
An orthogonal system of eigenvectors can also be derived as such, by extending each eigenvector
of Gi to an eigenvector of G via padding the eigenvector by zeros outside the vertices of Gi .
Here is a useful formula:
tr A^k = λ1^k + · · · + λn^k.
When A is the adjacency matrix of a graph G, tr A^k counts the number of closed walks of length k. In particular, tr A² = 2e(G).
Proof that EIG implies C4. Let A denote the adjacency matrix of G. The number of labeled 4-cycles is within O(n³) of the number of closed walks of length 4, and the latter equals
$$\operatorname{tr} A^4 = \lambda_1^4 + \cdots + \lambda_n^4 = p^4 n^4 + o(n^4) + \sum_{i=2}^{n} \lambda_i^4.$$
Since $\sum_{i \ge 2} \lambda_i^4 \le \left( \max_{i \ge 2} \lambda_i^2 \right) \sum_{i} \lambda_i^2 = o(n^2) \cdot \operatorname{tr} A^2 = o(n^2) \cdot 2e(G) = o(n^4)$, we get tr A⁴ ≤ p⁴n⁴ + o(n⁴).
Remark 3.1.17. A rookie error would be to bound $\sum_{i \ge 2} \lambda_i^4$ by $n \max_{i \ge 2} \lambda_i^4 = o(n^5)$, but this would not be enough. (Where do we save in the above proof?) We will see a similar situation later in ?? in the Fourier analytic proof of Roth’s theorem.
Lemma 3.1.18. The top eigenvalue of the adjacency matrix of a graph is always at least its average
degree.
Proof. Let 1 ∈ Rn be the all-1 vector. By the Courant–Fischer min-max theorem, the adjacency
matrix A of the graph G has top eigenvalue
$$\lambda_1 = \sup_{x \in \mathbb{R}^n,\, x \ne 0} \frac{\langle x, Ax \rangle}{\langle x, x \rangle} \ge \frac{\langle \mathbf{1}, A\mathbf{1} \rangle}{\langle \mathbf{1}, \mathbf{1} \rangle} = \frac{2e(G)}{v(G)} = \operatorname{avgdeg}(G).$$
Proof that C4 implies EIG. Again writing A for the adjacency matrix,
$$\sum_{i=1}^{n} \lambda_i^4 = \operatorname{tr} A^4 = \#\{\text{closed walks of length 4}\} \le p^4 n^4 + o(n^4).$$
On the other hand, by Lemma 3.1.18 above, we have λ1 ≥ pn + o(n). So we must have λ1 = pn + o(n) and max_{i≥2} |λi| = o(n).
This completes all the implications in the proof of Theorem 3.1.1.
Remark 3.1.19 (Forcing graphs). The C4 condition says that having 4-cycle density asymptotically the same as random implies quasirandomness. Which other graphs besides C4 have this property? Chung, Graham, and Wilson (1989) called a graph F forcing if every graph with edge density p + o(1) and F-density p^{e(F)} + o(1) (i.e., asymptotically the same as random) is automatically quasirandom. Theorem 3.1.1 implies that C4 is forcing. It remains an open problem to determine which graphs are forcing. The forcing conjecture says that F is forcing if and only if F is bipartite and not a tree (Skokan and Thoma 2004; Conlon, Fox, and Sudakov 2010). We will revisit this conjecture in Chapter 5 where we will reformulate it using the language of graphons.
More generally, one says that a family of graphs F is forcing if having F-density p^{e(F)} + o(1) for each F ∈ F implies quasirandomness. So {K2, C4} is forcing. It seems to be a difficult problem to classify forcing families.
Even though many other graphs can potentially play the role of the 4-cycle, the 4-cycle nevertheless occupies an important role in the study of quasirandomness. The 4-cycle comes up naturally in the proofs, as we will see below. It is also closely tied to other important pseudorandomness measurements such as the Gowers U² uniformity norm in additive combinatorics.
Let us formulate a bipartite analogue of Theorem 3.1.1 since we will need it later. It is easy
to adapt the above proofs to the bipartite version—we encourage the readers to think about the
differences between the two settings.
Remark 3.1.20 (Eigenvalues of bipartite graphs). Given a bipartite graph G with vertex bipartition
V ∪ W, we can write its adjacency matrix as
$$A = \begin{pmatrix} 0 & B \\ B^{\top} & 0 \end{pmatrix} \tag{3.1.2}$$
where B is an |V| × |W| matrix with rows indexed by V and columns indexed by W. The eigenvalues λ1 ≥ · · · ≥ λn of A always satisfy
λi = −λ_{n+1−i} for every 1 ≤ i ≤ n.
In other words, the eigenvalues are symmetric around zero. One way to see this is that if x = (v, w) is an eigenvector of A with eigenvalue λ, where v ∈ R^V is the restriction of x to the first |V| coordinates, and w is the restriction of x to the last |W| coordinates, then
$$\begin{pmatrix} \lambda v \\ \lambda w \end{pmatrix} = \lambda x = Ax = \begin{pmatrix} 0 & B \\ B^{\top} & 0 \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} Bw \\ B^{\top} v \end{pmatrix},$$
so that
Bw = λv and B^⊤v = λw.
Then the vector x′ = (v, −w) satisfies
$$Ax' = \begin{pmatrix} 0 & B \\ B^{\top} & 0 \end{pmatrix}\begin{pmatrix} v \\ -w \end{pmatrix} = \begin{pmatrix} -Bw \\ B^{\top} v \end{pmatrix} = \begin{pmatrix} -\lambda v \\ \lambda w \end{pmatrix} = -\lambda x'.$$
So we can pair each eigenvalue of A with its negation.
Exercise 3.1.21. Using the notation from (3.1.2), show that the positive eigenvalues of the adjacency matrix A coincide with the positive singular values of B (the singular values of B are also the positive square roots of the eigenvalues of B^⊤B).
Theorem 3.1.22 (Bipartite quasirandom graphs). Fix p ∈ [0, 1]. Let (Gn)_{n≥1} be a sequence of bipartite graphs. Write Gn as G, with vertex bipartition V ∪ W. Suppose |V|, |W| → ∞ and |E| = (p + o(1))|V||W| as n → ∞. The following properties are all equivalent:
DISC: e(X, Y) = p|X||Y| + o(|V||W|) for all X ⊂ V and Y ⊂ W.
COUNT: For every bipartite graph H with vertex bipartition (S, T), the number of labeled copies of H in G with S embedded in V and T embedded in W is (p^{e(H)} + o(1))|V|^{|S|}|W|^{|T|}.
C4: The number of closed walks of length 4 in G starting in V is at most (p⁴ + o(1))|V|²|W|².
Left-CODEG: Σ_{x,y∈V} |codeg(x, y) − p²|W|| = o(|V|²|W|).
Proposition 3.1.24. Fix p ∈ [0, 1]. With probability 1, a sequence of bipartite random graphs
Gn ∼ G(n, n, p) (obtained by keeping every edge of Kn,n with probability p independently) is
quasirandom in the sense of Theorem 3.1.22.
Remark 3.1.25 (Sparse graphs). We stated quasirandom properties so far only for graphs of constant edge density (i.e., p is a constant). Let us think about what happens if we allow p = pn to depend on n and decay to zero as n → ∞. Such graphs are sometimes called sparse (although some other authors reserve the word “sparse” for bounded degree graphs). Theorems 3.1.1 and 3.1.22 as stated do hold with the constant p = 0, but the results are not as informative as we would like. For example, the error tolerance in DISC is o(n²), which does not tell us much since the graph already has far fewer edges due to its sparseness anyway.
To remedy the situation, the natural thing to do is to adjust the error tolerance relative to the
edge density p = pn → 0. Here are some representative examples (all of these properties should
also depend on p):
Sparse-DISC: |e(X, Y ) − p |X | |Y || = o(pn2 ) for all X, Y ⊂ V(G).
Sparse-COUNTH : The number of labeled copies of H is (1 + o(1))pe(H) nv(H) .
Sparse-C4 : The number of labeled 4-cycles is at most (1 + o(1))p4 n4 .
Sparse-EIG: λ1 = (1 + o(1))pn and maxi,1 |λi | = o(pn).
Warning: these sparse pseudorandomness conditions are not all equivalent to each other. Some
of the implications still hold (the reader is encouraged to think about which ones). However, some
crucial implications such as the counting lemma fail quite miserably. For example:
Sparse-DISC does not imply Sparse-COUNT.
Indeed, suppose p = n^{−c} for some constant 1/2 < c < 1. In a typical random graph G(n, p), the number of triangles is close to \binom{n}{3}p³, while the number of edges is close to \binom{n}{2}p. We have p³n³ = o(pn²) as long as p = o(n^{−1/2}), so there are significantly fewer triangles than there are edges. Now remove an edge from every triangle in this random graph. We will have removed o(pn²) edges, a negligible fraction of the (p + o(1))\binom{n}{2} edges, and this edge removal should not significantly affect Sparse-DISC. However, we have changed the triangle count significantly as a result.
Fortunately, this is not the end of the story. With additional hypotheses on the sparse graph, we
can sometimes salvage a counting lemma. Sparse counting lemmas play an important role in the
proof of the Green–Tao theorem on arithmetic progressions in the primes, as we will explain in ??.
The next several exercises ask you to prove additional equivalent quasirandomness properties.
It is easy to verify that the quasirandom graphs indeed satisfy each of the properties below.
Exercise 3.1.26∗ (Quasirandomness through fixed sized subsets). Fix p ∈ [0, 1]. Let (G n ) be a
sequence of graphs with v(Gn ) = n (here n → ∞ along a subsequence of integers).
(1) Fix a single α ∈ (0, 1). Suppose
e(S) = pα²n²/2 + o(n²) for all S ⊂ V(G) with |S| = ⌊αn⌋.
Prove that G is quasirandom.
(2) Fix a single α ∈ (0, 1/2). Suppose
e(S, V(G) \ S) = pα(1 − α)n² + o(n²) for all S ⊂ V(G) with |S| = ⌊αn⌋.
Prove that G is quasirandom. Furthermore, show that the conclusion is false for α = 1/2.
Exercise 3.1.27 (Quasirandomness and regularity partitions). Fix p ∈ [0, 1]. Let (Gn) be a sequence of graphs with v(Gn) → ∞. Suppose that for every ε > 0, there exists M = M(ε) so that each Gn has an ε-regular partition where all but an ε-fraction of vertex pairs lie between pairs of parts with edge density p + o(1) (as n → ∞). Prove that Gn is quasirandom.
Exercise 3.1.28∗ (Triangle counts on induced subgraphs). Fix p ∈ [0, 1]. Let (Gn) be a sequence of graphs with v(Gn) = n. Let G = Gn. Suppose that for every S ⊂ V(G), the number of triangles in the induced subgraph G[S] is p³\binom{|S|}{3} + o(n³). Prove that G is quasirandom.
Exercise 3.1.29∗ (Perfect matchings). Prove that there are constants β, ε > 0 such that for every positive even integer n and real p ≥ n^{−β}, if G is an n-vertex graph where every vertex has degree (1 ± ε)pn (meaning within εpn of pn) and every pair of vertices has codegree (1 ± ε)p²n, then G has a perfect matching.
3.2. Expander mixing lemma
We dive further into the relationship between graph eigenvalues and its pseudorandomness
properties. We focus on d-regular graphs since they occur often in practice (e.g., from Cayley
graphs), and they are also cleaner to work with. Unlike the previous section, the results here are
effective for any value of d (not just when d is on the same order as n).
As we saw earlier, the magnitudes of eigenvalues are related to the pseudorandomness of a
graph. In a d-regular graph, the top eigenvalue is always exactly d. The following condition says
that all other eigenvalues are bounded by λ in absolute value.
Definition 3.2.1. An (n, d, λ)-graph is an n-vertex, d-regular graph whose adjacency matrix eigenvalues d = λ1 ≥ · · · ≥ λn satisfy
max_{i≠1} |λi| ≤ λ.
Remark 3.2.2 (Notation). Rather than saying, e.g., “an (n, 7, 6)-graph,” we prefer to say “an (n, d, λ)-
graph with d = 7 and λ = 6” for clarity as the name “(n, d, λ)” is quite standard and recognizable.
Remark 3.2.3 (Linear algebra review). The operator norm of a matrix A ∈ R^{m×n} is defined by
$$\|A\| = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{|Ax|}{|x|} = \sup_{\substack{x \in \mathbb{R}^n \setminus \{0\} \\ y \in \mathbb{R}^m \setminus \{0\}}} \frac{\langle y, Ax \rangle}{|x||y|}.$$
Here |x| = √⟨x, x⟩ denotes the length of the vector x. The operator norm of A is the maximum factor by which A can amplify the length of a vector. If A is a real symmetric matrix, then
$$\|A\| = \max_i |\lambda_i(A)|.$$
For general matrices, the operator norm of A equals its largest singular value.
Here is the main result of this section.
Theorem 3.2.4 (Expander mixing lemma). Let G be an (n, d, λ)-graph. Then
$$\left| e(X, Y) - \frac{d}{n}|X||Y| \right| \le \lambda \sqrt{|X||Y|} \quad \text{for all } X, Y \subset V(G).$$
Proof. Let J denote the n × n all-1 matrix, and for a vertex subset X write 1_X ∈ R^{V(G)} for its indicator vector, so that e(X, Y) = ⟨1_X, A_G 1_Y⟩ and |X||Y| = ⟨1_X, J 1_Y⟩. Since G is d-regular, the all-1 vector is a common eigenvector of A_G and (d/n)J with eigenvalue d, so the eigenvalues of A_G − (d/n)J are obtained from those of A_G by replacing the one top eigenvalue d by zero. All the other eigenvalues of A_G − (d/n)J are therefore at most λ in absolute value, so ‖A_G − (d/n)J‖ ≤ λ. Therefore,
$$\left| e(X, Y) - \frac{d}{n}|X||Y| \right| = \left| \left\langle 1_X, \left( A_G - \frac{d}{n}J \right) 1_Y \right\rangle \right| \le \left\| A_G - \frac{d}{n}J \right\| \, |1_X|\,|1_Y| \le \lambda \sqrt{|X||Y|}.$$
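Here is a small numerical check of the expander mixing lemma (a sketch of our own, in Python with NumPy) on the Paley graph from Example 3.1.8, whose eigenvalues are computed in Section 3.3; the observed ratio never exceeds 1, as the lemma guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 101                                   # prime with p = 1 (mod 4)
qr = {(x * x) % p for x in range(1, p)}   # nonzero quadratic residues
A = np.zeros((p, p))
for x in range(p):
    for y in range(p):
        if x != y and (x - y) % p in qr:
            A[x, y] = 1                   # Paley graph: (p-1)/2 regular

d = int(A[0].sum())
eig = np.sort(np.linalg.eigvalsh(A))[::-1]
lam = max(abs(eig[1]), abs(eig[-1]))      # the (n, d, lambda)-graph parameter lambda

worst = 0.0
for _ in range(2000):                     # random vertex subsets X and Y
    X = rng.random(p) < rng.random()
    Y = rng.random(p) < rng.random()
    if X.sum() == 0 or Y.sum() == 0:
        continue
    lhs = abs(A[np.ix_(X, Y)].sum() - d / p * X.sum() * Y.sum())
    worst = max(worst, lhs / (lam * np.sqrt(X.sum() * Y.sum())))
print("d =", d, " lambda =", round(lam, 3))
print("largest observed |e(X,Y) - (d/n)|X||Y|| / (lambda*sqrt(|X||Y|)) =", round(worst, 3))
```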
Exercise 3.2.5. Prove the following strengthening of the expander mixing lemma.
Theorem 3.2.9 (Bipartite expander mixing lemma). Let G be a bipartite (n, d, λ)-graph with vertex bipartition V ∪ W. Then
$$\left| e(X, Y) - \frac{d}{n}|X||Y| \right| \le \lambda \sqrt{|X||Y|} \quad \text{for all } X \subset V \text{ and } Y \subset W.$$
Exercise 3.2.10. Prove Theorem 3.2.9.
Remark 3.2.11. The following partial converse to the expander mixing lemma was shown by Bilu
and Linial (2006). The extra log factor turns out to be necessary.
Theorem 3.2.12 (Converse to expander mixing lemma). There exists an absolute constant C such that if G is a d-regular graph, and β satisfies
$$\left| e(X, Y) - \frac{d}{n}|X||Y| \right| \le \beta \sqrt{|X||Y|} \quad \text{for all } X, Y \subset V(G),$$
then G is an (n, d, λ)-graph with λ ≤ Cβ log(2d/β).
Remark 3.2.13 (Edge expansion versus spectral gap). Let us mention another important theorem
relating the eigenvalues and expansion. The spectral gap is defined to be the difference between
the two most significant eigenvalues, i.e., λ1 − λ2 for the adjacency matrix of a graph. This quantity
turns out to be closely related to expansion in graphs. We define the edge-expansion ratio of a
graph G = (V, E) to be the quantity
$$h(G) := \min_{\substack{S \subset V \\ 0 < |S| \le |V|/2}} \frac{e_G(S, V \setminus S)}{|S|}.$$
In other words, a graph with edge-expansion ratio at least h has the property that for every nonempty
subset of vertices S with |S| ≤ |V | /2, there are at least h |S| edges leaving S.
Cheeger’s inequality, stated below, tells us that among d-regular graphs for a fixed d, having
spectral gap bounded away from zero is equivalent to having edge-expansion ratio bounded away
from zero. Cheeger (1970) originally developed this inequality for Riemannian manifolds. The
graph theoretic analogue was proved by Dodziuk (1984), and independently by Alon and Milman
(1985) and Alon (1986).
Theorem 3.2.14 (Cheeger’s inequality). Let G be an n-vertex d-regular graph with adjacency matrix spectral gap κ = d − λ2. Then its edge-expansion ratio h = h(G) satisfies
$$\kappa/2 \le h \le \sqrt{2d\kappa}.$$
The two bounds of Cheeger’s inequality are tight up to constant factors. For the lower bound, taking G to be the skeleton of the d-dimensional cube with vertex set {0, 1}^d gives h = 1 (achieved by a (d − 1)-dimensional subcube) and κ = 2. For the upper bound, taking G to be an n-cycle gives h = 2/⌊n/2⌋ = Θ(1/n) while d = 2 and κ = 2 − 2 cos(2π/n) = Θ(1/n²).
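A quick numerical check of both inequalities for the n-cycle (a sketch of our own, in Python), using d = 2, κ = 2 − 2 cos(2π/n), and h = 2/⌊n/2⌋ as noted above:

```python
import numpy as np

for n in [10, 50, 200, 1000]:
    d = 2
    kappa = d - 2 * np.cos(2 * np.pi / n)   # spectral gap of the n-cycle
    h = 2 / (n // 2)                        # edge-expansion ratio (cut into two arcs)
    print(f"n={n:5d}  kappa/2={kappa/2:.5f}  h={h:.5f}  sqrt(2*d*kappa)={np.sqrt(2*d*kappa):.5f}")
    assert kappa / 2 <= h <= np.sqrt(2 * d * kappa)
```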
We call a family of d-regular graphs expanders if there is some constant κ0 > 0 so that each
graph in the family has spectral gap ≥ κ0 ; by Cheeger’s inequality, this is equivalent to the existence
of some h0 > 0 so that each graph in the family has edge expansion ratio ≥ h0 . Expander graphs are
important objects in mathematics and computer science. For example, expander graphs have rapid
mixing properties, which are useful for designing efficient Monte Carlo algorithms for sampling
and estimation.
The following direction of Cheeger’s inequality is easier to prove. It is similar to the expander
mixing lemma.
Exercise 3.2.15 (Spectral gap implies expansion). Prove the κ/2 ≤ h part of Cheeger’s inequality.
The other direction, h ≤ √(2dκ), is more difficult and interesting. The proof is outlined in the following exercise.
Exercise 3.2.16 (Expansion implies spectral gap). Let G = (V, E) be a d-regular graph with
spectral gap κ. Let x = (xv )v∈V ∈ RV be an eigenvector associated to the second largest eigenvalue
λ2 = d − κ of the adjacency matrix of G. Assume that xv > 0 on at most half of the vertex set (or
else we replace x by −x). Let y = (yv )v∈V ∈ RV be obtained from x by replacing all its negative
coordinates by zero.
(a) Prove that
$$d - \frac{\langle y, Ay \rangle}{\langle y, y \rangle} \le \kappa.$$
Hint: recall that $\lambda_2 x_v = \sum_{u \sim v} x_u$.
(b) Let
$$\Theta = \sum_{uv \in E} \left| y_u^2 - y_v^2 \right|.$$
Prove that
$$\Theta^2 \le 2d \left( d \langle y, y \rangle - \langle y, Ay \rangle \right) \langle y, y \rangle.$$
Hint: $y_u^2 - y_v^2 = (y_u - y_v)(y_u + y_v)$. Apply Cauchy–Schwarz.
(c) Relabel the vertex set V by [n] so that $y_1 \ge y_2 \ge \cdots \ge y_t > 0 = y_{t+1} = \cdots = y_n$. Prove that
$$\Theta = \sum_{k=1}^{t} (y_k^2 - y_{k+1}^2)\, e([k], [n] \setminus [k]).$$
Example 3.3.4. Cay(F_2^n, {e1, . . . , en}) is the skeleton of an n-dimensional cube. Here ei is the i-th standard basis vector. The graphs for n = 1, 2, 3, 4 are illustrated below.
Here is an explicitly constructed family of quasirandom graphs with edge density 1/2 + o(1).
Definition 3.3.5. Given prime p ≡ 1 (mod 4), the Paley graph of order p is Cay(Z/pZ, S), where
S is the set of non-zero quadratic residues in Z/pZ (here Z/pZ is viewed as an additive group).
Example 3.3.6. The Paley graphs for p = 5 and p = 13 are shown below.
Theorem 3.3.8 (Eigenvalues of abelian Cayley graphs on Z/nZ). Let n be a positive integer. Let S ⊂ Z/nZ with 0 ∉ S and S = −S. Let
ω = exp(2πi/n).
Then we have an orthonormal basis v0, . . . , v_{n−1} ∈ C^n of eigenvectors of Cay(Z/nZ, S), where v_j ∈ C^n has x-coordinate ω^{jx}/√n, for each x ∈ Z/nZ.
The eigenvalue associated to the eigenvector v_j equals
$$\lambda_j = \sum_{s \in S} \omega^{js}.$$
In particular, λ0 = |S| and v0 has all coordinates equal to 1/√n.
Remark 3.3.9 (Eigenvalues and the Fourier transform). The coordinates of the eigenvectors are shown below.
$$\begin{array}{c|ccccc}
 & 0 & 1 & 2 & \cdots & n-1 \\ \hline
\sqrt{n}\, v_0 & 1 & 1 & 1 & \cdots & 1 \\
\sqrt{n}\, v_1 & 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \\
\sqrt{n}\, v_2 & 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(n-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\sqrt{n}\, v_{n-1} & 1 & \omega^{n-1} & \omega^{2(n-1)} & \cdots & \omega^{(n-1)^2}
\end{array}$$
Viewed as a matrix, this is sometimes known as the discrete Fourier transform matrix. We will study the Fourier transform in Chapter 6. These two topics are closely tied. The eigenvalues of an abelian Cayley graph Cay(Γ, S) are precisely the Fourier transform in Γ of the generating set S, up to normalizing factors:
eigenvalues of Cay(Γ, S) ⟷ Fourier transform $\widehat{1_S}$ in Γ.
We will say more about this in Remark 3.3.11 below.
Proof. Let A be the adjacency matrix of Cay(Z/nZ, S). First we check that each v_j is an eigenvector of A with eigenvalue λ_j. The coordinate of √n·Av_j at x ∈ Z/nZ equals
$$\sum_{s \in S} \omega^{j(x+s)} = \sum_{s \in S} \omega^{js}\, \omega^{jx} = \lambda_j \omega^{jx}.$$
So Av_j = λ_j v_j.
Next we check that {v0, . . . , v_{n−1}} is an orthonormal basis. We have the inner product
$$\langle v_j, v_k \rangle = \frac{1}{n}\left( 1 \cdot 1 + \omega^j \overline{\omega^k} + \omega^{2j}\overline{\omega^{2k}} + \cdots + \omega^{(n-1)j}\overline{\omega^{(n-1)k}} \right) = \frac{1}{n}\left( 1 + \omega^{j-k} + \omega^{2(j-k)} + \cdots + \omega^{(n-1)(j-k)} \right) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases}$$
For the j ≠ k case, we use that for any m-th root of unity ζ ≠ 1, $\sum_{i=0}^{m-1} \zeta^i = 0$. So {v0, . . . , v_{n−1}} is an orthonormal basis.
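The eigenvalue formula is easy to verify numerically; the following Python snippet (a check of our own, with an arbitrary choice of n and S) compares the formula λ_j = Σ_{s∈S} ω^{js} against the eigenvalues of the circulant adjacency matrix computed by NumPy.

```python
import numpy as np

n = 20
S = {1, 3, 17, 19}                        # S = -S and 0 not in S
A = np.zeros((n, n))
for x in range(n):
    for s in S:
        A[x, (x + s) % n] = 1             # adjacency matrix of Cay(Z/nZ, S)

omega = np.exp(2j * np.pi / n)
formula = sorted(sum((omega ** (j * s)).real for s in S) for j in range(n))
numerical = sorted(np.linalg.eigvalsh(A))
print(np.allclose(formula, numerical))    # True: eigenvalues are sum_{s in S} omega^{js}
```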
Remark 3.3.10 (Real vs complex eigenbases). The adjacency matrix of a graph is a real symmetric matrix, so all its eigenvalues are real, and it always has a real orthogonal eigenbasis. The eigenbasis given in Theorem 3.3.8 is complex, but it can always be made real. Looking at the formulas in Theorem 3.3.8, we have λ_j = λ_{n−j}, and v_j is the complex conjugate of v_{n−j}. So we can form a real orthogonal eigenbasis by replacing, for each j ∉ {0, n/2}, the pair (v_j, v_{n−j}) by ((v_j + v_{n−j})/√2, i(v_j − v_{n−j})/√2). Equivalently, we can separate the real and imaginary parts of each v_j, which are both eigenvectors with eigenvalue λ_j. All the real eigenvalues and eigenvectors can be expressed in terms of sines and cosines.
Remark 3.3.11 (Every abelian Cayley graph has an eigenbasis independent of the generators). The above theorem and its proof generalize to all finite abelian groups, not just Z/nZ. For every finite abelian group Γ, we have a set Γ̂ of characters, i.e., homomorphisms χ : Γ → C^×. Then Γ̂ turns out to be a group isomorphic to Γ (one can check this by first writing Γ as a direct product of cyclic groups). For each χ ∈ Γ̂, define the vector v_χ ∈ C^Γ by setting the coordinate at g ∈ Γ to be χ(g)/√|Γ|. Then {v_χ : χ ∈ Γ̂} is an orthonormal basis for the adjacency matrix of every Cayley graph on Γ. The eigenvalue corresponding to v_χ is λ_χ(S) = Σ_{s∈S} χ(s). Up to normalization, λ_χ(S) is the Fourier transform of the indicator function of S on the abelian group Γ (Theorem 3.3.8 is a special case of this construction). In particular, this eigenbasis {v_χ : χ ∈ Γ̂} depends only on the finite abelian group and not on the generating set S. In other words, we have a simultaneous diagonalization for all adjacency matrices of Cayley graphs on a fixed finite abelian group.
If Γ is a non-abelian group, then there does not exist a simultaneous eigenbasis for all Cayley
graphs on Γ. There is a corresponding theory of non-abelian Fourier analysis, which uses group
representation theory. We will discuss more about non-abelian Cayley graphs in Section 3.4.
Now we apply the above formula to compute eigenvalues of Paley graphs. In particular, the
following tells us that Paley graphs satisfy the quasirandomness condition EIG from Theorem 3.1.1.
Theorem 3.3.12 (Eigenvalues of Paley graphs). Let p ≡ 1 (mod 4) be a prime. The adjacency matrix of the Paley graph of order p has top eigenvalue (p − 1)/2, and all other eigenvalues are either (√p − 1)/2 or (−√p − 1)/2.
Proof. Applying Theorem 3.3.8, we see that the eigenvalues are given by, for j = 0, 1, . . . , p − 1,
$$\lambda_j = \sum_{s \in S} \omega^{js} = \frac{1}{2}\left( -1 + \sum_{x \in \mathbb{F}_p} \omega^{jx^2} \right),$$
since each quadratic residue s appears as x² for exactly two non-zero x. Clearly λ0 = (p − 1)/2. For j ≠ 0, the next result shows that the inner sum on the right-hand side is ±√p (note that the above sum is real when p ≡ 1 (mod 4) since S = −S and so the sum equals its own complex conjugate; alternatively, the sum must be real since all eigenvalues of a symmetric matrix are real).
Remark 3.3.13. Since the trace of the adjacency matrix is zero, and equals the sum of the eigenvalues, we see that the non-top eigenvalues are equally split between (√p − 1)/2 and (−√p − 1)/2.
Theorem 3.3.14 (Gauss sum). Let p be an odd prime, ω = exp(2πi/p), and j ∈ F_p \ {0}. Then
$$\left| \sum_{x \in \mathbb{F}_p} \omega^{jx^2} \right| = \sqrt{p}.$$
Proof. We have
$$\left| \sum_{x \in \mathbb{F}_p} \omega^{jx^2} \right|^2 = \sum_{x, y \in \mathbb{Z}/p\mathbb{Z}} \omega^{j((x+y)^2 - x^2)} = \sum_{x, y \in \mathbb{Z}/p\mathbb{Z}} \omega^{j(2xy + y^2)}.$$
For each fixed y, we have
$$\sum_{x \in \mathbb{Z}/p\mathbb{Z}} \omega^{j(2xy + y^2)} = \begin{cases} p & \text{if } y = 0, \\ 0 & \text{if } y \ne 0. \end{cases}$$
Summing over y yields the claim.
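A quick numerical verification (our own, in Python) of the Gauss sum identity for a few small primes p ≡ 1 (mod 4), also confirming that the sum is real in this case, as used in the proof of Theorem 3.3.12.

```python
import cmath

for p in [5, 13, 17, 29]:                       # small primes p = 1 (mod 4)
    omega = cmath.exp(2j * cmath.pi / p)
    for j in range(1, p):
        s = sum(omega ** ((j * x * x) % p) for x in range(p))
        assert abs(abs(s) - p ** 0.5) < 1e-6    # |Gauss sum| = sqrt(p)
        assert abs(s.imag) < 1e-6               # and it is real when p = 1 (mod 4)
print("checked: |sum_x omega^(j x^2)| = sqrt(p) for all j != 0")
```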
Remark 3.3.15 (Sign of the Gauss sum). The determination of the sign is a more difficult problem. Gauss conjectured the sign in 1801 and it took him four years to prove it. When j is a nonzero quadratic residue mod p, the inner sum above turns out to equal √p if p ≡ 1 (mod 4) and i√p if p ≡ 3 (mod 4). When j is a quadratic non-residue, it is −√p and −i√p in the two cases respectively. For a proof, see, e.g., Ireland and Rosen (1990, Section 6.4).
Exercise 3.3.16. Let p be an odd prime and A, B ⊂ Z/pZ. Show that
|Σ_{a∈A} Σ_{b∈B} ((a + b)/p)| ≤ √(p |A| |B|),
where (a/p) is the Legendre symbol defined by (a/p) = 0 if a ≡ 0 (mod p), (a/p) = 1 if a is a nonzero quadratic residue mod p, and (a/p) = −1 if a is a quadratic nonresidue mod p.
Exercise 3.3.17. Prove that in a Paley graph of order p, every clique has size at most √p.
Exercise 3.3.18 (No spectral gap if too few generators). Prove that for every ε > 0 there is some c > 0 such that for every S ⊂ Z/nZ with 0 ∉ S = −S and |S| ≤ c log n, the second largest eigenvalue of the adjacency matrix of Cay(Z/nZ, S) is at least (1 − ε)|S|.
Exercise 3.3.19∗. Let p be a prime and let S be a multiplicative subgroup of F_p^×. Suppose −1 ∈ S. Prove that all eigenvalues of the adjacency matrix of Cay(Z/pZ, S), other than the top one, are at most √p in absolute value.
Theorem 3.4.3. Let Γ be a group of order n with no non-trivial representations of dimension less than K. Then every d-regular Cayley graph on Γ is an (n, d, λ)-graph for some λ < √(dn/K).
More generally, we will prove the result for vertex-transitive graphs, of which Cayley graphs are a special case.
Definition 3.4.4 (vertex-transitive graphs). Let G be a graph. An automorphism of G is a permu-
tation of V(G) that induces an isomorphism of G to itself (i.e., sending edges to edges). Let Γ be a
group of automorphisms of G (not necessarily the whole automorphism group). We say that Γ acts
vertex-transitively on G if for every pair v, w ∈ V(G) there is some g ∈ Γ such that gv = w. We say that G is a vertex-transitive graph if the automorphism group of G acts vertex-transitively on G.
In particular, every group Γ acts vertex-transitively on its Cayley graph Cay(Γ, S) by left-
multiplication: the action of g ∈ Γ sends each vertex x ∈ Γ to gx ∈ Γ, which sends each edge
(x, xs) to (gx, gxs), for all x ∈ Γ and s ∈ S.
Theorem 3.4.5. Let Γ be a finite group with no non-trivial representations of dimension less than K. Then every n-vertex d-regular graph that admits a vertex-transitive Γ action is an (n, d, λ)-graph with λ < √(dn/K).
Note that √(dn/K) ≤ n/√K, so that a sequence of such Cayley graphs is quasirandom (Definition 3.1.2) as long as K → ∞ as n → ∞.
Proof. Let A denote the adjacency matrix of the graph, whose vertices are indexed by {1, . . . , n}.
Each g ∈ Γ gives a permutation (g(1), . . . , g(n)) of the vertex set, which induces a representation
of Γ on Cn given by permuting coordinates, sending v = (v1, . . . , vn ) ∈ Cn to gv = (vg(1), . . . , vg(n) ).
We know that the all-1 vector 1 is an eigenvector of A with eigenvalue d. Let v ∈ Rn be
an eigenvector of A with eigenvalue µ such that v ⊥ 1. Since each g ∈ Γ induces a graph
automorphism, Av = µv implies A(gv) = µgv (since g relabels vertices in an isomorphically
indistinguishable way).
Since Γv = {gv : g ∈ Γ} is Γ-invariant, its C-span W is a Γ-invariant subspace (i.e., gW ⊂ W
for all g ∈ Γ), and hence a sub-representation of Γ. Since v is not a constant vector and Γ acts vertex-transitively, the Γ-action on v is non-trivial. So W is a non-trivial representation of Γ. Hence dim W ≥ K by hypothesis.
Every nonzero vector in W is an eigenvector of A with eigenvalue µ. It follows that µ appears as
an eigenvalue of A with multiplicity at least K. Recall that we also have an eigenvalue d from the
eigenvector 1. Thus
d² + K µ² ≤ Σ_{j=1}^{n} λ_j(A)² = tr A² = nd.
Therefore
|µ| ≤ √(d(n − d)/K) < √(dn/K).
The above proof can be modified to prove a bipartite version, which will be useful for certain
applications.
Given a finite group Γ and a subset S ⊂ Γ (not necessarily symmetric), we define the bipartite
Cayley graph BiCay(Γ, S) as the bipartite graph with vertex set Γ on both parts, with an edge
joining g on the left with gs on the right for every g ∈ Γ and s ∈ S.
Theorem 3.4.6. Let Γ be a group of order n with no non-trivial representations of dimension less than K. Let S ⊂ Γ with |S| = d. Then the bipartite Cayley graph BiCay(Γ, S) is a bipartite-(n, d, λ)-graph for some λ < √(nd/K).
In other words, the second largest eigenvalue of the adjacency matrix of this bipartite Cayley graph is less than √(nd/K).
Exercise 3.4.7. Prove Theorem 3.4.6.
As an application of the expander mixing lemma, we show that in a quasirandom group, the number of solutions to xy = z with x, y, z lying in three given sets X, Y, Z ⊂ Γ is close to what one should predict from density alone. Note that the right-hand side expression below is relatively small compared to the main term when K is large compared to |Γ|³/(|X| |Y| |Z|) (e.g., if X, Y, Z each occupy at least a constant proportion of the group, and K tends to infinity).
Theorem 3.4.8 (Mixing in quasirandom groups). Let Γ be a finite group with no non-trivial repre-
sentations of dimension less than K. Let X, Y, Z ⊂ Γ. Then
|#{(x, y, z) ∈ X × Y × Z : xy = z} − |X| |Y| |Z|/|Γ|| < √(|X| |Y| |Z| |Γ|/K).
Proof. Every solution to x y = z, with (x, y, z) ∈ X × Y × Z corresponds to an edge (x, z) in
BiCay(Γ, Y ) between vertex subset X on the left and vertex subset Z on the right.
[Figure: the bipartite Cayley graph BiCay(Γ, Y), with X on the left, Z on the right, and an edge from x to z whenever y = x⁻¹z ∈ Y.]
Thus the count in question equals e(X, Z) in BiCay(Γ, Y). By Theorem 3.4.6, BiCay(Γ, Y) is a bipartite-(|Γ|, |Y|, λ)-graph with λ < √(|Γ| |Y|/K), so the bipartite version of the expander mixing lemma gives the claimed bound.
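As a sanity check of Theorem 3.4.8, here is a rough Python sketch counting solutions to xy = z for random subsets of the alternating group A_5. It assumes the standard fact that the smallest non-trivial representation of A_5 has dimension 3 (so K = 3 below); the subset sizes are arbitrary illustrative choices.

```python
import itertools, random, math

def parity(p):
    # number of inversions of the permutation p, mod 2
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]) % 2

group = [p for p in itertools.permutations(range(5)) if parity(p) == 0]   # A_5, order 60

def mul(p, q):
    # composition (p*q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(5))

n, K = len(group), 3        # K = 3 is an assumed fact about A_5 (smallest non-trivial irrep)
rng = random.Random(0)
X, Y, Z = (set(rng.sample(group, 30)) for _ in range(3))

count = sum(1 for x in X for y in Y if mul(x, y) in Z)
main_term = len(X) * len(Y) * len(Z) / n
error_bound = math.sqrt(len(X) * len(Y) * len(Z) * n / K)
print(count, main_term, error_bound, abs(count - main_term) < error_bound)
```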
Corollary 3.4.9 (Product-free sets). Let Γ be a finite group with no non-trivial representations of
dimension less than K. Let X, Y, Z ⊂ Γ. If there is no solution to x y = z with (x, y, z) ∈ X × Y × Z,
then
|X| |Y| |Z| < |Γ|³/K.
In particular, every product-free X ⊂ Γ (product-free meaning that there is no solution to xy = z with x, y, z ∈ X) has size less than |Γ|/K^{1/3}.
Proof. If there is no solution to xy = z, then the left-hand side of the inequality in Theorem 3.4.8
is |X | |Y | |Z | /|Γ|. Rearranging gives the result.
The above result already shows that all product-free subsets of a quasirandom group must be small. This sharply contrasts with the abelian setting. For example, in Z/nZ (written additively), there is a sum-free subset of size around n/3 consisting of all group elements strictly between n/3 and 2n/3.
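For instance, the following few lines of Python (with n = 30 as an arbitrary illustration) confirm that this "middle third" of Z/nZ is indeed sum-free.

```python
n = 30
S = set(range(n // 3 + 1, (2 * n) // 3))        # elements strictly between n/3 and 2n/3
assert all((x + y) % n not in S for x in S for y in S)
print(len(S), "sum-free elements out of", n)
```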
Exercise 3.4.10 (Growth and expansion in quasirandom groups). Let Γ be a finite group with no
non-trivial representations of dimension less than K. Let X, Y, Z ⊂ Γ. Suppose |X| |Y| |Z| ≥ |Γ|³/K. Then XYZ = Γ (i.e., every element of Γ can be expressed as xyz for some (x, y, z) ∈
X × Y × Z).
Now let us see some examples of quasirandom groups.
Example 3.4.11 (Quasirandom groups). Here are some examples of groups with no small non-
trivial representations.
(a) A classic result of Frobenius from around 1900 shows that every non-trivial representation
of PSL(2, p) has dimension at least (p − 1)/2 for all prime p. A short proof is included
below. Jordan (1907) and Schur (1907) computed the character tables for PSL(2, q) for all
prime power q. In particular, we know that every non-trivial representation of PSL(2, q) has
dimension ≥ (q − 1)/2 for all prime power q.
(b) The alternating group A_m for m ≥ 7 has order n = m!/2, and its smallest non-trivial representation has dimension m − 1 = Θ(log n/log log n). The representations of symmetric and alternating
groups have a nice combinatorial description using Young diagrams. See, e.g., Sagan (2001)
or Fulton and Harris (1991) for expository accounts of this theory.
(c) Gowers (2008, Theorem 4.7) gives an elementary proof that in every non-cyclic simple group of order n, the smallest non-trivial representation has dimension at least √(log n/2).
Recall that the special linear group SL(2, p) is the group of 2 × 2 matrices (under multiplication)
with determinant 1:
SL(2, p) = {(a b; c d) : a, b, c, d ∈ F_p, ad − bc = 1}.
The projective special linear group PSL(2, p) is a quotient of SL(2, p) by all scalars, i.e.,
PSL(2, p) = SL(2, p)/{±I} .
The following result is due to Frobenius.
Theorem 3.4.12 (PSL(2, p) is quasirandom). Let p be a prime. Then all non-trivial representations
of SL(2, p) and PSL(2, p) have dimension at least (p − 1)/2.
Proof. The claim is trivial for p = 2, so we can assume that p is odd. It suffices to prove the claim
for SL(2, p). Indeed, any non-trivial representation of PSL(2, p) can be made into a representation
of SL(2, p) by first passing through the quotient SL(2, p) → SL(2, p)/{±I} = PSL(2, p).
Now suppose ρ is a non-trivial representation of SL(2, p). The group SL(2, p) is generated by
the elements (Exercise: check!)
g = (1 1; 0 1)   and   h = (1 0; −1 1).
These two elements are conjugate in SL(2, p) via z = (1 1; −1 0), as gz = zh. If ρ(g) = I, then ρ(h) = I by conjugation, and ρ would be trivial since g and h generate the group. So ρ(g) ≠ I. Since g^p = I, we have ρ(g)^p = I. So ρ(g) is diagonalizable (here we use that a matrix is diagonalizable if and only if its minimal polynomial has distinct roots, and that the minimal polynomial of ρ(g) divides X^p − 1). Since ρ(g) ≠ I, ρ(g) has an eigenvalue λ ≠ 1. Since ρ(g)^p = I, λ is a primitive p-th root of unity.
For every a ∈ F_p^×, g is conjugate to
(a 0; 0 a⁻¹)(1 1; 0 1)(a⁻¹ 0; 0 a) = (1 a²; 0 1) = g^{a²}.
Thus ρ(g) is conjugate to ρ(g)^{a²}. Hence these two matrices have the same set of eigenvalues. So λ^{a²} is an eigenvalue of ρ(g) for every a ∈ F_p^×, and ranging over all a ∈ F_p^× gives (p − 1)/2 distinct eigenvalues of ρ(g) (recall that λ is a primitive p-th root of unity, and the squares a² take exactly (p − 1)/2 distinct values). It follows that dim ρ ≥ (p − 1)/2.
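The two ingredients of this proof — that g and h generate SL(2, p), and the conjugation identity used above — can be checked by brute force for small p. The following Python sketch (illustrative only; p = 7 is an arbitrary choice) does both.

```python
p = 7

def mat_mul(A, B):
    # 2x2 matrix multiplication over F_p, with matrices as tuples of tuples
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)) for i in range(2))

g = ((1, 1), (0, 1))
h = ((1, 0), (p - 1, 1))
I = ((1, 0), (0, 1))

# closure of {g, h} under multiplication: the subgroup they generate
elems, frontier = {I}, [I]
while frontier:
    new = []
    for A in frontier:
        for B in (g, h):
            C = mat_mul(A, B)
            if C not in elems:
                elems.add(C)
                new.append(C)
    frontier = new
print(len(elems), p * (p * p - 1))     # |SL(2,p)| = p(p^2 - 1)

# check diag(a, a^{-1}) g diag(a^{-1}, a) = g^{a^2} for every a in F_p^*
for a in range(1, p):
    ainv = pow(a, p - 2, p)
    lhs = mat_mul(mat_mul(((a, 0), (0, ainv)), g), ((ainv, 0), (0, a)))
    assert lhs == ((1, (a * a) % p), (0, 1))
print("conjugation identity verified")
```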
Applying Corollary 3.4.9 with Theorem 3.4.12 yields the following corollary (Gowers 2008).
Note that the order of PSL(2, p) is (p³ − p)/2.
Corollary 3.4.13. The largest product-free subset of PSL(2, p) has size O(p^{8/3}).
In particular, there exist infinitely many groups of order n whose largest product-free subset
has size O(n8/9 ).
Before Gowers’ work, it was not known whether every order n group has a product-free subset
of size ≥ cn for some absolute constant c > 0 (this was Question 3.4.1, asked by Babai and Sós).
Gowers’ result shows that the answer is no.
In the other direction, Kedlaya (1997; 1998) showed that every finite group of order n has a product-free subset of size ≳ n^{11/14}. In fact, he showed that if the group has a proper subgroup H of index m, then there is a product-free subset that is a union of ≳ m^{1/2} cosets of H.
The above results tell us that having no small non-trivial representations is a useful property of
groups. Gowers further showed that this group representation theoretic property is equivalent to
several other characterizations of the group.
Theorem 3.4.14 (Quasirandom groups). Let Γn be a sequence of finite groups of increasing order.
The following are equivalent:
REP: The dimension of the smallest non-trivial representation of Γn tends to infinity.
GRAPH: Every sequence of bipartite Cayley graphs on Γn , as n → ∞, is quasirandom in the
sense of Theorem 3.1.22.
PRODFREE: The largest product-free subset of Γn has size o(|Γn |).
(X ⊂ Γn is product-free if there is no solution to x y = z with x, y, z ∈ X)
QUOTIENT: For every proper normal subgroup H of Γn , the quotient Γn /H is nonabelian and
has order tending to infinity as n → ∞.
Theorem 3.4.15. Let Γ be a group with a non-trivial representation of dimension K. Then Γ has
a product-free subset of size at least c |Γ| /K, where c > 0 is some absolute constant.
To see that REP implies QUOTIENT, note that any non-trivial representation of Γ/H is
automatically a representation of Γ after passing through the quotient. Furthermore, every non-
trivial abelian group has a non-trivial 1-dimensional representation, and every group of order m > 1 has a non-trivial representation of dimension < √m. For the proof of the converse, see Gowers
(2008, Theorem 4.8). (This implication has an exponential dependence of parameters.)
Remark 3.4.16 (Non-abelian Fourier analysis). (This is an advanced remark and can be skipped
over.) Section 3.3 discussed the Fourier transform on finite abelian groups. The topic of this section
can be alternatively viewed through the lenses of the non-abelian Fourier transform. We refer to,
e.g., Wigderson (2012), for a tutorial on the non-abelian Fourier transform from a combinatorial
perspective.
Let us give here the recipe for computing the eigenvalues and an orthonormal basis of eigen-
vectors of Cay(Γ, S).
For each irreducible representation ρ of Γ, form the matrix M_ρ := Σ_{s∈S} ρ(s), viewed as a dim ρ × dim ρ matrix over C. Then M_ρ has dim ρ eigenvalues λ_{ρ,1}, . . . , λ_{ρ,dim ρ}. The eigenvalues of the adjacency matrix of Cay(Γ, S) are obtained by listing each λ_{ρ,i} with multiplicity dim ρ, ranging over all irreducible representations ρ and all 1 ≤ i ≤ dim ρ. To emphasize, the eigenvalues always come in bundles with multiplicities determined by the dimensions of the irreducible representations of Γ (although it is possible for there to be additional coalescence of eigenvalues).
One can additionally recover a system of eigenvectors of Cay(Γ, S). For each eigenvector v with eigenvalue λ of M_ρ, and every w ∈ C^{dim ρ}, define x^{ρ,v,w} ∈ C^Γ with coordinates
x_g^{ρ,v,w} = ⟨ρ(g)v, w⟩
for all g ∈ Γ. Then x^{ρ,v,w} is an eigenvector of Cay(Γ, S) with eigenvalue λ. Now let ρ range over all irreducible representations of Γ, let v range over an orthonormal basis of eigenvectors of M_ρ (with λ the corresponding eigenvalue), and let w range over an orthonormal basis of C^{dim ρ}; then x^{ρ,v,w} ranges over an orthogonal system of eigenvectors of Cay(Γ, S). The eigenvalue associated to x^{ρ,v,w} is λ.
A basic theorem in representation theory tells us that the regular representation decomposes
into a direct sum of dim ρ copies of ρ ranging over every irreducible representation ρ of Γ.
This decomposition then corresponds to a block diagonalization (simultaneously for all S) of the
adjacency matrix of Cay(Γ, S) into blocks Mρ , repeated dim ρ times, for each ρ. The above statement
comes from interpreting this block diagonalization.
The matrix Mρ , appropriately normalized, is the non-abelian Fourier transform of the indi-
cator vector of S at ρ. Many basic and important formulas for Fourier analysis over abelian groups,
e.g, inversion and Parseval (which we will see in Chapter 6) have nonabelian analogs.
Proposition 3.5.2. Every regular graph satisfying Sparse-EIG(ε) also satisfies Sparse-DISC(ε).
Proof. In an (n, d, λ)-graph with λ ≤ εd, by the expander mixing lemma (Theorem 3.2.4), for all vertex subsets X and Y,
|e(X, Y) − (d/n)|X| |Y|| ≤ λ√(|X| |Y|) ≤ εd√(|X| |Y|) ≤ εdn.
So the graph satisfies Sparse-DISC(ε).
The converse fails badly. Consider the disjoint union of a large random d-regular graph and a K_{d+1} (here d = o(n)).
[Figure: a large random d-regular graph together with a disjoint copy of K_{d+1}.]
This graph satisfies Sparse-DISC(o(1)) since it is satisfied by the large component, and the small component K_{d+1} contributes negligibly to the discrepancy due to its size. On the other hand, each connected component contributes an eigenvalue of d (by taking the all-1 vector supported on each component), and so Sparse-EIG(ε) fails for any ε < 1.
The main result of this section is that despite the above example, if we restrict ourselves to Cayley graphs (abelian or non-abelian), Sparse-DISC(ε) and Sparse-EIG(ε) are always equivalent up to a linear change in ε. This result is due to Conlon and Zhao (2017).
As in Section 3.4, we prove the result more generally for vertex-transitive graphs.
A pedagogical reason to show the proof of this theorem is to showcase the following important
inequality from functional analysis due to Grothendieck (1953).
Given a matrix A = (a_{i,j}) ∈ R^{m×n}, we can consider its ℓ^∞ → ℓ^1 norm
sup_{‖y‖_∞ ≤ 1} ‖Ay‖_{ℓ^1},
which can also be written as (exercise: check! Also see Lemma 4.5.3 for a related fact about the cut norm of graphons)
sup_{x∈{−1,1}^m, y∈{−1,1}^n} ⟨x, Ay⟩ = sup_{x_1,...,x_m ∈{−1,1}, y_1,...,y_n ∈{−1,1}} Σ_{i=1}^{m} Σ_{j=1}^{n} a_{i,j} x_i y_j. (3.5.1)
Now compare this with the analogous quantity in which the signs x_i, y_j are replaced by vectors:
sup Σ_{i=1}^{m} Σ_{j=1}^{n} a_{i,j} ⟨x_i, y_j⟩, (3.5.2)
where the supremum is taken over vectors x_1, . . . , x_m, y_1, . . . , y_n in the unit ball of some real Hilbert space, whose norm is denoted by ‖·‖. Without loss of generality, we can assume that these vectors lie in R^{m+n} with the usual Euclidean norm (here m + n dimensions are enough since x_1, . . . , x_m, y_1, . . . , y_n span a real subspace of dimension at most m + n).
We always have
(3.5.1) ≤ (3.5.2)
by restricting the vectors in (3.5.2) to R. The latter expression (3.5.2) is called a semidefinite relaxation since it can also be written as the supremum of Σ_{i,j} a_{i,j} M_{i,j} over all positive semidefinite matrices with diagonal entries at most 1 (taking M_{i,j} = ⟨x_i, y_j⟩ as part of the Gram matrix of x_1, . . . , x_m, y_1, . . . , y_n).
Theorem 3.5.5 (Grothendieck’s inequality). There exists a constant K > 0 (K = 1.8 works) such that for all matrices A = (a_{i,j}) ∈ R^{m×n},
sup_{‖x_i‖, ‖y_j‖ ≤ 1} Σ_{i=1}^{m} Σ_{j=1}^{n} a_{i,j} ⟨x_i, y_j⟩ ≤ K sup_{x_i, y_j ∈{±1}} Σ_{i=1}^{m} Σ_{j=1}^{n} a_{i,j} x_i y_j,
where the left-hand side supremum is taken over vectors x_1, . . . , x_m, y_1, . . . , y_n in the unit ball of some real Hilbert space.
Remark 3.5.6. The optimal constant K is known as the real Grothendieck’s constant. Its exact
value is unknown. It is known to lie within [1.676, 1.783]. There is also a complex version of
Grothendieck’s inequality, where the left-hand side uses a complex Hilbert space (and place an
absolute value around the final sum). The corresponding complex Grothendieck’s constant is
known to lie within [1.338, 1.405].
We will not prove Grothendieck’s inequality here. See Alon and Naor (2006) for three proofs
of the inequality, along with algorithmic discussions.
Now let us use Grothendieck’s inequality to show that Sparse-DISC(ε) implies Sparse-EIG(8ε) for vertex-transitive graphs.
Proof of Theorem 3.5.3. Let G be an n-vertex d-regular graph with a vertex-transitive group Γ of
automorphisms. Suppose G satisfies Sparse-DISC(ε). Let A be the adjacency matrix of G. Write
B = A − (d/n) J,
where J is the n × n all-1 matrix. To show that G is an (n, d, λ)-graph with λ ≤ 8εd, it suffices to show that B has operator norm ‖B‖ ≤ 8εd (here we are using that G is d-regular, so the all-1 eigenvector of A with eigenvalue d becomes an eigenvector of B with eigenvalue 0).
For any X, Y ⊂ V(G), the corresponding indicator vectors x = 1 X ∈ Rn and y = 1Y ∈ Rn satisfy,
by Sparse-DISC(ε),
|⟨x, By⟩| = |e(X, Y) − (d/n)|X| |Y|| ≤ εdn.
Then, for any x, y ∈ {−1, 1}^n, we can write x = x⁺ − x⁻ and y = y⁺ − y⁻ with x⁺, x⁻, y⁺, y⁻ ∈ {0, 1}^n. Since
⟨x, By⟩ = ⟨x⁺, By⁺⟩ − ⟨x⁺, By⁻⟩ − ⟨x⁻, By⁺⟩ + ⟨x⁻, By⁻⟩,
and each term on the right-hand side is at most εdn in absolute value, we have
|⟨x, By⟩| ≤ 4εdn for all x, y ∈ {−1, 1}^n. (3.5.3)
For any x = (x_1, . . . , x_n) ∈ R^n and j ∈ [n], write
x^j := √(n/|Γ|) (x_{g(j)})_{g∈Γ} ∈ R^Γ.
For every unit vector x ∈ R^n, the vector x^j ∈ R^Γ is a unit vector, since x_1² + · · · + x_n² = 1 and the map g ↦ g(j) is (|Γ|/n)-to-1 for each j. Similarly define y^j for any y ∈ R^n and j ∈ [n]. Furthermore, B_{i,j} = B_{g(i),g(j)} for any g ∈ Γ and i, j ∈ [n], due to g being a graph automorphism.
To prove the operator norm bound ‖B‖ ≤ 8εd, it suffices to show that ⟨x, By⟩ ≤ 8εd for every pair of unit vectors x, y ∈ R^n. We have
⟨x, By⟩ = Σ_{i,j=1}^{n} B_{i,j} x_i y_j = (1/|Γ|) Σ_{g∈Γ} Σ_{i,j=1}^{n} B_{g(i),g(j)} x_{g(i)} y_{g(j)}
= (1/|Γ|) Σ_{g∈Γ} Σ_{i,j=1}^{n} B_{i,j} x_{g(i)} y_{g(j)} = (1/n) Σ_{i,j=1}^{n} B_{i,j} ⟨x^i, y^j⟩ ≤ 8εd.
The final step follows from Grothendieck’s inequality (applied with K ≤ 2) along with (3.5.3).
This completes the proof of Sparse-EIG(8ε).
We will see two different proofs. The first proof (Nilli 1991) constructs an eigenvector explicitly.
The second proof (only for Corollary 3.6.3) uses the trace method to bound moments of the
eigenvalues via counting closed walks.
Lemma 3.6.4. Let G = (V, E) be a d-regular graph. Let A be the adjacency matrix of G. Let r be
a positive integer. Let st be an edge of G. For each i ≥ 0, let Vi denote the set of all vertices at
distance exactly i from {s, t} (so that in particular V_0 = {s, t}). Let x = (x_v)_{v∈V} ∈ R^V be the vector with coordinates
x_v = (d − 1)^{−i/2} if v ∈ V_i and i ≤ r, and x_v = 0 otherwise (i.e., if dist(v, {s, t}) > r).
Then
⟨x, Ax⟩/⟨x, x⟩ ≥ 2√(d − 1) (1 − 1/(r + 1)).
[Figure: the sets V_0 = {s, t}, V_1, V_2, V_3, . . . at increasing distance from the edge st, with x_v equal to 1, (d − 1)^{−1/2}, (d − 1)^{−1}, (d − 1)^{−3/2}, . . . on successive levels.]
Proof. Let L = dI − A (this is called the Laplacian matrix of G). The claim can be rephrased as
an upper bound on hx, L xi /hx, xi. Here is an important and convenient formula (it can be easily
proved by expanding):
⟨x, Lx⟩ = Σ_{uv∈E} (x_u − x_v)².
Since xv is constant for all v in the same Vi , we only need to consider edges spanning consecutive
Vi ’s. Using the formula for x, we obtain
⟨x, Lx⟩ = Σ_{i=0}^{r−1} e(V_i, V_{i+1}) ((d − 1)^{−i/2} − (d − 1)^{−(i+1)/2})² + e(V_r, V_{r+1})/(d − 1)^r.
For each i ≥ 0, each vertex in Vi has at most d − 1 neighbors in Vi+1 , so e(Vi, Vi+1 ) ≤ (d − 1) |Vi |.
Thus continuing from above,
≤ Σ_{i=0}^{r−1} |V_i| (d − 1) ((d − 1)^{−i/2} − (d − 1)^{−(i+1)/2})² + |V_r| (d − 1)/(d − 1)^r
= (√(d − 1) − 1)² Σ_{i=0}^{r−1} |V_i|/(d − 1)^i + |V_r| (d − 1)/(d − 1)^r
= (d − 2√(d − 1)) Σ_{i=0}^{r} |V_i|/(d − 1)^i + (2√(d − 1) − 1) |V_r|/(d − 1)^r.
We have |Vi+1 | ≤ (d − 1) |Vi | for every i ≥ 0, so that |Vr | (d − 1)−r ≤ |Vi | (d − 1)−i for each i ≤ r.
So continuing,
≤ (d − 2√(d − 1) + (2√(d − 1) − 1)/(r + 1)) Σ_{i=0}^{r} |V_i|/(d − 1)^i
= (d − 2√(d − 1) + (2√(d − 1) − 1)/(r + 1)) ⟨x, x⟩.
It follows that
⟨x, Ax⟩/⟨x, x⟩ = d − ⟨x, Lx⟩/⟨x, x⟩ ≥ 2√(d − 1) − (2√(d − 1) − 1)/(r + 1) ≥ (1 − 1/(r + 1)) · 2√(d − 1).
Proof of the Alon–Boppana bound (Theorem 3.6.2). Let V = V(G). Let 1 be the all-1's vector, which is an eigenvector with eigenvalue d. To prove the theorem, it suffices to exhibit a nonzero vector z ⊥ 1 such that
⟨z, Az⟩/⟨z, z⟩ ≥ 2√(d − 1) − o(1).
Let r be an arbitrary positive integer. When n is sufficiently large, there exist two edges st and s′t′ in the graph at distance at least 2r + 2 apart (indeed, the number of vertices within distance k of an edge is at most 2(1 + (d − 1) + (d − 1)² + · · · + (d − 1)^k)). Let x ∈ R^V be the vector constructed as in Lemma 3.6.4 for st, and let y ∈ R^V be the corresponding vector constructed for s′t′. Recall that x is supported on vertices within distance r of st, and likewise with y and s′t′. Since st and s′t′ are at distance at least 2r + 2 apart, the support of x is at distance at least 2 from the support of y. Thus
⟨x, y⟩ = 0 and ⟨x, Ay⟩ = 0.
Choose a constant c ∈ R such that z = x − cy has sum of its entries equal to zero (this is possible
since hy, 1i > 0). Then
⟨z, z⟩ = ⟨x, x⟩ + c²⟨y, y⟩,
and so by Lemma 3.6.4,
⟨z, Az⟩ = ⟨x, Ax⟩ + c²⟨y, Ay⟩ ≥ (1 − 1/(r + 1)) · 2√(d − 1) (⟨x, x⟩ + c²⟨y, y⟩) = (1 − 1/(r + 1)) · 2√(d − 1) ⟨z, z⟩.
Taking r → ∞ as n → ∞ gives the theorem.
Remark 3.6.5. The above proof cleverly considers distance from an edge rather than from a single
vertex. This is important for a rather subtle reason. Why does the proof fail if we had instead
considered distance from a vertex?
Now let us give another proof—actually we will only prove the slightly weaker statement of
Corollary 3.6.3, which is equivalent to
max {|λ_2|, |λ_n|} ≥ 2√(d − 1) − o(1). (3.6.1)
As a warmup, let us first prove (3.6.1) with √d − o(1) on the right-hand side. We have
dn = 2e(G) = tr A² = Σ_{i=1}^{n} λ_i² ≤ d² + (n − 1) max {|λ_2|, |λ_n|}².
So
max {|λ_2|, |λ_n|} ≥ √(d(n − d)/(n − 1)) = √d − o(1)
as n → ∞ for fixed d.
To prove (3.6.1), we consider higher moments tr Ak . This is a useful technique, sometimes
called the trace method or the moment method.
Alternative proof of (3.6.1). The quantity
tr A^{2k} = Σ_{i=1}^{n} λ_i^{2k}
counts the number of closed walks of length 2k on G. Let Td denote the infinite d-regular tree.
Observe that
# closed length-2k walks in G starting from a fixed vertex
≥ # closed length-2k walks in Td starting from a fixed vertex.
Indeed, at each vertex, for both G and Td , we can label its d incident edges arbitrarily from 1 to d
(the labels assigned from the two endpoints of the same edge do not have to match). Then every
closed length-2k walk in Td corresponds to a distinct closed length-2k walk in G by tracing the
same outgoing edges at each step (why?). Note that not all closed walks in G arise this way (e.g.,
walks that go around cycles in G).
The number of closed walks of length 2k on the infinite d-regular tree T_d starting at a fixed root is at least (d − 1)^k C_k, where C_k = (2k choose k)/(k + 1) is the k-th Catalan number. To see this, note that each
step in the walk is either “away from the root” or “towards the root.” We record a sequence by
denoting steps of the former type by + and of the latter type by −.
Then the number of valid sequences permuting k +’s and k −’s is exactly counted by the Catalan
number Ck , as the only constraint is that there can never be more −’s than +’s up to any point in
the sequence. Finally, there are at least d − 1 choices for where to step in the walk at any + (there
are d choices at the root), and exactly one choice for each −.
Thus, the number of closed walks of length 2k in G is at least
tr A^{2k} ≥ n (d − 1)^k C_k ≥ (n/(k + 1)) (2k choose k) (d − 1)^k.
On the other hand, we have
tr A^{2k} = Σ_{i=1}^{n} λ_i^{2k} ≤ d^{2k} + (n − 1) max {|λ_2|, |λ_n|}^{2k}.
Thus,
max {|λ_2|, |λ_n|}^{2k} ≥ (1/(k + 1)) (2k choose k) (d − 1)^k − d^{2k}/(n − 1).
Taking 2k-th roots, letting n → ∞ (with d and k fixed) so that the last term vanishes, and then letting k → ∞ while using (2k choose k) ≥ 4^k/(2k + 1), yields (3.6.1).
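The walk-counting lower bound tr A^{2k} ≥ n(d − 1)^k C_k is easy to test numerically. The following Python sketch (illustrative only; the circulant graph and the values of n, d, k are arbitrary choices) compares the two sides for a d-regular Cayley graph on Z/nZ.

```python
import numpy as np
from math import comb

n, d, k = 200, 6, 5
S = [s for j in range(1, d // 2 + 1) for s in (j, n - j)]   # S = {±1, ±2, ±3}: a 6-regular circulant
A = np.zeros((n, n))
for i in range(n):
    for s in S:
        A[i, (i + s) % n] = 1

eig = np.linalg.eigvalsh(A)
trace_2k = float(np.sum(eig ** (2 * k)))         # number of closed walks of length 2k
catalan_k = comb(2 * k, k) // (k + 1)
tree_bound = n * (d - 1) ** k * catalan_k        # the lower bound from the tree argument
print(trace_2k, tree_bound, trace_2k >= tree_bound)
```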
Conjecture 3.6.9 (Existence of Ramanujan graphs). For every positive integer d ≥ 3, there exist
infinitely many d-regular Ramanujan graphs.
While it is not too hard to construct small Ramanujan graphs, e.g., Kd+1 has eigenvalues λ1 = d
and λ2 = · · · = λn = −1, it is a difficult problem to construct infinitely many d-regular Ramanujan
graphs for each d.
The term Ramanujan graphs was coined by Lubotzky, Phillips, and Sarnak (1988), who
constructed infinite families of d-regular Ramanujan graphs when d − 1 is an odd prime. The same
result was independently proved by Margulis (1988). The proof of the eigenvalue bounds uses
deep results from number theory, namely solutions to the Ramanujan conjecture (hence the name).
These constructions were later extended by Morgenstern (1994) whenever d − 1 is a prime power.
The current state of Conjecture 3.6.9 is given below, and it remains open for all other d, with the
smallest open case being d = 7.
Theorem 3.6.10. If d − 1 is a prime power, then there exist infinitely many d-regular Ramanujan
graphs.
All known results are based on explicit constructions using Cayley graphs on PSL(2, q) or
related groups. We refer the reader to the book Davidoff, Sarnak, and Valette (2003) for a gentle
exposition of the construction.
Theorem 3.6.7 says that random d-regular graphs are “nearly-Ramanujan.” Empirical evidence
suggests that for each fixed d, a uniform random n-vertex d-regular graph is Ramanujan with
probability bounded away from 0 and 1, for large n. If this were true, it would prove Conjecture 3.6.9
on the existence of Ramanujan graphs. However, no rigorous results are known in this vein.
One can formulate a bipartite analog.
Definition 3.6.11. A bipartite Ramanujan graph is a bipartite-(n, d, λ)-graph with λ = 2√(d − 1).
Given a Ramanujan graph G, we can turn it into a bipartite Ramanujan graph G × K_2. So the existence of bipartite Ramanujan graphs is weaker than that of Ramanujan graphs. Nevertheless, for a long time, it was not known how to construct infinite families of bipartite Ramanujan graphs other than using Ramanujan graphs. A breakthrough by Marcus, Spielman, and Srivastava (2015) completely settled the bipartite version of the problem. Unlike earlier constructions of Ramanujan graphs, their proof is existential (i.e., non-constructive) and introduces an important technique of interlacing families of polynomials.
Theorem 3.6.12 (Bipartite Ramanujan graphs of every degree). For every d ≥ 3, there exist infin-
itely many d-regular bipartite Ramanujan graphs.
Exercise 3.6.13 (Alon–Boppana bound with multiplicity). Prove that for every positive integer d and real ε > 0, there is some constant c > 0 so that every n-vertex d-regular graph has at least cn eigenvalues greater than 2√(d − 1) − ε.
Exercise 3.6.14∗ (Net removal decreases top eigenvalue). Show that for every d and r, there is some ε > 0 such that if G is a d-regular graph, and S ⊂ V(G) is such that every vertex of G is within distance r of S, then the top eigenvalue of the adjacency matrix of G − S (i.e., remove S and its incident edges from G) is at most d − ε.
Further reading
The survey Pseudo-random Graphs by Krivelevich and Sudakov (2006) discusses many com-
binatorial aspects of this topic.
Expander graphs are a large and intensely studied topic, partly due to many important appli-
cations in computer science. See the survey Expander Graphs and Their Applications by Hoory,
Linial, and Wigderson (2006). The survey Expander Graphs in Pure and Applied Mathematics by
Lubotzky (2012) as well as his book Discrete Groups, Expanding Graphs and Invariant Measures
(1994) go more in-depth into graph expansion and connections to algebra.
For spectral graph theory, see the book Spectral Graph Theory by Chung (1997), or the book
draft currently in progress by Spielman titled Spectral and Algebraic Graph Theory.
The textbook Elementary Number Theory, Group Theory and Ramanujan Graphs by Davidoff,
Sarnak, and Valette (2003) gives a gentle introduction to the construction of Ramanujan graphs.
The breakthrough by Marcus, Spielman, and Srivastava (2015) constructing bipartite Ramanu-
jan graphs via interlacing polynomials is an instant classic.
CHAPTER 4
Graph limits
The theory of graph limits was developed by Lovász and his collaborators in a series of works
starting around 2003. The researchers were motivated by questions about very large graphs from
several different angles, including from combinatorics, statistical physics, computer science, and
applied math. Graph limits give an analytic framework for analyzing large graphs. The theory
offers both a convenient mathematical language as well as powerful theorems.
Suppose we lived in a hypothetical world where we only had access to rational numbers and
had no language for irrational numbers. We are given the following optimization problem:
minimize x³ − x subject to 0 ≤ x ≤ 1.
The minimum occurs at x = 1/√3, but this answer does not make sense over the rationals. With
only access to rationals, we can state a progressively improving sequence of answers that converge
to the optimum. This is rather cumbersome. It is much easier to write down a single real number
expressing the answer.
Now consider an analogous question for graphs. Fix some real p ∈ [0, 1]. We want to
minimize (# closed walks of length 4)/n⁴
among n-vertex graphs with edge density ≥ p.
We know from Proposition 3.1.12 that every n-vertex graph with edge density ≥ p has at least p⁴n⁴ closed walks of length 4. On the other hand, every sequence of quasirandom graphs with edge density p + o(1) has p⁴n⁴ + o(n⁴) closed walks of length 4. It follows that the minimum (or rather, infimum) is p⁴, and is attained not by any single graph, but rather by a sequence of quasirandom graphs.
One of the purposes of graph limits is to provide an easy-to-use mathematical object that
captures the limit of such graph sequences. The central object in the theory of graph limits is called a
graphon (the word comes from combining graph and function), to be defined shortly. Graphons
can be viewed as an analytic generalization of graphs.
Here are some questions that we will consider:
(1) What does it mean for a sequence of graphs (or graphons) to converge?
(2) Are different notions of convergence equivalent?
(3) Does every convergent sequence of graphs (or graphons) have a limit?
Note that it is possible to talk about convergence without a limit. In a first real analysis
course, one learns about a Cauchy sequence in a metric space (X, d), which is some sequence
x1, x2, · · · ∈ X such that for every ε > 0, there is some N so that d(x_m, x_n) < ε for all m, n ≥ N. For instance, one can have a Cauchy sequence without a limit in Q. A metric space is complete if every Cauchy sequence has a limit. The completion of X is some complete metric space X̃ such that X is isometrically embedded in X̃ as a dense subset. The completion of X is in some sense the smallest complete space containing X. For example, R is the completion of Q. Intuitively, the
completion of a space fills in all its gaps. A basic result in analysis says that every space has a
unique completion.
Here is a key result about graph limits that we will prove:
The space of graphons is compact, and is the completion of graphs.
To make this statement precise, we also need to define a notion of similarity (i.e., distance) between
graphs, and also between graphons. We will see two different notions, one based on the cut metric,
and another based on subgraph densities. Another important result in the theory of graph limits
is that these two notions are equivalent. We will prove it at the end of the chapter once we have
developed some tools.
4.1. Graphons
Here is the central object in the theory of dense graph limits.
Definition 4.1.1. A graphon is a symmetric measurable function W : [0, 1]2 → [0, 1]. Here
symmetric means W(x, y) = W(y, x) for all x, y.
Remark 4.1.2. More generally, we can consider an arbitrary probability space Ω and study sym-
metric measurable functions Ω × Ω → [0, 1]. In practice, we do not lose much by restricting to
[0, 1].
We will also sometimes consider symmetric measurable functions [0, 1]2 → R (e.g., arising as
the difference between two graphons). Such an object is sometimes called a kernel in the literature.
Remark 4.1.3 (Measure theoretic technicalities). We try to sweep measure theoretic technicalities
under the rug so that we can focus on the key ideas. We always ignore measure zero differences.
For example, we shall treat two graphons as the same if they only differ on a measure zero subset
of the domain.
Here is a procedure to turn any graph G into a graphon WG :
(1) Write down the adjacency matrix AG of the graph;
(2) Replace the matrix by a black and white pixelated picture on [0, 1]2 , by turning every 1 entry
into a black square and every 0 entry into a white square.
(3) View the resulting picture as a graphon WG : [0, 1]2 → [0, 1] (with the axis labeled like a
matrix, i.e., x ∈ [0, 1] running from top to bottom and y ∈ [0, 1] running from left to right),
where we write WG (x, y) = 1 if (x, y) is black and WG (x, y) = 0 if (x, y) is white.
An equivalent definition is given below. As with everything in this chapter, we ignore measure
zero differences, and so it does not matter what we do with boundaries of the pixels.
Definition 4.1.4. Given a graph G with n vertices labeled 1, . . . , n, we define its associated graphon
WG : [0, 1]2 → [0, 1] by first partitioning [0, 1] into n equal-length intervals I1, . . . , In and setting
WG to be 1 on all Ii × I j where i j is an edge of G, and 0 on all other Ii × I j ’s.
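For readers who like to experiment, here is a minimal Python sketch of the graph-to-graphon construction of Definition 4.1.4 (the 4-cycle is an arbitrary illustrative example).

```python
def graphon_of_graph(adj):
    """Return the step function W_G : [0,1]^2 -> {0,1} of Definition 4.1.4."""
    n = len(adj)
    def W(x, y):
        i = min(int(x * n), n - 1)   # index of the interval I_i containing x
        j = min(int(y * n), n - 1)
        return adj[i][j]
    return W

# example: the 4-cycle 1-2-3-4-1
A = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
W = graphon_of_graph(A)
print(W(0.1, 0.3), W(0.1, 0.6))   # a point of I_1 x I_2 (an edge) and of I_1 x I_3 (a non-edge)
```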
More generally, we can encode nonnegative vertex and edge weights in a graphon.
Definition 4.1.5. A step-graphon W with k steps consists of first partitioning [0, 1] into k intervals
I1, . . . , Ik , and then setting W to be a constant on each Ii × I j .
Example 4.1.6 (Half-graph). Consider the bipartite graph on 2n vertices, with one vertex part
{v1, . . . , vn } and the other vertex part {w1, . . . , wn }, and edges vi w j whenever i ≤ j. Its adjacency
In general, pointwise convergence turns out to be too restrictive. We will need a more flexible
notion of convergence, which we will discuss more in depth in the next section. Let us first give
some more examples to motivate subsequent definitions.
Example 4.1.7 (Quasirandom graphs). Let G n be a sequence of quasirandom graphs with edge
density approaching 1/2, and v(Gn ) → ∞. The constant graphon W ≡ 1/2 seems like a reasonable
candidate for its limit, and later we will see that this is indeed the case.
[Figure: the pixel pictures of the quasirandom graphs G_n converge to the constant graphon 1/2.]
Example 4.1.8 (Stochastic block model). Consider an n vertex graph with two types of vertices:
red and blue. Half of the vertices are red and half of the vertices are blue. Two red vertices are
adjacent with probability pr , two blue vertices are adjacent with probability pb , and finally, a red
vertex and a blue vertex are adjacent with probability pr b , all independently. Then as n → ∞, the
graphs converge to the step-graphon shown below.
[Figure: the graphs converge to the 2 × 2 step-graphon with values p_r, p_rb, p_rb, p_b on its blocks.]
The above examples suggest that the limiting graphon looks like a blurry image of the adjacency
matrix. However, there is an important caveat as illustrated in the next example.
Example 4.1.9 (Checkerboard). Consider the 2n × 2n “checkerboard” graphon shown below (for
n = 4).
[Figure: the checkerboard graphon for n = 4, with the 8 vertices labeled 1, . . . , 8 in their original order.]
Since the 0 and 1’s in the adjacency matrix are evenly spaced, one might suspect that the constant
1/2 graphon would be a limit of the sequence of graphons as the number of vertices tends to infinity.
However, this is not so. The checkerboard graphon is associated to the complete bipartite graph
Kn,n , with the two vertex parts interleaved. By relabeling the vertices, we see that below is another
representation of the associated graphon of the same graph.
[Figure: the pixel picture of the same graph after rearranging the vertices so that the two parts are grouped together, giving a two-block picture.]
So the graphon is the same for all n. So the graphon shown on the right, which is also WK2 , must
be the limit of the sequence, and not the constant 1/2 graphon.
This example tells us that we must be careful about the possibility of rearranging vertices when
studying graph limits.
A graphon is an infinite dimensional object. We would like some ways to measure the similarity
between two graphons. We will explain two different approaches:
(1) cut distance, and
(2) homomorphism densities.
One of the main results in the theory of graph limits is that these two approaches are equivalent—we
will show this later in the chapter.
fractionalization. The general definition of cut distance will allow us to compare graphs with
different numbers of vertices. It is conceptually easier to define cut distance using graphons.
The edit distance of graphs corresponds to the L 1 distance for graphons. For every p ≥ 1, we
define the L p norm of a function W : [0, 1]2 → R by
‖W‖_p := (∫_{[0,1]²} |W(x, y)|^p dx dy)^{1/p},
and the L^∞ norm by
‖W‖_∞ := sup{t : W^{−1}([t, ∞)) has positive measure}.
(This is not simply the supremum of W; the definition should be invariant under measure zero
changes of W.)
Definition 4.2.1. The cut norm of a measurable W : [0, 1]2 → R is defined as
‖W‖_□ := sup_{S,T⊆[0,1]} |∫_{S×T} W(x, y) dx dy|,
where the supremum is taken over measurable subsets S, T ⊆ [0, 1].
[Figure: the map f(x) = 2x mod 1 and the preimage f⁻¹(A) of an interval A ⊆ [0, 1].]
This map is also measure-preserving. This might not seem to be the case at first, since f seems
to shrink some intervals by half. However, the definition of measure-preserving actually says
λ( f −1 (A)) = λ(A) and not λ( f (A)) = λ(A). For any interval [a, b] ⊂ [0, 1], we have f −1 ([a, b]) =
[a/2, b/2] ∪ [1/2 + a/2, 1/2 + b/2], which does have the same measure as [a, b]. This map is 2-to-1,
and it is not invertible.
Given W : [0, 1]² → R and an invertible measure preserving map φ : [0, 1] → [0, 1], we write
W^φ(x, y) := W(φ(x), φ(y)).
Intuitively, this operation relabels the vertex set.
Definition 4.2.4 (Cut metric). Given two symmetric measurable functions U, W : [0, 1]² → R, we define their cut distance (or cut metric) to be
δ_□(U, W) := inf_φ ‖U − W^φ‖_□ = inf_φ sup_{S,T⊆[0,1]} |∫_{S×T} (U(x, y) − W(φ(x), φ(y))) dx dy|,
where the infimum is taken over all invertible measure preserving maps φ : [0, 1] → [0, 1]. Define the cut distance between two graphs G and G′ by the cut distance of their associated graphons:
δ_□(G, G′) := δ_□(W_G, W_{G′}).
Likewise, we can also define the cut distance between a graph G and a graphon U:
δ_□(G, U) := δ_□(W_G, U).
Definition 4.2.5 (Convergence in cut metric). We say that a sequence of graphs or graphons converges in cut metric if they form a Cauchy sequence with respect to δ_□. Furthermore, we say that W_n converges to W in cut metric if δ_□(W_n, W) → 0 as n → ∞.
Note that in δ_□(G, G′), we are doing more than just permuting vertices. A measure preserving map on [0, 1] is also allowed to split a single node into fractions.
It is possible for two different graphons to have cut distance zero. For example, they could differ on a measure-zero set, or could be related via measure-preserving maps. We can form a metric space by identifying graphons that are at cut distance zero from each other.
Definition 4.2.6 (Graphon space). Let W̃0 be the set of graphons (i.e., symmetric measurable functions [0, 1]² → [0, 1]) where any pair of graphons with cut distance zero are considered the same point in the space. This is a metric space under the cut distance δ_□.
We view every graph G as a point in W̃0 via its associated graphon (note that several graphons can be identified as the same point in W̃0).
(The subscript 0 in W̃0 is conventional. Sometimes, without the subscript, W̃ is used to denote the space of symmetric measurable functions [0, 1]² → R.)
Here is a central theorem in the theory of graph limits, proved by Lovász and Szegedy (2007).
Theorem 4.2.7 (Compactness of graphon space). The metric space (W̃0, δ_□) is compact.
One of the main goals of this chapter is to prove this theorem and show its applications.
The compactness of graphon space is related to the graph regularity lemma. In fact, we will use
the regularity method to prove compactness. Both compactness and the graph regularity lemma
tell us that despite the infinite variability of graphs, every graph can be ε-approximated by a graph
from a finite set of templates.
We close this section with the following observation.
Theorem 4.2.8 (Graphs are dense in graphons). The set of graphs is dense in (W̃0, δ_□).
Proof. Let ε > 0. It suffices to show that for every graphon W there exists a graph G such that δ_□(G, W) < ε.
We approximate W in several steps, illustrated below.
[Figure: W and its successive approximations W_1 and W_2.]
First, by rounding down the values of W(x, y), we construct a graphon W_1 whose values are all integer multiples of ε/3, and such that
‖W − W_1‖_∞ ≤ ε/3.
Next, since every Lebesgue measurable subset of [0, 1]² can be arbitrarily well approximated using a union of boxes, we can find a step-graphon W_2 approximating W_1 in L¹ norm:
‖W_1 − W_2‖_1 ≤ ε/3.
Finally, by replacing each block of W_2 by a sufficiently large quasirandom (bipartite) graph of edge density equal to the value of W_2 (c.f. Example 4.1.8), we find a graph G so that
‖W_2 − W_G‖_□ ≤ ε/3.
Then δ_□(W, G) < ε.
Remark 4.2.9. In the above proof, to obtain ‖W_1 − W_2‖_1 ≤ ε/3, the number of steps of W_2 cannot be uniformly bounded as a function of ε, i.e., it must depend on W as well (why? Think about a random graph). Consequently the number of vertices of the final graph G produced by this proof is not bounded by a function of ε.
Later on, we will see a different proof showing that for every ε > 0, there is some N(ε) so that every graphon lies within cut distance ε of some graph with at most N(ε) vertices (Proposition 4.8.1).
Since every compact metric space is complete, we have the following corollary.
Definition 4.3.1. A graph homomorphism from F to G is a map φ : V(F) → V(G) such that if
uv ∈ E(F) then φ(u)φ(v) ∈ E(G) (i.e., φ maps edges to edges). Define
Hom(F, G) := {homomorphisms from F to G}
and
hom(F, G) := |Hom(F, G)| .
Define the F-homomorphism density in G (or F-density in G for short) as
t(F, G) := hom(F, G)/v(G)^{v(F)}.
This is also the probability that a uniformly random map V(F) → V(G) induces a graph homomor-
phism from F to G.
Example 4.3.2.
• hom(K1, G) = v(G).
• hom(K2, G) = 2e(G).
• hom(K3, G) = 6 · #triangles in G
• hom(G, K3) is the number of proper colorings of G using three labeled colors, e.g., {red, green,
blue} (corresponding to the vertices of K3 ).
Remark 4.3.3 (Subgraphs versus homomorphisms). Note that the homomorphisms from F to G
do not quite correspond to copies of subgraphs F inside G, because the homomorphisms can be
non-injective. Define the injective homomorphism density
t_inj(F, G) := (# injective homomorphisms from F to G)/(v(G)(v(G) − 1) · · · (v(G) − v(F) + 1)).
Equivalently, this is the fraction of injective maps V(F) → V(G) that are graph homomorphisms (i.e., send edges to edges). The fraction of maps V(F) → V(G) that are non-injective is at most (v(F) choose 2)/v(G) (for every fixed pair of vertices of F, the probability that they collide is exactly 1/v(G)). So
|t(F, G) − t_inj(F, G)| ≤ (1/v(G)) (v(F) choose 2).
v(G) 2
If F is fixed, the right-hand side tends to zero as v(G) → ∞. So all but a negligible fraction of
such homomorphisms correspond to subgraphs. This is why we often treat subgraph densities
interchangeably with homomorphism densities as they agree in the limit.
Now we define the corresponding notion of homomorphism density in graphons. We first give
an example and then the general formula.
Example 4.3.4 (Triangle density in graphons). The following quantity is the triangle density in a
graphon W.
t(K3, W) = ∫_{[0,1]³} W(x, y) W(y, z) W(z, x) dx dy dz.
This definition agrees with Definition 4.3.1 for the triangle density in graphs. Indeed, for every
graph G, the triangle density in G equals the triangle density in the associated graphon WG , i.e.,
t(K3, WG ) = t(K3, G).
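This agreement can be checked numerically: the following Python sketch (illustrative only; the random 10-vertex graph and the number of Monte Carlo samples are arbitrary choices) computes t(K3, G) by brute force and estimates t(K3, W_G) by Monte Carlo integration.

```python
import itertools, random

rng = random.Random(0)
n = 10
A = [[0] * n for _ in range(n)]
for i, j in itertools.combinations(range(n), 2):
    A[i][j] = A[j][i] = int(rng.random() < 0.5)       # a random 10-vertex graph

# t(K3, G): fraction of (not necessarily injective) maps V(K3) -> V(G) that are homomorphisms
hom = sum(A[x][y] * A[y][z] * A[z][x] for x in range(n) for y in range(n) for z in range(n))
t_graph = hom / n ** 3

# Monte Carlo estimate of t(K3, W_G) = \int W(x,y) W(y,z) W(z,x) dx dy dz
def W(x, y):
    return A[min(int(x * n), n - 1)][min(int(y * n), n - 1)]

samples, total = 200000, 0
for _ in range(samples):
    x, y, z = rng.random(), rng.random(), rng.random()
    total += W(x, y) * W(y, z) * W(z, x)
print(t_graph, total / samples)   # the two numbers should be close
```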
Theorem 4.3.8 (Existence of limit for left-convergence). Every left-convergent sequence of graphs
or graphons left-converges to some graphon.
Remark 4.3.9. One can artificially define a metric that coincides with left-convergence. Let (Fn )n≥1
enumerate over all graphs. One can define a distance between graphons U and W by
Σ_{k≥1} 2^{−k} |t(F_k, W) − t(F_k, U)|.
We see that a sequence of graphons converges under this notion of distance if and only if it
is left-convergent. This shows that left-convergence defines a metric topology on the space of
graphons, but in practice the above distance is pretty useless.
Exercise 4.3.10. Define W : [0, 1]2 → R by W(x, y) = 2 cos(2π(x − y)). Let F be a graph. Show
that t(F, W) is the number of ways to orient all edges of F so that every vertex has the same number
of incoming edges as outgoing edges.
q_r : p_rr, p_rb
q_b : p_rb, p_bb
with all the numbers lying in [0, 1], and subject to q_r + q_b = 1. We form an n-vertex random graph
as follows:
(1) Color each vertex red with probability qr and blue with probability qb , independently at
random. These vertex colors are “hidden states” and are not part of the data of the output
random graph (this step is slightly different from Example 4.1.8 in an unimportant way);
(2) For every pair of vertices, independently place an edge between them with probability
• prr if both vertices are red,
• pbb if both vertices are blue, and
• pr b if one vertex is red and the other is blue.
One can easily generalize the above to a k-block model, where vertices have one of k hidden
states, with q1, . . . , qk (adding up to 1) being the vertex state probabilities, and a symmetric k × k
matrix (pi j )1≤i, j≤k of edge probabilities for pairs of vertices between various states.
The W-random graph is a further generalization of stochastic block models, which correspond
to step-graphons W.
Definition 4.4.1 (W -random graph). Let W be a graphon. The n-vertex W-random graph G(n, W)
denotes the n-vertex random graph (with vertices labeled 1, . . . , n) obtained by first picking
x1, . . . , xn uniformly at random from [0, 1], and then putting an edge between vertices i and j
with probability W(xi, x j ), independently for all 1 ≤ i < j ≤ n.
[Figure: sampling a W-random graph; the points x_3, x_5, x_1, x_2, x_4 are marked along both axes of the pixel picture of W.]
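Here is a minimal Python sampler for G(n, W) following Definition 4.4.1 (the graphon W(x, y) = xy and the value n = 500 are arbitrary illustrative choices).

```python
import random

def sample_W_random_graph(n, W, seed=0):
    """Sample G(n, W) as in Definition 4.4.1; returns the edge set on vertices 0..n-1."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]            # latent labels x_1, ..., x_n
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W(xs[i], xs[j]):       # edge ij with probability W(x_i, x_j)
                edges.add((i, j))
    return edges

E = sample_W_random_graph(500, lambda x, y: x * y)
print(len(E) / (500 * 499 / 2))    # edge density, roughly \int\int xy dx dy = 1/4
```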
Theorem 4.4.2 (W -random graphs left-converge to W ). Let W be a graphon. For each n, let G n be
a random graph distributed as G(n, W). Then Gn left-converges to W with probability 1.
Remark 4.4.3. The theorem does not require each G n to be sampled independently. For example,
we can construct the sequence of random graphs, with G_n distributed as G(n, W), by revealing
one vertex at a time without resampling the previous vertices and edges. In this case, each Gn is a
subgraph of the next graph Gn+1 .
We will need the following standard result about concentration of Lipschitz functions. This can
be proved using Azuma’s inequality. See, e.g., (Alon and Spencer 2016, Chapter 7).
Theorem 4.4.5 (Sample concentration for graphons). For every ε > 0, positive integer n, graph F, and graphon W, we have
P(|t(F, G(n, W)) − t(F, W)| > ε) ≤ 2 exp(−ε²n/(8 v(F)²)). (4.4.2)
Proof. Recall from Remark 4.3.3 that the injective homomorphism density tinj (F, G) is defined to
be the fraction of injective maps V(F) → V(G) that carry every edge of F to an edge of G. We will
first prove that
P(|t_inj(F, G(n, W)) − t(F, W)| > ε) ≤ 2 exp(−ε²n/(2 v(F)²)). (4.4.3)
Let y1 . . . , yn , and zi j for each 1 ≤ i < j ≤ n, be independent uniform random variables in [0, 1].
Let G be the graph on vertices {1, . . . , n} with an edge between i and j if and only if zi j ≤ W(yi, y j ),
for every i < j. Then G has the same distribution as G(n, W). Let us group variables yi, zi j into
x1, x2, . . . , xn where
x1 = (y1 ), x2 = (y2, z12 ), x3 = (y3, z13, z23 ), x4 = (y4, z14, z24, z34 ), ....
This amounts to exposing the graph G one vertex at a time. Define the function f (x1, . . . , xn ) =
tinj (F, G). Note that E f = E tinj (F, G(n, W)) = t(F, W) by linearity of expectations (in this step, it
is important that we are using the injective variant of homomorphism densities). Note changing a
single coordinate of f changes the value of the function by at most v(F)/n, since exactly a v(F)/n
fraction of injective maps V(F) → V(G) hits any specific v ∈ V(G) in the image. Then (4.4.3)
follows from the bounded difference inequality, Theorem 4.4.4.
To deduce the theorem from (4.4.3), recall from Remark 4.3.3 that
|t(F, G) − t_inj(F, G)| ≤ v(F)²/(2 v(G)).
If ε < v(F)²/n, then the right-hand side of (4.4.2) is at least 2e^{−ε/8} ≥ 1, and so the inequality trivially holds. Otherwise, |t(F, G(n, W)) − t(F, W)| > ε implies |t_inj(F, G(n, W)) − t(F, W)| > ε − v(F)²/(2n) ≥ ε/2, and then we can apply (4.4.3) (with ε/2 in place of ε) to conclude.
Theorem 4.4.2 then follows from the Borel–Cantelli lemma, stated below, applied to Theorem 4.4.5 with a union bound over all rational ε > 0.
Theorem 4.4.6 (Borel–Cantelli lemma). Given a sequence of events E_1, E_2, . . . , if Σ_n P(E_n) < ∞, then with probability 1, only finitely many of them occur.
Theorem 4.5.1 (Counting lemma). Let F be a graph. Let W and U be graphons. Then
|t(F, W) − t(F, U)| ≤ |E(F)| δ (W, U).
Qualitatively, the counting lemma tells us that for every graph F, the function t(F, ·) is continuous
in (W̃0, δ_□), the graphon space with respect to the cut metric. It implies the easier direction of the equivalence in Theorem 4.3.7, namely that convergence in cut metric implies left-convergence.
Corollary 4.5.2. Every Cauchy sequence of graphons with respect to the cut metric is left-
convergent.
In the rest of this section, we prove Theorem 4.5.1. It suffices to prove that
|t(F, W) − t(F, U)| ≤ |E(F)| ‖W − U‖_□. (4.5.1)
Indeed, for every invertible measure preserving map φ : [0, 1] → [0, 1], we have t(F, U) = t(F, U^φ). By considering the above inequality with U replaced by U^φ, and taking the infimum over all U^φ, we obtain Theorem 4.5.1.
The following reformulation of the cut norm is often quite useful.
Lemma 4.5.3 (Reformulation of cut norm). For every measurable W : [0, 1]² → R,
‖W‖_□ = sup_{u,v : [0,1]→[0,1] measurable} |∫_{[0,1]²} W(x, y) u(x) v(y) dx dy|.
Proof. We want to show (left-hand side below is how we defined the cut norm in Definition 4.2.1)
sup_{S,T⊆[0,1]} |∫_{[0,1]²} W(x, y) 1_S(x) 1_T(y) dx dy| = sup_{u,v : [0,1]→[0,1] measurable} |∫_{[0,1]²} W(x, y) u(x) v(y) dx dy|.
The right-hand side is at least as large as the left-hand side since we can take u = 1S and v = 1T .
On the other hand, the integral on the right-hand side is bilinear in u and v, and so it is always
possible to change u and v to {0, 1}-valued functions without decreasing the value of the integral
(e.g., think about what is the best choice for v with u held fixed, and vice versa). If u and v are
restricted to {0, 1}-valued functions, then the two sides are identical.
As a warm up, let us illustrate the proof of the triangle counting lemma, which has all the ideas
of the general proof but with simpler notation.
Given symmetric measurable functions W_12, W_13, W_23 : [0, 1]² → R, write
t(W_12, W_13, W_23) := ∫_{[0,1]³} W_12(x, y) W_13(x, z) W_23(y, z) dx dy dz,
so that
t(K3, W) = t(W, W, W) and t(K3, U) = t(U, U, U).
Note that t(W_12, W_13, W_23) is trilinear in W_12, W_13, W_23. In particular, for any graphons W and U, we have
t(W, W, W) − t(U, W, W) = ∫_{[0,1]³} (W − U)(x, y) W(x, z) W(y, z) dx dy dz.
For any fixed z, note that x ↦ W(x, z) and y ↦ W(y, z) are both measurable functions [0, 1] → [0, 1]. So applying Lemma 4.5.3 gives
|∫_{[0,1]²} (W − U)(x, y) W(x, z) W(y, z) dx dy| ≤ ‖W − U‖_□
for every z. Now integrating over all z and applying the triangle inequality, we obtain
|t(W, W, W) − t(U, W, W)| ≤ ‖W − U‖_□.
We have similar inequalities in the other two coordinates. We can write
t(W, W, W) − t(U, U, U) = t(W, W, W − U) + t(W, W − U, U) + t(W − U, U, U).
We just showed that each term on the right-hand side is at most ‖W − U‖_□ in absolute value. So the result follows.
The above proof generalizes in a straightforward way to a general graph counting lemma.
Remark 4.6.2 (Interpreting weak regularity). Given A, B ⊂ V(G), suppose one only knew how many vertices from A and B lie in each part of the partition (and not specifically which vertices), and were asked to predict the number of edges between A and B. Then the sum above is the number of edges between A and B that one would naturally expect based on the edge densities between vertex parts. Being weak ε-regular says that this prediction is roughly correct.
Weak regularity is more “global” compared to the notion of ε-regular partition from Chapter 2. Here A and B have sizes that are constant fractions of the entire vertex set, rather than being subsets of individual parts of the partition. The edge densities between certain pairs A ∩ V_i and B ∩ V_j could differ significantly from that of V_i and V_j. All we ask is that on average these discrepancies mostly cancel out.
The following weak regularity lemma was proved by Frieze and Kannan (1999), initially
motivated by algorithmic applications that we will mention in Remark 4.6.11.
Theorem 4.6.3 (Weak regularity lemma for graphs). Let 0 < ε < 1. Every graph has a weak ε-regular partition into at most 4^{1/ε²} vertex parts.
Now let us state the corresponding notions for graphons.
Definition 4.6.4 (Stepping operator). Given a symmetric measurable function W : [0, 1]2 → R,
and a measurable partition P = {S1, . . . , Sk } of [0, 1], define a symmetric measurable function
W_P : [0, 1]² → R by setting its value on each S_i × S_j to be the average value of W over S_i × S_j
(since we only care about functions up to measure zero sets, we can ignore all parts Si with measure
zero).
In other words, WP is a step-graphon with steps given by P and values given by averaging W
over the steps.
Remark 4.6.5. The stepping operator is the orthogonal projection in the Hilbert space L 2 ([0, 1]2 )
onto the subspace of functions constant on each step Si × S j . It can also be viewed as the conditional
expectation with respect to the σ-algebra generated by Si × S j .
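In the finite setting, the stepping operator is just block-averaging of a matrix. The following Python sketch (illustrative only; the grid size and the partition are arbitrary choices) computes W_P for a discretized W.

```python
import numpy as np

def step(W_grid, parts):
    """Block-average an m x m discretization of W over the blocks S_i x S_j of a partition."""
    out = np.zeros_like(W_grid, dtype=float)
    for S in parts:
        for T in parts:
            out[np.ix_(S, T)] = W_grid[np.ix_(S, T)].mean()
    return out

m = 8
rng = np.random.default_rng(0)
W_grid = rng.random((m, m))
W_grid = (W_grid + W_grid.T) / 2                    # make the discretized graphon symmetric
parts = [list(range(0, 4)), list(range(4, 8))]      # a partition of [0,1] into two intervals
print(step(W_grid, parts))
```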
Definition 4.6.6 (Weak regular partition for graphons). Given a graphon W, we say that a measurable partition P of [0, 1] into finitely many parts is weak ε-regular if
‖W − W_P‖_□ ≤ ε.
Theorem 4.6.7 (Weak regularity lemma for graphons). Let 0 < ε < 1. Then every graphon has a weak ε-regular partition into at most 4^{1/ε²} parts.
Remark 4.6.8. Technically speaking, Theorem 4.6.3 does not follow from Theorem 4.6.7 since the
partition of [0, 1] for WG could split intervals corresponding to individual vertices of G. However,
the proofs of the two claims are exactly the same. Alternatively, one can allow a more flexible
definition of a graphon as symmetric measurable functions W : Ω × Ω → [0, 1], and then take Ω to
be the discrete probability space V(G) endowed with the uniform measure.
Like the proof of the regularity lemma in Section 2.1, we use an energy increment strategy.
Recall from Definition 2.1.8 that the energy of a vertex partition is the mean-squared edge-density
between parts. Given a graphon W, we define the energy of a measurable partition P = {S1, . . . , Sk }
of [0, 1] by
‖W_P‖_2² = ∫_{[0,1]²} W_P(x, y)² dx dy = Σ_{i,j=1}^{k} λ(S_i) λ(S_j) (average of W on S_i × S_j)².
Lemma 4.6.9 (L² energy increment). Let W be a graphon. Let P be a finite measurable partition of [0, 1] that is not weak ε-regular for W. Then there is a measurable refinement P′ of P, dividing each part of P into at most 4 parts, such that
‖W_{P′}‖_2² > ‖W_P‖_2² + ε².
Proof. Because ‖W − W_P‖_□ > ε, there exist measurable subsets S, T ⊂ [0, 1] such that
|⟨W − W_P, 1_{S×T}⟩| > ε.
Let P′ be the refinement of P obtained by introducing S and T, dividing each part of P into at most 4 sub-parts. We know that
⟨W_P, W_P⟩ = ⟨W_{P′}, W_P⟩
because W_P is constant on each step of P, and P′ is a refinement of P. Thus,
⟨W_{P′} − W_P, W_P⟩ = 0.
By the Pythagorean theorem (in the Hilbert space L²([0, 1]²)),
‖W_{P′}‖_2² = ‖W_P‖_2² + ‖W_{P′} − W_P‖_2² > ‖W_P‖_2² + ε²,
where the final step is due to
‖W_{P′} − W_P‖_2 ≥ |⟨W_{P′} − W_P, 1_{S×T}⟩| = |⟨W − W_P, 1_{S×T}⟩| > ε.
We will prove the following slight generalization of Theorem 4.6.7, allowing an arbitrary
starting partition (this will be useful later).
Theorem 4.6.10 (Weak regularity lemma for graphons). Let 0 < ε < 1. Let W be a graphon. Let
P0 be a finite measurable partition of [0, 1]. Then there is a weak ε-regular partition P of [0, 1]
such that P refines P0 and each part of P0 is partitioned into at most 4^{1/ε²} parts under P.
This theorem tells us that the regularity argument still works starting from any given partition.
Proof. Starting with i = 0:
(1) If Pi is weak ε-regular, then STOP.
(2) Else, by Lemma 4.6.9, there exists a measurable partition Pi+1 refining each part of Pi into at
most 4 parts, such that ‖WP_{i+1}‖_2^2 > ‖WP_i‖_2^2 + ε².
(3) Increase i by 1 and go back to Step (1).
Since 0 ≤ ‖WP‖_2^2 ≤ 1 for every P, the process terminates with i < 1/ε², resulting in a terminal Pi
with the desired properties.
Remark 4.6.11 (Additive approximation of maximum cut). One of the initial motivations for devel-
oping the weak regularity lemma was to obtain a general efficient algorithm for estimating the
maximum cut in a dense graph. The maximum cut problem is a central problem in algorithms
and combinatorial optimization:
MAX CUT: Given a graph G, find a set S ⊆ V(G) that maximizes e(S, V(G) \ S).
Goemans and Williamson (1995) found an efficient 0.878-approximation algorithm (this means
that the algorithm outputs some S with e(S, V(G) \ S) at least a factor 0.878 of the optimum). Their
seminal algorithm uses a semidefinite relaxation. The Unique Games Conjecture would imply
that it is not possible to obtain a better approximation ratio than that of the Goemans–Williamson
algorithm (Khot, Kindler, Mossel, and O'Donnell 2007). It is also known that approximating
beyond 16/17 ≈ 0.941 is NP-hard (Håstad 2001).
On the other hand, an algorithmic version of the weak regularity lemma gives us an efficient algorithm
to approximate the maximum cut for dense graphs, i.e., finding a cut within εn² additive
error of the optimum, for any constant ε > 0. The basic idea is to find a weak regular
partition V(G) = V1 ∪ · · · ∪ Vk, and then do a brute-force search through all possible sizes |S ∩ Vi|.
See Frieze and Kannan (1999) for more details. These ideas have been further developed into
efficient sampling algorithms, sampling only poly(1/ε) random vertices, for estimating the maximum
cut in a dense graph, e.g., see Alon, Fernandez de la Vega, Kannan, and Karpinski (2003a).
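Here is a rough sketch of the brute-force step just described, under the simplifying assumption that a weak regular partition and its pairwise block edge densities are already given; the helper and parameter names are made up for illustration, and the grid granularity governs the additive error.

```python
import itertools

def approx_max_cut(part_sizes, d, steps=4):
    """Estimate the max cut from a weak regular partition.

    part_sizes[i] = |V_i|; d[i][j] = edge density between V_i and V_j.
    Brute-force over the fractions |S ∩ V_i| / |V_i| on a grid and estimate
    the number of cut edges block by block.
    """
    k = len(part_sizes)
    grid = [t / steps for t in range(steps + 1)]
    best = 0.0
    for fracs in itertools.product(grid, repeat=k):
        s = [f * n for f, n in zip(fracs, part_sizes)]   # vertices of S inside each part
        c = [n - si for n, si in zip(part_sizes, s)]     # vertices of the complement
        cut = sum(d[i][j] * s[i] * c[j] for i in range(k) for j in range(k))
        best = max(best, cut)
    return best

# Toy example: two parts with sparse edges inside and dense edges across.
sizes = [50, 50]
densities = [[0.1, 0.9], [0.9, 0.1]]
print(approx_max_cut(sizes, densities))   # approximately 0.9 * 50 * 50 = 2250
```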
The following exercise offers an alternate approach to the weak regularity lemma. It gives an
approximation of a graphon as a linear combination of at most ε^{-2} indicator functions of boxes. The
polynomial dependence on 1/ε is important for designing efficient approximation algorithms.
Exercise 4.6.12 (Weak regularity decomposition). (a) Let ε > 0. Show that for every graphon W,
there exist measurable sets S1, . . . , Sk, T1, . . . , Tk ⊆ [0, 1] and reals a1, . . . , ak ∈ R, with k < ε^{-2},
such that
‖W − Σ_{i=1}^k a_i 1_{S_i × T_i}‖_□ ≤ ε.
The rest of the exercise shows how to recover a regularity partition from the above approxi-
mation.
(b) Show that the stepping operator is contractive with respect to the cut norm, in the sense that
if W : [0, 1]² → R is a measurable symmetric function, then ‖WP‖_□ ≤ ‖W‖_□.
(c) Let P be a partition of [0, 1] into measurable sets. Let U be a graphon that is constant on S × T
for each S, T ∈ P. Show that for every graphon W, one has
‖W − WP‖_□ ≤ 2‖W − U‖_□.
(d) Use (a) and (c) to give a different proof of the weak regularity lemma (with slightly worse
bounds than the one given in class): show that for every ε > 0 and every graphon W, there
exists a partition P of [0, 1] into 2^{O(1/ε²)} measurable sets such that ‖W − WP‖_□ ≤ ε.
Exercise 4.6.13∗ (Second neighborhood distance). Let 0 < ε < 1/2. Let W be a graphon. Define
τ_{W,x} : [0, 1] → [0, 1] by
τ_{W,x}(z) = ∫_{[0,1]} W(x, y)W(y, z) dy.
(This models the second neighborhood of x.) Prove that if a finite set S ⊆ [0, 1] satisfies
‖τ_{W,s} − τ_{W,t}‖_1 > ε for all distinct s, t ∈ S,
then |S| ≤ (1/ε)^{C/ε²}, where C is some absolute constant.
Exercise 4.6.14 (Strong regularity lemma). Let ε = (ε1, ε2, . . . ) be a sequence of positive reals. By
repeatedly applying the weak regularity lemma, show that there is some M = M(ε) such that for
every graphon W, there is a pair of partitions P and Q of [0, 1] into measurable sets, such that Q
refines P, |Q| ≤ M (here |Q| denotes the number of parts of Q),
‖W − WQ‖_□ ≤ ε_{|P|} and ‖WQ‖_2^2 ≤ ‖WP‖_2^2 + ε_1^2.
Furthermore, deduce the strong regularity lemma in the following form: one can write
W = Wstr + Wpsr + Wsml
where Wstr is a k-step graphon with k ≤ M, ‖Wpsr‖_□ ≤ ε_k, and ‖Wsml‖_1 ≤ ε_1. State your bounds
on M explicitly in terms of ε. (Note: the parameter choice ε_k = ε/k² roughly corresponds to
Szemerédi's regularity lemma, in which case your bound on M should be an exponential tower of
2's of height ε^{−O(1)}; if not then you are doing something wrong.)
Example 4.7.4 (Betting strategy). Consider any betting strategy in a “fair” casino, where the ex-
pected value of each bet is zero. Let Xn be the balance after n rounds of betting. Then Xn is a
martingale regardless of the betting strategy. So every betting strategy has zero expected gain after
n rounds. Also see the optional stopping theorem for a more general statement, e.g., Williams
(1991, Chapter 10).
The original meaning of the word "martingale" refers to the following betting strategy on a
sequence of fair coin tosses. Each round, the bettor is allowed to bet an arbitrary amount Z: if
heads, the bettor gains Z dollars, and if tails, the bettor loses Z dollars.
Start by betting 1 dollar. If one wins, stop. If one loses, then double one's bet for the next coin
toss, and then repeat (i.e., keep doubling one's bet until the first win, at which point one stops).
A "fallacy" is that this strategy always results in a final net gain of $1, the supposed reason being
that with probability 1 one eventually sees a head. This initially appears to contradict the earlier
claim that all betting strategies have zero expected gain. Thankfully there is no contradiction. In
real life, one starts with a finite budget and could possibly go bankrupt with this betting strategy,
thereby leading to a forced stop. In the optional stopping theorem, there are some boundedness
hypotheses that are violated by the above strategy.
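The following short simulation (an illustrative sketch, not from the text) shows why the finite budget matters: most runs of the doubling strategy end with a net gain of 1 dollar, but a small fraction go bankrupt and lose nearly the whole budget, and the average gain is zero.

```python
import random

def doubling_strategy(budget=1023):
    """Play the doubling strategy against fair coin tosses until the first win
    or until the next bet can no longer be covered. Returns the net gain."""
    cash, bet = budget, 1
    while bet <= cash:
        cash -= bet
        if random.random() < 0.5:        # heads: win the bet and stop
            return cash + 2 * bet - budget
        bet *= 2                         # tails: double the bet
    return cash - budget                 # forced stop: cannot cover the next bet

random.seed(0)
trials = 100_000
gains = [doubling_strategy() for _ in range(trials)]
print(sum(gains) / trials)               # close to 0: no strategy beats a fair game
print(max(gains), min(gains))            # usually +1, but rare runs lose almost the whole budget
```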
The following construction of martingales is the most relevant for our purposes.
Example 4.7.5 (Doob martingale). Let X be some “hidden” random variable. Partial information
is revealed about X gradually over time. For example, X is some fixed function of some random
inputs. So the exact value of X is unknown but its distribution can be derived from the distribution
of the inputs. Initially one does not know any of the inputs. Over time, some of the inputs are
revealed. Let
Xn = E[X | all information revealed up to time n].
Then X0, X1, . . . is a martingale (why?). Informally, Xn is the best guess (in expectation) of X based
on all the information available up to time n. We have X0 = EX (when no information has been revealed).
As n → ∞, all the information is revealed, and the martingale Xn converges to the random variable X
with probability 1.
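As a concrete (hypothetical) instance of a Doob martingale, take X to be the number of heads among N fair coin flips and reveal the flips one at a time; the conditional expectation then has a closed form, and the sketch below checks the martingale property and the convergence Xn → X.

```python
import random

random.seed(1)
N = 1000
flips = [random.randint(0, 1) for _ in range(N)]
X = sum(flips)                       # the "hidden" random variable

# Doob martingale: X_n = E[X | first n flips] = (heads seen so far) + (N - n)/2
doob = [sum(flips[:n]) + (N - n) / 2 for n in range(N + 1)]

print(doob[0])         # X_0 = E[X] = N/2 = 500.0
print(doob[N] == X)    # True: once everything is revealed, X_n equals X
# Martingale property: E[X_{n+1} | first n flips] = doob[n],
# since the next flip contributes 1/2 on average.
```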
Here is a real-life example. Let X ∈ {0, 1} indicate whether a given candidate wins a presidential election.
Let Xn be the inferred probability that the candidate wins, given all the information known at time tn.
Then Xn converges to the "truth", a {0, 1}-value, eventually becoming deterministic when the election
result is finalized.
Moreover, Xn is a martingale: at time tn, knowing Xn, if the expectation for Xn+1 (conditioned on
everything known at time tn) were different from Xn, then one should have adjusted Xn accordingly
in the first place.
The precise notion of “information” in the above formula can be formalized using the notion of
filtration in probability theory.
Here is the main result of this section.
Theorem 4.7.6 (Martingale convergence theorem). Every bounded martingale converges with
probability 1.
In other words, if X0, X1, . . . is a martingale with Xn ∈ [0, 1] for every n, then the sequence
converges with probability 1.
Remark 4.7.7. The proof actually shows that the boundedness condition can be replaced by the
weaker L 1 -boundedness condition, i.e., supn E |Xn | < ∞. Even more generally, uniform integrabil-
ity is enough.
Some boundedness condition is necessary. For example, in Example 4.7.3, a running sum of
independent uniform ±1 random variables is an unbounded martingale, and it almost surely does not converge.
Proof. If a sequence X0, X1, . . . does not converge, then there exists a pair of rational numbers
0 < a < b < 1 such that Xn "up-crosses" [a, b] infinitely many times, meaning that there is an
infinite sequence s1 < t1 < s2 < t2 < · · · such that X_{s_i} < a < b < X_{t_i} for all i.
[Figure: a sample path up-crossing the interval [a, b] at times s1 < t1 < s2 < t2 < s3 < t3.]
We will show that, for each pair a < b, the probability that a bounded martingale X0, X1, · · · ∈ [0, 1]
up-crosses [a, b] infinitely many times is zero (by rescaling, we may assume without loss of generality
that the martingale is bounded between 0 and 1). Then, by taking a union bound over all countably many such
pairs (a, b) of rationals, we deduce that the martingale converges with probability 1.
Consider the following betting strategy; imagine that Xn is a stock price. Whenever Xn dips
below a, we buy one share and hold it until Xn rises above b, at which point
we sell the share. (Note that we always hold either zero or one share; we do not buy again until we
have sold the currently held share.) Start with a budget of 1 (so we will never go bankrupt). Let
Yn be the value of our portfolio (cash on hand plus the value of the share if one is held) at time n. Then
Yn is a martingale (why?). So EYn = Y0 = 1. Also Yn ≥ 0 for all n. If one buys and sells at least k
times up to time n, then Yn ≥ k(b − a) (this accounts only for the net profit from buying and selling; the actual
Yn may be higher due to the initial cash balance and the value of the currently held share). So, by
Markov's inequality, for every n,
P(≥ k up-crossings up to time n) ≤ P(Yn ≥ k(b − a)) ≤ EYn / (k(b − a)) = 1 / (k(b − a)).
By the monotone convergence theorem,
P(≥ k up-crossings) = lim_{n→∞} P(≥ k up-crossings up to time n) ≤ 1 / (k(b − a)).
Letting k → ∞, the probability of having infinitely many up-crossings is zero.
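To see the up-crossing bound numerically, one can simulate a bounded martingale and count its up-crossings of a fixed interval [a, b]; the sketch below (not from the text) uses the fraction of red balls in a Pólya urn, a standard example of a [0, 1]-bounded martingale, and compares the empirical frequencies with the Markov-type bound 1/(k(b − a)) from the proof.

```python
import random

def polya_fraction_path(steps, rng):
    """Fraction of red balls in a Polya urn (start: 1 red, 1 blue), a [0,1]-bounded martingale."""
    red, total, path = 1, 2, []
    for _ in range(steps):
        if rng.random() < red / total:
            red += 1
        total += 1
        path.append(red / total)
    return path

def upcrossings(path, a, b):
    """Number of times the path goes from below a to above b."""
    count, below = 0, False
    for x in path:
        if x < a:
            below = True
        elif x > b and below:
            count += 1
            below = False
    return count

rng = random.Random(0)
a, b, runs = 0.3, 0.7, 2000
counts = [upcrossings(polya_fraction_path(500, rng), a, b) for _ in range(runs)]
for k in range(1, 6):
    empirical = sum(c >= k for c in counts) / runs
    print(k, empirical, "<=", 1 / (k * (b - a)))   # the bound is far from tight here, but it holds
```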
The weak regularity lemma only guarantees that |Pn,k | ≤ mk , but if we allow empty parts then we
can achieve equality in (b).
Step 2. Passing to a subsequence.
Initially, each Pn,k partitions [0, 1] into arbitrary measurable sets. Since eventually we only care
about δ_□, which allows rearranging [0, 1], we can replace Wn by an appropriate rearrangement Wn^{φ_n}
(some measure-theoretic details are omitted) so that every Pn,k is a partition of [0, 1] into intervals.
By repeated passing to subsequences, we can replace Wn by a subsequence so that
(1) for each k, the endpoints of the intervals in Pn,k all converge as n → ∞, and
(2) the values of Wn,k on each block converge individually as n → ∞.
Then, for each k, there is some graphon Uk such that
Wn,k → Uk pointwise almost everywhere as n → ∞.
The relationships between the various sequences (after passing to subsequences) are illustrated below:
W1 W2 W3 . . .
k = 1: W1,1 W2,1 W3,1 . . . → U1 pointwise a.e.
k = 2: W1,2 W2,2 W3,2 . . . → U2 pointwise a.e.
k = 3: W1,3 W2,3 W3,3 . . . → U3 pointwise a.e.
⋮
Similarly, for each k, the partition Pn,k of [0, 1] into mk intervals converges to a partition Pk of
[0, 1] into intervals. Each Pk+1 refines Pk . Since Wn,k = (Wn,k+1 )Pn,k , taking n → ∞, we get
Uk = (Uk+1 )Pk .
[Figure: example step graphons U1, U2, U3, where each Uk is obtained from Uk+1 by averaging over the coarser partition Pk.]
that ‖Uk − Wn,k‖_1 < ε/3 for all n > n0. Finally, since we chose k > 3/ε, we already know that
δ_□(Wn, Wn,k) < ε/3 for all n. We conclude that
δ_□(U, Wn) ≤ δ_□(U, Uk) + δ_□(Uk, Wn,k) + δ_□(Wn,k, Wn)
≤ ‖U − Uk‖_1 + ‖Uk − Wn,k‖_1 + δ_□(Wn,k, Wn)
≤ ε.
The second inequality uses the general bound
δ_□(W1, W2) ≤ ‖W1 − W2‖_□ ≤ ‖W1 − W2‖_1
for graphons W1, W2.
The compactness of (W̃0, δ_□) is a powerful statement. We will spend the remainder of the
chapter exploring its applications. We close this section with a couple of quick applications.
First, let us show how to use compactness to deduce the existence of limit for a left-convergent
sequence of graphons.
Proof of Theorem 4.3.8 (existence of limit of left-convergent sequence of graphons). Let W1, W2, . . .
be a sequence of graphons such that the sequence of F-densities {t(F, Wn)}n converges for every
graph F. Since (W̃0, δ_□) is a compact metric space by Theorem 4.2.7, it is also sequentially compact,
and so there is a subsequence (n_i)_{i=1}^∞ and a graphon W such that δ_□(W_{n_i}, W) → 0 as i → ∞. Fix
any graph F. By the counting lemma, Theorem 4.5.1, it follows that t(F, W_{n_i}) → t(F, W). But by
assumption, the sequence {t(F, Wn)}n converges. Therefore t(F, Wn) → t(F, W) as n → ∞. Thus
Wn left-converges to W.
Let us now examine a different aspect of compactness. Recall that by definition, a set is compact
if every open cover has a finite subcover.
Recall from Theorem 4.2.8 that the set of graphs is dense in the space of graphons with respect
to the cut metric. This was proved by showing that for every ε > 0 and every graphon W, one can find a
graph G such that δ_□(G, W) < ε. However, the size of G produced by this proof depends on both ε
and W, since the proof proceeds by first taking a discrete L¹ approximation of W, which could
involve an unbounded number of steps. In contrast, we show below that the number
of vertices of G needs to depend only on ε and not on W.
Proposition 4.8.1. For every ε > 0 there is some positive integer N = N(ε) such that every
graphon lies within cut distance ε of a graph on at most N vertices.
Proof. Let ε > 0. For a graph G, define the open ε-ball (with respect to the cut metric) around G:
B_ε(G) = {W ∈ W̃0 : δ_□(G, W) < ε}.
Since every graphon lies within cut distance ε of some graph (Theorem 4.2.8), the balls B_ε(G)
cover W̃0 as G ranges over all graphs. By compactness, this open cover has a finite subcover. So
there is some N such that the subcover only uses graphs G with at most N vertices.
The following exercise asks you to make the above proof quantitative. (Hint: use the weak regularity
lemma.)
Exercise 4.8.2. Show that for every ε > 0, every graphon lies within cut distance at most ε of
some graph on at most C^{1/ε²} vertices, where C is some absolute constant.
Remark 4.8.3 (Ineffective bounds from compactness). Arguments using compactness usually do
not generate quantitative bounds; for example, the proof of Proposition 4.8.1 does not
give any specific function N(ε), only that such a function exists. In cases where one does
not have an explicit bound, we call the bound ineffective. Ineffective bounds also often arise from
arguments involving ergodic theory and nonstandard analysis. Sometimes a different argument
can be found that generates a quantitative bound (e.g., Exercise 4.8.2), but it is not always known
how to do this. Here we illustrate a simple example of a compactness application (unrelated to
dense graph limits) that gives an ineffective bound, where it remains an open problem to make the bound
effective.
This example concerns bounded degree graphs. It is sometimes called a “regularity lemma” for
bounded degree graphs, but it is very different from the regularity lemmas we have encountered so
far.
A rooted graph is a graph with a special vertex designated as a root, i.e., a pair (G, v) with
v ∈ V(G) as the root. Given a graph G and r ∈ N, we can obtain a random rooted graph by first
picking a vertex v of G as the root uniformly at random, and then removing all vertices more than
distance r from v. We define the r-neighborhood-profile of G to be the probability distribution
on rooted graphs generated by this process.
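Here is a sketch of the sampling procedure just described, for a graph stored as a plain adjacency list; the summary recorded is deliberately coarse (only ball sizes, not isomorphism types of rooted graphs), so it is an illustration of the definition rather than a full implementation.

```python
import random
from collections import Counter, deque

def r_ball(adj, root, r):
    """Vertices within distance r of `root`, together with their distances (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sample_profile(adj, r, samples, rng):
    """Empirical r-neighborhood statistics from uniformly random roots
    (a coarse proxy: only the ball sizes are recorded)."""
    vertices = list(adj)
    return Counter(len(r_ball(adj, rng.choice(vertices), r)) for _ in range(samples))

# Toy example: a cycle on 12 vertices (maximum degree 2).
n = 12
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(sample_profile(cycle, r=2, samples=1000, rng=random.Random(0)))  # every 2-ball has 5 vertices
```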
Recall that the total variation distance between two probability distributions µ and λ is defined
by
d_TV(µ, λ) = sup_E |µ(E) − λ(E)|,
where E ranges over all events. In the case of two discrete probability distributions µ and λ,
the above definition can be written as half the ℓ¹ distance between the two probability distributions:
d_TV(µ, λ) = (1/2) Σ_x |µ(x) − λ(x)|.
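For discrete distributions represented as dictionaries, the half-ℓ¹ formula translates directly into code (a small sketch for concreteness):

```python
def total_variation(mu, lam):
    """d_TV(mu, lam) = (1/2) * sum over x of |mu(x) - lam(x)| for discrete distributions."""
    support = set(mu) | set(lam)
    return 0.5 * sum(abs(mu.get(x, 0.0) - lam.get(x, 0.0)) for x in support)

print(total_variation({"a": 0.5, "b": 0.5}, {"a": 0.8, "c": 0.2}))  # 0.5
```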
Theorem 4.8.4 ("Regularity lemma" for bounded degree graphs). For every ε > 0 and positive
integers ∆ and r, there exists a positive integer N = N(ε, ∆, r) such that for every graph G with
maximum degree at most ∆, there exists a graph G′ with at most N vertices, so that the total
variation distance between the r-neighborhood-profiles of G and G′ is at most ε.
Proof. Let G = G_{∆,r} be the set of all possible rooted graphs with maximum degree at most ∆ and radius at
most r around the root. Then |G| < ∞. The r-neighborhood-profile pG of any graph G with maximum degree
at most ∆ can be represented as a point pG ∈ [0, 1]^G with coordinate sum 1. Let A = {pG : G a graph with
maximum degree ≤ ∆} ⊆ [0, 1]^G be the set of all points that can arise this way. Since [0, 1]^G is compact, the closure of A is compact.
Since the union of the open ε-neighborhoods (with respect to dTV) of pG, ranging over all such graphs
G, covers the closure of A, by compactness there is some finite subcover, i.e., a finite collection
X of graphs so that for every graph G, pG lies within total variation distance ε of some pG′ with
G′ ∈ X. We conclude by letting N be the maximum number of vertices of a graph in X.
Despite the short proof using compactness, it remains an open problem to make the above result
quantitative.
Open problem 4.8.5. Find some specific N(ε, ∆, r) so that Theorem 4.8.4 holds.
Theorem 4.9.1 (Uniqueness of moments). Let U and W be graphons such that t(F, W) = t(F, U)
for every graph F. Then δ_□(U, W) = 0.
Remark 4.9.2. The result is reminiscent of results from probability theory on the uniqueness of
moments, which roughly say that if two "sufficiently well-behaved" real random variables X and
Y share the same moments, i.e., E[X^k] = E[Y^k] for all nonnegative integers k, then X and Y must
be identically distributed. One needs some technical conditions for the conclusion to hold. For
example, Carleman's condition says that if the moments of X satisfy Σ_{k=1}^∞ E[X^{2k}]^{-1/(2k)} = ∞, then
the distribution of X is uniquely determined by its moments. This sufficient condition holds as
long as the k-th moment of X does not grow too quickly with k. It holds for many distributions in
practice.
We need some preparation before proving the uniqueness of moments theorem.
Lemma 4.9.3 (Tail bounds for U-statistics). Let U : [0, 1]² → [−1, 1] be a symmetric measurable
function. Let x1, . . . , xk ∈ [0, 1] be chosen independently and uniformly at random. Let ε > 0.
Then
P( | (1/(k choose 2)) Σ_{1≤i<j≤k} U(xi, xj) − ∫_{[0,1]²} U | ≥ ε ) ≤ 2e^{−ε²k/8}.
Proof. Let f(x1, . . . , xk) denote the expression inside the absolute value. So E f = 0. Also, f
changes by at most 2(k − 1)/(k choose 2) = 4/k whenever we change exactly one coordinate. By the
bounded difference inequality, Theorem 4.4.4, we obtain
P(|f| ≥ ε) ≤ 2 exp( −2ε² / ((4/k)² k) ) = 2e^{−ε²k/8}.
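The lemma can be sanity-checked by simulation; the sketch below uses the (arbitrarily chosen) kernel U(x, y) = cos(2π(x + y)), which is symmetric, takes values in [−1, 1], and has integral 0 over the square, and it compares the empirical tail probability with the stated bound (which is far from tight for this kernel).

```python
import math
import random

def u_statistic(U, k, rng):
    """Average of U(x_i, x_j) over the (k choose 2) pairs of i.i.d. uniform samples."""
    xs = [rng.random() for _ in range(k)]
    pairs = k * (k - 1) // 2
    return sum(U(xs[i], xs[j]) for i in range(k) for j in range(i + 1, k)) / pairs

U = lambda x, y: math.cos(2 * math.pi * (x + y))   # symmetric, values in [-1,1], integral 0
rng = random.Random(0)
k, eps, trials = 100, 0.4, 500
deviations = [abs(u_statistic(U, k, rng)) for _ in range(trials)]
empirical = sum(d >= eps for d in deviations) / trials
print(empirical, "<=", 2 * math.exp(-eps ** 2 * k / 8))   # bound 2e^{-eps^2 k / 8}
```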
Let us now consider a variation of the W-random graph model from Section 4.4. Let x1, . . . , xk ∈
[0, 1] be chosen independently and uniformly at random. Let H(k, W) be an edge-weighted random
graph on vertex set [k] with edge ij having weight W(xi, xj), for each 1 ≤ i < j ≤ k. Note that this
definition makes sense for any symmetric measurable W : [0, 1]² → R. Furthermore, when W is a
graphon, i.e., W : [0, 1]² → [0, 1], the W-random graph G(k, W) can be obtained by independently
keeping each edge of H(k, W) with probability equal to its edge weight. We shall study the joint
distribution of G(k, W) and H(k, W) coupled through the above two-step process.
[Figure: a graphon W, a sample of the edge-weighted random graph H(k, W), and the coupled W-random graph G(k, W), for k = 5.]
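A sketch of the two-step coupling for a concrete (hypothetical) graphon, say W(x, y) = xy: first sample the latent coordinates and the edge weights of H(k, W), then flip independent coins to obtain G(k, W).

```python
import random

def sample_H_and_G(W, k, rng):
    """Sample the edge-weighted graph H(k, W) and the coupled W-random graph G(k, W)."""
    xs = [rng.random() for _ in range(k)]                   # latent uniform coordinates
    H = {(i, j): W(xs[i], xs[j]) for i in range(k) for j in range(i + 1, k)}
    G = {e for e, w in H.items() if rng.random() < w}       # keep edge e with probability its weight
    return xs, H, G

W = lambda x, y: x * y            # an example graphon (an assumption, not from the text)
xs, H, G = sample_H_and_G(W, 5, random.Random(0))
print(sorted(G))                  # the edges of the sampled G(5, W)
```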
Similar to Definition 4.2.4 of the cut distance δ_□, define the distance based on the L¹ norm:
δ_1(W, U) := inf_φ ‖W − U^φ‖_1,
where the infimum is taken over all invertible measure preserving maps φ : [0, 1] → [0, 1]. Since
‖·‖_□ ≤ ‖·‖_1, we have δ_□ ≤ δ_1.
Corollary 4.9.5 (Inverse counting lemma). For every ε > 0 there is some η > 0 and a positive integer k
such that if U and W are graphons with
|t(F, U) − t(F, W)| ≤ η whenever v(F) ≤ k,
then δ_□(U, W) ≤ ε.
Exercise 4.9.6. Prove the inverse counting lemma (Corollary 4.9.5) using the compactness of the
graphon space (Theorem 4.2.7) and the uniqueness of moments (Theorem 4.9.1).
Remark 4.9.7. The inverse counting lemma was first proved by Borgs, Chayes, Lovász, Sós, and
Vesztergombi (2008) in the following quantitative form:
Theorem 4.9.8 (Inverse counting lemma). Let k be a positive integer. Let U and W be graphons
with
|t(F, U) − t(F, W)| ≤ 2^{−k²} whenever v(F) ≤ k.
Then
δ_□(U, W) ≤ C / √(log k),
where C is some absolute constant.
Exercise 4.9.9∗ (Generalized maximum cut). For symmetric measurable functions W, U : [0, 1]² →
R, define
C(W, U) := sup_φ ⟨W, U^φ⟩ = sup_φ ∫ W(x, y) U(φ(x), φ(y)) dx dy,
where φ ranges over all invertible measure preserving maps [0, 1] → [0, 1]. Extend the definition
of C(·, ·) to graphs by C(G, ·) := C(WG, ·), etc.
(a) Is C(U, W) continuous jointly in (U, W) with respect to the cut norm? Is it continuous in U if
W is held fixed?
(b) Show that if W1 and W2 are graphons such that C(W1, U) = C(W2, U) for all graphons U, then
δ (W1, W2 ) = 0.
(c) Let G1, G2, . . . be a sequence of graphs such that C(Gn, U) converges as n → ∞ for every
graphon U. Show that G1, G2, . . . is convergent.
(d) Can the hypothesis in (c) be replaced by “C(Gn, H) converges as n → ∞ for every graph H”?
Further reading
The book Large Networks and Graph Limits by Lovász (2012) is the authoritative reference on
the subject. His survey article titled Very Large Graphs (2009) also gives an excellent overview.
CHAPTER 5
The minimum (or rather infimum) p^4 is not attained by any single graph, but rather by a sequence of
quasirandom graphs (see Section 3.1). However, if we enlarge the space from graphs G to graphons
W, then the minimizer is attained, in this case by the constant graphon p.
There are many important open problems on graph homomorphism inequalities. A major
conjecture in extremal combinatorics is Sidorenko’s conjecture (1993) (an equivalent conjecture
was given earlier by Erdős and Simonovits).
Definition 5.0.3. We say that a graph F is Sidorenko if for every graph G,
t(F, G) ≥ t(K2, G)^{e(F)}.
Sidorenko's conjecture has the equivalent graphon formulation: for every bipartite graph F and
graphon W,
t(F, W) ≥ t(K2, W)^{e(F)}.
Note that equality occurs when W ≡ p, the constant graphon. One can think of Sidorenko's
conjecture as a separate problem for each F, asking to minimize t(F, W) among graphons W
with ∫ W ≥ p. Whether the constant graphon is the unique minimizer is the subject of an even
stronger conjecture known as the forcing conjecture.
Definition 5.0.5. We say that a graph F is forcing if every graphon W with t(F, W) = t(K2, W)^{e(F)}
is a constant graphon (up to a set of measure zero).
By translating back and forth between graph limits and sequences of graphs, being forcing
is equivalent to the quasirandomness condition. Thus any forcing graph can play the role of C4
in Theorem 3.1.1. This is what led Chung, Graham, and Wilson to consider forcing graphs. In
particular, C4 is forcing.
Proposition 5.0.6 (Forcing and quasirandomness). A graph F is forcing if and only if for every
constant p ∈ [0, 1], every sequence of graphs G = Gn with
t(K2, G) = p + o(1) and t(F, G) = p^{e(F)} + o(1)
is quasirandom in the sense of Definition 3.1.2.
Exercise 5.0.7. Prove Proposition 5.0.6.
Figure 5.1.1. The edge–triangle region. (Figure adapted from Lovász (2012).)
The forcing conjecture, below, states a complete characterization of forcing graphs (Skokan
and Thoma 2004; Conlon, Fox, and Sudakov 2010).
Conjecture 5.0.8 (Forcing conjecture). A graph is forcing if and only if it is bipartite and not a
tree.
Exercise 5.0.9. Prove the “only if” direction of the forcing conjecture.
Exercise 5.0.10. Prove that every forcing graph is Sidorenko.
Exercise 5.0.11 (Forcing and stability). Show that a graph F is forcing if and only if for every ε > 0,
there exists δ > 0 such that if a graph G satisfies t(F, G) ≤ t(K2, G)^{e(F)} + δ, then δ_□(G, p) ≤ ε.
Exercise 5.0.12. Let F be a bipartite graph. Suppose there is some constant c > 0 such that
t(F, G) ≥ c·t(K2, G)^{e(F)} for all graphs G. Show that F is Sidorenko.
5.1. Edge versus triangle densities
What are all the pairs of edge and triangle densities that can occur in a graph (or graphon)?
We would like to determine the
edge-triangle region := {(t(K2, W), t(K3, W)) : W graphon} ⊂ [0, 1]2 . (5.1.1)
This is a closed subset of [0, 1]2 , due to the compactness of the space of graphons. This set has
been completely determined, and it is illustrated in Figure 5.1.1. We will discuss its features in this
section.
The upper and lower boundaries of this region correspond to the answers of the following
question.
Question 5.1.1. Fix p ∈ [0, 1]. What are the minimum and maximum possible t(K3, W) among all
graphons with t(K2, W) = p?
For a given p ∈ [0, 1], the set {t(K3, W) : t(K2, W) = p} is a closed interval. Indeed, if W0
achieves the minimum triangle density, and W1 achieves the maximum, then their linear interpolation
Wt = (1 − t)W0 + tW1 , ranging over 0 ≤ t ≤ 1, must have triangle density continuously interpolating
between those of W0 and W1 , and therefore achieves every intermediate value.
The maximization part of Question 5.1.1 is easier. The answer is p^{3/2}.
Remark 5.1.4. We will see additional proofs of Theorem 5.1.2 not invoking eigenvalues later in
Exercise 5.2.13 and in Section 5.3. Theorem 5.1.2 is an inequality in “physical space” (as opposed
to going into the “frequency space” of the spectrum), and it is a good idea to think about how to
prove it while staying in the physical space.
More generally, the clique graphon (5.1.2) also maximizes clique densities among all graphons of a
given edge density.
Theorem 5.1.5 (Maximum clique density). For any graphon W and integer k ≥ 3,
t(Kk, W) ≤ t(K2, W)^{k/2}.
Proof. There exist integers a, b ≥ 0 such that k = 3a + 2b (e.g., take a = 1 if k is odd and a = 0 if
k is even). Then aK3 + bK2 (a disjoint union of a triangles and b isolated edges) is a subgraph of
Kk. So
t(Kk, W) ≤ t(aK3 + bK2, W) = t(K3, W)^a t(K2, W)^b ≤ t(K2, W)^{3a/2 + b} = t(K2, W)^{k/2}.
Remark 5.1.6 (Kruskal–Katona theorem). Thanks to a theorem of Kruskal (1963) and Katona
(1968), the exact answer to the following non-asymptotic question is completely known:
What is the maximum number of copies of Kk ’s in an n-vertex graph with m edges?
When m = (a choose 2) for some integer a, the optimal graph is a clique on a vertices. More generally,
for any value of m, the optimal graph is obtained by adding edges in colexicographic order:
12, 13, 23, 14, 24, 34, 15, 25, 35, 45, . . .
This is stronger than Theorem 5.1.5, which only gives an asymptotically tight answer as n → ∞.
The full Kruskal–Katona theorem also answers:
What is the maximum number of k-cliques in an r-graph with n vertices and m edges?
When m = (a choose r) for some integer a, the optimal r-graph is a clique on a vertices. (An asymptotic version of this
statement can be proved using techniques in Section 5.3.) More generally, the optimal r-graph is
obtained by adding the edges in colexicographic order. For example, for 3-graphs, the edges should
be added in the following order:
123, 124, 134, 234, 125, 135, 235, 145, 245, 345, . . .
Here a1 . . . ar < b1 . . . br in colexicographic order if ai < bi at the last index i where ai ≠ bi (i.e.,
dictionary order when read from right to left). Here we sort the elements of each r-tuple in
increasing order.
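The colexicographic order is easy to generate programmatically; the sketch below lists r-element subsets of the positive integers in colex order (compare with the sequences 12, 13, 23, 14, . . . and 123, 124, 134, 234, 125, . . . displayed above).

```python
from itertools import count, islice
from math import comb

def colex(r):
    """Yield the r-element subsets of {1, 2, 3, ...} in colexicographic order."""
    if r == 0:
        yield ()
        return
    for m in count(r):   # m = largest element; all subsets of {1,...,m-1} come earlier in colex
        for rest in islice(colex(r - 1), comb(m - 1, r - 1)):
            yield rest + (m,)

print(list(islice(colex(2), 6)))   # (1,2) (1,3) (2,3) (1,4) (2,4) (3,4)
print(list(islice(colex(3), 7)))   # (1,2,3) (1,2,4) (1,3,4) (2,3,4) (1,2,5) (1,3,5) (2,3,5)
```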
The Kruskal–Katona theorem can be proved by a compression/shifting argument. The idea is
to repeatedly modify the graph so that we eventually end up at the optimal graph. At each step,
we "push" all the edges towards a clique along some "direction" in a way that does not reduce the
number of k-cliques in the graph.
Now we turn to the lower boundary of the edge-triangle region. What is the minimum triangle
density in a graph of given edge density p?
For p ≤ 1/2, we can have complete bipartite graphs of density p + o(1), which are triangle-free.
For p > 1/2, the triangle density must be positive due to Mantel’s theorem (Theorem 1.1.1) and
supersaturation (Theorem 1.3.3). It turns out that among graphs with edge density p + o(1), the
triangle density is asymptotically minimized by certain complete multipartite graphs, although this
is not easy to prove.
For each positive integer k, we have
t(K2, Kk) = 1 − 1/k and t(K3, Kk) = (1 − 1/k)(1 − 2/k).
As k ranges over all positive integers, these pairs form special points on the lower boundary of
the edge-triangle region, as illustrated in Figure 5.1.1. (Recall that Kk is associated to the same
graphon as a complete k-partite graph with equal parts.)
Now suppose the given edge density p lies strictly between 1 − 1/(k − 1) and 1 − 1/k for some
integer k ≥ 2. To obtain the graphon with edge density p and minimum triangle density, we first
start with Kk with all vertices having equal weight. And then shrink the relative weight of exactly
one of the k vertices (while keeping the remaining k − 1 vertices to have the same vertex weight).
For example, the graphon illustrated below is obtained by starting with K4 and shrinking the weight
on one vertex.
I1 I2 I3 I4
I1 0 1 1 1
I2 1 0 1 1
I3 1 1 0 1
I4 1 1 1 0
During this process, the total edge density (accounting for vertex weights) decreases continuously
from 1 − 1/k to 1 − 1/(k − 1). At some point, the edge density equals p. This vertex-weighted
k-clique W turns out to minimize the triangle density among all graphons with edge density p.
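To make the construction concrete, the sketch below computes the edge and triangle homomorphism densities of the vertex-weighted k-clique with weights a1 = · · · = a_{k−1} = a and a_k = 1 − (k − 1)a, and tunes a numerically so that the edge density equals a target p; the helper names are hypothetical.

```python
from itertools import permutations

def clique_graphon_densities(weights):
    """t(K2, W) and t(K3, W) for the weighted-clique graphon with the given vertex weights."""
    edge = sum(a * b for a, b in permutations(weights, 2))
    tri = sum(a * b * c for a, b, c in permutations(weights, 3))
    return edge, tri

def weights_for_edge_density(p, k, iters=60):
    """Binary-search the common weight a of the first k-1 vertices so that t(K2, W) = p."""
    lo, hi = 1 / k, 1 / (k - 1)        # edge density decreases from 1 - 1/k to 1 - 1/(k-1) as a grows
    for _ in range(iters):
        a = (lo + hi) / 2
        w = [a] * (k - 1) + [1 - (k - 1) * a]
        if clique_graphon_densities(w)[0] > p:
            lo = a
        else:
            hi = a
    return [a] * (k - 1) + [1 - (k - 1) * a]

p = 0.7                                 # strictly between 1 - 1/3 and 1 - 1/4, so k = 4
w = weights_for_edge_density(p, k=4)
print(clique_graphon_densities(w))      # edge density ~ 0.7, plus the triangle density of the conjectured minimizer
```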
The above claim is much more difficult to prove than the maximum triangle density result. This
theorem, stated below, due to Razborov (2008), was proved using an involved Cauchy–Schwarz
calculus that he coined flag algebra. We will say a bit more about this method in Section 5.2.
Theorem 5.1.7 (Minimum triangle density). Fix 0 ≤ p ≤ 1 and let k = ⌈1/(1 − p)⌉. The minimum
of t(K3, W) among graphons W with t(K2, W) = p is attained by the step function W associated
to a k-clique with node weights a1, a2, . . . , ak summing to 1, with a1 = · · · = a_{k−1} ≥ ak and
t(K2, W) = p.
We will not prove this theorem in full here. See Lovász (2012, Section 16.3.2) for a presentation
of the proof of Theorem 5.1.7. Later in this Chapter, we give lower bounds that match the edge-
triangle region at the cliques. In particular, Theorem 5.4.4 will allow us to determine the convex
hull of the region.
The graphon described in Theorem 5.1.7 turns out not to be unique unless p = 1 − 1/k for some
positive integer k. Indeed, suppose 1 − 1/(k − 1) < p < 1 − 1/k. Let I1, . . . , Ik be the partition of [0, 1]
into the intervals corresponding to the vertices of the vertex-weighted k-clique, with I1, . . . , I_{k−1} all
having equal length and Ik strictly smaller length. We can replace the graphon on (I_{k−1} ∪ Ik)²
by any triangle-free graphon without changing the edge density (why is this possible?).
[Figure: the step graphon of the weighted 4-clique on intervals I1, . . . , I4, with the block on (I3 ∪ I4)² replaced by an arbitrary triangle-free graphon.]
This operation does not change the edge-density or the triangle-density of the graphon (check!).
The non-uniqueness of the minimizer hints at the difficulty of the result.
This completes our discussion of the edge-triangle region (Figure 5.1.1).
Theorem 5.1.7 was generalized from K3 to K4 (Nikiforov 2011), and then to all cliques Kr
(Reiher 2016). The construction of the minimizing graphon is the same as in the triangle case.
Theorem 5.1.8 (Minimum clique density). Fix 0 ≤ p ≤ 1 and let k = ⌈1/(1 − p)⌉. The minimum
of t(Kr, W) among graphons W with t(K2, W) = p is attained by the step function W associated
to a k-clique with node weights a1, a2, . . . , ak summing to 1, with a1 = · · · = a_{k−1} ≥ ak and
t(K2, W) = p.
5.2. Cauchy–Schwarz
We will apply the Cauchy–Schwarz inequality in the following form: given real-valued functions
f and g on the same measure space X (always assuming the usual measurability assumptions without further
comment), we have
(∫_X f g)² ≤ (∫_X f²)(∫_X g²).
It is one of the most versatile inequalities in combinatorics.
To emphasize the variables being integrated, we write them below the integral sign. The
domain of integration (usually [0, 1] for each variable) is omitted to avoid clutter. We write
∫_{x,y,...} f(x, y, . . . ) for ∫ f(x, y, . . . ) dx dy · · · .
In practice, we will often apply the Cauchy–Schwarz inequality by changing the order of
integration, and separating an integral into an outer integral and an inner integral.
A typical application of the Cauchy–Schwarz inequality is demonstrated in the following cal-
culation (here one should think of x, y, z each as collections of variables):
∫_{x,y,z} f(x, y) g(x, z) = ∫_x (∫_y f(x, y)) (∫_z g(x, z))
≤ (∫_x (∫_y f(x, y))²)^{1/2} (∫_x (∫_z g(x, z))²)^{1/2}
= (∫_{x,y,y'} f(x, y) f(x, y'))^{1/2} (∫_{x,z,z'} g(x, z) g(x, z'))^{1/2}.
Note that in the final step, “expanding a square” has the effect of “duplicating a variable.” It is
useful to recognize expressions with duplicated variables that can be folded back into a square.
Let us warm up by proving that K2,2 is Sidorenko. We actually already proved this statement in
Proposition 3.1.12 in the context of the Chung–Graham–Wilson theorem on quasirandom graphs.
We repeat the same calculations here to demonstrate the integral notation.
Proof.
t(K1,2, W) = ∫_{x,y,y'} W(x, y)W(x, y') = ∫_x (∫_y W(x, y))² ≥ (∫_{x,y} W(x, y))² = t(K2, W)².
Proof.
t(K2,2, W) = ∫_{x,y,z,z'} W(x, z)W(x, z')W(y, z)W(y, z')
= ∫_{x,y} (∫_z W(x, z)W(y, z))² ≥ (∫_{x,y,z} W(x, z)W(y, z))² = t(K1,2, W)².
Proofs involving Cauchy–Schwarz are sometimes called "sum-of-squares" proofs. The Cauchy–
Schwarz inequality can be proved by writing the difference between the two sides as a sum of squared
quantities:
(∫ f²)(∫ g²) − (∫ f g)² = (1/2) ∫_{x,y} ( f(x)g(y) − f(y)g(x) )².
Commonly, g = 1, in which case we can also write
∫ f² − (∫ f)² = ∫_x ( f(x) − ∫_y f(y) )².
The next inequality tells us that if we color the edges of Kn using two colors, then at least
a (1/4 + o(1)) fraction of all triangles are monochromatic (Goodman 1959). Note that the constant 1/4
is tight, since it is attained by a uniformly random coloring.
Although it was initially conjectured that all graphs are common, this turns out to be false. In
particular, Kt fails to be common for every t ≥ 4 (Thomason 1989).
It is not too hard to show that every Sidorenko graph is common. Recall that every Sidorenko
graph is bipartite, and it is conjectured that every bipartite graph is Sidorenko. On the other hand,
the triangle is common but not bipartite.
Proof. Suppose F is Sidorenko. Let p = t(K2, W). Then t(F, W) ≥ p^{e(F)} and t(F, 1 − W) ≥
t(K2, 1 − W)^{e(F)} = (1 − p)^{e(F)}. Adding up and using convexity,
We also have the following lower bound on the minimum triangle density given edge density
(Goodman 1959).
Here is a plot of Goodman's bound against the true edge–triangle region (figure from Lovász
(2012)). The inequality is tight whenever W is the graphon associated to Kn, in which case t(K3, W) =
(1 − 1/n)(1 − 2/n) and t(K2, W) = 1 − 1/n. In particular, Goodman's bound implies Mantel's
theorem: t(K2, W) > 1/2 implies t(K3, W) > 0.
Thus
t(K3, W) = ∫_{x,y,z} W(x, y)W(x, z)W(y, z)
≥ ∫_{x,y,z} W(x, y)(W(x, z) + W(y, z) − 1)
= 2t(K1,2, W) − t(K2, W)
≥ 2t(K2, W)² − t(K2, W).
Remark 5.2.9 (Flag algebra). The above examples were all simple enough to be found by hand.
As mentioned earlier, every application of the Cauchy–Schwarz inequality can be rewritten in the
form of a sum of squares. One can actually search for these sum-of-squares proofs more
systematically using a computer program. This idea, first introduced by Razborov (2007), can be
combined with other sophisticated methods to determine the lower boundary of the edge–triangle
region (Razborov 2008). Razborov coined the term flag algebra to describe a formalization of such
calculations. The technique is also sometimes called graph algebra, Cauchy–Schwarz calculus,
or sum-of-squares proof.
Conceptually, the idea is that we are looking for all the ways to obtain nonnegative linear
combinations of squared expressions. In a typical application, one is asked to solve an extremal
problem of the form
Minimize t(F0, W)
Subject to t(F1, W) = q1, . . . , t(Fℓ, W) = qℓ,
W a graphon.
The technique is very flexible. The objectives and constraints could be any linear combinations
of densities, and one could maximize instead of minimize. Extensions of the technique can handle
wider classes of extremal problems, such as for hypergraphs, directed graphs, edge-colored graphs,
permutations, and more.
To demonstrate the technique for graphons, note that we obtain for "free" inequalities such
as
∫_{x,y,z} W(x, y)W(x, z) ( ∫_{u,w} ( a W(x, u)W(y, u) − b W(x, w)W(w, u)W(u, z) + c ) )² ≥ 0
due to the nonnegativity of squares. Here a, b, c ∈ R are constants (to be chosen). Expanding the
above expression, by first replacing
( ∫_{u,w} G_{x,y,z}(u, w) )² by ∫_{u,w,u',w'} G_{x,y,z}(u, w) G_{x,y,z}(u', w'),
where G_{x,y,z}(u, w) denotes the inner integrand above,
we obtain a linear combination of t(F, W) over various F, with coefficients depending on the undetermined
reals a, b, c, that is guaranteed to be nonnegative.
The idea is now to consider all such nonnegative expressions (in practice, on a computer,
we consider a large but finite set of such inequalities). Then we try to optimize the previously
undetermined real coefficients (a, b, c above) so that, by adding together an optimized nonnegative
linear combination of all such inequalities and combining with the given constraints, we
obtain t(F0, W) ≥ α for some real α. We can find such coefficients and nonnegative combinations
efficiently using a semidefinite program (SDP) solver. This would then prove a bound on the
desired extremal problem. If we also happen to have an example of W satisfying the constraints
and matching the bound, i.e., t(F0, W) = α, then we have solved the extremal problem.
The flag algebra method, with computer assistance, has successfully solved many interesting
extremal problems in graph theory. For example, a conjecture of Erdős (1984) on the maximum
pentagon density in a triangle-free graph was solved using flag algebra methods; the extremal
construction is a blow-up of a 5-cycle (Grzesik 2012; Hatami, Hladký, Kráľ, Norine, and Razborov
2013).
Theorem 5.2.10. Every n-vertex triangle-free graph has at most (n/5)^5 cycles of length 5.
Let us mention another nice result obtained using the flag algebra method. Pippenger and
Golumbic (1975) asked to determine the maximum possible number of induced copies of a given
graph H among all n-vertex graphs. The optimal limiting density (as a fraction of (n choose v(H)), as n → ∞)
is called the inducibility of the graph H. They conjectured that for every k ≥ 5, the inducibility of the
k-cycle is k!/(k^k − k), attained by an iterated blow-up of a k-cycle (the case k = 5 is illustrated below; in the
limit there should be infinitely many fractal-like iterations).
The conjecture for 5-cycles was proved by Balogh, Hu, Lidický, and Pfender (2016) using flag
algebra methods combined with additional “stability” methods.
Theorem 5.2.11. Every n-vertex graph has at most n^5/(5^5 − 5) induced 5-cycles.
Although the flag algebra method has successfully solved several extremal problems, in many
interesting cases, the method does not give a tight bound. Nevertheless, for many open extremal
problems, such as the tetrahedron hypergraph Turán problem, the best known bound comes from
this approach.
Remark 5.2.12 (Incompleteness). Can every true linear inequality for graph homomorphism den-
sities be proved via Cauchy–Schwarz/sum-of-squares?
Before giving the answer, we first discuss classical results about real polynomials. Suppose
p(x1, . . . , xn) is a real polynomial such that p(x1, . . . , xn) ≥ 0 for all x1, . . . , xn ∈ R. Can such a
nonnegative polynomial always be written as a sum of squares of polynomials? Hilbert (1888; 1893) proved that
the answer is yes for n = 1 and no in general for n ≥ 2. The first explicit counterexample was given
by Motzkin (1967):
p(x, y) = x⁴y² + x²y⁴ + 1 − 3x²y²
is always nonnegative due to the AM–GM inequality, but it cannot be written as a sum
of squares of polynomials. Solving Hilbert's 17th problem, Artin (1927) proved that every nonnegative polynomial
p(x1, . . . , xn) can be written as a sum of squares of rational functions, i.e., there is some nonzero polynomial q such
that pq² can be written as a sum of squares of polynomials. For the earlier example,
p(x, y) = [ x²y²(x² + y² + 1)(x² + y² − 2)² + (x² − y²)² ] / (x² + y²)².
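The displayed identity can be verified symbolically; here is a quick check (a sketch using sympy, if available) that the rational sum-of-squares expression really equals the Motzkin polynomial.

```python
import sympy as sp

x, y = sp.symbols("x y")
motzkin = x**4 * y**2 + x**2 * y**4 + 1 - 3 * x**2 * y**2
sos = (x**2 * y**2 * (x**2 + y**2 + 1) * (x**2 + y**2 - 2)**2 + (x**2 - y**2)**2) / (x**2 + y**2)**2
print(sp.simplify(motzkin - sos) == 0)   # True: the displayed identity holds
```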
Turning back to inequalities between graph homomorphism densities: if f(W) = Σ_i c_i t(F_i, W)
is nonnegative for every graphon W, can f always be written as a nonnegative sum of squares of
rational expressions in the densities t(F, W)? In other words, can every true inequality be proved using a finite
number of Cauchy–Schwarz inequalities (i.e., via vanilla flag algebra calculations)?
It turns out that the answer is no (Hatami and Norine 2011). Indeed, if there were always a
sum-of-squares proof, then we would obtain an algorithm for deciding whether f(W) ≥ 0 (with
rational coefficients, say) holds for all graphons W, thereby contradicting the undecidability of
this problem (Remark 5.0.1). Consider the algorithm that enumerates over all possible forms of
sum-of-squares expressions (with undetermined coefficients that can then be solved for) and in
parallel enumerates over all graphs G and checks whether f(G) ≥ 0. If every true inequality had
a sum-of-squares proof, then this algorithm would always terminate and tell us whether f(W) ≥ 0
holds for all graphons W.
It turns out that some simple-looking inequalities, such as the fact that the 3-edge path is Sidorenko,
cannot be written as a sum of squares (Blekherman, Raymond, Singh, and Thomas 2020).
Exercise 5.2.13 (Another proof of maximum triangle density). Let W : [0, 1]² → R be a symmetric
measurable function. Write W² for the function taking value W²(x, y) = W(x, y)².
(1) Show that t(C4, W) ≤ t(K2, W²)².
(2) Show that t(K3, W) ≤ t(K2, W²)^{1/2} t(C4, W)^{1/2}.
Combining the two inequalities, we deduce t(K3, W) ≤ t(K2, W²)^{3/2}, which is somewhat stronger
than Theorem 5.1.2. We will see another proof below in Corollary 5.3.7.
5.3. Hölder
Hölder's inequality is a generalization of the Cauchy–Schwarz inequality. It says that given
p1, . . . , pk ≥ 1 with 1/p1 + · · · + 1/pk = 1, and real-valued functions f1, . . . , fk on a common measure space,
we have
∫ f1 f2 · · · fk ≤ ‖f1‖_{p1} · · · ‖fk‖_{pk},
where the p-norm of a function f is defined by
‖f‖_p := (∫ |f|^p)^{1/p}.
We can apply Hölder's inequality to show that Kr,r is Sidorenko. The proof is essentially
identical to the proof of Theorem 5.2.1 that t(K2,2, W) ≥ t(K2, W)^4 from the previous section,
except that we now apply Hölder's inequality instead of the Cauchy–Schwarz inequality. We
outline the steps below and leave the details as an exercise.
Theorem 5.3.1. For every graphon W, t(Kr,r, W) ≥ t(K2, W)^{r²}.
Next, we apply the Cauchy–Schwarz inequality in the variable y (this affects f and h while leaving
g intact). Continuing the above inequality,
≤ ∫_z (∫_{x,y} f(x, y)²)^{1/2} (∫_x g(x, z)²)^{1/2} (∫_y h(y, z)²)^{1/2}.
Finally, we apply the Cauchy–Schwarz inequality in the variable z (this affects g and h while leaving
f intact). Continuing the above inequality,
≤ (∫_{x,y} f(x, y)²)^{1/2} (∫_{x,z} g(x, z)²)^{1/2} (∫_{y,z} h(y, z)²)^{1/2}.
Let |·| denote both volume and area (depending on the dimension), and let π_{xy}(K) denote
the projection of K onto the xy-plane, and likewise for the other coordinate planes. Using
1_K(x, y, z) ≤ f(x, y)g(x, z)h(y, z) with f, g, h the indicator functions of the three projections, Theorem 5.3.4 implies
|K|² ≤ |π_{xy}(K)| |π_{xz}(K)| |π_{yz}(K)|. (5.3.1)
This shows that if all three projections have area at most 1, then |K| ≤ 1.
The inequality (5.3.1), which holds more generally in higher dimensions, is due to Loomis and
Whitney (1949). It has important applications in combinatorics. A powerful generalization known
as Shearer’s entropy inequality will be discussed in Section 5.5.
Now let us state a more general form of Theorem 5.3.4, which can be proved using the same
techniques. The key point of the inequality in Theorem 5.3.4 is that each of the variables (x, y, z) is
contained in exactly 2 of the factors (f, g, h). Everything works the same way when each
variable is contained in exactly k factors, as long as we use L^k norms on the right-hand side.
For example,
∫_{u,v,w,x,y,z} f1(u, v) f2(v, w) f3(w, z) f4(x, y) f5(y, z) f6(z, u) f7(u, x) f8(v, x) f9(w, y) ≤ Π_{i=1}^9 ‖fi‖_3.
Here the factors in the integral correspond to the edges of a 3-regular graph on the six vertices u, v, w, x, y, z, illustrated below. In particular, every variable lies in exactly 3 factors.
[Figure: a 3-regular graph on the vertices u, v, w, x, y, z, with one edge for each factor in the integral above.]
More generally, each function fi can take as input any number of variables, as long as every variable
appears in exactly k functions. For example
∫_{w,x,y,z} f(w, x, y) g(w, y, z) h(x, z) ≤ ‖f‖_2 ‖g‖_2 ‖h‖_2.
The inequality is stated more generally below. Given x = (x1, . . . , xm) ∈ X1 × · · · × Xm and I ⊆ [m],
we write π_I(x) = (x_i)_{i∈I} ∈ Π_{i∈I} X_i for the projection onto the coordinate subspace indexed by I.
Theorem 5.3.6 (Generalized Hölder). Let X1, . . . , Xm be measure spaces. Let A1, . . . , Aℓ ⊆ [m]
be such that each element of [m] appears in exactly k of the sets Ai. For each i ∈ [ℓ], let fi : Π_{j∈Ai} Xj →
R. Then
∫_{X1×···×Xm} f1(π_{A1}(x)) · · · fℓ(π_{Aℓ}(x)) dx ≤ ‖f1‖_k · · · ‖fℓ‖_k.
Furthermore, if every Xi is a probability space, then we can relax the hypothesis to "each element
of [m] appears in at most k of the sets Ai".
The version of Theorem 5.3.6 with each Xi being a probability space is useful for graphons.
Corollary 5.3.7. For any graph F with maximum degree at most k and any graphon W,
t(F, W) ≤ ‖W‖_k^{e(F)}.
In particular, since
‖W‖_k^k = ∫ W^k ≤ t(K2, W),
the inequality implies that
t(F, W) ≤ t(K2, W)^{e(F)/k}.
This implies the upper bound on clique densities (Theorems 5.1.2 and 5.1.5). The stronger statement
of Corollary 5.3.7, with the L^k norm of W on the right-hand side, has no direct interpretation in terms of
subgraph densities, but it is important for certain applications, such as understanding large
deviation rates in random graphs (Lubetzky and Zhao 2017).
More generally, using different L^p norms for different factors in Hölder's inequality, we have
the following statement (Finner 1992).
Theorem 5.3.8 (Generalized Hölder). Let X1, . . . , Xm be measure spaces. For each i ∈ [ℓ], let
pi ≥ 1, let Ai ⊆ [m], and let fi : Π_{j∈Ai} Xj → R. If either
(1) Σ_{i : j∈Ai} 1/pi = 1 for each j ∈ [m],
OR
(2) each Xi is a probability space and Σ_{i : j∈Ai} 1/pi ≤ 1 for each j ∈ [m],
then
∫_{X1×···×Xm} f1(π_{A1}(x)) · · · fℓ(π_{Aℓ}(x)) dx ≤ ‖f1‖_{p1} · · · ‖fℓ‖_{pℓ}.
The proof proceeds by applying Hölder's inequality m times in succession, once for each variable
xi ∈ Xi, nearly identically to the proof of Theorem 5.3.4.
Now we turn to another graph inequality where the above generalization of Hölder's
inequality plays a key role.
Question 5.3.9. Fix d. Among d-regular graphs G, which graph maximizes i(G)^{1/v(G)}, where i(G)
denotes the number of independent sets of G?
The answer turns out to be G = Kd,d. We can also take G to be a disjoint union of copies of
Kd,d, and this would not change i(G)^{1/v(G)}. This result, stated below, was shown by Kahn (2001)
for bipartite regular graphs G, and later extended by Zhao (2010) to all regular graphs G.
Indeed, a map between their vertex sets forms a graph homomorphism if and only if the set of vertices of
G that map to the non-looped vertex is an independent set of G.
Let us first prove Theorem 5.3.10 for bipartite regular G. The following more general inequality
was shown by Galvin and Tetali (2004). It implies the bipartite case of Theorem 5.3.10 by the
above discussion.
Theorem 5.3.11. For every n-vertex d-regular bipartite graph G and any graph H (allowing looped vertices
in H),
hom(G, H) ≤ hom(Kd,d, H)^{n/(2d)}.
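For small graphs, hom(G, H) can be computed by brute force, which makes it easy to spot-check the inequality; the sketch below compares hom(C6, H) with hom(K2,2, H)^{6/4} for a small, arbitrarily chosen target graph H (a triangle with a loop at one vertex).

```python
from itertools import product

def hom(G_edges, nG, H_adj):
    """Count homomorphisms from G (edge list on vertex set range(nG)) to H (adjacency matrix)."""
    nH = len(H_adj)
    return sum(
        all(H_adj[f[u]][f[v]] for u, v in G_edges)
        for f in product(range(nH), repeat=nG)
    )

C6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # bipartite 2-regular, 6 vertices
K22 = [(0, 2), (0, 3), (1, 2), (1, 3)]                   # K_{2,2}, 4 vertices
H = [[0, 1, 1], [1, 0, 1], [1, 1, 1]]                    # a triangle with a loop at vertex 2

lhs = hom(C6, 6, H)
rhs = hom(K22, 4, H) ** (6 / (2 * 2))                    # exponent n/(2d) with n = 6, d = 2
print(lhs, rhs, lhs <= rhs)                              # 199 <= 35^{1.5}
```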
This function should be thought of as the codegree of the vertices x1 and x2. Then, grouping the factors
in the integral according to their right endpoint, we have
[Figure: the 6-cycle C6 drawn with parts {x1, x2, x3} and {y1, y2, y3}.]
t(C6, W) = ∫_{x1,x2,x3,y1,y2,y3} W(x1, y1)W(x2, y1) W(x1, y2)W(x3, y2) W(x2, y3)W(x3, y3)
= ∫_{x1,x2,x3} (∫_{y1} W(x1, y1)W(x2, y1)) (∫_{y2} W(x1, y2)W(x3, y2)) (∫_{y3} W(x2, y3)W(x3, y3))
= ∫_{x1,x2,x3} f(x1, x2) f(x1, x3) f(x2, x3)
≤ ‖f‖_2^3 [by the generalized Hölder inequality, Theorem 5.3.6]
[Figure: the 4-cycle C4 drawn with parts {x1, x2} and {y1, y2}.]
This proves Theorem 5.3.12 in the case F = C6. The general case can be proved via a similar
calculation and is left to the reader as an exercise.
Remark 5.3.13. Kahn (2001) first proved the bipartite case of Theorem 5.3.10 using Shearer’s
entropy inequality, which we will see in Section 5.5. His technique was extended by Galvin and
Tetali (2004) to prove Theorem 5.3.11. The proof using generalized Hölder’s inequality presented
here was given by Lubetzky and Zhao (2017).
So far we have proved Theorem 5.3.10 for bipartite regular graphs. To prove it for all regular graphs,
we apply the following inequality of Zhao (2010). Here G × K2 (the tensor product with K2) is the bipartite
double cover of G. An example is illustrated below:
[Figure: a graph G and its bipartite double cover G × K2.]
The vertex set of G × K2 is V(G) × {0, 1}. Its vertices are labeled vi with v ∈ V(G) and i ∈ {0, 1}.
Its edges are u0 v1 for all uv ∈ E(G). Note that G × K2 is always a bipartite graph.
Theorem 5.3.14. For every graph G, i(G)² ≤ i(G × K2).
Assuming Theorem 5.3.14, we can now prove Theorem 5.3.10 by reducing the statement to the
bipartite case, which we proved earlier. Indeed, for every n-vertex d-regular graph G,
i(G) ≤ i(G × K2)^{1/2} ≤ i(Kd,d)^{n/(2d)},
where the last step follows from applying the bipartite case of Theorem 5.3.10 to the bipartite d-regular graph G × K2 (which has 2n vertices).
Proof of Theorem 5.3.14. Let 2G denote a disjoint union of two copies of G. Label its vertices by
vi with v ∈ V and i ∈ {0, 1} so that its edges are ui vi with uv ∈ E(G) and i ∈ {0, 1}. We will
give an injection φ : I(2G) → I(G × K2). Recall that I(G) is the set of independent sets of G. The
injection would imply i(G)² = i(2G) ≤ i(G × K2), as desired.
Fix an arbitrary order on all subsets of V(G). Let S be an independent set of 2G. Let
Ebad (S) := {uv ∈ E(G) : u0, v1 ∈ S}.
Note that Ebad (S) is a bipartite subgraph of G, since each edge of Ebad has exactly one endpoint
in {v ∈ V(G) : v0 ∈ S} but not both (or else S would not be independent). Let A denote the first
subset (in the previously fixed ordering) of V(G) such that all edges in Ebad (S) have one vertex in A
and the other outside A. Define φ(S) to be the subset of V(G) × {0, 1} obtained by “swapping” the
pairs in A, i.e., for all v ∈ A, vi ∈ φ(S) if and only if v1−i ∈ S for each i ∈ {0, 1}, and for all v ∉ A,
vi ∈ φ(S) if and only if vi ∈ S for each i ∈ {0, 1}. It is not hard to verify that φ(S) is an independent
set in G × K2 . The swapping procedure fixes the “bad” edges.
[Figure: an independent set of 2G and the corresponding independent set of G × K2 after swapping.]
It remains to verify that φ is an injection. For every S ∈ I(2G), once we know T = φ(S), we
can recover S by first setting
E′bad(T) = {uv ∈ E(G) : ui, vi ∈ T for some i ∈ {0, 1}},
so that Ebad(S) = E′bad(T), and then finding A as earlier and swapping the pairs of A back.
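The inequality i(G)² ≤ i(G × K2) is easy to verify exhaustively for small graphs; the brute-force sketch below checks it for the triangle K3, where the two sides genuinely differ. The helper names are ad hoc.

```python
from itertools import combinations

def independent_set_count(n, edges):
    """Count independent sets of a graph on vertex set range(n) by brute force."""
    count = 0
    for size in range(n + 1):
        for S in combinations(range(n), size):
            S = set(S)
            if all(not (u in S and v in S) for u, v in edges):
                count += 1
    return count

def bipartite_double_cover(n, edges):
    """G x K2: vertices (v, i); edges (u,0)-(v,1) and (v,0)-(u,1) for uv in E(G)."""
    verts = [(v, i) for v in range(n) for i in (0, 1)]
    index = {v: k for k, v in enumerate(verts)}
    cover_edges = [(index[(u, 0)], index[(v, 1)]) for u, v in edges]
    cover_edges += [(index[(v, 0)], index[(u, 1)]) for u, v in edges]
    return len(verts), cover_edges

triangle = (3, [(0, 1), (1, 2), (0, 2)])
i_G = independent_set_count(*triangle)
i_cover = independent_set_count(*bipartite_double_cover(*triangle))   # K3 x K2 is the 6-cycle
print(i_G ** 2, i_cover, i_G ** 2 <= i_cover)                          # 16 <= 18
```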
Remark 5.3.15 (Reverse Sidorenko). Does Theorem 5.3.11 generalize to all regular graphs G, as
Theorem 5.3.10 does? Unfortunately, no. For example, when H consists of two isolated loops,
hom(G, H) = 2^{c(G)}, where c(G) is the number of connected components of G. So hom(G, H)^{1/v(G)}
is maximized among d-regular graphs G by G = Kd+1, which is the connected d-regular graph
with the fewest vertices.
Theorem 5.3.11 actually extends to every triangle-free regular graph G. Furthermore, for every
regular graph G that contains a triangle, there is some graph H for which the inequality in Theorem 5.3.11
fails.
There are several interesting families of graphs H for which Theorem 5.3.11 is known to extend to all
regular graphs G. Notably, this is true for H = Kq, which is significant since hom(G, Kq) is the
number of proper q-colorings of G.
There are also generalizations of the above to non-regular graphs. For example, for a graph G
without isolated vertices, letting du denote the degree of u ∈ V(G), we have
i(G) ≤ Π_{uv∈E(G)} i(K_{du,dv})^{1/(du dv)}.
A similar statement holds for the number of proper q-colorings. In fact, the results mentioned in this remark
about regular graphs are proved by induction on the number of vertices of G, and thus require considering the
larger family of not necessarily regular graphs G.
The results discussed in this remark are due to Sah, Sawhney, Stoner, and Zhao (2019;
2020). They introduced the term reverse Sidorenko inequalities to describe these inequalities
t(F, W)^{1/e(F)} ≤ t(Kd,d, W)^{1/d²}, which mirror the inequality t(F, W)^{1/e(F)} ≥ t(K2, W) in Sidorenko's
conjecture. Also see the earlier survey by Zhao (2017) for discussions of related results and open
problems.
Exercise 5.3.16. Let F be a bipartite graph with vertex bipartition A ∪ B such that every vertex in
B has degree d. Let du denote the degree of u in F. Prove that for every graphon W,
t(F, W) ≤ Π_{uv∈E(F)} t(K_{du,dv}, W)^{1/(du dv)}.
Exercise 5.3.17. For a graph G, let fq(G) denote the number of maps f : V(G) → {0, 1, . . . , q} such
that f(u) + f(v) ≤ q for every uv ∈ E(G). Prove that for every n-vertex d-regular graph G (not
necessarily bipartite),
fq(G) ≤ fq(Kd,d)^{n/(2d)}.
5.4. Lagrangian
Here is another proof of Turán’s theorem due to Motzkin and Straus (1965). It can be viewed
as a continuous/analytic analogue of the Zykov symmetrization proof of Turán’s theorem from
Section 1.2 (the third proof there).
Theorem 5.4.1 (Turán's theorem). Every n-vertex Kr+1-free graph has at most (1 − 1/r)·n²/2 edges.
Proof. Let G be a Kr+1-free graph on vertex set [n]. Consider the function
f(x1, . . . , xn) = Σ_{ij∈E(G)} xi xj.
Remark 5.4.2 (Hypergraph Lagrangians). The Lagrangian of a hypergraph H with vertex set [n]
is defined to be
λ(H) := max { f(x1, . . . , xn) : x1, . . . , xn ≥ 0, x1 + · · · + xn = 1 }, where f(x1, . . . , xn) = Σ_{e∈E(H)} Π_{i∈e} xi.
It is a useful tool for certain hypergraph Turán problems. The above proof of Turán's theorem
shows that for every graph G, λ(G) = (1 − 1/ω(G))/2, where ω(G) is the size of the largest clique
in G. A maximizing x has coordinates 1/ω(G) on the vertices of a maximum clique and zero elsewhere.
As an alternative but equivalent perspective, the above proof can be rephrased in terms of maximizing
the edge density among Kr+1-free vertex-weighted graphs (vertex weights are given by the vector
x above). The proof shifts weight between non-adjacent vertices while not decreasing the edge
density, and this process preserves Kr+1-freeness.
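A quick numerical sanity check of the formula λ(G) = (1 − 1/ω(G))/2 (an illustrative sketch, with ad hoc helper names): evaluate the edge form at the uniform vector on a maximum clique and compare it against random points of the simplex.

```python
import random

def edge_form(edges, x):
    """f(x) = sum over edges ij of x_i x_j, the quantity maximized in the Lagrangian."""
    return sum(x[i] * x[j] for i, j in edges)

def random_simplex_point(n, rng):
    """A random point of {x >= 0, sum x = 1} via normalized exponentials."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

# A 5-vertex graph whose largest clique is the triangle {0, 1, 2}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
omega = 3
clique_point = [1 / omega if v < omega else 0.0 for v in range(5)]
predicted = (1 - 1 / omega) / 2                        # = 1/3 by Motzkin-Straus
rng = random.Random(0)
best_random = max(edge_form(edges, random_simplex_point(5, rng)) for _ in range(20000))
print(edge_form(edges, clique_point), predicted, best_random <= predicted + 1e-9)
```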
Using a similar technique, we show that to check whether a linear inequality in clique densities
holds for all graphs G, it suffices to check it when G is a clique. The next theorem is due to Bollobás (1976).
We first need the following lemma about the extrema of a symmetric polynomial over a simplex.
Lemma 5.4.3. Let f (x1, . . . , xn ) be a symmetric polynomial with real coefficients. Suppose x =
(x1, . . . , xn ) minimizes f (x) among all vectors x ∈ Rn with x1, . . . , xn ≥ 0 and x1 + · · · + xn = 1,
and furthermore x has minimum support size among all such minimizers. Then, up to permuting
the coordinates of x, there is some 1 ≤ k ≤ n so that
x1 = · · · = x k = 1/k and x k+1 = · · · = xn = 0.
Proof. Suppose x1, . . . , xk > 0 and x_{k+1} = · · · = xn = 0 with k ≥ 2. Fixing x3, . . . , xn, we see that
as a function of (x1, x2), f has the form
Ax1x2 + Bx1 + Bx2 + C
where A, B, C depend on x3, . . . , xn. Notably, the coefficients of x1 and x2 agree since f is a
symmetric polynomial. Holding x1 + x2 fixed, f has the form
Ax1x2 + C′.
If A ≥ 0, then holding x1 + x2 fixed, we can set either x1 or x2 to zero without increasing f,
which contradicts the hypothesis that the minimizing x has minimum support size. So A < 0, so
that with x1 + x2 held fixed, Ax1x2 + C′ is minimized uniquely at x1 = x2. Thus x1 = x2. Likewise,
x1 = · · · = xk, as claimed.
Theorem 5.4.4 (Linear inequalities between clique densities). Let c1, . . . , cℓ ∈ R. The inequality
Σ_{r=1}^ℓ cr t(Kr, G) ≥ 0
is true for every graph G if and only if it is true with G = Kn for every positive integer n.
More explicitly, the above inequality holds for all graphs G if and only if
Σ_{r=1}^ℓ cr · n(n − 1) · · · (n − r + 1)/n^r ≥ 0 for every n ∈ N.
Since the left-hand side is a polynomial in the single variable 1/n, it is usually easy to check this inequality. We will
see some examples right after the proof.
Proof. Suppose the displayed inequality holds for all cliques G. Let G be an arbitrary graph with
vertex set [n]. Let
\[ f_r(x_1, \dots, x_n) = \sum_{\substack{\{i_1, \dots, i_r\} \\ \text{an } r\text{-clique in } G}} x_{i_1} \cdots x_{i_r} \]
and
\[ f(x_1, \dots, x_n) = \sum_{r=1}^{\ell} r!\, c_r f_r(x_1, \dots, x_n) . \]
So
\[ f(1/n, \dots, 1/n) = \sum_{r=1}^{\ell} c_r\, t(K_r, G) . \]
It suffices to prove that
\[ \min_{\substack{x_1, \dots, x_n \ge 0 \\ x_1 + \cdots + x_n = 1}} f(x_1, \dots, x_n) \ge 0 . \]
By compactness, we can assume that the minimum is attained at some x. Among all minimizing
x, choose one with the smallest support (i.e., the number of nonzero coordinates).
As in the previous proof, if ij ∉ E(G) for some pair of distinct coordinates with x_i, x_j > 0, then, replacing
(x_i, x_j) by (s, x_i + x_j − s), f changes linearly in s. Since f is minimized at x, it must not
change with s. So we can replace (x_i, x_j) by (x_i + x_j, 0), which keeps f the same while decreasing
the number of nonzero coordinates of x. Thus the support of x is a clique in G. Suppose x is
supported on coordinates [k]. Then f is a symmetric polynomial in x_1, . . . , x_k. Lemma 5.4.3 implies
that x_1 = · · · = x_k = 1/k. Then f(x) = \sum_{r=1}^{\ell} c_r t(K_r, K_k) ≥ 0 by hypothesis. □
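As a quick illustration of the single-variable check in Theorem 5.4.4, here is a minimal Python sketch (not from the text); the coefficients c_r below are arbitrary illustrative choices, and a finite scan over n is only a sanity check, not a proof for all n.

```python
def clique_condition(c, n_max=1000):
    """Check sum_r c[r-1] * n(n-1)...(n-r+1)/n^r >= 0 for n = 1, ..., n_max.

    By Theorem 5.4.4, this condition (for all n) is equivalent to
    sum_r c[r-1] * t(K_r, G) >= 0 holding for every graph G.
    A finite scan is only a heuristic sanity check."""
    for n in range(1, n_max + 1):
        total = 0.0
        for r, cr in enumerate(c, start=1):
            falling = 1.0
            for i in range(r):
                falling *= (n - i) / n  # t(K_r, K_n) = n(n-1)...(n-r+1)/n^r
            total += cr * falling
        if total < -1e-12:
            return False, n  # first n violating the inequality
    return True, None

# Illustrative coefficients (hypothetical, not an inequality from the text):
print(clique_condition([1.0, -2.0, 1.0]))
```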
Theorem 5.4.4 can be equivalently restated in terms of the convex hull of the region of all
possible clique density tuples.
5.5. Entropy
To be written
5.6. Exercises
Exercise 5.6.1. Prove that Ks,t is forcing whenever s, t ≥ 2.
Exercise 5.6.2∗. Let K_4^− be K_4 with an edge removed. Prove that K_4^− is common. In other words,
show that for all graphons W,
\[ t(K_4^-, W) + t(K_4^-, 1 - W) \ge 2^{-4} . \]
The next exercise asks you to extend Goodman’s bound (Theorem 5.2.8). The inequality implies
Turán’s theorem (Theorem 1.2.4).
Exercise 5.6.3 (A lower bound on clique density). Show that for every integer r ≥ 3 and every
graphon W, writing p := t(K_2, W),
\[ t(K_r, W) \ge p(2p - 1)(3p - 2) \cdots ((r - 1)p - (r - 2)) . \]
Note that this inequality is tight when W is the associated graphon of a clique.
Exercise 5.6.4 (Cliquey edges). Show that every n-vertex graph with (1 − 1/r)n²/2 + t edges has at
least rt edges that belong to a K_{r+1}.
Exercise 5.6.5∗ (Maximizing K1,2 density). Prove that, for every p ∈ [0, 1], among all graphons W
with t(K2, W) = p, the maximum possible value of t(K1,2, W) is attained by either a “clique” or a
“hub” graphon, illustrated below.
[Figure: the clique graphon W(x, y) = 1{max(x, y) ≤ a} and the hub graphon W(x, y) = 1{min(x, y) ≤ a}.]
Further reading
The book Large Networks and Graph Limits by Lovász (2012) contains an excellent treatment
of graph homomorphism inequalities in Section 2.1 and Chapter 16.
The survey Flag Algebras: An Interim Report by Razborov (2013) contains a survey of results
obtained using the flag algebra method.
CHAPTER 6

Forbidding 3-Term Arithmetic Progressions
In this chapter, we study Roth’s theorem, which says that every 3-AP-free subset of [N] has size
o(N).
Previously, in Section 2.4, we gave a proof of Roth’s theorem using the graph regularity lemma.
The main goal of this chapter is to give a Fourier analytic proof of Roth’s theorem. This is also
Roth’s original proof (1953).
We begin by proving Roth’s theorem in the finite field model. That is, we first prove an
analogue of Roth’s theorem in F3n . Finite field vector spaces serves as an excellent playground for
many additive combinatorics problems. Techniques such as Fourier analysis are often simpler to
carry out in the finite field model. After we develop the techniques in the finite field model, we
then prove Roth’s theorem in the integers. It can be a good idea to first try out ideas in the finite
field model before bringing them to the integers.
Later in Section 6.5, we will see a completely different proof of Roth's theorem in F_3^n using the
polynomial method, which gives significantly better quantitative bounds. This proof surprised
many people at the time of its discovery. However, this polynomial method technique is only
applicable in the finite field setting, and it is not known how to apply it in the integers.
6.1. Fourier analysis in finite field vector spaces

Definition 6.1.1 (Fourier transform in F_p^n). Let ω := e^{2πi/p}. The Fourier transform of a function
f : F_p^n → C is the function \hat f : F_p^n → C defined by setting, for each r ∈ F_p^n,
\[ \hat f(r) := \mathbb{E}_{x \in \mathbb{F}_p^n} f(x)\, \omega^{-r \cdot x} = \frac{1}{p^n} \sum_{x \in \mathbb{F}_p^n} f(x)\, \omega^{-r \cdot x} , \]
where r · x = r_1 x_1 + · · · + r_n x_n.
In particular, \hat f(0) = E f is the average of f. This value often plays a special role compared to
the other values \hat f(r).
To simplify notation, it is generally understood that the variables being averaged or summed
over are varying uniformly in the domain Fnp .
Let us now state several important properties of the Fourier transform. We will see that all these
properties are consequences of the orthogonality of the Fourier basis.
The next result allows us to write f in terms of b f.
Theorem 6.1.2 (Fourier inversion formula). Let f : F_p^n → C. For every x ∈ F_p^n,
\[ f(x) = \sum_{r \in \mathbb{F}_p^n} \hat f(r)\, \omega^{r \cdot x} . \]
The next result tells us that the Fourier transform preserves inner products.
As is nowadays standard in additive combinatorics, we adopt the following convention for
the Fourier transform in finite abelian groups:
average in physical space (E f) and sum in frequency (Fourier) space (Σ \hat f).
For example, following this convention, we define an "averaging" inner product for functions
f, g : F_p^n → C by
\[ \langle f, g \rangle := \mathbb{E}_{x \in \mathbb{F}_p^n} f(x) \overline{g(x)} \qquad \text{and} \qquad \|f\| := \langle f, f \rangle^{1/2} . \]
In the frequency/Fourier domain, we define the "summing" inner product for functions α, β : F_p^n → C by
\[ \langle \alpha, \beta \rangle_{\ell^2} := \sum_{x \in \mathbb{F}_p^n} \alpha(x) \overline{\beta(x)} \qquad \text{and} \qquad \|\alpha\|_{\ell^2} := \langle \alpha, \alpha \rangle_{\ell^2}^{1/2} . \]
Writing γ_r : F_p^n → C for the function defined by
\[ \gamma_r(x) := \omega^{r \cdot x} \]
(this is a character of the group F_p^n), the Fourier transform can be written as
\[ \hat f(r) = \mathbb{E}_x f(x) \overline{\gamma_r(x)} = \langle f, \gamma_r \rangle . \tag{6.1.1} \]
Parseval’s identity can be stated as
h f , gi = h b
f, b
g i`2 and kfk = kb
f k`2 .
With these conventions, we often do not need to keep track of normalization factors.
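As a numerical sanity check (not from the text), here is a minimal Python sketch that computes the Fourier transform over F_p^n by brute force with the averaging convention above, and verifies the inversion and Parseval identities on a small random example; the choices p = 3, n = 2 are illustrative.

```python
import itertools, cmath, random

p, n = 3, 2
omega = cmath.exp(2j * cmath.pi / p)
box = list(itertools.product(range(p), repeat=n))  # all of F_p^n

def fourier(f):
    # hat f(r) = average over x of f(x) * omega^{-r.x}
    return {r: sum(f[x] * omega ** (-sum(ri * xi for ri, xi in zip(r, x)) % p)
                   for x in box) / p ** n
            for r in box}

f = {x: complex(random.random(), random.random()) for x in box}
fhat = fourier(f)

# Fourier inversion: f(x) = sum_r hat f(r) * omega^{r.x}
recon = {x: sum(fhat[r] * omega ** (sum(ri * xi for ri, xi in zip(r, x)) % p)
                for r in box) for x in box}
assert all(abs(recon[x] - f[x]) < 1e-9 for x in box)

# Parseval: E |f|^2 = sum_r |hat f(r)|^2
lhs = sum(abs(f[x]) ** 2 for x in box) / p ** n
rhs = sum(abs(fhat[r]) ** 2 for r in box)
assert abs(lhs - rhs) < 1e-9
```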
The above identities can be proved via direct verification, by plugging in the formula for the
Fourier transform. We give a more conceptual proof below.
Proof of the Fourier inversion formula (Theorem 6.1.2). Let γ_r(x) = ω^{r·x}. Then the set of functions
{γ_r : r ∈ F_p^n}
forms an orthonormal basis for the space of functions F_p^n → C with respect to the averaging inner
product ⟨·, ·⟩. Indeed,
\[ \langle \gamma_r, \gamma_s \rangle = \mathbb{E}_x\, \omega^{(r - s) \cdot x} = \begin{cases} 1 & \text{if } r = s, \\ 0 & \text{if } r \ne s. \end{cases} \]
Furthermore, there are p^n such functions (as r ranges over F_p^n). So they form a basis of the p^n-dimensional
vector space of all functions f : F_p^n → C. We will call this basis the Fourier basis.
Now, given an arbitrary f : F_p^n → C, the "coordinate" of f with respect to the basis vector γ_r of the
Fourier basis is ⟨f, γ_r⟩ = \hat f(r) by (6.1.1). So
\[ f = \sum_{r} \hat f(r)\, \gamma_r . \qquad \square \]
Remark 6.1.4. Parseval's identity is sometimes also referred to by the name Plancherel. Parseval
derived the identity for the Fourier series of a periodic function on R, whereas Plancherel derived
it for the Fourier transform on R.
The convolution is an important operation.
Definition 6.1.5 (Convolution). Given f, g : F_p^n → C, define f ∗ g : F_p^n → C by
\[ (f * g)(x) := \mathbb{E}_{y \in \mathbb{F}_p^n} f(y)\, g(x - y) . \]
Theorem 6.1.7 (Convolution identity). For any f, g : F_p^n → C and any r ∈ F_p^n,
\[ \widehat{f * g}(r) = \hat f(r)\, \hat g(r) . \]
Proof. We have
\[ \widehat{f * g}(r) = \mathbb{E}_{x} (f*g)(x)\, \omega^{-r \cdot x} = \mathbb{E}_{x, y} f(y)\, g(x - y)\, \omega^{-r \cdot x} = \Big( \mathbb{E}_{y} f(y)\, \omega^{-r \cdot y} \Big) \Big( \mathbb{E}_{z} g(z)\, \omega^{-r \cdot z} \Big) = \hat f(r)\, \hat g(r) , \]
where we substituted z = x − y. □
Definition 6.1.8 (3-AP density). For f, g, h : F_p^n → C, define
\[ \Lambda(f, g, h) := \mathbb{E}_{x, y \in \mathbb{F}_p^n} f(x)\, g(x + y)\, h(x + 2y) \qquad \text{and} \qquad \Lambda_3(f) := \Lambda(f, f, f) . \]
For a set A ⊂ F_p^n we write Λ_3(A) := Λ_3(1_A).

Proposition 6.1.9 (Fourier and 3-AP). Let p be an odd prime. If f, g, h : F_p^n → C, then
\[ \Lambda(f, g, h) = \sum_{r \in \mathbb{F}_p^n} \hat f(r)\, \hat g(-2r)\, \hat h(r) . \]
We will give two proofs of this proposition. The first proof is more mechanically straightforward.
It is similar to the proof of the convolution identity earlier. The second proof directly applies the
convolution identity, and may be a bit more abstract/conceptual.
First proof. We expand the left-hand side using the Fourier inversion formula:
\[ \mathbb{E}_{x, y}\, f(x) g(x + y) h(x + 2y) = \mathbb{E}_{x, y} \Big( \sum_{r_1} \hat f(r_1) \omega^{r_1 \cdot x} \Big) \Big( \sum_{r_2} \hat g(r_2) \omega^{r_2 \cdot (x + y)} \Big) \Big( \sum_{r_3} \hat h(r_3) \omega^{r_3 \cdot (x + 2y)} \Big) = \sum_{r} \hat f(r)\, \hat g(-2r)\, \hat h(r) , \]
where the last step holds because E_{x,y} ω^{(r_1 + r_2 + r_3)·x + (r_2 + 2r_3)·y} vanishes unless
r_1 + r_2 + r_3 = 0 and r_2 + 2r_3 = 0, i.e., unless r_2 = −2r_1 and r_3 = r_1. □
6.2. ROTH’S THEOREM IN THE FINITE FIELD MODEL 159
Remark 6.1.10. In the following section, we will work in F_3^n. Since −2 = 1 in F_3 (and so \hat g(−2r) = \hat g(r) above),
the proof looks even simpler. In particular, by Fourier inversion and the convolution identity,
\[ \Lambda_3(1_A) = 3^{-2n} \left| \{ (x, y, z) \in A^3 : x + y + z = 0 \} \right| = (1_A * 1_A * 1_A)(0) = \sum_{r} \widehat{(1_A * 1_A * 1_A)}(r) = \sum_{r} \widehat{1_A}(r)^3 . \tag{6.1.4} \]
When A = −A, the eigenvalues of the adjacency matrix of the Cayley graph Cay(F_3^n, A) are 3^n \widehat{1_A}(r),
r ∈ F_3^n (cf. Section 3.3). The quantity 3^{2n} Λ_3(1_A) is the number of closed walks of length 3 in the
Cayley graph Cay(F_3^n, A). So the above identity says that the number of closed walks of length
3 in Cay(F_3^n, A) equals the third moment of the eigenvalues of the adjacency matrix, which is a
general fact for every graph. (When A ≠ −A, we can consider the directed or bipartite version of
this argument.)
The following exercise generalizes the above identity.
Exercise 6.1.11. Let a_1, . . . , a_k be nonzero integers, none divisible by the prime p. Let f_1, . . . , f_k : F_p^n → C.
Show that
\[ \mathbb{E}_{\substack{x_1, \dots, x_k \in \mathbb{F}_p^n \\ a_1 x_1 + \cdots + a_k x_k = 0}} f_1(x_1) \cdots f_k(x_k) = \sum_{r \in \mathbb{F}_p^n} \hat f_1(a_1 r) \cdots \hat f_k(a_k r) . \]
6.2. Roth's theorem in the finite field model

Theorem 6.2.1 (Roth's theorem in F_3^n). Every 3-AP-free subset of F_3^n has size O(3^n/n).
Remark 6.2.2 (General finite fields). We work in F_3^n mainly for convenience. The argument presented
in this section also shows that for every odd prime p, there is some constant C_p so that every
3-AP-free subset of F_p^n has size at most C_p p^n/n.
In F3n , there are several equivalent interpretations of x, y, z ∈ F3n forming a 3-AP (allowing the
possibility for a trivial 3-AP with x = y = z):
• (x, y, z) = (x, x + d, x + 2d) for some d;
• x − 2y + z = 0;
• x + y + z = 0;
• x, y, z are all equal or the three distinct points of a line in F3n ,
• for each i, the i-th coordinate of x, y, z are all distinct or all equal.
Remark 6.2.3 (SET card game). The card game SET comes with a deck of 81 cards (see Figure 6.2.1).
Each card has four features, and each feature takes one of three possible values:
• Number: 1, 2, 3;
• Symbol: diamond, squiggle, oval;
• Shading: solid, striped, open;
• Color: red, green, purple.
[Figure 6.2.1. The deck of 81 cards in the game SET. A 20-card cap set is highlighted.
This is the maximum size of a cap set in F_3^4. Image: Wikipedia.]
In a standard play of the game, the dealer lays down twelve cards on the table until some player
finds a "set", in which case the player keeps the three cards of the "set" as their score. The dealer
then replenishes the table by laying down more cards. If no set is found, then the dealer continues
to lay down more cards until a set is found.
The cards of the game correspond to points of F_3^4. A "set" is precisely a 3-AP. The cap set
problem in F_3^4 asks for the maximum number of cards containing no "set." The size of the maximum
cap set in F_3^4 is 20 (Pellegrino 1970), and an example is shown in Figure 6.2.1.
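To make the correspondence concrete, here is a small Python sketch (not from the text) that encodes cards as vectors in F_3^4 and checks the "all equal or all distinct in each feature" condition, which is equivalent to the three vectors summing to 0, i.e., to forming a 3-AP.

```python
import itertools

# A card is a vector in F_3^4: (number, symbol, shading, color), each in {0, 1, 2}.
def is_set(a, b, c):
    # Three cards form a "set" iff in each coordinate the values are all equal
    # or all distinct; equivalently, the three vectors sum to 0 mod 3,
    # i.e., they form a 3-AP in F_3^4.
    return all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))

deck = list(itertools.product(range(3), repeat=4))  # all 81 cards

# Count "sets" in the full deck: every pair of distinct cards completes to a
# unique third card, so there are 81*80/6 = 1080 sets.
count = sum(1 for a, b, c in itertools.combinations(deck, 3) if is_set(a, b, c))
print(count)  # 1080
```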
Here is the proof strategy of Roth’s theorem in F3n :
(1) A 3-AP-free set has a large Fourier coefficient.
(2) A large Fourier coefficient implies density increment on some hyperplane.
(3) Iterate.
As in the proof of the graph regularity lemma (where we refined partitions to obtain an energy
increment), the above process must terminate in a bounded number of steps since the density of a
subset is always between 0 and 1.
Similar to what we saw in Chapter 3 on pseudorandom graphs, a set A ⊂ F_3^n has pseudorandom
properties if and only if all its Fourier coefficients \widehat{1_A}(r), for r ≠ 0, are small in absolute value.
When A is pseudorandom in this Fourier-uniform sense, the 3-AP density of A is similar to that
of a random set with the same density. On the flip side, a large Fourier coefficient in A points
to non-uniformity along the direction of the corresponding Fourier character. Then we can restrict A to some
hyperplane and extract a density increment.
The following counting lemma shows that a Fourier-uniform subset of F3n has 3-AP density
similar to that of a random set. It has a similar flavor as the proof that EIG implies C4 in
Theorem 3.1.1. It is also related to the counting lemma for graphons (Theorem 4.5.1). Recall the
3-AP-density Λ3 from Definition 6.1.8.
Lemma 6.2.4 (3-AP counting lemma). Let f : F_3^n → [0, 1]. Then
\[ \left| \Lambda_3(f) - (\mathbb{E} f)^3 \right| \le \max_{r \ne 0} |\hat f(r)| \, \|f\|_2^2 . \]
Proof. Since E f = \hat f(0), and Λ_3(f) = Σ_r \hat f(r)³ by Proposition 6.1.9 (using −2 = 1 in F_3), we have
\[ \left| \Lambda_3(f) - (\mathbb{E} f)^3 \right| = \Big| \sum_{r \ne 0} \hat f(r)^3 \Big| \le \max_{r \ne 0} |\hat f(r)| \cdot \sum_{r} |\hat f(r)|^2 = \max_{r \ne 0} |\hat f(r)| \, \|f\|_2^2 . \qquad \square \]
Lemma 6.2.6. Let A ⊂ F_3^n and α = |A|/3^n. If A is 3-AP-free and 3^n ≥ 2α^{−2}, then there is r ≠ 0
such that |\widehat{1_A}(r)| ≥ α²/2.
Proof. Since A is 3-AP-free, Λ_3(A) = |A|/3^{2n} = α/3^n, as all 3-APs in A are trivial, i.e., have common
difference zero. By the counting lemma, Lemma 6.2.4,
\[ \alpha^3 - \frac{\alpha}{3^n} = \alpha^3 - \Lambda_3(1_A) \le \max_{r \ne 0} |\widehat{1_A}(r)| \, \|1_A\|_2^2 = \max_{r \ne 0} |\widehat{1_A}(r)| \, \alpha . \]
By the hypothesis 3^n ≥ 2α^{−2}, the left-hand side above is ≥ α³/2. So there is some r ≠ 0 with
|\widehat{1_A}(r)| ≥ α²/2. □
Lemma 6.2.7 (Large Fourier coefficient gives a density increment). Let A ⊂ F_3^n with α = |A|/3^n, and
suppose |\widehat{1_A}(r)| ≥ δ for some r ≠ 0. Then A has density at least α + δ/2 on some affine hyperplane
{x ∈ F_3^n : r · x = j}.
Proof. For j ∈ F_3, let α_j denote the density of A on {x : r · x = j}. Since Σ_j ω^{−j} = 0, we have
\widehat{1_A}(r) = (1/3) Σ_j (α_j − α) ω^{−j}, and so Σ_j |α_j − α| ≥ 3δ. Since also Σ_j (α_j − α) = 0, we get
Σ_j (|α_j − α| + (α_j − α)) ≥ 3δ. Consequently, there exists j such that |α_j − α| + (α_j − α) ≥ δ. Note that |t| + t equals 2t if t > 0
and 0 if t ≤ 0. So α_j − α ≥ δ/2, as desired. □
Combining the previous two lemmas, here is what we have proved so far.
Lemma 6.2.8 (Density increment). Let A ⊂ F_3^n and α = |A|/3^n. If A is 3-AP-free and 3^n ≥ 2α^{−2},
then A has density at least α + α²/4 when restricted to some hyperplane.
A naive iteration of Lemma 6.2.8 falls shy of the bound α = O(1/n) that we aim to prove. So let us redo the density increment analysis
more carefully to analyze how quickly α_i grows.
Each round, α_i increases by at least α²/4. So it takes at most ⌈4/α⌉ initial rounds for the density to double.
Once α_i ≥ 2α, it then increases by at least α_i²/4 ≥ α² each round, so it takes at most ⌈2/α⌉
additional rounds for the density to double again. And so on: the k-th doubling takes at most
⌈4 · 2^{1−k}/α⌉ rounds. Since the density never exceeds 1, it can double at most log_2(1/α) times.
So the total number of rounds is at most
\[ \sum_{0 \le j \le \log_2(1/\alpha)} \left\lceil \frac{4 \cdot 2^{-j}}{\alpha} \right\rceil = O\!\left( \frac{1}{\alpha} \right) . \]
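As a quick numerical sanity check (not from the text), the following Python sketch iterates the density increment β → β + β²/4 guaranteed by Lemma 6.2.8 and confirms that the number of rounds needed to push the density from α up to 1 scales like 1/α (the product in the last column should stay roughly constant).

```python
def rounds_to_saturate(alpha):
    # Iterate the density increment beta -> beta + beta^2/4 and count how many
    # rounds it takes for the density to reach 1.
    beta, steps = alpha, 0
    while beta < 1:
        beta += beta * beta / 4
        steps += 1
    return steps

for alpha in [0.1, 0.01, 0.001]:
    print(alpha, rounds_to_saturate(alpha), rounds_to_saturate(alpha) * alpha)
```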
Suppose the process terminates after m steps with density α_m. Then, examining the hypothesis
of Lemma 6.2.8, we find that the size of the final subspace |V_m| = 3^{n−m} is less than 2α_m^{−2} ≤ 2α^{−2}. So
n ≤ m + O(log(1/α)) ≤ O(1/α). Thus α = |A|/3^n = O(1/n). This completes the proof of Roth's
theorem in F_3^n (Theorem 6.2.1).
Remark 6.2.9 (Quantitative bounds). The best published lower bound on the size of a cap set in F_3^n is
≥ 2.21^n (Edel 2004). This is obtained by constructing a cap set in F_3^{62} of size m ≥ 2.21^{62},
which then implies, by a product construction, a cap set in F_3^{62k} of size m^k for each positive
integer k.
It was an open problem of great interest whether an upper bound of the form c^n, with constant
c < 3, was possible on the size of cap sets in F_3^n. With significant effort, the Fourier analytic
strategy above was extended to prove an upper bound of the form 3^n/n^{1+c} (Bateman and Katz 2012).
So it came as quite a shock to the community when a very short polynomial method proof was
discovered, giving an upper bound O(2.76^n) (Croot, Lev, and Pach 2017; Ellenberg and Gijswijt
2017). We will discuss this proof in Section 6.5. However, the polynomial method proof appears
to be specific to the finite field model, and it is not known how to extend it to the integers.
6.3. Fourier analysis in the integers

For a finitely supported function f : Z → C, its Fourier transform \hat f : R/Z → C is defined by
\[ \hat f(\theta) := \sum_{x \in \mathbb{Z}} f(x)\, e(-x\theta), \qquad \text{where } e(t) := e^{2\pi i t} . \]

Theorem 6.3.2 (Fourier inversion formula). Given a finitely supported f : Z → C, for any x ∈ Z,
\[ f(x) = \int_0^1 \hat f(\theta)\, e(x\theta)\, d\theta . \]
Note the normalization conventions: we sum in the physical space Z (there is no sensible way
to average in Z) and average in the frequency space R/Z.
Definition 6.3.4 (Convolution). Given finitely supported f, g : Z → C, define f ∗ g : Z → C by
\[ (f * g)(x) := \sum_{y \in \mathbb{Z}} f(y)\, g(x - y) . \]
Theorem 6.3.5 (Convolution identity). Given finitely supported f, g : Z → C, for any θ ∈ R/Z,
\[ \widehat{f * g}(\theta) = \hat f(\theta)\, \hat g(\theta) . \]
Given finitely supported f, g, h : Z → C, define
\[ \Lambda(f, g, h) := \sum_{x, y \in \mathbb{Z}} f(x)\, g(x + y)\, h(x + 2y) \]
and
\[ \Lambda_3(f) := \Lambda(f, f, f) . \]
Then for any finite set A of integers,
\[ \Lambda_3(A) := \Lambda_3(1_A) = \left| \{ (x, y) : x,\, x + y,\, x + 2y \in A \} \right| \]
counts the number of 3-APs in A, where each non-trivial 3-AP is counted twice (forward and
backward) and each trivial 3-AP is counted once.
6.4. Roth's theorem in the integers

Theorem 6.4.1 (Roth's theorem). Every 3-AP-free subset of [N] = {1, . . . , N} has size O(N/log log N).
The proof of Roth’s theorem in F3n proceeded by density increment when restricting to subspaces.
An important difference between F3n and Z is that Z has no subspaces (more on this later). Instead,
we will proceed in Z by restricting to subprogressions. In this section, by a progression we mean
an arithmetic progression.
6.4. ROTH’S THEOREM IN THE INTEGERS 165
We have the following analogue of Lemma 6.2.4. It says that if f and g are "Fourier-close", then
they have similar 3-AP counts. We write
\[ \|\hat f\|_\infty := \sup_{\theta} |\hat f(\theta)| \qquad \text{and} \qquad \|f\|_{\ell^2} := \Big( \sum_{x \in \mathbb{Z}} |f(x)|^2 \Big)^{1/2} . \]
Proposition 6.4.2 (3-AP counting lemma). Let f, g : Z → C be finitely supported functions. Then
\[ \left| \Lambda_3(f) - \Lambda_3(g) \right| \le 3\, \big\| \widehat{f - g} \big\|_\infty \max\big( \|f\|_{\ell^2}, \|g\|_{\ell^2} \big)^2 . \]
Lemma 6.4.3. Let A ⊂ [N] be a 3-AP-free set with |A| = αN. If N ≥ 5α^{−2}, then there exists
θ ∈ R/Z satisfying
\[ \Big| \sum_{x=1}^{N} (1_A - \alpha)(x)\, e(\theta x) \Big| \ge \frac{\alpha^2}{10} N . \]
Proof. Since A is 3-AP-free, the quantity 1_A(x) 1_A(x + y) 1_A(x + 2y) is nonzero only for trivial APs,
i.e., when y = 0. Thus
\[ \Lambda_3(1_A) = |A| = \alpha N . \]
On the other hand, a 3-AP in [N] can be counted by choosing a pair of integers in [N] with the same parity to
form the first and third elements of the 3-AP, yielding
\[ \Lambda_3(1_{[N]}) = \lfloor N/2 \rfloor^2 + \lceil N/2 \rceil^2 \ge N^2/2 . \]
Now apply the counting lemma (Proposition 6.4.2) to f = 1_A and g = α 1_{[N]}. We have ‖1_A‖_{ℓ²}² =
|A| = αN and ‖α1_{[N]}‖_{ℓ²}² = α²N. So
\[ \frac{\alpha^3 N^2}{2} - \alpha N \le \alpha^3 \Lambda_3(1_{[N]}) - \Lambda_3(1_A) \le 3 \alpha N \, \big\| (1_A - \alpha 1_{[N]})^{\wedge} \big\|_\infty . \]
Thus, using N ≥ 5/α², we have
\[ \big\| (1_A - \alpha 1_{[N]})^{\wedge} \big\|_\infty \ge \frac{\tfrac12 \alpha^3 N^2 - \alpha N}{3 \alpha N} = \frac{1}{3} \Big( \frac{\alpha^2 N}{2} - 1 \Big) \ge \frac{1}{10} \alpha^2 N . \]
Therefore there exists some θ ∈ R with
\[ \Big| \sum_{x=1}^{N} (1_A - \alpha)(x)\, e(\theta x) \Big| = \big| (1_A - \alpha 1_{[N]})^{\wedge}(\theta) \big| \ge \frac{1}{10} \alpha^2 N . \qquad \square \]
Lemma 6.4.4 (Dirichlet's lemma). Let θ ∈ R and 0 < δ < 1. Then there exists a positive integer
d ≤ 1/δ such that ‖dθ‖_{R/Z} ≤ δ. (Here ‖θ‖_{R/Z} denotes the distance from θ to the nearest integer.)
Proof. Let m = ⌊1/δ⌋. By the pigeonhole principle, among the m + 1 numbers 0, θ, . . . , mθ, we can
find 0 ≤ i < j ≤ m such that the fractional parts of iθ and jθ differ by at most δ (as points of R/Z).
Set d = j − i. Then ‖dθ‖_{R/Z} ≤ δ, as desired. □
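For concreteness, here is a minimal Python sketch (not from the text) that carries out this search by brute force; the inputs θ = √2 and δ = 0.05 are arbitrary illustrative choices.

```python
import math

def dirichlet(theta, delta):
    # Search for a positive integer d <= 1/delta with ||d*theta|| <= delta,
    # where ||t|| denotes the distance from t to the nearest integer.
    # Dirichlet's lemma (Lemma 6.4.4) guarantees that such a d exists.
    m = math.floor(1 / delta)
    for d in range(1, m + 1):
        frac = (d * theta) % 1.0
        if min(frac, 1.0 - frac) <= delta:
            return d
    return None  # should not happen, by Dirichlet's lemma

print(dirichlet(math.sqrt(2), 0.05))  # some d <= 20 with ||d*sqrt(2)|| <= 0.05
```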
Given θ, we now partition [N] into subprogressions on each of which e(xθ) is roughly constant.
The constants appearing in the rest of this argument are mostly unimportant.
Lemma 6.4.5. Let 0 < η < 1 and θ ∈ R. Suppose N ≥ (4π/η)^6. Then one can partition [N] into
subprogressions P_i, each with length
\[ N^{1/3} \le |P_i| \le 2N^{1/3} , \]
such that
\[ \sup_{x, y \in P_i} |e(x\theta) - e(y\theta)| < \eta \qquad \text{for each } i . \]
Proof. By Lemma 6.4.4, there is a positive integer d < √N such that ‖dθ‖_{R/Z} ≤ 1/√N. Partition
[N] greedily into progressions with common difference d of lengths between N^{1/3} and 2N^{1/3}. For any
two elements x, y of such a progression, xθ and yθ differ in R/Z by at most 2N^{1/3} ‖dθ‖_{R/Z} ≤ 2N^{−1/6},
so |e(xθ) − e(yθ)| ≤ 2π · 2N^{−1/6} ≤ η, using N ≥ (4π/η)^6. □
6.4. ROTH’S THEOREM IN THE INTEGERS 167
Lemma 6.4.6 (Density increment). Let A ⊂ [N] be 3-AP-free, with | A| = αN and N ≥ (16/α)12 .
Then there exists a subprogression P ⊂ [N] with |P| ≥ N 1/3 and | A ∩ P| ≥ (α + α2 /40) |P| .
Proof. By Lemma 6.4.3, there exists θ satisfying
\[ \Big| \sum_{x=1}^{N} (1_A - \alpha)(x)\, e(x\theta) \Big| \ge \frac{\alpha^2}{10} N . \]
Next, apply Lemma 6.4.5 with η = α²/20 (the hypothesis N ≥ (4π/η)^6 is satisfied since (16/α)^{12} ≥
(80π/α²)^6 = (4π/η)^6) to obtain a partition P_1, . . . , P_k of [N] satisfying N^{1/3} ≤ |P_i| ≤ 2N^{1/3} and
\[ |e(x\theta) - e(y\theta)| \le \frac{\alpha^2}{20} \qquad \text{for all } i \text{ and } x, y \in P_i . \]
So on each P_i (comparing each e(xθ) with e(x_0θ) for a fixed x_0 ∈ P_i and using |1_A − α| ≤ 1),
\[ \Big| \sum_{x \in P_i} (1_A - \alpha)(x)\, e(x\theta) \Big| \le \Big| \sum_{x \in P_i} (1_A - \alpha)(x) \Big| + \frac{\alpha^2}{20} |P_i| . \]
Thus
\[ \frac{\alpha^2}{10} N \le \Big| \sum_{x=1}^{N} (1_A - \alpha)(x)\, e(x\theta) \Big| \le \sum_{i=1}^{k} \Big| \sum_{x \in P_i} (1_A - \alpha)(x)\, e(x\theta) \Big| \le \sum_{i=1}^{k} \Big( \Big| \sum_{x \in P_i} (1_A - \alpha)(x) \Big| + \frac{\alpha^2}{20} |P_i| \Big) = \sum_{i=1}^{k} \Big| \sum_{x \in P_i} (1_A - \alpha)(x) \Big| + \frac{\alpha^2}{20} N . \]
Thus
\[ \frac{\alpha^2}{20} N \le \sum_{i=1}^{k} \Big| \sum_{x \in P_i} (1_A - \alpha)(x) \Big| , \]
and hence
\[ \frac{\alpha^2}{20} \sum_{i=1}^{k} |P_i| \le \sum_{i=1}^{k} \big| |A \cap P_i| - \alpha |P_i| \big| . \]
We want to show that there exists some P_i such that A has a density increment when restricted to
P_i. The following trick is convenient. Since Σ_i (|A ∩ P_i| − α|P_i|) = |A| − αN = 0, we have
\[ \frac{\alpha^2}{20} \sum_{i=1}^{k} |P_i| \le \sum_{i=1}^{k} \big| |A \cap P_i| - \alpha |P_i| \big| = \sum_{i=1}^{k} \Big( \big| |A \cap P_i| - \alpha |P_i| \big| + \big( |A \cap P_i| - \alpha |P_i| \big) \Big) . \]
Thus there exists an i such that
\[ \frac{\alpha^2}{20} |P_i| \le \big| |A \cap P_i| - \alpha |P_i| \big| + \big( |A \cap P_i| - \alpha |P_i| \big) . \]
Since |t| + t is 2t for t > 0 and 0 for t ≤ 0, we deduce
\[ \frac{\alpha^2}{20} |P_i| \le 2 \big( |A \cap P_i| - \alpha |P_i| \big) , \]
which yields
\[ |A \cap P_i| \ge \Big( \alpha + \frac{\alpha^2}{40} \Big) |P_i| . \qquad \square \]
Remark 6.4.7 (Bohr sets). Let us compare the results in F_3^n and [N]. Write N = 3^n for the size of
the ambient space in both cases, for comparison. We obtained an upper bound of O(N/log N) for
3-AP-free sets in F_3^n and O(N/log log N) in [N] ⊂ Z. Where does the difference in quantitative
bounds stem from?
In the density increment step for F_3^n, at each step, we pass down to a subset whose size is
a constant factor (namely 1/3) of the original one. However, in [N], each iteration gives us a
subprogression whose size is the cube root of the size of the previous subprogression. The extra log
for Roth's theorem in the integers comes from this rapid reduction in the sizes of the subprogressions.
Can we do better? Perhaps by passing down to subsets of [N] that look more like subspaces?
Indeed, this is possible. Bourgain introduced the notion of Bohr sets, which mimic properties of
subspaces in settings like Z where subspaces are not available. Given θ_1, . . . , θ_k and some ε > 0,
a Bohr set has the form
\[ \{ x \in [N] : \|x \theta_j\|_{\mathbb{R}/\mathbb{Z}} \le \varepsilon \text{ for each } j = 1, \dots, k \} . \]
To see why this is analogous to a subspace, note that we can define a subspace of F_3^n as a set of the
following form
\[ \{ x \in \mathbb{F}_3^n : r_j \cdot x = 0 \text{ for each } j = 1, \dots, k \} , \]
where r_1, . . . , r_k ∈ F_3^n \ {0}. Bourgain (1999) introduced Bohr sets and used them to improve the
quantitative bound in Roth's theorem to N/(log N)^{1/2+o(1)}. Bohr sets are used widely in additive
combinatorics, and in nearly all subsequent work on Roth's theorem in the integers, including the
proof of the current best bound N/(log N)^{1+c} for some constant c > 0 (Bloom and Sisask 2020).
We will see Bohr sets again in the proof of Freiman's theorem in ??.
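As a concrete illustration (not from the text), a Bohr set in [N] can be computed by brute force; the frequencies and the value of ε below are arbitrary illustrative choices.

```python
def bohr_set(N, thetas, eps):
    # The Bohr set {x in [N] : ||x * theta_j|| <= eps for every j}, where ||t||
    # denotes the distance from t to the nearest integer.
    def dist_to_int(t):
        f = t % 1.0
        return min(f, 1.0 - f)
    return [x for x in range(1, N + 1)
            if all(dist_to_int(x * th) <= eps for th in thetas)]

# Illustrative frequencies and tolerance (hypothetical choices):
print(len(bohr_set(1000, [0.123, 2 ** 0.5], 0.05)))
```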
Definition 6.5.2 (Slice rank). A function F : A × A × A → F is said to have slice rank 1 if it can
be written as
f (x)g(y, z), f (y)g(x, z), or f (z)g(x, y),
for some nonzero functions f : A → F and g : A × A → F. The slice rank of a function
F : A × A × A → F is the minimum k so that F can be written as a sum of k slice rank 1 functions.
We need the following fact from linear algebra.
Lemma 6.5.3. Every k-dimensional subspace of an n-dimensional vector space (over any field)
contains a point with at least k nonzero coordinates.
Proof. Form a k × n matrix M whose rows are a basis of this k-dimensional subspace W. Then
M has rank k. So it has some invertible k × k submatrix with columns S ⊂ [n] with |S| = k.
Then for every z ∈ FS , there is some linear combination of the rows whose coordinates on S are
identical to those of z. In particular, there is some vector in the k-dimensional subspace W whose
S-coordinates are all nonzero.
A diagonal matrix with nonzero diagonal entries has full rank. We show that a similar statement
holds true for the slice rank.
If we expand the right-hand side, we obtain a polynomial in 3n variables with degree 2n. This is a
sum of monomials, each of the form
\[ x_1^{i_1} \cdots x_n^{i_n}\, y_1^{j_1} \cdots y_n^{j_n}\, z_1^{k_1} \cdots z_n^{k_n} , \]
where i_1, . . . , i_n, j_1, . . . , j_n, k_1, . . . , k_n ∈ {0, 1, 2}. For each term, by the pigeonhole principle, at
least one of i_1 + · · · + i_n, j_1 + · · · + j_n, k_1 + · · · + k_n is at most 2n/3. So we can split these summands
into three sets:
\[ \prod_{i=1}^{n} \big( 1 - (x_i + y_i + z_i)^2 \big) = \sum_{i_1 + \cdots + i_n \le 2n/3} x_1^{i_1} \cdots x_n^{i_n}\, f_{i_1, \dots, i_n}(y, z) + \sum_{j_1 + \cdots + j_n \le 2n/3} y_1^{j_1} \cdots y_n^{j_n}\, g_{j_1, \dots, j_n}(x, z) + \sum_{k_1 + \cdots + k_n \le 2n/3} z_1^{k_1} \cdots z_n^{k_n}\, h_{k_1, \dots, k_n}(x, y) . \]
Each summand has slice rank at most 1. The number of summands in the first sum is precisely
the number of tuples (i_1, . . . , i_n) ∈ {0, 1, 2}^n with i_1 + · · · + i_n ≤ 2n/3, which equals
\[ \sum_{\substack{a, b, c \ge 0,\; a + b + c = n \\ b + 2c \le 2n/3}} \frac{n!}{a!\, b!\, c!} \]
(here a, b, c record the number of i_∗'s that are equal to 0, 1, 2, respectively), and similarly for the
other two sums. The lemma then follows.
Proof. Let 0 < x ≤ 1. The sum equals the sum of the coefficients of the monomials x^k with k ≤ 2n/3 in the
expansion of (1 + x + x²)^n. So the sum is
\[ \sum_{\substack{a, b, c \ge 0 \\ a + b + c = n \\ b + 2c \le 2n/3}} \frac{n!}{a!\, b!\, c!} \le \frac{(1 + x + x^2)^n}{x^{2n/3}} . \]
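The following Python sketch (not from the text) evaluates this sum exactly for a few values of n; its n-th root stays well below 3 and tends to roughly 2.76, which is the source of the constant in the polynomial-method bound.

```python
from math import factorial

def count_low_degree_monomials(n):
    # Number of monomials x_1^{i_1} ... x_n^{i_n} with each i_j in {0, 1, 2}
    # and total degree i_1 + ... + i_n <= 2n/3; equivalently, the sum of
    # multinomial coefficients n!/(a! b! c!) over triples (a, b, c) with
    # a + b + c = n and b + 2c <= 2n/3, where a, b, c count exponents 0, 1, 2.
    total = 0
    for c in range(n + 1):
        for b in range(n - c + 1):
            a = n - b - c
            if b + 2 * c <= 2 * n / 3:
                total += factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return total

for n in [30, 60, 90]:
    # n-th root stays well below 3 (it approaches roughly 2.76 as n grows)
    print(n, count_low_degree_monomials(n) ** (1 / n))
```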
Theorem 6.5.8. For every odd prime p, there is some c_p < p so that every 3-AP-free subset of F_p^n
has size at most 3 c_p^n.
It remains an intriguing open problem to extend these techniques to other settings.
Open problem 6.5.9. Is there a constant c < 5 such that every 4-AP-free subset of F_5^n has size O(c^n)?

Open problem 6.5.10. Is there a constant c < 2 such that every corner-free subset of F_2^n × F_2^n has
size O(c^{2n})? Here a corner is a configuration of the form {(x, y), (x + d, y), (x, y + d)}.
Finally, the proof technique in this section seems specific to the finite field model. It is an
intriguing open problem to apply the polynomial method for Roth’s theorem in the integers. Due
to the Behrend example (Section 2.5), we cannot expect power-saving bounds in the integers.
6.6. Arithmetic regularity

We say that a set A ⊂ F_p^n is ε-uniform if |\widehat{1_A}(r)| ≤ ε for every r ≠ 0. The following exercise
explains how Fourier uniformity is analogous to the discrepancy-type condition for ε-regular pairs
in the graph regularity lemma.
Exercise 6.6.2 (Uniformity and discrepancy). Let A ⊂ F_p^n with |A| = αp^n. Let HyperplaneDISC(η)
denote the property that for every hyperplane W of F_p^n,
\[ \left| \frac{|A \cap W|}{|W|} - \alpha \right| \le \eta . \]
(a) Prove that if A satisfies HyperplaneDISC(ε), then A is ε-uniform.
(b) Prove that if A is ε-uniform, then it satisfies HyperplaneDISC((p − 1)ε).
Definition 6.6.3 (Fourier uniformity on affine subspaces). For an affine subspace W of F_p^n (i.e., a
coset of a subspace), we say that A is ε-uniform on W if A ∩ W is ε-uniform when viewed as a
subset of W.
Here is an arithmetic analogue of Szemerédi’s graph regularity lemma that we saw in Chapter 2.
It is due to Green (2005a).
Theorem 6.6.4 (Arithmetic regularity lemma). For every ε > 0 and prime p, there exists M so that
for every A ⊂ F_p^n, there is some subspace W of F_p^n with codimension at most M such that A is
ε-uniform on all but at most an ε-fraction of the cosets of W.
The proof is very similar to the proof of the graph regularity lemma in Chapter 2. Each
subspace W induces a partition of the whole space F_p^n into W-cosets, and we keep track of the
energy (mean-squared density) of the partition. We show that if the conclusion of Theorem 6.6.4
does not hold for the current W, then we can replace W by a smaller subspace so that the energy
increases significantly. Since the energy is always bounded between 0 and 1, there are at most a
bounded number of iterations.
Definition 6.6.5 (Energy). Given A ⊂ F_p^n and a subspace W of F_p^n, we define the energy of W with
respect to a fixed A to be
\[ q_A(W) := \mathbb{E}_{x \in \mathbb{F}_p^n} \frac{|A \cap (W + x)|^2}{|W|^2} . \]
Given a subspace W of F_p^n, define µ_W : F_p^n → R by
\[ \mu_W := \frac{p^n}{|W|} 1_W . \]
(One can regard µ_W as the uniform probability distribution on W; it is normalized so that E µ_W = 1.)
Then,
\[ (1_A * \mu_W)(x) = \frac{|A \cap (W + x)|}{|W|} \qquad \text{for every } x \in \mathbb{F}_p^n . \]
We have (check!)
\[ \widehat{\mu_W}(r) = \begin{cases} 1 & \text{if } r \in W^{\perp}, \\ 0 & \text{if } r \notin W^{\perp} . \end{cases} \]
So by the convolution identity (Theorem 6.1.7),
\[ \widehat{1_A * \mu_W}(r) = \widehat{1_A}(r)\, \widehat{\mu_W}(r) = \begin{cases} \widehat{1_A}(r) & \text{if } r \in W^{\perp}, \\ 0 & \text{if } r \notin W^{\perp} . \end{cases} \tag{6.6.1} \]
To summarize, convolving by µW averages 1 A along cosets of W in the physical space, and filters
W ⊥ in the Fourier space.
Energy interacts nicely with the Fourier transform. By Parseval's identity (Theorem 6.1.3), we have
\[ q_A(W) = \|1_A * \mu_W\|_2^2 = \sum_{r \in \mathbb{F}_p^n} \big| \widehat{1_A * \mu_W}(r) \big|^2 = \sum_{r \in W^{\perp}} \big| \widehat{1_A}(r) \big|^2 . \tag{6.6.2} \]
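As a numerical check of (6.6.2) (not from the text), the following Python sketch computes the energy both from the definition and from the Fourier side for a small random example in F_3^2; the subset A and the subspace W are illustrative choices.

```python
import itertools, cmath, random

p, n = 3, 2
omega = cmath.exp(2j * cmath.pi / p)
space = list(itertools.product(range(p), repeat=n))

def dot(r, x):
    return sum(ri * xi for ri, xi in zip(r, x)) % p

def fhat(f, r):  # Fourier transform with the averaging convention
    return sum(f[x] * omega ** (-dot(r, x)) for x in space) / p ** n

A = set(random.sample(space, 4))            # a random 4-element subset of F_3^2
oneA = {x: (1.0 if x in A else 0.0) for x in space}
W = [(0, 0), (1, 0), (2, 0)]                # the subspace spanned by (1, 0)
Wperp = [r for r in space if all(dot(r, w) == 0 for w in W)]

# Energy via the definition: average of (|A ∩ (W + x)| / |W|)^2 over x
def coset_density(x):
    return sum(1 for w in W
               if tuple((xi + wi) % p for xi, wi in zip(x, w)) in A) / len(W)
energy_direct = sum(coset_density(x) ** 2 for x in space) / p ** n

# Energy via (6.6.2): sum over r in W^perp of |hat 1_A(r)|^2
energy_fourier = sum(abs(fhat(oneA, r)) ** 2 for r in Wperp)

print(abs(energy_direct - energy_fourier) < 1e-9)  # True
```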
The next lemma is analogous to Lemma 2.1.10. It is an easy consequence of convexity. It also
directly follows from (6.6.2).
Lemma 6.6.6 (Energy never decreases under refinement). Let A ⊂ F_p^n. For subspaces U ≤ W ≤ F_p^n,
we have q_A(U) ≥ q_A(W).
The next lemma is analogous to the energy boost lemma for irregular pairs in the proof of graph
regularity (Lemma 2.1.11).
Lemma 6.6.7 (Local energy increment). If A ⊂ F_p^n is not ε-uniform, then there is some codimension-1
subspace W with q_A(W) > (|A|/p^n)² + ε².
Proof. Suppose A is not ε-uniform. Then there is some r ≠ 0 such that |\widehat{1_A}(r)| > ε. Let W = r^{⊥}.
Then by (6.6.2),
\[ q_A(W) = |\widehat{1_A}(0)|^2 + |\widehat{1_A}(r)|^2 + |\widehat{1_A}(2r)|^2 + \cdots + |\widehat{1_A}((p-1)r)|^2 \ge |\widehat{1_A}(0)|^2 + |\widehat{1_A}(r)|^2 > (|A|/p^n)^2 + \varepsilon^2 . \qquad \square \]
By applying the above lemmas locally to each W-coset, we obtain the following global incre-
ment, analogous to Lemma 2.1.12
Lemma 6.6.8 (Global energy increment). Let A ⊂ F_p^n, and let W be a subspace of F_p^n. Suppose
that A is not ε-uniform on more than an ε-fraction of the W-cosets. Then there is some subspace U of W with
codim U − codim W ≤ p^{codim W} such that
\[ q_A(U) > q_A(W) + \varepsilon^3 . \]
Proof. By Lemma 6.6.7, for each coset W′ of W on which A is not ε-uniform, we can find some
r ∈ F_p^n \ W^{⊥} so that replacing W by its intersection with r^{⊥} increases the energy on W′ by more than
ε². In other words,
\[ q_{A \cap W'}(W' \cap r^{\perp}) > \frac{|A \cap W'|^2}{|W'|^2} + \varepsilon^2 . \]
Let R be a set of such r's, one for each W-coset on which A is not ε-uniform (allowing some r's to
be chosen repeatedly).
Let U = W ∩ R^{⊥}. Then codim U − codim W ≤ |R| ≤ |F_p^n / W| = p^{codim W}.
Applying the monotonicity of energy (Lemma 6.6.6) on each W-coset and using the observation
in the first paragraph of this proof, we see that the "local" energy of U exceeds that of W by more
than ε² on each of the more than ε-fraction of W-cosets on which A is not ε-uniform, and is at least as great
as that of W on each of the remaining W-cosets. Therefore the energy increases by more than ε³ when refining
from W to U. □
Proof of the arithmetic regularity lemma (Theorem 6.6.4). Starting with W_0 = F_p^n, we construct a
sequence of subspaces W_0 ≥ W_1 ≥ W_2 ≥ · · · where, at each step, unless A is ε-uniform on all but
at most an ε-fraction of the W_i-cosets, we apply Lemma 6.6.8 to find W_{i+1} ≤ W_i. The energy increases by
more than ε³ at each iteration, so there are fewer than ε^{−3} iterations. We have codim W_{i+1} ≤ codim W_i + p^{codim W_i} at
each step, so the final W = W_m has codimension at most some function of p and ε (one can check that
it is an exponential tower of p's of height O(ε^{−3})). This W satisfies the desired properties. □
Remark 6.6.9 (Lower bound). Recall that Gowers (1997) showed that there exist graphs whose
ε-regular partitions require at least tower(Ω(ε^{−c})) parts (Theorem 2.1.15). There is a similar tower-type
lower bound for the arithmetic regularity lemma (Green 2005a; Hosseini, Lovett, Moshkovitz,
and Shapira 2016).
Remark 6.6.10 (Abelian groups). Green (2005a) also established an arithmetic regularity lemma
over arbitrary finite abelian groups. Instead of subspaces, one uses Bohr sets (see Remark 6.4.7).
Now let us give another proof of the arithmetic regularity-type result. It has the same spirit as
the above regularity lemma, but phrased in terms of a decomposition rather than a partition. This
perspective of regularity as decompositions, popularized by Tao, allows one to adapt the ideas of
regularity to more general settings where we cannot neatly partition the underlying space into easily
describable pieces. It is very useful and has many applications in additive combinatorics.
Order the elements of F_p^n as r_1, r_2, . . . , r_{p^n} so that
\[ |\hat f(r_1)| \ge |\hat f(r_2)| \ge \cdots . \]
By Parseval (Theorem 6.1.3), we have
\[ \sum_{j=1}^{p^n} |\hat f(r_j)|^2 = \mathbb{E} f^2 \le 1 . \]
Hence there is some m ≤ ⌈ε_0^{−2}⌉ with
\[ \sum_{k_m < j \le k_{m+1}} |\hat f(r_j)|^2 \le \varepsilon_0^2 , \tag{6.6.3} \]
since otherwise adding up the sum over all m ≤ ⌈ε_0^{−2}⌉ would contradict Σ_r |\hat f(r)|² ≤ 1. Also, we have
\[ |\hat f(r_k)| \le \frac{1}{\sqrt{k}} \qquad \text{for every } k . \tag{6.6.4} \]
The idea now is to split
\[ f(x) = \sum_{j=1}^{p^n} \hat f(r_j)\, \omega^{r_j \cdot x} \]
into
\[ f = f_{\mathrm{str}} + f_{\mathrm{sml}} + f_{\mathrm{psr}} \]
according to the sizes of the Fourier coefficients. Roughly speaking, the large spectrum will go
into the structured piece f_str, the very small spectrum will go into the pseudorandom piece f_psr, and the
remaining middle terms will form the small piece f_sml (which has small L² norm by (6.6.3)).
Let W = {r_1, . . . , r_{k_m}}^{⊥} and set
\[ f_{\mathrm{str}} = f_W . \]
Then, by (6.6.1),
\[ \widehat{f_{\mathrm{str}}}(r) = \begin{cases} \hat f(r) & \text{if } r \in W^{\perp}, \\ 0 & \text{if } r \notin W^{\perp} . \end{cases} \]
Let us define f_psr and f_sml via their Fourier transforms (and we can recover the functions via the
inverse Fourier transform). For each j = 1, 2, . . . , p^n, set
\[ \widehat{f_{\mathrm{psr}}}(r_j) = \begin{cases} \hat f(r_j) & \text{if } j > k_{m+1} \text{ and } r_j \notin W^{\perp}, \\ 0 & \text{otherwise.} \end{cases} \]
Finally, let f_sml = f − f_str − f_psr, so that
\[ \widehat{f_{\mathrm{sml}}}(r_j) = \begin{cases} \hat f(r_j) & \text{if } k_m < j \le k_{m+1} \text{ and } r_j \notin W^{\perp}, \\ 0 & \text{otherwise.} \end{cases} \]
Now we check that all the conditions are satisfied.
Structured piece. We have f_str = f_W where codim W ≤ k_m ≤ k_{⌈ε_0^{−2}⌉}, which is bounded as a
function of the sequence ε_0 ≥ ε_1 ≥ · · · .
Pseudorandom piece. For every j > k_{m+1}, we have |\hat f(r_j)| ≤ 1/√(k_{m+1}) by (6.6.4), which is in
turn ≤ ε_{k_m} ≤ ε_{codim W} by the definition of the sequence (k_m). It follows that ‖\widehat{f_{\mathrm{psr}}}‖_∞ ≤ ε_{codim W}.
Small piece. By (6.6.3),
\[ \|\widehat{f_{\mathrm{sml}}}\|_{\ell^2}^2 \le \sum_{k_m < j \le k_{m+1}} |\hat f(r_j)|^2 \le \varepsilon_0^2 . \qquad \square \]
Exercise 6.6.13. Deduce Theorem 6.6.4 from Theorem 6.6.11 by using an appropriate sequence
ε_i and using the same W guaranteed by Theorem 6.6.11.
Remark 6.6.14 (Spectral proof of the graph regularity lemma). The proof technique of Theo-
rem 6.6.11 can be adapted to give an alternate proof of the graph regularity lemma (along with
certain weak and strong variants). Instead of iteratively refining partitions and tracking energy
increments as we did in Chapter 2, we can first take a spectral decomposition of the adjacency
matrix A of a graph:
\[ A = \sum_{i=1}^{n} \lambda_i v_i v_i^{\intercal} , \]
where v_1, . . . , v_n is an orthonormal system of eigenvectors with eigenvalues λ_1 ≥ · · · ≥ λ_n. Then,
as in the proof of Theorem 6.6.11, we can decompose A as
\[ A = A_{\mathrm{str}} + A_{\mathrm{psr}} + A_{\mathrm{sml}} \]
with
\[ A_{\mathrm{str}} = \sum_{i \le k} \lambda_i v_i v_i^{\intercal}, \qquad A_{\mathrm{psr}} = \sum_{i > k'} \lambda_i v_i v_i^{\intercal}, \qquad \text{and} \qquad A_{\mathrm{sml}} = \sum_{k < i \le k'} \lambda_i v_i v_i^{\intercal} \]
for some appropriately chosen k and k′ similar to the proof of Theorem 6.6.11.
We have
\[ \sum_{i=1}^{n} \lambda_i^2 = \operatorname{tr} A^2 \le n^2 . \]
So λ_i ≤ n/√i for each i. We can guarantee that the spectral norm of A_psr is small enough as a
function of k and ε. Furthermore, we can guarantee that tr A_sml² = Σ_{k < i ≤ k′} λ_i² ≤ ε.
To turn A_str into a vertex partition, we can use the approximate level sets of the top k eigenvectors
v_1, . . . , v_k. Some bookkeeping calculations then show that this is a regularity partition. Intuitively,
A_psr provides us with regular pairs. Some of these regular pairs may not stay regular after adding
A_sml, but since A_sml has at most ε mass (in terms of its L² norm), it destroys at most a negligible fraction of
the regular pairs.
See Tao (2007a, Lemma 2.11) or Tao’s blog post The Spectral Proof of the Szemerédi Regularity
Lemma (2012) for more details of the proof.
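To illustrate the decomposition numerically, here is a minimal Python/numpy sketch (not from the text): it splits the adjacency matrix of a random graph into three spectral pieces using illustrative cutoffs k and k′, ordering eigenvalues by absolute value for simplicity, and reports the quantities controlled in the remark above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, k2 = 60, 5, 20  # illustrative cutoffs, not from the text

# Adjacency matrix of an Erdos-Renyi random graph G(n, 1/2)
A = np.triu(rng.random((n, n)) < 0.5, 1).astype(float)
A = A + A.T

# Spectral decomposition A = sum_i lambda_i v_i v_i^T, eigenvalues sorted by |lambda_i|
vals, vecs = np.linalg.eigh(A)
order = np.argsort(-np.abs(vals))
vals, vecs = vals[order], vecs[:, order]

def partial(lo, hi):
    # Sum of lambda_i v_i v_i^T over the index range [lo, hi)
    return sum(vals[i] * np.outer(vecs[:, i], vecs[:, i]) for i in range(lo, hi))

A_str, A_sml, A_psr = partial(0, k), partial(k, k2), partial(k2, n)

print(np.allclose(A, A_str + A_sml + A_psr))  # the decomposition is exact
# Spectral norm of the pseudorandom piece, and the L2 mass of the small piece:
print(np.abs(vals[k2:]).max(), (vals[k:k2] ** 2).sum())
```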
6.7. Popular common difference

Theorem 6.7.1 (Roth's theorem with popular common difference in F_3^n). For all ε > 0, there exists
n_0 = n_0(ε) such that for n ≥ n_0 and every A ⊂ F_3^n with |A| = α3^n, there exists y ≠ 0 such that
\[ \left| \{ x \in \mathbb{F}_3^n : x,\, x + y,\, x + 2y \in A \} \right| \ge (\alpha^3 - \varepsilon) 3^n . \]
In particular, Theorem 6.7.1 implies that every 3-AP-free subset of F3n has size o(3n ).
Exercise 6.7.2. Show that it is false that for every A ⊂ F_3^n with |A| = α3^n, the number of pairs
(x, y) ∈ F_3^n × F_3^n with x, x + y, x + 2y ∈ A is ≥ (α³ − o(1))3^{2n}, where o(1) → 0 as n → ∞.
We will prove Theorem 6.7.1 via the next result, which concerns the number of 3-APs with
common difference coming from some subspace of bounded codimension, which is picked via the
arithmetic regularity lemma.
Theorem 6.7.3 (Roth's theorem with common difference in some subspace). For every ε > 0, there
exists M so that for every A ⊂ F_3^n with |A| = α3^n, there exists a subspace W with codimension at most M, so that
\[ \mathbb{E}_{x \in \mathbb{F}_3^n,\; y \in W}\, 1_A(x)\, 1_A(x + y)\, 1_A(x + 2y) \ge \alpha^3 - \varepsilon . \]
By adapting the above proof strategy with Bohr sets, Green (2005a) proved a version of Roth's theorem
with popular differences in finite abelian groups of odd order, as well as in the integers.
Theorem 6.7.5 (Roth's theorem with popular difference in finite abelian groups). For all ε > 0, there
exists N_0 = N_0(ε) such that for all finite abelian groups Γ of odd order |Γ| ≥ N_0, and every A ⊂ Γ
with |A| = α|Γ|, there exists y ∈ Γ \ {0} such that
\[ |\{ x \in \Gamma : x,\, x + y,\, x + 2y \in A \}| \ge (\alpha^3 - \varepsilon)|\Gamma| . \]
Theorem 6.7.6 (Roth's theorem with popular difference in the integers). For all ε > 0, there exists
N_0 = N_0(ε) such that for every N ≥ N_0 and every A ⊂ [N] with |A| = αN, there exists y ≠ 0 such
that
\[ |\{ x \in [N] : x,\, x + y,\, x + 2y \in A \}| \ge (\alpha^3 - \varepsilon) N . \]
See Tao’s blog post A Proof of Roth’s Theorem (2014) for a proof of Theorem 6.7.6 using Bohr
sets, following an arithmetic regularity decomposition in the spirit of Theorem 6.6.11.
Remark 6.7.7 (Bounds). The above proof of Theorem 6.7.1 gives n_0 = tower(ε^{−O(1)}). The bounds
in Theorems 6.7.5 and 6.7.6 are also tower-type. What is the smallest n_0(ε) for which Theorem 6.7.1
holds? It turns out to be tower(Θ(log(1/ε))), as proved by Fox and Pham (2019) over finite fields
and Fox, Pham, and Zhao (2020) over the integers. Although it had been known since Gowers
(1997) that tower-type bounds are necessary for the regularity lemmas themselves, Roth's theorem
with popular differences is the first regularity application where a tower-type bound is shown to be
indeed necessary.
Using quadratic Fourier analysis, Green and Tao (2010) extended the popular difference result
over to 4-APs.
Theorem 6.7.8 (Popular difference for 4-APs). For all ε > 0, there exists N_0 = N_0(ε) such that for
every N ≥ N_0 and A ⊂ [N] with |A| = αN, there exists y ≠ 0 such that
\[ |\{ x : x,\, x + y,\, x + 2y,\, x + 3y \in A \}| \ge (\alpha^4 - \varepsilon) N . \]
It may be surprising that such a statement is false for APs of length 5 or longer. This was
shown by Bergelson et al. (2005) with an appendix by Ruzsa giving a construction that is a clever
modification of the Behrend construction (Section 2.5).
Theorem 6.7.9 (Popular difference fails for 5-APs). Let 0 < α < 1/2. For all sufficiently large N,
there exists A ⊂ [N] with |A| ≥ αN such that for all y ≠ 0,
\[ |\{ x : x,\, x + y,\, x + 2y,\, x + 3y,\, x + 4y \in A \}| \le \alpha^{c \log(1/\alpha)} N . \]
Here c > 0 is some absolute constant.
For more on results of this type, as well as on popular differences for higher-dimensional patterns,
see Sah, Sawhney, and Zhao (2021).
Further reading
The book Fourier Analysis by Stein and Shakarchi (2003) provides an excellent undergraduate-
level introduction to Fourier analysis.
Green has several surveys and lecture notes on the topics covered in this and subsequent
chapters. In Finite Field Models in Additive Combinatorics (2005b), Green argues that one should
begin the study of many additive combinatorics problems in the finite field setting. His Montreal
Lecture Notes on Quadratic Fourier Analysis introduces quadratic Fourier analysis and explains
how to prove the popular common difference theorem for 4-APs in F5n . His lecture notes from his
Cambridge course Additive Combinatorics (2009b) also provides an excellent introduction to the
subject.
Tao’s FOCS 2007 tutorial Structure and Randomness in Combinatorics (2007a) explains many
facets of arithmetic regularity and applications.
For more on algebraic methods in combinatorics (pre-dating methods in Section 6.5), see the
books Thirty-three Miniatures by Matoušek (2010), Linear Algebra Methods in Combinatorics by
Babai and Frankl, and Polynomial Methods in Combinatorics by Guth (2016).
References
M. Ajtai and E. Szemerédi, Sets of lattice points that form no squares, Studia Sci. Math. Hungar. 9 (1974), 9–11 (1975).
MR369299 ↑55
N. Alon, Eigenvalues and expanders, vol. 6, 1986, pp. 83–96. MR875835 doi:10.1007/BF02579166 ↑87, ↑100
N. Alon and V. D. Milman, λ1, isoperimetric inequalities for graphs, and superconcentrators, J. Combin. Theory Ser.
B 38 (1985), 73–88. MR782626 doi:10.1016/0095-8956(85)90092-9 ↑87
Noga Alon and Assaf Naor, Approximating the cut-norm via Grothendieck’s inequality, SIAM J. Comput. 35 (2006),
787–803. MR2203567 doi:10.1137/S0097539704441629 ↑99
Noga Alon and Asaf Shapira, A characterization of the (natural) graph properties testable with one-sided error, SIAM
J. Comput. 37 (2008), 1703–1727. MR2386211 doi:10.1137/06064888X ↑67
Noga Alon and Joel H. Spencer, The probabilistic method, fourth ed., John Wiley & Sons, Inc., 2016. MR3524748
↑15, ↑117
Noga Alon, Lajos Rónyai, and Tibor Szabó, Norm-graphs: variations and applications, J. Combin. Theory Ser. B 76
(1999), 280–290. MR1699238 doi:10.1006/jctb.1999.1906 ↑36
Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy, Efficient testing of large graphs, Combinatorica
20 (2000), 451–476. MR1804820 doi:10.1007/s004930070001 ↑63
Noga Alon, W. Fernandez de la Vega, Ravi Kannan, and Marek Karpinski, Random sampling and approxima-
tion of MAX-CSPs, vol. 67, 2003a, Special issue on STOC2002 (Montreal, QC), pp. 212–243. MR2022830
doi:10.1016/S0022-0000(03)00008-4 ↑122
Noga Alon, Michael Krivelevich, and Benny Sudakov, Turán numbers of bipartite graphs and related Ramsey-type ques-
tions, vol. 12, 2003b, Special issue on Ramsey theory, pp. 477–494. MR2037065 doi:10.1017/S0963548303005741
↑29
Emil Artin, Über die Zerlegung definiter Funktionen in Quadrate, Abh. Math. Sem. Univ. Hamburg 5 (1927), 100–115.
MR3069468 doi:10.1007/BF02952513 ↑144
Lászlo Babai and Péter Frankl, Linear algebra methods in combinatorics, 2020, book draft http://people.cs.
uchicago.edu/~laci/CLASS/HANDOUTS-COMB/BaFrNew.pdf. ↑180
R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive primes. II, Proc. London Math. Soc. (3) 83
(2001), 532–562. MR1851081 doi:10.1112/plms/83.3.532 ↑35
József Balogh, Ping Hu, Bernard Lidický, and Florian Pfender, Maximum density of induced 5-cycle is achieved by
an iterated blow-up of 5-cycle, European J. Combin. 52 (2016), 47–58. MR3425964 doi:10.1016/j.ejc.2015.08.006
↑143
Michael Bateman and Nets Hawk Katz, New bounds on cap sets, J. Amer. Math. Soc. 25 (2012), 585–613. MR2869028
doi:10.1090/S0894-0347-2011-00725-X ↑163
F. A. Behrend, On sets of integers which contain no three terms in arithmetical progression, Proc. Nat. Acad. Sci.
U.S.A. 32 (1946), 331–332. MR18694 doi:10.1073/pnas.32.12.331 ↑57
Clark T. Benson, Minimal regular graphs of girths eight and twelve, Canadian J. Math. 18 (1966), 1091–1094.
MR197342 doi:10.4153/CJM-1966-109-8 ↑38
V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden’s and Szemerédi’s theorems, J. Amer. Math.
Soc. 9 (1996), 725–753. MR1325795 doi:10.1090/S0894-0347-96-00194-4 ↑6
Vitaly Bergelson, Bernard Host, and Bryna Kra, Multiple recurrence and nilsequences, Invent. Math. 160 (2005),
261–303, With an appendix by Imre Ruzsa. MR2138068 doi:10.1007/s00222-004-0428-6 ↑179
Yonatan Bilu and Nathan Linial, Lifts, discrepancy and nearly optimal spectral gap, Combinatorica 26 (2006), 495–519.
MR2279667 doi:10.1007/s00493-006-0029-7 ↑86
Grigoriy Blekherman, Annie Raymond, Mohit Singh, and Rekha R. Thomas, Simple graph density inequalities with
no sum of squares proofs, Combinatorica 40 (2020), 455–471. MR4150878 doi:10.1007/s00493-019-4124-y ↑144
Thomas F. Bloom and Olof Sisask, Breaking the logarithmic barrier in Roth’s theorem on arithmetic progressions,
arXiv preprint (2020). arXiv:2007.03528 ↑5, ↑6, ↑169
Béla Bollobás, Relations between sets of complete subgraphs, Proceedings of the Fifth British Combinatorial Confer-
ence (Univ. Aberdeen, Aberdeen, 1975), 1976, pp. 79–84. MR0396327 ↑151
Béla Bollobás, Modern graph theory, Springer-Verlag, 1998. MR1633290 doi:10.1007/978-1-4612-0619-4 ↑42
J. A. Bondy and U. S. R. Murty, Graph theory, Springer, 2008. MR2368647 doi:10.1007/978-1-84628-970-5 ↑42
J. A. Bondy and M. Simonovits, Cycles of even length in graphs, J. Combinatorial Theory Ser. B 16 (1974), 97–105.
MR340095 doi:10.1016/0095-8956(74)90052-5 ↑27
C. Borgs, J. T. Chayes, L. Lovász, V. T. Sós, and K. Vesztergombi, Convergent sequences of dense graphs.
I. Subgraph frequencies, metric properties and testing, Adv. Math. 219 (2008), 1801–1851. MR2455626
doi:10.1016/j.aim.2008.07.008 ↑115, ↑131
J. Bourgain, On triples in arithmetic progression, Geom. Funct. Anal. 9 (1999), 968–984. MR1726234
doi:10.1007/s000390050105 ↑169
W. G. Brown, On graphs that do not contain a Thomsen graph, Canad. Math. Bull. 9 (1966), 281–285. MR200182
doi:10.4153/CMB-1966-036-2 ↑34, ↑35
W. G. Brown, P. Erdős, and V. T. Sós, Some extremal problems on r-graphs, New directions in the theory of graphs
(Proc. Third Ann Arbor Conf., Univ. Michigan, Ann Arbor, Mich, 1971), 1973, pp. 53–63. MR0351888 ↑54
Boris Bukh, Random algebraic construction of extremal graphs, Bull. Lond. Math. Soc. 47 (2015), 939–945.
MR3431574 doi:10.1112/blms/bdv062 ↑39, ↑42
Boris Bukh, Extremal graphs without exponentially-small bicliques, arXiv preprint (2021). arXiv:2107.04167 ↑39
Jeff Cheeger, A lower bound for the smallest eigenvalue of the Laplacian, Problems in analysis (Papers dedicated to
Salomon Bochner, 1969), 1970, pp. 195–199. MR0402831 ↑87
F. R. K. Chung, R. L. Graham, and R. M. Wilson, Quasi-random graphs, Combinatorica 9 (1989), 345–362.
MR1054011 doi:10.1007/BF02125347 ↑75, ↑82
Fan R. K. Chung, Spectral graph theory, American Mathematical Society, 1997. MR1421568 ↑105
David Conlon, Extremal numbers of cycles revisited, Amer. Math. Monthly 128 (2021), 464–466. MR4249723
doi:10.1080/00029890.2021.1886845 ↑38
David Conlon and Jacob Fox, Graph removal lemmas, Surveys in combinatorics 2013, London Math. Soc. Lecture
Note Ser., vol. 409, Cambridge Univ. Press, Cambridge, 2013, pp. 1–49. MR3156927 ↑73
David Conlon and Yufei Zhao, Quasirandom Cayley graphs, Discrete Anal. (2017), Paper No. 6, 14. MR3631610
doi:10.19086/da.1294 ↑98
David Conlon, Jacob Fox, and Benny Sudakov, An approximate version of Sidorenko’s conjecture, Geom. Funct. Anal.
20 (2010), 1354–1366. MR2738996 doi:10.1007/s00039-010-0097-0 ↑82, ↑135
Don Coppersmith and Shmuel Winograd, Matrix multiplication via arithmetic progressions, J. Symbolic Comput. 9
(1990), 251–280. MR1056627 doi:10.1016/S0747-7171(08)80013-2 ↑57
Ernie Croot, Vsevolod F. Lev, and Péter Pál Pach, Progression-free sets in Z4n are exponentially small, Ann. of Math.
(2) 185 (2017), 331–337. MR3583357 doi:10.4007/annals.2017.185.1.7 ↑163, ↑169
Giuliana Davidoff, Peter Sarnak, and Alain Valette, Elementary number theory, group theory, and Ramanujan graphs,
Cambridge University Press, 2003. MR1989434 doi:10.1017/CBO9780511615825 ↑104, ↑105
Reinhard Diestel, Graph theory, fifth ed., Springer, 2017. MR3644391 doi:10.1007/978-3-662-53622-3 ↑42
Jozef Dodziuk, Difference equations, isoperimetric inequality and transience of certain random walks, Trans. Amer.
Math. Soc. 284 (1984), 787–794. MR743744 doi:10.2307/1999107 ↑87
Yves Edel, Extensions of generalized product caps, Des. Codes Cryptogr. 31 (2004), 5–14. MR2031694
doi:10.1023/A:1027365901231 ↑163
Michael Elkin, An improved construction of progression-free sets, Israel J. Math. 184 (2011), 93–128. MR2823971
doi:10.1007/s11856-011-0061-1 ↑57
Jordan S. Ellenberg and Dion Gijswijt, On large subsets of Fqn with no three-term arithmetic progression, Ann. of Math.
(2) 185 (2017), 339–343. MR3583358 doi:10.4007/annals.2017.185.1.8 ↑163, ↑169
P. Erdős, On some extremal problems on r-graphs, Discrete Math. 1 (1971), 1–6. MR297602 doi:10.1016/0012-
365X(71)90002-1 ↑24
P. Erdős and M. Simonovits, A limit theorem in graph theory, Studia Sci. Math. Hungar. 1 (1966), 51–57. MR205876
↑23
P. Erdős, A. Rényi, and V. T. Sós, On a problem of graph theory, Studia Sci. Math. Hungar. 1 (1966), 215–235.
MR223262 ↑34
Paul Erdős, On some problems in graph theory, combinatorial analysis and combinatorial number theory, Graph
theory and combinatorics (Cambridge, 1983), Academic Press, London, 1984, pp. 1–17. MR777160 ↑143
P. Erdős, On sets of distances of n points, Amer. Math. Monthly 53 (1946), 248–250. MR15796 doi:10.2307/2305092
↑21, ↑22
P. Erdős and A. H. Stone, On the structure of linear graphs, Bull. Amer. Math. Soc. 52 (1946), 1087–1091. MR18807
doi:10.1090/S0002-9904-1946-08715-7 ↑23, ↑26
Paul Erdős and Paul Turán, On Some Sequences of Integers, J. London Math. Soc. 11 (1936), 261–264. MR1574918
doi:10.1112/jlms/s1-11.4.261 ↑4
Geoffrey Exoo, A lower bound for Schur numbers and multicolor Ramsey numbers of K3 , Electron. J. Combin. 1 (1994),
R8, 3 pp. MR1293398 ↑3
Helmut Finner, A generalization of Hölder’s inequality and some probability inequalities, Ann. Probab. 20 (1992),
1893–1901. MR1188047 ↑145, ↑147
Jacob Fox, A new proof of the graph removal lemma, Ann. of Math. (2) 174 (2011), 561–579. MR2811609
doi:10.4007/annals.2011.174.1.17 ↑54
Jacob Fox and Huy Tuan Pham, Popular progression differences in vector spaces II, Discrete Anal. (2019), Paper No.
16, 39. MR4042159 doi:10.19086/da ↑179
Jacob Fox and Benny Sudakov, Dependent random choice, Random Structures Algorithms 38 (2011), 68–99.
MR2768884 doi:10.1002/rsa.20344 ↑42
Jacob Fox, Huy Tuan Pham, and Yufei Zhao, Tower-type bounds for Roth’s theorem with popular differences, 2020.
arXiv:2004.13690 ↑179
Peter Frankl and Vojtěch Rödl, Extremal problems on set systems, Random Structures Algorithms 20 (2002), 131–164.
MR1884430 doi:10.1002/rsa.10017.abs ↑70
Joel Friedman, A proof of Alon’s second eigenvalue conjecture and related problems, Mem. Amer. Math. Soc. 195
(2008), viii+100. MR2437174 doi:10.1090/memo/0910 ↑104
Alan Frieze and Ravi Kannan, Quick approximation to matrices and applications, Combinatorica 19 (1999), 175–220.
MR1723039 doi:10.1007/s004930050052 ↑120, ↑122
William Fulton and Joe Harris, Representation theory, Graduate Texts in Mathematics, vol. 129, Springer-Verlag, New
York, 1991, A first course, Readings in Mathematics. MR1153249 doi:10.1007/978-1-4612-0979-9 ↑95
Zoltán Füredi, On a Turán type problem of Erdős, Combinatorica 11 (1991), 75–79. MR1112277
doi:10.1007/BF01375476 ↑29
Zoltán Füredi and Miklós Simonovits, The history of degenerate (bipartite) extremal graph problems, Erdös centennial,
János Bolyai Math. Soc., 2013, pp. 169–264. MR3203598 doi:10.1007/978-3-642-39286-3_7 ↑42
H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J.
Analyse Math. 31 (1977), 204–256. MR0498471 ↑5, ↑6
H. Furstenberg and Y. Katznelson, An ergodic Szemerédi theorem for commuting transformations, J. Analyse Math.
34 (1978), 275–291. MR531279 doi:10.1007/BF02790016 ↑6
David Galvin and Prasad Tetali, On weighted graph homomorphisms, Graphs, morphisms and statistical physics,
DIMACS Ser. Discrete Math. Theoret. Comput. Sci., vol. 63, Amer. Math. Soc., Providence, RI, 2004, pp. 97–104.
MR2056231 doi:10.1090/dimacs/063/07 ↑147, ↑148
Michel X. Goemans and David P. Williamson, Improved approximation algorithms for maximum cut and satisfia-
bility problems using semidefinite programming, J. Assoc. Comput. Mach. 42 (1995), 1115–1145. MR1412228
doi:10.1145/227683.227684 ↑122
A. W. Goodman, On sets of acquaintances and strangers at any party, Amer. Math. Monthly 66 (1959), 778–783.
MR107610 doi:10.2307/2310464 ↑140, ↑141
W. T. Gowers, Lower bounds of tower type for Szemerédi’s uniformity lemma, Geom. Funct. Anal. 7 (1997), 322–337.
MR1445389 doi:10.1007/PL00001621 ↑48, ↑175, ↑179
W. T. Gowers, A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11 (2001), 465–588. MR1844079
doi:10.1007/s00039-001-0332-9 ↑5
W. T. Gowers, Quasirandomness, counting and regularity for 3-uniform hypergraphs, Combin. Probab. Comput. 15
(2006), 143–184. MR2195580 doi:10.1017/S0963548305007236 ↑72, ↑73
W. T. Gowers, Hypergraph regularity and the multidimensional Szemerédi theorem, Ann. of Math. (2) 166 (2007),
897–946. MR2373376 doi:10.4007/annals.2007.166.897 ↑70, ↑72
W. T. Gowers, Quasirandom groups, Combin. Probab. Comput. 17 (2008), 363–387. MR2410393
doi:10.1017/S0963548307008826 ↑92, ↑95, ↑96
Ronald L. Graham, Bruce L. Rothschild, and Joel H. Spencer, Ramsey theory, John Wiley & Sons, Inc., 2013.
MR3288500 ↑8
B. Green, A Szemerédi-type regularity lemma in abelian groups, with applications, Geom. Funct. Anal. 15 (2005),
340–376. MR2153903 doi:10.1007/s00039-005-0509-8 ↑173, ↑175, ↑177, ↑178
Ben Green, Finite field models in additive combinatorics, Surveys in combinatorics 2005, Cambridge University Press,
2005b, pp. 1–27. MR2187732 doi:10.1017/CBO9780511734885.002 ↑179
Ben Green, Additive combinatorics (book review), Bull. Amer. Math. Soc. (N.S.) 46 (2009), 489–497. MR2507281
doi:10.1090/S0273-0979-09-01231-2 ↑8
Ben Green, Additive combinatorics, 2009b, lecture notes http://people.maths.ox.ac.uk/greenbj/notes.
html. ↑179
Ben Green and Terence Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008),
481–547. MR2415379 doi:10.4007/annals.2008.167.481 ↑7
Ben Green and Terence Tao, An arithmetic regularity lemma, an associated counting lemma, and applications, An
irregular mind, Bolyai Soc. Math. Stud., vol. 21, János Bolyai Math. Soc., Budapest, 2010, pp. 261–334. MR2815606
doi:10.1007/978-3-642-14444-8_7 ↑179
Ben Green and Terence Tao, New bounds for Szemerédi’s theorem, III: a polylogarithmic bound for r4 (N), Mathematika
63 (2017), 944–1040. MR3731312 doi:10.1112/S0025579317000316 ↑5
Ben Green and Julia Wolf, A note on Elkin’s improvement of Behrend’s construction, Additive number theory, Springer,
New York, 2010, pp. 141–144. MR2744752 doi:10.1007/978-0-387-68361-4_9 ↑57
A. Grothendieck, Résumé de la théorie métrique des produits tensoriels topologiques, Bol. Soc. Mat. São Paulo 8
(1953), 1–79. MR94682 ↑98
Andrzej Grzesik, On the maximum number of five-cycles in a triangle-free graph, J. Combin. Theory Ser. B 102 (2012),
1061–1066. MR2959390 doi:10.1016/j.jctb.2012.04.001 ↑143
Larry Guth, Polynomial methods in combinatorics, University Lecture Series, vol. 64, American Mathematical Society,
Providence, RI, 2016. MR3495952 doi:10.1090/ulect/064 ↑180
Larry Guth and Nets Hawk Katz, On the Erdős distinct distances problem in the plane, Ann. of Math. (2) 181 (2015),
155–190. MR3272924 doi:10.4007/annals.2015.181.1.2 ↑22
Johan Håstad, Some optimal inapproximability results, J. ACM 48 (2001), 798–859. MR2144931
doi:10.1145/502090.502098 ↑122
Hamed Hatami and Serguei Norine, Undecidability of linear inequalities in graph homomorphism densities, J. Amer.
Math. Soc. 24 (2011), 547–565. MR2748400 doi:10.1090/S0894-0347-2010-00687-X ↑133, ↑144
Hamed Hatami, Jan Hladký, Daniel Kráľ, Serguei Norine, and Alexander Razborov, On the number of pentagons in
triangle-free graphs, J. Combin. Theory Ser. A 120 (2013), 722–732. MR3007147 doi:10.1016/j.jcta.2012.12.008
↑143
David Hilbert, Ueber die Darstellung definiter Formen als Summe von Formenquadraten, Math. Ann. 32 (1888),
342–350. MR1510517 doi:10.1007/BF01443605 ↑144
David Hilbert, Über ternäre definite Formen, Acta Math. 17 (1893), 169–197. MR1554835 doi:10.1007/BF02391990
↑144
Shlomo Hoory, Nathan Linial, and Avi Wigderson, Expander graphs and their applications, Bull. Amer. Math. Soc.
(N.S.) 43 (2006), 439–561. MR2247919 doi:10.1090/S0273-0979-06-01126-8 ↑105
Kaave Hosseini, Shachar Lovett, Guy Moshkovitz, and Asaf Shapira, An improved lower bound for arithmetic regularity,
Math. Proc. Cambridge Philos. Soc. 161 (2016), 193–197. MR3530502 doi:10.1017/S030500411600013X ↑175
Kenneth Ireland and Michael Rosen, A classical introduction to modern number theory, second ed., Graduate Texts in
Mathematics, vol. 84, Springer-Verlag, New York, 1990. MR1070716 doi:10.1007/978-1-4757-2103-4 ↑91
Herbert E. Jordan, Group-Characters of Various Types of Linear Groups, Amer. J. Math. 29 (1907), 387–405.
MR1506021 doi:10.2307/2370015 ↑94
Jeff Kahn, An entropy approach to the hard-core model on bipartite graphs, Combin. Probab. Comput. 10 (2001),
219–237. MR1841642 doi:10.1017/S0963548301004631 ↑147, ↑148
G. Katona, A theorem of finite sets, Theory of graphs (Proc. Colloq., Tihany, 1966), 1968, pp. 187–207. MR0290982
↑137
Kiran S. Kedlaya, Large product-free subsets of finite groups, J. Combin. Theory Ser. A 77 (1997), 339–343.
MR1429085 doi:10.1006/jcta.1997.2715 ↑96
Kiran S. Kedlaya, Product-free subsets of groups, Amer. Math. Monthly 105 (1998), 900–906. MR1656927
doi:10.2307/2589282 ↑96
Peter Keevash, Hypergraph Turán problems, Surveys in combinatorics 2011, London Math. Soc. Lecture Note Ser.,
vol. 392, Cambridge Univ. Press, Cambridge, 2011, pp. 83–139. MR2866732 ↑42
Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O’Donnell, Optimal inapproximability results for MAX-CUT
and other 2-variable CSPs?, SIAM J. Comput. 37 (2007), 319–357. MR2306295 doi:10.1137/S0097539705447372
↑122
János Kollár, Lajos Rónyai, and Tibor Szabó, Norm-graphs and bipartite Turán numbers, Combinatorica 16 (1996),
399–406. MR1417348 doi:10.1007/BF01261323 ↑36
J. Komlós and M. Simonovits, Szemerédi’s regularity lemma and its applications in graph theory, Combinatorics, Paul
Erdős is eighty, Vol. 2 (Keszthely, 1993), Bolyai Soc. Math. Stud., vol. 2, János Bolyai Math. Soc., Budapest, 1996,
pp. 295–352. MR1395865 ↑73
János Komlós, Ali Shokoufandeh, Miklós Simonovits, and Endre Szemerédi, The regularity lemma and its applications
in graph theory, Theoretical aspects of computer science (Tehran, 2000), Lecture Notes in Comput. Sci., vol. 2292,
Springer, Berlin, 2002, pp. 84–112. MR1966181 doi:10.1007/3-540-45878-6_3 ↑73
T. Kővári, V. T. Sós, and P. Turán, On a problem of K. Zarankiewicz, Colloq. Math. 3 (1954), 50–57. MR65617
doi:10.4064/cm-3-1-50-57 ↑19
M. Krivelevich and B. Sudakov, Pseudo-random graphs, More sets, graphs and numbers, Bolyai Soc. Math. Stud.,
vol. 15, Springer, Berlin, 2006, pp. 199–262. MR2223394 doi:10.1007/978-3-540-32439-3_10 ↑105
Joseph B. Kruskal, The number of simplices in a complex, Mathematical optimization techniques, Univ. California
Press, Berkeley, Calif., 1963, pp. 251–278. MR0154827 ↑137
L. H. Loomis and H. Whitney, An inequality related to the isoperimetric inequality, Bull. Amer. Math. Soc. 55 (1949),
961–962. MR0031538 doi:10.1090/S0002-9904-1949-09320-5 ↑41, ↑146
László Lovász, Very large graphs, Current developments in mathematics, 2008, Int. Press, Somerville, MA, 2009,
pp. 67–128. MR2555927 ↑132
László Lovász, Large networks and graph limits, American Mathematical Society Colloquium Publications, vol. 60,
American Mathematical Society, Providence, RI, 2012. MR3012035 doi:10.1090/coll/060 ↑132, ↑135, ↑138, ↑141,
↑154
László Lovász and Balázs Szegedy, Limits of dense graph sequences, J. Combin. Theory Ser. B 96 (2006), 933–957.
MR2274085 doi:10.1016/j.jctb.2006.05.002 ↑115
László Lovász and Balázs Szegedy, Szemerédi’s lemma for the analyst, Geom. Funct. Anal. 17 (2007), 252–270.
MR2306658 doi:10.1007/s00039-007-0599-6 ↑112
Eyal Lubetzky and Yufei Zhao, On the variational problem for upper tails in sparse random graphs, Random Structures
Algorithms 50 (2017), 420–436. MR3632418 doi:10.1002/rsa.20658 ↑147, ↑148
A. Lubotzky, R. Phillips, and P. Sarnak, Ramanujan graphs, Combinatorica 8 (1988), 261–277. MR963118
doi:10.1007/BF02126799 ↑104
Alexander Lubotzky, Discrete groups, expanding graphs and invariant measures, Progress in Mathematics, vol. 125,
Birkhäuser Verlag, Basel, 1994, With an appendix by Jonathan D. Rogawski. MR1308046 doi:10.1007/978-3-0346-
0332-4 ↑105
Alexander Lubotzky, Expander graphs in pure and applied mathematics, Bull. Amer. Math. Soc. (N.S.) 49 (2012),
113–162. MR2869010 doi:10.1090/S0273-0979-2011-01359-3 ↑105
W. Mantel, Problem 28, Wiskundige Opgaven 10 (1907), 60–61. ↑10
Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava, Interlacing families I: Bipartite Ramanujan graphs of
all degrees, Ann. of Math. (2) 182 (2015), 307–325. MR3374962 doi:10.4007/annals.2015.182.1.7 ↑105
G. A. Margulis, Explicit group-theoretic constructions of combinatorial schemes and their applications in the con-
struction of expanders and concentrators, Problemy Peredachi Informatsii 24 (1988), 51–60. MR939574 ↑104
Ju. V. Matiyasevich, The Diophantineness of enumerable sets, Dokl. Akad. Nauk SSSR 191 (1970), 279–282.
MR0258744 ↑133
Jiří Matoušek, Thirty-three miniatures, Student Mathematical Library, vol. 53, American Mathematical Society,
Providence, RI, 2010, Mathematical and algorithmic applications of linear algebra. MR2656313 doi:10.1090/stml/053 ↑180
Roy Meshulam, On subsets of finite abelian groups with no 3-term arithmetic progressions, J. Combin. Theory Ser. A
71 (1995), 168–172. MR1335785 doi:10.1016/0097-3165(95)90024-1 ↑159
Moshe Morgenstern, Existence and explicit constructions of (q + 1)-regular Ramanujan graphs for every prime power
q, J. Combin. Theory Ser. B 62 (1994), 44–62. MR1290630 doi:10.1006/jctb.1994.1054 ↑104
Guy Moshkovitz and Asaf Shapira, A tight bound for hypergraph regularity, Geom. Funct. Anal. 29 (2019), 1531–1578.
MR4025519 doi:10.1007/s00039-019-00512-5 ↑72
T. S. Motzkin, The arithmetic-geometric inequality, Inequalities (Proc. Sympos. Wright-Patterson Air Force Base,
Ohio, 1965), Academic Press, New York, 1967, pp. 205–224. MR0223521 ↑144
T. S. Motzkin and E. G. Straus, Maxima for graphs and a new proof of a theorem of Turán, Canadian J. Math. 17
(1965), 533–540. MR175813 doi:10.4153/CJM-1965-053-6 ↑150
Jaroslav Nešetřil and Moshe Rosenfeld, I. Schur, C. E. Shannon and Ramsey numbers, a short story, Discrete Math.
229 (2001), 185–195, Combinatorics, graph theory, algorithms and applications. MR1815606 doi:10.1016/S0012-
365X(00)00208-9 ↑3
V. Nikiforov, The number of cliques in graphs of given order and size, Trans. Amer. Math. Soc. 363 (2011), 1599–1618.
MR2737279 doi:10.1090/S0002-9947-2010-05189-X ↑139
N. Nikolov and L. Pyber, Product decompositions of quasirandom groups and a Jordan type theorem, J. Eur. Math.
Soc. (JEMS) 13 (2011), 1063–1077. MR2800484 doi:10.4171/JEMS/275 ↑96
A. Nilli, On the second eigenvalue of a graph, Discrete Math. 91 (1991), 207–210. MR1124768 doi:10.1016/0012-
365X(91)90112-F ↑100
Giuseppe Pellegrino, Sul massimo ordine delle calotte in S_{4,3}, Matematiche (Catania) 25 (1970), 149–157 (1971).
MR363952 ↑161
Sarah Peluse, Bounds for sets with no polynomial progressions, Forum Math. Pi 8 (2020), e16, 55. MR4199235
doi:10.1017/fmp.2020.11 ↑7
Nicholas Pippenger and Martin Charles Golumbic, The inducibility of graphs, J. Combinatorial Theory Ser. B 19
(1975), 189–203. MR401552 doi:10.1016/0095-8956(75)90084-2 ↑143
D. H. J. Polymath, A new proof of the density Hales-Jewett theorem, Ann. of Math. (2) 175 (2012), 1283–1327. MR2912706
doi:10.4007/annals.2012.175.3.6 ↑5
Alexander A. Razborov, Flag algebras, J. Symbolic Logic 72 (2007), 1239–1282. MR2371204
doi:10.2178/jsl/1203350785 ↑142
Alexander A. Razborov, On the minimal density of triangles in graphs, Combin. Probab. Comput. 17 (2008), 603–618.
MR2433944 doi:10.1017/S0963548308009085 ↑138, ↑142
Alexander A. Razborov, Flag algebras: an interim report, The mathematics of Paul Erdős. II, Springer, New York,
2013, pp. 207–232. MR3186665 doi:10.1007/978-1-4614-7254-4_16 ↑154
Christian Reiher, The clique density theorem, Ann. of Math. (2) 184 (2016), 683–707. MR3549620
doi:10.4007/annals.2016.184.3.1 ↑139
V. Rödl, B. Nagle, J. Skokan, M. Schacht, and Y. Kohayakawa, The hypergraph regularity method and its applications,
Proc. Natl. Acad. Sci. USA 102 (2005), 8109–8113. MR2167756 doi:10.1073/pnas.0502771102 ↑5, ↑70, ↑72
K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 104–109. MR51853 doi:10.1112/jlms/s1-
28.1.104 ↑4, ↑43, ↑155, ↑164
Imre Z. Ruzsa and Endre Szemerédi, Triple systems with no six points carrying three triangles, Combinatorics (Proc.
Fifth Hungarian Colloq., Keszthely, 1976), Vol. II, 1978, pp. 939–945. MR519318 ↑7, ↑43, ↑52, ↑54
Bruce E. Sagan, The symmetric group, second ed., Graduate Texts in Mathematics, vol. 203, Springer-Verlag, New
York, 2001, Representations, combinatorial algorithms, and symmetric functions. MR1824028 doi:10.1007/978-1-
4757-6804-6 ↑95
Ashwin Sah, Mehtaab Sawhney, David Stoner, and Yufei Zhao, The number of independent sets in an irregular graph,
J. Combin. Theory Ser. B 138 (2019), 172–195. MR3979229 doi:10.1016/j.jctb.2019.01.007 ↑150
Ashwin Sah, Mehtaab Sawhney, David Stoner, and Yufei Zhao, A reverse Sidorenko inequality, Invent. Math. 221
(2020), 665–711. MR4121160 doi:10.1007/s00222-020-00956-9 ↑150
Ashwin Sah, Mehtaab Sawhney, and Yufei Zhao, Patterns without a popular difference, Discrete Anal. (2021), Paper
No. 8, 30. MR4293329 doi:10.19086/da ↑179
R. Salem and D. C. Spencer, On sets of integers which contain no three terms in arithmetical progression, Proc. Nat.
Acad. Sci. U.S.A. 28 (1942), 561–563. MR7405 doi:10.1073/pnas.28.12.561 ↑57
A. Sárközy, On difference sets of sequences of integers. I, Acta Math. Acad. Sci. Hungar. 31 (1978), 125–149.
MR466059 doi:10.1007/BF01896079 ↑6
Tomasz Schoen and Ilya D. Shkredov, Roth’s theorem in many variables, Israel J. Math. 199 (2014), 287–308.
MR3219538 doi:10.1007/s11856-013-0049-0 ↑5
Tomasz Schoen and Olof Sisask, Roth’s theorem for four variables and additive structures in sums of sparse sets,
Forum Math. Sigma 4 (2016), e5, 28 pp. MR3482282 doi:10.1017/fms.2016.2 ↑5
Alexander Schrijver, Combinatorial optimization. Polyhedra and efficiency, Springer-Verlag, 2003. MR1956924 ↑42
I. Schur, Über die Kongruenz x^m + y^m ≡ z^m (mod p), Jber. Deutsch. Math.-Verein. 25 (1916). ↑1
J. Schur, Untersuchungen über die Darstellung der endlichen Gruppen durch gebrochene lineare Substitutionen, J.
Reine Angew. Math. 132 (1907), 85–137. MR1580715 doi:10.1515/crll.1907.132.85 ↑94
I. D. Shkredov, On a generalization of Szemerédi’s theorem, Proc. London Math. Soc. (3) 93 (2006), 723–760.
MR2266965 doi:10.1017/S0024611506015991 ↑56
Alexander Sidorenko, A correlation inequality for bipartite graphs, Graphs Combin. 9 (1993), 201–204. MR1225933
doi:10.1007/BF02988307 ↑134
Robert Singleton, On minimal graphs of maximum even girth, J. Combinatorial Theory 1 (1966), 306–332. MR201347
↑38
Jozef Skokan and Lubos Thoma, Bipartite subgraphs and quasi-randomness, Graphs Combin. 20 (2004), 255–262.
MR2080111 doi:10.1007/s00373-004-0556-1 ↑82, ↑135
József Solymosi, Note on a generalization of Roth’s theorem, Discrete and computational geometry, Algorithms
Combin., vol. 25, Springer, Berlin, 2003, pp. 825–827. MR2038505 doi:10.1007/978-3-642-55566-4_39 ↑55
Daniel A. Spielman, Spectral and algebraic graph theory, 2019, textbook draft http://cs-www.cs.yale.edu/
homes/spielman/sagt/. ↑105
Elias M. Stein and Rami Shakarchi, Fourier analysis, Princeton Lectures in Analysis, vol. 1, Princeton University
Press, Princeton, NJ, 2003, An introduction. MR1970295 ↑179
E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 199–245.
MR369312 doi:10.4064/aa-27-1-199-245 ↑4
Endre Szemerédi, Regular partitions of graphs, Problèmes combinatoires et théorie des graphes (Colloq. Internat.
CNRS, Univ. Orsay, Orsay, 1976), Colloq. Internat. CNRS, vol. 260, CNRS, Paris, 1978, pp. 399–401. MR540024
↑49
Terence Tao, A variant of the hypergraph removal lemma, J. Combin. Theory Ser. A 113 (2006), 1257–1280.
MR2259060 doi:10.1016/j.jcta.2005.11.006 ↑72
Terence Tao, Structure and randomness in combinatorics, 48th Annual IEEE Symposium on Foundations of Computer
Science (FOCS’07), 2007a, pp. 3–15. doi:10.1109/FOCS.2007.17 ↑177, ↑179
Terence Tao, The dichotomy between structure and randomness, arithmetic progressions, and the primes, International
Congress of Mathematicians. Vol. I, Eur. Math. Soc., Zürich, 2007b, pp. 581–608. MR2334204 doi:10.4171/022-
1/22 ↑5
Terence Tao, The spectral proof of the Szemerédi regularity lemma, 2012, blog post https://terrytao.wordpress.
com/2012/12/03/the-spectral-proof-of-the-szemeredi-regularity-lemma/. ↑177
Terence Tao, A proof of Roth’s theorem, 2014, blog post https://terrytao.wordpress.com/2014/04/24/
a-proof-of-roths-theorem/. ↑179
Terence Tao and Van Vu, Additive combinatorics, Cambridge University Press, 2006. MR2289012
doi:10.1017/CBO9780511755149 ↑8
Alfred Tarski, A Decision Method for Elementary Algebra and Geometry, RAND Corporation, Santa Monica, Calif.,
1948. MR0028796 ↑133
Andrew Thomason, Pseudorandom graphs, Random graphs ’85 (Poznań, 1985), North-Holland Math. Stud., vol. 144,
North-Holland, Amsterdam, 1987, pp. 307–331. MR930498 ↑75
Andrew Thomason, A disproof of a conjecture of Erdős in Ramsey theory, J. London Math. Soc. (2) 39 (1989),
246–255. MR991659 doi:10.1112/jlms/s2-39.2.246 ↑141
Paul Turán, Eine Extremalaufgabe aus der Graphentheorie, Mat. Fiz. Lapok 48 (1941), 436–452 (Hungarian, with
German summary). ↑12
B. L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw Arch. Wisk. 15 (1927), 212–216. ↑4
R. Wenger, Extremal graphs with no C_4's, C_6's, or C_10's, J. Combin. Theory Ser. B 52 (1991), 113–116. MR1109426
doi:10.1016/0095-8956(91)90097-4 ↑38
Douglas B. West, Introduction to graph theory, Prentice Hall, Inc., 1996. MR1367739 ↑42
Avi Wigderson, Representation theory of finite groups, and applications, Lecture notes for the 22nd McGill invitational
workshop on computational complexity, 2012, https://www.math.ias.edu/~avi/TALKS/Green_Wigderson_
lecture.pdf. ↑96
David Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press,
Cambridge, 1991. MR1155402 doi:10.1017/CBO9780511813658 ↑123, ↑124
K. Zarankiewicz, Problem 101, Colloq. Math. 2 (1951), 201. ↑19
Yufei Zhao, The number of independent sets in a regular graph, Combin. Probab. Comput. 19 (2010), 315–320.
MR2593625 doi:10.1017/S0963548309990538 ↑147, ↑149
Yufei Zhao, Extremal regular graphs: independent sets and graph homomorphisms, Amer. Math. Monthly 124 (2017),
827–843. MR3722040 doi:10.4169/amer.math.monthly.124.9.827 ↑150