László Erdős
Horng-Tzer Yau, Harvard University

A Dynamical Approach to Random Matrix Theory

Courant Lecture Notes, Volume 28
Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy select pages for
use in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Permissions to reuse
portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink
service. For more information, please visit: http://www.ams.org/rightslink.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.
Excluded from these provisions is material for which the author holds copyright. In such cases,
requests for permission to reuse or reprint material should be addressed directly to the author(s).
Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the
first page of each article within proceedings volumes.
© 2017 by the authors. All rights reserved.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
Contents
Preface ix
Chapter 1. Introduction 1
Chapter 2. Wigner Matrices and Their Generalizations 7
Chapter 3. Eigenvalue Density 11
3.1. Wigner Semicircle Law and Other Canonical Densities 11
3.2. The Moment Method 12
3.3. The Resolvent Method and the Stieltjes Transform 14
Chapter 4. Invariant Ensembles 17
4.1. Joint Density of Eigenvalues for Invariant Ensembles 17
4.2. Universality of Classical Invariant Ensembles
via Orthogonal Polynomials 20
Chapter 5. Universality for Generalized Wigner Matrices 29
5.1. Different Notions of Universality 29
5.2. The Three-Step Strategy 31
Chapter 6. Local Semicircle Law for Universal Wigner Matrices 33
6.1. Setup 33
6.2. Spectral Information on 𝑆 35
6.3. Stochastic Domination 37
6.4. Statement of the Local Semicircle Law 38
6.5. Appendix: Behavior of Γ and Γ̃ and the Proof of Lemma 6.3 40
Chapter 7. Weak Local Semicircle Law 45
7.1. Proof of the Weak Local Semicircle Law, Theorem 7.1 45
7.2. Large-Deviation Estimates 57
Chapter 8. Proof of the Local Semicircle Law 61
8.1. Tools 61
8.2. Self-Consistent Equations on Two Levels 64
8.3. Proof of the Local Semicircle Law Without Using the Spectral Gap 67
Chapter 9. Sketch of the Proof of the Local Semicircle Law
Using the Spectral Gap 79
CHAPTER 1
Introduction
As the first result of this type, Wigner proved a type of law of large num-
bers for the density of eigenvalues, which we now explain. The (real or com-
plex) Wigner ensembles consist of 𝑁 × 𝑁 self-adjoint matrices 𝐻 = (ℎ𝑖𝑗 ) with
matrix elements having mean zero and variance 1/𝑁 that are independent up
to the symmetry constraint ℎ𝑖𝑗 = ℎ𝑗𝑖 . The Wigner semicircle law states that
the empirical density of the eigenvalues of $H$ is given by the semicircle law, $\varrho_{\mathrm{sc}}(x) = \frac{1}{2\pi}\sqrt{(4-x^2)_+}$, as $N \to \infty$, independent of the details of the distribution of $h_{ij}$.
On the scale of individual eigenvalues, Wigner predicted that the fluctua-
tions of the gaps are universal and their distribution is given by a new law, the
Wigner surmise. This might be viewed as the random matrix analogue of the
central limit theorem.
After Wigner’s discovery, Dyson, Gaudin, and Mehta achieved several fun-
damental mathematical results. In particular, they were able to compute the
gap distribution and the local correlation functions of the eigenvalues for Gauss-
ian ensembles. They are called the Gaussian orthogonal ensemble (GOE) and
the Gaussian unitary ensemble (GUE), corresponding to the two most important
symmetry classes, the real symmetric and complex Hermitian matrices. The
Wigner surmise turned out to be slightly wrong and the correct law is given by
the Gaudin distribution. Dyson and Mehta gradually formulated what is nowa-
days known as the Wigner-Dyson-Mehta (WDM) universality conjecture. As
presented in the classical treatise of Mehta [106], this conjecture asserts that
the local eigenvalue statistics for large random matrices with independent en-
tries are universal; i.e., they do not depend on the particular distribution of the
matrix elements. In particular, they coincide with those in the Gaussian case
that were computed explicitly.
On a more philosophical level, we can recast Wigner’s vision as the hypoth-
esis that the eigenvalue gap distributions for large complicated quantum sys-
tems are universal in the sense that they depend only on the symmetry class of
the physical system but not on other detailed structures. Therefore, the Wigner-
Dyson-Mehta universality conjecture is merely a test of Wigner’s hypothesis for
a special class of matrix models, the Wigner ensembles, which are characterized
by the independence of their matrix elements. The other large class of ma-
trix models is the invariant ensembles. They are defined by a Radon-Nikodym
density of the form exp(−Tr 𝑉(𝐻)) with respect to the flat Lebesgue measure
on the space of real symmetric or complex Hermitian matrices 𝐻. Here 𝑉 is a
real-valued function called the potential of the invariant ensemble. These distri-
butions are invariant under orthogonal or unitary conjugation, but the matrix
elements are not independent except for the Gaussian case. The universality
conjecture for invariant ensembles asserts that the local eigenvalue gap statis-
tics are independent of the function 𝑉.
In contrast to the Wigner case, substantial progress was made for invari-
ant ensembles in the last two decades. The key element of this success was
that invariant ensembles, unlike Wigner matrices, have explicit formulas for
the joint densities of the eigenvalues. These formulas express the eigenvalue
correlation functions as determinants whose entries are given by functions of
orthogonal polynomials. The limiting local eigenvalue statistics are thus deter-
mined by the asymptotic behavior of these formulas and in particular those of
the orthogonal polynomials as the size of the matrix tends to infinity. A key
important tool, the Riemann-Hilbert method [72], was brought into this sub-
ject by Fokas, Its, and Kitaev, and the universality of eigenvalue statistics was
established for large classes of invariant ensembles by Bleher-Its [20] and by
Deift and collaborators [37–41].
Behind the spectacular progress in understanding the local statistics of in-
variant ensembles, the cornerstone of this approach lies in the fact that there
are explicit formulas to represent eigenvalue correlation functions by orthogo-
nal polynomials—a key observation made in the original work of Gaudin and
Mehta for Gaussian random matrices. For Wigner ensembles, there are no ex-
plicit formulas for the eigenvalue statistics, and the WDM conjecture was open
for almost 50 years with virtually no progress. The first significant advance
in this direction was made by Johansson [86], who proved the universality for
complex Hermitian matrices under the assumption that the common distribu-
tion of the matrix entries has a substantial Gaussian component; i.e., the ran-
dom matrix $H$ is of the form $H = H_0 + aH^{\mathrm{G}}$ where $H_0$ is a general Wigner matrix, $H^{\mathrm{G}}$ is the GUE matrix, and $a$ is a certain, not too small, positive con-
stant independent of 𝑁. His proof relied on an explicit formula by Brézin and
Hikami [30, 31] that uses a certain version of the Harish-Chandra-Itzykson-
Zuber formula [85]. These formulas are available for the complex Hermitian
case only, which restricted the method to this symmetry class.
If local spectral universality is so ubiquitous as Wigner, Dyson, and Mehta
conjectured, there must be an underlying mechanism driving local statistics to
their universal fixed point. In hindsight, the existence of such a mechanism is
almost a synonym of “universality.” However, up to ten years ago there was no
clear indication whether a solution of the WDM conjecture would rely on some
yet-to-be discovered formulas or would come from some other deep insight.
The goal of this book is to give a self-contained introduction to a new ap-
proach to local spectral universality which, in particular, resolves the WDM
conjecture. To keep technicalities to a minimum, we will consider the simplest
class of matrices that we call generalized Wigner matrices (Definition 2.1), and
we will focus on proving universality in a simpler form, in the averaged energy
sense (see Section 5.1) and the necessary prerequisites. We stress that these are
not the strongest results that are currently available, but they are good repre-
sentatives of the general ideas we have developed in the past several years. We
believe that these techniques are sufficiently mature to be presented in a book
format.
The conceptually novel point of our method is Step 2. The eigenvalue distributions of the Gaussian divisible ensembles, written in the form $e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H^{\mathrm{G}}$, are the same as those of the solution of a matrix-valued Ornstein-Uhlenbeck (OU) process $H_t$ for any time $t \ge 0$. Dyson [45] observed half a
century ago that the dynamics of the eigenvalues of 𝐻𝑡 is given by an inter-
acting stochastic particle system, called the Dyson Brownian motion (DBM).
In addition, the invariant measure of this dynamics is exactly the eigenvalue
distribution of GOE or GUE. This invariant measure is also a Gibbs measure
of point particles in one dimension interacting via a long-range, logarithmic
potential. Using a heuristic physical argument, Dyson remarked [45] that the
DBM reaches its “local equilibrium” on a short time scale 𝑡 ≳ 𝑁 −1 . We will
refer to this as Dyson’s conjecture, although it was more an intuitive physical
picture than an exact mathematical statement.
Since Dyson’s work in the 1960s, there has been virtually no progress in
proving this conjecture. Besides the limit of available mathematical tools, one
main factor is the vague formulation of the conjecture involving the notion of
“local equilibrium,” which even nowadays is not well-defined for a Gibbs mea-
sure with a general long-range interaction. Furthermore, a possible connection
between Dyson’s conjecture and the solution of the WDM conjecture has never
been elucidated in the literature.
In fact, “relaxation to local equilibrium” in a time scale 𝑡 refers to the phe-
nomenon that after time 𝑡 the dynamics has changed the system, at least locally,
from its initial state to a local equilibrium. It therefore appears counterintuitive
that one may learn anything useful in this way about the WDM conjecture,
since the latter concerns the initial state. The key point is that by applying local
relaxation to all initial states (within a reasonable class) simultaneously, Step 2
generates a large set of random matrix ensembles for which universality holds.
We prove that, for the purpose of universality, this set is sufficiently dense so
that any Wigner matrix 𝐻 is sufficiently close to a Gaussian divisible ensemble
of the form $e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H^{\mathrm{G}}$ with a suitably chosen $H_0$. Originally, our
motivation was driven not by Dyson’s conjecture, but by the desire to prove the
give a historical overview, discuss newer results, and provide some outlook on
related models in Chapter 18.
This book is not a comprehensive monograph on random matrices. Al-
though we summarize a few general concepts at the beginning, we mostly focus
on a concrete strategy and the necessary technical steps behind it. For readers
interested in other aspects, in addition to the classical book of Mehta [106],
several excellent works are available that present random matrices in a broader
scope. The books by Anderson, Guionnet, and Zeitouni [8] and Pastur and
Shcherbina [113] contain extensive material starting from the basics. Tao’s
book [128] provides a different aspect to this subject and is self-contained as
a graduate textbook. Forrester’s monograph [73] is a handbook for any explicit
formulas related to random matrices. Finally, [6] is an excellent comprehen-
sive overview of diverse applications of random matrix theory in mathematics,
physics, neural networks, and engineering.
Finally, we list a few notational conventions. In order to focus on the es-
sentials, we will not follow the dependencies of various constants on different
parameters. In particular, we will use the generic letters 𝐶 and 𝑐 to denote
positive constants, whose values may change from line to line and which may
depend on some fixed basic parameters of the model. For two positive quanti-
ties 𝐴 and 𝐵, we will write 𝐴 ≲ 𝐵 to indicate that there exists a constant 𝐶 such
that 𝐴 ≤ 𝐶𝐵. If 𝐴 and 𝐵 are comparable in the sense that 𝐴 ≲ 𝐵 and 𝐵 ≲ 𝐴,
then we write 𝐴 ≍ 𝐵. We introduce the notation J𝐴, 𝐵K ∶= ℤ ∩ [𝐴, 𝐵] for the
set of integers between any two real numbers 𝐴 < 𝐵.
CHAPTER 2
Wigner Matrices and Their Generalizations
Wigner matrices, we assume that the rescaled matrix elements $s_{ij}^{-1/2} h_{ij}$ with unit variance all have law 𝜈.
We will need some decay property of 𝜈 and we will consider two decay types.
Either we assume that 𝜈 has subexponential decay, i.e., that there are constants
𝐶, 𝜗 > 0 such that for any 𝑠 > 0
CHAPTER 3
Eigenvalue Density
(3.1) $\displaystyle \lim_{N\to\infty} \frac{1}{N}\,\#\{i : \lambda_i \in [a,b]\} = \int_a^b \varrho_{\mathrm{sc}}(x)\,\mathrm{d}x, \qquad \varrho_{\mathrm{sc}}(x) := \frac{1}{2\pi}\sqrt{(4-x^2)_+},$
where (𝑎)+ ∶= max{𝑎, 0} denotes the positive part of the number 𝑎. Note the
emergence of the universal density, the semicircle law, that is independent of
the details of the distribution of the matrix elements. The semicircle law holds
beyond Wigner matrices: it characterizes the eigenvalue density of the univer-
sal Wigner matrices (see Definition 2.1).
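The semicircle law is easy to observe numerically. The following sketch (not from the book; a minimal illustration assuming NumPy and Matplotlib are available) samples a single GOE-normalized Wigner matrix and compares the eigenvalue histogram with $\varrho_{\mathrm{sc}}$. Already for $N$ of a few thousand a single sample follows the semicircle closely.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 2000
rng = np.random.default_rng(0)

# Real symmetric Wigner matrix: off-diagonal entries with mean 0 and variance 1/N
# (the diagonal has variance 2/N here, which does not affect the limiting density).
A = rng.normal(size=(N, N)) / np.sqrt(N)
H = (A + A.T) / np.sqrt(2)

eigs = np.linalg.eigvalsh(H)

x = np.linspace(-2.5, 2.5, 400)
rho_sc = np.sqrt(np.clip(4 - x**2, 0, None)) / (2 * np.pi)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="empirical eigenvalue density")
plt.plot(x, rho_sc, "k", label="semicircle law")
plt.legend()
plt.show()
```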
For the random covariance matrices (2.9), the empirical density of eigen-
values 𝜆𝑖 of 𝐻 converges to the Marchenko-Pastur law [104] in the limit when
$M, N \to \infty$ such that $d = N/M$ is fixed, $0 < d \le 1$:
(3.2) $\displaystyle \lim_{N\to\infty} \frac{1}{N}\,\#\{i : \lambda_i \in [a,b]\} = \int_a^b \varrho_{\mathrm{MP}}(x)\,\mathrm{d}x,$
with $\lambda_\pm := (1 \pm \sqrt{d})^2$ being the spectral edges. Note that in the case $M \le N$ the matrix $H$ has macroscopically many zero eigenvalues; apart from these, the spectra of $XX^*$ and $X^*X$ coincide, so the Marchenko-Pastur law can be applied to all nonzero eigenvalues with the roles of $M$ and $N$ exchanged.
Since the expectation of each matrix element vanishes, each factor ℎ𝑥𝑦
must be paired with at least another copy ℎ𝑦𝑥 = ℎ𝑥𝑦 ; otherwise the expecta-
tion value is zero. In general, there is no need that the sequence form a perfect
pairing. If we identify ℎ𝑦𝑥 with ℎ𝑥𝑦 , the only restriction is that the matrix ele-
ment ℎ𝑥𝑦 should appear at least twice in the sequence. Under this condition,
the main contribution comes from the index sequences that satisfy the perfect
pairing condition such that each ℎ𝑥𝑦 is paired with a unique ℎ𝑦𝑥 in (3.3). These
sequences can be classified according to their complexity, and it turns out that
the main contribution comes from the so-called backtracking sequences.
An index sequence 𝑖1 𝑖2 𝑖3 ⋯ 𝑖2𝑘 𝑖1 returning to the original index 𝑖1 is called
backtracking if it can be successively generated by a substitution rule
(3.4) 𝑎 → 𝑎𝑏𝑎, 𝑏 ∈ {1, … , 𝑁}, 𝑏 ≠ 𝑎,
with an arbitrary index 𝑏. For example, we represent the term
(3.5) ℎ𝑖1 𝑖2 ℎ𝑖2 𝑖3 ℎ𝑖3 𝑖2 ℎ𝑖2 𝑖4 ℎ𝑖4 𝑖5 ℎ𝑖5 𝑖4 ℎ𝑖4 𝑖2 ℎ𝑖2 𝑖1 , 𝑖1 , … , 𝑖5 ∈ {1, … , 𝑁},
in the expansion of Tr 𝐻 8 (𝑘 = 4) by a walk of length 2 × 4 on the set {1, … , 𝑁}.
This path is generated by the operation (3.4) in the following order:
𝑖1 → 𝑖1 𝑖2 𝑖1 → 𝑖1 𝑖2 𝑖3 𝑖2 𝑖1 → 𝑖1 𝑖2 𝑖3 𝑖2 𝑖4 𝑖2 𝑖1 → 𝑖1 𝑖2 𝑖3 𝑖2 𝑖4 𝑖5 𝑖4 𝑖2 𝑖1
with 𝑖1 ≠ 𝑖2 , 𝑖2 ≠ 𝑖3 , 𝑖3 ≠ 𝑖4 , 𝑖4 ≠ 𝑖5 . It may happen that two nonsuccessive
labels coincide (e.g., 𝑖1 = 𝑖3 ), but the contribution of such terms is by a factor
1/𝑁 less than the leading term so we can neglect them. Thus, we assume that all
labels 𝑖1 , 𝑖2 , 𝑖3 , 𝑖4 , 𝑖5 are distinct. We may bookkeep these paths by two objects:
(a) a graph on vertices labeled by 1, 2, 3, 4, 5 (i.e., by the indices of the
labels 𝑖𝑗 ) and with edges defined by walking over the vertices in fol-
lowing order 1, 2, 3, 2, 4, 5, 4, 2, 1, i.e., the order of the indices of the
labels in (3.5)
(b) an assignment of a distinct label 𝑖𝑗 , 𝑗 = 1, 2, 3, 4, 5, to each vertex.
It is easy to see that the graph generated by backtracking sequences is a
(double-edged) tree on the vertices 1, 2, 3, 4, 5. Now we count the combinatorics
of the objects (a) and (b) separately. We start with (a).
Lemma 3.1. The number of graphs with $k$ double-edges, derived from backtracking sequences, is explicitly given by the Catalan numbers, $C_k := \frac{1}{k+1}\binom{2k}{k}$.
holds for the number 𝐶𝑛 of the backtracking sequences of total length 2𝑛. Ele-
mentary combinatorics shows that the solution to this recursion is given by the
Catalan numbers, which proves the lemma. We remark that 𝐶𝑛 has many other
definitions. Alternatively, it is also the number of planar binary trees with 𝑛 + 1
vertices and one could also use this definition to verify the lemma. □
Note that the number of independent labels, $N^{k+1}$, after dividing by $N$, exactly cancels the size of the $k$-fold product of variances, $(\mathbb{E}|h|^2)^k = N^{-k}$.
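The moment computation can be checked numerically. The sketch below (an illustration, not from the book; assumes NumPy) estimates $\mathbb{E}\bigl[N^{-1}\operatorname{Tr}H^{2k}\bigr]$ by Monte Carlo and compares it with the Catalan numbers $C_k$; the agreement improves as $N$ grows.

```python
import numpy as np
from math import comb

def wigner(N, rng):
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    return (A + A.T) / np.sqrt(2)

N, samples = 400, 50
rng = np.random.default_rng(1)

moments = np.zeros(5)                        # estimates of E[(1/N) Tr H^{2k}], k = 1..5
for _ in range(samples):
    eigs = np.linalg.eigvalsh(wigner(N, rng))
    for k in range(1, 6):
        moments[k - 1] += np.mean(eigs ** (2 * k)) / samples

for k in range(1, 6):
    catalan = comb(2 * k, k) // (k + 1)      # C_k = (1/(k+1)) * binom(2k, k)
    print(f"k={k}: empirical moment {moments[k-1]:.3f}, Catalan number C_k = {catalan}")
```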
The expectation of the traces of odd powers of 𝐻 is negligible since they can
never satisfy the pairing condition. One can easily check that
i.e., the semicircle law is identified as the probability measure on ℝ whose even
moments are the Catalan numbers and the odd moments vanish. This proves
that
(3.8) $\displaystyle \frac{1}{N}\,\mathbb{E}\,\operatorname{Tr} P(H) \to \int_{\mathbb{R}} P(x)\,\varrho_{\mathrm{sc}}(x)\,\mathrm{d}x$
The resolvent method studies the Stieltjes transform of the empirical eigenvalue distribution, $m_N(z) := \frac{1}{N}\operatorname{Tr}\frac{1}{H-z}$, defined for any $z = E + i\eta$, $E \in \mathbb{R}$, $\eta > 0$; notice that $m_N$ is simply the normalized trace of the resolvent of the random matrix $H$ with spectral parameter $z$. The real
part 𝐸 = Re 𝑧 will often be referred to as the “energy,” alluding to the quantum
mechanical interpretation of the spectrum of 𝐻. An important property of the
Stieltjes transform of any measure on ℝ is that its imaginary part is positive
whenever Im 𝑧 > 0.
which, after some calculus, can be identified as the Laurent series of $\frac{1}{2}\bigl(-z + \sqrt{z^2-4}\bigr)$. The approximation becomes exact in the $N \to \infty$ limit. Although
the expansion (3.10) is valid only for large 𝑧, given that the limit is an analytic
function of 𝑧 one can extend the relation
(3.12) $\displaystyle \lim_{N\to\infty} \mathbb{E}\, m_N(z) = \frac{1}{2}\bigl(-z + \sqrt{z^2-4}\bigr)$
by analytic continuation to the whole upper half-plane 𝑧 = 𝐸+𝑖𝜂, 𝜂 > 0. It is an
easy exercise to see that this is exactly the Stieltjes transform of the semicircle
density, i.e.,
(3.13) $\displaystyle m_{\mathrm{sc}}(z) := \frac{1}{2}\bigl(-z + \sqrt{z^2-4}\bigr) = \int_{\mathbb{R}} \frac{\varrho_{\mathrm{sc}}(x)\,\mathrm{d}x}{x-z}.$
The square root function is chosen with a branch cut in the segment $[-2,2]$ so that $\sqrt{z^2-4} \asymp z$ at infinity. This guarantees that $\operatorname{Im} m_{\mathrm{sc}}(z) > 0$ for $\operatorname{Im} z > 0$. Since the Stieltjes transform identifies the measure uniquely, and pointwise
convergence of Stieltjes transforms implies weak convergence of measures, we
obtain
(3.14) 𝔼 𝜚𝑁 (d𝑥) ⇀ 𝜚sc (𝑥)d𝑥.
The relation (3.12) actually holds with high probability; i.e., for any 𝑧 with
Im 𝑧 > 0
(3.15) $\displaystyle \lim_{N\to\infty} m_N(z) = \frac{1}{2}\bigl(-z + \sqrt{z^2-4}\bigr)$
in probability. In the next sections we will prove this limit with an effective
error term via the resolvent method.
The semicircle law can be identified in many different ways. The moment
method in Section 3.2 utilized the fact that the moments of the semicircle den-
sity are given by the Catalan numbers (3.7), which also emerged as the normal-
ized traces of powers of 𝐻; see (3.6). The resolvent method relies on the fact that
$m_N$ approximately satisfies a self-consistent equation, $m_N \approx -(z+m_N)^{-1}$, that is very close to the quadratic equation that $m_{\mathrm{sc}}$ from (3.13) satisfies:
$$ m_{\mathrm{sc}}(z) = -\frac{1}{z + m_{\mathrm{sc}}(z)}. $$
In other words, in the resolvent method the semicircle density emerges via a
specific relation for its Stieltjes transform. It turns out that this approach allows
us to perform a much more precise analysis, especially in the short-scale regime
where $\operatorname{Im} z$ approaches 0 as a function of $N$. Since the Stieltjes transform of
a measure at spectral parameter 𝑧 = 𝐸 + 𝑖𝜂 essentially identifies the measure
around 𝐸 on scale 𝜂 > 0, a precise understanding of 𝑚𝑁 (𝑧) for small Im 𝑧 will
yield a local version of the semicircle law. This will be explained in Chapter 6.
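The self-consistent equation can also be explored numerically. The sketch below (an illustration, not from the book; assumes NumPy; the chosen scale $\eta = N^{-1/2}$ is just an example) compares $m_N(z)$ with the closed-form $m_{\mathrm{sc}}(z)$ and with the fixed point of the iteration $m \mapsto -(z+m)^{-1}$. The three values should agree up to an error of order $(N\eta)^{-1}$.

```python
import numpy as np

N = 2000
rng = np.random.default_rng(2)
A = rng.normal(size=(N, N)) / np.sqrt(N)
H = (A + A.T) / np.sqrt(2)
eigs = np.linalg.eigvalsh(H)

def m_sc(z):
    # Branch chosen so that Im m_sc > 0 in the upper half-plane:
    # sqrt(z - 2) * sqrt(z + 2) has its branch cut on the segment [-2, 2].
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

z = 0.3 + 1j * N ** -0.5                      # energy E = 0.3, scale eta = N^{-1/2}
m_N = np.mean(1.0 / (eigs - z))               # empirical Stieltjes transform

m = -1.0 / z                                  # fixed-point iteration of m = -1/(z + m)
for _ in range(2000):
    m = -1.0 / (z + m)

print("m_N(z)              =", m_N)
print("m_sc(z) closed form =", m_sc(z))
print("fixed point         =", m)
```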
CHAPTER 4
Invariant Ensembles
which is of the form (4.1) with a traditional extra factor 𝛽/2 that makes some
later formulas nicer. The parameter 𝛽 is determined by the symmetry type:
𝛽 = 1 for real symmetric ensembles and 𝛽 = 2 for complex Hermitian ensem-
bles.
The joint (symmetrized) probability density of the eigenvalues of 𝐻 can be
computed explicitly:
(4.3) $\displaystyle p_N(\lambda_1,\dots,\lambda_N) = \mathrm{const}\cdot\prod_{i<j}(\lambda_i-\lambda_j)^{\beta}\, e^{-\frac{\beta}{2}N\sum_{j=1}^{N} V(\lambda_j)}.$
In particular, for the Gaussian case $V(\lambda) = \frac{1}{2}\lambda^2$ is quadratic and thus the joint distribution of the GOE ($\beta=1$) and GUE ($\beta=2$) eigenvalues is given by
(4.4) $\displaystyle p_N(\lambda_1,\dots,\lambda_N) = \mathrm{const}\cdot\prod_{i<j}(\lambda_i-\lambda_j)^{\beta}\, e^{-\frac{1}{4}\beta N\sum_{j=1}^{N} \lambda_j^2}.$
In particular, the eigenvalues are strongly correlated. (In this section we neglect
the ordering of the eigenvalues, and we will consider symmetrized statistics.)
The emergence of the Vandermonde determinant in (4.3) is a result of inte-
grating out the “angle” variables in (4.2), i.e., the unitary matrix in the diago-
nalization of 𝐻 = 𝑈Λ𝑈 ∗ . For illustration, we now show this formula for a 2 × 2
matrix. Consider first the real case. By diagonalization, any real symmetric 2×2
matrix can be written in the form
(4.5) $\displaystyle H = \begin{pmatrix} x & z \\ z & y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, \qquad x, y, z \in \mathbb{R}.$
Direct calculation shows that the Jacobian of the coordinate transformation
from (𝑥, 𝑦, 𝑧) to (𝜆1 , 𝜆2 , 𝜃) is
(4.6) (𝜆1 − 𝜆2 ).
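This Jacobian is easy to verify symbolically. The sketch below (an illustration, not from the book; assumes SymPy) differentiates the parametrization (4.5) with respect to $(\lambda_1, \lambda_2, \theta)$ and recovers (4.6).

```python
import sympy as sp

l1, l2, th = sp.symbols("lambda1 lambda2 theta", real=True)

R = sp.Matrix([[sp.cos(th), -sp.sin(th)],
               [sp.sin(th),  sp.cos(th)]])
H = R * sp.diag(l1, l2) * R.T

# Independent entries of the symmetric matrix: x = H[0,0], y = H[1,1], z = H[0,1].
funcs = (H[0, 0], H[1, 1], H[0, 1])
J = sp.Matrix([[sp.diff(f, v) for v in (l1, l2, th)] for f in funcs])

print(sp.simplify(J.det()))   # prints lambda1 - lambda2
```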
The complex case is slightly more complicated. We can write with 𝑧 = 𝑢+𝑖𝑣
and 𝑥, 𝑦 ∈ ℝ
(4.7) $\displaystyle H = \begin{pmatrix} x & z \\ \bar{z} & y \end{pmatrix} = e^{iA} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} e^{-iA},$
where $A$ is a Hermitian matrix with trace zero; thus, it can be written in the form
(4.8) $\displaystyle A = \begin{pmatrix} a & b+ic \\ b-ic & -a \end{pmatrix}, \qquad a, b, c \in \mathbb{R}.$
This parametrization of 𝑆𝑈(2) with three real degrees of freedom is standard,
but for our purpose we only need two in (4.7) in addition to the two degrees of
freedom from the 𝜆’s. The reason is that the two phases of the eigenvectors are
redundant and the trace zero condition only takes out one degree of freedom,
leaving one more superfluous parameter. We will see that eventually 𝑎 plays no
role. First, we evaluate the Jacobian at 𝐴 = 0 from the formula (4.7). We only
need to keep the leading order in 𝐴 which gives
(4.9) $\displaystyle \begin{pmatrix} x & z \\ \bar{z} & y \end{pmatrix} = \Lambda + i[A,\Lambda] + O(\|A\|^2), \qquad \Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$
Thus
(4.10) $\displaystyle \begin{pmatrix} x & z \\ \bar{z} & y \end{pmatrix} = \begin{pmatrix} \lambda_1 & i(b+ic)(\lambda_2-\lambda_1) \\ -i(b-ic)(\lambda_2-\lambda_1) & \lambda_2 \end{pmatrix} + O(\|A\|^2),$
and the Jacobian of the transformation from (𝑥, 𝑦, 𝑧) to (𝜆1 , 𝜆2 , 𝑏, 𝑐) at 𝑏 = 𝑐 = 0
is of the form
(4.11) $\displaystyle C(\lambda_2-\lambda_1)^2 \det\!\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & i & -i \\ 0 & 0 & -1 & -1 \end{pmatrix} = C(\lambda_1-\lambda_2)^2$
with some constant 𝐶. To compute the Jacobian not at the identity, we first no-
tice that by rotation invariance the measure factorizes. This means that its den-
sity with respect to the Lebesgue measure can be written of the form 𝑓(Λ)𝑔(𝑈)
with some functions 𝑓 and 𝑔; in fact, 𝑔(𝑈) is constant (the marginal on the uni-
tary part is the Haar measure). The function 𝑓 may be computed at any point; in
particular, at 𝑈 = 𝐼 this was the calculation (4.11) yielding 𝑓(Λ) = 𝐶(𝜆1 − 𝜆2 )2 .
This proves (4.4) for 𝑁 = 2 modulo the case of multiple eigenvalues 𝜆1 = 𝜆2
where the parametrization of 𝑈 is even more redundant. But this set has zero
measure, so it is negligible (see [8] for a precise argument). The formula (4.9) is
the basis for the proof for general 𝑁, which we leave to the readers. The detailed
proof can be found in [8] or [37].
It is often useful to think of the measure (4.4) as a Gibbs measure on 𝑁
“particles” or “points” 𝝀 = (𝜆1 , … , 𝜆𝑁 ) of the form
(4.12) $\displaystyle \mu_N(\mathrm{d}\boldsymbol\lambda) = p_N(\boldsymbol\lambda)\,\mathrm{d}\boldsymbol\lambda = \frac{e^{-\beta N \mathcal{H}(\boldsymbol\lambda)}}{Z}, \qquad \mathcal{H}(\boldsymbol\lambda) := \frac{1}{2}\sum_{i=1}^{N} V(\lambda_i) - \frac{1}{N}\sum_{i<j} \log|\lambda_j - \lambda_i|$
with the confining potential 𝑉(𝜆) and logarithmic interaction. (This connec-
tion was exploited first in [46]). We adopt the standard convention in random
matrix theory that the Hamiltonian ℋ expresses energy per particle, in contrast
to the standard statistical physics terminology where the “Hamiltonian” refers
to the total energy. This explains the unusual 𝑁 factor in the exponent. Notice
the emergence of the Vandermonde determinant in (4.3), which comes directly from integrating out the Haar measure; the symmetry type of the ensemble appears through the exponent $\beta$. Only the "classical" cases $\beta = 1$, $2$, or $4$ correspond to matrix ensembles of the form (4.2), namely, to the real symmetric, complex Hermitian, and quaternion self-dual matrices. We will not give the
precise definition of the latter (see, e.g., chapter 7 of [106] or [64]), and just men-
tion that this is the natural generalization of symmetric or Hermitian matrices
to quaternion entries and they have real eigenvalues.
We remark that despite the convenience of the explicit formula (4.3) or
(4.12) for the joint density, computing various statistics, such as correlation
functions or even the density of a single eigenvalue, is a highly nontrivial task.
For example, the density involves “only” integrating out all but one eigenvalue,
but the measure is far from being a product, so these integrations cannot be
performed directly when 𝑁 is large. The measure (4.12) has a strong and long-
range interaction, while conventional methods of statistical physics are well
suited for short-range interactions. In fact, from this point of view 𝛽 can be any
positive number and does not have to be restricted to the specific values 𝛽 = 1,
2, or 4. For other values of 𝛽 there is no invariant matrix ensemble behind the
measure (4.12), but it is still a very interesting statistical mechanical system,
called the log-gas or 𝛽-ensemble. If the potential 𝑉 is quadratic, then (4.12)
coincides with (4.4) and it is called the Gaussian 𝛽-ensemble. We will briefly
discuss log-gases in Section 18.3.
(4.13) 𝑥 = √𝑁𝜆,
which effectively removes the factor 𝑁 from the exponent in (4.3). (This simple
scaling works only in the pure Gaussian case, and it is only a technical conve-
nience to simplify formulas.)
After the rescaling and setting 𝛽 = 2, the measure we will consider is given
by a density which we denote by
(4.14) $\displaystyle \hat{p}_N(x_1,\dots,x_N) = \mathrm{const}\cdot\prod_{i<j}(x_i-x_j)^2 \prod_{j=1}^{N} e^{-\frac{1}{2}x_j^2}.$
Let $P_k(x)$ be the $k$th orthogonal polynomial on $\mathbb{R}$ with respect to the weight function $e^{-x^2/2}$ with leading coefficient 1. Let
$$ \psi_k(x) := \frac{e^{-x^2/4}\, P_k(x)}{\|e^{-x^2/4}\, P_k\|_{L^2(\mathbb{R})}} $$
be the corresponding orthonormal function, i.e.,
where the expectation is w.r.t. the probability density 𝑝𝑁 or, in this case, w.r.t.
the original random matrix ensemble. A similar formula holds for observables
of any number of variables.
To compute the correlation functions of a determinantal joint density (4.19),
we start with the following prototype calculation for 𝑁 = 3, 𝑛 = 2:
(4.22) $\displaystyle \int_{\mathbb{R}} \mathrm{d}x_3 \det\!\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) & K_3(x_1,x_3) \\ K_3(x_2,x_1) & K_3(x_2,x_2) & K_3(x_2,x_3) \\ K_3(x_3,x_1) & K_3(x_3,x_2) & K_3(x_3,x_3) \end{bmatrix} = \int_{\mathbb{R}} \mathrm{d}x_3 \det\!\begin{bmatrix} K_3(x_2,x_1) & K_3(x_2,x_2) \\ K_3(x_3,x_1) & K_3(x_3,x_2) \end{bmatrix} K_3(x_1,x_3) - \int_{\mathbb{R}} \mathrm{d}x_3 \det\!\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_3,x_1) & K_3(x_3,x_2) \end{bmatrix} K_3(x_2,x_3) + \int_{\mathbb{R}} \mathrm{d}x_3 \det\!\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix} K_3(x_3,x_3).$
From the definition (4.18) and the orthonormality of the 𝜓’s we have the repro-
ducing property
(4.24) $\displaystyle \int_{\mathbb{R}} \mathrm{d}x\, K_N(x,x) = N.$
Thus (4.22) equals
(4.25) $\displaystyle \det\!\begin{bmatrix} K_3(x_2,x_1) & K_3(x_2,x_2) \\ K_3(x_1,x_1) & K_3(x_1,x_2) \end{bmatrix} - \det\!\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix} + 3\det\!\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix} = \det\!\begin{bmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{bmatrix}.$
It is easy to generalize this computation to get
(4.26) $\displaystyle \hat{p}^{(n)}_N(x_1,\dots,x_n) = \frac{(N-n)!}{N!}\,\det\bigl[K_N(x_i,x_j)\bigr]_{i,j=1}^{n};$
i.e., the correlation functions continue to have a determinantal structure. Here the constant is obtained by the normalization condition that $\hat{p}^{(n)}_N$ is a probability density. Thus, we have an explicit formula for the correlation functions
in terms of the kernel 𝐾𝑁 . We note that this structure is very general and is
not restricted to Hermite polynomials; it only requires a system of orthogonal
polynomials.
To understand the behavior of 𝐾𝑁 , first we recall a basic algebraic property
of the orthogonal polynomials, the Christoffel–Darboux formula:
(4.27) $\displaystyle K_N(x,y) = \sum_{j=0}^{N-1} \psi_j(x)\psi_j(y) = \sqrt{N}\left[\frac{\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)}{x-y}\right].$
Since the Hermite polynomials and the orthonormal functions 𝜓𝑁 differ only
by an exponential factor (4.16), and these factors in 𝜓(𝑥)𝜓(𝑦) on both sides of
(4.27) are canceled, (4.27) is just a property of the Hermite polynomials.
We now sketch a proof of this identity. Multiplying both sides by (𝑥 − 𝑦),
we need to prove that
(4.28) $\displaystyle \sum_{j=0}^{N-1} \psi_j(x)\psi_j(y)\,(x-y) = \sqrt{N}\bigl[\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)\bigr].$
that directly follows from the “three-term” relation for the Hermite polynomi-
als. Collecting all the terms generated in this way, we obtain the right-hand
side of (4.28). Details can be found in lemma 3.2.7 of [8].
4.2.1. Bulk Universality: the Sine-Kernel. It is well-known that or-
thogonal polynomials of high degree have asymptotic behavior (Plancherel-
Rotach asymptotics). For the Hermite orthonormal functions 𝜓 these formu-
las read as follows:
(4.30) $\displaystyle \psi_{2m}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\, \cos\bigl(\sqrt{N}\,x\bigr) + o(N^{-1/4}),$
(4.31) $\displaystyle \psi_{2m+1}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\, \sin\bigl(\sqrt{N}\,x\bigr) + o(N^{-1/4}),$
as 𝑁 → ∞ for any 𝑚 such that |2𝑚 − 𝑁| ≤ 𝐶. The approximation is uniform
for |𝑥| ≤ 𝐶𝑁 −1/2 . We can thus compute that
(4.32) $\displaystyle K_N(x,y) \approx \frac{1}{\pi}\,\frac{\sin(\sqrt{N}x)\cos(\sqrt{N}y) - \sin(\sqrt{N}y)\cos(\sqrt{N}x)}{x-y} = \frac{\sin\bigl(\sqrt{N}(x-y)\bigr)}{\pi(x-y)};$
i.e., the celebrated sine kernel emerged [47, 105].
To rewrite this formula into a canonical form, recall that we have done a rescaling (4.13) where $\lambda$ is the original variable. The two-point function in the original variables was denoted by $p^{(2)}_N$ and the rescaled variables by $\hat{p}^{(2)}_N$. The relation between $p^{(2)}_N$ and $\hat{p}^{(2)}_N$ is determined by
(4.33) $\displaystyle p^{(2)}_N(\lambda_1,\lambda_2)\,\mathrm{d}\lambda_1\,\mathrm{d}\lambda_2 = \hat{p}^{(2)}_N(x_1,x_2)\,\mathrm{d}x_1\,\mathrm{d}x_2,$
and thus we have
(4.34) $\displaystyle p^{(2)}_N(\lambda_1,\lambda_2) = N\,\hat{p}^{(2)}_N\bigl(\sqrt{N}\lambda_1,\ \sqrt{N}\lambda_2\bigr).$
Now we introduce another rescaling of the eigenvalues that rescales the
typical gap between them to order 1. We set
(4.35) $\displaystyle \lambda_j = \frac{a_j}{\varrho_{\mathrm{sc}}(0)\,N}, \qquad \varrho_{\mathrm{sc}}(0) = \frac{1}{\pi},$
where 𝜚sc is the semicircle density; see (3.1). Using the expression (4.20) for
correlation functions, we have, in terms of the original variable, that
(4.36) $\displaystyle \frac{1}{[\varrho_{\mathrm{sc}}(0)]^2}\, p^{(2)}_N\!\left(\frac{a_1}{\varrho_{\mathrm{sc}}(0)N},\ \frac{a_2}{\varrho_{\mathrm{sc}}(0)N}\right) = \det\begin{pmatrix} \tilde K^{11}_N & \tilde K^{12}_N \\ \tilde K^{21}_N & \tilde K^{22}_N \end{pmatrix}$
where
(4.37) $\displaystyle \tilde K^{12}_N := \frac{1}{\varrho_{\mathrm{sc}}(0)\sqrt{N-1}}\, K_N\!\left(\frac{a_1}{\varrho_{\mathrm{sc}}(0)\sqrt{N}},\ \frac{a_2}{\varrho_{\mathrm{sc}}(0)\sqrt{N}}\right) \rightharpoonup S(a_1 - a_2), \qquad S(x) := \frac{\sin \pi x}{\pi x},$
where we used (4.32). Due to the rescaling, this calculation reveals the corre-
lation functions around 𝐸 = 0. The general formula for any fixed energy 𝐸 in
the bulk, i.e., |𝐸| < 2, can be obtained similarly, and it is given by
(4.38) $\displaystyle \frac{1}{[\varrho_{\mathrm{sc}}(E)]^n}\, p^{(n)}_N\!\left(E + \frac{\alpha_1}{N\varrho_{\mathrm{sc}}(E)},\ E + \frac{\alpha_2}{N\varrho_{\mathrm{sc}}(E)},\ \dots,\ E + \frac{\alpha_n}{N\varrho_{\mathrm{sc}}(E)}\right) \rightharpoonup q^{(n)}_{\mathrm{GUE}}(\boldsymbol\alpha) := \det\bigl(S(\alpha_i - \alpha_j)\bigr)^{n}_{i,j=1}$
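The convergence of the rescaled Christoffel–Darboux kernel to the sine kernel can already be seen numerically at moderate $N$. The sketch below (an illustration, not from the book; assumes NumPy) evaluates the orthonormal functions $\psi_k$ for the weight $e^{-x^2/2}$ via their three-term recurrence, forms $K_N$ as in (4.27), and compares it with $\sin(\sqrt{N}(x-y))/(\pi(x-y))$ from (4.32) for $|x|, |y| \lesssim N^{-1/2}$.

```python
import numpy as np

def psi(N, x):
    """Orthonormal functions psi_0, ..., psi_{N-1} for the weight e^{-x^2/2},
    evaluated at the points x via the stable recurrence
    psi_{k+1} = (x*psi_k - sqrt(k)*psi_{k-1}) / sqrt(k+1)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros((N, x.size))
    out[0] = np.exp(-x ** 2 / 4) / (2 * np.pi) ** 0.25
    if N > 1:
        out[1] = x * out[0]                   # He_1(x) = x
    for k in range(1, N - 1):
        out[k + 1] = (x * out[k] - np.sqrt(k) * out[k - 1]) / np.sqrt(k + 1)
    return out

N = 200
pts = np.linspace(-2 / np.sqrt(N), 2 / np.sqrt(N), 5)
P = psi(N, pts)
K = P.T @ P                                   # K_N(x_i, x_j) = sum_k psi_k(x_i) psi_k(x_j)

for i in range(len(pts)):
    for j in range(i + 1, len(pts)):
        d = pts[i] - pts[j]
        print(f"K_N = {K[i, j]:8.3f}   sine kernel = {np.sin(np.sqrt(N) * d) / (np.pi * d):8.3f}")
```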
4.2.2. Edge Universality: the Airy Kernel. Near the spectral edges, i.e.,
for energy 𝐸 = ±2, a different scaling has to be used. Recall the formula
$$ K_N(x,y) = \sqrt{N}\left[\frac{\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)}{x-y}\right] $$
in the rescaled variables $x, y$. We will need the following identity of the derivatives of the Hermite functions:
(4.40) $\displaystyle \psi_N'(x) = -\frac{x}{2}\,\psi_N(x) + \sqrt{N}\,\psi_{N-1}(x).$
Thus, we can rewrite
(4.41) $\displaystyle K_N(x,y) = \left[\frac{\psi_N(x)\psi_N'(y) - \psi_N(y)\psi_N'(x)}{x-y} - \frac{1}{2}\,\psi_N(x)\psi_N(y)\right].$
(4.42) $\displaystyle \Psi_N(u) := N^{1/12}\,\psi_N\!\left(2\sqrt{N} + \frac{u}{N^{1/6}}\right).$
It is well-known that the Airy function is the solution to the second-order dif-
ferential equation 𝑦 ″ − 𝑥𝑦 = 0 with vanishing boundary condition at 𝑥 = ∞.
We now define the Airy kernel by
$$ A(u,v) := \frac{\mathrm{Ai}(u)\,\mathrm{Ai}'(v) - \mathrm{Ai}'(u)\,\mathrm{Ai}(v)}{u-v}. $$
Under the edge scaling (4.42), we have
(4.44) $\displaystyle N^{-1/6}\, K_N\!\left(2\sqrt{N} + \frac{u}{N^{1/6}},\ 2\sqrt{N} + \frac{v}{N^{1/6}}\right) \to A(u,v).$
$$ p^{(2)}_N\!\left(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}}\right) = N\,\hat{p}^{(2)}_N\!\left(2\sqrt{N} + \frac{\alpha_1}{N^{1/6}},\ 2\sqrt{N} + \frac{\alpha_2}{N^{1/6}}\right). $$
$$ p^{(2)}_N\!\left(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}}\right) = N\,\frac{1}{N(N-1)}\,\det\!\left[K_N\!\left(2\sqrt{N} + \frac{\alpha_i}{N^{1/6}},\ 2\sqrt{N} + \frac{\alpha_j}{N^{1/6}}\right)\right]_{i,j=1}^{2} \asymp N^{-2/3}\,\det\!\left[N^{-1/6} K_N\!\left(2\sqrt{N} + \frac{\alpha_i}{N^{1/6}},\ 2\sqrt{N} + \frac{\alpha_j}{N^{1/6}}\right)\right]_{i,j=1}^{2}, $$
and similar formulas hold for any 𝑘-point correlation functions. Using the lim-
iting statement (4.44), in terms of the original variables, we obtain
(4.45) $\displaystyle N^{k/3}\, p^{(k)}_N\!\left(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}},\ \dots,\ 2 + \frac{\alpha_k}{N^{2/3}}\right) \rightharpoonup \det\bigl(A(\alpha_i,\alpha_j)\bigr)^{k}_{i,j=1}$
in a weak sense. In particular, the last formula with 𝑘 = 2 implies, for any
smooth test function 𝑂 with compact support, that
(4.46) $\displaystyle \sum_{j\neq k} \mathbb{E}\, O\bigl(N^{2/3}(\lambda_j - 2),\ N^{2/3}(\lambda_k - 2)\bigr) = N(N-1)\,N^{-4/3} \int_{\mathbb{R}^2} \mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\, O(\alpha_1,\alpha_2)\, p^{(2)}_N\!\left(2 + \frac{\alpha_1}{N^{2/3}},\ 2 + \frac{\alpha_2}{N^{2/3}}\right)$
(5.1) $\displaystyle \lim_{N\to\infty} \frac{1}{\varrho(E)^n} \int_{\mathbb{R}^n} \mathrm{d}\boldsymbol\alpha\, F(\boldsymbol\alpha)\, p^{(n)}_N\!\left(E + \frac{\boldsymbol\alpha}{N\varrho(E)}\right) = \int_{\mathbb{R}^n} \mathrm{d}\boldsymbol\alpha\, F(\boldsymbol\alpha)\, q^{(n)}_{\mathrm{GOE}}(\boldsymbol\alpha)$
where $\boldsymbol\alpha = (\alpha_1,\dots,\alpha_n)$. Here $p^{(n)}_N$ is the $n$-point function of the matrix ensemble and $q^{(n)}_{\mathrm{GOE}}$ is the limiting $n$-point function of the GOE defined in (4.39). To shorten the argument of $p^{(n)}_N$, we used the convention that $E + \boldsymbol\alpha = (E+\alpha_1,\dots,E+\alpha_n)$ for any $\boldsymbol\alpha = (\alpha_1,\dots,\alpha_n) \in \mathbb{R}^n$.
(ii) Averaged energy universality (in the bulk, on scale 𝑁 −1+𝜉 ): For any
𝑛 ≥ 1, 𝐹 ∶ ℝ𝑛 → ℝ a smooth and compactly supported function,
and for some 0 < 𝜉 < 1 and for any 𝜅 > 0, we have, uniformly in
𝐸 ∈ [−2 + 𝜅, 2 − 𝜅],
(5.2) $\displaystyle \lim_{N\to\infty} \frac{1}{\varrho(E)^n} \int_{E-b}^{E+b} \frac{\mathrm{d}x}{2b} \int_{\mathbb{R}^n} \mathrm{d}\boldsymbol\alpha\, F(\boldsymbol\alpha)\, p^{(n)}_N\!\left(x + \frac{\boldsymbol\alpha}{N\varrho(E)}\right) = \int_{\mathbb{R}^n} \mathrm{d}\boldsymbol\alpha\, F(\boldsymbol\alpha)\, q^{(n)}_{\mathrm{GOE}}(\boldsymbol\alpha)$
where $b = b_N := N^{-1+\xi}$.
(iii) Fixed gap universality (in the bulk): Fix any positive number 0 < 𝛼 < 1
and an integer 𝑛. For any smooth compactly supported function 𝐺 ∶
ℝ𝑛 → ℝ and for any 𝑘, 𝑚 ∈ J𝛼𝑁, (1 − 𝛼)𝑁K we have
where 𝜇𝑁 denotes the law of the random matrix ensemble under con-
sideration.
(iv) Averaged gap universality (in the bulk, on scale 𝑁 −1+𝜉 ): Using the
same notation as in (iii) and ℓ = 𝑁 𝜉 with 0 < 𝜉 < 1, we have
(5.4) $\displaystyle \lim_{N\to\infty} \left|\, \frac{1}{2\ell+1} \sum_{j=k-\ell}^{k+\ell} \mathbb{E}^{\mu_N}\, G\bigl((N\varrho(\lambda_j))(\lambda_j - \lambda_{j+1}),\ \dots,\ (N\varrho(\lambda_j))(\lambda_j - \lambda_{j+n})\bigr) - \mathbb{E}^{\mathrm{GOE}}\, G\bigl((N\varrho(\lambda_m))(\lambda_m - \lambda_{m+1}),\ \dots,\ (N\varrho(\lambda_m))(\lambda_m - \lambda_{m+n})\bigr) \right| = 0.$
Brownian motions starting from zero. Then $H_t$ and $e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H^{\mathrm{G}}$ have the same distribution.
The aim of Step 2 is to prove the bulk universality of 𝐻𝑡 for 𝑡 = 𝑁 −𝜏 for the
entire range of 0 < 𝜏 < 1. This is connected to the local ergodicity of the Dyson
Brownian motion, which we now define.
Definition 5.2. Given a real parameter 𝛽 ≥ 1, consider the following sys-
tem of stochastic differential equations (SDE):
(5.8) $\displaystyle \mathrm{d}\lambda_i = \frac{\sqrt{2}}{\sqrt{\beta N}}\,\mathrm{d}B_i + \left(-\frac{\lambda_i}{2} + \frac{1}{N}\sum_{j\neq i}\frac{1}{\lambda_i - \lambda_j}\right)\mathrm{d}t, \qquad i \in \llbracket 1, N\rrbracket,$
where (𝐵𝑖 ) is a collection of real-valued, independent, standard Brownian mo-
tions. The solution of this SDE is called the Dyson Brownian motion (DBM).
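A quick way to build intuition for (5.8) is to simulate it. The sketch below (an illustration, not from the book; assumes NumPy; the step size, time horizon, and the re-sorting safeguard are ad hoc choices) runs an Euler–Maruyama discretization of the DBM with β = 2 and checks that the empirical eigenvalue distribution approaches the semicircle law, whose second moment is 1 and whose support is [−2, 2].

```python
import numpy as np

N, beta = 50, 2.0
dt, T = 1e-4, 3.0
rng = np.random.default_rng(3)

lam = np.linspace(-1.5, 1.5, N)               # arbitrary initial configuration
for _ in range(int(T / dt)):
    diff = lam[:, None] - lam[None, :]
    np.fill_diagonal(diff, np.inf)            # drop the j = i term
    drift = -lam / 2 + np.sum(1.0 / diff, axis=1) / N
    lam = lam + drift * dt + np.sqrt(2 * dt / (beta * N)) * rng.normal(size=N)
    lam = np.sort(lam)                        # crude guard against rare crossings in the discretization

print("second moment :", np.mean(lam ** 2), " (semicircle value: 1)")
print("support       :", lam.min(), lam.max(), " (semicircle support: [-2, 2])")
```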
In a seminal paper [45], Dyson observed that the eigenvalue flow of the
matrix OU process is exactly the DBM with 𝛽 = 1, 2 corresponding to real sym-
metric or complex Hermitian ensembles. Furthermore, the invariant measure
of the DBM is given by the Gaussian 𝛽-ensemble defined in (4.4). Dyson further
conjectured that the time to “local equilibrium” for DBM is of order 1/𝑁, while
the time to global equilibrium is of order one. It should be noted that there is
no standard notion for the “local equilibrium”; we will instead take a practical
point of view to interpret Dyson’s conjecture as that the local statistics of the
DBM at any time 𝑡 ≫ 𝑁 −1 satisfy the universality defined earlier in this sec-
tion. In other words, Dyson’s conjecture is exactly that the local statistics of 𝐻𝑡
are universal for 𝑡 = 𝑁 −𝜏 for any 0 < 𝜏 < 1.
Step 3. Approximation by a Gaussian divisible ensemble. This is a simple
density argument in the space of matrix ensembles which shows that for any
probability distribution of the matrix elements there exists a Gaussian divisible
distribution with a small Gaussian component, as in Step 2, such that the two
associated Wigner ensembles have asymptotically identical local eigenvalue sta-
tistics. The general result to compare any two matrix ensembles with matching
moments will be given in Theorem 16.1. Alternatively, to follow the evolution
of the Green function under the OU flow, we can use the following continuity
of the matrix OU process.
Step 3a. Continuity of eigenvalues under the matrix OU process. In Theorem
15.2 we will show that the changes of the local statistics in the bulk under the
flow (5.7) up to time scales 𝑡 ≪ 𝑁 −1/2 are negligible; see Lemma 15.4. This
clearly can be used in combination with Step 2 to complete the proof of Theorem
5.1.
The three-step strategy outlined here is very general, and it has been applied
to many different models as we will explain in Chapter 18. It can also be extended
to the edges of the spectrum, yielding the universality at the spectral edges. This
will be reviewed in Chapter 17.
CHAPTER 6
Local Semicircle Law for Universal Wigner Matrices
6.1. Setup
We recall the definition of the universal Wigner matrices (Definition 2.1);
in particular, the matrix elements may have different distributions but indepen-
dence (up to symmetry) is always assumed. The fundamental data of this model
is the 𝑁 × 𝑁 matrix of variances 𝑆 = (𝑠𝑖𝑗 ) where
𝑠𝑖𝑗 ∶= 𝔼 |ℎ𝑖𝑗 |2 ,
and we assume that 𝑆 is (doubly) stochastic:
(6.1) $\displaystyle \sum_{j} s_{ij} = 1.$
We remark that every quantity related to the random matrix 𝐻, such as the
eigenvalues, Green function, empirical density of states, and its Stieltjes trans-
form, all depend on 𝑁, but this dependence will often be omitted in the notation
for brevity. In some formulas, especially in statements of the main results, we
will put back the 𝑁 dependence to stress its presence.
Since
$$ \frac{1}{\pi}\operatorname{Im}\frac{1}{\lambda_\alpha - z} = \theta_\eta(E - \lambda_\alpha) \quad \text{with} \quad \theta_\eta(x) := \frac{1}{\pi}\,\frac{\eta}{x^2+\eta^2} $$
is an approximation to the identity (i.e., delta function) at the scale $\eta = \operatorname{Im} z$,
we have 𝜋 −1 Im 𝑚𝑁 (𝑧) = 𝜚𝑁 ∗ 𝜃𝜂 (𝑧); i.e., the imaginary part of 𝑚𝑁 (𝑧) is the
density of the eigenvalues “at the scale 𝜂.” Thus, the convergence of the Stieltjes
transform 𝑚𝑁 (𝑧) to 𝑚sc (𝑧) as 𝑁 → ∞ will show that the empirical local density
of the eigenvalues around the energy 𝐸 in a window of size 𝜂 converges to the
semicircle law 𝜚sc (𝐸). Therefore, the key task is to control 𝑚𝑁 (𝑧) for small 𝜂.
with very high probability. This equation will be viewed as a small perturbation
of the deterministic equation
(6.16) $\displaystyle -\frac{1}{m_i(z)} = z + \sum_{j=1}^{N} s_{ij}\, m_j(z),$
which, under the side condition that Im 𝑚𝑖 > 0, has a unique solution, namely
𝑚𝑖 (𝑧) = 𝑚sc (𝑧) for every 𝑖 (see (6.8)). Here the stochasticity condition (6.1)
is essential. For the stability analysis of (6.16) the invertibility of the operator
$1 - m_{\mathrm{sc}}^2(z)\,S$ plays a key role.
for large enough 𝑁 ≥ 𝑁0 (𝜀, 𝐷). Unless stated otherwise, throughout this paper
the stochastic domination will always be uniform in all parameters apart from
the parameter 𝛿 in (6.4) and the sequence of constants 𝜇𝑝 in (5.6). Thus, 𝑁0 (𝜀, 𝐷)
also depends on 𝛿 and 𝜇𝑝 . If 𝑋 is stochastically dominated by 𝑌, uniformly in 𝑢,
we use the notation 𝑋 ≺ 𝑌. Moreover, if for some complex family 𝑋 we have
|𝑋| ≺ 𝑌, we also write 𝑋 = 𝑂≺ (𝑌).
The following proposition collects some basic properties of the stochastic
domination. The proofs are left as an exercise.
Proposition 6.5. The relation ≺ satisfies the following properties:
(i) ≺ is transitive: 𝑋 ≺ 𝑌 and 𝑌 ≺ 𝑍 imply 𝑋 ≺ 𝑍.
(ii) ≺ satisfies the familiar arithmetic rules of order relations; i.e., if 𝑋1 ≺ 𝑌1
and 𝑋2 ≺ 𝑌2 , then 𝑋1 + 𝑋2 ≺ 𝑌1 + 𝑌2 and 𝑋1 𝑋2 ≺ 𝑌1 𝑌2 .
(iii) Moreover, the following cancellation property holds:
(6.25) if 𝑋 ≺ 𝑌 + 𝑁 −𝜀 𝑋 for some 𝜀 > 0, then 𝑋 ≺ 𝑌.
(iv) Furthermore, if 𝑋 ≺ 𝑌, 𝔼𝑌 ≥ 𝑁 −𝐶 , and |𝑋| ≤ 𝑁 𝐶 almost surely
with some fixed exponent 𝐶, then for any 𝜀 > 0 and sufficiently large
𝑁 ≥ 𝑁0 (𝜀) we have
(6.26) 𝔼𝑋 ≤ 𝑁 𝜀 𝔼𝑌.
Later in Lemma 10.1 the relation (6.26) will be extended to partial expectations.
We now define appropriate subsets of the spectral parameter 𝑧.
Definition 6.6 (Spectral domain). We call an 𝑁-dependent family
𝐃 ≡ 𝐃(𝑁) ⊂ {𝑧 ∶ |𝐸| ≤ 10, 𝑀 −1 ≤ 𝜂 ≤ 10}
a spectral domain. (Recall that 𝑀 ≡ 𝑀𝑁 depends on 𝑁.)
We always consider families $X^{(N)}(u) = X^{(N)}_i(z)$ indexed by $u = (z,i)$ where
𝑧 takes on values in some spectral domain 𝐃, and 𝑖 takes on values in some finite
(possibly 𝑁-dependent or empty) index set. The stochastic domination 𝑋 ≺ 𝑌
of such families will always be uniform in 𝑧 and 𝑖, and we usually do not state
this explicitly. Usually which spectral domain 𝐃 is meant will be clear from the
context, in which case we shall not mention it explicitly.
For example, using Chebyshev’s inequality and (6.2) one easily finds that
(6.27) ℎ𝑖𝑗 ≺ (𝑠𝑖𝑗 )1/2 ≺ 𝑀 −1/2
uniformly in 𝑖 and 𝑗, so that we may also write ℎ𝑖𝑗 = 𝑂≺ ((𝑠𝑖𝑗 )1/2 ). The definition
of ≺ with the polynomial factors $N^{-\varepsilon}$ and $N^{-D}$ is tailored for the assumption
(6.2). We remark that if the analogous subexponential decay (2.7) is assumed
then a stronger form of stochastic domination can be introduced, but we will
not pursue this direction here.
(6.28) $\displaystyle \tilde\eta_E := \min\Bigl\{\eta : \frac{1}{M\xi} \le \min\Bigl\{\frac{M^{-\gamma}}{\tilde\Gamma(E+\mathrm{i}\xi)^3},\ \frac{M^{-2\gamma}}{\tilde\Gamma(E+\mathrm{i}\xi)^4\,\operatorname{Im}m_{\mathrm{sc}}(E+\mathrm{i}\xi)}\Bigr\} \text{ holds for all } \xi \ge \eta\Bigr\}.$
Although this expression looks complicated, we shall see that it comes out
naturally in the analysis of the self-consistent equations for the Green functions.
Here 𝛾 > 0 is a parameter that can be chosen arbitrarily small; for all practical
purposes, the reader can neglect it. For generalized Wigner matrices 𝑀 ≍ 𝑁,
from (6.24) we have
(6.29) $\displaystyle \tilde\eta_E \le C N^{-1+2\gamma};$
i.e., we will get the local semicircle law on the smallest possible scale 𝜂 ≫ 𝑁 −1 ,
modulo an 𝑀 𝛾 correction with an arbitrary, small exponent. We remark that
if we assume subexponential decay (2.7) instead of the polynomial decay (6.2),
then the small 𝑀 𝛾 correction can be replaced with a (log 𝑀)𝐶 factor.
Finally, we define our fundamental control parameter
(6.30) $\displaystyle \Pi(z) := \sqrt{\frac{\operatorname{Im} m_{\mathrm{sc}}(z)}{M\eta}} + \frac{1}{M\eta}.$
We can now state the main result of this section, which in this full generality
first appeared in [55]. Previous results that have cumulatively led to this general
formulation will be summarized at the end of the section.
Theorem 6.7 (Local semicircle law [55]). Consider a universal Wigner ma-
trix satisfying the polynomial decay condition (6.2) and (6.3). Then, uniformly in
(6.31) $\displaystyle \max_{i,j}\,|G_{ij}(z) - \delta_{ij}\, m_{\mathrm{sc}}(z)| \prec \Pi(z) = \sqrt{\frac{\operatorname{Im} m_{\mathrm{sc}}(z)}{M\eta}} + \frac{1}{M\eta}, \qquad z = E + \mathrm{i}\eta,$
as well as
(6.32) $\displaystyle |m_N(z) - m_{\mathrm{sc}}(z)| \prec \frac{1}{M\eta}.$
Moreover, outside of the spectrum we have the stronger estimate
(6.33) $\displaystyle |m_N(z) - m_{\mathrm{sc}}(z)| \prec \frac{1}{M(\kappa_E + \eta)} + \frac{1}{(M\eta)^2\sqrt{\kappa_E + \eta}}$
uniformly in $z \in \{z : 2 \le |E| \le 10,\ \tilde\eta_E \le \eta \le 10,\ M\eta\sqrt{\kappa_E+\eta} \ge M^{\gamma}\}$ for any fixed $\gamma > 0$ where $\kappa_E := \bigl||E| - 2\bigr|$.
For the generalized Wigner matrix, the threshold $\tilde\eta_E$ can be chosen $\tilde\eta_E = N^{-1+2\gamma}$.
We point out two remarkable features of these bounds. The error term for
the resolvent entries behaves essentially as (𝑀𝜂)−1/2 , with an improvement near
the edges where Im 𝑚sc vanishes. The error bound for the Stieltjes transform,
i.e., for the average of the diagonal resolvent entries, is one order better, (𝑀𝜂)−1 ,
but without improvement near the edge.
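These two error scales can be probed numerically. The sketch below (an illustration, not from the book; assumes NumPy; for Wigner matrices $M = N$, and the chosen $\eta$ is just an example) computes the resolvent of one Wigner matrix at a mesoscopic $\eta$ in the bulk and compares the entrywise error with $\Pi(z)$ from (6.31) and the averaged error with $(N\eta)^{-1}$ from (6.32).

```python
import numpy as np

N = 2000
rng = np.random.default_rng(4)
A = rng.normal(size=(N, N)) / np.sqrt(N)
H = (A + A.T) / np.sqrt(2)

E, eta = 0.5, N ** -0.6
z = E + 1j * eta
m_sc = (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2    # Stieltjes transform of the semicircle law

G = np.linalg.inv(H - z * np.eye(N))
m_N = np.trace(G) / N

Pi = np.sqrt(m_sc.imag / (N * eta)) + 1 / (N * eta)  # control parameter (6.30) with M = N
print("max_ij |G_ij - delta_ij m_sc| =", np.max(np.abs(G - m_sc * np.eye(N))), "  Pi(z) =", Pi)
print("|m_N - m_sc|                  =", abs(m_N - m_sc), "  1/(N eta) =", 1 / (N * eta))
```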
The resolvent matrix element 𝐺𝑖𝑗 may be viewed as the scalar product ⟨e𝑖 ,
𝐺e𝑗 ⟩ where e𝑖 is the 𝑖th coordinate vector. In fact, a more general version of
(6.31), the isotropic local law, also holds for generalized Wigner matrices:
Theorem 6.8 (Isotropic law [21]). For a generalized Wigner matrix with
polynomial decay (5.6) and for any fixed unit vectors 𝐯, 𝐰 we have
(6.34) $\displaystyle |\langle \mathbf{v}, G(z)\mathbf{w}\rangle - m_{\mathrm{sc}}(z)\langle \mathbf{v},\mathbf{w}\rangle| \prec \sqrt{\frac{\operatorname{Im} m_{\mathrm{sc}}(z)}{N\eta}} + \frac{1}{N\eta}$
uniformly in the set {𝑧 = 𝐸 + 𝑖𝜂 ∶ |𝐸| ≤ 𝜔−1 , 𝑁 −1+𝜔 ≤ 𝜂 ≤ 𝜔−1 } for any fixed
𝜔 > 0.
The isotropic law was first proven in [89] for Wigner matrices under a van-
ishing third moment condition. The general case in the form above was given
in [21]. We will not prove this result here since it is not needed for the proof of
Theorem 5.1.
We will first prove a weaker version of Theorem 6.7 in Chapter 7, the so-
called weak local semicircle law where the error term is not optimal. After that,
we will prove Theorem 6.7 in Chapter 8 using Γ instead of Γ̃ . This yields the
same estimate as given in Theorem 6.7 but only on a smaller set of the spectral
parameter for which the argument is somewhat simpler. The proof of Theorem
6.7 for the entire domain will only be sketched in Chapter 9, and we refer the
reader to the original paper for the complete version. Chapter 7 is included
mainly for pedagogical reasons to introduce the ideas of continuity argument
and self-consistent equations. Chapter 8 demonstrates how to use the vector
where the constant depends on 𝐶inf and 𝐶sup . This proves the upper bound on Γ
in (6.24).
Finally, we bound Γ̃ . The lower bound was already given in (6.19). For the
upper bound we follow the argument above, but we restrict 𝑆 to e⟂ . Since the
spectrum of this restriction lies in [−1 + 𝑎, 1 − 𝑎], we immediately get the bound
$C/a$ for the $\ell^2$-norm of $(1 - m_{\mathrm{sc}}^2 S)^{-1}\big|_{\mathbf{e}^\perp}$ in the right-hand side of (6.35). This can
be lifted to the same estimate for the ℓ∞ -norm. This completes the proof of the
lemma. □
This simple proof of Lemma 6.3 used both spectral gaps and that 𝑠𝑖𝑗 =
𝑂(𝑁 −1 ). Lacking this information in the general case, the following proposi-
tion gives explicit bounds on Γ and Γ̃ depending on the spectral gaps 𝛿± in the
general case. We recall the notation 𝑧 = 𝐸 + i𝜂 and 𝜅 = 𝜅𝐸 ∶= ||𝐸| − 2| and
define
(6.36) $\displaystyle \theta \equiv \theta(z) := \begin{cases} \kappa + \dfrac{\eta}{\sqrt{\kappa+\eta}} & \text{if } |E| \le 2, \\ \sqrt{\kappa+\eta} & \text{if } |E| > 2. \end{cases}$
(6.37) $\displaystyle \frac{1}{C\sqrt{\kappa+\eta}} \le \Gamma(z) \le \frac{C\log N}{1 - \max_{\pm}\bigl|\frac{1\pm m_{\mathrm{sc}}^2}{2}\bigr|} \le \frac{C\log N}{\min\{\eta + E^2,\ \theta\}}.$
(6.38) $\displaystyle \Gamma(z) \le \frac{C\log N}{\min\{\delta_- + \eta + E^2,\ \theta\}}.$
(6.39) $\displaystyle C^{-1} \le \tilde\Gamma(z) \le \frac{C\log N}{\min\{\delta_- + \eta + E^2,\ \delta_+ + \theta\}}.$
Proof. The first bound of (6.37) follows from
$$ (1 - m_{\mathrm{sc}}^2 S)^{-1}\mathbf{e} = (1 - m_{\mathrm{sc}}^2)^{-1}\mathbf{e} $$
Therefore,
$$ \left\| \frac{1}{1 - m_{\mathrm{sc}}^2 S} \right\|_{\ell^\infty\to\ell^\infty} \le \sum_{n=0}^{n_0-1} \left\| \Bigl(\frac{1+m_{\mathrm{sc}}^2 S}{2}\Bigr)^{n} \right\|_{\ell^\infty\to\ell^\infty} + \sqrt{N} \sum_{n=n_0}^{\infty} \left\| \Bigl(\frac{1+m_{\mathrm{sc}}^2 S}{2}\Bigr)^{n} \right\|_{\ell^2\to\ell^2} \le n_0 + \sqrt{N}\,\frac{q^{n_0}}{1-q} \le \frac{C\log N}{1-q}, $$
where in the last step we chose $n_0 = \frac{C_0 \log N}{1-q}$ for large enough $C_0$. Here we used
that ‖𝑆‖ℓ∞ →ℓ∞ ≤ 1 and (6.11) to estimate the summands in the first sum. This
concludes the proof of the second bound of (6.37).
The third bound of (6.37) follows from the elementary estimates
(6.41) $\displaystyle \left|\frac{1 - m_{\mathrm{sc}}^2}{2}\right| \le 1 - c(\eta + E^2), \qquad \left|\frac{1 + m_{\mathrm{sc}}^2}{2}\right| \le 1 - c\Bigl((\operatorname{Im} m_{\mathrm{sc}})^2 + \frac{\eta}{\operatorname{Im} m_{\mathrm{sc}} + \eta}\Bigr) \le 1 - c\theta,$
for some universal constant 𝑐 > 0, where in the last step we used Lemma 6.2.
The estimate (6.38) follows similarly. Due to the gap 𝛿− in the spectrum
of 𝑆, we may replace the estimate (6.40) with
(6.42) $\displaystyle \left\|\frac{1 + m_{\mathrm{sc}}^2 S}{2}\right\|_{\ell^2\to\ell^2} \le \max\Bigl\{1 - \delta_- - \eta - E^2,\ \Bigl|\frac{1+m_{\mathrm{sc}}^2}{2}\Bigr|\Bigr\}.$
Hence (6.38) follows using (6.41).
The lower bound of (6.39) was proved in (6.19). The upper bound is proved
similarly to (6.38), except that (6.42) is replaced with
$$ \left\|\frac{1 + m_{\mathrm{sc}}^2 S}{2}\Big|_{\mathbf{e}^\perp}\right\|_{\ell^2\to\ell^2} \le \max\Bigl\{1 - \delta_- - \eta - E^2,\ \min\Bigl\{1 - \delta_+,\ \Bigl|\frac{1+m_{\mathrm{sc}}^2}{2}\Bigr|\Bigr\}\Bigr\}. $$
This concludes the proof of (6.39). □
CHAPTER 7
Weak Local Semicircle Law
Before we prove the local semicircle law in the strong form, Theorem 6.7, for
pedagogical reasons we first prove the following weaker version whose proof is
easier. For simplicity, in this section we consider the Wigner case, i.e., $s_{ij} = \frac{1}{N}$ and $M = N$. In the bulk, this weaker estimate is still effective for all $\eta$ down to the smallest scales $N^{-1+\varepsilon}$, but the power of $\frac{1}{N\eta}$ in the error estimate is not optimal ($\frac{1}{2}$ instead of $1$ in (6.32)). Near the edge the bound is even weaker; the power of $\frac{1}{N\eta}$ is reduced to $\frac{1}{4}$, indicating that this proof is not sufficiently strong near the edge.
For definiteness, we present the proof for the Hermitian case, but all formu-
las below carry over to the other symmetry classes with obvious modifications.
(We sometimes use 𝐮 ⋅ 𝐯 instead of 𝐮∗ 𝐯 or (𝐮, 𝐯) for the usual Hermitian scalar
product.) We now introduce some notation.
Definition 7.3 (Partial expectation and independence). Let 𝑋 ≡ 𝑋(𝐻) be
a random variable. For 𝑖 ∈ {1, … , 𝑁} define the operations 𝑃𝑖 and 𝑄𝑖 through
𝑃𝑖 𝑋 ∶= 𝔼(𝑋|𝐻 (𝑖) ), 𝑄𝑖 𝑋 ∶= 𝑋 − 𝑃𝑖 𝑋.
We call 𝑃𝑖 the partial expectation in the index 𝑖. Moreover, we say that 𝑋 is
independent of a set 𝕋 ⊂ {1, … , 𝑁} if 𝑋 = 𝑃𝑖 𝑋 for all 𝑖 ∈ 𝕋.
We can decompose 𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 into its expectation and fluctuation
𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 = 𝑃𝑖 [𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 ] + 𝑍𝑖
where
(7.9) 𝑍𝑖 ∶= 𝑄𝑖 [𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 ].
Since 𝐺 (𝑖) is independent of 𝐚𝑖 , we need to compute expectations and fluctua-
tions of quadratic functions. The expectation is easy:
$$ P_i\bigl[\mathbf{a}_i \cdot G^{(i)}\mathbf{a}_i\bigr] = P_i \sum_{k,l} \bar{\mathbf{a}}_{ki}\, G^{(i)}_{kl}\, \mathbf{a}_{li} = \sum_{k,l\neq i} P_i\bigl[h_{ik}\, G^{(i)}_{kl}\, h_{li}\bigr] = \frac{1}{N}\sum_{k\neq i} G^{(i)}_{kk}, $$
where in the last step we used that different matrix elements are independent,
i.e., $P_i[h_{ik}h_{li}] = \frac{1}{N}\delta_{kl}$. The summations always run over all indices from 1 to $N$, apart from those that are explicitly excluded. We define
$$ m^{(i)}_N(z) := \frac{1}{N-1}\operatorname{Tr} G^{[i]}(z) = \frac{1}{N-1}\sum_{k\neq i} G^{(i)}_{kk}(z), $$
where we used 𝐺𝑘𝑘 = 𝐺𝑘𝑘 for 𝑘 ≠ 𝑖 from (7.5). Hence we have the identity
1 1
(7.10) 𝐺𝑖𝑖 = = 1
.
𝑖 (𝑖) 𝑖
ℎ𝑖𝑖 − 𝑧 − 𝑃𝑖 [𝐚 ⋅ 𝐺 𝐚 ] − 𝑍𝑖 (𝑖)
ℎ𝑖𝑖 − 𝑧 − (1 − )𝑚𝑁 (𝑧) − 𝑍𝑖
𝑁
Step 2. Interlacing of eigenvalues. We now estimate the difference between $m_{\mathrm{sc}}$ and $m^{(i)}_N$. The first step is the following well-known lemma. We include a
short proof for completeness. For simplicity we consider the randomized setup
with a continuous distribution to avoid multiple eigenvalues. The general case
easily follows from standard approximation arguments.
Lemma 7.4 (Interlacing of eigenvalues). Let 𝐻 be a symmetric or Hermitian
𝑁 × 𝑁 matrix with continuous distribution. Decompose 𝐻 as follows:
(7.11) $\displaystyle H = \begin{pmatrix} h & \mathbf{a}^* \\ \mathbf{a} & B \end{pmatrix},$
where 𝐚 = (ℎ12 , … , ℎ1𝑁 )∗ and 𝐵 = 𝐻 [1] is the (𝑁 − 1) × (𝑁 − 1) minor of 𝐻
obtained by removing the first row and first column from 𝐻. Denote by 𝜇1 ≤ ⋯ ≤
𝜇𝑁 the eigenvalues of 𝐻 and 𝜆1 ≤ ⋯ ≤ 𝜆𝑁−1 the eigenvalues of 𝐵. Then with
probability 1 the eigenvalues of 𝐵 are distinct and the eigenvalues of 𝐻 and 𝐵 are
interlaced:
(7.12) 𝜇1 < 𝜆1 < 𝜇2 < 𝜆2 < ⋯ < 𝜇𝑁−1 < 𝜆𝑁−1 < 𝜇𝑁 .
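Interlacing is easy to observe numerically. The sketch below (an illustration, not from the book; assumes NumPy) removes the first row and column of a random symmetric matrix and checks (7.12).

```python
import numpy as np

N = 8
rng = np.random.default_rng(6)
A = rng.normal(size=(N, N))
H = (A + A.T) / 2                              # symmetric matrix with a continuous distribution

mu = np.linalg.eigvalsh(H)                     # eigenvalues of H, in increasing order
lam = np.linalg.eigvalsh(H[1:, 1:])            # eigenvalues of the minor B

print(all(mu[k] < lam[k] < mu[k + 1] for k in range(N - 1)))   # True: (7.12) holds
```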
Since 𝔼ℎ = 0, the nonzero contributions to this sum come from index combinations when all factors $h$ and $\bar h$ are paired. For pedagogical simplicity, assume that
𝔼ℎ2 = 0; this holds, for example, if the distribution of the real and imaginary
parts are the same. Then ℎ factors in the above expression have to be paired
in such a way that ℎ𝑖𝑘 = ℎ𝑖𝑘′ and ℎ𝑖𝑙 = ℎ𝑖𝑙′ , i.e., 𝑘 = 𝑘 ′ and 𝑙 = 𝑙 ′ . Note that
pairing ℎ𝑖𝑘 = ℎ𝑖𝑙 would give 0 because the expectation is subtracted. The result
is
(7.21) $\displaystyle P_i\, |Z_i|^2 = \frac{1}{N^2}\sum_{k,l\neq i} \bigl|G^{(i)}_{kl}\bigr|^2 + \frac{m_4 - 1}{N^2}\sum_{k\neq i} \bigl|G^{(i)}_{kk}\bigr|^2,$
where 𝜁𝛼 are the eigenvalues of 𝐴. To see this, let 𝐮𝛼 be the normalized eigen-
vectors. Then by the spectral theorem
$$ R_{kk} = \sum_{\alpha} \frac{|\mathbf{u}_\alpha(k)|^2}{\zeta_\alpha - z}, $$
and thus we have
$$ \sum_k |R_{kk}|^2 \le \sum_k \sum_{\alpha,\beta} \frac{|\mathbf{u}_\alpha(k)|^2\,|\mathbf{u}_\beta(k)|^2}{|\zeta_\alpha - z|\,|\zeta_\beta - z|} \le \sum_{\alpha} \frac{1}{|\zeta_\alpha - z|^2} \sum_k |\mathbf{u}_\alpha(k)|^2 \sum_{\beta} |\mathbf{u}_\beta(k)|^2 = \sum_{\alpha} \frac{1}{|\zeta_\alpha - z|^2}, $$
where we have used the Schwarz inequality and that {𝐮𝛽 } is an orthonormal
basis. Applying this bound to the Green function of 𝐻 [𝑖] with eigenvalues 𝜇𝛼 ,
we have
(7.25) $\displaystyle \frac{1}{N^2}\sum_{k\neq i} \bigl|G^{(i)}_{kk}\bigr|^2 = \frac{1}{N^2}\sum_{k\neq i} \bigl|G^{[i]}_{kk}\bigr|^2 \le \frac{1}{N\eta}\,\frac{1}{N}\sum_{\alpha=1}^{N-1} \frac{\eta}{|\mu_\alpha - z|^2} = \frac{1}{N\eta}\Bigl(1 - \frac{1}{N}\Bigr)\operatorname{Im} m^{(i)}_N.$
By (7.16), we can estimate $m^{(i)}_N$ by $m_N$. Thus the estimates (7.22) and (7.25) confirm that the size of $Z_i$ is roughly
(7.26) $\displaystyle |Z_i| \lesssim \frac{1}{\sqrt{N\eta}}\sqrt{\operatorname{Im} m_N}$
in the second-moment sense. In Section 7.2 we will prove that this inequality actually holds in large-deviation sense; i.e., we have
(7.27) $\displaystyle |Z_i| \prec \frac{1}{\sqrt{N\eta}}\sqrt{\operatorname{Im} m_N}.$
The diagonal entry ℎ𝑖𝑖 can be easily estimated. Since the single-entry dis-
tribution has finite moments (5.6), we have
ℙ(|ℎ𝑖𝑖 | ≥ 𝑁 𝜀 𝑁 −1/2 ) ≤ 𝐶𝑝 𝑁 −𝜀𝑝
for each 𝑖 and for any 𝜀 > 0. Hence we can guarantee that all diagonal elements
ℎ𝑖𝑖 simultaneously satisfy |ℎ𝑖𝑖 | ≺ 𝑁 −1/2 .
Step 4. Initial estimate at large scales.
To control the error terms in the self-consistent equation (7.18), we need
two inputs. First, from now on we assume that (7.27) holds. Since 𝑁𝜂 is large,
this implies that 𝑍𝑖 is small provided that Im 𝑚𝑁 is bounded. Second, we need
to ensure that the denominator in the right-hand side of (7.18) does not become
too small. Since the main term in this denominator is 𝑧 + 𝑚𝑁 (𝑧), our task is to
show that
(7.28) $\displaystyle \frac{1}{|z + m_N(z)|} \prec 1.$
Notice that both inputs are in terms of the yet uncontrolled quantity 𝑚𝑁 ; they
would be trivially available if 𝑚𝑁 were replaced with 𝑚sc (see Lemma 6.2). Since
the smallness of 𝑚𝑁 −𝑚sc is our goal, to break this apparently circular argument
we will use a bootstrap strategy. The convenient bootstrap parameter is 𝜂. We
first establish the result for large 𝜂 in this section, which is called the initial
estimate. Then, in the next section, step by step we reduce the value 𝜂 by using
the control from the previous scale to estimate Im 𝑚𝑁 and |𝑧 + 𝑚𝑁 (𝑧)|−1 . This
control will use the large-deviation bounds on 𝑍𝑖 , which hold with very high
probability. Hence at each step an exceptional event of very small probability
will have to be excluded. This is the main reason why the bootstrap argument is
(7.36) $\displaystyle \Bigl|m - \frac{1}{m_{\mathrm{sc}}} + \frac{1-m_{\mathrm{sc}}^2}{m_{\mathrm{sc}}}\Bigr|\, \Bigl|m - \frac{1}{m_{\mathrm{sc}}}\Bigr| \le C\delta,$
and repeat the previous argument with the roles of $m_{\mathrm{sc}}$ and $1/m_{\mathrm{sc}}$ interchanged. We conclude that either $|m - m_{\mathrm{sc}}^{-1}| \le \frac{1}{2}|1-m_{\mathrm{sc}}^2|/|m_{\mathrm{sc}}|$, in which case we immediately get $|m - m_{\mathrm{sc}}^{-1}| \lesssim \delta/\sqrt{\kappa+\eta}$, or $|m - m_{\mathrm{sc}}^{-1}| \gtrsim \sqrt{\kappa+\eta}$. In the latter case, combining it with $|m - m_{\mathrm{sc}}| \gtrsim \sqrt{\kappa+\eta}$ from the previous argument, we would get $|m - m_{\mathrm{sc}}|\,|m - m_{\mathrm{sc}}^{-1}| \gtrsim \kappa+\eta$. Since $\kappa+\eta \gtrsim |1-m_{\mathrm{sc}}^2| \ge C'\delta$, this would contradict (7.34) if $C'$ is large enough. This completes the proof of the lemma. □
(7.37) $\displaystyle \min\Bigl\{|m_N(z) - m_{\mathrm{sc}}(z)|,\ \Bigl|m_N(z) - \frac{1}{m_{\mathrm{sc}}(z)}\Bigr|\Bigr\} \prec \frac{1}{|z + m_N(z)|}\,\min\Bigl\{\bigl(\max_i |\Omega_i|\bigr)^{1/2},\ \frac{\max_i |\Omega_i|}{\sqrt{\kappa}}\Bigr\};$
Applying (7.32) once again, with 𝛿 = (𝑁𝜂)−1/2 and using (7.41) to exclude the
possibility that 𝑚𝑁 is close to 1/𝑚sc , we obtain the better bound
(7.45) $\displaystyle |m_N(z) - m_{\mathrm{sc}}(z)| \prec \min\Bigl(\frac{1}{(N\eta)^{1/4}},\ \frac{1}{\sqrt{N\eta\kappa}}\Bigr) \quad \text{for any } \eta \ge N^{-1/16},$
which is exactly (7.1) for 𝜂 ≥ 𝑁 −1/16 .
Step 5. Continuity argument and completion of the proof. With (7.1) proven
for any 𝜂 ≥ 𝑁 −1/16 , we now proceed to reduce the scale of 𝜂, while 𝐸, the real
except for the large-deviation estimate (7.27), which we will prove in the next
subsection.
Proof. Write
$$ \sum_{i,j} a_{ij} X_i Y_j = \sum_j b_j Y_j, \qquad b_j := \sum_i a_{ij} X_i. $$
Note that (𝑏𝑗 ) and (𝑌𝑗 ) are independent families. By conditioning on the family
(𝑏𝑗 ), we therefore get from Lemma 7.8 and the triangle inequality that
$$ \Bigl\| \sum_j b_j Y_j \Bigr\|_p \le (Cp)^{1/2}\,\mu_p\, \Bigl\| \sum_j |b_j|^2 \Bigr\|_{p/2}^{1/2} \le (Cp)^{1/2}\,\mu_p\, \Bigl( \sum_j \|b_j\|_p^2 \Bigr)^{1/2}. $$
where the sum ranges over all partitions of $\mathbb{N}_N = \{1,\dots,N\}$ into two sets $I$ and $J$, and $Z_N := 2^{N-2}$ is independent of $i$ and $j$. Moreover, we have
(7.62) $\displaystyle \sum_{I\sqcup J = \mathbb{N}_N} 1 = 2^N - 2,$
where the sum ranges over nonempty subsets $I$ and $J$. Now we may estimate
$$ \Bigl\| \sum_{i\neq j} a_{ij} X_i X_j \Bigr\|_p \le \frac{1}{Z_N} \sum_{I\sqcup J=\mathbb{N}_N} \Bigl\| \sum_{i\in I}\sum_{j\in J} a_{ij} X_i X_j \Bigr\|_p \le \frac{1}{Z_N} \sum_{I\sqcup J=\mathbb{N}_N} Cp\,\mu_p^2\, \Bigl( \sum_{i\neq j} |a_{ij}|^2 \Bigr)^{1/2} $$
where we used that, for any partition 𝐼 ⊔𝐽 = ℕ𝑁 , the families (𝑋𝑖 )𝑖∈𝐼 and (𝑋𝑗 )𝑗∈𝐽
are independent, and hence Lemma 7.9 is applicable. The claim now follows
from (7.62). □
As remarked above, the proof of Lemma 7.10 may be easily extended to
multilinear expressions of the form $\sum^{*}_{i_1,\dots,i_k} a_{i_1\dots i_k}\, X_{i_1}\cdots X_{i_k}$.
We may now complete the proof of Theorem 7.7.
Proof of Theorem 7.7. The proof is a simple application of Chebyshev’s
inequality. Part (i) follows from Lemma 7.8, part (ii) from Lemma 7.10, and part
(iii) from Lemma 7.9. We give the details for part (iii).
For 𝜖 > 0 and 𝐷 > 0 we have
$$ \mathbb{P}\Bigl[\Bigl|\sum_{i\neq j} a_{ij}X_iX_j\Bigr| \ge N^{\varepsilon}\Psi\Bigr] \le \mathbb{P}\Bigl[\Bigl|\sum_{i\neq j} a_{ij}X_iX_j\Bigr| \ge N^{\varepsilon}\Psi,\ \Bigl(\sum_{i\neq j}|a_{ij}|^2\Bigr)^{1/2} \le N^{\varepsilon/2}\Psi\Bigr] + \mathbb{P}\Bigl[\Bigl(\sum_{i\neq j}|a_{ij}|^2\Bigr)^{1/2} \ge N^{\varepsilon/2}\Psi\Bigr] $$
$$ \le \mathbb{P}\Bigl[\Bigl|\sum_{i\neq j} a_{ij}X_iX_j\Bigr| \ge N^{\varepsilon/2}\Bigl(\sum_{i\neq j}|a_{ij}|^2\Bigr)^{1/2}\Bigr] + N^{-D-1} \le \Bigl(\frac{Cp\,\mu_p^2}{N^{\varepsilon/2}}\Bigr)^{p} + N^{-D-1} $$
for arbitrary 𝐷. In the second step we used the definition of (∑𝑖≠𝑗 |𝑎𝑖𝑗 |2 )1/2 ≺
Ψ with parameters 𝜀/2 and 𝐷 + 1. In the last step we used Lemma 7.10 by
conditioning on (𝑎𝑖𝑗 ). Given 𝜀 and 𝐷, there is a large enough 𝑝 such that the
first term on the last line is bounded by 𝑁 −𝐷−1 . Since 𝜖 and 𝐷 were arbitrary,
the proof is complete.
The claimed uniformity in 𝑢 in the case that 𝑎𝑖𝑗 and 𝑋𝑖 depend on an index 𝑢
also follows from the above estimate. □
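The quadratic large-deviation bound says that $\sum_{i\neq j} a_{ij}X_iX_j$ is typically of size $\Psi = \bigl(\sum_{i\neq j}|a_{ij}|^2\bigr)^{1/2}$. The toy sketch below (an illustration, not from the book; assumes NumPy; the coefficients and the $\pm 1$ variables are arbitrary choices) compares the two.

```python
import numpy as np

N, trials = 500, 200
rng = np.random.default_rng(7)

a = rng.normal(size=(N, N)) / N                # fixed (deterministic) coefficients a_ij
np.fill_diagonal(a, 0)
Psi = np.sqrt(np.sum(a ** 2))                  # (sum_{i != j} |a_ij|^2)^{1/2}

vals = []
for _ in range(trials):
    X = rng.choice([-1.0, 1.0], size=N)        # independent, centered, unit-variance entries
    vals.append(abs(X @ a @ X))                # |sum_{i != j} a_ij X_i X_j|

print("median of |sum a_ij X_i X_j| :", np.median(vals))
print("large-deviation scale Psi    :", Psi)
```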
CHAPTER 8
Proof of the Local Semicircle Law
In this section we start the proof of the local semicircle law, Theorem 6.7.
This section can be read independently of the previous Chapter 7, so some basic
definitions and facts are repeated for convenience. For those readers who may
wish to compare this argument with the proof of the weak law, Theorem 7.1, we
mention that the basic strategy is similar except that we use an additional mech-
anism that we call fluctuation averaging. We point out that the first estimate in
(7.29) was not optimal; here we estimated the average of Ω𝑖 by its maximum.
Since Ω𝑖 ’s are almost centered random variables with weak correlation, a more
precise estimate of their average leads to a considerable improvement. We now
recall that the basic steps to prove the weak law were
(1) the self-consistent equation for 𝑚𝑁 ,
(𝑖)
(2) the interlacing of eigenvalues to compare 𝑚𝑁 and 𝑚𝑁 ,
(3) the quadratic large-deviation estimate for the error term 𝑍𝑖 ,
(4) the initial estimate for large 𝜂, and
(5) extending the estimate to small 𝜂 by the continuity argument.
Exploiting the fluctuation averaging mechanism requires a control on the
individual matrix elements of the resolvent 𝐺 instead of just its normalized trace,
𝑚𝑁 = 𝑁 −1 Tr 𝐺. Therefore, instead of considering the scalar equation for 𝑚𝑁 ,
we will consider the vector self-consistent equation for the diagonal elements 𝐺𝑖𝑖
of the Green function and investigate the stability of this equation. There is
no direct analogue of the interlacing property for 𝐺𝑖𝑖 ; thus, we introduce new
resolvent decoupling identities to compare the resolvents of the original matrix
and its minors. We will still use the quadratic large-deviation estimate to bound
the error term. We also use a continuity argument similar to the one given in
the weak law to derive a crude estimate on 𝐺𝑖𝑖 (formulated in terms of a certain
dichotomy), but instead of making many small steps and keeping track of the
exceptional sets, we follow a genuinely continuous approach within the frame-
work of the stochastic domination. Finally, we use the fluctuation-averaging
lemma (Lemma 8.9) to boost the error estimate by one order and an iteration
argument to prove the local semicircle law, Theorem 6.7. We now start the
rigorous proof. We will largely follow the presentation in [55].
8.1. Tools
In this subsection we collect some basic definitions and facts. First, we
repeat Definition 7.3 of the partial expectation.
61
62 8. PROOF OF THE LOCAL SEMICIRCLE LAW
Moreover, we define the resolvent of 𝐻 (𝕋) and its normalized trace through
(𝕋) 1
𝐺𝑖𝑗 (𝑧) ∶= (𝐻 (𝕋) − 𝑧)−1
𝑖𝑗 , 𝑚(𝕋) (𝑧) ∶= Tr 𝐺 (𝕋) (𝑧).
𝑁
We also set the notation
(𝕋)
∑ ∶= ∑ .
𝑖 𝑖∶𝑖∉𝕋
These definitions are the natural generalizations of 𝐻 (𝑖) and 𝐺 (𝑖) introduced
in Chapter 1. In particular, notice that 𝐻 (𝕋) is the matrix obtained by setting all
rows and columns in 𝕋 to 0. This is different from considering the minors by
removing the columns and rows in 𝕋. Similarly, 𝐺 (𝕋) is still an 𝑁 × 𝑁 matrix
(𝕋) (𝕋)
with 𝐺𝑖𝑖 = −𝑧−1 for 𝑖 ∈ 𝕋 and 𝐺𝑖𝑗 = 0 if 𝑖 ∈ 𝕋 and 𝑗 ∉ 𝕋. We will denote
𝐺 ({𝑖}) simply by 𝐺 (𝑖) and similarly for a few more indices. This is consistent with
the notation we used in the earlier chapters.
The following resolvent decoupling identities form the backbone of all of our
calculations. The idea behind them is that a resolvent matrix element 𝐺𝑖𝑗 de-
pends strongly on the 𝑖th and 𝑗th columns of 𝐻, but weakly on all other columns.
The first identity determines how to make a resolvent matrix element 𝐺𝑖𝑗 in-
dependent of an additional index 𝑘 ≠ 𝑖, 𝑗. The second identity expresses the
dependence of a resolvent matrix element 𝐺𝑖𝑗 on the matrix elements in the 𝑖th
or 𝑗th column of 𝐻. We added a third identity that relates sums of off-diagonal
resolvent entries with a diagonal one. The proofs are elementary.
Lemma 8.3 (Resolvent decoupling identities). For any real or complex Her-
mitian matrix 𝐻 and 𝕋 ⊂ {1, … , 𝑁} the following identities hold:
(i) First resolvent decoupling identity [69]: If 𝑖, 𝑗, 𝑘 ∉ 𝕋 and 𝑖, 𝑗 ≠ 𝑘, then
(𝕋) (𝕋)
(𝕋) (𝕋𝑘)
𝐺𝑖𝑘 𝐺𝑘𝑗
(8.1) 𝐺𝑖𝑗 = 𝐺𝑖𝑗 + .
(𝕋)
𝐺𝑘𝑘
8.1. TOOLS 63
(𝕋𝑖) (𝕋𝑗)
(𝕋) (𝕋) (𝕋𝑖) (𝕋) (𝕋𝑗)
(8.2) 𝐺𝑖𝑗 = −𝐺𝑖𝑖 ∑ ℎ𝑖𝑘 𝐺𝑘𝑗 = −𝐺𝑗𝑗 ∑ 𝐺𝑖𝑘 ℎ𝑘𝑗
𝑘 𝑘
where the superscript in the summation means omission; e.g., the sum-
mation in the first sum runs over all 𝑘 ∉ 𝕋 ∪ {𝑖}.
(iii) Ward identity. For any 𝕋 ⊂ {1, … , 𝑁} we have
2 1
∑ ||𝐺𝑖𝑗 || = Im 𝐺𝑖𝑖 .
(𝕋) (𝕋)
(8.3)
𝑗
𝜂
Proof. We will prove this lemma only for 𝕋= 0; the general case is a straight-
forward modification. We first consider (8.2). Recall the resolvent expansion
stating that for any two matrices 𝐴 and 𝐵,
1 1 1 1 1 1 1
(8.4) = − 𝐵 = − 𝐵
𝐴+𝐵 𝐴 𝐴+𝐵 𝐴 𝐴 𝐴 𝐴+𝐵
provided that all the matrix inverses exist.
To obtain the first formula in (8.2), we use the first resolvent identity (8.4)
(𝑖)
at the (𝑖𝑗)th matrix element with 𝐴 = 𝐻 (𝑖) − 𝑧 and 𝐵 = 𝐻 − 𝐻 (𝑖) . Since 𝐺𝑖𝑗 =
(𝐴−1 )𝑖𝑗 = 0 if 𝑖 ≠ 𝑗, we immediately have
The second identity in (8.2) follows in the same way by using the second
identity in (8.4). To prove the identity (8.1), we let 𝐴 = 𝐻 (𝑘) − 𝑧 and 𝐵 =
𝐻 − 𝐻 (𝑘) . Then, from the first formula in (8.4), we have
where in the second step we used (8.1). We will compare (8.8) with the defining
equation of 𝑚sc :
1
(8.9) 𝑚sc = ,
−𝑧 − 𝑚sc
so we introduce the notation for the difference
𝑣𝑖 ∶= 𝐺𝑖𝑖 − 𝑚sc .
Recalling (6.1), we get the following system of self-consistent equations for 𝑣𝑖 :
1
(8.10) 𝑣𝑖 = − 𝑚sc ,
−𝑧 − 𝑚sc − (∑𝑘 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 )
where
𝐺𝑖𝑘 𝐺𝑘𝑖
Υ𝑖 ∶= 𝐴𝑖 + ℎ𝑖𝑖 − 𝑍𝑖 , 𝐴𝑖 ∶= ∑ 𝑠𝑖𝑘 ,
𝑘
𝐺𝑖𝑖
(8.11) (𝑖)
(𝑖)
𝑍𝑖 ∶= 𝑄𝑖 [∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 ].
𝑘,𝑙
All these quantities depend on 𝑧, but we omit it in the notation. We will show
that Υ is a small error term. This is clear about ℎ𝑖𝑖 by (6.27). The term 𝐴𝑖 will
be small since off-diagonal resolvent entries are small. Finally, 𝑍𝑖 will be small
by a large-deviation estimate (7.59) from Theorem 7.7.
8.2. SELF-CONSISTENT EQUATIONS ON TWO LEVELS 65
Before we present more details, we heuristically show the power of this new
system of self-consistent equations (8.10) and compare it with the single self-
consistent equation used in Chapter 7 and, in fact, used in all previous literature
on the resolvent method for Wigner matrices.
4
where we have used the defining equation (8.9) for 𝑚sc and used that |𝑚sc |≤1
in the error term. After rearranging (8.14), we get
2
(8.15) [(1 − 𝑚sc 𝑆)𝐯]𝑖 = ℰ𝑖 ∶=
2 3
2 3
− 𝑚sc Υ𝑖 + 𝑚sc (∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) + 𝑂((∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) ),
𝑘 𝑘
where we have also used the defining equation of 𝑚sc . If |𝑣𝑖 | = 𝑜(1), then we
2
expand 𝑣𝑖 to the second-order and multiply both sides by 𝑚sc to obtain
2 2 1 2
(8.17) [(1 − 𝑚sc 𝑆)𝐯]𝑖 = ℰ𝑖 ∶= −𝑚sc Υ𝑖 + 𝑣 + 𝑂(|𝑣𝑖 |3 ),
𝑚sc 𝑖
2
where we have estimated |𝑚sc | ≥ 𝑐 in the last term (see Lemma 6.2).
Notice that the definitions of ℰ𝑖 in (8.15) and (8.18) are different, although
2
their leading behavior −𝑚sc Υ𝑖 is the same. In both cases we can continue the
2
analysis by inverting the operator (1 − 𝑚sc 𝑆) to obtain
1 ‖ 1 ‖
(8.18) 𝐯 = ℰ, hence ‖𝐯‖∞ ≤ ‖‖ 2 ‖
‖ℰ‖∞ = Γ‖ℰ‖∞ ,
2
1 − 𝑚sc 𝑆 1 − 𝑚sc 𝑆 ‖∞→∞
and this relation shows how the quantity Γ, defined in (6.17), emerges. If the
error term is indeed small and Γ is bounded, then we obtain that ‖𝐯‖∞ =
max |𝐺𝑖𝑖 − 𝑚sc | is small.
While the expansion logic behind equations (8.14) and (8.17) is the same
and the resulting formulae are very similar, the structure of the main proof de-
pends on which linearization of the self-consistent equation is used. In both
cases we need to derive an a priori bound to ensure that the expansion is valid.
Intuitively, the smallness of ∑𝑘 𝑠𝑖𝑘 𝑣𝑘 −Υ𝑖 seems easier than that of 𝑣𝑖 , since both
terms are averaged quantities and extra averaging typically helps. But these are
random objects and every estimate comes with an exceptional set in the prob-
ability space where it does not hold. It turns out that on the technical level it
is worth minimizing the bookkeeping of these events, and this reason favors
the second version of the linearization, which operates with controlling a sin-
gle quantity, max𝑖 |𝑣𝑖 |. In this book, therefore, we will follow the linearization
(8.17). We remark that the other option was used in [69, 70], which required
first proving the weak semicircle law, Theorem 7.1, to provide the necessary
a priori bound. The linearization (8.17) circumvents this step and the a priori
bound on 𝑣𝑖 will be proved directly.
8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP 67
8.3. Proof of the Local Semicircle Law Without Using the Spectral Gap
In this section we prove a restricted version of Theorem 6.7; namely, we
replace threshold ˜𝜂 𝐸 with a larger threshold 𝜂𝐸 defined as
1 𝑀 −𝛾 𝑀 −2𝛾
𝜂𝐸 ∶= min{𝜂 ∶ ≤ min{ , }
𝑀𝜉 Γ(𝐸 + i𝜉)3 Γ(𝐸 + i𝜉)4 Im 𝑚sc (𝐸 + i𝜉)
(8.19)
holds for all 𝜉 ≥ 𝜂}.
This definition is exactly the same as (6.28), but Γ̃ is replaced with the larger
quantity Γ; in other words, we do not make use of the spectral gap in 𝑆. This
will pedagogically simplify the presentation, but it will prove the estimates in
Theorem 6.7 only for the 𝜂 ≥ 𝜂𝐸 regime. In Chapter 9 we will give the proof
for the entire 𝜂 ≥ ˜ 𝜂 𝐸 regime. We recall Lemma 6.3 showing that there is no
difference between Γ and Γ̃ for generalized Wigner matrices away from the
edges (both are of order 1), so readers interested in the local semicircle law
only in the bulk should be content with the simpler proof. Near the spectral
edges, however, there is a substantial difference. Note that even in the Wigner
case (see (6.22)), 𝜂𝐸 is much larger near the spectral edges than the optimal
threshold ˜𝜂 𝐸 . For generalized Wigner matrices, while ˜ 𝜂 𝐸 ≫ 𝑁1 , the threshold
𝜂𝐸 is determined by the relation 𝑁𝜂𝐸 (𝜅𝐸 + 𝜂𝐸 )3/2 ≫ 1, where 𝜅𝐸 = ||𝐸| − 2| is
the distance of 𝐸 from the spectral edges.
We stress, however, that the proof given below does not use any model-
specific upper bound on Γ, such as (6.22) or (6.24); only the trivial lower and
upper bounds, (6.19) and (6.20), are needed. The actual size of Γ enters only
implicitly by determining the threshold 𝜂𝐸 . This makes the argument applicable
to a wide class of problems beyond generalized Wigner matrices, including band
matrices; see [55].
Definition 8.4. A deterministic nonnegative function Ψ ≡ Ψ(𝑁) (𝑧) is
called an admissible control parameter if we have
(8.20) 𝑐𝑀 −1/2 ≤ Ψ ≤ 𝑀 −𝑐
for some constant 𝑐 > 0 and large enough 𝑁. Moreover, after fixing a 𝛾 > 0, we
call any (possibly 𝑁-dependent) subset
D = D(𝑁) ⊂ {𝑧 ∶ |𝐸| ≤ 10, 𝑀 −1 ≤ 𝜂 ≤ 10}
a spectral domain.
In this section we will mostly use the spectral domain
(8.21) S ∶= {𝑧 ∶ |𝐸| ≤ 10, 𝜂 ∈ [𝜂𝐸 , 10]}
where we note that
1 −1+𝛾
(8.22) 𝜂𝐸 ≥ 𝑀 ,
8
68 8. PROOF OF THE LOCAL SEMICIRCLE LAW
using the lower bound Γ ≥ 𝑐 from (6.19) in the definition (8.19). Define the
random control parameters
(8.23) Λo ∶= max|𝐺𝑖𝑗 |, Λd ∶= max|𝐺𝑖𝑖 − 𝑚sc |, Λ ∶= max(Λo , Λd ),
𝑖≠𝑗 𝑖
where the letters d and o refer to diagonal and off-diagonal elements. In the
typical regime that we will work, all these quantities are small. The key quantity
is Λ, and we will develop an iterative argument to control it. We first derive
an estimate of Λo + |Υ𝑖 | in terms of Λ. This will be possible only in the event
when Λ is already small, so we will need to introduce an indicator function
𝜙 = 1(Λ ≤ 𝑀 −𝑐 ) with some small 𝑐. More generally, we will consider any
indicator function 𝜙 so that 𝜙Λ ≺ 𝑀 −𝑐 . Notice that this is a somewhat weaker
concept than 𝜙Λ ≤ 𝑀 −𝑐 (even if the exponent 𝑐 is slightly adjusted), but it
turns out to be more flexible since algebraic manipulations involving ≺ (see
Proposition 6.5) can be directly used.
Im 𝑚sc + Λ
(8.24) 𝜙(Λo + |𝑍𝑖 | + |Υ𝑖 |) ≺
√ 𝑀𝜂
uniformly in 𝑧 ∈ 𝐃. Moreover, for any fixed (𝑁-independent) 𝜂 > 0 we have
(8.25) Λo + |𝑍𝑖 | + |Υ𝑖 | ≺ 𝑀 −1/2
uniformly in 𝑧 ∈ {𝑤 ∈ 𝐃 ∶ Im 𝑤 = 𝜂}.
In other words, (8.24) means that
Im 𝑚sc + Λ
(8.26) Λo + |𝑍𝑖 | + |Υ𝑖 | ≺
√ 𝑀𝜂
on the event where Λ ≺ 𝑀 −𝑐 has been a priori established.
We begin with the first statement in Lemma 8.5. First we estimate 𝑍𝑖 , which
we split as
| (𝑖) 2 (𝑖) | | (𝑖) (𝑖) |
(8.29) 𝜙|𝑍𝑖 | ≤ 𝜙|∑(|ℎ𝑖𝑘 | − 𝑠𝑖𝑘 )𝐺𝑘𝑘 | + 𝜙| ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 |.
|𝑘 | |𝑘≠𝑙 |
We estimate each term using Theorem 7.7 by conditioning on 𝐺 (𝑖) and using
the fact that the family (ℎ𝑖𝑘 )𝑁 (𝑖)
𝑘=1 is independent of 𝐺 . By (7.57) the first term
of (8.29) is stochastically dominated by
(𝑖) 1/2
2 | (𝑖) |2
𝜙[∑ 𝑠𝑖𝑘 𝐺𝑘𝑘 ] ≺ 𝑀 −1/2 ,
𝑘
where (8.28), (6.3), and (6.1) were used. For the second term of (8.29) we apply
1/2 (𝑖) 1/2 −1/2
(7.59) from Theorem 7.7 with 𝑎𝑘𝑙 = 𝑠𝑖𝑘 𝐺𝑘𝑙 𝑠𝑙𝑖 and 𝑋𝑘 = 𝑠𝑖𝑘 ℎ𝑖𝑘 . We find
(𝑖) 2 (𝑖) 2
(𝑖) 𝜙 (𝑖)
𝜙 ∑ 𝑠𝑖𝑘 ||𝐺𝑘𝑙 || 𝑠𝑙𝑖 ≤ ∑ 𝑠𝑖𝑘 ||𝐺𝑘𝑙 ||
𝑘,𝑙
𝑀 𝑘,𝑙
(8.30)
(𝑖)
𝜙 (𝑖) Im 𝑚sc + Λ
= ∑ 𝑠𝑖𝑘 Im 𝐺𝑘𝑘 ≺ ,
𝑀𝜂 𝑘 𝑀𝜂
where in the last step we used (8.1) and the estimate 1/𝐺𝑖𝑖 ≺ 1. Thus, we get
Im 𝑚sc + Λ
(8.31) 𝜙|𝑍𝑖 | ≺ ,
√ 𝑀𝜂
where we absorbed the bound 𝑀 −1/2 on the first term of (8.29) into the right-
hand side of (8.31). Here we only needed to use Im 𝑚sc (𝑧) ≥ 𝑐𝜂 as follows from
an explicit estimate; see (6.13).
Next, we estimate Λo . We can iterate (8.2) once to get, for 𝑖 ≠ 𝑗,
(𝑖) (𝑖𝑗)
(𝑖) (𝑖) (𝑖𝑗)
(8.32) 𝐺𝑖𝑗 = −𝐺𝑖𝑖 ∑ ℎ𝑖𝑘 𝐺𝑘𝑗 = −𝐺𝑖𝑖 𝐺𝑗𝑗 (ℎ𝑖𝑗 − ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑗 ).
𝑘 𝑘,𝑙
−1/2
The term ℎ𝑖𝑗 is trivially 𝑂≺ (𝑀 ). In order to estimate the other term, we
1/2 1/2 (𝑖𝑗)
−1/2
invoke (7.58) from Theorem 7.7 with 𝑎𝑘𝑙 = 𝑠𝑖𝑘 𝐺𝑘𝑙 𝑠𝑙𝑗 , 𝑋𝑘 = 𝑠𝑖𝑘 ℎ𝑖𝑘 , and
−1/2
𝑌𝑙 = 𝑠𝑙𝑗 ℎ𝑙𝑗 . As in (8.30), we find
(𝑖) 2
(𝑖𝑗) Im 𝑚sc + Λ
𝜙 ∑ 𝑠𝑖𝑘 ||𝐺𝑘𝑙 || 𝑠𝑙𝑗 ≺ ,
𝑘,𝑙
𝑀𝜂
and thus
Im 𝑚sc + Λ
(8.33) 𝜙Λo ≺ ,
√ 𝑀𝜂
where we again absorbed the term ℎ𝑖𝑗 ≺ 𝑀 −1/2 into the right-hand side.
70 8. PROOF OF THE LOCAL SEMICIRCLE LAW
Proof. The core of the proof is a continuity argument. The first task is to
establish a gap in the range of Λ by establishing a dichotomy. Roughly speaking,
the following lemma asserts that, for all 𝑧 ∈ 𝐒, with high probability either
Λ ≤ 𝑀 −𝛾/2 Γ−1 or Λ ≥ 𝑀 −𝛾/4 Γ−1 ; i.e., there is a gap or forbidden region in the
range of Λ with very high probability.
Lemma 8.8. We have the bound
𝟏(Λ ≤ 𝑀 −𝛾/4 Γ−1 )Λ ≺ 𝑀 −𝛾/2 Γ−1
uniformly in 𝐒.
Proof of Lemma 8.8 . Set
𝜙 ∶= 𝟏(Λ ≤ 𝑀 −𝛾/4 Γ−1 ).
Then by definition we have 𝜙Λ ≤ 𝑀 −𝛾/4 Γ−1 ≤ 𝐶𝑀 −𝛾/4 , where in the last step
we have used that Γ is bounded below (6.19). Hence we may invoke (8.24) to
estimate Λo and Υ𝑖 by √(Im 𝑚sc + Λ)/𝑀𝜂. In order to estimate Λd , we use (8.18)
to get
Im 𝑚sc + Λ
(8.38) 𝜙Λd = 𝜙 max|𝑣𝑖 | ≺ Γ(Λ2 + ).
𝑖 √ 𝑀𝜂
Recalling (6.19) and (8.24), we therefore get
Im 𝑚sc + Λ
(8.39) 𝜙Λ ≺ 𝜙Γ(Λ2 + ).
√ 𝑀𝜂
Next, by definition of 𝜙, we may estimate
𝜙ΓΛ2 ≤ 𝑀 −𝛾/2 Γ−1 .
Moreover, by definition of 𝐒, we have
1 𝑀 −𝛾 𝑀 −2𝛾
(8.40) ≤ min{ 3 , 4 }.
𝑀𝜂 Γ Γ Im 𝑚sc
Together with the definition of 𝜙, we have
Figure 8.1. The (𝜂, Λ)-plane for a fixed 𝐸 with the graph of 𝜂 →
Λ(𝐸 + i𝜂). The shaded region is forbidden with high probability
by Lemma 8.8. The initial estimate, Lemma 8.6, is marked with
a black dot. The graph of Λ = Λ(𝐸 + i𝜂) is continuous and
lies beneath the shaded region. Note that this method does not
control Λ(𝐸 + 𝑖𝜂) in the regime 𝜂 ≤ 𝜂𝐸 .
If we knew that Λ is excluded from the interval [𝑀 −𝛾/2 Γ−1 , 𝑀 −𝛾/4 Γ−1 ], then
we could immediately finish the proof of Proposition 8.7. We could argue that
Λ = Λ(𝐸 + 𝑖𝜂) is continuous in 𝜂 = Im 𝑧 and hence cannot jump from one side
of the gap to the other; moreover, for 𝜂 = 2 it is below the gap by Lemma 8.6, so
Λ is below the gap for all 𝑧 ∈ 𝐒 with high probability. For a pictorial illustration
of this argument, see Figure 8.1 (borrowed from [55]).
However, Lemma 8.8 guarantees a gap in the range of Λ only with a very
high probability for each fixed 𝑧. We need to use a fine discrete grid in the
space of 𝑧 to upgrade this statement to all 𝑧 with high probability. Then the
continuity argument described in the previous paragraph will be valid with high
probability. In the next step we explain the details.
The continuity argument. Fix 𝐷 > 10. Lemma 8.8 implies that for each 𝑧 ∈ 𝐒
the probability that Λ falls into the gap (or the forbidden region) is very small;
i.e., we have
Typical example weights are 𝑡𝑖𝑘 = 𝑠𝑖𝑘 and 𝑡𝑖𝑘 = 𝑁 −1 . Note that in both of these
cases 𝑇 commutes with 𝑆.
Lemma 8.9 (Fluctuation averaging). Fix a spectral domain 𝐃 and a deter-
ministic control parameter Ψ satisfying (8.20). Let the weights 𝑇 = (𝑡𝑖𝑘 ) satisfy
(8.45).
(i) If Λ ≺ Ψ, then we have
(8.46) ∑ 𝑡𝑖𝑘 𝑄𝑘 𝐺𝑘𝑘 = 𝑂≺ (Ψ2 ).
𝑘
74 8. PROOF OF THE LOCAL SEMICIRCLE LAW
then
(8.50) ∑ 𝑡𝑖𝑘 (𝑣𝑘 − [𝑣]) = 𝑂≺ ( Γ̃ Ψ2 ) for all 𝑖
𝑘
Proof. The claim easily follows from the Schur complement formula (8.7)
written in the form
1
Υ𝑖 = 𝐴𝑖 + 𝑄𝑖 .
𝐺𝑖𝑖
We may therefore estimate ∑𝑘 𝑡𝑎𝑘 Υ𝑘 using the trivial bound |𝐴𝑖 | ≺ Ψ2𝑜 as well
as the fluctuation averaging bound from (8.47). □
8.3.5. The Final Iteration Scheme. First note that Proposition 8.7 guar-
antees that 𝜙 ≡ 1 may be chosen in Lemma 8.5, since the condition 𝜙Λ ≺ 𝑀 −𝑐
is satisfied. Therefore, Lemma 8.5 asserts that Λo is stochastically dominated
by
Im 𝑚sc + Λ Im 𝑚sc 𝑀 𝜀
≤ 𝑀 −𝜀 Λ + + ,
√ 𝑀𝜂 √ 𝑀𝜂 𝑀𝜂
where we have used the Schwarz inequality. The next lemma, the main estimate
behind the proof of Theorem 6.7, extends this estimate to bound Λd with the
same quantity. Thus, roughly speaking, we can estimate Λ by 𝑀 −𝜀 Λ plus a
deterministic error term. This gives a recursive relation on the upper bound
for Λ, which will be the basic step of our iteration scheme.
8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP 75
Im 𝑚sc 𝑀 𝜀
𝐹(Ψ) ∶= 𝑀 −𝜀 Ψ + + .
√ 𝑀𝜂 𝑀𝜂
Im 𝑚sc + Λ Im 𝑚sc + Ψ
(8.53) Λo + |𝑍𝑖 | + |Υ𝑖 | ≺ ≺ .
√ 𝑀𝜂 √ 𝑀𝜂
Next, we estimate Λd . Define the 𝑧-dependent indicator function
𝜓 ∶= 𝟏(Λ ≤ 𝑀 −𝛾/4 ).
By (8.51), (6.19), and the assumption Λ ≺ Ψ, we have 1 − 𝜓 ≺ 0. On the event
{𝜓 = 1}, (8.17) is rigorous and we get the bound
Using the fluctuation averaging estimate (8.48) to bound ∑𝑘 𝑠𝑖𝑘 𝑣𝑘 and (8.53) to
bound Υ𝑖 , we find
Im 𝑚sc + Ψ
(8.54) 𝜓|𝑣𝑖 | ≺ ΓΨ2 + ,
√ 𝑀𝜂
where we used the assumption Λ ≺ Ψ and the lower bound from (6.19) so
that 𝐶𝜓Λ2 ≤ ΓΨ2 . Since the set {𝜓 = 0} has very small probability (using our
notation, this is expressed by 1 − 𝜓 ≺ 0), we conclude
Im 𝑚sc + Ψ
(8.55) Λd ≺ ΓΨ2 + ,
√ 𝑀𝜂
which, combined with (8.53), yields
Im 𝑚sc + Ψ
(8.56) Λ ≺ ΓΨ2 + .
√ 𝑀𝜂
Using the Schwarz inequality and the assumption Ψ ≤ 𝑀 −𝛾/3 Γ−1 , we conclude
the proof. □
76 8. PROOF OF THE LOCAL SEMICIRCLE LAW
Im 𝑚sc 𝑀 𝜀
Λ≺ + .
√ 𝑀𝜂 𝑀𝜂
Im 𝑚sc (𝑧) 1
(8.57) Λ≺Π= + ,
√ 𝑀𝜂 𝑀𝜂
which is (6.31).
What remains is to prove (6.32). On the set {𝜓 = 1}, we once again use (8.17)
to get
2
(8.58) 𝜓𝑚sc (− ∑ 𝑠𝑖𝑘 𝑣𝑘 + Υ𝑖 ) = −𝜓𝑣𝑖 + 𝑂(𝜓Λ2 ).
𝑘
2
𝜓[𝑣] = 𝑚sc 𝜓[𝑣] + 𝑂≺ (Π2 ).
2
Since 1 − 𝜓 ≺ 0, we conclude that [𝑣] = 𝑚sc [𝑣] + 𝑂≺ (Π2 ). Therefore,
Π2 Im 𝑚sc 1 2
|[𝑣]| ≺ 2
≤( 2
+ 2
)
|1 − 𝑚sc | |1 − 𝑚sc | |1 − 𝑚sc |𝑀𝜂 𝑀𝜂
(8.60)
Γ 2 𝐶
≤ (𝐶 + ) ≤ .
𝑀𝜂 𝑀𝜂 𝑀𝜂
2
In the third step, we used the elementary explicit bound Im 𝑚sc ≤ 𝐶|1 − 𝑚sc |
2 −1
from (6.12)–(6.13) and the bound Γ ≥ |1 − 𝑚sc | from (6.21). Because |𝑚𝑁 −
𝑚sc | = |[𝑣]|, this concludes the proof of (6.32). The proof of (6.33) is exactly
the same, just in the third inequality of (8.60) we may use the stronger bound
Im 𝑚sc ≍ 𝜂/√𝜅 + 𝜂 from (6.13) in the regime |𝐸| ≥ 2. This completes the proof
of Theorem 6.7 in the entire regime 𝐒, i.e., for 𝜂 ≥ 𝜂𝐸 .
8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP 77
Im 𝑚sc + Λ
(8.62) Λo + |𝑍𝑖 | + |Υ𝑖 | ≺ .
√ 𝑀𝜂
(3) An initial estimate on Λ for large 𝜂, where the a priori bounds |𝐺𝑖𝑗 | ≤
𝜂−1 ≤ 𝐶 are effective.
(4) A dichotomy, showing that there is a forbidden region for Λ:
𝟏(Λ ≤ 𝑀 −𝛾/4 Γ−1 )Λ ≺ 𝑀 −𝛾/2 Γ−1 .
(5) A crude bound on Λ down to small 𝜂 obtained from the dichotomy and
the initial estimate via a continuity argument.
(6) Application of the fluctuation averaging lemma, i.e., Lemma 8.9, to es-
timate 𝑆𝐯 in the self-consistent equation. Thus 𝑆𝐯 becomes of order Λ2 , i.e.,
one order higher than the trivial bound |𝑣𝑖 | ≤ Λ. This “boost” exploits the can-
cellations in averages of weakly correlated centered random variables, and it
constitutes the crucial improvement over Chapter 7. Thus, we can use (8.17) to
have
(8.63) 𝑣𝑖 ≲ Υ𝑖 + 𝑂(Λ2 ).
(7) Combination of the large-deviation bound (8.62) for Υ𝑖 with (8.63) pro-
vides an estimate on Λd = max𝑖 |𝑣𝑖 | in terms of Λ that is better than the trivial
bound Λd ≤ Λ. Together with the large-deviation bound on Λo in (8.62), we
have a closed inequality for Λ:
Im 𝑚sc 𝑀 𝜀
Λ ≤ 𝑀 −𝜀 Λ + + .
√ 𝑀𝜂 𝑀𝜂
The error term 𝑀 −𝜀 Λ can be absorbed into the left side, and we have proved
the estimate on Λ asserted in Theorem 6.7. In practice, the last inequality was
formulated in terms of a control parameter and we used an iteration scheme.
78 8. PROOF OF THE LOCAL SEMICIRCLE LAW
In Section 8.3 we proved the local semicircle law, Theorem 6.7, uniformly
for 𝜂 ≥ 𝜂𝐸 instead of the larger regime 𝜂 ≥ ˜ 𝜂 𝐸 . Recall that for generalized
Wigner matrices the two thresholds 𝜂𝐸 and ˜ 𝜂 𝐸 are determined by the relation
𝑁𝜂𝐸 (𝜅𝐸 + 𝜂𝐸 )3/2 ≫ 1 and ˜ 𝜂 𝐸 ≫ 1/𝑁. Hence these two thresholds coincide
in the bulk but substantially differ near the edge. In this section we sketch
the proof of Theorem 6.7 for any 𝜂 ≥ ˜ 𝜂 𝐸 , i.e., for the optimal domain for 𝜂.
For the complete proof we refer to section 6 of [55]. This section can be read
independently of Section 8.3.
We point out that the difference between these two thresholds stem from
the difference between Γ and Γ̃ (see (6.28) and (8.19)). The bound Γ on the
norm of (1 − 𝑚2 𝑆)−1 entered the proof when the self-consistent equation (8.17)
was solved. The key idea in this section is that we can use Γ̃ to solve the self-
consistent equation (8.17) separately on the subspace of constants (the span of
the vector e) and on its orthogonal complement e⟂ .
Step 1. We bound the control parameter Λ = Λ(𝑧) from (8.23) in terms of
Θ ∶= |𝑚𝑁 − 𝑚sc |. This is the content of the following lemma. From now on,
we assume that 𝑐 ≤ Γ̃ ≤ 𝐶, which is valid for generalized Wigner matrices (see
(6.24)). This will simplify several estimates in the argument that follows; for the
general case, we refer to [55].
Lemma 9.1. Define the 𝑧-dependent indicator function
(9.1) 𝜙 ∶= 𝟏(Λ ≤ 𝑀 −𝛾/4 )
and the random control parameter
Im 𝑚sc + Θ 1
(9.2) 𝑞(Θ) ∶= + , Θ = |𝑚𝑁 − 𝑚sc |.
√ 𝑀𝜂 𝑀𝜂
Then we have
(9.3) 𝜙Λ ≺ Θ + 𝑞(Θ).
Proof. For the whole proof we work on the event {𝜙 = 1}; i.e., every quan-
tity is multiplied by 𝜙. We consistently drop these factors 𝜙 from our notation in
order to avoid cluttered expressions. In particular, we set Λ ≤ 𝑀 −𝛾/4 through-
out the proof.
79
80 9. SKETCH OF THE PROOF OF THE LOCAL SEMICIRCLE LAW
Im 𝑚sc + Λ
(9.4) Λo + |Υ𝑖 | ≺ 𝑟(Λ), 𝑟(Λ) ∶= .
√ 𝑀𝜂
In order to estimate Λd , from (8.17) we have, on the event {𝜙 = 1}, that
2
(9.5) 𝑣𝑖 − 𝑚sc ∑ 𝑠𝑖𝑘 𝑣𝑘 = 𝑂≺ (Λ2 + 𝑟(Λ));
𝑘
here we used the bound (9.4) on |Υ𝑖 |. Next, we subtract the average 𝑁 −1 ∑𝑖 from
each side to get
2
(𝑣𝑖 − [𝑣]) − 𝑚sc ∑ 𝑠𝑖𝑘 (𝑣𝑘 − [𝑣]) = 𝑂≺ (Λ2 + 𝑟(Λ))
𝑘
where we used ∑𝑘 𝑠𝑖𝑘 = 1. Note that the average over 𝑖 of the left-hand side
vanishes, so that the average of the right-hand side also vanishes. Hence, the
2
right-hand side is perpendicular to 𝐞. Inverting the operator 1 − 𝑚sc 𝑆 on the
⟂
subspace 𝐞 and using Γ̃ ≤ 𝐶 therefore yields
(9.6) |𝑣𝑖 − [𝑣]| ≺ Λ2 + 𝑟(Λ).
Combining this with the bound Λo ≺ 𝑟(Λ) from (9.4) and recalling that Θ =
|[𝑣]|, we therefore get
(9.7) Λ ≺ Θ + Λ2 + 𝑟(Λ).
By definition of 𝜙 we have Λ2 ≤ 𝑀 −𝛾/4 Λ, so that the second term on the right-
hand side of (9.7) may be absorbed into the left-hand side,
(9.8) Λ ≺ Θ + 𝑟(Λ),
where we used the cancellation property (6.25). Using (9.8) and the Cauchy-
Schwarz inequality, we get
Considering the right-hand side as a small error, we will view (9.10) as a small
perturbation of a quadratic equation for [𝑣]. Recalling that Θ = |[𝑣]|, the error
term can be determined self-consistently. Thus, up to the accuracy of the error
terms, we can solve (9.10) for [𝑣].
We now give a heuristic proof for (9.10). There are two main issues that we
ignore in this sketch. First, we work only on the event {𝜙 = 1}; i.e., we assume
that an a priori bound Λ ≪ 1 has already been proved uniformly for all 𝜂 ≥ ˜ 𝜂 𝐸.
It will require a separate argument to show that the complement event {𝜙 = 0}
is negligible and, in some sense, this constitutes the essential part to extend the
proof of Theorem 6.7 from 𝜂 ≥ 𝜂𝐸 to the larger regime 𝜂 ≥ ˜ 𝜂 𝐸 . Second, we will
neglect certain subtleties of the ≺ relation. While most arithmetics involving ≺
work in the same way as the usual inequality relation ≤, the cancellation rule
(6.25) requires the coefficient of 𝑋 on the right-hand side to be small. We will
disregard this requirement below and we will treat ≺ as the usual ≤.
Since we work on the event {𝜙 = 1}, it is understood that every quantity
below is multiplied by 𝜙, but we will ignore this fact in the formulas. Recall
from (8.17) that
2 2 −1 2
(9.11) 𝑣𝑖 − 𝑚sc ∑ 𝑠𝑖𝑘 𝑣𝑘 + 𝑚sc Υ𝑖 = 𝑚sc 𝑣𝑖 + 𝑂(Λ3 ).
𝑘
In order to take the average over 𝑖 and get a closed equation for [𝑣], we write,
using (9.6),
Plugging this back into (9.11) and taking the average over 𝑖 gives
2 2 −1
(1 − 𝑚sc )[𝑣] + 𝑚sc [Υ] = 𝑚sc [𝑣]2 + 𝑂≺ (Λ3 + 𝑟(Λ)2 ).
1
By Corollary 8.10 with 𝑡𝑖𝑘 = 𝑁
, we can estimate
Indeed, from the Schur complement formula (8.7) we get |𝑄𝑘 (𝐺𝑘𝑘 )−1 | ≤ |ℎ𝑘𝑘 |+
|𝑍𝑘 |. The first term is estimated by |ℎ𝑘𝑘 | ≺ 𝑀 −1/2 ≤ Ψo . The second term is
estimated exactly as in (8.29) and (8.30), giving |𝑍𝑘 | ≺ Ψo . In fact, the same
(𝕋)
bound holds if 𝐺𝑘𝑘 is replaced with 𝐺𝑘𝑘 as long as the cardinality of 𝕋 is bounded
(see (10.10) later).
Abbreviate 𝑋𝑘 ∶= 𝑄𝑘 (𝐺𝑘𝑘 )−1 and compute the variance:
2
(10.3) 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || = ∑ 𝑡𝑖𝑘 𝑡𝑖𝑙 𝔼𝑋𝑘 𝑋𝑙 = ∑ 𝑡𝑖𝑘
2
𝔼𝑋𝑘 𝑋𝑘 + ∑ 𝑡𝑖𝑘 𝑡𝑖𝑙 𝔼𝑋𝑘 𝑋𝑙 .
𝑘 𝑘,𝑙 𝑘 𝑘≠𝑙
Using the bounds (8.45) on 𝑡𝑖𝑘 and (10.2), we find that the first term on the
right-hand side of (10.3) is 𝑂≺ (𝑀 −1 Ψ2o ) = 𝑂≺ (Ψ4o ), where we used that Ψo is
admissible, recalling (8.20). Let us therefore focus on the second term of (10.3).
Using the fact that 𝑘 ≠ 𝑙, we apply the first resolvent decoupling formula (8.1)
to 𝑋𝑘 and 𝑋𝑙 to get
1 1
𝔼𝑋𝑘 𝑋𝑙 = 𝔼𝑄𝑘 ( )𝑄𝑙 ( )
𝐺𝑘𝑘 𝐺𝑙𝑙
(10.4)
1 𝐺𝑘𝑙 𝐺𝑙𝑘 1 𝐺𝑙𝑘 𝐺𝑘𝑙
= 𝔼𝑄𝑘 ( − )𝑄𝑙 ( − ).
(𝑙) (𝑙) (𝑘) (𝑘)
𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑙𝑙 𝐺𝑙𝑙 𝐺𝑙𝑙 𝐺𝑙𝑙 𝐺𝑘𝑘
Notice that we used (8.1) in the form
(𝕋) (𝕋)
1 1 𝐺𝑘𝑙 𝐺𝑙𝑘
(10.5) = − for any 𝑘 ≠ 𝑙, 𝑘, 𝑙 ∉ 𝕋
(𝕋) (𝕋𝑙) (𝕋) (𝕋𝑙) (𝕋)
𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑙𝑙
with 𝕋 = ∅. We multiply out the parentheses on the right-hand side of (10.4).
The crucial observation is that if the random variable 𝑌 is independent of 𝑖 (see
Definition 7.3) then not only 𝑄𝑖 𝑌 = 0 but also 𝔼𝑄𝑖 (𝑋)𝑌 = 𝔼𝑄𝑖 (𝑋𝑌) = 0 hold
for any 𝑋. Hence out of the four terms obtained from the right-hand side of
(10.4), the only nonvanishing one is
This lemma is a generalization of (6.26), but notice that the majoring quan-
tity Ψ has to be deterministic. For general random variables, 𝑋 ≺ 𝑌 may not
imply the analogous relation 𝔼[𝑋|ℱ] ≺ 𝔼[𝑌|ℱ] for conditional expectations
with respect to a 𝜎-algebra ℱ.
for arbitrary 𝐷 > 0. The first claim therefore follows by choosing 𝐷 large
enough. For the converse statement, we use the Chebyshev inequality: for any
𝜀 > 0 and 𝐷 > 0,
𝔼𝑋 𝑛
ℙ(𝑋 ≥ 𝑁 𝜀 Ψ) ≤ ≤ 𝑁 𝜀−𝜀𝑛 ≤ 𝑁 −𝐷
𝑁 𝜀𝑛 Ψ𝑛
by choosing 𝑛 large enough.
Finally, the claim (10.7) follows from Chebyshev’s inequality, using a high-
moment estimate combined with Jensen’s inequality for partial expectation. We
omit the details, which are similar to those of the first claim. □
We now start the proof of Lemma 8.9. The main statement is (8.47), the
other bounds will be relatively simple consequences. Finally, we present an
alternative argument for (8.47) that organizes the stopping rule in the expansion
somewhat differently.
uniformly for 𝕋 ⊂ {1, … , ℕ}, |𝕋| ≤ 𝑝, and 𝑘 ∉ 𝕋. To simplify notation, for the
proof we set 𝕋 = ∅; the proof for nonempty 𝕋 is the same. From the Schur
complement formula (8.7), we get |𝑄𝑘 (𝐺𝑘𝑘 )−1 | ≤ |ℎ𝑘𝑘 | + |𝑍𝑘 |. The first term is
estimated by |ℎ𝑘𝑘 | ≺ 𝑀 −1/2 ≤ Ψo . The second term is estimated exactly as in
(8.29) and (8.30):
(𝑘) 1/2
(𝑘) 2
|𝑍𝑘 | ≺ ( ∑ 𝑠𝑘𝑥 |𝐺𝑥𝑦 | 𝑠𝑦𝑘 ) ≺ Ψo
𝑥≠𝑦
10.2. PROOF OF LEMMA 8.9 87
(𝑘)
where in the last step we used that |𝐺𝑥𝑦 | ≺ Ψo as follows from (10.8) and the
bound 1/|𝐺𝑘𝑘 | ≺ 1 (recall that Λ ≺ Ψ ≤ 𝑀 −𝑐 ). This concludes the proof of
(10.10).
Abbreviate 𝑋𝑘 ∶= 𝑄𝑘 (𝐺𝑘𝑘 )−1 . We shall estimate ∑𝑘 𝑡𝑖𝑘 𝑋𝑘 in probability
2𝑝
by estimating its 𝑝th moment by Ψo , from which the claim (8.47) will easily
follow using Chebyshev’s inequality.
After this preparation, the rest of the proof for (8.47) can be divided into four
steps.
Step 1. Coincidence structure in the expansion of the 𝐿𝑝 norm. Fix some even
integer 𝑝 and write
𝑝
| |
𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || = ∑ 𝑡𝑖𝑘1 … 𝑡𝑖𝑘𝑝/2 𝑡 𝑖𝑘𝑝/2+1 … 𝑡𝑖𝑘𝑝 𝔼𝑋𝑘1 ⋯ 𝑋𝑘𝑝/2 𝑋 𝑘𝑝/2+1 ⋯ 𝑋 𝑘𝑝 .
𝑘 𝑘1 ,…,𝑘𝑝
Next, we regroup the terms in the sum over 𝐤 ∶= (𝑘1 , … , 𝑘𝑝 ) according to the
coincidence structure in 𝐤 as follows: given a sequence of indices 𝐤, define the
partition 𝒫(𝐤) of {1, … , 𝑝} by the equivalence relation 𝑟 ∼ 𝑠 if and only if 𝑘𝑟 = 𝑘𝑠 .
Denote the set of all partitions of {1, … , 𝑝} by 𝔓𝑝 . Then, we write
𝑝
| |
(10.11) 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || = ∑ ∑ 𝑡𝑖𝑘1 … 𝑡𝑖𝑘𝑝/2 𝑡 𝑖𝑘𝑝/2+1 … 𝑡 𝑖𝑘𝑝 𝟏(𝒫(𝐤) = 𝑃)𝑉(𝐤)
𝑘 𝑃∈𝔓𝑝 𝐤
where we defined
𝑉(𝐤) ∶= 𝔼𝑋𝑘1 … 𝑋𝑘𝑝/2 𝑋 𝑘𝑝/2+1 … 𝑋 𝑘𝑝 .
Given a partition 𝑃, for any 𝑟 ∈ {1, … , 𝑝}, we denote by [𝑟] the block of 𝑟 in 𝑃,
i.e., the set of all indices in the same block of the partition as 𝑟. Let 𝐿 ≡ 𝐿(𝑃) ∶=
{𝑟 ∶ [𝑟] = {𝑟}} ⊂ {1, … , 𝑝} be the set of lone labels. We denote by 𝐤𝐿 ∶= (𝑘𝑟 )𝑟∈𝐿
the summation indices associated with lone labels.
Step 2. Resolution of dependence in weakly dependent random variables. The
resolvent entry 𝐺𝑘𝑘 depends strongly on the randomness in the 𝑘-column of 𝐻,
but only weakly on the randomness in the other columns. We conclude that if
𝑟 is a lone label, then all factors 𝑋𝑘𝑠 with 𝑠 ≠ 𝑟 in 𝑉(𝐤) depend weakly on the
randomness in the 𝑘𝑟 th column of 𝐻 (if 𝑟 is not a lone label, then this statement
holds only for “all factors 𝑋𝑘𝑠 with 𝑘𝑠 ≠ 𝑘𝑟 ”). Thus, the idea is to make all
resolvent entries inside the expectation of 𝑉(𝐤) as independent of the indices
𝐤𝐿 as possible (see Definition 7.3), using the first decoupling resolvent identity
(8.1): for 𝑥, 𝑦, 𝑢 ∉ 𝕋 and 𝑥, 𝑦 ≠ 𝑢 (𝑥 can be equal to 𝑦),
(𝕋) (𝕋)
(𝕋) (𝕋𝑢) 𝐺𝑥𝑢 𝐺𝑢𝑦
(10.12) 𝐺𝑥𝑦 = 𝐺𝑥𝑦 +
(𝕋)
𝐺𝑢𝑢
and for 𝑥, 𝑢 ∉ 𝕋 and 𝑥 ≠ 𝑢
(𝕋) (𝕋)
1 1 𝐺𝑥𝑢 𝐺𝑢𝑥
(10.13) = − .
(𝕋) (𝕋𝑢) (𝕋) (𝕋𝑢) (𝕋)
𝐺𝑥𝑥 𝐺𝑥𝑥 𝐺𝑥𝑥 𝐺𝑥𝑥 𝐺𝑢𝑢
88 10. FLUCTUATION AVERAGING MECHANISM
(𝕋)
Definition 10.3. A resolvent entry 𝐺𝑥𝑦 with 𝑥, 𝑦 ∉ 𝕋 is maximally ex-
panded with respect to a set 𝐵 ⊂ {1, … 𝑁} if 𝐵 ⊂ 𝕋 ∪ {𝑥, 𝑦}.
(𝕋)
Given the set 𝐤𝐿 of lone indices, we say that a resolvent entry 𝐺𝑥𝑦 is maxi-
mally expanded if it is maximally expanded with respect to the set 𝐵 = 𝐤𝐿 . The
motivation behind this definition is that using (8.1) we cannot add upper indices
from the set 𝐤𝐿 to a maximally expanded resolvent entry. We shall apply (8.1) to
all resolvent entries in 𝑉(𝐤). In this manner we generate a sum of monomials
consisting of off-diagonal resolvent entries and inverses of diagonal resolvent
entries. We can now repeatedly apply (8.1) to each factor until either they are
all maximally expanded or a sufficiently large number of off-diagonal resolvent
entries has been generated. The cap on the number of off-diagonal entries is
introduced to ensure that this procedure terminates after a finite number of
steps.
In order to define the precise algorithm, let 𝒜 denote the set of monomials
(𝕋)
in the off-diagonal entries 𝐺𝑥𝑦 , with 𝕋 ⊂ 𝐤𝐿 , 𝑥 ≠ 𝑦, and 𝑥, 𝑦 ∈ 𝐤 ⧵ 𝕋, as well as
(𝕋)
the inverse diagonal entries 1/𝐺𝑥𝑥 with 𝕋 ⊂ 𝐤𝐿 and 𝑥 ∈ 𝐤 ⧵ 𝕋. Starting from
𝑉(𝐤), the algorithm will recursively generate sums of monomials in 𝒜. Let 𝑑(𝐴)
denote the number of off-diagonal entries in 𝐴 ∈ 𝒜. For 𝐴 ∈ 𝒜 we shall define
𝑤0 (𝐴), 𝑤1 (𝐴) ∈ 𝒜 satisfying
𝐴 = 𝑤0 (𝐴) + 𝑤1 (𝐴),
𝑑(𝑤0 (𝐴)) = 𝑑(𝐴),
𝑑(𝑤1 (𝐴)) ≥ max{2, 𝑑(𝐴) + 1}.
The idea behind this splitting is to use (8.1) on one entry of 𝐴; the first term on
the right-hand side of (8.1) gives rise to 𝑤0 (𝐴) and the second to 𝑤1 (𝐴). The
precise definition of the algorithm applied to 𝐴 ∈ 𝒜 is as follows:
(1) If all factors of 𝐴 are maximally expanded or 𝑑(𝐴) ≥ 𝑝 + 1, then stop
the expansion of 𝐴.
(2) Otherwise, choose some (arbitrary) factor of 𝐴 that is not maximally
(𝕋)
expanded. If this entry is off-diagonal, 𝐺𝑥𝑦 , we use (10.12) to de-
(𝕋)
compose 𝐺𝑥𝑦 into a sum of two terms with 𝑢 the smallest element
(𝕋)
in 𝐤𝐿 ⧵ (𝕋 ∪ {𝑥, 𝑦}). If the chosen entry is diagonal, 1/𝐺𝑥𝑥 , we use
(𝕋)
(10.13) to decompose 𝐺𝑥𝑥 into two terms with 𝑢 the smallest element
in 𝐤𝐿 ⧵ (𝕋 ∪ {𝑥}). The choice of 𝑢 to be the smallest element is not
important, we just chose it for definiteness of the algorithm. From the
(𝕋) (𝕋)
splitting of the factor 𝐺𝑥𝑦 or 1/𝐺𝑥𝑥 in the monomial 𝐴, we obtain a
(𝕋) (𝕋)
splitting 𝐴 = 𝑤0 (𝐴) + 𝑤1 (𝐴) (i.e., we replace 𝐺𝑥𝑦 or 1/𝐺𝑥𝑥 by the
right-hand sides of (10.12) or (10.13)).
It is clear that (10.14) holds with the algorithm just defined. In fact, in most
cases the last inequality in (10.14) is an equality, 𝑑(𝑤1 (𝐴)) = max{2, 𝑑(𝐴) + 1}.
The only exception is when (10.12) is used for 𝑥 = 𝑦, then two new off-diagonal
entries are added, i.e., 𝑑(𝑤1 (𝐴)) = 𝑑(𝐴) + 2. Notice also that this algorithm
10.2. PROOF OF LEMMA 8.9 89
where 𝑏(𝜎) is the number ones in the string 𝜎. Indeed, if 𝑏(𝜎) = 0 then this
follows from (10.10); if 𝑏(𝜎) ≥ 1 this follows from the last statement in (10.14)
which guarantees that every one in the string 𝜎 increases the exponent of Ψo by
at least one (we also use (10.8)). In particular, each 𝐴𝑘𝜍 is bounded by at least
Ψo .
If we used only the trivial bound 𝐴𝑘𝜍 = 𝑂≺ (Ψo ) for each factor in (10.18);
i.e., we did not exploit the gain from the 𝑤1 (𝐴) type terms in the expansion,
𝑝
then the naive size of the left-hand side of (10.18) would only be Ψo . The key
observation behind (10.18) is that each lone label 𝑠 ∈ 𝐿 yields one extra factor
Ψo to the estimate. This is because the expectation in (10.17) would vanish if
all other factors (𝑄𝑘𝑟 𝐴𝑟𝜍𝑟 ), 𝑟 ≠ 𝑠, were independent of 𝑘𝑠 . The expansion of the
binary tree makes this dependence explicit by exhibiting 𝑘𝑠 as a lower index.
But this requires performing an operation 𝑤1 with the choice 𝑢 = 𝑘𝑠 in (10.12)
or (10.13). However, 𝑤1 increases the number of off-diagonal elements by at
least one. In other words, every index associated with a lone label must have
a “partner” index in a different resolvent entry which arose by application of
𝑤1 . Such a partner index may only be obtained through the creation of at least
one off-diagonal resolvent entry. The actual proof below shows that this effect
applies cumulatively for all lone labels.
In order to give the rigorous proof of (10.18), we consider two cases. Con-
sider first the case where for some 𝑟 = 1, … , 𝑝 the monomial 𝐴𝑟𝜍𝑟 on the left-hand
side of (10.18) is not maximally expanded. Then 𝑑(𝐴𝑟𝜍𝑟 ) = 𝑝 + 1, so that (10.8)
𝑝+1
yields 𝐴𝑟𝜍𝑟 ≺ Ψo . Therefore, the observation that 𝐴𝑠𝜍𝑠 ≺ Ψo for all 𝑠 ≠ 𝑟
2𝑝
together with (10.7) implies that the left-hand side of (10.18) is 𝑂≺ (Ψo ). Since
|𝐿| ≤ 𝑝, (10.18) follows.
Consider now the case where 𝐴𝑟𝜍𝑟 on the left-hand side of (10.18) is maxi-
mally expanded for all 𝑟 = 1, … , 𝑝. The key observation is the following claim
about the left-hand side of (10.18) with a nonzero expectation.
For each 𝑠 ∈ 𝐿 there exists 𝑟 ∶= 𝜏(𝑠) ∈ {1, … , 𝑝} ⧵ {𝑠}
(∗) such that the monomial 𝐴𝑟𝜍𝑟 contains a resolvent entry
with lower index 𝑘𝑠 .
To prove (∗), suppose by contradiction that there exists an 𝑠 ∈ 𝐿 such that
for all 𝑟 ∈ {1, … , 𝑝} ⧵ {𝑠} the lower index 𝑘𝑠 does not appear in the monomial
𝐴𝑟𝜍𝑟 . To simplify notation, we assume that 𝑠 = 1. Then, for all 𝑟 = 2, … , 𝑝, since
𝐴𝑟𝜍𝑟 is maximally expanded, we find that 𝐴𝑟𝜍𝑟 is independent of 𝑘1 . Therefore,
we have
𝑝 𝑝
𝔼(𝑄𝑘1 𝐴1𝜍1 )(𝑄𝑘2 𝐴2𝜍2 ) ⋯ (𝑄𝑘𝑝 𝐴𝜍𝑝 ) = 𝔼𝑄𝑘1 (𝐴1𝜍1 (𝑄𝑘2 𝐴2𝜍2 ) ⋯ (𝑄𝑘𝑝 𝐴𝜍𝑝 )) = 0,
where in the last step we used that 𝔼𝑄𝑖 (𝑋)𝑌 = 𝔼𝑄𝑖 (𝑋𝑌) = 0 if 𝑌 is independent
of 𝑖. This concludes the proof of (∗).
The statement (∗) can be reformulated as asserting that, after expansion,
every lone label 𝑠 has a “partner” label 𝑟 = 𝜏(𝑠), such that the index 𝑘𝑠 appears
92 10. FLUCTUATION AVERAGING MECHANISM
also as a lower index in the expansion of 𝐴𝑟 (note that there may be several such
partner labels 𝑟, we can choose 𝜏(𝑠) to be any one of them).
For 𝑟 ∈ {1, … , 𝑝} we define ℓ(𝑟) ∶= ∑𝑠∈𝐿 𝟏(𝜏(𝑠) = 𝑟), the number of times
that the label 𝑟 was chosen as a partner to some lone label 𝑠. We now claim that
1+ℓ(𝑟)
(10.20) 𝐴𝑟𝜍𝑟 = 𝑂≺ (Ψo ).
To prove (10.20), fix 𝑟 ∈ {1, … , 𝑝}. By definition, for each 𝑠 ∈ 𝜏 −1 ({𝑟}) the
index 𝑘𝑠 appears as a lower index in the monomial 𝐴𝑟𝜍𝑟 . Since 𝑠 ∈ 𝐿 is by
definition a lone label and 𝑠 ≠ 𝑟, we know that 𝑘𝑠 does not appear as an index
in 𝐴𝑟 = 1/𝐺𝑘𝑟 𝑘𝑟 , i.e., 𝑘𝑟 ≠ 𝑘𝑠 . By definition of the monomials associated with
the tree vertex 𝜎𝑟 , it follows that 𝑏(𝜎𝑟 ), the number of ones in 𝜎𝑟 , is at least
|𝜏 −1 ({𝑟})| = ℓ(𝑟) since each application of 𝑤1 adds precisely one new lower
index while application of 𝑤0 leaves the lower indices unchanged. Note that in
this step it is crucial that 𝑠 ∈ 𝜏 −1 ({𝑟}) was a lone label. Recalling (10.19), we
therefore get (10.20).
Using (10.20) and Lemma 10.1, we find
𝑝
|(𝑄 𝐴1 ) ⋯ (𝑄 𝐴𝑝𝜍 )| ≺ ∏ Ψ1+ℓ(𝑟) 𝑝+|𝐿|
= Ψo .
| 𝑘1 𝜍1 𝑘𝑝 𝑝 | o
𝑟=1
| | | |
|∑ 𝟏(𝒫(𝐤) = 𝑃) 𝑡𝑖𝑘1 ⋯ 𝑡𝑖𝑘𝑝/2 𝑡𝑖𝑘𝑝/2+1 ⋯ 𝑡𝑖𝑘𝑝 | ≤ 𝑀 −𝑝 |∑ 𝟏(𝒫(𝐤) = 𝑃) |.
| | | |
𝐤 𝐤
Now the number of 𝐤 with |𝐿| lone labels can be bounded easily by 𝑀 |𝐿|+(𝑝−|𝐿|)/2
since each block of 𝑃 that is not contained in 𝐿 consists of at least two labels.
Thus, we can bound the last displayed equation by
| |
|∑ 𝟏(𝒫(𝐤) = 𝑃)𝑡𝑖𝑘1 ⋯ 𝑡𝑖𝑘𝑝/2 𝑡𝑖𝑘𝑝/2+1 ⋯ 𝑡𝑖𝑘𝑝 | ≤ 𝑀 −𝑝 𝑀 |𝐿|+(𝑝−|𝐿|)/2
| |
𝐤
= (𝑀 −1/2 )𝑝−|𝐿| .
From (10.11) and (10.21) we get
𝑝
| | 𝑝+|𝐿(𝑃)| 2𝑝
𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || ≺ ∑ (𝑀 −1/2 )𝑝−|𝐿(𝑃)| Ψo ≤ 𝐶𝑝 Ψo ,
𝑘 𝑃∈𝔓𝑝
10.2. PROOF OF LEMMA 8.9 93
where in the last step we used the lower bound from (8.20) and estimated the
summation over 𝔓𝑝 with a constant 𝐶𝑝 (which is bounded by (𝐶𝑝2 )𝑝 ). Sum-
marizing, we have proved that
𝑝
| | 2𝑝
(10.22) 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || ≺ Ψo
𝑘
where in the last step we used Corollary 8.10 (recall that the proof of this corollary
used only (8.47), which was already proved in Part I) to bound ∑𝑖 𝑡𝑎𝑖 Υ𝑖 by 𝑂≺ (Ψ2 )
and that the matrices 𝑇 and 𝑆 commute by assumption. Introducing the vector
𝐰 = (𝑤𝑎 )𝑁𝑎=1 , we therefore have the equation
2
(10.23) 𝐰 = 𝑚sc 𝑆𝐰 + 𝑂≺ (Ψ2 ) ,
where the error term is in the sense of the ℓ∞ -norm (uniform in the components
2
of the vector 𝐰). Inverting the matrix 1−𝑚sc 𝑆 and recalling the definition (6.17)
yields (8.48).
94 10. FLUCTUATION AVERAGING MECHANISM
The proof of (8.50) is similar, except that we have to treat the subspace 𝐞⟂
separately. Using (8.49), we write
1
∑ 𝑡𝑎𝑖 (𝑣𝑖 − [𝑣]) = ∑ 𝑡𝑎𝑖 𝑣𝑖 − ∑ 𝑣𝑖 ,
𝑖 𝑖 𝑖
𝑁
and apply the above argument to each term separately. This yields
2 2 1
∑ 𝑡𝑎𝑖 (𝑣𝑖 − [𝑣]) = 𝑚sc ∑ 𝑡𝑎𝑖 ∑ 𝑠𝑖𝑘 𝑣𝑘 − 𝑚sc ∑ ∑ 𝑠𝑖𝑘 𝑣𝑘 + 𝑂≺ (Ψ2 )
𝑖 𝑖 𝑘 𝑖
𝑁 𝑘
2
= 𝑚sc ∑ 𝑠𝑎𝑖 𝑡𝑖𝑘 (𝑣𝑘 − [𝑣]) + 𝑂≺ (Ψ2 )
𝑖,𝑘
where we used (6.1) in the second step. Note that the error term on the right-
hand side is perpendicular to 𝐞 when regarded as a vector indexed by 𝑎, since
all other terms in the equation are. Hence we may invert the matrix (1 − 𝑚2 𝑆)
on the subspace 𝐞⟂ , as above, to get (8.50). This completes the proof of Lemma
8.9. □
˜𝑘 , we have that 𝑋
Next, by definition of 𝑋 ˜𝑘 = 𝑄𝑘 𝑋 ˜ , which implies that
𝑠 𝑠 𝑠 𝑘𝑠
˜
𝑃𝐴c𝑠 𝑋𝑘𝑠 = 0 if 𝑘𝑠 ∉ 𝐴𝑠 . Hence we may restrict the summation to 𝐴𝑠 satisfying
(10.25) 𝑘𝑠 ∈ 𝐴 𝑠
for all 𝑠. Moreover, we claim that the right-hand side of (10.24) vanishes unless
(10.26) 𝑘𝑠 ∈ 𝐴𝑞
⋃
𝑞≠𝑠
for all 𝑠. Indeed, suppose that 𝑘𝑠 ∈ ⋂𝑞≠𝑠 𝐴c𝑞 for some 𝑠, say 𝑠 = 1. In this case,
˜𝑘 is independent of 𝑘1 (see Definition
for each 𝑠 = 2, … , 𝑝 the factor 𝑃𝐴c 𝑄𝐴 𝑋 𝑠 𝑠 𝑠
7.3). Thus, we get
𝑝 𝑝
˜𝑘 ) = 𝔼(𝑃𝐴c 𝑄𝐴 𝑄𝑘 𝑋
𝔼 ∏(𝑃 𝑄𝐴𝑠 𝑋
𝐴c𝑠
˜ )
˜ ) ∏(𝑃𝐴c 𝑄𝐴 𝑋
𝑠 1 1 1 𝑘1 𝑠 𝑠 𝑘𝑠
𝑠=1 𝑠=2
𝑝
˜𝑘 ) ∏(𝑃𝐴c 𝑄𝐴 𝑋
= 𝔼𝑄𝑘1 ((𝑃𝐴c1 𝑄𝐴1 𝑋 ˜ )) = 0,
1 𝑠 𝑠 𝑘𝑠
𝑠=2
where in the last step we used that 𝔼𝑄𝑖 (𝑋) = 0 for any 𝑖 and random variable 𝑋.
We conclude that the summation on the right-hand side of (10.24) is re-
stricted to indices satisfying (10.25) and (10.26). Under these two conditions we
have
𝑝
(10.27) ∑ |𝐴𝑠 | ≥ 2 |[𝐤]|,
𝑠=1
since each index 𝑘𝑠 must belong to at least two different sets 𝐴𝑞 : to 𝐴𝑠 (by
(10.25)) as well as to some 𝐴𝑞 with 𝑞 ≠ 𝑠 (by (10.26)).
Next, we claim that for 𝑘 ∈ 𝐴 we have
|𝐴|
(10.28) |𝑄𝐴 𝑋𝑘 | ≺ Ψo .
(Note that if we were doing the case 𝑋𝑘 = 𝑄𝑘 𝐺𝑘𝑘 instead of 𝑋𝑘 = 𝑄𝑘 (𝐺𝑘𝑘 )−1 ,
then (10.28) would have to be weakened to |𝑄𝐴 𝑋𝑘 | ≺ Ψ|𝐴| , in accordance with
(8.46). Indeed, in that case and for 𝐴 = {𝑘}, we only have the bound |𝑄𝑘 𝐺𝑘𝑘 | ≺ Ψ
and not |𝑄𝑘 𝐺𝑘𝑘 | ≺ Ψo .)
Before proving (10.28), we show how it may be used to complete the proof.
Using (10.24), (10.28), and Lemma 10.1, we find
𝑝
|1 | 1 2|[𝑘]|
𝔼|| ∑ 𝑋𝑘 || ≺ 𝐶𝑝 𝑝 ∑ Ψo
𝑁 𝑘
𝑁 𝐤
𝑝
1
= 𝐶𝑝 ∑ Ψ2𝑢
o ∑ 𝟏(|[𝐤]| = 𝑢)
𝑢=1
𝑁𝑝 𝐤
𝑝
2𝑝
≤ 𝐶𝑝 ∑ Ψ2𝑢
o 𝑁
𝑢−𝑝
≤ 𝐶𝑝 (Ψo + 𝑁 −1/2 )2𝑝 ≤ 𝐶𝑝 Ψo ,
𝑢=1
96 10. FLUCTUATION AVERAGING MECHANISM
where in the first step we estimated the summation over the sets 𝐴1 , … , 𝐴𝑝
by a combinatorial factor 𝐶𝑝 depending on 𝑝, in the fourth step we used the
elementary inequality 𝑎𝑛 𝑏𝑚 ≤ (𝑎 + 𝑏)𝑛+𝑚 for positive 𝑎, 𝑏, and in the last step
we used (8.20) and the bound 𝑀 ≤ 𝑁. Thus, we have proved (10.22), from
which the claim follows exactly as in the first proof of (8.47).
What remains is the proof of (10.28). The case |𝐴| = 1 (corresponding to
𝐴 = {𝑘}) follows from (10.10), exactly as in the first proof of (8.47). To simplify
notation, for the case |𝐴| ≥ 2 we assume that 𝑘 = 1 and 𝐴 = {1, … , 𝑡} with 𝑡 ≥ 2.
It suffices to prove that
| 1 |
(10.29) |𝑄𝑡 ⋯ 𝑄2 | ≺ Ψ𝑡o .
| 𝐺11 |
We start by writing, using the first decoupling formula (8.1),
1 1 𝐺12 𝐺21 𝐺12 𝐺21
𝑄2 = 𝑄2 + 𝑄2 = 𝑄2 ,
𝐺11 (2)
𝐺11
(2)
𝐺11 𝐺11 𝐺22
(2)
𝐺11 𝐺11 𝐺22
(2)
where the first term vanishes since 𝐺11 is independent of 2 (see Definition 7.3).
We now consider
1 𝐺12 𝐺21
𝑄3 𝑄2 = 𝑄2 𝑄3 ,
𝐺11 𝐺 𝐺 𝐺
(2)
11 11 22
and apply again the first decoupling formula (8.1) with 𝑘 = 3 to each resolvent
entry on the right-hand side,and multiply everything out. The result is a sum of
fractions of entries of 𝐺, whereby all entries in the numerator are diagonal and
all entries in the denominator are diagonal. The leading-order term vanishes,
(3) (3)
𝐺12 𝐺21
𝑄2 𝑄3 = 0,
(3) (23) (3)
𝐺11 𝐺11 𝐺22
so that the surviving terms have at least three (off-diagonal) resolvent entries in
the numerator. We may now continue in this manner; at each step the number
of (off-diagonal) resolvent entries in the numerator increases by at least 1.
More formally, we obtain a sequence 𝐴2 , … , 𝐴𝑡 , where
𝐺12 𝐺21
𝐴2 ∶= 𝑄2
(2)
𝐺11 𝐺11 𝐺22
and 𝐴𝑖 is obtained by applying (8.1) with 𝑘 = 𝑖 to each entry of 𝑄𝑖 𝐴𝑖−1 and
keeping only the nonvanishing terms. The following properties are easy to check
by induction:
(i) 𝐴𝑖 = 𝑄𝑖 𝐴𝑖−1 .
(ii) 𝐴𝑖 consists of the projection 𝑄2 ⋯ 𝑄𝑖 applied to a sum of fractions such
that all entries in the numerator are off-diagonal and all entries in the
denominator are diagonal.
10.3. ALTERNATIVE PROOF OF (8.47) OF LEMMA 8.9 97
The local semicircle law in Theorem 6.7 was proven for universal Wigner
matrices, characterized by the upper bound 𝑠𝑖𝑗 ≤ 1/𝑀 on the variances of ℎ𝑖𝑗 .
This result implies rigidity estimates on the location of the eigenvalues with
a precision depending on 𝑀; e.g., in the bulk spectrum the eigenvalues can
typically be located with a precision slightly above 1/𝑀. For the sake of simplic-
ity, from now on we restrict our presentation to the special case of generalized
Wigner matrices characterized by 𝐶inf /𝑁 ≤ 𝑠𝑖𝑗 ≤ 𝐶sup /𝑁 with two positive con-
stants 𝐶inf and 𝐶sup (see Definition 2.1). So, from now on the parameter 𝑀 will
be replaced with 𝑁. For rigidity results in the general case, see theorem 7.6
in [55].
In this section we will show that eigenvalues for generalized Wigner matri-
ces are quite rigid; they may fluctuate only on a scale slightly above 1/𝑁. This
is a manifestation that the eigenvalues are strongly correlated; the typical fluc-
tuation scale of 𝑁 independent, say Poisson, points in a finite interval would be
𝑁 −1/2 .
while the second follows from Im 𝑚sc (𝑧) ≍ 𝜂/√𝜅 from (6.13).
The equations (11.2) and (11.3), however, contradict each other, showing
that with very high probability there is no eigenvalue with |𝜆 − 𝐸| ≤ 𝑁 −2/3 . We
can repeat this argument for a grid of energies 𝐸 = 𝐸𝑘 = 2+𝑁 −2/3+𝜀 +𝑘𝑁 −2/3 for
𝑘 = 0, 1, … , 𝐶𝑁 2/3+𝐷 for any 𝐷 finite. We can then use the union bound for the
exceptional sets of small probabilities to exclude the existence of eigenvalues
between 𝑁 −2/3+𝜀 and 𝑁 𝐷 . Finally, for a large enough 𝐷, it is trivial to prove
that the bound ‖𝐻‖ ≤ ∑𝑖𝑗 |ℎ𝑖𝑗 | ≤ 𝑁 𝐷 holds with very high probability. This
excludes eigenvalues |𝜆| ≥ 𝑁 𝐷 and proves the corollary. □
1 𝜕 𝑓˜ (𝑥 + 𝑖𝑦)
(11.5) 𝑓(𝜆) = ∫ 𝑧 d𝑥 d𝑦.
𝜋 ℝ2 𝜆 − 𝑥 − 𝑖𝑦
To see this identity, we use 𝜕𝑧 (𝜆 − 𝑧)−1 = 0 to write
1 𝜕 𝑓˜ (𝑥 + 𝑖𝑦) 1 ˜ (𝑥 + 𝑖𝑦)
𝑓
(11.6) ∫ 𝑧 d𝑥 d𝑦 = ∫ 𝜕𝑧 [ ]d𝑧 d𝑧.
𝜋 ℝ2 𝜆 − 𝑥 − 𝑖𝑦 2𝜋𝑖 ℝ2 𝜆 − 𝑥 − 𝑖𝑦
We can rewrite the last term as
1 ˜ (𝑥 + 𝑖𝑦)
𝑓
(11.7) ∫ d[ d𝑧],
2𝜋𝑖 ℝ2 𝜆 − 𝑥 − 𝑖𝑦
where the operator d is the differential in the sense of a 1-form. From the Green
theorem and the compact support of 𝑓 ˜ , we can integrate by parts to a small circle
11.2. STIELTJES TRANSFORM AND REGULARIZED COUNTING FUNCTION 101
and this proves (11.5). Computing the 𝜕𝑧̄ in (11.5) explicitly, we have
1 𝑖𝑦𝑓 ″ (𝑥)𝜒(𝑦) + 𝑖(𝑓(𝑥) + 𝑖𝑦𝑓 ′ (𝑥))𝜒 ′ (𝑦)
(11.9) 𝑓(𝜆) = ∫ d𝑥 d𝑦.
𝜋 ℝ2 𝜆 − 𝑥 − 𝑖𝑦
𝑈1 1
(11.11) |Im 𝑆(𝑥 + 𝑖𝑦)| ≤ for any 𝑥 ∈ [𝐸1 , 𝐸2 + ˜
𝜂 ] and ≤ 𝑦 ≤ 1,
𝑦 2
𝑈
(11.12) |Im 𝑆(𝑥 + 𝑖𝑦)| ≤ 2 for any 𝑥 ∈ [𝐸1 , 𝐸1 + ˜ 𝜂 ] ∪ [𝐸2 , 𝐸2 + ˜
𝜂]
𝑦
and 0 < 𝑦 < ˜𝜂.
For ˆ𝜂 ≥ ˜
𝜂 , define two functions 𝑓𝑗 = 𝑓𝐸𝑗 ,ˆ
𝜂 ∶ ℝ → ℝ such that 𝑓𝑗 (𝑥) = 1 for
−1
𝜂 , ∞); moreover, |𝑓𝑗′ (𝑥)| ≤ 𝐶 ˆ
𝑥 ∈ (−∞, 𝐸𝑗 ], 𝑓𝑗 (𝑥) vanishes for 𝑥 ∈ [𝐸𝑗 + ˆ 𝜂
−2
and |𝑓𝑗″ (𝑥)| ≤ 𝐶 ˆ
𝜂 for some constant 𝐶. Then, for some other constant 𝐶 > 0
independent of 𝑈1 , 𝑈2 , and ˜
𝜂 we have
2
| | ˜
𝜂
(11.13) |∫(𝑓2 − 𝑓1 )(𝜆) ˜
𝜚 (𝜆)d𝜆| ≤ 𝐶‖ ˜
𝜚 ‖TV [𝑈1 |log ˜
𝜂 | + 𝑈2 2 ]
| | ˆ
𝜂
102 11. EIGENVALUE LOCATION: THE RIGIDITY PHENOMENON
where ‖ ⋅ ‖TV denotes the total variation norm of signed measures. In addition, if
we assume that ˜𝜌 has compact support, then we also have
2
| | ˜
𝜂
(11.14) 𝜚 (𝜆)d𝜆| ≤ 𝐶‖ ˜
|∫ 𝑓2 (𝜆) ˜ 𝜚 ‖TV [𝑈1 |log ˜
𝜂 | + 𝑈2 2 ]
| | ˆ
𝜂
for some universal 𝐶 > 0 and where 𝜒 is a smooth cutoff function with support
1
in [−1, 1] with 𝜒(𝑦) = 1 for |𝑦| ≤ 2
and with bounded derivatives. Recall that 𝑓 ′
1
is O(𝜂−1 ) on two intervals of size O(𝜂) each and 𝜒 ′ is supported in 2
≤ |𝑦| ≤ 1.
By (11.10), we can bound
(note that due to 𝑆(𝑥 + 𝑖𝑦) = 𝑆(𝑥 − 𝑖𝑦), the bounds analogous to (11.10)–(11.11)
also hold for negative 𝑦’s). Using that 𝑓 is bounded with compact support, we
have, by (11.11),
For the first term on the right-hand side of (11.15), we split the integral into
two regimes depending on 0 < 𝑦 < 𝜂 or 𝜂 < 𝑦 < 1. Note that by symme-
try we only need to consider positive 𝑦. From (11.12), the integral on the first
11.3. CONVERGENCE SPEED OF THE EMPIRICAL DISTRIBUTION FUNCTION 103
Recall that 𝑓 ′ (𝑥) = 0 unless |𝑥 − 𝐸𝑗 | < 𝜂 for some 𝑗 = 1, 2, and in this regime
we have 𝑓 ′ = O(𝜂 −1 ). By (11.10), the last integral is easily bounded by O(𝑈).
For the first integral, we use 𝜕𝑦 (𝑦𝜒(𝑦)) = O(1) and 𝑆(𝑥 + i𝑦) = O(𝑈/𝑦) to have
1
| | d𝑦
|∬ 𝜕𝑦 (𝑦𝜒(𝑦))𝑓 ′ (𝑥)𝑆(𝑥 + i𝑦)d𝑥 d𝑦 | ≤ O(𝑈 ∫ )
(11.20) | 𝑦>𝜂 | 𝜂
𝑦
= O (𝑈| log 𝜂|) ,
which is (11.13). To prove (11.14), we simply choose 𝐸1 to be any energy on the
left side of the support of ˜
𝜌 and notice that
| | | |
(11.21) |∫ 𝑓2 (𝜆) ˜ 𝜚 (𝜆)d𝜆| ≤ 𝐶𝑈|log ˜
𝜚 (𝜆)d𝜆| = |∫(𝑓2 − 𝑓1 )(𝜆) ˜ 𝜂 |.
| | | |
This completes the proof of Lemma 11.2. □
Notice that we only used the positivity of 𝜚𝑁 in the first term and the bounded-
ness of 𝑓 and 𝜚sc in the second. Conversely, we have
The right-hand side is directly bounded by (11.29); the left-hand side is esti-
mated in the same way by using (11.29) for 𝑓𝐸−𝜂˜,𝜂˜ instead of 𝑓𝐸,𝜂˜ . This proves
Lemma 11.3. □
106 11. EIGENVALUE LOCATION: THE RIGIDITY PHENOMENON
i.e.,
| 𝛾𝑗 | | 𝜆𝑗 |
|∫ 𝜚sc (𝑥)d𝑥 | ≤ |∫ (𝜌𝑁 (𝑥) − 𝜌sc (𝑥))d𝑥| = |𝔫𝑁 (𝜆𝑗 ) − 𝑛sc (𝜆𝑗 )|
| 𝜆𝑗 | | −∞ |
and thus by the uniform bound (11.25)
| 𝛾𝑗 | 1
(11.34) |∫ 𝜚sc (𝑥)d𝑥| ≺ .
| 𝜆𝑗 | 𝑁
Combining this with the second asymptotics form (11.36), we would have
(11.38) |𝜆𝑗 − 𝛾𝑗 | ≺ 𝑁 −1 𝜚sc (𝛾𝑗 )−1 ≤ 𝐶𝑁 −1 (𝑗/𝑁)−1/3 .
For the complete proof, we first consider the indices 𝑗 ≥ 𝑗0 ∶= 𝑁 𝜀/2 . Since
𝑛sc (𝛾𝑗 ) ≥ 𝑁 −1+𝜀/2 and 𝑛sc (𝛾𝑗 ) = 𝔫𝑁 (𝜆𝑗 ), we have
|𝑛sc (𝜆𝑗 ) − 𝑛sc (𝛾𝑗 )| ≤ 𝑁 −𝜀/4 𝑛sc (𝛾𝑗 )
with very high probability by (11.25). This shows that 𝑛sc (𝜆𝑗 ) ≍ 𝑛sc (𝛾𝑗 ), but then
𝜚sc (𝜆𝑗 ) ≍ 𝜚sc (𝛾𝑗 ) by (11.35), as we presumed above.
Finally, for indices 𝑗 ≤ 𝑗0 = 𝑁 𝜀/2 we use monotonicity and rigidity (11.38)
for the index 𝑗0 :
𝜆𝑗 ≤ 𝜆𝑗0 ≤ 𝛾𝑗0 + 𝑁 −2/3+𝜀/2 ≤ −2 + 𝑁 −2/3+𝜀
with very high probability. In the last step we used (11.36). For the lower bound
on 𝜆𝑗 we refer to (11.1). This shows that |𝜆𝑗 + 2| ≤ 𝑁 −2/3+𝜀 with very high
probability, and we also have |𝛾𝑗 + 2| ≤ 𝑁 −2/3+𝜀 , so |𝜆𝑗 − 𝛾𝑗 | ≤ 2𝑁 −2/3+𝜀 . This
proves the rigidity estimate (11.32) for all indices 𝑗. □
CHAPTER 12
1 1
(12.1) d𝐻(𝑡) = dB(𝑡) − 𝐻(𝑡)d𝑡, 𝑡 ≥ 0,
√𝑁 2
with initial data 𝐻0 where B(𝑡) is defined somewhat differently in the two sym-
metry classes, namely:
We will drop the s or h superscript; the following formulas hold for both cases.
In coordinates, we have
d𝑏𝑖𝑗 (𝑡) 1
(12.2) dℎ𝑖𝑗 (𝑡) = − ℎ (𝑡)d𝑡
√𝑁 2 𝑖𝑗
where the entries 𝑏𝑖𝑗 (𝑡) have variance 𝑡 in the Hermitian case and in the sym-
metric case the off-diagonal entries (𝑏𝑖𝑗 (𝑡)) have variance 𝑡, while the diagonal
entries have variance 2𝑡.
The equation (12.1) defines a matrix-valued Ornstein-Uhlenbeck (OU) pro-
cess. It plays a distinguished role in the theory of random matrices that was
discovered by Dyson. Depending on the typographical convenience, the time
dependence will sometimes be indicated as an argument, 𝐻(𝑡), and sometimes
as an index, 𝐻𝑡 ; we keep both notations in parallel, i.e., 𝐻(𝑡) = 𝐻𝑡 . In particular,
unlike in some PDE literature, subscripts do not mean derivatives.
109
110 12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS
Next, we are concerned about the evolution of the eigenvalues 𝝀(𝑡) = (𝜆1 (𝑡),
… , 𝜆𝑁 (𝑡)) of 𝐻𝑡 along the OU flow. We label the eigenvalues in an increasing
order; i.e., we assume that 𝝀(𝑡) ∈ Σ𝑁 , where we define the open simplex
A theorem below will guarantee that the eigenvalues are simple and are con-
tinuous functions of 𝑡, hence the labeling is consistently preserved along the
evolution.
In principle, the eigenvalues {𝜆𝑗 (𝑡)} and eigenvectors {𝐮𝑗 (𝑡)} of 𝐻𝑡 are strongly
related, and one would expect a coupled system of stochastic differential equa-
tions for them. It was Dyson’s fundamental observation [45] that the eigenvalues
themselves satisfy an autonomous system of stochastic differential equations
(SDE) that do not involve the eigenvectors. This SDE is given in the following
definition.
(𝑖)
√2 𝜆𝑖 1 1
(12.4) d𝜆𝑖 = d𝐵𝑖 + (− + ∑ )d𝑡, 𝑖 ∈ J1, 𝑁K
√𝛽𝑁 2 𝑁 𝑗 𝜆𝑖 − 𝜆𝑗
Notice that we defined the DBM for any 𝛽 ≥ 1 and not only for the classical
values 𝛽 = 1, 2 that will correspond to an OU matrix flow. The following the-
orem summarizes the main properties of the DBM. For the proof, see lemma
4.3.3 and proposition 4.3.5 of [8]. We remark that the authors in [8] considered
(12.4) without the drift term −𝜆𝑖 /2. However, any drift term of the form −𝑉 ′ (𝜆𝑖 )
with 𝑉 ∈ 𝐶 2 (ℝ) could be added without further complications.
Theorem 12.2. Let 𝛽 ≥ 1 and suppose that the initial data satisfy 𝝀(0) ∈ Σ𝑁 .
Then there exists a unique (strong) solution to (12.4) in the space of continuous
functions (𝝀(𝑡))𝑡≥0 ∈ 𝐶(ℝ+ , Σ𝑁 ). Furthermore, for any 𝑡 > 0 we have 𝝀(𝑡) ∈
Σ𝑁 , and 𝝀(𝑡) depends continuously on 𝝀(0). In particular, if 𝝀(0) ∈ Σ𝑁 ; i.e., the
multiplicity of the initial points is 1, then (𝝀(𝑡))𝑡≥0 ∈ 𝐶(ℝ+ , Σ𝑁 ); i.e., this property
is preserved for all times along the evolution.
We point out that the solution is considered in the strong sense. This means
that despite the singularity in (12.4), the DBM admits a solution for almost all
realizations of the Brownian motions 𝐵𝑖 . For the precise definition, we recall
the concept of strong solution to a (scalar) SDE of the form
(12.6) ̇ 𝛼 + 𝐻 𝐮̇ 𝛼 = 𝜆𝛼̇ 𝐮𝛼 + 𝜆𝛼 𝐮̇ 𝛼 ,
𝐻𝐮
as well as
𝐮̇ ∗𝛼 𝐮𝛽 + 𝐮∗𝛼 𝐮̇ 𝛽 = 0, 𝐮̇ ∗𝛼 𝐮𝛼 = 0.
Taking the inner product with 𝐮𝛼 on both sides of (12.6), we get
(12.7) 𝜆𝛼̇ = 𝐮∗𝛼 𝐻𝐮
̇ 𝛼.
i.e.,
̇ 𝛼 + 𝜆𝛽 𝐮∗𝛽 𝐮̇ 𝛼 = 𝜆𝛼 𝐮∗𝛽 𝐮̇ 𝛼
𝐮∗𝛽 𝐻𝐮
where we have used that 𝐻 is self-adjoint and 𝜆𝛽 is real in the last equation.
Hence,
̇ 𝛼
𝐮∗𝛽 𝐻𝐮
(12.8) 𝐮̇ 𝛼 = ∑ (𝐮∗𝛽 𝐮̇ 𝛼 )𝐮𝛽 = ∑ 𝐮𝛽 .
𝛽≠𝛼 𝛽≠𝛼
𝜆𝛼 − 𝜆𝛽
For simplicity of notation, from now on we consider only the real symmetric
case. The complex Hermitian case can be treated similarly. In the real symmet-
ric case, (12.7) reads as
𝜕𝜆𝛼
(12.9) = 𝑢𝛼 (𝑖)𝑢𝛼 (𝑗)[2 − 𝛿𝑖𝑗 ],
𝜕ℎ𝑖𝑗
where 𝑢𝛼 (1), … , 𝑢𝛼 (𝑁) denote the coordinates of the vector 𝐮𝛼 . We may assume
that the eigenvectors are real. From (12.8) we get
𝜕𝑢𝛼 (𝑘) 𝑢𝛽 (𝑖)𝑢𝛼 (𝑗) + 𝑢𝛽 (𝑗)𝑢𝛼 (𝑖)[1 − 𝛿𝑖𝑗 ]
(12.10) = ∑ 𝑢𝛽 (𝑘).
𝜕ℎ𝑖𝑗 𝛽≠𝛼
𝜆𝛼 − 𝜆 𝛽
12.2. DERIVATION OF DYSON BROWNIAN MOTION AND PERTURBATION THEORY 113
Combining these last two formulas allows us to compute the second partial
derivatives; i.e., for any fixed indices 𝑖, 𝑗, 𝑘, and ℓ we have
𝜕 2 𝜆𝛼
𝜕ℎℓ𝑗 𝜕ℎ𝑖𝑘
𝜕𝑢𝛼 (𝑖) 𝜕𝑢 (𝑘)
= [2 − 𝛿𝑖𝑘 ][ 𝑢 (𝑘) + 𝑢𝛼 (𝑖) 𝛼 ]
𝜕ℎℓ𝑗 𝛼 𝜕ℎℓ𝑗
1
= [2 − 𝛿𝑖𝑘 ] ∑ [(𝑢 (𝑗)𝑢𝛼 (ℓ) + 𝑢𝛽 (ℓ)𝑢𝛼 (𝑗)[1 − 𝛿𝑗ℓ ])𝑢𝛽 (𝑖)𝑢𝛼 (𝑘)
𝛽≠𝛼
𝜆𝛼 − 𝜆𝛽 𝛽
(12.11)
+ (𝑢𝛽 (ℓ)𝑢𝛼 (𝑗) + 𝑢𝛽 (𝑗)𝑢𝛼 (ℓ)[1 − 𝛿𝑗ℓ ])𝑢𝛽 (𝑘)𝑢𝛼 (𝑖)]
1
= [2 − 𝛿𝑖𝑘 ] ∑ [(𝑢 (𝑗)𝑢𝛼 (ℓ) + 𝑢𝛽 (ℓ)𝑢𝛼 (𝑗)[1 − 𝛿𝑗ℓ ])
𝛽≠𝛼
𝜆𝛼 − 𝜆𝛽 𝛽
By (12.2), (12.9), (12.11), and using Itô’s formula (neglecting the issue of singu-
larity), we have
𝜕𝜆𝛼 1 𝜕 2 𝜆𝛼
d𝜆𝛼 = ∑ dℎ𝑖𝑘 + ∑ ∑ (dℎ𝑖𝑘 )(dℎℓ𝑗 )
𝑖≤𝑘
𝜕ℎ𝑖𝑘 2 𝑖≤𝑘 𝑗≤ℓ 𝜕ℎ𝑖𝑘 𝜕ℎℓ𝑗
d𝑏𝑖𝑘 ℎ𝑖𝑘
= ∑ 𝑢𝛼 (𝑖)𝑢𝛼 (𝑘)[ − d𝑡]
𝑖,𝑘 √𝑁 2
(12.12)
1 1
+ ∑∑ [|𝑢 (𝑖)|2 |𝑢𝛼 (𝑘)|2 + |𝑢𝛼 (𝑖)|2 |𝑢𝛽 (𝑘)|2 ]d𝑡
2𝑁 𝑖,𝑘 𝛽≠𝛼 𝜆𝛼 − 𝜆𝛽 𝛽
1 𝜆𝛼 1 1
= ∑ 𝑢𝛼 (𝑖)𝑢𝛼 (𝑘)d𝑏𝑖𝑘 − d𝑡 + ∑ d𝑡.
√𝑁 𝑖,𝑘
2 𝑁 𝛽≠𝛼 𝜆𝛼 − 𝜆𝛽
where there are two pairings, (𝑖, ℓ) = (𝑘, 𝑗) and (𝑖, ℓ) = (𝑗, 𝑘), that contributed
to the contractions. Thus 𝐵 ˜ 𝛼 = √2𝐵𝛼 where (𝐵𝛼 )𝑁 𝛼=1 is the standard (real)
𝑁
Brownian motion in ℝ , and this gives the martingale term in (12.4) with 𝛽 = 1.
A similar formula can be derived for the Hermitian case, where the parameter
becomes 𝛽 = 2; we omit the details.
In the next section we put the Dyson Brownian motion in a more general
context that is closer to the interpretation of GUE as an invariant ensemble in the
spirit of Chapter 4. It turns out that the measure on the eigenvalues, explicitly
given in (4.3), generates a DBM in a canonical way such that this measure will
be invariant under the dynamics. This holds for any values 𝛽 ≥ 1, i.e., even if
there is no underlying matrix ensemble.
𝑒−𝛽𝑁ℋ𝑁 (𝝀)
𝜇𝑁 (d𝝀) = d𝝀,
𝑍
(12.13) 𝑁
1 1
ℋ𝑁 (𝝀) ∶= ∑ 𝑉(𝜆𝑖 ) − ∑ log |𝜆𝑗 − 𝜆𝑖 |.
2 𝑖=1 𝑁 𝑖<𝑗
We remark that in our earlier papers the Dirichlet form was defined with a
1/(2𝑁) prefactor instead of 1/(𝛽𝑁). The current convention is suitable to con-
sider DBM for general 𝛽.
12.3. STRONG LOCAL ERGODICITY OF THE DYSON BROWNIAN MOTION 115
The symmetric operator associated with the Dirichlet form is called the gen-
erator and denoted by ℒ = ℒ𝜇 . It satisfies
𝑁 𝑁 (𝑖)
1 2 1 1 1
(12.15) ℒ=∑ 𝜕𝑖 + ∑ (− 𝑉 ′ (𝜆𝑖 ) + ∑ )𝜕 .
𝑖=1
𝛽𝑁 𝑖=1
2 𝑁 𝑗 𝜆𝑖 − 𝜆𝑗 𝑖
𝑁 𝑁 (𝑖)
1 2 1 1 1
(12.16) ℒ𝐺 = ∑ 𝜕𝑖 + ∑ (− 𝜆𝑖 + ∑ )𝜕 .
𝑖=1
𝛽𝑁 𝑖=1
2 𝑁 𝑗
𝜆 𝑖 − 𝜆𝑗 𝑖
Now we consider the Dyson Brownian motion (12.4), i.e., dynamics of the
eigenvalues 𝝀 = (𝜆1 , … , 𝜆𝑁 ) ∈ Σ𝑁 of 𝐻𝑡 that evolves by (12.1). We write the
distribution of 𝝀 of 𝐻𝑡 at time 𝑡 as 𝑓𝑡 (𝝀)𝜇𝐺 (d𝝀). Comparing the SDE (12.4)
with ℒ𝐺 , we notice that Kolmogorov’s forward equation for the evolution of the
density 𝑓𝑡 takes the form
(12.17) 𝜕𝑡 𝑓𝑡 = ℒ𝐺 𝑓𝑡 (𝑡 ≥ 0).
(𝑛)
(12.19) 𝑝𝜈,𝑁 (𝑥1 , … , 𝑥𝑛 ) ∶= ∫ 𝜈(𝐱)d𝑥𝑛+1 ⋯ d𝑥𝑁 , 𝐱 = (𝑥1 , … , 𝑥𝑁 ).
ℝ𝑁−𝑛
12.3. STRONG LOCAL ERGODICITY OF THE DYSON BROWNIAN MOTION 117
In other words, this theorem states that if we have rigidity on scale 𝑁 −1+𝜉
for some 𝜉 ∈ (0, 12 ), then the DBM has average energy universality in the bulk
for any time 𝑡 ≫ 𝑁 −1+2𝜉 on scale 𝑏 ≫ max{𝑁 −1+𝜉 , (𝑁𝑡)−1 }. For generalized
Wigner matrices we know that rigidity holds on the smallest possible scale, so
we have the following corollary:
Corollary 12.5. Consider the matrix OU process with initial matrix ensem-
ble 𝐻0 being a generalized Wigner ensemble and let |𝐸| < 2. Then, for any 𝜀 > 0,
for any 𝑡 ∈ [𝑁 −1+2𝜀 , 𝑁 𝜀 ], and for any 𝑏 ≥ (𝑁𝑡)−1 with 𝑏 < ||𝐸| − 2| we have
| 𝐸+𝑏 d𝐸 ′ (𝑛) (𝑛) 𝜶 | 𝑁 3𝜀
(12.24) |∫ ∫ d𝜶 𝑂(𝜶)(𝑝𝑡,𝑁 − 𝑝𝐺,𝑁 ) (𝐸 ′ + )| ≤ ‖𝑂‖𝐶1
| 𝐸−𝑏 2𝑏 ℝ𝑛 𝑁 | √𝑏𝑁𝑡
(𝑛)
for any 𝑁 ≥ 𝑁0 (𝑛, 𝜀) where 𝑝𝑡,𝑁 is the correlation function of 𝐻𝑡 , or equivalently,
that of 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G .
just below 𝑁 −1/2 in the bulk. Clearly, 𝑁 −1/2 is a critical threshold; no local
universality can be concluded with this argument unless a rigidity control on a
scale below 𝑁 −1/2 is established a priori.
To establish the relation between 𝐿 and 𝐿(Σ) , we first define the symmetrized
version of Σ,
Σ̃ ∶= ℝ𝑁 ⧵ {𝐱 ∶ ∃𝑖 ≠ 𝑗 with 𝑥𝑖 = 𝑥𝑗 }.
Denote 𝑋 ∶= 𝐶0∞ ( Σ̃ ). The key information is that 𝑋 is dense in 𝐻 1 (ℝ𝑁 , d𝜇),
which is equivalent to the density of 𝑋 in 𝐶0∞ (ℝ𝑁 , d𝜇). We will check this prop-
erty below. Then the general argument above directly applies if ℝ𝑁 is replaced
by Σ̃ 𝑁 , and it shows that the generator 𝐿 is the same (with the same domain)
if we start from 𝑋 instead of 𝐶0∞ (ℝ𝑁 , d𝜇) as a core.
Note that both 𝐿 and 𝐿(Σ) are local operators and 𝐿 is symmetric with respect
to the permutation of the variables. For any function 𝑓 defined on Σ, we define its
symmetric extension onto Σ̃ by 𝑓 ˜ . Clearly, 𝐿 𝑓 ˜ = ˜ 𝐿(Σ) 𝑓 for any 𝑓 ∈ 𝐶0∞ (Σ).
Since the generator is uniquely determined by its action on its core, and the
generator uniquely determines the dynamics, we see that for any 𝑓 ∈ 𝐿1 (Σ, d𝜇),
(Σ) ˜ and restricting it to Σ. In other
one can determine 𝑇𝑡 𝑓 by computing 𝑇𝑡 𝑓
words, the dynamics (12.17) is well-defined when restricted to Σ = Σ𝑁 .
Finally, we have to prove the density of 𝑋 in 𝐶0∞ (ℝ𝑁 , d𝜇), i.e., to show that if
𝑓 ∈ 𝐶0∞ (ℝ𝑁 ), then a sequence 𝑓𝑛 ∈ 𝐶0∞ ( Σ̃ ) exists such that ℰ(𝑓−𝑓𝑛 , 𝑓−𝑓𝑛 ) → 0.
The structure of Σ̃ is complicated since in addition to the one-codimensional
coalescence hyperplanes 𝑥𝑖 = 𝑥𝑗 (and 𝑥𝑖 = 0 in case of Σ+ ), it contains higher-
order coalescence subspaces with higher codimensions. We will show the ap-
proximation argument in a neighborhood of a point 𝐱 such that 𝑥𝑖 = 𝑥𝑗 but
𝑥𝑖 ≠ 𝑥𝑘 for any other 𝑘 ≠ 𝑖, 𝑗. The proof uses the fact that the measure d𝜇
vanishes at least to first order, i.e., at least as |𝑥𝑖 − 𝑥𝑗 | around 𝐱, thanks to 𝛽 ≥ 1.
This is the critical case; the argument near higher-order coalescence points is
even easier, since they have lower codimension and the measure 𝜇 vanishes at
even higher order.
In a neighborhood of 𝐱 we can change to local coordinates such that 𝑟 ∶=
𝑥𝑖 − 𝑥𝑗 remains the only relevant coordinate. Thus, the task is equivalent to
showing that any 𝑔 ∈ 𝐶0∞ (ℝ) can be approximated by a sequence 𝑔𝜀 ∈ 𝐶0∞ (ℝ ⧵
{0}) in the sense that
𝐺(𝑦) ∶= 𝑔(|𝑦|) for any 𝑦 ∈ ℝ2 . Then the left-hand side above gives
𝜀
|𝑔(𝑟)|2 |𝐺(𝑦)|2
𝑏2 ∫ 𝑟 d𝑟 = 𝑏 2
∫ d𝑦
𝜀2
𝑟2 (𝑎 + 𝑏 log 𝑟)2 𝜀2 ≤|𝑦|≤𝜀
|𝑦|2 (𝑎 + 𝑏 log |𝑦|)2
|𝐺(𝑦)|2
≤ 𝐶∫ d𝑦.
𝜀2 ≤|𝑦|≤𝜀
|𝑦|2 (log |𝑦|)2
122 12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS
𝑘
Define the sequence 𝜀𝑘 ∶= 2−2 ; clearly 𝜀𝑘2 = 𝜀𝑘+1 . Setting
|𝐺(𝑦)|2
𝐼𝑘 ∶= ∫ d𝑦,
𝜀𝑘+1 ≤|𝑦|≤𝜀𝑘
|𝑦|2 (log |𝑦|)2
we clearly have
∞ ∞
|𝐺(𝑦)|2
∑ 𝐼𝑘 = ∫ d𝑦 ≤ 𝐶 ∫ |∇𝐺|2 = 𝐶 ′ ∫ |𝑔′ (𝑟)|2 𝑟 d𝑟,
𝑘=1 |𝑦|≤1/2
|𝑦|2 (log |𝑦|)2 ℝ2 0
where in the last step we used the critical Hardy inequality with logarithmic
correction (see, e.g., theorem 2.8 in [48]). Since in our case 𝑔 ∈ 𝐻 1 (ℝ+ , 𝑟 d𝑟),
from the last displayed inequality we have 𝐼𝑘 → 0 as 𝑘 → ∞. Therefore, we can
use the cutoff function 𝜙𝜀𝑘 to prove the claim.
The Meyers-Serrin theorem we just proved for (𝑅+ , 𝑟 d𝑟) can be extended to
(ℝ𝑁 , d𝜇) using local changes of coordinates if 𝛽 ≥ 1. Thus, one may prove that
the form domain of the operator 𝐿(Σ) is 𝐻 1 (Σ, d𝜇Σ ) and the dynamics (12.25) is
well-defined on this space.
CHAPTER 13
In most cases, the reference measure 𝜇 will be canonical (e.g., the natural equi-
librium measure), so often we will drop the subscript 𝜇 if there is no confusion.
We may then call 𝑆𝜇 (𝑓) = 𝑆(𝑓) the entropy of 𝑓. Using the convexity of the
function 𝑥 ↦ 𝑥 log 𝑥 on ℝ+ , a simple Jensen’s inequality,
Proof. First note that we only have to prove the case 𝛼 = 1 since we can
redefine 𝛼𝑋 → 𝑋. From the concavity of the logarithm and Jensen’s inequality,
we have
d𝜇
∫ 𝑋 d𝜈 − 𝑆(𝜈|𝜇) = ∫ log[𝑒𝑋 ]d𝜈 ≤ log 𝔼𝜈 𝑒𝑋 ,
d𝜈
and this proves (13.2). □
Although (13.2) is just an inequality, there is a variational characterization
of the relative entropy behind it. Namely,
(13.3) 𝑆(𝜈|𝜇) = sup[𝔼𝜈 [𝑋] − log 𝔼𝜇 𝑒𝑋 ],
𝑋
where the supremum is over all bounded random variables. As we will not need
this relation in this book, we leave it to the interested reader to prove it.
As a corollary of (13.2), we mention that for any set 𝐴 we have the bound
log 2 + 𝑆(𝜈|𝜇)
(13.4) ℙ𝜈 (𝐴) ≤ 1
.
log
ℙ𝜇 (𝐴)
The proof is left as an exercise. This inequality will be used in the following
context. Suppose that the relative entropy of 𝜈 with respect to 𝜇 is finite. Then
in order to show that a set has a small probability w.r.t. 𝜈, we need to verify
that this set is exponentially small w.r.t. 𝜇. In this sense, entropy provides only
a relatively weak link between two measures. However, it is still stronger than
the total variational norm, which we will show now.
For any two probability measures 𝑓𝜇 and 𝜇, the 𝐿𝑝 -distance between 𝑓𝜇
and 𝜇 is defined by
1/𝑝
𝑝
(13.5) [∫ |𝑓 − 1| d𝜇] .
For any such function 𝑔, we have by the entropy inequality (13.2) that for any
𝑡>0
for any 𝑡 ≥ 0. A simple calculation shows that the second derivative of ℎ is given
by
𝑒𝑡𝑔 𝜇
ℎ′′ (𝑡) = ⟨𝑔; 𝑔⟩𝜔𝑡 , 𝜔𝑡 ∶= ,
∫ 𝑒𝑡𝑔 d𝜇
where 𝜔𝑡 is a probability measure, and
denotes the covariance. Recall that the covariance is positive definite; i.e., it
satisfies the usual Schwarz inequality,
𝑡 1
∫ 𝑓𝑔 d𝜇 − ∫ 𝑔 d𝜇 ≤ + 𝑡 −1 ∫ 𝑓 log 𝑓 d𝜇 ≤ ∫ 𝑓 log 𝑓 d𝜇,
2 √2
where we optimized 𝑡 in the last step. Since this bound holds for any 𝑔 with
|𝑔| ≤ 1, using (13.9) we have proved (13.8). □
126 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
The notation d𝑥 d𝑦 for the reference measure already indicates that in our appli-
cations Ω𝑗 will be euclidean spaces ℝ𝑛𝑗 of some dimension 𝑛𝑗 , and the reference
measure will be the Lebesgue measure.
We remark that the concept of conditioning can be defined in full generality
with respect to any sub-𝜎-algebra; the product structure of Ω is not essential.
However, we will not need the general definition in this book and the above
definition is conceptually simpler.
The conditional expectation gives rise to a trivial martingale decomposition:
(13.16) 𝑢=𝑢
ˆ + (𝑢 − 𝑢
ˆ),
where 𝑢ˆ is ℱ1 -measurable, while 𝑢−ˆ
𝑢 has zero expectation on any ℱ1 -measurable
set. Subtracting the expectation 𝑢 ∶= 𝔼𝜔 𝑢 = 𝔼𝜔 𝑢 ˆ and squaring this formula,
we have the martingale decomposition of the variance of 𝑢:
Var𝜔 (𝑢) ∶= 𝔼𝜔 (𝑢 − 𝑢)2 = 𝔼𝜔 (𝑢 − 𝑢
ˆ)2 + 𝔼𝜔 (ˆ
𝑢 − 𝑢)2
(13.17)
= 𝔼𝜔 Var(𝑢(𝑥, ⋅ )) + Var𝜔 (ˆ
𝑢)
where we defined the conditional variance
ˆ)2 |ℱ1 ] = 𝔼[𝑢2 |ℱ1 ] − [ˆ
Var(𝑢(𝑥, ⋅ )) ∶= 𝔼[(𝑢 − 𝑢 𝑢 ]2 .
The identity (13.17) is a triviality, but its interpretation is important. It
means that the variance is additive w.r.t. the martingale decomposition (13.16).
The first term 𝔼𝜔 Var(𝑢(𝑥, ⋅ )) is the expectation of the variance w.r.t. 𝑦 condi-
tioned on 𝑥; the second term Var(ˆ 𝑢 − 𝑢)2 is the variance of the marginal w.r.t. 𝑥.
In other words, we can compute the variance one by one.
The martingale decomposition has an analogue for the entropy. For sim-
plicity, we assume that 𝜔 has a density 𝜔(𝑥, 𝑦) w.r.t. a reference measure d𝑥 d𝑦.
Denote by 𝜔˜ the marginal 𝜔˜ probability density on Ω1 ,
𝜔(𝑥)
˜ = ∫ 𝜔(𝑥, 𝑦)d𝑦 ,
Ω2
(13.18) ˆ 𝜔(𝑥)d𝑥.
∬ 𝑂(𝑥)𝑓(𝑥, 𝑦)𝜔(𝑥, 𝑦)d𝑥d𝑦 = ∫ 𝑂(𝑥)𝑓(𝑥) ˆ
Ω Ω1
Let
𝜔(𝑥, 𝑦)
𝜔𝑥 (𝑦) ∶=
𝜔(𝑥)
˜
be the probability density on Ω2 conditioned on a fixed 𝑥 ∈ Ω1 . Define
𝑓(𝑥, 𝑦)
(13.19) 𝑓𝑥 (𝑦) ∶=
ˆ
𝑓(𝑥)
128 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
where in the last step we used (13.18). The first term we can rewrite as
(13.23) = ∫ d𝑥 𝜔(𝑥) ˆ
˜ 𝑓(𝑥)[∫ 𝑓𝑥 (𝑦) log 𝑓𝑥 (𝑦)𝜔𝑥 (𝑦)d𝑦]
= 𝔼𝑓𝜔˜𝑆𝜔𝑥 (𝑓𝑥 ).
ˆ
We remark that in many books on probability, e.g., [43], the Dirichlet form
(13.25) is defined with a factor 12 , but this convention is not compatible with
the 1/(𝛽𝑁) prefactor in (12.14). The lack of this 12 factor in (13.25) causes slight
deviations from their customary form in the following theorems.
Definition 13.5. The probability measure 𝜇 on ℝ𝑁 satisfies the logarithmic
Sobolev inequality if there exists a constant 𝛾 such that
𝜕𝑡 𝐷(√𝑓𝑡 ) = 𝜕𝑡 ∫(∇ℎ)2 d𝜇
= 2 ∫ ∇ℎ ⋅ ∇𝜕𝑡 ℎ d𝜇
(∇ℎ)2
= 2 ∫(∇ℎ) ⋅ (∇ℒℎ)d𝜇 + 2 ∫(∇ℎ) ⋅ ∇ d𝜇
ℎ
(𝜕𝑖 ℎ)(𝜕𝑗 ℎ) 2
− 2 ∫ ∑(𝜕𝑖𝑗 ℎ − ) d𝜇
𝑖𝑗
ℎ
13.3. LOGARITHMIC SOBOLEV INEQUALITY 131
Using the positivity of the entropy 𝑆(𝑓𝑡 ) ≥ 0 on the left side and the monotonicity
of the Dirichlet form (from (13.34)) on the right side, we get
2
(13.38) 𝐷(√𝑓𝑡 ) ≤ 𝑆(𝑓𝑡/2 );
𝑡
thus, using (13.37), we obtain exponential relaxation of the Dirichlet form on time
scale 𝑡 ≍ 1/𝐾,
2
𝐷(√𝑓𝑡 ) ≤ 𝑒−𝑡𝐾 𝑆(𝑓0 ). □
𝑡
132 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
(13.42) ∫ 𝑔2 log(𝑔2 /‖𝑔‖2 )d𝑥 + 𝑛[1 + log 𝑎] ∫ 𝑔2 d𝑥 ≤ (𝑎2 /𝜋) ∫ |∇𝑔|2 d𝑥,
ℝ𝑛 ℝ𝑛 ℝ𝑛
which holds for any 𝑎 > 0 and any function 𝑔, where ‖𝑔‖ = (∫ 𝑔2 d𝑥)1/2 is the
𝐿2 -norm with respect to the Lebesgue measure.
Proposition 13.7 (LSI implies spectral gap). Let 𝜇 satisfy the LSI (13.27)
with an LSI constant 𝛾. Then, for any 𝑣 ∈ 𝐿2 (𝜇) with ∫ 𝑣 d𝜇 = 0, we have
𝛾 𝛾
(13.43) ∫ 𝑣 2 d𝜇 ≤ ∫ |∇𝑣|2 d𝜇 = 𝐷(𝑣);
2 2
i.e., 𝜇 has a spectral gap of size at least 𝛾/2.
Proof. By definition of the LSI constant, we have
∫ 𝑢 log 𝑢 d𝜇 ≤ 𝛾𝐷(√𝑢)
The following inequality for any two nonnegative numbers 𝑎, 𝑏 can be checked
by elementary calculus:
𝑎 log 𝑎 − 𝑏 log 𝑏 − (1 + log 𝑏)(𝑎 − 𝑏) ≥ 0.
Hence,
(13.50) ˆ = 𝜔(𝑥)
𝑓(𝑥) ˆ −1 ∫ 𝑓(𝑥, 𝑦)𝜔(𝑥, 𝑦)d𝑦 = ∫ 𝑓(𝑥, 𝑦)𝜈(𝑦)d𝑦.
ℒ𝑔𝑡
= ∫ 𝜓𝑡 [𝑔𝑡 ℒ(log 𝑔𝑡 ) − 𝑔𝑡 ] d𝜇 + ∫ 𝑔𝑡 (ℒ − 𝜕𝑡 )𝜓𝑡 d𝜇.
𝑔𝑡
By definition of ℒ, we have
2
ℒ𝑔 (𝜕𝑗 𝑔) 2
(13.52) ℒ(log 𝑔) − = −∑ = −4 ∑ (𝜕𝑗 √𝑔) ,
𝑔 𝑗
𝑔 𝑗
13.4. Hypercontractivity
We now present an interesting connection between the LSI of a probability
measure 𝜇 and the hypercontractivity properties of the semigroup generated by
ℒ = ℒ𝜇 . Since this result will not be used later in this book, this section can be
skipped.
To state the result, we define the semigroup {𝑃𝑡 }𝑡≥0 by 𝑃𝑡 𝑓 ∶= 𝑓𝑡 , where 𝑓𝑡
solves the equation 𝜕𝑡 𝑓𝑡 = ℒ𝑓𝑡 with initial condition 𝑓0 = 𝑓.
Theorem 13.12 (L. Gross [79]). For a measure 𝜇 on ℝ𝑛 and for any fixed
constants 𝛽 ≥ 0 and 𝛾 > 0 the following two statements are equivalent:
(i) The generalized LSI
holds.
(ii) The hypercontractivity estimate
1 1
(13.54) ‖𝑃𝑡 𝑓‖𝐿𝑞 (𝜇) ≤ exp {𝛽 [ − ]} ‖𝑓‖𝐿𝑝 (𝜇)
𝑝 𝑞
holds for all exponents satisfying
𝑞−1
≤ 𝑒4𝑡/𝛾 , 1 < 𝑝 ≤ 𝑞 < ∞.
𝑝−1
13.4. HYPERCONTRACTIVITY 137
Proof. We will only prove (i) ⇒ (ii), i.e., that the generalized LSI implies
the decay estimate, the proof of the converse statement can be found in [43].
First we assume that 𝑓 ≥ 0, hence 𝑓𝑡 ≥ 0. Direct differentiation yields the
identity
d
(13.55) log ‖𝑓𝑡 ‖𝑝(𝑡) =
d𝑡
𝑝(𝑡)
̇ 4(𝑝(𝑡) − 1)
[− 𝐷(𝑢(𝑡)) + ∫ 𝑢(𝑡)2 log(𝑢(𝑡)2 )d𝜇]
𝑝(𝑡)2 𝑝(𝑡)
̇
d
with 𝑝(𝑡)
̇ = 𝑝(𝑡) and where we defined
d𝑡
𝑝(𝑡)/2 −𝑝(𝑡)/2
𝑢(𝑡) ∶= 𝑓𝑡 ‖𝑓𝑡 ‖𝑝(𝑡) , 𝐷(𝑢) = ∫(∇𝑢)2 d𝜇.
4(𝑝(𝑡) − 1)
𝛾= with 𝑝(0) = 𝑝, i.e., 𝑝(𝑡) = 1 + (𝑝 − 1)𝑒4𝑡/𝛾 ,
𝑝(𝑡)
̇
where 𝛾 is the constant given in the theorem. Using (13.53) for the 𝐿1 (𝜇)-
normalized function 𝑢(𝑡)2 , we have
d 𝛽 𝑝(𝑡)
̇
log ‖𝑓𝑡 ‖𝑝(𝑡) ≤ .
d𝑡 𝑝(𝑡)2
1 1
log ‖𝑓𝑇 ‖𝑝(𝑇) − log ‖𝑓‖𝑝 ≤ 𝛽[ − ].
𝑝 𝑝(𝑇)
Choosing 𝑇 such that 𝑝(𝑇) = 𝑞, we have proved (13.54) for 𝑓 ≥ 0. The general
case follows from separating the positive and negative parts of 𝑓. □
Exercise. In this exercise, we show that the idea of the LSI can be useful
even if the invariant measure is not a probability measure. The sketch below
follows the paper by Carlen-Loss [33], and it works for any parabolic equation
of the type
𝜕𝑡 𝑓𝑡 = [∇ ⋅ (𝐷(𝑥, 𝑡)∇) + b(𝑥, 𝑡) ⋅ ∇]𝑓𝑡
for any divergence free b and 𝐷(𝑥, 𝑡) ≥ 𝑐 > 0. For simplicity of notation, we
consider only the heat equation on ℝ𝑛
(13.56) 𝜕𝑡 𝑓𝑡 = Δ𝑓𝑡 .
138 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
as a matrix inequality for some positive constant 𝐾. Then, for any smooth function
𝑓 ∈ 𝐿2 (𝜇), we have
Recall that ⟨𝑓, 𝑔⟩𝜇 = ∫ 𝑓𝑔 d𝜇 denotes the scalar product and ⟨𝑓; 𝑔⟩𝜇 =
⟨𝑓, 𝑔⟩𝜇 − ⟨1, 𝑓⟩𝜇 ⟨1, 𝑔⟩𝜇 is the covariance. With a slight abuse of notation, we
𝑁
also use the notation ⟨F, G⟩𝜇 = ∑𝑖=1 ⟨𝐹𝑖 , 𝐺𝑖 ⟩𝜇 for the scalar product of any two
vector-valued functions F, G ∶ ℝ𝑁 → ℝ𝑁 ; this extended scalar product is used
in the right-hand side of (13.60).
Define
G(𝑡, 𝑥) ∶= (𝐺1 (𝑡, 𝐱), … , 𝐺𝑁 (𝑡, 𝐱)), 𝐺𝑗 (𝑡, 𝐱) ∶= 𝜕𝑥𝑗 [𝑒𝑡ℒ 𝑔(𝐱)].
From the decay estimate (13.31), it follows that the dynamics are mixing,
i.e., lim𝑡→∞ ⟨𝑓, 𝑒𝑡ℒ 𝑔⟩ = 0 for any smooth function 𝑓 with ∫ 𝑓 d𝜇 = 0. Thus for
140 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
13.6.1. LSI from the Wigner matrix point of view. Let 𝜇 = 𝜇G be the
standard Gaussian measure 𝜇 ∼ exp (−𝑁 Tr 𝐻 2 ) on symmetric matrices. Notice
that for this measure the family {𝑥𝑖𝑗 = 𝑁 1/2 ℎ𝑖𝑗 , 𝑖 ≤ 𝑗} is a collection of indepen-
dent standard Gaussian random variables (up to a factor 2). Hence the LSI holds
for every matrix element and by its tensorial property, i.e., Proposition 13.10,
the LSI holds for any function of the full matrix considered as a function of this
collection {𝑥𝑖𝑗 = 𝑁 1/2 ℎ𝑖𝑗 , 𝑖 ≤ 𝑗}. In particular, using the spectral gap estimate
(13.43) for the Gaussian variables 𝑥𝑖𝑗 , we have for any function 𝐹 = 𝐹(𝐻)
𝐶 2
(13.61) ⟨𝐹(𝐻); 𝐹(𝐻)⟩𝜇 ≤ 𝐶 ∑ ∫ |𝜕𝑥𝑖𝑗 𝐹(𝐻)|2 d𝜇 = ∑ ∫|𝜕ℎ𝑖𝑗 𝐹(𝐻)| d𝜇,
𝑖≤𝑗
𝑁 𝑖≤𝑗
where the additional 𝑁 −1 factor comes from the scaling ℎ𝑖𝑗 = 𝑁 −1/2 𝑥𝑖𝑗 .
13.6. REMARKS ON THE APPLICATIONS OF THE LSI TO RANDOM MATRICES 141
Expanding the square and using the perturbation formula (12.9) with real eigen-
vectors, we can compute
2
1 | 𝜕𝜆 |
∑ ∫|∑ 𝜕 𝑅(𝝀) 𝛼 || d𝜇
𝑁 𝑖≤𝑗 | 𝛼 𝜆𝛼 𝜕ℎ𝑖𝑗
2
1 2 | 𝜕𝜆 |
≤ ∑ ∫||∑ 𝜕𝜆𝛼 𝑅(𝝀) 𝛼 || d𝜇 =
𝑁 𝑖≤𝑗 2 − 𝛿𝑖𝑗 𝛼
𝜕ℎ𝑖𝑗
1 2 𝜕𝜆 𝜕𝜆𝛽
= ∑ ∑ ∫ 𝜕𝜆𝛼 𝑅(𝝀)𝜕𝜆𝛽 𝑅(𝝀) 𝛼 d𝜇
𝑁 𝑖≤𝑗 2 − 𝛿𝑖𝑗 𝛼,𝛽 𝜕ℎ𝑖𝑗 𝜕ℎ𝑖𝑗
2
= ∑[2 − 𝛿𝑖𝑗 ] ∑ ∫ 𝜕𝜆𝛼 𝑅(𝝀)𝜕𝜆𝛽 𝑅(𝝀)𝑢𝛼 (𝑖)𝑢𝛼 (𝑗)𝑢𝛽 (𝑖)𝑢𝛽 (𝑗) d𝜇
𝑁 𝑖≤𝑗 𝛼,𝛽
2
= ∑ ∑ ∫ 𝜕𝜆𝛼 𝑅(𝝀)𝜕𝜆𝛽 𝑅(𝝀)𝑢𝛼 (𝑖)𝑢𝛼 (𝑗)𝑢𝛽 (𝑖)𝑢𝛽 (𝑗) d𝜇
𝑁 𝑖,𝑗 𝛼,𝛽
2
(13.62) = ∑ ∫ |𝜕𝜆𝛼 𝑅(𝝀)|2 d𝜇,
𝑁 𝛼
where we have used the orthogonality property and the normalization conven-
tion of the eigenvectors in the last step.
We remark that the annoying factor [2 − 𝛿𝑖𝑗 ] can be avoided if we first con-
sider 𝜆𝛼 as a function of all {𝑥𝑖𝑗 ∶ 1 ≤ 𝑖, 𝑗 ≤ 𝑁} as independent variables. Then
the perturbation formula (12.9) becomes
𝜕𝜆𝛼 |
(13.63) | = 𝑢𝛼 (𝑖)𝑢𝛼 (𝑗),
𝜕ℎ𝑖𝑗 |ℎ𝑖𝑗 =ℎ𝑗𝑖
i.e., the derivative evaluated on the submanifold of Hermitian matrices. In this
way, we can keep the summation in (13.61) unrestricted, and up to a constant
factor, we will get the same final result as in (13.62).
In summary, we proved
𝐶
(13.64) ⟨𝐹; 𝐹⟩𝜇 = ⟨𝑅(𝝀(𝐻)); 𝑅(𝝀(𝐻))⟩𝜇 ≤ ∑ ∫ |𝜕𝜆𝛼 𝑅(𝝀)|2 d𝜇.
𝑁 𝛼
Notice that this argument holds for any generalized Wigner matrix as long as a
spectral gap estimate (13.43) holds for the distribution of every rescaled matrix
element 𝑁 1/2 ℎ𝑖𝑗 . Furthermore, it can be generalized for Wigner matrices with
Bernoulli random variables for which there is a spectral gap (and LSI) in discrete
form. Similar remarks apply to the following LSI estimates.
142 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
To see how good this estimate is, consider a local linear statistics of eigen-
values, i.e., the local average of 𝐾 consecutive eigenvalues with label near 𝑀, by
setting
1 𝛼−𝑀
𝑅(𝝀) ∶= ∑ 𝐴( )𝜆𝛼 ,
𝐾 𝛼 𝐾
where 𝐴 is a smooth function of compact support and 1 ≤ 𝑀, 𝐾 ≤ 𝑁. Comput-
ing the right-hand side of (13.64) and combining it with (13.61), we get
𝐶 𝛼−𝑀 𝐶
(13.65) ⟨𝐹; 𝐹⟩𝜇 ≤ ∑ 𝐴2 ( )≤ 𝐴
𝑁𝐾 2 𝛼 𝐾 𝑁𝐾
where the last constant 𝐶𝐴 depends on the function 𝐴. For 𝐾 = 1, this inequal-
ity estimates the square of the fluctuation of a single eigenvalue (choosing 𝐴
appropriately). The bound (13.65) is off by a factor 𝑁 since the true fluctua-
tion is of order almost 1/𝑁 by rigidity, Theorem 11.5, at least in the bulk, i.e.,
if 𝛿𝑁 ≤ 𝑀 ≤ (1 − 𝛿)𝑁. On the other hand, for 𝐾 = 𝑁 the bound (13.65) is
much more precise; it shows that the variance of a macroscopic average of the
eigenvalues is at most of order 𝑁 −2 . This is the correct order of magnitude; in
𝛼
fact, it is known that ∑𝛼 𝐴( 𝑁 )𝜆𝛼 converges to a Gaussian random variable (see,
e.g., [103,118] and references therein). Hence, the spectral gap argument yields
the optimal (up to a constant) result for any macroscopic average of eigenvalues.
Another common quantity of interest is the Stieltjes transform of the em-
pirical eigenvalue distribution, i.e.,
1 1
(13.66) 𝐹(𝐻) = 𝐺(𝝀(𝐻)) with 𝐺(𝝀) = 𝑚𝑁 (𝑧) = ∑ , 𝑧 = 𝐸 + 𝑖𝜂.
𝑁 𝛼 𝜆𝛼 − 𝑧
This shows that the variance of 𝑚𝑁 (𝑧) vanishes if 𝜂 ≫ 𝑁 −2/3 . Since vanishing
fluctuations can be used to estimate the density, this argument can actually be
made rigorous by a bootstrapping argument [61]. Notice that the scale 𝜂 ≫
𝑁 −2/3 is still far from the resolution demonstrated in the local semicircle law,
Theorem 6.7.
Whenever the LSI is available, the variance bounds can be easily lifted to a
concentration estimate with exponential tail. We now demonstrate this mech-
anism for the fluctuation of a single eigenvalue. In other words, we will apply
(13.46) with 𝐹(𝐻) = 𝜆𝛼 (𝐻) − 𝔼𝜇 𝜆𝛼 (𝐻) for a fixed 𝛼. From (12.9), we have
2
∑ |∇𝑥𝑖𝑗 𝐹|2 ≤ ;
𝑖≤𝑗
𝑁
13.6.2. LSI from the invariant ensemble point of view. Now we pass to
the second point of view, where the basic measure 𝜇𝐺 is the invariant ensemble
on the eigenvalues. One might hope that the situation can be improved since
we look directly at the Gaussian eigenvalue ensemble defined in (12.13) with
the Gaussian choice for 𝑉(𝜆) = 12 𝜆2 . Notice that the role of ℋ in Theorem 13.6
will be played by 𝑁ℋ𝑁 defined in (12.13) (with 𝛽 = 1). The Hessian of ℋ𝑁 is
given by (all inner products and norms in the following equation are w.r.t. the
standard inner product in ℝ𝑁 )
1 1 (𝑣𝑖 − 𝑣𝑗 )2 1
(13.70) (𝐯, ∇2 ℋ𝑁 (𝐱)𝐯) ≥ ‖𝐯‖2 + ∑ ≥ ‖𝐯‖2 , 𝐯 ∈ ℝ𝑁 ;
2 𝑁 𝑖<𝑗 (𝑥𝑖 − 𝑥𝑗 ) 2 2
thus the convexity bound (13.28) holds with a constant 𝐾 = 𝑁/2 for 𝑁ℋ𝑁 .
Hence, the spectral gap from (13.43) implies that for any function 𝑅(𝝀) we have
𝐶
(13.71) ⟨𝑅(𝝀); 𝑅(𝝀)⟩𝜇𝐺 ≤ ∑ ∫ |𝜕𝜆𝛼 𝑅(𝝀)|2 d𝜇𝐺 .
𝑁 𝛼
Notice that this bound is in the same form as in (13.64). A similar statement
holds for the LSI; i.e., we have
𝐶 2
(13.72) ∫ 𝑅 log 𝑅 d𝜇𝐺 ≤ ∑ ∫|𝜕𝜆𝛼 √𝑅(𝝀)| d𝜇𝐺 .
𝑁 𝛼
144 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
and define 𝜇𝛽 (𝐱) = 𝑍𝜇−1 𝑒−𝑁𝛽ℋ(𝐱) on the simplex Σ𝑁 , exactly as in (12.13) with
the Gaussian potential 𝑉(𝑥) = 12 𝑥2 and with any parameter 𝛽 > 0. As usual,
𝑍𝜇 is a normalization constant.
Recall from (12.4) that the DBM on the simplex Σ𝑁 is defined via the sto-
chastic differential equation
√2 1 1 1
(13.73) d𝑥𝑖 = d𝐵𝑖 + (− 𝑥𝑖 + ∑ )d𝑡 for 𝑖 = 1, … , 𝑁,
√𝛽𝑁 2 𝑁 𝑗≠𝑖 𝑥𝑖 − 𝑥𝑗
remark below (12.17)). On the other hand, certain results may be extended to
any 𝛽 > 0 if their final formulations do not involve DBM.
In this section, we present a regularization procedure to show that substan-
tial parts of the main results of Theorem 13.6, i.e., the LSI for any 𝛽 > 0 and
exponential relaxation decay of the entropy for 𝛽 ≥ 1, remain valid on the sim-
plex Σ𝑁 . A similar generalization holds for the Brascamp-Lieb inequality. In the
next section, the same regularization will be used to show that the key Dirichlet
form inequality (Theorem 14.3) also holds for 𝛽 > 0.
For later applications, we work with a slightly bigger class of measures than
just 𝜇𝛽 . We consider measures on Σ = Σ𝑁 of the form
−1 −𝛽𝑁ℋ̂
𝜔 = 𝑍𝜔 𝑒 𝜇𝛽 ,
where ℋ̂(𝐱) = ∑𝑗 𝑈𝑗 (𝑥𝑗 ) for some convex real valued functions 𝑈𝑗 on ℝ. The
total Hamiltonian of 𝜔 is ℋ𝜔 ∶= ℋ + ℋ̂. Note that 𝑈𝑗 are defined and convex
on the entire ℝ𝑁 . The entropy and the Dirichlet form are defined as before:
1
𝑆𝜔 (𝑓) = ∫ 𝑓 log 𝑓 d𝜔, 𝐷𝜔 (𝑓) = ∫ |∇𝑓|2 d𝜔.
Σ
𝛽𝑁 Σ
The corresponding DBM is given by
√2 1 1 1
(13.74) d𝑥𝑖 = d𝐵𝑖 + (− 𝑥𝑖 − 𝑈𝑖′ (𝑥𝑖 ) + ∑ )d𝑡.
√𝛽𝑁 2 𝑁 𝑥
𝑗≠𝑖 𝑖
− 𝑥𝑗
for some positive constant 𝐾 on the entire ℝ𝑁 . This bound (13.75) plays the
role of (13.28). Let 𝐷𝜔 , 𝑆𝜔 , and ℒ𝜔 denote the Dirichlet form, entropy, and
generator corresponding to the measure 𝜔. Now we claim that Theorem 13.6
holds for the measure 𝜔 on Σ𝑁 in the following form:
Theorem 13.14. Assume (13.75). Then for 𝛽 > 0, the LSI holds, i.e.,
2
(13.76) 𝑆𝜔 (𝑓) ≤ 𝐷 (√𝑓)
𝐾 𝜔
for any nonnegative normalized density 𝑓 on Σ𝑁 , ∫ 𝑓 d𝜔 = 1, that satisfies 𝑓 ∈
𝐿∞ and ∇√𝑓 ∈ 𝐿∞ . For 𝛽 ≥ 1 the requirement that 𝑓 and ∇√𝑓 are bounded can
be removed.
Moreover, the Brascamp-Lieb inequality also holds for any 𝛽 > 0; i.e., for any
bounded function 𝑓 ∈ 𝐿2 (Σ𝑁 , d𝜔) we have
″ −1
(13.77) ⟨𝑓; 𝑓⟩𝜔 ≤ ⟨∇𝑓, [ℋ𝜔 ] ∇𝑓⟩𝜔 .
For 𝛽 ≥ 1 the requirement that 𝑓 be bounded can be removed.
146 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
The inequalities above are understood in the usual sense that they are rel-
evant only when the right-hand side is finite. Moreover, by a standard density
argument, they extend to the closure of the corresponding spaces. For exam-
ple, (13.76) holds for any 𝑓 that can be approximated by a sequence of bounded
normalized densities 𝑓𝑛 ∈ 𝐿∞ with ∇√𝑓𝑛 ∈ 𝐿∞ such that 𝐷𝜔 (√𝑓𝑛 − √𝑓) → 0.
Before starting the formal proof, we explain the key idea. For the proofs of
(13.76) and (13.77), we extend the measure 𝜔 from Σ to the entire ℝ𝑁 in a con-
tinuous way by relaxing the strict ordering 𝑥𝑖 < 𝑥𝑖+1 imposed by Σ but heavily
penalizing the opposite order. In this way we can use Theorem 13.6 for the reg-
ularized measure and avoid the problematic boundary terms in the integration
by parts in (13.33). At the end we remove the regularization using the additional
boundedness assumptions on 𝑓 and ∇√𝑓. For 𝛽 ≥ 1 these additional assump-
tions are not necessary using that 𝐶0∞ (Σ) functions are dense in 𝐻 1 (Σ, d𝜔). The
entropy decay (13.78) will follow from the LSI and the time-integral version of
the entropy dissipation (13.32), which can be proven directly on Σ if 𝛽 ≥ 1.
We remark that there is an alternative regularization method that mimics
the proof of Theorem 13.6 directly on Σ for 𝛽 ≥ 1. This is based on introducing
carefully selected cutoff functions in order to approximate 𝑓𝑡 by a function com-
pactly supported on Σ. The compact support renders the boundary terms in the
integration by parts zero, but the cutoff does not commute with the dynamics;
the error has to be tracked carefully. The advantage of this alternative method
is that it also gives the exponential decay of the Dirichlet form, and it is also
the closest in spirit to the strategy of the proof of Theorem 13.6 on ℝ𝑁 . The
disadvantage is that it works only for 𝛽 ≥ 1; in particular, it does not yield the
LSI for 𝛽 ∈ (0, 1). We will not discuss this approach in this book; the interested
reader may find details in appendix B of [64].
where we set
𝑥−𝛿 1
log𝛿 (𝑥) ∶= 𝟏(𝑥 ≥ 𝛿) log 𝑥 + 𝟏(𝑥 < 𝛿)(log 𝛿 + − 2 (𝑥 − 𝛿)2 ), 𝑥 ∈ ℝ.
𝛿 2𝛿
13.7. EXTENSIONS TO THE SIMPLEX; REGULARIZATION OF THE DBM 147
where 𝜋 runs through all 𝑁-element permutations and acts by permuting the
coordinates of any point 𝐱 ∈ ℝ𝑁 . For any 𝐱 ∈ Σ̃ 𝑁 there is a unique 𝜋 so that
˜ by 𝑓
𝐱 ∈ 𝜋(Σ𝑁 ), and we then define the extension 𝑓 ˜ (𝐱) ∶= 𝑓(𝜋 −1 (𝐱)). Clearly,
148 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)
Now we let 𝛿 → 0. Using the boundedness of ∇[( 𝑓 ˜ )1/2 ], the weak convergence
of 𝜔𝛿 to 𝜔, and that 𝑍𝜔,𝛿 and 𝐶𝛿 converge to 1, the Dirichlet form on the right
side of the last inequality converges to 𝐷𝜔 (√𝑓). The first term on the left side
of the inequality converges to ∫ 𝑓 log 𝑓 d𝜔 by dominated convergence, and the
second term converges to 0. Thus we arrive at (13.76).
In the above argument, the boundedness of 𝑓 and ∇√𝑓 were only used to
ensure that 𝑓 or rather its extension 𝑓 ˜ has finite integral, and the Dirichlet
form w.r.t. the regularized measure 𝜔𝛿 , 𝐷𝜔𝛿 (√ 𝑓 ˜ ), converges to 𝐷𝜔 (√𝑓 ). For
𝛽 ≥ 1 we can remove these conditions by using a different extension of 𝑓 to
ℝ𝑁 if 𝑓 ∈ 𝐻 1 (Σ, d𝜔). We may assume that 𝐷𝜔 (√𝑓) < ∞; otherwise (13.76)
is a tautology. We first still assume that 𝑓 ∈ 𝐿∞ (Σ). We smoothly cut off 𝑓 to
be 0 at the boundary of Σ𝑁 ; i.e., we find a nonnegative sequence 𝑓𝜀 ∈ 𝐶0∞ (Σ𝑁 )
such that √𝑓𝜀 → √𝑓 in 𝐻 1 (Σ, d𝜔) and ∫ 𝑓𝜀 d𝜔 = 1. For 𝛽 ≥ 1 the existence of a
similar sequence but 𝑓𝜀 → 𝑓 in 𝐻 1 was shown in Section 12.4. In fact, the same
construction shows that we can also guarantee √𝑓𝜀 → √𝑓 in 𝐻 1 (Σ, d𝜔). Now
we use the LSI for the smooth functions 𝑓𝜀 , i.e.,
2
𝑆𝜔 (𝑓𝜀 ) ≤ 𝐷 (√𝑓𝜀 ),
𝐾 𝜔
and we let 𝜀 → 0. The right-hand side converges to 𝐷𝜔 (√𝑓) by the above choice
of 𝑓𝜀 . For the left-hand side, recall that apart from a smoothing that can be
dealt with via standard approximation arguments, the cutoff function was con-
structed in the form 𝑓𝜀 (𝐱) = 𝐶𝜀 𝜙𝜀 (𝐱)𝑓(𝐱), where 𝜙𝜀 (𝐱) ∈ (0, 1) with 𝜙𝜀 ↗ 1
monotonically pointwise and 𝐶𝜀 is a normalization such that 𝐶𝜀 → 1 as 𝜀 → 0.
Clearly,
2 2
(13.84) 𝐶𝑀 log 𝐶𝑀 ∫ 𝑓𝑀 d𝜔 + 𝐶𝑀 ∫ 𝑓𝑀 log 𝑓𝑀 d𝜔 ≤ 𝐶𝑀 ∫|∇√𝑓𝑀 | d𝜔.
𝐾
Now we let 𝑀 → ∞. The first term on the left is just log 𝐶𝑀 → 0. The second
term on the left converges to 𝑆𝜔 (𝑓) by monotone convergence and 𝐶𝑀 → 1 and
similarly the right-hand side converges to (2/𝐾)𝐷𝜔 (√𝑓) by monotone conver-
gence. This proves (13.76) for 𝛽 ≥ 1 without any additional condition on 𝑓.
The Brascamp-Lieb inequality, (13.77), is proved similarly, starting from its
regularized version on ℝ𝑁 ,
that follows directly from Theorem 13.13. We can then take the limit 𝛿 → 0 using
monotone convergence on the left and the dominated convergence on the right,
using that ℋ𝛿″ → ℋ and the inverse [ℋ𝛿″ + ℋ̂ ″ ]−1 is uniformly bounded.
For the third part of the theorem, for the proof of (13.78), we first note that
the remark after (12.17) applies to the generator ℒ𝜔 as well; i.e., 𝛽 ≥ 1 is nec-
essary for the well-posedness of the equation 𝜕𝑡 𝑓𝑡 = ℒ𝜔 𝑓𝑡 on Σ𝑁 with initial
condition 𝑓0 supported on Σ𝑁 . The construction of the dynamics in Section 12.4
also implies that 𝑓𝑡 ∈ 𝐻 1 (d𝜔) for any 𝑡 > 0 if 𝑓 ∈ 𝐿2 (d𝜔).
We now mimic the proof of the entropy dissipation (13.32) in our setup.
Since we do not know that 𝐷𝜔 (√𝑓𝑡 ) < ∞, we have to introduce a regularization
𝑐 > 0 to keep 𝑓𝑡 away from 0. We compute
d ℒ𝑓𝑡
∫ 𝑓𝑡 log(𝑓𝑡 + 𝑐)d𝜔 = ∫(ℒ𝑓𝑡 ) log(𝑓𝑡 + 𝑐)d𝜔 + ∫ 𝑓𝑡 d𝜔
d𝑡 𝑓𝑡 + 𝑐
|∇𝑓𝑡 |2 𝑐ℒ𝑓𝑡
= −∫ d𝜔 − ∫ d𝜔
𝑓𝑡 + 𝑐 𝑓𝑡 + 𝑐
|∇𝑓𝑡 |2 𝑐|∇𝑓𝑡 |2
(13.85) = −∫ d𝜔 − ∫ d𝜔.
𝑓𝑡 + 𝑐 (𝑓𝑡 + 𝑐)2
𝑓𝑡 +𝑐
Since 𝑓𝑡 log ≥ 0, we have
𝑓𝑡
𝑡
|∇𝑓𝑠 |2
(13.87) ∫ 𝑓𝑡 log 𝑓𝑡 d𝜔 + ∫ ∫ d𝑠 ≤ ∫ 𝑓0 log(𝑓0 + 𝑐)d𝜔.
0
𝑓𝑠 + 𝑐
Note that both terms on the left-hand side are nonnegative. Now we let 𝑐 → 0.
By monotone convergence and 𝑆𝜔 (𝑓0 ) < ∞, we get
𝑡
|∇𝑓𝑠 |2
(13.88) ∫ 𝑓𝑡 log 𝑓𝑡 d𝜇 + ∫ d𝑠 ∫ d𝜔 ≤ ∫ 𝑓0 log 𝑓0 d𝜔
0
𝑓𝑠
or
𝑡
(13.89) 𝑆𝜔 (𝑓𝑡 ) + 4 ∫ 𝐷𝜔 (√𝑓𝑠 )d𝑠 ≤ 𝑆𝜔 (𝑓0 ).
0
This is the entropy dissipation inequality in a time integral form. Notice that
neither equality nor the differential version as in (13.32) is claimed. Note that
by integrating (13.85) between 𝑡 and 𝜏, a similar argument yields
𝑡
(13.90) 𝑆𝜔 (𝑓𝑡 ) + 4 ∫ 𝐷𝜔 (√𝑓𝑠 )d𝑠 ≤ 𝑆𝜔 (𝑓𝜏 ), 𝑡 ≥ 𝜏 ≥ 0.
𝜏
In particular, the entropy decays:
(13.91) 0 ≤ 𝑆𝜔 (𝑓𝑡 ) ≤ 𝑆𝜔 (𝑓𝜏 ), 𝑡 ≥ 𝜏 ≥ 0.
Now we use the LSI (13.76) to estimate 𝐷𝜔 (√𝑓𝑠 ) in (13.90) and recall that
the LSI holds for any 𝑓𝑠 since 𝛽 ≥ 1. We get
𝑡
(13.92) 𝑆𝜔 (𝑓𝑡 ) + 2𝐾 ∫ 𝑆𝜔 (𝑓𝑠 )d𝑠 ≤ 𝑆𝜔 (𝑓𝜏 ), 𝑡 ≥ 𝜏 ≥ 0.
𝜏
A standard calculus exercise shows that 𝑆𝜔 (𝑓𝑡 ) ≤ 𝑒−2𝐾𝑡 𝑆𝜔 (𝑓0 ) for all 𝑡 ≥ 0. One
possible argument is to fix any 𝛿 > 0 and choose 𝜏 = (𝑛 − 1)𝛿, 𝑡 = 𝑛𝛿 with
𝑛 = 1, 2, … in (13.92). By monotonicity of the entropy, we have
(1 + 2𝐾𝛿)𝑆𝜔 (𝑓𝑛𝛿 ) ≤ 𝑆𝜔 (𝑓(𝑛−1)𝛿 )
for any 𝑛, and by iteration we obtain
𝑆𝜔 (𝑓𝑛𝛿 ) ≤ (1 + 2𝐾𝛿)−𝑛 𝑆𝜔 (𝑓0 ).
Setting 𝛿 = 𝑡/𝑛 and letting 𝑛 → ∞ we get (13.78). This completes the proof of
Theorem 13.14. □
CHAPTER 14
1
and explicitly given by ℒG = Δ − (∇ℋ) ⋅ ∇. The corresponding dynamics is
𝛽𝑁
given by (12.17), i.e.,
(14.3) 𝜕𝑡 𝑓𝑡 = ℒG 𝑓𝑡 , 𝑡 ≥ 0.
In this section, we will drop all subscripts G.
As remarked in the previous section, the Hamiltonian ℋ is convex since
the Hessian of the Hamiltonian of 𝜇 satisfies ∇2 (𝛽𝑁ℋ) ≥ 𝛽𝑁/2 by (13.70).
Taking the different normalization of the Dirichlet form in (13.25) and (14.1)
into account, Theorem 13.6 (actually, its extension to Σ𝑁 in Theorem 13.14 with
𝑈𝑗 ≡ 0) guarantees that 𝜇 satisfies the LSI in the form
with a constant 𝐶 uniformly in 𝑁. We also assume that after time 1/𝑁 the
solution of the equation (14.3) satisfies the bound
(14.5) 𝑆𝜇 (𝑓1/𝑁 ) ≤ 𝐶𝑁 𝑚
for some fixed 𝑚. Later in Lemma 14.6, we will show that for 𝛽 = 1, 2 this bound
automatically holds.
Theorem 14.1 (Gap universality of the Dyson Brownian motion for short
time [64, theorem 4.1]). Let 𝛽 ≥ 1 and assume (14.5). Fix 𝑛 ≥ 1 and an array
of positive integers, 𝐦 = (𝑚1 , … , 𝑚𝑛 ) ∈ ℕ𝑛 𝑛
+ . Let 𝐺 ∶ ℝ → ℝ be a bounded,
smooth function with compact support and define
(which is smaller than 𝑁 −1+𝜀 in the bulk by (11.32)). This standard fact will be
proved in Section 14.5.
As pointed out after Theorem 12.4, the input of this theorem, the a priori
estimate (14.4), identifies the location of the eigenvalues only on a scale 𝑁 −1+𝜉 ,
which is much weaker than the 1/𝑁 precision encoded in the rescaled eigen-
value differences in (14.7). Moreover, by the rigidity estimate (11.32), the a priori
estimate (14.4) holds for any 𝜉 > 0 if the initial data of the DBM is a general-
ized Wigner ensemble. Therefore, Theorem 14.1 holds for any 𝑡 ≥ 𝑁 −1+𝜀 for
any 𝜀 > 0. This establishes Dyson’s conjecture (described in Section 12.3) in the
sense of averaged gap distributions for any generalized Wigner matrices.
i.e., it is a quadratic confinement on the scale √𝜏 for each eigenvalue near its
classical location, where the parameter 0 < 𝜏 < 1 will be chosen later on. The
total Hamiltonian is given by
(14.11) ℋ̃ ∶= ℋ + ℋ̂
where ℋ is the Gaussian Hamiltonian given by (4.12) with 𝑉(𝑥) = 12 𝑥 2 . The
measure with Hamiltonian ℋ̃ ,
˜
(14.12) d𝜔 ∶= 𝜔(𝐱)d𝐱, 𝜔 ∶= 𝑒−𝛽𝑁ℋ̃ / 𝑍 ,
will be called the local relaxation measure.
154 14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION
The local relaxation flow is defined to be the flow with the generator char-
acterized by the natural Dirichlet form w.r.t. 𝜔, explicitly, ℒ̃:
𝑥𝑗 − 𝛾𝑗
(14.13) ℒ̃ = ℒ − ∑ 𝑏𝑗 𝜕𝑗 , 𝑏𝑗 = 𝑈𝑗′ (𝑥𝑗 ) = .
𝑗
𝜏
how to remedy this problem with the help of the regularization introduced in
Section 13.7.
The core of the proof is divided into three theorems. For the flow with
generator ℒ̃, we have the following estimates on the entropy and Dirichlet form.
Theorem 14.2. Let 𝛽 ≥ 1 be arbitrary. Consider the forward equation
(14.14) 𝜕𝑡 𝑞𝑡 = ℒ̃𝑞𝑡 , 𝑡 ≥ 0,
with the reversible measure 𝜔 defined in (14.12). Let the initial condition 𝑞0 satisfy
∫ 𝑞0 d𝜔 = 1. Then, we have the following estimates:
𝑁 2
2 1 (𝜕𝑖 √𝑞𝑡 − 𝜕𝑗 √𝑞𝑡 )
(14.15) 𝜕𝑡 𝐷𝜔 (√𝑞𝑡 ) ≤ − 𝐷𝜔 (√𝑞𝑡 ) − ∫ ∑ d𝜔,
𝜏 𝛽𝑁 2
𝑖,𝑗=1
(𝑥𝑖 − 𝑥𝑗 )2
∞ 𝑁 2
1 (𝜕𝑖 √𝑞𝑠 − 𝜕𝑗 √𝑞𝑠 )
(14.16) 2
∫ d𝑠 ∫ ∑ d𝜔 ≤ 𝐷𝜔 (√𝑞0 ),
𝛽𝑁 0 𝑖,𝑗=1
(𝑥𝑖 − 𝑥𝑗 )2
1
𝜕𝑡 𝐷𝜔 (ℎ𝑡 ) = 𝜕𝑡 ∫(∇ℎ)2 𝑒−𝛽𝑁ℋ̃ d𝐱
𝛽𝑁
(14.19)
2
≤− ∫ ∇ℎ(∇2 ℋ̃ )∇ℎ 𝑒−𝛽𝑁ℋ̃ d𝐱.
𝛽𝑁
In our case, (13.70) and (14.10) imply that the Hessian of ℋ̃ is bounded
from below as
1 1 1
(14.20) ∇ℎ(∇2 ℋ̃ )∇ℎ ≥ ∑(𝜕𝑗 ℎ)2 + ∑ (𝜕 ℎ − 𝜕𝑗 ℎ)2 .
𝜏 𝑗 2𝑁 𝑖,𝑗 (𝑥𝑖 − 𝑥𝑗 )2 𝑖
This proves (14.15) and (14.16). The rest can be proved by straightforward ar-
guments analogously to (13.32)–(13.37). □
Proof. We give the proof for the 𝛽 ≥ 1 case here since this is relevant for
Theorem 14.1. The general case 𝛽 > 0 with additional assumptions will be
discussed in Section 14.4. For simplicity of notation, we consider only the case
𝑛 = 1, 𝑚1 = 1, 𝒢𝑖,𝐦 (𝐱) = 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 )). Let 𝑞𝑡 satisfy
𝜕𝑡 𝑞𝑡 = ℒ̃𝑞𝑡 , 𝑡 ≥ 0,
with an initial condition 𝑞0 = 𝑞. We write
1
(14.22) ∫[ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))](𝑞 − 1)d𝜔 =
|𝐽| 𝑖∈𝐽
1
∫[ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))](𝑞 − 𝑞𝑡 )d𝜔
|𝐽| 𝑖∈𝐽
1
+ ∫[ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))](𝑞𝑡 − 1)d𝜔.
|𝐽| 𝑖∈𝐽
The second term in (14.22) can be estimated by (13.8), the decay of the entropy
(14.18), and the boundedness of 𝐺; this gives the second term in (14.21).
To estimate the first term in (14.22), by the evolution equation 𝜕𝑞𝑡 = ℒ̃𝑞𝑡
and the definition of ℒ̃ we have
1 1
∫ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))𝑞𝑡 d𝜔 − ∫ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))𝑞0 d𝜔 =
|𝐽| 𝑖∈𝐽 |𝐽| 𝑖∈𝐽
𝑡
1
∫ d𝑠 ∫ ∑ 𝐺 ′ (𝑁(𝑥𝑖 − 𝑥𝑖+1 ))[𝜕𝑖 𝑞𝑠 − 𝜕𝑖+1 𝑞𝑠 ]d𝜔.
0
|𝐽| 𝑖∈𝐽
From the Schwarz inequality and 𝜕𝑞 = 2√𝑞𝜕 √𝑞, the last term is bounded by
𝑡 1/2
𝑁2 2
2[∫ d𝑠 ∫ 2
∑[𝐺 ′ (𝑁(𝑥𝑖 − 𝑥𝑖+1 ))] (𝑥𝑖 − 𝑥𝑖+1 )2 𝑞𝑠 d𝜔]
0 ℝ𝑁
|𝐽| 𝑖∈𝐽
𝑡 1/2
(14.23) 1 1 2
× [∫ d𝑠 ∫ 2
∑ 2
[𝜕𝑖 √𝑞𝑠 − 𝜕𝑖+1 √𝑞𝑠 ] d𝜔]
0 ℝ𝑁
𝑁 𝑖 (𝑥𝑖 − 𝑥𝑖+1 )
1/2
𝐷𝜔 (√𝑞0 )𝑡
≤ 𝐶( ) ,
|𝐽|
where we have used (14.16) and that [𝐺 ′ (𝑁(𝑥𝑖 − 𝑥𝑖+1 ))]2 (𝑥𝑖 − 𝑥𝑖+1 )2 ≤ 𝐶𝑁 −2
due to 𝐺 being smooth and compactly supported. □
14.2. PROOF OF THEOREM 14.1 157
However, compared with this simple bound, the estimate (14.21) gains an ex-
tra factor |𝐽| ≍ 𝑁 in the denominator; i.e., it is in terms of the Dirichlet form
per particle. The improvement is due to the fact that the observable in (14.21)
depends only on the gap, i.e., difference of points. This allows us to exploit the
additional term (14.16) gained in the Bakry-Émery argument. This is a man-
ifestation of the general observation that gap observables behave much better
than point observables.
The final ingredient in proving Theorem 14.1 is the following entropy and
Dirichlet form estimates.
Theorem 14.4. Let 𝛽 ≥ 1 be arbitrary. Suppose that (13.70) holds and recall
the definition of 𝑄 from (12.18). Fix some (possibly 𝑁-dependent) 𝜏 > 0, and
consider the local relaxation measure 𝜔 with this 𝜏. Set 𝜓 ∶= 𝜔/𝜇 and let 𝑔𝑡 ∶=
𝑓𝑡 /𝜓 with 𝑓𝑡 solving the evolution equation (14.3). Suppose there is a constant 𝑚
such that
Fix any small 𝜀 > 0. Then, for any 𝑡 ∈ [𝜏𝑁 𝜀 , 𝑁] the entropy and the Dirichlet
form satisfy the estimates
Remark 14.5. It will not be difficult to check that if the initial data is given
by a Wigner ensemble, then (14.25) holds without any assumption (see Lemma
14.6).
Proof of Theorem 14.4. Using Lemma 13.11, we can compute the evo-
lution of the entropy 𝑆(𝑓𝑡 𝜇|𝜔) = 𝑆(𝑓𝑡 𝜇|𝜓𝜇) as
4
𝜕𝑡 𝑆(𝑓𝑡 𝜇|𝜔) = − ∑ ∫(𝜕𝑗 √𝑔𝑡 )2 𝜓 d𝜇 + ∫(ℒ𝑔𝑡 )𝜓 d𝜇
𝛽𝑁 𝑗
Since 𝜔 is ℒ̃-invariant and time independent, the middle term on the right-hand
side vanishes. From the Schwarz inequality and 𝜕𝑔 = 2√𝑔 𝜕 √𝑔, we have
≤ −2𝐷𝜔 (√𝑔𝑡 ) + 𝐶𝑁 𝑄𝜏 −2 .
2
Notice that (14.27) is reminiscent to (13.32) for the derivative of the entropy of
the measure 𝑔𝑡 𝜔 = 𝑓𝑡 𝜇 with respect to 𝜔. The difference is, however, that 𝑔𝑡
does not satisfy the evolution equation with the generator ℒ̃. The last term in
(14.27) expresses the error.
Together with the logarithmic Sobolev inequality (14.17), we have
(14.28) 𝜕𝑡 𝑆(𝑓𝑡 𝜇|𝜔) ≤ −2𝐷𝜔 (√𝑔𝑡 ) + 𝐶𝑁 2 𝑄𝜏 −2 ≤ −𝐶𝜏 −1 𝑆(𝑓𝑡 𝜇|𝜔) + 𝐶𝑁 2 𝑄𝜏 −2 .
Integrating the last inequality from 𝜏 to 𝑡 and using the assumption (14.25) and
𝑡 ≥ 𝜏𝑁 𝜀 , we have proved the first inequality of (14.26). Using this result and
integrating (14.27), we have
𝑡
(14.29) ∫ 𝐷𝜔 (√𝑔𝑠 )d𝑠 ≤ 𝐶𝑁 2 𝑄𝜏 −1 .
𝜏
Notice that
1 |∇(𝑔𝑠 𝜓)|2 d𝜔
𝐷𝜇 (√𝑓𝑠 ) = ∫
𝛽𝑁 𝑔𝑠 𝜓 𝜓
(14.30) 𝐶 |∇𝑔𝑠 |2
≤ ∫[ + |∇ log 𝜓|2 𝑔𝑠 ]d𝜔
𝛽𝑁 𝑔𝑠
= 𝐶𝐷𝜔 (√𝑔𝑠 ) + 𝐶𝑁 2 𝑄𝜏 −2
by a Schwarz inequality. Thus from (14.29), after restricting the integration and
using 𝑡 ≥ 2𝜏, we get
𝑡
2 −1
𝐶𝑁 𝑄𝜏 ≥ ∫ 𝐷𝜔 (√𝑔𝑠 )d𝑠
𝑡−𝜏
𝑡
(14.31) 1
≥∫ [ 𝐷 (√𝑓𝑠 ) − 𝐶𝑁 2 𝑄𝜏 −2 ]d𝑠
𝑡−𝜏
𝐶 𝜇
𝜏
≥ 𝐷𝜇 (√𝑓𝑡 ) − 𝐶𝑁 2 𝑄𝜏 −1 ,
𝐶
where, in the last step, we used that 𝐷𝜇 (√𝑓𝑡 ) is decreasing in 𝑡, which follows
from the convexity of the Hamiltonian of 𝜇 (see, e.g., (13.33)). Using the opposite
inequality 𝐷𝜔 (√𝑔𝑡 ) ≤ 𝐶𝐷𝜇 (√𝑓𝑡 ) + 𝐶𝑁 2 𝑄𝜏 −2 , which can be proven similarly to
(14.30), we obtain
𝐷𝜔 (√𝑔𝑡 ) ≤ 𝐶𝑁 2 𝑄𝜏 −2 ,
i.e., the second inequality of (14.26). □
14.2. PROOF OF THEOREM 14.1 159
Finally, we complete the proof of Theorem 14.1. For any given 𝑡 > 0 we now
choose 𝜏 ∶= 𝑡𝑁 −𝜀 and construct the local relaxation measure 𝜔 with this 𝜏 as in
(14.12). Set 𝜓 = 𝜔/𝜇 and let 𝑞 ∶= 𝑔𝑡 = 𝑓𝑡 /𝜓 be the density 𝑞 in Theorem 14.3.
We would like to apply Theorem 14.4, and for this purpose we need to verify the
assumption (14.25). By the definitions of 𝜔, 𝜇, and 𝜓 = 𝜔/𝜇, we have
Lemma 14.6. Let 𝛽 = 1, 2. Suppose the initial data 𝑓0 of the DBM is given by
the eigenvalue distribution of a Wigner matrix. Then for any 𝜏 > 0 we have
Proof. For simplicity, we consider only the case 𝛽 = 1, i.e., the real Wigner
matrices. Recall that the probability measure 𝑓𝜏 𝜇 is the same as the eigenvalue
distribution of the Gaussian divisible matrix (12.21):
where 𝜇𝐻𝜏 and 𝜇𝐻G are the laws of the matrix 𝐻𝜏 and 𝐻 G , respectively. Since
the laws of both 𝜇𝐻𝜏 and 𝜇𝐻G are given by the product of the laws of the matrix
elements, from the additivity of entropy (13.11), 𝑆(𝜇𝐻𝜏 |𝜇𝐻G ) is equal to the sum
of the relative entropies of the matrix elements. Recall that the variances of off-
diagonal and diagonal entries for GOE differ by a factor of 2. For simplicity of
notation, we consider only the off-diagonal terms. Let 𝛾 = 1 − e−𝜏 and denote
by 𝑔𝛼 the standard Gaussian distribution with variance 𝛼, i.e.,
1 𝑥2
𝑔𝛼 (𝑥) ∶= exp(− ).
√2𝜋𝛼 2𝛼
inequality yields
|
𝑆(𝜁𝜏 |𝑔2/𝑁 ) = 𝑆(∫ d𝑦 𝜚𝛾 (𝑦) 𝑔2𝛾/𝑁 (⋅ − 𝑦) || 𝑔2/𝑁 )
(14.36)
≤ ∫ d𝑦 𝜚𝛾 (𝑦)𝑆(𝑔2𝛾/𝑁 (⋅ − 𝑦)|𝑔2/𝑁 ).
𝑠 𝑦2 𝜎2 1
𝑆(𝑔𝜍2 (⋅ − 𝑦)|𝑔𝑠2 ) = log + 2+ 2− .
𝜎 2𝑠 2𝑠 2
In our case, we have
1 𝑁
𝑆(𝑔2𝛾/𝑁 (⋅ − 𝑦)|𝑔2/𝑁 ) = ( 𝑦 2 − log 𝛾 + 𝛾 − 1).
2 2
We can now continue the estimate (14.36). Using ∫ 𝑦 2 𝜚𝛾 (𝑦)d𝑦 = 2/𝑁, we obtain
We now return to the proof of Theorem 14.1. We can apply Lemma 14.6
1
with the choice 𝜏 = 𝑁 −1+2𝜉 , where 𝜉 ∈ (0, ) is from the assumption (14.4).
2
Together with (14.32)–(14.33), this implies that (14.25) holds. Thus, Theorem
14.4 and Theorem 14.3 imply for any 𝑡 ∈ [𝜏𝑁 𝜀 , 𝑁] that
1/2
| 1 | 𝐷𝜔 (√𝑞) 𝜀
|∫ ∑ 𝒢m,𝑖 (𝐱)(𝑓𝑡 d𝜇 − d𝜔)| ≤ 𝐶(𝑡 ) + 𝐶 √𝑆𝜔 (𝑞) 𝑒−𝑐𝑁
| 𝑁 𝑖∈𝐽 | |𝐽|
1/2
(14.37) 𝑁2𝑄 𝜀
≤ 𝐶(𝑡 ) + 𝐶𝑒−𝑐𝑁
|𝐽|𝜏 2
𝑁2𝑄 𝜀
≤ 𝐶𝑁 𝜀 + 𝐶𝑒−𝑐𝑁 ;
√ |𝐽|𝑡
i.e., the local statistics of 𝑓𝑡 𝜇 and 𝜔 are the same for any initial data 𝑓𝜏 .
Applying the same argument to the Gaussian initial data, 𝑓0 = 𝑓𝜏 = 1, we
can also compare 𝜇 and 𝜔. We have thus proved the estimate (14.7). Finally, if
𝑡 ≥ 𝑁 −1+2𝜉+𝛿+2𝜀 , then the assumption (14.4) guarantees that
𝑁2𝑄 1
≤ ,
√ |𝐽|𝑡 |𝐽|𝑁 𝛿−1
√
𝑁 2 𝑄𝛿 𝜀
𝐶𝑁 𝜀 + 𝐶𝑒−𝑐𝑁 , 𝑡 ∈ [𝜏𝑁 𝜀 , 𝑁],
√ |𝐽|𝑡
where 𝑄𝛿 is defined exactly as in (14.4) with the regularized measures, i.e.,
𝑁
1
(14.40) 𝑄𝛿 ∶= sup ∫ ∑ (𝑥𝑗 − 𝛾𝑗 )2 𝑓𝑡,𝛿 (𝐱)𝜇𝛿 (d𝐱).
0≤𝑡≤𝑁 𝑁 𝑗=1
𝐱
(14.42) ∫ 𝑂(𝐱)𝑓𝑡,𝛿 (𝐱)d𝜔𝛿 = 𝔼𝑓0 𝜔 𝔼𝛿0 𝑂(𝐱(𝑡)),
ℝ𝑁
𝐱
where 𝔼𝛿0 denotes the expectation with respect to the law of the regularized
DBM (𝐱(𝑡))𝑡 (see (13.82)) starting from 𝐱0 , and 𝔼𝑓0 𝜔 denotes the expectation
of 𝐱0 with respect to the measure 𝑓0 𝜔. Now we use the existence of the strong
solution to the DBM (13.74) for any 𝛽 ≥ 1, that can be obtained exactly as in
Theorem 12.2 for the 𝑈𝑗 ≡ 0 case. Since 𝐱(𝑡) is continuous and remains in the
open set Σ𝑁 , the probability that up to a fixed time 𝑡 it remains away from a
𝛿-neighborhood of the boundary of Σ𝑁 goes to 1 as 𝛿 → 0, i.e.,
Notice that (13.74) and (13.82) are exactly the same for paths that stay away from
the boundary of Σ𝑁 . This means that the right-hand side of (14.42) converges
to 𝔼𝑓0 𝜔 𝔼𝐱0 𝑂(𝐱(𝑡)), where 𝔼𝐱0 denotes expectation with respect to the law of
(13.74). This proves that
entropy decays along the regularized flow, we immediately have 𝑆𝜇𝛿 (𝑓𝜏,𝛿 ) ≤
𝑆𝜇𝛿 (𝑓0,𝛿 ) = 𝑆𝜇𝛿 (𝑓ˆ𝜍,𝐾 ). Since 𝑓ˆ𝜍,𝐾 is supported on Σ𝑁 and is bounded, clearly
𝑆𝜇𝛿 (𝑓ˆ𝜍,𝐾 ) converges to 𝑆𝜇 (𝑓ˆ𝜍,𝐾 ) as 𝛿 → 0. We then have
| 1 1 |
(14.44) |∫ ∑ 𝒢𝑖,𝐦 𝑞 d𝜔 − ∫ ∑ 𝒢𝑖,𝐦 d𝜔| ≤
| |𝐽| 𝑖∈𝐽 |𝐽| 𝑖∈𝐽 |
𝐷𝜔 (√𝑞) 𝑡
𝐶 + 𝐶 √𝑆𝜔 (𝑞) e−𝑐𝑡/𝜏 .
√ |𝐽|
For 𝛽 ≥ 1, the conditions ∇√𝑞 ∈ 𝐿∞ and 𝑞 ∈ 𝐿∞ (d𝜔) can be removed.
We emphasize that this lemma holds for any 𝛽 > 0; i.e., it does not (directly)
rely on the existence of the DBM. The parameter 𝑡 is not connected with the
time parameter of a dynamics on Σ𝑁 (although it emerges as a time cutoff in a
regularized dynamics on ℝ𝑁 within the proof).
| 1 1 |
(14.45) ||∫ ∑ 𝒢𝑖,𝐦 𝑞 ̂ d𝜔𝛿 − ∫ ∑ 𝒢𝑖,𝐦 d𝜔𝛿 || ≤
|𝐽| 𝑖∈𝐽
|𝐽| 𝑖∈𝐽
√
√ 𝐷𝜔𝛿 (√𝑞)̂ 𝑡
𝐶 + 𝐶 √𝑆𝜔𝛿 (𝑞)̂ e−𝑐𝑡/𝜏 .
√ |𝐽|
Suppose now that √𝑞 ∈ 𝐻 1 (d𝜔) and 𝑞 is a bounded probability density in
Σ𝑁 with respect to 𝜔. Similarly to the proof of (13.76) for the general 𝛽 > 0 case,
we may extend 𝑞 to Σ̃ by symmetrization. Let ˜ 𝑞 denote this extension, which
1/2
is bounded, and ∇ 𝑓˜ is also bounded since 𝑞 has these properties. Then there
is a constant 𝐶𝛿 such that 𝑞𝛿 ∶= 𝐶𝛿 ˜ 𝑞 is a probability density with respect to
𝑞 is bounded, we have ∫ℝ𝑁 ˜
𝜔𝛿 . Since ˜ 𝑞 d𝜔𝛿 → ∫Σ𝑁 𝑞 d𝜔 by dominated conver-
gence, and thus 𝐶𝛿 → 1 as 𝛿 → 0.
Now we apply (14.45) 𝑞ˆ = 𝑞𝛿 . Taking the limit 𝛿 → 0, the left-hand
side converges to that of (14.44) since 𝒢𝑖,m is a bounded smooth function and
𝑞𝛿 d𝜔𝛿 = 𝐶𝛿 ˜ 𝑞 d𝜔𝛿 converges weakly to 𝑞(𝜔)1(𝜔 ∈ Σ𝑁 )d𝜔 by dominated con-
vergence. We thus have
| 1 1 |
(14.46) |∫ ∑ 𝒢𝑖,𝐦 𝑞 d𝜔 − ∫ ∑ 𝒢𝑖,𝐦 d𝜔| ≤
| |𝐽| 𝑖∈𝐽 |𝐽| 𝑖∈𝐽 |
√
√ 𝐶𝛿 𝐷𝜔𝛿 (√ ˜
𝑞)𝑡
𝐶 lim sup 𝑞 ) e−𝑐𝑡/𝜏 .
+ 𝐶 lim sup √𝑆𝜔𝛿 (𝐶𝛿 ˜
𝛿→0 √ |𝐽| 𝛿→0
1/2 1/2
By dominated convergence, using that ∇ 𝑓 ˜ ∈ 𝐿∞ , we have 𝐷𝜔𝛿 ( 𝑓˜ ) →
𝐷𝜔 (√𝑓). Similarly, the entropy term also converges to 𝑆𝜔 (𝑞) by ˜ 𝑞 ∈ 𝐿∞ .
∞ ∞
This proves (14.44) under the condition 𝑞 ∈ 𝐿 , ∇√𝑞 ∈ 𝐿 . Finally, these
conditions can be removed for 𝛽 ≥ 1 by a simple approximation. We may assume
𝐷𝜔 (√𝑞) < ∞; otherwise, (14.44) is an empty statement. By the LSI (13.76), we
also know that 𝑆𝜔 (𝑞) < ∞. First, we still keep the assumption that 𝑞 ∈ 𝐿∞ .
Since 𝐶0∞ (Σ) is dense in 𝐻 1 (d𝜔) (see Section 12.4), we can find a sequence of
densities 𝑞𝑛 ∈ 𝐿∞ (Σ) such that ∇√𝑞𝑛 ∈ 𝐿2 (d𝜔) and √𝑞𝑛 → √𝑞 in 𝐿2 (d𝜔) and
∇√𝑞𝑛 → ∇√𝑞 in 𝐿2 (d𝜔). In fact, the construction in Section 12.4 guarantees
that, apart from an irrelevant smoothing, 𝑞𝑛 may be chosen of the form 𝑞𝑛 =
𝐶𝑛 𝜙𝑛 𝑞 where 𝜙𝑛 is a cutoff function with 0 ≤ 𝜙𝑛 ≤ 1, converging to 1 pointwise
and 𝐶𝑛 → 1. We know that (14.44) holds for every 𝑞𝑛 . Taking the limit 𝑛 → ∞,
the left-hand side converges since
| |
|∫ 𝑂(𝑞𝑛 − 𝑞)d𝜔| ≤ ‖𝑂‖∞ ‖√𝑞𝑛 − √𝑞‖𝐿2 (d𝜔) ‖√𝑞𝑛 + √𝑞‖𝐿2 (d𝜔) → 0,
| |
where we have used that ‖√𝑞𝑛 + √𝑞‖2 ≤ ‖√𝑞𝑛 ‖2 + ‖√𝑞‖2 and ‖√𝑞‖2 =
∫ 𝑞 d𝜔 = 1. Here, 𝑂 is given by (14.41). For the right-hand side of (14.44)
14.5. FROM GAP DISTRIBUTION TO CORRELATION FUNCTIONS 165
𝐸+𝑏 ′ 𝑁
d𝐸
= 𝐶𝑁,𝑛 ∫ ∫ ∑ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇
𝐸−𝑏
2𝑏 m∈𝑆 𝑖=1𝑛
(1) (2)
and note that |𝑆𝑛 (𝑀)| ≤ 𝑀 𝑛−1 . We have the simple bound Θ ≤ Θ𝑀 + Θ𝑀 +
(3)
Θ𝑀 where
(1) | 𝐸+𝑏 d𝐸 ′ 𝑁
|
(14.52) Θ𝑀 ∶= |∫ ∫ ∑ ∑ 𝑌𝑖,𝐦 (𝐸 ′ , 𝐱)(𝑓 − 1)d𝜇|
| 𝐸−𝑏 2𝑏 𝐦∈𝑆 (𝑀) 𝑖=1
|
𝑛
and
(2) | 𝐸+𝑏 d𝐸 ′ 𝑁
|
(14.53) Θ𝑀 ∶= ∑ |∫ ∫ ∑ 𝑌𝑖,𝐦 (𝐸 ′ , 𝐱)𝑓 d𝜇|.
|
𝐦∈𝑆c (𝑀) 𝐸−𝑏
2𝑏 𝑖=1 |
𝑛
(3) (2)
We define Θ𝑀 to be the same as Θ𝑀 but with 𝑓 replaced by the constant 1, i.e.,
the equilibrium measure 𝜇.
(1)
Step 1. Small 𝐦 case; estimate of Θ𝑀 . After performing the d𝐸 ′ integration,
we will eventually apply Theorem 14.1 to the function
The error term Ω+ 𝐽,𝐦 , defined by (14.55) indirectly, comes from those 𝑖 ∉ 𝐽
+
−1 ′
indices, for which 𝑥𝑖 ∈ [𝐸 − 𝑏, 𝐸 + 𝑏] + 𝑂(𝑁 ) since 𝑌𝑖,𝐦 (𝐸 , 𝐱) = 0 unless
|𝑥𝑖 − 𝐸 ′ | ≤ 𝐶/𝑁, the constant depending on the support of 𝑂. Thus,
|Ω+ −1
𝐽,𝐦 (𝐱)| ≤ 𝐶𝑁 #{𝑖 ∶ |𝑥𝑖 − 𝛾𝑖 | ≥ 𝜁/2, |𝑥𝑖 − 𝐸| ≤ 2𝑏}
for any sufficiently large 𝑁 assuming 𝜁 ≫ 1/𝑁 and using that 𝑂 is a bounded
function. The additional 𝑁 −1 factor comes from the d𝐸 ′ integration. Due to the
′
rigidity estimate (12.22) and choosing 𝜁 = 𝑁 −1+𝜉+𝜀 with some 𝜀′ > 0, we get
(14.56) ∫ |Ω+
𝐽,𝐦 (𝐱)|𝑓 d𝜇 ≤ 𝑁
−𝐷
168 14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION
(14.57)
= ∫ d𝐸 ′ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱) + 𝐶𝑁 −1 |𝐽 + ⧵ 𝐽 − | + Ξ+
𝐽,m (𝐱)
ℝ 𝑖∈𝐽−
≤ ∫ d𝐸 ′ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱) + 𝐶𝑁 −1 |𝐽 + ⧵ 𝐽 − |
ℝ 𝑖∈𝐽
+ 𝐶𝑁 −1 |𝐽 ⧵ 𝐽 − | + Ξ+
𝐽,m (𝐱)
where the error term Ξ+ 𝐽,m , defined by (14.57), comes from indices 𝑖 ∈ 𝐽 such
−
| 𝐸+𝑏 ′ 𝑁
(14.58) | ∫ d𝐸 ∫ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇
| 𝐸−𝑏 𝑖=1
1 |
−∫ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+𝑚2 ), … )𝑓 d𝜇| ≤ 𝐶𝜁 + 𝐶𝑁 −𝐷
𝑁 𝑖∈𝐽 |
for each m ∈ 𝑆𝑛 . The error term 𝑁 −𝐷 can be neglected. Adding up (14.58) for
all m ∈ 𝑆𝑛 (𝑀), we get
| 𝐸+𝑏 ′ 𝑁
(14.59) |∫ d𝐸 ∫ ∑ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇
| 𝐸−𝑏 m∈𝑆 (𝑀) 𝑖=1 𝑛
1 |
−∫ ∑ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+𝑚2 ), … )𝑓 d𝜇| ≤ 𝐶𝑀 𝑛−1 𝜁.
m∈𝑆𝑛 (𝑀)
𝑁 𝑖∈𝐽 |
Clearly, the same estimate holds for the equilibrium, i.e., if we set 𝑓 = 1 in
(14.59). Subtracting these two formulas and applying (14.47) to each summand
14.6. DETAILS OF THE PROOF OF LEMMA 14.8 169
(1) | 𝐸+𝑏 d𝐸 ′ 𝑁
|
Θ𝑀 = | ∫ ∫ ∑ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)(𝑓 d𝜇 − d𝜇)|
(14.60) | 𝐸−𝑏 2𝑏 m∈𝑆 (𝑀) 𝑖=1
|
𝑛
𝑛−1 −1 −1+𝜉+𝜀′
≤ 𝐶𝑀 (𝑏 𝑁 + 𝑏−1/2 𝑁 −𝛿/2 ),
(1)
where we have used that |𝐽| ≤ 𝐶𝑁𝑏. This completes the estimate of Θ𝑀 .
(2) (3)
Step 2. Large m case; estimate of Θ𝑀 and Θ𝑀 . For a fixed 𝑦 ∈ ℝ, ℓ > 0,
let
𝑁
ℓ ℓ
𝜒(𝑦, ℓ) ∶= ∑ 1(𝑥𝑖 ∈ [𝑦 − , 𝑦 + ])
𝑖=1
𝑁 𝑁
denote the number of points in the interval [𝑦 − ℓ/𝑁, 𝑦 + ℓ/𝑁]. Note that for a
fixed m = (𝑚2 , … , 𝑚𝑛 ), we have
𝑁
∑ |𝑌𝑖,m (𝐸 ′ , 𝐱)| ≤ 𝐶 ⋅ 𝜒(𝐸 ′ , ℓ) ⋅ 1(𝜒(𝐸 ′ , ℓ) ≥ 𝑚𝑛 )
𝑖=1
(14.61)
𝑁
≤ 𝐶 ∑ 𝑚 ⋅ 1(𝜒(𝐸 ′ , ℓ) ≥ 𝑚)
𝑚=𝑚𝑛
| 𝐸+𝑏 ′ 𝑁
|
(14.62) ∑ |∫ d𝐸 ∫ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇| ≤
|
m∈𝑆c (𝑀) 𝐸−𝑏 𝑖=1 |
𝑛
𝐸+𝑏 𝑁
𝐶∫ d𝐸 ′ ∑ 𝑚𝑛−1 ∫ 1(𝜒(𝐸 ′ , ℓ) ≥ 𝑚)𝑓 d𝜇.
𝐸−𝑏 𝑚=𝑀
The rigidity bound (12.22) clearly implies
We have completed the first two steps of the three-step strategy introduced in
Chapter 5, i.e., the local semicircle law and the universality of Gaussian divisible
ensembles (Theorem 12.4). In this section, we will complete this strategy by
proving a continuity result for the local correlation functions of the matrix OU
process in the following Theorem 15.2 and Lemma 15.3. This is Step 3a defined
in Chapter 5. From these results, we obtain a weaker version of Theorem 5.1;
namely, we get averaged energy universality of Wigner matrices but only on
scale 𝑏 ≥ 𝑁 −1/2+𝜀 . In Section 16.1, we will use the idea of “approximation
by a Gaussian divisible ensemble” and prove Theorem 5.1 down to any scale
𝑏 ≥ 𝑁 −1+𝜀 .
Theorem 15.1 ([68, theorem 2.2]). Let 𝐻 be an 𝑁 × 𝑁 real symmetric or
complex Hermitian Wigner matrix. In the Hermitian case we assume that the
real and imaginary parts are i.i.d. Suppose that the distribution 𝜈 of the rescaled
matrix elements √𝑁ℎ𝑖𝑗 satisfies the decay condition (5.6). Fix a small 𝜀 > 0,
an integer 𝑛 ≥ 1 and let 𝑂 ∶ ℝ𝑛 → ℝ be a continuous, compactly supported
function. Then, for any |𝐸| < 2 and 𝑏 ∈ [𝑁 −1/2+𝜀 , 𝑁 −𝜀 ], we have
𝐸+𝑏
1 (𝑛) (𝑛) 𝜶
(15.1) lim ∫ d𝐸 ′ ∫ d𝜶 𝑂(𝜶)(𝑝𝐻,𝑁 − 𝑝𝐺,𝑁 )(𝐸 ′ + ) = 0.
𝑁→∞ 2𝑏 𝑁
𝐸−𝑏 ℝ𝑛
To prove this theorem, we first recall the matrix OU process (12.1) defined
by
1 1
(15.2) d𝐻𝑡 = dB𝑡 − 𝐻𝑡 d𝑡
√𝑁 2
with the initial data 𝐻0 . The eigenvalue evolution of this process is the DBM and
recall that we denote the eigenvalue distribution at the time 𝑡 by 𝑓𝑡 d𝜇 with 𝑓𝑡
satisfying (12.17). In this section, we assume that the initial data 𝐻0 is an 𝑁 × 𝑁
Wigner matrix and the distribution of matrix element satisfies the uniform poly-
nomial decay condition (5.6). We have the following Green function continuity
theorem for the matrix OU process.
Theorem 15.2 (Continuity of Green function). Suppose that the initial data
𝐻0 is an 𝑁 × 𝑁 Wigner matrix with the distribution of matrix elements satisfying
the uniform polynomial decay condition (5.6). Let 𝜅 > 0 be arbitrary and suppose
171
172 15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS
that for some small parameter 𝜎 > 0 and for any 𝑦 ≥ 𝑁 −1+𝜍 we have the following
estimate on the diagonal elements of the resolvent for any 0 ≤ 𝑡 ≤ 1:
| 1 |
(15.3) max max ||( ) | ≺ 𝑁 2𝜍
1≤𝑘≤𝑁 |𝐸|≤2−𝜅 𝐻𝑡 − 𝐸 − 𝑖𝑦 𝑘𝑘 |
and
(15.5) max{|𝜕 𝛼 𝐹(𝑥1 , … , 𝑥𝑛 )| ∶ max |𝑥𝑗 | ≤ 𝑁 2 } ≤ 𝑁 𝐶0
𝑗
𝑁 −1−𝜍 to the real axis for some 𝜎 > 0, i.e., below the scale of the eigenvalue
spacing. Controlling the Stieltjes transform on this very short scale is necessary
to identify and compare local correlation functions at the scale 𝑁 −1 .
The following theorem is a slightly modified version of [69, theorem 6.4].
Theorem 15.3 (Correlation function comparison). Let 𝜅 > 0 be arbitrary,
and suppose that for some small parameters 𝜎, 𝛿 > 0 the following two conditions
hold:
(i) For any 𝜀 > 0 and any 𝑘 integer
(15.7) 𝔼[Im 𝑚𝐯 (𝐸 + 𝑖𝑁 −1+𝜀 )]𝑘 + 𝔼[Im 𝑚𝐰 (𝐸 + 𝑖𝑁 −1+𝜀 )]𝑘 ≤ 𝐶
holds for any |𝐸| ≤ 2 − 𝜅 and 𝑁 ≥ 𝑁0 (𝜀, 𝑘, 𝜅).
(ii) For any sequence 𝑧𝑗 = 𝐸𝑗 + 𝑖𝜂𝑗 , 𝑗 = 1, … , 𝑛, with |𝐸𝑗 | ≤ 2 − 𝜅 and
𝜂𝑗 = 𝑁 −1−𝜍𝑗 for some 𝜎𝑗 ≤ 𝜎, we have
(15.8) |𝔼(Im 𝑚𝐯 (𝑧1 ) ⋯ Im 𝑚𝐯 (𝑧𝑛 )) − 𝔼(Im 𝑚𝐰 (𝑧1 ) ⋯ Im 𝑚𝐰 (𝑧𝑛 ))| ≤ 𝑁 −𝛿 .
Then, for any integer 𝑛 ≥ 1 there are positive constants 𝑐𝑛 = 𝑐𝑛 (𝜎, 𝛿) such that
for any |𝐸| ≤ 2 − 2𝜅 and for any 𝐶 1 -function 𝑂 ∶ ℝ𝑛 → ℝ with compact support,
(𝑛) (𝑛) 𝜶
(15.9) ∫ d𝜶 𝑂(𝜶)(𝑝𝐯,𝑁 − 𝑝𝐰,𝑁 )(𝐸 + ) ≤ 𝐶𝑁 −𝑐𝑛
ℝ𝑛
𝑁
where 𝐶 depends on 𝑂 and 𝑁 is sufficiently large.
We remark that in some applications we will use slightly different condi-
tions. Instead of (15.8) we may assume
provided that 𝑘 is large enough. Similar arguments can be used for general
𝑛, and this implies that the difference between the expectation of 𝐹 and the
product is negligible. This proves that the condition (15.8) can be replaced by
the condition (15.10).
Notice that we pass through the continuity of traces of Green functions as an
intermediate step to get continuity of the correlation functions. If we choose to
follow the evolution of the correlation functions directly by differentiating them,
it will involve higher derivatives of eigenvalues. From the formulas of derivatives
of eigenvalues and eigenvectors (12.9), (12.10), higher derivatives of eigenvalues
involve singularities of the form (𝜆𝑖 − 𝜆𝑗 )−𝑛 for some positive integers 𝑛. These
singularities are very difficult to control precisely. Our approach to use the Green
function as an intermediate step completely avoids this difficulty because the
Green function has a natural regularization parameter, the imaginary part of
the spectral parameter.
Proof of Theorem 15.1. Recall that the matrix OU (15.2) can be solved
by the formula (12.21) so that the probability distribution of the matrix OU is
given by a Wigner matrix ensemble if the initial data is a Wigner matrix en-
semble. More precisely, if we denote the initial Wigner matrix by 𝐻0 , then the
distribution of 𝐻𝑡 is the same as 𝑒−𝑡/2 𝐻0 + (1 − 𝑒−𝑡 )1/2 𝐻G . Hence, the rigidity
holds in this case by (11.32), and (15.3) holds with any sufficiently small 𝜎 > 0;
(𝑛)
we may choose 𝜎 = 𝜀1 . Recall that 𝑝𝑡,𝑁 denotes the correlation functions of 𝐻𝑡 .
We now apply Theorem 15.3 with 𝐻 𝐯 = 𝐻0 and 𝐻 𝐰 = 𝐻𝑡 . The assumption
(15.8) can be verified by (15.6) if 𝑡 ≤ 𝑁 −1/2−𝜀 for any 𝜀 > 0. From (15.9), we
have
𝐸+𝑏
d𝐸 ′ (𝑛) (𝑛) 𝜶
(15.13) lim ∫ ∫ d𝜶 𝑂(𝜶)(𝑝0,𝑁 − 𝑝𝑡,𝑁 )(𝐸 ′ + ) = 0.
𝑁→∞
𝐸−𝑏
2𝑏 ℝ𝑛 𝑁
This compares the correlation functions of 𝐻0 and 𝐻𝑡 if 𝑡 is not too large. To
compare 𝐻𝑡 with 𝐻∞ = 𝐻G , by (12.24), we have for 𝑡 = 𝑁 −1/2−𝜀 and 𝑏 ≥
(𝑛)
𝑁 −1/2+10𝜀 that (15.13) holds with 𝑝0,𝑁 (which are the correlation functions of
(𝑛)
𝐻0 ) replaced by 𝑝𝐺,𝑁 . We have thus completed the proof of Theorem 15.1. □
Lemma 15.4. Suppose that 𝐻0 is a Wigner ensemble and fix 𝑡 ∈ [0, 1]. Let 𝑔
be a smooth function of the matrix elements (ℎ𝑖𝑗 )𝑖≤𝑗 and set
(15.14) 𝑀𝑡 ∶= sup sup sup 𝔼((𝑁 3/2 |ℎ𝑖𝑗 (𝑠)3 | + √𝑁|ℎ𝑖𝑗 (𝑠)|)|𝜕ℎ3 𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑠 )|)
0≤𝑠≤𝑡 𝑖≤𝑗 𝜽𝑖𝑗
where the last supremum runs through all deformations 𝜽𝑖𝑗 . Then,
This lemma also holds for generalized Wigner matrices. We refer the reader
to [12] for the minor adjustment needed to this case.
Proof. Denote 𝜕𝑖𝑗 = 𝜕ℎ𝑖𝑗 ; notice that despite the two indices, this is still a
first- and not a second-order partial derivative. By Itô’s formula, we have
1 2
𝜕𝑡 𝔼𝑔(𝐻𝑡 ) = − ∑(𝔼(ℎ𝑖𝑗 (𝑡)𝜕𝑖𝑗 𝑔(𝐻𝑡 )) − 𝔼(𝜕𝑖𝑗 𝑔(𝐻𝑡 ))).
𝑖≤𝑗
2𝑁
A Taylor expansion of the first derivative 𝜕𝑖𝑗 𝑔 in the direction ℎ𝑖𝑗 yields
2
𝔼(ℎ𝑖𝑗 (𝑡)𝜕𝑖𝑗 𝑔(𝐻𝑡 )) = 𝔼ℎ𝑖𝑗 (𝑡)𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 + 𝔼(ℎ𝑖𝑗 (𝑡)2 𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 )
3
+ O(sup 𝔼(|ℎ𝑖𝑗 (𝑡)3 𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑡 )|))
𝜽𝑖𝑗
2 3
= 𝑠𝑖𝑗 𝔼(𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 ) + O(sup 𝔼(|ℎ𝑖𝑗 (𝑡)3 𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑡 )|)),
𝜽𝑖𝑗
2 2 3
𝔼(𝜕𝑖𝑗 𝑔(𝐻𝑡 )) = 𝔼(𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 ) + O(sup 𝔼(|ℎ𝑖𝑗 (𝑡)(𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻))|)).
𝜽𝑖𝑗
Here the shorthand notation 𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 means (𝜕𝑖𝑗 𝑔)(𝐻 ˜𝑡 ), where (𝐻 ˜𝑡 )𝑘ℓ =
˜𝑡 )𝑖𝑗 = (𝐻
(𝐻𝑡 )𝑘ℓ if {𝑘, ℓ} ≠ {𝑖, 𝑗} and (𝐻 ˜𝑡 )𝑗𝑖 = 0. In the calculation above, we
used the independence of the matrix elements, in particular, that 𝐻 ˜ 𝑡 is indepen-
dent of ℎ𝑖𝑗 (𝑡). We also used the fact that the OU process preserves the first and
second moments, i.e., 𝔼ℎ𝑖𝑗 (𝑡) = 𝔼ℎ𝑖𝑗 (0) = 0 and 𝔼|ℎ𝑖𝑗 (𝑡)|2 = 𝔼|ℎ𝑖𝑗 (0)|2 = 1/𝑁.
Thus we have
3
𝜕𝑡 𝔼𝑔(𝐻𝑡 ) = 𝑁 1/2 O(sup sup 𝔼(𝑁 3/2 |ℎ𝑖𝑗 (𝑡)3 | + 𝑁 1/2 |ℎ𝑖𝑗 (𝑡)|)|𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑡 )|).
𝑖≤𝑗 𝜽𝑖𝑗
Lemma 15.5. Suppose for a Wigner matrix 𝐻 we have the following estimate:
| 1 |
(15.15) max sup sup ||Im( ) || ≺ 𝑁 3𝜍+𝜀 .
1≤𝑘≤𝑁 |𝐸|≤2−𝜅 𝜂≥𝑁−1−𝜀 𝐻 − 𝐸 ± 𝑖𝜂 𝑘𝑘
1
≤ 𝐶 ∑ Im( ) .
𝑛
𝐻 − 𝐸 − 𝑖2𝑛 𝜂 𝑗𝑗
Now using (15.15) we can control the right-hand side of (15.17) and conclude
(15.16). □
Now we can finish the proof of Theorem 15.2. First note that from the trivial
bound
1 𝑦 1
(15.19) Im( ) ≤ ( ) Im( ) , 𝜂 ≤ 𝑦,
𝐻 − 𝐸 − 𝑖𝜂 𝑗𝑗 𝜂 𝐻 − 𝐸 − 𝑖𝑦 𝑗𝑗
and (15.3), the assumption (15.15) in Lemma 15.5 holds. Therefore, the bounds
(15.16) on the matrix elements are available.
15.2. PROOF OF THE CORRELATION FUNCTION COMPARISON THEOREM 177
1 𝛽 − 𝛼1 𝛽 − 𝛼3
𝑂𝜂 (𝛽1 , 𝛽2 , 𝛽3 ) ∶= ∫ d𝛼 d𝛼 d𝛼 𝑂(𝛼1 , 𝛼2 , 𝛼3 )𝜃𝜂 ( 1 ) ⋯ 𝜃𝜂 ( 3 )
𝑁 3 ℝ3 1 2 3 𝑁 𝑁
be its smoothing on scale 𝑁𝜂. We note that for any nonnegative 𝐶 1 function 𝑂
with compact support, there is a constant depending on 𝑂 such that
We can apply this bound with 𝜂′ = 1/𝑁 and combine it with (15.11) with 𝜂1 =
1/𝑁 and 𝜂2 = 𝑁 −1+𝜀 for any small 𝜀 > 0 to have
(3) 𝛽1 𝛽
∫ d𝛽1 d𝛽2 d𝛽3 𝑂𝜂 (𝛽1 , 𝛽2 , 𝛽3 )𝑝𝐰,𝑁 (𝐸 + ,…,𝐸 + 3)
ℝ3
𝑁 𝑁
(3)
(15.27) ∫ d𝑥1 d𝑥2 d𝑥3 𝑝𝐰,𝑁 (𝑥1 , 𝑥2 , 𝑥3 )𝜃𝜂 (𝑥1 − 𝐸1 )𝜃𝜂 (𝑥2 − 𝐸2 )𝜃𝜂 (𝑥3 − 𝐸3 ) ≤
3
𝐶𝔼𝐰 ∏ Im 𝑚(𝐸𝑗 + 𝑖𝜂).
𝑗=1
(3)
The same bound holds for 𝑝𝐯,𝑁 as well. Therefore, we can combine (15.23) and
(15.7) to obtain for any small 𝜀 > 0
(3) (3) 𝛽1 𝛽
(15.28) ∫ d𝛽1 d𝛽2 d𝛽3 𝑂𝜂 (𝛽1 , 𝛽2 , 𝛽3 )(𝑝𝐰,𝑁 + 𝑝𝐯,𝑁 )(𝐸 + , … , 𝐸 + 3 ) = 𝑂(𝑁 3𝜀 ).
ℝ3
𝑁 𝑁
To approximate 𝔼𝐰 𝐵3 , we define 𝜙𝐸1 ,𝐸2 (𝑥) = 𝜃𝜂 (𝑥 − 𝐸1 )𝜃𝜂 (𝑥 − 𝐸2 ). Recall
𝜂 = 𝑁 −1−𝑎 and let 𝜂 ˆ = 𝑁 −1−9𝑎 . Decompose 𝜃 ˆ𝜂ˆ = 𝜃ˆ1 + 𝜃ˆ2 where 𝜃 ˆ2 (𝑦) =
ˆ𝑁 3𝑎 ). Denote
𝜃𝜂ˆ (𝑦)1(|𝑦| ≥ 𝜂
Now we show that the contribution of this error term to the right-hand side of
(15.24) is negligible. Recalling 𝐸1 = 𝐸 +𝛼1 /𝑁 and using ∫ d𝛼1 𝜃𝜂 (𝑦 −𝐸1 ) ≤ 𝐶𝑁,
15.2. PROOF OF THE CORRELATION FUNCTION COMPARISON THEOREM 181
we have
|
|∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 )
| ℝ3
|
× ∫ d𝑦𝜃𝜂 (𝑦 − 𝐸1 )𝜃𝜂 (𝑦 − 𝐸2 )𝔼𝐰 𝑁 −3 ∑ 𝑁 −3𝑎 𝜃𝑁3𝑎 𝜂ˆ (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 )|
𝑖,𝑘 |
(15.33)
≤ 𝐶𝑁 −3𝑎 ∫ d𝛼2 d𝛼3 𝔼𝐰 𝑁 −2 ∑ 𝜃𝜂 ∗ 𝜃𝑁3𝑎 𝜂ˆ (𝜆𝑖 − 𝐸2 ) ∑ 𝜃𝜂 (𝜆𝑘 − 𝐸3 )
|𝛼2 |+|𝛼3 |≤𝐶 𝑖 𝑘
+ 𝑂(𝑁 −𝑎 ).
| |
|∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 )[𝔼𝐰 − 𝔼𝐯 ]𝐵3 |
| ℝ3 |
| 𝐰 𝐯 −2 |
≤ 𝐶 ∫ d𝑦 𝜙𝐸1 ,𝐸2 (𝑦)||[𝔼 − 𝔼 ]𝑁 ∑ 𝜃𝜂ˆ (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 )||
𝑖,𝑘
(15.35) −𝑎
+ 𝑂(𝑁 )
≤ 𝐶(𝑁𝜂)−1 𝑁 −𝛿 + 𝑂(𝑁 −𝑎 )
≤ 𝑂(𝑁 −𝑎 )
and
(16.4) max{|𝜕 𝛼 𝐹(𝑥1 , … , 𝑥𝑛 )| ∶ max |𝑥𝑗 | ≤ 𝑁 2 } ≤ 𝑁 𝐶0
𝑗
Let 𝐸 (𝑖𝑗) denote the matrix whose matrix elements are 0 everywhere except at
(𝑖𝑗)
the (𝑖, 𝑗) position, where it is 1, i.e., 𝐸𝑘ℓ = 𝛿𝑖𝑘 𝛿𝑗ℓ . Fix an 𝛾 ≥ 1 and let (𝑖, 𝑗)
be determined by 𝜙(𝑖, 𝑗) = 𝛾. We will compare 𝐻𝛾−1 with 𝐻𝛾 . Note that these
two matrices differ only in the (𝑖, 𝑗) and (𝑗, 𝑖) matrix elements, and they can be
written as
1
𝐻𝛾−1 = 𝑄 + 𝑉, 𝑉 ∶= 𝑣𝑖𝑗 𝐸 (𝑖𝑗) + 𝑣𝑗𝑖 𝐸 (𝑗𝑖) ,
√𝑁
1
𝐻𝛾 = 𝑄 + 𝑊, 𝑊 ∶= 𝑤𝑖𝑗 𝐸 (𝑖𝑗) + 𝑤𝑗𝑖 𝐸 (𝑗𝑖) ,
√𝑁
with a matrix 𝑄 that has zero matrix element at the (𝑖, 𝑗) and (𝑗, 𝑖) positions and
where we set 𝑣𝑗𝑖 ∶= 𝑣 𝑖𝑗 for 𝑖 < 𝑗 and similarly for 𝑤. Define the Green functions
1 1
𝑅 ∶= , 𝑆 ∶= .
𝑄−𝑧 𝐻𝛾 − 𝑧
We first claim that the estimate (15.16) holds for the Green function 𝑅 as
well. To see this, we have, from the resolvent expansion,
𝑅 = 𝑆 + 𝑁 −1/2 𝑆𝑉𝑆 + ⋯ + 𝑁 −9/5 (𝑆𝑉)9 𝑆 + 𝑁 −5 (𝑆𝑉)10 𝑅.
Since 𝑉 has only at most two nonzero elements, when computing the (𝑘, ℓ)
matrix element of this matrix identity, each term is a finite sum involving matrix
186 16. UNIVERSALITY OF WIGNER MATRICES IN SMALL ENERGY WINDOWS: GFT
elements of 𝑆 or 𝑅 and 𝑣𝑖𝑗 , e.g., (𝑆𝑉𝑆)𝑘ℓ = 𝑆𝑘𝑖 𝑣𝑖𝑗 𝑆𝑗ℓ + 𝑆𝑘𝑗 𝑣𝑗𝑖 𝑆𝑖ℓ . Using the
bound (15.16) for the 𝑆 matrix elements, the subexponential decay for 𝑣𝑖𝑗 , and
the trivial bound |𝑅𝑖𝑗 | ≤ 𝜂 −1 , we obtain that the estimate (15.16) holds for 𝑅.
We can now start proving the main result by comparing the resolvents of
(𝛾−1)
𝐻 and 𝐻 (𝛾) with the resolvent 𝑅 of the reference matrix 𝑄. By the resolvent
expansion,
𝑆 = 𝑅 − 𝑁 −1/2 𝑅𝑉𝑅 + 𝑁 −1 (𝑅𝑉)2 𝑅 − 𝑁 −3/2 (𝑅𝑉)3 𝑅
(16.8)
+ 𝑁 −2 (𝑅𝑉)4 𝑅 − 𝑁 −5/2 (𝑅𝑉)5 𝑆,
so we can write
4
1 (𝑚)
Tr 𝑆 = 𝑅ˆ + 𝜉, 𝜉 ∶= ∑ 𝑁 −𝑚/2 𝑅ˆ𝐯 + 𝑁 −5/2 Ω𝐯
𝑁 𝑚=1
with
1 (𝑚) 1
𝑅ˆ ∶= Tr 𝑅, 𝑅ˆ𝐯 ∶= (−1)𝑚 Tr (𝑅𝑉)𝑚 𝑅,
(16.9) 𝑁 𝑁
1
Ω𝐯 ∶= − Tr (𝑅𝑉)5 𝑆.
𝑁
For each diagonal element in the computation of these traces, the contribution
ˆ 𝑅ˆ(𝑚)
to 𝑅, 𝐯 , and Ω𝐯 is a sum of a few terms, e.g.,
(2) 1
𝑅ˆ𝐯 = ∑[𝑅𝑘𝑖 𝑣𝑖𝑗 𝑅𝑗𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘 + 𝑅𝑘𝑖 𝑣𝑖𝑗 𝑅𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘
𝑁 𝑘
+ 𝑅𝑘𝑗 𝑣𝑗𝑖 𝑅𝑖𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 + 𝑅𝑘𝑗 𝑣𝑗𝑖 𝑅𝑖𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘 ]
and similar formulas hold for the other terms. Then we have
1 1
𝔼𝐹( Tr )
𝑁 𝐻𝛾 − 𝑧
= 𝔼𝐹(𝑅ˆ + 𝜉)
(16.10)
= 𝔼[𝐹(𝑅ˆ) + 𝐹 ′ (𝑅ˆ)𝜉 + 𝐹 ′′ (𝑅ˆ)𝜉 2 + ⋯ + 𝐹 (5) (𝑅ˆ + 𝜉 ′ )𝜉 5 ]
5
(𝑚)
= ∑ 𝑁 −𝑚/2 𝔼𝐴𝐯
𝑚=0
(𝑚)
where 𝜉 ′ is a number between 0 and 𝜉 that depends on 𝑅ˆ and 𝜉, and the 𝐴𝐯 ’s
are defined as
(0) (1) (1) (2) (1) (2)
ˆ
𝐴𝐯 = 𝐹(𝑅), ˆ 𝑅ˆ𝐯 ,
𝐴𝐯 = 𝐹 ′ (𝑅) ˆ 𝑅ˆ𝐯 )2 + 𝐹 ′ (𝑅)
𝐴𝐯 = 𝐹 ″ (𝑅)( ˆ 𝑅ˆ𝐯 ,
(3) (4)
and similarly for 𝐴𝐯 and 𝐴𝐯 . Finally,
(5)
ˆ + 𝐹 (5) (𝑅ˆ + 𝜉 ′ )(𝑅ˆ(1)
𝐴𝐯 = 𝐹 ′ (𝑅)Ω 5
𝐯 ) + ⋯.
16.1. GREEN FUNCTION COMPARISON THEOREMS 187
(𝑚)
The expectation values of the terms 𝐴𝐯 , 𝑚 ≤ 4, with respect to 𝑣𝑖𝑗 are deter-
mined by the first four moments of 𝑣𝑖𝑗 ; e.g.,
(2) 1
𝔼𝐴𝐯 = 𝐹 ′ (𝑅ˆ )[ ∑ 𝑅 𝑅 𝑅 + ⋯ ]𝔼|𝑣𝑖𝑗 |2
𝑁 𝑘 𝑘𝑖 𝑗𝑗 𝑖𝑘
1
+ 𝐹 ″ (𝑅ˆ )[ ∑ 𝑅 𝑅 𝑅 𝑅 + ⋯ ]𝔼|𝑣𝑖𝑗 |2
𝑁 2 𝑘,ℓ 𝑘𝑖 𝑗ℓ ℓ𝑗 𝑖𝑘
1
+ 𝐹 ′ (𝑅ˆ )[ ∑ 𝑅𝑘𝑖 𝑅𝑗𝑖 𝑅𝑗𝑘 + ⋯ ]𝔼 𝑣𝑖𝑗
2
𝑁 𝑘
1
+ 𝐹 ″ (𝑅ˆ )[ 2
∑ 𝑅 𝑅 𝑅 𝑅 + ⋯ ]𝔼𝑣𝑖𝑗 .
𝑁 2 𝑘,ℓ 𝑘𝑖 𝑗ℓ ℓ𝑖 𝑗𝑘
for some positive constant 𝐶2 . Let 𝜉 G be a Gaussian random variable with mean 0
and variance 1. Then, for any sufficiently small 𝛾 > 0 (depending on 𝐶2 ), there
exists a real random variable 𝜉𝛾 with subexponential decay that is independent
of 𝜉 G such that the first four moments of
(16.13) 𝜉 ′ = (1 − 𝛾)1/2 𝜉𝛾 + 𝛾1/2 𝜉 G
are 𝑚1 (𝜉 ′ ) = 0, 𝑚2 (𝜉 ′ ) = 1, 𝑚3 (𝜉 ′ ) = 𝑚3 , and 𝑚4 (𝜉 ′ ), and
(16.14) |𝑚4 (𝜉 ′ ) − 𝑚4 | ≤ 𝐶𝛾
for some 𝐶 depending on 𝐶2 .
Proof. We first construct a random variable 𝑋 with the first four moments
given by 0, 1, 𝑚3 , 𝑚4 satisfying 𝑚4 − 𝑚32 − 1 ≥ 0. We take the law of 𝑋 to be of
the form
𝑝𝛿𝑎 + 𝑞𝛿−𝑏 + (1 − 𝑝 − 𝑞)𝛿0 ,
where 𝑎, 𝑏, 𝑝, 𝑞 ≥ 0 are parameters satisfying 𝑝+𝑞 ≤ 1. The conditions 𝑚1 (𝑋) =
0 and 𝑚2 (𝑋) = 1 imply
1 1
𝑝= , 𝑞= .
𝑎(𝑎 + 𝑏) 𝑏(𝑎 + 𝑏)
Furthermore, the condition 𝑝 +𝑞 ≤ 1 reads 𝑎𝑏 ≥ 1. By an explicit computation,
we find
(16.15) 𝑚3 (𝑋) = 𝑎 − 𝑏, 𝑚4 (𝑋) = 𝑚3 (𝑋)2 + 𝑎𝑏.
Clearly, the condition 𝑚4 − 𝑚32 − 1 ≥ 0 implies that (16.15) has a solution with
𝑎𝑏 ≥ 1. This proves the existence of a random variable supported at three points
given four moments 0, 1, 𝑚3 , and 𝑚4 satisfying 𝑚4 − 𝑚32 − 1 ≥ 0.
Our main task is to construct a Gaussian divisible distribution that approx-
imates the four moment matching conditions stated in the lemma. For any real
random variable 𝜁, independent of 𝜉 G , and with the first four moments being
0, 1, 𝑚3 (𝜁) and 𝑚4 (𝜁) < ∞, the first four moments of
(16.16) 𝜁′ = (1 − 𝛾)^{1/2} 𝜁 + 𝛾^{1/2} 𝜉^G
are 0, 1,
(16.17) 𝑚3(𝜁′) = (1 − 𝛾)^{3/2} 𝑚3(𝜁),
and
(16.18) 𝑚4(𝜁′) = (1 − 𝛾)^2 𝑚4(𝜁) + 6𝛾 − 3𝛾^2.
Since we can match four moments by a random variable, for any 𝛾 > 0 there
exists a real random variable 𝜉𝛾 such that the first four moments are
(16.19) 0, 1, 𝑚3(𝜉𝛾) = (1 − 𝛾)^{−3/2} 𝑚3, and 𝑚4(𝜉𝛾) = 𝑚3(𝜉𝛾)^2 + (𝑚4 − 𝑚3^2).
With 𝑚4 ≤ 𝐶2, we have 𝑚3^2 ≤ 𝐶2^{3/2}; thus,
(16.20) |𝑚4 (𝜉𝛾 ) − 𝑚4 | ≤ 𝐶𝛾
for some 𝐶 depending on 𝐶2 . Hence with (16.17) and (16.18), we obtain that
𝜉 ′ = (1 − 𝛾)1/2 𝜉𝛾 + 𝛾1/2 𝜉 G satisfies 𝑚3 (𝜉 ′ ) = 𝑚3 and (16.14). This completes
the proof of Lemma 16.2. □
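The construction in the proof can be restated numerically. The short Python sketch below (with illustrative target moments; it is only a sanity check, not part of the proof) builds the three-point law for 𝜉𝛾 via (16.19) and verifies (16.17)–(16.18) for the Gaussian-divisible variable 𝜉′.

    import math

    def three_point_law(m3, m4):
        # law p*delta_a + q*delta_{-b} + (1-p-q)*delta_0 with moments 0, 1, m3, m4;
        # requires m4 - m3**2 >= 1 so that a*b >= 1 and p + q <= 1
        ab = m4 - m3 ** 2                              # (16.15): m4 = m3^2 + a*b
        a = (m3 + math.sqrt(m3 ** 2 + 4 * ab)) / 2     # solves a - b = m3, a*b = ab
        b = a - m3
        return a, b, 1 / (a * (a + b)), 1 / (b * (a + b))

    def moments(a, b, p, q):
        return [p * a ** k + q * (-b) ** k for k in (1, 2, 3, 4)]

    m3, m4, g = 0.4, 3.0, 0.05                # illustrative target moments and gamma
    m3_g = (1 - g) ** (-1.5) * m3             # (16.19): moments required of xi_gamma
    m4_g = m3_g ** 2 + (m4 - m3 ** 2)
    a, b, p, q = three_point_law(m3_g, m4_g)
    _, _, m3_chk, m4_chk = moments(a, b, p, q)

    # moments of xi' = (1-g)^{1/2} xi_gamma + g^{1/2} xi^G, via (16.17)-(16.18)
    m3_prime = (1 - g) ** 1.5 * m3_chk
    m4_prime = (1 - g) ** 2 * m4_chk + 6 * g - 3 * g ** 2
    print(m3_prime - m3, m4_prime - m4)       # exact third-moment match; fourth moment off by O(gamma)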
We now prove Theorem 5.1, i.e., that the limit in (15.1) holds. We restrict
ourselves to the real symmetric case since the moment matching Lemma 16.2
was formulated for real random variables. A similar argument works for the
complex case. From the universality of Gaussian divisible ensembles (more
precisely, (12.24)) for any 𝑏 = 𝑁^{−1+10𝜀} and 𝑡 ≥ 𝑁^{−𝜀}, we have
(16.21) (1/2𝑏) ∫_{𝐸−𝑏}^{𝐸+𝑏} d𝐸′ ∫_{ℝ^𝑛} d𝜶 𝑂(𝜶)(𝑝_{𝑡,𝑁}^{(𝑛)} − 𝑝_{𝐺,𝑁}^{(𝑛)})(𝐸′ + 𝜶/𝑁) → 0
where 𝑝_{𝑡,𝑁}^{(𝑛)} is the eigenvalue correlation function of the matrix ensemble 𝐻𝑡 = 𝑒^{−𝑡/2} 𝐻0 + √(1 − 𝑒^{−𝑡}) 𝐻^G.
The initial matrix ensemble 𝐻0 can be any Wigner ensemble. Recall that 𝐻
is the Wigner ensemble for which we wish to prove (15.1). Given the first four
moments 𝑚1 = 0, 𝑚2 = 1, 𝑚3 , and 𝑚4 of the rescaled matrix elements √𝑁ℎ𝑖𝑗
of 𝐻, we first use Lemma 16.2 to construct a distribution 𝜉𝛾 with 𝛾 = 1 − 𝑒−𝑡 .
This distribution has the property that the first four moments of 𝜉′, defined by (16.13), match the first four moments of the rescaled matrix elements of 𝐻.
We now choose 𝐻0 to be the Wigner ensemble with rescaled matrix elements
distributed according to 𝜉𝛾 . Then, on one hand, the matrix 𝐻𝑡 will have universal
local spectral statistics in the sense of (16.21); on the other hand, we can apply
the Green function comparison Theorem 16.1 to the matrix ensembles 𝐻 and 𝐻𝑡
so that (15.9) holds for these two ensembles. Clearly, (15.1) follows from (16.21)
and (15.9). Note that averaging in the energy parameter 𝐸 was necessary only for the first step (16.21); the comparison argument works for any fixed energy 𝐸.
CHAPTER 17
Edge Universality
The main result of edge universality for the generalized Wigner matrix is
given by the following theorem. For simplicity, we formulate it only for the real
symmetric case; the complex Hermitian case is analogous.
Theorem 17.1 (Universality of extreme eigenvalues). Suppose that we have
two real symmetric generalized 𝑁 × 𝑁 Wigner matrices, 𝐻 (𝐯) and 𝐻 (𝐰) , with
matrix elements ℎ𝑖𝑗 given by the random variables 𝑁 −1/2 𝑣𝑖𝑗 and 𝑁 −1/2 𝑤𝑖𝑗 , re-
spectively, with 𝑣𝑖𝑗 and 𝑤𝑖𝑗 satisfying the uniform subexponential decay condition
(2.7). Suppose that
(17.1) 𝔼𝑣𝑖𝑗^2 = 𝔼𝑤𝑖𝑗^2.
Let 𝜆_𝑁^{(𝐯)} and 𝜆_𝑁^{(𝐰)} denote the largest eigenvalues of 𝐻^{(𝐯)} and 𝐻^{(𝐰)}, respectively.
Then there are 𝜀 > 0 and 𝛿 > 0 depending on 𝜗 in (2.7) such that for any real parameter 𝑠 (which may depend on 𝑁) we have
(17.2) ℙ(𝑁^{2/3}(𝜆_𝑁^{(𝐯)} − 2) ≤ 𝑠 − 𝑁^{−𝜀}) − 𝑁^{−𝛿} ≤ ℙ(𝑁^{2/3}(𝜆_𝑁^{(𝐰)} − 2) ≤ 𝑠) ≤ ℙ(𝑁^{2/3}(𝜆_𝑁^{(𝐯)} − 2) ≤ 𝑠 + 𝑁^{−𝜀}) + 𝑁^{−𝛿}
for 𝑁 ≥ 𝑁0 sufficiently large, where 𝑁0 is independent of 𝑠. An analogous result
holds for the smallest eigenvalue 𝜆1 .
This theorem shows that the statistics of the extreme eigenvalues depend
only on the second moments of the centered matrix entries, but the result does
not determine the distribution. For the Gaussian case, the corresponding dis-
tribution was identified by Tracy and Widom in [134, 135]. Theorem 17.1 im-
mediately gives the same result for Wigner matrices, but not yet for generalized
Wigner matrices. In fact, the Tracy-Widom law for generalized Wigner matri-
ces does not follow from Theorem 17.1, and its proof in [24] required a quite
different argument (see Theorem 18.7).
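As a purely illustrative Monte Carlo sketch of the content of Theorem 17.1 (it plays no role in the argument; the matrix size, number of samples, and the choice of Gaussian versus Rademacher entries are ad hoc), one can compare the rescaled largest eigenvalue of two Wigner ensembles whose entries have matching second moments:

    import numpy as np

    def top_eigenvalue_stat(sampler, N, trials, rng):
        stats = []
        for _ in range(trials):
            A = sampler((N, N), rng)
            H = (np.triu(A) + np.triu(A, 1).T) / np.sqrt(N)   # real symmetric Wigner matrix
            stats.append(N ** (2 / 3) * (np.linalg.eigvalsh(H)[-1] - 2))
        return np.array(stats)

    rng = np.random.default_rng(1)
    N, trials = 200, 200
    gauss = top_eigenvalue_stat(lambda s, r: r.standard_normal(s), N, trials, rng)
    rade = top_eigenvalue_stat(lambda s, r: r.choice([-1.0, 1.0], size=s), N, trials, rng)
    print(gauss.mean(), rade.mean())    # the two empirical means are close to each other

Both empirical distributions approximate the Tracy-Widom law for 𝛽 = 1, up to finite-𝑁 corrections.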
We first give an outline of the proof of Theorem 17.1. Denote by 𝜚_𝑁^{(𝐯)} the empirical eigenvalue distribution
𝜚_𝑁^{(𝐯)}(𝑥) = (1/𝑁) ∑_{𝛼=1}^{𝑁} 𝛿(𝑥 − 𝜆_𝛼^{(𝐯)}).
Our goal is to compare the difference of 𝜚_𝑁^{(𝐯)} and 𝜚_𝑁^{(𝐰)} via their Stieltjes transforms, 𝑚_𝑁^{(𝐯)}(𝑧) and 𝑚_𝑁^{(𝐰)}(𝑧). We will be able to locate the largest eigenvalue via the distribution of certain functionals of the Stieltjes transforms. This will be done in the preparatory Lemma 17.2 and its corollary, Corollary 17.3. Therefore, if 𝑚_𝑁^{(𝐯)}(𝑧) and 𝑚_𝑁^{(𝐰)}(𝑧) are sufficiently close, we will be able to compare 𝜆_𝑁^{(𝐯)} and 𝜆_𝑁^{(𝐰)}.
The main ingredient of the proof of Theorem 17.1 is the Green function com-
parison theorem at the edge (Theorem 17.4), which shows that the distributions
of 𝑚_𝑁^{(𝐯)}(𝑧) and 𝑚_𝑁^{(𝐰)}(𝑧) on the critical scale Im 𝑧 ≈ 𝑁^{−2/3} are the same provided
the second moments of the matrix elements match. This comparison will be
done by a resolvent expansion as done for the Green function comparison the-
orem in the bulk (Theorem 16.1). However, the resolvent expansion is more
efficient at the edge for a reason that we explain now.
Recall (6.13), stating that Im 𝑚sc(𝑧) ≍ √𝜂 if 𝜅 = ||𝐸| − 2| ≤ 𝜂 and 𝑧 = 𝐸 + 𝑖𝜂.
Since the extremal eigenvalue gaps are expected to be order 𝑁 −2/3 , we need to
set 𝜂 ≪ 𝑁 −2/3 in order to identify individual eigenvalues. This is a smaller scale
than that of the local semicircle law; in fact, 𝑚𝑁 (𝑧) does not have a deterministic
limit; we need to identify its distribution. Hence, the reference size of Im 𝑚𝑁 (𝑧)
that we should keep in mind is 𝑁 −1/3 . The largest eigenvalue can be located via
the Stieltjes transform as follows. By definition of 𝑚𝑁 , if there is an eigenvalue
in a neighborhood of 𝐸 within distance 𝜂, then
Im 𝑚𝑁(𝐸 + i𝜂) = (1/𝑁) ∑_𝛼 𝜂/((𝜆𝛼 − 𝐸)^2 + 𝜂^2) ≥ 1/(𝑁𝜂) ≫ 𝑁^{−1/3};
otherwise Im 𝑚𝑁 (𝐸 +i𝜂) ≲ 𝑁 −1/3 . Based upon this idea, we will construct a cer-
tain functional of Im 𝑚𝑁 that expresses the distribution of 𝜆𝑁 . In other words,
we expect that if we can control the imaginary part of the trace of the Green
function by 𝑁 −1/3 , then we can identify the individual extremal eigenvalues.
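The dichotomy just described is easy to see numerically. The following sketch (GOE only, with illustrative sizes, and in no way part of the proof) evaluates Im 𝑚𝑁(𝐸 + i𝜂) with 𝜂 ≪ 𝑁^{−2/3} at the top eigenvalue and slightly above the spectrum.

    import numpy as np

    rng = np.random.default_rng(2)
    N = 1000
    A = rng.standard_normal((N, N))
    H = (A + A.T) / np.sqrt(2 * N)
    lam = np.linalg.eigvalsh(H)

    eta = 0.1 * N ** (-2 / 3)

    def im_m(E):
        # Im m_N(E + i*eta) = (1/N) sum_alpha eta / ((lambda_alpha - E)^2 + eta^2)
        return np.mean(eta / ((lam - E) ** 2 + eta ** 2))

    print("reference size N^{-1/3} :", N ** (-1 / 3))
    print("lower bound 1/(N*eta)   :", 1 / (N * eta))
    print("E at the top eigenvalue :", im_m(lam[-1]))                      # >> N^{-1/3}
    print("E above the spectrum    :", im_m(lam[-1] + 5 * N ** (-2 / 3)))  # much smaller than N^{-1/3}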
Roughly speaking, our basic principle is that whenever we understand 𝑚𝑁(𝑧) with a precision better than (𝑁𝜂)^{−1} (i.e., we can improve over the strong local semicircle law (6.32)
|𝑚𝑁(𝑧) − 𝑚sc(𝑧)| ≺ 1/(𝑁𝜂), Im 𝑧 = 𝜂,
at some scale 𝜂), then we can “identify” the eigenvalue distribution at that scale
precisely. It is instructive to compare the edge situation with the bulk, where
the critical scale 𝜂 ≍ 1/𝑁 identifies individual eigenvalues and the typical size
of Im 𝑚𝑁 is of order 1. In other words, at the critical scale, Im 𝑚𝑁 is of order 1 in the bulk and of order 𝑁^{−1/3} at the edge. Owing to (6.31), this fact extends to each resolvent matrix element, i.e., |𝐺𝑖𝑗| ≲ 𝑁^{−1/3} at the edge. Thus a resolvent expansion is more powerful near the edge, which explains why less moment matching is needed than in the bulk.
Now we start the detailed proof of Theorem 17.1. We first introduce some
notation. For any 𝐸1 ≤ 𝐸2, let 𝒩(𝐸1, 𝐸2) ∶= #{𝑗 ∶ 𝐸1 ≤ 𝜆𝑗 ≤ 𝐸2} denote the number of eigenvalues in [𝐸1, 𝐸2].
Lemma 17.2. For any 𝜀 > 0 set ℓ1 = 𝑁^{−2/3−3𝜀} and 𝜂 = 𝑁^{−2/3−9𝜀}. Then there exist constants 𝐶, 𝑐 such that for any 𝐸 satisfying |𝐸 − 2| ≤ (3/2)𝑁^{−2/3+𝜀} and for any 𝐷 we have
We will not give the detailed proof of this lemma since it is a straightforward approximation argument. We only point out that 𝜃𝜂 is an approximate delta function on scale 𝜂 ≪ 𝑁^{−2/3}, and if it were compactly supported, then the difference between Tr 𝜒𝐸(𝐻) and Tr 𝜒𝐸 ⋆ 𝜃𝜂(𝐻) would clearly be bounded by the number of eigenvalues in an 𝜂-vicinity of 𝐸. Since 𝜃𝜂(𝑥) = 𝜋^{−1}𝜂/(𝑥^2 + 𝜂^2), i.e.,
it has a quadratically decaying tail, the above argument must be complemented
by estimating the density of eigenvalues away from 𝐸. This estimate comes
from two contributions. Eigenvalues within the ℓ1 -vicinity of 𝐸 are directly
estimated by the term 𝒩(𝐸 − ℓ1 , 𝐸 + ℓ1 ). Eigenvalues farther away come with
an additional factor 𝜂/ℓ1 = 𝑁 −6𝜀 due to the decay of 𝜃𝜂 ; thus, it is sufficient to
estimate their density only up to an 𝑁 𝜀 factor precision, which is provided by
the optimal local law at the edge. For precise details, see the proof of lemma 6.1
in [70] for essentially the same argument.
Using (17.3) and the local law to estimate a slightly averaged version of
𝒩(𝐸 − ℓ1 , 𝐸 + ℓ1 ), we arrive at the following corollary:
holds with probability bigger than 1 − 𝑁 −𝐷 for any 𝐷 if 𝑁 is large enough. Fur-
thermore, we have
(17.6) 𝔼𝐹(Tr 𝜒_{𝐸−ℓ} ⋆ 𝜃𝜂(𝐻)) ≤ ℙ(𝒩(𝐸, ∞) = 0) ≤ 𝔼𝐹(Tr 𝜒_{𝐸+ℓ} ⋆ 𝜃𝜂(𝐻)) + 𝐶𝑁^{−𝐷}.
Proof. We have 𝐸_+ − 𝐸 ≫ ℓ; thus |𝐸 − 2 − ℓ| ≤ (3/2)𝑁^{−2/3+𝜀}, and therefore (17.4) holds for 𝐸 replaced with 𝑦 ∈ [𝐸 − ℓ, 𝐸] as well. We thus obtain
Tr 𝜒𝐸(𝐻) ≤ ℓ^{−1} ∫_{𝐸−ℓ}^{𝐸} d𝑦 Tr 𝜒𝑦(𝐻)
 ≤ ℓ^{−1} ∫_{𝐸−ℓ}^{𝐸} d𝑦 Tr 𝜒𝑦 ∗ 𝜃𝜂(𝐻) + 𝐶ℓ^{−1} ∫_{𝐸−ℓ}^{𝐸} d𝑦 [𝑁^{−2𝜀} + 𝒩(𝑦 − ℓ1, 𝑦 + ℓ1)]
 ≤ Tr 𝜒_{𝐸−ℓ} ∗ 𝜃𝜂(𝐻) + 𝐶𝑁^{−2𝜀} + 𝐶(ℓ1/ℓ) 𝒩(𝐸 − 2ℓ, 𝐸 + ℓ)
with a probability larger than 1 − 𝑁 −𝐷 for any 𝐷 > 0. From the rigidity estimate
in the form (11.25), the conditions |𝐸 − 2| ≤ 𝑁 −2/3+𝜀 , ℓ1 /ℓ = 2𝑁 −2𝜀 , and ℓ ≤
𝑁 −2/3 , we can bound
(ℓ1/ℓ) 𝒩(𝐸 − 2ℓ, 𝐸 + ℓ) ≤ 𝑁^{1−2𝜀} ∫_{𝐸−2ℓ}^{𝐸+ℓ} 𝜚sc(𝑥) d𝑥 + 𝑁^{−2𝜀}𝑁^{𝜀} ≤ 𝐶𝑁^{−𝜀}
with a very high probability, where we estimated the explicit integral using that
the integration domain is in a 𝐶𝑁 −2/3+𝜀 -vicinity of the edge at 2. We have thus
proved
𝒩(𝐸, 𝐸+ ) = Tr 𝜒𝐸 (𝐻) ≤ Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻) + 𝑁 −𝜀 .
By (17.3), we can replace 𝒩(𝐸, 𝐸+ ) by 𝒩(𝐸, ∞). This proves the upper bound
of (17.5) and the lower bound can be proved similarly.
In the event that (17.5) holds, the condition 𝒩(𝐸, ∞) = 0 implies that
Tr 𝜒𝐸+ℓ ∗ 𝜃𝜂 (𝐻) ≤ 1/9. Thus we have
(17.7) ℙ(𝒩(𝐸, ∞) = 0) ≤ ℙ(Tr 𝜒𝐸+ℓ ∗ 𝜃𝜂 (𝐻) ≤ 1/9) + 𝐶𝑁 −𝐷 .
Together with the Markov inequality, this proves the upper bound in (17.6). For
the lower bound, we use
𝔼𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ ℙ(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻) ≤ 2/9)
≤ ℙ(𝒩(𝐸, ∞) ≤ 2/9 + 𝑁 −𝜀 )
= ℙ(𝒩(𝐸, ∞) = 0),
where we used the upper bound from (17.5) and that 𝒩 is an integer. This
completes the proof of the corollary. □
Since ℙ(𝜆𝑁 < 𝐸) = ℙ(𝒩(𝐸, ∞) = 0), we will use (17.6) and the identity
(17.8) Tr 𝜒𝐸 ⋆ 𝜃𝜂(𝐻) = (𝑁/𝜋) ∫_𝐸^{𝐸_+} d𝑦 Im 𝑚𝑁(𝑦 + 𝑖𝜂)
to relate the distribution of 𝜆𝑁 with that of the Stieltjes transform 𝑚𝑁 . The fol-
lowing Green function comparison theorem shows that the distribution of the
right-hand side of (17.8) depends only on the second moments of the matrix
elements. Its proof will be given after we have completed the proof of Theo-
rem 17.1.
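Identity (17.8) can also be checked numerically. The sketch below assumes the conventions suggested by the surrounding text, namely 𝜃𝜂(𝑥) = 𝜂/(𝜋(𝑥^2 + 𝜂^2)) and 𝜒𝐸 the indicator function of [𝐸, 𝐸_+]; the choice of 𝐸_+, the GOE sample, and all sizes are illustrative assumptions, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 400
    A = rng.standard_normal((N, N))
    H = (A + A.T) / np.sqrt(2 * N)
    lam = np.linalg.eigvalsh(H)

    eps = 0.05
    E = 2 - N ** (-2 / 3 + eps)
    E_plus = 2 + 2 * N ** (-2 / 3 + eps)      # assumed right endpoint of chi_E (illustrative)
    eta = N ** (-2 / 3 - eps)

    # Left side: Tr chi_E * theta_eta(H) = sum_alpha int_E^{E_+} theta_eta(lambda_alpha - y) dy
    lhs = np.sum((np.arctan((lam - E) / eta) - np.arctan((lam - E_plus) / eta)) / np.pi)

    # Right side: (N/pi) int_E^{E_+} Im m_N(y + i*eta) dy, by a simple Riemann sum
    ys = np.linspace(E, E_plus, 4000)
    im_m = np.array([np.mean(eta / ((lam - y) ** 2 + eta ** 2)) for y in ys])
    rhs = (N / np.pi) * np.mean(im_m) * (E_plus - E)

    print(lhs, rhs)                           # the two values agree up to quadrature error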
with some constant 𝐶1 > 0. Then, there exists 𝜀0 > 0 depending only on 𝐶1 such
that for any 𝜀 < 𝜀0 and for any real numbers 𝐸1 and 𝐸2 satisfying
|𝐸1 − 2| ≤ 𝐶𝑁 −2/3+𝜀 , |𝐸2 − 2| ≤ 𝐶𝑁 −2/3+𝜀 ,
and setting 𝜂 = 𝑁 −2/3−𝜀 , we have
(17.10) |𝔼𝐹(𝑁 ∫_{𝐸1}^{𝐸2} d𝑦 Im 𝑚^{(𝐯)}(𝑦 + 𝑖𝜂)) − 𝔼𝐹(𝑁 ∫_{𝐸1}^{𝐸2} d𝑦 Im 𝑚^{(𝐰)}(𝑦 + 𝑖𝜂))| ≤ 𝐶𝑁^{−1/6+𝐶𝜀}
for some constant 𝐶 and large enough 𝑁 depending only on 𝐶1 , 𝜗, 𝛿± , and 𝐶0 (in
(14.26)).
Note that 𝑁 times the integration gives a factor 𝑁|𝐸1 − 𝐸2| ∼ 𝑁^{1/3+𝜀} on the left-hand side. This factor compensates for the "natural" size of the imaginary part of the Stieltjes transform obtained from the local semicircle law and makes the argument of 𝐹 of order 1. The bound (17.10) shows that the
distributions of this order 1 quantity with respect to the 𝐯- and 𝐰-ensembles
are asymptotically the same.
Assuming that Theorem 17.4 holds, we now prove Theorem 17.1.
Proof of Theorem 17.1. Fix a real number 𝑠 and define 𝐸 ∶= 2 + 𝑠𝑁^{−2/3}. With the left side of (17.6), for any sufficiently small 𝜀 > 0, we have
(17.11) 𝔼𝐰 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ ℙ𝐰 (𝒩(𝐸, ∞) = 0)
with the choice ℓ ∶= (1/2)𝑁^{−2/3−𝜀}, 𝜂 ∶= 𝑁^{−2/3−9𝜀}.
The bound (17.10), applied with 𝐸1 = 𝐸 − ℓ and 𝐸2 = 𝐸_+, shows that for sufficiently small 𝜀 > 0 there exists 𝛿 > 0 such that
(17.12) 𝔼𝐯 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ 𝔼𝐰 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) + 𝑁 −𝛿
(note that 9𝜀 plays the role of the 𝜀 in the Green function comparison theorem).
Then, applying the right side of (17.6) to the left-hand side of (17.12), we have
(17.13) ℙ𝐯 (𝒩(𝐸 − 2ℓ, ∞) = 0) ≤ 𝔼𝐯 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) + 𝐶𝑁 −𝐷 .
Combining these inequalities, we have
(17.14) ℙ𝐯 (𝒩(𝐸 − 2ℓ, ∞) = 0) ≤ ℙ𝐰 (𝒩(𝐸, ∞) = 0) + 2𝑁 −𝛿
for sufficiently small 𝜀 > 0 and sufficiently large 𝑁. By recalling that 𝐸 =
2 + 𝑠𝑁 −2/3 , this proves the first inequality of (17.2) and, by switching the roles
of 𝐯 and 𝐰, the second inequality of (17.2) as well. This completes the proof of
Theorem 17.1. □
Proof of Theorem 17.4. We follow the notation and the setup in the proof
of Theorem 16.1. First, we prove a simpler version of (17.10) with 𝐹(𝑥) = 𝑥 and
without integration. Namely, we show that for any 𝐸 with |𝐸 − 2| ≤ 𝑁 −2/3+𝜀
we have
(17.15) 𝑁𝜂 |𝔼𝑚_𝑁^{(𝐯)}(𝑧) − 𝔼𝑚_𝑁^{(𝐰)}(𝑧)| = 𝑁𝜂 |𝔼 (1/𝑁) Tr 1/(𝐻^{(𝐯)} − 𝑧) − 𝔼 (1/𝑁) Tr 1/(𝐻^{(𝐰)} − 𝑧)| ≤ 𝐶𝑁^{−1/6+𝐶𝜀}, 𝑧 = 𝐸 + 𝑖𝜂.
The prefactor 𝑁𝜂 accounts for 𝑁 times the integration from 𝐸1 to 𝐸2 in (17.10), since |𝐸1 − 𝐸2| ≤ 𝐶𝜂.
We write
(17.16) 𝔼 (1/𝑁) Tr 1/(𝐻^{(𝐯)} − 𝑧) − 𝔼 (1/𝑁) Tr 1/(𝐻^{(𝐰)} − 𝑧) = ∑_{𝛾=1}^{𝛾(𝑁)} [𝔼 (1/𝑁) Tr 1/(𝐻𝛾 − 𝑧) − 𝔼 (1/𝑁) Tr 1/(𝐻_{𝛾−1} − 𝑧)]
with
(17.18) 𝑅̂ ∶= (1/𝑁) Tr 𝑅, 𝑅̂_𝐯^{(𝑚)} ∶= (−1)^𝑚 (1/𝑁) Tr(𝑅𝑉)^𝑚 𝑅, Ω_𝐯 ∶= −(1/𝑁) Tr(𝑅𝑉)^5 𝑆.
All these quantities depend on 𝛾, or equivalently on the pair of indices (𝑖, 𝑗) with 𝜙(𝑖, 𝑗) = 𝛾 (see the proof of Theorem 16.1); i.e., 𝑅̂ = 𝑅̂(𝛾), etc., but we will omit this fact from the notation. We also define the same quantities for 𝐯 replaced with 𝐰. By the moment matching condition (17.1),
𝔼𝑅̂_𝐯^{(𝑚)} = 𝔼𝑅̂_𝐰^{(𝑚)}, 𝑚 = 0, 1, 2,
so we can consider the 𝑚 = 3, 4 terms only in the summation in (17.17). Since
|𝐸 −2| ≤ 𝑁 −2/3+𝜀 and 𝜂 = 𝑁 −2/3−𝜀 , the strong local semicircle law (6.32) implies
that
(17.19) |𝑅𝑖𝑗(𝑧) − 𝛿𝑖𝑗 𝑚sc(𝑧)| ≺ √(Im 𝑚sc(𝑧)/(𝑁𝜂)) + 1/(𝑁𝜂) ≤ 𝑁^{−1/3+𝐶𝜀},
and a similar bound holds for 𝑆. In particular, the off-diagonal terms of the Green functions are of order 𝑁^{−1/3+𝐶𝜀} and the diagonal terms are bounded with high probability. As a crude bound, we immediately obtain that |𝑅̂_𝐯^{(𝑚)}|, |Ω_𝐯| ≤ 𝑁^{𝐶𝜀} with high probability.
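As an aside, the size estimate (17.19) is easy to observe numerically. The following sketch (GOE only; the spectral parameter and sizes are illustrative, and the semicircle Stieltjes transform is taken on the branch with Im 𝑚sc > 0) compares the diagonal and off-diagonal resolvent entries at an edge spectral parameter.

    import numpy as np

    rng = np.random.default_rng(4)
    N, eps = 500, 0.05
    A = rng.standard_normal((N, N))
    H = (A + A.T) / np.sqrt(2 * N)

    E = 2 + 0.5 * N ** (-2 / 3)
    eta = N ** (-2 / 3 - eps)
    z = E + 1j * eta

    G = np.linalg.inv(H - z * np.eye(N))
    m_sc = (-z + np.sqrt(z * z - 4)) / 2          # Stieltjes transform of the semicircle law

    off_diag = G - np.diag(np.diag(G))
    print("N^{-1/3}            :", N ** (-1 / 3))
    print("max_{i!=j} |G_ij|   :", np.abs(off_diag).max())
    print("max_i |G_ii - m_sc| :", np.abs(np.diag(G) - m_sc).max())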
Hence we have
(17.20) ∑_{𝛾=1}^{𝛾(𝑁)} (𝑁𝜂)|𝔼 (1/𝑁) Tr 1/(𝐻𝛾 − 𝑧) − 𝔼 (1/𝑁) Tr 1/(𝐻_{𝛾−1} − 𝑧)| ≤ ∑_{𝑚=3}^{4} 𝑁^2 (𝑁𝜂)𝑁^{−𝑚/2} max_𝛾 |𝔼[𝑅̂_𝐯^{(𝑚)} − 𝑅̂_𝐰^{(𝑚)}]| + 𝜂𝑁^{1/2+𝐶𝜀}.
First we discuss the generic case, when all indices are different, in particular 𝑖 ≠ 𝑗, and later we comment on the case of coinciding indices. Notice that, generically, the first and the last resolvents 𝑅_{𝑘𝑎_1}, 𝑅_{𝑏_𝑚 𝑘} are off-diagonal terms; thus their size is 𝑁^{−1/3+𝐶𝜀}. Every other factor in (17.21) is 𝑂(𝑁^𝜀). Hence the generic contributions to the sum (17.20) give
(17.22) 𝑁^2 (𝑁𝜂)𝑁^{−𝑚/2} |𝑅̂_𝐯^{(𝑚)}| ≤ 𝑁^{2−𝑚/2}𝑁^{−1/3+𝐶𝜀}.
contributing to 𝑅̂_𝐯^{(3)}, where the summation is restricted to the choices consistent with the fact that all Green functions in the middle are diagonal, i.e., 𝑏1 = 𝑎2, 𝑏2 = 𝑎3. Together with the restriction {𝑎ℓ, 𝑏ℓ} = {𝑖, 𝑗} and the assumption that all indices are distinct, it yields only two terms:
(17.25) (1/𝑁) ∑_𝑘 𝑚sc^2 𝔼[𝑅𝑘𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘] + (1/𝑁) ∑_𝑘 𝑚sc^2 𝔼[𝑅𝑘𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘].
We may again replace the diagonal term 𝑅𝑘𝑘 with 𝑚sc at a negligible error, and we are left with
(17.28) −𝑚sc^3 𝑁^{−3/2} ∑_𝑘 ∑_ℓ^{(𝑖)} 𝔼[𝑅_{𝑘ℓ}^{(𝑖)} 𝑣ℓ𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅_{𝑗𝑘}^{(𝑖)}].
The expectation with respect to the variable 𝑣ℓ𝑖 renders this term 0 unless ℓ =
𝑗, in which case we gain an additional factor 𝑁 −1/2+𝐶𝜀 in the power counting
compared with (17.26). So the contribution of the last displayed expression to
(17.23) is improved from 𝑁 1/6+𝐶𝜀 to 𝑁 −1/3+𝐶𝜀 .
The argument so far assumed that all indices are distinct. In case of coin-
ciding indices, each coinciding index results in a factor 1/𝑁 from the restriction
in the summation. At the same time, one of the off-diagonal elements may be-
come diagonal; hence a factor 𝑁 −1/3+𝐶𝜀 , attributed to this off-diagonal element,
is “lost.” The total balance of a coinciding index in the power counting is thus a
factor 𝑁 −2/3 , so we conclude that terms with coinciding indices are negligible.
This proves the simplified version (17.15) of Theorem 17.4.
To prove (17.10), for simplicity we ignore the integration and we prove only that
(17.30) 𝑁^2 |𝔼𝐹(𝑁𝜂 Im (1/𝑁) Tr 1/(𝐻𝛾 − 𝑧)) − 𝔼𝐹(𝑁𝜂 Im (1/𝑁) Tr 1/(𝐻_{𝛾−1} − 𝑧))| ≤ 𝐶𝑁^{−1/6+𝐶𝜀},
and then sum it up for all 𝛾 = 𝜙(𝑖, 𝑗). As before, we assume the generic case,
i.e., 𝑖 ≠ 𝑗.
We Taylor expand the function 𝐹 around the point
Ξ ∶= 𝑁𝜂 Im 𝑚^{(𝑅)}(𝑧), 𝑧 = 𝐸 + 𝑖𝜂,
where 𝑚^{(𝑅)} = (1/𝑁) Tr 𝑅. Notice that 𝑚^{(𝑅)} is independent of 𝑣𝑖𝑗 and 𝑤𝑖𝑗. Recalling the definition
(17.31) 𝜉𝐯 = ∑_{𝑚=1}^{4} 𝑁^{−𝑚/2} 𝑅̂_𝐯^{(𝑚)} + 𝑁^{−5/2} Ω_𝐯,
we have
(17.32) |𝔼𝐹(𝑁𝜂 Im (1/𝑁) Tr 1/(𝐻𝛾 − 𝑧)) − 𝔼𝐹(𝑁𝜂 Im (1/𝑁) Tr 1/(𝐻_{𝛾−1} − 𝑧))| ≤ |𝔼𝐹′(Ξ)(𝑁𝜂[𝜉𝐯 − 𝜉𝐰])| + (1/2)|𝔼𝐹″(Ξ)([𝑁𝜂𝜉𝐯]^2 − [𝑁𝜂𝜉𝐰]^2)| + ⋯.
We now substitute (17.31) into this expression. By the naïve power counting
(𝑁𝜂)𝑁^{−𝑚/2} |𝑅̂_𝐯^{(𝑚)}| ≤ 𝑁^{−𝑚/2−1/3+𝐶𝜀}, (𝑁𝜂)𝑁^{−5/2} |Ω_𝐯| ≤ 𝑁^{−13/6+𝐶𝜀},
from the previous proof. The contributions of all the Ω error terms as well as the 𝑚 = 4 terms clearly satisfy the aimed bound (17.30) and thus can be neglected.
First, we prove that the quadratic term (as well as all higher-order terms) in
the Taylor expansion (17.32) is negligible. Here the cancellation between the 𝐯
and 𝐰 contributions is essential. It is therefore sufficient to consider
(𝑁𝜂)^2 ∑_{𝑚,𝑚′=1}^{3} 𝑁^{−𝑚/2−𝑚′/2} [𝑅̂_𝐯^{(𝑚)} 𝑅̂_𝐯^{(𝑚′)} − 𝑅̂_𝐰^{(𝑚)} 𝑅̂_𝐰^{(𝑚′)}]
where we already took into account that the 𝑚 = 4 term as well as the Ω𝐯
error term from (17.31) are negligible by direct power counting. If 𝐹 ′ (Ξ) were
deterministic, then the previous argument leading to (17.15) would also prove
the 𝑁^{−1/6+𝐶𝜀} bound for the linear term in (17.32). In fact, the exact cancellation between the expectations of the 𝑅̂_𝐯^{(𝑚)} and 𝑅̂_𝐰^{(𝑚)} terms for 𝑚 = 1 and 𝑚 = 2
relied only on computing the expectation with respect to 𝑣𝑖𝑗 and 𝑤𝑖𝑗 and on the
matching of the first and second moments. Since 𝐹 ′ (Ξ) is independent of 𝑣𝑖𝑗
and 𝑤𝑖𝑗 , this argument remains valid even with the 𝐹 ′ (Ξ) factor included. We
thus need to control the 𝑚 = 3 term, i.e., show that
(17.33) 𝑁^2 (𝑁𝜂)𝑁^{−3/2} 𝔼𝐹′(Ξ)𝑅̂_𝐯^{(3)} ≤ 𝑁^{−1/6+𝐶𝜀}.
The same bound then would also hold if 𝐯 is replaced with 𝐰.
Using the boundedness of 𝐹 ′ and the argument between (17.24) and (17.28)
in the previous proof, we see that for (17.33) it is sufficient to show that
(17.34) |𝑁^2 (𝑁𝜂)𝑁^{−3/2} 𝑁^{−3/2} ∑_𝑘 ∑_ℓ^{(𝑖)} 𝔼𝐹′(Ξ)[𝑅_{𝑘ℓ}^{(𝑖)} 𝑣ℓ𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅_{𝑗𝑘}^{(𝑖)}]| ≤ 𝑁^{−1/6+𝐶𝜀}.
Without the 𝐹 ′ (Ξ) term, the expectation with respect to 𝑣ℓ𝑖 would render this
term 0, in the generic case when ℓ ≠ 𝑗. In order to exploit this effect we need
to expand 𝐹 ′ (Ξ) in the single variable 𝑣ℓ𝑖 .
Fix an index ℓ ≠ 𝑗 and let 𝑄ℓ be the matrix identical to 𝑄 except that the (ℓ, 𝑖)
and (𝑖, ℓ) matrix elements are 0, and let 𝑇 = 𝑇 ℓ = (𝑄ℓ − 𝑧)−1 be its resolvent.
Clearly 𝑇 is independent of 𝑣𝑖𝑗 and 𝑣ℓ𝑖 , and the local law (17.19) holds for the
matrix elements of 𝑇 as well. Using the resolvent expansion, we find
(17.35) Ξ = (𝑁𝜂) Im[(1/𝑁) ∑_𝑛 [𝑇𝑛𝑛 + 𝑁^{−1/2}(𝑇𝑛ℓ 𝑣ℓ𝑖 𝑇𝑖𝑛 + 𝑇𝑛𝑖 𝑣𝑖ℓ 𝑇ℓ𝑛) + ⋯ ]]
Since Ξ0 is independent of 𝑣ℓ𝑖 , when we insert the above expansion for 𝐹 ′ (Ξ)
into (17.34), the contribution of the leading term 𝐹 ′ (Ξ0 ) gives 0 after taking the
expectation for 𝑣ℓ𝑖 .
The two subleading terms in (17.36), which are linear in 𝑣ℓ𝑖 , have generi-
cally two off-diagonal elements, 𝑇𝑛ℓ and 𝑇𝑖𝑛 (or 𝑇𝑛𝑖 and 𝑇ℓ𝑛 ), which have size
𝑁 −1/3+𝐶𝜀 each. So the contribution of this term to (17.34) by simple power
counting is
𝑁^2 (𝑁𝜂)𝑁^{−3/2} 𝑁^{−3/2} ⋅ 𝑁 ⋅ 𝑁 ⋅ (𝑁𝜂)𝑁^{−1/2} (𝑁^{−1/3+𝐶𝜀})^4 ≤ 𝑁^{−1/6+𝐶𝜀}.
A similar argument shows that the higher-order terms in the Taylor expansion
(17.36) as well as higher-order terms in the resolvent expansion (17.35) have a
negligible contribution.
As before, we presented the argument for generic indices; coinciding indices
give smaller contributions as we explained before. We leave the details to the
reader. This completes the proof of (17.30) and thus the proof of Theorem 17.4.
□
CHAPTER 18

Further Results and Historical Notes
Spectral statistics of large random matrices have been studied from many
different perspectives. The main focus of this book was on one particular result:
the Wigner-Dyson-Mehta bulk universality for Wigner matrices in the average
energy sense (Theorem 5.1). In this last section we collect a few other recent
results and open questions of this very active research area. We also give refer-
ences to related results and summarize the history of the development.
Here 𝜆𝑗 ’s are the ordered eigenvalues, and for brevity we omitted the rescaling fac-
tor 𝜚(𝜆𝑗 ) from the argument of 𝑂. Moreover, 𝔼 and 𝔼G denote the expectation with
respect to the Wigner ensemble 𝐻 and the Gaussian (GOE or GUE) ensemble, re-
spectively.
The next result shows that Theorem 18.1 holds under the same conditions
without energy averaging.
Theorem 18.3 (Universality at fixed energy [26]). Consider a complex Her-
mitian (respectively, real symmetric) generalized Wigner matrix with moment
condition (18.1). Then, for any 𝐸 with |𝐸| < 2 we have
(18.4) lim_{𝑁→∞} ∫_{ℝ^𝑛} d𝜶 𝑂(𝜶) (1/𝜚sc(𝐸)^𝑛)(𝑝_𝑁^{(𝑛)} − 𝑝_{G,𝑁}^{(𝑛)})(𝐸 + 𝜶/(𝑁𝜚sc(𝐸))) = 0.
The fixed energy universality (18.4) for the 𝛽 = 2 (complex Hermitian)
case was already known earlier (see [57,66,132]). This case is exceptional since
the Harish-Chandra-Itzykson-Zuber identity allows one to compute correlation
functions for Gaussian divisible ensembles. This method relies on an algebraic
identity and has no generalization to other symmetry classes.
Finally, the gap universality with fixed label asserts that (18.3) holds without
averaging.
Theorem 18.4 (Gap universality with fixed label [67, theorem 2.2]). Con-
sider a complex Hermitian (respectively, real symmetric) generalized Wigner ma-
trix and assume subexponential decay of the matrix elements instead of (18.1).
Then, Corollary 18.2 holds without averaging:
(18.5) lim_{𝑁→∞} [𝔼 − 𝔼^G] 𝑂(𝑁(𝜆𝑗 − 𝜆𝑗+1), … , 𝑁(𝜆𝑗 − 𝜆𝑗+𝑛)) = 0
for any 𝑗 ∈ J𝛼𝑁, (1 − 𝛼)𝑁K with a fixed 𝛼 > 0. More generally, for any 𝑘, 𝑚 ∈
J𝛼𝑁, (1 − 𝛼)𝑁K we have
The second part (18.6) of this theorem asserts that the gap distribution is
not only independent of the specific Wigner ensemble, but it is also universal
throughout the bulk spectrum. This is the counterpart of the statement that the
appropriately rescaled correlation functions (18.4) have a limit that is indepen-
dent of 𝐸 (see (4.38)).
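The 𝑘-independence of the rescaled gap distribution is easy to visualize. The sketch below is a Monte Carlo illustration for the Gaussian reference ensemble only (GOE, with ad hoc matrix size and sample count); it is not a substitute for the theorem, which concerns general ensembles.

    import numpy as np

    def rescaled_gaps(k, N, trials, rng):
        out = []
        for _ in range(trials):
            A = rng.standard_normal((N, N))
            H = (A + A.T) / np.sqrt(2 * N)
            lam = np.linalg.eigvalsh(H)
            rho = np.sqrt(max(4 - lam[k] ** 2, 0.0)) / (2 * np.pi)   # semicircle density at lambda_k
            out.append(N * rho * (lam[k + 1] - lam[k]))
        return np.array(out)

    rng = np.random.default_rng(5)
    N, trials = 200, 300
    gaps_quarter = rescaled_gaps(N // 4, N, trials, rng)
    gaps_middle = rescaled_gaps(N // 2, N, trials, rng)
    print(gaps_quarter.mean(), gaps_middle.mean())   # both close to 1; the two histograms nearly coincide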
Prior to [67], universality for a single gap was only achieved in the spe-
cial case of the Gaussian unitary ensemble (GUE) in [129]; this statement then
immediately implies the same result for complex Hermitian Wigner matrices
satisfying the four moment matching conditions.
18.2.1. History of Step 1. The semicircle law was proved by Wigner for
energy windows of order 1 [138]. Various improvements were made to shrink
the spectral windows; in particular, results down to scale 𝑁 −1/2 were obtained
by combining the results of [11] and [80]. The result at the optimal scale, 𝑁 −1 ,
referred to as the local semicircle law, was established for Wigner matrices in a
series of papers [60–62]. The method was based on a self-consistent equation for
the Stieltjes transform of the eigenvalues, 𝑚(𝑧), and the continuity in the imag-
inary part of the spectral parameter 𝑧. As a by-product, the optimal eigenvector
delocalization estimate was proved. For generalized Wigner matrices there is no
closed equation for 𝑚(𝑧) = 𝑁^{−1} Tr 𝐺. In order to deal with this case, one needed to consider a self-consistent equation for the entire vector (𝐺𝑖𝑖(𝑧))_{𝑖=1}^{𝑁} consisting
of the diagonal matrix elements of the Green function [68, 69]. Together with
the fluctuation averaging lemma, this method implied the optimal rigidity esti-
mate of eigenvalues in the bulk in [68] and up to the edges in [70]. Furthermore,
optimal control was obtained for individual matrix elements of the resolvent 𝐺𝑖𝑗
and not only for its trace. The estimate on 𝐺𝑖𝑖 also provided a simple alternative
proof of the eigenvector delocalization estimate. A comprehensive summary of
these results can be found in [55]. Several further extensions concern weaker
moment conditions and improvements of (log 𝑁)-powers (see, e.g., [32,78,130]
and references therein).
Gaussian calculations going back to Gaudin, Dyson, and Mehta. It was later ex-
tended to complex sample covariance matrices on the entire bulk spectrum by
Ben Arous and Péché [19]. There were two major restrictions of this method:
(i) the Gaussian component was required to be of order 1 independent of 𝑁;
(ii) it relies on an explicit formula by Brézin-Hikami [30, 31] for the correlation
functions of eigenvalues, which is valid only for Gaussian divisible ensembles
with unitary invariant Gaussian component. The size of the Gaussian compo-
nent was reduced to 𝑁 −1+𝜀 in [57] by using an improved formula for correlation
functions and the local semicircle law from [60–62].
To eliminate the usage of an explicit formula, a conceptual approach for
Step 2 via the local ergodicity of Dyson Brownian motion was initiated in [63].
In [64], a general theorem for the bulk universality was formulated that ap-
plies to all classical ensembles, i.e., real and complex Wigner matrices, real and
complex sample covariance matrices, and quaternion Wigner matrices. The re-
laxation time to local equilibrium proved in these two papers was not optimal;
the optimal relaxation time, conjectured by Dyson, was obtained later in [70].
The DBM approach in these papers yields the average gap and hence the aver-
age energy universality, while the method via the Brézin-Hikami formula gives
universality in a fixed energy sense but is strictly restricted to the complex Her-
mitian case. The fixed energy universality for all symmetry classes was recently
achieved in [26]. The method in this paper was still based on DBM, but the
analysis of DBM was very different from the one discussed in this book.
only for specific values 𝛽 = 1, 2, 4, log-gases may be studied for any 𝛽 > 0, i.e.,
at any inverse temperature.
In the case of invariant ensembles, it is well-known that for 𝑉 satisfying cer-
tain mild conditions the sequence of one-point correlation functions, or density,
associated with (18.7) has a limit as 𝑁 → ∞, and the limiting equilibrium den-
sity 𝜚𝑉 (𝑠) can be obtained as the unique minimizer of the functional
available only for the specific values 𝛽 = 1, 2, 4. Within the framework of this ap-
proach, the 𝛽 = 1, 2, 4 cases of Theorem 18.5 below were proved for very general
potentials for 𝛽 = 2 and for analytic potentials with some additional conditions
for 𝛽 = 1, 4. This is a whole subject by itself, and we can only refer the reader
to some reviews or books [39, 91, 116].
The first general result for nonclassical 𝛽-values is the following theorem
proved in [23–25]. In these works, a new approach to the universality of the
𝛽-ensemble was initiated. It departed from the previously mentioned traditional
approach using explicit expressions for the correlation functions. Instead, it re-
lies on comparing the probability measures of the 𝛽-ensembles to those of the
Gaussian ones. This basic idea of comparing two probability measures directly
was also used in the later works [18, 119], albeit in a different way, where The-
orem 18.5 was reproved under different sets of conditions on the potential 𝑉.
to be the limiting densities at the 𝑗th quantiles. Let 𝔼^{𝜇𝑉} and 𝔼^G denote the expectation w.r.t. the measure 𝜇𝑉 and its Gaussian counterpart for 𝑉(𝜆) = 𝜆^2/2.
Theorem 18.6 (Gap universality with fixed label [67, theorem 2.3]). We
consider the setup of Theorem 18.5 and also assume 𝛽 ≥ 1. Set some 𝛼 > 0;
then
(18.13) lim_{𝑁→∞} |𝔼^{𝜇𝑉} 𝑂((𝑁𝜚_𝑘^𝑉)(𝜆𝑘 − 𝜆𝑘+1), … , (𝑁𝜚_𝑘^𝑉)(𝜆𝑘 − 𝜆𝑘+𝑛)) − 𝔼^{𝜇_𝐺} 𝑂((𝑁𝜚𝑚)(𝜆𝑚 − 𝜆𝑚+1), … , (𝑁𝜚𝑚)(𝜆𝑚 − 𝜆𝑚+𝑛))| = 0
for any 𝑘, 𝑚 ∈ J𝛼𝑁, (1 − 𝛼)𝑁K. In particular, the distribution of the rescaled gaps
w.r.t. 𝜇𝑉 does not depend on the index 𝑘 in the bulk.
We point out that Theorem 18.5 holds for any 𝛽 > 0, but Theorem 18.6
requires 𝛽 ≥ 1. This is partly due to the fact that the De Giorgi-Nash-Moser reg-
ularity theory was used in [67]. This restriction was later removed by Bekerman-
Figalli-Guionnet [18] under a higher regularity assumption on 𝑉 and some ad-
ditional hypotheses.
with some 𝜒 > 0, where 𝔼G is expectation w.r.t. the standard GOE or GUE ensem-
ble depending on the symmetry class of 𝐻 and 𝛾𝑗 ’s are semicircle quantiles (11.31).
The edge universality for Wigner matrices was first proved by Soshnikov [125] under the assumption that the distribution of the matrix elements is symmetric and has finite moments of all orders. Soshnikov used an elaborate moment matching method, and the conditions on the moments are not easy
to improve. A completely different method based on the Green function com-
parison theorem was initiated in [70]. This method is more flexible in many
ways and removes essentially all restrictions in previous works. In particular,
the optimal moment condition on the distribution of the matrix elements was
obtained by Lee and Yin [98]. All these works assume that the variances of the
matrix elements are identical. The main point of Theorem 18.7 is to consider
generalized Wigner matrices, i.e., matrices with nonconstant variances. It was
proved in [70, theorem 17.1] that the edge statistics for any generalized Wigner
matrix coincide with those of a generalized Gaussian Wigner matrix with the
same variances, but it was not shown that the statistics are independent of the
variances themselves. Theorem 18.7 provides this missing step and thus proves
the edge universality for the generalized Wigner matrices.
Theorem 18.8 (Universality at the edge for log-gases [24]). Let 𝛽 ≥ 1 and 𝑉 (resp., 𝑉̃) be in 𝐶^4(ℝ), regular such that the equilibrium density 𝜚𝑉 (resp., 𝜚_{𝑉̃}) is supported on a single interval [𝐴, 𝐵] (resp., [𝐴̃, 𝐵̃]). Without loss of generality we assume that for both densities (18.8) holds with 𝐴 = 0 and with the same constant 𝑠𝐴. Fix 𝑛 ∈ ℕ and 𝜅 < 2/5. Then for any Λ ⊂ J1, 𝑁^𝜅K with |Λ| = 𝑛, we have
(18.14) |(𝔼^{𝜇𝑉} − 𝔼^{𝜇_{𝑉̃}})𝑂((𝑁^{2/3} 𝑗^{1/3}(𝜆𝑗 − 𝛾𝑗))_{𝑗∈Λ})| ≤ 𝑁^{−𝜒}
with some 𝜒 > 0. Here, 𝛾𝑗 are the quantiles w.r.t. the density 𝜚𝑉 (18.10).
The history of edge universality runs in parallel to the bulk one in the sense
that most bulk methods were extended to the edge case. Similar to the bulk case,
the edge universality was first proved via orthogonal polynomial methods for
the classical values of 𝛽 = 1, 2, 4 in [38, 42, 111, 117]. The first edge universality results for general 𝛽 were given independently in [24] (valid for general potentials and 𝛽 ≥ 1) and, with a completely different method, in [92] (valid for strictly convex potentials and any 𝛽 > 0). The later work [18] also covered the edge case and proved universality for any 𝛽 > 0 under some higher regularity assumption on 𝑉 and an additional condition that can be checked for convex 𝑉. Some open questions
concern the universal behavior at the nonregular edges where even the scaling
may change.
18.5. Eigenvectors
Universality questions have been traditionally formulated for eigenvalues,
but they naturally extend to eigenvectors as well. They are closely related to two
different important physical phenomena: delocalization and quantum ergod-
icity.
The concept of delocalization stems from the basic theory of random Schrö-
dinger operators describing a single quantum particle subject to a random po-
tential 𝑉 in ℝ𝑑 . If the potential decays at infinity, then the operator −Δ + 𝑉
acting on 𝐿2 (ℝ𝑑 ) has a discrete pure point spectrum with 𝐿2 -eigenfunctions, lo-
calized or bound states, in the low-energy regime. In the high-energy regime
it has absolutely continuous spectrum with bounded but not 𝐿2 -normalizable
generalized eigenfunctions. The latter are also called delocalized or extended
states, and they correspond to scattering and conductance. If the potential 𝑉
is a stationary, ergodic random field, then celebrated Anderson localization [9]
occurs: at high disorder a dense pure point spectrum with exponentially de-
caying eigenfunctions appears even in the energy regime that would classically
correspond to scattering states. The localized regime has been mathematically
well understood for both the high-disorder regime and the regime of low den-
sity of states (near the spectral edges) since the fundamental works of Fröhlich
and Spencer [74] and later by Aizenman and Molchanov [1]. The local spectral
statistics is Poisson [108], reflecting the intuition that exponentially decaying
eigenfunctions typically have no overlap, so their energies are independent.
The low-disorder regime, however, is mathematically wide open. In 𝑑 = 3
or higher dimensions it is conjectured that −Δ + 𝑉 has absolutely continuous
spectrum (away from the spectral edges), the corresponding eigenfunctions are
delocalized, and the local eigenvalue statistics of the restriction of −Δ + 𝑉 to a
large finite box follow the GOE local statistics. In other words, a phase transi-
tion is believed to occur as the strength of the disorder varies; this is called the
Anderson metal-insulator transition. Currently, even the existence of this ex-
tended states regime is not known; this is considered one of the most important
open questions in mathematical physics. Only the Bethe lattice (regular tree) is
understood [2, 87], but this model exhibits Poisson local statistics [3] due to its
exponentially growing boundary.
The strong link between local eigenvalue statistics and delocalization for
random Schrödinger operators naturally raises the question about the structure
of the eigenvectors for an 𝑁 × 𝑁 Wigner matrix 𝐻. Complete delocalization in
this context would mean that any ℓ2 -normalized eigenvector 𝐮 = (𝑢1 , … , 𝑢𝑁 )
of 𝐻 is substantially supported on each coordinate; ideally |𝑢𝑖 |2 ≈ 𝑁 −1 for each 𝑖.
Due to fluctuations, this can hold only in a somewhat weaker sense.
The first type of results provides an upper bound on the ℓ^∞-norm with very high probability:
where we chose 𝐸 in the 𝜂-vicinity of 𝜆𝛼 . Since from (6.31) the left-hand side
is bounded by 𝑁 𝜀 (𝑁𝜂)−1 with very high probability, we obtain (18.15). Under
stronger decay conditions on 𝐻, the local semicircle law can be slightly im-
proved, and thus the 𝑁 𝜀 tolerance factor may be changed to a log-power and
the probability estimate improved to subexponential.
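A simple numerical illustration of complete delocalization (GOE only, with illustrative sizes; this is merely a consistency check of the phenomenon described above): √𝑁 ‖𝐮𝛼‖_∞ grows only like a power of log 𝑁 rather than like a power of 𝑁.

    import numpy as np

    rng = np.random.default_rng(6)
    for N in (100, 400, 1600):
        A = rng.standard_normal((N, N))
        H = (A + A.T) / np.sqrt(2 * N)
        _, U = np.linalg.eigh(H)                 # columns are l2-normalized eigenvectors
        sup_norms = np.abs(U).max(axis=0)        # ell-infinity norm of each eigenvector
        print(N, np.sqrt(N) * sup_norms.max(), np.sqrt(np.log(N)))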
and for any unit vector 𝐪 ∈ ℝ^𝑁, the vector (√𝑁 |⟨𝐪, 𝐮𝛼⟩|)_{𝛼∈𝐼} ∈ ℝ^𝑚 is asymptotically normal in the sense of finite moments.
Moreover, the study of eigenvectors leads to another fundamental problem
of mathematical physics: quantum ergodicity. Recall that the quantum ergod-
icity theorem (Shnirel′ man [123], Colin de Verdière [35], and Zelditch [141])
asserts that “most” eigenfunctions for the Laplacian on a compact Riemannian
manifold with ergodic geodesic flow are completely flat. For 𝑑-regular graphs
under certain assumptions on the injectivity radius and spectral gap of the ad-
jacency matrices, similar results were proved for eigenvectors of the adjacency
matrices [7]. A stronger notion of quantum ergodicity, the quantum unique
ergodicity (QUE) proposed by Rudnick-Sarnak [115], demands that all high
energy eigenfunctions become completely flat, and it supposedly holds for neg-
atively curved compact Riemannian manifolds. One case for which QUE was
rigorously proved concerns arithmetic surfaces, thanks to tools from number
theory and ergodic theory on homogeneous spaces [82, 83, 101]. For Wigner
matrices, a probabilistic version of QUE was settled in corollary 1.4 of [28]:
Theorem 18.11 (Quantum unique ergodicity for Wigner matrices). Let 𝐮𝛼
be the eigenvectors of a generalized Wigner matrix with moment condition (5.6).
Then there exists 𝜀 > 0 such that for any deterministic 1 ≤ 𝛼 ≤ 𝑁 and 𝐼 ⊂ J1, 𝑁K,
for any 𝛿 > 0 we have
(18.16) ℙ(|∑_{𝑖∈𝐼} |𝑢𝛼(𝑖)|^2 − |𝐼|/𝑁| ≥ 𝛿) ≤ 𝑁^{−𝜀}/𝛿^2.
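The flavor of (18.16) can be checked directly on a sample, using the GOE as a particular generalized Wigner matrix (matrix size and index set below are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(7)
    N = 1000
    A = rng.standard_normal((N, N))
    H = (A + A.T) / np.sqrt(2 * N)
    _, U = np.linalg.eigh(H)

    alpha = N // 2                                   # a bulk eigenvector
    I = rng.choice(N, size=N // 10, replace=False)   # an index set with |I| = N/10
    mass = np.sum(U[I, alpha] ** 2)
    print(mass, len(I) / N)                          # sum_{i in I} |u_alpha(i)|^2 is close to |I|/N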
models of interest. Here we give a partial list. A more detailed account of the
results and citations can be found later on in the few representative references.
The three-step strategy opens up a path to investigate local spectral statis-
tics of a large class of matrices. In many examples the limiting density of the
eigenvalues is not the semicircle anymore, so the Dyson Brownian motion is not
in global equilibrium and Step 2 is more complicated. In the spirit of Dyson’s
conjecture, however, gap universality already follows from the local equilibration of DBM. Indeed, a local version of the DBM was used as a basic tool in [25].
A recent development to compare local and global dynamics of DBM was im-
plemented by a coupling and homogenization argument in [26]. It turns out
that as long as the initial condition of the DBM is regular on a certain scale, the
local gap statistics reaches its equilibrium, i.e., the Wigner-Dyson-Mehta distribution, on the corresponding time scale [93] (see also [65] for a similar result).
This concept reduces the proof of universality to a proof of the corresponding
local law.
One natural class of mean field ensembles are the deformed Wigner matri-
ces; these are of the form 𝐻 = 𝑉 + 𝑊, where 𝑉 is a diagonal matrix and 𝑊 is a
standard Wigner matrix. In fact, after a trivial rescaling, these matrices may be
viewed as instances of the DBM with initial condition 𝐻0 = 𝑉, recalling that
the law of the DBM at time 𝑡 is given by 𝐻𝑡 = 𝑒^{−𝑡/2}𝑉 + (1 − 𝑒^{−𝑡})^{1/2}𝑊. As-
suming that the diagonal elements of 𝑉 have a limiting density 𝜚𝑉 , the limiting
density of 𝐻𝑡 is given by the free convolution of 𝜚𝑉 with the semicircle density.
The corresponding local law was proved in [94] followed by the proof of bulk
universality in [97] and the edge universality [95, 97].
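As a sketch of the free-convolution description of the limiting density (with illustrative parameters; the self-consistent equation used below is the standard Pastur relation for a deterministic matrix plus an independent Wigner component, written in the normalization of this section, and is an assumption of the sketch rather than a quotation from the text), one can compare the empirical Stieltjes transform of 𝐻𝑡 = 𝑒^{−𝑡/2}𝑉 + (1 − 𝑒^{−𝑡})^{1/2}𝑊 with the solution of 𝑚(𝑧) = 𝑁^{−1} ∑_𝑖 (𝑒^{−𝑡/2}𝑣𝑖 − 𝑧 − (1 − 𝑒^{−𝑡})𝑚(𝑧))^{−1}.

    import numpy as np

    rng = np.random.default_rng(8)
    N, t = 2000, 0.5
    v = rng.choice([-1.0, 1.0], size=N)              # simple bimodal diagonal V (illustrative)
    s = 1 - np.exp(-t)                               # variance of the semicircle component

    W = rng.standard_normal((N, N))
    W = (W + W.T) / np.sqrt(2 * N)
    H_t = np.exp(-t / 2) * np.diag(v) + np.sqrt(s) * W
    lam = np.linalg.eigvalsh(H_t)

    z = 0.3 + 0.05j                                  # a bulk spectral parameter
    m = 1j                                           # iterate the self-consistent equation
    for _ in range(2000):
        m = np.mean(1.0 / (np.exp(-t / 2) * v - z - s * m))

    m_empirical = np.mean(1.0 / (lam - z))           # empirical Stieltjes transform of H_t
    print(m, m_empirical)                            # the two values are close for large N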
The free convolution of two arbitrary densities, 𝜚𝛼 ⊞ 𝜚𝛽, naturally appears in random matrix theory as the limiting density of the sum of two asymptotically free matrices 𝐴 and 𝐵 whose limiting densities are given by 𝜚𝛼 and 𝜚𝛽. Moreover, for any deterministic 𝐴 and 𝐵, the matrices 𝐴 and 𝑈𝐵𝑈^∗ are asymptotically free [137], where 𝑈 is a Haar-distributed unitary matrix. Thus, the eigenvalue density of 𝐴 + 𝑈𝐵𝑈^∗ is given by the free convolution 𝜚𝛼 ⊞ 𝜚𝛽. The corresponding local law on the optimal scale in the bulk was given in [15].
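A minimal simulation of this statement (illustrative sizes; the Haar unitary is sampled by the usual phase-corrected QR construction) shows the spectrum of 𝐴 + 𝑈𝐵𝑈^∗ concentrating around a deterministic density, the free convolution of the two spectral densities.

    import numpy as np

    rng = np.random.default_rng(9)
    N = 1000
    A = np.diag(rng.choice([-1.0, 1.0], size=N))     # spectral density (delta_{-1} + delta_{1}) / 2
    B = np.diag(rng.uniform(-1.0, 1.0, size=N))      # spectral density uniform on [-1, 1]

    # Haar-distributed unitary via QR of a complex Ginibre matrix, with phase correction
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    U = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

    lam = np.linalg.eigvalsh(A + U @ B @ U.conj().T)
    hist, edges = np.histogram(lam, bins=50, density=True)   # approximates the free convolution density
    print(lam.min(), lam.max())                              # support is contained in [-2, 2]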
Another way to generate a matrix ensemble with a limiting density differ-
ent from the semicircle law is to consider Wigner-type matrices that are further
generalizations of the universal Wigner matrices from Definition 2.1. These are
complex Hermitian or real symmetric matrices with centered independent (up
to symmetry) matrix elements but without the condition that the sum of the
variances in each row is constant (2.5). The limiting density is determined by
the matrix of variances 𝑆 by solving the corresponding Dyson-Schwinger equa-
tion. Depending on 𝑆, the density may exhibit a square root singularity similar to the edge of the semicircle law or a cubic root singularity [5]. The corresponding
optimal local law and bulk universality were proved in [4].
Different complications arise for adjacency matrices of sparse graphs. Each
edge of the Erdős-Rényi graph is chosen independently with probability 𝑝, so it
is a mean field model with a semicircle density, but a typical realization of the
matrix contains many zero elements if 𝑝 ≪ 1. An optimal local law was proved in [56], and bulk universality for 𝑝 ≫ 𝑁^{−1/3} in [53] and for 𝑝 ≫ 𝑁^{−1} in [84]. Another prominent model of random graphs is the uniform measure on 𝑑-regular graphs. The elements of the adjacency matrix are weakly dependent since their sum in every row is 𝑑. For 𝑑 ≫ 1 the limiting density is the semicircle law, and an optimal local law was obtained in [17]. Bulk universality was shown in the regime 1 ≪ 𝑑 ≪ 𝑁^{2/3} in [16].
Sample covariance matrices (2.9) and especially their deformations with fi-
nite rank deterministic matrices play an important role in statistics. The first
application of the three-step strategy to prove bulk universality was presented
in [64]. The edge universality was achieved in [90,96]. For applications in prin-
cipal component analysis, the main focus is on the extreme eigenvalues and the
outliers whose detailed analysis was given in [22]. Here we have listed refer-
ences for papers using methods closely related to this book. There are many
references in this subject that we are unable to include here.
In other words, if 𝑊 ≫ √𝑁, we expect the universality results of [26, 57, 63, 67]
to hold. Furthermore, the eigenvectors of 𝐻 are expected to be completely de-
localized in this range. For 𝑊 ≪ √𝑁, one expects that the eigenvectors are
exponentially localized. This is the analogue of the celebrated Anderson metal-
insulator transition for random band matrices. The only rigorous work indicat-
ing the √𝑁 threshold concerns the second mixed moments of the characteristic
polynomial for a special class of Gaussian band matrices [120, 122].
The localization length for band matrices in one spatial dimension was re-
cently investigated in numerous works. For general distribution of the matrix
entries, eigenstates were proved to be localized [116] for 𝑊 ≪ 𝑁 1/8 , and delo-
calization of most eigenvectors in a certain averaged sense holds for 𝑊 ≫ 𝑁 6/7
[50, 51], improved to 𝑊 ≫ 𝑁 4/5 [54]. The Green’s function (𝐻 − 𝑧)−1 was
controlled down to the scale Im 𝑧 ≫ 𝑊 −1 in [69], implying a lower bound
of order 𝑊 for the localization length of all eigenvectors. When the entries are
Gaussian with some specific covariance profiles, supersymmetry techniques are
applicable to obtain stronger results. This approach was first developed by physi-
cists (see [49] for an overview); the rigorous analysis was initiated by Spencer
(see [126] for an overview) with an accurate estimate on the expected density
of states on arbitrarily short scales for a three-dimensional band matrix ensem-
ble in [44]. More recent works include universality for 𝑊 ≥ 𝑐𝑁 [121] and the
control of the Green’s function down to the optimal scale Im 𝑧 ≫ 𝑁 −1 , hence
delocalization in a strong sense for all eigenvectors, when 𝑊 ≫ 𝑁 6/7 [14] with
first four moments matching the Gaussian ones (both results require a block
structure and hold in part of the bulk spectrum).
While delocalization and Wigner-Dyson-Mehta spectral statistics are ex-
pected to occur simultaneously, there is no rigorous argument directly linking
them. The Dyson Brownian motion, the cornerstone of the three-step strategy,
proves universality for matrices where each entry has a nontrivial Gaussian com-
ponent and the comparison ideas require that second moments match exactly.
Therefore, this approach cannot be applied to matrices with many zero entries.
However, a combination of the quantum unique ergodicity with a mean field
reduction strategy in [27] yields Wigner-Dyson-Mehta bulk universality for a
large class of band matrices with general distribution in the large bandwidth
regime 𝑊 ≥ 𝑐𝑁. In contrast to the bulk, universality at the spectral edge is
much better understood: extreme eigenvalues follow the Tracy-Widom law for
𝑊 ≫ 𝑁 5/6 , an essentially optimal condition [124].
References
[1] Aizenman, M., and Molchanov, S. Localization at large disorder and at extreme energies:
an elementary derivation. Comm. Math. Phys. 157(2):245–278, 1993.
[2] Aizenman, M., Sims, R., and Warzel, S. Absolutely continuous spectra of quantum tree
graphs with weak disorder. Comm. Math. Phys. 264(2):371–389, 2006. doi:10.1007/s00220-
005-1468-5
[3] Aizenman, M., and Warzel, S. The canopy graph and level statistics for random opera-
tors on trees. Math. Phys. Anal. Geom. 9(4):291–333 (2007), 2006. doi:10.1007/s11040-007-
9018-3
[4] Ajanki, O., Erdős, L., and Krüger, T. Quadratic vector equations on complex upper half-
plane. Preprint, 2015. arXiv:1506.05095 [math.PR].
[5] . Universality for general Wigner-type matrices. Probab. Theory Relat. Fields, to
appear. doi:10.1007/s00440-016-0740-2.
[6] Akemann, G., Baik, J., and Di Francesco, P., eds. The Oxford handbook of random matrix
theory. Oxford University Press, Oxford, 2011. MR2920518.
[7] Anantharaman, N., and Le Masson, E. Quantum ergodicity on large regular graphs. Duke
Math. J. 164(4):723–765, 2015. doi:10.1215/00127094-2881592
[8] Anderson, G. W., Guionnet, A., and Zeitouni, O. An introduction to random matrices. Cam-
bridge Studies in Advanced Mathematics, 118. Cambridge University Press, Cambridge,
2010.
[9] Anderson, P. W. Absence of diffusion in certain random lattices. Phys. Rev. 109(5):1492–
1505, 1958. doi:10.1103/PhysRev.109.1492
[10] Bai, Z. D. Convergence rate of expected spectral distributions of large random matrices. I.
Wigner matrices. Ann. Probab. 21(2):625–648, 1993.
[11] Bai, Z. D., Miao, B., and Tsay, J. Convergence rates of the spectral distributions of large
Wigner matrices. Int. Math. J. 1(1):65–90, 2002.
[12] Bai, Z. D., and Yin, Y. Q. Limit of the smallest eigenvalue of a large-dimensional sample
covariance matrix. Ann. Probab. 21(3):1275–1294, 1993.
[13] Bakry, D., and Émery, M. Diffusions hypercontractives. Séminaire de probabilités,
XIX, 1983/84, 177–206. Lecture Notes in Mathematics, 1123. Springer, Berlin, 1985.
doi:10.1007/BFb0075847
[14] Bao, Z., and Erdős, L. Delocalization for a class of random block band matrices. Probab.
Theory Related Fields 1–104, 2016. doi:10.1007/s00440-015-0692-y
[15] Bao, Z., Erdős, L., and Schnelli, K. Local law of addition of random matrices on optimal
scale. Comm. Math. Phys. 349(3):947–990, 2017. doi:10.1007/s00220-016-2805-6
[16] Bauerschmidt, R., Huang, J., Knowles, A., and Yau, H.-T. Bulk eigenvalue statistics for
random regular graphs. Preprint, 2015. arXiv:1505.06700 [math.PR].
[17] Bauerschmidt, R., Knowles, A., and Yau, H.-T. Local semicircle law for random regular
graphs. Preprint, 2015. arXiv:1503.08702 [math.PR].
[18] Bekerman, F., Figalli, A., and Guionnet, A. Transport maps for 𝛽-matrix models and uni-
versality. Comm. Math. Phys. 338(2):589–619, 2015. doi:10.1007/s00220-015-2384-y
[19] Ben Arous, G., and Péché, S. Universality of local eigenvalue statistics for some
sample covariance matrices. Comm. Pure Appl. Math. 58(10):1316–1357, 2005.
doi:10.1002/cpa.20070
[20] Bleher, P., and Its, A. Semiclassical asymptotics of orthogonal polynomials, Riemann-
Hilbert problem, and universality in the matrix model. Ann. of Math. (2) 150(1):185–266,
1999. doi:10.2307/121101
[21] Bloemendal, A., Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Isotropic local laws for
sample covariance and generalized Wigner matrices. Electron. J. Probab. 19(33):53 pp.,
2014.
[22] Bloemendal, A., Knowles, A, Yau, H.-T., and Yin, J. On the principal components
of sample covariance matrices. Probab. Theory Related Fields 164(1-2):459–552, 2016.
doi:10.1007/s00440-015-0616-x
[23] Bourgade, P., Erdős, L., and Yau, H.-T. Bulk universality of general 𝛽-ensembles with non-
convex potential. J. Math. Phys. 53(9):095221, 19 pp., 2012. doi:10.1063/1.4751478
[24] . Edge universality of beta ensembles. Comm. Math. Phys. 332(1):261–353, 2014.
doi:10.1007/s00220-014-2120-z
[25] . Universality of general 𝛽-ensembles. Duke Math. J. 163(6):1127–1190, 2014.
doi:10.1215/00127094-2649752
[26] Bourgade, P., Erdős, L., Yau, H.-T., and Yin, J. Fixed energy universality for generalized
Wigner matrices. Comm. Pure Appl. Math. 69(10):1815–1881, 2016. doi:10.1002/cpa.21624
[27] . Universality for a class of random band matrices. Preprint, 2016.
arXiv:1602.02312 [math.PR].
[28] Bourgade, P., and Yau, H.-T. The eigenvector moment flow and local quantum unique
ergodicity. Comm. Math. Phys. 1–48, 2016. doi:10.1007/s00220-016-2627-6
[29] Brascamp, H. J., and Lieb, E. H. Best constants in Young’s inequality, its converse, and
its generalization to more than three functions. Advances in Math. 20(2):151–173, 1976.
doi:10.1016/0001-8708(76)90184-5
[30] Brézin, E., and Hikami, S. Correlations of nearby levels induced by a random potential.
Nuclear Phys. B 479(3):697–706, 1996. doi:10.1016/0550-3213(96)00394-X
[31] . Spectral form factor in a random matrix theory. Phys. Rev. E (3) 55(4):4067–4083,
1997. doi:10.1103/PhysRevE.55.4067
[32] Cacciapuoti, C., Maltsev, A., and Schlein, B. Bounds for the Stieltjes transform and the
density of states of Wigner matrices. Probab. Theory Related Fields 163(1-2):1–59, 2015.
doi:10.1007/s00440-014-0586-4
[33] Carlen, E. A., and Loss, M. Optimal smoothing and decay estimates for viscously damped
conservation laws, with applications to the 2-D Navier-Stokes equation. Duke Math. J.
81(1):135–157 (1996), 1995. doi:10.1215/S0012-7094-95-08110-1
[34] Chen, T. Localization lengths and Boltzmann limit for the Anderson model at small dis-
orders in dimension 3. J. Stat. Phys. 120(1):279–337, 2005. doi:10.1007/s10955-005-5255-7
[35] Colin de Verdière, Y. Ergodicité et fonctions propres du laplacien. Comm. Math. Phys.
102(3):497–502, 1985.
[36] Davies, E. B. The functional calculus. J. London Math. Soc. (2) 52(1):166–176, 1995.
doi:10.1112/jlms/52.1.166
[37] Deift, P. A. Orthogonal polynomials and random matrices: a Riemann-Hilbert approach.
Courant Lecture Notes in Mathematics, 3. New York University, Courant Institute of
Mathematical Sciences, New York; American Mathematical Society, Providence, R.I.,
1999.
[38] Deift, P., and Gioev, D. Universality at the edge of the spectrum for unitary, orthogonal,
and symplectic ensembles of random matrices. Comm. Pure Appl. Math. 60(6):867–910,
2007. doi:10.1002/cpa.20164
[39] . Universality in random matrix theory for orthogonal and symplectic ensembles.
Int. Math. Res. Pap. 2007(2):116, Art. ID rpm004, 2007. doi:10.1093/imrp/rpm004
[40] . Random matrix theory: invariant ensembles and universality. Courant Lecture
Notes in Mathematics, 18. Courant Institute of Mathematical Sciences, New York; Amer-
ican Mathematical Society, Providence, R.I., 2009. doi:10.1090/cln/018
[41] Deift, P., Kriecherbauer, T., McLaughlin, K. T.-R., Venakides, S., and Zhou,
X. Strong asymptotics of orthogonal polynomials with respect to exponential
weights. Comm. Pure Appl. Math. 52(12):1491–1552, 1999. doi:10.1002/(SICI)1097-
0312(199912)52:12<1491::AID-CPA2>3.3.CO;2-R
[42] . Uniform asymptotics for polynomials orthogonal with respect to vary-
ing exponential weights and applications to universality questions in random ma-
trix theory. Comm. Pure Appl. Math. 52(11):1335–1425, 1999. doi:10.1002/(SICI)1097-
0312(199911)52:11<1335::AID-CPA1>3.0.CO;2-1
[43] Deuschel, J.-D., and Stroock, D. W. Large deviations. Pure and Applied Mathematics, 137.
Academic Press, Boston, 1989.
[44] Disertori, M., Pinson, H., and Spencer, T. Density of states for random band matrices.
Comm. Math. Phys. 232(1):83–124, 2002. doi:10.1007/s00220-002-0733-0
[45] Dyson, F. J. A Brownian-motion model for the eigenvalues of a random matrix. J. Mathe-
matical Phys. 3:1191–1198, 1962. doi:10.1063/1.1703862
[46] . Statistical theory of the energy levels of complex systems. I, II, III. J. Mathematical
Phys. 3:140–156, 1962. doi:10.1063/1.1703773
[47] . Correlations between eigenvalues of a random matrix. Comm. Math. Phys.
19:235–250, 1970.
[48] Edmunds, D. E., and Triebel, H. Sharp Sobolev embeddings and related Hardy inequali-
ties: the critical case. Math. Nachr. 207:79–92, 1999. doi:10.1002/mana.1999.3212070105
[49] Efetov, K. Supersymmetry in disorder and chaos. Cambridge University Press, Cambridge,
1997.
[50] Erdős, L., and Knowles, A. Quantum diffusion and delocalization for band matrices with
general distribution. Ann. Henri Poincaré 12(7):1227–1319, 2011. doi:10.1007/s00023-011-
0104-5
[51] . Quantum diffusion and eigenfunction delocalization in a random band matrix
model. Comm. Math. Phys. 303(2):509–554, 2011. doi:10.1007/s00220-011-1204-2
[52] Erdős, L., Knowles, A., and Yau, H.-T. Averaging fluctuations in resolvents of random band
matrices. Ann. Henri Poincaré 14(8):1837–1926, 2013. doi:10.1007/s00023-013-0235-y
[53] Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Spectral statistics of Erdős-Rényi Graphs
II: Eigenvalue spacing and the extreme eigenvalues. Comm. Math. Phys. 314(3):587–640,
2012. doi:10.1007/s00220-012-1527-7
[54] . Delocalization and diffusion profile for random band matrices. Comm. Math.
Phys. 323(1):367–416, 2013. doi:10.1007/s00220-013-1773-3
[55] . The local semicircle law for a general class of random matrices. Electron. J. Probab.
18(59):58 pp., 2013. doi:10.1214/EJP.v18-2473
[56] . Spectral statistics of Erdős-Rényi graphs I: Local semicircle law. Ann. Probab.
41(3B):2279–2375, 2013. doi:10.1214/11-AOP734
[57] Erdős, L., Péché, S., Ramírez, J. A., Schlein, B., and Yau, H.-T. Bulk universality for Wigner
matrices. Comm. Pure Appl. Math. 63(7):895–925, 2010. doi:10.1002/cpa.20317
[58] Erdős, L., Ramírez, J., Schlein, B., Tao, T., Vu, V., and Yau, H.-T. Bulk universality for
Wigner Hermitian matrices with subexponential decay. Math. Res. Lett. 17(4):667–674,
2010. doi:10.4310/MRL.2010.v17.n4.a7
[59] Erdős, L., Ramírez, J. A., Schlein, B., and Yau, H.-T. Universality of sine-kernel for Wigner
matrices with a small Gaussian perturbation. Electron. J. Probab. 15(18):526–603, 2010.
doi:10.1214/EJP.v15-768
[60] Erdős, L., Schlein, B., and Yau, H.-T. Local semicircle law and complete delocalization for
Wigner random matrices. Comm. Math. Phys. 287(2):641–655, 2009. doi:10.1007/s00220-
008-0636-9
[61] . Semicircle law on short scales and delocalization of eigenvectors for Wigner ran-
dom matrices. Ann. Probab. 37(3):815–852, 2009. doi:10.1214/08-AOP421
[62] . Wegner estimate and level repulsion for Wigner random matrices. Int. Math. Res.
Not. IMRN 2010(3):436–479, 2010. doi:10.1093/imrn/rnp136
[63] . Universality of random matrices and local relaxation flow. Invent. Math.
185(1):75–119, 2011. doi:10.1007/s00222-010-0302-7
[64] Erdős, L., Schlein, B., Yau, H.-T., and Yin, J. The local relaxation flow approach to uni-
versality of the local statistics for random matrices. Ann. Inst. Henri Poincaré Probab. Stat.
48(1):1–46, 2012. doi:10.1214/10-AIHP388
[65] Erdős, L., and Schnelli, K. Universality for random matrix flows with time-dependent den-
sity. Preprint, 2015. arXiv:1504.00650 [math.PR].
[66] Erdős, L., and Yau, H.-T. A comment on the Wigner-Dyson-Mehta bulk universality con-
jecture for Wigner matrices. Electron. J. Probab. 17(28):5 pp., 2012. doi:10.1214/EJP.v17-
1779
[67] . Gap universality of generalized Wigner and 𝛽-ensembles. J. Eur. Math. Soc.
(JEMS) 17(8):1927–2036, 2015. doi:10.4171/JEMS/548
[68] Erdős, L., Yau, H.-T., and Yin, J. Universality for generalized Wigner matrices with
Bernoulli distribution. J. Comb. 2(1):15–81, 2011. doi:10.4310/JOC.2011.v2.n1.a2
[69] . Bulk universality for generalized Wigner matrices. Probab. Theory Related Fields
154(1-2):341–407, 2012. doi:10.1007/s00440-011-0390-3
[70] . Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math. 229(3):1435–
1515, 2012. doi:10.1016/j.aim.2011.12.010
[71] Firk, F. W. K., and Miller, S. J. Nuclei, primes and the random matrix connection. Sym-
metry 1:64–105. doi:10.3390/sym1010064
[72] Fokas, A. S., Its, A. R., and Kitaev, A. V. The isomonodromy approach to matrix models in
2D quantum gravity. Comm. Math. Phys. 147(2):395–430, 1992. doi:10.1007/BF02096594
[73] Forrester, P. J. Log-gases and random matrices. London Mathematical Society Monographs
Series, 34. Princeton University Press, Princeton, N.J., 2010. doi:10.1515/9781400835416
[74] Fröhlich, J., and Spencer, T. Absence of diffusion in the Anderson tight binding model for
large disorder or low energy. Comm. Math. Phys. 88(2):151–184, 1983.
[75] Fukushima, M., Ōshima, Y., and Takeda, M. Dirichlet forms and symmetric Markov
processes. De Gruyter Studies in Mathematics, 19. Walter de Gruyter, Berlin, 1994.
doi:10.1515/9783110889741
[76] Fyodorov, Y. V., and Mirlin, A. D. Scaling properties of localization in ran-
dom band matrices: a 𝜎-model approach. Phys. Rev. Lett. 67(18):2405–2409, 1991.
doi:10.1103/PhysRevLett.67.2405
[77] Girko, V. L. Asymptotics of the distribution of the spectrum of random matrices. Trans-
lated from Uspekhi Mat. Nauk 44(4(268)):7–34, 256, 1989; Russian Math. Surveys 44(4):3–
36, 1989. doi:10.1070/RM1989v044n04ABEH002143
[78] Götze, F., Naumov, A., and Tikhomirov, A. Local semicircle law under moment conditions.
Part I: The Stieltjes transform. Preprint, 2015. arXiv:1510.07350 [math.PR].
[79] Gross, L. Logarithmic Sobolev inequalities. Amer. J. Math. 97(4):1061–1083, 1975.
doi:10.2307/2373688
[80] Guionnet, A., and Zeitouni, O. Concentration of the spectral measure for large matrices.
Electron. Comm. Probab. 5:119–136, 2000. doi:10.1214/ECP.v5-1026
[81] Helffer, B., and Sjöstrand, J. On the correlation for Kac-like models in the convex case. J.
Statist. Phys. 74(1-2):349–409, 1994. doi:10.1007/BF02186817
[82] Holowinsky, R. Sieving for mass equidistribution. Ann. of Math. (2) 172(2):1499–1516,
2010.
[83] Holowinsky, R., and Soundararajan, K. Mass equidistribution for Hecke eigenforms. Ann.
of Math. (2) 172(2):1517–1528, 2010.
[84] Huang, J., Landon, B., and Yau, H.-T. Bulk universality of sparse random matrices. J. Math.
Phys. 56(12):123301, 19 pp., 2015. doi:10.1063/1.4936139
[85] Itzykson, C., and Zuber, J. B. The planar approximation. II. J. Math. Phys. 21(3):411–421,
1980. doi:10.1063/1.524438
[86] Johansson, K. Universality of the local spacing distribution in certain ensem-
bles of Hermitian Wigner matrices. Comm. Math. Phys. 215(3):683–705, 2001.
doi:10.1007/s002200000328
[87] Klein, A. Absolutely continuous spectrum in the Anderson model on the Bethe lattice.
Math. Res. Lett. 1(4):399–407, 1994. doi:10.4310/MRL.1994.v1.n4.a1
[88] Knowles, A., and Yin, J. Eigenvector distribution of Wigner matrices. Probab. Theory Re-
lated Fields 155(3-4):543–582, 2013. doi:10.1007/s00440-011-0407-y
[89] Knowles, A., and Yin, J. The isotropic semicircle law and deformation of Wigner matrices. Comm. Pure Appl. Math. 66(11):1663–1750, 2013. doi:10.1002/cpa.21450
[90] Knowles, A., and Yin, J. Anisotropic local laws for random matrices. Probab. Theory Related Fields, 2016, 1–96. doi:10.1007/s00440-016-0730-4
[91] Kriecherbauer, T., and Shcherbina, M. Fluctuations of eigenvalues of matrix models and
their applications. Preprint, 2010. arXiv:1003.6121 [math-ph].
[92] Krishnapur, M., Rider, B., and Virág, B. Universality of the stochastic Airy operator. Comm.
Pure Appl. Math. 69(1):145–199, 2016. doi:10.1002/cpa.21573
[93] Landon, B., and Yau, H.-T. Convergence of local statistics of Dyson Brownian motion.
Comm. Math. Phys., to appear.
[94] Lee, J. O., and Schnelli, K. Local deformed semicircle law and complete delocalization
for Wigner matrices with random potential. J. Math. Phys. 54(10):103504, 62 pp., 2013.
doi:10.1063/1.4823718
[95] Lee, J. O., and Schnelli, K. Edge universality for deformed Wigner matrices. Rev. Math. Phys. 27(8):1550018, 94 pp., 2015. doi:10.1142/S0129055X1550018X
[96] Lee, J. O., and Schnelli, K. Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. Ann. Appl. Probab. 26(6):3786–3839, 2016. doi:10.1214/16-AAP1193
[97] Lee, J. O., Schnelli, K., Stetler, B., and Yau, H.-T. Bulk universality for deformed Wigner
matrices. Ann. Probab. 44(3):2349–2425, 2016. doi:10.1214/15-AOP1023
[98] Lee, J. O., and Yin, J. A necessary and sufficient condition for edge universality of Wigner
matrices. Duke Math. J. 163(1):117–173, 2014. doi:10.1215/00127094-2414767
[99] Levin, E., and Lubinsky, D. S. Universality limits in the bulk for varying measures. Adv.
Math. 219(3):743–779, 2008. doi:10.1016/j.aim.2008.06.010
[100] Lieb, E. H., and Loss, M. Analysis. Second edition. Graduate Studies in Mathematics, 14.
American Mathematical Society, Providence, R.I., 2001. doi:10.1090/gsm/014
[101] Lindenstrauss, E. Invariant measures and arithmetic quantum unique ergodicity. Ann. of
Math. (2) 163(1):165–219, 2006. doi:10.4007/annals.2006.163.165
[102] Lubinsky, D. S. A new approach to universality limits involving orthogonal polynomials.
Ann. of Math. (2) 170(2):915–939, 2009. doi:10.4007/annals.2009.170.915
[103] Lytova, A., and Pastur, L. Central limit theorem for linear eigenvalue statistics of random
matrices with independent entries. Ann. Probab. 37(5):1778–1840, 2009. doi:10.1214/09-
AOP452
[104] Marčenko, V. A., and Pastur, L. A. Distribution of eigenvalues in certain sets of random
matrices. Mat. Sb. (N.S.) 72(114):507–536, 1967.
[105] Mehta, M. L. A note on correlations between eigenvalues of a random matrix. Comm.
Math. Phys. 20:245–250, 1971.
[106] Mehta, M. L. Random matrices. Second edition. Academic Press, Boston, 1991.
[107] Mehta, M. L., and Gaudin, M. On the density of eigenvalues of a random matrix. Nuclear
Phys. 18:420–427, 1960.
[108] Minami, N. Local fluctuation of the spectrum of a multidimensional Anderson tight bind-
ing model. Comm. Math. Phys. 177(3):709–725, 1996.
[109] Naddaf, A., and Spencer, T. On homogenization and scaling limit of some gradi-
ent perturbations of a massless free field. Comm. Math. Phys. 183(1):55–84, 1997.
doi:10.1007/BF02509796
[110] Pastur, L. A. Spectra of random selfadjoint operators. Uspehi Mat. Nauk 28(1(169)):3–64,
1973.
[111] Pastur, L., and Shcherbina, M. On the edge universality of the local eigenvalue statistics
of matrix models. Mat. Fiz. Anal. Geom. 10(3):335–365, 2003.
[112] Pastur, L., and Shcherbina, M. Bulk universality and related properties of Hermitian matrix models. J. Stat. Phys. 130(2):205–250, 2008. doi:10.1007/s10955-007-9434-6
[113] Pastur, L., and Shcherbina, M. Eigenvalue distribution of large random matrices. Mathematical Surveys and Monographs, 171. American Mathematical Society, Providence, R.I., 2011. doi:10.1090/surv/171
[114] Pillai, N. S., and Yin, J. Universality of covariance matrices. Ann. Appl. Probab. 24(3):935–
1001, 2014. doi:10.1214/13-AAP939
[115] Rudnick, Z., and Sarnak, P. The behaviour of eigenstates of arithmetic hyperbolic mani-
folds. Comm. Math. Phys. 161(1):195–213, 1994.
[116] Schenker, J. Eigenvector localization for random band matrices with power law band
width. Comm. Math. Phys. 290(3):1065–1097, 2009. doi:10.1007/s00220-009-0798-0
[117] Shcherbina, M. Edge universality for orthogonal ensembles of random matrices. J. Stat.
Phys. 136(1):35–50, 2009. doi:10.1007/s10955-009-9766-5
[118] Shcherbina, M. Central limit theorem for linear eigenvalue statistics of the Wigner and sample covariance random matrices. Zh. Mat. Fiz. Anal. Geom. 7(2):176–192, 197, 199, 2011.
[119] Shcherbina, M. Change of variables as a method to study general 𝛽-models: bulk universality. J. Math. Phys. 55(4):043504, 23 pp., 2014. doi:10.1063/1.4870603
[120] Shcherbina, T. On the second mixed moment of the characteristic polynomials of 1D band
matrices. Comm. Math. Phys. 328(1):45–82, 2014. doi:10.1007/s00220-014-1947-7
[121] Shcherbina, T. Universality of the local regime for the block band matrices with a finite number of blocks. J. Stat. Phys. 155(3):466–499, 2014. doi:10.1007/s10955-014-0964-4
[122] Shcherbina, T. Universality of the second mixed moment of the characteristic polynomials of the 1D band matrices: real symmetric case. J. Math. Phys. 56(6):063303, 23 pp., 2015. doi:10.1063/1.4922621
[123] Šnirel′man, A. I. Ergodic properties of eigenfunctions. Uspehi Mat. Nauk 29(6(180)):181–182, 1974.
[124] Sodin, S. The spectral edge of some random band matrices. Ann. of Math. (2) 172(3):2223–
2251, 2010. doi:10.4007/annals.2010.172.2223
[125] Soshnikov, A. Universality at the edge of the spectrum in Wigner random matrices. Comm.
Math. Phys. 207(3):697–733, 1999. doi:10.1007/s002200050743
[126] Spencer, T. Random banded and sparse matrices. The Oxford handbook of random matrix
theory, 471–488. Oxford University Press, Oxford, 2011.
[127] Stroock, D. W. Probability theory, an analytic view. Cambridge University Press, Cam-
bridge, 1993.
[128] Tao, T. Topics in random matrix theory. Graduate Studies in Mathematics, 132. American
Mathematical Society, Providence, R.I., 2012. doi:10.1090/gsm/132
[129] Tao, T. The asymptotic distribution of a single eigenvalue gap of a Wigner matrix. Probab. Theory Related Fields 157(1-2):81–106, 2013. doi:10.1007/s00440-012-0450-3
[130] Tao, T., and Vu, V. Random matrices: localization of the eigenvalues and the necessity of
four moments. Acta Math. Vietnam. 36(2):431–449, 2011.
[131] Tao, T., and Vu, V. Random matrices: universality of local eigenvalue statistics. Acta Math. 206(1):127–204, 2011. doi:10.1007/s11511-011-0061-3
[132] Tao, T., and Vu, V. The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices. Electron. J. Probab. 16(77):2104–2121, 2011. doi:10.1214/EJP.v16-944
[133] Tao, T., and Vu, V. Random matrices: universal properties of eigenvectors. Random Matrices Theory Appl. 1(1):1150001, 27 pp., 2012. doi:10.1142/S2010326311500018
[134] Tracy, C. A., and Widom, H. Level-spacing distributions and the Airy kernel. Comm. Math.
Phys. 159(1):151–174, 1994.
[135] Tracy, C. A., and Widom, H. On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177(3):727–754, 1996.
[136] Valkó, B., and Virág, B. Continuum limits of random matrices and the Brownian carousel.
Invent. Math. 177(3):463–508, 2009. doi:10.1007/s00222-009-0180-z
[137] Voiculescu, D. Limit laws for random matrices and free products. Invent. Math.
104(1):201–220, 1991. doi:10.1007/BF01245072
[138] Wigner, E. P. Characteristic vectors of bordered matrices with infinite dimensions. Ann.
of Math. (2) 62:548–564, 1955. doi:10.2307/1970079
[139] Wishart, J. The generalised product moment distribution in samples from a normal mul-
tivariate population. Biometrika 20A(1/2):32–52, 1928. doi:10.2307/2331939
[140] Yau, H.-T. Relative entropy and hydrodynamics of Ginzburg-Landau models. Lett. Math.
Phys. 22(1):63–80, 1991. doi:10.1007/BF00400379
[141] Zelditch, S. Uniform distribution of eigenfunctions on compact hyperbolic surfaces. Duke
Math. J. 55(4):919–941, 1987. doi:10.1215/S0012-7094-87-05546-3