COURANT LECTURE NOTES 28

LÁSZLÓ ERDŐS
HORNG-TZER YAU

A Dynamical Approach
to Random Matrix Theory

American Mathematical Society


Courant Institute of Mathematical Sciences
A Dynamical Approach
to Random Matrix Theory
Courant Lecture Notes
in Mathematics
Executive Editor
Jalal Shatah
Managing Editor
Paul D. Monsour
Production Editor
Neelang Parghi
Copy Editor
Michael Munn
László Erdős
Institute of Science and Technology Austria

Horng-Tzer Yau
Harvard University

28 A Dynamical Approach
to Random Matrix Theory

Courant Institute of Mathematical Sciences


New York University
New York, New York

American Mathematical Society


Providence, Rhode Island
2010 Mathematics Subject Classification. Primary 15B52, 82B44.

For additional information and updates on this book, visit


www.ams.org/bookpages/cln-28

Library of Congress Cataloging-in-Publication Data


Names: Erdős, László, 1966- | Yau, Horng-Tzer. | Courant Institute of Mathematical Sciences.
Title: A dynamical approach to random matrix theory / László Erdős, Institute of Science and
Technology, Austria, Horng-Tzer Yau, Harvard University.
Description: Providence, RI : American Mathematical Society, [2017] | Series: Courant lecture
notes in mathematics ; volume 28 | “Courant Institute of Mathematical Sciences.” | Includes
bibliographical references and index.
Identifiers: LCCN 2017012815 | ISBN 9781470436483 (alk. paper)
Subjects: LCSH: Random matrices.
Classification: LCC QA196.5 .E73 2017 | DDC 512.9/434–dc23
LC record available at https://lccn.loc.gov/2017012815

Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy select pages for
use in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Permissions to reuse
portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink
service. For more information, please visit: http://www.ams.org/rightslink.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.
Excluded from these provisions is material for which the author holds copyright. In such cases,
requests for permission to reuse or reprint material should be addressed directly to the author(s).
Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the
first page of each article within proceedings volumes.


© 2017 by the authors. All rights reserved.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
10 9 8 7 6 5 4 3 2 1 22 21 20 19 18 17
Contents

Preface
Chapter 1. Introduction
Chapter 2. Wigner Matrices and Their Generalizations
Chapter 3. Eigenvalue Density
3.1. Wigner Semicircle Law and Other Canonical Densities
3.2. The Moment Method
3.3. The Resolvent Method and the Stieltjes Transform
Chapter 4. Invariant Ensembles
4.1. Joint Density of Eigenvalues for Invariant Ensembles
4.2. Universality of Classical Invariant Ensembles via Orthogonal Polynomials
Chapter 5. Universality for Generalized Wigner Matrices
5.1. Different Notions of Universality
5.2. The Three-Step Strategy
Chapter 6. Local Semicircle Law for Universal Wigner Matrices
6.1. Setup
6.2. Spectral Information on 𝑆
6.3. Stochastic Domination
6.4. Statement of the Local Semicircle Law
6.5. Appendix: Behavior of Γ and Γ̃ and the Proof of Lemma 6.3
Chapter 7. Weak Local Semicircle Law
7.1. Proof of the Weak Local Semicircle Law, Theorem 7.1
7.2. Large-Deviation Estimates
Chapter 8. Proof of the Local Semicircle Law
8.1. Tools
8.2. Self-Consistent Equations on Two Levels
8.3. Proof of the Local Semicircle Law Without Using the Spectral Gap
Chapter 9. Sketch of the Proof of the Local Semicircle Law Using the Spectral Gap
Chapter 10. Fluctuation Averaging Mechanism
10.1. Intuition Behind the Fluctuation Averaging
10.2. Proof of Lemma 8.9
10.3. Alternative Proof of (8.47) of Lemma 8.9
Chapter 11. Eigenvalue Location: The Rigidity Phenomenon
11.1. Extreme Eigenvalues
11.2. Stieltjes Transform and Regularized Counting Function
11.3. Convergence Speed of the Empirical Distribution Function
11.4. Rigidity of Eigenvalues
Chapter 12. Universality for Matrices with Gaussian Convolutions
12.1. Dyson Brownian Motion
12.2. Derivation of Dyson Brownian Motion and Perturbation Theory
12.3. Strong Local Ergodicity of the Dyson Brownian Motion
12.4. Existence and Restriction of the Dynamics
Chapter 13. Entropy and the Logarithmic Sobolev Inequality (LSI)
13.1. Basic Properties of the Entropy
13.2. Entropy on Product Spaces and Conditioning
13.3. Logarithmic Sobolev Inequality
13.4. Hypercontractivity
13.5. Brascamp-Lieb Inequality
13.6. Remarks on the Applications of the LSI to Random Matrices
13.7. Extensions to the Simplex; Regularization of the DBM
Chapter 14. Universality of the Dyson Brownian Motion
14.1. Main Ideas Behind the Proof of Theorem 14.1
14.2. Proof of Theorem 14.1
14.3. Restriction to the Simplex via Regularization
14.4. Dirichlet Form Inequality for Any 𝛽 > 0
14.5. From Gap Distribution to Correlation Functions: Proof of Theorem 12.4
14.6. Details of the Proof of Lemma 14.8
Chapter 15. Continuity of Local Correlation Functions Under the Matrix OU Process
15.1. Proof of Theorem 15.2
15.2. Proof of the Correlation Function Comparison Theorem, Theorem 15.3
Chapter 16. Universality of Wigner Matrices in Small Energy Windows: GFT
16.1. Green Function Comparison Theorems
16.2. Conclusion of the Three-Step Strategy
Chapter 17. Edge Universality
Chapter 18. Further Results and Historical Notes
18.1. Wigner Matrices: Bulk Universality in Different Senses
18.2. Historical Overview of the Three-Step Strategy for Bulk Universality
18.3. Invariant Ensembles and Log-Gases
18.4. Universality at the Spectral Edge
18.5. Eigenvectors
18.6. General Mean Field Models
18.7. Beyond Mean Field Models: Band Matrices
References
Index
Preface

This book is a concise and self-contained introduction to the recent techniques
to prove local spectral universality for large random matrices. Random
matrix theory is a fast-expanding research area, and this book mainly focuses on
the methods we participated in developing over the past few years. Many other
interesting topics are not included, nor are several new developments within
the framework of these methods. We have chosen instead to present key con-
cepts that we believe are the core of these methods and should be relevant for
future applications. We keep technicalities to a minimum to make the book
accessible to graduate students. With this in mind, we include in this book the
basic notions and tools for high-dimensional analysis such as large deviation,
entropy, Dirichlet form, and logarithmic Sobolev inequality.
The material in this book originates from our joint works with a group of
collaborators over the past several years. Not only were the main mathematical
results in this book taken from these works, but the presentation of many sec-
tions followed the routes laid out in these papers. In alphabetical order, these
coauthors were Paul Bourgade, Antti Knowles, Sandrine Péché, Jose Ramírez,
Benjamin Schlein, and Jun Yin. We would like to thank all of them.
This manuscript was developed and continuously improved over the last
five years. We have taught this material in several regular graduate courses at
Harvard, Munich, and Vienna, in addition to various summer schools and short
courses. We are thankful for the generous support of the Institute for Advanced
Study, Princeton, where part of this manuscript was written during the special
year devoted to random matrices in 2013–14. L.E. also thanks Harvard Univer-
sity for the continuous support during his numerous visits. L.E. was partially
supported by the SFB TR 12 grant of the German Science Foundation and the
ERC Advanced Grant, RANMAT 338804 of the European Research Council.
H.-T. Y. would like to thank the National Center for Theoretical Sciences at
the National Taiwan University, where part of the manuscript was written, for
the hospitality and support for his several visits. H.-T. Y. gratefully acknowl-
edges the support from NSF DMS-1307444 and DMS-1606305 and a Simons
Investigator award.
Finally, we are grateful to the publisher for editorial support, to Amol
Aggarwal, Johannes Alt, and Patrick Lopatto for careful reading of the manuscript,
and to Alex Gontar for his help in composing the bibliography.

CHAPTER 1

Introduction

Perhaps I am now too courageous when I try to guess the distribution
of the distances between successive levels [of energies of heavy
nuclei] …. Theoretically, the situation is quite simple if one attacks
the problem in a simple-minded fashion. The question is simply
‘what are the distances of the characteristic values of a symmetric
matrix with random coefficients?’

Eugene Wigner on the Wigner surmise, 1956 [71]

Random matrices appeared in the literature as early as 1928, when Wishart
[139] used them in statistics. The natural question regarding their eigenvalue
statistics, however, was not raised until the pioneering work [138] of Eugene
Wigner in the 1950s. Wigner’s original motivation came from nuclear physics
when he noticed from experimental data that gaps in energy levels of large nu-
clei tend to follow the same statistics irrespective of the material. Quantum
mechanics predicts that energy levels are eigenvalues of a self-adjoint operator,
but the correct Hamiltonian operator describing nuclear forces was not known
at that time. In addition, the computation of the energy levels of large quantum
systems would have been impossible even with the full Hamiltonian explicitly
given. Instead of pursuing a direct solution to this problem, Wigner appealed
to a phenomenological model to explain his observation. Wigner’s pioneering
idea was to model the complex Hamiltonian by a random matrix with inde-
pendent entries. All physical details of the system were ignored except one,
the symmetry type: systems with time-reversal symmetry were modeled by real
symmetric random matrices, while complex Hermitian random matrices were
used for systems without time-reversal symmetry (e.g., with magnetic forces).
This simple-minded model amazingly reproduced the correct gap statistics, in-
dicating a profound universality principle working in the background.
Notwithstanding their physical significance, random matrices are also very
natural mathematical objects, and their studies could have been initiated by
mathematicians driven by pure curiosity. Large collections of random numbers
and vectors are known to exhibit universal patterns; the obvious examples
are the law of large numbers and the central limit theorem. What are their
analogues in the noncommutative setting, e.g., for matrices? Focusing on the
spectrum, what do eigenvalues of typical large random matrices look like?

As the first result of this type, Wigner proved a type of law of large numbers
for the density of eigenvalues, which we now explain. The (real or complex)
Wigner ensembles consist of $N \times N$ self-adjoint matrices $H = (h_{ij})$ with
matrix elements having mean zero and variance $1/N$ that are independent up
to the symmetry constraint $h_{ij} = \overline{h_{ji}}$. The Wigner semicircle law states that,
as $N \to \infty$, the empirical density of the eigenvalues of $H$ converges to the semicircle law
$$
\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+},
$$
independent of the details of the distribution of $h_{ij}$.
On the scale of individual eigenvalues, Wigner predicted that the fluctua-
tions of the gaps are universal and their distribution is given by a new law, the
Wigner surmise. This might be viewed as the random matrix analogue of the
central limit theorem.
After Wigner’s discovery, Dyson, Gaudin, and Mehta achieved several fun-
damental mathematical results. In particular, they were able to compute the
gap distribution and the local correlation functions of the eigenvalues for Gauss-
ian ensembles. They are called the Gaussian orthogonal ensemble (GOE) and
the Gaussian unitary ensemble (GUE), corresponding to the two most important
symmetry classes, the real symmetric and complex Hermitian matrices. The
Wigner surmise turned out to be slightly wrong and the correct law is given by
the Gaudin distribution. Dyson and Mehta gradually formulated what is nowa-
days known as the Wigner-Dyson-Mehta (WDM) universality conjecture. As
presented in the classical treatise of Mehta [106], this conjecture asserts that
the local eigenvalue statistics for large random matrices with independent en-
tries are universal; i.e., they do not depend on the particular distribution of the
matrix elements. In particular, they coincide with those in the Gaussian case
that were computed explicitly.
On a more philosophical level, we can recast Wigner’s vision as the hypoth-
esis that the eigenvalue gap distributions for large complicated quantum sys-
tems are universal in the sense that they depend only on the symmetry class of
the physical system but not on other detailed structures. Therefore, the Wigner-
Dyson-Mehta universality conjecture is merely a test of Wigner’s hypothesis for
a special class of matrix models, the Wigner ensembles, which are characterized
by the independence of their matrix elements. The other large class of ma-
trix models is the invariant ensembles. They are defined by a Radon-Nikodym
density of the form exp(−Tr 𝑉(𝐻)) with respect to the flat Lebesgue measure
on the space of real symmetric or complex Hermitian matrices 𝐻. Here 𝑉 is a
real-valued function called the potential of the invariant ensemble. These distri-
butions are invariant under orthogonal or unitary conjugation, but the matrix
elements are not independent except for the Gaussian case. The universality
conjecture for invariant ensembles asserts that the local eigenvalue gap statis-
tics are independent of the function 𝑉.
In contrast to the Wigner case, substantial progress was made for invari-
ant ensembles in the last two decades. The key element of this success was
that invariant ensembles, unlike Wigner matrices, have explicit formulas for
the joint densities of the eigenvalues. These formulas express the eigenvalue
correlation functions as determinants whose entries are given by functions of
orthogonal polynomials. The limiting local eigenvalue statistics are thus deter-
mined by the asymptotic behavior of these formulas and in particular those of
the orthogonal polynomials as the size of the matrix tends to infinity. A key
tool, the Riemann-Hilbert method [72], was brought into this subject
by Fokas, Its, and Kitaev, and the universality of eigenvalue statistics was
established for large classes of invariant ensembles by Bleher-Its [20] and by
Deift and collaborators [37–41].
The cornerstone of the spectacular progress in understanding the local statistics
of invariant ensembles is the existence of explicit formulas representing
eigenvalue correlation functions in terms of orthogonal polynomials, a key
observation made in the original work of Gaudin and Mehta for Gaussian random
matrices. For Wigner ensembles, there are no explicit formulas for the
eigenvalue statistics, and the WDM conjecture was open
for almost 50 years with virtually no progress. The first significant advance
in this direction was made by Johansson [86], who proved the universality for
complex Hermitian matrices under the assumption that the common distribution
of the matrix entries has a substantial Gaussian component; i.e., the random
matrix $H$ is of the form $H = H_0 + aH^{G}$, where $H_0$ is a general Wigner
matrix, $H^{G}$ is the GUE matrix, and $a$ is a certain, not too small, positive
constant independent of $N$. His proof relied on an explicit formula by Brézin and
Hikami [30, 31] that uses a certain version of the Harish-Chandra-Itzykson-
Zuber formula [85]. These formulas are available for the complex Hermitian
case only, which restricted the method to this symmetry class.
If local spectral universality is as ubiquitous as Wigner, Dyson, and Mehta
conjectured, there must be an underlying mechanism driving local statistics to
their universal fixed point. In hindsight, the existence of such a mechanism is
almost synonymous with “universality.” However, up to ten years ago there was no
clear indication whether a solution of the WDM conjecture would rely on some
yet-to-be-discovered formulas or would come from some other deep insight.
The goal of this book is to give a self-contained introduction to a new ap-
proach to local spectral universality which, in particular, resolves the WDM
conjecture. To keep technicalities to a minimum, we will consider the simplest
class of matrices that we call generalized Wigner matrices (Definition 2.1), and
we will focus on proving universality in a simpler form, in the averaged energy
sense (see Section 5.1), together with the necessary prerequisites. We stress that these are
not the strongest results that are currently available, but they are good repre-
sentatives of the general ideas we have developed in the past several years. We
believe that these techniques are sufficiently mature to be presented in a book
format.
This approach consists of the following three steps:

Step 1. Local semicircle law: It provides an a priori estimate showing that
the density of eigenvalues of generalized Wigner matrices is given by the semi-
circle law at very small microscopic scales, i.e., down to spectral intervals that
contain $N^{\varepsilon}$ eigenvalues.
Step 2. Universality for Gaussian divisible ensembles: It proves that the local
statistics of Gaussian divisible ensembles $H_0 + aH^{G}$ are the same as those of the
Gaussian ensembles $H^{G}$ as long as $a \ge N^{-1/2+\varepsilon}$, i.e., already for very small $a$.
Step 3. Approximation by a Gaussian divisible ensemble: It is a type of “den-
sity argument” that extends the local spectral universality from Gaussian divis-
ible ensembles to all Wigner ensembles.

The conceptually novel point of our method is Step 2. The eigenvalue distributions
of the Gaussian divisible ensembles, written in the form $e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,H^{G}$,
coincide with those of the solution $H_t$, at any time $t \ge 0$, of a matrix-valued
Ornstein-Uhlenbeck (OU) process started from $H_0$. Dyson [45] observed half a
century ago that the dynamics of the eigenvalues of 𝐻𝑡 is given by an inter-
acting stochastic particle system, called the Dyson Brownian motion (DBM).
In addition, the invariant measure of this dynamics is exactly the eigenvalue
distribution of GOE or GUE. This invariant measure is also a Gibbs measure
of point particles in one dimension interacting via a long-range, logarithmic
potential. Using a heuristic physical argument, Dyson remarked [45] that the
DBM reaches its “local equilibrium” on a short time scale $t \gtrsim N^{-1}$. We will
refer to this as Dyson’s conjecture, although it was more an intuitive physical
picture than an exact mathematical statement.
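The interpolation behind Step 2 can be made concrete numerically. The following is a minimal pure-Python sketch, not the construction used in the proofs: it assumes a hypothetical real symmetric $H_0$ with random-sign entries scaled to match the GOE variances, draws an independent GOE matrix $H^{G}$, and forms $e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H^{G}$ entrywise. Because $e^{-t} + (1 - e^{-t}) = 1$, every entry keeps its variance, so the matrix stays in the same Wigner class while the Gaussian weight $a = \sqrt{1 - e^{-t}}$ grows. All function names are illustrative.

```python
import math
import random


def rademacher_wigner(n, rng):
    """Hypothetical H0: real symmetric, entries +-1/sqrt(n) off the diagonal and
    +-sqrt(2/n) on it, so its variances match the GOE sampled below."""
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = rng.choice((-1.0, 1.0)) * math.sqrt(2.0 / n)
        for j in range(i + 1, n):
            h[i][j] = h[j][i] = rng.choice((-1.0, 1.0)) / math.sqrt(n)
    return h


def goe(n, rng):
    """GOE: Gaussian entries with variance 1/n off the diagonal, 2/n on it."""
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = rng.gauss(0.0, math.sqrt(2.0 / n))
        for j in range(i + 1, n):
            h[i][j] = h[j][i] = rng.gauss(0.0, 1.0 / math.sqrt(n))
    return h


def gaussian_divisible(h0, hg, t):
    """Entrywise e^{-t/2} H0 + sqrt(1 - e^{-t}) HG; the squared weights sum to 1."""
    w0, w1 = math.exp(-t / 2.0), math.sqrt(1.0 - math.exp(-t))
    return [[w0 * x + w1 * y for x, y in zip(r0, r1)] for r0, r1 in zip(h0, hg)]


def normalized_trace_sq(h):
    """(1/N) Tr H^2 = (1/N) * sum_ij h_ij^2 for a real symmetric H."""
    return sum(x * x for row in h for x in row) / len(h)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    h0, hg = rademacher_wigner(n, rng), goe(n, rng)
    for t in (0.0, 1.0 / n, 0.1, 1.0, 10.0):
        ht = gaussian_divisible(h0, hg, t)
        a = math.sqrt(1.0 - math.exp(-t))
        print(f"t={t:<7.3g} a={a:.3f} (1/N)Tr H_t^2={normalized_trace_sq(ht):.4f}")
```

For small $t$ one has $a = \sqrt{1 - e^{-t}} \approx \sqrt{t}$, so the condition $a \ge N^{-1/2+\varepsilon}$ of Step 2 corresponds to $t \gtrsim N^{-1+2\varepsilon}$, just above Dyson's time scale $t \gtrsim N^{-1}$.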
Since Dyson’s work in the 1960s, there has been virtually no progress in
proving this conjecture. Besides the limitations of available mathematical tools, one
main factor is the vague formulation of the conjecture involving the notion of
“local equilibrium,” which even nowadays is not well-defined for a Gibbs mea-
sure with a general long-range interaction. Furthermore, a possible connection
between Dyson’s conjecture and the solution of the WDM conjecture has never
been elucidated in the literature.
In fact, “relaxation to local equilibrium” on a time scale 𝑡 refers to the phenomenon
that after time 𝑡 the dynamics has changed the system, at least locally,
from its initial state to a local equilibrium. It therefore appears counterintuitive
that one may learn anything useful in this way about the WDM conjecture,
since the latter concerns the initial state. The key point is that by applying local
relaxation to all initial states (within a reasonable class) simultaneously, Step 2
generates a large set of random matrix ensembles for which universality holds.
We prove that, for the purpose of universality, this set is sufficiently dense so
that any Wigner matrix 𝐻 is sufficiently close to a Gaussian divisible ensemble
of the form $e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,H^{G}$ with a suitably chosen $H_0$. Originally, our
motivation was driven not by Dyson’s conjecture, but by the desire to prove the
universality for Gaussian divisible ensembles with the Gaussian component as
small as possible. As it turned out, our method used in Step 2 actually proved
a precisely formulated version of Dyson’s conjecture.
The applicability of our approach actually goes well beyond the original
scope. In recent years the method has been extended to invariant ensembles,
sample covariance matrices, adjacency matrices of sparse random graphs, and
certain band matrices. At the end of the book we will give a short overview of
these applications. In the following, we outline the main contents of the book.
In Chapters 2 through 4 we start with an elementary recapitulation of some
well-known concepts, such as the moment method, the orthogonal polynomial
method, the emergence of the sine kernel, the Tracy-Widom distribution, and
the invariant ensembles. These chapters are included only to provide
background for the topics discussed in this book; details are often omitted, since
these issues are discussed extensively in other books. The core material starts
from Chapter 5 with a precise definition of different concepts of universality
and with the formulation of a representative theorem (Theorem 5.1) that we
eventually prove in this book. We also give here a detailed overview of the
three-step strategy.
The first step, the local version of Wigner’s semicircle law, is given in Chap-
ter 6. For pedagogical reasons, we give the proof on several levels. A weaker
version is given in Chapter 7, whose proof is easier; an ambitious reader may
skip this chapter. The proof of the stronger version is found in Chapter 8, which
gives the optimal result in the bulk of the spectrum. To obtain the optimal result
at the edge, an additional idea is needed. This is sketched in Chapter 9,
but we do not give all the technical details. This chapter may be skipped at first
reading. A key ingredient of the proofs, the fluctuation averaging mechanism,
is presented separately in Chapter 10. Important consequences of the local law,
such as rigidity, speed of convergence, and eigenvector delocalization, are given
in Chapter 11.
The second step, the analysis of the Dyson Brownian motion and the proof
of Dyson’s conjecture, is presented in Chapter 12 and Chapter 14. Before the
proof in Chapter 14, we added a separate Chapter 13 devoted to general tools
of high-dimensional analysis, where we introduce the concepts of entropy,
Dirichlet form, Bakry-Émery method, and logarithmic Sobolev inequality.
Some concepts (e.g., the Brascamp-Lieb inequality and hypercontractivity)
are not used in this book, but we included them since they are useful and
closely related.
The third step, the perturbative argument to compare arbitrary Wigner ma-
trices with Gaussian divisible matrices, is given in two versions. A shorter but
somewhat weaker version, the continuity of the correlation functions, is found
in Chapter 15. A technically more involved version, the Green function compar-
ison theorem, is presented in Chapter 16. This completes the proof of our main
result.
Separately, in Chapter 17 we prove the universality at the edge; the main
input is the strong form of the local semicircle law at the edge. Finally, we
give a historical overview, discuss newer results, and provide some outlook on
related models in Chapter 18.
This book is not a comprehensive monograph on random matrices. Al-
though we summarize a few general concepts at the beginning, we mostly focus
on a concrete strategy and the necessary technical steps behind it. For readers
interested in other aspects, in addition to the classical book of Mehta [106],
several excellent works are available that present random matrices in a broader
scope. The books by Anderson, Guionnet, and Zeitouni [8] and Pastur and
Shcherbina [113] contain extensive material starting from the basics. Tao’s
book [128] offers a different perspective on this subject and is self-contained as
a graduate textbook. Forrester’s monograph [73] is a handbook of explicit
formulas related to random matrices. Finally, [6] is an excellent comprehen-
sive overview of diverse applications of random matrix theory in mathematics,
physics, neural networks, and engineering.
Finally, we list a few notational conventions. In order to focus on the es-
sentials, we will not follow the dependencies of various constants on different
parameters. In particular, we will use the generic letters 𝐶 and 𝑐 to denote
positive constants, whose values may change from line to line and which may
depend on some fixed basic parameters of the model. For two positive quanti-
ties 𝐴 and 𝐵, we will write 𝐴 ≲ 𝐵 to indicate that there exists a constant 𝐶 such
that 𝐴 ≤ 𝐶𝐵. If 𝐴 and 𝐵 are comparable in the sense that 𝐴 ≲ 𝐵 and 𝐵 ≲ 𝐴,
then we write $A \asymp B$. We introduce the notation $[\![A, B]\!] := \mathbb{Z} \cap [A, B]$ for the
set of integers between any two real numbers $A < B$.
CHAPTER 2

Wigner Matrices and Their Generalizations

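Consider $N \times N$ square matrices of the form
$$
H = H^{(N)} = \begin{pmatrix}
h_{11} & h_{12} & \cdots & h_{1N} \\
h_{21} & h_{22} & \cdots & h_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
h_{N1} & h_{N2} & \cdots & h_{NN}
\end{pmatrix} \tag{2.1}
$$
with centered entries
$$
\mathbb{E}\, h_{ij} = 0, \qquad i, j = 1, \dots, N. \tag{2.2}
$$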
The random variables $h_{ij}$ for $i, j = 1, \dots, N$ are real or complex random
variables subject to the symmetry constraint $h_{ij} = \overline{h_{ji}}$, so that $H$ is either Hermitian
(complex) or symmetric (real). Let $S = S^{(N)}$ denote the matrix of variances
$$
S := \begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1N} \\
s_{21} & s_{22} & \cdots & s_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
s_{N1} & s_{N2} & \cdots & s_{NN}
\end{pmatrix}, \qquad s_{ij} := \mathbb{E}|h_{ij}|^2. \tag{2.3}
$$
We assume that
$$
\sum_{j=1}^{N} s_{ij} = 1 \tag{2.4}
$$
for every $i = 1, \dots, N$; i.e., the deterministic $N \times N$ matrix of variances, $S = (s_{ij})$,
which is symmetric by the constraint $h_{ij} = \overline{h_{ji}}$, is doubly stochastic. The size $N$ of the random matrix is the
fundamental limiting parameter throughout this book. Most results become
sharp in the large-𝑁 limit. Many quantities, such as the distribution of 𝐻 and
the matrix 𝑆, naturally depend on 𝑁 but, for notational simplicity, we will often
omit this dependence from the notation.
The simplest case, when
$$
s_{ij} = \frac{1}{N}, \qquad i, j = 1, \dots, N,
$$
is called the Wigner matrix ensemble. We summarize this setup in the following
definition.
Definition 2.1. An $N \times N$ symmetric or Hermitian random matrix (2.1)
is called a universal Wigner matrix (ensemble) if the entries are centered (2.2),
their variances $s_{ij} = \mathbb{E}|h_{ij}|^2$ satisfy
$$
\sum_{j} s_{ij} = 1, \qquad i = 1, \dots, N, \tag{2.5}
$$
and $\{h_{ij} : i \le j\}$ are independent. An important subclass of the universal
Wigner ensembles is formed by the generalized Wigner matrices (ensembles), for which,
additionally, the variances are comparable; i.e.,
$$
0 < C_{\inf} \le N s_{ij} \le C_{\sup} < \infty, \qquad i, j = 1, \dots, N, \tag{2.6}
$$
holds with some fixed positive constants $C_{\inf}, C_{\sup}$. For Hermitian ensembles,
we additionally require that for each $i, j$ the $2 \times 2$ covariance matrix is bounded
from below by $C/N$ in the matrix sense, i.e.,
$$
\Sigma_{ij} := \begin{pmatrix}
\mathbb{E}(\Re h_{ij})^2 & \mathbb{E}(\Re h_{ij})(\Im h_{ij}) \\
\mathbb{E}(\Re h_{ij})(\Im h_{ij}) & \mathbb{E}(\Im h_{ij})^2
\end{pmatrix} \ge \frac{C}{N}.
$$
In the special case $s_{ij} = 1/N$ and $\Sigma_{ij} = \frac{1}{2N} I_{2\times 2}$, we recover the original
definition of the Wigner matrix or Wigner ensemble [138].
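Conditions (2.5) and (2.6) concern only the deterministic variance matrix $S$, so they are easy to check mechanically. The Python sketch below does this for a hypothetical circulant profile $s_{ij} = (1 + \epsilon\cos(2\pi(i-j)/N))/N$, chosen purely for illustration: its rows sum to 1 because the cosines over a full period sum to zero, so for $0 \le \epsilon < 1$ it is a generalized Wigner profile with $C_{\inf} = 1 - \epsilon$ and $C_{\sup} = 1 + \epsilon$, and $\epsilon = 0$ recovers the Wigner case $s_{ij} = 1/N$. The independence requirement on the entries is not something this helper can check.

```python
import math


def cosine_profile(n, eps=0.5):
    """Hypothetical circulant variance matrix s_ij = (1 + eps*cos(2*pi*(i-j)/n)) / n."""
    return [[(1.0 + eps * math.cos(2.0 * math.pi * (i - j) / n)) / n
             for j in range(n)] for i in range(n)]


def is_generalized_wigner_profile(s, c_inf, c_sup, tol=1e-12):
    """Check symmetry of S, the row sums (2.5) and the comparability bounds (2.6)."""
    n = len(s)
    symmetric = all(abs(s[i][j] - s[j][i]) <= tol
                    for i in range(n) for j in range(n))
    rows_sum_to_one = all(abs(sum(row) - 1.0) <= tol for row in s)
    comparable = all(c_inf - tol <= n * x <= c_sup + tol
                     for row in s for x in row)
    return symmetric and rows_sum_to_one and comparable


if __name__ == "__main__":
    n = 64
    print(is_generalized_wigner_profile(cosine_profile(n, 0.5), 0.5, 1.5))  # True
    print(is_generalized_wigner_profile(cosine_profile(n, 0.0), 1.0, 1.0))  # True: Wigner case
    print(is_generalized_wigner_profile(cosine_profile(n, 1.0), 0.5, 1.5))  # False
```

With $\epsilon = 1$ and even $N$ some variances vanish, so (2.5) still holds but (2.6) fails: such a profile is at most a universal Wigner matrix, not a generalized one.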
In this book we will focus on the generalized Wigner matrices. These are
mean field models in the sense that the typical sizes of the matrix elements ℎ𝑖𝑗
are comparable; see (2.6). In Section 18.7 we will discuss models beyond the
mean field regime.
The most prominent Wigner ensembles are the Gaussian orthogonal ensem-
ble (GOE) and the Gaussian unitary ensemble (GUE), i.e., real symmetric and
complex Hermitian Wigner matrices with rescaled matrix elements $\sqrt{N}h_{ij}$ being
standard Gaussian variables. More precisely, in the Hermitian case, $\sqrt{N}h_{ij}$
is a standard complex Gaussian variable; i.e.,
$$
\mathbb{E}\big|\sqrt{N}h_{ij}\big|^2 = 1,
$$
with the real and imaginary parts independent and having the same variance
when $i \ne j$. Furthermore, $\sqrt{N}h_{ii}$ is a real standard Gaussian variable with
variance 1. For the real symmetric case we require that
$$
\mathbb{E}\big|\sqrt{N}h_{ij}\big|^2 = 1 = \frac{1}{2}\,\mathbb{E}\big|\sqrt{N}h_{ii}\big|^2
$$
for any $i \ne j$.
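These normalizations are easy to get wrong in simulations. The pure-Python sketch below (function names are hypothetical) samples GOE and GUE matrices with exactly the variances stated above and reports the empirical averages of $N|h_{ij}|^2$ over the off-diagonal and the diagonal entries; for large $N$ these should be close to 1 and 2 (GOE) or 1 and 1 (GUE).

```python
import math
import random


def sample_goe(n, rng):
    """Real symmetric: N*E h_ij^2 = 1 for i != j and N*E h_ii^2 = 2."""
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = rng.gauss(0.0, math.sqrt(2.0 / n))
        for j in range(i + 1, n):
            h[i][j] = h[j][i] = rng.gauss(0.0, math.sqrt(1.0 / n))
    return h


def sample_gue(n, rng):
    """Complex Hermitian: independent real and imaginary parts, each with
    variance 1/(2N), off the diagonal; real diagonal with variance 1/N."""
    sigma = math.sqrt(0.5 / n)
    h = [[0j] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = complex(rng.gauss(0.0, math.sqrt(1.0 / n)), 0.0)
        for j in range(i + 1, n):
            z = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            h[i][j], h[j][i] = z, z.conjugate()
    return h


def rescaled_second_moments(h):
    """Averages of N|h_ij|^2 over the N(N-1) off-diagonal and the N diagonal entries."""
    n = len(h)
    off = sum(abs(h[i][j]) ** 2 for i in range(n) for j in range(n) if i != j)
    diag = sum(abs(h[i][i]) ** 2 for i in range(n))
    # n * off / (n * (n - 1)) and n * diag / n, simplified
    return off / (n - 1), diag


if __name__ == "__main__":
    rng = random.Random(0)
    print("GOE", rescaled_second_moments(sample_goe(300, rng)))
    print("GUE", rescaled_second_moments(sample_gue(300, rng)))
```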
For simplicity of the presentation, in the case of the Wigner ensembles we
will assume that the matrix elements $h_{ij}$ for $i < j$ not only have identical variances
but are also identically distributed. In this case we fix a distribution $\nu$ and we
assume that the rescaled matrix elements $\sqrt{N}h_{ij}$ are distributed according to $\nu$.
Depending on the symmetry type, the diagonal elements may have a slightly different
distribution, but we will omit this subtlety from the discussion. The distribution
$\nu$ will be called the single-entry distribution of $H$. In the case of generalized
Wigner matrices, we assume that the rescaled matrix elements $s_{ij}^{-1/2}h_{ij}$, which
have unit variance, all have law $\nu$.
We will need some decay property of $\nu$, and we will consider two decay types.
Either we assume that $\nu$ has subexponential decay, i.e., that there are constants
$C, \vartheta > 0$ such that for any $s > 0$
$$
\int \mathbf{1}(|x| \ge s)\,\mathrm{d}\nu(x) \le C \exp(-s^{\vartheta}); \tag{2.7}
$$
or we assume that $\nu$ has polynomial decay with arbitrarily high power, i.e.,
$$
\int \mathbf{1}(|x| \ge s)\,\mathrm{d}\nu(x) \le C_p (1 + s)^{-p} \tag{2.8}
$$
for all $s > 0$ and any $p \in \mathbb{N}$.
Another class of random matrices, which even predate Wigner, are the random
sample covariance matrices. These are matrices of the form
$$
H = X^* X, \tag{2.9}
$$
where $X$ is a rectangular $M \times N$ matrix with centered i.i.d. entries with variance
$\mathbb{E}|X_{ij}|^2 = M^{-1}$. Note that the matrix elements of $H$ are not independent, but
they are generated from the independent matrix elements of 𝑋 in a straightfor-
ward way. These matrices appear in statistical samples and were first consid-
ered by Wishart [139]. In the case when 𝑋𝑖𝑗 are centered Gaussian, the random
covariance matrices are called Wishart matrices or ensembles.
CHAPTER 3

Eigenvalue Density

3.1. Wigner Semicircle Law and Other Canonical Densities


For a real symmetric or complex Hermitian Wigner matrix 𝐻, let 𝜆1 ≤ 𝜆2 ≤
⋯ ≤ 𝜆𝑁 denote the eigenvalues. The empirical distribution of eigenvalues
follows a universal pattern, the Wigner semicircle law. To formulate it more
precisely, note that the typical spacing between neighboring eigenvalues is of
order 1/𝑁. So, in a fixed interval [𝑎, 𝑏] ⊂ ℝ one expects macroscopically many
(of order $N$) eigenvalues. More precisely, it can be shown (the first proof was given
by Wigner [138]) that for any fixed real numbers $a \le b$,
$$
\lim_{N \to \infty} \frac{1}{N}\,\#\{i : \lambda_i \in [a, b]\} = \int_a^b \varrho_{sc}(x)\,\mathrm{d}x,
\qquad
\varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4 - x^2)_+}, \tag{3.1}
$$
where (𝑎)+ ∶= max{𝑎, 0} denotes the positive part of the number 𝑎. Note the
emergence of the universal density, the semicircle law, that is independent of
the details of the distribution of the matrix elements. The semicircle law holds
beyond Wigner matrices: it characterizes the eigenvalue density of the univer-
sal Wigner matrices (see Definition 2.1).
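The semicircle law (3.1) is easy to observe numerically. The following sketch is only an illustration, assuming NumPy; the $\pm 1$ entry distribution, the helper names `wigner_matrix` and `semicircle_mass`, and the interval $[-1, 0.5]$ are hypothetical choices, not taken from the text. It samples one real symmetric Wigner matrix with $s_{ij} = 1/N$ and compares the fraction of its eigenvalues in $[a, b]$ with $\int_a^b \varrho_{\mathrm{sc}}$.

```python
import numpy as np

def wigner_matrix(n, rng):
    """Real symmetric Wigner matrix with +/-1 entries scaled so that E|h_ij|^2 = 1/n."""
    a = rng.choice([-1.0, 1.0], size=(n, n))
    # Keep the upper triangle (with diagonal) and mirror the strict upper triangle.
    h = np.triu(a) + np.triu(a, 1).T
    return h / np.sqrt(n)

def semicircle_mass(a, b, steps=20000):
    """Midpoint-rule approximation of the integral of rho_sc over [a, b]."""
    dx = (b - a) / steps
    x = a + dx * (np.arange(steps) + 0.5)
    density = np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)
    return float(np.sum(density) * dx)

rng = np.random.default_rng(0)
n = 1000
eigs = np.linalg.eigvalsh(wigner_matrix(n, rng))
a, b = -1.0, 0.5
empirical = float(np.mean((eigs >= a) & (eigs <= b)))  # (1/N) #{i : lambda_i in [a, b]}
print(empirical, semicircle_mass(a, b))
```

Increasing $N$ should make the agreement sharper, in line with the limit in (3.1).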
For the random covariance matrices (2.9), the empirical density of the eigenvalues $\lambda_i$ of $H$ converges to the Marchenko-Pastur law [104] in the limit $M, N \to \infty$ with the ratio $d = N/M$ fixed, $0 < d \le 1$:
$$\lim_{N\to\infty} \frac{1}{N}\#\{i : \lambda_i \in [a,b]\} = \int_a^b \varrho_{\mathrm{MP}}(x)\,\mathrm{d}x, \qquad \varrho_{\mathrm{MP}}(x) := \frac{1}{2\pi d}\sqrt{\frac{[(\lambda_+ - x)(x - \lambda_-)]_+}{x^2}}, \tag{3.2}$$
with $\lambda_\pm := (1 \pm \sqrt{d})^2$ being the spectral edges. Note that in case $M < N$, the matrix $H$ has macroscopically many ($N - M$) zero eigenvalues; however, the nonzero eigenvalues of $XX^*$ and $X^*X$ coincide, so the Marchenko-Pastur law can be applied to all nonzero eigenvalues with the roles of $M$ and $N$ exchanged.
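In the same spirit, the minimal sketch below builds $H = X^*X$ as in (2.9) and compares the empirical eigenvalue fraction in an interval with the Marchenko-Pastur integral (3.2). It assumes NumPy; the helpers `mp_density` and `mp_mass`, the Gaussian entries, the sizes $M = 2000$, $N = 1000$ (so $d = 1/2$), and the interval $[0.5, 1.5]$ are illustrative choices, not from the text.

```python
import numpy as np

def mp_density(x, d):
    """Marchenko-Pastur density (3.2) for the ratio d = N/M, 0 < d <= 1."""
    lam_minus, lam_plus = (1 - np.sqrt(d)) ** 2, (1 + np.sqrt(d)) ** 2
    inside = np.clip((lam_plus - x) * (x - lam_minus), 0.0, None)
    return np.sqrt(inside) / (2 * np.pi * d * x)

def mp_mass(a, b, d, steps=200000):
    """Midpoint-rule approximation of the integral of rho_MP over [a, b], with 0 < a < b."""
    dx = (b - a) / steps
    x = a + dx * (np.arange(steps) + 0.5)
    return float(np.sum(mp_density(x, d)) * dx)

rng = np.random.default_rng(1)
m, n = 2000, 1000                                  # d = N/M = 1/2
x_mat = rng.standard_normal((m, n)) / np.sqrt(m)   # centered entries, E|X_ij|^2 = 1/M
eigs = np.linalg.eigvalsh(x_mat.T @ x_mat)         # H = X^* X is an N x N matrix
d = n / m
empirical = float(np.mean((eigs >= 0.5) & (eigs <= 1.5)))
print(empirical, mp_mass(0.5, 1.5, d))
```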

3.2. The Moment Method


The eigenvalue density is commonly approached via the fairly robust moment method (see [8] for an exposé), which was also Wigner's original approach to proving the semicircle law [138]. The following proof is formulated for Wigner matrices with identically distributed entries, in particular with variances $s_{ij} = N^{-1}$, but a slight modification of the argument also applies to more general ensembles satisfying (2.5). The simplest moment is the second moment, $\operatorname{Tr} H^2$:
$$\sum_{i=1}^N \lambda_i^2 = \operatorname{Tr} H^2 = \sum_{i,j=1}^N |h_{ij}|^2.$$
Taking the expectation and using (2.5), we have
$$\frac{1}{N}\sum_i \mathbb{E}\,\lambda_i^2 = \frac{1}{N}\sum_{ij} s_{ij} = 1.$$

In general, we can compute the expected traces of all even powers of $H$, i.e., $\mathbb{E}\operatorname{Tr} H^{2k}$, by expanding the product as
$$\mathbb{E}\sum_{i_1,\dots,i_{2k}} h_{i_1 i_2} h_{i_2 i_3}\cdots h_{i_{2k} i_1}. \tag{3.3}$$
Since the expectation of each matrix element vanishes, each factor $h_{xy}$ must be paired with at least one other copy of $h_{yx} = \overline{h_{xy}}$; otherwise the expectation value is zero. In general, the sequence need not form a perfect pairing: if we identify $h_{yx}$ with $h_{xy}$, the only restriction is that the matrix element $h_{xy}$ should appear at least twice in the sequence. Under this condition, the main contribution comes from the index sequences that satisfy the perfect pairing condition, i.e., each $h_{xy}$ is paired with a unique $h_{yx}$ in (3.3). These sequences can be classified according to their complexity, and it turns out that the main contribution comes from the so-called backtracking sequences.
An index sequence $i_1 i_2 i_3 \cdots i_{2k} i_1$ returning to the original index $i_1$ is called backtracking if it can be successively generated by the substitution rule
$$a \to aba, \qquad b \in \{1,\dots,N\},\ b \ne a, \tag{3.4}$$
with an arbitrary index $b$. For example, we represent the term
$$h_{i_1 i_2} h_{i_2 i_3} h_{i_3 i_2} h_{i_2 i_4} h_{i_4 i_5} h_{i_5 i_4} h_{i_4 i_2} h_{i_2 i_1}, \qquad i_1,\dots,i_5 \in \{1,\dots,N\}, \tag{3.5}$$
in the expansion of $\operatorname{Tr} H^8$ ($k = 4$) by a walk of length $2 \times 4$ on the set $\{1,\dots,N\}$. This path is generated by the operation (3.4) in the following order:
$$i_1 \to i_1 i_2 i_1 \to i_1 i_2 i_3 i_2 i_1 \to i_1 i_2 i_3 i_2 i_4 i_2 i_1 \to i_1 i_2 i_3 i_2 i_4 i_5 i_4 i_2 i_1$$
with $i_1 \ne i_2$, $i_2 \ne i_3$, $i_2 \ne i_4$, $i_4 \ne i_5$. It may happen that two nonsuccessive labels coincide (e.g., $i_1 = i_3$), but the contribution of such terms is smaller by a factor $1/N$ than the leading term, so we can neglect them. Thus, we assume that all labels $i_1, i_2, i_3, i_4, i_5$ are distinct. We may bookkeep these paths by two objects:
(a) a graph on vertices labeled by 1, 2, 3, 4, 5 (i.e., by the indices of the labels $i_j$), with edges defined by walking over the vertices in the order 1, 2, 3, 2, 4, 5, 4, 2, 1, i.e., the order of the indices of the labels in (3.5);
(b) an assignment of a distinct label $i_j$, $j = 1, 2, 3, 4, 5$, to each vertex.
It is easy to see that the graph generated by backtracking sequences is a
(double-edged) tree on the vertices 1, 2, 3, 4, 5. Now we count the combinatorics
of the objects (a) and (b) separately. We start with (a).
Lemma 3.1. The number of graphs with $k$ double-edges derived from backtracking sequences is explicitly given by the Catalan numbers, $C_k := \frac{1}{k+1}\binom{2k}{k}$.

Proof. There is a one-to-one correspondence between backtracking sequences and nonnegative one-dimensional walks of length $2k$ returning to the origin at time $2k$. It is defined by the simple rule that a forward step $a \to b$ in the substitution rule (3.4) is represented by a step to the right in the one-dimensional walk, while the backtracking step $b \to a$ in (3.4) is a step to the left. Since at any moment the number of backtracking steps cannot be larger than the number of forward steps, the walk remains on the nonnegative semi-axis.
From this interpretation, it is easy to show that the recursive relation
$$C_n = \sum_{k=0}^{n-1} C_k C_{n-k-1}, \qquad C_0 = C_1 = 1,$$
holds for the number $C_n$ of backtracking sequences of total length $2n$. Elementary combinatorics shows that the solution to this recursion is given by the Catalan numbers, which proves the lemma. We remark that $C_n$ has many other characterizations; for instance, it is also the number of rooted planar trees with $n + 1$ vertices, and one could also use this description to verify the lemma. □
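The bijection in the proof can be checked by brute force for small $k$. The sketch below (plain Python; the function names are hypothetical) counts $\pm 1$ walks of length $2k$ that never go below 0 and return to 0, and compares these counts with the recursion and with the closed form $\frac{1}{k+1}\binom{2k}{k}$.

```python
from itertools import product
from math import comb

def count_nonnegative_walks(k):
    """Brute force: number of +/-1 walks of length 2k that never go below 0 and end at 0."""
    count = 0
    for steps in product((1, -1), repeat=2 * k):
        height = 0
        for step in steps:
            height += step          # +1: forward step a -> b, -1: backtracking step b -> a
            if height < 0:
                break
        else:                       # no break: the walk stayed on the nonnegative semi-axis
            if height == 0:
                count += 1
    return count

def catalan_recursive(n_max):
    """C_0, ..., C_{n_max} from C_n = sum_{k=0}^{n-1} C_k C_{n-k-1} with C_0 = 1."""
    c = [1]
    for n in range(1, n_max + 1):
        c.append(sum(c[k] * c[n - k - 1] for k in range(n)))
    return c

def catalan_closed(k):
    """Closed form C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

brute = [count_nonnegative_walks(k) for k in range(9)]
rec = catalan_recursive(8)
print(brute)   # [1, 1, 2, 5, 14, 42, 132, 429, 1430]
```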

The combinatorics of the label assignments in step (b) for backtracking paths of length $2k$ is $N(N-1)\cdots(N-k) \approx N^{k+1}$ for any fixed $k$ in the $N \to \infty$ limit. It is easy to see that among all paths of length $2k$, the backtracking paths support the maximal number, $k+1$, of independent indices. The combinatorics of all non-backtracking paths is at most $CN^k$, i.e., smaller by a factor $1/N$, so they are negligible. Hence $\mathbb{E}\operatorname{Tr} H^{2k}$ can be computed fairly precisely for each finite $k$:
$$\frac{1}{N}\mathbb{E}\operatorname{Tr} H^{2k} = \frac{1}{k+1}\binom{2k}{k} + O_k(N^{-1}). \tag{3.6}$$
Note that the number of independent labels, $N^{k+1}$, after dividing by $N$, exactly cancels the size of the $k$-fold product of variances, $(\mathbb{E}|h|^2)^k = N^{-k}$.
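The moment formula (3.6) can be tested by Monte Carlo. In the sketch below (assuming NumPy; the $\pm 1$ entries, $N = 800$, and the seed are illustrative choices), $\frac{1}{N}\operatorname{Tr} H^{2k}$ is computed from the eigenvalues as $\frac{1}{N}\sum_i \lambda_i^{2k}$ and compared with the Catalan numbers for $k = 1, \dots, 4$. For $\pm 1$ entries, $\frac{1}{N}\operatorname{Tr} H^2 = \frac{1}{N}\sum_{ij}|h_{ij}|^2 = 1$ holds exactly, not only in expectation.

```python
from math import comb
import numpy as np

def normalized_trace_moments(n, k_max, rng):
    """(1/N) Tr H^{2k}, k = 1..k_max, for one symmetric matrix with +/-1 entries scaled by n^{-1/2}."""
    a = rng.choice([-1.0, 1.0], size=(n, n))
    h = (np.triu(a) + np.triu(a, 1).T) / np.sqrt(n)
    eigs = np.linalg.eigvalsh(h)
    return [float(np.mean(eigs ** (2 * k))) for k in range(1, k_max + 1)]

rng = np.random.default_rng(1)
moments = normalized_trace_moments(800, 4, rng)
catalan = [comb(2 * k, k) // (k + 1) for k in range(1, 5)]   # 1, 2, 5, 14
print(moments, catalan)
```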
The expectations of the traces of odd powers of $H$ are negligible since the corresponding index sequences can never satisfy the perfect pairing condition. One can easily check that
$$\int x^{2k}\varrho_{\mathrm{sc}}(x)\,\mathrm{d}x = C_k; \tag{3.7}$$
i.e., the semicircle law is identified as the probability measure on $\mathbb{R}$ whose even moments are the Catalan numbers and whose odd moments vanish. This proves that
$$\frac{1}{N}\mathbb{E}\operatorname{Tr} P(H) \to \int_{\mathbb{R}} P(x)\,\varrho_{\mathrm{sc}}(x)\,\mathrm{d}x \tag{3.8}$$
for any polynomial $P$. With standard approximation arguments, the polynomials can be replaced with any continuous function with compact support and even with characteristic functions of intervals. This proves the semicircle law (3.1) for Wigner matrices in the "expectation sense." To rigorously complete the proof of the semicircle law (3.1) without expectation, we also need to control the variance of $\frac{1}{N}\operatorname{Tr} P(H)$ on the left-hand side of (3.8). Although this can also be done by the moment method, we will not go into further details here; see [8] for a detailed proof.
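Identity (3.7) can also be confirmed numerically. The sketch below (plain Python; `semicircle_moment` is a hypothetical helper) substitutes $x = 2\cos\theta$, which turns $\int x^p \varrho_{\mathrm{sc}}(x)\,\mathrm{d}x$ into $\frac{2}{\pi}\int_0^\pi (2\cos\theta)^p \sin^2\theta\,\mathrm{d}\theta$; for such trigonometric polynomials the midpoint rule is exact up to rounding once enough points are used.

```python
from math import comb, cos, pi, sin

def semicircle_moment(p, steps=2000):
    """p-th moment of rho_sc via x = 2 cos(theta):
    (2/pi) * integral_0^pi (2 cos theta)^p sin^2(theta) d theta, by the midpoint rule."""
    h = pi / steps
    total = sum((2 * cos((i + 0.5) * h)) ** p * sin((i + 0.5) * h) ** 2 for i in range(steps))
    return 2 / pi * total * h

for k in range(6):
    # Even moments should match the Catalan numbers C_k.
    print(2 * k, round(semicircle_moment(2 * k), 10), comb(2 * k, k) // (k + 1))
```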
Finally, we remark that there are index sequences in (3.3) that require computing moments of $h$ higher than the second. While the combinatorics of these terms is negligible, the proof above assumed polynomial decay (2.8), i.e., that all rescaled moments of $h$ are finite, $\mathbb{E}|h_{ij}|^m \le C_m N^{-m/2}$. This condition can be relaxed by a cutoff argument, which we skip here. The interested reader should consult [8] for more details.

3.3. The Resolvent Method and the Stieltjes Transform


An alternative approach to the eigenvalue density is via the Stieltjes transform. The empirical measure of the eigenvalues is defined by
$$\varrho_N(\mathrm{d}x) := \frac{1}{N}\sum_{j=1}^N \delta(x - \lambda_j)\,\mathrm{d}x.$$
The Stieltjes transform of the empirical measure is defined by
$$m(z) = m_N(z) := \frac{1}{N}\operatorname{Tr}\frac{1}{H - z} = \frac{1}{N}\sum_{j=1}^N \frac{1}{\lambda_j - z} = \int_{\mathbb{R}} \frac{\mathrm{d}\varrho_N(x)}{x - z} \tag{3.9}$$
for any $z = E + i\eta$, $E \in \mathbb{R}$, $\eta > 0$. Notice that $m$ is simply the normalized trace of the resolvent of the random matrix $H$ with spectral parameter $z$. The real part $E = \operatorname{Re} z$ will often be referred to as the "energy," alluding to the quantum mechanical interpretation of the spectrum of $H$. An important property of the Stieltjes transform of any probability measure on $\mathbb{R}$ is that its imaginary part is positive whenever $\operatorname{Im} z > 0$.
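Definition (3.9) can be evaluated in two equivalent ways: as the normalized trace of the resolvent, or as the average of $1/(\lambda_j - z)$ over the eigenvalues. The short sketch below (assuming NumPy; the GOE-type sampling, $N = 400$, and $z = 0.3 + 0.05i$ are illustrative) checks that the two agree and that $\operatorname{Im} m_N(z) > 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
g = rng.standard_normal((n, n))
h = (g + g.T) / np.sqrt(2 * n)        # GOE-type matrix, off-diagonal variance 1/n
z = 0.3 + 0.05j                       # spectral parameter z = E + i*eta with eta > 0

resolvent = np.linalg.inv(h - z * np.eye(n))
m_trace = complex(np.trace(resolvent)) / n                       # (1/N) Tr (H - z)^{-1}
m_eigs = complex(np.mean(1.0 / (np.linalg.eigvalsh(h) - z)))     # (1/N) sum_j 1/(lambda_j - z)
print(m_trace, m_eigs)
```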
For large $z$ one can expand $m_N$ as follows:
$$m_N(z) = \frac{1}{N}\operatorname{Tr}\frac{1}{H - z} = -\frac{1}{Nz}\sum_{m=0}^{\infty}\operatorname{Tr}\Big(\frac{H}{z}\Big)^m; \tag{3.10}$$
so after taking the expectation, using (3.6), and neglecting the error terms, we get
$$\mathbb{E}\, m_N(z) \approx -\sum_{k=0}^{\infty}\frac{1}{k+1}\binom{2k}{k} z^{-(2k+1)}, \tag{3.11}$$
which, after some calculus, can be identified as the Laurent series of $\frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big)$. The approximation becomes exact in the $N \to \infty$ limit. Although the expansion (3.10) is valid only for large $z$, given that the limit is an analytic function of $z$, one can extend the relation
$$\lim_{N\to\infty}\mathbb{E}\, m_N(z) = \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big) \tag{3.12}$$
by analytic continuation to the whole upper half-plane $z = E + i\eta$, $\eta > 0$. It is an easy exercise to see that this is exactly the Stieltjes transform of the semicircle density, i.e.,
$$m_{\mathrm{sc}}(z) := \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big) = \int_{\mathbb{R}} \frac{\varrho_{\mathrm{sc}}(x)\,\mathrm{d}x}{x - z}. \tag{3.13}$$
The square root function is chosen with a branch cut in the segment $[-2, 2]$ so that $\sqrt{z^2 - 4} \sim z$ at infinity. This guarantees that $\operatorname{Im} m_{\mathrm{sc}}(z) > 0$ for $\operatorname{Im} z > 0$. Since the Stieltjes transform identifies the measure uniquely, and pointwise convergence of Stieltjes transforms implies weak convergence of measures, we obtain
$$\mathbb{E}\,\varrho_N(\mathrm{d}x) \rightharpoonup \varrho_{\mathrm{sc}}(x)\,\mathrm{d}x. \tag{3.14}$$
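Choosing the correct branch in (3.13) is the only delicate point in evaluating $m_{\mathrm{sc}}$. One convenient way, which is our choice and not prescribed by the text, is to write $\sqrt{z^2 - 4} = \sqrt{z - 2}\,\sqrt{z + 2}$ with principal square roots: the cuts of the two factors overlap on $(-\infty, -2)$ and cancel there, leaving a cut only on $[-2, 2]$. The sketch below (plain Python; helper names are hypothetical) compares this formula with the truncated Laurent series (3.11) and with a direct numerical evaluation of the Stieltjes integral in (3.13), and checks $m_{\mathrm{sc}}(z) = -1/(z + m_{\mathrm{sc}}(z))$.

```python
import cmath
from math import comb, cos, pi, sin

def m_sc(z):
    """Stieltjes transform (3.13); sqrt(z - 2) * sqrt(z + 2) with principal roots
    realizes sqrt(z^2 - 4) with its branch cut on [-2, 2] and behaves like z at infinity."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2

def laurent_series(z, terms=60):
    """Truncation of (3.11): -sum_k C_k z^{-(2k+1)}, convergent for |z| > 2."""
    return -sum(comb(2 * k, k) // (k + 1) * z ** (-(2 * k + 1)) for k in range(terms))

def stieltjes_of_semicircle(z, steps=4000):
    """Numerical integral of rho_sc(x) / (x - z) using x = 2 cos(theta) and the midpoint rule."""
    h = pi / steps
    thetas = ((i + 0.5) * h for i in range(steps))
    return 2 / pi * h * sum(sin(t) ** 2 / (2 * cos(t) - z) for t in thetas)

z = 0.5 + 0.5j
print(m_sc(z), stieltjes_of_semicircle(z))
print(m_sc(5 + 1j), laurent_series(5 + 1j))
```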
The relation (3.12) actually holds with high probability; i.e., for any $z$ with $\operatorname{Im} z > 0$,
$$\lim_{N\to\infty} m_N(z) = \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big) \tag{3.15}$$
in probability. In the next sections we will prove this limit with an effective error term via the resolvent method.
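The convergence in probability (3.15) can be observed directly: for a fixed $z$ in the upper half-plane, the distance between $m_N(z)$ for a single sampled matrix and $m_{\mathrm{sc}}(z)$ shrinks as $N$ grows. This is only a numerical sketch (assuming NumPy; the sizes, the seed, the GOE-type sampling, and $z = -0.7 + 0.2i$ are arbitrary choices), not a substitute for the effective error bounds proved later.

```python
import cmath
import numpy as np

def m_sc(z):
    """Stieltjes transform of the semicircle law, (3.13), with the branch cut on [-2, 2]."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2

def sample_m_n(n, z, rng):
    """m_N(z) for one GOE-type matrix with off-diagonal variance 1/n."""
    g = rng.standard_normal((n, n))
    h = (g + g.T) / np.sqrt(2 * n)
    return complex(np.mean(1.0 / (np.linalg.eigvalsh(h) - z)))

rng = np.random.default_rng(4)
z = -0.7 + 0.2j
errors = {n: abs(sample_m_n(n, z, rng) - m_sc(z)) for n in (100, 400, 1600)}
print(errors)
```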
The semicircle law can be identified in many different ways. The moment method in Section 3.2 utilized the fact that the moments of the semicircle density are given by the Catalan numbers (3.7), which also emerged as the normalized traces of powers of $H$; see (3.6). The resolvent method relies on the fact that $m_N$ approximately satisfies a self-consistent equation, $m_N \approx -(z + m_N)^{-1}$, that is very close to the quadratic equation that $m_{\mathrm{sc}}$ from (3.13) satisfies:
$$m_{\mathrm{sc}}(z) = -\frac{1}{z + m_{\mathrm{sc}}(z)}.$$
In other words, in the resolvent method the semicircle density emerges via a specific relation for its Stieltjes transform. It turns out that this approach allows us to perform a much more precise analysis, especially in the short-scale regime where $\operatorname{Im} z$ approaches 0 as a function of $N$. Since the Stieltjes transform of a measure at spectral parameter $z = E + i\eta$ essentially identifies the measure around $E$ on scale $\eta > 0$, a precise understanding of $m_N(z)$ for small $\operatorname{Im} z$ will yield a local version of the semicircle law. This will be explained in Chapter 6.
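The self-consistent equation $m_{\mathrm{sc}} = -1/(z + m_{\mathrm{sc}})$ can also be solved numerically by fixed-point iteration. This is a sketch of one simple approach, not the method of the text: for $\operatorname{Im} z > 0$ the map $m \mapsto -1/(z + m)$ sends the upper half-plane into itself, so iterating it from $m = i$ stays there and converges to $m_{\mathrm{sc}}(z)$ rather than to the second root of the quadratic equation, which lies in the lower half-plane. The helper names and the parameters are illustrative.

```python
import cmath

def m_sc(z):
    """Closed form (3.13), with the branch cut on [-2, 2]."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2

def solve_self_consistent(z, iterations=5000):
    """Iterate m -> -1/(z + m) starting from m = i (upper half-plane)."""
    m = 1j
    for _ in range(iterations):
        m = -1 / (z + m)
    return m

z = 0.2 + 0.05j
m = solve_self_consistent(z)
print(m, m_sc(z))
```

The contraction becomes weaker as $\operatorname{Im} z \to 0$, so more iterations are needed closer to the real axis.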
CHAPTER 4

Invariant Ensembles

4.1. Joint Density of Eigenvalues for Invariant Ensembles


There is another natural way to define probability distributions on real symmetric or complex Hermitian matrices apart from directly imposing a given probability law $\nu$ on their entries. Such distributions are obtained by defining a density function directly on the set of matrices:
$$\mathbb{P}(H)\,\mathrm{d}H := Z^{-1}\exp\big(-N\operatorname{Tr} V(H)\big)\,\mathrm{d}H. \tag{4.1}$$
Here $\mathrm{d}H = \prod_{i \le j}\mathrm{d}H_{ij}$ is the flat Lebesgue measure (in the case of complex Hermitian matrices and $i < j$, $\mathrm{d}H_{ij}$ is the Lebesgue measure on the complex plane $\mathbb{C}$). The function $V: \mathbb{R} \to \mathbb{R}$ is assumed to grow sufficiently fast at infinity (some logarithmic growth would suffice) to ensure that the measure defined in (4.1) is finite, and $Z$ is the normalization factor. Probability distributions of the form (4.1) are called invariant ensembles since they are invariant under orthogonal or unitary conjugation (in the case of symmetric or Hermitian matrices, respectively). For example, in the Hermitian case, for any fixed unitary matrix $U$ the transformation
$$H \to U^* H U$$
leaves the distribution (4.1) invariant thanks to $\operatorname{Tr} V(U^* H U) = \operatorname{Tr} V(H)$.
Wigner matrices and invariant ensembles form two different universes with quite different mathematical tools available for their study. In fact, these two classes are almost disjoint, the Gaussian ensembles being the only invariant Wigner matrices. This is the content of the following lemma.
Lemma 4.1 ([37] or Theorem 2.6.3 of [106]). Suppose that the real symmetric or complex Hermitian matrix ensembles given in (4.1) have independent entries $h_{ij}$, $i \le j$. Then $V(x)$ is a quadratic polynomial, $V(x) = ax^2 + bx + c$ with $a > 0$. This means that apart from a trivial shift and normalization, the ensemble is GOE or GUE.
For ensembles that remain invariant under the transformation $H \to U^* H U$ for any unitary matrix $U$ (or, in the case of symmetric matrices $H$, for any orthogonal matrix $U$), the joint probability density function of all the $N$ eigenvalues can be explicitly computed. These ensembles are typically given by
$$\mathbb{P}(H)\,\mathrm{d}H = Z^{-1}\exp\Big(-\frac{\beta}{2}N\operatorname{Tr} V(H)\Big)\,\mathrm{d}H, \tag{4.2}$$
which is of the form (4.1) with a traditional extra factor $\beta/2$ that makes some later formulas nicer. The parameter $\beta$ is determined by the symmetry type: $\beta = 1$ for real symmetric ensembles and $\beta = 2$ for complex Hermitian ensembles.
The joint (symmetrized) probability density of the eigenvalues of $H$ can be computed explicitly:
$$p_N(\lambda_1,\dots,\lambda_N) = \mathrm{const}\,\prod_{i<j}|\lambda_i - \lambda_j|^{\beta}\, e^{-\frac{\beta}{2}N\sum_{j=1}^N V(\lambda_j)}. \tag{4.3}$$
In particular, in the Gaussian case $V(\lambda) = \frac{1}{2}\lambda^2$ is quadratic, and thus the joint distribution of the GOE ($\beta = 1$) and GUE ($\beta = 2$) eigenvalues is given by
$$p_N(\lambda_1,\dots,\lambda_N) = \mathrm{const}\,\prod_{i<j}|\lambda_i - \lambda_j|^{\beta}\, e^{-\frac{1}{4}\beta N\sum_{j=1}^N \lambda_j^2}. \tag{4.4}$$
In particular, the eigenvalues are strongly correlated. (In this section we neglect the ordering of the eigenvalues, and we will consider symmetrized statistics.)
The emergence of the Vandermonde determinant in (4.3) is a result of integrating out the "angle" variables in (4.2), i.e., the unitary matrix in the diagonalization $H = U\Lambda U^*$. For illustration, we now show this formula for a $2 \times 2$ matrix. Consider first the real case. By diagonalization, any real symmetric $2 \times 2$ matrix can be written in the form
$$H = \begin{pmatrix} x & z \\ z & y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, \qquad x, y, z \in \mathbb{R}. \tag{4.5}$$
Direct calculation shows that the Jacobian of the coordinate transformation from $(x, y, z)$ to $(\lambda_1, \lambda_2, \theta)$ is
$$(\lambda_1 - \lambda_2). \tag{4.6}$$
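The Jacobian (4.6) is easy to confirm with finite differences. The sketch below (plain Python; all helper names are hypothetical) maps $(\lambda_1, \lambda_2, \theta)$ to the entries $(x, y, z)$ of (4.5) and evaluates the determinant of the central-difference Jacobian matrix; it agrees with $\lambda_1 - \lambda_2$ for every $\theta$.

```python
from math import cos, sin

def sym_from_spectral(l1, l2, theta):
    """Entries (x, y, z) of R(theta) diag(l1, l2) R(theta)^T, as in (4.5)."""
    c, s = cos(theta), sin(theta)
    return (c * c * l1 + s * s * l2,   # x
            s * s * l1 + c * c * l2,   # y
            c * s * (l1 - l2))         # z

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def jacobian_det(l1, l2, theta, h=1e-6):
    """Central-difference Jacobian determinant of (l1, l2, theta) -> (x, y, z)."""
    point = [l1, l2, theta]
    cols = []
    for k in range(3):
        plus, minus = point[:], point[:]
        plus[k] += h
        minus[k] -= h
        fp, fm = sym_from_spectral(*plus), sym_from_spectral(*minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    rows = [[cols[k][i] for k in range(3)] for i in range(3)]   # rows: outputs, columns: inputs
    return det3(rows)

print(jacobian_det(1.7, -0.4, 0.6))   # approximately 1.7 - (-0.4) = 2.1
```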
The complex case is slightly more complicated. With $z = u + iv$ and $x, y \in \mathbb{R}$, we can write
$$H = \begin{pmatrix} x & z \\ \bar z & y \end{pmatrix} = e^{iA}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}e^{-iA}, \tag{4.7}$$
where $A$ is a Hermitian matrix with trace zero; thus, it can be written in the form
$$A = \begin{pmatrix} a & b + ic \\ b - ic & -a \end{pmatrix}, \qquad a, b, c \in \mathbb{R}. \tag{4.8}$$
This parametrization of $SU(2)$ with three real degrees of freedom is standard, but for our purpose we only need two in (4.7) in addition to the two degrees of freedom from the $\lambda$'s. The reason is that the two phases of the eigenvectors are redundant and the trace zero condition only takes out one degree of freedom, leaving one more superfluous parameter. We will see that eventually $a$ plays no role. First, we evaluate the Jacobian at $A = 0$ from the formula (4.7). We only need to keep the leading order in $A$, which gives
$$\begin{pmatrix} x & z \\ \bar z & y \end{pmatrix} = \Lambda + i[A, \Lambda] + O(\|A\|^2), \qquad \Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}. \tag{4.9}$$
Thus
$$\begin{pmatrix} x & z \\ \bar z & y \end{pmatrix} = \begin{pmatrix} \lambda_1 & i(b + ic)(\lambda_2 - \lambda_1) \\ -i(b - ic)(\lambda_2 - \lambda_1) & \lambda_2 \end{pmatrix} + O(\|A\|^2), \tag{4.10}$$
and the Jacobian of the transformation from $(x, y, z)$ to $(\lambda_1, \lambda_2, b, c)$ at $b = c = 0$ is of the form
$$C(\lambda_2 - \lambda_1)^2\det\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & i & -i \\ 0 & 0 & -1 & -1 \end{pmatrix} = C(\lambda_1 - \lambda_2)^2 \tag{4.11}$$
with some constant $C$. To compute the Jacobian away from the identity, we first notice that, by rotation invariance, the measure factorizes. This means that its density with respect to the Lebesgue measure can be written in the form $f(\Lambda)g(U)$ with some functions $f$ and $g$; in fact, $g(U)$ is constant (the marginal on the unitary part is the Haar measure). The function $f$ may be computed at any point; in particular, at $U = I$ this was the calculation (4.11), yielding $f(\Lambda) = C(\lambda_1 - \lambda_2)^2$. This proves (4.4) for $N = 2$ modulo the case of multiple eigenvalues $\lambda_1 = \lambda_2$, where the parametrization of $U$ is even more redundant. But this set has zero measure, so it is negligible (see [8] for a precise argument). The formula (4.9) is the basis for the proof for general $N$, which we leave to the reader. The detailed proof can be found in [8] or [37].
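The factorization argument can be checked numerically in the complex case. In the sketch below (plain Python; helper names are hypothetical) we set $a = 0$ in (4.8), so that $A^2 = r^2 I$ with $r = \sqrt{b^2 + c^2}$ and $e^{iA} = \cos(r) I + i\frac{\sin r}{r}A$. The finite-difference Jacobian of $(\lambda_1, \lambda_2, b, c) \mapsto (x, y, \operatorname{Re} z, \operatorname{Im} z)$ equals $(\lambda_1 - \lambda_2)^2$ at $b = c = 0$, consistent with (4.11), and at other values of $(b, c)$ its ratio to $(\lambda_1 - \lambda_2)^2$ does not depend on $\lambda_1, \lambda_2$, as the factorization $f(\Lambda)g(U)$ predicts. (Indeed $H = \lambda_2 I + (\lambda_1 - \lambda_2)P$ with a rank-one projection $P$ depending only on $(b, c)$.)

```python
from math import cos, sin

def matmul(p, q):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def hermitian_coords(l1, l2, b, c):
    """(x, y, Re z, Im z) for H = e^{iA} diag(l1, l2) e^{-iA}, with A from (4.8) and a = 0."""
    w = complex(b, c)
    r = abs(w)
    s = sin(r) / r if r > 0 else 1.0   # sin(r)/r, extended by its limit 1 at r = 0
    # A = [[0, w], [conj(w), 0]] satisfies A^2 = r^2 I, hence e^{iA} = cos(r) I + i (sin(r)/r) A.
    u = [[cos(r), 1j * s * w], [1j * s * w.conjugate(), cos(r)]]
    u_star = [[u[j][i].conjugate() for j in range(2)] for i in range(2)]
    h = matmul(matmul(u, [[l1, 0.0], [0.0, l2]]), u_star)
    return [h[0][0].real, h[1][1].real, h[0][1].real, h[0][1].imag]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result

def jacobian_det(l1, l2, b, c, h=1e-6):
    """Central-difference Jacobian determinant of (l1, l2, b, c) -> (x, y, Re z, Im z)."""
    point = [l1, l2, b, c]
    cols = []
    for k in range(4):
        plus, minus = point[:], point[:]
        plus[k] += h
        minus[k] -= h
        fp, fm = hermitian_coords(*plus), hermitian_coords(*minus)
        cols.append([(p - q) / (2 * h) for p, q in zip(fp, fm)])
    return det([[cols[k][i] for k in range(4)] for i in range(4)])

print(jacobian_det(1.3, -0.8, 0.0, 0.0))    # approximately (1.3 + 0.8)^2 = 4.41
ratio_a = jacobian_det(1.3, -0.8, 0.4, -0.7) / (1.3 + 0.8) ** 2
ratio_b = jacobian_det(2.5, 0.1, 0.4, -0.7) / (2.5 - 0.1) ** 2
print(ratio_a, ratio_b)                     # equal up to finite-difference error
```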
It is often useful to think of the measure (4.4) as a Gibbs measure on $N$ "particles" or "points" $\boldsymbol\lambda = (\lambda_1, \dots, \lambda_N)$ of the form
$$\mu_N(\mathrm{d}\boldsymbol\lambda) = p_N(\boldsymbol\lambda)\,\mathrm{d}\boldsymbol\lambda = \frac{e^{-\beta N \mathcal{H}(\boldsymbol\lambda)}}{Z}\,\mathrm{d}\boldsymbol\lambda, \qquad \mathcal{H}(\boldsymbol\lambda) := \frac{1}{2}\sum_{i=1}^N V(\lambda_i) - \frac{1}{N}\sum_{i<j}\log|\lambda_j - \lambda_i|, \tag{4.12}$$
with the confining potential $V(\lambda)$ and logarithmic interaction. (This connection was first exploited in [46].) We adopt the standard convention in random matrix theory that the Hamiltonian $\mathcal{H}$ expresses energy per particle, in contrast to the standard statistical physics terminology where the "Hamiltonian" refers to the total energy. This explains the unusual factor $N$ in the exponent. Notice that the Vandermonde determinant in (4.3) comes directly from integrating out the Haar measure, and that the symmetry type of the ensemble appears only through the exponent $\beta$. Only the "classical" cases $\beta = 1$, 2, or 4 correspond to matrix ensembles of the form (4.2), namely, to the real symmetric, complex Hermitian, and quaternion self-dual matrices. We will not give the precise definition of the latter (see, e.g., chapter 7 of [106] or [64]) and just mention that this is the natural generalization of symmetric or Hermitian matrices to quaternion entries, and that they have real eigenvalues.
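For the Gaussian potential $V(\lambda) = \lambda^2/2$, the Gibbs form (4.12) and the explicit density (4.4) agree up to normalization for any $\beta > 0$. The sketch below (plain Python; function names are hypothetical) checks this by comparing log-ratios of the two densities at two random configurations, which eliminates the unknown constants.

```python
import random
from math import log

def hamiltonian(lams, potential):
    """Energy per particle (4.12): (1/2) sum_i V(lambda_i) - (1/N) sum_{i<j} log|lambda_j - lambda_i|."""
    n = len(lams)
    confining = 0.5 * sum(potential(x) for x in lams)
    interaction = sum(log(abs(lams[j] - lams[i])) for i in range(n) for j in range(i + 1, n))
    return confining - interaction / n

def log_density_44(lams, beta):
    """Log of (4.4) without its constant: beta * sum_{i<j} log|lambda_i - lambda_j| - (beta N / 4) sum lambda^2."""
    n = len(lams)
    vandermonde = sum(log(abs(lams[j] - lams[i])) for i in range(n) for j in range(i + 1, n))
    return beta * vandermonde - beta * n / 4 * sum(x * x for x in lams)

def quadratic(x):
    """Gaussian potential V(lambda) = lambda^2 / 2."""
    return 0.5 * x * x

rng = random.Random(3)
n = 6
p = [rng.uniform(-2, 2) for _ in range(n)]
q = [rng.uniform(-2, 2) for _ in range(n)]
for beta in (1.0, 2.0, 2.7):
    # The normalizations cancel in log-ratios of two configurations.
    gibbs = -beta * n * (hamiltonian(p, quadratic) - hamiltonian(q, quadratic))
    explicit = log_density_44(p, beta) - log_density_44(q, beta)
    print(beta, gibbs, explicit)
```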
We remark that despite the convenience of the explicit formula (4.3) or
(4.12) for the joint density, computing various statistics, such as correlation
functions or even the density of a single eigenvalue, is a highly nontrivial task.
For example, the density involves “only” integrating out all but one eigenvalue,
but the measure is far from being a product, so these integrations cannot be
performed directly when 𝑁 is large. The measure (4.12) has a strong and long-
range interaction, while conventional methods of statistical physics are well
suited for short-range interactions. In fact, from this point of view 𝛽 can be any
positive number and does not have to be restricted to the specific values 𝛽 = 1,
2, or 4. For other values of 𝛽 there is no invariant matrix ensemble behind the
measure (4.12), but it is still a very interesting statistical mechanical system,
called the log-gas or 𝛽-ensemble. If the potential 𝑉 is quadratic, then (4.12)
coincides with (4.4) and it is called the Gaussian 𝛽-ensemble. We will briefly
discuss log-gases in Section 18.3.

4.2. Universality of Classical Invariant Ensembles via Orthogonal Polynomials
The classical values $\beta = 1, 2, 4$ in the ensemble (4.12) are special not only because they originate from an invariant matrix ensemble (4.2). For these specific values an extra mathematical structure emerges, namely the orthogonal
polynomials with respect to the weight function 𝑒−𝑁𝑉(𝜆) on the real line. Ow-
ing to this structure, much more is known about these ensembles than about
log-gases with general 𝛽. This approach was originally applied by Mehta and
Gaudin [106,107] to compute the gap distribution for the Gaussian case that in-
volved the classical Hermite orthonormal polynomials. Dyson [47] computed
the local correlation functions for a related ensemble (circular ensemble) that
was extended to the standard Gaussian ensembles by Mehta [105]. Later, a
general method using orthogonal polynomials and the Riemann-Hilbert prob-
lem was developed to tackle a very general class of invariant ensembles (see,
e.g., [20, 37, 40–42, 72, 106, 112] and references therein).
For simplicity, to illustrate the connection, we will consider the Hermitian
case 𝛽 = 2 with a Gaussian potential 𝑉(𝜆) = 𝜆2 /2 (which, by Lemma 4.1, is
also a Wigner matrix ensemble, namely the GUE). To simplify the presentation
further, for the purpose of this subsection only, we rescale the eigenvalues
$$x = \sqrt{N}\lambda, \tag{4.13}$$
which effectively removes the factor $N$ from the exponent in (4.3). (This simple scaling works only in the pure Gaussian case, and it is only a technical convenience to simplify formulas.)
After the rescaling and setting $\beta = 2$, the measure we will consider is given by a density, which we denote by
$$\hat p_N(x_1, \dots, x_N) = \mathrm{const}\,\prod_{i<j}(x_i - x_j)^2 \prod_{j=1}^N e^{-\frac{1}{2}x_j^2}. \tag{4.14}$$
Let $P_k(x)$ be the $k$th orthogonal polynomial on $\mathbb{R}$ with respect to the weight function $e^{-x^2/2}$ with leading coefficient 1. Let
$$\psi_k(x) := \frac{e^{-x^2/4}P_k(x)}{\|e^{-x^2/4}P_k\|_{L^2(\mathbb{R})}}$$
be the corresponding orthonormal function, i.e.,
$$\int \psi_k(x)\psi_\ell(x)\,\mathrm{d}x = \delta_{k,\ell}. \tag{4.15}$$
In the particular case of the Gaussian weight function, $P_k$ is given by the Hermite polynomials
$$P_k(x) = H_k(x) := (-1)^k e^{x^2/2}\frac{\mathrm{d}^k}{\mathrm{d}x^k}e^{-x^2/2}$$
and
$$\psi_k(x) = \frac{P_k(x)}{(2\pi)^{1/4}(k!)^{1/2}}\,e^{-x^2/4}, \tag{4.16}$$
but for the following discussion we will not need these explicit formulae, only the orthonormality relation (4.15).
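In numerical work the $\psi_k$ are conveniently generated from $\psi_0$ with the three-term identity (4.29) stated later in this section, rather than from the Hermite polynomials directly. The sketch below (plain Python; helper names are hypothetical) does this, checks (4.15) with the trapezoid rule on $[-20, 20]$, and compares $\psi_3$ with the explicit formula (4.16), where $P_3(x) = x^3 - 3x$.

```python
from math import exp, factorial, pi, sqrt

def hermite_functions(n_max, x):
    """psi_0(x), ..., psi_{n_max}(x) from x psi_j = sqrt(j+1) psi_{j+1} + sqrt(j) psi_{j-1},
    started at psi_0 = e^{-x^2/4} / (2 pi)^{1/4}."""
    psi = [exp(-x * x / 4) / (2 * pi) ** 0.25]
    if n_max >= 1:
        psi.append(x * psi[0])
    for j in range(1, n_max):
        psi.append((x * psi[j] - sqrt(j) * psi[j - 1]) / sqrt(j + 1))
    return psi

def gram_matrix(n_max, half_width=20.0, steps=4000):
    """Trapezoid-rule approximation of the integrals of psi_k psi_ell over [-half_width, half_width]."""
    h = 2 * half_width / steps
    gram = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = h / 2 if i in (0, steps) else h
        psi = hermite_functions(n_max, x)
        for k in range(n_max + 1):
            for ell in range(n_max + 1):
                gram[k][ell] += weight * psi[k] * psi[ell]
    return gram

gram = gram_matrix(8)
x0 = 0.7
explicit = (x0 ** 3 - 3 * x0) * exp(-x0 ** 2 / 4) / ((2 * pi) ** 0.25 * sqrt(factorial(3)))
print(hermite_functions(3, x0)[3], explicit)
```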
The key observation is that, by simple properties of the Vandermonde determinant, we have
$$\Delta_N(\mathbf{x}) = \prod_{1 \le i < j \le N}(x_j - x_i) = \det\big(x_i^{j-1}\big)_{i,j=1}^N = \det\big(P_{j-1}(x_i)\big)_{i,j=1}^N \tag{4.17}$$
by exploiting that $P_j(x) = x^j + \cdots$ is a polynomial of degree $j$ with leading coefficient equal to 1. Define the kernel
$$K_N(x, y) := \sum_{k=0}^{N-1}\psi_k(x)\psi_k(y), \tag{4.18}$$
i.e., the projection kernel onto the subspace spanned by the first $N$ orthonormal functions. Then (4.17) immediately implies
$$\begin{aligned}
\hat p_N(x_1, \dots, x_N) &= C_N\big[\det(P_{j-1}(x_i))_{i,j=1}^N\big]^2\prod_{i=1}^N e^{-x_i^2/2}\\
&= C_N'\big[\det(\psi_{j-1}(x_i))_{i,j=1}^N\big]^2\\
&= C_N'\det\big(K_N(x_i, x_j)\big)_{i,j=1}^N,
\end{aligned} \tag{4.19}$$
where in the last step we used that the product of the matrices $(\psi_{j-1}(x_i))_{i,j=1}^N$ and $(\psi_{j-1}(x_k))_{j,k=1}^N$ is exactly $(K_N(x_i, x_k))_{i,k=1}^N$, and we did not follow the precise constants for simplicity. The determinantal structure (4.19) for the joint density function is remarkable: it allows one to describe a function of $N$ variables in terms of a kernel of only two variables. This structure greatly simplifies the analysis.
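Both (4.17) and the last step of (4.19) are finite-dimensional identities, so they can be checked at any fixed points. The sketch below (plain Python; the helper names and the five sample points are hypothetical) generates the monic Hermite polynomials by the standard recursion $P_{k+1} = xP_k - kP_{k-1}$ (a known property of these polynomials, not derived in the text), builds $\psi_k$ from (4.16), and compares the determinants.

```python
from math import exp, factorial, pi, prod, sqrt

def monic_hermite(n_max, x):
    """P_0(x), ..., P_{n_max}(x) via P_{k+1} = x P_k - k P_{k-1} (monic, weight e^{-x^2/2})."""
    p = [1.0, x]
    for k in range(1, n_max):
        p.append(x * p[k] - k * p[k - 1])
    return p[: n_max + 1]

def psi(k_max, x):
    """Orthonormal functions (4.16): psi_k = P_k e^{-x^2/4} / ((2 pi)^{1/4} sqrt(k!))."""
    return [pk * exp(-x * x / 4) / ((2 * pi) ** 0.25 * sqrt(factorial(k)))
            for k, pk in enumerate(monic_hermite(k_max, x))]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result

xs = [-2.1, -0.9, 0.2, 1.1, 2.4]
n = len(xs)
vandermonde = prod(xs[j] - xs[i] for i in range(n) for j in range(i + 1, n))
det_p = det([monic_hermite(n - 1, x) for x in xs])       # (P_{j-1}(x_i))_{i,j}
psi_rows = [psi(n - 1, x) for x in xs]                    # (psi_{j-1}(x_i))_{i,j}
# kernel[i][k] = sum_j psi_{j-1}(x_i) psi_{j-1}(x_k) = K_N(x_i, x_k)
kernel = [[sum(a * b for a, b in zip(ri, rk)) for rk in psi_rows] for ri in psi_rows]
print(vandermonde, det_p)
print(det(psi_rows) ** 2, det(kernel))
```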
We now define the correlation functions that play a crucial role in describing universality.
Definition 4.2. Let $p_N(\lambda_1, \dots, \lambda_N)$ be the joint symmetrized probability distribution of the eigenvalues (without the rescaling (4.13)). For any $n \ge 1$, the $n$-point correlation function is defined by
$$p_N^{(n)}(\lambda_1, \dots, \lambda_n) := \int_{\mathbb{R}^{N-n}} p_N(\lambda_1, \dots, \lambda_n, \lambda_{n+1}, \dots, \lambda_N)\,\mathrm{d}\lambda_{n+1}\cdots\mathrm{d}\lambda_N. \tag{4.20}$$
We also define the rescaled correlation function $\hat p_N^{(n)}(x_1, \dots, x_n)$ in a similar way.
Remark. In other sections of this book we usually label the eigenvalues in increasing order so that their probability density, denoted by $\tilde p_N(\boldsymbol\lambda)$, is defined on the set
$$\Xi^{(N)} := \{\lambda_1 < \dots < \lambda_N\} \subset \mathbb{R}^N.$$
For the purpose of Definition 4.2, however, we dropped this restriction and we consider $p_N(\lambda_1, \dots, \lambda_N)$ to be a symmetric function of $N$ variables, $\boldsymbol\lambda = (\lambda_1, \dots, \lambda_N)$, on $\mathbb{R}^N$. The relation between the ordered and unordered densities is clearly $\tilde p_N(\boldsymbol\lambda) = N!\,p_N(\boldsymbol\lambda)\cdot\mathbf{1}(\boldsymbol\lambda \in \Xi^{(N)})$.
The significance of the correlation functions is that with their help one can compute the expectation value of any symmetrized observable. For example, for any test function $O$ of two variables we have, directly from the definition of the correlation functions, that
$$\mathbb{E}\,\frac{1}{N(N-1)}\sum_{i \ne j} O(\lambda_i, \lambda_j) = \int_{\mathbb{R}\times\mathbb{R}} O(\lambda_1, \lambda_2)\,p_N^{(2)}(\lambda_1, \lambda_2)\,\mathrm{d}\lambda_1\mathrm{d}\lambda_2, \tag{4.21}$$
where the expectation is w.r.t. the probability density $p_N$ or, in this case, w.r.t. the original random matrix ensemble. A similar formula holds for observables of any number of variables.
To compute the correlation functions of a determinantal joint density (4.19), we start with the following prototype calculation for $N = 3$, $n = 2$, in which we expand the determinant along its third column:
$$\begin{aligned}
&\int_{\mathbb{R}}\mathrm{d}x_3\,\det\begin{pmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) & K_3(x_1,x_3) \\ K_3(x_2,x_1) & K_3(x_2,x_2) & K_3(x_2,x_3) \\ K_3(x_3,x_1) & K_3(x_3,x_2) & K_3(x_3,x_3) \end{pmatrix}\\
&\quad= \int_{\mathbb{R}}\mathrm{d}x_3\,\det\begin{pmatrix} K_3(x_2,x_1) & K_3(x_2,x_2) \\ K_3(x_3,x_1) & K_3(x_3,x_2) \end{pmatrix} K_3(x_1,x_3)\\
&\qquad- \int_{\mathbb{R}}\mathrm{d}x_3\,\det\begin{pmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_3,x_1) & K_3(x_3,x_2) \end{pmatrix} K_3(x_2,x_3)\\
&\qquad+ \int_{\mathbb{R}}\mathrm{d}x_3\,\det\begin{pmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{pmatrix} K_3(x_3,x_3).
\end{aligned} \tag{4.22}$$
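From the definition (4.18) and the orthonormality of the $\psi$'s we have the reproducing property
$$\int \mathrm{d}y\,K_N(x, y)K_N(y, z) = K_N(x, z) \tag{4.23}$$
and the normalization
$$\int \mathrm{d}x\,K_N(x, x) = N. \tag{4.24}$$
Thus (4.22) equals
$$\begin{aligned}
&\det\begin{pmatrix} K_3(x_2,x_1) & K_3(x_2,x_2) \\ K_3(x_1,x_1) & K_3(x_1,x_2) \end{pmatrix} - \det\begin{pmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{pmatrix} + 3\det\begin{pmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{pmatrix}\\
&\quad= \det\begin{pmatrix} K_3(x_1,x_1) & K_3(x_1,x_2) \\ K_3(x_2,x_1) & K_3(x_2,x_2) \end{pmatrix},
\end{aligned} \tag{4.25}$$
where we used that the first determinant is the negative of the last one, since its rows are interchanged.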
It is easy to generalize this computation to get
$$\hat p_N^{(n)}(x_1, \dots, x_n) = \frac{(N-n)!}{N!}\det\big[K_N(x_i, x_j)\big]_{i,j=1}^n; \tag{4.26}$$
i.e., the correlation functions continue to have a determinantal structure. Here the constant is obtained by the normalization condition that $\hat p_N^{(n)}$ is a probability density. Thus, we have an explicit formula for the correlation functions in terms of the kernel $K_N$. We note that this structure is very general and is not restricted to Hermite polynomials; it only requires a system of orthogonal polynomials.
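The reproducing property (4.23), the normalization (4.24), and the normalization of $\hat p_N^{(2)}$ in (4.26) can be checked numerically. In the sketch below (plain Python; the helpers, $N = 6$, and the grid are illustrative choices), $K_N$ is tabulated on a midpoint grid and all integrals are replaced by Riemann sums. Since $\iint K_N(x, y)^2\,\mathrm{d}x\,\mathrm{d}y = N$, the total mass of $\det[K_N(x_i, x_j)]_{i,j=1}^2$ is $N^2 - N$, so (4.26) with $n = 2$ integrates to 1.

```python
from math import exp, pi, sqrt

def hermite_functions(n_max, x):
    """psi_0(x), ..., psi_{n_max}(x) via the three-term identity (4.29)."""
    psi = [exp(-x * x / 4) / (2 * pi) ** 0.25]
    if n_max >= 1:
        psi.append(x * psi[0])
    for j in range(1, n_max):
        psi.append((x * psi[j] - sqrt(j) * psi[j - 1]) / sqrt(j + 1))
    return psi

N = 6
half_width, steps = 12.0, 400
h = 2 * half_width / steps
grid = [-half_width + (i + 0.5) * h for i in range(steps)]          # midpoint grid
table = [hermite_functions(N - 1, x) for x in grid]
# kernel[a][b] = K_N(grid[a], grid[b]) from the definition (4.18)
kernel = [[sum(p * q for p, q in zip(ra, rb)) for rb in table] for ra in table]

trace = sum(kernel[a][a] for a in range(steps)) * h                         # (4.24): close to N
ia, ic = 150, 260
reproduced = sum(kernel[ia][b] * kernel[b][ic] for b in range(steps)) * h    # (4.23)
two_point_mass = (trace ** 2 - sum(v * v for row in kernel for v in row) * h * h) / (N * (N - 1))
print(trace, reproduced, kernel[ia][ic], two_point_mass)
```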
To understand the behavior of $K_N$, we first recall a basic algebraic property of the orthogonal polynomials, the Christoffel–Darboux formula:
$$K_N(x, y) = \sum_{j=0}^{N-1}\psi_j(x)\psi_j(y) = \sqrt{N}\,\frac{\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)}{x - y}. \tag{4.27}$$
Since the Hermite polynomials and the orthonormal functions $\psi_N$ differ only by an exponential factor (4.16), and these factors in $\psi(x)\psi(y)$ cancel on both sides of (4.27), (4.27) is just a property of the Hermite polynomials.
We now sketch a proof of this identity. Multiplying both sides by $(x - y)$, we need to prove that
$$\sum_{j=0}^{N-1}\psi_j(x)\psi_j(y)(x - y) = \sqrt{N}\big[\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)\big]. \tag{4.28}$$
The left side is a polynomial of degree $N$ in $x$ and of degree $N$ in $y$, up to common exponential factors. The multiplication by $x$ or $y$ can be computed by the following identity:
$$x\psi_j(x) = \sqrt{j+1}\,\psi_{j+1}(x) + \sqrt{j}\,\psi_{j-1}(x), \tag{4.29}$$
which directly follows from the "three-term" relation for the Hermite polynomials. Applying (4.29) to both $x\psi_j(x)$ and $y\psi_j(y)$, the $j$th term on the left side of (4.28) becomes $\sqrt{j+1}\,D_{j+1} - \sqrt{j}\,D_j$ with $D_j := \psi_j(x)\psi_{j-1}(y) - \psi_{j-1}(x)\psi_j(y)$, so the sum telescopes to $\sqrt{N}D_N$, which is the right-hand side of (4.28). Details can be found in Lemma 3.2.7 of [8].
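The Christoffel–Darboux formula (4.27) can be checked pointwise. The short sketch below (plain Python; helper names and sample points are hypothetical) compares the sum defining $K_N$ with the right-hand side of (4.27), computing the $\psi_j$ with (4.29).

```python
from math import exp, pi, sqrt

def hermite_functions(n_max, x):
    """psi_0(x), ..., psi_{n_max}(x) via the three-term identity (4.29)."""
    psi = [exp(-x * x / 4) / (2 * pi) ** 0.25]
    if n_max >= 1:
        psi.append(x * psi[0])
    for j in range(1, n_max):
        psi.append((x * psi[j] - sqrt(j) * psi[j - 1]) / sqrt(j + 1))
    return psi

def kernel_by_sum(n, x, y):
    """K_N(x, y) from the definition (4.18)."""
    return sum(a * b for a, b in zip(hermite_functions(n - 1, x), hermite_functions(n - 1, y)))

def kernel_by_christoffel_darboux(n, x, y):
    """Right-hand side of (4.27); requires x != y."""
    px, py = hermite_functions(n, x), hermite_functions(n, y)
    return sqrt(n) * (px[n] * py[n - 1] - py[n] * px[n - 1]) / (x - y)

n = 10
pairs = [(0.3, -1.2), (2.5, 0.4), (-3.1, 1.7)]
for x, y in pairs:
    print(x, y, kernel_by_sum(n, x, y), kernel_by_christoffel_darboux(n, x, y))
```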
4.2.1. Bulk Universality: the Sine-Kernel. It is well known that orthogonal polynomials of high degree have precise asymptotic behavior (Plancherel-Rotach asymptotics). For the Hermite orthonormal functions $\psi$ these formulas read as follows:
$$\psi_{2m}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\cos(\sqrt{N}x) + o(N^{-1/4}), \tag{4.30}$$
$$\psi_{2m+1}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\sin(\sqrt{N}x) + o(N^{-1/4}), \tag{4.31}$$
as $N \to \infty$ for any $m$ such that $|2m - N| \le C$. The approximation is uniform for $|x| \le CN^{-1/2}$. We can thus compute that
$$K_N(x, y) \approx \frac{1}{\pi}\,\frac{\sin(\sqrt{N}x)\cos(\sqrt{N}y) - \sin(\sqrt{N}y)\cos(\sqrt{N}x)}{x - y} = \frac{\sin\sqrt{N}(x - y)}{\pi(x - y)}; \tag{4.32}$$
i.e., the celebrated sine kernel emerged [47, 105].
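The sine-kernel approximation (4.32) is already accurate for moderate $N$. The sketch below (plain Python; $N = 400$ and the sample points are illustrative choices) evaluates $K_N$ from its definition (4.18) at points of order $N^{-1/2}$ and reports the error relative to the diagonal value $\sqrt{N}/\pi$.

```python
from math import exp, pi, sin, sqrt

def hermite_functions(n_max, x):
    """psi_0(x), ..., psi_{n_max}(x) via the three-term identity (4.29)."""
    psi = [exp(-x * x / 4) / (2 * pi) ** 0.25]
    if n_max >= 1:
        psi.append(x * psi[0])
    for j in range(1, n_max):
        psi.append((x * psi[j] - sqrt(j) * psi[j - 1]) / sqrt(j + 1))
    return psi

def kernel(n, x, y):
    """K_N(x, y) from the definition (4.18)."""
    return sum(a * b for a, b in zip(hermite_functions(n - 1, x), hermite_functions(n - 1, y)))

def sine_kernel_approx(n, x, y):
    """Right-hand side of (4.32)."""
    return sin(sqrt(n) * (x - y)) / (pi * (x - y))

n = 400
scale = sqrt(n)
pairs = [(0.5, -0.8), (1.2, 3.0), (-2.0, 1.0), (0.1, 0.4)]   # points in units of N^{-1/2}
errors = [abs(kernel(n, s / scale, t / scale) - sine_kernel_approx(n, s / scale, t / scale)) / (scale / pi)
          for s, t in pairs]
print(errors)
```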
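To rewrite this formula in a canonical form, recall that we have done the rescaling (4.13), where $\lambda$ is the original variable. The two-point function in the original variables was denoted by $p_N^{(2)}$ and that in the rescaled variables by $\hat p_N^{(2)}$. The relation between $p_N^{(2)}$ and $\hat p_N^{(2)}$ is determined by
$$p_N^{(2)}(\lambda_1, \lambda_2)\,\mathrm{d}\lambda_1\mathrm{d}\lambda_2 = \hat p_N^{(2)}(x_1, x_2)\,\mathrm{d}x_1\mathrm{d}x_2, \tag{4.33}$$
and thus we have
$$p_N^{(2)}(\lambda_1, \lambda_2) = N\,\hat p_N^{(2)}(\sqrt{N}\lambda_1, \sqrt{N}\lambda_2). \tag{4.34}$$
Now we introduce another rescaling of the eigenvalues that rescales the typical gap between them to order 1. We set
$$\lambda_j = \frac{a_j}{\varrho_{\mathrm{sc}}(0)N}, \qquad \varrho_{\mathrm{sc}}(0) = \frac{1}{\pi}, \tag{4.35}$$
where $\varrho_{\mathrm{sc}}$ is the semicircle density; see (3.1). Using the determinantal formula (4.26) for the correlation functions, we have, in terms of the original variables, that
$$\frac{1}{[\varrho_{\mathrm{sc}}(0)]^2}\,p_N^{(2)}\Big(\frac{a_1}{\varrho_{\mathrm{sc}}(0)N}, \frac{a_2}{\varrho_{\mathrm{sc}}(0)N}\Big) = \det\begin{pmatrix} \tilde K_N^{11} & \tilde K_N^{12} \\ \tilde K_N^{21} & \tilde K_N^{22} \end{pmatrix}, \tag{4.36}$$
where, for $i, j \in \{1, 2\}$,
$$\tilde K_N^{ij} := \frac{1}{\varrho_{\mathrm{sc}}(0)}\,\frac{1}{\sqrt{N-1}}\,K_N\Big(\frac{a_i}{\varrho_{\mathrm{sc}}(0)\sqrt{N}}, \frac{a_j}{\varrho_{\mathrm{sc}}(0)\sqrt{N}}\Big) \rightharpoonup S(a_i - a_j), \qquad S(x) := \frac{\sin \pi x}{\pi x}, \tag{4.37}$$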
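where we used (4.32). Due to the rescaling, this calculation reveals the correlation functions around $E = 0$. The general formula for any fixed energy $E$ in the bulk, i.e., $|E| < 2$, can be obtained similarly, and it is given by
$$\frac{1}{[\varrho_{\mathrm{sc}}(E)]^n}\,p_N^{(n)}\Big(E + \frac{\alpha_1}{N\varrho_{\mathrm{sc}}(E)}, E + \frac{\alpha_2}{N\varrho_{\mathrm{sc}}(E)}, \dots, E + \frac{\alpha_n}{N\varrho_{\mathrm{sc}}(E)}\Big) \rightharpoonup q_{\mathrm{GUE}}^{(n)}(\boldsymbol\alpha) := \det\big(S(\alpha_i - \alpha_j)\big)_{i,j=1}^n \tag{4.38}$$
as weak convergence of functions in the variables $\boldsymbol\alpha = (\alpha_1, \dots, \alpha_n)$. Note that the limit is universal in the sense that it is independent of the energy $E$. Formula (4.38) holds for the GUE case. The corresponding expression for GOE is more involved [8, 106]:
$$q_{\mathrm{GOE}}^{(n)}(\boldsymbol\alpha) := \det\big(K(\alpha_i - \alpha_j)\big)_{i,j=1}^n, \qquad K(x) := \begin{pmatrix} S(x) & S'(x) \\ -\frac{1}{2}\operatorname{sgn}(x) + \int_0^x S(t)\,\mathrm{d}t & S(x) \end{pmatrix}. \tag{4.39}$$
Here the determinant is understood as the trace of the quaternion determinant after the canonical correspondence between quaternions $a\cdot 1 + b\cdot\mathrm{i} + c\cdot\mathrm{j} + d\cdot\mathrm{k}$, $a, b, c, d \in \mathbb{C}$, and $2 \times 2$ complex matrices given by
$$1 \leftrightarrow \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \mathrm{i} \leftrightarrow \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad \mathrm{j} \leftrightarrow \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad \mathrm{k} \leftrightarrow \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}.$$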
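For $n = 2$ the determinant in (4.38) reduces to $q_{\mathrm{GUE}}^{(2)}(\alpha_1, \alpha_2) = S(0)^2 - S(\alpha_1 - \alpha_2)^2$, which vanishes quadratically as $\alpha_1 - \alpha_2 \to 0$ (level repulsion). The sketch below (plain Python; helper names are hypothetical) uses the normalization $S(x) = \sin(\pi x)/(\pi x)$, under which $S(0) = 1$ and the rescaled one-point density equals 1; near zero, $1 - S(\alpha)^2 \approx \frac{\pi^2}{3}\alpha^2$.

```python
from math import pi, sin

def sine_kernel(x):
    """S(x) = sin(pi x) / (pi x), with S(0) = 1."""
    return 1.0 if x == 0 else sin(pi * x) / (pi * x)

def q_gue_2(alpha1, alpha2):
    """Two-point function det(S(alpha_i - alpha_j))_{i,j=1,2} = S(0)^2 - S(alpha1 - alpha2)^2."""
    s = sine_kernel(alpha1 - alpha2)
    return sine_kernel(0.0) ** 2 - s * s

for gap in (0.0, 0.05, 0.5, 1.0, 2.5):
    print(gap, q_gue_2(0.0, gap))
```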
The main technical input is the refined asymptotic formulas (4.30) and (4.31) for orthogonal polynomials. In the case of the classical orthogonal polynomials (appearing in the standard Gaussian Wigner and Wishart ensembles), they are usually obtained by Laplace asymptotics from their integral representations. For a general potential $V$ the corresponding analysis is quite involved and depends on the regularity properties of $V$. One successful approach was initiated by Fokas, Its, and Kitaev [72] and by P. Deift and collaborators via the Riemann-Hilbert method; see [37] and references therein. An alternative method was presented in [99, 102] using more direct methods from orthogonal polynomials.

4.2.2. Edge Universality: the Airy Kernel. Near the spectral edges, i.e., for energy $E = \pm 2$, a different scaling has to be used. Recall the formula
$$K_N(x, y) = \sqrt{N}\,\frac{\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)}{x - y}$$
in the rescaled variables $x, y$. We will need the following identity for the derivatives of the Hermite functions:
$$\psi_N'(x) = -\frac{x}{2}\psi_N(x) + \sqrt{N}\,\psi_{N-1}(x). \tag{4.40}$$
Thus, we can rewrite
$$K_N(x, y) = \frac{\psi_N(x)\psi_N'(y) - \psi_N(y)\psi_N'(x)}{x - y} - \frac{1}{2}\psi_N(x)\psi_N(y). \tag{4.41}$$
Define a new, rescaled function
$$\Psi_N(u) := N^{1/12}\,\psi_N\Big(2\sqrt{N} + \frac{u}{N^{1/6}}\Big). \tag{4.42}$$
The Plancherel-Rotach edge asymptotics for $\Psi_N$ asserts that
$$\lim_{N\to\infty}|\Psi_N(z) - \mathrm{Ai}(z)| = 0 \tag{4.43}$$
uniformly in any compact domain in $\mathbb{C}$, where $\mathrm{Ai}(x)$ is the Airy function, i.e.,
$$\mathrm{Ai}(x) = \frac{1}{\pi}\int_0^\infty \cos\Big(\frac{1}{3}t^3 + xt\Big)\,\mathrm{d}t.$$
It is well known that the Airy function is the solution to the second-order differential equation $y'' - xy = 0$ with vanishing boundary condition at $x = \infty$.
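The edge asymptotics (4.43) can be illustrated numerically. The sketch below (plain Python; helper names and $N = 600$ are our choices) computes $\mathrm{Ai}$ from its Maclaurin series, which follows from $y'' = xy$ together with the standard values $\mathrm{Ai}(0) = 3^{-2/3}/\Gamma(2/3)$ and $\mathrm{Ai}'(0) = -3^{-1/3}/\Gamma(1/3)$, and compares it with $\Psi_N(u)$ from (4.42). Convergence is slow: centering at $2\sqrt{N}$ rather than at the exact turning point $\sqrt{4N+2}$ of $\psi_N$ shifts $u$ by about $\frac{1}{2}N^{-1/3}$, so only agreement to a few hundredths should be expected at this $N$.

```python
from math import exp, gamma, pi, sqrt

def airy_ai(x, terms=150):
    """Ai(x) from its Maclaurin series: y = sum a_n x^n in y'' = x y gives
    a_2 = 0 and a_{n+2} = a_{n-1} / ((n + 1)(n + 2)), with a_0 = Ai(0), a_1 = Ai'(0)."""
    a = [0.0] * (terms + 3)
    a[0] = 1 / (3 ** (2 / 3) * gamma(2 / 3))
    a[1] = -1 / (3 ** (1 / 3) * gamma(1 / 3))
    for n in range(1, terms + 1):
        a[n + 2] = a[n - 1] / ((n + 1) * (n + 2))
    return sum(c * x ** k for k, c in enumerate(a))

def hermite_function(n, x):
    """psi_n(x) via the three-term identity (4.29), started at psi_0 = e^{-x^2/4} / (2 pi)^{1/4}."""
    prev, cur = 0.0, exp(-x * x / 4) / (2 * pi) ** 0.25
    for j in range(n):
        prev, cur = cur, (x * cur - sqrt(j) * prev) / sqrt(j + 1)
    return cur

def rescaled_edge(n, u):
    """Psi_N(u) = N^{1/12} psi_N(2 sqrt(N) + u N^{-1/6}), cf. (4.42)."""
    return n ** (1 / 12) * hermite_function(n, 2 * sqrt(n) + u / n ** (1 / 6))

n = 600
for u in (-1.0, 0.0, 1.0):
    print(u, rescaled_edge(n, u), airy_ai(u))
```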
We now define the Airy kernel by
$$A(u, v) := \frac{\mathrm{Ai}(u)\mathrm{Ai}'(v) - \mathrm{Ai}'(u)\mathrm{Ai}(v)}{u - v}.$$
Under the edge scaling (4.42), we have
$$N^{-1/6}K_N\Big(2\sqrt{N} + \frac{u}{N^{1/6}}, 2\sqrt{N} + \frac{v}{N^{1/6}}\Big) \to A(u, v). \tag{4.44}$$

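In these variables, we have, by (4.34), that
$$p_N^{(2)}\Big(2 + \frac{\alpha_1}{N^{2/3}}, 2 + \frac{\alpha_2}{N^{2/3}}\Big) = N\,\hat p_N^{(2)}\Big(2\sqrt{N} + \frac{\alpha_1}{N^{1/6}}, 2\sqrt{N} + \frac{\alpha_2}{N^{1/6}}\Big).$$
By using (4.26) we can continue with
$$\begin{aligned}
p_N^{(2)}\Big(2 + \frac{\alpha_1}{N^{2/3}}, 2 + \frac{\alpha_2}{N^{2/3}}\Big) &= N\,\frac{1}{N(N-1)}\det\Big[K_N\Big(2\sqrt{N} + \frac{\alpha_i}{N^{1/6}}, 2\sqrt{N} + \frac{\alpha_j}{N^{1/6}}\Big)\Big]_{i,j=1}^2\\
&\asymp N^{-2/3}\det\Big[N^{-1/6}K_N\Big(2\sqrt{N} + \frac{\alpha_i}{N^{1/6}}, 2\sqrt{N} + \frac{\alpha_j}{N^{1/6}}\Big)\Big]_{i,j=1}^2,
\end{aligned}$$
and similar formulas hold for any $k$-point correlation function. Using the limiting statement (4.44), in terms of the original variables, we obtain
$$N^{k/3}p_N^{(k)}\Big(2 + \frac{\alpha_1}{N^{2/3}}, 2 + \frac{\alpha_2}{N^{2/3}}, \dots, 2 + \frac{\alpha_k}{N^{2/3}}\Big) \rightharpoonup \det\big(A(\alpha_i, \alpha_j)\big)_{i,j=1}^k \tag{4.45}$$
in a weak sense. In particular, the last formula with $k = 2$ implies, for any smooth test function $O$ with compact support, that
$$\begin{aligned}
\sum_{j \ne k}\mathbb{E}\,O\big(N^{2/3}(\lambda_j - 2), N^{2/3}(\lambda_k - 2)\big) &= N(N-1)N^{-4/3}\int_{\mathbb{R}^2}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\,O(\alpha_1, \alpha_2)\,p_N^{(2)}\Big(2 + \frac{\alpha_1}{N^{2/3}}, 2 + \frac{\alpha_2}{N^{2/3}}\Big)\\
&\to \int_{\mathbb{R}^2}\mathrm{d}\alpha_1\mathrm{d}\alpha_2\,O(\alpha_1, \alpha_2)\det\big(A(\alpha_i, \alpha_j)\big)_{i,j=1}^2.
\end{aligned} \tag{4.46}$$
A similar statement holds at the lower spectral edge, $E = -2$.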
CHAPTER 5

Universality for Generalized Wigner Matrices

5.1. Different Notions of Universality


The universality of eigenvalue statistics can be considered via several different notions of convergence, which yield somewhat different concepts of universality. The local statistics can either be expressed in terms of local correlation functions rescaled around some energy E, or one may ask for the gap statistics for a gap $\lambda_{j+1} - \lambda_j$ with a given (typically N-dependent) label j. These will be called fixed energy and fixed gap universality, respectively, and the two concepts do not coincide. To see this, notice that since eigenvalues fluctuate on a scale much larger than the typical eigenvalue spacing, the label j of the eigenvalue $\lambda_j$ closest to a fixed energy E is not a deterministic function of E. Furthermore, one may look for cumulative statistics of gaps averaged over a mesoscopic scale; i.e., both of the above concepts have a natural averaged version. We now define these four concepts precisely.
The correlation functions and the gaps need to be rescaled by the limiting local density ϱ(E) to get a universal limit. In the case of generalized Wigner matrices we have $\varrho(E) = \varrho_{sc}(E)$, but the definitions below hold in a more general setup as well. We also remark that all results in this book hold for both real symmetric and complex Hermitian matrices. For simplicity of notation, we formulate all concepts and later state all results in terms of real symmetric matrices.
We recall the notation $[\![A, B]\!] := \mathbb{Z} \cap [A, B]$ for the set of integers between two real numbers A < B.
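(i) Fixed energy universality (in the bulk): For any n ≥ 1, any smooth and compactly supported function $F : \mathbb{R}^n \to \mathbb{R}$, and any κ > 0, we have, uniformly in $E \in [-2 + \kappa, 2 - \kappa]$,

$$\lim_{N\to\infty}\frac{1}{\varrho(E)^n}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,F(\boldsymbol{\alpha})\,p_N^{(n)}\Big(E + \frac{\boldsymbol{\alpha}}{N\varrho(E)}\Big) = \int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,F(\boldsymbol{\alpha})\,q_{\mathrm{GOE}}^{(n)}(\boldsymbol{\alpha}), \tag{5.1}$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)$. Here $p_N^{(n)}$ is the n-point function of the matrix ensemble and $q_{\mathrm{GOE}}^{(n)}$ is the limiting n-point function of the GOE defined in (4.39). To shorten the argument of $p_N^{(n)}$, we used the convention that $E + \boldsymbol{\alpha} = (E + \alpha_1, \ldots, E + \alpha_n)$ for any $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n) \in \mathbb{R}^n$.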
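(ii) Averaged energy universality (in the bulk, on scale $N^{-1+\xi}$): For any n ≥ 1 and any smooth and compactly supported function $F : \mathbb{R}^n \to \mathbb{R}$, for some 0 < ξ < 1 and for any κ > 0, we have, uniformly in $E \in [-2 + \kappa, 2 - \kappa]$,

$$\lim_{N\to\infty}\frac{1}{\varrho(E)^n}\int_{E-b}^{E+b}\frac{\mathrm{d}x}{2b}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,F(\boldsymbol{\alpha})\,p_N^{(n)}\Big(x + \frac{\boldsymbol{\alpha}}{N\varrho(E)}\Big) = \int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,F(\boldsymbol{\alpha})\,q_{\mathrm{GOE}}^{(n)}(\boldsymbol{\alpha}), \tag{5.2}$$

where $b = b_N := N^{-1+\xi}$.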
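(iii) Fixed gap universality (in the bulk): Fix any positive number 0 < α < 1 and an integer n. For any smooth compactly supported function $G : \mathbb{R}^n \to \mathbb{R}$ and for any $k, m \in [\![\alpha N, (1 - \alpha)N]\!]$ we have

$$\lim_{N\to\infty}\Big|\mathbb{E}^{\mu_N} G\big((N\varrho(\lambda_k))(\lambda_k - \lambda_{k+1}), \ldots, (N\varrho(\lambda_k))(\lambda_k - \lambda_{k+n})\big) - \mathbb{E}^{(\mathrm{GOE})} G\big((N\varrho(\lambda_m))(\lambda_m - \lambda_{m+1}), \ldots, (N\varrho(\lambda_m))(\lambda_m - \lambda_{m+n})\big)\Big| = 0, \tag{5.3}$$

where $\mu_N$ denotes the law of the random matrix ensemble under consideration.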
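(iv) Averaged gap universality (in the bulk, on scale $N^{-1+\xi}$): Using the same notation as in (iii) and $\ell = N^{\xi}$ with 0 < ξ < 1, we have

$$\lim_{N\to\infty}\Big|\frac{1}{2\ell + 1}\sum_{j=k-\ell}^{k+\ell}\mathbb{E}^{\mu_N} G\big((N\varrho(\lambda_j))(\lambda_j - \lambda_{j+1}), \ldots, (N\varrho(\lambda_j))(\lambda_j - \lambda_{j+n})\big) - \mathbb{E}^{(\mathrm{GOE})} G\big((N\varrho(\lambda_m))(\lambda_m - \lambda_{m+1}), \ldots, (N\varrho(\lambda_m))(\lambda_m - \lambda_{m+n})\big)\Big| = 0. \tag{5.4}$$

Note that in the bulk $\ell = N^{\xi}$ consecutive eigenvalues range over a scale $N^{-1+\xi}$ in the spectrum, hence the name.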
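The objects in (5.4) are easy to produce numerically. The sketch below computes the rescaled gaps $N\varrho_{sc}(\lambda_j)(\lambda_{j+1} - \lambda_j)$ for GOE samples, pooled over a window of 2ℓ + 1 labels around the middle of the spectrum; for simplicity it uses the positive gaps $\lambda_{j+1} - \lambda_j$ (the opposite sign to (5.3)–(5.4), which only amounts to reflecting G), and it pools the gaps instead of averaging a particular test function. All function names are hypothetical.

```python
import numpy as np


def rho_sc(x):
    """Semicircle density on [-2, 2]."""
    return np.sqrt(np.clip(4.0 - x ** 2, 0.0, None)) / (2.0 * np.pi)


def sample_goe(n, rng):
    """GOE matrix with E h_ij^2 = 1/n off the diagonal; the spectrum fills [-2, 2]."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2.0 * n)


def pooled_rescaled_gaps(n, k, ell, samples, seed=0):
    """Rescaled gaps N rho_sc(lambda_j)(lambda_{j+1} - lambda_j), pooled over the window
    j in [k - ell, k + ell] (0-based labels) and over independent GOE samples."""
    rng = np.random.default_rng(seed)
    j = np.arange(k - ell, k + ell + 1)
    gaps = []
    for _ in range(samples):
        lam = np.linalg.eigvalsh(sample_goe(n, rng))  # ascending order
        gaps.append(n * rho_sc(lam[j]) * (lam[j + 1] - lam[j]))
    return np.concatenate(gaps)
```

After rescaling by the local density the gaps have mean close to 1. Replacing `sample_goe` with a sampler for another Wigner ensemble and comparing histograms of the pooled gaps gives a crude empirical look at averaged gap universality with n = 1.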
Although we have formulated the universality notions in terms of large-N limits, all limit statements in this book have effective error bounds of the form $N^{-c}$ for some c > 0. For the four notions of universality stated here, fixed energy (resp. fixed gap) universality obviously implies averaged energy (resp. averaged gap) universality. However, fixed gap universality and fixed energy universality are not logical consequences of each other. On the other hand, under suitable conditions, averaged energy universality and averaged gap universality are equivalent. In Section 14 we will prove that the former implies the latter. This is the direction that we actually use in the proof. The opposite direction goes along similar arguments and will be omitted.
In this book, we will focus on establishing averaged energy universality. From this one, averaged gap universality follows by the equivalence just mentioned. We now state precisely our representative universality theorem that will be proven in this book. It asserts averaged energy universality (in the bulk) for generalized Wigner matrices where the scale of the energy average is reduced to $N^{-1+\xi}$ for arbitrary ξ > 0.

For convenience, we assume that the normalized matrix entries

$$\zeta_{ij} := \sqrt{N}\,h_{ij} \tag{5.5}$$

have a polynomial decay of arbitrarily high degree; i.e., for all p ∈ ℕ there is a constant $\mu_p$ such that

$$\mathbb{E}|\zeta_{ij}|^p \le \mu_p \tag{5.6}$$
for all 𝑁, 𝑖, and 𝑗. Recall from (2.6) that 𝑠𝑖𝑗 ≍ 𝑁 −1 , so this condition is equivalent
to (2.8). We make this assumption in order to streamline notation, but in fact,
our results hold, with the same proof, provided (5.6) is valid for some large but
fixed 𝑝. In fact, even 𝑝 = 4 + 𝜖 is sufficient; see Chapter 18. On the other hand,
if we strengthen the decay condition to uniform subexponential decay (2.7) then
certain estimates concerning the local semicircle law (e.g., Theorem 6.7) become
stronger, although we will not prove them in this book (see [70]).
Theorem 5.1. Let H be an N × N generalized, real symmetric Wigner matrix. Suppose that the rescaled matrix elements $\sqrt{N}h_{ij}$ satisfy the decay condition (5.6). Then averaged energy universality holds in the sense of (5.2) on scale $N^{-1+\xi}$ for any 0 < ξ < 1.
The rest of this book is devoted to a proof of Theorem 5.1 and related ques-
tions on eigenvectors. The proof of this theorem will be based on the following
three-step strategy.

5.2. The Three-Step Strategy


Step 1. Local semicircle law and delocalization of eigenvectors. The local
semicircle law states that the density of eigenvalues is given by the semicircle
law not only as a weak limit on macroscopic scales (3.1), but also in a high-
probability sense with an effective convergence speed and down to short scales
containing only 𝑁 𝜉 eigenvalues for all 𝜉 > 0. This will imply the rigidity of
eigenvalues; i.e., the eigenvalues are near their classical locations in the sense to
be made clear in Theorem 11.5. We also obtain precise estimates on the matrix
elements of the Green function which, in particular, imply complete delocal-
ization of eigenvectors.
Step 2. Universality for Gaussian divisible ensembles: The Gaussian divisible ensembles are matrices of the form $H_t = e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,H^{G}$, where t > 0 is a parameter, $H_0$ is a (generalized) Wigner matrix, and $H^{G}$ is an independent GOE matrix. The parametrization of $H_t$ is chosen so that $H_t$ can be obtained by an Ornstein-Uhlenbeck process starting from $H_0$. More precisely, consider the following matrix Ornstein-Uhlenbeck process

$$\mathrm{d}H_t = \frac{1}{\sqrt{N}}\,\mathrm{d}B_t - \frac{1}{2}H_t\,\mathrm{d}t \tag{5.7}$$

with initial data $H_0$, where $B_t = \{b_{ij,t}\}_{i,j=1}^{N}$ is a symmetric N × N matrix such that its matrix elements $b_{ij,t}$ for i < j and $b_{ii,t}/\sqrt{2}$ are independent, standard Brownian motions starting from zero. Then $H_t$ and $e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,H^{G}$ have the same distribution.
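Since the entries of $B_t$ are independent, (5.7) decouples into independent scalar Ornstein-Uhlenbeck processes, so the distributional identity can be checked one entry at a time. The sketch below simulates an off-diagonal entry $\mathrm{d}h = \mathrm{d}b/\sqrt{N} - \tfrac12 h\,\mathrm{d}t$ with an Euler-Maruyama scheme, starting from a hypothetical ±1/√N (Rademacher) Wigner entry, and compares its second and fourth moments with those of $e^{-t/2}h_0 + \sqrt{1 - e^{-t}}\,g$, $g \sim \mathcal{N}(0, 1/N)$. Function names and parameters are illustrative.

```python
import numpy as np


def simulate_ou_entry(h0, n, t, steps, rng):
    """Euler-Maruyama for one off-diagonal entry of (5.7): dh = db / sqrt(n) - (h / 2) dt.

    h0 holds independent starting values, one per simulated path.
    """
    dt = t / steps
    h = np.array(h0, dtype=float)
    for _ in range(steps):
        h += np.sqrt(dt / n) * rng.standard_normal(h.shape) - 0.5 * h * dt
    return h


def gaussian_divisible_entry(h0, n, t, rng):
    """Sample e^{-t/2} h0 + sqrt(1 - e^{-t}) g with g ~ N(0, 1/n), the GOE off-diagonal law."""
    h0 = np.asarray(h0, dtype=float)
    g = rng.standard_normal(h0.shape) / np.sqrt(n)
    return np.exp(-t / 2.0) * h0 + np.sqrt(1.0 - np.exp(-t)) * g
```

With $\mathbb{E}h_0^2 = 1/N$ both constructions keep the variance equal to $1/N$ for all t, while the fourth moment moves from the Rademacher value $1/N^2$ toward the Gaussian value $3/N^2$.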
The aim of Step 2 is to prove the bulk universality of $H_t$ for $t = N^{-\tau}$ for the entire range of 0 < τ < 1. This is connected to the local ergodicity of the Dyson Brownian motion, which we now define.
Definition 5.2. Given a real parameter β ≥ 1, consider the following system of stochastic differential equations (SDE):

$$\mathrm{d}\lambda_i = \sqrt{\frac{2}{\beta N}}\,\mathrm{d}B_i + \Big(-\frac{\lambda_i}{2} + \frac{1}{N}\sum_{j \ne i}\frac{1}{\lambda_i - \lambda_j}\Big)\mathrm{d}t, \qquad i \in [\![1, N]\!], \tag{5.8}$$

where $(B_i)$ is a collection of real-valued, independent, standard Brownian motions. The solution of this SDE is called the Dyson Brownian motion (DBM).
In a seminal paper [45], Dyson observed that the eigenvalue flow of the
matrix OU process is exactly the DBM with 𝛽 = 1, 2 corresponding to real sym-
metric or complex Hermitian ensembles. Furthermore, the invariant measure
of the DBM is given by the Gaussian 𝛽-ensemble defined in (4.4). Dyson further
conjectured that the time to “local equilibrium” for DBM is of order 1/𝑁, while
the time to global equilibrium is of order one. It should be noted that there is no standard notion of “local equilibrium”; we will instead take a practical point of view and interpret Dyson’s conjecture as the statement that the local statistics of the DBM at any time $t \gg N^{-1}$ satisfy the universality defined earlier in this section. In other words, Dyson’s conjecture is exactly that the local statistics of $H_t$ are universal for $t = N^{-\tau}$ for any 0 < τ < 1.
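The global relaxation of the DBM on time scales of order one can be observed in a short simulation. The sketch below is a plain Euler-Maruyama discretisation of (5.8), with no special treatment of near-collisions (a small time step is assumed instead). As a check it tracks the empirical second moment: by Itô's formula the repulsion terms contribute exactly N − 1 to the drift of $\sum_i \lambda_i^2$, so $\mathrm{d}\sum_i\lambda_i^2 = (N - 1 + 2/\beta - \sum_i\lambda_i^2)\,\mathrm{d}t + (\text{martingale})$, and the equilibrium value of $\frac1N\sum_i\lambda_i^2$ is $(N - 1 + 2/\beta)/N$, which equals 1 for β = 2. The function name and parameters are hypothetical.

```python
import numpy as np


def dyson_bm(lam0, beta, t_final, dt, seed=0):
    """Euler-Maruyama discretisation of the DBM (5.8).

    Returns the final configuration (sorted) and the empirical second moment
    (1/N) sum_i lambda_i^2 recorded after every step.
    """
    rng = np.random.default_rng(seed)
    lam = np.array(lam0, dtype=float)
    n = lam.size
    steps = int(round(t_final / dt))
    noise = np.sqrt(2.0 * dt / (beta * n))
    second_moment = np.empty(steps)
    for s in range(steps):
        diff = lam[:, None] - lam[None, :]
        np.fill_diagonal(diff, np.inf)  # drop the j == i term, since 1/inf == 0
        drift = -lam / 2.0 + (1.0 / diff).sum(axis=1) / n
        lam = lam + drift * dt + noise * rng.standard_normal(n)
        second_moment[s] = np.mean(lam ** 2)
    return np.sort(lam), second_moment
```

For instance, starting from 20 equally spaced points in [−1, 1] (second moment about 0.37) with β = 2, the second moment relaxes to about 1 within a few units of time, in line with the order-one global equilibration time.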
Step 3. Approximation by a Gaussian divisible ensemble. This is a simple
density argument in the space of matrix ensembles which shows that for any
probability distribution of the matrix elements there exists a Gaussian divisible
distribution with a small Gaussian component, as in Step 2, such that the two
associated Wigner ensembles have asymptotically identical local eigenvalue sta-
tistics. The general result to compare any two matrix ensembles with matching
moments will be given in Theorem 16.1. Alternatively, to follow the evolution
of the Green function under the OU flow, we can use the following continuity
of the matrix OU process.
Step 3a. Continuity of eigenvalues under the matrix OU process. In Theorem
15.2 we will show that the changes of the local statistics in the bulk under the
flow (5.7) up to time scales 𝑡 ≪ 𝑁 −1/2 are negligible; see Lemma 15.4. This
clearly can be used in combination with Step 2 to complete the proof of Theorem
5.1.
The three-step strategy outlined here is very general, and it has been applied
to many different models as we will explain in Chapter 18. It can also be extended
to the edges of the spectrum, yielding the universality at the spectral edges. This
will be reviewed in Chapter 17.
CHAPTER 6

Local Semicircle Law for Universal Wigner Matrices

6.1. Setup
We recall the definition of the universal Wigner matrices (Definition 2.1);
in particular, the matrix elements may have different distributions but indepen-
dence (up to symmetry) is always assumed. The fundamental data of this model
is the N × N matrix of variances $S = (s_{ij})$, where

$$s_{ij} := \mathbb{E}|h_{ij}|^2,$$

and we assume that S is (doubly) stochastic:

$$\sum_j s_{ij} = 1 \tag{6.1}$$

for all i. We will assume the polynomial decay analogous to (5.6)

$$\mathbb{E}|h_{ij}|^p \le \mu_p\,[s_{ij}]^{p/2}, \tag{6.2}$$

where $\mu_p$ depends only on p and is uniform in i, j, and N.
We introduce the parameter $M := [\max_{i,j} s_{ij}]^{-1}$ that expresses the maximal size of $s_{ij}$:

$$s_{ij} \le M^{-1} \tag{6.3}$$

for all i and j. We regard N as the fundamental parameter and $M = M_N$ as a function of N. In this section we do not assume a lower bound on $s_{ij}$. Notice that for generalized Wigner matrices M is comparable with N and one may use N instead of M in all error bounds in this section. However, we wish to keep M as a separate parameter and assume only that

$$N^{\delta} \le M \le N \tag{6.4}$$

for some fixed δ > 0. For standard Wigner matrices, the $h_{ij}$ are identically distributed, hence $s_{ij} = \frac{1}{N}$ and M = N. Another motivating example where M may substantially differ from N is given by random band matrices, which play a key role in interpolating between Wigner matrices and random Schrödinger operators (see Section 18.7).
Example 6.1. Random band matrices are characterized by translation invariant variances of the form

$$s_{ij} = \frac{1}{W}\,f\Big(\frac{|i - j|_N}{W}\Big), \tag{6.5}$$

where f is a smooth, symmetric probability density on ℝ, W is a large parameter, called the band width, and $|i - j|_N$ denotes the periodic distance on the discrete torus 𝕋 of length N. In this case, M is comparable with W, which is typically much smaller than N. The generalization to higher spatial dimensions is straightforward: in this case the rows and columns of H are labeled by a discrete d-dimensional torus $\mathbb{T}_L^d$ of length L with $N = L^d$.
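A small sketch of the one-dimensional band profile (6.5) shows why M is comparable with W. The Gaussian density used for f is a hypothetical choice (any smooth symmetric density would do), and since (6.5) only makes the row sums approximately 1, the sketch rescales the rows so that (6.1) holds exactly; because the matrix is circulant, all row sums agree and the rescaling keeps it symmetric.

```python
import numpy as np


def gaussian_profile(x):
    """Hypothetical choice of f in (6.5): the standard Gaussian density."""
    return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)


def band_variance_profile(n, w, f=gaussian_profile):
    """Variance matrix s_ij = f(|i - j|_N / W) / W on the discrete torus of length n."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)  # periodic distance |i - j|_N
    s = f(d / w) / w
    # Circulant matrix: all row sums are equal, so this keeps s symmetric and enforces (6.1).
    return s / s.sum(axis=1, keepdims=True)
```

Here $M = 1/\max_{ij} s_{ij} \approx W/f(0) = \sqrt{2\pi}\,W$, independent of the system size n, while the constant vector is an eigenvector of S with eigenvalue 1.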
Recall from (3.1) that the global eigenvalue density of H in the N → ∞ limit follows the celebrated Wigner semicircle law,

$$\varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4 - x^2)_+}, \tag{6.6}$$

and its Stieltjes transform with the spectral parameter z = E + iη is defined by

$$m_{sc}(z) := \int_{\mathbb{R}}\frac{\varrho_{sc}(x)}{x - z}\,\mathrm{d}x. \tag{6.7}$$

The spectral parameter z will sometimes be omitted from the notation. The two endpoints ±2 of the support of $\varrho_{sc}$ are called the spectral edges.
It is well-known (see Section 3.3) that the Stieltjes transform $m_{sc}$ is the unique solution of the quadratic equation

$$m(z) + \frac{1}{m(z)} + z = 0 \tag{6.8}$$

with Im m(z) > 0 for Im z > 0. Thus we have

$$m_{sc}(z) = \frac{-z + \sqrt{z^2 - 4}}{2}, \tag{6.9}$$

where the square root is chosen so that $\operatorname{Im} m_{sc}(z) > 0$ for Im z > 0. In particular, $\sqrt{z^2 - 4} \approx z$ for large |z|, so that it cancels the −z term in the above formula. Under this convention, we collect basic bounds on $m_{sc}$ in the following lemma.
Lemma 6.2. For all z = E + iη with η > 0 we have

$$|m_{sc}(z)| = |m_{sc}(z) + z|^{-1} \le 1. \tag{6.10}$$

Furthermore, there is a constant c > 0 such that for E ∈ [−10, 10] and η ∈ (0, 10] we have

$$c \le |m_{sc}(z)| \le 1 - c\eta, \tag{6.11}$$

$$|1 - m_{sc}^2(z)| \asymp \sqrt{\kappa + \eta}, \tag{6.12}$$

as well as

$$\operatorname{Im} m_{sc}(z) \asymp \begin{cases} \sqrt{\kappa + \eta} & \text{if } |E| \le 2, \\ \dfrac{\eta}{\sqrt{\kappa + \eta}} & \text{if } |E| \ge 2, \end{cases} \tag{6.13}$$

where κ := ||E| − 2| denotes the distance of E to the spectral edges.


Proof. The proof is an elementary exercise using (6.9). □
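The branch convention in (6.9) is easy to get wrong numerically. A convenient way to implement it, used in the sketch below, is to write $\sqrt{z^2 - 4} = \sqrt{z - 2}\,\sqrt{z + 2}$ with principal square roots: this product behaves like z at infinity and has its cut on [−2, 2], so it selects the branch with $\operatorname{Im} m_{sc} > 0$ in the upper half-plane. The sketch also compares the closed form with a direct midpoint-rule evaluation of (6.7); the function names are illustrative.

```python
import cmath
import math


def m_sc(z):
    """Stieltjes transform (6.9) of the semicircle law, branch with Im m_sc > 0 for Im z > 0."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def m_sc_quadrature(z, points=200000):
    """Midpoint-rule approximation of (6.7): integral of rho_sc(x) / (x - z) over [-2, 2]."""
    h = 4.0 / points
    total = 0j
    for k in range(points):
        x = -2.0 + (k + 0.5) * h
        total += math.sqrt(4.0 - x * x) / (2.0 * math.pi) / (x - z)
    return total * h
```

For example, at z = 0.5 + i, z = −1.5 + 0.01i and z = 3 + 0.2i the closed form solves (6.8) to machine precision, has positive imaginary part, and satisfies $|m_{sc}| = |m_{sc} + z|^{-1} \le 1$ as in (6.10).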

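We define the Green function or the resolvent of H through

$$G(z) := (H - z)^{-1},$$

and denote its entries by $G_{ij}(z)$. We recall from (3.9) that the Stieltjes transform of the empirical spectral measure

$$\varrho(\mathrm{d}x) = \varrho_N(\mathrm{d}x) := \frac{1}{N}\sum_{\alpha}\delta(\lambda_\alpha - x)\,\mathrm{d}x$$

for the eigenvalues $\lambda_1 \le \cdots \le \lambda_N$ of H is

$$m(z) = m_N(z) := \int\frac{\varrho_N(\mathrm{d}x)}{x - z} = \frac{1}{N}\operatorname{Tr} G(z) = \frac{1}{N}\sum_{\alpha}\frac{1}{\lambda_\alpha - z}. \tag{6.14}$$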

We remark that all quantities related to the random matrix H, such as the eigenvalues, the Green function, the empirical density of states, and its Stieltjes transform, depend on N, but this dependence will often be omitted in the notation for brevity. In some formulas, especially in statements of the main results, we will put back the N-dependence to stress its presence.
Since

$$\frac{1}{\pi}\operatorname{Im}\frac{1}{\lambda_\alpha - z} = \theta_\eta(E - \lambda_\alpha) \quad\text{with}\quad \theta_\eta(x) := \frac{1}{\pi}\,\frac{\eta}{x^2 + \eta^2}$$

is an approximation to the identity (i.e., delta function) at the scale η = Im z, we have $\pi^{-1}\operatorname{Im} m_N(z) = (\varrho_N * \theta_\eta)(E)$; i.e., the imaginary part of $m_N(z)$ is the density of the eigenvalues “at the scale η.” Thus, the convergence of the Stieltjes transform $m_N(z)$ to $m_{sc}(z)$ as N → ∞ will show that the empirical local density of the eigenvalues around the energy E in a window of size η converges to the semicircle law $\varrho_{sc}(E)$. Therefore, the key task is to control $m_N(z)$ for small η.
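The identity $\pi^{-1}\operatorname{Im} m_N(z) = (\varrho_N * \theta_\eta)(E)$ and its interpretation as a density at scale η can be checked on a sampled matrix. The sketch below uses a single GOE sample (a hypothetical choice of ensemble) and compares the Poisson-smoothed empirical density with $\pi^{-1}\operatorname{Im} m_{sc}(z)$, the semicircle density smoothed at the same scale.

```python
import cmath
import numpy as np


def m_sc(z):
    """Semicircle Stieltjes transform (6.9), branch with Im m_sc > 0 for Im z > 0."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def goe_eigenvalues(n, seed=0):
    """Eigenvalues of a GOE matrix normalised so that the spectrum fills [-2, 2]."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    return np.linalg.eigvalsh((a + a.T) / np.sqrt(2.0 * n))


def m_empirical(eigs, z):
    """m_N(z) = (1/N) sum_alpha 1 / (lambda_alpha - z), cf. (6.14)."""
    return np.mean(1.0 / (eigs - z))


def smoothed_density(eigs, energy, eta):
    """(rho_N * theta_eta)(E) with the Poisson kernel theta_eta(x) = eta / (pi (x^2 + eta^2))."""
    return np.mean(eta / ((energy - eigs) ** 2 + eta ** 2)) / np.pi
```

With N = 600, E = 0.5 and η = 0.1 the two sides of the identity agree to rounding error, and the smoothed empirical density is within a few hundredths of $\pi^{-1}\operatorname{Im} m_{sc}(E + \mathrm{i}\eta)$; decreasing η toward $N^{-1}$ makes the empirical side visibly noisier.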

6.2. Spectral Information on 𝑺


We will show that the diagonal matrix elements of the resolvent $G_{ii}(z)$ satisfy a system of self-consistent vector equations of the form

$$-\frac{1}{G_{ii}(z)} \approx z + \sum_{j=1}^{N} s_{ij}\,G_{jj}(z) \tag{6.15}$$

with very high probability. This equation will be viewed as a small perturbation of the deterministic equation

$$-\frac{1}{m_i(z)} = z + \sum_{j=1}^{N} s_{ij}\,m_j(z), \tag{6.16}$$

which, under the side condition that $\operatorname{Im} m_i > 0$, has a unique solution, namely $m_i(z) = m_{sc}(z)$ for every i (see (6.8)). Here the stochasticity condition (6.1) is essential. For the stability analysis of (6.16) the invertibility of the operator $1 - m_{sc}^2(z)S$ plays a key role.

Therefore an important parameter of the model is

$$\Gamma(z) := \Big\|\frac{1}{1 - m_{sc}^2(z)S}\Big\|_{\infty\to\infty}, \qquad \operatorname{Im} z > 0. \tag{6.17}$$

Note that S, being a stochastic matrix, satisfies −1 ≤ S ≤ 1, and 1 is an eigenvalue with eigenvector $\mathbf{e} = N^{-1/2}(1, \ldots, 1)$, $S\mathbf{e} = \mathbf{e}$. For convenience, we assume that 1 is a simple eigenvalue of S (which holds if S is irreducible and aperiodic). Another important parameter is

$$\widetilde\Gamma(z) := \Big\|\frac{1}{1 - m_{sc}^2(z)S}\Big|_{\mathbf{e}^\perp}\Big\|_{\infty\to\infty}, \tag{6.18}$$

i.e., the norm of $(1 - m_{sc}^2 S)^{-1}$ restricted to the subspace orthogonal to the constants. Recalling that −1 ≤ S ≤ 1 and using the upper bound $|m_{sc}| \le 1$ from (6.11), we find that there is a constant c > 0 such that

$$c \le \widetilde\Gamma \le \Gamma. \tag{6.19}$$

Since $|m_{sc}(z)|^2 \le 1 - c\eta$, we can expand $(1 - m_{sc}^2 S)^{-1}$ into a geometric series. Using that $\|S\|_{\ell^\infty\to\ell^\infty} \le 1$ from (6.1), we obtain the trivial upper bound

$$\Gamma \le C\eta^{-1}. \tag{6.20}$$
We remark that we also have the following easy lower bound on Γ:

$$\Gamma \ge |1 - m_{sc}^2|^{-1} \ge \frac{1}{2}. \tag{6.21}$$

This can be seen by applying the matrix $(1 - m_{sc}^2 S)^{-1}$ to the constant vector and using the definition of Γ.


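For standard Wigner matrices, $s_{ij} = \frac{1}{N}$, and for bounded energies |E| ≤ C, we easily obtain that

$$\Gamma(z) \asymp \frac{1}{|1 - m_{sc}^2(z)|} \asymp \frac{1}{\sqrt{\kappa_E + \eta}}, \qquad \widetilde\Gamma(z) = 1, \tag{6.22}$$

where

$$\kappa_E := \min\{|E - 2|, |E + 2|\} \tag{6.23}$$

is the distance of E = Re z from the spectral edges. To see this, we note that in this case $S = P_{\mathbf{e}}$, the projection operator onto the direction e. Hence the operator $[1 - m_{sc}^2(z)S]^{-1} = 1 + \frac{m_{sc}^2}{1 - m_{sc}^2}P_{\mathbf{e}}$ can be computed explicitly: it acts as the identity on $\mathbf{e}^\perp$, and its $\ell^\infty\to\ell^\infty$ norm lies between $|1 - m_{sc}^2|^{-1}$ and $1 + |m_{sc}|^2/|1 - m_{sc}^2| \le 3|1 - m_{sc}^2|^{-1}$. The comparison relation in (6.22) follows from (6.12). In the general case, we have the same relations up to a constant factor: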
Lemma 6.3. For generalized Wigner matrices (Definition 2.1) and a bounded range of energies, say for any |E| ≤ 10, we have

$$\frac{c}{|1 - m_{sc}^2(z)|} \le \Gamma(z) \le \frac{C}{|1 - m_{sc}^2(z)|} \asymp \frac{C}{\sqrt{\kappa_E + \eta}}, \qquad c \le \widetilde\Gamma(z) \le C, \tag{6.24}$$

where the constant C depends on $C_{\inf}$ and $C_{\sup}$ in (2.6) and the constant c in the lower bound is universal.
The proof will be given in Section 6.5.
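In the simplest case $S = P_{\mathbf{e}}$ (all $s_{ij} = 1/N$) the two-sided bound (6.24) can be observed directly. The sketch computes Γ(z) from its definition (6.17), i.e., as the maximal absolute row sum of $(1 - m_{sc}^2 S)^{-1}$, and checks that $\Gamma(z)\,|1 - m_{sc}^2(z)|$ stays between 1 and 3, so that $\Gamma \asymp |1 - m_{sc}^2|^{-1}$ blows up near the spectral edges like $(\kappa_E + \eta)^{-1/2}$. The function name is illustrative.

```python
import cmath
import numpy as np


def m_sc(z):
    """Semicircle Stieltjes transform (6.9), branch with Im m_sc > 0 for Im z > 0."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def gamma_param(s, z):
    """Gamma(z) from (6.17): the l^inf -> l^inf norm (maximal absolute row sum)
    of (1 - m_sc(z)^2 S)^{-1}."""
    inv = np.linalg.inv(np.eye(s.shape[0]) - m_sc(z) ** 2 * s)
    return np.abs(inv).sum(axis=1).max()
```

The lower bound 1 comes from applying the inverse to the constant vector, as in (6.21); the upper bound 3 follows from the explicit formula $(1 - m_{sc}^2 P_{\mathbf{e}})^{-1} = 1 + \frac{m_{sc}^2}{1 - m_{sc}^2}P_{\mathbf{e}}$ and $|m_{sc}| \le 1$.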

6.3. Stochastic Domination


The following definition introduces a notion of a high-probability bound that is suited for our purposes. It first appeared in [55] and it relieves us from the burden of keeping track of exceptional sets of small probability where some bound does not hold.
Definition 6.4 (Stochastic domination). Let

$$X = \big(X^{(N)}(u) : N \in \mathbb{N},\ u \in U^{(N)}\big), \qquad Y = \big(Y^{(N)}(u) : N \in \mathbb{N},\ u \in U^{(N)}\big)$$

be two families of nonnegative random variables, where $U^{(N)}$ is a possibly N-dependent parameter set. We say that X is stochastically dominated by Y, uniformly in u, if for all (small) ε > 0 and (large) D > 0 we have

$$\sup_{u \in U^{(N)}}\mathbb{P}\big[X^{(N)}(u) > N^{\varepsilon}\,Y^{(N)}(u)\big] \le N^{-D}$$

for large enough $N \ge N_0(\varepsilon, D)$. Unless stated otherwise, throughout this book the stochastic domination will always be uniform in all parameters apart from the parameter δ in (6.4) and the sequence of constants $\mu_p$ in (5.6). Thus, $N_0(\varepsilon, D)$ also depends on δ and $\mu_p$. If X is stochastically dominated by Y, uniformly in u, we use the notation X ≺ Y. Moreover, if for some complex family X we have |X| ≺ Y, we also write $X = O_\prec(Y)$.
The following proposition collects some basic properties of the stochastic
domination. The proofs are left as an exercise.
Proposition 6.5. The relation ≺ satisfies the following properties:
(i) ≺ is transitive: 𝑋 ≺ 𝑌 and 𝑌 ≺ 𝑍 imply 𝑋 ≺ 𝑍.
(ii) ≺ satisfies the familiar arithmetic rules of order relations; i.e., if 𝑋1 ≺ 𝑌1
and 𝑋2 ≺ 𝑌2 , then 𝑋1 + 𝑋2 ≺ 𝑌1 + 𝑌2 and 𝑋1 𝑋2 ≺ 𝑌1 𝑌2 .
(iii) Moreover, the following cancellation property holds:

$$\text{if } X \prec Y + N^{-\varepsilon}X \text{ for some } \varepsilon > 0, \text{ then } X \prec Y. \tag{6.25}$$

(iv) Furthermore, if X ≺ Y, $\mathbb{E}Y \ge N^{-C}$, and $|X| \le N^{C}$ almost surely with some fixed exponent C, then for any ε > 0 and sufficiently large $N \ge N_0(\varepsilon)$ we have

$$\mathbb{E}X \le N^{\varepsilon}\,\mathbb{E}Y. \tag{6.26}$$
Later in Lemma 10.1 the relation (6.26) will be extended to partial expectations.
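For property (iv), which is left as an exercise, one possible argument splits the expectation according to whether the domination event holds; X and Y are nonnegative as in Definition 6.4, and D is chosen depending on the fixed exponent C.

```latex
\begin{align*}
  \mathbb{E}X
  &= \mathbb{E}\big[X\,\mathbf{1}(X \le N^{\varepsilon}Y)\big]
   + \mathbb{E}\big[X\,\mathbf{1}(X > N^{\varepsilon}Y)\big] \\
  &\le N^{\varepsilon}\,\mathbb{E}Y + N^{C}\,\mathbb{P}\big[X > N^{\varepsilon}Y\big]
   \le N^{\varepsilon}\,\mathbb{E}Y + N^{C-D}
   \qquad (N \ge N_0(\varepsilon, D)) \\
  &\le N^{\varepsilon}\,\mathbb{E}Y + N^{-C}
   \le 2N^{\varepsilon}\,\mathbb{E}Y
   \qquad (D := 2C,\ \text{using } \mathbb{E}Y \ge N^{-C}).
\end{align*}
```

Running the same argument with ε/2 in place of ε and taking N so large that $2N^{\varepsilon/2} \le N^{\varepsilon}$ gives (6.26).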
We now define appropriate subsets of the spectral parameter z.
Definition 6.6 (Spectral domain). We call an N-dependent family

$$\mathbf{D} \equiv \mathbf{D}^{(N)} \subset \{z : |E| \le 10,\ M^{-1} \le \eta \le 10\}$$

a spectral domain. (Recall that $M \equiv M_N$ depends on N.)
We always consider families $X^{(N)}(u) = X_i^{(N)}(z)$ indexed by u = (z, i), where z takes on values in some spectral domain D, and i takes on values in some finite (possibly N-dependent or empty) index set. The stochastic domination X ≺ Y of such families will always be uniform in z and i, and we usually do not state this explicitly. Which spectral domain D is meant will usually be clear from the context, in which case we shall not mention it explicitly.
For example, using Chebyshev’s inequality and (6.2) one easily finds that

$$h_{ij} \prec (s_{ij})^{1/2} \prec M^{-1/2} \tag{6.27}$$

uniformly in i and j, so that we may also write $h_{ij} = O_\prec((s_{ij})^{1/2})$. The definition of ≺ with the polynomial factors $N^{\varepsilon}$ and $N^{-D}$ is tailored to the assumption (6.2). We remark that if the analogous subexponential decay (2.7) is assumed, then a stronger form of stochastic domination can be introduced, but we will not pursue this direction here.
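The step from (6.2) to (6.27) is the moment form of Chebyshev’s (Markov’s) inequality. Spelled out for fixed ε > 0 and D > 0, one chooses the moment order p depending on ε and D; the constant $\mu_p$ is uniform in i, j, and N, so the bound is uniform as required.

```latex
\begin{align*}
  \mathbb{P}\big[|h_{ij}| > N^{\varepsilon}(s_{ij})^{1/2}\big]
  &\le \frac{\mathbb{E}|h_{ij}|^{p}}{N^{\varepsilon p}\,(s_{ij})^{p/2}}
   \le \mu_p\,N^{-\varepsilon p}
   \le N^{-D}
   \qquad \text{if } p \ge \frac{D+1}{\varepsilon} \text{ and } N \ge \mu_p, \\
  (s_{ij})^{1/2} &\le M^{-1/2} \qquad \text{by (6.3)}.
\end{align*}
```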

6.4. Statement of the Local Semicircle Law


The local semicircle law is very sensitive to the closeness of 𝜂 = Im 𝑧 to
the real axis. When 𝜂 is too small the resolvent and its trace become strongly
fluctuating. So our results will hold only above a certain threshold for 𝜂. We
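now define the lower threshold for η, which depends on the energy E ∈ [−10, 10]:

$$\widetilde\eta_E := \min\Big\{\eta : \frac{1}{M\xi} \le \min\Big\{\frac{M^{-\gamma}}{\widetilde\Gamma(E + \mathrm{i}\xi)^3},\ \frac{M^{-2\gamma}}{\widetilde\Gamma(E + \mathrm{i}\xi)^4\,\operatorname{Im} m_{sc}(E + \mathrm{i}\xi)}\Big\} \text{ holds for all } \xi \ge \eta\Big\}. \tag{6.28}$$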

Although this expression looks complicated, we shall see that it comes out
naturally in the analysis of the self-consistent equations for the Green functions.
Here 𝛾 > 0 is a parameter that can be chosen arbitrarily small; for all practical
purposes, the reader can neglect it. For generalized Wigner matrices M ≍ N, and from (6.24) we have

$$\widetilde\eta_E \le C N^{-1+2\gamma}; \tag{6.29}$$

i.e., we will get the local semicircle law on the smallest possible scale $\eta \gg N^{-1}$, modulo an $M^{\gamma}$ correction with an arbitrary, small exponent. We remark that if we assume subexponential decay (2.7) instead of the polynomial decay (6.2), then the small $M^{\gamma}$ correction can be replaced with a $(\log M)^{C}$ factor.
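The threshold (6.28) is defined by a condition that must hold for all ξ ≥ η, which is easiest to read as a downward scan. The sketch below approximates $\widetilde\eta_E$ on a geometric grid of ξ values in $[M^{-1}, 10]$, with $\widetilde\Gamma$ supplied as a function (by default the standard Wigner value 1 from (6.22)). In that case $\operatorname{Im} m_{sc} \le 1$ makes the right-hand side at least $M^{-2\gamma}$, so the condition holds for every $\xi \ge M^{-1+2\gamma}$, which is the mechanism behind (6.29). Names and the grid size are illustrative.

```python
import cmath
import numpy as np


def m_sc(z):
    """Semicircle Stieltjes transform (6.9), branch with Im m_sc > 0 for Im z > 0."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def eta_threshold(energy, big_m, gamma, gamma_tilde=lambda z: 1.0, points=2000):
    """Grid approximation of the threshold in (6.28).

    Scans a geometric grid of xi in [1/M, 10] from the top down and returns the
    smallest grid value eta such that the defining inequality holds at every grid
    point xi >= eta (None if it already fails at xi = 10).
    """
    threshold = None
    for xi in np.geomspace(1.0 / big_m, 10.0, points)[::-1]:
        z = complex(energy, xi)
        gt = gamma_tilde(z)
        bound = min(big_m ** (-gamma) / gt ** 3,
                    big_m ** (-2 * gamma) / (gt ** 4 * m_sc(z).imag))
        if 1.0 / (big_m * xi) > bound:
            break
        threshold = xi
    return threshold
```

Outside the spectrum, where $\operatorname{Im} m_{sc}$ is small, the first constraint dominates and the computed threshold drops to about $M^{-1+\gamma}$.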
Finally, we define our fundamental control parameter

$$\Pi(z) := \sqrt{\frac{\operatorname{Im} m_{sc}(z)}{M\eta}} + \frac{1}{M\eta}. \tag{6.30}$$
We can now state the main result of this section, which in this full generality
first appeared in [55]. Previous results that have cumulatively led to this general
formulation will be summarized at the end of the section.
Theorem 6.7 (Local semicircle law [55]). Consider a universal Wigner matrix satisfying the polynomial decay condition (6.2) and (6.3). Then, uniformly in the energy |E| ≤ 10 and $\eta \in [\widetilde\eta_E, 10]$, we have the bounds

$$\max_{i,j}|G_{ij}(z) - \delta_{ij}m_{sc}(z)| \prec \Pi(z) = \sqrt{\frac{\operatorname{Im} m_{sc}(z)}{M\eta}} + \frac{1}{M\eta}, \qquad z = E + \mathrm{i}\eta, \tag{6.31}$$

as well as

$$|m_N(z) - m_{sc}(z)| \prec \frac{1}{M\eta}. \tag{6.32}$$

Moreover, outside of the spectrum we have the stronger estimate

$$|m_N(z) - m_{sc}(z)| \prec \frac{1}{M(\kappa_E + \eta)} + \frac{1}{(M\eta)^2\sqrt{\kappa_E + \eta}} \tag{6.33}$$

uniformly in $z \in \{z : 2 \le |E| \le 10,\ \widetilde\eta_E \le \eta \le 10,\ M\eta\sqrt{\kappa_E + \eta} \ge M^{\gamma}\}$ for any fixed γ > 0, where $\kappa_E := ||E| - 2|$.
For generalized Wigner matrices, the threshold $\widetilde\eta_E$ can be chosen as $\widetilde\eta_E = N^{-1+2\gamma}$.
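A single numerical sample illustrates the two scales in Theorem 6.7. For a GOE matrix (so M = N) at $\eta = N^{-1/2}$, the sketch computes the full resolvent and reports the entrywise error in (6.31), the averaged error in (6.32), and the control parameter Π(z). The factor 10 in the accompanying check is an arbitrary stand-in for the $N^{\varepsilon}$ tolerance hidden in ≺, and the function name is hypothetical.

```python
import cmath
import numpy as np


def m_sc(z):
    """Semicircle Stieltjes transform (6.9), branch with Im m_sc > 0 for Im z > 0."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def local_law_errors(n, z, seed=0):
    """For one GOE sample (M = N) return
    (max_ij |G_ij - delta_ij m_sc|, |m_N - m_sc|, Pi(z))."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    h = (a + a.T) / np.sqrt(2.0 * n)
    g = np.linalg.inv(h - z * np.eye(n))
    msc = m_sc(z)
    eta = z.imag
    entry_err = np.abs(g - msc * np.eye(n)).max()
    avg_err = abs(np.trace(g) / n - msc)
    pi_z = np.sqrt(msc.imag / (n * eta)) + 1.0 / (n * eta)
    return entry_err, avg_err, pi_z
```

Typically the averaged error is markedly smaller than the entrywise one, reflecting the gain from $(N\eta)^{-1/2}$ to $(N\eta)^{-1}$ discussed next.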
We point out two remarkable features of these bounds. The error term for
the resolvent entries behaves essentially as (𝑀𝜂)−1/2 , with an improvement near
the edges where Im 𝑚sc vanishes. The error bound for the Stieltjes transform,
i.e., for the average of the diagonal resolvent entries, is one order better, (𝑀𝜂)−1 ,
but without improvement near the edge.
The resolvent matrix element $G_{ij}$ may be viewed as the scalar product $\langle\mathbf{e}_i, G\mathbf{e}_j\rangle$, where $\mathbf{e}_i$ is the i-th coordinate vector. In fact, a more general version of (6.31), the isotropic local law, also holds for generalized Wigner matrices:
Theorem 6.8 (Isotropic law [21]). For a generalized Wigner matrix with polynomial decay (5.6) and for any fixed unit vectors v, w we have

$$|\langle\mathbf{v}, G(z)\mathbf{w}\rangle - m_{sc}(z)\langle\mathbf{v}, \mathbf{w}\rangle| \prec \sqrt{\frac{\operatorname{Im} m_{sc}(z)}{N\eta}} + \frac{1}{N\eta} \tag{6.34}$$

uniformly in the set $\{z = E + \mathrm{i}\eta : |E| \le \omega^{-1},\ N^{-1+\omega} \le \eta \le \omega^{-1}\}$ for any fixed ω > 0.
The isotropic law was first proven in [89] for Wigner matrices under a van-
ishing third moment condition. The general case in the form above was given
in [21]. We will not prove this result here since it is not needed for the proof of
Theorem 5.1.
We will first prove a weaker version of Theorem 6.7 in Chapter 7, the so-called weak local semicircle law, where the error term is not optimal. After that, we will prove Theorem 6.7 in Chapter 8 using Γ instead of Γ̃. This yields the same estimate as given in Theorem 6.7, but only on a smaller set of the spectral parameter, for which the argument is somewhat simpler. The proof of Theorem 6.7 for the entire domain will only be sketched in Chapter 9, and we refer the reader to the original paper for the complete version. Chapter 7 is included mainly for pedagogical reasons to introduce the ideas of the continuity argument and of self-consistent equations. Chapter 8 demonstrates how to use the vector self-consistent equations in a simpler setup. In Chapter 9 we sketch the analysis of the same self-consistent equation, but split onto the space orthogonal to the constant vector, exploiting a spectral gap. Finally, Chapter 10 is devoted to a key technical lemma, the fluctuation averaging lemma.
We close this section with a short summary of the main developments that
have gradually led to the local semicircle law, Theorem 6.7, in its general form.
In Section 8.2 we will demonstrate that all proofs of the local semicircle law rely
on some version of a self-consistent equation. At the beginning this was a scalar
equation for 𝑚𝑁 used in various forms by many authors; see, e.g., Pastur [110],
Girko [77], and Bai [10]. The self-consistent vector equation for 𝑣𝑖 = 𝐺𝑖𝑖 − 𝑚sc
(see Section 8.2.2) first appeared in [69]. This allowed one to go beyond identically distributed entries $h_{ij}$ and opened up the route to estimates on individual resolvent matrix elements. Finally, the self-consistent matrix equation for
𝔼|𝐺𝑥𝑦 |2 first appeared in [54], and it yielded the diffusion profile for the resol-
vent. In this book we only need the vector equation.
The local semicircle law has three important features that have been gradually achieved. We explain them for the case when M is comparable with N; the general case is only a technical extension. First, the local law in the bulk holds down to the scale expressed by the lower bound $\eta \gg N^{-1}$. This is the smallest
possible scale to control 𝑚(𝑧), since at scale 𝜂 ≲ 𝑁 −1 a few individual eigen-
values strongly influence its behavior, and 𝑚(𝑧) is not close to a deterministic
quantity. This optimal scale in the bulk of the spectrum was first established
for Wigner matrices in a series of papers [60–62] with the extension to general-
ized Wigner matrices in [69]. Second, the optimal speed of convergence in the
entrywise bound (6.31) is 𝑁 −1/2 , while in the bound 𝑚 − 𝑚sc (6.32) it is 𝑁 −1 .
These optimal 𝑁-dependences were first achieved in [68] by introducing the
fluctuation averaging mechanism. Third, near the spectral edge the stability of
the self-consistent equation deteriorates, manifested by the behavior of Γ(𝑧) in
(6.24) when 𝜅𝐸 is small. This can be compensated by the fact that the density
is small near the edge. After many attempts and weaker results in [64, 68, 69],
the optimal form of this compensation was eventually found in [70], basically
separating the analysis of the self-consistent equation onto the space orthogonal
to constants. Since Γ̃ does not deteriorate near the edge, see (6.24), the estimate
becomes optimal.

6.5. Appendix: Behavior of 𝚪 and 𝚪̃ and the Proof of Lemma 6.3
In this appendix we give basic bounds on the parameters Γ and Γ̃ . Readers
interested only in the Wigner case may skip this section entirely; if 𝑠𝑖𝑗 = 𝑁 −1 ,
then the explicit formulas in (6.22) already suffice. As it turns out, the behavior
of Γ and Γ̃ is intimately linked with the spectrum of 𝑆, more precisely with its
spectral gaps. Recall that the spectrum of 𝑆 lies in [−1, 1], with 1 being a simple
eigenvalue.
Definition 6.9. Let δ₋ be the distance from −1 to the spectrum of S and δ₊ the distance from 1 to the spectrum of S restricted to $\mathbf{e}^\perp$. In other words, δ± are the largest numbers satisfying

$$S \ge -1 + \delta_-, \qquad S|_{\mathbf{e}^\perp} \le 1 - \delta_+.$$
For generalized Wigner matrices the lower and upper spectral gaps satisfy δ± ≥ a with a constant $a := \min\{C_{\inf}, C_{\sup}\}$; see (2.6). This simple fact follows easily by splitting

$$S = (S - a\,\mathbf{e}\mathbf{e}^*) + a\,\mathbf{e}\mathbf{e}^*$$

and noticing that the first term is (1 − a) times a doubly stochastic matrix (its entries are nonnegative since $s_{ij} \ge C_{\inf}/N \ge a/N$); hence its spectrum lies in [−1 + a, 1 − a].
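The spectral gaps of Definition 6.9 can be read off from the sorted eigenvalues of S. The sketch below builds a hypothetical generalized Wigner variance profile as a convex combination $a\,\mathbf{e}\mathbf{e}^* + (1 - a)Q$, where Q is a random symmetric, nonnegative, doubly stochastic circulant matrix; this is exactly the splitting used above, so both gaps should come out at least a.

```python
import numpy as np


def spectral_gaps(s):
    """(delta_minus, delta_plus) of Definition 6.9 for a symmetric stochastic S,
    assuming 1 is a simple eigenvalue, so that the second-largest eigenvalue is
    the top of the spectrum of S restricted to e-perp."""
    eig = np.linalg.eigvalsh(s)  # ascending order
    return 1.0 + eig[0], 1.0 - eig[-2]


def hypothetical_generalized_wigner_profile(n, a, seed=0):
    """Illustrative S = a * (flat profile 1/n) + (1 - a) * Q with Q a random symmetric
    circulant that is nonnegative and doubly stochastic; hence s_ij >= a / n and (6.1) holds."""
    rng = np.random.default_rng(seed)
    c = rng.random(n)
    c = (c + np.roll(c[::-1], 1)) / 2.0  # enforce c_k = c_{n-k}, so Q is symmetric
    c /= c.sum()
    idx = np.arange(n)
    q = c[(idx[None, :] - idx[:, None]) % n]  # circulant: Q_ij = c_{(j - i) mod n}
    return a * np.full((n, n), 1.0 / n) + (1.0 - a) * q
```

In contrast, for the band profile of Example 6.1 the gap δ₊ is small when W ≪ N, which is why Proposition 6.10 below keeps track of δ± explicitly.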
Proof of Lemma 6.3. The lower bound on Γ in (6.24) follows from $(1 - m_{sc}^2 S)^{-1}\mathbf{e} = (1 - m_{sc}^2)^{-1}\mathbf{e}$ combined with (6.12). For the upper bound, we first notice that $1 - m_{sc}^2 S$ is invertible since −1 ≤ S ≤ 1 and $|m_{sc}| < 1$; see (6.11). Since $m_{sc}$ and its reciprocal are bounded (6.11), it is sufficient to bound the inverse of $m_{sc}^{-2} - S$. Since the spectrum of S lies in the set $[-1 + \delta_-, 1 - \delta_+] \cup \{1\} \subset [-1 + a, 1 - a] \cup \{1\}$, and $|m_{sc}^{-2}| \ge 1$, we easily get

$$\Big\|\frac{1}{1 - m_{sc}^2 S}\Big\|_{\ell^2\to\ell^2} \le C\Big\|\frac{1}{m_{sc}^{-2} - S}\Big\|_{\ell^2\to\ell^2} \le \frac{C}{\min\{a, |m_{sc}^{-2} - 1|\}} \le \frac{C_a}{|1 - m_{sc}^2|}. \tag{6.35}$$
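In order to find the $\ell^\infty\to\ell^\infty$ norm, we solve $(1 - m_{sc}^2 S)\mathbf{v} = \mathbf{u}$ directly using (6.10):

$$\begin{aligned} \|\mathbf{v}\|_\infty = \|\mathbf{u} + m_{sc}^2 S\mathbf{v}\|_\infty &\le \|\mathbf{u}\|_\infty + \|S\|_{\ell^1\to\ell^\infty}\|\mathbf{v}\|_1 \\ &\le \|\mathbf{u}\|_\infty + N^{1/2}\|S\|_{\ell^1\to\ell^\infty}\|\mathbf{v}\|_2 \\ &\le \|\mathbf{u}\|_\infty + N^{1/2}\|S\|_{\ell^1\to\ell^\infty}\Big\|\frac{1}{1 - m_{sc}^2 S}\Big\|_{\ell^2\to\ell^2}\|\mathbf{u}\|_2 \\ &\le \Big(1 + \frac{C_a N\|S\|_{\ell^1\to\ell^\infty}}{|1 - m_{sc}^2|}\Big)\|\mathbf{u}\|_\infty. \end{aligned}$$

Here we used (6.35) and that $\|\mathbf{v}\|_1 = \sum_i |v_i| \le N^{1/2}\big(\sum_i |v_i|^2\big)^{1/2}$ and $\|\mathbf{u}\|_2 \le N^{1/2}\|\mathbf{u}\|_\infty$. Since for generalized Wigner matrices $N\|S\|_{\ell^1\to\ell^\infty} = N \cdot \max_{ij} s_{ij} \le C_{\sup}$ from (2.6), this proves

$$\Big\|\frac{1}{1 - m_{sc}^2 S}\Big\|_{\ell^\infty\to\ell^\infty} \le \frac{C}{|1 - m_{sc}^2|},$$

where the constant depends on $C_{\inf}$ and $C_{\sup}$. This proves the upper bound on Γ in (6.24).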
Finally, we bound Γ̃. The lower bound was already given in (6.19). For the upper bound we follow the argument above, but we restrict S to $\mathbf{e}^\perp$. Since the spectrum of this restriction lies in [−1 + a, 1 − a], we immediately get the bound C/a for the $\ell^2$-norm of $(1 - m_{sc}^2 S)^{-1}|_{\mathbf{e}^\perp}$ in the right-hand side of (6.35). This can be lifted to the same estimate for the $\ell^\infty$-norm. This completes the proof of the lemma. □

This simple proof of Lemma 6.3 used both spectral gaps and the fact that $s_{ij} = O(N^{-1})$. Lacking this information in the general case, the following proposition gives explicit bounds on Γ and Γ̃ depending on the spectral gaps δ±. We recall the notation z = E + iη and κ = κ_E := ||E| − 2| and define

$$\theta \equiv \theta(z) := \begin{cases} \kappa + \dfrac{\eta}{\sqrt{\kappa + \eta}} & \text{if } |E| \le 2, \\ \sqrt{\kappa + \eta} & \text{if } |E| > 2. \end{cases} \tag{6.36}$$

Proposition 6.10. For the matrix elements of S we assume $0 \le s_{ij} = s_{ji} \le M^{-1}$ and $\sum_j s_{ij} = 1$. Then there is a universal constant C such that the following holds uniformly in the domain $\{z = E + \mathrm{i}\eta : |E| \le 10,\ M^{-1} \le \eta \le 10\}$ and, in particular, in any spectral domain D.
(i) We have the estimate

$$\frac{1}{C\sqrt{\kappa + \eta}} \le \Gamma(z) \le \frac{C\log N}{1 - \max_{\pm}\big|\frac{1 \pm m_{sc}^2}{2}\big|} \le \frac{C\log N}{\min\{\eta + E^2, \theta\}}. \tag{6.37}$$

(ii) In the presence of a gap δ₋ we may improve the upper bound to

$$\Gamma(z) \le \frac{C\log N}{\min\{\delta_- + \eta + E^2, \theta\}}. \tag{6.38}$$

(iii) For Γ̃ we have the bounds

$$C^{-1} \le \widetilde\Gamma(z) \le \frac{C\log N}{\min\{\delta_- + \eta + E^2, \delta_+ + \theta\}}. \tag{6.39}$$
Proof. The first bound of (6.37) follows from

$$(1 - m_{sc}^2 S)^{-1}\mathbf{e} = (1 - m_{sc}^2)^{-1}\mathbf{e}$$

combined with (6.12).
In order to prove the second bound of (6.37), we write

$$\frac{1}{1 - m_{sc}^2 S} = \frac{1}{2}\,\frac{1}{1 - \frac{1 + m_{sc}^2 S}{2}}$$

and observe that

$$\Big\|\frac{1 + m_{sc}^2 S}{2}\Big\|_{\ell^2\to\ell^2} \le \max_{\pm}\Big|\frac{1 \pm m_{sc}^2}{2}\Big| =: q. \tag{6.40}$$

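Therefore,

$$\Big\|\frac{1}{1 - m_{sc}^2 S}\Big\|_{\ell^\infty\to\ell^\infty} \le \sum_{n=0}^{n_0 - 1}\Big\|\frac{1 + m_{sc}^2 S}{2}\Big\|_{\ell^\infty\to\ell^\infty}^{n} + \sqrt{N}\sum_{n=n_0}^{\infty}\Big\|\frac{1 + m_{sc}^2 S}{2}\Big\|_{\ell^2\to\ell^2}^{n} \le n_0 + \sqrt{N}\,\frac{q^{n_0}}{1 - q} \le \frac{C\log N}{1 - q},$$

where in the last step we chose $n_0 = \frac{C_0\log N}{1 - q}$ for large enough $C_0$. Here we used that $\|S\|_{\ell^\infty\to\ell^\infty} \le 1$ and (6.11) to estimate the summands in the first sum. This concludes the proof of the second bound of (6.37).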
The third bound of (6.37) follows from the elementary estimates

$$\Big|\frac{1 - m_{sc}^2}{2}\Big| \le 1 - c(\eta + E^2), \qquad \Big|\frac{1 + m_{sc}^2}{2}\Big| \le 1 - c\Big((\operatorname{Im} m_{sc})^2 + \frac{\eta}{\operatorname{Im} m_{sc} + \eta}\Big) \le 1 - c\theta, \tag{6.41}$$

for some universal constant c > 0, where in the last step we used Lemma 6.2.
The estimate (6.38) follows similarly. Due to the gap δ₋ in the spectrum of S, we may replace the estimate (6.40) with

$$\Big\|\frac{1 + m_{sc}^2 S}{2}\Big\|_{\ell^2\to\ell^2} \le \max\Big\{1 - \delta_- - \eta - E^2,\ \Big|\frac{1 + m_{sc}^2}{2}\Big|\Big\}. \tag{6.42}$$

Hence (6.38) follows using (6.41).
The lower bound of (6.39) was proved in (6.19). The upper bound is proved similarly to (6.38), except that (6.42) is replaced with

$$\Big\|\frac{1 + m_{sc}^2 S}{2}\Big|_{\mathbf{e}^\perp}\Big\|_{\ell^2\to\ell^2} \le \max\Big\{1 - \delta_- - \eta - E^2,\ \min\Big\{1 - \delta_+,\ \Big|\frac{1 + m_{sc}^2}{2}\Big|\Big\}\Big\}.$$

This concludes the proof of (6.39). □
CHAPTER 7

Weak Local Semicircle Law

Before we prove the local semicircle law in the strong form, Theorem 6.7, for pedagogical reasons we first prove the following weaker version, whose proof is easier. For simplicity, in this section we consider the Wigner case, i.e., $s_{ij} = \frac{1}{N}$ and M = N. In the bulk, this weaker estimate is still effective for all η down to the smallest scales $N^{-1+\varepsilon}$, but the power of $\frac{1}{N}$ in the error estimate is not optimal ($\frac{1}{2}$ instead of 1 in (6.32)). Near the edge the bound is even weaker; the power of $\frac{1}{N}$ is reduced to $\frac{1}{4}$, indicating that this proof is not sufficiently strong near the edge.

Theorem 7.1 (Weak local semicircle law). Let $z = E + i\eta$ and let $\kappa := \big||E| - 2\big|$. Let $H$ be a Wigner matrix, let $G(z) = (H - z)^{-1}$ be its resolvent, and set $m_N(z) = \frac1N\operatorname{Tr} G(z)$. We assume that the single-entry distribution satisfies the decay condition (5.6). Choose any $\gamma > 0$. Then for $z \in \mathbf D_\gamma := \{E + i\eta : |E| \le 10,\ N^{-1+\gamma} \le \eta \le 10\}$ we have
$$|m_N(z) - m_{\mathrm{sc}}(z)| \prec \min\Big\{\frac{1}{\sqrt{N\eta\kappa}},\ \frac{1}{(N\eta)^{1/4}}\Big\}. \tag{7.1}$$

For definiteness, we present the proof for the Hermitian case, but all formu-
las below carry over to the other symmetry classes with obvious modifications.
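To get a feeling for the statement, one can compare $m_N$ and $m_{\mathrm{sc}}$ for a single small sample. The sketch below assumes real symmetric Gaussian entries of variance $1/N$ (so $s_{ij} = 1/N$), a fixed seed, $N = 80$ and $z = 0.3 + 0.5i$; `m_sc`, `inverse` and `wigner` are illustrative helpers, not part of the text. A single sample at one $z$ proves nothing about stochastic domination; it only shows the order of magnitude next to the shape of the right-hand side of (7.1) without the $N^\varepsilon$ factor.

```python
import cmath
import math
import random


def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with Im m > 0 (Stieltjes transform of the semicircle law)."""
    r = cmath.sqrt(z * z - 4)
    m1, m2 = (-z + r) / 2, (-z - r) / 2
    return m1 if m1.imag > 0 else m2


def inverse(M):
    """Invert a square (possibly complex) matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


def wigner(n, rng):
    """Real symmetric matrix with independent N(0, 1/n) entries on and above the diagonal."""
    H = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            H[a][b] = H[b][a] = rng.gauss(0.0, 1.0) / math.sqrt(n)
    return H


rng = random.Random(2024)
N, z = 80, 0.3 + 0.5j
H = wigner(N, rng)
G = inverse([[H[a][b] - (z if a == b else 0) for b in range(N)] for a in range(N)])
m_N = sum(G[a][a] for a in range(N)) / N
err = abs(m_N - m_sc(z))
eta, kappa = z.imag, abs(abs(z.real) - 2)
rhs = min(1 / math.sqrt(N * eta * kappa), (N * eta) ** -0.25)  # shape of (7.1), without N^eps
print(f"|m_N - m_sc| = {err:.4f}, min(...) in (7.1) = {rhs:.4f}")
```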

7.1. Proof of the Weak Local Semicircle Law, Theorem 7.1


The proof is divided into five steps.
Step 1. Schur complement formula. The first step to prove the weak local semicircle law is to use the Schur complement formula, which we state in the following lemma. The proof is an elementary exercise.
Lemma 7.2 (Schur formula). Let $A$, $B$, and $C$ be $n\times n$, $m\times n$, and $m\times m$ matrices, with $C$ invertible. We define the $(m+n)\times(m+n)$ matrix $D$ as
$$D := \begin{pmatrix} A & B^* \\ B & C\end{pmatrix} \tag{7.2}$$
and the $n\times n$ matrix $\hat D$ as
$$\hat D := A - B^* C^{-1} B. \tag{7.3}$$
Then $\hat D$ is invertible if $D$ is invertible, and for any $1 \le i, j \le n$ we have
$$(D^{-1})_{ij} = (\hat D^{-1})_{ij} \tag{7.4}$$
for the corresponding matrix elements.

46 7. WEAK LOCAL SEMICIRCLE LAW
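Lemma 7.2 is easy to test numerically. In this sketch (our own illustration) $A$, $B$, $C$ are random complex matrices with $n = 2$, $m = 3$ and a fixed seed, and `inverse`, `matmul`, `adjoint` are small pure-Python helpers; the upper-left $n\times n$ block of $D^{-1}$ is compared entrywise with $\hat D^{-1}$.

```python
import random


def inverse(M):
    """Invert a square (possibly complex) matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def adjoint(X):
    """Conjugate transpose."""
    return [[X[i][j].conjugate() for i in range(len(X))] for j in range(len(X[0]))]


rng = random.Random(1)


def rand_matrix(rows, cols):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)] for _ in range(rows)]


n, m = 2, 3
A, B, C = rand_matrix(n, n), rand_matrix(m, n), rand_matrix(m, m)
B_star = adjoint(B)                                                          # n x m
D = [A[i] + B_star[i] for i in range(n)] + [B[i] + C[i] for i in range(m)]  # (7.2)
correction = matmul(B_star, matmul(inverse(C), B))                           # B^* C^{-1} B
D_hat = [[A[i][j] - correction[i][j] for j in range(n)] for i in range(n)]  # (7.3)

D_inv, D_hat_inv = inverse(D), inverse(D_hat)
err = max(abs(D_inv[i][j] - D_hat_inv[i][j]) for i in range(n) for j in range(n))  # (7.4)
print(err)
```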
Recall that $G_{ij} = G_{ij}(z)$ denotes the matrix element of the resolvent
$$G_{ij} = \Big(\frac{1}{H - z}\Big)_{ij}.$$
Let $H^{(i)}$ be the matrix where all matrix elements of $H = (h_{ab})$ in the $i$th column and row are set to be 0:
$$(H^{(i)})_{ab} := h_{ab}\cdot\mathbf 1(a \ne i)\cdot\mathbf 1(b \ne i), \qquad a, b = 1,\dots,N.$$
In other words, $H^{(i)}$ is the $i$th minor $H^{[i]}$ of $H$ augmented to an $N\times N$ matrix by adding a zero row and column. Recall that the minor $H^{[i]}$ is an $(N-1)\times(N-1)$ matrix with the $i$th row and column removed:
$$H^{[i]}_{ab} := h_{ab}, \qquad a, b \ne i.$$
Denote the Green function of $H^{(i)}$ by $G^{(i)}(z) = (H^{(i)} - z)^{-1}$, which is again an $N\times N$ matrix. Notice that
$$G^{(i)}_{ab} = \begin{cases} (-z)^{-1} & \text{if } a = b = i,\\ 0 & \text{if exactly one of } a \text{ or } b \text{ equals } i,\\ G^{[i]}_{ab} = (H^{[i]} - z)^{-1}_{ab} & \text{if } a \ne i,\ b \ne i.\end{cases} \tag{7.5}$$
We warn the reader that in some earlier papers on the subject a different con-
vention was used where 𝐻 (𝑖) and 𝐺 (𝑖) denoted the (𝑁 − 1) × (𝑁 − 1) minors (𝐻 [𝑖]
with the current notation) and their Green functions. The current convention
simplifies several formulas although the mathematical contents of both versions
are identical.
With similar conventions, we can define $G^{(ij)}$, $G^{(ijk)}$, etc. The superscript in parentheses for resolvents always means “after setting the corresponding row and column of $H$ to 0” (in some terminology, this procedure is also described as “zero out the corresponding row and column”). In particular, by the independence of matrix elements, this means that the matrix $G^{(ij)}$, say, is independent of the $i$th and $j$th row and column of $H$. This helps to decouple dependencies in formulae. Let
$$\mathbf a^i = (h_{1i}, h_{2i}, \dots, 0, \dots, h_{Ni})^T \tag{7.6}$$
be the $i$th column of $H$, after setting the $i$th entry to 0.
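Using Lemma 7.2 with $n = 1$ and $m = N - 1$, (7.5), and (7.6), we have
$$G_{ii} = \frac{1}{h_{ii} - z - \mathbf a^i\cdot G^{(i)}\mathbf a^i}, \tag{7.7}$$
where
$$\mathbf a^i\cdot G^{(i)}\mathbf a^i = \sum_{k,l\ne i}h_{ik}G^{(i)}_{kl}h_{li} = \sum_{k,l\ne i}h_{ik}G^{[i]}_{kl}h_{li}. \tag{7.8}$$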
7.1. PROOF OF THE WEAK LOCAL SEMICIRCLE LAW, THEOREM 7.1 47

(We sometimes use 𝐮 ⋅ 𝐯 instead of 𝐮∗ 𝐯 or (𝐮, 𝐯) for the usual Hermitian scalar
product.) We now introduce some notation.
Definition 7.3 (Partial expectation and independence). Let $X \equiv X(H)$ be a random variable. For $i \in \{1,\dots,N\}$ define the operations $P_i$ and $Q_i$ through
$$P_iX := \mathbb E(X|H^{(i)}), \qquad Q_iX := X - P_iX.$$
We call $P_i$ the partial expectation in the index $i$. Moreover, we say that $X$ is independent of a set $\mathbb T \subset \{1,\dots,N\}$ if $X = P_iX$ for all $i \in \mathbb T$.
We can decompose $\mathbf a^i\cdot G^{(i)}\mathbf a^i$ into its expectation and fluctuation
$$\mathbf a^i\cdot G^{(i)}\mathbf a^i = P_i[\mathbf a^i\cdot G^{(i)}\mathbf a^i] + Z_i,$$
where
$$Z_i := Q_i[\mathbf a^i\cdot G^{(i)}\mathbf a^i]. \tag{7.9}$$
Since $G^{(i)}$ is independent of $\mathbf a^i$, we need to compute expectations and fluctuations of quadratic functions. The expectation is easy:
$$P_i[\mathbf a^i\cdot G^{(i)}\mathbf a^i] = P_i\sum_{k,l}\overline{a^i_k}\,G^{(i)}_{kl}\,a^i_l = \sum_{k,l\ne i}P_i\big[h_{ik}G^{(i)}_{kl}h_{li}\big] = \frac1N\sum_{k\ne i}G^{(i)}_{kk},$$
where in the last step we used that different matrix elements are independent, i.e., $P_i[h_{ik}\overline{h_{il}}] = \frac1N\delta_{kl}$. The summations always run over all indices from 1 to $N$, apart from those that are explicitly excluded. We define
$$m_N^{(i)}(z) := \frac{1}{N-1}\operatorname{Tr} G^{[i]}(z) = \frac{1}{N-1}\sum_{k\ne i}G^{(i)}_{kk}(z),$$
where we used $G^{[i]}_{kk} = G^{(i)}_{kk}$ for $k \ne i$ from (7.5). Hence we have the identity
$$G_{ii} = \frac{1}{h_{ii} - z - P_i[\mathbf a^i\cdot G^{(i)}\mathbf a^i] - Z_i} = \frac{1}{h_{ii} - z - \big(1 - \frac1N\big)m_N^{(i)}(z) - Z_i}. \tag{7.10}$$
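The identities (7.5), (7.7) and (7.8) are deterministic, and (7.10) is (7.7) with the quadratic form split into $P_i[\cdot]$ and $Z_i$, so they can be checked exactly (up to rounding) on a small matrix. The following sketch uses a hypothetical $6\times 6$ real symmetric matrix with a fixed seed, $i = 2$ (0-based) and $z = 0.2 + 0.3i$; `resolvent` and `inverse` are illustrative helpers.

```python
import random


def inverse(M):
    """Invert a square (possibly complex) matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


def resolvent(M, z):
    """(M - z)^{-1} for a square matrix M."""
    n = len(M)
    return inverse([[M[a][b] - (z if a == b else 0) for b in range(n)] for a in range(n)])


rng = random.Random(7)
N, i, z = 6, 2, 0.2 + 0.3j
H = [[0.0] * N for _ in range(N)]
for a in range(N):
    for b in range(a, N):
        H[a][b] = H[b][a] = rng.gauss(0.0, 1.0) / N ** 0.5

keep = [a for a in range(N) if a != i]
G = resolvent(H, z)
G_i = resolvent([[H[a][b] if a != i and b != i else 0.0 for b in range(N)] for a in range(N)], z)  # G^{(i)}
G_minor = resolvent([[H[a][b] for b in keep] for a in keep], z)                                  # G^{[i]}

# (7.5): G^{(i)}_ii = -1/z, vanishing mixed entries, and G^{(i)} = G^{[i]} away from i.
err_75 = max(abs(G_i[i][i] + 1 / z),
             max(abs(G_i[i][a]) + abs(G_i[a][i]) for a in keep),
             max(abs(G_i[a][b] - G_minor[x][y]) for x, a in enumerate(keep) for y, b in enumerate(keep)))

# (7.7)-(7.8): G_ii = 1 / (h_ii - z - sum_{k,l != i} h_ik G^{(i)}_kl h_li).
quad = sum(H[i][k] * G_i[k][l] * H[l][i] for k in keep for l in keep)
err_77 = abs(G[i][i] - 1 / (H[i][i] - z - quad))
print(err_75, err_77)
```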
Step 2. Interlacing of eigenvalues. We now estimate the difference between $m_N$ and $m_N^{(i)}$. The first step is the following well-known lemma. We include a short proof for completeness. For simplicity we consider the randomized setup with a continuous distribution to avoid multiple eigenvalues. The general case easily follows from standard approximation arguments.
Lemma 7.4 (Interlacing of eigenvalues). Let $H$ be a symmetric or Hermitian $N\times N$ matrix with continuous distribution. Decompose $H$ as follows:
$$H = \begin{pmatrix} h & \mathbf a^* \\ \mathbf a & B\end{pmatrix}, \tag{7.11}$$
where $\mathbf a = (h_{12},\dots,h_{1N})^*$ and $B = H^{[1]}$ is the $(N-1)\times(N-1)$ minor of $H$ obtained by removing the first row and first column from $H$. Denote by $\mu_1 \le \dots \le \mu_N$ the eigenvalues of $H$ and $\lambda_1 \le \dots \le \lambda_{N-1}$ the eigenvalues of $B$. Then with
probability 1 the eigenvalues of $B$ are distinct and the eigenvalues of $H$ and $B$ are interlaced:
$$\mu_1 < \lambda_1 < \mu_2 < \lambda_2 < \dots < \mu_{N-1} < \lambda_{N-1} < \mu_N. \tag{7.12}$$

Proof. Since matrices with multiple eigenvalues form a lower-dimensional submanifold within the set of all Hermitian matrices, the eigenvalues are distinct almost surely. Let $\mu$ be one of the eigenvalues of $H$ and let $\mathbf v = (v_1,\dots,v_N)^T$ be a normalized eigenvector associated with $\mu$. From the eigenvalue equation $H\mathbf v = \mu\mathbf v$ and from (7.11) we find that
$$h v_1 + \mathbf a\cdot\mathbf w = \mu v_1 \quad\text{and}\quad \mathbf a v_1 + B\mathbf w = \mu\mathbf w \tag{7.13}$$
with $\mathbf w = (v_2,\dots,v_N)^T$. From these equations we obtain
$$\mathbf w = (\mu - B)^{-1}\mathbf a v_1 \quad\text{and thus}\quad (\mu - h)v_1 = \mathbf a\cdot(\mu - B)^{-1}\mathbf a\,v_1 = \frac{v_1}{N}\sum_\alpha\frac{\xi_\alpha}{\mu - \lambda_\alpha} \tag{7.14}$$
using the spectral representation of $B$, where we set
$$\xi_\alpha = |\sqrt N\,\mathbf a\cdot\mathbf u_\alpha|^2,$$
with $\mathbf u_\alpha$ being the normalized eigenvector of $B$ associated with the eigenvalue $\lambda_\alpha$.
From the continuity of the distribution it also holds that $v_1 \ne 0$ almost surely and thus we have
$$\mu - h = \frac1N\sum_\alpha\frac{\xi_\alpha}{\mu - \lambda_\alpha}, \tag{7.15}$$
where the $\xi_\alpha$'s are strictly positive almost surely (notice that $\mathbf a$ and $\mathbf u_\alpha$ are independent). In particular, this shows that $\mu \ne \lambda_\alpha$ for any $\alpha$. In the open interval $\mu \in (\lambda_{\alpha-1}, \lambda_\alpha)$, the function
$$\Phi(\mu) := \frac1N\sum_\beta\frac{\xi_\beta}{\mu - \lambda_\beta}$$
is strictly decreasing from $\infty$ to $-\infty$; therefore there is exactly one solution to the equation $\mu - h = \Phi(\mu)$. A similar argument shows that there is also exactly one solution below $\lambda_1$ and above $\lambda_{N-1}$. This completes the proof. □
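The counting-function form of (7.12), $0 \le \#\{\mu_j < x\} - \#\{\lambda_j < x\} \le 1$ for all $x$, which is the form used in the next lemma, can be checked without computing eigenvalues at all. The sketch below relies on Sylvester's law of inertia: the number of eigenvalues of a real symmetric $M$ below $x$ equals the number of negative pivots in symmetric Gaussian elimination of $M - x$. The matrix size, seed, grid and the helper `count_below` are assumptions made for illustration.

```python
import random


def count_below(M, x):
    """Number of eigenvalues of the real symmetric matrix M that are < x.

    Symmetric Gaussian elimination of M - xI without pivoting gives M - xI = L D L^T,
    and by Sylvester's law of inertia the number of negative pivots in D equals the
    number of negative eigenvalues of M - xI. Assumes no pivot vanishes (true for generic x).
    """
    n = len(M)
    A = [[M[a][b] - (x if a == b else 0.0) for b in range(n)] for a in range(n)]
    negative = 0
    for k in range(n):
        pivot = A[k][k]
        if pivot < 0:
            negative += 1
        for r in range(k + 1, n):
            f = A[r][k] / pivot
            for c in range(k + 1, n):
                A[r][c] -= f * A[k][c]
    return negative


rng = random.Random(3)
N = 7
H = [[0.0] * N for _ in range(N)]
for a in range(N):
    for b in range(a, N):
        H[a][b] = H[b][a] = rng.gauss(0.0, 1.0) / N ** 0.5
B = [row[1:] for row in H[1:]]  # the minor H^{[1]}: first row and column removed

grid = [-4.0 + 0.01237 * k for k in range(650)]
diffs = [count_below(H, x) - count_below(B, x) for x in grid]
print(sorted(set(diffs)))  # interlacing (7.12) forces every difference to be 0 or 1
```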
We can now compare the normalized traces of $G$ and $G^{[i]}$:
Lemma 7.5. Under the conditions of Lemma 7.4, for any $1 \le i \le N$ we have
$$\Big|m_N(z) - \Big(1 - \frac1N\Big)m_N^{(i)}(z)\Big| \le \frac{C}{N\eta}, \qquad \eta = \operatorname{Im} z > 0. \tag{7.16}$$

Proof. With the notation of Lemma 7.4, let
$$F(x) := \frac1N\#\{\mu_j \le x\} \quad\text{and}\quad F^{(i)}(x) := \frac{1}{N-1}\#\{\lambda_j \le x\}$$
denote the normalized counting functions of the eigenvalues. The interlacing property of the eigenvalues of $H$ and $H^{[i]}$ (see (7.12)) in terms of these functions means that
$$\sup_x\big|NF(x) - (N-1)F^{(i)}(x)\big| \le 1.$$
Then, after integrating by parts,
$$\begin{aligned}\Big|m_N(z) - \Big(1 - \frac1N\Big)m_N^{(i)}(z)\Big| &= \Big|\int\frac{\mathrm dF(x)}{x - z} - \Big(1 - \frac1N\Big)\int\frac{\mathrm dF^{(i)}(x)}{x - z}\Big|\\ &= \frac1N\Big|\int\frac{NF(x) - (N-1)F^{(i)}(x)}{(x - z)^2}\,\mathrm dx\Big|\\ &\le \frac1N\int\frac{\mathrm dx}{|x - z|^2} \le \frac{C}{N\eta},\end{aligned} \tag{7.17}$$
which proves (7.16). Note that (7.16) would also hold without the prefactor $(1 - \frac1N)$ since we have the trivial bound $|m_N^{(i)}| \le \eta^{-1}$. □
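For the deterministic bound (7.16) the constant can be made explicit: $\int\mathrm dx/|x - z|^2 = \pi/\eta$, so (7.17) gives $C = \pi$. The sketch below (random real symmetric matrix; seed, size and sample points chosen by us; `inverse` and `trace_resolvent` are illustrative helpers) computes the worst ratio of the left-hand side of (7.16) to $\pi/(N\eta)$ over all $i$ and a few values of $z$.

```python
import math
import random


def inverse(M):
    """Invert a square (possibly complex) matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


def trace_resolvent(M, z):
    """Tr (M - z)^{-1}."""
    n = len(M)
    G = inverse([[M[a][b] - (z if a == b else 0) for b in range(n)] for a in range(n)])
    return sum(G[a][a] for a in range(n))


rng = random.Random(5)
N = 12
H = [[0.0] * N for _ in range(N)]
for a in range(N):
    for b in range(a, N):
        H[a][b] = H[b][a] = rng.gauss(0.0, 1.0) / math.sqrt(N)

worst = 0.0
for z in (0.1 + 0.05j, -1.0 + 0.01j, 1.9 + 0.2j, 1.0j):
    m_N = trace_resolvent(H, z) / N
    for i in range(N):
        keep = [a for a in range(N) if a != i]
        m_i = trace_resolvent([[H[a][b] for b in keep] for a in keep], z) / (N - 1)  # m_N^{(i)}
        lhs = abs(m_N - (1 - 1 / N) * m_i)
        worst = max(worst, lhs / (math.pi / (N * z.imag)))  # ratio to pi / (N eta)
print(worst)
```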
We can now rewrite (7.10) into the following form, after summing over $i$:
$$m_N(z) = \frac1N\sum_i\frac{1}{-z - m_N(z) + \Omega_i} \quad\text{with}\quad \Omega_i := h_{ii} - Z_i + O\Big(\frac{1}{N\eta}\Big). \tag{7.18}$$
At this stage we can explain the main idea for the proof of the local semicircle
law (7.1). Equation (7.18) is the key self-consistent equation for 𝑚𝑁 , the Stieltjes
transform of the empirical eigenvalue density of 𝐻. Notice that if Ω𝑖 were 0 then
we would have the equation
$$m = -\frac{1}{m + z} \tag{7.19}$$
complemented with the side condition that Im 𝑚 > 0. This is exactly the defin-
ing equation (6.8) of the Stieltjes transform of the semicircle density. In the next
step we will give bounds on ℎ𝑖𝑖 and 𝑍𝑖 , and then we will effectively control the
stability of (7.19) against small perturbations. This will give the desired estimate
(7.1) on |𝑚𝑁 − 𝑚sc |.
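The solution of (7.19) with $\operatorname{Im} m > 0$ can be written down explicitly as a root of $m^2 + zm + 1 = 0$. The sketch below (pure Python; the function names and the midpoint-rule check are our own) picks that root and compares it with a direct numerical evaluation of $\int\rho_{\mathrm{sc}}(x)(x - z)^{-1}\,\mathrm dx$, assuming the standard normalization $\rho_{\mathrm{sc}}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$ on $[-2, 2]$.

```python
import cmath
import math


def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with Im m > 0, i.e. the solution of (7.19) with Im m > 0."""
    r = cmath.sqrt(z * z - 4)
    m1, m2 = (-z + r) / 2, (-z - r) / 2
    return m1 if m1.imag > 0 else m2


def stieltjes_semicircle(z, n=20000):
    """Midpoint-rule approximation of the integral of rho_sc(x) / (x - z) over [-2, 2]."""
    h = 4.0 / n
    total = 0j
    for k in range(n):
        x = -2.0 + (k + 0.5) * h
        total += math.sqrt(4.0 - x * x) / (2 * math.pi) / (x - z) * h
    return total


points = (0.5 + 0.5j, -1.8 + 0.1j, 3.0 + 0.01j)
for z in points:
    m = m_sc(z)
    print(z, m, abs(m + 1 / (m + z)), abs(m - stieltjes_semicircle(z)))
```

Selecting the root by the sign of its imaginary part avoids any discussion of branch cuts: the two roots are $m$ and $1/m$, whose imaginary parts have opposite signs when $\operatorname{Im} z > 0$.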
Step 3. Large-deviation estimate of the fluctuations. The quantity 𝑍𝑖 defined
in (7.9) will be viewed as a random variable in the probability space of the 𝑖th
column. We will need an upper bound on it in the large-deviation sense, i.e.,
with a very high probability. Since it is a quadratic function of the independent
matrix elements of the 𝑖th column, standard large-deviation theorems do not
directly apply. To focus on the main line of the proof and to keep technicalities
at a minimum, we postpone the discussion of the full version of the quadratic
large-deviation bounds to Section 7.2. However, in order to get an idea of the
size of 𝑍𝑖 , we compute its second moment as follows:
$$P_i|Z_i|^2 = \sum_{k,l\ne i}\ \sum_{k',l'\ne i}P_i\Big[\big(h_{ik}G^{(i)}_{kl}h_{li} - P_i[h_{ik}G^{(i)}_{kl}h_{li}]\big)\,\overline{\big(h_{ik'}G^{(i)}_{k'l'}h_{l'i} - P_i[h_{ik'}G^{(i)}_{k'l'}h_{l'i}]\big)}\Big]. \tag{7.20}$$
Since $\mathbb Eh = 0$, the nonzero contributions to this sum come from index combinations in which all $h$ and $\bar h$ factors are paired. For pedagogical simplicity, assume that $\mathbb Eh^2 = 0$; this holds, for example, if the real and imaginary parts are independent and identically distributed. Then the $h$ factors in the above expression have to be paired in such a way that $h_{ik} = h_{ik'}$ and $h_{il} = h_{il'}$, i.e., $k = k'$ and $l = l'$. Note that pairing $h_{ik}$ with $h_{il}$ would give 0 because the expectation is subtracted. The result is
$$P_i|Z_i|^2 \le \frac{1}{N^2}\sum_{k,l\ne i}|G^{(i)}_{kl}|^2 + \frac{m_4 - 1}{N^2}\sum_{k\ne i}|G^{(i)}_{kk}|^2, \tag{7.21}$$
where $m_4 = \mathbb E|\sqrt Nh|^4$ is the fourth moment of the single-entry distribution. (Dropping the diagonal terms $k = l$ from the first sum turns (7.21) into an identity; the upper bound suffices here.)


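The first term can be computed as
$$\begin{aligned}\frac{1}{N^2}\sum_{k,l\ne i}|G^{(i)}_{kl}|^2 &= \frac{1}{N^2}\sum_{k,l\ne i}|G^{[i]}_{kl}|^2 = \frac{1}{N^2}\sum_{k\ne i}\big(|G^{[i]}|^2\big)_{kk}\\ &= \frac{1}{N\eta}\,\frac1N\sum_{k\ne i}\operatorname{Im} G^{[i]}_{kk} = \frac{1}{N\eta}\Big(1 - \frac1N\Big)\operatorname{Im} m_N^{(i)},\end{aligned} \tag{7.22}$$
where $|G|^2 = GG^*$. In the middle step we have used the Ward identity valid for the resolvent $R(z) = (A - z)^{-1}$ of any Hermitian matrix $A$:
$$|R(z)|^2 = \frac{1}{|A - E|^2 + \eta^2} = \frac1\eta\operatorname{Im} R(z), \qquad z = E + i\eta. \tag{7.23}$$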
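The Ward identity (7.23), read on the diagonal, says $\sum_l|R_{kl}|^2 = (RR^*)_{kk} = \eta^{-1}\operatorname{Im} R_{kk}$ for every $k$. A quick numerical check on a random $5\times 5$ complex Hermitian matrix (seed, size and $z$ chosen arbitrarily; `inverse` is an illustrative helper):

```python
import random


def inverse(M):
    """Invert a square (possibly complex) matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


rng = random.Random(11)
n = 5
# A random complex Hermitian matrix A.
A = [[0j] * n for _ in range(n)]
for a in range(n):
    A[a][a] = complex(rng.gauss(0, 1), 0)
    for b in range(a + 1, n):
        A[a][b] = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        A[b][a] = A[a][b].conjugate()

E, eta = 0.3, 0.05
z = complex(E, eta)
R = inverse([[A[a][b] - (z if a == b else 0) for b in range(n)] for a in range(n)])
# Row by row: sum_l |R_kl|^2 should equal Im R_kk / eta.
gaps = [abs(sum(abs(R[k][l]) ** 2 for l in range(n)) - R[k][k].imag / eta) for k in range(n)]
print(max(gaps))
```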
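We can estimate the second term in (7.21) by the general fact that, for the resolvent $R = (A - z)^{-1}$ of any Hermitian matrix $A$, we have
$$\sum_k|R_{kk}|^2 \le \sum_\alpha\frac{1}{|\zeta_\alpha - z|^2}, \tag{7.24}$$
where $\zeta_\alpha$ are the eigenvalues of $A$. To see this, let $\mathbf u_\alpha$ be the normalized eigenvectors. Then by the spectral theorem
$$R_{kk} = \sum_\alpha\frac{|\mathbf u_\alpha(k)|^2}{\zeta_\alpha - z},$$
and thus we have
$$\sum_k|R_{kk}|^2 \le \sum_k\sum_{\alpha,\beta}\frac{|\mathbf u_\alpha(k)|^2\,|\mathbf u_\beta(k)|^2}{|\zeta_\alpha - z|\,|\zeta_\beta - z|} \le \sum_\alpha\frac{1}{|\zeta_\alpha - z|^2}\sum_k|\mathbf u_\alpha(k)|^2\sum_\beta|\mathbf u_\beta(k)|^2 = \sum_\alpha\frac{1}{|\zeta_\alpha - z|^2},$$
where we have used the Schwarz inequality and that $\{\mathbf u_\beta\}$ is an orthonormal basis. Applying this bound to the Green function of $H^{[i]}$ with eigenvalues $\mu_\alpha$,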
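we have
$$\frac{1}{N^2}\sum_{k\ne i}|G^{(i)}_{kk}|^2 = \frac{1}{N^2}\sum_{k\ne i}|G^{[i]}_{kk}|^2 \le \frac{1}{N\eta}\,\frac1N\sum_{\alpha=1}^{N-1}\frac{\eta}{|\mu_\alpha - z|^2} = \frac{1}{N\eta}\Big(1 - \frac1N\Big)\operatorname{Im} m_N^{(i)}. \tag{7.25}$$
By (7.16), we can estimate $m_N^{(i)}$ by $m_N$. Thus the estimates (7.22) and (7.25) confirm that the size of $Z_i$ is roughly
$$|Z_i| \lesssim \frac{1}{\sqrt{N\eta}}\sqrt{\operatorname{Im} m_N} \tag{7.26}$$
in the second-moment sense. In Section 7.2 we will prove that this inequality actually holds in the large-deviation sense; i.e., we have
$$|Z_i| \prec \frac{1}{\sqrt{N\eta}}\sqrt{\operatorname{Im} m_N}. \tag{7.27}$$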
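The deterministic inequality (7.24) behind (7.25) can also be checked numerically. By the spectral theorem, $\sum_\alpha|\zeta_\alpha - z|^{-2} = \operatorname{Tr}(RR^*) = \sum_{k,l}|R_{kl}|^2$, which by (7.23) equals $\eta^{-1}\operatorname{Im}\operatorname{Tr} R$; so (7.24) compares the diagonal part of this double sum with the whole. The sketch uses a random real symmetric matrix with a fixed seed and a small $\eta$ (our own choices); `inverse` is an illustrative helper.

```python
import random


def inverse(M):
    """Invert a square (possibly complex) matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


rng = random.Random(13)
n, z = 6, 0.4 + 0.02j
A = [[0.0] * n for _ in range(n)]
for a in range(n):
    for b in range(a, n):
        A[a][b] = A[b][a] = rng.gauss(0.0, 1.0)
R = inverse([[A[a][b] - (z if a == b else 0) for b in range(n)] for a in range(n)])

diag_sum = sum(abs(R[k][k]) ** 2 for k in range(n))                    # left side of (7.24)
full_sum = sum(abs(R[k][l]) ** 2 for k in range(n) for l in range(n))  # Tr(R R^*): right side of (7.24)
ward = sum(R[k][k].imag for k in range(n)) / z.imag                    # Im Tr R / eta, by (7.23)
print(diag_sum, full_sum, ward)
```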
The diagonal entry $h_{ii}$ can be easily estimated. Since the single-entry distribution has finite moments (5.6), we have
$$\mathbb P\big(|h_{ii}| \ge N^\varepsilon N^{-1/2}\big) \le C_pN^{-\varepsilon p}$$
for each $i$ and for any $\varepsilon > 0$ and $p \in \mathbb N$. Hence we can guarantee that all diagonal elements $h_{ii}$ simultaneously satisfy $|h_{ii}| \prec N^{-1/2}$.
Step 4. Initial estimate at large scales.
To control the error terms in the self-consistent equation (7.18), we need
two inputs. First, from now on we assume that (7.27) holds. Since 𝑁𝜂 is large,
this implies that 𝑍𝑖 is small provided that Im 𝑚𝑁 is bounded. Second, we need
to ensure that the denominator in the right-hand side of (7.18) does not become
too small. Since the main term in this denominator is 𝑧 + 𝑚𝑁 (𝑧), our task is to
show that
$$\frac{1}{|z + m_N(z)|} \prec 1. \tag{7.28}$$
Notice that both inputs are in terms of the yet uncontrolled quantity 𝑚𝑁 ; they
would be trivially available if 𝑚𝑁 were replaced with 𝑚sc (see Lemma 6.2). Since
the smallness of 𝑚𝑁 −𝑚sc is our goal, to break this apparently circular argument
we will use a bootstrap strategy. The convenient bootstrap parameter is 𝜂. We
first establish the result for large 𝜂 in this section, which is called the initial
estimate. Then, in the next section, step by step we reduce the value 𝜂 by using
the control from the previous scale to estimate Im 𝑚𝑁 and |𝑧 + 𝑚𝑁 (𝑧)|−1 . This
control will use the large-deviation bounds on 𝑍𝑖 , which hold with very high
probability. Hence at each step an exceptional event of very small probability
will have to be excluded. This is the main reason why the bootstrap argument is
done in small discrete steps, although in essence this is a continuity argument.


We will explain this important argument in more detail in the next section.
Both the initial estimate and the continuity argument use the following simple idea. By expanding $\Omega_i$ in the denominator in (7.18) and using (7.27) and $h_{ii} \prec N^{-1/2}$, we have
$$\begin{aligned}\Big|m_N(z) + \frac{1}{z + m_N(z)}\Big| &\prec \frac{1}{|z + m_N(z)|^2}\max_i|\Omega_i|\\ &\prec \frac{1}{|z + m_N(z)|^2}\Big[\sqrt{\frac{\operatorname{Im} m_N}{N\eta}} + N^{-1/2} + (N\eta)^{-1}\Big]\end{aligned} \tag{7.29}$$
provided that $\Omega_i$ can be considered as a small perturbation of $z + m_N(z)$, i.e.,
$$\frac{1}{|z + m_N(z)|}\max_i|\Omega_i| \ll 1, \tag{7.30}$$
so that the expansion is justified. We can now use the following elementary
lemma.
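Lemma 7.6. Fix $z = E + i\eta$, with $|E| \le 20$, $0 < \eta \le 10$, and set $\kappa := \big||E| - 2\big|$. Suppose that $m$ satisfies the inequality
$$\Big|m + \frac{1}{z + m}\Big| \le \delta \tag{7.31}$$
for some $\delta \le 1$. Then
$$\min\Big\{|m - m_{\mathrm{sc}}(z)|,\ \Big|m - \frac{1}{m_{\mathrm{sc}}(z)}\Big|\Big\} \le \frac{C\delta}{\sqrt{\kappa + \eta + \delta}} \le C\sqrt\delta. \tag{7.32}$$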
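Proof. For $\delta \le 1$ and $|z| \le 20$, (7.31) implies that $|m| \le 22$. Write (7.31) as
$$m + \frac{1}{z + m} =: \Delta, \qquad |\Delta| \le \delta,$$
and subtract this from the equation $m_{\mathrm{sc}} + (z + m_{\mathrm{sc}})^{-1} = 0$. After some simple algebra we get
$$(m - m_{\mathrm{sc}})\Big[m - \frac{1}{m_{\mathrm{sc}}}\Big] = \Delta(m + z), \tag{7.33}$$
i.e.,
$$|m - m_{\mathrm{sc}}|\,\Big|m - \frac{1}{m_{\mathrm{sc}}}\Big| \le C\delta, \tag{7.34}$$
for some fixed constant $C$, where we have used (6.11).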
We separate two cases. If $|1 - m_{\mathrm{sc}}^2| \le C'\sqrt\delta$ for some large $C'$, then we write the above inequality as
$$|m - m_{\mathrm{sc}}|\,\Big|m - m_{\mathrm{sc}} - \frac{1 - m_{\mathrm{sc}}^2}{m_{\mathrm{sc}}}\Big| \le C\delta. \tag{7.35}$$
We claim that this implies $|m - m_{\mathrm{sc}}| \le 2C'\sqrt\delta/|m_{\mathrm{sc}}|$. Indeed, if $|m - m_{\mathrm{sc}}| \ge 2C'\sqrt\delta/|m_{\mathrm{sc}}|$ were true, the second factor in (7.35) would be at least $C'\sqrt\delta/|m_{\mathrm{sc}}|$, so the left-hand side of (7.35) would be at least $2(C'/|m_{\mathrm{sc}}|)^2\delta$. Since $m_{\mathrm{sc}} \asymp 1$ (see (6.11)), we would get a contradiction if $C'$ is large enough. Thus we proved $|m - m_{\mathrm{sc}}| \le 2C'\sqrt\delta/|m_{\mathrm{sc}}| \le C\sqrt\delta$ in this case, which is in agreement with the first inequality in (7.32) since the condition $|1 - m_{\mathrm{sc}}^2| \le C'\sqrt\delta$ also implies $\sqrt{\kappa + \eta} \lesssim \sqrt\delta$ by (6.12).
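In the second case we have $|1 - m_{\mathrm{sc}}^2| \ge C'\sqrt\delta$, i.e., $\sqrt{\kappa + \eta} \gtrsim \sqrt\delta$, so for (7.32) we need to prove that $|m - m_{\mathrm{sc}}| \lesssim \delta/\sqrt{\kappa + \eta}$ or $|m - m_{\mathrm{sc}}^{-1}| \lesssim \delta/\sqrt{\kappa + \eta}$. If $|m - m_{\mathrm{sc}}| \le \frac12|1 - m_{\mathrm{sc}}^2|/|m_{\mathrm{sc}}|$, then the second factor in (7.35) would be at least $\frac12|1 - m_{\mathrm{sc}}^2|/|m_{\mathrm{sc}}| \asymp \sqrt{\kappa + \eta}$, so we would immediately get $|m - m_{\mathrm{sc}}| \lesssim \delta/\sqrt{\kappa + \eta}$. We are left with the case $|m - m_{\mathrm{sc}}| \ge \frac12|1 - m_{\mathrm{sc}}^2|/|m_{\mathrm{sc}}|$, i.e., $|m - m_{\mathrm{sc}}| \gtrsim \sqrt{\kappa + \eta}$.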
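Rewrite (7.35) as
$$\Big|m - \frac{1}{m_{\mathrm{sc}}} + \frac{1 - m_{\mathrm{sc}}^2}{m_{\mathrm{sc}}}\Big|\,\Big|m - \frac{1}{m_{\mathrm{sc}}}\Big| \le C\delta, \tag{7.36}$$
and repeat the previous argument with the roles of $m_{\mathrm{sc}}$ and $1/m_{\mathrm{sc}}$ interchanged. We conclude that either $|m - m_{\mathrm{sc}}^{-1}| \le \frac12|1 - m_{\mathrm{sc}}^2|/|m_{\mathrm{sc}}|$, in which case we immediately get $|m - m_{\mathrm{sc}}^{-1}| \lesssim \delta/\sqrt{\kappa + \eta}$, or $|m - m_{\mathrm{sc}}^{-1}| \gtrsim \sqrt{\kappa + \eta}$. In the latter case, combining it with $|m - m_{\mathrm{sc}}| \gtrsim \sqrt{\kappa + \eta}$ from the previous argument, we would get $|m - m_{\mathrm{sc}}^{-1}|\,|m - m_{\mathrm{sc}}| \gtrsim \kappa + \eta$. Since $\kappa + \eta \gtrsim |1 - m_{\mathrm{sc}}^2|^2 \ge (C')^2\delta$, this would contradict (7.34) if $C'$ is large enough. This completes the proof of the lemma. □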
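Lemma 7.6 can be illustrated by solving the perturbed equation exactly: $m + 1/(z + m) = \Delta$ is the quadratic $m^2 + (z - \Delta)m + 1 - \Delta z = 0$, and by (7.33) each root satisfies $\min\{|m - m_{\mathrm{sc}}|, |m - m_{\mathrm{sc}}^{-1}|\} \le (|\Delta|\,|z + m|)^{1/2}$. The sketch (our own helper names, sample points and $\delta = 10^{-4}$) prints the distance for a bulk point, where it is of order $\delta$, and for a point near the edge $E = 2$, where it is of order $\sqrt\delta$, matching the two regimes in (7.32).

```python
import cmath
import math


def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with Im m > 0 (Stieltjes transform of the semicircle law)."""
    r = cmath.sqrt(z * z - 4)
    m1, m2 = (-z + r) / 2, (-z - r) / 2
    return m1 if m1.imag > 0 else m2


def perturbed_roots(z, Delta):
    """Both solutions m of m + 1/(z + m) = Delta, i.e. of m^2 + (z - Delta) m + 1 - Delta z = 0."""
    b, c = z - Delta, 1 - Delta * z
    r = cmath.sqrt(b * b - 4 * c)
    return (-b + r) / 2, (-b - r) / 2


def dist_to_branches(m, z):
    """min{|m - m_sc(z)|, |m - 1/m_sc(z)|}, the left-hand side of (7.32)."""
    ms = m_sc(z)
    return min(abs(m - ms), abs(m - 1 / ms))


delta = 1e-4
for label, z in (("bulk", 0.1j), ("edge", 2 + 1e-6j)):
    for m in perturbed_roots(z, delta):
        print(label, dist_to_branches(m, z), math.sqrt(delta))
```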

Applying Lemma 7.6 in (7.29), we see that
$$\min\Big\{|m_N(z) - m_{\mathrm{sc}}(z)|,\ \Big|m_N(z) - \frac{1}{m_{\mathrm{sc}}(z)}\Big|\Big\} \prec \frac{1}{|z + m_N(z)|}\min\Big\{\big(\max_i|\Omega_i|\big)^{1/2},\ \frac{\max_i|\Omega_i|}{\sqrt\kappa}\Big\}; \tag{7.37}$$
i.e., a good bound on $\Omega_i$ directly yields an estimate on $m_N - m_{\mathrm{sc}}$ provided a lower bound on $|z + m_N(z)|$ is given and if the possibility that $|m_N(z) - m_{\mathrm{sc}}^{-1}(z)|$ is small can be excluded.
Using this scheme, we now derive the initial estimate, which is (7.1) for any $\eta \ge N^{-1/16}$. To check (7.30), we start with the trivial bound $\operatorname{Im} m_N \le \eta^{-1}$. By (7.27), we have $Z_i \prec N^{-15/32}$ for $\eta \ge N^{-1/16}$. Hence we have
$$\max_i|\Omega_i| \le \max_i\big(|Z_i| + |h_{ii}|\big) + CN^{-15/16} \prec N^{-15/32}. \tag{7.38}$$
Since $\operatorname{Im}(z + m_N(z)) \ge \eta \ge N^{-1/16}$, (7.30) is satisfied for $\eta \ge N^{-1/16}$. Thus (7.29) implies that
$$\Big|m_N(z) + \frac{1}{z + m_N(z)}\Big| \prec \frac{1}{|z + m_N(z)|^2}\,O(N^{-15/32}) \prec N^{-11/32} \tag{7.39}$$
for any $\eta \ge N^{-1/16}$. The precise exponents do not matter here; our goal is to find some $\eta = N^{-c}$ such that the last equation holds with an error $N^{-c'}$ for some $c, c' > 0$.
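Applying Lemma 7.6 we obtain, for any $\eta \ge N^{-1/16}$, that either
$$|m_N(z) - m_{\mathrm{sc}}(z)| \prec N^{-11/64} \quad\text{or}\quad \Big|m_N(z) - \frac{1}{m_{\mathrm{sc}}(z)}\Big| \prec N^{-11/64}. \tag{7.40}$$
However, the second option is excluded since $\operatorname{Im} m_N \ge 0$ and thus
$$\Big|m_N(z) - \frac{1}{m_{\mathrm{sc}}(z)}\Big| \ge \operatorname{Im} m_N - \operatorname{Im}\frac{1}{m_{\mathrm{sc}}(z)} \ge c\operatorname{Im} m_{\mathrm{sc}}(z) \ge c\eta \ge cN^{-1/16}, \tag{7.41}$$
where we used (6.13). This proves that
$$|m_N(z) - m_{\mathrm{sc}}(z)| \prec N^{-11/64}. \tag{7.42}$$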
Once we know that $|m_N(z) - m_{\mathrm{sc}}(z)|$ is small, we can simply perturb the relation
$$\Big|\frac{1}{z + m_{\mathrm{sc}}(z)}\Big| = |m_{\mathrm{sc}}(z)| \le 1$$
from Lemma 6.2 to obtain
$$|m_N(z)| \le C \quad\text{and}\quad \Big|\frac{1}{z + m_N(z)}\Big| \le C. \tag{7.43}$$
Thus we can bound $\Omega_i$ from (7.27) and $h_{ii} \prec N^{-1/2}$ by
$$\max_i|\Omega_i| \prec \frac{1}{\sqrt{N\eta}}, \tag{7.44}$$
and, instead of (7.39), we get the stronger bound
$$\Big|m_N(z) + \frac{1}{z + m_N(z)}\Big| \prec \frac{1}{\sqrt{N\eta}}.$$
Applying (7.32) once again, with $\delta = (N\eta)^{-1/2}$ and using (7.41) to exclude the possibility that $m_N$ is close to $1/m_{\mathrm{sc}}$, we obtain the better bound
$$|m_N(z) - m_{\mathrm{sc}}(z)| \prec \min\Big(\frac{1}{(N\eta)^{1/4}},\ \frac{1}{\sqrt{N\eta\kappa}}\Big) \quad\text{for any } \eta \ge N^{-1/16}, \tag{7.45}$$
which is exactly (7.1) for $\eta \ge N^{-1/16}$.
Step 5. Continuity argument and completion of the proof. With (7.1) proven
for any 𝜂 ≥ 𝑁 −1/16 , we now proceed to reduce the scale of 𝜂, while 𝐸, the real
part of $z$, is kept fixed. To do this, choose another scale $\eta_1 = \eta - N^{-4}$ slightly smaller than $\eta$ (but still $\eta_1 \ge N^{-1}$). Since
$$|m_N'(z)| \le N^{-1}\sum_\alpha\frac{1}{|\lambda_\alpha - z|^2} \le \eta^{-2}, \tag{7.46}$$
we have the deterministic relation
$$|m_N(z_1) - m_N(z)| \le N^{-2}, \qquad z = E + i\eta,\quad z_1 = E + i\eta_1. \tag{7.47}$$
Hence (7.1) holds as well for 𝜂1 instead of 𝜂 since 𝑁 −2 is smaller than all our
error terms. But we cannot rely on this idea for the obvious reason that we would
need to apply this procedure at least 𝑁 4 times and the many small errors would
accumulate. We need to show that the estimate in (7.1) does not deteriorate even
by the small amount 𝑁 −2 . Since the definition of ≺ includes additional factors
𝑁 𝜀 , it is not sufficiently sensitive to track such precision. For the continuity
argument we will now abandon the formalism of stochastic domination and
go back to tracking exceptional sets precisely. In Chapter 8 we will set up a
continuity scheme fully within the framework of stochastic domination, but
here we complete the proof in a more elementary way.
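The deterministic step (7.46)–(7.47) does not need any randomness: for arbitrary real points $\lambda_\alpha$ and $\eta_1 = \eta - N^{-4} \ge N^{-1}$ one has $|m_N(z_1) - m_N(z)| \le N^{-4}/(\eta\eta_1) \le N^{-2}$. The sketch below (hypothetical points, $N = 10$, $E = 0.7$, all our own choices) walks down from $\eta = 1$ to $\eta = N^{-1}$ in steps of $N^{-4}$, recording the largest jump relative to $N^{-2}$ and the number of steps, which is of order $N^4$ as stated in the text.

```python
import random

rng = random.Random(17)
N = 10
lam = [rng.uniform(-2.5, 2.5) for _ in range(N)]  # arbitrary real points; the bound is deterministic


def m_N(z):
    """Normalized Stieltjes transform of the point set lam."""
    return sum(1 / (l - z) for l in lam) / N


E, step = 0.7, N ** -4
steps, worst, eta = 0, 0.0, 1.0
while eta - step >= 1 / N:             # keep eta_1 = eta - N^-4 >= N^-1
    eta1 = eta - step
    jump = abs(m_N(complex(E, eta1)) - m_N(complex(E, eta)))
    worst = max(worst, jump * N ** 2)  # (7.47) asserts jump <= N^-2, i.e. ratio <= 1
    eta, steps = eta1, steps + 1
print(steps, worst)
```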
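The idea is to show that for any small exponent $\varepsilon \in (0, \gamma/10)$ there is a constant $C_1$ large enough such that if
$$S(z) := \min\big\{|m_N(z) - m_{\mathrm{sc}}(z)|,\ |m_N(z) - m_{\mathrm{sc}}^{-1}(z)|\big\} \le C_1N^\varepsilon\min\Big(\frac{1}{(N\eta)^{1/4}},\ \frac{1}{\sqrt{N\eta\kappa}}\Big) \tag{7.48}$$
in a set $A$ in the probability space, then we not only have the bound
$$S(z_1) \le C_1N^\varepsilon\min\Big(\frac{1}{(N\eta)^{1/4}},\ \frac{1}{\sqrt{N\eta\kappa}}\Big) + 2N^{-2} \tag{7.49}$$
that trivially follows from (7.47), but we also have that
$$S(z_1) \le C_1N^\varepsilon\min\Big(\frac{1}{(N\eta_1)^{1/4}},\ \frac{1}{\sqrt{N\eta_1\kappa}}\Big) \tag{7.50}$$
holds with the same $C_1$ and $\varepsilon$ in a set $A_1 \subset A$ with
$$P(A\setminus A_1) \le N^{-D} \quad\text{for any } D > 0. \tag{7.51}$$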
Thus the deterioration of the estimate from (7.47) to (7.50) can be avoided at least with a very high probability. To see this, we note that in the regime $\eta \ge N^{-1+5\varepsilon}$ the bound (7.49) implies that in the set $A$ we have
$$|\operatorname{Im} m_N(z_1)| \le |m_N(z_1)| \le |m_{\mathrm{sc}}(z_1)| + o(1) \le 2 \tag{7.52}$$
and
$$\frac{1}{|z_1 + m_N(z_1)|} \le 2. \tag{7.53}$$
The last estimate is a perturbative consequence of the bounds $|z + m_{\mathrm{sc}}(z)|^{-1} \le 1$ and $|z + m_{\mathrm{sc}}^{-1}(z)|^{-1} \le C$, where we used either
$$|(z_1 + m_N(z_1)) - (z + m_{\mathrm{sc}}(z))| \le |m_N(z_1) - m_N(z)| + |z_1 - z| + |m_N(z) - m_{\mathrm{sc}}(z)| = o(1) \tag{7.54}$$
or
$$|(z_1 + m_N(z_1)) - (z + m_{\mathrm{sc}}^{-1}(z))| \le |m_N(z_1) - m_N(z)| + |z_1 - z| + |m_N(z) - m_{\mathrm{sc}}^{-1}(z)| = o(1);$$
one of which follows from (7.47) and (7.48). Let $A_1$ be the intersection of three sets: $A$, the set on which $|h_{ii}| \le N^{-1/2+\varepsilon}$ holds, and the set on which (7.27) holds at the scale $\eta_1$, i.e., at $z = E + i\eta_1$. Hence, together with (7.53), the condition (7.30) holds in $A_1$. Now we can use (7.37) together with (7.52) and (7.53) to prove that (7.50) holds if $C_1$ is chosen large enough.
At each step, we lose a set of probability $N^{-D}$ due to checking (7.27) at that scale. The estimate from the previous scale is used only to check (7.52), (7.53), and (7.30). Finally, the constant $C_1$ in the final bound of $S(z_1)$ comes from the constant in (7.32), which is uniform in $\eta$. Therefore, $C_1$ does not deteriorate when passing to the next scale. The only price to pay is the loss of an exceptional set of probability $N^{-D}$. Since the number of steps is $N^4$, while $D$ can be arbitrarily large, this loss is affordable. Since the exponent $\varepsilon > 0$ was arbitrary, this proves
$$S(z) = \min\big\{|m_N(z) - m_{\mathrm{sc}}(z)|,\ |m_N(z) - m_{\mathrm{sc}}^{-1}(z)|\big\} \prec \min\Big\{\frac{1}{\sqrt{N\eta\kappa}},\ \frac{1}{(N\eta)^{1/4}}\Big\} \tag{7.55}$$
for any fixed $z \in \mathbf D_\gamma$. Using this relation for a discrete net $\hat{\mathbf D}_\gamma := \mathbf D_\gamma \cap N^{-4}(\mathbb Z + i\mathbb Z)$ and a union bound, we obtain that (7.55) holds simultaneously for all $z \in \hat{\mathbf D}_\gamma$. Using the Lipschitz continuity of $S(z)$ (with a Lipschitz constant at most $N^2$), we get (7.55) simultaneously for all $z \in \mathbf D_\gamma$.
Finally, we need to show that the estimate holds not only for the minimum, but for $|m_N(z) - m_{\mathrm{sc}}(z)|$. We have already seen this for any $\eta \ge N^{-1/16}$ in (7.45). Fixing $E = \operatorname{Re} z$, we consider $m_N(E + i\eta)$, $m_{\mathrm{sc}}(E + i\eta)$, and $S(E + i\eta)$ as functions of $\eta$. Since these are continuous functions, by reducing $\eta$ we obtain that $m_N(E + i\eta)$ remains close to $m_{\mathrm{sc}}(E + i\eta)$ as long as the difference
$$|m_{\mathrm{sc}}(E + i\eta) - m_{\mathrm{sc}}^{-1}(E + i\eta)| \asymp \sqrt{\kappa_E + \eta}$$
is larger than the right-hand side of (7.55). The right-hand side of (7.55) increases as $\eta$ decreases, while the separation bound $\sqrt{\kappa_E + \eta}$ decreases. Therefore, once $\eta$ is so small that the separation bound $\sqrt{\kappa_E + \eta}$ becomes smaller than the right-hand side of (7.55), the difference between $m_{\mathrm{sc}}$ and $m_{\mathrm{sc}}^{-1}$ remains irrelevant for any smaller $\eta$. Thus (7.1) holds on the entire $\mathbf D_\gamma$. This proves the weak local semicircle law, i.e., Theorem 7.1, except for the large-deviation estimate (7.27), which we will prove in the next subsection.

7.2. LARGE-DEVIATION ESTIMATES 57

7.2. Large-Deviation Estimates


Finally, in order to estimate large sums of independent random variables as
in (7.8) and later in (8.2), we will need a large-deviation estimate for linear and
quadratic functionals of independent random variables. The case of the linear
functionals is standard. Quadratic functionals were considered in [62, 69]; the
current formulation is taken from [52].
Theorem 7.7 (Large-deviation bounds). Let $(X_i^{(N)})$ and $(Y_i^{(N)})$ be independent families of random variables and $(a_{ij}^{(N)})$ and $(b_i^{(N)})$ be deterministic; here $N \in \mathbb N$ and $i, j = 1,\dots,N$. Suppose that all entries $X_i^{(N)}$ and $Y_i^{(N)}$ are independent and satisfy
$$\mathbb EX = 0, \qquad \mathbb E|X|^2 = 1, \qquad \|X\|_p := (\mathbb E|X|^p)^{1/p} \le \mu_p, \tag{7.56}$$
for all $p \in \mathbb N$ and some constants $\mu_p$. Then we have the bounds
$$\sum_i b_iX_i \prec \Big(\sum_i|b_i|^2\Big)^{1/2}, \tag{7.57}$$
$$\sum_{i,j}a_{ij}X_iY_j \prec \Big(\sum_{i,j}|a_{ij}|^2\Big)^{1/2}, \tag{7.58}$$
$$\sum_{i\ne j}a_{ij}X_iX_j \prec \Big(\sum_{i\ne j}|a_{ij}|^2\Big)^{1/2}. \tag{7.59}$$

Our proof in fact generalizes trivially to arbitrary multilinear estimates for quantities of the form $\sum^*_{i_1,\dots,i_k}a_{i_1\dots i_k}(u)X_{i_1}(u)\cdots X_{i_k}(u)$, where the star indicates that the summation indices are constrained to be distinct.
For our purposes the most important inequality is the last one, (7.59). It confirms the intuition from Step 3 on page 49 that for sums of the form $\sum_{i\ne j}a_{ij}X_iX_j$ the second-moment calculation gives the correct order of magnitude, and it can be improved to a high-probability statement in a large-deviation sense. Indeed, the second-moment calculation gives
$$\mathbb E\Big|\sum_{i\ne j}a_{ij}X_iX_j\Big|^2 = \sum_{i\ne j}\ \sum_{i'\ne j'}a_{ij}\overline{a_{i'j'}}\,\mathbb E\big[X_iX_j\overline{X_{i'}}\,\overline{X_{j'}}\big] = \sum_{i\ne j}\big[|a_{ij}|^2 + a_{ij}\overline{a_{ji}}\big] \le 2\sum_{i\ne j}|a_{ij}|^2,$$
since only the pairings $i = i'$, $j = j'$ and $i = j'$, $j = i'$ give nonzero contribution.
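Since the pairing computation is exact, it can be confirmed by brute force. In the sketch below the $X_i$ are taken to be Rademacher signs (an assumption; they satisfy (7.56) with $\mu_p = 1$), $N = 5$, and $(a_{ij})$ is a random complex matrix with a fixed seed; the expectation is evaluated exactly by averaging over all $2^N$ sign vectors.

```python
import itertools
import random

rng = random.Random(19)
N = 5
a = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) if i != j else 0j for j in range(N)] for i in range(N)]

# Exact expectation over all 2^N Rademacher sign vectors (X_i = +-1: mean 0, variance 1).
second_moment = 0.0
for X in itertools.product((-1, 1), repeat=N):
    s = sum(a[i][j] * X[i] * X[j] for i in range(N) for j in range(N) if i != j)
    second_moment += abs(s) ** 2 / 2 ** N

# The two surviving pairings: i = i', j = j' and i = j', j = i'.
pairings = sum(abs(a[i][j]) ** 2 + a[i][j] * a[j][i].conjugate()
               for i in range(N) for j in range(N) if i != j)
hs = sum(abs(a[i][j]) ** 2 for i in range(N) for j in range(N) if i != j)
print(second_moment, pairings.real, 2 * hs)
```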


To prepare the proof of Theorem 7.7, we first recall the following version of
the Marcinkiewicz-Zygmund inequality.
Lemma 7.8. Let $X_1,\dots,X_N$ be a family of independent random variables, each satisfying (7.56), and suppose that the family $(b_i)$ is deterministic. Then
$$\Big\|\sum_i b_iX_i\Big\|_p \le (Cp)^{1/2}\mu_p\Big(\sum_i|b_i|^2\Big)^{1/2}. \tag{7.60}$$

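Proof. The proof is a simple application of Jensen's inequality. Writing $B^2 := \sum_i|b_i|^2$, we get, by the classical Marcinkiewicz–Zygmund inequality [127] in the first line, that
$$\begin{aligned}\Big\|\sum_i b_iX_i\Big\|_p^p &\le (Cp)^{p/2}\Big\|\Big(\sum_i|b_i|^2|X_i|^2\Big)^{1/2}\Big\|_p^p = (Cp)^{p/2}B^p\,\mathbb E\Big[\Big(\sum_i\frac{|b_i|^2}{B^2}|X_i|^2\Big)^{p/2}\Big]\\ &\le (Cp)^{p/2}B^p\,\mathbb E\Big[\sum_i\frac{|b_i|^2}{B^2}|X_i|^p\Big] \le (Cp)^{p/2}B^p\mu_p^p.\end{aligned}$$
Here Jensen's inequality applies in the second line because $x \mapsto x^{p/2}$ is convex for $p \ge 2$ and the weights $|b_i|^2/B^2$ sum to one. □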
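For Rademacher variables ($X_i = \pm1$ with probability $\frac12$, so $\mu_p = 1$) the constant in (7.60) is classical: for even $p$ one has $\|\sum_ib_iX_i\|_p \le \sqrt{p-1}\,(\sum_ib_i^2)^{1/2}$, because the even moments of a Rademacher sum are dominated by those of the Gaussian with the same variance, and $((p-1)!!)^{1/p} \le \sqrt{p-1}$. The sketch below checks this by exact enumeration for a random real vector $b$ of length 8 (seed, size and the helper `p_norm` are our own choices); it illustrates the $\sqrt p$ growth in (7.60), not the general statement.

```python
import itertools
import math
import random

rng = random.Random(23)
N = 8
b = [rng.gauss(0, 1) for _ in range(N)]
l2 = math.sqrt(sum(x * x for x in b))


def p_norm(p):
    """Exact L^p norm of sum_i b_i X_i, averaging over all 2^N Rademacher sign vectors."""
    total = sum(abs(sum(bi * x for bi, x in zip(b, X))) ** p
                for X in itertools.product((-1, 1), repeat=N))
    return (total / 2 ** N) ** (1 / p)


for p in (2, 4, 6, 8):
    print(p, p_norm(p) / l2, math.sqrt(p - 1))
```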

Next, we prove the following intermediate result.


Lemma 7.9. Let $X_1,\dots,X_N, Y_1,\dots,Y_N$ be independent random variables, each satisfying (7.56), and suppose that the family $(a_{ij})$ is deterministic. Then for all $p \ge 2$ we have
$$\Big\|\sum_{i,j}a_{ij}X_iY_j\Big\|_p \le Cp\mu_p^2\Big(\sum_{i,j}|a_{ij}|^2\Big)^{1/2}.$$

Proof. Write
$$\sum_{i,j}a_{ij}X_iY_j = \sum_j b_jY_j, \qquad b_j := \sum_i a_{ij}X_i.$$
Note that $(b_j)$ and $(Y_j)$ are independent families. By conditioning on the family $(b_j)$, we therefore get from Lemma 7.8 and the triangle inequality that
$$\Big\|\sum_j b_jY_j\Big\|_p \le (Cp)^{1/2}\mu_p\Big\|\sum_j|b_j|^2\Big\|_{p/2}^{1/2} \le (Cp)^{1/2}\mu_p\Big(\sum_j\|b_j\|_p^2\Big)^{1/2}.$$
Using Lemma 7.8 again, we have
$$\|b_j\|_p \le (Cp)^{1/2}\mu_p\Big(\sum_i|a_{ij}|^2\Big)^{1/2}.$$
This concludes the proof. □
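Running the proof of Lemma 7.9 with the Rademacher form of Lemma 7.8, $\|\sum_jb_jY_j\|_p \le \sqrt{p-1}\,(\sum_jb_j^2)^{1/2}$ for even $p$ (a classical Khintchine bound, via comparison with Gaussian moments), gives $\|\sum_{i,j}a_{ij}X_iY_j\|_p \le (p-1)(\sum_{i,j}a_{ij}^2)^{1/2}$, which shows where the linear dependence on $p$ comes from: each of the two applications of Lemma 7.8 contributes a factor of order $\sqrt p$. The sketch checks this by exact enumeration over all $2^4\cdot 2^4$ sign configurations for a random real $4\times4$ matrix (seed and sizes are assumptions).

```python
import itertools
import math
import random

rng = random.Random(29)
n = 4
a = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
frob = math.sqrt(sum(x * x for row in a for x in row))


def p_norm(p):
    """Exact L^p norm of sum_{i,j} a_ij X_i Y_j over all Rademacher sign vectors X and Y."""
    total = 0.0
    for X in itertools.product((-1, 1), repeat=n):
        for Y in itertools.product((-1, 1), repeat=n):
            s = sum(a[i][j] * X[i] * Y[j] for i in range(n) for j in range(n))
            total += abs(s) ** p
    return (total / 4 ** n) ** (1 / p)


for p in (2, 4, 6):
    print(p, p_norm(p) / frob, p - 1)
```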


Lemma 7.10. Let $X_1,\dots,X_N$ be independent random variables, each satisfying (7.56), and suppose that the family $(a_{ij})$ is deterministic. Then we have
$$\Big\|\sum_{i\ne j}a_{ij}X_iX_j\Big\|_p \le Cp\mu_p^2\Big(\sum_{i\ne j}|a_{ij}|^2\Big)^{1/2}.$$
Proof. The proof relies on the identity (valid for $i \ne j$)
$$1 = \frac{1}{Z_N}\sum_{I\sqcup J = \mathbb N_N}\mathbf 1(i \in I)\mathbf 1(j \in J), \tag{7.61}$$
where the sum ranges over all partitions of $\mathbb N_N = \{1,\dots,N\}$ into two sets $I$ and $J$, and $Z_N := 2^{N-2}$ is independent of $i$ and $j$. Moreover, we have
$$\sum_{I\sqcup J = \mathbb N_N}1 = 2^N - 2, \tag{7.62}$$
where the sum ranges over nonempty subsets $I$ and $J$. Now we may estimate
$$\Big\|\sum_{i\ne j}a_{ij}X_iX_j\Big\|_p \le \frac{1}{Z_N}\sum_{I\sqcup J = \mathbb N_N}\Big\|\sum_{i\in I}\sum_{j\in J}a_{ij}X_iX_j\Big\|_p \le \frac{1}{Z_N}\sum_{I\sqcup J = \mathbb N_N}Cp\mu_p^2\Big(\sum_{i\ne j}|a_{ij}|^2\Big)^{1/2},$$
where we used that, for any partition $I \sqcup J = \mathbb N_N$, the families $(X_i)_{i\in I}$ and $(X_j)_{j\in J}$ are independent, and hence Lemma 7.9 is applicable. The claim now follows from (7.62), since $(2^N - 2)/Z_N \le 4$. □
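Both counting facts (7.61) and (7.62), and the resulting decoupling $\sum_{i\ne j}a_{ij}x_ix_j = Z_N^{-1}\sum_{I\sqcup J}\sum_{i\in I,j\in J}a_{ij}x_ix_j$, can be verified by enumerating all ordered pairs $(I, J)$ as bitmasks. In this sketch (indices from 0, $N = 6$, random real data with a fixed seed, all our own choices), pairs with $I$ or $J$ empty are included in the enumeration; they contribute nothing to the double sum.

```python
import random

rng = random.Random(31)
N = 6
a = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(N)]
x = [rng.gauss(0, 1) for _ in range(N)]

# Ordered pairs (I, J) with I and J disjoint and covering {0, ..., N-1}, encoded by the bitmask of I.
partitions = [({k for k in range(N) if (mask >> k) & 1}, {k for k in range(N) if not (mask >> k) & 1})
              for mask in range(2 ** N)]
Z_N = 2 ** (N - 2)

counts = {(i, j): sum(1 for I, J in partitions if i in I and j in J)
          for i in range(N) for j in range(N) if i != j}          # (7.61): all equal Z_N
nonempty = sum(1 for I, J in partitions if I and J)               # (7.62): 2^N - 2

lhs = sum(a[i][j] * x[i] * x[j] for i in range(N) for j in range(N) if i != j)
rhs = sum(a[i][j] * x[i] * x[j] for I, J in partitions for i in I for j in J) / Z_N
print(set(counts.values()), nonempty, lhs, rhs)
```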
As remarked above, the proof of Lemma 7.10 may be easily extended to multilinear expressions of the form $\sum^*_{i_1,\dots,i_k}a_{i_1\dots i_k}X_{i_1}\cdots X_{i_k}$.
We may now complete the proof of Theorem 7.7.
Proof of Theorem 7.7. The proof is a simple application of Chebyshev's inequality. Part (i) follows from Lemma 7.8, part (ii) from Lemma 7.9, and part (iii) from Lemma 7.10. We give the details for part (iii).
For $\varepsilon > 0$ and $D > 0$ we have
$$\begin{aligned}\mathbb P\Big[\Big|\sum_{i\ne j}a_{ij}X_iX_j\Big| \ge N^\varepsilon\Psi\Big] &\le \mathbb P\Big[\Big|\sum_{i\ne j}a_{ij}X_iX_j\Big| \ge N^\varepsilon\Psi,\ \Big(\sum_{i\ne j}|a_{ij}|^2\Big)^{1/2} \le N^{\varepsilon/2}\Psi\Big] + \mathbb P\Big[\Big(\sum_{i\ne j}|a_{ij}|^2\Big)^{1/2} \ge N^{\varepsilon/2}\Psi\Big]\\ &\le \mathbb P\Big[\Big|\sum_{i\ne j}a_{ij}X_iX_j\Big| \ge N^{\varepsilon/2}\Big(\sum_{i\ne j}|a_{ij}|^2\Big)^{1/2}\Big] + N^{-D-1}\\ &\le \Big(\frac{Cp\mu_p^2}{N^{\varepsilon/2}}\Big)^p + N^{-D-1}\end{aligned}$$
for arbitrary $D$. In the second step we used the definition of $(\sum_{i\ne j}|a_{ij}|^2)^{1/2} \prec \Psi$ with parameters $\varepsilon/2$ and $D + 1$. In the last step we used Lemma 7.10 by conditioning on $(a_{ij})$. Given $\varepsilon$ and $D$, there is a large enough $p$ such that the first term on the last line is bounded by $N^{-D-1}$ for all sufficiently large $N$. Since $\varepsilon$ and $D$ were arbitrary, the proof is complete.
The claimed uniformity in $u$ in the case that $a_{ij}$ and $X_i$ depend on an index $u$ also follows from the above estimate. □
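To make the choice of $p$ in the last step explicit, here is one admissible choice (a sketch; $C$ and $\mu_p$ are the constants of Lemma 7.10, and $N_0$ is notation introduced only here):
$$\Big(\frac{Cp\mu_p^2}{N^{\varepsilon/2}}\Big)^p \le N^{-D-1} \iff (Cp\mu_p^2)^p \le N^{\varepsilon p/2 - D - 1}.$$
With $p := \lceil 2(D+2)/\varepsilon\rceil$ the exponent on the right satisfies $\varepsilon p/2 - D - 1 \ge 1$, so the inequality holds for all $N \ge N_0 := \max\{1, (Cp\mu_p^2)^p\}$.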
CHAPTER 8

Proof of the Local Semicircle Law

In this section we start the proof of the local semicircle law, Theorem 6.7.
This section can be read independently of the previous Chapter 7, so some basic
definitions and facts are repeated for convenience. For those readers who may
wish to compare this argument with the proof of the weak law, Theorem 7.1, we
mention that the basic strategy is similar except that we use an additional mechanism that we call fluctuation averaging. We point out that the first estimate in (7.29) was not optimal; there we estimated the average of Ω𝑖 by its maximum.
Since Ω𝑖 ’s are almost centered random variables with weak correlation, a more
precise estimate of their average leads to a considerable improvement. We now
recall that the basic steps to prove the weak law were
(1) the self-consistent equation for 𝑚𝑁 ,
(2) the interlacing of eigenvalues to compare $m_N$ and $m_N^{(i)}$,
(3) the quadratic large-deviation estimate for the error term 𝑍𝑖 ,
(4) the initial estimate for large 𝜂, and
(5) extending the estimate to small 𝜂 by the continuity argument.
Exploiting the fluctuation averaging mechanism requires a control on the
individual matrix elements of the resolvent 𝐺 instead of just its normalized trace,
𝑚𝑁 = 𝑁 −1 Tr 𝐺. Therefore, instead of considering the scalar equation for 𝑚𝑁 ,
we will consider the vector self-consistent equation for the diagonal elements 𝐺𝑖𝑖
of the Green function and investigate the stability of this equation. There is
no direct analogue of the interlacing property for 𝐺𝑖𝑖 ; thus, we introduce new
resolvent decoupling identities to compare the resolvents of the original matrix
and its minors. We will still use the quadratic large-deviation estimate to bound
the error term. We also use a continuity argument similar to the one given in
the weak law to derive a crude estimate on 𝐺𝑖𝑖 (formulated in terms of a certain
dichotomy), but instead of making many small steps and keeping track of the
exceptional sets, we follow a genuinely continuous approach within the frame-
work of the stochastic domination. Finally, we use the fluctuation-averaging
lemma (Lemma 8.9) to boost the error estimate by one order and an iteration
argument to prove the local semicircle law, Theorem 6.7. We now start the
rigorous proof. We will largely follow the presentation in [55].

8.1. Tools
In this subsection we collect some basic definitions and facts. First, we
repeat Definition 7.3 of the partial expectation.
62 8. PROOF OF THE LOCAL SEMICIRCLE LAW

Definition 8.1 (Partial expectation and independence). Let $X \equiv X(H)$ be a random variable. For $i \in \{1,\dots,N\}$ define the operations $P_i$ and $Q_i$ through
$$P_iX := \mathbb E(X|H^{(i)}), \qquad Q_iX := X - P_iX.$$
We call $P_i$ the partial expectation in the index $i$. Moreover, we say that $X$ is independent of a set $\mathbb T \subset \{1,\dots,N\}$ if $X = P_iX$ for all $i \in \mathbb T$.
Next, we define matrices with certain columns and rows zeroed out.
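Definition 8.2. For $\mathbb T \subset \{1,\dots,N\}$ we set $H^{(\mathbb T)}$ to be the $N\times N$ matrix defined by
$$(H^{(\mathbb T)})_{ij} := \mathbf 1(i \notin \mathbb T)\mathbf 1(j \notin \mathbb T)h_{ij}, \qquad i, j = 1,\dots,N.$$
Moreover, we define the resolvent of $H^{(\mathbb T)}$ and its normalized trace through
$$G^{(\mathbb T)}_{ij}(z) := (H^{(\mathbb T)} - z)^{-1}_{ij}, \qquad m^{(\mathbb T)}(z) := \frac1N\operatorname{Tr} G^{(\mathbb T)}(z).$$
We also set the notation
$$\sum_i^{(\mathbb T)} := \sum_{i:\,i\notin\mathbb T}.$$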

These definitions are the natural generalizations of $H^{(i)}$ and $G^{(i)}$ introduced in Chapter 1. In particular, notice that $H^{(\mathbb{T})}$ is the matrix obtained by setting all rows and columns in $\mathbb{T}$ to $0$. This is different from considering the minors obtained by removing the columns and rows in $\mathbb{T}$. Similarly, $G^{(\mathbb{T})}$ is still an $N \times N$ matrix with $G^{(\mathbb{T})}_{ii} = -z^{-1}$ for $i \in \mathbb{T}$ and $G^{(\mathbb{T})}_{ij} = 0$ if $i \in \mathbb{T}$ and $j \notin \mathbb{T}$. We will denote $G^{(\{i\})}$ simply by $G^{(i)}$ and similarly for a few more indices. This is consistent with the notation we used in the earlier chapters.
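The zeroed-out matrices can be illustrated numerically. The following is a minimal Python/NumPy sketch, assuming a real symmetric GOE-type sample with $\mathbb{E}h_{ij}^2 = 1/N$ off the diagonal; the helper names `wigner_goe`, `zero_out`, and `resolvent` are hypothetical and introduced only for this illustration. It checks that $G^{(\mathbb{T})}$ is still $N \times N$, with $G^{(\mathbb{T})}_{ii} = -z^{-1}$ for $i \in \mathbb{T}$ and vanishing entries $G^{(\mathbb{T})}_{ij}$ for $i \in \mathbb{T}$, $j \notin \mathbb{T}$.

```python
import numpy as np


def wigner_goe(n, rng):
    """Hypothetical helper: real symmetric Wigner matrix with E h_ij^2 = 1/n for i != j."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)


def zero_out(h, t):
    """H^(T): copy of H with all rows and columns indexed by T set to 0 (still n x n)."""
    ht = h.copy()
    idx = sorted(t)
    ht[idx, :] = 0.0
    ht[:, idx] = 0.0
    return ht


def resolvent(h, z):
    """G(z) = (H - z)^{-1}, computed by dense inversion."""
    return np.linalg.inv(h - z * np.eye(h.shape[0]))


rng = np.random.default_rng(0)
n, z, T = 200, 0.3 + 0.5j, {0, 3}
G_T = resolvent(zero_out(wigner_goe(n, rng), T), z)

print(G_T.shape)                       # still N x N
print(np.isclose(G_T[0, 0], -1 / z))   # G^(T)_ii = -1/z for i in T
print(abs(G_T[0, 5]) < 1e-10)          # G^(T)_ij = 0 for i in T, j not in T
```

The decoupled rows and columns turn $H^{(\mathbb{T})} - z$ into a block-diagonal matrix, which is why these entries come out exactly as stated.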
The following resolvent decoupling identities form the backbone of all of our
calculations. The idea behind them is that a resolvent matrix element 𝐺𝑖𝑗 de-
pends strongly on the 𝑖th and 𝑗th columns of 𝐻, but weakly on all other columns.
The first identity determines how to make a resolvent matrix element 𝐺𝑖𝑗 in-
dependent of an additional index 𝑘 ≠ 𝑖, 𝑗. The second identity expresses the
dependence of a resolvent matrix element 𝐺𝑖𝑗 on the matrix elements in the 𝑖th
or 𝑗th column of 𝐻. We added a third identity that relates sums of off-diagonal
resolvent entries with a diagonal one. The proofs are elementary.
Lemma 8.3 (Resolvent decoupling identities). For any real or complex Hermitian matrix $H$ and $\mathbb{T} \subset \{1, \dots, N\}$ the following identities hold:
(i) First resolvent decoupling identity [69]: If $i, j, k \notin \mathbb{T}$ and $i, j \neq k$, then
$$G^{(\mathbb{T})}_{ij} = G^{(\mathbb{T}k)}_{ij} + \frac{G^{(\mathbb{T})}_{ik}\, G^{(\mathbb{T})}_{kj}}{G^{(\mathbb{T})}_{kk}}. \tag{8.1}$$

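(ii) Second resolvent decoupling identity [52]: If $i, j \notin \mathbb{T}$ satisfy $i \neq j$, then
$$G^{(\mathbb{T})}_{ij} = -G^{(\mathbb{T})}_{ii} \sum_k^{(\mathbb{T}i)} h_{ik}\, G^{(\mathbb{T}i)}_{kj} = -G^{(\mathbb{T})}_{jj} \sum_k^{(\mathbb{T}j)} G^{(\mathbb{T}j)}_{ik}\, h_{kj}, \tag{8.2}$$
where the superscript in the summation means omission; e.g., the summation in the first sum runs over all $k \notin \mathbb{T} \cup \{i\}$.
(iii) Ward identity. For any $\mathbb{T} \subset \{1, \dots, N\}$ and $i \notin \mathbb{T}$ we have, with $\eta = \operatorname{Im} z$,
$$\sum_j^{(\mathbb{T})} \big|G^{(\mathbb{T})}_{ij}\big|^2 = \frac{1}{\eta} \operatorname{Im} G^{(\mathbb{T})}_{ii}. \tag{8.3}$$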

Proof. We will prove this lemma only for $\mathbb{T} = \emptyset$; the general case is a straightforward modification. We first consider (8.2). Recall the resolvent expansion stating that for any two matrices $A$ and $B$,
$$\frac{1}{A+B} = \frac{1}{A} - \frac{1}{A+B}\, B\, \frac{1}{A} = \frac{1}{A} - \frac{1}{A}\, B\, \frac{1}{A+B}, \tag{8.4}$$
provided that all the matrix inverses exist.
To obtain the first formula in (8.2), we use the first identity in (8.4) at the $(ij)$th matrix element with $A = H^{(i)} - z$ and $B = H - H^{(i)}$. Since $G^{(i)}_{ij} = (A^{-1})_{ij} = 0$ if $i \neq j$, we immediately have
$$G_{ij} = -\sum_k G_{ii}\, h_{ik}\, G^{(i)}_{kj} - \sum_k G_{ik}\, h_{ki}\, G^{(i)}_{ij} = -G_{ii} \sum_{k \neq i} h_{ik}\, G^{(i)}_{kj}, \qquad i \neq j. \tag{8.5}$$

The second identity in (8.2) follows in the same way by using the second identity in (8.4). To prove the identity (8.1), we let $A = H^{(k)} - z$ and $B = H - H^{(k)}$. Then, from the first formula in (8.4), we have
$$G_{ij} = G^{(k)}_{ij} - G_{ik} \sum_{\ell \neq k} h_{k\ell}\, G^{(k)}_{\ell j} = G^{(k)}_{ij} + \frac{G_{ik}\, G_{kj}}{G_{kk}}, \tag{8.6}$$
and in the second step we used (8.2).


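Finally, the Ward identity (8.3), already used in (7.23), is well-known. It follows from the spectral decomposition of $H$ with eigenvalues $\lambda_\alpha$ and eigenvectors $\mathbf{u}_\alpha$, namely,
$$\sum_j |G_{ij}|^2 = \sum_j G_{ij}\, (G^*)_{ji} = [|G|^2]_{ii} = \sum_\alpha \frac{|\mathbf{u}_\alpha(i)|^2}{|\lambda_\alpha - z|^2} = \frac{1}{\eta} \operatorname{Im} \sum_\alpha \frac{|\mathbf{u}_\alpha(i)|^2}{\lambda_\alpha - z} = \frac{1}{\eta} \operatorname{Im} G_{ii}.$$
$\square$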
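The three identities of Lemma 8.3 are exact algebraic statements, so they can be sanity-checked numerically on a single sample. The sketch below (Python/NumPy) takes $\mathbb{T} = \emptyset$ and an assumed real symmetric GOE-type test matrix; the helper names are hypothetical. It compares both sides of (8.1), of the first formula in (8.2), and of the Ward identity (8.3).

```python
import numpy as np


def wigner_goe(n, rng):
    # Hypothetical helper: real symmetric Wigner matrix, E h_ij^2 = 1/n off the diagonal.
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)


def zero_out(h, t):
    # H^(T): rows and columns indexed by T set to zero.
    ht = h.copy()
    idx = sorted(t)
    ht[idx, :] = 0.0
    ht[:, idx] = 0.0
    return ht


def resolvent(h, z):
    return np.linalg.inv(h - z * np.eye(h.shape[0]))


rng = np.random.default_rng(1)
n, z = 150, -0.7 + 0.2j
h = wigner_goe(n, rng)
G = resolvent(h, z)
i, j, k = 2, 5, 9

# (8.1) with T empty: G_ij = G^(k)_ij + G_ik G_kj / G_kk
G_k = resolvent(zero_out(h, {k}), z)
rhs_81 = G_k[i, j] + G[i, k] * G[k, j] / G[k, k]

# (8.2), first formula: G_ij = -G_ii * sum_{l != i} h_il G^(i)_lj
G_i = resolvent(zero_out(h, {i}), z)
others = np.arange(n) != i
rhs_82 = -G[i, i] * np.sum(h[i, others] * G_i[others, j])

# (8.3) Ward identity: sum_j |G_ij|^2 = Im G_ii / eta
lhs_83 = np.sum(np.abs(G[i, :]) ** 2)
rhs_83 = G[i, i].imag / z.imag

print(np.isclose(G[i, j], rhs_81), np.isclose(G[i, j], rhs_82), np.isclose(lhs_83, rhs_83))
```

Agreement holds up to floating-point rounding; the random matrix plays no role other than providing a generic Hermitian test case.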

8.2. Self-Consistent Equations on Two Levels


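By using the notation from the previous section, the Schur formula (Lemma 7.2) can be written as
$$\frac{1}{G^{(\mathbb{T})}_{ii}} = h_{ii} - z - \sum_{k,l}^{(\mathbb{T}i)} h_{ik}\, G^{(\mathbb{T}i)}_{kl}\, h_{li}, \tag{8.7}$$
where $i \notin \mathbb{T} \subset \{1, \dots, N\}$. We can take $\mathbb{T} = \emptyset$ to have
$$G_{ii} = \frac{1}{h_{ii} - z - \sum_{k,l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li}}. \tag{8.8}$$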
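The partial expectation with respect to the index $i$ in the denominator can be computed explicitly since $h_{ik} h_{li}$ is independent of $G^{(i)}_{kl}$ and its expectation is nonzero only if $k = l$, in which case it equals $\mathbb{E}|h_{ik}|^2 = s_{ik}$. We thus get
$$P_i\Big[\sum_{k,l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li}\Big] = \sum_k^{(i)} s_{ik}\, G^{(i)}_{kk} = \sum_k^{(i)} s_{ik}\, G_{kk} - \sum_k^{(i)} s_{ik} \frac{G_{ik}\, G_{ki}}{G_{ii}} = \sum_k s_{ik}\, G_{kk} - \sum_k s_{ik} \frac{G_{ik}\, G_{ki}}{G_{ii}},$$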
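where in the second step we used (8.1), and in the last step we noted that the $k = i$ terms of the two sums cancel. We will compare (8.8) with the defining equation of $m_{sc}$:
$$m_{sc} = \frac{1}{-z - m_{sc}}, \tag{8.9}$$
so we introduce the notation for the difference
$$v_i := G_{ii} - m_{sc}.$$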
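Recalling (6.1), we get the following system of self-consistent equations for $v_i$:
$$v_i = \frac{1}{-z - m_{sc} - \big(\sum_k s_{ik} v_k - \Upsilon_i\big)} - m_{sc}, \tag{8.10}$$
where
$$\Upsilon_i := A_i + h_{ii} - Z_i, \qquad A_i := \sum_k s_{ik} \frac{G_{ik}\, G_{ki}}{G_{ii}}, \qquad Z_i := Q_i\Big[\sum_{k,l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li}\Big]. \tag{8.11}$$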

All these quantities depend on $z$, but we omit it in the notation. We will show that $\Upsilon_i$ is a small error term. This is clear for $h_{ii}$ by (6.27). The term $A_i$ will be small since off-diagonal resolvent entries are small. Finally, $Z_i$ will be small by the large-deviation estimate (7.59) from Theorem 7.7.
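Both the Schur formula (8.8) and the self-consistent equation (8.10) are exact identities once $\Upsilon_i$ is defined by (8.11), which makes them easy to test numerically. The Python/NumPy sketch below works under stated assumptions: a real symmetric GOE-type matrix and the flat profile $s_{ik} = 1/N$, which matches its off-diagonal variances and has row sums 1 as in (6.1). The partial expectation is evaluated through the formula $P_i[\cdot] = \sum_k^{(i)} s_{ik} G^{(i)}_{kk}$ derived above, and all helper names are hypothetical.

```python
import numpy as np


def wigner_goe(n, rng):
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)


def zero_out(h, t):
    ht = h.copy()
    idx = sorted(t)
    ht[idx, :] = 0.0
    ht[:, idx] = 0.0
    return ht


def resolvent(h, z):
    return np.linalg.inv(h - z * np.eye(h.shape[0]))


def m_sc(z):
    # Solution of (8.9), i.e. of m^2 + z m + 1 = 0, with Im m > 0 when Im z > 0.
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2


def schur_and_errors(h, s, G, z, i):
    """Return (Schur value of G_ii from (8.8), A_i, Z_i, Upsilon_i from (8.11))."""
    n = h.shape[0]
    G_i = resolvent(zero_out(h, {i}), z)
    rest = np.arange(n) != i
    quad = h[i, rest] @ G_i[np.ix_(rest, rest)] @ h[rest, i]  # sum^(i)_{k,l} h_ik G^(i)_kl h_li
    schur = 1.0 / (h[i, i] - z - quad)                          # right-hand side of (8.8)
    p_i = np.sum(s[i, rest] * np.diag(G_i)[rest])               # P_i[quad]
    Z = quad - p_i                                              # Z_i = Q_i[quad]
    A = np.sum(s[i] * G[i, :] * G[:, i]) / G[i, i]
    return schur, A, Z, A + h[i, i] - Z


rng = np.random.default_rng(3)
n, z = 200, 0.5 + 0.4j
h = wigner_goe(n, rng)
s = np.full((n, n), 1.0 / n)
G = resolvent(h, z)
m = m_sc(z)
v = np.diag(G) - m

schur_err, eq_err = [], []
for i in range(4):
    schur, A, Z, ups = schur_and_errors(h, s, G, z, i)
    rhs_810 = 1.0 / (-z - m - (s[i] @ v - ups)) - m             # right-hand side of (8.10)
    schur_err.append(abs(schur - G[i, i]))
    eq_err.append(abs(v[i] - rhs_810))
    print(i, abs(A), abs(Z), abs(ups))

print(max(schur_err), max(eq_err))   # both at the level of rounding errors
```

The printed sizes of $A_i$, $Z_i$, and $\Upsilon_i$ give a rough feeling for how small the error terms are for this particular sample; they are not a substitute for the estimates proved below.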

Before we present more details, we heuristically show the power of this new
system of self-consistent equations (8.10) and compare it with the single self-
consistent equation used in Chapter 7 and, in fact, used in all previous literature
on the resolvent method for Wigner matrices.

8.2.1. A Scalar Self-Consistent Equation. Introduce the notation
$$[a] = \frac{1}{N} \sum_i a_i \tag{8.12}$$
for the average of a vector $(a_i)_{i=1}^N$. Consider the standard Wigner case, $s_{ij} = \frac{1}{N}$. Then
$$\sum_k s_{ik} v_k = \frac{1}{N} \sum_k v_k = [v] \quad (= m_N - m_{sc}).$$
Neglecting $\Upsilon_i$ in (8.10) and averaging the resulting relation over $i = 1, \dots, N$, we get
$$[v] \approx \frac{1}{-z - m_{sc} - [v]} - m_{sc}. \tag{8.13}$$
Recall (8.9), the defining equation of $m_{sc}$:
$$m_{sc} = \frac{1}{-z - m_{sc}}.$$
Using Lemma 7.6, which asserts that this equation is stable under small perturbations, at least away from the spectral edges $z = \pm 2$, we conclude $[v] \approx 0$ from (8.13). This means that $m_N \approx m_{sc}$, and hence the empirical density satisfies $\varrho_N \approx \varrho_{sc}$; i.e., we have recovered Wigner's original semicircle law. This is exactly the idea we used in Chapter 7, except that the interlacing argument is replaced by Lemma 8.3 this time.
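The averaged equation can be observed directly on a sample. The Python/NumPy sketch below is an illustration only, assuming a GOE-type Wigner matrix ($s_{ij} = 1/N$ off the diagonal) and an arbitrarily chosen bulk point $z = 0.5 + 0.1i$. It computes $m_{sc}$ as the root of (8.9) with positive imaginary part and compares it with $m_N = \frac{1}{N}\operatorname{Tr} G$, i.e., it checks that $[v] = m_N - m_{sc}$ is small.

```python
import numpy as np


def m_sc(z):
    """Stieltjes transform of the semicircle law: the root of (8.9) with Im m_sc > 0."""
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2


rng = np.random.default_rng(5)
n = 1000
a = rng.standard_normal((n, n))
h = (a + a.T) / np.sqrt(2 * n)         # GOE-type Wigner matrix, s_ij = 1/N off the diagonal
evals = np.linalg.eigvalsh(h)

z = 0.5 + 0.1j
m = m_sc(z)
m_N = np.mean(1.0 / (evals - z))       # m_N = (1/N) Tr G, the average [G_ii]

print("residual of (8.9):", abs(m - 1.0 / (-z - m)))
print("|[v]| = |m_N - m_sc|:", abs(m_N - m))
```

Writing the square root as `sqrt(z - 2) * sqrt(z + 2)` places the branch cut on $[-2, 2]$, so the formula selects the solution of (8.9) with $\operatorname{Im} m_{sc} > 0$ throughout the upper half-plane.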

8.2.2. A Vector Self-Consistent Equation. If we are interested in individual resolvent matrix elements $G_{ii}$ instead of their average, $\frac{1}{N}\operatorname{Tr} G$, then the scalar equation (8.13) discussed in the previous section is not sufficient. We have to consider (8.10) as a system of equations for the components of the vector $\mathbf{v} = (v_1, \dots, v_N)$. In order to analyze it, we will linearize this system of equations. There are two possible linearizations depending on how we expand the error terms.
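Linearization I. If we know that $\sum_k s_{ik} v_k - \Upsilon_i$ is small, we can expand the denominator in (8.10) to have
$$v_i = m_{sc}^2 \Big(\sum_k s_{ik} v_k - \Upsilon_i\Big) + m_{sc}^3 \Big(\sum_k s_{ik} v_k - \Upsilon_i\Big)^2 + O\Big(\Big(\sum_k s_{ik} v_k - \Upsilon_i\Big)^3\Big), \tag{8.14}$$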

where we have used the defining equation (8.9) for $m_{sc}$ and used that $|m_{sc}| \le 1$ in the error term. After rearranging (8.14), we get
$$[(1 - m_{sc}^2 S)\mathbf{v}]_i = \mathcal{E}_i := -m_{sc}^2 \Upsilon_i + m_{sc}^3 \Big(\sum_k s_{ik} v_k - \Upsilon_i\Big)^2 + O\Big(\Big(\sum_k s_{ik} v_k - \Upsilon_i\Big)^3\Big). \tag{8.15}$$

Linearization II. If we know that $v_i$ is small, then it is useful to rewrite (8.10) in the following form:
$$-\sum_k s_{ik} v_k + \Upsilon_i = \frac{1}{m_{sc} + v_i} - \frac{1}{m_{sc}}, \tag{8.16}$$
where we have also used the defining equation of $m_{sc}$. If $|v_i| = o(1)$, then we expand the right-hand side to second order in $v_i$ and multiply both sides by $m_{sc}^2$ to obtain
$$[(1 - m_{sc}^2 S)\mathbf{v}]_i = \mathcal{E}_i := -m_{sc}^2 \Upsilon_i + \frac{1}{m_{sc}} v_i^2 + O(|v_i|^3), \tag{8.17}$$
where we have estimated $|m_{sc}|^2 \ge c$ in the last term (see Lemma 6.2).
Notice that the definitions of $\mathcal{E}_i$ in (8.15) and (8.17) are different, although their leading behavior $-m_{sc}^2 \Upsilon_i$ is the same. In both cases we can continue the analysis by inverting the operator $(1 - m_{sc}^2 S)$ to obtain
$$\mathbf{v} = \frac{1}{1 - m_{sc}^2 S}\, \mathcal{E}, \qquad \text{hence} \qquad \|\mathbf{v}\|_\infty \le \Big\| \frac{1}{1 - m_{sc}^2 S} \Big\|_{\infty \to \infty} \|\mathcal{E}\|_\infty = \Gamma \|\mathcal{E}\|_\infty, \tag{8.18}$$
and this relation shows how the quantity $\Gamma$, defined in (6.17), emerges. If the error term is indeed small and $\Gamma$ is bounded, then we obtain that $\|\mathbf{v}\|_\infty = \max_i |G_{ii} - m_{sc}|$ is small.
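The role of $\Gamma$ in (8.18) can be made concrete for a specific variance profile. The following Python/NumPy sketch assumes a hypothetical periodic band profile ($s_{ij} = 1/(2W+1)$ when the cyclic distance of $i$ and $j$ is at most $W$, so that $M = 2W + 1$ and the rows sum to 1). It computes $\Gamma$ as the maximal absolute row sum of $(1 - m_{sc}^2 S)^{-1}$, which is the $\ell^\infty \to \ell^\infty$ operator norm, and checks $\|\mathbf{v}\|_\infty \le \Gamma \|\mathcal{E}\|_\infty$ for a random stand-in right-hand side $\mathcal{E}$.

```python
import numpy as np


def m_sc(z):
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2


def band_profile(n, w):
    """Hypothetical periodic band profile: s_ij = 1/(2w+1) if the cyclic distance of i, j is <= w."""
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    d = np.minimum(d, n - d)
    return (d <= w).astype(float) / (2 * w + 1)


def gamma(s, z):
    """Gamma(z) = ||(1 - m_sc^2 S)^{-1}||_{inf -> inf}, the maximal absolute row sum."""
    inv = np.linalg.inv(np.eye(s.shape[0]) - m_sc(z) ** 2 * s)
    return np.abs(inv).sum(axis=1).max()


n, w, z = 300, 10, 1.9 + 0.01j
s = band_profile(n, w)
m = m_sc(z)
Gam = gamma(s, z)

rng = np.random.default_rng(6)
err = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # stand-in for the error vector E
v = np.linalg.solve(np.eye(n) - m**2 * s, err)                # v = (1 - m_sc^2 S)^{-1} E, cf. (8.18)

print("Gamma                     :", Gam)
print("||v||_inf / ||E||_inf     :", np.abs(v).max() / np.abs(err).max())
print("1/|1 - m^2|, 1/(1 - |m|^2):", 1 / abs(1 - m**2), 1 / (1 - abs(m) ** 2))
```

Two elementary bounds are visible in the output: $\Gamma \ge |1 - m_{sc}^2|^{-1}$, because the constant vector is an eigenvector of $S$ with eigenvalue 1, and $\Gamma \le (1 - |m_{sc}|^2)^{-1}$ from the Neumann series, since $\|S\|_{\infty\to\infty} = 1$ and $|m_{sc}| < 1$ for $\eta > 0$.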
While the expansion logic behind equations (8.14) and (8.17) is the same
and the resulting formulae are very similar, the structure of the main proof de-
pends on which linearization of the self-consistent equation is used. In both
cases we need to derive an a priori bound to ensure that the expansion is valid.
Intuitively, the smallness of $\sum_k s_{ik} v_k - \Upsilon_i$ seems easier to establish than that of $v_i$, since both terms are averaged quantities and extra averaging typically helps. But these are random objects and every estimate comes with an exceptional set in the probability space where it does not hold. It turns out that on the technical level it is worth minimizing the bookkeeping of these events, and this reason favors the second version of the linearization, which requires controlling only a single quantity, $\max_i |v_i|$. In this book, therefore, we will follow the linearization
(8.17). We remark that the other option was used in [69, 70], which required
first proving the weak semicircle law, Theorem 7.1, to provide the necessary
a priori bound. The linearization (8.17) circumvents this step and the a priori
bound on 𝑣𝑖 will be proved directly.

8.3. Proof of the Local Semicircle Law Without Using the Spectral Gap
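In this section we prove a restricted version of Theorem 6.7; namely, we replace the threshold $\tilde\eta_E$ with a larger threshold $\eta_E$ defined as
$$\eta_E := \min\Big\{\eta : \frac{1}{M\xi} \le \min\Big\{\frac{M^{-\gamma}}{\Gamma(E + i\xi)^3}, \frac{M^{-2\gamma}}{\Gamma(E + i\xi)^4 \operatorname{Im} m_{sc}(E + i\xi)}\Big\} \text{ holds for all } \xi \ge \eta\Big\}. \tag{8.19}$$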

This definition is exactly the same as (6.28), but $\tilde\Gamma$ is replaced with the larger quantity $\Gamma$; in other words, we do not make use of the spectral gap in $S$. This will pedagogically simplify the presentation, but it will prove the estimates in Theorem 6.7 only in the regime $\eta \ge \eta_E$. In Chapter 9 we will give the proof for the entire regime $\eta \ge \tilde\eta_E$. We recall Lemma 6.3, showing that there is no difference between $\Gamma$ and $\tilde\Gamma$ for generalized Wigner matrices away from the edges (both are of order 1), so readers interested in the local semicircle law only in the bulk should be content with the simpler proof. Near the spectral edges, however, there is a substantial difference. Note that even in the Wigner case (see (6.22)), $\eta_E$ is much larger near the spectral edges than the optimal threshold $\tilde\eta_E$. For generalized Wigner matrices, while $\tilde\eta_E$ is determined by the relation $\tilde\eta_E \gg \frac{1}{N}$, the threshold $\eta_E$ is determined by the relation $N\eta_E(\kappa_E + \eta_E)^{3/2} \gg 1$, where $\kappa_E = ||E| - 2|$ is the distance of $E$ from the spectral edges.
We stress, however, that the proof given below does not use any model-
specific upper bound on Γ, such as (6.22) or (6.24); only the trivial lower and
upper bounds, (6.19) and (6.20), are needed. The actual size of Γ enters only
implicitly by determining the threshold 𝜂𝐸 . This makes the argument applicable
to a wide class of problems beyond generalized Wigner matrices, including band
matrices; see [55].
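The definition (8.19) can be evaluated numerically, which makes the difference between bulk and edge visible. The Python/NumPy sketch below is illustrative only. It assumes the standard Wigner profile $S = \frac{1}{N}\mathbf{1}\mathbf{1}^T$ with $M = N$, for which $(1 - m_{sc}^2 S)^{-1} = I + \frac{m_{sc}^2}{1 - m_{sc}^2}\,\frac{1}{N}\mathbf{1}\mathbf{1}^T$ gives $\Gamma$ in closed form, and it replaces the continuum of $\xi$ in (8.19) by a decreasing logarithmic grid. The values $M = 10^6$, $\gamma = 0.1$, $E = 0$, and $E = 1.99$ are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np


def m_sc(z):
    # Root of m^2 + z m + 1 = 0 with Im m > 0 for Im z > 0, cf. (8.9).
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2


def gamma_wigner(z, n):
    """Gamma(z) for S = ones((n, n)) / n, via the exact inverse I + c * ones / n."""
    m2 = m_sc(z) ** 2
    c = m2 / (1 - m2)
    return abs(1 + c / n) + (n - 1) * abs(c) / n   # maximal absolute row sum


def eta_threshold(E, M, gam, grid):
    """Smallest grid point eta such that the inequality in (8.19) holds at every
    grid point xi >= eta. `grid` must be sorted in decreasing order."""
    eta_E = None
    for xi in grid:
        z = E + 1j * xi
        g = gamma_wigner(z, M)
        bound = min(M ** (-gam) / g**3, M ** (-2 * gam) / (g**4 * m_sc(z).imag))
        if 1.0 / (M * xi) > bound:
            break
        eta_E = xi
    return eta_E


M, gam = 10**6, 0.1
grid = np.logspace(1, -8, 4000)          # xi from 10 down to 1e-8
eta_bulk = eta_threshold(0.0, M, gam, grid)
eta_edge = eta_threshold(1.99, M, gam, grid)
print("eta_E at E = 0   :", eta_bulk)
print("eta_E at E = 1.99:", eta_edge)
```

On this grid the threshold near the edge comes out markedly larger than in the bulk, consistent with the relation $N\eta_E(\kappa_E + \eta_E)^{3/2} \gg 1$, while the bulk value stays above the lower bound $\frac{1}{8}M^{-1+\gamma}$ recorded in (8.22).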
Definition 8.4. A deterministic nonnegative function $\Psi \equiv \Psi^{(N)}(z)$ is called an admissible control parameter if we have
$$c M^{-1/2} \le \Psi \le M^{-c} \tag{8.20}$$
for some constant $c > 0$ and large enough $N$. Moreover, after fixing a $\gamma > 0$, we call any (possibly $N$-dependent) subset
$$\mathbf{D} = \mathbf{D}^{(N)} \subset \{z : |E| \le 10,\ M^{-1} \le \eta \le 10\}$$
a spectral domain.
In this section we will mostly use the spectral domain
$$\mathbf{S} := \{z : |E| \le 10,\ \eta \in [\eta_E, 10]\}, \tag{8.21}$$
where we note that
$$\eta_E \ge \frac{1}{8} M^{-1+\gamma}, \tag{8.22}$$

using the lower bound $\Gamma \ge c$ from (6.19) in the definition (8.19). Define the random control parameters
$$\Lambda_o := \max_{i \neq j} |G_{ij}|, \qquad \Lambda_d := \max_i |G_{ii} - m_{sc}|, \qquad \Lambda := \max(\Lambda_o, \Lambda_d), \tag{8.23}$$
where the letters d and o refer to diagonal and off-diagonal elements. In the typical regime in which we will work, all these quantities are small. The key quantity is $\Lambda$, and we will develop an iterative argument to control it. We first derive an estimate of $\Lambda_o + |\Upsilon_i|$ in terms of $\Lambda$. This will be possible only on the event where $\Lambda$ is already small, so we will need to introduce an indicator function $\phi = \mathbf{1}(\Lambda \le M^{-c})$ with some small $c$. More generally, we will consider any indicator function $\phi$ such that $\phi\Lambda \prec M^{-c}$. Notice that this is a somewhat weaker concept than $\phi\Lambda \le M^{-c}$ (even if the exponent $c$ is slightly adjusted), but it turns out to be more flexible since algebraic manipulations involving $\prec$ (see Proposition 6.5) can be directly used.

8.3.1. Large-Deviation Bounds on 𝚲o and 𝚼𝐢 .


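Lemma 8.5. The following statements hold for any spectral domain $\mathbf{D}$: Let $\phi$ be the indicator function of some (possibly $z$-dependent) event. If $\phi\Lambda \prec M^{-c}$ for some $c > 0$ holds uniformly in $z \in \mathbf{D}$, then
$$\phi\big(\Lambda_o + |Z_i| + |\Upsilon_i|\big) \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \tag{8.24}$$
uniformly in $z \in \mathbf{D}$. Moreover, for any fixed ($N$-independent) $\eta > 0$ we have
$$\Lambda_o + |Z_i| + |\Upsilon_i| \prec M^{-1/2} \tag{8.25}$$
uniformly in $z \in \{w \in \mathbf{D} : \operatorname{Im} w = \eta\}$.
In other words, (8.24) means that
$$\Lambda_o + |Z_i| + |\Upsilon_i| \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \tag{8.26}$$
on the event where $\Lambda \prec M^{-c}$ has been a priori established.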

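Proof of Lemma 8.5. We first observe that $\phi\Lambda \prec M^{-c} \ll 1$ and the positive lower bound $|m_{sc}(z)| \ge c$ imply that
$$\frac{\phi}{|G_{ii}|} \prec 1. \tag{8.27}$$
A simple iteration of the expansion formula (8.1) then shows that
$$\phi\,\big|G^{(\mathbb{T})}_{ij}\big| \prec M^{-c} \ \text{ for } i \neq j, \qquad \phi\,\big|G^{(\mathbb{T})}_{ii}\big| \prec 1, \qquad \frac{\phi}{|G^{(\mathbb{T})}_{ii}|} \prec 1, \tag{8.28}$$
for any subset $\mathbb{T}$ of fixed cardinality.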

We begin with the first statement in Lemma 8.5. First we estimate $Z_i$, which we split as
$$\phi|Z_i| \le \phi\Big|\sum_k^{(i)} \big(|h_{ik}|^2 - s_{ik}\big) G^{(i)}_{kk}\Big| + \phi\Big|\sum_{k \neq l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li}\Big|. \tag{8.29}$$
We estimate each term using Theorem 7.7 by conditioning on $G^{(i)}$ and using the fact that the family $(h_{ik})_{k=1}^N$ is independent of $G^{(i)}$. By (7.57) the first term of (8.29) is stochastically dominated by
$$\phi\Big[\sum_k^{(i)} s_{ik}^2 \big|G^{(i)}_{kk}\big|^2\Big]^{1/2} \prec M^{-1/2},$$
where (8.28), (6.3), and (6.1) were used. For the second term of (8.29) we apply (7.59) from Theorem 7.7 with $a_{kl} = s_{ik}^{1/2} G^{(i)}_{kl} s_{li}^{1/2}$ and $X_k = s_{ik}^{-1/2} h_{ik}$. We find
$$\phi \sum_{k,l}^{(i)} s_{ik} \big|G^{(i)}_{kl}\big|^2 s_{li} \le \frac{\phi}{M} \sum_{k,l}^{(i)} s_{ik} \big|G^{(i)}_{kl}\big|^2 = \frac{\phi}{M\eta} \sum_k^{(i)} s_{ik} \operatorname{Im} G^{(i)}_{kk} \prec \frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}, \tag{8.30}$$
where in the first step we used $s_{li} \le M^{-1}$ from (6.3), in the second step the Ward identity (8.3), and in the last step (8.1) and the estimate $1/G_{ii} \prec 1$. Thus, we get
$$\phi|Z_i| \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}}, \tag{8.31}$$
where we absorbed the bound $M^{-1/2}$ on the first term of (8.29) into the right-hand side of (8.31). Here we only needed to use $\operatorname{Im} m_{sc}(z) \ge c\eta$, which follows from an explicit estimate; see (6.13).
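Next, we estimate $\Lambda_o$. We can iterate (8.2) once to get, for $i \neq j$,
$$G_{ij} = -G_{ii} \sum_k^{(i)} h_{ik}\, G^{(i)}_{kj} = -G_{ii}\, G^{(i)}_{jj} \Big(h_{ij} - \sum_{k,l}^{(ij)} h_{ik}\, G^{(ij)}_{kl}\, h_{lj}\Big). \tag{8.32}$$
The term $h_{ij}$ is trivially $O_\prec(M^{-1/2})$. In order to estimate the other term, we invoke (7.58) from Theorem 7.7 with $a_{kl} = s_{ik}^{1/2} G^{(ij)}_{kl} s_{lj}^{1/2}$, $X_k = s_{ik}^{-1/2} h_{ik}$, and $Y_l = s_{lj}^{-1/2} h_{lj}$. As in (8.30), we find
$$\phi \sum_{k,l}^{(ij)} s_{ik} \big|G^{(ij)}_{kl}\big|^2 s_{lj} \prec \frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta},$$
and thus
$$\phi\Lambda_o \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}}, \tag{8.33}$$
where we again absorbed the term $h_{ij} \prec M^{-1/2}$ into the right-hand side.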

In order to estimate $A_i$ and $h_{ii}$ in the definition of $\Upsilon_i$, we use (8.28) to get
$$\phi\big[|A_i| + |h_{ii}|\big] \prec \phi\Lambda_o^2 + M^{-1/2} \le \phi\Lambda_o + C\sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}},$$
where the second step follows from $\operatorname{Im} m_{sc} \ge c\eta$. Collecting (8.31) and (8.33), this completes the proof of (8.24).
The proof of (8.25) is almost identical to that of (8.24). The quantities $|G^{(i)}_{kk}|$ and $|G^{(ij)}_{kk}|$ are estimated by the trivial deterministic bound $\eta^{-1} = O(1)$. We omit the details. $\square$

8.3.2. Initial Bound on 𝚲. In this subsection, we prove an initial bound asserting that $\Lambda \prec M^{-1/2}$ holds for $z$ with a large imaginary part $\eta$. This bound is rather easy to get after we have proved in (8.25) that the error term $\Upsilon_i$ in the self-consistent equation (8.16) is small.
Lemma 8.6. We have $\Lambda \prec M^{-1/2}$ uniformly in $z \in [-10, 10] + 2i$.
Proof. We shall make use of the trivial bounds
$$\big|G^{(\mathbb{T})}_{ij}\big| \le \frac{1}{\eta} = \frac{1}{2}, \qquad |m_{sc}| \le \frac{1}{\eta} = \frac{1}{2}, \tag{8.34}$$
where the last inequality follows from the fact that $m_{sc}$ is the Stieltjes transform of a probability measure. From (8.25) we get
$$\Lambda_o + |Z_i| \prec M^{-1/2}. \tag{8.35}$$
Moreover, we use (8.1) and (8.32) to estimate
$$|A_i| \le \sum_j s_{ij} \Big|\frac{G_{ij}\, G_{ji}}{G_{ii}}\Big| \le M^{-1} + \sum_j^{(i)} s_{ij} \big|G_{ji}\, G^{(i)}_{jj}\big| \Big|h_{ij} - \sum_{k,l}^{(ij)} h_{ik}\, G^{(ij)}_{kl}\, h_{lj}\Big| \prec M^{-1/2},$$
where the last step follows by using (7.58), exactly as the estimate of the right-hand side of (8.32) in the proof of Lemma 8.5. We conclude that $|\Upsilon_i| \prec M^{-1/2}$.
Next, we write (8.16) as
$$v_i = \frac{m_{sc}\big(\sum_k s_{ik} v_k - \Upsilon_i\big)}{m_{sc}^{-1} - \sum_k s_{ik} v_k + \Upsilon_i}. \tag{8.36}$$
Using $|m_{sc}^{-1}| \ge 2$ and $|v_k| \le 1$, as follows from (8.34), we find
$$\Big|m_{sc}^{-1} - \sum_k s_{ik} v_k + \Upsilon_i\Big| \ge 1 + O_\prec(M^{-1/2}).$$
Using $|m_{sc}| \le \frac{1}{2}$ and taking the maximum over all $i$ in (8.36), we therefore conclude that
$$\Lambda_d \le \frac{\Lambda_d + O_\prec(M^{-1/2})}{2 + O_\prec(M^{-1/2})} = \frac{\Lambda_d}{2} + O_\prec(M^{-1/2}), \tag{8.37}$$
from which the claim follows together with the estimate on $\Lambda_o$ from (8.35). $\square$

8.3.3. A Rough Bound on 𝚲 and a Continuity Argument. The next step is to get a rough bound on $\Lambda$ by a continuity argument.
Proposition 8.7. We have $\Lambda \prec M^{-\gamma/3}\Gamma^{-1}$ uniformly in the domain $\mathbf{S}$ defined in (8.21).

Proof. The core of the proof is a continuity argument. The first task is to
establish a gap in the range of Λ by establishing a dichotomy. Roughly speaking,
the following lemma asserts that, for all 𝑧 ∈ 𝐒, with high probability either
Λ ≤ 𝑀 −𝛾/2 Γ−1 or Λ ≥ 𝑀 −𝛾/4 Γ−1 ; i.e., there is a gap or forbidden region in the
range of Λ with very high probability.
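Lemma 8.8. We have the bound
$$\mathbf{1}\big(\Lambda \le M^{-\gamma/4}\Gamma^{-1}\big)\Lambda \prec M^{-\gamma/2}\Gamma^{-1}$$
uniformly in $\mathbf{S}$.
Proof of Lemma 8.8. Set
$$\phi := \mathbf{1}\big(\Lambda \le M^{-\gamma/4}\Gamma^{-1}\big).$$
Then by definition we have $\phi\Lambda \le M^{-\gamma/4}\Gamma^{-1} \le CM^{-\gamma/4}$, where in the last step we have used that $\Gamma$ is bounded below (6.19). Hence we may invoke (8.24) to estimate $\Lambda_o$ and $\Upsilon_i$ by $\sqrt{(\operatorname{Im} m_{sc} + \Lambda)/(M\eta)}$. In order to estimate $\Lambda_d$, we use (8.18) to get
$$\phi\Lambda_d = \phi \max_i |v_i| \prec \Gamma\Big(\Lambda^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}}\Big). \tag{8.38}$$
Recalling (6.19) and (8.24), we therefore get
$$\phi\Lambda \prec \phi\Gamma\Big(\Lambda^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}}\Big). \tag{8.39}$$
Next, by definition of $\phi$, we may estimate
$$\phi\Gamma\Lambda^2 \le M^{-\gamma/2}\Gamma^{-1}.$$
Moreover, by definition of $\mathbf{S}$, we have
$$\frac{1}{M\eta} \le \min\Big\{\frac{M^{-\gamma}}{\Gamma^3}, \frac{M^{-2\gamma}}{\Gamma^4 \operatorname{Im} m_{sc}}\Big\}. \tag{8.40}$$
Together with the definition of $\phi$, we have
$$\phi\Gamma\sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \le \Gamma\sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \Gamma\sqrt{\frac{\Gamma^{-1}}{M\eta}} \le M^{-\gamma}\Gamma^{-1} + M^{-\gamma/2}\Gamma^{-1} \le 2M^{-\gamma/2}\Gamma^{-1}.$$
(Notice that the middle inequality is the crucial place where the definition of $\eta_E$ and the restriction $\eta \ge \eta_E$ are used.) Plugging this into (8.39) yields $\phi\Lambda \prec M^{-\gamma/2}\Gamma^{-1}$, which is the claim. $\square$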

Figure 8.1. The (𝜂, Λ)-plane for a fixed 𝐸 with the graph of 𝜂 →
Λ(𝐸 + i𝜂). The shaded region is forbidden with high probability
by Lemma 8.8. The initial estimate, Lemma 8.6, is marked with
a black dot. The graph of Λ = Λ(𝐸 + i𝜂) is continuous and
lies beneath the shaded region. Note that this method does not
control Λ(𝐸 + 𝑖𝜂) in the regime 𝜂 ≤ 𝜂𝐸 .

If we knew that Λ is excluded from the interval [𝑀 −𝛾/2 Γ−1 , 𝑀 −𝛾/4 Γ−1 ], then
we could immediately finish the proof of Proposition 8.7. We could argue that
Λ = Λ(𝐸 + 𝑖𝜂) is continuous in 𝜂 = Im 𝑧 and hence cannot jump from one side
of the gap to the other; moreover, for 𝜂 = 2 it is below the gap by Lemma 8.6, so
Λ is below the gap for all 𝑧 ∈ 𝐒 with high probability. For a pictorial illustration
of this argument, see Figure 8.1 (borrowed from [55]).
However, Lemma 8.8 guarantees a gap in the range of Λ only with a very
high probability for each fixed 𝑧. We need to use a fine discrete grid in the
space of 𝑧 to upgrade this statement to all 𝑧 with high probability. Then the
continuity argument described in the previous paragraph will be valid with high
probability. In the next step we explain the details.
The continuity argument. Fix $D > 10$. Lemma 8.8 implies that for each $z \in \mathbf{S}$ the probability that $\Lambda$ falls into the gap (or the forbidden region) is very small; i.e., we have
$$\mathbb{P}\big(M^{-\gamma/3}\Gamma(z)^{-1} \le \Lambda(z) \le M^{-\gamma/4}\Gamma(z)^{-1}\big) \le N^{-D} \tag{8.41}$$
for $N \ge N_0$, where $N_0 \equiv N_0(\gamma, D)$ does not depend on $z$. All of the argument below is valid for $N \ge N_0$.
Next, take a lattice $\Delta \subset \mathbf{S}$ such that $|\Delta| \le N^{10}$ and, for each $z \in \mathbf{S}$, there exists a $w \in \Delta$ with $|z - w| \le N^{-4}$. Applying (8.41) to all elements of $\Delta$, a simple union bound gives
$$\mathbb{P}\big(\exists w \in \Delta : M^{-\gamma/3}\Gamma(w)^{-1} \le \Lambda(w) \le M^{-\gamma/4}\Gamma(w)^{-1}\big) \le N^{-D+10}. \tag{8.42}$$
From the definitions of $\Lambda(z)$, $\Gamma(z)$, and $\mathbf{S}$ (recall (6.19)), we immediately find that $\Lambda$ and $\Gamma$ are Lipschitz continuous on $\mathbf{S}$, with Lipschitz constant at most $M^2$. Hence (8.42) implies that
$$\mathbb{P}\big(\exists z \in \mathbf{S} : 2M^{-\gamma/3}\Gamma(z)^{-1} \le \Lambda(z) \le 2^{-1}M^{-\gamma/4}\Gamma(z)^{-1}\big) \le N^{-D+10};$$
i.e., a slightly smaller gap is present simultaneously for all $z \in \mathbf{S}$ and not only for the discrete lattice $\Delta$. We conclude that there is an event $\Xi$ satisfying $\mathbb{P}(\Xi) \ge 1 - N^{-D+10}$ such that, for each $z \in \mathbf{S}$, either $\mathbf{1}(\Xi)\Lambda(z) \le 2M^{-\gamma/3}\Gamma(z)^{-1}$ or $\mathbf{1}(\Xi)\Lambda(z) \ge 2^{-1}M^{-\gamma/4}\Gamma(z)^{-1}$. Since $\Lambda$ is continuous and $\mathbf{S}$ is by definition connected, we conclude that either
$$\forall z \in \mathbf{S} : \ \mathbf{1}(\Xi)\Lambda(z) \le 2M^{-\gamma/3}\Gamma(z)^{-1} \tag{8.43}$$
or
$$\forall z \in \mathbf{S} : \ \mathbf{1}(\Xi)\Lambda(z) \ge 2^{-1}M^{-\gamma/4}\Gamma(z)^{-1}. \tag{8.44}$$
Here the bounds (8.43) and (8.44) each hold surely, i.e., for every realization of $\Lambda(z)$.
It remains to show that (8.44) is impossible. In order to do so, it suffices to show that there exists a $z \in \mathbf{S}$ such that $\Lambda(z) < 2^{-1}M^{-\gamma/4}\Gamma(z)^{-1}$ with probability greater than $1/2$. But this holds for any $z$ with $\operatorname{Im} z = 2$, as follows from Lemma 8.6 and the bound $\Gamma \le C\eta^{-1}$ (6.20). This concludes the proof of Proposition 8.7.
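The continuity argument can be visualized in the spirit of Figure 8.1 by following $\eta \mapsto \Lambda(E + i\eta)$ for one sample, starting from the initial point $\operatorname{Im} z = 2$ of Lemma 8.6. The Python/NumPy sketch below assumes a GOE-type matrix with $N = M = 400$ and $E = 0.5$ (arbitrary choices) and evaluates the control parameters (8.23) through the spectral decomposition. It also checks an elementary Lipschitz bound of the kind used above: since $|\partial_\eta G_{ij}| \le \eta^{-2}$ and $|\partial_\eta m_{sc}| \le \eta^{-2}$, consecutive values of $\Lambda$ on the grid differ by at most $2\eta_{\min}^{-2}|\Delta\eta|$.

```python
import numpy as np


def m_sc(z):
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2


def control_parameters(evals, U, z):
    """(Lambda_o, Lambda_d, Lambda) of (8.23) for H = U diag(evals) U^T."""
    G = (U / (evals - z)) @ U.T                        # G = sum_a u_a u_a^T / (lambda_a - z)
    lam_o = np.abs(G - np.diag(np.diag(G))).max()      # largest off-diagonal entry
    lam_d = np.abs(np.diag(G) - m_sc(z)).max()         # largest diagonal deviation
    return lam_o, lam_d, max(lam_o, lam_d)


rng = np.random.default_rng(4)
n, E = 400, 0.5
a = rng.standard_normal((n, n))
h = (a + a.T) / np.sqrt(2 * n)
evals, U = np.linalg.eigh(h)

etas = np.linspace(2.0, 0.1, 60)                       # decreasing eta, starting at Im z = 2
path = np.array([control_parameters(evals, U, E + 1j * eta)[2] for eta in etas])

step = etas[0] - etas[1]
lipschitz = 2.0 / etas.min() ** 2
print("Lambda at eta = 2  :", path[0])
print("Lambda at eta = 0.1:", path[-1])
print("largest jump       :", np.abs(np.diff(path)).max(), "<=", lipschitz * step)
```

For a single fixed matrix the curve is deterministic; the probabilistic content of the argument lies in showing that, with high probability, it never enters the forbidden region.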

8.3.4. Fluctuation Averaging Lemma. Recall the definition of the small control parameter $\Pi$ from (6.30) and the definition of the average $[a]$ of a vector $(a_i)_{i=1}^N$ from (8.12). We have shown in Lemma 8.5 that $|\Upsilon_i| \prec \Pi$, but in fact the average $[\Upsilon]$ is one order better; it is $O_\prec(\Pi^2)$. This is due to the fluctuation averaging phenomenon, which we state as the following lemma. We will explain its proof and related earlier results in Chapter 10.
We shall perform the averaging with respect to a family of complex weights $T = (t_{ik})$ satisfying
$$0 \le |t_{ik}| \le M^{-1}, \qquad \sum_k |t_{ik}| \le 1. \tag{8.45}$$
Typical example weights are $t_{ik} = s_{ik}$ and $t_{ik} = N^{-1}$. Note that in both of these cases $T$ commutes with $S$.
Lemma 8.9 (Fluctuation averaging). Fix a spectral domain $\mathbf{D}$ and a deterministic control parameter $\Psi$ satisfying (8.20). Let the weights $T = (t_{ik})$ satisfy (8.45).
(i) If $\Lambda \prec \Psi$, then we have
$$\sum_k t_{ik}\, Q_k G_{kk} = O_\prec(\Psi^2). \tag{8.46}$$

(ii) If $\Lambda_d \prec M^{-c}$ for some $c > 0$ and $\Lambda_o \prec \Psi_o$ (with $\Psi_o$ also satisfying (8.20)), then we have
$$\sum_k t_{ik}\, Q_k \frac{1}{G_{kk}} = O_\prec(\Psi_o^2). \tag{8.47}$$
(iii) Assume that $T$ commutes with $S$ and $\Lambda \prec \Psi$. Then we have
$$\sum_k t_{ik} v_k = O_\prec(\Gamma\Psi^2). \tag{8.48}$$
If, additionally, we have
$$\sum_k t_{ik} = 1 \quad \text{for all } i, \tag{8.49}$$
then
$$\sum_k t_{ik}(v_k - [v]) = O_\prec(\tilde\Gamma\Psi^2) \quad \text{for all } i, \tag{8.50}$$
where we recall that $v_i := G_{ii} - m_{sc}$.
The estimates (8.46)–(8.48) and (8.50) are uniform in the index $i$.
Notice that the last statement (8.50) involves the parameter $\tilde\Gamma$ instead of $\Gamma$, indicating that the spectral gap of $S$ is used in its proof. We now prove a simple corollary of the bound (8.47).
Corollary 8.10. Suppose that $\Lambda_d \prec M^{-c}$ for some $c > 0$ and $\Lambda_o \prec \Psi_o$ for some deterministic control parameter $\Psi_o$ satisfying (8.20). Suppose that the weights $T = (t_{ik})$ satisfy (8.45). Then we have $\sum_k t_{ak}\Upsilon_k = O_\prec(\Psi_o^2)$.
Proof. The claim easily follows from the Schur complement formula (8.7) written in the form
$$\Upsilon_i = A_i + Q_i \frac{1}{G_{ii}},$$
which follows from (8.8) and (8.11) since $Q_i(1/G_{ii}) = h_{ii} - Z_i$. We may therefore estimate $\sum_k t_{ak}\Upsilon_k$ using the trivial bound $|A_i| \prec \Psi_o^2$ as well as the fluctuation averaging bound from (8.47). $\square$
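The gain of one order from averaging can be seen on a single sample. The Python/NumPy sketch below assumes a GOE-type matrix with the flat profile $s_{ik} = 1/N$ (so $\sum_k s_{ik} = 1$) and an arbitrarily chosen point $z = 0.3 + 0.5i$. Instead of evaluating $A_i$, $h_{ii}$, and $Z_i$ separately, it uses the rearranged form of (8.10), $\Upsilon_i = 1/G_{ii} + z + \sum_k s_{ik} G_{kk}$, and compares $\max_i |\Upsilon_i|$ with the average $|[\Upsilon]|$, i.e., with the weights $t_{ik} = 1/N$ of Corollary 8.10. This only illustrates the phenomenon; it is not a proof of Lemma 8.9.

```python
import numpy as np


def m_sc(z):
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2


rng = np.random.default_rng(7)
n, z = 1000, 0.3 + 0.5j
a = rng.standard_normal((n, n))
h = (a + a.T) / np.sqrt(2 * n)            # GOE-type sample, s_ik = 1/N off the diagonal
G = np.linalg.inv(h - z * np.eye(n))
g = np.diag(G)
s = np.full((n, n), 1.0 / n)              # flat profile, rows sum to 1

# Rearranging (8.10): Upsilon_i = 1/G_ii + z + sum_k s_ik G_kk (uses sum_k s_ik = 1).
ups = 1.0 / g + z + s @ g

eta = z.imag
Pi = np.sqrt(m_sc(z).imag / (n * eta)) + 1.0 / (n * eta)   # Pi from (8.57), with M = N
print("max_i |Upsilon_i| :", np.abs(ups).max(), " (compare Pi   =", Pi, ")")
print("|[Upsilon]|       :", abs(ups.mean()), " (compare Pi^2 =", Pi**2, ")")
```

The individual $\Upsilon_i$ fluctuate on the scale $\Pi$ (up to the logarithmic factors hidden in $\prec$), whereas their average is much smaller, in line with $[\Upsilon] = O_\prec(\Pi^2)$.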

8.3.5. The Final Iteration Scheme. First note that Proposition 8.7 guarantees that $\phi \equiv 1$ may be chosen in Lemma 8.5, since the condition $\phi\Lambda \prec M^{-c}$ is satisfied. Therefore, Lemma 8.5 asserts that $\Lambda_o$ is stochastically dominated by
$$\sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \le M^{-\varepsilon}\Lambda + \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^{\varepsilon}}{M\eta},$$
where we have used the Schwarz inequality. The next proposition, the main estimate behind the proof of Theorem 6.7, extends this estimate to bound $\Lambda_d$ with the same quantity. Thus, roughly speaking, we can estimate $\Lambda$ by $M^{-\varepsilon}\Lambda$ plus a deterministic error term. This gives a recursive relation on the upper bound for $\Lambda$, which will be the basic step of our iteration scheme.

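Proposition 8.11. Let $\Psi$ be a control parameter satisfying
$$cM^{-1/2} \le \Psi \le M^{-\gamma/3}\Gamma^{-1} \tag{8.51}$$
and fix $\varepsilon \in (0, \gamma/3)$. Then on the domain $\mathbf{S}$ we have the implication
$$\Lambda \prec \Psi \quad \Longrightarrow \quad \Lambda \prec F(\Psi), \tag{8.52}$$
where we defined
$$F(\Psi) := M^{-\varepsilon}\Psi + \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^{\varepsilon}}{M\eta}.$$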

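Proof. Suppose that $\Lambda \prec \Psi$ for some deterministic control parameter $\Psi$ satisfying (8.51). We invoke Lemma 8.5 with $\phi = 1$ (recall the bound (6.19)) to get
$$\Lambda_o + |Z_i| + |\Upsilon_i| \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}}. \tag{8.53}$$
Next, we estimate $\Lambda_d$. Define the $z$-dependent indicator function
$$\psi := \mathbf{1}(\Lambda \le M^{-\gamma/4}).$$
By (8.51), (6.19), and the assumption $\Lambda \prec \Psi$, we have $1 - \psi \prec 0$. On the event $\{\psi = 1\}$, (8.17) is rigorous and we get the bound
$$\psi|v_i| \le C\psi\Big|\sum_k s_{ik} v_k - \Upsilon_i\Big| + C\psi\Lambda^2.$$
Using the fluctuation averaging estimate (8.48) to bound $\sum_k s_{ik} v_k$ and (8.53) to bound $\Upsilon_i$, we find
$$\psi|v_i| \prec \Gamma\Psi^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}}, \tag{8.54}$$
where we used the assumption $\Lambda \prec \Psi$ and the lower bound from (6.19) so that $C\psi\Lambda^2 \prec \Gamma\Psi^2$. Since the set $\{\psi = 0\}$ has very small probability (using our notation, this is expressed by $1 - \psi \prec 0$), we conclude
$$\Lambda_d \prec \Gamma\Psi^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}}, \tag{8.55}$$
which, combined with (8.53), yields
$$\Lambda \prec \Gamma\Psi^2 + \sqrt{\frac{\operatorname{Im} m_{sc} + \Psi}{M\eta}}. \tag{8.56}$$
Using the Schwarz inequality and the assumption $\Psi \le M^{-\gamma/3}\Gamma^{-1}$ (so that $\Gamma\Psi^2 \le M^{-\gamma/3}\Psi \le M^{-\varepsilon}\Psi$), we conclude the proof. $\square$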

Finally, we complete the proof of Theorem 6.7. It is easy to check that, on the domain $\mathbf{S}$, if $\Psi$ satisfies (8.51), then so does $F(\Psi)$. In fact, this step is the origin of the somewhat complicated definition of $\eta_E$. We may therefore iterate (8.52). This yields a bound on $\Lambda$ that is essentially the fixed point of the map $\Psi \mapsto F(\Psi)$, which is given by $\Pi$, defined in (6.30) (up to the factor $M^{\varepsilon}$). More precisely, the iteration is started with $\Psi_0 := M^{-\gamma/3}\Gamma^{-1}$; the initial hypothesis $\Lambda \prec \Psi_0$ is provided by Proposition 8.7. For $k \ge 0$ we set $\Psi_{k+1} := F(\Psi_k)$. Hence, from (8.52) we conclude that $\Lambda \prec \Psi_k$ for all $k$. Choosing $k := \lceil \varepsilon^{-1} \rceil$ yields
$$\Lambda \prec \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^{\varepsilon}}{M\eta}.$$
Since $\varepsilon$ was arbitrary, we have proved that
$$\Lambda \prec \Pi = \sqrt{\frac{\operatorname{Im} m_{sc}(z)}{M\eta}} + \frac{1}{M\eta}, \tag{8.57}$$
which is (6.31).
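The iteration $\Psi_{k+1} = F(\Psi_k)$ is a deterministic recursion and can be run directly. The Python/NumPy sketch below assumes the Wigner profile (with $\Gamma$ obtained from the closed form $I + \frac{m_{sc}^2}{1 - m_{sc}^2}\frac{1}{N}\mathbf{1}\mathbf{1}^T$ of the inverse), the illustrative values $M = 10^6$, $\gamma = 0.3$, $\varepsilon = 0.05 \in (0, \gamma/3)$, and the point $z = 0.05i$, which we take to satisfy $\eta \ge \eta_E$. It shows that after $\lceil \varepsilon^{-1} \rceil$ steps the memory of $\Psi_0$ is damped by $M^{-k\varepsilon} \le M^{-1}$, and $\Psi_k$ is within a factor of order $M^{\varepsilon}$ of $\Pi$.

```python
import numpy as np


def m_sc(z):
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2


def gamma_wigner(z, n):
    # Gamma(z) for S = ones((n, n)) / n, from the exact inverse I + c * ones / n.
    m2 = m_sc(z) ** 2
    c = m2 / (1 - m2)
    return abs(1 + c / n) + (n - 1) * abs(c) / n


M, gam, eps = 10**6, 0.3, 0.05           # illustrative parameters with 0 < eps < gam / 3
z = 0.05j                                # assumed to lie in S, i.e. eta >= eta_E
eta = z.imag
a = np.sqrt(m_sc(z).imag / (M * eta))    # sqrt(Im m_sc / (M eta))
b = 1.0 / (M * eta)                      # 1 / (M eta)
Pi = a + b                               # control parameter Pi of (8.57)


def F(psi):
    """The map of Proposition 8.11: M^-eps Psi + sqrt(Im m_sc/(M eta)) + M^eps/(M eta)."""
    return M ** (-eps) * psi + a + M**eps * b


psi0 = M ** (-gam / 3) / gamma_wigner(z, M)   # Psi_0 = M^{-gamma/3} / Gamma (Proposition 8.7)
k = int(np.ceil(1 / eps))
psi = psi0
for _ in range(k):
    psi = F(psi)

print("Psi_0 =", psi0, " Psi_k =", psi, " Pi =", Pi, " ratio =", psi / Pi)
```

Since $F$ is affine with slope $M^{-\varepsilon}$, the iterates have the closed form $\Psi_k = M^{-k\varepsilon}\Psi_0 + \frac{1 - M^{-k\varepsilon}}{1 - M^{-\varepsilon}}\big(\sqrt{\operatorname{Im} m_{sc}/(M\eta)} + M^{\varepsilon}/(M\eta)\big)$, which is what the loop reproduces.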
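What remains is to prove (6.32). On the set $\{\psi = 1\}$, we once again use (8.17) to get
$$\psi\, m_{sc}^2\Big(-\sum_k s_{ik} v_k + \Upsilon_i\Big) = -\psi v_i + O(\psi\Lambda^2). \tag{8.58}$$
Averaging in (8.58) (recall that $S$ is symmetric with row sums 1, so that $[S\mathbf{v}] = [v]$) yields
$$\psi\, m_{sc}^2\big(-[v] + [\Upsilon]\big) = -\psi[v] + O(\psi\Lambda^2). \tag{8.59}$$
By (8.57) and (8.53) with $\Psi = \Pi$, we have $\Lambda + |\Upsilon_i| \prec \Pi$. Moreover, by Corollary 8.10 (with the choice $t_{ik} = \frac{1}{N}$), we have $|[\Upsilon]| \prec \Pi^2$. Thus, we get
$$\psi[v] = m_{sc}^2\,\psi[v] + O_\prec(\Pi^2).$$
Since $1 - \psi \prec 0$, we conclude that $[v] = m_{sc}^2[v] + O_\prec(\Pi^2)$. Therefore,
$$|[v]| \prec \frac{\Pi^2}{|1 - m_{sc}^2|} \le \Big(\frac{\operatorname{Im} m_{sc}}{|1 - m_{sc}^2|} + \frac{1}{|1 - m_{sc}^2|\, M\eta}\Big)\frac{2}{M\eta} \le \Big(C + \frac{\Gamma}{M\eta}\Big)\frac{2}{M\eta} \le \frac{C}{M\eta}. \tag{8.60}$$
In the third step, we used the elementary explicit bound $\operatorname{Im} m_{sc} \le C|1 - m_{sc}^2|$ from (6.12)–(6.13) and the bound $\Gamma \ge |1 - m_{sc}^2|^{-1}$ from (6.21); in the last step we used $\Gamma/(M\eta) \le M^{-\gamma}\Gamma^{-2} \le C$, which follows from (8.40) and (6.19). Because $|m_N - m_{sc}| = |[v]|$, this concludes the proof of (6.32). The proof of (6.33) is exactly the same, just in the third inequality of (8.60) we may use the stronger bound $\operatorname{Im} m_{sc} \asymp \eta/\sqrt{\kappa + \eta}$ from (6.13) in the regime $|E| \ge 2$. This completes the proof of Theorem 6.7 in the entire regime $\mathbf{S}$, i.e., for $\eta \ge \eta_E$.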

8.3.6. Summary of the Proof: A Recapitulation. In order to highlight the main ideas again, we summarize the key steps leading to the proof of Theorem 6.7 restricted to the regime $\eta \ge \eta_E$. As a guiding principle we stress that the main control parameter in the proof is $\Lambda = \max\{\max_i |G_{ii} - m_{sc}|, \max_{i \neq j} |G_{ij}|\}$. The main goal is to get a closed, self-improving inequality for $\Lambda$, at least with very high probability. We needed the following ingredients:
(1) A self-consistent system of equations for the diagonal elements of the resolvent, written as
$$-(S\mathbf{v})_i + \Upsilon_i = \frac{1}{m_{sc} + v_i} - \frac{1}{m_{sc}}, \qquad \text{where } v_i = G_{ii} - m_{sc}. \tag{8.61}$$
Since $\Upsilon_i$ contains the resolvent $G^{(i)}$, we also need perturbation formulas connecting $G$ and $G^{(i)}$ to make these equations self-consistent.
(2) A large-deviation bound on the off-diagonal elements and on the fluctuation, i.e., on $\Lambda_o + |Z_i|$, as
$$\Lambda_o + |Z_i| + |\Upsilon_i| \prec \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}}. \tag{8.62}$$
(3) An initial estimate on $\Lambda$ for large $\eta$, where the a priori bounds $|G_{ij}| \le \eta^{-1} \le C$ are effective.
(4) A dichotomy, showing that there is a forbidden region for $\Lambda$:
$$\mathbf{1}\big(\Lambda \le M^{-\gamma/4}\Gamma^{-1}\big)\Lambda \prec M^{-\gamma/2}\Gamma^{-1}.$$
(5) A crude bound on $\Lambda$ down to small $\eta$, obtained from the dichotomy and the initial estimate via a continuity argument.
(6) Application of the fluctuation averaging lemma, i.e., Lemma 8.9, to estimate $S\mathbf{v}$ in the self-consistent equation. Thus $S\mathbf{v}$ becomes of order $\Lambda^2$, i.e., one order higher than the trivial bound $|v_i| \le \Lambda$. This “boost” exploits the cancellations in averages of weakly correlated centered random variables, and it constitutes the crucial improvement over Chapter 7. Thus, we can use (8.17) to have
$$v_i \lesssim \Upsilon_i + O(\Lambda^2). \tag{8.63}$$
(7) Combination of the large-deviation bound (8.62) for $\Upsilon_i$ with (8.63) provides an estimate on $\Lambda_d = \max_i |v_i|$ in terms of $\Lambda$ that is better than the trivial bound $\Lambda_d \le \Lambda$. Together with the large-deviation bound on $\Lambda_o$ in (8.62), we have a closed inequality for $\Lambda$:
$$\Lambda \le M^{-\varepsilon}\Lambda + \sqrt{\frac{\operatorname{Im} m_{sc}}{M\eta}} + \frac{M^{\varepsilon}}{M\eta}.$$
The error term $M^{-\varepsilon}\Lambda$ can be absorbed into the left side, and we have proved the estimate on $\Lambda$ asserted in Theorem 6.7. In practice, the last inequality was formulated in terms of a control parameter and we used an iteration scheme.

(8) Averaging the self-consistent equation (8.17) to obtain a stronger estimate on the average quantity $[v]$. Fluctuation averaging is used again, this time applied to the average of $\Upsilon$, so that it becomes one order higher. This concludes the proof of Theorem 6.7 for $\eta \ge \eta_E$.
CHAPTER 9

Sketch of the Proof of the Local Semicircle Law Using the Spectral Gap

In Section 8.3 we proved the local semicircle law, Theorem 6.7, uniformly for $\eta \ge \eta_E$ instead of the larger regime $\eta \ge \tilde\eta_E$. Recall that for generalized Wigner matrices the two thresholds $\eta_E$ and $\tilde\eta_E$ are determined by the relations $N\eta_E(\kappa_E + \eta_E)^{3/2} \gg 1$ and $\tilde\eta_E \gg 1/N$, respectively. Hence these two thresholds coincide in the bulk but substantially differ near the edge. In this section we sketch the proof of Theorem 6.7 for any $\eta \ge \tilde\eta_E$, i.e., for the optimal domain for $\eta$. For the complete proof we refer to Section 6 of [55]. This section can be read independently of Section 8.3.
We point out that the difference between these two thresholds stems from the difference between $\Gamma$ and $\tilde\Gamma$ (see (6.28) and (8.19)). The bound $\Gamma$ on the norm of $(1 - m_{sc}^2 S)^{-1}$ entered the proof when the self-consistent equation (8.17) was solved. The key idea in this section is that we can use $\tilde\Gamma$ to solve the self-consistent equation (8.17) separately on the subspace of constants (the span of the vector $\mathbf{e}$) and on its orthogonal complement $\mathbf{e}^\perp$.
Step 1. We bound the control parameter Λ = Λ(𝑧) from (8.23) in terms of
Θ ∶= |𝑚𝑁 − 𝑚sc |. This is the content of the following lemma. From now on,
we assume that 𝑐 ≤ Γ̃ ≤ 𝐶, which is valid for generalized Wigner matrices (see
(6.24)). This will simplify several estimates in the argument that follows; for the
general case, we refer to [55].
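Lemma 9.1. Define the $z$-dependent indicator function
$$\phi := \mathbf{1}(\Lambda \le M^{-\gamma/4}) \tag{9.1}$$
and the random control parameter
$$q(\Theta) := \sqrt{\frac{\operatorname{Im} m_{sc} + \Theta}{M\eta}} + \frac{1}{M\eta}, \qquad \Theta = |m_N - m_{sc}|. \tag{9.2}$$
Then we have
$$\phi\Lambda \prec \Theta + q(\Theta). \tag{9.3}$$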
Proof. For the whole proof we work on the event $\{\phi = 1\}$; i.e., every quantity is multiplied by $\phi$. We consistently drop these factors $\phi$ from our notation in order to avoid cluttered expressions. In particular, we assume $\Lambda \le M^{-\gamma/4}$ throughout the proof.

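We begin by estimating $\Lambda_o$ and $\Lambda_d$ in terms of $\Theta$. Recalling (6.19), we find that $\phi$ satisfies the hypotheses of Lemma 8.5, from which we get
$$\Lambda_o + |\Upsilon_i| \prec r(\Lambda), \qquad r(\Lambda) := \sqrt{\frac{\operatorname{Im} m_{sc} + \Lambda}{M\eta}}. \tag{9.4}$$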
In order to estimate $\Lambda_d$, from (8.17) we have, on the event $\{\phi = 1\}$, that
$$v_i - m_{sc}^2 \sum_k s_{ik} v_k = O_\prec\big(\Lambda^2 + r(\Lambda)\big); \tag{9.5}$$
here we used the bound (9.4) on $|\Upsilon_i|$. Next, we subtract the average $N^{-1}\sum_i$ from each side to get
$$(v_i - [v]) - m_{sc}^2 \sum_k s_{ik}(v_k - [v]) = O_\prec\big(\Lambda^2 + r(\Lambda)\big),$$
where we used $\sum_k s_{ik} = 1$. Note that the average over $i$ of the left-hand side vanishes, so that the average of the right-hand side also vanishes. Hence, the right-hand side is perpendicular to $\mathbf{e}$. Inverting the operator $1 - m_{sc}^2 S$ on the subspace $\mathbf{e}^\perp$ and using $\tilde\Gamma \le C$ therefore yields
$$|v_i - [v]| \prec \Lambda^2 + r(\Lambda). \tag{9.6}$$
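Combining this with the bound $\Lambda_o \prec r(\Lambda)$ from (9.4) and recalling that $\Theta = |[v]|$, we therefore get
$$\Lambda \prec \Theta + \Lambda^2 + r(\Lambda). \tag{9.7}$$
By definition of $\phi$ we have $\Lambda^2 \le M^{-\gamma/4}\Lambda$, so that the second term on the right-hand side of (9.7) may be absorbed into the left-hand side,
$$\Lambda \prec \Theta + r(\Lambda), \tag{9.8}$$
where we used the cancellation property (6.25). Using (9.8) and the Cauchy–Schwarz inequality in the form $\sqrt{ab} \le M^{-\epsilon}a + M^{\epsilon}b$, we get
$$r(\Lambda) = \sqrt{\frac{\operatorname{Im} m_{\mathrm{sc}} + \Lambda}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m_{\mathrm{sc}} + \Theta}{M\eta}} + \sqrt{\frac{r(\Lambda)}{M\eta}} \le \sqrt{\frac{\operatorname{Im} m_{\mathrm{sc}} + \Theta}{M\eta}} + M^{-\epsilon}r(\Lambda) + \frac{M^{\epsilon}}{M\eta}$$
for any $\epsilon > 0$. Absorbing the term $M^{-\epsilon}r(\Lambda)$ into the left-hand side gives $r(\Lambda) \prec M^{\epsilon}q(\Theta)$. Since $\epsilon > 0$ can be arbitrarily small, we conclude that
$$r(\Lambda) \prec q(\Theta). \tag{9.9}$$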
Clearly, (9.3) follows from (9.8) and (9.9). □
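A small numerical sketch shows why the inversion in the proof above is performed on $\mathbf e^\perp$. Near the spectral edge $m_{\mathrm{sc}}^2 \approx 1$, so $1 - m_{\mathrm{sc}}^2 S$ is nearly singular in the direction $\mathbf e$ (because $S\mathbf e = \mathbf e$), while on $\mathbf e^\perp$ its inverse stays of order one. The variance profile $s_{ij} = (1 + \tfrac12\cos(2\pi(i-j)/N))/N$ is a hypothetical example, chosen only because it is symmetric and doubly stochastic. The $\ell^2$ operator norm serves as a convenient proxy here; the quantities $\Gamma$ and $\widetilde\Gamma$ used in the text may be defined with a different norm.

```python
import numpy as np

def m_sc(z):
    """Semicircle Stieltjes transform, branch with Im m_sc > 0 for Im z > 0."""
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

def inverse_norms(N, z):
    """Spectral norms of (1 - m^2 S)^{-1} on C^N and restricted to e-perp.

    Toy variance profile (an assumption): s_ij = (1 + 0.5 cos(2 pi (i-j)/N)) / N.
    """
    idx = np.arange(N)
    S = (1 + 0.5 * np.cos(2 * np.pi * (idx[:, None] - idx[None, :]) / N)) / N
    R = np.linalg.inv(np.eye(N) - m_sc(z) ** 2 * S)
    e = np.ones(N) / np.sqrt(N)
    proj_perp = np.eye(N) - np.outer(e, e)          # orthogonal projection onto e-perp
    full = np.linalg.norm(R, 2)
    restricted = np.linalg.norm(R @ proj_perp, 2)   # R commutes with proj_perp since S e = e
    return full, restricted

full, restricted = inverse_norms(50, 2.0 + 1e-4j)    # energy at the edge, tiny eta
print(f"full space: {full:.1f}   on e-perp: {restricted:.2f}")
```

With this profile the eigenvalues of $S$ are $1$ (on $\mathbf e$), $1/4$ (twice) and $0$. Only the eigenvalue $1$ produces the blow-up $|1 - m_{\mathrm{sc}}^2|^{-1} \asymp (\kappa+\eta)^{-1/2}$ seen in (9.14).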

Step 2. Having estimated $\Lambda$ in terms of $\Theta$, we can derive a closed relation involving only $[v]$ and $\Theta = |[v]|$. More precisely, we will show that the following self-consistent equation holds:
$$\phi\big((1 - m_{\mathrm{sc}}^2)[v] - m_{\mathrm{sc}}^{-1}[v]^2\big) = \phi\,O_\prec\big(q(\Theta)^2 + M^{-\gamma/4}\Theta^2\big). \tag{9.10}$$

Considering the right-hand side as a small error, we will view (9.10) as a small
perturbation of a quadratic equation for [𝑣]. Recalling that Θ = |[𝑣]|, the error
term can be determined self-consistently. Thus, up to the accuracy of the error
terms, we can solve (9.10) for [𝑣].
We now give a heuristic proof of (9.10). There are two main issues that we ignore in this sketch. First, we work only on the event $\{\phi = 1\}$; i.e., we assume that an a priori bound $\Lambda \ll 1$ has already been proved uniformly for all $\eta \ge \tilde\eta_E$. A separate argument is required to show that the complementary event $\{\phi = 0\}$ is negligible; in some sense, this constitutes the essential part of extending the proof of Theorem 6.7 from $\eta \ge \eta_E$ to the larger regime $\eta \ge \tilde\eta_E$. Second, we neglect certain subtleties of the $\prec$ relation. While most arithmetic rules for $\prec$ work in the same way as for the usual inequality $\le$, the cancellation rule (6.25) requires the coefficient of $X$ on the right-hand side to be small. We disregard this requirement below and treat $\prec$ as the usual $\le$.
Since we work on the event $\{\phi = 1\}$, it is understood that every quantity below is multiplied by $\phi$, but we will ignore this fact in the formulas. Recall from (8.17) that
$$v_i - m_{\mathrm{sc}}^2\sum_k s_{ik}v_k + m_{\mathrm{sc}}^2\Upsilon_i = m_{\mathrm{sc}}^{-1}v_i^2 + O(\Lambda^3). \tag{9.11}$$
In order to take the average over $i$ and get a closed equation for $[v]$, we write, using (9.6),
$$v_i^2 = \big([v] + v_i - [v]\big)^2 = [v]^2 + 2[v](v_i - [v]) + O_\prec\big((\Lambda^2 + r(\Lambda))^2\big).$$

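Plugging this back into (9.11) and taking the average over $i$ (the cross term averages to zero) gives
$$(1 - m_{\mathrm{sc}}^2)[v] + m_{\mathrm{sc}}^2[\Upsilon] = m_{\mathrm{sc}}^{-1}[v]^2 + O_\prec\big(\Lambda^3 + r(\Lambda)^2\big).$$
By Corollary 8.10 with $t_{ik} = 1/N$, we can estimate
$$[\Upsilon] \prec \Lambda_o^2 \prec r(\Lambda)^2 \prec q(\Theta)^2, \tag{9.12}$$
where we have used (8.24) and (9.9). Hence, we have
$$(1 - m_{\mathrm{sc}}^2)[v] - m_{\mathrm{sc}}^{-1}[v]^2 = O_\prec\big(\Lambda^3 + r(\Lambda)^2\big) \le O_\prec\big(\Theta^3 + q(\Theta)^2\big),$$
where we have used (9.3). Since $\Theta \le \Lambda \le M^{-\gamma/4}$, we have $\Theta^3 \le M^{-\gamma/4}\Theta^2$, and this shows (9.10).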


Step 3. Finally, we need to solve the approximately quadratic relation (9.10)
for [𝑣], keeping in mind that Θ = |[𝑣]|. This will give the main estimates (6.32)
and (6.33) in Theorem 6.7. The entrywise version of the local semicircle law,
(6.31), will then directly follow from (9.3) on the event {𝜙 = 1}.
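Recall the definition of $q$ from (9.2), so that
$$q(\Theta)^2 \asymp \frac{\operatorname{Im} m_{\mathrm{sc}} + \Theta}{M\eta} + \frac{1}{(M\eta)^2}, \tag{9.13}$$
and recall from Lemma 6.2 that
$$c \le |m_{\mathrm{sc}}(z)| \le 1 - c\eta, \qquad |1 - m_{\mathrm{sc}}^2(z)| \asymp \sqrt{\kappa+\eta}, \qquad \operatorname{Im} m_{\mathrm{sc}}(z) \asymp \begin{cases}\sqrt{\kappa+\eta} & \text{if } |E| \le 2,\\[2pt] \dfrac{\eta}{\sqrt{\kappa+\eta}} & \text{if } |E| \ge 2.\end{cases} \tag{9.14}$$
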
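In particular, the coefficient $-m_{\mathrm{sc}}^{-1}$ of $[v]^2$ on the left-hand side of (9.10) has modulus at least one, since $|m_{\mathrm{sc}}| \le 1$. On the right-hand side, the coefficient of $\Theta^2$ is at most $M^{-\gamma/4} \ll 1$, so the latter can be absorbed into the former. Similarly, the term $\Theta/(M\eta)$ contained in $q(\Theta)^2$ is absorbed into the linear term. We can thus rewrite (9.10) as
$$[v]^2 - B[v] = D, \tag{9.15}$$
where
$$B = \frac{1 - m_{\mathrm{sc}}^2 + O((M\eta)^{-1})}{m_{\mathrm{sc}}^{-1} + O(M^{-\gamma/4})} \asymp 1 - m_{\mathrm{sc}}^2 + O((M\eta)^{-1}), \qquad |D| \le \frac{\frac{\operatorname{Im} m_{\mathrm{sc}}}{M\eta} + \frac{1}{(M\eta)^2}}{\big|m_{\mathrm{sc}}^{-1} + O(M^{-\gamma/4})\big|} \asymp \frac{\operatorname{Im} m_{\mathrm{sc}}}{M\eta} + \frac{1}{(M\eta)^2}.$$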
This quadratic equation has two solutions
$$[v] = \frac{B \pm \sqrt{B^2 + 4D}}{2}; \tag{9.16}$$
the correct one is selected by a continuity argument. Fix an energy $|E| \le 2$, consider $[v] = [v(E + i\eta)]$ as a function of $\eta$, and regard the two solution branches (9.16) as continuous functions of $\eta$.
It is easy to check that for large $\eta \gg 1$ we have $|B| \asymp 1$ and $|D| \ll 1$, so the two solutions are well separated: one of them is close to $0$, the other one is close to $B$. The a priori bound $|[v]| \le |m_N| + |m_{\mathrm{sc}}| \le 2\eta^{-1} \ll 1$ guarantees that the correct branch is the one near $0$. A more careful analysis of (9.16) with the given coefficient functions $B(\eta)$ and $D(\eta)$ shows that on this branch the following inequality holds:
$$\Theta = |[v]| \lesssim (M\eta)^{-1}. \tag{9.17}$$
To see this more precisely, we distinguish two regimes as $\eta$ is reduced from a very large value to $0$. In the first regime, where $M\eta\sqrt{\kappa+\eta} \gg 1$, we have $|B| \asymp \sqrt{\kappa+\eta}$ and $|D| \lesssim \sqrt{\kappa+\eta}/(M\eta) \ll |B|^2$, so the two branches remain separated and the relevant branch satisfies $\Theta \lesssim |D/B| \lesssim (M\eta)^{-1}$. As we decrease $\eta$ and enter the regime $M\eta\sqrt{\kappa+\eta} \lesssim 1$, both solutions satisfy $\Theta \lesssim |B| + \sqrt{|D|}$. In this regime, $|B| \lesssim (M\eta)^{-1}$ and $|D| \lesssim (M\eta)^{-2}$, so we also have (9.17). In both cases we conclude the final bound (6.32). The proof of (6.33) for the regime $|E| > 2$ is similar. The improvement comes from the fact that in this regime (9.14) gives
a better bound for Im 𝑚sc (𝑧).
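The continuity argument of Step 3 can be imitated in a toy computation. This sketch rests on stated assumptions and is not part of the proof. We drop the $O(\cdot)$ corrections in $B$, so that $B = m_{\mathrm{sc}}(1 - m_{\mathrm{sc}}^2)$, and we replace $D$ by its upper bound $\operatorname{Im} m_{\mathrm{sc}}/(M\eta) + (M\eta)^{-2}$. We also cover only the bulk regime $M\eta\sqrt{\kappa+\eta} \gg 1$ at $E = 0$. The code follows the root of (9.16) that starts near $0$ at $\eta = 3$, at each step picking the root closest to the previous one, and reports $\Theta\cdot M\eta$, which (9.17) predicts to stay of order one.

```python
import numpy as np

def m_sc(z):
    """Semicircle Stieltjes transform, branch with Im m_sc > 0 for Im z > 0."""
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

def small_branch(E, M, etas):
    """Follow the root of [v]^2 - B[v] = D that starts near 0, by continuity in eta.

    Toy assumptions (not from the text): the O(.) corrections in B are dropped,
    so B = m_sc (1 - m_sc^2), and D equals its upper bound Im m_sc/(M eta) + 1/(M eta)^2.
    """
    prev, thetas = 0.0, []
    for eta in etas:                                  # etas must be decreasing
        m = m_sc(E + 1j * eta)
        B = m * (1 - m**2)
        D = m.imag / (M * eta) + 1 / (M * eta) ** 2
        disc = np.sqrt(B * B + 4 * D)
        roots = ((B + disc) / 2, (B - disc) / 2)      # the two solutions (9.16)
        prev = min(roots, key=lambda r: abs(r - prev))  # continuity selects the branch
        thetas.append(abs(prev))
    return np.array(thetas)

M = 1000
etas = np.geomspace(3.0, 10 / M, 400)                 # from eta = 3 down to M*eta = 10
theta = small_branch(0.0, M, etas)
ratio = theta * M * etas                               # Theta / (M eta)^{-1}, cf. (9.17)
print(f"max of Theta * M * eta along the branch: {ratio.max():.2f}")
```

Computing both roots explicitly, rather than relying on one branch of the complex square root, keeps the continuity selection independent of where `np.sqrt` places its branch cut.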
CHAPTER 10

Fluctuation Averaging Mechanism

10.1. Intuition Behind the Fluctuation Averaging


Before starting the proof of Lemma 8.9, we explain why it is so important
in our context. We also give a heuristic second moment calculation to show the
essential mechanism.
The leading error in the self-consistent equation (8.58) for 𝑣𝑖 is Υ𝑖 . Among
the three summands of Υ𝑖 (see (8.11)), typically 𝑍𝑖 is the largest; thus, we typi-
cally have
(10.1) 𝑣𝑖 ≺ 𝑍𝑖
in the regime where Γ is bounded. The large-deviation bounds in Theorem 7.7
show that 𝑍𝑖 ≺ Λo , and a simple second-moment calculation shows that this
bound is essentially optimal. On the other hand, (8.3) shows that the typical
size of the off-diagonal resolvent matrix elements is at least of order (𝑁𝜂)−1/2 ;
thus, the estimate
$$Z_i \prec \Lambda_o \prec \frac{1}{\sqrt{N\eta}}$$
is essentially optimal in the generalized Wigner case ($M \asymp N$). Together with (10.1) this shows that the natural bound for $\Lambda$ is $(N\eta)^{-1/2}$, which is also reflected in the bound (6.31).
However, the bound (6.32) for the average, $m_N - m_{\mathrm{sc}} = [v]$, is of order $\Lambda^2 \asymp (N\eta)^{-1}$; i.e., it is the square of the bound for $v_i$. For the purpose of estimating $[v]$, it is the average $[\Upsilon]$ of the leading errors $\Upsilon_i$ that matters (see (8.59)). Since $Z_i$, the leading term in $\Upsilon_i$, is a fluctuating quantity with zero expectation, the improvement comes from the fact that fluctuations cancel out in the average. The basic example of this phenomenon is the central limit theorem: the average of $N$ independent centered random variables is typically of order $N^{-1/2}$, much smaller than the size of each individual variable. In our case, however, the $Z_i$ are not independent. In fact, their correlations do not decay, at least in the Wigner case, where every column of $H$ with an index other than $i$ plays the same role in the function $Z_i$. Thus, standard results on central limit theorems for weakly correlated random variables do not apply here and a new argument is needed. We first illustrate the fluctuation averaging mechanism by a second-moment calculation.
First, we claim that
$$\Big|Q_k\frac{1}{G_{kk}}\Big| \prec \Psi_o. \tag{10.2}$$

Indeed, from the Schur complement formula (8.7) we get $|Q_k(G_{kk})^{-1}| \le |h_{kk}| + |Z_k|$. The first term is estimated by $|h_{kk}| \prec M^{-1/2} \le \Psi_o$. The second term is estimated exactly as in (8.29) and (8.30), giving $|Z_k| \prec \Psi_o$. In fact, the same bound holds if $G_{kk}$ is replaced with $G^{(\mathbb T)}_{kk}$ as long as the cardinality of $\mathbb T$ is bounded (see (10.10) later).
Abbreviate $X_k := Q_k(G_{kk})^{-1}$ and compute the variance:
$$\mathbb E\Big|\sum_k t_{ik}X_k\Big|^2 = \sum_{k,l}t_{ik}\overline{t_{il}}\,\mathbb E X_k\overline{X_l} = \sum_k |t_{ik}|^2\,\mathbb E|X_k|^2 + \sum_{k\ne l}t_{ik}\overline{t_{il}}\,\mathbb E X_k\overline{X_l}. \tag{10.3}$$
Using the bounds (8.45) on $t_{ik}$ and (10.2), we find that the first term on the right-hand side of (10.3) is $O_\prec(M^{-1}\Psi_o^2)$, which is $O_\prec(\Psi_o^4)$ since $\Psi_o \ge M^{-1/2}$ by the admissibility condition (8.20). Let us therefore focus on the second term of (10.3). Using the fact that $k \ne l$, we apply the first resolvent decoupling formula (8.1) to $X_k$ and $X_l$ to get
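$$\mathbb E X_k\overline{X_l} = \mathbb E\,Q_k\Big(\frac{1}{G_{kk}}\Big)\overline{Q_l\Big(\frac{1}{G_{ll}}\Big)} = \mathbb E\,Q_k\Big(\frac{1}{G^{(l)}_{kk}} - \frac{G_{kl}G_{lk}}{G_{kk}G^{(l)}_{kk}G_{ll}}\Big)\overline{Q_l\Big(\frac{1}{G^{(k)}_{ll}} - \frac{G_{lk}G_{kl}}{G_{ll}G^{(k)}_{ll}G_{kk}}\Big)}. \tag{10.4}$$
Notice that we used (8.1) in the form
$$\frac{1}{G^{(\mathbb T)}_{kk}} = \frac{1}{G^{(\mathbb Tl)}_{kk}} - \frac{G^{(\mathbb T)}_{kl}G^{(\mathbb T)}_{lk}}{G^{(\mathbb T)}_{kk}G^{(\mathbb Tl)}_{kk}G^{(\mathbb T)}_{ll}} \qquad \text{for any } k \ne l,\ k, l \notin \mathbb T, \tag{10.5}$$
with $\mathbb T = \emptyset$. We multiply out the parentheses on the right-hand side of (10.4).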
The crucial observation is that if the random variable $Y$ is independent of $i$ (see Definition 7.3), then not only $Q_iY = 0$ but also $\mathbb E Q_i(X)Y = \mathbb E Q_i(XY) = 0$ hold for any $X$. Since $1/G^{(l)}_{kk}$ is independent of $l$ and $1/G^{(k)}_{ll}$ is independent of $k$, out of the four terms obtained from the right-hand side of (10.4), the only nonvanishing one is
$$\mathbb E\,Q_k\Big(\frac{G_{kl}G_{lk}}{G_{kk}G^{(l)}_{kk}G_{ll}}\Big)\overline{Q_l\Big(\frac{G_{lk}G_{kl}}{G_{ll}G^{(k)}_{ll}G_{kk}}\Big)} \prec \Psi_o^4,$$
where we used that the denominators are harmless (see (8.27)–(8.28)). Together with (8.45), this concludes the proof of $\mathbb E|\sum_k t_{ik}X_k|^2 \prec \Psi_o^4$, which means that $\sum_k t_{ik}X_k$ is bounded by $\Psi_o^2$ in the second-moment sense.
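The gap between the two scales discussed in this section can be observed numerically. The following sketch samples a single GOE matrix (a Wigner matrix, so $M = N$), computes $v_i = G_{ii} - m_{\mathrm{sc}}$ and the average $[v] = m_N - m_{\mathrm{sc}}$ at $z = i\eta$, and compares them with $(N\eta)^{-1/2}$ and $(N\eta)^{-1}$. The ensemble, seed, $N$ and $\eta$ are assumptions made for illustration; on a single sample one should expect agreement only up to constant factors.

```python
import numpy as np

def m_sc(z):
    """Semicircle Stieltjes transform, branch with Im m_sc > 0 for Im z > 0."""
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

def goe(N, rng):
    """Real symmetric Wigner matrix with E h_ij^2 = 1/N off the diagonal."""
    A = rng.standard_normal((N, N))
    return (A + A.T) / np.sqrt(2 * N)

rng = np.random.default_rng(1)
N, E, eta = 1000, 0.0, 0.1
z = E + 1j * eta
G = np.linalg.inv(goe(N, rng) - z * np.eye(N))
v = np.diag(G) - m_sc(z)            # v_i = G_ii - m_sc
avg = v.mean()                       # [v] = m_N - m_sc
print(f"typical |v_i| = {np.abs(v).mean():.3f}   (N eta)^(-1/2) = {(N * eta) ** -0.5:.3f}")
print(f"|[v]|         = {abs(avg):.4f}  (N eta)^(-1)   = {1 / (N * eta):.4f}")
```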

10.2. Proof of Lemma 8.9


Here we follow the presentation of the proof from [55], and we will com-
ment on previous results in Section 10.3.1. We start with a simple lemma that
summarizes the key properties of ≺ when combined with partial expectation.
Lemma 10.1. Suppose that the deterministic control parameter $\Psi$ satisfies $\Psi \ge N^{-C}$, and that for all $p$ there is a constant $C_p$ such that the nonnegative random variable $X$ satisfies $\mathbb E X^p \le N^{C_p}$. Suppose, moreover, that $X \prec \Psi$. Then, for any fixed $n \in \mathbb N$ we have
$$\mathbb E X^n \prec \Psi^n. \tag{10.6}$$
(Note that this estimate involves deterministic quantities only; i.e., it means that $\mathbb E X^n \le N^\varepsilon\Psi^n$ for any $\varepsilon > 0$ if $N \ge N_0(n, \varepsilon)$.) Conversely, if (10.6) holds for any $n \in \mathbb N$, then $X \prec \Psi$. Moreover, we have
$$P_iX^n \prec \Psi^n, \qquad Q_iX^n \prec \Psi^n, \tag{10.7}$$
uniformly in $i$. If $X = X(u)$ and $\Psi = \Psi(u)$ depend on some parameter $u$, and the above assumptions are uniform in $u$, then so are the conclusions.

This lemma is a generalization of (6.26), but notice that the majoring quan-
tity Ψ has to be deterministic. For general random variables, 𝑋 ≺ 𝑌 may not
imply the analogous relation 𝔼[𝑋|ℱ] ≺ 𝔼[𝑌|ℱ] for conditional expectations
with respect to a 𝜎-algebra ℱ.

Proof of Lemma 10.1. To prove (10.6), it is enough to consider the case $n = 1$; the case of larger $n$ follows immediately by applying the case $n = 1$ to $X^n$, using the basic property of stochastic domination that $X \prec \Psi$ implies $X^n \prec \Psi^n$. For the first claim, pick $\epsilon > 0$. Then
$$\mathbb E X = \mathbb E X\cdot\mathbf 1(X \le N^\epsilon\Psi) + \mathbb E X\cdot\mathbf 1(X > N^\epsilon\Psi) \le N^\epsilon\Psi + \sqrt{\mathbb E X^2}\sqrt{\mathbb P(X > N^\epsilon\Psi)} \le N^\epsilon\Psi + N^{C_2/2 - D/2}$$
for arbitrary $D > 0$ and $N \ge N_0(\epsilon, D)$. Since $\Psi \ge N^{-C}$, the first claim follows by choosing $D$ large enough. For the converse statement, we use the Chebyshev inequality: for any $\varepsilon > 0$ and $D > 0$, using (10.6) in the form $\mathbb E X^n \le N^\varepsilon\Psi^n$,
$$\mathbb P(X \ge N^\varepsilon\Psi) \le \frac{\mathbb E X^n}{N^{\varepsilon n}\Psi^n} \le N^{\varepsilon - \varepsilon n} \le N^{-D}$$
by choosing $n$ large enough.
Finally, the claim (10.7) follows from Chebyshev's inequality, using a high-moment estimate combined with Jensen's inequality for partial expectation. We omit the details, which are similar to those of the first claim. □

We remark that it is tempting to generalize the converse of (10.6) and try to verify stochastic domination 𝑋 ≺ 𝑌 between two random variables by proving
that 𝔼𝑋 𝑛 ≤ 𝔼𝑌 𝑛 . This implication in general is wrong, but it is correct if 𝑌 is
deterministic. This subtlety is the reason why many of the following theorems
about two random variables 𝐴 and 𝐵 are formulated in such a way that “if 𝐴
satisfies 𝐴 ≺ Ψ for some deterministic Ψ, then 𝐵 ≺ Ψ,” instead of directly saying
the more natural (but sometimes wrong) assertion that 𝐴 ≺ 𝐵.
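Here is a minimal toy example (our own, not from the text) showing why the deterministic majorant matters. If $X$ and $Y$ are i.i.d. and uniform on $\{1, N\}$, then $\mathbb E X^n = \mathbb E Y^n$ for every $n$. Yet $\mathbb P(X > N^\epsilon Y) = 1/4$ does not decay, so $X \prec Y$ fails. The sketch checks this with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def counterexample(N, eps=0.5):
    """X, Y i.i.d. uniform on {1, N}: equal moments, yet P(X > N^eps * Y) = 1/4."""
    support = [1, N]
    prob = Fraction(1, 4)                       # weight of each of the 4 pairs (x, y)
    pairs = list(product(support, support))
    moments_equal = all(
        sum(prob * x**n for x, _ in pairs) == sum(prob * y**n for _, y in pairs)
        for n in range(1, 10)
    )
    p_bad = sum(prob for x, y in pairs if x > N**eps * y)
    return moments_equal, p_bad

print(counterexample(10**6))                    # (True, Fraction(1, 4))
```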
We shall apply Lemma 10.1 to the resolvent entries of 𝐺. In order to verify
its assumptions, we record the following bounds.
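Lemma 10.2. Suppose that $\Lambda \prec \Psi$ and $\Lambda_o \prec \Psi_o$ for some deterministic control parameters $\Psi$ and $\Psi_o$ both satisfying (8.20). Fix $p \in \mathbb N$. Then, for any $i \ne j$ and $\mathbb T \subset \{1,\dots,N\}$ satisfying $|\mathbb T| \le p$ and $i, j \notin \mathbb T$ we have
$$G^{(\mathbb T)}_{ij} = O_\prec(\Psi_o), \qquad \frac{1}{G^{(\mathbb T)}_{ii}} = O_\prec(1). \tag{10.8}$$
Moreover, we have the rough bounds $|G^{(\mathbb T)}_{ij}| \le M$ and
$$\mathbb E\Big|\frac{1}{G^{(\mathbb T)}_{ii}}\Big|^n \le N^\epsilon \tag{10.9}$$
for any $\epsilon > 0$ and $N \ge N_0(n, \epsilon)$.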


Proof. The bounds (10.8) follow easily by a repeated application of (8.1), the assumption $\Lambda \prec \Psi \le M^{-c}$, and the lower bound in (6.11). The deterministic bound $|G^{(\mathbb T)}_{ij}| \le M$ follows immediately from $|G^{(\mathbb T)}_{ij}| \le \eta^{-1}$ and $\eta \ge M^{-1}$, which holds by definition of a spectral domain.
In order to prove (10.9), we use the Schur complement formula (8.7) applied to $1/G^{(\mathbb T)}_{ii}$, where the expectation is estimated using (5.6) and $|G^{(\mathbb T)}_{ij}| \le M$. (Recall (6.4).) This gives
$$\mathbb E\Big|\frac{1}{G^{(\mathbb T)}_{ii}}\Big|^p \prec N^{C_p}$$
for all $p \in \mathbb N$. Since $1/G^{(\mathbb T)}_{ii} \prec 1$, (10.9) therefore follows from (10.6). □

We now start the proof of Lemma 8.9. The main statement is (8.47), the
other bounds will be relatively simple consequences. Finally, we present an
alternative argument for (8.47) that organizes the stopping rule in the expansion
somewhat differently.

Proof of Lemma 8.9.


Part I: Proof of (8.47). First we claim that, for any fixed $p \in \mathbb N$, we have
$$\Big|Q_k\frac{1}{G^{(\mathbb T)}_{kk}}\Big| \prec \Psi_o \tag{10.10}$$
uniformly for $\mathbb T \subset \{1,\dots,N\}$, $|\mathbb T| \le p$, and $k \notin \mathbb T$. To simplify notation, for the proof we set $\mathbb T = \emptyset$; the proof for nonempty $\mathbb T$ is the same. From the Schur complement formula (8.7), we get $|Q_k(G_{kk})^{-1}| \le |h_{kk}| + |Z_k|$. The first term is estimated by $|h_{kk}| \prec M^{-1/2} \le \Psi_o$. The second term is estimated exactly as in (8.29) and (8.30):
$$|Z_k| \prec \Big(\sum_{x\ne y}^{(k)} s_{kx}\,\big|G^{(k)}_{xy}\big|^2\,s_{yk}\Big)^{1/2} \prec \Psi_o,$$
where in the last step we used that $|G^{(k)}_{xy}| \prec \Psi_o$, as follows from (10.8) and the bound $1/|G_{kk}| \prec 1$ (recall that $\Lambda \prec \Psi \le M^{-c}$). This concludes the proof of (10.10).
Abbreviate $X_k := Q_k(G_{kk})^{-1}$. We shall estimate $\sum_k t_{ik}X_k$ in probability by bounding its $p$th moment by $\Psi_o^{2p}$, from which the claim (8.47) will easily follow using Chebyshev's inequality.
After this preparation, the rest of the proof of (8.47) can be divided into four steps.
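Step 1. Coincidence structure in the expansion of the $L^p$ norm. Fix some even integer $p$ and write
$$\mathbb E\Big|\sum_k t_{ik}X_k\Big|^p = \sum_{k_1,\dots,k_p} t_{ik_1}\cdots t_{ik_{p/2}}\,\overline{t_{ik_{p/2+1}}}\cdots\overline{t_{ik_p}}\;\mathbb E\,X_{k_1}\cdots X_{k_{p/2}}\,\overline{X_{k_{p/2+1}}}\cdots\overline{X_{k_p}}.$$
Next, we regroup the terms in the sum over $\mathbf k := (k_1,\dots,k_p)$ according to the coincidence structure in $\mathbf k$ as follows: given a sequence of indices $\mathbf k$, define the partition $\mathcal P(\mathbf k)$ of $\{1,\dots,p\}$ by the equivalence relation $r \sim s$ if and only if $k_r = k_s$. Denote the set of all partitions of $\{1,\dots,p\}$ by $\mathfrak P_p$. Then we write
$$\mathbb E\Big|\sum_k t_{ik}X_k\Big|^p = \sum_{P\in\mathfrak P_p}\sum_{\mathbf k} t_{ik_1}\cdots t_{ik_{p/2}}\,\overline{t_{ik_{p/2+1}}}\cdots\overline{t_{ik_p}}\;\mathbf 1(\mathcal P(\mathbf k) = P)\,V(\mathbf k), \tag{10.11}$$
where we defined
$$V(\mathbf k) := \mathbb E\,X_{k_1}\cdots X_{k_{p/2}}\,\overline{X_{k_{p/2+1}}}\cdots\overline{X_{k_p}}.$$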
Given a partition 𝑃, for any 𝑟 ∈ {1, … , 𝑝}, we denote by [𝑟] the block of 𝑟 in 𝑃,
i.e., the set of all indices in the same block of the partition as 𝑟. Let 𝐿 ≡ 𝐿(𝑃) ∶=
{𝑟 ∶ [𝑟] = {𝑟}} ⊂ {1, … , 𝑝} be the set of lone labels. We denote by 𝐤𝐿 ∶= (𝑘𝑟 )𝑟∈𝐿
the summation indices associated with lone labels.
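To make the bookkeeping of Step 1 concrete, the following sketch computes the partition $\mathcal P(\mathbf k)$ and the lone labels $L(P)$ for a given index sequence. The function names and the sample sequence are hypothetical; labels are 1-based as in the text.

```python
def coincidence_partition(k):
    """Partition P(k) of the labels {1, ..., p}: r ~ s iff k_r == k_s (labels are 1-based)."""
    blocks = {}
    for r, kr in enumerate(k, start=1):
        blocks.setdefault(kr, []).append(r)
    return sorted(blocks.values())

def lone_labels(partition):
    """L(P): labels r whose block [r] is the singleton {r}."""
    return [b[0] for b in partition if len(b) == 1]

k = (5, 7, 5, 9, 2, 7)           # a hypothetical index sequence with p = 6
P = coincidence_partition(k)
L = lone_labels(P)
k_L = [k[r - 1] for r in L]      # summation indices attached to the lone labels
print(P, L, k_L)                 # [[1, 3], [2, 6], [4], [5]] [4, 5] [9, 2]
```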
Step 2. Resolution of dependence in weakly dependent random variables. The resolvent entry 𝐺𝑘𝑘 depends strongly on the randomness in the 𝑘th column of 𝐻, but only weakly on the randomness in the other columns. We conclude that if
𝑟 is a lone label, then all factors 𝑋𝑘𝑠 with 𝑠 ≠ 𝑟 in 𝑉(𝐤) depend weakly on the
randomness in the 𝑘𝑟 th column of 𝐻 (if 𝑟 is not a lone label, then this statement
holds only for “all factors 𝑋𝑘𝑠 with 𝑘𝑠 ≠ 𝑘𝑟 ”). Thus, the idea is to make all
resolvent entries inside the expectation of 𝑉(𝐤) as independent of the indices
𝐤𝐿 as possible (see Definition 7.3), using the first decoupling resolvent identity
(8.1): for 𝑥, 𝑦, 𝑢 ∉ 𝕋 and 𝑥, 𝑦 ≠ 𝑢 (𝑥 can be equal to 𝑦),
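$$G^{(\mathbb T)}_{xy} = G^{(\mathbb Tu)}_{xy} + \frac{G^{(\mathbb T)}_{xu}G^{(\mathbb T)}_{uy}}{G^{(\mathbb T)}_{uu}} \tag{10.12}$$
and for $x, u \notin \mathbb T$ and $x \ne u$
$$\frac{1}{G^{(\mathbb T)}_{xx}} = \frac{1}{G^{(\mathbb Tu)}_{xx}} - \frac{G^{(\mathbb T)}_{xu}G^{(\mathbb T)}_{ux}}{G^{(\mathbb T)}_{xx}G^{(\mathbb Tu)}_{xx}G^{(\mathbb T)}_{uu}}. \tag{10.13}$$
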
Definition 10.3. A resolvent entry $G^{(\mathbb T)}_{xy}$ with $x, y \notin \mathbb T$ is maximally expanded with respect to a set $B \subset \{1,\dots,N\}$ if $B \subset \mathbb T \cup \{x, y\}$.
Given the set $\mathbf k_L$ of lone indices, we say that a resolvent entry $G^{(\mathbb T)}_{xy}$ is maximally expanded if it is maximally expanded with respect to the set $B = \mathbf k_L$. The motivation behind this definition is that using (8.1) we cannot add upper indices from the set $\mathbf k_L$ to a maximally expanded resolvent entry. We shall apply (8.1) to all resolvent entries in $V(\mathbf k)$. In this manner we generate a sum of monomials consisting of off-diagonal resolvent entries and inverses of diagonal resolvent entries. We can now repeatedly apply (8.1) to each factor until either they are all maximally expanded or a sufficiently large number of off-diagonal resolvent entries has been generated. The cap on the number of off-diagonal entries is introduced to ensure that this procedure terminates after a finite number of steps.
In order to define the precise algorithm, let $\mathcal A$ denote the set of monomials in the off-diagonal entries $G^{(\mathbb T)}_{xy}$, with $\mathbb T \subset \mathbf k_L$, $x \ne y$, and $x, y \in \mathbf k \setminus \mathbb T$, as well as the inverse diagonal entries $1/G^{(\mathbb T)}_{xx}$ with $\mathbb T \subset \mathbf k_L$ and $x \in \mathbf k \setminus \mathbb T$. Starting from $V(\mathbf k)$, the algorithm will recursively generate sums of monomials in $\mathcal A$. Let $d(A)$ denote the number of off-diagonal entries in $A \in \mathcal A$. For $A \in \mathcal A$ we shall define $w_0(A), w_1(A) \in \mathcal A$ satisfying
$$A = w_0(A) + w_1(A), \qquad d(w_0(A)) = d(A), \qquad d(w_1(A)) \ge \max\{2, d(A) + 1\}.$$
The idea behind this splitting is to use (8.1) on one entry of $A$; the first term on the right-hand side of (8.1) gives rise to $w_0(A)$ and the second to $w_1(A)$. The precise definition of the algorithm applied to $A \in \mathcal A$ is as follows:
(1) If all factors of $A$ are maximally expanded or $d(A) \ge p + 1$, then stop the expansion of $A$.
(2) Otherwise, choose some (arbitrary) factor of $A$ that is not maximally expanded. If this entry is off-diagonal, $G^{(\mathbb T)}_{xy}$, we use (10.12) to decompose $G^{(\mathbb T)}_{xy}$ into a sum of two terms with $u$ the smallest element in $\mathbf k_L \setminus (\mathbb T \cup \{x, y\})$. If the chosen entry is diagonal, $1/G^{(\mathbb T)}_{xx}$, we use (10.13) to decompose $1/G^{(\mathbb T)}_{xx}$ into two terms with $u$ the smallest element in $\mathbf k_L \setminus (\mathbb T \cup \{x\})$. The choice of $u$ as the smallest element is not important; we just chose it for definiteness of the algorithm. From the splitting of the factor $G^{(\mathbb T)}_{xy}$ or $1/G^{(\mathbb T)}_{xx}$ in the monomial $A$, we obtain a splitting $A = w_0(A) + w_1(A)$ (i.e., we replace $G^{(\mathbb T)}_{xy}$ or $1/G^{(\mathbb T)}_{xx}$ by the right-hand sides of (10.12) or (10.13)).
It is clear that the three relations above hold for the algorithm just defined. In fact, in most cases the last inequality is an equality, $d(w_1(A)) = \max\{2, d(A) + 1\}$. The only exception is when (10.13) (equivalently, (10.12) with $x = y$) is applied to a monomial with $d(A) \ge 1$: then two new off-diagonal entries are added, i.e., $d(w_1(A)) = d(A) + 2$. Notice also that this algorithm contains some arbitrariness in the choice of the factor of $A$ to be expanded, but any choice will work for the proof.

Figure 10.1. The binary tree generated by applying the algorithm (1)–(2) to a monomial $A^r$. Each vertex of the tree is indexed by a binary string $\sigma$ and encodes a monomial $A^r_\sigma$. An arrow towards the left represents the action of $w_0$ and an arrow towards the right the action of $w_1$. The monomial $A^r_{11}$ satisfies the assumptions of step (1), and hence its expansion is stopped, so that the tree vertex $11$ has no children.
We now apply this algorithm recursively to each entry $A^r := 1/G_{k_rk_r}$ in the definition of $V(\mathbf k)$. More precisely, we start with $A^r$ and define $A^r_0 := w_0(A^r)$ and $A^r_1 := w_1(A^r)$. In the second step of the algorithm we define four monomials
$$A^r_{00} := w_0(A^r_0), \qquad A^r_{01} := w_1(A^r_0), \qquad A^r_{10} := w_0(A^r_1), \qquad A^r_{11} := w_1(A^r_1),$$
and so on, at each iteration performing the steps (1) and (2) on each new monomial independently of the others. Note that the lower indices are binary sequences that describe the recursive application of the operations $w_0$ and $w_1$. In this manner, we generate a binary tree whose vertices are given by finite binary strings $\sigma$. The associated monomials satisfy $A^r_{\sigma i} := w_i(A^r_\sigma)$ for $i = 0, 1$, where $\sigma i$ denotes the binary string obtained by appending $i$ to the right end of $\sigma$. See Figure 10.1 (borrowed from [55]) for an illustration of the tree. Notice that any monomial obtained along this recursion is, up to a sign, of the form
$$\frac{G^{(\mathbb T_1)}_{x_1y_1}\,G^{(\mathbb T_2)}_{x_2y_2}\cdots}{G^{(\mathbb T'_1)}_{z_1z_1}\,G^{(\mathbb T'_2)}_{z_2z_2}\cdots} \tag{10.14}$$
where all factors in the numerator are off-diagonal entries ($x_i \ne y_i$) and all factors in the denominator are diagonal entries. For later usage, we shall refer to the sum
$$|\mathbb T_1| + |\mathbb T_2| + \cdots + |\mathbb T'_1| + |\mathbb T'_2| + \cdots \tag{10.15}$$
as the total number of upper indices in the monomial (10.14). (The absolute value denotes the cardinality of the set.)

We stop the recursion of a tree vertex whenever the associated monomial satisfies the stopping rule of step (1). In other words, the set of leaves of the tree is the set of binary strings $\sigma$ such that either all factors of $A^r_\sigma$ are maximally expanded or $d(A^r_\sigma) \ge p + 1$.
We list a few obvious facts of this algorithm:
(i) $d(A^r_\sigma) \le p + 2$ for any vertex $\sigma$ of the tree (by the stopping rule in (1), only vertices with $d \le p$ are expanded, and one application of $w_1$ increases $d(\cdot)$ by at most two).
(ii) The number of ones in any $\sigma$ is at most $p$ (since each application of $w_1$ increases $d(\cdot)$ by at least one).
(iii) The number of resolvent entries (in the numerator and denominator) in $A^r_\sigma$ is bounded by $4p + 1$ (since each application of $w_1$ increases the number of resolvent entries by at most four, and the application of $w_0$ does not change this number).
(iv) The maximal total number of upper indices in $A^r_\sigma$ for any tree vertex $\sigma$ is $(4p + 1)p$ (since $|\mathbb T_i| \le p$ and $|\mathbb T'_i| \le p$ in the formula (10.15)).
(v) $\sigma$ contains at most $(4p + 1)p$ zeros (since each application of $w_0$ increases the total number of upper indices by one, and $w_1$ does not decrease it).
(vi) The maximal length of the string $\sigma$ (i.e., the depth of the tree) is at most $(4p + 1)p + p = 4p^2 + 2p$.
(vii) The number of leaves of a tree is bounded by $(Cp^2)^p$ (since this number is bounded by $\sum_{k=0}^{p}\binom{4p^2+2p}{k} \le (Cp^2)^p$, where $k$ is the number of ones in the string encoding the leaf). Denoting by $\mathcal L_r$ the set of leaves of the binary tree generated from $A^r$, we have $|\mathcal L_r| \le (Cp^2)^p$. In particular, the algorithm terminates after at most $(Cp^2)^p$ steps, since
each step generates a new leaf.
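The expansion algorithm (1)–(2) is purely combinatorial, so it can be sketched in code. In the hypothetical encoding below, `('G', x, y, T)` stands for $G^{(\mathbb T)}_{xy}$ and `('inv', x, T)` for $1/G^{(\mathbb T)}_{xx}$. The function `split` applies (10.12) or (10.13) to the first factor that is not maximally expanded (the text allows any choice), and `leaves` grows the binary tree with the stopping rule (1). As a consistency check, the sketch evaluates every leaf on a small random symmetric matrix and confirms that the leaves of the tree grown from $1/G_{k_rk_r}$ sum back to $1/G_{k_rk_r}$, as they must since (10.12) and (10.13) are exact identities. The matrix size, spectral parameter and index sequence are arbitrary choices.

```python
import numpy as np

# Factors: ('G', x, y, T) is G^{(T)}_{xy} with x != y; ('inv', x, T) is 1/G^{(T)}_{xx}.
# T is a frozenset of upper indices; a monomial is (sign, tuple_of_factors).

def lower(f):
    return {f[1], f[2]} if f[0] == 'G' else {f[1]}

def is_max_expanded(f, lone):
    """Definition 10.3 with B = k_L: k_L is contained in T together with the lower indices."""
    return lone <= (f[-1] | lower(f))

def d(mono):
    """Number of off-diagonal factors."""
    return sum(f[0] == 'G' for f in mono[1])

def split(mono, lone):
    """Step (2): expand the first non-maximally expanded factor via (10.12)/(10.13)."""
    sign, fs = mono
    i = next(j for j, f in enumerate(fs) if not is_max_expanded(f, lone))
    f, rest = fs[i], fs[:i] + fs[i + 1:]
    T = f[-1]
    u = min(lone - (T | lower(f)))                   # smallest admissible new index
    Tu = T | {u}
    if f[0] == 'G':                                  # (10.12)
        x, y = f[1], f[2]
        w0 = (sign, rest + (('G', x, y, Tu),))
        w1 = (sign, rest + (('G', x, u, T), ('G', u, y, T), ('inv', u, T)))
    else:                                            # (10.13), note the minus sign
        x = f[1]
        w0 = (sign, rest + (('inv', x, Tu),))
        w1 = (-sign, rest + (('G', x, u, T), ('G', u, x, T),
                             ('inv', x, T), ('inv', x, Tu), ('inv', u, T)))
    return w0, w1

def leaves(mono, lone, p, sigma=''):
    """Yield (sigma, A_sigma) for every leaf of the binary tree grown from mono."""
    if d(mono) >= p + 1 or all(is_max_expanded(f, lone) for f in mono[1]):
        yield sigma, mono                            # stopping rule (1)
        return
    w0, w1 = split(mono, lone)
    yield from leaves(w0, lone, p, sigma + '0')
    yield from leaves(w1, lone, p, sigma + '1')

def resolvent(H, z, T):
    """G^{(T)} = (H^{(T)} - z)^{-1}, embedded in an N x N array (NaN on rows/cols in T)."""
    keep = [i for i in range(H.shape[0]) if i not in T]
    G = np.full(H.shape, np.nan, dtype=complex)
    G[np.ix_(keep, keep)] = np.linalg.inv(H[np.ix_(keep, keep)] - z * np.eye(len(keep)))
    return G

def evaluate(mono, H, z, cache):
    """Numerical value of a monomial; resolvents of minors are cached by T."""
    val = mono[0]
    for f in mono[1]:
        T = f[-1]
        if T not in cache:
            cache[T] = resolvent(H, z, T)
        G = cache[T]
        val = val * (G[f[1], f[2]] if f[0] == 'G' else 1 / G[f[1], f[1]])
    return val

rng = np.random.default_rng(0)
N, z, p = 8, 0.2 + 0.5j, 4
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)                      # small real symmetric test matrix
k = (0, 1, 1, 2)                                    # hypothetical indices: lone labels r = 1, 4
lone = frozenset(x for x in k if k.count(x) == 1)   # k_L = {0, 2}
cache = {frozenset(): resolvent(H, z, frozenset())}
G0 = cache[frozenset()]
for r, kr in enumerate(k, start=1):
    tree = list(leaves((1, (('inv', kr, frozenset()),)), lone, p))
    total = sum(evaluate(mono, H, z, cache) for _, mono in tree)
    print(f"r={r}: {len(tree)} leaves, |sum - 1/G_kk| = {abs(total - 1 / G0[kr, kr]):.1e}")
```

The sketch also lets one check the bounds listed above on the number of ones in $\sigma$, on the number of resolvent entries and on the depth of the tree.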
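By definition of the tree and of $w_0$ and $w_1$, we have the following resolution of dependence decomposition
$$X_{k_r} = Q_{k_r}\sum_{\sigma\in\mathcal L_r}A^r_\sigma, \tag{10.16}$$
where each monomial $A^r_\sigma$ for $\sigma \in \mathcal L_r$ either consists entirely of maximally expanded resolvent entries or satisfies $d(A^r_\sigma) \ge p + 1$. (This is an immediate consequence of the stopping rule in (1).) Using (10.11) and (10.16) we have the representation
$$V(\mathbf k) = \sum_{\sigma_1\in\mathcal L_1}\cdots\sum_{\sigma_p\in\mathcal L_p}\mathbb E\,(Q_{k_1}A^1_{\sigma_1})\cdots(Q_{k_p}A^p_{\sigma_p}), \tag{10.17}$$
where, as in the definition of $V(\mathbf k)$, the factors with $r > p/2$ are to be complex conjugated; we suppress these conjugations, which play no role in the estimates below.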

Step 3. Bounding the individual terms in the expansion (10.17). We now claim that any nonzero term on the right-hand side of (10.17) satisfies
$$\mathbb E\,(Q_{k_1}A^1_{\sigma_1})\cdots(Q_{k_p}A^p_{\sigma_p}) = O_\prec\big(\Psi_o^{p+|L|}\big). \tag{10.18}$$
Proof of (10.18). Before embarking on the proof, we explain its idea. First, notice that for any string $\sigma$ we have
$$Q_{k_r}A^r_\sigma = O_\prec\big(\Psi_o^{b(\sigma)+1}\big) \tag{10.19}$$
where $b(\sigma)$ is the number of ones in the string $\sigma$. Indeed, if $b(\sigma) = 0$, then $A^r_\sigma = 1/G^{(\mathbb T)}_{k_rk_r}$ for some $\mathbb T \subset \mathbf k_L$ and the claim follows from (10.10). If $b(\sigma) \ge 1$, it follows from the inequality $d(w_1(A)) \ge \max\{2, d(A) + 1\}$, which guarantees that every one in the string $\sigma$ increases the exponent of $\Psi_o$ by at least one (we also use (10.8) and (10.7)). In particular, each $Q_{k_r}A^r_\sigma$ is bounded by $\Psi_o$.
If we used only the trivial bound $Q_{k_r}A^r_\sigma = O_\prec(\Psi_o)$ for each factor in (10.18), i.e., if we did not exploit the gain from the $w_1(A)$-type terms in the expansion, then the naive size of the left-hand side of (10.18) would only be $\Psi_o^p$. The key observation behind (10.18) is that each lone label $s \in L$ yields one extra factor $\Psi_o$ to the estimate. This is because the expectation in (10.17) would vanish if all other factors $(Q_{k_r}A^r_{\sigma_r})$, $r \ne s$, were independent of $k_s$. The expansion of the binary tree makes this dependence explicit by exhibiting $k_s$ as a lower index. But this requires performing an operation $w_1$ with the choice $u = k_s$ in (10.12) or (10.13), and $w_1$ increases the number of off-diagonal elements by at least one. In other words, every index associated with a lone label must have a "partner" index in a different resolvent entry which arose by application of $w_1$. Such a partner index may only be obtained through the creation of at least one off-diagonal resolvent entry. The actual proof below shows that this effect applies cumulatively for all lone labels.
In order to give the rigorous proof of (10.18), we consider two cases. Consider first the case where for some $r = 1,\dots,p$ the monomial $A^r_{\sigma_r}$ on the left-hand side of (10.18) is not maximally expanded. Then $d(A^r_{\sigma_r}) \ge p + 1$, so that (10.8) and (10.7) yield $Q_{k_r}A^r_{\sigma_r} \prec \Psi_o^{p+1}$. Therefore, the observation that $Q_{k_s}A^s_{\sigma_s} \prec \Psi_o$ for all $s \ne r$, together with Lemma 10.1, implies that the left-hand side of (10.18) is $O_\prec(\Psi_o^{2p})$. Since $|L| \le p$, (10.18) follows.
Consider now the case where $A^r_{\sigma_r}$ on the left-hand side of (10.18) is maximally expanded for all $r = 1,\dots,p$. The key observation is the following claim about the left-hand side of (10.18) with a nonzero expectation.

(∗) For each $s \in L$ there exists $r := \tau(s) \in \{1,\dots,p\}\setminus\{s\}$ such that the monomial $A^r_{\sigma_r}$ contains a resolvent entry with lower index $k_s$.

To prove (∗), suppose for contradiction that there exists an $s \in L$ such that for all $r \in \{1,\dots,p\}\setminus\{s\}$ the lower index $k_s$ does not appear in the monomial $A^r_{\sigma_r}$. To simplify notation, we assume that $s = 1$. Then, for all $r = 2,\dots,p$, since $A^r_{\sigma_r}$ is maximally expanded and $k_1 \in \mathbf k_L$, the index $k_1$ appears as an upper index in every resolvent entry of $A^r_{\sigma_r}$. Hence $A^r_{\sigma_r}$, and therefore also $Q_{k_r}A^r_{\sigma_r}$ (recall $k_r \ne k_1$), is independent of $k_1$. Therefore, we have
$$\mathbb E\,(Q_{k_1}A^1_{\sigma_1})(Q_{k_2}A^2_{\sigma_2})\cdots(Q_{k_p}A^p_{\sigma_p}) = \mathbb E\,Q_{k_1}\big(A^1_{\sigma_1}(Q_{k_2}A^2_{\sigma_2})\cdots(Q_{k_p}A^p_{\sigma_p})\big) = 0,$$
where in the last step we used that $\mathbb E Q_i(X)Y = \mathbb E Q_i(XY) = 0$ if $Y$ is independent of $i$. This concludes the proof of (∗).
The statement (∗) can be reformulated as asserting that, after expansion, every lone label $s$ has a "partner" label $r = \tau(s)$, such that the index $k_s$ appears also as a lower index in the expansion of $A^r$ (note that there may be several such partner labels $r$; we can choose $\tau(s)$ to be any one of them).
For $r \in \{1,\dots,p\}$ we define $\ell(r) := \sum_{s\in L}\mathbf 1(\tau(s) = r)$, the number of times that the label $r$ was chosen as a partner to some lone label $s$. We now claim that
$$Q_{k_r}A^r_{\sigma_r} = O_\prec\big(\Psi_o^{1+\ell(r)}\big). \tag{10.20}$$
To prove (10.20), fix $r \in \{1,\dots,p\}$. By definition, for each $s \in \tau^{-1}(\{r\})$ the index $k_s$ appears as a lower index in the monomial $A^r_{\sigma_r}$. Since $s \in L$ is by definition a lone label and $s \ne r$, we know that $k_s$ does not appear as an index in $A^r = 1/G_{k_rk_r}$, i.e., $k_r \ne k_s$. By definition of the monomials associated with the tree vertex $\sigma_r$, it follows that $b(\sigma_r)$, the number of ones in $\sigma_r$, is at least $|\tau^{-1}(\{r\})| = \ell(r)$, since each application of $w_1$ adds precisely one new lower index while application of $w_0$ leaves the lower indices unchanged. Note that in this step it is crucial that each $s \in \tau^{-1}(\{r\})$ is a lone label. Recalling (10.19), we therefore get (10.20).
Using (10.20), Lemma 10.1, and $\sum_r \ell(r) = |L|$, we find
$$\big|\mathbb E\,(Q_{k_1}A^1_{\sigma_1})\cdots(Q_{k_p}A^p_{\sigma_p})\big| \prec \prod_{r=1}^p\Psi_o^{1+\ell(r)} = \Psi_o^{p+|L|}.$$
This concludes the proof of (10.18). □


Summing over the binary trees in (10.17) and using Lemma 10.1, we get from (10.18)
$$V(\mathbf k) = O_\prec\big(\Psi_o^{p+|L|}\big). \tag{10.21}$$
Step 4. Summing over the expansion. We now return to the sum (10.11). We perform the summation by first fixing $P \in \mathfrak P_p$, with associated lone labels $L = L(P)$. Using $|t_{ij}| \le M^{-1}$ and (8.45), we estimate the sum over $\mathbf k$ block by block: each block $\beta$ of $P$ contributes at most $\sum_k |t_{ik}|^{|\beta|} \le M^{-(|\beta| - 1)}$, so that
$$\Big|\sum_{\mathbf k}\mathbf 1(\mathcal P(\mathbf k) = P)\,t_{ik_1}\cdots t_{ik_{p/2}}\,\overline{t_{ik_{p/2+1}}}\cdots\overline{t_{ik_p}}\Big| \le M^{-p+|P|},$$
where $|P|$ denotes the number of blocks of $P$. Since each block of $P$ that is not contained in $L$ consists of at least two labels, we have $|P| \le |L| + (p - |L|)/2$, and hence the right-hand side is bounded by $(M^{-1/2})^{p-|L|}$. From (10.11) and (10.21) we get
$$\mathbb E\Big|\sum_k t_{ik}X_k\Big|^p \prec \sum_{P\in\mathfrak P_p}(M^{-1/2})^{p-|L(P)|}\,\Psi_o^{p+|L(P)|} \le C_p\Psi_o^{2p},$$
where in the last step we used the lower bound $\Psi_o \ge M^{-1/2}$ from (8.20) and estimated the summation over $\mathfrak P_p$ with a constant $C_p$ (which is bounded by $(Cp^2)^p$). Summarizing, we have proved that
$$\mathbb E\Big|\sum_k t_{ik}X_k\Big|^p \prec \Psi_o^{2p} \tag{10.22}$$
for any 𝑝 ∈ 2ℕ.
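The combinatorial content of Step 4 can be checked by enumerating all partitions of $\{1,\dots,p\}$. Every block that is not a lone label has at least two elements, so $|P| \le |L| + (p - |L|)/2$. Moreover, each term $(M^{-1/2})^{p-|L|}\Psi_o^{p+|L|}$ is at most $\Psi_o^{2p}$ once $\Psi_o \ge M^{-1/2}$. The sketch below assumes this lower bound, as in (8.20); the numerical values of $p$, $M$ and $\Psi_o$ are arbitrary.

```python
def set_partitions(labels):
    """Yield all partitions of the list `labels` as lists of blocks."""
    if not labels:
        yield []
        return
    first, rest = labels[0], labels[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):                    # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                        # or into a new singleton block

def step4_bound(p, M, psi):
    """Return (sum over P of (M^{-1/2})^{p-|L|} psi^{p+|L|}, number of partitions)."""
    total, count = 0.0, 0
    for P in set_partitions(list(range(1, p + 1))):
        n_lone = sum(len(b) == 1 for b in P)
        assert len(P) <= n_lone + (p - n_lone) / 2   # non-lone blocks have >= 2 labels
        total += (M ** -0.5) ** (p - n_lone) * psi ** (p + n_lone)
        count += 1
    return total, count

p, M, psi = 6, 10**4, 0.02                            # psi >= M^{-1/2} = 0.01
total, count = step4_bound(p, M, psi)
print(count, total <= count * psi ** (2 * p))         # 203 True
```

Here `count` is the Bell number $B_6 = 203$, which plays the role of $C_p$.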


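We conclude the proof of (8.47) with a simple application of Chebyshev's inequality. Fix $\epsilon > 0$ and $D > 0$. By (10.22), $\mathbb E|\sum_k t_{ik}X_k|^p \le N\Psi_o^{2p}$ for $N \ge N_0(p)$, so Chebyshev's inequality gives
$$\mathbb P\Big(\Big|\sum_k t_{ik}X_k\Big| > N^\epsilon\Psi_o^2\Big) \le N\cdot N^{-\epsilon p}$$
for large enough $N \ge N_0(\epsilon, p)$. Choosing an even $p \ge \epsilon^{-1}(1 + D)$, we conclude the proof of (8.47).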
Remark 10.4. The first decoupling formula (8.1) is the only identity about
the entries of 𝐺 that is needed in the proof of Lemma 8.9. In particular, the
second decoupling formula (8.2) is never used, and the actual entries of 𝐻 never
appear in the argument.

Part II: Proof of (8.46).


The bound (8.46) may be proved by following the proof of (8.47) verbatim; the only modification is the bound
$$\big|Q_kG^{(\mathbb T)}_{kk}\big| = \big|Q_k\big(G^{(\mathbb T)}_{kk} - m_{\mathrm{sc}}\big)\big| \prec \Psi,$$
which replaces (10.10). Here we use the same upper bound $\Psi_o = \Psi$ for both $\Lambda$ and $\Lambda_o$.

Part III: Proofs of (8.48) and (8.50).


In order to prove (8.48), we note that the last term in (8.17) is of order $O_\prec(\Psi^2)$. Hence we have
$$w_a := \sum_i t_{ai}v_i = m_{\mathrm{sc}}^2\sum_{i,k}t_{ai}s_{ik}v_k - m_{\mathrm{sc}}^2\sum_i t_{ai}\Upsilon_i + O_\prec(\Psi^2) = m_{\mathrm{sc}}^2\sum_{i,k}s_{ai}t_{ik}v_k + O_\prec(\Psi^2),$$
where in the last step we used Corollary 8.10 (recall that the proof of this corollary used only (8.47), which was already proved in Part I) to bound $\sum_i t_{ai}\Upsilon_i$ by $O_\prec(\Psi^2)$, and that the matrices $T$ and $S$ commute by assumption. Introducing the vector $\mathbf w = (w_a)_{a=1}^N$, we therefore have the equation
$$\mathbf w = m_{\mathrm{sc}}^2S\mathbf w + O_\prec(\Psi^2), \tag{10.23}$$
where the error term is in the sense of the $\ell^\infty$-norm (uniform in the components of the vector $\mathbf w$). Inverting the matrix $1 - m_{\mathrm{sc}}^2S$ and recalling the definition (6.17) yields (8.48).
The proof of (8.50) is similar, except that we have to treat the subspace $\mathbf e^\perp$ separately. Using (8.49), we write
$$\sum_i t_{ai}(v_i - [v]) = \sum_i t_{ai}v_i - \frac1N\sum_i v_i,$$
and apply the above argument to each term separately. This yields
$$\sum_i t_{ai}(v_i - [v]) = m_{\mathrm{sc}}^2\sum_i t_{ai}\sum_k s_{ik}v_k - m_{\mathrm{sc}}^2\frac1N\sum_i\sum_k s_{ik}v_k + O_\prec(\Psi^2) = m_{\mathrm{sc}}^2\sum_{i,k}s_{ai}t_{ik}(v_k - [v]) + O_\prec(\Psi^2),$$
where we used (6.1) in the second step. Note that the error term on the right-hand side is perpendicular to $\mathbf e$ when regarded as a vector indexed by $a$, since all other terms in the equation are. Hence we may invert the matrix $(1 - m_{\mathrm{sc}}^2S)$ on the subspace $\mathbf e^\perp$, as above, to get (8.50). This completes the proof of Lemma 8.9. □

10.3. Alternative Proof of (8.47) of Lemma 8.9


We conclude this section with an alternative proof of the key statement (8.47) of Lemma 8.9. While the underlying argument remains similar, the following proof makes use of an additional decomposition of the space of random variables that avoids the use of the stopping rule from step (1) in the above proof of Lemma 8.9. This decomposition may be regarded as an abstract reformulation of the stopping rule.
Alternative proof of (8.47) of Lemma 8.9. As before, we set 𝑋𝑘 ∶=
𝑄𝑘 (𝐺𝑘𝑘 )−1 . For simplicity of presentation, we set 𝑡𝑖𝑘 = 𝑁 −1 . The decompo-
sition is defined using the operations 𝑃𝑖 and 𝑄𝑖 , introduced in Definition 7.3.
It is immediate that 𝑃𝑖 and 𝑄𝑖 are projections, that 𝑃𝑖 + 𝑄𝑖 = 1, and that all of
these projections commute with each other. For a set 𝐴 ⊂ {1, … , 𝑁} we use the
notation 𝑃𝐴 ∶= ∏𝑖∈𝐴 𝑃𝑖 and 𝑄𝐴 ∶= ∏𝑖∈𝐴 𝑄𝑖 .
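The algebra of $P_i$ and $Q_i$ used below can be illustrated on a toy probability space. For illustration only, we take $n$ independent fair bits and let $P_i$ average over the $i$th bit; in the text, $P_i$ and $Q_i$ are the operations of Definition 7.3 acting on functions of $H$. The sketch checks three things. First, $P_i$ and $Q_i$ are commuting projections. Second, $\mathbb E\,Q_i(X)Y = 0$ when $Y$ does not depend on coordinate $i$. Third, multiplying out $\prod_i(P_i + Q_i)$ gives back $X$ as a sum of terms $P_{A^c}Q_AX$.

```python
import itertools
import numpy as np

# Toy probability space: n independent fair bits; random variables are arrays of shape (2,)*n.
n = 3
rng = np.random.default_rng(0)

def P(i, f):
    """Average over coordinate i: a toy stand-in for the partial expectation P_i."""
    return np.broadcast_to(f.mean(axis=i, keepdims=True), f.shape)

def Q(i, f):
    return f - P(i, f)

X = rng.standard_normal((2,) * n)
Y = np.broadcast_to(rng.standard_normal((2, 1, 2)), X.shape)   # does not depend on coordinate 1

# P_i and Q_i are commuting projections with P_i + Q_i = 1.
assert np.allclose(P(0, P(0, X)), P(0, X)) and np.allclose(Q(0, Q(0, X)), Q(0, X))
assert np.allclose(P(0, Q(1, X)), Q(1, P(0, X)))

# E[Q_1(X) Y] = 0 when Y is independent of coordinate 1.
assert abs(np.mean(Q(1, X) * Y)) < 1e-12

# Multiplying out prod_i (P_i + Q_i) gives the sum over A of P_{A^c} Q_A X.
total = np.zeros_like(X)
for r in range(n + 1):
    for A in itertools.combinations(range(n), r):
        term = X
        for i in range(n):
            term = Q(i, term) if i in A else P(i, term)
        total = total + term
assert np.allclose(total, X)
```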
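Let $p$ be even and introduce the shorthand $\widetilde X_{k_s} := X_{k_s}$ for $s \le p/2$ and $\widetilde X_{k_s} := \overline{X_{k_s}}$ for $s > p/2$. Then we get
$$\mathbb E\Big|\frac1N\sum_k X_k\Big|^p = \frac{1}{N^p}\sum_{k_1,\dots,k_p}\mathbb E\prod_{s=1}^p\widetilde X_{k_s} = \frac{1}{N^p}\sum_{k_1,\dots,k_p}\mathbb E\prod_{s=1}^p\Big(\prod_{r=1}^p(P_{k_r} + Q_{k_r})\widetilde X_{k_s}\Big).$$
Introducing the notation $\mathbf k = (k_1,\dots,k_p)$ and $[\mathbf k] = \{k_1,\dots,k_p\}$, we therefore get, by multiplying out the parentheses,
$$\mathbb E\Big|\frac1N\sum_k X_k\Big|^p = \frac{1}{N^p}\sum_{\mathbf k}\sum_{A_1,\dots,A_p\subset[\mathbf k]}\mathbb E\prod_{s=1}^p\big(P_{A_s^c}Q_{A_s}\widetilde X_{k_s}\big). \tag{10.24}$$
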
˜𝑘 , we have that 𝑋
Next, by definition of 𝑋 ˜𝑘 = 𝑄𝑘 𝑋 ˜ , which implies that
𝑠 𝑠 𝑠 𝑘𝑠
˜
𝑃𝐴c𝑠 𝑋𝑘𝑠 = 0 if 𝑘𝑠 ∉ 𝐴𝑠 . Hence we may restrict the summation to 𝐴𝑠 satisfying
(10.25) 𝑘𝑠 ∈ 𝐴 𝑠
for all 𝑠. Moreover, we claim that the right-hand side of (10.24) vanishes unless
(10.26) 𝑘𝑠 ∈ 𝐴𝑞

𝑞≠𝑠

for all 𝑠. Indeed, suppose that 𝑘𝑠 ∈ ⋂𝑞≠𝑠 𝐴c𝑞 for some 𝑠, say 𝑠 = 1. In this case,
˜𝑘 is independent of 𝑘1 (see Definition
for each 𝑠 = 2, … , 𝑝 the factor 𝑃𝐴c 𝑄𝐴 𝑋 𝑠 𝑠 𝑠
7.3). Thus, we get
𝑝 𝑝
˜𝑘 ) = 𝔼(𝑃𝐴c 𝑄𝐴 𝑄𝑘 𝑋
𝔼 ∏(𝑃 𝑄𝐴𝑠 𝑋
𝐴c𝑠
˜ )
˜ ) ∏(𝑃𝐴c 𝑄𝐴 𝑋
𝑠 1 1 1 𝑘1 𝑠 𝑠 𝑘𝑠
𝑠=1 𝑠=2
𝑝
˜𝑘 ) ∏(𝑃𝐴c 𝑄𝐴 𝑋
= 𝔼𝑄𝑘1 ((𝑃𝐴c1 𝑄𝐴1 𝑋 ˜ )) = 0,
1 𝑠 𝑠 𝑘𝑠
𝑠=2

where in the last step we used that 𝔼𝑄𝑖 (𝑋) = 0 for any 𝑖 and random variable 𝑋.
We conclude that the summation on the right-hand side of (10.24) is restricted to indices satisfying (10.25) and (10.26). Under these two conditions we have
$$\sum_{s=1}^p|A_s| \ge 2\,|[\mathbf k]|, \tag{10.27}$$
since each index $k_s$ must belong to at least two different sets $A_q$: to $A_s$ (by (10.25)) as well as to some $A_q$ with $q \ne s$ (by (10.26)).
Next, we claim that for $k \in A$ we have
$$|Q_AX_k| \prec \Psi_o^{|A|}. \tag{10.28}$$
(Note that if we were treating the case $X_k = Q_kG_{kk}$ instead of $X_k = Q_k(G_{kk})^{-1}$, then (10.28) would have to be weakened to $|Q_AX_k| \prec \Psi^{|A|}$, in accordance with (8.46). Indeed, in that case and for $A = \{k\}$, we only have the bound $|Q_kG_{kk}| \prec \Psi$ and not $|Q_kG_{kk}| \prec \Psi_o$.)
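Before proving (10.28), we show how it may be used to complete the proof. Using (10.24), (10.28), and Lemma 10.1, we find
$$\begin{aligned}\mathbb E\Big|\frac1N\sum_kX_k\Big|^p &\prec C_p\frac{1}{N^p}\sum_{\mathbf k}\Psi_o^{2|[\mathbf k]|} = C_p\sum_{u=1}^p\Psi_o^{2u}\,\frac{1}{N^p}\sum_{\mathbf k}\mathbf 1(|[\mathbf k]| = u)\\ &\le C_p\sum_{u=1}^p\Psi_o^{2u}N^{u-p} \le C_p(\Psi_o + N^{-1/2})^{2p} \le C_p\Psi_o^{2p},\end{aligned}$$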
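where the steps are justified as follows:
- In the first step we estimated the summation over the sets $A_1, \dots, A_p$ by a combinatorial factor $C_p$ depending on $p$.
- In the third step we used that the number of $\mathbf k$ with $|[\mathbf k]| = u$ is at most $C_pN^u$.
- In the fourth step we used the elementary inequality $a^nb^m \le (a+b)^{n+m}$ for positive $a, b$, with $a = \Psi_o$, $b = N^{-1/2}$, $n = 2u$, $m = 2(p-u)$.
- In the last step we used (8.20) and the bound $M \le N$.

Thus, we have proved (10.22), from which the claim follows exactly as in the first proof of (8.47).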
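The counting step can be made explicit. The number of $\mathbf k \in \{1, \dots, N\}^p$ with exactly $u$ distinct entries is $S(p,u)\,N(N-1)\cdots(N-u+1)$, where $S(p,u)$ is a Stirling number of the second kind. Hence it is at most $C_pN^u$ with $C_p = \max_uS(p,u)$. The sketch below compares this formula with brute-force enumeration for small, arbitrarily chosen $N$ and $p$. The helper names are hypothetical.

```python
from itertools import product

def stirling2(p, u):
    """Stirling number of the second kind S(p, u): partitions of p labels into u nonempty blocks."""
    S = [[0] * (u + 1) for _ in range(p + 1)]
    S[0][0] = 1
    for n in range(1, p + 1):
        for k in range(1, min(n, u) + 1):
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return S[p][u]

def falling(N, u):
    """N (N-1) ... (N-u+1): ways to assign distinct values to u blocks."""
    out = 1
    for r in range(u):
        out *= N - r
    return out

def count_brute(N, p, u):
    """Number of k = (k_1, ..., k_p) in {1..N}^p with |[k]| = u, by enumeration."""
    return sum(1 for k in product(range(N), repeat=p) if len(set(k)) == u)

N, p = 6, 4
table = {u: (count_brute(N, p, u), stirling2(p, u) * falling(N, u)) for u in range(1, p + 1)}
C_p = max(stirling2(p, u) for u in range(1, p + 1))
print(table)   # for each u: (brute-force count, Stirling formula)
```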
What remains is the proof of (10.28). The case $|A| = 1$ (corresponding to $A = \{k\}$) follows from (10.10), exactly as in the first proof of (8.47). To simplify notation, for the case $|A| \ge 2$ we assume that $k = 1$ and $A = \{1, \dots, t\}$ with $t \ge 2$. It suffices to prove that
$$\Big|Q_t\cdots Q_2\frac{1}{G_{11}}\Big| \prec \Psi_o^t. \tag{10.29}$$
We start by writing, using the first decoupling formula (8.1),
$$Q_2\frac{1}{G_{11}} = Q_2\frac{1}{G_{11}^{(2)}} - Q_2\frac{G_{12}G_{21}}{G_{11}G_{11}^{(2)}G_{22}} = -Q_2\frac{G_{12}G_{21}}{G_{11}G_{11}^{(2)}G_{22}},$$
where the first term vanishes since $G_{11}^{(2)}$ is independent of 2 (see Definition 7.3). We now consider
$$Q_3Q_2\frac{1}{G_{11}} = -Q_2Q_3\frac{G_{12}G_{21}}{G_{11}G_{11}^{(2)}G_{22}}.$$
We apply the first decoupling formula (8.1) again, now with $k = 3$, to each resolvent entry on the right-hand side, and multiply everything out. The result is a sum of fractions of entries of $G$ in which all entries in the numerator are off-diagonal and all entries in the denominator are diagonal. The leading-order term vanishes,
$$Q_2Q_3\frac{G_{12}^{(3)}G_{21}^{(3)}}{G_{11}^{(3)}G_{11}^{(23)}G_{22}^{(3)}} = 0,$$
so that the surviving terms have at least three (off-diagonal) resolvent entries in the numerator. We may now continue in this manner; at each step the number of (off-diagonal) resolvent entries in the numerator increases by at least 1.
More formally, we obtain a sequence $A_2, \dots, A_t$, where
$$A_2 := -Q_2\frac{G_{12}G_{21}}{G_{11}G_{11}^{(2)}G_{22}}$$
and $A_i$ is obtained by applying (8.1) with $k = i$ to each entry of $Q_iA_{i-1}$ and keeping only the nonvanishing terms. The following properties are easy to check by induction:
(i) $A_i = Q_iA_{i-1}$.
(ii) $A_i$ consists of the projection $Q_2\cdots Q_i$ applied to a sum of fractions such that all entries in the numerator are off-diagonal and all entries in the denominator are diagonal.
(iii) The number of (off-diagonal) entries in the numerator of each term of $A_i$ is at least $i$.
By Lemma 10.1 combined with (ii) and (iii) we conclude that $|A_i| \prec \Psi_o^i$. From (i) we therefore get
$$Q_t\cdots Q_2\frac{1}{G_{11}} = A_t = O_\prec(\Psi_o^t).$$
This is (10.29). Hence the proof is complete. □
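The first step of this expansion can be checked numerically. The sketch below assumes the standard form of the first decoupling formula: $G_{ij} = G_{ij}^{(k)} + G_{ik}G_{kj}/G_{kk}$ for $i, j \ne k$, where $G^{(k)}$ is the resolvent of the minor with the $k$-th row and column removed. The random symmetric test matrix and the spectral parameter $z$ are arbitrary choices. The check confirms that identity and the expansion of $1/G_{11}$ derived from it, in which the second term enters with a minus sign.

```python
import numpy as np

def resolvent(H, z, removed=()):
    """Resolvent (H^{(T)} - z)^{-1} of the minor with the rows/columns in `removed` deleted."""
    keep = [i for i in range(H.shape[0]) if i not in removed]
    Hm = H[np.ix_(keep, keep)]
    return np.linalg.inv(Hm - z * np.eye(len(keep))), keep

rng = np.random.default_rng(2)
N = 8
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)       # a real symmetric Wigner-type test matrix
z = 0.3 + 0.1j

G, _ = resolvent(H, z)
G2, keep = resolvent(H, z, removed=(1,))    # G^{(2)}: index "2" of the text is Python index 1
G2_11 = G2[keep.index(0), keep.index(0)]

# Decoupling formula with k = 2: G_11 = G_11^{(2)} + G_12 G_21 / G_22.
decoupling_gap = G[0, 0] - (G2_11 + G[0, 1] * G[1, 0] / G[1, 1])
# Consequence: 1/G_11 = 1/G_11^{(2)} - G_12 G_21 / (G_11 G_11^{(2)} G_22).
inverse_gap = 1 / G[0, 0] - (1 / G2_11 - G[0, 1] * G[1, 0] / (G[0, 0] * G2_11 * G[1, 1]))
```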

10.3.1. History of the fluctuation averaging. Lemma 8.9 is a version of the fluctuation averaging mechanism, taken from [55], that is the most useful for our purpose in proving Theorem 6.7. We now briefly comment on its history.
The first version of the fluctuation averaging mechanism appeared in [68] for the Wigner case, where $[Z] = N^{-1}\sum_kZ_k$ was bounded by $\Lambda_o^2$ (recall $Z_i$ from (8.11)). Since $Q_k(G_{kk})^{-1}$ is essentially $Z_k$ (see (8.7)), this corresponds to the first bound in (8.46). A different proof (with a better bound on the constants) was given in [70]. A conceptually streamlined version of the original proof was extended to sparse matrices [56] and to sample covariance matrices [114]. Finally,
an extensive analysis in [52] treated the fluctuation averaging of general poly-
nomials of resolvent entries and identified the order of cancellations depending
on the algebraic structure of the polynomial. Moreover, in [52] an additional
cancellation effect was found for the quantity 𝑄𝑖 |𝐺𝑖𝑗 |2 . All proofs of the fluc-
tuation averaging lemmas rely on computing expectations of high moments of
the averages and carefully estimating various terms of different combinatorial
structure. In [52] we have developed a Feynman diagrammatic representation
for bookkeeping the terms, but this is necessary only for the case of general poly-
nomials. For the special cases stated in Lemma 8.9, the proof presented here is
relatively simple.
CHAPTER 11

Eigenvalue Location: The Rigidity Phenomenon

The local semicircle law in Theorem 6.7 was proven for universal Wigner
matrices, characterized by the upper bound 𝑠𝑖𝑗 ≤ 1/𝑀 on the variances of ℎ𝑖𝑗 .
This result implies rigidity estimates on the location of the eigenvalues with
a precision depending on 𝑀; e.g., in the bulk spectrum the eigenvalues can
typically be located with a precision slightly above 1/𝑀. For the sake of simplic-
ity, from now on we restrict our presentation to the special case of generalized
Wigner matrices characterized by 𝐶inf /𝑁 ≤ 𝑠𝑖𝑗 ≤ 𝐶sup /𝑁 with two positive con-
stants 𝐶inf and 𝐶sup (see Definition 2.1). So, from now on the parameter 𝑀 will
be replaced with 𝑁. For rigidity results in the general case, see theorem 7.6
in [55].
In this section we will show that the eigenvalues of generalized Wigner matrices are quite rigid. They may fluctuate only on a scale slightly above $1/N$ in the bulk of the spectrum, and slightly above $N^{-2/3}$ near the edges (see Theorem 11.5). This reflects the strong correlations among the eigenvalues. By comparison, the typical fluctuation scale of $N$ independent, say Poisson, points in a finite interval would be $N^{-1/2}$.

11.1. Extreme Eigenvalues


We first state a corollary to Theorem 6.7 on the extreme eigenvalues.
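Corollary 11.1. For any generalized Wigner matrix, the largest eigenvalue of $H$ in absolute value is bounded by $2 + N^{-2/3+\varepsilon}$ for any $\varepsilon > 0$. More precisely, for any $\varepsilon > 0$ and any $D \ge 1$ we have
$$\mathbb P\Big(\max_{\alpha=1,\dots,N}|\lambda_\alpha| \ge 2 + N^{-2/3+\varepsilon}\Big) \le N^{-D} \tag{11.1}$$
for any $N \ge N_0(\varepsilon, D)$.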


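Proof. Set $\eta = N^{-2/3}$ and choose an energy $E = 2 + \kappa$ outside of the spectrum with some $\kappa \ge N^{-2/3+\varepsilon} \gg N^{\varepsilon/2}\eta$. From (6.33) with $z = E + i\eta$, we have
$$|\mathrm{Im}\,m_N(z) - \mathrm{Im}\,m_{\mathrm{sc}}(z)| \le \frac{N^{\varepsilon/2}}{N\kappa} \ll \frac{1}{N\eta} \tag{11.2}$$
with very high probability. On the other hand, if there is an eigenvalue $\lambda$ with $|\lambda - E| \le \eta$, then we have
$$\mathrm{Im}\,m_N(z) \ge \frac{1}{2N\eta} \gg \mathrm{Im}\,m_{\mathrm{sc}}(z) + \frac{N^{\varepsilon/2}}{N\kappa}. \tag{11.3}$$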
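Here, the first inequality comes from
$$\mathrm{Im}\,m_N(z) = \frac1N\sum_\alpha\frac{\eta}{|\lambda_\alpha - E|^2 + \eta^2} \ge \frac1N\frac{\eta}{|\lambda - E|^2 + \eta^2}$$
together with $|\lambda - E| \le \eta$. The second inequality follows from $\mathrm{Im}\,m_{\mathrm{sc}}(z) \asymp \eta/\sqrt\kappa$ from (6.13). Note that $\eta/\sqrt\kappa \le N^{-\varepsilon/2}\sqrt\eta = N^{-\varepsilon/2}(N\eta)^{-1}$ for $\eta = N^{-2/3}$.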
The inequalities (11.2) and (11.3), however, contradict each other. This shows that with very high probability there is no eigenvalue with $|\lambda - E| \le N^{-2/3}$. We can repeat this argument for a grid of energies $E = E_k = 2 + N^{-2/3+\varepsilon} + kN^{-2/3}$ for $k = 0, 1, \dots, CN^{2/3+D}$, for any finite $D$. We can then use the union bound for the exceptional sets of small probabilities to exclude the existence of eigenvalues between $2 + N^{-2/3+\varepsilon}$ and $N^D$. The same argument near the lower edge $-2$ handles negative eigenvalues. Finally, for a large enough $D$, it is trivial to prove that the bound $\|H\| \le \sum_{ij}|h_{ij}| \le N^D$ holds with very high probability. This excludes eigenvalues $|\lambda| \ge N^D$ and proves the corollary. □
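The two ingredients of this proof can be observed numerically. First, the extreme eigenvalues of a Wigner matrix sit within a few multiples of $N^{-2/3}$ of $\pm2$. Second, just outside the spectrum, $\mathrm{Im}\,m_{\mathrm{sc}}(E + i\eta)$ behaves like a constant times $\eta/\sqrt\kappa$. The sketch below uses a GOE matrix as an illustrative special case of a generalized Wigner matrix. It also uses the closed form $m_{\mathrm{sc}}(z) = (-z + \sqrt{z^2 - 4})/2$, with the branch that behaves like $-1/z$ at infinity. The seed, the size $N$, and the values of $\kappa$ and $\eta$ are arbitrary choices.

```python
import numpy as np

def goe(N, rng):
    """GOE normalized so that off-diagonal entries have variance 1/N (diagonal: 2/N)."""
    A = rng.standard_normal((N, N))
    return (A + A.T) / np.sqrt(2 * N)

def m_sc(z):
    """Stieltjes transform of the semicircle law; sqrt(z-2)*sqrt(z+2) gives the branch ~ z at infinity."""
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

rng = np.random.default_rng(3)
N = 400
lam = np.linalg.eigvalsh(goe(N, rng))
max_abs = np.abs(lam).max()
scaled_excess = (max_abs - 2) * N ** (2 / 3)   # of order one on the edge scale N^{-2/3}

kappa, eta = 1e-2, 1e-4
ratio = m_sc(2 + kappa + 1j * eta).imag / (eta / np.sqrt(kappa))   # of order one, as Im m_sc ~ eta/sqrt(kappa)
```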

11.2. Stieltjes Transform and Regularized Counting Function


For any signed measure $\tilde\varrho$ on the real line, define its Stieltjes transform by
$$S(z) = \int\frac{\tilde\varrho(\lambda)}{\lambda - z}\,d\lambda, \qquad z \in \mathbb C\setminus\mathbb R. \tag{11.4}$$
The Helffer-Sjöstrand formula was originally used to develop an alternative functional calculus for self-adjoint operators (see, e.g., [36]). It yields an expression for $\int_{\mathbb R}f(\lambda)\tilde\varrho(\lambda)\,d\lambda$, for a large class of functions $f(\lambda)$ on the real line, in terms of the Stieltjes transform of $\tilde\varrho$. More precisely, let $f \in C^2(\mathbb R)$ with compact support. Let $\chi(y)$ be a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \le \frac12$ and with bounded derivatives. Define
$$\tilde f(x + iy) := \big(f(x) + iyf'(x)\big)\chi(y)$$
to be an almost-analytic extension of $f$. With the standard convention $z = x + iy$ and $\partial_{\bar z} = [\partial_x + i\partial_y]/2$, we have

$$f(\lambda) = \frac1\pi\int_{\mathbb R^2}\frac{\partial_{\bar z}\tilde f(x + iy)}{\lambda - x - iy}\,dx\,dy. \tag{11.5}$$
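To see this identity, we use $\partial_{\bar z}(\lambda - z)^{-1} = 0$ to write
$$\frac1\pi\int_{\mathbb R^2}\frac{\partial_{\bar z}\tilde f(x + iy)}{\lambda - x - iy}\,dx\,dy = \frac{1}{2\pi i}\int_{\mathbb R^2}\partial_{\bar z}\Big[\frac{\tilde f(x + iy)}{\lambda - x - iy}\Big]\,d\bar z\,dz. \tag{11.6}$$
We can rewrite the last term as
$$\frac{1}{2\pi i}\int_{\mathbb R^2}d\Big[\frac{\tilde f(x + iy)}{\lambda - x - iy}\,dz\Big], \tag{11.7}$$
where the operator $d$ is the differential in the sense of a 1-form. From Green's theorem and the compact support of $\tilde f$, we can integrate by parts to a small circle $C_\varepsilon$ of radius $\varepsilon$ around $\lambda$. The circle is oriented clockwise, as part of the boundary of the exterior region. The contribution of the two-dimensional integration inside the circle vanishes in the limit $\varepsilon \to 0$. Hence, the last term is equal to
$$\lim_{\varepsilon\to0}\frac{1}{2\pi i}\int_{C_\varepsilon}\frac{\tilde f(x + iy)}{\lambda - x - iy}\,dz = f(\lambda), \tag{11.8}$$
and this proves (11.5). Computing $\partial_{\bar z}\tilde f$ explicitly, we get $\partial_{\bar z}\tilde f(x + iy) = \frac12\big[iyf''(x)\chi(y) + i(f(x) + iyf'(x))\chi'(y)\big]$. Hence, by (11.5),
$$f(\lambda) = \frac{1}{2\pi}\int_{\mathbb R^2}\frac{iyf''(x)\chi(y) + i\big(f(x) + iyf'(x)\big)\chi'(y)}{\lambda - x - iy}\,dx\,dy. \tag{11.9}$$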

We remark that (11.5) also holds if we simply define $\tilde f(x + iy) := f(x)\chi(y)$. The term $iyf'(x)\chi(y)$ is added to make $\partial_{\bar z}\tilde f(x + iy) = O(y)$ for small $|y|$, which will be needed in the following estimates. In fact, the concept of almost-analytic functions can be extended to arbitrary order $n$. We could have defined $\tilde f$ such that $\partial_{\bar z}\tilde f(x + iy) = O(|y|^n)$, but we will not need this result, as it would not improve our estimate.
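As a sanity check, the right-hand side of (11.9) can be evaluated by a midpoint rule on $[-1,1]^2$ and compared with $f(\lambda)$. The choices below are our own assumptions and not the book's:
- the test function $f(x) = (1 - x^2)^3$ on $[-1,1]$, which is $C^2$ with compact support;
- a $C^1$ cutoff $\chi$ built from the cubic smoothstep (a $C^1$ cutoff suffices for this identity, although the text asks for a smooth one);
- the grid size.

The prefactor is $1/(2\pi)$, which is what (11.5) produces once $\partial_{\bar z}\tilde f = \frac12[iyf''\chi + i(f + iyf')\chi']$ is inserted.

```python
import numpy as np

def f(x):    # test function (1 - x^2)^3 on [-1, 1], zero outside
    return np.where(np.abs(x) < 1, (1 - x**2) ** 3, 0.0)

def df(x):
    return np.where(np.abs(x) < 1, -6 * x * (1 - x**2) ** 2, 0.0)

def d2f(x):
    return np.where(np.abs(x) < 1, (1 - x**2) * (30 * x**2 - 6), 0.0)

def chi(y):  # C^1 cutoff: 1 for |y| <= 1/2, 0 for |y| >= 1, cubic smoothstep in between
    t = np.clip(2 * (1 - np.abs(y)), 0.0, 1.0)
    return 3 * t**2 - 2 * t**3

def dchi(y):
    t = np.clip(2 * (1 - np.abs(y)), 0.0, 1.0)
    return (6 * t - 6 * t**2) * (-2 * np.sign(y))

def helffer_sjostrand(lam, n=800):
    """Midpoint-rule value of the right-hand side of (11.9) with prefactor 1/(2 pi)."""
    h = 2.0 / n
    grid = -1 + h * (np.arange(n) + 0.5)        # midpoints of [-1, 1]; y = 0 is never a node
    x, y = np.meshgrid(grid, grid, indexing="ij")
    num = 1j * y * d2f(x) * chi(y) + 1j * (f(x) + 1j * y * df(x)) * dchi(y)
    return (num / (lam - x - 1j * y)).sum() * h * h / (2 * np.pi)

values = {lam: helffer_sjostrand(lam) for lam in (-0.5, 0.0, 0.3)}
```

The integrand of the first term stays bounded near $(\lambda, 0)$ because $|y/(\lambda - x - iy)| \le 1$. This is exactly the role of the $O(y)$ behavior of $\partial_{\bar z}\tilde f$ noted above, and it is why a plain midpoint rule suffices here.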
The following lemma, in a slightly less general form, appeared in [59,68,69]. It shows how to translate estimates on the Stieltjes transform $S(z)$ in the regime $\mathrm{Im}\,z \ge \tilde\eta$ into estimates on the regularized distribution function of $\tilde\varrho$. Given two real numbers $E_1 < E_2$, we wish to estimate $\tilde\varrho([E_1, E_2])$, the $\tilde\varrho$-measure of the interval $[E_1, E_2]$. Since the integral of the sharp indicator function cannot be directly controlled, we need to smooth it out; the function $f_2 - f_1$ in the lemma below plays this role. The scale of the smoothing, denoted by $\hat\eta$ below, must be at least $\tilde\eta$. Another feature of this lemma is that the condition on $S(z) = S(E + i\eta)$ is local. For $\eta < \frac12$, only $E$-values in an $\tilde\eta$-neighborhood of the endpoints $E_1, E_2$ are needed.
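Lemma 11.2. Let $\tilde\varrho$ be a signed measure on $\mathbb R$, and let $S$ be its Stieltjes transform. Fix two energies $E_1 < E_2$. Suppose that for some $0 < \tilde\eta \le \frac12$ and $0 < U_1 \le U_2$ we have
$$|S(x + iy)| \le \frac{U_1}{y}\quad\text{for any } x \in [E_1, E_1 + \tilde\eta]\cup[E_2, E_2 + \tilde\eta] \text{ and } \tilde\eta \le y \le 1, \tag{11.10}$$
$$|\mathrm{Im}\,S(x + iy)| \le \frac{U_1}{y}\quad\text{for any } x \in [E_1, E_2 + \tilde\eta] \text{ and } \tfrac12 \le y \le 1, \tag{11.11}$$
$$|\mathrm{Im}\,S(x + iy)| \le \frac{U_2}{y}\quad\text{for any } x \in [E_1, E_1 + \tilde\eta]\cup[E_2, E_2 + \tilde\eta] \text{ and } 0 < y < \tilde\eta. \tag{11.12}$$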
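For $\hat\eta \ge \tilde\eta$, define two functions $f_j = f_{E_j,\hat\eta}\colon\mathbb R\to\mathbb R$ such that $f_j(x) = 1$ for $x \in (-\infty, E_j]$ and $f_j(x)$ vanishes for $x \in [E_j + \hat\eta, \infty)$. Moreover, assume that $|f_j'(x)| \le C\hat\eta^{-1}$ and $|f_j''(x)| \le C\hat\eta^{-2}$ for some constant $C$. Then, for some other constant $C > 0$ independent of $U_1$, $U_2$, and $\tilde\eta$, we have
$$\Big|\int(f_2 - f_1)(\lambda)\,\tilde\varrho(\lambda)\,d\lambda\Big| \le C\|\tilde\varrho\|_{TV}\Big[U_1|\log\tilde\eta| + U_2\frac{\tilde\eta^2}{\hat\eta^2}\Big], \tag{11.13}$$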
where $\|\cdot\|_{TV}$ denotes the total variation norm of signed measures. In addition, if we assume that $\tilde\varrho$ has compact support, then we also have
$$\Big|\int f_2(\lambda)\,\tilde\varrho(\lambda)\,d\lambda\Big| \le C\|\tilde\varrho\|_{TV}\Big[U_1|\log\tilde\eta| + U_2\frac{\tilde\eta^2}{\hat\eta^2}\Big], \tag{11.14}$$
where $C$ depends on the size of the support of $\tilde\varrho$.

Proof. In this proof, we consider only the case $\tilde\eta = \hat\eta$ and $U_1 = U_2$, and we denote these common values by $\eta$ and $U$. We may also assume $\|\tilde\varrho\|_{TV} = 1$. These simplifications only streamline some notation; the general case is proven in exactly the same way.
Let $f := f_2 - f_1$; this function is smooth and compactly supported. From the representation (11.9) and since $f$ is real, we have
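$$\begin{aligned}\Big|\int_{-\infty}^\infty f(\lambda)\tilde\varrho(\lambda)\,d\lambda\Big| &= \Big|\mathrm{Re}\int_{-\infty}^\infty f(\lambda)\tilde\varrho(\lambda)\,d\lambda\Big| \le C\Big|\iint yf''(x)\chi(y)\,\mathrm{Im}\,S(x + iy)\,dx\,dy\Big|\\ &\quad + C\iint\big(|f(x)||\chi'(y)||\mathrm{Im}\,S(x + iy)| + |y||f'(x)||\chi'(y)||\mathrm{Re}\,S(x + iy)|\big)\,dx\,dy\end{aligned} \tag{11.15}$$
for some universal $C > 0$. Here $\chi$ is a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \le \frac12$ and with bounded derivatives. Recall that $f'$ is $O(\eta^{-1})$ on two intervals of size $O(\eta)$ each and vanishes elsewhere, and that $\chi'$ is supported in $\frac12 \le |y| \le 1$.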
By (11.10), we can bound
$$\iint|y||f'(x)||\chi'(y)||\mathrm{Re}\,S(x + iy)|\,dx\,dy \le CU. \tag{11.16}$$
Note that, since $S(x - iy) = \overline{S(x + iy)}$, the bounds analogous to (11.10)–(11.12) also hold for negative $y$. Using that $f$ is bounded with compact support, we have, by (11.11),
$$\iint|f(x)||\chi'(y)||\mathrm{Im}\,S(x + iy)|\,dx\,dy \le CU. \tag{11.17}$$
For the first term on the right-hand side of (11.15), we split the integral into two regimes, $0 < y < \eta$ and $\eta < y < 1$. Note that by symmetry we only need to consider positive $y$. From (11.12), the integral over the first regime is easily bounded by
$$\begin{aligned}\Big|\iint_{0<y<\eta}yf''(x)\chi(y)\,\mathrm{Im}\,S(x + iy)\,dx\,dy\Big| &= O\Big(\iint_{0<y<\eta}\big[\mathbf 1(E_1 \le x \le E_1 + \eta) + \mathbf 1(E_2 \le x \le E_2 + \eta)\big]\frac{U}{y}\,y\,\eta^{-2}\,dx\,dy\Big)\\ &= O(U).\end{aligned}$$
For the second regime, $y \in [\eta, 1]$, we can integrate by parts in $x$, then use the Cauchy-Riemann equation $\partial_x\mathrm{Im}\,S(x + iy) = -\partial_y\mathrm{Re}\,S(x + iy)$, and then integrate by parts in $y$ to get
$$\iint_{y>\eta}yf''(x)\chi(y)\,\mathrm{Im}\,S(x + iy)\,dx\,dy = -\iint_{y>\eta}f'(x)\,\partial_y(y\chi(y))\,\mathrm{Re}\,S(x + iy)\,dx\,dy \tag{11.18}$$
$$\qquad\qquad - \int f'(x)\,\eta\chi(\eta)\,\mathrm{Re}\,S(x + i\eta)\,dx. \tag{11.19}$$
Recall that $f'(x) = 0$ unless $|x - E_j| < \eta$ for some $j = 1, 2$, and in this regime we have $f' = O(\eta^{-1})$. By (11.10), the last integral is easily bounded by $O(U)$. For the first integral, we use $\partial_y(y\chi(y)) = O(1)$ and $S(x + iy) = O(U/y)$ to get
$$\Big|\iint_{y>\eta}\partial_y(y\chi(y))f'(x)S(x + iy)\,dx\,dy\Big| \le O\Big(U\int_\eta^1\frac{dy}{y}\Big) = O(U|\log\eta|), \tag{11.20}$$
which is (11.13). To prove (11.14), we choose $E_1$ so far to the left of the support of $\tilde\varrho$ that $f_1$ vanishes on the support, and notice that
$$\Big|\int f_2(\lambda)\tilde\varrho(\lambda)\,d\lambda\Big| = \Big|\int(f_2 - f_1)(\lambda)\tilde\varrho(\lambda)\,d\lambda\Big| \le CU|\log\tilde\eta|. \tag{11.21}$$
This completes the proof of Lemma 11.2. □

11.3. Convergence Speed of the Empirical Distribution Function


Define the empirical distribution function (or, in physics terminology, integrated density of states)
$$\mathfrak n_N(E) := \frac1N|\{\alpha : \lambda_\alpha \le E\}| = \int_{-\infty}^E\varrho_N(x)\,dx,$$
where $\varrho_N$ is the empirical eigenvalue distribution
$$\varrho_N(x) = \frac1N\sum_{\alpha=1}^N\delta(x - \lambda_\alpha). \tag{11.22}$$
Similarly, we define the distribution function of the semicircle density
$$n_{\mathrm{sc}}(E) := \int_{-\infty}^E\varrho_{\mathrm{sc}}(x)\,dx.$$
We introduce the differences
$$\varrho_\Delta := \varrho_N - \varrho_{\mathrm{sc}}, \qquad m_\Delta := m_N - m_{\mathrm{sc}},$$
and recall that $m_N(z) = \frac1N\mathrm{Tr}\,G(z) = \int\varrho_N(x)(x - z)^{-1}\,dx$ is the Stieltjes transform of the empirical density. The following lemma, proven below, shows how the local semicircle law implies a convergence rate for the empirical distribution function.
Lemma 11.3. Suppose that for some $|E| \le 10$ and $N^{-1+\varepsilon} \le \tilde\eta \ll 1$ with some $\varepsilon > 0$ we know that
$$|m_N(x + i\eta) - m_{\mathrm{sc}}(x + i\eta)| \prec \frac{1}{N\eta} \tag{11.23}$$
holds for any $|x - E| \le \tilde\eta$ and any $\eta \in [\tilde\eta, 10]$. Then we have
$$|\mathfrak n_N(E) - n_{\mathrm{sc}}(E)| \prec \tilde\eta. \tag{11.24}$$
For generalized Wigner matrices, the condition (11.23) holds with the choice $\tilde\eta = N^{-1+\varepsilon}$ uniformly in $|E| \le 10$ (see (6.29) and (6.32)). Thus, we have proved the following corollary:
Corollary 11.4. For generalized Wigner matrices, we have
$$\sup_{|E|\le10}|\mathfrak n_N(E) - n_{\mathrm{sc}}(E)| \le N^{-1+\varepsilon} \tag{11.25}$$
with probability at least $1 - N^{-D}$ for any $D, \varepsilon > 0$ if $N \ge N_0(\varepsilon, D)$ is sufficiently large.
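Corollary 11.4 can be illustrated on a single sample. We use the closed form of the semicircle distribution function for $\varrho_{\mathrm{sc}}(x) = \frac{1}{2\pi}\sqrt{4 - x^2}$, namely $n_{\mathrm{sc}}(E) = \frac12 + \frac{E\sqrt{4 - E^2}}{4\pi} + \frac1\pi\arcsin\frac E2$ for $|E| \le 2$. The sketch computes $\sup_E|\mathfrak n_N(E) - n_{\mathrm{sc}}(E)|$ on a fine grid for one GOE matrix, a special generalized Wigner matrix. The sample size, seed, grid, and tolerance in the check are our own assumptions.

```python
import numpy as np

def n_sc(E):
    """Distribution function of rho_sc(x) = sqrt(4 - x^2) / (2 pi), supported on [-2, 2]."""
    E = np.clip(E, -2.0, 2.0)
    return 0.5 + E * np.sqrt(4 - E**2) / (4 * np.pi) + np.arcsin(E / 2) / np.pi

def n_N(lam, E):
    """Empirical distribution function |{alpha : lambda_alpha <= E}| / N."""
    return np.searchsorted(np.sort(lam), E, side="right") / len(lam)

rng = np.random.default_rng(4)
N = 500
A = rng.standard_normal((N, N))
lam = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))

E_grid = np.linspace(-3, 3, 6001)
sup_dev = np.abs(n_N(lam, E_grid) - n_sc(E_grid)).max()   # compare with the N^{-1+eps} bound (11.25)
```

The step size of $\mathfrak n_N$ is $1/N$, so `sup_dev * N` measures the deviation in units of the natural resolution.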
Proof of Lemma 11.3. The semicircle measure is supported in $[-2, 2]$, and by Corollary 11.1 the empirical measure is supported in $[-3, 3]$ with very high probability. Hence (11.24) clearly holds for $E \le -3$ and for $E \ge 3$. For an energy $|E| \le 3$, we will apply Lemma 11.2 with the choice $\tilde\varrho = \varrho_\Delta$, $U_1 = U_2 = N^\varepsilon\tilde\eta$, and $E_2 = E$. The estimate (11.14) will immediately imply (11.24), provided the other assumptions of Lemma 11.2 are satisfied.
For this purpose, we need the bounds (11.10), (11.11), and (11.12) on the Stieltjes transform with the choice $U_1 = U_2 = N^\varepsilon\tilde\eta$. We denote this common value by $U := N^\varepsilon\tilde\eta$. The assumption (11.23) implies (11.10) (with very high probability) if we choose $U \ge N^{-1+\varepsilon}$. From Lemma 6.2, we find
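$$|\mathrm{Im}\,m_{\mathrm{sc}}(x + iy)| \le C\sqrt{\kappa_x + y}, \tag{11.26}$$
recalling the definition $\kappa_x = \min\{|x - 2|, |x + 2|\}$ from (6.23). By the spectral decomposition of $H$,
$$y\,\mathrm{Im}\,m_N(x + iy) = \frac1N\sum_\alpha\frac{y^2}{(\lambda_\alpha - x)^2 + y^2}.$$
Each term is increasing in $y$, so the function $y \mapsto y\,\mathrm{Im}\,m_N(x + iy)$ is monotone increasing. Thus, using (11.26) and (11.23), we get
$$y\,\mathrm{Im}\,m_N(x + iy) \le \tilde\eta\,\mathrm{Im}\,m_N(x + i\tilde\eta) \prec \tilde\eta\Big(\sqrt{\kappa_x + \tilde\eta} + \frac{1}{N\tilde\eta}\Big) = \tilde\eta\sqrt{\kappa_x + \tilde\eta} + \frac1N \tag{11.27}$$
for $y \le \tilde\eta$ and $|x| \le 10$. Using the notation $m_\Delta := m_N - m_{\mathrm{sc}}$ and recalling (11.26), we therefore get
$$|y\,\mathrm{Im}\,m_\Delta(x + iy)| \prec \tilde\eta\sqrt{\kappa_x + \tilde\eta} + \frac1N \le C\tilde\eta \tag{11.28}$$
for $y \le \tilde\eta$ and $|x| \le 10$. In the last step we used the assumption $\tilde\eta \ge \frac1N$ and the boundedness of $\kappa_x$ for $|x| \le 10$. Therefore, with the choice $U = \tilde\eta N^\varepsilon$, the bounds (11.10), (11.11), and (11.12) indeed hold.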
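The monotonicity of $y \mapsto y\,\mathrm{Im}\,m_N(x + iy)$ drives (11.27), and it is easy to see numerically. The sketch evaluates $y\,\mathrm{Im}\,m_N(x + iy)$ through the spectral formula on a logarithmic grid of $y$. It also cross-checks that formula once against the resolvent definition $m_N = \frac1N\mathrm{Tr}\,G$. The GOE test matrix, the point $x = 0.7$, and the grid are arbitrary illustrative choices.

```python
import numpy as np

def y_im_mN(lam, x, y):
    """y * Im m_N(x + iy) = (1/N) sum_alpha y^2 / ((lambda_alpha - x)^2 + y^2)."""
    return np.mean(y**2 / ((lam - x) ** 2 + y**2))

rng = np.random.default_rng(5)
N = 200
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)
lam = np.linalg.eigvalsh(H)

x = 0.7
ys = np.geomspace(1e-4, 10.0, 200)
vals = np.array([y_im_mN(lam, x, y) for y in ys])   # should be increasing in y

# Cross-check the spectral formula against m_N(z) = Tr G(z) / N at one point.
y0 = ys[100]
z = x + 1j * y0
m_N = np.trace(np.linalg.inv(H - z * np.eye(N))) / N
cross_gap = abs(y0 * m_N.imag - y_im_mN(lam, x, y0))
```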
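Recall from Lemma 11.2 that $f_{E,\tilde\eta}$ denotes a smooth indicator function of $(-\infty, E]$ on scale $\tilde\eta$. That is, $f_{E,\tilde\eta}(x) = 1$ for $x \in (-\infty, E]$ and $f_{E,\tilde\eta}(x) = 0$ for $x \ge E + \tilde\eta$, with first and second derivatives bounded by $C\tilde\eta^{-1}$ and $C\tilde\eta^{-2}$, respectively. Using (11.14) from Lemma 11.2, we conclude that
$$\Big|\int f_{E,\tilde\eta}(\lambda)\varrho_\Delta(\lambda)\,d\lambda\Big| \prec \tilde\eta. \tag{11.29}$$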
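Hence, we have
$$\mathfrak n_N(E) - n_{\mathrm{sc}}(E) - \int f_{E,\tilde\eta}(\lambda)\varrho_\Delta(\lambda)\,d\lambda = -\int_E^{E+\tilde\eta}f_{E,\tilde\eta}(\lambda)\varrho_N(\lambda)\,d\lambda + \int_E^{E+\tilde\eta}f_{E,\tilde\eta}(\lambda)\varrho_{\mathrm{sc}}(\lambda)\,d\lambda \le C\tilde\eta.$$
Notice that we only used the positivity of $\varrho_N$ in the first term and the boundedness of $f$ and $\varrho_{\mathrm{sc}}$ in the second. Conversely, we have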

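$$\mathfrak n_N(E) - n_{\mathrm{sc}}(E) - \int f_{E-\tilde\eta,\tilde\eta}(\lambda)\varrho_\Delta(\lambda)\,d\lambda = \int_{E-\tilde\eta}^E(1 - f_{E-\tilde\eta,\tilde\eta})(\lambda)\varrho_N(\lambda)\,d\lambda - \int_{E-\tilde\eta}^E(1 - f_{E-\tilde\eta,\tilde\eta})(\lambda)\varrho_{\mathrm{sc}}(\lambda)\,d\lambda \ge -C\tilde\eta.$$
Hence we have proved that
$$-C\tilde\eta + \int f_{E-\tilde\eta,\tilde\eta}(\lambda)\varrho_\Delta(\lambda)\,d\lambda \le \mathfrak n_N(E) - n_{\mathrm{sc}}(E) \le \int f_{E,\tilde\eta}(\lambda)\varrho_\Delta(\lambda)\,d\lambda + C\tilde\eta.$$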

The right-hand side is directly bounded by (11.29). The left-hand side is estimated in the same way, using (11.29) for $f_{E-\tilde\eta,\tilde\eta}$ instead of $f_{E,\tilde\eta}$. This proves Lemma 11.3. □
11.4. Rigidity of Eigenvalues


We now state and prove the rigidity theorem concerning the eigenvalue locations for generalized Wigner matrices. Recall the definition of $\varrho_N$ from (11.22). Since the eigenvalues are labeled in increasing order, we have
$$\frac jN = \int_{-\infty}^{\lambda_j}\varrho_N(x)\,dx. \tag{11.30}$$
We define the classical location of the $j$th eigenvalue by the equation
$$\frac jN = \int_{-\infty}^{\gamma_j}\varrho_{\mathrm{sc}}(x)\,dx; \tag{11.31}$$
in other words, $\gamma_j$ is the $j$th $N$-quantile of the semicircle distribution.
Theorem 11.5 (Rigidity of eigenvalues). For any generalized Wigner matrix ensemble, for any constant $\varepsilon > 0$ and any constant $D \ge 1$ we have
$$\mathbb P\Big\{\exists j : |\lambda_j - \gamma_j| \ge N^\varepsilon\big[\min(j, N - j + 1)\big]^{-1/3}N^{-2/3}\Big\} \le N^{-D} \tag{11.32}$$
for any sufficiently large $N \ge N_0(\varepsilon, D)$.
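Proof. We consider only the case $j \le N/2$; the other case is similar. By (11.30) and (11.31) we have
$$0 = \int_{-\infty}^{\lambda_j}\varrho_N(x)\,dx - \int_{-\infty}^{\gamma_j}\varrho_{\mathrm{sc}}(x)\,dx = \int_{-\infty}^{\lambda_j}\big(\varrho_N(x) - \varrho_{\mathrm{sc}}(x)\big)\,dx - \int_{\lambda_j}^{\gamma_j}\varrho_{\mathrm{sc}}(x)\,dx, \tag{11.33}$$
i.e.,
$$\Big|\int_{\lambda_j}^{\gamma_j}\varrho_{\mathrm{sc}}(x)\,dx\Big| \le \Big|\int_{-\infty}^{\lambda_j}\big(\varrho_N(x) - \varrho_{\mathrm{sc}}(x)\big)\,dx\Big| = |\mathfrak n_N(\lambda_j) - n_{\mathrm{sc}}(\lambda_j)|,$$
and thus, by the uniform bound (11.25),
$$\Big|\int_{\lambda_j}^{\gamma_j}\varrho_{\mathrm{sc}}(x)\,dx\Big| \prec \frac1N. \tag{11.34}$$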

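Since $\varrho_{\mathrm{sc}}(x) \asymp \sqrt{\kappa_x}$ for $x \in [-2, 1]$, we have
$$n_{\mathrm{sc}}(x) \asymp \kappa_x^{3/2}, \qquad \varrho_{\mathrm{sc}}(x) = n_{\mathrm{sc}}'(x) \asymp n_{\mathrm{sc}}^{1/3}(x), \qquad x \in [-2, 1]. \tag{11.35}$$
This also implies, for $j \le N/2$, that
$$\gamma_j + 2 \asymp \Big(\frac jN\Big)^{2/3}, \qquad \varrho_{\mathrm{sc}}(\gamma_j) \asymp \Big(\frac jN\Big)^{1/3}. \tag{11.36}$$
If we knew that $\varrho_{\mathrm{sc}}(\gamma_j)$ and $\varrho_{\mathrm{sc}}(\lambda_j)$ were comparable, then the monotonicity of $\varrho_{\mathrm{sc}}$ and the mean value theorem applied to (11.34) would immediately imply
$$|\lambda_j - \gamma_j|\,\varrho_{\mathrm{sc}}(\gamma_j) \prec N^{-1}. \tag{11.37}$$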
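Combining this with the second asymptotics from (11.36), we would have
$$|\lambda_j - \gamma_j| \prec N^{-1}\varrho_{\mathrm{sc}}(\gamma_j)^{-1} \le CN^{-1}(j/N)^{-1/3}. \tag{11.38}$$
For the complete proof, we first consider the indices $j \ge j_0 := N^{\varepsilon/2}$. Since $n_{\mathrm{sc}}(\gamma_j) = j/N \ge N^{-1+\varepsilon/2}$ and $n_{\mathrm{sc}}(\gamma_j) = \mathfrak n_N(\lambda_j)$, we have
$$|n_{\mathrm{sc}}(\lambda_j) - n_{\mathrm{sc}}(\gamma_j)| \le N^{-\varepsilon/4}n_{\mathrm{sc}}(\gamma_j)$$
with very high probability by (11.25). This shows that $n_{\mathrm{sc}}(\lambda_j) \asymp n_{\mathrm{sc}}(\gamma_j)$, and then $\varrho_{\mathrm{sc}}(\lambda_j) \asymp \varrho_{\mathrm{sc}}(\gamma_j)$ by (11.35), as we presumed above.
Finally, for indices $j \le j_0 = N^{\varepsilon/2}$ we use monotonicity and the rigidity bound (11.38) for the index $j_0$:
$$\lambda_j \le \lambda_{j_0} \le \gamma_{j_0} + N^{-2/3+\varepsilon/2} \le -2 + N^{-2/3+\varepsilon}$$
with very high probability. In the last step we used (11.36). For the lower bound on $\lambda_j$ we refer to (11.1). This shows that $|\lambda_j + 2| \le N^{-2/3+\varepsilon}$ with very high probability. We also have $|\gamma_j + 2| \le N^{-2/3+\varepsilon}$, so $|\lambda_j - \gamma_j| \le 2N^{-2/3+\varepsilon}$. This proves the rigidity estimate (11.32) for all indices $j$. □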
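Theorem 11.5 and the asymptotics (11.36) can be illustrated on one GOE sample. The sketch computes the classical locations $\gamma_j$ by inverting $n_{\mathrm{sc}}$ with a vectorized bisection. It then compares $|\lambda_j - \gamma_j|$ with the scale $[\min(j, N - j + 1)]^{-1/3}N^{-2/3}$ from (11.32) and checks that $(\gamma_j + 2)/(j/N)^{2/3}$ stays bounded above and below for $j \le N/2$. The sample size, the seed, and the factor $N^{1/2}$ (i.e. $\varepsilon = 1/2$) used in the check are our own illustrative assumptions.

```python
import numpy as np

def n_sc(E):
    """Distribution function of the semicircle law on [-2, 2]."""
    E = np.clip(E, -2.0, 2.0)
    return 0.5 + E * np.sqrt(4 - E**2) / (4 * np.pi) + np.arcsin(E / 2) / np.pi

def classical_locations(N, iters=60):
    """gamma_j solving n_sc(gamma_j) = j / N, j = 1..N, by vectorized bisection on [-2, 2]."""
    target = np.arange(1, N + 1) / N
    lo, hi = np.full(N, -2.0), np.full(N, 2.0)
    for _ in range(iters):
        mid = (lo + hi) / 2
        below = n_sc(mid) < target
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return (lo + hi) / 2

rng = np.random.default_rng(6)
N = 400
A = rng.standard_normal((N, N))
lam = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))   # increasing order: lambda_1 < ... < lambda_N
gamma = classical_locations(N)

j = np.arange(1, N + 1)
scale = np.minimum(j, N - j + 1) ** (-1 / 3) * N ** (-2 / 3)   # the scale in (11.32) without N^eps
ratio = np.abs(lam - gamma) / scale

edge = np.arange(1, N // 2 + 1)
r = (gamma[edge - 1] + 2) / (edge / N) ** (2 / 3)   # bounded above and below, as in (11.36)
```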
CHAPTER 12

Universality for Matrices with Gaussian Convolutions

12.1. Dyson Brownian Motion


Consider the following matrix-valued stochastic differential equation
$$dH(t) = \frac{1}{\sqrt N}\,dB(t) - \frac12H(t)\,dt, \qquad t \ge 0, \tag{12.1}$$
with initial data $H_0$, where $B(t)$ is defined somewhat differently in the two symmetry classes, namely:

(i) In the real symmetric case (indicated by the superscript s), $B^{(s)}$ is an $N \times N$ matrix such that $b^{(s)}_{ij}$ for $i < j$ and $b^{(s)}_{ii}/\sqrt2$ are independent standard Brownian motions and $b^{(s)}_{ij} = b^{(s)}_{ji}$.
(ii) In the complex Hermitian case, $B^{(h)}$ is an $N \times N$ matrix such that $\sqrt2\,\mathrm{Re}(b^{(h)}_{ij})$ and $\sqrt2\,\mathrm{Im}(b^{(h)}_{ij})$ for $i < j$, together with $b^{(h)}_{ii}$, are independent standard Brownian motions and $b^{(h)}_{ji} = (b^{(h)}_{ij})^*$.

We will drop the s or h superscript; the following formulas hold for both cases. In coordinates, we have
$$dh_{ij}(t) = \frac{db_{ij}(t)}{\sqrt N} - \frac12h_{ij}(t)\,dt, \tag{12.2}$$
where the entries $b_{ij}(t)$ have variance $t$ in the Hermitian case. In the symmetric case the off-diagonal entries $b_{ij}(t)$ have variance $t$, while the diagonal entries have variance $2t$.
The equation (12.1) defines a matrix-valued Ornstein-Uhlenbeck (OU) process. As Dyson discovered, it plays a distinguished role in the theory of random matrices. Depending on typographical convenience, the time dependence will sometimes be indicated as an argument, $H(t)$, and sometimes as an index, $H_t$; we keep both notations in parallel, i.e., $H(t) = H_t$. In particular, unlike in some PDE literature, subscripts do not denote derivatives.
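Because (12.2) is linear, each entry is a scalar OU process with rate $1/2$. The flow can therefore be simulated with exact Gaussian transitions instead of an Euler scheme. The sketch below does this in the real symmetric case. It illustrates the remark that the flow preserves the $1/N$ variance of the off-diagonal entries (and drives the diagonal variance to $2/N$), so its stationary law is GOE. The starting point $H_0 = 0$, the time step, and the helper names are hypothetical choices.

```python
import numpy as np

def sym_brownian_increment(N, dt, rng):
    """Increment of B^(s) over a time dt: off-diagonal variance dt, diagonal variance 2*dt."""
    A = rng.standard_normal((N, N)) * np.sqrt(dt / 2)
    return A + A.T

def ou_step(H, dt, rng):
    """Exact transition of dH = dB/sqrt(N) - H dt/2 over a step dt.

    Each entry is a scalar OU process with rate 1/2, so
    h(t + dt) = e^{-dt/2} h(t) + sqrt(1 - e^{-dt}) * (Gaussian with the stationary variance),
    and the stationary covariance is that of B^(s)(1) / sqrt(N), i.e. a GOE matrix.
    """
    N = H.shape[0]
    noise = sym_brownian_increment(N, 1.0, rng) / np.sqrt(N)
    return np.exp(-dt / 2) * H + np.sqrt(1 - np.exp(-dt)) * noise

rng = np.random.default_rng(7)
N, dt, steps = 200, 0.1, 100              # total time t = 10
H = np.zeros((N, N))                      # hypothetical initial data H_0 = 0
for _ in range(steps):
    H = ou_step(H, dt, rng)

off = H[np.triu_indices(N, k=1)]
var_off, var_diag = off.var(), np.diag(H).var()
lam = np.linalg.eigvalsh(H)               # eigenvalues in increasing order, as in Sigma_N
```

Starting from $H_0 = 0$, the entry variances at time $t$ are $(1 - e^{-t})/N$ off the diagonal and $2(1 - e^{-t})/N$ on it, which are essentially stationary by $t = 10$.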
Next, we are concerned with the evolution of the eigenvalues $\boldsymbol\lambda(t) = (\lambda_1(t), \dots, \lambda_N(t))$ of $H_t$ along the OU flow. We label the eigenvalues in increasing order; i.e., we assume that $\boldsymbol\lambda(t) \in \Sigma_N$, where we define the open simplex
$$\Sigma_N := \{\boldsymbol\lambda \in \mathbb R^N : \lambda_1 < \dots < \lambda_N\}. \tag{12.3}$$

A theorem below will guarantee that the eigenvalues are simple and are con-
tinuous functions of 𝑡, hence the labeling is consistently preserved along the
evolution.
In principle, the eigenvalues {𝜆𝑗 (𝑡)} and eigenvectors {𝐮𝑗 (𝑡)} of 𝐻𝑡 are strongly
related, and one would expect a coupled system of stochastic differential equa-
tions for them. It was Dyson’s fundamental observation [45] that the eigenvalues
themselves satisfy an autonomous system of stochastic differential equations
(SDE) that do not involve the eigenvectors. This SDE is given in the following
definition.

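Definition 12.1. Given a real parameter $\beta \ge 1$, consider the following system of SDE:
$$d\lambda_i = \sqrt{\frac{2}{\beta N}}\,dB_i + \Big(-\frac{\lambda_i}{2} + \frac1N\sum_j^{(i)}\frac{1}{\lambda_i - \lambda_j}\Big)dt, \qquad i \in [\![1, N]\!], \tag{12.4}$$
where $(B_i)$ is a collection of real-valued, independent standard Brownian motions and $\sum_j^{(i)}$ denotes the sum over $j \ne i$. The solution of this SDE is called the Dyson Brownian motion (DBM) with parameter $\beta$.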

Notice that we defined the DBM for any $\beta \ge 1$, and not only for the classical values $\beta = 1, 2$ that correspond to an OU matrix flow. The following theorem summarizes the main properties of the DBM. For the proof, see lemma 4.3.3 and proposition 4.3.5 of [8]. We remark that the authors in [8] considered (12.4) without the drift term $-\lambda_i/2$. However, any drift term of the form $-V'(\lambda_i)$ with $V \in C^2(\mathbb R)$ could be added without further complications.

Theorem 12.2. Let $\beta \ge 1$ and suppose that the initial data satisfy $\boldsymbol\lambda(0) \in \overline{\Sigma}_N$. Then there exists a unique (strong) solution to (12.4) in the space of continuous functions $(\boldsymbol\lambda(t))_{t\ge0} \in C(\mathbb R_+, \overline{\Sigma}_N)$. Furthermore, for any $t > 0$ we have $\boldsymbol\lambda(t) \in \Sigma_N$, and $\boldsymbol\lambda(t)$ depends continuously on $\boldsymbol\lambda(0)$. In particular, if $\boldsymbol\lambda(0) \in \Sigma_N$, i.e., the multiplicity of the initial points is 1, then $(\boldsymbol\lambda(t))_{t\ge0} \in C(\mathbb R_+, \Sigma_N)$; i.e., this property is preserved for all times along the evolution.

We point out that the solution is considered in the strong sense. This means that, despite the singularity in (12.4), the DBM admits a solution for almost all realizations of the Brownian motions $B_i$. For the precise definition, we recall the concept of a strong solution to a (scalar) SDE of the form
$$dX_t = a(X_t)\,dB_t + b(X_t)\,dt, \qquad X_0 = \xi, \quad t \ge 0, \tag{12.5}$$
with respect to a fixed realization of a Brownian motion $B_t$, where $a$ and $b$ are given coefficient functions. The strong solution is a process $(X_t)_{t\ge0}$ on a filtered probability space $(\Omega, \mathcal F_t)$ with continuous sample paths that satisfies the following properties:
• $X_t$ is adapted to the filtration $\mathcal F_t = \sigma(\mathcal G_t \cup \mathcal N)$, where
$$\mathcal G_t = \sigma(B_s, s \le t, X_0), \qquad \mathcal N := \{N \subset \Omega : \exists\,G \in \mathcal G_\infty \text{ with } N \subset G,\ \mathbb P(G) = 0\};$$
i.e., the filtration of $X_t$ is the same as that of $B_t$ after completing it with events of zero measure.
• $X_0 = \xi$ almost surely.
• For any time $t$, we have $\int_0^t\big[|b(X_s)|^2 + |a(X_s)|^2\big]\,ds < \infty$ almost surely.
• The integral version of (12.5), i.e.,
$$X_t = X_0 + \int_0^ta(X_s)\,dB_s + \int_0^tb(X_s)\,ds,$$
holds almost surely for any $t \ge 0$.
For simplicity, we only presented the definition for the scalar case, $X_t \in \mathbb R$; the extension to the vector-valued case, $X_t \in \mathbb R^N$, is straightforward.
The relation between the OU matrix flow (12.1) and the DBM (12.4) is given
by the following theorem, essentially due to Dyson.
Theorem 12.3 ([45]). Let 𝐻𝑡 solve the matrix-valued SDE (12.1) in a strong
sense. Then its eigenvalue process satisfies (12.4) where the parameter 𝛽 = 1 if the
matrix B is real symmetric and 𝛽 = 2 if B is complex Hermitian.
We remark that the Ornstein-Uhlenbeck drift term $-\frac12H_t\,dt$ in (12.1) is not essential. One may replace it with $-V'(H)\,dt$ for any potential $V \in C^2(\mathbb R)$; then the same theorem holds, but the $-\lambda_i/2$ term has to be replaced with $-V'(\lambda_i)$
in (12.4). In particular, the drift term can be removed; for example, the mono-
graph [8] defines DBM without the drift term. While there is no fundamental
difference between these formulations, (12.1) is technically slightly more con-
venient since it preserves not only the zero expectation value but also the 1/𝑁
variance of the matrix elements if the variance of the initial data 𝐻0 is 1/𝑁.
If technical complications related to the singularities (𝜆𝑖 − 𝜆𝑗 )−1 in (12.4)
could be neglected, the proof of Theorem 12.3 would be a relatively straightfor-
ward Itô calculus combined with standard perturbation formulas for eigenval-
ues and eigenvectors of self-adjoint matrices. We will present this calculation
in the next section. A rigorous proof is slightly more cumbersome since a priori
the 𝐿2 -integrability of the singularity is not guaranteed. The actual proof first
regularizes the singular term on a very short scale. Then a stopping time ar-
gument shows that the regularization can be removed since the dynamics has
an effective level repulsion mechanism that keeps neighboring points separated.
We will not present this argument here since it is not particularly instructive
for this book. The full details can be found in section 4.3.1 of [8] for the case when the term $-\frac12H_t\,dt$ is not present in (12.1). The necessary modifications for (12.1) (or for some other drift term $-V'(\lambda_i)$) are straightforward.

12.2. Derivation of Dyson Brownian Motion and Perturbation Theory


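Let $\mathbf u_\alpha$, $\alpha = 1, \dots, N$, be the $\ell^2$-normalized eigenvectors of $H = (h_{ij})$ with (real) eigenvalues $\lambda_\alpha$. For simplicity, we assume that all eigenvalues are distinct. Fix an index pair $(i, j)$ and abbreviate the partial derivative as
$$\dot f := \frac{\partial f}{\partial h_{ij}}.$$
Differentiating $H\mathbf u_\alpha = \lambda_\alpha\mathbf u_\alpha$ and $\mathbf u_\alpha^*\mathbf u_\beta = \delta_{\alpha,\beta}$ yields
$$\dot H\mathbf u_\alpha + H\dot{\mathbf u}_\alpha = \dot\lambda_\alpha\mathbf u_\alpha + \lambda_\alpha\dot{\mathbf u}_\alpha, \tag{12.6}$$
as well as
$$\dot{\mathbf u}_\alpha^*\mathbf u_\beta + \mathbf u_\alpha^*\dot{\mathbf u}_\beta = 0, \qquad \dot{\mathbf u}_\alpha^*\mathbf u_\alpha = 0.$$
Taking the inner product with $\mathbf u_\alpha$ on both sides of (12.6), we get
$$\dot\lambda_\alpha = \mathbf u_\alpha^*\dot H\mathbf u_\alpha. \tag{12.7}$$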

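Moreover, multiplying (12.6) by $\mathbf u_\beta^*$ yields, for $\beta \ne \alpha$,
$$\mathbf u_\beta^*\dot H\mathbf u_\alpha + \mathbf u_\beta^*H\dot{\mathbf u}_\alpha = \lambda_\alpha\mathbf u_\beta^*\dot{\mathbf u}_\alpha;$$
i.e.,
$$\mathbf u_\beta^*\dot H\mathbf u_\alpha + \lambda_\beta\mathbf u_\beta^*\dot{\mathbf u}_\alpha = \lambda_\alpha\mathbf u_\beta^*\dot{\mathbf u}_\alpha,$$
where we have used that $H$ is self-adjoint and $\lambda_\beta$ is real in the last equation. Hence,
$$\dot{\mathbf u}_\alpha = \sum_{\beta\ne\alpha}(\mathbf u_\beta^*\dot{\mathbf u}_\alpha)\mathbf u_\beta = \sum_{\beta\ne\alpha}\frac{\mathbf u_\beta^*\dot H\mathbf u_\alpha}{\lambda_\alpha - \lambda_\beta}\mathbf u_\beta. \tag{12.8}$$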

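For simplicity of notation, from now on we consider only the real symmetric case; the complex Hermitian case can be treated similarly. In the real symmetric case, with $h_{ij} = h_{ji}$ treated as a single variable, (12.7) reads as
$$\frac{\partial\lambda_\alpha}{\partial h_{ij}} = u_\alpha(i)u_\alpha(j)[2 - \delta_{ij}], \tag{12.9}$$
where $u_\alpha(1), \dots, u_\alpha(N)$ denote the coordinates of the vector $\mathbf u_\alpha$. We may assume that the eigenvectors are real. From (12.8) we get
$$\frac{\partial u_\alpha(k)}{\partial h_{ij}} = \sum_{\beta\ne\alpha}\frac{u_\beta(i)u_\alpha(j) + u_\beta(j)u_\alpha(i)[1 - \delta_{ij}]}{\lambda_\alpha - \lambda_\beta}\,u_\beta(k). \tag{12.10}$$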
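Combining these last two formulas allows us to compute the second partial derivatives; i.e., for any fixed indices $i, j, k$, and $\ell$ we have
$$\begin{aligned}\frac{\partial^2\lambda_\alpha}{\partial h_{\ell j}\partial h_{ik}} &= [2 - \delta_{ik}]\Big[\frac{\partial u_\alpha(i)}{\partial h_{\ell j}}u_\alpha(k) + u_\alpha(i)\frac{\partial u_\alpha(k)}{\partial h_{\ell j}}\Big]\\ &= [2 - \delta_{ik}]\sum_{\beta\ne\alpha}\frac{1}{\lambda_\alpha - \lambda_\beta}\Big[\big(u_\beta(j)u_\alpha(\ell) + u_\beta(\ell)u_\alpha(j)[1 - \delta_{j\ell}]\big)u_\beta(i)u_\alpha(k)\\ &\qquad\qquad + \big(u_\beta(\ell)u_\alpha(j) + u_\beta(j)u_\alpha(\ell)[1 - \delta_{j\ell}]\big)u_\beta(k)u_\alpha(i)\Big]\\ &= [2 - \delta_{ik}]\sum_{\beta\ne\alpha}\frac{1}{\lambda_\alpha - \lambda_\beta}\big(u_\beta(j)u_\alpha(\ell) + u_\beta(\ell)u_\alpha(j)[1 - \delta_{j\ell}]\big)\big(u_\beta(i)u_\alpha(k) + u_\beta(k)u_\alpha(i)\big).\end{aligned} \tag{12.11}$$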

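By (12.2), (12.9), (12.11), and Itô's formula (neglecting the issue of singularity), we have
$$\begin{aligned}d\lambda_\alpha &= \sum_{i\le k}\frac{\partial\lambda_\alpha}{\partial h_{ik}}dh_{ik} + \frac12\sum_{i\le k}\sum_{j\le\ell}\frac{\partial^2\lambda_\alpha}{\partial h_{ik}\partial h_{\ell j}}(dh_{ik})(dh_{\ell j})\\ &= \sum_{i,k}u_\alpha(i)u_\alpha(k)\Big[\frac{db_{ik}}{\sqrt N} - \frac{h_{ik}}{2}dt\Big] + \frac{1}{2N}\sum_{i,k}\sum_{\beta\ne\alpha}\frac{1}{\lambda_\alpha - \lambda_\beta}\big[|u_\beta(i)|^2|u_\alpha(k)|^2 + |u_\alpha(i)|^2|u_\beta(k)|^2\big]dt\\ &= \frac{1}{\sqrt N}\sum_{i,k}u_\alpha(i)u_\alpha(k)\,db_{ik} - \frac{\lambda_\alpha}{2}dt + \frac1N\sum_{\beta\ne\alpha}\frac{1}{\lambda_\alpha - \lambda_\beta}dt.\end{aligned} \tag{12.12}$$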

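In the second line we used $b_{ik} = b_{ki}$ for $i \ne k$ and that
$$(dh_{ik})(dh_{\ell j}) = \frac1N\delta_{i\ell}\delta_{kj}[1 + \delta_{ik}]\,dt.$$
Finally, in the last line we used the equation $H\mathbf u_\alpha = \lambda_\alpha\mathbf u_\alpha$ to get
$$\sum_{i,k}u_\alpha(i)u_\alpha(k)h_{ik} = \sum_iu_\alpha(i)(H\mathbf u_\alpha)(i) = \lambda_\alpha\sum_i|u_\alpha(i)|^2 = \lambda_\alpha,$$
which gives the $-(\lambda_i/2)\,dt$ term in (12.4).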


In the first term on the right-hand side of (12.12) we define a new real Gaussian process
$$\tilde B_\alpha := \sum_{i,k}u_\alpha(i)u_\alpha(k)b_{ik}.$$
Clearly, $\mathbb E\,d\tilde B_\alpha = 0$ and its covariance satisfies
$$\mathbb E\,d\tilde B_\alpha\,d\tilde B_{\alpha'} = \mathbb E\sum_{i,k}\sum_{\ell,j}u_\alpha(i)u_\alpha(k)\,db_{ik}\,u_{\alpha'}(\ell)u_{\alpha'}(j)\,db_{\ell j} = 2\sum_{i,k}u_\alpha(i)u_{\alpha'}(i)u_\alpha(k)u_{\alpha'}(k)\,dt = 2\delta_{\alpha\alpha'}\,dt.$$
Two pairings contribute to the contractions: $(i,k) = (\ell,j)$ and $(i,k) = (j,\ell)$. In the last step we used the orthonormality $\sum_iu_\alpha(i)u_{\alpha'}(i) = \delta_{\alpha\alpha'}$. Thus $\tilde B_\alpha = \sqrt2B_\alpha$, where $(B_\alpha)_{\alpha=1}^N$ is a standard (real) Brownian motion in $\mathbb R^N$, and this gives the martingale term in (12.4) with $\beta = 1$. A similar formula can be derived for the Hermitian case, where the parameter becomes $\beta = 2$; we omit the details.
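Two ingredients of this derivation can be checked directly on a small random symmetric matrix:
- the first-order perturbation formula (12.9), where $h_{ij} = h_{ji}$ is moved as one real parameter, compared against central finite differences of the eigenvalues;
- the orthonormality identity $\sum_{i,k}u_\alpha(i)u_{\alpha'}(i)u_\alpha(k)u_{\alpha'}(k) = \delta_{\alpha\alpha'}$ behind the covariance of $\tilde B_\alpha$.

The matrix size, the seed, the step `eps`, and the helper `perturb` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 6
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)
lam, U = np.linalg.eigh(H)            # column U[:, alpha] is the eigenvector u_alpha

def perturb(H, i, j, eps):
    """Move the single real parameter h_ij = h_ji by eps (both entries if i != j)."""
    E = np.zeros_like(H)
    E[i, j] = eps
    E[j, i] = eps
    return H + E

eps = 1e-6
errors = []
for (i, j) in [(0, 0), (1, 3), (2, 5)]:
    numeric = (np.linalg.eigvalsh(perturb(H, i, j, eps))
               - np.linalg.eigvalsh(perturb(H, i, j, -eps))) / (2 * eps)
    formula = U[i, :] * U[j, :] * (2 - (i == j))   # (12.9), for all alpha at once
    errors.append(np.abs(numeric - formula).max())

# sum_{i,k} u_a(i) u_b(i) u_a(k) u_b(k) = (sum_i u_a(i) u_b(i))^2 = delta_ab
M = np.einsum("ia,ib,ka,kb->ab", U, U, U, U)
```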
In the next section we put the Dyson Brownian motion in a more general
context that is closer to the interpretation of GUE as an invariant ensemble in the
spirit of Chapter 4. It turns out that the measure on the eigenvalues, explicitly
given in (4.3), generates a DBM in a canonical way such that this measure will
be invariant under the dynamics. This holds for any values 𝛽 ≥ 1, i.e., even if
there is no underlying matrix ensemble.

12.3. Strong Local Ergodicity of the Dyson Brownian Motion


One key property of DBM is that it leaves the Gaussian Wigner ensembles
invariant. Moreover, the Gaussian measure is the only equilibrium of DBM, and
the DBM dynamics converges to this equilibrium from any initial condition. In
this section we will quantify these ideas.
We start by introducing the concept of the Dirichlet form and the generator associated with any probability measure $\mu = \mu_N$ on $\mathbb R^N$. For definiteness, the reader may keep the invariant $\beta$-ensemble (4.3) in mind, i.e.,
$$\mu_N(d\boldsymbol\lambda) = \frac{e^{-\beta N\mathcal H_N(\boldsymbol\lambda)}}{Z}\,d\boldsymbol\lambda, \qquad \mathcal H_N(\boldsymbol\lambda) := \frac12\sum_{i=1}^NV(\lambda_i) - \frac1N\sum_{i<j}\log|\lambda_j - \lambda_i|. \tag{12.13}$$
We will drop the subscript $N$, but most quantities depend on $N$.


We define the Dirichlet form associated with the measure $\mu$ on $\mathbb R^N$, which is just the homogeneous $H^1$-norm, i.e.,
$$D_\mu(f) := \frac{1}{\beta N}\sum_{i=1}^N\int(\partial_if)^2\,d\mu = \frac{1}{\beta N}\|\nabla f\|^2_{L^2(\mu)} \qquad (\partial_i \equiv \partial_{\lambda_i}). \tag{12.14}$$
We remark that in our earlier papers the Dirichlet form was defined with a $1/(2N)$ prefactor instead of $1/(\beta N)$. The current convention is suitable for considering DBM for general $\beta$.

The symmetric operator associated with the Dirichlet form is called the generator and denoted by $\mathcal{L} = \mathcal{L}_\mu$. It satisfies
$$
D_\mu(f) = \langle f, (-\mathcal{L})f\rangle_{L^2(\mu)} = -\int f\mathcal{L}f\,\mathrm{d}\mu.
$$

Notice that we follow the probabilistic convention of defining the generator as a negative operator, $\mathcal{L} \le 0$. Formally, we have $\mathcal{L} = \frac{1}{\beta N}\Delta - (\nabla\mathcal{H})\cdot\nabla$, i.e.,

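$$
\mathcal{L} = \sum_{i=1}^N\frac{1}{\beta N}\partial_i^2 + \sum_{i=1}^N\Big(-\frac12 V'(\lambda_i) + \frac1N\sum_{j\ne i}\frac{1}{\lambda_i - \lambda_j}\Big)\partial_i. \tag{12.15}
$$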

In the special Gaussian case, when the confining potential is $V(\lambda) = \frac12\lambda^2$, the corresponding measure restricted to $\Sigma_N \subset \mathbb{R}^N$ is denoted by $\mu_G$. Notice that for $\beta = 1, 2$ the measure $\mu_G$ coincides with the GOE and GUE measures, respectively (see also (4.12)). The generator (12.15) reads as

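$$
\mathcal{L}_G = \sum_{i=1}^N\frac{1}{\beta N}\partial_i^2 + \sum_{i=1}^N\Big(-\frac12\lambda_i + \frac1N\sum_{j\ne i}\frac{1}{\lambda_i - \lambda_j}\Big)\partial_i. \tag{12.16}
$$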
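As a sanity check of the formal identity $\mathcal{L} = \frac{1}{\beta N}\Delta - (\nabla\mathcal{H})\cdot\nabla$, the sketch below compares the drift coefficient in (12.15) with a central finite-difference approximation of $-\nabla\mathcal{H}_N$ from (12.13). It checks both the Gaussian potential of (12.16) and a quartic potential. The test configuration and the helper names are hypothetical; any configuration of distinct points would do.

```python
import math

def hamiltonian(lam, V):
    """H_N from (12.13): (1/2) sum_i V(lam_i) - (1/N) sum_{i<j} log|lam_j - lam_i|."""
    n = len(lam)
    h = 0.5 * sum(V(x) for x in lam)
    for i in range(n):
        for j in range(i + 1, n):
            h -= math.log(abs(lam[j] - lam[i])) / n
    return h

def drift(lam, dV):
    """Coefficient of d_i in (12.15): -V'(lam_i)/2 + (1/N) sum_{j != i} 1/(lam_i - lam_j)."""
    n = len(lam)
    return [-0.5 * dV(lam[i])
            + sum(1.0 / (lam[i] - lam[j]) for j in range(n) if j != i) / n
            for i in range(n)]

def minus_grad_fd(lam, V, step=1e-6):
    """-grad H_N approximated by central finite differences."""
    out = []
    for i in range(len(lam)):
        up, down = list(lam), list(lam)
        up[i] += step
        down[i] -= step
        out.append(-(hamiltonian(up, V) - hamiltonian(down, V)) / (2.0 * step))
    return out

def max_mismatch(lam, V, dV):
    return max(abs(a - b) for a, b in zip(drift(lam, dV), minus_grad_fd(lam, V)))

def v_gauss(x):   # Gaussian potential V(x) = x^2/2 of (12.16)
    return 0.5 * x * x

def dv_gauss(x):
    return x

lam = [-1.3, -0.4, 0.2, 0.9, 1.7]
print(max_mismatch(lam, v_gauss, dv_gauss))  # of finite-difference size
```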

Now we consider the Dyson Brownian motion (12.4), i.e., the dynamics of the eigenvalues $\boldsymbol\lambda = (\lambda_1,\dots,\lambda_N) \in \Sigma_N$ of the matrix $H_t$ evolving by (12.1). We write the distribution of $\boldsymbol\lambda$ at time $t$ as $f_t(\boldsymbol\lambda)\mu_G(\mathrm{d}\boldsymbol\lambda)$. Comparing the SDE (12.4) with $\mathcal{L}_G$, we notice that Kolmogorov's forward equation for the evolution of the density $f_t$ takes the form
$$
\partial_t f_t = \mathcal{L}_G f_t \qquad (t \ge 0). \tag{12.17}
$$

We remark that the rigorous definition of $\mathcal{L}$ as a self-adjoint operator on $L^2(\mathrm{d}\mu)$ through the Friedrichs extension, and the existence of the corresponding dynamics (12.17) restricted to the simplex $\Sigma_N$ for any $\beta \ge 1$, will be discussed in Section 12.4. In particular, by the spectral theorem the operator $(-\mathcal{L})^{1/2}e^{t\mathcal{L}}$ is bounded for $t > 0$; thus for any initial condition $f_0 \in L^2(\mathrm{d}\mu)$ we have $f_t \in H^1(\mathrm{d}\mu)$ for any $t > 0$. The restriction $\beta \ge 1$ is essential to define the dynamics on the simplex $\Sigma_N$ without specifying additional boundary conditions; the same restriction was also necessary in Theorem 12.2 to define the strong solution to the DBM as a stochastic differential equation. Indeed, if $\beta < 1$, the particles cross each other in the SDE formulation. In this section we thus assume $\beta \ge 1$. Nevertheless, some ideas and results presented here still apply to any $\beta > 0$ after a regularization; we will comment on this in Section 13.7.
By writing the distribution of the eigenvalues as $f_t\mu_G$, our formulation of the problem has already taken into account Dyson's observation that the invariant measure for DBM is $\mu_G$ since, as we will see, as $t \to \infty$ the solution $f_t$ converges
116 12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS

to $f_\infty = 1$. A natural question regarding the DBM is how fast the dynamics reaches equilibrium. Dyson had already posed this question in 1962:
Dyson’s conjecture [45]. The global equilibrium of DBM is reached in time of
order 1 and the local equilibrium (in the bulk) is reached in time of order 1/𝑁.
Dyson further remarked,
The picture of the gas coming into equilibrium in two well-separated stages, with microscopic and macroscopic time scales, is suggested with the help of physical intuition. A rigorous proof that this picture is accurate would require a much deeper mathematical analysis. [45, p. 1197]
We will prove that Dyson's conjecture is correct if the initial data of the DBM are the eigenvalues of a Wigner ensemble, which was Dyson's original interest. In fact, our result is valid for DBM with much more general initial data, which we now survey. Briefly, it will turn out that the global equilibrium is indeed reached within a time of order 1, but local equilibrium is achieved much faster if an a priori estimate on the location of the eigenvalues (also called points) is satisfied. Recalling the definition of the classical locations $\gamma_j$ from (11.31), the a priori bound (originally referred to as “Assumption III” in [63, 64]) is formulated as follows.
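A priori estimate. There exists a $\xi > 0$ such that average rigidity on scale $N^{-1+\xi}$ holds, i.e.,
$$
Q = Q_\xi := \sup_{0\le t\le N}\frac1N\int\sum_{j=1}^N(\lambda_j - \gamma_j)^2 f_t(\boldsymbol\lambda)\,\mu_G(\mathrm{d}\boldsymbol\lambda) \le CN^{-2+2\xi} \tag{12.18}
$$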

with a constant $C$ uniform in $N$. (We assumed (12.18) with the lower bound $t \ge 0$ in the supremum, but in fact the proof shows that $t \ge N^{-1+2\xi}$ would be sufficient.)
Notice that (12.18) requires that the rigidity of the eigenvalues on scale $N^{-1+\xi}$ holds, at least in an average sense. We recall that Theorem 11.5 guarantees that for Wigner eigenvalues rigidity holds on scale $N^{-1+\xi}$ for any $\xi > 0$ (in the bulk), so (12.18) holds for any $\xi > 0$ in this case. However, our theory on the local ergodicity of the DBM applies to other models as well, so we wish to keep our formulation general and allow situations where (12.18) holds only for larger $\xi$.
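To give a sense of the size of the quantity in (12.18), the sketch below computes classical locations and the average $\frac1N\sum_j(\lambda_j - \gamma_j)^2$ for a single GOE sample. It assumes that the classical locations of (11.31) are defined by $\int_{-2}^{\gamma_j}\varrho_{sc}(x)\,\mathrm{d}x = j/N$ with $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$. It also assumes a GOE normalization with off-diagonal variance $1/N$. The function names are hypothetical. This illustrates one sample at one time and says nothing about the supremum over $t$ in (12.18).

```python
import math
import numpy as np

def semicircle_cdf(x):
    """Integral of rho_sc(y) = sqrt(4 - y^2)/(2 pi) over [-2, x]."""
    x = min(max(x, -2.0), 2.0)
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def classical_locations(n, tol=1e-12):
    """gamma_j with semicircle_cdf(gamma_j) = j/n for j = 1..n, found by bisection."""
    gammas = []
    for j in range(1, n + 1):
        lo, hi = -2.0, 2.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid) < j / n:
                lo = mid
            else:
                hi = mid
        gammas.append(0.5 * (lo + hi))
    return np.array(gammas)

def average_rigidity(n=200, seed=1):
    """(1/N) sum_j (lambda_j - gamma_j)^2 for one GOE sample with spectrum near [-2, 2]."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    h = (a + a.T) / np.sqrt(2.0 * n)   # off-diagonal variance 1/N
    lam = np.linalg.eigvalsh(h)        # eigenvalues in increasing order
    return float(np.mean((lam - classical_locations(n)) ** 2))

n = 200
gammas = classical_locations(n)
q = average_rigidity(n)
print(q)
```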
The main result on the local ergodicity of Dyson Brownian motion (12.17)
states that if the a priori estimate (12.18) is satisfied, then the local correlation
functions of the measure 𝑓𝑡 𝜇𝐺 are the same as the corresponding ones for the
Gaussian measure, 𝜇𝐺 = 𝑓∞ 𝜇𝐺 , provided that 𝑡 is larger than 𝑁 −1+2𝜉 . The
𝑛-point correlation functions of the (symmetrized) probability measure 𝜈 are
defined, similarly to (4.20), by

$$
p^{(n)}_{\nu,N}(x_1,\dots,x_n) := \int_{\mathbb{R}^{N-n}}\nu(\mathbf{x})\,\mathrm{d}x_{n+1}\cdots\mathrm{d}x_N, \qquad \mathbf{x} = (x_1,\dots,x_N). \tag{12.19}
$$

In particular, when $\nu = f_t\,\mathrm{d}\mu_G$, we denote the correlation functions by
$$
p^{(n)}_{t,N}(x_1,\dots,x_n) = p^{(n)}_{f_t\mu_G,N}(x_1,\dots,x_n). \tag{12.20}
$$
We also use $p^{(n)}_{G,N}$ for $p^{(n)}_{\mu_G,N}$. In general, if $H$ is an $N\times N$ Wigner matrix, then we will also use $p^{(n)}_{H,N}$ for the correlation functions of the eigenvalues of $H$.
Since one can view the locations of eigenvalues as the coordinates of particles (or points), we have used $\mathbf{x}$ instead of $\boldsymbol\lambda$ in the last equation. From now on, we will use both conventions depending on which viewpoint we wish to emphasize. Notice that the probability distribution of the eigenvalues at time $t$, $f_t\mu_G$, is the same as that of the Gaussian divisible matrix
$$
H_t = e^{-t/2}H_0 + (1 - e^{-t})^{1/2}H^{\mathrm G}, \tag{12.21}
$$
where $H_0$ is the initial Wigner matrix and $H^{\mathrm G}$ is an independent standard GUE (or GOE) matrix. This establishes the universality of the Gaussian divisible ensembles. The precise statement is the following theorem (notice that [64] uses somewhat different notation).
Theorem 12.4 ([64, theorem 2.1]). Suppose that for some exponent $\xi \in (0, \frac12)$, the average rigidity (12.18) holds for the solution $f_t$ of the forward equation (12.17) on scale $N^{-1+\xi}$. Additionally, suppose that in the bulk the rigidity holds on scale $N^{-1+\xi}$ even without averaging; i.e., for any $\kappa > 0$
$$
\sup_{\kappa N\le j\le(1-\kappa)N}|\lambda_j - \gamma_j| \prec N^{-1+\xi} \tag{12.22}
$$
holds for any $t \in [N^{-1+2\xi}, N]$ if $N \ge N_0(\xi,\kappa)$ is large enough. Let $E \in (-2,2)$ and $b = b_N > 0$ such that $[E-b, E+b] \subset (-2,2)$. Then, for any integer $n \ge 1$ and for any compactly supported smooth test function $O: \mathbb{R}^n \to \mathbb{R}$ we have, for any $t \in [N^{-1+2\xi}, N]$,
$$
\Big|\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol\alpha\,O(\boldsymbol\alpha)\big(p^{(n)}_{t,N} - p^{(n)}_{G,N}\big)\Big(E' + \frac{\boldsymbol\alpha}{N}\Big)\Big| \le N^{\varepsilon}\Big[\frac{N^{-1+\xi}}{b} + \frac{1}{\sqrt{bNt}}\Big]\|O\|_{C^1} \tag{12.23}
$$
for any $N$ sufficiently large, $N \ge N_0(n,\xi,\kappa)$.
The upper limit $t \le N$ for the time interval in (12.18) and within the theorem is unimportant; one could have replaced it with $N^{\varepsilon}$ for any $\varepsilon > 0$. For $t \ge N^{\varepsilon}$ the measure $f_t\mu_G$ is already exponentially close to the equilibrium $\mu_G$ in the entropy sense, so all correlation functions are also close. Also, notice that for simplicity of notation we replaced $\boldsymbol\alpha/(N\varrho_{sc}(E))$ by $\boldsymbol\alpha/N$ in the argument of the correlation functions (compare (12.23) with (5.2)). This can be easily achieved by a change of variables and a redefinition of the test function $O$. This convention will be followed in the rest of the book.

In other words, this theorem states that if we have rigidity on scale $N^{-1+\xi}$ for some $\xi \in (0, \frac12)$, then the DBM has average energy universality in the bulk for any time $t \gg N^{-1+2\xi}$ on scale $b \gg \max\{N^{-1+\xi}, (Nt)^{-1}\}$. For generalized Wigner matrices we know that rigidity holds on the smallest possible scale, so we have the following corollary:

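Corollary 12.5. Consider the matrix OU process with initial matrix ensemble $H_0$ being a generalized Wigner ensemble and let $|E| < 2$. Then, for any $\varepsilon > 0$, for any $t \in [N^{-1+2\varepsilon}, N^{\varepsilon}]$, and for any $b \ge (Nt)^{-1}$ with $b < \big||E| - 2\big|$ we have
$$
\Big|\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol\alpha\,O(\boldsymbol\alpha)\big(p^{(n)}_{t,N} - p^{(n)}_{G,N}\big)\Big(E' + \frac{\boldsymbol\alpha}{N}\Big)\Big| \le \frac{N^{3\varepsilon}}{\sqrt{bNt}}\|O\|_{C^1} \tag{12.24}
$$
for any $N \ge N_0(n,\varepsilon)$, where $p^{(n)}_{t,N}$ is the correlation function of $H_t$, or equivalently, that of $e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,H^{\mathrm G}$.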

Proof. Since $H_0$ is a generalized Wigner ensemble, so is $e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,H^{\mathrm G}$ for all $t$. Hence the optimal rigidity estimate (11.32) holds for all times. Therefore, (12.18) and (12.22) hold with any $\xi > 0$. Choose $\xi = \varepsilon/2$ and apply Theorem 12.4. The right-hand side of (12.23) is bounded by
$$
N^{\varepsilon}\Big[\frac{N^{\varepsilon/2}}{Nb} + \frac{1}{\sqrt{bNt}}\Big] \le \frac{N^{3\varepsilon}}{\sqrt{bNt}},
$$
where we have used that $t \le N^{\varepsilon}$ and $bNt \ge 1$. □
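The first step of the proof uses the fact that the Gaussian convolution preserves the variance profile. This can be illustrated numerically. The sketch below forms $e^{-t/2}H_0 + \sqrt{1 - e^{-t}}\,H^{\mathrm G}$ for a symmetric Wigner matrix $H_0$ with i.i.d. $\pm N^{-1/2}$ (Bernoulli) entries. It then checks that the off-diagonal entries still have variance close to $1/N$. The choice of $H_0$, the GOE normalization, and all function names are assumptions made for this illustration.

```python
import numpy as np

def goe(n, rng):
    """GOE matrix normalized so that off-diagonal entries have variance 1/N."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2.0 * n)

def gaussian_divisible(h0, t, rng):
    """e^{-t/2} H_0 + (1 - e^{-t})^{1/2} H^G with an independent GOE matrix H^G."""
    n = h0.shape[0]
    return np.exp(-t / 2.0) * h0 + np.sqrt(-np.expm1(-t)) * goe(n, rng)

def bernoulli_wigner(n, rng):
    """Symmetric matrix with i.i.d. entries +-1/sqrt(N) on and above the diagonal."""
    s = rng.choice([-1.0, 1.0], size=(n, n))
    return (np.triu(s) + np.triu(s, 1).T) / np.sqrt(n)

rng = np.random.default_rng(0)
n = 300
h0 = bernoulli_wigner(n, rng)
ht = gaussian_divisible(h0, 0.5, rng)
iu = np.triu_indices(n, 1)               # strictly upper-triangular entries
print(n * np.var(h0[iu]), n * np.var(ht[iu]))  # both close to 1
```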
The estimate (12.24) means that averaged energy universality holds in a window of size $b$ slightly bigger than $(tN)^{-1}$. A large energy window of size $b = N^{-\delta}$, for some small $\delta > 0$, can already be achieved after a very short time, i.e., for any $t \ge N^{-1+2\delta}$ (by choosing $\varepsilon = \delta/8$). To achieve universality on very short energy scales, $b = N^{-1+\delta}$, we will need relatively large times, $t \ge N^{-\delta/2}$ (by choosing $\varepsilon = \delta/16$). In both cases, the right-hand side of (12.24) is bounded by $CN^{-\varepsilon}$. In the sense of universality of large-energy windows of size $b = N^{-\delta}$, our result essentially establishes Dyson's conjecture that the time to local equilibrium is $N^{-1}$. A better error estimate can also be achieved if both $b$ and $t$ are large; e.g., with the choice $b = N^{-\delta}$ and $t \ge N^{-\delta}$, we get a convergence of order $N^{-1/2+\delta+\varepsilon}$ for any $\varepsilon > 0$.
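The exponent bookkeeping in the preceding paragraph can be checked with exact rational arithmetic. In the sketch below, $b = N^{b_{\exp}}$ and $t = N^{t_{\exp}}$, and `error_exponent` returns the exponent of the right-hand side $N^{3\varepsilon}/\sqrt{bNt}$ of (12.24). `admissible` encodes, at the level of exponents, the conditions $t \in [N^{-1+2\varepsilon}, N^{\varepsilon}]$ and $b \ge (Nt)^{-1}$ of Corollary 12.5. The sample value $\delta = 1/10$, the choice $\varepsilon = 1/100$ in the last case, and the helper names are arbitrary.

```python
from fractions import Fraction

def error_exponent(eps, b_exp, t_exp):
    """Exponent of N in N^{3 eps} / sqrt(b N t) for b = N^{b_exp}, t = N^{t_exp}."""
    return 3 * eps - (b_exp + 1 + t_exp) / 2

def admissible(eps, b_exp, t_exp):
    """Corollary 12.5 on the exponent level: -1 + 2 eps <= t_exp <= eps and b >= 1/(N t)."""
    return -1 + 2 * eps <= t_exp <= eps and b_exp >= -1 - t_exp

delta = Fraction(1, 10)
cases = {                                   # (eps, b_exp, t_exp)
    "large window, short time": (delta / 8, -delta, -1 + 2 * delta),
    "short window, long time": (delta / 16, -1 + delta, -delta / 2),
    "b = t = N^-delta": (Fraction(1, 100), -delta, -delta),
}
for name, (eps, b_exp, t_exp) in cases.items():
    print(name, admissible(eps, b_exp, t_exp), error_exponent(eps, b_exp, t_exp))
```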
Going back to Theorem 12.4, we also remark that there is considerable room in this argument, even if optimal rigidity is not available (this is the case for random band matrices, see [55]). In fact, Theorem 12.4 provides universality, albeit only for relatively large times $t \ge N^{-\delta}$ and with a large-energy averaging, $b = N^{-\delta}$, even if (12.18) and (12.22) hold only with $\xi = (1-\delta)/2$. This restriction comes from the requirement that $t \ge N^{-1+2\xi}$ should be implied by $t \ge N^{-\delta}$. This exponent $\xi$, slightly below $\frac12$, corresponds to a rigidity control on a scale just below $N^{-1/2}$ in the bulk. Clearly, $N^{-1/2}$ is a critical threshold; no local universality can be concluded with this argument unless a rigidity control on a scale below $N^{-1/2}$ is established a priori.

12.4. Existence and Restriction of the Dynamics


This technical section was taken from appendix A of [64]; we include it here
for the reader’s convenience.
As in Section 12.3, we consider the euclidean space $\mathbb{R}^N$ with the normalized measure $\mu = \exp(-\beta N\mathcal{H})/Z$. The Hamiltonian $\mathcal{H}$, of the form (12.13), is symmetric with respect to the permutation of the variables $\mathbf{x} = (x_1,\dots,x_N)$; thus, the measure can be restricted to the subset $\Sigma_N \subset \mathbb{R}^N$ defined in (12.3). In this section we outline how to define the dynamics (12.17) with its generator, formally given by $L = \frac{1}{2N}\Delta - \frac12(\nabla\mathcal{H})\cdot\nabla$, on $\Sigma_N$. The condition $\beta \ge 1$ and the specific factors $\prod_{i<j}|x_j - x_i|^\beta$ will play a key role in the argument; in particular, we will see that $\beta = 1$ is the critical threshold for this method to work.
We first recall the standard definition of the dynamics on $\mathbb{R}^N$. The quadratic form
$$
\mathcal{E}(u,v) := \int_{\mathbb{R}^N}\nabla u\cdot\nabla v\,\mathrm{d}\mu
$$
is a closable Markovian symmetric form on $L^2(\mathbb{R}^N,\mathrm{d}\mu)$ with domain $C_0^\infty(\mathbb{R}^N)$ (see example 1.2.1 and theorem 3.1.3 of [75]). This form can be closed with a form domain $H^1(\mathbb{R}^N,\mathrm{d}\mu)$ defined as the closure of $C_0^\infty$ in the norm $\|\cdot\|_+^2 = \mathcal{E}(\cdot,\cdot) + \|\cdot\|_2^2$. The closure is called the Dirichlet form. It generates a strongly continuous Markovian semigroup $T_t$, $t > 0$, on $L^2$ [75, theorem 1.4.1], which can be extended to a contraction semigroup on $L^1(\mathbb{R}^N,\mathrm{d}\mu)$, $\|T_tf\|_1 \le \|f\|_1$ [75, sec. 1.5]. The generator $L$ of the semigroup is defined via the Friedrichs extension [75, theorem 1.3.1]; $-L$ is a positive self-adjoint operator on its natural domain $D(L)$, with $C_0^\infty$ as a core. The generator is given by $L = \frac{1}{2N}\Delta - \frac12(\nabla\mathcal{H})\cdot\nabla$ on its domain [75, cor. 1.3.1]. By the spectral theorem, $T_t$ maps $L^2$ into $D(L)$; thus with the notation $f_t = T_tf$ for some $f \in L^2$, it holds that
$$
\partial_tf_t = Lf_t, \quad t > 0, \qquad\text{and}\qquad \lim_{t\to0^+}\|f_t - f\|_2 = 0. \tag{12.25}
$$

Moreover, by approximating $f$ by $L^2$ functions and using that $T_t$ is a contraction in $L^1$ (section 1.5 in [75]), the differential equation holds even if the initial condition $f$ is only in $L^1$. In this case, the convergence $f_t \to f$, as $t \to 0^+$, holds only in $L^1$. We remark that $T_t$ is also a contraction on $L^\infty$, by duality.
Now we restrict the dynamics to $\Sigma = \Sigma_N$ equipped with the probability measure $\mu_\Sigma(\mathrm{d}\mathbf{x}) := N!\,\mu(\mathrm{d}\mathbf{x})\cdot\mathbf{1}(\mathbf{x} \in \Sigma)$. Here $\mu = \mu_N$ is from (12.13) and the factor $N!$ restores the normalization after restriction to $\Sigma$. Repeating the general construction with $\mathbb{R}^N$ replaced by $\Sigma_N$, we obtain the corresponding generator $L^{(\Sigma)}$ and the semigroup $T^{(\Sigma)}_t$ on the space $H^1(\Sigma,\mathrm{d}\mu_\Sigma)$, defined as the closure of $C_0^\infty(\Sigma)$ with respect to the norm $\|\cdot\|_+^2 = \mathcal{E}(\cdot,\cdot) + \|\cdot\|_2^2$ on $\Sigma$.

To establish the relation between $L$ and $L^{(\Sigma)}$, we first define the symmetrized version of $\Sigma$,
$$
\tilde\Sigma := \mathbb{R}^N\setminus\{\mathbf{x} : \exists\, i \ne j \text{ with } x_i = x_j\}.
$$
Denote $X := C_0^\infty(\tilde\Sigma)$. The key information is that $X$ is dense in $H^1(\mathbb{R}^N,\mathrm{d}\mu)$, which is equivalent to the density of $X$ in $C_0^\infty(\mathbb{R}^N,\mathrm{d}\mu)$ with respect to the norm $\|\cdot\|_+$. We will check this property below. Then the general argument above directly applies if $\mathbb{R}^N$ is replaced by $\tilde\Sigma$, and it shows that the generator $L$ is the same (with the same domain) if we start from $X$ instead of $C_0^\infty(\mathbb{R}^N,\mathrm{d}\mu)$ as a core.
Note that both $L$ and $L^{(\Sigma)}$ are local operators and $L$ is symmetric with respect to the permutation of the variables. For any function $f$ defined on $\Sigma$, we denote its symmetric extension onto $\tilde\Sigma$ by $\tilde f$. Clearly, $L\tilde f = \widetilde{L^{(\Sigma)}f}$ for any $f \in C_0^\infty(\Sigma)$. Since the generator is uniquely determined by its action on its core, and the generator uniquely determines the dynamics, we see that for any $f \in L^1(\Sigma,\mathrm{d}\mu)$, one can determine $T^{(\Sigma)}_tf$ by computing $T_t\tilde f$ and restricting it to $\Sigma$. In other words, the dynamics (12.17) is well-defined when restricted to $\Sigma = \Sigma_N$.
Finally, we have to prove the density of $X$ in $C_0^\infty(\mathbb{R}^N,\mathrm{d}\mu)$, i.e., to show that if $f \in C_0^\infty(\mathbb{R}^N)$, then a sequence $f_n \in C_0^\infty(\tilde\Sigma)$ exists such that $\mathcal{E}(f - f_n, f - f_n) \to 0$. The structure of $\tilde\Sigma$ is complicated since its complement contains not only the codimension-one coalescence hyperplanes $x_i = x_j$ (and $x_i = 0$ in case of $\Sigma^+$), but also higher-order coalescence subspaces of higher codimension. We will show the approximation argument in a neighborhood of a point $\mathbf{x}$ such that $x_i = x_j$ but $x_i \ne x_k$ for any other $k \ne i, j$. The proof uses the fact that the measure $\mathrm{d}\mu$ vanishes at least to first order, i.e., at least as $|x_i - x_j|$ around $\mathbf{x}$, thanks to $\beta \ge 1$. This is the critical case. The argument near higher-order coalescence points is even easier, since these sets have higher codimension and the measure $\mu$ vanishes to even higher order there.
In a neighborhood of $\mathbf{x}$ we can change to local coordinates such that $r := x_i - x_j$ remains the only relevant coordinate. Thus, the task is equivalent to showing that any $g \in C_0^\infty(\mathbb{R})$ can be approximated by a sequence $g_\varepsilon \in C_0^\infty(\mathbb{R}\setminus\{0\})$ in the sense that
$$
\int|g'(r) - g_\varepsilon'(r)|^2|r|\,\mathrm{d}r \to 0 \tag{12.26}
$$
as $\varepsilon \to 0$ over a discrete set, e.g., $\varepsilon = \varepsilon_n = 1/n$. It is sufficient to consider only the positive semi-axis, i.e., $r > 0$. More precisely, define
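$$
g_\varepsilon(r) := g(r)\phi_\varepsilon(r), \tag{12.27}
$$
where $\phi_\varepsilon$ is a continuous cutoff function defined as $\phi_\varepsilon(r) = 0$ for $r \le \varepsilon^2$, $\phi_\varepsilon(r) = 1$ for $r \ge \varepsilon$, and
$$
\phi_\varepsilon(r) = C_\varepsilon(\log r - 2\log\varepsilon), \quad \varepsilon^2 \le r \le \varepsilon, \qquad C_\varepsilon = \frac{1}{\log\varepsilon - \log\varepsilon^2} = |\log\varepsilon|^{-1}. \tag{12.28}
$$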

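A simple calculation shows that
$$
\int_0^\infty|g'(r) - g_\varepsilon'(r)|^2r\,\mathrm{d}r \le C\int_0^\infty|g'(r)|^2|1 - \phi_\varepsilon(r)|^2r\,\mathrm{d}r + C\int_0^\infty|g(r)|^2|\phi_\varepsilon'(r)|^2r\,\mathrm{d}r. \tag{12.29}
$$
The first term converges to 0 as $\varepsilon \to 0$ by dominated convergence since $g \in H^1(\mathbb{R}_+, r\,\mathrm{d}r)$ and $\phi_\varepsilon \to 1$ pointwise. The second term is bounded by
$$
\int_0^\infty|g(r)|^2|\phi_\varepsilon'(r)|^2r\,\mathrm{d}r \le C_\varepsilon^2\|g\|_\infty^2\int_{\varepsilon^2}^{\varepsilon}\Big|\frac1r\Big|^2r\,\mathrm{d}r = \frac{\|g\|_\infty^2}{|\log\varepsilon|},
$$
which also tends to 0 since $g \in L^\infty$. This shows that $g_\varepsilon$ converges to $g$ in $H^1$.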
The function 𝑔𝜀 is not yet smooth, but we can smooth it on a scale 𝛿 ≪ 𝜀2
by taking the convolution with a smooth function 𝜂𝛿 compactly supported in
[−𝛿, 𝛿]. It is an elementary exercise to see that 𝜂𝛿 ⋆ 𝑔𝜀 converges to 𝑔𝜀 in 𝐻 1
as 𝛿 → 0. This verifies (12.26). The same proof shows that 𝐶0∞ (ℝ+ ⧵ {0}) is
dense in 𝐻 1 (ℝ+ , 𝑟 d𝑟) ∩ 𝐿∞ (ℝ+ , 𝑟 d𝑟). This completes the proof that for 𝛽 ≥ 1
the dynamics is well-defined in the space 𝐻 1 (Σ, d𝜇Σ ).
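The key computation behind the logarithmic cutoff (12.28) is the identity $\int_{\varepsilon^2}^{\varepsilon}|\phi_\varepsilon'(r)|^2r\,\mathrm{d}r = 1/|\log\varepsilon|$. This quantity tends to zero, but only logarithmically. The sketch below evaluates the integral numerically in the variable $s = \log r$, where the integrand becomes the constant $C_\varepsilon^2$, and compares the result with the closed form. The function names are hypothetical.

```python
import math

def log_cutoff(r, eps):
    """phi_eps from (12.28): 0 below eps^2, 1 above eps, logarithmic in between."""
    if r <= eps * eps:
        return 0.0
    if r >= eps:
        return 1.0
    return (math.log(r) - 2.0 * math.log(eps)) / abs(math.log(eps))

def cutoff_energy(eps, steps=10000):
    """int_{eps^2}^{eps} |phi_eps'(r)|^2 r dr, midpoint rule in the variable s = log r."""
    c = 1.0 / abs(math.log(eps))          # C_eps
    s0, s1 = 2.0 * math.log(eps), math.log(eps)
    ds = (s1 - s0) / steps
    total = 0.0
    for k in range(steps):
        r = math.exp(s0 + (k + 0.5) * ds)
        dphi = c / r                      # phi_eps'(r) on (eps^2, eps)
        total += dphi ** 2 * r * (r * ds)  # dr = r ds
    return total

for eps in (1e-2, 1e-4, 1e-8):
    print(eps, cutoff_energy(eps), 1.0 / abs(math.log(eps)))
```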
In fact, the boundedness condition $g \in L^\infty(\mathbb{R}_+, r\,\mathrm{d}r)$ can be removed in the above argument, and one can show that $C_0^\infty(\mathbb{R}_+\setminus\{0\})$ is dense in $H^1(\mathbb{R}_+, r\,\mathrm{d}r)$. This is a type of Meyers-Serrin theorem concerning the equivalence of two possible definitions of Sobolev spaces on the measure space $(\mathbb{R}_+, r\,\mathrm{d}r)$: the closure of $C_0^\infty$ functions coincides with the set of functions with finite $H^1$-norm. Another formulation is the following. If we extend the functions on $\mathbb{R}_+$ to two-dimensional radial functions, i.e., $G(x) := g(|x|)$, $G_\varepsilon(x) := g_\varepsilon(|x|)$ for $x \in \mathbb{R}^2$, then this statement is equivalent to the fact that a point in two dimensions has zero capacity.
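To see this stronger claim, we need a different cutoff function in (12.27). Let us now define
$$
\phi_\varepsilon(r) := \log(a + b\log r), \qquad \varepsilon^2 \le r \le \varepsilon,
$$
with $b := (1 - e)/\log\varepsilon$, $a := 2e - 1$, and $\phi_\varepsilon(r) := 0$ for $r < \varepsilon^2$, $\phi_\varepsilon(r) := 1$ for $r \ge \varepsilon$. This choice makes $a + b\log\varepsilon^2 = 1$ and $a + b\log\varepsilon = e$, so $\phi_\varepsilon$ is continuous. The first term on the right-hand side of (12.29) goes to 0 as before. For the last term in (12.29), we have
$$
\int_0^\infty|g(r)|^2|\phi_\varepsilon'(r)|^2r\,\mathrm{d}r = b^2\int_{\varepsilon^2}^{\varepsilon}\frac{|g(r)|^2}{r^2(a + b\log r)^2}\,r\,\mathrm{d}r.
$$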
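We view this as an integral on $\mathbb{R}^2$ by radially extending the function $g$ to $\mathbb{R}^2$ by $G(y) := g(|y|)$ for any $y \in \mathbb{R}^2$. Then, in polar coordinates, the right-hand side above becomes
$$
b^2\int_{\varepsilon^2}^{\varepsilon}\frac{|g(r)|^2}{r^2(a + b\log r)^2}\,r\,\mathrm{d}r = \frac{b^2}{2\pi}\int_{\varepsilon^2\le|y|\le\varepsilon}\frac{|G(y)|^2}{|y|^2(a + b\log|y|)^2}\,\mathrm{d}y \le C\int_{\varepsilon^2\le|y|\le\varepsilon}\frac{|G(y)|^2}{|y|^2(\log|y|)^2}\,\mathrm{d}y,
$$
where in the last step we used that $a + b\log|y| \ge 1$ and $b \le 2(e-1)/|\log|y||$ on this annulus.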

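Define the sequence $\varepsilon_k := 2^{-2^k}$; clearly $\varepsilon_k^2 = \varepsilon_{k+1}$. Setting
$$
I_k := \int_{\varepsilon_{k+1}\le|y|\le\varepsilon_k}\frac{|G(y)|^2}{|y|^2(\log|y|)^2}\,\mathrm{d}y,
$$
we clearly have
$$
\sum_{k=1}^\infty I_k \le \int_{|y|\le1/2}\frac{|G(y)|^2}{|y|^2(\log|y|)^2}\,\mathrm{d}y \le C\int_{\mathbb{R}^2}|\nabla G|^2 = C'\int_0^\infty|g'(r)|^2r\,\mathrm{d}r,
$$
where in the second inequality we used the critical Hardy inequality with logarithmic correction (see, e.g., theorem 2.8 in [48]). Since in our case $g \in H^1(\mathbb{R}_+, r\,\mathrm{d}r)$, the last displayed inequality shows that the series converges, hence $I_k \to 0$ as $k \to \infty$. Therefore, we can use the cutoff functions $\phi_{\varepsilon_k}$ to prove the claim.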
The Meyers-Serrin theorem we just proved for $(\mathbb{R}_+, r\,\mathrm{d}r)$ can be extended to $(\mathbb{R}^N, \mathrm{d}\mu)$ using local changes of coordinates if $\beta \ge 1$. Thus, one may prove that the form domain of the operator $L^{(\Sigma)}$ is $H^1(\Sigma, \mathrm{d}\mu_\Sigma)$ and the dynamics (12.25) is well-defined on this space.
CHAPTER 13

Entropy and the Logarithmic Sobolev Inequality (LSI)

13.1. Basic Properties of the Entropy


In this section, we prove a few well-known inequalities for the entropy that will be used in the next sections for the logarithmic Sobolev inequality (LSI). We work on an arbitrary measurable space $(\Omega, \mathcal{B})$, where $\mathcal{B}$ is an appropriate $\sigma$-algebra, equipped with probability measures. We will use the probabilistic and the analytic notation and concepts in parallel. In particular, we use the concepts of random variables and measurable functions on $\Omega$ interchangeably. Similarly, both the expectation $\mathbb{E}_\mu[f]$ and the integral notation $\int_\Omega f\,\mathrm{d}\mu$ will be used. We will drop $\Omega$ from the notation.
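Definition 13.1. For any two probability measures $\nu, \mu$ on the same probability space, we define the relative entropy of $\nu$ w.r.t. $\mu$ by
$$
S(\nu|\mu) := \int\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\,\mathrm{d}\nu = \int\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\,\mathrm{d}\mu \tag{13.1}
$$
if $\nu$ is absolutely continuous with respect to $\mu$, and we set $S(\nu|\mu) := \infty$ otherwise.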
Applying the definition (13.1) to 𝜈 = 𝑓𝜇 for any probability density 𝑓 with
∫ 𝑓 d𝜇 = 1, we may also write

𝑆𝜇 (𝑓) ∶= 𝑆(𝑓𝜇|𝜇) = ∫ 𝑓(log 𝑓)d𝜇.

In most cases, the reference measure $\mu$ will be canonical (e.g., the natural equilibrium measure), so we will often drop the subscript $\mu$ when there is no risk of confusion. We may then call $S_\mu(f) = S(f)$ the entropy of $f$. By the convexity of the function $x \mapsto x\log x$ on $\mathbb{R}_+$, Jensen's inequality,
$$
0 = \Big(\int f\,\mathrm{d}\mu\Big)\log\Big(\int f\,\mathrm{d}\mu\Big) \le \int f\log f\,\mathrm{d}\mu,
$$
shows that the relative entropy is always nonnegative, $S(\nu|\mu) \ge 0$.


Proposition 13.2 (Entropy inequality or Gibbs inequality). Let 𝜇, 𝜈 be two
probability measures on a common probability space and let 𝑋 be a random vari-
able. Then, for any positive number 𝛼 > 0, the following inequality holds as long
as the right-hand side is finite:
$$
\mathbb{E}_\nu[X] \le \alpha^{-1}S(\nu|\mu) + \alpha^{-1}\log\mathbb{E}_\mu e^{\alpha X}. \tag{13.2}
$$

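Proof. First note that we only have to prove the case $\alpha = 1$, since we can redefine $\alpha X \to X$. We may also assume that $\nu$ is absolutely continuous with respect to $\mu$, since otherwise $S(\nu|\mu) = \infty$. From the concavity of the logarithm and Jensen's inequality, we have
$$
\int X\,\mathrm{d}\nu - S(\nu|\mu) = \int\log\Big[e^X\frac{\mathrm{d}\mu}{\mathrm{d}\nu}\Big]\mathrm{d}\nu \le \log\mathbb{E}_\nu\Big[e^X\frac{\mathrm{d}\mu}{\mathrm{d}\nu}\Big] \le \log\mathbb{E}_\mu e^X,
$$
and this proves (13.2). □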
Although (13.2) is just an inequality, there is a variational characterization of the relative entropy behind it. Namely,
$$
S(\nu|\mu) = \sup_X\big[\mathbb{E}_\nu[X] - \log\mathbb{E}_\mu e^X\big], \tag{13.3}
$$
where the supremum is over all bounded random variables. As we will not need this relation in this book, we leave its proof to the interested reader.
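On a finite probability space both (13.2) and (13.3) can be checked directly. The sketch below uses two arbitrarily chosen distributions $\mu, \nu$ on four points and a random variable $X$. It verifies (13.2) for a few values of $\alpha$ and shows that the functional in (13.3) reaches $S(\nu|\mu)$ at $X = \log(\mathrm{d}\nu/\mathrm{d}\mu)$. All names and numbers are illustrative.

```python
import math

def rel_entropy(nu, mu):
    """S(nu|mu) on a finite space; points with nu = 0 contribute 0."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)

def expect(weights, values):
    return sum(w * v for w, v in zip(weights, values))

def gibbs_rhs(nu, mu, x, alpha):
    """Right-hand side of (13.2): (S(nu|mu) + log E_mu e^{alpha X}) / alpha."""
    mgf = expect(mu, [math.exp(alpha * xi) for xi in x])
    return (rel_entropy(nu, mu) + math.log(mgf)) / alpha

def variational_value(nu, mu, x):
    """The functional E_nu[X] - log E_mu e^X whose supremum is taken in (13.3)."""
    return expect(nu, x) - math.log(expect(mu, [math.exp(xi) for xi in x]))

mu = [0.4, 0.3, 0.2, 0.1]
nu = [0.1, 0.2, 0.3, 0.4]
x = [1.0, -2.0, 0.5, 3.0]
x_opt = [math.log(p / q) for p, q in zip(nu, mu)]   # X = log(d nu / d mu)

for alpha in (0.1, 1.0, 5.0):
    print(alpha, expect(nu, x), gibbs_rhs(nu, mu, x, alpha))
print(rel_entropy(nu, mu), variational_value(nu, mu, x_opt))
```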
As a corollary of (13.2), we mention that for any set $A$ we have the bound
$$
\mathbb{P}_\nu(A) \le \frac{\log 2 + S(\nu|\mu)}{\log\frac{1}{\mathbb{P}_\mu(A)}}. \tag{13.4}
$$
The proof is left as an exercise. This inequality will be used in the following context. Suppose that the relative entropy of $\nu$ with respect to $\mu$ is finite. Then, in order to show that a set has a small probability w.r.t. $\nu$, it suffices to verify that this set is exponentially small w.r.t. $\mu$, i.e., that $\log(1/\mathbb{P}_\mu(A))$ is much larger than $S(\nu|\mu)$. In this sense, entropy provides only a relatively weak link between two measures. However, it is still stronger than the total variation norm, as we will show now.
For any two probability measures $f\mu$ and $\mu$, the $L^p$-distance between $f\mu$ and $\mu$ is defined by
$$
\Big[\int|f - 1|^p\,\mathrm{d}\mu\Big]^{1/p}. \tag{13.5}
$$
When $p = 1$, it is called the total variation norm between $f\mu$ and $\mu$. Entropy is a weaker measure of distance between two probability measures than the $L^p$-distance for any $p > 1$ but stronger than the total variation norm. The former statement can be expressed, for example, by the elementary inequality
$$
\int f\log f\,\mathrm{d}\mu \le 2\Big[\int|f - 1|^p\,\mathrm{d}\mu\Big]^{1/p} + \frac{2}{p-1}\int|f - 1|^p\,\mathrm{d}\mu, \qquad p > 1, \tag{13.6}
$$
whose proof is left as an exercise. (There is a more natural way to compare the $L^p$-distance and the relative entropy in terms of the $L\log L$ Orlicz norm, but we will not use this norm in this book.) The latter statement will be made precise in the following proposition. Furthermore, we also remark the following easy relation between $L^p$-norms and the entropy:
$$
\frac{\mathrm{d}}{\mathrm{d}p}\Big|_{p=1}\Big[\int f^p\,\mathrm{d}\mu\Big]^{1/p} = \int f\log f\,\mathrm{d}\mu, \tag{13.7}
$$
which holds for any probability density $f$ w.r.t. $\mu$.
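Both (13.6) and (13.7) are easy to test on a finite space. The sketch below takes the uniform measure on four points and an arbitrarily chosen density $f$ with $\int f\,\mathrm{d}\mu = 1$. It checks (13.6) for a few values of $p$ and approximates the derivative in (13.7) by a central difference. The example data and the step size $h$ are assumptions of the illustration.

```python
import math

mu = [0.25, 0.25, 0.25, 0.25]
f = [0.4, 1.6, 0.8, 1.2]          # probability density w.r.t. mu

def integral(g):
    """int g d mu on the four-point space."""
    return sum(m * gi for m, gi in zip(mu, g))

def entropy(dens):
    return integral([d * math.log(d) for d in dens])

def lp_norm(dens, p):
    return integral([d ** p for d in dens]) ** (1.0 / p)

def rhs_13_6(dens, p):
    dev = integral([abs(d - 1.0) ** p for d in dens])
    return 2.0 * dev ** (1.0 / p) + 2.0 / (p - 1.0) * dev

h = 1e-5
derivative = (lp_norm(f, 1.0 + h) - lp_norm(f, 1.0 - h)) / (2.0 * h)
print(entropy(f), derivative)                 # (13.7): these agree
for p in (1.5, 2.0, 4.0):
    print(p, entropy(f), rhs_13_6(f, p))      # (13.6): left <= right
```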



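Proposition 13.3 (Entropy and total variation norm, Pinsker inequality). Suppose that $\int f\,\mathrm{d}\mu = 1$ and $f \ge 0$. Then we have
$$
\frac12\Big[\int|f - 1|\,\mathrm{d}\mu\Big]^2 \le \int f\log f\,\mathrm{d}\mu. \tag{13.8}
$$
Proof. By the variational principle, we first rewrite
$$
\int|f - 1|\,\mathrm{d}\mu = \sup_{|g|\le1}\Big[\int fg\,\mathrm{d}\mu - \int g\,\mathrm{d}\mu\Big]. \tag{13.9}
$$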

For any such function $g$, we have by the entropy inequality (13.2) that for any $t > 0$
$$
\int fg\,\mathrm{d}\mu - \int g\,\mathrm{d}\mu \le t^{-1}\log\int e^{tg}\,\mathrm{d}\mu - \int g\,\mathrm{d}\mu + t^{-1}\int f\log f\,\mathrm{d}\mu. \tag{13.10}
$$

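We now define the function
$$
h(t) := \log\int e^{tg}\,\mathrm{d}\mu
$$
for any $t \ge 0$. A simple calculation shows that the second derivative of $h$ is given by
$$
h''(t) = \langle g; g\rangle_{\omega_t}, \qquad \omega_t := \frac{e^{tg}\mu}{\int e^{tg}\,\mathrm{d}\mu},
$$
where $\omega_t$ is a probability measure, and
$$
\langle f; g\rangle_\omega := \int fg\,\mathrm{d}\omega - \Big(\int f\,\mathrm{d}\omega\Big)\Big(\int g\,\mathrm{d}\omega\Big)
$$
denotes the covariance. Recall that the covariance is positive semidefinite; i.e., it satisfies the usual Schwarz inequality,
$$
|\langle f; g\rangle_\omega|^2 \le \langle f; f\rangle_\omega\langle g; g\rangle_\omega.
$$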

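Since $|g| \le 1$, we have $0 \le \langle g; g\rangle_{\omega_t} \le 1$, so $h''(t) \le 1$. Since $h(0) = 0$ and $h'(0) = \int g\,\mathrm{d}\mu$, Taylor's theorem gives $h(t) \le h(0) + th'(0) + t^2/2$, i.e.,
$$
t^{-1}\log\int e^{tg}\,\mathrm{d}\mu \le \int g\,\mathrm{d}\mu + \frac t2.
$$
Together with (13.10), we have
$$
\int fg\,\mathrm{d}\mu - \int g\,\mathrm{d}\mu \le \frac t2 + t^{-1}\int f\log f\,\mathrm{d}\mu \le \sqrt{2\int f\log f\,\mathrm{d}\mu},
$$
where we optimized $t$ in the last step. Since this bound holds for any $g$ with $|g| \le 1$, using (13.9) we have proved (13.8). □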
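Below is a quick randomized check of the Pinsker inequality (13.8) on finite spaces. It also includes an example showing that the constant $\frac12$ cannot be improved: for two nearby two-point distributions, the ratio of the two sides of (13.8) tends to 1. The sampling scheme and the names below are illustrative assumptions.

```python
import math
import random

def l1_distance(f, mu):
    """int |f - 1| d mu for a density f w.r.t. mu on a finite space."""
    return sum(m * abs(fi - 1.0) for m, fi in zip(mu, f))

def entropy(f, mu):
    return sum(m * fi * math.log(fi) for m, fi in zip(mu, f) if fi > 0)

def random_pair(rng, k):
    """Random reference measure mu and density f = nu/mu with all weights positive."""
    mu = [rng.random() + 1e-3 for _ in range(k)]
    nu = [rng.random() + 1e-3 for _ in range(k)]
    zm, zn = sum(mu), sum(nu)
    mu = [m / zm for m in mu]
    nu = [v / zn for v in nu]
    return [v / m for v, m in zip(nu, mu)], mu

rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    f, mu = random_pair(rng, rng.randint(2, 6))
    worst = max(worst, 0.5 * l1_distance(f, mu) ** 2 / entropy(f, mu))
print(worst)  # at most 1 by (13.8)

d = 1e-3
mu2 = [0.5, 0.5]
f2 = [1.0 + 2 * d, 1.0 - 2 * d]        # nu = (1/2 + d, 1/2 - d)
sharpness = 0.5 * l1_distance(f2, mu2) ** 2 / entropy(f2, mu2)
print(sharpness)  # close to 1
```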

13.2. Entropy on Product Spaces and Conditioning


Now we consider a product measure; i.e., we assume that the probability
space has a product structure, Ω = Ω1 × Ω2 and 𝜇 = 𝜇1 ⊗ 𝜇2 , 𝜈 = 𝜈1 ⊗ 𝜈2 where
𝜇𝑗 and 𝜈𝑗 are probability measures on Ω𝑗 , 𝑗 = 1, 2. For simplicity, we denote the
elements of Ω by (𝑥, 𝑦) with 𝑥 ∈ Ω1 , 𝑦 ∈ Ω2 . Writing 𝜈𝑗 = 𝑓𝑗 𝜇𝑗 , clearly we
have
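$$
\begin{aligned}
S(\nu_1\otimes\nu_2|\mu_1\otimes\mu_2) &= S_{\mu_1\otimes\mu_2}(f_1\otimes f_2) \\
&= \iint_{\Omega_1\times\Omega_2} f_1(x)f_2(y)\big[\log f_1(x) + \log f_2(y)\big]\mu_1(\mathrm{d}x)\mu_2(\mathrm{d}y) \\
&= S_{\mu_1}(f_1) + S_{\mu_2}(f_2) = S(\nu_1|\mu_1) + S(\nu_2|\mu_2);
\end{aligned} \tag{13.11}
$$
i.e., the entropy is additive for product measures.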
This property of the entropy makes it an especially suitable tool to measure closeness in very high-dimensional analysis. For example, if the probability space is a large product space $\Omega = \Omega_1^{\otimes N}$ equipped with two product measures $\mu = \mu_1^{\otimes N}$ and $\nu = f^{\otimes N}\mu$, then their relative entropy
$$
S_\mu(f^{\otimes N}) = NS_{\mu_1}(f) \tag{13.12}
$$
grows only linearly in $N$, the number of degrees of freedom. It is easy to check that any $L^p$-distance (13.5) for $p > 1$ grows exponentially in $N$, which usually renders it useless. Thus the entropy is still stronger than the $L^1$-norm, but its growth in $N$ is much more manageable. An additional advantage is that the entropy is often easier to compute than any other $L^p$-norm (with the exception of $L^2$).
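The contrast between (13.12) and the exponential growth of $L^p$-distances is already visible on a two-point space. The sketch below builds the $N$-fold product of $\mu_1 = (\frac12, \frac12)$ and of the density $f = (\frac32, \frac12)$ by brute-force enumeration. It compares the entropy with $NS_{\mu_1}(f)$ and the squared $L^2$-distance with $(\int f^2\,\mathrm{d}\mu_1)^N - 1 = (5/4)^N - 1$. The example is illustrative only.

```python
import itertools
import math

mu1 = [0.5, 0.5]
f1 = [1.5, 0.5]   # density w.r.t. mu1

def product_stats(n):
    """Entropy and squared L^2 distance of f1^{(x)n} w.r.t. mu1^{(x)n}, by enumeration."""
    entropy = 0.0
    l2_sq = 0.0
    for idx in itertools.product(range(len(mu1)), repeat=n):
        weight = math.prod(mu1[i] for i in idx)
        density = math.prod(f1[i] for i in idx)
        entropy += weight * density * math.log(density)
        l2_sq += weight * (density - 1.0) ** 2
    return entropy, l2_sq

s1 = sum(m * fi * math.log(fi) for m, fi in zip(mu1, f1))   # S_{mu1}(f1)
for n in (1, 5, 10):
    s_n, l2_n = product_stats(n)
    print(n, s_n, n * s1, l2_n, 1.25 ** n - 1.0)
```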
Next we discuss how the entropy decomposes under conditioning. For simplicity, we assume the product structure $\Omega = \Omega_1\times\Omega_2$ as before. Let $\omega$ be a probability measure on the product space $\Omega$. For any integrable function (random variable) $u(x,y)$ on $\Omega$ we denote its conditional expectation by
$$
\hat u = \mathbb{E}_\omega[u|\mathcal{F}_1], \tag{13.13}
$$
where $\mathcal{F}_1$ is the $\sigma$-algebra of $\Omega_1$ canonically lifted to $\Omega$. Note that $\hat u = \hat u(x)$ depends only on the $x$-variable.
The conditional expectation $\hat u$ may also be characterized by the following relation: it is the unique (up to sets of measure zero) measurable function on $\Omega_1$ such that for any bounded, measurable (w.r.t. $\mathcal{F}_1$) function $O(x)$ the identity $\mathbb{E}_\omega[\hat uO] = \mathbb{E}_\omega[uO]$ holds. Written out in coordinates, it means that
$$
\iint_\Omega\hat u(x)O(x)\,\omega(\mathrm{d}x\,\mathrm{d}y) = \iint_\Omega u(x,y)O(x)\,\omega(\mathrm{d}x\,\mathrm{d}y). \tag{13.14}
$$
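In particular, if $\omega(\mathrm{d}x\,\mathrm{d}y) = \omega(x,y)\,\mathrm{d}x\,\mathrm{d}y$ is absolutely continuous with respect to some reference product measure $\mathrm{d}x\,\mathrm{d}y$ on $\Omega$, then we have, somewhat formally,
$$
\hat u(x) = \frac{\int_{\Omega_2}u(x,y)\omega(x,y)\,\mathrm{d}y}{\int_{\Omega_2}\omega(x,y)\,\mathrm{d}y}. \tag{13.15}
$$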

The notation d𝑥 d𝑦 for the reference measure already indicates that in our appli-
cations Ω𝑗 will be euclidean spaces ℝ𝑛𝑗 of some dimension 𝑛𝑗 , and the reference
measure will be the Lebesgue measure.
We remark that the concept of conditioning can be defined in full generality
with respect to any sub-𝜎-algebra; the product structure of Ω is not essential.
However, we will not need the general definition in this book and the above
definition is conceptually simpler.
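The conditional expectation gives rise to a trivial martingale decomposition:
$$
u = \hat u + (u - \hat u), \tag{13.16}
$$
where $\hat u$ is $\mathcal{F}_1$-measurable, while $u - \hat u$ has zero expectation on any $\mathcal{F}_1$-measurable set. Subtracting the expectation $\bar u := \mathbb{E}_\omega u = \mathbb{E}_\omega\hat u$ and squaring this formula, we have the martingale decomposition of the variance of $u$:
$$
\begin{aligned}
\mathrm{Var}_\omega(u) := \mathbb{E}_\omega(u - \bar u)^2 &= \mathbb{E}_\omega(u - \hat u)^2 + \mathbb{E}_\omega(\hat u - \bar u)^2 \\
&= \mathbb{E}_\omega\mathrm{Var}(u(x,\cdot)) + \mathrm{Var}_\omega(\hat u),
\end{aligned} \tag{13.17}
$$
where we defined the conditional variance
$$
\mathrm{Var}(u(x,\cdot)) := \mathbb{E}\big[(u - \hat u)^2\,\big|\,\mathcal{F}_1\big] = \mathbb{E}[u^2|\mathcal{F}_1] - \hat u^2.
$$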
The identity (13.17) is a triviality, but its interpretation is important. It means that the variance is additive w.r.t. the martingale decomposition (13.16). The first term $\mathbb{E}_\omega\mathrm{Var}(u(x,\cdot))$ is the expectation of the variance w.r.t. $y$ conditioned on $x$; the second term $\mathrm{Var}_\omega(\hat u) = \mathbb{E}_\omega(\hat u - \bar u)^2$ is the variance of the marginal w.r.t. $x$. In other words, we can compute the variance one variable at a time.
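The decomposition (13.17) can be verified directly on a finite product space, where the conditional expectation is given by the discrete analogue of (13.15). The joint weights $\omega(x,y)$ and the function $u$ below are arbitrary illustrative choices.

```python
omega = [[0.10, 0.20, 0.05],   # omega(x, y): rows x in Omega_1, columns y in Omega_2
         [0.15, 0.30, 0.20]]
u = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
xs = range(len(omega))

marginal = [sum(row) for row in omega]                              # omega~(x)
u_hat = [sum(w * v for w, v in zip(omega[x], u[x])) / marginal[x]
         for x in xs]                                               # discrete (13.15)
u_bar = sum(marginal[x] * u_hat[x] for x in xs)                     # E_omega u

total_var = sum(omega[x][y] * (u[x][y] - u_bar) ** 2
                for x in xs for y in range(len(omega[x])))
cond_var = [sum(w * (v - u_hat[x]) ** 2 for w, v in zip(omega[x], u[x])) / marginal[x]
            for x in xs]                                            # Var(u(x, .))
mean_cond_var = sum(marginal[x] * cond_var[x] for x in xs)
var_of_u_hat = sum(marginal[x] * (u_hat[x] - u_bar) ** 2 for x in xs)
print(total_var, mean_cond_var + var_of_u_hat)
```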
The martingale decomposition has an analogue for the entropy. For simplicity, we assume that $\omega$ has a density $\omega(x,y)$ w.r.t. a reference measure $\mathrm{d}x\,\mathrm{d}y$. Denote by $\tilde\omega$ the marginal probability density on $\Omega_1$,
$$
\tilde\omega(x) = \int_{\Omega_2}\omega(x,y)\,\mathrm{d}y,
$$
and by $\hat f$ the marginal density of $f\omega$ w.r.t. $\tilde\omega$, i.e.,
$$
\hat f(x) = \frac{\int_{\Omega_2}f(x,y)\omega(x,y)\,\mathrm{d}y}{\tilde\omega(x)}.
$$
In particular, for any test function $O(x)$ we have
$$
\iint_\Omega O(x)f(x,y)\omega(x,y)\,\mathrm{d}x\,\mathrm{d}y = \int_{\Omega_1}O(x)\hat f(x)\tilde\omega(x)\,\mathrm{d}x. \tag{13.18}
$$

Let
$$
\omega_x(y) := \frac{\omega(x,y)}{\tilde\omega(x)}
$$
be the probability density on $\Omega_2$ conditioned on a fixed $x \in \Omega_1$. Define
$$
f_x(y) := \frac{f(x,y)}{\hat f(x)} \tag{13.19}
$$
to be the density, w.r.t. $\omega_x$, of the measure $f\omega$ conditioned on a fixed $x \in \Omega_1$. Note that with these definitions, we have
$$
(f\omega)_x(y) = \frac{f(x,y)\omega(x,y)}{\int_{\Omega_2}f(x,y)\omega(x,y)\,\mathrm{d}y} = f_x(y)\omega_x(y).
$$
Now we are ready to state the following proposition on the additivity of
entropy w.r.t. martingale decomposition:
Proposition 13.4. Using the notation above, we have
$$
S_\omega(f) = \mathbb{E}_{\hat f\tilde\omega}S_{\omega_x}(f_x) + S_{\tilde\omega}(\hat f). \tag{13.20}
$$
In particular, the marginal entropy is bounded by the total entropy, i.e.,
$$
S_{\tilde\omega}(\hat f) \le S_\omega(f). \tag{13.21}
$$
In other words, the entropy is additive w.r.t. the martingale decomposition;
i.e., the entropy is the sum of the expectation of the entropy in 𝑦 conditioned
on 𝑥 and the entropy of the marginal in 𝑥. The additivity of entropy in this
sense is an important tool in the application of the LSI, as we will demonstrate
shortly. It also indicates that entropy is an extensive quantity; i.e., the relative entropy of two probability measures on a product space $\Omega^N$ is often of order $N$, generalizing the formulas (13.11)-(13.12) to nonproduct measures.
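Proof. We can decompose the entropy as follows:
$$
\begin{aligned}
S_\omega(f) = \int_\Omega f\log f\,\mathrm{d}\omega &= \int_\Omega f\log(f/\hat f)\,\mathrm{d}\omega + \int_\Omega f\log\hat f\,\mathrm{d}\omega \\
&= \int_\Omega f\log(f/\hat f)\,\mathrm{d}\omega + \int_{\Omega_1}\hat f\log\hat f\,\mathrm{d}\tilde\omega,
\end{aligned} \tag{13.22}
$$
where in the last step we used (13.18). The first term we can rewrite as
$$
\begin{aligned}
\int f\log(f/\hat f)\,\omega(x,y)\,\mathrm{d}x\,\mathrm{d}y &= \int\mathrm{d}x\,\tilde\omega(x)\hat f(x)\Big[\int f_x(y)\log f_x(y)\,\omega_x(y)\,\mathrm{d}y\Big] \\
&= \mathbb{E}_{\hat f\tilde\omega}S_{\omega_x}(f_x).
\end{aligned} \tag{13.23}
$$
We have thus proved (13.20). Since $S_{\omega_x}(f_x) \ge 0$ for every $x$, (13.21) follows. □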
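Proposition 13.4 can likewise be checked on a finite product space. The sketch below normalizes an arbitrary positive function $f$ to a probability density w.r.t. the joint weights $\omega$. It then forms $\tilde\omega$, $\hat f$, $\omega_x$ and $f_x$ as in (13.18)-(13.19) and compares both sides of (13.20). The data are illustrative.

```python
import math

omega = [[0.10, 0.20, 0.05],
         [0.15, 0.30, 0.20]]
raw = [[2.0, 0.5, 1.0],
       [0.8, 1.2, 1.0]]
z = sum(w * g for row_w, row_g in zip(omega, raw) for w, g in zip(row_w, row_g))
f = [[g / z for g in row] for row in raw]             # now sum of f * omega is 1

xs = range(len(omega))
omega_tilde = [sum(row) for row in omega]             # marginal density on Omega_1
f_hat = [sum(w * v for w, v in zip(omega[x], f[x])) / omega_tilde[x] for x in xs]

def ent(weights, dens):
    """sum of weights * dens * log(dens)."""
    return sum(w * d * math.log(d) for w, d in zip(weights, dens))

total = sum(ent(omega[x], f[x]) for x in xs)           # S_omega(f)
marginal_part = ent(omega_tilde, f_hat)                # S_omega~(f_hat)
conditional_part = sum(                                # E_{f_hat omega~} S_{omega_x}(f_x)
    omega_tilde[x] * f_hat[x]
    * ent([w / omega_tilde[x] for w in omega[x]], [v / f_hat[x] for v in f[x]])
    for x in xs)
print(total, conditional_part + marginal_part, marginal_part)
```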

13.3. Logarithmic Sobolev Inequality


The logarithmic Sobolev inequality requires that the underlying probability space admit a concept of differentiation. While the theory can be developed for more general spaces, we restrict our attention to the probability space $\Omega = \mathbb{R}^N$ in this section. We will work with probability measures $\mu$ on $\mathbb{R}^N$ that are defined by a Hamiltonian $\mathcal{H}$:
$$
\mathrm{d}\mu(\mathbf{x}) = \frac{e^{-\mathcal{H}(\mathbf{x})}}{Z}\,\mathrm{d}\mathbf{x}, \tag{13.24}
$$

where $Z$ is a normalization factor. In Section 13.7 we will comment on how to extend the results of this section to certain subsets of $\mathbb{R}^N$, most importantly, to the simplex $\Sigma_N = \{\mathbf{x} \in \mathbb{R}^N : x_1 < \dots < x_N\}$.
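Note that for simplicity in this presentation we neglect the $\beta N$ prefactor compared with (12.13). Let $\mathcal{L}$ be the generator of the dynamics associated with the Dirichlet form
$$
D(f) = D_\mu(f) = -\int f(\mathcal{L}f)\,\mathrm{d}\mu := \int|\nabla f|^2\,\mathrm{d}\mu = \sum_{j=1}^N\int(\partial_jf)^2\,\mathrm{d}\mu, \qquad \partial_j = \partial_{x_j}. \tag{13.25}
$$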

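Formally, we have $\mathcal{L} = \Delta - (\nabla\mathcal{H})\cdot\nabla$. The operator $\mathcal{L}$ is symmetric with respect to the measure $\mathrm{d}\mu$, i.e.,
$$
\int f(\mathcal{L}g)\,\mathrm{d}\mu = \int(\mathcal{L}f)g\,\mathrm{d}\mu = -\int\nabla f\cdot\nabla g\,\mathrm{d}\mu. \tag{13.26}
$$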

We remark that in many books on probability, e.g., [43], the Dirichlet form (13.25) is defined with a factor $\frac12$, but this convention is not compatible with the $1/(\beta N)$ prefactor in (12.14). The lack of this $\frac12$ factor in (13.25) causes slight deviations from the customary form of the following theorems.
Definition 13.5. The probability measure $\mu$ on $\mathbb{R}^N$ satisfies the logarithmic Sobolev inequality if there exists a constant $\gamma$ such that
$$
S(f) = \int f\log f\,\mathrm{d}\mu \le \gamma\int|\nabla\sqrt f|^2\,\mathrm{d}\mu = \gamma D(\sqrt f) \tag{13.27}
$$
holds for any smooth density function $f \ge 0$ with $\int f\,\mathrm{d}\mu = 1$. The smallest such $\gamma$ is called the logarithmic Sobolev inequality constant of the measure $\mu$.
A simple density argument shows that (13.27) extends from smooth func-
tions to any nonnegative function 𝑓 ∈ 𝐶0∞ (ℝ𝑁 ), the space of smooth functions
with compact support. In fact, it is easy to extend it to all √𝑓 ∈ 𝐻 1 (d𝜇). Since
we will not use the space 𝐻 1 (d𝜇) in this book, we will just use 𝑓 ∈ 𝐶0∞ (ℝ𝑁 ) in
the LSI.
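For the standard Gaussian measure on $\mathbb{R}$ (that is, $\mathcal{H}(x) = x^2/2$ in (13.24)), the inequality (13.27) holds with $\gamma = 2$. The exponential tilts $f(x) = e^{mx - m^2/2}$ give equality, since for them $S(f) = m^2/2$ and $D(\sqrt f) = m^2/4$. The sketch below checks this numerically with a midpoint rule. It also checks the non-extremal density $f(x) = 1 + 0.9\sin x$, where the inequality is strict. The quadrature parameters and the test densities are assumptions of the illustration.

```python
import math

def gauss_expect(g, lo=-12.0, hi=12.0, steps=24000):
    """E[g(X)] for X ~ N(0, 1), midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += g(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

def entropy(f):
    """S(f) = int f log f d mu."""
    return gauss_expect(lambda x: f(x) * math.log(f(x)))

def dirichlet_sqrt(f, df):
    """D(sqrt f) = int |(sqrt f)'|^2 d mu = int f'^2 / (4 f) d mu, as in (13.25)."""
    return gauss_expect(lambda x: df(x) ** 2 / (4.0 * f(x)))

m = 1.0
def tilt(x):                    # exponential tilt, extremal for (13.27)
    return math.exp(m * x - 0.5 * m * m)

def dtilt(x):
    return m * tilt(x)

def wave(x):                    # a non-extremal probability density w.r.t. mu
    return 1.0 + 0.9 * math.sin(x)

def dwave(x):
    return 0.9 * math.cos(x)

print(entropy(tilt), 2.0 * dirichlet_sqrt(tilt, dtilt))  # both approximately m^2/2
print(entropy(wave), 2.0 * dirichlet_sqrt(wave, dwave))  # strict inequality
```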
Theorem 13.6 (Bakry-Émery [13]). Consider a probability measure $\mu$ on $\mathbb{R}^N$ of the form (13.24), i.e., $\mu = e^{-\mathcal{H}}/Z$. Suppose that a convexity bound holds for the Hamiltonian; i.e., with some positive constant $K$ we have
$$
\nabla^2\mathcal{H}(\mathbf{x}) \ge K \tag{13.28}
$$
for any $\mathbf{x}$ (in the sense of quadratic forms). Then the logarithmic Sobolev inequality (13.27) holds with an LSI constant $\gamma \le 2/K$, i.e.,
$$
S(f) \le \frac2K D(\sqrt f) \quad\text{for any density } f \text{ with } \int f\,\mathrm{d}\mu = 1. \tag{13.29}
$$
130 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

Furthermore, the dynamics
$$
\partial_t f_t = \mathcal{L}f_t, \qquad t > 0, \tag{13.30}
$$
relaxes to equilibrium on the time scale $t \asymp 1/K$, both in the sense of entropy and Dirichlet form:
$$
S(f_t) \le e^{-2tK}S(f_0), \qquad D(\sqrt{f_t}) \le \frac{2}{t}e^{-tK}S(f_0). \tag{13.31}
$$
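Proof. Let $f_t$ be the solution to the evolution equation (13.30) with a given smooth initial condition $f_0$. A simple calculation shows that the derivative of the entropy $S(f_t)$ is given by
$$
\begin{aligned}
\partial_t S(f_t) &= \int (\mathcal{L}f_t)\log f_t\,\mathrm{d}\mu + \int f_t\,\frac{\mathcal{L}f_t}{f_t}\,\mathrm{d}\mu \\
&= -\int \frac{(\nabla f_t)^2}{f_t}\,\mathrm{d}\mu = -4D(\sqrt{f_t}),
\end{aligned} \tag{13.32}
$$
where we used that $\int \mathcal{L}f_t\,\mathrm{d}\mu = 0$ by (13.26). Similarly, we can compute the evolution of the Dirichlet form. Let $h_t := \sqrt{f_t}$ for simplicity; then
$$
\partial_t h_t = \frac{1}{2h_t}\partial_t h_t^2 = \frac{1}{2h_t}\mathcal{L}h_t^2 = \mathcal{L}h_t + \frac{(\nabla h_t)^2}{h_t}.
$$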
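We compute (dropping the $t$ subscript for brevity)
$$
\begin{aligned}
\partial_t D(\sqrt{f_t}) &= \partial_t\int (\nabla h)^2\,\mathrm{d}\mu = 2\int \nabla h\cdot\nabla\partial_t h\,\mathrm{d}\mu \\
&= 2\int (\nabla h)\cdot(\nabla\mathcal{L}h)\,\mathrm{d}\mu + 2\int (\nabla h)\cdot\nabla\frac{(\nabla h)^2}{h}\,\mathrm{d}\mu \\
&= 2\int (\nabla h)\cdot[\nabla,\mathcal{L}]h\,\mathrm{d}\mu + 2\int (\nabla h)\cdot\mathcal{L}(\nabla h)\,\mathrm{d}\mu \\
&\qquad + 2\int\sum_{ij}\partial_i h\Big[\frac{2(\partial_j h)\,\partial_i\partial_j h}{h} - \frac{(\partial_j h)^2\,\partial_i h}{h^2}\Big]\mathrm{d}\mu \\
&= -2\int (\nabla h)\cdot(\nabla^2\mathcal{H})\nabla h\,\mathrm{d}\mu - 2\int\sum_{ij}(\partial_i\partial_j h)^2\,\mathrm{d}\mu \\
&\qquad + 2\int\sum_{ij}\Big[\frac{2(\partial_j h)(\partial_i h)\,\partial_{ij}h}{h} - \frac{(\partial_j h)^2(\partial_i h)^2}{h^2}\Big]\mathrm{d}\mu \\
&= -2\int (\nabla h)\cdot(\nabla^2\mathcal{H})\nabla h\,\mathrm{d}\mu - 2\int\sum_{ij}\Big(\partial_{ij}h - \frac{(\partial_i h)(\partial_j h)}{h}\Big)^2\mathrm{d}\mu,
\end{aligned} \tag{13.33}
$$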


where we used the commutator
$$
[\nabla, \mathcal{L}] = -(\nabla^2\mathcal{H})\nabla.
$$
Therefore, under the convexity condition (13.28), we have
$$
\partial_t D(\sqrt{f_t}) \le -2K D(\sqrt{f_t}). \tag{13.34}
$$
Integrating (13.34), we have
$$
D(\sqrt{f_t}) \le e^{-2tK}D(\sqrt{f_0}). \tag{13.35}
$$
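This proves that the equilibrium is achieved at $t = \infty$ with $f_\infty = 1$, where both the entropy and the Dirichlet form are zero. Integrating (13.32) from $t = 0$ to $t = \infty$ and using the exponential decay of $D(\sqrt{f_t})$ from (13.35), we obtain
$$
\begin{aligned}
-S(f_0) &= -4\int_0^\infty D(\sqrt{f_t})\,\mathrm{d}t \\
&\ge -4D(\sqrt{f_0})\int_0^\infty e^{-2tK}\,\mathrm{d}t = -\frac{2}{K}D(\sqrt{f_0}).
\end{aligned} \tag{13.36}
$$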
This proves the LSI (13.29) for any smooth function $f = f_0$. By a standard density argument, it can be extended to any function $f$ with finite Dirichlet form, $D(\sqrt{f}) < \infty$. For functions with infinite Dirichlet form we interpret the LSI (13.29) as a tautology. In particular, (13.29) holds for $f_t$ as well, $S(f_t) \le (2/K)D(\sqrt{f_t})$. Inserting this back into (13.32), we have
$$
\partial_t S(f_t) \le -2K S(f_t).
$$
Integrating this inequality from time 0, we obtain the exponential relaxation of the entropy on time scale $t \asymp 1/K$:
$$
S(f_t) \le e^{-2tK}S(f_0). \tag{13.37}
$$
Finally, we can integrate (13.32) from time $t/2$ to $t$ to get
$$
S(f_t) - S(f_{t/2}) = -4\int_{t/2}^{t} D(\sqrt{f_\tau})\,\mathrm{d}\tau.
$$
Using the positivity of the entropy $S(f_t) \ge 0$ on the left side and the monotonicity of the Dirichlet form (from (13.34)) on the right side, we get
$$
D(\sqrt{f_t}) \le \frac{2}{t}S(f_{t/2}); \tag{13.38}
$$
thus, using (13.37), we obtain exponential relaxation of the Dirichlet form on time scale $t \asymp 1/K$,
$$
D(\sqrt{f_t}) \le \frac{2}{t}e^{-tK}S(f_0). \qquad \square
$$
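For the one-dimensional Gaussian case $\mathcal{H}(x) = Kx^2/2$, the dynamics (13.30) is an Ornstein-Uhlenbeck process, and a Gaussian initial law stays Gaussian with explicit mean and variance, so $S(f_t)$ is a relative entropy between two Gaussians. The Python sketch below uses this (the helper names and the initial data are illustrative) to check the entropy bound in (13.31) at a few times.

```python
import math

def ou_law(m0, s0_sq, K, t):
    """Mean and variance at time t of dX = -K X dt + sqrt(2) dB started from N(m0, s0_sq).

    This SDE has generator d^2/dx^2 - K x d/dx, i.e. L = Delta - H' d/dx with H = K x^2 / 2.
    """
    mean = m0 * math.exp(-K * t)
    var = 1.0 / K + (s0_sq - 1.0 / K) * math.exp(-2.0 * K * t)
    return mean, var

def entropy_vs_equilibrium(mean, var, K):
    """S(f) = int f log f d mu, where f mu = N(mean, var) and mu = N(0, 1/K)."""
    r = var * K
    return 0.5 * (r + K * mean**2 - 1.0 - math.log(r))

K = 2.0
m0, s0_sq = 1.5, 0.2                     # illustrative initial law N(1.5, 0.2)
S0 = entropy_vs_equilibrium(m0, s0_sq, K)

ratios = []
for t in [0.0, 0.1, 0.5, 1.0, 2.0, 3.0]:
    S_t = entropy_vs_equilibrium(*ou_law(m0, s0_sq, K, t), K)
    bound = math.exp(-2 * K * t) * S0    # first bound in (13.31)
    ratios.append(S_t / bound)
    print(f"t={t:3.1f}  S(f_t)={S_t:.4e}  e^(-2tK) S(f_0)={bound:.4e}")
```

Every ratio printed this way stays at or below 1, with equality at $t = 0$.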

Standard example. Let $\mu$ be the centered Gaussian measure with variance $a^2$ on $\mathbb{R}^n$, i.e.,
$$
\mathrm{d}\mu(x) = \frac{1}{(2\pi a^2)^{n/2}}e^{-x^2/2a^2}\,\mathrm{d}x. \tag{13.39}
$$
Written in the form $e^{-\mathcal{H}}/Z$ with the quadratic Hamiltonian $\mathcal{H}(x) = x^2/2a^2$, we see that (13.28) holds with $K = a^{-2}$. Then, (13.29) with the replacement $f \to f^2$ yields that
$$
\int f^2\log f^2\,\mathrm{d}\mu \le 2a^2\int (\nabla f)^2\,\mathrm{d}\mu \tag{13.40}
$$
for any $f$ normalized by $\int f^2\,\mathrm{d}\mu = 1$. The constant of this inequality is optimal.
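A quick numerical look at (13.40) in one dimension: for $f = e^{\lambda x/2}$ (suitably normalized) both sides equal $\lambda^2a^2/2$, which is one way to see that the constant $2a^2$ cannot be improved, while a generic positive $f$ gives strict inequality. The sketch assumes $n = 1$ and illustrative values of $a$ and $\lambda$.

```python
import numpy as np

a = 1.5                                   # standard deviation of the Gaussian mu (illustrative)
x = np.linspace(-20.0, 20.0, 80001)       # 1D grid, i.e. n = 1
dx = x[1] - x[0]
mu = np.exp(-x**2 / (2 * a**2)) / np.sqrt(2 * np.pi * a**2)

def lsi_sides(f, df):
    """Both sides of (13.40) after rescaling f so that int f^2 d mu = 1."""
    norm = np.sqrt(np.sum(f**2 * mu) * dx)
    f, df = f / norm, df / norm
    entropy = np.sum(f**2 * np.log(f**2) * mu) * dx
    dirichlet = 2 * a**2 * np.sum(df**2 * mu) * dx
    return entropy, dirichlet

# f = exp(lam x / 2): both sides equal lam^2 a^2 / 2, so the constant 2a^2 is attained.
lam = 1.0
ent_exp, dir_exp = lsi_sides(np.exp(lam * x / 2), lam / 2 * np.exp(lam * x / 2))

# A generic positive f: strict inequality.
ent_cos, dir_cos = lsi_sides(1 + 0.3 * np.cos(x), -0.3 * np.sin(x))
print(ent_exp, dir_exp, ent_cos, dir_cos)
```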


Alternative formulation of LSI. We remark that there is another formulation of the LSI (see [100]). To make this connection, let
$$
g(x) := \frac{1}{(2\pi a^2)^{n/4}}f(x)\,e^{-x^2/4a^2}, \tag{13.41}
$$
and notice that $\int g^2(x)\,\mathrm{d}x = \int f^2\,\mathrm{d}\mu$, where $\mathrm{d}x$ is the Lebesgue measure and $\mathrm{d}\mu$ is of the form (13.39). Rewriting (13.40) as an inequality for $g$ and making the replacement $2\pi a^2 \to a^2$, we get the following version of the LSI:
$$
\int_{\mathbb{R}^n} g^2\log(g^2/\|g\|^2)\,\mathrm{d}x + n[1 + \log a]\int_{\mathbb{R}^n} g^2\,\mathrm{d}x \le (a^2/\pi)\int_{\mathbb{R}^n}|\nabla g|^2\,\mathrm{d}x, \tag{13.42}
$$
which holds for any $a > 0$ and any function $g$, where $\|g\| = (\int g^2\,\mathrm{d}x)^{1/2}$ is the $L^2$-norm with respect to the Lebesgue measure.
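The version (13.42) can be checked the same way. In the sketch below (one dimension, illustrative $a$), the Gaussian $g(x) = e^{-\pi x^2/(2a^2)}$ makes both sides equal to $a/2$, and a non-Gaussian $g$ gives strict inequality; the function is passed through $\log g$ so that no $\log 0$ is evaluated where $g^2$ underflows.

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 60001)
dx = x[1] - x[0]
n = 1                                      # dimension of the check

def lieb_loss_sides(log_g, dlog_g, a):
    """Left- and right-hand sides of (13.42) in one dimension for g = exp(log_g).

    Working with log g avoids evaluating log(0) where g^2 underflows.
    """
    g2 = np.exp(2 * log_g)
    norm2 = np.sum(g2) * dx                # ||g||^2
    lhs = np.sum(g2 * (2 * log_g - np.log(norm2))) * dx + n * (1 + np.log(a)) * norm2
    rhs = (a**2 / np.pi) * np.sum(g2 * dlog_g**2) * dx   # |g'|^2 = g^2 (log g)'^2
    return lhs, rhs

a = 0.8
# Gaussian g(x) = exp(-pi x^2 / (2 a^2)): both sides equal a/2.
lhs_gauss, rhs_gauss = lieb_loss_sides(-np.pi * x**2 / (2 * a**2), -np.pi * x / a**2, a)
# A non-Gaussian g: the inequality is strict.
lhs_other, rhs_other = lieb_loss_sides(-x**2 / 2 + 0.5 * np.sin(x), -x + 0.5 * np.cos(x), a)
print(lhs_gauss, rhs_gauss, lhs_other, rhs_other)
```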
Proposition 13.7 (LSI implies spectral gap). Let $\mu$ satisfy the LSI (13.27) with an LSI constant $\gamma$. Then, for any $v \in L^2(\mu)$ with $\int v\,\mathrm{d}\mu = 0$, we have
$$
\int v^2\,\mathrm{d}\mu \le \frac{\gamma}{2}\int |\nabla v|^2\,\mathrm{d}\mu = \frac{\gamma}{2}D(v); \tag{13.43}
$$
i.e., $\mu$ has a spectral gap of size at least $2/\gamma$.
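Proof. By definition of the LSI constant, we have
$$
\int u\log u\,\mathrm{d}\mu \le \gamma D(\sqrt{u})
$$
for any $u$ with $\int u\,\mathrm{d}\mu = 1$. For any bounded, smooth function $v$ with $\int v\,\mathrm{d}\mu = 0$, define $u = 1 + \varepsilon v$, where $|\varepsilon|$ is small enough that $u > 0$. Since $\nabla\sqrt{u} = \varepsilon\nabla v/(2\sqrt{u})$, we have
$$
\varepsilon^{-2}\int (1 + \varepsilon v)\log(1 + \varepsilon v)\,\mathrm{d}\mu \le \frac{\gamma}{4}\int\frac{|\nabla v|^2}{1 + \varepsilon v}\,\mathrm{d}\mu. \tag{13.44}
$$
Taking the limit $\varepsilon \to 0$, the left-hand side converges to $\frac12\int v^2\,\mathrm{d}\mu$ (expand $(1+s)\log(1+s) = s + s^2/2 + O(s^3)$ and use $\int v\,\mathrm{d}\mu = 0$), while the right-hand side converges to $\frac{\gamma}{4}\int |\nabla v|^2\,\mathrm{d}\mu$ by dominated convergence. This proves the proposition. $\square$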
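The limit in the proof can be watched numerically. Under the assumption that $\mu$ is the standard Gaussian on $\mathbb{R}$ (so $K = 1$ and $\gamma = 2$ by Theorem 13.6) and $v = \sin x$, the sketch below evaluates the left-hand side of (13.44) for decreasing $\varepsilon$, compares it with $\frac12\int v^2\,\mathrm{d}\mu = (1 - e^{-2})/4$, and checks (13.43).

```python
import math
import numpy as np

x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]
mu = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)    # standard Gaussian: K = 1, so gamma = 2
gamma = 2.0

v, dv = np.sin(x), np.cos(x)                   # bounded, and int v d mu = 0 by symmetry

def expectation(values):
    return np.sum(values * mu) * dx

def lhs_44(eps):
    """Left-hand side of (13.44): eps^-2 int (1 + eps v) log(1 + eps v) d mu."""
    u = 1 + eps * v
    return expectation(u * np.log1p(eps * v)) / eps**2

half_second_moment = 0.5 * expectation(v**2)  # the eps -> 0 limit
for eps in [0.1, 0.01, 0.001]:
    print(eps, lhs_44(eps), half_second_moment)

variance = expectation(v**2)
gap_rhs = gamma / 2 * expectation(dv**2)       # right-hand side of (13.43)
print(variance, gap_rhs)
```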

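Proposition 13.8 (Concentration inequality (Herbst bound)). Suppose that the measure $\mu$ satisfies the LSI with a constant $\gamma$. Let $F$ be a function with $\mathbb{E}^\mu F = 0$. Then, we have
$$
\mathbb{E}^\mu e^{F} \le \exp\Big(\frac{\gamma}{4}\|\nabla F\|_\infty^2\Big), \tag{13.45}
$$
where
$$
\|\nabla F\|_\infty := \sup_{\mathbf{x}}\sqrt{\sum_i |\partial_i F(\mathbf{x})|^2}.
$$
In particular, we have
$$
\mathbb{P}^\mu(|F| \ge \alpha) \le 2\exp\Big(-\frac{\alpha^2}{\gamma\|\nabla F\|_\infty^2}\Big) \tag{13.46}
$$
for any $\alpha > 0$.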
Notice that we get an exponential tail estimate from the LSI. If we only have the spectral gap estimate (13.43), we can only bound the variance of $F$, which yields (by Chebyshev's inequality) a quadratic tail estimate
$$
\mathbb{P}^\mu(|F| \ge \alpha) \le \frac{\gamma\|\nabla F\|_\infty^2}{2\alpha^2}.
$$
In our typical applications, we have $\gamma\|\nabla F\|_\infty^2 \asymp N^{-1}$. We often need to control the concentration of many ($\asymp N^C$) different functions $F$ in parallel. Thus the simple union bound is applicable with the LSI but not with the spectral gap estimate.
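Proof. Denote by
$$
u = u(t) := \frac{\exp(e^t F)}{\mathbb{E}^\mu\exp(e^t F)}.
$$
By differentiation and the LSI, we have
$$
\frac{\mathrm{d}}{\mathrm{d}t}\big[e^{-t}\log\mathbb{E}^\mu\exp(e^t F)\big] = e^{-t}\,\mathbb{E}^\mu u\log u \le e^{-t}\gamma\,\mathbb{E}^\mu|\nabla\sqrt{u}|^2. \tag{13.47}
$$
Clearly, since $\nabla\sqrt{u} = \frac{e^t}{2}(\nabla F)\sqrt{u}$ and $\mathbb{E}^\mu u = 1$,
$$
\mathbb{E}^\mu|\nabla\sqrt{u}|^2 \le \frac{e^{2t}}{4}\|\nabla F\|_\infty^2. \tag{13.48}
$$
Integrating from any $t < 0$ to 0 yields that
$$
\big[\log\mathbb{E}^\mu\exp(F)\big] - \big[e^{-t}\log\mathbb{E}^\mu\exp(e^t F)\big] \le \frac{\gamma}{4}\|\nabla F\|_\infty^2\int_t^0 e^s\,\mathrm{d}s. \tag{13.49}
$$
From the condition $\mathbb{E}^\mu F = 0$, we have
$$
\begin{aligned}
\lim_{t\to-\infty}\big[e^{-t}\log\mathbb{E}^\mu\exp(e^t F)\big] &= \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\log\mathbb{E}^\mu e^{\varepsilon F} \\
&= \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\log\mathbb{E}^\mu\big[1 + \varepsilon F + O(\varepsilon^2 e^{\varepsilon F})\big] = 0
\end{aligned}
$$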

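by dominated convergence. Since $\int_t^0 e^s\,\mathrm{d}s \le 1$, this proves the inequality (13.45). The concentration bound (13.46) follows from an exponential Markov inequality: for any $t > 0$, applying (13.45) to $tF$,
$$
\mathbb{P}^\mu(F \ge \alpha) \le \mathbb{E}^\mu e^{t(F-\alpha)} \le \exp\Big(-\alpha t + \frac{\gamma}{4}t^2\|\nabla F\|_\infty^2\Big) = \exp\Big(-\frac{\alpha^2}{\gamma\|\nabla F\|_\infty^2}\Big),
$$
where we chose the optimal $t = 2\alpha/[\gamma\|\nabla F\|_\infty^2]$ in the last step. Changing $F \to -F$, we obtain the opposite bound, and thus we have proved (13.46). $\square$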
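For the standard Gaussian on $\mathbb{R}$ ($\gamma = 2$), the Herbst bound (13.45) is an equality for linear $F$, since $\mathbb{E}e^{\lambda X} = e^{\lambda^2/2}$. The sketch below checks this, checks strict inequality for the illustrative choice $F = \sin x$, and compares the exact one-sided Gaussian tail with $\exp(-\alpha^2/(\gamma\|\nabla F\|_\infty^2))$ for $F(x) = x$.

```python
import math
import numpy as np

x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]
mu = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)    # standard Gaussian: gamma = 2
gamma = 2.0

def expectation(values):
    return np.sum(values * mu) * dx

# F(x) = lam x has mean zero and ||grad F||_inf = lam; (13.45) holds with equality.
lam = 1.3
mgf_linear = expectation(np.exp(lam * x))
herbst_linear = math.exp(gamma / 4 * lam**2)

# F(x) = sin x has mean zero and ||grad F||_inf = 1; (13.45) is strict.
mgf_sin = expectation(np.exp(np.sin(x)))
herbst_sin = math.exp(gamma / 4)

# One-sided tail of F(x) = x versus exp(-alpha^2 / (gamma ||grad F||_inf^2)).
tails = []
for alpha in [0.5, 1.0, 2.0, 3.0]:
    exact = 0.5 * math.erfc(alpha / math.sqrt(2))    # P(X >= alpha)
    bound = math.exp(-alpha**2 / gamma)
    tails.append((alpha, exact, bound))
print(mgf_linear, herbst_linear, mgf_sin, herbst_sin, tails)
```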
Proposition 13.9 (Stability of the LSI constant). Consider two measures $\mu, \nu$ on $\mathbb{R}^N$ that are related by $\nu = g\mu$ with some bounded function $g$. Let $\gamma_\mu$ denote the LSI constant for $\mu$. Then
$$
\gamma_\nu \le \|g\|_\infty\|g^{-1}\|_\infty\,\gamma_\mu.
$$
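Proof. Take an arbitrary function $f \ge 0$ with $\int f\,\mathrm{d}\nu = 1$. Let $\alpha := \int f\,\mathrm{d}\mu = \int f g^{-1}\,\mathrm{d}\nu \le \|g^{-1}\|_\infty$; by definition of the entropy we have
$$
S_\mu(f/\alpha) = \int (f/\alpha)\log(f/\alpha)\,\mathrm{d}\mu.
$$
The following inequality for any two numbers $a \ge 0$, $b > 0$ can be checked by elementary calculus:
$$
a\log a - b\log b - (1 + \log b)(a - b) \ge 0.
$$
Hence,
$$
\begin{aligned}
&\int \big[f\log f - \alpha\log\alpha - (1 + \log\alpha)(f - \alpha)\big]\,\mathrm{d}\nu \\
&\qquad\le \|g\|_\infty\int \big[f\log f - \alpha\log\alpha - (1 + \log\alpha)(f - \alpha)\big]\,\mathrm{d}\mu = \|g\|_\infty\,\alpha S_\mu(f/\alpha).
\end{aligned}
$$
The left-hand side of the last inequality is equal to
$$
S_\nu(f) - [\log\alpha - (\alpha - 1)] \ge S_\nu(f),
$$
where we have used the concavity of log, i.e., $\log\alpha \le \alpha - 1$. This proves that
$$
S_\nu(f) \le \|g\|_\infty\,\alpha S_\mu(f/\alpha).
$$
Suppose that the LSI holds for $\mu$ with constant $\gamma_\mu$. Then, we have
$$
\begin{aligned}
S_\nu(f) &\le \|g\|_\infty\gamma_\mu\,\alpha\int (\nabla\sqrt{f/\alpha})^2\,\mathrm{d}\mu = \|g\|_\infty\gamma_\mu\int (\nabla\sqrt{f})^2 g^{-1}\,\mathrm{d}\nu \\
&\le \|g\|_\infty\|g^{-1}\|_\infty\gamma_\mu\int (\nabla\sqrt{f})^2\,\mathrm{d}\nu,
\end{aligned}
$$
and this proves the proposition. $\square$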
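The elementary inequality in the proof says that the convex function $a \mapsto a\log a$ lies above its tangent line at $b$. A grid check of it, together with $\log\alpha \le \alpha - 1$, is sketched below; the grid ranges are arbitrary, and $b > 0$ is required for $\log b$ to be finite.

```python
import numpy as np

def tangent_gap(a, b):
    """a log a - b log b - (1 + log b)(a - b) for a >= 0, b > 0 (with 0 log 0 = 0)."""
    a_log_a = np.where(a > 0, a * np.log(np.where(a > 0, a, 1.0)), 0.0)
    return a_log_a - b * np.log(b) - (1 + np.log(b)) * (a - b)

a_vals = np.concatenate(([0.0], np.logspace(-6, 3, 400)))
b_vals = np.logspace(-6, 3, 400)
A, B = np.meshgrid(a_vals, b_vals)
gap = tangent_gap(A, B)

# The correction term in the proof: log(alpha) - (alpha - 1) <= 0 for alpha > 0.
alphas = np.logspace(-4, 4, 1001)
correction = np.log(alphas) - (alphas - 1)
print(gap.min(), correction.max())
```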


Proposition 13.10 (Tensorial property of the LSI). Consider two probability measures $\mu, \nu$ on $\mathbb{R}^n$, $\mathbb{R}^m$, respectively. Suppose that the LSI holds for them with LSI constants $\gamma_\mu$ and $\gamma_\nu$, respectively. Let $\omega = \mu\otimes\nu$ be the product measure on $\mathbb{R}^{m+n}$. Then, the LSI holds for $\omega$ with LSI constant $\gamma_\omega \le \max\{\gamma_\mu, \gamma_\nu\}$.

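Proof. We will use the notation introduced in Section 13.2 with $\Omega_1 = \mathbb{R}^n$, $\Omega_2 = \mathbb{R}^m$. Recall the additivity of entropy (13.20) w.r.t. martingale decomposition. In the current situation, $\omega = \mu\otimes\nu$ and thus $\hat\omega = \mu$ and $\omega_x = \nu$ for any $x$. Furthermore,
$$
\hat f(x) = \hat\omega(x)^{-1}\int f(x,y)\,\omega(x,y)\,\mathrm{d}y = \int f(x,y)\,\nu(y)\,\mathrm{d}y. \tag{13.50}
$$
By the additivity of entropy and the LSI w.r.t. $\mu$ and $\nu$, we have
$$
\begin{aligned}
S_\omega(f) &= \mathbb{E}^{\hat f\mu}S_\nu(f_x) + S_\mu(\hat f) \\
&\le \gamma_\nu\int \hat f(x)\,\mu(x)\,\mathrm{d}x\int \frac{|\nabla_y\sqrt{f(x,y)}|^2}{\hat f(x)}\,\nu(y)\,\mathrm{d}y \\
&\qquad + \gamma_\mu\int \Big|\nabla_x\sqrt{\int f(x,y)\,\nu(y)\,\mathrm{d}y}\Big|^2\mu(x)\,\mathrm{d}x.
\end{aligned} \tag{13.51}
$$
The integral in the first term on the right-hand side is equal to
$$
\iint |\nabla_y\sqrt{f(x,y)}|^2\,\mu(x)\nu(y)\,\mathrm{d}x\,\mathrm{d}y.
$$
The integral of the second term on the right-hand side is bounded by
$$
\int \frac{\big|\int\nabla_x f(x,y)\,\nu(y)\,\mathrm{d}y\big|^2}{4\int f(x,y)\,\nu(y)\,\mathrm{d}y}\,\mu(x)\,\mathrm{d}x \le \iint |\nabla_x\sqrt{f(x,y)}|^2\,\mu(x)\nu(y)\,\mathrm{d}x\,\mathrm{d}y,
$$
where we have written $\nabla_x f = 2(\nabla_x\sqrt{f})\sqrt{f}$ and used the Schwarz inequality. Summarizing, we have proved that
$$
S_\omega(f) \le \gamma_\nu\int |\nabla_y\sqrt{f(x,y)}|^2\,\omega(x,y)\,\mathrm{d}x\,\mathrm{d}y + \gamma_\mu\int |\nabla_x\sqrt{f(x,y)}|^2\,\omega(x,y)\,\mathrm{d}x\,\mathrm{d}y,
$$
and this proves the proposition. $\square$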


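The following lemma is a useful tool to control the entropy flow w.r.t. a nonequilibrium measure.
Lemma 13.11 ([140]). Suppose we have the evolution equation $\partial_t f_t = \mathcal{L}f_t$ with $\mathcal{L}$ defined via the Dirichlet form $\langle f, (-\mathcal{L})f\rangle_\mu = \sum_j\int (\partial_j f)^2\,\mathrm{d}\mu$. Then, for any time-dependent probability density $\psi_t$ w.r.t. $\mu$, we have the entropy flow identity
$$
\partial_t S_\mu(f_t|\psi_t) = -4\sum_j\int (\partial_j\sqrt{g_t})^2\,\psi_t\,\mathrm{d}\mu + \int g_t(\mathcal{L} - \partial_t)\psi_t\,\mathrm{d}\mu,
$$
where $g_t := f_t/\psi_t$ and
$$
S_\mu(f_t|\psi_t) := \int f_t\log\frac{f_t}{\psi_t}\,\mathrm{d}\mu = S(f_t\mu|\psi_t\mu)
$$
is the relative entropy of $f_t\mu$ w.r.t. $\psi_t\mu$.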

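Proof. A simple computation yields that
$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}S_\mu(f_t|\psi_t) &= \int (\mathcal{L}f_t)(\log g_t)\,\mathrm{d}\mu + \int f_t\frac{\mathcal{L}f_t}{f_t}\,\mathrm{d}\mu - \int (\partial_t\psi_t)\,g_t\,\mathrm{d}\mu \\
&= \int (\mathcal{L}f_t)(\log g_t)\,\mathrm{d}\mu - \int (\partial_t\psi_t)\,g_t\,\mathrm{d}\mu \\
&= \int (g_t\psi_t)\,\mathcal{L}(\log g_t)\,\mathrm{d}\mu - \int g_t\,\partial_t\psi_t\,\mathrm{d}\mu \\
&= \int \psi_t\,g_t\Big[\mathcal{L}(\log g_t) - \frac{\mathcal{L}g_t}{g_t}\Big]\mathrm{d}\mu + \int g_t(\mathcal{L} - \partial_t)\psi_t\,\mathrm{d}\mu,
\end{aligned}
$$
where in the last step we used $\int \psi_t\,\mathcal{L}g_t\,\mathrm{d}\mu = \int g_t\,\mathcal{L}\psi_t\,\mathrm{d}\mu$ by (13.26). By definition of $\mathcal{L}$, we have
$$
\mathcal{L}(\log g) - \frac{\mathcal{L}g}{g} = -\sum_j\frac{(\partial_j g)^2}{g^2} = -\frac{4}{g}\sum_j(\partial_j\sqrt{g})^2, \tag{13.52}
$$
and we have proved Lemma 13.11. $\square$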
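Identity (13.52) is pure calculus, since the drift part of $\mathcal{L}$ cancels between $\mathcal{L}(\log g)$ and $(\mathcal{L}g)/g$. The sketch below checks it by finite differences in one dimension for an illustrative Hamiltonian and a positive function $g$ (neither comes from the text).

```python
import numpy as np

# Illustrative 1D choices: H(x) = x^2/2 + x^4/4 and a strictly positive g.
def dH(x):
    return x + x**3

def g(x):
    return 3.0 + np.sin(x) + 0.5 * np.cos(2 * x)

def apply_L(func, x, h=1e-4):
    """L = d^2/dx^2 - H'(x) d/dx, with both derivatives by central differences."""
    d1 = (func(x + h) - func(x - h)) / (2 * h)
    d2 = (func(x + h) - 2 * func(x) + func(x - h)) / h**2
    return d2 - dH(x) * d1

x = np.linspace(-2.0, 2.0, 41)
lhs = apply_L(lambda y: np.log(g(y)), x) - apply_L(g, x) / g(x)   # L(log g) - (L g)/g

h = 1e-6
d_sqrt_g = (np.sqrt(g(x + h)) - np.sqrt(g(x - h))) / (2 * h)
rhs = -4 * d_sqrt_g**2 / g(x)                                   # -(4/g) (d sqrt g)^2
print(np.max(np.abs(lhs - rhs)))
```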


We remark that stochastic processes and their generators can be defined in
more general setups, e.g., the underlying probability spaces may be different
from ℝ𝑁 and the generators may involve discrete jumps. In this case, the sto-
chastic generator ℒ is defined directly, without an a priori Dirichlet form. It
can, however, be proved that −𝑔[ℒ(log 𝑔) − (ℒ𝑔)/𝑔] ≥ 0, which can then be
viewed as a generalization of the “Dirichlet form” associated with a generator.

13.4. Hypercontractivity
We now present an interesting connection between the LSI of a probability
measure 𝜇 and the hypercontractivity properties of the semigroup generated by
ℒ = ℒ𝜇 . Since this result will not be used later in this book, this section can be
skipped.
To state the result, we define the semigroup $\{P_t\}_{t\ge 0}$ by $P_t f := f_t$, where $f_t$ solves the equation $\partial_t f_t = \mathcal{L}f_t$ with initial condition $f_0 = f$.
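Theorem 13.12 (L. Gross [79]). For a measure $\mu$ on $\mathbb{R}^n$ and for any fixed constants $\beta \ge 0$ and $\gamma > 0$, the following two statements are equivalent:
(i) The generalized LSI
$$
\int f\log f\,\mathrm{d}\mu \le \gamma\int |\nabla\sqrt{f}|^2\,\mathrm{d}\mu + \beta \quad\text{for any } f \ge 0 \text{ with } \int f\,\mathrm{d}\mu = 1 \tag{13.53}
$$
holds.
(ii) The hypercontractivity estimate
$$
\|P_t f\|_{L^q(\mu)} \le \exp\Big\{\beta\Big[\frac1p - \frac1q\Big]\Big\}\|f\|_{L^p(\mu)} \tag{13.54}
$$
holds for all exponents satisfying
$$
\frac{q-1}{p-1} \le e^{4t/\gamma}, \qquad 1 < p \le q < \infty.
$$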

Proof. We will only prove (i) ⇒ (ii), i.e., that the generalized LSI implies the decay estimate; the proof of the converse statement can be found in [43]. First we assume that $f \ge 0$, hence $f_t \ge 0$. Direct differentiation yields the identity
$$
\frac{\mathrm{d}}{\mathrm{d}t}\log\|f_t\|_{p(t)} = \frac{\dot p(t)}{p(t)^2}\Big[-\frac{4(p(t)-1)}{\dot p(t)}D(u(t)) + \int u(t)^2\log(u(t)^2)\,\mathrm{d}\mu\Big] \tag{13.55}
$$
with $\dot p(t) = \frac{\mathrm{d}}{\mathrm{d}t}p(t)$, and where we defined
$$
u(t) := f_t^{p(t)/2}\,\|f_t\|_{p(t)}^{-p(t)/2}, \qquad D(u) = \int (\nabla u)^2\,\mathrm{d}\mu.
$$
Now choose $p(t)$ to solve the differential equation
$$
\gamma = \frac{4(p(t)-1)}{\dot p(t)} \quad\text{with } p(0) = p, \quad\text{i.e.,}\quad p(t) = 1 + (p-1)e^{4t/\gamma},
$$
where $\gamma$ is the constant given in the theorem. Using (13.53) for the $L^1(\mu)$-normalized function $u(t)^2$, we have
$$
\frac{\mathrm{d}}{\mathrm{d}t}\log\|f_t\|_{p(t)} \le \frac{\beta\,\dot p(t)}{p(t)^2}.
$$
Integrating both sides from $t = 0$ to some $T$, we have
$$
\log\|f_T\|_{p(T)} - \log\|f\|_p \le \beta\Big[\frac1p - \frac1{p(T)}\Big].
$$
Choosing $T$ such that $p(T) = q$, we have proved (13.54) for $f \ge 0$. The general case follows from separating the positive and negative parts of $f$. $\square$
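The bookkeeping of the exponents in the proof can be sketched as follows: $p(t) = 1 + (p-1)e^{4t/\gamma}$ solves $\gamma = 4(p(t)-1)/\dot p(t)$, and $p(T) = q$ at $T = (\gamma/4)\log((q-1)/(p-1))$. For the Gaussian value $\gamma = 2$ (with $\beta = 0$) this is the condition $e^{2t} \ge (q-1)/(p-1)$, which matches Nelson's classical hypercontractivity time for the Ornstein-Uhlenbeck semigroup. The numbers below are illustrative.

```python
import math

def p_of_t(p, t, gamma):
    """Solution of gamma = 4 (p(t) - 1) / p'(t) with p(0) = p."""
    return 1 + (p - 1) * math.exp(4 * t / gamma)

def time_to_reach(p, q, gamma):
    """Time T with p(T) = q, i.e. (q - 1)/(p - 1) = exp(4 T / gamma)."""
    return gamma / 4 * math.log((q - 1) / (p - 1))

gamma, beta = 2.0, 0.0     # gamma = 2, beta = 0: the Gaussian LSI (13.40) with a = 1
p, q = 2.0, 4.0
T = time_to_reach(p, q, gamma)                 # equals log(3)/2 for these values
prefactor = math.exp(beta * (1 / p - 1 / q))   # constant in (13.54)

# Check the differential equation for p(t) by a centered difference at t0.
t0, h = 0.3, 1e-6
pdot = (p_of_t(p, t0 + h, gamma) - p_of_t(p, t0 - h, gamma)) / (2 * h)
print(T, p_of_t(p, T, gamma), 4 * (p_of_t(p, t0, gamma) - 1) / pdot, prefactor)
```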

Exercise. In this exercise, we show that the idea of the LSI can be useful even if the invariant measure is not a probability measure. The sketch below follows the paper by Carlen-Loss [33], and it works for any parabolic equation of the type
$$
\partial_t f_t = [\nabla\cdot(D(x,t)\nabla) + \mathbf{b}(x,t)\cdot\nabla]f_t
$$
for any divergence-free $\mathbf{b}$ and $D(x,t) \ge c > 0$. For simplicity of notation, we consider only the heat equation on $\mathbb{R}^n$
$$
\partial_t f_t = \Delta f_t. \tag{13.56}
$$

The invariant measure of this flow is the standard Lebesgue measure on $\mathbb{R}^n$.
(i) Check the formula (13.55) in this setup, i.e., show that the following identity holds:
$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}\log\|f_t\|_{p(t)} &= \frac{1}{p(t)}\|f_t\|_{p(t)}^{-p(t)}\frac{\mathrm{d}}{\mathrm{d}t}\int f_t^{p(t)}\,\mathrm{d}x - \frac{\dot p(t)}{p(t)^2}\log\big(\|f_t\|_p^p\big) \\
&= \frac{\dot p(t)}{p(t)^2}\Big[-\frac{4(p(t)-1)}{\dot p(t)}\int (\nabla u(t))^2\,\mathrm{d}x + \int u(t)^2\log(u(t)^2)\,\mathrm{d}x\Big],
\end{aligned} \tag{13.57}
$$
where all norms are w.r.t. the Lebesgue measure and
$$
u(t) = f_t^{p/2}\,\|f_t\|_p^{-p/2}, \qquad p = p(t).
$$
(ii) For any fixed $t$, choose the constant in (13.42) by setting
$$
\frac{a^2}{\pi} = \frac{4(p(t)-1)}{\dot p(t)}.
$$
Thus we have
$$
\frac{\mathrm{d}}{\mathrm{d}t}\log\|f_t\|_{p(t)} \le -\frac{n\,\dot p(t)}{p(t)^2}\Big[1 + \frac12\log\Big(\frac{4\pi(p(t)-1)}{\dot p(t)}\Big)\Big].
$$
For a suitable choice of the function $p(t)$ with $p(0) = 1$, $p(T) = \infty$, show that
$$
\|f_T\|_\infty \le (4\pi T)^{-n/2}\|f_0\|_1. \tag{13.58}
$$
We remark that in the case of $D \equiv 1$ and $\mathbf{b} \equiv 0$, this heat kernel estimate is also a trivial consequence of the explicit formula for the heat kernel $e^{t\Delta}(x,y)$. The point is, as remarked earlier, that this proof works for a general class of second-order parabolic equations.
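One admissible choice in part (ii), offered here as an assumption rather than as the intended solution, is $1/p(t) = 1 - t/T$, for which $\dot p/p^2 = 1/T$ and $(p-1)/\dot p = t(T-t)/T$. Integrating the bound from (ii) then gives $\log\|f_T\|_\infty - \log\|f_0\|_1 \le -\frac{n}{T}\int_0^T[1 + \frac12\log(4\pi t(T-t)/T)]\,\mathrm{d}t$, and the sketch below checks numerically that this equals $-\frac n2\log(4\pi T)$, the exponent in (13.58).

```python
import math

def integrated_exponent(T, n, m=100000):
    """-(n/T) * int_0^T [1 + 0.5 log(4 pi t (T - t) / T)] dt, by the midpoint rule.

    This is the integral of the bound in part (ii) under the choice 1/p(t) = 1 - t/T,
    for which p'/p^2 = 1/T and (p - 1)/p' = t (T - t)/T.
    """
    h = T / m
    total = 0.0
    for k in range(m):
        t = (k + 0.5) * h
        total += 1.0 + 0.5 * math.log(4 * math.pi * t * (T - t) / T)
    return -(n / T) * total * h

checks = []
for T, n in [(0.5, 1), (1.0, 2), (3.0, 3)]:
    numeric = integrated_exponent(T, n)
    target = -(n / 2) * math.log(4 * math.pi * T)    # log of the constant in (13.58)
    checks.append((numeric, target))
    print(T, n, numeric, target)
```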

13.5. Brascamp-Lieb Inequality


The following inequality will not be needed for the main result of this book, so the reader may skip it. Nevertheless, we include it here since it is an important tool in the analysis of probability measures in very high dimensions and was used in universality proofs for the invariant ensembles [25]. We will formulate it on $\mathbb{R}^N$, and in Section 13.7 we comment on how to extend it to certain probability measures on the simplex $\Sigma_N$ defined in (12.3).
Theorem 13.13 (Brascamp-Lieb inequality [29]). Consider a probability measure $\mu$ on $\mathbb{R}^N$ of the form (13.24), i.e., $\mu = e^{-\mathcal{H}}/Z$. Suppose that the Hamiltonian is strictly convex, i.e.,
$$
\nabla^2\mathcal{H}(\mathbf{x}) \ge K > 0 \tag{13.59}
$$

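as a matrix inequality for some positive constant $K$. Then, for any smooth function $f \in L^2(\mu)$, we have
$$
\langle f; f\rangle_\mu \le \langle\nabla f, [\nabla^2\mathcal{H}]^{-1}\nabla f\rangle_\mu. \tag{13.60}
$$
Recall that $\langle f, g\rangle_\mu = \int fg\,\mathrm{d}\mu$ denotes the scalar product and $\langle f; g\rangle_\mu = \langle f, g\rangle_\mu - \langle 1, f\rangle_\mu\langle 1, g\rangle_\mu$ is the covariance. With a slight abuse of notation, we also use the notation $\langle\mathbf{F}, \mathbf{G}\rangle_\mu = \sum_{i=1}^N\langle F_i, G_i\rangle_\mu$ for the scalar product of any two vector-valued functions $\mathbf{F}, \mathbf{G} : \mathbb{R}^N \to \mathbb{R}^N$; this extended scalar product is used on the right-hand side of (13.60).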

Remark. Notice that the Brascamp-Lieb inequality is optimal in the sense that, if $\mu$ is a Gaussian measure, then we can find $f$ so that the inequality becomes an equality. Moreover, if we use the matrix inequality $\nabla^2\mathcal{H} \ge K$, then (13.60) reduces to the spectral gap inequality (13.43) with $\gamma = 2/K$, but of course (13.60) is stronger. The following proof is due to Helffer-Sjöstrand [81] and Naddaf-Spencer [109].

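Proof of Theorem 13.13. Recall that $\mathcal{L}$ is the generator of a reversible dynamics with reversible measure $\mu$, see (13.26). We have for the dynamics (13.30)
$$
\frac{\mathrm{d}}{\mathrm{d}t}\langle f, e^{t\mathcal{L}}g\rangle = \langle f, \mathcal{L}e^{t\mathcal{L}}g\rangle = -\sum_j\langle\partial_{x_j}f, \partial_{x_j}e^{t\mathcal{L}}g\rangle.
$$
Define
$$
\mathbf{G}(t,\mathbf{x}) := (G_1(t,\mathbf{x}), \dots, G_N(t,\mathbf{x})), \qquad G_j(t,\mathbf{x}) := \partial_{x_j}[e^{t\mathcal{L}}g(\mathbf{x})].
$$
Recall the explicit form of $\mathcal{L}$:
$$
\mathcal{L} = \Delta - (\nabla\mathcal{H})\cdot\nabla = \Delta + \sum_k b_k\partial_{x_k}, \qquad b_k := -\partial_{x_k}\mathcal{H}.
$$
Define a new generator $\mathbf{L}$ on any vector-valued function $\mathbf{G} : \mathbb{R}^N \to \mathbb{R}^N$ by
$$
(\mathbf{L}\mathbf{G})_j(\mathbf{x}) := \mathcal{L}G_j(\mathbf{x}) + \sum_k(\partial_{x_j}b_k)G_k(\mathbf{x}).
$$
Clearly, by explicit computation (commuting $\partial_{x_j}$ with $\mathcal{L}$), we have
$$
\partial_t\mathbf{G}(t,\mathbf{x}) = \mathbf{L}\mathbf{G}(t,\mathbf{x}).
$$
From the decay estimate (13.31), it follows that the dynamics are mixing, i.e., $\lim_{t\to\infty}\langle f, e^{t\mathcal{L}}g\rangle = 0$ for any smooth function $f$ with $\int f\,\mathrm{d}\mu = 0$. Thus for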

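any such function $f$ we have
$$
\begin{aligned}
\langle f, g\rangle_\mu &= -\int_0^\infty\frac{\mathrm{d}}{\mathrm{d}t}\langle f, e^{t\mathcal{L}}g\rangle_\mu\,\mathrm{d}t = \sum_j\int_0^\infty\langle\partial_{x_j}f, G_j(t,\mathbf{x})\rangle_\mu\,\mathrm{d}t \\
&= \int_0^\infty\langle\nabla f, e^{t\mathbf{L}}\nabla g\rangle_\mu\,\mathrm{d}t = \langle\nabla f, (-\mathbf{L})^{-1}\nabla g\rangle_\mu.
\end{aligned}
$$
Taking $g = f$ and using the definition of $\mathbf{L}$, we have
$$
-(\mathbf{L})_{i,j} = -\mathcal{L}\delta_{i,j} - (\partial_{x_i}b_j) \ge (\partial_{x_i}\partial_{x_j}\mathcal{H})
$$
as an operator inequality, and we have proved that
$$
\langle f, f\rangle_\mu = \langle\nabla f, (-\mathbf{L})^{-1}\nabla f\rangle_\mu \le \langle\nabla f, [\nabla^2\mathcal{H}]^{-1}\nabla f\rangle_\mu. \qquad\square
$$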
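A one-dimensional numerical check of (13.60), under the illustrative assumption $\mathcal{H}(x) = x^2/2 + x^4/4$ (so $\mathcal{H}'' = 1 + 3x^2 \ge K = 1$): for a few test functions, the sketch compares the variance, the Brascamp-Lieb bound $\int (f')^2/\mathcal{H}''\,\mathrm{d}\mu$, and the weaker spectral gap bound $\int (f')^2\,\mathrm{d}\mu/K$.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 40001)
dx = x[1] - x[0]
H = x**2 / 2 + x**4 / 4             # illustrative Hamiltonian with H'' = 1 + 3x^2 >= K = 1
H2 = 1 + 3 * x**2
mu = np.exp(-H)
mu /= np.sum(mu) * dx

def expectation(values):
    return np.sum(values * mu) * dx

results = {}
for name, f, df in [("x", x, np.ones_like(x)),
                    ("x^3", x**3, 3 * x**2),
                    ("sin x", np.sin(x), np.cos(x))]:
    var = expectation(f**2) - expectation(f)**2
    brascamp_lieb = expectation(df**2 / H2)    # right-hand side of (13.60)
    spectral_gap = expectation(df**2) / 1.0    # (13.43) with gamma = 2/K, K = 1
    results[name] = (var, brascamp_lieb, spectral_gap)
    print(name, var, brascamp_lieb, spectral_gap)
```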

13.6. Remarks on the Applications of the LSI to Random Matrices


The Herbst bound provides strong concentration results for probability mea-
sures that satisfy the LSI. In this section we exploit this connection for random
matrices. For definiteness, we will consider the Gaussian orthogonal ensemble
(GOE), although the arguments below can be extended to more general Wigner
as well as invariant ensembles. There are two ways to view Gaussian matrix
ensembles in the context of the LSI: we can either work on the probability space
of the matrix 𝐻 with Gaussian matrix elements, or we can work directly on the
space of the eigenvalues $\boldsymbol{\lambda}$ by using the explicit formula (4.12) for invariant ensembles. Both measures satisfy the LSI (in Section 13.7 we will comment on how to generalize the LSI from $\mathbb{R}^N$ to the simplex $\Sigma_N$, (12.3), a version of the LSI that we will actually use). We will explore both possibilities, and we start with the matrix point of view.

13.6.1. LSI from the Wigner matrix point of view. Let $\mu = \mu_G$ be the standard Gaussian measure $\mu \sim \exp(-N\operatorname{Tr}H^2)$ on symmetric matrices. Notice that for this measure the family $\{x_{ij} = N^{1/2}h_{ij}, i \le j\}$ is a collection of independent standard Gaussian random variables (up to a factor 2). Hence the LSI holds for every matrix element and, by its tensorial property (Proposition 13.10), the LSI holds for any function of the full matrix considered as a function of this collection $\{x_{ij} = N^{1/2}h_{ij}, i \le j\}$. In particular, using the spectral gap estimate (13.43) for the Gaussian variables $x_{ij}$, we have for any function $F = F(H)$
$$
\langle F(H); F(H)\rangle_\mu \le C\sum_{i\le j}\int |\partial_{x_{ij}}F(H)|^2\,\mathrm{d}\mu = \frac{C}{N}\sum_{i\le j}\int |\partial_{h_{ij}}F(H)|^2\,\mathrm{d}\mu, \tag{13.61}
$$
where the additional $N^{-1}$ factor comes from the scaling $h_{ij} = N^{-1/2}x_{ij}$.

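Suppose that $F(H) = R(\boldsymbol{\lambda}(H))$ is a real function of the eigenvalues $\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_N)$; then by the chain rule we have
$$
\frac{C}{N}\sum_{i\le j}\int |\partial_{h_{ij}}R(\boldsymbol{\lambda}(H))|^2\,\mathrm{d}\mu = \frac{C}{N}\sum_{i\le j}\int\Big|\sum_\alpha\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|^2\mathrm{d}\mu.
$$
Expanding the square and using the perturbation formula (12.9) with real eigenvectors, we can compute
$$
\begin{aligned}
\frac1N\sum_{i\le j}\int\Big|\sum_\alpha\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|^2\mathrm{d}\mu
&\le \frac1N\sum_{i\le j}\frac{2}{2-\delta_{ij}}\int\Big|\sum_\alpha\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|^2\mathrm{d}\mu \\
&= \frac1N\sum_{i\le j}\frac{2}{2-\delta_{ij}}\sum_{\alpha,\beta}\int\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})\,\partial_{\lambda_\beta}R(\boldsymbol{\lambda})\frac{\partial\lambda_\alpha}{\partial h_{ij}}\frac{\partial\lambda_\beta}{\partial h_{ij}}\,\mathrm{d}\mu \\
&= \frac2N\sum_{i\le j}[2-\delta_{ij}]\sum_{\alpha,\beta}\int\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})\,\partial_{\lambda_\beta}R(\boldsymbol{\lambda})\,u_\alpha(i)u_\alpha(j)u_\beta(i)u_\beta(j)\,\mathrm{d}\mu \\
&= \frac2N\sum_{i,j}\sum_{\alpha,\beta}\int\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})\,\partial_{\lambda_\beta}R(\boldsymbol{\lambda})\,u_\alpha(i)u_\alpha(j)u_\beta(i)u_\beta(j)\,\mathrm{d}\mu \\
&= \frac2N\sum_\alpha\int|\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})|^2\,\mathrm{d}\mu,
\end{aligned} \tag{13.62}
$$
where in the third step we used $\partial\lambda_\alpha/\partial h_{ij} = (2-\delta_{ij})u_\alpha(i)u_\alpha(j)$, which follows from (12.9) when $h_{ij}$ and $h_{ji}$ are varied together, and in the last step we used the orthogonality property and the normalization convention of the eigenvectors.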
We remark that the annoying factor $[2-\delta_{ij}]$ can be avoided if we first consider $\lambda_\alpha$ as a function of all $\{x_{ij} : 1 \le i, j \le N\}$ as independent variables. Then the perturbation formula (12.9) becomes
$$
\frac{\partial\lambda_\alpha}{\partial h_{ij}}\Big|_{h_{ij}=h_{ji}} = u_\alpha(i)u_\alpha(j), \tag{13.63}
$$
i.e., the derivative evaluated on the submanifold of Hermitian matrices. In this way, we can keep the summation in (13.61) unrestricted, and up to a constant factor, we will get the same final result as in (13.62).
In summary, we proved
$$
\langle F; F\rangle_\mu = \langle R(\boldsymbol{\lambda}(H)); R(\boldsymbol{\lambda}(H))\rangle_\mu \le \frac{C}{N}\sum_\alpha\int|\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})|^2\,\mathrm{d}\mu. \tag{13.64}
$$
Notice that this argument holds for any generalized Wigner matrix as long as a spectral gap estimate (13.43) holds for the distribution of every rescaled matrix element $N^{1/2}h_{ij}$. Furthermore, it can be generalized to Wigner matrices with Bernoulli random variables, for which there is a spectral gap (and LSI) in discrete form. Similar remarks apply to the following LSI estimates.
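The two linear-algebra facts used above can be checked on a small symmetric matrix: the perturbation formula $\partial\lambda_\alpha/\partial h_{ij} = (2-\delta_{ij})u_\alpha(i)u_\alpha(j)$ when $h_{ij} = h_{ji}$ move together, and the identity behind the last two steps of (13.62). The sketch also checks the gradient bound $\sum_{i\le j}|\partial\lambda_\alpha/\partial x_{ij}|^2 \le 2/N$ used below for (13.69). The matrix size, its scaling, and the random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)          # a symmetric test matrix; the scaling is illustrative
lam, U = np.linalg.eigh(H)              # U[:, alpha] is the eigenvector u_alpha

def dlam_dh(i, j):
    """d lambda_alpha / d h_ij for all alpha, when h_ij = h_ji are varied together (i <= j)."""
    factor = 1.0 if i == j else 2.0     # the factor (2 - delta_ij)
    return factor * U[i, :] * U[j, :]

# Finite-difference check of the perturbation formula for the entry (1, 3).
eps = 1e-6
E = np.zeros((N, N))
E[1, 3] = E[3, 1] = 1.0
fd = (np.linalg.eigvalsh(H + eps * E) - np.linalg.eigvalsh(H - eps * E)) / (2 * eps)
fd_error = np.max(np.abs(fd - dlam_dh(1, 3)))

# The identity behind (13.62), with r standing in for the gradient of R at lambda.
r = rng.standard_normal(N)
left = sum(2.0 / (2 - (i == j)) * (r @ dlam_dh(i, j))**2
           for i in range(N) for j in range(i, N)) / N
right = 2.0 / N * np.sum(r**2)

# Gradient bound: sum_{i<=j} |d lambda_alpha / d x_ij|^2 <= 2/N with x_ij = N^{1/2} h_ij.
grad_sq = sum(dlam_dh(i, j)**2 for i in range(N) for j in range(i, N)) / N
print(fd_error, left, right, grad_sq)
```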

To see how good this estimate is, consider a local linear statistic of the eigenvalues, i.e., the local average of $K$ consecutive eigenvalues with label near $M$, by setting
$$
R(\boldsymbol{\lambda}) := \frac1K\sum_\alpha A\Big(\frac{\alpha - M}{K}\Big)\lambda_\alpha,
$$
where $A$ is a smooth function of compact support and $1 \le M, K \le N$. Computing the right-hand side of (13.64) and combining it with (13.61), we get
$$
\langle F; F\rangle_\mu \le \frac{C}{NK^2}\sum_\alpha A^2\Big(\frac{\alpha - M}{K}\Big) \le \frac{C_A}{NK}, \tag{13.65}
$$
where the last constant $C_A$ depends on the function $A$. For $K = 1$, this inequality estimates the square of the fluctuation of a single eigenvalue (choosing $A$ appropriately). The bound (13.65) is off by a factor $N$, since the true fluctuation is of order almost $1/N$ by rigidity, Theorem 11.5, at least in the bulk, i.e., if $\delta N \le M \le (1-\delta)N$. On the other hand, for $K = N$ the bound (13.65) is much more precise; it shows that the variance of a macroscopic average of the eigenvalues is at most of order $N^{-2}$. This is the correct order of magnitude; in fact, it is known that $\sum_\alpha A(\frac{\alpha}{N})\lambda_\alpha$ converges to a Gaussian random variable (see, e.g., [103, 118] and references therein). Hence, the spectral gap argument yields the optimal (up to a constant) result for any macroscopic average of eigenvalues.
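Another common quantity of interest is the Stieltjes transform of the empirical eigenvalue distribution, i.e.,
$$
F(H) = G(\boldsymbol{\lambda}(H)) \quad\text{with}\quad G(\boldsymbol{\lambda}) = m_N(z) = \frac1N\sum_\alpha\frac{1}{\lambda_\alpha - z}, \qquad z = E + i\eta. \tag{13.66}
$$
In this case, using (13.64), we can estimate
$$
\langle m_N(z); m_N(z)\rangle_\mu \le \frac{C}{N}\sum_\alpha\int|\partial_{\lambda_\alpha}G(\boldsymbol{\lambda})|^2\,\mathrm{d}\mu = \frac{C}{N^3}\sum_\alpha\int\frac{1}{|\lambda_\alpha - z|^4}\,\mathrm{d}\mu \le \frac{C}{N^2\eta^4}, \tag{13.67}
$$
where we used $|\lambda_\alpha - z| \ge \operatorname{Im} z = \eta$.
This shows that the variance of $m_N(z)$ vanishes if $\eta \gg N^{-1/2}$. The last estimate can be improved if we know that the density of the empirical measure is bounded. Roughly speaking, if we know that in the summation over $\alpha$ the number of eigenvalues in an interval of size $\eta$ near $E$ is at most of order $N\eta$, then the last estimate can be improved to
$$
\frac{1}{N^3}\sum_\alpha\int\frac{1}{|\lambda_\alpha - z|^4}\,\mathrm{d}\mu \lesssim \frac{1}{N^3}\sum_{\alpha:\,|\lambda_\alpha - E|\le\eta}\int\frac{1}{|\lambda_\alpha - z|^4}\,\mathrm{d}\mu \le \frac{C}{N^3}(N\eta)\frac{1}{\eta^4} = \frac{C}{N^2\eta^3}. \tag{13.68}
$$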

This shows that the variance of $m_N(z)$ vanishes if $\eta \gg N^{-2/3}$. Since vanishing fluctuations can be used to estimate the density, this argument can actually be made rigorous by a bootstrapping argument [61]. Notice that the scale $\eta \gg N^{-2/3}$ is still far from the resolution demonstrated in the local semicircle law, Theorem 6.7.
Whenever the LSI is available, the variance bounds can be easily lifted to a concentration estimate with exponential tail. We now demonstrate this mechanism for the fluctuation of a single eigenvalue. In other words, we will apply (13.46) with $F(H) = \lambda_\alpha(H) - \mathbb{E}^\mu\lambda_\alpha(H)$ for a fixed $\alpha$. From (12.9), we have
$$
\sum_{i\le j}|\nabla_{x_{ij}}F|^2 \le \frac2N;
$$
thus (13.46) implies
$$
\mathbb{P}^\mu(|\lambda_\alpha - \mathbb{E}^\mu\lambda_\alpha| \ge t) \le \exp(-cNt^2) \tag{13.69}
$$
for any fixed $\alpha$. This inequality has two deficiencies compared with the rigidity bounds in Theorem 11.5. First, the control of $|\lambda_\alpha - \mathbb{E}^\mu\lambda_\alpha|$ is only up to $N^{-1/2}$, which is still far from the optimal $N^{-1}$ (in the bulk). Second, it does not give information on the difference between the expectation $\mathbb{E}^\mu\lambda_\alpha$ and the corresponding classical location $\gamma_\alpha$, defined as the $\alpha$th $N$-quantile of the limiting density (see (11.31)).

13.6.2. LSI from the invariant ensemble point of view. Now we pass to the second point of view, where the basic measure $\mu_G$ is the invariant ensemble on the eigenvalues. One might hope that the situation can be improved since we look directly at the Gaussian eigenvalue ensemble defined in (12.13) with the Gaussian choice $V(\lambda) = \frac12\lambda^2$. Notice that the role of $\mathcal{H}$ in Theorem 13.6 will be played by $N\mathcal{H}_N$ defined in (12.13) (with $\beta = 1$). The Hessian of $\mathcal{H}_N$ is given by (all inner products and norms in the following equation are w.r.t. the standard inner product in $\mathbb{R}^N$)
$$
(\mathbf{v}, \nabla^2\mathcal{H}_N(\mathbf{x})\mathbf{v}) \ge \frac12\|\mathbf{v}\|^2 + \frac1N\sum_{i<j}\frac{(v_i - v_j)^2}{(x_i - x_j)^2} \ge \frac12\|\mathbf{v}\|^2, \qquad \mathbf{v} \in \mathbb{R}^N; \tag{13.70}
$$
thus the convexity bound (13.28) holds with a constant $K = N/2$ for $N\mathcal{H}_N$. Hence, the spectral gap from (13.43) implies that for any function $R(\boldsymbol{\lambda})$ we have
$$
\langle R(\boldsymbol{\lambda}); R(\boldsymbol{\lambda})\rangle_{\mu_G} \le \frac{C}{N}\sum_\alpha\int|\partial_{\lambda_\alpha}R(\boldsymbol{\lambda})|^2\,\mathrm{d}\mu_G. \tag{13.71}
$$
Notice that this bound is of the same form as (13.64). A similar statement holds for the LSI; i.e., we have
$$
\int R\log R\,\mathrm{d}\mu_G \le \frac{C}{N}\sum_\alpha\int|\partial_{\lambda_\alpha}\sqrt{R(\boldsymbol{\lambda})}|^2\,\mathrm{d}\mu_G. \tag{13.72}
$$

We can apply the concentration inequality (13.46) to the function $R(\boldsymbol{\lambda}) = \lambda_\alpha$ to get (13.69). Once again, we get exactly the same result as before, so there is no advantage in using the explicit invariant measure $\mu_G$.
We have seen that in both the matrix and the invariant ensemble setup, even
for Gaussian ensembles, concentration estimates based on spectral gap or LSI do
not yield optimal results for the eigenvalues on short scales. In the matrix setup,
the proof of the optimal rigidity bounds for Wigner matrices, Theorem 11.5,
uses a completely different approach independent of the LSI. In the invariant
ensemble setup, while the convexity bound (13.70) cannot be improved for a
general vector 𝐯, it becomes much stronger if we consider it only for vectors
𝐯 satisfying ∑𝑗 𝑣𝑗 = 0 due to the structure of the Hessian (13.70). In the next
section, we will explore this idea to improve estimates on certain functions of
the eigenvalues.
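The structure of the Hessian in (13.70) is easy to see numerically. In the sketch below (an arbitrary ordered point $\mathbf{x}$ with $N = 8$, and $\mathcal{H}_N(\mathbf{x}) = \sum_i x_i^2/4 - \frac1N\sum_{i<j}\log(x_j - x_i)$ as in Section 13.7), the smallest eigenvalue equals $\frac12$ and is attained along $(1, \dots, 1)$, while on vectors with $\sum_j v_j = 0$ the quadratic form is strictly larger.

```python
import numpy as np

def hessian_HN(x):
    """Hessian of H_N(x) = sum_i x_i^2 / 4 - (1/N) sum_{i<j} log(x_j - x_i) at ordered x."""
    N = len(x)
    hess = 0.5 * np.eye(N)
    for i in range(N):
        for j in range(i + 1, N):
            e = np.zeros(N)
            e[i], e[j] = 1.0, -1.0
            hess += np.outer(e, e) / (N * (x[j] - x[i])**2)
    return hess

N = 8
x = np.sort(2 * np.random.default_rng(1).standard_normal(N))    # an arbitrary ordered point
evals, evecs = np.linalg.eigh(hessian_HN(x))

ones = np.ones(N) / np.sqrt(N)
along_ones = ones @ hessian_HN(x) @ ones      # the bound 1/2 in (13.70) is attained here
print(evals[:3], along_ones)
```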

13.7. Extensions to the Simplex; Regularization of the DBM


In the above application to either spectral gap or LSI, the measure $\mu_G$ was restricted to the subset $\Sigma = \Sigma_N = \{\mathbf{x} \in \mathbb{R}^N : x_1 < \cdots < x_N\}$. It is absolutely continuous with respect to the Lebesgue measure on $\Sigma_N$, but if we write it in the form (13.24), then the Hamiltonian $\mathcal{H}$ has to be infinite outside of $\Sigma_N$; in particular, it is not differentiable, a property that is implicitly assumed in Section 13.3. This issue is closely related to the boundary terms in the integration by parts, especially in (13.33), that arise if one formally extends the proofs to $\Sigma$. Therefore, the results of Theorem 13.6 do not apply directly. The proper procedure involves an approximation and extension of the measure from $\Sigma_N$ to $\mathbb{R}^N$, using the results of Section 13.3 for the regularized measure on $\mathbb{R}^N$ and then removing the regularization. Whether the regularization can be removed for any $\beta > 0$ or only for $\beta \ge 1$ depends on the statement, as we explain now.
For simplicity, we work in the setup of the Gaussian $\beta$-ensemble; i.e., we set
$$
\mathcal{H}(\mathbf{x}) = \sum_{i=1}^N\frac14x_i^2 - \frac1N\sum_{i<j}\log(x_j - x_i)
$$
and define $\mu_\beta(\mathbf{x}) = Z_\mu^{-1}e^{-N\beta\mathcal{H}(\mathbf{x})}$ on the simplex $\Sigma_N$, exactly as in (12.13) with the Gaussian potential $V(x) = \frac12x^2$ and with any parameter $\beta > 0$. As usual, $Z_\mu$ is a normalization constant.
Recall from (12.4) that the DBM on the simplex $\Sigma_N$ is defined via the stochastic differential equation
$$
\mathrm{d}x_i = \frac{\sqrt2}{\sqrt{\beta N}}\,\mathrm{d}B_i + \Big(-\frac12x_i + \frac1N\sum_{j\ne i}\frac{1}{x_i - x_j}\Big)\mathrm{d}t \quad\text{for } i = 1, \dots, N, \tag{13.73}
$$
where $B_1, \dots, B_N$ is a family of independent standard Brownian motions. The unique strong solution exists only if $\beta \ge 1$ (Theorem 12.2), with equilibrium measure $\mu_\beta$. Since DBM does not exist on $\Sigma_N$ unless $\beta \ge 1$, we cannot use DBM either in the SDE form (12.4) or in the PDE form (12.17) when $\beta < 1$ (see a

remark below (12.17)). On the other hand, certain results may be extended to
any 𝛽 > 0 if their final formulations do not involve DBM.
In this section, we present a regularization procedure to show that substantial parts of the main results of Theorem 13.6, i.e., the LSI for any $\beta > 0$ and the exponential decay of the entropy for $\beta \ge 1$, remain valid on the simplex $\Sigma_N$. A similar generalization holds for the Brascamp-Lieb inequality. In the next section, the same regularization will be used to show that the key Dirichlet form inequality (Theorem 14.3) also holds for $\beta > 0$.
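For later applications, we work with a slightly bigger class of measures than just $\mu_\beta$. We consider measures on $\Sigma = \Sigma_N$ of the form
$$
\omega = Z_\omega^{-1}e^{-\beta N\hat{\mathcal{H}}}\mu_\beta,
$$
where $\hat{\mathcal{H}}(\mathbf{x}) = \sum_j U_j(x_j)$ for some convex real-valued functions $U_j$ on $\mathbb{R}$. The total Hamiltonian of $\omega$ is $\mathcal{H}_\omega := \mathcal{H} + \hat{\mathcal{H}}$. Note that $\hat{\mathcal{H}}$ is defined and convex on the entire $\mathbb{R}^N$. The entropy and the Dirichlet form are defined as before:
$$
S_\omega(f) = \int_\Sigma f\log f\,\mathrm{d}\omega, \qquad D_\omega(f) = \frac{1}{\beta N}\int_\Sigma|\nabla f|^2\,\mathrm{d}\omega.
$$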
The corresponding DBM is given by
$$
\mathrm{d}x_i = \frac{\sqrt2}{\sqrt{\beta N}}\,\mathrm{d}B_i + \Big(-\frac12x_i - U_i'(x_i) + \frac1N\sum_{j\ne i}\frac{1}{x_i - x_j}\Big)\mathrm{d}t. \tag{13.74}
$$
We assume a lower bound on the Hessian of the total Hamiltonian,
$$
\mathcal{H}_\omega'' = \mathcal{H}'' + \hat{\mathcal{H}}'' \ge \frac12 + \min_j\min_{x\in\mathbb{R}}U_j''(x) \ge K \tag{13.75}
$$
for some positive constant $K$ on the entire $\mathbb{R}^N$. This bound (13.75) plays the role of (13.28). Let $D_\omega$, $S_\omega$, and $\mathcal{L}_\omega$ denote the Dirichlet form, entropy, and generator corresponding to the measure $\omega$. Now we claim that Theorem 13.6 holds for the measure $\omega$ on $\Sigma_N$ in the following form:
Theorem 13.14. Assume (13.75). Then for $\beta > 0$, the LSI holds, i.e.,
$$
S_\omega(f) \le \frac{2}{K}D_\omega(\sqrt{f}) \tag{13.76}
$$
for any nonnegative normalized density $f$ on $\Sigma_N$, $\int f\,\mathrm{d}\omega = 1$, that satisfies $f \in L^\infty$ and $\nabla\sqrt{f} \in L^\infty$. For $\beta \ge 1$ the requirement that $f$ and $\nabla\sqrt{f}$ are bounded can be removed.
Moreover, the Brascamp-Lieb inequality also holds for any $\beta > 0$; i.e., for any bounded function $f \in L^2(\Sigma_N, \mathrm{d}\omega)$ we have
$$
\langle f; f\rangle_\omega \le \langle\nabla f, [\mathcal{H}_\omega'']^{-1}\nabla f\rangle_\omega. \tag{13.77}
$$
For $\beta \ge 1$ the requirement that $f$ be bounded can be removed.

Furthermore, for any $\beta \ge 1$ the dynamics $\partial_t f_t = \mathcal{L}_\omega f_t$ with initial condition $f_0 = f$ is well-defined on $\Sigma_N$, and it approaches equilibrium in the sense that
$$
S_\omega(f_t) \le e^{-2tK}S_\omega(f). \tag{13.78}
$$

The inequalities above are understood in the usual sense that they are relevant only when the right-hand side is finite. Moreover, by a standard density argument, they extend to the closure of the corresponding spaces. For example, (13.76) holds for any $f$ that can be approximated by a sequence of bounded normalized densities $f_n \in L^\infty$ with $\nabla\sqrt{f_n} \in L^\infty$ such that $D_\omega(\sqrt{f_n} - \sqrt{f}) \to 0$.
Before starting the formal proof, we explain the key idea. For the proofs of (13.76) and (13.77), we extend the measure $\omega$ from $\Sigma$ to the entire $\mathbb{R}^N$ in a continuous way by relaxing the strict ordering $x_i < x_{i+1}$ imposed by $\Sigma$ but heavily penalizing the opposite order. In this way we can use Theorem 13.6 for the regularized measure and avoid the problematic boundary terms in the integration by parts in (13.33). At the end we remove the regularization using the additional boundedness assumptions on $f$ and $\nabla\sqrt{f}$. For $\beta \ge 1$ these additional assumptions are not necessary, since $C_0^\infty(\Sigma)$ functions are dense in $H^1(\Sigma, \mathrm{d}\omega)$. The entropy decay (13.78) will follow from the LSI and the time-integral version of the entropy dissipation (13.32), which can be proven directly on $\Sigma$ if $\beta \ge 1$.
We remark that there is an alternative regularization method that mimics
the proof of Theorem 13.6 directly on Σ for 𝛽 ≥ 1. This is based on introducing
carefully selected cutoff functions in order to approximate 𝑓𝑡 by a function com-
pactly supported on Σ. The compact support renders the boundary terms in the
integration by parts zero, but the cutoff does not commute with the dynamics;
the error has to be tracked carefully. The advantage of this alternative method
is that it also gives the exponential decay of the Dirichlet form, and it is also
the closest in spirit to the strategy of the proof of Theorem 13.6 on ℝ𝑁 . The
disadvantage is that it works only for 𝛽 ≥ 1; in particular, it does not yield the
LSI for 𝛽 ∈ (0, 1). We will not discuss this approach in this book; the interested
reader may find details in appendix B of [64].

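Proof of Theorem 13.14. Define the extension $\mu_\beta^\delta = Z_{\mu,\delta}^{-1}e^{-\beta N\mathcal{H}_\delta}$ of the measure $\mu_\beta$ from $\Sigma_N$ to $\mathbb{R}^N$ for any $\delta > 0$ by replacing the singular logarithm with a $C^2$-function. To that end, we introduce the approximation parameter $\delta > 0$ and define for $\mathbf{x} \in \mathbb{R}^N$
$$
\mathcal{H}_\delta(\mathbf{x}) := \sum_i\frac14x_i^2 - \frac1N\sum_{i<j}\log_\delta(x_j - x_i), \tag{13.79}
$$
where we set
$$
\log_\delta(x) := \mathbf{1}(x \ge \delta)\log x + \mathbf{1}(x < \delta)\Big(\log\delta + \frac{x - \delta}{\delta} - \frac{1}{2\delta^2}(x - \delta)^2\Big), \qquad x \in \mathbb{R}.
$$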

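It is easy to check that $\log_\delta\in C^2(\mathbb R)$ is concave and satisfies

$$
\lim_{\delta\to0}\log_\delta(x) = \begin{cases}\log x & \text{if } x>0,\\ -\infty & \text{if } x\le0.\end{cases}
$$

The convergence is monotone decreasing, i.e.,

$$
\log_\delta(x)\ge\log_{\delta'}(x),\qquad x\in\mathbb R, \tag{13.80}
$$

for any $\delta\ge\delta'$. Furthermore, we have

$$
\partial_x\log_\delta(x) = \begin{cases}\dfrac1x & \text{if } x>\delta,\\[4pt] \dfrac{2\delta-x}{\delta^2} & \text{if } x\le\delta,\end{cases}
$$

and the lower bound

$$
\partial_x^2\log_\delta(x)\ge\begin{cases}-\dfrac1{x^2} & \text{if } x>\delta,\\[4pt] -\dfrac1{\delta^2} & \text{if } x\le\delta;\end{cases}
$$

in particular, $\mathcal H_\delta$ is convex with $\mathcal H_\delta''\ge\frac12$. Similarly, we can extend the measure 𝜔 to ℝ𝑁 by setting

$$
\omega_\delta := Z_{\omega,\delta}^{-1}e^{-\beta N\widehat{\mathcal H}}\mu_\beta^\delta, \tag{13.81}
$$

and its Hamiltonian still satisfies (13.75). By the monotonicity (13.80), $e^{-\beta N\mathcal H_\delta(\mathbf x)}$ decreases pointwise and monotonically as 𝛿 ↓ 0, and clearly $Z_{\mu,\delta}$ and $Z_{\omega,\delta}$ converge to 1 as 𝛿 → 0. Consequently, 𝜔𝛿 → 𝜔 as 𝛿 → 0 in the sense that for any set 𝐴 ⊂ ℝ𝑁 we have $\omega_\delta(A)\to\omega(A\cap\Sigma_N)$. We remark that the regularized DBM corresponding to the measure 𝜔𝛿 is given by

$$
\mathrm dx_i = \frac{\sqrt2}{\sqrt{\beta N}}\,\mathrm dB_i+\Big(-\frac12x_i-U_i'(x_i)+\frac1N\sum_{j<i}\log_\delta'(x_i-x_j)-\frac1N\sum_{j>i}\log_\delta'(x_j-x_i)\Big)\mathrm dt. \tag{13.82}
$$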
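The properties of $\log_\delta$ listed above (the $C^2$ matching at 𝑥 = 𝛿, concavity, the formula for the first derivative, and the monotonicity (13.80)) are easy to spot-check numerically. The following is a minimal Python sketch, not taken from the text; the function names `log_delta`, `dlog_delta`, `d2log_delta` and `check_properties`, the grid, and the tolerances are our own illustrative choices.

```python
import math


def log_delta(x: float, delta: float) -> float:
    """Regularized logarithm: log x for x >= delta, otherwise the
    second-order Taylor polynomial of log around delta."""
    if x >= delta:
        return math.log(x)
    h = x - delta
    return math.log(delta) + h / delta - h * h / (2.0 * delta * delta)


def dlog_delta(x: float, delta: float) -> float:
    """First derivative: 1/x for x > delta, (2*delta - x)/delta^2 otherwise."""
    return 1.0 / x if x > delta else (2.0 * delta - x) / delta**2


def d2log_delta(x: float, delta: float) -> float:
    """Second derivative: -1/x^2 for x > delta, -1/delta^2 otherwise."""
    return -1.0 / x**2 if x > delta else -1.0 / delta**2


def check_properties(delta: float, delta_small: float, grid: list) -> bool:
    """Check concavity, the derivative formula (by central differences)
    and the monotonicity log_delta >= log_{delta_small} on a grid of points."""
    step = 1e-6
    for x in grid:
        if d2log_delta(x, delta) >= 0.0:  # concavity
            return False
        fd = (log_delta(x + step, delta) - log_delta(x - step, delta)) / (2 * step)
        if abs(fd - dlog_delta(x, delta)) > 1e-4 * max(1.0, abs(fd)):
            return False
        if log_delta(x, delta) < log_delta(x, delta_small) - 1e-12:  # (13.80)
            return False
    return True


grid = [-2.0 + 0.05 * k for k in range(101)]  # points in [-2, 3]
print(check_properties(0.5, 0.1, grid))
```

At 𝑥 ≤ 0 the value is $\log\delta+(x-\delta)/\delta-(x-\delta)^2/(2\delta^2)$, e.g. $\log\delta-\frac32$ at 𝑥 = 0, which indeed tends to −∞ as 𝛿 → 0.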

Given a function 𝑓 on Σ𝑁 such that $\int_{\Sigma_N}f\,\mathrm d\omega = 1$ and $D_\omega(\sqrt f) = \int_{\Sigma_N}|\nabla\sqrt f|^2\,\mathrm d\omega<\infty$, we extend it by symmetry to ℝ𝑁. To do so, we first define the symmetrized version of Σ𝑁, i.e.,

$$
\widetilde\Sigma_N := \mathbb R^N\setminus\{\mathbf x:\exists\,i\ne j,\ x_i = x_j\}, \tag{13.83}
$$

which has the disjoint union structure

$$
\widetilde\Sigma_N = \bigsqcup_{\pi\in S_N}\pi(\Sigma_N),
$$

where 𝜋 runs through all 𝑁-element permutations and acts by permuting the coordinates of any point 𝐱 ∈ ℝ𝑁. For any $\mathbf x\in\widetilde\Sigma_N$ there is a unique 𝜋 such that $\mathbf x\in\pi(\Sigma_N)$, and we then define the extension $\tilde f$ by $\tilde f(\mathbf x) := f(\pi^{-1}(\mathbf x))$. Clearly,
148 13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

$\pi^{-1}(\mathbf x)$ is simply the coordinates of 𝐱 = (𝑥1, …, 𝑥𝑁) permuted in increasing order. Thus $\tilde f$ is defined on ℝ𝑁 apart from a set of measure zero and is bounded since $f\in L^\infty$. Furthermore, $\nabla[\tilde f^{1/2}]$ is also bounded since $\nabla\sqrt f\in L^\infty$. Since $\tilde f$ is bounded, we have $\int_{\mathbb R^N}\tilde f\,\mathrm d\omega_\delta\to\int_{\Sigma_N}f\,\mathrm d\omega = 1$, so there is a constant $C_\delta$ such that $f_\delta := C_\delta\tilde f$ is a probability density, i.e., $\int_{\mathbb R^N}f_\delta\,\mathrm d\omega_\delta = 1$, and clearly $C_\delta\to1$ as 𝛿 → 0. Applying Theorem 13.6 to the measure 𝜔𝛿 and the function 𝑓𝛿 on ℝ𝑁, we see that the LSI holds for 𝜔𝛿, i.e.,

$$
S_{\omega_\delta}(f_\delta)\le\frac2KD_{\omega_\delta}(\sqrt{f_\delta}),
$$

or, equivalently,

$$
C_\delta\int_{\mathbb R^N}\tilde f\log\tilde f\,\mathrm d\omega_\delta+\log C_\delta\le\frac{2C_\delta}KD_{\omega_\delta}\big(\sqrt{\tilde f}\big).
$$

Now we let 𝛿 → 0. Using the boundedness of $\nabla[\tilde f^{1/2}]$, the weak convergence of 𝜔𝛿 to 𝜔, and that $Z_{\omega,\delta}$ and $C_\delta$ converge to 1, the Dirichlet form on the right side of the last inequality converges to $D_\omega(\sqrt f)$. The first term on the left side of the inequality converges to ∫ 𝑓 log 𝑓 d𝜔 by dominated convergence, and the second term converges to 0. Thus we arrive at (13.76).
In the above argument, the boundedness of 𝑓 and ∇√𝑓 was only used to ensure that 𝑓, or rather its extension $\tilde f$, has finite integral, and that the Dirichlet form w.r.t. the regularized measure 𝜔𝛿, $D_{\omega_\delta}(\sqrt{\tilde f})$, converges to $D_\omega(\sqrt f)$. For 𝛽 ≥ 1 we can remove these conditions by using a different extension of 𝑓 to ℝ𝑁 if $f\in H^1(\Sigma,\mathrm d\omega)$. We may assume that $D_\omega(\sqrt f)<\infty$; otherwise (13.76) is a tautology. We first still assume that $f\in L^\infty(\Sigma)$. We smoothly cut off 𝑓 to be 0 at the boundary of Σ𝑁; i.e., we find a nonnegative sequence $f_\varepsilon\in C_0^\infty(\Sigma_N)$ such that $\sqrt{f_\varepsilon}\to\sqrt f$ in $H^1(\Sigma,\mathrm d\omega)$ and $\int f_\varepsilon\,\mathrm d\omega = 1$. For 𝛽 ≥ 1 the existence of a similar sequence, but with $f_\varepsilon\to f$ in $H^1$, was shown in Section 12.4. In fact, the same construction shows that we can also guarantee $\sqrt{f_\varepsilon}\to\sqrt f$ in $H^1(\Sigma,\mathrm d\omega)$. Now we use the LSI for the smooth functions 𝑓𝜀, i.e.,

$$
S_\omega(f_\varepsilon)\le\frac2KD_\omega(\sqrt{f_\varepsilon}),
$$

and we let 𝜀 → 0. The right-hand side converges to $D_\omega(\sqrt f)$ by the above choice of 𝑓𝜀. For the left-hand side, recall that, apart from a smoothing that can be dealt with via standard approximation arguments, the cutoff function was constructed in the form $f_\varepsilon(\mathbf x) = C_\varepsilon\phi_\varepsilon(\mathbf x)f(\mathbf x)$, where $\phi_\varepsilon(\mathbf x)\in[0,1]$ with $\phi_\varepsilon\nearrow1$ monotonically pointwise and $C_\varepsilon$ is a normalization such that $C_\varepsilon\to1$ as 𝜀 → 0. Clearly,

$$
S_\omega(f_\varepsilon) = C_\varepsilon\log C_\varepsilon\int_\Sigma\phi_\varepsilon f\,\mathrm d\omega+C_\varepsilon\int_\Sigma f\phi_\varepsilon\log\phi_\varepsilon\,\mathrm d\omega+C_\varepsilon\int_\Sigma\phi_\varepsilon f\log f\,\mathrm d\omega.
$$

The first term converges to 0 since $\int\phi_\varepsilon f\,\mathrm d\omega\le1$ and $C_\varepsilon\log C_\varepsilon\to0$. The second term also converges to 0 by dominated convergence, since $|\phi_\varepsilon\log\phi_\varepsilon|$ is bounded by 1 and goes to 0 pointwise. Finally, the last term converges to $S_\omega(f)$ by monotone convergence. This proves (13.76) for 𝛽 ≥ 1 for any bounded 𝑓. It remains to remove this last condition. Given any 𝑓 with $D_\omega(\sqrt f)<\infty$, we define $f_M := \min\{f,M\}$ and $\tilde f_M := C_Mf_M$, where $C_M$ is the normalization. Clearly, $C_M\to1$ and $f_M\nearrow f$ pointwise. Since (13.76) holds for $\tilde f_M$, we have

$$
C_M\log C_M\int f_M\,\mathrm d\omega+C_M\int f_M\log f_M\,\mathrm d\omega\le\frac2KC_M\int|\nabla\sqrt{f_M}|^2\,\mathrm d\omega. \tag{13.84}
$$

Now we let 𝑀 → ∞. The first term on the left is just $\log C_M\to0$. The second term on the left converges to $S_\omega(f)$ by monotone convergence and $C_M\to1$, and similarly the right-hand side converges to $(2/K)D_\omega(\sqrt f)$ by monotone convergence. This proves (13.76) for 𝛽 ≥ 1 without any additional condition on 𝑓.
The Brascamp-Lieb inequality (13.77) is proved similarly, starting from its regularized version on ℝ𝑁,

$$
\langle f;f\rangle_{\omega_\delta}\le\big\langle\nabla f,[\mathcal H_\delta''+\widehat{\mathcal H}'']^{-1}\nabla f\big\rangle_{\omega_\delta},
$$

which follows directly from Theorem 13.13. We can then take the limit 𝛿 → 0 using monotone convergence on the left and dominated convergence on the right, using that $\mathcal H_\delta''\to\mathcal H''$ and that the inverse $[\mathcal H_\delta''+\widehat{\mathcal H}'']^{-1}$ is uniformly bounded.
For the third part of the theorem, the proof of (13.78), we first note that the remark after (12.17) applies to the generator $\mathcal L_\omega$ as well; i.e., 𝛽 ≥ 1 is necessary for the well-posedness of the equation $\partial_tf_t = \mathcal L_\omega f_t$ on Σ𝑁 with initial condition 𝑓0 supported on Σ𝑁. The construction of the dynamics in Section 12.4 also implies that $f_t\in H^1(\mathrm d\omega)$ for any 𝑡 > 0 if $f_0\in L^2(\mathrm d\omega)$.
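We now mimic the proof of the entropy dissipation (13.32) in our setup. Since we do not know that $D_\omega(\sqrt{f_t})<\infty$, we have to introduce a regularization 𝑐 > 0 to keep 𝑓𝑡 away from 0. We compute

$$
\begin{aligned}
\frac{\mathrm d}{\mathrm dt}\int f_t\log(f_t+c)\,\mathrm d\omega &= \int(\mathcal Lf_t)\log(f_t+c)\,\mathrm d\omega+\int f_t\,\frac{\mathcal Lf_t}{f_t+c}\,\mathrm d\omega\\
&= -\int\frac{|\nabla f_t|^2}{f_t+c}\,\mathrm d\omega-\int\frac{c\,\mathcal Lf_t}{f_t+c}\,\mathrm d\omega\\
&= -\int\frac{|\nabla f_t|^2}{f_t+c}\,\mathrm d\omega-\int\frac{c\,|\nabla f_t|^2}{(f_t+c)^2}\,\mathrm d\omega
\end{aligned}\tag{13.85}
$$

(in the second step we used $\int\mathcal Lf_t\,\mathrm d\omega = 0$ together with $\frac{f_t}{f_t+c} = 1-\frac c{f_t+c}$). Owing to the regularization 𝑐 > 0, we used integration by parts for $H^1(\mathrm d\omega)$ functions only. Dropping the last term, which is negative, and integrating from 0 to 𝑡, we obtain

$$
\int f_t\log(f_t+c)\,\mathrm d\omega+\int_0^t\mathrm ds\int\frac{|\nabla f_s|^2}{f_s+c}\,\mathrm d\omega\le\int f_0\log(f_0+c)\,\mathrm d\omega. \tag{13.86}
$$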

Since $f_t\log\frac{f_t+c}{f_t}\ge0$, we have

$$
\int f_t\log f_t\,\mathrm d\omega+\int_0^t\mathrm ds\int\frac{|\nabla f_s|^2}{f_s+c}\,\mathrm d\omega\le\int f_0\log(f_0+c)\,\mathrm d\omega. \tag{13.87}
$$

Note that both terms on the left-hand side are nonnegative. Now we let 𝑐 → 0. By monotone convergence and $S_\omega(f_0)<\infty$, we get

$$
\int f_t\log f_t\,\mathrm d\omega+\int_0^t\mathrm ds\int\frac{|\nabla f_s|^2}{f_s}\,\mathrm d\omega\le\int f_0\log f_0\,\mathrm d\omega \tag{13.88}
$$

or

$$
S_\omega(f_t)+4\int_0^tD_\omega(\sqrt{f_s})\,\mathrm ds\le S_\omega(f_0). \tag{13.89}
$$

This is the entropy dissipation inequality in a time-integral form. Notice that neither equality nor the differential version as in (13.32) is claimed. Similarly, integrating (13.85) between 𝜏 and 𝑡 yields

$$
S_\omega(f_t)+4\int_\tau^tD_\omega(\sqrt{f_s})\,\mathrm ds\le S_\omega(f_\tau),\qquad t\ge\tau\ge0. \tag{13.90}
$$

In particular, the entropy decays:

$$
0\le S_\omega(f_t)\le S_\omega(f_\tau),\qquad t\ge\tau\ge0. \tag{13.91}
$$
Now we use the LSI (13.76) to estimate $D_\omega(\sqrt{f_s})$ in (13.90), and recall that the LSI holds for any 𝑓𝑠 since 𝛽 ≥ 1. We get

$$
S_\omega(f_t)+2K\int_\tau^tS_\omega(f_s)\,\mathrm ds\le S_\omega(f_\tau),\qquad t\ge\tau\ge0. \tag{13.92}
$$

A standard calculus exercise shows that $S_\omega(f_t)\le e^{-2Kt}S_\omega(f_0)$ for all 𝑡 ≥ 0. One possible argument is to fix any 𝛿 > 0 and choose 𝜏 = (𝑛 − 1)𝛿, 𝑡 = 𝑛𝛿 with 𝑛 = 1, 2, … in (13.92). By monotonicity of the entropy, $\int_{(n-1)\delta}^{n\delta}S_\omega(f_s)\,\mathrm ds\ge\delta S_\omega(f_{n\delta})$, so we have

$$
(1+2K\delta)S_\omega(f_{n\delta})\le S_\omega(f_{(n-1)\delta})
$$

for any 𝑛, and by iteration we obtain

$$
S_\omega(f_{n\delta})\le(1+2K\delta)^{-n}S_\omega(f_0).
$$

Setting 𝛿 = 𝑡/𝑛 and letting 𝑛 → ∞, we get (13.78), since $(1+2Kt/n)^{-n}\to e^{-2Kt}$. This completes the proof of Theorem 13.14. □
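The last step of the argument, namely that the discrete factor $(1+2Kt/n)^{-n}$ decreases in 𝑛 towards $e^{-2Kt}$, can be illustrated with a few lines of Python. This is only a numerical sketch; the values of 𝐾 and 𝑡 are arbitrary.

```python
import math


def iterated_factor(K: float, t: float, n: int) -> float:
    """Contraction factor (1 + 2*K*delta)^(-n) obtained by iterating
    (1 + 2K delta) S(f_{n delta}) <= S(f_{(n-1) delta}) n times, delta = t/n."""
    delta = t / n
    return (1.0 + 2.0 * K * delta) ** (-n)


K, t = 1.0, 1.5
factors = [iterated_factor(K, t, n) for n in (1, 10, 100, 10_000, 1_000_000)]
limit = math.exp(-2.0 * K * t)
for f in factors:
    print(f"{f:.8f}")
print(f"limit e^(-2Kt) = {limit:.8f}")
```

Since $(1+x/n)^n$ increases to $e^x$, every discrete factor lies above the limit, so the bound $e^{-2Kt}S_\omega(f_0)$ is the sharpest one this iteration produces.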
CHAPTER 14

Universality of the Dyson Brownian Motion

In this section we assume 𝛽 ≥ 1, since Dyson Brownian motion in a strong sense is well defined only for these values of 𝛽. However, some results will hold for any 𝛽 > 0, which we will comment on separately. We consider the Gaussian equilibrium measure $\mu = \mu_G\sim\exp(-\beta N\mathcal H)$ on Σ𝑁, defined in (12.13) in Section 12.3, with the Gaussian potential $V(\lambda) = \frac12\lambda^2$. From (12.14) we recall the corresponding Dirichlet form

$$
D_\mu(f) := \frac1{\beta N}\sum_{i=1}^N\int(\partial_if)^2\,\mathrm d\mu = \frac1{\beta N}\|\nabla f\|^2_{L^2(\mu)}\qquad(\partial_i\equiv\partial_{\lambda_i}) \tag{14.1}
$$

and the generator $\mathcal L = \mathcal L_G$ defined via

$$
D_\mu(f) = -\int f\,\mathcal L_Gf\,\mathrm d\mu, \tag{14.2}
$$

and explicitly given by $\mathcal L_G = \frac1{\beta N}\Delta-(\nabla\mathcal H)\cdot\nabla$. The corresponding dynamics is given by (12.17), i.e.,

$$
\partial_tf_t = \mathcal L_Gf_t,\qquad t\ge0. \tag{14.3}
$$

In this section, we will drop all subscripts G.
In this section, we will drop all subscripts G.
As remarked in the previous section, the Hamiltonian ℋ is convex, since the Hessian of the Hamiltonian of 𝜇 satisfies $\nabla^2(\beta N\mathcal H)\ge\beta N/2$ by (13.70). Taking the different normalizations of the Dirichlet form in (13.25) and (14.1) into account, Theorem 13.6 (actually, its extension to Σ𝑁 in Theorem 13.14 with $U_j\equiv0$) guarantees that 𝜇 satisfies the LSI in the form

$$
S_\mu(f)\le4D_\mu(\sqrt f),
$$

and the relaxation time to equilibrium is of order 1. Furthermore, for initial data with finite entropy, (13.78) yields exponential convergence to the equilibrium $\mu_G$, in entropy and hence in total variation norm, on time scales beyond order 1.
The following theorem is the main result of this section. It shows that under
a certain condition on the rigidity of the eigenvalues the relaxation times of the
DBM for observables depending only on the eigenvalue differences are in fact
much shorter than order 1. The main assumption is the a priori estimate (12.18),
which we restate here.

A priori estimate. There exists 𝜉 > 0 such that

$$
Q := \sup_{0\le t\le N}\int\frac1N\sum_{j=1}^N(\lambda_j-\gamma_j)^2f_t(\boldsymbol\lambda)\,\mu_G(\mathrm d\boldsymbol\lambda)\le CN^{-2+2\xi} \tag{14.4}
$$

with a constant 𝐶 uniform in 𝑁. We also assume that after time 1/𝑁 the solution of the equation (14.3) satisfies the bound

$$
S_\mu(f_{1/N})\le CN^m \tag{14.5}
$$

for some fixed 𝑚. Later, in Lemma 14.6, we will show that for 𝛽 = 1, 2 this bound holds automatically.
Theorem 14.1 (Gap universality of the Dyson Brownian motion for short time [64, theorem 4.1]). Let 𝛽 ≥ 1 and assume (14.5). Fix 𝑛 ≥ 1 and an array of positive integers, $\mathbf m = (m_1,\dots,m_n)\in\mathbb N_+^n$. Let $G:\mathbb R^n\to\mathbb R$ be a bounded, smooth function with compact support and define

$$
\mathcal G_{i,\mathbf m}(\mathbf x) := G\big(N(x_i-x_{i+m_1}),\,N(x_{i+m_1}-x_{i+m_2}),\,\dots,\,N(x_{i+m_{n-1}}-x_{i+m_n})\big). \tag{14.6}
$$

Then, for any $\xi\in(0,\frac12)$ and any sufficiently small 𝜀 > 0, independent of 𝑁, there exist constants 𝐶, 𝑐 > 0, depending only on 𝜀 and 𝐺, such that for any $J\subset\{1,\dots,N-m_n\}$ we have

$$
\sup_{t\ge N^{-1+2\xi+\varepsilon}}\left|\int\frac1{|J|}\sum_{i\in J}\mathcal G_{i,\mathbf m}(\mathbf x)\,(f_t\,\mathrm d\mu-\mathrm d\mu)\right|\le CN^\varepsilon\sqrt{\frac{N^2Q}{|J|t}}+Ce^{-cN^\varepsilon}. \tag{14.7}
$$

In particular, if (14.4) holds, then for any $t\ge N^{-1+2\xi+2\varepsilon+\delta}$ with some 𝛿 ≥ 0, we have

$$
\left|\int\frac1{|J|}\sum_{i\in J}\mathcal G_{i,\mathbf m}(\mathbf x)\,(f_t\,\mathrm d\mu-\mathrm d\mu)\right|\le\frac C{\sqrt{|J|N^{\delta-1}}}+Ce^{-cN^\varepsilon}. \tag{14.8}
$$

Thus the gap distribution, averaged over 𝐽 indices, coincides for 𝑓𝑡 d𝜇 and d𝜇 provided that $|J|N^{\delta-1}\to\infty$.
We stated this result with a very general test function (14.6) because we will
need this form later. Obviously, controlling test functions of the form (14.6)
for all 𝑛 is equivalent to controlling test functions of neighboring gaps only
(maybe with a larger 𝑛), so the reader may think only of the case 𝑚1 = 1,
𝑚2 = 2, … , 𝑚𝑛 = 𝑛.
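To make the test function (14.6) concrete, here is a small Python sketch (our own illustration, not from the text) that evaluates the averaged observable $\frac1{|J|}\sum_{i\in J}\mathcal G_{i,\mathbf m}(\mathbf x)$ for a given ordered configuration. The bump profile used for 𝐺 and the helper names are hypothetical choices; any smooth compactly supported 𝐺 would do. On an equally spaced configuration with spacing 1/𝑁 every rescaled gap equals −1, so the average is just 𝐺 evaluated at (−1, …, −1).

```python
import math
from typing import Callable, Sequence


def bump(u: float) -> float:
    """Smooth, compactly supported profile exp(-1/(1-u^2)) on (-1, 1), 0 outside."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0


def averaged_gap_observable(x: Sequence[float], J: Sequence[int], m: Sequence[int],
                            G: Callable[..., float]) -> float:
    """(1/|J|) sum_{i in J} G_{i,m}(x) with G_{i,m} as in (14.6).

    Indices i and m_k are 1-based as in the text; x has length N and every
    i in J must satisfy i + m_n <= N."""
    N = len(x)
    total = 0.0
    for i in J:
        idx = [i] + [i + mk for mk in m]  # i, i+m_1, ..., i+m_n
        args = [N * (x[idx[k] - 1] - x[idx[k + 1] - 1]) for k in range(len(m))]
        total += G(*args)
    return total / len(J)


N = 50
x = [k / N for k in range(1, N + 1)]  # equally spaced, gaps exactly 1/N
J1 = range(1, N)                      # J in {1, ..., N - m_n} for m = (1,)
J2 = range(1, N - 1)                  # J in {1, ..., N - m_n} for m = (1, 2)
single = averaged_gap_observable(x, J1, (1,), lambda u: bump(u / 2))
pair = averaged_gap_observable(x, J2, (1, 2), lambda u, v: bump(u / 2) * bump(v / 2))
print(single, math.exp(-4 / 3))
print(pair, math.exp(-8 / 3))
```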
In our applications, 𝐽 is chosen to be the indices of the eigenvalues in the
interval [𝐸 − 𝑏, 𝐸 + 𝑏] and thus |𝐽| ≍ 𝑁𝑏. This identifies the averaged gap distri-
butions of eigenvalues. However, Theorem 12.4 concerns correlation functions
and not gap distributions directly. It is intuitively clear that the information carried by the gap distribution and by the correlation functions is equivalent if both statistics are averaged on a scale larger than the typical fluctuation of a single eigenvalue (which is smaller than $N^{-1+\varepsilon}$ in the bulk by (11.32)). This standard fact will be proved in Section 14.5.
As pointed out after Theorem 12.4, the input of this theorem, the a priori estimate (14.4), identifies the location of the eigenvalues only on a scale $N^{-1+\xi}$, which is much weaker than the 1/𝑁 precision encoded in the rescaled eigenvalue differences in (14.7). Moreover, by the rigidity estimate (11.32), the a priori estimate (14.4) holds for any 𝜉 > 0 if the initial data of the DBM is a generalized Wigner ensemble. Therefore, Theorem 14.1 holds for any $t\ge N^{-1+\varepsilon}$ for any 𝜀 > 0. This establishes Dyson’s conjecture (described in Section 12.3) in the sense of averaged gap distributions for any generalized Wigner matrices.

14.1. Main Ideas Behind the Proof of Theorem 14.1


The key method is to analyze the relaxation to equilibrium of the dynamics
(13.30). This approach was first introduced in section 5.1 of [63]; the presenta-
tion here follows [64].
Recall the convexity inequality (13.70) for the Hessian of the Hamiltonian

$$
\langle\mathbf v,\nabla^2\mathcal H(\mathbf x)\mathbf v\rangle\ge\frac12\|\mathbf v\|^2+\frac1N\sum_{i<j}\frac{(v_i-v_j)^2}{(x_i-x_j)^2}\ge\frac12\|\mathbf v\|^2,\qquad\mathbf v\in\mathbb R^N, \tag{14.9}
$$

which guarantees a relaxation to equilibrium on a time scale of order 1. The key idea to prove Theorem 14.1 is that the relaxation time is in fact much shorter than order 1 for local observables that depend only on the eigenvalue differences. The convexity bound (14.9) shows that the relaxation in the direction $v_i-v_j$ is much faster than order 1 provided that $x_i$ and $x_j$ are close. However, this effect is hard to exploit directly because all modes of different wavelengths are coupled. Our idea is to add an auxiliary strongly convex potential $\widehat{\mathcal H}(\mathbf x)$ to the original Hamiltonian ℋ to “speed up” the convergence to local equilibrium. On the other hand, we will also show that the cost of this speeding up can be effectively controlled if the a priori estimate (12.18) holds.
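For the Gaussian Hamiltonian $\mathcal H(\mathbf x) = \frac14\sum_ix_i^2-\frac1N\sum_{i<j}\log(x_j-x_i)$ on Σ𝑁 (the 𝛿 → 0 form of (13.79)), the Hessian is $\frac12I$ plus $\frac1N$ times a weighted graph Laplacian, which is where (14.9) comes from; for this ℋ the first inequality in (14.9) is in fact an identity. The following NumPy sketch (ours; the random ordered configuration is arbitrary) builds this Hessian and checks the quadratic-form identity and the bound $\nabla^2\mathcal H\ge\frac12$.

```python
import numpy as np


def hessian_gaussian_H(x: np.ndarray) -> np.ndarray:
    """Hessian of H(x) = (1/4) sum x_i^2 - (1/N) sum_{i<j} log(x_j - x_i)."""
    N = x.size
    eye = np.eye(N)
    diff = x[:, None] - x[None, :]
    inv_sq = 1.0 / (diff**2 + eye) - eye  # 1/(x_i - x_j)^2 off the diagonal, 0 on it
    hess = -inv_sq / N                     # mixed partials d_i d_j H for i != j
    hess[np.diag_indices(N)] = 0.5 + inv_sq.sum(axis=1) / N
    return hess


rng = np.random.default_rng(0)
N = 8
x = np.sort(rng.normal(size=N))  # a point of Sigma_N (ordered coordinates)
v = rng.normal(size=N)

hess = hessian_gaussian_H(x)
quad_form = v @ hess @ v
# right-hand side of (14.9): (1/2)|v|^2 + (1/N) sum_{i<j} (v_i - v_j)^2 / (x_i - x_j)^2
rhs = 0.5 * v @ v + sum(
    (v[i] - v[j]) ** 2 / (x[i] - x[j]) ** 2 for i in range(N) for j in range(i + 1, N)
) / N
lam_min = np.linalg.eigvalsh(hess).min()
print(quad_form, rhs, lam_min)
```

The smallest eigenvalue equals 1/2 up to rounding, attained by the constant vector 𝐯 = (1, …, 1), the "center of mass" direction, which is exactly the slow mode that the logarithmic interaction does not help with.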
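The auxiliary potential $\widehat{\mathcal H}(\mathbf x)$ is defined by

$$
\widehat{\mathcal H}(\mathbf x) := \sum_{j=1}^NU_j(x_j),\qquad U_j(x) := \frac1{2\tau}(x-\gamma_j)^2; \tag{14.10}
$$

i.e., it is a quadratic confinement on the scale √𝜏 for each eigenvalue near its classical location, where the parameter 0 < 𝜏 < 1 will be chosen later on. The total Hamiltonian is given by

$$
\widetilde{\mathcal H} := \mathcal H+\widehat{\mathcal H}, \tag{14.11}
$$

where ℋ is the Gaussian Hamiltonian given by (4.12) with $V(x) = \frac12x^2$. The measure with Hamiltonian $\widetilde{\mathcal H}$,

$$
\mathrm d\omega := \omega(\mathbf x)\,\mathrm d\mathbf x,\qquad\omega := e^{-\beta N\widetilde{\mathcal H}}/\widetilde Z, \tag{14.12}
$$

will be called the local relaxation measure.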

The local relaxation flow is defined to be the flow whose generator $\widetilde{\mathcal L}$ is characterized by the natural Dirichlet form w.r.t. 𝜔; explicitly,

$$
\widetilde{\mathcal L} = \mathcal L-\sum_jb_j\partial_j,\qquad b_j = U_j'(x_j) = \frac{x_j-\gamma_j}\tau. \tag{14.13}
$$

We will choose 𝜏 ≪ 1 so that the additional term $\widehat{\mathcal H}$ substantially increases the lower bound (13.70) on the Hessian, hence speeding up the dynamics so that the relaxation time is at most 𝜏.
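As a companion sketch (ours, with hypothetical helper names), one can implement $\widetilde{\mathcal H} = \mathcal H+\widehat{\mathcal H}$ with $U_j(x) = (x-\gamma_j)^2/(2\tau)$, check by finite differences that its gradient is $\nabla\mathcal H+\mathbf b$ with $b_j$ as in (14.13) (so the drift of the local relaxation flow is $-\nabla\mathcal H-\mathbf b$), and check that the Hessian is bounded below by $\frac12+\frac1\tau$. We take $\gamma_j$ to be the semicircle quantiles $\int_{-2}^{\gamma_j}\varrho_{sc} = j/N$; this choice of classical locations, and the semicircle law on [−2, 2] for $V(x) = x^2/2$, are assumptions stated here rather than quoted from the text.

```python
import math
import numpy as np


def semicircle_cdf(x: float) -> float:
    """CDF of the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    x = min(max(x, -2.0), 2.0)
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def classical_locations(N: int) -> np.ndarray:
    """gamma_j solving semicircle_cdf(gamma_j) = j / N (j = 1..N), by bisection."""
    gammas = []
    for j in range(1, N + 1):
        lo, hi = -2.0, 2.0
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid) < j / N:
                lo = mid
            else:
                hi = mid
        gammas.append(0.5 * (lo + hi))
    return np.array(gammas)


def H_tilde(x: np.ndarray, gamma: np.ndarray, tau: float) -> float:
    """H(x) + sum_j (x_j - gamma_j)^2 / (2 tau), for strictly increasing x."""
    N = x.size
    i, j = np.triu_indices(N, k=1)  # all pairs i < j
    H = 0.25 * np.sum(x**2) - np.sum(np.log(x[j] - x[i])) / N
    return H + np.sum((x - gamma) ** 2) / (2.0 * tau)


def grad_H_tilde(x: np.ndarray, gamma: np.ndarray, tau: float) -> np.ndarray:
    """Gradient of H_tilde; the drift of the local relaxation flow is minus this."""
    N = x.size
    eye = np.eye(N)
    inv = 1.0 / (x[:, None] - x[None, :] + eye) - eye  # 1/(x_i - x_j), zero diagonal
    grad_H = 0.5 * x - inv.sum(axis=1) / N
    b = (x - gamma) / tau                               # b_j = U_j'(x_j), cf. (14.13)
    return grad_H + b


N, tau, h = 10, 0.05, 1e-6
gamma = classical_locations(N)
rng = np.random.default_rng(1)
x = np.sort(gamma - 0.03 + 0.01 * rng.normal(size=N))  # a perturbed point of Sigma_N

# central finite differences of H_tilde versus the analytic gradient
fd_grad = np.array([(H_tilde(x + h * e, gamma, tau) - H_tilde(x - h * e, gamma, tau)) / (2 * h)
                    for e in np.eye(N)])
grad_err = np.max(np.abs(fd_grad - grad_H_tilde(x, gamma, tau)))

# finite-difference Hessian; its spectrum should lie above 1/2 + 1/tau
hd = 1e-5
hess = np.array([(grad_H_tilde(x + hd * e, gamma, tau) - grad_H_tilde(x - hd * e, gamma, tau)) / (2 * hd)
                 for e in np.eye(N)])
lam_min = np.linalg.eigvalsh(0.5 * (hess + hess.T)).min()
print(grad_err, lam_min, 0.5 + 1.0 / tau)
```

With 𝜏 = 0.05 the lower bound on the Hessian jumps from 1/2 to 20.5, which is the quantitative form of the claim that ℋ̂ shortens the relaxation time to order 𝜏.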
The idea of adding an artificial potential ℋ̂ to speed up the convergence
appears to be unnatural here. The current formulation is a streamlined version
of a much more complicated approach that appeared in [63], which took ideas
from the earlier work [59]. Roughly speaking, in the hydrodynamical limit, the
short-wavelength modes always have shorter relaxation times than the long-
wavelength modes. A direct implementation of this idea is extremely compli-
cated due to the logarithmic interaction that couples short- and long-wavelength
modes. Adding a strongly convex auxiliary potential ℋ̂ shortens the relaxation
time of the long-wavelength modes, but it does not affect the short modes, i.e.,
the local statistics, which are our main interest. The analysis of the new system
is much simpler since now the relaxation is faster and uniform for all modes. Finally, we need to compare the local statistics of the original system with those of
the modified one. It turns out that the difference is governed by (∇ℋ̂)2 , which
can be directly controlled by the a priori estimate (14.4).
Our method for enhancing the convexity of ℋ is reminiscent of a standard
convexification idea concerning metastable states. To explain the similarity,
consider a particle near one of the local minima of a double-well potential sep-
arated by a local maximum, or energy barrier. Although the potential is not
convex globally, one may still study a reference problem defined by convexi-
fying the potential along with the well in which the particle initially resides.
Before the particle reaches the energy barrier, there is no difference between
these two problems. Thus questions concerning time scales shorter than the
typical escape time can be conveniently answered by considering the convexi-
fied problem; in particular, the escape time in the metastability problem itself
can be estimated by using convex analysis. Our DBM problem is already convex,
but not sufficiently convex. The modification by adding ℋ̂ enhances convexity
without altering the local statistics. This is similar to the convexification in the
metastability problem, which does not alter events before the escape time.

14.2. Proof of Theorem 14.1


We will work with the measures 𝜇 and 𝜔 defined on the simplex Σ𝑁 and
present the proof as if Theorem 13.6 and its proof were valid for measures not
only on ℝ𝑁 but also on Σ𝑁 . Theorem 13.14 demonstrated that this is indeed
true except that the exponential decay to equilibrium was proven only in the
entropy sense and not the Dirichlet form sense as in (13.31). For pedagogical
simplicity, we first neglect this technicality; in Section 14.3 we will comment on

how to remedy this problem with the help of the regularization introduced in
Section 13.7.
The core of the proof is divided into three theorems. For the flow with
generator ℒ̃, we have the following estimates on the entropy and Dirichlet form.
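Theorem 14.2. Let 𝛽 ≥ 1 be arbitrary. Consider the forward equation

$$
\partial_tq_t = \widetilde{\mathcal L}q_t,\qquad t\ge0, \tag{14.14}
$$

with the reversible measure 𝜔 defined in (14.12). Let the initial condition $q_0$ satisfy $\int q_0\,\mathrm d\omega = 1$. Then we have the following estimates:

$$
\partial_tD_\omega(\sqrt{q_t})\le-\frac2\tau D_\omega(\sqrt{q_t})-\frac1{\beta N^2}\int\sum_{i,j=1}^N\frac{\big(\partial_i\sqrt{q_t}-\partial_j\sqrt{q_t}\big)^2}{(x_i-x_j)^2}\,\mathrm d\omega, \tag{14.15}
$$

$$
\frac1{\beta N^2}\int_0^\infty\mathrm ds\int\sum_{i,j=1}^N\frac{\big(\partial_i\sqrt{q_s}-\partial_j\sqrt{q_s}\big)^2}{(x_i-x_j)^2}\,\mathrm d\omega\le D_\omega(\sqrt{q_0}), \tag{14.16}
$$

and the logarithmic Sobolev inequality

$$
S_\omega(q_0)\le C\tau D_\omega(\sqrt{q_0}) \tag{14.17}
$$

with a universal constant 𝐶. Thus the relaxation time to equilibrium is of order 𝜏:

$$
S_\omega(q_t)\le e^{-Ct/\tau}S_\omega(q_0),\qquad D_\omega(\sqrt{q_t})\le e^{-Ct/\tau}D_\omega(\sqrt{q_0}). \tag{14.18}
$$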
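Proof. Let $h = \sqrt q$; by a calculation similar to (13.33) we have

$$
\partial_tD_\omega(h_t) = \partial_t\,\frac1{\beta N}\int(\nabla h_t)^2e^{-\beta N\widetilde{\mathcal H}}\,\mathrm d\mathbf x\le-\frac2{\beta N}\int\nabla h_t\cdot(\nabla^2\widetilde{\mathcal H})\nabla h_t\;e^{-\beta N\widetilde{\mathcal H}}\,\mathrm d\mathbf x. \tag{14.19}
$$

In our case, (13.70) and (14.10) imply that the Hessian of $\widetilde{\mathcal H}$ is bounded from below as

$$
\nabla h\cdot(\nabla^2\widetilde{\mathcal H})\nabla h\ge\frac1\tau\sum_j(\partial_jh)^2+\frac1{2N}\sum_{i,j}\frac{(\partial_ih-\partial_jh)^2}{(x_i-x_j)^2}. \tag{14.20}
$$

This proves (14.15) and (14.16). The rest can be proved by straightforward arguments analogous to (13.32)–(13.37). □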

Notice that the estimate (14.16) is additional information that we extracted


from the Bakry-Émery argument by using the second term in the Hessian esti-
mate (13.70). It plays a key role in the next theorem.
Theorem 14.3 (Dirichlet form inequality). Let 𝛽 ≥ 1 be arbitrary. Let 𝑞 be a probability density with respect to the relaxation measure 𝜔 from (14.12), i.e., ∫ 𝑞 d𝜔 = 1. Fix 𝑛 ≥ 1, an array of positive integers $\mathbf m\in\mathbb N_+^n$, and a smooth

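function $G:\mathbb R^n\to\mathbb R$ with compact support as in Theorem 14.1. Then, for any $J\subset\{1,\dots,N-m_n\}$ and any 𝑡 > 0 we have

$$
\left|\int\frac1{|J|}\sum_{i\in J}\mathcal G_{i,\mathbf m}(\mathbf x)\,(q\,\mathrm d\omega-\mathrm d\omega)\right|\le C\Big(t\,\frac{D_\omega(\sqrt q)}{|J|}\Big)^{1/2}+C\sqrt{S_\omega(q)}\,e^{-ct/\tau}. \tag{14.21}
$$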

Proof. We give the proof for the case 𝛽 ≥ 1 here since this is relevant for Theorem 14.1. The general case 𝛽 > 0 with additional assumptions will be discussed in Section 14.4. For simplicity of notation, we consider only the case 𝑛 = 1, 𝑚1 = 1, $\mathcal G_{i,\mathbf m}(\mathbf x) = G(N(x_i-x_{i+1}))$. Let $q_t$ satisfy

$$
\partial_tq_t = \widetilde{\mathcal L}q_t,\qquad t\ge0,
$$

with initial condition $q_0 = q$. We write

$$
\begin{aligned}
\int\Big[\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\Big](q-1)\,\mathrm d\omega
&= \int\Big[\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\Big](q-q_t)\,\mathrm d\omega\\
&\quad+\int\Big[\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\Big](q_t-1)\,\mathrm d\omega.
\end{aligned}\tag{14.22}
$$

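The second term in (14.22) can be estimated by (13.8), the decay of the entropy (14.18), and the boundedness of 𝐺; this gives the second term in (14.21).
To estimate the first term in (14.22), we use the evolution equation $\partial_tq_t = \widetilde{\mathcal L}q_t$ and the definition of $\widetilde{\mathcal L}$ (i.e., integration by parts w.r.t. 𝜔, $\int g\,\widetilde{\mathcal L}f\,\mathrm d\omega = -\frac1{\beta N}\int\nabla g\cdot\nabla f\,\mathrm d\omega$) to obtain

$$
\begin{aligned}
&\int\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\,q_t\,\mathrm d\omega-\int\frac1{|J|}\sum_{i\in J}G(N(x_i-x_{i+1}))\,q_0\,\mathrm d\omega\\
&\qquad= -\frac1\beta\int_0^t\mathrm ds\int\frac1{|J|}\sum_{i\in J}G'(N(x_i-x_{i+1}))\,[\partial_iq_s-\partial_{i+1}q_s]\,\mathrm d\omega.
\end{aligned}
$$

From the Schwarz inequality and $\partial q = 2\sqrt q\,\partial\sqrt q$, the absolute value of the last term is bounded by

$$
\begin{aligned}
&\frac2\beta\Big[\int_0^t\mathrm ds\int_{\mathbb R^N}\frac{N^2}{|J|^2}\sum_{i\in J}[G'(N(x_i-x_{i+1}))]^2(x_i-x_{i+1})^2\,q_s\,\mathrm d\omega\Big]^{1/2}\\
&\qquad\times\Big[\int_0^t\mathrm ds\int_{\mathbb R^N}\frac1{N^2}\sum_i\frac1{(x_i-x_{i+1})^2}\big[\partial_i\sqrt{q_s}-\partial_{i+1}\sqrt{q_s}\big]^2\,\mathrm d\omega\Big]^{1/2}
\le C\Big(\frac{D_\omega(\sqrt{q_0})\,t}{|J|}\Big)^{1/2},
\end{aligned}\tag{14.23}
$$

where we have used (14.16) and that $[G'(N(x_i-x_{i+1}))]^2(x_i-x_{i+1})^2\le CN^{-2}$, due to 𝐺 being smooth and compactly supported. □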

Alternatively, we could have directly estimated the left-hand side of (14.21)


by using the total variation norm between 𝑞𝜔 and 𝜔, which in turn could be
estimated by the entropy and the Dirichlet form using the logarithmic Sobolev
inequality, i.e., by

$$
C\int|q-1|\,\mathrm d\omega\le C\sqrt{S_\omega(q)}\le C\sqrt{\tau D_\omega(\sqrt q)}. \tag{14.24}
$$

However, compared with this simple bound, the estimate (14.21) gains an ex-
tra factor |𝐽| ≍ 𝑁 in the denominator; i.e., it is in terms of the Dirichlet form
per particle. The improvement is due to the fact that the observable in (14.21)
depends only on the gap, i.e., difference of points. This allows us to exploit the
additional term (14.16) gained in the Bakry-Émery argument. This is a man-
ifestation of the general observation that gap observables behave much better
than point observables.
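The first inequality in (14.24) is the kind of bound provided by (13.8). Assuming (13.8) has the standard Pinsker form $\int|q-1|\,\mathrm d\omega\le\sqrt{2S_\omega(q)}$ (we do not reproduce (13.8) here, so the exact constant is an assumption), it can be spot-checked on discrete distributions with a short Python sketch; the helper names are hypothetical.

```python
import math
import random


def relative_entropy(p, w):
    """S(p | w) = sum_k p_k log(p_k / w_k) for probability vectors with w_k > 0."""
    return sum(pk * math.log(pk / wk) for pk, wk in zip(p, w) if pk > 0)


def l1_distance(p, w):
    """Discrete analogue of int |q - 1| d(omega) with density q = p / w."""
    return sum(abs(pk - wk) for pk, wk in zip(p, w))


def random_probability(n, rng):
    """A random probability vector with strictly positive entries."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(weights)
    return [v / total for v in weights]


rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    p = random_probability(6, rng)
    w = random_probability(6, rng)
    ratio = l1_distance(p, w) / math.sqrt(2.0 * relative_entropy(p, w))
    worst_ratio = max(worst_ratio, ratio)
print(worst_ratio)  # Pinsker's inequality guarantees this is at most 1
```

Such a bound sees only the total entropy, which is why it cannot produce the extra factor |𝐽| that the gap structure of the observable yields in (14.21).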
The final ingredient in proving Theorem 14.1 is the following entropy and
Dirichlet form estimates.

Theorem 14.4. Let 𝛽 ≥ 1 be arbitrary. Suppose that (13.70) holds and recall the definition of 𝑄 from (12.18). Fix some (possibly 𝑁-dependent) 𝜏 > 0, and consider the local relaxation measure 𝜔 with this 𝜏. Set 𝜓 := 𝜔/𝜇 and let $g_t := f_t/\psi$ with $f_t$ solving the evolution equation (14.3). Suppose there is a constant 𝑚 such that

$$
S(f_\tau\mu\,|\,\omega)\le CN^m. \tag{14.25}
$$

Fix any small 𝜀 > 0. Then, for any $t\in[\tau N^\varepsilon,N]$ the entropy and the Dirichlet form satisfy the estimates

$$
S(g_t\omega\,|\,\omega)\le CN^2Q\tau^{-1},\qquad D_\omega(\sqrt{g_t})\le CN^2Q\tau^{-2}, \tag{14.26}
$$

where the constants depend on 𝜀 and 𝑚.

Remark 14.5. It will not be difficult to check that if the initial data is given
by a Wigner ensemble, then (14.25) holds without any assumption (see Lemma
14.6).

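Proof of Theorem 14.4. Using Lemma 13.11, we can compute the evolution of the entropy $S(f_t\mu|\omega) = S(f_t\mu|\psi\mu)$ as

$$
\partial_tS(f_t\mu|\omega) = -\frac4{\beta N}\sum_j\int(\partial_j\sqrt{g_t})^2\,\psi\,\mathrm d\mu+\int(\mathcal Lg_t)\,\psi\,\mathrm d\mu,
$$

where ℒ is defined in (14.3) and we used that 𝜓 = 𝜔/𝜇 is time independent. Hence, by using (14.13), we have

$$
\partial_tS(f_t\mu|\omega) = -\frac4{\beta N}\sum_j\int(\partial_j\sqrt{g_t})^2\,\mathrm d\omega+\int\widetilde{\mathcal L}g_t\,\mathrm d\omega+\sum_j\int b_j\,\partial_jg_t\,\mathrm d\omega.
$$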

Since 𝜔 is $\widetilde{\mathcal L}$-invariant and time independent, the middle term on the right-hand side vanishes. From the Schwarz inequality and $\partial g = 2\sqrt g\,\partial\sqrt g$, we have

$$
\begin{aligned}
\partial_tS(f_t\mu|\omega)&\le-2D_\omega(\sqrt{g_t})+CN\sum_j\int b_j^2\,g_t\,\mathrm d\omega\\
&\le-2D_\omega(\sqrt{g_t})+CN^2Q\tau^{-2},
\end{aligned}\tag{14.27}
$$

where in the last step we used $\sum_j\int b_j^2g_t\,\mathrm d\omega = \tau^{-2}\int\sum_j(x_j-\gamma_j)^2f_t\,\mathrm d\mu\le NQ\tau^{-2}$ by (14.4). Notice that (14.27) is reminiscent of (13.32) for the derivative of the entropy of the measure $g_t\omega = f_t\mu$ with respect to 𝜔. The difference is, however, that $g_t$ does not satisfy the evolution equation with the generator $\widetilde{\mathcal L}$. The last term in (14.27) expresses the error.
Together with the logarithmic Sobolev inequality (14.17), we have

$$
\partial_tS(f_t\mu|\omega)\le-2D_\omega(\sqrt{g_t})+CN^2Q\tau^{-2}\le-C\tau^{-1}S(f_t\mu|\omega)+CN^2Q\tau^{-2}. \tag{14.28}
$$

Integrating the last inequality from 𝜏 to 𝑡 and using the assumption (14.25) and $t\ge\tau N^\varepsilon$, we have proved the first inequality of (14.26). Using this result and integrating (14.27), we have

$$
\int_\tau^tD_\omega(\sqrt{g_s})\,\mathrm ds\le CN^2Q\tau^{-1}. \tag{14.29}
$$

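Notice that

$$
\begin{aligned}
D_\mu(\sqrt{f_s}) &= \frac1{4\beta N}\int\frac{|\nabla(g_s\psi)|^2}{g_s\psi}\,\frac{\mathrm d\omega}\psi\\
&\le\frac C{\beta N}\int\Big[\frac{|\nabla g_s|^2}{g_s}+|\nabla\log\psi|^2g_s\Big]\mathrm d\omega\\
&\le CD_\omega(\sqrt{g_s})+CN^2Q\tau^{-2}
\end{aligned}\tag{14.30}
$$

by a Schwarz inequality. Thus from (14.29), after restricting the integration and using 𝑡 ≥ 2𝜏, we get

$$
\begin{aligned}
CN^2Q\tau^{-1}&\ge\int_{t-\tau}^tD_\omega(\sqrt{g_s})\,\mathrm ds\\
&\ge\int_{t-\tau}^t\Big[\frac1CD_\mu(\sqrt{f_s})-CN^2Q\tau^{-2}\Big]\mathrm ds\\
&\ge\frac\tau CD_\mu(\sqrt{f_t})-CN^2Q\tau^{-1},
\end{aligned}\tag{14.31}
$$

where, in the last step, we used that $D_\mu(\sqrt{f_t})$ is decreasing in 𝑡, which follows from the convexity of the Hamiltonian of 𝜇 (see, e.g., (13.33)). Hence $D_\mu(\sqrt{f_t})\le CN^2Q\tau^{-2}$. Using the opposite inequality $D_\omega(\sqrt{g_t})\le CD_\mu(\sqrt{f_t})+CN^2Q\tau^{-2}$, which can be proven similarly to (14.30), we obtain

$$
D_\omega(\sqrt{g_t})\le CN^2Q\tau^{-2},
$$

i.e., the second inequality of (14.26). □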
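The integration step behind the first inequality of (14.26) is a Grönwall argument for the differential inequality (14.28): with $a = C\tau^{-1}$ and $b = CN^2Q\tau^{-2}$ one gets $s(t)\le e^{-a(t-\tau)}s(\tau)+b/a$, so for $t\ge\tau N^\varepsilon$ the initial entropy $N^m$ from (14.25) is damped by a factor $e^{-c N^\varepsilon}$, leaving the scale $CN^2Q\tau^{-1}$. The Python sketch below integrates the worst case (equality in (14.28)) with forward Euler; all numerical values (𝑁, 𝜉, 𝜀, 𝑚, 𝜏, and all constants set to 1) are hypothetical.

```python
import math


def euler_entropy(s0: float, a: float, b: float, t: float, steps: int) -> float:
    """Forward Euler for the worst case of (14.28), s' = -a s + b, over a time span t."""
    dt = t / steps
    s = s0
    for _ in range(steps):
        s += dt * (-a * s + b)
    return s


def gronwall_bound(s0: float, a: float, b: float, t: float) -> float:
    """Integrated form of s' <= -a s + b: s(t) <= e^{-a t} s0 + b / a."""
    return math.exp(-a * t) * s0 + b / a


N, xi, eps, m = 1000.0, 0.1, 0.5, 3
tau = 1e-2
Q = N ** (-2 + 2 * xi)            # a priori bound (14.4) with C = 1
a, b = 1.0 / tau, N**2 * Q / tau**2  # coefficients in (14.28) with C = 1
s0 = N**m                          # assumption (14.25) at time tau
span = tau * N**eps - tau          # from tau to t = tau * N^eps
s_t = euler_entropy(s0, a, b, span, 100_000)
bound = gronwall_bound(s0, a, b, span)
print(s_t, bound, N**2 * Q / tau)  # the last number is the scale N^2 Q / tau
```

Here the initial value $N^3 = 10^9$ is reduced to roughly $5\cdot10^{-5}$ by the exponential factor, and what remains is essentially $N^2Q\tau^{-1}\approx398$, in line with (14.26).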

Finally, we complete the proof of Theorem 14.1. For any given 𝑡 > 0 we now choose $\tau := tN^{-\varepsilon}$ and construct the local relaxation measure 𝜔 with this 𝜏 as in (14.12). Set 𝜓 = 𝜔/𝜇 and let $q := g_t = f_t/\psi$ be the density 𝑞 in Theorem 14.3. We would like to apply Theorem 14.4, and for this purpose we need to verify the assumption (14.25). By the definitions of 𝜔, 𝜇, and 𝜓 = 𝜔/𝜇, we have

$$
S(f_\tau\mu|\omega) = \int f_\tau\log f_\tau\,\mathrm d\mu-\int f_\tau\log\psi\,\mathrm d\mu = S_\mu(f_\tau)-\int f_\tau\log\psi\,\mathrm d\mu. \tag{14.32}
$$

We can bound the last term by

$$
-\int f_\tau\log\psi\,\mathrm d\mu\le CN\int f_\tau W\,\mathrm d\mu+\log\frac{\widetilde Z}Z\le CN^2Q\tau^{-1}\le CN^m \tag{14.33}
$$

for some 𝑚, where we have used that $\widetilde Z\le Z$ since $\widetilde{\mathcal H}\ge\mathcal H$. To verify the assumption (14.25), it remains to prove that $S_\mu(f_\tau)\le CN^m$ for some 𝑚. This inequality follows from the following lemma.

Lemma 14.6. Let 𝛽 = 1, 2. Suppose the initial data $f_0$ of the DBM is given by the eigenvalue distribution of a Wigner matrix. Then for any 𝜏 > 0 we have

$$
S_\mu(f_\tau)\le CN^2\big(1-\log(1-e^{-\tau})\big). \tag{14.34}
$$

Proof. For simplicity, we consider only the case 𝛽 = 1, i.e., real Wigner matrices. Recall that the probability measure $f_\tau\mu$ is the same as the eigenvalue distribution of the Gaussian divisible matrix (12.21):

$$
H_\tau = e^{-\tau/2}H_0+(1-e^{-\tau})^{1/2}H^G, \tag{14.35}
$$

where $H_0$ is the initial Wigner matrix and $H^G$ is an independent standard GOE matrix. Since the entropy is monotonic w.r.t. taking a marginal, (13.21), we have

$$
S_\mu(f_\tau) = S(f_\tau\mu|\mu)\le S(\mu_{H_\tau}|\mu_{H^G}),
$$

where $\mu_{H_\tau}$ and $\mu_{H^G}$ are the laws of the matrices $H_\tau$ and $H^G$, respectively. Since both $\mu_{H_\tau}$ and $\mu_{H^G}$ are products of the laws of the matrix elements, by the additivity of entropy (13.11), $S(\mu_{H_\tau}|\mu_{H^G})$ is equal to the sum of the relative entropies of the matrix elements. Recall that the variances of off-diagonal and diagonal entries for GOE differ by a factor of 2. For simplicity of notation, we consider only the off-diagonal terms. Let $\gamma = 1-e^{-\tau}$ and denote by $g_\alpha$ the centered Gaussian density with variance 𝛼, i.e.,

$$
g_\alpha(x) := \frac1{\sqrt{2\pi\alpha}}\exp\Big(-\frac{x^2}{2\alpha}\Big).
$$

Let $\varrho_\gamma$ be the probability density of $(1-\gamma)^{1/2}\zeta = e^{-\tau/2}\zeta$, where 𝜁 is the random variable for an off-diagonal matrix element of $H_0$. By definition, the probability density of the matrix element of $H_\tau$ is given by $\zeta_\tau = \varrho_\gamma*g_{2\gamma/N}$. Therefore Jensen’s

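inequality yields

$$
\begin{aligned}
S(\zeta_\tau|g_{2/N}) &= S\Big(\int\mathrm dy\,\varrho_\gamma(y)\,g_{2\gamma/N}(\cdot-y)\,\Big|\,g_{2/N}\Big)\\
&\le\int\mathrm dy\,\varrho_\gamma(y)\,S\big(g_{2\gamma/N}(\cdot-y)\,|\,g_{2/N}\big).
\end{aligned}\tag{14.36}
$$

By explicit computation one finds

$$
S\big(g_{\sigma^2}(\cdot-y)\,|\,g_{s^2}\big) = \log\frac s\sigma+\frac{y^2}{2s^2}+\frac{\sigma^2}{2s^2}-\frac12.
$$

In our case, with $\sigma^2 = 2\gamma/N$ and $s^2 = 2/N$, we have

$$
S\big(g_{2\gamma/N}(\cdot-y)\,|\,g_{2/N}\big) = \frac12\Big(\frac N2y^2-\log\gamma+\gamma-1\Big).
$$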
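The explicit Gaussian relative entropy used above can be confirmed numerically. The following Python sketch (ours; the helper names and the sample parameters are hypothetical) compares a midpoint-rule evaluation of $\int p\log(p/q)$ with the closed form, and checks that the special case $\sigma^2 = 2\gamma/N$, $s^2 = 2/N$ reduces to the expression displayed above.

```python
import math


def log_gauss(x: float, var: float) -> float:
    """log of the centered Gaussian density g_var(x)."""
    return -x * x / (2.0 * var) - 0.5 * math.log(2.0 * math.pi * var)


def rel_entropy_numeric(y: float, sigma2: float, s2: float, n: int = 20000) -> float:
    """Midpoint rule for S(g_{sigma2}(. - y) | g_{s2}) = int p log(p / q) dx."""
    sd = math.sqrt(sigma2)
    a, b = y - 12.0 * sd, y + 12.0 * sd  # p is negligible outside this window
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        lp = log_gauss(x - y, sigma2)
        total += math.exp(lp) * (lp - log_gauss(x, s2)) * dx
    return total


def rel_entropy_formula(y: float, sigma2: float, s2: float) -> float:
    """Closed form log(s / sigma) + y^2 / (2 s^2) + sigma^2 / (2 s^2) - 1/2."""
    return 0.5 * math.log(s2 / sigma2) + y * y / (2.0 * s2) + sigma2 / (2.0 * s2) - 0.5


def rel_entropy_special(y: float, gamma: float, N: int) -> float:
    """The case sigma^2 = 2 gamma / N, s^2 = 2 / N used in the proof."""
    return 0.5 * (N * y * y / 2.0 - math.log(gamma) + gamma - 1.0)


numeric = rel_entropy_numeric(0.3, 0.5, 2.0)
closed = rel_entropy_formula(0.3, 0.5, 2.0)
print(numeric, closed)
```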

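We can now continue the estimate (14.36). Using $\int y^2\varrho_\gamma(y)\,\mathrm dy\le2/N$, we obtain

$$
S(\zeta_\tau|g_{2/N})\le C-\log\gamma,
$$

and, summing over the $O(N^2)$ independent matrix elements, the claim follows. □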
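The Gaussian divisible ensemble (14.35) is straightforward to sample. The NumPy sketch below (ours, for illustration only) draws $H_\tau$ from a Rademacher Wigner matrix $H_0$ and reports the entry variance, which the interpolation preserves because $e^{-\tau}+(1-e^{-\tau}) = 1$. The normalization used here (off-diagonal variance 1/𝑁, diagonal variance 2/𝑁) and the helper names are assumptions; the text's normalization may differ by a constant factor, which does not affect the variance identity. The function `entropy_bound` evaluates the right-hand side of (14.34) without 𝐶; it grows only like $\log(1/\tau)$ as 𝜏 → 0.

```python
import numpy as np


def goe(N: int, rng: np.random.Generator) -> np.ndarray:
    """Real symmetric Gaussian matrix: off-diagonal variance 1/N, diagonal variance 2/N."""
    A = rng.normal(size=(N, N))
    return (A + A.T) / np.sqrt(2.0 * N)


def rademacher_wigner(N: int, rng: np.random.Generator) -> np.ndarray:
    """A non-Gaussian Wigner matrix with +-1 entries and the same variance profile."""
    S = rng.choice([-1.0, 1.0], size=(N, N))
    W = np.triu(S, k=1)
    W = (W + W.T) / np.sqrt(N)
    W[np.diag_indices(N)] = np.sqrt(2.0 / N) * np.diag(S)
    return W


def gaussian_divisible(H0: np.ndarray, tau: float, rng: np.random.Generator) -> np.ndarray:
    """H_tau = e^{-tau/2} H0 + (1 - e^{-tau})^{1/2} H^G, cf. (14.35)."""
    return np.exp(-tau / 2.0) * H0 + np.sqrt(1.0 - np.exp(-tau)) * goe(H0.shape[0], rng)


def entropy_bound(N: int, tau: float) -> float:
    """Right-hand side of (14.34) without the constant C."""
    return N**2 * (1.0 - np.log(1.0 - np.exp(-tau)))


rng = np.random.default_rng(2)
N, tau = 200, 0.1
H_tau = gaussian_divisible(rademacher_wigner(N, rng), tau, rng)
eigs = np.linalg.eigvalsh(H_tau)  # ascending order, i.e. a point of Sigma_N
off_diag = H_tau[np.triu_indices(N, k=1)]
print(N * off_diag.var())  # close to 1
print(entropy_bound(N, tau), entropy_bound(N, 1e-10))
```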

We now return to the proof of Theorem 14.1. We can apply Lemma 14.6 with the choice $\tau = N^{-1+2\xi}$, where $\xi\in(0,\frac12)$ is from the assumption (14.4). Together with (14.32)–(14.33), this implies that (14.25) holds. Thus, Theorem 14.4 and Theorem 14.3 imply for any $t\in[\tau N^\varepsilon,N]$ that

$$
\begin{aligned}
\left|\int\frac1{|J|}\sum_{i\in J}\mathcal G_{i,\mathbf m}(\mathbf x)\,(f_t\,\mathrm d\mu-\mathrm d\omega)\right|
&\le C\Big(t\,\frac{D_\omega(\sqrt q)}{|J|}\Big)^{1/2}+C\sqrt{S_\omega(q)}\,e^{-cN^\varepsilon}\\
&\le C\Big(t\,\frac{N^2Q}{|J|\tau^2}\Big)^{1/2}+Ce^{-cN^\varepsilon}\\
&\le CN^\varepsilon\sqrt{\frac{N^2Q}{|J|t}}+Ce^{-cN^\varepsilon};
\end{aligned}\tag{14.37}
$$

i.e., the local statistics of $f_t\mu$ and 𝜔 are the same for any initial data $f_\tau$. Applying the same argument to the Gaussian initial data, $f_0 = f_\tau = 1$, we can also compare 𝜇 and 𝜔. We have thus proved the estimate (14.7). Finally, if $t\ge N^{-1+2\xi+\delta+2\varepsilon}$, then the assumption (14.4) guarantees that

$$
N^\varepsilon\sqrt{\frac{N^2Q}{|J|t}}\le\frac C{\sqrt{|J|N^{\delta-1}}},
$$

which proves Theorem 14.1.



14.3. Restriction to the Simplex via Regularization


In the previous section we tacitly assumed that the Bakry-Émery analysis
from Section 13.3 extends to Σ. Strictly speaking, this is not fully rigorous. There
are two options to handle this technical point.
As we remarked after Theorem 13.14, there is a direct method via cutoff
functions to make the Bakry-Émery argument rigorous on Σ if 𝛽 ≥ 1 (see ap-
pendix B of [64]). The same technique can also make every step in Section 14.2
rigorous on Σ.
We present here a more transparent argument that relies on the regularized
measure 𝜔𝛿 on ℝ𝑁 already used in the proof of the LSI in Theorem 13.14. This
path is technically simpler since it performs the core of the analysis on ℝ𝑁 , and
it uses an extra input, the strong solution to the DBM, Theorem 12.2, to remove
the regularization at the end. We now explain the correct procedure of the proof
of Theorem 14.1 using this regularization. The goal is to prove (14.37).
For any small 𝛿 > 0 we introduce the regularized versions 𝜇𝛿 and 𝜔𝛿 of the
measures 𝜇 and 𝜔 as in Section 13.7. Let ℒ𝜔𝛿 be the generator of the regularized
measure 𝜔𝛿 on ℝ𝑁 , and let 𝑓𝑡,𝛿 be the solution to 𝜕𝑡 𝑓𝑡,𝛿 = ℒ𝜔𝛿 𝑓𝑡,𝛿 on ℝ𝑁 with
the same initial condition as for 𝛿 = 0; i.e., 𝑓0,𝛿 is the extension of 𝑓0 to the entire
ℝ𝑁 with 𝑓0,𝛿 (𝐱) = 0 for 𝐱 ∉ Σ𝑁 .
The three key theorems, Theorems 14.2–14.4, of Section 14.2 were formu-
lated and proven for the regularized setup on ℝ𝑁 , so they can be applied to
the regularized dynamics. Using the notation employed in these theorems, the
conclusion is that if
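$$
S(f_{\tau,\delta}\mu_\delta\,|\,\omega_\delta)\le CN^m \tag{14.38}
$$

(see (14.25)), then

$$
\left|\int\frac1{|J|}\sum_{i\in J}\mathcal G_{i,\mathbf m}(\mathbf x)\,(f_{t,\delta}\,\mathrm d\mu_\delta-\mathrm d\omega_\delta)\right|\le CN^\varepsilon\sqrt{\frac{N^2Q_\delta}{|J|t}}+Ce^{-cN^\varepsilon},\qquad t\in[\tau N^\varepsilon,N], \tag{14.39}
$$

where $Q_\delta$ is defined exactly as in (14.4) with the regularized measures, i.e.,

$$
Q_\delta := \sup_{0\le t\le N}\int\frac1N\sum_{j=1}^N(x_j-\gamma_j)^2f_{t,\delta}(\mathbf x)\,\mu_\delta(\mathrm d\mathbf x). \tag{14.40}
$$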

Now we let 𝛿 → 0. Since 𝜔𝛿 converges weakly to 𝜔 in the sense that $\omega_\delta(A)\to\omega(A\cap\Sigma_N)$ for any subset $A\subset\mathbb R^N$, we have $\int O(\mathbf x)\,\mathrm d\omega_\delta\to\int O(\mathbf x)\,\mathrm d\omega$ for any bounded observable 𝑂. Setting

$$
O(\mathbf x) = \frac1{|J|}\sum_{i\in J}\mathcal G_{i,\mathbf m}(\mathbf x), \tag{14.41}
$$

we see that the second term within the integral in the left-hand side of (14.39) converges to the corresponding term in (14.37). On the right-hand side a similar

argument shows that $Q_\delta\to Q$ as 𝛿 → 0. Strictly speaking, the observable $(x_j-\gamma_j)^2$ is unbounded, but a very simple cutoff argument shows that

$$
\mathbb P_{f_{t,\delta}\mu_\delta}\Big(\max_j|x_j|\ge N^2\Big)\le e^{-cN}
$$

uniformly in 𝛿 and $t\in[0,N]$.


For the first term in the left side of (14.39), we use the stochastic representation of the solution to $\partial_tf_{t,\delta} = \mathcal L_{\omega_\delta}f_{t,\delta}$ to note that

$$
\int_{\mathbb R^N}O(\mathbf x)f_{t,\delta}(\mathbf x)\,\mathrm d\omega_\delta = \mathbb E^{f_0\omega}\,\mathbb E_\delta^{\mathbf x_0}O(\mathbf x(t)), \tag{14.42}
$$

where $\mathbb E_\delta^{\mathbf x_0}$ denotes the expectation with respect to the law of the regularized DBM $(\mathbf x(t))_t$ (see (13.82)) starting from $\mathbf x_0$, and $\mathbb E^{f_0\omega}$ denotes the expectation of $\mathbf x_0$ with respect to the measure $f_0\omega$. Now we use the existence of the strong solution to the DBM (13.74) for any 𝛽 ≥ 1, which can be obtained exactly as in Theorem 12.2 for the case $U_j\equiv0$. Since $\mathbf x(t)$ is continuous and remains in the open set Σ𝑁, the probability that up to a fixed time 𝑡 it stays away from a 𝛿-neighborhood of the boundary of Σ𝑁 goes to 1 as 𝛿 → 0, i.e.,

$$
\lim_{\delta\to0}\mathbb P_{\omega_\delta}\big(\mathrm{dist}(\mathbf x(s),\partial\Sigma_N)\ge\delta:\ s\in[0,t]\big) = 1.
$$

Notice that (13.74) and (13.82) are exactly the same for paths that stay away from the boundary of Σ𝑁. This means that the right-hand side of (14.42) converges to $\mathbb E^{f_0\omega}\mathbb E^{\mathbf x_0}O(\mathbf x(t))$, where $\mathbb E^{\mathbf x_0}$ denotes expectation with respect to the law of (13.74). This proves that

$$
\lim_{\delta\to0}\int_{\mathbb R^N}O(\mathbf x)f_{t,\delta}(\mathbf x)\,\mathrm d\omega_\delta = \int_{\Sigma_N}O(\mathbf x)f_t(\mathbf x)\,\mathrm d\omega, \tag{14.43}
$$

which completes the proof of (14.37).


Finally, we need to verify the condition (14.38). Following (14.32)–(14.33), we only need to show that $S_{\mu_\delta}(f_{\tau,\delta}) \le CN^m$ for sufficiently small 𝛿 > 0. We first run the DBM for a very short time and replace the original matrix ensemble $H_0$ by $H_\sigma$, its tiny Gaussian convolution of variance $\sigma = N^{-10}$ (see (14.35)). This is not a restriction, since Theorem 14.1 concerns $f_t$ for much larger times, $t \gg N^{-1}$. The corresponding eigenvalue density $f_\sigma$ supported on $\Sigma_N$ satisfies $S_\mu(f_\sigma) \le CN^{12}$ by (14.34). As a second cutoff, for some large 𝐾 we set $\hat f_{\sigma,K} := C_K\min\{f_\sigma, K\}$ with a normalization constant $C_K$ such that $\int \hat f_{\sigma,K}\,\mathrm{d}\mu = 1$ and $C_K \to 1$ as $K \to \infty$. We will actually apply Theorem 14.1 to $\hat f_{\sigma,K}$ as an initial condition. The regularized flow also starts with this initial condition; i.e., we define $f_{t,\delta}$ to be the solution of $\partial_t f_{t,\delta} = \mathcal{L}_{\mu_\delta} f_{t,\delta}$ with $f_{0,\delta} := \hat f_{\sigma,K}$. Since the entropy decays along the regularized flow, we immediately have $S_{\mu_\delta}(f_{\tau,\delta}) \le S_{\mu_\delta}(f_{0,\delta}) = S_{\mu_\delta}(\hat f_{\sigma,K})$. Since $\hat f_{\sigma,K}$ is supported on $\Sigma_N$ and is bounded, clearly $S_{\mu_\delta}(\hat f_{\sigma,K})$ converges to $S_\mu(\hat f_{\sigma,K})$ as 𝛿 → 0. We then have

$$S_\mu(\hat f_{\sigma,K}) = C_K\int \min\{f_\sigma, K\}\big[\log\min\{f_\sigma, K\} + \log C_K\big]\,\mathrm{d}\mu \to S_\mu(f_\sigma)$$

as 𝐾 → ∞ by monotone convergence. Thus, we first choose 𝐾 sufficiently large so that $S_\mu(\hat f_{\sigma,K}) \le 2S_\mu(f_\sigma) \le CN^{12}$, and then choose 𝛿 sufficiently small. This gives $S_{\mu_\delta}(f_{\tau,\delta}) \le CN^{12}$.
This verifies the condition (14.38) for some sufficiently small 𝛿, so that
(14.39) holds. Letting 𝛿 → 0, we obtain (14.7), which completes the fully rigor-
ous proof of Theorem 14.1.
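The following is a minimal numerical sketch of the truncation $\hat f_{\sigma,K} = C_K\min\{f_\sigma, K\}$ used above. It works on a discrete reference measure, which is an assumption made only for illustration; in the text 𝜇 is the equilibrium measure on $\Sigma_N$. The helpers `entropy` and `truncate` are hypothetical names. The sketch shows two things. First, the normalization $C_K \ge 1$ decreases to 1 as 𝐾 grows. Second, once 𝐾 exceeds $\max f$, the truncated density coincides with 𝑓, and so does its entropy.

```python
import numpy as np

def entropy(f, mu):
    """Relative entropy S_mu(f) = sum_i mu_i f_i log f_i of a density f w.r.t. weights mu."""
    mask = f > 0
    return float(np.sum(mu[mask] * f[mask] * np.log(f[mask])))

def truncate(f, mu, K):
    """Hypothetical helper: return (C_K * min(f, K), C_K), normalized so that sum(mu * f_K) = 1."""
    g = np.minimum(f, K)
    C_K = 1.0 / float(np.sum(g * mu))
    return C_K * g, C_K

rng = np.random.default_rng(0)
mu = np.full(1000, 1e-3)                  # uniform reference measure of total mass 1
f = rng.pareto(1.5, size=1000) + 0.1      # heavy-tailed positive weights
f = f / np.sum(f * mu)                    # normalize to a probability density w.r.t. mu

for K in [2.0, 10.0, 100.0, float(f.max())]:
    f_K, C_K = truncate(f, mu, K)
    print(f"K = {K:9.2f}   C_K = {C_K:.4f}   S(f_K) = {entropy(f_K, mu):.4f}")
print("S(f) =", entropy(f, mu))
```

The minimum, not the maximum, is what makes the cutoff density bounded while keeping it integrable, which is exactly what the argument needs.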

14.4. Dirichlet Form Inequality for Any 𝜷 > 𝟎


Here we extend the proof of the key Dirichlet form inequality, Theorem 14.3,
from 𝛽 ≥ 1 to any 𝛽 > 0. This extension will not be needed for this book, so the
reader may skip this section, but we remark that the Dirichlet form inequality
for 𝛽 > 0 played a crucial role in the proof of the universality for invariant
ensembles for any 𝛽 > 0 (see [25]).
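Lemma 14.7. Let 𝛽 > 0 and let 𝑞 be a probability density with respect to 𝜔 defined in (14.12) with $\nabla\sqrt{q} \in L^\infty(\mathrm{d}\omega)$ and $q \in L^\infty(\mathrm{d}\omega)$. Then, for any $J \subset \{1, \dots, N - m_n - 1\}$ and any 𝑡 > 0 we have

$$\Big|\int \frac{1}{|J|}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\, q\,\mathrm{d}\omega - \int \frac{1}{|J|}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\,\mathrm{d}\omega\Big| \le C\sqrt{\frac{D_\omega(\sqrt{q})\,t}{|J|}} + C\sqrt{S_\omega(q)}\,e^{-ct/\tau}. \tag{14.44}$$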
For 𝛽 ≥ 1, the conditions ∇√𝑞 ∈ 𝐿∞ and 𝑞 ∈ 𝐿∞ (d𝜔) can be removed.
We emphasize that this lemma holds for any 𝛽 > 0; i.e., it does not (directly)
rely on the existence of the DBM. The parameter 𝑡 is not connected with the
time parameter of a dynamics on Σ𝑁 (although it emerges as a time cutoff in a
regularized dynamics on ℝ𝑁 within the proof).

Proof of Lemma 14.7. First we record the result if 𝜔 on $\Sigma_N$ is replaced with the regularized measure $\omega_\delta$ on $\mathbb{R}^N$, as defined in (13.81) with the choice $\hat{\mathcal{H}}(\mathbf{x}) = \sum_j U_j(x_j)$, where $U_j$ is given in (14.10). In this case the proof of Theorem 14.3 applies verbatim even for 𝛽 > 0, since now we work on $\mathbb{R}^N$ and complications arising from the boundary are absent. We immediately obtain, for any probability density $\hat q$ on $\mathbb{R}^N$ with $\int\hat q\,\mathrm{d}\omega_\delta = 1$, for any $J \subset \{1, \dots, N - m_n - 1\}$
164 14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

and any 𝑡 > 0 that

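$$\Big|\int \frac{1}{|J|}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\,\hat q\,\mathrm{d}\omega_\delta - \int \frac{1}{|J|}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\,\mathrm{d}\omega_\delta\Big| \le C\sqrt{\frac{D_{\omega_\delta}(\sqrt{\hat q})\,t}{|J|}} + C\sqrt{S_{\omega_\delta}(\hat q)}\,e^{-ct/\tau}. \tag{14.45}$$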
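Suppose now that $\sqrt{q} \in H^1(\mathrm{d}\omega)$ and 𝑞 is a bounded probability density on $\Sigma_N$ with respect to 𝜔. Similarly to the proof of (13.76) for the general 𝛽 > 0 case, we may extend 𝑞 to Σ̃ by symmetrization. Let $\tilde q$ denote this extension. It is bounded, and $\nabla\tilde q^{1/2}$ is also bounded, since 𝑞 has these properties. Then there is a constant $C_\delta$ such that $q_\delta := C_\delta\tilde q$ is a probability density with respect to $\omega_\delta$. Since $\tilde q$ is bounded, we have $\int_{\mathbb{R}^N}\tilde q\,\mathrm{d}\omega_\delta \to \int_{\Sigma_N} q\,\mathrm{d}\omega$ by dominated convergence, and thus $C_\delta \to 1$ as 𝛿 → 0.
Now we apply (14.45) with $\hat q = q_\delta$. Taking the limit 𝛿 → 0, the left-hand side converges to that of (14.44). This is because $\mathcal{G}_{i,\mathbf{m}}$ is a bounded smooth function and, by dominated convergence, $q_\delta\,\mathrm{d}\omega_\delta = C_\delta\tilde q\,\mathrm{d}\omega_\delta$ converges weakly to $q(\mathbf{x})\mathbf{1}(\mathbf{x} \in \Sigma_N)\,\mathrm{d}\omega(\mathbf{x})$. We thus have

$$\Big|\int \frac{1}{|J|}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\, q\,\mathrm{d}\omega - \int \frac{1}{|J|}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\,\mathrm{d}\omega\Big| \le C\limsup_{\delta\to 0}\sqrt{\frac{C_\delta\, D_{\omega_\delta}(\sqrt{\tilde q})\,t}{|J|}} + C\limsup_{\delta\to 0}\sqrt{S_{\omega_\delta}(C_\delta\tilde q)}\,e^{-ct/\tau}. \tag{14.46}$$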
By dominated convergence, using that $\nabla\tilde q^{1/2} \in L^\infty$, we have $D_{\omega_\delta}(\tilde q^{1/2}) \to D_\omega(\sqrt{q})$. Similarly, the entropy term also converges to $S_\omega(q)$, since $\tilde q \in L^\infty$. This proves (14.44) under the conditions $q \in L^\infty$ and $\nabla\sqrt{q} \in L^\infty$.

Finally, these conditions can be removed for 𝛽 ≥ 1 by a simple approximation. We may assume $D_\omega(\sqrt{q}) < \infty$; otherwise, (14.44) is an empty statement. By the LSI (13.76), we also know that $S_\omega(q) < \infty$. First, we still keep the assumption that $q \in L^\infty$. Since $C_0^\infty(\Sigma)$ is dense in $H^1(\mathrm{d}\omega)$ (see Section 12.4), we can find a sequence of densities $q_n \in L^\infty(\Sigma)$ with the following properties: $\nabla\sqrt{q_n} \in L^\infty(\mathrm{d}\omega)$, $\sqrt{q_n} \to \sqrt{q}$ in $L^2(\mathrm{d}\omega)$, and $\nabla\sqrt{q_n} \to \nabla\sqrt{q}$ in $L^2(\mathrm{d}\omega)$. In fact, the construction in Section 12.4 guarantees that, apart from an irrelevant smoothing, $q_n$ may be chosen of the form $q_n = C_n\phi_n q$. Here $\phi_n$ is a cutoff function with $0 \le \phi_n \le 1$ converging to 1 pointwise, and $C_n \to 1$. We know that (14.44) holds for every $q_n$. Taking the limit 𝑛 → ∞, the left-hand side converges since

$$\Big|\int O(q_n - q)\,\mathrm{d}\omega\Big| \le \|O\|_\infty\,\|\sqrt{q_n} - \sqrt{q}\|_{L^2(\mathrm{d}\omega)}\,\|\sqrt{q_n} + \sqrt{q}\|_{L^2(\mathrm{d}\omega)} \to 0,$$

where we have used that $\|\sqrt{q_n} + \sqrt{q}\|_2 \le \|\sqrt{q_n}\|_2 + \|\sqrt{q}\|_2$ and $\|\sqrt{q}\|_2^2 = \int q\,\mathrm{d}\omega = 1$. Here, 𝑂 is given by (14.41). For the right-hand side of (14.44) applied to the approximate function $q_n$, we note that $D_\omega(\sqrt{q_n}) \to D_\omega(\sqrt{q})$ directly by the choice of $q_n$. Finally, we need to show that $S_\omega(q_n) \to S_\omega(q)$ as 𝑛 → ∞. Clearly,

$$\begin{aligned}
S_\omega(q_n) - S_\omega(q) &= \int_\Sigma\big[q_n\log q_n - q\log q\big]\,\mathrm{d}\omega\\
&= \int_\Sigma\big[C_n\phi_n q\log(C_n\phi_n) + (C_n\phi_n - 1)\,q\log q\big]\,\mathrm{d}\omega,
\end{aligned}$$

and both integrals go to 0 by dominated convergence and by $S_\omega(q) < \infty$. This proves (14.44) for any $q \in L^\infty$. Finally, this last condition can be removed by approximating any density 𝑞 with $q_M := C_M\min\{q, M\}$, where $C_M$ is the normalization and $C_M \to 1$ as $M \to \infty$. Similarly to the proof in (13.84), one can show that $S_\omega(q_M) \to S_\omega(q)$ and $D_\omega(\sqrt{q_M}) \to D_\omega(\sqrt{q})$. Thus we can pass to the limit in the inequality (14.44) written for $q_M$. This proves Lemma 14.7. Notice that the proof did not use the existence of the DBM; instead, it used the existence of the regularized DBM. □
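The bound used above for the left-hand side,

$$\Big|\int O(q_n - q)\,\mathrm{d}\omega\Big| \le \|O\|_\infty\,\|\sqrt{q_n} - \sqrt{q}\|_{L^2}\,\|\sqrt{q_n} + \sqrt{q}\|_{L^2},$$

comes from two steps: the factorization $q_n - q = (\sqrt{q_n} - \sqrt{q})(\sqrt{q_n} + \sqrt{q})$, followed by Cauchy–Schwarz. The small sketch below checks it on a discrete measure. All data are arbitrary stand-ins, and $q_n = C_n\phi_n q$ is built as in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pts = 500
omega = rng.random(n_pts)
omega = omega / omega.sum()                       # discrete stand-in for the measure omega
q = rng.exponential(size=n_pts)
q = q / np.sum(q * omega)                         # probability density w.r.t. omega
phi = np.clip(rng.random(n_pts) + 0.3, 0.0, 1.0)  # cutoff-like function, 0 <= phi <= 1
q_n = phi * q
q_n = q_n / np.sum(q_n * omega)                   # q_n = C_n * phi * q, normalized
O = np.sin(np.arange(n_pts))                      # bounded observable

def l2(v):
    """L^2(omega) norm of the vector v."""
    return float(np.sqrt(np.sum(v ** 2 * omega)))

lhs = abs(float(np.sum(O * (q_n - q) * omega)))
rhs = float(np.max(np.abs(O))) * l2(np.sqrt(q_n) - np.sqrt(q)) * l2(np.sqrt(q_n) + np.sqrt(q))
print(lhs, rhs)
```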

14.5. From Gap Distribution to Correlation Functions: Proof of Theorem 12.4
Theorem 12.4 follows from Theorem 14.1 if we can show that the correlation
function difference in (12.23) can be bounded in terms of the difference of the
expectation of gap observables in (14.8). We state it as the following lemma,
which clearly proves Theorem 12.4 with Theorem 14.1 as an input.
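Lemma 14.8. Let 𝑓 be a probability density with respect to the Gaussian equilibrium measure (12.13). Suppose that the estimate (12.22) holds with some exponent 𝜉 > 0 and that

$$\Big|\int \frac{1}{|J|}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\,(f\,\mathrm{d}\mu - \mathrm{d}\mu)\Big| \le \frac{C}{\sqrt{|J|\,N^{\delta-1}}} \tag{14.47}$$

also holds for some 𝛿 > 0 and for any $J \subset \{1, \dots, N - m_n\}$, where $\mathcal{G}_{i,\mathbf{m}}$ is defined in (14.6). Then, for any 𝜀 > 0 and 𝑁 large enough, we have for any $b = b_N$ with $N^{-1} \ll b \ll 1$ that

$$\Big|\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,O(\boldsymbol{\alpha})\big(p^{(n)}_{f\mu,N} - p^{(n)}_{\mu,N}\big)\Big(E' + \frac{\boldsymbol{\alpha}}{N\varrho_{\mathrm{sc}}(E)}\Big)\Big| \le N^{2\varepsilon}\Big[\frac{N^{-1+\xi}}{b} + \sqrt{\frac{N^{-\delta}}{b}}\Big]. \tag{14.48}$$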
This is a fairly standard technical argument, and the details will be given in Section 14.6. Here we just summarize the main points. To understand the difference between (14.47) and (14.48), consider 𝑛 = 1 for simplicity and let $m_1 = 1$, say. The observable (14.7) answers the question: What is the empirical distribution of differences of consecutive eigenvalues? In other words, (14.7) directly identifies the gap distribution. Correlation functions answer a different question: What is the probability that there are two eigenvalues at a fixed distance from each other? Unlike gap statistics, they are not directly sensitive to the possible other eigenvalues in between. Of course, these two questions are closely related, and it is easy to deduce the answers from each other under some reasonable assumptions on the distribution of the eigenvalues. We now prove that the correlation functions can be derived from (generalized) gap distributions, which is the content of Lemma 14.8.
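For 𝑛 = 1 and $m_1 = 1$, the two questions can be contrasted on a sampled GOE spectrum. The GOE normalization, the bulk window and the distance `s` below are assumptions made for illustration. `n_gaps` counts consecutive eigenvalue pairs at distance at most 𝑠. `n_pairs` counts all pairs at distance at most 𝑠, regardless of eigenvalues in between. The second count is always at least the first, and the excess comes from pairs that have intermediate eigenvalues.

```python
import numpy as np

def goe_eigenvalues(N, rng):
    """Sorted eigenvalues of a GOE matrix normalized so that the spectrum fills [-2, 2]."""
    A = rng.normal(size=(N, N))
    H = (A + A.T) / np.sqrt(2 * N)       # off-diagonal variance 1/N
    return np.linalg.eigvalsh(H)          # returned in ascending order

rng = np.random.default_rng(2)
N = 400
lam = goe_eigenvalues(N, rng)
bulk = lam[(lam > -0.5) & (lam < 0.5)]   # stay in the bulk, where the density is nearly flat

s = 1.5 / N                              # a fixed distance on the scale of the eigenvalue spacing
gaps = np.diff(bulk)                     # consecutive differences: the "gap" question
n_gaps = int(np.sum(gaps <= s))

diff = bulk[None, :] - bulk[:, None]     # all ordered differences: the "correlation" question
n_pairs = int(np.sum((diff > 0) & (diff <= s)))
print(n_gaps, n_pairs)
```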

14.6. Details of the Proof of Lemma 14.8


By definition of the correlation function in (12.19), we have

$$\begin{aligned}
&\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,O(\boldsymbol{\alpha})\,p^{(n)}_{f\mu,N}\Big(E' + \frac{\boldsymbol{\alpha}}{N}\Big)\\
&\quad = C_{N,n}\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int\sum_{i_1\ne\dots\ne i_n}O\big(N(x_{i_1} - E'), N(x_{i_1} - x_{i_2}), \dots, N(x_{i_{n-1}} - x_{i_n})\big)\,f\,\mathrm{d}\mu\\
&\quad = C_{N,n}\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int\sum_{\mathbf{m}\in S_n}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,\mathrm{d}\mu
\end{aligned} \tag{14.49}$$

with $C_{N,n} := N^n(N-n)!/N! = 1 + O_n(N^{-1})$. Here $S_n$ denotes the set of increasing sequences of positive integers $\mathbf{m} = (m_2, \dots, m_n) \in \mathbb{N}_+^{n-1}$, $m_2 < \dots < m_n$, and we introduced

$$Y_{i,\mathbf{m}}(E',\mathbf{x}) := O\big(N(x_i - E'), N(x_i - x_{i+m_2}), \dots, N(x_i - x_{i+m_n})\big). \tag{14.50}$$

We set $Y_{i,\mathbf{m}} = 0$ if $i + m_n > N$. By permutational symmetry of $p^{(n)}_{f,N}$ we can assume that 𝑂 is symmetric, and thus we restrict the summation to $i_1 < \dots < i_n$ at the cost of an overall factor 𝑛!. Then, we changed indices $i = i_1$, $i_2 = i + m_2$, $i_3 = i + m_3, \dots$, and performed a resummation over all index differences encoded in 𝐦. Apart from the first variable $N(x_{i_1} - E')$, the function $Y_{i,\mathbf{m}}$ is of the form (14.6), so (14.47) will apply. The dependence on the first variable will be negligible after the $\mathrm{d}E'$ integration on a macroscopic interval.
To control the error terms in this argument, one needs an a priori bound on the local density. This is needed in particular to show that the error terms in the potentially infinite sum over $\mathbf{m} \in S_n$ converge. This information is provided very precisely by the bulk rigidity estimate (12.22). The details for the rest of the argument are presented below.
Since (14.49) also holds if we set 𝑓 = 1, to prove Lemma 14.8, we only need to estimate

$$\Theta := \Big|\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int\sum_{\mathbf{m}\in S_n}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,(f - 1)\,\mathrm{d}\mu\Big|. \tag{14.51}$$
Let 𝑀 be an 𝑁-dependent parameter chosen at the end of the proof. Let

$$S_n(M) := \{\mathbf{m} \in S_n : m_n \le M\}, \qquad S_n^c(M) := S_n\setminus S_n(M),$$

and note that $|S_n(M)| \le M^{n-1}$. We have the simple bound $\Theta \le \Theta^{(1)}_M + \Theta^{(2)}_M + \Theta^{(3)}_M$, where

$$\Theta^{(1)}_M := \Big|\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int\sum_{\mathbf{m}\in S_n(M)}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,(f - 1)\,\mathrm{d}\mu\Big| \tag{14.52}$$

and

$$\Theta^{(2)}_M := \sum_{\mathbf{m}\in S_n^c(M)}\Big|\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,\mathrm{d}\mu\Big|. \tag{14.53}$$

We define $\Theta^{(3)}_M$ to be the same as $\Theta^{(2)}_M$ but with 𝑓 replaced by the constant 1, i.e., the equilibrium measure 𝜇.
Step 1. Small 𝐦 case; estimate of $\Theta^{(1)}_M$. After performing the $\mathrm{d}E'$ integration, we will eventually apply Theorem 14.1 to the function

$$G(u_1, u_2, \dots) := \int O(y, u_1, u_2, \dots)\,\mathrm{d}y,$$

i.e., to the quantity

$$\int\mathrm{d}E'\,Y_{i,\mathbf{m}}(E',\mathbf{x}) = \frac{1}{N}\,G\big(N(x_i - x_{i+m_2}), \dots\big) \tag{14.54}$$

for each fixed 𝑖 and 𝐦.
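Identity (14.54) is just the substitution $y = N(x_i - E')$, $\mathrm{d}y = -N\,\mathrm{d}E'$. The numerical sketch below checks it for one gap variable. The Gaussian-type test function `O` and the positions `x_i`, `x_j` are hypothetical choices, made so that `G` has the closed form $\sqrt{\pi}\,e^{-(u-1/2)^2}$.

```python
import numpy as np

def O(y, u):
    """Hypothetical smooth test function O(y, u), rapidly decaying in y."""
    return np.exp(-y ** 2) * np.exp(-(u - 0.5) ** 2)

def G(u):
    """G(u) = integral of O(y, u) dy; here equal to sqrt(pi) * exp(-(u - 0.5)^2)."""
    return np.sqrt(np.pi) * np.exp(-(u - 0.5) ** 2)

N, x_i, x_j = 1000, 0.3, 0.3005          # two points at microscopic distance
u = N * (x_i - x_j)                       # the rescaled gap variable N(x_i - x_{i+m})

# Riemann sum of the integral of O(N(x_i - E'), u) dE' over a window much wider than 1/N
E_grid = np.linspace(x_i - 0.05, x_i + 0.05, 200_001)
dE = E_grid[1] - E_grid[0]
lhs = float(np.sum(O(N * (x_i - E_grid), u)) * dE)
rhs = float(G(u) / N)                     # right-hand side of (14.54)
print(lhs, rhs)
```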
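For any 𝐸 and 0 < 𝜁 < 𝑏, define sets of integers $J = J_{E,b,\zeta}$ and $J^\pm = J^\pm_{E,b,\zeta}$ by

$$J := \{i : \gamma_i \in [E - b, E + b]\}, \qquad J^\pm := \{i : \gamma_i \in [E - (b\pm\zeta), E + b\pm\zeta]\},$$

where $\gamma_i$ was defined in (11.31). Clearly, $J^- \subset J \subset J^+$. Using this notation, we have

$$\int_{E-b}^{E+b}\mathrm{d}E'\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x}) = \int_{E-b}^{E+b}\mathrm{d}E'\sum_{i\in J^+}Y_{i,\mathbf{m}}(E',\mathbf{x}) + \Omega^+_{J,\mathbf{m}}(\mathbf{x}). \tag{14.55}$$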

The error term $\Omega^+_{J,\mathbf{m}}$, defined indirectly by (14.55), comes from those indices $i \notin J^+$ for which $x_i \in [E - b, E + b] + O(N^{-1})$. This is because $Y_{i,\mathbf{m}}(E',\mathbf{x}) = 0$ unless $|x_i - E'| \le C/N$, where the constant depends on the support of 𝑂. Thus,

$$|\Omega^+_{J,\mathbf{m}}(\mathbf{x})| \le CN^{-1}\,\#\{i : |x_i - \gamma_i| \ge \zeta/2,\ |x_i - E| \le 2b\}$$

for any sufficiently large 𝑁, assuming $\zeta \gg 1/N$ and using that 𝑂 is a bounded function. The additional $N^{-1}$ factor comes from the $\mathrm{d}E'$ integration. Due to the rigidity estimate (12.22) and choosing $\zeta = N^{-1+\xi+\varepsilon'}$ with some 𝜀′ > 0, we get

$$\int|\Omega^+_{J,\mathbf{m}}(\mathbf{x})|\,f\,\mathrm{d}\mu \le N^{-D} \tag{14.56}$$

for any 𝐷. We can also estimate


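$$\begin{aligned}
\int_{E-b}^{E+b}\mathrm{d}E'\sum_{i\in J^+}Y_{i,\mathbf{m}}(E',\mathbf{x})
&\le \int_{E-b}^{E+b}\mathrm{d}E'\sum_{i\in J^-}Y_{i,\mathbf{m}}(E',\mathbf{x}) + CN^{-1}|J^+\setminus J^-|\\
&= \int_{\mathbb{R}}\mathrm{d}E'\sum_{i\in J^-}Y_{i,\mathbf{m}}(E',\mathbf{x}) + CN^{-1}|J^+\setminus J^-| + \Xi^+_{J,\mathbf{m}}(\mathbf{x})\\
&\le \int_{\mathbb{R}}\mathrm{d}E'\sum_{i\in J}Y_{i,\mathbf{m}}(E',\mathbf{x}) + CN^{-1}|J^+\setminus J^-| + CN^{-1}|J\setminus J^-| + \Xi^+_{J,\mathbf{m}}(\mathbf{x}),
\end{aligned} \tag{14.57}$$

where the error term $\Xi^+_{J,\mathbf{m}}$, defined by (14.57), comes from indices $i \in J^-$ such that $x_i \notin [E - b, E + b] + O(1/N)$. It satisfies the same bound (14.56) as $\Omega^+_{J,\mathbf{m}}$. By the continuity of 𝜚, the density of the $\gamma_i$'s is bounded by 𝐶𝑁; thus $|J^+\setminus J^-| \le CN\zeta$ and $|J\setminus J^-| \le CN\zeta$. Therefore, summing up the formula (14.54) for $i \in J$, we obtain from (14.55), (14.56), and (14.57)

$$\int_{E-b}^{E+b}\mathrm{d}E'\int\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,\mathrm{d}\mu \le \int\frac{1}{N}\sum_{i\in J}G\big(N(x_i - x_{i+m_2}), \dots\big)\,f\,\mathrm{d}\mu + C\zeta + CN^{-D}$$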
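for each $\mathbf{m} \in S_n$. A lower bound can be obtained analogously, and we get

$$\Big|\int_{E-b}^{E+b}\mathrm{d}E'\int\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,\mathrm{d}\mu - \int\frac{1}{N}\sum_{i\in J}G\big(N(x_i - x_{i+m_2}), \dots\big)\,f\,\mathrm{d}\mu\Big| \le C\zeta + CN^{-D} \tag{14.58}$$

for each $\mathbf{m} \in S_n$. The error term $N^{-D}$ can be neglected. Adding up (14.58) for all $\mathbf{m} \in S_n(M)$, we get

$$\Big|\int_{E-b}^{E+b}\mathrm{d}E'\int\sum_{\mathbf{m}\in S_n(M)}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,\mathrm{d}\mu - \int\frac{1}{N}\sum_{\mathbf{m}\in S_n(M)}\sum_{i\in J}G\big(N(x_i - x_{i+m_2}), \dots\big)\,f\,\mathrm{d}\mu\Big| \le CM^{n-1}\zeta. \tag{14.59}$$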
Clearly, the same estimate holds for the equilibrium, i.e., if we set 𝑓 = 1 in (14.59). We subtract these two formulas and apply (14.47) to each summand in the second term of (14.59). This gives

$$\Theta^{(1)}_M = \Big|\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int\sum_{\mathbf{m}\in S_n(M)}\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,(f\,\mathrm{d}\mu - \mathrm{d}\mu)\Big| \le CM^{n-1}\big(b^{-1}N^{-1+\xi+\varepsilon'} + b^{-1/2}N^{-\delta/2}\big), \tag{14.60}$$

where we have used that $|J| \le CNb$. This completes the estimate of $\Theta^{(1)}_M$.
Step 2. Large 𝐦 case; estimate of $\Theta^{(2)}_M$ and $\Theta^{(3)}_M$. For fixed 𝑦 ∈ ℝ and ℓ > 0, let

$$\chi(y, \ell) := \sum_{i=1}^N\mathbf{1}\Big(x_i \in \Big[y - \frac{\ell}{N}, y + \frac{\ell}{N}\Big]\Big)$$

denote the number of points in the interval $[y - \ell/N, y + \ell/N]$. Note that for a fixed $\mathbf{m} = (m_2, \dots, m_n)$, we have

$$\sum_{i=1}^N|Y_{i,\mathbf{m}}(E',\mathbf{x})| \le C\,\chi(E',\ell)\,\mathbf{1}\big(\chi(E',\ell) \ge m_n\big) \le C\sum_{m=m_n}^N m\,\mathbf{1}\big(\chi(E',\ell) \ge m\big), \tag{14.61}$$

where ℓ denotes the maximum of $|u_1| + \dots + |u_n|$ in the support of $O(u_1, \dots, u_n)$.
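The counting function 𝜒 and the elementary inequality behind the second step of (14.61) can be sketched as follows. The uniform point configuration is a hypothetical stand-in for eigenvalues. `middle_side` and `right_side` are hypothetical names for the two sides of the inequality, with the constant 𝐶 dropped. If $\chi = k \ge m_n$, the right-hand side equals $m_n + (m_n + 1) + \dots + k \ge k$; otherwise both sides vanish.

```python
import numpy as np

def chi(x, y, ell, N):
    """chi(y, ell): number of points x_i in the interval [y - ell/N, y + ell/N]."""
    return int(np.sum(np.abs(x - y) <= ell / N))

def middle_side(k, m_n):
    """k * 1(k >= m_n): the middle expression of (14.61) with C dropped and chi = k."""
    return k if k >= m_n else 0

def right_side(k, m_n, N):
    """sum_{m = m_n}^{N} m * 1(k >= m): the right-hand side of (14.61) with C dropped."""
    return sum(m for m in range(m_n, N + 1) if k >= m)

rng = np.random.default_rng(3)
N = 200
x = np.sort(rng.uniform(-1.0, 1.0, size=N))     # stand-in point configuration
k = chi(x, 0.0, 5.0, N)
print("chi(0, 5) =", k, " middle:", middle_side(k, 2), " right:", right_side(k, 2, N))
```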


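Since the summation over all increasing sequences $\mathbf{m} = (m_2, \dots, m_n) \in \mathbb{N}_+^{n-1}$ with a fixed $m_n$ contains at most $m_n^{n-2}$ terms, we have

$$\sum_{\mathbf{m}\in S_n^c(M)}\Big|\int_{E-b}^{E+b}\mathrm{d}E'\int\sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,f\,\mathrm{d}\mu\Big| \le C\int_{E-b}^{E+b}\mathrm{d}E'\sum_{m=M}^N m^{n-1}\int\mathbf{1}\big(\chi(E',\ell) \ge m\big)\,f\,\mathrm{d}\mu. \tag{14.62}$$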
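The counting fact used before (14.62) can be made concrete. Fix $m_n$. Then the entries $m_2 < \dots < m_{n-1}$ form an $(n-2)$-element subset of $\{1, \dots, m_n - 1\}$. So there are exactly $\binom{m_n - 1}{n - 2} \le m_n^{n-2}$ such sequences. The function name below is hypothetical; the sketch enumerates the sequences directly.

```python
from itertools import combinations
from math import comb

def count_increasing(n, m_n):
    """Number of increasing tuples (m_2, ..., m_n) of positive integers whose last entry equals m_n."""
    # the free entries m_2 < ... < m_{n-1} form an (n-2)-element subset of {1, ..., m_n - 1}
    return sum(1 for _ in combinations(range(1, m_n), n - 2))

for n in range(2, 6):
    row = [count_increasing(n, m_n) for m_n in range(1, 11)]
    print(n, row)
```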
The rigidity bound (12.22) clearly implies

$$\int\mathbf{1}\big(\chi(E',\ell) \ge m\big)\,f\,\mathrm{d}\mu \le C_{D,\varepsilon'}N^{-D}$$

for any 𝐷 > 0 and 𝜀′ > 0, as long as $m \ge \ell N^{\varepsilon'}$. Therefore, choosing 𝐷 = 2𝑛 + 2, we see from (14.62) that $\Theta^{(2)}_M$ is negligible; in particular,

$$\Theta^{(2)}_M \le CN^{-1} \quad\text{if } M \ge \ell N^{\varepsilon'}.$$

Since all rigidity bounds hold for the equilibrium measure, the last bound is valid for $\Theta^{(3)}_M$ as well. We will choose $M = \ell N^{\varepsilon'}$ and recall that ℓ is independent of 𝑁. Combining these bounds with (14.60), we obtain

$$\Theta \le CN^{\varepsilon'(n+1)}\big(b^{-1}N^{-1+\xi+\varepsilon'} + b^{-1/2}N^{-\delta/2}\big) + CN^{-1}.$$

Choosing 𝜀′ = 𝜀/(𝑛 + 1), we conclude the proof of Lemma 14.8.
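For completeness, here is the exponent bookkeeping behind this last step. The worked bound below only combines (14.60), the choice $M = \ell N^{\varepsilon'}$ with $\varepsilon' = \varepsilon/(n+1)$, and $\Theta^{(2)}_M + \Theta^{(3)}_M \le CN^{-1}$:

$$
\begin{aligned}
CM^{n-1} &= C\,\ell^{\,n-1}N^{\varepsilon'(n-1)} \le CN^{\varepsilon'(n+1)} = CN^{\varepsilon},\\
CN^{\varepsilon}\,b^{-1}N^{-1+\xi+\varepsilon'} &= CN^{\varepsilon\frac{n+2}{n+1}}\,\frac{N^{-1+\xi}}{b} \le \frac13\,N^{2\varepsilon}\,\frac{N^{-1+\xi}}{b},\\
CN^{\varepsilon}\,b^{-1/2}N^{-\delta/2} &= CN^{\varepsilon}\sqrt{\frac{N^{-\delta}}{b}} \le \frac13\,N^{2\varepsilon}\sqrt{\frac{N^{-\delta}}{b}},
\end{aligned}
$$

for 𝑁 large, since $(n+2)/(n+1) < 2$. The leftover term satisfies $CN^{-1} \le \frac13 N^{2\varepsilon}N^{-1+\xi}/b$ for large 𝑁, because 𝑏 ≤ 1 and 𝜉 > 0. Summing the three contributions gives the right-hand side of (14.48).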
CHAPTER 15

Continuity of Local Correlation Functions Under the Matrix OU Process

We have completed the first two steps of the three-step strategy introduced in Chapter 5, i.e., the local semicircle law and the universality of Gaussian divisible ensembles (Theorem 12.4). In this chapter, we complete this strategy by proving a continuity result for the local correlation functions of the matrix OU process in Theorem 15.2 and Theorem 15.3 below. This is Step 3a defined in Chapter 5. From these results, we obtain a weaker version of Theorem 5.1. Namely, we get averaged energy universality of Wigner matrices, but only on scales $b \ge N^{-1/2+\varepsilon}$. In Section 16.1, we will use the idea of “approximation by a Gaussian divisible ensemble” and prove Theorem 5.1 down to any scale $b \ge N^{-1+\varepsilon}$.
Theorem 15.1 ([68, theorem 2.2]). Let 𝐻 be an 𝑁 × 𝑁 real symmetric or complex Hermitian Wigner matrix. In the Hermitian case we assume that the real and imaginary parts are i.i.d. Suppose that the distribution 𝜈 of the rescaled matrix elements $\sqrt{N}h_{ij}$ satisfies the decay condition (5.6). Fix a small 𝜀 > 0 and an integer 𝑛 ≥ 1, and let $O : \mathbb{R}^n \to \mathbb{R}$ be a continuous, compactly supported function. Then, for any |𝐸| < 2 and $b \in [N^{-1/2+\varepsilon}, N^{-\varepsilon}]$, we have

$$\lim_{N\to\infty}\frac{1}{2b}\int_{E-b}^{E+b}\mathrm{d}E'\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,O(\boldsymbol{\alpha})\big(p^{(n)}_{H,N} - p^{(n)}_{G,N}\big)\Big(E' + \frac{\boldsymbol{\alpha}}{N}\Big) = 0. \tag{15.1}$$

To prove this theorem, we first recall the matrix OU process (12.1) defined by

$$\mathrm{d}H_t = \frac{1}{\sqrt{N}}\,\mathrm{d}B_t - \frac{1}{2}H_t\,\mathrm{d}t \tag{15.2}$$

with the initial data $H_0$. The eigenvalue evolution of this process is the DBM. Recall that we denote the eigenvalue distribution at time 𝑡 by $f_t\,\mathrm{d}\mu$, with $f_t$ satisfying (12.17). In this chapter, we assume that the initial data $H_0$ is an 𝑁 × 𝑁 Wigner matrix and that the distribution of the matrix elements satisfies the uniform polynomial decay condition (5.6). We have the following Green function continuity theorem for the matrix OU process.
Theorem 15.2 (Continuity of Green function). Suppose that the initial data $H_0$ is an 𝑁 × 𝑁 Wigner matrix with the distribution of matrix elements satisfying the uniform polynomial decay condition (5.6). Let 𝜅 > 0 be arbitrary and suppose that for some small parameter 𝜎 > 0 and for any $y \ge N^{-1+\sigma}$ we have the following estimate on the diagonal elements of the resolvent for any 0 ≤ 𝑡 ≤ 1:

$$\max_{1\le k\le N}\ \max_{|E|\le 2-\kappa}\Big|\Big(\frac{1}{H_t - E - iy}\Big)_{kk}\Big| \prec N^{2\sigma} \tag{15.3}$$

with some constants 𝐶, 𝑐 depending only on 𝜎, 𝜅.


Let $G(t,z) = (H_t - z)^{-1}$ denote the resolvent and $m(t,z) = N^{-1}\operatorname{Tr}G(t,z)$. Suppose that $F(x_1, \dots, x_n)$ is a function such that for any multi-index $\alpha = (\alpha_1, \dots, \alpha_n)$ with $1 \le |\alpha| \le 5$ and for any 𝜀′ > 0 sufficiently small, we have

$$\max\Big\{|\partial^\alpha F(x_1, \dots, x_n)| : \max_j|x_j| \le N^{\varepsilon'}\Big\} \le N^{C_0\varepsilon'} \tag{15.4}$$

and

$$\max\Big\{|\partial^\alpha F(x_1, \dots, x_n)| : \max_j|x_j| \le N^2\Big\} \le N^{C_0} \tag{15.5}$$

for some constant $C_0$.

Let 𝜀 > 0 be arbitrary and choose an 𝜂 with $N^{-1-\varepsilon} \le \eta \le N^{-1}$. Take any sequence of complex parameters $z_j = E_j \pm i\eta$, 𝑗 = 1, …, 𝑛, with $|E_j| \le 2 - \kappa$. Then there is a constant 𝐶 such that, for any choices of the signs in the imaginary part of $z_j$ and for any 𝑡 ∈ [0, 1], we have

$$\big|\mathbb{E}F(m(t,z_1), \dots, m(t,z_n)) - \mathbb{E}F(m(0,z_1), \dots, m(0,z_n))\big| \le CN^{20\sigma+6\varepsilon}\,(t\sqrt{N}). \tag{15.6}$$
We defer the proof of Theorem 15.2 to the next section. The following result
shows how the Green function continuity, i.e., estimate of the type (15.6), can be
used to compare correlation functions. The proof will be given in Section 15.2.
Next we compare local statistics of two Wigner matrix ensembles. We use the labels 𝐯 and 𝐰 to distinguish them, because later in Chapter 16 we will denote the matrix elements of the two ensembles by different letters, $v_{ij}$ and $w_{ij}$. For any two (generalized) Wigner matrix ensembles $H^{\mathbf{v}}$ and $H^{\mathbf{w}}$, we denote the probability laws of their eigenvalues $\lambda^{\mathbf{v}}$ and $\lambda^{\mathbf{w}}$ by $\mu_{\mathbf{v}}$ and $\mu_{\mathbf{w}}$, respectively. Denote by $m^{\mathbf{v}}(z)$, $m^{\mathbf{w}}(z)$ the Stieltjes transforms of the eigenvalues, i.e.,

$$m^{\mathbf{v}}(z) = \frac{1}{N}\sum_{j=1}^N\frac{1}{\lambda^{\mathbf{v}}_j - z},$$

and similarly for $m^{\mathbf{w}}(z)$. Let $p^{(n)}_{\mathbf{v},N}$ and $p^{(n)}_{\mathbf{w},N}$ be the 𝑛-point correlation functions of the eigenvalues w.r.t. $\mu_{\mathbf{v}}$ and $\mu_{\mathbf{w}}$. We remind the reader that we will sometimes use the notation $\mathbb{E}^{\mathbf{v}}F(\lambda)$ for $\mathbb{E}F(\lambda^{\mathbf{v}})$. The setup is motivated by local correlation functions of eigenvalues. Nevertheless, the concept of matrices or eigenvalues plays no role in the following theorem. It is purely a statement about comparing two point processes on the real line. It asserts that if the finite-dimensional marginals of the Stieltjes transforms are close, then the local correlation functions are also close. It is essential that in (15.8) below the spectral parameter of the Stieltjes transform is at distance $N^{-1-\sigma}$ from the real axis for some 𝜎 > 0, i.e., below the scale of the eigenvalue spacing. Controlling the Stieltjes transform on this very short scale is necessary to identify and compare local correlation functions at the scale $N^{-1}$.
The following theorem is a slightly modified version of [69, theorem 6.4].
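Theorem 15.3 (Correlation function comparison). Let 𝜅 > 0 be arbitrary, and suppose that for some small parameters 𝜎, 𝛿 > 0 the following two conditions hold:
(i) For any 𝜀 > 0 and any integer 𝑘,
$$\mathbb{E}\big[\operatorname{Im} m^{\mathbf{v}}(E + iN^{-1+\varepsilon})\big]^k + \mathbb{E}\big[\operatorname{Im} m^{\mathbf{w}}(E + iN^{-1+\varepsilon})\big]^k \le C \tag{15.7}$$
holds for any $|E| \le 2 - \kappa$ and $N \ge N_0(\varepsilon, k, \kappa)$.
(ii) For any sequence $z_j = E_j + i\eta_j$, 𝑗 = 1, …, 𝑛, with $|E_j| \le 2 - \kappa$ and $\eta_j = N^{-1-\sigma_j}$ for some $\sigma_j \le \sigma$, we have
$$\big|\mathbb{E}\big(\operatorname{Im} m^{\mathbf{v}}(z_1)\cdots\operatorname{Im} m^{\mathbf{v}}(z_n)\big) - \mathbb{E}\big(\operatorname{Im} m^{\mathbf{w}}(z_1)\cdots\operatorname{Im} m^{\mathbf{w}}(z_n)\big)\big| \le N^{-\delta}. \tag{15.8}$$
Then, for any integer 𝑛 ≥ 1 there is a positive constant $c_n = c_n(\sigma, \delta)$ such that for any $|E| \le 2 - 2\kappa$ and for any $C^1$-function $O : \mathbb{R}^n \to \mathbb{R}$ with compact support,
$$\Big|\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,O(\boldsymbol{\alpha})\big(p^{(n)}_{\mathbf{v},N} - p^{(n)}_{\mathbf{w},N}\big)\Big(E + \frac{\boldsymbol{\alpha}}{N}\Big)\Big| \le CN^{-c_n}, \tag{15.9}$$
where 𝐶 depends on 𝑂 and 𝑁 is sufficiently large.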
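We remark that in some applications we will use slightly different conditions. Instead of (15.8) we may assume

$$\big|\mathbb{E}F\big(\operatorname{Im} m^{\mathbf{v}}(z_1), \dots, \operatorname{Im} m^{\mathbf{v}}(z_n)\big) - \mathbb{E}F\big(\operatorname{Im} m^{\mathbf{w}}(z_1), \dots, \operatorname{Im} m^{\mathbf{w}}(z_n)\big)\big| \le N^{-\delta}, \tag{15.10}$$

where 𝐹 is as in Theorem 15.2. Then (15.8) holds, since we can approximate $\mathbb{E}[\operatorname{Im} m^{\mathbf{w}}(z_1)\cdots\operatorname{Im} m^{\mathbf{w}}(z_n)]$ by the expression in (15.10). For this, the function 𝐹 is chosen to be $F(x_1, \dots, x_n) := x_1\cdots x_n$ if $\max_j|x_j| \le N^c$, and it is smoothly cut off to go to 0 in the regime $\max_j|x_j| \ge N^{2c}$ for some small 𝑐 > 0.
To see this, recall the elementary inequality

$$\theta_{\eta_1} \le \frac{\eta_2}{\eta_1}\,\theta_{\eta_2}, \quad\text{implying}\quad \operatorname{Im} m(E + i\eta_1) \le \frac{\eta_2}{\eta_1}\operatorname{Im} m(E + i\eta_2), \tag{15.11}$$

for any $\eta_2 \ge \eta_1 > 0$, where $\theta_\eta$ is the approximate delta function defined in (15.21) below. Hence, applying (15.11) with $\eta_2 = N^{-1+\varepsilon}$, (15.7) implies that for $z = E + i\eta$ with $\eta \le N^{-1+\varepsilon}$,

$$\mathbb{E}[\operatorname{Im} m^{\mathbf{v}}(z)]^k \le CN^{k\varepsilon}(N\eta)^{-k} \tag{15.12}$$

for any 𝑘 > 0 and any small 𝜀 > 0. Let $\eta = N^{-1-a}$, 𝑎 > 0, and consider 𝑛 = 1 for simplicity. By choosing 𝑐 = 2𝑎 and 𝜀 < 𝑎, we get

$$\begin{aligned}
\mathbb{E}\big|F(\operatorname{Im} m^{\mathbf{v}}(z)) - \operatorname{Im} m^{\mathbf{v}}(z)\big| &\le \mathbb{E}\operatorname{Im} m^{\mathbf{v}}(z)\,\mathbf{1}\big(\operatorname{Im} m^{\mathbf{v}}(z) \ge N^c\big)\\
&\le N^{-c(k-1)}\,\mathbb{E}[\operatorname{Im} m^{\mathbf{v}}(z)]^k\\
&\le CN^{k(a+\varepsilon) - c(k-1)} \le N^{-\delta},
\end{aligned}$$

provided that 𝑘 is large enough. Similar arguments can be used for general 𝑛, and this implies that the difference between the expectation of 𝐹 and that of the product is negligible. This proves that the condition (15.8) can be replaced by the condition (15.10).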
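The elementary inequality (15.11) and the truncation step after (15.12) can both be checked numerically. Pointwise, (15.11) reduces to $\eta_1^2x^2 \le \eta_2^2x^2$. The truncation step uses that $X\mathbf{1}(X \ge A) \le X^k/A^{k-1}$ pointwise for 𝑋 ≥ 0. The grid, the two values of 𝜂 and the exponential test variable 𝑋 below are arbitrary choices for illustration.

```python
import numpy as np

def theta(x, eta):
    """Approximate delta function theta_eta(x) = eta / (pi (x^2 + eta^2))."""
    return eta / (np.pi * (x ** 2 + eta ** 2))

x = np.linspace(-5.0, 5.0, 10_001)
eta1, eta2 = 1e-3, 1e-1                    # any pair eta2 >= eta1 > 0
ratio_ok = bool(np.all(theta(x, eta1) <= (eta2 / eta1) * theta(x, eta2) * (1 + 1e-12)))

# truncation step after (15.12): X 1(X >= A) <= X^k / A^(k-1) pointwise for X >= 0
rng = np.random.default_rng(4)
X = rng.exponential(size=100_000)
A_cut, k = 3.0, 6
lhs = float(np.mean(X * (X >= A_cut)))
rhs = float(np.mean(X ** k)) / A_cut ** (k - 1)
print(ratio_ok, lhs, rhs)
```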
Notice that we pass through the continuity of traces of Green functions as an intermediate step to get continuity of the correlation functions. If we were to follow the evolution of the correlation functions directly by differentiating them, higher derivatives of eigenvalues would be involved. From the formulas for derivatives of eigenvalues and eigenvectors (12.9), (12.10), higher derivatives of eigenvalues involve singularities of the form $(\lambda_i - \lambda_j)^{-n}$ for some positive integers 𝑛. These singularities are very difficult to control precisely. Our approach of using the Green function as an intermediate step completely avoids this difficulty. The reason is that the Green function has a natural regularization parameter, namely the imaginary part of the spectral parameter.

Proof of Theorem 15.1. Recall that the matrix OU process (15.2) can be solved by the formula (12.21). Hence, if the initial data is a Wigner matrix ensemble, the distribution of the matrix OU process at any time is again given by a Wigner matrix ensemble. More precisely, if we denote the initial Wigner matrix by $H_0$, then $H_t$ has the same distribution as $e^{-t/2}H_0 + (1 - e^{-t})^{1/2}H_G$. Hence, the rigidity holds in this case by (11.32), and (15.3) holds with any sufficiently small 𝜎 > 0; we may choose $\sigma = \varepsilon_1$. Recall that $p^{(n)}_{t,N}$ denotes the correlation functions of $H_t$. We now apply Theorem 15.3 with $H^{\mathbf{v}} = H_0$ and $H^{\mathbf{w}} = H_t$. The assumption (15.8) can be verified by (15.6) if $t \le N^{-1/2-\varepsilon}$ for any 𝜀 > 0. From (15.9), we have

$$\lim_{N\to\infty}\int_{E-b}^{E+b}\frac{\mathrm{d}E'}{2b}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol{\alpha}\,O(\boldsymbol{\alpha})\big(p^{(n)}_{0,N} - p^{(n)}_{t,N}\big)\Big(E' + \frac{\boldsymbol{\alpha}}{N}\Big) = 0. \tag{15.13}$$

This compares the correlation functions of $H_0$ and $H_t$ if 𝑡 is not too large. It remains to compare $H_t$ with $H_\infty = H_G$. By (12.24), for $t = N^{-1/2-\varepsilon}$ and $b \ge N^{-1/2+10\varepsilon}$, (15.13) holds with $p^{(n)}_{0,N}$ (the correlation functions of $H_0$) replaced by $p^{(n)}_{G,N}$. We have thus completed the proof of Theorem 15.1. □
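The explicit solution quoted in this proof makes the matrix OU process easy to sample at a fixed time. The sketch below assumes a real symmetric Bernoulli initial Wigner matrix with entries $\pm 1/\sqrt{N}$. It samples $H_t$ through the distributional identity $H_t \overset{d}{=} e^{-t/2}H_0 + (1 - e^{-t})^{1/2}H_G$. The function names are hypothetical. The printout illustrates that the off-diagonal variance $1/N$ is preserved along the flow, as used in the proof of Lemma 15.4.

```python
import numpy as np

def wigner_bernoulli(N, rng):
    """Real symmetric Wigner matrix with i.i.d. +-1/sqrt(N) entries on and above the diagonal."""
    S = rng.choice([-1.0, 1.0], size=(N, N))
    U = np.triu(S)                            # upper triangle including the diagonal
    return (U + np.triu(U, 1).T) / np.sqrt(N)

def goe(N, rng):
    """GOE matrix H_G with off-diagonal variance 1/N."""
    A = rng.normal(size=(N, N))
    return (A + A.T) / np.sqrt(2 * N)

def ou_at_time(H0, t, rng):
    """Sample H_t in distribution as e^{-t/2} H_0 + (1 - e^{-t})^{1/2} H_G."""
    return np.exp(-t / 2) * H0 + np.sqrt(1 - np.exp(-t)) * goe(H0.shape[0], rng)

rng = np.random.default_rng(5)
N = 300
H0 = wigner_bernoulli(N, rng)
t = N ** (-0.5)                               # a time on the scale relevant for (15.6)
Ht = ou_at_time(H0, t, rng)

off = ~np.eye(N, dtype=bool)
print("N * mean(h_ij^2) at t = 0:", N * np.mean(H0[off] ** 2))
print("N * mean(h_ij^2) at t > 0:", N * np.mean(Ht[off] ** 2))
```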

15.1. Proof of Theorem 15.2


The first ingredient to prove Theorem 15.2 is the following continuity estimate for the matrix OU process. To state it, we need to introduce the deformation of a matrix 𝐻 at certain matrix elements. For any 𝑖 ≤ 𝑗, let $\boldsymbol{\theta}^{ij}H$ denote a new matrix with the following matrix elements. If $\{k,\ell\} \ne \{i,j\}$, then $(\boldsymbol{\theta}^{ij}H)_{k\ell} = H_{k\ell}$. If $\{k,\ell\} = \{i,j\}$, then $(\boldsymbol{\theta}^{ij}H)_{k\ell} = \theta^{ij}_{k\ell}H_{k\ell}$ with some $0 \le \theta^{ij}_{k\ell} \le 1$. In other words, the deformation operation $\boldsymbol{\theta}^{ij}$ multiplies the matrix elements $H_{ij}$ and $H_{ji}$ by a factor between 0 and 1 and leaves all other matrix elements intact.

Lemma 15.4. Suppose that $H_0$ is a Wigner ensemble and fix 𝑡 ∈ [0, 1]. Let 𝑔 be a smooth function of the matrix elements $(h_{ij})_{i\le j}$ and set

$$M_t := \sup_{0\le s\le t}\ \sup_{i\le j}\ \sup_{\boldsymbol{\theta}^{ij}}\ \mathbb{E}\Big(\big(N^{3/2}|h_{ij}(s)|^3 + \sqrt{N}|h_{ij}(s)|\big)\,\big|\partial^3_{h_{ij}}g(\boldsymbol{\theta}^{ij}H_s)\big|\Big), \tag{15.14}$$

where the last supremum runs through all deformations $\boldsymbol{\theta}^{ij}$. Then,

$$\mathbb{E}g(H_t) - \mathbb{E}g(H_0) = O(t\sqrt{N})\,M_t.$$

This lemma also holds for generalized Wigner matrices. We refer the reader to [12] for the minor adjustment needed in this case.

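Proof. Denote $\partial_{ij} = \partial_{h_{ij}}$; notice that despite the two indices, this is still a first-order and not a second-order partial derivative. By Itô's formula, we have

$$\partial_t\,\mathbb{E}g(H_t) = -\frac{1}{2}\sum_{i\le j}\Big(\mathbb{E}\big(h_{ij}(t)\,\partial_{ij}g(H_t)\big) - \frac{1}{N}\,\mathbb{E}\big(\partial^2_{ij}g(H_t)\big)\Big).$$

A Taylor expansion of the first derivative $\partial_{ij}g$ in the direction $h_{ij}$ yields

$$\begin{aligned}
\mathbb{E}\big(h_{ij}(t)\,\partial_{ij}g(H_t)\big) &= \mathbb{E}\,h_{ij}(t)\,\partial_{ij}g\big|_{h_{ij}(t)=0} + \mathbb{E}\big(h_{ij}(t)^2\,\partial^2_{ij}g\big|_{h_{ij}(t)=0}\big) + O\Big(\sup_{\boldsymbol{\theta}^{ij}}\mathbb{E}\big|h_{ij}(t)^3\,\partial^3_{ij}g(\boldsymbol{\theta}^{ij}H_t)\big|\Big)\\
&= s_{ij}\,\mathbb{E}\big(\partial^2_{ij}g\big|_{h_{ij}(t)=0}\big) + O\Big(\sup_{\boldsymbol{\theta}^{ij}}\mathbb{E}\big|h_{ij}(t)^3\,\partial^3_{ij}g(\boldsymbol{\theta}^{ij}H_t)\big|\Big),\\
\mathbb{E}\big(\partial^2_{ij}g(H_t)\big) &= \mathbb{E}\big(\partial^2_{ij}g\big|_{h_{ij}(t)=0}\big) + O\Big(\sup_{\boldsymbol{\theta}^{ij}}\mathbb{E}\big|h_{ij}(t)\,\partial^3_{ij}g(\boldsymbol{\theta}^{ij}H_t)\big|\Big).
\end{aligned}$$

Here the shorthand notation $\partial_{ij}g|_{h_{ij}(t)=0}$ means $(\partial_{ij}g)(\tilde H_t)$, where $(\tilde H_t)_{k\ell} = (H_t)_{k\ell}$ if $\{k,\ell\} \ne \{i,j\}$ and $(\tilde H_t)_{ij} = (\tilde H_t)_{ji} = 0$. In the calculation above, we used the independence of the matrix elements, in particular, that $\tilde H_t$ is independent of $h_{ij}(t)$. We also used the fact that the OU process preserves the first and second moments, i.e., $\mathbb{E}h_{ij}(t) = \mathbb{E}h_{ij}(0) = 0$ and $s_{ij} := \mathbb{E}|h_{ij}(t)|^2 = \mathbb{E}|h_{ij}(0)|^2 = 1/N$. Thus we have

$$\partial_t\,\mathbb{E}g(H_t) = N^{1/2}\,O\Big(\sup_{i\le j}\ \sup_{\boldsymbol{\theta}^{ij}}\mathbb{E}\big(N^{3/2}|h_{ij}(t)|^3 + N^{1/2}|h_{ij}(t)|\big)\big|\partial^3_{ij}g(\boldsymbol{\theta}^{ij}H_t)\big|\Big).$$

Integration over time finishes the proof. □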

The second ingredient to prove Theorem 15.2 is an estimate of the type (15.3) that covers two further cases. It applies not only to diagonal matrix elements but also to off-diagonal ones, and to spectral parameters with imaginary parts slightly below $N^{-1}$.

Lemma 15.5. Suppose that for a Wigner matrix 𝐻 we have the following estimate:

$$\max_{1\le k\le N}\ \sup_{|E|\le 2-\kappa}\ \sup_{\eta\ge N^{-1-\varepsilon}}\Big|\operatorname{Im}\Big(\frac{1}{H - E \pm i\eta}\Big)_{kk}\Big| \prec N^{3\sigma+\varepsilon}. \tag{15.15}$$

Then, for any $\eta \ge N^{-1-\varepsilon}$, we have

$$\sup_{1\le k,\ell\le N}\ \sup_{|E|\le 2-\kappa}\Big|\Big(\frac{1}{H - E \pm i\eta}\Big)_{k\ell}\Big| \prec N^{3\sigma+\varepsilon}. \tag{15.16}$$

We remark that (15.16) could be strengthened by including the supremum over $\eta \ge N^{-1-\varepsilon}$. To obtain this, one first establishes the estimate for a fine grid of 𝜂's with spacing $N^{-10}$. One then extends the bound to all 𝜂, using that the Green functions are Lipschitz continuous in 𝜂 with a Lipschitz constant $\eta^{-2}$.

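Proof. Let $\lambda_m$ and $u_m$ denote the eigenvalues and eigenvectors of 𝐻; then by the definition of the Green function, we have

$$\Big|\Big(\frac{1}{H - z}\Big)_{jk}\Big| \le \sum_{m=1}^N\frac{|u_m(j)|\,|u_m(k)|}{|\lambda_m - z|} \le \Big[\sum_{m=1}^N\frac{|u_m(j)|^2}{|\lambda_m - z|}\Big]^{1/2}\Big[\sum_{m=1}^N\frac{|u_m(k)|^2}{|\lambda_m - z|}\Big]^{1/2}. \tag{15.17}$$

Define a dyadic decomposition

$$\begin{aligned}
U_n &= \{m : 2^{n-1}\eta \le |\lambda_m - E| < 2^n\eta\}, \qquad n = 1, \dots, n_0 := C\log N,\\
U_0 &= \{m : |\lambda_m - E| < \eta\}, \qquad U_\infty := \{m : 2^{n_0}\eta \le |\lambda_m - E|\},
\end{aligned} \tag{15.18}$$

and divide the summation over 𝑚 into $\bigcup_n U_n$:

$$\sum_{m=1}^N\frac{|u_m(j)|^2}{|\lambda_m - z|} = \sum_n\sum_{m\in U_n}\frac{|u_m(j)|^2}{|\lambda_m - z|} \le C\sum_n\sum_{m\in U_n}\operatorname{Im}\frac{|u_m(j)|^2}{\lambda_m - E - i2^n\eta} \le C\sum_n\operatorname{Im}\Big(\frac{1}{H - E - i2^n\eta}\Big)_{jj}.$$

Now using (15.15) we can control the right-hand side of (15.17) and conclude (15.16). □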
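The two inequalities in this proof can be checked on a sampled matrix. In the sketch below, the GOE matrix, the energy 𝐸 = 0.1, 𝜂 = 1/𝑁 and the indices are arbitrary choices. The sketch verifies the Cauchy–Schwarz step (15.17). It also verifies the dyadic bound with the explicit constant 𝐶 = 4, which rests on $1/|\lambda_m - z| \le 4\operatorname{Im}(\lambda_m - E - i2^n\eta)^{-1}$ for 𝑚 ∈ $U_n$. As an assumption of this example, $n_0$ is taken so large that $2^{n_0}\eta$ exceeds the spectral diameter, so the far shell $U_\infty$ is empty.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200
A = rng.normal(size=(N, N))
H = (A + A.T) / np.sqrt(2 * N)             # GOE test matrix (illustrative choice)
lam, U = np.linalg.eigh(H)                 # U[:, m] is the eigenvector u_m

E, eta = 0.1, 1.0 / N
z = E + 1j * eta
n0 = int(np.ceil(np.log2(10.0 / eta)))     # 2^{n0} eta >= 10 exceeds the spectral diameter here

def im_G_diag(j, y):
    """Im (H - E - i y)^{-1}_{jj}, computed from the spectral decomposition."""
    return float(np.sum(U[j, :] ** 2 * y / ((lam - E) ** 2 + y ** 2)))

def S(i):
    """sum_m |u_m(i)|^2 / |lambda_m - z|: the factors on the right of (15.17)."""
    return float(np.sum(U[i, :] ** 2 / np.abs(lam - z)))

j, k = 3, 17
G = np.linalg.inv(H - z * np.eye(N))
cs_ok = bool(abs(G[j, k]) <= np.sqrt(S(j) * S(k)) * (1 + 1e-10))   # (15.17)

# dyadic bound with C = 4: the shells U_0, ..., U_{n0} are compared with Im G at 2^n eta
dyadic_rhs = 4 * sum(im_G_diag(j, 2 ** n * eta) for n in range(n0 + 1))
print(cs_ok, S(j), dyadic_rhs)
```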

Now we can finish the proof of Theorem 15.2. First note the trivial bound

$$\operatorname{Im}\Big(\frac{1}{H - E - i\eta}\Big)_{jj} \le \frac{y}{\eta}\,\operatorname{Im}\Big(\frac{1}{H - E - iy}\Big)_{jj}, \qquad \eta \le y. \tag{15.19}$$

By (15.19) and (15.3), the assumption (15.15) in Lemma 15.5 holds. Therefore, the bounds (15.16) on the matrix elements are available.

Next, we first consider the specific function

$$g(H) = G_{ab}(z) \tag{15.20}$$

for a fixed index pair 𝑎, 𝑏, where $z = E + i\eta$ with $N^{-1-\varepsilon} \le \eta \le 1$ and $|E| \le 2 - \kappa$. This function is not of the form appearing in (15.6). However, once the comparison for $G_{ab}(z)$ is established, we take 𝑎 = 𝑏 and average over 𝑎 to get the normalized trace of 𝐺, i.e., the Stieltjes transform. This will then prove (15.6) for the function 𝐹(𝑥) = 𝑥, i.e., compare $\mathbb{E}m(t,z)$ with $\mathbb{E}m(0,z)$. The case of a polynomial 𝐹 with several arguments is analogous. Similarly, any function satisfying (15.4)–(15.5) can be sufficiently well approximated by a Taylor expansion.
Returning to the case (15.20), in order to apply Lemma 15.4 we need to bound the third derivatives of 𝑔(𝐻) to estimate $M_t$ from (15.14). For 𝑖 ≠ 𝑗 we have

$$\partial^3_{ij}G(z)_{ab} = -6\sum_{\boldsymbol{\alpha},\boldsymbol{\beta}}G(z)_{a\alpha_1}\,G(z)_{\beta_1\alpha_2}\,G(z)_{\beta_2\alpha_3}\,G(z)_{\beta_3 b},$$

where the sum runs over the ordered pairs $(\alpha_k, \beta_k) = (i,j)$ or $(j,i)$. By (15.16), with very high probability, the four factors

$$G(z)_{a\alpha_1}, \quad G(z)_{\beta_1\alpha_2}, \quad G(z)_{\beta_2\alpha_3}, \quad G(z)_{\beta_3 b}$$

are bounded by $N^{4\sigma+\varepsilon}$, provided $N^{-1-\varepsilon} \le \eta \le 1$. Consequently, uniformly in $E \in (-2+\kappa, 2-\kappa)$ and $N^{-1-\varepsilon} \le \eta \le 1$,

$$\partial^3_{ij}G(z)_{ab} = O(N^{20\sigma+5\varepsilon})$$

with very high probability. The same argument holds if some matrix elements are reduced by deformation; i.e., we have

$$|\partial^3_{ij}g(\boldsymbol{\theta}^{ij}H_s)| \le N^{20\sigma+5\varepsilon},$$

and thus (15.14) holds with $M_t = N^{20\sigma+6\varepsilon}$, using the bound $h_{ij} \prec N^{-1/2}$. Hence, by Lemma 15.4, we have proved that for any 𝑡 ≤ 1,

$$|\mathbb{E}g(H_t) - \mathbb{E}g(H_0)| \le CN^{20\sigma+6\varepsilon}(t\sqrt{N}).$$

This completes the proof of Theorem 15.2.
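The third derivative of $G_{ab}$ in the direction of the entry $h_{ij}$ can be checked against finite differences. Differentiating $(H + hV - z)^{-1}$ three times in ℎ gives $-6\,GVGVGVG$. Here 𝑉 is the symmetric matrix with ones at positions (𝑖, 𝑗) and (𝑗, 𝑖). Expanding the matrix product gives the sum over ordered pairs above. In the sketch, the matrix size, the spectral parameter 𝑧 = 0.3 + 𝑖 and the indices (with 𝑖 ≠ 𝑗 and 𝑎 = 𝑏 = 𝑖) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
N, i, j = 8, 1, 4                          # small example with i != j
a = b = i                                  # a diagonal entry G_aa, the case relevant for the trace
A = rng.normal(size=(N, N))
H = (A + A.T) / np.sqrt(2 * N)
z = 0.3 + 1.0j                             # Im z = 1 keeps the resolvent well conditioned

V = np.zeros((N, N))
V[i, j] = V[j, i] = 1.0                    # direction of the symmetric entry h_ij = h_ji

def G_ab(h):
    """(H + h V - z)^{-1}_{ab} as a function of the shift h of the entry h_ij."""
    return np.linalg.inv(H + h * V - z * np.eye(N))[a, b]

# central finite-difference approximation of the third derivative at h = 0
step = 1e-3
fd = (G_ab(2 * step) - 2 * G_ab(step) + 2 * G_ab(-step) - G_ab(-2 * step)) / (2 * step ** 3)

# closed form: -6 times the sum over ordered pairs (alpha_k, beta_k) in {(i, j), (j, i)}
G = np.linalg.inv(H - z * np.eye(N))
pairs = [(i, j), (j, i)]
formula = -6 * sum(G[a, a1] * G[b1, a2] * G[b2, a3] * G[b3, b]
                   for (a1, b1) in pairs for (a2, b2) in pairs for (a3, b3) in pairs)
print(fd, formula)
```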

15.2. Proof of the Correlation Function Comparison Theorem, Theorem 15.3
Define an approximate delta function at the scale 𝜂 by

$$\theta_\eta(x) := \frac{1}{\pi}\operatorname{Im}\frac{1}{x - i\eta}. \tag{15.21}$$

We will choose $\eta \sim N^{-1-a}$ with some small 𝑎 > 0, i.e., slightly smaller than the typical eigenvalue spacing. This means that an observable of the form $\theta_\eta$ has sufficient resolution to detect individual eigenvalues. Moreover, polynomials of such observables detect correlation functions. On the other hand, with $z = E + i\eta$,

$$\frac{1}{N}\sum_i\theta_\eta(\lambda_i - E) = \frac{1}{\pi}\operatorname{Im} m(z);$$

therefore expectation values of such observables are covered by the condition (15.8). The rest of the proof consists of making this idea precise. There are two technicalities to resolve. First, correlation functions involve distinct eigenvalues (see (15.25) below), while polynomials of the resolvent also include coinciding eigenvalues, i.e., they overcount. Thus an exclusion-inclusion formula will be needed. Second, although 𝜂 is much smaller than the relevant scale 1/𝑁, it still does not give pointwise information on the correlation functions. However, the correlation functions in (4.38) are identified only as a weak limit, i.e., tested against a continuous function 𝑂. The continuity of 𝑂 can be used to show that the exact correlation functions differ only negligibly from their versions smeared out on the scale $\eta \sim N^{-1-a}$. This last step requires an a priori upper bound on the density to ensure that not too many eigenvalues fall into an irrelevantly small interval. This bound is given in (15.7), and it will eventually be verified by the local semicircle law.
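The link between $\theta_\eta$ and the Stieltjes transform, $\frac1N\sum_i\theta_\eta(\lambda_i - E) = \frac1\pi\operatorname{Im} m(E + i\eta)$, can be checked on any point configuration. The sketch below uses a sampled GOE spectrum and $\eta = N^{-1-a}$ with 𝑎 = 0.1; both are illustrative choices. It also confirms that $\theta_\eta$ is a probability density, since its mass on [−𝐿, 𝐿] is $(2/\pi)\arctan(L/\eta)$.

```python
import numpy as np

def theta(x, eta):
    """theta_eta(x) = (1/pi) Im 1/(x - i eta) = eta / (pi (x^2 + eta^2)), as in (15.21)."""
    return np.imag(1.0 / (x - 1j * eta)) / np.pi

def stieltjes(lam, z):
    """m(z) = (1/N) sum_j 1/(lambda_j - z)."""
    return np.mean(1.0 / (lam - z))

rng = np.random.default_rng(8)
N = 500
A = rng.normal(size=(N, N))
lam = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))   # GOE spectrum, illustrative

a, E = 0.1, 0.2
eta = N ** (-1 - a)                        # slightly below the eigenvalue spacing 1/N
lhs = float(np.mean(theta(lam - E, eta)))
rhs = float(np.imag(stieltjes(lam, E + 1j * eta))) / np.pi
print(lhs, rhs)

# mass of theta_eta on [-L, L] is (2/pi) arctan(L / eta), which tends to 1
L = 10.0
print(2 * np.arctan(L / eta) / np.pi)
```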

Proof of Theorem 15.3. For notational simplicity, we give the detailed proof only for 𝑛 = 3, i.e., for three-point correlation functions; the proof is analogous for the general case.
Let $\eta = N^{-1-a}$ for some small 𝑎 > 0; by following the proof one may check that any $a \le c\min\{\delta, \sigma\}/n^2$ will do. Let 𝑂 be a compactly supported test function, and let

$$O_\eta(\beta_1, \beta_2, \beta_3) := \frac{1}{N^3}\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,O(\alpha_1, \alpha_2, \alpha_3)\,\theta_\eta\Big(\frac{\beta_1 - \alpha_1}{N}\Big)\cdots\theta_\eta\Big(\frac{\beta_3 - \alpha_3}{N}\Big)$$

be its smoothing on scale 𝑁𝜂. We note that for any nonnegative $C^1$ function 𝑂 with compact support, there is a constant depending on 𝑂 such that

$$O_\eta \le C\,O_{\eta'} \quad\text{for any } 0 \le \eta \le \eta' \le 1/N. \tag{15.22}$$

We can apply this bound with $\eta' = 1/N$ and combine it with (15.11) with $\eta_1 = 1/N$ and $\eta_2 = N^{-1+\varepsilon}$ for any small 𝜀 > 0 to obtain

$$O_\eta \le CN^{3\varepsilon}\,O_{N^{-1+\varepsilon}} \quad\text{for any } 0 \le \eta \le 1/N. \tag{15.23}$$

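After the change of variables $x_j = E + \beta_j/N$ and with $E_j := E + \alpha_j/N$, we have

$$\begin{aligned}
&\int_{\mathbb{R}^3}\mathrm{d}\beta_1\,\mathrm{d}\beta_2\,\mathrm{d}\beta_3\,O_\eta(\beta_1, \beta_2, \beta_3)\,p^{(3)}_{\mathbf{w},N}\Big(E + \frac{\beta_1}{N}, \dots, E + \frac{\beta_3}{N}\Big)\\
&\quad = \int_{\mathbb{R}^3}\mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,O(\alpha_1, \alpha_2, \alpha_3)\int_{\mathbb{R}^3}\mathrm{d}x_1\,\mathrm{d}x_2\,\mathrm{d}x_3\,p^{(3)}_{\mathbf{w},N}(x_1, x_2, x_3)\,\theta_\eta(x_1 - E_1)\,\theta_\eta(x_2 - E_2)\,\theta_\eta(x_3 - E_3).
\end{aligned} \tag{15.24}$$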

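By definition of the correlation function, for any fixed 𝐸, $\alpha_1, \alpha_2, \alpha_3$,

$$\begin{aligned}
&\int_{\mathbb{R}^3}\mathrm{d}x_1\,\mathrm{d}x_2\,\mathrm{d}x_3\,p^{(3)}_{\mathbf{w},N}(x_1, x_2, x_3)\,\theta_\eta(x_1 - E_1)\,\theta_\eta(x_2 - E_2)\,\theta_\eta(x_3 - E_3)\\
&\quad = \mathbb{E}^{\mathbf{w}}\frac{1}{N(N-1)(N-2)}\sum_{i\ne j\ne k}\theta_\eta\Big(\lambda_i - E - \frac{\alpha_1}{N}\Big)\,\theta_\eta\Big(\lambda_j - E - \frac{\alpha_2}{N}\Big)\,\theta_\eta\Big(\lambda_k - E - \frac{\alpha_3}{N}\Big),
\end{aligned} \tag{15.25}$$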

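where the sum runs over pairwise distinct indices $i, j, k$, and $\mathbb{E}^{\mathbf{w}}$ indicates expectation with respect to the $\mathbf{w}$-variables. By the inclusion-exclusion principle,

$$
\mathbb{E}^{\mathbf{w}}\frac{1}{N(N-1)(N-2)}\sum_{i\ne j\ne k}\theta_\eta(\lambda_i-E_1)\,\theta_\eta(\lambda_j-E_2)\,\theta_\eta(\lambda_k-E_3) = \mathbb{E}^{\mathbf{w}}A_1 + \mathbb{E}^{\mathbf{w}}A_2 + \mathbb{E}^{\mathbf{w}}A_3, \tag{15.26}
$$

where

$$
\begin{aligned}
A_1 &:= \frac{N^3}{N(N-1)(N-2)}\prod_{j=1}^{3}\Big[\frac{1}{N}\sum_i\theta_\eta(\lambda_i-E_j)\Big],\\
A_3 &:= \frac{2}{N(N-1)(N-2)}\sum_i\theta_\eta(\lambda_i-E_1)\,\theta_\eta(\lambda_i-E_2)\,\theta_\eta(\lambda_i-E_3),
\end{aligned}
$$

and $A_2 := B_1 + B_2 + B_3$ with $B_j$ defined as follows:

$$
B_3 := -\frac{1}{N(N-1)(N-2)}\sum_i\theta_\eta(\lambda_i-E_1)\,\theta_\eta(\lambda_i-E_2)\sum_k\theta_\eta(\lambda_k-E_3).
$$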

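Analogously, $B_1$ collects the terms with $j = k$ and $B_2$ the terms with $i = k$ in the triple sum (15.26). Since $\theta_\eta \ge 0$, each $B_j$ is bounded from above by $-A_3/2$ (the sum over the free index contains the diagonal term), so $A_2 + A_3 \le -A_3/2 \le 0$, and thus

$$
\int_{\mathbb{R}^3}\mathrm{d}x_1\,\mathrm{d}x_2\,\mathrm{d}x_3\,p^{(3)}_{\mathbf{w},N}(x_1,x_2,x_3)\,\theta_\eta(x_1-E_1)\,\theta_\eta(x_2-E_2)\,\theta_\eta(x_3-E_3) \le C\,\mathbb{E}^{\mathbf{w}}\prod_{j=1}^{3}\operatorname{Im} m(E_j+i\eta). \tag{15.27}
$$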

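The same bound holds for $p^{(3)}_{\mathbf{v},N}$ as well. Therefore, we can combine (15.23) and (15.7) to obtain, for any small $\varepsilon > 0$,

$$
\int_{\mathbb{R}^3}\mathrm{d}\beta_1\,\mathrm{d}\beta_2\,\mathrm{d}\beta_3\,O_\eta(\beta_1,\beta_2,\beta_3)\,\big(p^{(3)}_{\mathbf{w},N}+p^{(3)}_{\mathbf{v},N}\big)\Big(E+\frac{\beta_1}{N},\dots,E+\frac{\beta_3}{N}\Big) = O(N^{3\varepsilon}). \tag{15.28}
$$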
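The inclusion-exclusion step (15.26) and the sign information $A_2 + A_3 \le 0$ are purely combinatorial, so they can be sanity-checked on arbitrary nonnegative weights. The following Python sketch is only an illustration: the lists `f`, `g`, `h` are hypothetical stand-ins for $\theta_\eta(\lambda_i - E_1)$, $\theta_\eta(\lambda_j - E_2)$, $\theta_\eta(\lambda_k - E_3)$, and the common normalization $1/(N(N-1)(N-2))$ is dropped.

```python
import itertools
import random

def distinct_triple_sum(f, g, h):
    """Sum of f[i] * g[j] * h[k] over pairwise distinct indices i, j, k."""
    n = len(f)
    return sum(f[i] * g[j] * h[k] for i, j, k in itertools.permutations(range(n), 3))

def inclusion_exclusion_parts(f, g, h):
    """Unnormalized analogues of A_1, A_2 = B_1 + B_2 + B_3, and A_3 in (15.26)."""
    idx = range(len(f))
    a1 = sum(f) * sum(g) * sum(h)                     # all triples (i, j, k)
    b1 = -sum(f) * sum(g[j] * h[j] for j in idx)      # terms with j = k
    b2 = -sum(g) * sum(f[i] * h[i] for i in idx)      # terms with i = k
    b3 = -sum(f[i] * g[i] for i in idx) * sum(h)      # terms with i = j
    a3 = 2 * sum(f[i] * g[i] * h[i] for i in idx)     # i = j = k, added back twice
    return a1, b1 + b2 + b3, a3

random.seed(0)
f, g, h = ([random.random() for _ in range(7)] for _ in range(3))
a1, a2, a3 = inclusion_exclusion_parts(f, g, h)
lhs = distinct_triple_sum(f, g, h)
print(abs(lhs - (a1 + a2 + a3)), a2 + a3 <= 0)
```

The triple coincidence $i = j = k$ is subtracted three times by the pair terms and therefore added back twice, which is the origin of the factor $2$ in $A_3$.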
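To approximate $\mathbb{E}^{\mathbf{w}}B_3$, we define $\phi_{E_1,E_2}(x) := \theta_\eta(x-E_1)\,\theta_\eta(x-E_2)$. Recall $\eta = N^{-1-a}$ and let $\hat\eta := N^{-1-9a}$. Decompose $\theta_{\hat\eta} = \hat\theta_1 + \hat\theta_2$, where $\hat\theta_2(y) := \theta_{\hat\eta}(y)\,\mathbf{1}(|y| \ge \hat\eta N^{3a})$. Denote

$$
\tilde\theta := (1-\psi(a))^{-1}\,\hat\theta_1, \qquad \psi(a) := \int\hat\theta_2(y)\,\mathrm{d}y \le N^{-3a}, \tag{15.29}
$$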
180 15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

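so that $\int\tilde\theta = 1$. Using $|\phi'_{E_1,E_2}(x)| \le C\eta^{-2}\theta_\eta(x-E_1)$ and $|\phi''_{E_1,E_2}(x)| \le C\eta^{-3}\theta_\eta(x-E_1)$, we have

$$
\begin{aligned}
\big|\tilde\theta * \phi_{E_1,E_2}(x) - \phi_{E_1,E_2}(x)\big|
&\le \frac{1}{1-\psi(a)}\Big|\int\mathrm{d}y\,\hat\theta_1(y)\,\big[\phi_{E_1,E_2}(x-y)-\phi_{E_1,E_2}(x)\big]\Big|\\
&\le C\Big|\int\mathrm{d}y\,\hat\theta_1(y)\,\big[-\phi'_{E_1,E_2}(x)\,y + O(\phi''_{E_1,E_2}(x))\,y^2\big]\Big|\\
&\le C\,\frac{\hat\eta^2N^{6a}}{\eta^3}\,\theta_\eta(x-E_1),
\end{aligned}
$$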
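where we used the symmetry of $\hat\theta_1$, i.e., $\int\mathrm{d}y\,\hat\theta_1(y)\,y = 0$, and that $\hat\theta_1(y)$ is supported on $|y| \le \hat\eta N^{3a}$. Together with (15.11) with the choice $\eta' = N^{-1+\varepsilon}$ for some small $\varepsilon > 0$, we have

$$
\begin{aligned}
&\Big|\mathbb{E}\frac{1}{N^3}\sum_i\big[\tilde\theta*\phi_{E_1,E_2}(\lambda_i)-\phi_{E_1,E_2}(\lambda_i)\big]\sum_k\theta_\eta(\lambda_k-E_3)\Big|\\
&\qquad\le \frac{\hat\eta^2N^{6a}}{N\eta^3}\,\Big|\mathbb{E}\frac{1}{N^2}\sum_{i,k}\theta_\eta(\lambda_i-E_1)\,\theta_\eta(\lambda_k-E_3)\Big|\\
&\qquad\le \frac{\hat\eta^2N^{8a+2\varepsilon}}{N\eta^3}\,\Big|\mathbb{E}\frac{1}{N^2}\sum_{i,k}\theta_{N^{-1+\varepsilon}}(\lambda_i-E_1)\,\theta_{N^{-1+\varepsilon}}(\lambda_k-E_3)\Big|\\
&\qquad\le (N\hat\eta)^2N^{11a+2\varepsilon} \le N^{-6a},
\end{aligned}\tag{15.30}
$$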
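where we used the a priori bound (15.7) and then $\varepsilon \le a/2$ in the last two steps. Hence, recalling the minus sign in the definition of $B_3$, we can approximate $\mathbb{E}^{\mathbf{w}}B_3$ by

$$
\mathbb{E}^{\mathbf{w}}B_3 = -\,\mathbb{E}^{\mathbf{w}}\int\mathrm{d}y\,\phi_{E_1,E_2}(y)\,N^{-3}\sum_{i,k}\tilde\theta(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3) + O(N^{-a}). \tag{15.31}
$$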
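The exponent bookkeeping in the last two steps of (15.30) can be made explicit. The sketch below assumes that the use of (15.11) amounts to the elementary comparison $\theta_\eta \le (\eta'/\eta)\,\theta_{\eta'}$ for $\eta \le \eta'$, which follows directly from the formula for $\theta_\eta$. With $\eta = N^{-1-a}$, $\hat\eta = N^{-1-9a}$ and $\eta' = N^{-1+\varepsilon}$:

```latex
\begin{aligned}
\frac{\hat\eta^{2}}{N\eta^{3}}
  &= \frac{N^{-2-18a}}{N^{-2-3a}} = N^{-15a} = (N\hat\eta)^{2}N^{3a},
\qquad
\theta_\eta \le \frac{\eta'}{\eta}\,\theta_{\eta'} = N^{a+\varepsilon}\,\theta_{\eta'},\\
\frac{\hat\eta^{2}N^{6a}}{N\eta^{3}}\,\big(N^{a+\varepsilon}\big)^{2}
  &= (N\hat\eta)^{2}N^{11a+2\varepsilon} = N^{-7a+2\varepsilon} \le N^{-6a}
  \quad\Longleftrightarrow\quad \varepsilon \le a/2 .
\end{aligned}
```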

Recall $\theta_{\hat\eta} = \hat\theta_1 + \hat\theta_2$ and $\tilde\theta = (1-\psi(a))^{-1}\hat\theta_1$. We wish to replace $\tilde\theta$ on the right-hand side of (15.31) by $(1-\psi(a))^{-1}\theta_{\hat\eta}$ with an error $O(N^{-a})$. For this purpose, we need to prove that the additional contribution from $\hat\theta_2$ is bounded by $O(N^{-a})$.
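Since $\hat\theta_2(y) \le CN^{-3a}\,\theta_{N^{3a}\hat\eta}(y)$, we have

$$
\begin{aligned}
&\int\mathrm{d}y\,\phi_{E_1,E_2}(y)\,\mathbb{E}^{\mathbf{w}}N^{-3}\sum_{i,k}\hat\theta_2(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3)\\
&\qquad\le C\int\mathrm{d}y\,\phi_{E_1,E_2}(y)\,\mathbb{E}^{\mathbf{w}}N^{-3}\sum_{i,k}N^{-3a}\,\theta_{N^{3a}\hat\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3).
\end{aligned}\tag{15.32}
$$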

Now we show that the contribution of this error term to the right-hand side of (15.24) is negligible. Recalling $E_1 = E + \alpha_1/N$ and using $\int\mathrm{d}\alpha_1\,\theta_\eta(y-E_1) \le CN$,
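we have

$$
\begin{aligned}
&\Big|\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,O(\alpha_1,\alpha_2,\alpha_3)\int\mathrm{d}y\,\theta_\eta(y-E_1)\,\theta_\eta(y-E_2)\,\mathbb{E}^{\mathbf{w}}N^{-3}\sum_{i,k}N^{-3a}\,\theta_{N^{3a}\hat\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3)\Big|\\
&\qquad\le CN^{-3a}\int_{|\alpha_2|+|\alpha_3|\le C}\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,\mathbb{E}^{\mathbf{w}}N^{-2}\sum_i\theta_\eta*\theta_{N^{3a}\hat\eta}(\lambda_i-E_2)\sum_k\theta_\eta(\lambda_k-E_3)\\
&\qquad\le CN^{-3a}\int_{|\alpha_2|+|\alpha_3|\le C}\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,\mathbb{E}^{\mathbf{w}}N^{-2}\sum_i\theta_\eta(\lambda_i-E_2)\sum_k\theta_\eta(\lambda_k-E_3),
\end{aligned}\tag{15.33}
$$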

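where we have used $\theta_\eta*\theta_{\eta'} \le C\theta_\eta$ if $\eta > \eta'$, with $C$ independent of $\eta, \eta'$ (in fact $\theta_\eta*\theta_{\eta'} = \theta_{\eta+\eta'} \le 2\theta_\eta$). Notice that we have only used that $O$ is compactly supported and $\|O\|_\infty$ is bounded. Using (15.11) and (15.7), we can bound the last line by $N^{-a+C\varepsilon}$.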
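Two elementary facts about the Poisson kernel are used repeatedly above: the convolution identity $\theta_\eta * \theta_{\eta'} = \theta_{\eta+\eta'}$, which gives $\theta_\eta * \theta_{\eta'} \le 2\theta_\eta$ for $\eta' < \eta$, and the tail bound $\psi(a) \le N^{-3a}$ in (15.29). The Python sketch below checks both numerically. The values of $\eta$, $\eta'$, $N$, $a$ and the evaluation points are arbitrary choices. The convolution is computed by the substitution $y = \eta'\tan u$, which turns $\theta_{\eta'}(y)\,\mathrm{d}y$ into $\mathrm{d}u/\pi$.

```python
import math

def theta(x, eta):
    """Poisson kernel theta_eta(x) = eta / (pi * (x^2 + eta^2)); it integrates to 1."""
    return eta / (math.pi * (x * x + eta * eta))

def conv_theta(x, eta, eta_p, n=20_000):
    """(theta_eta * theta_eta_p)(x), using y = eta_p * tan(u) so that
    theta_eta_p(y) dy = du / pi on (-pi/2, pi/2); midpoint rule in u."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        u = -math.pi / 2 + (k + 0.5) * h
        total += theta(x - eta_p * math.tan(u), eta)
    return total * h / math.pi

def tail_mass(eta, L):
    """Mass of theta_eta outside [-L, L], i.e. 1 - (2/pi) * arctan(L / eta)."""
    return 1.0 - 2.0 / math.pi * math.atan(L / eta)

# Convolution identity and the bound theta_{eta+eta'} <= 2 theta_eta for eta' < eta.
eta, eta_p = 1.0, 0.25
xs = [-5.0, -1.0, 0.0, 0.3, 2.0, 10.0]
errs = [abs(conv_theta(x, eta, eta_p) - theta(x, eta + eta_p)) for x in xs]
ratio_ok = all(theta(x, eta + eta_p) <= 2 * theta(x, eta) for x in xs)

# Tail mass psi(a) of theta_{hat eta} on |y| >= hat_eta * N^{3a}, cf. (15.29).
N, a = 10**6, 0.05
eta_hat = N ** (-1 - 9 * a)
psi = tail_mass(eta_hat, eta_hat * N ** (3 * a))
print(max(errs), ratio_ok, psi, N ** (-3 * a))
```

After the substitution the integrand is smooth and $\pi$-periodic in $u$, so the plain midpoint rule is already very accurate.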
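As we will choose $\varepsilon \ll a$, neglecting the $C\varepsilon$ exponent, we can thus approximate the contribution of $\mathbb{E}^{\mathbf{w}}B_3$ to the right-hand side of (15.24) by

$$
\begin{aligned}
\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,O(\alpha_1,\alpha_2,\alpha_3)\,\mathbb{E}^{\mathbf{w}}B_3
&= -(1-\psi(a))^{-1}\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,O(\alpha_1,\alpha_2,\alpha_3)\\
&\quad\times\int\mathrm{d}y\,\phi_{E_1,E_2}(y)\,\mathbb{E}^{\mathbf{w}}N^{-3}\sum_{i,k}\theta_{\hat\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3) + O(N^{-a}).
\end{aligned}\tag{15.34}
$$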

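The same estimate holds if we replace the expectation $\mathbb{E}^{\mathbf{w}}$ by $\mathbb{E}^{\mathbf{v}}$. Recalling (15.8), we can thus estimate their difference by

$$
\begin{aligned}
\Big|\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,O(\alpha_1,\alpha_2,\alpha_3)\,\big[\mathbb{E}^{\mathbf{w}}-\mathbb{E}^{\mathbf{v}}\big]B_3\Big|
&\le C\int\mathrm{d}y\,\phi_{E_1,E_2}(y)\,\Big|\big[\mathbb{E}^{\mathbf{w}}-\mathbb{E}^{\mathbf{v}}\big]N^{-2}\sum_{i,k}\theta_{\hat\eta}(\lambda_i-y)\,\theta_\eta(\lambda_k-E_3)\Big| + O(N^{-a})\\
&\le C(N\eta)^{-1}N^{-\delta} + O(N^{-a}) = O(N^{-a}),
\end{aligned}\tag{15.35}
$$

provided that $a \le \delta/2$. Similar arguments can be used to prove that

$$
\int_{\mathbb{R}^3}\mathrm{d}\alpha_1\,\mathrm{d}\alpha_2\,\mathrm{d}\alpha_3\,O(\alpha_1,\alpha_2,\alpha_3)\,\big[\mathbb{E}^{\mathbf{w}}-\mathbb{E}^{\mathbf{v}}\big]\big[A_1+A_2+A_3\big] = O(N^{-a}). \tag{15.36}
$$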

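Recalling (15.24), we have thus proved that

$$
\int_{\mathbb{R}^3}\mathrm{d}\beta_1\,\mathrm{d}\beta_2\,\mathrm{d}\beta_3\,O_\eta(\beta_1,\beta_2,\beta_3)\,\big(p^{(3)}_{\mathbf{w},N}-p^{(3)}_{\mathbf{v},N}\big)\Big(E+\frac{\beta_1}{N},\dots,E+\frac{\beta_3}{N}\Big) = O(N^{-a}) \tag{15.37}
$$

for any compactly supported $O$ with $\|O\|_\infty$ bounded.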
In order to prove Theorem 15.3, it remains to replace $O_\eta$ with $O$ in (15.37); at this point we will use that $O$ is differentiable. For this purpose, we only need to bound the error

$$
\int_{\mathbb{R}^3}\mathrm{d}\beta_1\,\mathrm{d}\beta_2\,\mathrm{d}\beta_3\,(O-O_\eta)(\beta_1,\beta_2,\beta_3)\,p^{(3)}_{\mathbf{w},N}\Big(E+\frac{\beta_1}{N},\dots,E+\frac{\beta_3}{N}\Big) = O(N^{-a+\varepsilon}) \tag{15.38}
$$

and use the analogous estimate with $\mathbf{w}$ replaced by $\mathbf{v}$. One can easily check that there is a nonnegative $C^1$ function $\tilde O$ with compact support and $\|\tilde O\|_\infty \le 1$ such that

$$
|O - O_\eta| \le C(N\eta)\,\tilde O_\eta. \tag{15.39}
$$

Hence, (15.38) follows from applying (15.28) to $\tilde O$. Choosing $\varepsilon$ much smaller than $a$ completes the proof of Theorem 15.3. □
CHAPTER 16

Universality of Wigner Matrices in Small Energy Windows: GFT

We proved the universality of Wigner matrices in energy windows bigger than $N^{-1/2+\varepsilon}$ in Chapter 15. In this chapter, we use the Green function comparison theorem to extend it to energy windows of any size bigger than $N^{-1+\varepsilon}$. This is Step 3 in the three-step strategy introduced in Chapter 5. In the following, we first prove the Green function comparison theorem and then use it to prove our main universality theorem, Theorem 5.1.

16.1. Green Function Comparison Theorems


The main ingredient to prove this result is the following Green function comparison theorem. It states that the correlation functions of eigenvalues of two matrix ensembles asymptotically coincide on scale $1/N$ provided that the first four moments of all matrix elements of these two ensembles are almost identical. Here we do not assume that the real and imaginary parts are i.i.d.; hence, the $k$th moment of $h_{ij}$ is understood as the collection of numbers $\int\bar h^{s}h^{k-s}\,\nu_{ij}(\mathrm{d}h)$, $s = 0, 1, \dots, k$.
Before stating the result, we explain our notation. We will distinguish between the two ensembles by using different letters, $v_{ij}$ and $w_{ij}$, for their matrix elements, and we often use the notation $H^{(\mathbf v)}$ and $H^{(\mathbf w)}$ to indicate the difference. Alternatively, one could denote the matrix elements of $H$ by $h_{ij}$ and distinguish the two ensembles in the measure, especially in the expectation, by using the notation $\mathbb{E}^{\mathbf v}$ and $\mathbb{E}^{\mathbf w}$. Since the matrix elements will be replaced one by one from one distribution to the other, the latter notation would be cumbersome, so we follow the first convention in this section.
Theorem 16.1 (Green function comparison [69, Theorem 2.3]). Suppose that we have two generalized $N\times N$ Wigner matrices, $H^{(\mathbf v)}$ and $H^{(\mathbf w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform polynomial decay condition (5.6). Fix a bijective ordering map on the index set of the independent matrix elements,

$$
\phi:\{(i,j): 1 \le i \le j \le N\} \to \{1,\dots,\gamma(N)\}, \qquad \gamma(N) := \frac{N(N+1)}{2},
$$

and denote by $H_\gamma$ the generalized Wigner matrix whose matrix elements $h_{ij}$ follow the $\mathbf v$-distribution if $\phi(i,j) \le \gamma$ and the $\mathbf w$-distribution otherwise; in particular, $H^{(\mathbf w)} = H_0$ and $H^{(\mathbf v)} = H_{\gamma(N)}$. Let $\kappa > 0$ be arbitrary and suppose that, for any small parameter $\tau > 0$ and for any $y \ge N^{-1+\tau}$, we have the following estimate on
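the diagonal elements of the resolvent:

$$
\max_{0\le\gamma\le\gamma(N)}\ \max_{1\le k\le N}\ \max_{|E|\le 2-\kappa}\ \Big|\Big(\frac{1}{H_\gamma-E-iy}\Big)_{kk}\Big| \prec N^{2\tau}, \tag{16.1}
$$

with some constants $C, c$ depending only on $\tau, \kappa$. Moreover, we assume that the first four moments of $v_{ij}$ and $w_{ij}$ satisfy

$$
\big|\mathbb{E}\,\bar v_{ij}^{\,u}v_{ij}^{\,s-u} - \mathbb{E}\,\bar w_{ij}^{\,u}w_{ij}^{\,s-u}\big| \le N^{-\delta-2+s/2}, \qquad s = 0,1,2,3,4, \quad u = 0,1,\dots,s, \tag{16.2}
$$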
for some given $\delta > 0$. Let $\varepsilon > 0$ be arbitrary, and choose an $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$. Consider any sequence of complex parameters $z_j = E_j \pm i\eta$, $j = 1,\dots,n$, with $|E_j| \le 2-2\kappa$ and with an arbitrary choice of the $\pm$ signs. Let $G^{(\mathbf v)}(z) = (H^{(\mathbf v)}-z)^{-1}$ denote the resolvent, and let $F(x_1,\dots,x_n)$ be a function such that for any multi-index $\alpha = (\alpha_1,\dots,\alpha_n)$ with $1 \le |\alpha| \le 5$ and for any sufficiently small $\varepsilon' > 0$, we have

$$
\max\Big\{|\partial^\alpha F(x_1,\dots,x_n)| : \max_j|x_j| \le N^{\varepsilon'}\Big\} \le N^{C_0\varepsilon'} \tag{16.3}
$$

and

$$
\max\Big\{|\partial^\alpha F(x_1,\dots,x_n)| : \max_j|x_j| \le N^{2}\Big\} \le N^{C_0} \tag{16.4}
$$

for some constant $C_0$.


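Then there is a constant $C_1$, depending on $\vartheta$, $\sum_m k_m$, and $C_0$, such that for any $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$ and for any choices of the signs in the imaginary part of $z_{j_m}$, we have

$$
\Big|\mathbb{E}F\big(G^{(\mathbf v)}_{a_1b_1}(z_1),\dots,G^{(\mathbf v)}_{a_nb_n}(z_n)\big) - \mathbb{E}F\big(G^{(\mathbf v)}\to G^{(\mathbf w)}\big)\Big| \le C_1N^{-1/2+C_1\varepsilon} + C_1N^{-\delta+C_1\varepsilon}, \tag{16.5}
$$

where the arguments of $F$ in the second term are changed from the Green functions of $H^{(\mathbf v)}$ to those of $H^{(\mathbf w)}$ and all other parameters remain unchanged.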
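Moreover, for any $|E| \le 2-2\kappa$, any $n \ge 1$, and any compactly supported smooth test function $O:\mathbb{R}^n\to\mathbb{R}$, we have

$$
\lim_{N\to\infty}\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol\alpha\,O(\boldsymbol\alpha)\,\big(p^{(n)}_{\mathbf v,N}-p^{(n)}_{\mathbf w,N}\big)\Big(E+\frac{\boldsymbol\alpha}{N}\Big) = 0, \tag{16.6}
$$

where $p^{(n)}_{\mathbf v,N}$ and $p^{(n)}_{\mathbf w,N}$ denote the $n$-point correlation functions of the $\mathbf v$- and $\mathbf w$-ensembles, respectively.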
Remark 1. We formulated Theorem 16.1 for functions of traces of monomials of the Green function because this is the form we need in the application. However, the result (and the proof we are going to present) holds directly for matrix elements of monomials of Green functions as well (for the precise statement, see [69]). We also remark that Theorem 16.1 holds for generalized Wigner matrices if $C_{\sup} = \sup_{ij}Ns_{ij} < \infty$. The positive lower bound on the variances, $C_{\inf} > 0$ in (2.6), is not necessary for this theorem.

Remark 2. Although we state Theorem 16.1 for Hermitian and symmetric ensembles, similar results hold for real and complex sample covariance ensembles; the modification of the proof is straightforward.
Proof of Theorem 16.1 (Green Function Comparison Theorem). The basic idea is to estimate, by a resolvent expansion, the effect on the resolvent of changing the matrix elements one by one. Since each matrix element has a typical size of $N^{-1/2}$ and the resolvents are essentially bounded thanks to (16.1), a resolvent expansion up to the fourth order identifies the change caused by each replacement with a precision $O(N^{-5/2})$ (modulo factors of order $N^{O(\tau)}$). The expectation values of the terms up to fourth order involve only the first four moments of the single-entry distribution, which can be directly compared. The error terms remain negligible even after summing them over the $\gamma(N) \approx N^2/2$ comparison steps needed to replace all matrix elements.
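The error budget described above can be tallied explicitly. Take as given the per-step bound derived at the end of this proof. The $s$th moment enters with the prefactor $N^{-s/2}$, by (16.2) it is matched up to $N^{-\delta-2+s/2}$, and there are $\gamma(N) = N(N+1)/2$ replacement steps:

```latex
\begin{aligned}
N^{-s/2}\cdot N^{-\delta-2+s/2} &= N^{-2-\delta} \qquad (0 \le s \le 4),\\
\sum_{\gamma=1}^{\gamma(N)}\Big(CN^{-5/2+C\varepsilon} + CN^{-2-\delta+C\varepsilon}\Big)
  &= \frac{N(N+1)}{2}\Big(CN^{-5/2+C\varepsilon} + CN^{-2-\delta+C\varepsilon}\Big)
  \le C'N^{-1/2+C\varepsilon} + C'N^{-\delta+C\varepsilon}.
\end{aligned}
```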
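Now we turn to the one-by-one replacement. For notational simplicity, we will consider the case when the test function $F$ has only $n = 1$ variable and $k_1 = 1$; i.e., we consider the trace of a first-order monomial; the general case follows analogously. Consider the telescoping sum of differences of expectations

$$
\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf v)}-z}\Big) - \mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf w)}-z}\Big) = \sum_{\gamma=1}^{\gamma(N)}\Big[\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma-z}\Big) - \mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big)\Big]. \tag{16.7}
$$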

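Let $E^{(ij)}$ denote the matrix whose matrix elements are $0$ everywhere except at the $(i,j)$ position, where it is $1$, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma \ge 1$ and let $(i,j)$ be determined by $\phi(i,j) = \gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$. These two matrices differ only in the $(i,j)$ and $(j,i)$ matrix elements. Since $H_\gamma$ carries the $\mathbf v$-distribution at position $(i,j)$ while $H_{\gamma-1}$ still carries the $\mathbf w$-distribution there, they can be written as

$$
H_\gamma = Q + \frac{1}{\sqrt N}V, \quad V := v_{ij}E^{(ij)} + v_{ji}E^{(ji)}, \qquad
H_{\gamma-1} = Q + \frac{1}{\sqrt N}W, \quad W := w_{ij}E^{(ij)} + w_{ji}E^{(ji)},
$$

with a matrix $Q$ that has zero matrix elements at the $(i,j)$ and $(j,i)$ positions, and where we set $v_{ji} := \bar v_{ij}$ for $i < j$ and similarly for $w$. Define the Green functions

$$
R := \frac{1}{Q-z}, \qquad S := \frac{1}{H_\gamma-z}.
$$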
We first claim that the estimate (15.16) holds for the Green function $R$ as well. To see this, we have, from the resolvent expansion,

$$
R = S + N^{-1/2}SVS + \dots + N^{-9/2}(SV)^9S + N^{-5}(SV)^{10}R.
$$

Since $V$ has at most two nonzero elements, when computing the $(k,\ell)$ matrix element of this matrix identity, each term is a finite sum involving matrix
elements of $S$ or $R$ and $v_{ij}$, e.g., $(SVS)_{k\ell} = S_{ki}v_{ij}S_{j\ell} + S_{kj}v_{ji}S_{i\ell}$. Using the bound (15.16) for the matrix elements of $S$, the subexponential decay of $v_{ij}$, and the trivial bound $|R_{ij}| \le \eta^{-1}$, we obtain that the estimate (15.16) holds for $R$.
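We can now start proving the main result by comparing the resolvents of $H_{\gamma-1}$ and $H_\gamma$ with the resolvent $R$ of the reference matrix $Q$. By the resolvent expansion,

$$
S = R - N^{-1/2}RVR + N^{-1}(RV)^2R - N^{-3/2}(RV)^3R + N^{-2}(RV)^4R - N^{-5/2}(RV)^5S, \tag{16.8}
$$

so we can write

$$
\frac{1}{N}\operatorname{Tr}S = \hat R + \xi, \qquad \xi := \sum_{m=1}^{4}N^{-m/2}\hat R^{(m)}_{\mathbf v} + N^{-5/2}\Omega_{\mathbf v},
$$

with

$$
\hat R := \frac{1}{N}\operatorname{Tr}R, \qquad \hat R^{(m)}_{\mathbf v} := (-1)^m\frac{1}{N}\operatorname{Tr}(RV)^mR, \qquad \Omega_{\mathbf v} := -\frac{1}{N}\operatorname{Tr}(RV)^5S. \tag{16.9}
$$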
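The resolvent expansion (16.8) is an exact algebraic identity, obtained by iterating $S = R - N^{-1/2}RVS$ five times, so it can be checked numerically on a small matrix. The following Python sketch is only an illustration under our own choices. The size $N = 6$, the spectral parameter $z$, the replaced position $(i,j) = (1,4)$ (0-based) and the random Gaussian entries are arbitrary, and a small Gauss-Jordan inverse stands in for a linear algebra library.

```python
import random

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def mat_axpy(A, B, c):
    """A + c * B, entrywise."""
    return [[x + c * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(row) + [1.0 if r == c else 0.0 for c in range(n)] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

random.seed(1)
N, i, j = 6, 1, 4                      # matrix size and the replaced position (i < j)
z = 0.3 + 0.5j                         # spectral parameter with Im z > 0
c = N ** -0.5
Q = [[0j] * N for _ in range(N)]       # Hermitian Q with zero (i, j) and (j, i) entries
for k in range(N):
    Q[k][k] = complex(random.gauss(0, 1) * c)
    for l in range(k + 1, N):
        if (k, l) != (i, j):
            Q[k][l] = complex(random.gauss(0, 1), random.gauss(0, 1)) * c
            Q[l][k] = Q[k][l].conjugate()
v = complex(random.gauss(0, 1), random.gauss(0, 1))
V = [[0j] * N for _ in range(N)]
V[i][j], V[j][i] = v, v.conjugate()    # V = v_ij E^(ij) + conj(v_ij) E^(ji)

I = [[1.0 if r == s else 0.0 for s in range(N)] for r in range(N)]
R = inverse(mat_axpy(Q, I, -z))                  # R = (Q - z)^{-1}
S = inverse(mat_axpy(mat_axpy(Q, V, c), I, -z))  # S = (Q + N^{-1/2} V - z)^{-1}

# Right-hand side of (16.8): sum_{m=0}^{4} (-c)^m (RV)^m R + (-c)^5 (RV)^5 S.
RV = mat_mul(R, V)
term, rhs = R, R
for m in range(1, 5):
    term = mat_mul(RV, term)                     # (RV)^m R
    rhs = mat_axpy(rhs, term, (-c) ** m)
tail = S
for _ in range(5):
    tail = mat_mul(RV, tail)                     # (RV)^5 S
rhs = mat_axpy(rhs, tail, (-c) ** 5)
err = max(abs(S[r][s] - rhs[r][s]) for r in range(N) for s in range(N))
print(f"max entrywise deviation from (16.8): {err:.2e}")
```

Since the identity is exact, the deviation is only floating-point roundoff. In the proof, the content lies in the size estimates of the individual terms, not in the identity itself.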
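For each diagonal element in the computation of the traces in (16.9), the contribution to $\hat R$, $\hat R^{(m)}_{\mathbf v}$, and $\Omega_{\mathbf v}$ is a sum of a few terms, e.g.,

$$
\hat R^{(2)}_{\mathbf v} = \frac{1}{N}\sum_k\big[R_{ki}v_{ij}R_{jj}v_{ji}R_{ik} + R_{ki}v_{ij}R_{ji}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ii}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ij}v_{ji}R_{ik}\big],
$$

and similar formulas hold for the other terms. Then, by Taylor expansion,

$$
\begin{aligned}
\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma-z}\Big) &= \mathbb{E}F(\hat R+\xi)\\
&= \mathbb{E}\Big[F(\hat R) + F'(\hat R)\xi + \frac{1}{2}F''(\hat R)\xi^2 + \dots + \frac{1}{5!}F^{(5)}(\hat R+\xi')\xi^5\Big]\\
&= \sum_{m=0}^{5}N^{-m/2}\,\mathbb{E}A^{(m)}_{\mathbf v},
\end{aligned}\tag{16.10}
$$

where $\xi'$ is a number between $0$ and $\xi$ that depends on $\hat R$ and $\xi$, and the $A^{(m)}_{\mathbf v}$'s are defined as

$$
A^{(0)}_{\mathbf v} = F(\hat R), \qquad A^{(1)}_{\mathbf v} = F'(\hat R)\hat R^{(1)}_{\mathbf v}, \qquad A^{(2)}_{\mathbf v} = \frac{1}{2}F''(\hat R)\big(\hat R^{(1)}_{\mathbf v}\big)^2 + F'(\hat R)\hat R^{(2)}_{\mathbf v},
$$

and similarly for $A^{(3)}_{\mathbf v}$ and $A^{(4)}_{\mathbf v}$. Finally,

$$
A^{(5)}_{\mathbf v} = F'(\hat R)\Omega_{\mathbf v} + \frac{1}{5!}F^{(5)}(\hat R+\xi')\big(\hat R^{(1)}_{\mathbf v}\big)^5 + \cdots.
$$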

The expectation values of the terms $A^{(m)}_{\mathbf v}$, $m \le 4$, with respect to $v_{ij}$ are determined by the first four moments of $v_{ij}$; e.g.,

$$
\begin{aligned}
\mathbb{E}A^{(2)}_{\mathbf v} &= F'(\hat R)\Big[\frac{1}{N}\sum_kR_{ki}R_{jj}R_{ik} + \cdots\Big]\mathbb{E}|v_{ij}|^2
+ F''(\hat R)\Big[\frac{1}{N^2}\sum_{k,\ell}R_{ki}R_{jk}R_{\ell j}R_{i\ell}\Big]\mathbb{E}|v_{ij}|^2\\
&\quad + F'(\hat R)\Big[\frac{1}{N}\sum_kR_{ki}R_{ji}R_{jk}\Big]\mathbb{E}v_{ij}^2
+ \frac{1}{2}F''(\hat R)\Big[\frac{1}{N^2}\sum_{k,\ell}R_{ki}R_{j\ell}R_{\ell i}R_{jk}\Big]\mathbb{E}v_{ij}^2 + \cdots.
\end{aligned}
$$

Note that the coefficients involve up to four derivatives of $F$ and normalized sums of matrix elements of $R$. Using the estimate (15.16) for $R$ and the derivative bounds (16.3) for the typical values of $\hat R$, we see that all these coefficients are bounded by $N^{C(\tau+\varepsilon)}$ with very high probability, where $C$ is an explicit constant. We use the bound (16.4) for the extreme values of $\hat R$, but this event has a very small probability by (15.16). Therefore, the coefficients of the moments $\mathbb{E}\,\bar v_{ij}^{\,s}v_{ij}^{\,u}$, $u+s \le 4$, in the quantities $A^{(0)}_{\mathbf v},\dots,A^{(4)}_{\mathbf v}$ are essentially bounded, modulo a factor $N^{C(\tau+\varepsilon)}$. Notice that the $k$th moment of $v_{ij}$ appears only in the $m = k$ term, which already has a prefactor $N^{-k/2}$ in (16.10).
Finally, we have to estimate the error term $A^{(5)}_{\mathbf v}$. All terms without $\Omega$ can be dealt with as before: after estimating the derivatives of $F$ by $N^{C(\tau+\varepsilon)}$, one can take the expectation with respect to $v_{ij}$, which is independent of $R$. For the terms involving $\Omega$, one can argue similarly by appealing to the fact that the matrix elements of $S$ are also essentially bounded by $N^{C(\tau+\varepsilon)}$ (see (15.16)) and that $v_{ij}$ has subexponential decay. Alternatively, one can use the Hölder inequality to decouple $S$ from the rest and use (15.16) directly, e.g.,

$$
\begin{aligned}
\mathbb{E}\big|F'(\hat R)\Omega\big| &= \mathbb{E}\Big|F'(\hat R)\frac{1}{N}\operatorname{Tr}(RV)^5S\Big|\\
&\le \frac{1}{N}\Big[\mathbb{E}\big(F'(\hat R)\big)^2\operatorname{Tr}SS^*\Big]^{1/2}\Big[\mathbb{E}\operatorname{Tr}(RV)^5(VR^*)^5\Big]^{1/2}\\
&\le CN^{-5/2+C(\tau+\varepsilon)}.
\end{aligned}
$$
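Note that exactly the same perturbation expansion holds for the resolvent of $H_{\gamma-1}$ with $A_{\mathbf v}$ replaced by $A_{\mathbf w}$; i.e., $v_{ij}$ is replaced with $w_{ij}$ everywhere. By the moment matching condition (16.2), the expectation values satisfy

$$
N^{-m/2}\big|\mathbb{E}A^{(m)}_{\mathbf v} - \mathbb{E}A^{(m)}_{\mathbf w}\big| \le N^{C(\tau+\varepsilon)-\delta-2}
$$

for $m \le 4$ in (16.10). Choosing $\tau = \varepsilon$, we have

$$
\Big|\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma-z}\Big) - \mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big)\Big| \le CN^{-5/2+C\varepsilon} + CN^{-2-\delta+C\varepsilon}.
$$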

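After summing up in (16.7), we have thus proved that

$$
\Big|\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf v)}-z}\Big) - \mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(\mathbf w)}-z}\Big)\Big| \le CN^{-1/2+C\varepsilon} + CN^{-\delta+C\varepsilon}.
$$

The proof can be easily generalized to functions of several variables. This concludes the proof of Theorem 16.1. □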

16.2. Conclusion of the Three-Step Strategy


In this section we put the previous results together to prove our main result, Theorem 5.1. Recall that the main result of Step 2, Theorem 12.4, asserts the universality of the Gaussian divisible ensemble. However, in order to reach energy windows of size $N^{-1+\varepsilon}$, we need to add a substantial Gaussian component. In terms of DBM, we need the time $t \ge N^{-c\varepsilon}$ for some $c > 0$. The eigenvalue continuity result of the previous chapter is unfortunately restricted to $t \le N^{-1-\varepsilon}$. To overcome this limitation of the continuity argument, we will approximate the Wigner ensemble by a suitably chosen Gaussian divisible ensemble. We call this step “approximation by a Gaussian divisible ensemble.”
We first review the classical moment matching problem for real random variables. For any real random variable $\xi$, denote by $m_k(\xi) = \mathbb{E}\xi^k$ its $k$th moment. By the Schwarz inequality, the moments $m_1, m_2, \dots$ are not arbitrary numbers, e.g., $m_1^2 \le m_2$ and $m_2^2 \le m_4$, but there are more subtle relations. For example, if $m_1 = 0$, then

$$
m_4m_2 - m_3^2 \ge m_2^3, \tag{16.11}
$$

which can be obtained from

$$
m_3^2 = [\mathbb{E}\xi^3]^2 = [\mathbb{E}\xi(\xi^2-1)]^2 \le [\mathbb{E}\xi^2]\,[\mathbb{E}(\xi^2-1)^2] = m_2(m_4 - 2m_2 + 1)
$$

by noticing that (16.11) is scale invariant, so that it is sufficient to prove it for $m_2 = 1$. In fact, it is easy to see that equality holds in (16.11) if and only if the support of $\xi$ consists of exactly two points (apart from the trivial case $\xi \equiv 0$).
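Inequality (16.11) and its equality case can be checked on small discrete laws. In this Python sketch the three test distributions are arbitrary centered examples of our own choosing. The two two-point laws should give equality, and the three-point law a strict inequality.

```python
def moments(law, kmax=4):
    """Moments m_1..m_kmax of a finite law given as [(value, probability), ...]."""
    return [sum(p * x ** k for x, p in law) for k in range(1, kmax + 1)]

def gap_16_11(law):
    """m4*m2 - m3^2 - m2^3, which is >= 0 by (16.11) for centered laws."""
    m1, m2, m3, m4 = moments(law)
    if abs(m1) > 1e-12:
        raise ValueError("(16.11) is stated for m1 = 0")
    return m4 * m2 - m3 ** 2 - m2 ** 3

rademacher = [(1.0, 0.5), (-1.0, 0.5)]                  # two points, symmetric
two_point = [(2.0, 1 / 3), (-1.0, 2 / 3)]                # two points, asymmetric
three_point = [(1.0, 0.25), (-1.0, 0.25), (0.0, 0.5)]    # three points
for name, law in [("rademacher", rademacher), ("two_point", two_point),
                  ("three_point", three_point)]:
    print(name, gap_16_11(law))
```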
This restriction shows that, given a sequence of four admissible moments $m_1 = 0$, $m_2 = 1$, $m_3$, $m_4$, there may not exist a Gaussian divisible random variable $\xi$ with these moments. For example, the moment sequence $(m_1, m_2, m_3, m_4) = (0, 1, 0, 1)$ uniquely characterizes the standard Bernoulli variable ($\xi = \pm 1$ with probability $1/2$ each). However, if we allow a bit of room in the fourth moment, then one can match any four admissible moments by a Gaussian divisible variable with a small Gaussian component. This is the content of the next lemma [69, Lemma 6.5]. This matching idea in the context of random matrices appeared earlier in [131]. For simplicity, we formulate it in the case of real variables; the extension to complex variables is standard.
Lemma 16.2 (Moment matching). Let $m_3$ and $m_4$ be two real numbers such that

$$
m_4 - m_3^2 - 1 \ge 0, \qquad m_4 \le C_2, \tag{16.12}
$$
for some positive constant $C_2$. Let $\xi^{G}$ be a Gaussian random variable with mean $0$ and variance $1$. Then, for any sufficiently small $\gamma > 0$ (depending on $C_2$), there exists a real random variable $\xi_\gamma$ with subexponential decay that is independent of $\xi^{G}$ such that the first four moments of

$$
\xi' = (1-\gamma)^{1/2}\xi_\gamma + \gamma^{1/2}\xi^{G} \tag{16.13}
$$

are $m_1(\xi') = 0$, $m_2(\xi') = 1$, $m_3(\xi') = m_3$, and $m_4(\xi')$ with

$$
|m_4(\xi') - m_4| \le C\gamma \tag{16.14}
$$

for some $C$ depending on $C_2$.
Proof. We first construct a random variable $X$ with the first four moments given by $0, 1, m_3, m_4$ satisfying $m_4 - m_3^2 - 1 \ge 0$. We take the law of $X$ to be of the form

$$
p\,\delta_a + q\,\delta_{-b} + (1-p-q)\,\delta_0,
$$

where $a, b > 0$ and $p, q \ge 0$ are parameters satisfying $p + q \le 1$. The conditions $m_1(X) = 0$ and $m_2(X) = 1$ imply

$$
p = \frac{1}{a(a+b)}, \qquad q = \frac{1}{b(a+b)}.
$$

Furthermore, the condition $p + q \le 1$ reads $ab \ge 1$. By an explicit computation, we find

$$
m_3(X) = a - b, \qquad m_4(X) = m_3(X)^2 + ab. \tag{16.15}
$$

Clearly, the condition $m_4 - m_3^2 - 1 \ge 0$ implies that (16.15) has a solution with $ab \ge 1$. This proves the existence of a random variable supported on at most three points with given first four moments $0, 1, m_3, m_4$ satisfying $m_4 - m_3^2 - 1 \ge 0$.
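The explicit three-point construction in this first step is easy to implement: solve $a - b = m_3$, $ab = m_4 - m_3^2$ for $a, b > 0$ and set $p, q$ as above. The Python sketch below does this and recomputes the first four moments. The target values $m_3 = 0.7$, $m_4 = 2.5$ are arbitrary admissible examples.

```python
import math

def three_point_law(m3, m4):
    """Law p*delta_a + q*delta_{-b} + (1-p-q)*delta_0 with moments 0, 1, m3, m4.
    Solves a - b = m3 and ab = m4 - m3^2 >= 1, cf. (16.15)."""
    s = m4 - m3 ** 2
    if s < 1:
        raise ValueError("need m4 - m3^2 - 1 >= 0")
    a = (m3 + math.sqrt(m3 ** 2 + 4 * s)) / 2   # positive root of a^2 - m3*a - s = 0
    b = a - m3                                  # also positive, since s > 0
    p, q = 1 / (a * (a + b)), 1 / (b * (a + b))
    return [(a, p), (-b, q), (0.0, 1 - p - q)]  # 1 - p - q = 1 - 1/(ab) >= 0

law = three_point_law(m3=0.7, m4=2.5)
moms = [sum(w * x ** k for x, w in law) for k in range(1, 5)]
print(law)
print(moms)
```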
Our main task is to construct a Gaussian divisible distribution that satisfies the approximate four-moment matching conditions stated in the lemma. For any real random variable $\zeta$, independent of $\xi^{G}$ and with first four moments $0$, $1$, $m_3(\zeta)$, and $m_4(\zeta) < \infty$, the first four moments of

$$
\zeta' = (1-\gamma)^{1/2}\zeta + \gamma^{1/2}\xi^{G} \tag{16.16}
$$

are $0$, $1$,

$$
m_3(\zeta') = (1-\gamma)^{3/2}m_3(\zeta), \tag{16.17}
$$

and

$$
m_4(\zeta') = (1-\gamma)^2m_4(\zeta) + 6\gamma - 3\gamma^2. \tag{16.18}
$$
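Since any four admissible moments can be matched by a random variable as shown above, for any $\gamma \in (0,1)$ there exists a real random variable $\xi_\gamma$ whose first four moments are

$$
0, \quad 1, \quad m_3(\xi_\gamma) = (1-\gamma)^{-3/2}m_3, \quad\text{and}\quad m_4(\xi_\gamma) = m_3(\xi_\gamma)^2 + (m_4 - m_3^2) \tag{16.19}
$$

(note that $m_4(\xi_\gamma) - m_3(\xi_\gamma)^2 - 1 = m_4 - m_3^2 - 1 \ge 0$, so these moments are admissible). With $m_4 \le C_2$, we have $m_3^2 \le C_2^{3/2}$; thus,

$$
|m_4(\xi_\gamma) - m_4| \le C\gamma \tag{16.20}
$$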
for some $C$ depending on $C_2$. Hence, with (16.17) and (16.18), we obtain that $\xi' = (1-\gamma)^{1/2}\xi_\gamma + \gamma^{1/2}\xi^{G}$ satisfies $m_3(\xi') = m_3$ and (16.14). This completes the proof of Lemma 16.2. □
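The whole construction of Lemma 16.2 can be traced numerically. The following Python sketch is an illustration with arbitrary targets $m_3 = 0.7$, $m_4 = 2.5$. It builds $\xi_\gamma$ as the three-point law with the shifted moments (16.19). It then computes the moments of $\xi' = (1-\gamma)^{1/2}\xi_\gamma + \gamma^{1/2}\xi^{G}$ by binomial expansion, using the Gaussian moments $0, 1, 0, 3$. Finally, it checks that $m_3(\xi') = m_3$ while $|m_4(\xi') - m_4|/\gamma$ stays bounded as $\gamma \to 0$.

```python
import math

GAUSS = [1.0, 0.0, 1.0, 0.0, 3.0]  # E[G^k] for k = 0..4, G standard Gaussian

def three_point_law(m3, m4):
    """Law with moments 0, 1, m3, m4 supported on {a, -b, 0}, as in (16.15)."""
    s = m4 - m3 ** 2
    if s < 1:
        raise ValueError("need m4 - m3^2 - 1 >= 0")
    a = (m3 + math.sqrt(m3 ** 2 + 4 * s)) / 2
    b = a - m3
    p, q = 1 / (a * (a + b)), 1 / (b * (a + b))
    return [(a, p), (-b, q), (0.0, 1 - p - q)]

def law_moments(law):
    """Moments m_0..m_4 of a finite law [(value, weight), ...]."""
    return [sum(w * x ** k for x, w in law) for k in range(5)]

def mixed_moments(mz, gamma):
    """Moments of sqrt(1-gamma) Z + sqrt(gamma) G for independent Z and G."""
    return [sum(math.comb(k, r) * (1 - gamma) ** (r / 2) * gamma ** ((k - r) / 2)
                * mz[r] * GAUSS[k - r] for r in range(k + 1))
            for k in range(5)]

def xi_gamma(m3, m4, gamma):
    """xi_gamma of Lemma 16.2: the three-point law with the moments (16.19)."""
    m3g = (1 - gamma) ** -1.5 * m3
    return three_point_law(m3g, m3g ** 2 + (m4 - m3 ** 2))

m3, m4 = 0.7, 2.5
results = []
for gamma in (0.1, 0.01, 0.001):
    mp = mixed_moments(law_moments(xi_gamma(m3, m4, gamma)), gamma)
    results.append((gamma, mp))
    print(f"gamma={gamma}: m3(xi')={mp[3]:.12f}, |m4(xi')-m4|/gamma={abs(mp[4] - m4) / gamma:.4f}")
```

The binomial expansion reproduces (16.17) and (16.18): the odd Gaussian moments vanish, and $m_1(\xi_\gamma) = 0$ kills the remaining cross term.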
We now prove Theorem 5.1, i.e., that the limit in (15.1) holds. We restrict ourselves to the real symmetric case since the moment matching Lemma 16.2 was formulated for real random variables. A similar argument works for the complex case. From the universality of Gaussian divisible ensembles (more precisely, (12.24)), for any $b = N^{-1+10\varepsilon}$ and $t \ge N^{-\varepsilon}$ we have

$$
\frac{1}{2b}\int_{E-b}^{E+b}\mathrm{d}E'\int_{\mathbb{R}^n}\mathrm{d}\boldsymbol\alpha\,O(\boldsymbol\alpha)\,\big(p^{(n)}_{t,N}-p^{(n)}_{G,N}\big)\Big(E'+\frac{\boldsymbol\alpha}{N}\Big) \to 0, \tag{16.21}
$$

where $p^{(n)}_{t,N}$ is the eigenvalue correlation function of the matrix ensemble $H_t = e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H^{G}$.
The initial matrix ensemble $H_0$ can be any Wigner ensemble. Recall that $H$ is the Wigner ensemble for which we wish to prove (15.1). Given the first four moments $m_1 = 0$, $m_2 = 1$, $m_3$, and $m_4$ of the rescaled matrix elements $\sqrt N h_{ij}$ of $H$, we first use Lemma 16.2 to construct a distribution $\xi_\gamma$ with $\gamma = 1 - e^{-t}$. By (16.14), the moments of $\xi'$, defined by (16.13), match the first three moments of the rescaled matrix elements of $H$ exactly and the fourth moment up to an error $O(\gamma)$. We now choose $H_0$ to be the Wigner ensemble with rescaled matrix elements distributed according to $\xi_\gamma$. Then, on the one hand, the matrix $H_t$ has universal local spectral statistics in the sense of (16.21); on the other hand, we can apply the Green function comparison Theorem 16.1 to the matrix ensembles $H$ and $H_t$, so that (15.9) holds for these two ensembles. Clearly, (15.1) follows from (16.21) and (15.9). Note that averaging in the energy parameter $E$ was necessary only for the first step in (16.21); the comparison argument works for any fixed energy $E$.
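To see how Lemma 16.2 is tied to the matrix Ornstein-Uhlenbeck flow, note that with $\gamma = 1 - e^{-t}$ the entrywise form of $H_t = e^{-t/2}H_0 + \sqrt{1-e^{-t}}\,H^{G}$ is precisely (16.13). The fourth-moment error in (16.14) is then at most of order $t$:

```latex
(1-\gamma)^{1/2}\,\xi_\gamma + \gamma^{1/2}\,\xi^{G}
  = e^{-t/2}\,\xi_\gamma + \sqrt{1-e^{-t}}\;\xi^{G},
\qquad
\gamma = 1-e^{-t} \le t,
\qquad
|m_4(\xi') - m_4| \le C\gamma \le Ct .
```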
CHAPTER 17

Edge Universality

The main result of edge universality for the generalized Wigner matrix is
given by the following theorem. For simplicity, we formulate it only for the real
symmetric case; the complex Hermitian case is analogous.
Theorem 17.1 (Universality of extreme eigenvalues). Suppose that we have two real symmetric generalized $N\times N$ Wigner matrices, $H^{(\mathbf v)}$ and $H^{(\mathbf w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition (2.7). Suppose that

$$
\mathbb{E}v_{ij}^2 = \mathbb{E}w_{ij}^2. \tag{17.1}
$$

Let $\lambda^{(\mathbf v)}_N$ and $\lambda^{(\mathbf w)}_N$ denote the largest eigenvalues of $H^{(\mathbf v)}$ and $H^{(\mathbf w)}$, respectively. Then there exist $\varepsilon > 0$ and $\delta > 0$ depending on $\vartheta$ in (2.7) such that for any real parameter $s$ (which may depend on $N$) we have

$$
\begin{aligned}
\mathbb{P}\big(N^{2/3}(\lambda^{(\mathbf v)}_N-2) \le s-N^{-\varepsilon}\big) - N^{-\delta}
&\le \mathbb{P}\big(N^{2/3}(\lambda^{(\mathbf w)}_N-2) \le s\big)\\
&\le \mathbb{P}\big(N^{2/3}(\lambda^{(\mathbf v)}_N-2) \le s+N^{-\varepsilon}\big) + N^{-\delta}
\end{aligned}\tag{17.2}
$$

for $N \ge N_0$ sufficiently large, where $N_0$ is independent of $s$. An analogous result holds for the smallest eigenvalue $\lambda_1$.
This theorem shows that the statistics of the extreme eigenvalues depend only on the second moments of the centered matrix entries, but the result itself does not identify the limiting distribution. For the Gaussian case, the corresponding distribution was identified by Tracy and Widom in [134, 135]. Theorem 17.1 immediately gives the same result for Wigner matrices, but not yet for generalized Wigner matrices. In fact, the Tracy-Widom law for generalized Wigner matrices does not follow from Theorem 17.1, and its proof in [24] required a quite different argument (see Theorem 18.7).
We first give an outline of the proof of Theorem 17.1. Denote by $\varrho^{(\mathbf v)}_N$ the empirical eigenvalue distribution

$$
\varrho^{(\mathbf v)}_N(x) = \frac{1}{N}\sum_{\alpha=1}^{N}\delta\big(x-\lambda^{(\mathbf v)}_\alpha\big).
$$

Our goal is to compare $\varrho^{(\mathbf v)}_N$ and $\varrho^{(\mathbf w)}_N$ via their Stieltjes transforms, $m^{(\mathbf v)}_N(z)$ and $m^{(\mathbf w)}_N(z)$. We will be able to locate the largest eigenvalue via the distribution of certain functionals of the Stieltjes transforms. This will be done in the preparatory Lemma 17.2 and Corollary 17.3. Therefore, if the distributions of $m^{(\mathbf v)}_N(z)$ and $m^{(\mathbf w)}_N(z)$ are sufficiently close, we will be able to compare $\lambda^{(\mathbf v)}_N$ and $\lambda^{(\mathbf w)}_N$.
The main ingredient of the proof of Theorem 17.1 is the Green function comparison theorem at the edge (Theorem 17.4). It shows that the distributions of $m^{(\mathbf v)}_N(z)$ and $m^{(\mathbf w)}_N(z)$ on the critical scale $\operatorname{Im} z \approx N^{-2/3}$ asymptotically coincide, provided the second moments of the matrix elements match. This comparison will be done by a resolvent expansion, as for the Green function comparison theorem in the bulk (Theorem 16.1). However, the resolvent expansion is more efficient at the edge, for a reason that we explain now.
Recall (6.13), stating that $\operatorname{Im} m_{sc}(z) \asymp \sqrt\eta$ if $\kappa = ||E|-2| \le \eta$ and $z = E + i\eta$. Since the extremal eigenvalue gaps are expected to be of order $N^{-2/3}$, we need to set $\eta \ll N^{-2/3}$ in order to identify individual eigenvalues. At this scale, the error term $1/(N\eta)$ in the local semicircle law exceeds $\operatorname{Im} m_{sc} \asymp \sqrt\eta$; in fact, $m_N(z)$ does not have a deterministic limit, and we need to identify its distribution. Hence, the reference size of $\operatorname{Im} m_N(z)$ that we should keep in mind is $N^{-1/3}$. The largest eigenvalue can be located via the Stieltjes transform as follows. By definition of $m_N$, if there is an eigenvalue within distance $\eta$ of $E$, then

$$
\operatorname{Im} m_N(E+i\eta) = \frac{1}{N}\sum_\alpha\frac{\eta}{(\lambda_\alpha-E)^2+\eta^2} \ge \frac{1}{2N\eta} \gg N^{-1/3};
$$

otherwise, typically $\operatorname{Im} m_N(E+i\eta) \lesssim N^{-1/3}$. Based upon this idea, we will construct a certain functional of $\operatorname{Im} m_N$ that expresses the distribution of $\lambda_N$. In other words, we expect that if we can control the imaginary part of the trace of the Green function with a precision finer than $N^{-1/3}$, then we can identify the individual extremal eigenvalues.
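The dichotomy above can be illustrated with a toy spectrum. In the Python sketch below, the spectrum is hypothetical: five eigenvalues spaced by $N^{-2/3}$ below the edge at $2$, and the remaining $N - 5$ eigenvalues crudely placed at $0$ to stand in for the bulk. The value $\eta = N^{-2/3}/100$ is an arbitrary choice with $\eta \ll N^{-2/3}$.

```python
def im_stieltjes(eigs, E, eta):
    """Im m_N(E + i*eta) = (1/N) * sum_alpha eta / ((lambda_alpha - E)^2 + eta^2)."""
    return sum(eta / ((lam - E) ** 2 + eta ** 2) for lam in eigs) / len(eigs)

N = 10**5
spacing = N ** (-2 / 3)                 # typical eigenvalue spacing at the edge
eta = spacing / 100                     # eta << N^{-2/3}
# Hypothetical spectrum: five eigenvalues spaced N^{-2/3} apart below the edge at 2;
# the other N - 5 eigenvalues are crudely placed at 0 as a stand-in for the bulk.
eigs = [2.0 - k * spacing for k in range(5)] + [0.0] * (N - 5)
near = im_stieltjes(eigs, 2.0 + 0.5 * eta, eta)       # an eigenvalue within eta of E
far = im_stieltjes(eigs, 2.0 + 0.5 * spacing, eta)    # nearest eigenvalue ~N^{-2/3} away
print(near, 1 / (2 * N * eta), far, N ** (-1 / 3))
```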
Roughly speaking, our basic principle is that whenever we understand $m_N(z)$ with a precision better than $(N\eta)^{-1}$ at some scale $\eta$ (i.e., whenever we can improve over the strong local semicircle law (6.32),

$$
|m_N(z) - m_{sc}(z)| \prec \frac{1}{N\eta}, \qquad \operatorname{Im} z = \eta,
$$

at that scale), we can “identify” the eigenvalue distribution at that scale precisely. It is instructive to compare the edge situation with the bulk, where the critical scale $\eta \asymp 1/N$ identifies individual eigenvalues and the typical size of $\operatorname{Im} m_N$ is of order $1$. In other words, at the critical scale, $\operatorname{Im} m_N$ is of order $1$ in the bulk and of order $N^{-1/3}$ at the edge. Owing to (6.31), this fact extends to each off-diagonal resolvent matrix element, i.e., $|G_{ij}| \lesssim N^{-1/3}$ for $i \ne j$ at the edge. Thus a resolvent expansion is more powerful near the edge, which explains why fewer moments need to be matched than in the bulk.
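The edge scaling $\operatorname{Im} m_{sc}(z) \asymp \sqrt\eta$ from (6.13), and its contrast with the bulk, can be checked directly from the explicit semicircle Stieltjes transform $m_{sc}(z) = (-z + \sqrt{z^2-4})/2$ (branch with $\operatorname{Im} m_{sc} > 0$). This Python sketch evaluates it at $z = 2 + i\eta$ and at $z = i\eta$. At the edge the ratio $\operatorname{Im} m_{sc}/\sqrt\eta$ approaches $1/\sqrt2$, so at $\eta \approx N^{-2/3}$ the size is of order $N^{-1/3}$. In the bulk at $E = 0$ it approaches $\pi\varrho_{sc}(0) = 1$.

```python
import cmath
import math

def m_sc(z):
    """Stieltjes transform of the semicircle law: the root of m^2 + z m + 1 = 0 with Im m > 0."""
    r = cmath.sqrt(z * z - 4)
    for m in ((-z + r) / 2, (-z - r) / 2):
        if m.imag > 0:
            return m
    raise ValueError("z must lie in the upper half plane")

etas = [1e-2, 1e-4, 1e-6, 1e-8]
edge_ratio = [m_sc(2 + 1j * eta).imag / math.sqrt(eta) for eta in etas]  # tends to 1/sqrt(2)
bulk_value = [m_sc(1j * eta).imag for eta in etas]                       # tends to 1
for eta, r, b in zip(etas, edge_ratio, bulk_value):
    print(f"eta={eta:.0e}  Im m_sc(2+i*eta)/sqrt(eta)={r:.5f}  Im m_sc(i*eta)={b:.5f}")
```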

Now we start the detailed proof of Theorem 17.1. We first introduce some notation. For any $E_1 \le E_2$, let

$$
\mathcal N(E_1,E_2) := \#\{j : E_1 \le \lambda_j \le E_2\} = \operatorname{Tr}\mathbf 1_{[E_1,E_2]}(H)
$$

be the unnormalized counting function of the eigenvalues. Set $E_+ := 2 + N^{-2/3+\varepsilon}$. From rigidity, Theorem 11.5, we know that $\lambda_N \le E_+$, i.e.,

$$
\mathcal N(E,\infty) = \mathcal N(E,E_+) \tag{17.3}
$$

with very high probability, so it is sufficient to consider the counting function $\mathcal N(E,E_+) = \operatorname{Tr}\chi_E(H)$, with $\chi_E(x) := \mathbf 1_{[E,E_+]}(x)$. We will approximate the sharp cutoff function $\chi_E$ by its smoothed version $\chi_E\star\theta_\eta$ on scale $\eta \ll N^{-2/3}$, where we recall the notation $\theta_\eta(x)$ from (15.21). The following lemma shows that the error caused by this replacement is negligible.

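Lemma 17.2. For any $\varepsilon > 0$ set $\ell_1 = N^{-2/3-3\varepsilon}$ and $\eta = N^{-2/3-9\varepsilon}$. Then there exist constants $C, c$ such that for any $E$ satisfying $|E-2| \le \frac{3}{2}N^{-2/3+\varepsilon}$ and for any $D$ we have

$$
\mathbb{P}\Big\{\big|\operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E\star\theta_\eta(H)\big| \le C\big(N^{-2\varepsilon} + \mathcal N(E-\ell_1,E+\ell_1)\big)\Big\} \ge 1 - N^{-D} \tag{17.4}
$$

if $N \ge N_0(D)$ is large enough.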

We will not give the detailed proof of this lemma since it is a straightforward approximation argument. We only point out that $\theta_\eta$ is an approximate delta function on scale $\eta \ll N^{-2/3}$. If it were compactly supported, then the difference between $\operatorname{Tr}\chi_E(H)$ and $\operatorname{Tr}\chi_E\star\theta_\eta(H)$ would clearly be bounded by the number of eigenvalues in an $\eta$-vicinity of $E$. Since $\theta_\eta(x) = \pi^{-1}\eta/(x^2+\eta^2)$ has only a quadratically decaying tail, this argument must be complemented by an estimate on the density of eigenvalues away from $E$. This estimate comes from two contributions. Eigenvalues within the $\ell_1$-vicinity of $E$ are directly estimated by the term $\mathcal N(E-\ell_1,E+\ell_1)$. Eigenvalues farther away come with an additional factor $\eta/\ell_1 = N^{-6\varepsilon}$ due to the decay of $\theta_\eta$. Thus it is sufficient to estimate their density only up to a factor $N^{\varepsilon}$, which is provided by the optimal local law at the edge. For precise details, see the proof of Lemma 6.1 in [70], which contains essentially the same argument.
Using (17.3) and the local law to estimate a slightly averaged version of
𝒩(𝐸 − ℓ1 , 𝐸 + ℓ1 ), we arrive at the following corollary:

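Corollary 17.3. Let $|E-2| \le N^{-2/3+\varepsilon}$ and $\ell = \frac{1}{2}\ell_1N^{2\varepsilon} = \frac{1}{2}N^{-2/3-\varepsilon}$. Then the inequality

$$
\operatorname{Tr}\chi_{E+\ell}\star\theta_\eta(H) - N^{-\varepsilon} \le \mathcal N(E,\infty) \le \operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H) + N^{-\varepsilon} \tag{17.5}
$$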
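holds with probability bigger than $1 - N^{-D}$ for any $D$ if $N$ is large enough. Furthermore, if $F:\mathbb{R}\to[0,1]$ is a smooth function with $F(x) = 1$ for $x \le 1/9$ and $F(x) = 0$ for $x \ge 2/9$, then we have

$$
\mathbb{E}F\big(\operatorname{Tr}\chi_{E-\ell}\star\theta_\eta(H)\big) \le \mathbb{P}\big(\mathcal N(E,\infty) = 0\big) \le \mathbb{E}F\big(\operatorname{Tr}\chi_{E+\ell}\star\theta_\eta(H)\big) + CN^{-D}. \tag{17.6}
$$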
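Proof. We have $E_+ - E \gg \ell$; thus $|E-\ell-2| \le \frac{3}{2}N^{-2/3+\varepsilon}$, and therefore (17.4) holds for $E$ replaced with any $y \in [E-\ell,E]$ as well. We thus obtain

$$
\begin{aligned}
\operatorname{Tr}\chi_E(H) &\le \ell^{-1}\int_{E-\ell}^{E}\mathrm{d}y\,\operatorname{Tr}\chi_y(H)\\
&\le \ell^{-1}\int_{E-\ell}^{E}\mathrm{d}y\,\operatorname{Tr}\chi_y*\theta_\eta(H) + C\ell^{-1}\int_{E-\ell}^{E}\mathrm{d}y\,\big[N^{-2\varepsilon} + \mathcal N(y-\ell_1,y+\ell_1)\big]\\
&\le \operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H) + CN^{-2\varepsilon} + C\,\frac{\ell_1}{\ell}\,\mathcal N(E-2\ell,E+\ell)
\end{aligned}
$$

with probability larger than $1 - N^{-D}$ for any $D > 0$. From the rigidity estimate in the form (11.25), the conditions $|E-2| \le N^{-2/3+\varepsilon}$, $\ell_1/\ell = 2N^{-2\varepsilon}$, and $\ell \le N^{-2/3}$, we can bound

$$
\frac{\ell_1}{\ell}\,\mathcal N(E-2\ell,E+\ell) \le CN^{1-2\varepsilon}\int_{E-2\ell}^{E+\ell}\varrho_{sc}(x)\,\mathrm{d}x + CN^{-2\varepsilon}N^{\varepsilon} \le CN^{-\varepsilon}
$$

with very high probability, where we estimated the explicit integral using that the integration domain is in a $CN^{-2/3+\varepsilon}$-vicinity of the edge at $2$. We have thus proved

$$
\mathcal N(E,E_+) = \operatorname{Tr}\chi_E(H) \le \operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H) + N^{-\varepsilon}.
$$

By (17.3), we can replace $\mathcal N(E,E_+)$ by $\mathcal N(E,\infty)$. This proves the upper bound of (17.5), and the lower bound can be proved similarly.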
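The estimate of the explicit integral in this proof can be spelled out. The sketch below assumes that the rigidity bound (11.25) is used in the form $\mathcal N(I) \le N\int_I\varrho_{sc} + N^{\varepsilon}$ for intervals $I$ near the edge. It also uses $\varrho_{sc}(x) = (2\pi)^{-1}\sqrt{(4-x^2)_+} \le \pi^{-1}\sqrt{(2-x)_+}$ for $x$ near $2$, together with $|x-2| \le 2N^{-2/3+\varepsilon}$ on the integration domain:

```latex
\begin{aligned}
\int_{E-2\ell}^{E+\ell}\varrho_{sc}(x)\,\mathrm{d}x
  &\le 3\ell\max_{|x-2|\le 2N^{-2/3+\varepsilon}}\varrho_{sc}(x)
  \le 3\ell\cdot\pi^{-1}\big(2N^{-2/3+\varepsilon}\big)^{1/2}
  \le C\,N^{-2/3-\varepsilon}N^{-1/3+\varepsilon/2} = C\,N^{-1-\varepsilon/2},\\
\frac{\ell_1}{\ell}\,\mathcal N(E-2\ell,E+\ell)
  &\le 2N^{-2\varepsilon}\Big(N\int_{E-2\ell}^{E+\ell}\varrho_{sc} + N^{\varepsilon}\Big)
  \le C\,N^{-5\varepsilon/2} + 2N^{-\varepsilon} \le C\,N^{-\varepsilon}.
\end{aligned}
```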
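In the event that (17.5) holds, the condition $\mathcal N(E,\infty) = 0$ implies that $\operatorname{Tr}\chi_{E+\ell}*\theta_\eta(H) \le 1/9$. Thus we have

$$
\mathbb{P}\big(\mathcal N(E,\infty) = 0\big) \le \mathbb{P}\big(\operatorname{Tr}\chi_{E+\ell}*\theta_\eta(H) \le 1/9\big) + CN^{-D}. \tag{17.7}
$$

Together with the Markov inequality, this proves the upper bound in (17.6). For the lower bound, we use

$$
\begin{aligned}
\mathbb{E}F\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H)\big) &\le \mathbb{P}\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H) \le 2/9\big)\\
&\le \mathbb{P}\big(\mathcal N(E,\infty) \le 2/9 + N^{-\varepsilon}\big)\\
&= \mathbb{P}\big(\mathcal N(E,\infty) = 0\big),
\end{aligned}
$$

where we used the upper bound from (17.5) and that $\mathcal N$ is an integer. This completes the proof of the corollary. □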
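The cutoff function $F$ in Corollary 17.3 only needs to be smooth with $0 \le F \le 1$, $F = 1$ on $x \le 1/9$ and $F = 0$ on $x \ge 2/9$. These properties give $\mathbf 1(x \le 1/9) \le F(x) \le \mathbf 1(x \le 2/9)$, which is exactly what the two probability bounds in the proof use. A standard construction (our choice, not taken from the text) glues the $C^\infty$ function $e^{-1/t}$ into a smooth step, as in this Python sketch:

```python
import math

def _g(t):
    """exp(-1/t) for t > 0 and 0 otherwise; C-infinity and flat at t = 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def smooth_step(t):
    """C-infinity step: 0 for t <= 0, 1 for t >= 1, nondecreasing in between."""
    return _g(t) / (_g(t) + _g(1.0 - t))

def F(x):
    """Hypothetical cutoff for (17.6): F = 1 on x <= 1/9, F = 0 on x >= 2/9, 0 <= F <= 1."""
    return 1.0 - smooth_step(9.0 * x - 1.0)

xs = [k / 900 for k in range(-90, 400)]   # grid on [-0.1, 0.443]
vals = [F(x) for x in xs]
print(F(0.1), F(1 / 6), F(0.25))
```

The denominator in `smooth_step` never vanishes, because at least one of $t$ and $1 - t$ is positive.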

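Since $\mathbb{P}(\lambda_N < E) = \mathbb{P}(\mathcal N(E,\infty) = 0)$, we will use (17.6) and the identity

$$
\operatorname{Tr}\chi_E\star\theta_\eta(H) = \frac{N}{\pi}\int_E^{E_+}\mathrm{d}y\,\operatorname{Im} m_N(y+i\eta), \tag{17.8}
$$

which follows from $\theta_\eta(\lambda-y) = \pi^{-1}\operatorname{Im}(\lambda-y-i\eta)^{-1}$, to relate the distribution of $\lambda_N$ with that of the Stieltjes transform $m_N$. The following Green function comparison theorem shows that the distribution of the right-hand side of (17.8) depends only on the second moments of the matrix elements. Its proof will be given after we have completed the proof of Theorem 17.1.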
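The identity (17.8) can be checked numerically. Integrating the Poisson kernel against $\chi_E$ gives a closed arctangent formula for the left-hand side, while the right-hand side is a one-dimensional integral of $\operatorname{Im} m_N$. In the Python sketch below, the spectrum, $E$, $E_+$ and $\eta$ are hypothetical toy values. Composite Simpson quadrature, with a step much smaller than $\eta$, approximates the integral. Note the factor $N/\pi$.

```python
import math

def smoothed_count(eigs, E, E_plus, eta):
    """Tr (chi_E * theta_eta)(H) for chi_E = 1_[E, E_plus], via the arctangent antiderivative."""
    return sum(math.atan((E_plus - lam) / eta) - math.atan((E - lam) / eta)
               for lam in eigs) / math.pi

def im_m(eigs, y, eta):
    """Im m_N(y + i*eta) for the empirical eigenvalue distribution of eigs."""
    return sum(eta / ((lam - y) ** 2 + eta ** 2) for lam in eigs) / len(eigs)

def rhs_17_8(eigs, E, E_plus, eta, n=20_000):
    """(N / pi) * int_E^{E_plus} Im m_N(y + i*eta) dy by composite Simpson (n even)."""
    h = (E_plus - E) / n
    s = im_m(eigs, E, eta) + im_m(eigs, E_plus, eta)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * im_m(eigs, E + k * h, eta)
    return len(eigs) / math.pi * s * h / 3

eigs = [-1.3, 0.2, 1.75, 1.9, 1.98, 2.01]    # hypothetical spectrum, N = 6
E, E_plus, eta = 1.8, 2.1, 0.01
lhs = smoothed_count(eigs, E, E_plus, eta)
rhs = rhs_17_8(eigs, E, E_plus, eta)
print(lhs, rhs)
```

Both sides are close to the number of eigenvalues in $[E, E_+]$, up to boundary effects on scale $\eta$. These effects are what Lemma 17.2 controls.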

Theorem 17.4 (Green function comparison theorem on the edge). Suppose that the assumptions of Theorem 17.1, including (17.1), hold. Let $F:\mathbb{R}\to\mathbb{R}$ be a function whose derivatives satisfy

$$
\max_x|F^{(\alpha)}(x)|\,(|x|+1)^{-C_1} \le C_1, \qquad \alpha = 1,2,3, \tag{17.9}
$$

with some constant $C_1 > 0$. Then there exists $\varepsilon_0 > 0$ depending only on $C_1$ such that for any $\varepsilon < \varepsilon_0$ and for any real numbers $E_1$ and $E_2$ satisfying

$$
|E_1-2| \le CN^{-2/3+\varepsilon}, \qquad |E_2-2| \le CN^{-2/3+\varepsilon},
$$

and setting $\eta = N^{-2/3-\varepsilon}$, we have

$$
\Big|\mathbb{E}F\Big(N\int_{E_1}^{E_2}\mathrm{d}y\,\operatorname{Im} m^{(\mathbf v)}(y+i\eta)\Big) - \mathbb{E}F\Big(N\int_{E_1}^{E_2}\mathrm{d}y\,\operatorname{Im} m^{(\mathbf w)}(y+i\eta)\Big)\Big| \le CN^{-1/6+C\varepsilon} \tag{17.10}
$$

for some constant $C$ and large enough $N$, depending only on $C_1$, $\vartheta$, $\delta_\pm$, and $C_0$ (in (14.26)).

Note that the factor $N$ in front of the integral, combined with the integration, produces a factor $N|E_1-E_2| \sim N^{1/3+\varepsilon}$ on the left-hand side. This factor compensates for the “natural” size of the imaginary part of the Stieltjes transform obtained from the local semicircle law and makes the argument of $F$ of order $1$. The bound (17.10) shows that the distributions of this order $1$ quantity with respect to the $\mathbf v$- and $\mathbf w$-ensembles are asymptotically the same.
Assuming that Theorem 17.4 holds, we now prove Theorem 17.1.

Proof of Theorem 17.1. By the rigidity of the eigenvalues in the form of (11.25), we know that

$$
|\lambda_N-2| \le N^{-2/3+\varepsilon}, \qquad \mathcal N\big(2-CN^{-2/3+\varepsilon},\,2+CN^{-2/3+\varepsilon}\big) \le CN^{\varepsilon},
$$

with very high probability. Thus we can assume that $|s| \le N^{\varepsilon}$ holds for the parameter $s$ in Theorem 17.1; the statement is trivial for other values of $s$. We
196 17. EDGE UNIVERSALITY

define 𝐸 ∶= 2 + 𝑠𝑁 −2/3 . With the left side of (17.6), for any sufficiently small
𝜀 > 0, we have
(17.11) 𝔼𝐰 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ ℙ𝐰 (𝒩(𝐸, ∞) = 0)
with the choice
1
ℓ ∶= 𝑁 −2/3−𝜀 , 𝜂 ∶= 𝑁 −2/3−9𝜀 .
2
The bound (17.10) applying to the case 𝐸1 = 𝐸 − ℓ and 𝐸2 = 𝐸+ shows that
there exists 𝛿 > 0, for sufficiently small 𝜀 > 0, such that
(17.12) 𝔼𝐯 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ 𝔼𝐰 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) + 𝑁 −𝛿
(note that 9𝜀 plays the role of the 𝜀 in the Green function comparison theorem).
Then, applying the right side of (17.6) to the left-hand side of (17.12), we have
(17.13) ℙ𝐯 (𝒩(𝐸 − 2ℓ, ∞) = 0) ≤ 𝔼𝐯 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) + 𝐶𝑁 −𝐷 .
Combining these inequalities, we have
(17.14) ℙ𝐯 (𝒩(𝐸 − 2ℓ, ∞) = 0) ≤ ℙ𝐰 (𝒩(𝐸, ∞) = 0) + 2𝑁 −𝛿
for sufficiently small 𝜀 > 0 and sufficiently large 𝑁. By recalling that 𝐸 =
2 + 𝑠𝑁 −2/3 , this proves the first inequality of (17.2) and, by switching the roles
of 𝐯 and 𝐰, the second inequality of (17.2) as well. This completes the proof of
Theorem 17.1. □
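Proof of Theorem 17.4. We follow the notation and the setup in the proof of Theorem 16.1. First, we prove a simpler version of (17.10) with $F(x)=x$ and without integration. Namely, we show that for any $E$ with $|E-2|\le N^{-2/3+\varepsilon}$ we have
$$N\eta\,\big|\mathbb Em^{(\mathbf v)}_N(z)-\mathbb Em^{(\mathbf w)}_N(z)\big|=N\eta\,\Big|\mathbb E\frac1N\operatorname{Tr}\frac{1}{H^{(\mathbf v)}-z}-\mathbb E\frac1N\operatorname{Tr}\frac{1}{H^{(\mathbf w)}-z}\Big|\le CN^{-1/6+C\varepsilon},\qquad z=E+i\eta.\tag{17.15}$$
The prefactor $N\eta$ accounts for the factor $N$ times the integration from $E_1$ to $E_2$ in (17.10), since $|E_1-E_2|\le CN^{2\varepsilon}\eta$ and the factor $N^{2\varepsilon}$ is irrelevant.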
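We write
$$\mathbb E\frac1N\operatorname{Tr}\frac{1}{H^{(\mathbf v)}-z}-\mathbb E\frac1N\operatorname{Tr}\frac{1}{H^{(\mathbf w)}-z}=\sum_{\gamma=1}^{\gamma(N)}\Big[\mathbb E\frac1N\operatorname{Tr}\frac{1}{H_\gamma-z}-\mathbb E\frac1N\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big]\tag{17.16}$$
with $R:=(Q-z)^{-1}$, $S:=(H_\gamma-z)^{-1}$, and
$$\frac1N\operatorname{Tr}S=\hat R+\xi_{\mathbf v},\qquad\xi_{\mathbf v}:=\sum_{m=1}^{4}N^{-m/2}\hat R^{(m)}_{\mathbf v}+N^{-5/2}\Omega_{\mathbf v},\tag{17.17}$$
with
$$\hat R:=\frac1N\operatorname{Tr}R,\qquad\hat R^{(m)}_{\mathbf v}:=(-1)^m\frac1N\operatorname{Tr}(RV)^mR,\qquad\Omega_{\mathbf v}:=-\frac1N\operatorname{Tr}(RV)^5S.\tag{17.18}$$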
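Both the telescoping sum (17.16) and the expansion (17.17) and (17.18) are exact algebraic identities, so they can be tested on a small matrix. The sketch below is a hedged illustration rather than the construction of Theorem 16.1: it assumes a row-by-row enumeration φ of the pairs (a, b), a ≤ b, and the convention that H_γ carries the 𝐯-entries at the first γ pairs and the 𝐰-entries elsewhere, so that H_0 = H^(𝐰) and H_γ(N) = H^(𝐯). For one swap it forms Q by removing the (a, b) and (b, a) entries of H_γ, sets V = N^{1/2}(H_γ − Q), and compares (1/N) Tr S with R̂ + ξ_𝐯. The Gaussian and Rademacher entries, N = 6, and z = 0.3 + 0.2i are arbitrary.

```python
import numpy as np

def lindeberg_sequence(hv, hw):
    # Hypothetical ordering phi: pairs (a, b), a <= b, enumerated row by row.
    # mats[g] carries the v-entries at the first g pairs and w-entries elsewhere,
    # so mats[0] = H^(w) and mats[-1] = H^(v).
    n = hv.shape[0]
    pairs = [(a, b) for a in range(n) for b in range(a, n)]
    h = hw.copy()
    mats = [h.copy()]
    for a, b in pairs:
        h[a, b] = hv[a, b]
        h[b, a] = hv[b, a]
        mats.append(h.copy())
    return pairs, mats

def mean_trace_resolvent(h, z):
    n = h.shape[0]
    return np.trace(np.linalg.inv(h - z * np.eye(n))) / n

def expansion_error(h_gamma, a, b, z):
    # Q: H_gamma with the (a, b), (b, a) entries removed, so H_gamma = Q + N^{-1/2} V
    n = h_gamma.shape[0]
    q = h_gamma.copy()
    q[a, b] = q[b, a] = 0.0
    v_mat = np.sqrt(n) * (h_gamma - q)
    eye = np.eye(n)
    r = np.linalg.inv(q - z * eye)
    s = np.linalg.inv(h_gamma - z * eye)
    rv = r @ v_mat
    r_hat = np.trace(r) / n
    # R^(m) = (-1)^m (1/N) Tr (RV)^m R and Omega = -(1/N) Tr (RV)^5 S, as in (17.18)
    r_m = [(-1) ** m * np.trace(np.linalg.matrix_power(rv, m) @ r) / n for m in range(1, 5)]
    omega = -np.trace(np.linalg.matrix_power(rv, 5) @ s) / n
    xi = sum(n ** (-m / 2) * r_m[m - 1] for m in range(1, 5)) + n ** (-5 / 2) * omega
    return abs(np.trace(s) / n - (r_hat + xi))

rng = np.random.default_rng(1)
n, z = 6, 0.3 + 0.2j
g = rng.standard_normal((n, n))
signs = rng.choice([-1.0, 1.0], size=(n, n))
hv = (g + g.T) / np.sqrt(2 * n)                         # Gaussian entries
hw = (np.triu(signs) + np.triu(signs, 1).T) / np.sqrt(n)  # Rademacher entries
pairs, mats = lindeberg_sequence(hv, hw)

total = mean_trace_resolvent(mats[-1], z) - mean_trace_resolvent(mats[0], z)
steps = sum(mean_trace_resolvent(mats[k], z) - mean_trace_resolvent(mats[k - 1], z)
            for k in range(1, len(mats)))
telescoping_error = abs(total - steps)

k = pairs.index((0, 2)) + 1   # the step gamma that swaps the (0, 2) entry
expansion_err = expansion_error(mats[k], 0, 2, z)
print(telescoping_error, expansion_err)
```

Both printed errors are at the level of floating-point round-off, since neither identity involves any approximation.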
All these quantities depend on $\gamma$, or equivalently on the pair of indices $(i,j)$ with $\phi(i,j)=\gamma$ (see the proof of Theorem 16.1); i.e., $\hat R=\hat R(\gamma)$, etc., but we will omit this from the notation. We also define the same quantities with $\mathbf v$ replaced by $\mathbf w$. By the moment matching condition (17.1),
$$\mathbb E\hat R^{(m)}_{\mathbf v}=\mathbb E\hat R^{(m)}_{\mathbf w},\qquad m=0,1,2,$$
so we only need to consider the $m=3,4$ terms in the summation in (17.17). Since $|E-2|\le N^{-2/3+\varepsilon}$ and $\eta=N^{-2/3-\varepsilon}$, the strong local semicircle law (6.32) implies that
$$|R_{ij}(z)-\delta_{ij}m_{\mathrm{sc}}(z)|\prec\sqrt{\frac{\operatorname{Im}m_{\mathrm{sc}}(z)}{N\eta}}+\frac{1}{N\eta}\le N^{-1/3+C\varepsilon},\tag{17.19}$$
and a similar bound holds for $S$. In particular, the off-diagonal entries of the Green functions are of order $N^{-1/3+C\varepsilon}$ and the diagonal entries are bounded with high probability. As a crude bound, we immediately obtain that $|\hat R^{(m)}_{\mathbf v}|,|\Omega_{\mathbf v}|\le N^{C\varepsilon}$ with high probability.
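Hence we have
$$\sum_{\gamma=1}^{\gamma(N)}(N\eta)\Big|\mathbb E\frac1N\operatorname{Tr}\frac{1}{H_\gamma-z}-\mathbb E\frac1N\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big|\le\sum_{m=3}^{4}N^2(N\eta)N^{-m/2}\max_\gamma\big|\mathbb E[\hat R^{(m)}_{\mathbf v}-\hat R^{(m)}_{\mathbf w}]\big|+\eta N^{1/2+C\varepsilon}.\tag{17.20}$$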

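Given a fixed $\gamma$ with $\phi(i,j)=\gamma$, we have the explicit formula
$$\hat R^{(m)}_{\mathbf v}=\hat R^{(m)}_{\mathbf v}(\gamma)=(-1)^m\frac1N\sum_k\sum_{\substack{\{a_\ell,b_\ell\}=\{i,j\}\\ \ell=1,\dots,m}}\big[R_{ka_1}v_{a_1b_1}R_{b_1a_2}\cdots v_{a_mb_m}R_{b_mk}\big].\tag{17.21}$$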
First we discuss the generic case, when all indices are different, in particular $i\ne j$, and later we comment on the case of coinciding indices. Notice that, generically, the first and the last resolvents $R_{ka_1}$, $R_{b_mk}$ are off-diagonal entries; thus their size is $N^{-1/3+C\varepsilon}$. Every other factor in (17.21) is $O(N^{\varepsilon})$. Hence the generic contributions to the sum (17.20) give
$$N^2(N\eta)N^{-m/2}\hat R^{(m)}_{\mathbf v}\le N^{2-m/2}N^{-1/3+C\varepsilon}.\tag{17.22}$$
For $m=4$ this is $N^{-1/3+C\varepsilon}$, so it is negligible. For $m=3$, however, we only get
$$N^2(N\eta)N^{-3/2}\hat R^{(3)}_{\mathbf v}\le N^{1/6+C\varepsilon},\tag{17.23}$$
so the straightforward power counting is not sufficient. Our goal is to show that the expectation of this term is in fact much smaller. To see this, we can assume that on the right-hand side of (17.21) all the Green functions except the first and the last one are diagonal; otherwise there is an additional (third) off-diagonal factor, which yields an extra factor $N^{-1/3+C\varepsilon}$, and this suffices when combined with the trivial power counting yielding (17.23). If both intermediate Green functions in (17.21) are diagonal, then we can replace each of them with $m_{\mathrm{sc}}$ by using the strong local semicircle law (17.19); the resulting error terms carry an extra factor $N^{-1/3+C\varepsilon}$, which improves their power counting from $N^{1/6+C\varepsilon}$ to $N^{-1/6+C\varepsilon}$, as required in (17.15). Thus we only have to consider the contributions from the terms
$$\frac1N\sum_k\sum_{\{a_\ell,b_\ell\}=\{i,j\}}m_{\mathrm{sc}}^2\,\mathbb E\big[R_{ka_1}v_{a_1b_1}v_{a_2b_2}v_{a_3b_3}R_{b_3k}\big]\tag{17.24}$$
contributing to $\hat R^{(3)}_{\mathbf v}$, where the summation is restricted to the choices consistent with the fact that all Green functions in the middle are diagonal, i.e., $b_1=a_2$ and $b_2=a_3$. Together with the restriction $\{a_\ell,b_\ell\}=\{i,j\}$ and the assumption that all indices are distinct, this yields only two terms:
$$\frac1N\sum_km_{\mathrm{sc}}^2\,\mathbb E\big[R_{ki}v_{ij}v_{ji}v_{ij}R_{jk}\big]+\frac1N\sum_km_{\mathrm{sc}}^2\,\mathbb E\big[R_{kj}v_{ji}v_{ij}v_{ji}R_{ik}\big].\tag{17.25}$$

Considering the prefactor $N^2(N\eta)N^{-3/2}\le N^{5/6}$ in (17.23), in order to prove (17.15) we need to show that both terms in (17.25) are bounded by $N^{-1+C\varepsilon}$. Note that the trivial power counting from the two off-diagonal entries would only give the bound $N^{-2/3+C\varepsilon}$. In other words, we need to improve this trivial estimate by at least a factor of $N^{-1/3}$.
We will focus on the first term in (17.25). For the last resolvent entry, $R_{jk}$, we use the identity
$$R_{jk}=R^{(i)}_{jk}+\frac{R_{ji}R_{ik}}{R_{ii}}$$
from (8.1). The second term already has two off-diagonal entries, $R_{ji}R_{ik}$, yielding an additional $N^{-1/3+C\varepsilon}$ factor with high probability (i.e., even without taking the expectation). So we focus on the term
$$\frac1N\sum_km_{\mathrm{sc}}^2\,\mathbb E\big[R_{ki}v_{ij}v_{ji}v_{ij}R^{(i)}_{jk}\big].\tag{17.26}$$
Recall the identity (8.2)
$$R_{ki}=-R_{ii}\sum_{\ell}^{(i)}R^{(i)}_{k\ell}\big(N^{-1/2}v_{\ell i}\big).$$
Hence we have that
$$(17.26)=-\frac1N\sum_km_{\mathrm{sc}}^2N^{-1/2}\sum_{\ell}^{(i)}\mathbb E\big[R_{ii}R^{(i)}_{k\ell}v_{\ell i}v_{ij}v_{ji}v_{ij}R^{(i)}_{jk}\big].\tag{17.27}$$
We may again replace the diagonal term $R_{ii}$ with $m_{\mathrm{sc}}$ at a negligible error, and we are left with
$$-m_{\mathrm{sc}}^3N^{-3/2}\sum_k\sum_{\ell}^{(i)}\mathbb E\big[R^{(i)}_{k\ell}v_{\ell i}v_{ij}v_{ji}v_{ij}R^{(i)}_{jk}\big].\tag{17.28}$$
The expectation with respect to the variable $v_{\ell i}$ renders this term 0 unless $\ell=j$, in which case we gain an additional factor $N^{-1/2+C\varepsilon}$ in the power counting compared with (17.26). So the contribution of the last displayed expression to (17.23) is improved from $N^{1/6+C\varepsilon}$ to $N^{-1/3+C\varepsilon}$.
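The identities (8.1) and (8.2) used above are exact Schur-complement formulas, valid for any real symmetric H and any z off the real axis. The sketch below checks them numerically in the forms R_jk = R^(i)_jk + R_ji R_ik / R_ii and R_ki = −R_ii Σ_{ℓ≠i} R^(i)_kℓ h_ℓi for distinct i, j, k, where the matrix entry h_ℓi plays the role of N^{−1/2} v_ℓi. Computing R^(i) by zeroing row and column i (instead of deleting them) is our implementation shortcut; it leaves all entries away from i unchanged. The matrix size, the indices, and z are arbitrary.

```python
import numpy as np

def resolvent(h, z):
    return np.linalg.inv(h - z * np.eye(h.shape[0]))

def minor_resolvent(h, z, i):
    # R^{(i)}: resolvent of H with row and column i removed. Zeroing them instead
    # gives the same entries at all positions (a, b) with a, b != i.
    h_i = h.copy()
    h_i[i, :] = 0.0
    h_i[:, i] = 0.0
    return resolvent(h_i, z)

def identity_errors(h, z, i, j, k):
    # Absolute errors in (8.1) and (8.2) for distinct indices i, j, k
    r = resolvent(h, z)
    r_i = minor_resolvent(h, z, i)
    err_81 = abs(r[j, k] - (r_i[j, k] + r[j, i] * r[i, k] / r[i, i]))
    others = [l for l in range(h.shape[0]) if l != i]
    s = sum(r_i[k, l] * h[l, i] for l in others)   # h[l, i] stands for N^{-1/2} v_{l i}
    err_82 = abs(r[k, i] + r[i, i] * s)
    return err_81, err_82

rng = np.random.default_rng(2)
n = 8
a = rng.standard_normal((n, n))
h = (a + a.T) / np.sqrt(2 * n)
errors = identity_errors(h, 0.4 + 0.3j, i=0, j=3, k=5)
print(errors)
```

Note that the diagonal factor in front of the sum in (8.2) is the entry R_ii at the removed index; it is this entry that is replaced by m_sc before the expectation in v_ℓi is taken.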
The argument so far assumed that all indices are distinct. In the case of coinciding indices, each coincidence results in a factor $1/N$ from the restriction in the summation. At the same time, one of the off-diagonal entries may become diagonal; hence a factor $N^{-1/3+C\varepsilon}$, attributed to this off-diagonal entry, is “lost.” The net effect of a coinciding index in the power counting is thus a factor $N^{-2/3}$, so we conclude that terms with coinciding indices are negligible. This proves the simplified version (17.15) of Theorem 17.4.
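The exponent bookkeeping in the argument above is elementary rational arithmetic. The sketch below is one way to double-check it: it records the power of N for each estimate with all N^{Cε} factors dropped (i.e., ε = 0, so Nη counts as N^{1/3} and an off-diagonal resolvent entry as N^{−1/3}). The dictionary labels are ours, not notation from the text.

```python
from fractions import Fraction as Fr

# Exponents of N with all N^{C eps} factors dropped (eps = 0):
NETA = Fr(1, 3)       # N * eta with eta = N^{-2/3}
OFFDIAG = Fr(-1, 3)   # an off-diagonal resolvent entry, cf. (17.19)

def prefactor(m):
    # N^2 (number of swaps gamma) * (N eta) * N^{-m/2}
    return 2 + NETA - Fr(m, 2)

exponents = {
    "(17.22), m = 4": prefactor(4) + 2 * OFFDIAG,
    "(17.23), m = 3": prefactor(3) + 2 * OFFDIAG,
    "m = 3, third off-diagonal factor": prefactor(3) + 3 * OFFDIAG,
    "prefactor before (17.25)": prefactor(3),
    "required size of (17.25)": Fr(-1, 6) - prefactor(3),
    "(17.28), after l = j": prefactor(3) + 2 * OFFDIAG - Fr(1, 2),
    "one coinciding index": Fr(-1) - OFFDIAG,   # gain 1/N, lose one off-diagonal factor
}
for name, e in exponents.items():
    print(f"{name}: N^({e})")
```

Every line reproduces the corresponding exponent quoted in the text: −1/3, 1/6, −1/6, 5/6, −1, −1/3, and −2/3.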
To prove (17.10), for simplicity we ignore the integration and prove only that
$$\big|\mathbb EF\big(N\eta\operatorname{Im}m^{(\mathbf v)}(z)\big)-\mathbb EF\big(N\eta\operatorname{Im}m^{(\mathbf w)}(z)\big)\big|\le CN^{-1/6+C\varepsilon},\qquad z=E+i\eta,\tag{17.29}$$
for any $E$ with $|E-2|\le N^{-2/3+\varepsilon}$. The factor $\eta$, up to an irrelevant $N^{2\varepsilon}$, accounts for the integration domain of length $|E_2-E_1|\le N^{-2/3+\varepsilon}$ in the arguments in (17.10). The general form (17.10) follows in the same way. We point out that (17.29) with $F(N\eta\operatorname{Im}m(z))$ replaced by $F(N\eta\,m(z))$ also holds by the same argument given below.
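Taking into account the telescopic summation over all $\gamma$'s, for the proof of (17.29) it is sufficient to prove that
$$N^2\Big|\mathbb EF\Big(N\eta\operatorname{Im}\frac1N\operatorname{Tr}\frac{1}{H_\gamma-z}\Big)-\mathbb EF\Big(N\eta\operatorname{Im}\frac1N\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big)\Big|\le CN^{-1/6+C\varepsilon},\tag{17.30}$$
and then sum it up for all $\gamma=\phi(i,j)$. As before, we assume the generic case, i.e., $i\ne j$.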
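We Taylor expand the function $F$ around the point
$$\Xi:=N\eta\operatorname{Im}m^{(R)}(z),\qquad z=E+i\eta,$$
where $m^{(R)}=\frac1N\operatorname{Tr}R$. Notice that $m^{(R)}$ is independent of $v_{ij}$ and $w_{ij}$. Recalling the definition
$$\xi_{\mathbf v}=\sum_{m=1}^{4}N^{-m/2}\hat R^{(m)}_{\mathbf v}+N^{-5/2}\Omega_{\mathbf v}\tag{17.31}$$
from (17.17), and a similar definition for $\xi_{\mathbf w}$, we obtain
$$\Big|\mathbb EF\Big(N\eta\operatorname{Im}\frac1N\operatorname{Tr}\frac{1}{H_\gamma-z}\Big)-\mathbb EF\Big(N\eta\operatorname{Im}\frac1N\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big)\Big|\le\big|\mathbb EF'(\Xi)\big(N\eta[\xi_{\mathbf v}-\xi_{\mathbf w}]\big)\big|+\frac12\big|\mathbb EF''(\Xi)\big([N\eta\xi_{\mathbf v}]^2-[N\eta\xi_{\mathbf w}]^2\big)\big|+\cdots.\tag{17.32}$$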
We now substitute (17.31) into this expression. By the naive power counting from the previous proof,
$$(N\eta)N^{-m/2}|\hat R^{(m)}_{\mathbf v}|\le N^{-m/2-1/3+C\varepsilon},\qquad(N\eta)N^{-5/2}|\Omega_{\mathbf v}|\le N^{-13/6+C\varepsilon}.$$
The contributions of all the $\Omega$ error terms as well as the $m=4$ terms clearly satisfy the target bound (17.30), and thus can be neglected.
First, we prove that the quadratic term (as well as all higher-order terms) in the Taylor expansion (17.32) is negligible. Here the cancellation between the $\mathbf v$ and $\mathbf w$ contributions is essential. By the previous bounds on the $\Omega$ and $m=4$ terms, it is sufficient to consider
$$(N\eta)^2\sum_{m,m'=1}^{3}N^{-m/2-m'/2}\big[\hat R^{(m)}_{\mathbf v}\hat R^{(m')}_{\mathbf v}-\hat R^{(m)}_{\mathbf w}\hat R^{(m')}_{\mathbf w}\big]$$
instead of $[N\eta\xi_{\mathbf v}]^2-[N\eta\xi_{\mathbf w}]^2$. Since $F''$ is bounded, any term with $m+m'\ge4$ is negligible by power counting, similarly to the previous proof. The contribution of the $m=m'=1$ term to the quadratic term in (17.32) is explicitly given by
$$\mathbb EF''(\Xi)\Big(\frac1N\sum_k\big[R_{ki}v_{ij}R_{jk}+R_{kj}v_{ji}R_{ik}\big]\Big)\Big(\frac1N\sum_{k'}\big[R_{k'i}v_{ij}R_{jk'}+R_{k'j}v_{ji}R_{ik'}\big]\Big)-\mathbb EF''(\Xi)(v\leftrightarrow w).$$
Since $R$ and $\Xi$ are independent of $v_{ij}$ and $w_{ij}$, the expectations with respect to $v_{ij}$ and $w_{ij}$ can be computed. Since their second moments coincide, the two expectations above exactly cancel each other.
We are left with the $m+m'=3$ terms; by symmetry we may assume $m=2$ and $m'=1$. We do not expect any further cancellation between the $\mathbf v$ and $\mathbf w$ terms, so we consider them separately. A typical such term contributing to (17.30), with all the prefactors, is of the form
$$N^2(N\eta)^2N^{-3/2}F''(\Xi)\Big(\frac1N\sum_kR_{ki}v_{ij}R_{jj}v_{ji}R_{ik}\Big)\Big(\frac1N\sum_{k'}R_{k'i}v_{ij}R_{jk'}\Big).$$
The key point is that generically there are at least four off-diagonal entries, $R_{ki}$, $R_{ik}$, $R_{k'i}$, and $R_{jk'}$ (we chose the “worst” term, where the resolvent $R_{jj}$ in the middle is diagonal). So, together with the boundedness of $F''$, the direct power counting gives
$$N^2(N\eta)^2N^{-3/2}\big(N^{-1/3+C\varepsilon}\big)^4=N^{-1/6+C\varepsilon},$$
which is exactly the precision required in (17.30). For each coinciding index (for example, $k=i$), two off-diagonal entries may be “lost,” but we gain a factor $1/N$ from the restriction of the summation. So terms with coinciding indices are negligible, as before. Here we discussed only the quadratic term in the Taylor expansion (17.32), but it is easy to see that higher-order terms are even smaller, without any cancellation effect.
So we are left with estimating the linear term in (17.32); i.e., we need to bound
$$N^2(N\eta)\sum_{m=1}^{3}N^{-m/2}\,\mathbb EF'(\Xi)\big[\hat R^{(m)}_{\mathbf v}-\hat R^{(m)}_{\mathbf w}\big],$$
where we already took into account that the $m=4$ term as well as the $\Omega_{\mathbf v}$ error term from (17.31) are negligible by direct power counting. If $F'(\Xi)$ were deterministic, then the previous argument leading to (17.15) would also prove the $N^{-1/6+C\varepsilon}$ bound for the linear term in (17.32). Indeed, the exact cancellation between the expectations of the $\hat R^{(m)}_{\mathbf v}$ and $\hat R^{(m)}_{\mathbf w}$ terms for $m=1$ and $m=2$ relied only on computing the expectation with respect to $v_{ij}$ and $w_{ij}$ and on the matching of the first and second moments. Since $F'(\Xi)$ is independent of $v_{ij}$ and $w_{ij}$, this argument remains valid even with the $F'(\Xi)$ factor included. We thus need to control the $m=3$ term, i.e., show that
$$N^2(N\eta)N^{-3/2}\,\mathbb EF'(\Xi)\hat R^{(3)}_{\mathbf v}\le N^{-1/6+C\varepsilon}.\tag{17.33}$$
The same bound then also holds if $\mathbf v$ is replaced with $\mathbf w$.
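The exact cancellation for m = 1, 2 only uses that, once the factors independent of the swapped entry (such as F′(Ξ) and the entries of R) are frozen, the 𝐯- and 𝐰-terms are polynomials of degree at most two in that entry, and such expectations depend only on its first two moments. The sketch below illustrates this with two hypothetical real-valued laws that have mean 0 and variance 1 but different third moments, using exact rational arithmetic; the polynomial coefficients are arbitrary stand-ins for the frozen factors. A cubic term, like the m = 3 term, already distinguishes the two laws, which is why (17.33) needs a separate argument.

```python
from fractions import Fraction as Fr

# Hypothetical laws: Rademacher, and a skewed two-point law with the same mean and variance
RADEMACHER = {Fr(-1): Fr(1, 2), Fr(1): Fr(1, 2)}
SKEWED = {Fr(-1, 2): Fr(4, 5), Fr(2): Fr(1, 5)}

def moment(law, p):
    # E[x^p] for a finitely supported law given as {value: probability}
    return sum(prob * x**p for x, prob in law.items())

def expect_poly(law, coeffs):
    # E[sum_p coeffs[p] * x^p] with deterministic coefficients (the frozen factors)
    return sum(c * moment(law, p) for p, c in enumerate(coeffs))

quadratic = [Fr(3), Fr(-7, 2), Fr(5)]           # degree 2: like the m = 1, 2 terms
cubic = [Fr(3), Fr(-7, 2), Fr(5), Fr(1, 4)]     # degree 3: like the m = 3 term

print(expect_poly(RADEMACHER, quadratic), expect_poly(SKEWED, quadratic))
print(expect_poly(RADEMACHER, cubic), expect_poly(SKEWED, cubic))
```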
Using the boundedness of $F'$ and the argument between (17.24) and (17.28) in the previous proof, we see that for (17.33) it is sufficient to show that
$$\Big|N^2(N\eta)N^{-3/2}N^{-3/2}\sum_k\sum_{\ell}^{(i)}\mathbb EF'(\Xi)\big[R^{(i)}_{k\ell}v_{\ell i}v_{ij}v_{ji}v_{ij}R^{(i)}_{jk}\big]\Big|\le N^{-1/6+C\varepsilon}.\tag{17.34}$$
Without the $F'(\Xi)$ factor, the expectation with respect to $v_{\ell i}$ would render this term 0 in the generic case $\ell\ne j$. In order to exploit this effect we need to expand $F'(\Xi)$ in the single variable $v_{\ell i}$.
Fix an index $\ell\ne j$ and let $Q^\ell$ be the matrix identical to $Q$ except that the $(\ell,i)$ and $(i,\ell)$ matrix elements are 0, and let $T=T^\ell=(Q^\ell-z)^{-1}$ be its resolvent. Clearly $T$ is independent of $v_{ij}$ and $v_{\ell i}$, and the local law (17.19) holds for the matrix elements of $T$ as well. Using the resolvent expansion, we find
$$\Xi=(N\eta)\operatorname{Im}\Big[\frac1N\sum_n\Big[T_{nn}-N^{-1/2}\big(T_{n\ell}v_{\ell i}T_{in}+T_{ni}v_{i\ell}T_{\ell n}\big)+\cdots\Big]\Big],\tag{17.35}$$
where the higher-order terms carry a prefactor $N^{-1}$. We Taylor expand $F'(\Xi)$ around
$$\Xi_0:=(N\eta)\operatorname{Im}\frac1N\sum_nT_{nn},$$
and we obtain
$$F'(\Xi)=F'(\Xi_0)-F''(\Xi_0)(N\eta)\operatorname{Im}\Big[\frac{1}{N^{3/2}}\sum_n\big[T_{n\ell}v_{\ell i}T_{in}+T_{ni}v_{i\ell}T_{\ell n}\big]+\cdots\Big]+\cdots.\tag{17.36}$$
Since $\Xi_0$ is independent of $v_{\ell i}$, when we insert the above expansion for $F'(\Xi)$ into (17.34), the contribution of the leading term $F'(\Xi_0)$ vanishes after taking the expectation with respect to $v_{\ell i}$.
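The first-order term in (17.35) comes from the resolvent expansion R = T − TΔT + TΔTΔT − ⋯, where Δ = Q − Q^ℓ has only the two entries N^{−1/2}v_ℓi and N^{−1/2}v_iℓ; in particular, this term enters with a minus sign. The sketch below (real symmetric case; the size, the indices ℓ and i, z, and δ, which stands for N^{−1/2}v_ℓi, are arbitrary) checks that after subtracting this first-order term from Tr R − Tr T the remainder is quadratic in δ, so halving δ should reduce it by a factor close to 4.

```python
import numpy as np

def trace_remainder(q_l, l, i, delta, z):
    # q_l plays the role of Q^l (its (l, i) and (i, l) entries are zero); T = (Q^l - z)^{-1}.
    # Q equals Q^l with those two entries set to delta (= N^{-1/2} v_{l i}, real case).
    n = q_l.shape[0]
    eye = np.eye(n)
    t = np.linalg.inv(q_l - z * eye)
    q = q_l.copy()
    q[l, i] = q[i, l] = delta
    r = np.linalg.inv(q - z * eye)
    # first-order term: -delta * sum_n (T_{n l} T_{i n} + T_{n i} T_{l n})
    first = -delta * (t[:, l] @ t[i, :] + t[:, i] @ t[l, :])
    return np.trace(r) - np.trace(t) - first

rng = np.random.default_rng(3)
n, l, i, z = 8, 1, 4, 0.2 + 0.5j
a = rng.standard_normal((n, n))
q_l = (a + a.T) / np.sqrt(2 * n)
q_l[l, i] = q_l[i, l] = 0.0
ratio = abs(trace_remainder(q_l, l, i, 0.02, z)) / abs(trace_remainder(q_l, l, i, 0.01, z))
print(ratio)   # close to 4 when the remainder is quadratic in delta
```

With the opposite sign on the first-order term, the remainder would itself be linear in δ and the ratio would be close to 2 instead.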
The two subleading terms in (17.36), which are linear in $v_{\ell i}$, generically have two off-diagonal entries, $T_{n\ell}$ and $T_{in}$ (or $T_{ni}$ and $T_{\ell n}$), each of size $N^{-1/3+C\varepsilon}$. So the contribution of this term to (17.34) by simple power counting is
$$N^2(N\eta)N^{-3/2}N^{-3/2}\cdot N\cdot N\cdot(N\eta)N^{-1/2}\big(N^{-1/3+C\varepsilon}\big)^4\le N^{-1/6+C\varepsilon}.$$
A similar argument shows that the higher-order terms in the Taylor expansion (17.36) as well as the higher-order terms in the resolvent expansion (17.35) have a negligible contribution.
As before, we presented the argument for generic indices; coinciding indices give smaller contributions, as explained before. We leave the details to the reader. This completes the proof of (17.30) and thus the proof of Theorem 17.4. □
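As in the first part of the proof, the exponents in the estimates for the Ω and m = 4 terms, the quadratic Taylor term, and the subleading term of (17.36) can be double-checked with exact fractions. The sketch again drops all N^{Cε} factors and uses our own labels; each exponent should be at most −1/6, the precision required in (17.30).

```python
from fractions import Fraction as Fr

NETA = Fr(1, 3)       # N * eta with eta = N^{-2/3}, N^{C eps} factors dropped
OFFDIAG = Fr(-1, 3)   # an off-diagonal resolvent entry
TARGET = Fr(-1, 6)    # the precision required in (17.30)

exponents = {
    # N^2 (N eta) N^{-2} |R^(4)| with two off-diagonal entries
    "m = 4 term": 2 + NETA - 2 + 2 * OFFDIAG,
    # N^2 (N eta) N^{-5/2} |Omega| with |Omega| <= N^{C eps}
    "Omega term": 2 + NETA - Fr(5, 2),
    # N^2 (N eta)^2 N^{-3/2} times four off-diagonal entries
    "quadratic term, m + m' = 3": 2 + 2 * NETA - Fr(3, 2) + 4 * OFFDIAG,
    # subleading term of (17.36) inserted into (17.34): the sums over k and l give N * N,
    # and the F'' factor contributes (N eta) N^{-3/2} times N from the sum over n
    "linear term, subleading": (2 + NETA - 3) + 2 + (NETA - Fr(1, 2)) + 4 * OFFDIAG,
}
for name, e in exponents.items():
    print(f"{name}: N^({e}), within target: {e <= TARGET}")
```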

CHAPTER 18

Further Results and Historical Notes

Spectral statistics of large random matrices have been studied from many different perspectives. The main focus of this book was on one particular result: the Wigner-Dyson-Mehta bulk universality for Wigner matrices in the average energy sense (Theorem 5.1). In this last chapter we collect a few other recent results and open questions of this very active research area. We also give references to related results and summarize the history of the development.

18.1. Wigner Matrices: Bulk Universality in Different Senses


Theorem 5.1, which is also valid for complex Hermitian ensembles, settles
the average energy version of the Wigner-Dyson-Mehta conjecture for gener-
alized Wigner matrices under the moment assumption (5.6) on the matrix ele-
ments. Theorem 5.1 was proved in increasing order of generality in [63,68–70];
see also [131] for the complex Hermitian Wigner ensemble and for a restricted
class of real symmetric matrices (see comments about this class later on). The
following theorem reduces the moment assumption to 4+𝜀 moments. The proof
in [53] also provides an effective speed of convergence in (18.2). We also point
out that the condition (2.6) can be somewhat relaxed (see corollary 8.3 in [55]).
Theorem 18.1 (Universality with averaged energy [53, theorem 7.2]). Suppose that $H=(h_{ij})$ is a complex Hermitian (respectively, real symmetric) generalized Wigner matrix. Suppose that for some constants $\varepsilon>0$, $C>0$,
$$\mathbb E\big|\sqrt N h_{ij}\big|^{4+\varepsilon}\le C.\tag{18.1}$$
Let $n\in\mathbb N$ and let $O:\mathbb R^n\to\mathbb R$ be a test function (i.e., compactly supported and continuous). Fix $|E_0|<2$ and $\xi>0$. Then, with $b_N=N^{-1+\xi}$, we have
$$\lim_{N\to\infty}\int_{E_0-b_N}^{E_0+b_N}\frac{\mathrm dE}{2b_N}\int_{\mathbb R^n}\mathrm d\boldsymbol\alpha\,O(\boldsymbol\alpha)\frac{1}{\varrho_{\mathrm{sc}}(E)^n}\big(p^{(n)}_N-p^{(n)}_{\mathrm G,N}\big)\Big(E+\frac{\boldsymbol\alpha}{N\varrho_{\mathrm{sc}}(E)}\Big)=0.\tag{18.2}$$
Here $\varrho_{\mathrm{sc}}$ is the semicircle law defined in (3.1), $p^{(n)}_N$ is the $n$-point correlation function of the eigenvalue distribution of $H$ (4.20), and $p^{(n)}_{\mathrm G,N}$ is the $n$-point correlation function of an $N\times N$ GUE (respectively, GOE) matrix.
A relatively straightforward consequence of Theorem 18.1 is the average gap universality formulated in the following corollary. The gap distributions of the comparison ensembles, i.e., the Gaussian ensembles, can be explicitly expressed in terms of a Fredholm determinant [37, 40]. Recall the notation $[\![A,B]\!]:=\mathbb Z\cap[A,B]$ for any real numbers $A<B$.
Corollary 18.2 (Gap universality with averaged label). Let $H$ be as in Theorem 18.1 and $O$ be a test function of $n$ variables. Fix small positive constants $\xi,\alpha>0$. Then for any integer $j_0\in[\![\alpha N,(1-\alpha)N]\!]$ we have
$$\lim_{N\to\infty}\frac{1}{2N^{\xi}}\sum_{|j-j_0|\le N^{\xi}}\big[\mathbb E-\mathbb E^{\mathrm G}\big]O\big(N(\lambda_j-\lambda_{j+1}),\dots,N(\lambda_j-\lambda_{j+n})\big)=0.\tag{18.3}$$
Here the $\lambda_j$'s are the ordered eigenvalues, and for brevity we omitted the rescaling factor $\varrho(\lambda_j)$ from the argument of $O$. Moreover, $\mathbb E$ and $\mathbb E^{\mathrm G}$ denote the expectation with respect to the Wigner ensemble $H$ and the Gaussian (GOE or GUE) ensemble, respectively.
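Corollary 18.2 can be illustrated, though of course not proved, by a small Monte Carlo experiment. The sketch below compares a real symmetric Rademacher (±1) Wigner matrix with the GOE, both normalized so that the spectrum fills [−2, 2]. It averages the compactly supported tent function O(x) = max(0, 1 − |x − 1|) of the rescaled nearest-neighbour gaps N ϱ_sc(λ_j)(λ_{j+1} − λ_j) over the labels |j − N/2| ≤ 20; that is, it includes the rescaling factor omitted in (18.3) and uses n = 1 with the gap written as a positive number. Matrix size, sample count, label window, and test function are arbitrary choices, and at N = 200 the two values should agree only up to Monte Carlo and finite-N errors.

```python
import numpy as np

def rho_sc(x):
    return np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)

def wigner(n, rng, kind):
    # Real symmetric matrices normalized so that the spectrum fills [-2, 2]
    if kind == "goe":
        a = rng.standard_normal((n, n))
        return (a + a.T) / np.sqrt(2 * n)
    signs = rng.choice([-1.0, 1.0], size=(n, n))          # Rademacher entries
    return (np.triu(signs) + np.triu(signs, 1).T) / np.sqrt(n)

def averaged_gap_statistic(kind, n=200, samples=40, half_window=20, seed=0):
    # Monte Carlo estimate of the label-averaged E O(N rho_sc(lambda_j)(lambda_{j+1} - lambda_j))
    rng = np.random.default_rng(seed)
    labels = np.arange(n // 2 - half_window, n // 2 + half_window + 1)
    values = []
    for _ in range(samples):
        lam = np.linalg.eigvalsh(wigner(n, rng, kind))    # ascending order
        gaps = n * rho_sc(lam[labels]) * (lam[labels + 1] - lam[labels])
        tent = np.maximum(0.0, 1.0 - np.abs(gaps - 1.0))    # test function O
        values.append(tent.mean())
    return float(np.mean(values))

goe_value = averaged_gap_statistic("goe")
bernoulli_value = averaged_gap_statistic("bernoulli")
print(goe_value, bernoulli_value)
```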
The next result shows that Theorem 18.1 holds under the same conditions without energy averaging.
Theorem 18.3 (Universality at fixed energy [26]). Consider a complex Hermitian (respectively, real symmetric) generalized Wigner matrix with moment condition (18.1). Then, for any $E$ with $|E|<2$ we have
$$\lim_{N\to\infty}\int_{\mathbb R^n}\mathrm d\boldsymbol\alpha\,O(\boldsymbol\alpha)\frac{1}{\varrho_{\mathrm{sc}}(E)^n}\big(p^{(n)}_N-p^{(n)}_{\mathrm G,N}\big)\Big(E+\frac{\boldsymbol\alpha}{N\varrho_{\mathrm{sc}}(E)}\Big)=0.\tag{18.4}$$
The fixed energy universality (18.4) for the 𝛽 = 2 (complex Hermitian)
case was already known earlier (see [57,66,132]). This case is exceptional since
the Harish-Chandra-Itzykson-Zuber identity allows one to compute correlation
functions for Gaussian divisible ensembles. This method relies on an algebraic
identity and has no generalization to other symmetry classes.
Finally, the gap universality with fixed label asserts that (18.3) holds without averaging.
Theorem 18.4 (Gap universality with fixed label [67, theorem 2.2]). Consider a complex Hermitian (respectively, real symmetric) generalized Wigner matrix and assume subexponential decay of the matrix elements instead of (18.1). Then Corollary 18.2 holds without averaging:
$$\lim_{N\to\infty}\big[\mathbb E-\mathbb E^{\mathrm G}\big]O\big(N(\lambda_j-\lambda_{j+1}),\dots,N(\lambda_j-\lambda_{j+n})\big)=0\tag{18.5}$$
for any $j\in[\![\alpha N,(1-\alpha)N]\!]$ with a fixed $\alpha>0$. More generally, for any $k,m\in[\![\alpha N,(1-\alpha)N]\!]$ we have
$$\lim_{N\to\infty}\Big|\mathbb EO\big((N\varrho_k)(\lambda_k-\lambda_{k+1}),\dots,(N\varrho_k)(\lambda_k-\lambda_{k+n})\big)-\mathbb E^{\mathrm G}O\big((N\varrho_m)(\lambda_m-\lambda_{m+1}),\dots,(N\varrho_m)(\lambda_m-\lambda_{m+n})\big)\Big|=0,\tag{18.6}$$
where the local density $\varrho_k$ is defined by $\varrho_k:=\varrho_{\mathrm{sc}}(\gamma_k)$ with $\gamma_k$ from (11.31).

The second part (18.6) of this theorem asserts that the gap distribution is
not only independent of the specific Wigner ensemble, but it is also universal
throughout the bulk spectrum. This is the counterpart of the statement that the
appropriately rescaled correlation functions (18.4) have a limit that is indepen-
dent of 𝐸 (see (4.38)).
Prior to [67], universality for a single gap was only achieved in the spe-
cial case of the Gaussian unitary ensemble (GUE) in [129]; this statement then
immediately implies the same result for complex Hermitian Wigner matrices
satisfying the four moment matching conditions.

18.2. Historical Overview of the Three-Step Strategy for Bulk Universality

We summarize the recent history related to the bulk universality results
in Section 18.1. The three-step strategy first appeared implicitly in [57] in the
context of complex Hermitian matrices and was conceptualized for all symmetry
classes in [63] by linking it to the DBM. Several key components of this strategy
appeared in the later work [63, 64, 69]. We will review the history of Steps 1–3
separately.

18.2.1. History of Step 1. The semicircle law was proved by Wigner for
energy windows of order 1 [138]. Various improvements were made to shrink
the spectral windows; in particular, results down to scale 𝑁 −1/2 were obtained
by combining the results of [11] and [80]. The result at the optimal scale, 𝑁 −1 ,
referred to as the local semicircle law, was established for Wigner matrices in a
series of papers [60–62]. The method was based on a self-consistent equation for
the Stieltjes transform of the eigenvalues, 𝑚(𝑧), and the continuity in the imag-
inary part of the spectral parameter 𝑧. As a by-product, the optimal eigenvector
delocalization estimate was proved. For generalized Wigner matrices there is no
closed equation for $m(z)=N^{-1}\operatorname{Tr}G$. In order to deal with this case, one needed to consider a self-consistent equation for the entire vector $(G_{ii}(z))_{i=1}^N$ consisting of the diagonal matrix elements of the Green function [68, 69]. Together with
the fluctuation averaging lemma, this method implied the optimal rigidity esti-
mate of eigenvalues in the bulk in [68] and up to the edges in [70]. Furthermore,
optimal control was obtained for individual matrix elements of the resolvent 𝐺𝑖𝑗
and not only for its trace. The estimate on 𝐺𝑖𝑖 also provided a simple alternative
proof of the eigenvector delocalization estimate. A comprehensive summary of
these results can be found in [55]. Several further extensions concern weaker
moment conditions and improvements of (log 𝑁)-powers (see, e.g., [32,78,130]
and references therein).

18.2.2. History of Step 2. The universality for Gaussian divisible ensembles was proved by Johansson [86] for complex Hermitian Wigner ensembles in a certain range of the bulk spectrum. This was the first result beyond exact
Gaussian calculations going back to Gaudin, Dyson, and Mehta. It was later ex-
tended to complex sample covariance matrices on the entire bulk spectrum by
Ben Arous and Péché [19]. There were two major restrictions of this method:
(i) the Gaussian component was required to be of order 1 independent of 𝑁;
(ii) it relies on an explicit formula by Brézin-Hikami [30, 31] for the correlation
functions of eigenvalues, which is valid only for Gaussian divisible ensembles
with unitary invariant Gaussian component. The size of the Gaussian compo-
nent was reduced to 𝑁 −1+𝜀 in [57] by using an improved formula for correlation
functions and the local semicircle law from [60–62].
To eliminate the usage of an explicit formula, a conceptual approach for
Step 2 via the local ergodicity of Dyson Brownian motion was initiated in [63].
In [64], a general theorem for the bulk universality was formulated that ap-
plies to all classical ensembles, i.e., real and complex Wigner matrices, real and
complex sample covariance matrices, and quaternion Wigner matrices. The relaxation time to local equilibrium proved in these two papers was not optimal;
the optimal relaxation time, conjectured by Dyson, was obtained later in [70].
The DBM approach in these papers yields the average gap and hence the aver-
age energy universality, while the method via the Brézin-Hikami formula gives
universality in a fixed energy sense but is strictly restricted to the complex Her-
mitian case. The fixed energy universality for all symmetry classes was recently
achieved in [26]. The method in this paper was still based on DBM, but the
analysis of DBM was very different from the one discussed in this book.

18.2.3. History of Step 3. Once universality was established for Gaussian divisible ensembles $e^{-t/2}H_0+(1-e^{-t})^{1/2}H^{\mathrm G}$ (here we used the convention of Chapter 5) for a sufficiently large class of $H_0$ and sufficiently small $t$,
the final step is to approximate the local eigenvalue distribution of a general
Wigner matrix by that of a Gaussian divisible one. The first approximation
result was obtained via the reverse heat flow in [57], which required smoothness
of the distribution of matrix elements. Shortly after, Tao and Vu [131] proved
the four-moment theorem, which asserts that the local statistics of two Wigner
ensembles are the same provided that the first four moments of their matrix
elements are identical. The third comparison method is the Green function
comparison theorem, Theorem 16.1 [63, 68, 69], which was the main content
of Chapter 16. It uses moment conditions similar to those that appeared in [131],
but it provides detailed information on the matrix elements of Green functions
as well. Both [131] and the proof of the Green function comparison theorem
[63, 68, 69] use the local semicircle law [60, 62] and Lindeberg’s idea, which
had appeared in the proof of the Wigner semicircle law by Chatterjee [34]. The
approach [131] requires additional difficult estimates on level repulsions, while
Theorem 16.1 follows directly from the estimates on the matrix elements of
the Green function from Step 1 via standard resolvent expansions. Another comparison method reviewed in this book, Theorem 15.2, can be viewed as a variation of the GFT.
In summary, the WDM conjecture was first solved in the complex Hermitian
case and then for the real symmetric case. In the first case, the bulk universal-
ity for Hermitian matrices with smooth distribution was proved in [57]. The
smoothness requirement was removed in [131], but the complex Bernoulli dis-
tribution was still excluded; this was finally added in [58] (see also [66, 132] for
related discussions).
The result of [131] also implies bulk universality of symmetric Wigner ma-
trices under the restriction that the first four moments of the matrix elements
match those of the GOE. The major roadblock to the WDM conjecture for real
symmetric matrices was the lack of Step 2 in this case. For symmetric matrices,
we do not know any Brézin-Hikami type formula, which was the backbone for
Step 2 in the Hermitian case. The resolution of this difficulty was to abandon
any explicit formula and to prove Dyson’s conjecture on the relaxation of the
Dyson Brownian motion (see Step 2). Together with the Green function com-
parison theorem [63], bulk universality was proved for real symmetric matrices
without any smoothness requirement. In particular, this result covered the real
Bernoulli case [68], which has been a major objective in this subject due to the
application to the adjacency matrices for random graphs. The flexibility of the
DBM approach also allowed one to establish universality beyond the traditional
Wigner ensembles. The first such result was bulk universality for generalized
Wigner matrices [69], followed by many other models (see Section 18.6).
Finally, the technical condition assumed in all these papers, i.e., that the
probability distributions of the matrix elements decay subexponentially, was
reduced to the (4 + 𝜀)-moment assumption (18.1) by using the universality of
Erdős-Rényi matrices [53]. This yielded the bulk universality as formulated in
Theorem 18.1.

18.3. Invariant Ensembles and Log-Gases


The explicit formula (4.3) for the joint density function of the eigenvalues
for the invariant matrix ensemble defined in (4.2) gives rise to a direct statistical
physics interpretation: it can be viewed as a measure 𝜇(𝑁) on 𝑁 point particles
on the real line. Inspired by the Gibbs formalism, we may write it as
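$$\mu^{(N)}(\mathrm d\boldsymbol\lambda)=p_N(\boldsymbol\lambda)\,\mathrm d\boldsymbol\lambda=\mathrm{const}\cdot e^{-\beta N\mathcal H(\boldsymbol\lambda)}\,\mathrm d\boldsymbol\lambda,\qquad\boldsymbol\lambda=(\lambda_1,\dots,\lambda_N),\tag{18.7}$$
where
$$\mathcal H(\boldsymbol\lambda):=\frac12\sum_{j=1}^NV(\lambda_j)-\frac1N\sum_{1\le i<j\le N}\log(\lambda_j-\lambda_i).$$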

Clearly, $\mathcal H$ can be interpreted as the classical Hamiltonian function of a log-gas, i.e., a system of $N$ interacting particles subject to a potential $V$ and a two-body logarithmic interaction. While a direct relation to invariant ensembles exists only for the specific values $\beta=1,2,4$, log-gases may be studied for any $\beta>0$, i.e., at any inverse temperature.
In the case of invariant ensembles, it is well known that for $V$ satisfying certain mild conditions the sequence of one-point correlation functions, or densities, associated with (18.7) has a limit as $N\to\infty$, and the limiting equilibrium density $\varrho_V(s)$ can be obtained as the unique minimizer of the functional
$$I(\nu)=\int_{\mathbb R}V(t)\nu(t)\,\mathrm dt-\int_{\mathbb R}\int_{\mathbb R}\log|t-s|\,\nu(s)\nu(t)\,\mathrm dt\,\mathrm ds.$$
This is the analogue of the Wigner semicircle law for log-gases.
We assume that $\varrho=\varrho_V$ is supported on a single compact interval $[A,B]$ and $\varrho\in C^2(A,B)$. Moreover, we assume that $V$ is regular in the sense that $\varrho$ is strictly positive on $(A,B)$ and vanishes as a square root at the endpoints, i.e.,
$$\varrho(t)=s_A\sqrt{t-A}\,\big(1+O(t-A)\big),\qquad t\to A^+,\tag{18.8}$$
for some constant $s_A>0$, and a similar condition holds at the upper edge. It is known that these conditions are satisfied if, for example, $V$ is strictly convex. In this case $\varrho_V$ satisfies the equation
$$\frac12V'(t)=\mathrm{p.v.}\int_{\mathbb R}\frac{\varrho_V(s)\,\mathrm ds}{t-s}\tag{18.9}$$
for any $t\in(A,B)$. For the Gaussian case, $V(x)=x^2/2$, the equilibrium density is given by the semicircle law, $\varrho_V=\varrho_{\mathrm{sc}}$ (see (3.1)).
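For the Gaussian potential V(x) = x²/2, (18.9) states that p.v.∫ϱ_sc(s) ds/(t − s) = t/2 on (−2, 2). A simple numerical check, sketched below with our own discretization choices, replaces the principal value by Re∫ϱ_sc(s)/(t + iη − s) ds with a small η > 0; this converges to the principal value as η ↓ 0, with a residual bias of order η.

```python
import numpy as np

def rho_sc(s):
    return np.sqrt(np.clip(4.0 - s**2, 0.0, None)) / (2.0 * np.pi)

def smoothed_hilbert(t, eta=1e-3, n_grid=400_001):
    # Re int rho_sc(s) / (t + i eta - s) ds, which tends to p.v. int rho_sc(s) / (t - s) ds
    # as eta -> 0; trapezoidal rule on [-2, 2] with a grid step much smaller than eta
    s = np.linspace(-2.0, 2.0, n_grid)
    f = rho_sc(s) * (t - s) / ((t - s) ** 2 + eta**2)
    h = s[1] - s[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

for t in (-1.2, 0.5, 1.5):
    # (18.9) with V(x) = x^2 / 2 predicts the value V'(t) / 2 = t / 2
    print(t, smoothed_hilbert(t), t / 2)
```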
Under these conditions, the density of $\mu^{(N)}$ converges to $\varrho_V$ even locally down to the optimal scale, and rigidity analogous to Theorem 11.5 holds:
$$\mathbb P^{\mu^{(N)}}\Big(|\lambda_j-\gamma_{j,V}|\ge N^{-2/3+\xi}\min\{j,N-j+1\}^{-1/3}\Big)\le e^{-N^{c}},$$
where the quantiles $\gamma_{j,V}$ of the density $\varrho_V$ are defined by
$$\frac jN=\int_A^{\gamma_{j,V}}\varrho_V(x)\,\mathrm dx,\tag{18.10}$$
similarly to (11.31) (see [24, theorem 2.4]).
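In the Gaussian case ϱ_V = ϱ_sc, the quantiles in (18.10) can be computed from the explicit distribution function ∫_{−2}^{x} ϱ_sc(s) ds = 1/2 + x√(4 − x²)/(4π) + arcsin(x/2)/π. The sketch below inverts it by bisection (the tolerance and function names are our choices). Near the upper edge it reproduces the behavior 2 − γ_{N−k} ≈ (3πk/(2N))^{2/3}, which follows from the square-root vanishing of the density and underlies the N^{−2/3} min{j, N − j + 1}^{−1/3} scale in the rigidity bound.

```python
import math

def semicircle_cdf(x):
    # Integral of rho_sc(s) = sqrt(4 - s^2) / (2 pi) over [-2, x]
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def semicircle_quantile(j, n, tol=1e-12):
    # gamma_j with j / n = int_{-2}^{gamma_j} rho_sc, i.e. (18.10) for V(x) = x^2 / 2
    target = j / n
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 1000
print(semicircle_quantile(n // 2, n))   # approximately 0, the median of the semicircle law
print(2.0 - semicircle_quantile(n - 1, n), (3 * math.pi / (2 * n)) ** (2 / 3))
```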
The higher-order correlation functions, given in Definition 4.2 and rescaled
by the local density 𝜚𝛽,𝑉 (𝐸) around an energy 𝐸 in the bulk, exhibit a universal
behavior in the sense that they depend only on $\beta$ but are independent of the potential $V$. A similar result holds at the edges of the support of the density $\varrho_{\beta,V}$.
This can be viewed as a special realization of the universality hypothesis for the
invariant matrix ensemble. Historically, the universality for the special classical
cases 𝛽 = 1, 2, 4 had been extensively studied via orthogonal polynomial meth-
ods. These methods rely on explicit expressions of the correlation functions in
terms of Fredholm determinants involving orthogonal polynomials. The key
advantage of this method is that it yields explicit formulas for the limiting gap
distributions or correlation functions. However, these explicit expressions are available only for the specific values $\beta=1,2,4$. Within the framework of this approach, the $\beta=1,2,4$ cases of Theorem 18.5 below were proved for very general
potentials for 𝛽 = 2 and for analytic potentials with some additional conditions
for 𝛽 = 1, 4. This is a whole subject by itself, and we can only refer the reader
to some reviews or books [39, 91, 116].
The first general result for nonclassical 𝛽-values is the following theorem
proved in [23–25]. In these works, a new approach to the universality of the
𝛽-ensemble was initiated. It departed from the previously mentioned traditional
approach using explicit expressions for the correlation functions. Instead, it re-
lies on comparing the probability measures of the 𝛽-ensembles to those of the
Gaussian ones. This basic idea of comparing two probability measures directly
was also used in the later works [18, 119], albeit in a different way, where The-
orem 18.5 was reproved under different sets of conditions on the potential 𝑉.

Theorem 18.5 (Universality with averaged energy). Assume the potential $V$ is regular, $V\in C^4(\mathbb R)$, and let $\beta>0$. Consider the $\beta$-ensemble $\mu^{(N)}=\mu^{(N)}_{\beta,V}$ given in (18.7) with correlation functions $p^{(n)}_{V,N}$ defined analogously to (4.20). For the Gaussian case, $V(x)=x^2/2$, the correlation functions are denoted by $p^{(n)}_{\mathrm G,N}$. Let $E_0\in(A,B)$ lie in the interior of the support of $\varrho$, and similarly let $E_0'\in(-2,2)$ be inside the support of $\varrho_{\mathrm{sc}}$. Then for $b_N=N^{-1+\xi}$ with some $\xi>0$ we have
$$\lim_{N\to\infty}\int_{\mathbb R^n}\mathrm d\boldsymbol\alpha\,O(\boldsymbol\alpha)\bigg[\int_{E_0-b_N}^{E_0+b_N}\frac{\mathrm dE}{2b_N}\frac{1}{\varrho(E)^n}p^{(n)}_{V,N}\Big(E+\frac{\boldsymbol\alpha}{N\varrho(E)}\Big)-\int_{E_0'-b_N}^{E_0'+b_N}\frac{\mathrm dE'}{2b_N}\frac{1}{\varrho_{\mathrm{sc}}(E')^n}p^{(n)}_{\mathrm G,N}\Big(E'+\frac{\boldsymbol\alpha}{N\varrho_{\mathrm{sc}}(E')}\Big)\bigg]=0;\tag{18.11}$$
i.e., the correlation functions of $\mu^{(N)}_{\beta,V}$ averaged around $E_0$ asymptotically coincide with those of the Gaussian case. In particular, they are independent of $E_0$.

Universality at fixed energy, i.e., (18.11) without averaging in $E$, was proven by M. Shcherbina in [119] for any $\beta>0$ under certain regularity assumptions
on the potentials. We remark that the gap distribution in the Gaussian case for
any 𝛽 > 0 can be described by an interesting stochastic process [136].
Theorem 18.5 immediately implies gap universality with averaged labels, exactly in the same way as Corollary 18.2 was deduced from Theorem 18.1. To formulate the gap universality with a fixed label, we recall the quantiles $\gamma_{j,V}$ from (18.10) and set
$$\varrho^V_j:=\varrho_V(\gamma_{j,V})\quad\text{and}\quad\varrho_j:=\varrho_{\mathrm{sc}}(\gamma_j)\tag{18.12}$$
to be the limiting densities at the $j$th quantiles. Let $\mathbb E^{\mu_V}$ and $\mathbb E^{\mathrm G}$ denote the expectation w.r.t. the measure $\mu_V$ and its Gaussian counterpart for $V(\lambda)=\frac12\lambda^2$.

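Theorem 18.6 (Gap universality with fixed label [67, theorem 2.3]). We consider the setup of Theorem 18.5 and also assume $\beta \ge 1$. Fix any $\alpha > 0$; then

$$
\begin{aligned}
\lim_{N\to\infty} \Big| &\,\mathbb{E}_{\mu_V} O\big((N\varrho_k^V)(\lambda_k - \lambda_{k+1}), \dots, (N\varrho_k^V)(\lambda_k - \lambda_{k+n})\big) \\
&- \mathbb{E}_{\mu_G} O\big((N\varrho_m)(\lambda_m - \lambda_{m+1}), \dots, (N\varrho_m)(\lambda_m - \lambda_{m+n})\big) \Big| = 0
\end{aligned}
\tag{18.13}
$$

for any $k, m \in \llbracket \alpha N, (1-\alpha)N \rrbracket$. In particular, the distribution of the rescaled gaps w.r.t. $\mu_V$ does not depend on the index $k$ in the bulk.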
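The label-independence in Theorem 18.6 can be illustrated by Monte Carlo in the Gaussian case only ($\beta = 1$, GOE). The sketch below collects rescaled gaps $N\varrho_k(\lambda_{k+1} - \lambda_k)$ in two different bulk windows of labels and compares their first two moments. Eigenvalues are in increasing order here, so the gap is written $\lambda_{k+1} - \lambda_k$. The assumptions are: GOE normalized so that the spectrum fills $[-2, 2]$, quantiles defined by $\int_{-2}^{\gamma_k}\varrho_{sc} = k/N$, and the window choices, which are arbitrary.

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """GOE matrix with off-diagonal variance 1/n, so the spectrum fills [-2, 2]."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


def rho_sc(x: np.ndarray) -> np.ndarray:
    return np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2 * np.pi)


def semicircle_quantiles(n: int) -> np.ndarray:
    """gamma_k with int_{-2}^{gamma_k} rho_sc = k/n, k = 1..n (assumed convention)."""
    x = np.linspace(-2.0, 2.0, 200_001)
    cdf = 0.5 + x * np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (4 * np.pi) + np.arcsin(x / 2) / np.pi
    return np.interp(np.arange(1, n + 1) / n, cdf, x)


def rescaled_gaps(n: int, samples: int, lo_frac: float, hi_frac: float, seed: int = 0) -> np.ndarray:
    """N * rho_k * (lambda_{k+1} - lambda_k) for labels k in a bulk window, over many samples."""
    rng = np.random.default_rng(seed)
    rho_k = rho_sc(semicircle_quantiles(n))
    ks = np.arange(int(lo_frac * n), int(hi_frac * n))  # 0-based labels
    out = []
    for _ in range(samples):
        lam = np.linalg.eigvalsh(goe(n, rng))  # ascending order
        out.append(n * rho_k[ks] * (lam[ks + 1] - lam[ks]))
    return np.concatenate(out)


gaps_a = rescaled_gaps(400, 40, 0.20, 0.35, seed=1)
gaps_b = rescaled_gaps(400, 40, 0.55, 0.70, seed=2)
m2_a, m2_b = np.mean(gaps_a**2), np.mean(gaps_b**2)
print(f"window A: mean {gaps_a.mean():.3f}, second moment {m2_a:.3f}")
print(f"window B: mean {gaps_b.mean():.3f}, second moment {m2_b:.3f}")
```

Both windows should give a mean close to $1$ and nearly equal second moments. These second moments are well below the value $2$ for Poisson spacings, which reflects level repulsion.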
We point out that Theorem 18.5 holds for any 𝛽 > 0, but Theorem 18.6
requires 𝛽 ≥ 1. This is partly due to the fact that the De Giorgi-Nash-Moser reg-
ularity theory was used in [67]. This restriction was later removed by Bekerman-
Figalli-Guionnet [18] under a higher regularity assumption on 𝑉 and some ad-
ditional hypotheses.

18.4. Universality at the Spectral Edge


So far we have stated results for the bulk of the spectrum. Similar results hold at the edge; in this case the “averaged” results are meaningless. In Theorem 17.1 we already presented a universality result for the largest eigenvalue of the Wigner ensemble. That proof easily extends to the joint distribution of finitely many extreme eigenvalues. The following theorem gives a bit more: it controls the finite-dimensional distributions of the first $N^{\kappa}$ eigenvalues for any $\kappa < 1/4$, and it also identifies the limit distribution. We then give the analogous result for log-gases.
Theorem 18.7 (Universality at the edge for Wigner matrices [24]). Let $H$ be a generalized Wigner ensemble with subexponentially decaying matrix elements. Fix $n \in \mathbb{N}$, $\kappa < 1/4$, and a test function $O$ of $n$ variables. Then for any $\Lambda \subset \llbracket 1, N^{\kappa} \rrbracket$ with $|\Lambda| = n$, we have

$$
\Big| (\mathbb{E} - \mathbb{E}^{G})\, O\Big( \big(N^{2/3} j^{1/3} (\lambda_j - \gamma_j)\big)_{j \in \Lambda} \Big) \Big| \le N^{-\chi}
$$

with some $\chi > 0$, where $\mathbb{E}^{G}$ is expectation w.r.t. the standard GOE or GUE ensemble depending on the symmetry class of $H$, and the $\gamma_j$ are the semicircle quantiles (11.31).
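To see the $N^{2/3}$ edge scaling of Theorem 18.7 numerically for the Gaussian reference ensemble, the sketch below samples $N^{2/3}(\lambda_{\max} - 2)$ for GOE matrices. By the symmetry $\lambda \mapsto -\lambda$ of the GOE, the largest eigenvalue plays the role of the extreme label $j = 1$. For simplicity it subtracts the edge $2$ instead of the quantile $\gamma_1$, which only shifts the result by $O(1)$. The normalization and the function name are assumptions. The known Tracy-Widom ($\beta = 1$) mean and standard deviation, about $-1.21$ and $1.27$, are a sanity check; finite-$N$ deviations are expected.

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """GOE matrix with off-diagonal variance 1/n (spectrum asymptotically [-2, 2])."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


def rescaled_top_eigenvalue(n: int, samples: int, seed: int = 0) -> np.ndarray:
    """Samples of N^{2/3} (lambda_max - 2)."""
    rng = np.random.default_rng(seed)
    return np.array([n ** (2 / 3) * (np.linalg.eigvalsh(goe(n, rng))[-1] - 2.0)
                     for _ in range(samples)])


x = rescaled_top_eigenvalue(200, 300, seed=3)
print(f"mean {x.mean():.3f}, std {x.std():.3f}")  # Tracy-Widom (beta = 1): about -1.21 and 1.27
```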
The edge universality for Wigner matrices was first proved by Soshnikov [125] under the assumption that the distribution of the matrix elements is symmetric and has finite moments of all orders. Soshnikov used an elaborate moment matching method, and the conditions on the moments are not easy to improve. A completely different method based on the Green function comparison theorem was initiated in [70]. This method is more flexible in many ways and removes essentially all restrictions of previous works. In particular,
the optimal moment condition on the distribution of the matrix elements was
obtained by Lee and Yin [98]. All these works assume that the variances of the
matrix elements are identical. The main point of Theorem 18.7 is to consider
generalized Wigner matrices, i.e., matrices with nonconstant variances. It was
proved in [70, theorem 17.1] that the edge statistics for any generalized Wigner
matrix coincide with those of a generalized Gaussian Wigner matrix with the
same variances, but it was not shown that the statistics are independent of the
variances themselves. Theorem 18.7 provides this missing step and thus proves
the edge universality for the generalized Wigner matrices.
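Theorem 18.8 (Universality at the edge for log-gases [24]). Let $\beta \ge 1$ and let $V$ (resp., $\tilde V$) be in $C^4(\mathbb{R})$, regular, such that the equilibrium density $\varrho_V$ (resp., $\varrho_{\tilde V}$) is supported on a single interval $[A, B]$ (resp., $[\tilde A, \tilde B]$). Without loss of generality we assume that for both densities (18.8) holds with $A = 0$ and with the same constant $s_A$. Fix $n \in \mathbb{N}$ and $\kappa < 2/5$. Then for any $\Lambda \subset \llbracket 1, N^{\kappa} \rrbracket$ with $|\Lambda| = n$, we have

$$
\Big| (\mathbb{E}_{\mu_V} - \mathbb{E}_{\mu_{\tilde V}})\, O\Big( \big(N^{2/3} j^{1/3} (\lambda_j - \gamma_j)\big)_{j \in \Lambda} \Big) \Big| \le N^{-\chi} \tag{18.14}
$$

with some $\chi > 0$. Here, $\gamma_j$ are the quantiles w.r.t. the density $\varrho_V$ (18.10).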
The history of edge universality runs in parallel to the bulk one in the sense
that most bulk methods were extended to the edge case. Similar to the bulk case,
the edge universality was first proved via orthogonal polynomial methods for
the classical values of 𝛽 = 1, 2, 4 in [38, 42, 111, 117]. The first edge universality results for general 𝛽 were given independently in [24] (valid for general potentials and 𝛽 ≥ 1) and, with a completely different method, in [92] (valid for strictly convex potentials and any 𝛽 > 0). The later work [18] also covered the edge case and proved universality for all 𝛽 > 0 under a higher regularity assumption on 𝑉 and an additional condition that can be checked for convex 𝑉. Some open questions
concern the universal behavior at the nonregular edges where even the scaling
may change.

18.5. Eigenvectors
Universality questions have been traditionally formulated for eigenvalues,
but they naturally extend to eigenvectors as well. They are closely related to two
different important physical phenomena: delocalization and quantum ergod-
icity.
The concept of delocalization stems from the basic theory of random Schrödinger operators describing a single quantum particle subject to a random potential $V$ in $\mathbb{R}^d$. If the potential decays at infinity, then the operator $-\Delta + V$ acting on $L^2(\mathbb{R}^d)$ has a discrete pure point spectrum with $L^2$-eigenfunctions (localized or bound states) in the low-energy regime. In the high-energy regime it has absolutely continuous spectrum with bounded but not $L^2$-normalizable generalized eigenfunctions. The latter are also called delocalized or extended states, and they correspond to scattering and conductance. If the potential $V$ is a stationary, ergodic random field, then the celebrated Anderson localization [9] occurs: at high disorder, a dense pure point spectrum with exponentially decaying eigenfunctions appears even in the energy regime that would classically correspond to scattering states. The localized regime has been mathematically
well understood, both in the high-disorder regime and in the regime of low density of states (near the spectral edges), since the fundamental works of Fröhlich and Spencer [74] and, later, of Aizenman and Molchanov [1]. The local spectral statistics are Poisson [108], reflecting the intuition that exponentially decaying eigenfunctions typically have no overlap, so their energies are independent.
The low-disorder regime, however, is mathematically wide open. In 𝑑 = 3
or higher dimensions it is conjectured that −Δ + 𝑉 has absolutely continuous
spectrum (away from the spectral edges), the corresponding eigenfunctions are
delocalized, and the local eigenvalue statistics of the restriction of −Δ + 𝑉 to a
large finite box follow the GOE local statistics. In other words, a phase transi-
tion is believed to occur as the strength of the disorder varies; this is called the
Anderson metal-insulator transition. Currently, even the existence of this ex-
tended states regime is not known; this is considered one of the most important
open questions in mathematical physics. Only the Bethe lattice (regular tree) is
understood [2, 87], but this model exhibits Poisson local statistics [3] due to its
exponentially growing boundary.
The strong link between local eigenvalue statistics and delocalization for
random Schrödinger operators naturally raises the question about the structure
of the eigenvectors of an $N \times N$ Wigner matrix $H$. Complete delocalization in this context would mean that any $\ell^2$-normalized eigenvector $\mathbf{u} = (u_1, \dots, u_N)$ of $H$ is substantially supported on each coordinate; ideally $|u_i|^2 \approx N^{-1}$ for each $i$. Due to fluctuations, this can hold only in a somewhat weaker sense. The first type of results provides an upper bound on the $\ell^\infty$-norm with very high probability:

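Theorem 18.9. Let $\mathbf{u}_\alpha$, $\alpha = 1, \dots, N$, be the $\ell^2$-normalized eigenvectors of a generalized Wigner matrix $H$ satisfying the moment condition (5.6). Then, for any (small) $\varepsilon > 0$ and (large) $D > 0$ we have

$$
\mathbb{P}\Big( \exists \alpha : \|\mathbf{u}_\alpha\|_\infty \ge \frac{N^{\varepsilon}}{\sqrt{N}} \Big) \le N^{-D}. \tag{18.15}
$$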
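This theorem directly follows from the local semicircle law, Theorem 6.7. Indeed, by spectral decomposition,

$$
\operatorname{Im} G_{ii}(z) = \eta \sum_{\beta} \frac{|u_\beta(i)|^2}{(E - \lambda_\beta)^2 + \eta^2} \ge \frac{|u_\alpha(i)|^2}{2\eta}, \qquad z = E + \mathrm{i}\eta,
$$

where we chose $E$ in the $\eta$-vicinity of $\lambda_\alpha$, i.e., $|E - \lambda_\alpha| \le \eta$, so that the $\beta = \alpha$ term alone gives the lower bound. Since by (6.31) the left-hand side is bounded by $N^{\varepsilon}(N\eta)^{-1}$ with very high probability, we get $|u_\alpha(i)|^2 \le 2N^{-1+\varepsilon}$, which yields (18.15) after renaming $\varepsilon$. Under stronger decay conditions on $H$, the local semicircle law can be slightly improved, and thus the $N^{\varepsilon}$ tolerance factor may be changed to a log-power and the probability estimate improved to subexponential.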
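Both ingredients of the argument can be checked on a single GOE sample. The first is the spectral decomposition of $\operatorname{Im} G_{ii}$ together with the one-term lower bound. The second is the resulting $\ell^\infty$ bound, which numerically grows only like $\sqrt{\log N}$ after multiplying by $\sqrt{N}$. The sketch below assumes a GOE matrix with off-diagonal variance $1/N$. The choices of $\alpha$, $i$ and $\eta = 1/N$ are arbitrary illustrations, not the parameters of (6.31).

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """GOE matrix with off-diagonal variance 1/n."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


rng = np.random.default_rng(4)
n = 500
h = goe(n, rng)
lam, u = np.linalg.eigh(h)  # column u[:, alpha] is the l2-normalized eigenvector of lam[alpha]

# Im G_ii at z = E + i*eta with |E - lambda_alpha| <= eta.
alpha, i, eta = n // 2, 7, 1.0 / n
E = lam[alpha] + 0.5 * eta
g = np.linalg.inv(h - (E + 1j * eta) * np.eye(n))
im_gii_direct = g[i, i].imag
im_gii_spectral = eta * np.sum(u[i, :] ** 2 / ((E - lam) ** 2 + eta**2))
lower_bound = u[i, alpha] ** 2 / (2 * eta)  # the single beta = alpha term, bounded below

# Delocalization: sqrt(N) * max over alpha, i of |u_alpha(i)|.
sup_norm_scaled = np.sqrt(n) * np.abs(u).max()
print(f"Im G_ii: direct {im_gii_direct:.6f}, spectral {im_gii_spectral:.6f}, bound {lower_bound:.6f}")
print(f"sqrt(N) max |u_alpha(i)| = {sup_norm_scaled:.2f}, sqrt(log N) = {np.sqrt(np.log(n)):.2f}")
```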
Although this bound prevents concentration of eigenvectors onto a set of size less than $N^{1-\varepsilon}$, it does not imply the “complete flatness” of eigenvectors in the sense that $|u_\alpha(i)| \approx N^{-1/2}$. Note that by the complete $U(N)$ or $O(N)$ symmetry, the eigenvectors of a GUE or GOE matrix are distributed according to the Haar measure on the $N$-dimensional unit sphere; moreover, the distribution of a particular coordinate $u_\alpha(i)$ of a fixed eigenvector $\mathbf{u}_\alpha$, rescaled by $\sqrt{N}$, is asymptotically Gaussian. The same result holds for generalized Wigner matrices by theorem 1.2 of [28] (a similar result was also obtained in [88, 133] under the condition that the first four moments match those of the Gaussian):
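Theorem 18.10 (Asymptotic normality of eigenvectors). Let $\mathbf{u}_\alpha$ be the eigenvectors of a generalized Wigner matrix with moment condition (5.6). Then there is a $\delta > 0$ such that for any $m \in \mathbb{N}$ and

$$
I \subset T_N := \llbracket 1, N^{1/4} \rrbracket \cup \llbracket N^{1-\delta}, N - N^{1-\delta} \rrbracket \cup \llbracket N - N^{1/4}, N \rrbracket \quad \text{with } |I| = m,
$$

and for any unit vector $\mathbf{q} \in \mathbb{R}^N$, the vector $\big(\sqrt{N}\,|\langle \mathbf{q}, \mathbf{u}_\alpha \rangle|\big)_{\alpha \in I} \in \mathbb{R}^m$ is asymptotically normal in the sense of finite moments.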
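For the GOE, the statement can be probed by comparing moments of $x = \sqrt{N}\,|\langle \mathbf{q}, \mathbf{u}_\alpha\rangle|$ with those of the absolute value of a standard Gaussian, i.e., $\mathbb{E}x^2 = 1$ and $\mathbb{E}x^4 = 3$. The sketch below does this by Monte Carlo over many GOE samples and a block of bulk labels. It assumes the GOE normalization used above. The flat unit vector $\mathbf{q}$ and the label block are hypothetical choices; for the GOE any fixed unit vector gives the same law by orthogonal invariance.

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """GOE matrix with off-diagonal variance 1/n."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


def projected_eigenvector_samples(n: int, samples: int, labels: np.ndarray, seed: int = 0) -> np.ndarray:
    """Samples of sqrt(N) |<q, u_alpha>| for alpha in labels, q a fixed unit vector."""
    rng = np.random.default_rng(seed)
    q = np.ones(n) / np.sqrt(n)  # hypothetical fixed unit vector
    out = []
    for _ in range(samples):
        _, u = np.linalg.eigh(goe(n, rng))
        out.append(np.sqrt(n) * np.abs(q @ u[:, labels]))
    return np.concatenate(out)


x = projected_eigenvector_samples(100, 400, labels=np.arange(40, 60), seed=5)
print(f"E x^2 = {np.mean(x**2):.3f} (Gaussian: 1), E x^4 = {np.mean(x**4):.3f} (Gaussian: 3)")
```

For finite $N$ the exact Haar values are $\mathbb{E}x^2 = 1$ and $\mathbb{E}x^4 = 3N/(N+2)$, so a slight deficit in the fourth moment is expected.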
Moreover, the study of eigenvectors leads to another fundamental problem
of mathematical physics: quantum ergodicity. Recall that the quantum ergodicity theorem (Shnirel′man [123], Colin de Verdière [35], and Zelditch [141])
asserts that “most” eigenfunctions for the Laplacian on a compact Riemannian
manifold with ergodic geodesic flow are completely flat. For 𝑑-regular graphs
under certain assumptions on the injectivity radius and spectral gap of the ad-
jacency matrices, similar results were proved for eigenvectors of the adjacency
matrices [7]. A stronger notion of quantum ergodicity, the quantum unique ergodicity (QUE) proposed by Rudnick and Sarnak [115], demands that all high-energy eigenfunctions become completely flat, and it is conjectured to hold for negatively curved compact Riemannian manifolds. One case for which QUE was
rigorously proved concerns arithmetic surfaces, thanks to tools from number
theory and ergodic theory on homogeneous spaces [82, 83, 101]. For Wigner
matrices, a probabilistic version of QUE was settled in corollary 1.4 of [28]:
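Theorem 18.11 (Quantum unique ergodicity for Wigner matrices). Let $\mathbf{u}_\alpha$ be the eigenvectors of a generalized Wigner matrix with moment condition (5.6). Then there exists $\varepsilon > 0$ such that for any deterministic $1 \le \alpha \le N$ and $I \subset \llbracket 1, N \rrbracket$, for any $\delta > 0$ we have

$$
\mathbb{P}\bigg( \Big| \sum_{i \in I} |u_\alpha(i)|^2 - \frac{|I|}{N} \Big| \ge \delta \bigg) \le \frac{N^{-\varepsilon}}{\delta^2}. \tag{18.16}
$$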
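The quantity controlled in (18.16) is easy to compute. The sketch below takes one GOE sample and the deterministic set $I$ of the first $N/2$ coordinates (a hypothetical choice). For every eigenvector it computes the mass $\sum_{i\in I}|u_\alpha(i)|^2$ and compares it with $|I|/N = 1/2$. For Haar-distributed vectors the deviation is of order $N^{-1/2}$, so it stays small even when maximized over all $\alpha$. The GOE normalization is an assumption.

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """GOE matrix with off-diagonal variance 1/n."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


rng = np.random.default_rng(6)
n = 500
_, u = np.linalg.eigh(goe(n, rng))
I = np.arange(n // 2)  # hypothetical deterministic index set: first half of the coordinates
mass = np.sum(u[I, :] ** 2, axis=0)  # sum over i in I of |u_alpha(i)|^2, for every alpha
deviation = np.abs(mass - len(I) / n)
print(f"max over alpha: {deviation.max():.3f}, median: {np.median(deviation):.3f}")
```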

18.6. General Mean Field Models


Wigner matrices and their generalizations in Definition 2.1 are the most
prominent mean field random matrix ensembles, but there are several other
models of interest. Here we give a partial list. A more detailed account of the results and citations can be found in the few representative references given below.
The three-step strategy opens up a path to investigate local spectral statis-
tics of a large class of matrices. In many examples the limiting density of the
eigenvalues is not the semicircle anymore, so the Dyson Brownian motion is not
in global equilibrium and Step 2 is more complicated. In the spirit of Dyson’s conjecture, however, gap universality already follows from the local equilibration of DBM. Indeed, a local version of the DBM was used as a basic tool in [25]. A more recent method for comparing the local and global dynamics of DBM, based on a coupling and homogenization argument, was implemented in [26]. It turns out that as long as the initial condition of the DBM is regular on a certain scale, the local gap statistics reach their equilibrium, i.e., the Wigner-Dyson-Mehta distribution, on the corresponding time scale [93] (see also [65] for a similar result). This reduces the proof of universality to a proof of the corresponding local law.
One natural class of mean field ensembles are the deformed Wigner matri-
ces; these are of the form 𝐻 = 𝑉 + 𝑊, where 𝑉 is a diagonal matrix and 𝑊 is a
standard Wigner matrix. In fact, after a trivial rescaling, these matrices may be viewed as instances of the DBM with initial condition $H_0 = V$, recalling that the law of the DBM at time $t$ is given by $H_t = e^{-t/2} V + (1 - e^{-t})^{1/2} W$. Assuming that the diagonal elements of $V$ have a limiting density $\varrho_V$, the limiting density of $H_t$ is given by the free convolution of the (rescaled) $\varrho_V$ with the (rescaled) semicircle density. The corresponding local law was proved in [94], followed by the proof of bulk universality in [97] and of edge universality in [95, 97].
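The matrix form $H_t = e^{-t/2}V + (1-e^{-t})^{1/2}W$ can be sampled directly. The sketch below assumes $W$ is a GOE matrix with off-diagonal variance $1/N$ and $V$ is a hypothetical diagonal matrix with half of its entries at $-2$ and half at $+2$. It checks one simple consequence of the free convolution: variances of centered laws add under $\boxplus$. Hence $\frac{1}{N}\operatorname{tr} H_t^2$ should be close to $e^{-t} m_2(\varrho_V) + (1 - e^{-t})$, where $m_2(\varrho_V) = 4$ here.

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """GOE matrix with off-diagonal variance 1/n (semicircle on [-2, 2])."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


def dbm_matrix(v: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Sample H_t = e^{-t/2} diag(v) + (1 - e^{-t})^{1/2} W with W a GOE matrix."""
    return np.exp(-t / 2) * np.diag(v) + np.sqrt(1 - np.exp(-t)) * goe(len(v), rng)


def second_moments(n: int = 1000, ts=(0.1, 1.0, 5.0), seed: int = 7):
    """Empirical (1/N) tr H_t^2 versus e^{-t} m2(V) + (1 - e^{-t}) for several times t."""
    rng = np.random.default_rng(seed)
    v = np.where(np.arange(n) < n // 2, -2.0, 2.0)  # hypothetical V: half at -2, half at +2
    rows = []
    for t in ts:
        eig = np.linalg.eigvalsh(dbm_matrix(v, t, rng))
        predicted = np.exp(-t) * np.mean(v**2) + (1 - np.exp(-t))
        rows.append((t, float(np.mean(eig**2)), float(predicted)))
    return rows


results = second_moments()
for t, emp, pred in results:
    print(f"t={t:4.1f}  empirical m2={emp:.3f}  predicted={pred:.3f}")
```

Only the second moment is checked here. Recovering the full free-convolution density would require solving the subordination equation, which this sketch does not attempt.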
The free convolution of two arbitrary densities, $\varrho_\alpha \boxplus \varrho_\beta$, naturally appears in random matrix theory as the limiting density of the sum of two asymptotically free matrices $A$ and $B$ whose limiting densities are given by $\varrho_\alpha$ and $\varrho_\beta$. Moreover, for any deterministic $A$ and $B$, the matrices $A$ and $UBU^*$ are asymptotically free [137], where $U$ is a Haar distributed unitary matrix. Thus, the eigenvalue density of $A + UBU^*$ is asymptotically given by the free convolution $\varrho_\alpha \boxplus \varrho_\beta$. The corresponding local law on the optimal scale in the bulk was given in [15].
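A standard worked example is $A$ and $B$ both with spectrum half $-1$ and half $+1$. The free convolution of two symmetric Bernoulli laws is the arcsine law on $[-2, 2]$, whose even moments are the central binomial coefficients: $m_2 = 2$ and $m_4 = 6$. The sketch below samples a Haar unitary by QR of a complex Gaussian matrix with the phases of $R$ normalized, a standard construction. It then compares the moments of the eigenvalues of $A + UBU^*$ with these values. The matrix size and seed are arbitrary.

```python
import numpy as np


def haar_unitary(n: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-distributed unitary via QR of a complex Gaussian matrix with phase correction."""
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # multiply column j by the phase of r[j, j]


def free_sum_moments(n: int = 400, seed: int = 8):
    """Second and fourth moments of the eigenvalues of A + U B U^* for +-1 spectra."""
    rng = np.random.default_rng(seed)
    signs = np.where(np.arange(n) < n // 2, -1.0, 1.0)
    a = np.diag(signs)
    b = np.diag(signs)
    u = haar_unitary(n, rng)
    eig = np.linalg.eigvalsh(a + u @ b @ u.conj().T)
    return float(np.mean(eig**2)), float(np.mean(eig**4))


m2, m4 = free_sum_moments()
print(f"m2 = {m2:.3f} (arcsine: 2), m4 = {m4:.3f} (arcsine: 6)")
```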
Another way to generate a matrix ensemble with a limiting density differ-
ent from the semicircle law is to consider Wigner-type matrices that are further
generalizations of the universal Wigner matrices from Definition 2.1. These are
complex Hermitian or real symmetric matrices with centered independent (up
to symmetry) matrix elements but without the condition that the sum of the
variances in each row is constant (2.5). The limiting density is determined by
the matrix of variances $S$ by solving the corresponding Dyson-Schwinger equation. Depending on $S$, the density may exhibit a square root singularity similar to the edge of the semicircle law or a cubic root singularity [5]. The corresponding
optimal local law and bulk universality were proved in [4].
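To show how a density is obtained from $S$, the sketch below assumes the Dyson-Schwinger equation takes the form of the vector Dyson equation $-1/m_i(z) = z + \sum_j s_{ij} m_j(z)$ with $s_{ij} = \mathbb{E}|h_{ij}|^2$. The density is read off as $\varrho(E) \approx \frac{1}{\pi N}\sum_i \operatorname{Im} m_i(E + \mathrm{i}\eta)$ at a small fixed $\eta$. The damped fixed-point iteration is a simple illustrative solver, not the method of [4, 5]. Each update stays in the upper half-plane because $-1/w$ maps it into itself. The two-block profile is hypothetical. For the constant profile $s_{ij} = 1/N$ the output should reproduce the semicircle up to an $O(\eta)$ error.

```python
import numpy as np


def solve_vector_dyson(S: np.ndarray, z: complex, damping: float = 0.5,
                       tol: float = 1e-12, max_iter: int = 10_000) -> np.ndarray:
    """Damped fixed-point iteration for -1/m = z + S m, started at m = i."""
    m = np.full(S.shape[0], 1j, dtype=complex)
    for _ in range(max_iter):
        m_new = damping * m + (1 - damping) * (-1.0 / (z + S @ m))
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m


def density(S: np.ndarray, E: float, eta: float = 1e-2) -> float:
    """Approximate self-consistent density (1/(pi N)) sum_i Im m_i(E + i eta)."""
    return float(np.mean(solve_vector_dyson(S, E + 1j * eta).imag) / np.pi)


n = 60
flat = np.full((n, n), 1.0 / n)  # constant variance profile: semicircle law
block = np.full((n, n), 0.5 / n)  # hypothetical two-block profile with unequal row sums
block[: n // 2, : n // 2] = 2.0 / n
for E in (0.0, 1.0, 1.8):
    print(f"E={E}: flat {density(flat, E):.4f}, "
          f"semicircle {np.sqrt(4 - E**2) / (2 * np.pi):.4f}, block {density(block, E):.4f}")
```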
Different complications arise for adjacency matrices of sparse graphs. Each
edge of the Erdős-Rényi graph is chosen independently with probability 𝑝, so it
is a mean field model with a semicircle density, but a typical realization of the matrix contains many zero elements if $p \ll 1$. The optimal local law was proved in [56], and bulk universality for $p \gg N^{-1/3}$ in [53] and for $p \gg N^{-1}$ in [84]. Another prominent model of random graphs is the uniform measure on $d$-regular graphs. The elements of the adjacency matrix are weakly dependent since their sum in every row is $d$. For $d \gg 1$ the limiting density is the semicircle law, and the optimal local law was obtained in [17]. Bulk universality was shown in the regime $1 \ll d \ll N^{2/3}$ in [16].
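The Erdős-Rényi picture above (mostly zero entries, yet a semicircle bulk) can be seen in a short simulation. The sketch below builds the adjacency matrix of $G(N, p)$ without self-loops and scales it by $1/\sqrt{Np(1-p)}$, an assumed normalization. It then reports the fraction of zero entries and the extreme eigenvalues. Because the entries are not centered, one Perron outlier appears far above the bulk, while the remaining eigenvalues stay close to $[-2, 2]$. Only the Erdős-Rényi case is simulated; $d$-regular graphs need a different sampler.

```python
import numpy as np


def erdos_renyi_normalized(n: int, p: float, rng: np.random.Generator) -> np.ndarray:
    """Adjacency matrix of G(n, p) (no self-loops) scaled by 1/sqrt(n p (1 - p))."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    adj = upper + upper.T
    return adj / np.sqrt(n * p * (1 - p))


rng = np.random.default_rng(10)
n, p = 1000, 0.05
h = erdos_renyi_normalized(n, p, rng)
eig = np.linalg.eigvalsh(h)  # ascending order
zero_fraction = np.mean(h == 0)
print(f"fraction of zero entries: {zero_fraction:.3f}")
print(f"largest eigenvalue (Perron outlier): {eig[-1]:.2f}")
print(f"rest of the spectrum: [{eig[0]:.3f}, {eig[-2]:.3f}] (semicircle support [-2, 2])")
```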
Sample covariance matrices (2.9) and especially their deformations with fi-
nite rank deterministic matrices play an important role in statistics. The first
application of the three-step strategy to prove bulk universality was presented
in [64]. The edge universality was achieved in [90,96]. For applications in prin-
cipal component analysis, the main focus is on the extreme eigenvalues and the
outliers, whose detailed analysis was given in [22]. Here we have listed only papers using methods closely related to this book; there are many other references on this subject that we are unable to include here.
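The bulk and outlier phenomena for sample covariance matrices can be illustrated numerically. The sketch assumes an $M \times N$ data matrix $X$ with i.i.d. standard Gaussian entries and the normalization $XX^T/N$; the normalization in (2.9) may differ. With $y = M/N < 1$, the spectrum then fills the Marchenko-Pastur interval $[(1-\sqrt{y})^2, (1+\sqrt{y})^2]$. For the finite-rank deformation, the first row is multiplied by $\sqrt{1+\theta}$, which gives population covariance $\operatorname{diag}(1+\theta, 1, \dots, 1)$. When $\theta > \sqrt{y}$, the top eigenvalue separates from the bulk and should sit near $(1+\theta)(1 + y/\theta)$, the classical BBP outlier location. The values of $M$, $N$ and $\theta$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)
m, n = 200, 800  # dimension M and sample size N, ratio y = M / N = 0.25
y = m / n
lower, upper = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2

# Null case: population covariance = identity.
x = rng.normal(size=(m, n))
eig = np.linalg.eigvalsh(x @ x.T / n)

# Rank-one deformation: population covariance diag(1 + theta, 1, ..., 1).
theta = 2.0
x_spiked = rng.normal(size=(m, n))
x_spiked[0] *= np.sqrt(1 + theta)
eig_spiked = np.linalg.eigvalsh(x_spiked @ x_spiked.T / n)
predicted_outlier = (1 + theta) * (1 + y / theta)

print(f"null spectrum: [{eig[0]:.3f}, {eig[-1]:.3f}], Marchenko-Pastur [{lower:.3f}, {upper:.3f}]")
print(f"mean eigenvalue: {eig.mean():.3f} (Marchenko-Pastur mean: 1)")
print(f"spiked: top {eig_spiked[-1]:.3f} (predicted {predicted_outlier:.3f}), second {eig_spiked[-2]:.3f}")
```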

18.7. Beyond Mean Field Models: Band Matrices


Wigner predicted that universality should hold for any large quantum sys-
tem, described by a Hamiltonian 𝐻, of sufficient complexity. After discretiza-
tion of the underlying physical state space, the Hamilton operator is given by
a matrix 𝐻 whose matrix elements 𝐻𝑥𝑦 describe the quantum transition rate
from state 𝑥 to 𝑦. Generalized Wigner matrices correspond to a mean field sit-
uation where transition from any state 𝑥 to 𝑦 is possible. In contrast, random
Schrödinger operators of the form $-\Delta + V$ defined on $\mathbb{Z}^d$ allow transition only
to neighboring lattice sites with a random on-site interaction, so they are pro-
totypes of disordered quantum systems with a nontrivial spatial structure.
As explained in Section 18.5, from the point of view of the Anderson phase
transition, generalized Wigner matrices are always in the delocalized regime.
Random Schrödinger operators in 𝑑 = 1 dimension, or more general random
matrices 𝐻 representing one-dimensional Hamiltonians with short-range ran-
dom hoppings, are in the localized regime. It is therefore natural to vary the
range of interaction to detect the phase transition.
One popular model interpolating between the Wigner matrices and random Schrödinger operators is the random band matrix (see Example 6.1). In this
model the physical state space, which labels the matrix elements, is equipped
with a distance. Band matrices are characterized by the property that 𝐻𝑖𝑗 be-
comes negligible if dist(𝑖, 𝑗) exceeds a certain parameter 𝑊, called the band-
width. A fundamental conjecture [76] states that the local spectral statistics of a
band matrix 𝐻 are governed by random matrix statistics for large 𝑊 and by Pois-
son statistics for small 𝑊. The transition is conjectured to be sharp [76,126] for
the band matrices in one spatial dimension around the critical value 𝑊 = √𝑁.
In other words, if 𝑊 ≫ √𝑁, we expect the universality results of [26, 57, 63, 67]
to hold. Furthermore, the eigenvectors of 𝐻 are expected to be completely de-
localized in this range. For 𝑊 ≪ √𝑁, one expects that the eigenvectors are
exponentially localized. This is the analogue of the celebrated Anderson metal-
insulator transition for random band matrices. The only rigorous work indicat-
ing the √𝑁 threshold concerns the second mixed moments of the characteristic
polynomial for a special class of Gaussian band matrices [120, 122].
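The heuristic picture (localized eigenvectors for small $W$, delocalized ones for large $W$) can be seen in a small simulation. The sketch uses a periodic one-dimensional band model: entries vanish unless the cyclic distance $|i - j|$ is at most $W$, and the nonzero entries are Gaussian with variance $1/(2W+1)$. This variance profile is an assumption and may differ from Example 6.1. For several $W$ it reports the mean inverse participation ratio $\sum_i |u(i)|^4$ of bulk eigenvectors. This is about $3/N$ for fully delocalized vectors and of order $1/(\text{localization length})$ for localized ones. At such small $N$ this illustrates the trend only; it says nothing about the sharpness of the $\sqrt{N}$ threshold.

```python
import numpy as np


def band_matrix(n: int, w: int, rng: np.random.Generator) -> np.ndarray:
    """Symmetric band matrix on Z/nZ: H_ij = 0 unless dist(i, j) <= w, variance 1/(2w+1) inside."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    mask = np.minimum(d, n - d) <= w  # periodic distance
    a = rng.normal(size=(n, n)) / np.sqrt(2 * w + 1)
    h = np.triu(a) + np.triu(a, k=1).T  # symmetrize from the upper triangle
    return h * mask


def mean_ipr(n: int, w: int, seed: int = 0) -> float:
    """Average of sum_i |u(i)|^4 over eigenvectors in the middle half of the spectrum."""
    rng = np.random.default_rng(seed)
    _, u = np.linalg.eigh(band_matrix(n, w, rng))
    bulk = u[:, n // 4: 3 * n // 4]
    return float(np.mean(np.sum(bulk**4, axis=0)))


n = 400
for w in (1, 5, 20, n // 2):  # w = 20 = sqrt(N); w = N/2 gives a full (Wigner-like) matrix
    print(f"W={w:3d}: mean IPR = {mean_ipr(n, w, seed=12):.4f}  (delocalized ~ 3/N = {3 / n:.4f})")
```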
The localization length for band matrices in one spatial dimension was re-
cently investigated in numerous works. For general distribution of the matrix
entries, eigenstates were proved to be localized [116] for $W \ll N^{1/8}$, and delocalization of most eigenvectors in a certain averaged sense holds for $W \gg N^{6/7}$ [50, 51], improved to $W \gg N^{4/5}$ [54]. The Green’s function $(H - z)^{-1}$ was controlled down to the scale $\operatorname{Im} z \gg W^{-1}$ in [69], implying a lower bound of order $W$ for the localization length of all eigenvectors. When the entries are Gaussian with some specific covariance profiles, supersymmetry techniques are applicable to obtain stronger results. This approach was first developed by physicists (see [49] for an overview); the rigorous analysis was initiated by Spencer (see [126] for an overview) with an accurate estimate on the expected density of states on arbitrarily short scales for a three-dimensional band matrix ensemble in [44]. More recent works include universality for $W \ge cN$ [121] and the control of the Green’s function down to the optimal scale $\operatorname{Im} z \gg N^{-1}$, hence delocalization in a strong sense for all eigenvectors, when $W \gg N^{6/7}$ [14] with first four moments matching the Gaussian ones (both results require a block structure and hold in part of the bulk spectrum).
While delocalization and Wigner-Dyson-Mehta spectral statistics are ex-
pected to occur simultaneously, there is no rigorous argument directly linking
them. The Dyson Brownian motion, the cornerstone of the three-step strategy,
proves universality for matrices where each entry has a nontrivial Gaussian com-
ponent and the comparison ideas require that second moments match exactly.
Therefore, this approach cannot be applied to matrices with many zero entries.
However, a combination of the quantum unique ergodicity with a mean field
reduction strategy in [27] yields Wigner-Dyson-Mehta bulk universality for a
large class of band matrices with general distribution in the large bandwidth
regime 𝑊 ≥ 𝑐𝑁. In contrast to the bulk, universality at the spectral edge is
much better understood: extreme eigenvalues follow the Tracy-Widom law for
$W \gg N^{5/6}$, an essentially optimal condition [124].
References

[1] Aizenman, M., and Molchanov, S. Localization at large disorder and at extreme energies:
an elementary derivation. Comm. Math. Phys. 157(2):245–278, 1993.
[2] Aizenman, M., Sims, R., and Warzel, S. Absolutely continuous spectra of quantum tree
graphs with weak disorder. Comm. Math. Phys. 264(2):371–389, 2006. doi:10.1007/s00220-
005-1468-5
[3] Aizenman, M., and Warzel, S. The canopy graph and level statistics for random opera-
tors on trees. Math. Phys. Anal. Geom. 9(4):291–333 (2007), 2006. doi:10.1007/s11040-007-
9018-3
[4] Ajanki, O., Erdős, L., and Krüger, T. Quadratic vector equations on complex upper half-plane. Preprint, 2015. arXiv:1506.05095 [math.PR].
[5] Ajanki, O., Erdős, L., and Krüger, T. Universality for general Wigner-type matrices. Probab. Theory Related Fields, to appear. doi:10.1007/s00440-016-0740-2.
[6] Akemann, G., Baik, J., and Di Francesco, P., eds. The Oxford handbook of random matrix
theory. Oxford University Press, Oxford, 2011. MR2920518.
[7] Anantharaman, N., and Le Masson, E. Quantum ergodicity on large regular graphs. Duke
Math. J. 164(4):723–765, 2015. doi:10.1215/00127094-2881592
[8] Anderson, G. W., Guionnet, A., and Zeitouni, O. An introduction to random matrices. Cam-
bridge Studies in Advanced Mathematics, 118. Cambridge University Press, Cambridge,
2010.
[9] Anderson, P. W. Absence of diffusion in certain random lattices. Phys. Rev. 109(5):1492–
1505, 1958. doi:10.1103/PhysRev.109.1492
[10] Bai, Z. D. Convergence rate of expected spectral distributions of large random matrices. I.
Wigner matrices. Ann. Probab. 21(2):625–648, 1993.
[11] Bai, Z. D., Miao, B., and Tsay, J. Convergence rates of the spectral distributions of large
Wigner matrices. Int. Math. J. 1(1):65–90, 2002.
[12] Bai, Z. D., and Yin, Y. Q. Limit of the smallest eigenvalue of a large-dimensional sample
covariance matrix. Ann. Probab. 21(3):1275–1294, 1993.
[13] Bakry, D., and Émery, M. Diffusions hypercontractives. Séminaire de probabilités,
XIX, 1983/84, 177–206. Lecture Notes in Mathematics, 1123. Springer, Berlin, 1985.
doi:10.1007/BFb0075847
[14] Bao, Z., and Erdős, L. Delocalization for a class of random block band matrices. Probab.
Theory Related Fields 1–104, 2016. doi:10.1007/s00440-015-0692-y
[15] Bao, Z., Erdős, L., and Schnelli, K. Local law of addition of random matrices on optimal
scale. Comm. Math. Phys. 349(3):947–990, 2017. doi:10.1007/s00220-016-2805-6
[16] Bauerschmidt, R., Huang, J., Knowles, A., and Yau, H.-T. Bulk eigenvalue statistics for
random regular graphs. Preprint, 2015. arXiv:1505.06700 [math.PR].
[17] Bauerschmidt, R., Knowles, A., and Yau, H.-T. Local semicircle law for random regular
graphs. Preprint, 2015. arXiv:1503.08702 [math.PR].
[18] Bekerman, F., Figalli, A., and Guionnet, A. Transport maps for 𝛽-matrix models and uni-
versality. Comm. Math. Phys. 338(2):589–619, 2015. doi:10.1007/s00220-015-2384-y

[19] Ben Arous, G., and Péché, S. Universality of local eigenvalue statistics for some
sample covariance matrices. Comm. Pure Appl. Math. 58(10):1316–1357, 2005.
doi:10.1002/cpa.20070
[20] Bleher, P., and Its, A. Semiclassical asymptotics of orthogonal polynomials, Riemann-
Hilbert problem, and universality in the matrix model. Ann. of Math. (2) 150(1):185–266,
1999. doi:10.2307/121101
[21] Bloemendal, A., Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Isotropic local laws for
sample covariance and generalized Wigner matrices. Electron. J. Probab. 19(33):53 pp.,
2014.
[22] Bloemendal, A., Knowles, A., Yau, H.-T., and Yin, J. On the principal components
of sample covariance matrices. Probab. Theory Related Fields 164(1-2):459–552, 2016.
doi:10.1007/s00440-015-0616-x
[23] Bourgade, P., Erdős, L., and Yau, H.-T. Bulk universality of general 𝛽-ensembles with non-
convex potential. J. Math. Phys. 53(9):095221, 19 pp., 2012. doi:10.1063/1.4751478
[24] Bourgade, P., Erdős, L., and Yau, H.-T. Edge universality of beta ensembles. Comm. Math. Phys. 332(1):261–353, 2014.
doi:10.1007/s00220-014-2120-z
[25] Bourgade, P., Erdős, L., and Yau, H.-T. Universality of general 𝛽-ensembles. Duke Math. J. 163(6):1127–1190, 2014.
doi:10.1215/00127094-2649752
[26] Bourgade, P., Erdős, L., Yau, H.-T., and Yin, J. Fixed energy universality for generalized
Wigner matrices. Comm. Pure Appl. Math. 69(10):1815–1881, 2016. doi:10.1002/cpa.21624
[27] Bourgade, P., Erdős, L., Yau, H.-T., and Yin, J. Universality for a class of random band matrices. Preprint, 2016.
arXiv:1602.02312 [math.PR].
[28] Bourgade, P., and Yau, H.-T. The eigenvector moment flow and local quantum unique
ergodicity. Comm. Math. Phys. 1–48, 2016. doi:10.1007/s00220-016-2627-6
[29] Brascamp, H. J., and Lieb, E. H. Best constants in Young’s inequality, its converse, and
its generalization to more than three functions. Advances in Math. 20(2):151–173, 1976.
doi:10.1016/0001-8708(76)90184-5
[30] Brézin, E., and Hikami, S. Correlations of nearby levels induced by a random potential.
Nuclear Phys. B 479(3):697–706, 1996. doi:10.1016/0550-3213(96)00394-X
[31] Brézin, E., and Hikami, S. Spectral form factor in a random matrix theory. Phys. Rev. E (3) 55(4):4067–4083,
1997. doi:10.1103/PhysRevE.55.4067
[32] Cacciapuoti, C., Maltsev, A., and Schlein, B. Bounds for the Stieltjes transform and the
density of states of Wigner matrices. Probab. Theory Related Fields 163(1-2):1–59, 2015.
doi:10.1007/s00440-014-0586-4
[33] Carlen, E. A., and Loss, M. Optimal smoothing and decay estimates for viscously damped
conservation laws, with applications to the 2-D Navier-Stokes equation. Duke Math. J.
81(1):135–157 (1996), 1995. doi:10.1215/S0012-7094-95-08110-1
[34] Chen, T. Localization lengths and Boltzmann limit for the Anderson model at small dis-
orders in dimension 3. J. Stat. Phys. 120(1):279–337, 2005. doi:10.1007/s10955-005-5255-7
[35] Colin de Verdière, Y. Ergodicité et fonctions propres du laplacien. Comm. Math. Phys.
102(3):497–502, 1985.
[36] Davies, E. B. The functional calculus. J. London Math. Soc. (2) 52(1):166–176, 1995.
doi:10.1112/jlms/52.1.166
[37] Deift, P. A. Orthogonal polynomials and random matrices: a Riemann-Hilbert approach.
Courant Lecture Notes in Mathematics, 3. New York University, Courant Institute of
Mathematical Sciences, New York; American Mathematical Society, Providence, R.I.,
1999.
[38] Deift, P., and Gioev, D. Universality at the edge of the spectrum for unitary, orthogonal,
and symplectic ensembles of random matrices. Comm. Pure Appl. Math. 60(6):867–910,
2007. doi:10.1002/cpa.20164
[39] Deift, P., and Gioev, D. Universality in random matrix theory for orthogonal and symplectic ensembles.
Int. Math. Res. Pap. 2007(2):116, Art. ID rpm004, 2007. doi:10.1093/imrp/rpm004
[40] Deift, P., and Gioev, D. Random matrix theory: invariant ensembles and universality. Courant Lecture
Notes in Mathematics, 18. Courant Institute of Mathematical Sciences, New York; Amer-
ican Mathematical Society, Providence, R.I., 2009. doi:10.1090/cln/018
[41] Deift, P., Kriecherbauer, T., McLaughlin, K. T.-R., Venakides, S., and Zhou,
X. Strong asymptotics of orthogonal polynomials with respect to exponential
weights. Comm. Pure Appl. Math. 52(12):1491–1552, 1999. doi:10.1002/(SICI)1097-
0312(199912)52:12<1491::AID-CPA2>3.3.CO;2-R
[42] Deift, P., Kriecherbauer, T., McLaughlin, K. T.-R., Venakides, S., and Zhou, X. Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Comm. Pure Appl. Math. 52(11):1335–1425, 1999. doi:10.1002/(SICI)1097-0312(199911)52:11<1335::AID-CPA1>3.0.CO;2-1
[43] Deuschel, J.-D., and Stroock, D. W. Large deviations. Pure and Applied Mathematics, 137.
Academic Press, Boston, 1989.
[44] Disertori, M., Pinson, H., and Spencer, T. Density of states for random band matrices.
Comm. Math. Phys. 232(1):83–124, 2002. doi:10.1007/s00220-002-0733-0
[45] Dyson, F. J. A Brownian-motion model for the eigenvalues of a random matrix. J. Mathe-
matical Phys. 3:1191–1198, 1962. doi:10.1063/1.1703862
[46] Dyson, F. J. Statistical theory of the energy levels of complex systems. I, II, III. J. Mathematical
Phys. 3:140–156, 1962. doi:10.1063/1.1703773
[47] Dyson, F. J. Correlations between eigenvalues of a random matrix. Comm. Math. Phys.
19:235–250, 1970.
[48] Edmunds, D. E., and Triebel, H. Sharp Sobolev embeddings and related Hardy inequali-
ties: the critical case. Math. Nachr. 207:79–92, 1999. doi:10.1002/mana.1999.3212070105
[49] Efetov, K. Supersymmetry in disorder and chaos. Cambridge University Press, Cambridge,
1997.
[50] Erdős, L., and Knowles, A. Quantum diffusion and delocalization for band matrices with
general distribution. Ann. Henri Poincaré 12(7):1227–1319, 2011. doi:10.1007/s00023-011-0104-5
[51] Erdős, L., and Knowles, A. Quantum diffusion and eigenfunction delocalization in a random band matrix
model. Comm. Math. Phys. 303(2):509–554, 2011. doi:10.1007/s00220-011-1204-2
[52] Erdős, L., Knowles, A., and Yau, H.-T. Averaging fluctuations in resolvents of random band
matrices. Ann. Henri Poincaré 14(8):1837–1926, 2013. doi:10.1007/s00023-013-0235-y
[53] Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Spectral statistics of Erdős-Rényi Graphs
II: Eigenvalue spacing and the extreme eigenvalues. Comm. Math. Phys. 314(3):587–640,
2012. doi:10.1007/s00220-012-1527-7
[54] Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Delocalization and diffusion profile for random band matrices. Comm. Math.
Phys. 323(1):367–416, 2013. doi:10.1007/s00220-013-1773-3
[55] Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. The local semicircle law for a general class of random matrices. Electron. J. Probab.
18(59):58 pp., 2013. doi:10.1214/EJP.v18-2473
[56] Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Spectral statistics of Erdős-Rényi graphs I: Local semicircle law. Ann. Probab.
41(3B):2279–2375, 2013. doi:10.1214/11-AOP734
[57] Erdős, L., Péché, S., Ramírez, J. A., Schlein, B., and Yau, H.-T. Bulk universality for Wigner
matrices. Comm. Pure Appl. Math. 63(7):895–925, 2010. doi:10.1002/cpa.20317
[58] Erdős, L., Ramírez, J., Schlein, B., Tao, T., Vu, V., and Yau, H.-T. Bulk universality for
Wigner Hermitian matrices with subexponential decay. Math. Res. Lett. 17(4):667–674,
2010. doi:10.4310/MRL.2010.v17.n4.a7
[59] Erdős, L., Ramírez, J. A., Schlein, B., and Yau, H.-T. Universality of sine-kernel for Wigner
matrices with a small Gaussian perturbation. Electron. J. Probab. 15(18):526–603, 2010.
doi:10.1214/EJP.v15-768
[60] Erdős, L., Schlein, B., and Yau, H.-T. Local semicircle law and complete delocalization for
Wigner random matrices. Comm. Math. Phys. 287(2):641–655, 2009. doi:10.1007/s00220-
008-0636-9
[61] Erdős, L., Schlein, B., and Yau, H.-T. Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices. Ann. Probab. 37(3):815–852, 2009. doi:10.1214/08-AOP421
[62] Erdős, L., Schlein, B., and Yau, H.-T. Wegner estimate and level repulsion for Wigner random matrices. Int. Math. Res.
Not. IMRN 2010(3):436–479, 2010. doi:10.1093/imrn/rnp136
[63] Erdős, L., Schlein, B., and Yau, H.-T. Universality of random matrices and local relaxation flow. Invent. Math.
185(1):75–119, 2011. doi:10.1007/s00222-010-0302-7
[64] Erdős, L., Schlein, B., Yau, H.-T., and Yin, J. The local relaxation flow approach to uni-
versality of the local statistics for random matrices. Ann. Inst. Henri Poincaré Probab. Stat.
48(1):1–46, 2012. doi:10.1214/10-AIHP388
[65] Erdős, L., and Schnelli, K. Universality for random matrix flows with time-dependent den-
sity. Preprint, 2015. arXiv:1504.00650 [math.PR].
[66] Erdős, L., and Yau, H.-T. A comment on the Wigner-Dyson-Mehta bulk universality con-
jecture for Wigner matrices. Electron. J. Probab. 17(28):5 pp., 2012. doi:10.1214/EJP.v17-
1779
[67] Erdős, L., and Yau, H.-T. Gap universality of generalized Wigner and 𝛽-ensembles. J. Eur. Math. Soc.
(JEMS) 17(8):1927–2036, 2015. doi:10.4171/JEMS/548
[68] Erdős, L., Yau, H.-T., and Yin, J. Universality for generalized Wigner matrices with
Bernoulli distribution. J. Comb. 2(1):15–81, 2011. doi:10.4310/JOC.2011.v2.n1.a2
[69] Erdős, L., Yau, H.-T., and Yin, J. Bulk universality for generalized Wigner matrices. Probab. Theory Related Fields
154(1-2):341–407, 2012. doi:10.1007/s00440-011-0390-3
[70] . Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math. 229(3):1435–
1515, 2012. doi:10.1016/j.aim.2011.12.010
[71] Firk, F. W. K., and Miller, S. J. Nuclei, primes and the random matrix connection. Sym-
metry 1:64–105. doi:10.3390/sym1010064
[72] Fokas, A. S., It.s, A. R., and Kitaev, A. V. The isomonodromy approach to matrix models in
2D quantum gravity. Comm. Math. Phys. 147(2):395–430, 1992. doi:10.1007/BF02096594
[73] Forrester, P. J. Log-gases and random matrices. London Mathematical Society Monographs
Series, 34. Princeton University Press, Princeton, N.J., 2010. doi:10.1515/9781400835416
[74] Fröhlich, J., and Spencer, T. Absence of diffusion in the Anderson tight binding model for
large disorder or low energy. Comm. Math. Phys. 88(2):151–184, 1983.
[75] Fukushima, M., Ōshima, Y., and Takeda, M. Dirichlet forms and symmetric Markov
processes. De Gruyter Studies in Mathematics, 19. Walter de Gruyter, Berlin, 1994.
doi:10.1515/9783110889741
[76] Fyodorov, Y. V., and Mirlin, A. D. Scaling properties of localization in ran-
dom band matrices: a 𝜍-model approach. Phys. Rev. Lett. 67(18):2405–2409, 1991.
doi:10.1103/PhysRevLett.67.2405
[77] Girko, V. L. Asymptotics of the distribution of the spectrum of random matrices. Trans-
lated from Uspekhi Mat. Nauk 44(4(268)):7–34, 256, 1989; Russian Math. Surveys 44(4):3–
36, 1989. doi:10.1070/RM1989v044n04ABEH002143
[78] Götze, F., Naumov, A., and Tikhomirov, A. Local semicircle law under moment conditions.
Part I: The Stieltjes transform. Preprint, 2015. arXiv:1510.07350 [math.PR].
[79] Gross, L. Logarithmic Sobolev inequalities. Amer. J. Math. 97(4):1061–1083, 1975.
doi:10.2307/2373688
[80] Guionnet, A., and Zeitouni, O. Concentration of the spectral measure for large matrices. Electron. Comm. Probab. 5:119–136, 2000. doi:10.1214/ECP.v5-1026
[81] Helffer, B., and Sjöstrand, J. On the correlation for Kac-like models in the convex case. J. Statist. Phys. 74(1-2):349–409, 1994. doi:10.1007/BF02186817
[82] Holowinsky, R. Sieving for mass equidistribution. Ann. of Math. (2) 172(2):1499–1516, 2010.
[83] Holowinsky, R., and Soundararajan, K. Mass equidistribution for Hecke eigenforms. Ann. of Math. (2) 172(2):1517–1528, 2010.
[84] Huang, J., Landon, B., and Yau, H.-T. Bulk universality of sparse random matrices. J. Math. Phys. 56(12):123301, 19 pp., 2015. doi:10.1063/1.4936139
[85] Itzykson, C., and Zuber, J. B. The planar approximation. II. J. Math. Phys. 21(3):411–421, 1980. doi:10.1063/1.524438
[86] Johansson, K. Universality of the local spacing distribution in certain ensembles of Hermitian Wigner matrices. Comm. Math. Phys. 215(3):683–705, 2001. doi:10.1007/s002200000328
[87] Klein, A. Absolutely continuous spectrum in the Anderson model on the Bethe lattice. Math. Res. Lett. 1(4):399–407, 1994. doi:10.4310/MRL.1994.v1.n4.a1
[88] Knowles, A., and Yin, J. Eigenvector distribution of Wigner matrices. Probab. Theory Related Fields 155(3-4):543–582, 2013. doi:10.1007/s00440-011-0407-y
[89] Knowles, A., and Yin, J. The isotropic semicircle law and deformation of Wigner matrices. Comm. Pure Appl. Math. 66(11):1663–1750, 2013. doi:10.1002/cpa.21450
[90] Knowles, A., and Yin, J. Anisotropic local laws for random matrices. Probab. Theory Related Fields, 2016, 1–96. doi:10.1007/s00440-016-0730-4
[91] Kriecherbauer, T., and Shcherbina, M. Fluctuations of eigenvalues of matrix models and their applications. Preprint, 2010. arXiv:1003.6121 [math-ph].
[92] Krishnapur, M., Rider, B., and Virág, B. Universality of the stochastic Airy operator. Comm. Pure Appl. Math. 69(1):145–199, 2016. doi:10.1002/cpa.21573
[93] Landon, B., and Yau, H.-T. Convergence of local statistics of Dyson Brownian motion. Comm. Math. Phys., to appear.
[94] Lee, J. O., and Schnelli, K. Local deformed semicircle law and complete delocalization for Wigner matrices with random potential. J. Math. Phys. 54(10):103504, 62 pp., 2013. doi:10.1063/1.4823718
[95] Lee, J. O., and Schnelli, K. Edge universality for deformed Wigner matrices. Rev. Math. Phys. 27(8):1550018, 94 pp., 2015. doi:10.1142/S0129055X1550018X
[96] Lee, J. O., and Schnelli, K. Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. Ann. Appl. Probab. 26(6):3786–3839, 2016. doi:10.1214/16-AAP1193
[97] Lee, J. O., Schnelli, K., Stetler, B., and Yau, H.-T. Bulk universality for deformed Wigner matrices. Ann. Probab. 44(3):2349–2425, 2016. doi:10.1214/15-AOP1023
[98] Lee, J. O., and Yin, J. A necessary and sufficient condition for edge universality of Wigner matrices. Duke Math. J. 163(1):117–173, 2014. doi:10.1215/00127094-2414767
[99] Levin, E., and Lubinsky, D. S. Universality limits in the bulk for varying measures. Adv. Math. 219(3):743–779, 2008. doi:10.1016/j.aim.2008.06.010
[100] Lieb, E. H., and Loss, M. Analysis. Second edition. Graduate Studies in Mathematics, 14. American Mathematical Society, Providence, R.I., 2001. doi:10.1090/gsm/014
[101] Lindenstrauss, E. Invariant measures and arithmetic quantum unique ergodicity. Ann. of Math. (2) 163(1):165–219, 2006. doi:10.4007/annals.2006.163.165
[102] Lubinsky, D. S. A new approach to universality limits involving orthogonal polynomials. Ann. of Math. (2) 170(2):915–939, 2009. doi:10.4007/annals.2009.170.915
[103] Lytova, A., and Pastur, L. Central limit theorem for linear eigenvalue statistics of random matrices with independent entries. Ann. Probab. 37(5):1778–1840, 2009. doi:10.1214/09-AOP452
[104] Marčenko, V. A., and Pastur, L. A. Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72(114):507–536, 1967.
[105] Mehta, M. L. A note on correlations between eigenvalues of a random matrix. Comm. Math. Phys. 20:245–250, 1971.
[106] Mehta, M. L. Random matrices. Second edition. Academic Press, Boston, 1991.
[107] Mehta, M. L., and Gaudin, M. On the density of eigenvalues of a random matrix. Nuclear Phys. 18:420–427, 1960.
[108] Minami, N. Local fluctuation of the spectrum of a multidimensional Anderson tight binding model. Comm. Math. Phys. 177(3):709–725, 1996.
[109] Naddaf, A., and Spencer, T. On homogenization and scaling limit of some gradient perturbations of a massless free field. Comm. Math. Phys. 183(1):55–84, 1997. doi:10.1007/BF02509796
[110] Pastur, L. A. Spectra of random selfadjoint operators. Uspehi Mat. Nauk 28(1(169)):3–64, 1973.
[111] Pastur, L., and Shcherbina, M. On the edge universality of the local eigenvalue statistics of matrix models. Mat. Fiz. Anal. Geom. 10(3):335–365, 2003.
[112] Pastur, L., and Shcherbina, M. Bulk universality and related properties of Hermitian matrix models. J. Stat. Phys. 130(2):205–250, 2008. doi:10.1007/s10955-007-9434-6
[113] Pastur, L., and Shcherbina, M. Eigenvalue distribution of large random matrices. Mathematical Surveys and Monographs, 171. American Mathematical Society, Providence, R.I., 2011. doi:10.1090/surv/171
[114] Pillai, N. S., and Yin, J. Universality of covariance matrices. Ann. Appl. Probab. 24(3):935–1001, 2014. doi:10.1214/13-AAP939
[115] Rudnick, Z., and Sarnak, P. The behaviour of eigenstates of arithmetic hyperbolic manifolds. Comm. Math. Phys. 161(1):195–213, 1994.
[116] Schenker, J. Eigenvector localization for random band matrices with power law band width. Comm. Math. Phys. 290(3):1065–1097, 2009. doi:10.1007/s00220-009-0798-0
[117] Shcherbina, M. Edge universality for orthogonal ensembles of random matrices. J. Stat. Phys. 136(1):35–50, 2009. doi:10.1007/s10955-009-9766-5
[118] Shcherbina, M. Central limit theorem for linear eigenvalue statistics of the Wigner and sample covariance random matrices. Zh. Mat. Fiz. Anal. Geom. 7(2):176–192, 197, 199, 2011.
[119] Shcherbina, M. Change of variables as a method to study general 𝛽-models: bulk universality. J. Math. Phys. 55(4):043504, 23 pp., 2014. doi:10.1063/1.4870603
[120] Shcherbina, T. On the second mixed moment of the characteristic polynomials of 1D band matrices. Comm. Math. Phys. 328(1):45–82, 2014. doi:10.1007/s00220-014-1947-7
[121] Shcherbina, T. Universality of the local regime for the block band matrices with a finite number of blocks. J. Stat. Phys. 155(3):466–499, 2014. doi:10.1007/s10955-014-0964-4
[122] Shcherbina, T. Universality of the second mixed moment of the characteristic polynomials of the 1D band matrices: real symmetric case. J. Math. Phys. 56(6):063303, 23 pp., 2015. doi:10.1063/1.4922621
[123] Šnirel′man, A. I. Ergodic properties of eigenfunctions. Uspehi Mat. Nauk 29(6(180)):181–182, 1974.
[124] Sodin, S. The spectral edge of some random band matrices. Ann. of Math. (2) 172(3):2223–2251, 2010. doi:10.4007/annals.2010.172.2223
[125] Soshnikov, A. Universality at the edge of the spectrum in Wigner random matrices. Comm. Math. Phys. 207(3):697–733, 1999. doi:10.1007/s002200050743
[126] Spencer, T. Random banded and sparse matrices. The Oxford handbook of random matrix theory, 471–488. Oxford University Press, Oxford, 2011.
[127] Stroock, D. W. Probability theory, an analytic view. Cambridge University Press, Cambridge, 1993.
[128] Tao, T. Topics in random matrix theory. Graduate Studies in Mathematics, 132. American Mathematical Society, Providence, R.I., 2012. doi:10.1090/gsm/132
[129] Tao, T. The asymptotic distribution of a single eigenvalue gap of a Wigner matrix. Probab. Theory Related Fields 157(1-2):81–106, 2013. doi:10.1007/s00440-012-0450-3
[130] Tao, T., and Vu, V. Random matrices: localization of the eigenvalues and the necessity of four moments. Acta Math. Vietnam. 36(2):431–449, 2011.
[131] Tao, T., and Vu, V. Random matrices: universality of local eigenvalue statistics. Acta Math. 206(1):127–204, 2011. doi:10.1007/s11511-011-0061-3
[132] Tao, T., and Vu, V. The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices. Electron. J. Probab. 16(77):2104–2121, 2011. doi:10.1214/EJP.v16-944
[133] Tao, T., and Vu, V. Random matrices: universal properties of eigenvectors. Random Matrices Theory Appl. 1(1):1150001, 27 pp., 2012. doi:10.1142/S2010326311500018
[134] Tracy, C. A., and Widom, H. Level-spacing distributions and the Airy kernel. Comm. Math. Phys. 159(1):151–174, 1994.
[135] Tracy, C. A., and Widom, H. On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177(3):727–754, 1996.
[136] Valkó, B., and Virág, B. Continuum limits of random matrices and the Brownian carousel. Invent. Math. 177(3):463–508, 2009. doi:10.1007/s00222-009-0180-z
[137] Voiculescu, D. Limit laws for random matrices and free products. Invent. Math. 104(1):201–220, 1991. doi:10.1007/BF01245072
[138] Wigner, E. P. Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. (2) 62:548–564, 1955. doi:10.2307/1970079
[139] Wishart, J. The generalised product moment distribution in samples from a normal multivariate population. Biometrika 20A(1/2):32–52, 1928. doi:10.2307/2331939
[140] Yau, H.-T. Relative entropy and hydrodynamics of Ginzburg-Landau models. Lett. Math. Phys. 22(1):63–80, 1991. doi:10.1007/BF00400379
[141] Zelditch, S. Uniform distribution of eigenfunctions on compact hyperbolic surfaces. Duke Math. J. 55(4):919–941, 1987. doi:10.1215/S0012-7094-87-05546-3
Index

𝛽-ensemble, 20

admissible control parameter, 67
Airy function, 26
Airy kernel, 26
Anderson localization, 211
Anderson transition, 212, 215

backtracking sequence, 12
Bakry-Émery argument, 129
bandwidth, 215
Brascamp-Lieb inequality, 138, 145

Catalan numbers, 13
Christoffel-Darboux formula, 23
classical exponent, 19
classical location of eigenvalues, 106
concentration inequality, 133
correlation function, 22

deformed Wigner matrix, 214
delocalization, 31, 212
determinantal correlation, 23
Dirichlet form, 114, 129, 151
Dirichlet form inequality, 155, 163
Dyson conjecture, 4
Dyson Brownian motion (DBM), 4, 32, 110
Dyson conjecture, 32, 116, 153

edge universality, 191, 210
empirical distribution function, 103
empirical eigenvalue distribution, 103
empirical eigenvalue measure, 14
energy parameter, 14
energy universality; averaged, fixed, 29, 203
entropy inequality, 125

fluctuation averaging, 40, 61, 73, 83
free convolution, 214

gap universality; averaged, fixed, 30, 204
Gaudin distribution, 2
Gaussian 𝛽-ensemble, 20, 32
Gaussian divisible ensemble, 4, 32
Gaussian orthogonal ensemble (GOE), 2, 8
Gaussian unitary ensemble (GUE), 2, 8
generalized Wigner matrix, 3, 8
generator, 115
Gibbs inequality, 123
Gibbs measure, 19
Green function, 35
Green function comparison, 183, 192, 195
Green function continuity, 171
Gross’ hypercontractivity bound, 136

Haar measure, 19, 214
Harish-Chandra-Itzykson-Zuber formula, 3
Helffer-Sjöstrand formula, 100
Herbst bound, 133
Hermite polynomial, 21

integrated density of states, 103
interlacing of eigenvalues, 47
invariant ensemble, 2, 17, 207
invariant measure, 32
isotropic local law, 39, 39

level repulsion, 111
local equilibrium, 4, 32
local relaxation flow, 154
local relaxation measure, 153
local semicircle law, 4, 39
log-gas, 20, 207
logarithmic Sobolev inequality (LSI), 129

Marchenko-Pastur law, 11
Marcinkiewicz-Zygmund inequality, 58
mean field model, 8
moment matching, 188
moment method, 12

Ornstein-Uhlenbeck process, 4, 31, 109, 171

partial expectation, 47, 62
Pinsker inequality, 125
Plancherel-Rotach asymptotics, 24, 26

quadratic large deviation, 58
quantum (unique) ergodicity, 213, 216
quaternion determinant, 25

random band matrix, 33, 215
random Schrödinger operator, 33, 211
regular graph, 215
regular potential, 208
relative entropy, 123
resolvent, 14, 35
resolvent decoupling identities, 62
Riemann-Hilbert method, 3
rigidity of eigenvalues, 31, 106, 208

sample covariance matrix, 9, 215
Schur formula, 45
self-consistent equation, 40, 49, 61
sine-kernel, 24
single-entry distribution, 8
sparse graph, 214
spectral domain, 37, 67
spectral edge, 34
spectral gap, 132
spectral parameter, 14
Stieltjes transform, 14, 34, 100
stochastic domination, 37

three-step strategy, 4, 31, 205
Tracy-Widom law, 191

universal Wigner matrix, 7

Vandermonde determinant, 18

Ward identity, 63
weak local law, 39, 45
Wigner-Dyson-Mehta (WDM) conjecture, 2, 216
Wigner-type matrix, 214
Wigner ensemble, 2, 7
Wigner semicircle law, 11, 34
Wigner surmise, 2
Wishart matrix, 9
Published Titles in This Series
28 László Erdős and Horng-Tzer Yau, A Dynamical Approach to Random Matrix Theory, 2017
27 S. R. S. Varadhan, Large Deviations, 2016
26 Jerome K. Percus and Stephen Childress, Mathematical Models in Developmental Biology, 2015
25 Kurt O. Friedrichs, Mathematical Methods of Electromagnetic Theory, 2014
24 Christof Schütte and Marco Sarich, Metastability and Markov State Models in Molecular Dynamics, 2013
23 Jerome K. Percus, Mathematical Methods in Immunology, 2011
22 Frank C. Hoppensteadt, Mathematical Methods for Analysis of a Complex Disease, 2011
21 Frank C. Hoppensteadt, Quasi-Static State Analysis of Differential, Difference, Integral, and Gradient Systems, 2010
20 Pierpaolo Esposito, Nassif Ghoussoub, and Yujin Guo, Mathematical Analysis of Partial Differential Equations Modeling Electrostatic MEMS, 2010
19 Stephen Childress, An Introduction to Theoretical Fluid Mechanics, 2009
18 Percy Deift and Dimitri Gioev, Random Matrix Theory: Invariant Ensembles and Universality, 2009
17 Ping Zhang, Wigner Measure and Semiclassical Limits of Nonlinear Schrödinger Equations, 2008
16 S. R. S. Varadhan, Stochastic Processes, 2007
15 Emil Artin, Algebra with Galois Theory, 2007
14 Peter D. Lax, Hyperbolic Partial Differential Equations, 2006
13 Oliver Bühler, A Brief Introduction to Classical, Statistical, and Quantum Mechanics, 2006
12 Jürgen Moser and Eduard J. Zehnder, Notes on Dynamical Systems, 2005
11 V. S. Varadarajan, Supersymmetry for Mathematicians: An Introduction, 2004
10 Thierry Cazenave, Semilinear Schrödinger Equations, 2003
9 Andrew Majda, Introduction to PDEs and Waves for the Atmosphere and Ocean, 2003
8 Fedor Bogomolov and Tihomir Petrov, Algebraic Curves and One-Dimensional Fields, 2002
7 S. R. S. Varadhan, Probability Theory, 2001
6 Louis Nirenberg, Topics in Nonlinear Functional Analysis, 2001
5 Emmanuel Hebey, Nonlinear Analysis on Manifolds: Sobolev Spaces and Inequalities, 2000
3 Percy Deift, Orthogonal Polynomials and Random Matrices: A Riemann-Hilbert Approach, 2000
2 Jalal Shatah and Michael Struwe, Geometric Wave Equations, 2000
1 Qing Han and Fanghua Lin, Elliptic Partial Differential Equations, Second Edition, 2011
A Dynamical Approach
to Random Matrix Theory
LÁSZLÓ ERDŐS AND HORNG-TZER YAU

This book is a concise and self-contained introduction to recent techniques for proving local spectral universality for large random matrices. Random matrix theory is a fast-expanding research area, and this book mainly focuses on the methods that the authors participated in developing over the past few years. Many other interesting topics are not included, and neither are several new developments within the framework of these methods. The authors have chosen instead to present key concepts that they believe are the core of these methods and should be relevant for future applications. They keep technicalities to a minimum to make the book accessible to graduate students. With this in mind, they include the basic notions and tools for high-dimensional analysis, such as large deviation, entropy, Dirichlet form, and the logarithmic Sobolev inequality.

This manuscript has been developed and continuously improved over the last five years. The authors have taught this material in several regular graduate courses at Harvard, Munich, and Vienna, in addition to various summer schools and short courses.

For additional information and updates on this book, visit
www.ams.org/bookpages/cln-28

CLN/28

AMS on the Web: www.ams.org
New York University
