Grothendieck's Inequality
Leqi Zhu
1 Introduction
Let $A = (A_{ij}) \in \mathbb{R}^{m\times n}$ be an $m \times n$ matrix. Then $A$ defines a linear operator between the normed spaces $(\mathbb{R}^n, \|\cdot\|_p)$ and $(\mathbb{R}^m, \|\cdot\|_q)$, for $1 \le p, q \le \infty$. The $(p \to q)$-norm of $A$ is the quantity $\|A\|_{p\to q} = \max_{x \in \mathbb{R}^n : \|x\|_p = 1} \|Ax\|_q$. (Recall that, for a vector $x = (x_i) \in \mathbb{R}^d$, the $p$-norm of $x$ is $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$; the $\infty$-norm of $x$ is $\|x\|_\infty = \max_i |x_i|$.) If $p = q$, then we denote the norm by $\|A\|_p$.
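As a quick numerical illustration (a minimal sketch; the vector and matrix below are arbitrary), NumPy exposes the vector $p$-norms and the induced $(p\to p)$-norms for $p \in \{1, 2, \infty\}$ directly:

```python
# Illustrative sketch: vector p-norms and the induced (p -> p) matrix norms in NumPy.
import numpy as np

x = np.array([3.0, -4.0, 1.0])
# ||x||_1 = 8, ||x||_2 = sqrt(26) ~ 5.10, ||x||_inf = 4.
print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))

A = np.array([[1.0, -2.0], [3.0, 0.5]])
# ||A||_1 is the maximum absolute column sum, ||A||_inf the maximum absolute row sum,
# and ||A||_2 the largest singular value.
print(np.linalg.norm(A, 1), np.linalg.norm(A, np.inf), np.linalg.norm(A, 2))
```

The $(\infty\to 1)$-norm discussed next is not one of these built-ins; computing it is the combinatorial problem that the rest of this section relaxes.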
For what values of $p$ and $q$ is $\|A\|_{p\to q}$ maximized? Since $A$ is linear, it suffices to consider $p$ such that $\{x \in \mathbb{R}^n : \|x\|_p \le 1\}$ contains as many points as possible. We also want $\|Ax\|_q$ as large as possible. Figure 1 gives an illustration of $\{x \in \mathbb{R}^2 : \|x\|_p = 1\}$, for $p \in \{1, 2, \infty\}$. Going by the figure, $\|A\|_{\infty\to 1} \ge \|A\|_{p\to q}$.
[Figure 1: the unit spheres $\{x \in \mathbb{R}^2 : \|x\|_p = 1\}$ for $p = 1, 2, \infty$.]
In fact, $\|A\|_{\infty\to 1} = \max_{x\in\{-1,1\}^m,\, y\in\{-1,1\}^n} \sum_{i,j} A_{ij} x_i y_j$. To see this, note that $\sum_{i,j} A_{ij} x_i y_j = \sum_i (Ay)_i x_i$. Taking the maximum over $x \in \{-1,1\}^m$ gives us $\|Ay\|_1$. Then taking the maximum over $y \in \{-1,1\}^n$ gives us $\|A\|_{\infty\to 1}$ (this requires an argument: by the triangle inequality, $\|Ay\|_1$ is a convex function of $y$, so its maximum over the convex set $\{y \in \mathbb{R}^n : \|y\|_\infty \le 1\}$ is attained at an extreme point, i.e. at some $y \in \{-1,1\}^n$). This quadratic integer program may be relaxed to the following semidefinite program:
\[
\max \; \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle
\quad \text{s.t. } x^{(1)}, \dots, x^{(m)}, y^{(1)}, \dots, y^{(n)} \text{ are unit vectors in } (\mathbb{R}^{m+n}, \|\cdot\|_2).
\]
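To make the relaxation concrete, here is a minimal sketch (assuming the cvxpy library with an SDP-capable solver such as SCS is available; the matrix $A$ and all variable names are illustrative) that computes $\|A\|_{\infty\to 1}$ by brute force and compares it with the value of the vector program above, written as an SDP over the Gram matrix of $x^{(1)}, \dots, x^{(m)}, y^{(1)}, \dots, y^{(n)}$:

```python
# Illustrative sketch: brute-force ||A||_{inf->1} versus its semidefinite relaxation.
import itertools
import numpy as np
import cvxpy as cp  # assumed available, with an SDP-capable solver such as SCS

rng = np.random.default_rng(0)
m, n = 4, 5
A = rng.standard_normal((m, n))

# ||A||_{inf->1} = max over x in {-1,1}^m, y in {-1,1}^n of sum_ij A_ij x_i y_j.
# For fixed y the optimal x is sign(Ay), which contributes ||Ay||_1, so enumerate y only.
norm_inf_1 = max(np.abs(A @ np.array(y)).sum()
                 for y in itertools.product([-1, 1], repeat=n))

# SDP relaxation: G is the Gram matrix of the vectors x^(1..m), y^(1..n); unit vectors
# mean a unit diagonal, and the block G[:m, m:] holds the inner products <x^(i), y^(j)>.
G = cp.Variable((m + n, m + n), PSD=True)
problem = cp.Problem(cp.Maximize(cp.sum(cp.multiply(A, G[:m, m:]))),
                     [cp.diag(G) == 1])
sdp_value = problem.solve()

print(norm_inf_1, sdp_value)  # relaxation: sdp_value >= norm_inf_1
```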
It turns out that Krivine's proof can also be adapted to prove a theorem about degree-2 pseudo-distributions $\mu : \{-1,1\}^{m+n} \to \mathbb{R}$. Recall that the pseudo-expectation $\tilde{\mathbb{E}}_\mu$ that arises from $\mu$ has to satisfy two properties: $\tilde{\mathbb{E}}_\mu[1] = \sum_{x \in \{-1,1\}^{m+n}} \mu(x) = 1$ and $\tilde{\mathbb{E}}_\mu[f^2] = \sum_{x \in \{-1,1\}^{m+n}} \mu(x)\,(f(x))^2 \ge 0$ for all degree-1 polynomials $f : \{-1,1\}^{m+n} \to \mathbb{R}$.
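For very small $m+n$, where $\mu$ can be stored as an explicit table, both conditions can be checked directly: the second is equivalent to positive semidefiniteness of the moment matrix $\tilde{\mathbb{E}}_\mu[(1,x,y)(1,x,y)^\top]$. A minimal sketch (illustrative names):

```python
# Illustrative sketch: check the two conditions defining a degree-2 pseudo-distribution.
import itertools
import numpy as np

def is_degree2_pseudo_distribution(mu, num_vars, tol=1e-9):
    points = list(itertools.product([-1, 1], repeat=num_vars))
    total = sum(mu(w) for w in points)
    # Moment matrix indexed by the monomials 1, w_1, ..., w_d; for a degree-1 polynomial f
    # with coefficient vector a, E~[f^2] = a^T M a, so E~[f^2] >= 0 for all such f iff M is PSD.
    M = np.zeros((num_vars + 1, num_vars + 1))
    for w in points:
        v = np.array((1,) + w, dtype=float)
        M += mu(w) * np.outer(v, v)
    return abs(total - 1.0) < tol and np.linalg.eigvalsh(M).min() > -tol

# The uniform distribution on {-1,1}^3 is, in particular, a degree-2 pseudo-distribution.
print(is_degree2_pseudo_distribution(lambda w: 1 / 8, 3))  # True
```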
Theorem (SOS). For any degree-2 pseudo-distribution $\mu : \{-1,1\}^{m+n} \to \mathbb{R}$,
\[
\tilde{\mathbb{E}}_{\mu(x,y)}\Big[\sum_{i,j} A_{ij} x_i y_j\Big] \le K \|A\|_{\infty\to 1} .
\]
2 Grothendieck’s Inequality
Krivine’s proof of Grothendieck’s Inequality relies on Grothendieck’s Identity,
which, as the name suggests, was first proved by Grothendieck:
Lemma (Grothendieck's Identity). Let $x$ and $y$ be unit vectors in $(\mathbb{R}^d, \|\cdot\|_2)$, where $d \ge 2$. If $z$ is a unit vector picked uniformly at random from the unit sphere of $(\mathbb{R}^d, \|\cdot\|_2)$, then
\[
\mathbb{E}[\operatorname{sign}(\langle x, z\rangle)\operatorname{sign}(\langle y, z\rangle)] = \frac{2}{\pi} \arcsin(\langle x, y\rangle) .
\]
Here, $\operatorname{sign}(a) \in \{-1,1\}$ is $1$ if and only if $a \ge 0$.
Proof. Consider $\operatorname{sign}(\langle x, z\rangle)\operatorname{sign}(\langle y, z\rangle)$. This has a nice geometric interpretation. First, we orient the sphere $\{w \in \mathbb{R}^d : \|w\|_2 = 1\}$ so that $z$ is at the top. It can be verified that $\operatorname{sign}(\langle x, z\rangle)\operatorname{sign}(\langle y, z\rangle)$ is $1$ if and only if both $x$ and $y$ lie in the same (upper or lower) half of the sphere when it is oriented this way. Equivalently, $\{w \in \mathbb{R}^d : \langle z, w\rangle = 0\}$ is a hyperplane passing through the origin (with normal $z$), and a vector $w \in \mathbb{R}^d$ satisfies $\langle w, z\rangle > 0$ if and only if it lies above the hyperplane. Figure 2 contains a depiction of this.

Now, consider the expectation. Given the geometric interpretation, when a random hyperplane passing through the origin is selected (with normal $z$), the expectation is $\Pr[x, y \text{ lie in the same half}] - \Pr[x, y \text{ lie in different halves}] = 1 - 2\Pr[x, y \text{ lie in different halves}]$. The probability that $x$ and $y$ lie in different halves is $\frac{2\theta}{2\pi} = \frac{\theta}{\pi}$, where $\theta$ is the angle between $x$ and $y$ (the factor of $2$ comes from $z$ and $-z$ defining the same hyperplane). Hence, the expectation is $1 - \frac{2\theta}{\pi}$. On the other hand, $\frac{2}{\pi}\arcsin(\langle x, y\rangle) = \frac{2}{\pi}\arcsin(\cos\theta) = \frac{2}{\pi}\arcsin\!\big(\sin(\tfrac{\pi}{2} - \theta)\big) = \frac{2}{\pi}\big(\tfrac{\pi}{2} - \theta\big) = 1 - \frac{2\theta}{\pi}$.
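A quick Monte Carlo sanity check of the identity (a minimal sketch; the dimension, vectors, and number of trials are arbitrary): draw many random unit vectors $z$ and compare the empirical average of $\operatorname{sign}(\langle x, z\rangle)\operatorname{sign}(\langle y, z\rangle)$ with $\frac{2}{\pi}\arcsin(\langle x, y\rangle)$.

```python
# Illustrative sketch: Monte Carlo check of Grothendieck's Identity.
import numpy as np

rng = np.random.default_rng(0)
d, trials = 3, 200_000

x = rng.standard_normal(d); x /= np.linalg.norm(x)
y = rng.standard_normal(d); y /= np.linalg.norm(y)

# A standard Gaussian vector, once normalized, is uniform on the unit sphere; for the
# product of signs the normalization does not even matter.
z = rng.standard_normal((trials, d))
empirical = np.mean(np.sign(z @ x) * np.sign(z @ y))
exact = (2 / np.pi) * np.arcsin(np.dot(x, y))
print(empirical, exact)  # the two values agree up to sampling error
```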
This doesn't appear to help much, as we don't know what $\arcsin(\langle x, y\rangle)$ is. The next lemma addresses this problem.
[Figure 2: two copies of the sphere with the hyperplane normal to $z$; on the left, $x$ and $y$ lie in the same half and $\operatorname{sign}(\langle x, z\rangle)\operatorname{sign}(\langle y, z\rangle) = 1$; on the right, they lie in different halves and $\operatorname{sign}(\langle x, z\rangle)\operatorname{sign}(\langle y, z\rangle) = -1$.]
Lemma (Krivine/Alon and Naor). Suppose that $x^{(i)}, y^{(j)}$ are unit vectors in $\mathbb{R}^{m+n}$, for $i \in [m]$, $j \in [n]$. Then there are unit vectors $\hat{x}^{(i)}, \hat{y}^{(j)}$ in $\mathbb{R}^{m+n}$, for $i \in [m]$, $j \in [n]$, such that
\[
\arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)}\rangle) = \ln(1+\sqrt{2})\,\langle x^{(i)}, y^{(j)}\rangle .
\]
Proof. Let $c = \ln(1+\sqrt{2})$ and $d = m+n$. By Taylor's expansion,
\[
\sin(c\langle x^{(i)}, y^{(j)}\rangle) = \sum_{k=0}^{\infty} (-1)^k \frac{c^{2k+1}}{(2k+1)!} \big(\langle x^{(i)}, y^{(j)}\rangle\big)^{2k+1} .
\]
Our goal is to write the above as the inner product of two vectors in some vector space. This suggests that we need an infinite-dimensional vector space. Towards this end, consider the infinite-dimensional vector space $H$ obtained by taking the direct sum, over $k \ge 0$, of the $(2k+1)$-fold tensor powers of $\mathbb{R}^d$, i.e. $H = \bigoplus_{k=0}^{\infty} (\mathbb{R}^d)^{\otimes(2k+1)}$. (As a bit of an aside, the direct sum of two vector spaces $A$ and $B$ of dimension $\alpha$ and $\beta$, respectively, is a vector space $A \oplus B$ of dimension $\alpha + \beta$; given vectors $a \in A$, $b \in B$, we get the vector $a \oplus b = (a_1, \dots, a_\alpha, b_1, \dots, b_\beta)$. Similarly, the tensor product of $A$ and $B$, $A \otimes B$, gives a vector space of dimension $\alpha\beta$; given vectors $a \in A$ and $b \in B$, we get the vector $a \otimes b = (a_i b_j)_{i,j}$.)

Let $X^{(i)}$ and $Y^{(j)}$ be vectors in $H$ with the following "coordinates" for the $k$'th part in the direct sum (note that only one of the two carries the sign $(-1)^k$, so that the signs multiply to $(-1)^k$ in the inner product):
\[
X^{(i)}_k = (-1)^k \sqrt{\frac{c^{2k+1}}{(2k+1)!}}\, (x^{(i)})^{\otimes(2k+1)} ,
\qquad
Y^{(j)}_k = \sqrt{\frac{c^{2k+1}}{(2k+1)!}}\, (y^{(j)})^{\otimes(2k+1)} .
\]
Consider the span, $S$, of $\{X^{(i)}, Y^{(j)}\}$. As there are only $d = m+n$ vectors, $S$ is isomorphic to a subspace of $\mathbb{R}^d$. By finding an orthonormal basis for $S$ (for example, using Gram-Schmidt) and mapping the basis to the standard basis for $\mathbb{R}^{m+n}$, we can preserve inner products. Note also that $\|X^{(i)}\|^2 = \|Y^{(j)}\|^2 = \sum_{k=0}^{\infty} \frac{c^{2k+1}}{(2k+1)!} = \sinh(c) = \sinh(\ln(1+\sqrt{2})) = 1$, so these are unit vectors in $H$. Thus, $X^{(i)}, Y^{(j)}$ correspond to unit vectors $\hat{x}^{(i)}, \hat{y}^{(j)}$ in $\mathbb{R}^d$ with the same inner products (in $H$ and $\mathbb{R}^d$, respectively). Moreover, $\langle X^{(i)}, Y^{(j)}\rangle = \sum_{k=0}^{\infty} (-1)^k \frac{c^{2k+1}}{(2k+1)!} (\langle x^{(i)}, y^{(j)}\rangle)^{2k+1} = \sin(c\langle x^{(i)}, y^{(j)}\rangle)$, and $|c\langle x^{(i)}, y^{(j)}\rangle| \le c < \pi/2$. It follows that $\arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)}\rangle) = \arcsin(\langle X^{(i)}, Y^{(j)}\rangle) = c\langle x^{(i)}, y^{(j)}\rangle = \ln(1+\sqrt{2})\,\langle x^{(i)}, y^{(j)}\rangle$, as required.
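Both facts used above are easy to check numerically (a minimal sketch; the truncation level and the test value $t$ are arbitrary): truncating the series gives $\langle X^{(i)}, Y^{(j)}\rangle = \sin(ct)$ for $t = \langle x^{(i)}, y^{(j)}\rangle$, and $\sinh(c) = 1$, so the $X^{(i)}, Y^{(j)}$ are unit vectors and $\arcsin$ undoes the $\sin$.

```python
# Illustrative sketch: numerical check of the series identity and of sinh(ln(1 + sqrt(2))) = 1.
import math

c = math.log(1 + math.sqrt(2))

def inner_product_XY(t, terms=30):
    # <X^(i), Y^(j)> = sum_k (-1)^k c^(2k+1)/(2k+1)! * t^(2k+1), using
    # <x^(i) tensored (2k+1) times, y^(j) tensored (2k+1) times> = t^(2k+1).
    return sum((-1) ** k * c ** (2 * k + 1) / math.factorial(2 * k + 1) * t ** (2 * k + 1)
               for k in range(terms))

t = 0.37  # any possible inner product of unit vectors, i.e. any t in [-1, 1]
print(inner_product_XY(t), math.sin(c * t))   # equal up to truncation error
print(math.sinh(c))                           # 1.0, so ||X^(i)|| = ||Y^(j)|| = 1
print(math.asin(math.sin(c * t)), c * t)      # arcsin inverts sin here since |c t| < pi/2
```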
Now, let $x^{(i)}, y^{(j)}$ be an optimal solution to the semidefinite program, and let $\hat{x}^{(i)}, \hat{y}^{(j)}$ be the unit vectors given by the lemma. Picking a unit vector $z$ uniformly at random and applying Grothendieck's Identity (recall that $K = \frac{\pi}{2\ln(1+\sqrt{2})}$),
\begin{align*}
\mathbb{E}_z\Big[\sum_{i,j} A_{ij}\, \operatorname{sign}(\langle \hat{x}^{(i)}, z\rangle)\operatorname{sign}(\langle \hat{y}^{(j)}, z\rangle)\Big]
&= \frac{2}{\pi} \sum_{i,j} A_{ij} \arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)}\rangle) \\
&= \frac{2\ln(1+\sqrt{2})}{\pi} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)}\rangle \\
&= \frac{1}{K} \max_{\text{unit } x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)}\rangle .
\end{align*}
On the other hand, for every choice of $z$ the sign vectors $(\operatorname{sign}(\langle \hat{x}^{(i)}, z\rangle))_i \in \{-1,1\}^m$ and $(\operatorname{sign}(\langle \hat{y}^{(j)}, z\rangle))_j \in \{-1,1\}^n$, so the quantity inside the expectation is at most $\max_{x_i, y_j \in \{-1,1\}} \sum_{i,j} A_{ij} x_i y_j = \|A\|_{\infty\to 1}$. Grothendieck's Inequality immediately follows.
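For completeness, a minimal sketch of the rounding step that this calculation analyzes (the function name and matrix layout are illustrative): given the unit vectors $\hat{x}^{(i)}, \hat{y}^{(j)}$ as rows of matrices, a random hyperplane with Gaussian normal $z$ produces sign vectors whose value never exceeds $\|A\|_{\infty\to 1}$ and equals, in expectation, $\frac{2}{\pi}\sum_{i,j} A_{ij}\arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)}\rangle)$.

```python
# Illustrative sketch: random-hyperplane rounding of the vectors from Krivine's lemma.
import numpy as np

def round_with_random_hyperplane(A, X_hat, Y_hat, rng):
    """A: (m, n) matrix; rows of X_hat (m, d) are xhat^(i); rows of Y_hat (n, d) are yhat^(j)."""
    z = rng.standard_normal(X_hat.shape[1])  # Gaussian normal of a random hyperplane
    x_signs = np.sign(X_hat @ z)             # in {-1, 1}^m (ties occur with probability 0)
    y_signs = np.sign(Y_hat @ z)             # in {-1, 1}^n
    return x_signs @ A @ y_signs             # = sum_ij A_ij x_i y_j for this rounding
```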
Perhaps the final detail to address is how to find $\hat{x}^{(i)}, \hat{y}^{(j)}$. We may do so with the following semidefinite program:
\begin{align*}
\min \quad & \sum_{i,j} A_{ij} \langle \hat{x}^{(i)}, \hat{y}^{(j)}\rangle - \ln(1+\sqrt{2}) \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)}\rangle \\
\text{s.t.} \quad & \hat{x}^{(1)}, \dots, \hat{x}^{(m)}, \hat{y}^{(1)}, \dots, \hat{y}^{(n)} \text{ are unit vectors in } (\mathbb{R}^d, \|\cdot\|_2)
\end{align*}
4 Connection to SOS
Let $\mu : \{-1,1\}^{m+n} \to \mathbb{R}$ be an arbitrary degree-2 pseudo-distribution. Recall that we aim to prove that $\tilde{\mathbb{E}}_{\mu(x,y)}[\sum_{i,j} A_{ij} x_i y_j] \le K\|A\|_{\infty\to 1}$, for all such $\mu$. To see the similarity between this statement and Grothendieck's Inequality, we appeal to the fact that, by considering $\mu'(w) = \frac{1}{2}(\mu(w) + \mu(-w))$, which has the same pseudo-expectation as $\mu$ on $\sum_{i,j} A_{ij} x_i y_j$, we may assume that $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i] = \tilde{\mathbb{E}}_{\mu(x,y)}[y_j] = 0$. Moreover, as $(x,y) \in \{-1,1\}^{m+n}$, $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i^2] = \tilde{\mathbb{E}}_{\mu(x,y)}[y_j^2] = \tilde{\mathbb{E}}_{\mu(x,y)}[1] = 1$. By the Quadratic Sampling Lemma (see lecture notes or Boaz's notes), there exists a joint normal probability distribution $\rho : \mathbb{R}^{m+n} \to \mathbb{R}$ that has the same first two moments as $\mu$; i.e. the same mean (entry-wise) and covariance matrix (here we consider the "formal" covariance matrix under the pseudo-expectation of $\mu$). Geometrically, sampling a point in $\mathbb{R}^{m+n}$ according to $\rho$ is essentially sampling a point from $\{w \in \mathbb{R}^{m+n} : \|w\|_2 \le 1\}$, the unit ball in $\mathbb{R}^{m+n}$. Hence, roughly speaking, we have:
\begin{align*}
\mathbb{E}_{(u,v)\sim\rho}\Big[\sum_{i,j} A_{ij} u_i v_j\Big]
&\le \mathbb{E}_{x^{(i)}, y^{(j)} \sim \rho}\Big[\sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)}\rangle\Big] \\
&\approx \mathbb{E}_{\substack{x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}: \\ \|x^{(i)}\|_2, \|y^{(j)}\|_2 \le 1}}\Big[\sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)}\rangle\Big] \\
&\le \max_{\substack{x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}: \\ \|x^{(i)}\|_2, \|y^{(j)}\|_2 \le 1}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)}\rangle \\
&= \max_{\substack{x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}: \\ \|x^{(i)}\|_2, \|y^{(j)}\|_2 = 1}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)}\rangle \\
&\le K \|A\|_{\infty\to 1}
\end{align*}
The fourth line follows since the maximum value of the semidefinite program is achieved on the boundary (i.e. we could have relaxed the constraint on $x^{(i)}, y^{(j)}$ to $\|x^{(i)}\|_2, \|y^{(j)}\|_2 \le 1$). As $\mu$ and $\rho$ have the same first two moments, the pseudo-expectation of $\sum_{i,j} A_{ij} x_i y_j$ under $\mu$ is the same as the expectation under $\rho$, so we have the desired bound. Of course, this is rather imprecise, as sampling from $\rho$ does not exactly correspond to sampling from $\{w \in \mathbb{R}^{m+n} : \|w\|_2 \le 1\}$ (although it is true with high probability), hence the second line is only "$\approx$". But, hopefully, this lends some intuition on why the theorem could be true. In the remainder of this section, we formally prove it.
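The Quadratic Sampling Lemma step can be made concrete with a minimal sketch (illustrative names; `Sigma` stands for the pseudo-covariance matrix of $\mu$, which is positive semidefinite with unit diagonal): a mean-zero Gaussian with covariance `Sigma` has the same first two moments as $\mu$ by construction.

```python
# Illustrative sketch: sample a Gaussian whose first two moments match the pseudo-distribution.
import numpy as np

def sample_matching_gaussian(Sigma, trials, rng):
    # Draw (u, v) ~ N(0, Sigma): E[w_a] = 0 and E[w_a w_b] = Sigma_ab, i.e. the same first
    # two moments as the pseudo-distribution whose pseudo-covariance matrix is Sigma.
    return rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=trials)
```

Averaging $\sum_{i,j} A_{ij} u_i v_j$ over such samples then estimates $\tilde{\mathbb{E}}_{\mu(x,y)}[\sum_{i,j} A_{ij} x_i y_j]$.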
We first prove a modified version of Grothendieck’s Identity.
[Figure: the vectors $U$ and $W$ with the angle $\theta$ between them, $\|U + W\| = r$, and the bound $\theta < \arcsin(\sqrt{1-\rho^2})$.]
this gives us that $\Pr[\operatorname{sign}(u) \ne \operatorname{sign}(v)] = \frac{1}{\pi}\arcsin(\sqrt{1-\rho^2})$. It follows that
\[
\mathbb{E}[\operatorname{sign}(u)\operatorname{sign}(v)] = 1 - \frac{2}{\pi}\arcsin(\sqrt{1-\rho^2}) = 1 - \frac{2}{\pi}\arccos(\rho) = 1 - \frac{2}{\pi}\Big(\frac{\pi}{2} - \arcsin(\rho)\Big) = \frac{2}{\pi}\arcsin(\rho) .
\]
The preceding lemma implies that if $(u_1, \dots, u_m, v_1, \dots, v_n) \in \mathbb{R}^{m+n}$ is a joint normal distribution that satisfies $\mathbb{E}[u_i] = \mathbb{E}[v_j] = 0$ and $\mathbb{E}[u_i^2] = \mathbb{E}[v_j^2] = 1$, then, for all $i, j$,
\[
\mathbb{E}[\operatorname{sign}(u_i)\operatorname{sign}(v_j)] = \frac{2}{\pi}\arcsin(\mathbb{E}[u_i v_j]) .
\]
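This consequence is also easy to check empirically (a minimal sketch; the correlation $\rho$ and the number of trials are arbitrary): draw a pair of standard Gaussians with correlation $\rho$ and compare the empirical $\mathbb{E}[\operatorname{sign}(u)\operatorname{sign}(v)]$ with $\frac{2}{\pi}\arcsin(\rho)$.

```python
# Illustrative sketch: Monte Carlo check of the Gaussian version of Grothendieck's Identity.
import numpy as np

rng = np.random.default_rng(1)
rho, trials = 0.6, 500_000

cov = np.array([[1.0, rho], [rho, 1.0]])
samples = rng.multivariate_normal(np.zeros(2), cov, size=trials)
u, v = samples[:, 0], samples[:, 1]

print(np.mean(np.sign(u) * np.sign(v)), (2 / np.pi) * np.arcsin(rho))
```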
To carry out the same calculation we did in the preceding section to prove Grothendieck's Inequality, we need
\[
\arcsin(\mathbb{E}[u_i v_j]) = \ln(1+\sqrt{2})\,\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j] ,
\]
which should look familiar from Krivine's lemma in the previous section. Put another way, we need to choose the covariance $\mathbb{E}[u_i v_j]$ according to $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j]$. The next lemma proves exactly this:
Lemma. There exists a joint normal distribution $(u_1, \dots, u_m, v_1, \dots, v_n) \in \mathbb{R}^{m+n}$ such that, for all $i \in [m]$, $j \in [n]$,

• $\mathbb{E}[u_i] = \mathbb{E}[v_j] = 0$,

• $\mathbb{E}[u_i^2] = \mathbb{E}[v_j^2] = 1$, and

• $\arcsin(\mathbb{E}[u_i v_j]) = \ln(1+\sqrt{2})\,\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j]$.
Proof. Let $c = \ln(1+\sqrt{2})$, and let $\Sigma$ be the $(m+n)\times(m+n)$ pseudo-covariance matrix of $\mu$, with blocks $\Sigma_{xx} = (\tilde{\mathbb{E}}_{\mu(x,y)}[x_i x_{i'}])$, $\Sigma_{xy} = (\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j])$, $\Sigma_{yx} = \Sigma_{xy}^\top$, and $\Sigma_{yy} = (\tilde{\mathbb{E}}_{\mu(x,y)}[y_j y_{j'}])$. Define
\[
\Sigma' = \begin{pmatrix} \sinh(c\,\Sigma_{xx}) & \sin(c\,\Sigma_{xy}) \\ \sin(c\,\Sigma_{yx}) & \sinh(c\,\Sigma_{yy}) \end{pmatrix},
\]
where $\sinh$ and $\sin$ are applied entry-wise to each submatrix. Since $\Sigma$ is positive semidefinite and symmetric, it can be shown that the same holds for $\Sigma'$, i.e. $\Sigma'$ defines a covariance matrix. Moreover, since $\sinh(\ln(1+\sqrt{2})) = 1$ and $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i^2] = \tilde{\mathbb{E}}_{\mu(x,y)}[y_j^2] = 1$, the diagonal entries of $\Sigma'$ are all $1$. It follows that we can pick $(u,v)$ to be a joint normal distribution with $\mathbb{E}[u_i] = \mathbb{E}[v_j] = 0$ and covariance matrix $\Sigma'$. In particular, $\mathbb{E}[u_i v_j] = \sin(c\,\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j])$ and, since $|c\,\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j]| \le c < \pi/2$, the third property holds.
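The construction in this proof is easy to write out explicitly (a minimal sketch with illustrative names; `Sigma` is the $(m+n)\times(m+n)$ pseudo-covariance matrix of $\mu$, with unit diagonal): apply $\sinh(c\,\cdot)$ entry-wise to the diagonal blocks and $\sin(c\,\cdot)$ entry-wise to the off-diagonal blocks.

```python
# Illustrative sketch: the sinh/sin covariance matrix Sigma' used in the lemma.
import numpy as np

def krivine_covariance(Sigma, m):
    c = np.log(1 + np.sqrt(2))
    S = np.sinh(c * Sigma)                  # start with sinh(c * .) applied entry-wise everywhere
    S[:m, m:] = np.sin(c * Sigma[:m, m:])   # then overwrite the x-y block with sin(c * .)
    S[m:, :m] = np.sin(c * Sigma[m:, :m])   # and likewise the y-x block
    return S
```

A Gaussian $(u,v)$ with this covariance then satisfies $\mathbb{E}[u_i v_j] = \sin(c\,\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j])$, which is exactly the third property of the lemma.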
Putting this all together, we have that:
\begin{align*}
\|A\|_{\infty\to 1} &\ge \mathbb{E}_{(u,v)}\Big[\sum_{i,j} A_{ij}\, \operatorname{sign}(u_i)\operatorname{sign}(v_j)\Big] \\
&= \sum_{i,j} A_{ij}\, \mathbb{E}_{(u,v)}[\operatorname{sign}(u_i)\operatorname{sign}(v_j)] \\
&= \frac{2}{\pi} \sum_{i,j} A_{ij} \arcsin(\mathbb{E}_{(u,v)}[u_i v_j]) \\
&= \frac{2\ln(1+\sqrt{2})}{\pi} \sum_{i,j} A_{ij}\, \tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j] \\
&= \frac{1}{K} \sum_{i,j} A_{ij}\, \tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j] .
\end{align*}
Rearranging gives $\tilde{\mathbb{E}}_{\mu(x,y)}[\sum_{i,j} A_{ij} x_i y_j] \le K\|A\|_{\infty\to 1}$, which is the theorem.