
Grothendieck’s Inequality

Leqi Zhu

1 Introduction
Let $A = (A_{ij}) \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix. Then $A$ defines a linear operator from the normed space $(\mathbb{R}^n, \|\cdot\|_p)$ to the normed space $(\mathbb{R}^m, \|\cdot\|_q)$, for $1 \le p, q \le \infty$. The $(p \to q)$-norm of $A$ is the quantity $\|A\|_{p \to q} = \max_{x \in \mathbb{R}^n : \|x\|_p = 1} \|Ax\|_q$. (Recall that, for a vector $x = (x_i) \in \mathbb{R}^d$, the $p$-norm of $x$ is $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$; the $\infty$-norm of $x$ is $\|x\|_\infty = \max_i |x_i|$.) If $p = q$, then we denote the norm by $\|A\|_p$.
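For the $p = q$ cases the induced norm has a convenient closed form: $\|A\|_1$ is the maximum absolute column sum, $\|A\|_2$ is the largest singular value, and $\|A\|_\infty$ is the maximum absolute row sum. A minimal sketch checking this numerically (assuming numpy is available; the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# Induced (p -> p) norms for p = 1, 2, infinity have closed forms:
norm_1   = np.linalg.norm(A, ord=1)       # max absolute column sum
norm_2   = np.linalg.norm(A, ord=2)       # largest singular value
norm_inf = np.linalg.norm(A, ord=np.inf)  # max absolute row sum

print(norm_1, norm_2, norm_inf)
```

No such closed form is available for $\|A\|_{\infty \to 1}$; as discussed below, computing it amounts to maximizing over the vertices of a hypercube.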
For which values of $p$ and $q$ is $\|A\|_{p \to q}$ maximized? Since $A$ is linear, it suffices to consider $p$ such that $\{x \in \mathbb{R}^n : \|x\|_p \le 1\}$ contains as many points as possible. We also want $\|Ax\|_q$ as large as possible. Figure 1 gives an illustration of $\{x \in \mathbb{R}^2 : \|x\|_p = 1\}$ for $p \in \{1, 2, \infty\}$. Going by the figure, the $\infty$-ball contains every other unit ball; combined with the fact that $\|v\|_1 \ge \|v\|_q$ for every vector $v$ and every $q \ge 1$, this gives $\|A\|_{\infty \to 1} \ge \|A\|_{p \to q}$.

Figure 1: Depiction of $\{x \in \mathbb{R}^2 : \|x\|_p = 1\}$ for $p \in \{1, 2, \infty\}$.

Besides providing an upper bound on every $(p \to q)$-norm, the $(\infty \to 1)$-norm is known to provide a constant-factor approximation to the cut norm of a matrix, $\|A\|_C = \max_{S \subseteq [m], T \subseteq [n]} \big|\sum_{i \in S, j \in T} A_{ij}\big|$, which is closely related to the MAX-CUT problem on a graph.
One way to compute $\|A\|_{\infty \to 1}$ is by solving a quadratic integer program:
$$\max \sum_{i,j} A_{ij} x_i y_j \quad \text{s.t. } (x, y) \in \{-1, 1\}^{m+n}$$
To see this, note that $\sum_{i,j} A_{ij} x_i y_j = \sum_i (Ay)_i x_i$. Taking the maximum over $x \in \{-1,1\}^m$ gives us $\|Ay\|_1$. Then taking the maximum over $y \in \{-1,1\}^n$ gives us $\|A\|_{\infty \to 1}$ (this requires an argument: by the triangle inequality, $y \mapsto \|Ay\|_1$ is convex, so its maximum over the convex set $\{y \in \mathbb{R}^n : \|y\|_\infty \le 1\}$ is attained at a vertex, i.e. at some $y \in \{-1,1\}^n$).
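For small matrices, $\|A\|_{\infty \to 1}$ can be computed directly from this characterization by enumerating sign vectors. A brute-force sketch (exponential in $n$, so only suitable for tiny instances; the helper name is my own), assuming numpy:

```python
import itertools
import numpy as np

def inf_to_one_norm(A):
    """Compute ||A||_{inf -> 1} = max over y in {-1,1}^n of ||A y||_1 by enumeration."""
    m, n = A.shape
    best = -np.inf
    for y in itertools.product([-1.0, 1.0], repeat=n):
        # for fixed y, the optimal x is x_i = sign((Ay)_i), giving ||A y||_1
        best = max(best, np.abs(A @ np.array(y)).sum())
    return best

A = np.array([[1.0, -2.0], [3.0, 4.0], [0.5, -1.5]])
print(inf_to_one_norm(A))
```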
This quadratic integer program may be relaxed to the following semidefinite program:
$$\max \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle \quad \text{s.t. } x^{(1)}, \ldots, x^{(m)}, y^{(1)}, \ldots, y^{(n)} \text{ are unit vectors in } (\mathbb{R}^d, \|\cdot\|_2)$$
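In Gram-matrix form, this program maximizes $\sum_{i,j} A_{ij} X_{i, m+j}$ over positive semidefinite matrices $X$ with unit diagonal, where $X$ is the Gram matrix of $x^{(1)}, \ldots, x^{(m)}, y^{(1)}, \ldots, y^{(n)}$. A sketch of how one might solve it, assuming the cvxpy modeling library with an SDP-capable solver (e.g. SCS) is installed; this particular tooling is my choice, not part of the note:

```python
import cvxpy as cp
import numpy as np

def grothendieck_sdp(A):
    """Solve max sum_ij A_ij <x^(i), y^(j)> over unit vectors, via the Gram matrix."""
    m, n = A.shape
    X = cp.Variable((m + n, m + n), PSD=True)          # Gram matrix of all m+n vectors
    objective = cp.Maximize(cp.sum(cp.multiply(A, X[:m, m:])))
    constraints = [cp.diag(X) == 1]                    # every vector has unit norm
    cp.Problem(objective, constraints).solve()
    return X.value

A = np.array([[1.0, -2.0], [3.0, 4.0], [0.5, -1.5]])
X = grothendieck_sdp(A)
```

The vectors $x^{(i)}, y^{(j)}$ themselves can be recovered from $X$ by a Cholesky or eigenvalue factorization $X = V^t V$.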
Notice that, if $d = 1$, this is exactly the same optimization problem. It is known that exactly computing $\|A\|_{p \to q}$ is NP-hard for $1 \le q < p \le \infty$, while exactly computing $\|A\|_p$ is NP-hard for $p \notin \{1, 2, \infty\}$. (As far as I can tell, it is an open question whether $\|A\|_{p \to q}$ is computable in polynomial time when $1 \le p < q \le \infty$.) Hence, if $d = 1$, we cannot hope for the ellipsoid method to converge quickly on all instances of $A$. However, there are no such hardness results for $d > 1$, so in principle the ellipsoid method could converge quickly.
A natural question is: how well does an optimal solution to the semidefinite program approximate $\|A\|_{\infty \to 1}$? Grothendieck's Inequality provides an answer to this question:

Theorem (Grothendieck's Inequality). There exists a fixed constant $C > 0$ such that, for all $m, n \ge 1$, all $A \in \mathbb{R}^{m \times n}$, and any Hilbert space $H$ (vector space over $\mathbb{R}$ with an inner product),
$$\max_{\substack{\text{unit vectors} \\ x^{(i)}, y^{(j)} \in H}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle_H \le C \, \|A\|_{\infty \to 1} .$$

Grothendieck's constant is the smallest $C$ such that the above inequality holds. As far as I can tell, determining the exact value of Grothendieck's constant is an open problem. However, it is known that it lies between $\frac{\pi}{2} \approx 1.57$ and $K = \frac{\pi}{2\ln(1+\sqrt{2})} \approx 1.78$.
Hence, the value of an optimal solution to the semidefinite program provides a constant-factor approximation of $\|A\|_{\infty \to 1}$. However, this is a bit unsatisfying because, given an optimal solution to the semidefinite program, we do not know how to round it to obtain an integer solution $(x, y) \in \{-1,1\}^{m+n}$ with a good approximation ratio.
Alon and Naor resolved this problem rather nicely by adapting Krivine's proof of Grothendieck's Inequality, which obtains the upper bound $K$ on Grothendieck's constant, to obtain a randomized rounding method:

Theorem (Alon and Naor). For $d = m + n$, given an optimal solution $x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}$ to the semidefinite program, it is possible to obtain $(x, y) \in \{-1,1\}^{m+n}$ (using randomized rounding) such that
$$\mathbb{E}\Big[\sum_{i,j} A_{ij} x_i y_j\Big] = \frac{1}{K} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle \ge \frac{1}{K} \|A\|_{\infty \to 1} \approx 0.56 \, \|A\|_{\infty \to 1} .$$

It turns out that Krivine's proof can also be adapted to prove a theorem about degree-2 pseudo-distributions $\mu : \{-1,1\}^{m+n} \to \mathbb{R}$. Recall that $\mu$ has to satisfy two properties, stated in terms of the pseudo-expectation $\tilde{\mathbb{E}}_\mu$ that arises from $\mu$: $\tilde{\mathbb{E}}_\mu[1] = \sum_{x \in \{-1,1\}^{m+n}} \mu(x) = 1$, and $\tilde{\mathbb{E}}_\mu[f^2] = \sum_{x \in \{-1,1\}^{m+n}} \mu(x) (f(x))^2 \ge 0$ for all degree-1 polynomials $f : \{-1,1\}^{m+n} \to \mathbb{R}$.
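Concretely, the second property is equivalent to positive semidefiniteness of the degree-$\le 1$ moment matrix of $\mu$. A small sketch that checks both properties for a $\mu$ given explicitly as a table of (possibly negative) values; the function name and tolerance are my own choices:

```python
import itertools
import numpy as np

def is_degree2_pseudodistribution(mu, d, tol=1e-9):
    """mu: dict mapping tuples in {-1,1}^d to real values.
    Checks E~[1] = 1 and E~[f^2] >= 0 for every degree-1 f, which is equivalent
    to the moment matrix M (indexed by 1, w_1, ..., w_d) being PSD."""
    M = np.zeros((d + 1, d + 1))
    for w in itertools.product([-1, 1], repeat=d):
        v = np.concatenate(([1.0], np.array(w, dtype=float)))  # (1, w_1, ..., w_d)
        M += mu.get(w, 0.0) * np.outer(v, v)                   # M = sum_w mu(w) v v^T
    mass_ok = abs(M[0, 0] - 1.0) < tol            # E~[1] = 1
    psd_ok = np.linalg.eigvalsh(M).min() >= -tol  # E~[f^2] = a^T M a >= 0 for all a
    return mass_ok and psd_ok
```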
Theorem (SOS). For any degree-2 pseudo-distribution $\mu : \{-1,1\}^{m+n} \to \mathbb{R}$,
$$\tilde{\mathbb{E}}_{\mu(x,y)}\Big[\sum_{i,j} A_{ij} x_i y_j\Big] \le K \|A\|_{\infty \to 1} .$$

In this note, we will prove Grothendieck's Inequality when $H = \mathbb{R}^{m+n}$. The proof is mainly due to Krivine; however, we use a nice simplification, due to Alon and Naor, of a key lemma in Krivine's proof (which holds for general $H$). This will provide us with the tools to prove Alon and Naor's theorem. Finally, we will discuss the connection to SOS and how to prove the SOS theorem.

2 Grothendieck’s Inequality
Krivine's proof of Grothendieck's Inequality relies on Grothendieck's Identity, which, as the name suggests, was first proved by Grothendieck:

Lemma (Grothendieck's Identity). Let $x$ and $y$ be unit vectors in $(\mathbb{R}^d, \|\cdot\|_2)$, where $d \ge 2$. If $z$ is a unit vector picked uniformly at random from the unit sphere in $(\mathbb{R}^d, \|\cdot\|_2)$, then
$$\mathbb{E}[\mathrm{sign}(\langle x, z \rangle)\,\mathrm{sign}(\langle y, z \rangle)] = \frac{2}{\pi} \arcsin(\langle x, y \rangle) .$$
Here, $\mathrm{sign}(a) \in \{-1, 1\}$ is $1$ if and only if $a \ge 0$.
Proof. Consider $\mathrm{sign}(\langle x, z \rangle)\,\mathrm{sign}(\langle y, z \rangle)$. This has a nice geometric interpretation. First, we orient the sphere $\{w \in \mathbb{R}^d : \|w\|_2 = 1\}$ so that $z$ is at the top. It can be verified that $\mathrm{sign}(\langle x, z \rangle)\,\mathrm{sign}(\langle y, z \rangle)$ is $1$ if and only if both $x$ and $y$ lie in the same (upper or lower) half of the sphere when it is oriented this way. Equivalently, $\{w \in \mathbb{R}^d : \langle z, w \rangle = 0\}$ is a hyperplane passing through the origin (with normal $z$), and a vector $w \in \mathbb{R}^d$ satisfies $\langle w, z \rangle > 0$ if and only if it lies above the hyperplane. Figure 2 contains a depiction of this.
Now, consider the expectation. Given the geometric interpretation, when a random hyperplane passing through the origin is selected (with normal $z$), the expectation is $\Pr[x, y \text{ lie in the same half}] - \Pr[x, y \text{ lie in different halves}] = 1 - 2\Pr[x, y \text{ lie in different halves}]$. Then we note that the probability that $x$ and $y$ lie in different halves is $\frac{2\theta}{2\pi} = \frac{\theta}{\pi}$, where $\theta$ is the angle between $x$ and $y$ (the factor of $2$ comes from $z$ and $-z$ defining the same hyperplane). Hence, the expectation is $1 - \frac{2\theta}{\pi}$. On the other hand, $\frac{2}{\pi}\arcsin(\langle x, y \rangle) = \frac{2}{\pi}\arcsin(\cos\theta) = \frac{2}{\pi}\arcsin(\sin(\tfrac{\pi}{2} - \theta)) = \frac{2}{\pi}(\tfrac{\pi}{2} - \theta) = 1 - \frac{2\theta}{\pi}$.
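The identity is easy to sanity-check by simulation with random hyperplanes. A quick Monte Carlo sketch (the dimension, seed, and sample size are arbitrary choices), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# two fixed unit vectors
x = rng.normal(size=d); x /= np.linalg.norm(x)
y = rng.normal(size=d); y /= np.linalg.norm(y)

# random unit vectors z: the direction of a standard Gaussian is uniform on the sphere
Z = rng.normal(size=(200_000, d))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)

lhs = np.mean(np.sign(Z @ x) * np.sign(Z @ y))
rhs = (2 / np.pi) * np.arcsin(x @ y)
print(lhs, rhs)  # agree up to Monte Carlo error
```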

On its own this doesn't appear to help much, since $\arcsin(\langle x, y \rangle)$ is a nonlinear function of the inner product $\langle x, y \rangle$. The next lemma addresses this problem.

Figure 2: Geometric interpretation of $\mathrm{sign}(\langle x, z \rangle)\,\mathrm{sign}(\langle y, z \rangle)$: in the left configuration the product is $1$, in the right configuration it is $-1$.

Lemma (Krivine/Alon and Naor). Suppose that $x^{(i)}, y^{(j)}$ are unit vectors in $\mathbb{R}^{m+n}$, for $i \in [m]$, $j \in [n]$. Then there are unit vectors $\hat{x}^{(i)}, \hat{y}^{(j)}$ in $\mathbb{R}^{m+n}$, for $i \in [m]$, $j \in [n]$, such that
$$\arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)} \rangle) = \ln(1 + \sqrt{2}) \, \langle x^{(i)}, y^{(j)} \rangle .$$

Proof. Let $c = \ln(1 + \sqrt{2})$ and $d = m + n$. By the Taylor expansion of $\sin$,
$$\sin(c \langle x^{(i)}, y^{(j)} \rangle) = \sum_{k=0}^{\infty} (-1)^k \frac{c^{2k+1}}{(2k+1)!} (\langle x^{(i)}, y^{(j)} \rangle)^{2k+1} .$$

Our goal is to write the right-hand side as an inner product of two vectors in some vector space. This suggests that we need an infinite-dimensional vector space. Towards this end, consider the infinite-dimensional vector space $H$ obtained by taking the direct sum, over $k \ge 0$, of the $(2k+1)$-st tensor powers of $\mathbb{R}^d$, i.e. $H = \bigoplus_{k=0}^{\infty} (\mathbb{R}^d)^{\otimes(2k+1)}$. (As a bit of an aside, the direct sum of two vector spaces $A$ and $B$, of dimension $\alpha$ and $\beta$ respectively, is a vector space $A \oplus B$ of dimension $\alpha + \beta$; given vectors $a \in A$, $b \in B$, we get the vector $a \oplus b = (a_1, \ldots, a_\alpha, b_1, \ldots, b_\beta)$. Similarly, the tensor product of $A$ and $B$, $A \otimes B$, gives a vector space of dimension $\alpha\beta$; given vectors $a \in A$ and $b \in B$, we get the vector $a \otimes b = (a_i b_j)_{i,j}$.)
Let $X^{(i)}$ and $Y^{(j)}$ be the vectors in $H$ whose $k$-th components in the direct sum are
$$X^{(i)}_k = (-1)^k \sqrt{\frac{c^{2k+1}}{(2k+1)!}} \, (x^{(i)})^{\otimes(2k+1)} , \qquad Y^{(j)}_k = \sqrt{\frac{c^{2k+1}}{(2k+1)!}} \, (y^{(j)})^{\otimes(2k+1)}$$
(note that the alternating sign $(-1)^k$ appears in only one of the two families; this is what produces $\sin$ rather than $\sinh$ in the cross inner products). It is a fact that $\langle a^{\otimes(2k+1)}, b^{\otimes(2k+1)} \rangle = (\langle a, b \rangle)^{2k+1}$. Hence, $\langle X^{(i)}, Y^{(j)} \rangle = \sin(c \langle x^{(i)}, y^{(j)} \rangle)$, as required. Moreover, since the signs cancel within each family, it can be verified that $\langle X^{(i)}, X^{(i)} \rangle = \sinh(c \langle x^{(i)}, x^{(i)} \rangle) = \sinh(c) = 1$, by appealing to the Taylor expansion of $\sinh(x) = \frac{1}{2}(e^x - e^{-x})$ and using the preceding fact. Similarly, $\langle Y^{(j)}, Y^{(j)} \rangle = 1$. It follows that $X^{(i)}$ and $Y^{(j)}$ are unit vectors in $H$.

Consider the span, $S$, of $\{X^{(i)}\} \cup \{Y^{(j)}\}$. As there are only $d = m + n$ vectors, $S$ is isomorphic to a subspace of $\mathbb{R}^d$. By finding an orthonormal basis for $S$ (for example, using Gram-Schmidt) and mapping the basis to the standard basis of $\mathbb{R}^{m+n}$, we can preserve inner products. Thus, $X^{(i)}, Y^{(j)}$ correspond to unit vectors $\hat{x}^{(i)}, \hat{y}^{(j)}$ in $\mathbb{R}^d$ with the same inner products (in $H$ and $\mathbb{R}^d$, respectively). It follows that $\arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)} \rangle) = \arcsin(\langle X^{(i)}, Y^{(j)} \rangle) = \arcsin(\sin(c \langle x^{(i)}, y^{(j)} \rangle)) = c \langle x^{(i)}, y^{(j)} \rangle = \ln(1 + \sqrt{2}) \, \langle x^{(i)}, y^{(j)} \rangle$, as required (the second-to-last equality uses that $|c \langle x^{(i)}, y^{(j)} \rangle| \le c < \pi/2$).
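Two small facts used above are easy to verify numerically: $\langle a^{\otimes 3}, b^{\otimes 3} \rangle = (\langle a, b \rangle)^3$ and $\sinh(\ln(1+\sqrt{2})) = 1$. A sketch using numpy's Kronecker product to flatten tensor powers:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=4), rng.normal(size=4)

# <a^{(x)3}, b^{(x)3}> = <a, b>^3, with tensor powers flattened via np.kron
a3 = np.kron(np.kron(a, a), a)
b3 = np.kron(np.kron(b, b), b)
print(a3 @ b3, (a @ b) ** 3)            # equal up to floating-point error

# sinh(ln(1 + sqrt(2))) = 1, so the X^(i), Y^(j) above are unit vectors
print(np.sinh(np.log(1 + np.sqrt(2))))  # prints 1.0 (up to rounding)
```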

Proof of Grothendieck's Inequality. We may now prove Grothendieck's Inequality when $H = \mathbb{R}^{m+n}$. Let $x^{(i)}, y^{(j)}$ be unit vectors maximizing $\sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle$, and apply Krivine/Alon and Naor's lemma to obtain $\hat{x}^{(i)}, \hat{y}^{(j)}$. Define random variables $\hat{x}_i = \mathrm{sign}(\langle \hat{x}^{(i)}, z \rangle)$ and $\hat{y}_j = \mathrm{sign}(\langle \hat{y}^{(j)}, z \rangle)$, where $z$ is a unit vector in $\mathbb{R}^{m+n}$ chosen uniformly at random. We may then compute:
$$
\mathbb{E}\Big[\sum_{i,j} A_{ij} \hat{x}_i \hat{y}_j\Big]
= \sum_{i,j} A_{ij} \, \mathbb{E}\big[\mathrm{sign}(\langle \hat{x}^{(i)}, z \rangle)\,\mathrm{sign}(\langle \hat{y}^{(j)}, z \rangle)\big]
= \frac{2}{\pi} \sum_{i,j} A_{ij} \arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)} \rangle)
= \frac{2\ln(1+\sqrt{2})}{\pi} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle
= \frac{1}{K} \max_{\text{unit } x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle .
$$
As $(\hat{x}, \hat{y}) \in \{-1,1\}^{m+n}$, the left-hand side is at most $\max_{x_i, y_j \in \{-1,1\}} \sum_{i,j} A_{ij} x_i y_j = \|A\|_{\infty \to 1}$. Grothendieck's Inequality immediately follows.

3 Alon and Naor’s Theorem


Alon and Naor's rounding algorithm is as follows:

1. Compute an optimal solution $x^{(i)}, y^{(j)}$ of the semidefinite program.

2. Find $\hat{x}^{(i)}, \hat{y}^{(j)}$ such that $\arcsin(\langle \hat{x}^{(i)}, \hat{y}^{(j)} \rangle) = \ln(1 + \sqrt{2}) \, \langle x^{(i)}, y^{(j)} \rangle$.

3. Pick a unit vector $z$ uniformly at random from the unit sphere in $\mathbb{R}^{m+n}$.

4. Set $\hat{x}_i = \mathrm{sign}(\langle \hat{x}^{(i)}, z \rangle)$ and $\hat{y}_j = \mathrm{sign}(\langle \hat{y}^{(j)}, z \rangle)$.

Then the same calculation as in the preceding section gives us that
$$\mathbb{E}\Big[\sum_{i,j} A_{ij} \hat{x}_i \hat{y}_j\Big] = \frac{1}{K} \max_{\text{unit } x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle .$$
Alon and Naor's theorem follows.

Perhaps the final detail to address is how to find $\hat{x}^{(i)}, \hat{y}^{(j)}$. We may do so with the following semidefinite program:
$$\min \; \sum_{i,j} A_{ij} \langle \hat{x}^{(i)}, \hat{y}^{(j)} \rangle - \ln(1+\sqrt{2}) \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle \quad \text{s.t. } \hat{x}^{(1)}, \ldots, \hat{x}^{(m)}, \hat{y}^{(1)}, \ldots, \hat{y}^{(n)} \text{ are unit vectors in } (\mathbb{R}^d, \|\cdot\|_2)$$
By Krivine's lemma, the optimal value is 0.
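The whole pipeline can be sketched in code. The sketch below recovers $\hat{x}^{(i)}, \hat{y}^{(j)}$ not by solving the auxiliary semidefinite program above but via the equivalent closed-form construction used in the next section (apply $\sinh$ entry-wise within the $x$- and $y$-blocks of the Gram matrix and $\sin$ across blocks, then factor); it assumes the `grothendieck_sdp` helper from the earlier snippet, and all function names are my own:

```python
import numpy as np

def alon_naor_round(A, X, rng=None):
    """Round a Gram-matrix solution X (PSD, unit diagonal) of the Grothendieck SDP
    for A to signs (x, y) in {-1,1}^{m+n}, following the steps listed above."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    c = np.log(1 + np.sqrt(2))

    # Krivine transform: sinh(c*.) within blocks, sin(c*.) across blocks, so that
    # arcsin(<xhat_i, yhat_j>) = c * <x_i, y_j>.  The result is again PSD with diag 1.
    M = np.empty_like(X)
    M[:m, :m] = np.sinh(c * X[:m, :m])
    M[m:, m:] = np.sinh(c * X[m:, m:])
    M[:m, m:] = np.sin(c * X[:m, m:])
    M[m:, :m] = np.sin(c * X[m:, :m])

    # factor M = V^T V (clip tiny negative eigenvalues from numerical error);
    # the columns of V are the unit vectors xhat^(i), yhat^(j)
    w, U = np.linalg.eigh(M)
    V = (U * np.sqrt(np.clip(w, 0, None))).T

    # random hyperplane rounding: a standard Gaussian z has uniformly random direction
    z = rng.normal(size=V.shape[0])
    signs = np.sign(V.T @ z)
    signs[signs == 0] = 1.0
    return signs[:m], signs[m:]

# usage (relies on grothendieck_sdp from the earlier sketch):
# x, y = alon_naor_round(A, grothendieck_sdp(A))
# x @ A @ y is then, in expectation, at least (1/K) * ||A||_{inf -> 1}
```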

4 Connection to SOS
Let $\mu : \{-1,1\}^{m+n} \to \mathbb{R}$ be an arbitrary degree-2 pseudo-distribution. Recall that we aim to prove that $\tilde{\mathbb{E}}_{\mu(x,y)}[\sum_{i,j} A_{ij} x_i y_j] \le K \|A\|_{\infty \to 1}$ for all such $\mu$. To see the similarity between this statement and Grothendieck's Inequality, we appeal to the fact that, by considering $\mu_0(w) = \frac{1}{2}(\mu(w) + \mu(-w))$, which has the same pseudo-expectation as $\mu$ on $\sum_{i,j} A_{ij} x_i y_j$, we may assume that $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i] = \tilde{\mathbb{E}}_{\mu(x,y)}[y_j] = 0$. Moreover, as $(x, y) \in \{-1,1\}^{m+n}$, $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i^2] = \tilde{\mathbb{E}}_{\mu(x,y)}[y_j^2] = \tilde{\mathbb{E}}_{\mu(x,y)}[1] = 1$. By the Quadratic Sampling Lemma (see the lecture notes or Boaz's notes), there exists a joint normal probability distribution $\rho$ on $\mathbb{R}^{m+n}$ that has the same first two moments as $\mu$, i.e. the same mean (entry-wise) and covariance matrix (here we consider the "formal" covariance matrix under the pseudo-expectation of $\mu$). Geometrically, sampling a point in $\mathbb{R}^{m+n}$ according to $\rho$ is essentially sampling a point from $\{x \in \mathbb{R}^{m+n} : \|x\|_2 \le 1\}$, the unit ball in $\mathbb{R}^{m+n}$. Hence, roughly speaking, we have:
   
$$
\mathbb{E}_{(u,v)\sim\rho}\Big[\sum_{i,j} A_{ij} u_i v_j\Big]
\;\approx\; \mathbb{E}_{\substack{x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}: \\ \|x^{(i)}\|_2, \|y^{(j)}\|_2 \le 1}}\Big[\sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle\Big]
\;\le\; \max_{\substack{x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}: \\ \|x^{(i)}\|_2, \|y^{(j)}\|_2 \le 1}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle
\;=\; \max_{\substack{x^{(i)}, y^{(j)} \in \mathbb{R}^{m+n}: \\ \|x^{(i)}\|_2, \|y^{(j)}\|_2 = 1}} \sum_{i,j} A_{ij} \langle x^{(i)}, y^{(j)} \rangle
\;\le\; K \|A\|_{\infty \to 1} .
$$

The equality between the maximum over the unit ball and the maximum over unit vectors holds since the maximum value of the semidefinite program is achieved on the boundary (i.e. we could have relaxed the constraint on $x^{(i)}, y^{(j)}$ to $\|x^{(i)}\|_2, \|y^{(j)}\|_2 \le 1$). As $\mu$ and $\rho$ have the same first two moments, the pseudo-expectation of $\sum_{i,j} A_{ij} x_i y_j$ under $\mu$ is the same as the expectation of $\sum_{i,j} A_{ij} u_i v_j$ under $\rho$, so we would have the desired bound. Of course, this is rather imprecise, since sampling from $\rho$ does not exactly correspond to sampling from $\{x \in \mathbb{R}^{m+n} : \|x\|_2 \le 1\}$; hence the first step is only "$\approx$". But hopefully this lends some intuition for why the theorem could be true. In the remainder of this section, we prove it formally.
We first prove a modified version of Grothendieck’s Identity.

Lemma. Let $(u, v) \in \mathbb{R}^2$ be jointly normal with
$$(u, v) \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right),$$
where the second argument is the covariance matrix. Notice that, by Cauchy-Schwarz, $-1 \le \rho \le 1$. Then
$$\mathbb{E}[\mathrm{sign}(u)\,\mathrm{sign}(v)] = \frac{2}{\pi} \arcsin(\mathbb{E}[uv]) = \frac{2}{\pi} \arcsin(\rho) .$$
Proof. We may assume $\rho \ge 0$; the case $\rho < 0$ follows by replacing $u$ with $-u$, since both sides are odd in $\rho$. We may write $v = \rho u + \sqrt{1 - \rho^2}\, w$, where $w \sim N(0,1)$ is independent of $u$. (This is because $\mathbb{E}[u(\rho u + \sqrt{1-\rho^2}\, w)] = \mathbb{E}[\rho u^2 + \sqrt{1-\rho^2}\, u w] = \rho$, as $\mathbb{E}[u^2] = 1$ and, $u$ and $w$ being independent, $\mathbb{E}[uw] = \mathbb{E}[u]\mathbb{E}[w] = 0$.) It follows that $\mathbb{E}[\mathrm{sign}(u)\,\mathrm{sign}(v)] = 1 - 2\Pr_{u,w}[\mathrm{sign}(u) \ne \mathrm{sign}(\rho u + \sqrt{1-\rho^2}\, w)]$. We observe that $\mathrm{sign}(u) \ne \mathrm{sign}(v)$ if and only if $\mathrm{sign}(u) \ne \mathrm{sign}(w)$ and $|\rho u| < \sqrt{1-\rho^2}\,|w|$ (equivalently, $\rho^2 < \frac{w^2}{u^2 + w^2}$, except when $u = w = 0$).

Figure 3: Geometric depiction of $U$, $U + W$, and the angle $\theta$ between $W$ and $U + W$, on the circle $\|U + W\| = r$; the depicted event is $\theta < \arcsin(\sqrt{1-\rho^2})$.

To compute the probability of this, we interpret it geometrically. First, as $u$ and $w$ are independent, $u$ and $w$ may be viewed as vectors $U = (u, 0)$, $W = (0, w)$ in $\mathbb{R}^2$, respectively. In this view, $\frac{w^2}{u^2+w^2} = (\cos\theta)^2$, where $\theta$ is the angle between $W$ and $U + W$. As $u, w \sim N(0,1)$ are independent, for any $r > 0$, conditioned on $u^2 + w^2 = r^2$ the point $(u, w)$ is distributed uniformly on the circle of radius $r$ in $\mathbb{R}^2$. Hence, $\Pr[\mathrm{sign}(u) \ne \mathrm{sign}(v) \mid u^2 + w^2 = r^2]$ is the proportion of the circle $\|U + W\| = r$ on which $\mathrm{sign}(u) \ne \mathrm{sign}(w)$ and $(\cos\theta)^2 > \rho^2$, equivalently $(\sin\theta)^2 < 1 - \rho^2$, where $\theta$ is the angle between $W$ and $U + W$. This is summarized in Figure 3. We may compute this proportion to be $\frac{2 \arcsin(\sqrt{1-\rho^2})\, r}{2\pi r} = \frac{1}{\pi}\arcsin(\sqrt{1-\rho^2})$, which does not depend on $r$. Averaging over $r$, this gives us $\Pr[\mathrm{sign}(u) \ne \mathrm{sign}(v)] = \frac{1}{\pi}\arcsin(\sqrt{1-\rho^2})$. It follows that
$$\mathbb{E}[\mathrm{sign}(u)\,\mathrm{sign}(v)] = 1 - \frac{2}{\pi}\arcsin(\sqrt{1-\rho^2}) = 1 - \frac{2}{\pi}\arccos(\rho) = 1 - \frac{2}{\pi}\Big(\frac{\pi}{2} - \arcsin(\rho)\Big) = \frac{2}{\pi}\arcsin(\rho) .$$
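As with Grothendieck's Identity, this lemma can be sanity-checked by simulation. A quick Monte Carlo sketch, assuming numpy (the correlation value, seed, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.3
cov = np.array([[1.0, rho],
                [rho, 1.0]])

# sample (u, v) jointly normal with mean 0 and the covariance matrix above
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500_000)
u, v = samples[:, 0], samples[:, 1]

lhs = np.mean(np.sign(u) * np.sign(v))
rhs = (2 / np.pi) * np.arcsin(rho)
print(lhs, rhs)  # agree up to Monte Carlo error
```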
The preceding lemma implies that if $(u_1, \ldots, u_m, v_1, \ldots, v_n) \in \mathbb{R}^{m+n}$ is jointly normal with $\mathbb{E}[u_i] = \mathbb{E}[v_j] = 0$ and $\mathbb{E}[u_i^2] = \mathbb{E}[v_j^2] = 1$, then, for all $i, j$,
$$\mathbb{E}[\mathrm{sign}(u_i)\,\mathrm{sign}(v_j)] = \frac{2}{\pi} \arcsin(\mathbb{E}[u_i v_j]) .$$
To carry out the same calculation we did in the preceding section to prove Grothendieck's Inequality, we need
$$\arcsin(\mathbb{E}[u_i v_j]) = \ln(1 + \sqrt{2}) \, \tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j] ,$$
which should look familiar from Krivine's lemma in the previous section. Put another way, we need to pick the covariances $\mathbb{E}[u_i v_j]$ according to $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j]$. The next lemma proves that this is possible:
The next lemma proves exactly this:
Lemma. There exists a jointly normal $(u_1, \ldots, u_m, v_1, \ldots, v_n) \in \mathbb{R}^{m+n}$ such that, for all $i \in [m]$, $j \in [n]$,

• $\mathbb{E}[u_i] = \mathbb{E}[v_j] = 0$,

• $\mathbb{E}[u_i^2] = \mathbb{E}[v_j^2] = 1$, and

• $\arcsin(\mathbb{E}[u_i v_j]) = \ln(1 + \sqrt{2}) \, \tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j]$.

Proof. Let $\Sigma \in \mathbb{R}^{(m+n)\times(m+n)}$ be the matrix defined by $\Sigma_{ij} = \tilde{\mathbb{E}}_{\mu(w)}[w_i w_j]$, where $w = (x, y)$. In other words, $\Sigma$ is the "formal" covariance matrix of $(x, y)$ (here is where we use the assumption that $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i] = \tilde{\mathbb{E}}_{\mu(x,y)}[y_j] = 0$). By the Quadratic Sampling Lemma, $\Sigma$ is positive semidefinite and symmetric, i.e. it actually defines a covariance matrix. Moreover, since $(x, y) \in \{-1,1\}^{m+n}$ and $\tilde{\mathbb{E}}_{\mu(x,y)}[1] = 1$, we have $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i^2] = \tilde{\mathbb{E}}_{\mu(x,y)}[1] = 1$ and, similarly, $\tilde{\mathbb{E}}_{\mu(x,y)}[y_j^2] = 1$.
If we extend the pseudo-expectation function so that it applies to matrices (in terms of $x, y$) entry-wise, i.e. $(\tilde{\mathbb{E}}_{\mu(x,y)}[B])_{ij} = \tilde{\mathbb{E}}_{\mu(x,y)}[B_{ij}]$, then we may view $\Sigma$ more compactly as
$$\Sigma = \begin{pmatrix} \tilde{\mathbb{E}}_{\mu(x,y)}[x x^t] & \tilde{\mathbb{E}}_{\mu(x,y)}[x y^t] \\ \tilde{\mathbb{E}}_{\mu(x,y)}[y x^t] & \tilde{\mathbb{E}}_{\mu(x,y)}[y y^t] \end{pmatrix} .$$

Consider the matrix $\Sigma' \in \mathbb{R}^{(m+n)\times(m+n)}$ defined as follows:
$$\Sigma' = \begin{pmatrix} \sinh \circ \big(\ln(1+\sqrt{2})\, \tilde{\mathbb{E}}_{\mu(x,y)}[x x^t]\big) & \sin \circ \big(\ln(1+\sqrt{2})\, \tilde{\mathbb{E}}_{\mu(x,y)}[x y^t]\big) \\ \sin \circ \big(\ln(1+\sqrt{2})\, \tilde{\mathbb{E}}_{\mu(x,y)}[y x^t]\big) & \sinh \circ \big(\ln(1+\sqrt{2})\, \tilde{\mathbb{E}}_{\mu(x,y)}[y y^t]\big) \end{pmatrix} ,$$
where $\sinh$ and $\sin$ are applied entry-wise to each submatrix. Since $\Sigma$ is positive semidefinite and symmetric, it can be shown that the same holds for $\Sigma'$, i.e. $\Sigma'$ also defines a covariance matrix. Moreover, since $\sinh(\ln(1+\sqrt{2})) = 1$ and $\tilde{\mathbb{E}}_{\mu(x,y)}[x_i^2] = \tilde{\mathbb{E}}_{\mu(x,y)}[y_j^2] = 1$, the diagonal entries of $\Sigma'$ are all $1$. It follows that we can pick $(u, v)$ to be jointly normal with $\mathbb{E}[u_i] = \mathbb{E}[v_j] = 0$ and covariance matrix $\Sigma'$.
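The claim that $\Sigma'$ is again positive semidefinite with unit diagonal is easy to probe numerically. The sketch below applies the entry-wise transform to a random unit-diagonal PSD matrix standing in for $\Sigma$ (the dimensions and seed are arbitrary; this is a sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4
d = m + n
c = np.log(1 + np.sqrt(2))

# random PSD matrix with unit diagonal, standing in for the pseudo-moment matrix Sigma
B = rng.normal(size=(d, d))
S = B @ B.T
S /= np.sqrt(np.outer(np.diag(S), np.diag(S)))  # rescale to unit diagonal

# entry-wise transform: sinh on the diagonal blocks, sin on the off-diagonal blocks
Sp = np.empty_like(S)
Sp[:m, :m] = np.sinh(c * S[:m, :m])
Sp[m:, m:] = np.sinh(c * S[m:, m:])
Sp[:m, m:] = np.sin(c * S[:m, m:])
Sp[m:, :m] = np.sin(c * S[m:, :m])

print(np.diag(Sp))                            # all ones, since sinh(c) = 1
print(np.linalg.eigvalsh(Sp).min() >= -1e-9)  # Sigma' is (numerically) PSD
```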
Putting this all together, we have that:
$$
\|A\|_{\infty \to 1} \ge \mathbb{E}_{(u,v)}\Big[\sum_{i,j} A_{ij}\, \mathrm{sign}(u_i)\, \mathrm{sign}(v_j)\Big]
= \sum_{i,j} A_{ij}\, \mathbb{E}_{(u,v)}[\mathrm{sign}(u_i)\, \mathrm{sign}(v_j)]
= \frac{2}{\pi} \sum_{i,j} A_{ij} \arcsin(\mathbb{E}_{(u,v)}[u_i v_j])
= \frac{1}{K} \sum_{i,j} A_{ij}\, \tilde{\mathbb{E}}_{\mu(x,y)}[x_i y_j] .
$$

The SOS theorem follows.
