Computational Complexity:
A Modern Approach
Sanjeev Arora and Boaz Barak
Princeton University
http://www.cs.princeton.edu/theory/complexity/
complexitybook@gmail.com
Appendix A
Mathematical Background.
This appendix reviews the mathematical notions used in this book. However, most of these are only used in a few places, and so the reader might want to only quickly review Sections A.1 and A.2, and come back to the other sections as needed. In particular, apart from probability, the first part of the book essentially requires only comfort with mathematical proofs and some very basic notions of discrete math.
The topics described in this appendix are covered in greater depth in many texts and
online sources. Almost all of the mathematical background needed is covered in a good
undergraduate discrete math for computer science course as currently taught at many
computer science departments. Some good sources for this material are the lecture notes
by Papadimitriou and Vazirani [PV06], and the book of Rosen [Ros06].
The mathematical tool we use most often is discrete probability. Alon and Spencer
[AS00b] is a great resource in this area. Also, the books of Mitzenmacher and Upfal [MU05]
and Motwani and Raghavan [MR95] cover probability from a more algorithmic perspective.
Although knowledge of algorithms is not strictly necessary for this book, it would be quite useful. It would be helpful to review either one of the two recent books by Dasgupta et al. [DPV06] and Kleinberg and Tardos [KT06], or the earlier text by Cormen et al. [CLRS01]. This book does not require prior knowledge of computability and automata theory, but some basic familiarity with that theory could be useful: see Sipser's book [Sip96] for an excellent introduction. See Shoup's book [Sho05] for a computer-science introduction to algebra and number theory.
Perhaps the main mathematical prerequisite needed for this book is a certain level of comfort with mathematical proofs. The fact that a mathematical proof has to be absolutely convincing does not mean that it has to be overly formal and tedious. It just has to be clearly written, and contain no logical gaps. When you write proofs, try to be clear and concise, rather than using too much formal notation. Of course, to be absolutely convinced that some statement is true, we need to be certain of what that statement means. This is why there is a special emphasis in mathematics (and this book) on very precise definitions. Whenever you read a definition, try to make sure you completely understand it, perhaps by working through some simple examples. Oftentimes, understanding the meaning of a mathematical statement is more than half the work of proving that it is true.
A.1 Sets, functions, pairs, strings, graphs, logic

Strings. We denote by $\{0,1\}^n$ the set of binary strings of length $n$, and by $\{0,1\}^*$ the set of all binary strings: $\{0,1\}^* = \bigcup_{n \geq 0} \{0,1\}^n$ ($\{0,1\}^0$ has a single element: a binary string of length zero, which we call the empty word and denote by $\varepsilon$). As mentioned in Section 0.1, we can represent various objects (numbers, graphs, matrices, etc.) as binary strings, and use $\llcorner x \lrcorner$ (not to be confused with the floor operator $\lfloor x \rfloor$) to denote the representation of $x$. Moreover, we often drop the $\llcorner\,\lrcorner$ symbols and use $x$ to denote both the object and its representation.
Graphs. A graph $G$ consists of a set $V$ of vertices (which we often assume is equal to the set $[n] = \{1, \ldots, n\}$ for some $n \in \mathbb{N}$) and a set $E$ of edges, which consists of unordered pairs (i.e., size-two subsets) of elements in $V$. We denote the edge $\{u,v\}$ of the graph by $\overline{u\,v}$. For $v \in V$, the neighbors of $v$ are all the vertices $u \in V$ such that $\overline{u\,v} \in E$. In a directed graph, the edges consist of ordered pairs of vertices, and to stress this we sometimes denote the edge $\langle u,v \rangle$ in a directed graph by $\overrightarrow{u\,v}$. One can represent an $n$-vertex graph $G$ by its adjacency matrix, which is an $n \times n$ matrix $A$ such that $A_{i,j}$ is equal to 1 if the edge $\overline{i\,j}$ is present in $G$ and is equal to 0 otherwise. One can think of an undirected graph as a directed graph $G$ that satisfies, for every $u, v$, that $G$ contains the edge $\overrightarrow{u\,v}$ if and only if it contains the edge $\overrightarrow{v\,u}$. Hence, one can represent an undirected graph by an adjacency matrix that is symmetric ($A_{i,j} = A_{j,i}$ for every $i,j \in [n]$).
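To make the representation concrete, here is a minimal Python sketch (ours, not from the book; the function name is illustrative) that builds the adjacency matrix of an undirected graph on $[n]$:

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix of an undirected graph.

    `edges` is a collection of pairs (u, v) with 1 <= u, v <= n.
    """
    A = [[0] * n for _ in range(n)]
    for (u, v) in edges:
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1  # undirected edge appears in both directions, so A is symmetric
    return A

# Example: the triangle graph on [3].
A = adjacency_matrix(3, [(1, 2), (2, 3), (1, 3)])
assert A == [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```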
Boolean operators. A Boolean variable is a variable that can be either True or False (we sometimes identify True with 1 and False with 0). We can combine variables via the logical operations AND ($\wedge$), OR ($\vee$) and NOT ($\neg$, sometimes also denoted by an overline), to obtain Boolean formulas. For example, the following is a Boolean formula on the variables $u_1, u_2, u_3$: $(u_1 \wedge u_2) \vee (u_3 \wedge u_1)$. The definitions of the operations are the usual ones: $a \wedge b =$ True if $a =$ True and $b =$ True, and is equal to False otherwise; $\neg a = \overline{a} =$ True if $a =$ False, and is equal to False otherwise; $a \vee b = \neg(\neg a \wedge \neg b)$. We sometimes use other Boolean operators such as the XOR ($\oplus$) operator, but they can always be replaced with an equivalent expression using $\wedge, \vee, \neg$ (e.g., $a \oplus b = (a \wedge \neg b) \vee (\neg a \wedge b)$). If $\varphi$ is a formula in $n$ variables $u_1, \ldots, u_n$, then for any assignment of values $u \in \{\text{False},\text{True}\}^n$ (or equivalently, $\{0,1\}^n$), we denote by $\varphi(u)$ the value of $\varphi$ when its variables are assigned the values in $u$. We say that $\varphi$ is satisfiable if there exists a $u$ such that $\varphi(u) =$ True.
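As a small illustration, here is a minimal Python sketch (ours, not from the book) that evaluates the example formula above and decides satisfiability by brute force over all $2^n$ assignments:

```python
from itertools import product

def phi(u1, u2, u3):
    # The example formula (u1 AND u2) OR (u3 AND u1).
    return (u1 and u2) or (u3 and u1)

def is_satisfiable(formula, n):
    # Try every assignment in {False, True}^n.
    return any(formula(*assignment) for assignment in product([False, True], repeat=n))

assert phi(True, True, False)   # a satisfying assignment
assert is_satisfiable(phi, 3)
```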
Quantifiers. We will often use the quantifiers $\forall$ (for all) and $\exists$ (exists). That is, if $\varphi$ is a condition that can be True or False depending on the value of a variable $x$, then we write $\forall_x\, \varphi(x)$ to denote the statement that $\varphi$ is True for every possible value that can be assigned to $x$. If $A$ is a set then we write $\forall_{x \in A}\, \varphi(x)$ to denote the statement that $\varphi$ is True for every assignment for $x$ from the set $A$. The quantifier $\exists$ is defined similarly. Formally, we say that $\exists_x\, \varphi(x)$ holds if and only if $\neg(\forall_x\, \neg\varphi(x))$ holds.
Big-Oh Notation. We will often use the big-Oh notation (i.e., $O, \Omega, \Theta, o, \omega$) as defined in Section 0.3.
A.2 Probability theory
A finite probability space is a finite set $\Omega = \{\omega_1, \ldots, \omega_N\}$ along with a set of numbers $p_1, \ldots, p_N \in [0,1]$ such that $\sum_{i=1}^N p_i = 1$. A random element is selected from this space by choosing $\omega_i$ with probability $p_i$. If $x$ is chosen from the sample space $\Omega$ then we denote this by $x \in_R \Omega$. If no distribution is specified then we use the uniform distribution over the elements of $\Omega$ (i.e., $p_i = \frac{1}{N}$ for every $i$).
An event over the space $\Omega$ is a subset $A \subseteq \Omega$, and the probability that $A$ occurs, denoted by $\Pr[A]$, is equal to $\sum_{i : \omega_i \in A} p_i$. To give an example, the probability space could be that of all $2^n$ possible outcomes of $n$ tosses of a fair coin (i.e., $\Omega = \{0,1\}^n$ and $p_i = 2^{-n}$ for every $i \in [2^n]$), and the event $A$ can be that the number of coins that come up heads (or, equivalently, 1) is even. In this case, $\Pr[A] = 1/2$ (exercise). The following simple bound, called the union bound, is often used in the book. For every set of events $A_1, A_2, \ldots, A_n$,
$$\Pr[\cup_{i=1}^n A_i] \leq \sum_{i=1}^n \Pr[A_i]. \qquad (1)$$
Inclusion-exclusion principle. The union bound is a special case of a more general principle. Indeed, note that if the sets $A_1, \ldots, A_n$ are not disjoint, then the probability of $\cup_i A_i$ could be smaller than $\sum_i \Pr[A_i]$, since we are overcounting elements that appear in more than one set. We can correct this by subtracting $\sum_{i<j} \Pr[A_i \cap A_j]$, but then we might be undercounting, since we subtracted elements that appear in at least 3 sets too many times. Continuing this process we get

Claim A.1 (Inclusion-exclusion principle) For every $A_1, \ldots, A_n$,
$$\Pr[\cup_{i=1}^n A_i] = \sum_{i=1}^n \Pr[A_i] - \sum_{1 \leq i < j \leq n} \Pr[A_i \cap A_j] + \cdots + (-1)^{n-1}\Pr[A_1 \cap \cdots \cap A_n].$$
Moreover, this is an alternating sum, which means that if we take only the first $k$ summands of the right-hand side, then this upper bounds the left-hand side if $k$ is odd, and lower bounds it if $k$ is even.
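To make this concrete, here is a minimal Python check (ours, not from the book) that evaluates both sides of Claim A.1 exactly for three small random events over a uniform sample space, and also confirms the union bound:

```python
from itertools import combinations
import random

random.seed(0)
N = 20
omega = range(N)
events = [set(random.sample(omega, 5)) for _ in range(3)]  # events A1, A2, A3

def pr(A):  # probability of an event under the uniform distribution
    return len(A) / N

lhs = pr(set.union(*events))
rhs = 0.0
for k in range(1, len(events) + 1):
    for subset in combinations(events, k):
        rhs += (-1) ** (k + 1) * pr(set.intersection(*subset))

assert abs(lhs - rhs) < 1e-9              # inclusion-exclusion is exact
assert lhs <= sum(pr(A) for A in events)  # the union bound
```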
We sometimes use the following corollary of this claim, known as the Bonferroni inequality:

Corollary A.2 For all events $A_1, \ldots, A_n$,
$$\Pr[\cup_{i=1}^n A_i] \geq \sum_{i=1}^n \Pr[A_i] - \sum_{1 \leq i < j \leq n} \Pr[A_i \cap A_j].$$

A.2.1 Random variables and expectations
If we pick $k$ elements uniformly and independently from $[n]$, the probability that some element is picked twice (a collision) becomes at least $1/2$ once $k$ is larger than roughly $\sqrt{2n}$. This fact is often known as the birthday paradox because it explains the seemingly strange phenomenon that a class of more than 27 or so students is quite likely to have a pair of students sharing the same birthday, even though there are 365 days in the year. Note that in contrast, if $k \ll \sqrt{n}$ then by the union bound, the probability that there will be even one collision is at most $k^2/n \ll 1$.
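A quick simulation makes the threshold visible; this is a minimal sketch of ours (not from the book), with illustrative names and parameters:

```python
import random

def collision_probability(k, n, trials=100_000, seed=0):
    """Estimate the probability of a collision among k uniform draws from [n]."""
    rng = random.Random(seed)
    hits = sum(
        len(set(rng.randrange(n) for _ in range(k))) < k  # True iff some collision
        for _ in range(trials)
    )
    return hits / trials

# With n = 365 days, k = 27 students already collide more often than not,
# matching the sqrt(2n) ~ 27 threshold discussed above.
print(collision_probability(27, 365))  # roughly 0.62
```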
Notes: (1) We sometimes also consider random variables whose range is not $\mathbb{R}$, but other sets such as $\mathbb{C}$ or $\{0,1\}^n$. (2) Also, we often identify a random variable $X$ over the sample space $\Omega$ with the distribution $X(\omega)$ for $\omega \in_R \Omega$. For example, we may use both $\Pr_{x \in_R X}[x^2 = 1]$ and $\Pr[X^2 = 1]$ to denote the probability that, for $\omega \in_R \Omega$, $X(\omega)^2 = 1$.
A.2.2 The averaging argument
Can we give any meaningful upper bound on the probability that X is much smaller
than its expectation? Yes, if X is bounded.
Lemma A.8 If $a_1, a_2, \ldots, a_n$ are numbers in the interval $[0,1]$ whose average is $\rho$, then at least a $\rho/2$ fraction of the $a_i$'s are at least as large as $\rho/2$.
Proof: Let $\alpha$ be the fraction of $i$'s such that $a_i \geq \rho/2$. Then the average of the $a_i$'s is bounded by $\alpha \cdot 1 + (1-\alpha)\rho/2$. Hence $\rho \leq \alpha + \rho/2$, implying $\alpha \geq \rho/2$. ∎
More generally, we have

Lemma A.9 If $X \in [0,1]$ and $E[X] = \mu$, then for any $c < 1$ we have
$$\Pr[X \leq c\mu] \leq \frac{1-\mu}{1-c\mu}.$$
Example A.10
Suppose you took a lot of exams, each scored from 1 to 100. If your average
score was 90 then in at least half the exams you scored at least 80.
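A numeric sanity check of Lemma A.9 on simulated exam scores in $[0,1]$; this is a minimal sketch of ours (not from the book), with illustrative parameters:

```python
import random

rng = random.Random(1)
scores = [min(1.0, rng.uniform(0.75, 1.05)) for _ in range(1000)]
mu = sum(scores) / len(scores)

c = 0.8 / mu  # threshold 0.8 written as c * mu, with c < 1 here
frac_below = sum(s <= 0.8 for s in scores) / len(scores)
bound = (1 - mu) / (1 - c * mu)

assert frac_below <= bound + 1e-9  # Pr[X <= c*mu] <= (1 - mu)/(1 - c*mu)
```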
A.2.3 Conditional probability and independence

We say that events $A_1, \ldots, A_n$ are mutually independent if for every subset $S \subseteq [n]$,
$$\Pr[\cap_{i \in S} A_i] = \prod_{i \in S} \Pr[A_i]. \qquad (2)$$
We say that $A_1, \ldots, A_n$ are $k$-wise independent if (2) holds for every $S \subseteq [n]$ with $|S| \leq k$. We say that two random variables $X, Y$ are independent if for every $x, y \in \mathbb{R}$, the events $\{X = x\}$ and $\{Y = y\}$ are independent. We generalize similarly the definition of mutual independence and $k$-wise independence to sets of random variables $X_1, \ldots, X_n$. We have the following claim:
Claim A.11 If $X_1, \ldots, X_n$ are mutually independent then
$$E[X_1 \cdots X_n] = \prod_{i=1}^n E[X_i].$$

Proof:
$$E[X_1 \cdots X_n] = \sum_x x \Pr[X_1 \cdots X_n = x] = \sum_{x_1, \ldots, x_n} x_1 \cdots x_n \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n]$$
$$= \Big(\sum_{x_1} x_1 \Pr[X_1 = x_1]\Big)\Big(\sum_{x_2} x_2 \Pr[X_2 = x_2]\Big) \cdots \Big(\sum_{x_n} x_n \Pr[X_n = x_n]\Big) = \prod_{i=1}^n E[X_i],$$
where the sums above are over all the possible real numbers that can be obtained by applying the random variables or their products to the finite set $\Omega$. ∎
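As a quick empirical illustration (a minimal sketch of ours, not from the book), the empirical average of a product of independent samples approaches the product of the averages:

```python
import random

rng = random.Random(2)
n = 200_000
xs = [rng.random() for _ in range(n)]  # X1 uniform on [0, 1]
ys = [rng.random() for _ in range(n)]  # X2 uniform on [0, 1], independent of X1

e_prod = sum(x * y for x, y in zip(xs, ys)) / n
prod_e = (sum(xs) / n) * (sum(ys) / n)
assert abs(e_prod - prod_e) < 0.01  # equal up to sampling error
```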
A.2.4 Deviation upper bounds

Recall Markov's inequality: for every non-negative random variable $X$ and $k > 0$, $\Pr[X \geq k\,E[X]] \leq \frac{1}{k}$. Chebyshev's inequality bounds the deviation of $X$ from its expectation in terms of its standard deviation $\sigma$ (where $\sigma^2 = \mathrm{Var}(X) = E[(X - E[X])^2]$): for every $k > 0$,
$$\Pr\big[|X - E[X]| \geq k\sigma\big] \leq \frac{1}{k^2}.$$

Proof: Apply Markov's inequality to the random variable $(X - E[X])^2$, noting that by the definition of variance, $E[(X - E[X])^2] = \sigma^2$. ∎
Chebyshev's inequality is often useful in the case that $X$ is equal to $\sum_{i=1}^n X_i$ for pairwise independent random variables $X_1, \ldots, X_n$. This is because of the following claim, which is left as an exercise:

Claim A.13 If $X_1, \ldots, X_n$ are pairwise independent then
$$\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n \mathrm{Var}(X_i).$$
The next inequality has many names, and is widely known in theoretical computer science as the Chernoff bound (see also Note 7.11). It considers scenarios of the following type. Suppose we toss a fair coin $n$ times. The expected number of heads is $n/2$. How tightly is this number concentrated? Should we be very surprised if after 1000 tosses we have 625 heads? The bound we present is slightly more general, since it concerns $n$ different coin tosses of possibly different expectations (the expectation of a coin is the probability of obtaining heads; for a fair coin this is $1/2$). These are sometimes known as Poisson trials.
Theorem A.14 (Chernoff bounds) Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables over $\{0,1\}$ (i.e., $X_i$ can be either 0 or 1) and let $\mu = \sum_{i=1}^n E[X_i]$. Then for every $\delta > 0$,
$$\Pr\Big[\sum_{i=1}^n X_i \geq (1+\delta)\mu\Big] \leq \left(\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right)^\mu, \qquad (3)$$
$$\Pr\Big[\sum_{i=1}^n X_i \leq (1-\delta)\mu\Big] \leq \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^\mu. \qquad (4)$$
In particular, when $\delta$ is a fixed constant $c$, this probability is bounded by $2^{-\Omega(\mu)}$ (where the constant in the $\Omega$ notation depends on $c$).
Proof: Surprisingly, the Chernoff bound is also proved using the Markov inequality. We only prove the first inequality; the second inequality can be proved similarly. Let $X = \sum_i X_i$ and $p_i = \Pr[X_i = 1] = E[X_i]$. We introduce a positive dummy variable $t$, and observe that
$$E[\exp(tX)] = E\Big[\exp\Big(t\sum_i X_i\Big)\Big] = E\Big[\prod_i \exp(tX_i)\Big] = \prod_i E[\exp(tX_i)], \qquad (5)$$
where $\exp(z)$ denotes $e^z$ and the last equality holds because the $X_i$'s are independent. Now,
$$E[\exp(tX_i)] = (1 - p_i) + p_i e^t,$$
and therefore, using $1 + x \leq e^x$,
$$\prod_i E[\exp(tX_i)] = \prod_i \big[1 + p_i(e^t - 1)\big] \leq \prod_i \exp\big(p_i(e^t - 1)\big) = \exp\Big(\sum_i p_i(e^t - 1)\Big) = \exp\big(\mu(e^t - 1)\big). \qquad (6)$$
Applying Markov's inequality to the random variable $\exp(tX)$,
$$\Pr\big[X \geq (1+\delta)\mu\big] = \Pr\big[\exp(tX) \geq \exp(t(1+\delta)\mu)\big] \leq \frac{E[\exp(tX)]}{\exp(t(1+\delta)\mu)} \leq \frac{\exp(\mu(e^t - 1))}{\exp(t(1+\delta)\mu)},$$
using (5), (6) and the fact that $t$ is positive. Since $t$ is a dummy variable, we can choose any positive value we like for it. Simple calculus shows that the right-hand side is minimized for $t = \ln(1+\delta)$, and this leads to the theorem statement. ∎
So, if all $n$ coin tosses are fair (heads has probability $1/2$), then the probability of seeing $N$ heads where $|N - n/2| > a\sqrt{n}$ is at most $2e^{-a^2/4}$. In particular, the chance of seeing at least 625 heads in 1000 tosses of an unbiased coin is less than $5.3 \times 10^{-7}$.
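The $5.3 \times 10^{-7}$ figure can be recomputed directly from bound (3); the following minimal Python check (ours, not from the book) also compares it against the exact binomial tail:

```python
from math import comb, exp, log

n, mu, delta = 1000, 500.0, 0.25  # 625 = (1 + 0.25) * 500

# Bound (3): (e^delta / (1+delta)^(1+delta))^mu, computed in log space.
chernoff = exp(mu * (delta - (1 + delta) * log(1 + delta)))
exact = sum(comb(n, k) for k in range(625, n + 1)) / 2**n

print(chernoff)  # about 5.3e-07, the figure quoted above
print(exact)     # far smaller still; the bound is not tight here
```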
A.2.5 Some other inequalities

We will often use the following estimates for binomial coefficients: for every $n \geq k \geq 1$,
$$\Big(\frac{n}{k}\Big)^k \leq \binom{n}{k} \leq \Big(\frac{ne}{k}\Big)^k.$$
A finer estimate of the factorial is given by Stirling's formula:
$$\sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n e^{\frac{1}{12n}}.$$
Together these imply that for every $\alpha \in (0,1)$,
$$\binom{n}{\alpha n} = 2^{H(\alpha)n \pm O(\log n)},$$
where $H(\alpha) = \alpha\log(1/\alpha) + (1-\alpha)\log(1/(1-\alpha))$ and the constants hidden in the $O$ notation are independent of both $n$ and $\alpha$.
We will also use the following estimates for sums: for all constants $k \geq 1$, $c \geq 0$ and $\epsilon > 0$,
$$\sum_{i=1}^n i^k = \frac{n^{k+1}}{k+1} \pm O(n^k), \qquad \sum_{i=1}^n \frac{i^c}{(1+\epsilon)^i} < O(1), \qquad \sum_{i=1}^n \frac{1}{i} = \ln n \pm O(1).$$

A.2.6 Statistical distance
The following notion of when two distributions are close to one another is often very useful.

Definition A.20 (Statistical distance) Let $\Omega$ be some finite set. For two random variables $X$ and $Y$ with range $\Omega$, their statistical distance (also known as variation distance) is defined as $\Delta(X,Y) = \max_{S \subseteq \Omega}\{|\Pr[X \in S] - \Pr[Y \in S]|\}$.
Some texts use the name total variation distance for the statistical distance. The next lemma gives some useful properties of this distance:

Lemma A.21 Let $X, Y, Z$ be any three distributions taking values in the finite set $\Omega$. Then,
1. $\Delta(X,Y) \in [0,1]$, where $\Delta(X,Y) = 0$ iff $X$ is identical to $Y$.
2. (Triangle inequality) $\Delta(X,Z) \leq \Delta(X,Y) + \Delta(Y,Z)$.
3. $\Delta(X,Y) = \frac{1}{2}\sum_{x \in \Omega} |\Pr[X = x] - \Pr[Y = x]|$.
4. $\Delta(X,Y) \geq \epsilon$ iff there is a Boolean function $f : \Omega \to \{0,1\}$ such that $|E[f(X)] - E[f(Y)]| \geq \epsilon$.
5. For every finite set $\Omega'$ and function $f : \Omega \to \Omega'$, $\Delta(f(X), f(Y)) \leq \Delta(X,Y)$. (Here $f(X)$ is a distribution on $\Omega'$ obtained by taking a sample of $X$ and applying $f$.)

Note that Item 3 means that $\Delta(X,Y)$ is equal to the $L_1$-distance of $X$ and $Y$ divided by 2.
Proof of Lemma A.21: We start with Item 3. For every pair of distributions $X, Y$ over $\Omega$, let $S$ be the set of elements $x$ such that $\Pr[X = x] > \Pr[Y = x]$. It is easy to see that this choice of $S$ maximizes the quantity $b(S) = \Pr[X \in S] - \Pr[Y \in S]$, and in fact $b(S) = \Delta(X,Y)$, since if we had a set $T$ with $b(T) < -b(S)$ then the complement $\overline{T}$ of $T$ would satisfy $b(\overline{T}) > b(S)$. But
$$2b(S) = \sum_{x \in S}\big(\Pr[X = x] - \Pr[Y = x]\big) + \sum_{x \notin S}\big(\Pr[Y = x] - \Pr[X = x]\big) = \sum_{x \in \Omega}\big|\Pr[X = x] - \Pr[Y = x]\big|,$$
establishing Item 3.

The triangle inequality (Item 2) follows immediately from Item 3, since $\Delta(X,Y) = \frac{1}{2}|X - Y|_1$ and the $L_1$ norm satisfies the triangle inequality. Item 3 also implies Item 1, since $|X - Y|_1 = 0$ iff $X = Y$ and $|X - Y|_1 \leq |X|_1 + |Y|_1 = 1 + 1 = 2$.

Item 4 is just a rephrasing of the definition of statistical distance, identifying a set $S \subseteq \Omega$ with the function $f : \Omega \to \{0,1\}$ such that $f(x) = 1$ iff $x \in S$. Item 5 follows from Item 4, noting that if $\Delta(f(X), f(Y)) \geq \epsilon$ then there is a Boolean function $g$ with $|E[g(f(X))] - E[g(f(Y))]| \geq \epsilon$, and hence the Boolean function $g \circ f$ witnesses $\Delta(X,Y) \geq \epsilon$. ∎
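Item 3 gives a direct way to compute statistical distance; here is a minimal Python sketch (ours, not from the book), with distributions given as dictionaries mapping outcomes to probabilities:

```python
def statistical_distance(p, q):
    """Half the L1 distance between two distributions (Item 3 of Lemma A.21)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

fair = {0: 0.5, 1: 0.5}
biased = {0: 0.4, 1: 0.6}
assert abs(statistical_distance(fair, biased) - 0.1) < 1e-12
```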
A.3 Number theory and groups

For a number $n$, let $\pi(n)$ denote the number of primes smaller or equal to $n$. The Prime Number Theorem states that
$$\pi(n) = (1 \pm o(1))\frac{n}{\ln n}.$$
The original proofs of the prime number theorem used rather deep mathematical tools, and in fact people have conjectured that this is inherently the case. But in 1949 both Erdős and Selberg (independently) found elementary proofs for this theorem. For most computer science applications, the following weaker statement proven by Chebyshev suffices:

Theorem A.23 $\pi(n) = \Theta\big(\frac{n}{\log n}\big)$.
Proof: Consider the number $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$. By Stirling's formula we know that $\log\binom{2n}{n} = (1 - o(1))2n$, and in particular $n \leq \log\binom{2n}{n} \leq 2n$. Also, all the prime factors of $\binom{2n}{n}$ are between 0 and $2n$, and each factor $p$ cannot appear more than $k = \lfloor\frac{\log 2n}{\log p}\rfloor$ times. Indeed, for every $n$, the number of times $p$ appears in the factorization of $n!$ is $\sum_i \lfloor\frac{n}{p^i}\rfloor$, since we get $\lfloor\frac{n}{p}\rfloor$ times a factor $p$ in the factorizations of $\{1, \ldots, n\}$, $\lfloor\frac{n}{p^2}\rfloor$ times a factor of the form $p^2$, etc. Thus the number of times $p$ appears in the factorization of $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$ is equal to $\sum_i \big(\lfloor\frac{2n}{p^i}\rfloor - 2\lfloor\frac{n}{p^i}\rfloor\big)$: a sum of at most $k$ elements (since $p^{k+1} > 2n$), each of which is either 0 or 1.

Thus, $\binom{2n}{n} \leq \prod_{\substack{1 \leq p \leq 2n \\ p \text{ prime}}} p^{\lfloor\frac{\log 2n}{\log p}\rfloor}$. Taking logarithms we get that
$$n \leq \log\binom{2n}{n} \leq \sum_{\substack{1 \leq p \leq 2n \\ p \text{ prime}}} \Big\lfloor\frac{\log 2n}{\log p}\Big\rfloor \log p \leq \sum_{\substack{1 \leq p \leq 2n \\ p \text{ prime}}} \log 2n = \pi(2n)\log 2n,$$
which implies that $\pi(n) = \Omega\big(\frac{n}{\log n}\big)$ (exercise!). For the other direction, note that all the primes between $n+1$ and $2n$ divide $\binom{2n}{n}$ at least once, and hence
$$2^{2n} \geq \binom{2n}{n} \geq \prod_{\substack{n+1 \leq p \leq 2n \\ p \text{ prime}}} p \geq n^{\pi(2n) - \pi(n)}.$$
Taking logarithms again, $(\pi(2n) - \pi(n))\log n \leq 2n$, thus getting a recursive equation $\pi(2n) \leq \pi(n) + \frac{2n}{\log n}$ which solves to $\pi(n) = O\big(\frac{n}{\log n}\big)$. ∎
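For small $n$ one can watch $\pi(n)$ track $n/\ln n$ directly; this is a minimal sketch of ours (not from the book), using a simple sieve of Eratosthenes:

```python
from math import log

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(n + 1) if sieve[p]]

for n in (10**3, 10**4, 10**5):
    pi_n = len(primes_up_to(n))
    print(n, pi_n, round(pi_n / (n / log(n)), 3))  # the ratio slowly tends to 1
```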
A.3.1 Groups

A group is an abstraction that captures some properties of mathematical objects such as the integers, matrices, functions and more. Formally, a group is a set that has a binary operation, say $\star$, defined on it that is associative and has an identity and inverses. That is, $(G, \star)$ is a group if
1. For every $a, b, c \in G$, $(a \star b) \star c = a \star (b \star c)$.
2. There exists a special element $id \in G$ such that $a \star id = a$ for every $a \in G$, and for every $a \in G$ there exists $b \in G$ such that $a \star b = b \star a = id$. (This element $b$ is called the inverse of $a$, and is often denoted by $a^{-1}$ or $-a$.)

Examples of groups are the integers, with addition being the group operation (and zero the identity element), the non-zero real numbers with multiplication being the group operation (and one the identity element), and the set of one-to-one functions from a domain $A$ onto itself, with function composition being the group operation.

Often, it is natural to use additive ($+$) or multiplicative ($\cdot$) notation to denote the group operation rather than $\star$. In these cases we will use $\ell a$ (or respectively $a^\ell$) to denote the result of applying the operation to $a$ $\ell$ times.
A.3.2 Finite groups

A group is finite if it has a finite number of elements. We denote by $|G|$ the number of elements of $G$. Examples of finite groups are the following:

• The group $\mathbb{Z}_n$ of the integers from 0 to $n-1$ with the operation being addition modulo $n$. In particular, $\mathbb{Z}_2$ is the set $\{0,1\}$ with the XOR operation.
• The group $S_n$ of the permutations on $[n]$, with the operation being function composition.

• The group $(\mathbb{Z}_2)^n$ of $n$-bit strings with the operation being bitwise XOR. More generally, for every two groups $G$ and $H$, we can define the group $G \times H$ to be the group whose elements are pairs $\langle g, h \rangle$ with $g \in G$ and $h \in H$, and with the group operation corresponding to applying the group operations of $G$ and $H$ componentwise. Similarly, we define $G^n$ to be the group $G \times G \times \cdots \times G$ ($n$ times).

• For every $n$, the group $\mathbb{Z}_n^*$ consists of the set $\{k : 1 \leq k \leq n-1,\ \gcd(k,n) = 1\}$ and the operation of multiplication modulo $n$. Note that if $\gcd(k,n) = 1$ then there exist $x, y$ such that $kx + ny = 1$, or in other words $kx = 1 \pmod{n}$, meaning that $x$ is the inverse of $k$ modulo $n$. This also means that we can find this inverse in polynomial time using Euclid's algorithm. The size of $\mathbb{Z}_n^*$ is denoted by $\varphi(n)$, and the function $\varphi$ is known as Euler's quotient function. Note that if $n$ is prime then $\varphi(n) = n-1$. It is known that for every $n > 6$, $\varphi(n) \geq \sqrt{n}$.
A subgroup of $G$ is a subset of $G$ that is itself a group (i.e., closed under the group operation and under taking inverses). The following result is often quite useful:
Theorem A.24 If G is a finite group and H is a subgroup of G then |H| divides |G|.
Proof: Consider the family of sets of the form $aH = \{ah : h \in H\}$ for all $a \in G$ (we're using here multiplicative notation for the group). It is easy to see that the map $x \mapsto ax$ is one-to-one, and hence $|aH| = |H|$ for every $a$. Hence it will suffice to show that we can partition $G$ into disjoint sets from this family. This family clearly covers $G$ (as $a \in aH$ for every $a \in G$), and hence it suffices to show that for every $a, b$, either $aH = bH$ or $aH$ and $bH$ are disjoint. Indeed, suppose that there exist $x, y \in H$ such that $ax = by$. Then for every element $az \in aH$ we have $az = (byx^{-1})z = b(yx^{-1}z)$, and since $yx^{-1}z \in H$ we get that $az \in bH$; hence $aH \subseteq bH$, and by symmetry $aH = bH$. ∎
Corollary A.25 (Fermat's Little Theorem) For every $n$ and $x \in \mathbb{Z}_n^*$, $x^{\varphi(n)} = 1 \pmod{n}$. In particular, if $n$ is prime then $x^{n-1} = 1 \pmod{n}$ for every $x \in \{1, \ldots, n-1\}$.
Proof: Consider the set $H = \{x^\ell : \ell \in \mathbb{Z}\}$. This is clearly a subgroup of $\mathbb{Z}_n^*$, and hence $|H|$ divides $\varphi(n)$. But the size of $H$ is simply the smallest number $k$ such that $x^k = 1 \pmod{n}$. Indeed, there must be such a number since, because $\mathbb{Z}_n^*$ is finite, if we consider the sequence of numbers $1, x, x^2, x^3, \ldots$ then eventually we get $i < j$ such that $x^i = x^j$, meaning that $x^{j-i} = 1 \pmod{n}$. Thus, the above sequence looks like $1, x, x^2, \ldots, x^{k-1}, 1, x, x^2, \ldots$, meaning that $|H| = k$.

Since $x^{|H|} = 1 \pmod{n}$, taking $x$ to the power $\varphi(n)$ (which is a multiple of $|H|$) also yields 1 modulo $n$. ∎
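Python's built-in modular exponentiation makes this corollary easy to verify numerically; this is a minimal sketch of ours (not from the book):

```python
from math import gcd

def phi(n):
    """Euler's function, by direct count (fine for small n)."""
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

n = 15
for x in range(1, n):
    if gcd(x, n) == 1:
        assert pow(x, phi(n), n) == 1  # x^phi(n) = 1 (mod n)

p = 101  # prime, so phi(p) = p - 1
assert all(pow(x, p - 1, p) == 1 for x in range(1, p))
```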
The order of an element $x$ of a group $G$ is the smallest integer $k$ such that $x^k$ is equal to the identity element. The proof above shows that in a finite group $G$, every element has a finite order, and furthermore this order divides the size of $G$. An element $x$ of $G$ with order $|G|$ is called a generator of $G$, since in this case the subgroup $\{x^0, x^1, x^2, \ldots\}$ is all of $G$.² If a group $G$ has a generator then we say that $G$ is cyclic. An example of a simple cyclic group is the group $\mathbb{Z}_n$ of the numbers $\{0, \ldots, n-1\}$ with addition modulo $n$, which is generated by the element 1 (and also by any other element that is co-prime to $n$; exercise).
² A more general definition (that works also for infinite groups) is that $x$ is a generator of $G$ if the subgroup $\{x^\ell : \ell \in \mathbb{Z}\}$ is equal to $G$.

A.3.3 The Chinese Remainder Theorem
Theorem A.26 (Chinese Remainder Theorem) If $n = pq$ where $p, q$ are coprime, then the function $f$ that maps $x$ to $\langle x \pmod{p},\, x \pmod{q} \rangle$ is one-to-one on $\mathbb{Z}_n$. Furthermore, $f$ is an isomorphism, in the sense that $f(xy) = f(x)f(y)$ (where multiplication on the left-hand side is modulo $n$, and on the right-hand side is componentwise modulo $p$ and $q$ respectively).

Proof: The "furthermore" part can be easily verified, and so we focus on showing that $f$ is one-to-one. We need to show that if $f(x) = f(x')$ then $x = x'$. Since $f(x - x') = f(x) - f(x')$ (with componentwise subtraction on the right-hand side), it suffices to show that if $x = 0 \pmod{p}$ (i.e., $p|x$) and $x = 0 \pmod{q}$ (i.e., $q|x$) then $x = 0 \pmod{n}$ (i.e., $pq|x$). So assume that $p|x$ and $q|x$, and write $x = pk$. Then since $\gcd(p,q) = 1$ and $q|x$, we know that $q|k$, meaning that $pq|x$. ∎
The Chinese Remainder Theorem can be easily generalized to show that for every $n = p_1 p_2 \cdots p_k$, where all the $p_i$'s are pairwise coprime, there is an isomorphism between $\mathbb{Z}_n$ and $\mathbb{Z}_{p_1} \times \cdots \times \mathbb{Z}_{p_k}$, meaning that for every $n$, the group $\mathbb{Z}_n^*$ is isomorphic to a product of groups of the form $\mathbb{Z}_q^*$ for $q$ a prime power (i.e., a number of the form $p^\ell$ for a prime $p$). In fact, it can be generalized even further to show that every finite Abelian group $G$ is isomorphic to a product $G_1 \times G_2 \times \cdots \times G_k$ where all the $G_i$'s are cyclic. (This can be viewed as a generalization of the CRT because all the groups of the form $\mathbb{Z}_q^*$ for $q$ a power of an odd prime are cyclic, and all groups of the form $\mathbb{Z}_{2^k}^*$ are either cyclic or products of two cyclic groups.)
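Here is a minimal Python sketch of Theorem A.26 (ours, not from the book; the helper names are illustrative, and `pow(p, -1, q)` needs Python 3.8+):

```python
def crt_pair(x, p, q):
    """The map f of Theorem A.26."""
    return (x % p, x % q)

def crt_solve(a, p, b, q):
    """The unique x in Z_pq with x = a (mod p) and x = b (mod q)."""
    inv_p_mod_q = pow(p, -1, q)  # p, q coprime, so p is invertible mod q
    return (a + p * ((b - a) * inv_p_mod_q % q)) % (p * q)

p, q = 7, 11
for x in range(p * q):
    a, b = crt_pair(x, p, q)
    assert crt_solve(a, p, b, q) == x  # f is one-to-one (and onto)
```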
A.4 Finite fields
A field is a set $\mathbb{F}$ that has addition ($+$) and multiplication ($\cdot$) operations that behave in the expected way: they satisfy associative, commutative and distributive laws, have both additive and multiplicative inverses, and neutral elements 0 and 1 for addition and multiplication respectively. In other words, $\mathbb{F}$ is a field if it is an Abelian group with the operation $+$ and identity element 0, and has an additional operation $\cdot$ such that $\mathbb{F} \setminus \{0\}$ with $\cdot$ forms an Abelian group, and furthermore the two operations satisfy the distributive rule $a(b+c) = ab + ac$.

Familiar fields are the real numbers ($\mathbb{R}$), the rational numbers ($\mathbb{Q}$) and the complex numbers ($\mathbb{C}$), but there are also finite fields. Recall that for a prime $p$, the set $\{0, \ldots, p-1\}$ is an Abelian group with the addition modulo $p$ operation, and the set $\{1, \ldots, p-1\}$ is an Abelian group with the multiplication modulo $p$ operation. Hence $\{0, \ldots, p-1\}$ forms a field with these two operations, which we denote by $GF(p)$. The simplest example of such a field is the field $GF(2)$ consisting of $\{0,1\}$, where multiplication is the AND ($\wedge$) operation and addition is the XOR operation.

Every finite field $\mathbb{F}$ has a number $\ell$ such that for every $x \in \mathbb{F}$, $x + x + \cdots + x$ ($\ell$ times) is equal to the zero element of $\mathbb{F}$ (exercise). The smallest such number $\ell$ is called the characteristic of $\mathbb{F}$. For every prime $q$, the characteristic of $GF(q)$ is equal to $q$.
A.4.1 Non-prime fields

One can see that if $n$ is not prime, then the set $\{0, \ldots, n-1\}$ with addition and multiplication modulo $n$ is not a field, as there exist two non-zero elements $x, y$ in this set such that $x \cdot y = n = 0 \pmod{n}$. Nevertheless, there are finite fields of size $n$ for non-prime $n$. Specifically, for every prime $q$ and $k \geq 1$, there exists a field of $q^k$ elements, which we denote by $GF(q^k)$. We will very rarely need to use such fields in this book, but still provide an outline of their construction below.

For every prime $q$ and $k$ there exists an irreducible degree-$k$ polynomial $P$ over the field $GF(q)$ ($P$ is irreducible if it cannot be expressed as the product of two polynomials $P', P''$ of lower degree). We then let $GF(q^k)$ be the set of all polynomials of degree at most $k-1$ over $GF(q)$. Each such polynomial can be represented as the vector of its $k$ coefficients. We perform both addition and multiplication modulo the polynomial $P$. Note that addition corresponds to standard vector addition of $k$-dimensional vectors over $GF(q)$, and both addition and multiplication can be easily done in $\mathrm{poly}(k, \log q)$ time (we can reduce a polynomial $S$
modulo a polynomial $P$ using an algorithm similar to long division of numbers). It turns out that no matter how we choose the irreducible polynomial $P$, we will get the same field, up to renaming of the elements. There is a deterministic $\mathrm{poly}(q,k)$-time algorithm to obtain an irreducible polynomial of degree $k$ over $GF(q)$. There are also probabilistic algorithms (and deterministic algorithms whose analysis relies on unproven assumptions) that obtain such a polynomial in $\mathrm{poly}(\log q, k)$ time (see the book [Sho05]).
For us, the most important example of a finite field is $GF(2^k)$, which consists of the set $\{0,1\}^k$, with addition being component-wise XOR, and multiplication being polynomial multiplication modulo some irreducible polynomial, which we can find in $\mathrm{poly}(k)$ time. In fact, we will mostly not even be interested in the multiplicative structure of $GF(2^k)$ and only use the addition operation (i.e., use it as the vector space $GF(2)^k$; see below).
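To make the construction concrete, here is a minimal Python sketch (ours, not from the book) of multiplication in $GF(2^8)$, with field elements encoded as 8-bit integers (coefficient vectors over $GF(2)$) and reduction modulo the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ (the polynomial used, e.g., in AES):

```python
IRRED = 0b100011011  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf2k_mul(a, b, irred=IRRED, k=8):
    result = 0
    for i in range(k):          # schoolbook polynomial multiplication...
        if (b >> i) & 1:
            result ^= a << i    # ...with XOR as coefficient addition
    for i in range(2 * k - 2, k - 1, -1):  # reduce modulo the irreducible poly
        if (result >> i) & 1:
            result ^= irred << (i - k)
    return result

def gf2k_add(a, b):
    return a ^ b  # addition in GF(2^k) is just XOR of coefficient vectors

assert gf2k_mul(0x53, 0xCA) == 0x01  # 0x53 and 0xCA are inverses in this field
```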
A.5 Basic facts from linear algebra
A.5.1 Inner product

The vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ have an additional structure that is often quite useful.³ An inner product over $\mathbb{C}^n$ is a function mapping two vectors $u, v$ to a complex number $\langle u,v \rangle$ satisfying the following conditions:
• $\langle xu + yw, v \rangle = x\langle u,v \rangle + y\langle w,v \rangle$.
• $\langle v,u \rangle = \overline{\langle u,v \rangle}$, where $\overline{z}$ denotes complex conjugation (i.e., if $z = a + ib$ then $\overline{z} = a - ib$).
• For every $u$, $\langle u,u \rangle$ is a non-negative real number, with $\langle u,u \rangle = 0$ iff $u = 0$.
The two examples of inner products we will use are the standard inner product, mapping $x, y \in \mathbb{C}^n$ to $\sum_{i=1}^n x_i \overline{y_i}$, and the expectation or normalized inner product, mapping $x, y \in \mathbb{C}^n$ to $\frac{1}{n}\sum_{i=1}^n x_i \overline{y_i}$. We can also define inner products over the space $\mathbb{R}^n$, in which case we drop the conjugation.
If $\langle u,v \rangle = 0$ we say that $u$ and $v$ are orthogonal, and denote this by $u \perp v$. We have the following result:

Lemma A.28 If non-zero vectors $u_1, \ldots, u_k$ satisfy $u_i \perp u_j$ for all $i \neq j$, then they are linearly independent.
³ The reason we restrict ourselves to these fields is that they have characteristic zero, which means that there does not exist a number $k \in \mathbb{N}$ and nonzero $a \in \mathbb{F}$ such that $ka = 0$ (where $ka$ is the result of adding $a$ to itself $k$ times). You can check that if there is such a number for a field $\mathbb{F}$ then there will not be an inner product over $\mathbb{F}^n$.
Proof: Suppose that $\sum_{i=1}^k x_i u_i = 0$ for some complex numbers $x_1, \ldots, x_k$. Then
$$0 = \Big\langle \sum_i x_i u_i,\, \sum_j x_j u_j \Big\rangle = \sum_{i,j} x_i \overline{x_j} \langle u_i, u_j \rangle = \sum_i |x_i|^2 \langle u_i, u_i \rangle, \qquad (7)$$
where the last equality follows from the fact that $\langle u_i, u_j \rangle = 0$ for $i \neq j$. But unless all the $x_i$'s are zero, the right-hand side of (7) is strictly positive (recall that for a complex number $x$, $|x|^2 = x\overline{x}$, and $\langle u_i, u_i \rangle > 0$ for nonzero $u_i$). Hence all the $x_i$'s must be zero, meaning that there is no nontrivial linear combination $\sum_i x_i u_i$ that equals zero. ∎
A.5.2 Dot product

Even in a field $\mathbb{F}$ that doesn't have an inner product, we can define the dot product of two vectors $u, v \in \mathbb{F}^n$, denoted by $u \odot v$, as $\sum_{i=1}^n u_i v_i$. For every subspace $S \subseteq \mathbb{F}^n$, we define $S^\perp = \{u : u \odot v = 0 \text{ for all } v \in S\}$. We leave the following simple claim as an exercise:

Claim A.30 $\dim(S) + \dim(S^\perp) = n$.
Note also that for every nonzero $u \in GF(2)^n$, $\Pr_{v \in_R GF(2)^n}[u \odot v = 0] = 1/2$.
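A quick exhaustive check of this fact, as a minimal Python sketch (ours, not from the book):

```python
from itertools import product

def dot2(u, v):
    """Dot product over GF(2)."""
    return sum(a * b for a, b in zip(u, v)) % 2

n = 4
u = (1, 0, 1, 1)  # any nonzero vector
count = sum(dot2(u, v) == 0 for v in product((0, 1), repeat=n))
assert count == 2 ** (n - 1)  # exactly half of all v, i.e., probability 1/2
```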
A.5.3 Eigenvectors and eigenvalues
If $A$ is an $n \times n$ complex matrix, a nonzero vector $v$ is an eigenvector of $A$ with eigenvalue $\lambda \in \mathbb{C}$ if $Av = \lambda v$. Note that $A$ has an eigenvector with eigenvalue $\lambda$ if and only if the matrix $A - \lambda I$ is non-invertible, where $I$ is the identity matrix. Thus in particular $\lambda$ is a root of the polynomial $p(x) = \det(A - xI)$, and the Fundamental Theorem of Algebra (every complex polynomial has as many roots as its degree) implies that every square matrix has at least one eigenvector. (A non-invertible matrix has an eigenvector with eigenvalue zero.)

For a matrix $A$, the conjugate transpose of $A$, denoted $A^*$, is the matrix such that for every $i,j$, $A^*_{i,j} = \overline{A_{j,i}}$, where $\overline{\cdot}$ denotes the complex conjugate operation. We say that an $n \times n$ matrix $A$ is Hermitian if $A = A^*$. A Hermitian matrix with only real entries is called symmetric; that is, a real matrix is symmetric if $A = A^\dagger$, where $\dagger$ is the transpose operation (i.e., $A^\dagger_{i,j} = A_{j,i}$). An equivalent condition (exercise) is that $A$ is Hermitian if and only if for every $u, v$,
$$\langle Au, v \rangle = \langle u, Av \rangle. \qquad (8)$$
This condition implies that every Hermitian matrix has an orthogonal basis of eigenvectors:

Proof: We prove this by induction on $n$. We know that $A$ has one eigenvector $v$ with eigenvalue $\lambda$. Now let $S = v^\perp$ be the $(n-1)$-dimensional space of all vectors orthogonal to $v$. We claim that for every $u \in S$, $Au \in S$. Indeed, if $\langle u,v \rangle = 0$ then
$$\langle Au, v \rangle = \langle u, Av \rangle = \overline{\lambda}\langle u,v \rangle = 0.$$
Thus the restriction of $A$ to $S$ is an $(n-1)$-dimensional linear operator satisfying (8), and hence by induction this restriction has an orthogonal basis of eigenvectors $v_2, \ldots, v_n$. Adding $v$ to this set, we get an $n$-dimensional orthogonal basis of eigenvectors for $A$. ∎
Note that if $A$ is real and symmetric then all its eigenvalues must be real as well (with no imaginary components). Indeed, if $Av = \lambda v$ then
$$\lambda \langle v,v \rangle = \langle Av, v \rangle = \langle v, Av \rangle = \overline{\lambda}\langle v,v \rangle,$$
meaning that for a nonzero $v$, $\lambda = \overline{\lambda}$. This implies that the eigenvectors, which are obtained by solving a linear system with real coefficients, are also real.
A.5.4 Norms

A norm on $\mathbb{C}^n$ is a function mapping a vector $v$ to a real number $\|v\|$ satisfying:
• For every $v$, $\|v\| \geq 0$, with $\|v\| = 0$ iff $v = 0$.
• If $x \in \mathbb{C}$ then $\|xv\| = |x|\,\|v\|$.
• (Triangle inequality) For every $u, v$, $\|u + v\| \leq \|u\| + \|v\|$.
For every $p \geq 1$, the $L_p$ norm of a vector $v$ is $\|v\|_p = \big(\sum_{i=1}^n |v_i|^p\big)^{1/p}$, and the $L_\infty$ norm is $\|v\|_\infty = \max_i |v_i|$. Hölder's inequality states that for every $p, q \geq 1$ with $\frac{1}{p} + \frac{1}{q} = 1$,
$$\sum_{i=1}^n |u_i v_i| \leq \|u\|_p\,\|v\|_q.$$
By scaling, it suffices to prove this when $\|u\|_p = \|v\|_q = 1$. But if $\|u\|_p = \|v\|_q = 1$ then
$$\sum_{i=1}^n |u_i|\,|v_i| \leq \sum_{i=1}^n \Big(\tfrac{1}{p}|u_i|^p + \tfrac{1}{q}|v_i|^q\Big) = \tfrac{1}{p} + \tfrac{1}{q} = 1,$$
where the inequality uses the fact that for every $a, b > 0$ and $\theta \in [0,1]$, $a^\theta b^{1-\theta} \leq \theta a + (1-\theta)b$ (applied with $a = |u_i|^p$, $b = |v_i|^q$ and $\theta = \frac{1}{p}$).
Hölder's inequality implies the following relations between the $L_2$, $L_1$ and $L_\infty$ norms of every vector (see Exercise 21.2):
$$\|v\|_1/\sqrt{n} \;\leq\; \|v\|_2 \;\leq\; \sqrt{\|v\|_1\,\|v\|_\infty}. \qquad (9)$$
Vector spaces with a norm are sometimes known as Banach spaces.
A.5.5 Metric spaces
For any set $\Omega$ and $d : \Omega^2 \to \mathbb{R}$, we say that $d$ is a metric on $\Omega$ if it satisfies the following conditions:
1. $d(x,y) \geq 0$ for every $x, y \in \Omega$, where $d(x,y) = 0$ if and only if $x = y$.
2. $d(x,y) = d(y,x)$ for every $x, y \in \Omega$.
3. (Triangle inequality) For every $x, y, z \in \Omega$, $d(x,z) \leq d(x,y) + d(y,z)$.
That is, $d(x,y)$ denotes the distance between $x$ and $y$ according to some measure. If $\Omega$ is a vector space with a norm, then the function $d(x,y) = \|x - y\|$ is a metric over $\Omega$, but there are other examples of metrics that do not come from any norm. For example, for every graph $G$ we can define a metric over the vertex set of $G$ by letting the distance of $x$ and $y$ be the length of the shortest path between them. Various metric spaces and the relations between them have recently found many applications in theoretical computer science; see Chapter 15 of [Mat02] for a good survey.
A.6 Polynomials
We list some basic facts about univariate polynomials.
Theorem A.33 A nonzero polynomial of degree d has at most d distinct roots.
Proof: Suppose that the polynomial $p(x) = \sum_{i=0}^d c_i x^i$, with coefficient vector $c \neq 0$, has $d+1$ distinct roots $\alpha_1, \ldots, \alpha_{d+1}$. Then for every $j \in [d+1]$,
$$\sum_{i=0}^d \alpha_j^i\, c_i = p(\alpha_j) = 0,$$
and hence the system $Ay = 0$ with
$$A = \begin{pmatrix} 1 & \alpha_1 & \alpha_1^2 & \ldots & \alpha_1^d \\ 1 & \alpha_2 & \alpha_2^2 & \ldots & \alpha_2^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha_{d+1} & \alpha_{d+1}^2 & \ldots & \alpha_{d+1}^d \end{pmatrix}$$
has a solution $y = c$. The matrix $A$ is a Vandermonde matrix, and it can be shown that
$$\det A = \prod_{i > j} (\alpha_i - \alpha_j),$$
which is nonzero for distinct $\alpha_i$. Hence $\mathrm{rank}\,A = d+1$. The system $Ay = 0$ has therefore only a trivial solution, a contradiction to $c \neq 0$. ∎
This theorem has an interesting corollary:
Corollary A.34 For every finite field $\mathbb{F}$, the multiplicative group $\mathbb{F}^* = \mathbb{F} \setminus \{0\}$ is cyclic.

Proof: The fact that the polynomial $x^k - 1$ has at most $k$ roots implies that the group $\mathbb{F}^*$ has the property (*) that for every $k$, the number of elements $x$ satisfying $x^k = 1$ is always at most $k$. We will prove by induction that every finite group $G$ satisfying (*) is cyclic.
Let $n = |G|$. We consider three cases:

• $n$ is prime. In this case every element of $G$ has either order 1 or order $n$. Since the only element with order 1 is the identity element, we see that $G$ has an element of order $n$, and hence $G$ is cyclic.
• $n = p^c$ for some prime $p$ and $c > 1$. In this case, if there is no element of order $n$, then all the orders must divide $p^{c-1}$. We get $n = p^c$ elements $x$ such that $x^{p^{c-1}} = 1$, violating (*).
• $n = pq$ for coprime $p, q > 1$. In this case let $H$ and $F$ be two subgroups of $G$ defined as follows: $H = \{a : a^p = 1\}$ and $F = \{b : b^q = 1\}$. Then $|H| \leq p < n$ and $|F| \leq q < n$ by (*), and as subgroups of $G$ both $H$ and $F$ satisfy (*). Thus by the induction hypothesis both $H$ and $F$ are cyclic, with generators $a$ and $b$ respectively. We claim that $ab$ generates the entire group $G$. Indeed, let $c$ be any element in $G$. Since $p, q$ are coprime, there are $x, y$ such that $xq + yp = 1$, and hence $c = c^{xq+yp} = c^{xq} c^{yp}$. But $(c^{xq})^p = 1$ and $(c^{yp})^q = 1$, and hence $c$ is a product of an element of $H$ and an element of $F$, meaning that $c = a^i b^j$ for some $i \in \{0, \ldots, p-1\}$ and $j \in \{0, \ldots, q-1\}$. Thus, to show that $c = (ab)^z$ for some $z$, all we need to do is find $z$ such that $z = i \pmod{p}$ and $z = j \pmod{q}$, but this can be done using the Chinese Remainder Theorem. ∎
Theorem A.35 For every set of $d+1$ pairs $(a_1, b_1), \ldots, (a_{d+1}, b_{d+1})$ with all the $a_i$'s distinct, there exists a unique polynomial $g(x)$ of degree at most $d$ such that $g(a_i) = b_i$ for all $i = 1, 2, \ldots, d+1$.
Proof: The polynomial
$$g(x) = \sum_{i=1}^{d+1} b_i\, \frac{\prod_{j \neq i}(x - a_j)}{\prod_{j \neq i}(a_i - a_j)}$$
(the Lagrange interpolation formula) has degree at most $d$ and satisfies $g(a_i) = b_i$ for every $i$.
If two polynomials $g_1(x), g_2(x)$ satisfy the requirements, then their difference $p(x) = g_1(x) - g_2(x)$ is of degree at most $d$, and is zero for $x = a_1, \ldots, a_{d+1}$. Thus, by the previous theorem, the polynomial $p(x)$ must be identically zero, and the polynomials $g_1(x), g_2(x)$ identical. ∎
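The formula in the proof translates directly into code; here is a minimal Python sketch (ours, not from the book), using exact rational arithmetic:

```python
from fractions import Fraction

def interpolate(points):
    """Given d+1 pairs (a_i, b_i) with distinct a_i, return g as a function."""
    points = [(Fraction(a), Fraction(b)) for a, b in points]

    def g(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, (ai, bi) in enumerate(points):
            term = bi
            for j, (aj, _) in enumerate(points):
                if j != i:
                    term *= (x - aj) / (ai - aj)  # Lagrange basis polynomial
            total += term
        return total

    return g

g = interpolate([(0, 1), (1, 2), (2, 5)])  # the unique quadratic through these
assert g(3) == 10                          # here g(x) = x^2 + 1
```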
The following elementary result is usually attributed to Schwartz and Zippel in the computer science community, though it was certainly known earlier (see e.g. DeMillo and Lipton [DL78]).
Lemma A.36 If a polynomial $p(x_1, x_2, \ldots, x_m)$ over $\mathbb{F} = GF(q)$ is nonzero and has total degree at most $d$, then
$$\Pr[p(a_1, \ldots, a_m) \neq 0] \geq 1 - \frac{d}{q},$$
where the probability is over all choices of $a_1, \ldots, a_m \in_R \mathbb{F}$.
Proof: We prove the lemma by induction on $m$. For $m = 1$ it follows from Theorem A.33, since a nonzero univariate polynomial of degree at most $d$ has at most $d$ roots among the $q$ field elements. For $m > 1$, write $p(x_1, \ldots, x_m) = \sum_i x_1^i\, p_i(x_2, \ldots, x_m)$, where $p_i$ has total degree at most $d - i$. Since $p$ is nonzero, at least one of the $p_i$ is nonzero. Let $k$ be the largest $i$ such that $p_i$ is nonzero. Then by the inductive hypothesis,
$$\Pr_{a_2, a_3, \ldots, a_m}[p_k(a_2, a_3, \ldots, a_m) \neq 0] \geq 1 - \frac{d-k}{q}.$$
Whenever $p_k(a_2, \ldots, a_m) \neq 0$, the univariate polynomial $x_1 \mapsto p(x_1, a_2, \ldots, a_m)$ is nonzero of degree $k$, and hence a random $a_1$ is one of its roots with probability at most $k/q$. Thus
$$\Pr[p(a_1, \ldots, a_m) \neq 0] \geq \Big(1 - \frac{d-k}{q}\Big)\Big(1 - \frac{k}{q}\Big) \geq 1 - \frac{d}{q}. \qquad \blacksquare$$
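Lemma A.36 underlies randomized polynomial identity testing: to check whether two polynomials agree identically, evaluate both at a random point of a large field. Here is a minimal Python sketch (ours, not from the book), working modulo a prime much larger than the degree:

```python
import random

q = 1_000_003  # a prime much larger than the total degree d = 2
rng = random.Random(3)

def p1(x, y):
    return pow(x + y, 2, q)          # (x + y)^2 mod q

def p2(x, y):
    return (x * x + 2 * x * y + y * y) % q  # x^2 + 2xy + y^2 mod q

# If p1 - p2 were a nonzero polynomial of total degree d, a random point
# would expose it with probability at least 1 - d/q. Identical polynomials
# always agree.
x, y = rng.randrange(q), rng.randrange(q)
assert p1(x, y) == p2(x, y)
```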