Computational Complexity Theory


Computational Complexity:
A Modern Approach
Sanjeev Arora and Boaz Barak
Princeton University
http://www.cs.princeton.edu/theory/complexity/
complexitybook@gmail.com

Not to be reproduced or distributed without the authors' permission


Appendix A

Mathematical Background.
This appendix reviews the mathematical notions used in this book. However, most of these are only used in a few places, and so the reader might want to only quickly review Sections A.1 and A.2, and come back to the other sections as needed. In particular, apart from probability, the first part of the book essentially requires only comfort with mathematical proofs and some very basic notions of discrete math.
The topics described in this appendix are covered in greater depth in many texts and
online sources. Almost all of the mathematical background needed is covered in a good
undergraduate discrete math for computer science course as currently taught at many
computer science departments. Some good sources for this material are the lecture notes
by Papadimitriou and Vazirani [PV06], and the book of Rosen [Ros06].
The mathematical tool we use most often is discrete probability. Alon and Spencer
[AS00b] is a great resource in this area. Also, the books of Mitzenmacher and Upfal [MU05]
and Motwani and Raghavan [MR95] cover probability from a more algorithmic perspective.
Although knowledge of algorithms is not strictly necessary for this book, it would be quite useful. It would be helpful to review either of the two recent books by Dasgupta et al. [DPV06] and Kleinberg and Tardos [KT06], or the earlier text by Cormen et al. [CLRS01]. This book does not require prior knowledge of computability and automata theory, but some basic familiarity with that theory could be useful: see Sipser's book [Sip96] for an excellent introduction. See Shoup's book [Sho05] for a computer-science introduction to algebra and number theory.
Perhaps the most important mathematical prerequisite for this book is a certain level of comfort with mathematical proofs. The fact that a mathematical proof has to be absolutely convincing does not mean that it has to be overly formal and tedious. It just has to be clearly written, and contain no logical gaps. When you write proofs, try to be clear and concise, rather than using too much formal notation. Of course, to be absolutely convinced that some statement is true, we need to be certain of what that statement means. This is why there is a special emphasis in mathematics (and this book) on very precise definitions. Whenever you read a definition, try to make sure you completely understand it, perhaps by working through some simple examples. Oftentimes, understanding the meaning of a mathematical statement is more than half the work to prove that it is true.

A.1 Sets, Functions, Pairs, Strings, Graphs, Logic.


Sets. A set contains a finite or infinite number of elements, without repetition or respect to order, for example $\{2, 17, 5\}$, $\mathbb{N} = \{1, 2, 3, \ldots\}$ (the set of natural numbers), $[n] = \{1, 2, \ldots, n\}$ (the set of natural numbers from 1 to $n$), $\mathbb{R}$ (the set of real numbers). For a finite set $A$, we denote by $|A|$ the number of elements in $A$. Some operations on sets are: (1) union: $A \cup B = \{x : x \in A \text{ or } x \in B\}$, (2) intersection: $A \cap B = \{x : x \in A \text{ and } x \in B\}$, and (3) set difference: $A \setminus B = \{x : x \in A \text{ and } x \notin B\}$.


Functions. We say that $f$ is a function from a set $A$ to $B$, denoted by $f : A \to B$, if it maps any element of $A$ into an element of $B$. If $B$ and $A$ are finite, then the number of possible functions from $A$ to $B$ is $|B|^{|A|}$. We say that $f$ is one-to-one if for every $x, w \in A$ with $x \neq w$, $f(x) \neq f(w)$. If $A, B$ are finite, the existence of such a function implies that $|A| \le |B|$. We say that $f$ is onto if for every $y \in B$ there exists $x \in A$ such that $f(x) = y$. If $A, B$ are finite, the existence of such a function implies that $|A| \ge |B|$. We say that $f$ is a permutation if it is both one-to-one and onto. For finite $A, B$, the existence of a permutation from $A$ to $B$ implies that $|A| = |B|$.
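The counting facts above can be checked by brute force on small sets. The following Python sketch enumerates every function from $A$ to $B$ as one tuple of values in $B^{|A|}$; the helper names `all_functions`, `is_one_to_one` and `is_onto` are hypothetical, chosen only for illustration.

```python
from itertools import product

def all_functions(A, B):
    """Yield every function f: A -> B as a dict; there is one per tuple in B^|A|."""
    A = list(A)
    for values in product(B, repeat=len(A)):
        yield dict(zip(A, values))

def is_one_to_one(f):
    """No two inputs share an output."""
    return len(set(f.values())) == len(f)

def is_onto(f, B):
    """Every element of B is hit by some input."""
    return set(f.values()) == set(B)

A, B = [1, 2, 3], ["a", "b"]
funcs = list(all_functions(A, B))
print(len(funcs))                            # |B|^|A| = 2^3 = 8
print(sum(is_one_to_one(f) for f in funcs))  # 0, because |A| > |B|
print(sum(is_onto(f, B) for f in funcs))     # 6 (all except the two constant functions)
```

Enumeration is only feasible for tiny sets, since the number of functions grows as $|B|^{|A|}$.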
Pairs and tuples. If $A, B$ are sets, then $A \times B$ denotes the set of all ordered pairs $\langle a, b \rangle$ with $a \in A$, $b \in B$. Note that if $A, B$ are finite then $|A \times B| = |A| \cdot |B|$. We can define similarly $A \times B \times C$ to be the set of ordered triples $\langle a, b, c \rangle$ with $a \in A$, $b \in B$, $c \in C$. For $n \in \mathbb{N}$, we denote by $A^n$ the set $A \times A \times \cdots \times A$ ($n$ times). We will often use the set $\{0,1\}^n$, consisting of all length-$n$ sequences of bits (i.e., length-$n$ strings), and the set $\{0,1\}^* = \bigcup_{n \ge 0} \{0,1\}^n$ ($\{0,1\}^0$ has a single element: a binary string of length zero, which we call the empty word and denote by $\varepsilon$). As mentioned in Section 0.1, we can represent various objects (numbers, graphs, matrices, etc.) as binary strings, and use $\llcorner x \lrcorner$ (not to be confused with the floor operator $\lfloor x \rfloor$) to denote the representation of $x$. Moreover, we often drop the $\llcorner\ \lrcorner$ symbols and use $x$ to denote both the object and its representation.
Graphs. A graph $G$ consists of a set $V$ of vertices (which we often assume is equal to the set $[n] = \{1, \ldots, n\}$ for some $n \in \mathbb{N}$) and a set $E$ of edges, which consists of unordered pairs (i.e., size-two subsets) of elements in $V$. We denote the edge $\{u, v\}$ of the graph by $\overline{u\,v}$. For $v \in V$, the neighbors of $v$ are all the vertices $u \in V$ such that $\overline{u\,v} \in E$. In a directed graph, the edges consist of ordered pairs of vertices, and to stress this we sometimes denote the edge $\langle u, v \rangle$ in a directed graph by $\overrightarrow{u\,v}$. One can represent an $n$-vertex graph $G$ by its adjacency matrix, which is an $n \times n$ matrix $A$ such that $A_{i,j}$ is equal to 1 if the edge $\overrightarrow{i\,j}$ is present in $G$ and is equal to 0 otherwise. One can think of an undirected graph as a directed graph $G$ that satisfies that for every $u, v$, $G$ contains the edge $\overrightarrow{u\,v}$ if and only if it contains the edge $\overrightarrow{v\,u}$. Hence, one can represent an undirected graph by an adjacency matrix that is symmetric ($A_{i,j} = A_{j,i}$ for every $i, j \in [n]$).
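A minimal sketch of the adjacency-matrix representation, assuming vertices are labeled $1, \ldots, n$ and edges are given as a list of pairs (the function names are hypothetical). An undirected edge is stored in both directions, which is exactly what makes the matrix symmetric:

```python
def adjacency_matrix(n, edges, directed=False):
    """Build the n x n 0/1 adjacency matrix of a graph on vertex set [n] = {1, ..., n}.

    edges is a list of pairs (u, v). For an undirected graph each pair is
    stored in both directions, so the resulting matrix is symmetric.
    """
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1          # shift labels 1..n to list indices 0..n-1
        if not directed:
            A[v - 1][u - 1] = 1
    return A

def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

triangle = adjacency_matrix(3, [(1, 2), (2, 3), (1, 3)])
path = adjacency_matrix(3, [(1, 2), (2, 3)], directed=True)
print(is_symmetric(triangle), is_symmetric(path))   # True False
```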
Boolean operators. A Boolean variable is a variable that can be either True or False (we sometimes identify True with 1 and False with 0). We can combine variables via the logical operations AND ($\wedge$), OR ($\vee$) and NOT ($\neg$, sometimes also denoted by an overline), to obtain Boolean formulae. For example, the following is a Boolean formula on the variables $u_1, u_2, u_3$: $(u_1 \wedge u_2) \vee \neg(u_3 \wedge u_1)$. The definitions of the operations are the usual: $a \wedge b = \text{True}$ if $a = \text{True}$ and $b = \text{True}$ and is equal to False otherwise; $\neg a = \overline{a} = \text{True}$ if $a = \text{False}$ and is equal to False otherwise; $a \vee b = \neg(\neg a \wedge \neg b)$. We sometimes use other Boolean operators such as the XOR ($\oplus$) operator, but they can always be replaced with an equivalent expression using $\wedge, \vee, \neg$ (e.g., $a \oplus b = (a \wedge \neg b) \vee (\neg a \wedge b)$). If $\varphi$ is a formula in $n$ variables $u_1, \ldots, u_n$, then for any assignment of values $u \in \{\text{False}, \text{True}\}^n$ (or equivalently, $\{0,1\}^n$), we denote by $\varphi(u)$ the value of $\varphi$ when its variables are assigned the values in $u$. We say that $\varphi$ is satisfiable if there exists a $u$ such that $\varphi(u) = \text{True}$.
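As an illustration of these definitions (a sketch, not an efficient algorithm), the following Python code evaluates the example formula on every assignment in $\{\text{False}, \text{True}\}^3$ and decides satisfiability by exhaustive search; it also spells out XOR using only AND, OR and NOT. The names `phi`, `xor` and `satisfying_assignments` are hypothetical.

```python
from itertools import product

def phi(u1, u2, u3):
    """The example formula (u1 AND u2) OR NOT (u3 AND u1)."""
    return (u1 and u2) or not (u3 and u1)

def xor(a, b):
    """a XOR b written with AND, OR and NOT only."""
    return (a and not b) or (not a and b)

def satisfying_assignments(formula, n):
    """Return every u in {False, True}^n with formula(u) True (exhaustive search)."""
    return [u for u in product([False, True], repeat=n) if formula(*u)]

sat = satisfying_assignments(phi, 3)
print(len(sat) > 0)   # True: phi is satisfiable
print(len(sat))       # 7: only u1=True, u2=False, u3=True makes phi False
```

Exhaustive search inspects all $2^n$ assignments, so it only serves to make the definition of satisfiability concrete.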
Quantifiers. We will often use the quantifiers $\forall$ (for all) and $\exists$ (exists). That is, if $\varphi$ is a condition that can be True or False depending on the value of a variable $x$, then we write $\forall_x \varphi(x)$ to denote the statement that $\varphi$ is True for every possible value that can be assigned to $x$. If $A$ is a set then we write $\forall_{x \in A} \varphi(x)$ to denote the statement that $\varphi$ is True for every assignment for $x$ from the set $A$. The quantifier $\exists$ is defined similarly. Formally, we say that $\exists_x \varphi(x)$ holds if and only if $\neg(\forall_x \neg\varphi(x))$ holds.
Big-Oh Notation. We will often use the big-Oh notation (i.e., $O, \Omega, \Theta, o, \omega$) as defined in Section 0.3.

A.2 Probability theory
A finite probability space is a finite set $\Omega = \{\omega_1, \ldots, \omega_N\}$ along with a set of numbers $p_1, \ldots, p_N \in [0,1]$ such that $\sum_{i=1}^N p_i = 1$. A random element is selected from this space by choosing $\omega_i$ with probability $p_i$. If $x$ is chosen from the sample space $\Omega$ then we denote this by $x \in_R \Omega$. If no distribution is specified then we use the uniform distribution over the elements of $\Omega$ (i.e., $p_i = \frac{1}{N}$ for every $i$).
An event over the space $\Omega$ is a subset $A \subseteq \Omega$, and the probability that $A$ occurs, denoted by $\Pr[A]$, is equal to $\sum_{i : \omega_i \in A} p_i$. To give an example, the probability space could be that of all $2^n$ possible outcomes of $n$ tosses of a fair coin (i.e., $\Omega = \{0,1\}^n$ and $p_i = 2^{-n}$ for every $i \in [2^n]$) and the event $A$ can be that the number of coins that come up heads (or, equivalently, 1) is even. In this case, $\Pr[A] = 1/2$ (exercise). The following simple bound, called the union bound, is often used in the book. For every set of events $A_1, A_2, \ldots, A_n$,
$$\Pr\Big[\bigcup_{i=1}^n A_i\Big] \le \sum_{i=1}^n \Pr[A_i]. \qquad (1)$$
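The exercise $\Pr[A] = 1/2$, the union bound and the expectation $n/2$ discussed below can all be sanity-checked by enumerating the sample space for a small $n$. This sketch assumes $n = 4$ and uses exact rational arithmetic (`fractions.Fraction`) so that comparisons are not affected by rounding; the events $A_i$ = "coin $i$ came up 1" are our choice for illustration.

```python
from fractions import Fraction
from itertools import product

n = 4
omega = list(product([0, 1], repeat=n))   # all 2^n outcomes, each with p_i = 2^-n
p = Fraction(1, 2 ** n)

def pr(event):
    """Pr[A] = sum of p_i over the outcomes omega_i in A (A given as a predicate)."""
    return sum(p for w in omega if event(w))

print(pr(lambda w: sum(w) % 2 == 0))      # 1/2: an even number of heads

# Union bound with A_i = "coin i came up 1"; the union is "at least one 1".
events = [lambda w, i=i: w[i] == 1 for i in range(n)]
lhs = pr(lambda w: any(A(w) for A in events))   # 1 - 2^-n = 15/16
rhs = sum(pr(A) for A in events)                # n/2 = 2
print(lhs <= rhs)                               # True

expected_heads = sum(p * sum(w) for w in omega) # E[X] = n/2 (linearity of expectation)
print(expected_heads)                           # 2
```

Here the union bound is far from tight because the events overlap heavily, which is the overcounting that the inclusion-exclusion principle corrects.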

Inclusion exclusion principle. The union bound is a special case of a more general principle. Indeed, note that if the sets $A_1, \ldots, A_n$ are not disjoint then the probability of $\bigcup_i A_i$ could be smaller than $\sum_i \Pr[A_i]$, since we are overcounting elements that appear in more than one set. We can correct this by subtracting $\sum_{i<j} \Pr[A_i \cap A_j]$, but then we might be undercounting, since we subtracted elements that appear in at least 3 sets too many times. Continuing this process we get
Claim A.1 (Inclusion-Exclusion principle) For every $A_1, \ldots, A_n$,
$$\Pr\Big[\bigcup_{i=1}^n A_i\Big] = \sum_{i=1}^n \Pr[A_i] - \sum_{1 \le i < j \le n} \Pr[A_i \cap A_j] + \cdots + (-1)^{n-1} \Pr[A_1 \cap \cdots \cap A_n].$$
Moreover, this is an alternating sum, which means that if we take only the first $k$ summands of the right-hand side, then this upper bounds the left-hand side if $k$ is odd, and lower bounds it if $k$ is even.
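To see the alternating behavior of the truncated sums concretely, the following sketch evaluates the first $k$ summands of Claim A.1 for three overlapping events on a 16-outcome space. The particular events and the helper name `partial_sum` are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product([0, 1], repeat=4))   # uniform space {0,1}^4, 16 outcomes

def pr(outcomes):
    return Fraction(len(outcomes), len(omega))

# Three overlapping events, each represented as a set of outcomes.
A = [
    {w for w in omega if w[0] == 1},      # first coin is 1
    {w for w in omega if w[0] == w[1]},   # first two coins agree
    {w for w in omega if sum(w) >= 3},    # at least three coins are 1
]

def partial_sum(events, k):
    """First k summands of Claim A.1: sum over 1 <= |S| <= k of (-1)^(|S|+1) Pr[intersection of S]."""
    total = Fraction(0)
    for size in range(1, k + 1):
        for S in combinations(events, size):
            total += (-1) ** (size + 1) * pr(set.intersection(*S))
    return total

exact = pr(set.union(*A))
for k in range(1, len(A) + 1):
    print(k, partial_sum(A, k), exact)
```

For these events the exact value is 13/16, while the partial sums are 21/16 for $k = 1$ (an upper bound), 5/8 for $k = 2$ (a lower bound) and 13/16 for $k = 3$, where the full sum is exact.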

We sometimes use the following corollary of Claim A.1, known as the Bonferroni inequality:
Corollary A.2 For all events $A_1, \ldots, A_n$,
$$\Pr\Big[\bigcup_{i=1}^n A_i\Big] \ge \sum_{i=1}^n \Pr[A_i] - \sum_{1 \le i < j \le n} \Pr[A_i \cap A_j].$$

A.2.1 Random variables and expectations.


A random variable is a mapping from a probability space to $\mathbb{R}$. For example, if $\Omega$ is as above (i.e., the set of all possible outcomes of $n$ tosses of a fair coin), then we can denote by $X$ the number of coins that came up heads.
The expectation of a random variable $X$, denoted by $E[X]$, is its weighted average. That is, $E[X] = \sum_{i=1}^N p_i X(\omega_i)$. The following simple claim follows from the definition:
Claim A.3 (Linearity of expectation) For $X, Y$ random variables over a space $\Omega$, denote by $X + Y$ the random variable that maps $\omega$ to $X(\omega) + Y(\omega)$. Then,
$$E[X + Y] = E[X] + E[Y].$$


This claim implies that the random variable $X$ from the example above has expectation $n/2$. Indeed, $X = \sum_{i=1}^n X_i$ where $X_i$ is equal to 1 if the $i$th coin came up heads and is equal to 0 otherwise. But clearly, $E[X_i] = 1/2$ for every $i$.
For a real number $\alpha$ and a random variable $X$, we define $\alpha X$ to be the random variable mapping $\omega$ to $\alpha \cdot X(\omega)$. Note that $E[\alpha X] = \alpha E[X]$.
Example A.4
Suppose that we choose $k$ random numbers $x_1, \ldots, x_k$ independently in $[n]$. What is the expected number of collisions: unordered pairs $\{i, j\}$ such that $x_i = x_j$? For every $i \neq j$, define the random variable $Y_{i,j}$ to equal 1 if $x_i = x_j$ and 0 otherwise. Since for every choice of $x_i$, the probability that $x_j = x_i$ is $1/n$, we have that $E[Y_{i,j}] = 1/n$. The number of collisions is the sum of $Y_{i,j}$ over all unordered pairs $\{i, j\} \subseteq [k]$. Thus, by linearity of expectation the expected number of collisions is
$$\sum_{1 \le i < j \le k} E[Y_{i,j}] = \binom{k}{2}\frac{1}{n}.$$
This means that we expect at least one collision once $\binom{k}{2} \ge n$, which happens once $k$ is larger than roughly $\sqrt{2n}$. This fact is often known as the birthday paradox because it explains the seemingly strange phenomenon that a class of more than 27 or so students is quite likely to have a pair of students sharing the same birthday, even though there are 365 days in the year.
Note that in contrast, if $k \ll \sqrt{n}$ then by the union bound, the probability there will be even one collision is at most $\binom{k}{2}/n \ll 1$.
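The following sketch compares the expectation computed above with the exact probability of at least one collision, $1 - \prod_{i=0}^{k-1}(n-i)/n$, for $n = 365$. It assumes Python 3.8+ for `math.comb` and `math.prod`; the function names are hypothetical.

```python
from math import comb, prod

def expected_collisions(k, n):
    """E[number of unordered pairs {i, j} with x_i = x_j] = C(k, 2) / n."""
    return comb(k, 2) / n

def prob_some_collision(k, n):
    """Exact Pr[at least one collision] = 1 - n(n-1)...(n-k+1) / n^k."""
    return 1 - prod((n - i) / n for i in range(k))

n = 365
for k in (10, 23, 27, 40):
    print(k, round(expected_collisions(k, n), 3), round(prob_some_collision(k, n), 3))
```

For $k = 23$ the exact collision probability already exceeds 1/2 (about 0.507), and $\binom{k}{2}/n$ first reaches 1 at $k = 28$.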
Notes: (1) We sometimes also consider random variables whose range is not $\mathbb{R}$, but other sets such as $\mathbb{C}$ or $\{0,1\}^n$. (2) Also, we often identify a random variable $X$ over the sample space $\Omega$ with the distribution $X(\omega)$ for $\omega \in_R \Omega$. For example, we may use both $\Pr_{x \in_R X}[x^2 = 1]$ and $\Pr[X^2 = 1]$ to denote the probability that for $\omega \in_R \Omega$, $X(\omega)^2 = 1$.

A.2.2 The averaging argument


The following simple fact can be surprisingly useful:
The Averaging Argument: If $a_1, a_2, \ldots, a_n$ are some numbers whose average is $c$ then some $a_i \ge c$.
Equivalently, we can state this in probabilistic terms as follows:
Lemma A.5 (The Probabilistic Method) If $X$ is a random variable which takes values from a finite set and $E[X] = \mu$ then the event "$X \ge \mu$" has nonzero probability.

The following two facts are also easy to verify:
Lemma A.6 If $a_1, a_2, \ldots, a_n \ge 0$ are numbers whose average is $c$ then the fraction of $a_i$'s that are at least $kc$ is at most $1/k$.
Lemma A.7 (Markov's inequality) Any non-negative random variable $X$ satisfies
$$\Pr[X \ge k\,E[X]] \le \frac{1}{k}.$$

Can we give any meaningful upper bound on the probability that $X$ is much smaller than its expectation? Yes, if $X$ is bounded.
Lemma A.8 If $a_1, a_2, \ldots, a_n$ are numbers in the interval $[0,1]$ whose average is $\rho$ then at least a $\rho/2$ fraction of the $a_i$'s are at least as large as $\rho/2$.


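Proof: Let $\gamma$ be the fraction of $i$'s such that $a_i \ge \rho/2$. Then the average of the $a_i$'s is bounded by $\gamma \cdot 1 + (1 - \gamma)\rho/2$. Hence, $\rho \le \gamma + (1 - \gamma)\rho/2 \le \gamma + \rho/2$, implying $\gamma \ge \rho/2$. □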
More generally, we have
Lemma A.9 If $X \in [0,1]$ and $E[X] = \mu$ then for any $c < 1$ we have
$$\Pr[X \le c\mu] \le \frac{1 - \mu}{1 - c\mu}.$$
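Lemma A.9 follows from Markov's inequality; the text leaves the step implicit, so here is a short derivation sketch. Since $X \in [0,1]$, the random variable $1 - X$ is non-negative with $E[1 - X] = 1 - \mu$, and $1 - c\mu > 0$ because $c < 1$ and $\mu \le 1$. Hence
$$\Pr[X \le c\mu] = \Pr[1 - X \ge 1 - c\mu] \le \frac{E[1 - X]}{1 - c\mu} = \frac{1 - \mu}{1 - c\mu}.$$
For instance, with exam scores rescaled to $[0,1]$ as in Example A.10, $\mu = 0.9$ and $c\mu = 0.8$ give $\Pr[X \le 0.8] \le 0.1/0.2 = 1/2$.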

Example A.10
Suppose you took a lot of exams, each scored from 1 to 100. If your average
score was 90 then in at least half the exams you scored at least 80.

A.2.3 Conditional probability and independence


If we already know that an event $B$ happened, this reduces the space from $\Omega$ to $B$, where we need to scale the probabilities by $1/\Pr[B]$ so they will sum up to one. Thus, the probability of an event $A$ conditioned on an event $B$, denoted $\Pr[A|B]$, is equal to $\Pr[A \cap B]/\Pr[B]$ (where we always assume that $B$ has positive probability).
We say that two events $A, B$ are independent if $\Pr[A \cap B] = \Pr[A]\Pr[B]$. Note that this implies that $\Pr[A|B] = \Pr[A]$ and $\Pr[B|A] = \Pr[B]$. We say that a set of events $A_1, \ldots, A_n$ are mutually independent if for every subset $S \subseteq [n]$,
$$\Pr\Big[\bigcap_{i \in S} A_i\Big] = \prod_{i \in S} \Pr[A_i]. \qquad (2)$$
We say that $A_1, \ldots, A_n$ are $k$-wise independent if (2) holds for every $S \subseteq [n]$ with $|S| \le k$.
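A standard example separating the two notions (our illustration, not taken from the text above) uses two fair coins $b_1, b_2$ and the three events $b_1 = 1$, $b_2 = 1$ and $b_1 \oplus b_2 = 1$: any two of them are independent, but all three can never occur together. The sketch below checks equation (2) for every index set by enumeration; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product([0, 1], repeat=2))   # two fair coins, uniform space
events = [
    lambda b: b[0] == 1,                  # A1: first coin is 1
    lambda b: b[1] == 1,                  # A2: second coin is 1
    lambda b: (b[0] ^ b[1]) == 1,         # A3: the XOR of the two coins is 1
]

def pr(pred):
    return Fraction(sum(1 for b in omega if pred(b)), len(omega))

def satisfies_eq2(S):
    """Check equation (2) for the index set S: Pr[intersection] == product of Pr[A_i]."""
    lhs = pr(lambda b: all(events[i](b) for i in S))
    rhs = 1
    for i in S:
        rhs *= pr(events[i])
    return lhs == rhs

def is_k_wise_independent(k):
    return all(satisfies_eq2(S) for size in range(1, k + 1)
               for S in combinations(range(len(events)), size))

print(is_k_wise_independent(2), is_k_wise_independent(3))   # True False
```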
We say that two random variables $X, Y$ are independent if for every $x, y \in \mathbb{R}$, the events $\{X = x\}$ and $\{Y = y\}$ are independent. We generalize similarly the definition of mutual independence and $k$-wise independence to sets of random variables $X_1, \ldots, X_n$. We have the following claim:
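Claim A.11 If $X_1, \ldots, X_n$ are mutually independent then
$$E[X_1 \cdots X_n] = \prod_{i=1}^n E[X_i].$$
Proof:
$$\begin{aligned}
E[X_1 \cdots X_n] &= \sum_x x \Pr[X_1 \cdots X_n = x] \\
&= \sum_{x_1, \ldots, x_n} x_1 \cdots x_n \Pr[X_1 = x_1 \text{ and } X_2 = x_2 \text{ and } \cdots \text{ and } X_n = x_n] \\
&= \sum_{x_1, \ldots, x_n} x_1 \cdots x_n \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n] \qquad \text{(by independence)} \\
&= \Big(\sum_{x_1} x_1 \Pr[X_1 = x_1]\Big)\Big(\sum_{x_2} x_2 \Pr[X_2 = x_2]\Big) \cdots \Big(\sum_{x_n} x_n \Pr[X_n = x_n]\Big) = \prod_{i=1}^n E[X_i],
\end{aligned}$$
where the sums above are over all the possible real numbers that can be obtained by applying the random variables or their products to the finite set $\Omega$. □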

A.2.4 Deviation upper bounds


Under various conditions, one can give better upper bounds on the probability of a random variable straying too far from its expectation. These upper bounds are usually derived by clever use of Markov's inequality.
The variance of a random variable $X$ is defined to be $\mathrm{Var}[X] = E[(X - E[X])^2]$. Note that since it is the expectation of a non-negative random variable, $\mathrm{Var}[X]$ is always non-negative. Also, using linearity of expectation, we can derive that $\mathrm{Var}[X] = E[X^2] - (E[X])^2$. The standard deviation of a variable $X$ is defined to be $\sqrt{\mathrm{Var}[X]}$.
The first bound is Chebyshev's inequality, useful when only the variance is known.
Lemma A.12 (Chebyshev inequality) If $X$ is a random variable with standard deviation $\sigma$, then for every $k > 0$,
$$\Pr[|X - E[X]| > k\sigma] \le 1/k^2.$$
Proof: Apply Markov's inequality to the random variable $(X - E[X])^2$, noting that by definition of variance, $E[(X - E[X])^2] = \sigma^2$. □
Chebyshev's inequality is often useful in the case that $X$ is equal to $\sum_{i=1}^n X_i$ for pairwise independent random variables $X_1, \ldots, X_n$. This is because of the following claim, which is left as an exercise:
Claim A.13 If $X_1, \ldots, X_n$ are pairwise independent then
$$\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n \mathrm{Var}(X_i).$$
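One common way to obtain many pairwise independent bits from few random bits (a construction we add here for illustration) is to pick a uniform seed $s \in \{0,1\}^r$ and set $X_S = \bigoplus_{i \in S} s_i$ for every nonempty $S \subseteq [r]$. The sketch below computes exact variances for $r = 4$ and confirms Claim A.13 for these $2^r - 1 = 15$ bits; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

r = 4
seeds = list(product([0, 1], repeat=r))                    # uniform seed s in {0,1}^r
masks = [S for S in product([0, 1], repeat=r) if any(S)]   # nonempty subsets of [r] as 0/1 masks

def bit(seed, S):
    """X_S(seed) = XOR of the seed bits selected by S (inner product mod 2)."""
    return sum(s * b for s, b in zip(seed, S)) % 2

def expectation(f):
    return Fraction(sum(f(seed) for seed in seeds), len(seeds))

def variance(f):
    mu = expectation(f)
    return expectation(lambda seed: (f(seed) - mu) ** 2)

def total(seed):
    """X = sum of all 2^r - 1 pairwise independent bits X_S."""
    return sum(bit(seed, S) for S in masks)

print(variance(total))                                               # 15/4
print(sum(variance(lambda seed, S=S: bit(seed, S)) for S in masks))  # 15/4
```

These 15 bits are pairwise but not mutually independent: their sum is 0 with probability 1/16, far more often than for 15 independent fair bits. Chebyshev's inequality still applies to such sums, whereas the Chernoff bound below requires mutual independence.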

The next inequality has many names, and is widely known in theoretical computer science as the Chernoff bound (see also Note 7.11). It considers scenarios of the following type. Suppose we toss a fair coin $n$ times. The expected number of heads is $n/2$. How tightly is this number concentrated? Should we be very surprised if after 1000 tosses we have 625 heads? The bound we present is slightly more general, since it concerns $n$ different coin tosses of possibly different expectations (the expectation of a coin is the probability of obtaining heads; for a fair coin this is 1/2). These are sometimes known as Poisson trials.
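Theorem A.14 (Chernoff bounds) Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables over $\{0,1\}$ (i.e., $X_i$ can be either 0 or 1) and let $\mu = \sum_{i=1}^n E[X_i]$. Then for every $\delta > 0$ (where in (4) we also assume $\delta < 1$),
$$\Pr\Big[\sum_{i=1}^n X_i \ge (1+\delta)\mu\Big] \le \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu}, \qquad (3)$$
$$\Pr\Big[\sum_{i=1}^n X_i \le (1-\delta)\mu\Big] \le \left[\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right]^{\mu}. \qquad (4)$$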

Often, we will only use the following corollary:
Corollary A.15 Under the above conditions, for every $c > 0$,
$$\Pr\Big[\Big|\sum_{i=1}^n X_i - \mu\Big| \ge c\mu\Big] \le 2e^{-\min\{c^2/4,\, c/2\}\mu}.$$
In particular, this probability is bounded by $2^{-\Omega(\mu)}$ (where the constant in the $\Omega$ notation depends on $c$).

Proof: Surprisingly, the Chernoff bound is also proved using the Markov inequality. We only prove the first inequality; the second inequality can be proved similarly. Write $X = \sum_{i=1}^n X_i$ and $p_i = E[X_i] = \Pr[X_i = 1]$, so that $\mu = \sum_i p_i$. We introduce a positive dummy variable $t$, and observe that
$$E[\exp(tX)] = E\Big[\exp\Big(t\sum_i X_i\Big)\Big] = E\Big[\prod_i \exp(tX_i)\Big] = \prod_i E[\exp(tX_i)], \qquad (5)$$


where $\exp(z)$ denotes $e^z$ and the last equality holds because the $X_i$ r.v.'s are independent. Now,
$$E[\exp(tX_i)] = (1 - p_i) + p_i e^t,$$
therefore,
$$\prod_i E[\exp(tX_i)] = \prod_i \big[1 + p_i(e^t - 1)\big] \le \prod_i \exp\big(p_i(e^t - 1)\big) = \exp\Big(\sum_i p_i(e^t - 1)\Big) = \exp\big((e^t - 1)\mu\big), \qquad (6)$$
as $1 + x \le e^x$. Finally, apply Markov's inequality to the random variable $\exp(tX)$, viz.
$$\Pr[X \ge (1+\delta)\mu] = \Pr[\exp(tX) \ge \exp(t(1+\delta)\mu)] \le \frac{E[\exp(tX)]}{\exp(t(1+\delta)\mu)} \le \frac{\exp((e^t - 1)\mu)}{\exp(t(1+\delta)\mu)},$$
using (5), (6) and the fact that $t$ is positive. Since $t$ is a dummy variable, we can choose any positive value we like for it. Simple calculus shows that the right-hand side is minimized for $t = \ln(1+\delta)$, and this leads to the theorem statement. □
So, if all $n$ coin tosses are fair (Heads has probability 1/2) then the probability of seeing $N$ heads where $|N - n/2| > a\sqrt{n}$ is at most $2e^{-a^2/4}$. In particular, applying (3) with $\mu = 500$ and $\delta = 1/4$, the chance of seeing at least 625 heads in 1000 tosses of an unbiased coin is less than $5.3 \times 10^{-7}$.
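The figure $5.3 \times 10^{-7}$ can be reproduced numerically. This sketch evaluates the right-hand side of (3) with $\mu = 500$, $\delta = 1/4$ (computed in log space to avoid overflow) and, for comparison, the exact binomial tail using Python's arbitrary-precision integers (`math.comb`, Python 3.8+). The function names are hypothetical.

```python
from math import comb, exp, log

def chernoff_upper(mu, delta):
    """Right-hand side of (3): [e^delta / (1+delta)^(1+delta)]^mu."""
    return exp(mu * (delta - (1 + delta) * log(1 + delta)))

def exact_upper_tail(n, m):
    """Exact Pr[at least m heads in n fair coin tosses]."""
    return sum(comb(n, k) for k in range(m, n + 1)) / 2 ** n

n = 1000
bound = chernoff_upper(mu=n / 2, delta=0.25)   # threshold (1 + 1/4) * 500 = 625 heads
exact = exact_upper_tail(n, 625)
print(f"Chernoff bound: {bound:.3e}, exact tail: {exact:.3e}")
```

The exact tail comes out several orders of magnitude smaller than the bound, a reminder that (3) is an upper bound rather than an estimate.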

A.2.5 Some other inequalities.


Jensen's inequality.
The following inequality, generalizing the inequality $E[X^2] \ge E[X]^2$, is also often useful:
Lemma A.16 (Jensen's Inequality) A function $f : \mathbb{R} \to \mathbb{R}$ is convex if for every $p \in [0,1]$ and $x, y \in \mathbb{R}$, $f(px + (1-p)y) \le p \cdot f(x) + (1-p) \cdot f(y)$. For every random variable $X$ and convex function $f$, $f(E[X]) \le E[f(X)]$.

Approximating the binomial coefficient
Of special interest is the Binomial random variable $B_n$ denoting the number of coins that come up heads when tossing $n$ fair coins. For every $k$, $\Pr[B_n = k] = 2^{-n}\binom{n}{k}$, where $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ denotes the number of size-$k$ subsets of $[n]$. Clearly, $\binom{n}{k} \le n^k$, but sometimes we will need a better estimate for $\binom{n}{k}$ and use the following approximation:
Claim A.17 For every $n$, $k < n$,
$$\left(\frac{n}{k}\right)^k \le \binom{n}{k} \le \left(\frac{ne}{k}\right)^k.$$
The best approximation can be obtained via Stirling's formula:
Lemma A.18 (Stirling's formula) For every $n$,
$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\frac{1}{12n}}.$$
It can be proven by taking natural logarithms and approximating $\ln n! = \ln(1 \cdot 2 \cdots n) = \sum_{i=1}^n \ln i$ by the integral $\int_1^n \ln x \, dx = n \ln n - n + 1$. It implies the following corollary:
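Corollary A.19 For every $n \in \mathbb{N}$ and $\alpha \in (0,1)$ such that $\alpha n$ is an integer,
$$\binom{n}{\alpha n} = (1 \pm O(n^{-1}))\frac{2^{H(\alpha)n}}{\sqrt{2\pi n \alpha(1-\alpha)}},$$
where $H(\alpha) = \alpha\log(1/\alpha) + (1-\alpha)\log(1/(1-\alpha))$ and the constants hidden in the $O$ notation are independent of both $n$ and $\alpha$.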
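The estimates of Claim A.17, Lemma A.18 and Corollary A.19 are easy to check numerically for moderate $n$. In this sketch the function names are hypothetical, and `log2` is the base-2 logarithm used in $H(\alpha)$:

```python
from math import comb, e, exp, factorial, log2, pi, sqrt

def stirling_bounds(n):
    """Lower and upper bounds on n! from Lemma A.18."""
    base = sqrt(2 * pi * n) * (n / e) ** n
    return base * exp(1 / (12 * n + 1)), base * exp(1 / (12 * n))

def entropy(alpha):
    """Binary entropy H(alpha) with base-2 logarithms."""
    return alpha * log2(1 / alpha) + (1 - alpha) * log2(1 / (1 - alpha))

def binom_estimate(n, alpha):
    """Main term 2^{H(alpha) n} / sqrt(2 pi n alpha (1 - alpha)) of Corollary A.19."""
    return 2 ** (entropy(alpha) * n) / sqrt(2 * pi * n * alpha * (1 - alpha))

for n in (5, 10, 20):
    lo, hi = stirling_bounds(n)
    print(n, lo < factorial(n) < hi)                   # True for each n

n, k = 100, 30
print((n / k) ** k <= comb(n, k) <= (n * e / k) ** k)  # Claim A.17: True
print(comb(200, 100) / binom_estimate(200, 0.5))       # close to 1
```

For $n = 200$ and $\alpha = 1/2$ the ratio printed in the last line is about 0.999, in line with the $1 \pm O(n^{-1})$ factor.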


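More useful estimates.
The following inequalities can be obtained via elementary calculus:
For every $x \ge 1$, $\left(1 - \frac{1}{x}\right)^x \le \frac{1}{e} \le \left(1 - \frac{1}{x+1}\right)^x$.
For every $k$, $\sum_{i=1}^n i^k = \Theta\left(\frac{n^{k+1}}{k+1}\right)$.
For every $k > 1$, $\sum_{n=1}^\infty n^{-k} < O(1)$.
For every $c, \epsilon > 0$, $\sum_{n=1}^\infty \frac{n^c}{(1+\epsilon)^n} < O(1)$.
For every $n$, $\sum_{i=1}^n \frac{1}{i} = \ln n \pm O(1)$.

A.2.6 Statistical distance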
The following notion of when two distributions are close to one another is often very useful.
Definition A.20 (Statistical Distance) Let $\Omega$ be some finite set. For two random variables $X$ and $Y$ with range $\Omega$, their statistical distance (also known as variation distance) is defined as $\Delta(X, Y) = \max_{S \subseteq \Omega}\{|\Pr[X \in S] - \Pr[Y \in S]|\}$.

Some texts use the name total variation distance for the statistical distance. The next lemma gives some useful properties of this distance:
Lemma A.21 Let $X, Y, Z$ be any three distributions taking values in the finite set $\Omega$. Then,
1. $\Delta(X, Y) \in [0,1]$, where $\Delta(X, Y) = 0$ iff $X$ is identical to $Y$.
2. (Triangle inequality) $\Delta(X, Z) \le \Delta(X, Y) + \Delta(Y, Z)$.
3. $\Delta(X, Y) = \frac{1}{2}\sum_{x \in \Omega} |\Pr[X = x] - \Pr[Y = x]|$.
4. $\Delta(X, Y) \ge \epsilon$ iff there is a Boolean function $f : \Omega \to \{0,1\}$ such that $|E[f(X)] - E[f(Y)]| \ge \epsilon$.
5. For every finite set $\Omega'$ and function $f : \Omega \to \Omega'$, $\Delta(f(X), f(Y)) \le \Delta(X, Y)$. (Here $f(X)$ is a distribution on $\Omega'$ obtained by taking a sample of $X$ and applying $f$.)
Note that Item 3 means that $\Delta(X, Y)$ is equal to the $L_1$-distance of $X$ and $Y$ divided by 2 (see Section A.5.4 below). That is, if we think of $X$ as a vector in $\mathbb{R}^{\Omega}$ where $X_\omega = \Pr[X = \omega]$, and define for every vector $v \in \mathbb{R}^{\Omega}$, $|v|_1 = \sum_{\omega \in \Omega} |v_\omega|$, then $\Delta(X, Y) = \frac{1}{2}|X - Y|_1$.

Proof of Lemma A.21: We start with Item 3. For every pair of distributions $X, Y$ over $\{0,1\}^n$, let $S$ be the set of strings $x$ such that $\Pr[X = x] > \Pr[Y = x]$. Then it is easy to see that this choice of $S$ maximizes the quantity $b(S) = \Pr[X \in S] - \Pr[Y \in S]$, and in fact $b(S) = \Delta(X, Y)$, since if we had a set $T$ with $b(T) < -b(S)$ then the complement $\overline{T}$ of $T$ would satisfy $b(\overline{T}) > b(S)$. But,
$$\begin{aligned}
\sum_{x \in \{0,1\}^n} |\Pr[X = x] - \Pr[Y = x]| &= \sum_{x \in S} \big(\Pr[X = x] - \Pr[Y = x]\big) + \sum_{x \notin S} \big(\Pr[Y = x] - \Pr[X = x]\big) \\
&= \Pr[X \in S] - \Pr[Y \in S] + (1 - \Pr[Y \in S]) - (1 - \Pr[X \in S]) \\
&= 2\Pr[X \in S] - 2\Pr[Y \in S],
\end{aligned}$$
establishing Item 3.


The triangle inequality (Item 2) follows immediately from Item 3 since $\Delta(X, Y) = \frac{1}{2}|X - Y|_1$ and the $L_1$ norm satisfies the triangle inequality. Item 3 also implies Item 1 since $|X - Y|_1 = 0$ iff $X = Y$, and $|X - Y|_1 \le |X|_1 + |Y|_1 = 1 + 1$.
Item 4 is just a rephrasing of the definition of statistical distance, identifying a set $S \subseteq \{0,1\}^n$ with the function $f : \{0,1\}^n \to \{0,1\}$ such that $f(x) = 1$ iff $x \in S$. Item 5 follows from Item 4, noting that if $\Delta(X, Y) \le \epsilon$ then $|E[g(f(X))] - E[g(f(Y))]| \le \epsilon$ for every Boolean function $g : \Omega' \to \{0,1\}$. □
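Item 3 gives a practical way to compute $\Delta(X, Y)$. The following sketch compares the definition, a maximum over all $2^{|\Omega|}$ subsets, with half the $L_1$ distance. Distributions are represented as dictionaries from outcomes to probabilities, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

def stat_distance_bruteforce(X, Y):
    """Definition A.20: max over all subsets S of |Pr[X in S] - Pr[Y in S]|.

    X and Y map outcomes to probabilities; missing keys have probability 0.
    """
    omega = sorted(set(X) | set(Y))
    all_subsets = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
    return max(abs(sum(X.get(x, 0) - Y.get(x, 0) for x in S)) for S in all_subsets)

def stat_distance_l1(X, Y):
    """Item 3 of Lemma A.21: half the L1 distance between the probability vectors."""
    omega = set(X) | set(Y)
    return Fraction(1, 2) * sum(abs(X.get(x, 0) - Y.get(x, 0)) for x in omega)

F = Fraction
X = {"00": F(1, 2), "01": F(1, 4), "10": F(1, 4)}
Y = {"00": F(1, 4), "01": F(1, 4), "11": F(1, 2)}
print(stat_distance_bruteforce(X, Y), stat_distance_l1(X, Y))   # 1/2 1/2
```

The brute-force version takes exponential time in $|\Omega|$, while the $L_1$ formula is linear; as in the proof above, the maximizing set is $S = \{x : \Pr[X = x] > \Pr[Y = x]\}$, here $\{00, 10\}$.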

A.3 Number theory and groups


The integers are the set $\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$ while the natural numbers are the subset $\mathbb{N} = \{0, 1, 2, \ldots\}$.¹ A basic fact is that we can divide any integer $n$ by a nonzero integer $k$ to obtain $\ell, r$ such that $n = k\ell + r$ and $r \in \{0, \ldots, |k| - 1\}$. If $r = 0$ then we say that $k$ divides $n$ and denote this by $k | n$. The factors of $n$ are the set of positive integers that divide $n$.
The greatest common divisor of two integers $n, m$, denoted by $\gcd(n, m)$, is the largest integer $d$ such that $d | n$ and $d | m$. We say that $n$ and $m$ are co-prime if their greatest common divisor is equal to 1. The following basic facts are not hard to verify:
If a nonzero integer $c$ divides both $n$ and $m$ then $c | \gcd(n, m)$.
The greatest common divisor of $n$ and $m$ is the smallest positive integer $d$ such that there exist integers $x, y$ satisfying $nx + my = d$.
There is a polynomial-time (i.e., $\mathrm{polylog}(n, m)$-time) algorithm that on input $n, m$ outputs the greatest common divisor $d$ of $n, m$ and the integers $x, y$ satisfying $nx + my = d$. (This algorithm is known as Euclid's Algorithm.)
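A minimal iterative version of the extended Euclid algorithm mentioned above, as a sketch (the name `extended_gcd` is ours). It maintains the invariant that every remainder is an integer combination of $n$ and $m$, so the last nonzero remainder comes with the coefficients $x, y$:

```python
def extended_gcd(n, m):
    """Return (d, x, y) with d = gcd(n, m) >= 0 and n*x + m*y == d.

    Invariant: old_r == n*old_x + m*old_y and r == n*x + m*y.
    """
    old_r, r = n, m
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                      # normalize the sign so that d >= 0
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

d, x, y = extended_gcd(240, 46)
print(d, x, y, 240 * x + 46 * y)   # 2 -9 47 2
```

Since the remainders shrink by at least half every two iterations, the loop runs $O(\log \min(|n|, |m|))$ times, i.e., polynomially in the number of input bits.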
A number p > 1 is prime if its only factors are 1 and p. The following basic facts are
known about prime numbers:
Every positive integer n can be written uniquely (up to ordering) as a product of prime
numbers. This is called the prime factorization of n.
If $\gcd(p, a) = 1$ and $p | ab$ then $p | b$. In particular, if a prime $p$ divides $ab$ then either $p | a$ or $p | b$.
A fundamental question in number theory is how many primes exist. A celebrated result is:
Theorem A.22 (The Prime Number Theorem (Hadamard, de la Vallée Poussin 1896)) For $n > 1$, let $\pi(n)$ denote the number of primes between 1 and $n$. Then
$$\pi(n) = \frac{n}{\ln n}(1 \pm o(1)).$$
The original proofs of the prime number theorem used rather deep mathematical tools, and in fact people have conjectured that this is inherently the case. But in 1949 both Erdős and Selberg (independently) found elementary proofs for this theorem. For most computer science applications, the following weaker statement proven by Chebyshev suffices:
Theorem A.23 $\pi(n) = \Theta\left(\frac{n}{\log n}\right)$



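Proof: Consider the number $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$. By Stirling's formula we know that $\log\binom{2n}{n} = (1 - o(1))2n$, and in particular $n \le \log\binom{2n}{n} \le 2n$. Also, all the prime factors of $\binom{2n}{n}$ are between 0 and $2n$, and each factor $p$ cannot appear more than $k = \left\lfloor \frac{\log 2n}{\log p} \right\rfloor$ times. Indeed, for every $n$, the number of times $p$ appears in the factorization of $n!$ is $\sum_i \left\lfloor \frac{n}{p^i} \right\rfloor$, since we get $\left\lfloor \frac{n}{p} \right\rfloor$ times a factor $p$ in the factorizations of $\{1, \ldots, n\}$, $\left\lfloor \frac{n}{p^2} \right\rfloor$ times a factor of the form $p^2$, etc. Thus the number of times $p$ appears in the factorization of $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$ is equal to $\sum_i \left( \left\lfloor \frac{2n}{p^i} \right\rfloor - 2\left\lfloor \frac{n}{p^i} \right\rfloor \right)$: a sum of at most $k$ elements (since $p^{k+1} > 2n$), each of which is either 0 or 1.
Thus, $\binom{2n}{n} \le \prod_{1 \le p \le 2n,\ p \text{ prime}} p^{\lfloor \log 2n / \log p \rfloor}$. Taking logs we get that
$$n \le \log\binom{2n}{n} \le \sum_{1 \le p \le 2n,\ p \text{ prime}} \left\lfloor \frac{\log 2n}{\log p} \right\rfloor \log p \le \sum_{1 \le p \le 2n,\ p \text{ prime}} \log 2n = \pi(2n) \log 2n,$$
establishing $\pi(n) = \Omega\left(\frac{n}{\log n}\right)$.
To prove that $\pi(n) = O\left(\frac{n}{\log n}\right)$, we define the function $\theta(n) = \sum_{1 \le p \le n,\ p \text{ prime}} \log p$. It suffices to prove that $\theta(n) = O(n)$ (exercise!). But since all the primes between $n+1$ and $2n$ divide $\binom{2n}{n}$ at least once, $\prod_{n+1 \le p \le 2n,\ p \text{ prime}} p \le \binom{2n}{n}$. Taking logs we get
$$2n \ge \log\binom{2n}{n} \ge \sum_{n+1 \le p \le 2n,\ p \text{ prime}} \log p = \theta(2n) - \theta(n),$$
thus getting a recursive equation $\theta(2n) \le \theta(n) + 2n$, which solves to $\theta(n) = O(n)$. □

¹ Some texts exclude 0 from $\mathbb{N}$; in most cases this does not make any difference.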
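The counting in the proof can be checked on small cases. This sketch uses the sieve of Eratosthenes (our choice of method, not part of the text) to compute $\pi(n)$, compares it with $n/\ln n$, and recomputes the exponent of each prime in $\binom{2n}{n}$ via $\sum_i (\lfloor 2n/p^i\rfloor - 2\lfloor n/p^i\rfloor)$. It assumes Python 3.8+ for `math.comb`, `math.prod` and `math.isqrt`; the function names are hypothetical.

```python
from math import comb, isqrt, log, prod

def primes_up_to(n):
    """Sieve of Eratosthenes: the list of primes p <= n."""
    is_prime = [True] * (n + 1)
    for i in range(min(2, n + 1)):
        is_prime[i] = False              # 0 and 1 are not prime
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return [p for p in range(n + 1) if is_prime[p]]

def exponent_in_central_binomial(p, n):
    """Exponent of the prime p in C(2n, n): sum over i of floor(2n/p^i) - 2*floor(n/p^i)."""
    total, q = 0, p
    while q <= 2 * n:
        total += (2 * n) // q - 2 * (n // q)
        q *= p
    return total

for n in (10**3, 10**4, 10**5):
    pi_n = len(primes_up_to(n))
    print(n, pi_n, round(pi_n / (n / log(n)), 3))

# Rebuild C(100, 50) from the exponents given by the formula used in the proof.
print(prod(p ** exponent_in_central_binomial(p, 50) for p in primes_up_to(100)) == comb(100, 50))
```

The printed ratios are about 1.16, 1.13 and 1.10, consistent with Theorem A.23 and slowly approaching 1 as the prime number theorem predicts.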

A.3.1 Groups.
A group is an abstraction that captures some properties of mathematical objects such as the integers, matrices, functions and more. Formally, a group is a set that has a binary operation, say $\star$, defined on it that is associative and has an identity element and inverses. That is, $(G, \star)$ is a group if
1. For every $a, b, c \in G$, $(a \star b) \star c = a \star (b \star c)$.
2. There exists a special element $\mathrm{id} \in G$ such that $a \star \mathrm{id} = a$ for every $a \in G$, and for every $a \in G$ there exists $b \in G$ such that $a \star b = b \star a = \mathrm{id}$. (This element $b$ is called the inverse of $a$, and is often denoted as $a^{-1}$ or $-a$.)
Examples of groups are the integers, with addition being the group operation (and zero the identity element), the non-zero real numbers with multiplication being the group operation (and one the identity element), and the set of one-to-one and onto functions (permutations) from a domain $A$ to itself, with function composition being the group operation.
Often, it is natural to use additive ($+$) or multiplicative ($\cdot$) notation to denote the group operation rather than $\star$. In these cases we will use $\ell a$ (or respectively $a^\ell$) to denote the result of applying the operation to $a$ $\ell$ times.

A.3.2 Finite groups
A group is finite if it has a finite number of elements. We denote by $|G|$ the number of elements of $G$. Examples of finite groups are the following:
The group $\mathbb{Z}_n$ of the integers from 0 to $n-1$ with the operation being addition modulo $n$. In particular, $\mathbb{Z}_2$ is the set $\{0,1\}$ with the XOR operation.


The group $S_n$ of the permutations on $[n]$, with the operation being function composition.
The group $(\mathbb{Z}_2)^n$ of $n$-bit strings with the operation being bitwise XOR. More generally, for every two groups $G$ and $H$, we can define the group $G \times H$ to be a group whose elements are pairs $\langle g, h \rangle$ with $g \in G$ and $h \in H$ and with the group operation corresponding to applying the group operations of $G$ and $H$ componentwise. Similarly, we define $G^n$ to be the group $G \times G \times \cdots \times G$ ($n$ times).
For every $n$, the group $\mathbb{Z}_n^*$ consists of the set $\{k : 1 \le k \le n-1,\ \gcd(k, n) = 1\}$ and the operation of multiplication modulo $n$. Note that if $\gcd(k, n) = 1$ then there exist $x, y$ such that $kx + ny = 1$, or in other words $kx = 1 \pmod n$, meaning that $x$ is the inverse of $k$ modulo $n$. This also means that we can find this inverse in polynomial time using Euclid's algorithm. The size of $\mathbb{Z}_n^*$ is denoted by $\varphi(n)$, and the function $\varphi$ is known as Euler's totient function. Note that if $n$ is prime then $\varphi(n) = n - 1$. It is known that for every $n > 6$, $\varphi(n) \ge \sqrt{n}$.
A subgroup of $G$ is a subset of $G$ that is itself a group (i.e., closed under the group operation and taking inverses). The following result is often quite useful:
Theorem A.24 If $G$ is a finite group and $H$ is a subgroup of $G$ then $|H|$ divides $|G|$.
Proof: Consider the family of sets of the form $aH = \{ah : h \in H\}$ for all $a \in G$ (we're using here multiplicative notation for the group). It is easy to see that the map $x \mapsto ax$ is one-to-one and hence $|aH| = |H|$ for every $a$. Hence it will suffice to show that we can partition $G$ into disjoint sets from this family. Yet this family clearly covers $G$ (as $a \in aH$ for every $a \in G$) and hence it suffices to show that for every $a, b$ either $aH = bH$ or $aH$ and $bH$ are disjoint. Indeed, suppose that there exist $x, y \in H$ such that $ax = by$; then for every element $az \in aH$, we have that $az = (byx^{-1})z$, and since $yx^{-1}z \in H$ we get that $az \in bH$. Thus $aH \subseteq bH$, and by the symmetric argument $bH \subseteq aH$. □

Corollary A.25 (Fermat's Little Theorem) For every $n$ and $x \in \mathbb{Z}_n^*$, $x^{\varphi(n)} = 1 \pmod n$. In particular, if $n$ is prime then $x^{n-1} = 1 \pmod n$ for every $x \in \{1, \ldots, n-1\}$.
Proof: Consider the set $H = \{x^\ell : \ell \in \mathbb{Z}\}$. This is clearly a subgroup of $\mathbb{Z}_n^*$ and hence $|H|$ divides $\varphi(n)$. But the size of $H$ is simply the smallest number $k$ such that $x^k = 1 \pmod n$. Indeed, there must be such a number since, because $x \in \mathbb{Z}_n^*$, if we consider the sequence of numbers $1, x, x^2, x^3, \ldots$ then eventually we get $i, j$ such that $x^i = x^j$ for $i < j$, meaning that $x^{j-i} = 1 \pmod n$. Thus, the above sequence looks like $1, x, x^2, \ldots, x^{k-1}, 1, x, x^2, \ldots$, meaning that $|H| = k$.
Since $x^{|H|} = 1 \pmod n$, obviously taking $x$ to the power $\varphi(n)$ (which is a multiple of $|H|$) yields also 1 modulo $n$. □

The order of an element $x$ of a group $G$ is the smallest positive integer $k$ such that $x^k$ is equal to the identity element. The proof above shows that in a finite group $G$, every element has a finite order and furthermore this order divides the size of $G$. An element $x$ of $G$ with order $|G|$ is called a generator of $G$, since in this case the subgroup $\{x^0, x^1, x^2, \ldots\}$ is all of $G$.² If a group $G$ has a generator then we say that $G$ is cyclic. An example of a simple cyclic group is the group $\mathbb{Z}_n$ of the numbers $\{0, \ldots, n-1\}$ with addition modulo $n$, which is generated by the element 1 (and also by any other element that is co-prime to $n$; exercise).
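The following sketch checks Theorem A.24 and Corollary A.25 on $\mathbb{Z}_n^*$ for small $n$, computing the order of each element by repeated multiplication. The helper names `units`, `order` and `generators` are hypothetical.

```python
from math import gcd

def units(n):
    """Elements of Z_n^*: the k in {1, ..., n-1} with gcd(k, n) = 1."""
    return [k for k in range(1, n) if gcd(k, n) == 1]

def order(x, n):
    """Smallest k >= 1 with x^k = 1 (mod n); assumes gcd(x, n) = 1 and n >= 2."""
    k, power = 1, x % n
    while power != 1:
        power = (power * x) % n
        k += 1
    return k

def generators(n):
    """Elements of Z_n^* whose order equals phi(n) = |Z_n^*|."""
    phi = len(units(n))
    return [x for x in units(n) if order(x, n) == phi]

n = 15
phi = len(units(n))                                       # phi(15) = 8
print(all(phi % order(x, n) == 0 for x in units(n)))    # Theorem A.24: orders divide |G|
print(all(pow(x, phi, n) == 1 for x in units(n)))       # Corollary A.25
print(generators(7), generators(15))                     # [3, 5] []
```

For example, $\mathbb{Z}_{15}^*$ has 8 elements but every element has order at most 4, so it has no generator, while $\mathbb{Z}_7^*$ is cyclic and generated by 3 and by 5.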

A.3.3 The Chinese Remainder Theorem
Let $n = pq$ where $p, q$ are co-prime. The Chinese Remainder Theorem (CRT) says that the group $\mathbb{Z}_n^*$ (multiplicative group modulo $n$) is isomorphic to the group $\mathbb{Z}_p^* \times \mathbb{Z}_q^*$ (pairs of numbers with multiplication done componentwise modulo $p$ and $q$ respectively).


² A more general definition (that works also for infinite groups) is that $x$ is a generator of $G$ if the subgroup $\{x^\ell : \ell \in \mathbb{Z}\}$ is equal to $G$.

Theorem A.26 If $n = pq$ where $p, q$ are coprime, then the function $f$ that maps $x$ to $\langle x \pmod p, x \pmod q \rangle$ is one-to-one on $\mathbb{Z}_n^*$. Furthermore, $f$ is an isomorphism in the sense that $f(xy) = f(x)f(y)$ (where multiplication on the left-hand side is modulo $n$ and on the right-hand side is componentwise modulo $p$ and $q$ respectively).

Proof: The furthermore part can be easily verified and so we focus on showing that $f$ is one-to-one. We need to show that if $f(x) = f(x')$ then $x = x'$. Since $f(x - x') = f(x) - f(x')$ (extending $f$ to all of $\{0, \ldots, n-1\}$, with subtraction modulo $n$ on the left and componentwise on the right), it suffices to show that if $x = 0 \pmod p$ (i.e., $p | x$) and $x = 0 \pmod q$ (i.e., $q | x$) then $x = 0 \pmod n$ (i.e., $pq | x$). Indeed, assume that $p | x$ and write $x = pk$. Then since $\gcd(p, q) = 1$ and $q | x$, we know that $q | k$, meaning that $pq | x$. □
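A sketch of the map $f$ for $p = 5$, $q = 7$. Inverting $f$ is not part of the theorem as stated; the reconstruction formula below is a standard one that we add for illustration, and it relies on Python 3.8+'s `pow(p, -1, q)` for the inverse of $p$ modulo $q$. The function names are hypothetical.

```python
from math import gcd

def crt_map(x, p, q):
    """f(x) = <x mod p, x mod q> from Theorem A.26."""
    return (x % p, x % q)

def crt_inverse(a, b, p, q):
    """The unique x in {0, ..., pq-1} with x = a (mod p) and x = b (mod q); needs gcd(p, q) = 1."""
    t = ((b - a) * pow(p, -1, q)) % q     # pick t with a + p*t = b (mod q)
    return (a + p * t) % (p * q)

p, q = 5, 7
n = p * q
units = [x for x in range(1, n) if gcd(x, n) == 1]             # Z_35^*
print(len({crt_map(x, p, q) for x in units}) == len(units))    # True: f is one-to-one
print(crt_inverse(2, 3, p, q))                                 # 17
```

Indeed $17 = 2 \pmod 5$ and $17 = 3 \pmod 7$.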
The Chinese Remainder Theorem can be easily generalized as follows. For every $n = p_1 p_2 \cdots p_k$ where the $p_i$'s are pairwise co-prime, there is an isomorphism between $\mathbb{Z}_n^*$ and $\mathbb{Z}_{p_1}^* \times \cdots \times \mathbb{Z}_{p_k}^*$. Consequently, for every n, the group $\mathbb{Z}_n^*$ is isomorphic to a product of groups of the form $\mathbb{Z}_q^*$ for q a prime power (i.e., a number of the form $p^\ell$ for prime p). In fact, it can be generalized even further to show that every finite Abelian group G is isomorphic to a product $G_1 \times G_2 \times \cdots \times G_k$ where all the $G_i$'s are cyclic. (This can be viewed as a generalization of the CRT because all the groups of the form $\mathbb{Z}_q^*$ for q a power of an odd prime are cyclic, and all groups of the form $\mathbb{Z}_{2^k}^*$ are either cyclic or products of two cyclic groups.)
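The parenthetical remark about prime powers can be spot-checked by brute force. The Python sketch below uses hypothetical helper names. It tests whether $\mathbb{Z}_n^*$ contains an element of order $|\mathbb{Z}_n^*|$. It only tests cyclicity and does not exhibit the product decomposition for powers of 2.

```python
from math import gcd

def mult_order(x: int, n: int) -> int:
    """Order of x in Z_n^* (x must be co-prime to n, and n >= 2)."""
    k, acc = 1, x % n
    while acc != 1:
        acc = acc * x % n
        k += 1
    return k

def units_cyclic(n: int) -> bool:
    """Brute force: does Z_n^* contain an element whose order is |Z_n^*|?"""
    units = [x for x in range(1, n) if gcd(x, n) == 1]
    return any(mult_order(x, n) == len(units) for x in units)

print([n for n in (9, 25, 27, 49) if units_cyclic(n)])   # [9, 25, 27, 49]
print([units_cyclic(2 ** k) for k in range(1, 7)])       # [True, True, False, False, False, False]
```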

A.4 Finite fields
A field is a set $\mathbb{F}$ that has addition (+) and multiplication (·) operations that behave in the expected way. They satisfy the associative, commutative and distributive laws. They have additive inverses, and multiplicative inverses for nonzero elements. They also have neutral elements 0 and 1 for addition and multiplication respectively. In other words, $\mathbb{F}$ is a field if it is an Abelian group with the operation + and identity element 0, and it has an additional operation · such that $\mathbb{F} \setminus \{0\}$ forms an Abelian group under ·. Furthermore, the two operations must satisfy the distributive rule a(b + c) = ab + ac.
Familiar fields are the real numbers ($\mathbb{R}$), the rational numbers ($\mathbb{Q}$) and the complex numbers ($\mathbb{C}$), but there are also finite fields. Recall that for a prime p, the set $\{0, \ldots, p-1\}$ is an Abelian group with the addition modulo p operation, and the set $\{1, \ldots, p-1\}$ is an Abelian group with the multiplication modulo p operation. Hence $\{0, \ldots, p-1\}$ forms a field with these two operations, which we denote by GF(p). The simplest example of such a field is GF(2), consisting of {0, 1}, where multiplication is the AND (∧) operation and addition is the XOR (⊕) operation.
Every finite field $\mathbb{F}$ has a number ℓ such that for every $x \in \mathbb{F}$, $x + x + \cdots + x$ (ℓ times) is equal to the zero element of $\mathbb{F}$ (exercise). The smallest such positive number ℓ is called the characteristic of $\mathbb{F}$. For every prime q, the characteristic of GF(q) is equal to q.
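In GF(p) every nonzero x has inverse $x^{p-2}$. This holds because the order of x divides $|\mathrm{GF}(p)^*| = p - 1$ by the group argument above, so $x^{p-1} = 1$. The Python sketch below, with illustrative function names, uses this to invert elements. It also computes the characteristic of GF(p) by repeatedly adding 1; by distributivity, ℓ·1 = 0 implies ℓ·x = 0 for every x.

```python
def gf_inverse(x: int, p: int) -> int:
    """Inverse of x in GF(p), p prime: x^(p-2), since x^(p-1) = 1 for x != 0."""
    if x % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(x, p - 2, p)

def characteristic(p: int) -> int:
    """Smallest ell >= 1 such that 1 + 1 + ... + 1 (ell times) is 0 modulo p."""
    ell, acc = 1, 1 % p
    while acc != 0:
        acc = (acc + 1) % p
        ell += 1
    return ell

p = 7
print([gf_inverse(x, p) for x in range(1, p)])   # [1, 4, 5, 2, 3, 6]
print(characteristic(p))                          # 7
# GF(2): addition is XOR and multiplication is AND.
print(all((a + b) % 2 == a ^ b and a * b % 2 == a & b for a in (0, 1) for b in (0, 1)))  # True
```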

A.4.1 Non-prime fields.
One can see that if n is not prime, then the set $\{0, \ldots, n-1\}$ with addition and multiplication modulo n is not a field, as there exist two non-zero elements x, y in this set such that $x \cdot y = n \equiv 0 \pmod n$. Nevertheless, there are finite fields of size n for some non-prime n. Specifically, for every prime q and $k \ge 1$, there exists a field of $q^k$ elements, which we denote by $\mathrm{GF}(q^k)$. We will very rarely need to use such fields in this book, but we still provide an outline of their construction below.
For every prime q and $k \ge 1$ there exists an irreducible degree-k polynomial P over the field GF(q). (P is irreducible if it cannot be expressed as the product of two polynomials P′, P″ of lower degree.) We then let $\mathrm{GF}(q^k)$ be the set of all polynomials of degree at most k − 1 over GF(q). Each such polynomial can be represented as a vector of its k coefficients. We perform both addition and multiplication modulo the polynomial P. Note that addition corresponds to standard vector addition of k-dimensional vectors over GF(q). Both addition and multiplication can be easily done in poly(k, log q) time, since we can reduce a polynomial S modulo a polynomial P using an algorithm similar to long division of numbers. It turns out that no matter how we choose the irreducible polynomial P, we will get the same field, up to renaming of the elements. There is a deterministic poly(q, k)-time algorithm to obtain an irreducible polynomial of degree k over GF(q). There are also probabilistic algorithms (and deterministic algorithms whose analysis relies on unproven assumptions) that obtain such a polynomial in poly(log q, k) time (see the book [Sho05]).
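The construction above can be sketched in a few lines of Python. Elements of $\mathrm{GF}(q^k)$ are coefficient lists of length k (lowest degree first), and products are reduced by long division by a monic irreducible P. As an assumed example we take q = 3 and $P(x) = x^2 + 1$. This P is irreducible over GF(3) because it has no roots there, and a reducible degree-2 polynomial would have a linear factor. The sketch uses schoolbook multiplication, i.e., $O(k^2)$ operations in GF(q).

```python
from itertools import product

def poly_mod(s: list, P: list, q: int) -> list:
    """Remainder of s modulo monic P over GF(q); coefficient lists, lowest degree first."""
    s = [c % q for c in s]
    k = len(P) - 1                                # deg P
    for top in range(len(s) - 1, k - 1, -1):      # long division, highest degree first
        c = s[top]
        if c:
            for i in range(k + 1):                # subtract c * x^(top-k) * P
                s[top - k + i] = (s[top - k + i] - c * P[i]) % q
    return (s + [0] * k)[:k]                      # exactly k coefficients

def gf_add(a: list, b: list, q: int) -> list:
    """Addition in GF(q^k) is vector addition over GF(q)."""
    return [(x + y) % q for x, y in zip(a, b)]

def gf_mul(a: list, b: list, P: list, q: int) -> list:
    """Multiply two elements of GF(q^k), each a list of k coefficients."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % q
    return poly_mod(prod, P, q)

q, P = 3, [1, 0, 1]                               # P(x) = x^2 + 1 over GF(3)
elems = [list(e) for e in product(range(q), repeat=2)]
nonzero = [e for e in elems if any(e)]
print(gf_mul([0, 1], [0, 1], P, q))               # [2, 0]: x * x = -1 = 2
print(all(any(gf_mul(a, b, P, q) == [1, 0] for b in elems) for a in nonzero))  # True
# With the reducible x^2 + 2 = (x + 1)(x + 2) we get zero divisors instead:
print(gf_mul([1, 1], [2, 1], [2, 0, 1], q))       # [0, 0]
```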
For us, the most important example of a finite field is $\mathrm{GF}(2^k)$, which consists of the set $\{0, 1\}^k$. Addition is component-wise XOR, and multiplication is polynomial multiplication modulo some irreducible polynomial, which we can find in poly(k) time. In fact, we will mostly not even be interested in the multiplicative structure of $\mathrm{GF}(2^k)$ and will only use the addition operation (i.e., use it as the vector space $\mathrm{GF}(2)^k$, see below).
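For $\mathrm{GF}(2^k)$ it is convenient to pack an element into the bits of an integer, so that addition is a single XOR. The following Python sketch multiplies by the shift-and-add method, reducing by the modulus whenever the degree reaches k. The example modulus $x^8 + x^4 + x^3 + x + 1$ is the irreducible polynomial used by the AES block cipher, chosen here only as a familiar instance. Any irreducible degree-k polynomial would do.

```python
def gf2k_mul(a: int, b: int, modulus: int, k: int) -> int:
    """Product of a and b in GF(2^k); bit i of an int is the coefficient of x^i.

    Assumes a, b < 2^k and that `modulus` is an irreducible polynomial of degree k
    (so bit k of `modulus` is set).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # adding polynomials over GF(2) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> k) & 1:         # degree reached k: subtract (XOR) the modulus
            a ^= modulus
    return result

AES_POLY = 0x11B                 # x^8 + x^4 + x^3 + x + 1
print(hex(0x57 ^ 0x83))                         # 0xd4 (addition)
print(hex(gf2k_mul(0x57, 0x83, AES_POLY, 8)))   # 0xc1 (multiplication)
```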

A.5 Basic facts from linear algebra


For $\mathbb{F}$ a field and $n \in \mathbb{N}$, we denote by $\mathbb{F}^n$ the set of n-length tuples (or vectors) of elements of $\mathbb{F}$. If $u, v \in \mathbb{F}^n$ and $x \in \mathbb{F}$, then we denote by u + v the vector obtained by componentwise addition of u and v, and by xu the vector obtained by multiplying each entry of u by x.
A set of vectors $u_1, \ldots, u_k$ in $\mathbb{F}^n$ is linearly independent if the only solution to the equation $x_1 u_1 + \cdots + x_k u_k = 0$ (where 0 denotes the all-zero vector) is $x_1 = x_2 = \cdots = x_k = 0$. It can be shown that if $u_1, \ldots, u_k$ are linearly independent then $k \le n$ (exercise).
A set of n linearly independent vectors in $\mathbb{F}^n$ is called a basis of $\mathbb{F}^n$. It is not hard to see that if $u_1, \ldots, u_n$ is a basis of $\mathbb{F}^n$, then every vector $v \in \mathbb{F}^n$ can be expressed as a linear combination $v = \sum_i x_i u_i$ of the vectors $u_1, \ldots, u_n$. Furthermore, this expression is unique. The standard basis of $\mathbb{F}^n$ is the set $e^1, \ldots, e^n$, where $e^i_j$ is equal to 1 if j = i and to 0 otherwise.
A subset $S \subseteq \mathbb{F}^n$ is called a subspace if it is closed under addition and scalar multiplication (i.e., $u, v \in S$ and $x, y \in \mathbb{F}$ imply that $xu + yv \in S$). The dimension of S, denoted by dim(S), is defined to be the maximum number k such that there are k linearly independent vectors in S. Such a set of dim(S) linearly independent vectors in S is called a basis of S, and one can see that every vector in S can be expressed as a linear combination of the vectors in the basis.
A function $f : \mathbb{F}^n \to \mathbb{F}^m$ is linear if $f(u + v) = f(u) + f(v)$ and $f(xu) = x f(u)$ for every $u, v \in \mathbb{F}^n$ and $x \in \mathbb{F}$. It is not hard to verify that the following hold for every linear function f:
• If $u_1, \ldots, u_n$ is a basis for $\mathbb{F}^n$, then for every $v \in \mathbb{F}^n$, $f(v) = \sum_i x_i f(u_i)$, where $x_1, \ldots, x_n$ are the elements such that $v = \sum_i x_i u_i$. Thus, to know f's value at every point it suffices to know its value on the basis elements.
• The set $\mathrm{Im}(f) = \{f(v) : v \in \mathbb{F}^n\}$ is a subspace of $\mathbb{F}^m$.
• The set $\mathrm{Ker}(f) = \{v : f(v) = 0\}$ is a subspace of $\mathbb{F}^n$.
• $\dim(\mathrm{Im}(f)) + \dim(\mathrm{Ker}(f)) = n$.
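These facts are easy to check numerically over a small prime field. The Python sketch below uses our own helper names and assumes p is prime, so that a pivot can be inverted as $x^{p-2}$. It computes the rank of a matrix by Gaussian elimination; the rank equals dim(Im(f)) for the map the matrix describes. It then compares the rank with brute-force counts of the image and the kernel.

```python
from itertools import product

def rank_mod_p(rows, p):
    """Rank over GF(p), p prime, by Gauss-Jordan elimination on a copy of the rows."""
    M = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        # find a row at or below `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], p - 2, p)            # inverse of the pivot
        M[rank] = [x * inv % p for x in M[rank]]
        for r in range(len(M)):                      # clear this column in all other rows
            if r != rank and M[r][col]:
                c = M[r][col]
                M[r] = [(x - c * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank

def mat_vec_mod(A, v, p):
    """f(v) = Av over GF(p)."""
    return tuple(sum(a * x for a, x in zip(row, v)) % p for row in A)

p = 3
A = [[1, 2, 0, 1], [0, 1, 1, 2], [1, 0, 1, 0]]   # third row = first + second (mod 3)
n = len(A[0])
points = list(product(range(p), repeat=n))
kernel = [v for v in points if not any(mat_vec_mod(A, v, p))]
image = {mat_vec_mod(A, v, p) for v in points}
print(rank_mod_p(A, p), len(image), len(kernel))   # 2 9 9, and 2 + (4 - 2) = n
```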
A linear function $f : \mathbb{F}^n \to \mathbb{F}^m$ is often described by an $m \times n$ matrix A whose ith column is $f(e^i)$. The product of an $m \times n$ matrix A and an $n \times k$ matrix B is the $m \times k$ matrix C = AB, where $C_{i,j} = \sum_{\ell \in [n]} A_{i,\ell} B_{\ell,j}$. Suppose A describes a function $f : \mathbb{F}^n \to \mathbb{F}^m$ and B describes a function $g : \mathbb{F}^k \to \mathbb{F}^n$. One can verify that C then describes the function $h : \mathbb{F}^k \to \mathbb{F}^m$ mapping v to f(g(v)). It can also be verified that if we identify members of $\mathbb{F}^n$ with $n \times 1$ matrices (i.e., column vectors), then f(v) = Av.
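The following Python sketch checks the composition rule on one small example over the integers (a subset of $\mathbb{R}$). The matrices are arbitrary illustrative choices. It confirms that AB has shape $m \times k$ and that (AB)v = A(Bv).

```python
def mat_mul(A, B):
    """C = AB, where C[i][j] = sum over l of A[i][l] * B[l][j]; A is m x n, B is n x k."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("A must have as many columns as B has rows")
    k = len(B[0])
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(k)]
            for i in range(len(A))]

def apply_matrix(A, v):
    """Av, with v viewed as a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [0, 1], [3, -1]]   # 3 x 2: f maps R^2 to R^3
B = [[2, 0, 1], [1, 1, 0]]      # 2 x 3: g maps R^3 to R^2
v = [1, -2, 4]
C = mat_mul(A, B)
print(len(C), len(C[0]))                                         # 3 3
print(apply_matrix(C, v), apply_matrix(A, apply_matrix(B, v)))   # [4, -1, 19] [4, -1, 19]
```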
The determinant of an $n \times n$ matrix A, denoted by det(A), is equal to $\sum_{\sigma \in S_n} (-1)^{\mathrm{sgn}(\sigma)} \prod_{i=1}^n A_{i,\sigma(i)}$. Here $S_n$ denotes the group of permutations over [n]. The quantity $\mathrm{sgn}(\sigma)$ is equal to 1 if the number of pairs $\langle i, j \rangle$ such that i < j but σ(i) > σ(j) is odd, and is equal to 0 otherwise. We have the following two facts:
• det(AB) = det(A) det(B). This can be verified by direct computation.
• If A is an upper triangular matrix (i.e., $A_{i,j} = 0$ whenever i > j), then $\det(A) = A_{1,1} A_{2,2} \cdots A_{n,n}$. Indeed, for a permutation σ to give a non-zero contribution to the determinant in this case it must satisfy σ(i) ≥ i for every i, which means that it is the identity permutation.
Together these two rules give a polynomial-time algorithm to compute the determinant of a matrix A. Using the well-known Gaussian elimination algorithm, we express A as $E_1 E_2 \cdots E_m D$, where D is upper triangular and the $E_i$'s are elementary matrices. Multiplication by an elementary matrix corresponds to switching two rows, multiplying a row by a nonzero field element, or adding a multiple of one row to another. Since the determinant is easy to compute for all these matrices, we can compute the determinant of A as well.
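The algorithm just described can be sketched in Python over GF(p) with p prime, where a pivot can be inverted as $x^{p-2}$. The sketch reduces to upper triangular form, flipping the sign on each row swap. It then multiplies the diagonal, and compares the result against the permutation-sum definition. Function names are our own.

```python
from itertools import permutations

def det_leibniz(A, p):
    """det(A) mod p straight from the definition: a sum over all permutations of [n]."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total % p

def det_gauss(A, p):
    """det(A) mod p using O(n^3) field operations, via reduction to upper triangular form."""
    M = [[x % p for x in row] for row in A]
    n, det = len(M), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return 0                       # no pivot in this column: A is singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det                     # a row swap flips the sign
        inv = pow(M[col][col], p - 2, p)
        for r in range(col + 1, n):        # adding a multiple of a row keeps det unchanged
            c = M[r][col] * inv % p
            M[r] = [(x - c * y) % p for x, y in zip(M[r], M[col])]
        det = det * M[col][col] % p        # accumulate the diagonal product
    return det % p

A = [[2, 1, 3], [0, 4, 1], [5, 2, 6]]
print(det_gauss(A, 7), det_leibniz(A, 7))   # 3 3
```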
The following lemma relates the determinant of a matrix to the function it represents:
Lemma A.27 For a function $f : \mathbb{F}^n \to \mathbb{F}^n$ represented by an $n \times n$ matrix A, the following conditions are equivalent:
• The columns of A are a basis for $\mathbb{F}^n$.
• f is one-to-one.
• dim(Im(f)) = n.
• dim(Ker(f)) = 0.
• det(A) ≠ 0.
• There exists $v \in \mathbb{F}^n$ such that the equation Ax = v has exactly one solution.
• For every $v \in \mathbb{F}^n$, the equation Ax = v has exactly one solution.
Furthermore, if f is one-to-one then the mapping $f^{-1}$ is linear and is represented by the $n \times n$ matrix $A^{-1}$ whose (i, j)th entry is $(-1)^{i+j} \det(A^{(j,i)}) / \det(A)$. Here $A^{(j,i)}$ denotes the $(n-1) \times (n-1)$ matrix obtained by removing the jth row and the ith column from A.
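The cofactor formula can be tested with exact rational arithmetic. The following Python sketch uses Python's standard `fractions.Fraction`. The helpers `det`, `minor` and `inverse` are our own, and `det` follows the permutation-sum definition, so it is meant for small matrices only. Note the sign $(-1)^{i+j}$ and the transposed indices of the minor.

```python
from fractions import Fraction
from itertools import permutations

def det(A):
    """Determinant from the permutation-sum definition (exact, small n only)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inversions = sum(sigma[i] > sigma[j] for i in range(n) for j in range(i + 1, n))
        term = (-1) ** inversions
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def minor(A, r, c):
    """The matrix obtained from A by removing row r and column c."""
    return [row[:c] + row[c + 1:] for k, row in enumerate(A) if k != r]

def inverse(A):
    """A^{-1} via the cofactor formula; entries are exact Fractions."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(A)
    # entry (i, j): (-1)^(i+j) * det(A with row j and column i removed) / det(A)
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i)), d) for j in range(n)]
            for i in range(n)]

B = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]
Binv = inverse(B)
identity = [[sum(B[i][l] * Binv[l][j] for l in range(3)) for j in range(3)] for i in range(3)]
print(identity == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
```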

A.5.1 Inner product
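The vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ have an additional structure that is often quite useful.³ An inner product over $\mathbb{C}^n$ is a function mapping two vectors u, v to a complex number $\langle u, v \rangle$ satisfying the following conditions:
• $\langle xu + yw, v \rangle = x \langle u, v \rangle + y \langle w, v \rangle$.
• $\langle v, u \rangle = \overline{\langle u, v \rangle}$, where $\overline{z}$ denotes complex conjugation (i.e., if z = a + ib then $\overline{z} = a - ib$).
• For every u, $\langle u, u \rangle$ is a non-negative real number, with $\langle u, u \rangle = 0$ iff u = 0.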
We will use two examples of inner products. The first is the standard inner product, mapping $x, y \in \mathbb{C}^n$ to $\sum_{i=1}^n x_i \overline{y_i}$. The second is the expectation or normalized inner product, mapping $x, y \in \mathbb{C}^n$ to $\frac{1}{n} \sum_{i=1}^n x_i \overline{y_i}$. We can also define inner products over the space $\mathbb{R}^n$, in which case we drop the conjugation.
If $\langle u, v \rangle = 0$ we say that u and v are orthogonal and denote this by $u \perp v$. We have the following result:
Lemma A.28 If non-zero vectors $u_1, \ldots, u_k$ satisfy $u_i \perp u_j$ for all $i \ne j$, then they are linearly independent.

3 The reason we restrict ourselves to these fields is that they have characteristic zero. That is, there do not exist a number $k \in \mathbb{N}$ with $k \ge 1$ and a nonzero $a \in \mathbb{F}$ such that ka = 0 (where ka is the result of adding a to itself k times). You can check that if there is such a number for a field $\mathbb{F}$, then there will not be an inner product over $\mathbb{F}^n$.

Proof: Suppose that $\sum_i x_i u_i = 0$ and take the inner product of this vector with itself. We get that
$$0 = \Big\langle \sum_i x_i u_i, \sum_j x_j u_j \Big\rangle = \sum_{i,j} x_i \overline{x_j} \langle u_i, u_j \rangle = \sum_i |x_i|^2 \langle u_i, u_i \rangle, \qquad (7)$$
where the last equality follows from the fact that $\langle u_i, u_j \rangle = 0$ for $i \ne j$. But since each $\langle u_i, u_i \rangle > 0$, unless all the $x_i$'s are zero the right-hand side of (7) is strictly positive. (Recall that for a complex number x = a + ib, $|x| = \sqrt{a^2 + b^2}$ and $|x|^2 = x \overline{x}$.) □


A set $u_1, \ldots, u_n$ of nonzero vectors in $\mathbb{C}^n$ satisfying $\langle u_i, u_j \rangle = 0$ for $i \ne j$ is called an orthogonal basis of $\mathbb{C}^n$. If in addition $\langle u_i, u_i \rangle = 1$ for all i, then we say this is an orthonormal basis. An orthonormal basis consists of n linearly independent vectors and hence, as its name implies, is a basis of $\mathbb{C}^n$. This means that every vector v can be expressed as $v = \sum_i x_i u_i$. By taking an inner product of this equality with $u_i$, one can see that $x_i = \langle v, u_i \rangle$.
The following identity (which can be viewed as a generalization of the Pythagorean theorem) is often useful:
Lemma A.29 (Parseval's identity) If $u_1, \ldots, u_n$ is an orthonormal basis for $\mathbb{C}^n$, then for every v,
$$\langle v, v \rangle = \sum_{i=1}^n |x_i|^2,$$
where $x_1, \ldots, x_n$ are the numbers such that $v = \sum_i x_i u_i$.

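Proof: As in the proof of Lemma A.28,
$$\langle v, v \rangle = \Big\langle \sum_i x_i u_i, \sum_j x_j u_j \Big\rangle = \sum_i |x_i|^2 \langle u_i, u_i \rangle = \sum_i |x_i|^2,$$
where the last equality uses $\langle u_i, u_i \rangle = 1$. □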

Vector spaces with an inner product are known as Hilbert spaces.
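As a numerical illustration of Parseval's identity, the following Python sketch uses the normalized Fourier basis $u_k[j] = e^{2\pi i jk/n}/\sqrt{n}$ of $\mathbb{C}^n$ with the standard inner product. This basis is our choice of example; any orthonormal basis would do. The sketch computes $x_k = \langle v, u_k \rangle$ and compares $\sum_k |x_k|^2$ with $\langle v, v \rangle$ up to floating-point error.

```python
import cmath
import math

def inner(x, y):
    """Standard inner product on C^n: sum_i x_i * conj(y_i) (linear in x)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def fourier_basis(n):
    """u_k[j] = exp(2*pi*i*j*k/n) / sqrt(n), an orthonormal basis of C^n."""
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]

n = 5
basis = fourier_basis(n)
v = [1 + 2j, -3, 0.5j, 4, 2 - 1j]
coeffs = [inner(v, u) for u in basis]                 # x_k = <v, u_k>
print(math.isclose(inner(v, v).real, sum(abs(x) ** 2 for x in coeffs)))   # True
```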

A.5.2 Dot product
Even over a field $\mathbb{F}$ for which $\mathbb{F}^n$ has no inner product, we can define the dot product of two vectors $u, v \in \mathbb{F}^n$, denoted by $u \odot v$, as $\sum_{i=1}^n u_i v_i$. For every subspace $S \subseteq \mathbb{F}^n$, we define $S^\perp = \{u : u \odot v = 0 \ \forall v \in S\}$. We leave the following simple claim as an exercise:
Claim A.30 $\dim(S) + \dim(S^\perp) = n$.
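Claim A.30 can be confirmed by exhaustive search over $\mathrm{GF}(2)^n$ for small n. The same search checks its special case for a single nonzero u (the random subsum principle, Claim A.31). The following Python sketch uses our own helper names and represents vectors as 0/1 tuples. Since $|S| = 2^{\dim S}$, the check $|S| \cdot |S^\perp| = 2^n$ is exactly the claim for the chosen subspace.

```python
from itertools import product

def dot2(u, v):
    """Dot product over GF(2): the sum of u_i * v_i modulo 2."""
    return sum(a & b for a, b in zip(u, v)) % 2

def span2(gens, n):
    """All GF(2)-linear combinations of the given vectors (brute force)."""
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % 2 for i in range(n))
            for coeffs in product((0, 1), repeat=len(gens))}

def perp2(S, n):
    """S-perp: all u in GF(2)^n with u . v = 0 for every v in S."""
    return {u for u in product((0, 1), repeat=n) if all(dot2(u, v) == 0 for v in S)}

n = 4
S = span2([(1, 1, 0, 0), (0, 1, 1, 0)], n)     # a 2-dimensional subspace
print(len(S), len(perp2(S, n)))                # 4 4, so dim S + dim S-perp = 2 + 2 = n
u = (1, 0, 1, 1)                               # any nonzero u
print(sum(dot2(u, v) == 0 for v in product((0, 1), repeat=n)))   # 8 = 2^(n-1)
```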

In particular, for every nonzero vector $u \in \mathbb{F}^n$, the subspace $u^\perp$ of vectors v satisfying $u \odot v = 0$ has dimension n − 1. Hence, when $\mathbb{F}$ is finite, it has cardinality $|\mathbb{F}|^{n-1}$. As a corollary we get the following very useful fact:
Claim A.31 (The random subsum principle) For every nonzero $u \in \mathrm{GF}(2)^n$ (where GF(2) is the field {0, 1} with addition and multiplication modulo 2):
$$\Pr_{v \in_R \mathrm{GF}(2)^n}[u \odot v = 0] = 1/2.$$

A.5.3 Eigenvectors and eigenvalues


If A is an $n \times n$ complex matrix and $v \in \mathbb{C}^n$ is a nonzero vector, we say that v is an eigenvector of A if there exists $\lambda \in \mathbb{C}$ such that $Av = \lambda v$. We say that A is diagonalizable if there is a basis $v_1, \ldots, v_n$ of eigenvectors for A. In other words, there is an invertible matrix P such that $PAP^{-1}$ is a diagonal matrix.


Note that A has an eigenvector with eigenvalue λ if and only if the matrix $A - \lambda I$ is non-invertible, where I is the identity matrix. Thus the eigenvalues of A are exactly the roots of the polynomial $p(x) = \det(A - xI)$. The Fundamental Theorem of Algebra states that every non-constant complex polynomial has as many roots as its degree, counted with multiplicity. It therefore implies that every square complex matrix has at least one eigenvector. (A non-invertible matrix has an eigenvector with eigenvalue zero.)
For a matrix A, the conjugate transpose of A, denoted $A^*$, is the matrix such that for every i, j, $A^*_{i,j} = \overline{A_{j,i}}$, where the bar denotes the complex conjugate operation. We say that an $n \times n$ matrix A is Hermitian if $A = A^*$. A Hermitian matrix with only real entries is called symmetric. That is, a real matrix is symmetric if $A = A^\top$, where ⊤ is the transpose operation (i.e., $A^\top_{i,j} = A_{j,i}$). An equivalent condition (exercise) is that A is Hermitian if and only if for every $u, v \in \mathbb{C}^n$,
$$\langle Au, v \rangle = \langle u, Av \rangle. \qquad (8)$$

A useful fact about Hermitian matrices is the following theorem:

Theorem A.32 If A is an $n \times n$ Hermitian matrix, then there exists an orthogonal basis of eigenvectors for A.

Proof: We prove this by induction on n. We know that A has one eigenvector v with eigenvalue λ. Now let $S = v^\perp$ be the (n − 1)-dimensional space of all vectors orthogonal to v. We claim that for every $u \in S$, $Au \in S$. Indeed, if $\langle u, v \rangle = 0$ then
$$\langle Au, v \rangle = \langle u, Av \rangle = \langle u, \lambda v \rangle = \overline{\lambda} \langle u, v \rangle = 0.$$
Thus the restriction of A to S is an (n − 1)-dimensional linear operator satisfying (8). Hence, by induction, this restriction has an orthogonal basis of eigenvectors $v_2, \ldots, v_n$. Adding v to this set we get an n-dimensional orthogonal basis of eigenvectors for A. □
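Note that if A is real and symmetric, then all its eigenvalues must be real as well (with no imaginary components). The argument uses only (8), so it applies to every Hermitian matrix. Indeed, if $Av = \lambda v$ then
$$\lambda \langle v, v \rangle = \langle Av, v \rangle = \langle v, Av \rangle = \overline{\lambda} \langle v, v \rangle,$$
meaning that for a nonzero v, $\lambda = \overline{\lambda}$. For real symmetric A, this implies that the eigenvectors can also be chosen to be real, since they are obtained by solving a linear equation with real coefficients.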
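For a concrete instance of this fact, consider a real symmetric 2 × 2 matrix. The following worked computation is our own illustration and does not come from the text. It shows directly that the characteristic polynomial has a non-negative discriminant, so both eigenvalues are real:
$$A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \qquad \det(A - xI) = x^2 - (a+d)x + (ad - b^2),$$
$$(a+d)^2 - 4(ad - b^2) = (a-d)^2 + 4b^2 \ge 0.$$
For example, with a = d = 2 and b = 1 the eigenvalues are 1 and 3. Their eigenvectors (1, −1) and (1, 1) are real and orthogonal, consistent with Theorem A.32.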

A.5.4 Norms
A norm on $\mathbb{C}^n$ is a function mapping a vector v to a real number $\|v\|$ satisfying:
• For every v, $\|v\| \ge 0$, with $\|v\| = 0$ iff v = 0.
• If $x \in \mathbb{C}$ then $\|xv\| = |x| \, \|v\|$.
• (Triangle inequality) For every u, v, $\|u + v\| \le \|u\| + \|v\|$.

For every $v \in \mathbb{C}^n$ and number $p \ge 1$, the $L_p$ norm of v, denoted $\|v\|_p$, is equal to $\left( \sum_{i=1}^n |v_i|^p \right)^{1/p}$. One particularly interesting case is p = 2, the so-called Euclidean norm, in which $\|v\|_2 = \sqrt{\sum_{i=1}^n |v_i|^2} = \sqrt{\langle v, v \rangle}$. Another interesting case is p = 1, where we use the single bar notation and denote $|v|_1 = \sum_{i=1}^n |v_i|$. Another case is $p = \infty$, where we denote $\|v\|_\infty = \lim_{p \to \infty} \|v\|_p = \max_{i \in [n]} |v_i|$.
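Some relations between the different norms can be derived from the Hölder inequality. It states that for every p, q with $\frac{1}{p} + \frac{1}{q} = 1$, $\|u\|_p \|v\|_q \ge \sum_{i=1}^n |u_i v_i|$. To prove it, note that by simple scaling it suffices to consider norm-one vectors. So it is enough to show that if $\|u\|_p = \|v\|_q = 1$ then $\sum_{i=1}^n |u_i||v_i| \le 1$. But if $\|u\|_p = \|v\|_q = 1$ then
$$\sum_{i=1}^n |u_i||v_i| = \sum_{i=1}^n |u_i|^{p(1/p)} |v_i|^{q(1/q)} \le \sum_{i=1}^n \left( \frac{1}{p}|u_i|^p + \frac{1}{q}|v_i|^q \right) = \frac{1}{p} + \frac{1}{q} = 1.$$
Here the inequality uses the fact that for every $a, b \ge 0$ and $\alpha \in [0, 1]$, $a^\alpha b^{1-\alpha} \le \alpha a + (1 - \alpha) b$, applied with $a = |u_i|^p$, $b = |v_i|^q$ and $\alpha = 1/p$.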
The Hölder inequality implies the following relations between the $L_2$, $L_1$ and $L_\infty$ norms of every vector (see Exercise 21.2):
$$|v|_1 / \sqrt{n} \le \|v\|_2 \le \sqrt{|v|_1 \|v\|_\infty}. \qquad (9)$$
Vector spaces with a norm are sometimes known as Banach spaces.
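The Hölder inequality and inequality (9) can be spot-checked numerically. The Python sketch below draws random real vectors and random exponents p > 1 (an arbitrary illustrative choice). It sets $q = p/(p-1)$ so that 1/p + 1/q = 1, and allows a tiny tolerance for floating-point rounding. The helper names are ours.

```python
import math
import random

def lp_norm(v, p):
    """||v||_p = (sum_i |v_i|^p)^(1/p); p = math.inf gives max_i |v_i|."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1 / p)

def check(u, v, p, eps=1e-9):
    """Test Hölder's inequality and inequality (9) on one pair of vectors."""
    q = p / (p - 1)                               # conjugate exponent: 1/p + 1/q = 1
    holder = sum(abs(a * b) for a, b in zip(u, v)) <= lp_norm(u, p) * lp_norm(v, q) + eps
    l1, l2, linf = lp_norm(v, 1), lp_norm(v, 2), lp_norm(v, math.inf)
    ineq9 = l1 / math.sqrt(len(v)) <= l2 + eps and l2 <= math.sqrt(l1 * linf) + eps
    return holder and ineq9

random.seed(1)
trials = [([random.uniform(-5, 5) for _ in range(8)],
           [random.uniform(-5, 5) for _ in range(8)],
           random.uniform(1.1, 6.0)) for _ in range(1000)]
print(all(check(u, v, p) for u, v, p in trials))   # True
```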

A.5.5 Metric spaces
For any set Ω and $d : \Omega^2 \to \mathbb{R}$, we say that d is a metric on Ω if it satisfies the following conditions:
1. $d(x, y) \ge 0$ for every $x, y \in \Omega$, where d(x, y) = 0 if and only if x = y.
2. $d(x, y) = d(y, x)$ for every $x, y \in \Omega$.
3. (Triangle Inequality) For every $x, y, z \in \Omega$, $d(x, z) \le d(x, y) + d(y, z)$.
That is, d(x, y) denotes the distance between x and y according to some measure. If Ω is a vector space with a norm, then the function $d(x, y) = \|x - y\|$ is a metric over Ω. There are also metrics that do not come from any norm. For example, for every connected graph G we can define a metric over the vertex set of G by letting the distance of x and y be the length of the shortest path between them. Various metric spaces and the relations between them have recently found many applications in theoretical computer science; see Chapter 15 of [Mat02] for a good survey.
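The shortest-path example can be sketched in Python with breadth-first search, which computes exact shortest-path lengths in an unweighted graph. The small graph below is a hypothetical example chosen for illustration. The sketch then checks the three metric conditions exhaustively over all vertex pairs and triples.

```python
from collections import deque
from itertools import product

def bfs_distances(adj, source):
    """Shortest-path lengths from `source` in an unweighted graph (adjacency lists)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:              # first visit is along a shortest path
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

# A hypothetical small connected graph: the 6-cycle 0-1-2-3-4-5-0 plus the chord 0-3.
adj = {0: [1, 3, 5], 1: [0, 2], 2: [1, 3], 3: [0, 2, 4], 4: [3, 5], 5: [0, 4]}
d = {x: bfs_distances(adj, x) for x in adj}
V = list(adj)
print(d[0][2], d[1][4])                                         # 2 3
print(all((d[x][y] == 0) == (x == y) and d[x][y] == d[y][x]
          for x, y in product(V, V)))                           # True
print(all(d[x][z] <= d[x][y] + d[y][z]
          for x, y, z in product(V, V, V)))                     # True
```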

A.6 Polynomials
We list some basic facts about univariate polynomials.
Theorem A.33 A nonzero polynomial of degree d over a field has at most d distinct roots.
Proof: Suppose $p(x) = \sum_{i=0}^d c_i x^i$ has d + 1 distinct roots $\alpha_1, \ldots, \alpha_{d+1}$ in some field $\mathbb{F}$. Then
$$\sum_{i=0}^d \alpha_j^i c_i = p(\alpha_j) = 0,$$
for j = 1, …, d + 1. This means that the system Ay = 0 with
$$A = \begin{pmatrix} 1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^d \\ 1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^d \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha_{d+1} & \alpha_{d+1}^2 & \cdots & \alpha_{d+1}^d \end{pmatrix}$$
has a solution y = c. The matrix A is a Vandermonde matrix, and it can be shown that
$$\det A = \prod_{i > j} (\alpha_i - \alpha_j),$$
which is nonzero for distinct $\alpha_i$'s. Hence rank A = d + 1. The system Ay = 0 therefore has only the trivial solution, a contradiction to $c \ne 0$. □
This theorem has an interesting corollary:
Corollary A.34 For every finite field $\mathbb{F}$, the multiplicative group $\mathbb{F}^*$ is cyclic.

Proof: The polynomial $x^k - 1$ has at most k roots. Hence the group $\mathbb{F}^*$ has the property (*) that for every k, the number of elements x satisfying $x^k = 1$ is at most k. We will prove by induction on |G| that every finite Abelian group G satisfying (*) is cyclic. Let n = |G|. We consider three cases:
• n is prime. In this case every element of G has either order 1 or order n. Since the only element with order 1 is the identity element, G has an element of order n, and so G is cyclic.
• $n = p^c$ for some prime p and c > 1. In this case, if there is no element of order n, then all the orders must divide $p^{c-1}$. We then get $n = p^c$ elements x such that $x^{p^{c-1}} = 1$, violating (*).
• n = pq for co-prime p, q > 1. In this case let H and F be the two subgroups of G defined by $H = \{a : a^p = 1\}$ and $F = \{b : b^q = 1\}$. Then $|H| \le p < n$ and $|F| \le q < n$. Also, as subgroups of G, both H and F satisfy (*). Thus by the induction hypothesis both H and F are cyclic, with generators a and b respectively. We claim that ab generates the entire group G. Indeed, let c be any element in G. Since p, q are co-prime, there are x, y such that xq + yp = 1, and hence $c = c^{xq + yp}$. But $(c^{xq})^p = 1$ and $(c^{yp})^q = 1$ (because $c^n = 1$). Hence c is a product of an element of H and an element of F, so $c = a^i b^j$ for some $i \in \{0, \ldots, p-1\}$ and $j \in \{0, \ldots, q-1\}$. Since G is Abelian, $(ab)^z = a^z b^z$. Thus, to show that $c = (ab)^z$ for some z, all we need to do is find z such that $z \equiv i \pmod p$ and $z \equiv j \pmod q$. This can be done using the Chinese Remainder Theorem. □
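The corollary guarantees that a generator of $\mathrm{GF}(p)^*$ exists, but its proof is not a practical way to find one. The Python sketch below instead uses a standard separate test that we name plainly: g generates $\mathrm{GF}(p)^*$ iff $g^{(p-1)/r} \ne 1$ for every prime r dividing p − 1. This works because the order of g divides p − 1, and every proper divisor of p − 1 divides some (p − 1)/r. Function names are ours.

```python
def prime_factors(m):
    """Distinct prime factors of m, by trial division."""
    factors, r = set(), 2
    while r * r <= m:
        while m % r == 0:
            factors.add(r)
            m //= r
        r += 1
    if m > 1:
        factors.add(m)
    return factors

def find_generator(p):
    """Smallest generator of GF(p)^* for a prime p."""
    if p == 2:
        return 1                          # GF(2)^* = {1}
    rs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // r, p) != 1 for r in rs):
            return g
    raise ValueError("no generator found; is p prime?")

print([find_generator(p) for p in (2, 3, 5, 7, 11, 13, 23)])   # [1, 2, 2, 3, 2, 2, 5]
print(sorted(pow(5, k, 23) for k in range(22)) == list(range(1, 23)))   # True
```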

Theorem A.35 For any set of pairs $(a_1, b_1), \ldots, (a_{d+1}, b_{d+1})$ of field elements with distinct $a_i$'s, there exists a unique polynomial g(x) of degree at most d such that $g(a_i) = b_i$ for all i = 1, 2, …, d + 1.

Proof: The requirements are satisfied by the Lagrange interpolating polynomial:
$$g(x) = \sum_{i=1}^{d+1} b_i \frac{\prod_{j \ne i} (x - a_j)}{\prod_{j \ne i} (a_i - a_j)}.$$
Suppose two polynomials $g_1(x), g_2(x)$ satisfy the requirements. Then their difference $p(x) = g_1(x) - g_2(x)$ has degree at most d and is zero for $x = a_1, \ldots, a_{d+1}$. Thus, by the previous theorem, the polynomial p(x) must be zero, and the polynomials $g_1(x), g_2(x)$ are identical. □
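The interpolation formula can be implemented directly over GF(p) for prime p. The following Python sketch (helper names are ours) builds each numerator $\prod_{j \ne i}(x - a_j)$ one factor at a time. It divides by $\prod_{j \ne i}(a_i - a_j)$ using the inverse $x^{p-2}$. Evaluation uses Horner's rule.

```python
def lagrange_interpolate(points, p):
    """Coefficients (lowest degree first) of the unique g over GF(p), p prime, with
    deg g <= d and g(a_i) = b_i, given d + 1 points (a_i, b_i) with distinct a_i."""
    coeffs = [0] * len(points)
    for i, (ai, bi) in enumerate(points):
        num, denom = [1], 1                # num: prod_{j != i} (x - a_j), built factor by factor
        for j, (aj, _) in enumerate(points):
            if j == i:
                continue
            shifted = [0] + num            # x * num
            scaled = [aj * c for c in num] + [0]   # a_j * num
            num = [(s - t) % p for s, t in zip(shifted, scaled)]
            denom = denom * (ai - aj) % p
        if denom == 0:
            raise ValueError("the a_i must be distinct modulo p")
        scale = bi * pow(denom, p - 2, p) % p      # b_i / prod_{j != i} (a_i - a_j)
        for k, c in enumerate(num):
            coeffs[k] = (coeffs[k] + scale * c) % p
    return coeffs

def evaluate(coeffs, x, p):
    """Evaluate a polynomial over GF(p) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

# Recover g(x) = 3 + 2x + x^2 over GF(101) from its values at 0, 1, 2.
print(lagrange_interpolate([(0, 3), (1, 6), (2, 11)], 101))   # [3, 2, 1]
```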
The following elementary result is usually attributed to Schwartz and Zippel in the computer science community, though it was certainly known earlier (see e.g. DeMillo and Lipton [DLne]).
Lemma A.36 If a polynomial $p(x_1, x_2, \ldots, x_m)$ over $\mathbb{F} = \mathrm{GF}(q)$ is nonzero and has total degree at most d, then
$$\Pr[p(a_1, \ldots, a_m) \ne 0] \ge 1 - \frac{d}{q},$$
where the probability is over all choices of $a_1, \ldots, a_m \in \mathbb{F}$.

Proof: We use induction on m. If m = 1 the statement follows from Theorem A.33. Suppose the statement is true when the number of variables is at most m − 1. Then p can be written as
$$p(x_1, x_2, \ldots, x_m) = \sum_{i=0}^d x_1^i \, p_i(x_2, \ldots, x_m),$$
where $p_i$ has total degree at most d − i. Since p is nonzero, at least one of the $p_i$'s is nonzero. Let k be the largest i such that $p_i$ is nonzero. Then by the inductive hypothesis,
$$\Pr_{a_2, a_3, \ldots, a_m}[p_k(a_2, a_3, \ldots, a_m) \ne 0] \ge 1 - \frac{d-k}{q}.$$
Whenever $p_k(a_2, a_3, \ldots, a_m) \ne 0$, $p(x_1, a_2, a_3, \ldots, a_m)$ is a nonzero univariate polynomial of degree k, and hence becomes 0 for at most k values of $x_1$. Hence
$$\Pr[p(a_1, \ldots, a_m) \ne 0] \ge \left(1 - \frac{d-k}{q}\right)\left(1 - \frac{k}{q}\right) \ge 1 - \frac{d}{q},$$
and the induction is completed. □
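For small parameters the lemma can be checked exhaustively rather than by sampling. The Python sketch below counts the points of $\mathrm{GF}(7)^3$ at which two hypothetical degree-3 polynomials are nonzero, and compares each count with $(1 - d/q) q^m$, kept as an integer. The second polynomial, $(x_1 - 1)(x_1 - 2)(x_1 - 3)$, meets the bound with equality, which shows that the lemma is tight.

```python
from itertools import product

def count_nonzero(poly, m, q):
    """Number of points of GF(q)^m at which poly (a Python function) is nonzero mod q."""
    return sum(1 for a in product(range(q), repeat=m) if poly(*a) % q != 0)

def p_example(x, y, z):
    # A hypothetical nonzero polynomial over GF(7) of total degree 3.
    return x * y * z + 2 * x * x - y + 5

def p_tight(x, y, z):
    # (x - 1)(x - 2)(x - 3): total degree 3, vanishing whenever x is 1, 2 or 3.
    return (x - 1) * (x - 2) * (x - 3)

q, m, d = 7, 3, 3
bound = (q - d) * q ** (m - 1)          # (1 - d/q) * q^m as an integer
print(count_nonzero(p_example, m, q) >= bound)   # True
print(count_nonzero(p_tight, m, q), bound)       # 196 196
```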
