
Chapter 1

Quasi-Monte Carlo Sampling

by Art B. Owen

In Monte Carlo (MC) sampling the sample averages of random quantities are used
to estimate the corresponding expectations. The justification is through the law of
large numbers. In quasi-Monte Carlo (QMC) sampling we are able to get a law
of large numbers with deterministic inputs instead of random ones. Naturally we
seek deterministic inputs that make the answer converge as quickly as possible. In
particular it is common for QMC to produce much more accurate answers than MC
does. Keller [19] was an early proponent of QMC methods for computer graphics.

We begin by reviewing Monte Carlo sampling and showing how many problems can be reduced to integrals over the unit cube [0, 1)^d. Next we consider how stratification methods, such as jittered sampling, can improve the accuracy of Monte Carlo for favorable functions while doing no harm for unfavorable ones. Methods of multiple stratification such as Latin hypercube sampling (n-rooks) represent a significant improvement on stratified sampling. These stratification methods balance the sampling points with respect to a large number of hyperrectangular boxes. QMC may be thought of as an attempt to take this to the logical limit: how close can we get to balancing the sample points with respect to every box in [0, 1)^d at once? The answer, provided by the theory of discrepancy, is surprisingly far, and the resulting points produce a significant improvement compared to MC. This chapter concludes with a presentation of digital nets, integration lattices and randomized QMC.

1.1 Crude Monte Carlo
As a frame of reference for QMC, we recap the basics of MC. Suppose that the
average we want to compute is written as an integral
I = \int_D f(x) q(x) \, dx.    (1.1)

The set D ⊆ Rd is the domain of interest, perhaps a region on the unit sphere
or in the unit cube. The function q is a probability density function on D. That is, q(x) ≥ 0 and \int_D q(x)\,dx = 1. The function f gives the quantity whose expectation we seek: I is the expected value of f(x) for random x with density q on D.
In crude Monte Carlo sampling we generate n independent samples x1 , . . . , xn
from the density q and estimate I by
\hat{I} = \hat{I}_n = \frac{1}{n} \sum_{i=1}^n f(x_i).    (1.2)

The strong law of large numbers tells us that

\Pr\Bigl( \lim_{n\to\infty} \hat{I}_n = I \Bigr) = 1.    (1.3)

That is, crude Monte Carlo always converges to the right answer as n increases
without bound.
Now suppose that f has finite variance σ² = Var(f(x)) ≡ \int_D (f(x) − I)² q(x)\,dx. Then E((Î_n − I)²) = σ²/n, so the root mean square error (RMSE) of MC sampling is O(1/\sqrt{n}). This rate is slow compared to that of classical quadrature rules
(Davis and Rabinowitz [7]) for smooth functions in low dimensions. Monte Carlo
methods can improve on classical ones for problems in high dimensions or on dis-
continuous functions.
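To make the recipe concrete, here is a minimal Python sketch of crude MC (an illustration added here, not from the original text); the integrand f below is a toy stand-in whose exact integral over [0, 1)^2 is 1/4:

    import random

    def crude_mc(f, d, n):
        # Estimate I by the average of f at n independent U[0,1)^d points.
        total = 0.0
        for _ in range(n):
            x = [random.random() for _ in range(d)]
            total += f(x)
        return total / n

    f = lambda x: x[0] * x[1]          # toy integrand, exact integral 1/4
    print(crude_mc(f, 2, 10000))       # error shrinks like O(1/sqrt(n))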
A given integration problem can be written in the form (1.1) in many different
ways. First, let p be a probability density on D such that p(x) > 0 whenever q(x)|f(x)| > 0. Then

I = \int_D f(x) q(x)\,dx = \int_D \frac{f(x) q(x)}{p(x)}\, p(x)\,dx

and we could as well sample x_i ∼ p(x) and estimate I by

\hat{I}_p = \hat{I}_{n,p} = \frac{1}{n} \sum_{i=1}^n \frac{f(x_i) q(x_i)}{p(x_i)}.    (1.4)
The RMSE can be strongly affected, for better or worse, by this re-expression,
known as importance sampling. If we are able to find a good p that is nearly
proportional to f q then we can get much better estimates.
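As an illustrative sketch (a made-up one dimensional example, not from the text): take D = [0, 1), q(x) = 1 and f(x) = x², and sample from the nearly proportional density p(x) = 2x by inversion:

    import math, random

    def importance_mc(n):
        # Estimate I = integral of x^2 on [0,1) using p(x) = 2x.
        # If U ~ U[0,1) then sqrt(U) has density p, by inversion.
        total = 0.0
        for _ in range(n):
            x = math.sqrt(1.0 - random.random())   # x ~ p, strictly positive
            total += (x * x) / (2.0 * x)           # f(x) q(x) / p(x)
        return total / n

    print(importance_mc(10000))   # near 1/3, with smaller variance than crude MC

Here Var(fq/p) = 1/72 under p versus Var(f) = 4/45 under q, so the re-expression helps.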
Making a good choice of density p is problem specific. Suppose for instance,
that one of the components of x describes the angle θ = θ(x) between a ray and a
surface normal. The original version of f may include a factor of cos(θ)^η for some η > 0. Using a density p(x) ∝ q(x) cos(θ)^η corresponds to moving the cosine power out of the integrand and into the sampling density.
We will suppose that a choice of p has already been made. There is also the
possibility of using a mixture of sampling densities pj as with the balance heuristic
of Veach and Guibas [42, 43]. This case can be incorporated by increasing the di-
mension of x by one, and using that variable to select j from a discrete distribution.
Monte Carlo sampling of x ∼ p over D almost always uses points from a
pseudo-random number generator simulating the uniform distribution on the inter-
val from 0 to 1. We will take this to mean the uniform distribution on the half-open
interval [0, 1). Suppose that it takes d* uniform random variables to simulate a point in the d dimensional domain D. Often d* = d but sometimes d* = 2 variables from [0, 1) can be used to generate a point within a surface element in d = 3 dimensional space. In other problems we might use d* > d random variables to
generate a p distributed point in D ⊆ Rd . Chapter ?? describes general techniques
for transforming [0, 1)d into D and provides some specific examples of use in ray
tracing. Devroye [8] is a comprehensive reference on techniques for transforming
uniform random variables into one’s desired random objects.

Suppose that a point having the U[0, 1)^{d*} distribution is transformed into a point τ(x) having the density p on D. Then

I = \int_D \frac{f(x) q(x)}{p(x)}\, p(x)\,dx = \int_{[0,1)^{d^*}} \frac{f(\tau(x))\, q(\tau(x))}{p(\tau(x))}\, dx \equiv \int_{[0,1)^{d^*}} f^*(x)\, dx    (1.5)

where f* incorporates the transformation τ and the density q. Then I is estimated by

\hat{I} = \frac{1}{n} \sum_{i=1}^n \frac{f(\tau(x_i))\, q(\tau(x_i))}{p(\tau(x_i))} = \frac{1}{n} \sum_{i=1}^n f^*(x_i)    (1.6)

where the x_i are independent U[0, 1)^{d*} random variables.
Equation (1.5) expresses the original MC problem (1.1) as one of integrating a function f* over the unit cube in d* dimensions. We may therefore reformulate the problem as finding I = \int_{[0,1)^d} f(x)\,dx. The new d is the old d* and the new f is the old f*.

1.2 Stratification
Stratified sampling is a technique for reducing the variance of a Monte Carlo integral. It was originally applied in survey sampling (see Cochran [4]) and has been adapted in Monte Carlo methods, Fishman [12]. In stratified sampling, the domain of x is written as a union of strata D = \cup_{h=1}^H D_h where D_j ∩ D_k = ∅ if j ≠ k. An integral is estimated from within each stratum and then combined. Following the presentation in section 1.1, we suppose here that D = [0, 1)^d.
Figure 1.1 shows a random sample from the unit square along with 3 alternative
stratified samplings. The unit cube [0, 1)d is very easily partitioned into box shaped
strata like those shown. It is also easy to sample uniformly in such strata. Suppose
that a, c ∈ [0, 1)d with a < c componentwise. Let U ∼ U [0, 1)d . Then a + (c −
a)U interpreted componentwise is uniformly distributed on the box with lower left
corner a and upper right corner c.
In the simplest form of stratified sampling, a Monte Carlo sample x_{h1}, . . . , x_{hn_h} is taken from within stratum D_h. Each stratum is sampled independently, and the results are combined as

\hat{I}_{STRAT} = \hat{I}_{STRAT}(f) = \sum_{h=1}^H \frac{|D_h|}{n_h} \sum_{i=1}^{n_h} f(x_{hi}),    (1.7)

where |D_h| is the volume of stratum D_h.


For any x ∈ [0, 1)^d let h(x) denote the stratum containing x. That is, x ∈ D_{h(x)}. The mean and variance of f within stratum h are

μ_h = |D_h|^{−1} \int_{D_h} f(x)\,dx,  and    (1.8)

σ_h^2 = |D_h|^{−1} \int_{D_h} (f(x) − μ_h)^2\,dx    (1.9)

respectively. We can write E(Î_STRAT) as

E(\hat{I}_{STRAT}) = \sum_{h=1}^H \frac{|D_h|}{n_h} \sum_{i=1}^{n_h} E(f(x_{hi})) = \sum_{h=1}^H |D_h|\, μ_h = \sum_{h=1}^H \int_{D_h} f(x)\,dx = I,

Figure 1.1: The upper left figure is a simple random sample of 16 points in [0, 1)2 .
The other figures show stratified samples with 4 points from each of 4 strata.

so that stratified sampling is unbiased.

The variance of stratified sampling depends on the allocation of sample size n_h to strata. We will suppose that n_h is allocated proportionally, so that n_h = n|D_h| for the total sample size n. First we note that when x ∼ U[0, 1)^d, then h(x) is a random variable taking the value ℓ with probability |D_ℓ|. Then from a standard variance formula

σ^2 = Var(f(x)) = E(Var(f(x) | h(x))) + Var(E(f(x) | h(x)))    (1.10)

    = \sum_{h=1}^H |D_h|\, σ_h^2 + \sum_{h=1}^H |D_h| (μ_h − I)^2,    (1.11)

so that σ² is a sum of contributions from within and between strata. Now

Var(\hat{I}_{STRAT}) = \sum_{h=1}^H \frac{|D_h|^2}{n_h}\, σ_h^2 = \frac{1}{n} \sum_{h=1}^H |D_h|\, σ_h^2 ≤ \frac{σ^2}{n},    (1.12)

from (1.10).
Equation (1.12) shows that stratified sampling with proportional allocation
does not increase the variance. Proportional allocation is not usually optimal. Op-
timal allocations take nh ∝ |Dh |σh . If estimates of σh are available they can be
used to set nh , but poor estimates of σh could result in stratified sampling with
larger variance than crude MC. We will assume proportional allocation.
A particular form of stratified sampling is well suited to the unit cube. Haber [13] proposes to partition the unit cube [0, 1)^d into H = m^d congruent cubical regions and to take n_h = 1 point from each of them. This stratification is known as jittered sampling in graphics, following Cook, Porter and Carpenter [5].
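A sketch of jittered sampling in Python (an added illustration; the stratum layout follows Haber's scheme, specialized to d = 2):

    import random

    def jittered(k):
        # n = k*k points: one uniform point in each square cell
        # of a k-by-k partition of [0,1)^2.
        return [[(i + random.random()) / k, (j + random.random()) / k]
                for i in range(k) for j in range(k)]

    sample = jittered(4)   # the 16 point layout of Figure 1.1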
Any function that is constant within strata is integrated without error by Î_STRAT. If f is close to such a function, then f is integrated with a small error. Let f̄ be the function defined by f̄(x) = μ_{h(x)}, and define the residual f_RES(x) = f(x) − f̄(x). This decomposition is illustrated in Figure 1.2 for a function on [0, 1). Stratified sampling reduces the Monte Carlo variance from σ²(f)/n to σ²(f_RES)/n.

1.3 Multiple Stratification


Suppose we can afford to sample 16 points in [0, 1)2 . Sampling one point from
each of 16 vertical strata would be a good strategy if the function f depended
primarily on the horizontal coordinate. Conversely if the vertical coordinate is the
more important one, then it would be better to take one point from each of 16
horizontal strata.
It is possible to stratify both ways with the same sample, in what is known as Latin hypercube sampling (McKay, Beckman and Conover [24]) or n-rooks sampling (Shirley [33]).

Figure 1.2: The upper plot shows a piece-wise smooth function f on [0, 1). The step function is the best approximation f̄ to f, in mean square error, among functions constant over intervals [j/10, (j + 1)/10). The lower plot shows the difference f − f̄ using a vertical scale similar to the upper plot.
Figure 1.2: The upper plot shows a piece-wise smooth function f on [0, 1). The
step function is the best approximation f¯ to f , in mean square error, among func-
tions constant over intervals [j/10, (j + 1)/10). The lower plot shows the differ-
ence f − f¯ using a vertical scale similar to the upper plot.

Figure 1.3 shows a set of 16 points in the square that are simultaneously stratified in each of 16 horizontal and vertical strata.
If the function f on [0, 1)2 is dominated by either the horizontal coordinate
or the vertical one, then we’ll get an accurate answer, and we don’t even need to
know which is the dominant variable.

Figure 1.3: The left plot shows 16 points, one in each of 16 vertical strata. The right plot shows the same 16 points; there is one in each of 16 horizontal strata. These points form what is called a Latin hypercube sample, or an n-rooks pattern.

Better yet, suppose that neither variable is dominant but that

f(x) = f_H(x) + f_V(x) + f_{RES}(x)    (1.13)

where f_H depends only on the horizontal variable, f_V depends only on the vertical one, and the residual f_RES is defined by subtraction. Latin hypercube sampling will give an error that is largely unaffected by the additive part f_H + f_V. Stein [37] showed that the variance in Latin hypercube sampling is approximately σ²_RES/n where σ²_RES is the smallest variance of f_RES for any decomposition of the form (1.13). His result is for general d, not just d = 2.


Stratification with proportional allocation is never worse than crude MC. The same is almost true for Latin hypercube sampling. Owen [28] shows that for all n ≥ 2, d ≥ 1 and square integrable f,

Var(\hat{I}_{LHS}) ≤ \frac{σ^2}{n − 1}.

For the worst f, Latin hypercube sampling is like using crude MC with one observation less.
The construction of a Latin hypercube sample requires uniform random permu-
tations. A uniform random permutation of 0 through n − 1 is one for which all n!
possible orderings have the same probability. Devroye [8] gives algorithms for such
random permutations. One choice is to have an array Ai = i for i = 0, . . . , n − 1
and then for j = n − 1 down to 1 swap Aj with Ak where k is uniformly and
randomly chosen from 0 through j.
For j = 1, . . . , d, let π_j be independent uniform random permutations of 0, . . . , n − 1. Let U_{ij} ∼ U[0, 1) independently for i = 1, . . . , n and j = 1, . . . , d, and let X be a matrix with

X_{ij} = \frac{π_j(i − 1) + U_{ij}}{n}.

Then the n rows of X form a Latin hypercube sample. That is, we may take x_i = (X_{i1}, . . . , X_{id}). An integral estimate Î is the same whatever order the f(x_i) are summed. As a consequence we only need to permute d − 1 of the d input variables. We can take π_1(i − 1) = i − 1 to save the cost of one random permutation.
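The construction above translates directly into code. A short sketch (added here for illustration):

    import random

    def latin_hypercube(n, d):
        # X[i][j] = (pi_j(i) + U_ij) / n with independent uniform
        # permutations pi_j; the first column uses the identity.
        X = [[0.0] * d for _ in range(n)]
        for j in range(d):
            perm = list(range(n))
            if j > 0:
                random.shuffle(perm)
            for i in range(n):
                X[i][j] = (perm[i] + random.random()) / n
        return X

    sample = latin_hypercube(16, 2)   # an n-rooks pattern as in Figure 1.3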
Jittered sampling uses n = k² strata arranged in a k by k grid of squares while n-rooks provides simultaneous stratification in both an n by 1 grid and a 1 by n grid. It is natural to wonder which method is better. The answer depends on whether f is better approximated by a step function constant within squares of the 1/k × 1/k grid, or by an additive function with each term constant within narrower bins of width 1/n. Amazingly, we don't have to choose. It is possible to arrange n = k² points in an n-rooks arrangement that simultaneously has one point in each square of a k by k grid. A construction for this was proposed independently by Chiu, Shirley and Wang [2] and by Tang [38]. The former handles more general grids of n = k_1 × k_2 points. The latter reference arranges points in [0, 1)^d with d ≥ 2 in a Latin hypercube such that every two dimensional projection of x_i puts one point into each cell of a grid of strata.

1.4 Uniformity and Discrepancy


The previous sections look at stratifications in which every cell in a rectangular
grid or indeed in multiple rectangular grids gets the proper number of points. It
is clear that a finite number of points in [0, 1)d cannot be simultaneously stratified
with respect to every hyper-rectangular subset of [0, 1)d , yet it is interesting to ask
how far we might be able to go in that direction. This is a problem that has been
studied since Weyl [44] originated his theory of uniform distribution. Kuipers and
Niederreiter [21] summarize that theory.
Let a and c be points in [0, 1)d for which a < c holds componentwise, and then
let [a, c) denote the box of points x where a ≤ x < c holds componentwise. We
use |[a, c)| to denote the d-dimensional volume of this box.
An infinite sequence of points x_1, x_2, · · · ∈ [0, 1)^d is uniformly distributed if \lim_{n\to\infty} (1/n) \sum_{i=1}^n 1_{a ≤ x_i < c} = |[a, c)| holds for all boxes. This means that Î_n → I for every function f(x) of the form 1_{a ≤ x < c} and so for any finite linear combination of such indicators of boxes. It is known that \lim_{n\to\infty} |Î_n − I| = 0 for uniformly distributed x_i and any function f that is Riemann integrable. Thus uniformly distributed sequences can be used to provide a deterministic law of large numbers.
To show that a sequence is uniformly distributed it is enough to show that Î_n → I when f is the indicator of a suitable subset of boxes. Anchored boxes take the form [0, a) for some point a ∈ [0, 1)^d. If Î_n → I for all indicators of anchored boxes, then the same holds for all boxes. For integers b ≥ 2, a b-adic box is a Cartesian product of the form

\prod_{j=1}^d \left[ \frac{ℓ_j}{b^{k_j}},\; \frac{ℓ_j + 1}{b^{k_j}} \right)    (1.14)

for integers k_j ≥ 0 and 0 ≤ ℓ_j < b^{k_j}. When b = 2 the box is called dyadic. An arbitrary box can be approximated by b-ary boxes. If Î → I for all indicators of b-adic boxes then the sequence (x_i) is uniformly distributed. A mathematically more interesting result is the Weyl condition. The sequence (x_i) is uniformly distributed if and only if Î_n → I for all trigonometric polynomials f(x) = e^{2π\sqrt{−1}\, k · x} where k ∈ Z^d.
If x_i are independent U[0, 1)^d variables, then (x_i) is uniformly distributed with probability one. Of course we hope to do better than random points. To that end, we need a numerical measure of how uniformly distributed a sequence of points is. These measures are called discrepancies, and there are a great many of them. One of the simplest is the star discrepancy

D_n^* = D_n^*(x_1, . . . , x_n) = \sup_{a ∈ [0,1)^d} \left| \frac{1}{n} \sum_{i=1}^n 1_{0 ≤ x_i < a} − |[0, a)| \right|    (1.15)

Figure 1.4 illustrates this discrepancy. It shows an anchored box [0, a) ⊂ [0, 1)² and a list of n = 20 points. The anchored box contains 5 of the 20 points so (1/n) \sum_{i=1}^n 1_{0 ≤ x_i < a} = 0.20. The volume of the anchored box is 0.21, so the difference is |0.20 − 0.21| = 0.01. The star discrepancy D_n^* is found by maximizing this difference over all anchored boxes [0, a).

Figure 1.4: Shown are 20 points in the unit square and an anchored box (shaded) from (0, 0) to a = (.3, .7). The anchored box [0, a) has volume 0.21 and contains a fraction 5/20 = 0.2 of the points.
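Exact computation of D_n^* is expensive, but in two dimensions the supremum in (1.15) is approached at corners a built from the point coordinates. The following sketch (an added illustration, giving a lower bound rather than the exact value) scans those candidate corners:

    def star_discrepancy_lb(pts):
        # Lower bound on D_n* for points in [0,1)^2: evaluate the local
        # discrepancy at candidate corners formed by observed coordinates.
        n = len(pts)
        xs = sorted({p[0] for p in pts} | {1.0})
        ys = sorted({p[1] for p in pts} | {1.0})
        worst = 0.0
        for ax in xs:
            for ay in ys:
                count = sum(1 for p in pts if p[0] < ax and p[1] < ay)
                worst = max(worst, abs(count / n - ax * ay))
        return worst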
For x_i ∼ U[0, 1)^d, Chung [3] showed that

\limsup_{n\to\infty} \frac{\sqrt{2n}\, D_n^*}{\sqrt{\log\log n}} = 1    (1.16)

so D_n^* = O((\log\log n / n)^{1/2}) with probability one. An iterated logarithm grows slowly with n, so D_n^* may be only slightly larger than n^{−1/2} for large n.
It is known that a deterministic choice of (x_i) can yield D_n^* much smaller than (1.16). There are infinite sequences (x_i) in [0, 1)^d with D_n^*(x_1, . . . , x_n) = O((\log n)^d / n). Such sequences are called "low discrepancy" sequences, and some of them are described in section 1.5. It is suspected but not proven that infinite sequences cannot be constructed with D_n^* = o((\log n)^d / n); see Beck and Chen [1].

In an infinite sequence, the first m points of x_1, . . . , x_n are the same for any n ≥ m. If we knew in advance the value of n that we wanted then we might use a sequence customized for that value of n, such as x_{n1}, . . . , x_{nn} ∈ [0, 1)^d, without insisting that x_{ni} = x_{n+1,i}. In this setting D_n^*(x_{n1}, . . . , x_{nn}) = O((\log n)^{d−1} / n) is possible. The effect is like reducing d by one, but the practical cost is that such a sequence is not extensible to larger n.
There is a connection between better discrepancy and more accurate integration. Hlawka [16] proved the Koksma-Hlawka inequality

|\hat{I} − I| ≤ D_n^*(x_1, . . . , x_n)\, V_{HK}(f).    (1.17)

The factor V_HK(f) is the total variation of f in the sense of Hardy and Krause. Niederreiter [26] gives the definition.
Equation (1.17) shows that a deterministic law of large numbers can be much better than the random one, for large enough n and a function f with finite variation V_HK(f). One often does see QMC methods performing much better than MC, but equation (1.17) is not good for predicting when this will happen. The problem is that D_n^* is hard to compute, V_HK(f) is harder still, and the bound (1.17) can grossly overestimate the error. In some cases V_HK is infinite while QMC still beats MC. Schlier [32] reports that even for QMC the variance of f is more strongly related to the error than is the variation.

1.5 Digital Nets and Related Methods


Niederreiter [26] presents a comprehensive account of digital nets and sequences. We will define them below, but first we illustrate a construction for d = 1.
The simplest digital nets are the radical inverse sequences initiated by van der Corput [40, 41]. Let b ≥ 2 be an integer base. The non-negative integer n can be written as \sum_{k=1}^∞ n_k b^{k−1} where n_k ∈ {0, 1, . . . , b − 1} and only finitely many n_k are not zero. The base b radical inverse function is φ_b(n) = \sum_{k=1}^∞ n_k b^{−k} ∈ [0, 1). A radical inverse sequence consists of φ_b(i) for n consecutive values of i, conventionally 0 through n − 1.

    ℓ    ℓ base 2    φ₂(ℓ) base 2    φ₂(ℓ)
    0       0.          0.000        0.000
    1       1.          0.100        0.500
    2      10.          0.010        0.250
    3      11.          0.110        0.750
    4     100.          0.001        0.125
    5     101.          0.101        0.625
    6     110.          0.011        0.375
    7     111.          0.111        0.875

Table 1.1: The first column shows integers ℓ from 0 to 7. The second column shows ℓ in base 2. The third column reflects the digits of ℓ through the binary point to construct φ₂(ℓ). The final column is the decimal version of φ₂(ℓ).
Table 1.1 illustrates a radical inverse sequence, using b = 2 as van der Corput did. Because consecutive integers alternate between even and odd, the van der Corput sequence alternates between values in [0, 1/2) and [1/2, 1). Among any 4 consecutive van der Corput points there is exactly one in each interval [k/4, (k + 1)/4) for k = 0, 1, 2, 3. Similarly any b^m consecutive points from the radical inverse sequence in base b are stratified with respect to b^m congruent intervals of length 1/b^m.
If d > 1 then it would be a serious mistake to simply replace a stream of pseudo-random numbers by the van der Corput sequence. For example with d = 2, taking points x_i = (φ₂(2i − 2), φ₂(2i − 1)) ∈ [0, 1)² we would find that all x_i lie on a diagonal line with slope 1 inside [0, 1/2) × [1/2, 1).
For d > 1 we really need a stream of quasi-random d-vectors. There are several ways to generalize the van der Corput sequence to d ≥ 1. The Halton [14] sequence in [0, 1)^d works with d relatively prime bases b_1, . . . , b_d. Usually these are the first d prime numbers. Then for i ≥ 1,

x_i = (φ_2(i − 1), φ_3(i − 1), φ_5(i − 1), . . . , φ_{b_d}(i − 1)) ∈ [0, 1)^d.

The Halton sequence has low discrepancy: D_n^* = O((\log n)^d / n).
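A sketch of the radical inverse and the Halton sequence in Python (added here for illustration):

    def radical_inverse(b, n):
        # phi_b(n): reflect the base-b digits of n about the radix point.
        x, base = 0.0, 1.0 / b
        while n > 0:
            n, digit = divmod(n, b)
            x += digit * base
            base /= b
        return x

    def halton(i, primes=(2, 3, 5)):
        # The i'th Halton point (i >= 1) in d = len(primes) dimensions.
        return [radical_inverse(b, i - 1) for b in primes]

    print(halton(2))   # [0.5, 0.333..., 0.2]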
The Halton sequence is extensible in both n and d. For small d the points of the Halton sequence have a nearly uniform distribution. The left panel of Figure 1.5 shows a two dimensional portion of the Halton sequence using prime bases 2 and 3. The second panel shows the same points for bases 29 and 31 as would be needed with d = 11. While they are nearly uniform in one dimensional problems, their two dimensional uniformity is seriously lacking. When it is possible to identify the more important components of x, these should be sampled using the smaller prime bases.

Figure 1.5: The left panel shows the first 2^3 × 3^2 = 72 points of the Halton sequence using bases 2 and 3. The middle panel shows the first 72 points for the 10'th and 11'th primes, 29 and 31 respectively. The right panel shows these 72 points after Faure's [11] permutation is applied.
The poorer distribution for larger primes can be mitigated using a permutation of Faure [11]. Let π be a permutation of {0, . . . , b − 1}. Then the radical inverse function can be generalized to φ_{b,π}(n) = \sum_{k=1}^∞ π(n_k) b^{−k}. It still holds that any consecutive b^m values of φ_{b,π}(i) stratify into b^m boxes of length 1/b^m. Faure's transformation π_b of 0, . . . , b − 1 is particularly simple. Let π_2 = (0, 1). For even b > 2 take π_b = (2π_{b/2}, 2π_{b/2} + 1), so π_4 = (0, 2, 1, 3). For odd b > 2 put k = (b − 1)/2 and η = π_{b−1}. Then add 1 to any member of η greater than or equal to k. Then π_b = (η(0), . . . , η(k − 1), k, η(k), . . . , η(b − 2)). For example with b = 5 we get k = 2, and after the larger elements are incremented, η = (0, 3, 1, 4). Finally π_5 = (0, 3, 2, 1, 4). The third plot in Figure 1.5 shows the effect of Faure's permutations on the Halton sequence.
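Both the recursion for π_b and the permuted radical inverse are short in code (an added sketch; note that Faure's permutations fix 0, which keeps the digit expansion finite):

    def faure_perm(b):
        # Faure's permutation pi_b of {0, ..., b-1}.
        if b == 2:
            return [0, 1]
        if b % 2 == 0:
            h = faure_perm(b // 2)
            return [2 * x for x in h] + [2 * x + 1 for x in h]
        k = (b - 1) // 2
        eta = [x + 1 if x >= k else x for x in faure_perm(b - 1)]
        return eta[:k] + [k] + eta[k:]

    def radical_inverse_perm(b, n, pi):
        # phi_{b,pi}(n); assumes pi[0] == 0, true for Faure's permutations.
        x, base = 0.0, 1.0 / b
        while n > 0:
            n, digit = divmod(n, b)
            x += pi[digit] * base
            base /= b
        return x

    print(faure_perm(5))   # [0, 3, 2, 1, 4], as in the text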
Digital nets provide more satisfactory generalizations of radical inverse sequences to d ≥ 2. Recall the b-ary boxes in (1.14). The box there has volume b^{−K} where K = k_1 + · · · + k_d. Ideally we would like n b^{−K} points in every such box. Digital nets do this, at least for small enough K.

Let b ≥ 2 be an integer base and let m ≥ t ≥ 0 be integers. A (t, m, d)–net in base b is a finite sequence x_1, . . . , x_{b^m} for which every b-ary box of volume b^{t−m} contains exactly b^t points of the sequence.

Clearly t = 0 corresponds to better stratification. For given values of b, m, and d, particularly for large d, there may not exist a net with t = 0, and so nets with t > 0 are widely used.

Faure [10] provides a construction of (0, m, p)–nets in base p where p is a prime number. The first component of these nets is the radical inverse function in base p applied to 0 through b^m − 1. Figure 1.6 shows 81 points of a (0, 4, 2)–net in base 3. There are 5 different shapes of 3-ary box with volume 1/81. The aspect ratios are 1 × 1/81, 1/3 × 1/27, 1/9 × 1/9, 1/27 × 1/3, and 1/81 × 1. Latin hypercube samples of 81 points balance the first and last of these, jittered sampling balances the third, while multi-jittered sampling balances the first, third, and fifth. A (0, 4, 2)–net balances 81 different 3-ary boxes of each of these 5 aspect ratios. If f is well approximated by a sum of the corresponding 405 indicator functions, then |Î − I| will be small.
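Faure's construction can be sketched with Pascal-matrix arithmetic mod p: coordinate j of point i has digit vector C^(j) a mod p, where a holds the base-p digits of i and C^(j)[r][k] = binom(k, r) j^{k−r}. The code below is an added illustration under that assumption, valid for d ≤ p:

    from math import comb

    def faure_point(i, d, p, m):
        # Point i of a Faure net; j = 0 gives the radical inverse phi_p(i).
        a = [(i // p**k) % p for k in range(m)]      # base-p digits of i
        point = []
        for j in range(d):
            y = [sum(comb(k, r) * pow(j, k - r, p) * a[k]
                     for k in range(r, m)) % p
                 for r in range(m)]
            point.append(sum(dig / p**(r + 1) for r, dig in enumerate(y)))
        return point

    # The 81 points of the (0, 4, 2)-net in base 3 shown in Figure 1.6:
    net = [faure_point(i, 2, 3, 4) for i in range(81)]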

The extensible version of a digital net is a digital sequence. A (t, d)–sequence in base b is an infinite sequence (x_i) for i ≥ 1 such that for all integers r ≥ 0 and m ≥ t, the points x_{rb^m + 1}, . . . , x_{(r+1)b^m} form a (t, m, d)–net in base b. This sequence can be expressed as an infinite stream of (t, m, d)–nets, simultaneously for all m ≥ t. Faure [10] provided a construction of (0, p)–sequences in base p. Niederreiter [25] showed that the construction can be extended to (0, q)–sequences in base q where q = p^r is a power of a prime p. The Faure net shown in Figure 1.6 is in fact the first 81 points of the first two variables in a (0, 3)–sequence in base 3.

For m ≥ t and 1 ≤ λ < b, the first λb^m points in a (t, d)–sequence are balanced with respect to all b-ary boxes of volume b^{t−m} or larger. If n is not of the form λb^m, then the points do not necessarily balance any non-trivial b-ary boxes.

The Faure sequence, and Niederreiter's generalization of it, require b ≥ d. When the dimension is large it becomes necessary to use a large base b, and then either b^m is very large, or m is very small. Then the Sobol' [35] sequences become attractive. They are (t, d)–sequences in base b = 2. The quality parameter t depends on d. Niederreiter [25] combined the methods of Sobol' and Faure, generating new sequences. Any (t, d)–sequence is a low discrepancy sequence, as shown in Niederreiter [26].
Figure 1.6: Shown are 81 points of a (0, 4, 2)–net in base 3. Reference lines are included to make the 3-ary boxes more visible. There are 5 different shapes of 3-ary box balanced by these points. One box of each shape is highlighted.

1.6 Integration Lattices


In addition to digital nets and sequences, there is a second major QMC technique, known as integration lattices. The simplest example of an integration lattice is a rank one lattice. These take the form

x_i = \frac{(i − 1)(g_1, . . . , g_d) \bmod n}{n}    (1.18)



Figure 1.7: Shown are the points of two integration lattices in the unit square. The
lattice on the right has much better uniformity, showing the importance of making
a good choice of lattice generator.

for i = 1, . . . , n. Usually g_1 = 1. Figure 1.7 shows two integration lattices in [0, 1)² with n = 89. The first has g_2 = 22 and the second one has g_2 = 55.
It is clear that the second lattice in Figure 1.7 is more evenly distributed than
the first one. The method of good lattice points is the application of the rule (1.18)
with n and g carefully chosen to get good uniformity. Fang and Wang [9] and
Hua and Wang [18] describe construction and use of good lattice points, including
extensive tables of n and g.
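Generating a rank one lattice is a one-liner; the hard work is choosing g. A sketch (added here) reproducing the two point sets of Figure 1.7:

    def rank1_lattice(n, g):
        # x_i = ((i-1) * g mod n) / n, following (1.18), for i = 1, ..., n.
        return [[((i * gj) % n) / n for gj in g] for i in range(n)]

    poor = rank1_lattice(89, [1, 22])   # left panel of Figure 1.7
    good = rank1_lattice(89, [1, 55])   # right panel, much more uniform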
Sloan and Joe [34] describe integration lattices in general, including lattices of rank higher than 1. A lattice of rank r for 1 ≤ r ≤ d requires r vectors like g to generate it. The origin of lattice methods is in Korobov [20]. Korobov's rules have g = (1, h, h^2, . . . , h^{d−1}) so that the search for a good rule requires only a careful choice of two numbers n and h.
Until recently, integration lattices were not extensible. Extensible integration
lattices are a research topic of current interest, following the publication of Hick-
ernell, Hong, L’Ecuyer and Lemieux [15].
Integration lattices are not as widely used in computer graphics as digital nets.
Their periodic structure is likely to produce unwanted aliasing artifacts, at least in
some applications. Compared to digital nets, integration lattices are very good at
integrating smooth functions, especially smooth periodic functions.

1.7 Randomized Quasi-Monte Carlo


QMC methods may be thought of as derandomized MC. Randomized QMC (RQMC)
methods re-randomize them. The original motivation is to get sample based error
estimates.
In RQMC, one takes a QMC sequence (a_i) and transforms it into random points (x_i) such that the x_i retain a QMC property and the expectation of Î is I. The simplest way to achieve the latter property is to have each x_i ∼ U[0, 1)^d. With RQMC we can repeat a QMC integration R times independently, getting Î_1, . . . , Î_R. The combined estimate Î = (1/R) \sum_{r=1}^R Î_r has expected value I, and an unbiased estimate of E((Î − I)²) is

\frac{1}{R(R − 1)} \sum_{r=1}^R (\hat{I}_r − \hat{I})^2.
Cranley and Patterson [6] proposed a rotation modulo one

x_i = a_i + U \mod 1

where U ∼ U[0, 1)^d and both addition and remainder modulo one are interpreted componentwise. It is easy to see that each x_i ∼ U[0, 1)^d. Cranley and Patterson proposed rotations of integration lattices. Tuffin [39] considered applying such rotations to digital nets. They don't remain nets, but they still look very uniform.
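A sketch of a Cranley-Patterson rotation together with replicated error estimation (an added illustration of the recipe above):

    import random

    def cp_rotate(points, d):
        # Shift every point by one U[0,1)^d vector, modulo 1 componentwise.
        u = [random.random() for _ in range(d)]
        return [[(xj + u[j]) % 1.0 for j, xj in enumerate(x)] for x in points]

    def rqmc(f, points, d, R=10):
        # R independent rotations give an estimate of I and of its RMSE.
        ests = []
        for _ in range(R):
            shifted = cp_rotate(points, d)
            ests.append(sum(f(x) for x in shifted) / len(shifted))
        mean = sum(ests) / R
        mse = sum((e - mean) ** 2 for e in ests) / (R * (R - 1))
        return mean, mse ** 0.5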
Owen [27] proposes a scrambling of the base b digits of a_i. Suppose that a_i is the i'th row of the matrix A with entries A_{ij} for j = 1, . . . , d, and either i = 1, . . . , n for a finite sequence or i ≥ 1 for an infinite one. Let A_{ij} = \sum_{k=1}^∞ b^{−k} a_{ijk} where a_{ijk} ∈ {0, 1, . . . , b − 1}. Now let x_{ijk} = π_{j·a_{ij1}···a_{ij,k−1}}(a_{ijk}), where π_{j·a_{ij1}···a_{ij,k−1}} is a uniform random permutation of 0, . . . , b − 1, and take X_{ij} = \sum_{k=1}^∞ b^{−k} x_{ijk}. All the permutations required are independent, and the permutation applied to the k'th digits of A_{ij} depends on j and on the preceding k − 1 digits.

Applying this scrambling to any point a ∈ [0, 1)^d produces a point x ∼ U[0, 1)^d. If (a_i) is a (t, m, d)–net in base b or a (t, d)–sequence in base b, then with probability 1, the same holds for the scrambled version (x_i). The scrambling described above requires a great many permutations. Random linear scrambling is a partial derandomization of scrambled nets, given by Matoušek [23] and also in Hong and Hickernell [17]. Random linear scrambling significantly reduces the number of permutations required from O(db^m) to O(dm²).
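Nested uniform scrambling can be coded directly by storing one random permutation per (coordinate, digit-prefix) pair; the sketch below (an added illustration) truncates the digit expansion at a fixed depth:

    import random

    def owen_scramble(points, b=2, depth=32):
        perms = {}
        def perm(key):
            # One independent uniform permutation of 0..b-1 per key.
            if key not in perms:
                p = list(range(b))
                random.shuffle(p)
                perms[key] = p
            return perms[key]
        out = []
        for x in points:
            z = []
            for j, xj in enumerate(x):
                y, key = 0.0, (j,)
                for k in range(depth):
                    xj *= b
                    digit = int(xj)          # k'th base-b digit of x_ij
                    xj -= digit
                    y += perm(key)[digit] * b ** -(k + 1)
                    key = key + (digit,)     # condition on earlier digits
                z.append(y)
            out.append(z)
        return out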
For integration over a scrambled digital sequence we have Var(Î) = o(1/n) for any f with σ² < ∞. Thus for large enough n a better than MC result will be obtained. For integration over a scrambled (0, m, d)–net Owen [28] shows that

Var(\hat{I}) ≤ \left( \frac{b}{b − 1} \right)^{\min(d−1,\, m)} \frac{σ^2}{n} ≤ 2.72\, \frac{σ^2}{n}.

That is, scrambled (0, m, d)–nets cannot have more than e = exp(1) ≈ 2.72 times the Monte Carlo variance for finite n. For nets in base b = 2 and t ≥ 0, Owen [30] shows that

Var(\hat{I}) ≤ 2^t 3^d\, \frac{σ^2}{n}.
Compared to QMC, we expect RQMC to do no harm. After all, the resulting x_i still have a QMC structure, and so the RMSE should be O(n^{−1}(\log n)^d). Some forms of RQMC reduce the RMSE to O(n^{−3/2}(\log n)^{(d−1)/2}) for smooth enough f. This can be understood as random errors cancelling where deterministic ones do not. Surveys of RQMC appear in Owen [31] and L'Ecuyer and Lemieux [22].

1.8 Padding and Latin Supercube Sampling


In some applications d is so large that it becomes problematic to construct a mean-
ingful QMC sequence. For example the number of random vectors needed to fol-
low a single light path in a scene with many reflective objects can be very large
and may not have an a priori bound. As another example, if acceptance-rejection
sampling (Devroye [8]) is used to generate a random variable then a large number
of random variables may need to be generated in order to produce that variable.
Padding is a simple expedient solution to the problem. One uses a QMC or
RQMC sequence in dimension s for what one expects are the s most important in-
put variables. Then one pads out the input with d − s independent U [0, 1) random
variables. This technique was used in Spanier [36] for particle transport simula-
tions. It is also possible to pad with a d − s dimensional Latin hypercube sample
as described in Owen [29], even when d is conceptually infinite.
In Latin supercube sampling, the d input variables of x_i are partitioned into some number k of groups. The j'th group has dimension d_j ≥ 1 and of course \sum_{j=1}^k d_j = d. A QMC or RQMC method is applied in each of the k groups.
Just as the van der Corput sequence cannot simply be substituted for a pseudo-
random generator, care has to be taken in using multiple (R)QMC methods within
the same problem. It would not work to take k independent randomizations of
the same QMC sequence. The fix is to randomize the run order of the k groups
relative to each other, just as Latin hypercube sampling randomizes the run order
of d stratified samples.
To describe LSS, for j = 1, . . . , k and i = 1, . . . , n let a_{ji} ∈ [0, 1)^{d_j}. Suppose that a_{j1}, . . . , a_{jn} are a (R)QMC point set. For j = 1, . . . , k, let π_j(i) be independent uniform permutations of 1, . . . , n. Then let x_{ji} = a_{j,π_j(i)}. The LSS has rows x_i comprised of x_{1i}, . . . , x_{ki}. Owen [29] shows that in Latin supercube sampling the function f can be written as a sum of two parts. One, from within groups of variables, is integrated with an (R)QMC error rate, while the other part, from between groups of variables, is integrated at the Monte Carlo rate. Thus a good grouping of variables is important, as is a good choice of (R)QMC within groups.
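A sketch of Latin supercube sampling (added for illustration): each group supplies its own n-point (R)QMC set, and only the run orders are randomized before the groups are concatenated:

    import random

    def latin_supercube(groups):
        # groups[j]: a list of n points in [0,1)^{d_j} from a (R)QMC rule.
        n = len(groups[0])
        out = [[] for _ in range(n)]
        for pts in groups:
            order = list(range(n))
            random.shuffle(order)            # the permutation pi_j
            for i in range(n):
                out[i].extend(pts[order[i]])
        return out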
Bibliography

[1] J. Beck and W. W. L. Chen. Irregularities of Distribution. Cambridge Uni-


versity Press, New York, 1987.

[2] Kenneth Chiu, Peter Shirley, and Changyaw Wang. Multi-jittered sampling.
In Paul Heckbert, editor, Graphics Gems IV, pages 370–374. Academic Press,
Boston, 1994.

[3] K.-L. Chung. An estimate concerning the Kolmogoroff limit distribution.


Transactions of the American Mathematical Society, 67:36–50, 1949.

[4] William G. Cochran. Sampling Techniques (3rd Ed). John Wiley & Sons,
1977.

[5] Robert L. Cook, Thomas Porter, and Loren Carpenter. Distributed ray trac-
ing. Computer Graphics, 18(4):165–174, July 1984. ACM Siggraph ’84
Conference Proceedings.

[6] R. Cranley and T.N.L. Patterson. Randomization of number theoretic meth-


ods for multiple integration. SIAM Journal of Numerical Analysis, 13:904–
914, 1976.

[7] P. J. Davis and P. Rabinowitz. Methods of Numerical Integration (2nd Ed.).


Academic Press, San Diego, 1984.

[8] Luc Devroye. Non-uniform Random Variate Generation. Springer, 1986.

[9] Kai-Tai Fang and Yuan Wang. Number Theoretic Methods in Statistics. Chap-
man and Hall, London, 1994.

[10] Henri Faure. Discrépance de suites associées à un système de numération (en


dimension s). Acta Arithmetica, 41:337–351, 1982.

[11] Henri Faure. Good permutations for extreme discrepancy. Journal of Number
Theory, 42:47–56, 1992.

[12] G. Fishman. Monte Carlo: Concepts, Algorithms, and Applications.


Springer-Verlag, 1995.

[13] S. Haber. A modified Monte Carlo quadrature. Mathematics of Computation,


20:361–368, 1966.

[14] J.H. Halton. On the efficiency of certain quasi-random sequences of points


in evaluating multi-dimensional integrals. Numerische Mathematik, 2:84–90,
1960.

[15] F. J. Hickernell, H. S. Hong, P. L’Ecuyer, and C. Lemieux. Extensible lat-


tice sequences for quasi-Monte Carlo quadrature. SIAM Journal on Scientific
Computing, 22(3):1117–1138, 2000.

[16] E. Hlawka. Funktionen von beschränkter Variation in der Theorie der Gle-
ichverteilung. Annali di Matematica Pura ed Applicata, 54:325–333, 1961.

[17] H. S. Hong and F. J. Hickernell. Implementing scrambled digital sequences. ACM Transactions on Mathematical Software, 2003. To appear.

[18] L.K. Hua and Y. Wang. Applications of number theory to numerical analysis.
Springer, Berlin, 1981.

[19] Alexander Keller. A quasi-Monte Carlo algorithm for the global illumination
problem in a radiosity setting. In Harald Niederreiter and Peter Jau-Shyong
Shiue, editors, Monte Carlo and Quasi-Monte Carlo Methods in Scientific
Computing, pages 239–251, New York, 1995. Springer-Verlag.

[20] N. M. Korobov. The approximate computation of multiple integrals. Dokl.


Akad. Nauk SSSR, 124:1207–1210, 1959.

[21] L. Kuipers and H. Niederreiter. Uniform Distribution of Sequences. John Wiley and Sons, New York, 1976.

[22] P. L’Ecuyer and C. Lemieux. A survey of randomized quasi-Monte Carlo


methods. In M. Dror, P. L’Ecuyer, and F. Szidarovszki, editors, Modeling
Uncertainty: An Examination of Stochastic Theory, Methods, and Applica-
tions, pages 419–474. Kluwer Academic Publishers, 2002.
[23] J. Matoušek. On the L2 –discrepancy for anchored boxes. Journal of Com-
plexity, 14:527–556, 1998.

[24] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three


methods for selecting values of input variables in the analysis of output from
a computer code. Technometrics, 21(2):239–45, 1979.

[25] Harald Niederreiter. Point sets and sequences with small discrepancy. Monatshefte für Mathematik, 104:273–337, 1987.

[26] Harald Niederreiter. Random Number Generation and Quasi-Monte Carlo


Methods. S.I.A.M., Philadelphia, PA, 1992.

[27] A. B. Owen. Randomly permuted (t, m, s)-nets and (t, s)-sequences. In


Harald Niederreiter and Peter Jau-Shyong Shiue, editors, Monte Carlo and
Quasi-Monte Carlo Methods in Scientific Computing, pages 299–317, New
York, 1995. Springer-Verlag.

[28] A. B. Owen. Monte Carlo variance of scrambled equidistribution quadrature.


SIAM Journal of Numerical Analysis, 34(5):1884–1910, 1997.

[29] A. B. Owen. Latin supercube sampling for very high dimensional simula-
tions. ACM Transactions on Modeling and Computer Simulation, 8(2):71–
102, 1998.

[30] Art B. Owen. Scrambling Sobol’ and Niederreiter-Xing points. Journal of


Complexity, 14(4):466–489, December 1998.

[31] Art B. Owen. Monte Carlo quasi-Monte Carlo and randomized quasi-Monte
Carlo. In H. Niederreiter and J. Spanier, editors, Monte Carlo and quasi-
Monte Carlo Methods 1998, pages 86–97, 1999.

[32] Ch. Schlier. A practitioner's view on QMC integration. Mathematics and Computers in Simulation, 2002.

[33] P. Shirley. Discrepancy as a quality measure for sample distributions.


In Werner Purgathofer, editor, Eurographics ’91, pages 183–194. North-
Holland, September 1991.

[34] Ian H. Sloan and S. Joe. Lattice Methods for Multiple Integration. Oxford
Science Publications, Oxford, 1994.
[35] I. M. Sobol’. The distribution of points in a cube and the accurate evaluation
of integrals (in Russian). Zh. Vychisl. Mat. i Mat. Phys., 7:784–802, 1967.

[36] J. Spanier. Quasi-Monte Carlo Methods for Particle Transport Problems. In


Harald Niederreiter and Peter Jau-Shyong Shiue, editors, Monte Carlo and
Quasi-Monte Carlo Methods in Scientific Computing, pages 121–148, New
York, 1995. Springer-Verlag.

[37] Michael Stein. Large sample properties of simulations using Latin hypercube
sampling. Technometrics, 29(2):143–51, 1987.

[38] Boxin Tang. Orthogonal array-based Latin hypercubes. Journal of the Amer-
ican Statistical Association, 88:1392–1397, 1993.

[39] Bruno Tuffin. On the use of low discrepancy sequences in Monte Carlo meth-
ods. Technical Report 1060, I.R.I.S.A., Rennes, France, 1996.

[40] J. G. van der Corput. Verteilungsfunktionen I. Nederl. Akad. Wetensch. Proc.,


38:813–821, 1935.

[41] J. G. van der Corput. Verteilungsfunktionen II. Nederl. Akad. Wetensch.


Proc., 38:1058–1066, 1935.

[42] Eric Veach and Leonidas Guibas. Bidirectional estimators for light transport.
In 5th Annual Eurographics Workshop on Rendering, pages 147–162, June
13–15 1994.

[43] Eric Veach and Leonidas Guibas. Optimally combining sampling techniques
for Monte Carlo rendering. In SIGGRAPH ’95 Conference Proceedings,
pages 419–428. Addison-Wesley, August 1995.

[44] H. Weyl. Über die Gleichverteilung von Zahlen mod. Eins. Mathematische Annalen, 77:313–352, 1916.
