In Monte Carlo (MC) sampling the sample averages of random quantities are used
to estimate the corresponding expectations. The justification is through the law of
large numbers. In quasi-Monte Carlo (QMC) sampling we are able to get a law
of large numbers with deterministic inputs instead of random ones. Naturally we
seek deterministic inputs that make the answer converge as quickly as possible. In
particular it is common for QMC to produce much more accurate answers than MC
does. Keller [19] was an early proponent of QMC methods for computer graphics.
We begin by reviewing Monte Carlo sampling and showing how many problems can be reduced to integrals over the unit cube $[0,1)^d$. Next we consider how stratification methods, such as jittered sampling, can improve the accuracy of Monte Carlo for favorable functions while doing no harm for unfavorable ones. Methods of multiple stratification such as Latin hypercube sampling (n-rooks) represent a significant improvement on stratified sampling. These stratification methods balance the sampling points with respect to a large number of hyperrectangular boxes. QMC may be thought of as an attempt to take this to the logical limit: how close can we get to balancing the sample points with respect to every box in $[0,1)^d$ at once? The answer, provided by the theory of discrepancy, is that we can get surprisingly far, and that the result is a significant improvement compared to MC. This chapter concludes with a presentation of digital nets, integration lattices and randomized QMC.
1.1 Crude Monte Carlo
As a frame of reference for QMC, we recap the basics of MC. Suppose that the average we want to compute is written as an integral
$$I = \int_D f(x)\, q(x)\, dx. \tag{1.1}$$
The set $D \subseteq \mathbb{R}^d$ is the domain of interest, perhaps a region on the unit sphere or in the unit cube. The function $q$ is a probability density function on $D$. That is, $q(x) \ge 0$ and $\int_D q(x)\, dx = 1$. The function $f$ gives the quantity whose expectation we seek: $I$ is the expected value of $f(x)$ for random $x$ with density $q$ on $D$.
In crude Monte Carlo sampling we generate $n$ independent samples $x_1, \ldots, x_n$ from the density $q$ and estimate $I$ by
$$\hat I = \hat I_n = \frac{1}{n} \sum_{i=1}^{n} f(x_i). \tag{1.2}$$
By the strong law of large numbers,
$$\Pr\Bigl( \lim_{n \to \infty} \hat I_n = I \Bigr) = 1. \tag{1.3}$$
That is, crude Monte Carlo always converges to the right answer as $n$ increases without bound.
Now suppose that $f$ has finite variance $\sigma^2 = \mathrm{Var}(f(x)) \equiv \int_D (f(x) - I)^2\, q(x)\, dx$. Then $E((\hat I_n - I)^2) = \sigma^2 / n$, so the root mean square error (RMSE) of MC sampling is $O(1/\sqrt{n})$. This rate is slow compared to that of classical quadrature rules (Davis and Rabinowitz [7]) for smooth functions in low dimensions. Monte Carlo methods can improve on classical ones for problems in high dimensions or on discontinuous functions.
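To make (1.2) concrete, here is a minimal sketch in Python/NumPy. The function names and the sampler interface are ours, for illustration only:

```python
import numpy as np

def crude_mc(f, sample_q, n, rng=None):
    """Crude Monte Carlo estimate (1.2): average f over x_i ~ q.

    f: vectorized integrand mapping an (n, d) array to n values.
    sample_q: hypothetical helper drawing n points from density q.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = sample_q(n, rng)
    return f(x).mean()

# Example: I = E(x1 * x2) for x uniform on [0, 1)^2; true value 1/4.
est = crude_mc(lambda x: x[:, 0] * x[:, 1],
               lambda n, rng: rng.random((n, 2)), 10**5)
```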
A given integration problem can be written in the form (1.1) in many different ways. First, let $p$ be a probability density on $D$ such that $p(x) > 0$ whenever $q(x)|f(x)| > 0$. Then
$$I = \int_D f(x)\, q(x)\, dx = \int_D \frac{f(x)\, q(x)}{p(x)}\, p(x)\, dx,$$
and we could as well sample $x_i \sim p(x)$ and estimate $I$ by
$$\hat I_p = \hat I_{n,p} = \frac{1}{n} \sum_{i=1}^{n} \frac{f(x_i)\, q(x_i)}{p(x_i)}. \tag{1.4}$$
The RMSE can be strongly affected, for better or worse, by this re-expression, known as importance sampling. If we are able to find a good $p$ that is nearly proportional to $fq$ then we can get much better estimates.
Making a good choice of density $p$ is problem specific. Suppose, for instance, that one of the components of $x$ describes the angle $\theta = \theta(x)$ between a ray and a surface normal. The original version of $f$ may include a factor of $\cos(\theta)^\eta$ for some $\eta > 0$. Using a density $p(x) \propto q(x) \cos(\theta)^\eta$ corresponds to moving the cosine power out of the integrand and into the sampling density.
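As an illustration of (1.4), a small sketch with $f$, $q$, $p$ and the sampler for $p$ all supplied by the caller (hypothetical names, not from the text):

```python
import numpy as np

def importance_mc(f, q, p, sample_p, n, rng=None):
    """Importance sampling estimate (1.4): average f(x) q(x) / p(x)
    over x_i ~ p.  Requires p(x) > 0 wherever q(x)|f(x)| > 0."""
    rng = np.random.default_rng() if rng is None else rng
    x = sample_p(n, rng)
    return (f(x) * q(x) / p(x)).mean()
```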
We will suppose that a choice of $p$ has already been made. There is also the possibility of using a mixture of sampling densities $p_j$, as with the balance heuristic of Veach and Guibas [42, 43]. This case can be incorporated by increasing the dimension of $x$ by one, and using that variable to select $j$ from a discrete distribution.
Monte Carlo sampling of $x \sim p$ over $D$ almost always uses points from a pseudo-random number generator simulating the uniform distribution on the interval from 0 to 1. We will take this to mean the uniform distribution on the half-open interval $[0, 1)$. Suppose that it takes $d^*$ uniform random variables to simulate a point in the $d$ dimensional domain $D$. Often $d^* = d$, but sometimes $d^* = 2$ variables from $[0, 1)$ can be used to generate a point within a surface element in $d = 3$ dimensional space. In other problems we might use $d^* > d$ random variables to generate a $p$ distributed point in $D \subseteq \mathbb{R}^d$. Chapter ?? describes general techniques for transforming $[0,1)^d$ into $D$ and provides some specific examples of use in ray tracing. Devroye [8] is a comprehensive reference on techniques for transforming uniform random variables into one's desired random objects.
Suppose that a point having the $U[0,1)^{d^*}$ distribution is transformed into a point $\tau(x)$ having the density $p$ on $D$. Then
$$I = \int_D \frac{f(x)\, q(x)}{p(x)}\, p(x)\, dx = \int_{[0,1)^{d^*}} \frac{f(\tau(x))\, q(\tau(x))}{p(\tau(x))}\, dx \equiv \int_{[0,1)^{d^*}} f^*(x)\, dx \tag{1.5}$$
and we may estimate $I$ by
$$\hat I = \frac{1}{n} \sum_{i=1}^{n} f^*(x_i) \tag{1.6}$$
where the $x_i$ are independent $U[0,1)^{d^*}$ random variables.
Equation (1.5) expresses the original MC problem (1.1) as one of integrating a function $f^*$ over the unit cube in $d^*$ dimensions. We may therefore reformulate the problem as finding $I = \int_{[0,1)^d} f(x)\, dx$. The new $d$ is the old $d^*$ and the new $f$ is the old $f^*$.
1.2 Stratification
Stratified sampling is a technique for reducing the variance of a Monte Carlo integral. It was originally applied in survey sampling (see Cochran [4]) and has been adapted in Monte Carlo methods (Fishman [12]). In stratified sampling, the domain of $x$ is written as a union of strata $D = \bigcup_{h=1}^{H} D_h$ where $D_j \cap D_k = \emptyset$ if $j \ne k$. An integral is estimated from within each stratum and then the estimates are combined. Following the presentation in chapter 1.1, we suppose here that $D = [0,1)^d$.
Figure 1.1 shows a random sample from the unit square along with 3 alternative stratified samplings. The unit cube $[0,1)^d$ is very easily partitioned into box-shaped strata like those shown. It is also easy to sample uniformly in such strata. Suppose that $a, c \in [0,1)^d$ with $a < c$ componentwise. Let $U \sim U[0,1)^d$. Then $a + (c - a)U$, interpreted componentwise, is uniformly distributed on the box with lower left corner $a$ and upper right corner $c$.
In the simplest form of stratified sampling, a Monte Carlo sample $x_{h1}, \ldots, x_{hn_h}$ is taken from within stratum $D_h$. Each stratum is sampled independently, and the results are combined as
$$\hat I_{\mathrm{STRAT}} = \hat I_{\mathrm{STRAT}}(f) = \sum_{h=1}^{H} \frac{|D_h|}{n_h} \sum_{i=1}^{n_h} f(x_{hi}), \tag{1.7}$$
where $|D_h|$ denotes the volume of stratum $D_h$.
Figure 1.1: The upper left figure is a simple random sample of 16 points in $[0,1)^2$. The other figures show stratified samples with 4 points from each of 4 strata.
Equation (1.12) shows that stratified sampling with proportional allocation does not increase the variance. Proportional allocation is not usually optimal, however. Optimal allocations take $n_h \propto |D_h| \sigma_h$, where $\sigma_h^2$ is the variance of $f$ within stratum $D_h$. If estimates of $\sigma_h$ are available they can be used to set $n_h$, but poor estimates of $\sigma_h$ could result in stratified sampling with larger variance than crude MC. We will assume proportional allocation.
A particular form of stratified sampling is well suited to the unit cube. Haber [13] proposes to partition the unit cube $[0,1)^d$ into $H = m^d$ congruent cubical regions and to take $n_h = 1$ point from each of them. This stratification is known as jittered sampling in graphics, following Cook, Porter and Carpenter [5].
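Here is a minimal sketch of jittered sampling in two dimensions, written in Python/NumPy; the helper name and interface are ours, not from the text:

```python
import numpy as np

def jittered(k, rng=None):
    """Jittered sample in [0,1)^2: one uniform point in each square
    of a k-by-k grid, giving n = k*k points (Haber's stratification)."""
    rng = np.random.default_rng() if rng is None else rng
    i, j = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    corners = np.column_stack([i.ravel(), j.ravel()])  # cell corners
    return (corners + rng.random((k * k, 2))) / k
```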
Any function that is constant within strata is integrated without error by $\hat I_{\mathrm{STRAT}}$. If $f$ is close to such a function, then $f$ is integrated with a small error. Let $\bar f$ be the function defined by $\bar f(x) = \mu_{h(x)}$, where $h(x)$ is the index of the stratum containing $x$ and $\mu_h$ is the average of $f$ over $D_h$, and define the residual $f_{\mathrm{RES}}(x) = f(x) - \bar f(x)$. This decomposition is illustrated in Figure 1.2 for a function on $[0, 1)$. Stratified sampling reduces the Monte Carlo variance from $\sigma^2(f)/n$ to $\sigma^2(f_{\mathrm{RES}})/n$.
Figure 1.2: The upper plot shows a piecewise smooth function $f$ on $[0, 1)$. The step function is the best approximation $\bar f$ to $f$, in mean square error, among functions constant over intervals $[j/10, (j+1)/10)$. The lower plot shows the difference $f - \bar f$ using a vertical scale similar to the upper plot.
It is possible to stratify both horizontally and vertically at once, in what is known as Latin hypercube sampling or n-rooks sampling (Shirley [33]). Figure 1.3 shows a set of 16 points in the square that are simultaneously stratified in each of 16 horizontal and 16 vertical strata.
If the function $f$ on $[0,1)^2$ is dominated by either the horizontal coordinate or the vertical one, then we'll get an accurate answer, and we don't even need to know which is the dominant variable. Better yet, suppose that neither variable is dominant.
Figure 1.3: The left plot shows 16 points, one in each of 16 vertical strata. The
right plot shows the same 16 points. There is one in each of 16 horizontal strata.
These points form what is called a Latin hypercube sample, or an n-rooks pattern.
Suppose that $f$ can be written as
$$f(x) = \mu + f_H(x) + f_V(x) + f_{\mathrm{RES}}(x), \tag{1.13}$$
where $f_H$ depends only on the horizontal variable, $f_V$ depends only on the vertical one, and the residual $f_{\mathrm{RES}}$ is defined by subtraction. Latin hypercube sampling gives an error that is largely unaffected by the additive part $f_H + f_V$. Stein [37] showed that the variance in Latin hypercube sampling is approximately $\sigma^2_{\mathrm{RES}}/n$, where $\sigma^2_{\mathrm{RES}}$ is the smallest variance of $f_{\mathrm{RES}}$ for any decomposition of the form (1.13).
Latin hypercube samples are easy to construct. For $j = 1, \ldots, d$, let $\pi_j$ be independent uniform random permutations of $0, \ldots, n-1$, and let $U_{ij} \sim U[0,1)$ be independent for $i = 1, \ldots, n$ and $j = 1, \ldots, d$. Set
$$X_{ij} = \frac{\pi_j(i-1) + U_{ij}}{n}.$$
Then the $n$ rows of $X$ form a Latin hypercube sample. That is, we may take $x_i = (X_{i1}, \ldots, X_{id})$. An integral estimate $\hat I$ is the same whatever order the $f(x_i)$ are summed, so we only need to permute $d - 1$ of the $d$ input variables. We can take $\pi_1(i - 1) = i - 1$ to save the cost of one random permutation.
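A minimal sketch of this construction in Python/NumPy (the function name and interface are ours, for illustration):

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """Return an n-by-d Latin hypercube sample in [0, 1)^d.

    Row i is the point x_i; each column has exactly one point in
    each of the n intervals [k/n, (k+1)/n)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty((n, d))
    for j in range(d):
        # pi_j(i-1): a uniform random permutation of 0, ..., n-1.
        # The first column could use the identity permutation instead,
        # as noted above, since the order of the f(x_i) does not matter.
        perm = rng.permutation(n)
        # X_ij = (pi_j(i-1) + U_ij) / n with U_ij ~ U[0, 1).
        x[:, j] = (perm + rng.random(n)) / n
    return x

# Example: estimate an integral over [0,1)^2 from n = 16 points.
pts = latin_hypercube(16, 2)
f = lambda x: np.cos(np.pi * x[:, 0]) ** 2 + x[:, 1]
I_hat = f(pts).mean()
```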
Jittered sampling uses $n = k^2$ strata arranged in a $k \times k$ grid of squares, while n-rooks provides simultaneous stratification in both an $n \times 1$ grid and a $1 \times n$ grid. It is natural to wonder which method is better. The answer depends on whether $f$ is better approximated by a step function constant within squares of side $1/k$, or by an additive function with each term constant within narrower bins of width $1/n$. Amazingly, we don't have to choose. It is possible to arrange $n = k^2$ points in an n-rooks arrangement that simultaneously has one point in each square of a $k \times k$ grid. A construction for this was proposed independently by Chiu, Shirley and Wang [2] and by Tang [38]. The former handles more general grids of $n = k_1 \times k_2$ points. The latter arranges points in $[0,1)^d$ with $d \ge 2$ in a Latin hypercube such that every two dimensional projection of the $x_i$ puts one point into each cell of a grid of strata.
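The sketch below shows the canonical multi-jittered arrangement in Python/NumPy. It is a simplification for illustration: production versions typically also shuffle the sub-cell indices within rows and columns, which preserves both stratifications while reducing correlation.

```python
import numpy as np

def multi_jitter(k, rng=None):
    """n = k*k points that are simultaneously jittered (one per square
    of the k-by-k grid) and n-rooks (one per row and one per column
    of the n-by-n grid)."""
    rng = np.random.default_rng() if rng is None else rng
    n = k * k
    pts = np.empty((n, 2))
    for i in range(k):
        for j in range(k):
            u, v = rng.random(2)
            # Point in cell (i, j), placed in sub-column j and sub-row i,
            # so every width-1/n column and row gets exactly one point.
            pts[i * k + j, 0] = (i + (j + u) / k) / k
            pts[i * k + j, 1] = (j + (i + v) / k) / k
    return pts
```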
In QMC we replace random points by a deterministic sequence $(x_i)$ that is uniformly distributed: this means that $\hat I_n \to I$ for every function $f(x)$ of the form $1_{a \le x < c}$, and so for any finite linear combination of such indicators of boxes. It is known that $\lim_{n\to\infty} |\hat I_n - I| = 0$ for uniformly distributed $x_i$ and any function $f$ that is Riemann integrable. Thus uniformly distributed sequences can be used to provide a deterministic law of large numbers.
To show that a sequence is uniformly distributed it is enough to show that $\hat I_n \to I$ when $f$ is the indicator of a suitable subset of boxes. Anchored boxes take the form $[0, a)$ for some point $a \in [0,1)^d$. If $\hat I_n \to I$ for all indicators of anchored boxes, then the same holds for all boxes. For integers $b \ge 2$, a $b$-ary box is a Cartesian product of the form
$$\prod_{j=1}^{d} \left[ \frac{\ell_j}{b^{k_j}}, \frac{\ell_j + 1}{b^{k_j}} \right) \tag{1.14}$$
for integers $k_j \ge 0$ and $0 \le \ell_j < b^{k_j}$. When $b = 2$ the box is called dyadic. An arbitrary box can be approximated by $b$-ary boxes. If $\hat I_n \to I$ for all indicators of $b$-ary boxes then the sequence $(x_i)$ is uniformly distributed. A mathematically more interesting result is the Weyl condition: the sequence $(x_i)$ is uniformly distributed if and only if $\hat I_n \to I$ for all trigonometric polynomials $f(x) = e^{2\pi\sqrt{-1}\,k \cdot x}$, where $k \in \mathbb{Z}^d$.
If $x_i$ are independent $U[0,1)^d$ variables, then $(x_i)$ is uniformly distributed with probability one. Of course we hope to do better than random points. To that end, we need a numerical measure of how uniformly distributed a sequence of points is. These measures are called discrepancies, and there are a great many of them. One of the simplest is the star discrepancy
$$D_n^* = D_n^*(x_1, \ldots, x_n) = \sup_{a \in [0,1)^d} \left| \frac{1}{n} \sum_{i=1}^{n} 1_{0 \le x_i < a} - \mathrm{vol}\,[0, a) \right|. \tag{1.15}$$
Figure 1.4 illustrates this discrepancy. It shows an anchored box $[0, a) \subset [0,1)^2$ and a list of $n = 20$ points. The anchored box contains 4 of the 20 points, so $(1/n)\sum_{i=1}^{n} 1_{0 \le x_i < a} = 0.20$. The volume of the anchored box is $0.21$, so the difference is $|0.20 - 0.21| = 0.01$.
Figure 1.4: Shown are 20 points in the unit square and an anchored box (shaded) from $(0, 0)$ to $a = (0.3, 0.7)$. The anchored box $[0, a)$ has volume $0.21$ and contains a fraction $4/20 = 0.2$ of the points.
The star discrepancy $D_n^*$ is found by maximizing this difference over all anchored boxes $[0, a)$.
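Computing $D_n^*$ exactly requires a search over anchored boxes, but the quantity inside the supremum in (1.15) is easy to evaluate for any one box. A sketch (our own helper, for illustration):

```python
import numpy as np

def local_discrepancy(points, a):
    """|fraction of points in [0, a) - vol([0, a))|, the quantity
    maximized over a in (1.15).  points is an (n, d) array in [0,1)^d."""
    frac_inside = np.all(points < np.asarray(a), axis=1).mean()
    return abs(frac_inside - np.prod(a))

# The example of Figure 1.4: the box [0, (0.3, 0.7)) has volume 0.21;
# if 4 of 20 points fall inside, the local discrepancy is 0.01.
```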
For $x_i \sim U[0,1)^d$, Chung [3] showed that
$$\limsup_{n \to \infty} \frac{\sqrt{2n}\, D_n^*}{\sqrt{\log\log n}} = 1, \tag{1.16}$$
so $D_n^* = O((\log\log n / n)^{1/2})$ with probability one. An iterated logarithm grows slowly with $n$, so $D_n^*$ may be only slightly larger than $n^{-1/2}$ for large $n$.
It is known that a deterministic choice of $(x_i)$ can yield $D_n^*$ much smaller than (1.16). There are infinite sequences $(x_i)$ in $[0,1)^d$ with $D_n^*(x_1, \ldots, x_n) = O(\log(n)^d / n)$. Such sequences are called "low discrepancy" sequences, and some of them are described in chapter 1.5. It is suspected but not proven that infinite sequences cannot be constructed with $D_n^* = o(\log(n)^d / n)$; see Beck and Chen [1].
In an infinite sequence, the first $m$ points of $x_1, \ldots, x_n$ are the same for any $n \ge m$. If we knew in advance the value of $n$ that we wanted, then we might use a sequence customized for that value of $n$, such as $x_{n1}, \ldots, x_{nn} \in [0,1)^d$, without insisting that $x_{ni} = x_{n+1,i}$. In this setting $D_n^*(x_{n1}, \ldots, x_{nn}) = O(\log(n)^{d-1}/n)$ is possible. The effect is like reducing $d$ by one, but the practical cost is that such a sequence is not extensible to larger $n$.
There is a connection between better discrepancy and more accurate integration. Hlawka [16] proved the Koksma-Hlawka inequality
$$|\hat I_n - I| \le D_n^*(x_1, \ldots, x_n)\, V_{\mathrm{HK}}(f). \tag{1.17}$$
The factor $V_{\mathrm{HK}}(f)$ is the total variation of $f$ in the sense of Hardy and Krause. Niederreiter [26] gives the definition.
Equation (1.17) shows that a deterministic law of large numbers can be much better than the random one, for large enough $n$ and a function $f$ with finite variation $V_{\mathrm{HK}}(f)$. One often does see QMC methods performing much better than MC, but equation (1.17) is not good for predicting when this will happen. The problem is that $D_n^*$ is hard to compute, $V_{\mathrm{HK}}(f)$ is harder still, and the bound (1.17) can grossly overestimate the error. In some cases $V_{\mathrm{HK}}$ is infinite while QMC still beats MC. Schlier [32] reports that, even for QMC, the variance of $f$ is more strongly related to the error than is the variation.
The radical inverse function $\phi_b$ writes the base-$b$ digits of a nonnegative integer $i = \sum_{k \ge 1} n_k b^{k-1}$ and reflects them about the radix point: $\phi_b(i) = \sum_{k \ge 1} n_k b^{-k} \in [0, 1)$. A radical inverse sequence consists of $\phi_b(i)$ for $n$ consecutive values of $i$, conventionally 0 through $n - 1$.

Table 1.1: The first column shows integers $\ell$ from 0 to 7. The second column shows $\ell$ in base 2. The third column reflects the digits of $\ell$ through the binary point to construct $\phi_2(\ell)$. The final column is the decimal version of $\phi_2(\ell)$.

ℓ   ℓ in base 2   reflected   φ₂(ℓ)
0   0             0.000       0.000
1   1             0.100       0.500
2   10            0.010       0.250
3   11            0.110       0.750
4   100           0.001       0.125
5   101           0.101       0.625
6   110           0.011       0.375
7   111           0.111       0.875
Table 1.1 illustrates a radical inverse sequence, using $b = 2$ as van der Corput did. Because consecutive integers alternate between even and odd, the van der Corput sequence alternates between values in $[0, 1/2)$ and $[1/2, 1)$. Among any 4 consecutive van der Corput points there is exactly one in each interval $[k/4, (k+1)/4)$ for $k = 0, 1, 2, 3$. Similarly, any $b^m$ consecutive points from the radical inverse sequence in base $b$ are stratified with respect to $b^m$ congruent intervals of length $1/b^m$.
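The radical inverse is short to code. A sketch in Python (our own helper name):

```python
def radical_inverse(i, b=2):
    """Van der Corput radical inverse: reflect the base-b digits of
    the integer i about the radix point, giving a value in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= b
        i, digit = divmod(i, b)
        x += digit / denom
    return x

# First 8 points of the van der Corput sequence, as in Table 1.1:
# 0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875
print([radical_inverse(i) for i in range(8)])
```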
If $d > 1$ then it would be a serious mistake to simply replace a stream of pseudo-random numbers by the van der Corput sequence. For example with $d = 2$, taking points $x_i = (\phi_2(2i-2), \phi_2(2i-1)) \in [0,1)^2$ we would find that all $x_i$ lie on a diagonal line with slope 1 inside $[0, 1/2) \times [1/2, 1)$.
For $d > 1$ we really need a stream of quasi-random $d$-vectors. There are several ways to generalize the van der Corput sequence to $d \ge 1$. The Halton [14] sequence in $[0,1)^d$ works with $d$ relatively prime bases $b_1, \ldots, b_d$, usually the first $d$ prime numbers. Then for $i \ge 1$,
$$x_i = \bigl( \phi_{b_1}(i-1), \phi_{b_2}(i-1), \ldots, \phi_{b_d}(i-1) \bigr).$$
The Halton sequence has low discrepancy: $D_n^* = O((\log n)^d / n)$.
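A sketch of the Halton construction, reusing `radical_inverse` from the sketch above and hard-coding the first few primes:

```python
def halton(n, d):
    """First n Halton points in [0,1)^d: point i uses phi_b(i-1) in
    coordinate j, with b the j'th prime (here limited to d <= 6)."""
    primes = [2, 3, 5, 7, 11, 13]
    return [[radical_inverse(i, primes[j]) for j in range(d)]
            for i in range(n)]
```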
The Halton sequence is extensible in both $n$ and $d$. For small $d$ the points of the Halton sequence have a nearly uniform distribution. The left panel of Figure 1.5 shows a two dimensional portion of the Halton sequence using prime bases 2 and 3.
Figure 1.5: The left panel shows the first $2^3 \times 3^2 = 72$ points of the Halton sequence using bases 2 and 3. The middle panel shows the first 72 points for the 10th and 11th primes, 29 and 31 respectively. The right panel shows these 72 points after Faure's [11] permutation is applied.
The second panel shows the same points for bases 29 and 31, as would be needed with $d = 11$. While these points are nearly uniform in one dimensional problems, their two dimensional uniformity is seriously lacking. When it is possible to identify the more important components of $x$, these should be sampled using the smaller prime bases.
The poorer distribution for larger primes can be mitigated using a permutation of Faure [11]. Let $\pi$ be a permutation of $\{0, \ldots, b-1\}$. Then the radical inverse function can be generalized to $\phi_{b,\pi}(n) = \sum_{k=1}^{\infty} \pi(n_k)\, b^{-k}$. It still holds that any consecutive $b^m$ values of $\phi_{b,\pi}(i)$ stratify into $b^m$ boxes of length $1/b^m$. Faure's transformation $\pi_b$ of $0, \ldots, b-1$ is particularly simple. Let $\pi_2 = (0, 1)$. For even $b > 2$ take $\pi_b = (2\pi_{b/2}, 2\pi_{b/2} + 1)$, so $\pi_4 = (0, 2, 1, 3)$. For odd $b > 2$ put $k = (b-1)/2$ and $\eta = \pi_{b-1}$, then add 1 to any member of $\eta$ greater than or equal to $k$. Then $\pi_b = (\eta(0), \ldots, \eta(k-1), k, \eta(k), \ldots, \eta(b-2))$. For example with $b = 5$ we get $k = 2$, and after the larger elements are incremented, $\eta = (0, 3, 1, 4)$. Finally $\pi_5 = (0, 3, 2, 1, 4)$. The third plot in Figure 1.5 shows the effect of Faure's permutations on the Halton sequence.
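The recursion above translates directly into code. A sketch in Python, checked against the worked examples in the text:

```python
def faure_permutation(b):
    """Faure's permutation pi_b of {0, ..., b-1}, built recursively."""
    if b == 2:
        return [0, 1]
    if b % 2 == 0:
        half = faure_permutation(b // 2)
        # Even b: interleave as (2*pi_{b/2}, 2*pi_{b/2} + 1).
        return [2 * v for v in half] + [2 * v + 1 for v in half]
    # Odd b: take eta = pi_{b-1}, bump values >= k, insert k in the middle.
    k = (b - 1) // 2
    eta = [v + 1 if v >= k else v for v in faure_permutation(b - 1)]
    return eta[:k] + [k] + eta[k:]

assert faure_permutation(4) == [0, 2, 1, 3]
assert faure_permutation(5) == [0, 3, 2, 1, 4]
```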
Digital nets provide more satisfactory generalizations of radical inverse sequences to $d \ge 2$. Recall the $b$-ary boxes in (1.14). The box there has volume $b^{-K}$ where $K = k_1 + \cdots + k_d$. Ideally we would like $n b^{-K}$ points in every such box. Digital nets do this, at least for small enough $K$.
Let $b \ge 2$ be an integer base and let $m \ge t \ge 0$ be integers. A $(t, m, d)$–net in base $b$ is a finite sequence $x_1, \ldots, x_{b^m}$ for which every $b$-ary box of volume $b^{t-m}$ contains exactly $b^t$ points of the sequence. A $(t, d)$–sequence in base $b$ is an infinite sequence $(x_i)_{i \ge 1}$ such that for all integers $r \ge 0$ and $m \ge t$, the points $x_{rb^m+1}, \ldots, x_{(r+1)b^m}$ form a $(t, m, d)$–net in base $b$.

For $m \ge t$ and $1 \le \lambda < b$, the first $\lambda b^m$ points in a $(t, d)$–sequence are balanced with respect to all $b$-ary boxes of volume $b^{t-m}$ or larger. If $n$ is not of the form $\lambda b^m$, then the points do not necessarily balance any non-trivial $b$-ary boxes.
Figure 1.6: Shown are 81 points of a $(0, 4, 2)$–net in base 3. Reference lines are included to make the 3-ary boxes more visible. There are 5 different shapes of 3-ary box balanced by these points. One box of each shape is highlighted.
Figure 1.7: Shown are the points of two integration lattices in the unit square. The
lattice on the right has much better uniformity, showing the importance of making
a good choice of lattice generator.
One of the simplest randomizations takes QMC points $a_1, \ldots, a_n \in [0,1)^d$ and sets
$$x_i = a_i + U \pmod{1}$$
where $U \sim U[0,1)^d$ and both the addition and the remainder modulo one are interpreted componentwise. It is easy to see that each $x_i \sim U[0,1)^d$. Cranley and Patterson proposed such rotations for integration lattices. Tuffin [39] considered applying these rotations to digital nets. Rotated nets don't remain nets, but they still look very uniform.
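A minimal sketch of the rotation, assuming the QMC points are stored as an (n, d) NumPy array:

```python
import numpy as np

def cranley_patterson(a, rng=None):
    """Cranley-Patterson rotation: shift every QMC point by one common
    uniform vector U, componentwise modulo 1.  Each output point is
    then individually uniform on [0,1)^d.  a: (n, d) array."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    return (a + rng.random(a.shape[1])) % 1.0
```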
Owen [27] proposes a scrambling of the base $b$ digits of $a_i$. Suppose that $a_i$ is the $i$'th row of the matrix $A$ with entries $A_{ij}$ for $j = 1, \ldots, d$, and either $i = 1, \ldots, n$ for a finite sequence or $i \ge 1$ for an infinite one. Let $A_{ij} = \sum_{k=1}^{\infty} a_{ijk} b^{-k}$ where $a_{ijk} \in \{0, 1, \ldots, b-1\}$. Now let $x_{ijk} = \pi_{j \cdot a_{ij1} \cdots a_{ij\,k-1}}(a_{ijk})$, where $\pi_{j \cdot a_{ij1} \cdots a_{ij\,k-1}}$ is a uniform random permutation of $0, \ldots, b-1$. All the permutations required are independent, and the permutation applied to the $k$'th digits of $A_{ij}$ depends on $j$ and on the preceding $k - 1$ digits. The scrambled point $x_i$ has $j$'th coordinate $\sum_{k=1}^{\infty} x_{ijk} b^{-k}$.
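The sketch below scrambles one coordinate, truncated to K base-b digits (the infinite digit expansion cannot be stored exactly); it is our own simplified rendering, to be called once per coordinate j with independent random state:

```python
import numpy as np

def scramble_one_coordinate(A_j, b, K, rng):
    """Nested uniform scramble of one coordinate, truncated to K digits.

    A_j: length-n array of values in [0, 1).  The permutation applied
    to digit k of a point depends on that point's first k-1 digits,
    and all permutations are drawn independently."""
    n = len(A_j)
    perms = {}          # digit prefix -> random permutation of 0..b-1
    x = np.zeros(n)
    for i in range(n):
        ai, prefix = A_j[i], ()
        for k in range(1, K + 1):
            digit = int(ai * b ** k) % b   # k'th base-b digit of A_ij
            if prefix not in perms:
                perms[prefix] = rng.permutation(b)
            x[i] += perms[prefix][digit] / b ** k
            prefix += (digit,)
    return x
```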
Applying this scrambling to any point $a \in [0,1)^d$ produces a point $x \sim U[0,1)^d$. If $(a_i)$ is a $(t, m, d)$–net in base $b$ or a $(t, d)$–sequence in base $b$, then with probability 1 the same holds for the scrambled version $(x_i)$. The scrambling described above requires a great many permutations. Random linear scrambling is a partial derandomization of scrambled nets, given by Matoušek [23] and also in Hong and Hickernell [17]. Random linear scrambling significantly reduces the number of permutations required, from $O(d b^m)$ to $O(d m^2)$.
For integration over a scrambled digital sequence we have $\mathrm{Var}(\hat I) = o(1/n)$ for any $f$ with $\sigma^2 < \infty$. Thus for large enough $n$ a better than MC result will be obtained. For integration over a scrambled $(0, m, d)$–net, Owen [28] shows that
$$\mathrm{Var}(\hat I) \le \left( \frac{b}{b-1} \right)^{\min(d-1,\,m)} \frac{\sigma^2}{n} \le \frac{2.72\, \sigma^2}{n}.$$
That is, scrambled $(0, m, d)$–nets cannot have more than $e = \exp(1) \doteq 2.72$ times the Monte Carlo variance for finite $n$. For nets in base $b = 2$ and $t \ge 0$, Owen [30] shows that
$$\mathrm{Var}(\hat I) \le 2^t 3^d\, \frac{\sigma^2}{n}.$$
Compared to QMC, we expect RQMC to do no harm. After all, the resulting $x_i$ still have a QMC structure, and so the RMSE should be $O(n^{-1}(\log n)^d)$. Some forms of RQMC reduce the RMSE to $O(n^{-3/2}(\log n)^{(d-1)/2})$ for smooth enough $f$. This can be understood as random errors cancelling where deterministic ones do not. Surveys of RQMC appear in Owen [31] and L'Ecuyer and Lemieux [22].
[2] Kenneth Chiu, Peter Shirley, and Changyaw Wang. Multi-jittered sampling.
In Paul Heckbert, editor, Graphics Gems IV, pages 370–374. Academic Press,
Boston, 1994.
[4] William G. Cochran. Sampling Techniques (3rd Ed). John Wiley & Sons,
1977.
[5] Robert L. Cook, Thomas Porter, and Loren Carpenter. Distributed ray trac-
ing. Computer Graphics, 18(4):165–174, July 1984. ACM Siggraph ’84
Conference Proceedings.
[9] Kai-Tai Fang and Yuan Wang. Number Theoretic Methods in Statistics. Chap-
man and Hall, London, 1994.
[11] Henri Faure. Good permutations for extreme discrepancy. Journal of Number
Theory, 42:47–56, 1992.
[16] E. Hlawka. Funktionen von beschränkter Variation in der Theorie der Gle-
ichverteilung. Annali di Matematica Pura ed Applicata, 54:325–333, 1961.
[18] L.K. Hua and Y. Wang. Applications of Number Theory to Numerical Analysis. Springer, Berlin, 1981.
[19] Alexander Keller. A quasi-Monte Carlo algorithm for the global illumination
problem in a radiosity setting. In Harald Niederreiter and Peter Jau-Shyong
Shiue, editors, Monte Carlo and Quasi-Monte Carlo Methods in Scientific
Computing, pages 239–251, New York, 1995. Springer-Verlag.
[25] Harald Niederreiter. Point sets and sequences with small discrepancy. Monatshefte für Mathematik, 104:273–337, 1987.
[29] A. B. Owen. Latin supercube sampling for very high dimensional simula-
tions. ACM Transactions on Modeling and Computer Simulation, 8(2):71–
102, 1998.
[31] Art B. Owen. Monte Carlo, quasi-Monte Carlo and randomized quasi-Monte Carlo. In H. Niederreiter and J. Spanier, editors, Monte Carlo and Quasi-Monte Carlo Methods 1998, pages 86–97, 1999.
[34] Ian H. Sloan and S. Joe. Lattice Methods for Multiple Integration. Oxford
Science Publications, Oxford, 1994.
[35] I. M. Sobol’. The distribution of points in a cube and the accurate evaluation
of integrals (in Russian). Zh. Vychisl. Mat. i Mat. Phys., 7:784–802, 1967.
[37] Michael Stein. Large sample properties of simulations using Latin hypercube sampling. Technometrics, 29(2):143–151, 1987.
[38] Boxin Tang. Orthogonal array-based Latin hypercubes. Journal of the Amer-
ican Statistical Association, 88:1392–1397, 1993.
[39] Bruno Tuffin. On the use of low discrepancy sequences in Monte Carlo meth-
ods. Technical Report 1060, I.R.I.S.A., Rennes, France, 1996.
[42] Eric Veach and Leonidas Guibas. Bidirectional estimators for light transport.
In 5th Annual Eurographics Workshop on Rendering, pages 147–162, June
13–15 1994.
[43] Eric Veach and Leonidas Guibas. Optimally combining sampling techniques
for Monte Carlo rendering. In SIGGRAPH ’95 Conference Proceedings,
pages 419–428. Addison-Wesley, August 1995.
[44] H. Weyl. Über die Gleichverteilung von Zahlen mod. Eins. Mathematische Annalen, 77:313–352, 1916.