
Biological Journal of the Linnean Society (1998), 65: 41-62.

With 6 figures
Article ID: bj980238

Local spatial autocorrelation in biological variables

ROBERT R. SOKAL FMLS*
Department of Ecology & Evolution, State University of New York, Stony Brook,
NY 11794-5245, U.S.A.

NEAL L. ODEN
The EMMES Corporation, 11325 Seven Locks Road, Suite 214, Potomac, MD 20854,
U.S.A.

BARBARA A. THOMSON
Department of Ecology & Evolution, State University of New York, Stony Brook,
NY 11794-5245, U.S.A.
Received 26 January 1998; accepted for publication 28 April 1998

Spatial autocorrelation (SA) methods have recently been extended to include the detection
of local spatial autocorrelation at individual sampling stations. We review the formulas for
these statistics and report on the results of an extensive population-genetic simulation study
we have published elsewhere to test the applicability of these methods in spatially distributed
biological data. We find that most biological variables exhibit global SA, and that in such
cases the methods proposed for testing the significance of local SA coefficients reject the null
hypothesis excessively. When global SA is absent, permutational methods for testing
significance yield reliable results. Although standard errors have been published for the local
SA coefficients, their employment using an asymptotically normal approach leads to unreliable
results; permutational methods are preferred. In addition to significance tests of suspected
non-stationary localities, we can use these methods in an exploratory manner to find and
identify hotspots (places with positive local SA) and coldspots (negative local SA) in a dataset.
We illustrate the application of these methods in three biological examples from plant
population biology, ecology and population genetics. The examples range from the study of
single variables to the joint analysis of several variables and can lead to successful demographic
and evolutionary inferences about the populations studied.
© 1998 The Linnean Society of London

ADDITIONAL KEY WORDS: LISA - simulation - drift - isolation by distance - Liatris
cylindracea - Erythronium grandiflorum - Drosophila buzzatii.

* Correspondence to: R. R. Sokal. E-mail: sokal@life.bio.sunysb.edu.


0024-4066/98/090041+22 $30.00/0 © 1998 The Linnean Society of London

Downloaded from https://academic.oup.com/biolinnean/article-abstract/65/1/41/2661155


by guest
on 12 April 2018
42 R. R. SOKAL ET AL.

CONTENTS

Introduction
Global and local spatial autocorrelation
Simulation studies
Examples
    Local structure in a prairie herb population
    Lilies, gophers, and rocks in Colorado
    Cactus-breeding fruit flies in Australia
Acknowledgements
References

INTRODUCTION

When values of a variable observed at adjacent geographic locations resemble
each other more than expected under a randomness model, the variable is said to
be spatially autocorrelated. This phenomenon of spatial autocorrelation (SA) documents
a departure from the assumption, required by common statistics, of independence
of the values of the variable. When SA is present in a dataset, its
statistical analysis may require nonstandard tests.
The early work in SA analysis dates back to the 1950s, but it was rarely employed
until the comprehensive treatment of the subject by Cliff & Ord (1973). Eight years
later, a much enlarged and revised presentation appeared (Cliff & Ord, 1981).
Spatial autocorrelation was introduced to biologists by Jumars, Thistle & Jones
(1977) and by Sokal & Oden (1978a,b) in this journal. Our laboratory has been
working on SA and its attendant problems ever since. We soon discovered that,
despite the nuisance of complicating the interpretation of ordinary statistical tests,
spatial autocorrelation can be informative about the processes that give rise to
observed patterns. In a series of papers (Sokal & Oden, 1978b; Sokal, 1979; Sokal
& Wartenberg, 1981; Sokal, 1984, 1986; Sokal & Jacquez, 1991; and Sokal, Harding
& Oden, 1989), we examined the types of processes that SA could identify in the
areas of ecology and evolution. When these methods were criticized by Slatkin &
Arter (1991), we responded with Sokal & Oden (1991) and with a large simulation
study (Sokal, Oden & Thomson, 1997). Among studies by others, we may mention
Barbujani (1987) and the extensive work of B.K. Epperson (Epperson, 1990a,b,
1993a,b, 1994, 1995; Epperson, Huang & Li, in prep.; Epperson & Li, 1996, 1997).
There have been numerous applications of these methods to datasets from plants,
animals, and humans, resulting in inferences of selection, migration, drift, and
isolation by distance (see Sokal, Smouse & Neel, 1986; Sokal et al., 1987a, 1992;
Sokal, Oden & Barker, 1987b; Sokal et al., 1989; Barbujani & Sokal, 1991; Falsetti
& Sokal, 1993; Epperson, 1992; and references cited therein).
Along with the development of geographic information systems, analytical geographers
are developing indices of local spatial autocorrelation. These indices measure
the dependence of the value of a variable at any one location upon neighboring
values of that variable. Seminal papers in this area are by Anselin (1995), Getis &
Ord (1992) and Ord & Getis (1995). These authors proposed various statistics,
summarized in the next section. Such a local SA statistic can serve several functions.
Applied to spatial datasets lacking global SA, the methods may find local areas
exhibiting spatial inhomogeneities and showing significant local departures from
randomness. Such work may lead to the discovery and statistical validation of
‘hotspots’. These are areas of localized highly positive SA. By contrast, ‘coldspots’
are areas of localized highly negative SA. When global SA is present, we shall see
that determining the statistical significance of a local spatial autocorrelation statistic
is usually intractable, yet we may use local SA in an exploratory manner to see
which localities contribute more than others to the global SA. Furthermore, we shall
see that, when several variables are measured at each locality, the joint analysis of
the local SA of these variables helps us to infer the underlying causes of spatial
inhomogeneities. Our study aims to explore applications of these new local spatial
autocorrelation statistics in biological data, with special emphasis on population-genetic,
biosystematic, and ecological datasets.
After discussing the results and conclusions of a parallel statistical and simulation
study (Sokal, Oden & Thomson, 1998) regarding the usefulness and validity of these
methods in biological work, we shall proceed to three biological examples that
illustrate the applications of local spatial autocorrelation statistics.

GLOBAL AND LOCAL SPATIAL AUTOCORRELATION

The spatial autocorrelation coefficient of a variable X is computed from the values
of X observed over a set of n localities. These values are combined according to a
weight matrix W with elements $w_{ij}$ representing the strengths of connections between
localities i and j in the set of n samples. For purposes of this presentation, we shall
restrict ourselves to binary weight matrices consisting of ones (connected) and zeros
(unconnected), although in principle the weights $w_{ij}$ may take on any value. The
diagonal elements $w_{ii}$ of the weight matrix are zero.
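A binary weight matrix of this kind can be sketched as follows for a regular lattice. This is only an illustration, not code from the study: the function name `lattice_weights`, the row-major ordering of cells, and the `queen` switch are assumptions made for the example.

```python
import numpy as np

def lattice_weights(n_rows, n_cols, queen=False):
    """Binary weight matrix W for a regular lattice (cells in row-major order).

    Orthogonal (rook) neighbours are connected by default; queen=True also
    connects diagonal neighbours. The diagonal of W stays zero.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i, rr * n_cols + cc] = 1   # locality i is connected to this neighbour
    return W

W = lattice_weights(3, 3)   # hypothetical 3 x 3 lattice with orthogonal connections
```

In this small example the central cell has four connections and each corner cell has two, and W is symmetric.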
The earlier, above-cited work on SA dealt exclusively with global spatial autocorrelation,
that is, with a summary measure of SA computed over all n localities. There are
two principal SA statistics: Moran’s I and Geary’s c. The former is a product-
moment coefficient analogous to the Pearson correlation coefficient, the latter is a
standardized, squared-distance coefficient. Formulas for these coefficients and their
expected values and variances under randomness hypotheses are found in Sokal &
Oden (1978a), Cliff & Ord (1981), Upton & Fingleton (1985), and Sokal et al. (1998).
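For orientation, the two global coefficients can be computed as in the sketch below, which uses the standard textbook forms of Moran's I and Geary's c (the paper itself defers to the cited references for the formulas). The function name and the four-locality transect are illustrative assumptions.

```python
import numpy as np

def global_moran_geary(x, W):
    """Global Moran's I and Geary's c for values x and weight matrix W."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    z = x - x.mean()                 # deviations from the mean
    s0 = W.sum()                     # sum of all weights
    ss = (z ** 2).sum()              # sum of squared deviations
    moran_i = (n / s0) * (z @ W @ z) / ss
    diff2 = (x[:, None] - x[None, :]) ** 2
    geary_c = ((n - 1) / (2.0 * s0)) * (W * diff2).sum() / ss
    return moran_i, geary_c

# Hypothetical transect of four localities, each joined to its adjacent ones.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
I_trend, c_trend = global_moran_geary([1, 2, 3, 4], W)   # smooth gradient
I_alt, c_alt = global_moran_geary([1, 4, 1, 4], W)       # alternating values
```

The gradient gives positive SA (I above its expectation, c below 1), whereas the alternating series gives negative SA (I negative, c above 1).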
Anselin (1995) has defined a local indicator of spatial association (LISA) as any statistic
satisfying the following two requirements:

a. the LISA for each observation gives an indication of the extent of significant
spatial clustering of similar values around the observation;
b. the sum of LISAs for all observations is proportional to a global indicator of
spatial association.
The first of these requirements means that the LISA is a measure of local SA. The
second requirement permits the decomposition of a global coefficient of SA into
separate parts, making it possible to identify the individual locations that are major
contributors to the global autocorrelation. Anselin (1995) described two LISAs to
match the established global SA coefficients, I and c. These LISAs employ only
those elements $w_{ij}$ of the weight matrix W that have nonzero weights with locality
i, whose local SA we wish to evaluate.

The local Moran coefficient at locality i is defined as

$$I_i = \frac{z_i}{m_2} \sum_j w_{ij} z_j \qquad (1)$$

where $z_k = x_k - \bar{x}$, and $m_2 = \sum_k z_k^2 / n$, $k = 1, \ldots, n$. This formula computes a product-
moment numerator between the indexed locality i and all the localities j that have
a nonzero connection (weight) with it. This quantity is standardized by division by
$m_2$, the variance of the $z_k$, a constant denominator for the $I_i$ associated with every
locality in the study. A positive value of $I_i$ indicates a cluster of variates around
locality i similar to each other and to the variate $z_i$. A negative $I_i$ signifies connected
variates dissimilar to that at i. The expected value of $I_i$, under the total randomization
hypothesis, is

$$E(I_i) = -w_i/(n - 1) \qquad (2)$$

where $w_i$ is the sum $\sum_j w_{ij}$ of all weights connected to location i. The total randomization
hypothesis assumes that all permutations of the observed data values on the locations
are equally likely. The derivation of expression (2) and of a formula for the variance
of $I_i$ is given by Anselin (1995) and, from a different approach, by Sokal et al. (1998).
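A minimal sketch of the local Moran coefficient and its expected value under total randomization follows. It assumes the standardization described in the text (division by $m_2$); the function name and the four-locality transect are illustrative only.

```python
import numpy as np

def local_moran(x, W):
    """Local Moran coefficients I_i and their expectations -w_i/(n - 1)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    z = x - x.mean()
    m2 = (z ** 2).sum() / n              # variance of the z values
    I_local = z * (W @ z) / m2           # z_i times the weighted sum over its neighbours
    I_expected = -W.sum(axis=1) / (n - 1)
    return I_local, I_expected

# Hypothetical transect of four localities, each joined to its adjacent ones.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
I_local, I_expected = local_moran([1.0, 2.0, 3.0, 4.0], W)
```

With this standardization the local coefficients sum to the global Moran's I multiplied by the sum of all weights, which is the proportionality required of a LISA.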
The local Geary coefficient at locality i is defined as

$$c_i = \frac{1}{m_2} \sum_j w_{ij} (z_i - z_j)^2 \qquad (3)$$

the notation being the same as before. This formula calculates a standardized
squared distance between the indexed locality i and the localities j that are connected
to it by a nonzero weight. High values of $c_i$ are caused by substantial differences
between the pivotal locality i and its neighbors, regardless of whether any of the
values are near the extremes of the distribution. Low values of $c_i$ indicate that $z_i$ is
close to the $z_j$’s of its neighbors, again without implying that either set of deviations
is extreme. Under the total randomization hypothesis, the expected value of $c_i$ is

$$E(c_i) = 2nw_i/(n - 1) \qquad (4)$$

All terms in this expression have been defined above. In a parallel study (Sokal et
al., 1998), we furnish the derivation of expression (4) and of a formula for the
variance of $c_i$.
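The local Geary coefficient and its expectation can be sketched in the same way. As before, this is an illustration under the notation of the text, with a hypothetical function name and a toy transect.

```python
import numpy as np

def local_geary(x, W):
    """Local Geary coefficients c_i and their expectations 2*n*w_i/(n - 1)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    z = x - x.mean()
    m2 = (z ** 2).sum() / n
    diff2 = (z[:, None] - z[None, :]) ** 2     # squared differences (z_i - z_j)^2
    c_local = (W * diff2).sum(axis=1) / m2
    c_expected = 2.0 * n * W.sum(axis=1) / (n - 1)
    return c_local, c_expected

# Hypothetical transect of four localities, each joined to its adjacent ones.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
c_local, c_expected = local_geary([1.0, 2.0, 3.0, 4.0], W)
```

Every $c_i$ in this smooth gradient falls well below its expectation, which corresponds to positive local SA.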
We do not feature the variances of the two LISAs in this paper, because in our
experience, summarized in the next section, we have noted that standard deviates
based on these variances are not asymptotically normal and hence yield unreliable
significance tests. As an alternative, we note that in those cases where significance
testing can be justified, a permutational significance test is recommended.
Getis & Ord (1992) and Ord & Getis (1995) have taken another approach to
estimating local SA. These authors define two statistics, $G_i$ and $G_i^*$, which are not
strictly measures of local SA but describe the spatial clustering of high or low values
of the variable studied around or at the pivotal locality i. They are not LISAs by
Anselin’s (1995) definition, since condition (b) of his LISA definition is not met. The
two coefficients differ in whether the pivotal locality i is included in the computation
of the coefficient. We prefer $G_i^*$, which includes the pivotal locality. All new
symbolism is explained as it is presented.

Getis & Ord (1992) define $G_i^*$ as follows:

$$G_i^* = \frac{\sum_j w_{ij} x_j}{\sum_j x_j}, \qquad j \text{ may equal } i \qquad (5)$$

where $(x_1, \ldots, x_n)$ is the set of observed values of the variable. Getis and Ord have
constrained their weights $w_{ij}$ to be binary. The quantity $\sum_j x_j$ is the sum of all the
observations. The $G_i^*$ statistic therefore sums the variates attached to all the localities
neighboring i and to i itself, and divides this quantity by the sum of the variates over all the localities.
Note that $w_{ii}$ must not equal zero. $G_i^*$ is therefore a weighted mean of the cluster
at and around the pivotal locality for the variable under study. If $G_i^*$ is high, then
the cluster of localities all show high means; if $G_i^*$ is low, then the cluster means
are low. The expected value is

$$E(G_i^*) = w_i^*/n \qquad (6)$$

where $w_i^* = \sum_j w_{ij}$, with the summation including the nonzero pivotal weight $w_{ii}$.
Although we have investigated both $G_i$ and $G_i^*$, we prefer the latter, since it makes
more sense biologically to look for solid clusters of high or low values (including the
pivot) rather than rings or donuts of such values.
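As a sketch of expressions (5) and (6), the statistic can be computed by setting the diagonal of a binary weight matrix to one, so that the pivot is included. The function name and the toy data are assumptions for illustration.

```python
import numpy as np

def getis_ord_g_star(x, W):
    """G_i* with binary weights and its expectation w_i*/n."""
    x = np.asarray(x, dtype=float)
    W_star = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W_star, 1.0)            # include the pivot: w_ii = 1
    g_star = (W_star @ x) / x.sum()          # cluster sum divided by the grand sum
    g_expected = W_star.sum(axis=1) / x.size
    return g_star, g_expected

# Hypothetical transect of four localities, each joined to its adjacent ones.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
g_star, g_expected = getis_ord_g_star([1.0, 2.0, 3.0, 4.0], W)
```

Here the clusters at the high end of the transect exceed their expectations and those at the low end fall below them.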
We can deduce some of the properties of the four coefficients of local SA from
their formulas. The Moran LISA $I_i$ measures joint covariation of neighboring
localities. If these localities deviate strongly from the mean and are of like sign with
the pivot (either all positive or all negative), we obtain positive local SA. However,
if the pivot locality deviates widely from the mean and with a sign opposite to those
of its neighbors, $I_i$ will be negative. The Geary LISA $c_i$ measures (squared) differences
between the values of neighboring points and the pivot. High values of $c_i$ indicate
negative SA. When data values at the pivot and its neighbors are all close to the
mean, this will show up as positive SA by $c_i$, but as weak or zero autocorrelation
by $I_i$. Thus $I_i$ and $c_i$ furnish partly separate information about the spatial structure.
Clusters deduced from both low and high $G_i$ or $G_i^*$ values correspond to positive
SA.

SIMULATION STUDIES

We investigated the properties of all four local SA coefficients by means of
analytical and simulation work in our parallel study (Sokal et al., 1998), where
procedures and results are described in detail. Here we summarize those findings
that are relevant to the application of these methods to biological problems.
We simulated a population located on a 21 × 21 lattice of stepping stones, each
of which is settled with n = 64 diploid individuals polymorphic for a biallelic locus.
Starting gene frequencies were randomly chosen. For each generation after the
initial settling, gene flow takes place between all orthogonally-connected immediately
neighboring stepping stones. The program then simulates random mating reproduction
among the individuals of each stepping stone i by recolonizing it with
64 new individuals whose observed gene frequency is randomly chosen from the
current gene frequency of the deme. In this way a random component (drift) was
introduced to generational gene frequency change. The model is described in detail
in an earlier publication (Sokal et al., 1997).
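The general shape of such a stepping-stone simulation can be sketched as below. This is not the authors' program: the migration rate `m`, the averaging of gene frequencies over the available orthogonal neighbours, and the binomial sampling of 2 × 64 gene copies per deme are assumptions made for illustration.

```python
import numpy as np

def simulate_surface(size=21, n_ind=64, generations=49, m=0.1, rng=None):
    """Gene-frequency surface from a stepping-stone model with drift.

    m is a hypothetical migration rate; m = 0 switches gene flow off,
    leaving drift as the only process.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = rng.random((size, size))                     # random starting frequencies
    for _ in range(generations):
        # Gene flow: mix each deme with the mean of its orthogonal neighbours.
        padded = np.pad(p, 1, constant_values=0.0)
        counts = np.pad(np.ones_like(p), 1, constant_values=0.0)
        nb_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                  + padded[1:-1, :-2] + padded[1:-1, 2:])
        nb_cnt = (counts[:-2, 1:-1] + counts[2:, 1:-1]
                  + counts[1:-1, :-2] + counts[1:-1, 2:])
        p = np.clip((1.0 - m) * p + m * nb_sum / nb_cnt, 0.0, 1.0)
        # Drift: recolonize each deme by sampling 2N gene copies.
        p = rng.binomial(2 * n_ind, p) / (2 * n_ind)
    return p

surface = simulate_surface(rng=np.random.default_rng(0))
inner = surface[7:14, 7:14]          # inner 7 x 7 lattice, away from the edges
```

Setting `m=0.0` gives a drift-only surface of the kind used for designs lacking global SA.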

The above design will yield an isolation-by-distance (IBD) gene frequency surface
that exhibits positive global SA. We terminated the program after 49 generations.
We also created various designs lacking global SA by shutting off the gene flow
mechanism. One such design is genetic drift. For each design we created a randomly
assembled dataset of 1000 surfaces.
Two designs test the power of LISAs as exploratory tools. Eight stepping stones,
arranged as four pairs of orthogonally adjacent stones, were located more or less
arbitrarily within the inner 7 × 7 lattice and designated as hotspots. These hotspots
are stepping stones with gene outflow increased by a factor of x. When superimposed
on our regular isolation-by-distance model, starting with generation 1, these hotspots
created positive local SA at their stepping stones and their immediate orthogonal
neighbours. We generated two datasets of 1000 surfaces each for strong (x = 10) and
weak (x = 2) gene outflow. The second design involved eight coldspots, again located
in the inner 7 × 7 lattice. In 1000 separately-generated isolation-by-distance surfaces,
after the 49 generations had been completed, we replaced the gene frequencies at
the eight designated stepping stones with randomly chosen gene frequencies, thereby
frequently inducing negative local SA at these locations.
For each surface of our datasets, we calculated the four local SA coefficients ($I_i$,
$c_i$, $G_i$, $G_i^*$) on the inner 7 × 7 lattice, thereby avoiding any edge effects from the
surface-generating program and ensuring that all the stepping stones in the lattice
have four orthogonal connections.
We tested the significances of the resulting 49 000 local SA coefficients by two
methods, each under two assumptions. The first method is to test the observed value
of the coefficient by subtracting its expected value and dividing by the standard
deviation, hoping for asymptotic normality of the resulting z-score. The second
method is to permute the data values on the stepping stones of the lattice p - 1
times and evaluate the probability of obtaining an outcome at least as deviant as
the observed coefficient in p outcomes. The two assumptions are total randomization
(in which all values in the lattice are permuted including the pivot) and conditional
randomization (in which the value at the pivot remains fixed and the permutations
are carried out on the remaining values). For $I_i$ and $c_i$ we obtained probabilities
under the four combinations of methods and assumptions. For $G_i$ we computed
asymptotic and permutational conditional probabilities only, and for $G_i^*$ asymptotic
and permutational total probabilities only; the other randomizations are
inappropriate for these statistics.
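The permutational method can be sketched for the local Moran coefficient as follows. Two details are assumptions of this example rather than statements about the published procedure: "deviant" is measured as the absolute distance from the mean of the reference distribution, and the observed value is counted as one of the p outcomes.

```python
import numpy as np

def local_moran_at(x, W, i):
    """Local Moran coefficient at the pivot i."""
    z = x - x.mean()
    return z[i] * (W[i] @ z) / np.mean(z ** 2)

def permutation_p(x, W, i, n_perm=999, conditional=True, rng=None):
    """Permutational tail probability for I_i at pivot i (p = n_perm + 1 outcomes)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    others = np.delete(np.arange(x.size), i)
    values = [local_moran_at(x, W, i)]                 # observed outcome comes first
    for _ in range(n_perm):
        xp = x.copy()
        if conditional:
            xp[others] = rng.permutation(x[others])    # pivot value stays fixed
        else:
            xp = rng.permutation(x)                    # pivot value is permuted too
        values.append(local_moran_at(xp, W, i))
    values = np.array(values)
    deviance = np.abs(values - values.mean())
    return float(np.mean(deviance >= deviance[0]))

# Hypothetical transect of eight localities, each joined to its adjacent ones.
x = np.array([0.1, 0.2, 0.9, 0.8, 0.3, 0.2, 0.4, 0.5])
W = np.eye(8, k=1) + np.eye(8, k=-1)
rng = np.random.default_rng(1)
p_total = permutation_p(x, W, i=2, conditional=False, rng=rng)
p_cond = permutation_p(x, W, i=2, conditional=True, rng=rng)
```

Because the observed outcome is part of the reference set, the smallest attainable probability is 1/p.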
We tested whether the resulting probabilities were uniformly distributed, which
should be true if there is no local SA in any one design. We also estimated skewness
and kurtosis of standardized deviations from expectation in these coefficients and
tested for significance of such trends. Finally, we evaluated the proportions of
observations at the conventional cut-offs at the tails of our probability distributions.
We found that, in the designs we tested, only surfaces that had experienced
isolation by distance, hence had become globally SA, consistently rejected all three
null hypotheses for all four local coefficients. Those designs lacking global SA, such
as drift, largely conformed to the null hypotheses, with some exceptions mentioned
below. These findings are somewhat discouraging. Because many, if not most, spatial
biological datasets are globally autocorrelated, tests of individual local SA statistics
will be too liberal, i.e. coefficients appear to be significant when they are not. Before
embarking on significance tests of local SA statistics, one should carry out a global
SA analysis. Only if the global statistic is not significant should one proceed to test

Downloaded from https://academic.oup.com/biolinnean/article-abstract/65/1/41/2661155


by guest
on 12 April 2018
LOCAL SPATIAL AUTOCORRELATION 47

individual local SA coefficients for significance by the methods described here. Both
Anselin (1995) and Ord & Getis (1995) discuss this problem, but neither of the two
studies emerges with a solution. Anselin (1 995) recommends that “In practice,
inference based on the pseudo significance levels indicated by a conditional ran-
domization approach seems to be the only viable alternative.” Our findings with
conditional randomization applied to the globally spatially autocorrelated IBD
surfaces would suggest that this approach will not work. Ord & Getis (1995)
demonstrate that the variance of GI* will increase by a factor derived by them, but
conclude that “much more remains to be done in this context”.
The asymptotic approach has not proven successful, regardless of whether the
data are globally SA or not. For the four coefficients and all designs, we find
departures from expectation in all three of our tests. This means that we cannot
simply use the standard errors and normal theory to test the significance of any one
local SA coefficient, but must resort to a permutational significance test. The possible
reasons for the failure of the asymptotic approach are threefold. The sample size of
points surrounding each stepping stone in our study is small, n = 4. We know that
the data depart from normality and may require consideration of higher moments
for the standard errors to be useful in significance testing. Cliff & Ord (1973) pointed
out that star-shaped graphs will yield marked departures from normality. Star-
shaped graphs are those in which all edges lead from a common central node or
pivot to outlying nodes. Unfortunately, the various local SA statistics lead mostly to
star-shaped graphs, even in actual datasets that are not regularly spaced as in our
simulations.
Our tests fail to show substantial differences between the results of total versus
conditional randomizations. Although one may prefer one or the other randomization
procedure for theoretical reasons, the outcomes of these tests are not different
enough to make the correctness of the assumption a major concern. The only
exception is that both skewness and kurtosis are more pronounced in the total
randomization replicates than in those randomized conditionally. Significance tests
based on conditional randomization assumptions might thus be more reliable than
those based on total randomization assumptions.
The observed frequencies in the tails of the distributions differ markedly from
expectation for the IBD design. In the other designs, only the asymptotic distributions
for $I_i$, $c_i$, $G_i$, and $G_i^*$ deviate strongly from expectation. The tail frequencies are well
behaved in the permutational mode of testing significance in all designs except
isolation by distance.
We investigated both $G_i$ and $G_i^*$. Asymptotic approaches again show departures
from expectation, but the permutational results are well behaved for all designs
except IBD. The results for the two Getis and Ord coefficients are quite similar.
Thus whether one includes the pivot in the computation ($G_i^*$) or leaves it out ($G_i$)
does not make much practical difference.
In summary, for LISAs, $G_i$, and $G_i^*$, tests based on either conditional or unconditional
permutational tail probabilities can be employed, but only when global
SA is absent.
Turning now to the second purpose of local spatial autocorrelation statistics, i.e.
data exploration, we found that weak gene outflow (x = 2) did not differentiate the
hotspot data very well in single surfaces. The situation is much improved with strong
gene outflow (x = 10). Coldspots also were easily distinguished. Population biologists
commonly study frequencies of numerous genes in a collection of population samples
to be able to infer the processes that have affected all or the majority of the gene
loci and all or most of the populations (see the references cited in the Introduction).
The ability to identify one or more localities at which the populations have unusually
extreme positive or negative SA would be very helpful in this regard. When there
is only a single surface being analysed, not much can be read into the top or bottom
local SA coefficients ranked by their magnitudes or by the nominal P-values of their
significance tests, unless the surface lacks global SA and individual LISAs can be
tested for statistical significance. But when multiple surfaces are available for the
same population samples, we can calculate average ranks over the same pivots, and
if the processes are such that they will affect all or most of the multiple variables,
then we can obtain quite reliable information about hotspots and coldspots. This
employment of multiple variables is an important aspect of using local SA coefficients
for population-biological inferences. Success in pointing out such local outliers was
a function of both the strength of the process and the number k of surfaces averaged.
Our simulations have shown that anywhere between k = 10 and k = 15 surfaces is a
sufficient number to pick up even weak underlying processes. This number of gene-
frequency surfaces is commonly employed in geographic variation studies. Far fewer
surfaces are needed to detect strong processes.
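Averaging ranks over the same pivots can be sketched as follows. The array layout (k surfaces by n localities), the function name, and the toy coefficients are assumptions; ties are broken by position rather than by midranks in this simple version.

```python
import numpy as np

def average_ranks(coeffs):
    """Average rank of each locality over k surfaces.

    coeffs has shape (k, n): one local SA coefficient per surface and locality.
    Rank 1 is the smallest coefficient within a surface.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    k, n = coeffs.shape
    order = coeffs.argsort(axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(k)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)   # place ranks 1..n within each surface
    return ranks.mean(axis=0)

# Hypothetical local Moran coefficients: 3 surfaces, 3 localities.
coeffs = [[0.1, 0.9, -0.5],
          [0.3, 0.7, -0.2],
          [0.8, 0.4, -0.9]]
avg = average_ranks(coeffs)
```

In this toy case the third locality has the lowest rank on every surface, so its average rank is 1, which would mark it as a candidate coldspot.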
The biological interpretation of hotspots and coldspots will differ with the types
of data analysed. Hotspots are clusters marked by positive local SA at the pivot;
coldspots are clusters of negative local SA at the pivot. Hotspots single out pivots
that are unusually like the surrounding samples. By contrast, coldspots differ greatly
from their neighbors with respect to the variable(s) under study. Gene flow is an
obvious generator of hotspots but any other contagious process can generate high
positive LISAs as well. Examples that come to mind are demographic expansions,
the spread of infectious diseases, spatial cloning, etc. Coldspots could be caused by
allelopathy, incompatible mating types, or ecotones. The average ranks should be
considered with reference to the number of locality samples in the study. Thus if,
as in our simulation example, there are 49 stepping stones, average ranks cannot
be greater than 49. Of course, the lowest average cannot be less than 1.

EXAMPLES

We feature below first an analysis of a single surface from the field of plant
population biology, second an analysis of six surfaces from a study of subalpine
ecology, and third a study of 12 allele-frequency surfaces from Drosophila population
genetics. In the first study, there is little evidence of global SA. We can therefore
test the statistical significance of local SA in its surfaces. In the remaining two
examples, many of the studied variables are highly significantly globally SA. Tests
of the significance of local spatial autocorrelation coefficients would therefore likely
be overly liberal and we do not feature such tests here. Nevertheless, it is worthwhile
to compute individual local autocorrelation coefficients and examine them for their
magnitudes, with a view to data exploration.

Local structure in a prairie herb population

These data were first analysed by Sokal and Oden (1978b) when they introduced
spatial autocorrelation to readers of this journal. They involve a perennial obligate
outbreeder, Liatris cylindracea, studied by Schaal (1974, 1975). A grid 18 m across
and 33 m down was laid down on a sand prairie hillside and partitioned into 3 × 3 m
quadrats, making 66 quadrats (6 columns × 11 rows) in all. From each quadrat 60
plants were chosen for allozyme analysis. Schaal found 15 polymorphic loci. The
gridlike arrangement permitted the connection, for SA analysis, of the quadrats by
chessboard moves (see Upton & Fingleton, 1985 or Sokal & Oden, 1978a). Our
earlier analysis (Sokal & Oden, 1978b) found very little significant global structure,
with 11 of the 15 loci showing no significance whatsoever. When the entire
correlogram of each surface was considered, Bonferroni testing of the correlograms
(Oden, 1984) showed even less significance. It thus appears safe to test most of the
surfaces for significant local SA. We chose one of the 11 nonsignificant surfaces,
Peroxidase'.
Figure 1A shows that the Peroxidase' gene frequencies were restricted to 38
quadrats, the missing ones (shown as white quadrats) having insufficient material
for analysis. We computed local SA using queen's connections, which include
distances up to 1.41 quadrat side lengths or 4.24 m. We find five nominally significant
(P < 0.05) Moran LISAs as shown in Figure 1B, all but one indicating negative local
SA. When analysed by Geary LISAs, there are six nominally significant localities,
three positive and three negative (Fig. 1C). Three quadrats are significantly negatively
autocorrelated for both coefficients. The Getis and Ord coefficient $G_i^*$ was employed
in two different ways. First, it was applied to the original gene frequencies. We find
that there are four significant clusters of relatively low gene frequency quadrats (Fig.
1D). The second way was to test for clusters of polymorphic quadrats of Peroxidase'
alleles by recoding each gene frequency as the absolute deviation from 0.5. We
show the results in Figure 1E. Two significant clusters of relatively polymorphic
quadrats (shown as minus signs) are found. They are the same quadrats that were
negatively autocorrelated in the earlier figures.
However, the nominal significance values are deceptive. Since for each type of
coefficient we examined all local SA coefficient values in the grid, and since the
individual values are not independent of each other, we must apply a Bonferroni
procedure (Anselin, 1995; Sokal & Rohlf, 1995). Ord & Getis (1995) use a similar
approach to simultaneous testing for their $G_i$ and $G_i^*$ statistics. Note that the
Bonferroni procedure is usually considered conservative. The problem of simultaneously
testing dependent local SA coefficients still awaits a truly satisfying
solution. In this case the overall 5% level would be P = 0.05/(38 × 2) = 0.0007,
because of the two-tailed null hypothesis. For the clusters of polymorphic quadrats
in Figure 1E, the critical Bonferroni value would be 0.0013, since this is a one-
tailed test. By this criterion, only a single quadrat (circled in Fig. 1B) for $I_i$ is marginally
significant (observed P = 0.0008). The same quadrat was close to significance for
the polymorphism clustering by $G_i^*$ of the absolute deviations from 0.5 (observed
P = 0.0018). Actually, even this is a liberal interpretation. To find this example, we
tested all 11 globally nonsignificant gene-frequency surfaces in this dataset, and this
is the only case where we found a Bonferroni-significant local SA coefficient. Since
we looked at 11 surfaces to find the only example of Bonferroni-significant coefficients,
it can be argued that we should divide the critical probability by yet another 11. In
such a case, none of the local coefficients could be counted as being significant.
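The Bonferroni arithmetic in this paragraph can be retraced in a few lines. The variable names are illustrative; the numbers are those quoted in the text.

```python
alpha = 0.05
n_quadrats = 38      # quadrats with enough data to analyse
n_surfaces = 11      # globally nonsignificant surfaces that were examined

two_tailed = alpha / (n_quadrats * 2)         # critical P for the two-tailed tests
one_tailed = alpha / n_quadrats               # critical P for the one-tailed G_i* test
across_surfaces = two_tailed / n_surfaces     # after dividing by yet another 11

observed_moran_p = 0.0008    # circled quadrat, local Moran coefficient
observed_gstar_p = 0.0018    # same quadrat, G_i* on absolute deviations from 0.5

print(round(two_tailed, 4), round(one_tailed, 4))
```

Unrounded, the two critical values are about 0.00066 and 0.00132, so both observed probabilities lie just above their thresholds and far above the threshold that also allows for the 11 surfaces.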
It would appear that significance of local SA coefficients will be found in only
very restrictive circumstances. The global SA must be nonsignificant, and even when
this is so, it will be very difficult to find significant local values because of the LISA

Downloaded from https://academic.oup.com/biolinnean/article-abstract/65/1/41/2661155


by guest
on 12 April 2018
50 R.R. SOKAL ETAI,.
[Figure 1: panels A-E, maps of the quadrat grid]

Figure 1. Maps of the Peroxidase+ allele frequencies in the 11 x 6 Liatris grid and of their local spatial
autocorrelation coefficients. Only 38 of the 3 m² quadrats had enough data to analyse. Those without
data are shown as empty (white) quadrats. A, map of allele frequencies: light grey (0.852-0.936);
medium grey (0.954-0.966); dark grey (0.970-0.991); black (1.000). B-E are maps of the nominally
significant LISAs and G_i* coefficients. Plus signs indicate nominally significant (5%) positive
autocorrelation (for B and C). Minus signs indicate nominally significant negative autocorrelation (for B
and C), nominally significant clusters of relatively low gene frequencies (for D), or nominally significant
clusters of polymorphic quadrats (for E). The circled minus sign (in B) is the one quadrat that was
close to Bonferroni significance. (B) I_i coefficients; (C) c_i coefficients; (D) G_i* coefficients calculated for
the raw data; (E) G_i* coefficients calculated for absolute deviations from 0.5.

properties of the coefficients. We would have to postulate a situation where one or
a very few local coefficients have relatively substantial contributions which are
countervailed by many coefficients with small nonsignificant contributions in the
opposite direction. This model is not very likely to occur frequently. In most cases,
absence of global SA will also mean absence of local SA. We therefore believe that
the major application of local SA will come in the exploratory phase, not in
significance testing.
We should point out that if prior work permits the construction of an hypothesis
concerning SA at one or a few quadrats, we need not apply a Bonferroni test, but
can use the probabilities as they emerge from the analysis. In this example, that
would have permitted a number of quadrats to be significant.
Schaal (1974, 1975) hypothesized limited gene flow in these plants as being
responsible for their minimal substructure. Schaal (1975) and Schaal & Levin (1976)
suggested that these plants were actively selected within micropatches in the habitat.
In our earlier study (Sokal & Oden, 1978b), we concluded on the basis of the
nonsignificant (global) autocorrelations that if there were indeed micropatches with
differential selective forces, these micropatches were not spatially autocorrelated. The local SA analysis
permits us to probe this question in greater detail. There appear to be quadrats
that do have similar neighbours, and for these, there is the possibility of local SA.
Another way of looking at the problem is to look for heterogeneity in habitat patch
size in the grid. Using the results of the local SA analysis, the investigator can
examine the localities (quadrats in this case) affected to discover a mechanism for
the observed phenomenon.

Lilies, gophers, and rocks in Colorado

These data are taken from a study by Thomson et al. (1996) of the interactions
among glacier lilies, pocket gopher activity, and soil rockiness in a subalpine meadow
in western Colorado. The authors laid out a 16 x 16 grid of 2 x 2 m quadrats in a
meadow of moderately dense glacier lilies (Erythronium grandiflorum). Six variables were
measured and recorded for each quadrat: (a) number of flowers; (b) number of
vegetative (nonflowering) plants in a 20 cm x 2 m strip inside the quadrat; (c) number
of first-year seedlings in ten 20 x 20 cm subplots of the quadrat; (d) an index of
rockiness of the surface; (e) an index of soil moisture; and (f) an index of pocket
gopher activity. Note that the first three variables are the biological variables of
interest to the investigators, while the last three variables are environmental agents
with putative effects on the distributions of the plants.
Maps were constructed for the six variables (see figure 2 in Thomson et al., 1996).
The environmental variables define an elongated mass of dry, rocky soil along the
principal diagonal of the map and two areas of relatively moist, deep soil in the
north and southeast. High gopher activity is found in the northwest corner of the
study area and also in the south. The number of flowers and of vegetative plants is
high on the rocky ridge, whereas the number of seedlings is highest in the moist
soil areas. When global SA statistics were computed, correlograms of all variables
show highly significant positive global SA for the first two distance classes (up to
9.0 m), making significance testing of local SA statistics problematical. For this reason
we did not carry out statistical significance tests of any of the 256 LISAs and Getis
and Ord statistics computed for each surface. Although we have only six variables,
which is few for obtaining an overall view of local structure by data exploration,
we believe it is justified in this case because of the unusually clear patterns shown.
For the local SA computations, we again connected the quadrats by a queen’s
connection scheme and counted all single-step connected quadrats as neighbours.
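
As a minimal sketch of the neighbour definition just described (not the software used in the study), the following Python function lists, for every quadrat of a rectangular grid, the quadrats that are one queen's move away. The function name and the (row, column) indexing are illustrative assumptions.

```python
def queen_neighbours(n_rows, n_cols):
    """Map each (row, col) quadrat to the quadrats one queen's step away."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cells = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue  # a quadrat is not its own neighbour
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        cells.append((rr, cc))
            neighbours[(r, c)] = cells
    return neighbours


# The 16 x 16 study grid: corner quadrats have 3 neighbours,
# edge quadrats 5, and interior quadrats 8.
grid = queen_neighbours(16, 16)
print(len(grid[(0, 0)]), len(grid[(0, 5)]), len(grid[(8, 8)]))
```

Because edge and corner quadrats have fewer neighbours than interior ones, their local coefficients rest on less information, which is worth bearing in mind when reading maps of ranked probabilities.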

[Figure 2: panels A and B, maps of the 16 x 16 study grid; both axes in meters, 0-32]

Figure 2. A, top (positive local SA; plus signs) and bottom (negative local SA; minus signs) 25 (≈10%)
averages of the ranked conditional permutational probabilities of I_i averaged over six variables in the
Lilies-Gophers-Rocks example are plotted inside the 16 x 16 2-meter-square study grid. B, top (clusters
of high values; plus signs) 25 (≈10%) averages over six variables of the ranked total permutational
probabilities of G_i*. Left-tail probabilities have been pooled with right-tailed ones by 1-complementing.
The plus signs indicate clusters of positively SA extreme values of the six variables. Scales in meters.

Figure 2A is an overall map showing the top 25 and bottom 25 averages of ranked
Moran LISAs. We first ranked the conditional permutational probabilities of each
variable separately, with the highest right tail probability (the most positive local
SA) given rank 1. Then we averaged the ranks of each 2 x 2 quadrat over the six
surfaces. The top and bottom 25 are shown as plus and minus signs, respectively,
in Figure 2A. We note two clusters of high positive local SA: one is along the upper
portion of the principal diagonal of the map, the other at the bottom of the grid.
These are regions of homogeneous conditions over a substantial portion of the study
area. The former cluster marks the biggest rock in the plot, which also is dry and
has the most flowers and vegetative plants and the least gopher activity. The latter
cluster lacks rocks, has deep moist soil, and has much gopher activity. There are
few flowers or vegetative plants here. The clusters of low negative local SA are far
smaller and many of the numerically high ranks occur in isolated 2 x 2 quadrats
marking the boundaries of homogeneous areas. They seem to indicate transition
zones of rapid ecological change. The corresponding Geary LISA map is not shown.
It is less clear than the Moran LISA map, but shows a similar pattern.
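
The rank-averaging procedure used for Figure 2A can be sketched as follows. This is an illustration under stated assumptions rather than the original program: the function names are hypothetical, tied probabilities are given the mean of their ranks (the text does not say how ties were handled), and the toy probabilities are invented for the example.

```python
def rank_descending(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def average_ranks(prob_by_surface):
    """Rank each surface separately, then average the ranks per quadrat."""
    per_surface = [rank_descending(p) for p in prob_by_surface]
    n_quadrats = len(prob_by_surface[0])
    return [sum(r[q] for r in per_surface) / len(per_surface)
            for q in range(n_quadrats)]


def extremes(avg_ranks, k):
    """Indices of the k lowest (most positive SA) and k highest average ranks."""
    order = sorted(range(len(avg_ranks)), key=lambda q: avg_ranks[q])
    return order[:k], order[-k:]


# Toy input: 3 surfaces x 4 quadrats of probabilities (hypothetical values)
surfaces = [[0.99, 0.50, 0.10, 0.70],
            [0.95, 0.40, 0.05, 0.80],
            [0.90, 0.60, 0.20, 0.30]]
avg = average_ranks(surfaces)
top, bottom = extremes(avg, 1)
print(avg, top, bottom)
```

For the study grid one would pass six lists of 256 probabilities and call `extremes(avg, 25)` to obtain the quadrats marked by plus and minus signs.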
In this example, analysis by the Getis and Ord coefficient G_i* is not as informative
as in the previous example (and also in the next one), because the directions of at
least two of the variables employed are arbitrary. Thus the rockiness index and soil
moisture could just as well have been recorded as soil depth and soil dryness,
respectively. In any case, finding homogeneous areas of high (or low) values for all
variables is not a very meaningful concept. For these reasons, after computing G_i*
for each of the six surfaces, we obtained total permutational probabilities for each
coefficient. Then we computed 1 - P for all cumulative probabilities P < 0.5. This
converted these left-tail probabilities into right-tail ones so that extreme probabilities
were close to 1. Next we ranked the pooled original right-tail probabilities plus those
newly converted, with the highest cumulative probability assigned rank 1. Finally
we averaged the ranks for each quadrat over the six variables. The top 25 average
ranks are shown as plus signs in Figure 2B. Because of the complementation of left-
tail probabilities, the bottom readings are of no interest and are not shown in this
figure. When we compare the results for I_i (Fig. 2A) and G_i* (Fig. 2B), we note
considerable overlap of the plus signs. Only four of the 25 plus signs in Figure 2B
do not fall on quadrats designated by plus signs in Figure 2A, which represent strong
positive local SA. The G_i*-coefficient finds homogeneous clusters of quadrats, and
these appear largely coincident with high positive LISAs. Note, however, that we
cannot tell from these results whether the clusters found tend to have high or low
values, i.e. whether they are near peaks or troughs of their surfaces.
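
The 1-complementing step used for G_i* can be illustrated with a few lines of Python. The function names and the example probabilities are hypothetical; the sketch only shows how left-tail and right-tail extremes end up pooled at the top of a single ordering.

```python
def fold_to_right_tail(cumulative_probs):
    """Replace each cumulative probability below 0.5 by 1 - P."""
    return [1.0 - p if p < 0.5 else p for p in cumulative_probs]


def order_by_extremeness(cumulative_probs):
    """Quadrat indices ordered from most to least extreme folded probability."""
    folded = fold_to_right_tail(cumulative_probs)
    return sorted(range(len(folded)), key=lambda q: -folded[q])


# Hypothetical cumulative probabilities for five quadrats on one surface:
# quadrat 0 sits in a cluster of low values, quadrat 1 in a cluster of high ones.
probs = [0.02, 0.97, 0.50, 0.30, 0.85]
folded = fold_to_right_tail(probs)
order = order_by_extremeness(probs)
print(folded, order)
```

After folding, quadrats 0 and 1 head the ordering together, although one lies in a trough and the other near a peak; this is the loss of direction noted at the end of the paragraph above.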
The ecological interpretation of these results is the following. Seed dispersal is
spatially limited and one would expect that seedlings should be spatially associated
with flowers, but this was not the case. Abundance of seedlings is greatest in the
distant moist quadrats, rather than the dry, rocky substrate where flowers are found.
Yet the seedlings in the moist soil pockets are subjected to intense predation pressure
from the gophers who are most active in this ecological niche. So, ultimately,
Thomson et al. (1996) posit that most seedlings that do survive to the flowering stage
occur in the diagonal, rocky area where gophers cannot burrow.

Cactus-breeding fruit flies in Australia

The third example is a re-analysis of data from a study of cactophilic fruit flies
in Australia (Sokal et al., 1987b). Here a larger number (12) of surfaces was analysed,
and we show how one can combine the findings from the separate surfaces to make
inferences about the population structure of the entire study.

Materials and methods


Drosophila buzzatii, a native of the Chaco of Argentina, was inadvertently introduced
to Australia between 1931 and 1936 during a biological control program against
introduced cacti (from the Americas) of the genus Opuntia (Barker et al., 1985). To
effect the control, biologists employed a lepidopteran, Cactoblastis cactorum, whose
larvae feed on plant tissue within stems and cladodes of the cacti. The lesions
produced by this damage are invaded by microorganisms which in turn produce
rots that ultimately lead to the demise of the plants. D. buzzatii adults and larvae
feed on the microflora in the rots and are specific to the cactus niche.
The introduction of Cactoblastis from Argentina was effected by eggs in 1925. The
Drosophila probably came from rotting cactus material introduced between 1931 and
1936. Once introduced to Australia, natural dispersal as well as the widespread
distribution of the rotting cladodes spread the Drosophila over the area of distribution
of the cacti. By 1940, the cacti were controlled. They still ranged over their former
area of distribution, but only as small patches ranging from a few plants to a few
hundred hectares. It is from these reduced patches that the samples for the present
study were taken. For an account of the biological control of Opuntia in Australia,
see Barker & Mulley (1976) and Murray (1982).

Figure 3. Locations and code numbers of the 57 localities for which Drosophila buzzatii populations
were sampled, and the distribution (shown as shaded areas) of the main Opuntia infestations in 1920
(before control was begun). A Gabriel network has been imposed on the localities. Only those edges
of the graph with length ≤ 186 km are shown.

Altogether 57 localities were sampled (Fig. 3), some repeatedly. Repeated samples
were pooled for analysis. Tests by Sokal et al. (1987b) demonstrated the absence of
seasonal trends in the repeated samples. Allozyme frequencies were determined at
six loci for these populations: Aldox and Hex (biallelic); Est-1, Pgm, and Adh-1 (triallelic);
and Est-2 (pentallelic). For details on electrophoretic procedures, see Barker &
Mulley (1976) and Barker, East & Weir (1986). The entire table of allele frequencies
is furnished as Table 1 in Sokal et al. (1987b).
The 57 sampling stations had been connected by a Gabriel network (Gabriel &
Sokal, 1969; see figure 1B in Sokal et al., 1987b), and all distances between localities
were calculated along their shortest paths through this network. For purposes of
local spatial autocorrelation, we decided to employ the first distance class designated
in the earlier study. This means that for a sampling station to be a neighbour to a
specified locality, it had to be within 186 km of that locality. This leaves all localities
with one or more neighbours, except for three: 25, 36, and 49. These were omitted

TABLE 1. Nominally significant LISAs for the separate allele-frequency surfaces. Following each surface
symbol is a list of the localities that are nominally significant (0.001 ≤ P ≤ 0.05). Negative LISAs are
shown as italic locality numbers. Globally significant allele-frequency symbols are shown in boldface

Allele-frequency
surface           I_i                                        c_i
Est-1a            1-5, 40-45, 47, 48, 58-60                  4, 5, 41-45, 47, 48, 59, 60
Est-1c            2, 5, 10, 12, 33                           10, 33
Est-2a            1, 2, 16, 22, 32, 54, 55, 58, 59           7, 14, 28, 33, 40, 47
Est-2b            1                                          42, 43, 46
Est-2c            17, 29, 40, 41, 42, 43 45, 46, 47, 48      9, 28, 29, 33, 40, 42, 46
Est-2e            7, 16, 32, 54                              16, 32, 54
Hex-a             17                                         13, 14, 18, 23, 26, 27
Pgm-a             39-42, 43, 44, 45, 46, 47, 48              39, 42-44, 46-47
Pgm-c             23, 24, 31                                 23-26, 54
Aldox-a           1, 2, 5, 6, 11-13, 39, 42-44, 47, 48, 58   2, 5, 6, 29, 30, 34, 39, 42, 44, 48, 59
Adh-1a            6, 11, 12, 13, 26, 30, 34, 44              4, 6, 11-13
Adh-1b            5, 8, 19, 20, 30, 34, 56, 57, 58, 59       19, 29, 30, 33, 34, 58

from consideration and computation, leaving us with n = 54 local populations. This
reduced Gabriel network is also shown in Figure 3.
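
The neighbour definition used here (localities no more than 186 km apart along the shortest path through a Gabriel network) can be sketched in Python as follows. This is an illustration, not the program used in the study: the function names are hypothetical, planar coordinates in kilometres are assumed, and the four example points are invented. Two points are joined in a Gabriel network when no third point lies inside the circle that has the segment between them as its diameter.

```python
import heapq
import math


def gabriel_edges(points):
    """Edges (i, j, length): no third point lies inside the circle
    whose diameter is the segment from point i to point j."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dij = d2(points[i], points[j])
            if all(d2(points[i], points[k]) + d2(points[j], points[k]) >= dij
                   for k in range(n) if k not in (i, j)):
                edges.append((i, j, math.sqrt(dij)))
    return edges


def neighbours_within(points, max_dist):
    """For each point, the points reachable through the Gabriel network
    by a shortest path no longer than max_dist (Dijkstra's algorithm)."""
    graph = {i: [] for i in range(len(points))}
    for i, j, length in gabriel_edges(points):
        graph[i].append((j, length))
        graph[j].append((i, length))
    result = {}
    for source in graph:
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in graph[u]:
                nd = d + w
                if nd <= max_dist and nd < dist.get(v, math.inf):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        result[source] = sorted(v for v in dist if v != source)
    return result


# Hypothetical coordinates in km; point 3 is too remote to have neighbours.
localities = [(0, 0), (100, 0), (250, 0), (120, 400)]
print(neighbours_within(localities, 186))
```

A locality that ends up with an empty neighbour list, like point 3 in this toy example, corresponds to the situation of localities 25, 36, and 49, which were dropped from the analysis.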
For the 12 gene-frequency surfaces, we calculated Moran and Geary LISAs, as
well as Getis and Ord’s G_i*. For each local SA coefficient at each locality, we
calculated the permutational probability under conditional randomization, i.e., we
fixed the pivotal gene-frequency value and randomized the other n- 1 values over
the remaining localities. We carried out 999 such randomizations. Together with
the observed coefficient, this yielded 1000 realizations for the local SA coefficients
from which the probability of obtaining the observed result by chance could be
estimated. Although for the majority of allele-frequency surfaces these probabilities
are nominal only, because of the global SA for these surfaces, we found that using
the ranks of these probabilities reveals interesting overall structure in this and similar
studies. We discuss this issue later in our account. We then ranked these probabilities
separately for each gene-frequency surface from those indicating the most negative
local SA to those signifying the most positive autocorrelation. For G_i* an identical
computational strategy was followed, except that the permutations were carried out
under total randomization, which is the appropriate assumption for that statistic.
We also calculated G_i* coefficients for allele frequencies coded as the absolute
difference of their values from 0.5 and their permutational probabilities.
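
The conditional randomization described above can be sketched in Python. Several things here are assumptions made for illustration: the function name, the binary neighbour weights, the form I_i = (z_i / m2) × (sum of z_j over the neighbours of i) with z the deviations from the mean and m2 their mean square, and the convention that the reported probability is the proportion of the 1000 realizations (999 permutations plus the observed arrangement) at least as large as the observed coefficient. The transect data are invented.

```python
import random


def conditional_permutation_p(values, neighbours, i, n_perm=999, seed=0):
    """Observed local Moran I_i and its right-tail permutational probability.

    The value at locality i is held fixed; the other n - 1 values are
    shuffled over the remaining localities in each permutation.
    """
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # unchanged by permutation

    def local_moran(dev):
        return z[i] / m2 * sum(dev[j] for j in neighbours[i])

    observed = local_moran(z)
    others = [j for j in range(n) if j != i]
    pool = [z[j] for j in others]
    count = 1  # the observed arrangement is one of the realizations
    for _ in range(n_perm):
        rng.shuffle(pool)
        dev = list(z)
        for j, d in zip(others, pool):
            dev[j] = d
        if local_moran(dev) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


def chain_neighbours(n):
    """Neighbours along a simple transect: the adjacent localities."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


# Hypothetical transect: localities 0-2 form a cluster of high values.
values = [9, 10, 11, 1, 2, 1, 2, 1, 3, 2]
observed, p = conditional_permutation_p(values, chain_neighbours(len(values)), i=1)
print(observed, p)
```

Under total randomization, as used for G_i*, the pivotal value would be shuffled along with all the others instead of being held fixed. With 999 permutations the smallest attainable probability is 0.001.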
We experimented with a variety of approaches to obtain an overall view of the
local SA at each locality. Ultimately, we averaged the ranks of the probabilities for
the 12 gene-frequency surfaces for a given locality. Figures 4 and 5 highlight the
results for I_i and c_i. We chose to indicate the top and bottom 10% of the average
ranks. Positive local SA or clusters of high values are indicated by rectangular boxes,
negative SA or clusters of low values by circles.
A map for G_i* based on the regular allele frequencies (not shown) would contrast
clusters of localities having high G_i*'s with those having low values of G_i*. This
would show areas that have generally high allele frequencies and others with low
frequencies. While this was of interest in the previous ecological example, it is less
important in population genetics. Therefore, we show in Figure 6 the map based
on absolute deviations which contrasts clusters of the most deviant allele frequencies

Downloaded from https://academic.oup.com/biolinnean/article-abstract/65/1/41/2661155


by guest
on 12 April 2018
56 R.R. S O U L E T A .

Figure 4. Drosophila buzzatii populations in Australia. Map of the top (hotspots; localities enclosed by
rectangular boxes) and bottom (coldspots; localities enclosed by circles) 10% of the average ranks of
the conditional permutational probabilities of the I_i-coefficients for 12 allele-frequency surfaces at the
indicated locality.

(near allele loss or fixation) from those that are the least deviant from 0.5 (highly
polymorphic). This is clearly of more interest to population geneticists.

Results
Although we document our account below by only three figures, we have relied
in our interpretation and conclusions not only on these, but also on other quantities
and maps derived from these data, too numerous to publish. The average ranks of
the conditional probabilities of Moran and Geary LISAs (Figs 4 and 5) agree quite
well in their general configurations. Four areas of local SA are singled out. The first
is an area of positive local SA in southwestern New South Wales and northwestern
Victoria (localities 40-45, 47, 48), isolated from the main Opuntia distribution. The
second is an outlying region in eastern New South Wales south of that of the main
distribution (localities 1, 2, 58 with possible extensions to 3-5, 59, 60), also showing
positive local SA. The highlighted localities in these areas are themselves surrounded

Figure 5. Drosophila buzzatii populations in Australia. Map of the top (hotspots; localities enclosed by
rectangular boxes) and bottom (coldspots; localities enclosed by circles) 10% of the average ranks of
the conditional permutational probabilities of the c_i-coefficients for 12 allele-frequency surfaces at the
indicated locality.

by neighbouring localities with similar gene frequencies. The third area exhibits
negative local SA in a group of localities along the western margin of the
Opuntia/D. buzzatii distribution in southern Queensland (localities 22-24, 26, 50). This
indicates crazy-quilt or very bumpy surfaces in which neighbouring localities differ
greatly in gene frequency. The fourth area of interest is more complicated. In this
area, along the eastern margin of the distribution in southeasternmost Queensland,
localities 31 and 35 are strongly negatively autocorrelated. Farther inland, localities
8, 29, 33, and 18 form a transect of positive local SA.
In Figure 6 (of G_i* for absolute deviations from 0.5), we note that there is one
cluster of deviant gene frequencies in New South Wales among the southern outliers
to the main area of distribution in Queensland. Locality 8 is also quite deviant in
its allele frequencies. Heavily polymorphic localities are 10-13 and 27 in southern
Queensland and northernmost New South Wales.
In the study of the global spatial autocorrelation of these data, Sokal et al. (1987b),

Figure 6. Drosophila buzzatii populations in Australia. Map of the top (clusters of localities most deviant
from allele frequency of 0.5; localities enclosed by rectangular boxes) and bottom (clusters of localities
least deviant from 0.5, i.e. most polymorphic; localities enclosed by circles) 10% of the average ranks
of the total permutational probabilities of the G_i*-coefficients for 12 surfaces of allele frequencies
expressed as absolute deviations from 0.5 at the indicated locality.

found that four allele frequencies (Est-2e, Hex-a, Pgm-c, and Adh-1a) lacked SA in
both one and two dimensions and that two additional frequencies (Est-1c and Est-2b)
have questionable global significance. As we have seen, for such surfaces one
can reliably test the significance of local SA coefficients, at least permutationally.
When we did so for the four clearly nonsignificant surfaces, approximately 8% of
the LISA coefficients were significant at P ≤ 0.05. This figure compares with c. 15%
of the LISAs being nominally significant when the eight surfaces showing significant
global SA are examined. On examining the significances of the Moran and Geary
LISAs of each allele-frequency surface (Table 1), we find that these contribute
separately and differentially to the overall picture of local SA reported above. Thus
among the globally nonsignificant surfaces, Hex-a and Pgm-c contribute to the
negative local SA exhibited by the third group of localities (in southern Queensland).
The globally significant surfaces support the positive local SA of the first two groups.
Est-1a, Est-2c, Pgm-a, and Aldox-a describe the first group (localities 40-48), while
Est-1a, Est-2a, and Aldox-a affect the second group. The fourth group is not clearly
indicated by individually significant LISAs, except, possibly, for Est-2c. It must have
come about as the cumulation of moderate, nonsignificant trends in several
gene-frequency surfaces. There are some other locality groups singled out by individual
surfaces, although these did not succeed in being in the highlighted group in the
overall analysis. An example is localities 10-13 in southern Queensland and adjoining
areas of New South Wales, which show negative SA for Adh-1a and Est-1c. Another
is the coast of Queensland where localities 32 and 54 exhibit highly significant
negative local SA for Est-2e.

Discussion
Based on the global SA of these same data, Sokal et al. (1987b) concluded that there
was significant spatial pattern for some of the allele frequencies, whereas the rest showed
no interpretable pattern. There was no evidence for a migratory wave from the point
of origin of the dispersion in the 1930s (Brisbane, Queensland). Nor was there strong
evidence of local inbreeding or drift. These authors ultimately concluded that selection
seemed the most reasonable explanation for the spatial allele-frequency patterns. This
selection was apparently taking place on three hierarchic levels: continental (Est-2a,
Est-2c, Pgm-a, and possibly Adh-1b); smaller scale up to 1204 km (Est-1a, Aldox-a); and
local (Est-1c, Est-2b, Hex-a, Adh-1a, Pgm-c).
In the present study, we focus on local spatial variation between localities up to
186 km apart. In the global study, five of the allele frequencies were significantly
positive for the first distance class. Thus there may be environmental patches about
that distance apart that exhibit homogeneity in the environment (presumably the
microorganismic flora within the rots). Yet, the local SA analyses reported above
yield a number of bumpy surfaces, with pivotal localities that exhibit negative local
SA. The most prominent such group is the one designated earlier as the third group
comprising localities 22-24, 26, 27, and 50. It is useful to examine the distribution
of nominally significant negative LISAs over the 54 localities and the 12 allele
frequencies (Table 1). Inspection of this table reveals that various allele frequencies,
Est-2a and Pgm-c by I_i (and Hex-a by c_i, not shown), affect the third group negatively.
These localities are on the periphery of their distribution, with localities 22, 23, and
24 actually small patches of Opuntia beyond the boundary of the main Opuntia
infestation. What can account for such a pattern? Ongoing genetic drift is one such
mechanism, but it seems unlikely for Est-2a and Hex-a since there is no evidence of
a tendency toward fixation of allele frequencies. Possibly it is the local differentiation
among the rots with respect to their microflora to which the Drosophila have to adapt
that causes these localities to have bumpy allele frequencies. Such local selection
was also suggested for these alleles by Sokal et al. (1987b). Results for locality 23
should be regarded with caution, as it is based on a small sample size.
Est-2e is a special case. It was de-emphasized as uninformative by Sokal et al.
(1987b), because not only did it not exhibit significant global SA, but its allele
frequencies did not even prove significantly heterogeneous among localities as the
other 11 allele frequencies did. Est-2e is a rare allele that occurs at only 14 localities.
Inspection of Table 1 shows that four localities show significant negative local SA.
Locality 7 possesses the allele, but is surrounded by five neighbours all of whom
lack the allele. The other three significantly negative I_i-values (localities 16, 32, 54)
are in the same neighbourhood cluster. They all lack Est-2e, but are surrounded by
some neighbours with it, including locality 55, which has the highest Est-2e allele
frequency in the study. We can see that local SA has been instrumental in finding
significant coldspots in the data despite the absence of global SA, or apparent
geographic differentiation. The mechanism behind these patterns is not clear. In
the case of this rare allele, genetic drift may be a probable cause of the phenomenon.
Pgm-c may be a similar case.
The areas with positive local autocorrelation are largely due to Est-1a, Est-2c,
Pgm-a, and Aldox-a. These all were found to exhibit long-range trends (continental
or at least up to 1204 km) by Sokal et al. (1987b). The clusters of positive LISAs
indicate areas of homogeneity disrupting the potentially smooth trends of the allele-
frequency surface. The first and second groups are the most noteworthy examples
of such areas. Both of these are in areas disjoint from the main distribution in
Queensland. What could engender such homogeneity of allele frequencies? Increased
interlocality gene flow within these areas, or unusual homogeneity of the cactus rot
microclimate, come to mind. The homogeneity could also be due to differences in
the Opuntia species serving as hosts in these areas. Whereas in central Queensland
the prickly pears are mostly tree pears O. tomentosa and O. streptacantha, the collections
for the second group in New South Wales were largely from O. stricta, and those
for the first group along the New South Wales-Victoria border came from cultivated
tree pears O. ficus-indica. Adaptations to possible microflora differences of the rots in
these cacti may have led to the clusters of positive local SA. Further tests are
necessary to distinguish among these models.

ACKNOWLEDGEMENTS

Contribution No. 1017 in Ecology and Evolution from the State University of
New York at Stony Brook. This research was supported by grant DEB9220538
from the National Science Foundation to Robert R. Sokal. The extensive com-
putations were made possible by a grant of supercomputer time from the Cornell
Theory Center. We are indebted to Prof. J.S.F. Barker for letting us use his Drosophila
buzzatii data and for a critical reading of the manuscript.

REFERENCES

Anselin L. 1995. Local indicators of spatial association-LISA. Geographical Analysis 27: 93-115.
Barbujani G. 1987. Autocorrelation of gene frequencies under isolation by distance. Genetics 117:
777-782.
Barbujani G, Sokal RR. 1991. Genetic population structure of Italy. I. Geographic patterns of gene
frequencies. Human Biology 63: 253-272.
Barker JSF, East PD, Weir BS. 1986. Temporal and microgeographic variation in allozyme
frequencies in a natural population of Drosophila buzzatii. Genetics 112: 577-611.
Barker JSF, Mulley JC. 1976. Isozyme variation in natural populations of Drosophila buzzatii. Evolution
30: 213-233.
Barker JSF, Sene FdeM, East PD, Pereira MAQR. 1985. Allozyme and chromosomal
polymorphism of Drosophila buzzatii in Brazil and Argentina. Genetica (The Hague) 67: 161-170.
Cliff AD, Ord JK. 1973. Spatial Autocorrelation. London: Pion.
Cliff AD, Ord JK. 1981. Spatial Processes. London: Pion.
Epperson BK. 1990a. Spatial patterns of genetic variation within plant populations. In: Brown AHD,
Clegg MT, Kahler AL, Weir BS, eds. Population Genetics and Germ Plasma Resources in Cmp Impmvement.
Sunderland, MA: Sinauer Associates, 229-253.
Epperson BK. 199Ob. Spatial autocorrelation of genotypes under directional selection. Genetics 124:
757-77 1.
Epperson BK. 1992. Spatial structure of genetic variation within populations of forest trees. Nm
Forests 6: 257-278.
Epperson BK. 1993a. Spatial and space-time correlations in systems of subpopulations with genetic
drift and migration. Genetics 133: 71 1-727.
Epperson BK. 1993b. Recent advances in correlation studies of spatial patterns of genetic variation.
Evolutionaty Biology 27: 95-1 55.
Epperson BK. 1994. Spatial and space-time correlations in systems of subpopulations with stochastic
migration. Theoretical Population Biology 46: 160-197.
Epperson BK. 1995. Spatial distributions of genotypes under isolation by distance. Genetics 140:
1431-1 440.
Epperson BK, Li T. 1996. Measurement of genetic structure within populations using Moran’s
spatial autocorrelation statistics. Froceedings ofthe National Academy ofSciences USA 93: 10528-10532.
Epperson BK, Li T. 1997. Gene dispersal and spatial genetic structure. Ewlution 51: 672681.
Falsetti A, Sokal RR. 1993. Genetic structure of human populations in the British Isles. Annals of
Human Biology 20: 215-229.
Gabriel KR,Sokal RR. 1969. A new statistical approach to geographic variation analysis. Systematic
<OO~OQ 1 8 259-278.
Getis A, Ord JK. 1992. The analysis of spatial association by use of distance statistics. Geographical
Anabsis 24: 189-206.
Jumars P, Thistle D, Jones M. 1977. Detecting two dimensional spatial structure in biological
data. Oecologia 28: 109-123.
Murray ND. 1982. Ecology and evolution of the Opuntia-Cactoblastis ecosystem in Australia. In: Barker
JSF, Starmer WT, eds. Ecological genetics and evolution: the cactusyeast-Dmsophila model system. Sydney:
Academic Press, 17-30.
Oden NL. 1984. Assessing the significance of a spatial correlogram. Geographical Anahsir 16: 1-16.
OrdJK, Getis A. 1995. Local spatial autocorrelation statistics:Distributional issues and an application.
Geographical Anabsis 27: 286-306.
Schaal BA. 1974. Population structure and balancing selection in LiatriS cylindracea. Ph.D. dissertation,
Yale University.
Schaal BA. 1975. Population structure and local differentiation in Liatris cylindracea. American Naturalist
109: 491-510.
Schaal BA, Levin DA. 1976. The demographic genetics of L i a t h cylindracea Michx (Compositae).
American Naturalist 110 191-206.
Slatkin M, Arter HE. 1991. Spatial autocorrelation methods in population genetics. American Naturalist
138: 499-5 17.
Sokal RR. 1979. Ecological parameters inferred from spatial correlograms. In: Patil GP, Rosenzweig
ML, eds. Contemporaty Quantitative Ecology and Related Ecometrics. Fairland, MD: International Co-
operative Publishing House, 167- 196.
Sokal RR. 1984. Spatial analysis in population biology and regional science. In: Anderson AE, Isard
W, Puu T, eds. Regional and industrial development theories, models and empirical euidence. Amsterdam:
North-Holland, 241-266.
Sokal RR. 1986. Spatial data analysis and historical processes. In: Diday E, Escoufier Y, Lebart L,
Pages J, Schektman Y, Tomassone R, eds. Data Analysis and Informatics, IV. Amsterdam:
North-Holland, 29-43.
Sokal RR, Harding RM, Oden NL. 1989. Spatial patterns of human gene frequencies in Europe.
American Journal of Physical Anthropology 80: 267-294.
Sokal RR, Harding RM, Lasker GW, Mascie-Taylor CGN. 1992. A spatial analysis of 100
surnames in England and Wales. Annals of Human Biology 19: 445-476.
Sokal RR, Jacquez GM. 1991. Testing inferences about microevolutionary processes by means of
spatial autocorrelation analysis. Evolution 45: 152-168.
Sokal RR, Lengyel IA, Derish PA, Wooten MC, Oden NL. 1987a. Spatial autocorrelation of
ABO serotypes in medieval cemeteries as an indicator of ethnic and familial structure. Journal of
Archaeological Science 14: 615-633.

Sokal RR, Oden NL. 1978a. Spatial autocorrelation in biology 1. Methodology. Biological Journal of
the Linnean Society 10: 199-228.
Sokal RR, Oden NL. 1978b. Spatial autocorrelation in biology 2. Some biological implications and
four applications of evolutionary and ecological interest. Biological Journal of the Linnean Society 10:
229-249.
Sokal RR, Oden NL. 1991. Spatial autocorrelation analysis as an inferential tool in population
genetics. American Naturalist 138: 518-521.
Sokal RR, Oden NL, Barker JSF. 1987b. Spatial structure in Drosophila buzzatii populations:
simple and directional spatial autocorrelation. American Naturalist 129: 122-142.
Sokal RR, Oden NL, Thomson BA. 1997. A simulation study of microevolutionary inferences by
spatial autocorrelation analysis. Biological Journal of the Linnean Society 60: 73-93.
Sokal RR, Oden NL, Thomson BA. 1998. Local spatial autocorrelation in a biological model.
Geographical Analysis (in press).
Sokal RR, Rohlf FJ. 1995. Biometry, Third Edition. New York: W. H. Freeman and Co.
Sokal RR, Smouse PE, Neel JV. 1986. The genetic structure of a tribal population, the Yanomama
Indians. XV. Patterns inferred by autocorrelation analysis. Genetics 114: 259-281.
Sokal RR, Wartenberg DE. 1981. Space and population structure. In: Griffith D, McKinnon R,
eds. Dynamic Spatial Models. Alphen aan den Rijn, The Netherlands: Sijthoff and Noordhoff, 186-213.
Thomson JD, Weiblen G, Thomson BA, Alfaro S, Legendre P. 1996. Untangling multiple
factors in spatial distributions: Lilies, gophers, and rocks. Ecology 77: 1698-1715.
Upton GJG, Fingleton B. 1985. Spatial Data Analysis by Example. Vol. 1. Point Pattern and Quantitative
Data. Chichester, England: John Wiley.
