
Chapter Seven Disease Cluster and Cluster - Analysis

The document describes cluster analysis techniques in spatial statistics. It defines spatial autocorrelation and discusses how Moran's I and Getis-Ord General G statistics can be used to detect global clustering. Moran's I detects overall spatial patterns while General G indicates if high or low values cluster. Local statistics like Anselin Local Moran's I identify statistically significant clusters, hotspots, coldspots and outliers at specific locations.

Uploaded by

Bikila Dessalegn

Chapter 7

Disease Cluster and Cluster Analysis

Session Objectives
At the end of this session, students will be able to:
• Describe geocoding and data linkage using primary and secondary data
• Define cluster analysis
• Differentiate between local and global spatial autocorrelation
• Identify the types of interpolation
• Define network analysis

What are spatial statistics?
• They are similar to traditional statistics, but they integrate spatial relationships into the calculations.
• Spatial statistics allow you to answer the following questions about your data:
   o How are the features distributed?
   o What is the pattern created by the features?
   o Where are the clusters?
   o How do patterns and clusters of different variables compare with one another?
   o What are the relationships between sets of features or values?

What is spatial autocorrelation?
• Spatial autocorrelation in GIS helps to understand the degree to which one object is similar to other nearby objects.
• It is a statistical concept that measures the degree of similarity or dissimilarity between neighboring locations in a spatial dataset.
• It measures how similar nearby objects are to one another compared with objects that are farther apart.
• It is a measure of how likely two neighboring areas are to have similar values for a specific field of data.

What is spatial autocorrelation? cont’d…
• Moran’s I (Index) measures spatial autocorrelation.
• The Global Moran's I statistic measures spatial autocorrelation based on both feature locations and attribute values.
• Moran’s I is robust in detecting the presence of a spatial pattern in a variable.
• Spatial autocorrelation measured by Moran’s I can be classified as positive, negative or absent.
• Positive autocorrelation occurs when observations with similar values are close (clustered) to one another.
• Negative autocorrelation occurs when observations with dissimilar values occur near one another.

Autocorrelation cont’d…

Positive Spatial Autocorrelation Example
• Positive spatial autocorrelation is when similar values cluster together in a map; it occurs when Moran’s I is close to +1.
• This means values cluster together.
• Similar attribute values tend to cluster together in neighboring locations.
• It indicates a spatial pattern where areas with similar characteristics are spatially grouped.

Autocorrelation cont’d…

Negative Spatial Autocorrelation Example
• Negative spatial autocorrelation is when dissimilar values sit next to each other in a map; it occurs when Moran’s I is near -1.
• In the example, Moran’s I is -1 because every value is next to dissimilar values.
• It indicates a spatial pattern where areas with contrasting characteristics are spatially adjacent.
• A value of Moran’s I near 0 typically indicates no spatial autocorrelation.

Testing of the existence of clusters (Autocorrelation)
A. Global Tools /Statistics/
• Tools used to test for the existence of overall clustering (of either high or low values)
• They do not indicate where a specific pattern occurs
• Instead, they identify and measure the pattern of the entire study area
• The result is a single-value statistic that summarizes the pattern
• They assume homogeneity: one pattern is taken to describe the whole study area

Testing of the existence of clusters cont’d…
B. Local Tools /Statistics/
• Test for the existence of local clusters
• Identify variation across the study area, focusing on individual features and their relationships to nearby features
• They are location-specific statistics (i.e. they point to specific areas of clustering)
• They allow for heterogeneity: the pattern may differ from place to place

A. Global Statistics
i) Getis-Ord General G (High/Low Clustering)
• General G is a tool used to measure the concentration of high or low values for a given study area
• It computes a single statistic for the entire study area
• It can indicate whether high values or low values are clustered, but not both at once
• The z-score of the General G statistic indicates whether the clustering is statistically significant

Global Statistics cont’d…
• The drawback is that, if there are both high and low clusters, they tend to cancel each other out, so it is advisable to run Moran’s I first
• G statistics are useful when negative spatial autocorrelation (outliers) is negligible
• High (positive) z-score: statistically significant clustering of high values
• Low (negative) z-score: statistically significant clustering of low values

Formula for Getis-Ord General G
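
The formula image did not survive on this slide. As a hedged reconstruction, the usual textbook definition of the statistic is given below. The symbols are the conventional notation and are not taken from the slide: x_i is the attribute value of feature i, w_ij is the spatial weight between features i and j, and n is the number of features.

```latex
G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\, x_i\, x_j}
         {\sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, x_j}, \qquad j \neq i

z_G = \frac{G - \mathrm{E}[G]}{\sqrt{\mathrm{Var}[G]}}, \qquad
\mathrm{E}[G] = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}{n(n-1)}, \qquad j \neq i
```

The numerator only counts pairs that are neighbors under the weights, so G grows when large values sit next to each other.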

Interpretation of Getis-Ord General G result
 It is an inferential statistic, which means that the results of the analysis are interpreted
within the context of the null hypothesis

 The null hypothesis states that there is no spatial clustering of feature values

 When the p-value returned by this tool is small and statistically significant, the null
hypothesis can be rejected

 If the null hypothesis is rejected, the sign of the z-score becomes important.

 If the z-score value is positive, the observed General G index is larger than the expected
General G index, indicating that high values for the attribute are clustered in the study
area.

 If the z-score value is negative, the observed General G index is smaller than the
expected index, indicating that low values are clustered in the study area.
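
The interpretation above can be tried out on a toy data set. The following is a minimal sketch, not the implementation used by any GIS package: it computes General G directly from its definition and obtains the z-score from random permutations, which is an assumption made here for simplicity (a tool may use an analytical variance instead). The names `general_g`, `permutation_z`, `values` and `weights` are illustrative.

```python
import random

def general_g(values, weights):
    """General G: weighted cross-products divided by all cross-products (i != j).
    Assumes non-negative attribute values that are not all zero."""
    n = len(values)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            cross = values[i] * values[j]
            num += weights[i][j] * cross
            den += cross
    return num / den

def permutation_z(values, weights, n_perm=999, seed=0):
    """z-score of the observed G against G values from shuffled attribute values."""
    rng = random.Random(seed)
    observed = general_g(values, weights)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        sims.append(general_g(shuffled, weights))
    mean = sum(sims) / n_perm
    sd = (sum((s - mean) ** 2 for s in sims) / (n_perm - 1)) ** 0.5
    return observed, (observed - mean) / sd

# Ten areas along a line; each area neighbors the areas directly beside it.
n = 10
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
values = [9, 8, 9, 1, 1, 1, 1, 1, 1, 1]   # high values grouped at one end

g_observed, z_score = permutation_z(values, weights)
g_expected = sum(map(sum, weights)) / (n * (n - 1))
print(g_observed, g_expected, z_score)
```

Here the observed G is larger than the expected G and the z-score is positive, which is the "high values are clustered" case described above.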
Global statistics cont’d…
ii) Spatial Autocorrelation (Global Moran’s I)
• Measures whether the pattern of feature values is clustered, dispersed, or random
• It is a global statistic
• Calculates the I value to test for statistically significant clustering
• Clusters of high values and clusters of low values are not distinguished (both contribute to the same index)

Spatial Autocorrelation Calculation
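
The calculation image is missing from this slide. As a hedged sketch, the standard definition is I = (n / S0) · Σi Σj w_ij (x_i − x̄)(x_j − x̄) / Σi (x_i − x̄)², where S0 is the sum of all weights. The code below implements that definition on a small grid; the helper names `morans_i` and `rook_weights` and the two example patterns are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I from attribute values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

def rook_weights(rows, cols):
    """Binary weights for a grid: cells that share an edge are neighbors."""
    n = rows * cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1.0
    return w

w = rook_weights(4, 4)
# Similar values side by side: left half 0, right half 1.
halves = [1 if c >= 2 else 0 for r in range(4) for c in range(4)]
# Dissimilar values side by side: a checkerboard of 0s and 1s.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

i_halves = morans_i(halves, w)
i_checkerboard = morans_i(checkerboard, w)
print(i_halves, i_checkerboard)
```

The grouped pattern gives a positive I (2/3 on this grid) and the checkerboard gives I = -1, matching the positive and negative autocorrelation cases.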

Global statistics cont’d…
Interpretation of Global Moran’s I
• The Global Moran's I tool is an inferential statistic, which means that the results of the analysis are always interpreted within the context of its null hypothesis
• The null hypothesis states that the attribute being analyzed is randomly distributed among the features in your study area
• In other words, the spatial process producing the observed pattern of values is random chance
• When the p-value returned by this tool is statistically significant, you can reject the null hypothesis

Interpretation Global Moran’s I cont’d….
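
The decision rule can be sketched as code: if the p-value is not significant the pattern is treated as random; if it is significant, a positive z-score points to clustering and a negative z-score to dispersion. The sketch below is an illustration under stated assumptions: the p-value comes from a permutation test (a pseudo p-value), the 0.05 cut-off is a common convention, and the function names are hypothetical.

```python
import random

def morans_i(values, weights):
    """Global Moran's I (same definition as the standard formula)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / sum(d * d for d in dev))

def permutation_test(values, weights, n_perm=999, seed=42):
    """Return (I, z-score, two-sided pseudo p-value) from random reshuffling."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        sims.append(morans_i(shuffled, weights))
    mean = sum(sims) / n_perm
    sd = (sum((s - mean) ** 2 for s in sims) / (n_perm - 1)) ** 0.5
    z = (observed - mean) / sd
    extreme = sum(1 for s in sims if abs(s - mean) >= abs(observed - mean))
    p = (extreme + 1) / (n_perm + 1)
    return observed, z, p

def interpret(z, p, alpha=0.05):
    """Translate z-score and p-value into the three possible conclusions."""
    if p >= alpha:
        return "random (cannot reject the null hypothesis)"
    return "clustered" if z > 0 else "dispersed"

# Twelve areas along a line with a smooth gradient of values.
n = 12
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
values = list(range(1, n + 1))

i_obs, z_score, p_value = permutation_test(values, weights)
conclusion = interpret(z_score, p_value)
print(i_obs, z_score, p_value, conclusion)
```

For this gradient the conclusion is "clustered", because neighboring areas always carry similar values.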

Global Moran's I vs. Getis-Ord General G
• Both techniques assess global clustering (they simply tell you whether clustering exists, not where the clusters actually are)
• Both statistics assume that your data are continuous and normally distributed in the study area
• Moran's I only indicates that similar values occur together (it does not indicate whether a cluster is composed of high or low values)
• The General G statistic can be used to indicate whether high or low values are concentrated over the study area
• Hence, when we wish to find out whether our data are clustered in general (autocorrelated), we can use Moran's I
• However, if we want to know more specifically whether there are clusters of high or low values, we can use the G statistic

B. Local statistics
i) Anselin Local Moran’s I (Cluster and Outlier Analysis)
• Measures the strength of patterns for each specific feature
• Given a set of weighted features, cluster and outlier analysis identifies statistically significant hot spots, cold spots and spatial outliers
• The math is the same as for the global variant, but the results are somewhat different
• Anselin Local Moran’s I can identify HH, LL, HL and LH features, where H = High and L = Low; HL is a high value surrounded by low values (an outlier), and LH is a low value surrounded by high values

Local statistics cont’d…

Interpretation of Anselin Local Moran’s I

A positive value for I
• Indicates that a feature has neighboring features with similarly high or low attribute values
• The feature is part of a cluster
• Statistically significant clusters can consist of high values (HH) or low values (LL)

Local statistics cont’d…

A negative value for I
• Indicates that a feature has neighboring features with dissimilar values
• The feature is an outlier
• A statistically significant outlier can be a feature with a high value surrounded by features with low values (HL) or a feature with a low value surrounded by features with high values (LH)
• In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant
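
The HH, LL, HL and LH labels can be illustrated with a short sketch. It uses one common convention for the local statistic, I_i = (x_i − x̄) · (average deviation of the neighbors) / m2 with m2 the mean squared deviation; a given software tool may scale it differently. Significance testing is deliberately left out, so the labels below are only candidate types, not statistically significant results. All names are illustrative.

```python
def local_morans_i(values, weights):
    """Return a list of (local I, label) pairs, one per feature.
    Labels compare a feature's own deviation with its neighbors' average deviation."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    results = []
    for i in range(n):
        row_sum = sum(weights[i][j] for j in range(n) if j != i)
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i) / row_sum
        local_i = dev[i] * lag / m2
        if dev[i] > 0 and lag > 0:
            label = "HH"      # high value among high neighbors
        elif dev[i] < 0 and lag < 0:
            label = "LL"      # low value among low neighbors
        elif dev[i] > 0 and lag < 0:
            label = "HL"      # high outlier among low neighbors
        elif dev[i] < 0 and lag > 0:
            label = "LH"      # low outlier among high neighbors
        else:
            label = "none"
        results.append((local_i, label))
    return results

# Ten areas along a line; area 7 is a single high value among low values.
n = 10
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
values = [10, 9, 10, 2, 1, 2, 1, 10, 1, 2]

results = local_morans_i(values, weights)
labels = [label for _, label in results]
print(labels)
```

Features inside a cluster get a positive local I, while the isolated high value gets a negative local I and the label HL.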
Local statistics cont’d…
ii) Hot Spot Analysis (Getis-Ord Gi*)
• Local version of the G statistic that indicates hot spots (clusters of high values) or cold spots (clusters of low values)
• To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values; a cold spot must have a low value and be surrounded by other features with low values
• Getis-Ord Gi* can identify hot (high) or cold (low) clusters at different confidence levels
• It is useful when negative spatial autocorrelation (outliers) is negligible


Getis-Ord Gi* (Hot Spot Analysis) vs. Anselin Local Moran’s I
• The math of each is the same as for its global variant, but the results are somewhat different
• Getis-Ord Gi* can identify hot (high) or cold (low) clusters at different confidence levels
• Anselin Local Moran's I can identify HH, LL, LH and HL features, where H = High, L = Low, and HL is a high value surrounded by low values

Why is spatial autocorrelation important?
• One of the main reasons spatial autocorrelation is important is that many statistical methods rely on observations being independent of one another
• If autocorrelation exists in a map, this violates the assumption that observations are independent of one another
• Another application is analyzing clustering and dispersion in ecology and disease
• Is the disease an isolated case, or is it spreading and dispersing?
• These trends can be better understood using spatial autocorrelation analysis

Best practice guidelines for using cluster and outlier analysis (Anselin Local Moran’s I)
• Results are only reliable if the input feature class contains at least 30 features
• This tool requires an input field such as a count, rate, or other numeric measurement
• If you are analyzing point data, where each point represents a single event or incident, you might not have a specific numeric attribute to evaluate (a severity ranking, count or other measurement)
• If you are interested in finding locations with many incidents (hot spots) and/or locations with very few incidents (cold spots), you will need to aggregate the incident data before analysis
Best practice guidelines for using cluster cont’d…
• Select an appropriate conceptualization of spatial relationships
• Select an appropriate distance band or threshold distance
• All features should have at least one neighbor
• No feature should have all other features as neighbors
• Especially if the values for the input field are skewed, each feature should have about eight neighbors
Best practice guidelines for using cluster cont’d…
• Given a set of weighted features, the Getis-Ord Gi* (pronounced "Gee Eye Star") statistic identifies statistically significant hot spots and cold spots
• This tool works by looking at each feature within the context of neighboring features
• To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well
• The local sum for a feature and its neighbors is compared proportionally to the sum of all features
• When the local sum is very different from the expected local sum, and when that difference is too large to be the result of random chance, a statistically significant z-score results
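
The comparison of the local sum with its expected value can be sketched as follows. This is a minimal illustration of the standard Gi* z-score, in which each feature counts as its own neighbor; the confidence bins (90%, 95%, 99% at |z| of 1.65, 1.96, 2.58) follow the usual normal-distribution cut-offs, and no correction for multiple testing is applied. The function and variable names are assumptions.

```python
import math

def getis_ord_gi_star(values, weights):
    """Gi* z-score per feature; weights[i][i] should be non-zero (self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        local_sum = sum(w[j] * values[j] for j in range(n))
        expected = mean * sum_w
        spread = s * math.sqrt((n * sum_w2 - sum_w * sum_w) / (n - 1))
        z_scores.append((local_sum - expected) / spread)
    return z_scores

def confidence_bin(z):
    """+3/+2/+1 for hot spots at 99/95/90% confidence, negative for cold spots, 0 otherwise."""
    sign = 1 if z > 0 else -1
    for level, cutoff in ((3, 2.58), (2, 1.96), (1, 1.65)):
        if abs(z) >= cutoff:
            return sign * level
    return 0

# Ten areas along a line; each area's neighborhood is itself plus adjacent areas.
n = 10
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
values = [10, 9, 10, 2, 1, 2, 1, 2, 1, 2]

z = getis_ord_gi_star(values, weights)
bins = [confidence_bin(v) for v in z]
print([round(v, 2) for v in z], bins)
```

The high values at the start of the line produce large positive z-scores (hot spots), while the low values farther along produce negative z-scores that are not large enough to count as significant cold spots in this small example.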

Clustering vs Clusters
• The mapping clusters tools perform cluster analysis to identify the locations of statistically significant hot spots, cold spots, spatial outliers and similar features
• Clustering is detected at the global level, whereas clusters are located at the local level
• Moran’s I is a global statistic, i.e. a single value for the whole spatial pattern
• Global Moran’s I does not provide the location of clusters
• Cluster detection requires a local statistic

Interpolation
What is Interpolation?
• Interpolation is the procedure of estimating unknown values at unsampled sites using known values of existing observations
• It can be used to predict unknown values for any geographic point data, such as home delivery, child mortality, ANC visits and so on
• Interpolation predicts values for cells in a raster from a limited number of sample data points

Interpolation Methods/Types/
INVERSE DISTANCE WEIGHTED (IDW)
• The Inverse Distance Weighted interpolator assumes that each input point has a local influence that diminishes with distance
• It weights the points closer to the processing cell more heavily than those farther away
• A specified number of points, or all points within a specified radius, can be used to determine the output value for each location
• Use of this method assumes the variable being mapped decreases in influence with distance from its sampled location
Interpolation Methods cont’d…

• IDW interpolation explicitly implements the assumption that things that are
close to one another are more alike than those that are farther apart.

• To predict a value for any unmeasured location, IDW will use the measured
values surrounding the prediction location.

• Those measured values closest to the prediction location will have more
influence on the predicted value than those farther away.
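
A minimal sketch of the IDW idea: the prediction is a weighted average of the measured values, with weights equal to 1 / distance^power. The power of 2 and the optional limit `k` on the number of nearest points are common choices assumed here, and the function name `idw` and the sample data are illustrative.

```python
import math

def idw(samples, x, y, power=2.0, k=None):
    """Predict a value at (x, y) from samples given as (x, y, value) tuples."""
    distances = []
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value          # exactly on a measured point
        distances.append((d, value))
    distances.sort(key=lambda pair: pair[0])
    if k is not None:
        distances = distances[:k]  # use only the k nearest measured points
    weight_sum = sum(d ** -power for d, _ in distances)
    return sum(value * d ** -power for d, value in distances) / weight_sum

# Two measured locations 10 units apart.
samples = [(0, 0, 10.0), (10, 0, 20.0)]

midpoint = idw(samples, 5, 0)    # equally far from both points
near_first = idw(samples, 2, 0)  # much closer to the first point
on_sample = idw(samples, 0, 0)   # coincides with the first point
print(midpoint, near_first, on_sample)
```

At the midpoint both points get equal weight, while the prediction near the first point is pulled strongly toward that point's value.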

Interpolation Methods cont’d…
Kriging
• Kriging is a geostatistical interpolation technique that considers both the distance and the degree of variation between known data points when estimating values in unknown areas
• A kriged estimate is a weighted linear combination of the known sample values around the point to be estimated
• Kriging is a procedure that generates an estimated surface from a scattered set of points with z-values
• Kriging assumes that the distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface
Interpolation Methods cont’d…

• The Kriging tool fits a mathematical function to a specified number of points, or


all points within a specified radius, to determine the output value for each
location.

• Kriging is a multistep process; it includes exploratory statistical analysis of the


data, variogram modeling, creating the surface, and (optionally) exploring a
variance surface.

• Kriging is most appropriate when you know there is a spatially correlated


distance or directional bias in the data.

• It is often used in soil science and geology
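
The variogram modeling step mentioned above starts from an empirical semivariogram: for pairs of points about a distance h apart, the semivariance is half the average squared difference of their values. The sketch below covers only that first step (no model fitting and no kriging weights); the function name, the distance binning and the transect data are assumptions for illustration.

```python
import math

def empirical_semivariogram(points, lag_width, n_lags):
    """points: (x, y, z) tuples. Returns one (mean distance, semivariance, pair count)
    tuple per distance bin; empty bins get None for distance and semivariance."""
    sq_diff = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    counts = [0] * n_lags
    for a in range(len(points)):
        xa, ya, za = points[a]
        for b in range(a + 1, len(points)):
            xb, yb, zb = points[b]
            h = math.hypot(xa - xb, ya - yb)
            k = int(h // lag_width)       # index of the distance bin
            if k < n_lags:
                sq_diff[k] += (za - zb) ** 2
                dist_sum[k] += h
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k]:
            result.append((dist_sum[k] / counts[k], sq_diff[k] / (2 * counts[k]), counts[k]))
        else:
            result.append((None, None, 0))
    return result

# Six samples along a transect with steadily increasing values.
points = [(float(i), 0.0, float(i + 1)) for i in range(6)]
variogram = empirical_semivariogram(points, lag_width=1.0, n_lags=4)
for row in variogram:
    print(row)
```

Semivariance rising with distance, as in this example, is the signature of spatially correlated data that makes kriging appropriate.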


SaTScan Analysis

What is SaTScan?
• SaTScan is freely available software that uses the scan statistic to detect clusters (www.satscan.org)
• It tests whether a disease is randomly distributed over space, over time, or over space and time
• It performs geographical surveillance of disease, to detect areas of significantly high or low rates
• The spatial scan statistic can be useful as an addition to disease maps, in order to determine whether the observed patterns are likely due to chance
• It is a complement to, rather than a replacement for, regular disease maps


The Spatial Scan Statistic
• Create a regular or irregular grid of centroids covering the whole study region
• Around each centroid, use circles of different sizes (from zero up to a maximum that includes 50% of the population)
• For each distinct window (circle), calculate the likelihood, which for the Poisson model is proportional to:

$$\left(\frac{n}{\mu}\right)^{n} \left(\frac{N-n}{N-\mu}\right)^{N-n}$$

Where: n = number of cases inside the circle
N = total number of cases
µ = expected number of cases inside the circle
• For each circle, a likelihood ratio statistic is computed based on the number of observed and expected cases within and outside the circle, and compared with the likelihood L0 under the null hypothesis
• The log likelihood ratio (LLR) is used to compare the goodness of fit of the two models: when the LLR is greater than the Monte Carlo critical value, we reject the null model (hypothesis)
The Spatial Scan Statistic cont’d…
For each circle:
 Obtain actual and expected number of cases inside and outside the circle.

 Calculate Likelihood Function.

Compare Circles:
– Pick circle with highest likelihood function as Most Likely Cluster.

Inference:

 Generate random replicas of the data set under the null-hypothesis of no clusters
(Monte Carlo sampling).

 Compare most likely clusters in real and random data sets (Likelihood ratio test).

The Spatial Scan Statistic cont’d…
• The scan statistic is the maximum likelihood ratio over all possible circles
   o It identifies the most unusual cluster
• To find the p-value, use Monte Carlo hypothesis testing
   o Redistribute the cases randomly and recalculate the scan statistic many times
   o The p-value is the proportion of scan statistics from the Monte Carlo replicates that are greater than or equal to the scan statistic for the true cluster
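
The whole procedure can be sketched in a few lines. This is a simplified illustration, not SaTScan's implementation: it assumes the discrete Poisson model, scans only for high-rate clusters, centers circles on the data locations themselves, and computes the p-value as rank / (replicates + 1), counting the real data set as one of the replicates. All names and the toy data are hypothetical.

```python
import math
import random

def poisson_llr(n, mu, total):
    """Log of (n/mu)^n * ((N-n)/(N-mu))^(N-n); zero unless the circle has excess cases."""
    if n <= mu:
        return 0.0
    llr = n * math.log(n / mu)
    if total > n:
        llr += (total - n) * math.log((total - n) / (total - mu))
    return llr

def scan(coords, cases, pops, max_frac=0.5):
    """Return (highest LLR, member indices) over circles grown around each location."""
    total_cases, total_pop = sum(cases), sum(pops)
    best_llr, best_members = 0.0, []
    for cx, cy in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: math.hypot(coords[j][0] - cx, coords[j][1] - cy))
        n_in, pop_in, members = 0, 0, []
        for j in order:                      # grow the circle one location at a time
            pop_in += pops[j]
            if pop_in > max_frac * total_pop:
                break
            n_in += cases[j]
            members.append(j)
            mu = total_cases * pop_in / total_pop   # expected cases under the null
            llr = poisson_llr(n_in, mu, total_cases)
            if llr > best_llr:
                best_llr, best_members = llr, list(members)
    return best_llr, best_members

def monte_carlo_p(coords, cases, pops, n_sim=199, seed=1):
    """Most likely cluster plus its Monte Carlo p-value."""
    rng = random.Random(seed)
    observed, members = scan(coords, cases, pops)
    ids = list(range(len(pops)))
    rank = 1
    for _ in range(n_sim):
        simulated = [0] * len(pops)
        for j in rng.choices(ids, weights=pops, k=sum(cases)):
            simulated[j] += 1                # cases placed in proportion to population
        if scan(coords, simulated, pops)[0] >= observed:
            rank += 1
    return observed, members, rank / (n_sim + 1)

# 5 x 5 grid of areas with equal populations; three adjacent areas have excess cases.
coords = [(r, c) for r in range(5) for c in range(5)]
pops = [1000] * 25
cases = [5] * 25
for idx in (0, 1, 5):
    cases[idx] = 20

llr, cluster, p_value = monte_carlo_p(coords, cases, pops)
print(round(llr, 2), sorted(cluster), p_value)
```

The most likely cluster is the group of three areas with excess cases, and its small p-value means that a likelihood ratio this large is very rarely produced when cases are distributed at random.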

What can and can’t SaTScan do?
CAN
• Identify spatial, temporal and space-time clusters
• Provide flexible geographic units

CANNOT
o Display maps of event and cluster locations (this needs GIS or mapping software such as ArcGIS)
o Create other statistical and regression models

Spatial Scan Statistic: Properties
 Adjusts for inhomogeneous population density.

 Simultaneously tests for clusters of any size and any location, by using circular
windows with continuously variable radius.

 Accounts for multiple testing.

 Possibility to include confounding variables, such as age, sex or socio-economic


variables.

 Aggregated or non-aggregated data (states, counties, census tracts, block groups,


households, individuals).

Introduction of Statistical models in SaTScan
Bernoulli Model
• There are animals with or without a disease (represented by a 0/1 variable)
• The data are a set of cases and controls
• Available for the purely temporal, purely spatial or space-time scan statistics

Discrete Poisson Model
• The number of cases in each location is Poisson-distributed
• Under the null hypothesis, and when there are no covariates, the expected number of cases in each area is proportional to its population size
• Available for purely temporal, purely spatial and space-time analyses
• This model is a very good approximation to the Bernoulli model when there are few cases compared to controls (less than 10%)
Introduction of Statistical models in SaTScan cont’d…
Space-Time Permutation Model
• Requires only case data, with information about the spatial location and time of each case (no information is needed about the population at risk)
• If the population increase (or decrease) is the same across the study region, that is okay and will not lead to biased results
• The user is advised to be very careful when using this method for data spanning several years, because the population in some areas may grow faster than in others

Reading assignment
• Geocoding and data linkage using primary and secondary data
• Network analysis

Thank You!!
