Kriging 1 - 3
Kriging is a geostatistical interpolation technique that considers both the distance and the degree of variation between known data points when estimating values in unknown areas. A kriged estimate is a weighted linear combination of the known sample values around the point to be estimated. Applied properly, Kriging allows the user to derive weights that result in optimal and unbiased estimates. It attempts to minimize the error variance and to set the mean of the prediction errors to zero, so that there is no systematic over- or under-estimation.
Included with the Kriging routine is the ability to construct a semivariogram of the data, which is used to weight nearby sample points when interpolating. It also provides a means for users to understand and model the directional (e.g., north-south, east-west) trends of their data. A unique feature of Kriging is that it provides an estimate of the error at each interpolated point, which gives a measure of confidence in the modeled surface.
The effectiveness of Kriging depends on the correct specification of several parameters that describe the semivariogram and the model of the drift (i.e., whether the mean value changes over distance). Because Kriging is a robust interpolator, even a naive selection of parameters will provide an estimate comparable to many other grid estimation procedures. The trade-off for estimating the optimal solution for each point by Kriging is computation time. Given the additional trial-and-error time necessary to select appropriate parameters, Kriging should be applied where best estimates are required, data quality is good, and error estimates are essential.
Vertical Mapper provides three different methods of Kriging interpolation: Ordinary Kriging, Simple Kriging, and Universal Kriging.
The disadvantage of the IDW interpolation technique is that it treats all sample points that fall within the search radius the same way. For example, if an exponent of 1 is specified, a linear distance decay function is used to determine the weights for all points that lie within the search radius (Figure 3.9). This same function is also used for all points regardless of their geographic orientation to the node (north, south, etc.) unless a sectored search is implemented. Kriging, on the other hand, can use different weighting functions depending on 1) the distance and orientation of sample points with respect to the node, and 2) the manner in which sample points are clustered.
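The direction-blind weighting described above can be illustrated with a minimal sketch. It assumes the common inverse-distance form, in which each weight is 1 / d^exponent; the function name `idw_estimate` and the sample values are hypothetical and do not come from Vertical Mapper.

```python
import math

def idw_estimate(samples, node, exponent=1.0, search_radius=float("inf")):
    """Inverse-distance-weighted estimate at `node` (assumed form: w = 1 / d**exponent).

    samples: list of (x, y, value) tuples; node: (x, y).
    Every sample inside the search radius is weighted by distance only,
    regardless of its direction from the node.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in samples:
        d = math.hypot(x - node[0], y - node[1])
        if d > search_radius:
            continue  # outside the circular search zone
        if d == 0.0:
            return value  # the node coincides with a sample point
        w = 1.0 / d ** exponent
        numerator += w * value
        denominator += w
    if denominator == 0.0:
        raise ValueError("no samples inside the search radius")
    return numerator / denominator

# Hypothetical data: two samples on an east-west line, node 1 unit from the first.
samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0)]
estimate = idw_estimate(samples, node=(1.0, 0.0), exponent=1.0)

# Same distances, different orientation: the estimate does not change.
rotated = [(0.0, 0.0, 10.0), (1.0, 3.0, 20.0)]
estimate_rotated = idw_estimate(rotated, node=(1.0, 0.0), exponent=1.0)
```

Because only the distance enters the weight, rotating a sample around the node leaves the estimate unchanged, which is the limitation that Kriging addresses.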
Before the actual interpolation can begin, Kriging must calculate every possible distance weighting function. This is done by generating the experimental semivariogram of the data set and choosing a mathematical model which best approximates the shape of the semivariogram. The model provides a smooth, continuous function for determining appropriate weights for increasingly distant data points.
Generating a Semivariogram
As mentioned above, Kriging uses a different weighting function depending on both the distance and the geographic orientation of the sample point relative to the node being calculated. The problem is that it is impossible for a user, at first glance, to know precisely how a data set varies outward from any one location with respect to distance and direction. There are, however, many techniques available to help determine this, the most popular being a variance analysis.
Figure 3.11. Example of data that has no variance crosswise but varies greatly along the lengthwise axis of the data set.
Kriging uses a property called the semivariance to express the degree of relationship between points on a surface. The semivariance is simply half the variance of the differences between all possible points spaced a constant distance apart. The semivariance at a distance d = 0 is zero, because there are no differences between points that are compared to themselves. However, as points are compared to increasingly distant points, the semivariance increases. At some distance, called the Range, the semivariance becomes approximately equal to the variance of the whole surface itself. This is the greatest distance over which the value at a point on the surface is related to the value at another point. The Range defines the maximum neighbourhood over which control points should be selected to estimate a grid node, to take advantage of the statistical correlation among the observations.
The calculation of semivariance between sample pairs is performed at different distances until all possible distance combinations have been analyzed. The initial distance used is called the Lag distance, which is increased by the same amount for each pass through the data set. For example, if the Lag distance is ten metres, the first pass calculates the semivariance of all sample pairs that are approximately ten metres apart. The second pass calculates the semivariance of all sample pairs 20 metres apart, the third at 30 metres, and so on until the two points that are farthest apart have been examined. Put simply, every point is compared to every other point to find the pairs that are approximately the first Lag distance apart. For each such pair, the difference between the two values and the geographic orientation of the pair are determined. When the first Lag distance has been analyzed, the process repeats using the second Lag distance, then the third, and so on until all distance possibilities are exhausted.
When the variance analysis is complete, the information is displayed in a semivariogram. A semivariogram is a graph that plots the semivariance between points on the Y-axis against the distance at which it was calculated on the X-axis. An example of a semivariogram is shown in Figure 3.12 below.
The undulating line in the graph is the plot of calculated variances, plotted on the Y-axis, and their corresponding Lag distances on the X-axis. This plot is given the term experimental semivariogram. The jagged nature of the experimental semivariogram makes it unsuitable for use in calculating the kriging weights, so a smooth mathematical function (model) must be fit to the variogram. The model is shown as the white line in the graph (Figure 3.12.).
Figure 3.12. An example of an omni-directional semivariogram. The white line represents the model that will be used in the Kriging interpolation.
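The lag-by-lag calculation behind an omni-directional experimental semivariogram can be sketched as follows. This is a minimal illustration, assuming the classical estimator (half the mean squared difference of the pairs in each lag bin) and a bin half-width of half the Lag distance. The function name, the `lag_tolerance` parameter, and the sample data are hypothetical.

```python
import math

def experimental_semivariogram(samples, lag, n_lags, lag_tolerance=None):
    """Return (lag_distance, semivariance, pair_count) for each non-empty lag bin.

    samples: list of (x, y, value) tuples.
    A pair falls in bin k when its separation is within `lag_tolerance`
    of (k + 1) * lag. Direction is ignored (omni-directional).
    """
    if lag_tolerance is None:
        lag_tolerance = lag / 2.0  # assumption: bins touch but do not overlap
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):  # every pair is visited once
            xj, yj, zj = samples[j]
            d = math.hypot(xi - xj, yi - yj)
            for k in range(n_lags):
                if abs(d - (k + 1) * lag) <= lag_tolerance:
                    sums[k] += (zi - zj) ** 2
                    counts[k] += 1
                    break
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            semivariance = sums[k] / (2.0 * counts[k])
            result.append(((k + 1) * lag, semivariance, counts[k]))
    return result

# Hypothetical data: three samples spaced ten metres apart along a line.
samples = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0), (20.0, 0.0, 6.0)]
variogram = experimental_semivariogram(samples, lag=10.0, n_lags=2)
```

Each tuple in `variogram` corresponds to one plotted point of the experimental semivariogram. The pair count is returned because bins with very few pairs tend to produce the jagged behaviour noted above.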
Although the strength of Kriging is its ability to account for directional trends of the data, it is possible to analyze variance with respect to distance only and disregard how points are geographically oriented. The above experimental semivariogram is an example of this, called an omni-directional experimental semivariogram. If geographic orientation is important then a directional semivariogram should be calculated such as the one shown in Figure 3.13 below.
Figure 3.13. An example of a directional semivariogram. Notice the two experimental semivariograms, one representing points oriented north and south of each other, and the other representing points east to west of each other.
When two or more directions are analyzed, an experimental semivariogram is generated for each direction. In Figure 3.13, two directions are being investigated and therefore two experimental semivariograms are plotted. Semivariogram experimentation can uncover fundamental information about the data set, such as whether the data varies differently in different directions. In more technical terms, it can reveal whether the data set is isotropic (varies the same way in all directions) or anisotropic (varies differently in different directions), as demonstrated in Figure 3.13. When investigating these directional trends, it is necessary to have tools available to modify parameters such as the directions in which the variances will be calculated. These parameters are discussed in the following section.
In the above diagram two directions are analyzed, represented by the dark and light grey pie shapes. It is important to note that although the diagram shows four pies, variance analysis is always performed in opposing directions. When more than one direction is set, the angle to which these sectors will be oriented must be specified. In the above diagram the angles are 0 degrees and 80 degrees. It is unlikely to find data pairs along exactly 0 degrees or 80 degrees orientation, thus it is necessary to define an interval around these exact values for which points will be considered. This interval is known as the tolerance. In the above diagram the 0 degree direction has a tolerance of 45 degrees and the 80 degree direction has a tolerance of 20 degrees.
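The angle and tolerance test for a sample pair can be sketched as below. This is an illustrative assumption about how such a test might work, not the program's actual code: angles are taken as azimuths measured clockwise from north, and opposing directions are treated as equivalent by folding the azimuth into the range 0 to 180 degrees. The function name `pair_in_direction` is hypothetical.

```python
import math

def pair_in_direction(p, q, angle_deg, tolerance_deg):
    """True if the line from p to q lies within `tolerance_deg` of `angle_deg`.

    p, q: (x, y) with x pointing east and y pointing north.
    Opposing directions are equivalent, so a north-pointing pair and a
    south-pointing pair both match an angle of 0 degrees.
    """
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    azimuth = math.degrees(math.atan2(dx, dy)) % 180.0  # fold opposing directions
    diff = abs(azimuth - angle_deg % 180.0)
    diff = min(diff, 180.0 - diff)  # handle wrap-around near 0/180 degrees
    return diff <= tolerance_deg

# Settings from the diagram: 0 degrees +/- 45 and 80 degrees +/- 20.
north_pair = pair_in_direction((0, 0), (0, 10), angle_deg=0, tolerance_deg=45)
south_pair = pair_in_direction((0, 0), (0, -10), angle_deg=0, tolerance_deg=45)
east_pair_in_0 = pair_in_direction((0, 0), (10, 0), angle_deg=0, tolerance_deg=45)
east_pair_in_80 = pair_in_direction((0, 0), (10, 0), angle_deg=80, tolerance_deg=20)
```

With these settings, a pair oriented due east is excluded from the 0 degree direction but included in the 80 degree direction, since its azimuth of 90 degrees is within 20 degrees of 80.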
Figure 3.15 The Sill is a variance value that the model curve ideally approaches but does not cross. The Range is the distance value at which the variogram model determines where the Sill begins.
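The roles of the Sill and the Range can be made concrete with a short sketch of three widely used model curves. The formulas below are the textbook forms and are an assumption here; in particular, the factor of 3 in the exponential and Gaussian models is one common convention (the "practical range") and may differ from the one used by the software.

```python
import math

def spherical(h, sill, rng):
    """Spherical model: reaches the sill exactly at h == rng."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, sill, rng):
    """Exponential model: approaches the sill asymptotically (about 95% at rng)."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))

def gaussian(h, sill, rng):
    """Gaussian model: flat near the origin, about 95% of the sill at rng."""
    return sill * (1.0 - math.exp(-3.0 * (h / rng) ** 2))

# Hypothetical parameters: sill of 4 variance units, range of 10 map units.
at_origin = spherical(0.0, sill=4.0, rng=10.0)
at_half_range = spherical(5.0, sill=4.0, rng=10.0)
at_range = spherical(10.0, sill=4.0, rng=10.0)
exp_at_range = exponential(10.0, sill=4.0, rng=10.0)
```

In all three curves the semivariance starts at zero, rises with distance, and never exceeds the sill, which matches the behaviour described in the caption.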
Anisotropic Modeling
It is quite natural for the behaviour of a data set to vary differently in one direction as compared to another. For example, a steeply sloping hill will typically vary in two directions. The first is up and down the hill, where it varies quickly from top to bottom, and the second is across the hill, where it varies more slowly. When this occurs in a data set it is called anisotropy. When performing Anisotropic Modeling, the user is essentially guiding the Kriging interpolator to use sample data points that will most accurately reflect the behaviour of the surface. This is achieved by creating additional models for each direction analyzed. When interpolating points oriented in a north-south direction, the Kriging weights can be influenced to use the parameters of one model, while the points oriented in an east-west direction are weighted using a different model.
Ordinary Kriging
This method assumes that the data set has a stationary variance but also a non-stationary mean value within the search radius. Ordinary Kriging is highly reliable and is recommended for most data sets.
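As an illustration of how the weights for a single grid node might be derived, here is a minimal sketch of the standard Ordinary Kriging system in semivariogram form. It is a textbook formulation, not Vertical Mapper's implementation: the weights are constrained to sum to one through a Lagrange multiplier, so no mean value needs to be supplied. The spherical model parameters and sample data are hypothetical.

```python
import numpy as np

def spherical(h, sill, rng):
    """Spherical semivariogram model (assumed form), vectorized over h."""
    r = np.minimum(np.asarray(h, dtype=float) / rng, 1.0)
    return sill * (1.5 * r - 0.5 * r ** 3)

def ordinary_kriging(xy, z, node, variogram):
    """Return (estimate, kriging variance, weights) at `node`."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Semivariances between every pair of samples.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d)
    a[n, n] = 0.0  # last row and column enforce sum(weights) == 1
    # Semivariances between each sample and the node.
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - np.asarray(node, dtype=float), axis=1))
    solution = np.linalg.solve(a, b)
    weights, multiplier = solution[:n], solution[n]
    estimate = float(weights @ z)
    variance = float(weights @ b[:n] + multiplier)
    return estimate, variance, weights

# Hypothetical data: four samples on the corners of a square.
xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
z = [1.0, 2.0, 3.0, 4.0]
model = lambda h: spherical(h, sill=5.0, rng=30.0)
estimate, variance, weights = ordinary_kriging(xy, z, node=(5.0, 5.0), variogram=model)
```

The returned variance is the quantity that would populate an error grid. With no nugget in the model, kriging at a sample location reproduces the sample value with zero variance.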
Simple Kriging
This method assumes that the data set has a stationary variance and a stationary mean value and requires the user to enter the mean value.
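The role of the user-supplied mean can be seen in a minimal sketch of the textbook Simple Kriging estimator, written here in covariance form (covariance = sill minus semivariance). This is an assumed formulation for illustration only; the spherical covariance, its parameters, and the sample data are hypothetical.

```python
import numpy as np

def spherical_covariance(h, sill, rng):
    """Covariance implied by a spherical semivariogram: sill - gamma(h)."""
    r = np.minimum(np.asarray(h, dtype=float) / rng, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def simple_kriging(xy, z, node, mean, covariance):
    """Return (estimate, kriging variance) at `node` for a known mean."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    c = covariance(d)                      # sample-to-sample covariances
    c0 = covariance(np.linalg.norm(xy - np.asarray(node, dtype=float), axis=1))
    weights = np.linalg.solve(c, c0)       # no sum-to-one constraint
    estimate = float(mean + weights @ (z - mean))
    variance = float(covariance(0.0) - weights @ c0)
    return estimate, variance

# Hypothetical data and a user-entered mean of 2.0.
xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
z = [1.0, 2.0, 3.0]
cov = lambda h: spherical_covariance(h, sill=5.0, rng=30.0)
near_estimate, near_variance = simple_kriging(xy, z, (0.0, 0.0), 2.0, cov)
far_estimate, far_variance = simple_kriging(xy, z, (1000.0, 1000.0), 2.0, cov)
```

Beyond the range of every sample, the weights fall to zero and the estimate reverts to the entered mean, which is why the choice of mean matters most in sparsely sampled areas.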
Universal Kriging
This method represents a true geostatistical approach to interpolating a trend surface of an area. The method involves a two-stage process where the surface representing the drift of the data is built in the first stage and the residuals for this surface are calculated in the second stage. With Universal Kriging the user can set the polynomial expression used to represent the drift surface. The most general form of this expression is:
F(x, y) = a20 * x^2 + a11 * x * y + a02 * y^2 + a10 * x + a01 * y + a00
where the constant term a00 is always present and is rarely set to zero in advance of the calculation. Any of the other coefficients, however, can be set to zero. The recommended setting is a first-degree polynomial, which avoids unpredictable behaviour at the outer margins of the data set.
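The first stage described above, building a drift surface and computing residuals, can be sketched with an ordinary least-squares fit. Least squares is an assumption made for illustration; the source does not say how the drift coefficients are estimated. The coefficient names follow the expression above, while `fit_drift` and the sample data are hypothetical.

```python
import numpy as np

# One basis function per coefficient of the general drift expression.
TERMS = {
    "a20": lambda x, y: x ** 2,
    "a11": lambda x, y: x * y,
    "a02": lambda x, y: y ** 2,
    "a10": lambda x, y: x,
    "a01": lambda x, y: y,
    "a00": lambda x, y: np.ones_like(x),
}

def fit_drift(x, y, z, active=("a10", "a01", "a00")):
    """Fit the active drift terms by least squares; return (coefficients, residuals).

    Coefficients not listed in `active` are treated as fixed at zero.
    The default is the recommended first-degree polynomial.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([TERMS[name](x, y) for name in active])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    residuals = z - design @ coeffs
    return dict(zip(active, coeffs)), residuals

# Hypothetical data lying exactly on the plane z = 2x + 3y + 1.
x = [0.0, 1.0, 0.0, 1.0, 2.0]
y = [0.0, 0.0, 1.0, 1.0, 3.0]
z = [1.0, 3.0, 4.0, 6.0, 14.0]
coefficients, residuals = fit_drift(x, y, z)
```

In this sketch the residuals are what would then be interpolated in the second stage, with the drift surface added back to obtain the final estimate.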
Block Kriging
Any one of the three Kriging interpolation methods can be applied in one of two forms: Punctual or Block. Punctual Kriging (the default) estimates the value at a given point and is most commonly used. Block Kriging estimates the average expected value over a given area (such as a block) around a point. Block Kriging provides better variance estimation and has the effect of smoothing interpolated results.
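The smoothing effect of Block Kriging can be illustrated by averaging point estimates over a regular set of sub-points inside the block. This is a simplified sketch under stated assumptions: `point_estimator` stands in for any punctual estimator, the block is rectangular, and the sub-points are cell-centred. The function and parameter names are hypothetical.

```python
def block_average(point_estimator, centre, block_size, x_block, y_block):
    """Average `point_estimator(x, y)` over an x_block by y_block grid of sub-points.

    centre: (x, y) of the block; block_size: (width, height) in map units.
    Sub-points sit at the centres of equal sub-cells of the block.
    """
    cx, cy = centre
    width, height = block_size
    total = 0.0
    for i in range(x_block):
        x = cx - width / 2.0 + (i + 0.5) * width / x_block
        for j in range(y_block):
            y = cy - height / 2.0 + (j + 0.5) * height / y_block
            total += point_estimator(x, y)
    return total / (x_block * y_block)

# Hypothetical punctual estimators used only to show the effect.
planar = lambda x, y: x + 2.0 * y   # a plane: averaging changes nothing
curved = lambda x, y: x * x         # a valley at x = 0: averaging raises the minimum

planar_block = block_average(planar, (5.0, 5.0), (2.0, 2.0), 4, 4)
curved_block = block_average(curved, (0.0, 0.0), (2.0, 2.0), 2, 2)
```

On a planar surface the block value equals the point value at the centre, while sharp local features are averaged out, which is the smoothing behaviour noted above.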
interpolation performed. However, experienced users will always spend some time fitting a model to the semivariogram.
The Coincident Point Distance setting is a method of grouping or aggregating data points into a single new point with a recalculated value. As the distance setting becomes greater, the number of points found within each circular area correspondingly increases. This may be appropriate when dealing with a highly variable and irregularly distributed data set.
The Coincident Point Aggregation setting allows the user to define the mathematical expression for handling aggregated data. For example, choosing a large coincident point distance and selecting Use Average Value will result in the creation of a new set of data points for interpolation, spaced approximately according to the distance setting, and with recalculated values based on the average of all points in each coincident point area. New points are placed at the geometric centre of the original group.
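The aggregation described above can be sketched as a simple greedy grouping. The grouping order is an assumption (the source does not describe how groups are formed): each pass takes the first remaining point as a seed, collects every remaining point within the coincident point distance of it, and replaces the group with one point at its geometric centre carrying the average value. The function name and data are hypothetical.

```python
import math

def aggregate_coincident(points, distance):
    """Merge points closer than `distance` to a seed point into one averaged point.

    points: list of (x, y, value) tuples. Returns a new list of (x, y, value).
    """
    remaining = list(points)
    aggregated = []
    while remaining:
        sx, sy, _ = remaining[0]
        group = [p for p in remaining
                 if math.hypot(p[0] - sx, p[1] - sy) <= distance]
        remaining = [p for p in remaining
                     if math.hypot(p[0] - sx, p[1] - sy) > distance]
        n = len(group)
        aggregated.append((
            sum(p[0] for p in group) / n,   # geometric centre, x
            sum(p[1] for p in group) / n,   # geometric centre, y
            sum(p[2] for p in group) / n,   # "Use Average Value"
        ))
    return aggregated

# Hypothetical data: two nearly coincident points and one isolated point.
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (10.0, 10.0, 5.0)]
merged = aggregate_coincident(points, distance=2.0)
```

Other aggregation choices, such as taking the minimum or maximum value, would only change the last line of the tuple.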
Cell Size is defined in map units for the interpolated grid file. Note that the grid dimensions (in cell units) vary inversely with cell size: the smaller the cell, the larger the grid file. The value chosen should be a compromise between the degree of resolution required for analysis and visualization purposes and the processing time and file size. The default value is calculated by dividing the diagonal width of the aggregated point file by 250 (considered an optimum number based on the computing power required to solve this computationally intensive algorithm).
Search Radius defines the maximum size, in map units, of a circular zone centred around each grid node within which point values from the original data set will be used in the calculation. The minimum and maximum number of data points used is also defined by the user (see the Search Criteria settings below). The default setting is calculated as the diagonal distance through the minimum bounding rectangle of the point data set. This setting is automatically changed based upon the results of the semivariogram analysis. The analysis is performed when the Variogram Builder button is selected, as explained later.
Search Criteria is specified by two settings: the Minimum # of Points and the Maximum # of Points. These options refer to the minimum number of points that must be found inside the search radius in order for a grid node to receive a calculated value, and the maximum number of points that will be used in the calculation. The default values of 3 and 10 are appropriate for most data sets. The user must keep in mind that if the maximum number of points in this setting is doubled, the processing time will increase by a factor of eight.
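The default cell size and the processing-time rule of thumb can be worked through in a few lines. This is a sketch under two assumptions: the "diagonal width" is the diagonal of the bounding rectangle of the points, and the factor of eight reflects a cost that grows with the cube of the number of points used per node. The function names and data are hypothetical.

```python
import math

def default_cell_size(points, divisor=250):
    """Diagonal of the points' bounding rectangle divided by `divisor`."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    return diagonal / divisor

def relative_processing_time(max_points, reference_points=10):
    """Processing time relative to the default maximum of 10 points (cubic growth)."""
    return (max_points / reference_points) ** 3

# Hypothetical point file covering 300 by 400 map units.
points = [(0.0, 0.0), (300.0, 400.0), (150.0, 10.0)]
cell_size = default_cell_size(points)          # diagonal of 500 / 250
doubled_cost = relative_processing_time(20)    # doubling 10 points to 20
```

Under the cubic assumption, raising the maximum from 10 to 30 points would multiply the processing time by 27, so the setting is worth increasing only in small steps.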
The Filename edit box prompts the user for the name and file path of the new grid that will be created. With the Kriging interpolator two grids are created. The first is the interpolated surface and the second is a grid of the estimated variance at each grid cell. The variance grid has the same filename as the surface grid, with _var appended to the end of the name. Both grids are placed in the same directory.
The Extents button opens an information box that summarizes the geographical size and the Z-value range of the original point database, the density of the points, and the data value units.
The user may either select the Finish button to complete the gridding process or, if modifications to a previous dialogue are required, select << Back to return to one or more earlier dialogues. Once the grid file is created, it appears in a Map window with the default colour palette applied.
The Set Kriging Method button allows the user to choose from the three varieties of Kriging that can be applied: Ordinary Kriging, Simple Kriging, and Universal Kriging. The default method is Ordinary Kriging, which is suitable for most data sets. Regardless of the Kriging type employed, Block Kriging can also be implemented.
The Set Kriging Method Dialogue
Choosing the Set button opens the Set Kriging Method dialogue box, which provides additional settings for each of the Kriging methods.
Simple Kriging: The user can set the mean value that will be used in the calculation.
Universal Kriging For advanced users - a check box is available for introducing a complex polynomial expression used to approximate the drift in the data. Detailed explanation of the concept of a regionalized variable and drift versus residual values is beyond the scope of this manual. Interested users should refer to the references
Block Kriging: If left unchecked, Punctual Kriging is applied. When checked, the X Block and Y Block settings become available. These settings define the level of discretization of the area around every point.
The Variogram Builder button on the Kriging Interpolation dialogue builds a semivariogram of the data, allowing the user to match or tune a mathematical model to the experimental semivariogram.
The Variogram Dialogue
When the Variogram Builder button is selected, the Variogram dialogue appears. The top portion of the dialogue displays the experimental semivariogram of the data set along with settings that control the directional calculations. The bottom portion of the dialogue contains settings that allow the tuning of the model. By default a best-fit model is automatically calculated, which can be further modified manually.
The Experimental Variogram Builder section of the dialogue contains seven settings that control the generation of the experimental semivariogram.
Directions describes the number of directions that will be analyzed. The default value is 2 with a minimum of 1 and a maximum of 6.
Active specifies which direction is currently being modified. The active direction is shown by a solid fill colour in the Circle View diagram.
Angle specifies the angle, in degrees with respect to true north, in which the Active direction is facing.
Tolerance sets the interval in degrees on either side of the Angle setting within which points will be considered.
Lag Distance refers to the distance at which sample pairs will be analyzed for variance. This value increases until every point in the data set has been examined. The default value is the mean distance between points for the aggregated data set.
Circle View is a graphical display representing each of the Directions specified and their respective Tolerances. The solid coloured region represents the Active direction, to which any changes to the Angle or Tolerance settings will be applied.
Apply button recalculates and refreshes the semivariogram using any new settings.
The graph in the Variogram dialogue is a semivariogram of the data set, which plots the variance between sample pairs on the Y-axis and the Lag distances for the calculated variances on the X-axis. The experimental semivariograms that appear in the graph are only updated when the Apply button is selected. The model curve(s) update automatically when changes are made to the dialogue that affect the model(s). The visual display of the graph can be modified in only a small number of ways. It can be maximized to full screen, a zoom that has been applied to the view can be undone, and the display can be switched from colour to black and white. These options can be accessed by selecting the right mouse button while the cursor is over the graph.
The bottom portion of the Variogram dialogue provides the ability to modify the model curve so that it better fits the experimental semivariogram. This is done by applying one or more variogram models to the model curve.
Variogram Model reflects the model that will be applied to the model curve. Seven variogram models are available: Spherical, Exponential, Gaussian, Power, Hole Effect, Quadratic, and Rquadratic. Up to six models can be used at any one time.
Range/Power refers to two different aspects depending on the chosen variogram model. For all variogram models except for Power, the value entered will refer to the Range setting. This value indicates the Lag distance where the Range is considered to begin. If the Power variogram model is chosen then the value entered is the power coefficient. Note: A power of 1 yields a linear model. When multiple models are selected, the Range values are summed.
Sill/Slope also refers to two different aspects depending on the chosen variogram model. For all variogram models except Power, the value entered refers to the Sill setting. This value indicates the variance at which the model curve levels off. If the Power variogram model is chosen, the value entered is the Slope, that is, the scale coefficient of the curve. When multiple models are selected, the Sill values are summed.
Anisotropic Modelling is selected when more than one model curve is to be built for the different directions analyzed. Once checked, the Angle and Anisotropy settings become available for each chosen variogram model.
Angle reflects which direction the variogram model will be applied to. The drop-down will list the Angle of the different directions used in the semivariogram.
Anisotropy is the ratio between the range in the minor and major directions of the semivariogram.
Suggested Model analyzes the experimental variogram and chooses the variogram model that best represents it. In some cases it may not be possible to automatically generate a model. A warning message will appear and the user must set a model manually.
Nugget is used when scattered areas of high concentration prevent the semivariogram from passing through the origin (the 0,0 point). The nugget value forces the model curve to meet the Y-axis at a higher value. This has a smoothing effect on the Kriging process and prevents it from acting as an exact interpolator.
Anisotropy View shows the directional trends in the data set. A fat ellipse indicates that there is a greater degree of correlation of the variances between sample pairs in that direction. Conversely, the narrower the ellipse the smaller the correlation is. Each model will have an ellipse drawn that will be rotated to match the Angle setting of the model.
Tip: When using anisotropic modeling, related directions, models, experimental semivariograms, and ellipses in the Anisotropy View are all colour coordinated.
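One common way to apply an anisotropy ratio, shown here as an assumption rather than as the program's documented behaviour, is to convert each separation vector into an equivalent isotropic distance before evaluating the variogram model. The sketch takes the Angle as the azimuth of the major axis measured clockwise from north and the Anisotropy as the minor range divided by the major range. The function name is hypothetical.

```python
import math

def anisotropic_distance(dx, dy, angle_deg, ratio):
    """Equivalent isotropic distance for a separation (dx east, dy north).

    angle_deg: azimuth of the major axis, clockwise from north.
    ratio: minor range / major range, with 0 < ratio <= 1.
    Separations across the major axis are stretched by 1 / ratio, so the
    model reaches its sill sooner in the minor direction.
    """
    a = math.radians(angle_deg)
    along = dx * math.sin(a) + dy * math.cos(a)    # component along the major axis
    across = dx * math.cos(a) - dy * math.sin(a)   # component perpendicular to it
    return math.hypot(along, across / ratio)

# Major axis pointing north, minor range half the major range.
north_10 = anisotropic_distance(0.0, 10.0, angle_deg=0.0, ratio=0.5)
east_10 = anisotropic_distance(10.0, 0.0, angle_deg=0.0, ratio=0.5)
```

With a ratio of 0.5, two points 10 units apart across the major axis are treated as if they were 20 units apart, so they receive less weight than two points the same distance apart along it.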