Chapter 231
Circular Data Correlation
Introduction
This procedure computes summary statistics, generates rose plots and circular histograms, and computes the
circular correlation coefficient for circular data.
Angular data, recorded in degrees or radians, is generated in a wide variety of scientific research areas. Examples
of angular (and cyclical) data include daily wind directions, ocean current directions, departure directions of
animals, direction of bone-fracture plane, and orientation of bees in a beehive after stimuli.
The usual summary statistics, such as the sample mean and standard deviation, cannot be used with angular
values. For example, consider the average of the angular values 1° and 359°. The simple average is 180°, which
points directly away from both observations. With a little thought, we would conclude that 0° is a better answer.
Because of this and other problems, a special set of techniques has been developed for analyzing angular data.
NCSS, LLC. All Rights Reserved.
NCSS Statistical Software NCSS.com
Technical Details
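Suppose a sample of n angles $a_1, a_2, \ldots, a_n$ is to be summarized. It is assumed that these angles are in degrees.
Fisher (1993) and Mardia & Jupp (2000) contain definitions of various summary statistics that are used for
angular data. These results will be presented next. Let

$$C_p = \sum_{i=1}^{n} \cos(p a_i), \quad \bar{C}_p = \frac{C_p}{n}, \quad S_p = \sum_{i=1}^{n} \sin(p a_i), \quad \bar{S}_p = \frac{S_p}{n},$$

$$R_p = \sqrt{C_p^2 + S_p^2}, \quad \bar{R}_p = \frac{R_p}{n}$$

$$T_p = \begin{cases} \tan^{-1}\left(\dfrac{S_p}{C_p}\right) & C_p > 0,\ S_p > 0 \\ \tan^{-1}\left(\dfrac{S_p}{C_p}\right) + \pi & C_p < 0 \\ \tan^{-1}\left(\dfrac{S_p}{C_p}\right) + 2\pi & S_p < 0,\ C_p > 0 \end{cases}$$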
To interpret these quantities it may be useful to imagine that each angle represents a vector of length one in the
direction of the angle. Suppose these individual vectors are arranged so that the beginning of the first vector is at
the origin, the beginning of the second vector is at the end of the first, the beginning of the third vector is at the
end of the second, and so on. We can then imagine a single vector a that will stretch from the origin to the end of
the last observation.
$R_1$, called the resultant length, is the length of $a$. $\bar{R}_1$ is the mean resultant length of $a$. Note that $\bar{R}_1$ varies
between zero and one and that a value of $\bar{R}_1$ near one implies that there was little variation in values of the
angles.
The mean direction, $\theta$, is a measure of the mean of the individual angles. $\theta$ is estimated by $T_1$.
The circular variance, V, measures the variation in the angles about the mean direction. V varies from zero to one.
The formula for V is
$$V = 1 - \bar{R}_1$$
The circular standard deviation, v, is defined as
$$v = \sqrt{-2 \ln(\bar{R}_1)}$$
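As an illustration of the quantities defined above, the following Python sketch computes $C_p$, $S_p$, $R_p$, the mean resultant length, and $T_p$ for angles given in degrees, together with V and v. The function names are illustrative and are not part of NCSS. The sketch assumes that `math.atan2` can stand in for the three-case definition of $T_p$, because it resolves the quadrant directly.

```python
import math

def circular_summary(angles_deg, p=1):
    """Return C_p, S_p, R_p, the mean resultant length, and T_p (in degrees)."""
    n = len(angles_deg)
    rad = [math.radians(a) for a in angles_deg]
    c = sum(math.cos(p * a) for a in rad)   # C_p
    s = sum(math.sin(p * a) for a in rad)   # S_p
    r = math.hypot(c, s)                    # R_p = sqrt(C_p^2 + S_p^2)
    # atan2 picks the correct quadrant; the modulo maps the angle into [0, 360]
    t = math.degrees(math.atan2(s, c)) % 360.0
    return {"C": c, "S": s, "R": r, "R_bar": r / n, "T": t}

def circular_variance(r_bar):
    """V = 1 - R_bar."""
    return 1.0 - r_bar

def circular_sd(r_bar):
    """v = sqrt(-2 ln(R_bar)), expressed in radians."""
    return math.sqrt(-2.0 * math.log(r_bar))
```

For the angles 1 and 359 from the introduction, `circular_summary([1, 359])` returns a mean direction within rounding error of 0 (floating-point error may report it as a value at or just below 360) and a mean resultant length close to one.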
$$\hat{\sigma} = \sqrt{\frac{n(1 - H)}{4R^2}}$$

$$H = \frac{1}{n}\left[\cos(2T_1)\sum_{i=1}^{n}\cos(2a_i) + \sin(2T_1)\sum_{i=1}^{n}\sin(2a_i)\right]$$
Tests of Uniformity
Uniformity refers to the situation in which all values around the circle are equally likely. Occasionally, it is useful
to perform a statistical test of whether a set of data follows the uniform distribution. Several tests of
uniformity have been developed. Note that when any of the following tests rejects the null hypothesis, we can
conclude that the data were not uniform. However, when the test does not reject, we cannot conclude that the data
follow the uniform distribution. Rather, we do not have enough evidence to reject the null hypothesis of uniformity.
Rayleigh Test
The Rayleigh test, discussed in Mardia & Jupp (2000) pages 94-95, is the score test and the likelihood ratio test
for uniformity within the von Mises distribution family. The Rayleigh test statistic is $2n\bar{R}^2$. For large samples, the
distribution of this statistic under uniformity is a chi-square with two degrees of freedom with an error of
approximation of $O(n^{-1})$. A closer approximation to the chi-square with two degrees of freedom is achieved by
the modified Rayleigh test. This test, which has an error of $O(n^{-2})$, is calculated as follows.
$$S^* = \left(1 - \frac{1}{2n}\right) 2n\bar{R}^2 + \frac{n\bar{R}^4}{2}$$
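The two Rayleigh statistics can be sketched in Python as follows. This is a minimal illustration, not the NCSS implementation: the function name and the returned dictionary keys are hypothetical, and the p-values rely on the fact that a chi-square variable with two degrees of freedom has upper-tail probability exp(-x/2).

```python
import math

def rayleigh_test(angles_deg):
    """Rayleigh and modified Rayleigh statistics with chi-square(2) p-values."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    r_bar = math.hypot(c, s) / n                 # mean resultant length
    stat = 2.0 * n * r_bar ** 2                  # Rayleigh statistic 2n*R_bar^2
    modified = (1.0 - 1.0 / (2.0 * n)) * stat + n * r_bar ** 4 / 2.0
    return {
        "statistic": stat,
        "modified": modified,
        # upper tail of a chi-square with 2 degrees of freedom is exp(-x/2)
        "p_value": math.exp(-stat / 2.0),
        "p_value_modified": math.exp(-modified / 2.0),
    }
```

A small p-value suggests that the angles are concentrated around some direction rather than spread uniformly around the circle.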
Modified Kuiper's Test
The modified Kuiper's test, Mardia & Jupp (2000) pages 99-103, was designed to test uniformity against any
alternative. It measures the distance between the cumulative uniform distribution function and the empirical
distribution function. It is accurate for samples as small as 8. The test statistic, V, is calculated as follows
$$V = V_n\left(\sqrt{n} + 0.155 + \frac{0.24}{\sqrt{n}}\right)$$
where
$$V_n = \max_{i=1,\ldots,n}\left(\frac{a_{(i)}}{360} - \frac{i}{n}\right) - \min_{i=1,\ldots,n}\left(\frac{a_{(i)}}{360} - \frac{i}{n}\right) + \frac{1}{n}$$
Published critical values of V are
V Alpha
1.537 0.150
1.620 0.100
1.747 0.050
1.862 0.025
2.001 0.010
This table was used to create an interpolation formula from which the alpha values are calculated.
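The calculation of the modified Kuiper statistic can be sketched as follows. The function name is illustrative, and the sketch assumes that the angles are in degrees and that $a_{(i)}$ denotes the i-th smallest angle after reduction to the range 0 to 360.

```python
import math

def kuiper_statistic(angles_deg):
    """Return (V_n, V) for the modified Kuiper test of uniformity."""
    n = len(angles_deg)
    # sorted angles rescaled to the unit interval
    u = sorted((a % 360.0) / 360.0 for a in angles_deg)
    diffs = [u_i - i / n for i, u_i in enumerate(u, start=1)]
    v_n = max(diffs) - min(diffs) + 1.0 / n
    v = v_n * (math.sqrt(n) + 0.155 + 0.24 / math.sqrt(n))
    return v_n, v
```

The returned V can then be compared with the published critical values. For example, a value above 1.747 would reject uniformity at the 0.05 level.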
Watson Test
The following uniformity test is outlined in Mardia & Jupp (2000) pages 103-105. The test is conducted by calculating
$U^2$ and comparing it to a table of values. If the calculated value is greater than the critical value, the null
hypothesis of uniformity is rejected. Note that the test is only valid for samples of at least eight angles.
The calculation of $U^2$ is as follows
$$U^2 = \sum_{i=1}^{n}\left[u_{(i)} - \bar{u} - \frac{i - \tfrac{1}{2}}{n} + \frac{1}{2}\right]^2 + \frac{1}{12n}$$
where
$$\bar{u} = \frac{\sum_{i=1}^{n} u_{(i)}}{n}, \quad u_{(i)} = \frac{a_{(i)}}{360}$$
and $a_{(1)} \le a_{(2)} \le a_{(3)} \le \cdots \le a_{(n)}$ are the sorted angles. Note that maximum likelihood estimates of $\theta$ and $\kappa$ are used
in the distribution function. Mardia & Jupp (2000) present a table of critical values that has been entered into
NCSS. When a value of $U^2$ is calculated, the table is interpolated to determine its significance level.
Published critical values of $U^2$ are
$U^2$ Alpha
0.131 0.150
0.152 0.100
0.187 0.050
0.221 0.025
0.267 0.010
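The Watson statistic for uniformity can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the angles are in degrees and are reduced to the range 0 to 360 before sorting.

```python
def watson_u2(angles_deg):
    """Watson's U^2 statistic for testing uniformity on the circle."""
    n = len(angles_deg)
    # u_(i) = a_(i) / 360 for the sorted angles
    u = sorted((a % 360.0) / 360.0 for a in angles_deg)
    u_bar = sum(u) / n
    total = sum(
        (u_i - u_bar - (i - 0.5) / n + 0.5) ** 2
        for i, u_i in enumerate(u, start=1)
    )
    return total + 1.0 / (12.0 * n)
```

Values of the statistic larger than the tabulated critical value (for example, 0.187 at the 0.05 level) lead to rejection of uniformity.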
The probability density function of the von Mises distribution is
$$f(a; \theta, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\left[\kappa \cos(a - \theta)\right]$$
where $I_p(x)$ (the modified Bessel function of the first kind and order p) is defined by
$$I_p(x) = \sum_{r=0}^{\infty} \frac{1}{(r+p)!\,r!}\left(\frac{x}{2}\right)^{2r+p}, \quad p = 0, 1, 2, \ldots$$
In particular
$$I_0(x) = \sum_{r=0}^{\infty} \frac{1}{(r!)^2}\left(\frac{x}{2}\right)^{2r} = \frac{1}{2\pi}\int_0^{2\pi} e^{x\cos(\phi)}\,d\phi$$
The parameter $\theta$ is the mean direction and the parameter $\kappa$ is the concentration parameter.
The distribution is unimodal. It is symmetric about $\theta$. It appears as a normal distribution that is truncated at plus
and minus 180 degrees. When $\kappa$ is zero, the von Mises distribution reduces to the uniform distribution. As $\kappa$ gets
large, the von Mises distribution approaches the normal distribution.
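The density and the Bessel series above can be evaluated directly. The following Python sketch is one way to do so, under the assumption that a truncated power series (60 terms by default) is adequate, which holds for moderate arguments only. The function names are illustrative.

```python
import math

def bessel_i(p, x, terms=60):
    """Modified Bessel function of the first kind, order p, by its power series.

    The series is truncated after `terms` terms, so this is only suitable
    for moderate values of x.
    """
    return sum(
        (x / 2.0) ** (2 * r + p) / (math.factorial(r + p) * math.factorial(r))
        for r in range(terms)
    )

def von_mises_pdf(a_deg, theta_deg, kappa):
    """von Mises density (per radian) at angle a_deg, given in degrees."""
    diff = math.radians(a_deg - theta_deg)
    return math.exp(kappa * math.cos(diff)) / (2.0 * math.pi * bessel_i(0, kappa))
```

With `kappa` set to zero the function returns the constant 1/(2 pi), which is the uniform density on the circle.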
Point Estimation
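The maximum likelihood estimate of $\theta$ is the sample mean direction. That is, $\hat{\theta} = T_1$.
The maximum likelihood estimate of $\kappa$ is the solution to
$$A_1(\kappa) = \bar{R}$$
where
$$A_1(x) = \frac{I_1(x)}{I_0(x)}.$$
That is, the MLE of $\kappa$ is given by
$$\kappa^* = A_1^{-1}(\bar{R})$$
This can be approximated by (see Fisher (1993) page 88 and Mardia & Jupp (2000) pages 85-86)
$$\kappa^* = \begin{cases} 2\bar{R} + \bar{R}^3 + \dfrac{5\bar{R}^5}{6} & \bar{R} < 0.53 \\ -0.4 + 1.39\bar{R} + \dfrac{0.43}{1 - \bar{R}} & 0.53 \le \bar{R} < 0.85 \\ \dfrac{1}{\bar{R}^3 - 4\bar{R}^2 + 3\bar{R}} & \bar{R} \ge 0.85 \end{cases}$$
This estimate is very biased when the sample size is small. This bias is corrected by using the following modified estimator.
$$\hat{\kappa} = \begin{cases} \max\left(\kappa^* - \dfrac{2}{n\kappa^*},\ 0\right) & \kappa^* < 2,\ n \le 15 \\ \dfrac{(n-1)^3 \kappa^*}{n(n^2 + 1)} & \kappa^* \ge 2,\ n \le 15 \\ \kappa^* & n > 15 \end{cases}$$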
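The approximation and the small-sample correction can be sketched in Python as below. The function names are hypothetical. The sketch follows the piecewise approximation in Fisher (1993), taking the middle branch to apply for a mean resultant length from 0.53 up to 0.85, and it adds one assumption of its own: a non-positive starting estimate is returned as zero to avoid dividing by zero.

```python
def kappa_approx(r_bar):
    """Approximate MLE of the concentration from the mean resultant length.

    Not defined at r_bar = 1, where the last branch divides by zero.
    """
    if r_bar < 0.53:
        return 2.0 * r_bar + r_bar ** 3 + 5.0 * r_bar ** 5 / 6.0
    if r_bar < 0.85:
        return -0.4 + 1.39 * r_bar + 0.43 / (1.0 - r_bar)
    return 1.0 / (r_bar ** 3 - 4.0 * r_bar ** 2 + 3.0 * r_bar)

def kappa_corrected(kappa_star, n):
    """Bias-corrected estimate; the correction applies only when n <= 15."""
    if n > 15:
        return kappa_star
    if kappa_star <= 0.0:
        return 0.0  # assumption: guard against division by zero
    if kappa_star < 2.0:
        return max(kappa_star - 2.0 / (n * kappa_star), 0.0)
    return (n - 1) ** 3 * kappa_star / (n * (n ** 2 + 1))
```

For example, `kappa_corrected(kappa_approx(0.9), 10)` shrinks the uncorrected estimate, reflecting the upward bias of the MLE in small samples.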
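$$T_1 \pm \cos^{-1}\left[\sqrt{\frac{2n\left(2R^2 - nz^2\right)}{R^2\left(4n - z^2\right)}}\right] \quad \text{if } \bar{R} \le 2/3$$

$$T_1 \pm \cos^{-1}\left[\frac{\sqrt{n^2 - \left(n^2 - R^2\right)\exp\left(z^2/n\right)}}{R}\right] \quad \text{if } \bar{R} > 2/3$$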
$$b = \frac{n(1 - \bar{R})}{\chi^2_{n-1,\,1-\alpha/2}}$$

$$d = \frac{n(1 - \bar{R})}{\chi^2_{n-1,\,\alpha/2}}$$

$$\bar{p} = \frac{\sum_{i=1}^{n} p_{(i)}}{n}$$

$$p_{(i)} = F\left(a_{(i)} - T_1\right)$$

$a_{(1)} \le a_{(2)} \le a_{(3)} \le \cdots \le a_{(n)}$ are the sorted angles and $F(a)$ is the cumulative distribution function of the von
Mises distribution. Note that maximum likelihood estimates of $\theta$ and $\kappa$ are used in the distribution function.
Lockhart & Stephens (1985) present a table of critical values that has been entered into NCSS. When a value of
$U^2$ is calculated, the table is interpolated to determine its significance level.
Cox Test
Mardia & Jupp (2000) pages 142-143 present a von Mises goodness-of-fit test that was originally given by Cox
(1975).
The test statistic, C, is distributed as a chi-squared variable with two degrees of freedom under the null hypothesis
that the data follow the von Mises distribution. It is calculated as follows.
$$C = \frac{s_c^2}{n\,v_c(\hat{\kappa})} + \frac{s_s^2}{n\,v_s(\hat{\kappa})}$$
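where
$$s_c = \sum_{i=1}^{n} \cos 2(a_i - T_1) - n\,\alpha_2(\hat{\kappa})$$

$$s_s = \sum_{i=1}^{n} \sin 2(a_i - T_1)$$

$$v_c(x) = \frac{1 + \alpha_4}{2} - \alpha_2^2 - \frac{\left[\alpha_1/2 + \alpha_3/2 - \alpha_1\alpha_2\right]^2}{(1 + \alpha_2)/2 - \alpha_1^2}$$

$$v_s(x) = \frac{1 - \alpha_4}{2} - \frac{(\alpha_1 - \alpha_3)^2}{2(1 - \alpha_2)}$$

and $\alpha_p = \alpha_p(x) = I_p(x)/I_0(x)$.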
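The circular correlation coefficient of Jammalamadaka and SenGupta (2001), $r_c$, is
$$r_c = \frac{\sum_{k=1}^{n} \sin(a_{1k} - T_{1,1}) \sin(a_{2k} - T_{2,1})}{\sqrt{\sum_{k=1}^{n} \sin^2(a_{1k} - T_{1,1}) \sum_{k=1}^{n} \sin^2(a_{2k} - T_{2,1})}}$$
where $T_{1,1}$ is the mean direction of the first circular variable and $T_{2,1}$ is the mean direction of the second.
The significance of this correlation coefficient can be tested using the fact that $z_r$ is approximately distributed as a
standard normal, where
$$z_r = r_c \sqrt{\frac{n\,\lambda_{20}\,\lambda_{02}}{\lambda_{22}}}$$
and
$$\lambda_{ij} = \frac{1}{n}\sum_{k=1}^{n} \sin^i(a_{1k} - T_{1,1}) \sin^j(a_{2k} - T_{2,1})$$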
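The correlation coefficient and its large-sample test can be sketched in Python as follows. This is an illustration rather than the NCSS code: the function names are hypothetical, the angles are assumed to be paired and in degrees, and the p-value is assumed to be two-sided.

```python
import math

def mean_direction(rad):
    """Mean direction (radians) of a list of angles in radians."""
    return math.atan2(sum(math.sin(a) for a in rad),
                      sum(math.cos(a) for a in rad))

def circular_correlation(a1_deg, a2_deg):
    """Return (r_c, z_r, two-sided p-value) for paired angles in degrees."""
    if len(a1_deg) != len(a2_deg):
        raise ValueError("the two variables must have the same length")
    n = len(a1_deg)
    x = [math.radians(a) for a in a1_deg]
    y = [math.radians(a) for a in a2_deg]
    t1, t2 = mean_direction(x), mean_direction(y)
    sx = [math.sin(a - t1) for a in x]   # sin(a_1k - T_1,1)
    sy = [math.sin(a - t2) for a in y]   # sin(a_2k - T_2,1)
    sxx = sum(p * p for p in sx)
    syy = sum(q * q for q in sy)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined when a variable has no spread")
    r_c = sum(p * q for p, q in zip(sx, sy)) / math.sqrt(sxx * syy)
    lam20, lam02 = sxx / n, syy / n
    lam22 = sum((p * q) ** 2 for p, q in zip(sx, sy)) / n
    z_r = r_c * math.sqrt(n * lam20 * lam02 / lam22)
    # two-sided tail area of the standard normal (assumption: two-sided test)
    p_value = math.erfc(abs(z_r) / math.sqrt(2.0))
    return r_c, z_r, p_value
```

Rotating one variable by a constant leaves the coefficient unchanged, and reflecting it (replacing each angle a by 360 - a) reverses the sign.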
Data Structure
The data consist of two or more variables. Each variable contains a set of angular values. An example of a dataset
containing circular data is Circular3.S0. Missing values are entered as blanks (empty cells).
Procedure Options
This section describes the options available in this procedure.
Variables Tab
These options specify the variables that will be used in the analysis.
Data Variables
Data Variables
Specify two or more variables that contain the angular values. The angular correlation coefficient will be
calculated for each pair of variables. These variables must be of the type specified in 'Data Type'.
Data Type
Specify the type of circular data that is contained in the Data Variables. Note that all variables must be of the
same data type. The possible data types are
Angle (0 to 360)
Data are in the range 0 to 360 degrees. Negative values are converted to positive values by adding 360
(e.g. -20 becomes 340). Data outside 0 to 360 are converted to this range by subtracting (or adding)
360 until the value is in this range.
Radian (0 to 2 pi)
Data are in the range 0 to 2 pi radians. Negative values are converted to positive values by adding
2 pi.
Axial (0 to 180)
Data are bidirectional. Axial data are converted to angular data by multiplying by two. Axial data may be in
the full 0-360 range.
Compass
Text data representing the 16 points of the compass are entered. Values are converted into degrees using the
recodes: N = 0, E = 90, S = 180, W = 270. Two- and three-letter abbreviations may be used. For example, 'NNW' is
north-northwest (337.5 degrees).
Time (0-24)
Time of day values between 0 and 24 may be entered.
Weekday
Integers representing the days of the week are entered. The relationship is 1 = Monday, 2 = Tuesday, ..., 7 =
Sunday. The integers are converted to degrees using 1 = 180/7, 2 = 180/7+360/7, and so on.
Month of Year
Integers representing the months of the year are entered. The relationship is 1 = January, 2 = February, ..., 12
= December. The integers are converted to degrees using 1 = 180/12, 2 = 180/12+360/12, and so on.
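The conversions described for the data types can be sketched as a single Python function. The data-type strings and the function name are hypothetical and do not correspond to NCSS identifiers. The 22.5-degree compass step follows from dividing the circle into 16 points. The time conversion (15 degrees per hour) is an assumption, because the text does not state how time of day is mapped to degrees.

```python
import math

# The 16 compass points in clockwise order, 22.5 degrees apart
COMPASS_POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                  "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def to_degrees(value, data_type):
    """Convert one raw value to an angle in degrees in the range 0 to 360."""
    if data_type == "angle":
        return float(value) % 360.0          # e.g. -20 becomes 340
    if data_type == "radian":
        return math.degrees(float(value)) % 360.0
    if data_type == "axial":
        return (2.0 * float(value)) % 360.0  # axial data are doubled
    if data_type == "compass":
        return COMPASS_POINTS.index(str(value).strip().upper()) * 22.5
    if data_type == "time":
        # assumption: 24 hours map linearly onto 360 degrees
        return (float(value) * 15.0) % 360.0
    if data_type == "weekday":
        return 180.0 / 7 + (int(value) - 1) * 360.0 / 7
    if data_type == "month":
        return 180.0 / 12 + (int(value) - 1) * 360.0 / 12
    raise ValueError("unknown data type: " + str(data_type))
```

Weekdays and months are placed at the midpoint of their sector, so that January, for example, sits at 15 degrees rather than at 0.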
Grouping Factor
Grouping Correction Factor
When the same data values occur repeatedly, a correction factor is suggested for the calculation of R bar. This
correction factor depends on the number of unique values, which is entered here. If '0' is entered, no correction
factor is used.
Reports Tab
The options on this panel control which reports and plots are displayed.
Confidence Coefficient
Confidence Coefficient
Specify the value of the confidence coefficient for the confidence intervals.
Select Reports
Summary Reports ... Correlations
Select these options to display the indicated reports.
Report Options
Show Notes
This option controls whether the available notes and comments are displayed at the bottom of each report.
This option lets you omit these notes to reduce the length of the output.
Precision
Specify the precision of numbers in the report. A single-precision number will show seven-place accuracy, while
a double-precision number will show thirteen-place accuracy. Note that the reports were formatted for single
precision. If you select double precision, some numbers may run into others. Also note that all calculations are
performed in double precision regardless of which option you select here. This is for reporting purposes only.
Variable Names
This option lets you select whether to display only variable names, variable labels, or both.
Plots Tab
The options on this panel control the appearance of the plots.
Select Plots
Rose Plot / Circular Histogram (Combined) and Rose Plots / Circular Histograms (Individual)
Select these options to display the indicated plots.
Format
Click the plot format button to change the plot settings (see the Window Options below).
Edit During Run
Checking this option will cause the bar chart format window to appear when the procedure is run. This allows you
to modify the format of the graph with the actual data.
Data Type
The data type of the plot is specified independently of the data type specified on the Variables tab of the Circular
Data Analysis procedure.
Direction
This option indicates whether the orientation of the plot is in a 'Clockwise' or 'Counter-Clockwise' direction.
Interior Objects
The two choices for plot styles are Rose Plot and Circular Histogram.
Group Display
When the data are grouped, this option determines whether the petals within a bin are side-by-side, stacked
upon each other, or overlaid.
Side-by-Side
The bin width is divided equally by the number of groups and the petals are laid out sequentially in the bin.
Although the petals are narrower, they still encompass the points of the group that fall within the boundaries of the
whole bin.
Stacked
A single petal in each bin is divided by the number of groups. Rose plots with the group display set to Stacked
may be misleading because the proportional area is larger for the outside groups.
Overlaid
Each petal for each group is overlaid in each bin. Some degree of transparency is recommended when using the
Overlaid group display. It is also difficult to distinguish groups when there are more than 2 or 3 groups.
Petal Width
Specify the percent of the total width of each bin that is to be used for each petal.
Number of Bins
Specify the number of bins for the circle.
References Tab
Direction References
The options in this section allow you to specify the tick marks and references going around the plot.
Magnitude References
The options in this section allow you to specify the tick marks and references going from the center to the outside
of the plot.
The following reports and charts will be displayed in the Output window.
Variable
This is the variable presented on this line.
Sample Size
This is the number of non-missing values in this variable.
Mean Direction
This is the estimated mean direction, $T_1$.
Circular Variance
The circular variance, V, is a measure of variation in the data. Note that $V = 1 - \bar{R}_1$.
Circular Dispersion
The circular dispersion, $\delta = \dfrac{1 - T_2}{2\bar{R}_1^2}$, is another measure of variation.
This report provides the angular correlation coefficient of each pair of variables as defined in Jammalamadaka and
SenGupta (2001). It also provides the results of a large sample significance test of whether the correlation is zero.
This report provides the large sample confidence interval for the mean direction as described by Upton &
Fingleton (1989) page 220. Note that this interval does not require the assumption that the data come from the von
Mises distribution.
This report provides measures of data variation and dispersion which were defined in the Statistical Summary
Report. It also provides measures of the skewness and kurtosis of the data.
Skewness
This is a measure of the skewness (lack of symmetry about the mean) in the data. Symmetric, unimodal datasets
have a skewness value near zero.
Kurtosis
This is a measure of the kurtosis (peakedness) in the data. Von Mises datasets have a kurtosis near zero.
This report provides estimates and confidence intervals of the parameters (mean direction and concentration) of
the von Mises distribution that best fits the data. Note that the von Mises distribution is a symmetric, unimodal
distribution. You should check the rose plot or circular histogram to determine if the data are symmetric.
The formulas used in the estimation and confidence intervals were given earlier in this chapter. They come from
Mardia & Jupp (2000).
This report provides summary statistics that are used in other calculations.
Mean Cos(a)
This is $\bar{C}_1 = \frac{1}{n}\sum_{i=1}^{n} \cos(a_i)$.
Mean Sin(a)
This is $\bar{S}_1 = \frac{1}{n}\sum_{i=1}^{n} \sin(a_i)$.
Mean Cos(2a)
This is $\bar{C}_2 = \frac{1}{n}\sum_{i=1}^{n} \cos(2a_i)$.
Mean Sin(2a)
This is $\bar{S}_2 = \frac{1}{n}\sum_{i=1}^{n} \sin(2a_i)$.
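R bar
This is $\bar{R}_1 = \frac{1}{n}\sqrt{C_1^2 + S_1^2}$.
2R bar
This is $\bar{R}_2 = \frac{1}{n}\sqrt{C_2^2 + S_2^2}$.
Theta, 2 Theta
This is calculated using the following formula with p set to 1 and then 2, respectively.
$$T_p = \begin{cases} \tan^{-1}\left(\dfrac{S_p}{C_p}\right) & C_p > 0,\ S_p > 0 \\ \tan^{-1}\left(\dfrac{S_p}{C_p}\right) + \pi & C_p < 0 \\ \tan^{-1}\left(\dfrac{S_p}{C_p}\right) + 2\pi & S_p < 0,\ C_p > 0 \end{cases}$$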
Notes:
The tests in this report assess the goodness-of-fit of the uniform distribution.
The Rayleigh test requires samples of at least 20.
The Kuiper and Watson tests require samples of at least 8.
This section reports the results of three goodness-of-fit tests for the uniform distribution. They were documented
earlier in this chapter.
These tests may be viewed as testing whether the data are distributed uniformly around the circle.
Notes:
The tests in this report assess the goodness-of-fit of the von Mises distribution.
Both tests require samples of at least 20.
This section reports the results of two goodness-of-fit tests for the von Mises distribution. They were documented
earlier in this chapter. Several hypothesis tests assume that the data follow a von Mises distribution. These tests
allow you to check the accuracy of this assumption.
Rose Plots
These plots show the distribution of the data around the circle.
Circular Histograms
The circular histograms are generated by setting the Interior Objects on Plot to Circular Histogram.