Shapiro
Introduction:
Shapiro and Wilk developed a goodness-of-fit test for normality that may be used instead of the
Lilliefors test.
Some empirical studies indicate that this test has good power in many situations when
compared with the Lilliefors and chi-square tests.
The data consist of a random sample $X_1, X_2, \ldots, X_n$ of size $n$ associated with some unknown
distribution function $F(x)$.
Assumption: The sample is a random sample.
Objective: The Shapiro–Wilk test can be used to decide whether or not a sample fits a normal
distribution, and it is commonly used for small samples (𝑛 < 50).
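Parametric Counterpart: A common alternative to the Shapiro-Wilk test is the Anderson-Darling test;
strictly speaking, it is another goodness-of-fit test rather than a parametric counterpart. Both tests
assess the normality assumption of a dataset, but they differ in approach: the Shapiro-Wilk statistic
compares the ordered sample with the values expected under normality, whereas the Anderson-Darling
statistic measures a weighted distance between the empirical distribution function and the
hypothesized normal distribution function.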
Procedure:
Step 1
𝐻𝑜 : 𝐹(𝑥) is a normal distribution function with unspecified mean and variance.
𝐻1 : 𝐹(𝑥) is non-normal.
Step 2
Level of significance: 𝛼 = 0.05 (unless otherwise stated)
Step 3
Test Statistic: First compute the denominator $D$ of the test statistic,
$$D = \sum_{i=1}^{n} (x_i - \bar{x})^2,$$
where $\bar{x}$ is the sample mean. Then order the sample from smallest to largest, $x_1 \le \cdots \le x_n$.
From the Shapiro-Wilk table for the observed sample size $n$, obtain the coefficients $a_1, a_2, \ldots, a_m$,
where $m = n/2$ if $n$ is even and $m = (n-1)/2$ if $n$ is odd. The test statistic $W$ is given as:
$$W = \frac{1}{D}\left[\sum_{i=1}^{m} a_i \,(x_{n+1-i} - x_i)\right]^2$$
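The Step 3 recipe can be sketched in Python as below. This is a minimal illustration, not a library routine: the function name `shapiro_wilk_w` is hypothetical, and the coefficients $a_1, \ldots, a_m$ are assumed to be copied by hand from a Shapiro-Wilk table (the code does not compute them).

```python
def shapiro_wilk_w(sample, a):
    """Return the Shapiro-Wilk statistic W for `sample`.

    `a` holds the tabulated coefficients a_1, ..., a_m for this sample
    size (m = n // 2); they are taken as given, not computed here.
    """
    n = len(sample)
    m = n // 2  # n/2 for even n, (n - 1)/2 for odd n (middle value unused)
    if len(a) != m:
        raise ValueError(f"expected {m} coefficients for n = {n}, got {len(a)}")

    mean = sum(sample) / n
    d = sum((x - mean) ** 2 for x in sample)  # denominator D

    x = sorted(sample)  # x[0] <= x[1] <= ... <= x[n - 1]
    # a[i] pairs the (i+1)-th largest with the (i+1)-th smallest value,
    # i.e. a_i * (x_{n+1-i} - x_i) in the 1-based notation of Step 3.
    b = sum(a[i] * (x[n - 1 - i] - x[i]) for i in range(m))
    return b ** 2 / d
```

Keeping the coefficients as an argument mirrors the hand procedure, where they are looked up in a table for the given $n$.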
Step 4
Calculations:
Step 5
Critical Region: Reject $H_0$ if $W < W_\alpha$, where $W_\alpha$ is the critical value from the Shapiro-Wilk table.
Step 6
Conclusion:
Example 1
A random sample of 12 people is taken from a large population: 65, 61, 63, 86, 70, 55, 74, 35, 72, 68, 45, 58.
Is this data normally distributed?
Solution:
Step 1
𝐻𝑜 : The data is normally distributed.
𝐻1 : The data is not normally distributed.
Step 2
Level of significance: 𝛼 = 0.05
Step 3
Test Statistic:
$$W = \frac{1}{D}\left[\sum_{i=1}^{m} a_i \,(x_{n+1-i} - x_i)\right]^2, \qquad \text{where } D = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Step 4
Calculations: For the calculation of $\left[\sum_{i=1}^{m} a_i \,(x_{n+1-i} - x_i)\right]^2$: $n = 12$, so $m = n/2 = 12/2 = 6$.
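Ordered sample and squared deviations from the mean $\bar{x} = 752/12 = 62.667$:

x_i   (x_i − x̄)²    x_i   (x_i − x̄)²
35    765.444        65    5.444
45    312.111        68    28.444
55    58.778         70    53.778
58    21.778         72    87.111
61    2.778          74    128.444
63    0.111          86    544.444
                     Total = 2008.67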
$$W = \frac{(44.1641)^2}{2008.67} = \frac{1950.47}{2008.67} = 0.9710$$
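The bracketed sum 44.1641 in the calculation above comes from six terms $a_i(x_{13-i} - x_i)$. The sketch below lists them, assuming the standard tabulated coefficients for $n = 12$ ($a_1, \ldots, a_6$ = 0.5475, 0.3325, 0.2347, 0.1586, 0.0922, 0.0303), which reproduce that sum; it also recomputes $D$ and $W$.

```python
# Example 1: term-by-term evaluation of the Shapiro-Wilk numerator (n = 12, m = 6).
sample = [65, 61, 63, 86, 70, 55, 74, 35, 72, 68, 45, 58]
a = [0.5475, 0.3325, 0.2347, 0.1586, 0.0922, 0.0303]  # table values for n = 12

x = sorted(sample)
n = len(x)
m = n // 2
mean = sum(x) / n
d = sum((xi - mean) ** 2 for xi in x)  # denominator D

b = 0.0
for i in range(m):
    high, low = x[n - 1 - i], x[i]  # x_{n+1-i} and x_i in 1-based notation
    term = a[i] * (high - low)
    b += term
    print(f"a_{i + 1} = {a[i]:.4f}  x_{n - i} - x_{i + 1} = {high} - {low} = {high - low:2d}  term = {term:.4f}")

w = b ** 2 / d
print(f"sum = {b:.4f}  D = {d:.2f}  W = {w:.4f}")
```

The last printed line reads `sum = 44.1641  D = 2008.67  W = 0.9710`.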
Step 5
Critical Region: Reject $H_0$ if $W < W_{0.05}$, where $W_{0.05} = 0.859$ for $n = 12$ (from the Shapiro-Wilk table),
i.e. reject if $W < 0.859$.
Step 6
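Conclusion: Since $W$ is greater than $W_{0.05} = 0.859$, we cannot reject $H_0$; at $\alpha = 0.05$ the data are consistent with a normal distribution.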
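Large Sample Size: Shapiro and Francia (1972) suggest an approximate test for $n$ greater than 50 that
is similar to the Shapiro-Wilk test but avoids the coefficients $a_i$, which the original Shapiro-Wilk
table gives only for $n \le 50$.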
Example 2
A random sample of 60 people is taken from a large population: 47, 32, 55, 28, 63, 41, 38, 49, 52, 45,
36, 58, 29, 44, 51, 33, 42, 57, 39, 48, 31, 53, 46, 35, 50, 37, 60, 30, 43, 54, 56, 40, 61, 34, 59, 27, 62,
64, 25, 70, 26, 68, 67, 69, 66, 72, 73, 74, 71, 65, 23, 75, 22, 24, 21, 76, 20, 19, 18, 77, 17, 78. Is this
data normally distributed?
Solution:
Step 1
𝐻𝑜 : The data is normally distributed.
𝐻1 : The data is not normally distributed.
Step 2
Level of significance: 𝛼 = 0.05
Step 3
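Test Statistic:
$$Z = \frac{\ln(1 - W') - \mu}{\sigma}$$
where
$$W' = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(m_i - \bar{m})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (m_i - \bar{m})^2\right]}},$$
the $x_i$ are the ordered observations, and the $m_i$ are the expected values of the standard normal
order statistics for a sample of size $n$, with mean $\bar{m}$.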
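A minimal Python sketch of $W'$ as defined above. The example does not say how its $m_i$ were obtained, so here they are approximated with Blom's formula $m_i \approx \Phi^{-1}\big((i - 3/8)/(n + 1/4)\big)$; this is an assumption, and results may differ slightly from values based on exact expected order statistics. The function name `w_prime` is hypothetical. Note that Shapiro and Francia's original statistic is the square of this correlation, so `w_prime(sample) ** 2` gives that form.

```python
from statistics import NormalDist, fmean


def w_prime(sample):
    """Correlation between the ordered sample and approximate normal scores.

    The expected normal order statistics m_i are approximated by Blom's
    formula m_i = Phi^{-1}((i - 3/8) / (n + 1/4)); this is an assumption,
    not necessarily how the worked example obtained its m_i.
    """
    x = sorted(sample)
    n = len(x)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

    x_bar, m_bar = fmean(x), fmean(m)
    num = sum((xi - x_bar) * (mi - m_bar) for xi, mi in zip(x, m))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    smm = sum((mi - m_bar) ** 2 for mi in m)
    return num / (sxx * smm) ** 0.5
```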
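Step 4
Calculations:
Numerator:
$$\sum_{i=1}^{n} (x_i - \bar{x})(m_i - \bar{m}) = 1027.5893$$
Denominator:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 19096.18333, \qquad \sum_{i=1}^{n} (m_i - \bar{m})^2 = 57.1166$$
$$\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (m_i - \bar{m})^2\right]} = \sqrt{(19096.18333)(57.1166)} = 1044.3702$$
$$W' = \frac{1027.5893}{1044.3702} = 0.98393$$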
Now,
$$Z = \frac{\ln(1 - W') - \mu}{\sigma}$$
with
$$\mu = 0.0038915(\ln n)^3 - 0.083751(\ln n)^2 - 0.31082 \ln n - 1.5861$$
$$\mu = 0.0038915(\ln 60)^3 - 0.083751(\ln 60)^2 - 0.31082 \ln 60 - 1.5861 = -3.9956$$
$$\sigma = e^{0.0030302(\ln n)^2 - 0.082676 \ln n - 0.4803}$$
$$\sigma = e^{0.0030302(\ln 60)^2 - 0.082676 \ln 60 - 0.4803} = 0.46394$$
$$Z = \frac{\ln(1 - 0.98393) - (-3.9956)}{0.46394} \approx -0.291$$
P-value $= P(Z > -0.291) \approx 0.614$ (from the standard normal distribution table)
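The normalization above can be checked with the sketch below. It codes the $\mu$ and $\sigma$ expressions given above and takes the upper-tail standard normal probability $P(Z > z)$ as the P-value (small values of $W'$ give large $Z$). The function name `normalized_z` is hypothetical. With $W' = 0.98393$ and $n = 60$ it gives $\mu \approx -3.9956$, $\sigma \approx 0.46394$, $Z \approx -0.291$ and a P-value of about 0.61.

```python
import math
from statistics import NormalDist


def normalized_z(w, n):
    """Return (mu, sigma, z, p_value) for ln(1 - w) using the example's formulas.

    mu and sigma are the expressions in ln(n) quoted in the worked example;
    the P-value is the upper-tail probability P(Z > z), since departures
    from normality make w small and therefore z large.
    """
    u = math.log(n)
    mu = 0.0038915 * u**3 - 0.083751 * u**2 - 0.31082 * u - 1.5861
    sigma = math.exp(0.0030302 * u**2 - 0.082676 * u - 0.4803)
    z = (math.log(1 - w) - mu) / sigma
    p_value = 1 - NormalDist().cdf(z)
    return mu, sigma, z, p_value


mu, sigma, z, p = normalized_z(0.98393, 60)
print(f"mu = {mu:.4f}, sigma = {sigma:.5f}, Z = {z:.4f}, P-value = {p:.3f}")
```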
Step 5
Critical Region: Reject $H_0$ if P-value $< \alpha$.
Step 6
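Conclusion: Since the P-value (0.614) is greater than $\alpha = 0.05$, we cannot reject $H_0$; the data are consistent with a normal distribution.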