Project Work
LIST OF CONTENTS
Chapter 1: Introduction
1.1 Origin and Development
1.2 Terminologies and Definitions
1.3 Single-Tailed Test
1.4 Two-Tailed Test
Chapter 2: Parametric Tests
2.1 Introduction
2.2 Assumptions
2.3 Types of Tests
2.4 Methodology
2.5 Application
Chapter 3: Small Sample Tests
Chapter 4: Large Sample Tests
Chapter 5: Non-Parametric Tests
CHAPTER 1
INTRODUCTION
1.1 Origin and Development
In today’s data-driven world, decisions are based on data all the time.
Hypotheses play a crucial role in that process, whether in business, the
health sector, academia, agriculture, or quality improvement. Without
hypotheses and hypothesis tests, you risk drawing the wrong conclusions and
making bad decisions.
Hypothesis testing is a systematic procedure for deciding whether the
results of a research study support a particular theory.
Hypothesis testing or significance testing is a method for testing a claim or
hypothesis about a parameter in a population, using data measured in a sample.
In this method, we test some hypotheses by determining the likelihood that a
sample statistic could have been selected, if the hypothesis regarding the
population parameter were true.
Hypothesis testing is the method of using samples to learn more about the
characteristics of a given population. It involves making an assumption
about a population parameter or distribution and testing it further.
While hypothesis testing was popularized early in the 20th century, early forms
were used in the 1700s. The first use is credited to John Arbuthnot (1710),
followed by Pierre-Simon Laplace (1770s), in analyzing the human sex ratio at
birth.
In 1778, Pierre Laplace compared the birthrates of boys and girls in multiple
European cities. He stated: "It is natural to conclude that these possibilities are
very nearly in the same ratio". Thus, the null hypothesis in this case is that the
birthrates of boys and girls should be equal.
In 1900, Karl Pearson developed the chi-square test to determine ‘whether a
given form of frequency curve will effectively describe the samples drawn from
a given population.’ Thus, the null hypothesis is that a population is described
by some distribution predicted by theory.
In 1904, Karl Pearson developed the concept of "contingency" to determine
whether outcomes are independent of a given categorical factor. Here the null
hypothesis is by default that two things are unrelated (e.g. scar formation and
death rates from smallpox). In this case, the null hypothesis is no longer
predicted by theory or conventional wisdom, but instead by the principle of
indifference that led Fisher and others to dismiss the use of "inverse
probabilities".
Modern significance testing is largely the product of Karl Pearson (p-value,
Pearson's chi-squared test), William Sealy Gosset (Student's t-distribution), and
Ronald Fisher ("null hypothesis", analysis of variance, "significance test"),
while hypothesis testing was developed by Jerzy Neyman and Egon Pearson
(son of Karl). Ronald Fisher began his life in statistics as a Bayesian (Zabell
1992), but Fisher soon grew disenchanted with the subjectivity involved
(namely use of the principle of indifference when determining prior
probabilities) and sought to provide a more "objective" approach to inductive
inference.
1.2.2 Sample
A subset of data selected from the population is known as a sample. The
method or technique used for selecting data from the population is referred
to as sampling. Sampling can easily be related to the example of forming a subset
from an original set. Since the analysis of the whole population is impractical or
impossible (due to size, cost, or time constraints), samples are drawn. Sampling
is required to estimate a large population's characteristics or behaviour with a
sample's help. There are several techniques for sampling data which include
simple random sampling, systematic sampling, stratified sampling, cluster
sampling, etc.
1.2.3 Estimation
The process of drawing inferences about a population using its sample is known
as estimation. Estimation in statistics is a method to calculate the value of some
property of a population from the observation of a sample drawn from the same
population.
1.2.4 Parameter
A parameter is a numerical value that explains the characteristics of a
population. A few common parameters used in testing a hypothesis are mean,
variance, standard deviation, proportion, etc. It is a key characteristic of a
population, providing a foundation for analysis, estimation, and decision-
making in research.
1.2.5 Sample statistic
Sample statistics are functions that describe a sample's characteristics; their
values are computed from the sample data. In statistics, a function or
mathematical formula that uses sample data to calculate an estimate of an
unknown parameter or quantity is known as an estimator.
1.2.11 Error
The decision to accept or reject the null hypothesis H₀ is made based on
information available from the observation of the sample. The conclusion
drawn is therefore not necessarily always true in respect of the population.
This leads to errors in decision-making. These errors are of two types, namely:
Type I Error – The error of rejecting the null hypothesis even though it is true.
The probability of a Type I error is denoted by α, which is also known as the
significance level.
Type II Error – The error of accepting the null hypothesis even though it is
false. The probability of a Type II error is denoted by β.
1.2.12 Power of a test
In hypothesis testing, the power of a test is the probability of avoiding a
Type II error, i.e. of correctly rejecting a false null hypothesis.
The power of a test is (1 − β), that is, the probability of rejecting the null
hypothesis when it is false, and can be calculated by following the procedures
outlined by Cohen (1988).
It ranges from 0 to 1, since it is a probability. Factors such as the
significance level, sample size, etc. affect the power. The closer the power of
a test is to 1, the better the test.
1.2.13 p-Value
A p-value is the probability of obtaining a sample outcome at least as extreme
as the one observed, given that the value stated in the null hypothesis is true.
It ranges from 0 to 1. Typically, the significance level is set at 5%. If the
p-value is small, we are more likely to reject the null hypothesis; if the
p-value is large, we are more likely to retain it. A p-value can be obtained
using test distribution tables.
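Instead of distribution tables, a p-value can also be computed in software. A minimal sketch using Python's standard library for a standard-normal (z) test statistic; the function name `two_sided_p_value` is ours, for illustration:

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p_value(1.96), 3))  # 0.05
```

For z = 1.96 this reproduces the familiar 5% threshold of the two-tailed normal test.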
1.2.14 Degrees of Freedom (df)
The number of independent items in a data sample randomly selected from the
population is known as the degrees of freedom. Usually, the degrees of freedom
are one less than the total number of items in the sample data. The degrees of
freedom are used to ensure that the sample data are statistically valid for
tests.
df = N − 1
ii) ALTERNATE HYPOTHESIS
The important reason for testing the null hypothesis is that we
suspect it is wrong. The acceptance or rejection of the null hypothesis
is meaningful to the decision-maker, hence the suspicion.
The alternate hypothesis is a statement about the population parameter
that states there is a significant difference between the population
parameter and the hypothesized value.
The alternate hypothesis is accepted when the null hypothesis is
rejected and is rejected when the null hypothesis is accepted.
⇒ H₁: µ>µ₀ or µ<µ₀ or µ≠µ₀
where H₁ is the alternate hypothesis, µ is the population mean, and µ₀
is the hypothesized value of the population mean.
1.4 Two-Tailed Test or Double–Tailed Test
A two-tailed test, in statistics, is a method in which the critical area of a
distribution is two-sided and tests whether a sample is greater than or less than a
certain range of values. If the sample being tested falls into either of the critical
areas, the alternative hypothesis is accepted instead of the null hypothesis. The
two-tailed test gets its name from testing the area under both tails of a normal
distribution.
CHAPTER 2
PARAMETRIC TESTS
2.1 Introduction
Parametric statistics is a branch of statistics that leverages models based on a
fixed (finite) set of parameters. A parametric test makes explicit assumptions
about the distribution of the population, such as normality, and draws
inferences about the parameters of that distribution. Parametric statistics was
mentioned by R. A. Fisher in his work Statistical Methods for Research Workers
in 1925, which created the foundation for modern statistics.
When choosing a test for our hypothesis, we need to know what type of
(outcome) data we have and the characteristics of the data. Parametric tests are
the most common statistical tests for continuous outcome variables. They are
often easier to calculate. They are named "parametric" because they are based
on Gaussian parameters, such as the mean and standard deviation.
When conducting hypothesis testing with parametric data, several assumptions
must be met to ensure the validity of statistical tests.
2.2 Assumptions
The following are the key assumptions while performing parametric tests:
2.2.1 Random Sampling
All statistical hypothesis tests assume that the sampling is random.
2.2.2 Distribution
It is important that the distribution of the population be known, or that the
sample size be large enough.
2.2.3 Normality
The data should be normally distributed, especially for smaller sample sizes.
This means that the distribution of the population from which the samples are
drawn should resemble a bell-shaped curve.
For larger sample sizes (typically n≥30), the Central Limit Theorem suggests
that the sampling distribution of the sample mean will be approximately normal,
even if the data itself is not.
2.2.4 Independence
The samples drawn from the populations should be independent of each other.
2.5 Methodology
A method of comparing a claim or statement observed regarding the population
with the help of a sample drawn from the same population is known as the
testing of a hypothesis. Statistical tests are used in hypothesis testing which can
be used to determine whether a predictor or estimator variable has a statistically
significant relationship with an outcome.
Steps involved in Testing of Hypothesis-
2.5.1 Formulate the null hypothesis
Depending upon the given or collected sample data, wisely choose the
test for further observation and inference. The test can be selected
based on the distribution of the sample data; the appropriate test
should be used for a better study of the design of the sample data or
its variation.
Perform the statistical test to compute the test statistic and calculate
the p-value. The p-value is calculated using the sampling distribution
of the test statistic, the sample data, and the type of test being performed.
The method of calculating the p-value differs from test to test.
2.5.5 Conclusion
2.6 Application
We have already discussed that hypothesis testing is a method used to decide
whether the null hypothesis is to be accepted or rejected. There are various
fields where testing of hypothesis is used. The following are some examples-
2.6.1 Evaluate the strength of a claim
It helps to determine whether the evidence supports a specific claim
or theory, ensuring that conclusions are based on empirical data rather
than assumptions. Hypothesis testing is a way of assessing the
strength of a claim or assumption before using it in a data set.
A company can use hypothesis testing to compare sales data from
customers who received free shipping offers and those who didn't.
Hypothesis testing can help businesses reduce the risk of costly
mistakes by basing business decisions on data, not hunches.
In manufacturing, hypothesis testing can help ensure that production
processes are within specified limits. In financial analysis, hypothesis
testing can help measure investment performance and identify
anomalies in financial data.
CHAPTER 3
SMALL SAMPLE TESTS
When the sample size is small (n < 30), the central limit theorem cannot
be relied upon to make the sampling distribution of the statistic
approximately normal. When the sample size is small, special probability
distributions are used to determine the critical value for the test statistic.
The two types of small sample tests are –
i) t-test
ii) F-test
3.1 t-test
The term "t-statistic" is abbreviated from "hypothesis test statistic".
The t-test is based on Student's t-distribution, which William Sealy
Gosset first published in English in 1908 in the scientific journal
Biometrika under the pen name "Student".
Although it was Gosset after whom the term "Student" is penned, it was
largely through the work of Ronald Fisher that the distribution became
well known as "Student's distribution" and "Student's t-test".
In a t-test, the degrees of freedom (df = n − 1) are the number of
values in the sample that are free to vary after estimating the sample
mean.
TYPES OF t-TESTS
1. One-Sample t-test
2. Two-sample t-test
3. Paired sample t-test
One-Sample t-test: We can use this when the sample size is small
(i.e. n < 30), the data is collected randomly, and it is approximately
normally distributed. The test statistic is calculated as:

t = (x̄ − μ) / (s / √n)

Where,
t = t-value
x̄ = sample mean
μ = population mean under the null hypothesis
s = sample standard deviation
n = sample size
and,

s = √[ (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)² ]
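The one-sample t formula can be sketched with Python's standard library; the sample values and hypothesized mean below are invented for illustration:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (x̄ − μ₀) / (s / √n), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Hypothetical small sample (n = 6 < 30) tested against μ₀ = 12.0
t = one_sample_t([12.1, 11.8, 12.4, 12.0, 11.9, 12.3], 12.0)
```

The resulting t-value would then be compared with the critical value from the t-table at n − 1 = 5 degrees of freedom.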
Two-Sample t-test: In this case, the degrees of freedom (df) = n₁ + n₂ − 2,
where n₁ and n₂ are the sample sizes of the two groups. This is because two
parameters (the two sample means) are estimated in a two-sample t-test.
Given two independent random samples xᵢ (i = 1, 2, 3, …, n₁) and
yᵢ (i = 1, 2, 3, …, n₂) of sizes n₁ and n₂, with means x̄ and ȳ and standard
deviations s₁ and s₂, drawn from normal populations with the same variance,
we have to test the hypothesis that the population means are the same.
The test statistic is given by:

t = (x̄ − ȳ) / [ s √(1/n₁ + 1/n₂) ]

where the pooled variance is

s² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² ] / (n₁ + n₂ − 2)

and

x̄ = (1/n₁) Σᵢ₌₁ⁿ¹ xᵢ and ȳ = (1/n₂) Σᵢ₌₁ⁿ² yᵢ
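Under the stated assumptions (independent samples, equal population variances), the pooled two-sample statistic can be sketched as:

```python
from math import sqrt
from statistics import mean, variance

def pooled_two_sample_t(x, y):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    # s² = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁ + n₂ − 2)
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

`statistics.variance` already uses the n − 1 divisor, matching s₁² and s₂² in the formula.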
Paired t-test: This test compares two measurements to determine if they are
significantly different. Paired measurements can be taken from the same
individual, object, or related units at different times or under various
conditions. If the groups come from a single population (e.g., measuring
before and after an experimental treatment), perform a paired t-test.
The two samples are of the same size, say equal to n, and the data are
paired: (xᵢ, yᵢ) corresponds to the same i-th sample unit. The problem is to
test if the sample means differ significantly or not. Here we take the null
hypothesis as: there is no significant difference between the sample means
over time. In the paired t-test, the degrees of freedom (df) = n − 1.
The test statistic is given by:

t = d̄ / (s / √n)

Where,

d̄ = (1/n) Σᵢ₌₁ⁿ dᵢ,  with dᵢ = xᵢ − yᵢ (i = 1, 2, 3, …, n)

and

s = √[ (1/(n−1)) Σᵢ₌₁ⁿ (dᵢ − d̄)² ]
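A minimal sketch of the paired statistic, with invented before/after measurements:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t: t = d̄ / (s / √n) with dᵢ = xᵢ − yᵢ."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical before/after scores for the same 4 units
t = paired_t([5, 6, 7, 8], [4, 6, 6, 7])  # t = 3.0 here
```

Here df = n − 1 = 3, so t would be compared with the t-table at 3 degrees of freedom.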
3.2 F – Test
The F-test is a statistical test used in hypothesis testing to determine
whether or not the variances of two populations or two samples are equal.
The F-statistic is a ratio of two variances; variance is a measure of how
spread out the data are from the mean.
The history of the F-test involves the work of two statisticians:
Sir Ronald Fisher and George W. Snedecor.
In the 1920s, Fisher developed the F-statistic as a variance ratio. He
also provided the form of the F-distribution. In 1934, Snedecor
tabulated the F-distribution and named the test statistic "F" in
honor of Fisher. Snedecor also coined the name "F-test".
F-Distribution: The F-distribution was developed by Fisher to study
the behaviour of the ratio of two variances from random samples taken
from two independent normal populations. In applied problems we may be
interested in knowing whether the population variances are equal,
based on the response of the random samples. The F-distribution
or F-ratio, also known as Snedecor's F distribution or
the Fisher–Snedecor distribution (after Ronald Fisher and George
W. Snedecor), is a continuous probability distribution that arises
frequently as the null distribution of a test statistic.
Several assumptions are used in the F-test equation. For the F-test
formula to be utilized, the population distribution needs to be normal,
and the test samples should be independent events. Apart from this, the
following points should also be kept in mind:
i) It is simpler to calculate right-tailed tests. By putting the
bigger variance in the numerator, the test is forced to be right-tailed.
ii) In two-tailed tests, alpha is divided by two before the critical
value is determined.
iii) Variances are the squares of the standard deviations.
In this case, the degrees of freedom are (n₁ − 1) and (n₂ − 1), where n₁ and
n₂ are the sample sizes of the two groups. The shape of the F-distribution
is determined by its degrees of freedom. It is a right-skewed distribution,
meaning it has a longer tail on the right side. As the degrees of freedom
increase, the F-distribution becomes more symmetric and approaches a bell
shape.
The test statistic is

F = S₁² / S₂²

where,

S₁² = (1/(n₁−1)) Σᵢ₌₁ⁿ¹ (xᵢ − x̄)²  and  S₂² = (1/(n₂−1)) Σᵢ₌₁ⁿ² (yᵢ − ȳ)²
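The ratio of sample variances can be sketched directly; following the right-tailed convention noted above, the larger variance goes in the numerator:

```python
from statistics import variance

def f_statistic(x, y):
    """F = S₁²/S₂² with the larger sample variance in the numerator."""
    v1, v2 = variance(x), variance(y)
    if v1 < v2:
        v1, v2 = v2, v1  # force a right-tailed test
    return v1 / v2
```

The result is then compared with the F-table value at the corresponding (n₁ − 1, n₂ − 1) degrees of freedom.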
3.3 ANOVA
3.3.1 Introduction
We have seen that the test of significance of the difference of two
means tells us whether two samples differ significantly with respect to
some property or not. In actual practice, however, it often happens that
more than two samples are involved. For example:
In an agricultural experiment, 4 different chemical treatments of soil
A, B, C, and D produce mean wheat yields of 22, 24, 18, and 24
bushels per acre respectively. If we want to test whether there is a
significant difference in these means or whether it is due to chance, we
cannot use a single t-test. One way of using t-tests is to form pairs and
test them separately: AB, AC, AD, BC, BD, and CD.
The conclusions are also drawn separately. In other words, a t-test would
be applied 6 times and still no joint, overall test would be available.
The t-test is not suitable in this case because we want a test that
provides inference for all 4 samples at once. Such problems can be solved
by using an important technique, ANALYSIS OF VARIANCE.
As the name indicates, the method consists of analysing the variance of
the sample into useful components. We know that variability may arise
from a large number of causes, and the total variation in the data may be
the sum of many small deviations produced by these factors and causes
forming a homogeneous system. The variation in the data may also arise
due to other random causes, such as lack of homogeneity of some raw
material, error, chance, or fluctuation.
ANALYSIS OF VARIANCE is a method to estimate the contribution
made by each factor to the total variation.
3.3.2 History
In the 1770s, Laplace was performing hypothesis testing. Around
1800, Laplace and Gauss developed the least-squares method for
combining observations. Ronald Fisher introduced the term
“variance” and proposed its formal analysis in 1918. Fisher’s first
application of ANOVA to data analysis was published in 1921 in
Studies in Crop Variation I. Later came Studies in Crop
Variation II, written with Winifred Mackenzie and
published in 1923, which studied the variation in yield across plots sown
with different varieties and fertilizer treatments. Analysis of
variance became widely known after being included in Fisher’s 1925
book “Statistical Methods for Research Workers”. Since ANOVA was
developed by Fisher, it is also known as Fisher’s analysis of variance. It
uses the F-distribution to test two or more sample variances.
3.3.3 Assumptions
When the ANOVA technique is used the following assumptions
should be met –
i) The total variance of various sources of variance should be
additive i.e. contribution made by different factors or sources is
additive.
ii) The individuals in various subgroups should be selected based
on random sampling from a normally distributed population.
iii) The variance of the subgroups/samples should be homogeneous:
σ₁² = σ₂² = σ₃² = … = σₙ²
iv) The errors attached to each observation are independently and
normally distributed with mean = 0 and variance = σ².
v) The observations are independent and are distributed about
the true unknown mean.
vi) There should be at least two observations in each subgroup
otherwise ANOVA cannot be applied.
3.3.4 One – Way ANOVA
Let us assume that n random observations are classified into k
different classes or groups such that the i-th class contains nᵢ
observations (i = 1, 2, 3, …, k). We shall assume that all the observations
are independent and that the distribution from which the observations are
taken is normal with mean μᵢ and variance σ².
The mean μᵢ may be different for each sample, but the variance σ² remains
the same for the different groups or samples.
Consider the following arrangement of the n observations in k classes:
class Aᵢ contains the observations yᵢ₁, yᵢ₂, …, y_{i nᵢ}, with

Σᵢ₌₁ᵏ nᵢ = n
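The decomposition of the total variation into between-class and within-class components can be sketched as follows; the two small groups used in the usage example are invented:

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way ANOVA F = between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f = one_way_anova_f([[22, 21, 23], [24, 25, 23], [18, 17, 19]])
```

The statistic is then compared with the F-table value at (k − 1, n − k) degrees of freedom.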
CHAPTER 4
LARGE SAMPLE TEST
Large sample tests are used when the sample size is greater than 30
(n > 30). The central limit theorem states that the sampling distribution
of the mean becomes more nearly normal as the sample size increases. This
means that for large sample sizes, the sampling distributions of statistics
are approximately normal.
The different types of large sample tests –
i) Z – Test
ii) Welch’s F - Test
4.1 Z – TEST
A z-test is a statistical test used to determine whether two population
means are different when the variances are known and the sample size
is large. It can also be used to compare one mean to a hypothesized
value. It is commonly used when the sample size is large (typically
n > 30).
The Z-test gained more widespread use as statistical theory matured,
especially as statisticians emphasized the importance of standardizing
data and making inferential claims using the standard normal
distribution. The work on central limit theorem played a critical role
in justifying the use of the Z-test for large samples. This theorem
states that the distribution of the sample mean approaches a normal
distribution as the sample size becomes larger, regardless of the
population distribution, as long as the variance is finite. It is
commonly used in fields such as economics, medicine, psychology,
and social sciences to compare sample data to population means or
compare the means of two large independent samples.
4.1.2 ASSUMPTIONS
When the Z-test is used, the following assumptions should be met –
The Z-test is typically used when the sample size is large,
generally considered n>30. This is because the central limit
theorem states that for sufficiently large sample sizes, the
distribution of the sample mean approaches a normal distribution
regardless of the shape of the population distribution.
The population standard deviation (σ) should be known for
the Z-test to be valid. If the population variance is unknown but
the sample size is large, the sample standard deviation can
sometimes be used as an approximation.
For smaller sample sizes, the underlying data should be
approximately normally distributed. However, this is less critical
for large samples due to the central limit theorem.
The sample should be randomly drawn from the population.
The data should be measured on an interval or ratio scale, where
the differences between values are meaningful (e.g., height,
weight, or temperature).
4.1.3 TYPES OF Z – TESTS
Following are the various types of Z-tests used in statistical
hypothesis testing-
i) One Sample Z – Test
ii) Two Sample Z – Test
iii) Z – Test for Proportions
4.1.4 ONE SAMPLE Z – TEST
A one-sample z-test is used to test whether the mean of a population
is less than, greater than, or equal to some specific value. The Z-test
can be left-tailed, right-tailed, or two-tailed.
In a left-tailed test, the region of rejection is located at the extreme
left of the distribution. Here the null hypothesis is that the population
mean is greater than or equal to the claimed value.
In a right-tailed test, the region of rejection is located at the extreme
right of the distribution. Here the null hypothesis is that the population
mean is less than or equal to the claimed value.
The test statistic is

z = (x̄ − μ₀) / (σ / √n)

where x̄ is the sample mean, μ₀ is the hypothesized population mean,
σ is the population standard deviation, and n is the sample size.
Suppose we want to test H₀: μ = μ₀
Against,
H_A: μ > μ₀, μ < μ₀, or μ ≠ μ₀
1) If z > z_α, H₀ is to be rejected.
2) If z < −z_α, H₀ is to be rejected.
3) If |z| > z_{α/2}, H₀ is to be rejected.
For two independent samples,

x̄₁ ∼ N(μ₁, σ₁²/n₁) and x̄₂ ∼ N(μ₂, σ₂²/n₂)

so that

x̄₁ − x̄₂ ∼ N(μ₁ − μ₂, σ₁²/n₁ + σ₂²/n₂)

Where,

E(x̄₁ − x̄₂) = E(x̄₁) − E(x̄₂) = μ₁ − μ₂

And,

V(x̄₁ − x̄₂) = V(x̄₁) + V(x̄₂) − 2 cov(x̄₁, x̄₂) = σ₁²/n₁ + σ₂²/n₂ − 0

since the samples are independent. The test statistic is therefore

z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)

Where,
x̄₁, x̄₂ = means of the two samples
Result: If the calculated Z-test statistic falls within the critical region
(exceeds the critical value), reject the null hypothesis. Otherwise, fail
to reject the null hypothesis.
And,

V(p₁) = P₁Q₁/n₁ and V(p₂) = P₂Q₂/n₂

Since, for large samples, p₁ and p₂ are normally distributed, (p₁ − p₂) is
also normally distributed.
Thus, the standard variable corresponding to (p₁ − p₂) is given by:

z = [(p₁ − p₂) − E(p₁ − p₂)] / √V(p₁ − p₂)

Now, under the null hypothesis H₀: P₁ = P₂,

E(p₁ − p₂) = P₁ − P₂ = 0

and

V(p₁ − p₂) = V(p₁) + V(p₂) = P₁Q₁/n₁ + P₂Q₂/n₂ = PQ(1/n₁ + 1/n₂),

since under H₀: P₁ = P₂ = P. Therefore,

z = (p₁ − p₂) / √[PQ(1/n₁ + 1/n₂)]

Against the alternatives P₁ > P₂, P₁ < P₂, or P₁ ≠ P₂:
If z > z_α, H₀ is to be rejected at the α% level of significance.
If z < −z_α, H₀ is to be rejected at the α% level of significance.
If |z| > z_{α/2}, H₀ is to be rejected at the α% level of significance.
Where,
p₁ = X₁/n₁ and p₂ = X₂/n₂
Heterogeneous Variances: Unlike ANOVA, Welch’s F-test does not
assume that the variances of the groups are equal.
4.2.3 METHODOLOGY
Welch's F-test uses a more complex calculation for the degrees of
freedom to account for unequal variances across groups. The degrees
of freedom are calculated using the Welch–Satterthwaite equation,
which adjusts based on the variances and sample sizes of each group.
The degrees of freedom (df) for the F-statistic in Welch’s test are
adjusted using the group variances and sample sizes. For two groups with
sample variances s₁², s₂² and sizes n₁, n₂, the Welch–Satterthwaite
equation gives

df = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]

For k groups, the error degrees of freedom are obtained analogously from
the variances sᵢ² and sizes nᵢ of all the groups,
where,
k : Number of groups.
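For the two-group case, the Welch–Satterthwaite adjustment can be sketched as:

```python
from statistics import variance

def welch_df(x, y):
    """Welch–Satterthwaite degrees of freedom for two independent samples."""
    v1 = variance(x) / len(x)  # s₁²/n₁
    v2 = variance(y) / len(y)  # s₂²/n₂
    return (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
```

When the two groups have equal sizes and equal sample variances, this reduces to n₁ + n₂ − 2, the ordinary pooled degrees of freedom.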
CHAPTER 5
NON-PARAMETRIC TESTS
5.1 Introduction
A non-parametric test is a type of statistical test that does not require
the data to follow a specific distribution (e.g., a normal distribution).
These tests are particularly useful when you cannot assume or do not
know the distribution of the population from which your sample is
drawn. They are also applied when dealing with ordinal data or when
the sample size is too small to satisfy the assumptions of parametric
tests.
Nonparametric tests serve as an alternative to parametric tests such as
the t-test or ANOVA, which can be employed only if the underlying data
satisfy certain criteria and assumptions. Note that nonparametric
tests are used as an alternative method to parametric tests, not as their
substitutes. In other words, if the data meet the required assumptions
for performing the parametric tests, the relevant parametric test must
be applied.
5.3 Importance
In order to achieve the correct results from the statistical analysis, we
should know the situations in which the application of nonparametric
tests is appropriate. The main reasons to apply the nonparametric test
include the following:
1. The underlying data do not meet the assumptions about the
population sample
Generally, the application of parametric tests requires various
assumptions to be satisfied. For example, the data follows a normal
distribution and the population variance is homogeneous. However,
some data samples may show skewed distributions.
The skewness makes the parametric tests less powerful because the
mean is no longer the best measure of central tendency; it is
strongly affected by extreme values. Nonparametric tests, on the other
hand, work well with skewed distributions and with distributions that
are better represented by the median.
2. The population sample size is too small
The sample size is an important assumption in selecting the
appropriate statistical method. If a sample size is reasonably large, the
applicable parametric test can be used. However, if a sample size is
too small, it is possible that you may not be able to validate the
distribution of the data. Thus, the application of nonparametric tests is
the only suitable option.
3. The analyzed data is ordinal or nominal
Unlike parametric tests that can work only with continuous data,
nonparametric tests can be applied to other data types such as ordinal
or nominal data. For such types of variables, the nonparametric tests
are the only appropriate solution.