Statistical concepts
An experiment is a process of taking a series of trials or observations under conditions specified
by the experimenter, in order to confirm something doubtful, to discover unknown principles or
effects, or to test some suggested known truth. Examining the truth of a statistical hypothesis
relating to some research problem is known as an experiment. For example, an experiment may
examine the usefulness of a newly developed drug.
There are two types of experiment:
1. An absolute experiment
2. A comparative experiment
An Absolute Experiment:
An absolute experiment is one in which the absolute value of some characteristic is determined.
Example: If we want to determine the impact of a fertilizer on the yield of a crop, it is a case of
an absolute experiment (Design of Sample Survey).
A Comparative Experiment:
A comparative experiment is one in which two or more varieties or treatments are compared to
assess the significance of the differences among them. Example: if we want to determine the
impact of one fertilizer compared with the impact of another fertilizer, the experiment is
termed a comparative experiment (Design of Experiment).
Treatment: Various objects of comparison in a comparative experiment are called treatments. For
example, in an agricultural experiment, different fertilizers or different varieties of crop are
treatments. In a medical experiment, different doses of a medicine or different diets are the treatments.
Experimental unit: The smallest subdivision of the experimental material to which the treatments
are applied and on which the variable under study is measured is called an experimental unit. Thus
in an agricultural experiment the plot of land on which the treatment is applied is an experimental
unit. In human experiments in which the treatment affects the individual, the individual will be the
experimental unit.
Yield:
The quantity measured on an experimental unit after a treatment has been applied to it is called
the yield or response. Examples:
The production of paddy from different agricultural plots of the same size.
The gain in weight of an experimental animal in a biological experiment.
The IQ of a student in a psychological experiment.
Blocks:
A block is a group of experimental units that are internally homogeneous with respect to the
characteristics affecting the yields and externally heterogeneous. Examples:
In an agricultural experiment, a block refers to a group of adjacent plots having uniform
fertility.
In animal feeding experiments, a block is a group of animals of the same age, weight, sex, breed
or litter.
In a medical experiment, a block is a group of patients with the same symptoms, in the same age
group, of the same sex, etc.
Randomization:
Professor R.A. Fisher introduced the principle of randomization in modern experimental design.
Randomization is the process of allotting the treatments to the experimental units purely by a
chance mechanism, in such a way that any experimental unit is equally likely to receive any
treatment. It implies that every possible allotment of treatments has the same probability. The
purpose of randomization is to remove bias and the effect of other, uncontrollable sources of
extraneous variation. It is the basis of any valid statistical test; therefore, the treatments must
be assigned randomly to the experimental units.
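A minimal Python sketch of random allotment, assuming equal replication of every treatment and using the standard library's `random` module in place of a printed table of random numbers. The function name `randomize` and the seed are illustrative only.

```python
import random

def randomize(treatments, n_units, seed=None):
    """Randomly allot treatments to units numbered 1..n_units.

    Assumes n_units is a multiple of len(treatments), so that every
    treatment receives the same number of replications r.
    """
    if n_units % len(treatments) != 0:
        raise ValueError("n_units must be a multiple of the number of treatments")
    r = n_units // len(treatments)
    labels = [t for t in treatments for _ in range(r)]  # each treatment r times
    rng = random.Random(seed)
    rng.shuffle(labels)  # uniform shuffle: every distinct allotment is equally likely
    return {unit: label for unit, label in enumerate(labels, start=1)}

allotment = randomize(["A", "B", "C", "D"], 20, seed=1)
print(allotment)
```

Because the shuffle is uniform, each unit is equally likely to receive any treatment, which is the property the definition requires.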
Local Control: Randomization and Replication do not remove all extraneous sources of variation.
A more refined experimental technique is required for that. A design should be chosen such that
all the extraneous sources of variation come under control. For this purpose, local control, which
refers to the amount of balancing, blocking and grouping of the experimental units, is used.
Balancing implies that the treatments should be assigned to the experimental units such that the
result is a balanced arrangement of treatments. Blocking means that, similar experimental units
should be collected together to form a relatively homogeneous group. The main purpose of local
control is to increase the efficiency of an experimental design by minimizing the experimental
error. Local control should not be confused with a control. In experimental design, a control is a
treatment group that does not receive any treatment; the effectiveness of the other treatments is
judged by comparison with it.
Design of Experiment:
Design of experiment is the plan used in experimentation. More specifically, design of experiment
is the formulation of a set of rules and principles according to which an experiment is to be
conducted to collect appropriate data whose analysis will lead to valid inferences for the problem
under investigation.
Let $D_1$ and $D_2$ be two designs with error variances $\sigma_1^2$ and $\sigma_2^2$ and replications $r_1$ and $r_2$ respectively.
Where, $y_{ij}$ is the yield corresponding to the $j$th plot where the $i$th treatment is used.
Assumptions
ii) $\sum_{i} \alpha_i = 0$ and $\sum_{j} \beta_j = 0$
Example:
If the levels of irrigation and the levels of fertilizer are selected at random, the model is known
as a random effect model:
$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij};\quad i = 1, 2, \dots, p;\ j = 1, 2, \dots, q$$
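Where, $y_{ij}$ is the yield corresponding to the $j$th plot where the $i$th treatment is used.
Assumptions
i) $\mu$ is an unknown parameter.
ii) $\alpha_i \sim NID(0, \sigma_\alpha^2)$.
iii) $\beta_j \sim NID(0, \sigma_\beta^2)$.
iv) $e_{ij} \sim NID(0, \sigma_e^2)$.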
Where, $y_{ij}$ is the yield corresponding to the $j$th plot where the $i$th treatment is used.
Assumptions
ii) $\sum_{i} \alpha_i = 0$
Where,
$y_{ij}$ is the yield corresponding to the $j$th plot where the $i$th treatment is used,
$\mu$ is the general mean effect,
$\alpha_i$ is the effect due to the $i$th level of the factor A,
$e_{ij}$ is the random error component.
Where, $y_{ij}$ is the $j$th observation due to the $i$th class.
One-Way Classification:
For analysis of data by the ANOVA technique, the arrangement of observations into various classes
on the basis of a single factor or criterion is called a one-way classification.
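Basic designs: There are three basic designs used in the design of experiments: the completely randomized design (CRD), the randomized block design (RBD) and the Latin square design (LSD).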
Completely Randomized Design (CRD) is the basic single factor design. In this design the
treatments are assigned completely at random so that each experimental unit has the same chance
of receiving any one treatment. However, CRD is appropriate only when the experimental material
is homogeneous. As there is generally large variation among experimental plots due to many
factors, CRD is not preferred in field experiments. In laboratory experiments and greenhouse
studies it is easy to achieve homogeneity of the experimental material, and therefore CRD is most
useful in such experiments.
Layout
The step-by-step procedure for randomization and layout of a CRD is given here for a pot
culture experiment with four treatments A, B, C and D, each replicated five times.
Step 1. Determine the total number of experimental plots (n) as the product of the number of
treatments (t) and the number of replications (r); that is, n = rt. For our example, n = 5 × 4 = 20.
Here, one pot with a single plant in it may be called a plot. If the number of replications is not
the same for all the treatments, the total number of experimental pots is obtained as the sum of
the replications of the treatments, i.e., $n = \sum_{i=1}^{t} r_i$, where $r_i$ is the number of times
the $i$th treatment is replicated.
Step 2. Assign a plot number to each experimental plot in any convenient manner; for example,
consecutively from 1 to n.
Step 3. Assign the treatments to the experimental plots randomly using a table of random
numbers as follows. Locate a starting point in a table of random numbers by closing your eyes
and pointing a finger to any position in a page. For our example, the starting point is taken at the
intersection of the sixth row and the twelfth (single) column of two-digit numbers. Using the
starting point obtained, read downward vertically to obtain n = 20 distinct two-digit random
numbers. For our example, starting at the intersection of the sixth row and the twelfth column,
the 20 distinct two-digit random numbers are as shown here together with their corresponding
sequence of appearance.
Random number : 37, 80, 76, 02, 65, 27, 54, 77, 48, 73,
Sequence : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
Random number : 86, 30, 67, 05, 50, 31, 04, 18, 41, 89
Sequence : 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
Step 4. Rank the n random numbers obtained in ascending or descending order. For our example,
the 20 random numbers are ranked from the smallest to the largest, as shown in the following:
Random Number   Sequence   Rank   Random Number   Sequence   Rank
37              1          8      86              11         19
80              2          18     30              12         6
76              3          16     67              13         14
02              4          1      05              14         3
65              5          13     50              15         11
27              6          5      31              16         7
54              7          12     04              17         2
77              8          17     18              18         4
48              9          10     41              19         9
73              10         15     89              20         20
Step 5. Divide the n ranks derived into t groups, each consisting of r numbers, according to the
sequence in which the random numbers appeared. For our example, the 20 ranks are divided into
four groups, each consisting of five numbers; taken in order of appearance, the ranks are allotted
to groups 1, 2, 3, 4, 1, 2, ... in turn, as follows:
Group Number   Ranks in the Group
1              8, 13, 10, 14, 2
2              18, 5, 15, 3, 4
3              16, 12, 19, 11, 9
4              1, 17, 6, 7, 20
Step 6. Assign the t treatments to the n experimental plots, by using the group number as the
treatment number and the corresponding ranks in each group as the plot number in which the
corresponding treatment is to be assigned. For our example, the first group is assigned to
treatment A and plots numbered 8, 13, 10, 14 and 2 are assigned to receive this treatment; the
second group is assigned to treatment B with plots numbered 18, 5, 15, 3 and 4; the third group is
assigned to treatment C with plots numbered 16, 12, 19, 11 and 9; and the fourth group to
treatment D with plots numbered 1, 17, 6, 7 and 20. The final layout of the experiment is shown
below.
Plot no.:    1    2    3    4
Treatment:   D    A    B    B
Plot no.:    5    6    7    8
Treatment:   B    D    D    A
Plot no.:    9    10   11   12
Treatment:   C    A    C    C
Plot no.:    13   14   15   16
Treatment:   A    A    B    C
Plot no.:    17   18   19   20
Treatment:   D    B    C    D
Figure 1.1. A sample layout of a completely randomized design with four treatments (A, B, C
and D) each replicated five times.
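Steps 4 to 6 can be checked mechanically with a short Python sketch that reproduces Figure 1.1 from the 20 random numbers of Step 3. The Step 5 rule used here (ranks dealt to groups 1, 2, 3, 4, 1, 2, ... in order of appearance) is an assumption read off the group table of the worked example; the helper `crd_layout` is hypothetical.

```python
def crd_layout(random_numbers, treatments):
    """Steps 4-6 of the CRD layout procedure.

    random_numbers: distinct random numbers in their order of appearance.
    treatments: treatment labels; len(random_numbers) must be a multiple of it.
    Returns a dict mapping plot number -> treatment label.
    """
    n, t = len(random_numbers), len(treatments)
    if len(set(random_numbers)) != n or n % t != 0:
        raise ValueError("need n distinct numbers with n divisible by t")
    # Step 4: rank from smallest (rank 1) to largest (rank n).
    order = sorted(range(n), key=lambda k: random_numbers[k])
    rank = [0] * n
    for r, k in enumerate(order, start=1):
        rank[k] = r
    # Step 5: deal the ranks, in sequence of appearance, to groups in turn.
    groups = [rank[g::t] for g in range(t)]
    # Step 6: the ranks in group g are the plots that receive treatment g.
    layout = {}
    for label, plots in zip(treatments, groups):
        for plot in plots:
            layout[plot] = label
    return dict(sorted(layout.items()))

numbers = [37, 80, 76, 2, 65, 27, 54, 77, 48, 73,
           86, 30, 67, 5, 50, 31, 4, 18, 41, 89]
layout = crd_layout(numbers, ["A", "B", "C", "D"])
for start in range(1, 21, 4):
    print("  ".join(f"{p:2d} {layout[p]}" for p in range(start, start + 4)))
```

The five printed rows give plots 1 to 20 with treatments D A B B, B D D A, C A C C, A A B C and D B C D, matching Figure 1.1.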
Advantages of CRD:
i) The number of replications can vary from treatment to treatment, i.e., there is complete
flexibility regarding replication.
ii) Analysis of data is quite simple and straightforward even if different treatments have
unequal numbers of replications.
iii) The analysis remains easy even if some observations are missing; in fact, the property of
orthogonality is not lost by missing values in a CRD.
iv) It is suitable in situations where a large fraction of experimental units may not
respond or may be lost in the course of the experiment.
v) It is advantageous for small experiments because it furnishes the maximum number of error
degrees of freedom.
Disadvantages of CRD:
i) CRD is a relatively inefficient design, as the local control method is not adopted to reduce
error variation in this design.
ii) It is seldom used in field experiments because homogeneous units over the whole
experimental area are rarely available in practice.
iii) Due to the inflated error variation, there is a greater chance of wrongly accepting the null
hypothesis.
Stating the underlying assumptions, discuss the method of analysis of data of a Completely
Randomized Design.
4. $e_{ij} \sim NID(0, \sigma^2)$.
5. The model considered is a fixed effect model.
Layout of One-Way Classified Data:
Data of a Completely Randomized Design with k treatments can be presented in tabular form as
shown below:
                             Treatment
Replication    A1      A2      …    Ai      …    Ak
1              y11     y21     …    yi1     …    yk1
2              y12     y22     …    yi2     …    yk2
⋮
j              y1j     y2j     …    yij     …    ykj
⋮
ni             y1n1    y2n2    …    yini    …    yknk
Total          y1.     y2.     …    yi.     …    yk.
Mean           ȳ1.     ȳ2.     …    ȳi.     …    ȳk.
Where $n_1 + n_2 + \dots + n_k = N$ = total number of observations,
$G = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}$ = grand total,
$\bar{y}_{i.} = \dfrac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$ = the mean value of the $i$th treatment,
$\bar{y}_{..} = \dfrac{G}{N}$ = grand mean.
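The cross-product term vanishes because deviations from a treatment mean sum to zero:
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{y}_{i.}-\bar{y}_{..})(y_{ij}-\bar{y}_{i.}) = \sum_{i=1}^{k}(\bar{y}_{i.}-\bar{y}_{..})\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_{i.}) = 0,$$
so the total sum of squares splits as
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_{..})^2 = \sum_{i=1}^{k} n_i(\bar{y}_{i.}-\bar{y}_{..})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_{i.})^2,$$
i.e., Total SS = Treatment SS + Error SS.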
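The partition above can be verified numerically. This is a sketch with made-up yields for three treatments with unequal replication; the function name `one_way_anova` is hypothetical, and the F ratio it returns would be compared with tabulated F at (k-1, N-k) d.f.

```python
def one_way_anova(groups):
    """Split the total SS of one-way classified (CRD) data into treatment SS
    and error SS; groups[i] holds the n_i observations of treatment i."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    G = sum(sum(g) for g in groups)                  # grand total
    grand_mean = G / N                               # y-bar ..
    means = [sum(g) / len(g) for g in groups]        # y-bar i.
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_t, df_e = k - 1, N - k
    f_ratio = (sst / df_t) / (sse / df_e)
    return {"TSS": tss, "SST": sst, "SSE": sse,
            "df_t": df_t, "df_e": df_e, "F": f_ratio}

# Hypothetical yields: k = 3 treatments with 3, 4 and 3 replications.
data = [[12, 15, 14], [18, 20, 19, 21], [10, 11, 13]]
res = one_way_anova(data)
print(res)
```

Here TSS = 140.1 splits into SST of about 125.77 and SSE of about 14.33, with 2 and 7 d.f.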
A randomized block design (RBD) is a design in which the whole set of experimental units is
arranged in several blocks that are internally homogeneous and externally heterogeneous, and
then the selected treatments are randomly allocated to the experimental units within each block
such that each treatment occurs once, or the same number of times, in each block.
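The allocation rule can be sketched in Python: the treatments are shuffled afresh within every block, so each treatment occurs exactly once per block. This assumes one plot per treatment in each block; the function name `rbd_layout` and the seed are illustrative.

```python
import random

def rbd_layout(treatments, n_blocks, seed=None):
    """Randomize treatments independently within each block.

    Returns a list of blocks; each block lists the treatment labels
    in plot order within that block.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # a fresh, independent randomization in every block
        layout.append(block)
    return layout

plan = rbd_layout(["A", "B", "C", "D"], n_blocks=3, seed=7)
for b, block in enumerate(plan, start=1):
    print(f"Block {b}: {' '.join(block)}")
```

Unlike the CRD sketch, randomization is restricted to within blocks, which is what lets block differences be removed from the error later.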
Advantages of RBD:
i) RBD is more efficient than CRD and thus provides more accurate and precise results
than CRD.
ii) Any number of blocks and any number of treatments can be used in RBD, except for the
restriction that at least two replicates are needed to carry out the test of significance.
Disadvantages of RBD:
i) RBD is not suitable for a large number of treatments, as large error variation may arise
in such cases, and when the blocks are internally heterogeneous.
ii) Since RBD controls variability due to only one extraneous factor, it is unsatisfactory when
several extraneous factors exist among the experimental units.
Uses of RBD:
i) RBD removes one extraneous source of variation from the experimental error and so
increases precision.
ii) Its use is found to be satisfactory in many experimental situations, and thus it avoids
the necessity of using more complex designs.
iii) RBD provides unbiased estimates of block means in addition to that of treatment
means and thus furnishes additional information from the experiment. It is not
necessary that all blocks be conducted at the same location or at same time.
Reasons for blocking:
i) One major reason for use of blocks is to make inferences over a large number of
environmental conditions.
ii) Another major reason is to reduce error variation by removing an unwanted source of
variation from error variation.
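Discuss the method of analysis of data of an R.B.D., stating the necessary assumptions. / Set up a
linear model to analyse the data obtained from an R.B.D. with one observation per cell.
The linear model in RBD for one observation per experimental unit is given by
$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij};\quad i = 1, 2, \dots, p;\ j = 1, 2, \dots, q$$
Where, $y_{ij}$ is the observation on the $i$th treatment in the $j$th block, $\mu$ is the general
mean effect, $\alpha_i$ is the effect of the $i$th treatment, $\beta_j$ is the effect of the $j$th block
and $e_{ij}$ is the random error component.
Assumption:
2. There is no interaction between blocks and treatments, so that they are independent.
3. Treatment effects and block effects are additive in nature, with $\sum_{i=1}^{p}\alpha_i = 0$ and
$\sum_{j=1}^{q}\beta_j = 0$.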
5. $e_{ij} \sim NID(0, \sigma^2)$.
Layout of Data:
                                 Block
Treatment   B1     B2     …   Bj     …   Bq     Marginal Total   Mean
A1          y11    y12    …   y1j    …   y1q    y1.              ȳ1.
A2          y21    y22    …   y2j    …   y2q    y2.              ȳ2.
⋮
Ai          yi1    yi2    …   yij    …   yiq    yi.              ȳi.
⋮
Ap          yp1    yp2    …   ypj    …   ypq    yp.              ȳp.
Marginal
Total       y.1    y.2    …   y.j    …   y.q    G = y..
Mean        ȳ.1    ȳ.2    …   ȳ.j    …   ȳ.q                     ȳ..
$\bar{y}_{i.} = \dfrac{1}{q}\sum_{j=1}^{q} y_{ij}$ = mean value corresponding to the $i$th treatment,
$\bar{y}_{.j} = \dfrac{1}{p}\sum_{i=1}^{p} y_{ij}$ = mean value corresponding to the $j$th block,
$G = \sum_{i=1}^{p}\sum_{j=1}^{q} y_{ij} = \sum_{i=1}^{p} y_{i.} = \sum_{j=1}^{q} y_{.j}$ = grand total.
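To show how the totals $y_{i.}$, $y_{.j}$ and $G$ defined above feed the analysis, here is a sketch of the usual two-way ANOVA for an RBD with one observation per cell. It assumes the standard correction-factor formulas ($CF = G^2/pq$), which these notes do not derive here; the yields and the function name `rbd_anova` are made up for illustration.

```python
def rbd_anova(y):
    """Two-way ANOVA for an RBD with one observation per cell.

    y[i][j] is the yield of treatment i (p rows) in block j (q columns).
    """
    p, q = len(y), len(y[0])
    G = sum(sum(row) for row in y)                          # grand total
    cf = G ** 2 / (p * q)                                   # correction factor
    tss = sum(v ** 2 for row in y for v in row) - cf
    treat_totals = [sum(row) for row in y]                  # y_i.
    block_totals = [sum(y[i][j] for i in range(p)) for j in range(q)]  # y_.j
    sst = sum(t ** 2 for t in treat_totals) / q - cf        # treatment SS
    ssb = sum(b ** 2 for b in block_totals) / p - cf        # block SS
    sse = tss - sst - ssb                                   # error SS by subtraction
    df_t, df_b, df_e = p - 1, q - 1, (p - 1) * (q - 1)
    mse = sse / df_e
    return {"TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
            "df": (df_t, df_b, df_e),
            "F_treat": (sst / df_t) / mse, "F_block": (ssb / df_b) / mse}

# Hypothetical yields: p = 4 treatments in q = 3 blocks.
yields = [[10, 12, 11], [14, 15, 16], [9, 11, 10], [13, 12, 14]]
res = rbd_anova(yields)
print(res)
```

For these numbers TSS = 52.25 splits into SST = 44.25, SSB = 3.5 and SSE = 4.5 on 3, 2 and 6 d.f.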
Latin square design (LSD):
Latin square design is a design in which experimental units are arranged in complete blocks in
two different ways, called rows and columns and then the selected treatments are randomly
allocated to the experimental units within each row and column such that each treatment appears
exactly once in each row and once in each column. Since this design is a square arrangement in
which the treatments are denoted by Latin letters, it is named the Latin square design.
The term Latin square was first used in the analysis of variance context by R. A. Fisher, who
borrowed it from the Swiss mathematician Leonhard Euler (1707-1783).
In general, an $r \times r$ Latin square is an arrangement of r letters in r rows and r columns such
that each letter appears once in each row and once in each column.
Thus a $3 \times 3$ Latin square with treatments A, B and C is given by
A B C
B C A
C A B
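A Latin square for a new experiment can be sketched in Python by starting from the cyclic square (row i is the letter sequence shifted i places, which gives exactly the $3 \times 3$ square above for A, B, C) and then randomly permuting rows, columns and letters, each of which preserves the Latin property. This is an illustrative choice: for r ≥ 4 it does not reach every possible Latin square. The names `random_latin_square` and `is_latin` are hypothetical.

```python
import random

def random_latin_square(letters, seed=None):
    """Cyclic r x r square with rows, columns and letters randomly permuted."""
    rng = random.Random(seed)
    r = len(letters)
    base = [[(i + j) % r for j in range(r)] for i in range(r)]  # cyclic square
    rows = rng.sample(range(r), r)            # random row order
    cols = rng.sample(range(r), r)            # random column order
    relabel = rng.sample(list(letters), r)    # random letter-to-symbol mapping
    return [[relabel[base[i][j]] for j in cols] for i in rows]

def is_latin(square):
    """True if every symbol appears exactly once in each row and each column."""
    r = len(square)
    symbols = set(square[0])
    return (len(symbols) == r
            and all(len(row) == r and set(row) == symbols for row in square)
            and all({square[i][j] for i in range(r)} == symbols for j in range(r)))

square = random_latin_square("ABCD", seed=3)
for row in square:
    print(" ".join(row))
```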
Example:
i) In agricultural field experiments, LSD is used to eliminate the variation due to soil
fertility difference in two perpendicular directions and then to compare the yields of
several varieties of paddy or wheat.
ii) In animal feeding experiments, LSD may be used to remove the variation due to the
breeds and ages of cows and then to compare the yields of milk from cows fed on
different rations.
Advantages of LSD:
i) LSD is more efficient than RBD and CRD, since it controls more of the variation than
CRD or RBD.
ii) Statistical analysis of data remains simple even with missing observations.
iii) LSD is an incomplete layout and needs fewer observations than the corresponding
complete layout, so LSD offers economy in the use of experimental material.
iv) LSD covers relatively complex situations in which several factors can be studied
simultaneously.
Disadvantages of LSD:
i) LSD is not suitable for a large number of treatments.
ii) Analysis of data in an LSD depends on the assumption that there is no interaction
among rows, columns and treatments, so LSD is not appropriate when interactions
are present in the data.
iii) Error d.f. is relatively small in an LSD. In fact, since the error d.f. are $(r-1)(r-2)$, there
are no error d.f. for a $2 \times 2$ Latin square.
iv) The property of orthogonality is lost by missing values in an LSD.
Uses of LSD: Latin square design is used in experimentation in different way:
i) Glasshouse experiments, where there may exist variation across the house due to
light differences and along the house due to treatment differences.
ii) Cow feeding experiments.
iii) To eliminate two extraneous sources of variability.
iv) Field experiment.
How does an LSD, or incomplete three-way classification, differ from a complete three-way
classification?
A complete three-way classification involves all $r^3$ possible level combinations, while an LSD,
or incomplete three-way classification, is a design involving only $r^2$ observations out of the
$r^3$ possible level combinations.
Standard Square: A square is said to be a standard square if its first row and first column are
ordered alphabetically or numerically.
For example,
1 2 3          A B C
2 3 1    or    B C A
3 1 2          C A B
Conjugate squares: Two standard squares are said to be conjugate if the rows of one square are
the columns of the other. For example,
A B          A B
B A   and    B A   are conjugate squares.
Self-Conjugate Square: A square is called a self-conjugate square if its arrangement in rows is
the same as its arrangement in columns. For example,
A B C
B C A   is a self-conjugate square.
C A B
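The three definitions above reduce to simple checks on a square stored as a list of rows. In this sketch the second square `u` is an illustrative example of my own choosing (not from the notes), and the helper names are hypothetical; `are_conjugate` tests only the rows-equal-columns relation and does not itself check that both squares are standard.

```python
def transpose(square):
    """Rows become columns."""
    return [list(col) for col in zip(*square)]

def is_standard(square):
    """First row and first column in alphabetical or numerical order."""
    first_row = list(square[0])
    first_col = [row[0] for row in square]
    return first_row == sorted(first_row) and first_col == sorted(first_col)

def are_conjugate(s1, s2):
    """Rows of s1 are the columns of s2 (standardness checked separately)."""
    return [list(row) for row in s1] == transpose(s2)

def is_self_conjugate(square):
    """Arrangement in rows equals arrangement in columns."""
    return are_conjugate(square, square)

s = [list("ABC"), list("BCA"), list("CAB")]  # the example in the notes
u = [list("ABC"), list("CAB"), list("BCA")]  # illustrative, not standard
print(is_standard(s), is_self_conjugate(s))  # True True
print(is_standard(u), is_self_conjugate(u))  # False False
```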
Orthogonal Latin squares: Two Latin squares of the same size are said to be orthogonal Latin
squares if each letter of one square appears exactly once with each letter of the other square
when the two Latin squares are superimposed.
For example, the following 1st and 2nd Latin squares are orthogonal Latin squares:
1st Latin square    2nd Latin square
A B C
B C A
C A B
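The superimposition test can be sketched in Python: with r² cells, the squares are orthogonal exactly when all r² ordered pairs (letter of the 1st square, letter of the 2nd) are distinct. The 2nd square below, written with digits, is an illustrative choice rather than one taken from the notes, and `are_orthogonal` is a hypothetical helper.

```python
def are_orthogonal(s1, s2):
    """Superimpose two r x r Latin squares; orthogonal if every ordered
    pair (symbol of s1, symbol of s2) occurs exactly once."""
    r = len(s1)
    pairs = {(s1[i][j], s2[i][j]) for i in range(r) for j in range(r)}
    return len(pairs) == r * r  # r*r cells, so no pair may repeat

first = [list("ABC"), list("BCA"), list("CAB")]
second = [list("123"), list("312"), list("231")]  # illustrative mate
print(are_orthogonal(first, second))  # True
print(are_orthogonal(first, first))   # False: only (A,A), (B,B), (C,C) occur
```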
Difference between CRD and LSD
1. CRD: The additive model of a CRD with unequal replications is
$y_{ij} = \mu + \alpha_i + e_{ij};\ i = 1, 2, \dots, k;\ j = 1, 2, \dots, n_i$,
where $y_{ij}$ is the observation on the $i$th treatment in the $j$th replication, $\mu$ is the general
mean effect, $\alpha_i$ is the effect due to the $i$th treatment and $e_{ij}$ is the random error component.
   LSD: A linear additive model for LSD is given by
$y_{ijl} = \mu + \alpha_i + \beta_j + \tau_l + e_{ijl};\ i, j, l = 1, 2, \dots, r$,
where $y_{ijl}$ is the observation in the $i$th row, $j$th column and $l$th treatment, $\mu$ is the
general mean effect, $\alpha_i$ is the fixed effect of the $i$th row, $\beta_j$ is the fixed effect of the
$j$th column, $\tau_l$ is the fixed effect of the $l$th treatment and $e_{ijl}$ is the random error component.
2. CRD: The number of replications may vary from treatment to treatment.
   LSD: The number of replications must be the same as the number of treatments.
3. CRD: A CRD does not lose its orthogonality when observations are missing.
   LSD: An LSD loses its orthogonality when observations are missing.
4. CRD: A CRD does not control any external source of variation, and it is less efficient than an LSD.
   LSD: An LSD controls the external sources of variation in two perpendicular directions, and it is
therefore more efficient than a CRD.
Stating the necessary assumptions, discuss the method of analysis of data of an L.S.D.
Statistical analysis of LSD:
A fixed effect model for LSD is given by
$$y_{ijl} = \mu + \alpha_i + \beta_j + \tau_l + e_{ijl};\quad i, j, l = 1, 2, \dots, r$$
where,
$y_{ijl}$ = observation in the $i$th row, $j$th column and $l$th treatment,
$\mu$ = general mean effect,
$\alpha_i$ = fixed effect of the $i$th row,
$\beta_j$ = fixed effect of the $j$th column,
$\tau_l$ = fixed effect of the $l$th treatment,
$e_{ijl}$ = random error component.
Assumption:
i) $\mu$, $\alpha_i$, $\beta_j$ and $\tau_l$ are unknown parameters.
ii) There is no interaction effect between rows, columns and treatments.
iii) $\sum_{i=1}^{r}\alpha_i = \sum_{j=1}^{r}\beta_j = \sum_{l=1}^{r}\tau_l = 0$.
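                               Column
Row         1       2       …   j       …   r       Total   Mean
1           y11.    y12.    …   y1j.    …   y1r.    y1..    ȳ1..
2           y21.    y22.    …   y2j.    …   y2r.    y2..    ȳ2..
⋮
i           yi1.    yi2.    …   yij.    …   yir.    yi..    ȳi..
⋮
r           yr1.    yr2.    …   yrj.    …   yrr.    yr..    ȳr..
Marginal
Total       y.1.    y.2.    …   y.j.    …   y.r.    G = y...
$y_{i..} = \sum_{j(l)} y_{ijl}$ = total of the $i$th row, and $\bar{y}_{i..} = \dfrac{y_{i..}}{r}$ = mean of the $i$th row,
$y_{.j.} = \sum_{i(l)} y_{ijl}$ = total of the $j$th column, and $\bar{y}_{.j.} = \dfrac{y_{.j.}}{r}$ = mean of the $j$th column,
$y_{..l} = \sum_{i,j} y_{ij(l)}$ = total of the $l$th treatment, and $\bar{y}_{..l} = \dfrac{y_{..l}}{r}$ = mean of the $l$th treatment,
$G = y_{...} = \sum_{i,j,l} y_{ijl}$ = grand total of all observations,
$\bar{y}_{...} = \dfrac{y_{...}}{r^2}$ = grand mean.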
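The row, column and treatment totals defined above are all that the ANOVA of an LSD needs. The sketch below assumes the standard correction-factor formulas ($CF = G^2/r^2$), which these notes do not spell out here, and uses made-up yields laid out on the $3 \times 3$ square A B C / B C A / C A B; the function name `lsd_anova` is hypothetical. It also shows why a $2 \times 2$ square leaves no error d.f.

```python
def lsd_anova(y, trt):
    """ANOVA for an r x r Latin square design.

    y[i][j]:   observation in row i, column j.
    trt[i][j]: label of the treatment l applied to that cell.
    """
    r = len(y)
    G = sum(sum(row) for row in y)                          # grand total y...
    cf = G ** 2 / r ** 2                                    # correction factor
    tss = sum(v ** 2 for row in y for v in row) - cf
    row_tot = [sum(row) for row in y]                       # y_i..
    col_tot = [sum(y[i][j] for i in range(r)) for j in range(r)]  # y_.j.
    trt_tot = {}                                            # y_..l
    for i in range(r):
        for j in range(r):
            trt_tot[trt[i][j]] = trt_tot.get(trt[i][j], 0) + y[i][j]
    ss_rows = sum(v ** 2 for v in row_tot) / r - cf
    ss_cols = sum(v ** 2 for v in col_tot) / r - cf
    ss_trt = sum(v ** 2 for v in trt_tot.values()) / r - cf
    sse = tss - ss_rows - ss_cols - ss_trt                  # error SS by subtraction
    df_e = (r - 1) * (r - 2)                                # zero when r = 2
    f_trt = (ss_trt / (r - 1)) / (sse / df_e) if df_e > 0 else None
    return {"TSS": tss, "SS_rows": ss_rows, "SS_cols": ss_cols,
            "SS_trt": ss_trt, "SSE": sse, "df_e": df_e, "F_trt": f_trt}

square = [list("ABC"), list("BCA"), list("CAB")]
yields = [[10, 14, 9], [13, 8, 11], [7, 12, 15]]           # hypothetical data
res = lsd_anova(yields, square)
print(res)
```

Here the treatment totals are A = 33, B = 42 and C = 24, giving a treatment SS of 54 against an error SS of about 0.67 on 2 d.f.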