RMI Module3
Module -3
Research Methodology
Topics Covered
Research Design: Meaning of Research Design, Need for Research Design, Features of a
Good Design, Important Concepts Relating to Research Design, Different Research Designs,
Basic Principles of Experimental Designs, Important Experimental Designs. Design of Sample
Surveys: Introduction, Sample Design, Sampling and Non sampling Errors, Sample Survey
versus Census Survey, Types of Sampling Designs.
• The research design is the conceptual structure within which research is conducted; it
constitutes the blueprint for the collection, measurement and analysis of data. As such
the design includes an outline of what the researcher will do from writing the hypothesis
and its operational implications to the final analysis of data.
b) The observational design which relates to the conditions under which the
observations are to be made;
c) The statistical design which concerns with the question of how many items are to
be observed and how the information and data gathered are to be analysed;
d) The operational design which deals with the techniques by which the procedures
specified in the sampling, statistical and observational designs can be carried out.
ii. It is a strategy specifying which approach will be used for gathering and analysing the
data.
iii. It also includes the time and cost budgets since most studies are done under these
two constraints.
• Preparation of the research design should be done with great care as any error in it may
upset the entire project.
• Many researches do not serve the purpose for which they are undertaken. In fact, they
may even give misleading conclusions. Thoughtlessness in designing the research
project may result in rendering the research exercise futile. The design helps the
researcher to organize his ideas in a form whereby it will be possible for him to look for
flaws and inadequacies. In the absence of such a course of action, it will be difficult for
the critic to provide a comprehensive review of the proposed study.
• Generally, the design which minimizes bias and maximizes the reliability of the data
collected and analysed is considered a good design. The design which gives the smallest
experimental error is supposed to be the best design in many investigations.
• A design which yields maximal information and provides an opportunity for considering
many different aspects of a problem is considered most appropriate. A design may be
quite suitable in one case but may be found wanting in one respect or the other in the
context of some other research problem. One single design cannot serve the purpose of
all types of research problems.
• A research design appropriate for a particular research problem usually involves the
consideration of the following factors:
(i) The means of obtaining information;
(ii) The availability and skills of the researcher and his staff, if any;
(iii)The objective of the problem to be studied;
• A concept which can take different quantitative values is called a variable. As such,
concepts like weight, height and income are all examples of variables.
• Phenomena which can take quantitatively different values even in decimal points are
called ‘continuous variables’.
• If they can only be expressed in integer values, they are non-continuous variables. Age is
an example of a continuous variable, but the number of children is an example of a
non-continuous variable.
• If one variable depends upon or is a consequence of the other variable, it is termed a
dependent variable, and the variable that is antecedent to the dependent variable is
termed an independent variable. For instance, if we say that height depends upon age,
then height is a dependent variable and age is an independent variable.
2. Extraneous variable
• Independent variables that are not related to the purpose of the study, but may affect
the dependent variable are termed as extraneous variables.
• A study must always be so designed that the effect upon the dependent variable is
attributed entirely to the independent variable(s), and not to some extraneous variable
or variables.
3. Control
One important characteristic of a good research design is to minimise the influence or
effect of extraneous variable(s). In experimental research, the term ‘control’ is used to
refer to restraining experimental conditions.
4. Confounded relationship
When the dependent variable is not free from the influence of extraneous variable(s), the
relationship between the dependent and independent variables is said to be confounded
by an extraneous variable(s).
5. Research hypothesis
When a prediction or a hypothesised relationship is to be tested by scientific methods, it
is termed a research hypothesis. The research hypothesis is a predictive statement that
relates an independent variable to a dependent variable. Usually a research hypothesis
must contain, at least, one independent and one dependent variable.
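For example, a training programme given to 50 students, with 25 students (Group A)
following the usual studies programme and 25 students (Group B) following a special
studies programme, is experimental hypothesis-testing research, because the independent
variable (the type of studies programme) is manipulated.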
In the above illustration, the Group A can be called a control group and the Group B an
experimental group.
8. Treatments
• The different conditions under which experimental and control groups are put are
usually referred to as ‘treatments’.
• In the illustration taken above, the two treatments are the usual studies programme and
the special studies programme.
• Similarly, if we want to determine through an experiment the comparative impact of
three varieties of fertilizers on the yield of wheat, in that case the three varieties of
fertilizers will be treated as three treatments.
9. Experiment
• The process of examining the truth of a statistical hypothesis, relating to some research
problem, is known as an experiment.
• Generally, the following three methods in the context of research design for such studies
are talked about:
(a) the survey of the concerned literature;
(b) the experience survey; and
(c) the analysis of ‘insight-stimulating’ examples.
The researcher must prepare an interview schedule for the systematic questioning of informants. It
is often considered desirable to send a copy of the questions to be discussed to the respondents well
in advance. This will also give an opportunity to the respondents for doing some advance thinking
over the various issues involved so that, at the time of interview, they may be able to contribute
effectively.
• The research design must make enough provision for protection against bias and must
maximise reliability.
• The design in such studies must be rigid and not flexible and must focus attention on the
following:
a) Formulating the objective of the study (what is the study about and why is it being
made?)
b) Designing the methods of data collection (what techniques of gathering data will be
adopted?)
c) Selecting the sample (how much material will be needed?)
d) Collecting the data (where can the required data be found and with what time period
should the data be related?)
e) Processing and analysing the data.
f) Reporting the findings.
The difference between research designs in respect of the above two types of research
studies
• According to the Principle of Replication, the experiment should be repeated more than
once. Thus, each treatment is applied to many experimental units instead of one. By doing
so, the statistical accuracy of the experiment is increased.
For example, suppose we need to examine the effect of two varieties of rice. For this purpose
we may divide the field into two parts and grow one variety in one part and the other variety
in the other part. We can then compare the yield of the two parts and draw a conclusion on
that basis. But if we are to apply the principle of replication to this experiment, then we
first divide the field into several parts, grow one variety in half of these parts and the other
variety in the remaining parts.
For instance, if we grow one variety of rice, say, in the first half of the parts of a field and the
other variety is grown in the other half, then it is just possible that the soil fertility may be
different in the first half in comparison to the other half. If this is so, our results would not
be realistic. In such a situation, we may assign the variety of rice to be grown in different
parts of the field on the basis of some random sampling technique, i.e., we may apply the
principle of randomization and protect ourselves against the effects of the extraneous factors
(soil fertility differences in the given case).
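The random assignment described above can be sketched in a few lines. This is only an illustration under assumed values: the number of plots (8), the variety labels and the function name `randomize_plots` are hypothetical and do not come from the notes.

```python
import random

def randomize_plots(n_plots, treatments, seed=None):
    """Randomly assign treatments to plots, each treatment equally often."""
    if n_plots % len(treatments) != 0:
        raise ValueError("n_plots must be a multiple of the number of treatments")
    reps = n_plots // len(treatments)
    # Start from a balanced list, e.g. four plots per variety ...
    layout = [t for t in treatments for _ in range(reps)]
    # ... then let chance decide which plot receives which variety.
    random.Random(seed).shuffle(layout)
    return layout

# Assumed example: a field cut into 8 plots, two rice varieties.
layout = randomize_plots(8, ["variety 1", "variety 2"], seed=1)
for plot, variety in enumerate(layout, start=1):
    print(f"plot {plot}: {variety}")
```

Because the positions are decided by chance, any fertility gradient in the field is unlikely to line up systematically with one variety.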
In other words, according to the principle of local control, we first divide the field into several
homogeneous parts, known as blocks, and then each such block is divided into parts equal
to the number of treatments. Then the treatments are randomly assigned to these parts of a
block.
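A minimal sketch of the local-control idea, assuming four blocks and three treatments labelled A, B and C (these numbers, labels and the function name are illustrative assumptions): every block contains each treatment exactly once, and only the order inside a block is left to chance.

```python
import random

def randomized_block_layout(n_blocks, treatments, seed=None):
    """Return one random ordering of all treatments for every block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)   # each block gets every treatment once
        rng.shuffle(order)         # randomize positions within the block
        layout.append(order)
    return layout

# Assumed example: 4 homogeneous blocks, 3 treatments.
blocks = randomized_block_layout(4, ["A", "B", "C"], seed=7)
for number, block in enumerate(blocks, start=1):
    print(f"block {number}: {block}")
```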
• We can classify experimental designs into two broad categories, viz., informal
experimental designs and formal experimental designs.
• Informal experimental designs are those designs that normally use a less sophisticated
form of analysis based on differences in magnitudes, whereas formal experimental
designs offer relatively more control and use precise statistical procedures for analysis.
• The two groups (experimental and control groups) of such a design are given
different treatments of the independent variable. This design of experiment is
quite common in research studies concerning behavioural sciences.
• The merit of such a design is that it is simple and randomizes the differences
among the sample items.
• But the limitation of it is that the individual differences among those conducting the
treatments are not eliminated, i.e., it does not control the extraneous variable
(ii) Random replications design: The limitation of the two-group randomized design is
usually eliminated within the random replications design. In the illustration just cited
above, the teacher differences on the dependent variable were ignored, i.e., the
extraneous variable was not controlled. But in a random replications design, the effect
of such differences are minimised (or reduced) by providing a number of repetitions for
From the diagram it is clear that there are two populations in the replication design. The
sample is taken randomly from the population available for study and is randomly assigned
to, say, four experimental and four control groups. Similarly, a sample is taken randomly from
the population available to conduct the experiments (since there are eight groups, eight such
individuals are selected), and the eight individuals so selected are randomly assigned to
the eight groups. Generally, an equal number of items is put in each group so that the size of
the group is not likely to affect the result of the study. Variables relating to both population
characteristics are assumed to be randomly distributed among the two groups. Thus, this
random replication design is, in fact, an extension of the two-group simple randomized
design.
• In the R.B. design, subjects are first divided into groups, known as blocks, such that
within each group the subjects are relatively homogeneous in respect to some selected
variable. The variable selected for grouping the subjects is one that is believed to be
related to the measures to be obtained in respect of the dependent variable.
• The number of subjects in a given block would be equal to the number of treatments
and one subject in each block would be randomly assigned to each treatment. In
general, blocks are the levels at which we hold the extraneous factor fixed, so that its
contribution to the total variability of data can be measured.
• Let us illustrate the R.B. design with the help of an example. Suppose four different
forms of a standardised test in statistics were given to each of five students (selected one
from each of the five I.Q. blocks) and following are the scores which they obtained.
The purpose of this randomization is to take care of possible extraneous factors such as
fatigue, or perhaps the experience gained from repeatedly taking the test.
• The L.S. design is used when there are two major extraneous factors such as the
varying soil fertility and varying seeds.
• For instance, an experiment has to be made through which the effects of five different
varieties of fertilizers on the yield of a certain crop, say wheat, are to be judged. In such a
case the varying fertility of the soil in different blocks in which the experiment has to be
performed must be taken into consideration; otherwise the results obtained may not be
very dependable because the output happens to be the effect not only of fertilizers, but it
may also be the effect of fertility of soil. Similarly, there may be impact of varying seeds
on the yield.
• The Latin-square design is one wherein each fertilizer, in our example, appears five
times but is used only once in each row and in each column of the design. In other
words, the treatments in an L.S. design are so allocated among the plots that no
treatment occurs more than once in any one row or any one column. The two blocking
factors may be represented through rows and columns (one through rows and the other
through columns). The following is a diagrammatic form of such a design in respect of,
say, five types of fertilizers, viz., A, B, C, D and E, and the two blocking factors, viz., the
varying soil fertility and the varying seeds:
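As an illustration only, one possible 5 × 5 layout can be generated programmatically. The sketch below uses a cyclic construction followed by a random shuffle of rows and columns; this is a common way to obtain a Latin square, not necessarily the layout of the original figure. Treating rows as soil-fertility levels and columns as seed types is an assumption made for the printout.

```python
import random

def latin_square(treatments, seed=None):
    """Build a Latin square: cyclic shifts, then shuffled rows and columns."""
    k = len(treatments)
    # Row i is the treatment list shifted left by i positions.
    square = [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]
    rng = random.Random(seed)
    rng.shuffle(square)                 # permute the rows
    cols = list(range(k))
    rng.shuffle(cols)                   # permute the columns
    return [[row[c] for c in cols] for row in square]

def is_latin_square(square):
    """Check that every symbol occurs exactly once per row and per column."""
    k = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(k)} == symbols for j in range(k))
    return len(symbols) == k and rows_ok and cols_ok

square = latin_square(["A", "B", "C", "D", "E"], seed=3)
print("rows: soil fertility levels, columns: seed types (assumed labelling)")
for row in square:
    print(" ".join(row))
```

Shuffling whole rows and whole columns keeps the defining property intact, so each fertilizer still appears once in every row and once in every column.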
• The merit of this experimental design is that it enables differences in fertility gradients
in the field to be eliminated in comparison to the effects of different varieties of
fertilizers on the yield of the crop.
• But this design suffers from one limitation, and it is that although each row and each
column represents equally all fertilizer varieties, there may be considerable difference in
the row and column means both up and across the field.
• Another limitation of this design is that it requires the number of rows, columns and
treatments to be equal. This reduces the utility of this design.
(i) Simple factorial designs: In case of simple factorial designs, we consider the effects of
varying two factors on the dependent variable, but when an experiment is done with more than
two factors, we use complex factorial designs. A simple factorial design is also termed a
‘two-factor-factorial design’.
In this design the extraneous variable to be controlled by homogeneity is called the control
variable and the independent variable, which is manipulated, is called the experimental
variable. Then there are two treatments of the experimental variable and two levels of the
control variable. As such there are four cells into which the sample is divided. The column
means in the given design are termed the main effects for treatments without taking into
account any differential effect that is due to the level of the control variable. Similarly, the row
means in the said design are termed the main effects for levels without regard to treatment.
The following examples make clear the interaction effect between treatments and levels. The
data obtained in case of two (2 × 2) simple factorial studies may be as given in Fig.
The graph relating to Study I indicates that there is an interaction between the treatment and
the level which, in other words, means that the treatment and the level are not independent
of each other. The graph relating to Study II shows that there is no interaction effect which
means that treatment and level in this study are relatively independent of each other.
It may also be of the type having two experimental variables or two control variables. For
example, a college teacher compared the effect of the class size as well as the introduction of
the new instruction technique on the learning of research methodology.
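The ideas of main effects and interaction in a 2 × 2 design can be made concrete with a small calculation. The cell means below are hypothetical numbers chosen for illustration (they are not the data of Study I and Study II), and `factorial_2x2_summary` is an assumed helper name.

```python
def factorial_2x2_summary(cells):
    """cells[level][treatment] holds the mean score of one of the four cells."""
    levels = list(cells)
    treatments = list(cells[levels[0]])
    # Column means: main effects for treatments (levels ignored).
    col_means = {t: sum(cells[l][t] for l in levels) / len(levels)
                 for t in treatments}
    # Row means: main effects for levels (treatments ignored).
    row_means = {l: sum(cells[l][t] for t in treatments) / len(treatments)
                 for l in levels}
    t1, t2 = treatments
    l1, l2 = levels
    # Interaction contrast: does the treatment difference change with level?
    interaction = (cells[l1][t1] - cells[l1][t2]) - (cells[l2][t1] - cells[l2][t2])
    return col_means, row_means, interaction

# Hypothetical cell means, for illustration only.
with_interaction = {"Level I": {"A": 10.0, "B": 20.0},
                    "Level II": {"A": 20.0, "B": 10.0}}
without_interaction = {"Level I": {"A": 10.0, "B": 15.0},
                       "Level II": {"A": 20.0, "B": 25.0}}

for name, study in [("with", with_interaction), ("without", without_interaction)]:
    cols, rows, inter = factorial_2x2_summary(study)
    print(name, "interaction:", cols, rows, inter)
```

A contrast of zero corresponds to parallel lines in the graph (no interaction); a non-zero contrast corresponds to lines that converge or cross.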
(ii) Complex factorial designs: Experiments with more than two factors at a time involve the
use of complex factorial designs. A design which considers three or more independent
variables simultaneously is called a complex factorial design. In case of three factors with one
experimental variable having two treatments and two control variables, each of which
has two levels, the design used will be termed a 2 × 2 × 2 complex factorial design, which
will contain a total of eight cells as shown below in Figure.
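The eight cells of a 2 × 2 × 2 design can be listed by taking every combination of the factor levels. The factor and level names in this sketch are placeholders, not names used in the original figure.

```python
from itertools import product

# Placeholder names: one experimental variable and two control variables.
factors = {
    "experimental variable": ["treatment A", "treatment B"],
    "control variable 1": ["level I", "level II"],
    "control variable 2": ["level I", "level II"],
}

# Every cell is one combination of a treatment with one level of each control variable.
cells = list(product(*factors.values()))
for number, cell in enumerate(cells, start=1):
    print(f"cell {number}: {cell}")
print("number of cells:", len(cells))
```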
INTRODUCTION
A complete enumeration of all the items in the ‘population’ is known as a census inquiry or
census survey.
Ex: Calculating the average per capita income of the people. A census survey is impossible
when the population is infinite, since it is not possible to examine every item in the
population. In such cases it is better to resort to a sample survey, i.e., to select only a few
items or respondents. The selected respondents are representative of the total population and
constitute the sample, the selection process is called the ‘sampling technique’, and the survey
so conducted is called a ‘sample survey’.
SAMPLE DESIGN
• A sample design is a definite plan for obtaining a sample from a given population. Sample
design may set down the number of items to be included in the sample. Sample design is
determined before the data is collected. There are many sample designs from which a
researcher can choose.
ii. Population: What should be the population? This question should be answered.
iii. Sampling unit and frame: A sampling unit may be a geographical one such as a state,
district or village, a construction unit such as a house or flat, or a social unit
such as a family, club or school. The list of sampling units is called the sampling frame.
iv. Size of sample: The size of the sample should be neither excessively large nor too
small. It should be optimum. An optimum sample is one which fulfills the requirements
of efficiency, representativeness, reliability and flexibility. Budgetary constraint must be
considered when we decide the sample size.
v. Parameters of interest: Statistical constants of the population are called
parameters, e.g., population mean, population proportion, etc.
vii. Non-respondents: Non-response tends to change the results. The reason for the
non-response should be recorded by the investigator.
viii. Selection of proper sampling design: The researcher must decide the type of sample
he will use i.e., he must decide about the technique to be used in selecting the items for
the sample.
ix. Organizing field work: There should be efficient supervisory staff and trained
personnel for the field work.
x. Pilot survey: Try out the research design on a small scale before going to the field.
xi. Budgetary constraint: Cost considerations, from a practical point of view, have a major
impact upon decisions relating to not only the size of the sample but also the type of
sample.
The errors involved in the collection of data are classified into sampling and non-sampling
errors.
• Sampling errors arise because only a part of the population has been used
to estimate population parameters. Sampling errors are absent in a census survey. The
measurement of sampling error is usually called the ‘precision of the sampling plan’. If
we increase the sample size, the precision can be improved, but increasing the sample
size increases the cost of collecting data and also enhances the systematic bias. The
effective way to increase precision is usually to select a better sampling design.
• Non-sampling errors arise at the stage of collection and preparation of data and thus
are present in both the sample survey as well as the census survey. Non-sampling
errors can be reduced by defining the sampling units, frame and the population
correctly and by employing efficient people in the investigations.
• A sample survey, since we study only a part of the whole population, requires less
money and time. Non-sampling errors, such as inefficiency of field workers, non-response
and bias due to interviewers, are much larger in a census survey than in a sample
survey. If information is required about every item of the population, a census
survey should be used.
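The statement that precision improves with sample size can be checked with a small simulation. Everything in this sketch is assumed for illustration: the population is 10,000 artificial income values, and precision is measured as the spread (standard deviation) of the sample mean over 200 repeated simple random samples.

```python
import random
import statistics

def spread_of_sample_means(population, n, draws, rng):
    """Standard deviation of the sample mean over repeated random samples."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

rng = random.Random(42)
# Artificial population of incomes (assumed values, not real data).
population = [rng.gauss(50_000, 12_000) for _ in range(10_000)]
census_mean = statistics.mean(population)   # what a census would report

results = {n: spread_of_sample_means(population, n, 200, rng)
           for n in (25, 100, 400)}
print("census mean:", round(census_mean))
for n, spread in results.items():
    print(f"sample size {n}: spread of sample mean = {spread:.0f}")
```

The simulation only captures sampling error. Non-sampling errors and systematic bias, which do not shrink with a larger sample, are outside its scope.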
Non-probability sampling:
• Non-probability sampling is also known by different names such as deliberate
sampling, purposive sampling and judgement sampling.
• In this type of sampling, items for the sample are selected deliberately by the researcher;
his choice concerning the items remains supreme.
• For instance, if economic conditions of people living in a state are to be studied, a few
towns and villages may be purposively selected for intensive study on the principle that
they can be representative of the entire state. Thus, there is always the danger of bias
entering into this type of sampling technique. But if the investigators are impartial,
work without bias and have the necessary experience so as to take sound judgement,
the results obtained from such a sample may be tolerably reliable.
• Quota sampling is also an example of non-probability sampling. Under quota sampling
the interviewers are simply given quotas to be filled from the different strata, with some
restrictions on how they are to be filled. Quota samples are essentially judgement
samples and inferences drawn on their basis are not amenable to statistical treatment
in a formal way.
Probability sampling:
• Probability sampling is also known as ‘random sampling’ or ‘chance sampling’. Under
this sampling design, every item of the universe has an equal chance of inclusion in the
sample. It is, so to say, a lottery method in which individual units are picked up from
the whole group not deliberately but by some mechanical process. Random sampling
ensures the law of Statistical Regularity which states that if on an average the sample
chosen is a random one, the sample will have the same composition and characteristics
as the universe.
• Keeping this in view we can define a simple random sample (or simply a random sample)
from a finite population as a sample which is chosen in such a way that each of the $^{N}C_{n}$
possible samples has the same probability, $1/{}^{N}C_{n}$, of being selected. To make it more
clear we take a certain finite population consisting of six elements (say a, b, c, d, e, f), i.e.,
N = 6. Suppose that we want to take a sample of size n = 3 from it. Then there are $^{6}C_{3} = 20$
possible distinct samples of the required size, and they consist of the elements abc,
abd, abe, abf, acd, ace, acf, ade, adf, aef, bcd, bce, bcf, bde, bdf, bef, cde, cdf, cef, and def.
If we choose one of these samples in such a way that each has the probability 1/20 of
being chosen, we will then call this a random sample.
• This procedure will also result in the same probability for each possible sample. We can
verify this by taking the above example. Since we have a finite population of 6 elements
and we want to select a sample of size 3, the probability of drawing any one element for
our sample in the first draw is 3/6, the probability of drawing one more element in the
second draw is 2/5 (the first element drawn is not replaced), and similarly the probability
of drawing one more element in the third draw is 1/4. Since each of these probabilities is
conditional on the earlier draws, the joint probability of the three elements which
constitute our sample is their product, and this works out to 3/6 × 2/5 × 1/4 = 1/20.
This verifies our earlier calculation.
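Both calculations can be verified mechanically. The sketch below enumerates all samples of size 3 from the six elements and multiplies the three draw probabilities as exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = ["a", "b", "c", "d", "e", "f"]   # N = 6
n = 3

# All distinct samples of size 3, ignoring the order of selection.
samples = ["".join(s) for s in combinations(population, n)]
print(len(samples), "samples:", ", ".join(samples))

# Probability that a sample chosen at random from this list is any given one.
p_direct = Fraction(1, comb(len(population), n))

# Probability built up draw by draw, without replacement.
p_sequential = Fraction(3, 6) * Fraction(2, 5) * Fraction(1, 4)

print("direct:", p_direct, "sequential:", p_sequential)
```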
(ii) Stratified sampling: If a population from which a sample is to be drawn does not
constitute a homogeneous group, stratified sampling technique is generally applied in order
to obtain a representative sample. Under stratified sampling the population is divided into
several sub-populations that are individually more homogeneous than the total population
(the different sub-populations are called ‘strata’) and then we select items from each stratum
to constitute a sample.
The following three questions are highly relevant in the context of stratified sampling:
(a) How to form strata?
(b) How should items be selected from each stratum?
(c) How many items should be selected from each stratum, i.e., how should the sample
size be allocated to each stratum?
• Regarding the first question, we can say that the strata should be formed on the basis of
common characteristic(s) of the items to be put in each stratum. This means that the
various strata should be formed in such a way as to ensure that elements are most
homogeneous within each stratum and most heterogeneous between the different strata.
Thus, strata are purposively formed and are usually based on the past experience and
personal judgement of the researcher. In respect of the second question, we can say that
the usual method for selecting items from each stratum is simple random sampling.
Systematic sampling can be used if it is considered more appropriate in certain
situations. Regarding the third question, we usually follow the method of proportional
allocation, under which the sizes of the samples from the different strata are kept
proportional to the sizes of the strata.
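Proportional allocation amounts to giving stratum i a sample of size n × N_i / N, where N_i is the size of the stratum and N the size of the whole population. The strata sizes and the total sample size in this sketch are assumed numbers used only to show the arithmetic.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of size n in proportion to the strata sizes."""
    total = sum(strata_sizes.values())
    # n_i = n * N_i / N, rounded to a whole number of items.
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

# Assumed example: a population of 8000 split into three strata, sample of 30.
strata = {"stratum 1": 4000, "stratum 2": 2400, "stratum 3": 1600}
allocation = proportional_allocation(strata, 30)
print(allocation)
```

When the proportions do not come out as whole numbers, the rounded sizes may not add up exactly to n and a small manual adjustment is needed.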
(iii) Cluster sampling: If the total area of interest happens to be a big one, a convenient way
in which a sample can be taken is to divide the area into a number of smaller non-
overlapping areas and then to randomly select a number of these smaller areas (usually
called clusters), with the ultimate sample consisting of all (or samples of) units in these small
areas or clusters. Suppose we want to estimate the proportion of machine parts in an
inventory which are defective. Also assume that there are 20000 machine parts in the
inventory at a given point of time, stored in 400 cases of 50 each. Now using cluster
sampling, we would consider the 400 cases as clusters and randomly select ‘n’ cases and
examine all the machine parts in each randomly selected case.
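The machine-parts example can be simulated as follows. The inventory contents are artificial: a 5 per cent chance of a part being defective and a choice of n = 10 cases are assumptions made for this sketch, since the notes leave both unspecified.

```python
import random

rng = random.Random(0)

# Artificial inventory: 400 cases of 50 parts each; True marks a defective part.
inventory = [[rng.random() < 0.05 for _ in range(50)] for _ in range(400)]

def cluster_sample_estimate(cases, n_cases, rng):
    """Pick n_cases whole cases at random and inspect every part in them."""
    chosen = rng.sample(range(len(cases)), n_cases)
    inspected = [part for index in chosen for part in cases[index]]
    return sum(inspected) / len(inspected), chosen

estimate, chosen = cluster_sample_estimate(inventory, 10, rng)
print("cases inspected:", sorted(chosen))
print("estimated proportion defective:", estimate)
```

Only 10 cases (500 parts) are opened instead of all 20000 parts, which is the practical attraction of the method.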
(iv) Area sampling: If clusters happen to be some geographic subdivisions, in that case
cluster sampling is better known as area sampling. In other words, cluster designs, where
the primary sampling unit represents a cluster of units based on geographic area, are
distinguished as area sampling. The plus and minus points of cluster sampling are also
applicable to area sampling.
(v) Multi-stage sampling: The first stage is to select a large primary sampling unit such as
states in a country. Then we may select certain districts and interview all banks in the
chosen districts. This would represent a two-stage sampling design.
(vi) Sampling with probability proportional to size: In case the cluster sampling units do
not have the same number or approximately the same number of elements, it is considered
appropriate to use a random selection process where the probability of each cluster being
included in the sample is proportional to the size of the cluster. For this purpose, we have to
list the number of elements in each cluster irrespective of the method of ordering the cluster.
• The following are the number of departmental stores in 15 cities: 35, 17, 10, 32, 70, 28,
26, 19, 26, 66, 37, 44, 33, 29 and 28. If we want to select a sample of 10 stores, using
cities as clusters and selecting within clusters proportional to size, how many stores
from each city should be chosen? (Use a starting point of 10).
• Since in the given problem we have 500 departmental stores from which we have to
select a sample of 10 stores, the appropriate sampling interval is 500/10 = 50. As we have
to use the starting point of 10, we successively add increments of 50 till 10 numbers have
been selected. The numbers thus obtained are: 10, 60, 110, 160, 210, 260, 310, 360,
410 and 460, which have been shown in the last column of the table (Table 4.1) against
the corresponding cumulative totals. From this we can say that two stores should be
selected randomly from city number 5 and one each from city numbers 1, 3, 7, 9, 10,
11, 12 and 14. This sample of 10 stores is the sample with probability proportional to
size.
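The working of this problem can be retraced step by step. The sketch below rebuilds the cumulative totals, generates the ten selection numbers from the starting point 10 and the interval 50, and assigns each number to the first city whose cumulative total reaches it. Variable names are illustrative.

```python
from itertools import accumulate

stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]
sample_size = 10
start = 10

cumulative = list(accumulate(stores))        # running totals, last one is 500
interval = cumulative[-1] // sample_size     # sampling interval
picks = [start + k * interval for k in range(sample_size)]

allocation = {}
for pick in picks:
    # The pick belongs to the first city whose cumulative total reaches it.
    city = next(i for i, c in enumerate(cumulative, start=1) if pick <= c)
    allocation[city] = allocation.get(city, 0) + 1

print("city  stores  cumulative  stores to sample")
for city, (size, total) in enumerate(zip(stores, cumulative), start=1):
    print(f"{city:>4}  {size:>6}  {cumulative[city - 1]:>10}  {allocation.get(city, 0):>16}")
```

Larger cities cover a wider range of cumulative numbers, so they are more likely to contain one or more of the selection numbers.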
(vii) Sequential sampling: This sampling design is a somewhat complex sample design. The
ultimate size of the sample under this technique is not fixed in advance, but is determined
according to mathematical decision rules on the basis of information yielded as survey
progresses. This is usually adopted in case of acceptance sampling plan in context of
statistical quality control. When a particular lot is to be accepted or rejected on the basis of
a single sample, it is known as single sampling; when the decision is to be taken on the
basis of two samples, it is known as double sampling and in case the decision rests on the
basis of more than two samples but the number of samples is certain and decided in
advance, the sampling is known as multiple sampling. But when the number of samples is
more than two but it is neither certain nor decided in advance, this type of system is often
referred to as sequential sampling.
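The notes do not say which mathematical decision rules are used. One well-known choice is Wald's sequential probability ratio test, sketched below purely as an illustration. The acceptable and unacceptable defect rates (p0, p1) and the risk levels (alpha, beta) are assumed values, and `sprt_lot_decision` is a hypothetical helper name.

```python
from math import log

def sprt_lot_decision(inspections, p0=0.02, p1=0.10, alpha=0.05, beta=0.10):
    """Inspect items one by one; stop as soon as the evidence is strong enough.

    inspections: iterable of booleans, True meaning a defective item.
    p0, p1: assumed acceptable and unacceptable defect rates.
    alpha, beta: assumed risks of wrongly rejecting / wrongly accepting a lot.
    """
    upper = log((1 - beta) / alpha)    # reject the lot at or above this value
    lower = log(beta / (1 - alpha))    # accept the lot at or below this value
    llr = 0.0                          # running log-likelihood ratio
    count = 0
    for count, defective in enumerate(inspections, start=1):
        llr += log(p1 / p0) if defective else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject", count
        if llr <= lower:
            return "accept", count
    return "continue sampling", count

print(sprt_lot_decision([False] * 100))   # a run of good items
print(sprt_lot_decision([True, True]))    # defectives found straight away
```

The number of items inspected is not fixed beforehand: it depends on how quickly the accumulated evidence crosses one of the two limits.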
Question Bank
1. Define the meaning of Research Design.
10. Illustrate the type ‘After-only with control design’ under informal experimental design.
11. Explain ‘Randomized block design (RB design)’ in formal experimental design.
12. List the two forms of Completely randomized design (CR design).
13. List the steps involved in sample design. Explain terms
i) Sampling unit ii) Source unit
14. What are the factors that affect the systematic bias?
15. Compare Sample survey vs Census survey.
16. Describe briefly the different types of sample design.