BBA RM Notes


UNIT – I

INTRODUCTION

1.1 Meaning of Research:


Research in simple terms refers to search for knowledge. It is a scientific
and systematic search for information on a particular topic or issue. It is also
known as the art of scientific investigation. Several social scientists have
defined research in different ways.
In the Encyclopedia of Social Sciences, D. Slesinger and M. Stephenson
(1930) defined research as “the manipulation of things, concepts or symbols for
the purpose of generalizing to extend, correct or verify knowledge, whether that
knowledge aids in the construction of theory or in the practice of an art”.
According to Redman and Mory (1923), research is a “systematized effort to gain new
knowledge”. It is an academic activity and therefore the term should be used in a
technical sense. According to Clifford Woody (Kothari, 1988), research comprises
“defining and redefining problems, formulating hypotheses or suggested solutions;
collecting, organizing and evaluating data; making deductions and reaching conclusions;
and finally, carefully testing the conclusions to determine whether they fit the formulated
hypotheses”.
Thus, research is an original addition to the available knowledge, which
contributes to its further advancement. It is an attempt to pursue truth through
the methods of study, observation, comparison and experiment. In sum,
research is the search for knowledge, using objective and systematic methods to
find solution to a problem.

1.1.1 Objectives of Research:


The objective of research is to find answers to the questions by applying
scientific procedures. In other words, the main aim of research is to find out the
truth which is hidden and has not yet been discovered. Although every research
study has its own specific objectives, the research objectives may be broadly
grouped as follows:
1. to gain familiarity with a phenomenon or to achieve new insights into it (i.e., formulative
research studies);
2. to accurately portray the characteristics of a particular individual, group, or a
situation (i.e., descriptive research studies);
3. to determine the frequency with which something occurs or with which it is associated with something else (i.e., diagnostic research
studies); and
4. to examine the hypothesis of a causal relationship between two variables (i.e.,
hypothesis-testing research studies).
1.1.2 Research Methods versus Methodology:
Research methods include all those techniques/methods that are adopted
for conducting research. Thus, research techniques or methods are the methods
that the researchers adopt for conducting the research studies.
On the other hand, research methodology is the way in which research
problems are solved systematically. It is a science of studying how research is
conducted scientifically. Under it, the researcher acquaints himself/herself with
the various steps generally adopted to study a research problem, along with the
underlying logic behind them. Hence, it is not only important for the researcher
to know the research techniques/methods, but also the scientific approach called
methodology.

1.1.3 Research Approaches:


There are two main approaches to research, namely quantitative
approach and qualitative approach. The quantitative approach involves the
collection of quantitative data, which are put to rigorous quantitative analysis in
a formal and rigid manner. This approach further includes experimental,
inferential, and simulation approaches to research. Meanwhile, the qualitative
approach uses the method of subjective assessment of opinions, behaviour and
attitudes. Research in such a situation is a function of the researcher’s
impressions and insights. The results generated by this type of research are
either in non-quantitative form or in the form which cannot be put to rigorous
quantitative analysis. Usually, this approach uses techniques like in-depth
interviews, focus group interviews, and projective techniques.

1.1.4 Types of Research:


There are different types of research. The basic ones are as follows:
1) Descriptive versus Analytical:
Descriptive research consists of surveys and fact-finding enquiries of
different types. The main objective of descriptive research is describing the
state of affairs as it prevails at the time of study. The term ‘ex post facto
research’ is quite often used for descriptive research studies in social sciences
and business research. The most distinguishing feature of this method is that the
researcher has no control over the variables here. He/she has to only report what
is happening or what has happened. The majority of ex post facto research
projects are used for descriptive studies in which the researcher attempts to
examine phenomena such as consumers’ preferences, frequency of
purchases, shopping behaviour, etc. Despite the inability of the researchers to control the
variables, ex post facto studies may also comprise attempts by them to discover
the causes of the selected problem. The methods of research adopted in
conducting descriptive research are survey methods of all kinds, including
correlational and comparative methods.
Meanwhile in the Analytical research, the researcher has to use the
already available facts or information, and analyse them to make a critical
evaluation of the subject.

2) Applied versus Fundamental:


Research can also be applied or fundamental in nature. An attempt to
find a solution to an immediate problem encountered by a firm, an industry, a
business organisation, or society is known as Applied Research. Researchers
engaged in such studies aim at drawing conclusions that address a
concrete social or business problem.
On the other hand, Fundamental Research mainly concerns
generalizations and formulation of a theory. In other words, “Gathering
knowledge for knowledge’s sake is termed ‘pure’ or ‘basic’ research” (Young in
Kothari, 1988). Researches relating to pure mathematics or concerning some
natural phenomenon are instances of Fundamental Research. Likewise, studies
focusing on human behaviour also fall under the category of fundamental
research.
Thus, while the principal objective of applied research is to find a
solution to some pressing practical problem, the objective of basic research is to
find information with a broad base of application and add to the already existing
organized body of scientific knowledge.

3) Quantitative versus Qualitative:


Quantitative research relates to aspects that can be quantified or can be
expressed in terms of quantity. It involves the measurement of quantity or
amount. The various available statistical and econometric methods are adopted
for analysis in such research. These include correlation, regression and
time-series analysis.
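As a sketch of the statistical tools just mentioned, the following Python fragment computes a Pearson correlation coefficient and fits a simple least-squares regression line. The data and formulas are illustrative only, not taken from these notes:

```python
# Minimal sketch of two common quantitative-research tools:
# Pearson correlation and simple linear regression (illustrative data).
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ols_fit(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

# Hypothetical data: advertising spend vs. sales (units are made up).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]          # perfectly linear: sales = 1 + 2*spend
print(pearson_r(spend, sales))    # ≈ 1.0 for this perfectly linear data
print(ols_fit(spend, sales))      # (1.0, 2.0)
```

In practice a researcher would use a statistical package rather than hand-rolled formulas, but the computation underneath is the same.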
On the other hand, Qualitative research is concerned with qualitative
phenomena, or more specifically, the aspects related to or involving quality or
kind. For example, an important type of qualitative research is ‘Motivation
Research’, which investigates the reasons for human behaviour. The main
aim of this type of research is discovering the underlying motives and desires of
human beings by using in-depth interviews. The other techniques employed in
such research are story completion tests, sentence completion tests, word
association tests, and other similar projective methods. Qualitative research is
particularly significant in the context of behavioural sciences, which aim at
discovering the underlying motives of human behaviour. Such research helps to
analyse the various factors that motivate human beings to behave in a certain
manner, besides contributing to an understanding of what makes individuals like
or dislike a particular thing. However, it is worth noting that conducting
qualitative research in practice is a considerably difficult task. Hence, while
undertaking such research, it is important to seek guidance from experienced
researchers.

4) Conceptual versus Empirical:


The research related to some abstract idea or theory is known as
Conceptual Research. Generally, philosophers and thinkers use it for developing
new concepts or for reinterpreting the existing ones. Empirical Research, on the
other hand, relies exclusively on observation or experience, with hardly any
regard for theory and system. Such research is data-based, and often comes
up with conclusions that can be verified through experiments or observation.
Empirical research is also known as experimental type of research, in which it is
important to first collect the facts and their sources, and actively take steps to
stimulate the production of desired information. In this type of research, the
researcher first formulates a working hypothesis, and then gathers sufficient
facts to prove or disprove the stated hypothesis. He/she formulates the
experimental design, which according to him/her would manipulate the
variables, so as to obtain the desired information. This type of research is thus
characterized by the researcher’s control over the variables under study.
Empirical research is most appropriate when an attempt is made to prove that
certain variables influence the other variables in some way. Therefore, the
results obtained by using the experimental or empirical studies are considered to
be the most powerful evidence for a given hypothesis.

5) Other Types of Research:


The remaining types of research are variations of one or more of the
afore-mentioned methods. They vary in terms of the purpose of research, or the
time required to complete it, or may be based on some other similar factor. On
the basis of time, research may either be in the nature of one-time or
longitudinal research. While the research is restricted to a single time-period in
the former case, it is conducted over several time-periods in the latter case.
Depending upon the environment in which the research is to be conducted, it can
also be laboratory research or field-setting research, or simulation research,
besides being diagnostic or clinical in nature. Under such research, in-depth
approaches or case study method may be employed to analyse the basic causal
relations. These studies usually undertake a detailed in-depth analysis of the
causes of certain events of interest, and use very small samples and sharp data
collecting methods. The research may also be explanatory in nature.
Formalized research studies consist of substantial structure and specific
hypotheses to be verified. As regards historical research, sources like historical
documents, remains, etc. are utilized to study past events or ideas. It also
includes philosophy of persons and groups of the past or any remote point of
time.
Research has also been classified into decision-oriented and conclusion-
oriented categories. The Decision-oriented research is always carried out as per
the need of a decision maker and hence, the researcher has no freedom to
conduct the research according to his/her own desires. On the other hand, in the
case of Conclusion-oriented research, the researcher is free to choose the
problem, redesign the enquiry as it progresses and even change
conceptualization as he/she wishes to. Further, Operations research is a kind of
decision-oriented research, because it is a scientific method of providing
executive departments with a quantitative basis for decision-making with respect to the
activities under their purview.

1.1.5 Importance of Knowing How to Conduct Research:


The importance of knowing how to conduct research is listed below:
(i) the knowledge of research methodology provides training to new
researchers and enables them to do research properly. It helps them to
develop disciplined thinking or a ‘bent of mind’ to objectively observe
the field;
(ii) the knowledge of doing research inculcates the ability to evaluate and
utilise the research findings with confidence;
(iii) the knowledge of research methodology equips the researcher with the
tools that help him/her to make the observations objectively; and
(iv) the knowledge of methodology helps the research consumer to evaluate
research and make rational decisions.

1.1.6 Qualities of a Researcher:


It is important for a researcher to possess certain qualities to conduct
research. First and foremost, being a scientist, he/she should be firmly committed to
the ‘articles of faith’ of the scientific methods of research. This implies that a
researcher should be a social science person in the truest sense. Sir Michael
Foster (Wilkinson and Bhandarkar, 1979) identified a few distinct qualities of a
scientist. According to him, a true research scientist should possess the
following qualities:
(1) First of all, the nature of a researcher must be of the temperament that
vibrates in unison with the theme which he is searching. Hence, the seeker of
knowledge must be truthful with truthfulness of nature, which is much more
important, much more exacting than what is sometimes known as truthfulness.
The truthfulness relates to the desire for accuracy of observation and precision
of statement. Ensuring facts is the principal rule of science, which is not an easy
matter. The difficulty may arise due to an untrained eye, which fails to see
anything beyond what it has the power of seeing and sometimes even less than
that. This may also be due to the lack of discipline in the method of science. An
unscientific individual often remains satisfied with expressions like
‘approximately’, ‘almost’, or ‘nearly’, which is never the way of nature;
nature never treats two things which differ, however minutely, as the same.

(2) A researcher must possess an alert mind. Nature is constantly


changing and revealing itself through various ways. A scientific researcher must
be keen and watchful to notice such changes, no matter how small or
insignificant they may appear. Such receptivity has to be cultivated slowly and
patiently over time by the researcher through practice. An individual who is
ignorant or not alert and receptive during his research will not make a good
researcher. He will fail as a good researcher if he has no keen eyes or mind to
observe the unusual behind the routine. Research demands a systematic
immersion into the subject matter for the researcher to be able to grasp even the
slightest hint that may culminate into significant research problems. In this
context, Cohen and Nagel (Selltiz et al., 1965; Wilkinson and Bhandarkar, 1979)
state that “the ability to perceive in some brute experience the occasion of a
problem is not a common talent among men… It is a mark of scientific genius to
be sensitive to difficulties where less gifted people pass by untroubled by
doubt”.

(3) Scientific enquiry is pre-eminently an intellectual effort. It requires


the moral quality of courage, which reflects the courage of a steadfast
endurance. The science of conducting research is not an easy task. There are
occasions when a research scientist might feel defeated or completely lost. This
is the stage when a researcher would need immense courage and the sense of
conviction. The researcher must learn the art of enduring intellectual hardships.
In the words of Darwin, “It’s dogged that does it”.
In order to cultivate the afore-mentioned three qualities of a researcher, a
fourth one may be added. This is the quality of making statements cautiously.
According to Huxley, the assertion that outstrips the evidence is not only a
blunder but a crime (Thompson, 1975). A researcher should cultivate the habit
of reserving judgment when the required data are insufficient.

1.1.7 Significance of Research:


According to Hudson Maxim, “All progress is born of inquiry.
Doubt is often better than overconfidence, for it leads to inquiry, and inquiry
leads to invention”. This brings out the significance of research, more of
which makes progress possible. Research encourages scientific and
inductive thinking, besides promoting the development of logical habits of
thinking and organisation. The role of research in applied economics in the
context of an economy or business is greatly increasing in modern times. The
increasingly complex nature of government and business has raised the use of
research in solving operational problems. Research assumes significant role in
the formulation of economic policy for both, the government and business. It
provides the basis for almost all government policies of an economic system.
Government budget formulation, for example, depends particularly on the
analysis of needs and desires of people, and the availability of revenues, which
requires research. Research helps to formulate alternative policies, in addition
to examining the consequences of these alternatives. Thus, research also
facilitates the decision-making of policy-makers, although decision-making itself is not a part
of research. In the process, research also helps in the proper allocation of a
country’s scarce resources.
Research is also necessary for collecting information on the social and
economic structure of an economy to understand the process of change
occurring in the country. Collection of statistical information, though not a
routine task, involves various research problems. Therefore, a large staff of
research technicians and experts is engaged by the government these days to
undertake this work. Thus, research as a tool of government economic policy
formulation involves three distinct stages of operation: (i) investigation of
economic structure through continual compilation of facts; (ii) diagnosis of
events that are taking place and analysis of the forces underlying them; and (iii)
the prognosis i.e., the prediction of future developments (Wilkinson and
Bhandarkar, 1979).

Research also assumes a significant role in solving various operational


and planning problems associated with business and industry. In several ways,
operations research, market research and motivational research are vital and
their results assist in taking business decisions. Market research refers to the
investigation of the structure and development of a market for the formulation of
efficient policies relating to purchases, production and sales. Operational
research relates to the application of logical, mathematical, and analytical
techniques to find solution to business problems, such as cost minimization or
profit maximization, or the optimization problems. Motivational research helps
to determine why people behave in the manner they do with respect to market
characteristics. More specifically, it is concerned with the analysis of the
motivations underlying consumer behaviour. All these researches are very
useful for business and industry, and are responsible for business decision-
making.

Research is equally important to social scientists for analyzing the social


relationships and seeking explanations to various social problems. It gives
intellectual satisfaction of knowing things for the sake of knowledge. It also
possesses the practical utility for the social scientist to gain knowledge so as to
be able to do something better or in a more efficient manner. The research in
social sciences is concerned with both knowledge for its own sake, and
knowledge for what it can contribute to solve practical problems.

1.2 Research Process:


Research process consists of a series of steps or actions required for
effectively conducting research. The following are the steps that provide useful
procedural guidelines regarding the conduct of research:
(1) formulating the research problem;
(2) extensive literature survey;
(3) developing hypothesis;
(4) preparing the research design;
(5) determining sample design;
(6) collecting data;
(7) execution of the project;
(8) analysis of data;
(9) hypothesis testing;
(10) generalization and interpretation, and
(11) preparation of the report or presentation of the results. In other
words, it involves the formal write-up of conclusions.

1.3 Research Problem:


The first and foremost stage in the research process is to select and
properly define the research problem. A researcher should first identify a
problem and formulate it, so as to make it amenable or susceptible to research.
In general, a research problem refers to an unanswered question that a researcher
might encounter in the context of either a theoretical or practical situation,
which he/she would like to answer or find a solution to. A research problem is
generally said to exist if the following conditions emerge (Kothari, 1988):

(i) there should be an individual or an organisation, say X, to whom the


problem can be attributed. The individual or the organization is situated
in an environment Y, which is governed by certain uncontrolled variables
Z;
(ii) there should be at least two courses of action to be pursued, say A1 and
A2. These courses of action are defined by one or more values of the
controlled variables. For example, the number of items purchased at a
specified time is said to be one course of action.
(iii) there should be at least two alternative possible outcomes of the said
courses of action, say B1 and B2. Of them, one alternative should be
preferable to the other. That is, at least one outcome should be what the
researcher wants, which becomes an objective.
(iv) the courses of possible action available must offer a chance to the
researcher to achieve the objective, but not the equal chance. Therefore,
if P(Bj | X, Aj, Y) represents the probability of the occurrence of an
outcome Bj when X selects Aj in Y, then P(B1 | X, A1, Y) ≠ P(B1 | X, A2, Y).
Putting it simply, this means that the choices must not have
equal efficiencies for the desired outcome.
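The unequal-efficiency condition in (iv) can be illustrated with a toy Monte Carlo sketch. The success rates below are purely assumed for the illustration; nothing here comes from the notes themselves:

```python
# Toy illustration of condition (iv): the two courses of action A1 and A2
# must not be equally efficient in producing the desired outcome B1.
import random

random.seed(42)  # fixed seed so the simulation is repeatable

def outcome_prob(p_success, trials=10_000):
    """Estimate P(B1 | course of action) by simulating Bernoulli trials."""
    return sum(random.random() < p_success for _ in range(trials)) / trials

# Assumed efficiencies: A1 yields B1 about 80% of the time, A2 about 50%.
p_a1 = outcome_prob(0.80)
p_a2 = outcome_prob(0.50)

# Since the estimated probabilities differ, X faces a genuine choice problem.
assert p_a1 != p_a2
print(p_a1, p_a2)
```

If the two courses of action were equally efficient, the choice between them would be a matter of indifference and no research problem would exist.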
Over and above these conditions, the individual or organisation may be said to have
arrived at a research problem only if X does not know which course of action
is the best. In other words, X should have a doubt about the solution.
Thus, an individual or a group of persons can be said to have a problem if they
have more than one desired outcome. They should have two or more alternative
courses of action, which have some but not equal efficiency. This is required for
probing the desired objectives, such that they have doubts about the best course
of action to be taken. Thus, the components of a research problem may be
summarised as:
(i) there should be an individual or a group who have some difficulty or
problem.
(ii) there should be some objective(s) to be pursued. A person or an
organization who wants nothing cannot have a problem.
(iii) there should be alternative ways of pursuing the objective the researcher
wants to pursue. This implies that there should be more than one
alternative means available to the researcher. This is because if the
researcher has no choice of alternative means, he/she would not have a
problem.
(iv) there should be some doubt in the mind of the researcher about the
choice of alternative means. This implies that research should answer
the question relating to the relative efficiency or suitability of the
possible alternatives.
(v) there should be a context to which the difficulty relates.
Thus, identification of a research problem is the pre-condition to conducting
research. A research problem is said to be the one which requires a researcher to
find the best available solution to the given problem. That is, the researcher
needs to find out the best course of action through which the research objective
may be achieved optimally in the context of a given situation. Several factors
may contribute to making the problem complicated. For example, the
environment may alter, thus affecting the efficiencies of the alternative courses
of action taken or the quality of the outcomes. The number of alternative courses
of action might be very large, and individuals not involved in making the
decision may be affected by it and react to it
favourably or unfavourably. Other similar factors are also likely to cause such
changes in the context of research, all of which may be considered from the
point of view of a research problem.

1.4 Research Design:


The most important step after defining the research problem is preparing the
design of the research project, which is popularly known as the ‘research
design’. A research design helps to decide upon issues like what, when, where,
how much, by what means etc. with regard to an enquiry or a research study.
A research design is the arrangement of conditions for collection and analysis of
data in a manner that aims to combine relevance to the research purpose with
economy in procedure. In fact, research design is the conceptual structure within
which research is conducted; it constitutes the blueprint for the collection,
measurement and analysis of data (Selltiz et al, 1962). Thus, research design
provides an outline of what the researcher is going to do in terms of framing the
hypothesis, its operational implications and the final data analysis. Specifically,
the research design highlights decisions which include:
(i) the nature of the study
(ii) the purpose of the study
(iii) the location where the study would be conducted
(iv) the nature of data required
(v) from where the required data can be collected
(vi) what time period the study would cover
(vii) the type of sample design that would be used
(viii) the techniques of data collection that would be used
(ix) the methods of data analysis that would be adopted and
(x) the manner in which the report would be prepared
In view of the stated research design decisions, the overall research
design may be divided into the following (Kothari 1988):

(a) the sampling design that deals with the method of selecting items to be
observed for the selected study;

(b) the observational design that relates to the conditions under which the
observations are to be made;

(c) the statistical design that concerns with the question of how many items are
to be observed, and how the information and data gathered are to be
analysed; and

(d) the operational design that deals with the techniques by which the
procedures specified in the sampling, statistical and observational designs
can be carried out.

1.4.1 Features of Research Design:


The important features of research design may be outlined as follows:
(i) it constitutes a plan that identifies the types and sources of information
required for the research problem;

(ii) it constitutes a strategy that specifies the methods of data collection and
analysis which would be adopted; and

(iii) it also specifies the time period of research and monetary budget involved
in conducting the study, which comprise the two major constraints of
undertaking any research.
1.4.2 Concepts Relating to Research Design:
Some of the important concepts relating to Research Design are
discussed below:
1. Dependent and Independent Variables:
A magnitude that varies is known as a variable. A concept that can
assume different quantitative values, such as height, weight or income, is thus a variable. Qualitative
variables are not quantifiable in the strictest sense of the term. However, the
qualitative phenomena may also be quantified in terms of the presence or
absence of the attribute(s) considered. Variables that can assume any
value, including fractional ones, are known as ‘continuous
variables’. But all variables need not be continuous. Variables that can
assume only integer values are called ‘non-continuous variables’. In
statistical terms, they are also known as ‘discrete variables’. For example, age
is a continuous variable, whereas the number of children is a non-continuous
variable. When changes in one variable depend upon the changes in other
variable or variables, it is known as a dependent or endogenous variable, and the
variables that cause the changes in the dependent variable are known as the
independent or explanatory or exogenous variables. For example, if demand
depends upon price, then demand is a dependent variable, while price is the
independent variable. And, if more variables determine demand, like income
and price of the substitute commodity, then demand also depends upon them in
addition to the price of original commodity. In other words, demand is a
dependent variable which is determined by the independent variables like price
of the original commodity, income and price of substitutes.
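The demand example above can be made concrete with a hypothetical linear demand function; the coefficients below are illustrative assumptions, not estimates from any data:

```python
# Hypothetical demand function: demand (dependent variable) is determined
# by price, income and the substitute's price (independent variables).
# All coefficients are made up for illustration.
def demand(price, income, substitute_price):
    """Quantity demanded falls with own price, and rises with income
    and with the price of the substitute commodity."""
    return 100 - 4.0 * price + 0.02 * income + 2.0 * substitute_price

q = demand(price=10, income=1000, substitute_price=5)
print(q)  # 100 - 40 + 20 + 10 = 90.0
```

Changing any one independent variable while holding the others fixed changes the dependent variable, which is exactly the relationship the definitions above describe.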
2. Extraneous Variable:
The independent variables which are not directly related to the purpose
of the study but affect the dependent variable are known as extraneous variables.
For instance, assume that a researcher wants to test the hypothesis that there is a
relationship between children’s school performance and their self-concepts, in
which case the latter is an independent variable and the former, a dependent
variable. In this context, intelligence may also influence the school
performance. However, since it is not directly related to the purpose of the study
undertaken by the researcher, it would be known as an extraneous variable. The
influence caused by the extraneous variable(s) on the dependent variable is
technically called the ‘experimental error’. Therefore, a research study should
always be framed in such a manner that the influence of extraneous variables on
the dependent variable/s is completely controlled, and the influence of
independent variable/s is clearly evident.

3. Control:
One of the most important features of a good research design is to
minimize the effect of extraneous variable(s). Technically, the term ‘control’ is
used when a researcher designs the study in such a manner that it minimizes the
effects of extraneous variables. The term ‘control’ is used in experimental
research to reflect the restraint over experimental conditions.

4. Confounded Relationship:
The relationship between the dependent and independent variables is
said to be confounded by an extraneous variable, when the dependent variable is
not free from its effects.
5. Research Hypothesis:
When a prediction or a hypothesized relationship is tested by adopting
scientific methods, it is known as research hypothesis. The research hypothesis
is a predictive statement which relates to a dependent variable and an
independent variable. Generally, a research hypothesis must consist of at least
one dependent variable and one independent variable. In contrast,
relationships that are assumed but not tested are predictive statements that
are not objectively verified, and thus are not classified as research hypotheses.

6. Experimental and Non-experimental Hypothesis Testing Research:


When the objective of a research is to test a research hypothesis, it is known as
hypothesis-testing research. Such research may be in the nature of experimental
design or non-experimental design. The research in which the independent
variable is manipulated is known as ‘experimental hypothesis-testing research’,
whereas the research in which the independent variable is not manipulated is
termed as ‘non-experimental hypothesis-testing research’. For example, assume
that a researcher wants to examine whether family income influences the school
attendance of a group of students, by calculating the coefficient of correlation
between the two variables. Such an example is known as a non-experimental
hypothesis-testing research, because the independent variable, family income, is
not manipulated here. Again assume that the researcher randomly selects 150
students from a group of students who pay their school fees regularly and then
classifies them into two sub-groups by randomly including 75 in Group A,
whose parents have regular earnings, and 75 in Group B, whose parents do not
have regular earnings. Assume that at the end of the study, the researcher
conducts a test on each group in order to examine the effects of regular earnings
of the parents on the school attendance of the student. Such a study is an
example of experimental hypothesis-testing research, because in this particular
study the independent variable regular earnings of the parents have been
manipulated.
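The non-experimental case above can be sketched in code. The figures below are invented for illustration, and `pearson_r` is a hypothetical helper, not part of any library; the point is only that computing a coefficient of correlation involves no manipulation of the independent variable (family income).

```python
# Hypothetical illustration of non-experimental hypothesis-testing
# research: correlating family income with school attendance.
import statistics

# Invented sample data: monthly family income (in thousands) and
# days attended out of a 200-day school year.
income     = [12, 18, 25, 31, 40, 47, 55, 62, 70, 85]
attendance = [150, 158, 165, 170, 172, 180, 185, 188, 190, 194]

def pearson_r(x, y):
    """Pearson coefficient of correlation between two samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(income, attendance)
print(f"coefficient of correlation r = {r:.3f}")
```

A value of r close to +1 would suggest a strong positive association between the two variables, but, the variable not being manipulated, it would not by itself establish causation.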

7. Experimental and Control Groups:


When a group is exposed to usual conditions in an experimental
hypothesis-testing research, it is known as ‘control group’. On the other hand,
when the group is exposed to certain new or special condition, it is known as an
‘experimental group’. In the afore-mentioned example, Group A can be called
the control group and Group B the experimental group. If both the groups, A and
B are exposed to some special feature, then both the groups may be called as
‘experimental groups’. A research design may include only the experimental
group or both the experimental and control groups together.

8. Treatments:
Treatments refer to the different conditions to which the experimental
and control groups are subjected. In the example considered, the two treatments
are the parents with regular earnings and those with no regular earnings.
Likewise, if a research study attempts to examine through an experiment the
comparative effect of three different types of fertilizers on the yield of rice crop,
then the three types of fertilizers would be treated as the three treatments.

9. Experiment:
Experiment refers to the process of verifying the truth of a statistical
hypothesis relating to a given research problem. For instance, an experiment
may be conducted to examine the yield of a certain new variety of rice crop
developed. Further, Experiments may be categorized into two types, namely,
‘absolute experiment’ and ‘comparative experiment’. If a researcher wishes to
determine the impact of a chemical fertilizer on the yield of a particular variety
of rice crop, then it is known as absolute experiment. Meanwhile, if the
researcher wishes to determine the impact of chemical fertilizer as compared to
the impact of bio-fertilizer, then the experiment is known as a comparative
experiment.
10. Experimental Unit(s):
Experimental Units refer to the pre-determined plots, characteristics or
the blocks, to which different treatments are applied. It is worth mentioning
here that such experimental units must be selected with great caution.

1.4.3 Types of Research Design:


There are different types of research designs. They may be broadly categorized
as:
(1) Exploratory Research Design;
(2) Descriptive and Diagnostic Research Design; and
(3) Hypothesis-Testing Research Design.

1. Exploratory Research Design:


The Exploratory Research Design is known as formulative research design.
The main objective of using such a research design is to formulate a research
problem for an in-depth or more precise investigation, or for developing a
working hypothesis from an operational aspect. The major purpose of such
studies is the discovery of ideas and insights. Therefore, the research design
for such a study should be flexible enough to provide opportunity for
considering different dimensions of the problem under study. The in-built
flexibility in research design is required as the initial research problem would be
transformed into a more precise one in the exploratory study, which in turn may
necessitate changes in the research procedure for collecting relevant data.
Usually, the following three methods are considered in the context of a research
design for such studies. They are (a) a survey of related literature; (b)
experience survey; and (c) analysis of ‘insight-stimulating’ instances.

2. Descriptive and Diagnostic Research Design:


A Descriptive Research Design is concerned with describing the
characteristics of a particular individual or a group. Meanwhile, a diagnostic
research design determines the frequency with which a variable occurs or its
relationship with another variable. In other words, the study analyzing whether
a certain variable is associated with another comprises a diagnostic research
study. On the other hand, studies that are concerned with specific predictions or
with the narration of facts and characteristics related to an individual, group or
situation are instances of descriptive research studies. Generally, most
social research designs fall under this category. As a research design, both the
descriptive and diagnostic studies share common requirements, hence they are
grouped together. However, the procedure to be used and the research design
must be planned carefully. The research design must also make appropriate
provision for protection against bias and thus maximize reliability, with due
regard to the completion of the research study in an economical manner. The
research design in such studies should be rigid and not flexible. Besides, it must
also focus attention on the following:

(a) formulation of the objectives of the study,


(b) proper designing of the methods of data collection,
(c) sample selection,
(d) data collection,
(e) processing and analysis of the collected data, and
(f) reporting the findings.
3. Hypothesis-testing Research Design:
Hypothesis-testing Research Designs are those in which the researcher tests
the hypothesis of causal relationship between two or more variables. These
studies require procedures that would not only decrease bias and enhance
reliability, but also facilitate deriving inferences about the causality. Generally,
experiments satisfy such requirements. Hence, when research design is
discussed in such studies, it often refers to the design of experiments.

1.4.4 Importance of Research Design:


The need for a research design arises out of the fact that it facilitates the
smooth conduct of the various stages of research. It contributes to making
research as efficient as possible, thus yielding the maximum information with
minimum effort, time and expenditure. A research design helps to plan in
advance, the methods to be employed for collecting the relevant data and the
techniques to be adopted for their analysis. This would help in pursuing the
objectives of the research in the best possible manner, given the available
staff, time and money. Hence, the research design should be prepared
with utmost care, so as to avoid any error that may disturb the entire project.
Thus, research design plays a crucial role in attaining the reliability of the results
obtained, which forms the strong foundation of the entire process of the research
work.
Despite its significance, the purpose of a well-planned design is not
realized at times. This is because it is not given the importance that it deserves.
As a consequence, many researchers are not able to achieve the purpose for
which the research designs are formulated, due to which they end up arriving at
misleading conclusions. Therefore, faulty designing of the research project
tends to render the research exercise meaningless. This makes it imperative that
an efficient and suitable research design must be planned before commencing
the process of research. The research design helps the researcher to organize
his/her ideas in a proper form, which in turn facilitates him/her to identify the
inadequacies and faults in them. The research design is also discussed with
other experts for their comments and critical evaluation, without which it would
be difficult for any critic to provide a comprehensive review and comments on
the proposed study.

1.4.5 Characteristics of a Good Research Design:


A good research design often possesses the qualities of being flexible,
suitable, efficient, economical and so on. Generally, a research design which
minimizes bias and maximizes the reliability of the data collected and analysed
is considered a good design (Kothari, 1988). A research design which yields
the smallest experimental error is said to be the best design for
investigation. Further, a research design that yields maximum information and
provides an opportunity of viewing the various dimensions of a research
problem is considered to be the most appropriate and efficient design. Thus, the
question of a good design relates to the purpose or objective and nature of the
research problem studied. While a research design may be good, it may not be
equally suitable to all studies. In other words, it may be lacking in one aspect or
the other in the case of some other research problems. Therefore, no single
research design can be applied to all types of research problems.

A research design suitable for a specific research problem would usually


involve the following considerations:
(i) the methods of gathering the information;
(ii) the skills and availability of the researcher and his/her staff, if any;
(iii) the objectives of the research problem being studied;
(iv) the nature of the research problem being studied; and
(v) the available monetary support and duration of time for the research
work.

1.5 Case Study Research:


The method of exploring and analyzing the life or functioning of a social
or economic unit, such as a person, a family, a community, an institution, a firm
or an industry is called case study method. The objective of case study method
is to examine the factors that cause the behavioural patterns of a given unit and
its relationship with the environment. The data for a study are always gathered
with the purpose of tracing the natural history of a social or economic unit, and
its relationship with the social or economic factors, besides the forces involved
in its environment. Thus, a researcher conducting a study using the case study
method attempts to understand the complexity of factors that are operative
within a social or economic unit as an integrated totality. Burgess (Kothari,
1988) described the special significance of the case study in understanding the
complex behaviour and situations in specific detail. In the context of social
research, he called such data the ‘social microscope’.

1.5.1 Criteria for Evaluating Adequacy of Case Study:


John Dollard (Dollard, 1935) specified seven criteria for evaluating the
adequacy of a case or life history in the context of social research. They are:
(i) The subject being studied must be viewed as a specimen in a cultural set
up. That is, the case selected from its total context for the purpose of study
should be considered a member of the particular cultural group or community.
The scrutiny of the life history of the individual must be carried out with a view
to identify the community values, standards and shared ways of life.
(ii) The organic motors of action should be socially relevant. This is to say
that the action of the individual cases should be viewed as a series of
reactions to social stimuli or situations. To put it in simple words, the social
meaning of behaviour should be taken into consideration.

(iii) The crucial role of the family-group in transmitting the culture should be
recognized. This means, as an individual is the member of a family, the
role of the family in shaping his/her behaviour should never be ignored.

(iv) The specific method of conversion of organic material into social


behaviour should be clearly demonstrated. For instance, case-histories that
discuss in detail how a basically biological organism, that is, man,
gradually transforms into a social person are particularly important.

(v) The constant transformation of character of experience from childhood to


adulthood should be emphasized. That is, the life-history should portray
the inter-relationship between the individual’s various experiences during
his/her life span. Such a study provides a comprehensive understanding of
an individual’s life as a continuum.

(vi) The ‘social situation’ that contributed to the individual’s gradual


transformation should carefully and continuously be specified as a factor.
One of the crucial criteria for life-history is that an individual’s life should
be depicted as evolving itself in the context of a specific social situation
and partially caused by it.
(vii) The life-history details themselves should be organized according to some
conceptual framework, which in turn would facilitate their generalizations
at higher levels.

These criteria discussed by Dollard emphasize the specific link of co-


ordinated, related, continuous and configured experience in a cultural pattern
that motivates social and personal behaviour. Although the criteria
indicated by Dollard are perfect in principle, some of them are difficult to put
into practice.
Dollard (1935) attempted to express the diverse events depicted in the
life-histories of persons during the course of repeated interviews by utilizing
psycho-analytical techniques in a given situational context. His criteria of life-
history originated directly from this experience. While the life-histories possess
independent significance as research documents, the interviews recorded by the
investigators can afford, as Dollard observed, “rich insights into the nature of the
social situations experienced by them”.
It is a well-known fact that an individual’s life is very complex. Till date
there is hardly any technique that can establish some kind of uniformity, and as
a result ensure the cumulation of case-history materials, by isolating the complex
totality of a human life. Nevertheless, although case history data are difficult to
put to rigorous analysis, a skilful handling and interpretation of such data could
help in developing insights into cultural conflicts and problems arising out of
cultural-change.
Gordon Allport (Kothari 1988) has recommended the following aspects
so as to broaden the perspective of case-study data:
(i) If the life-history is written in the first person, it should be as comprehensive
and coherent as possible.
(ii) Life-histories must be written for knowledgeable persons. That is, if the
enquiry of study is sociological in nature, the researcher should write it on
the assumption that it would be read largely by sociologists only.
(iii) It would be advisable to supplement case study data by observational,
statistical and historical data, as they provide standards for assessing the
reliability and consistency of the case study materials. Further, such data
offer a basis for generalizations.
(iv) Efforts must be made to verify the reliability of life-history data by
examining the internal consistency of the collected material, and by
repeating the interviews with the concerned person. Besides this, personal
interviews with the persons who are well-acquainted with him/her,
belonging to his/her own group should be conducted.
(v) A judicious combination of different techniques for data-collection is
crucial for collecting data that are culturally meaningful and scientifically
significant.
(vi) Life-histories or case-histories may be considered as an adequate basis for
generalization to the extent that they are typical or representative of a
certain group.
(vii) The researcher engaged in the collection of case study data should never
ignore the unique or typical cases. He/she should include them as
exceptional cases.
Case histories are filled with valuable information of a personal or
private nature. Such information not only helps the researcher to portray the
personality of the individual, but also the social background that contributed to
it. Besides, it also helps in the formulation of relevant hypotheses. In general,
although Blummer (in Wilkinson and Bhandarkar, 1979) was critical of
documentary material, he gave due credit to case histories by acknowledging the
fact that the personal documents offer an opportunity to the researcher to
develop his/her spirit of enquiry. The analysis of a particular subject would be
more effective if the researcher acquires close acquaintance with it through
personal documents. However, Blummer also acknowledges the limitations of
the personal documents. According to him, such documents do not entirely
fulfill the criteria of adequacy, reliability, and representativeness. Despite these
shortcomings, avoiding their use in any scientific study of personal life would be
wrong, as these documents become necessary and significant for both theory-
building and practice.
In spite of these formidable limitations, case study data are used by
anthropologists, sociologists, economists and industrial psychiatrists. Gordon
Allport (Kothari, 1988) strongly recommends the use of case study data for in-
depth analysis of a subject. For it is one’s acquaintance with an individual that
instills a desire to know and understand his/her nature. The first stage
involves understanding the individual and all the complexity of his/her nature.
Any haste in analyzing and classifying the individual would create the risk of
reducing his/her emotional world into artificial bits. As a consequence, the
important emotional organizations, anchorages and natural identifications
characterizing the personal life of the individual might not yield adequate
representation. Hence, the researcher should understand the life of the subject.
Therefore, the totality of life-processes reflected in the well-ordered life-history
documents become invaluable source of stimulating insights. Such life-history
documents provide the basis for comparisons that contribute to statistical
generalizations and help to draw inferences regarding the uniformities in human
behaviour, which are of great value. Even if some personal documents do not
provide ordered data about personal lives of people, which is the basis of
psychological science, they should not be ignored. This is because the final aim
of science is to understand, control and make predictions about human life. Once
these aims are accepted, the theoretical and practical importance of personal
documents must be recognized as significant. Thus, a case study may be
considered as the beginning and the final destination of abstract knowledge.
1.6 Hypothesis:
“Hypothesis may be defined as a proposition or a set of propositions set
forth as an explanation for the occurrence of some specified group of
phenomena, either asserted merely as a provisional conjecture to guide some
investigation or accepted as highly probable in the light of established facts” (Kothari, 1988). A research
hypothesis is quite often a predictive statement, which is capable of being tested
using scientific methods that involve an independent and some dependent
variables. For instance, the following statements may be considered:
i) “students who take tuitions perform better than the others who do not receive
tuitions” or,
ii) “the female students perform as well as the male students”.
These two statements are hypotheses that can be objectively verified and tested.
Thus, they indicate that a hypothesis states what one is looking for. Besides, it is
a proposition that can be put to test in order to examine its validity.

1.6.1 Characteristics of Hypothesis:

A hypothesis should have the following characteristic features:-


(i) A hypothesis must be precise and clear. If it is not precise and clear, then
the inferences drawn on its basis would not be reliable.

(ii) A hypothesis must be capable of being put to test. Quite often,
research programmes fail owing to their incapability of being subjected to
testing for validity. Therefore, some prior study may be conducted by the
researcher in order to make a hypothesis testable. A hypothesis “is tested
if other deductions can be made from it, which in turn can be confirmed or
disproved by observation” (Kothari, 1988).
(iii) A hypothesis must state relationship between two variables, in the case of
relational hypotheses.
(iv) A hypothesis must be specific and limited in scope. This is because a
simpler hypothesis generally would be easier to test for the researcher.
And therefore, he/she must formulate such hypotheses.
(v) As far as possible, a hypothesis must be stated in the simplest language, so
as to make it understood by all concerned. However, it should be noted that
simplicity of a hypothesis is not related to its significance.
(vi) A hypothesis must be consistent with most known facts.
In other words, it should be consistent with a substantial body of
established facts. That is, it must be in the form of a statement which
judges accept as being the most likely to occur.
(vii) A hypothesis must be amenable to testing within a stipulated or reasonable
period of time. No matter how excellent a hypothesis, a researcher should
not use it if it cannot be tested within a given period of time, as no one can
afford to spend a life-time on collecting data to test it.
(viii) A hypothesis should state the facts that give rise to the necessity of looking
for an explanation. This is to say that by using the hypothesis, and other
known and accepted generalizations, a researcher must be able to derive
the original problem condition. Therefore, a hypothesis should explain
what it actually wants to explain, and for this it should also have an
empirical reference.
1.6.2 Concepts Relating to Testing of Hypotheses:
Testing of hypotheses requires a researcher to be familiar with various
concepts concerned with it such as:

1) Null Hypothesis and Alternative Hypothesis:


In the context of statistical analysis, hypothesis is of two types viz., null
hypothesis and alternative hypothesis. When two methods A and B are
compared on their relative superiority, and it is assumed that both the methods
are equally good, then such a statement is called as the null hypothesis. On the
other hand, if method A is considered relatively superior to method B, or vice-
versa, then such a statement is known as an alternative hypothesis. The null
hypothesis is expressed as H0, while the alternative hypothesis is expressed as
Ha. For example, if a researcher wants to test the hypothesis that the population
mean (µ) is equal to the hypothesized mean (µH0) = 100, then the null hypothesis
should be stated as: the population mean is equal to the hypothesized mean 100.
Symbolically, it may be written as:
H0: µ = µH0 = 100
If sample results do not support this null hypothesis, then it should be
concluded that something else is true. The conclusion of rejecting the null
hypothesis is called as alternative hypothesis. To put it in simple words, the set
of alternatives to the null hypothesis is termed the alternative hypothesis. If H0
is accepted, then it implies that Ha is being rejected. On the other hand, if H0 is
rejected, it means that Ha is being accepted. For H0: µ = µH0 = 100, the
following three possible alternative hypotheses may be considered:
(a) Ha: µ ≠ µH0, i.e., the population mean is not equal to 100 (it could be
greater than or less than 100);
(b) Ha: µ > µH0, i.e., the population mean is greater than 100; and
(c) Ha: µ < µH0, i.e., the population mean is less than 100.

Before the sample is drawn, the researcher has to state the null
hypothesis and the alternative hypothesis. While formulating the null
hypothesis, the following aspects need to be considered:
(a) Alternative hypothesis is usually the one which a researcher wishes to prove,
whereas the null hypothesis is the one which he/she wishes to disprove. Thus, a
null hypothesis is usually the one which a researcher tries to reject, while an
alternative hypothesis is the one that represents all other possibilities.
(b) If the rejection of a hypothesis when it is actually true involves great risk, it
is taken as the null hypothesis, because then the probability of rejecting it when it
is true is α (i.e., the level of significance), which is chosen to be very small.
(c) The null hypothesis should always be a specific hypothesis, i.e., it should not
state an approximate value.
(2) The Level of Significance:
In the context of hypothesis testing, the level of significance is a very
important concept. It is a certain percentage that should be chosen with great
care, reason and thought. If for instance, the significance level is taken at 5 per
cent, then it means that H0 would be rejected when the sampling result has a less
than 0.05 probability of occurrence when H0 is true. In other words, the five per
cent level of significance implies that the researcher is willing to take a risk of
five per cent of rejecting the null hypothesis, when (H0) is actually true. In sum,
the significance level reflects the maximum value of the probability of rejecting
H0 when it is actually true, and which is usually determined prior to testing the
hypothesis.
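The decision rule implied by the level of significance can be sketched in a few lines. The p-values below are invented, and `decide` is a hypothetical helper, not part of any library:

```python
# Minimal sketch of the significance-level decision rule: reject H0
# only when the probability of the observed sampling result under
# H0 (the p-value) is at or below the chosen level alpha.

ALPHA = 0.05  # the 5 per cent level of significance

def decide(p_value, alpha=ALPHA):
    """Hypothetical helper: decision for a given p-value at level alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

print(decide(0.03))   # rarer than 1-in-20 under H0, so reject
print(decide(0.20))   # consistent with H0, so do not reject
```

Choosing α before looking at the data, as the text notes, is what caps the researcher's risk of wrongly rejecting a true H0 at the stated percentage.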
(3) Test of Hypothesis or Decision Rule:
Suppose the given hypothesis is H0 and the alternative hypothesis Ha,
then the researcher has to make a rule known as the decision rule. According to
the decision rule, the researcher accepts or rejects H0. For example, if H0 is
that certain students are good against the Ha that all the students are good, then
the researcher should decide the number of items to be tested and the criteria on
the basis of which to accept or reject the hypothesis.
(4) Type I and Type II Errors:
As regards the testing of hypotheses, a researcher can make basically two
types of errors. He/she may reject H0 when it is true, or accept H0 when it is
not true. The former is called as Type I error and the latter is known as Type II
error. In other words, a Type I error implies the rejection of a hypothesis when it
should have been accepted, while a Type II error implies the acceptance of a
hypothesis which should have been rejected. Type I error is denoted by α (alpha)
and is known as α error, while Type II error is usually denoted by β (beta) and is
known as β error.
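A small simulation can make the meaning of α concrete. Under stated assumptions (a normal population whose true mean really does equal the hypothesized mean, a known σ, and invented parameters), a two-tailed test at the 5 per cent level should commit a Type I error in roughly 5 per cent of repeated samples:

```python
# Hedged simulation of the Type I error rate: H0 is true by
# construction, so every rejection below is a Type I error.
import random
import statistics

random.seed(42)
MU0, SIGMA, N = 100.0, 15.0, 36   # invented population; H0 is true
ALPHA = 0.05
z_crit = statistics.NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed

rejections = 0
trials = 10_000
for _ in range(trials):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (statistics.mean(sample) - MU0) / (SIGMA / N ** 0.5)
    if abs(z) > z_crit:           # reject H0 although it is true
        rejections += 1

print(f"observed Type I error rate = {rejections / trials:.3f}")
```

The observed rate hovers near 0.05 by design: α is exactly the long-run frequency of this error when H0 holds.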
(5) One-tailed and two-tailed Tests:
These two types of tests are very important in the context of hypothesis
testing. A two-tailed test rejects the null hypothesis when the sample mean is
significantly greater or lower than the hypothesized value of the mean of the
population. Such a test is suitable when the null hypothesis is some specified
value and the alternative hypothesis is a value not equal to the specified value
of the null hypothesis. A one-tailed test, by contrast, rejects the null hypothesis
only when the sample mean deviates significantly in one specified direction, and
is suitable when the alternative hypothesis is directional, e.g., that the population
mean is greater than the hypothesized value.
1.6.3 Procedure of Hypothesis Testing:
Testing a hypothesis refers to verifying whether the hypothesis is valid
or not. Hypothesis testing attempts to check whether to accept or not to accept
the null hypothesis. The procedure of hypothesis testing includes all the steps
that a researcher undertakes for making a choice between the two alternative
actions of rejecting or accepting a null hypothesis. The various steps involved in
hypothesis testing are as follows:
(i) Making a Formal Statement:
This step involves making a formal statement of the null hypothesis (H0)
and the alternative hypothesis (Ha). This implies that the hypotheses should be
clearly stated within the purview of the research problem. For example, suppose
a school teacher wants to test whether the understanding capacity of the students,
rated in terms of marks, exceeds 90 per cent; the hypotheses may be
stated as follows:
Null Hypothesis H0: µ = 90
Alternative Hypothesis Ha: µ > 90

(ii) Selecting a Significance Level:


The hypotheses should be tested on a pre-determined level of
significance, which should be specified. Usually, either 5% level or 1% level is
considered for the purpose. The factors that determine the levels of significance
are: (a) the magnitude of the difference between the sample means; (b) the sample
size; (c) the variability of measurements within samples; and (d) whether the
hypothesis is directional or non-directional (Kothari, 1988). In sum, the level of
significance should be sufficient in the context of the nature and purpose of
enquiry.
(iii) Deciding the Distribution to Use:
After making decision on the level of significance for hypothesis testing,
the researcher has to next determine the appropriate sampling distribution. The
choice to be made generally relates to normal distribution and the t-distribution.
The rules governing the selection of the correct distribution are similar to the
ones already discussed with respect to estimation.

(iv) Selection of a Random Sample and Computing an Appropriate


Value:
Another step involved in hypothesis testing is the selection of a random
sample and then computing a suitable value from the sample data relating to test
statistic by using the appropriate distribution. In other words, it involves
drawing a sample for furnishing empirical data.

(v) Calculation of the Probability:


The next step for the researcher is to calculate the probability that the
sample result would diverge as widely as it has from expectations, under the
assumption that the null hypothesis is actually true.

(vi) Comparing the Probability:


Another step involved consists of making a comparison of the probability
calculated with the specified value for α, the significance level. If the calculated
probability works out to be equal to or smaller than the α value in case of one-
tailed test, then the null hypothesis is to be rejected. On the other hand, if the
calculated probability is greater, then the null hypothesis is to be accepted. In
case the null hypothesis H0 is rejected, the researcher runs the risk of committing
the Type I error. But, if the null hypothesis H0 is accepted, then it
involves some risk (which cannot be specified in size as long as H0 is vague and
not specific) of committing the Type II error.
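The six steps above can be sketched as a one-sample z-test. Everything here (the hypothesized mean, σ, and the sample) is invented for illustration, and the normal distribution is used on the assumption that σ is known:

```python
# Hedged walk-through of the hypothesis-testing procedure as a
# one-sample, one-tailed z-test on invented data.
import statistics

# (i) formal statement: H0: mu = 100 against Ha: mu > 100
MU0, SIGMA = 100.0, 12.0
# (ii) significance level, chosen before seeing the data
ALPHA = 0.05
# (iii) distribution to use: normal, since sigma is assumed known
dist = statistics.NormalDist()
# (iv) a random sample (invented) and its test statistic
sample = [104, 98, 110, 107, 101, 99, 112, 105, 103, 108]
n = len(sample)
z = (statistics.mean(sample) - MU0) / (SIGMA / n ** 0.5)
# (v) probability of a result at least this extreme if H0 is true
p_value = 1 - dist.cdf(z)         # one-tailed, since Ha: mu > 100
# (vi) compare the probability with alpha and decide
decision = "reject H0" if p_value <= ALPHA else "do not reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

With this particular invented sample the p-value exceeds 0.05, so the null hypothesis is not rejected at the 5 per cent level.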

1.7 Sample Survey:


A sample design is a definite plan for obtaining a sample from a given
population (Kothari, 1988). Sample constitutes a certain portion of the
population or universe. Sampling design refers to the technique or the
procedure the researcher adopts for selecting items for the sample from the
population or universe. A sample design helps to decide the number of items to
be included in the sample, i.e., the size of the sample. The sample design should
be determined prior to data collection. There are different kinds of sample
designs which a researcher can choose. Some of them are relatively more
precise and easier to adopt than the others. A researcher should prepare or select
a sample design, which must be reliable and suitable for the research study
proposed to be undertaken.

1.7.1 Steps in Sampling Design:


A researcher should take into consideration the following aspects while
developing a sample design:

(i) Type of universe:


The first step involved in developing a sample design is to clearly define the
set of objects, technically known as the Universe, to be studied. A universe
may be finite or infinite. In a finite universe the number of items is certain,
whereas in the case of an infinite universe the number of items is infinite (i.e.,
there is no idea about the total number of items). For example, while the
population of a city or the number of workers in a factory comprise finite
universes, the number of stars in the sky or the repeated throws of a die represent
infinite universes.

(ii) Sampling Unit:


Prior to selecting a sample, decision has to be made about the sampling unit. A
sampling unit may be a geographical area like a state, district, village, etc., or a
social unit like a family, religious community, school, etc., or it may also be an
individual. At times, the researcher would have to choose one or more of such
units for his/her study.

(iii) Source List:


Source list is also known as the ‘sampling frame’, from which the sample is to
be selected. The source list consists of names of all the items of a universe. The
researcher has to prepare a source list when it is not available. The source list
must be reliable, comprehensive, correct, and appropriate. It is important that
the source list should be as representative of the population as possible.

(iv) Size of Sample:


Size of the sample refers to the number of items to be chosen from the universe
to form a sample. For a researcher, this constitutes a major problem. The size of
sample must be optimum. An optimum sample may be defined as the one that
satisfies the requirements of representativeness, flexibility, efficiency, and
reliability. While deciding the size of sample, a researcher should determine the
desired precision and the acceptable confidence level for the estimate. The size
of the population variance should be considered, because in the case of a larger
variance generally a larger sample is required. The size of the population should
be considered, as it also limits the sample size. The parameters of interest in a
research study should also be considered, while deciding the sample size.
Besides, costs or budgetary constraint also plays a crucial role in deciding the
sample size.
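The interplay of desired precision, confidence level and population variance can be illustrated with the common textbook formula for estimating a mean, n = (z·σ/e)². The Python sketch below is illustrative only; the function name and the assumed values of σ are hypothetical.

```python
import math

def sample_size_for_mean(sigma, margin_of_error, z=1.96):
    """Sample size needed to estimate a population mean.

    sigma           -- assumed population standard deviation
    margin_of_error -- desired precision (half-width of the interval)
    z               -- z-value for the confidence level (1.96 ~ 95%)
    """
    n = (z * sigma / margin_of_error) ** 2
    return math.ceil(n)  # round up to the next whole unit

# A larger variance (sigma) calls for a larger sample, as noted above:
print(sample_size_for_mean(15, 2))  # 217
print(sample_size_for_mean(30, 2))  # 865
```

Doubling the assumed standard deviation roughly quadruples the required sample, which is why the population variance matters so much when fixing the sample size.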

(a) Parameters of Interest:


The specific population parameters of interest should also be considered
while determining the sample design. For example, the researcher may want to
make an estimate of the proportion of persons with certain characteristic in the
population, or may be interested in knowing some average regarding the
population. The population may also consist of important sub-groups about
whom the researcher would like to make estimates. All such factors have strong
impact on the sample design the researcher selects.

(b) Budgetary Constraint:


From the practical point of view, cost considerations exercise a major
influence on the decisions related to not only the sample size, but also on the
type of sample selected. Thus, budgetary constraint could also lead to the
adoption of a non-probability sample design.

(c) Sampling Procedure:


Finally, the researcher should decide the type of sample or the technique
to be adopted for selecting the items for a sample. This technique or procedure
itself may represent the sample design. There are different sample designs from
which a researcher should select one for his/her study. It is clear that the
researcher should select that design which, for a given sample size and budget
constraint, involves a smaller error.

1.7.2 Criteria for Selecting a Sampling Procedure:


Basically, two costs are involved in a sampling analysis, which govern
the selection of a sampling procedure. They are:
(i) the cost of data collection, and
(ii) the cost of drawing incorrect inference from the selected data.
There are two causes of incorrect inferences, namely systematic bias and
sampling error. Systematic bias arises out of errors in the sampling procedure.
It cannot be reduced or eliminated by increasing the sample size. At most,
the causes of these errors can be identified and corrected. Generally, a
systematic bias arises out of one or more of the following factors:
a. inappropriate sampling frame,
b. defective measuring device,
c. non-respondents,
d. indeterminacy principle, and
e. natural bias in the reporting of data.

Sampling error refers to the random variations in the sample estimates around
the true population parameters. Because these variations occur randomly and are
equally likely to be in either direction, they are compensatory in nature, and
their expected value tends to be zero. Sampling error tends to decrease as the
size of the sample increases. It also becomes smaller in magnitude when the
population is homogeneous.

Sampling error can be computed for a given sample size and design. The
measurement of sampling error is known as ‘precision of the sampling plan’.
When the sample size is increased, precision can be improved. However,
increasing the sample size has its own limitations: a large sample not only
increases the cost of data collection, but may also increase the systematic
bias. Thus, an effective way of increasing precision is generally to choose a
better sampling design, one that has a smaller sampling error for a given
sample size at a specified cost. In practice, however, researchers often prefer
a less precise design because it is easier to adopt, and because systematic
bias can be controlled better in such designs.
In sum, while selecting the sample, a researcher should ensure that the
procedure adopted involves a relatively small sampling error and helps to
control systematic bias.
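The claim that sampling error falls as the sample grows can be seen from the standard error of the sample mean, σ/√n. A minimal Python sketch (the function name is my own):

```python
import math

def standard_error(sigma, n):
    # Standard error of the sample mean: sigma / sqrt(n)
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the sampling error:
for n in (25, 100, 400):
    print(n, standard_error(10, n))  # 2.0, 1.0, 0.5
```

Note the diminishing returns: each halving of the error requires four times as many units, which is why precision is often bought more cheaply through a better design than through a bigger sample.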
1.7.3 Characteristics of a Good Sample Design:
The following are the characteristic features of a good sample design:
(a) the sample design should yield a truly representative sample;
(b) the sample design should be such that it results in small sampling error;
(c) the sample design should be viable in the context of budgetary
constraints of the research study;
(d) the sample design should be such that the systematic bias can be
controlled; and
(e) the sample must be such that the results of the sample study would be
applicable, in general, to the universe at a reasonable level of confidence.

1.7.4 Different Types of Sample Designs:


Sample designs may be classified into different categories based on two
factors, namely, the representation basis and the element selection technique.
Under the representation basis, the sample may be classified as:
I. non-probability sampling
II. probability sampling
While probability sampling is based on random selection, non-probability
sampling is based on non-random selection.

I. Non-Probability Sampling:
Non-probability sampling is a sampling procedure that does not afford any basis
for estimating the probability that each item in the population has of being
included in the sample. Non-probability sampling is also known as deliberate
sampling, judgment sampling or purposive sampling. Under this type of sampling,
the items for the sample are deliberately chosen by the researcher, whose
choice of items remains supreme. In other words, under non-probability sampling
the researcher selects particular units of the universe to form a sample, on
the basis that the small number thus selected out of a huge one will be typical
or representative of the whole population. For example, to study the economic
conditions of people living in a state, a few towns or villages may be
purposively selected for intensive study, on the principle that they are
representative of the entire state. In such a case, the judgment of the
researcher assumes prime importance in this sampling design.

Quota Sampling:
Quota sampling is another example of non-probability sampling. Under
this sampling, interviewers are simply given quotas to be filled from different
strata, with certain restrictions imposed on how they are to be selected. This
type of sampling is very convenient and relatively inexpensive. However, the
samples selected by this method certainly do not satisfy the characteristics
of random samples. They are essentially judgment samples, and inferences drawn
from them are not amenable to statistical treatment in a formal way.
II. Probability Sampling:
Probability sampling is also known as ‘chance sampling’ or ‘random sampling’.
Under this sampling design, every item of the universe has an equal chance of
being included in the sample. In a way, it is a lottery method under which
individual units are selected from the whole group, not deliberately, but by
some mechanical process. Therefore, chance alone determines whether a
particular item is included in the sample or not. The results obtained from
probability or random sampling can be assured in terms of probability: the
researcher can measure the errors of estimation or the significance of the
results obtained from the random sample. This is the superiority of random
sampling over deliberate sampling. Random sampling satisfies the law of
Statistical Regularity, according to which if, on an average, the sample chosen
is a random one, it will have the same composition and characteristics as the
universe. This is the reason why random sampling is considered the best
technique for choosing a representative sample.

The following are the implications of random sampling:

(i) it gives each element in the population an equal probability of being
chosen in the sample, with all choices being independent of one another; and
(ii) it gives each possible sample combination an equal probability of being
selected.

1.7.5 Method of Selecting a Random Sample:

The process of selecting a random sample involves writing the name of each
element of a finite population on a slip of paper and putting the slips into a
box or a bag. The slips are then thoroughly mixed, and the required number of
slips for the sample is picked one after the other without replacement. While
doing this, it has to be ensured that in successive drawings each of the
remaining elements of the population has an equal chance of being chosen. This
method gives the same probability to each possible sample.
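The lottery procedure described above is essentially what `random.sample` in Python's standard library performs: drawing without replacement so that each possible sample is equally likely. A minimal sketch, with a hypothetical finite universe of 50 named elements:

```python
import random

# Hypothetical finite universe of 50 named elements
population = [f"element_{i}" for i in range(1, 51)]

random.seed(42)  # fixed seed only so the draw is reproducible
sample = random.sample(population, k=10)  # draw 10 "slips" without replacement

print(len(sample), len(set(sample)))  # 10 distinct elements, no repeats
```

Because the draw is without replacement, no element can appear twice, mirroring the rule that slips once picked are not returned to the box.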
1.7.6 Complex random sampling designs:
Under restricted sampling technique, the probability sampling may result in
complex random sampling designs. Such designs are known as mixed sampling
designs. Many of such designs may represent a combination of non-probability
and probability sampling procedures in choosing a sample.
Some of the prominent complex random sampling designs are as follows:
(i) Systematic sampling: In some cases, the most practical way of sampling is to
select every nth item on a list. Sampling of this kind is called systematic
sampling. An element of randomness is introduced into this type of sampling by
using random numbers to select the unit with which to start. For example, if a
10 per cent sample is required, the first item would be selected randomly from
the first ten items, and thereafter every 10th item would be taken. In this kind
of sampling, only the first unit is selected randomly, while the rest of the
units of the sample are chosen at fixed intervals.
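The procedure can be sketched in a few lines of Python: the starting unit is chosen at random from the first interval, after which units are taken at fixed steps (the function name is my own):

```python
import random

def systematic_sample(frame, interval):
    # Randomly choose the starting unit from the first `interval` items,
    # then take every interval-th item thereafter.
    start = random.randrange(interval)
    return frame[start::interval]

random.seed(1)
frame = list(range(1, 101))          # a source list of 100 units
print(systematic_sample(frame, 10))  # a 10 per cent systematic sample
```

Only the first unit involves chance; every later unit is fully determined by it, which is the defining feature of systematic sampling.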
(ii) Stratified Sampling: When a population from which a sample is to be
selected does not comprise a homogeneous group, stratified sampling technique
is generally employed for obtaining a representative sample. Under stratified
sampling, the population is divided into many sub-populations in such a manner
that they are individually more homogeneous than the rest of the total
population. Then, items are selected from each stratum to form a sample. As
each stratum is more homogeneous than the remaining total population, the
researcher is able to obtain a more precise estimate for each stratum and by
estimating each of the component parts more accurately, he/she is able to obtain
a better estimate of the whole. In sum, stratified sampling method yields more
reliable and detailed information.
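A minimal sketch of one common way to do this, proportional allocation, drawing the same fraction from each stratum (the strata and the function name are hypothetical):

```python
import random

def stratified_sample(strata, fraction):
    """Draw the same fraction from each stratum (proportional allocation)."""
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(random.sample(units, k))    # simple random sample within the stratum
    return sample

random.seed(0)
strata = {
    "urban": [f"u{i}" for i in range(60)],
    "rural": [f"r{i}" for i in range(40)],
}
print(len(stratified_sample(strata, 0.1)))  # 6 urban + 4 rural = 10
```

Because every stratum is guaranteed representation, no sub-population can be missed by chance, which is where the gain in precision over simple random sampling comes from.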
(iii) Cluster Sampling: When the total area of research interest is large, a
convenient way in which a sample can be selected is to divide the area into a
number of smaller non-overlapping areas and then randomly selecting a number
of such smaller areas. In the process, the ultimate sample would consist of all
the units in these small areas or clusters. Thus in cluster sampling, the total
population is sub-divided into numerous relatively smaller subdivisions, which
in themselves constitute clusters of still smaller units. And then, some of such
clusters are randomly chosen for inclusion in the overall sample.
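The two-step logic, first randomly picking clusters and then enumerating every unit inside the chosen ones, can be sketched as follows (the block and household data are hypothetical):

```python
import random

random.seed(7)
# Hypothetical clusters: 20 city blocks, each holding 5 households
clusters = {f"block_{b}": [f"hh_{b}_{h}" for h in range(5)] for b in range(20)}

chosen_blocks = random.sample(list(clusters), k=4)          # stage 1: pick 4 blocks at random
sample = [hh for b in chosen_blocks for hh in clusters[b]]  # stage 2: take ALL units in each chosen block

print(len(sample))  # 4 blocks x 5 households = 20
```

Note that randomness applies only to the clusters; once a cluster is drawn, every unit inside it enters the sample, which is what distinguishes cluster sampling from stratified sampling.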
(iv) Area Sampling: When clusters are in the form of some geographic
subdivisions, then cluster sampling is termed as area sampling. That is, when
the primary sampling unit represents a cluster of units based on geographic area,
the cluster designs are distinguished as area sampling. The merits and demerits
of cluster sampling are equally applicable to area sampling.
(v) Multi-stage Sampling: A further development of the principle of cluster
sampling is multi-stage sampling. When the researcher desires to investigate the
working efficiency of nationalized banks in India and a sample of few banks is
required for this purpose, the first stage would be to select large primary
sampling unit like the states in the country. Next, certain districts may be
selected and all banks interviewed in the chosen districts. This represents a two-
stage sampling design, with the ultimate sampling units being clusters of
districts.
On the other hand, if instead of taking census of all banks within the
selected districts, the researcher chooses certain towns and interviews all banks
in it, this would represent three-stage sampling design. Again, if instead of
taking a census of all banks within the selected towns, the researcher randomly
selects sample banks from each selected town, then it represents a case of using
a four-stage sampling plan. Thus, if the researcher selects randomly at all
stages, it is called a multi-stage random sampling design.
(vi) Sampling with Probability Proportional to Size: When the cluster sampling
units do not have the same, or approximately the same, number of elements, it is
better for the researcher to adopt a random selection process in which the
probability of each cluster's inclusion in the sample is proportional to the
size of the cluster. For this, the number of elements in each cluster has to be
listed, irrespective of the method used for ordering them. Then the researcher
should systematically pick the required number of elements from the cumulative
totals. The numbers thus chosen do not, however, identify individual elements;
they indicate which clusters, and how many elements from each, are to be chosen
by simple random sampling or systematic sampling. The outcome of such sampling
is equivalent to that of a simple random sample. The method is also less
cumbersome and relatively less expensive.
Thus, a researcher has to pass through various stages of conducting
research once the problem of interest has been selected. Research methodology
familiarizes a researcher with the complex scientific methods of conducting
research, which yield reliable results that are useful to policy-makers,
government, industries etc. in decision-making.
Data Collection

Introduction:
It is important for a researcher to know the sources of the data he
requires for different purposes. Data are nothing but information. There are
two sources of information or data: primary data and secondary data. Primary
data refers to data collected for the first time, whereas secondary data refers
to data that have already been collected and used earlier by somebody or
some agency. For example, the statistics collected by the Government of India
relating to the population is primary data for the Government of India since it
has been collected for the first time. Later when the same data are used by a
researcher for his study of a particular problem, then the same data become the
secondary data for the researcher. Both the sources of information have their
merits and demerits. The selection of a particular source depends upon the (a)
purpose and scope of enquiry, (b) availability of time, (c) availability of finance,
(d) accuracy required, (e) statistical tools to be used, (f) sources of information
(data), and (g) method of data collection.

(a) Purpose and Scope of Enquiry: The purpose and scope of data
collection or survey should be clearly set out at the very beginning. It requires
the clear statement of the problem indicating the type of information which is
needed and the use for which it is needed. If for example, the researcher is
interested in knowing the nature of price change over a period of time, it would
be necessary to collect data of commodity prices. It must be decided whether it
would be helpful to study wholesale or retail prices and the possible uses to
which such information could be put. The objective of an enquiry may be either
to collect specific information relating to a problem or adequate data to test a
hypothesis. Failure to set out clearly the purpose of enquiry is bound to lead to
confusion and waste of resources.
After the purpose of enquiry has been clearly defined, the next step is to
decide about the scope of the enquiry. Scope of the enquiry means the coverage
with regard to the type of information, the subject-matter and geographical area.
For instance, an enquiry may relate to India as a whole or a state or an industrial
town wherein a particular problem related to a particular industry can be studied.

(b) Availability of Time: The investigation should be carried out within a
reasonable period of time, failing which the information collected may become
outdated and would have no meaning at all. For instance, if a producer wants to
know the expected demand for a product newly launched by him, and the finding
of the enquiry that the demand would be meager takes two years to reach him,
then the whole purpose of the enquiry becomes useless, because by that time he
would already have incurred a huge loss. Thus, where the information is
required quickly, the researcher has to choose the type of enquiry accordingly.

(c) Availability of Resources: The investigation will greatly depend on the
resources available, such as the number of skilled personnel and the financial
position. If
the number of skilled personnel who will carry out the enquiry is quite sufficient
and the availability of funds is not a problem, then enquiry can be conducted
over a big area covering a good number of samples, otherwise a small sample
size will do.

(d) The Degree of Accuracy Desired: Deciding the degree of accuracy required
is a must for the investigator, because absolute accuracy in statistical work is
seldom achieved. This is so because (i) statistics are based on estimates, (ii)
tools of measurement are not always perfect and (iii) there may be unintentional
bias on the part of the investigator, enumerator or informant. Therefore, a desire
of 100% accuracy is bound to remain unfulfilled. Degree of accuracy desired
primarily depends upon the object of enquiry. For example, when we buy gold,
even a difference of 1/10th gram in its weight is significant, whereas the same
will not be the case when we buy rice or wheat. However, the researcher must
aim at attaining a higher degree of accuracy, otherwise the whole purpose of
research would become meaningless.
(e) Statistical Tools to be used: A well defined and identifiable object or a
group of objects with which the measurements or counts in any statistical
investigation are associated is called a statistical unit. For example, in socio-
economic survey the unit may be an individual, a family, a household or a block
of locality. A very important step before the collection of data begins is to define
clearly the statistical units on which the data are to be collected. In number of
situations the units are conventionally fixed like the physical units of
measurement, such as meters, kilometers, quintals, hours, days, weeks etc.,
which are well defined and do not need any elaboration or explanation.
However, in many statistical investigations, particularly relating to socio-
economic studies, arbitrary units are used which must be clearly defined. This is
a must because in the absence of a clear cut and precise definition of the
statistical units, serious errors in the data collection may be committed in the
sense that we may collect irrelevant data on the items, which should have, in
fact, been excluded and omit data on certain items which should have been
included. This will ultimately lead to fallacious conclusions.

(f) Sources of Information (data): After deciding about the unit, a researcher
has to decide about the source from which the information can be obtained or
collected. For any statistical inquiry, the investigator may collect the data first
hand or he may use the data from other published sources, such as publications
of the government/semi-government organizations or journals and magazines
etc.
(g) Method of Data Collection: There is no problem if secondary data are
used for research. However, if primary data are to be collected, a decision has to
be taken whether (i) census method or (ii) sample technique is to be used for
data collection. In census method, we go for total enumeration i.e., all the units
of a universe have to be investigated. But in sample technique, we inspect or
study only a selected representative and adequate fraction of the population and
after analyzing the results of the sample data we draw conclusions about the
characteristics of the population. Selection of a particular technique can be
difficult: the census method is more scientific and 100% accuracy can be
attained through it, but it is time-consuming, requires more labor and is very
expensive, so for a single researcher or a small institution it proves
unsuitable. On the other hand, the sample method is less time-consuming, less
laborious and less expensive, but 100% accuracy cannot be attained through it
because of the sampling and non-sampling errors attached to it. Hence, a
researcher has to be very cautious and careful while choosing a particular
method.
Methods of Collecting Primary Data:
Primary data may be obtained by applying any of the following methods:
1. Direct Personal Interviews.
2. Indirect oral interviews.
3. Information from correspondents.
4. Mailed questionnaire methods.
5. Schedule sent through enumerators.
1. Direct personal interviews: A face to face contact is made with the
informants (persons from whom the information is to be obtained) under this
method of collecting data. The interviewer asks them questions pertaining to the
survey and collects the desired information. Thus, if a person wants to collect
data about the working conditions of the workers of the Tata Iron and Steel
Company, Jamshedpur, he would go to the factory, contact the workers and
obtain the desired information. The information collected in this manner is first
hand and also original in character. There are many merits and demerits of this
method, which are discussed as under:
Merits:
1. Most often respondents are happy to pass on the information required
from them when contacted personally and thus response is encouraging.
2. The information collected through this method is normally more accurate
because interviewer can clear doubts of the informants about certain
questions and thus obtain correct information. In case the interviewer
apprehends that the informant is not giving accurate information, he may
cross-examine him and thereby try to obtain the information.
3. This method also provides the scope for getting supplementary
information from the informant, because while interviewing it is possible
to ask some supplementary questions which may be of greater use later.
4. There might be some questions which the interviewer would find
difficult to ask directly, but with some tactfulness he can mingle such
questions with others and get the desired information. He can reword the
questions keeping in mind the informant’s reaction. In short, a delicate
situation can usually be handled more effectively by a personal interview
than by other survey techniques.
5. The interviewer can adjust the language according to the status and
educational level of the person interviewed, and thereby can avoid
inconvenience and misinterpretation on the part of the informant.

Demerits:

1. This method can prove to be expensive if the number of informants is large
and the area is widely spread.

2. There is a greater chance of personal bias and prejudice under this method as
compared to other methods.

3. The interviewers have to be thoroughly trained and experienced; otherwise
they may not be able to obtain the desired information. Untrained or poorly
trained interviewers may spoil the entire work.

4. This method is more time-consuming as compared to others, because interviews
can be held only at the convenience of the informants. Thus, if information is
to be obtained from the working members of households, interviews will have to
be held in the evening or at the weekend. Even in the evening only an hour or
two can be used for interviews and hence the work may have to be continued for
a long time, or a large number of people may have to be employed, which may
involve huge expense.

Conclusion:
Though there are some demerits in this method of data collection still we cannot
say that it is not useful. The matter of fact is that this method is suitable for
intensive rather than extensive field surveys. Hence, it should be used only in
those cases where intensive study of a limited field is desired.
In the present time of extreme advancement in the communication system,
the investigator instead of going personally and conducting a face to face
interview may also obtain information over telephone. A good number of
surveys are being conducted every day by newspapers and television channels
by sending the reply either by e-mail or SMS. This method has become very
popular nowadays as it is less expensive and the response is extremely quick.
But this method suffers from some serious defects, such as (a) very few people
own a phone or a television and hence a limited number of people can be
approached by this method, (b) only a few questions can be asked over the phone
or through television, (c) the respondents may give vague and reckless answers,
because answers given over the phone or through SMS have to be very short.
2. Indirect Oral Interviews: Under this method of data collection, the
investigator contacts third parties generally called ‘witnesses’ who are capable
of supplying necessary information. This method is generally adopted when the
information to be obtained is of a complex nature and informants are not
inclined to respond if approached directly. For example, when the researcher is
trying to obtain data on drug addiction or the habit of taking liquor, there is high
probability that the addicted person will not provide the desired data and hence
will disturb the whole research process. In this situation taking the help of such
persons or agencies or the neighbours who know them well becomes necessary.
Since these people know the person well, they can provide the desired data.
Enquiry Committees and Commissions appointed by the Government generally
adopt this method to get people’s views and all possible details of the facts
related to the enquiry.
Though this method is very popular, its correctness depends upon a number of
factors which are discussed below:
1. The person or persons or agency whose help is solicited must be of proven
integrity; otherwise any bias or prejudice on their part will not bring the correct
information and the whole process of research will become useless.
2. The interviewers must have the ability to draw information from witnesses by
means of appropriate questions and cross-examination.
3. It might happen that because of bribery, nepotism or certain other reasons
those who are collecting the information give it such a twist that correct
conclusions are not arrived at.
Therefore, for the success of this method it is necessary that the evidence of
one person alone is not relied upon. Views from other persons and related
agencies should also be ascertained to find the real position. Utmost care must
be exercised in the selection of these persons, because it is on their views
that the final conclusions are reached.
3. Information from Correspondents: The investigator appoints local agents
or correspondents in different places to collect information under this method.
These correspondents collect and transmit the information to the central office
where data are processed. This method is generally adopted by newspaper
agencies. Correspondents who are posted at different places supply information
relating to such events as accidents, riots, strikes, etc., to the head office. The
correspondents are generally paid staff or sometimes they may be honorary
correspondents also. This method is also adopted generally by the government
departments in such cases where regular information is to be collected from a
wide area. For example, in the construction of wholesale price index numbers,
regular information is obtained from correspondents appointed in different areas.
The biggest advantage of this method is that it is cheap and appropriate for
extensive investigation. But a word of caution is that it may not always ensure
accurate results because of the personal prejudice and bias of the correspondents.
As stated earlier, this method is suitable and adopted in those cases where the
information is to be obtained at regular intervals from a wide area.
4. Mailed Questionnaire Method: Under this method, a list of questions
pertaining to the survey which is known as ‘Questionnaire’ is prepared and
sent to the various informants by post. Sometimes the researcher himself too
contacts the respondents and gets the responses related to various
questions in the questionnaire. The questionnaire contains questions and
provides space for answers. A request is made to the informants through a
covering letter to fill up the questionnaire and send it back within a specified
time. The questionnaire studies can be classified on the basis of:
i. The degree to which the questionnaire is formalized or structured.
ii. The disguise or lack of disguise of the questionnaire and
iii. The communication method used.

When no formal questionnaire is used, interviewers adapt their questioning to
each interview as it progresses. They might even try to elicit responses by
indirect methods, such as showing pictures on which the respondent comments.
When a researcher follows a prescribed sequence of questions, it is referred to as
structured study. On the other hand, when no prescribed sequence of questions
exists, the study is non-structured.
When questionnaires are constructed in such a way that the objective is clear
to the respondents, they are known as non-disguised questionnaires; on the
other hand, when the objective is not clear, the questionnaire is a disguised one.
On the basis of these two classifications, four types of studies can be
distinguished:
i. Non-disguised structured,
ii. Non-disguised non-structured,
iii. Disguised structured and
iv. Disguised non-structured.

There are certain merits and demerits or limitations of this method of data
collection which are discussed below:
Merits:
1. Questionnaire method of data collection can be easily adopted where the
field of investigation is very vast and the informants are spread over a
wide geographical area.
2. This method is relatively cheap and expeditious provided the informants
respond in time.
3. This method has proved superior to other methods, like personal interviews
or the telephone, for questions of a personal nature or ones requiring reaction
by the family, since informants who might be embarrassed to answer such
questions face to face can respond in privacy.
Demerits:
1. This method can be adopted only where the informants are literate people,
so that they can understand the written questions and give the answers in
writing.
2. It involves some uncertainty about the response. Co-operation on the part of
informants may be difficult to presume.
3. The information provided by the informants may not be correct and it may
be difficult to verify the accuracy.

However, by following the guidelines given below, this method can be made
more effective:
i. The questionnaire should be framed so that it does not become an undue
burden on the respondents; otherwise they may not return it.
ii. Prepaid postage stamps should be affixed.
iii. The sample should be large.
iv. The method should be adopted in enquiries where the respondents can be
expected to return the questionnaire because of their own interest in the
enquiry.
v. It should be preferred in enquiries where there could be a legal compulsion
to provide the information.

5. Schedules sent through Enumerators: Another method of data collection is
sending schedules through enumerators or interviewers. The enumerators contact
the informants, get replies to the questions contained in the schedule, and
fill in the answers in their own handwriting on the schedule form. There is a
difference between a questionnaire and a schedule: a questionnaire is a device
for securing answers to questions by means of a form which the respondent fills
in himself, whereas a schedule is the name usually applied to a set of
questions which are asked in a face-to-face situation with another person. This
method is free from most of the limitations of the mailed questionnaire method.

Merits:
The main merits or advantages of this method are listed below:
i. It can be adopted in those cases where informants are illiterate.
ii. There is very little scope of non-response as the enumerators go personally
to obtain the information.
iii. The information received is more reliable as the accuracy of statements can
be checked by supplementary questions wherever necessary.
Demerits:
This method too, like the others, is not free from defects or limitations. The
main limitations are listed below:
i. In comparison to other methods of collecting primary data, this method is
quite costly as enumerators are generally paid persons.
ii. The success of the method depends largely upon the training imparted to
the enumerators.
iii. Interviewing is a skilled task requiring experience and training. Many
statisticians tend to neglect this extremely important part of the
data-collecting process, and this results in bad interviews. Without good
interviewing, most of the information collected is of doubtful value.
iv. Interviewing also requires a great degree of politeness, and the way the
enumerators conduct the interview affects the data collected. When
questions are asked by a number of different interviewers, variations in the
interviewers' personalities may cause variations in the answers obtained,
and this variation will not be obvious. Hence, every effort must be made to
remove as much of the interviewer-induced variation as possible.

Secondary Data: As stated earlier, secondary data are those data which have
already been collected and analyzed by some earlier agency for its own use, and
later the same data are used by a different agency. According to
W.A.Neiswanger, “A primary source is a publication in which the data are
published by the same authority which gathered and analyzed them. A secondary
source is a publication, reporting the data which was gathered by other
authorities and for which others are responsible.”
Sources of secondary data:-The various sources of secondary data can be
divided into two broad categories:
1. Published sources, and
2. Unpublished sources.

1. Published Sources: Governmental, international and local agencies publish
statistical data, and the chief among them are explained below:

(a) International Publications: Some international institutions and bodies like
the I.M.F., I.B.R.D., E.C.A.F.E. and U.N.O. publish regular and occasional
reports on economic and statistical matters.

(b) Official Publications of Central and State Governments: Several
departments of the Central and State Governments regularly publish reports on a
number of subjects, and gather additional information in the process. Some of
the important publications are: the Reserve Bank of India Bulletin, Census of
India, Statistical Abstracts of States, Agricultural Statistics of India, the
Indian Trade Journal, etc.
(c) Semi-official publications: Semi-Government institutions like Municipal
Corporations, District Boards, Panchayats, etc. publish reports relating to
different matters of public concern.
(d) Publications of Research Institutions: Indian Statistical Institute (I.S.I),
Indian Council of Agricultural Research (I.C.A.R), Indian Agricultural Statistics
Research Institute (I.A.S.R.I), etc. publish the findings of their research
programmes.
(e) Publications of various Commercial and Financial Institutions
(f) Reports of various Committees and Commissions appointed by the
Government, such as the Raj Committee's Report on Agricultural Taxation and
the Wanchoo Committee's Report on Taxation and Black Money, are also
important sources of secondary data.
(g) Journals and Newspapers: Journals and newspapers are a very important
and powerful source of secondary data. Current and important material on
statistics and socio-economic problems can be obtained from journals and
newspapers like the Economic Times, Commerce, Capital, Indian Finance,
Monthly Statistics of Trade, etc.

2. Unpublished Sources: Unpublished data can be obtained from many
sources, such as records maintained by various government and private offices,
the theses of research scholars in universities or institutions, etc.

Precautions in the Use of Secondary Data: Since secondary data have already
been obtained, it is highly desirable that a proper scrutiny of such data is made
before they are used by the investigator. In fact the user has to be extra-cautious
while using secondary data. In this context Prof. Bowley rightly points out that
“Secondary data should not be accepted at their face value.” The reason being
that data may be erroneous in many respects due to bias, inadequate size of the
sample, substitution, errors of definition, arithmetical errors etc. Even if there is
no error such data may not be suitable and adequate for the purpose of the
enquiry. Prof. Simon Kuznets' view in this regard is also of great importance.
According to him, “The degree of reliability of secondary source is to be
assessed from the source, the compiler and his capacity to produce correct
statistics and the users also, for the most part, tend to accept a series particularly
one issued by a government agency at its face value without enquiring its
reliability”.
Therefore, before using the secondary data the investigators should
consider the following factors:
(a) The suitability of data: The investigator must satisfy himself that the data
available are suitable for the purpose of the enquiry. Suitability can be
judged by comparing the nature and scope of the present enquiry with
those of the original enquiry. For example, if the object of the present
enquiry is to study the trend in retail prices, and the data provide only
wholesale prices, such data are unsuitable.
(b) Adequacy of data: If the data are suitable for the purpose of the
investigation, we must then consider whether they are adequate for the
present analysis. This can be judged by the geographical area covered by
the original enquiry; the period for which data are available is also a very
important element. In the above example, if our object is to study the
retail price trend of India, and the available data cover only the retail
price trend in the State of Bihar, they would not serve the purpose.

(c) Reliability of data: The reliability of the data is a must; without reliable
data there is no meaning in research. Reliability can be tested by finding
out which agency collected the data. If the agency used proper methods in
collecting the data, the statistics may be relied upon.

It is not enough to have baskets of data in hand. In fact, data in a raw form are
nothing but a handful of raw material waiting for proper processing so that they
can become useful. Once data have been obtained from primary or secondary
source, the next step in a statistical investigation is to edit the data i.e. to
scrutinize the same. The chief objective of editing is to detect possible errors and
irregularities. The task of editing is a highly specialized one and requires great
care and attention. Negligence in this respect may render useless the findings of
an otherwise valuable study. Editing data collected from internal records and
published sources is relatively simple, but data collected from a survey need
extensive editing.
While editing primary data, the following considerations should be borne in
mind:
1. The data should be complete in every respect
2. The data should be accurate
3. The data should be consistent, and
4. The data should be homogeneous.

To possess the above-mentioned characteristics, the data have to undergo the
types of editing discussed below:
1. Editing for Completeness: While editing, the editor should see that each
schedule and questionnaire is complete in all respects. He should see to it that
the answers to each and every question have been furnished. If some questions
are not answered and if they are of vital importance, the informants should be
contacted again either personally or through correspondence. Even after all the
efforts, it may happen that a few questions remain unanswered. In such cases,
the editor should mark 'No answer' in the space provided for answers, and if the
unanswered questions are of vital importance, then the schedule or questionnaire
should be dropped.
2. Editing for Consistency: At the time of editing the data for consistency,
the editor should see that the answers to questions are not contradictory in
nature. If the answers are mutually contradictory, he should try to obtain the
correct answers either by referring back to the questionnaire or by contacting,
wherever possible, the informant in person. For example, if amongst others, two
questions in questionnaire are (a) Are you a student? (b) Which class do you
study and the reply to the first question is ‘no’ and to the latter ‘tenth’ then there
is contradiction and it should be clarified.
3. Editing for Accuracy: The reliability of conclusions depends basically
on the correctness of information. If the information supplied is wrong,
conclusions can never be valid. It is, therefore, necessary for the editor to see
that the information is accurate in all respects. If the inaccuracy is due to
arithmetical errors, it can be easily detected and corrected. But if the cause of
inaccuracy is faulty information supplied, it may be difficult to verify it and an
example of this kind is information relating to income, age etc.
4. Editing for Homogeneity: Homogeneity means the condition in which
all the questions have been understood in the same sense. The editor must check
all the questions for uniform interpretation. For example, as to the question of
income, if some informants have given monthly income, others annual income
and still others weekly income or even daily income, no comparison can be
made. Therefore, it becomes an essential duty of the editor to check up that the
information supplied by the various people is homogeneous and uniform.
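The four editing checks above can be sketched programmatically. The following is a minimal, illustrative example; the field names and rules are hypothetical, not taken from the text:

```python
# Illustrative sketch of the four editing checks: completeness,
# consistency, accuracy and homogeneity. Field names are invented.

def edit_response(resp):
    """Return a list of editing problems found in one response dict."""
    problems = []
    # 1. Completeness: every expected question must be answered.
    for field in ("name", "is_student", "studying_class",
                  "monthly_income", "income_period"):
        if resp.get(field) in (None, ""):
            problems.append(f"incomplete: {field} unanswered")
    # 2. Consistency: a non-student cannot also report a class of study.
    if resp.get("is_student") == "no" and resp.get("studying_class"):
        problems.append("inconsistent: non-student reports a class")
    # 3. Accuracy: a simple arithmetic/range check on reported income.
    income = resp.get("monthly_income")
    if isinstance(income, (int, float)) and income < 0:
        problems.append("inaccurate: negative income")
    # 4. Homogeneity: income must be reported on a monthly basis only.
    if resp.get("income_period") not in (None, "monthly"):
        problems.append("non-homogeneous: income not monthly")
    return problems

resp = {"name": "A", "is_student": "no", "studying_class": "tenth",
        "monthly_income": 5000, "income_period": "monthly"}
print(edit_response(resp))  # flags the student/class contradiction
```

In a real survey these rules would be drawn up from the questionnaire itself, one rule per cross-check built into the design.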

Choice between Primary and Secondary Data: As we have already seen, there
are a lot of differences in the methods of collecting Primary and Secondary data.
Primary data which is to be collected originally involves an entire scheme of
plan starting with the definitions of various terms used, units to be employed,
type of enquiry to be conducted, extent of accuracy aimed at etc. For the
collection of secondary data, a mere compilation of the existing data would be
sufficient. A proper choice between the type of data needed for any particular
statistical investigation is to be made after taking into consideration the nature,
objective and scope of the enquiry; the time and the finances at the disposal of
the agency; the degree of precision aimed at; and the status of the agency
(whether a government, state or central, or a private institution or an individual).
In using the secondary data, it is best to obtain the data from the primary source
as far as possible. By doing so, we would at least save ourselves from the errors
of transcription which might have inadvertently crept in the secondary source.
Moreover, the primary source will also provide us with detailed discussion about
the terminology used, statistical units employed, size of the sample and the
technique of sampling (if sampling method was used), methods of data
collection and analysis of results and we can ascertain ourselves if these would
suit our purpose.
Nowadays, in a large number of statistical enquiries, secondary data are
generally used because fairly reliable published data on a large number of
diverse fields are now available in the publications of governments, private
organizations and research institutions, agencies, periodicals and magazines etc.
In fact, primary data are collected only if there do not exist any secondary data
suited to the investigation under study. In some of the investigations both
primary as well as secondary data may be used.

SUMMARY:
There are two types of data, primary and secondary. Data which are collected
first hand are called Primary data and data which have already been collected
and used by somebody are called Secondary data. There are two methods of
collecting data: (a) the survey or total enumeration method and (b) the sample
method. When a researcher investigates all the units of the subject, it is called
the survey method. On the other hand, if he or she investigates only a few units
of the subject and gives the result on that basis, it is known as the sample
survey method. There are different sources of collecting Primary and
Secondary data. Some of the important sources of Primary data are—Direct
Personal Interviews, Indirect Oral Interviews, Information from correspondents,
Mailed questionnaire method, Schedules sent through enumerators and so on.
Though all these sources or methods of Primary data have their relative merits
and demerits, a researcher should use a particular method with a lot of care. There
are basically two sources of collecting secondary data- (a) Published sources and
(b) Unpublished sources. Published sources are like publications of different
government and semi-government departments, research institutions and
agencies etc. whereas unpublished sources are like records maintained by
different government departments and unpublished theses of different
universities etc. Editing of secondary data is necessary for different purposes as
– editing for completeness, editing for consistency, editing for accuracy and
editing for homogeneity.

It is always a tough task for the researcher to choose between primary


and secondary data. Though primary data are more authentic and accurate, the
time, money and labor involved in obtaining them more often prompt the
researcher to go for secondary data. There is a certain amount of doubt about
their authenticity and suitability, but with the arrival of many government and
semi-government agencies and some private institutions in the field of data
collection, most of the apprehensions in the mind of the researcher have been
removed.

Introduction:
Nowadays the questionnaire is widely used for data collection in social research.
It is a reasonably fair tool for gathering data from large, diverse and scattered
social groups. The questionnaire is the medium of communication between the
investigator and the respondents. According to Bogardus, a questionnaire is a
list of questions sent to a number of persons for their answers, which obtains
standardized results that can be tabulated and treated statistically. The
Dictionary of Statistical Terms defines it as a “group or sequence of questions
designed to elicit information upon a subject, or sequence of subjects, from an
informant.” A questionnaire should be designed or drafted
with utmost care and caution so that all the relevant and essential information
for the enquiry may be collected without any difficulty, ambiguity and
vagueness. Drafting a good questionnaire is a highly specialized job and requires
great care, skill, wisdom, efficiency and experience. No hard and fast rule can
be laid down for designing or framing a questionnaire. However, in this
connection, the following general points may be borne in mind:

1. Size of the Questionnaire Should be Small: A researcher should try his
best to keep the number of questions as small as possible, keeping in view the
nature, objectives and scope of the enquiry. The respondents' time should not be
wasted by asking irrelevant and unimportant questions. A large number of
questions would involve more work for the investigator and thus result in delay
on his part in collecting and submitting the information. A large number of
unnecessary questions may annoy the respondents, who may then refuse to
cooperate. A reasonable questionnaire should contain about 15 to 25 questions.
If a still larger number of questions is a must in any enquiry, then the
questionnaire should be divided into sections or parts.
2. The Questions Should be Clear: The questions should be easy, brief,
unambiguous, non-offending, courteous in tone, corroborative in nature and to
the point, so that little scope for guessing is left on the part of the respondents.

3. The Questions Should be Arranged in a Logical Sequence: Logical
arrangement of questions saves a lot of unnecessary work on the part of the
researcher, because it not only facilitates tabulation but also leaves no room
for omissions or commissions. For example, to find out whether a person owns a
television, the logical order of questions would be: Do you own a television?
When did you buy it? What is its make? How much did it cost you? Is its
performance satisfactory? Have you ever got it serviced?

4. Questions Should be Simple to Understand: Vague words like good, bad,
efficient, sufficient, prosperity, rarely, frequently, reasonable, poor, rich,
etc. should not be used, since they may be interpreted differently by different
persons and as such might give unreliable and misleading information. Similarly,
the use of words having a double meaning, like price, assets, capital, income,
etc., should also be avoided.

5. Questions Should be Comprehensive and Easily Answerable: Questions
should be designed in such a way that they are readily comprehensible and easy
to answer for the respondents. They should not be tedious, nor should they tax
the respondents' memory. At the same time, questions involving mathematical
calculations like percentages, ratios, etc. should not be asked.
6. Questions of Personal and Sensitive Nature Should Not be Asked: Some
questions disturb the respondents, who may be shy about or irritated by them.
Therefore, every effort should be made to avoid such questions. For example,
'Do you cook yourself, or does your wife cook?' or 'Do you drink?' Such
questions will certainly irk the respondents and should thus be avoided at any
cost. If unavoidable, the utmost politeness should be used.

7. Types of Questions: Under this head, the questions in the questionnaire may
be classified as follows:
(a) Shut Questions: Shut questions are those where possible answers are
suggested by the framers of the questionnaire and the respondent is required to
tick one of them. Shut questions can further be subdivided into the following
forms:
(i) Simple Alternate Questions: In this type of questions the respondent has to
choose from the two clear cut alternatives like ‘Yes’ or ‘No’, ‘Right or Wrong’
etc. Such questions are also called as dichotomous questions. This technique can
be applied with elegance to situations where two clear cut alternatives exist.
(ii) Multiple Choice Questions: Many a time it becomes difficult to define
clear-cut alternatives, and in such a situation additional answers between Yes
and No, like Do not know, No opinion, Occasionally, Casually, Seldom, etc., are
added. For example, in order to find out whether a person smokes or drinks, the
following multiple choice answers may be used:
Do you smoke?
(a) Yes, regularly [ ] (b) No, never [ ]
(c) Occasionally [ ] (d) Seldom [ ]
Multiple choice questions are very easy and convenient for the respondents to
answer. Such questions save time and also facilitate tabulation. This method
should be used if only a selected few alternative answers exist to a particular
question.
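Since fixed alternatives map every reply onto a small set of categories, tallying responses is mechanical, which is exactly why such questions facilitate tabulation. A tiny illustrative sketch (the responses are invented):

```python
# Tallying fixed-alternative (multiple choice) answers: each response is
# one of a known set of categories, so tabulation is a direct count.
from collections import Counter

responses = ["Yes regularly", "No never", "Occasionally", "No never",
             "Seldom", "No never", "Occasionally", "Yes regularly"]
tally = Counter(responses)
print(tally["No never"])  # 3
```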

8. Leading Questions Should be Avoided: A question like 'Why do you use a
particular type of car, say a Maruti?' should preferably be framed as two
questions:
(i) Which car do you use?
(ii) Why do you prefer it?
It gives smooth ride [ ]
It gives more mileage [ ]
It is cheaper [ ]
It is maintenance free [ ]

9. Cross Checks: The questionnaire should be so designed as to provide
internal checks on the accuracy of the information supplied by the respondents,
by including some connected questions, at least with respect to matters which
are fundamental to the enquiry.

10. Pre-testing the Questionnaire: It is practical in every sense to try out
the questionnaire on a small scale before using it for the given enquiry on a
large scale. This has been found extremely useful in practice. The questionnaire
can then be improved or modified in the light of the drawbacks, shortcomings and
problems faced by the investigator in the pre-test.

11. A Covering Letter: A covering letter from the organizers of the enquiry
should be enclosed with the questionnaire. It should explain the definitions,
units and concepts used in the questionnaire, win the respondent's confidence,
include a self-addressed envelope in the case of a mailed questionnaire, mention
any award or incentive for a quick response, and promise a copy of the survey
report.

SAMPLING
Though sampling is not new, sampling theory has been developed only recently.
Knowingly or not, people have been using the sampling technique in their
day-to-day life. For example, a housewife tests a small quantity of rice to see
whether it has been well cooked and generalizes the result to the whole pot of
rice boiling in the vessel. The result arrived at is most of the time correct.
In another example, when a doctor wants to examine the blood for any deficiency,
he takes only a few drops of the patient's blood and examines them. The result
arrived at is most of the time correct and represents the whole of the blood in
the patient's body. In all these cases, by inspecting a few items, people
believe that the samples give a correct idea about the population.
Most of our decisions are based on the examination of only a few items, i.e.
sample studies. In the words of Croxton and Cowden, “It may be too expensive
or too time consuming to attempt either a complete or a nearly complete
coverage in a statistical study. Further to arrive at valid conclusions, it may not
be necessary to enumerate all or nearly all of a population. We may study a
sample drawn from the large population and if that sample is adequately
representative of the population, we should be able to arrive at valid
conclusions.”
According to Rosander, “The sample has many advantages over a census
or complete enumeration. If carefully designed, the sample is not only
considerably cheaper but may give results which are just as accurate and
sometimes more accurate than those of a census. Hence a carefully designed
sample may actually be better than a poorly planned and executed census.”

Merits:
1. It saves time: Sampling method of data collection saves time because
fewer items are collected and processed. When the results are urgently required,
this method is very helpful.
2. It reduces cost: Since only a few selected items are studied in
sampling, there is a reduction in cost, both in money and in man-hours.
3. More reliable results can be obtained: Through sampling, more
reliable results can be obtained because (a) there are fewer chances of
statistical errors, and where sampling error exists it can be estimated and
controlled; and (b) highly experienced and trained persons can be employed for
the scientific processing and analysis of relatively limited data, and they can
use their technical knowledge to get more accurate and reliable results.
4. It provides more detailed information: As it saves time, money and
labor, more detailed information can be collected in a sample survey.
5. Sometimes the only method to depend upon: Sometimes one has to
depend upon the sampling method alone: if the population under study is
infinite, sampling is the only method that can be used. For example, if
someone's blood has to be examined, it would be fatal to take all the blood out
of the body and study it by the total enumeration method.
6. Administrative convenience: The organization and administration of
sample survey are easy for the reasons which have been discussed earlier.
7. More scientific: Since the methods used to collect data are based on
scientific theory and results obtained can be tested, sampling is a more scientific
method of collecting data.
It is not that sampling is free from demerits or shortcomings. There are certain
shortcomings of this method which are discussed below:
1. Illusory conclusion: If a sample enquiry is not carefully planned and
executed, the conclusions may be inaccurate and misleading.
2. Sample not representative: To make the sample representative is a
difficult task. If a representative sample is taken from the universe, the result is
applicable to the whole population. If the sample is not representative of the
universe the result may be false and misleading.
3. Lack of experts: Where there is a lack of experts to plan and conduct a
sample survey, its execution and analysis, and hence its results, may be
unsatisfactory and untrustworthy.
4. Sometimes more difficult than census method: Sometimes the
sampling plan may be complicated and requires more money, labor and time
than a census method.
5. Personal bias: There may be personal biases and prejudices with regard
to the choice of technique and drawing of sampling units.
6. Choice of sample size: If the size of the sample is not appropriate, the
results may not truly reflect the characteristics of the population.
7. Conditions of complete coverage: If the information is required for
each and every item of the universe, then a complete enumeration survey is
better.
Essentials of sampling: In order to reach a clear conclusion, the sampling
should possess the following essentials:
1. It must be representative: The sample selected should possess
characteristics similar to those of the original universe from which it has been
drawn.
2. Homogeneity: Samples selected from the universe should have a similar
nature and should not differ when compared with the universe.
3. Adequate samples: In order to have a more reliable and representative
result, a good number of items are to be included in the sample.
4. Optimization: All efforts should be made to get maximum results both
in terms of cost as well as efficiency. If the size of the sample is larger, there is
better efficiency and at the same time the cost is more. A proper size of sample
is maintained in order to have optimized results in terms of cost and efficiency.
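The core idea above, that a well-drawn sample estimates the universe at a fraction of the cost of complete enumeration, can be illustrated with a simple random sample. This is an illustrative sketch; the population figures are invented:

```python
# A representative simple random sample's mean should lie close to the
# population mean, without enumerating all 10,000 units.
import random

random.seed(42)  # fixed seed for a reproducible draw
population = [random.gauss(100, 15) for _ in range(10000)]  # the universe
sample = random.sample(population, 400)  # simple random sample, n = 400

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(round(pop_mean, 1), round(sample_mean, 1))  # the two means are close
```

Here each unit has an equal chance of selection, which is what makes the sample representative in expectation; a larger n reduces sampling error but raises cost, the optimization trade-off mentioned above.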

Testing of Hypothesis:
As a part of an investigation, samples are drawn from the population and
results are derived from them to help in taking decisions. But such decisions
involve an element of uncertainty and may therefore be wrong. A hypothesis is an
assumption about a population parameter which may or may not be true. For
example, if we toss a coin 200 times, we may get 110 heads and 90 tails. At this
instance, we are interested in testing whether the coin is unbiased or not.
Therefore, we may conduct a test to judge whether the observed difference is
significant or is merely due to sampling fluctuations. To carry out a test of
significance, the following procedure has to be followed:
1. Framing the Hypothesis: To verify the assumption, which is based on
sample study, we collect data and find out the difference between the sample
value and the population value. If there is no difference found or the difference
is very small then the hypothetical value is correct. Generally two hypotheses
are constructed, and if one is found correct, the other is rejected.
(a) Null Hypothesis: The random selection of the samples from the given
population makes the tests of significance valid for us. For applying any test of
significance we first set up a hypothesis- a definite statement about the
population parameter/s. Such a statistical hypothesis, which is under test, is
usually a hypothesis of no difference and hence is called Null hypothesis. It is
usually denoted by Ho. In the words of Prof. R. A. Fisher, “Null hypothesis is the
hypothesis which is tested for possible rejection under the assumption that
it is true.”
(b) Alternative Hypothesis: Any hypothesis which is complementary to the
null hypothesis is called an alternative hypothesis. It is usually denoted by H1.
It is very important to explicitly state the alternative hypothesis in respect
of any null hypothesis H0, because the acceptance or rejection of H0 is
meaningful only if it is being tested against a rival hypothesis. For example,
if we want to test the null hypothesis that the population has a specified mean
µ0 (say), i.e., H0: µ = µ0, then the alternative hypothesis could be:
(i) H1: µ ≠ µ0 (i.e., µ > µ0 or µ < µ0)
(ii) H1: µ > µ0 (iii) H1: µ < µ0
The alternative hypothesis (i) is known as a two-tailed alternative and the
alternatives in (ii) and (iii) are known as right-tailed and left-tailed alternatives.
Accordingly, the corresponding tests of significance are called two-tailed, right-
tailed and left-tailed tests respectively.
The null hypothesis usually specifies a single parameter value and is
therefore simple, while the alternative hypothesis is usually composite.
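The coin example given earlier can be worked as a two-tailed test of proportion. This is a standard sketch using the normal approximation (not a formula prescribed by the text); 1.96 is the 5% two-tailed critical value of the standard normal distribution:

```python
# Two-tailed test for the coin example: H0: P(head) = 0.5 against
# H1: P(head) != 0.5, with 110 heads observed in 200 tosses.
import math

n, heads, p0 = 200, 110, 0.5
# z = (observed - expected) / standard error under H0
z = (heads - n * p0) / math.sqrt(n * p0 * (1 - p0))
print(round(z, 3))  # 1.414
# |z| < 1.96, the 5% two-tailed critical value, so H0 (the coin is
# unbiased) is not rejected at the 5% level of significance.
print(abs(z) < 1.96)  # True
```

A right-tailed alternative (H1: P(head) > 0.5) would instead compare z against the one-tailed 5% value 1.645, and would likewise fail to reject here.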

Types of Errors in Testing of Hypothesis: As stated earlier, inductive
inference consists in arriving at a decision to accept or reject a null
hypothesis (Ho) after inspecting only a sample from the population. As such, an
element of risk, the risk of taking a wrong decision, is involved. In any test
procedure, the four possible mutually disjoint and exhaustive decisions are:
(i) Reject Ho when actually it is not true i.e., when Ho is false.
(ii) Accept Ho when it is true.
(iii) Reject Ho when it is true.
(iv) Accept Ho when it is false.
The decisions in (i) and (ii) are correct decisions while the decisions in
(iii) and (iv) are wrong decisions. These decisions may be expressed in the
following dichotomous table:
                          Decision from sample
True state             Reject Ho                 Accept Ho
Ho true                Wrong (Type I Error)      Correct
Ho false (H1 true)     Correct                   Wrong (Type II Error)
Thus, in testing of hypothesis we are likely to commit two types of
errors. The error of rejecting Ho when Ho is true is known as Type I error and
the error of accepting Ho when Ho is false is known as Type II Error.
For example, in the Industrial Quality Control, while inspecting the quality of a
manufactured lot, the Inspector commits Type I Error when he rejects a good lot
and he commits Type II Error when he accepts a bad lot.
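The Type I error rate can also be seen by simulation: when Ho is true, a 5%-level test should wrongly reject Ho in roughly 5% of repeated samples. An illustrative sketch, reusing the coin-tossing setup:

```python
# Simulating the Type I error rate of a 5%-level two-tailed test of
# proportion when H0 is actually true (the coin really is fair).
import math
import random

random.seed(0)
n, p0, trials = 200, 0.5, 5000
rejections = 0
for _ in range(trials):
    heads = sum(random.random() < p0 for _ in range(n))  # fair coin: H0 true
    z = (heads - n * p0) / math.sqrt(n * p0 * (1 - p0))
    if abs(z) > 1.96:       # test rejects H0 ...
        rejections += 1     # ... but H0 is true, so this is a Type I error
print(rejections / trials)  # close to 0.05, the chosen significance level
```

The significance level is thus the Type I error probability the researcher agrees to tolerate; the Type II error probability depends on how far the true parameter lies from the hypothesized value.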

SUMMARY
Nowadays the questionnaire method of data collection has become very popular. It
is a very powerful tool for collecting the required data in the shortest period
of time and with little expense. It is scientific too. But drafting a
questionnaire is a very
skilled and careful work. Therefore, there are certain requirements and essentials
which should be followed at the time of framing the questionnaire. They include
the following viz., (i) the size of the questionnaire should be small, (ii) questions
should be very clear in understanding, (iii) questions should be put in a logical
order, (iv) questions should have a simple meaning, etc. Apart from this,
multiple choice questions should be preferred where possible. The questionnaire
should be pre-tested before final data collection. The information supplied
should be cross-checked for
any false or insufficient information. After all these formalities have been
completed, a covering note should accompany the questionnaire explaining
various purposes, designs, units and incentives.

There are two ways of survey- Census survey and Sample survey through
which data can be collected. Census survey means total enumeration i.e.,
collecting data from each and every unit of the universe, whereas sample survey
concentrates on collecting data from a few units of the universe selected
scientifically for the purpose. Since the census method is more time-consuming,
expensive and labor-intensive, it often becomes impractical to depend on it.
Therefore, the sample survey is preferred, being scientific, less expensive,
less time-consuming and less labor-intensive.
But this method has merits and demerits, which are detailed below:
Merits: It reduces cost, saves time and is more reliable. It provides more
detailed information and is sometimes the only method that can be depended upon,
for reasons of administrative convenience and scientific accuracy.
Demerits: Sometimes samples may not be representative and may give illusory
conclusions. There is a lack of experts, and sampling is sometimes more difficult
than the census method, since personal bias may arise and determining the size of
the sample may be very difficult.
Apart from these, there are some essentials of sampling which must be followed:
samples must be representative, samples must be homogeneous, and the number of
sample units must be adequate. When a researcher resorts to sampling, he intends
to collect data that will help him draw results and finally take a decision. When
he takes a decision, it is on the basis of a hypothesis, which is essentially an
assumption and is prone to two types of errors: Type I error and Type II error.
When a researcher rejects a correct hypothesis, he commits a Type I error, and
when he accepts a wrong hypothesis, he commits a Type II error. The researcher
should try to avoid both types of errors, but committing a Type II error is
considered more harmful than a Type I error.
Research Design

Introduction
The meaning of experiment lies in the process of examining the truth of
a statistical hypothesis related to some research problem. For example, a
researcher can conduct an experiment to examine the newly developed
medicine. Experiment is of two types: absolute experiment and comparative
experiment. When a researcher wants to determine the impact of a fertilizer on
the yield of a crop it is a case of absolute experiment. On the other hand, if he
wants to determine the impact of one fertilizer as compared to the impact of
some other fertilizer, the experiment will then be called as a comparative
experiment. Normally, a researcher conducts a comparative experiment when he
talks of designs of experiments.

Research design can be of three types:


(a) Research design in the case of descriptive and diagnostic research
studies,
(b) Research design in the case of exploratory research studies, and
(c) Research design in the case of hypothesis testing research studies.
Here we are mainly concerned with the third one which is Research design
in the case of hypothesis testing research studies.

Research design in the case of hypothesis testing research studies:


Hypothesis testing research studies are generally known as experimental studies.
This is a study where a researcher tests the hypothesis of causal relationships
between variables. This type of study requires some procedures which will not
only reduce bias and increase reliability, but will also permit drawing inferences
about causality. Most of the times, experiments meet these requirements. Prof.
Fisher is considered as the pioneer of this type of studies (experimental studies).
He did pioneering work when he was working at Rothamsted Experimental
Station in England which was a centre for Agricultural Research. While working
there, Prof. Fisher found that by dividing plots into different blocks and then by
conducting experiments in each of these blocks whatever information is
collected and inferences drawn from them happened to be more reliable. This
was where he was inspired to develop certain experimental designs for testing
hypotheses concerning scientific investigations. Nowadays, the experimental
design is used in researches relating to almost every discipline of knowledge.
Prof. Fisher laid three principles of experimental designs:
(1) The Principle of Replication
(2) The Principle of Randomization and
(3) The Principle of Local Control.

(1) The Principle of Replication:


According to this principle, the experiment should be repeated more than
once. Thus, each treatment is applied in many experimental units instead of one.
This way the statistical accuracy of the experiments is increased. For example,
suppose we are going to examine the effect of two varieties of wheat.
Accordingly, we divide the field into two parts and grow one variety in one part
and the other variety in the other. Then we compare the yield of the two parts
and draw conclusion on that basis. But if we are to apply the principle of
replication to this experiment, then we first divide the field into several parts,
grow one variety in half of these parts and the other variety in the remaining
parts. Then we collect the data of yield of the two varieties and draw conclusion
by comparing the same. The result so obtained will be more reliable in
comparison to the conclusion we draw without applying the principle of
replication. The entire experiment can be repeated several times for better
results.
(2) The Principle of Randomization:
When we conduct an experiment, the principle of randomization
provides us a protection against the effects of extraneous factors by
randomization. This means that this principle indicates that the researcher
should design or plan the experiment in such a way that the variations caused by
extraneous factors can all be combined under the general heading of ‘chance’.
For example, when a researcher grows one variety of wheat , say , in the first
half of the parts of a field and the other variety he grows in the other half, then it
is just possible that the soil fertility may be different in the first half in
comparison to the other half. If this is so the researcher’s result is not realistic.
In this situation, he may assign the variety of wheat to be grown in different
parts of the field on the basis of some random sampling technique i.e., he may
apply randomization principle and protect himself against the effects of the
extraneous factors. Therefore, by using the principle of randomization, he can
draw a better estimate of the experimental error.

(3). The Principle of Local Control:


This is another important principle of experimental designs. Under this
principle, the extraneous factor which is the known source of variability is made
to vary deliberately over as wide a range as necessary. This needs to be done in
such a way that the variability it causes can be measured and hence eliminated
from the experimental error. The experiment should be planned in such a way
that the researcher can perform a two-way analysis of variance, in which the
total variability of the data is divided into three components attributed to
treatments (varieties of wheat in this case), the extraneous factor (soil fertility in
this case) and experimental error. In short, through the principle of local control
we can eliminate the variability due to extraneous factors from the experimental
error.
Kinds of experimental Designs and Control
Experimental designs refer to the framework of structure of an
experiment and as such there are several experimental designs. Generally,
experimental designs are classified into two broad categories: informal
experimental designs and formal experimental designs. Informal experimental
designs are those designs that normally use a less sophisticated form of analysis
based on differences in the magnitudes, whereas formal experimental designs
offer relatively more control and use precise statistical procedures for analysis.
Important experimental designs are discussed below:
(1) Informal experimental designs:
(i) Before and after without control design
(ii) After only with control design
(iii) Before and after with control design

(2) Formal experimental designs:


(i) Completely randomized design (generally called C.R design)
(ii) Randomized block design (generally called R.B design)
(iii) Latin square design (generally called L.S design)
(iv) Factorial designs.

(1) Before and after without control design:


In this design, a single test group or area is selected and the dependent
variable is measured before introduction of the treatment. Then the treatment is
introduced and the dependent variable is measured again after the treatment has
been introduced. The effect of the treatment would be equal to the level of the
phenomenon after the treatment minus the level of the phenomenon before the
treatment. Thus, the design can be presented in the following manner:
Test area:   Level of phenomenon     Treatment     Level of phenomenon
             before treatment (X)    introduced    after treatment (Y)

                     Treatment Effect = (Y) - (X)

The main difficulty of such a design is that with the passage of time
considerable extraneous variations may be there in its treatment effect.

(2) After-only with control design:


Two groups or areas are selected in this design and the treatment is
introduced into the test area only. Then the dependent variable is measured in
both the areas at the same time. Treatment impact is assessed by subtracting the
value of the dependent variable in the control area from its value in the test area.
The design can be presented in the following manner:
Test area:      Treatment introduced    Level of phenomenon
                                        after treatment (Y)
Control area:                           Level of phenomenon
                                        without treatment (Z)

                     Treatment Effect = (Y) - (Z)

The basic assumption in this type of design is that the two areas are identical
with respect to their behavior towards the phenomenon considered. If this
assumption is not true, there is the possibility of extraneous variation entering
into the treatment effect.

(3) Before and after with control design:


In this design, two areas are selected and the dependent variable is
measured in both the areas for an identical time-period before the treatment.
Thereafter, the treatment is introduced into the test area only, and the
dependent variable is measured in both areas for an identical time-period after
the introduction of the treatment. The effect of the treatment is determined by
subtracting the change in the dependent variable in the control area from the
change in the dependent variable in the test area. This design can be shown in
the following way:
                 Time Period I               Time Period II
Test area:       Level of phenomenon         Treatment introduced;
                 before treatment (X)        level after treatment (Y)
Control area:    Level of phenomenon         Level of phenomenon
                 without treatment (A)       without treatment (Z)

                 Treatment Effect = (Y - X) - (Z - A)

This design is superior to the previous two designs because it avoids extraneous
variation resulting both from the passage of time and from non-comparability of
the test and control areas. But at times, for lack of historical data, time, or a
comparable control area, we may have to select one of the first two informal
designs stated above.
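The three treatment-effect formulas of the informal designs can be summarised in a short Python sketch (the function names and the sample figures in the usage line are invented for illustration):

```python
# A sketch of the three treatment-effect formulas from the informal
# designs above; function names and sample figures are illustrative.
def before_after_without_control(x_before, y_after):
    # Treatment Effect = (Y) - (X)
    return y_after - x_before

def after_only_with_control(y_test, z_control):
    # Treatment Effect = (Y) - (Z)
    return y_test - z_control

def before_after_with_control(x, y, a, z):
    # Treatment Effect = (Y - X) - (Z - A)
    return (y - x) - (z - a)

# Invented figures: test area rises 50 -> 70, control area 48 -> 55.
print(before_after_with_control(x=50, y=70, a=48, z=55))  # prints 13
```

The third function is the difference-in-differences form: the control area's change (Z - A) nets out variation due to the passage of time.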

(2) Formal Experimental Design


(i) Completely randomized design: -
This design involves only two principles of experimental designs, i.e., the
principle of replication and the principle of randomization. Among all the
designs, this is the simplest and easiest, because its procedure and analysis are
simple. The important characteristic of this design is that the subjects are
randomly assigned to experimental treatments. For example, if the researcher
has 20 subjects and if he wishes to test 10 under treatment A and 10 under
treatment B, the randomization process gives every possible group of 10 subjects
selected from a set of 20 an equal opportunity of being assigned to treatment A
and treatment B. One way analysis of variance (one way ANOVA) is used to
analyze such a design.
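The randomization step described here can be sketched in Python (the subject labels 1 to 20 are illustrative):

```python
import random

# A sketch of the randomization step of a completely randomized
# design: 20 subjects (labels are illustrative) are split at random
# into two treatments of 10 each, so every possible group of 10 is
# equally likely to receive treatment A.
random.seed(7)
subjects = list(range(1, 21))
random.shuffle(subjects)
treatment_a, treatment_b = subjects[:10], subjects[10:]
print(sorted(treatment_a))
print(sorted(treatment_b))
```

Shuffling the whole list and splitting it in half is equivalent to drawing a simple random sample of 10 for treatment A.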

(ii) Randomized block design:-


R. B. design is an improvement over the C.R. design. In the R .B. design,
the principle of local control can be applied along with the other two principles
of experimental designs. In the R.B. design, subjects are first divided into
groups, known as blocks, such that within each group the subjects are relatively
homogeneous in respect to some selected variable. The number of subjects in a
given block would be randomly assigned to each treatment. Blocks are the levels
at which we hold the extraneous factor fixed, so that its contribution to the total
variability of data can be measured. The main feature of the R.B. design is that,
in this, each treatment appears the same number of times in each block. This
design is analyzed by the two-way analysis of variance (two-way ANOVA)
technique.

(iii) Latin square design:-


The Latin square design (L.S. design) is an experimental design very frequently
used in agricultural research. Since agriculture depends upon nature to a large
extent, the conditions of research and investigation in agriculture differ from
those of other studies. For example, suppose an experiment is to be conducted to
judge the effect of five fertilizers on the yield of a certain crop, say wheat.
In this situation, the varying fertility of the soil in the different blocks in
which the experiment is performed must be taken into consideration; otherwise the
results obtained may not be very dependable, because the output would be the
effect not only of the fertilizers but also of the fertility of the soil.
Similarly, there may be the impact of varying seeds on the yield. To overcome
such difficulties, the L.S. design is used when there are two major extraneous
factors, such as varying soil fertility and varying seeds. In a 5 x 5 Latin
square, each fertilizer appears five times in all, but only once in each row and
in each column of the design. In other words, the treatments are so allocated
among the plots that no treatment occurs more than once in any one row or any one
column. This experiment can be shown with the help of the following diagram:

                  FERTILITY LEVEL
Seeds      I      II     III    IV     V
 X1        A      B      C      D      E
 X2        B      C      D      E      A
 X3        C      D      E      A      B
 X4        D      E      A      B      C
 X5        E      A      B      C      D

From the above diagram, it is clear that in L.S. design the field is divided into as
many blocks as there are varieties of fertilizers. Then, each block is again
divided into as many parts as there are varieties of fertilizers in such a way that
each of the fertilizer variety is used in each of the block only once. The analysis
of L.S. design is very similar to the two-way ANOVA technique.
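The cyclic pattern of the diagram above can be generated and checked with a few lines of Python (a sketch, assuming the five treatments are labelled A to E as in the diagram):

```python
# A sketch that builds the 5x5 cyclic Latin square shown in the
# diagram above and verifies the defining property: each fertilizer
# (A..E) occurs exactly once in every row and every column.
treatments = ["A", "B", "C", "D", "E"]
n = len(treatments)
square = [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

for row in square:
    print(" ".join(row))

assert all(len(set(row)) == n for row in square)                          # rows
assert all(len({square[r][c] for r in range(n)}) == n for c in range(n))  # columns
```

Shifting each row one position to the left, as in the diagram, is the simplest way to satisfy the Latin-square property for any number of treatments.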

(iv) Factorial designs:
Factorial designs are used in experiments where the effects of varying
more than one factor are to be determined. These designs are used more in
economic and social matters where usually a large number of factors affect a
particular problem. Factorial designs are usually of two types:

(i) Simple factorial designs and (ii) complex factorial designs.


(i) Simple factorial design:
In a simple factorial design, the effects of varying two factors on the
dependent variable are considered, but when an experiment is done with more than
two factors, complex factorial designs are used. A simple factorial design is
also termed a ‘two-factor factorial design’, whereas a complex factorial design
is known as a ‘multi-factor factorial design’.

(ii) Complex factorial designs:-


When the experiments with more than two factors at a time are
conducted, it involves the use of complex factorial designs. A design which
considers three or more independent variables simultaneously is called a
complex factorial design. In case of three factors with one experimental
variable, two treatments and two levels, complex factorial design will contain a
total of eight cells. This can be seen through the following diagram:
2x2x2 COMPLEX FACTORIAL DESIGN

                              Experimental Variable
                      Treatment A                Treatment B
                  Control      Control       Control      Control
                  Variable 2   Variable 2    Variable 2   Variable 2
                  Level I      Level II      Level I      Level II
Control  Level I  Cell 1       Cell 3        Cell 5       Cell 7
Variable
1        Level II Cell 2       Cell 4        Cell 6       Cell 8

[The original gives a pictorial (cube) presentation of the same design, with the
experimental variable (Treatments A and B), control variable 1 (Levels I and II)
and control variable 2 (Levels I and II) as its three dimensions.]

The dotted-line cell in this diagram corresponds to Cell 1 of the above stated
2x2x2 design and is for Treatment A, level I of control variable 1, and level I
of control variable 2. From this design, it is possible to determine the main
effects of the three variables, i.e., one experimental and two control variables.
The researcher can also determine the interaction between each possible pair of
variables (such interactions are called ‘first order interactions’) and the
interaction between the variables taken in triplets (such interactions are called
‘second order interactions’). In the case of a 2x2x2 design, the following first
order interactions are possible:
Experimental variable with control variable 1 (or EV x CV 1);
Experimental variable with control variable 2 (or EV x CV 2);
Control variable 1 with control variable 2 (or CV 1 x CV 2);
There will be one second order interaction as well in the given design (it is
between all the three variables i.e., EV x CV 1 x CV 2).

To determine the main effect of the experimental variable, the researcher must
compare the combined mean of the data in cells 1, 2, 3 and 4 for Treatment A with
the combined mean of the data in cells 5, 6, 7 and 8 for Treatment B. In this way
the main effect of the experimental variable, independent of control variable 1
and control variable 2, is obtained. Similarly, the main effect of control
variable 1, independent of the experimental variable and control variable 2, is
obtained by comparing the combined mean of the data in cells 1, 3, 5 and 7 with
the combined mean of the data in cells 2, 4, 6 and 8 of our 2x2x2 factorial
design. On similar lines, one can determine the main effect of control variable
2, independent of the experimental variable and control variable 1, by comparing
the combined mean of the data in cells 1, 2, 5 and 6 with the combined mean of
the data in cells 3, 4, 7 and 8.
To obtain the first order interaction, say, for EV x CV 1 in the above
stated design, the researcher must necessarily ignore control variable 2 for which
purpose he may develop 2x2 design from the 2x2x2 design by combining the
data of the relevant cells of the latter design as has been shown on next page:
                       Experimental Variable
                     Treatment A      Treatment B
Control    Level I   Cells 1, 3       Cells 5, 7
Variable 1 Level II  Cells 2, 4       Cells 6, 8

Similarly, the researcher can determine other first order interactions. The
analysis of the first order interaction in the manner described above is essentially
a simple factorial analysis as only two variables are considered at a time and the
remaining ones are ignored. But the analysis of the second order interaction
would not ignore one of the three independent variables in case of a 2x2x2
design. The analysis would be termed as a complex factorial analysis.

It may, however, be remembered that the complex factorial design need not
necessarily be of 2x2x2 type design, but can be generalized to any number and
combinations of experimental and control independent variables. Of course, the
greater the number of independent variables included in a complex factorial
design, the higher the order of the interaction analysis possible. But the overall
task goes on becoming more and more complicated with the inclusion of more
and more independent variables in our design.

Factorial designs are used mainly because of two advantages:
(i) They provide equivalent accuracy (as in experiments with only one factor)
with less labour, and as such are a source of economy. Using factorial designs,
we can determine the effects of two (in a simple factorial design) or more (in a
complex factorial design) factors, or variables, in one single experiment.
(ii) They permit various other comparisons of interest. For example, they give
information about effects which cannot be obtained by treating one single factor
at a time; the determination of interaction effects is possible only in the case
of factorial designs.

Conclusion
There are several research designs, and the researcher must decide, in advance of
the collection and analysis of data, which design would prove to be most
appropriate for his research project. He must give due weight to various points
such as the type of universe and its nature, the objective of the study, the
source list or sampling frame, and the desired standard of accuracy when taking a
decision in respect of the design for his research project.
Observation

Introduction
Observation is a method that employs vision as its main means of data
collection. It implies the use of eyes rather than of ears and the voice. It is
accurate watching and noting of phenomena as they occur with regard to the
cause and effect or mutual relations. It is watching other persons’ behavior as it
actually happens without controlling it. For example, watching bonded
labourer’s life, or treatment of widows and their drudgery at home, provide
graphic description of their social life and sufferings. Observation is also defined
as “a planned methodical watching that involves constraints to improve
accuracy”.
CHARACTERISTICS OF OBSERVATION
Scientific observation differs from other methods of data collection
specifically in four ways: (i) observation is always direct, while other methods
could be direct or indirect; (ii) field observation takes place in a natural
setting; (iii) observation tends to be less structured; and (iv) it generally
makes a qualitative (and not a quantitative) study, which aims at discovering
subjects’ experiences and how subjects make sense of them (phenomenology) or how
subjects understand their life (interpretivism).
Lofland (1955:101-113) has said that this method is more appropriate for
studying lifestyles or sub-cultures, practices, episodes, encounters, relationships,
groups, organizations, settlements and roles etc. Black and Champion
(1976:330) have given the following characteristics of observation:
 Behavior is observed in natural surroundings.
 It enables understanding significant events affecting social relations of the
participants.
 It determines reality from the perspective of observed person himself.
 It identifies regularities and recurrences in social life by comparing data in
our study with that of those in other studies.

Besides, four other characteristics are:

 Observation involves some control pertaining to the observation itself and to
the means the observer uses to record data. However, such controls do not exist
for the setting or the subject population.
 It is focused on hypotheses-free inquiry.
 It avoids manipulations in the independent variable i.e., one that is supposed
to cause other variable(s) and is not caused by them.
 Recording is not selective.
Since at times, observation technique is indistinguishable from
experiment technique, it is necessary to distinguish the two.
(i) Observation involves fewer controls than the experiment technique.
(ii) The behaviour observed in observation is natural, whereas in
experiment it is not always so.
(iii) The behavior observed in experiment is more molecular (of a
smaller unit), while that in observation is molar.
(iv) In observation, fewer subjects are watched for long periods of time
in more varied circumstances than in experiment.
(v) Training required in observation study is directed more towards
sensitizing the observer to the flow of events, whereas training in experiments
serves to sharpen the judgment of the subject.
(vi) In observational study, the behavior observed is more diffused.
Observational methods differ from one another along several variables or
dimensions.

***
Normal Distribution and its properties

The important properties of the normal distribution are:-


1. The normal curve is “bell shaped” and symmetrical in nature. The distribution
of the frequencies on either side of the maximum ordinate of the curve is similar
with each other.
2. The maximum ordinate of the normal curve is at x = µ. Hence the mean,
median and mode of the normal distribution coincide.
3. It ranges from -∞ to +∞.
4. The value of the maximum ordinate is 1/(σ√2π).
5. The points where the curve changes from convex to concave, or vice versa, are
at X = µ ± σ.
6. The first and third quartiles are equidistant from median.
7. The area under the normal curve distribution are:
a) µ ± 1σ covers 68.27% area;
b) µ ± 2σ covers 95.45% area.
c) µ ± 3σ covers 99.73% area.

[Figure: the normal curve, with µ ± 1σ covering 68.27%, µ ± 2σ covering 95.45%
and µ ± 3σ covering 99.73% of the area; the horizontal axis runs from µ - 3σ to
µ + 3σ, i.e., from Z = -3 to Z = +3.]

8. When µ = 0 and σ = 1, the normal distribution becomes the standard normal
curve. The probability function of the standard normal curve is

    P(X) = (1/√2π) e^(-x²/2)

The following table gives the area under the normal probability curve for
some important value of Z.
Distance from the mean ordinate        Area under the curve
in terms of ± σ
Z = ± 0.6745                           0.50
Z = ± 1.0                              0.6826
Z = ± 1.96                             0.95
Z = ± 2.00                             0.9544
Z = ± 2.58                             0.99
Z = ± 3.0                              0.9973

9. All odd moments are equal to zero.


10. Skewness = 0 and Kurtosis = 3 in normal distribution.
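These area figures can be verified with a short Python sketch using the standard normal CDF expressed through math.erf (standard library only):

```python
import math

# A check of the three area figures quoted above, using the standard
# normal CDF written in terms of math.erf.
def phi(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in (1, 2, 3):
    area = phi(k) - phi(-k)
    print(f"mu +/- {k} sigma covers {100 * area:.2f}% of the area")
```

Running this reproduces the 68.27%, 95.45% and 99.73% figures quoted in property 7.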

Illustration: Find the probability that the standard normal value lies between 0
and 1.5

[Figure: shaded area under the standard normal curve between Z = 0 and Z = 1.5.]

At the mean, Z = 0. To find the area between Z = 0 and Z = 1.5, look up the area
for 1.5 in the table: it is 0.4332 (the shaded area), i.e., 43.32%.
Illustration: The results of a particular examination are given below in a
summary form:
Result Percentage of candidates
Passed with distinction 10
Passed 60
Failed 30
It is known that a candidate fails (gets plucked) if he obtains less than 40
marks out of 100, while he must obtain at least 75 marks in order to pass with
distinction. Determine the mean and standard deviation of the distribution of
marks, assuming it to be normal.
Solution:
30% of the students get less than 40 marks, so the area to the left of X = 40 is
0.30 and, from the table, the corresponding value is Z = -0.52. With X̄ denoting
the mean,

    (40 - X̄)/σ = -0.52,  i.e.,  40 - X̄ = -0.52σ ............ (i)

10% of the students get more than 75 marks, so the area to the right of X = 75 is
0.10 and, from the table, Z = 1.28:

    (75 - X̄)/σ = 1.28,  i.e.,  75 - X̄ = 1.28σ ............ (ii)

Subtracting (i) from (ii):

    35 = 1.80σ,  so  σ = 35/1.80 = 19.4

Substituting in (i):

    X̄ = 40 + 0.52σ = 40 + 0.52 x 19.4 = 40 + 10.09 = 50.09

Hence the mean of the distribution is about 50.09 marks and the standard
deviation about 19.4 marks.
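The two simultaneous equations in this solution can be checked numerically; a minimal Python sketch, using the table values -0.52 and 1.28 quoted above, is:

```python
# Solving the pair of equations from the solution above:
#   40 - mean = -0.52 * sigma   ...(i)
#   75 - mean =  1.28 * sigma   ...(ii)
sigma = (75 - 40) / (1.28 + 0.52)   # subtracting (i) from (ii): 35 = 1.80 sigma
mean = 40 + 0.52 * sigma
print(round(sigma, 2), round(mean, 2))
```

This gives sigma of about 19.44 and a mean of about 50.11; the small difference from 50.09 comes only from rounding sigma to 19.4 before substituting.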
Illustration: The scores made by candidates in a certain test are normally
distributed with mean 1000 and standard deviation 200. What percentage of
candidates receive scores (i) less than 800, (ii) between 800 and 1200?
(The area under the curve between Z = 0 and Z = 1 is 0.34134.)
Solution:
X̄ = 1000; σ = 200

    Z = (X - X̄)/σ

(i) For X = 800:

    Z = (800 - 1000)/200 = -1

The area between Z = -1 and Z = 0 is 0.34134, so the area below Z = -1 is
0.5 - 0.34134 = 0.15866. Therefore, the percentage of candidates scoring less
than 800 is 0.15866 x 100 = 15.87%.

(ii) For X = 1200:

    Z = (1200 - 1000)/200 = +1

The area between Z = 0 and Z = +1 is 0.34134, so the area between X = 800 and
X = 1200, i.e., between Z = -1 and Z = +1, is 0.34134 + 0.34134 = 0.68268 =
68.27%.
[Figure: normal curve with mean 1000, showing an area of about 0.1587 below 800
and about 0.6827 between 800 and 1200.]
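A hedged Python check of these percentages, using the exact normal CDF via math.erf rather than the 0.34134 table value:

```python
import math

# A check of the scores example (mean 1000, s.d. 200) using the exact
# standard normal CDF instead of the 0.34134 table value.
def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

below_800 = phi((800 - 1000) / 200)              # P(X < 800) = phi(-1)
between = phi((1200 - 1000) / 200) - below_800   # P(800 < X < 1200)
print(f"{100 * below_800:.2f}% {100 * between:.2f}%")
```

This prints about 15.87% and 68.27%, agreeing with the table-based working to the quoted precision.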

3. TESTING OF HYPOTHESIS

3.1 Test of Significance for Large Samples


The test of significance for large samples rests on the following assumptions:
(i) The random sampling distribution of statistics is approximately normal.
(ii) Sampling values are sufficiently close to the population value and can be
used for the calculation of standard error of estimate.

1. The standard error of mean.


In the case of large samples, when we are testing the significance of a
statistic, the concept of standard error is used. It measures only sampling
errors, i.e., the errors involved in estimating a population parameter from a
sample instead of including all the essential information in the population.

(i) When the standard deviation of the population is known, the formula is

    S.E.X̄ = σp/√n

where S.E.X̄ = the standard error of the mean, σp = the standard deviation of the
population, and n = the number of observations in the sample.
(ii) When the standard deviation of the population is not known, we have to use
the standard deviation of the sample in calculating the standard error of the
mean. The formula is

    S.E.X̄ = σ(sample)/√n

where σ = the standard deviation of the sample, and n = the sample size.
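As a quick illustrative sketch in Python (the sample data below are invented, not from the text), the second formula can be computed with the standard library:

```python
import statistics

# A sketch of formula (ii): the sample data below are invented purely
# to illustrate computing the standard error of the mean when the
# population standard deviation is unknown.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
n = len(sample)
standard_error = statistics.stdev(sample) / n ** 0.5
print(round(standard_error, 3))  # 0.5 for this particular sample
```

statistics.stdev uses the n - 1 divisor, which is the usual sample estimate of the population standard deviation.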

Illustration: A sample of 100 students from Pondicherry University was taken and
their average weight was found to be 116 lbs, with a standard deviation of 20
lbs. Could the mean weight of students in the population be 125 lbs?

Solution:
Let us take the hypothesis that there is no significant difference between the
sample mean and the hypothetical population mean.
    S.E.X̄ = σ/√n = 20/√100 = 20/10 = 2

    Difference/S.E.X̄ = (125 - 116)/2 = 9/2 = 4.5

Since, the difference is more than 2.58 S.E.(1% level) it could not have arisen
due to fluctuations of sampling. Hence the mean weight of students in the
population could not be 125 lbs.
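The arithmetic of this illustration can be re-checked with a few lines of Python:

```python
import math

# Re-checking the arithmetic of the illustration above:
# n = 100 students, sample mean 116 lbs, s.d. 20 lbs, hypothesised
# population mean 125 lbs.
n, sample_mean, sd, mu = 100, 116, 20, 125
standard_error = sd / math.sqrt(n)          # 20 / 10 = 2
z = abs(sample_mean - mu) / standard_error  # 9 / 2 = 4.5
print(standard_error, z)
# Since 4.5 > 2.58 (the 1% level), the hypothesis is rejected.
```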
3.2 Test of Significance for Small Samples
If the sample size is less than 30, then those samples may be regarded as
small samples. As a rule, the methods and the theory of large samples are not
applicable to the small samples. The small samples are used in testing a given
hypothesis, to find out the observed values, which could have arisen by sampling
fluctuations from some values given in advance. In a small sample, the
investigator’s estimate will vary widely from sample to sample. An inference
drawn from a smaller sample result is less precise than the inference drawn from
a large sample result.
The t-distribution is employed when the sample size is 30 or less and the
population standard deviation is unknown. The formula is

    t = [(X̄ - µ)/σ] x √n

where σ = √[Σ(X - X̄)²/(n - 1)]

Illustration: The following results are obtained from a sample of 20 boxes of
mangoes:
Mean weight of contents = 490 gms,
Standard deviation of the weight = 9 gms.
Could the sample come from a population having a mean of 500 gms?

Solution:
Let us take the hypothesis that µ = 500 gms.

    t = [(X̄ - µ)/σ] x √n

X̄ = 490; µ = 500; σ = 9; n = 20; d.f. = 20 - 1 = 19.

    t = [(490 - 500)/9] x √20 = -(10/9) x 4.47 = -4.97,  i.e.,  |t| = 4.97

For 19 d.f., the table value at the 1% level of significance is t0.01 = 2.86.
The computed value is greater than the table value. Hence, our null hypothesis is
rejected: the sample could not have come from a population having a mean of 500
gms.
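A quick Python check of this t computation, using the figures from the problem statement, is:

```python
import math

# Re-checking the t computation with the figures from the problem
# statement: n = 20, sample mean 490 gms, s.d. 9 gms, mu = 500 gms.
n, sample_mean, sd, mu = 20, 490, 9, 500
t = (sample_mean - mu) / sd * math.sqrt(n)
print(round(abs(t), 2))
# |t| is about 4.97, above the commonly tabulated 1% critical value
# for 19 degrees of freedom (about 2.86), so Ho is rejected.
```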

4. CHI-SQUARE TEST
F, t and Z tests were based on the assumption that the samples were drawn from
normally distributed populations. The testing procedure requires assumption
about the type of population or parameters, and these tests are known as
‘parametric tests’.
There are many situations in which it is not possible to make any rigid
assumption about the distribution of the population from which samples are
being drawn. This limitation has led to the development of a group of
alternative techniques known as non-parametric tests. Chi-square test of
independence and goodness of fit is a prominent example of the use of non-
parametric tests.
Though non-parametric theory developed as early as the middle of the
nineteenth century, it was only after 1945 that non-parametric tests came to be
used widely in sociological and psychological research. The main reasons for
the increasing use of non-parametric tests in business research are:-
(i) These statistical tests are distribution-free;
(ii) they are usually computationally easier to handle and understand than
parametric tests; and
(iii) they can be used with types of measurement that prohibit the use of
parametric tests.

The χ2 test is one of the simplest and most widely used non-parametric
tests in statistical work. It is defined as:
    χ² = Σ[(O - E)²/E]
Where O = the observed frequencies, and E = the expected frequencies.
Steps: The steps required to determine the value of χ² are:
(i) Calculate the expected frequencies. In general, the expected frequency for
any cell can be calculated from the following equation:

    E = (R x C)/N

where E = the expected frequency, R = the row total of the respective cell, C =
the column total of the respective cell, and N = the total number of
observations.
(ii) Take the difference between observed and expected frequencies and
obtain the squares of these differences. Symbolically, it can be represented as
(O – E)2

(iii) Divide the values of (O - E)² obtained in step (ii) by the respective
expected frequencies and obtain the total, Σ[(O - E)²/E]. This gives the value of
χ², which can range from zero to infinity. If χ² is zero, the observed and
expected frequencies completely coincide. The greater the discrepancy between the
observed and expected frequencies, the greater the value of χ².

The computed value of χ2 is compared with the table value of χ2 for the given
degrees of freedom at a certain specified level of significance. If, at the
stated level, the calculated value of χ2 is less than the table value, the
difference between theory and observation is not considered significant.
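The steps above can be put into code. The following short Python sketch is an added illustration (not part of the original notes) using a made-up 2x2 table; it computes the expected frequencies by E = (R x C)/N and totals (O – E)2/E:

```python
def chi_square(observed):
    """Compute the chi-square statistic for a contingency table.

    observed: list of rows, each a list of observed cell frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total number of observations N
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E = (R x C) / N
            chi2 += (o - e) ** 2 / e               # sum of (O - E)^2 / E
    return chi2

# Hypothetical 2x2 table of frequencies
print(round(chi_square([[20, 30], [30, 20]]), 2))  # 4.0
```

For the table shown, every expected frequency is 25, so χ2 = 4 x (5²/25) = 4.0.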
The following observations may be made with regard to the χ2
distribution:-

(i) The sum of the differences between the observed and expected frequencies
is always zero. Symbolically, ∑(O – E) = ∑O – ∑E = N – N = 0
(ii) The χ2 test depends only on the set of observed and expected frequencies
and on degrees of freedom v. It is a non-parametric test.
(iii) χ2 distribution is a limiting approximation of the multinomial
distribution.
(iv) Even though χ2 distribution is essentially a continuous distribution it can
be applied to discrete random variables whose frequencies can be counted and
tabulated with or without grouping.
The Chi-Square Distribution
For large sample sizes, the sampling distribution of χ2 can be closely
approximated by a continuous curve known as the Chi-square distribution. The
probability function of the χ2 distribution is:

f(χ2) = C (χ2)^(v/2 – 1) e^(–χ2/2)

where e = 2.71828, v = the number of degrees of freedom, and C = a constant
depending only on v.
The χ2 distribution has only one parameter, v, the number of degrees of
freedom. As in the case of the t-distribution, there is a distribution for
each different number of degrees of freedom. For a very small number of degrees of freedom,
the Chi-square distribution is severely skewed to the right. As the number of
degrees of freedom increases, the curve rapidly becomes more symmetrical. For
large values of v the Chi-square distribution is closely approximated by the
normal curve.
The following diagram gives χ2 distribution for 1, 5 and 10 degrees of
freedom:

[Diagram: curves of the χ2 distribution for v = 1, 5 and 10 degrees of freedom]
χ2 Distribution

It is clear from the given diagram that as the degrees of freedom increase,
the curve becomes more and more symmetric. The Chi-square distribution is a
probability distribution and the total area under the curve in each chi-square
distribution is unity.

Properties of χ2 distribution
The main properties of the χ2 distribution are:-
(i) the mean of the χ2 distribution is equal to the number of degrees of
freedom, i.e., mean = v;
(ii) the variance of the χ2 distribution is twice the degrees of freedom,
i.e., variance = 2v;
(iii) µ1 = 0;
(iv) µ2 = 2v;
(v) µ3 = 8v;
(vi) µ4 = 48v + 12v2;
(vii) β1 = µ32/µ23 = 64v2/8v3 = 8/v;
(viii) β2 = µ4/µ22 = (48v + 12v2)/4v2 = 3 + 12/v.

The table values of χ2 are available only up to 30 degrees of freedom.
For degrees of freedom greater than 30, the distribution of √(2χ2)
approximates the normal distribution, and the approximation is acceptably
close. The mean of the distribution of √(2χ2) is √(2v – 1), and its standard
deviation is equal to 1. Thus the application of the test is simple, for the
deviation of √(2χ2) from √(2v – 1) may be interpreted as a normal deviate
with unit standard deviation. That is,

Z = √(2χ2) – √(2v – 1)
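The moment properties above can be checked numerically. The sketch below is an added illustration (not from the original notes): it builds χ2 variates with v = 10 degrees of freedom as sums of squared standard normal deviates and confirms that the sample mean is near v and the sample variance near 2v.

```python
import random

random.seed(42)

v = 10           # degrees of freedom
trials = 20000

# A chi-square variate with v df is the sum of v squared standard normals.
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(v))
           for _ in range(trials)]

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials

print(round(mean, 1))  # close to v = 10
print(round(var, 1))   # close to 2v = 20
```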
Alternative Method of Obtaining the Value of χ2
In a 2x2 table where the cell frequencies and marginal totals are as below:

a     b     (a+b)
c     d     (c+d)
(a+c) (b+d)   N

N being the total frequency, the value of χ2 can easily be obtained by the
following formula:

χ2 = N(ad – bc)2 / [(a+b)(c+d)(a+c)(b+d)]

or, with Yates' correction,

χ2 = N(|ad – bc| – N/2)2 / [(a+b)(c+d)(a+c)(b+d)]
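For 2x2 tables, the shortcut formula and Yates' correction can be sketched as follows. This is an added illustration; the figures used are those of the tuberculosis example later in this unit.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table using the shortcut formula.

    With yates=True, Yates' continuity correction is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)  # |ad - bc| - N/2, floored at zero
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom

# Tuberculosis illustration: a=10, b=20, c=15, d=5
print(round(chi2_2x2(10, 20, 15, 5), 2))  # 8.33
```

With Yates' correction the same table gives 50 x (250 – 25)² / 375000 = 6.75, still above the 5% table value of 3.84 for one degree of freedom.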

Conditions for applying the χ2 test:

The main conditions considered for employing the χ2 test are:

(i) N must be large enough to ensure the similarity between the
theoretically correct distribution and our sampling distribution of χ2.

(ii) No expected cell frequency should be too small. If it is, the value of
χ2 will be overestimated and will result in too many rejections of the null
hypothesis. To avoid making incorrect inferences, a general rule is that an
expected frequency of less than 5 in one cell of a contingency table is too
small to use. When the table contains more than one cell with an expected
frequency of less than 5, combine it with the preceding or succeeding
category so that the resulting sum is 5 or more. However, in doing so, we
reduce the number of categories of data and gain less information from the
contingency table.

(iii) The constraints on the cell frequencies, if any, should be linear,
i.e., they should not involve squares or higher powers of the frequencies.
An example of a linear constraint is ∑O = ∑E = N.

Uses of χ2 test:

The main uses of χ2 test are:

(i) χ2 test as a test of independence. With the help of χ2 test, we can find
out whether two or more attributes are associated or not. Let’s assume that we
have N observations classified according to some attributes. We may ask
whether the attributes are related or independent. Thus, we can find out whether
there is any association between the skin colour of husband and wife. To
examine whether the attributes are associated, we formulate the null
hypothesis that there is no association against the alternative hypothesis
that there is an association between the attributes under study. If the
calculated value of χ2 is less than the
table value at a certain level of significance, we say that the result of the
experiment provides no evidence for doubting the hypothesis. On the other
hand, if the calculated value of χ2 is greater than the table value at a certain level
of significance, the results of the experiment do not support the hypothesis.

(ii) χ2 test as a test of goodness of fit. The χ2 test enables us to
ascertain how appropriately theoretical distributions such as the binomial,
Poisson, normal, etc., fit empirical distributions. When an ideal frequency
curve whether normal or some other type is fitted to the data, we are interested
in finding out how well this curve fits with the observed facts. A test of the
concordance of the two can be made just by inspection, but such a test is
obviously inadequate. Precision can be secured by applying the χ2 test.
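As a small added illustration of goodness of fit (with made-up frequencies), consider testing whether a die is fair over 120 hypothetical rolls:

```python
def chi2_goodness_of_fit(observed, expected):
    """Chi-square goodness-of-fit statistic for matched frequency lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 120 rolls of a die; a fair die expects 20 per face.
observed = [22, 17, 20, 26, 22, 13]
expected = [20] * 6

chi2 = chi2_goodness_of_fit(observed, expected)
print(round(chi2, 2))  # 5.1
# With v = 6 - 1 = 5 df, the 5% table value is 11.07,
# so the hypothesis of a fair die is not rejected.
```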

(iii) χ2 test as a test of homogeneity. The χ2 test of homogeneity is an
extension of the chi-square test of independence. Tests of homogeneity are
designed to determine whether two or more independent random samples are
drawn from the same population or from different populations. Instead of one
sample, as in the independence problem, we now have two or more samples. For
example, we may be interested in finding out whether or not university
students of various income groups, i.e., poor, middle and rich, are
homogeneous in performance in the examination.
Illustration: In an anti-diabetes campaign in a certain area, a particular
medicine, say x was administered to 812 persons out of a total population of
3248. The number of diabetes cases is shown below:
Treatment Diabetes No Diabetes Total
Medicine x 20 792 812
No Medicine x 220 2216 2436
Total 240 3008 3248
Discuss the usefulness of medicine x in checking diabetes.

Solution: Let us take the hypothesis that medicine x is not effective in
checking diabetes. Applying the χ2 test:

Expectation of (AB) = (A) x (B) / N = (240 x 812) / 3248 = 60

i.e., E1, the expected frequency corresponding to the first row and first
column, is 60. The table of expected frequencies shall be:
60 752 812
180 2256 2436

240 3008 3248

O       E       (O – E)2    (O – E)2/E
20      60        1600        26.667
220     180       1600         8.889
792     752       1600         2.128
2216    2256      1600         0.709
                 ∑[(O – E)2/E] = 38.393

χ2 = ∑[(O – E)2/E] = 38.393


v = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1
For v = 1, the table value of χ2 at the 5% level of significance is 3.84.

The calculated value of χ2 is greater than the table value. The hypothesis is
rejected. Hence medicine x is useful in checking diabetes.

Illustration: In an experiment on immunization of cattle from tuberculosis,
the following results were obtained:
Affected Not affected
Inoculated 10 20
Not inoculated 15 5

Calculate χ2 and discuss the effect of the vaccine in controlling
susceptibility to tuberculosis (5% value of χ2 for one degree of freedom =
3.84).
Solution: Let us take the hypothesis that the vaccine is not effective in
controlling susceptibility to tuberculosis. Applying the χ2 test:

χ2 = N(ad – bc)2 / [(a+b)(c+d)(a+c)(b+d)]
   = 50 (10x5 – 20x15)2 / (30x20x25x25)
   = 8.33

Since the calculated value of χ2 is greater than the table value, the
hypothesis is rejected. We, therefore, conclude that the vaccine is effective
in controlling susceptibility to tuberculosis.
ANALYSIS OF VARIANCE (ANOVA)
Introduction
For managerial decision making, sometimes one has to carry out tests of
significance. The analysis of variance is an effective tool for this purpose. The
objective of the analysis of variance is to test the homogeneity of the means of
different samples.
Definition
According to R.A. Fisher, “Analysis of variance is the separation of variance
ascribable to one group of causes from the variance ascribable to other groups”.
Assumptions of ANOVA
The technique of ANOVA is mainly used for the analysis and interpretation of
data obtained from experiments. This technique is based on three important
assumptions, namely
1. The parent population is normal.
2. The error component is distributed normally with zero mean and
constant variance.
3. The various effects are additive in nature.
The technique of ANOVA essentially consists of partitioning the total variation
in an experiment into components of different sources of variation. These
sources of variations are due to controlled factors and uncontrolled factors.
Since the variation in the sample data is characterized by means of many
components of variation, it can be symbolically represented in the mathematical
form called a linear model for the sample data.
Classification of models
Linear models for the sample data may broadly be classified into three types as
follows:
1. Random effect model
2. Fixed effect model
3. Mixed effect model
In any variance components model, the error component has always
random effects, since it occurs purely in a random manner. All other
components may be either mixed or random.
Random effect model
A model in which each of the factors has random effect (including error effect)
is called a random effect model or simply a random model.
Fixed effect model
A model in which each of the factors has fixed effects, but only the error effect
is random is called a fixed effect model or simply a fixed model.
Mixed effect model
A model in which some of the factors have fixed effects and some others have
random effects is called a mixed effect model or simply a mixed model.
In what follows, we shall restrict ourselves to a fixed effect model.
In a fixed effect model, the main objective is to estimate the effects and
find the measure of variability among each of the factors and finally to find the
variability among the error effects.
The ANOVA technique is mainly based on the linear model, which depends on
the type of data used. There are several types of data in ANOVA, depending
on the number of sources of variation, namely:
One-way classified data,
Two-way classified data,
m-way classified data.
One-way classified data
When the set of observations is distributed over different levels of a single
factor, then it gives one-way classified data.
ANOVA for One-way classified data
Let yij denote the jth observation corresponding to the ith level of factor A
and Yij the corresponding random variate. Define the linear model for the
sample data obtained from the experiment by the equation

yij = µ + ai + eij    (i = 1, 2, ..., k; j = 1, 2, ..., ni)

where µ represents the general mean effect, which is fixed and which
represents the general condition of the experimental units, and ai denotes the
fixed effect due to the ith level of the factor A (i = 1, 2, ..., k); hence
the variation due to ai (i = 1, 2, ..., k) is said to be controlled.
The last component of the model, eij, is a random variable. It is called the
error component and it makes Yij a random variate. The variation in eij is due
to all the uncontrolled factors, and eij is independently, identically and
normally distributed with mean zero and constant variance σ2.
For the realization of the random variate Yij, the expected value of the
general observation yij in the experimental units is given by

E(yij) = µi for all i = 1, 2, ..., k

with yij = µi + eij, where eij is the random error effect due to uncontrolled
factors (i.e., due to chance only).


Here we may expect µi = µ for all i = 1, 2, ..., k if there is no variation
due to the controlled factors. If that is not the case, we have µi ≠ µ for
some i, i.e., µi – µ ≠ 0 for some i.
Suppose µi – µ = ai. Then we have µi = µ + ai for all i = 1, 2, ..., k.
On substitution for µi, the linear model reduces to

yij = µ + ai + eij    (i = 1, 2, ..., k; j = 1, 2, ..., ni)    (1)

The objective of ANOVA is to test the null hypothesis Ho : µi = µ for all
i = 1, 2, ..., k, or equivalently Ho : ai = 0 for all i = 1, 2, ..., k. For
carrying out this test, we need to estimate the unknown parameters µ and ai
for all i = 1, 2, ..., k by the principle of least squares. This can be done
by minimizing the residual sum of squares defined by

E = ∑∑ eij2 = ∑∑ (yij – µ – ai)2,

using (1). The normal equations can be obtained by partially differentiating
E with respect to µ and ai for all i = 1, 2, ..., k and equating the results
to zero. We obtain

G = Nµ + ∑ ni ai    (2)

and Ti = ni µ + ni ai, i = 1, 2, ..., k    (3)

where G is the grand total of all the observations, Ti is the total of the
observations under the ith level of factor A, and N = ∑ ni is the total
number of observations. We see that the number of unknowns (k + 1) is more
than the number of independent equations (k). So, by the theorem on a system
of linear equations, it follows that a unique solution for this system is
not possible.

However, by making the assumption that ∑ ni ai = 0, we can get a unique
solution for µ and ai (i = 1, 2, ..., k). Using this condition in equation
(2), we get

G = Nµ, i.e., µ = G/N.

Therefore, the estimate of µ is given by

µ^ = G/N    (4)

Again, from equation (3), we have

Ti/ni = µ + ai

Hence ai = Ti/ni – µ, and the estimate of ai is given by

ai^ = Ti/ni – µ^, i.e., ai^ = Ti/ni – G/N    (5)

Substituting the least squares estimates µ^ and ai^ in the residual sum of
squares, we get

E = ∑∑ (yij – µ^ – ai^)2

After carrying out some calculations and using the normal equations (2) and
(3), we obtain

E = (∑∑ yij2 – G2/N) – (∑ Ti2/ni – G2/N)    (6)

The first term on the RHS of equation (6) is called the corrected total sum
of squares, while ∑∑ yij2 is called the uncorrected total sum of squares.

For measuring the variation due to treatment (the controlled factor), we
consider the null hypothesis that all the treatment effects are equal, i.e.,

Ho : µ1 = µ2 = ... = µk = µ
i.e., Ho : µi = µ for all i = 1, 2, ..., k
i.e., Ho : µi – µ = 0 for all i = 1, 2, ..., k
i.e., Ho : ai = 0 for all i = 1, 2, ..., k.

Under this hypothesis, the model reduces to

yij = µ + eij    (i = 1, 2, ..., k; j = 1, 2, ..., ni)

Proceeding as before, we get the residual sum of squares for this
hypothetical model as

E1 = ∑∑ yij2 – G2/N    (7)
Actually, E1 contains the variation due to both treatment and error.
Therefore a measure of the variation due to treatment can be obtained as
E1 – E. Using (6) and (7), we get

E1 – E = ∑ Ti2/ni – G2/N    (8)

The expression in (8) is usually called the corrected treatment sum of
squares, while the term ∑ Ti2/ni is called the uncorrected treatment sum of
squares. Here it may be noted that G2/N is a correction factor (also called
a correction term).
Since E is based on N – k free observations, it has N – k degrees of freedom
(df). Similarly, since E1 is based on N – 1 free observations, E1 has N – 1
degrees of freedom. So E1 – E has k – 1 degrees of freedom.
When the null hypothesis is actually true, if we reject it on the basis of
the estimated value in our statistical analysis, we will be committing a
Type I error. The probability of committing this error is referred to as the
level of significance, denoted by α. The testing of the null hypothesis Ho
may be carried out by the F test. For a given α, we have

F = TrMSS/EMSS = [TrSS/(k – 1)] / [ESS/(N – k)]

i.e., F follows the F distribution with degrees of freedom k – 1 and N – k.
All these values are represented in the form of a table called the ANOVA
table, furnished below.

ANOVA Table for one-way classified data

Source of          Degrees of   Sum of Squares (SS)     Mean Squares (MS)   Variance ratio F
Variation          freedom
Between the        k – 1        QT = E1 – E             MT = QT/(k – 1)     F = MT/ME,
levels of the                      = ∑ Ti2/ni – G2/N                        which follows
factor (Treatment)                                                          F(k – 1, N – k)
Within the         N – k        QE = E                  ME = QE/(N – k)     -
levels of the                   (by subtraction)
factor (Error)
Total              N – 1        Q = ∑∑ yij2 – G2/N      -                   -

Variance ratio
The variance ratio is the ratio of the greater variance to the smaller
variance. It is also called the F-coefficient. We have
F = Greater variance / Smaller variance.
We refer to the table of F values at a desired level of significance α. In
general, α is taken to be 5%. The table value is referred to as the
theoretical value or the expected value. The calculated value is referred to
as the observed value.
Inference
If the observed value of F is less than the expected value of F (i.e.,
Fo < Fe) for the given level of significance α, then the null hypothesis Ho
is accepted. In this case, we conclude that there is no significant
difference between the treatment effects.
On the other hand, if the observed value of F is greater than the expected
value of F (i.e., Fo > Fe) for the given level of significance α, then the
null hypothesis Ho is rejected. In this case, we conclude that the treatment
effects are not all equal.
Note: If the calculated value of F and the table value of F are equal, we
can try some other value of α.
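The whole one-way procedure, from the correction factor through the sums of squares to the variance ratio, can be sketched in Python. This is an added illustration, not part of the original notes; the data are the sales figures of Problem 1 below.

```python
def one_way_anova(samples):
    """One-way ANOVA by the shortcut method.

    samples: list of lists, one list per level of the factor.
    Returns (SS between, SS within, F ratio).
    """
    n_total = sum(len(s) for s in samples)
    k = len(samples)
    grand_total = sum(sum(s) for s in samples)
    cf = grand_total ** 2 / n_total                     # correction factor G^2/N
    ss_total = sum(x ** 2 for s in samples for x in s) - cf
    ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)                   # here the greater variance
    ms_within = ss_within / (n_total - k)
    return ss_between, ss_within, ms_between / ms_within

# Sales figures of the three sales persons in Problem 1
ssb, ssw, f = one_way_anova([[8, 9, 5, 10], [7, 6, 6, 9], [6, 6, 7, 5]])
print(ssb, ssw, round(f, 4))  # 8.0 22.0 1.6364
```

With unrounded mean squares the ratio is 36/22 ≈ 1.6364; the notes obtain 1.6393 because 22/9 is first rounded to 2.44. Either way F is well below the 5% table value of 4.26, so the null hypothesis is accepted.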
Problem 1
The following are the details of sales effected by three sales persons in three
door-to-door campaigns.
Sales person Sales in door – to – door campaign
A 8 9 5 10
B 7 6 6 9
C 6 6 7 5
Construct an ANOVA table and find out whether there is any significant
difference in the performance of the sales persons.
Solution:
Method I (Direct method):
∑A = 8 + 9 + 5 + 10 = 32
∑B = 7 + 6 + 6 + 9 = 28
∑C = 6 + 6 + 7 + 5 = 24
Sample mean for A = 32/4 = 8
Sample mean for B = 28/4 = 7
Sample mean for C = 24/4 = 6
Total number of sample items = No. of items for A + No. of items for B +
No. of items for C = 4 + 4 + 4 = 12
Grand mean of all the samples X = (32 + 28 + 24)/12 = 84/12 = 7

Sum of squares of deviations for A:

A     A – 8     (A – 8)2
8       0          0
9       1          1
5      -3          9
10      2          4
                  14

Sum of squares of deviations for B:

B     B – 7     (B – 7)2
7       0          0
6      -1          1
6      -1          1
9       2          4
                   6

Sum of squares of deviations for C:

C     C – 6     (C – 6)2
6       0          0
6       0          0
7       1          1
5      -1          1
                   2

Sum of squares of deviations within varieties = 14 + 6 + 2 = 22

Sum of squares of deviations for total variance:

Sales person   Sales   Sales – X = Sales – 7   (Sales – 7)2
A                8              1                    1
A                9              2                    4
A                5             -2                    4
A               10              3                    9
B                7              0                    0
B                6             -1                    1
B                6             -1                    1
B                9              2                    4
C                6             -1                    1
C                6             -1                    1
C                7              0                    0
C                5             -2                    4
                                                    30

ANOVA Table

Source of variation   Degrees of freedom   Sum of squares   Variance
Between varieties        3 – 1 = 2               8           8/2 = 4
Within varieties        12 – 3 = 9              22          22/9 = 2.44
Total                   12 – 1 = 11             30

Calculation of F value:
F = Greater variance / Smaller variance = 4.00/2.44 = 1.6393
Degrees of freedom for greater variance (df1) = 2
Degrees of freedom for smaller variance (df2) = 9
Let us take the level of significance as 5%.
The table value of F = 4.26

Inference:
The calculated value of F is less than the table value of F. Therefore, the null
hypothesis is accepted. It is concluded that there is no significant difference in
the performance of the sales persons, at 5% level of significance.
Method II (Short cut method):
∑A = 32, ∑B = 28, ∑C = 24.
T = Sum of all the sample items = ∑A + ∑B + ∑C = 32 + 28 + 24 = 84
N = Total number of items in all the samples = 4 + 4 + 4 = 12
Correction Factor = T2/N = 842/12 = 7056/12 = 588
Calculate the sum of squares of the observed values as follows:

Sales Person    X     X2
A               8     64
A               9     81
A               5     25
A              10    100
B               7     49
B               6     36
B               6     36
B               9     81
C               6     36
C               6     36
C               7     49
C               5     25
                     618

Sum of squares of deviations for total variance = ∑X2 – correction factor
= 618 – 588 = 30.

Sum of squares of deviations for variance between samples
= (∑A)2/N1 + (∑B)2/N2 + (∑C)2/N3 – CF
= 322/4 + 282/4 + 242/4 – 588
= 1024/4 + 784/4 + 576/4 – 588
= 256 + 196 + 144 – 588
= 8
ANOVA Table

Source of variation   Degrees of Freedom   Sum of squares   Variance
Between varieties        3 – 1 = 2               8           8/2 = 4
Within varieties        12 – 3 = 9              22          22/9 = 2.44
Total                   12 – 1 = 11             30

It is to be noted that the ANOVA tables in the methods I and II are one and the
same. For the further steps of calculation of F value and drawing inference,
refer to method I.
Problem 2
The following are the details of plinth areas of ownership apartment flats offered
by 3 housing companies A,B,C. Use analysis of variance to determine whether
there is any significant difference in the plinth areas of the apartment flats.
Housing Company Plinth area of apartment flats
A 1500 1430 1550 1450
B 1450 1550 1500 1480
C 1550 1420 1450 1430

Note: As the given figures are large, working with them will be difficult.
Therefore, we use the following facts:
i. Variance ratio is independent of the change of origin.
ii. Variance ratio is independent of the change of scale.
In the problem under consideration, the numbers vary from 1420 to 1600. So
we follow a method called the coding method. First, let us subtract 1400
from each item. We get the following transformed data:
Company   Transformed measurement
A         100    30   150    50
B          50   150   100    80
C         150    20    50    30

Next, divide each entry by 10. The transformed data are given below.
Company Transformed measurement
A 10 3 15 5
B 5 15 10 8
C 15 2 5 3
We work with these transformed data. We have
∑A = 10 + 3 + 15 + 5 = 33
∑B = 5 + 15 + 10 + 8 = 38
∑C = 15 + 2 + 5 + 3 = 25
T = ∑A + ∑B + ∑C = 33 + 38 + 25 = 96
N = Total number of items in all the samples = 4 + 4 + 4 = 12
Correction Factor = T2/N = 962/12 = 9216/12 = 768
Calculate the sum of squares of the observed values as follows:
Company X X2
A 10 100
A 3 9
A 15 225
A 5 25
B 5 25
B 15 225
B 10 100
B 8 64
C 15 225
C 2 4
C 5 25
C 3 9
1036

Sum of squares of deviations for total variance = ∑X2 – correction factor
= 1036 – 768 = 268


Sum of squares of deviations for variance between samples
= (∑A)2/N1 + (∑B)2/N2 + (∑C)2/N3 – CF
= 332/4 + 382/4 + 252/4 – 768
= 1089/4 + 1444/4 + 625/4 – 768
= 272.25 + 361 + 156.25 – 768
= 789.5 – 768
= 21.5
ANOVA Table

Source of variation   Degrees of Freedom   Sum of squares   Variance
Between varieties        3 – 1 = 2             21.5         21.5/2 = 10.75
Within varieties        12 – 3 = 9            246.5        246.5/9 = 27.39
Total                   12 – 1 = 11           268
Calculation of F value:
F = Greater variance / Smaller variance = 27.39/10.75 = 2.548
Degrees of freedom for greater variance (df1) = 9
Degrees of freedom for smaller variance (df2) = 2
The table value of F at 5% level of significance = 19.38

Inference:
Since the calculated value of F is less than the table value of F, the null
hypothesis is accepted and it is concluded that there is no significant difference
in the plinth areas of ownership apartment flats offered by the three companies,
at 5% level of significance.
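Facts (i) and (ii) behind the coding method can be verified numerically. The following self-contained sketch (an added illustration) computes the variance ratio for the coded data and for data shifted back to the original origin and scale, and shows that the two ratios agree:

```python
def f_ratio(samples):
    """Variance ratio F = greater variance / smaller variance (one-way ANOVA)."""
    n = sum(len(s) for s in samples)
    k = len(samples)
    cf = sum(sum(s) for s in samples) ** 2 / n          # correction factor T^2/N
    ss_total = sum(x ** 2 for s in samples for x in s) - cf
    ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    v_between = ss_between / (k - 1)
    v_within = (ss_total - ss_between) / (n - k)
    return max(v_between, v_within) / min(v_between, v_within)

# Coded plinth-area data from Problem 2
coded = [[10, 3, 15, 5], [5, 15, 10, 8], [15, 2, 5, 3]]
# Undo the coding: multiply by 10 and add back the origin 1400
original_scale = [[1400 + 10 * x for x in s] for s in coded]

print(round(f_ratio(coded), 3))                              # 2.548
print(abs(f_ratio(coded) - f_ratio(original_scale)) < 1e-9)  # True
```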

Problem 3
A finance manager has collected the following information on the performance
of three financial schemes.
Source of variation   Degrees of Freedom   Sum of squares of deviations
Treatments                   2                      15
Residual                     5                      25
Total (corrected)            7                      40

Interpret the information obtained by him.


Note: ‘Treatments’ means ‘Between varieties’.
‘Residual’ means ‘Within varieties’ or ‘Error’.
Solution:
Number of schemes = 3 (since 3 – 1 = 2)
Total number of sample items = 8 (since 8 – 1 = 7)
Let us calculate the variances.
Variance between varieties (treatments) = 15/2 = 7.5
Variance within varieties (residual) = 25/5 = 5
F = Greater variance / Smaller variance = 7.5/5 = 1.5
Degrees of freedom for greater variance (df1) = 2
Degrees of freedom for smaller variance (df2) = 5


The table value of F at 5% level of significance = 5.79

Inference:
Since the calculated value of F is less than the table value of F, we accept
the null hypothesis and conclude that there is no significant difference in
the performance of the three financial schemes.

Statistical Table-1: F-values at 1% level of significance

df1: degrees of freedom for greater variance
df2: degrees of freedom for smaller variance

df2/df1 1 2 3 4 5 6 7 8 9 10
1 4052.1 4999.5 5403.3 5624.5 5763.6 5858.9 5928.3 5981.0 6022.4 6055.8
2 98.5 99.0 99.1 99.2 99.2 99.3 99.3 99.3 99.3 99.3
3 34.1 30.8 29.4 28.7 28.2 27.9 27.6 27.4 27.3 27.2
4 21.1 18.0 16.6 15.9 15.5 15.2 14.9 14.7 14.6 14.5
5 16.2 13.2 12.0 11.3 10.9 10.6 10.4 10.2 10.1 10.0
6 13.7 10.9 9.7 9.1 8.7 8.4 8.2 8.1 7.9 7.8
7 12.2 9.5 8.4 7.8 7.4 7.1 6.9 6.8 6.7 6.6
8 11.2 8.6 7.5 7.0 6.6 6.3 6.1 6.0 5.9 5.8
9 10.5 8.0 6.9 6.4 6.0 5.8 5.6 5.4 5.3 5.2
10 10.0 7.5 6.5 5.9 5.6 5.3 5.2 5.0 4.9 4.8
11 9.6 7.2 6.2 5.6 5.3 5.0 4.8 4.7 4.6 4.5
12 9.3 6.9 5.9 5.4 5.0 4.8 4.6 4.4 4.3 4.2
13 9.0 6.7 5.7 5.2 4.8 4.6 4.4 4.3 4.1 4.1
14 8.8 6.5 5.5 5.0 4.6 4.4 4.2 4.1 4.0 3.9
15 8.6 6.3 5.4 4.8 4.5 4.3 4.1 4.0 3.8 3.8
16 8.5 6.2 5.2 4.7 4.4 4.2 4.0 3.8 3.7 3.6
17 8.4 6.1 5.1 4.6 4.3 4.1 3.9 3.7 3.6 3.5
18 8.2 6.0 5.0 4.5 4.2 4.0 3.8 3.7 3.5 3.5
19 8.1 5.9 5.0 4.5 4.1 3.9 3.7 3.6 3.5 3.4
20 8.0 5.8 4.9 4.4 4.1 3.8 3.6 3.5 3.4 3.3
21 8.0 5.7 4.8 4.3 4.0 3.8 3.6 3.5 3.3 3.3
22 7.9 5.7 4.8 4.3 3.9 3.7 3.5 3.4 3.3 3.2
23 7.8 5.6 4.7 4.2 3.9 3.7 3.5 3.4 3.2 3.2
24 7.8 5.6 4.7 4.2 3.8 3.6 3.4 3.3 3.2 3.1
25 7.7 5.5 4.6 4.1 3.8 3.6 3.4 3.3 3.2 3.1
26 7.7 5.5 4.6 4.1 3.8 3.5 3.4 3.2 3.1 3.0
27 7.6 5.4 4.6 4.1 3.7 3.5 3.3 3.2 3.1 3.0
28 7.6 5.4 4.5 4.0 3.7 3.5 3.3 3.2 3.1 3.0
29 7.5 5.4 4.5 4.0 3.7 3.4 3.3 3.1 3.0 3.0
30 7.5 5.3 4.5 4.0 3.6 3.4 3.3 3.1 3.0 2.9

Statistical Table-2: F-values at 2.5% level of significance

df1: degrees of freedom for greater variance

df2: degrees of freedom for smaller variance

df2/df1 1 2 3 4 5 6 7 8 9 10
1 647.7 799.5 864.1 899.5 921.8 937.1 948.2 956.6 963.2 968.6
2 38.5 39.0 39.1 39.2 39.2 39.3 39.3 39.3 39.3 39.3
3 17.4 16.0 15.4 15.1 14.8 14.7 14.6 14.5 14.4 14.4
4 12.2 10.6 9.9 9.6 9.3 9.1 9.0 8.9 8.9 8.8
5 10.0 8.4 7.7 7.3 7.1 6.9 6.8 6.7 6.6 6.6
6 8.8 7.2 6.5 6.2 5.9 5.8 5.6 5.5 5.5 5.4
7 8.0 6.5 5.8 5.5 5.2 5.1 4.9 4.8 4.8 4.7
8 7.5 6.0 5.4 5.0 4.8 4.6 4.5 4.4 4.3 4.2
9 7.2 5.7 5.0 4.7 4.4 4.3 4.1 4.1 4.0 3.9
10 6.9 5.4 4.8 4.4 4.2 4.0 3.9 3.8 3.7 3.7
11 6.7 5.2 4.6 4.2 4.0 3.8 3.7 3.6 3.5 3.5
12 6.5 5.0 4.4 4.1 3.8 3.7 3.6 3.5 3.4 3.3
13 6.4 4.9 4.3 3.9 3.7 3.6 3.4 3.3 3.3 3.2
14 6.2 4.8 4.2 3.8 3.6 3.5 3.3 3.2 3.2 3.1
15 6.1 4.7 4.1 3.8 3.5 3.4 3.2 3.1 3.1 3.0
16 6.1 4.6 4.0 3.7 3.5 3.3 3.2 3.1 3.0 2.9
17 6.0 4.6 4.0 3.6 3.4 3.2 3.1 3.0 2.9 2.9
18 5.9 4.5 3.9 3.6 3.3 3.2 3.0 3.0 2.9 2.8
19 5.9 4.5 3.9 3.5 3.3 3.1 3.0 2.9 2.8 2.8
20 5.8 4.4 3.8 3.5 3.2 3.1 3.0 2.9 2.8 2.7
21 5.8 4.4 3.8 3.4 3.2 3.0 2.9 2.8 2.7 2.7

22 5.7 4.3 3.7 3.4 3.2 3.0 2.9 2.8 2.7 2.7
23 5.7 4.3 3.7 3.4 3.1 3.0 2.9 2.8 2.7 2.6
24 5.7 4.3 3.7 3.3 3.1 2.9 2.8 2.7 2.7 2.6
25 5.6 4.2 3.6 3.3 3.1 2.9 2.8 2.7 2.6 2.6
26 5.6 4.2 3.6 3.3 3.1 2.9 2.8 2.7 2.6 2.5
27 5.6 4.2 3.6 3.3 3.0 2.9 2.8 2.7 2.6 2.5
28 5.6 4.2 3.6 3.2 3.0 2.9 2.7 2.6 2.6 2.5
29 5.5 4.2 3.6 3.2 3.0 2.8 2.7 2.6 2.5 2.5
30 5.5 4.1 3.5 3.2 3.0 2.8 2.7 2.6 2.5 2.5

Statistical Table-3: F-values at 5% level of significance

df1: degrees of freedom for greater variance

df2: degrees of freedom for smaller variance


df2/df1 1 2 3 4 5 6 7 8 9 10
1 161.4 199.5 215.7 224.5 230.1 233.9 236.7 238.8 240.5 241.8
2 18.5 19.0 19.1 19.2 19.2 19.3 19.3 19.3 19.3 19.3
3 10.1 9.5 9.2 9.1 9.0 8.9 8.8 8.8 8.8 8.7
4 7.7 6.9 6.5 6.3 6.2 6.1 6.0 6.0 5.9 5.9
5 6.6 5.7 5.4 5.1 5.0 4.9 4.8 4.8 4.7 4.7
6 5.9 5.1 4.7 4.5 4.3 4.2 4.2 4.1 4.0 4.0
7 5.5 4.7 4.3 4.1 3.9 3.8 3.7 3.7 3.6 3.6
8 5.3 4.4 4.0 3.8 3.6 3.5 3.5 3.4 3.3 3.3
9 5.1 4.2 3.8 3.6 3.4 3.3 3.2 3.2 3.1 3.1
10 4.9 4.1 3.7 3.4 3.3 3.2 3.1 3.0 3.0 2.9
11 4.8 3.9 3.5 3.3 3.2 3.0 3.0 2.9 2.8 2.8
12 4.7 3.8 3.4 3.2 3.1 2.9 2.9 2.8 2.7 2.7
13 4.6 3.8 3.4 3.1 3.0 2.9 2.8 2.7 2.7 2.6
14 4.6 3.7 3.3 3.1 2.9 2.8 2.7 2.6 2.6 2.6
15 4.5 3.6 3.2 3.0 2.9 2.7 2.7 2.6 2.5 2.5
16 4.4 3.6 3.2 3.0 2.8 2.7 2.6 2.5 2.5 2.4
17 4.4 3.5 3.1 2.9 2.8 2.6 2.6 2.5 2.4 2.4
18 4.4 3.5 3.1 2.9 2.7 2.6 2.5 2.5 2.4 2.4
19 4.3 3.5 3.1 2.8 2.7 2.6 2.5 2.4 2.4 2.3
20 4.3 3.4 3.0 2.8 2.7 2.5 2.5 2.4 2.3 2.3
21 4.3 3.4 3.0 2.8 2.6 2.5 2.4 2.4 2.3 2.3
22 4.3 3.4 3.0 2.8 2.6 2.5 2.4 2.4 2.3 2.3
23 4.2 3.4 3.0 2.7 2.6 2.5 2.4 2.3 2.3 2.2
24 4.2 3.4 3.0 2.7 2.6 2.5 2.4 2.3 2.3 2.2
25 4.2 3.3 2.9 2.7 2.6 2.4 2.4 2.3 2.2 2.2
26 4.2 3.3 2.9 2.7 2.5 2.4 2.3 2.3 2.2 2.2
27 4.2 3.3 2.9 2.7 2.5 2.4 2.3 2.3 2.2 2.2
28 4.1 3.3 2.9 2.7 2.5 2.4 2.3 2.2 2.2 2.1
29 4.1 3.3 2.9 2.7 2.5 2.4 2.3 2.2 2.2 2.1
30 4.1 3.3 2.9 2.6 2.5 2.4 2.3 2.2 2.2 2.1
Statistical Table-4: F-values at 10% level of significance

df1: degrees of freedom for greater variance
df2: degrees of freedom for smaller variance

df2/df1 1 2 3 4 5 6 7 8 9 10
1 39.8 49.5 53.5 55.8 57.2 58.2 58.9 59.4 59.8 60.1
2 8.5 9.0 9.1 9.2 9.2 9.3 9.3 9.3 9.3 9.3
3 5.5 5.4 5.3 5.3 5.3 5.2 5.2 5.2 5.2 5.2
4 4.5 4.3 4.1 4.1 4.0 4.0 3.9 3.9 3.9 3.9
5 4.0 3.7 3.6 3.5 3.4 3.4 3.3 3.3 3.3 3.2
6 3.7 3.4 3.2 3.1 3.1 3.0 3.0 2.9 2.9 2.9
7 3.5 3.2 3.0 2.9 2.8 2.8 2.7 2.7 2.7 2.7
8 3.4 3.1 2.9 2.8 2.7 2.6 2.6 2.5 2.5 2.5
9 3.3 3.0 2.8 2.6 2.6 2.5 2.5 2.4 2.4 2.4
10 3.2 2.9 2.7 2.6 2.5 2.4 2.4 2.3 2.3 2.3
11 3.2 2.8 2.6 2.5 2.4 2.3 2.3 2.3 2.2 2.2
12 3.1 2.8 2.6 2.4 2.3 2.3 2.2 2.2 2.2 2.1
13 3.1 2.7 2.5 2.4 2.3 2.2 2.2 2.1 2.1 2.1
14 3.1 2.7 2.5 2.3 2.3 2.2 2.1 2.1 2.1 2.0
15 3.0 2.6 2.4 2.3 2.2 2.2 2.1 2.1 2.0 2.0
16 3.0 2.6 2.4 2.3 2.2 2.1 2.1 2.0 2.0 2.0
17 3.0 2.6 2.4 2.3 2.2 2.1 2.1 2.0 2.0 2.0
18 3.0 2.6 2.4 2.2 2.1 2.1 2.0 2.0 2.0 1.9
19 2.9 2.6 2.3 2.2 2.1 2.1 2.0 2.0 1.9 1.9
20 2.9 2.5 2.3 2.2 2.1 2.0 2.0 1.9 1.9 1.9
21 2.9 2.5 2.3 2.2 2.1 2.0 2.0 1.9 1.9 1.9
22 2.9 2.5 2.3 2.2 2.1 2.0 2.0 1.9 1.9 1.9
23 2.9 2.5 2.3 2.2 2.1 2.0 1.9 1.9 1.9 1.8
24 2.9 2.5 2.3 2.1 2.1 2.0 1.9 1.9 1.9 1.8
25 2.9 2.5 2.3 2.1 2.0 2.0 1.9 1.9 1.8 1.8
26 2.9 2.5 2.3 2.1 2.0 2.0 1.9 1.9 1.8 1.8
27 2.9 2.5 2.2 2.1 2.0 2.0 1.9 1.9 1.8 1.8
28 2.8 2.5 2.2 2.1 2.0 1.9 1.9 1.9 1.8 1.8
29 2.8 2.4 2.2 2.1 2.0 1.9 1.9 1.8 1.8 1.8
30 2.8 2.4 2.2 2.1 2.0 1.9 1.9 1.8 1.8 1.8
WHAT IS A REPORT?
A report is a written document on a particular topic, which conveys
information and ideas and may also make recommendations. Reports often form
the basis of crucial decision making. Inaccurate, incomplete and poorly written
reports fail to achieve their purpose and reflect on the decision, which will
ultimately be made. This will also be the case if the report is excessively long,
jargonistic and/or structureless. A good report can be written by keeping the
following features in mind:
1. All points in the report should be clear to the intended reader.
2. The report should be concise with information kept to a necessary
minimum and arranged logically under various headings and sub-headings.
3. All information should be correct and supported by evidence.
4. All relevant material should be included in a complete report.

Purpose of Research Report


1. Why am I writing this report? Do I want to inform, explain or
persuade, or indeed all of these?
2. Who is going to read this report? Managers, academicians or
researchers? What do they already know? What do they need to know? Do any
of them have certain attitudes or prejudices?
3. What resources do we have? Do I have access to a computer? Do I
have enough time? Can any of my colleagues help?
4. Think about the content of your report – what am I going to put in it?
What are my main themes? How much should be the text, and how much should
be the illustrations?
Framework of a Report
Various frameworks can be used, depending on the content of the report,
but the same general rules apply: an abstract at the beginning; an introduction,
method, results and discussion in the body; and references or a bibliography at
the end.
STRUCTURE OF A REPORT
Structure your writing around the IMR&D framework and you will
ensure a beginning, middle and end to your report.

I   Introduction   Why did I do this research? (beginning)
M   Method         What did I do and how did I go about doing it? (middle)
R   Results        What did I find? (middle)
D   Discussion     What does it all mean? (end)

What do I put in the beginning part?

TITLE PAGE: Title of project, sub-title (where appropriate), date, author,
organization, logo.
BACKGROUND: History (if any) behind the project.
ACKNOWLEDGEMENT: The author thanks the people and organizations who
helped during the project.
SUMMARY (sometimes called the abstract or synopsis): A condensed version
of the report; it outlines the salient points and emphasizes the main
conclusions and (where appropriate) the main recommendations. N.B. this is
often difficult to write, and it is suggested that you write it last.
LIST OF CONTENTS: An at-a-glance list that tells the reader what is in the
report and what page number(s) to find it on.
LIST OF TABLES: As above, specifically for tables.
LIST OF APPENDICES: As above, specifically for appendices.
INTRODUCTION: The author sets the scene and states his/her intentions.
AIMS AND OBJECTIVES: Aims are the general aims of the audit/project, a
broad statement of intent; objectives are the specific things it is expected to
do or deliver (e.g. expected outcomes).

What do I put in the middle part?

METHOD: Work steps; what was done, how, by whom and when?
RESULTS/FINDINGS: An honest presentation of the findings, whether these
were as expected or not. Give the facts, including any inconsistencies or
difficulties encountered.
What do I put in the end part?

DISCUSSION: Explanation of the results. (You might like to keep a SWOT
analysis in mind and think about your project's strengths, weaknesses,
opportunities and threats as you write.)
CONCLUSIONS: The author links the results/findings with the points made in
the introduction and strives to reach clear, simply stated and unbiased
conclusions. Make sure they are fully supported by the evidence and
arguments of the main body of your audit/project.
RECOMMENDATIONS: The author states what specific actions should be
taken, by whom and why. They must always be linked to the future, should
always be realistic, and should not be made unless asked for.
REFERENCES: A section of the report which provides full details of the
publications mentioned in the text, or from which extracts have been quoted.
APPENDIX: The purpose of an appendix is to supplement the information
contained in the main body of the report.

PRACTICAL REPORTS VS. ACADEMIC REPORTS


Practical Reports:
In the practical world of business or government, a report conveys
information and (sometimes) recommendations from a researcher who has
investigated a topic in detail. A report like this will usually be requested
by people who need the information for a specific purpose, and their
request may be written as terms of reference or a brief. Whatever the
report, it is important to look at the instructions for what is wanted. A
report like this differs from an essay in that it is designed to provide
information which will be acted on, rather than to be read by people
interested in the ideas for their own sake. Because of this, it has a different
structure and layout.

Academic Reports:
A report written for an academic course can be thought of as a
simulation. We can imagine that someone wants the report for a practical
purpose, although we are really writing the report as an academic exercise for
assessment. Theoretical ideas will be more to the fore in an academic report
than in a practical one. Sometimes a report has to serve both academic and
practical purposes. Students on placement with organizations often have to
produce a report for the organization and for assessment on the course.
Although the background work for both will be related, in practice, the report the
student produces for academic assessment will be different from the report
produced for the organization, because the needs of each are different.

RESEARCH REPORT: PRELIMINARIES


It is not sensible to leave all your writing until the end. There is always
the possibility that it will take much longer than you anticipate and you will not
have enough time. There could also be pressure upon available word processors
as other students try to complete their own reports. It is wise to begin writing up
some aspects of your research as you go along. Remember that you do not have
to write your report in the order in which it will be read. Often it is easiest to start
with the method section. Leave the introduction and the abstract to last. The
use of a word processor makes it very straightforward to modify and rearrange
what you have written as your research progresses and your ideas change. The
very process of writing will help your ideas to develop. Last but by no means
least, ask someone to proofread your work.

STRUCTURE OF A RESEARCH REPORT


A research report has a different structure and layout in comparison to a
project report. A research report is for reference and is often quite a long
document. It has to be clearly structured for the readers to quickly find the
information wanted. It needs to be planned carefully to make sure that the
information given in the report is put under correct headings.

PARTS OF RESEARCH REPORT


Cover sheet: This should contain some or all of the following:
Full title of the report
Name of the researcher
Name of the unit of which the project is a part
Name of the institution
Date/Year.

Title page: Full title of the report and your name.

Acknowledgement: A thank you to the people who helped you.

Contents: Headings and sub-headings used in the report should be given with
their page numbers.

List of Tables

Each chapter should begin on a new page. Use a consistent
system in dividing the report into parts. The simplest may be to use chapters for
each major part and to subdivide these into sections and sub-sections. 1, 2, 3, etc.
can be used as the numbers for each chapter. The sections of chapter 3 (for
example) would be 3.1, 3.2, 3.3, and so on. For further sub-division of a sub-
section you may use 3.2.1, 3.2.2, and so on.
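The decimal numbering scheme described above can be sketched as a small
routine. This is only an illustrative helper (the function name and the sample
chapter titles are assumptions, not part of any standard), showing how chapter,
section and sub-section numbers such as 3, 3.1 and 3.2.1 are generated from a
nested outline:

```python
def number_sections(outline, prefix=""):
    """Recursively assign decimal numbers (1, 1.1, 1.1.1, ...) to a
    nested outline of (title, children) pairs, mirroring the
    chapter/section/sub-section scheme described above."""
    numbered = []
    for i, (title, children) in enumerate(outline, start=1):
        number = f"{prefix}{i}"
        numbered.append(f"{number} {title}")
        # Sub-sections inherit the parent number followed by a dot.
        numbered.extend(number_sections(children, prefix=number + "."))
    return numbered

# Illustrative outline only.
chapters = [
    ("Introduction", []),
    ("Review of Literature", []),
    ("Methodology", [("Data Collection", [("Primary Sources", [])])]),
]
for line in number_sections(chapters):
    print(line)
```

Running the sketch prints "3 Methodology" followed by "3.1 Data Collection"
and "3.1.1 Primary Sources", matching the numbering convention above.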
Abstract or Summary or Executive Summary:
This presents an overview of the whole report. It should let the reader see,
in advance, what is in the report: what you set out to do, how the
review of literature is focused and narrowed in your research, the relation of the
methodology you chose to your objectives, a summary of your findings, and your
analysis of the findings.

BODY
Aims and Purpose or Aims and Objectives:
Why did you do this work? What was the problem you were
investigating? If you are not including a review of literature, mention the
specific research(es) relevant to your work.
Review of Literature
This should help to put your research into a background context and to
explain its importance. Include only the books and articles which relate directly
to your topic. You need to be analytical and critical, and not just describe the
works that you have read.
Methodology
Methodology deals with the methods and principles used in an activity,
in this case research. In the methodology chapter, explain the method/s you used
for the research and why you thought they were the appropriate ones. You may,
for example, be depending mostly upon secondary data or you may have
collected your own data. You should explain the method of data collection,
materials used, subjects interviewed, or places you visited. Give a detailed
account of how and when you carried out your research and explain why you
used the particular method/s, rather than other methods. Included in this chapter
should be an examination of ethical issues, if any.
Results or Findings
What did you find out? Give a clear presentation of your results. Show
the essential data and calculations here. You may use tables, graphs and figures.
Analysis and Discussion
Interpret your results. What do you make of them? How do they
compare with those of others who have done research in this area? The accuracy
of your measurements/results should be discussed and deficiencies, if any, in the
research design should be mentioned.
Conclusions
What do you conclude? Summarize briefly the main conclusions which
you discussed under "Results." Were you able to answer some or all of the
questions which you raised in your aims and objectives? Do not be tempted to
draw conclusions which are not backed up by your evidence. Note the
deviation/s from expected results and any failure to achieve all that you had
hoped.
Recommendations
Make your recommendations, if required. The suggestions for action and
further research should be given.
Appendix
You may not need an appendix, or you may need several. If you have
used questionnaires, it is usual to include a blank copy in the appendix. You
could include data or calculations, not given in the body, that are necessary, or
useful, to get the full benefit from your report. There may be maps, drawings,
photographs or plans that you want to include. If you have used special
equipment, you may include information about it.
The plural of appendix is appendices. If one or more appendices
are needed, design them thoughtfully, in a way that your readers will find
convenient to use.
References
List all the sources to which you referred in the body of the report. You
may use the pattern prescribed by the American Psychological Association, or
any other internationally recognized standard pattern.
REVIEW OF LITERATURE
In the case of small projects, this may not be in the form of a critical review
of the literature, but this is often asked for and is a standard part of larger
projects. Sometimes students are asked to write a review of literature on a topic
as a piece of work in its own right. In its simplest form, the review of literature
is a list of relevant books and other sources, each followed by a description and
comment on its relevance.
 The literature review should demonstrate that you have read and analysed
the literature relevant to your topic. From your readings, you may get ideas
about methods of data collection and analysis. If the review is part of a project,
you will be required to relate your readings to the issues in the project, and while
describing the readings, you should apply them to your topic. A review should
include only relevant studies. The review should provide the reader with a
picture of the state of knowledge in the subject.
Your literature search should establish what previous research has been
carried out in the subject area. Broadly speaking, there are three kinds of sources
that you should consult:
1. Introductory material;
2. Journal articles and
3. Books.

To get an idea about the background of your topic, you may consult one or
more textbooks at the appropriate time. It is a good practice to review in
cumulative stages - that is, do not think you can do it all at one go. Keep a
careful record of what you have searched, how you have gone about it, and the
exact citations and page numbers of your readings. Write notes as you go along.
Record suitable notes on everything you read and note the methods of
investigations. Make sure that you keep a full reference, complete with page
numbers. You will have to find your own balance between taking notes that are
too long and detailed, and ones too brief to be of any use. It is best to write your
notes in complete sentences and paragraphs, because research has shown that
you are more likely to understand your notes later if they are written in a way
that other people would understand. Keep your notes from different sources
and/or about different points on separate index cards or on separate sheets of
paper. You will do mainly basic reading while you are trying to decide on your
topic. You may scan and make notes on the abstracts or summaries of work in
the area. Then do a more thorough job of reading later on, when you are more
confident of what you are doing. If your project spans several months, it would
be advisable towards the end to check whether there are any new and recent
references.
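The advice above about keeping notes on separate index cards, each with a full
reference and page numbers, can be mimicked with simple structured records.
This is a minimal sketch only; the field names and the entries are illustrative
placeholders, not real citations:

```python
# Each "card" is one record with a full reference (including pages),
# the topic it relates to, and the note itself. Entries below are
# illustrative placeholders, not real sources.
notes = [
    {"ref": "Author A (1990), p. 12", "topic": "sampling",
     "note": "Defines stratified sampling; small-sample critique."},
    {"ref": "Author B (1995), p. 88", "topic": "questionnaires",
     "note": "Discusses postal survey response rates."},
]

def notes_on(topic):
    """Return all note records kept for a given topic, the way one
    would pull the index cards for that topic from the stack."""
    return [n for n in notes if n["topic"] == topic]

for n in notes_on("sampling"):
    print(n["ref"], "-", n["note"])
```

Keeping the full reference and page number on every record makes it easy to
cite accurately later, as the text above recommends.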

REFERENCES
There are many methods of referencing your work; some of the most
common ones are the Numbered Style, American Psychological Association
Style and the Harvard Method, with many other variations. Use whichever one
you are most familiar and comfortable with. Details of all the works you refer
to should be given in the reference section.
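As a rough illustration of an author-date style such as APA, references can be
kept as structured records and emitted in alphabetical order by author. This is
a simplified sketch only, not a substitute for the full style manual, and the
entries are illustrative placeholders rather than real citations:

```python
# Illustrative placeholder entries, not real citations.
refs = [
    {"author": "Smith, J.", "year": 2001, "title": "On survey design",
     "source": "Journal of Methods, 5, 10-22"},
    {"author": "Brown, A.", "year": 1998, "title": "Sampling basics",
     "source": "Research Quarterly, 2, 3-9"},
]

def format_reference(r):
    """Render one record in a simplified author-date (APA-like) form:
    Author (Year). Title. Source."""
    return f'{r["author"]} ({r["year"]}). {r["title"]}. {r["source"]}.'

# Author-date styles list references alphabetically by author surname.
for r in sorted(refs, key=lambda r: r["author"]):
    print(format_reference(r))
```

Whichever style you adopt, applying it from one place like this keeps the
reference section consistent throughout.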
THE PRESENTATION OF A REPORT

Well-produced, appropriate illustrations enhance the presentability of
a report. With today's computer packages, almost anything is possible.
However, histograms, bar charts and pie charts are still the three 'staples'.
Readers like illustrated information, because it is easier to absorb and more
memorable. Illustrations are useful only when they are easier to understand than
words or figures, and they must be relevant to the text. Use the algorithm
included to help you decide whether or not to use an illustration. Illustrations
should never be included for their own sake, and don't overdo it; too many
illustrations distract the attention of readers.
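A decision rule for choosing among the three 'staple' chart types can be
sketched as a small helper. The categories and thresholds below are
illustrative assumptions (rules of thumb, not fixed rules from any manual):

```python
def suggest_chart(kind, categories=None):
    """Suggest one of the three 'staple' illustrations for a given
    kind of data. Thresholds are illustrative rules of thumb."""
    if kind == "distribution":
        # Frequency distributions of a continuous variable.
        return "histogram"
    if kind == "share-of-whole":
        # Pie charts become unreadable with many slices.
        return "pie chart" if (categories or 0) <= 6 else "bar chart"
    if kind == "comparison":
        # Comparing discrete categories against each other.
        return "bar chart"
    return "consider a table or plain text instead"

print(suggest_chart("share-of-whole", categories=4))  # pie chart
print(suggest_chart("share-of-whole", categories=9))  # bar chart
```

The point of such a rule is the one made above: an illustration earns its place
only when it communicates more easily than words or figures would.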
