RM Book IIT-R 109




(31.05.2010 to 04.06.2010)

Under the aegis of

Quality Improvement Programme Centre
IIT Roorkee, Roorkee


D. K. Nauriyal

Dept. of Humanities & Social Sciences

Indian Institute of Technology Roorkee,
Roorkee - 247667

S. P. Singh


In organizing this course, we drew support from several people and organisations. First of
all, we thank Prof. Vinod Kumar, Co-ordinator, QIP, IIT Roorkee, who has extended his
complete support towards the smooth conduct of this programme. Our sincere thanks are
also due to Prof. S. P. Singh, the Head, Department of Humanities & Social Sciences for
his encouragement and support.
We owe profound gratitude to Prof. S. S. Dhillon, Punjab School of Economics, GNDU,
Amritsar, Dr. R. K. Deswal, National Institute of Technology (NIT) Kurukshetra, and our
institute especially Dr. Rashmi Gaur, who have kindly accepted our request to deliver
lectures in the programme.
We must also admit that without a very high level of receptivity, active involvement and
cooperation from the participants, the programme would have never accomplished its
objectives. They made the sessions lively and vibrant through their active and thoughtful
interaction. They deserve all the appreciation and kudos.
The infrastructural support and secretarial assistance received from the office of the
Coordinator, QIP IIT Roorkee has been quite significant. It would not be fair on our part
if we fail to acknowledge the timely and kind help lent by the QIP staff.
Last but not least, we thank all those who have directly or indirectly contributed
towards the success of this programme.


D. K. Nauriyal

S. P. Singh





An Overview of Research

D. K. Nauriyal


Research Design

D. K. Nauriyal


Methods of Data Collection

D. K. Nauriyal


Basic Statistical Measures

S. P. Singh


Sampling Methods

S. P. Singh


How to Conduct Surveys

S. P. Singh


Parametric and Non-parametric


S.S. Dhillon


How to Write a Research Proposal



Refining Your Skills in Basic

Statistical Analysis




Statistical Software: SPSS




DEA Techniques




Advanced Multivariate Analysis




Interpretation, Report Writing

and Interpretation

D. K. Nauriyal



List of Participants



Is Research Science or Art?

Objectivity of Investigator
Procedural integrity
Accurate reporting

Accuracy of Measurement
Valid and Reliable
Meaningful and useful
Appropriate design (sample, execution)

Open-minded to Findings
Willing to refute expectations
Acknowledge limitations

It is an art of scientific investigation

A movement from Known to unknown

A voyage of discovery

Objectives of Research
1. To achieve new insight into a phenomenon (Exploratory / Formulative Research).
2. To portray accurately the characteristics of a particular individual, situation or a group
(Descriptive Research).
3. To determine the frequency with which something occurs or with which it is
associated with something else (Diagnostic Research).
4. To test a hypothesis of a causal relationship between variables (Hypothesis Testing
Research).
Why Research?
Taking the challenge to solve an unsolved problem.
Desire to get intellectual satisfaction of doing some creative work.
Desire to get a research degree.
Desire to move up the career ladder in the academic institutions.
Desire to be of service to the society.

Significance of Research
1. Research inculcates scientific and inductive thinking and it promotes the development
of logical habits of thinking.
2. Research provides the basis for nearly all govt. policies in our economic system.

3. It helps to solve various operational and planning problems of business and industry.
Market research / operations research/demand forecasting
Conceptualizing the Research
Curiosity and intuition play an important role

What concept or puzzling phenomena is interesting?

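The Research Process

[Figure: flowchart of the research process. Only fragments of the box labels survive: "Identifying the ...", "Determine the ...", "Determine Data ...", "Design Data ...", "Design Sample and Collect ...", "Analyzing the ...", "Preparing and Presenting the ...". Source: Conducting & Reading Research, Baumgartner et al., Chapter 10.]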

Key Properties of Research

Validity - Have you measured (or observed) what you think you have? Were the
instruments used suitable for purpose? Have you adequately and faithfully captured the
state of affairs?
Reliability - Even if the methods are valid, can we be sure that the data are consistent and a true
reflection of the phenomena under study?
Replicability - This is essential in scientific work; it means that the work has been done and described in
such a way that it is repeatable. In social science exact replication is often impossible, but
similar studies add to the weight of evidence.
Generalizability - Are the findings generally applicable, for example to other contexts, situations, times, or
persons other than the sample?
Establishing Credibility
Credibility is a property of good research.
Care and attention in planning and conducting the research.

Care and attention in writing it up in such a way that readers have confidence in the
integrity of the work.
Research reputations are established by repeatedly carrying out interesting and
worthwhile work that is consistently methodologically strong and accurately reported.

Approaches to the Research: Deductive and Inductive

Deductive Approach:
Deductive reasoning works from the more general to the more specific. It is a "top-down"
approach. We might begin with thinking up a theory about our topic of interest. We then
narrow that down into more specific hypotheses that we can test. We narrow down even
further when we collect observations to address the hypotheses. This ultimately leads us to
be able to test the hypotheses with specific data -- a confirmation (or not) of our original
theories.

Inductive Approach:
Inductive reasoning works the other way, moving from specific observations to broader
generalizations and theories. This is a "bottom up" approach. In inductive reasoning, we
begin with specific observations and measures, begin to detect patterns and regularities,
formulate some tentative hypotheses that we can explore, and finally end up
developing some general conclusions or theories.

Types of Research
1. Descriptive (Ex-post facto research) Vs. Analytical (Critical Evaluation of the
2. Applied (Action) Vs. Fundamental (Basic or Pure).
3. Quantitative (Inferential/experimental/ simulation) Vs. Qualitative.
4. Conceptual (abstract idea or theory) Vs. Empirical (Experience or observations based
on data)
5. Longitudinal Research (Over a time period such as clinical or diagnostic research) Vs.
Laboratory or Simulation Research.

Quantitative and Qualitative Research

Quantitative Research
is an inquiry into an identified problem, based on testing a theory, measured with
numbers, and analyzed using statistical techniques. The goal of quantitative methods is to
determine whether the predictive generalizations of a theory hold true. All quantitative
research requires a hypothesis before research can begin.

Qualitative Research
By contrast, a study based upon a qualitative process of inquiry has the goal of
understanding a social or human problem from multiple perspectives. Qualitative research
is conducted in a natural setting and involves a process of building a complex and holistic
picture of the phenomenon of interest. In qualitative research, a hypothesis is not
needed to begin research.

Quantitative Research

In quantitative research, the researcher is ideally an objective observer who neither
participates in nor influences what is being studied.

Qualitative Research

In qualitative research, however, it is thought that the researcher can learn the most by
participating and/or being immersed in a research situation.
Characteristics of quantitative and qualitative research

Quantitative | Qualitative
Research questions: How many? Strength of association? | Research questions: What?
"Hard" science | "Soft" science
Literature review must be done early in ... | Literature review may be done as study progresses or afterwards
Tests theory | Develops theory
One reality: focus is concise and narrow | Multiple realities: focus is complex and ...
Facts are value-free and unbiased | Facts are value-laden and biased
Reduction, control, precision | Discovery, description, understanding, shared interpretation
Mechanistic: parts equal the whole | Organismic: whole is greater than the parts
Report statistical analysis. Basic element of analysis is numbers | ... interpretation. Basic element of analysis is ...
Researcher is separate | Researcher is part of process
Context free | Context dependent
... | Research questions
Reasoning is logistic and deductive | Reasoning is dialectic and inductive
Establishes relationships, causation | Describes meaning, discovery
Uses instruments | Uses communications and observation
Strives for generalization | Strives for uniqueness
Generalizations leading to prediction, explanation, and understanding | Patterns and theories developed for ...
Highly controlled setting: experimental setting (outcome oriented) | Flexible approach: natural setting (process ...
Sample size: n | Sample size is not a concern; seeks "information rich" sample
Which one to choose?
Choose a more quantitative method when most of the following conditions apply:

The research is confirmatory rather than exploratory, i.e. this is a frequently
researched topic, and (numerical) data from earlier research is available.

You are trying to measure a trend (almost impossible with qualitative
research).

There is no ambiguity about the concepts being measured, and only one
way to measure each concept.

The concept is being measured on a ratio or ordinal scale.

And choose a qualitative method when most of these conditions apply:

You have no existing research data on this topic.

The most appropriate unit of measurement is not certain (Individuals?
Households? Organizations?)

The concept is assessed on a nominal scale, with no clear demarcation.

You are exploring the reasons why people do or believe something.

One extreme example:

You are studying the trends in weather in the town where you live. There aren't
many variables: temperature ranges, wind speed, rainfall, barometric pressure, and
perhaps a few others. Most of the variables are measured mechanically, and a lot of
historical data exists. You wouldn't even consider doing qualitative research on this.

Research Methods vs Methodology

Research Methods:
Methods of data collection
Statistical methods used for establishing relationships between the data and the
Methods used to evaluate the accuracy of results obtained.

Research Methodology:
Research Methods
Consideration of the logic behind the methods we use.

Research Process
Series of actions or steps necessary to effectively carry out research and the desired
sequencing of these steps.
A. Formulating the Research Problem:
Understanding the problem thoroughly, and

Rephrasing the same into meaningful terms from an analytical point of view.

Studies of broader literature (Conceptual and empirical)

This stage is important because
1. The research problem needs to be defined unambiguously.
2. It guides the collection of relevant data, the choice of research methods, etc.
B. Extensive Literature Survey:
1. Abstracting and indexing the journals published or unpublished bibliographies.
2. Identifying the academic journals, conference proceedings, govt. reports, books
C. Development of Working Hypothesis:
Tentative assumptions made in order to draw out and test their logical or empirical
consequences.
It is prior thinking about the subject.
D. Preparing the Research Design: Exploration / Description / Diagnosis /
Experimentation
Research design needs consideration of the following:

The means of obtaining the information

The availability and skills of the researcher

Time available

Cost factor
E. Determining the sample design: Probability / Non-probability.
Probability / Random sampling:
Simple Random Sampling
Systematic Sampling
Cluster/Area Sampling
Non-Probability / Purposive / Deliberate sampling:
Convenience Sampling
Judgment Sampling
Quota Sampling
F. Collecting the Data: Collection of only appropriate data

Primary Data: by observation, through personal interviews, telephone interviews and by mailing
of questionnaires
Secondary Data
G. Analysis of Data:
1. Computation of statistics viz., mean, median, mode, standard Deviation,
coefficient of variation, coefficient of skewness etc.
2. Designing regression equation for estimating dependent variable as a function of
a set of independent variables.
3. Performing correlation analysis.

4. Factor, Discriminant, Conjoint analysis


H. Hypotheses Testing

I. Interpretation of Results

J. Validation of Results: The results after interpretation must be validated by using
past data. This ensures the credibility of the results.

K. Report Writing


Research design is the conceptual structure within which research is conducted. It constitutes the
blueprint for the collection, measurement and analysis of data.
The research design addresses the following questions:
1. What is the study about?
2. Why is the study being done?
3. Where will the study be carried out?
4. What type of data is required?
5. Where can the required data be found?
6. What periods of time will the study include?
7. What will be the sample design?
8. What techniques of data collection will be used?
9. How will the data be analyzed?
10. In what style will the report be prepared?
Thus the Essentials of the Research Design are:
The design is an activity and time-based plan.
The design is always based on the research question.
The design guides the selection of the sources and types of information.
The design is a framework for specifying the relationships among the study's variables.
The design outlines procedures for every research activity.
Research Design | Type of Study: Exploratory / Formulative | Type of Study: Descriptive / Diagnostic
Overall Design | Flexible (for considering different parts of the problem) | Rigid design
Sampling Design | Non-probability (purposive or ...) | Probability (random ...)
Statistical Design | No pre-planned design for ... | Pre-planned design for ...
Observational Design | Unstructured instruments for collection of data | Structured or well thought out instruments for collection of data
Operational Design | No fixed decisions about the operational procedures | Advanced decisions about operational procedures

Determining Research Design

Exploratory Research: collecting information in an unstructured and informal
manner.

Descriptive Research: refers to a set of methods and procedures describing
marketing variables.

Causal Research (experiments and other approaches): allows isolation of causes
and effects via use of experiments or surveys.

Components of Research Design

A. Sampling Designs.
Probability: Simple Random Sampling, Systematic Sampling, Stratified Random
Sampling, Cluster Sampling, Multi-stage Sampling
Non-Probability: Convenience Sampling, Judgment Sampling, Quota Sampling,
Snowball Sampling

B. Statistical Designs (Sample size, collection and analyses of data).

C. Operational Designs (Techniques by which the procedure specified in the sampling,
statistical and observational designs can be carried out).
Features of a Good Design
1. Generally, the design which minimizes biases and maximizes the reliability of the
data collected and analyzed is considered a good design.
2. A good research design involves consideration of the following factors:
a. The means of obtaining information
b. The availability and skills of the researcher and the staff,
if any
c. The objective of the problem to be studied
d. The nature of the problem to be studied.
e. The availability of time and money for the research work.
Important Concepts Relating to Research Design
1. Dependent and independent variables
2. Extraneous variables:
(Independent variables that are not related to the purpose of the study but may affect
the dependent variable are termed extraneous variables).
3. Control:
(The technical term is used when we design the study so as to minimize the effects of
extraneous independent variables).
4. Confounded relationships
(When the dependent variable is not free from the influence of extraneous variables).
5. Research Hypotheses:
The research hypothesis is a predictive statement that relates an independent
variable to a dependent variable.
Predictive statements which are assumed but not to be tested are not termed
research hypotheses.
6. Experimental and non-experimental hypothesis testing research:
For example, mothers aged 15-45; overall group above 15 years.
7. Experimental and control groups: For example, a dummy variable for caste / religion.



Methods of Data Collection

Primary data: information that is developed or gathered by the researcher
specifically for the research project at hand

Secondary data: information that has previously been gathered by someone other
than the researcher and/or for some other purpose than the research project at hand

Classification of Secondary Data

Internal secondary data: data that have been collected within the firm

Internal databases: databases (collections of data and information describing items
of interest) consisting of information gathered by a company, typically during the
normal course of business transactions

External secondary data: data obtained from outside the firm

Published: sources of information prepared for public distribution and
found in libraries or a variety of other entities

Syndicated Services Data: data provided by firms that collect data in a
standard format and make them available to subscribing firms

External Databases: databases provided by outside firms; many are now
available online (online information databases)
Bibliographic databases: citations by subject
Numeric or statistical databases, e.g. 2001 Census
Government reports, other studies
Directory or list databases
Comprehensive databases: contain all of the above

Advantages of Secondary Data

Obtained quickly (compared to primary data gathering)
Inexpensive (compared to primary data gathering)
Usually available
Enhances existing primary data

Limitations of Secondary Data

Exact data that one may need may not be available.
May have difficulty in getting access.
Errors in data base.

Possible coding problems

Data may be available but it may have problems:
Missing or incomplete data.
Unknown definitions of data.
Changed definitions or procedures.
Might be too aggregated.

Evaluating Secondary Data

What was the purpose of the study?
Who collected the information and when was this done?
What information was collected (questions, scales, etc.)?
How was the information obtained (sampling frame, method of sample draw,
communication method, resulting sample, etc.)?
How consistent is the information with other published information?
Locating Secondary Data Sources
Step 1: Identify what you wish to know and what you already know about your topic.
Step 2: Develop a list of key words and names.
Step 3: Begin your search using several library and Web sources.
Step 4: Compile the literature you have found and evaluate your findings.

Primary Data Collection

Methods of Primary Data: Observation, In-depth Techniques, Experimentation, Surveys

Types of Observations
Structured (Descriptive)
Unstructured (Exploratory)
Participant (Anthropological)
Non-Participant (Political forecasts)
Disguised Participation (Presence of the observer is hidden)
In-Depth Techniques:
Focus group interviews
Interviews: Personal, Telephonic, Focused, Non-Directive
Projective Techniques
Primary Data Collection Methods
1. Observation
Human or physical observation includes mystery shopping, cameras in store,
watching children handle toys, etc.
Ethnography: watching behaviors in the consumer's natural setting
Mechanical or electronic observation: using the Nielsen people meter, eye tracking
devices, or software to track behaviors on the Web, etc.
Limitations of observation:
A. Expensive method in terms of time and money
B. Limited information.
C. Interference of unforeseen factors
Merits of interview method

It provides greater information in-depth.

Can overcome resistance through persuasion.
Personal information can be sought.
Low non-response.
Can secure the most spontaneous reactions.
Adaptability of the language to the level of the interviewee.
Can collect supplementary information which may be of great value in
interpreting the results.
Interviewer can clarify unclear questions
Literacy is not required
Interviewer can collect more complex answers and observations
Interviewer can minimize missing and inappropriate responses
Interviewer can prevent respondent from answering out of sequence

Interview method is most useful when:

Other methods do not make sense.
When the issues are complex and in-depth understanding is needed.
When the issues and questions are still being determined.
Pre-requisites of interviewing

Careful selection, training, and briefing of the interviewer.

Must ask questions properly and intelligently.
Must answer legitimate questions of the interviewee.
Should not show surprise or disapproval
Must discourage irrelevant conversation.
Interviewer must possess the technical competence and necessary practical
experience.
Occasional field checks.

Guidelines for successful Interviewing:

1) Choose the time when the interviewee is at ease.
2) Approach must be friendly and informal.
3) Establish Rapport with the interviewee.
- People are motivated to communicate when atmosphere is favorable.
4) Listen with understanding, respect and curiosity.
5) Control the course of the interview and avoid irrelevant conversation.
Limitations of telephone interviews
1. No thinking space for the interviewee.
2. Survey is restricted to those who have telephone facilities.
3. Unsuitable for intensive surveys where comprehensive answers are required.
4. Greater possibility of bias.
Limitations of the interview method
Possibility of data collection and interpretation biases.
Time consuming when the sample is large.
May introduce systematic errors.
Lack of proper rapport with the interviewee.
Through mailed questionnaires

Merits:
Low cost
Free from the bias of the interviewer.
Enough thinking space.
Can reach otherwise inaccessible people.
Sample could be larger.
Demerits:
Low rate of return.
Only educated and cooperating people can be approached.
Difficulty in modifying the approach once the questionnaire is made.
Possibility of ambiguous replies/omissions of questions.
Method is likely to be the slowest of all.
Important considerations while framing a questionnaire
A) General form (close ended/Open ended).
B) Question sequence:
The first few questions are important because they are likely to influence the attitude and the
desired cooperation of the respondent

First questions should break the ice

General to specific order of questions
Questions on personal or sensitive topics left towards the end
Avoid a series of questions that are likely to elicit the same response (bias)
One question can affect another
Questions should be easily understood and should be simple
There should always be provision for indications of uncertainty, e.g. "Don't know",
"No preference"

Questionnaire Design: General Principles

Open-ended vs closed-ended questions:
Open-ended questions generate answers that are more nuanced and information-rich.
They give the subject freedom to answer the question in his or her own words (without pre-specified
alternatives).
Open-ended questions do not provide respondents with any answers from which to choose.

Open-ended Questions: Advantages and Disadvantages

- Advantages:
Not forced to choose between categories
May better reflect respondents' thoughts/beliefs
Appropriate when list of possible answers is excessive
Lets the respondent have the say and tell the researcher what he means, and not
vice-versa (obtain unanticipated answers)

- Disadvantages:
Respondent may say too much or too little
May provide incomplete or unintelligible answers
Flexibility in responses makes them difficult to code and analyze; interpretations of answers
may vary
Too much variance in response
Expensive and time-consuming
Closed-ended Questions
Closed-ended questions provide respondents with a list of responses from which to
choose. Alternatively, closed-ended questions can provide multiple choices for the
respondent to accept or reject

Closed-ended Questions: Advantages and Disadvantages

- Advantages:
Easy to answer and takes little time
Answers can be precoded (assigned a number) and easily transferred to a
computer
Answers are easy to compare
Easier to elicit responses to sensitive questions
Answers are more reliable
Meaning of responses is clearer to the researcher
- Disadvantages:
May not be accurate: forces people to accept categories, or puts too many people
into the "other" category
Answers are relative to the response scale provided
Respondent's choice may not be among the listed alternatives
Choices listed communicate the kind of response wanted
Wording of response choices may influence responses

Difference between a questionnaire and a schedule

1) Questionnaires are sent through mail to the informants while schedules are filled in
either by the researcher himself or by the enumerators who are specially appointed for
the purpose.
2) A questionnaire is relatively cheap, but data collection through schedules is expensive.
3) Non-response is high in case of a questionnaire.
4) In case of a questionnaire, identity of the person who has actually filled in may be
unknown as he/she might be doing it on behalf of someone else.
5) Questionnaire method is slow as many respondents may not return the filled in response
in time.
6) Personal contact is not possible in case of questionnaires.
7) Questionnaire method can be used only when respondents are literate and cooperative.
8) Coverage with questionnaire could be wider and cheap.


9) Risk of collecting incomplete and wrong information is relatively higher under the
questionnaire method, particularly when people are unable to understand the questions.
10) Observation method can also be used along with the schedules but it is not possible
with the questionnaire.



The data of given situation must be characterized by some statistical measure for the
purpose of estimation or comparison with similar data or making inference about the
sample population to which the data belong.
Statistical measures can be classified into:

Measures of Central tendencies

Measures of Variation
Measures of Skewness
Measures of Kurtosis
Time series

1. Measures of Central Tendency: Following are the measures of central tendency.

(a) Arithmetic Mean
(b) Weighted Arithmetic Mean
(c) Median
(d) Mode
(e) Geometric Mean
(f) Harmonic Mean
2. Measures of Variation are
1. Range and Coefficient of range
2. Quartile deviation and Average Deviation
3. Standard Deviation
4. Coefficient of Variation
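
The measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch: the observations and weights are hypothetical, and the population (not sample) standard deviation is assumed.

```python
import statistics as st

# Hypothetical observations and weights, for illustration only.
data = [12, 15, 15, 18, 20, 22, 24]
weights = [1, 2, 2, 1, 1, 2, 1]

# Measures of central tendency
mean = st.mean(data)
weighted_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)
median = st.median(data)
mode = st.mode(data)
geometric_mean = st.geometric_mean(data)
harmonic_mean = st.harmonic_mean(data)

# Measures of variation
value_range = max(data) - min(data)
coeff_of_range = (max(data) - min(data)) / (max(data) + min(data))
q1, _, q3 = st.quantiles(data, n=4)            # quartiles
quartile_deviation = (q3 - q1) / 2
average_deviation = sum(abs(x - mean) for x in data) / len(data)
std_dev = st.pstdev(data)                      # population standard deviation
coeff_of_variation = 100 * std_dev / mean      # expressed as a percentage

print(mean, median, mode, value_range, round(coeff_of_variation, 2))
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean; this is a quick sanity check on the three averages.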
3. Measures of Skewness: The shape of the distribution is another characteristic of
concern. It describes how the frequencies of observations are distributed and is
measured by the Coefficient of Skewness, which in practice usually lies between -1 and +1.
If the coefficient is zero, the distribution is symmetrical. If it is +ive, the
distribution is positively skewed and Mean > Median > Mode.
If the coefficient is -ive, the distribution is negatively skewed and Mean < Median < Mode.
Measures of Skewness:
1. Karl Pearson's Coefficient of Skewness: $Sk = (\text{Mean} - \text{Mode}) / \sigma$
With the empirical relation $\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$ this becomes
$Sk = 3(\text{Mean} - \text{Median}) / \sigma$, where $\sigma$ is the standard deviation.
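
As an illustration of Karl Pearson's two formulas, the sketch below computes both coefficients for a small hypothetical data set, assuming the population standard deviation is used in the denominator.

```python
import statistics as st

# Hypothetical, right-skewed observations.
data = [2, 3, 3, 4, 5, 7, 11]

mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)
sd = st.pstdev(data)  # population standard deviation (an assumption)

# Mode-based coefficient: (Mean - Mode) / S.D.
sk_mode = (mean - mode) / sd
# Median-based coefficient: 3 (Mean - Median) / S.D.
sk_median = 3 * (mean - median) / sd

print(round(sk_mode, 3), round(sk_median, 3))
```

Both coefficients are positive here, in line with Mean > Median > Mode for a positively skewed distribution. The two values differ because the relation Mode = 3 Median - 2 Mean is only approximate for real data.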
4. Kurtosis: Even if we know the measures of central tendency, variation and skewness, we still
cannot form a complete idea about the distribution. We should also know the convexity of the
frequency curve, or Kurtosis. Kurtosis gives an idea about the flatness
or peakedness of the frequency curve.
It is measured by the coefficient $\beta_2$ or its derivative $\gamma_2$.
The normal curve is called a mesokurtic curve.
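
The sketch below computes the two kurtosis coefficients using the standard moment definitions, which the notes do not spell out: $\beta_2 = \mu_4 / \mu_2^2$ and $\gamma_2 = \beta_2 - 3$, where $\mu_2$ and $\mu_4$ are the second and fourth central moments. The function name and the data are hypothetical.

```python
def kurtosis_coefficients(values):
    """Return (beta2, gamma2) from the second and fourth central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    beta2 = m4 / m2 ** 2
    gamma2 = beta2 - 3  # zero for a mesokurtic (normal) curve
    return beta2, gamma2


beta2, gamma2 = kurtosis_coefficients([1, 2, 3, 4, 5])
print(beta2, gamma2)
```

A negative $\gamma_2$, as for this evenly spread data, indicates a curve flatter than the normal curve (platykurtic); a positive value indicates a more peaked curve (leptokurtic).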
5. Time Series: Researchers often have to deal with quantities whose
value changes with time. To understand the nature of variation of a quantity
over time, time series can be used.

A graph on which such values are plotted against time is called a historigram of the time series.
The fluctuations may be due to:
(a) Causes which operate over a long time period
(b) Causes which operate over a short time period
These causes are segregated and this process is called analysis of time series.
The variations in the value of the variable can be analysed into the following three
main components.
1. The basic or long-period trend
2. Short-period or periodic changes
3. Irregular fluctuations
Measurement of Trend:
a. Free hand smoothing
b. Sectional Average: in this whole series is divided into suitable number of sections
and average of each section is found. These averages are plotted against the mid
year of the sections, then a free hand smooth curve is drawn through these points.
The curve represents Trend.
c. Method of Moving Average: In this method fluctuations due to cyclical changes are
eliminated by averaging the values of the variable over a specified number of
successive years. The number of years over which the values are averaged depends
upon the average length of the cycle found in the series. The mean of each such group
of years is taken. All these means are plotted and successive points are joined by straight line segments.
The resulting polygonal graph indicates the trend of the given time series.
d. Method of Least Squares: Time deviations x are taken from the mid year of the
time series, so that they sum to zero, and these deviations are squared. Each value Y is
then multiplied by its deviation. With $a = \sum Y / n$ and $b = \sum xY / \sum x^2$, the
trend ordinates are $a + bx$. When these are plotted against
the corresponding years we get the line of best fit in the least-squares sense.
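
The moving-average and least-squares methods can be sketched in a few lines of Python. The function names and the yearly figures are hypothetical, and the least-squares version assumes equally spaced years with deviations measured from the mid year, so that the deviations sum to zero.

```python
def moving_average(values, window):
    """Mean of each run of `window` successive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def least_squares_trend(values):
    """Trend ordinates a + b*x, with x measured from the mid year."""
    n = len(values)
    mid = (n - 1) / 2
    x = [i - mid for i in range(n)]          # time deviations, sum to zero
    a = sum(values) / n                      # a = sum(Y) / n
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return [a + b * xi for xi in x]


sales = [10, 12, 15, 14, 18, 20, 19]         # hypothetical yearly values
smoothed = moving_average(sales, 3)
trend = least_squares_trend(sales)
print(smoothed)
print([round(t, 2) for t in trend])
```

A 3-year window is used here only for illustration; in practice the window should match the average cycle length in the series, as noted above.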
Correlation: The relation between two or more characteristics of a population or a sample
can be studied with the help of a statistical method called correlation. If two quantities
vary in a related manner, so that an increase or decrease in one tends to be
accompanied by a movement in the same or in the opposite direction in the other, the two
quantities are said to be correlated. Correlation may be +ive or -ive. It may be perfect or imperfect.
Methods: 1. Graphic Method; 2. Scatter Diagram; 3. Coefficient of correlation
The coefficient of correlation is a numerical measure of correlation.
1. Karl Pearson's coefficient of correlation, also called the product moment correlation
2. Spearman's rank correlation
A test of significance is done for both measures by using the t-test.
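
The two coefficients and the t statistic can be sketched as follows. The data and function names are hypothetical; the Spearman version uses the shortcut formula $1 - 6\sum d^2 / (n(n^2 - 1))$ and therefore assumes there are no tied values. The t statistic is $t = r\sqrt{(n-2)/(1-r^2)}$ with $n - 2$ degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation; assumes no tied values."""
    def ranks(values):
        order = sorted(values)
        return [order.index(v) + 1 for v in values]

    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


x = [1, 2, 3, 4, 5]          # hypothetical paired observations
y = [2, 3, 5, 4, 6]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
t = t_statistic(r, len(x))
print(round(r, 3), round(rho, 3), round(t, 3))
```

The computed t is compared with the tabulated t value at the chosen significance level to decide whether the correlation differs significantly from zero.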

Coefficient of Determination measures the variation explained by the independent variable. It
is the ratio of explained variation to total variation.
Regression Analysis: Regression means stepping towards the average. Regression is the
dependence of a variable on one or more other variables.

$Y = a + bX + u$ is the linear form of a regression equation.

$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + u_i$

is an example of multiple regression.

$u$ is a random error. To fit the straight line we apply the method of least squares. In order to
estimate a and b we need to minimize the sum of squares of $u_i$. For this we solve the
resulting equations.
Coefficient of Determination: In order to measure the strength of the relationship
between the dependent and independent variable/s we calculate the statistic called the
coefficient of determination ($r^2$ or $R^2$).
This measure is developed on the basis of two levels of variation:
The variation of Y values around the fitted regression line, given by $\sum (Y - \hat{Y})^2$, and
the variation of Y values around their own mean, given by $\sum (Y - \bar{Y})^2$,
where $\bar{Y}$ is the mean of Y. Then $r^2$ (or $R^2$) $= 1 - \sum e^2 / \sum y^2$, with $e = Y - \hat{Y}$ and $y = Y - \bar{Y}$.
The value of $r^2$ or $R^2$ shows the goodness of fit of the regression equation/s. The higher the
value, the closer the fit; the lower the value, the poorer the fit.
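
A minimal sketch of fitting $Y = a + bX$ by least squares and computing the coefficient of determination follows. The function names and data are hypothetical; the slope and intercept use the standard closed-form least-squares solution for one independent variable.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for Y = a + b*X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return a, b


def r_squared(x, y, a, b):
    """1 - (residual sum of squares) / (total sum of squares)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]          # hypothetical independent variable
y = [2, 3, 5, 4, 6]          # hypothetical dependent variable
a, b = fit_line(x, y)
r2 = r_squared(x, y, a, b)
print(round(a, 3), round(b, 3), round(r2, 3))
```

With a single independent variable, the coefficient of determination equals the square of Pearson's correlation coefficient between X and Y.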


Sampling is the process of selecting a subset of the members of the
population under study and collecting data about their attributes. Based on the data of the
sample the analyst will draw inferences about the population.
Advantages of Sampling:
(1) Less time taken to collect data
(2) Less cost for data collection
(3) Physical impossibility of complete enumeration necessitates sampling
(4) More accuracy of data collected due to its limited size.
Sampling Frame: The complete list of all the members/units of the population from
which each sampling unit is selected is known as sampling frame. It should be free from
Sampling Methods:
Sampling methods are divided into two types:
(a) Probability Sampling
(b) Non-Probability Sampling
Probability Sampling: In probability sampling each unit of the population has a known,
non-zero probability of being selected as a unit of the sample. This probability varies
from one method of probability sampling to another.
Non-Probability Sampling: In non-probability sampling certain units of the population may
have zero probability of selection, because the judgment and convenience of the
interviewer are the criteria for selecting sample units.
Probability Sampling Methods:
(1) Simple Random sampling
(2) Systematic Sampling
(3) Stratified Sampling
(4) Cluster Sampling
(5) Multistage Sampling
(1) Simple Random Sampling: Let N = number of units in the population
n = number of units in the sample
where n < N.
There are two ways of performing SRS: (a) with replacement and (b) without replacement.
SRS with replacement: Each unit of the population has an equal probability of being
selected at every draw.
Probability of selection = 1/N
Selection is done by using random number tables.
SRS without replacement: The probability of selecting any particular remaining unit
changes from draw to draw, because a selected unit is not returned to the population.
Probability at the first draw = 1/N
Probability at the second draw = 1/(N-1)
...
Probability at the nth draw = 1/(N-(n-1))

Units are selected from the population according to these probabilities using
Monte Carlo simulation.
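The two forms of simple random sampling can be sketched with Python's standard random module in place of a random number table. The population of 20 numbered units, the sample size and the seed are hypothetical values chosen for illustration.

```python
import random

population = list(range(1, 21))   # hypothetical frame: units numbered 1..20 (N = 20)
n = 5                             # sample size
rng = random.Random(42)           # fixed seed so the draw can be repeated

# With replacement: every draw picks from all N units, so a unit may repeat.
with_replacement = rng.choices(population, k=n)

# Without replacement: a selected unit is removed, so all n units are distinct.
without_replacement = rng.sample(population, n)

print(with_replacement)
print(without_replacement)
```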
(2) Systematic Sampling: This is a special kind of random sampling in which the
selection of the first unit of the sample from the population is based on randomisation.
The remaining units of the sample are selected from the population at a fixed interval I,
where n is the sample size.
Sampling interval width I = N/n
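A minimal sketch of systematic selection follows, assuming for simplicity that N is an exact multiple of n. The function name systematic_sample, the frame of 100 units and the seed are hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick a random start within the first interval, then every I-th unit."""
    interval = len(frame) // n          # I = N / n (assumed to divide exactly)
    start = rng.randrange(interval)     # random position 0 .. I-1
    return [frame[start + k * interval] for k in range(n)]

frame = list(range(1, 101))             # hypothetical sampling frame, N = 100
sample = systematic_sample(frame, 10, random.Random(7))
print(sample)
```

Only the first unit is random; once it is fixed, the rest of the sample is determined by the interval.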
(3) Stratified Sampling: It is an improvement over simple random sampling
and systematic sampling.
In this method the population is divided into a specified set of strata such that members
within a stratum have similar attributes but members of different strata have dissimilar
attributes.
(a) Proportional stratified sampling: The same proportion of units is
selected from each stratum. Here there is not much difference (low variance) in
attributes within each stratum.
n is the sample size, such that $n = n_1 + n_2 + \dots + n_k$
N = population size, $N_i$ = size of stratum i, $n_i$ = size of the sub-sample from stratum i
$n_1 = n N_1 / N, \; \dots, \; n_k = n N_k / N$
(b) Disproportional stratified sampling: Different proportions of units are selected from
each stratum. Attributes differ and there is high variance. In this method
a stratum with more variance receives proportionately more sampling units than
a stratum with less variance.
Number of sampling units from stratum i: $n_i = n \, q_i s_i / \sum_j q_j s_j$
where $s_i$ is the standard deviation of stratum i
and $q_i = N_i / N$
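The two allocation rules for stratified sampling can be worked through numerically. The sketch below uses hypothetical stratum sizes and standard deviations; the function names are assumptions, and in practice the resulting values would be rounded to whole units.

```python
def proportional_allocation(n, sizes):
    """n_i = n * N_i / N for each stratum."""
    N = sum(sizes)
    return [n * Ni / N for Ni in sizes]

def disproportional_allocation(n, sizes, sds):
    """n_i = n * q_i * s_i / sum(q_j * s_j), with q_i = N_i / N."""
    N = sum(sizes)
    weights = [(Ni / N) * si for Ni, si in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes = [500, 300, 200]   # hypothetical stratum sizes N_i (N = 1000)
sds = [10, 20, 30]        # hypothetical stratum standard deviations s_i
n = 100                   # total sample size

prop = proportional_allocation(n, sizes)
disp = disproportional_allocation(n, sizes, sds)
print(prop)
print([round(v, 1) for v in disp])
```

With these figures the smallest stratum gets 20 units under proportional allocation but about 35 under disproportional allocation, because its variance is the highest.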
(4) Cluster Sampling: The population is divided into different clusters. Members within a
cluster are dissimilar in terms of their attributes, but different clusters are similar
to each other. Each cluster can be treated as a small population which possesses all
the attributes of the population. Any one cluster is selected and all units of that cluster
constitute the sample.
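A small sketch of the cluster procedure: one cluster is chosen at random and every unit in it enters the sample. The cluster names, their members and the seed are hypothetical.

```python
import random

# Hypothetical population already grouped into clusters.
clusters = {
    "cluster_A": ["a1", "a2", "a3"],
    "cluster_B": ["b1", "b2"],
    "cluster_C": ["c1", "c2", "c3", "c4"],
}

rng = random.Random(3)
chosen = rng.choice(sorted(clusters))   # select one cluster at random
sample = list(clusters[chosen])         # all units of that cluster form the sample
print(chosen, sample)
```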
(5) Multistage Sampling: In a large-scale survey covering the entire nation the size of
the sampling frame will be very large. In such a study the multistage sampling technique is
used. The entire country is divided into regions.
Stage 1: Different states of the country are sampled from each region using stratified
sampling. Here it is assumed that the states within a region are similar and the
regions are dissimilar.
Stage 2: Cluster sampling can then be used within each selected state, treating
the districts of the state as its clusters.
Stage 3: In each selected district random sampling may be used to select the
proportional number of units from it.
II. Non-Probability Sampling Methods:


Convenience Sampling
Judgment Sampling also called Purposive Sampling
Quota sampling
Snowball Sampling



This lecture will help you to learn how to conduct a survey and design a
questionnaire. Survey research is conducted in almost all areas of management,
economics and social sciences, so it is important to understand the various techniques
and tools used in it. As faculty members in management and the social sciences, you
conduct market research, socio-economic evaluation studies, opinion-based studies, and
policy and programme assessment studies. For all such types of study, conducting a survey
through a questionnaire becomes essential. Keeping this in view, this lecture focuses on
survey research techniques and on how to design a questionnaire that gets the true opinions of
your sample. Questionnaires are the most common marketing research method. They are
used for structured interviews, written surveys, email, and internet surveys.
Conducting a survey is a useful way of finding something out, especially when
'human factors' are under investigation. Although surveys often investigate subjective
issues, a well-designed survey should produce quantitative, rather than qualitative, results.
That is, the results should be expressed numerically, and be capable of rigorous analysis.
Researchers quite often underestimate how difficult it is to carry out a survey well; a good
survey is more than a handful of questionnaires and a couple of bar charts: it requires
careful planning, methodical application, and detailed analysis of the results.
Methods of getting information
We are living in an information age. More information has been published in the
last decade than in all previous history. Everyone uses information to make decisions
about the future. If our information is accurate, we have a high probability of making a
good decision. If our information is inaccurate, our ability to make a correct decision is
diminished. Better information usually leads to better decisions. There are seven common
ways to get information. These are: literature searches, talking with people, focus groups,
personal interviews, telephone surveys, mail surveys, and email and Internet surveys.
1. A literature search involves reviewing all readily available materials. These materials
can include internal company information, relevant trade publications, newspapers,
magazines, annual reports, company literature, on-line databases, and any other published
materials.
2. Talking with people is a good way to get information during the initial stages of a
research project. It can be used to gather information that is not publicly available, or that
is too new to be found in the literature. Examples might include meetings with prospects,
customers, suppliers, and other types of business conversations at trade shows, seminars,
and association meetings. Although often valuable, the information has questionable
validity because it is highly subjective and might not be representative of the population.
3. A Focus Group is used as a preliminary research technique to explore people's ideas
and attitudes. It is often used to test new approaches (such as products or advertising), and
to discover customer concerns. A group of 6 to 20 people meet in a conference-room-like
setting with a trained moderator. The room usually contains a one-way mirror for viewing,
including audio and video capabilities. The moderator leads the group's discussion and
keeps the focus on the areas you want to explore. The disadvantage of focus groups is that
the sample is small and may not be representative of the population in general.


4. Personal Interviews are a way to get in-depth and comprehensive information. They
involve one person interviewing another person for personal or detailed information.
Personal interviews are very expensive because of the one-to-one nature of the interview.
Typically, an interviewer will ask questions from a written questionnaire and record the
answers verbatim. Personal interviews (because of their expense) are generally used only
when subjects are not likely to respond to other survey methods.
5. Telephone Surveys are the fastest method of gathering information from a relatively
large sample (100-400 respondents). The interviewer follows a prepared script that is
essentially the same as a written questionnaire. However, unlike a mail survey, the
telephone survey allows the opportunity for some opinion probing. Telephone surveys
generally last less than ten minutes.
6. Mail Surveys are a cost-effective method of gathering information. They are ideal for
large sample sizes, or when the sample comes from a wide geographic area. They cost a
little less than telephone interviews; however, they take over twice as long to complete
(eight to twelve weeks). Because there is no interviewer, there is no possibility of
interviewer bias. The main disadvantage is the inability to probe respondents for more
detailed information.
7. Email and Internet surveys are relatively new and little is known about the effect of
sampling bias in Internet surveys. While it is clearly the most cost effective and fastest
method of distributing a survey, the demographic profile of the Internet user does not
represent the general population, although this is changing. Before doing an e-mail or
Internet survey, carefully consider the effect that this bias might have on the results.
What is a Survey?
A survey is a method of collecting information directly from people about their
ideas, feelings, health, plans, beliefs, and social, educational and financial background. It
usually takes the form of self-administered questionnaires and interviews.
Self-administered questionnaires can be completed by hand or by computer. Interviews take
place in person or by telephone.
Why do we conduct surveys?
There are at least three good reasons for conducting surveys.
1. A policy needs to be set or a programme must be planned.
Surveys are conducted to meet policy or programme needs. For instance, a
company is considering providing day care for the children of its working staff. How many
have young children? How many would use the agency services?
2. You want to evaluate the effectiveness of programmes to change people's
knowledge, attitudes, health or welfare.
3. You are a researcher and a survey is used to assist you.

When is a Survey Best?

Many methods are available for obtaining information about people. A survey is
one of them. Surveys can be used to make policy, to plan and evaluate programmes, and
to conduct research when the information you need should come directly from people. The
data they provide are descriptions of attitudes, values, habits and background characteristics
such as age, health, education and income.
Types of Survey
1. Cross-sectional Survey
With this design, data are collected at a single point in time. Think of a cross
sectional survey as a snapshot of a group of people or organizations. Cross-sectional
surveys have several advantages.
2. Longitudinal Surveys
With a longitudinal survey, data are collected over time. At least three variations are
particularly useful.
(a) Trend: a trend design means surveying a particular group over time. For example,
studying the socio-economic conditions of a group of rural people over time.
(b) Cohort: in a cohort survey, you study a particular group over time but the people in the
group may vary.
(c) Panel: a panel survey means collecting data from the same sample over time.
The Survey Content

Select your information needs or Hypotheses.

Make sure you can get the information you need
Do not ask for information unless you can act on it.
Survey items may take the form of open-ended or forced-choice questions. Forced-choice
questions with several choices are easier to score than open-ended, short-answer
or essay questions. Open-ended questions give respondents an opportunity to
state a position in their own words; unfortunately these words may be difficult to

Rules for Writing Survey Items with Forced Choices

1. Each question should be meaningful to respondents. In a survey of political views,
the questions should be about the political process, parties, candidates and so on. If
you introduce other questions that have no readily obvious purpose, such as those
about age or gender, you might want to explain why they are being asked.
2. Use Standard English. Because you want an accurate answer to each survey item,
you must use conventional grammar, spelling and syntax.
3. Make questions concrete: questions should be close to the respondent's personal
experience. For instance, asking respondents if they enjoyed a book is more
abstract than asking if they recommended it to others or read more books by the
same author.
4. Avoid biased words and phrases: certain names, places and views are emotionally
charged. When included in a survey, they unfairly influence people's responses.
5. Check your own biases: an additional source of bias is present when survey writers
are unaware of their own position towards a topic. Look at this:
Do you think the Left parties and the Congress will soon reach a greater degree of
understanding? (Biased question)
When you have questions that you suspect encourage strong views on either side.

Better: In your opinion, in the next two years, how is the relationship between the Left
parties and the Congress likely to change?
Much improvement
Some improvement
Some worsening
Much worsening
Impossible to predict
6. Use caution when asking about the personal: another source of bias may result
from questions that may intimidate the respondent, such as:
How much do you earn each year? Are you single or divorced? How do you feel about
your teacher, counselor or doctor? When personal information is essential to the
survey, you can ask questions in the least emotionally charged way if you provide
categories of responses.
Poor: What was your annual income last year? Rs. ______
Better: In which category does your annual income last year fit best?
Below Rs. 100000
Rs. 100000 - Rs. 150000
Rs. 150000 - Rs. 200000
Rs. 200000 - Rs. 250000
Rs. 250000 and above
7. Each question should have just one thought: do not use questions to which a
respondent's truthful answer could be both yes and no at the same time.
Survey Design
Here, we shall discuss options and provide suggestions on how to design and conduct
successful survey research. There are seven steps in survey research:

Establish the objectives of the study - What you want to examine

Determine your sample - Whom you will interview
Choose interviewing methodology - How you will interview
Create your questionnaire - What you will ask
Pre-test the questionnaire, if practical - Test the questions
Conduct interviews and enter data - Ask the questions
Analyze the data - Produce the reports

Setting Objectives
The first step in any survey is deciding the objectives. If your objectives are unclear, the
results will probably be unclear. Some typical objectives may be like these:

The potential market for a new product or service

Ratings of current products or services
Employee attitudes
Customer/patient satisfaction levels
Reader/viewer/listener opinions
Association member opinions

Opinions about political candidates or issues

Corporate images

Selecting Your Sample

There are two main components in determining whom you will interview. The first
is deciding what kind of people to interview. Researchers often call this group the target
population. If you conduct an employee attitude survey or an association membership
survey, the population is obvious. If you are trying to determine the likely success of a
product, the target population may be less obvious. Correctly determining the target
population is critical. If you do not interview the right kinds of people, you will not
successfully meet your goals.
The next thing to decide is how many people you need to interview. You must make a
decision about your sample size based on factors such as: time available, budget and
necessary degree of precision.
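To make the link between precision and sample size concrete, here is a rough sketch using the widely used formula for estimating a proportion, n0 = z^2 p(1-p) / e^2, with an optional finite-population correction. This formula is not given in the lecture; the function name sample_size, the 5 per cent margin of error, the 95 per cent confidence level (z = 1.96) and the population of 2000 are all assumptions for illustration.

```python
import math

def sample_size(margin, p=0.5, z=1.96, population=None):
    """Approximate sample size for estimating a proportion.

    margin     : desired margin of error (e.g. 0.05 for +/- 5 percentage points)
    p          : expected proportion (0.5 is the most conservative choice)
    z          : standard normal value for the confidence level (1.96 for 95%)
    population : population size, if small enough to matter
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population correction reduces the requirement for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

n_large = sample_size(0.05)                    # very large population
n_small = sample_size(0.05, population=2000)   # hypothetical population of 2000
print(n_large, n_small)
```

Halving the margin of error roughly quadruples the required sample, which is why time and budget have to be weighed against precision.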
A survey is biased if its outcome has been influenced by factors other than the one being
studied. Bias is occasionally overt: the experimenter is not open-minded about the results,
and interprets them wrongly. But more often bias comes from poor survey design. A
typical problem is that of comparing two groups of people that are not really alike. For
example, if there are more men than women in one group, and more women than men in
another, the responses of the groups to any question will be influenced by the differences
between men and women. The solution to this problem is randomization. In some
cases it is necessary to use 'stratified' random sampling to ensure that the sample is typical
of the population.
Selecting Respondents
Select survey respondents at random from the intended audience. If at all possible,
identify a comparison group that doesn't get the information so that you can see how much
of the change in knowledge, attitude, and/or behavior is a result of your information versus
a result of other factors in the market place. This is a variation on a control group; in a real
experiment, you would randomly assign people to either a group that gets the information
or the control group that would not. But random assignment is not feasible in the context
of report cards, so a comparison group is an acceptable alternative. One of the easiest
ways to create a comparison group is to collect baseline data, i.e., responses to key
questions collected before the information was disseminated. This is often referred to as a
"pre/post" survey. You do not have to contact the same people before and after the
distribution period. But be sure to survey a representative sample each time so that their
responses are comparable.
Interviewing Methods
Once you have decided on your sample you must decide on your method of data
collection. Each method has advantages and disadvantages. Some of the methods are listed
as follows:
1. Personal Interview: An interview is called personal when the Interviewer asks the
questions face-to-face with the Interviewee. Personal interviews can take place in the
home, at a shopping mall, on the street, outside a movie theater or polling place, and so on.


2. Telephone Surveys: Surveying by telephone is the most popular interviewing method.
This is made possible by nearly universal telephone coverage.
3. Mail Surveys: One way of improving response rates to mail surveys is to mail a
postcard telling your sample to watch for a questionnaire in the next week or two. Another
is to follow up a questionnaire mailing after a couple of weeks with a card asking people
to return the questionnaire.
4. Computer Direct Interviews: These are interviews in which the Interviewees enter their
own answers directly into a computer. They can be used at malls, trade shows, offices, and
so on. Some researchers set up a Web page survey for this purpose.
5. Email Surveys: Email surveys are both very economical and very fast. More people
have email than have full Internet access. This makes email a better choice than a Web
page survey for some populations. On the other hand, email surveys are limited to simple
questionnaires, whereas Web page surveys can include complex logic.
6. Internet/Intranet (Web Page) Surveys: Web surveys are rapidly gaining popularity.
They have major speed, cost, and flexibility advantages, but also significant sampling
limitations. These limitations make software selection especially important and restrict
the groups you can study using this technique. An Internet survey is recommended mainly
when your target population is Internet users. Business-to-business research and employee
attitude surveys can often meet this requirement. Another reason to use a Web page survey
is when you want to show video or both sound and graphics. A Web page survey may be
the only practical way to have many people view and react to a video.
Tips for Improving Response Rates

Know your respondents. Make certain the questions are understandable to them, to the
point, and not insensitive to their social and cultural values.

Use trained personnel to recruit respondents and conduct surveys. Set up a quality
assurance system for monitoring quality and retraining.

Identify a larger number of eligible respondents than you need in case you do not get
the sample size you need.

Keep survey responses confidential or anonymous.

Send reminders to complete mailed surveys and make repeat phone calls.

Provide gift or cash incentives.

Be realistic about the eligibility criteria. Anticipate the proportion of respondents
who may not be able to participate because of survey circumstances (such as incorrect
addresses) or by chance (sudden illness).

Formally respect each respondent's privacy.


The questionnaire is widely used for data collection in survey research. It is a fairly reliable
tool for gathering data from large, diverse and scattered groups. A questionnaire is a list
of questions sent to a number of persons for their answers, which yields standardised
results that can be tabulated and treated statistically.
Sometimes a distinction is made between a 'questionnaire' and a 'schedule' or 'interview
guide'. Generally a questionnaire is mailed to the respondents, who are to give answers in a
manner specified either in the covering letter or in the main questionnaire itself. A
schedule, on the other hand, is a form of questionnaire which is generally filled in by the
investigator, who sits with the informant face to face and fills in the form.
A schedule is more effective than a mailed questionnaire because, in most cases,
respondents do not respond properly to a mailed questionnaire owing to ignorance,
illiteracy, or lack of awareness and interest. With a schedule, the investigator has
face-to-face contact with the respondents and is therefore able to get reliable information
from them.
Types of Questionnaire
Questionnaires are broadly of two types, viz. structured and unstructured
questionnaires. According to P.V. Young, structured questionnaires are those which pose
definite, concrete, and pre-determined questions, i.e., they are prepared in advance and not
constructed on the spot during the question period. Additional questions may be asked
only when some clarification is required. Answers to these questions are normally given
with high precision. For example, questions on age, sex, marital status, number of children,
nationality, etc. are automatically structured. Structured questionnaires may further be
grouped into closed-form and open-ended questionnaires. A closed-form questionnaire is one
in which questions are set in such a manner that only a few alternative answers are left.
The informant has only a few choices with which to answer. For example: Do you think poverty
and unemployment have increased in India after economic reform? Yes/No/Can't say.
In this question, the respondent has to select one out of three alternatives.
The open-ended questionnaire, on the other hand, is one in which the respondent has full
choice of style, diction, length and perception in expressing an answer.
The respondent has ample freedom in answering open questions.
The unstructured questionnaire contains a set of questions which are not structured
in advance and which may be adjusted according to the need of question period. The
unstructured questionnaire is used mainly for conducting interviews. Flexibility is its
chief merit.
A widespread criticism of closed questionnaires is that they force people to choose
among offered alternatives instead of answering in their own words. On the other hand,
closed questions spell out the response options; they are more specific than open questions
and therefore more apt to communicate the same frame of reference to all respondents.
Let us take a hypothetical case: we want to identify the most important problem
facing the country. In an open versus closed experiment, people are asked what they think
is the most important problem facing the nation. In the closed-ended framework, we set five
alternatives, namely unemployment, economic disparity, crime, poor governance and inflation.
One open-ended question is also set. In response to the open-ended question, the respondents
may identify power shortages as the most vital problem of the country. Thus, open-ended
questions are also relevant, especially when the researcher has inadequate knowledge
about the various problems faced by the country.
Construction of Questionnaire
A. General Considerations
1. Most problems with questionnaire analysis can be traced back to the design phase
of the project. Well-defined goals are the best way to assure a good questionnaire
design. When the goals of a study can be expressed in a few clear and concise
sentences, the design of the questionnaire becomes considerably easier. The
questionnaire is developed to directly address the goals of the study.
2. One of the best ways to clarify your study goals is to decide how you intend to
use the information. This sounds obvious, but many researchers neglect this task.
3. Be sure to commit the study goals to writing. Whenever you are unsure of a
question, refer to the study goals and a solution will become clear. Ask only
questions that directly address the study goals.
4. KISS - keep it short and simple. If you present a 20-page questionnaire most
potential respondents will give up in horror before even starting. One of the
most effective methods of maximizing response is to shorten the questionnaire.
5. If your survey is over a few pages, try to eliminate questions. Many people have
difficulty knowing which questions could be eliminated. For the elimination
round, read each question and ask, "How am I going to use this information?" If
the information will be used in a decision-making process, then keep the
question... it's important. If not, throw it out.
6. Involve other experts and relevant decision-makers in the questionnaire design process.
7. Formulate a plan for doing the statistical analysis during the design stage of the
project. Know how every question will be analyzed and be prepared to handle
missing data. If you cannot specify how you intend to analyze a question or use
the information, do not use it in the survey.
8. Provide a well-written cover letter. The respondent's next impression comes from
the cover letter (for a mailed questionnaire). It provides your best chance to
persuade the respondent to complete the survey.
9. Give your questionnaire a title that is short and meaningful to the respondents. A
questionnaire with a title is generally perceived to be more credible than one without.
10. Begin with a few non-threatening and interesting items. If the first items are too
threatening or "boring", there is little chance that the person will complete the
questionnaire.
11. Leave adequate space for respondents to make comments. Leaving space for
comments will provide valuable information not captured by the response categories.
12. Place the most important items in the first half of the questionnaire. Respondents
often send back partially completed questionnaires.
13. Use professional production methods for the questionnaire: either desktop
publishing or typesetting and key-lining. Be creative.
14. The final test of a questionnaire is to try it on representatives of the target audience.
B. Language
The wording of a question is extremely important. Researchers strive for
objectivity in surveys and, therefore, must be careful not to lead the respondent into giving
a desired answer. Many investigators have confirmed that slight changes in the way
questions are worded can have a significant impact on how people respond.
Because questionnaires are usually written by educated persons who have special
interest in and understanding of the topic of their investigation and because these people
usually consult with other educated and concerned persons, it is common for
questionnaires to be overwritten, over complicated, and too demanding of the respondent.
Therefore, it requires special measures to cast questions that are clear and straight forward
in four important aspects; simple language, common concepts, manageable tasks and
widespread information.
In choosing the language for a good questionnaire, the nature and structure of
population to be studied should be kept in mind. Technical terms and jargons should be
avoided to the maximum possible extent. Words used in ordinary conversation should be
preferred. For example:

start and so on

In surveys of the general population, questions should consist of simple words which
convey the exact meaning. Ambiguous and vague words should be avoided. As far as
possible, the words of the local dialect should be used. Double version of questions should be
Common concepts should be used in the questionnaire. Mathematical abstractions
tend to be difficult for the general public; 'variance' is one instance. Survey investigators
would not think of asking the general public questions about variances or standard
deviations. They know perfectly well that the concept of an average is much more widely
understood than the others.


C. Question Content
A questionnaire designer has to ensure that all the necessary items are duly
incorporated in the questionnaire. The investigator may take the help of standard
checklists to see that all the required items are included in the questionnaire. The checklist
can also be prepared by the investigator himself. Check lists may differ depending upon
the aims and objectives of the survey research. Some of the important items of checklist
of content are as follows:

Is this question necessary for clear understanding? Just how will it be used?
Are several questions needed on the subject matter of this one question?
Do the respondents have the information necessary to answer the question?
Does the question need to be more concrete, more specific and more closely related to
the respondent's experience?
Is the question content sufficiently general and free from spurious concreteness
and specificity?
Is the question content biased or loaded in one direction without accompanying
questions to balance the emphasis?

D. Question Types
Researchers use three basic types of questions: multiple choice, numeric open end and
text open end (sometimes called "verbatim"). Examples of each kind of question follow:
1. Multiple Choice Question
Where do you live?

(1) Northern Region (2) Central Region (3) Eastern Region

(4) Western Region (5) Southern Region

2. Numeric Open End Question
How much did you spend on fruits last week? ------------
3. Text Open End Question
How can your company improve its working conditions?
---------------------------------------------------------------------------------
Rating Scales and Agreement Scales are two common types of questions.
Rating scale Example
How would you rate this Product?
1. Excellent
2. Very good
3. Good
4. Fair
5. Poor
On a scale where 10 means you have a great amount of interest in a subject and
1 means you have none at all, how would you rate your interest in each of the following


New economic policy

SEZ policy
Corporate social responsibility
Labour Market reforms

Agreement scale Example

How much do you agree with each of the following statements?
Sl.  Statement                                                    ...y agree ...
     ... constructive criticism
     Our medical plan provides adequate coverage
     Globalization has benefited the Indian economy
     Rural-urban disparities have increased in the post-reform ...



E. Qualities of a Good Question

The qualities of a good question are as follows:
1. Evokes the truth. Questions must be non-threatening. Anonymous questionnaires
that contain no identifying information are more likely to produce honest
responses than those identifying the respondent.
2. Asks for an answer on only one dimension. For example, a researcher
investigating a new food snack asks "Do you like the texture and flavor of the
snack?" If a respondent answers "no", then the researcher will not know if the
respondent dislikes the texture or the flavor, or both. Another questionnaire asks,
"Were you satisfied with the quality of our food and service?"
3. Can accommodate all possible answers. Multiple choice items are the most
popular type of survey questions because they are generally the easiest for a
respondent to answer and the easiest to analyze. For example, consider the
following question:
What brand of computer do you own? __
A. IBM PC
B. Apple
Clearly, there are many problems with this question. What if the respondent doesn't own
a computer? What if he owns a different brand of computer? What if he owns both
an IBM PC and an Apple? There are two ways to correct this kind of problem.
The first way is to make each response a separate dichotomous item on the questionnaire.
For example:
Do you own an IBM PC? (circle: Yes or No)

Do you own an Apple computer? (circle: Yes or No)

Another way to correct the problem is to add the necessary response categories and allow
multiple responses. This is the preferable method because it provides more information
than the previous method.
What brand of computer do you own?
(Check all that apply)
__ Do not own a computer
__ IBM PC
__ Apple
__ Other
4. Has mutually exclusive options. A good question leaves no ambiguity in the mind of
the respondent. There should be only one correct or appropriate choice for the respondent
to make.
5. Produces variability of responses. When a question produces no variability in
responses, we are left with considerable uncertainty about why we asked the question and
what we learned from the information. If a question does not produce variability in
responses, it will not be possible to perform any statistical analyses on the item. For
example:
What do you think about this report? __
A. It's the worst report I've read
B. It's somewhere between the worst and best
C. It's the best report I've read
Since almost all responses would be choice B, very little information is learned.
6. Does not presuppose a certain state of affairs. Among the most subtle mistakes in
questionnaire design are questions that make an unwarranted assumption. An example of
this type of mistake is:
Are you satisfied with your current auto insurance? (Yes or No)
This question will present a problem for someone who does not currently have auto
insurance. Write your questions so they apply to everyone.
One of the most common mistaken assumptions is that the respondent knows the correct
answer to the question. Industry surveys often contain very specific questions that the
respondent may not know the answer to. For example:
What percent of your budget do you spend on direct mail advertising? ____
7. Does not imply a desired answer. The wording of a question is extremely important. A
leading question such as the following pushes the respondent towards one answer:
Don't you think most of the politicians are corrupt?
8. Does not use emotionally loaded or vaguely defined words. Quantifying adjectives
(e.g., most, least, majority) are frequently used in questions. It is important to understand
that these adjectives mean different things to different people.
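Two of the qualities listed above lend themselves to a quick check once pre-test answers are in hand. The following is only a minimal sketch: the option list, the helper names `to_indicators` and `has_variability`, and the 0.9 cut-off are illustrative assumptions, not part of any standard procedure. It shows how a "check all that apply" answer can be stored as separate Yes/No items, and how an item with almost no variability in its responses can be flagged.

```python
from collections import Counter

# Hypothetical response categories for the "check all that apply" computer question.
OPTIONS = ["Do not own a computer", "IBM PC", "Apple", "Other"]


def to_indicators(checked):
    """Turn one respondent's ticked options into separate Yes(1)/No(0) items."""
    return {option: int(option in checked) for option in OPTIONS}


def has_variability(responses, max_share=0.9):
    """Return False when a single answer exceeds max_share of all responses.

    The 0.9 cut-off is an arbitrary illustrative threshold, not a standard.
    """
    counts = Counter(responses)
    return max(counts.values()) / len(responses) <= max_share


# One respondent who owns both an IBM PC and an Apple.
print(to_indicators({"IBM PC", "Apple"}))

# Hypothetical pre-test answers to the "report" question: nearly everyone picks B.
report_answers = ["B"] * 19 + ["A"]
print(has_variability(report_answers))
```

An item flagged in this way is a candidate for rewording before the main survey, since it is unlikely to support any statistical analysis.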
E. Question Sequence
Items of a questionnaire should be grouped into logically coherent sections.
Grouping questions that are similar will make the questionnaire easier to complete, and
the respondent will feel more comfortable. Questions that use the same response formats,
or those that cover a specific topic, should appear together. Each question should follow
comfortably from the previous question. Writing a questionnaire is similar to writing
anything else. Transitions between questions should be smooth. Questionnaires that jump
from one unrelated topic to another feel disjointed and are not likely to produce high
response rates.
Some researchers have suggested that it may be necessary to present general
questions before specific ones in order to avoid response contamination. Other
researchers have reported that when specific questions were asked before general
questions, respondents tended to exhibit greater interest in the general questions.
The numbering of questions should be in a logical sequence. To check the
sequence of questions the following questions should be answered.

Are the answers to the questions likely to be influenced by the content of the
preceding questions?
Are the questions led up to in a natural way?
Do some questions come too early or too late from the point of view of arousing
interest and receiving sufficient attention, avoiding resistance and inhibitions?

E. Commandments for Construction of Good Questionnaire

D.C. Miller provides a guide to the questionnaire construction:

Keep the language pitched to the level of the respondent.
Try to pick words that have the same meaning for everyone.
Avoid long questions.
Do not assume a priori that your respondent possesses factual information
or first-hand opinions.
Establish the frame of reference you have in mind.
In forming a question, either suggest all possible alternatives or do not suggest
any.
Protect your respondent's ego.
If you are after unpleasant orientations, give your respondent a chance to express
his positive feelings first so that he is not put in an unfavourable light.
Decide whether you need a direct question, an indirect question, or an indirect
question followed by a direct question.
Decide whether the question should be open or closed.
Decide whether general or specific questions are needed.
Avoid ambiguous wording.
Avoid biased questions.
Phrase questions so that they are not unnecessarily objectionable.
Decide whether a personal or impersonal question will obtain the better response.
Questions should be limited to a single idea or a single reference.

Pre-test the Questionnaire

The last step in questionnaire design is to test a questionnaire with a small number
of interviews before conducting your main interviews. Ideally, you should test the survey
on the same kinds of people you will include in the main study. Pre-tests and pilot studies
are the essence of a good questionnaire. They enable the investigator to identify mistakes
and unwarranted or undesirable trends that might have crept into the questionnaire. They
help in enriching the design of the questionnaire and assist in testing the validity and
reliability of the statistical techniques to be adopted for data processing and analysis.
Questionnaire for Interviewers/Investigator
After pre-testing a questionnaire, a questionnaire for the interviewers should
be constructed to obtain relevant information, so that the mistakes or inconsistencies
observed in the questionnaire can be removed.

Did any of the questions seem to make respondent uncomfortable?

Did you have to repeat any questions?
Did respondent misinterpret any questions?
Which questions were the most difficult or awkward for you to read? Have you
come to dislike any specific questions? Why?
Did any of the sections seem to drag?
Were there any sections in which you felt that the respondent would have liked the
opportunity to say more? and so on.

After finalising the questionnaire/schedule by correcting it on the basis of pre-testing, the investigator has to collect data from the field. The following points should be
taken into account by the investigator while collecting data:

He must plan in advance and should fully know the problem under consideration.
He must choose a suitable time and place so that the respondent is at ease during
the interview.
All possible efforts should be made to establish proper rapport with the informant;
people are motivated to communicate when the atmosphere is favourable.
He must know that the ability to listen with understanding, respect and curiosity is the
gateway to communication, and hence act accordingly during the survey.
The investigator's approach must be friendly and informal. Initially, friendly greetings
in accordance with the cultural pattern of the respondent should be exchanged and
then the purpose of the survey should be explained.
To the extent possible, there should be a free-flowing interview and the questions
must be well phrased in order to have the full cooperation of the respondent.



Testing of hypothesis, developed by Neyman and Pearson, employs statistical tools to
arrive at a decision in situations where there is an element of uncertainty.
In a test of hypothesis we see whether there is a significant difference between a parameter
and a statistic, between two parameters, or between two statistics.
Inference about a population on the basis of a sample may pertain to a certain hypothesis.
Hypothesis: A hypothesis is an assumption or a theoretical proposition that is capable of
empirical verification or disproof.
It may or may not be true.
A statistical hypothesis is an assertion about the probability distribution of one or more
random variables.
It is simple if it completely specifies the probability distribution of the population,
and complex or composite if it does not completely specify the probability distribution
of the population.
Test of Significance:
The procedure to assess the significance of the difference between a sample statistic and
the corresponding population parameter, or of the difference between two independent statistics, is
called a test of significance.
Example: An agronomist wants to establish from his experimental data whether the average
yield of a new variety has some specific value or not,
or whether the yields of two varieties of wheat are the same or not.
The question is how to decide whether a difference is real (significant) or due
to chance (non-significant), and how large a difference must be to be considered
statistically significant.
Hypotheses are of two types:
Null Hypothesis (H0) and Alternate Hypothesis (H1)
The decision-maker should always adopt a neutral or null attitude towards the
outcome of the experiment.
Null Hypothesis: a statistical hypothesis (set by the statistician, acting as a judge) of no difference;
it is tested for possible rejection under the assumption that it is true.
The null hypothesis is either rejected or not rejected at a certain level of significance.
It should never be accepted on the basis of one sample statistic.
Alternate Hypothesis: set by the experimenter. The hypothesis representing the opposite of
the null hypothesis is called the alternate hypothesis. It is any statistical hypothesis that is
complementary to the null hypothesis.
Ex. If we are to test whether the average per capita income (PCI) of two states differs significantly or not,
the null hypothesis will be
$H_0: \mu_a = \mu_b$ (i.e. the PCI of states A and B do not differ significantly)
Alternate Hypothesis
1. $H_1: \mu_a \neq \mu_b$ (two-tailed alternate)
2. $H_1: \mu_a > \mu_b$ (one-tailed; right-tailed alternate)
3. $H_1: \mu_a < \mu_b$ (one-tailed; left-tailed alternate)


Significance Level: The significance level is the probability with which the null hypothesis will be
rejected due to sampling error even though it is true.
The decision to reject or not reject the null hypothesis depends upon the information contained in the
sample, and there is always a risk of taking a wrong decision. One is likely to commit two
types of errors.
Type-I Error: The error of rejecting the null hypothesis on the basis of information
contained in the sample when it is actually true is called Type-I error. Its probability
is denoted by $\alpha$. (In quality control it is called
producer's risk because it is the probability of rejecting a good lot.)
The probability of committing a Type-I error is called the level of significance:
$\alpha$ = Level of Significance = Probability of rejecting H0 when it is true.
Type-II Error: The error of accepting the null hypothesis when it is false. Its probability is denoted by $\beta$.
It is also called consumer's risk because it is the probability of accepting a bad lot.
Hypotheses H0 and H1 are mutually exclusive, i.e. if H0 is accepted (rejected) then
H1 is rejected (accepted).
Power of Test: The probability of rejecting the null hypothesis on the basis of sample
information when the null hypothesis is false is called the power of a test.
Therefore Power of Test = Prob. (Reject H0 when H0 is false)
Prob. (Reject H0 when H0 is false) + Prob. (Accept H0 when H0 is false) = 1
Prob. (Reject H0 when H0 is false) + $\beta$ = 1
Therefore Power of test = Prob. (Reject H0 when H0 is false) = $1 - \beta$
So the test will be more powerful when the Type-II error probability $\beta$ is small.
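The relation between the level of significance, the Type-II error and the power can be made concrete with a small calculation. The sketch below assumes a right-tailed Z-test of a mean with known population standard deviation; the function name `z_test_power` and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power and Type-II error of a right-tailed Z-test.

    H0: mu = mu0 is tested against a specific alternative mu = mu1 > mu0,
    with known population standard deviation sigma and sample size n.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # critical point for level alpha
    se = sigma / n ** 0.5                    # standard error of the sample mean
    shift = (mu1 - mu0) / se                 # distance of mu1 from mu0 in s.e. units
    beta = std_normal.cdf(z_crit - shift)    # P(accept H0 when H0 is false)
    return 1 - beta, beta


# Hypothetical figures: H0 mean 50, true mean 52, sigma 5, sample of 25.
power, beta = z_test_power(mu0=50, mu1=52, sigma=5, n=25)
print(f"beta = {beta:.3f}, power = {power:.3f}")

# A larger sample reduces beta and therefore raises the power.
power_large, _ = z_test_power(mu0=50, mu1=52, sigma=5, n=100)
print(f"power with n = 100: {power_large:.3f}")
```

For a fixed level of significance, increasing the sample size is the usual way to reduce the Type-II error without raising the Type-I error.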
Sample Space: Let the population size be N and the size of the random sample drawn be n; the number of possible samples is k (for example, $k = \binom{N}{n}$ when sampling without replacement).
Suppose some statistic t is computed from each of the samples,
$t = f(x_1, x_2, x_3, \ldots, x_n)$. The possible values of the statistic, $t_1, t_2, t_3, \ldots, t_k$, constitute the sample space.
It is used to test the null hypothesis. Some values will lead to rejection of H0; others will lead to
acceptance of H0.
Thus the sample space of the statistic is divided into two disjoint and exhaustive sets.
Critical Region (W): It is the part of the sample space which leads to rejection of the null hypothesis
if the given sample statistic falls in this region.
Acceptance Region: It is that part of the sample space which leads to acceptance of the null
hypothesis if the sample statistic falls in it.
Critical Point: The point in the sample space which divides it into two mutually
disjoint and exhaustive sets is known as the critical point.
The critical points are tabulated values for different sampling distributions. The form of the
sample space is determined by the sampling distribution used, such as t, F, $\chi^2$, Z, etc.
Sampling Distribution: The sampling distribution is the probability distribution of a statistic.
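For a very small population the sample space of a statistic can be listed in full. The sketch below uses a hypothetical population of five values, assumes samples of size two drawn without replacement, and takes the sample mean as the statistic t; none of these choices comes from the text above.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import fmean

population = [2, 4, 6, 8, 10]   # hypothetical population, N = 5
n = 2                           # sample size

# All possible samples of size n drawn without replacement.
samples = list(combinations(population, n))
k = len(samples)                # equals C(N, n)

# The statistic t computed from each sample (here the sample mean).
sample_means = [fmean(s) for s in samples]

# Sampling distribution: each value of the statistic with its probability.
sampling_distribution = {
    value: count / k for value, count in sorted(Counter(sample_means).items())
}

print(k, comb(len(population), n))
print(sampling_distribution)
```

A critical region is then simply a subset of these values of t whose total probability under H0 equals the chosen level of significance.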

Two tailed and one tailed tests: A two tailed test rejects the null hypothesis if, say, the
sample mean is significantly higher or lower than the hypothesized value of the mean of
the population. Such a test is appropriate when the null hypothesis is some specified value
and the alternative hypothesis is a value not equal to the specified value of the null
hypothesis. Symbolically, the two tailed test is appropriate when we have $H_0: \mu = \mu_0$
and $H_1: \mu \neq \mu_0$, which may mean $\mu > \mu_0$ or $\mu < \mu_0$. Thus in a two tailed test there are two
rejection regions, one on each tail of the curve.
One tailed test: A one tailed test would be used when we are to test, say, whether the
population mean is either lower than or higher than some hypothesized value.
For example, if $H_0: \mu = \mu_0$ and
$H_1: \mu < \mu_0$, then we are interested in what is known as a left tailed test; if
$H_1: \mu > \mu_0$, then it is a one tailed test known as a right tailed test. (In each case there is only
one rejection region, either on the left tail or on the right tail.)
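The difference between the rejection regions can be seen by computing the critical points directly. This is a sketch for a Z statistic only; the function names `critical_points` and `rejects_h0` are illustrative assumptions.

```python
from statistics import NormalDist


def critical_points(alpha=0.05, tail="two"):
    """Return (lower, upper) critical points of a Z-test.

    None means there is no rejection region on that side.
    """
    std_normal = NormalDist()
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)   # alpha is split between both tails
        return (-z, z)
    if tail == "left":
        return (std_normal.inv_cdf(alpha), None)
    if tail == "right":
        return (None, std_normal.inv_cdf(1 - alpha))
    raise ValueError("tail must be 'two', 'left' or 'right'")


def rejects_h0(z, alpha=0.05, tail="two"):
    """True when the computed statistic z falls in the critical region."""
    lower, upper = critical_points(alpha, tail)
    return (lower is not None and z < lower) or (upper is not None and z > upper)


print(critical_points(0.05, "two"))
print(critical_points(0.05, "right"))
print(rejects_h0(1.8, tail="right"), rejects_h0(1.8, tail="two"))
```

The same computed value of Z can therefore be significant in a one tailed test and non-significant in a two tailed test at the same level, which is why the form of H1 must be fixed before the data are examined.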
Tests of Hypotheses:
Tests of hypotheses (also known as tests of significance) can be classified as:
1. Parametric Tests
2. Non-parametric Tests
Parametric Tests: Parametric tests usually assume certain properties of the parent
population from which the sample is drawn. Assumptions such as that the observations come from a
normal population, that the sample size is large, and assumptions about population parameters like
the mean and variance must hold good before parametric tests can be used. The probability
distribution of the statistic (sampling distribution) is known, i.e. it follows a particular
distribution like t, F, Z, etc. Parametric tests cannot be applied if the nature of the parent
population is unknown and the data are measured on a nominal/ordinal scale.
The important parametric tests are: z-test, t-test, $\chi^2$-test and F-test. (The $\chi^2$-test is also used as
a test of goodness of fit and as a test of independence, in which case it is a non-parametric test.)
All these tests are based on the assumption of normality, i.e. the source of data is considered
to be normally distributed.
Z-test: It is based on the normal probability distribution and is used for judging the
significance of several statistical measures, particularly the mean. The relevant test
statistic, Z, is worked out and compared with its probable value at a specified level of
significance for judging the significance of the measure concerned.
As n becomes large, the Z-test is generally used even when the binomial distribution or t-distribution is applicable, on the presumption that such a distribution tends to approximate
the normal distribution.
The Z-test is used for comparing a sample mean with a hypothesized population mean when the population variance is known,
for judging the significance of the difference between the means of two independent samples
when the population variance is known, for comparing a sample proportion to a theoretical value
of the population proportion, or for judging the difference in proportions of two independent samples
when n happens to be large. This test may also be used for judging the significance of the median,
mode, coefficient of correlation and several other measures.
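As an illustration of the first use mentioned above, a one-sample Z-test of a mean can be sketched as follows. The yield figures, the hypothesized mean of 50 and the "known" population standard deviation of 1.2 are all hypothetical, and `one_sample_z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma):
    """Z statistic and two-tailed p-value for H0: mu = mu0, with sigma known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical plot yields of a new wheat variety.
yields = [52.1, 49.8, 51.5, 50.9, 53.2, 50.4, 51.8, 52.6,
          49.9, 51.3, 50.7, 52.0, 51.1, 50.2, 52.4, 51.6]

z, p = one_sample_z_test(yields, mu0=50.0, sigma=1.2)
print(f"Z = {z:.3f}, two-tailed p-value = {p:.6f}")
print("Reject H0 at the 5% level" if p < 0.05 else "Do not reject H0 at the 5% level")
```

Comparing the p-value with the level of significance is equivalent to comparing Z with the tabulated critical value.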
t-test: The t-test is based on the t-distribution and is considered an appropriate test for judging the
significance of a sample mean or of the difference between the means of two samples in the case of
small sample(s) when the population variance is not known (the sample variance is then used as an estimate of the population
variance). In case two samples are related, we use the paired t-test (difference test) for
judging the significance of the mean of the differences between the two related samples. It is also used for
testing the significance of the coefficients of simple and partial correlation. The relevant
test statistic, t, is calculated from the sample data and then compared with its probable value
based on the t-distribution, read from the table at the chosen level of significance and degrees of
freedom, for accepting or rejecting the hypothesis.
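The calculation of t for a single sample and for two related samples can be sketched as below. The before/after scores are hypothetical, the function names are illustrative, and the critical value 2.262 is the two-tailed 5% table value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0 (variance unknown)."""
    n = len(sample)
    t = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """Paired t-test: a one-sample t-test on the differences, H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


# Hypothetical scores of ten trainees before and after a course.
before = [72, 68, 75, 80, 66, 71, 77, 69, 74, 70]
after = [75, 70, 78, 82, 69, 72, 80, 73, 75, 74]

t, df = paired_t(before, after)
T_CRIT_5PCT_DF9 = 2.262   # two-tailed 5% value from the t table for 9 d.f.
print(f"t = {t:.3f} with {df} d.f.")
print("Reject H0" if abs(t) > T_CRIT_5PCT_DF9 else "Do not reject H0")
```

Note that `stdev` divides by n - 1, which is the sample variance the t-test requires when the population variance is unknown.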
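$\chi^2$-test: It is based on the chi-square distribution and, as a parametric test, is used for comparing
a sample variance to a theoretical population variance.
$\chi^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / \sigma^2 = (n-1) S^2 / \sigma^2$
with n-1 d.f., where $S^2$ is the sample variance and $\sigma^2$ is the hypothesized population variance.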
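The statistic above can be computed in a few lines. In this sketch the packet weights and the hypothesized variance of 0.04 are hypothetical, and 16.919 is the upper 5% table value of chi-square for 9 degrees of freedom, used here for a right-tailed test.

```python
from statistics import variance


def chi_square_variance_stat(sample, sigma0_sq):
    """Chi-square statistic (n - 1) * S^2 / sigma0^2 and its degrees of freedom."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq, n - 1


# Hypothetical packet weights; H0: population variance = 0.04.
weights = [10.2, 9.8, 10.5, 9.6, 10.1, 10.4, 9.9, 10.3, 9.7, 10.0]

chi2, df = chi_square_variance_stat(weights, sigma0_sq=0.04)
CHI2_CRIT_5PCT_DF9 = 16.919   # upper 5% point of chi-square with 9 d.f. (table value)
print(f"chi-square = {chi2:.3f} with {df} d.f.")
print("Reject H0" if chi2 > CHI2_CRIT_5PCT_DF9 else "Do not reject H0")
```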
F-test: The F-test is based on the F-distribution and is used to compare the variances of two
independent samples. This test is also used in the context of analysis of variance
(ANOVA) for judging the significance of differences among more than two sample means at one and the
same time. It is also used for judging the significance of multiple correlation coefficients.
The test statistic, F, is calculated and compared with its probable value for accepting or
rejecting the null hypothesis. (We use the F-ratio table for the given d.f. at a certain level of
significance.)
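The variance-ratio form of the F-test can be sketched as follows. The two samples are hypothetical, `f_ratio` is an illustrative name, and placing the larger sample variance in the numerator is a common convention assumed here so that F is at least 1 and the upper tail of the F table can be used.

```python
from statistics import variance


def f_ratio(sample_a, sample_b):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical yields from two independent sets of plots.
variety_a = [20, 22, 19, 24, 25, 21, 23]
variety_b = [21, 22, 21, 23, 22, 21, 22]

F, df_num, df_den = f_ratio(variety_a, variety_b)
print(f"F = {F:.3f} with ({df_num}, {df_den}) d.f.")
```

The computed F is then compared with the tabulated value for these degrees of freedom at the chosen level of significance.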
Non-Parametric Tests: Tests which are used when practical data may be non-normal
and/or it may not be possible to estimate the parameter(s) of the data are called non-parametric tests. Since these tests do not depend on a particular distribution
or its parameters, they are also called distribution-free tests.
Non-parametric tests can be used for nominal data (qualitative data, like greater or less
etc.) and ordinal data, like ranked data. These tests require less calculation, because there
is no need to compute parameters. They can also be applied to very small samples,
for instance during pilot studies in market research. Inference about the population
can be made with non-parametric tests when the assumptions of the standard methods
cannot be satisfied, since non-parametric tests involve no or less restrictive
assumptions than parametric tests.
Main non-parametric tests are
1. One-sample tests
a. One-sample sign test
b. Chi-square test
c. Kolmogorov-Smirnov test
d. Run test for randomness
2. Two-sample tests
a. Two-sample sign test
b. Median test
c. Mann-Whitney U test (Rank sum test)
3. K-sample tests
a. Median test
b. Kruskal-Wallis test (H test)
c. Kendall's coefficient of concordance test
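As an example of how little calculation such tests need, the one-sample sign test can be carried out with exact binomial probabilities. The sketch below is only illustrative: the observations and the hypothesized median of 11 are hypothetical, and observations equal to the hypothesized median are dropped, which is one common way of handling ties.

```python
from math import comb


def sign_test_p_value(sample, median0):
    """Two-tailed exact sign test of H0: population median = median0.

    Under H0 the number of plus signs follows Binomial(n, 1/2).
    Returns (p_value, number of plus signs, number of minus signs).
    """
    plus = sum(1 for x in sample if x > median0)
    minus = sum(1 for x in sample if x < median0)
    n = plus + minus                      # ties with median0 are dropped
    k = min(plus, minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail), plus, minus


# Hypothetical observations; H0: median = 11.
observations = [12, 15, 9, 18, 20, 14, 17, 16, 19, 13, 21, 10]

p_value, plus, minus = sign_test_p_value(observations, median0=11)
print(f"plus = {plus}, minus = {minus}, p-value = {p_value:.4f}")
```

Only the signs of the deviations are used, so no assumption about the shape of the parent population is needed.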




Leverage funds to expand facilities

Accord international recognition
Support staff training
Enable universities to participate in community service
Enable academic staff to achieve promotion
Generate new knowledge for national growth and development

Two main purposes: (i) to get a degree and (ii) to conduct sponsored and consultancy
research projects.
With increasing privatization of higher education and shrinking public grants,
academic institutions are placing greater stress on generating their own resources.
Academic institutions need faculty capable of carrying out independent sponsored and
consultancy research projects and of leading a research team.
A good research proposal (RP) is necessary not only for high-quality research
but also for getting grants from funding agencies.
An RP must be convincing to the anonymous experts who examine it to see whether it
is methodologically sound and conceptually clear, and whether it would make a significant
contribution to knowledge on the subject.
As a large number of RPs are submitted to funding agencies for financial assistance, your
proposal needs to be excellent, and not just very good, to be approved for funding.
1. Research you really want to do:
Find a sponsor!
CSIR, MHRD, UGC, DST, UNDP, Foundations, NGOs, World Bank, DFID, Ministries
2. Topics some sponsor wants to see done:
Industries, organizations, Ministries;
market surveys, evaluation studies, R&D projects
An RP is the presentation of an idea that you wish to pursue.
It is intended to convince the funding agency/RDC that you have a worthwhile
research project and that you have the competence and the work-plan to complete
it.
A good RP presumes that you have already thought about your project and have
devoted some time and effort to gathering information, reading and organizing
your thoughts.
A high-quality proposal not only promises success for the project, but also
impresses the RDC with your potential as a researcher.


Letter proposal
Preliminary expression of interest to an investor
Developed into a full proposal only with the donor's consent
Most unsolicited proposals should first be in the form of a letter proposal
Full proposal
Often in response to Request For Proposal - RFP
It should contain:
Statement of problem
Budget and budget explanatory notes
Capability of investigators and the institution (Curriculum Vitae)
Title page
Executive summary
Problem statement
Project description
Project hypothesis
Expected outputs
Study methods/Research approach
Budget and budgetary notes.
Logical framework
Project management and personnel
Important attachments
1. Content:
2. Time Frame: Short-term
3. Scope:
4. Teaming:
Single PI
5. Selection:
6. Client:

sole source


Need to convince reviewers of scientific merit, and of your qualifications and
ability to successfully make an important contribution to the state-of-the-art.
FOR Sponsoring Agencies
Need to convince sponsoring agency that you understand the problem, that you
have a realistic approach that is likely to succeed, that could be implemented, and that you
will deliver results that will make them look good.
Doability: a good RP must be systematic, coherent and doable.
Parsimony: simple, unambiguous, free of jargon, and able to convey what you want to do
and how you want to do it.
The most important ideas are highlighted.
Consistency among objectives, hypotheses and title.
Contains an executive summary.
A detailed schedule of activities.
Collaboration, if any, is clearly stated.
Follows all of the directions given in the proposal guidelines.
Appendices for detailed and lengthy materials.
The length is consistent with the guidelines of the funding agency.
The budget and the proposal narrative are consistent.
The uses of funds are clearly indicated.
The qualifications, experience and credentials of the PI and Co-PI are mentioned.
Process of Selection of a Topic
Three factors: interest, competence and relevance
Identify a broad area and then narrow it down: follow a general-to-particular approach.
Study broadly (books, journals and reports): the more you read, the more likely
you are to encounter a topic that interests you.
Think when you read; most ideas come up unexpectedly while you are
reading. Think, and think beyond.
Be inclusive in your thinking: do not try to eliminate ideas too quickly. Build on
your ideas and see how many different research topics you can identify. Be
expansive in your thinking at this stage; you will not be able to do so later.
Write down your ideas: whenever you have a good idea, no matter how small and
how immature it may be, write it down and save it. Later on, when you check your
idea box, you will be surprised to find how many brilliant ideas you already have.
Develop a topic that has interested you throughout your graduate or undergraduate
studies.
Think about the top three issues you want to study, then turn them into questions.
Look at class notes; your teachers may have pointed out potential research topics
or commented on unanswered questions in the field.
Talk with professors or advisors about possible topics.
Study broadly to identify gaps in the literature.
Get feedback on a potential topic from your advisor.
Do research to discover why your topic has not been studied before.

Does the topic appeal to your interest?
Will it bring any pecuniary reward or advancement in status?
Can you afford the time and expense involved in completing the work?
Will you get ample facilities for conducting the investigation (e.g. skilled
advice, equipment, field workers, etc.)?
Does the topic give sufficient scope to a problem that needs investigation?
Are the results of practical or utilitarian significance?
Does the topic cover a gap in the existing field of knowledge and offer new
solutions or make advances in techniques, thought or practices?



The function of the title is to encapsulate in a few words the essence of the research.
It should be catchy, short and informative.
It should have some key words reflecting variables, theoretical basis, purposes,
time, place, etc.
Leave out phrases like "an investigation into", "a study of" and "aspects of", as these
are obvious attributes of a research project.

In the background, the researchers should:
Create reader interest in the topic
Make sure that the reviewers know in the first few sentences what your project is
about. It is a good idea to start an RP like this: "In the proposed study we seek to ..."
Lay the broad foundation of the problem that leads to the study.
Place the study within the larger context of the scholarly literature.
Reach out to a specific audience.
If a researcher is working within a particular theoretical framework of enquiry, the
theory of inquiry should be introduced.
Use references efficiently to keep the background short.
Maintain a flow in the text, where every issue raised leads to the research problem.
It must be clear from the text what the nature of the problem is, how it was
identified and why it is significant.
A problem statement should be presented within a context and the context should
be briefly explained, including a discussion on the conceptual framework.
The problem might be defined as the issue that exists in the literature, theory or
practice that leads to a need for the study.
The problem should be clearly defined, making the evaluation easy for the
reviewers/RDC members.
Effective problem statements answer the question: Why does this research need to
be conducted?
Prepare the arguments that lead to the statement of your problem:
Make a list of the issues you will address in your account of the context.
Summarize the issues in a few words and put them in an ordered sequence so
that it is possible to progress from one issue to the next in a logical manner.
Keep the list brief but make sure it covers all vital issues.
Write your research problem in one sentence at the end of the list.
Check that the issues in your sequence all relate to the problem and lead logically
towards it.
If there are gaps in the logic of the argument, add linking issues.
When you think that the argument is cogent, put some flesh on the bones in the text
by writing full sentences and adding references.

It provides the background and context for the research problem.
It shares with the readers the results of other studies that are closely related to the
proposed study.
It relates the proposed study to the ongoing dialogue in the literature, filling in
gaps and extending prior studies.
It provides a framework for establishing the importance of the study, as well as a
benchmark for comparing the results of the study with other findings.
It demonstrates to the reader that you have a comprehensive knowledge of the field.
It helps to avoid statements that imply little has been done in the area.
The literature review must address three areas:
Topic or problem area: This part of the literature review covers material directly
related to the problem being studied. separate substantive areas.
Theory area: Investigators must identify the theory which relates to the problem.
Methodology: A review of the literature related to various aspects of the
chosen method, including design, selection of subjects, and methods of data
collection. It describes the research methods and measurement approaches used in
previous studies.
The purpose of the study should be clearly stated. If it is not clear to the writer, it
cannot be clear to the reader.
Briefly define the specific area of the research.
Foreshadow the hypotheses to be tested or the questions to be raised, as well as the
significance of the study. These will require specific elaboration in separate
sections.
It should incorporate the rationale for the study.
Key points:
Start with "The purpose of the study is ...".
Clearly identify and define the central ideas of the study.
Identify the specific method of inquiry to be used.
Typically comes after problem definition, motivation and significance.
Start with the overall objective (or aim of the research), and then state two to four
specific objectives.
Write a list of the objectives of your research. Think of as many as you can. When you have
done it, consider each one carefully by asking the questions:
How will the objective be achieved (methods, resources, skills, time)?
Is it realistically possible to achieve it?
What results are required to achieve it?
Is the objective central to your study?
Are there any overlaps between the objectives?
Is there any sequence or hierarchy that links one to another? If so, are they in the
correct order?
Are there too many objectives to be realistically achievable?

State what kind of relationships you expect to find between variables or factors.
A hypothesis is particularly necessary in the search for cause-and-effect relationships.
It is your intelligent guess about the possible relationship. Do not press hard to prove
that your guess is right. It is more common to disprove than to prove a hypothesis.
A good hypothesis should possess the following features:
must be conceptually clear
should have empirical reference
must be specific
should be related to available techniques
Should be related to a body of theory
The RP must specify the research operations you will undertake and the way you
will interpret the results of these operations in terms of your central problem.
Do not just tell what you mean to achieve, tell how you will achieve.
A methodology is not just a list of research tasks but an argument as to why these
tasks add up to the best attack on the problem.
Indicate the methodological steps you will take to answer every question or test
every hypothesis.
State the variables you propose to control and how (experimentally or statistically).
Describe the sampling design, data collection, data analysis techniques, instruments, etc.
Indicate how your research will refine, revise, or extend existing
knowledge.
Such refinements, revisions, or extensions may have substantive,
theoretical, or methodological significance.
Think about implications: how the results of the study may affect scholarly
research, theory, practice and policy.
Will results influence programs, methods, and/or interventions?
Will results influence policy decisions?
How will results of the study be implemented, and what innovations will
come about?

Do not promise mountains and deliver molehills
Be quite precise as to the nature and scope of the outcomes and as to who might be
the beneficiaries
Make sure that the outcomes relate directly to the purpose of the research.


Problems from the reviewers' point of view

Problem: They may not get the significance of your proposed research.
Solution: Write a compelling argument.
Problem: They may not be familiar with all your methods.
Solution: Write to the non-expert in the field.
Problem: They may not be familiar with your lab.
Solution: Show them you can do the job.
Problem: They may get worn out by having to read 10 to 15 applications in detail.
Solution: Write clearly and concisely, and make sure your application is neat, well
organized, and visually appealing. Leave out anything that is not absolutely
necessary.

Problem not important enough.

Study not likely to produce useful information.
Methods unsuited to the objective.
Problem more complex than investigator appears to realize.
Too little detail in the research plan to convince reviewers.
Over-ambitious Research Plan
Direction or sense of priority not clearly defined,
Lack of original or new ideas.
Proposal lacking enough preliminary data or preliminary data do not support
project's feasibility.
Insufficient consideration of statistical needs.
Deadline for submission not met.
Guidelines for proposal (content, format, length, etc.) not followed.
Study or project not a priority topic for the funding agency.


Does the title of the RP exactly identify and delineate the area of investigation?
Do the objectives clearly relate to the problem?
Does the proposal clearly identify the problem?
Does the proposal give adequate reasons to show that the study will contribute: to
knowledge; development of theory in the subject; or to either theoretical or
practical methodology?
Does the proposed RP show: how the research will be structured; how the research
will be phased and carried out; the techniques and methods to be used and the reasons
for using them; and that the proposed work is practicable and can commence within the
time frame?



You need to convince the selection committee that:
your research proposal promises a notable advancement or innovation in the
discipline or results of importance to a broad range of applications;
you have identified well-formulated short- and long-term goals;
attaining these goals would be a significant contribution to the discipline;
you have a good chance of attaining the goals with the resources available.
Every proposal reader constantly scans for clear answers to three questions:
What are we going to learn as the result of the proposed project that we do not
know now?
Why is it worth knowing?
How will we know that the conclusions are valid?
The opening paragraph, or the first page at most, is your chance to grab the attention.
This is the moment to overstate, rather than understate, your point or question.
Questions that are clearly posed are an excellent way to begin a proposal
Most proposals are reviewed by multidisciplinary committees.
An abstract is an abbreviated summary of a research proposal, while an executive
summary (ES) is, basically, anything but a product presentation and nothing but a
persuasive sales pitch.
While the abstract aims at convincing the reader to go through the whole document
in order to quench his or her thirst for information, the ES, by contrast, aims at
persuading the reader, who is supposed to be a decision-maker, to take or forgo an
action.
Audience: of the abstract, specialized readers or researchers; of the ES, decision-makers.
Scope: of the abstract, theses, articles and patents for academics and public goods; of
the ES, sponsored and consultancy projects, especially as private goods.
Content: of the abstract, mainly technical (problem, scope, methodology, results,
conclusions); of the ES, mainly managerial (outcome and benefits, problem solutions
and recommendations).
Length: the abstract is shorter than the ES.
Style: of the abstract, technical, static and more academic; of the ES, managerial,
dynamic and enthusiastic.
Be constructive (diplomatic) in reviewing others' work:
BAD: All previous studies are worthless because they failed to recognize the effect of X
on Y. Chen and Smith (1998) tried but their approach was simply wrong. Ours is the first
study to address this question correctly.
BETTER: Previous studies have made important contributions to this challenging
problem; however, none of the published studies appears to have completely accounted for
the effect of X on Y. A pioneering effort in this direction is described by Chen and Smith
(1998).
Do not assume that your reader/reviewer knows the problem.
DO NOT use language like:
"It is well known", "it is obvious" or "it is trivial to show".
BETTER: "It is generally accepted in the literature".



To give participants greater confidence in data analysis using statistical software.
Learning outcomes
On completion of the lecture, participants will be able to answer the following:

What is regression analysis?

What is regression good for?
What are the various types of regression models?
What kinds of variables can be used in regression model?
How can we judge how good the predictions are?
How do we judge how good the coefficient estimates are?
How do we test hypotheses?
How do we construct confidence intervals?
How does regression control for variables?
What are mediating variables?
How are regression results interpreted?
What can go wrong with regression analysis?
What are the limitations of regression analysis?

Faculty from engineering, management, pure sciences, and social sciences who have been
actively engaging in guiding research scholars and carrying out sponsored research
projects. A basic understanding of statistics will be useful, but is certainly not essential.
Regression Analysis
It is a statistical method for studying the relationship between a single dependent variable
and one or more independent variables, with a view to estimating and/or predicting the
(population) mean or average value of the former in terms of known or fixed values of the
latter. When there is only one independent variable, it is called bivariate analysis; when
there is more than one independent variable, it is known as multivariate analysis.




Types of Regression models

Regression models are of two types: deterministic and stochastic (probabilistic).

If we construct a model which hypothesizes an exact relationship between the
dependent and independent variables, it is called a deterministic model. On the other hand,
if we believe that the model should be constructed to allow for random error, then we
hypothesize a probabilistic model. This includes both a deterministic component and a
random error component.

Deterministic model: Y = a + bX

Probabilistic model: Y = a + bX + u


How to derive OLS equations?

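The model and the fitted line are

$$Y = a + bX + u, \qquad \hat{Y} = a + bX$$

The sum of squared residuals to be minimised is

$$\sum e^2 = \sum (Y - \hat{Y})^2 = \sum (Y - a - bX)^2$$

Differentiating with respect to a and b respectively and equating to zero, we get

$$\frac{\partial \sum e^2}{\partial a} = -2 \sum (Y - a - bX) = 0$$

$$\frac{\partial \sum e^2}{\partial b} = -2 \sum X (Y - a - bX) = 0$$

Dividing both sides by -2, we get

$$\sum (Y - a - bX) = 0 \;\Rightarrow\; \sum Y - na - b \sum X = 0$$

$$\sum X (Y - a - bX) = 0 \;\Rightarrow\; \sum XY - a \sum X - b \sum X^2 = 0$$

Rearranging gives the two normal equations:

$$\sum Y = na + b \sum X$$

$$\sum XY = a \sum X + b \sum X^2$$

By solving these two normal equations, the intercept and slope of the linear regression
line are estimated. Values of a and b can directly be estimated by the following formulae:

$$b = \frac{\sum (Y_i - \bar{y})(X_i - \bar{x})}{\sum (X_i - \bar{x})^2}$$

$$a = \bar{y} - b\bar{x}$$

where $\bar{y}$ is the mean of the dependent variable and $\bar{x}$ is the mean of the
independent variable.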
Bivariate regression can be done manually. When the number of independent variables is
more than one, it becomes cumbersome to estimate coefficients manually. For this, special
computer packages are used to run the regression programme.
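The closed-form formulae for a and b can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original lecture; the function name `ols_bivariate` and the five data points are hypothetical and chosen only for illustration.

```python
def ols_bivariate(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared deviations of X.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical illustrative data (not taken from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_bivariate(x, y)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")

# The estimates should satisfy both normal equations.
n = len(x)
lhs1, rhs1 = sum(y), n * a + b * sum(x)
lhs2 = sum(xi * yi for xi, yi in zip(x, y))
rhs2 = a * sum(x) + b * sum(xi ** 2 for xi in x)
print(abs(lhs1 - rhs1) < 1e-9, abs(lhs2 - rhs2) < 1e-9)
```

For these data the sums of deviations are 6 and 10, so b = 0.6 and a = 4 - 0.6 x 3 = 2.2.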
Assumptions of the regression model
All variables must be measured without error (no measurement error).
For each set of values of the k independent variables (X1j, X2j, ..., Xkj),
E(uj) = 0, i.e., the mean of the error term is zero.
For each set of values of the k independent variables, VAR(uj) = σ² (i.e., the variance of
the error term is constant: homoscedasticity). Violation of this assumption creates the
problem of heteroscedasticity.
For any two sets of values of the k independent variables, COV(ui, uj) = 0 (i.e., the
error terms are uncorrelated). Violation creates the autocorrelation problem.
For each Xi, COV(Xi, u) = 0 (i.e., each independent variable is uncorrelated with the
error term).
There is no perfect collinearity among the independent variables. Violation creates the
multicollinearity problem.
For each set of values of the k independent variables, uj is normally distributed.
Violation of these assumptions can give biased or inefficient estimates of the coefficients
or unreliable standard errors.
What is Regression good for?
There are two uses of regression: prediction and causal analysis. In a prediction
study, the aim is to develop a formula for making predictions about the dependent
variable, based on the observed values of exogenous variables. For example, an
economist may want to predict next year's GNP based on such variables as last year's
GNP, current interest rates, the current rate of investment and other variables.
In a causal analysis, the independent variables are regarded as causes of the
dependent variables. The aim is to determine whether a particular exogenous variable
really affects the endogenous variable and to estimate the magnitude of that effect, if
any. However, these two uses are not mutually exclusive.
Why is linear regression so popular?
It does two things: for prediction studies, it makes it possible to combine many
variables to produce optimal predictions of the endogenous variable and for causal
analysis, it separates the effects of exogenous variables on the endogenous variables.
Sophisticated non-linear regression models are very complicated and require high level
of mathematical skill and specialized software.
What will happen if true relationship is not linear?
In a bivariate analysis, the relationship between the two variables can easily be
identified by plotting the data on a graph. If the points lie close to a straight line, a
linear function is used. It is difficult to judge linearity when the number of
exogenous variables is large. If the real relationship is non-linear and a linear
function is used, the analysis may give misleading results. A useful general
principle in science is that when you do not know the true form of a relationship, start
with something simple. A linear equation is perhaps the simplest way to describe a
relationship between two or more variables and still get reasonably accurate
predictions. Furthermore, it is easy to modify the linear equation to represent certain
kinds of non-linearity.
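One common way of modifying the linear equation is to transform a regressor and then fit an ordinary straight line to the transformed variable. The sketch below assumes, purely for illustration, that the true relationship is Y = 3 + 2X²; the data and the helper `fit_line` are hypothetical. It compares a straight line in X with a straight line in Z = X².

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


# Hypothetical data generated from an exact quadratic relationship.
x = [1, 2, 3, 4, 5]
y = [3 + 2 * xi ** 2 for xi in x]

# Straight line in X: the fit leaves systematic residuals.
a_lin, b_lin, rss_lin = fit_line(x, y)

# Straight line in Z = X squared: still linear in the coefficients.
z = [xi ** 2 for xi in x]
a_sq, b_sq, rss_sq = fit_line(z, y)

print(f"linear in X : a={a_lin:.1f}, b={b_lin:.1f}, RSS={rss_lin:.1f}")
print(f"linear in X²: a={a_sq:.1f}, b={b_sq:.1f}, RSS={rss_sq:.1f}")
```

The second model is still estimated by ordinary least squares, because it is linear in the coefficients a and b even though it is non-linear in X.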
What kinds of data are needed for regression analysis?
To do a regression analysis, we first need a set of cases. These cases must be in
sufficient number. Most regression analysts would be reluctant to do a regression with
fewer than five cases per variable. Because we cannot afford to study the entire
population, we take a probability sample with n cases. It is a sample in which the
probability of selecting any possible sample of size n is known or can be calculated.
There are basically three types of probability samples: simple random sample,
stratified samples and cluster samples.
What kinds of variables can be used in Regression model?
1. Interval-scale variables: Quantitative variables like age, income, years of
schooling, production, and consumption are measured on some well-defined
scale. For each of these scales, it is reasonable to claim that an increase of a
specified amount means the same thing no matter where you start. These variables
are the most appropriate for regression analysis.
2. Ordinal-scale variables: Many variables in management and social sciences are
based on the opinion of the respondents. For example, people may be asked
whether they strongly agree, agree, agree somewhat, disagree or strongly disagree
with the statement that local government is more effective in disaster management
than central government. Most people would accept the claim that higher scores
represent stronger agreement with the statement, but it is not at all clear that the
distance between 1 and 2 is the same as the distance between 2 and 3, or between 4 and
5. Variables like this are called ordinal-scale variables. Ordinal variables are not appropriate
for regression analysis because the linear equation, to be meaningful, requires
information on the magnitude of changes. If you use such variables, you are
implicitly assuming that an increase or decrease of one unit on the scale means the
same no matter where you start.
3. Nominal-scale variables: Some variables do not have any order at all, like
gender, marital status, literate-illiterate, rural-urban, poor-rich, etc. If a
variable has two categories, just assign a score of 1 to one of the categories and 0
to the other. Thus binary numbers are generated. Such variables are called dummy
variables or indicator variables. In case of more than two categories, more than one
dummy variable can be generated. Dummy variables are appropriate as exogenous
variables for regression analysis. If the dependent variable is a dummy, then more
advanced regression models such as logistic regression are used.
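The coding of a nominal variable into dummy variables can be sketched as follows. This is an illustrative Python example rather than a procedure from the lecture: the function `make_dummies`, the marital-status values and the choice of "single" as the reference category are all hypothetical. It follows the usual convention of creating one dummy fewer than the number of categories, with the omitted category serving as the baseline.

```python
def make_dummies(values, reference):
    """Return one 0/1 dummy per category, leaving out the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical nominal variable with three categories.
marital = ["single", "married", "widowed", "married", "single"]

# Three categories give two dummies; "single" is the baseline.
dummies = make_dummies(marital, reference="single")
for name, column in dummies.items():
    print(name, column)
```

Each dummy coefficient in a regression is then read as the difference between that category and the reference category, holding the other variables constant.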
How can we judge how good the predictions are?
The most common statistic for doing this is the coefficient of determination (R²). The basic
idea behind R² is to compare two quantities:

The sum of squared errors produced by the least squares equation and
The sum of squared errors for a least squares equation with no independent
variables (just the intercept).

When an equation has no independent variables, the least squares estimate for the intercept
is just the mean of the dependent variable.

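$$R^2 = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{y})^2} = 1 - \frac{RSS}{TSS} = \frac{ESS}{TSS}$$

Here RSS is the residual sum of squares, ESS the explained sum of squares, TSS the total
sum of squares (TSS = ESS + RSS), n the number of observations and k the number of
estimated parameters including the intercept. The adjusted R² is

$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}$$

The F-statistic for the overall significance of the regression is

$$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{ESS\,(n-k)}{RSS\,(k-1)}$$

Dividing numerator and denominator by TSS, we get

$$F = \frac{(n-k)\,(ESS/TSS)}{(k-1)\,(RSS/TSS)} = \frac{(n-k)\,R^2}{(k-1)\,(TSS-ESS)/TSS} = \frac{(n-k)\,R^2}{(k-1)\,(1-R^2)}$$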
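These measures are easy to compute once a line has been fitted. The sketch below is illustrative only: the data are hypothetical, and k = 2 because a bivariate model estimates an intercept and one slope. It computes R², adjusted R² and F, and confirms that the two expressions for F agree.

```python
# Hypothetical illustrative data (not taken from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(y), 2  # k counts the intercept and one slope

# Fit Y = a + bX by least squares.
x_bar, y_bar = sum(x) / n, sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar
fitted = [a + b * xi for xi in x]

# Sums of squares.
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
tss = sum((yi - y_bar) ** 2 for yi in y)                # total
ess = tss - rss                                         # explained

r2 = 1 - rss / tss
adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))
f_from_ss = (ess / (k - 1)) / (rss / (n - k))
f_from_r2 = ((n - k) * r2) / ((k - 1) * (1 - r2))

print(f"R² = {r2:.3f}, adjusted R² = {adj_r2:.3f}")
print(f"F (sums of squares) = {f_from_ss:.3f}, F (from R²) = {f_from_r2:.3f}")
```

With RSS = 2.4 and TSS = 6, R² = 0.6 and both forms of F give 4.5.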
How do we judge how good the coefficient estimates are?
In any regression analysis, we typically want to know something about the
accuracy of the numbers we get when we calculate estimates of regression coefficients.
There are three possible sources of error:

Measurement error: very few variables can be measured with perfect accuracy,
especially in social sciences.
Sampling error: In many cases, our data are only a sample from larger population
and the sample will never be exactly like the population.
Uncontrolled variations: there may be so many other variables that are not under
the control. They can disturb the relationship between the dependent and
independent variables included in the function.

The basic assumption is that the errors occur in a random and unsystematic
fashion. We evaluate the extent and importance of this random variation by calculating
confidence intervals or hypothesis tests.
Confidence intervals give us a range of possible values for the coefficients.
Although we may not be certain that the true value falls in the calculated range, we can be
reasonably confident. Hypothesis tests are used to answer the question of whether or not
the true coefficient is zero.
Confidence interval at 95% level = b ± 2 S.E.
For instance, if b is 600 and SE is 210, then the confidence interval runs from
600 - (2 x 210) = 180 to 600 + (2 x 210) = 1020. We can say that we are 95% confident
that the true coefficient lies somewhere between 180 and 1020.
In published research using regression analysis, we are more likely to see
hypothesis tests than confidence intervals. Usually, the kind of question people most want
answered is "does this particular variable really affect the dependent variable?" If a
variable has no effect, then its true coefficient is zero. To know whether a coefficient is
significantly different from zero, a t-test is conducted.
t-statistic = b / SE
Then we consult a t table (or the computer does this for us) to find the associated p value.
If the p value is small, it is taken as evidence that the coefficient is not zero.
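The worked example with b = 600 and SE = 210 can be reproduced in a few lines. This is a sketch under one stated assumption: because the text does not give the degrees of freedom, the p value below uses the standard normal distribution as a large-sample approximation to the t distribution.

```python
from statistics import NormalDist

b, se = 600.0, 210.0  # coefficient and standard error from the example above

# Approximate 95% confidence interval: b plus or minus 2 standard errors.
lower, upper = b - 2 * se, b + 2 * se

# t-statistic for the null hypothesis that the true coefficient is zero.
t_stat = b / se

# Two-sided p value from the standard normal distribution
# (an approximation; an exact t-based p value needs the degrees of freedom).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"95% CI: ({lower:.0f}, {upper:.0f})")
print(f"t = {t_stat:.2f}, approximate p = {p_approx:.4f}")
```

The interval excludes zero and the t-statistic exceeds 2, so both approaches lead to the same conclusion that the coefficient is significantly different from zero at the 5% level.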
What is Hypothesis Testing?
It is analogous to a decision reached in a court of law. Under the court system, a defendant
is brought to trial and is assumed to be not guilty. For the judge or jury to reject the
assumption of not guilty, sufficient evidence must be produced. In the court system, errors
can be made: an innocent defendant can be found guilty, and a guilty individual can be found
not guilty. Under a legal system where the evidence must show beyond a shadow of doubt that
the assumption of non-guilt is to be rejected, the primary concern is the error of the first
type, i.e., convicting an innocent person. Just as a defendant is assumed not guilty until
proven guilty, in hypothesis testing the null hypothesis is assumed true until there is
sufficient evidence that it is not true.
How does regression control for variables?
Another use of multiple regression is to examine the effects of some independent
variables on the dependent variable while controlling for other independent variables. In
regression analysis, the coefficient for one variable can be interpreted as its effect
while holding the other variables constant.
How do we interpret regression results?
There are different ways of interpretation of results from different types of
variables. As discussed, there are three types of variables: interval scales, ordinal scales,
and nominal scales.
What can go wrong with regression analysis?
Any tool as widely used as regression is bound to be frequently misused.
Nowadays, statistical packages are so user-friendly that anyone can perform a multiple
regression with a few mouse clicks. As a result, many researchers apply it to their data with
little understanding of the underlying assumptions or the possible pitfalls.
1. Inclusion of wrong variables
The following results indicate that wrongly selected data and variables may
provide misleading results. State-wise data for the year 2000-01 were collected by some
researchers to assess the impact of technical education on economic development. The
results are as follows:

PCI = 50.96** + 0.33** TE + 0.08 GE
R² = 0.55
F-Value = 8.42

PCI = 59.55** + 0.29** TEH + 0.05 TEL + 0.005 GE
R² = 0.61
F-Value = 8.42

** Significant at 5% level, figures in parentheses are t-values.

PCI = per capita net state domestic product
TE = overall technical education
TEH = higher level technical education
TEL = lower level technical education
GE = general education
2. Omission of important variables
There are two possible reasons for putting a variable in the regression model: you
want to know the effect of the variable on the dependent variable and you want to
control for the variable. Obviously, researchers will include variables that are the
main focus of their study, but they may not be so careful about including important
control variables. What makes a control variable important? To answer this
question, you need to answer two other questions: Does a variable have a causal
effect on the dependent variable? Is the variable correlated with those variables
whose effects are the focus of the study? If the answer to both is yes, the control
variable is to be considered important.
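The effect of leaving out an important control variable can be seen in a small constructed example. The sketch below is illustrative and not from the lecture: the data are hypothetical and generated exactly as y = 1 + 2·x1 + 3·x2, where x2 both affects y and is correlated with x1, and the helpers `solve` and `ols` are a bare-bones least-squares routine written for this example.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    Xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    return solve(XtX, Xty)


# Hypothetical data: x2 affects y and is correlated with x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

full = ols([x1, x2], y)   # controls for x2
short = ols([x1], y)      # omits x2

print("with control   :", [round(c, 3) for c in full])
print("without control:", [round(c, 3) for c in short])
```

With the control included, the coefficient on x1 is recovered as 2; without it, the coefficient on x1 also absorbs part of the effect of x2 and is more than twice as large.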
3. Reverse causation
If the dependent variable affects one or more independent variables, the resulting
biases can be as serious as those produced by the omission of important variables.
This problem, known as reverse causation, can actually be worse than the
omitted variables problem because every coefficient in the model may be biased
and it is hard to design a study that will adequately solve this problem.
4. Sample Size
Sample size has a profound effect on tests of statistical significance. The general
principle is this: In a small sample, statistically significant coefficients should be
taken seriously, but a non-significant coefficient is extremely weak evidence for
the absence of an effect. Small samples have low power to test hypotheses.
Therefore, the larger the sample, the more reliable the results will be.
5. Effect of mediating variables
Even if the sample is not small, there is another reason for being cautious in
concluding that a variable has no effect: it is possible that other variables mediate
the effect of that variable. If those other variables are also included in the model,
the effect of the variable you are interested in may disappear. Let us consider the
following model:
AIRjee = a + b1 SES + b2 INTERMARKS + b3 COACHING + u
If SES (socio-economic status) has a big effect on INTERMARKS and INTERMARKS, in
turn, has a big effect on AIRjee, then INTERMARKS is an intervening variable between
SES and AIRjee. If we include both variables, b1 measures only the direct effect of SES
on the dependent variable, not its overall effect, and it may be small or non-significant.
We may then mistakenly conclude that SES has no impact on AIRjee.
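The difference between the direct and the overall effect can be made explicit with a short derivation. As an assumption made only for illustration (the auxiliary equation and the symbols c, d and v do not appear in the model above), suppose INTERMARKS depends linearly on SES. Substituting this into the model gives:

```latex
\begin{align*}
\text{INTERMARKS} &= c + d\,\text{SES} + v \\
\text{AIR}_{jee} &= a + b_1\,\text{SES} + b_2\,(c + d\,\text{SES} + v) + b_3\,\text{COACHING} + u \\
                 &= (a + b_2 c) + (b_1 + b_2 d)\,\text{SES} + b_3\,\text{COACHING} + (u + b_2 v)
\end{align*}
```

Under this assumption the overall effect of SES is b1 + b2·d: the direct effect b1 plus the indirect effect b2·d that runs through INTERMARKS. Even when b1 is zero, the overall effect can be large if both b2 and d are large.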
6. Multicollinearity Problem
It is something that nearly all users of multiple regression have heard about.
However, their knowledge is often limited to two facts:
It is bad.
It has something to do with high correlation among the independent variables.
It comes in two forms: extreme and near-extreme. Extreme multicollinearity means
that at least two of the independent variables are perfectly correlated. The
computer easily detects this type of problem. Near-extreme multicollinearity
means simply that there are strong linear relationships among the exogenous
variables. In the presence of this problem, regression coefficients tend to have
larger SEs than they would have in its absence.
This problem can be identified by: (1) estimating the correlations among the variables:
if the value of r is 0.8 or above, there is a severe problem; (2) fitting a regression of
one independent variable on the others: if the R² is 0.60 or above, there is a problem; (3)
the tolerance value, i.e., 1 - R²: if it is below 0.4, there is a problem; and finally (4)
the variance inflation factor (VIF), i.e., 1/tolerance. The square root of the VIF tells us
how much larger the SE is, compared with what it would be if that variable were uncorrelated
with the other variables.
Time series data, panel data and aggregate data are more prone to the problem.
There are various solutions: deletion of one or more variables from the model,
combining the collinear variables into an index, and performing joint hypothesis tests.
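The tolerance and VIF diagnostics can be illustrated for the simplest case of two independent variables, where the R² from regressing one on the other equals their squared correlation. The sketch below uses hypothetical data and a hypothetical helper `r_squared_between`; with more than two regressors, each variable would instead be regressed on all the others.

```python
def r_squared_between(x1, x2):
    """Squared correlation between two variables (R² of regressing one on the other)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    return s12 ** 2 / (s11 * s22)


# Hypothetical, strongly correlated independent variables.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

r2 = r_squared_between(x1, x2)
tolerance = 1 - r2          # low tolerance signals a problem
vif = 1 / tolerance         # variance inflation factor
se_inflation = vif ** 0.5   # factor by which the SE is enlarged

print(f"R² = {r2:.3f}, tolerance = {tolerance:.3f}")
print(f"VIF = {vif:.2f}, SE inflated by a factor of {se_inflation:.2f}")
```

Here R² exceeds 0.60 and the tolerance falls below 0.4, so by the rules of thumb above these two variables would be flagged as collinear.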
7. Heteroscedasticity problem
The word homoscedasticity is derived from a Greek phrase meaning "same
variance". Its opposite is heteroscedasticity, which means that the degree of random
noise in the equation varies with the values of the x-variables. It can be checked
by plotting the data on a graph. This problem has two effects:
Inefficiency: in the presence of this problem, OLS is not optimal, as it gives
equal weight to all observations.
Biased SE: in its presence, SE estimates can be seriously biased. That in turn
leads to bias in test statistics and confidence intervals.
Solutions: weighted least squares (WLS) and transforming the data.
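Weighted least squares replaces the equal weights of OLS with weights that are larger for observations with smaller error variance. The following is a minimal sketch for the bivariate case. It rests on a labelled assumption: the error variance is taken to be proportional to X², so the weights are 1/X². In practice the appropriate weights depend on how the variance actually behaves. The function name and the data are hypothetical.

```python
def wls_bivariate(x, y, w):
    """Weighted least squares for Y = a + bX with observation weights w."""
    sw = sum(w)
    # Weighted means of X and Y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Equal weights reproduce ordinary least squares.
a_ols, b_ols = wls_bivariate(x, y, [1.0] * len(x))

# Assumed variance proportional to X squared, so weights are 1 / X squared.
weights = [1.0 / xi ** 2 for xi in x]
a_wls, b_wls = wls_bivariate(x, y, weights)

print(f"OLS: a = {a_ols:.3f}, b = {b_ols:.3f}")
print(f"WLS: a = {a_wls:.3f}, b = {b_wls:.3f}")
```

Observations with large X, which are assumed here to be noisier, pull the WLS line less strongly than they pull the OLS line.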

8. Auto-correlation problem
Auto-correlation or serial correlation refers to the case in which the residual error
terms from different observations are correlated. It can be caused by several
factors, including omission of an important explanatory variable or the use of an
incorrect functional form. Whatever the cause may be, it influences the outcome of
the hypothesis testing. Its effect is to underestimate the SEs of the coefficients. This in
turn yields inflated t-ratios, which means that coefficients may be found to be
significantly different from zero when in fact they are not.
This problem can be diagnosed by the Durbin-Watson test (DW test). In the
presence of autocorrelation, OLS is not efficient; GLS is preferred.
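The Durbin-Watson statistic is computed from the residuals as the sum of squared successive differences divided by the sum of squared residuals. It lies between 0 and 4; values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The sketch below uses two hypothetical residual series to show both extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals that stay on one side for long runs (positive autocorrelation).
e_positive = [1, 1, 1, -1, -1, -1]

# Hypothetical residuals that flip sign every period (negative autocorrelation).
e_negative = [1, -1, 1, -1, 1, -1]

dw_pos = durbin_watson(e_positive)
dw_neg = durbin_watson(e_negative)
print(f"DW for runs of same sign : {dw_pos:.3f}")
print(f"DW for alternating signs : {dw_neg:.3f}")
```

A formal test compares the statistic with tabulated lower and upper critical values, which depend on the sample size and the number of regressors.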
What are the limitations of regression?

Based on measurement of central tendency.

Cannot handle more than one dependent variable simultaneously.



How do we run a regression?

How do we choose a computer package?

How do we get our data into the computer package?

What else should we do before running the regression?

How do we indicate which regression model to run?

How do we interpret computer output?

What are the common options in regression packages?

What are standardized coefficients and how are they interpreted?


10. Statistical Software: SPSS

In this lecture, we shall attempt to make you aware of the main features of SPSS and to
enrich your skill in applying the software, with a view to analysing data and
drawing meaningful results. As SPSS is very comprehensive and flexible software,
covering almost all aspects of data processing, cleaning, tabulating, analyzing and
reporting, it would not be feasible to discuss all these aspects in one session. Therefore,
the discussion will be limited only to some of the most relevant features of the software.
SPSS (Statistical Package for the Social Sciences) can take data from almost any
type of file and use them to generate tabulated reports, charts, and plots of distributions
and trends, descriptive statistics, and complex statistical analyses. Our institute
has a site license for this software that allows researchers to use the software in any general
access labs on campus. A brief note about the software is given in this write up. Detailed
discussions along with some practical examples will be made during the interactive
session on SPSS.
1. The Data Window
When you start SPSS for Windows, the first thing you will see is the data window. The
data window has a spreadsheet akin to Excel spreadsheet. You can directly enter data into
it. Cases (observations) are recorded in rows and variables in columns. You can cut, paste,
and delete rows and columns as per the requirement.
2. The Output Window
The output window displays the output from statistical analyses and any charts you have
run. A table can be edited by double-clicking on the section of the table that you would
like to edit. Furthermore, tables can be opened as pivot tables and edited from the pivot
table window so that one may adjust the look of the table.
3. The Syntax Window
There are two approaches to working in SPSS, using a point-and-click approach, and using
SPSS syntax to program commands and routines. The syntax window is used when data is
to be extracted from large databases. CSO and NSSO unit-level data are extracted through
syntax window. It is a very useful record-keeping tool. When using the point-and-click
approach, all commands and procedures are stored "in the background." This information
can then be pasted to the syntax window, using the "Paste" option found in the GUI
interface (i.e., the point-and-click approach). Having this information saved in the syntax
window can save a researcher abundant grief if printouts of output are lost. In addition,
you can place comments in the syntax window to indicate what it is that you are doing.
The Syntax Editor window can be opened by doing the following: under File, select New, and
then Syntax. Keep in mind the following rules when writing command syntax:

Each command must begin on a new line and end with a period (.).
Most subcommands are separated by slashes (/). The slash before the first
subcommand on a command is usually optional.
Variable names must be spelled out fully.


Text included within apostrophes or quotation marks must be contained on a single line.
Each line of command syntax cannot exceed 80 characters.
A period (.) must be used to indicate decimals, regardless of your Windows
regional settings.
Variable names ending in a period can cause errors in commands created by the
dialog boxes. You cannot create such variable names in the dialog boxes, and you
should generally avoid them.

Command syntax is case insensitive, and three-letter abbreviations can be
used for many command specifications. You can use as many lines as you want to
specify a single command. You can add space or break lines at almost any point
where a single blank is allowed, such as around slashes, parentheses, arithmetic
operators, or between variable names.
4. Editing Options in SPSS
There are several default options in SPSS that you may find useful to change. You can
edit these options by going to the Edit menu and selecting Options. You will get a dialog
box.
5. Getting Data into SPSS
There are three main ways to get data into SPSS: (a) creating a new SPSS data file, (b)
opening existing SPSS data files, and (c) importing data from another source such as an
ASCII file, an Excel spreadsheet, etc.
i. Creating new SPSS data files
Data can be directly entered into SPSS similar to an Excel spreadsheet. You may also cut
and paste data from other applications into SPSS. However, if you are going to enter data
directly, you will need to name and define your variables.
ii. Opening existing SPSS system files
Opening existing SPSS files is a simple procedure, similar to opening other Windows files.
Select "Open" from the File menu, and you will find a dialog box.
iii. Importing data from an ASCII file
In order to use the data in SPSS, the data must be converted to a file format that SPSS
can recognize, namely something in *.sav format. SPSS can read in ASCII data, which
can then be saved in *.sav format.
iv. Importing data from other file formats
SPSS allows the user to open data directly into SPSS from many different file formats.
For example, SPSS will directly open Excel, SAS, Lotus, and *.dbf (database) files. All
the user needs to do is to go to the File Menu, select "Open", "Data", select the correct file
type from the "Files of Type" drop down menu, and navigate to the file you wish to open.


6. Saving Data in SPSS

Saving data in SPSS is very similar to other Windows applications. Select "Save as" from
the File Menu, move to the directory in which you want to save the file, and give the file
any name you desire. SPSS allows you to give your files descriptive names, without
having an eight character restriction. The default file type in which the data file will be
saved is *.sav. If you wish to save it as another file type (e.g., Excel), simply change the
file type in the "Save as Type" drop down menu.
7. Variable and Value Labels
A good data set will include variable and value labels that provide a fuller description of
both the variable and the meaning of each value within a variable (for nominal and ordinal
data; value labels are not needed for continuous data). Unlike the variable names that are
limited to 8 characters, the label may be up to 120 characters long. They give a fuller
description of a variable. Value labels can be assigned as well, for example 1 for males
and 2 for females for a variable on gender.
Missing Values
SPSS has two types of missing values that are automatically excluded from statistics
computed by procedures: system-missing values and user-missing values. Any variable for
which a valid value cannot be read from raw data or computed is assigned the
system-missing value. User-missing values are values that you tell SPSS to treat as missing
for particular variables. These are values (other than blanks) that you coded into your
data to indicate non-acceptable responses.
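The way the two kinds of missing value are excluded from a statistic can be mimicked outside SPSS. The sketch below is a Python illustration, not SPSS syntax: `None` stands in for a system-missing value, and the codes 98 and 99 are hypothetical user-missing codes chosen for the example.

```python
from statistics import mean

# Hypothetical codes that the researcher has declared as user-missing.
USER_MISSING = {98, 99}

# Hypothetical raw ages; None plays the role of a system-missing value.
raw_ages = [34, 41, None, 99, 29, 98, 52]

# Keep only valid values before computing any statistic.
valid_ages = [v for v in raw_ages if v is not None and v not in USER_MISSING]

print("valid cases  :", len(valid_ages))
print("missing cases:", len(raw_ages) - len(valid_ages))
print("mean age     :", mean(valid_ages))
```

If the user-missing codes were not declared, the values 98 and 99 would be treated as real ages and would distort the mean.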

Common menus of SPSS

1. File menu: used to create a new file, open an existing file, read in spreadsheet or
database files created by other software programs.
2. Edit Menu: used to cut, copy, and paste data values from the Data Editor; modify
or copy text from the Viewer or Syntax Editor; copy charts for pasting into other
publications from the Chart Editor, etc.
3. View Menu: used to turn toolbars and the status bar on and off, and turn grid lines
on and off from all window types; and control the display of value labels and data
values in the Data Editor.
4. Analyze: this menu is selected for various statistical procedures such as crosstabulation, analysis of variance, correlation, linear regression, and factor analysis.
5. Graphs: graphs menu is used to create bar charts, pie charts, histograms, scatterplots, and other full-color, high-resolution graphs. Some statistical procedures also
generate graphs. All graphs can be customized with the Chart Editor.
6. Utilities: used to display information about variables in the working data file and
control the list of variables from all window types; change the designated Viewer
and Syntax Editor, etc.
7. Window: use the Window menu to switch between SPSS windows or to minimize
all open SPSS windows.
8. Help: this menu opens a standard Microsoft Help window containing information
on how to use the many features of SPSS. Context-sensitive help is available
through the dialog boxes.


Statistical Analysis through SPSS

1. Descriptive Data Analysis
For some types of variables (especially continuous variables), we will want to
obtain summary statistics other than the number of cases in each category of
the variable. For example, we might be interested in the mean, median, or
standard deviation of a particular variable.

Select Descriptive Statistics from Analyze menu

Choose Frequencies/descriptive statistics
A dialog box appears. Names of all the variables in the data set appear on the left
side of the dialog box.
Select the variable from the list.
Click the arrow button to the right of the selected variable.

Now the selected variable appears in a box on the right and disappears from the left box.
Note that when a variable is highlighted in the left box, the arrow button is pointed right
for you to complete the selection. When a variable is highlighted in the right box, the
arrow button is pointed left to enable you to deselect a variable (by clicking the button) if
necessary. If you need additional statistics besides the frequency count, click the
Statistics... button at the bottom of the screen. When the Statistics... dialog box appears,
make appropriate selections and click Continue. In this instance, we are interested only in
frequency counts. The output appears on the Viewer screen.
In the Descriptives procedure, the mean, standard deviation, minimum, and maximum are
displayed by default. The variables are displayed, by default, in the order in which you
selected them. Click Options... for other statistics and display order. The output will be
displayed on the Viewer screen.
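The summary statistics named above can also be checked outside SPSS. The following is a minimal sketch in Python using only the standard library; the list `scores` is hypothetical data invented for illustration, not data from this course.

```python
import statistics

# Hypothetical exam scores (illustrative only, not from the course data).
scores = [52, 61, 61, 70, 74, 78, 85, 91]

# The same summary measures that the Descriptives procedure reports.
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "std_dev": statistics.stdev(scores),  # sample SD (n - 1 in the denominator)
    "minimum": min(scores),
    "maximum": max(scores),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Note that `statistics.stdev` uses the sample (n - 1) formula; `statistics.pstdev` would give the population version.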
The MEANS procedure displays means, standard deviations, and group counts for
dependent variables based on grouping variables. To run the MEANS procedure:

Select Analyze/Compare Means/Means...

Select the dependent variables and independent variable
Click Options...

Select Mean, Number of cases, and Standard Deviation. Normally these options
are selected by default. If any other options are selected, deselect them by clicking them.
Click Continue
Click OK
The output will be displayed on the Viewer screen.
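What the MEANS procedure computes can be sketched in a few lines of Python. This is only an illustration under assumed data: the records and the helper name `group_summary` are hypothetical, with the first field playing the role of the grouping (independent) variable and the second the dependent variable.

```python
import statistics
from collections import defaultdict

# Hypothetical (group, score) records, invented for illustration.
records = [("male", 60), ("male", 70), ("male", 80),
           ("female", 65), ("female", 75), ("female", 85), ("female", 95)]

def group_summary(rows):
    """Return {group: (mean, number of cases, sample SD)} for (group, value) rows."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {g: (statistics.mean(v), len(v), statistics.stdev(v))
            for g, v in groups.items()}

means_table = group_summary(records)
for group, (mean, n, sd) in means_table.items():
    print(f"{group:>6}: mean={mean:.2f}  n={n}  sd={sd:.2f}")
```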

Editing Pivot Tables

SPSS displays the output in a pivot table, with cells separated by vertical lines. Sometimes
the default width of the table columns is not enough to fit the values in the cells. To edit
a pivot table, double-click it; this activates the Pivot Table Editor. Alternatively,
right-click the pivot table and, from the context menu, choose SPSS Pivot Table Object/Open;
the pivot table will then be ready to edit in its own separate Pivot Table Editor window.

Printing the Output

Once you are satisfied with your analysis you may want to obtain a hard copy of the
output. You may print the entire output on the viewer window, or delete the sections you
do not want before you print. Or you can save the output to a diskette or hard drive and
print it later. The SPSS data file contains the actual data, variable and value labels, and
missing values that appear in the SPSS Data Editor window.
Correlation analysis
A correlation analysis is performed to quantify the strength of association between two
numeric variables. Select Analyze/Correlate/Bivariate... This opens the Bivariate
Correlations dialog box. The numeric variables in your data file appear on the source list
on the left side of the screen.
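For reference, the Pearson coefficient that the Bivariate Correlations procedure reports by default can be computed by hand. The sketch below assumes two hypothetical numeric variables, `hours` and `marks`; the function name `pearson_r` is our own.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(f"r = {r:.4f}")
```

A value of r close to +1 or -1 indicates a strong linear association; a value near zero indicates a weak one.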
Linear Regression

Choose Analyze/Regression/Linear... The Linear Regression dialog box appears.

Choose the dependent variable
Choose the independent variables
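As an illustration of what the procedure estimates, the sketch below fits an ordinary least squares line with a single independent variable. SPSS handles several independent variables; the one-predictor case is assumed here only to keep the arithmetic visible. The data and the name `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} x")
```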

T-test is a data analysis procedure to test the hypothesis that two population means are
equal. SPSS can compute independent (not related) and dependent (related) t-tests. For
independent t-tests, you must have a grouping variable with exactly two values (e.g., male
and female, pass and fail). The variable may either be numeric or character. Suppose you
have a grouping variable with more than two categories. You may use the RECODE
(Transform/Recode) command to collapse the categories into two groups. RECODE is a
powerful SPSS command for data transformation with both numeric and string variables.
Select Analyze/Compare Means/Independent-Samples T-test...

Select Variables
Select Grouping Variable.
Click on Define Groups...
Type 1 for Group 1, and 2 for Group 2.

A t-test with two related variables is performed using the Paired-Samples T-Test from the
Analyze/Compare Means menu.
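The statistic behind the independent-samples t-test can be sketched as follows. This is a minimal illustration assuming equal population variances (the pooled-variance form); the two groups are hypothetical data and `independent_t` is our own helper name.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (equal-variance assumption).
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

# Hypothetical scores for Group 1 and Group 2.
t_value, df = independent_t([60, 70, 80], [65, 75, 85, 95])
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The t value is then compared with the critical value of the t distribution for the given degrees of freedom, which SPSS reports as a significance level.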
One-way Analysis of Variance
The statistical technique used to test the null hypothesis that several population means are
equal is called analysis of variance. It is called that because it examines the variability in
the sample, and based on the variability, it determines whether there is a reason to believe
the population means are not equal. The statistical test for the null hypothesis that all of
the groups have the same mean in the population is based on computing the ratio of the
between-group to the within-group variability estimates, called the F statistic. A significant F value only
tells you that the population means are probably not all equal. It does not tell you which
pairs of groups appear to have different means. To pinpoint exactly where the differences
are, multiple comparisons may be performed.
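The F statistic described above can be computed directly from the between-group and within-group sums of squares. The following sketch uses three hypothetical groups; the function name `one_way_f` is an assumption of this illustration.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variability: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variability: observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical data for three groups.
f_value, df_between, df_within = one_way_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```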



Performance of any decision-making unit (DMU) largely depends on how
efficiently inputs are used in the production, marketing and distribution processes. As
resources at its disposal are limited and have competitive use, they are to be optimally
applied to enhance productivity, efficiency and profitability. In order to survive in today's
competitive environment, it has to improve its performance not only relative to its past
performances but also relative to its competitors in the industry. In this context, it becomes
vital to study inter-firm comparison to identify best practices of efficient firms in resource
utilization and apply them to improve the efficiency of relatively less efficient firms.
In order to identify the extent to which a firm produces output efficiently and
cost-effectively, its economic efficiency is estimated. Economic efficiency is the product of
two efficiencies: technical efficiency and allocative efficiency. Technical efficiency refers
to the firm's ability to produce the maximum possible output from a given combination of
inputs and technology, regardless of market demand and prices. Allocative efficiency
refers to the firm's ability to use the inputs in optimal proportion, given their respective
prices. Classical production theory assumes that, given the level of technology, a
production function shows the maximum quantity of output that a firm can produce with the
given set of inputs. This means that the firm produces output with 100 per cent technical
efficiency. However, in reality, a firm's realised output may be below the potential output.
Hence, measurement of an individual firm's technical efficiency becomes essential to know
the extent of deviation of the firm's actual output from its potential output. The two most
popular approaches to estimating technical efficiency are Data Envelopment Analysis (DEA)
and Stochastic Production Frontier (SPF) Analysis. In this lecture, DEA is discussed in
detail and the efficiency estimating procedure is taught through
DEA software.
Genesis of DEA
Farrell (1957) laid the foundation for new approaches to efficiency and
productivity analysis at the micro level, involving new insights on two issues: how to
define efficiency and productivity, and how to calculate the benchmark technology and the
efficiency measures. He showed how to define economic efficiency and how to
decompose it into its technical and allocative components. He defines technical efficiency
as the ratio of observed output to the maximum potential output that can be attained from
given inputs. If a firm's actual output is below the potential output, the shortfall is
regarded as an indicator of inefficiency. Allocative efficiency (AE) of a firm is defined as
the ratio of minimum cost to the actual cost. It refers to the firm's ability to use the inputs
in optimal proportion, given the prices of inputs.
Farrell's paper gave birth to two approaches to efficiency measurement: the
deterministic frontier approach and the stochastic frontier approach (SFA). Deterministic
frontiers are parametric as well as non-parametric. Aigner and Chu (1968), Afriat (1972),
Richmond (1974), and Schmidt (1976) developed parametric deterministic models, while
Charnes, Cooper and Rhodes (1978) evolved a non-parametric deterministic approach,
popularly known as Data Envelopment Analysis (DEA), which was extended by Banker,
Charnes, and Cooper (1984). SFA was developed independently by Aigner, Lovell and
Schmidt (1977) and Meeusen and van den Broeck (1977) and later extended by Jondrow,
Lovell, Materov, and Schmidt (1982) and Battese and Coelli (1992; 1995). Both DEA
and SFA are applied by researchers to measure the technical efficiency of
decision-making units (DMUs) using cross-sectional as well as panel data. Earlier, economists
usually preferred to use econometric methods to measure efficiency. In the 1990s, many of
them also started using DEA because of its ability to handle multiple inputs and
outputs and its suitability for studying the performance of both manufacturing and service
sector DMUs.
The deterministic frontier approach does not incorporate measurement errors and
other noise. In it, all deviations from the frontier are assumed to be the result of technical
inefficiency, whereas the stochastic frontier production function (SFPF) accommodates
exogenous shocks. This involves specifying the error term as being made up of
two components: a symmetric component, which permits random variation of the frontier
across firms and captures the effects of measurement error, other statistical noise, and
random shocks outside the DMU's control; and a one-sided component, which captures the effects
of inefficiency relative to the stochastic frontier.
Aigner, Lovell and Schmidt (1977), Meeusen and van den Broeck (1977), and
Battese and Corra (1977) propose the SFPF. Consider the following Cobb-Douglas
production function:
\[ y_i = f(x_i, \beta) + \varepsilon_i, \qquad i = 1, 2, \ldots, N \tag{1} \]
where $y_i$ is the logarithm of the (scalar) output (Y) for the ith firm; $x_i$ is a (K+1)-row
vector whose first element is 1 and whose remaining elements are the logarithms of the K
input quantities used by the ith firm; $\beta = (\beta_0, \beta_1, \ldots, \beta_K)'$ is a (K+1)-column vector
of unknown parameters to be estimated; and $\varepsilon_i$ is a random error composed as
$\varepsilon_i = v_i - u_i$. Thus, equation (1) can be written as:
\[ y_i = f(x_i, \beta) + v_i - u_i, \qquad i = 1, 2, \ldots, N \tag{2} \]
Here $v_i \sim N(0, \sigma_v^2)$ is a two-sided error term representing the usual statistical noise found in any
relationship, and $u_i > 0$ is a one-sided error term representing technical inefficiency in the
sense that it measures the shortfall of output ($y_i$) from its maximal possible value given by
the stochastic frontier $[f(x_i, \beta) + v_i]$. The model (2) is known as SFPF because the output
values are bounded above by the stochastic (random) variable $\exp(x_i\beta + v_i)$. The random
error $v_i$ can be positive or negative (Coelli, et al., 1998).
Direct estimates of the stochastic frontier model can be obtained either by
maximum likelihood or by corrected ordinary least squares (COLS) methods. Introducing
specific probability distributions for $v_i$ and $u_i$, and assuming that $u_i$ and $v_i$ are independent and
that $x_i$ is exogenous, the asymptotic properties of the maximum likelihood estimators can
be obtained. The model can also be estimated by COLS by adjusting the constant term by
$E(u_i)$, which is derived from the moments of the OLS residuals. Once a model of this form
is estimated, one can readily obtain residuals $\hat{\varepsilon}_i = y_i - f(x_i, \hat{\beta})$, which can be regarded as
estimates of the error terms $\varepsilon_i$.
Meeusen and van den Broeck (1977) assign an exponential distribution to u, Battese and Corra
(1977) assign a half-normal distribution to u, and Aigner et al. (1977) consider both
distributions for u. The parameters to be estimated are $\beta$, $\sigma_v^2$ and the variance parameter $\sigma_u^2$
associated with u. Either distributional assumption on u implies that the composed error (v
- u) is negatively skewed, and statistical efficiency requires that the model be estimated by
maximum likelihood. After estimating the production frontier, an estimate of mean technical
inefficiency in the sample is provided by $E(-u) = E(v - u) = -(2/\pi)^{1/2}\sigma_u$ in the
normal-half-normal case and by $E(-u) = E(v - u) = -\sigma_u$ in the normal-exponential case.
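The two expressions for mean technical inefficiency can be evaluated numerically once an estimate of the scale parameter of u is available. The sketch below is only an illustration: the function name `mean_inefficiency` and the value 0.25 are hypothetical, and the function returns E(u), the magnitude of the (negative) quantity E(-u) given in the text.

```python
import math

def mean_inefficiency(sigma_u, distribution="half-normal"):
    """Return E(u), the mean technical inefficiency implied by sigma_u.

    half-normal : E(u) = sqrt(2 / pi) * sigma_u
    exponential : E(u) = sigma_u (sigma_u taken as the mean of u)
    """
    if sigma_u <= 0:
        raise ValueError("sigma_u must be positive")
    if distribution == "half-normal":
        return math.sqrt(2 / math.pi) * sigma_u
    if distribution == "exponential":
        return sigma_u
    raise ValueError("distribution must be 'half-normal' or 'exponential'")

# Hypothetical estimate of sigma_u, chosen only for illustration.
sigma_u_hat = 0.25
print(f"half-normal: E(u) = {mean_inefficiency(sigma_u_hat):.4f}")
print(f"exponential: E(u) = {mean_inefficiency(sigma_u_hat, 'exponential'):.4f}")
```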
The SFA approach gives a less biased measure of efficiency. However, it could initially
provide only average technical efficiency measures for the sample observations. Although these
aggregate measures are useful in a way, individual observation-specific technical
efficiency measures are more useful from a policy viewpoint. Jondrow, Lovell, Materov
and Schmidt (1982) and Kalirajan and Flinn (1983) independently considered the Aigner
et al. (1977) and Meeusen and van den Broeck (1977) stochastic models to predict the
random variable $u_i$ under the assumption that $\varepsilon_i$ is known. SFA does not have an a priori
justification for the selection of any particular distributional form of the random error term,
and the resulting efficiency measures may be sensitive to the distributional assumption.
Another problem with SFA is that it cannot handle multiple output variables at a time
(Thanassoulis, 2001).
DEA is a linear programming (LP) based multi-factor productivity analysis model
for measuring the relative efficiency of a homogeneous set of DMUs. It optimises on each
individual observation with the objective of calculating a discrete piecewise frontier
determined by the set of Pareto-efficient DMUs. It does not require any specific
assumptions about the functional form. It calculates a maximal performance measure for
each DMU relative to all other DMUs in the observed population with the sole
requirement that each DMU lie on or below the extremal frontier. Each DMU not on the
frontier is scaled down against a convex combination of the DMUs on the frontier facet
closest to it (Charnes, et al. 1994).
There is an increasing concern with measuring and comparing the efficiency of
organisational units such as local authority departments, schools, hospitals, shops, bank
branches and similar instances where there is a relatively homogeneous set of units.
The usual measure of efficiency, i.e.:
\[ \text{efficiency} = \frac{\text{output}}{\text{input}} \]
is often inadequate due to the existence of multiple inputs and outputs related to different
resources, activities and environmental factors. DEA methodology was developed to solve
this problem. This technique is quite useful for measuring the efficiency of service sector
DMUs, especially government organizations providing public goods.
We have two basic DEA models: the CCR model, developed by Charnes, Cooper and
Rhodes in 1978, and the BCC model, developed by Banker, Charnes, and Cooper in 1984.
The CCR model generalises the single output/input ratio measure of efficiency for a single
DMU in terms of a fractional linear programming (FLP) formulation, transforming the
multiple output/input characteristics of each DMU into a single virtual output and
virtual input. The model defines the relative efficiency of any DMU as a weighted sum
of outputs divided by a weighted sum of inputs, where all efficiency scores are restricted to
lie between zero and one. An efficiency score less than one means that a linear
combination of other units from the sample could produce the same vector of outputs
using a smaller vector of inputs. The score reflects the radial distance from the estimated
production frontier to the DMU under consideration. The variables in the model are the
input-output weights, and the LP solution produces the weights most favourable to the unit under
reference. In order to calculate efficiency scores, the FLP is converted into an LP by normalising
either the numerator or the denominator of the fractional programming objective function.
In the output-maximization DEA program, the weighted sum of inputs is constrained
to be unity and the weighted sum of outputs is maximized, while in the input-minimization DEA
program, the weighted sum of outputs is constrained to be unity and the weighted sum
of inputs is minimized. The CCR model is based on the constant returns to scale assumption. Under this
assumption, if the input levels of a feasible input-output correspondence are scaled up or
down, then another feasible input-output correspondence is obtained in which the output
levels are scaled by the same factor as the input levels (Thanassoulis, 2001).
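In the special case of one input and one output, the CCR score under constant returns to scale reduces to each DMU's output/input ratio divided by the best ratio observed in the sample, so no LP solver is needed. The sketch below illustrates only this special case; the data and the name `crs_efficiency` are hypothetical, and the general multi-input, multi-output model requires solving the linear programme for each DMU.

```python
def crs_efficiency(inputs, outputs):
    """CCR (constant returns to scale) efficiency for single-input, single-output DMUs.

    Each DMU's output/input ratio is divided by the largest ratio in the sample,
    so the best-practice DMU scores one and all others score below one.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    if any(x <= 0 for x in inputs) or any(y <= 0 for y in outputs):
        raise ValueError("inputs and outputs must be strictly positive")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical data for four DMUs (illustrative only).
inputs = [2, 4, 5, 8]
outputs = [2, 6, 4, 6]
scores = crs_efficiency(inputs, outputs)
for name, score in zip("ABCD", scores):
    print(f"DMU {name}: efficiency = {score:.3f}")
```

Here DMU B has the highest output per unit of input, so it defines the frontier and every other DMU is measured against it.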
Another version of DEA was given by Banker, Charnes and Cooper (1984). The
primary difference between the BCC and CCR models is the convexity constraint, which
represents the returns to scale. The CCR model is based on the assumption that constant
returns to scale exist at the efficient frontier, whereas the BCC model assumes a variable
returns to scale frontier. CCR efficiency is overall technical efficiency (OTE), known as global
technical efficiency, whereas BCC efficiency is the pure technical efficiency (PTE) net of
scale effect, known as local technical efficiency. If a DMU has both a CCR-efficiency and a
BCC-efficiency score of one, it is operating at the most productive scale size
(MPSS). If a DMU has a BCC-efficiency score of one and a CCR-efficiency score of less than one,
it is operating efficiently locally but not globally, due to the scale size of the
DMU. Thus, inefficiency in any DMU may be caused by the inefficient operation of the
DMU itself (BCC-inefficiency) or by the disadvantageous conditions under which the
DMU is operating (scale-inefficiency). Scale efficiency is estimated by dividing the
CCR-efficiency by the BCC-efficiency for a DMU. Another technique based on DEA is the
Malmquist Productivity Index (MPI) proposed by Caves, et al. in 1982. The MPI is
defined with distance functions. For panel data, distance functions make it possible to describe
multiple input-output production technologies without behavioural objectives such as
profit maximisation or cost minimisation. A detailed description of the MPI model is
presented in Chapter 7.
Since the publication of the seminal paper of Charnes, et al. (1978), numerous
research papers have been written on both theoretical and applied aspects of the DEA
approach. On the theoretical side, a number of DEA models and their extensions have been
developed. Weight restrictions, non-discretionary inputs and outputs, categorical inputs and
outputs, sensitivity analysis, input congestion, returns to scale, bad outputs,
super-efficiency, target setting, etc. are the major aspects on which extensions of DEA models
have been made. In parallel with the theoretical development, a wide range of empirical
studies have also been published which evince the inexhaustible potential of DEA for
innovative applications.
Originally, DEA was applied to estimate the relative efficiency of non-profit
organizations such as educational institutions, government hospitals, public utilities, etc.
where market prices are not generally available. However, its ability to use multiple
output-input variables without a priori underlying functional form assumption has
motivated researchers to extend it to profit-making organizations also. Some of the areas
where DEA has been applied frequently by researchers are: banks,
academic institutions, hospitals, public utilities like gas, water, electricity supply, police
services, transport services, agriculture, and industry. Moreover, the development of the
DEA-based MPI for measuring total factor productivity growth and its decomposition into
technical efficiency change and technical progress is a significant achievement in the
field of productivity analysis.
Terminology of DEA
1. Benchmarking: It is the process of comparing the performance of an individual
organization against a benchmark, or ideal level of performance. Benchmarks can be
set on the basis of performance over time or across a sample of similar organizations
or some externally set standard.
2. Best Practices: Best practices refer to the set of management and work practices that
results in the highest potential or optimal quantity and combination of outputs for a
given quantity and combination of inputs (productivity) for a group of similar
organizations.

3. Decision Making Unit (DMU): The term DMU was first used by Charnes, Cooper and
Rhodes in 1978 in their seminal paper on DEA. DMU means an individual production
unit producing tangible or intangible output under private, cooperative, government or
any other organization's ownership. It comprises manufacturing firms, banking and
insurance companies, transport and communication firms, hospitals, schools and
universities, other service providing firms, government organizations, local
governments, municipal corporations, etc. For measuring the relative performance of
individual DMUs, the set of DMUs should share the same fundamental characteristics
in terms of environment and technological constraints. If someone wants to assess the
efficiency of educational institutions, the DMUs in the dataset should be
homogeneous. For instance, schools cannot be compared with universities.
4. Economies of Scale: It refers to increasing a firm's size until it obtains the minimum
cost per unit of output.
5. Inefficiency: The amount by which a firm lies below the estimated frontier can be
regarded as a measure of inefficiency. Under the given technology, if the actual output of a
firm equals the potential output, the firm would not have inefficiency in the
6. Most Productive Scale Size (MPSS): It is that size at which a DMU obtains 100
percent pure technical efficiency and scale efficiency. This is possible when a DMU
attains an efficiency score of one under constant returns to scale technology.
7. Pareto Efficiency: A DMU is Pareto-efficient if it is not possible to reduce any one
of its input levels without increasing at least another one of its input levels and /or
without lowering at least one of its output levels.
8. Peer: A peer is an efficient DMU which acts as a reference point (in terms of input
and output mix) for inefficient DMUs.
9. Productivity: It can be defined as the ratio of a measure of output to a measure of one
or more of the inputs used to produce the output. There are two main concepts of productivity: partial
(single) factor productivity and total (multiple) factor productivity. Partial factor
productivity is a simple ratio of the volume of total output to the total quantity
of a single input. For instance, labour productivity is measured by dividing the total
production of a firm by the total number of workers (or total hours of work) of that
firm. The partial factor productivity concept cannot capture the true performance of a
resource. For instance, labour productivity in a firm can be raised either by improving
the quality of human resources through training and retraining or simply by retrenching
manpower and using a more capital and technology intensive production process.
Therefore, a total factor productivity (TFP) index is measured to assess the overall
productivity of a firm or industry. TFP is the ratio of a weighted sum of outputs to the
weighted sum of inputs. A TFP index value greater than one indicates
positive growth in productivity, and a TFP index value less than one means
negative growth. If the value of the index is equal to one, there is no growth in
productivity. Various methods have been developed to compute TFP. In this study, we
apply a non-parametric DEA-based method, known as MPI, to measure the TFP growth
in the sugar mills.
10. Production Frontier: The production frontier gives the maximal output that can be
achieved with a given amount of inputs.


11. Returns to Scale: It refers to a measure of change in output resulting from a change in
the scale of a firm's operation as determined by its input usage. There are three types of returns
to scale: increasing, constant and decreasing. When inputs are doubled and output
more than doubles, there are increasing returns to scale. If the output increases in
the same proportion as the inputs, there are constant returns to scale. Decreasing
returns to scale exist when output increases less than the proportional increase in the
inputs.
12. Pure Technical Efficiency: It refers to the proportion of technical efficiency which is
attributed to the efficient conversion of inputs into output. Effect of size of plant on the
efficiency is neutralized in it. It is also known as managerial efficiency or local
efficiency. It is estimated through BCC DEA model which is based on the variable
returns to scale technology assumption. Value of pure technical efficiency score lies
between zero and one.
13. Technical Efficiency: Technical efficiency refers to the firm's ability to produce the
maximum possible output from a given combination of inputs and technology. In
DEA, technical efficiency is determined by the difference between the observed
ratio of a DMU's output(s) to input(s) and the ratio achieved by best-practice
DMUs. It is, therefore, a relative technical efficiency, not an absolute technical
efficiency. Its value lies between zero and one. If a DMU is on the production frontier
and does not have any input or output slack, its technical efficiency score will be equal
to one. Technical efficiency can be decomposed into scale efficiency and pure
technical efficiency.
14. Scale Efficiency: The extent to which an organization can take advantage of returns to
scale by altering its size towards the optimal scale. In DEA, scale efficiency for a
DMU is calculated by dividing the CCR efficiency score by the BCC efficiency score. As
the BCC score is greater than or equal to the CCR score, the value of the scale efficiency score lies
between zero and one.
15. Slacks: Slacks in DEA refer to the extra quantity by which an input (output) can be
reduced (increased) to obtain technical efficiency after all inputs (outputs) have been
radially reduced to reach the production frontier.
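Two of the definitions above (productivity and returns to scale) lend themselves to a short numerical illustration. The following Python sketch is written under stated assumptions: the function names, weights and figures are hypothetical and are not taken from the sugar mill study.

```python
def tfp_ratio(outputs, inputs, output_weights, input_weights):
    """Total factor productivity: weighted sum of outputs over weighted sum of inputs."""
    weighted_output = sum(w * y for w, y in zip(output_weights, outputs))
    weighted_input = sum(w * x for w, x in zip(input_weights, inputs))
    if weighted_input <= 0:
        raise ValueError("weighted sum of inputs must be positive")
    return weighted_output / weighted_input

def classify_returns_to_scale(input_factor, output_factor, tol=1e-9):
    """Classify returns to scale when all inputs are scaled up by input_factor (> 1)
    and output is observed to change by output_factor."""
    if input_factor <= 1:
        raise ValueError("input_factor must be greater than 1 (a scale-up)")
    if output_factor > input_factor + tol:
        return "increasing"
    if output_factor < input_factor - tol:
        return "decreasing"
    return "constant"

# Hypothetical firm with two outputs and two inputs (weights are assumed).
tfp = tfp_ratio(outputs=[100, 50], inputs=[40, 10],
                output_weights=[1, 2], input_weights=[2, 2])
print(f"TFP = {tfp:.2f}")

# Inputs doubled; output rises 2.5 times, exactly 2 times, or only 1.5 times.
for factor in (2.5, 2.0, 1.5):
    print(factor, classify_returns_to_scale(2.0, factor))
```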
Basic DEA models are described as:
CCR Model
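This model generalizes the usual output/input ratio measure of efficiency for a
given firm in terms of a fractional linear program formulation. Mathematically, the
relative efficiency of the kth DMU is given by:

\[ \max\; h_k = \frac{\sum_{r=1}^{s} u_{rk}\, y_{rk}}{\sum_{i=1}^{m} v_{ik}\, x_{ik}} \]

Subject to:

\[ \frac{\sum_{r=1}^{s} u_{rk}\, y_{rj}}{\sum_{i=1}^{m} v_{ik}\, x_{ij}} \le 1, \qquad j = 1, \ldots, k, \ldots, n \]

\[ \frac{u_{rk}}{\sum_{i=1}^{m} v_{ik}\, x_{ik}} \ge \varepsilon, \qquad r = 1, \ldots, s \]

\[ \frac{v_{ik}}{\sum_{i=1}^{m} v_{ik}\, x_{ik}} \ge \varepsilon, \qquad i = 1, \ldots, m \]

where $y_{rk}$ = the amount of the rth output produced by the kth DMU; $x_{ik}$ = the amount of the
ith input used by the kth DMU; $u_{rk}$ = the weight given to the rth output of the kth DMU;
$v_{ik}$ = the weight given to the ith input of the kth DMU; n = no. of DMUs; s = no. of
outputs; m = no. of inputs; and $\varepsilon$ = a non-Archimedean (infinitesimal) constant.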
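The above objective function is reformulated as an LP problem as follows:

\[ \max\; w_k = \sum_{r=1}^{s} u_{rk}\, y_{rk} \]

Subject to:

\[ \sum_{i=1}^{m} v_{ik}\, x_{ik} = 1 \]

\[ \sum_{r=1}^{s} u_{rk}\, y_{rj} - \sum_{i=1}^{m} v_{ik}\, x_{ij} \le 0, \qquad j = 1, \ldots, n \]

\[ u_{rk} \ge \varepsilon, \qquad r = 1, \ldots, s \]

\[ v_{ik} \ge \varepsilon, \qquad i = 1, \ldots, m \]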

Since the number of DMUs is generally larger than the total number of inputs and
outputs, solving the dual of the model can reduce the computational burden.
Mathematically, the dual formulation of the above model is:

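\[ \min\; z_k = \theta_k - \varepsilon \sum_{r=1}^{s} S_{rk}^{+} - \varepsilon \sum_{i=1}^{m} S_{ik}^{-} \]

Subject to:

\[ \sum_{j=1}^{n} \lambda_{jk}\, y_{rj} - S_{rk}^{+} = y_{rk}, \qquad r = 1, \ldots, s \]

\[ \sum_{j=1}^{n} \lambda_{jk}\, x_{ij} + S_{ik}^{-} = \theta_k\, x_{ik}, \qquad i = 1, \ldots, m \]

\[ \lambda_{jk} \ge 0, \qquad j = 1, \ldots, n \]

\[ \theta_k \text{ free}; \qquad S_{rk}^{+},\, S_{ik}^{-} \ge 0, \qquad r = 1, \ldots, s, \quad i = 1, \ldots, m \]

where $S_{rk}^{+}$ = slack in the rth output of the kth DMU; $S_{ik}^{-}$ = slack in the ith input of the
kth DMU; the $\lambda_{jk}$'s = non-negative dual variables; and $\theta_k$ (a scalar) is the (proportional) reduction
applied to all inputs of DMU k to improve efficiency. If for DMU k, $\theta_k^{*} = 1$ and all slacks
are zero, it is Pareto efficient. Non-zero slacks and (or) $\theta_k^{*} < 1$ identify the sources and
amount of any inefficiency that may exist in the DMU under reference.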


BCC Model
The primary difference between the BCC model and the CCR model is the convexity
constraint. In the BCC model, the $\lambda_{jk}$'s are restricted to summing to one
(i.e. $\sum_{j=1}^{n} \lambda_{jk} = 1$). If we impose $\sum_{j=1}^{n} \lambda_{jk} \le 1$ instead of
$\sum_{j=1}^{n} \lambda_{jk} = 1$, then the model is converted into the Non-Increasing
Returns to Scale (NIRS) model. Similarly, if we impose $\sum_{j=1}^{n} \lambda_{jk} \ge 1$ instead of
$\sum_{j=1}^{n} \lambda_{jk} = 1$, then the model is known as the Non-Decreasing Returns to Scale (NDRS) model.

The technical efficiency measured by the CCR model includes the effects of both
scale and technical efficiencies. The BCC model measures the pure technical efficiency
net of scale effect. It captures the pure resource-conversion efficiencies, irrespective of
whether the DMUs operate at increasing, decreasing or constant returns to scale. Scale
efficiency of a DMU is estimated by dividing the CCR efficiency score by the BCC
efficiency score. As the BCC efficiency score is greater than or equal to the CCR efficiency
score, the value of the scale efficiency score will be less than or equal to one.
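The decomposition described above can be illustrated numerically. The sketch below assumes that CCR and BCC scores have already been obtained from DEA software; the scores shown and the function name `scale_efficiency` are hypothetical.

```python
def scale_efficiency(ccr_score, bcc_score):
    """Scale efficiency = CCR (overall) efficiency / BCC (pure technical) efficiency."""
    if not 0 < ccr_score <= 1 or not 0 < bcc_score <= 1:
        raise ValueError("efficiency scores must lie in the interval (0, 1]")
    if ccr_score > bcc_score + 1e-12:
        raise ValueError("the CCR score cannot exceed the BCC score")
    return ccr_score / bcc_score

# Hypothetical scores for three DMUs: (CCR score, BCC score).
dmus = {"A": (1.0, 1.0), "B": (0.6, 0.8), "C": (0.75, 1.0)}
for name, (ccr, bcc) in dmus.items():
    se = scale_efficiency(ccr, bcc)
    print(f"DMU {name}: OTE={ccr:.2f}  PTE={bcc:.2f}  SE={se:.2f}")
```

In this illustration DMU A operates at MPSS, DMU C is managerially efficient but scale inefficient, and DMU B suffers from both sources of inefficiency.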








Figure 1: Comparison of CRS and VRS Frontiers

Figure-1 makes the comparison of CCR and BCC models. The CCR model is
based on constant returns to scale (CRS) technology assumption and the BCC model is
based on variable returns to scale (VRS) technology assumption. The CRS surface is the
straight-line oicm and the VRS surface is abcde.


Efficiency of any interior point (such as k) is intuitively given by the distance
between the envelope and itself. Typically, such a distance may be measured either
horizontally along the x-axis or vertically along the y-axis, providing an input-oriented or
output-oriented measure, respectively. For example, using an input-oriented measure,
the technical efficiency of DMU k will be measured by hi/hk under the CRS technology
assumption and by hj/hk under the VRS technology assumption. A measure of scale efficiency
is provided by the ratio hi/hj. A DMU at point c is operating at the most productive scale
size (MPSS).
Advantages and Limitations of DEA
DEA methodology has several advantages over the traditional regression-based
production function approach. A few of them are: it can handle multiple inputs and
outputs; it doesn't require any assumption of a functional form relating inputs to outputs;
DMUs are directly compared against a peer or combination of peers; inputs and outputs in
the model can have different units; it sets targets for inefficient DMUs to make them
efficient, it also identifies slacks in inputs and outputs; and it estimates a single efficiency
score for each DMU. This approach also has certain advantages over the SFA. Apart from
not imposing any functional form on production or technology, it makes the minimum
assumptions about the underlying technology. SFA can use only a single output variable,
while DEA can use more than one output variable. In case of the stochastic frontier
approach the parameter estimates are sensitive to the choice of the probability distributions
specified for the disturbance terms (Ray, Seon n.d.), whereas DEA does not require any
functional form. However, DEA has several limitations also, such as:
1. Since DEA is an extreme point technique, noise such as measurement error can
cause significant problems.
2. In DEA, efficiency is defined relative to the efficiency of other firms under
consideration. It is not an absolute measure.
3. Since DEA is a nonparametric technique, statistical hypothesis testing is difficult.
4. DEA scores are sensitive to input-output specification and the size of sample.
Precautions to be taken
1. Since no hypothesis testing is possible, data accuracy must be given priority.
2. In order to discriminate sufficiently between DMUs, the sample size should be
adequate. It should be at least three times greater than the sum of input-output
variables.
3. The most important exercise in DEA is the identification of input-output variables.
Regression analysis can be conducted to identify the best fit in output and input
variables. Zero and negative values of any input or output should be avoided.
Variables in the model should be as few as possible.
4. Data scaling should be done before applying DEA so that input-output variables do
not have excessively large values.
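The sample-size precaution above is easy to check before running a DEA model. The sketch below is a hypothetical helper; the counts of inputs and outputs passed to it are assumed values chosen only to illustrate the rule of thumb.

```python
def meets_sample_size_rule(n_dmus, n_inputs, n_outputs, multiple=3):
    """True if the number of DMUs is at least `multiple` times the number of
    input and output variables combined."""
    if min(n_dmus, n_inputs, n_outputs) < 1:
        raise ValueError("counts must be positive integers")
    return n_dmus >= multiple * (n_inputs + n_outputs)

# Hypothetical model specifications (illustrative only).
print(meets_sample_size_rule(n_dmus=36, n_inputs=4, n_outputs=1))  # 36 >= 15
print(meets_sample_size_rule(n_dmus=10, n_inputs=3, n_outputs=1))  # 10 <  12
```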
Post-DEA Analysis
A key aspect of DEA is incorporating environmental factors into the model as either
inputs or outputs. Resources available to units are classed as inputs, whilst activity levels or
performance measures are represented by outputs. One approach to incorporating
environmental factors is to consider whether they are effectively additional resources to
the unit, in which case they can be incorporated as inputs, or whether they are resource
users, in which case they may be better included as outputs. For example, in comparing
the efficiency of schools, research has indicated that, in general, parents of higher educational
attainment provide greater support to their children and therefore are effectively an
additional resource to the schools and should be classed as an input. Tobit regression is an
appropriate method to study the impact of environmental and background factors on
efficiency. It assumes that the data are truncated, or censored, above or below certain
values. In DEA, the values of the dependent variable are censored, as they range between 0 and 1.
Productivity Measurement Methods
Productivity growth is one of the major determinants of competitiveness and
profitability of a firm. A higher level of productivity growth may result in lower product
prices, better remunerations and working conditions to the employees, better returns to
the investors and adequate surplus to the firm for plant expansion and modernization.
Technical change and technical efficiency change are the two sources of productivity
growth. A study of these sources is crucial for identifying the factors that are
responsible for the productivity stagnation and for adopting appropriate measures at
firm, industry and government levels to improve the productivity. In this chapter, we
examine the productivity growth and its sources in the sugar mills of Uttar Pradesh. A
non-parametric approach, known as the Malmquist productivity index (MPI), is applied to
the panel data of seven years collected from 36 sugar mills of the state. An output-oriented DEA method is used for estimation of the TFP growth and its decomposition
into technical efficiency change and technical progress. Technical efficiency change is
further decomposed into pure technical efficiency change and scale efficiency change.
The study also examines inter-sector and inter-region variations in the TFP growth and
its components.
Productivity Measurement Approaches
The most commonly used measures of productivity are partial or single factor
productivity and total factor productivity. Single factor productivity is the ratio of total
output to the quantity or number of the factor for which productivity is to be estimated.
Single factor productivity provides a distorted view about the contribution of a factor to
the total production. For instance, partial productivity of labour can be increased by
reducing quantity of labour and increasing quantity of capital in the production unit.
Therefore, the concept of total factor productivity (TFP) is more relevant in the context of
resource use efficiency. TFP is defined as the ratio of the weighted sum of outputs to the
weighted sum of inputs. Over the last three decades, several theories and methods of TFP
measurement have been developed. Before the mid 1990s, most studies estimated TFP
growth by growth accounting approach (Frank et al., 2002). The approach is based on
unrealistic assumptions of perfect competition and constant returns to scale. It assumes
that a firm operates on its production frontier, implying that the firm has 100 per cent
technical efficiency. Thus, TFP growth measured through this approach is due to technical
change, not due to technical efficiency change (Mawson, et al., 2003). Parametric
(stochastic frontier analysis) and non-parametric (DEA-based MPI) are the other two
productivity measurement approaches which use panel data for estimation of productivity
of individual production units. These approaches do not assume that all production units
operate at 100 per cent technical efficiency. According to the MPI approach, TFP can
increase not only due to technical progress (shifting of frontier) but also due to
improvement in technical efficiency (catch up). The approach has become quite popular
because: (i) it does not require price data, and is therefore suitable when price data are not
available or are distorted; (ii) it rests on much weaker behavioural assumptions,
since it does not assume cost minimizing or revenue maximizing behaviour; (iii) it uses
panel data and provides a decomposition of productivity change into two components:
technical change and technical efficiency change. Technical change reflects improvement
or deterioration in the performance of best practice firms, while technical efficiency
change reflects the convergence toward or the divergence from best practice on the part of
the remaining firms. The significance of the decomposition is that it provides information
on the source of overall productivity change in the firms.
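The distortion in single factor productivity mentioned above can be illustrated numerically. The sketch below uses hypothetical figures for one firm and assumes unit weights for both inputs; the function names are illustrative only.

```python
def single_factor_productivity(output, factor_quantity):
    """Ratio of total output to the quantity of one factor."""
    return output / factor_quantity


def total_factor_productivity(outputs, output_weights, inputs, input_weights):
    """Ratio of the weighted sum of outputs to the weighted sum of inputs."""
    weighted_output = sum(w * q for w, q in zip(output_weights, outputs))
    weighted_input = sum(w * q for w, q in zip(input_weights, inputs))
    return weighted_output / weighted_input


# Hypothetical firm: output is unchanged, labour is replaced by capital.
# Inputs are listed as (labour, capital); unit weights are an assumption.
before = {"output": 100.0, "inputs": (50.0, 20.0)}
after = {"output": 100.0, "inputs": (25.0, 45.0)}
weights = (1.0, 1.0)

labour_before = single_factor_productivity(before["output"], before["inputs"][0])
labour_after = single_factor_productivity(after["output"], after["inputs"][0])
tfp_before = total_factor_productivity([before["output"]], [1.0],
                                       before["inputs"], weights)
tfp_after = total_factor_productivity([after["output"]], [1.0],
                                      after["inputs"], weights)
```

In this example labour productivity doubles while TFP does not change, because the fall in labour is matched by an equal rise in capital.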
The MPI was initially introduced by Caves, Christensen and Diewert (CCD) in
1982 and was empirically applied by Färe, Grosskopf, Lindgren and Roos (FGLR) in 1992
and Färe, Grosskopf, Norris and Zhang (FGNZ) in 1994. Since then, several extended
versions of the MPI and its decomposition have been developed by researchers. A few of
them are: Ray and Desli (1997), Simar and Wilson (1998), Grifell-Tatjé and Lovell
(1999), Balk (2001), Kumar and Russell (2002) and Chen and Ali (2004).
DEA analysis is static in nature, as the performance of a mill is assessed relative
to the best practice mills in a given year. The shift of the frontier over time is not accounted for
by this assessment. To account for this dynamic shift, the MPI model is used. Since it is
also capable of decomposing productivity growth into technical efficiency change and
technical progress, it is able to shed light on the mechanism of productivity change (Ma, et
al., 2002).
The MPI is defined with distance functions. Distance functions allow us to
describe multiple input-output production technology without the need to specify a
behavioural objective such as cost minimization or profit maximization (Coelli, et al.,
1998). Both output and input distance functions can be defined. With the given input
vector, an output distance function maximizes the proportional expansion of the output
vector, whereas in case of input distance function, the aim is to minimise the input vector,
given the output vector.
The output-oriented Malmquist TFP change index between period t (the base
period) and period t+1 is given by
$$M_0^{t+1}(y^{t+1}, x^{t+1}, y^t, x^t) = \left[ \frac{D_0^t(y^{t+1}, x^{t+1})}{D_0^t(y^t, x^t)} \times \frac{D_0^{t+1}(y^{t+1}, x^{t+1})}{D_0^{t+1}(y^t, x^t)} \right]^{1/2} \qquad (7.1)$$

Equation (7.1) is the geometric mean of two productivity indices. The first is estimated
with respect to period t technology and the second with respect to period t+1 technology.
Assuming that $D_0^t(y^t, x^t) \le 1$ and $D_0^{t+1}(y^{t+1}, x^{t+1}) \le 1$, equation (7.1) can be rewritten as

$$M_0^{t+1}(y^{t+1}, x^{t+1}, y^t, x^t) = \frac{D_0^{t+1}(y^{t+1}, x^{t+1})}{D_0^t(y^t, x^t)} \left[ \frac{D_0^t(y^{t+1}, x^{t+1})}{D_0^{t+1}(y^{t+1}, x^{t+1})} \times \frac{D_0^t(y^t, x^t)}{D_0^{t+1}(y^t, x^t)} \right]^{1/2} \qquad (7.2)$$

Where, the ratio outside the square brackets in equation (7.2) represents technical
efficiency change (effch) and the expression in the square brackets indicates technical
change (techch). Thus, MPI can be decomposed into change in technical efficiency
(catching up) and into change in frontier (technical progress):

$$\text{effch} = \frac{D_0^{t+1}(y^{t+1}, x^{t+1})}{D_0^t(y^t, x^t)}$$

$$\text{techch} = \left[ \frac{D_0^t(y^{t+1}, x^{t+1})}{D_0^{t+1}(y^{t+1}, x^{t+1})} \times \frac{D_0^t(y^t, x^t)}{D_0^{t+1}(y^t, x^t)} \right]^{1/2}$$
Technical efficiency change (effch) measures the change in technical efficiency
between periods t and t+1 with respect to the production possibilities existing in each
period. Technical change (techch) is the geometric mean of the shifts in frontier at the
factor ratios of periods t+1 and t respectively. The value of the MPI greater than 1 means
productivity growth and a value less than 1 means deterioration in productivity. The same
is applicable to each of the components of the Malmquist Productivity Index.
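Once the four distance function values are known, the index and its two components follow by simple arithmetic. The sketch below assumes the four values have already been estimated; the function name `malmquist` and the numbers passed to it are hypothetical.

```python
import math


def malmquist(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Decompose the output-oriented MPI.

    d_t_t   : distance of the period t observation to the period t frontier
    d_t_t1  : distance of the period t+1 observation to the period t frontier
    d_t1_t  : distance of the period t observation to the period t+1 frontier
    d_t1_t1 : distance of the period t+1 observation to the period t+1 frontier
    """
    # Technical efficiency change (catching up).
    effch = d_t1_t1 / d_t_t
    # Technical change (geometric mean of the two frontier shifts).
    techch = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    # The MPI is the product of the two components.
    return effch * techch, effch, techch


# Hypothetical distance function values for one production unit.
mpi, effch, techch = malmquist(0.80, 0.95, 0.70, 0.85)
```

A value of `mpi` above 1 would be read as productivity growth, and the two components show whether it came from catching up, from a frontier shift, or from both.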
Figure-1 describes the MPI with one input (x) and one output (y) under CRS
technology and its decomposition into efficiency change and technical change. MPI under
CRS technology indicates a rise in potential productivity as the technology frontier shifts
from t to t+1. Points P and R in the figure represent the input-output combinations of a
production unit (Mill) in periods t and t+1 respectively. In both periods, the unit is
operating below the production possibility frontier.

[Figure 1: Malmquist Productivity Indices using CRS Technology. The plot shows the frontier in period t and the frontier in period t+1.]

Technical efficiency change and technical change are represented by the distance
functions. In terms of the distances along the y-axis, the index becomes
$$M^{t+1}(y^{t+1}, x^{t+1}, y^t, x^t) = \frac{y^{t+1}/Y_3}{y^t/Y_1} \left[ \frac{y^{t+1}/Y_2}{y^{t+1}/Y_3} \times \frac{y^t/Y_1}{y^t/Y_2} \right]^{1/2}$$

$$\text{Efficiency change} = \frac{y^{t+1}/Y_3}{y^t/Y_1}$$

$$\text{Technical change} = \left[ \frac{y^{t+1}/Y_2}{y^{t+1}/Y_3} \times \frac{y^t/Y_1}{y^t/Y_2} \right]^{1/2}$$
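For the one-input, one-output CRS case, the distance functions do not need a linear programme: the frontier is the ray through the origin with the highest output/input ratio in the reference period. The sketch below relies on that simplification, which holds only for this special case; the data and the function names are hypothetical.

```python
import math


def crs_distance(x, y, reference):
    """Output distance of (x, y) to the CRS frontier of `reference`.

    reference: list of (input, output) pairs observed in one period.
    """
    best_ratio = max(yj / xj for xj, yj in reference)
    return (y / x) / best_ratio


def malmquist_one_input(unit_t, unit_t1, period_t, period_t1):
    """MPI, efficiency change and technical change for one unit."""
    x_t, y_t = unit_t
    x_t1, y_t1 = unit_t1
    d_t_t = crs_distance(x_t, y_t, period_t)
    d_t1_t1 = crs_distance(x_t1, y_t1, period_t1)
    d_t_t1 = crs_distance(x_t1, y_t1, period_t)    # t+1 data, t frontier
    d_t1_t = crs_distance(x_t, y_t, period_t1)     # t data, t+1 frontier
    effch = d_t1_t1 / d_t_t
    techch = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    return effch * techch, effch, techch


# Hypothetical (input, output) pairs for three mills in two periods.
period_t = [(10.0, 8.0), (20.0, 20.0), (30.0, 24.0)]
period_t1 = [(10.0, 10.0), (20.0, 25.0), (30.0, 30.0)]
mpi, effch, techch = malmquist_one_input(period_t[0], period_t1[0],
                                         period_t, period_t1)
```

For the first mill in this example the efficiency score is the same in both periods, so the whole productivity change is attributed to the shift of the frontier.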



In order to calculate the productivity change between periods t and t+1, we need to
solve four different LP problems: $D^t(x^t, y^t)$, $D^{t+1}(x^t, y^t)$, $D^t(x^{t+1}, y^{t+1})$ and
$D^{t+1}(x^{t+1}, y^{t+1})$. Mathematical formulations are shown in Box 1. If technical efficiency change is
to be decomposed into scale efficiency change and pure technical efficiency change, two
more LP problems are to be solved by putting the convexity restriction in (7.8) and (7.9),
that is, one would estimate these two distance functions relative to VRS technology
(Coelli et al., 1998).
Linear Programming Formulation of MPI
The MPI requires the following four LP problems:


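$$[d_0^t(X^t, Y^t)]^{-1} = \max \phi$$
subject to
$$\sum_{j=1}^{n} \lambda_j^t \, y_{rj}^t \ge \phi \, y_r^t, \qquad r = 1, \ldots, s$$
$$\sum_{j=1}^{n} \lambda_j^t \, x_{ij}^t \le x_i^t, \qquad i = 1, \ldots, m$$
$$\lambda_j^t \ge 0, \qquad j = 1, \ldots, n$$
$\phi$ is unrestricted in sign. (7.8)

$$[d_0^{t+1}(X^{t+1}, Y^{t+1})]^{-1} = \max \phi$$
subject to
$$\sum_{j=1}^{n} \lambda_j^{t+1} \, y_{rj}^{t+1} \ge \phi \, y_r^{t+1}, \qquad r = 1, \ldots, s$$
$$\sum_{j=1}^{n} \lambda_j^{t+1} \, x_{ij}^{t+1} \le x_i^{t+1}, \qquad i = 1, \ldots, m$$
$$\lambda_j^{t+1} \ge 0, \qquad j = 1, \ldots, n$$
$\phi$ is unrestricted in sign. (7.9)

$$[d_0^t(X^{t+1}, Y^{t+1})]^{-1} = \max \phi$$
subject to
$$\sum_{j=1}^{n} \lambda_j^t \, y_{rj}^t \ge \phi \, y_r^{t+1}, \qquad r = 1, \ldots, s$$
$$\sum_{j=1}^{n} \lambda_j^t \, x_{ij}^t \le x_i^{t+1}, \qquad i = 1, \ldots, m$$
$$\lambda_j^t \ge 0, \qquad j = 1, \ldots, n$$
$\phi$ is unrestricted in sign.

$$[d_0^{t+1}(X^t, Y^t)]^{-1} = \max \phi$$
subject to
$$\sum_{j=1}^{n} \lambda_j^{t+1} \, y_{rj}^{t+1} \ge \phi \, y_r^t, \qquad r = 1, \ldots, s$$
$$\sum_{j=1}^{n} \lambda_j^{t+1} \, x_{ij}^{t+1} \le x_i^t, \qquad i = 1, \ldots, m$$
$$\lambda_j^{t+1} \ge 0, \qquad j = 1, \ldots, n$$
$\phi$ is unrestricted in sign.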
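The further decomposition of technical efficiency change requires the two own-period programmes to be solved again under VRS. As a sketch, writing $\lambda_j$ for the intensity weights of the programmes and using the labels pech (pure technical efficiency change) and sech (scale efficiency change), which are notation assumed here, the convexity restriction and the resulting split can be written as:

```latex
\sum_{j=1}^{n} \lambda_j = 1
\qquad \text{(added to the own-period programmes for } t \text{ and } t+1\text{)}

\text{pech} =
\frac{D_{0,\mathrm{VRS}}^{t+1}(y^{t+1}, x^{t+1})}{D_{0,\mathrm{VRS}}^{t}(y^{t}, x^{t})},
\qquad
\text{sech} = \frac{\text{effch}}{\text{pech}}
```

By construction effch = pech x sech, so the three-way decomposition multiplies back to the MPI.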



In this lecture, we shall discuss two advanced topics of multivariate analysis. They are
discriminant analysis and factor analysis.
Discriminant Analysis
Researchers often wish to classify people or objects into two or more groups. One
might need to classify persons as buyers or non-buyers, good or bad credit risks or
superior, average or poor performers in some activity. The objective is to establish a
procedure to find the predictors that best classify subjects.
Discriminant analysis joins a nominally scaled criterion or dependent variable with
one or more independent variables that are interval or ratio scaled. Once the discriminant
equation is found, it can be used to predict the classification of a new observation. The
researchers may be interested to check whether the predictor variables discriminate among
the groups. More specifically, it is essential to identify which independent variable is more
important when compared to other predictor variables. This is done by calculating a linear
discriminant function.
Discriminant function analysis, also known as discriminant analysis (DA), is used to
classify cases into the values of a categorical dependent, usually a dichotomy. It is applied
when the grouping variable has only two categories. Multiple discriminant analysis (MDA) is
used to classify a categorical dependent that has more than two categories. MDA is
sometimes also called discriminant factor analysis or canonical discriminant analysis.
DA shares all the usual assumptions of correlation, requiring linear and homoscedastic
relationships. Like multiple regression, it also assumes proper model specification
(inclusion of all important independents and exclusion of extraneous variables). It also
assumes the dependent variable is a true dichotomy.
Objectives of DA

To classify cases into groups using a discriminant prediction equation.
To test theory by observing whether cases are classified as predicted.
To investigate differences between or among groups.
To determine the percent of variance in the dependent variable explained by the
independent variables.
To assess the relative importance of the independent variables in classifying the
dependent variable.
To discard variables which are little related to group distinctions.

Key Terms and Concepts

Discriminating variables: These are the independent variables, also called
predictors.

The criterion variable: This is the dependent variable, also called the grouping
variable.

Discriminant function: A discriminant function is a latent variable that is created
as a linear combination of discriminating (independent) variables. This is
analogous to multiple regression, but the b's are discriminant coefficients which
maximize the distance between the means of the criterion (dependent) variable.

The eigenvalue, also called the characteristic root of each discriminant function,
reflects the ratio of importance of the dimensions which classify cases of the
dependent variable. There is one eigenvalue for each discriminant function. The
eigenvalues assess relative importance because they reflect the percents of variance
explained in the dependent variable, cumulating to 100% for all functions.

The relative percentage of a discriminant function equals a function's eigenvalue
divided by the sum of all eigenvalues of all discriminant functions in the model.
Thus it is the percent of discriminating power for the model associated with a
given discriminant function.

The canonical correlation, R*, is a measure of the association between the groups
formed by the dependent and the given discriminant function. When R* is zero,
there is no relation between the groups and the function. When the canonical
correlation is large, there is a high correlation between the discriminant functions
and the groups. R* is used to tell how much each function is useful in determining
group differences. An R* of 1.0 indicates that all of the variability in the
discriminant scores can be accounted for by that dimension.

The discriminant score, also called the DA score, is the value resulting from
applying a discriminant function formula to the data for a given case. The Z score
is the discriminant score for standardized data.

Unstandardized discriminant coefficients are used in the formula for making the
classifications in DA, much as b coefficients are used in regression in making
predictions. The constant plus the sum of products of the unstandardized
coefficients with the observations yields the discriminant scores. That is,
discriminant coefficients are the regression-like b coefficients in the discriminant
function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable
formed by the discriminant function, the b's are discriminant coefficients, the x's
are discriminating variables, and c is a constant. There will be no constant when
the data are standardized or are deviations from the mean. The discriminant
function coefficients are partial coefficients, reflecting the unique contribution of
each variable to the classification of the criterion variable. The standardized
discriminant coefficients, like beta weights in regression, are used to assess the
relative classifying importance of the independent variables.

Standardized discriminant coefficients, also termed the standardized canonical
discriminant function coefficients, are used to compare the relative importance of
the independent variables, much as beta weights are used in regression.

Tests of significance: Wilks' lambda is used to test the significance of the
discriminant function as a whole. In SPSS, the "Wilks' Lambda" table will have a
column labeled "Test of Function(s)" and a row labeled "1 through n" (where n is
the number of discriminant functions). The "Sig." level for this row is the
significance level of the discriminant function as a whole. A significant lambda
means one can reject the null hypothesis that the two groups have the same mean
discriminant function scores and conclude the model is discriminating.


ANOVA table for discriminant scores is another overall test of the DA model. It is
an F test, where a "Sig." p value < .05 means the model differentiates discriminant
scores between the groups significantly better than chance (than a model with just
the constant).

(Variable) Wilks' lambda also can be used to test which independents contribute
significantly to the discriminant function. The smaller the variable Wilks' lambda
for an independent variable, the more that variable contributes to the discriminant
function. Lambda varies from 0 to 1, with 0 meaning group means differ (thus the
more the variable differentiates the groups), and 1 meaning all group means are the
same. The F test of Wilks's lambda shows which variables' contributions are
significant. Wilks's lambda is sometimes called the U statistic. In SPSS, this use of
Wilks' lambda is in the "Tests of equality of group means" table in DA output.
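Several of the quantities defined above can be derived from the eigenvalues of the discriminant functions. The sketch below assumes the usual identities, namely that the canonical correlation equals the square root of eigenvalue / (1 + eigenvalue) and that the overall Wilks' lambda equals the product of 1 / (1 + eigenvalue) over all functions. The eigenvalues used are hypothetical.

```python
import math


def relative_percentages(eigenvalues):
    """Share of discriminating power of each function, in per cent."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def canonical_correlation(eigenvalue):
    """R* for one discriminant function (assumed standard identity)."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda(eigenvalues):
    """Overall Wilks' lambda for all functions (assumed standard identity)."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result


# Hypothetical eigenvalues of two discriminant functions.
eigenvalues = [3.0, 1.0]
shares = relative_percentages(eigenvalues)
r_star = [canonical_correlation(ev) for ev in eigenvalues]
overall_lambda = wilks_lambda(eigenvalues)
```

The relative percentages cumulate to 100, and a larger eigenvalue gives both a larger canonical correlation and a smaller Wilks' lambda.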

Method of Estimation
DA is done by calculating a linear function of the form:
$$D_i = d_0 + d_1X_1 + d_2X_2 + \cdots + d_pX_p$$
where $D_i$ is the score on discriminant function i; the X's are the values of the discriminating
variables used in the analysis; the d's are weighting coefficients; and $d_0$ is a constant.
A single discriminant equation is required if the categorization calls for two groups. If
three groups are involved, it requires two discriminant equations. If more categories are
called for in the dependent variable, it is necessary to calculate a separate discriminant
function for each pair of classification in the criterion group. Here we shall describe two-group DA.
Let X1 and X2 be the predictor variables, G1 and G2 the two groups, and n1 and n2 the numbers of
observations in G1 and G2, respectively.
Calculation Process
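1. Find the means of X1 and X2 in each group. Let $\bar{X}_1(G_1)$ be the mean of X1 in Group-1 and $\bar{X}_2(G_2)$ be the mean of X2
in Group-2, and similarly for the other combinations. Also find the aggregate means of X1 and X2.
2. In each group, find $\sum X_1^2$, $\sum X_2^2$ and $\sum X_1X_2$.
3. Define the linear combination as $D_i = d_1X_1 + d_2X_2$ and find the values of $d_1$ and $d_2$ by
solving the following normal equations.
$$d_1 \sum (X_1 - \bar{X}_1)^2 + d_2 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \bar{X}_1(G_2) - \bar{X}_1(G_1)$$
$$d_1 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) + d_2 \sum (X_2 - \bar{X}_2)^2 = \bar{X}_2(G_2) - \bar{X}_2(G_1)$$
The sums of squares and cross-products in the above normal equations can be obtained from the following
simple formulae, in which each sum on the right-hand side runs over the observations of the group named in it.
$$\sum (X_1 - \bar{X}_1)^2 = \sum (X_1 - \bar{X}_1(G_1))^2 + \sum (X_1 - \bar{X}_1(G_2))^2$$
$$\sum (X_2 - \bar{X}_2)^2 = \sum (X_2 - \bar{X}_2(G_1))^2 + \sum (X_2 - \bar{X}_2(G_2))^2$$
$$\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \sum (X_1 - \bar{X}_1(G_1))(X_2 - \bar{X}_2(G_1)) + \sum (X_1 - \bar{X}_1(G_2))(X_2 - \bar{X}_2(G_2))$$
$$\sum (X_1 - \bar{X}_1(G_1))^2 = \sum X_1^2 - n_1 \bar{X}_1^2(G_1)$$
$$\sum (X_1 - \bar{X}_1(G_2))^2 = \sum X_1^2 - n_2 \bar{X}_1^2(G_2)$$
$$\sum (X_2 - \bar{X}_2(G_1))^2 = \sum X_2^2 - n_1 \bar{X}_2^2(G_1)$$
$$\sum (X_2 - \bar{X}_2(G_2))^2 = \sum X_2^2 - n_2 \bar{X}_2^2(G_2)$$
$$\sum (X_1 - \bar{X}_1(G_1))(X_2 - \bar{X}_2(G_1)) = \sum X_1X_2 - n_1 \bar{X}_1(G_1)\bar{X}_2(G_1)$$
$$\sum (X_1 - \bar{X}_1(G_2))(X_2 - \bar{X}_2(G_2)) = \sum X_1X_2 - n_2 \bar{X}_1(G_2)\bar{X}_2(G_2)$$
4. In each group, find the discriminant score for each combination of the variables X1 and
X2. Then find the average of the discriminant scores of each group and also the grand mean
of the discriminant scores for the entire problem.
5. Find the variability between groups (VBG) using the following formula:
$$VBG = n_1(\bar{S}_1 - \bar{S})^2 + n_2(\bar{S}_2 - \bar{S})^2$$
where $\bar{S}_1$ and $\bar{S}_2$ are the means of the discriminant scores in group-1 and group-2,
respectively, and $\bar{S}$ is the aggregate mean of the entire problem.
6. Find the variability within groups (VWG) using the following formula:
$$VWG = \sum_j (S_{1j} - \bar{S}_1)^2 + \sum_j (S_{2j} - \bar{S}_2)^2$$
where $S_{1j}$ and $S_{2j}$ are the discriminant scores for the jth set of observations in group-1
and group-2, respectively; $\bar{S}_1$ and $\bar{S}_2$ are the means of the discriminant scores of group-1 and
group-2, respectively.
7. Find the discriminant ratio (K): K = VBG / VWG
This is the maximum possible ratio between the variability between groups and the
variability within groups.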
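The seven steps can be traced in a short Python sketch. The data below are hypothetical and the function names are illustrative; the normal equations are solved by Cramer's rule, which is one convenient choice for a two-variable problem.

```python
def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of products of deviations of a and b from their own means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))


def two_group_da(group1, group2):
    """group1, group2: lists of (X1, X2) observations."""
    x1 = [[p[0] for p in g] for g in (group1, group2)]
    x2 = [[p[1] for p in g] for g in (group1, group2)]
    # Steps 1-3: pooled within-group sums of squares and cross-products.
    s11 = sum(cross_dev(a, a) for a in x1)
    s22 = sum(cross_dev(b, b) for b in x2)
    s12 = sum(cross_dev(a, b) for a, b in zip(x1, x2))
    # Right-hand sides: group-2 mean minus group-1 mean.
    r1 = mean(x1[1]) - mean(x1[0])
    r2 = mean(x2[1]) - mean(x2[0])
    # Solve the two normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    d1 = (r1 * s22 - s12 * r2) / det
    d2 = (s11 * r2 - s12 * r1) / det
    # Step 4: discriminant scores, group means and grand mean.
    scores = [[d1 * a + d2 * b for a, b in g] for g in (group1, group2)]
    m1, m2 = mean(scores[0]), mean(scores[1])
    grand = mean(scores[0] + scores[1])
    # Steps 5-7: variability between groups, within groups, and K.
    vbg = len(group1) * (m1 - grand) ** 2 + len(group2) * (m2 - grand) ** 2
    vwg = (sum((s - m1) ** 2 for s in scores[0])
           + sum((s - m2) ** 2 for s in scores[1]))
    return d1, d2, vbg, vwg, vbg / vwg


# Hypothetical observations (X1, X2) for the two groups.
below = [(4.0, 3.0), (6.0, 5.0), (5.0, 4.0), (7.0, 6.0)]
above = [(10.0, 5.0), (12.0, 7.0), (11.0, 4.0), (13.0, 8.0)]
d1, d2, vbg, vwg, k = two_group_da(below, above)
```

For these hypothetical data the pooled sums are 10, 15 and 11, so the returned coefficients satisfy 10 d1 + 11 d2 = 6 and 11 d1 + 15 d2 = 1.5.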
The director of a management school wants to do discriminant analysis concerning the
effect of two factors, namely, the yearly spending (Rs.lakh) on infrastructure of the school
(X1) and the yearly spending on interface events of the school (X2) on the grading of the
school by an inspection team. The data are given below:
[Table: expenditure on infrastructure (Rs lakh), X1; expenditure on interface events (Rs lakh), X2; and grade awarded to each school (Below average = 0, Above average = 1). Of the 12 schools, 7 are graded below average and 5 above average.]

[Tables: X1 and X2 listed separately for each of the two groups.]

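Calculation process

[Tables: group-wise computation of X1², X2² and X1X2.]

Sum of squares
$$\sum (X_1 - \bar{X}_1)^2 = \sum (X_1 - \bar{X}_1(G_1))^2 + \sum (X_1 - \bar{X}_1(G_2))^2 = 38$$
$$\sum (X_2 - \bar{X}_2)^2 = \sum (X_2 - \bar{X}_2(G_1))^2 + \sum (X_2 - \bar{X}_2(G_2))^2 = 18$$
$$\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \sum (X_1 - \bar{X}_1(G_1))(X_2 - \bar{X}_2(G_1)) + \sum (X_1 - \bar{X}_1(G_2))(X_2 - \bar{X}_2(G_2)) = 11$$

Discriminant Function
$$D_i = d_1X_1 + d_2X_2$$
Normal Equations
$$d_1 \sum (X_1 - \bar{X}_1)^2 + d_2 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \bar{X}_1(G_2) - \bar{X}_1(G_1)$$
$$d_1 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) + d_2 \sum (X_2 - \bar{X}_2)^2 = \bar{X}_2(G_2) - \bar{X}_2(G_1)$$
$$38 d_1 + 11 d_2 = 12 - 6 = 6$$
$$11 d_1 + 18 d_2 = 6 - 5 = 1$$
Solving the equations, we get $d_1 = 0.17229$ and $d_2 = -0.04973$, so that
$$D_i = 0.17229 X_1 - 0.04973 X_2$$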
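The solution of the two normal equations can be checked with Cramer's rule. The sketch below takes the sums of squares and cross-products as 38 (for X1), 18 (for X2) and 11 (cross-product), which are the values that reproduce the reported coefficients; `solve_2x2` is a hypothetical helper name.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*u + a12*v = b1 and a21*u + a22*v = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    u = (b1 * a22 - a12 * b2) / det
    v = (a11 * b2 - a21 * b1) / det
    return u, v


# 38 d1 + 11 d2 = 6 and 11 d1 + 18 d2 = 1
d1, d2 = solve_2x2(38.0, 11.0, 11.0, 18.0, 6.0, 1.0)
print(round(d1, 5), round(d2, 5))  # 0.17229 -0.04973
```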
Computation of Discriminant ratio (K)
Mean discriminant score of each Group
Mean score for G-1 = 0.17229 $\bar{X}_1(G_1)$ - 0.04973 $\bar{X}_2(G_1)$
= 0.17229 x 6 - 0.04973 x 5
= 0.78509

Mean score for G-2 = 0.17229 x 12 - 0.04973 x 6
= 1.7691
Mean score for aggregate = 0.17229 x 8.5 - 0.04973 x 5.41666
= 1.195094
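Summary of Discriminant Scores and their group averages

[Table: discriminant score of each observation in G-1 ($S_{1j}$) and G-2 ($S_{2j}$), with the squared deviations $(S_{1j} - \bar{S}_1)^2$ and $(S_{2j} - \bar{S}_2)^2$ from the group mean scores (0.78509 for G-1).]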
Variability between Groups

VBG = n1(S̄1 - S̄)² + n2(S̄2 - S̄)²
= 7 (0.78509 - 1.195094)² + 5 (1.7691 - 1.195094)²
= 2.824137

VWG = Σ(S1j - S̄1)² + Σ(S2j - S̄2)²
= 0.730981 + 0.253025
= 0.984006
Discriminant Ratio
K = 2.824137 / 0.984006 = 2.87
This is the maximum possible ratio between the variability between groups and the
variability within groups. In the discriminant function, the coefficient of X2 has a negative sign,
which indicates that the variable X1 (spending on infrastructure) is more important than
the variable spending on interface events.
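The arithmetic of the example can be retraced from the reported group means. The sketch below uses only figures quoted in the example: the coefficients 0.17229 and -0.04973, group sizes 7 and 5, group means (6, 5) and (12, 6), and the reported within-group variability 0.984006. The individual scores are not recomputed.

```python
d1, d2 = 0.17229, -0.04973
n1, n2 = 7, 5
mean_g1 = (6.0, 5.0)    # mean of X1 and X2 in G-1
mean_g2 = (12.0, 6.0)   # mean of X1 and X2 in G-2


def score(point):
    """Discriminant score D = d1*X1 + d2*X2."""
    return d1 * point[0] + d2 * point[1]


s1 = score(mean_g1)
s2 = score(mean_g2)
# The aggregate mean score is the size-weighted mean of the group means.
grand = (n1 * s1 + n2 * s2) / (n1 + n2)

vbg = n1 * (s1 - grand) ** 2 + n2 * (s2 - grand) ** 2
vwg = 0.984006          # reported value, taken as given
k = vbg / vwg
```

The results agree with the reported VBG of 2.824137 and discriminant ratio of 2.87 to the precision shown in the example.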


Factor analysis can simultaneously manage over a hundred variables, compensate
for random error and invalidity, and disentangle complex interrelationships into their
major and distinct regularities. It takes thousands of measurements and qualitative
observations and resolves them into distinct patterns of occurrence. It makes explicit and
more precise the building of fact-linkages going on continuously in the human mind. It is a
means by which the regularity and order in phenomena can be discerned.
What is Factor analysis?
Factor analysis refers to a variety of statistical techniques whose common
objective is to represent a set of variables in terms of a smaller number of hypothetical
variables (Kim & Mueller, 1984: 9). It is a technique of data reduction. When a researcher
deals with a large number of variables and does not know exactly which of them are
exogenous and which endogenous, this technique may be of great use for meaningful
analysis and interpretation of data. The term factor analysis was first introduced by
Thurstone in 1931.
Factor analysis assumes that the observed variables are linear combinations of
some underlying (hypothetical or unobservable) factors. Some of these factors are
assumed to be common to two or more variables and some are assumed to be unique to
each variable. The unique factors are assumed to be orthogonal to each other. They do not
contribute to the co-variation among the observed variables (Kim & Mueller, 1983: 8). As
in other multivariate analysis, in factor analysis too, we are concerned with the variance.
We want to know how big it is and where it is. The purpose of this technique is to examine
which variables have what amount of variance in common.
Many statistical methods are used to study the relation between independent and
dependent variables. Factor analysis is different; it is used to study the patterns of
relationship among many dependent variables, with the goal of discovering something
about the nature of the independent variables that affect them, even though those
independent variables were not measured directly. Thus answers obtained by factor
analysis are necessarily more hypothetical and tentative than is true when independent
variables are observed directly.
A factor analysis usually begins with a correlation matrix. It can also use covariances. Without getting deeply into the mathematics, we can say that factor analysis
attempts to express each variable as the sum of common and unique portions. The common
portions of all the variables are by definition fully explained by the common factors, and
the unique portions are ideally perfectly uncorrelated with each other. The degree to which
a given data set fits this condition can be judged from an analysis of what is usually called
the "residual correlation matrix".
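The split into common and unique portions can be stated as an equation. The following is a sketch of the common factor model for orthogonal factors and standardized variables; the symbols are assumed here for illustration: $x_i$ is the i-th observed variable, $F_1, \ldots, F_m$ the common factors, $a_{ij}$ the loading of variable i on factor j, and $u_i$ the unique portion.

```latex
x_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + u_i,
\qquad i = 1, \ldots, p

\operatorname{Var}(x_i) = \underbrace{a_{i1}^2 + a_{i2}^2 + \cdots + a_{im}^2}_{\text{common portion } (h_i^2)}
+ \underbrace{\operatorname{Var}(u_i)}_{\text{unique portion}} = 1

\operatorname{Corr}(x_i, x_k) = \sum_{j=1}^{m} a_{ij}a_{kj},
\qquad i \ne k, \quad \text{when } \operatorname{Cov}(u_i, u_k) = 0
```

The residual correlation matrix is then the difference between the observed correlations and the correlations reproduced from the loadings in the last line.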
A typical factor analysis suggests answers to four major questions:
1. How many different factors are needed to explain the pattern of relationships
among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?


Uses of Factor Analysis

The main applications of factor analytic techniques are: (1) to reduce the number
of variables and (2) to detect structure in the relationships between variables, that is to
classify variables. Therefore, factor analysis is applied as a data reduction or structure
detection method. If a scientist has a table of data and he suspects that these data are
interrelated in a complex fashion, then factor analysis may be used to untangle the linear
relationships into their separate patterns.
Factor analysis may be employed to discover the basic structure of a domain. As a
case in point, a scientist may want to uncover the primary independent lines or
dimensions--such as size, leadership, and age--of variation in group characteristics and
behavior. Data collected on a large sample of groups and factor analyzed can help disclose
this structure.
It can also be used to group interdependent variables into descriptive categories,
such as ideology, revolution, liberal voting, and authoritarianism.
A scientist often wishes to develop a scale on which individuals, groups, or nations
can be rated and compared. A problem in developing a scale is to weight the
characteristics being combined. Factor analysis offers a solution by dividing the
characteristics into independent sources of variation (factors). Each factor then represents
a scale based on the empirical relationships among the characteristics.
The technique can be used to transform data to meet the assumptions of other
techniques. For instance, application of the multiple regression technique assumes that
predictors are statistically unrelated. If the predictor variables are correlated in violation
of the assumption, factor analysis can be employed to reduce them to a smaller set of
uncorrelated factor scores. The scores may be used in the regression analysis in place of
the original variables, with the knowledge that the meaningful variation in the original
data has not been lost.
Major Steps in Factor Analysis
1. The first step in factor analysis is the preparation of data matrix, which has two modes
-- entity mode, which represents cases (observations) arranged as rows and the variable
mode, which represents the variables arranged as column. After data matrix,
correlation matrix of variables is prepared.
2. The second step in this analysis is the extraction of common factors that can
adequately explain the observed correlation among the variables. There are several
methods of extraction such as: Maximum Likelihood, Least Square, Alpha Factoring,
Image Factoring, and Principal Component Analysis. The main purpose of extraction
is to know whether a small number of factors can account for the correlation among a
much larger number of variables.
3. There are several criteria to determine the number of initial factors to be extracted by
PC. Notable among them are the Scree Test and the eigenvalue greater than or equal to one
criterion as suggested by Kaiser (1960). In the present analysis, we have applied both
the criteria. Both methods provide the same number of factors.


4. The initially extracted factors are rarely interpretable. In order to get the meaningful
results from the initially extracted common factors, the next step is the rotation of
these factors. The purpose of rotation is to achieve the simplest possible factor
structure. Method of rotation can not improve the degree of fit between the data and
factor structure. It makes the results interpretable. There are several methods of
rotation. In orthogonal rotation, three methods: Quartimax, Varimax, and Equimax, are
applied, while in oblique rotation, two methods: Reference Axes, and Primary pattern
matrix, are used. According to Harman (1968), the varimax solution seems to be the
best parsimonious analytical solution.
5. Lastly, for interpretation and analysis of factors, variables with the highest factor loadings
(weights) are taken into account.
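The statement in step 4 that rotation makes the results interpretable without improving the fit can be checked numerically for two factors. In the sketch below the loadings are hypothetical and the rotation angle is chosen by hand rather than by the varimax criterion; the point is only that an orthogonal rotation leaves every communality unchanged.

```python
import math


def rotate(loadings, angle_degrees):
    """Orthogonally rotate two-factor loadings by the given angle."""
    c = math.cos(math.radians(angle_degrees))
    s = math.sin(math.radians(angle_degrees))
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def communalities(loadings):
    """Sum of squared loadings of each variable over both factors."""
    return [a * a + b * b for a, b in loadings]


# Hypothetical unrotated loadings of four variables on factors I and II.
unrotated = [(0.70, -0.40), (0.60, -0.50), (0.60, 0.50), (0.50, 0.60)]
rotated = rotate(unrotated, -35.0)   # angle picked by hand for illustration

h2_before = communalities(unrotated)
h2_after = communalities(rotated)
```

After rotation the first variable loads almost entirely on factor I and the third mainly on factor II, while `h2_before` and `h2_after` agree up to rounding error.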
How many Factors to Extract?
Note that as we extract consecutive factors, they account for less and less
variability. The decision of when to stop extracting factors basically depends on when
there is only very little "random" variability left. Kaiser's criterion of eigenvalues
greater than 1 can be adopted for identification of factors. This criterion, proposed by
Kaiser (1960), is probably the one most widely used. Another method is the scree test
first proposed by Cattell (1966). We can plot the eigenvalues shown above in a simple
line plot.
Cattell suggests finding the place where the smooth decrease of eigenvalues
appears to level off to the right of the plot. According to this criterion, we would
probably retain 2 or 3 factors in our example.
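Kaiser's criterion and the variance shares used in a scree plot are easy to compute once the eigenvalues are available. The sketch below uses hypothetical eigenvalues for six variables and applies the strict "greater than 1" version of the rule; for a correlation matrix the eigenvalues sum to the number of variables.

```python
def kaiser_retained(eigenvalues):
    """Eigenvalues that pass the 'greater than 1' rule."""
    return [ev for ev in eigenvalues if ev > 1.0]


def variance_explained(eigenvalues):
    """Percentage and cumulative percentage of variance per factor."""
    total = sum(eigenvalues)
    percentages = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for pct in percentages:
        running += pct
        cumulative.append(running)
    return percentages, cumulative


# Hypothetical eigenvalues for six variables, in decreasing order.
eigenvalues = [2.8, 1.5, 0.7, 0.5, 0.3, 0.2]
retained = kaiser_retained(eigenvalues)
percentages, cumulative = variance_explained(eigenvalues)
```

Plotting `eigenvalues` against the factor number gives the scree plot, and the length of `retained` is the number of factors kept under Kaiser's rule.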

Terminology of factor analysis

1. Common factor: a factor on which two or more variables load.
2. Common variance: variance in a variable shared with common factors. Factor analysis
assumes that a variable's variance is composed of three components: common, specific and
error.
3. Communality: the proportion of a variable's variance explained by a factor structure. It
is denoted by h².
4. Complex variable: a variable which loads on two or more factors.
5. Eigenvalue: the variance in a set of variables explained by a factor or component,
denoted by lambda. The eigenvalue is the sum of squared values in the column of a factor
matrix.
6. Factor loading: a term used to refer to factor pattern coefficients or structure
coefficients.
7. Factor scores: linear combinations of variables that are used to estimate the cases'
scores on the factors or components. Least squares estimates of factor scores are the most
commonly used.
8. Parsimony principle: When two or more theories explain the data equally well, select
the simplest theory. Factor analysis application: If a two-factor and a three-factor model
explain about the same amount of variance, interpret the two-factor model.
9. Unique variance: that variance of a variable that is not explained by common factors.
Unique variance is composed of specific and error variance.
10. Varimax rotation: an orthogonal rotation criterion which maximizes the variance of
the squared elements in the columns of a factor matrix. Varimax is the most common
rotational criterion.
Explanation of Method by Example
Factor analysis begins with the construction of a new set of variables based on the
relationships in the correlation matrix. While this can be done in a number of ways, the
most frequently used approach is Principal Component Analysis. This method transforms
a set of variables into a new set of composite variables or principal components that are not
correlated with each other. These linear combinations of variables, called factors,
account for the variance in the data as a whole. The best combination makes up the first
principal component and is the first factor. The second principal component is defined as
the best linear combination of variables for explaining the variance not accounted for by
the first factor. In turn, there may be a third, fourth, and kth component, each being the
best linear combination of variables not accounted for by the previous factors. The process
continues until all the variance is accounted for, but as a practical matter it is
usually stopped after a small number of factors have been extracted.
The output of PCA of a hypothetical case is shown in the following table.
[Table: un-rotated and rotated factor loadings of each variable on Factors I and II, with eigenvalue, percentage of variance and cumulative percentage for each factor.]
The values in this table are correlation coefficients between the factor and the variable. For
instance 0.70 is the r between variable A and Factor I. These correlations are called

loadings. Eigenvalues are the sums of squared factor loadings. For example, the eigenvalue
for factor I is $0.70^2 + 0.60^2 + 0.50^2 + 0.60^2 + 0.60^2$. When divided by the number of
variables, an eigenvalue yields an estimate of the amount of total variance explained by
the factor. Communalities ($h^2$) measure the variance in each variable that is explained by
the two factors. A communality is the sum of squares of the factor loadings of all the factors for a variable. For
instance, with variable A, the communality is $0.70^2 + (-0.40)^2 = 0.65$, indicating that 65 per cent of
the variance in variable A is statistically explained in terms of factor I and factor II.
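The arithmetic above can be checked with a short Python sketch. It uses only the loadings quoted in the text (the five factor I terms and the two loadings of variable A); the function names are our own.

```python
def eigenvalue(loadings_on_factor):
    """Sum of squared loadings of all variables on one factor."""
    return sum(l ** 2 for l in loadings_on_factor)

def communality(loadings_of_variable):
    """Sum of squared loadings of one variable on all factors."""
    return sum(l ** 2 for l in loadings_of_variable)

# Factor I terms exactly as quoted in the text
factor_one_terms = [0.70, 0.60, 0.50, 0.60, 0.60]
factor_one_sum = eigenvalue(factor_one_terms)   # 0.49 + 0.36 + 0.25 + 0.36 + 0.36

# Variable A loads 0.70 on factor I and -0.40 on factor II
h2_variable_a = communality([0.70, -0.40])      # 0.49 + 0.16
```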
Un-rotated factor loadings are difficult to interpret and often do not provide meaningful
results. What one would like to find is some pattern in which factor I is heavily
loaded on some variables and factor II on others. Such a condition would suggest rather
pure constructs underlying each factor. One attempts to secure this less ambiguous
condition between factors and variables by rotation. This procedure can be conducted
through an orthogonal method. Rotated factor loadings are given in the table. This shows that
the measurements from six variables may be summarized by two underlying factors.
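For two factors, an orthogonal rotation is just a turn of the factor axes through some angle, so the idea can be sketched with a simple search over angles. The following is a minimal illustration, not the algorithm used by statistical packages: it tries a grid of angles and keeps the one that maximizes the varimax criterion (the variance of the squared loadings in each column, summed over columns). Only the loadings of variable A come from the text; the loadings of B to F are hypothetical.

```python
import math

def rotate(loadings, theta):
    """Rotate two-factor loadings through the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over the two columns of the variance of the squared loadings."""
    total = 0.0
    for col in range(2):
        squares = [row[col] ** 2 for row in loadings]
        mean = sum(squares) / len(squares)
        total += sum((x - mean) ** 2 for x in squares) / len(squares)
    return total

def best_rotation_angle(loadings, steps=900):
    """Grid search over [0, 90) degrees; the criterion repeats every 90 degrees."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))

# Variable A is taken from the text; B to F are hypothetical values.
unrotated = [(0.70, -0.40), (0.60, -0.50), (0.60, -0.35),
             (0.50, 0.50), (0.60, 0.50), (0.60, 0.60)]

theta = best_rotation_angle(unrotated)
rotated = rotate(unrotated, theta)
```

Because the rotation is orthogonal, each variable's communality is the same before and after rotation; only the distribution of the loadings across the two factors changes.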
The interpretation of factor loadings is largely subjective. There is no way to
calculate the meanings of factors; they are what one sees in them. For this reason, factor
analysis is largely used for exploration. One can detect patterns in latent variables,
discover new concepts and reduce data.
Analysis of variance (ANOVA) is a special case of the regression model and is generally used to
analyse data collected through experimentation. Multivariate analysis of variance
(MANOVA) examines the relationship between several dependent variables and one or more
independent variables. Whereas ANOVA assesses the differences between groups on a single
dependent variable, MANOVA examines the dependence relationship between a set of variables
across a set of groups. It is a technique which determines the effects of independent
categorical variables on multiple continuous dependent variables. It is usually used to compare
several groups with respect to multiple continuous variables. The main distinction between
MANOVA and ANOVA is that several dependent variables are considered in MANOVA.
Classification of MANOVA
1. One-way MANOVA: it is similar to the one-way ANOVA. It analyses the
variance between one independent variable and multiple dependent variables.
2. Two-way MANOVA
Assumptions of MANOVA
1. Normal Distribution
The dependent variables should be normally distributed within groups. Overall, the F test
is robust to non-normality, if the non-normality is caused by skewness rather than by
outliers. Tests for outliers should be run before performing a MANOVA, and outliers
should be transformed or removed.
2. Linearity
It assumes that there are linear relationships among all pairs of dependent variables, all
pairs of covariates, and all dependent variable-covariate pairs in each cell.


3. Homogeneity of Variances
Homogeneity of variances assumes that the dependent variables exhibit equal levels of
variance across the range of predictor variables. Homoscedasticity can be examined
graphically or by means of a number of statistical tests.
4. Homogeneity of Variances and Covariances
In multivariate designs, with multiple dependent measures, the homogeneity of variances
assumption described earlier also applies. However, since there are multiple dependent
variables, it is also required that their intercorrelations (covariances) are homogeneous
across the cells of the design.
5. Multicollinearity and Singularity
When correlations among dependent variables are high, problems of multicollinearity and
singularity exist. Multicollinearity exists when the relationship between pairs of variables is
high (r > .90). Singularity exists when a variable is redundant, that is, when it is a
combination of two or more of the other variables.
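A quick screen for multicollinearity can be sketched as follows. This is a minimal illustration: the scores are hypothetical, the function names are our own, and the r > .90 cut-off is the one quoted above. The sketch computes Pearson's r for every pair of dependent variables and lists the pairs that exceed the cut-off.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_multicollinear(variables, threshold=0.90):
    """Return (name1, name2, r) for every pair with |r| above the threshold."""
    flagged = []
    for (n1, v1), (n2, v2) in combinations(variables.items(), 2):
        r = pearson_r(v1, v2)
        if abs(r) > threshold:
            flagged.append((n1, n2, round(r, 3)))
    return flagged

# Hypothetical scores for six respondents (illustration only)
scores = {
    "attitude": [12, 15, 11, 18, 14, 16],
    "feelings": [13, 15, 12, 19, 15, 17],
    "exposure": [3, 9, 4, 2, 8, 5],
}

flagged_pairs = flag_multicollinear(scores)
```

A flagged pair suggests that one of the two variables adds little information and could be dropped or combined with the other before running the MANOVA.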
A social scientist wished to compare those respondents who had lodged an organ donor
card with those who had not. Three hundred and eighty-eight new drivers completed a
questionnaire that measured their attitudes towards organ donation, their feelings about
organ donation and their previous exposure to the issue. It was hypothesized that individuals
who agreed to be donors would have more positive attitudes towards organ donation, more
positive feelings towards organ donation and greater previous exposure to the issues.
Therefore, the independent variable was whether a donor card had been signed and the
dependent variables were attitudes towards organ donation, feelings towards organ
donation and previous exposure to organ donation. Attitudes and feelings were measured on
traditional scales with a Likert scale response format. Exposure was measured in terms of
media exposure and personal experience. Conceptually and theoretically these dependent
variables were believed to be related and so MANOVA was the analysis of choice.
Complete data are available on www.johmwiley.com.au/highered/spssv

[SPSS output for the MANOVA. Only the table titles, row labels and footnotes listed below are recoverable; the numeric cell values are missing from this copy.]

Between-Subjects Factors: signed donor card (value labels).

Descriptive Statistics: standard deviations of exposure to donation issues, attitude towards organ donation and feelings towards organ donation, by signed donor card.

Box's Test of Equality of Covariance Matrices: Box's M. Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups. Design: Intercept + donor.

Multivariate Tests: Pillai's Trace, Wilks' Lambda, Hotelling's Trace and Roy's Largest Root, each listed for two effects, with hypothesis df and error df. Footnote a: Exact statistic.

Levene's Test of Equality of Error Variances: exposure to donation issues, attitude towards organ donation, feelings towards organ donation. Tests the null hypothesis that the error variance of the dependent variable is equal across groups. Design: Intercept + donor.

Tests of Between-Subjects Effects: Type III Sum of Squares for each dependent variable (exposure to donation issues, attitude towards organ donation, feelings towards organ donation). One row fragment survives: 1 331042.346 3984.772.
a. R Squared = .010 (Adjusted R Squared = .007)
b. R Squared = .010 (Adjusted R Squared = .008)
c. R Squared = .029 (Adjusted R Squared = .026)

Estimated Marginal Means: signed donor card, with Std. Error and 95% Confidence Interval (Lower Bound, Upper Bound) for exposure to donation issues, attitude towards organ donation and feelings towards organ donation.

Box's M tests the homogeneity of the variance-covariance matrices. We have
homogeneity of the variance-covariance matrices because this test is not significant at an
alpha level of 0.001.
The multivariate tests of significance test whether there are significant group
differences on a linear combination of the dependent variables. We notice that several
statistics are available. Pillai's Trace criterion is considered to have acceptable power and
to be the most robust statistic against violation of assumptions. Having obtained a
significant multivariate effect for donor, i.e., a significance of F less than 0.05, an
examination of the univariate F-test for each variable indicates which individual dependent
variables contribute to the significant multivariate effect.
We can conclude that a person's decision to act as a donor is significantly related
to their feelings towards organ donation. No significant main effects were found for the
other dependent measures.
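One of the multivariate statistics named above, Wilks' Lambda, can be computed by hand for a small case. The sketch below is an illustration under stated assumptions: it handles exactly two dependent variables, the scores are hypothetical (they are not the donor card data), and the function names are our own. Wilks' Lambda is the determinant of the within-groups sums-of-squares-and-cross-products (SSCP) matrix divided by the determinant of the total SSCP matrix; values near 0 indicate large group differences and values near 1 indicate little or none.

```python
def mean(values):
    return sum(values) / len(values)

def sscp(rows, centre):
    """2 x 2 sums-of-squares-and-cross-products matrix around `centre`."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        d = [row[0] - centre[0], row[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(groups):
    """Wilks' Lambda = det(within SSCP) / det(total SSCP) for two DVs."""
    all_rows = [row for rows in groups.values() for row in rows]
    grand = [mean([r[k] for r in all_rows]) for k in range(2)]
    total = sscp(all_rows, grand)
    within = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups.values():
        centre = [mean([r[k] for r in rows]) for k in range(2)]
        w = sscp(rows, centre)
        for i in range(2):
            for j in range(2):
                within[i][j] += w[i][j]
    return det2(within) / det2(total)

# Hypothetical (attitude, feelings) scores, illustration only
groups = {
    "signed":     [(16, 18), (14, 17), (17, 19), (15, 15), (18, 18)],
    "not signed": [(13, 12), (15, 14), (12, 13), (14, 11), (16, 13)],
}

lam = wilks_lambda(groups)
```

Statistical packages convert this statistic to an F value and report its significance; that step is not shown here.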



What are research reports and why would I write one?

1. A research report is the only concrete evidence of your research.
2. The quality of the research may be judged directly by the quality of the writing and
how well you convey the importance of your findings.
3. If you are submitting a research report for a class or to an organization, check for
specific requirements and guidelines before beginning to write your research report.
Types of Report
Academic overview
All vary slightly in their purpose & structure.
Writing & Editing Your Report: Writing the first draft for yourself
Where do you start writing: the introduction or elsewhere?

Reports are rarely written in linear order. The order for writing the final sections
may be Conclusions, Introduction and finally the Abstract. These are the sections
most likely to be read by readers.
For every 1000 readers who see your title, 100 may read the abstract, perhaps 10
will read some of the main report [conclusions, some results etc.], and at most 1 may
follow all the way through.
A middle section such as the methods/system design may be a good starting point.
Writing notes for the introduction, some background theory or a review of previous
studies may help you to clarify the focus of your report.

Writing specifications
Use 10 or 12 point font
The most acceptable fonts: Arial, Times New Roman (the old reliable), Verdana
Unacceptable fonts: Broadway, Brush Script, Chiller, Courier, Freestyle Script, Gigi, Old
English Text, Playbill, etc.
Your reports should be in pristine condition when they are turned in
No frayed edges or coffee stains (front OR back)
Section/point identification systems
It represents important choices made by the writer regarding:

The range of the material covered,


The relative importance of the sections in the report, and the relatedness of
information within sections.

It therefore plays a very important role in communicating meaning to the reader. The
report presents meaning and information in two complementary and equivalent ways:
- The meaning represented by the words, thought, research

- The meaning represented by the layout
Choosing the Layout System
The writer chooses one of the following two layout systems:

The decimal numbering, or

The number-letter.

Once a system is chosen, the writer must present this system consistently throughout the report.
The decimal numbering System
First level (of importance/generality) (also termed the A heading): 1.0
N.B. The 'point-zero' is not always used in decimal numbering.
Second level (also termed the B heading): 1.1
Third level (also termed the C heading): 1.1.1
Fourth level (also termed the D heading): 1.1.1.1
The writer must present this system consistently throughout the report.
The Indenting with decimal numbering System
This is generally used with indenting to structure the text in the following way.
It is possible for a reader to gain a strong indication of the relatedness, and relative
importance of the parts of the text as a result of this layout, even though no meaning from
the content is provided.

1.0 ________________________________
    1.1 _______________________________
    1.2 _______________________________
        1.2.1 ________________________
        1.2.2 ________________________ _______________ _______________
2.0 ________________________________
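The decimal numbering scheme is mechanical enough to generate automatically. The following Python sketch is an illustration only (the function name decimal_labels is our own): given the heading level of each section in order, it produces the labels used in the layout above, including the 'point-zero' on first-level headings.

```python
def decimal_labels(levels):
    """Return decimal heading labels for a sequence of heading levels.

    `levels` lists the depth of each heading in document order
    (1 = first level, 2 = second level, and so on).
    """
    counters = []
    labels = []
    for level in levels:
        counters = counters[:level]          # drop counters of deeper levels
        while len(counters) < level:         # open any new deeper level at 0
            counters.append(0)
        counters[level - 1] += 1
        if level == 1:
            labels.append(f"{counters[0]}.0")
        else:
            labels.append(".".join(str(c) for c in counters))
    return labels

# The outline shown above: 1.0, 1.1, 1.2, 1.2.1, 1.2.2, 2.0
outline_levels = [1, 2, 2, 3, 3, 1]
labels = decimal_labels(outline_levels)

# Indent each label by four spaces per level below the first
lines = [" " * 4 * (lvl - 1) + lab for lvl, lab in zip(outline_levels, labels)]
```

Most word processors offer the same behaviour through multilevel list styles, which is usually the safer way to keep numbering consistent in a long report.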


Number - letter
(still encountered, but becoming less commonly used)
First level (of importance/generality) (A heading): I
Second level (B heading): A
Third level (C heading): 1
Fourth level (D heading): (a)
Fifth level (E heading): (i)

The Indenting with Number - letter System

I ____________________________________
    A ________________________________
    B _____________________________
        1 __________________________
        2 __________________________
            (a) ___________________
            (b) ___________________

II ___________________________________
    A ________________________________

Writing your report: Developing the structure

Abstract Executive Summary.
Introduction Background Literature review.
Results (Analysis).
Conclusion (Recommendations).
How to Write a Report. The Title Page
General guidelines:
- There are four main pieces of information that have to be included on the title page:
- The report title;
- The name of the person, company, or organization for whom the report has been
prepared;
- The name of the author and the company or university which originated the report;
- The date the report was completed.

- A title page might also include a contract number, a security classification, or a copy
number, depending on the nature of the report you are writing.

Table of contents
Your report should include a table of contents if longer than about 5-10 pages.

This allows the reader to quickly find the relevant section.

While many word processing packages will automatically generate a table of

contents, it is wise to check that the page numbers are correct before printing and
before submission.

Reports: Writing the abstract

The abstract is of utmost importance, for it is read by 10 to 500 times more people than
hear or read the entire article. It should not be a mere recital of the subjects covered. The
abstract should be a condensation and concentration of the essential information in the
report.
Although the abstract is the first part of the report to be read, you should write it last, after
writing the introduction.

Needs to stand alone i.e. be complete in itself,

Allows the reader to gain a very brief but complete overview of your entire report
from aims to conclusions

Does not act as an introduction

Typically 100-200 words; one paragraph

Highly succinct but must be cohesive, i.e. flow well

Most important section along with conclusions

Beginning an introduction
Introductions serve as a place for you:
To catch your reader's attention,

To place your project in its context (whether that context is background
information or your purpose in writing is up to you).

Consider the following examples; they represent two extremes that writers can take in
beginning their introductions.
What is the problem with this sentence as an opening to an introduction?
The universe has been expanding from the very moment that it was born.
One of the ways that the sentence above might be rewritten is:
Recent studies suggest that the universe will continue expanding forever and may pick
up speed over time.


The rewritten sentence establishes the report's context within recent studies concerning
a specific theory related to the expansion of the universe. This context is much more specific
than that of the original sentence.
The introduction may carry out the following roles

Gives some background to the study & sets the scene for the report.

Explains connections with any previous work & gap

Explains background theory [longer reports?]

Explains your aims/hypotheses clearly

Explains briefly what you will do & why the study is being carried out
Explains briefly how the report is structured [signposting]
Main Body
Text with headings and sub-headings.
In general, the body of the research report will include three distinct sections:
A section on theories, models, and your own hypothesis
A section in which you discuss the materials and methods you used in your research
A section in which you present and interpret the results of your research

The headings should be self-explanatory. The main body of the report needs to be
clear, concise and follow a logical order.

Figures and tables must be referred to in the body of the text and need to have clear
captions. Label figures at the bottom and tables at the top in numerical order.

Each figure should be capable of being understood on its own using the caption as
the only reference.

Theories, Models and Your Own Hypothesis

This section can be very important, especially for:

Research articles,
Formal reports, or
scientific papers.

Inclusion of such theories and models directly affects:

The hypothesis that you propose and on which you base your research.
When you develop hypotheses, you predict what you will find after you conduct your
research. This prediction is based on existing theories, models, evidence, and logic.

In this section, you may need to:

Define and explain your hypothesis and the theories and models you used to
develop it.


Define and explain competing hypotheses, theories, and models, including their
strengths and weaknesses.

Compare and contrast the specific points where they agree or disagree.

The Literature Review: What others did to solve the problem

Review important relevant sources
Theoretical framework
Methodological or conceptual gaps/flaws of previous research
Leads to your research question/hypothesis

The following questions are good ones to work through: What do I expect this
experiment to reveal? Why?

How does my hypothesis directly answer the question posed by the problem?

How does the hypothesis fit in with other hypotheses or more general theory? How
will my work challenge or support the work of others?

What is the current theory to which it relates?

What are alternative views to this theory? What are the strengths and weaknesses of
those views?
On what literature did I or can I base my explanation?

Materials and Methods

All materials and methods sections should address the following questions:

How was the experiment designed?

On what subjects or materials was the experiment performed?

How were the subjects/materials prepared?

What machinery and equipment was used in the experiment?

What sequence of events did you follow as you handled the subjects/materials or as
you recorded data?


Results: Presenting data

All preceding sections of the report (Introduction, Materials and Methods, etc.) lead in
to the Results section of the report and all subsequent sections will consider what the
results mean (conclusion, recommendations, etc.).
How should I incorporate figures and tables into my report?
Figures and tables should help to simplify information, so you should consider using them
when words are not able to convey information as efficiently as a visual aid would be able to.
Consider using figures and tables when you need to decipher information or the analysis
of information, when you need to describe relationships among data that are not apparent
otherwise, and when you need to communicate purely visual aspects of a phenomenon or
Tables or lists are simple ways to organize the precise data points themselves in one-to-one relationships.
A graph is best at showing the trend or relationship between two dimensions, or the
distribution of data points in a certain dimension (e.g., time, space, or across studies).
A pie chart is best at showing the relative areas, volumes, or amounts into which a whole
(100%) has been divided.
Flow charts show the organization or relationships between discrete parts of a system. For
that reason they are often used in computer programming.

The most important general rule is that tables and figures should supplement rather
than simply repeat information in the report.

You should never include a table or figure simply for the sake of including it. This is
redundant and wastes your reader's time.

Additionally, all tables and figures should:

Be self-contained: they should make complete sense on their own without
reference to the text.
Be cited in the text: it will be very confusing to your audience to suddenly come
upon a table or figure that is not introduced somewhere in the text. They will not
have a context for understanding its relevance to your report.
Include a number such as Table 1 or Figure 10: this will help you to distinguish
multiple tables and figures from each other.

Include a concise title: it is a good idea to make the most important feature of the
data the title of the figure.

Use legends and clear, concise, descriptive titles for tables and figures.

Ensure all axes of graphs are labeled and that units are identified in all
tables and figures


Results & Discussion: Interpretation of Data

This section of the report is important because it demonstrates the meaning of your
research.
This section draws upon writing skills that other sections do not, because you
need to write persuasively as you convince readers that your interpretation
of the data is logical and correct.
As you develop your argument in this section, consider arranging your evidence in the
order that best highlights your main point, cite authorities that have come to similar
interpretations under similar circumstances, and consider the superiority of your
conclusions to opposing viewpoints.
For most research reports, the most certain part of your case will be your data, and
many research reports will develop along this outline:

Begin with a discussion of the data.

Move on to generalize about or analyze the data.

Consider how the data address the research problem or hypothesis outlined in the introduction.

Proceed from most general features of the data to more specific results

Discuss what can be inferred from the data as they relate to other research and
scientific concepts
Compare with other studies and draw conclusions based on your findings.

Refer back to the original hypotheses you were testing

Identify sources of error/limitation and any inadequacies of your techniques,
especially if your results are inadequate, negative, or not consistent with earlier
studies or with your own hypothesis.

Do not try to defend your research or minimize the seriousness of the limitation in
your interpretation; instead, focus on the limitation only as it affects the research
and try to account for it.

One way to sum up the results and discussion could be as follows:

What is already known on this topic
Low birth weight is associated with poor cognitive development
Few studies have examined this association across the full birthweight range in the
normal population.
What this study adds
Birth weight is significantly associated with cognitive ability at age 8 years
through adolescence, and into early adulthood, independent of social background

The associations between birth weight and cognitive function at ages 8, 11, and 15
are evident across the normal birthweight range (>2.5 kg) and so are not accounted
for exclusively by low birth weight
Birth weight is also associated with educational attainment, suggesting that the
association between birth weight and cognition may have functional implications

The conclusion is important because:
It is your last chance to convey the significance and meaning of your research to
your reader by concisely summarizing your findings and generalizing their
importance.

It is also a place to raise questions that remain unanswered and to discuss
ambiguous data.

The conclusions you draw are opinions, based on the evidence presented in the
body of your report, but because they are opinions you should not tell the reader
what to do or what action they should take.

Be sure that you use language that distinguishes conclusions from inferences.
Use phrases like 'This research demonstrates . . .' to present your conclusions and phrases
like 'This research suggests . . .' or 'This research implies . . .' to discuss implications.
Make sure that readers can tell your conclusions from the implications of those
conclusions, and do not claim too much for your research in discussing implications. You
can use phrases such as 'Under the following circumstances', 'In most instances', or 'In
these specific cases' to warn readers that they should not generalize your conclusions.
You might also raise unanswered questions and discuss ambiguous data in your
conclusion.
Raising questions or discussing ambiguous data does not mean that your own work is
incomplete or faulty; rather, it connects your research to the larger work of science and
parallels the introduction in which you also raised questions.
The following is an example taken from a text that evaluated hearing and speech
development following the implantation of a cochlear implant. The authors of 'Beginning
To Talk At 20 Months: Early Vocal Development In a Young Cochlear Implant
Recipient', published in the Journal of Speech, Language, and Hearing Research, titled their
conclusion 'Summary and Caution'. Using this title calls readers' attention to the
limitations of their research.
A recommendations section appears in a report when the results and conclusions indicate that
further work needs to be done or when you have considered several ways to resolve a problem or
improve a situation and want to determine which one is best.
This gives you another opportunity to demonstrate how your research fits within the larger
project of science.

It also demonstrates that you fully understand the importance and implications of your
research, as you suggest ways that it could continue to be developed.

Reference sections are important because:

Like the sections on the procedure you used to gather data, they allow other
researchers to build on or to duplicate your research.

Without references, readers will not be able to tell whether the information that
you present is credible, and they will not be able to find it for themselves.

Reference sections also allow you to refer to other researchers' work without
reviewing that work in detail. You can refer readers to your reference page for
more information.

It is best to compile your own reference list containing a variety of information. This will
save you from having to track down pieces of information you may have neglected to
make note of if they are specifically requested after you have filed a source, returned it to
the library, or misplaced it.
Information to include on your reference list
The reference list is placed at the end of the report.

It is arranged in alphabetical order of authors' surnames and chronologically for
each author.

The reference list includes only references cited in the text. The author's surname is
placed first, immediately followed by the year of publication. This date is often
placed in brackets.

The title of the publication appears after the date followed by place of publication,
then publisher (some sources say publisher first, then place of publication).

The important thing is to check for any special requirements or, if there are none,
to be consistent.

Information to include on your reference list

1. The Harvard (author-date) system is the one usually encountered in the sciences
and social sciences.
2. Notice that the titles of books, journals and other major works appear in italics (or
are underlined when handwritten), while the titles of articles and smaller works
which are found in larger works are placed in (usually single) quotation marks.
Harvard System: Examples
Begon, M., Harper, J.L. & Townsend, C.R. (1990). Ecology: Individuals, Populations and
Communities. Oxford: Blackwell Scientific Publications.
Hirschberger, P. & Bauer, T. (1994). The coprophagous insect fauna and its influence on
dung disappearance. Pedobiologia, 38, 375-384.
Holt, R.D. (1993). Ecology at the mesoscale: the influence of regional processes on local
communities. In R. E. Ricklefs & D. Schluter (Eds.), Species Diversity in Ecological
Communities. Chicago: University of Chicago Press, 77-88.
Crook, A. C. & Finn, J. (2002). STARS: Scientific Training by Assignment for Research
Students [online]. Available from: http://www.ucc.ie/research/stars [Accessed 16th
November 2004].
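The ordering and layout rules for a Harvard reference list can be sketched in Python. This is an illustration only: the field names (authors, year, title, place, publisher) and the function names are our own assumptions, and only the book pattern shown in the first example above is handled. Italics for titles cannot be represented in plain text and are left to the word processor.

```python
def format_book(ref):
    """Format a book reference: Authors (Year). Title. Place: Publisher."""
    authors = ref["authors"]
    if len(authors) == 1:
        names = authors[0]
    else:
        names = ", ".join(authors[:-1]) + " & " + authors[-1]
    return f"{names} ({ref['year']}). {ref['title']}. {ref['place']}: {ref['publisher']}."

def sort_references(refs):
    """Alphabetical by first author's surname, then chronological."""
    return sorted(refs, key=lambda r: (r["authors"][0].lower(), r["year"]))

book = {
    "authors": ["Begon, M.", "Harper, J.L.", "Townsend, C.R."],
    "year": 1990,
    "title": "Ecology: Individuals, Populations and Communities",
    "place": "Oxford",
    "publisher": "Blackwell Scientific Publications",
}

# Author and year of the four examples above, in the order they were listed
listed = [
    {"authors": ["Begon, M."], "year": 1990},
    {"authors": ["Hirschberger, P."], "year": 1994},
    {"authors": ["Holt, R.D."], "year": 1993},
    {"authors": ["Crook, A. C."], "year": 2002},
]

formatted = format_book(book)
ordered = [r["authors"][0] for r in sort_references(listed)]
```

Reference managers apply the same rules automatically; the sketch is only meant to make the rules explicit.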


When the exact words of a writer are quoted, they must be reproduced exactly in
all respects: wording, spelling, punctuation, capitalisation and paragraphing.

Quotations should be carefully selected and sparingly used, as too many quotations
can lead to a poorly integrated argument.

Use of a direct quotation is justified when:

--changes, through paraphrasing, may cause misinterpretation
--the original words are so concisely and convincingly expressed that they
cannot be improved upon

--a major argument needs to be documented as evidence

--the student wishes to comment upon, refute or analyse the ideas expressed
in another source.
The intention of the original text must not be altered.

Short quotations (up to 4 lines):

Incorporate the quotation into the sentence or paragraph, without disrupting the flow of
the text, using the same spacing as in the rest of the text. The source of the quotation is
either acknowledged in a footnote or in the text. Use single quotation marks at the
beginning and end of the quotation:
EXAMPLE: The Style Manual (1978, p. 46) states that 'the modern tendency to use single
quotation marks rather than double is recommended.'


Long quotations (more than thirty words):

Indent the quotation from the remainder of the text.
Do not use quotation marks.
Some writers recommend the use of smaller type or italics to set off indented quotations.
Introduce the quotation appropriately, and cite the source at the end of the quotation as
you would in your text
Irrelevancies within very long quotations can be omitted by the use of an ellipsis which is
indicated by three spaced dots (. . .).
Nowadays it is not usual to place an ellipsis at the beginning or the end of a quotation
which is intended to stand alone or forms part of one of your own sentences.

You should place information in an Appendix that is relevant to your subject but
needs to be kept separate from the main body of the report to avoid interrupting the
line of development of the report.

An appendix should include only one set of data, but additional appendices are
acceptable if you need to include several sets of data that do not belong in the same
appendix.

Label each appendix with a letter, A, B, C, and so on.

Do not place the appendices in order of their importance to you, but rather in the
order in which you referred to them in your report. You should also paginate each
appendix separately so that the first page of each appendix you include begins with
page 1.

Defining Your Terms

A good general rule to follow is to define all terms that you are not completely sure
your audience will understand the same way you do.

Words to focus on are those key to your research, those relatively new or unfamiliar,
and those that readers could not look up for themselves in a standard dictionary.

o You should take your audience into consideration when determining when to include
jargon in your writing.
o Consider their vocabulary and whether they will be familiar with a word or phrase
before you use it.


o Do not simply choose to include jargon without taking your audience into
consideration. Jargon can come between your writing and your reader, and readers
who do not understand jargon may see your use of jargon as impolite.

Writing Numbers, Measurements, and Equations

Writing Numbers
1) Spell out numbers between zero and ten and use figures for all other numbers.
Examples: two cats, 11 materials, one attempt, 20,000 residents
Unfortunately, there are a number of exceptions to this general guideline. Make sure that
you are as familiar with the exceptions as you are with the rule.
Use figures in the following cases:
Mathematical operations: Raised to the power of 4
Units of measurement: 6 feet
Ages: 9 years old
Times: 1 pm
Dates: June 8, 2001
Page numbers: Page 4
Percentages: 2 percent
All numbers that begin a sentence should be spelled out.
Example: Seven times the tests failed.
2) When you use two or more numbers in the same section of writing, use figures. This
makes them easier to see and compare.
Example: We are requesting funding to purchase 25 pumps, 15 fans, and 5 ducts.
If none of the numbers included is larger than 10, then spell out all of the numbers.
We are requesting funding to purchase nine pumps, six fans, and three ducts.
3) Form the plural of a number by adding 's'.
Example: All of the 15s tested within acceptable limits.
4) Use hyphens when you write fractions, a sequence or range of values, and between a
number and a unit of measure when they modify a noun.

Sequence or range of values:
Pages 167-170
Pages 224-35
Number and unit of measure used to modify a noun:
20-pound dog
20-ounce pitcher
5) Use decimals instead of fractions, whenever possible. Decimals are easier to type and
to read. Write both decimals and fractions as figures.
6) A zero is always placed before the decimal point for numbers less than one.
7) Spell out the shorter of two numbers that appear consecutively in a phrase.
Not: 4 6-inch nails; But: 4 six-inch nails
Not: 20 1,000-piece puzzles; But: Twenty 1,000-piece puzzles
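
The first two guidelines above can be sketched in code. This is only a minimal illustration, assuming whole, non-negative numbers and ignoring the listed exceptions (units, dates, page numbers, and so on); the function names write_number and write_series are hypothetical and do not come from the text.

```python
# Illustrative helpers for guidelines 1 and 2; all names are hypothetical.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]


def write_number(n: int) -> str:
    """Guideline 1: spell out zero through ten, use figures otherwise."""
    if 0 <= n <= 10:
        return WORDS[n]
    return f"{n:,}"  # figures with a thousands separator, e.g. 20,000


def write_series(numbers: list[int]) -> list[str]:
    """Guideline 2: use figures for a series unless no number exceeds ten."""
    if all(0 <= n <= 10 for n in numbers):
        return [WORDS[n] for n in numbers]
    return [f"{n:,}" for n in numbers]


print(write_number(2), "cats")           # two cats
print(write_number(20000), "residents")  # 20,000 residents
print(write_series([25, 15, 5]))         # ['25', '15', '5']
print(write_series([9, 6, 3]))           # ['nine', 'six', 'three']
```

A sentence-initial number would still need to be spelled out by hand, since that rule depends on position in the sentence rather than on the value.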
Writing Measurements
9) Separate the figure from the name of the measure with a space, but do not separate % or
$ from the figure with a space.
Examples: 3.4 hr, $22, 50%
10) Do not use a period after the abbreviation of a measure.
Example: 3.4 hr
11) Use figures for years and decades and do not abbreviate them.
Not: 30s; But: The 1930s
Not: The fifties; But: The 1950s
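
The spacing and abbreviation guidelines for measurements can be sketched as follows. This is a minimal illustration under stated assumptions: the function name write_measure is hypothetical, and only the unit symbols % and $ are treated specially, as in the examples above.

```python
def write_measure(value: float, unit: str) -> str:
    """Format a figure with its unit (hypothetical helper, not from the text)."""
    text = f"{value}"  # Python keeps the leading zero for values below one
    if unit == "%":
        return f"{text}%"  # no space between the figure and %
    if unit == "$":
        return f"${text}"  # no space between $ and the figure
    # Otherwise: one space before the unit, and no period after the abbreviation.
    return f"{text} {unit.rstrip('.')}"


print(write_measure(3.4, "hr"))   # 3.4 hr
print(write_measure(22, "$"))     # $22
print(write_measure(50, "%"))     # 50%
print(write_measure(0.5, "hr."))  # 0.5 hr
```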

Writing Equations
12) Place equations on a separate line and number them consecutively with a number in
parentheses at the right margin.
13) Do not use punctuation after the equation, but punctuate words to introduce equations
as you would words forming any other sentence.
14) Refer to an equation in the body of the text by its number in parentheses.
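
For writers who typeset their reports in LaTeX, the three equation guidelines above might look like the following. This is a minimal sketch: the formula (the sample mean) and the label eq:mean are chosen here only for illustration, and the amsmath package is assumed for \eqref.

```latex
% Minimal sketch; the formula and the label eq:mean are illustrative assumptions.
\documentclass{article}
\usepackage{amsmath} % provides \eqref
\begin{document}
The sample mean of $n$ observations is defined as
\begin{equation}
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
  \label{eq:mean}
\end{equation}
where $x_i$ denotes the $i$-th observation. Substituting the data into
\eqref{eq:mean} gives the average.
\end{document}
```

The equation environment places the formula on its own line and numbers it in parentheses at the right margin, and \eqref prints the same number in parentheses in the running text.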
Reports: Writing the Results
The results section presents the results [data] from the experiment or model.
Do not just include figures and tables; ensure that your text provides:
a commentary guiding the reader through the figures and tables: their location and a
summary of their purpose in the report, e.g. "Figure 3.2 shows how the incidence of malaria
increases when ..."; and a statement highlighting key trends.
Check that figures are clearly presented [see slide 10].

Remember that the reader will look at the figures and tables only if directed to do
so in the text.
Editing Your Report with a Critical Friend


Murphy's Law of Errata Detection: "The very first person to see your mistake is always
the last person you want to know about it."
Reading your own work, you do not always spot errors, as you may read your draft in the
way you want it to sound.
Work with a critical friend: someone who gives honest advice, perhaps from outside your field.
As soon as you sit in front of the paper with your critical friend, your perspective may
change from that of the writer to that of a potential reader of the paper.
Do not over-edit the first part you write.
Try using the editing questions provided by the Purdue University Online Writing Lab.
Style & Vocabulary
Formal & objective
No 'I' or 'You'; no contracted forms, e.g. can't
Avoidance of direct questions & standard negatives
No colloquial English: lots, stuff, things
Formal verbs chosen, e.g. investigated (from Latin/Greek) rather than look into (not two-part verbs)
Precise & often abstract vocabulary
A Few Grammar Points

The passive, e.g. "Tests were made", is frequently used; do not overuse it.

Longer sentences with clauses are often used; do not make them too long or complex.
Make claims carefully using the modals can, may, might, etc., e.g. "This compound may
cause an increased incidence of ..."
Noun groups are often used to convey information concisely (nominalisation).

It's and its: Do not use "it's", which is a spoken contraction of "it is"/"it has"; "its" is a
possessive adjective.

To Sum up
Use and evaluate all the data you report and do not be discouraged if your results differ
from published studies or from what you expected

Justify all tables and figures by discussing their content and labeling them clearly
Be creative in your presentation of data, your analysis, and your interpretation of
data - play around with different variations before completing your report
Do not force conclusions from your data or fudge data by omitting that which does
not support pre-conceived conclusions
Make sure all calculations and analyses are relevant to the hypotheses you are
testing and the overall objectives of the study
Justify your ideas and conclusions with data, facts, and background literature and
with sound reasoning


Keep the different sections of the report discrete, i.e. methods in the
methods section, results in the results section, and leave discussion and
interpretation of those results for the discussion section
Plan your writing: organize your thoughts and data, and sketch the report before
actually writing. This will help maximize your time efficiency and lead to a
concise, well-structured report


14. List of Participants : QIP Short Term Course

Research Methodology & Quantitative Techniques with Software Applications
During: 31.05.2010 to 04.06.2010



Name & Address

Ms. Meenakshi
Lecturer in Architecture Department
Guru Nanak Dev University
AMRITSAR 143 005
Mob.: 9855288667



Ms. Simrandeep Kaur

Asstt. Professor in DOMS
Adesh Institute of Engineering & Technology
Sadiq Road,
FARIDKOT 151 203
Mob.: 9872960160



Mr. Nitin Girdharwal

Sr. Lecturer in Management Studies Department
Krishna Institute of Engineering and Technology
Ghaziabad Meerut Highway (NH 58)
GHAZIABAD 201 206 (U.P.)
Mob.: 9997123173



Ms. Neha Sharma

Lecturer in Management Studies Department
Dehradun Institute of Technology
Mussoorie Diversion Road,
Mob.: 9927007669



Mr. Ankur Bhatnagar

Lecturer in M.B.A Department
Shriram Institute of Management & Technology
7th KM milestone , Kartipur
Mob.: 9319681348



Mr. Jitendra Singh Chauhan

Lecturer in Faculty of Management Department
Graphic Era University, Dehradun
Clement Town
Mobile : 9760355007






Mr. Sachin Ghai
HOD in Faculty of Management Department
Graphic Era University, Dehradun
Clement Town
Mobile : 9837333163



Mr. Mukesh Sehrawat

Asstt. Prof. in Management Department
Accurate Institute of Management & Technology
Plot No. 49, Knowledge Park III,
GREATER NOIDA 201 308 (U.P.)
Mobile : 9278936484



Ms. Deepshikha
Sr. Lecturer in Management Studies Department
Dehradun Institute of Technology, Dehradun
Mussoorie Diversion Road, Village Makkawala
Mobile : 9837135331



Ms. Rhuta Mehta

Asstt. Prof. in Faculty of Engineering (M.B.A.) Deptt.
Marwadi Education Foundations Group of Institutions
Rajkot Morbi Highway,
Mobile : 9428701164



Ms. Nutan Singh

Lecturer in Management Studies Department
Institute of Management Studies, Dehradun
C/o Mr. Vipin Kumar Rathore
105, RTO Campus, Rajpur Road
Mobile : 9415175057



Ms. Aishwarya Mehata

Lecturer in Management Studies Department
Graphic Era University
566, Bell Road, Clement Town
Mobile : 9837289686



Mr. Uday Khanna

Lecturer in Management Studies Department
Graphic Era University, Dehradun
566/6, Bell Road, Clement Town,
Mobile : 9319701501






Mr. Sarvendu Tiwari
Lecturer in M.B.A. Department
Skyline Institute of Engineering & Technology
Plot No. 03, Knowledge Park II,
Mobile : 9953585288



Mr. Sudhir Kumar Singh

Assoc. Prof. in Mechanical Engineering Department
Skyline Institute of Engineering & Technology
Plot No. 03, Knowledge Park II,
Mobile : 9818547947



Ms. Anusha Tayal

Lecturer in Management Studies Department
Krishna Institute of Engineering & Technology
13th Km Stone, Ghaziabad Meerut Road,
GHAZIABAD 201 206 (U.P.)
Mobile : 9711882062



Ms. Priya Grover

Asstt. Prof. in Management Studies Department
Uttaranchal Institute of Management, Dehradun
Arcadia Grant, P.O. Chandanwari, Premnagar
Mobile : 9897214320



Mr. Gagan Deep Sharma

Sr. Lecturer in Management Studies Department
Baba Banda Singh Bahadur Engineering College
Mobile : 9915233734



Mr. Saurabh Joshi

Assoc. Prof. in Management Studies Department
Uttaranchal Institute of Management
Arcadia Grant,
P.O. Chandanwari, Premnagar
Mobile : 9410121204
Mr. Shubhagata Roy
Sr. Lecturer in Management Studies Department
School of Management of Sciences
P. O. BACCHAON 221 011 (U.P.)
Village Khushipur, Varanasi
Mobile : 9336902571








Mohammad Zohair
Lecturer in Management Studies Department
School of Management of Sciences
Village Khushipur, Varanasi
P. O. BACCHAON 221 011 (U.P.)
Mobile : 9450537393



Mr. Rajiv Sindwani

Lecturer in Management Studies Department
YMCA University of Science & Technology
Sector 6,
Mobile : 9810432422



Ms. Suchitra Singh

Asstt. Professor in Computer Applications
Ajay Kumar Garg Engineering College
27 Km. Stone Delhi-Hapur Bypass Road
P.O.Adhyatmik Nagar
GHAZIABAD-201 009 (U.P.)
Mobile : 9971400700



Mrs. Suman Dhawan

Lecturer in Training and Technical Education Deptt.
Meera Bai Institute of Technology
Maharani Bagh
NEW DELHI 110 065
Mobile: 9310856940



Mr. Rajesh Gupta

Sr. Lecturer in Mechanical Engineering Deptt.
Dr. B.R.A., N.I.T. Jalandhar
Mobile: 9463135222



Ms. Ritu
Lecturer in M.B.A Department
S. D. Institute of Professional Studies
Mandi Samiti Road,
Mobile: 9897751213



Ms. Shalu
Lecturer in Management Studies Department
Dr. K.N. Modi Institute of Engg. & Technology
Opposite Satish Park, Kapda Mill, Modinagar
Mobile: 9808463239






Ms. Anshu Bhatia
Asstt. Professor in M.B.A Department
S.D. College of Management Studies
Bhopa Road,
Mobile: 9411245531



Mr. Nirdosh Kumar Agarwal

Professor in Management Department
Trident ET Group of Institutions
NH 58, Delhi Meerut Road. Morta,
GHAZIABAD 201 001 (U.P.)
Mobile: 9650595488



Mr. Brijesh Singh

Associate Professor in Mechanical Engg. Deptt.
Aryabhatt College of Engg. & Technology
13th KM stone, Baghpat Meerut Road,
Vill.- Daula,
BAGHPAT 250 601 (U.P.)
Mobile: 9359765640



Ms. Anju Arora

Asstt. Professor in M.B.A Department
S.D. College of Management Studies
Bhopa Road,
Mobile: 9411245531



Mr. Partha Pratim Saikia

Sr. Lecturer in Management Department
Uttaranchal Institute of Management
Arcadia Grant 10, Chandanwari Prem Nagar
Mobile: 9756821919



Mr. K. G. Arora
Professor in Management Department
S. D. Institute of Professional Studies
Mandi Samiti Road,
Mobile: 9411276769



Mr. Vijay Verma

Lecturer in Management Department
Bhilai Institute of Technology
Bhilai House,
Mobile: 9301108031






Ms. Shailja Agarwal
Lecturer in PGDM Department
Jaipuria Institute of Management
Gomti Nagar,
LUCKNOW 226 010 (U.P.)
Mobile: 9451223075


Mr. Kuldeep
Lecturer in Management Studies Department
Disha Institute of Science & Technology
Distt. Bijnor
Mobile : 9927069856
Mr. Pankaj Kumar
Lecturer in Management Studies Department (MBA)
Trident ET Group of Institutions
Delhi Meerut Road,
Mobile : 9911056373



Ms. Garima Kathuria

Lecturer in Management Studies Department
Vishveshwarya Institute of Engineering & Technology
Distt. Gautambudhnagar
Mobile : 9761377465



Ms. Veeralakshmi B.
Asstt. Prof. in MBA Department
College of Engineering Roorkee School of Mgmt.
7th Km. Roorkee Haridwar Road,
Mob.: 9997239017



Mr. Veer P. Gangwar

Sr. Lecturer in Management Department
Amrapali Institute of Management and Computers
Shiksha Nagar, Lamachaur
Mobile: 9719546523









Mr. R. Vishal Kumar
Sr. Lecturer in Management Studies Department
Happy Valley Business School
Velanthavalam Road,
Veerappanur, Pichanur Post
Mob.: 9944919392



Mr. Mrinal Verma

Lecturer in MBA Department
Krishna Institute of Engineering and Technology
13 KM Milestone
Muradnagar, Delhi Meerut Road,
GHAZIABAD 201 206 (U.P.)
Mob.: 9811671621



Mr. Sachindra Kumar Gupta

Sr. Lecturer in Management Department
Disha Institute of Science & Technology
Mehtaur Road,
Mob.: 9927069856



Ms. Reena Rani

Lecturer in Applied Science Department
Aryabhatt College of Engineering and Technology
Baghpat Meerut Road,
Vill.- Daula Baghpat
BAGHPAT 250 601 (U.P.)
Mob.: 9548991113



Ms. Anjali Chauhan

Sr. Lecturer in Anthropology Department
Sri Jai Narain P.G. College Station Road,
Mob.: 9451505103



Mr. Rajesh M. Holmukhe

Asstt. Prof. in Electrical Engineering Department
Bharat Vidyapeeth University College of Engineering
Katraj, Pune Satara Road,
PUNE 411 043 (M.S.)
Mobile : 9011064868


