RM Book IIT-R 109
Coordinated by
D. K. Nauriyal
S. P. Singh
ACKNOWLEDGEMENTS
In organizing this course, we drew support from several people and organisations. First of
all, we thank Prof. Vinod Kumar, Co-ordinator, QIP, IIT Roorkee, who has extended his
complete support towards the smooth conduct of this programme. Our sincere thanks are
also due to Prof. S. P. Singh, the Head, Department of Humanities & Social Sciences for
his encouragement and support.
We owe profound gratitude to Prof. S. S. Dhillon, Punjab School of Economics, GNDU,
Amritsar, Dr. R. K. Deswal, National Institute of Technology (NIT) Kurukshetra, and, from our own institute, Dr. Rashmi Gaur, who kindly accepted our request to deliver lectures in the programme.
We must also admit that without a very high level of receptivity, active involvement and
cooperation from the participants, the programme would have never accomplished its
objectives. They made the sessions lively and vibrant through their active and thoughtful
interaction. They deserve all the appreciation and kudos.
The infrastructural support and secretarial assistance received from the office of the
Coordinator, QIP IIT Roorkee has been quite significant. It would not be fair on our part
if we fail to acknowledge the timely and kind help lent by the QIP staff.
Last but not least, we thank all those who have directly or indirectly contributed
towards the success of this programme.
Coordinators
D. K. Nauriyal
S. P. Singh
CONTENTS

Sl. No.  Title                                   Speaker          Page
1        An Overview of Research Methodology     D. K. Nauriyal   1-7
2        Research Design                         D. K. Nauriyal   8-10
3                                                D. K. Nauriyal   11-16
4                                                S. P. Singh      17-19
5        Sampling Methods                        S. P. Singh      20-22
6                                                S. P. Singh      23-36
7                                                S. S. Dhillon    37-40
8                                                S. P. Singh      41-49
9                                                S. P. Singh      50-58
10                                               S. P. Singh      59-63
11       DEA Techniques                          S. P. Singh      64-76
12                                               S. P. Singh      77-93
13                                               D. K. Nauriyal   94-109
14       List of Participants                                     110-116
Objectivity of Investigator
  Unbiased
  Procedural integrity
  Accurate reporting
Accuracy of Measurement
  Valid and reliable
  Meaningful and useful
  Appropriate design (sample, execution)
Open-minded to Findings
  Willing to refute expectations
  Acknowledge limitations
Objectives of Research
1. To achieve new insights into a phenomenon (Exploratory/Formulative Research).
2. To portray accurately the characteristics of a particular individual, situation or a group (Descriptive Research).
3. To determine the frequency with which something occurs or with which it is
associated with something else (Diagnostic Research).
4. To test a hypothesis of a causal relationship between variables (Hypothesis Testing
Research).
Why Research?
Taking the challenge to solve an unsolved problem.
Desire to get intellectual satisfaction of doing some creative work.
Desire to get a research degree.
Desire to move up the career ladder in the academic institutions.
Desire to be of service to the society.
Significance of Research
1. Research inculcates scientific and inductive thinking and it promotes the development
of logical habits of thinking.
2. Research provides the basis for nearly all govt. policies in our economic system.
3. It helps to solve various operational and planning problems of business and industry.
Market research / operations research / demand forecasting
Conceptualizing the Research
Curiosity and intuition play an important role
[Flow diagram: the eight steps of the research process]
(1) Identifying the Research Problem/Opportunity
(2) Determine the Research Design
(3) Determine Data Collection Method
(4) Design Data Collection Forms
(5) Design Sample and Collect Data
(6) Analyzing the Data
(7) Preparing and Presenting the Report
(8) Follow-up

Conducting & Reading Research (Baumgartner et al.)
Care and attention in writing it up in such a way that readers have confidence in the
integrity of the work.
Research reputations are established by repeatedly carrying out interesting and
worthwhile work that is consistently methodologically strong and accurately reported.
Inductive Approach:
Inductive reasoning works the other way, moving from specific observations to broader
generalizations and theories. This is a "bottom up" approach. In inductive reasoning, we begin with specific observations, detect patterns and regularities, formulate tentative hypotheses, and finally develop general conclusions or theories.
Types of Research
1. Descriptive (Ex-post facto research) Vs. Analytical (Critical Evaluation of the
material).
2. Applied (Action) Vs. Fundamental (Basic or Pure).
3. Quantitative (Inferential/experimental/ simulation) Vs. Qualitative.
4. Conceptual (abstract idea or theory) Vs. Empirical (Experience or observations based
on data)
5. Longitudinal Research (Over a time period such as clinical or diagnostic research) Vs.
Laboratory or Simulation Research.
Qualitative Research
By contrast, a study based upon a qualitative process of inquiry has the goal of
understanding a social or human problem from multiple perspectives. Qualitative research
is conducted in a natural setting and involves a process of building a complex and holistic
picture of the phenomenon of interest. In qualitative research, a hypothesis is not
needed to begin research.
Quantitative Research
Qualitative Research
In qualitative research, however, it is thought that the researcher can learn the most by
participating and/or being immersed in a research situation.
Characteristics of quantitative and qualitative research (Quantitative vs. Qualitative):
Objective vs. Subjective
Research questions: How many? Strength of association? vs. Research questions: What?
"Hard" science vs. "Soft" science
Literature review must be done early in the study vs. Literature review may be done as the study progresses or afterwards
Tests theory vs. Develops theory
One reality: focus is concise and narrow vs. Multiple realities: focus is complex and broad
Facts are value-free and unbiased vs. Facts are value-laden and biased
Reduction, control, precision vs. Discovery, description, understanding, shared interpretation
Measurable vs. Interpretive
Mechanistic: parts equal the whole vs. Organismic: whole is greater than the parts
Reports statistical analysis; basic element of analysis is numbers vs. Reports rich narrative, individual interpretation; basic element of analysis is words/ideas
Researcher is separate vs. Researcher is part of process
Subjects vs. Participants
Context free vs. Context dependent
Hypotheses vs. Research questions
Reasoning is logistic and deductive vs. Reasoning is dialectic and inductive
Establishes relationships, causation vs. Describes meaning, discovery
Uses instruments vs. Uses communications and observation
Strives for generalization vs. Strives for uniqueness
Generalizations leading to prediction, explanation and understanding vs. Patterns and theories developed for understanding
Highly controlled setting: experimental setting (outcome oriented) vs. Flexible approach: natural setting (process oriented)
Sample size: n vs. Sample size is not a concern; seeks an "information rich" sample
Which one to choose?
Choose a more quantitative method when most of the following conditions apply:
There is no ambiguity about the concepts being measured, and only one
way to measure each concept.
You are studying the trends in weather in the town where you live. There aren't
many variables: temperature ranges, wind speed, rainfall, barometric pressure, and
perhaps a few others. Most of the variables are measured mechanically, and a lot of
historical data exists. You wouldn't even consider doing qualitative research on
this.
Research Methodology: Consideration of the logic behind the research methods we use.
Research Process
Series of actions or steps necessary to effectively carry out research and the desired
sequencing of these steps.
A. Formulating the Research Problem:
Understanding the problem thoroughly, and
Rephrasing the same into meaningful terms from an analytical point of view.
Time available
Cost factor
E. Determining the Sample Design: Probability / Non-Probability.
Probability:
Simple Random Sampling
Systematic Sampling
Cluster/Area Sampling
Non-Probability / Purposive / Deliberate Sampling:
Convenience Sampling
Judgment Sampling
Quota Sampling
F. Collecting the Data: Collection of only appropriate data.
Primary Data: By observation, through personal interviews, telephone interviews and by mailing of questionnaires.
Secondary Data
G. Analysis of Data:
1. Computation of statistics, viz., mean, median, mode, standard deviation, coefficient of variation, coefficient of skewness, etc.
2. Fitting a regression equation for estimating the dependent variable as a function of a set of independent variables.
3. Performing correlation analysis.
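A minimal Python sketch of this analysis step; the data values are invented purely for illustration, and numpy/scipy supply the statistics named above:

    # Step G sketch: descriptive statistics for an invented data set.
    from collections import Counter
    import numpy as np
    from scipy import stats

    y = np.array([12.0, 15.0, 11.0, 18.0, 14.0, 16.0, 13.0, 15.0])

    mean = y.mean()
    median = np.median(y)
    mode = Counter(y.tolist()).most_common(1)[0][0]  # most frequent value
    sd = y.std(ddof=1)                               # sample standard deviation
    cv = sd / mean                                   # coefficient of variation
    sk = stats.skew(y)                               # coefficient of skewness

    print(f"mean={mean:.2f}, median={median:.2f}, mode={mode}, "
          f"sd={sd:.2f}, cv={cv:.3f}, skewness={sk:.3f}")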
H. Hypothesis Testing
I. Interpretation of Results
J. Report Writing
2. RESEARCH DESIGN
It is a conceptual structure within which research is conducted; it constitutes the blueprint for the collection, measurement and analysis of data.
The research design addresses the following questions:
1. What is the study about?
2. Why is the study being done?
3. Where will the study be carried out?
4. What type of data is required?
5. Where can the required data be found?
6. What periods of time will the study include?
7. What will be the sample design?
8. What techniques of data collection will be used?
9. How will the data be analyzed?
10. What will be the style of the report?
Thus the Essentials of the Research Design are:
The design is an activity and time-based plan.
The design is always based on the research question.
The design guides the selection of the sources and types of information.
The design is a framework for specifying the relationships among the study's variables.
The design outlines procedures for every research activity.
Research Design comprises: the overall design, the sampling design, the statistical design, the observational design and the operational design. These vary with the type of study:

Type of Study: Exploratory/Formulative vs. Descriptive/Diagnostic
Overall design: Flexible (for considering different aspects of the problem) vs. Rigid design
Sampling design: Non-probability (purposive, judgment or quota sampling) vs. Probability (random sampling)
Statistical design: No pre-planned design for analysis vs. Pre-planned design for analysis
Observational design: Unstructured instruments for collection of data vs. Structured instruments for collection of data
Operational design: No fixed decisions about the operational procedures vs. Advance decisions about operational procedures
Secondary data: information that has previously been gathered by someone other
than the researcher and/or for some other purpose than the research project at hand
Types of Observations
Structured (Descriptive)
Unstructured (Exploratory)
Participant (Anthropological)
Non-Participant (Political forecasts)
Disguised Participation (Presence of the observer is hidden)
In-Depth Techniques:
Focus groups Interviews
Interviews: Personal, Telephonic, Focused, Non-Directive
Projective Techniques
Primary Data Collection Methods
1. Observation
Human or physical observation includes mystery shopping, cameras in store,
watching children handle toys, etc.
Ethnography: watching behaviors in the consumer's natural setting
Mechanical or electronic observation using Nielsen people meter, eye tracking
devices, or using software to track behaviors on the Web, etc.
Limitations:
A. Expensive method in terms of time and money
B. Limited Information.
C. Interference of unforeseen factors
Merits of interview method
Merits of the questionnaire method:
Low cost
Free from the bias of the interviewer.
Enough thinking space.
Respondents who are otherwise inaccessible can be reached.
Sample could be larger.
Demerits:
Low rate of return.
Only the educated and cooperating people could be approached.
Difficulty in modifying the approach once the questionnaire is made.
Possibility of ambiguous replies/omissions of questions.
This method is likely to be the slowest of all.
Important considerations while framing a questionnaire
A) General form (closed-ended/open-ended).
Question sequence:
The first few questions are important because they are likely to influence the attitude and the desired cooperation of the respondent.
Disadvantages:
Respondent may say too much or too little
Provide incomplete or unintelligible answers
Flexibility in responses is difficult to code and analyze; interpretations of answers may vary
Too much variance in response
Expensive and time-consuming
Closed-ended Questions
Closed-ended questions provide respondents with a list of responses from which to
choose. Alternatively, closed-ended questions can provide multiple choices for the
respondent to accept or reject
9) Risk of collecting incomplete and wrong information is relatively more under the
questionnaire method particularly when people are unable to understand questions
properly.
10) Observation method can also be used along with the schedules but it is not possible
with the questionnaire.
Such values can be plotted on a graph; the resulting curve is called a historigram of the time series.
The fluctuations may be due to:
(a) Causes which operate over a long time period
(b) Causes which operate over a short time period
These causes are segregated and this process is called analysis of time series.
The variations in the value of the variable can be analysed into the following three main components:
1. The basic or long-period trend
2. Short-period or periodic changes
3. Irregular fluctuations
Measurement of Trend:
a. Free hand smoothing
b. Sectional Average: the whole series is divided into a suitable number of sections and the average of each section is found. These averages are plotted against the mid-year of each section, and a free-hand smooth curve is drawn through these points. The curve represents the trend.
c. Method of Moving Average: fluctuations due to cyclical changes are eliminated by averaging the values of the variable over a specified number of successive years; the number of years over which the values are averaged depends upon the average length of the cycle found in the series. All these means are plotted and successive points are joined by straight-line segments. The resulting polygonal graph indicates the trend of the given time series (a sketch follows after this list).
d. Method of Least Squares: time deviations are taken from the mid-year of the time series and the straight line is fitted by minimising the sum of squared errors. The trend ordinates computed from the fitted equation, plotted against the corresponding years, give the line of best fit in the least-squares sense.
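A short Python sketch of methods (c) and (d) above, using an invented yearly series; the three-year cycle length is an assumption made only for the example:

    import numpy as np

    years = np.arange(2001, 2013)
    values = np.array([20, 24, 22, 27, 30, 28, 33, 36, 34, 39, 42, 40], float)

    # (c) Method of moving average: 3-yearly centred means smooth out
    # short-period fluctuations (cycle length assumed to be 3 years).
    window = 3
    moving_avg = np.convolve(values, np.ones(window) / window, mode="valid")

    # (d) Method of least squares: time deviations t are taken from the
    # mid-point of the series and the line y = a + b*t is fitted.
    t = years - years.mean()
    b = (t * (values - values.mean())).sum() / (t ** 2).sum()
    a = values.mean()              # because the deviations t sum to zero

    print("3-year moving averages:", np.round(moving_avg, 1))
    print(f"least-squares trend: y = {a:.2f} + {b:.3f} t")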
Correlation: The relation between two or more characteristics of a population or a sample can be studied with the help of a statistical method called correlation. If two quantities vary in a related manner, so that a movement (increase or decrease) in one tends to be accompanied by a movement in the same or in the opposite direction in the other, the two quantities are said to be correlated. Correlation may be positive or negative, and perfect or imperfect.
Methods: 1. Graphic Method; 2. Scatter Diagram; 3. Coefficient of Correlation
The coefficient of correlation is a numerical measure of correlation.
1. Karl Pearson's coefficient of correlation, also called product moment correlation
2. Spearman's rank correlation
A test of significance is done for both measures by using the t-test.
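Both coefficients, with the p-values from their t-based significance tests reported alongside, can be computed as in this sketch; the paired data are invented:

    from scipy import stats

    x = [2, 4, 5, 7, 8, 10, 11, 13]
    y = [1, 3, 6, 6, 9, 10, 13, 14]

    r, p_r = stats.pearsonr(x, y)        # product moment correlation
    rho, p_rho = stats.spearmanr(x, y)   # rank correlation

    print(f"Pearson r    = {r:.3f} (p = {p_r:.4f})")
    print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")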
u is a random error term. To fit the straight line we apply the method of least squares: to estimate a and b we minimize the sum of squares of the ui, which gives the normal equations of the regression.
Coefficient of Determination: In order to measure the strength of correlation between the dependent and independent variable(s), we calculate the statistic called the coefficient of determination (r² in the two-variable case, R² in the multiple-variable case).
This measure is based on two levels of variation:
(i) the variation of the Y values around the fitted regression line, Σ(Yi − Ŷi)², and
(ii) the variation of the Y values around their own mean Ȳ, Σ(Yi − Ȳ)².
Then r²/R² = 1 − Σe² / Σ(Yi − Ȳ)², where e = Y − Ŷ.
The value of r²/R² shows the goodness of fit of the regression equation(s): the higher the r²/R², the closer the fit; the lower the r²/R², the poorer the fit.
5. SAMPLING METHODS
Sampling is a process of selecting a subset of members of the population of a study and collecting data about their attributes. Based on the data of the sample, the analyst draws inferences about the population.
Advantages of Sampling:
(1) Less time taken to collect data
(2) Less cost for data collection
(3) Physical impossibility of complete enumeration necessitates sampling
(4) More accuracy of data collected due to its limited size.
Sampling Frame: The complete list of all the members/units of the population from
which each sampling unit is selected is known as sampling frame. It should be free from
error.
Sampling Methods:
Sampling methods are divided into two broad categories:
(a) Probability Sampling
(b) Non-Probability Sampling
Probability Sampling: In probability sampling each unit of the population has a known probability of being selected as a unit of the sample. This probability varies from one method of probability sampling to another.
In non-probability sampling there may be instances that certain units of population will
have zero probability of selection, because judgment biases and convenience of the
interviewer are considered to be the criteria for the selection of sample units.
Probability Sampling Methods:
(1) Simple Random sampling
(2) Systematic Sampling
(3) Stratified Sampling
(4) Cluster Sampling
(5) Multistage Sampling
(1) Simple Random Sampling: Let N = number of units in the population and n = number of units in the sample, where n < N.
There are two ways of performing SRS: (a) with replacement and (b) without replacement.
SRS with replacement: each unit of the population has an equal probability of being selected at every draw:
Probability of selection = 1/N
Selection is done by using random number tables.
SRS without replacement: the probability of a unit being selected varies from draw to draw:
Probability at the first draw = 1/N
Probability at the second draw = 1/(N − 1)
...
Probability at the nth draw = 1/(N − (n − 1))
Units are selected from the population based on the respective probabilities using Monte Carlo simulation.
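A minimal sketch of SRS with and without replacement; the population labels, sample size and random seed are arbitrary assumptions for the example:

    import numpy as np

    rng = np.random.default_rng(seed=42)
    population = np.arange(1, 101)   # N = 100 units
    n = 10                           # sample size

    srswr = rng.choice(population, size=n, replace=True)    # prob. 1/N at every draw
    srswor = rng.choice(population, size=n, replace=False)  # prob. changes per draw

    print("SRS with replacement   :", srswr)
    print("SRS without replacement:", srswor)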
(2) Systematic Sampling: This is a special kind of random sampling in which the selection of the first unit of the sample from the population is based on randomisation. The remaining units of the sample are selected from the population at a fixed interval, where the sampling interval width is I = N/n.
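The same idea as a short sketch: a random start within the first interval, then every I-th unit (the values of N and n are invented):

    import random

    N, n = 100, 10
    I = N // n                        # sampling interval width I = N/n
    start = random.randint(0, I - 1)  # random start in the first interval
    sample = list(range(start, N, I))
    print("systematic sample (unit indices):", sample)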
(3) Stratified Sampling: It is an improvement over simple random sampling and systematic sampling. In this sampling the population is divided into a specified set of strata such that members within a stratum have similar attributes while members of different strata have dissimilar attributes.
(a) Proportional Stratified Sampling: the same proportion of units is selected from each stratum; used when there is not much difference (low variance) in attributes within each stratum.
Let N = population size, Ni = size of stratum i, ni = size of the sub-sample from stratum i, and let the total sample be n = n1 + n2 + ... + nk. Then
n1/N1 = n2/N2 = ... = nk/Nk = n/N, so that n1 = n·N1/N, ..., nk = n·Nk/N.
(b) Disproportional Stratified Sampling: different proportions of units are selected from each stratum; used when attributes differ and within-stratum variance is high. The stratum with more variance gets proportionally more sampling units than a stratum with less variance:
ni = qi·si·n / Σqi·si, where si is the standard deviation of stratum i and qi = Ni/N.
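A numerical sketch of both allocation rules; the stratum sizes and standard deviations are invented for illustration:

    # Proportional and disproportional allocation over k = 3 strata.
    N_i = [500, 300, 200]     # stratum sizes, N = 1000
    s_i = [4.0, 10.0, 6.0]    # stratum standard deviations (invented)
    n = 100                   # total sample size
    N = sum(N_i)

    # (a) Proportional: n_i = n * N_i / N
    proportional = [round(n * Ni / N) for Ni in N_i]

    # (b) Disproportional: n_i = n * q_i * s_i / sum(q_i * s_i), q_i = N_i / N
    q_s = [(Ni / N) * si for Ni, si in zip(N_i, s_i)]
    disproportional = [round(n * qs / sum(q_s)) for qs in q_s]

    print("proportional   :", proportional)     # [50, 30, 20]
    print("disproportional:", disproportional)  # high-variance strata get more units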
(4) Cluster Sampling: The population is divided into different clusters. Members within a cluster are dissimilar in terms of their attributes, but different clusters are similar to each other. Each cluster can therefore be treated as a small population which possesses all the attributes of the population. Any one cluster is selected, and all units of that cluster constitute the sample.
(5) Multistage Sampling: In a large-scale survey covering the entire nation, the size of the sampling frame will be very large. In such a study the multistage sampling technique is used.
The entire country is divided into regions.
Stage 1: Different states of the country are sampled from each region using stratified
sampling. Here it is assumed that the states within the region are similar and the
regions are dissimilar.
Stage 2: Then cluster sampling can be used within each selected state by treating the different districts of the state as its clusters.
Stage 3: In each selected district a random sampling may be used to select the
proportional number of units from it.
II. Non-Probability Sampling Methods:
1. Convenience Sampling
2. Judgment Sampling (also called Purposive Sampling)
3. Quota Sampling
4. Snowball Sampling
4. Personal Interviews are a way to get in-depth and comprehensive information. They
involve one person interviewing another person for personal or detailed information.
Personal interviews are very expensive because of the one-to-one nature of the interview.
Typically, an interviewer will ask questions from a written questionnaire and record the
answers verbatim. Personal interviews (because of their expense) are generally used only
when subjects are not likely to respond to other survey methods.
5. Telephone Surveys are the fastest method of gathering information from a relatively
large sample (100-400 respondents). The interviewer follows a prepared script that is
essentially the same as a written questionnaire. However, unlike a mail survey, the
telephone survey allows the opportunity for some opinion probing. Telephone surveys
generally last less than ten minutes.
6. Mail Surveys are a cost-effective method of gathering information. They are ideal for
large sample sizes, or when the sample comes from a wide geographic area. They cost a
little less than telephone interviews, however, they take over twice as long to complete
(eight to twelve weeks). Because there is no interviewer, there is no possibility of
interviewer bias. The main disadvantage is the inability to probe respondents for more
detailed information.
7. Email and Internet surveys are relatively new and little is known about the effect of
sampling bias in Internet surveys. While it is clearly the most cost effective and fastest
method of distributing a survey, the demographic profile of the Internet user does not
represent the general population, although this is changing. Before doing an e-mail or
Internet survey, carefully consider the effect that this bias might have on the results.
What is a Survey?
A survey is a method of collecting information directly from people about their
ideas, feelings, health, plans, beliefs, and social, educational and financial background. It
usually takes the form of self-administered questionnaires and interviews. Self-administered questionnaires can be completed by hand or by computer. Interviews take place in person or on the telephone.
Why do we conduct surveys?
There are at least three good reasons for conducting surveys.
1. A policy needs to be set or a programme must be planned.
Surveys are conducted to meet policy or programme needs. For instance, a company is considering providing day care for the children of its working staff. How many have young children? How many would use the agency's services?
2. You want to evaluate the effectiveness of programmes to change people's knowledge, attitudes, health or welfare.
3. You are a researcher and a survey is used to assist you. You conduct survey research when the information you need should come directly from people. The data they provide are descriptions of attitudes, values, habits and background characteristics such as age, health, education and income.
Types of Survey
1. Cross-sectional Survey
With this design, data are collected at a single point in time. Think of a cross-sectional survey as a snapshot of a group of people or organizations. Cross-sectional surveys have several advantages.
2. Longitudinal Surveys
With longitudinal surveys, data are collected over time. At least three variations are particularly useful.
(a) Trend: a trend design means surveying a particular group over time; for example, studying a group of rural people's socio-economic conditions over time.
(b) Cohort: in a cohort survey, you study a particular group over time, but the people in the group may vary.
(c) Panel: a panel survey means collecting data from the same sample over time.
The Survey Content
Better: In your opinion, in the next two years, how is the relationship between the Left parties and the Congress likely to change?
parties and congress likely to change?
Much improvement
Some improvement
Some worsening
Much worsening
Impossible to predict
6. Use caution when asking personal questions: another source of bias may result from questions that may intimidate the respondent.
How much do you earn each year? Are you single or divorced? How do you feel about
your teacher, counselor or doctor? When personal information is essential to the
survey, you can ask questions in the least emotionally charged way if you provide
categories of responses.
Example:
Poor: What was your annual income last year? Rs. ......
Better: In which category does your annual income last year fit best?
Below Rs. 100000
Rs. 100000 - Rs. 150000
Rs. 150000 - Rs. 200000
Rs. 200000 - Rs. 250000
Rs. 250000 and above
7. Each question should have just one thought: do not use questions to which a respondent's truthful answer could be both yes and no at the same time.
Survey Design
Here, we discuss options and provide suggestions on how to design and conduct successful survey research. There are seven steps in survey research.
Setting Objectives
The first step in any survey is deciding the objectives. If your objectives are unclear, the
results will probably be unclear. Some typical objectives may be like these:
Identify a larger number of eligible respondents than you need, in case you do not get the sample size you need.
Send reminders to complete mailed surveys and make repeat phone calls.
Part-II
DESIGNING QUESTIONNAIRE/SCHEDULE
Introduction
Questionnaires are widely used for data collection in survey research. The questionnaire is a fairly reliable tool for gathering data from large, diverse and scattered groups. A questionnaire is a list of questions sent to a number of persons for their answers, which obtains standardised results that can be tabulated and treated statistically.
Sometimes a distinction is made between 'questionnaire' and 'schedule' or 'interview guide'. Generally a questionnaire is mailed to the respondents, who are to give answers in a manner specified either in the covering letter or in the main questionnaire itself. On the other hand, a schedule refers to a form of questionnaire which is generally filled in by the investigator himself/herself: he/she sits with the informant face to face and fills in the form. The schedule is more effective than the mailed questionnaire because in most cases respondents do not respond properly to a mailed questionnaire owing to ignorance, illiteracy or lack of awareness and interest, while in the case of a schedule the investigator has face-to-face contact with the respondents and is able to get reliable information from them.
Types of Questionnaire
Questionnaire may be broadly of two types, viz. Structured and unstructured
questionnaire. According to P.V. Young, structured questionnaires are those which pose
definite, concrete, and pre-determined questions, i.e.; they are prepared in advance and not
constructed on the spot during the question period. Additional questions may be asked
only when some clarification is required. Answers to these questions are normally given
with high precision; e.g., age, sex, marital status, number of children, nationality, etc., are automatically structured. Structured questionnaires may further be grouped into closed-form or open-ended questionnaires. A closed-form questionnaire is one in which questions are set in such a manner that only a few alternative answers are left; the informant has only a few choices in answering them. For example: do you think poverty and unemployment have increased in India after economic reform? Yes / No / Can't say.
In the above-stated question, the respondent has to select one out of three alternatives.
The open-ended questionnaire, on the other hand, is one in which the respondent has full freedom to answer in his own style and diction of language, at whatever length, and from his own perception. He has enough freedom while providing answers to open questions.
The unstructured questionnaire contains a set of questions which are not structured
in advance and which may be adjusted according to the need of question period. The
unstructured questionnaire is used mainly for conducting interviews. Flexibility is its
chief merit.
A widespread criticism of closed questionnaires is that they force people to choose among offered alternatives instead of answering in their own words. Closed questions spell out the response options; they are more specific than open questions and therefore more apt to communicate the same frame of reference to all respondents.
Let us take a hypothetical case: we want to identify the most important problem facing the country. In an open-closed experiment, people are asked what they think is the most
important problem facing the nation. In a closed-ended framework, we set five alternatives, namely, unemployment, economic disparity, crime, poor governance and inflation. One open-ended question is also set. In response to the open-ended question, the respondents may identify power shortage as the most vital problem of the country.
questions are also relevant, especially when the researcher has inadequate knowledge
about the various problems faced by the country.
Construction of Questionnaire
A. General Considerations
1. Most problems with questionnaire analysis can be traced back to the design phase
of the project. Well-defined goals are the best way to assure a good questionnaire
design. When the goals of a study can be expressed in a few clear and concise
sentences, the design of the questionnaire becomes considerably easier. The
questionnaire is developed to directly address the goals of the study.
2. One of the best ways to clarify your study goals is to decide how you intend to
use the information. This sounds obvious, but many researchers neglect this task.
3. Be sure to commit the study goals to writing. Whenever you are unsure of a
question, refer to the study goals and a solution will become clear. Ask only
questions that directly address the study goals.
4. KISS - keep it short and simple. If you present a 20-page questionnaire most potential respondents will give up in horror before even starting. One of the most effective methods of maximizing response is to shorten the questionnaire.
5. If your survey runs over a few pages, try to eliminate questions. Many people have
difficulty knowing which questions could be eliminated. For the elimination
round, read each question and ask, "How am I going to use this information?" If
the information will be used in a decision-making process, then keep the
question... it's important. If not, throw it out.
6. Involve other experts and relevant decision-makers in the questionnaire design
process.
7. Formulate a plan for doing the statistical analysis during the design stage of the
project. Know how every question will be analyzed and be prepared to handle
missing data. If you cannot specify how you intend to analyze a question or use
the information, do not use it in the survey.
8. Provide a well-written cover page. The respondent's next impression comes from the cover letter (for a mailed questionnaire). It provides your best chance to persuade the respondent to complete the survey.
9. Give your questionnaire a title that is short and meaningful to the respondents. A
questionnaire with a title is generally perceived to be more credible than one
without.
10. Begin with a few non-threatening and interesting items. If the first items are too
threatening or "boring", there is little chance that the person will complete the
questionnaire.
11. Leave adequate space for respondents to make comments. Leaving space for
comments will provide valuable information not captured by the response
categories.
12. Place the most important items in the first half of the questionnaire. Respondents
often send back partially completed questionnaires.
30
inform, help, think, live, say, enough, start, and so on
C. Question Content
A questionnaire designer has to ensure that all the necessary items are duly
incorporated in the questionnaire. The investigator may take the help of standard
checklists to see that all the required items are included in the questionnaire. The checklist
can also be prepared by the investigator himself. Check lists may differ depending upon
the aims and objectives of the survey research. Some of the important items of checklist
of content are as follows:
1. Is this question necessary? Just how will it be used?
2. Are several questions needed on the subject matter of this one question?
3. Do the respondents have the information necessary to answer the question?
4. Does the question need to be more concrete, more specific and more closely related to the respondents' experience?
5. Is the question content sufficiently general and free from spurious concreteness and specificity?
6. Is the question content biased or loaded in one direction, without accompanying questions to balance the emphasis?
D. Question Types
Researchers use three basic types of questions: multiple choice, numeric open end and
text open end (sometimes called "verbatim"). Examples of each kind of question follow:
Multiple Choice Question
1. Where do you live? (followed by a numbered list of response options to choose from)
A rating-scale question offers graded options: Strongly agree / Agree / Somewhat disagree / Strongly disagree
8. Does not use emotionally loaded or vaguely defined words. Quantifying adjectives
(e.g., most, least, majority) are frequently used in questions. It is important to understand
that these adjectives mean different things to different people.
E. Question Sequence
Items of a questionnaire should be grouped into logically coherent sections.
Grouping questions that are similar will make the questionnaire easier to complete, and
the respondent will feel more comfortable. Questions that use the same response formats,
or those that cover a specific topic, should appear together. Each question should follow
comfortably from the previous question. Writing a questionnaire is similar to writing
anything else. Transitions between questions should be smooth. Questionnaires that jump
from one unrelated topic to another feel disjointed and are not likely to produce high
response rates.
Some researchers have suggested that it may be necessary to present general
questions before specific ones in order to avoid response contamination. Other
researchers have reported that when specific questions were asked before general
questions, respondents tended to exhibit greater interest in the general questions.
The numbering of questions should be in a logical sequence. To check the
sequence of questions the following questions should be answered.
1. Are the answers to the questions likely to be influenced by the content of the preceding questions?
2. Are the questions led up to in a natural way?
3. Do some questions come too early or too late from the point of view of arousing interest and receiving sufficient attention, avoiding resistance and inhibitions?
15. Decide whether a personal or impersonal question will obtain the better response.
16. Questions should be limited to a single idea or a single reference.
After finalising the questionnaire/schedule by correcting it on the basis of pre-testing, the investigator has to collect data from the field. The following points should be taken into account by the investigator while collecting data through a questionnaire/schedule.
1. He must plan in advance and should fully know the problem under consideration.
2. He must choose a suitable time and place so that the respondent is at ease during the interview.
3. All possible efforts should be made to establish proper rapport with the informant; people are motivated to communicate when the atmosphere is favourable.
4. He must know that the ability to listen with understanding, respect and curiosity is the gateway to communication, and hence act accordingly during the survey.
5. The investigator's approach must be friendly and informal. Initially friendly greetings in accordance with the cultural pattern of the respondent should be exchanged, and then the purpose of the survey should be explained.
6. To the extent possible, there should be a free-flowing interview, and the questions must be well phrased in order to have the full cooperation of the respondent.
Significance Level: The significance level is the probability with which the null hypothesis will be rejected, due to sampling error, even though it is true.
The decision to reject or accept the null hypothesis depends upon the information contained in the sample, and there is always a risk of taking a wrong decision. One is likely to commit two types of errors.
Type I Error: The error of rejecting the null hypothesis on the basis of the information contained in the sample when it is actually true is called Type I error (the probability of rejecting the null hypothesis when it is true). It is denoted by α. (In quality control it is called the producer's risk because it is the probability of rejecting a good lot.)
Probability of committing Type I error = α = Level of Significance = Probability of rejecting H0 when it is true.
Type II Error: It is the probability of accepting the null hypothesis when it is false. It is denoted by β. (It is also called the consumer's risk because it is the probability of accepting a bad lot.)
Hypotheses H0 and H1 are mutually exclusive, i.e., if H0 is accepted (rejected) then H1 is rejected (accepted).
Power of Test: The probability of rejecting the null hypothesis on the basis of sample information when the null hypothesis is false is called the power of a test.
Prob.(Accept H0 when H0 is false) = β
Prob.(Reject H0 when H0 is false) + Prob.(Accept H0 when H0 is false) = 1
Therefore Power of test = Prob.(Reject H0 when H0 is false) = 1 − β
So the test is more powerful when the β error is small.
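A numerical sketch of the critical value, β and power for a one-sided Z-test; all the numbers (the two means, σ, n, α) are invented for illustration:

    from scipy.stats import norm

    mu0, mu1, sigma, n, alpha = 100.0, 105.0, 15.0, 36, 0.05
    se = sigma / n ** 0.5

    crit = mu0 + norm.ppf(1 - alpha) * se      # reject H0 if sample mean > crit
    beta = norm.cdf(crit, loc=mu1, scale=se)   # P(accept H0 | H0 is false)

    print(f"critical value = {crit:.2f}")
    print(f"beta = {beta:.3f}, power = 1 - beta = {1 - beta:.3f}")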
Sample Space: Let the population size be N and let a random sample of size n be drawn; the number of possible samples is k = NCn.
Suppose some statistic t is computed from each of these samples: t = f(x1, x2, x3, ..., xn). The possible sample statistics t1, t2, t3, ..., tk constitute the sample space.
The sample space is used to test the null hypothesis: some points in it will lead to rejection of H0, others to acceptance of H0.
Thus the sample space of the statistic is divided into two disjoint and exhaustive sets.
Critical Region (W): It is the part of the sample space which leads to rejection of the null hypothesis if the given sample statistic falls in this region.
Acceptance Region: It is that part of sample space, which leads to acceptance of null
hypothesis, if sample statistic falls in it.
Critical Point: The point in the sample space which divides it into two mutually disjoint and exhaustive sets is known as the critical point.
The critical points are the tabulated values for the different sampling distributions. The form of the sample space is determined by the sampling distribution used, such as t, F, χ² or Z.
Sampling Distribution: A sampling distribution is the probability distribution of a statistic.
Two tailed and one tailed tests: A two tailed test rejects the null hypothesis if, say, the
sample mean is significantly higher or lower than the hypothesized value of the mean of
the population. Such a test is appropriate when the null hypothesis is some specified value
and the alternative hypothesis is a value not equal to the specified value of the null
hypothesis.
Symbolically, the two-tailed test is appropriate when we have H0: μ = μ0 and H1: μ ≠ μ0, which may mean μ > μ0 or μ < μ0. Thus in a two-tailed test there are two rejection regions, one on each tail of the curve.
One-tailed test: A one-tailed test would be used when we are to test, say, whether the population mean is either lower than or higher than some hypothesized value.
For example, if H0: μ = μ0 and H1: μ < μ0, then we are interested in what is known as the left-tailed test; if H1: μ > μ0, then it is the one-tailed test known as the right-tailed test. (There is only one rejection region, either on the left tail or on the right tail.)
Tests of Hypotheses:
Tests of hypotheses (also known as tests of significance) can be classified as:
1. Parametric Tests
2. Non-parametric Tests
Parametric Tests: Parametric tests usually assume certain properties of the parent population from which the sample is drawn: assumptions like observations come from a normal population, sample size is large, and assumptions about population parameters like mean, variance, etc., must hold good before parametric tests can be used. The probability distribution of the statistic (its sampling distribution) is known, i.e. it follows a particular
distribution such as t, F or Z. Parametric tests cannot be applied if the nature of the parent population is unknown and the data are measured on a nominal/ordinal scale.
The important parametric tests are: the z-test, t-test, χ²-test and F-test. (The χ²-test is also used as a test of goodness of fit and as a test of independence, in which case it is a non-parametric test.)
All these tests are based on the assumption of normality, i.e. the source of the data is considered to be normally distributed.
Z-test: It is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean. The relevant test statistic, Z, is worked out and compared with its probable value at a specified level of significance for judging the significance of the measure concerned.
As n becomes large, the Z-test is generally used even when the binomial distribution or t-distribution is applicable, on the presumption that such a distribution tends to approximate the normal distribution.
The Z-test is used for comparing a sample mean with the population mean when the population variance is known; for judging the significance of the difference between the means of two independent samples when the population variance is known; for comparing a sample proportion with a theoretical value of the population proportion; and for judging the difference in proportions of two independent samples when n happens to be large. This test may also be used for judging the significance of the median, mode, coefficient of correlation and several other measures.
t-test: The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or of the difference between the means of two samples, in the case of small sample(s) when the population variance is not known (the sample variance is then used for the population variance). In case two samples are related, we use the paired t-test (difference test) for judging the significance of the mean of the differences between the two related samples. It is also used for testing the significance of the coefficients of simple and partial correlations. The relevant test statistic t is calculated from the sample data and then compared with its probable value based on the t-distribution, read from the table at a given level of significance and degrees of freedom, for accepting or rejecting the hypothesis.
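The three variants described above, sketched with scipy on invented data:

    from scipy import stats

    a = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5]
    b = [11.2, 12.0, 11.5, 12.3, 11.9, 11.6]

    t1, p1 = stats.ttest_1samp(a, popmean=12.0)  # sample mean vs. hypothesized mean
    t2, p2 = stats.ttest_ind(a, b)               # two independent samples
    t3, p3 = stats.ttest_rel(a, b)               # paired (difference) test

    print(f"one-sample : t = {t1:.2f}, p = {p1:.4f}")
    print(f"independent: t = {t2:.2f}, p = {p2:.4f}")
    print(f"paired     : t = {t3:.2f}, p = {p3:.4f}")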
χ²-test: It is based on the chi-square distribution and, as a parametric test, is used for comparing a sample variance to a theoretical population variance:
χ² = Σ(Xi − X̄)²/σ² = (n − 1)S²/σ², with (n − 1) degrees of freedom.
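A sketch of this variance test on invented measurements, comparing the statistic with the tabulated chi-square value:

    import numpy as np
    from scipy.stats import chi2

    x = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 10.0, 9.9])
    sigma0_sq = 0.04                    # hypothesized population variance (invented)
    n = len(x)

    chi_stat = (n - 1) * x.var(ddof=1) / sigma0_sq
    crit = chi2.ppf(0.95, df=n - 1)     # upper 5% point with n-1 d.f.

    print(f"chi-square = {chi_stat:.2f}, critical value = {crit:.2f}")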
F-test: The F-test is based on the F-distribution and is used to compare the variances of two independent samples. This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time. It is also used for judging the significance of multiple correlation coefficients. The test statistic, F, is calculated and compared with its probable value for accepting or rejecting the null hypothesis. (We use the F-ratio table for the given degrees of freedom at a certain level of significance.)
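A one-way ANOVA F-test, as described above, sketched on three invented groups:

    from scipy.stats import f_oneway

    g1 = [5.1, 4.9, 5.4, 5.0]
    g2 = [5.6, 5.8, 5.5, 6.0]
    g3 = [4.8, 5.0, 4.7, 4.9]

    F, p = f_oneway(g1, g2, g3)    # tests equality of the three group means
    print(f"F = {F:.2f}, p = {p:.4f}")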
Non-Parametric Tests: The tests which are used when the data may be non-normal and/or it may not be possible to estimate the parameter(s) of the data are called non-parametric tests. Since these tests do not depend on the distribution or its parameters, they are also called distribution-free tests. Non-parametric tests can be used for nominal data (qualitative data, like greater or less, etc.) and ordinal data, like ranked data. These tests require less calculation, because there is no need to compute parameters. They can also be applied to very small samples, more specifically during pilot studies in market research. Inferences about the population can be made by non-parametric tests when the assumptions of the standard methods cannot be satisfied, since non-parametric tests involve no, or less restrictive, assumptions compared to parametric tests.
Main non-parametric tests are
1. One-sample tests
a. one sample sign tests
b. Chi-square test
c. Kolmogorov-Smirnov test
d. Run test for randomness
2. Two- Sample tests
a. Two-sample sign test
b. Median test
c. Mann-Whitney U test (Rank sum test)
3. K-sample tests
a. Median test
b. Kruskal-Wallis test (H test)
c. Kendall's coefficient of concordance test
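Two of the listed tests sketched on invented samples: the Mann-Whitney U (rank sum) test for two samples and the Kruskal-Wallis H test for k samples:

    from scipy import stats

    g1 = [12, 15, 11, 18, 14]
    g2 = [22, 19, 24, 17, 21]
    g3 = [30, 27, 25, 29, 31]

    u, p_u = stats.mannwhitneyu(g1, g2, alternative="two-sided")
    h, p_h = stats.kruskal(g1, g2, g3)

    print(f"Mann-Whitney U   = {u:.1f}, p = {p_u:.4f}")
    print(f"Kruskal-Wallis H = {h:.2f}, p = {p_h:.4f}")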
Two main purposes: (i) to get a degree and (ii) to conduct sponsored and consultancy research projects.
With increasing privatization of higher education and shrinking public grants, the greater stress of academic institutions is on generating their own resources.
Academic institutions need faculty capable of doing independent sponsored and consultancy research projects and of leading research teams.
A good research proposal (RP) is necessary not only for high-quality research but also for getting grants from funding agencies.
An RP must be convincing to the anonymous experts who examine it and see whether it is methodologically sound, conceptually clear and would make a significant contribution to knowledge on the subject.
As a large number of RPs are submitted to the funding agencies for financial assistance, your proposal needs to be excellent, not just very good, to get approved for a grant.
TWO MAIN TYPES OF FUNDED RESEARCH
1. Research you really want to do:
Find sponsor!
--CSIR, MHRD, UGC, DST, UNDP, Foundations, NGOs, World Bank, DFID, Ministries
2. Topics some sponsor wants to see done:
Industries, organizations, Ministries,
market surveys, evaluation studies, R&D projects,
WHAT IS A RESEARCH PROPOSAL?
An RP is the presentation of an idea that you wish to pursue.
It is intended to convince the funding agency/RDC that you have a worthwhile research project and that you have the competence and the work-plan to complete it.
A good RP presumes that you have already thought about your project and have devoted some time and effort to gathering information, reading and organizing your thoughts.
A high quality proposal not only promises success for the project, but also
impresses RDC about your potential as a researcher.
TYPES OF PROPOSALS
Letter proposal
Preliminary expression of interest to an investor
Developed into a full proposal only with the donor's consent
Most unsolicited proposals should first be in the form of a letter proposal
Full proposal
Often in response to a Request for Proposal (RFP)
COMPONENTS OF A LETTER PROPOSAL
It should contain:
Summary
Statement of problem
Solution
Budget and budget explanatory notes
Capability of investigators and the institution (Curriculum Vitae)
COMPONENTS OF A FULL PROPOSAL
Title page
Executive summary
Introduction
Problem statement
Project description
Project hypothesis
Expected outputs
Study methods/Research approach
Budget and budgetary notes.
Logical framework
Project management and personnel
References
Important attachments
SEVERAL DIMENSIONS FOR RP
1. Content:    Basic       vs. Applied
2. Time Frame: Short-term  vs. Long-term
3. Scope:      Program     vs. Project
4. Teaming:    Single PI   vs. Multiple
5. Selection:  Competitive vs. Sole source
6. Client:     Scientific  vs. Agency
Should have some key words reflecting variables, theoretical basis, and purposes,
time, place, etc.
Leave out phrases like "an investigation into", "a study of" and "aspects of", as these are obvious attributes of a research project
THE BACKGROUND
In the background, the researchers should:
Create reader interest in the topic
Make sure that the reviewers know in the first few sentences what your project is about. It is a good idea to start an RP like this: "In the proposed study we seek to examine ..."
Lay the broad foundation of the problem that leads to the study
Place the study within the larger context of the scholarly literature
Reach out to a specific audience
If a researcher is working within a particular theoretical framework of enquiry, the
theory of inquiry should be introduced
Use references efficiently to keep the background short
Ensure a flow in the text, where every issue raised leads to the research problem
STATEMENT OF THE PROBLEM
It must be clear from the text what the nature of the problem is, how it was identified and why it is significant.
A problem statement should be presented within a context and the context should
be briefly explained, including a discussion on the conceptual framework.
The problem might be defined as the issue that exists in the literature, theory or
practice that leads to a need for the study.
The problem should be clearly defined, making the evaluation easy for the
reviewers/RDC members.
Effective problem statements answer the question: Why does this research need to
be conducted?
PROCESS OF PREPARING THE STATEMENT
Prepare the arguments that lead to the statement of your problem.
Make a list of issues you will address in your account of the context.
Issues should be summarized in a few words and put in an ordered sequence so that it is possible to progress from one issue to the next in a logical manner.
Keep the list brief but make sure it covers all vital issues.
Write your research problem in one sentence at the end of the list.
Check that in your sequence the issues all relate to the problem and lead logically towards it.
If there are gaps in the logic of the argument, add linking issues.
When you think that the argument is cogent, put some flesh on the bones in the text by making full sentences and adding references.
REVIEW OF LITERATURE
It provides the background and context for the research problem.
It shares with the readers the results of other studies that are closely related to the
proposed study.
It relates the proposed study to the ongoing dialogue in the literature, filling in gaps.
It provides a framework for establishing the importance of the study, as well as a
benchmark for comparing the results of the study with other findings.
Demonstrate to the reader that you have a comprehensive knowledge of the field.
Help to avoid statements that imply little has been done in the area.
The literature review must address three areas:
Topic or problem area: This part of the literature review covers material directly related to the problem being studied, which may span several separate substantive areas.
Theory area: Investigators must identify the theory which relates to the problem
area.
Methodology area: Review of the literature related to various aspects of the chosen method, including design, selection of subjects, and methods of data collection. It describes the research methods and measurement approaches used in previous studies.
PURPOSE OF THE STUDY
Purpose of the study should be clearly stated. If it is not clear to the writer, it
cannot be clear to the reader.
Briefly define the specific area of the research.
Foreshadow the hypotheses to be tested or the questions to be raised as well as
significance of the study. These will require specific elaboration in separate
sections.
Should incorporate rationale for the study.
Key points:
Start with "The purpose of the study is ...".
Clearly identify and define the central ideas of the study
Identify the specific method of inquiry to be used.
OBJECTIVES
Typically comes after problem definition, motivation and significance.
Start with overall objective (or aim of the research), and then state two to four
specific objectives
Write a list of the objectives of your research. Think of as many as you can. When you have done this, consider each one carefully by asking the questions:
How will the objective be achieved: methods, resources, skills, time?
Is it realistically possible to achieve it?
What results are required to achieve it?
Is the objective central to your study?
Are there any overlaps between the objectives?
Is there any sequence or hierarchy that links one to another? If so, are they in the correct order?
Are there too many objectives to be realistically achievable?
HYPOTHESES
State what kind of relationships you expect to find between variables or factors. A hypothesis is particularly necessary in the search for cause-and-effect relationships. It is your intelligent guess about the possible relationship. Do not press hard to prove that your guess is right; it is more common to disprove than to prove a hypothesis.
A good hypothesis should possess the following features:
must be conceptually clear
should have empirical reference
must be specific
should be related to available techniques
Should be related to a body of theory
METHODS AND PROCEDURES
The RP must specify the research operations you will undertake and the way you
will interpret the results of these operations in terms of your central problem.
Do not just tell what you mean to achieve; tell how you will achieve it.
A methodology is not just a list of research tasks but an argument as to why these
tasks add up to the best attack on the problem.
Indicates the methodological steps you will take to answer every question or test
every hypothesis.
The variables you propose to control, and how: experimentally or statistically
Sampling design, data collection, data analysis techniques, instruments, etc.
RELEVANCE/SIGNIFICANCE
Indicate how your research will refine, revise, or extend existing
knowledge.
Such refinements, revisions, or extensions may have either substantive,
theoretical, or methodological significance.
Think about implications: how results of the study may affect scholarly research, theory, practice, policy.
Will results influence programs, methods, and/or interventions?
Will results influence policy decisions?
How will results of the study be implemented, and what innovations will
come about?
POSSIBLE OUTCOME
Do not promise mountains and deliver molehills.
Be quite precise as to the nature and scope of the outcomes and as to who might be the beneficiaries.
Make sure that the outcomes relate directly to the purpose of the research.
the effect of X on Y. A pioneering effort in this direction is described by Chen and Smith
(1998),
Do not assume that your reader/reviewer knows the problem.
DO NOT use language like:
"It is well known ...", "it is obvious ..." or "it is trivial to show ...".
BETTER: "It is generally accepted in the literature ..."
Beneficiaries:
Faculty from engineering, management, pure sciences, and social sciences who have been actively engaged in guiding research scholars and carrying out sponsored research projects. A basic understanding of statistics will be useful, but is certainly not essential.
Regression Analysis
It is a statistical method for studying the relationship between a single dependent variable
and one or more independent variables, with a view to estimating and/or predicting the
(population) mean or average value of the former in terms of known or fixed values of the
latter. When there is only one independent variable, it is called bivariate analysis; when there is more than one independent variable, it is known as multivariate analysis.
Variables
Dependent  | Independent
Explained  | Explanatory
Predictand | Predictor
Regressand | Regressor
Endogenous | Exogenous
Deterministic vs. Stochastic/Probabilistic: if we believe that the model should be constructed to allow for random error, then we hypothesize a probabilistic model, which includes both a deterministic component and a random error component:
Y = a + bX (deterministic)
Y = a + bX + u (stochastic)
To fit the line we minimise Σu² = Σ(Y − a − bX)², which gives the conditions
Σ(Y − a − bX) = 0 and ΣX(Y − a − bX) = 0, i.e. the normal equations:
ΣY = na + bΣX
ΣXY = aΣX + bΣX²
By solving these two normal equations, the intercept and slope of the linear regression line are estimated. The values of a and b can also be estimated directly from the formulae:
b = Σ(Yi − Ȳ)(Xi − X̄) / Σ(Xi − X̄)², a = Ȳ − b·X̄
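A minimal sketch of these estimators on invented (X, Y) data; the deviation formula for b and a = Ȳ − b·X̄ follow from the normal equations above:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])

    # slope from the deviation formula, intercept from a = Ybar - b*Xbar
    b = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
    a = Y.mean() - b * X.mean()

    print(f"fitted line: Y = {a:.3f} + {b:.3f} X")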
For each set of values of the K independent variables (X1j, X2j, ..., Xkj), E(uj) = 0, i.e. the mean of the error term is zero.
For each set of values of the K independent variables, Var(uj) = σ², i.e. the variance of the error term is constant (homoscedasticity). Violation of this assumption creates the problem of heteroscedasticity.
For any two sets of values of the K independent variables, Cov(ui, uj) = 0, i.e. the error terms are uncorrelated (violation gives the autocorrelation problem).
For each Xi, Cov(Xi, u) = 0, i.e. each independent variable is uncorrelated with the error term (violation gives the endogeneity problem).
There is no perfect collinearity among the independent variables (violation gives the multicollinearity problem).
For each set of values of the K independent variables, uj is normally distributed.
Violation of these assumptions will provide biased estimates of the coefficients.
What is Regression good for?
There are two uses of regression: prediction and causal analysis. In a prediction
study, the aim is to develop a formula for making predictions about the dependent
variable, based on the observed values of exogenous variables. For example, an
economist may want to predict next years GNP based on such variable as last years
GNP, current interest rates, current level of rate of investment and other variables.
In a causal analysis, the independent variables are regarded as causes of the
dependent variables. The aim is to determine whether a particular exogenous variable
really affects the endogenous variable and to estimate the magnitude of that effect, if
any. However, these two uses are not mutually exclusive.
Why is linear regression so popular?
It does two things: for prediction studies, it makes it possible to combine many
variables to produce optimal predictions of the endogenous variable, and for causal
analysis, it separates out the effects of the exogenous variables on the endogenous variable.
Sophisticated non-linear regression models are very complicated and require a high level
of mathematical skill and specialized software.
What will happen if the true relationship is not linear?
In a bivariate analysis, the relationship between the two variables can easily be
identified by plotting the data on a graph. If the points form a straight line, a linear
function is used. It is difficult to judge linearity among the variables when the number of
exogenous variables is large. If the real relationship is non-linear and a linear function is
used, the analysis may provide inefficient results. A useful general principle in science is
that when you do not know the true form of a relationship, start with something simple. A
linear equation is perhaps the simplest way to describe a relationship between two or more
variables and still get reasonably accurate results.
The fit of a regression is judged by comparing two quantities: the sum of squared errors
produced by the least squares equation, and the sum of squared errors for a least squares
equation with no independent variables (just the intercept). When an equation has no
independent variables, the least squares estimate for the intercept is just the mean of the
dependent variable.
$R^2 = 1 - \sum (Y_i - \hat{Y}_i)^2 \,/\, \sum (Y_i - \bar{Y})^2$
$R^2 = 1 - RSS/TSS = ESS/TSS$
$\bar{R}^2 = 1 - \dfrac{RSS/(n-k)}{TSS/(n-1)}$
$F = \dfrac{ESS/(k-1)}{RSS/(n-k)} = \dfrac{ESS}{RSS} \times \dfrac{n-k}{k-1}$
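A minimal sketch of how these fit statistics could be computed from observed and fitted values; the function and its inputs are illustrative, not part of the original notes.

```python
import numpy as np

def fit_stats(y, y_hat, k):
    """Return R^2, adjusted R^2 and F for fitted values y_hat.

    k is the number of estimated parameters, intercept included.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
    ess = tss - rss                      # explained sum of squares
    r2 = 1.0 - rss / tss
    r2_adj = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    f = (ess / (k - 1)) / (rss / (n - k))
    return r2, r2_adj, f
```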
Measurement error: very few variables can be measured with perfect accuracy,
especially in social sciences.
Sampling error: in many cases, our data are only a sample from a larger population,
and the sample will never be exactly like the population.
Uncontrolled variations: there may be many other variables, not under the researcher's
control, that can disturb the relationship between the dependent and independent
variables included in the function.
The basic assumption is that the errors occur in a random and unsystematic
fashion. We evaluate the extent and importance of this random variation by calculating
confidence intervals or hypothesis tests.
Confidence intervals give us a range of possible values for the coefficients.
Although we may not be certain that the true value falls in the calculated range, we can be
reasonably confident. Hypothesis tests are used to answer the question of whether or not
the true coefficient is zero.
Confidence interval at the 95% level = $b \pm 2 \times S.E.$
For instance, if b is 600 and the S.E. is 210, then the confidence interval runs from
600 + (2 × 210) = 1020 down to 600 - (2 × 210) = 180. We can say that we are 95%
confident that the true coefficient lies somewhere between 180 and 1020.
In published research using regression analysis, we are more likely to see
hypothesis tests than confidence intervals. Usually, the kind of question people most want
answered is: does this particular variable really affect the dependent variable? If a
variable has no effect, then its true value is zero. To test whether a coefficient is
significantly different from zero, a t-test is conducted:
t-statistic = $b / S.E.$
Then we consult a t table (or the computer does this for us) to calculate the associated p value.
If the p value is small, it is taken as evidence that the coefficient is not zero.
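For instance, the p value for the example above (b = 600, S.E. = 210) could be obtained with scipy; the degrees of freedom below are an assumed value, since the example does not state the sample size.

```python
from scipy import stats

b, se = 600.0, 210.0            # coefficient and standard error from the example
df = 30                         # residual degrees of freedom (n - k): an assumed value
t = b / se                      # t-statistic = b / S.E.
p = 2 * stats.t.sf(abs(t), df)  # two-tailed p value from the t distribution
print(f"t = {t:.3f}, p = {p:.4f}")
```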
What is Hypothesis Testing?
It is analogous to the decision reached in a court of law. Under the court system, a
defendant brought to trial is assumed to be not guilty. For the judge or jury to reach a
finding of guilty, sufficient evidence must be produced. In the court system, errors can be
made: an innocent defendant can be found guilty, and a guilty individual can be found not
guilty. Under a legal system where the evidence must show beyond a shadow of doubt that
the assumption of non-guilt is to be rejected, there is a primary concern for the error of the
first type, i.e., convicting an innocent person. Just as a defendant is assumed not guilty
until proven guilty, in hypothesis testing the null hypothesis is assumed true until there is
sufficient evidence that it is not true.
How does regression control for variables?
Another use of multiple regression is to examine the effects of some independent
variables on the dependent variable while controlling for other independent variables. In
regression analysis, the coefficient for one variable can be interpreted as its effect while
the other variables are held constant.
How do we interpret regression results?
Results from different types of variables are interpreted in different ways. As
discussed, there are three types of variables: interval scales, ordinal scales, and nominal
scales.
What can go wrong with regression analysis?
Any tool as widely used as regression is bound to be frequently misused.
Nowadays, statistical packages are so user-friendly that anyone can perform a multiple
regression with a few mouse clicks. As a result, many researchers apply it to their data with
little understanding of the underlying assumptions or the possible pitfalls.
1. Inclusion of wrong variables
The following results indicate that wrongly selected data and variables may
provide misleading results. State-wise data for the year 2000-01 were collected by some
researchers to assess the impact of technical education on economic development. The
results are as follows:
PCI = 50.96** + 0.33** TE + 0.08 GE    (1)
        (2.71)      (3.64)       (0.47)
R² = 0.55, F-value = 8.42
(Figures in parentheses are t-ratios.)
A second specification, equation (2), gave R² = 0.61 and F-value = 8.42.
Each command must begin on a new line and end with a period (.).
Most subcommands are separated by slashes (/). The slash before the first
subcommand on a command is usually optional.
Variable names must be spelled out fully.
Now the selected variable appears in a box on the right and disappears from the left box.
Note that when a variable is highlighted in the left box, the arrow button is pointed right
for you to complete the selection. When a variable is highlighted in the right box, the
arrow button is pointed left to enable you to deselect a variable (by clicking the button) if
necessary. If you need additional statistics besides the frequency count, click the
Statistics... button at the bottom of the screen. When the Statistics... dialog box appears,
make appropriate selections and click Continue. In this instance, we are interested only in
frequency counts. The output appears on the Viewer screen.
The mean, standard deviation, minimum, and maximum are displayed by default. The
variables are displayed, by default, in the order in which you selected them. Click
Options... for other statistics and display order. The following output will be displayed on
the Viewer screen.
The MEANS procedure displays means, standard deviations, and group counts for
dependent variables based on grouping variables. To run the MEANS procedure:
Select Mean, Number of cases, and Standard Deviation. Normally these options
are selected by default. If any other options are selected, deselect them by clicking
them
Click Continue
Click OK
The output will be displayed on the Viewer screen.
T-test
The t-test is a data analysis procedure used to test the hypothesis that two population means are
equal. SPSS can compute independent (not related) and dependent (related) t-tests. For
independent t-tests, you must have a grouping variable with exactly two values (e.g., male
and female, pass and fail). The variable may either be numeric or character. Suppose you
have a grouping variable with more than two categories. You may use the RECODE
(Transform/Recode) command to collapse the categories into two groups. RECODE is a
powerful SPSS command for data transformation with both numeric and string variables.
Select Analyze/Compare Means/Independent-Samples T-test...
Select Variables
Select Grouping Variable.
Click on Define Groups...
Type 1 for Group 1, and 2 for Group 2.
A t-test with two related variables is performed using the Paired-Samples T-Test from the
Analyze/Compare Means menu.
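Outside SPSS, the same two tests could be run, for example, with scipy; the data below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent groups (e.g., pass vs. fail).
group1 = np.array([23, 27, 31, 25, 29, 30])
group2 = np.array([19, 22, 24, 21, 26, 20])
t_ind, p_ind = stats.ttest_ind(group1, group2)   # independent-samples t-test

# Hypothetical related measurements (e.g., before and after) for a paired test.
before = np.array([12, 15, 11, 14, 13])
after = np.array([14, 17, 12, 16, 15])
t_rel, p_rel = stats.ttest_rel(before, after)    # paired-samples t-test
print(p_ind, p_rel)
```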
One-way Analysis of Variance
The statistical technique used to test the null hypothesis that several population means are
equal is called analysis of variance. It is called that because it examines the variability in
the sample, and based on the variability, it determines whether there is a reason to believe
the population means are not equal. The statistical test for the null hypothesis that all of
the groups have the same mean in the population is based on computing the ratio of within
and between group variability estimates, called the F statistic. A significant F value only
tells you that the population means are probably not all equal. It does not tell you which
pairs of groups appear to have different means. To pinpoint exactly where the differences
are, multiple comparisons may be performed.
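A minimal sketch of a one-way ANOVA in Python, with hypothetical samples from three groups:

```python
from scipy import stats

# Hypothetical samples from three groups.
g1 = [18, 21, 19, 23, 20]
g2 = [25, 27, 24, 28, 26]
g3 = [22, 20, 23, 21, 24]

f, p = stats.f_oneway(g1, g2, g3)   # one-way ANOVA F statistic and p value
print(f"F = {f:.3f}, p = {p:.4f}")
# A significant F only says the means are probably not all equal; multiple
# comparisons (e.g., Tukey's HSD) are needed to locate the differing pairs.
```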
Economists usually prefer to use econometric methods to measure efficiency. Since the
1990s, many of them have also started using DEA because of its ability to handle multiple
inputs and outputs and its suitability for studying the performance of DMUs in both the
manufacturing and service sectors.
STOCHASTIC FRONTIER ANALYSIS
The deterministic frontier approach does not incorporate measurement errors and
other noise: all deviations from the frontier are assumed to be the result of technical
inefficiency. The stochastic frontier production function (SFPF), by contrast, accommodates
exogenous shocks. This involves specifying the error term as being made up of two
components: a symmetric component, which permits random variation of the frontier
across firms and captures the effects of measurement error, other statistical noise, and
random shocks outside the DMU's control; and a one-sided component, which captures the
effects of inefficiency relative to the stochastic frontier.
Aigner, Lovell and Schmidt (1977), Meeusen and van den Broeck (1977), and
Battese and Corra (1977) propose the SFPF. Consider the following Cobb-Douglas
production function:
$$y_i = f(x_i; \beta) + \varepsilon_i, \qquad i = 1, 2, \ldots, N \tag{1}$$
where $y_i$ is the logarithm of the (scalar) output (Y) for the ith firm; $x_i$ is a (K+1)-row
vector whose first element is 1 and whose remaining elements are the logarithms of the K
input quantities used by the ith firm; $\beta = (\beta_0, \beta_1, \ldots, \beta_K)$ is a (K+1)-column vector
of unknown parameters to be estimated; and $\varepsilon_i$ is a random error, with $\varepsilon_i = v_i - u_i$. Thus,
equation (1) can be written as:
$$y_i = f(x_i; \beta) + v_i - u_i, \qquad i = 1, 2, \ldots, N \tag{2}$$
Here $v_i \sim N(0, \sigma_v^2)$ is a two-sided error term representing the usual statistical noise found in any
relationship, and $u_i \geq 0$ is a one-sided error term representing technical inefficiency, in the
sense that it measures the shortfall of output ($y_i$) from its maximal possible value given by
the stochastic frontier $f(x_i; \beta) + v_i$. Model (2) is known as the SFPF because the output
values are bounded above by the stochastic (random) variable $\exp(x_i \beta + v_i)$. The random
error $v_i$ can be positive or negative (Coelli, et al., 1998).
Direct estimates of the stochastic frontier model can be obtained either by
maximum likelihood or by corrected ordinary least squares (COLS) methods. Introducing
specific probability distributions for $v_i$ and $u_i$, and assuming that $u_i$ and $v_i$ are independent and
that $x_i$ is exogenous, the asymptotic properties of the maximum likelihood estimators can
be obtained. The model can also be estimated by COLS by adjusting the constant term by
$E(u_i)$, which is derived from the moments of the OLS residuals. Once a model of this form
is estimated, one can readily obtain residuals $\hat{\varepsilon}_i = y_i - f(x_i; \hat{\beta})$, which can be regarded as
estimates of the error terms $\varepsilon_i$.
Meeusen et al. (1977) assign an exponential distribution to u, Battese and Corra
(1977) assign a half-normal distribution to u, and Aigner et al. (1977) consider both
distributions for u. The parameters to be estimated are $\beta$, $\sigma_v^2$ and the variance parameter $\sigma_u^2$
associated with u. Either distributional assumption on u implies that the composed error
$(v - u)$ is negatively skewed, and statistical efficiency requires that the model be estimated by
maximum likelihood. After estimating the production frontier, an estimate of mean technical
inefficiency in the sample is provided by $E(-u) = E(v - u) = -(2/\pi)^{1/2}\sigma_u$ in the normal-half normal case and by $E(-u) = E(v - u) = -\sigma_u$ in the normal-exponential case.
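The half-normal formula can be checked numerically. The sketch below, which simulates a composed error v - u under assumed variance parameters, compares the simulated mean with E(-u) = -(2/π)^(1/2)σ_u; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_u, sigma_v, n = 0.3, 0.2, 100_000   # assumed variance parameters

v = rng.normal(0.0, sigma_v, n)           # symmetric noise component
u = np.abs(rng.normal(0.0, sigma_u, n))   # half-normal inefficiency, u >= 0
eps = v - u                               # composed error of the SFPF

print("theoretical E(-u):", -np.sqrt(2.0 / np.pi) * sigma_u)
print("simulated mean(eps):", eps.mean())  # E(v - u) = E(-u), since E(v) = 0
```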
The SFA approach gives a less biased measure of efficiency. However, it could
originally provide only average technical efficiency measures for the sample observations.
Although these aggregate measures are useful in a way, individual observation-specific
technical efficiency measures are more useful from a policy viewpoint. Jondrow, Lovell,
Materov and Schmidt (1982) and Kalirajan and Flinn (1983) independently considered the
Aigner et al. (1977) and Meeusen and van den Broeck (1977) stochastic models to predict
the random variable $u_i$ under the assumption that $\varepsilon_i$ is known. SFA has no a priori
justification for the selection of any particular distributional form of the random error term,
and the resulting efficiency measures may be sensitive to the distributional assumption.
Another problem with SFA is that it cannot handle multiple output variables at a time
(Thanassoulis, 2001).
DATA ENVELOPMENT ANALYSIS
DEA is a linear programming (LP) based multi-factor productivity analysis model
for measuring the relative efficiency of a homogeneous set of DMUs. It optimises each
individual observation with the objective of calculating a discrete piecewise frontier
determined by the set of Pareto-efficient DMUs. It does not require any specific
assumptions about the functional form. It calculates a maximal performance measure for
each DMU relative to all other DMUs in the observed population, with the sole
requirement that each DMU lie on or below the frontier. Each DMU not on the
frontier is scaled down against a convex combination of the DMUs on the frontier facet
closest to it (Charnes, et al. 1994).
There is an increasing concern with measuring and comparing the efficiency of
organisational units such as local authority departments, schools, hospitals, shops, bank
branches and similar instances where there is a relatively homogeneous set of units.
The usual measure of efficiency, i.e. efficiency = output/input,
is often inadequate due to the existence of multiple inputs and outputs related to different
resources, activities and environmental factors. The DEA methodology was developed to solve
this problem. This technique is quite useful for measuring the efficiency of service-sector
DMUs, especially government organizations providing public goods.
We have two basic DEA models: the CCR model, developed by Charnes, Cooper and
Rhodes in 1978, and the BCC model, developed by Banker, Charnes, and Cooper in 1984.
The CCR model generalises the single output/input ratio measure of efficiency for a single
DMU in terms of a fractional linear programming (FLP) formulation, transforming the
multiple output/input characteristics of each DMU into a single virtual output and
virtual input. The model defines the relative efficiency of any DMU as a weighted sum
of outputs divided by a weighted sum of inputs, where all efficiency scores are restricted to
lie between zero and one. An efficiency score less than one means that a linear
combination of other units from the sample could produce the same vector of outputs
using a smaller vector of inputs. The score reflects the radial distance from the estimated
production frontier to the DMU under consideration. The variables in the model are the
input-output weights, and the LP solution produces the weights most favourable to the unit
under reference. In order to calculate efficiency scores, the FLP is converted into an LP by
normalising either the numerator or the denominator of the fractional programming
objective function. In the output-maximisation DEA program, the weighted sum of inputs
is constrained to be unity so as to maximise the weighted sum of outputs, while in the
input-minimisation DEA program, the weighted sum of outputs is constrained to be unity
so as to minimise the weighted sum of inputs. The CCR model is based on the constant
returns to scale assumption. Under this assumption, if the input levels of a feasible
input-output correspondence are scaled up or down, then another feasible input-output
correspondence is obtained in which the output levels are scaled by the same factor as the
input levels (Thanassoulis, 2001).
Another version of DEA was given by Banker, Charnes and Cooper (1984). The
primary difference between the BCC and CCR models is the convexity constraint, which
represents the returns to scale. The CCR model is based on the assumption that constant
returns to scale exist at the efficient frontier, whereas BCC assumes a variable returns to
scale frontier. CCR efficiency is overall technical efficiency (OTE), known as global
technical efficiency, whereas BCC efficiency is pure technical efficiency (PTE) net of the
scale effect, known as local technical efficiency. If a DMU scores a value of one on both
CCR-efficiency and BCC-efficiency, it is operating at the most productive scale size
(MPSS). If a DMU has a BCC-efficiency score of one and a CCR-efficiency score of less than one,
it is operating locally efficiently but not globally efficiently, due to the scale size of the
DMU. Thus, inefficiency in any DMU may be caused by the inefficient operation of the
DMU itself (BCC-inefficiency) or by the disadvantageous conditions under which the
DMU is operating (scale inefficiency). Scale efficiency is estimated by dividing the
CCR-efficiency by the BCC-efficiency for a DMU. Another technique based on DEA is the
Malmquist Productivity Index (MPI), proposed by Caves, et al. in 1982. The MPI is
defined with distance functions. For panel data, distance functions permit the description of
multiple input-output production technologies without behavioural objectives such as
profit maximisation or cost minimisation. A detailed description of the MPI model is
presented in chapter 7.
GROWTH OF DEA APPROACH
Since the publication of the seminal paper of Charnes, et al. (1978), numerous
research papers have been written on both theoretical and applied aspects of the DEA
approach. On the theoretical side, a number of DEA models and extensions of them have
been developed. Weight restrictions, non-discretionary inputs and outputs, categorical
inputs and outputs, sensitivity analysis, input congestion, returns to scale, bad outputs,
super-efficiency, target setting, etc. are the major aspects on which extensions of DEA
models have been made. In parallel with the theoretical development, a wide range of
empirical studies have also been published which evince the inexhaustible potential of
DEA for innovative applications.
Originally, DEA was applied to estimate the relative efficiency of non-profit
organizations such as educational institutions, government hospitals, public utilities, etc.,
where market prices are not generally available. However, its ability to use multiple
output-input variables without an a priori assumption about the underlying functional form
has motivated researchers to extend it to profit organizations as well. Some of the areas
where DEA has been applied frequently by researchers are: banks, academic institutions,
hospitals, public utilities (gas, water, electricity supply), police services, transport
services, agriculture, and industry. Moreover, the development of the DEA-based MPI for
measuring total factor productivity growth and its decomposition into technical efficiency
change and technical progress is a significant achievement in the field of productivity
analysis.
Terminology of DEA
1. Benchmarking: It is the process of comparing the performance of an individual
organization against a benchmark, or ideal level of performance. Benchmarks can be
set on the basis of performance over time or across a sample of similar organizations
or some externally set standard.
2. Best Practices: Best practices refer to the set of management and work practices that
result in the highest potential or optimal quantity and combination of outputs for a given
set of inputs.
11. Returns to Scale: This refers to a measure of the change in output resulting from a change in
the scale of a firm's operation as determined by its input usage. There are three types of
returns to scale: increasing, constant and decreasing. When inputs are doubled and output
more than doubles, there are increasing returns to scale. If output increases in
the same proportion as inputs are increased, there are constant returns to scale. Decreasing
returns to scale exist when output increases less than proportionally to the increase in the
inputs.
12. Pure Technical Efficiency: This refers to the proportion of technical efficiency that is
attributed to the efficient conversion of inputs into output; the effect of the size of the plant
on efficiency is neutralized in it. It is also known as managerial efficiency or local
efficiency. It is estimated through the BCC DEA model, which is based on the variable
returns to scale technology assumption. The value of the pure technical efficiency score lies
between zero and one.
13. Technical Efficiency: Technical efficiency refers to the firm's ability to produce the
maximum possible output from a given combination of inputs and technology. In
DEA, technical efficiency is determined by comparing the ratio of the observed
quantities of a DMU's output(s) to input(s) with the ratio achieved by best-practice
DMUs. It is, therefore, a relative technical efficiency, not an absolute technical
efficiency. Its value lies between zero and one. If a DMU is on the production frontier
and does not have any input or output slack, its technical efficiency score will be equal
to one. Technical efficiency can be decomposed into scale efficiency and pure
technical efficiency.
14. Scale Efficiency: The extent to which an organization can take advantage of returns to
scale by altering its size towards the optimal scale. In DEA analysis, scale efficiency for a
DMU is calculated by dividing the CCR efficiency score by the BCC efficiency score. As
the BCC score is greater than or equal to the CCR score, the value of the scale efficiency
score lies between zero and one.
15. Slacks: Slacks in DEA refer to the extra quantity by which an input (output) can be
reduced (increased) to obtain technical efficiency after all inputs (outputs) have been
radially reduced to reach the production frontier.
DEA MODELS
Basic DEA models are described as:
CCR Model
This model generalizes the usual input/output ratio measure of efficiency for a
given firm in terms of a fractional linear program formulation. Mathematically, the
relative efficiency of the kth DMU is given by:
$$\max\; h_k = \frac{\sum_{r=1}^{s} u_{rk}\, y_{rk}}{\sum_{i=1}^{m} v_{ik}\, x_{ik}} \tag{1}$$
subject to:
$$\frac{\sum_{r=1}^{s} u_{rk}\, y_{rj}}{\sum_{i=1}^{m} v_{ik}\, x_{ij}} \leq 1, \qquad j = 1, \ldots, n$$
$$u_{rk} \geq \varepsilon, \quad r = 1, \ldots, s; \qquad v_{ik} \geq \varepsilon, \quad i = 1, \ldots, m$$
Where: $y_{rk}$ = the amount of the rth output produced by the kth DMU; $x_{ik}$ = the amount of the
ith input used by the kth DMU; $u_{rk}$ = the weight given to the rth output of the kth DMU;
$v_{ik}$ = the weight given to the ith input of the kth DMU; n = no. of DMUs; s = no. of
outputs; m = no. of inputs; and $\varepsilon$ = a non-Archimedean (infinitesimal) constant.
The above objective function is reformulated as an LP problem as follows:
$$\max\; w_k = \sum_{r=1}^{s} \mu_{rk}\, y_{rk} \tag{2}$$
subject to:
$$\sum_{i=1}^{m} \nu_{ik}\, x_{ik} = 1$$
$$\sum_{r=1}^{s} \mu_{rk}\, y_{rj} - \sum_{i=1}^{m} \nu_{ik}\, x_{ij} \leq 0, \qquad j = 1, \ldots, n$$
$$\mu_{rk} \geq \varepsilon, \quad r = 1, \ldots, s; \qquad \nu_{ik} \geq \varepsilon, \quad i = 1, \ldots, m$$
Since the number of DMUs is generally larger than the total number of inputs and
outputs, solving the dual of the model can reduce the computational burden.
Mathematically, the dual formulation of the above model is:
$$\min\; z_k = \theta_k - \varepsilon\left(\sum_{r=1}^{s} S_{rk}^{+} + \sum_{i=1}^{m} S_{ik}^{-}\right) \tag{3}$$
subject to:
$$\sum_{j=1}^{n} \lambda_{jk}\, y_{rj} - S_{rk}^{+} = y_{rk}, \qquad r = 1, \ldots, s$$
$$\sum_{j=1}^{n} \lambda_{jk}\, x_{ij} + S_{ik}^{-} = \theta_k\, x_{ik}, \qquad i = 1, \ldots, m$$
$$\lambda_{jk} \geq 0, \quad j = 1, \ldots, n; \qquad S_{rk}^{+}, S_{ik}^{-} \geq 0; \qquad \theta_k \text{ free}$$
Where: $S_{rk}^{+}$ = slack in the rth output of the kth DMU; $S_{ik}^{-}$ = slack in the ith input of the kth DMU.
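As an illustration of the envelopment form in model (3), the following sketch solves the input-oriented CCR LP with scipy's linprog, ignoring the infinitesimal slack terms in the objective; the data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, k):
    """Input-oriented CCR efficiency of DMU k (envelopment form, slacks ignored).

    X : (m, n) array of inputs; Y : (s, n) array of outputs; columns are DMUs.
    """
    m, n = X.shape
    s = Y.shape[0]
    # Decision vector z = [theta, lambda_1, ..., lambda_n]; minimise theta.
    c = np.zeros(1 + n)
    c[0] = 1.0
    # Output constraints:  -sum_j lambda_j * y_rj <= -y_rk
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    b_out = -Y[:, k]
    # Input constraints:  -theta * x_ik + sum_j lambda_j * x_ij <= 0
    A_in = np.hstack([-X[:, [k]], X])
    b_in = np.zeros(m)
    res = linprog(c, A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.concatenate([b_out, b_in]),
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun  # theta* in (0, 1]

# Hypothetical data: 2 inputs, 1 output, 4 DMUs.
X = np.array([[2.0, 4.0, 3.0, 5.0],
              [3.0, 1.0, 2.0, 4.0]])
Y = np.array([[1.0, 1.0, 1.0, 1.0]])
print([round(ccr_efficiency(X, Y, k), 3) for k in range(4)])
```

Adding the convexity constraint (an equality constraint forcing the lambdas to sum to one) would turn this into the BCC model discussed next.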
BCC Model
The primary difference between the BCC model and the CCR model is the convexity
constraint. In the BCC model the $\lambda_{jk}$'s are restricted to sum to one, i.e., the constraint
$\sum_{j=1}^{n} \lambda_{jk} = 1$ is added to model (3). If $\sum_{j=1}^{n} \lambda_{jk} \leq 1$ is imposed instead of
$\sum_{j=1}^{n} \lambda_{jk} = 1$, the technology is assumed to exhibit non-increasing returns to scale.
[Figure-1: CRS and VRS frontiers in input (X)-output (Y) space.]
Figure-1 compares the CCR and BCC models. The CCR model is based on the constant
returns to scale (CRS) technology assumption and the BCC model on the variable returns
to scale (VRS) technology assumption. The CRS surface is the straight line oicm and the
VRS surface is abcde.
the unit, in which case they can be incorporated as inputs, or whether they are resource
users, in which case they may be better included as outputs. For example, in comparing the
efficiency of schools, research has indicated that in general parents of higher educational
attainment provide greater support to their children and are therefore effectively an
additional resource to the schools, and so should be classed as an input. Tobit regression is
an appropriate method to study the impact of environmental and background factors on
efficiency. It assumes that the data are truncated, or censored, above or below certain
values. In DEA, the values of the dependent variable are censored, as they range between 0 and 1.
Productivity Measurement Methods
Introduction
Productivity growth is one of the major determinants of the competitiveness and
profitability of a firm. A higher level of productivity growth may result in lower product
prices, better remuneration and working conditions for employees, better returns to
investors and an adequate surplus for the firm for plant expansion and modernization.
Technical change and technical efficiency change are the two sources of productivity
growth. A study of these sources is crucial for identifying the factors responsible for
productivity stagnation and for adopting appropriate measures at the firm, industry and
government levels to improve productivity. In this chapter, we examine productivity
growth and its sources in the sugar mills of Uttar Pradesh. A non-parametric approach,
known as the Malmquist productivity index (MPI), is applied to panel data covering seven
years collected from 36 sugar mills of the state. An output-oriented DEA method is used
for estimation of TFP growth and its decomposition into technical efficiency change and
technical progress. Technical efficiency change is further decomposed into pure technical
efficiency change and scale efficiency change. The study also examines inter-sector and
inter-region variations in TFP growth and its components.
Productivity Measurement Approaches
The most commonly used measures of productivity are partial (single factor)
productivity and total factor productivity. Single factor productivity is the ratio of total
output to the quantity or number of the factor for which productivity is to be estimated.
Single factor productivity provides a distorted view of the contribution of a factor to
total production. For instance, the partial productivity of labour can be increased by
reducing the quantity of labour and increasing the quantity of capital in the production unit.
Therefore, the concept of total factor productivity (TFP) is more relevant in the context of
resource use efficiency. TFP is defined as the ratio of the weighted sum of outputs to the
weighted sum of inputs. Over the last three decades, several theories and methods of TFP
measurement have been developed. Before the mid-1990s, most studies estimated TFP
growth by the growth accounting approach (Frank et al., 2002). The approach is based on
the unrealistic assumptions of perfect competition and constant returns to scale. It assumes
that a firm operates on its production frontier, implying that the firm has 100 per cent
technical efficiency. Thus, TFP growth measured through this approach is due to technical
change, not technical efficiency change (Mawson, et al., 2003). The parametric
(stochastic frontier analysis) and non-parametric (DEA-based MPI) approaches are the other two
productivity measurement approaches; they use panel data for estimation of the productivity
of individual production units. These approaches do not assume that all production units
operate at 100 per cent technical efficiency. According to the MPI approach, TFP can
increase not only due to technical progress (shifting of the frontier) but also due to
improvement in technical efficiency (catching up). The approach has become quite popular
because: (i) it does not require price data, and is therefore suitable when price data are
unavailable or distorted; (ii) it rests on much weaker behavioural assumptions,
since it does not assume cost-minimising or revenue-maximising behaviour; and (iii) it uses
panel data and provides a decomposition of productivity change into two components:
technical change and technical efficiency change. Technical change reflects improvement
or deterioration in the performance of best-practice firms, while technical efficiency
change reflects the convergence toward, or divergence from, best practice on the part of
the remaining firms. The significance of the decomposition is that it provides information
on the source of overall productivity change in the firms.
The MPI Model
The MPI was initially introduced by Caves, Christensen and Diewert (CCD) in
1982 and was empirically applied by Fare, Grosskopf, Lindgren and Roos (FGLR) in 1992
and Fare, Grosskopf, Norris and Zhang (FGNZ) in 1994. Since then, several extended
versions of the MPI and its decomposition have been developed by researchers. A few of
them are: Ray and Desli (1997), Simar and Wilson (1998), Grifell-Tatje and Lovell
(1999), Balk (2001), Kumar and Russell (2002) and Chen and Ali (2004).
DEA analysis is static in nature, as the performance of a mill is assessed relative
to the best-practice mills in a given year. The shift of the frontier over time is not accounted for
by this assessment. To account for this dynamic shift, the MPI model is used. Since it is
also capable of decomposing productivity growth into technical efficiency change and
technical progress, it is able to shed light on the mechanism of productivity change (Ma, et
al., 2002).
The MPI is defined with distance functions. Distance functions allow us to
describe multiple input-output production technology without the need to specify a
behavioural objective such as cost minimization or profit maximization (Coelli, et al.,
1998). Both output and input distance functions can be defined. With the given input
vector, an output distance function maximizes the proportional expansion of the output
vector, whereas in the case of an input distance function, the aim is to minimise the input
vector, given the output vector.
The output-oriented Malmquist TFP change index between period t (the base
period) and period t+1 is given by
$$M_0^{t+1}(y^{t+1}, x^{t+1}, y^t, x^t) = \left[\frac{D_0^t(y^{t+1}, x^{t+1})}{D_0^t(y^t, x^t)} \times \frac{D_0^{t+1}(y^{t+1}, x^{t+1})}{D_0^{t+1}(y^t, x^t)}\right]^{1/2} \tag{1}$$
Equation (1) is the geometric mean of two TFP indices, the first estimated with respect to
period t technology and the second with respect to period t+1 technology. It can be
rewritten equivalently as:
$$M_0^{t+1}(y^{t+1}, x^{t+1}, y^t, x^t) = \frac{D_0^{t+1}(y^{t+1}, x^{t+1})}{D_0^t(y^t, x^t)} \left[\frac{D_0^t(y^{t+1}, x^{t+1})}{D_0^{t+1}(y^{t+1}, x^{t+1})} \times \frac{D_0^t(y^t, x^t)}{D_0^{t+1}(y^t, x^t)}\right]^{1/2} \tag{2}$$
where the ratio outside the square brackets in equation (2) represents technical
efficiency change (effch) and the expression in the square brackets indicates technical
change (techch). Thus, the MPI can be decomposed into change in technical efficiency
(catching up) and change in the frontier (technical progress):
$$\text{effch} = \frac{D_0^{t+1}(y^{t+1}, x^{t+1})}{D_0^t(y^t, x^t)} \tag{3}$$
$$\text{techch} = \left[\frac{D_0^t(y^{t+1}, x^{t+1})}{D_0^{t+1}(y^{t+1}, x^{t+1})} \times \frac{D_0^t(y^t, x^t)}{D_0^{t+1}(y^t, x^t)}\right]^{1/2} \tag{4}$$
Technical efficiency change (effch) measures the change in technical efficiency
between periods t and t+1 with respect to the production possibilities existing in each
period. Technical change (techch) is the geometric mean of the shifts in frontier at the
factor ratios of periods t+1 and t respectively. The value of the MPI greater than 1 means
productivity growth and a value less than 1 means deterioration in productivity. The same
is applicable to each of the components of the Malmquist Productivity Index.
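Given the four distance-function values for a production unit, equations (2)-(4) reduce to simple arithmetic, as the following sketch shows; the distance values are illustrative, not from the study's data.

```python
import math

def malmquist(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """MPI and its decomposition from four output distance-function values.

    d_a_b stands for D_0^a(y^b, x^b): the period-b observation measured
    against the period-a frontier.
    """
    effch = d_t1_t1 / d_t_t                                   # equation (3)
    techch = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t)) # equation (4)
    return effch * techch, effch, techch                      # MPI = effch * techch

# Hypothetical distances for one mill.
mpi, effch, techch = malmquist(d_t_t=0.80, d_t_t1=1.10, d_t1_t=0.70, d_t1_t1=0.90)
print(f"MPI = {mpi:.3f}, effch = {effch:.3f}, techch = {techch:.3f}")
```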
Figure-1 describes the MPI with one input (x) and one output (y) under CRS
technology and its decomposition into efficiency change and technical change. The MPI
under CRS technology indicates a rise in potential productivity as the technology frontier
shifts from period t to period t+1. Points P and R in the figure represent the input-output
combinations of a production unit (mill) in periods t and t+1, respectively. In both periods,
the unit is operating below the production possibility frontier.
[Figure-1: frontiers in periods t and t+1 in (x, y) space, with input levels x_t and x_{t+1} and output levels y_t and y_{t+1}.]
In order to calculate productivity change between periods t and t+1, we need to
solve four different LP problems: $D^t(x^t, y^t)$, $D^{t+1}(x^t, y^t)$, $D^t(x^{t+1}, y^{t+1})$, and
$D^{t+1}(x^{t+1}, y^{t+1})$. The mathematical formulations are shown in Box-1. If technical
efficiency change is to be decomposed into scale efficiency change and pure technical
efficiency change, two more LP problems have to be solved by adding the convexity
restriction to (7.8) and (7.9), that is, by estimating these two distance functions relative to
VRS technology (Coelli et al., 1998).
Box-1
Linear Programming Formulation of MPI
The MPI requires the following four LP problems:
$$[d_0^t(x^t, y^t)]^{-1} = \max_{\phi, \lambda}\; \phi \tag{7.8}$$
subject to: $-\phi\, y_r^t + \sum_{j=1}^{n} \lambda_j^t\, y_{rj}^t \geq 0$, r = 1, ..., s; $\sum_{j=1}^{n} \lambda_j^t\, x_{ij}^t \leq x_i^t$, i = 1, ..., m; $\lambda_j^t \geq 0$, j = 1, ..., n; $\phi$ unrestricted in sign.
$$[d_0^{t+1}(x^{t+1}, y^{t+1})]^{-1} = \max_{\phi, \lambda}\; \phi \tag{7.9}$$
subject to: $-\phi\, y_r^{t+1} + \sum_{j=1}^{n} \lambda_j^{t+1}\, y_{rj}^{t+1} \geq 0$, r = 1, ..., s; $\sum_{j=1}^{n} \lambda_j^{t+1}\, x_{ij}^{t+1} \leq x_i^{t+1}$, i = 1, ..., m; $\lambda_j^{t+1} \geq 0$, j = 1, ..., n; $\phi$ unrestricted in sign.
$$[d_0^t(x^{t+1}, y^{t+1})]^{-1} = \max_{\phi, \lambda}\; \phi \tag{7.10}$$
subject to: $-\phi\, y_r^{t+1} + \sum_{j=1}^{n} \lambda_j^t\, y_{rj}^t \geq 0$, r = 1, ..., s; $\sum_{j=1}^{n} \lambda_j^t\, x_{ij}^t \leq x_i^{t+1}$, i = 1, ..., m; $\lambda_j^t \geq 0$, j = 1, ..., n; $\phi$ unrestricted in sign.
$$[d_0^{t+1}(x^t, y^t)]^{-1} = \max_{\phi, \lambda}\; \phi \tag{7.11}$$
subject to: $-\phi\, y_r^t + \sum_{j=1}^{n} \lambda_j^{t+1}\, y_{rj}^{t+1} \geq 0$, r = 1, ..., s; $\sum_{j=1}^{n} \lambda_j^{t+1}\, x_{ij}^{t+1} \leq x_i^t$, i = 1, ..., m; $\lambda_j^{t+1} \geq 0$, j = 1, ..., n; $\phi$ unrestricted in sign.
In this lecture, we shall discuss two advanced topics of multivariate analysis:
discriminant analysis and factor analysis.
Discriminant Analysis
Researchers often wish to classify people or objects into two or more groups. One
might need to classify persons as buyers or non-buyers, good or bad credit risks or
superior, average or poor performers in some activity. The objective is to establish a
procedure to find the predictors that best classify subjects.
Discriminant analysis joins a nominally scaled criterion or dependent variable with
one or more independent variables that are interval or ratio scaled. Once the discriminant
equation is found, it can be used to predict the classification of a new observation. The
researchers may be interested in checking whether the predictor variables discriminate among
the groups. More specifically, it is essential to identify which independent variables are more
important than the other predictor variables. This is done by calculating a linear
function.
Discriminant function analysis, known as discriminant analysis (DA) is used to
classify cases into the values of a categorical dependent, usually a dichotomy. It is applied
when grouping variable has only two categories. Multiple discriminant analysis (MDA) is
used to classify a categorical dependent that has more than two categories. MDA is
sometimes also called discriminant factor analysis or canonical discriminant analysis.
DA shares all the usual assumptions of correlation, requiring linear and homoscedastic
relationships.
Like multiple regression, it also assumes proper model specification
(inclusion of all important independents and exclusion of extraneous variables). It also
assumes the dependent variable is a true dichotomy.
Objectives of DA
The criterion variable. This is the dependent variable, also called the grouping
variable.
The discriminant function is analogous to the multiple regression equation, but the b's are
discriminant coefficients which maximize the distance between the means of the criterion
(dependent) variable.
The eigenvalue, also called the characteristic root of each discriminant function,
reflects the ratio of importance of the dimensions which classify cases of the
dependent variable. There is one eigenvalue for each discriminant function. The
eigenvalues assess relative importance because they reflect the percents of variance
explained in the dependent variable, cumulating to 100% for all functions.
The canonical correlation, R*, is a measure of the association between the groups
formed by the dependent and the given discriminant function. When R* is zero,
there is no relation between the groups and the function. When the canonical
correlation is large, there is a high correlation between the discriminant functions
and the groups. R* is used to tell how much each function is useful in determining
group differences. An R* of 1.0 indicates that all of the variability in the
discriminant scores can be accounted for by that dimension.
The discriminant score, also called the DA score, is the value resulting from
applying a discriminant function formula to the data for a given case. The Z score
is the discriminant score for standardized data.
Unstandardized discriminant coefficients are used in the formula for making the
classifications in DA, much as b coefficients are used in regression in making
predictions. The constant plus the sum of products of the unstandardized
coefficients with the observations yields the discriminant scores. That is,
discriminant coefficients are the regression-like b coefficients in the discriminant
function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable
formed by the discriminant function, the b's are discriminant coefficients, the x's
are discriminating variables, and c is a constant. There will be no constant when
the data are standardized or are deviations from the mean. The discriminant
function coefficients are partial coefficients, reflecting the unique contribution of
each variable to the classification of the criterion variable. The standardized
discriminant coefficients, like beta weights in regression, are used to assess the
relative classifying importance of the independent variables.
ANOVA table for discriminant scores is another overall test of the DA model. It is
an F test, where a "Sig." p value < .05 means the model differentiates discriminant
scores between the groups significantly better than chance (than a model with just
the constant).
(Variable) Wilks' lambda can also be used to test which independents contribute
significantly to the discriminant function. The smaller a variable's Wilks' lambda,
the more that variable contributes to the discriminant function. Lambda varies from 0
to 1, with 0 meaning the group means differ (thus the more the variable differentiates
the groups) and 1 meaning all group means are the same. The F test of Wilks's lambda
shows which variables' contributions are significant. Wilks's lambda is sometimes
called the U statistic. In SPSS, this use of Wilks' lambda appears in the "Tests of
equality of group means" table in DA output.
Method of Estimation
DA is done by calculating a linear function of the form:
$D_i = d_0 + d_1X_1 + d_2X_2 + \ldots + d_pX_p$
where $D_i$ is the score on discriminant function i; the X's are the values of the discriminating
variables used in the analysis; the $d_i$'s are weighting coefficients; and $d_0$ is a constant.
A single discriminant equation is required if the categorization calls for two groups. If
three groups are involved, two discriminant equations are required. If the dependent
variable calls for more categories, it is necessary to calculate a separate discriminant
function for each pair of classifications in the criterion group. Here we shall describe
two-group DA.
Let $X_1$ and $X_2$ be the predictor variables, $G_1$ and $G_2$ the two groups, and $n_1$ and $n_2$ the
number of observations in $G_1$ and $G_2$, respectively.
Calculation Process
1. Find the group means of $X_1$ and $X_2$. Let $\bar{X}_1(G1)$ and $\bar{X}_2(G1)$ be the means of $X_1$ and $X_2$ in Group-1, and $\bar{X}_1(G2)$ and $\bar{X}_2(G2)$ the corresponding means in Group-2. Also find the aggregate means of $X_1$ and $X_2$.
2. In each group, find $\sum X_1^2$, $\sum X_2^2$ and $\sum X_1X_2$.
3. Define the linear composite as $D_i = d_1X_1 + d_2X_2$ and find the values of $d_1$ and $d_2$ by solving the following normal equations:
$d_1 \sum (X_1 - \bar{X}_1)^2 + d_2 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \bar{X}_1(G2) - \bar{X}_1(G1)$
$d_1 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) + d_2 \sum (X_2 - \bar{X}_2)^2 = \bar{X}_2(G2) - \bar{X}_2(G1)$
The sums of squares in the above normal equations are pooled within-group quantities, e.g.:
$\sum (X_1 - \bar{X}_1)^2 = \sum (X_1 - \bar{X}_1(G1))^2 + \sum (X_1 - \bar{X}_1(G2))^2$
$V_{WG} = \sum_{j=1}^{n_1} (S_{1j} - \bar{S}_1)^2 + \sum_{j=1}^{n_2} (S_{2j} - \bar{S}_2)^2$
where $S_{1j}$ and $S_{2j}$ are the discriminant scores for the jth set of observations in Group-1
and Group-2, respectively, and $\bar{S}_1$ and $\bar{S}_2$ are the means of the discriminant scores of
Group-1 and Group-2.
7. Find the discriminant ratio K = $V_{BG} / V_{WG}$.
This is the maximum possible ratio between the variability between groups and the
variability within groups.
Example
The director of a management school wants to carry out a discriminant analysis concerning
the effect of two factors, namely the yearly spending on infrastructure of the school (X1)
and the yearly spending on interface events of the school (X2), both in Rs lakh, on the
grading of the school by an inspection team. The data are given below:
Table-1
| Year | Grade | Expenditure on infrastructure (Rs lakh) X1 | Expenditure on interface events (Rs lakh) X2 |
| 1993-94 | Below average | 3 | 4 |
| 94-95 | Below average | 4 | 5 |
| 95-96 | Above average | 10 | 7 |
| 96-97 | Below average | 5 | 4 |
| 97-98 | Below average | 6 | 6 |
| 98-99 | Above average | 11 | 4 |
| 99-00 | Below average | 7 | 4 |
| 00-01 | Above average | 12 | 5 |
| 01-02 | Below average | 8 | 7 |
| 02-03 | Below average | 9 | 5 |
| 03-04 | Above average | 13 | 6 |
| 04-05 | Above average | 14 | 8 |
(Below = 0 and Above = 1)
Calculation process
Table-2 (Group-1: Below average, Grade = 0)
| Year | X1 | X2 |
| 1993-94 | 3 | 4 |
| 94-95 | 4 | 5 |
| 96-97 | 5 | 4 |
| 97-98 | 6 | 6 |
| 99-00 | 7 | 4 |
| 01-02 | 8 | 7 |
| 02-03 | 9 | 5 |
| Total | 42 | 35 |
| Mean | 6 | 5 |

Table-3 (Group-2: Above average, Grade = 1)
| Year | X1 | X2 |
| 95-96 | 10 | 7 |
| 98-99 | 11 | 4 |
| 00-01 | 12 | 5 |
| 03-04 | 13 | 6 |
| 04-05 | 14 | 8 |
| Total | 60 | 30 |
| Mean | 12 | 6 |
| Aggregate mean (G-1 + G-2) | 8.5 | 5.41666 |

Table-4 (Group-1)
| Year | X1 | X2 | X1² | X2² | X1X2 |
| 1993-94 | 3 | 4 | 9 | 16 | 12 |
| 94-95 | 4 | 5 | 16 | 25 | 20 |
| 96-97 | 5 | 4 | 25 | 16 | 20 |
| 97-98 | 6 | 6 | 36 | 36 | 36 |
| 99-00 | 7 | 4 | 49 | 16 | 28 |
| 01-02 | 8 | 7 | 64 | 49 | 56 |
| 02-03 | 9 | 5 | 81 | 25 | 45 |
| Total | 42 | 35 | 280 | 183 | 217 |

Table-5 (Group-2)
| Year | X1 | X2 | X1² | X2² | X1X2 |
| 95-96 | 10 | 7 | 100 | 49 | 70 |
| 98-99 | 11 | 4 | 121 | 16 | 44 |
| 00-01 | 12 | 5 | 144 | 25 | 60 |
| 03-04 | 13 | 6 | 169 | 36 | 78 |
| 04-05 | 14 | 8 | 196 | 64 | 112 |
| Total | 60 | 30 | 730 | 190 | 364 |
Table-6: Pooled within-group sums of squares and cross-products
| Sum of squares | Below (G-1) | Above (G-2) | Total |
| $\sum (X_1 - \bar{X}_1)^2 = \sum (X_1 - \bar{X}_1(G1))^2 + \sum (X_1 - \bar{X}_1(G2))^2$ | 28 | 10 | 38 |
| $\sum (X_2 - \bar{X}_2)^2 = \sum (X_2 - \bar{X}_2(G1))^2 + \sum (X_2 - \bar{X}_2(G2))^2$ | 8 | 10 | 18 |
| $\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$, summed over both groups | 7 | 4 | 11 |
Discriminant Function
$D_i = d_1X_1 + d_2X_2$
Normal equations:
$d_1 \sum (X_1 - \bar{X}_1)^2 + d_2 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \bar{X}_1(G2) - \bar{X}_1(G1)$
$d_1 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) + d_2 \sum (X_2 - \bar{X}_2)^2 = \bar{X}_2(G2) - \bar{X}_2(G1)$
38 d1 + 11 d2 = 12 - 6 = 6
11 d1 + 18 d2 = 6 - 5 = 1
By solving the equations, we get d1 = 0.17229 and d2 = -0.04973, so
$D_i = 0.17229X_1 - 0.04973X_2$
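The solution of the two normal equations can be verified numerically; the coefficient matrix and right-hand side below are taken directly from Table-6 and the group means.

```python
import numpy as np

# Normal equations: 38*d1 + 11*d2 = 6 ;  11*d1 + 18*d2 = 1
A = np.array([[38.0, 11.0],
              [11.0, 18.0]])
rhs = np.array([6.0, 1.0])
d1, d2 = np.linalg.solve(A, rhs)
print(f"d1 = {d1:.5f}, d2 = {d2:.5f}")   # d1 = 0.17229, d2 = -0.04973

# Discriminant score of the first Group-1 observation (X1 = 3, X2 = 4):
print("D(1993-94) =", round(d1 * 3 + d2 * 4, 5))   # 0.31795
```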
Computation of the discriminant ratio (K)
Mean discriminant score of each group:
Mean score for G-1 = 0.17229 $\bar{X}_1$ - 0.04973 $\bar{X}_2$ = 0.17229 × 6 - 0.04973 × 5 = 0.78509
Group-1
| Year | Discriminant score S1j | (S1j - S̄1)² |
| 1993-94 | 0.31795 | 0.218220 |
| 94-95 | 0.44051 | 0.118755 |
| 96-97 | 0.66253 | 0.015021 |
| 97-98 | 0.73536 | 0.002473 |
| 99-00 | 1.00711 | 0.049293 |
| 01-02 | 1.03021 | 0.060084 |
| 02-03 | 1.30196 | 0.267155 |
| Total | 5.49563 | 0.730981 |
| Mean (S̄1) | 0.78509 | |

Group-2
| Year | Discriminant score S2j | (S2j - S̄2)² |
| 95-96 | 1.37479 | 0.155480 |
| 98-99 | 1.69627 | 0.005304 |
| 00-01 | 1.81883 | 0.002473 |
| 03-04 | 1.94139 | 0.029684 |
| 04-05 | 2.01422 | 0.060084 |
| Total | 8.84550 | 0.253025 |
| Mean (S̄2) | 1.76910 | |

Aggregate mean of discriminant scores = 1.195094
$V_{WG}$ = 0.730981 + 0.253025 = 0.984006
FACTOR ANALYSIS
Introduction
Factor analysis can simultaneously manage over a hundred variables, compensate
for random error and invalidity, and disentangle complex interrelationships into their
major and distinct regularities. It takes thousands of measurements and qualitative
observations and resolves them into distinct patterns of occurrence. It makes explicit and
more precise the building of fact-linkages going on continuously in the human mind. It is a
means by which the regularity and order in phenomena can be discerned.
What is Factor analysis?
Factor analysis refers to a variety of statistical techniques whose common
objective is to represent a set of variables in terms of a smaller number of hypothetical
variables (Kim & Mueller, 1984: 9). It is a technique of data reduction. When a researcher
deals with a large number of variables and does not know exactly which of them are
exogenous and which endogenous, this technique may be of great use for meaningful
analysis and interpretation of data. The term factor analysis was first introduced by
Thurstone in 1931.
Factor analysis assumes that the observed variables are linear combinations of
some underlying (hypothetical or unobservable) factors. Some of these factors are
assumed to be common to two or more variables and some are assumed to be unique to
each variable. The unique factors are assumed to be orthogonal to each other. They do not
contribute to the co-variation among the observed variables (Kim & Mueller, 1983: 8). As
in other multivariate analysis, in factor analysis too, we are concerned with the variance.
We want to know how big it is and where it is. The purpose of this technique is to examine
which variables have what amount of variance in common.
Many statistical methods are used to study the relation between independent and
dependent variables. Factor analysis is different; it is used to study the patterns of
relationship among many dependent variables, with the goal of discovering something
about the nature of the independent variables that affect them, even though those
independent variables were not measured directly. Thus answers obtained by factor
analysis are necessarily more hypothetical and tentative than is true when independent
variables are observed directly.
A factor analysis usually begins with a correlation matrix. It can also use covariances. Without getting deeply into the mathematics, we can say that factor analysis
attempts to express each variable as the sum of common and unique portions. The common
portions of all the variables are by definition fully explained by the common factors, and
the unique portions are ideally perfectly uncorrelated with each other. The degree to which
a given data set fits this condition can be judged from an analysis of what is usually called
the "residual correlation matrix".
A typical factor analysis suggests answers to four major questions:
1. How many different factors are needed to explain the pattern of relationships
among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?
4. The initially extracted factors are rarely interpretable. In order to get meaningful
results from the initially extracted common factors, the next step is the rotation of
these factors. The purpose of rotation is to achieve the simplest possible factor
structure. The method of rotation cannot improve the degree of fit between the data and
the factor structure; it makes the results interpretable. There are several methods of
rotation. In orthogonal rotation, three methods are applied: Quartimax, Varimax, and
Equimax; in oblique rotation, two methods are used: Reference Axes and Primary
Pattern Matrix. According to Harman (1968), the varimax solution seems to be the
best parsimonious analytical solution.
5. Lastly, for interpretation and analysis of factors, the variables with the highest factor
loadings (weights) are taken into account.
How many Factors to Extract?
Note that as we extract consecutive factors, they account for less and less
variability. The decision of when to stop extracting factors basically depends on when
there is only very little "random" variability left. Kaiser's criterion, retaining factors
with eigenvalues greater than 1, can be adopted for identification of factors. This
criterion, proposed by Kaiser (1960), is probably the one most widely used. Another
method is the scree test, first proposed by Cattell (1966): the eigenvalues are plotted in
a simple line plot, and Cattell suggests finding the place where the smooth decrease of
eigenvalues appears to level off to the right of the plot. According to this criterion, we
would probably retain 2 or 3 factors in our example.
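A sketch of how Kaiser's criterion and the inputs to a scree plot could be obtained in Python; the correlation matrix below is hypothetical.

```python
import numpy as np

# Hypothetical correlation matrix for four observed variables.
R = np.array([[1.00, 0.62, 0.55, 0.10],
              [0.62, 1.00, 0.48, 0.08],
              [0.55, 0.48, 1.00, 0.12],
              [0.10, 0.08, 0.12, 1.00]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, largest first
print("eigenvalues:", np.round(eigvals, 3))
print("factors retained by Kaiser's criterion:", int(np.sum(eigvals > 1.0)))
# For a scree test, plot eigvals against factor number and look for the
# point where the decline levels off (e.g., with matplotlib's plt.plot).
```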
| Variable | Un-rotated factor I | Un-rotated factor II | h² | Rotated factor I | Rotated factor II |
| A | 0.70 | -0.40 | 0.65 | 0.79 | 0.15 |
| B | 0.60 | -0.50 | 0.61 | 0.75 | 0.03 |
| C | 0.60 | -0.35 | 0.48 | 0.68 | 0.10 |
| D | 0.50 | 0.50 | 0.50 | 0.06 | 0.70 |
| E | 0.60 | 0.50 | 0.61 | 0.13 | 0.77 |
| F | 0.60 | 0.60 | 0.72 | 0.07 | 0.85 |
| Eigenvalue | 2.18 | 1.39 | | | |
| % of variance | 36.30 | 23.20 | | | |
| Cumulative % | 36.30 | 59.50 | | | |
The values in this table are correlation coefficients between the factor and the variable. For
instance, 0.70 is the r between variable A and Factor I. These correlations are called
loadings. Eigenvalues are the sums of squared factor loadings on a factor. For example, the
eigenvalue for factor I is 0.70² + 0.60² + 0.60² + 0.50² + 0.60² + 0.60². When divided by
the number of variables, an eigenvalue yields an estimate of the amount of total variance
explained by the factor. Communalities (h²) measure the variance in each variable that is
explained by the two factors; a communality is the sum of the squared factor loadings of
all the factors for a variable. For instance, for variable A the communality is
0.70² + (-0.40)² = 0.65, indicating that 65 per cent of
the variance in variable A is statistically explained in terms of factor I and factor II.
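These computations can be reproduced directly from the loadings in the table above; the short sketch below is illustrative only.

```python
import numpy as np

# Un-rotated loadings from the table (rows = variables A..F, columns = factors I, II).
L = np.array([[0.70, -0.40],
              [0.60, -0.50],
              [0.60, -0.35],
              [0.50,  0.50],
              [0.60,  0.50],
              [0.60,  0.60]])

eigenvalues = (L ** 2).sum(axis=0)      # column sums of squared loadings
communalities = (L ** 2).sum(axis=1)    # row sums of squared loadings, h^2
pct_variance = 100.0 * eigenvalues / L.shape[0]
print(np.round(eigenvalues, 2))         # [2.18 1.39]
print(np.round(communalities, 2))       # [0.65 0.61 0.48 0.5  0.61 0.72]
print(np.round(pct_variance, 1))        # [36.3 23.2]
```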
Un-rotated factor loadings do not provide meaningful results. They are difficult to
interpret. What one would like to find is some pattern in which factor I would be heavily
loaded on some variables and factor II on others. Such a condition would suggest rather
pure constructs underlying each factor. You attempt to secure this less ambiguous
condition between factors and variables by rotation. This procedure can be conducted
through orthogonal method. Rotated factor loadings are given in the table. This shows that
the measurement from six variables may be summarized by two underlying factors.
The interpretation of factor loadings is largely subjective. There is no way to
calculate the meanings of factors; they are what one sees in them. For this reason, factor
analysis is largely used for exploration. One can detect patterns in latent variables,
discover new concepts and reduce data.
MANOVA
Analysis of variance is a special case of the regression model, generally used to
analyse data collected through experimentation. Multivariate analysis of variance
(MANOVA) examines the relationship between several dependent and independent
variables. Whereas ANOVA assesses the differences between groups, MANOVA examines
the dependence relationship between a set of variables across a set of groups. It is a
technique which determines the effects of independent categorical variables on multiple
continuous dependent variables. It is usually used to compare several groups with respect
to multiple continuous variables. The main distinction between MANOVA and ANOVA
is that several dependent variables are considered in MANOVA.
Classification of MANOVA
1. One-way MANOVA: similar to the one-way ANOVA, it analyses the variance
between one independent variable and multiple dependent variables.
2. Two-way MANOVA
Assumptions of MANOVA
1. Normal Distribution
The dependent variables should be normally distributed within groups. Overall, the F test
is robust to non-normality, if the non-normality is caused by skewness rather than by
outliers. Tests for outliers should be run before performing a MANOVA, and outliers
should be transformed or removed.
2. Linearity
It assumes that there are linear relationships among all pairs of dependent variables, all
pairs of covariates, and all dependent variable-covariate pairs in each cell.
88
3. Homogeneity of Variances
Homogeneity of variances assumes that the dependent variables exhibit equal levels of
variance across the range of predictor variables. Homoscedasticity can be examined
graphically or by means of a number of statistical tests.
4. Homogeneity of Variances and Covariances
In multivariate designs, with multiple dependent measures, the homogeneity of variances
assumption described earlier also applies. However, since there are multiple dependent
variables, it is also required that their intercorrelations (covariances) are homogeneous
across the cells of the design.
5. Multicollinearity and Singularity
When correlations among dependent variables are high, problem of multicollinerarity and
singularity exists. Multicollinearity when the relationship between pairs of variables is
high (r>.90). Singularity a variable is redundant; if it is a combination of two or more of
the other variables.
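A minimal sketch of a one-way MANOVA using statsmodels; the data frame below is hypothetical and merely mirrors the structure of the organ-donor example that follows.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: one two-level factor and three continuous dependent variables.
df = pd.DataFrame({
    "donor":    ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
    "exposure": [12, 15, 8, 9, 14, 7, 11, 10],
    "attitude": [80, 85, 95, 99, 82, 97, 88, 94],
    "feelings": [27, 29, 31, 32, 28, 30, 26, 33],
})

mv = MANOVA.from_formula("exposure + attitude + feelings ~ donor", data=df)
print(mv.mv_test())  # Pillai's trace, Wilks' lambda, Hotelling's trace, Roy's root
```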
Example:
A social scientist wished to compare those respondents who had lodged an organ donor
card with those who had not. Three hundred and eighty-eight new drivers completed a
questionnaire that measured their attitudes towards organ donation, their feelings about
organ donation and their previous exposure to the issue. It was hypothesized that individuals
who agreed to be donors would have more positive attitudes towards organ donation, more
positive feelings towards organ donation and greater previous exposure to the issues.
Therefore, the independent variable was whether a donor card had been signed, and the
dependent variables were attitudes towards organ donation, feelings towards organ
donation and previous exposure to organ donation. Attitudes and feelings are measured on
traditional scales with a Likert-scale response format. Exposure was measured in terms of
media exposure and personal experience. Conceptually and theoretically these dependent
variables were believed to be related, and so MANOVA was the analysis of choice.
Complete data are available on www.johmwiley.com.au/highered/spssv
Results
Between-Subjects Factors
| | Value Label | N |
| signed donor card | yes | 189 |
| | no | 188 |
Descriptive Statistics
| Dependent variable | signed donor card | Mean | Std. Deviation | N |
| exposure to donation issues | yes | 11.78 | 15.946 | 189 |
| | no | 8.69 | 14.752 | 188 |
| | Total | 10.24 | 15.420 | 377 |
| attitude towards organ donation | yes | 85.03 | 47.899 | 189 |
| | no | 96.48 | 62.916 | 188 |
| | Total | 90.74 | 56.114 | 377 |
| feelings towards organ donation | yes | 28.07 | 9.277 | 189 |
| | no | 31.20 | 8.948 | 188 |
| | Total | 29.63 | 9.236 | 377 |
Box's Test of Equality of Covariance Matrices
| Box's M | 19.260 |
| F | 3.182 |
| df1 | 6 |
| df2 | 1018790.282 |
| Sig. | .004 |
Multivariate Tests
| Effect | Statistic | Value | F | Hypothesis df | Error df | Sig. |
| Intercept | Pillai's Trace | .935 | 1790.688 | 3.000 | 373.000 | .000 |
| | Wilks' Lambda | .065 | 1790.688 | 3.000 | 373.000 | .000 |
| | Hotelling's Trace | 14.402 | 1790.688 | 3.000 | 373.000 | .000 |
| | Roy's Largest Root | 14.402 | 1790.688 | 3.000 | 373.000 | .000 |
| donor | Pillai's Trace | .033 | 4.255 | 3.000 | 373.000 | .006 |
| | Wilks' Lambda | .967 | 4.255 | 3.000 | 373.000 | .006 |
| | Hotelling's Trace | .034 | 4.255 | 3.000 | 373.000 | .006 |
| | Roy's Largest Root | .034 | 4.255 | 3.000 | 373.000 | .006 |
a. Exact statistic
Levene's Test of Equality of Error Variances
| Dependent variable | F | df1 | df2 | Sig. |
| exposure to donation issues | 2.936 | 1 | 375 | .087 |
| attitude towards organ donation | 15.346 | 1 | 375 | .000 |
| feelings towards organ donation | 1.284 | 1 | 375 | .258 |
Tests the null hypothesis that the error variance of the dependent variable is
equal across groups.
a. Design: Intercept + donor
Tests of Between-Subjects Effects
| Source | Dependent Variable | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | exposure to donation issues | 903.925 | 1 | 903.925 | 3.830 | .051 |
| | attitude towards organ donation | 12372.705 | 1 | 12372.705 | 3.960 | .047 |
| | feelings towards organ donation | 922.187 | 1 | 922.187 | 11.100 | .001 |
| Intercept | exposure to donation issues | 39489.506 | 1 | 39489.506 | 167.331 | .000 |
| | attitude towards organ donation | 3105144.376 | 1 | 3105144.376 | 993.910 | .000 |
| | feelings towards organ donation | 331042.346 | 1 | 331042.346 | 3984.772 | .000 |
| donor | exposure to donation issues | 903.925 | 1 | 903.925 | 3.830 | .051 |
| | attitude towards organ donation | 12372.705 | 1 | 12372.705 | 3.960 | .047 |
| | feelings towards organ donation | 922.187 | 1 | 922.187 | 11.100 | .001 |
| Error | exposure to donation issues | 88498.590 | 375 | 235.996 | | |
| | attitude towards organ donation | 1171563.820 | 375 | 3124.170 | | |
| | feelings towards organ donation | 31153.824 | 375 | 83.077 | | |
| Total | exposure to donation issues | 128924.000 | 377 | | | |
| | attitude towards organ donation | 4288063.000 | 377 | | | |
| | feelings towards organ donation | 363028.000 | 377 | | | |
| Corrected Total | exposure to donation issues | 89402.515 | 376 | | | |
| | attitude towards organ donation | 1183936.525 | 376 | | | |
| | feelings towards organ donation | 32076.011 | 376 | | | |
Estimated Marginal Means: signed donor card
| Dependent Variable | card | Mean | Std. Error | Lower Bound | Upper Bound |
| exposure to donation issues | yes | 11.783 | 1.117 | 9.586 | 13.980 |
| | no | 8.686 | 1.120 | 6.483 | 10.889 |
| attitude towards organ donation | yes | 85.026 | 4.066 | 77.032 | 93.021 |
| | no | 96.484 | 4.077 | 88.468 | 104.500 |
| feelings towards organ donation | yes | 28.069 | .663 | 26.765 | 29.372 |
| | no | 31.197 | .665 | 29.890 | 32.504 |
92
93
Reports are rarely written in linear order. The Conclusions, the Introduction and, finally,
the Abstract are often written last, as these are the sections most likely to be read.
For every 1,000 readers who see your title, perhaps 100 will read the abstract, perhaps 10
will read some of the main report (conclusions, some results, etc.), and at most 1 will
follow it all the way through.
A middle section, such as the methods or system design, may be a good starting point.
Writing notes for the introduction, some background theory or a review of previous
studies may help you to clarify the focus of your report.
Writing specifications
Use a 10- or 12-point font.
The most acceptable fonts: Arial, Times New Roman (the old reliable), Verdana, Lucida.
Unacceptable fonts: Broadway, Brush Script, Chiller, Courier, Freestyle Script, Gigi, Old
English Text, Playbill, etc.
Your report should be in pristine condition when it is turned in: no frayed edges or coffee
stains (front or back).
Section/point identification systems
The section/point identification system represents important choices made by the writer
regarding the relative importance of the sections in the report and the relatedness of
information within sections. It therefore plays a very important role in communicating
meaning to the reader. The report presents meaning and information in two complementary
and equivalent ways:
- the meaning represented by the words, thought, research and information;
- the meaning represented by the layout.
Choosing the Layout System
The writer chooses one of the two layout systems below. Once a system is chosen, the
writer must present it consistently throughout the report.
The decimal numbering system
First level (of importance/generality), also termed the A heading:
    1.0   2.0   3.0   4.0   5.0
N.B. The 'point-zero' is not always used in decimal numbering systems.
Second level, also termed the B heading:
    1.1   2.1   3.1   4.1   5.1
Third level, also termed the C heading:
    1.1.1   2.1.1   3.1.1   4.1.1   5.1.1
1.0 ________________________________
1.1 _______________________________
1.2 _______________________________
1.2.1 ________________________
1.2.2 ________________________
1.2.2.1 _______________
1.2.2.2 _______________
2.0 ________________________________
The number-letter system
(still encountered, but becoming less commonly used)
First level (of importance/generality), the A heading:
    I   II   III   IV   V   VI   VII
Second level, the B heading:
    A   B   C   ...
Third level, the C heading:
    1   2   3   ...
Fourth level, the D heading:
    (a)   (b)   (c)   (d)   (e)   (f)   (g)
Fifth level, the E heading:
    (i)   (ii)   (iii)   (iv)   (v)   (vi)   (vii)

II ___________________________________
    A ________________________________
Table of contents
Your report should include a table of contents if it is longer than about 5-10 pages. A
table of contents allows the reader to gain a very brief but complete overview of your
entire report, from aims to conclusions.
Introduction
The introduction helps to place your project in its context (whether that context is
background information or your purpose in writing is up to you).
Consider the following examples; they represent two extremes that writers can take in
beginning their introductions.
What is the problem with this sentence as an opening to an introduction?
    The universe has been expanding from the very moment that it was born.
One way the sentence above might be rewritten is:
    Recent studies suggest that the universe will continue expanding forever and may pick
    up speed over time.
The rewritten sentence establishes the report's context within recent studies concerning
a specific theory related to the expansion of the universe. This context is much more
specific than that of the original sentence.
The introduction may carry out the following roles:
Gives some background to the study and sets the scene for the report.
Explains briefly what you will do and why the study is being carried out.
Explains briefly how the report is structured (signposting).
Main Body
Text with headings and sub-headings. In general, the body of the research report will
include three distinct sections:
A section on theories, models, and your own hypothesis.
A section in which you discuss the materials and methods you used in your research.
A section in which you present and interpret the results of your research.
The headings should be self-explanatory. The main body of the report needs to be clear
and concise and to follow a logical order.
Figures and tables must be referred to in the body of the text and need to have clear
captions. Label figures at the bottom and tables at the top, in numerical order. Each
figure should be capable of being understood on its own, using the caption as the only
reference.
The same structure serves research articles, formal reports and scientific papers.
In the theory section, define and explain your hypothesis and the theories and models you
used to develop it. Define and explain competing hypotheses, theories, and models,
including their strengths and weaknesses. Compare and contrast the specific points where
they agree or disagree.
The following questions are good ones to work through:
What do I expect this experiment to reveal? Why?
How does my hypothesis directly answer the question posed by the problem?
How does the hypothesis fit in with other hypotheses or more general theory? How will
my work challenge or support the work of others?
What are alternative views to this theory? What are the strengths and weaknesses of
those views?
On what literature did I, or can I, base my explanation?
What sequence of events did you follow as you handled the subjects/materials or as you
recorded data?
The most important general rule is that tables and figures should supplement, rather than
simply repeat, information in the report. You should never include a table or figure
simply to include one; this is redundant and wastes your readers' time.
Include a concise title; it is a good idea to make the most important feature of the
data the title of the figure.
Use legends and clear, concise, descriptive titles for tables and figures.
Ensure all axes of graphs are labeled and that units are identified in all tables and
figures (see the sketch below).
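A minimal matplotlib sketch (not from the source; the data are hypothetical) of a figure that follows these rules, with labeled axes, units, a legend and a descriptive numbered title:

    import matplotlib.pyplot as plt

    ages = [8, 11, 15]                    # hypothetical data for illustration only
    low_bw  = [92, 94, 95]                # cognitive scores, lower birth-weight group
    high_bw = [98, 100, 101]              # cognitive scores, higher birth-weight group

    fig, ax = plt.subplots()
    ax.plot(ages, low_bw,  marker="o", label="Birth weight 2.5-3.0 kg")
    ax.plot(ages, high_bw, marker="s", label="Birth weight > 3.5 kg")
    ax.set_xlabel("Age at testing (years)")            # axis label with units
    ax.set_ylabel("Cognitive test score (points)")     # axis label with units
    # descriptive title; in the printed report the caption would sit below the figure
    ax.set_title("Figure 1: Cognitive scores by age and birth-weight group")
    ax.legend()
    plt.show()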
Consider how the data address the research problem or hypothesis outlined in the
Introduction.
Proceed from the most general features of the data to more specific results.
Discuss what can be inferred from the data as they relate to other research and
scientific concepts.
Compare with other studies and draw conclusions based on your findings.
Do not be discouraged if your results are inadequate, negative, or not consistent with
earlier studies or with your own hypothesis. Do not try to defend your research or
minimize the seriousness of the limitation in your interpretation; instead, focus on the
limitation only as it affects the research and try to account for it.
For example (from a study of birth weight and cognition):
The associations between birth weight and cognitive function at ages 8, 11, and 15 are
evident across the normal birth-weight range (> 2.5 kg) and so are not accounted for
exclusively by low birth weight. Birth weight is also associated with educational
attainment, suggesting that the association between birth weight and cognition may have
functional implications.
Conclusions
The conclusion is important because it is your last chance to convey the significance and
meaning of your research to your reader, by concisely summarizing your findings and
generalizing their importance.
The conclusions you draw are opinions based on the evidence presented in the body of
your report; because they are opinions, you should not tell readers what to do or what
action they should take.
Be sure to use language that distinguishes conclusions from inferences. Use phrases like
'This research demonstrates ...' to present your conclusions, and phrases like 'This
research suggests ...' or 'This research implies ...' to discuss implications.
Make sure that readers can tell your conclusions from the implications of those
conclusions, and do not claim too much for your research when discussing implications.
You can use phrases such as 'Under the following circumstances', 'In most instances', or
'In these specific cases' to warn readers that they should not over-generalize your
conclusions.
You might also raise unanswered questions and discuss ambiguous data in your conclusion.
Raising questions or discussing ambiguous data does not mean that your own work is
incomplete or faulty; rather, it connects your research to the larger work of science and
parallels the introduction, in which you also raised questions.
The following example is taken from a text that evaluated hearing and speech development
following the implantation of a cochlear implant. The authors of 'Beginning To Talk At 20
Months: Early Vocal Development In a Young Cochlear Implant Recipient', published in the
Journal of Speech, Language, and Hearing Research, titled their conclusion 'Summary and
Caution'. Using this title calls readers' attention to the limitations of their research.
Recommendations
This section appears in a report when the results and conclusions indicate that further
work needs to be done or when you have considered several ways to resolve a problem or
improve a situation and want to determine which one is best.
This gives you another opportunity to demonstrate how your research fits within the larger
project of science.
It also demonstrates that you fully understand the importance and implications of your
research, as you suggest ways that it could continue to be developed.
References
Reference sections are important because:
Like the sections on the procedure you used to gather data, they allow other
researchers to build on or to duplicate your research.
Without references, readers will not be able to tell whether the information that
you present is credible, and they will not be able to find it for themselves.
Reference sections also allow you to refer to other researchers' work without
reviewing that work in detail: you can refer readers to your reference list for
more information.
It is best to compile your own reference list containing a variety of information. This
will save you from having to track down pieces of information you neglected to note,
should they be specifically requested after you have filed a source, returned it to the
library, or misplaced it.
Information to include in your reference list
The reference list is placed at the end of the report and includes only references cited
in the text.
The author's surname is placed first, immediately followed by the year of publication;
this date is often placed in brackets.
The title of the publication appears after the date, followed by the place of
publication and then the publisher (some sources say publisher first, then place of
publication).
The important thing is to check for any special requirements or, if there are none,
to be consistent.
1. The Harvard (author-date) system is the one usually encountered in the sciences
and social sciences.
2. Notice that the titles of books, journals and other major works appear in italics (or
are underlined when handwritten), while the titles of articles and smaller works
which are found in larger works are placed in (usually single) quotation marks.
Harvard System: Examples
BOOK
Begon, M., Harper, J.L. & Townsend, C.R. (1990). Ecology: Individuals, Populations and
Communities. Oxford: Blackwell Scientific Publications.
JOURNAL ARTICLE
Hirschberger, P. & Bauer, T. (1994). The coprophagous insect fauna and its influence on
dung disappearance. Pedobiologia, 38, 375-384.
BOOK CHAPTER
Holt, R.D. (1993). Ecology at the mesoscale: the influence of regional processes on local
communities. In R. E. Ricklefs & D. Schluter (Eds.), Species Diversity in Ecological
Communities. Chicago: University of Chicago Press, 77-88.
INTERNET SITE
Crook, A. C. & Finn, J. (2002). STARS: Scientific Training by Assignment for Research
Students [online]. Available from: http://www.ucc.ie/research/stars [Accessed 16th
November 2004].
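For completeness, the matching in-text (author-date) forms would look like this; the sentence is illustrative, only the sources are from the list above:
Competition strongly structures such communities (Begon, Harper & Townsend, 1990), although regional processes can dominate local ones (Holt, 1993).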
Quotations
When the exact words of a writer are quoted, they must be reproduced exactly in
all respects: wording, spelling, punctuation, capitalisation and paragraphing.
Quotations should be carefully selected and sparingly used, as too many quotations
can lead to a poorly integrated argument.
Appendices
You should place information in an appendix when it is relevant to your subject but needs
to be kept separate from the main body of the report, to avoid interrupting the report's
line of development.
An appendix should include only one set of data; additional appendices are acceptable if
you need to include several sets of data that do not belong in the same appendix.
Do not place the appendices in order of their importance to you, but rather in the order
in which you referred to them in your report. Paginate each appendix separately, so that
the first page of each appendix begins with 1.
Definitions
A good general rule is to define all terms that you are not completely sure your audience
will understand the same way you do. Words to focus on are those key to your research,
those relatively new or unfamiliar, and those that readers could not look up for
themselves in a standard dictionary.
Jargon
You should take your audience into consideration when deciding whether to include jargon
in your writing. Consider their vocabulary and whether they will be familiar with a word
or phrase before you use it. Do not simply include jargon without taking your audience
into account: jargon can come between your writing and your reader, and readers who do
not understand it may see its use as impolite.
Writing Equations
1. Place equations on a separate line and number them consecutively, with the number in
parentheses at the right margin.
2. Do not use punctuation after the equation, but punctuate the words that introduce
equations as you would the words of any other sentence.
3. Refer to an equation in the body of the text by its number in parentheses, as in the
sketch below.
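A minimal LaTeX sketch (illustrative; the equation itself is a placeholder) showing all three rules at once, since LaTeX numbers displayed equations at the right margin automatically:

    \documentclass{article}
    \begin{document}
    The fitted model is
    \begin{equation}
      y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
      \label{eq:model}
    \end{equation}
    where, as equation~(\ref{eq:model}) shows, no punctuation follows the display.
    \end{document}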
Reports: Writing the Results
Present the results (data) from the experiment or model. Do not just include figures and
tables; ensure that your text provides:
a commentary guiding the reader through the figures and tables, giving their location and
a summary of their purpose in the report, e.g. 'Figure 3.2 shows how the incidence of
malaria increases when ...';
statements highlighting key trends.
Check that figures are clearly presented [see slide 10].
Remember that the reader will look at the figures and tables only if directed to do so in
the text.
Editing Your Report with a Critical Friend
Murphy's Law of Errata Detection: 'The very first person to see your mistake is always
the last person you want to know about it.'
Reading your own work, you don't always spot errors, because you may read your draft the
way you want it to sound. Work with a critical friend (someone who gives honest advice),
perhaps from outside your field. As soon as you sit in front of the paper with your
critical friend, your perspective may change from that of the writer to that of a
potential reader.
Don't over-edit the first part you write.
Try using the editing questions provided by the Purdue University Online Writing Lab
(next slide).
Style & Vocabulary
Style:
Formal and objective.
No 'I' or 'you'; no contracted forms such as 'can't'.
Avoid direct questions; use standard negatives.
No colloquial English: 'lots', 'stuff', 'things'.
Vocabulary:
Choose formal verbs, e.g. 'investigated' (from Latin or Greek roots) rather than 'looked
into' (avoid two-part verbs).
Use precise and often abstract vocabulary.
A Few Grammar Points
'It's' and 'its': don't use 'it's', which is a spoken contraction of 'it is'/'it has';
'its' is a possessive adjective.
To Sum Up
Use and evaluate all the data you report, and do not be discouraged if your results
differ from published studies or from what you expected.
Justify all tables and figures by discussing their content and labeling them clearly.
Be creative in your presentation of data, your analysis, and your interpretation of
data; play around with different variations before completing your report.
Do not force conclusions from your data, or fudge data by omitting whatever does not
support pre-conceived conclusions.
Make sure all calculations and analyses are relevant to the hypotheses you are testing
and the overall objectives of the study.
Justify your ideas and conclusions with data, facts, background literature and sound
reasoning.
Keep the different sections of the report discrete, i.e. methods in the methods section,
results in the results section, and leave discussion and interpretation of those results
for the discussion section.
Plan your writing: organize your thoughts and data, and sketch the report before
actually writing. This will help maximize your time efficiency and lead to a concise,
well-structured report.
List of Participants
1. meenakshi_gndu@yahoo.com
2. simrandeepbhatti@gmail.com
3. nitingirdharwal79@yahoo.com
4. nehash@rediffmail.com
5. ankurr.bhatnagar@gmail.com
6. jitendra.geu@gmail.com
7. sir.sachin.ghai@gmail.com
8. mukeshsehrawat2007@rediffmail.com
9. Ms. Deepshikha, Sr. Lecturer, Management Studies Department, Dehradun Institute of
Technology, Mussoorie Diversion Road, Village Makkawala, DEHRADUN 248 001
(UTTARAKHAND); Mobile: 9837135331; sk_shrma@rediffmail.com
10. rhutam@yahoo.com
11. singh_nutan9@yahoo.co.in
12. mehtaaishwarya30@gmail.com
13. khannauday77@gmail.com
14. sarvendu161979@yahoo.com
15. sudhiraman@rediffmail.com
16. anushatayal@gmail.com
17. priya_grover0123@yahoo.co.in
18. angrishgagan@gmail.com
19. saurabhjoshi666@rediffmail.com
20. roy.shubhagata@gmail.com
21. zohair_ams@yahoo.co.in
22. rajiv_sindwani@ymcaie.ac.in
23. suchitra_singh@yahoo.com
24. dhawan_sumana@yahoo.co.in
25. rg91@rediffmail.com
26. Ms. Ritu, Lecturer, M.B.A. Department, S. D. Institute of Professional Studies,
Mandi Samiti Road, MUZAFFARNAGAR 251 001 (U.P.); Mobile: 9897751213;
kangna_ritu@yahoo.co.in
27. Ms. Shalu, Lecturer, Management Studies Department, Dr. K. N. Modi Institute of
Engg. & Technology, Opposite Satish Park, Kapda Mill, Modinagar, GHAZIABAD (U.P.);
Mobile: 9808463239; shalu_singh16@yahoo.com
28. sdcmsmzn@gmail.com
29. nirdosh_agarwal@rediffmail.com
30. brjshsngh@yahoo.com
31. sdcmsmzn@gmail.com
32. partha.saikia32@gmail.com
33. Mr. K. G. Arora, Professor, Management Department, S. D. Institute of Professional
Studies, Mandi Samiti Road, MUZAFFARNAGAR 251 001 (U.P.); Mobile: 9411276769;
kgarora@rediffmail.com
34. vjy.verma@gmail.com
35. shailja.ag@rediffmail.com
36. Mr. Kuldeep, Lecturer, Management Studies Department, Disha Institute of Science &
Technology, DHAMPUR, Distt. Bijnor (U.P.); Mobile: 9927069856;
goswamikuldeep13@gmail.com
37. Mr. Pankaj Kumar, Lecturer, Management Studies Department (MBA), Trident ET Group
of Institutions, Morta, Delhi Meerut Road, GHAZIABAD (U.P.); Mobile: 9911056373;
pankajkumar_13@yahoo.co.in
38. jhaveri_garima@yahoo.com
39. Ms. Veeralakshmi B., Asstt. Prof., MBA Department, College of Engineering Roorkee
School of Mgmt., Vardhmanpuram, 7th Km. Roorkee Haridwar Road, ROORKEE 247 667
(UTTARAKHAND); Mobile: 9997239017; coe-roorkee@vsnl.com
40. gangwarveer.aimca@gmail.com
41. vishal.hvbs@gmail.com
42. mrinal.verma3@gmail.com
43. sachindragupta1@rediffmail.com
44. mailme.reena84@rediffmail.com
45. dranjalichauhan@gmail.com
46. rajeshmholmukhe@hotmail.com