Research Methods in Computing Notes
MURANG’A UNIVERSITY OF TECHNOLOGY
SCS301: RESEARCH METHODS IN COMPUTING
Definition of research
Different authors have defined research as follows:
Research is carrying out a diligent inquiry or a critical examination of a given
phenomenon.
Research involves a critical analysis of existing conclusions or theories with regard to
newly discovered facts i.e. it’s a continued search for new knowledge and
understanding of the world around us.
Research is a process of arriving at effective solutions to problems through systematic
collection, analysis and interpretation of data.
Purpose of Research
To discover new knowledge
To describe a phenomenon
To enable prediction.
To enable control i.e. the ability to regulate the phenomenon under study.
To enable explanation of a phenomenon i.e. accurate observation and measurement of
a given phenomenon.
To enable theory development and validation of existing theories. Theory
development involves formulating concepts, laws and generalizations about a given
phenomenon.
Research provides one with the knowledge and skills needed for the fast-paced
decision-making environment
Sources of Knowledge
Research
Experience: Empiricists attempt to describe, explain, and make predictions through
observation.
Tradition: Rationalists believe all knowledge can be deduced from known laws or
basic truths of nature
Authority: Authorities serve as important sources of knowledge, but should be judged on
integrity and willingness to present a balanced case.
Intuition: it is the perception, explanation or insight into phenomena by instinct.
The Value of Acquiring Research Skills
To gather more information before selecting a course of action
To do a high-level research study
To understand research design
To evaluate and resolve a current management dilemma
To establish a career as a research specialist
Components of research
1. Identification of the research area and topic.
2. Statement of the problem.
3. Literature review.
4. Methodology design
5. Sampling frame and sampling techniques.
6. Data collection tools, design and techniques.
7. Data analysis methods.
8. Report writing techniques.
TYPES OF RESEARCH
Different authors have classified research into various categories.
Qualitative research
It includes designs, techniques and measures that do not produce discrete numerical data.
Qualitative data can be collected through direct observation, participant observation or
interview method. Qualitative research includes an “array of interpretive techniques
which seek to describe, decode, translate and otherwise come to terms with the meaning,
not the frequency, of certain more or less naturally occurring phenomena in the social
world”. Qualitative research aims to achieve an in-depth understanding of a situation.
Qualitative research is designed to tell the researcher how (process) and why (meaning)
things happen as they do. Qualitative techniques are used at both the data collection and
data analysis stages of a research project. At the data collection stage, the array of
techniques includes focus groups, individual depth interviews, case studies, ethnography,
grounded theory, action research and observation. During analysis, the qualitative
researcher uses content analysis of written or recorded materials drawn from personal
expressions by participants and behavioural observations.
Aspect                    | Qualitative                                | Quantitative
--------------------------+--------------------------------------------+-------------------------------------------
Focus of research         | Understand and interpret                   | Describe, explain and predict
Researcher involvement    | High, researcher is participant or         | Limited, controlled to prevent bias
                          | catalyst                                   |
Research purpose          | In-depth understanding: theory building    | Describe or predict: build and test theory
Sample design             | Non-probabilistic: purposive               | Probabilistic
Research design           | May evolve or adjust during the course     | Determined before commencing the project
                          | of the project                             |
                          | Often uses multiple methods                | Uses single method or mixed methods
                          | simultaneously or sequentially             |
                          | Consistency is not expected                | Consistency is critical
                          | Involves longitudinal approach             | Involves either a cross-sectional or a
                          |                                            | longitudinal approach
Participant preparation   | Pre-tasking is common                      | No preparation desired to avoid biasing
                          |                                            | the participant
Data type and preparation | Verbal or pictorial descriptions           | Verbal descriptions
                          | Reduced to verbal codes                    | Reduced to numerical codes for
                          |                                            | computerized analysis
Quantitative research
It includes designs, techniques and measures that produce discrete numerical or
quantifiable data.
Classification by purpose
1. Basic / Pure / Fundamental Research
Basic researchers are interested in deriving scientific knowledge i.e. they are
motivated by intellectual curiosity and need to come up with a particular solution. It
focuses on generating new knowledge in order to refine or expand existing theories. It
does not consider the practical application of the findings to actual problems or
situations.
2. Applied research
It is conducted for the purpose of applying or testing theory and evaluating its
usefulness in solving problems. It provides data to support a theory, guide theory
revision or suggest the development of a new theory.
3. Action research
It is conducted with the primary intention of solving a specific, immediate and
concrete problem in a local setting e.g. investigating ways of overcoming water
shortage in a given area. It is not concerned with whether the results can be
generalized to any other setting.
4. Evaluation Research
It is the process of determining whether the intended results were realized.
3. Correlation Methods
It describes in quantitative terms the degree to which variables are related. It explores
relationships between variables and also tries to predict a subject’s score on one
variable given his or her score on another variable.
Steps in correlational research
Problem statement
Selection of subjects
Data collection
Data analysis
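As a minimal sketch of the data analysis step, the degree of relationship can be expressed as a Pearson correlation coefficient, and a subject's score on one variable can be predicted from the other with a least-squares line. The data (hours of practice and test scores) and the function names below are hypothetical and are not taken from any study in these notes.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours of programming practice and test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r = pearson_r(hours, scores)                 # strength of the relationship
slope, intercept = fit_line(hours, scores)   # prediction line
predicted = intercept + slope * 6            # predicted score for 6 hours
print(round(r, 3), round(predicted, 1))
```

A coefficient close to +1 or -1 indicates a strong relationship; it does not by itself show that one variable causes the other.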
2. Historical research
Involves the study of a problem that requires collecting information from the past
3. Observational Research
The current status of a phenomenon is determined not by asking but by observing.
This helps to collect objective information.
Steps
Selection and definition of the problem.
Sample selection.
Definition of the observational information.
Recording observational information
Data analysis and interpretation.
Advantages
a) Researchers are able to economize in terms of time and money.
b) Errors that arise during the study are easier to detect and correct.
c) The method has no effect on what is being studied.
Disadvantages
a) It is limited to recorded communication.
b) It is difficult to ascertain the validity of the data.
e) Rephrasing the research problem: This involves putting the research problem in as
specific terms as possible so that it becomes operationally viable and helps in the
development of working hypotheses.
The following should also be observed when defining a research problem:
a) Technical terms and words or phrases with special meanings used in the statement
of the problem, should be clearly defined.
b) Basic assumptions or postulates if any relating to the research problem should be
clearly stated.
c) A straightforward statement of the value of the investigation should be provided.
d) The suitability of the time-period and the sources of data available must also be
considered by the researcher in defining the problem.
e) The scope of the investigation or the limits within which the problem is to be
studied must be mentioned explicitly in defining a research problem.
The purpose of a study crystallizes the researcher’s inquiry into a particular area of
knowledge in a given field. If the purpose is accurately expressed, the research process
will be carried out with ease. The purpose of the study should meet the following criteria:
It must be indicated clearly, unambiguously and in a declarative manner.
The purpose should indicate the concepts or variables in the study.
Where possible, the relationships among the variables should be stated.
The purpose should state the target population.
The variables and target population given in the purpose should be consistent with
the variables and target population operationalised in the methods section of the
study.
In stating the purpose of the study, the researcher should choose the right words to
convey the focus of the study effectively. Use of subjective or biased words or sentences
should be avoided.
Examples
Biased         | Neutral
---------------+-----------------
To show        | To determine
To prove       | To compare
To confirm     | To investigate
To verify      | To differentiate
To check       | To explore
To demonstrate | To find out
To indicate    | To examine
To validate    | To inquire
To explain     | To establish
To illustrate  | To test
FORMULATING HYPOTHESES
A hypothesis is a researcher’s prediction regarding the outcome of the study. It states
possible differences, relationships or causes between two variables or concepts.
Hypotheses are derived from or based on existing theories, previous research, personal
Purpose of hypothesis
It provides direction by bridging the gap between the problem and the evidence
needed for its solution.
It ensures collection of the evidence necessary to answer the question posed in the
statement of the problem.
It enables the investigator to assess the information he or she has collected from the
standpoint of both relevance and organisation.
It sensitizes the investigator to certain aspects of the situation that are relevant
regarding the problem at hand.
It permits the researcher to understand the problem with greater clarity and use the
data to find solutions to problems.
It guides the collection of data and provides the structure for their meaningful
interpretation in relation to the problem under investigation.
It forms the framework for the ultimate conclusions as solutions.
LITERATURE REVIEW
The review of literature involves the systematic identification, location and analysis of
documents containing information related to the research problem being investigated. It
should be extensive and thorough because it is aimed at obtaining detailed knowledge of
the topic being studied.
8. The literature should be organized in such a way that the more general is covered first
before the researcher narrows down to that which is more specific to the research
problem.
Sources of literature
(a) Primary sources: these are direct descriptions of any occurrence by an individual who
actually observed or witnessed the occurrence.
(b) Secondary sources: these include any publications written by an author who was not
a direct observer or participant in the events described.
Examples
Scholarly journals
Theses and dissertations
Government documents
Papers presented at conferences
Books
References quoted in books
International indices
Abstracts
Periodicals
The Africana section of the library
Reference section of the library
Grey literature
Inter-library loan
The British lending library
The internet
Microfilm
ETHICS IN RESEARCH
Ethics are norms or standards of behaviour that guide moral choices about our behaviour
and our relationship with others. Ethics differ from legal constraints, in which generally
accepted standards have defined penalties that are universally enforced. The goal of
ethics in research is to ensure that no one is harmed or suffers adverse consequences from
research activities.
(a) Benefits
Whenever direct contact is made with a respondent, the researcher should discuss the
study benefits, being careful to neither overstate nor understate the benefits. An
interviewer should begin an introduction with his or her name, the name of the research
organisation and a brief description of the purpose and benefits of the research. This puts
the respondent at ease, lets them know to whom they are speaking and motivates them to
answer questions truthfully. Inducements to participate, financial or otherwise, should
not be disproportionate to the task or presented in a fashion that results in coercion.
Deception occurs when the respondents are told only part of the truth or when the truth is
fully compromised. The benefits to be gained by deception should be balanced against
the risks to the respondents. When possible, an experiment or interview should be
designed to reduce reliance on deception. In addition, the respondent’s rights and well-
being must be adequately protected. In instances where deception in an experiment could
Researchers should restrict access to information that reveals names, telephone numbers,
addresses or other identifying features. Only researchers who have signed nondisclosure
and confidentiality forms should be allowed access to the data. Links between the data or
database and the identifying information file should be weakened. Individual interview
response sheets should be inaccessible to everyone except the editors and data entry
personnel.
Occasionally, data collection instruments should be destroyed once the data are in a data
file. Data files that make it easy to reconstruct the profiles or identification of individual
respondents should be carefully controlled. For very small groups, data should not be
made available because it is often easy to pinpoint a person within the group. Employee-
satisfaction survey feedback in small units can be easily used to identify an individual
through descriptive statistics.
Privacy is more than confidentiality. A right to privacy means one has the right to refuse
to be interviewed or to refuse to answer any question in an interview. Potential
participants have a right to privacy in their own homes, including not admitting
researchers and not answering telephones. They have the right to engage in private
behaviour in private places without fear of observation. To address these rights, ethical
researchers can do the following:-
Inform respondents of their right to refuse to answer any questions or participate in
the study.
Obtain permission to interview respondents
Schedule field and phone interviews.
Limit the time required for participation.
Restrict observation to public behaviour only.
(a) Confidentiality
Sponsors have a right to several types of confidentiality including sponsor nondisclosure,
purpose nondisclosure and findings nondisclosure.
Sponsor nondisclosure: Companies have a right to dissociate themselves from the
sponsorship of a research project. Due to the sensitive nature of the management
dilemma or the research question, sponsors may hire an outside consulting or
research firm to complete research projects. This is often done when a company is
testing a new product idea, to prevent potential consumers from being influenced by
the company’s current image or industry standing. If a company is contemplating
entering a new market, it may not wish to reveal its plans to competitors. In such
cases, it is the responsibility of the researcher to respect this desire and devise a
plan to safeguard the identity of the sponsor.
Purpose nondisclosure: It involves protecting the purpose of the study or its
details. A research sponsor may be testing a new idea that is not yet patented and
may not want the competitor to know his plans. It may be investigating employee
complaints and may not want to spark union activity. The sponsor might also be
contemplating a new public stock offering, where advance disclosure would spark
the interest of authorities or cost the firm thousands of shillings.
Findings nondisclosure: Even if a sponsor feels no need to hide its identity or the
study’s purpose, most sponsors want research data and findings to be confidential,
at least until the management decision is made.
The ethical course often requires confronting the sponsor’s demand and taking the
following actions: -
Educate the sponsor on the purpose of research.
Explain the researcher’s role in fact finding versus the sponsor’s role in decision-
making.
Explain how distorting the truth or breaking faith with respondents leads to future
problems.
Failing moral suasion, terminate the relationship with the sponsor.
RESEARCH DESIGN
Therefore a research design is the strategy for a study and the plan by which the strategy
is to be carried out. It specifies the methods and procedures for the collection,
measurement, and analysis of data.
CLASSIFICATIONS OF DESIGNS
Research can be classified using eight different descriptors as shown in the table below:
Category                                                 | Options
---------------------------------------------------------+------------------------------
The degree to which the research question has been       | Exploratory study
crystallized                                             | Formal study
The method of data collection                            | Monitoring
                                                         | Interrogation / communication
The power of the researcher to produce effects in the    | Experimental
variables under study                                    | Ex post facto
The purpose of the study                                 | Descriptive
                                                         | Causal
The time dimension                                       | Cross-sectional
                                                         | Longitudinal
The topical scope: breadth and depth of the study        | Case
                                                         | Statistical study
The research environment                                 | Field setting
                                                         | Laboratory research
                                                         | Simulation
The participants’ perceptions of research activity       | Actual routine
                                                         | Modified routine
Case studies: they place more emphasis on a full contextual analysis of fewer
events or conditions and their interrelations. Although hypotheses are often used,
the reliance on qualitative data makes support or rejection more difficult. An
emphasis on detail provides valuable insight for problem solving, evaluation and
strategy. This detail is secured from multiple sources of information. It allows
evidence to be verified and avoids missing data.
8. Participants’ perceptions
The usefulness of a design may be reduced when people in a disguised study perceive
that research is being conducted. Participants’ perceptions influence the outcomes of the
research in subtle ways. There are three levels of perception:
Participants perceive no deviations from everyday routines
Participants perceive deviations, but as unrelated to the researcher.
Participants perceive deviations as researcher-induced.
In all research environments and control situations, researchers need to be vigilant to
effects that may alter their conclusions. Participant’s perceptions serve as a reminder to
classify one’s study by type, to examine validation strengths and weaknesses and to be
prepared to qualify results accordingly.
Despite its obvious value, researchers and managers give exploration less attention than it
deserves. Exploration is sometimes linked to old biases about qualitative research i.e.
subjectiveness, non-representativeness and non-systematic design.
When we consider the scope of qualitative research, several approaches are adaptable for
exploratory investigations of management questions:
a) In-depth interviewing – usually conversational rather than structured.
b) Participant observation – to perceive first hand what participants in the setting
experience
c) Films, photographs and videotapes – to capture the life of the group under study.
d) Case studies – for an in-depth contextual analysis of a few events or conditions
Where these approaches are combined, four exploratory techniques emerge with wide
applicability for the management researcher: -
a) Secondary data analysis
b) Experience surveys
c) Focus groups
d) Two-stage designs
An exploratory study is finished when the researchers have achieved the following:
Established the major dimensions of the research task
Defined a set of subsidiary investigative questions that can be used as a guide to a
detailed research design.
Developed several hypotheses about possible causes of a management dilemma.
Learned that certain other hypotheses are such remote possibilities that they can be
safely ignored in any subsequent study.
Concluded additional research is not needed or is not feasible.
The methodology section of a research study describes the procedures that are to be
followed in conducting the study. The techniques of obtaining data are developed.
Population: It’s a complete set of individuals, cases or objects with some observable
characteristics.
A census is a count of all the elements in a population.
Sample: A sample is a subset of a particular population. The target population is that
population to which a researcher wants to generalize the results of the study. There must
be a rationale for defining and identifying the accessible population from the target
population.
Sampling: It’s the process of selecting a sample from a population.
Sampling procedures:
There are two major ways of selecting samples;
Probability sampling methods
Non - Probability sampling methods
Periodicity within the population may skew the sample and results.
If the population list has a monotonic trend, a biased estimate will result based
on the start point.
c) Stratified Random Sampling:
A population is divided into subgroups called strata and a sample is selected from
each stratum. After the population is divided into strata, either a proportional or a
non-proportional sample can be selected. In a proportional sample, the number of
items in each stratum is in the same proportion as in the population while in a non-
proportional sample, the number of items chosen in each stratum is disproportionate
to the respective numbers in the population.
Advantages
Researcher controls sample size in strata
Increased statistical efficiency
Provides data to represent and analyze subgroups.
Enables use of different methods in strata.
Disadvantages
Increased error will result if subgroups are selected at different rates
Expensive, especially if strata in the population have to be created.
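Proportional stratified sampling can be sketched as follows. This is a minimal illustration only: the strata (students grouped by year of study), the population sizes and the function names are hypothetical, and simple rounding is assumed for the allocation, which in general may make the stratum counts differ slightly from the requested total.

```python
import random


def proportional_allocation(stratum_sizes, sample_size):
    """Number of items to draw from each stratum, proportional to its size.

    Rounding is used, so the counts may not always add up exactly to
    sample_size; that simplification is an assumption of this sketch.
    """
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}


def stratified_sample(strata, sample_size, seed=0):
    """Draw a simple random sample inside every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], allocation[name])
            for name in strata}


# Hypothetical population of 1000 students split by year of study.
strata = {
    "year1": [f"y1-{i}" for i in range(500)],
    "year2": [f"y2-{i}" for i in range(300)],
    "year3": [f"y3-{i}" for i in range(200)],
}

sample = stratified_sample(strata, 100)
counts = {name: len(chosen) for name, chosen in sample.items()}
print(counts)
```

Each stratum contributes to the sample in the same proportion as it appears in the population (50%, 30% and 20% here). A non-proportional design would replace the allocation step with counts chosen by the researcher.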
d) Cluster Sampling:
The population is divided into internally heterogeneous subgroups and some are
randomly selected for further study. It is used when it is not possible to obtain a
sampling frame because the population is either very large or scattered over a large
geographical area. A multi-stage cluster sampling method can also be used.
Advantages
Provides an unbiased estimate of population parameters if properly done.
Economically more efficient than simple random.
Lowest cost per sample, especially with geographic clusters.
Easy to do without a population list.
Disadvantages
More error (lower statistical efficiency) due to subgroups being homogeneous
rather than heterogeneous.
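A two-stage version of cluster sampling can be sketched as below: clusters are selected at random first, and then a random sample is drawn inside each selected cluster. The village and household names, the cluster counts and the function name are hypothetical and serve only to illustrate the idea.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Stage 1: pick clusters at random. Stage 2: sample within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster)
            for name in chosen}


# Hypothetical population: 10 villages with 40 households each.
villages = {
    f"village{v}": [f"v{v}-household{h}" for h in range(40)]
    for v in range(10)
}

sample = two_stage_sample(villages, n_clusters=3, n_per_cluster=5)
for name, households in sample.items():
    print(name, households)
```

Only the selected villages need a household list, which is why the method is practical when no complete sampling frame for the whole population exists.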
Advantage
Widely used by pollsters, marketers and other researchers.
Disadvantages
a) It gives no assurance that the sample is representative of the variables being
studied.
b) The data used to provide controls may be outdated or inaccurate.
c) There is a practical limit on the number of simultaneous controls that can be
applied to ensure precision.
d) Since the choice of subjects is left to field workers, they may choose only friendly
looking people.
Sampling error
It’s the difference between a sample statistic and its corresponding population parameter.
The sampling distribution of the sample means is a probability distribution of possible
sample means of a given sample size.
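The two ideas above can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is artificial (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), and the sample size of 25 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)

# Artificial population (an assumption of this sketch, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)          # population parameter


def sample_mean(n):
    """Mean of one simple random sample of size n."""
    return statistics.fmean(rng.sample(population, n))


# Sampling error of a single sample: statistic minus parameter.
one_sample = sample_mean(25)
sampling_error = one_sample - mu

# Approximate the sampling distribution of the mean by repeated sampling.
means = [sample_mean(25) for _ in range(2000)]
mean_of_means = statistics.fmean(means)
spread = statistics.stdev(means)           # roughly sigma / sqrt(n)

print(round(sampling_error, 2), round(mean_of_means, 2), round(spread, 2))
```

The sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size.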
Statistical Inference
Sample information is used to shed some light on the population characteristics i.e. we
infer population properties based on findings on the sample. Statistical inference falls into
two main areas i.e. statistical estimation and hypothesis testing.
Statistical Estimation: The characteristics of the sample (sample statistic) are used to
estimate or approximate some unknown population characteristics.
Hypothesis testing: The population characteristics are known or assumed. The sample
characteristics are used to verify or ascertain this assumed or known population
characteristic.
The assignment of values to a population parameter based on a sample is called
estimation. The value assigned to a population parameter based on the value of a
sample statistic is called an estimate of the population parameter. The sample statistic
used to estimate a population parameter is called an estimator. Estimation can be
undertaken in two forms, namely point estimation or interval estimation.
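The difference between a point estimate and an interval estimate can be sketched as follows, assuming a normal population whose standard deviation is known. The sample values and the value of sigma are hypothetical, and the function name is illustrative.

```python
import math
from statistics import NormalDist, fmean


def interval_estimate(sample, sigma, confidence=0.95):
    """Point estimate of the mean and a z-based confidence interval.

    Assumes a normal population with known standard deviation sigma.
    """
    n = len(sample)
    point = fmean(sample)                                  # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)     # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)
    return point, (point - margin, point + margin)


# Hypothetical sample of 9 task-completion times in minutes.
times = [12, 15, 11, 14, 13, 16, 12, 15, 9]
point, (low, high) = interval_estimate(times, sigma=3, confidence=0.95)
print(point, round(low, 2), round(high, 2))
```

The point estimate is a single value (the sample mean), while the interval estimate gives a range that is expected to contain the population mean at the stated confidence level.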
One of the most common questions asked of statisticians is, how large should the sample
taken in a survey be? The answer to this question depends on three factors:-
i. The parameter to be estimated
iii. The maximum error of estimation, where error of estimation is the absolute
difference between the point estimator and the parameter e.g. the point estimator
of $\mu$ is $\bar{x}$, so that the error of estimation is $|\bar{x} - \mu|$.
The maximum error of estimation is also called the error bound and is denoted B.
Suppose the parameter of interest in an experiment is the population mean $\mu$. The
confidence interval estimator (assuming a normal population, with the population
variance known) is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. If we want to estimate $\mu$ to within a certain specified
bound B, we will want the confidence interval estimator to be $\bar{x} \pm B$. As a consequence,
$z_{\alpha/2}\,\sigma/\sqrt{n} = B$, which gives $n = \left( z_{\alpha/2}\,\sigma / B \right)^2$.
2. Find $n$, given that we want to estimate $\mu$ to within 10 units with 95% confidence,
assuming that
3. The operations manager of a large production plant would like to estimate the average
amount of time a worker takes to assemble a new electronic component. After
observing a number of workers assembling similar devices, she noted that the shortest
time taken was 10 minutes and the longest time taken was 22 minutes. How large a
sample of workers should she take if she wants to estimate the mean assembly time to
within 20 seconds? Assume that the confidence level is to be 99%.
4. Determine the sample size necessary to estimate $\mu$ to within 10 units with 99%
confidence. We know that the range of the population is 200 units.
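Questions of this kind rest on the standard sample-size rule for a mean with known population standard deviation, $n = (z_{\alpha/2}\,\sigma / B)^2$, rounded up to the next whole number. A minimal sketch follows; the function names are illustrative and the values sigma = 50 and B = 10 are hypothetical. When only the range of the population is known, a common rule of thumb (an assumption, not something stated in these notes) is to approximate sigma by range / 4.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value z_{alpha/2} for a level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_mean(sigma, bound, confidence):
    """Smallest n with z * sigma / sqrt(n) <= bound (sigma assumed known)."""
    z = z_value(confidence)
    return math.ceil((z * sigma / bound) ** 2)


# Hypothetical inputs: sigma = 50 units, error bound B = 10 units.
n_95 = sample_size_for_mean(sigma=50, bound=10, confidence=0.95)
n_99 = sample_size_for_mean(sigma=50, bound=10, confidence=0.99)
print(n_95, n_99)
```

Raising the confidence level or tightening the error bound increases the required sample size; halving B roughly quadruples n.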
1. The manager of a bank feels that 35% of branches will have enhanced yearly
collection of deposits after introducing a hike in interest rate. Determine the sample
size such that the mean proportion is within plus or minus 0.06 at a confidence level
of (i) 90% (ii) 95% and (iii) 99%.
2. How large a sample should be taken in order to estimate $p$ to within 0.01 with 95%
confidence? Assume that
3. The director of a management school feels that 55% of students will have enhanced
performance if additional input is given to them. Determine the sample size such that
the mean proportion is within plus or minus 0.10 at a confidence level of 95%.
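For a population proportion, the corresponding standard rule is $n = z_{\alpha/2}^2\, p(1-p) / B^2$, where $p$ is a prior guess of the proportion and $B$ is the error bound. The sketch below assumes this large-sample normal approximation; the function names and the inputs p = 0.5 and B = 0.05 are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value z_{alpha/2} for a level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(p_guess, bound, confidence):
    """Smallest n so that the estimate of p is within +/- bound.

    Uses the normal approximation n = z^2 * p * (1 - p) / bound^2.
    """
    z = z_value(confidence)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / bound ** 2)


# Hypothetical inputs: no prior knowledge, so p = 0.5 (the most cautious guess).
n = sample_size_for_proportion(p_guess=0.5, bound=0.05, confidence=0.95)
print(n)
```

Using p = 0.5 gives the largest possible sample size for a given bound, so it is a safe choice when no prior estimate of the proportion is available.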
MEASUREMENT
Introduction
While people measure things casually in daily life, research measurement is more precise
and controlled. In measurement, one settles for measuring properties of the objects rather
than the objects themselves. An event is measured in terms of its duration i.e. what
happened during it, who was involved, where it occurred etc. Measurement is the basis
for all systematic inquiry because it provides us with the tools for recording differences in
the outcome of variable change.
Definition of Measurement
d) Origin: The number series has a unique origin indicated by the number zero.
This is an absolute and meaningful zero point.
LEVELS OF MEASUREMENT
Anything that can be measured falls into one of the four types;
The higher the level of measurement, the more precision in measurement; and
a) Nominal level: the observations are classified under a common characteristic e.g.
sex, race, marital status, employment status, language, religion etc. It helps in
sampling.
b) Ordinal level: items or subjects are not only grouped into categories, but they are
ranked into some order e.g. greater than, less than, superior, happier than, poorer,
above etc. It helps in developing a Likert scale.
c) Interval level: numerals are assigned to each measure and ranked. The intervals
between numerals are equal. The numerals used represent meaningful quantities
but the zero point is not meaningful e.g. test scores, temperature.
d) Ratio level: has all the characteristics of the other levels and in addition the zero
point is meaningful. Mathematical operations can be applied to yield meaningful
values e.g. height, weight, distance, age, area etc.
The ideal study should be designed and controlled for precise and unambiguous
measurement of the variables. Since 100% control is unattainable, error occurs. Much
potential error is systematic (results from a bias) while the remainder is random (occurs
erratically). Some of the major sources of error are:
(a) The respondent: opinion differences that affect measurement come from relatively
stable characteristics of the respondent e.g. employee status, ethnic group and
social class. Temporary factors like fatigue, boredom, anxiety and other distractions
also limit the ability to respond accurately and fully. Hunger, impatience or general
variations in mood will also have an impact.
(b) The situational factors: any condition that places a strain on the interview or
measurement session can have serious effects on the interviewer – respondent
rapport. If another person is present, that person can distort responses by joining in,
by distracting or by merely being present. If the respondents believe anonymity is
not ensured, they may be reluctant to express certain feelings.
(c) The measurer: the interviewer can distort responses by re-wording, paraphrasing,
or re-ordering questions. Stereotypes in appearance and action introduce bias.
Inflections of voice or unconscious prompting with smiles and nods may encourage
or discourage certain replies. Incorrect coding, careless tabulation and faulty
statistical calculation may introduce further errors in data analysis.
(d) The data collection instrument: a defective instrument can cause distortion in two
major ways:
It can be too confusing and ambiguous e.g. the use of complex words,
leading questions, ambiguous meanings, multiple questions.
It can lead to poor selection from the universe of content items. Seldom does
the instrument explore all the potentially important issues.
TYPES OF VARIABLES
3. Extraneous variables
They are those variables that affect the outcome of a research study either because the
researcher is not aware of their existence or if the researcher is aware, she or he has no
control over them.
a) Subject variables, which are the characteristics of the individuals being studied
that might affect their actions. These variables include age, gender, health status,
mood, background, etc.
b) Experimental variables are characteristics of the persons conducting the
experiment which might influence how a person behaves. Gender, the presence of
racial discrimination, language, or other factors may qualify as such variables.
c) Situational variables are features of the environment in which the study or
research was conducted, which have a bearing on the outcome of the experiment
in a negative way. Included are the air temperature, level of activity, lighting, and
the time of day.
4. Control variables / concomitant / covariate or blocking variables
They are extraneous variables that are built into the study. Extraneous variables are
variables, which influence the results of a study when they are not controlled.
Since absolute control of extraneous variables is not possible in any study, results are
interpreted on the basis of degrees of confidence rather than certainty.
Once the major extraneous variables are identified, the researcher can control them by:-
i. Building the extraneous variable into the study, i.e. including it as an independent
variable. E.g. in determining the effect of alcohol on reaction time, sex may
influence reaction time. Therefore, sex can be introduced as an independent
variable. Using regression, one can measure the effect of alcohol on reaction time
while controlling for sex.
ii. Including the extraneous variable in the study but only at one level, e.g. reaction
time is the dependent variable, alcohol level the independent variable and sex the
extraneous variable. Sex can be controlled by sampling only females or only males
of a given age. The disadvantage of this method is that generalizations are limited
to a smaller population.
iii. Removing the effects of the extraneous variable by statistical procedures, i.e.
by partialling out its effects on the dependent variable. This can be done by:
Analysis of co-variance
Partial correlation.
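As an illustration of statistical control, the sketch below computes a first-order partial correlation between alcohol level and reaction time with sex held constant. The data values and the function names pearson_r and partial_r are hypothetical and are not part of these notes; the formula used is the standard first-order partial correlation, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: alcohol units, reaction time (ms) and sex coded 0/1.
alcohol = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
reaction = [250, 262, 275, 291, 300, 270, 281, 296, 310, 322]
sex = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

r_ar = pearson_r(alcohol, reaction)   # alcohol vs reaction time
r_as = pearson_r(alcohol, sex)        # alcohol vs sex
r_rs = pearson_r(reaction, sex)       # reaction time vs sex

print("zero-order r:", round(r_ar, 3))
print("partial r controlling for sex:", round(partial_r(r_ar, r_as, r_rs), 3))
```

In this made-up sample the partial correlation is larger than the zero-order correlation because the variation in reaction time that is due to sex has been removed.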
5. Intervening variables
They are a special case of extraneous variables. The difference between the intervening
and extraneous variables is in the assumed relationship among the variables. An
intervening variable is a hypothetical internal state that is used to explain relationships
between observed variables, such as independent and dependent variables, in empirical
research. With an extraneous variable, there is no causal link between the independent
and dependent variable, but they are independently associated with a third variable – the
extraneous variable. An intervening variable is recognized as being caused by the
independent variable and as being a determinant of the dependent variable.
The choice of the right intervening variables helps one not only to determine accurately
the total effects of an independent variable on the dependent variable but also to
partition the total effects into direct and indirect effects.
6. Antecedent variables
They do not interfere with the established relationship between an independent and
dependent variable but clarify the influence that precedes such a relationship.
The variables including the antecedent variable must be related in some logical
sequence.
When the antecedent variable is controlled for, the relationship between the
independent and the dependent variables should not disappear. Rather it should be
enhanced.
When the independent variable is controlled for or its influence removed, there
should not be any relationship between the antecedent variable and the dependent
variable.
e.g. political stability – attracts investors – increased job opportunities – high standards of
living – reduction of poverty.
7. Suppressor variables
8. Distorter variables
A distorter variable converts what was thought to be a positive relationship into a
negative relationship and vice versa. Its effects lead a researcher into drawing erroneous
conclusions from the data. When the distorter variable is controlled, a true relationship is
obtained. Consideration of distorter variables in a study reduces the chances of making a
Type I error (rejecting a true null hypothesis) or a Type II error (failing to reject a false
null hypothesis).
They are commonly used in testing hypothesized causal models. Path analysis (a
procedure that tests causal links among several variables) is often used in testing the
validity of causal relationships in a theory or model.
[Figure: path diagram linking variables A, B, C and D]
A and B are called exogenous variables. They lack hypothesized causes in the model.
The quality of a research study depends to a large extent on the accuracy of the data
collection procedures. Reliability and validity measure the relevance and correctness of
the data.
Reliability
Reliability is the extent to which an experiment, test, or any measuring procedure yields
the same result on repeated trials. Without the agreement of independent observers able
to replicate research procedures, or the ability to use research tools and procedures that
yield consistent measurements, researchers would be unable to satisfactorily draw
conclusions, formulate theories, or make claims about the generalizability of their
research. In addition to its important role in research, reliability is critical for many parts
of our lives, including manufacturing, medicine and sports. Reliability is such an
important concept that it has been defined in terms of its application to a wide range of
activities.
Reliability is influenced by random error. Random error is the deviation from a true
measurement due to factors that have not effectively been addressed by the researcher. As
random error increases, reliability decreases.
2. Equivalent form
Equivalency reliability is the extent to which two items measure identical concepts at an
identical level of difficulty. Equivalency reliability is determined by relating two sets of
test scores to one another to highlight the degree of relationship or association. In
quantitative studies and particularly in experimental studies, a correlation coefficient,
statistically referred to as r, is used to show the strength of the correlation between a
dependent variable (the subject under study) and one or more independent variables,
which are manipulated to determine effects on the dependent variable. An important
consideration is that equivalency reliability is concerned with correlational, not causal,
relationships.
reliability between the two actions. In other words, studying was not a reliable predictor
of shopping for gifts.
Two instruments are used. Specific items in each form are different but they are designed
to measure the same concept. They are the same in number, structure and level of
difficulty e.g. TOEFL, GRE
Advantages
Estimates the stability of the data as well as the equivalence of the items in the two
forms
Disadvantages
Difficulty in constructing two tests, which measure the same concept (time and
resources).
4. Interrater reliability
Interrater reliability is the extent to which two or more individuals (coders or raters)
agree. Interrater reliability addresses the consistency of the implementation of a rating
system.
A test of interrater reliability would be the following scenario: Two or more researchers
are observing a high school classroom. The class is discussing a movie that they have just
viewed as a group. The researchers have a sliding rating scale (1 being most positive, 5
being most negative) with which they are rating the students' oral responses. Interrater
reliability assesses the consistency of how the rating system is implemented. For
example, if one researcher gives a "1" to a student response while another researcher
gives a "5," the interrater reliability would clearly be low. Interrater reliability
is dependent upon the ability of two or more individuals to be consistent. Training,
education and monitoring skills can enhance interrater reliability.
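The classroom scenario above can be quantified in more than one way. The sketch below computes simple percent agreement and Cohen's kappa for two raters. Kappa is one common agreement statistic and is not named in these notes, so treat its use here as an assumption; the ratings and function names are hypothetical.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Share of items on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) if both raters use a single identical category.
    """
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal proportions, summed.
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of eight student responses on the 1-5 scale.
rater_a = [1, 2, 2, 3, 5, 4, 1, 2]
rater_b = [1, 2, 3, 3, 5, 4, 2, 2]

print("percent agreement:", percent_agreement(rater_a, rater_b))
print("Cohen's kappa:", round(cohens_kappa(rater_a, rater_b), 3))
```

Percent agreement is easier to explain, while kappa discounts the agreement that two raters would reach by chance alone.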
Broaden the sample of measurement questions by adding similar questions to the data
collection instrument or adding more observers or occasions to an observation study.
Improve internal consistency of an instrument by excluding data from analysis drawn
from measurement questions eliciting extreme responses.
Validity
Validity refers to the degree to which a study accurately reflects or assesses the specific
concept that the researcher is attempting to measure. It is the degree to which results
obtained from the analysis of data actually represent the phenomenon under study. It is
the accuracy and meaningfulness of inferences, which are based on the research results. It
has to do with how accurately the data obtained in the study represents the variables of
the study. If such data is a true reflection of the variables, then inferences based on such
data will be accurate and meaningful. Validity is largely determined by the presence or
absence of systematic error in the data e.g. using a faulty scale to measure.
Types of validity
(a) Construct validity
Construct validity can be broken down into two sub-categories: convergent validity and
discriminant validity. Convergent validity is the actual general agreement among ratings,
gathered independently of one another, where measures should be theoretically related.
Discriminant validity is the lack of a relationship among measures which theoretically
should not be related.
To understand whether a piece of research has construct validity, three steps should be
followed. First, the theoretical relationships must be specified. Second, the empirical
relationships between the measures of the concepts must be examined. Third, the
empirical evidence must be interpreted in terms of how it clarifies the construct validity
of the particular measure being tested.
measure an attitude like self-esteem must decide what constitutes a relevant domain of
content for that attitude. For socio-cultural studies, content validity forces the researchers
to define the very domains they are attempting to study.
The usual procedure in assessing the content validity of a measure is to use professionals
or experts in the particular field. The instrument is given to two groups of experts. One
group is requested to assess what concept the instrument is trying to measure. The other
group is asked to determine whether the set of items or checklist accurately represents the
concept under study.
c) Instrumentation -
d) Pre-testing – solution – use equivalent form tests
e) Statistical regression
f) Attrition - subjects dropping out of the study before completion, which leads to
error and bias in the sample
g) Differential selection – occurs when subjects are systematically selected for a study,
e.g. volunteers versus non-volunteers; the resulting bias leads to error
h) Selection – maturation interaction
i) Ambiguity - when correlation is taken for causation
j) Apprehension - when people are scared to respond to your study
k) Demoralization - when people get bored with your measurements
l) Diffusion - when people figure out your test and start mimicking symptoms
RESEARCH INSTRUMENTS
QUESTIONNAIRES
Each item in the questionnaire is developed to address a specific objective, research
question or hypothesis of the study. The researcher must also know how information
obtained from each questionnaire item will be analysed.
3 Contingency questions
In particular cases, certain questions are applicable to certain groups of respondents. In
such cases, follow-up questions are needed to get further information from the relevant
sub-group only. These subsequent questions, which are asked after the initial questions,
are called ‘contingency questions’ or ‘filter questions’. The purpose of these kinds of
questions is to probe for more information. They also simplify the respondent’s task, in
that they will not be required to answer questions that are not relevant to them.
4 Matrix questions
These are questions that share the same set of response categories. They are used
whenever scales such as the Likert scale are being used.
Follow-up techniques
Sending a follow-up letter which should be polite, and asking the subjects to
respond
A questionnaire and a follow-up letter.
Response rate
It refers to the percentage of subjects who respond to questionnaires. Many authors
believe that a response rate of 50% is adequate for analysis and reporting. If the response
rate is low, the researcher must question the representativeness of the sample.
INTERVIEWS
An interview is an oral (face to face) administration of a questionnaire or an interview
schedule. To obtain accurate information through interviews, a researcher needs to obtain
the maximum co-operation from respondents. Interviews are particularly useful for
getting the story behind a participant's experiences. The interviewer can pursue in-depth
information around a topic. Interviews may be useful as follow-up to certain respondents
to questionnaires, e.g., to further investigate their responses. Usually open-ended
questions are asked during interviews.
Sequence of Questions
1. Get the respondents involved in the interview as soon as possible.
2. Before asking about controversial matters (such as feelings and conclusions), first
ask about some facts. With this approach, respondents can more easily engage in
the interview before warming up to more personal matters.
3. Intersperse fact-based questions throughout the interview to avoid long lists of
fact-based questions, which tend to leave respondents disengaged.
4. Ask questions about the present before questions about the past or future. It's
usually easier for them to talk about the present and then work into the past or
future.
5. The last questions might be to allow respondents to provide any other information
they prefer to add and their impressions of the interview.
Wording of Questions
Personal interviews
People selected to be part of the sample are interviewed in person by a trained
interviewer.
Requirements for success
Three broad conditions must be met in order to have a successful personal interview:
The participant must possess the information being targeted by the investigative
questions
The participant must understand his or her role in the interview as the provider of
accurate information
The participant must perceive adequate motivation to cooperate
The first goal in an interview is to establish a friendly relationship with the participant.
Three factors will help increase participant receptiveness. The participant must:
Believe that the experience will be pleasant and satisfying
Believe that answering the survey is an important and worthwhile use of his or her
time
Dismiss any mental reservations that he or she might have about participation.
The technique of stimulating participants to answer more fully and relevantly is termed
probing. Since it presents a great potential for bias, a probe should be neutral and appear
as a natural part of the conversation. Appropriate probes should be specified by the
designer of the data collection instrument. There are several probing styles e.g.
A brief assertion of understanding and interest e.g. comments such as “I see” “yes”.
An expectant pause
Repeating the question
Repeating the participant’s reply
A neutral question or comment
Question clarification.
Telephone interviews
People selected to be part of the sample are interviewed on the telephone by a trained
interviewer.
Advantages of Telephone interviews
Lower costs than personal interviews
Expanded geographic coverage without dramatic increase in costs
Uses fewer, more highly skilled interviewers
Reduced interviewer bias
Faster completion time
Better access to hard-to-reach respondents through repeated callbacks
Can use computerized random digit dialing
Responses can be entered directly into a computer file to reduce error and cost when
using computer assisted telephone interviewing.
An interview schedule
It’s a set of questions that the interviewer asks when interviewing. It makes it possible
to obtain data required to meet specific objectives of the study.
Advantages
It facilitates data analysis since the information is readily accessible and already
classified into appropriate categories.
If taken well, no information is left out.
Tape recording
The interviewer’s questions and the respondent’s answers are recorded either using a tape
recorder or a video tape.
Advantages
It reduces the tendency for the interviewer to make unconscious selection of data in
the course of the recording.
The tape can be played back and studied more thoroughly.
A person other than the interviewer can evaluate and categorize responses.
It speeds up the interview.
Communication is not interrupted.
Disadvantages
It changes the interview situation since respondents get nervous.
Respondents may be reluctant to give sensitive information if they know they are
being taped.
Transcribing the tapes before analysis is time consuming and tedious.
Advantages of interviews
It provides in-depth data, which is not possible to get using a questionnaire.
It makes it possible to obtain data required to meet specific objectives of the study.
Interviews are more flexible than questionnaires because the interviewer can adapt to
the situation and get as much information as possible.
Very sensitive and personal information can be extracted from the respondent.
The interviewer can clarify and elaborate the purpose of the research and effectively
convince respondents about the importance of the research.
They yield higher response rates
Disadvantages of interviews
They are expensive – traveling costs
It requires a higher level of skill
Interviewers need to be trained to avoid bias
Not appropriate for large samples
Responses may be influenced by the respondent’s reaction to the interviewer.
OBSERVATION
Observation is one of the few options available for studying records, mechanical
processes, small children and complex interactive processes. Data can be gathered as
the event occurs. Observation includes a variety of monitoring situations that cover non-
behavioural and behavioural activities.
Advantages of observation
Enables one to:
Secure information about people or activities that cannot be derived from
experiment or surveys
Reduce obtrusiveness
Avoid participant filtering and forgetfulness
Secure environmental context information
Optimize the naturalness of the research setting
Limitations of observation
Difficulty of waiting for long periods to capture the relevant phenomena
The expense of observer costs and equipment
Reliability of inferences from surface indicators
The problem of quantification and disproportionately large records
DATA ANALYSIS
DATA PREPARATION AND DESCRIPTION
Once the data begins to flow in, attention turns to data analysis. If the project has been
done correctly, the analysis planning is already done.
Data preparation
This includes editing, coding and data entry. These activities ensure the accuracy of the
data and their conversion from raw form to reduced and classified forms that are more
appropriate for analysis.
Editing
Editing detects errors and omissions, corrects them when possible and certifies that
minimum data quality standards have been achieved. The editor’s purpose is to
guarantee that data are:
Accurate
Consistent with intent of the question and other information in the survey
Uniformly entered
Complete
Arranged to simplify coding and tabulation
Field editing
In large projects, field editing review is a responsibility of the field supervisor. It should
be done soon after the data have been gathered. During the stress of data collection, the
researcher often uses ad hoc abbreviations and special symbols. Soon after the interview,
experiment or observation, the investigator should review the reporting forms. It is
difficult to complete what was abbreviated or written in shorthand or noted illegibly if the
entry is not caught that day. When entry gaps are present from interviews, a call back
should be made rather than guessing what the respondent ‘probably would have said’.
Self-interviewing has no place in quality research.
Central editing
For a small study, the use of a single editor produces maximum consistency. In large
studies, the tasks may be broken down so that each editor can deal with one entire
section. This approach will not identify inconsistencies between answers in different
sections. However, this problem can be handled by identifying points of possible
inconsistency and having one editor check specifically for them.
Content analysis is commonly used to analyse open-ended questions. Converse and Presser
(1986) define content analysis as a research technique for the objective, systematic and
quantitative description of the manifest content of a communication.
Content analysis guards against selective perception of the content, provides for the
rigorous application of reliability and validity criteria and is amenable to
computerization.
“Don’t know” replies
“Don’t know” replies are evaluated in light of the question’s nature and the respondent.
While many “don’t know” replies are legitimate, some result from questions that are
ambiguous or from an interviewing situation that is not motivating. It is better to report
“don’t know” replies as a separate category unless there are compelling reasons to treat
them otherwise.
Data entry
Data entry converts information gathered by secondary or primary methods to a medium
for viewing and manipulation. Data entry is accomplished by keyboard entry from pre-
coded instruments, optical scanning, real time keyboarding, telephone pad data entry, bar
codes, voice recognition, optical mark recognition (OMR) and data transfers from
electronic notebooks and laptop computers. Database programs, spreadsheets and editors
in statistical software programs e.g. SPSS and SAS offer flexibility for entering,
manipulating and transferring data for analysis, warehousing and mining.
Data description
The objective of descriptive statistical analysis is to develop sufficient knowledge to
describe a body of data. This is accomplished by understanding the data levels for the
measurements we choose, their distributions and characteristics of location, spread and
shape. The discovery of miscoded values, missing data and other problems in the data set
is enhanced with descriptive statistics.
There are three general areas that make up the field of statistics: descriptive statistics,
relational statistics, and inferential statistics.
DESCRIPTIVE STATISTICS
Descriptive statistics fall into one of two categories: measures of central tendency (mean,
median, and mode) or measures of dispersion (standard deviation and variance). Their
purpose is to explore hunches that may have come up during the course of the research
process, but most people compute them to look at the normality of their numbers.
Examples include descriptive analysis of sex, age, race, social class, and so forth.
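A minimal sketch of the two categories of descriptive statistics, using Python's standard statistics module. The list of respondent ages is hypothetical and only serves to show the calls.

```python
import statistics

# Hypothetical ages of seven respondents.
ages = [21, 22, 22, 23, 25, 27, 30]

# Measures of central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Measures of dispersion (sample versions, dividing by n - 1)
variance_age = statistics.variance(ages)
stdev_age = statistics.stdev(ages)

print("mean:", round(mean_age, 2))
print("median:", median_age)
print("mode:", mode_age)
print("variance:", round(variance_age, 2))
print("standard deviation:", round(stdev_age, 2))
```

If the data represent a whole population rather than a sample, statistics.pvariance and statistics.pstdev divide by n instead of n - 1.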
INFERENTIAL STATISTICS
Hypothesis: It’s a statement about a population parameter developed for the purpose of
testing.
Hypothesis testing: It’s a procedure based on sample evidence and probability theory to
determine whether the hypothesis is a reasonable statement.
Procedure for testing a hypothesis
1. State the null and alternate hypotheses
2. Identify the test statistic
3. Formulate a decision rule and identify the rejection region
4. Compute the value of the test statistic
5. Make a conclusion.
State the null hypothesis (H0) and alternate hypothesis (HA)
The null hypothesis is a statement about the value of a population parameter. It
should be stated as “There is no significant difference between ……………”. It
should always contain an equal sign.
The alternate hypothesis is a statement that is accepted if sample data provide
enough evidence that the null hypothesis is false.
One-tailed and Two-tailed tests
A test is one tailed when the alternate hypothesis states a direction e.g.
H0: The mean income of women is equal to the mean income of men
HA: The mean income of women is greater than the mean income of men
A test is two tailed if no direction is specified in the alternate hypothesis
H0: There is no difference between the mean income of women and the mean
income of men
HA: There is a difference between the mean income of women and the mean
income of men
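The practical difference between the two kinds of test is where the rejection region sits. The sketch below assumes a test statistic that follows the standard normal distribution (an assumption, since the notes do not fix the distribution at this point) and uses a hypothetical helper, critical_z, to find the cut-off for each case.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical value of the standard normal for a 1- or 2-tailed test."""
    if tails == 1:
        # All of alpha lies in one tail.
        return NormalDist().inv_cdf(1 - alpha)
    # alpha is split equally between the two tails.
    return NormalDist().inv_cdf(1 - alpha / 2)

one_tailed = critical_z(0.05, tails=1)
two_tailed = critical_z(0.05, tails=2)

print("one-tailed cut-off at 5%:", round(one_tailed, 3))
print("two-tailed cut-off at 5%:", round(two_tailed, 3))
```

At the same significance level the one-tailed cut-off is smaller, so a directional alternate hypothesis is easier to support when the effect is in the stated direction.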
Identify the test statistic
A test statistic is the statistic that will be used to test the hypothesis e.g.
Examples
1. A study by the Coca-Cola Company showed that the typical adult Kenyan consumes
18 gallons of Coca-Cola each year. According to the same survey, the standard
deviation of the number of gallons consumed is 3.0. A random sample of 64 college
students showed they consumed an average (mean) of 17 gallons of cola last year. At
the 0.05 significance level, can we conclude that there is a significant difference
between the mean consumption rate of college students and other adults?
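The five-step procedure can be traced on the Coca-Cola example. The sketch below treats the reported standard deviation of 3.0 gallons as the known population value and therefore uses a two-tailed z test; that choice is an assumption, as the notes do not show the worked solution.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H0: mu = 18 gallons, HA: mu != 18 gallons (two-tailed).
mu0 = 18.0      # hypothesized population mean
sigma = 3.0     # population standard deviation (taken as known)
n = 64          # sample size
x_bar = 17.0    # sample mean
alpha = 0.05    # significance level

# Steps 2 and 4: z statistic for a mean with known sigma.
z = (x_bar - mu0) / (sigma / sqrt(n))

# Step 3: reject H0 if |z| exceeds the two-tailed critical value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: conclusion.
reject_h0 = abs(z) > z_crit
p_value = 2 * NormalDist().cdf(-abs(z))

print("z =", round(z, 3), "critical value =", round(z_crit, 3))
print("reject H0:", reject_h0, "p-value =", round(p_value, 4))
```

Here z is about -2.667, which lies beyond the critical value of about 1.96 in absolute terms, so under these assumptions the null hypothesis is rejected.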
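hypothesis of tests about mean and variance. The test statistic for the population proportion $p$ is $z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$, where $\hat{p}$ is the sample proportion and $n$ is the sample size.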
Example:
1. An inventor has developed a system that allows visitors to museums, zoos and other
attractions to get information at the touch of a digital code. For example, zoo patrons
can listen to an announcement (recorded on a microchip) about each animal they see.
It is anticipated that the device would rent for $3.00 each. The installation cost for the
complete system is expected to be about $400,000. The ABC zoo is interested in
having the system installed, but the management is uncertain about whether to take
the risk. A financial analysis of the problem indicates that if more than 10% of the
zoo visitors rent the system, the zoo will make a profit. To help make the decision, a
random sample of 400 zoo visitors is given details of the system’s capabilities and
cost. If 48 people say that they would rent the device, can the management of the zoo
conclude at the 5% significance level that the investment would result in a profit?
2. In a random sample of 100 units from an assembly line, 22 were defective.
(a) Does this provide sufficient evidence at the 10% significance level to
allow us to conclude that the defective rate among all units exceeds 10%?
(b) Find a 99% confidence interval estimate of the defective rate.
3. A manufacturer of computer chips claims that more than 90% of his products
conform to specifications. In a random sample of 1,000 chips drawn from a large
production run, 75 were defective. Do the data provide sufficient evidence at the 1%
level of significance to enable us to conclude that the manufacturer’s claim is true?
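Examples 1 and 2 above can be checked with a short script. The sketch assumes the usual large-sample z test for a proportion, z = (p_hat - p0) / sqrt(p0(1 - p0)/n), and the usual normal-approximation confidence interval; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """z statistic for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Example 1 (zoo): H0: p = 0.10, HA: p > 0.10, alpha = 0.05.
z_zoo = proportion_z(48, 400, 0.10)
zoo_reject = z_zoo > NormalDist().inv_cdf(0.95)

# Example 2(a) (assembly line): H0: p = 0.10, HA: p > 0.10, alpha = 0.10.
z_line = proportion_z(22, 100, 0.10)
line_reject = z_line > NormalDist().inv_cdf(0.90)

# Example 2(b): 99% confidence interval for the defective rate.
ci_low, ci_high = proportion_ci(22, 100, 0.99)

print("zoo: z =", round(z_zoo, 3), "reject H0:", zoo_reject)
print("line: z =", round(z_line, 3), "reject H0:", line_reject)
print("99% CI:", round(ci_low, 4), "to", round(ci_high, 4))
```

Under these assumptions the zoo sample does not give enough evidence of a profit at the 5% level, while the assembly-line sample does indicate a defective rate above 10%.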
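Test statistic is $\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$, where $O_i$ and $E_i$ are the observed and expected frequencies of cell $i$ and $k$ is the number of cells.
Rejection region is $\chi^2 > \chi^2_{\alpha,\,k-1}$.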
Example
1. Two companies A and B have recently conducted aggressive advertising campaigns
in order to maintain and possibly increase their respective shares of the market for a
particular product. These two companies enjoy a dominant position in the market.
Before advertising campaigns began, the market share for Company A was 45%
while Company B had a market share of 40%. Other competitors accounted for the
remaining market share of 15%. To determine whether these market shares changed
after the advertising campaigns, a marketing analyst solicited the preferences of a
random sample of 200 consumers of this product. Of the 200 consumers, 100
indicated a preference for Company A’s product, 85 preferred Company B’s
product and the remainder preferred one or another of the products distributed by
other competitors. Conduct a test to determine at the 5% level of significance,
whether the market shares have changed from the levels they were at before the
advertising campaigns occurred.
2. To determine whether a single die is balanced (fair), the die was rolled 600 times. The
observed frequencies with which each of the six sides of the die turned up are
recorded in the following table: -
Face                 1     2     3     4     5     6
Observed frequency   114   92    84    101   107   102
Is there sufficient evidence to conclude at the 5% level of significance, that the die is
not fair?
3. Grades assigned by an economics instructor have historically followed a symmetrical
distribution.
Grade        A    B    C    D    F
Percentage   5    25   40   25   5
Can we conclude at the 1% level of significance that this year’s grades are
distributed differently than they were in the past?
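Examples 1 and 2 above can be worked with the chi-square goodness-of-fit statistic, the sum of (observed - expected)^2 / expected over all cells. This is a sketch under that assumption; the critical values are copied from a standard chi-square table because Python's standard library has no chi-square distribution.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Table values (assumed from a standard chi-square table, alpha = 0.05).
CRITICAL_5PCT = {2: 5.991, 5: 11.070}

# Example 1: market shares of 45%, 40% and 15% before the campaigns.
share_observed = [100, 85, 15]
share_expected = [0.45 * 200, 0.40 * 200, 0.15 * 200]
chi2_share = chi_square_gof(share_observed, share_expected)
share_changed = chi2_share > CRITICAL_5PCT[len(share_observed) - 1]

# Example 2: a fair die gives each face an expected frequency of 600 / 6.
die_observed = [114, 92, 84, 101, 107, 102]
die_expected = [sum(die_observed) / 6] * 6
chi2_die = chi_square_gof(die_observed, die_expected)
die_unfair = chi2_die > CRITICAL_5PCT[len(die_observed) - 1]

print("market shares: chi2 =", round(chi2_share, 3), "reject H0:", share_changed)
print("die: chi2 =", round(chi2_die, 3), "reject H0:", die_unfair)
```

With these figures the market-share statistic exceeds its critical value, so the shares appear to have changed, while the die statistic does not, so there is not enough evidence that the die is unfair.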
Rule of five
For the discrete distribution of the test statistic to be adequately approximated by the
continuous chi-square distribution, the conventional rule is to require that the expected
frequency for each cell be at least 5. Where necessary, cells should be combined in order
to satisfy this condition. The choice of cells to be combined should be made in such a
way that meaningful categories result from the combination.
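The rule of five can be checked before the test is run. The sketch below applies the historical grade percentages from Example 3 to a hypothetical class of 60 students (the class size is an assumption, since the notes give no sample size) and combines the cells that fall short.

```python
def expected_counts(percentages, n):
    """Expected frequency of each cell for a sample of size n."""
    return [p * n / 100 for p in percentages]

def cells_below_five(expected):
    """Indices of cells whose expected frequency is under 5."""
    return [i for i, e in enumerate(expected) if e < 5]

grades = ["A", "B", "C", "D", "F"]
historical_pct = [5, 25, 40, 25, 5]
n_students = 60  # hypothetical class size

expected = expected_counts(historical_pct, n_students)
too_small = cells_below_five(expected)

# Combine each small cell with its neighbour so the categories stay meaningful:
# A with B (high grades) and F with D (low grades).
merged_labels = ["A or B", "C", "D or F"]
merged_expected = [expected[0] + expected[1], expected[2], expected[3] + expected[4]]

print("expected:", dict(zip(grades, expected)))
print("cells under 5:", [grades[i] for i in too_small])
print("after merging:", dict(zip(merged_labels, merged_expected)))
```

Merging reduces the number of cells, so the degrees of freedom of the test fall from 4 to 2 in this illustration.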
Examples
1. The trustee of a company’s pension plan has solicited the opinions of a sample of the
company’s employees regarding a proposed revision of the plan. A breakdown of the
responses is shown in the table below: -
Response   Lower-level management   Middle management   Top management
For        67                       32                  11
Against    63                       18                  9
Is there sufficient evidence at the 5% significance level, to conclude that the
responses differ among the three groups of employees?
2. The operations manager at a shirt manufacturing plant has been concerned about the
large number of defects that the company’s three shifts have been producing. There
appear to be three types of defects: Improper stitching, buttons not aligned with
button holes and inconsistent colouring. The manager decides to investigate the
problem. As a first step to improving the quality, she wants to know if the number
and type of defects are the same for all three shifts. A random sample of one day’s
shirt production is taken. The number of each type of defect and the number of
perfect shirts for each are shown in the following table.
                          Shift
Shirt condition        1      2      3      Total
Perfect                224    249    238    711
Improperly stitched    15     19     21     55
Unaligned buttons      8      12     12     32
Inconsistent colour    17     16     11     44
Total                  264    296    282    842
Do these results allow the operations manager to conclude that at the 10%
significance level, there are differences in quality among the three shifts?
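Example 1 above (the pension plan survey) can be worked as a chi-square test of independence, where each expected frequency is row total x column total / grand total. This is a sketch under that assumption; the critical value is taken from a standard chi-square table and the function name is hypothetical.

```python
def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: For, Against. Columns: lower-level, middle and top management.
pension = [
    [67, 32, 11],
    [63, 18, 9],
]

chi2_pension, df_pension = chi_square_independence(pension)
CRITICAL_5PCT_DF2 = 5.991  # table value (assumed), alpha = 0.05, 2 degrees of freedom
responses_differ = chi2_pension > CRITICAL_5PCT_DF2

print("chi2 =", round(chi2_pension, 3), "df =", df_pension)
print("reject H0 of no difference:", responses_differ)
```

The same function accepts the 4 x 3 shirt-defect table (without its Total row and column), which has 6 degrees of freedom.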
REGRESSION ANALYSIS
Regression involves developing a mathematical equation that analyses the relationship
between the variable to be forecast (dependent variable) and the variables that the
statistician believes are related to the forecast variable (independent variables).
Regression is the estimation of unknown values or the prediction of one variable from
known values of other variables.
Types of regression
Simple linear regression: Involves a relationship between two variables only.
Simple Regression
The first step in establishing the relationship between X and Y is to obtain observations
on the two variables and analyze the data using a scatter diagram to indicate whether a
positive or negative relationship exists between X and Y. The relationship can be
approximated by a straight line. Algebraically, the relationship is
$Y = \beta_0 + \beta_1 X$.
The above function is deterministic since it gives an exact relationship between X and Y.
When the line is plotted, not all the points will fall on the line because of the following
reasons:-
Omission of other explanatory variables from the function
Random behavior of human beings
Imperfect specification of the functional form of the model
Errors of aggregation
Errors of measurement
To account for the deviations of some points from the straight line, the error term is
introduced. The introduction of the error term makes the function stochastic:
$Y = \beta_0 + \beta_1 X + \varepsilon$. To estimate the values of the coefficients $\beta_0$ and $\beta_1$, we need
observations on Y, X and the error term. However, the error term is not observable and
therefore we make assumptions about the error term.
Other assumptions
The explanatory variables are not perfectly linearly related or correlated (No
multicollinearity)
The variables are correctly aggregated
The relation being estimated is identified
The relationship is correctly specified
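To determine the values of $\beta_0$ and $\beta_1$, the following two normal equations are to be
solved simultaneously:
$$\sum Y = n\beta_0 + \beta_1 \sum X$$
$$\sum XY = \beta_0 \sum X + \beta_1 \sum X^2$$
Alternatively, the values of $\beta_0$ and $\beta_1$ can be obtained using the following formulas:
$$\beta_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad \beta_0 = \bar{Y} - \beta_1 \bar{X}$$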
Example
1. A random sample of eight auto drivers insured with a company and having similar
auto insurance policies was selected. The following table lists their driving experience
(in years) and the monthly auto insurance premium (in Sh.000) paid by them.
Driving experience (Years) 5 2 12 9 15 6 25 16
Monthly auto insurance premium 64 87 50 71 44 56 42 60
(In Sh.000)
i. Find the least squares regression line by identifying the appropriate dependent and
independent variable
ii. Interpret the meaning of the constants calculated in part (i) above.
iii. Compute the coefficient of correlation and coefficient of determination and
interpret their values.
2. A farmer wanted to find out the relationship between the amount of fertilizer used and
the yield of corn. He selected seven acres of his land on which he used different
amounts of fertilizer to grow corn. The following table gives the amount (in kg) of
fertilizer used and the yield (in Tonnes) of corn for each of the seven acres.
Fertilizer used (kg)       120   80  100   70   88   75  110
Yield of corn (tonnes)     138  112  129   96  119  104  134
i. Find the least squares regression line by identifying the appropriate dependent and
independent variable.
ii. Interpret the meaning of the constants calculated in part (i) above.
iii. Compute the coefficient of correlation and coefficient of determination and interpret
their values.
iv. Predict the yield of corn per acre for 105 kg of fertilizer used.
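The calculations asked for in Example 1 can be sketched as follows, taking driving experience as the independent variable X and the premium as the dependent variable Y. The formulas used (least squares slope and intercept, and Karl Pearson's r) are the standard ones and are stated in the comments as assumptions, since the notes do not show them here; variable names are illustrative.

```python
from math import sqrt

experience = [5, 2, 12, 9, 15, 6, 25, 16]    # X: driving experience (years)
premium = [64, 87, 50, 71, 44, 56, 42, 60]   # Y: monthly premium (Sh.000)

n = len(experience)
sx, sy = sum(experience), sum(premium)
sxy = sum(x * y for x, y in zip(experience, premium))
sxx = sum(x * x for x in experience)
syy = sum(y * y for y in premium)

# Assumed formulas: b = (n*SXY - SX*SY) / (n*SXX - SX^2), a = mean(Y) - b*mean(X)
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = sy / n - b * sx / n
# Karl Pearson's coefficient of correlation and the coefficient of determination
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(f"Y = {a:.4f} + ({b:.4f})X")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The fitted line is approximately Y = 76.66 - 1.55X, so each additional year of experience is associated with a premium lower by about Sh. 1,550 per month. Here r is about -0.77 and r^2 about 0.59, meaning roughly 59% of the variation in premiums is explained by driving experience. The same steps apply to Example 2 with fertilizer as X and yield as Y.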
CORRELATION
Definition: It is the existence of some definite relationship between two or more
variables. Correlation analysis is a statistical tool used to describe the degree to which
one variable is linearly related to another variable.
Types of Correlation
Correlation may be classified in the following ways:-
Positive and negative correlation
Correlation is said to be positive if the two variables move in the same direction;
otherwise it is negative (they move in opposite directions).
Linear and Non-Linear correlation
Correlation is linear if the amount of change in one variable tends to bear a constant ratio
to the amount of change in the other variable; otherwise it is non-linear.
Simple, partial and multiple correlation
Simple correlation is where two variables are studied while partial or multiple involves
three or more variables.
Scatter diagram
It is a chart that portrays the relationship between two variables.
Advantages
It is a simple, non-mathematical method of studying correlation between variables.
It is not influenced by the size of extreme values.
Limitation
One cannot establish the exact degree of correlation between the variables.
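Karl Pearson’s coefficient of correlation
Karl Pearson’s coefficient of correlation, denoted by r, measures the degree of linear
relationship between two variables X and Y:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
The value of r lies between –1 and +1. The closer r is to +1 or to –1, the stronger the
linear relationship between the variables; the closer r is to 0, the weaker the relationship.
The closeness of the relationship is not proportional to r.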
The following table lists the interpretations for various correlation coefficients:
Value of r (ignoring sign)    Comment
0.8 to 1.0                    Very strong
0.6 to 0.8                    Strong
0.4 to 0.6                    Moderate
0.2 to 0.4                    Weak
0.0 to 0.2                    Very weak
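The table can be applied mechanically, as in the sketch below. Two things are assumptions rather than part of the notes: the bands are applied to the size of r ignoring its sign, and a value falling exactly on a boundary (for example 0.6) is placed in the stronger band, since the table's ranges overlap at their end points. The function name is hypothetical.

```python
def describe_correlation(r):
    """Return a verbal description of a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on the magnitude; the sign gives direction
    if size >= 0.8:
        strength = "Very strong"
    elif size >= 0.6:
        strength = "Strong"
    elif size >= 0.4:
        strength = "Moderate"
    elif size >= 0.2:
        strength = "Weak"
    else:
        strength = "Very weak"
    if r == 0:
        return "No linear correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(-0.77))  # prints Strong negative
```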
Advantage
It summarizes in one figure the degree of correlation and whether it is positive or
negative.
Limitations
It assumes a linear relationship whether or not that assumption is true.
The coefficient can be misinterpreted.
The value of the coefficient is unduly affected by extreme values.
It is time consuming.
Method of least squares
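Spearman’s rank correlation coefficient
It is denoted by R or ρ and is computed from the ranks of the two variables:
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where d is the difference between the two ranks of each item and n is the number of
paired observations.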
Example
The ranks given by two judges to 10 individuals are given below.
Individual 1 2 3 4 5 6 7 8 9 10
Judge 1(X) 1 2 7 9 8 6 4 3 10 5
Judge 2 (Y) 7 5 8 10 9 4 1 6 3 2
Calculate Spearman’s rank correlation coefficient.
Example
Calculate the rank correlation coefficient for the following marks awarded to 1st
year B.Com students:
CMS 100 45 47 60 38 50
CAC 100 60 61 58 48 46
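The two examples above can be worked through with the short sketch below. It assumes the usual Spearman formula for untied ranks, R = 1 - 6*sum(d^2) / (n*(n^2 - 1)), where d is the difference between the two ranks of each item. For the marks example it also assumes that the highest mark receives rank 1 and that the five values after each course code are the marks. The helper names are hypothetical.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman(rank_x, rank_y):
    """Spearman's rank correlation for two lists of untied ranks."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


# First example: the data are already ranks
judge1 = [1, 2, 7, 9, 8, 6, 4, 3, 10, 5]
judge2 = [7, 5, 8, 10, 9, 4, 1, 6, 3, 2]
r_judges = spearman(judge1, judge2)

# Second example: marks must be converted to ranks first
cms = [45, 47, 60, 38, 50]
cac = [60, 61, 58, 48, 46]
r_marks = spearman(ranks(cms), ranks(cac))

print(round(r_judges, 4), round(r_marks, 4))
```

For the judges, the sum of d^2 is 128, giving R of about 0.22 (weak agreement). For the marks, the sum of d^2 is 22, giving R = -0.1. Tied values would need averaged ranks and a correction to the formula, which this sketch does not handle.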
Merits of the Rank method
It is simpler to understand and easier to apply than Karl Pearson’s method.
Where the data are of a qualitative nature, such as honesty, efficiency or intelligence,
the method can be used with great advantage.
It is the only method that can be used where we are given the ranks and not the actual
values.
Limitations
The method cannot be used for finding out correlation in a grouped frequency
distribution.
Where the number of observations exceeds 30, the calculations become quite tedious
and require a lot of time.
The final research report contains everything in the proposal (apart from the time
schedule and budget) and, in addition, a dedication, acknowledgements, Chapter Four (Data
analysis and findings) and Chapter Five (Summary, conclusions and recommendations).
Prefatory items
Prefatory items do not have a direct bearing on the research itself. They assist the reader
in using the research report. They can include: -
Title page:
The title page should include the title of the report, the date and for whom and by whom
it was prepared. The title should be brief but should include the variables included in the
study, the type of relationship among the variables and the population to which the results
may be applied.
Declaration
This is where the researcher declares that the work is his/her original work.
Dedication
Some researchers wish to dedicate their work to a person or persons they deem special in
their lives.
Acknowledgements
During the research process, the researcher may require help from other individuals or
organisations. The researcher should acknowledge the help received from these
individuals and organisations.
Table of contents and list of figures and tables
Any report with several sections that total more than six to ten pages should have a table
of contents. If there are many tables, charts or other exhibits, they should also be listed
after the table of contents in a separate list of tables or list of figures.
List of abbreviations and acronyms
All abbreviations and acronyms used in the report should be explained. An abbreviation is a
short form of a word while an acronym is a contraction formed by taking the first letter of
several words.
Abstract
A proposal abstract is a summary of what the researcher intends to do. It should be brief,
precise and to the point.
Chapter One
1.0 Introduction
The introduction prepares the reader for the report by describing the parts of the report.
1.1 Background to the problem
In the background, the researcher should broadly introduce the topic under investigation.
The researcher introduces briefly the general area of study, and then narrows down to the
specific problem to be studied. The background enables the reader to have an idea of
what is happening regarding the area under investigation.
1.2 The problem Statement
The researcher states the problem under investigation. The researcher should describe the
factors that make the stated problem a critical issue to warrant the study. Relevant
literature can be referred to. It should be brief and precise.
1.3 The objectives of the study
Research objectives are those specific issues within the scope of the stated purpose that
the researcher wants to focus upon and examine in the study. The objectives should be
specific, measurable, achievable, realistic and time-bound. Objectives guide the researcher
in formulating testable hypotheses.
1.4 Research questions
These are the questions, which the researcher would like to be answered by undertaking
the study. They should be formulated from the objectives of the study.
1.5 Research Hypothesis
A hypothesis is a researcher’s prediction regarding the outcome of the study. It states
possible differences, relationships or causes between two variables or concepts.
Hypotheses are derived from or based on existing theories, previous research, personal
observations or experiences. The test of a hypothesis involves collection and analysis of
data that may either support or fail to support the hypothesis. If the results fail to support
a stated hypothesis, it does not mean that the study has failed but it implies that the
existing theories or principles need to be revised or retested under various situations.
1.6 Scope of the study
This section indicates the boundaries of the study.
1.7 Significance / Justification of the study
The justification helps to answer the following questions. Why is this work important?
What are the implications of doing it? How does it link to other knowledge? How does it
stand to inform policy making? The significance must be strong enough to warrant the
use of time, energy and money in carrying out the research.
1.8 Assumptions and limitations of the study
An assumption is any fact that a researcher takes to be true without actually verifying it.
It puts some boundary around the study and provides the reader with vital information,
which influences the way results of the study are interpreted. A limitation is an aspect of
the research that may influence the results negatively but over which the researcher has no
control. A common limitation in social science studies is the scope of the study, which
sometimes may not allow generalizations. Sample size may also be another limitation.
Chapter Two
2.0 Literature Review
The purpose of the literature review is to situate your research in the context of what is
already known about a topic. It need not be exhaustive, but it needs to show how your work
will benefit the whole. It should provide the theoretical basis for your work, show what
has been done in the area by others, and set the stage for your work.
In a literature review you should give the reader enough ties to the literature that they feel
confident that you have found, read, and assimilated the literature in the field. It should
probably move from the more general to the more focused studies, but need not be
exhaustive, only relevant.
The literature review should clearly present the holes in the knowledge that need to be
plugged and by so doing, situate your work. It is the place where you establish that your
work will fit in and be significant to the discipline.
Chapter Three
3.0 Research Methodology
This section should make clear to the reader the way that you intend to approach the
research question and the techniques and logic that you will use to address it.
3.1 Research design
The coverage of the design must be adapted to the purpose. In an experimental study, the
materials, tests, equipment, control conditions and other devices should be described. In
descriptive or ex post facto designs, it may be sufficient to cover the rationale for using
one design instead of competing alternatives. The strengths and weaknesses of the design
can be identified and the instrumentation and materials discussed.
3.2 The target population
The researcher should explicitly define the target population being studied.
3.3 Sampling strategy
Explanations of the sampling methods, uniqueness of the chosen parameters or other
points that need explanation should be covered with brevity.
3.4 Data Collection Tools and Techniques
This part of the report describes the specifics of gathering the data. Its contents depend on
the design. This might include the data that you anticipate collecting and a description of
the instruments you will use. Detailed copies of the data collection tools e.g.
questionnaires, interview schedule or observation schedule should be attached as an
appendix.
3.5 Data Analysis
This section summarizes the methods used to analyze the data. It describes data handling,
preliminary analysis, statistical tests, computer programs and other technical information.
The rationale for the choice of analysis approaches should be clear. A brief commentary
on assumptions and appropriateness of use should be presented.
Chapter Four
4.0 Data analysis and Findings
The objective is to explain the data rather than draw interpretations or conclusions. When
quantitative data can be presented, it should be done as simply as possible with charts,
graphics and tables. The data need not include everything collected. Only material
important to the reader’s understanding of the problem and the findings should be
included. Findings that support the hypothesis and findings that do not should both be
included.
Chapter Five
5.0 Summary and Conclusions
The summary is a brief statement of the essential findings. Sectional summaries may be
used if there are many specific findings. These may be combined into an overall
summary. Conclusions represent inferences drawn from the findings. Conclusions may
be presented in a tabular form for easy reading and reference. Summary findings may be
subordinated under the related conclusion statement.
Recommendations
There are usually a few ideas about corrective actions. In academic research, the
recommendations are often further study suggestions that broaden or test understanding
of the subject area. In applied research, the recommendations will usually be for
managerial action rather than research action. The writer may offer several alternatives
with justifications.
References
The use of secondary data requires a reference or a bibliography. Proper citation, style
and formats are unique to the purpose of the report.
Appendixes
The appendixes are the place for complex tables, statistical tests, supporting documents,
copies of forms and questionnaires, detailed descriptions of the methodology, instructions
to field workers and other evidence important for later support. The reader who wishes to
learn about technical aspects of the study and to look at statistical breakdowns will want a
complete appendix.
Time schedule
It is a listing of the major activities and the anticipated time needed to accomplish each
activity. The time is usually given in months. Activities to be undertaken may overlap.
Budget
A budget is a list of items that will be required to carry out the research and their
approximate cost. It should be detailed enough and precise on items needed, prices per
unit and total cost. Details of requirements in each budget will be governed by the type of
research.
Characteristics of a Good Proposal:
a) The need for the proposed activity is clearly established, preferably with data.
b) The most important ideas are highlighted and repeated in several places.
c) The objectives of the project are given in detail.
d) There is a detailed schedule of activities for the project, or at least sample portions
of such a complete project schedule.
e) Collaboration with all interested groups in planning of the proposed project is
evident in the proposal.
f) The commitment of all involved parties is evident, e.g., letters of commitment in
the appendix and cost sharing stated in both the narrative of the proposal and the
budget.
g) The budget and the proposal narrative are consistent.
h) The uses of money are clearly indicated in the proposal narrative as well as in the
budget.
i) All of the major matters indicated in the proposal guidelines are clearly addressed
in the proposal.
j) The agreement of all project staff and consultants to participate in the project was
acquired and is so indicated in the proposal.
k) All governmental procedures have been followed with regard to matters such as
civil rights compliance and protection of human subjects.
l) Appropriate detail is provided in all portions of the proposal.
m) All of the directions given in the proposal guidelines have been followed
carefully.
n) Appendices have been used appropriately for detailed and lengthy materials
which the reviewers may not want to read but are useful as evidence of careful
planning, previous experience, etc.
o) The length is consistent with the proposal guidelines and/or funding agency
expectations.
p) The budget explanations provide an adequate basis for the figures used in building
the budget.
q) If appropriate, there is a clear statement of commitment to continue the project
after external funding ends.
r) The qualifications of project personnel are clearly communicated.
s) The writing style is clear and concise. It speaks to the reader, helping the reader
understand the problems and proposal. Summarizing statements and headings are
used to lead the reader.