SAS 2130 Statistics 2021
By Kinyita A.M
Course Purpose: This course introduces the students to the essentials required to understand issues
related to measurement and how to generate descriptive information and statistical analysis from these mea-
surements.
Course Objectives: The aim of this course is to provide students with the necessary statistical back-
ground for analyzing data and drawing inference from the analyses.
Expected Outcomes: On completion of the unit the student will be able to:
1. Summarise the main features of a data set (exploratory data analysis).
(a) Summarise a set of data using a table or frequency distribution, and display it graphically using
a line plot, a box plot, a bar chart, histogram, stem and leaf plot, or other appropriate elementary
device.
(b) Describe the level/location of a set of data using the mean, median, mode, as appropriate.
(c) Describe the spread/variability of a set of data using the standard deviation, range, interquartile
range, as appropriate.
(d) Explain what is meant by symmetry and skewness for the distribution of a set of data.
(a) Explain what is meant by a set function, a sample space for an experiment, and an event.
(b) Define probability as a set function on a collection of events, stating basic axioms.
(c) Derive basic properties satisfied by the probability of occurrence of an event, and calculate prob-
abilities of events in simple situations.
(d) Derive the addition rule for the probability of the union of two events, and use the rule to calculate
probabilities.
(e) Define the conditional probability of one event given the occurrence of another event, and cal-
culate such probabilities.
(f) Derive Bayes’ Theorem for events, and use the result to calculate probabilities.
(g) Define independence for two events, and calculate probabilities in situations involving indepen-
dence.
3. Apply various probability distribution functions and sampling techniques in a real life situation.
5. Investigate linear relationships between variables using correlation analysis and regression analysis.
(f) Calculate R² (coefficient of determination) and describe its use to measure the goodness of fit
of a linear regression model.
(g) Use a fitted linear relationship to predict a mean response or an individual response with confi-
dence limits.
(h) State the usual multiple linear regression model (with several explanatory variables).
Course Description: Introduction to statistics; Descriptive and inferential statistics. Data: sources, collec-
tion, classification and processing. Frequency distributions. Graphical representation of data: Pie-charts,
histograms, frequency polygons, ogive & stem-and-leaf diagrams. Measures of central tendency (the arithmetic
mean, mode, median, quartiles & percentiles) and dispersion (variance and standard deviation). Skewness
and kurtosis. Basic probability theory: Classical and Axiomatic approaches to probability; events,
compound and conditional including Bayes’ theorem. Concept of discrete random variable: expectation
and variance. Correlation and simple regression analysis. Fitting data to a best line.
Pre-Requisites: Mathematics for Science.
Reference Books:
1. Mathematical statistics. Freund, John E - 7th ed. - Prentice Hall International, 2004. 614 pages.
ISBN: 978-0131246461.
3. Introduction to the Theory of Probability with Statistical Application. Hogg and Craig.
Teaching Methodology: The method of instruction will be lectures, interactive tutorials and any other
presentations/demonstrations the lecturer will deem fit towards enhancing understanding of the concepts
taught in class.
Contents
1 Introduction
1.1 Categories of Statistics
1.2 Commonly used Terms in Statistics
1.2.1 Populations and Samples
1.3 Characteristics of Statistics
1.4 Functions of Statistics
1.5 Scope of Statistics
1.6 Limitations of Statistics
1.7 Self-Test Questions
6 Partition Values
6.1 Quartiles
6.1.1 Quartile for Individual Observations (Ungrouped Data)
6.1.2 Quartile for a Frequency Distribution (Discrete Data)
6.1.3 Quartile for Grouped Frequency Distribution
6.2 Deciles
6.2.1 Deciles for Individual Observations (Ungrouped Data)
6.2.2 Decile for a Frequency Distribution (Discrete Data)
6.2.3 Decile for Grouped Frequency Distribution
6.3 Percentiles
6.4 Estimation of Measures of Location from Ogive Curves
6.5 Measures of Location from Grouped Data
6.6 Boxplots
6.7 Properties of Measures of Central Tendency
7 Measures of Dispersion
7.1 Introduction
7.2 Range
7.3 Inter-Quartile Range (IQR)
7.4 Quartile Deviation
7.5 Mean Absolute Deviation (MAD)
7.6 Variance and Standard Deviation
7.6.1 Calculations from a Frequency Distribution
7.7 Assumed Mean and Coding Method
7.8 Standard Deviation
7.9 Variance
7.10 Properties of Measures of Dispersion
7.11 Combined Variance
7.12 Relative measures of Dispersion
7.12.1 Coefficient of range
7.12.2 Quartile coefficient of deviation
7.12.3 Coefficient of mean deviation
7.12.4 Coefficient of Variation (C.V.)
8 Statistical Moments
8.1 Moments about Origin (Raw moments)
8.2 Moments about mean (or Central Moments)
8.2.1 Moments about mean (or Central Moments) for ungrouped data
8.2.2 Moments about mean (or Central Moments) for discrete frequency distribution
8.3 Moments about any arbitrary constant point
8.4 Relations between central moments and raw moments (up to 4th order)
8.5 Practice Problems
10 Correlations Analysis
10.1 Introduction
10.2 Definition of Correlation
11 Types of correlation
11.1 Correlation Analysis
14 Probability
14.1 Definitions of Terms
14.2 Probability of an Event
14.2.1 Classical definitions
14.2.2 Frequentist approach
14.2.3 Subjective/Bayesian approach
14.3 Laws of Probability
14.4 Law of Total Probability
1 Introduction
Statistics is a discipline, which scientifically deals with data, and is often described as the science of data.
In dealing with statistics as data, statistics has developed appropriate methods of collecting, presenting,
summarizing, and analysing data, and thus consists of a body of these methods. Statistics has become an
integral part of our daily lives. Every day, we are confronted with some form of statistical information through
newspapers, magazines and other forms of communication. Such statistical information has become highly
influential in our lives. Indeed, the famous science fiction writer H.G. Wells had predicted nearly a century
ago that
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read
and write".
Thus, the subject of statistics has itself gained considerable importance in affecting the processes of our
thinking and decision making.
The subject of statistics is primarily concerned with making decisions about various properties of some
population of interest such as stock market trends, unemployment rates in various sectors of industries, de-
mographic shifts, interest rates, inflation rates over the years and so on. By way of examples, consider the
following statistical statements.
• The crime rate has gone up by 15% compared with last year.
• The rate of inflation in Kenya is expected to remain above 15% per year for the next 5 years.
• Less than 20% of all high school graduates enter colleges for higher education, and less than 40% of
those who do enter colleges, actually graduate.
• The majority of Kenyans consider Japanese cars superior in quality to Chinese cars.
All the above statements represent statistical conclusions in some form.
Statistics is a scientific discipline concerned with collection, description, analysis, and interpretation
of data obtained from observation or experiment.
1. Collection of data: Once an investigator has collected data through a survey, it is necessary to edit
these data in order to correct any apparent inconsistencies, ambiguities, recording errors or for that
matter any mistake that can enter into the actual computations. But even before the data has been
collected and edited, it is assumed that these can be suitably classified according to some common
characteristic of the population sampled.
2. Description of data: The organized data can now be presented in the form of tables or diagrams or
graphs. This presentation in an orderly manner facilitates the understanding as well as analysis of data.
3. Analysis of data: The basic purpose of data analysis is to make it useful for certain conclusions. This
analysis may simply be a critical observation of data to draw some meaningful conclusions about it or it
may involve highly complex and sophisticated mathematical techniques. Some simple statistical tools
such as calculations of averages, dispersion of data around averages and percentages are commonly
used to analyze data.
4. Interpretation of data: Interpretation means drawing conclusions from the data which form the
basis of decision making. Correct interpretation requires a high degree of skill and experience and is
necessary in order to draw valid conclusions.
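The simple tools mentioned under analysis of data (averages, dispersion around the average, and percentages) can be sketched in a few lines of Python. The marks below are hypothetical and the pass mark of 50 is an assumption made only for illustration; the standard `statistics` module is used.

```python
import statistics

# Hypothetical marks for ten students (illustrative data only).
marks = [45, 52, 58, 60, 61, 65, 70, 72, 78, 89]

mean_mark = statistics.mean(marks)          # the average
spread = statistics.pstdev(marks)           # dispersion of the marks around the average
passed = sum(1 for m in marks if m >= 50)   # assumed pass mark of 50
pass_rate = 100 * passed / len(marks)       # a percentage

print(f"mean = {mean_mark}, standard deviation = {spread:.2f}, pass rate = {pass_rate}%")
```

Here `pstdev` treats the ten marks as the whole group of interest; if they were a sample from a larger class, `statistics.stdev` would be the usual choice.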
(In descriptive statistics, we use our analysis of data in order to describe the situation from which it is
drawn, that is, to summarize the information we have found in a set of data, and to interpret it or present it
clearly.
In inferential statistics, we are interested in using the analysis of data (the ‘sample’) in order to make
predictions, generalizations, or other inferences about a larger set of data (the ‘population’). For example, we
might want to ask how confidently we can infer that the average BSc student at DeKUT has completed x
years of college.)
Descriptive statistics deals with collecting, summarizing, and simplifying data, which are otherwise
quite unwieldy and voluminous. It seeks to achieve this in such a manner that meaningful conclusions can
be readily drawn from the data. Descriptive statistics may thus be seen as comprising methods of bringing
out and highlighting the latent characteristics present in a set of numerical data. It not only facilitates an
understanding of the data and systematic reporting thereof, but also makes them amenable to
further discussion, analysis, and interpretation.
The first step in any scientific inquiry is to collect data relevant to the problem in hand. When the inquiry
relates to physical and/or biological sciences, data collection is normally an integral part of the experiment
itself. In fact, the very manner in which an experiment is designed, determines the kind of data it would
require and/or generate. The problem of identifying the nature and the kind of the relevant data is thus
automatically resolved as soon as the design of the experiment is finalized. This is possible in the case of the physical
sciences. In the case of the social sciences, where the required data are often collected through a questionnaire
from a number of carefully selected respondents, the problem is not so simply resolved. For one thing,
designing the questionnaire itself is a critical initial problem. For another, the number of respondents to be
accessed for data collection and the criteria for selecting them have their own implications and importance
for the quality of results obtained. Further, once the data have been collected, they are assembled, organized,
and presented in the form of appropriate tables to make them readable. Wherever needed, figures, diagrams,
charts, and graphs are also used for better presentation of the data. A useful tabular and graphic presentation
of data will require that the raw data be properly classified in accordance with the objectives of investigation
and the relational analysis to be carried out.
A well thought-out and sharp data classification facilitates easy description of the hidden data character-
istics by means of a variety of summary measures. These include measures of central tendency, dispersion,
skewness, and kurtosis, which constitute the essential scope of descriptive statistics.
• statistics pertaining to dispersion around the central tendency such as the range or standard deviation
Inferential statistics helps to evaluate the risks involved in reaching inferences or generalizations about an
unknown population on the basis of sample information. For example, an inspection of a sample of five
battery cells drawn from a given lot may reveal that all the five cells are in perfectly good condition. This
information may be used to conclude that the entire lot is good enough to buy or not.
Since this inference is based on the examination of a sample of a limited number of cells, it is quite possible
that not all the cells in the lot are in order. It is also possible that all the items that happen to be included in
the sample are unsatisfactory. This may be used to conclude that the entire lot is of unsatisfactory quality,
whereas the fact may indeed be otherwise. It may, thus, be noticed that there is always a risk of an inference
about a population being incorrect when based on the knowledge of a limited sample. The rescue in
such situations lies in evaluating such risks. For this, statistics provides the necessary methods. These centre
on quantifying, in probabilistic terms, the chances of decisions taken on the basis of sample information
being incorrect. This requires an understanding of the what, why, and how of probability and probability
distributions to equip ourselves with methods of drawing statistical inferences and estimating the degree of
reliability of these inferences.
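To make the risk in the battery-cell example concrete, the sketch below computes the chance that a sample of five cells contains no defective cell even though the lot does contain defective cells. The lot size (100) and the number of defective cells (10) are assumed figures chosen only for illustration, and `prob_all_good` is a hypothetical helper name; the calculation counts equally likely samples drawn without replacement.

```python
from math import comb

def prob_all_good(lot_size, defective, sample_size):
    """Probability that every sampled cell is good when sampling without replacement."""
    good = lot_size - defective
    return comb(good, sample_size) / comb(lot_size, sample_size)

# Assumed lot: 100 cells of which 10 are defective; a sample of 5 is inspected.
risk = prob_all_good(100, 10, 5)
print(f"P(all 5 sampled cells are good) = {risk:.3f}")
```

Under these assumed figures, more than half of all possible samples would look perfect although one cell in ten is defective, which is the kind of risk that inferential statistics sets out to quantify.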
Inferential statistics allow one to infer population parameters based upon sample statistics and to model
relationships within the data. The categories of inferential statistics are
• Estimation is the group of statistics which allow for the estimation about population values based
upon sample data. The two types of statistics in this category are population parameter estimates and
confidence intervals.
• Modeling allows us to develop mathematical equations which describe the interrelationships between
two or more variables.
• Hypothesis testing allows us to test for whether a particular hypothesis we’ve developed is supported
by a systematic analysis of the data.
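The estimation category can be illustrated with a minimal sketch that returns a point estimate of a population mean together with an approximate confidence interval. The data are hypothetical, `mean_confidence_interval` is an illustrative name, and the interval uses the normal approximation; for a sample this small a t-based interval would normally be preferred and would be somewhat wider.

```python
import statistics
from math import sqrt

def mean_confidence_interval(sample, level=0.95):
    """Point estimate and normal-approximation confidence interval for a mean."""
    n = len(sample)
    estimate = statistics.mean(sample)                    # point estimate of the parameter
    std_error = statistics.stdev(sample) / sqrt(n)        # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return estimate, estimate - z * std_error, estimate + z * std_error

# Hypothetical sample of eight measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
estimate, low, high = mean_confidence_interval(sample)
print(f"estimate = {estimate}, 95% interval = ({low:.2f}, {high:.2f})")
```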
The population is often too large for us to examine each of its members. In such cases, we try to learn about
the population by choosing and then examining a subgroup of its elements. The subgroup of a population
is called a sample. A sample is defined as a set of selected individuals, items, or data taken from a popula-
tion of interest. A characteristic (usually numeric) that describes a sample is referred to as a sample statistic.
The total collection of all elements that we are interested in is called a target population.
A sample of k members of a population is said to be a random sample, sometimes called a simple random
sample, if the members are chosen in such a way that all possible choices of the k members are equally
likely.
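The definition of a simple random sample can be checked on a toy case. In the sketch below the five-member population and the choice k = 2 are assumptions for illustration: every possible sample of size 2 is listed, each has the same probability, and one sample is then drawn with Python's `random.sample`, which selects members without replacement.

```python
import random
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]   # hypothetical population of five members
k = 2                                    # sample size

# All possible choices of k members; under simple random sampling each is equally likely.
all_samples = list(combinations(population, k))
prob_each = 1 / comb(len(population), k)

rng = random.Random(2021)                # fixed seed so that the draw can be repeated
chosen = rng.sample(population, k)       # k distinct members chosen at random

print(len(all_samples), "possible samples, each with probability", prob_each)
print("sample drawn:", chosen)
```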
Example 1.1. On the basis of the following example, we will identify the population, sample, population
parameter, and sample statistic: Suppose you read an article in the local college newspaper citing that the
average college student plays 2 hours of video games per week. To test whether this is true for your school,
you randomly approach 20 fellow students and ask them how long (in hours) they play video games per
week. You find that the average student, among those you asked, plays video games for 1 hour per week.
Distinguish the population from the sample.
Answer. In this example, all college students at your school constitute the population of interest, and the 20
students you approached are the sample that was selected from this population of interest. Since it is purported
that the average college student plays 2 hours of video games per week, this is the population parameter (2
hours). The average number of hours playing video games in the sample is the sample statistic (1 hour).
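The distinction in Example 1.1 can also be written out as a short sketch. The individual answers below are invented for illustration (the example gives only their average of 1 hour); the claimed population parameter of 2 hours is taken from the example.

```python
import statistics

CLAIMED_POPULATION_MEAN = 2.0   # population parameter quoted in the article (hours per week)

# Hypothetical answers from the 20 students approached; only their average comes from the example.
hours = [0, 0, 0, 0, 0.5, 0.5, 0.5, 1, 1, 1, 1, 1, 1, 1.5, 1.5, 1.5, 2, 2, 2, 2]

sample_statistic = statistics.mean(hours)               # describes the sample only
difference = sample_statistic - CLAIMED_POPULATION_MEAN

print(f"sample size = {len(hours)}, sample mean = {sample_statistic} hours")
print(f"difference from the claimed parameter = {difference} hours")
```

Whether a difference of this size is evidence against the article's claim is a question for hypothesis testing, not something the sample mean alone can settle.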
1. Statistics are aggregates of facts. Single or isolated facts or figures cannot be called statistics as these
cannot be compared or related to other figures within the same framework. For example, a single
birth in a hospital is not statistics, as it has no significance for analysis purposes. However, when
such information about many births in the same hospital or birth information for different hospitals
is collected, then this information can be compared and analyzed and thus this data would constitute
statistics.
2. Statistics are generally not the outcomes of a single cause, but are affected by multiple causes.
There are a number of forces working together that affect the facts and figures. For example, when
we say that the crime rate in Kenya has increased by 15% over the last year, a number of factors
might have affected this change. These factors may be: general level of economy such as state of
economic recession, unemployment rate, extent of use of drugs, areas affected by crime, extent of
legal effectiveness, social structure of the family in the area and so on. While these factors can be
isolated by themselves, the effects of these factors cannot be isolated and measured individually. It is
generally not possible to segregate and study the effect of each of these forces individually.
3. Statistics are numerically expressed. All statistics are stated in numerical figures which means that
these are quantitative information only.
4. Statistical data is collected in a systematic manner. The procedure for collecting data should be
predetermined and well planned and such data collection should be undertaken by trained investigators.
Haphazard collection of data can lead to erroneous conclusions.
5. Statistics are collected for a predetermined purpose. The purpose and objective of collecting per-
tinent data must be clearly defined, decided upon and determined prior to data collection. This would
facilitate the collection of proper and relevant data. For example, data on the heights of students
would be irrelevant if considered in connection with the ability to get admission in a college, but may
be relevant when considering recruits to join the army.
6. Statistics are enumerated or estimated according to a reasonable standard of accuracy. There are
basically two ways of collecting data. One is actual counting or measuring, which is the most accurate
way. For example, the number of people attending a football game can be accurately determined
by counting the number of tickets sold and redeemed at the gate.
The second way of collecting data is by estimation and is used in situations where actual counting
or measuring is not feasible or where it involves prohibitive cost. For example, the crowd at a cam-
paign rally can be estimated by using visual observation or by taking samples of some segments of
the crowd and then estimating the total number of people on the basis of these samples. Estimates
based on samples cannot be as precise and accurate as actual counts or measurements, but they
should be consistent with the degree of accuracy desired.
7. Statistics must be placed in relation to each other. The main objective of data collection is to
facilitate a comparative or relative study of the desired characteristics of the data. The comparisons
of facts and figures may be conducted regarding the same characteristics over a period of time from a
single source or it may be from various sources at any one given time. For example, prices of different
items in a store as such would not be considered statistics. However, prices of one product in different
stores constitute statistical data since these prices are comparable. Likewise, the changes in the price of a
product in one store over a period of time would be considered statistical data since these changes
provide for comparison over a period of time.
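The crowd estimate described under item 6 can be sketched as follows. The figures are hypothetical: the venue is assumed to be divided into 50 segments of equal size, five of which are counted, and `estimate_crowd` is an illustrative helper name.

```python
import statistics

def estimate_crowd(segment_counts, total_segments):
    """Scale the mean head-count of the sampled segments up to the whole venue."""
    return statistics.mean(segment_counts) * total_segments

# Assumed head-counts in 5 sampled segments out of 50 equal segments.
counts = [118, 131, 124, 109, 128]
estimate = estimate_crowd(counts, 50)
print("estimated crowd size:", estimate)
```

The answer is only as good as the assumption that the sampled segments are typical of the rest of the crowd, which is why such an estimate cannot match an actual count for accuracy.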
2. It facilitates classification and comparison of data. Arrangement of data with respect to different
characteristics facilitates comparison and interpretation. For example, data on age, height, gender,
and family income of college students gives us a much better picture of students when the data is
categorized relative to these characteristics.
4. It helps in predicting future trends (Forecasting). Statistical methods are highly useful tools in
analyzing the past data and predicting some future trends. For example, the sales for a particular
product for the next year can be computed by knowing the sales for the same product over the previous
years, the current market trends and the possible changes in the variable that affect the demand of the
product.
5. It helps the central management and the government in formulating policies. For example, the recently
conducted census will be used as a source of information for planning by the government for
the next 10 years until another census is conducted in 2019.
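The forecasting function in item 4 can be illustrated with a least-squares trend line fitted to past sales. This is a minimal sketch under stated assumptions: the sales figures are hypothetical, `fit_trend` is an illustrative helper, and the forecast uses past sales only, ignoring the market trends and demand variables that a fuller forecast would also consider.

```python
def fit_trend(xs, ys):
    """Least-squares slope and intercept of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales (in units) for years 1 to 5.
years = [1, 2, 3, 4, 5]
sales = [100, 110, 125, 130, 145]

slope, intercept = fit_trend(years, sales)
forecast = intercept + slope * 6          # projected sales for year 6
print(f"trend: {slope:.1f} units per year, forecast for year 6 = {forecast:.1f}")
```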
1. Government. Various departments of the government collect and interpret vast amounts of data and
information for efficient functioning and decision making.
2. Economics. Statistics are widely used in economic study and research. The subject of economics
is mainly concerned with production and distribution of wealth as well as savings and investments.
Some of the areas of economic interest in which statistical tools are used are as follows:
(a) Statistical methods are extensively used in measuring and forecasting Gross National Product
(GNP).
(b) Economic stability is primarily judged by statistical studies of business cycles.
(c) Statistical analyses of population growth, unemployment figures, rural or urban population shifts
and so on influence much of the economic policy making.
(d) Econometric models, which involve the application of statistical methods, are used for the optimum
utilization of the resources available.
(e) Financial statistics are necessary in the fields of money and banking including consumer savings
and credit availability.
3. Physical, Natural and Social Sciences. In physical sciences, as an example, the science of meteorol-
ogy uses statistics in analyzing the data gathered by satellites in predicting weather conditions.
4. Statistics and Research. There is hardly any advanced research going on without the use of statistics
in one form or another. Statistics are used extensively in medical, pharmaceutical and agricultural
research. The effectiveness of a new drug is determined by statistical experimentation and evaluation.
5. Other Areas. Statistics are commonly used by insurance companies, stock brokerage firms, banks,
public utility companies and so on. Statistics are also immensely useful to politicians, who can
estimate their chances of winning by using sampling techniques to select voters at random
and studying their attitudes on issues and policies.
1. It does not deal with individual values. Statistics only deals with aggregate values. For example,
the marks obtained by one student in a class do not carry any meaning in themselves, unless they can be
compared with a set standard or with other students in the same class or with his own marks obtained
earlier.
2. It cannot deal with qualitative characteristics. Statistics is not applicable to qualitative character-
istics such as honesty, kindness, goodness, colour, poverty, beauty, and so on, since these cannot be
expressed in quantitative terms. Such characteristics, however, can be dealt with statistically if some
quantitative values can be assigned to them using a logical criterion.
3. Statistical conclusions are not universally true. Since statistics, unlike the natural sciences, is not
an exact science, statistical conclusions are true only under certain assumptions.
4. Statistical interpretation requires a high degree of skill and understanding of the subject. In
order to get meaningful results, it is necessary that the data be properly and professionally collected
and critically interpreted. It requires extensive training to read and analyze statistics in its proper
context.
5. Statistics can be misused. The famous statement that ‘figures don’t lie but liars can figure’ is a
testimony to the misuse of statistics. Inaccurate or incomplete figures can be manipulated to
get desired inferences. For example, an advertising slogan such as ‘4 out of 5 dentists recommend brand X
toothpaste’ gives us the impression that 80% of all dentists recommend this brand. This may not be
true since we don’t know how big the sample is or whether the sample represents the entire population
or not.
Another example is the opinion polls after the news. We are normally given a percentage but not told
the sample size, that is, the total number of people who called to respond to the questions.
6. There are certain phenomena or concepts where statistics cannot be used. This is because these phe-
nomena or concepts are not amenable to measurement. For example, beauty, intelligence, courage
cannot be quantified. Statistics has no place in all such cases where quantification is not possible.
7. Statistics reveal the average behaviour, the normal or the general trend. Applying the ‘average’
concept to an individual or a particular situation may lead to a wrong conclusion, which
may sometimes be disastrous. For example, one may be misguided when told that the average depth
of a river from one bank to the other is four feet, when there may be some points in between where
its depth is far more than four feet. On this understanding, one may wade in at points of greater
depth, which may be hazardous.
8. Since statistics are collected for a particular purpose, such data may not be relevant or useful in other
situations or cases. For example, secondary data (i.e., data originally collected by someone else) may
not be useful for the other person.
9. Statistics are not 100 per cent precise, as Mathematics or Accountancy is. Those who use statistics
should be aware of this limitation.
10. In statistical surveys, sampling is generally used as it is not physically possible to cover all the units
or elements comprising the universe. The results may not be appropriate as far as the universe is
concerned. Moreover, different surveys based on the same size of sample but different sample units
may yield different results.
11. At times, an association or relationship between two or more variables is studied in statistics, but such
a relationship does not indicate a ‘cause and effect’ relationship. It simply shows the similarity or
dissimilarity in the movement of the two variables. In such cases, it is the user who has to interpret the
results carefully, pointing out the type of relationship obtained.
12. A major limitation of statistics is that it does not reveal everything pertaining to a certain phenomenon. There
is some background information that statistics does not cover. Similarly, there are some other aspects
related to the problem on hand, which are also not covered. The user of Statistics has to be well
informed and should interpret Statistics keeping in mind all other aspects having relevance on the
given problem.
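The point made under item 5 about the slogan ‘4 out of 5 dentists’ can be made numerical. The sketch below compares the approximate 95% margin of error for an observed proportion of 80% when the sample holds 5 dentists and when it holds 1,000. Both sample sizes are assumptions for illustration, `margin_of_error` is an illustrative helper, and the normal approximation used here is crude for a sample as small as 5.

```python
import statistics
from math import sqrt

def margin_of_error(p_hat, n, level=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

small = margin_of_error(0.8, 5)       # 4 out of 5 dentists
large = margin_of_error(0.8, 1000)    # 800 out of 1000 dentists

print(f"n = 5:    80% plus or minus {100 * small:.0f} percentage points")
print(f"n = 1000: 80% plus or minus {100 * large:.1f} percentage points")
```

The same reported percentage is therefore far less informative when the sample size is not disclosed, quite apart from the separate question of whether the sample represents all dentists.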
Apart from the limitations of statistics mentioned above, there are misuses of it. Many people, knowingly or
unknowingly, use statistical data in the wrong manner. Let us see what the main misuses of statistics are so that
the same could be avoided when one has to use statistical data. The misuse of Statistics may take several
forms some of which are explained below.
(i) Sources of data not given: At times, the source of data is not given. In the absence of the source, the
reader does not know how far the data are reliable. Further, if he wants to refer to the original source,
he is unable to do so.
(ii) Defective data: Another misuse is that sometimes one gives defective data. This may be done know-
ingly in order to defend one’s position or to prove a particular point. This apart, the definition used to
denote a certain phenomenon may be defective. For example, in case of data relating to unemployed
persons, the definition may include even those who are employed, though partially. The question here
is how far it is justified to include partially employed persons amongst unemployed ones.
(iii) Unrepresentative sample: In statistics, one often has to conduct a survey, which necessitates
choosing a sample from the given population or universe. The sample may turn out to be unrepresentative
of the universe. One may choose a sample just on the basis of convenience. He may collect the
desired information from either his friends or nearby respondents in his neighbourhood even though
such respondents do not constitute a representative sample.
(iv) Inadequate sample: Earlier, we have seen that a sample that is unrepresentative of the universe is
a major misuse of statistics. This apart, at times one may conduct a survey based on an extremely
inadequate sample. For example, in a city we may find that there are 100,000 households. When we
have to conduct a household survey, we may take a sample of merely 100 households comprising only
0.1 per cent of the universe. A survey based on such a small sample may not yield right information.
(v) Unfair Comparisons: An important misuse of statistics is making unfair comparisons from the data
collected. For instance, one may construct an index of production choosing the base year where the
production was much less. Then he may compare the subsequent year’s production from this low base.
Such a comparison will undoubtedly give a rosy picture of the production though in reality it is not
so. Another source of unfair comparisons could be when one makes absolute comparisons instead
of relative ones. An absolute comparison of two figures, say, of production or export, may show a
good increase, but in relative terms it may turn out to be negligible. Another example of unfair
comparison is when the population in two cities is different, but a comparison of overall death rates
and deaths by a particular disease is attempted. Such a comparison is wrong. Likewise, when data
are not properly classified or when changes in the composition of population in the two years are not
taken into consideration, comparisons of such data would be unfair as they would lead to misleading
conclusions.
(vi) Unwarranted conclusions: Another misuse of statistics may be on account of unwarranted conclusions.
This may be as a result of making false assumptions. For example, while making projections of
population in the next five years, one may assume a lower rate of growth though the past two years
indicate otherwise. Sometimes one may not be sure about the changes in business environment in the
near future. In such a case, one may use an assumption that may turn out to be wrong. Another source
of unwarranted conclusions may be the use of the wrong average. Suppose a series contains extreme
values, one too high and the other too low, such as 800 and 50. The use of an arithmetic average
in such a case may give a wrong idea. Instead, the harmonic mean would be proper in such a case.
(vii) Confusion of correlation and causation: In statistics, one often has to examine the relationship
between two variables. A close relationship between the two variables does not by itself establish a
cause-and-effect relationship in the sense that one variable is the cause and the other is the effect.
Correlation should be taken as a measure of the degree of association rather than as evidence of a causal
relationship.
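The remark in (vi) about averages and extreme values can be checked numerically. The sketch below uses Python's standard `statistics` module to compare the arithmetic and harmonic means of the two values quoted in the text (800 and 50). The variable names are illustrative only.

```python
from statistics import harmonic_mean, mean

# The two extreme values quoted in misuse (vi).
values = [800, 50]

arithmetic = mean(values)         # (800 + 50) / 2
harmonic = harmonic_mean(values)  # 2 / (1/800 + 1/50)

print(f"arithmetic mean = {arithmetic}")
print(f"harmonic mean   = {harmonic:.2f}")
```

The arithmetic mean (425) lies far from the smaller value, while the harmonic mean (about 94.12) is pulled towards it. Which average is appropriate still depends on what the figures measure.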
Quantitative data are those that can be quantified in definite units of measurement. These refer to char-
acteristics whose successive measurements yield quantifiable observations. Depending on the nature of the
variable observed for measurement, quantitative data can be further categorized as continuous and discrete
data.
(i) Continuous data represent the numerical values of a continuous variable. A continuous variable is
the one that can assume any value between any two points on a line segment, thus representing an
interval of values. The values are quite precise and close to each other, yet distinguishably different.
All characteristics such as weight, length, height, thickness, velocity, temperature, tensile strength,
etc., represent continuous variables. Thus, the data recorded on these and similar other characteristics
are called continuous data. It may be noted that a continuous variable assumes the finest unit of
measurement. Finest in the sense that it enables measurements to the maximum degree of precision.
(ii) Discrete data are the values assumed by a discrete variable. A discrete variable is the one whose
outcomes are measured in fixed numbers. Such data are essentially count data. These are derived from
a process of counting, such as the number of items possessing or not possessing a certain characteristic.
The number of customers visiting a departmental store every day, the incoming flights at an airport,
and the defective items in a consignment received for sale, are all examples of discrete data.
(i) Nominal data are the outcome of classification into two or more categories of items or units com-
prising a sample or a population according to some quality characteristic. Classification of students
according to sex (as males and females), of workers according to skill (as skilled, semi-skilled, and
unskilled), and of employees according to the level of education (as matriculates, undergraduates, and
post-graduates), all result in nominal data. Given any such basis of classification, it is always possible
to assign each item to a particular class and make a summation of items belonging to each class.
The count data so obtained are called nominal data.
(ii) Ordinal/Rank data, on the other hand, are the result of assigning ranks to specify order in terms
of the integers 1, 2, 3, · · · , n. Ranks may be assigned according to the level of performance in
Developing a good understanding of the kinds of data and data measurement is necessary because the kind of
data you are analyzing essentially dictates the type of statistical analysis you perform. Data can be classified
as either numerical (quantitative) or categorical (qualitative).
Data is either provided to you or you collect it yourself. In the latter case, it will be worth your while to
think about how you enter (key in) the data. For example, counts are represented as nonnegative integers
while measurements are real numbers. Furthermore, when it comes to analyzing and presenting data, the
same method will display data differently based on their type.
Categorical data is data that can be sorted according to a category and each value is from a set of non-
overlapping values. Examples of categorical data would include eye color (green, brown, blue, etc.) and
managerial level (supervisor, mid-level, executive).
• Categorical variables are typically measured on a nominal scale. Nominal level variables are those
that can simply be grouped; there’s no underlying numeric order to them and any ordering is arbitrary
or artificial. Our examples above of eye color and managerial level are both measured at a nominal
level. Other examples of categorical data that’s measured on a nominal scale include type of indus-
try, state of residence, marital status, and favorite food. Please note that you might have responses
“dummy-coded” with numbers to represent a response such as 0s representing male and 1s repre-
senting female, but even though there are numbers it’s still a nominal scale because the ordering is
completely arbitrary; we could have had 0s representing female and 1s representing male.
Data involve the values of a variable and there are several types of variable:
Numerical data can be classified into two types: discrete and continuous. The distinction between discrete
and continuous data is that discrete data can only take one of a set of particular values, whereas continuous
data can take any value within a specified range (or the possible values are so close together that they can be
considered to occupy a continuous range).
Discrete data arise from counting, eg numbers of actuaries, chemists, number of atoms, numbers
of claims.
From a strictly mathematical point of view, the distinction we make here between continuous and discrete
is not correct. For our purpose, the distinction is useful.
CATEGORICAL Factors are also called categories or enumerated types. Think of a factor as a set of
category names. Factors are qualitative classifications of objects. Categories do not imply order. A black cat
is different from a brown cat. It is neither larger nor smaller.
Attribute (or dichotomous) data have only two categories, eg yes/no, male/female, claim/no claim.
Nominal data have several unordered categories, eg type of policy, nature of claim.
Ordinal data have several ordered categories, eg questionnaire responses such as “strongly in favour /
· · · / strongly against”.
Sometimes we use the levels to indicate order, but not necessarily magnitude. For example, we can define
the labels of presidential candidates as implying order from the most popular (having the largest number of
voters) to the least popular. Ordinal data do not reveal the size of the differences between levels. For example,
we generally agree that rabbits are faster than turtles. We rarely know by how much.
Examples: Here are some examples of categorical data: a division of a population into males and females,
the number of dots that appear on the face of a die, head or tail in flipping a coin, species and colour of flowers.
Categorical data may be presented in graphs. However, the location of categories along the x or the y
axes does not imply order.
In all, scales of measurement are characterized by three properties: order, differences, and ratios. Each
property can be described by answering the following questions:
1. Order: Does a larger number indicate a greater value than a smaller number?
2. Differences: Does subtracting one number from another represent some meaningful value?
3. Ratio: Does dividing (or taking the ratio of) two numbers represent some meaningful value?
1. Nominal Level (in name only): Nominal scales are measurements where a number is assigned to repre-
sent something or someone. Qualities with no ranking/ordering; no numerical or quantitative value. Data
consists of names, labels and categories. a. Taos, Acoma, Zuni and Cochiti are names of four native Ameri-
can pueblos. b. Car colors for a certain model are: red, silver, blue and black. Examples of nominal variables
include a person’s race, gender, nationality, sexual orientation, hair and eye color, season of birth, marital
status, or other demographic or personal information. A researcher may code men as 1 and women as 2.
These numbers are used to identify gender and nothing more.
2. Ordinal Level: An ordinal scale of measurement is one that conveys order alone. This scale indicates
that some value is greater or less than another value. Examples of ordinal scales include finishing order in a
competition, education level, and rankings. These scales only indicate that one value is greater or less than
another, so differences between ranks do not have meaning.
Can be arranged in some order, but the differences between the data values are meaningless. a. Of 17
fishing reels rated: 6 were rated good quality, 4 were rated better quality, and 7 were rated best quality. b.
Out of a high school class of 319, Walter ranked 4th, June ranked 12th, and Jim ranked 20th.
3. Interval Level: Interval scales are measurements where the values have no true zero and the distance
between adjacent values is equal.
Equidistant scales are those whose values are distributed in equal units. A true zero describes
a scale on which the value 0 truly indicates nothing. Values on an interval scale do not have a true zero.
Data values can be ranked and the differences between data values are meaningful. However, there is no
intrinsic zero, or starting point, and ratios of data values are meaningless. Note: Calendar dates and Celsius
and Fahrenheit temperature readings have no meaningful zero, so ratios are meaningless. a. The years
in which Democrats won presidential elections. b. Body temperature in degrees Celsius (or Fahrenheit) of
trout swimming in the North River. c. Building A was built in 1284, Building B in 1492 and Building C in
5 BCE.
4. Ratio Level: Ratio scales are similar to interval scales in that scores are distributed in equal units.
Yet, unlike interval scales, a distribution of scores on a ratio scale has a true zero. This is an ideal scale in
behavioral research because any mathematical operation can be performed on the values that are measured.
Common examples of ratio scales include counts and measures of length, height, weight, and time. For
scores on a ratio scale, order is informative. For example, a person who is 30 years old is older than another
who is 20. Differences are also informative. For example, the difference between 70 and 60 seconds is the
same as the difference between 30 and 20 seconds (the difference is 10 seconds). Ratios are also informative
on this scale because a true zero is defined (0 truly means nothing). Hence, it is meaningful to state that 60
pounds is twice as heavy as 30 pounds.
Similar to interval, except there is a true zero, or starting point, and the ratios of data values have meaning.
a. Core temperature of stars measured in degrees Kelvin. b. Time elapsed between the deposit of a
check and the clearance of that check. c. Length of trout in the North River.
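The difference between the interval and ratio levels can be seen with temperature. The sketch below takes two illustrative Celsius readings (20 and 10, chosen here as an assumption, not taken from the notes) and compares differences and ratios on the Celsius scale (interval) and the Kelvin scale (ratio).

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading (interval scale) to kelvin (ratio scale)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # illustrative readings

# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the scale with a true zero (Kelvin).
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio on Celsius scale: {ratio_c:.3f}")
print(f"ratio on Kelvin scale:  {ratio_k:.3f}")
```

The Celsius ratio of 2 suggests "twice as hot", but on the Kelvin scale the same two readings differ by only about 3.5 per cent, which is why ratios on an interval scale are not interpreted.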
Question 2.1. Answer the following dating agency questionnaire and state what type of data is required in
each question:
(a) How old are you? (Give your age last birthday.)
(d) Do you smoke? 6. How would you rate your looks? (10 =Drop-dead gorgeous, 1= Seen better days)
(f) concentration of a pollutant in the air in unit of parts per million (discrete)
Question 2.2. With the help of the tutor or otherwise collect the following data.
3. DATA SET 3: Number of siblings (including self) of the current students in your class.
The validity and accuracy of final judgement is most crucial and depends heavily on how well the data
was collected in the first place. The quality of data will greatly affect the conclusions and hence, utmost
importance must be given to this process and every possible precaution should be taken to ensure accuracy
while gathering and collecting data.
Statistical data may be classified under two categories depending upon the sources utilized: secondary
and primary. The two can be defined as under:
(i) Secondary data: These already exist in some form, published or unpublished, in an identifiable
secondary source. They are generally available from published source(s), though not necessarily in the
form actually required.
(ii) Primary data: These are data which do not already exist in any form, and thus have
to be collected for the first time from the primary source(s). By their very nature, these data require fresh
and first-time collection covering the whole population or a sample drawn from it.
The secondary data can be obtained from journals, reports, government publications, publications of professional
and research organizations, etc. For example, if a researcher desires to analyze the weather conditions
of different regions, the required data can be obtained from the records of the meteorological department,
and economic survey data from the KNBS.
The following steps may be considered in the primary data collection process:
The scope of the study must take into consideration the field to be covered and the time period in
which to conduct the study. The time span is very important because in certain areas, the conditions
change very quickly and hence by the time the study is completed, it may become irrelevant.
Surveys: A survey solicits information from people; e.g. Gallup polls; pre-election polls; marketing
surveys. The Response Rate (i.e. the proportion of all people selected who complete the survey) is a
key survey parameter. Surveys may be administered in a variety of ways, e.g.
– Personal Interview,
– Telephone Interview, and
– Self-Administered Questionnaire.
3. Sampling
Recall that statistical inference permits us to draw conclusions about a population based on a sample.
Sampling (i.e. selecting a sub-set of a whole population) is often done for reasons of cost (it’s less
expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. per-
forming a crash test on every automobile produced is impractical).
In any case, the sampled population and the target population should be similar to one another.
A sampling plan is just a method or procedure for specifying how a sample will be taken from a
population. We will focus our attention on one of these methods; Simple Random Sampling.
A simple random sample is a sample selected in such a way that every possible sample of the same
size is equally likely to be chosen. Drawing three names from a hat containing all the names of the
students in the class is an example of a simple random sample: any group of three names is as
likely to be picked as any other group of three names.
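Drawing names from a hat can be imitated in Python with the standard `random` module. In the sketch below the class list is hypothetical, and the seed is fixed only so that the draw can be repeated.

```python
import random
from math import comb

# Hypothetical class list standing in for the names in the hat.
names = ["Akinyi", "Baraka", "Chebet", "Daudi", "Esther", "Faraji"]

rng = random.Random(2021)        # fixed seed so the draw is repeatable
sample = rng.sample(names, k=3)  # three names drawn without replacement

n_possible = comb(len(names), 3)  # number of distinct groups of three
print("sample:", sample)
print("possible samples of size 3:", n_possible)
```

With six names there are 20 possible groups of three, and under simple random sampling each group has probability 1/20 of being the one drawn.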
Two major types of error can arise when a sample of observations is taken from a population: sam-
pling error and non-sampling error.
Sampling error refers to differences between the sample and the population that exist only because
of the observations that happened to be selected for the sample. Non-sampling errors are more seri-
ous and are due to mistakes made in the acquisition of data or due to the sample observations being
selected improperly.
4. Data Organization
Data that describe or measure a single attribute, say height of a tree, are called univariate. They are
composed of a set of observations of objects about which a single value is obtained. Bivariate data
are represented in pairs. Multivariate data are composed of a set of observations or objects. Each
observation contains a number of values that represent this object.
Statistical analysis usually involves more than one data file. Often we use several files to store dif-
ferent data that relate to a single analysis. We then need to somehow relate data from different files.
This requires careful consideration of how the data are to be organized. Once you commit the data to
a particular organization it is difficult to change. The way the data are organized will then dictate how
easy they are to prepare for different types of statistical analyses.
Data are organized into tables and tables are related to each other. The tables, their relationship and
other auxiliary information form a database.
5. Data manipulation
The core of working with data is the ability to subset, merge, split, and perform other such data oper-
ations. Applying various operations to subsets of the data wholesale is as important.
In this lecture we will learn methods for presenting and describing sets of data in tables and graphics. Often,
tables require a good deal of data manipulation. Graphics are an important tool not only in presenting data
but also in cleaning them and later analyzing them.
Ungrouped Data
Suppose we have a collection of measurements given by numbers. Some may occur only once, while others
may be repeated several times. If we write down the numbers as they appear, processing them is likely
to be cumbersome. Such data are known as “ungrouped (or raw) data”. When the data set contains only a relatively
small number of distinct values, it is convenient to represent it in an ungrouped frequency
distribution table, which presents each distinct value along with its frequency of occurrence.
The data from a discrete distribution can be summarised using a frequency distribution, that is, by
counting the number of 0’s, 1’s, 2’s, etc. For example, the number of children in a sample of 80 families
might be summarised as follows:
1. Identify the smallest and the largest value in the data set.
3. Count the number of tallies of each quantity and record them as the frequency for the value.
Example 3.1. Construction of ungrouped frequency distribution table: The following data represent the
number of days of sick leave taken by each of 50 workers of a given company over the last 6 weeks:
17 13 8 9 16 12 8 11 9 13
11 11 11 16 19 12 10 13 10 15
16 12 9 11 11 13 8 11 15 10
15 16 10 16 18 12 14 12 11 8
12 12 12 12 9 8 8 10 15 13
Solution. Since the data set contains only a relatively small number of distinct values, it is convenient
to represent it in a frequency table, which presents each distinct value along with its frequency
of occurrence.
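The tally for Example 3.1 can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library on the 50 values listed in the example. The variable names are illustrative.

```python
from collections import Counter

# Days of sick leave for the 50 workers in Example 3.1.
sick_days = [
    17, 13, 8, 9, 16, 12, 8, 11, 9, 13,
    11, 11, 11, 16, 19, 12, 10, 13, 10, 15,
    16, 12, 9, 11, 11, 13, 8, 11, 15, 10,
    15, 16, 10, 16, 18, 12, 14, 12, 11, 8,
    12, 12, 12, 12, 9, 8, 8, 10, 15, 13,
]

freq = Counter(sick_days)  # value -> number of occurrences

print("Value  Frequency")
for value in range(min(sick_days), max(sick_days) + 1):
    print(f"{value:>5}  {freq[value]:>9}")
print("Total ", sum(freq.values()))
```

Listing every value from the smallest to the largest, rather than only the values that occur, keeps any gaps in the distribution visible as zero frequencies.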
Frequency 1 3 4 5 4 2 1
Example 3.3.
Two types of frequency distributions that are most often used are the categorical frequency distribution and
the grouped frequency distribution. The procedures for constructing these distributions are shown now.
Each item is counted every time it appears in order to obtain the “class frequency” and each class inter-
val has the same “class width”. Too few classes means that the data is over-summarised, while too many
classes means that there is little advantage in summarising at all. When data are arranged this way we call them
grouped data. The resulting table is called a grouped frequency distribution table.
Here, we use the convention that the lower boundary of the class is included while the upper boundary
is excluded. Each item in a particular class is considered to be approximately equal to the “class midpoint”;
that is, the average of the two “class boundaries”. A “grouped frequency distribution table” normally has
columns which show the class intervals, class mid-points, class frequencies, and “cumulative frequencies”,
the last of these being a running total of the frequencies themselves. There may also be a column of “tallied
frequencies”, if the table is being constructed from the raw data without having first arranged the values in
rank order.
1. Select the number of classes k, using professional judgement so that k falls between 5 and 20.
One good guideline is to pick the smallest k such that $2^k \ge n$, so that if the sample size is n = 20, k = 5
because $2^5 = 32 > n$, and if n = 80, k = 7 because $2^7 = 128 > n$. To be more specific, we can solve
for k to get $k \ge \dfrac{\log n}{\log 2}$.
2. Find the largest and smallest values and compute the working range denoted by R.
LCL of the starting class is normally the Minimum value in the data or any other value slightly less
than the minimum value.
3. Identify the smallest unit of measurement (u) used in the data collection. The value of u can be
inferred from the given data or the given starting value (usually tens (10), ones (1), tenths (0.1),
hundredths (0.01), etc.). For example
10 20 40 60 u = 10
15 12 11 52 u = 1
2.8 1.6 1.7 5.6 u = 0.1
Note: The class interval i is $R/k$ rounded up to the nearest u. You must Round Up, not Round Off. For u = 1,
Round Up (5.2) = 6 not 5 and for u = 0.1 Round Up (5.21) = 5.3 not 5.2. If $R/k$ is exact (no remainder
when divided by u) add one to the number of classes.
4. The starting value used in the calculation of R above is picked as the lower class limit (LCL) of the
first class. Add the class interval i to this LCL successively to get the rest of the lower class limits.
5. Find the Upper Class Limit (UCL) of the first class by subtracting u from the LCL of the second class.
Then continue to add the class interval i to this UCL to find the rest of the upper limits.
6. If necessary, find the class boundaries (CB) for each class as follows.
7. Tally the number of observations falling in each class and find the frequencies.
Note: A value x falls into a class LCL − UCL only if LCB ≤ x < UCB. That is, x can be
equal to the LCB but not the UCB of that class.
9. Compute the cumulative frequencies to confirm that the last value of the column is equal to the sum
of the frequencies.
10. Compute the midpoints of each class using the class boundaries.
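The first steps of the procedure (choosing k and rounding the class interval up to the nearest u) can be sketched in Python. The function names are illustrative, and `Decimal` is used here as an assumption to avoid floating-point surprises when rounding to units such as 0.1. The special case in which R/k is exact is not handled.

```python
from decimal import ROUND_CEILING, Decimal


def num_classes(n):
    """Smallest k with 2**k >= n (the guideline in step 1)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def round_up(value, unit):
    """Round value UP to the next multiple of unit (never down)."""
    steps = (value / unit).to_integral_value(rounding=ROUND_CEILING)
    return steps * unit


def class_interval(data_range, k, unit):
    """Class interval i: R / k rounded up to the nearest unit u."""
    return round_up(data_range / k, unit)


print(num_classes(20), num_classes(80))           # 5 and 7, as in step 1
print(round_up(Decimal("5.2"), Decimal("1")))     # rounds up to 6
print(round_up(Decimal("5.21"), Decimal("0.1")))  # rounds up to 5.3
print(class_interval(Decimal("23.8"), 6, Decimal("0.1")))
```

Passing the range and the unit as `Decimal` values built from strings keeps quantities such as 0.1 exact, so a value that already lies on a multiple of u is not pushed up by accident.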
Example 3.4. The Dean of the Faculty of Science wishes to determine the amount of studying BCM students
do. He selects a random sample of 40 students and records the number of hours each student studies per
week as follows
15.0 23.7 19.7 15.4 18.3 23.0 17.5 20.8 13.5 20.7
17.4 18.6 12.9 20.3 23.7 21.4 18.3 29.8 17.1 18.9
10.3 26.1 15.7 24.0 17.8 32.8 23.2 24.5 27.1 16.6
9.2 16.5 30.8 29.6 24.6 12.5 21.6 28.4 27.9 22.4
Organize the data into a grouped frequency distribution.
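n = 40, $2^k \ge 40 \Rightarrow k = 6$.
Identify the smallest and the largest values of the data. In this case the smallest value is 9.2 and the largest
is 32.8. The range is given by
$$R = 32.8 - 9.2 = 23.6$$
so the class interval is i = Round up [23.6/6] = Round up [3.933] = 4.0 to the nearest u, and the lower class
limits are 9.2, 13.2, 17.2 and so on.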
For the upper class limits, the smallest unit of measurement is 0.1 so u = 0.1, UCL of the first class
= 13.2 − 0.1 = 13.1. Adding 4 to 13.1 gives 17.1, adding 4 to 17.1 gives 21.1 and so on.
Note: The last column in the table contains cumulative frequencies (cf ) which gives the total number of
observations equal to or less than the UCB of a particular class. The cumulative frequency is obtained by
successively adding the frequencies of values of the variable from the lowest to highest value or class.
Example 3.5. Suppose in the previous example we were to use 9.0 as the starting value of the first class,
then the interval would become
Range R = 32.8 − 9.0 = 23.8
$$i = \text{Round up}\left[\frac{R}{k}\right] = \text{Round up}\left[\frac{23.8}{6}\right] = \text{Round up}\,[3.967] = 4.0 \text{ to the nearest } u$$
The frequency table would be
Class      9.0 - 12.9  13.0 - 16.9  17.0 - 20.9     21.0 - 24.9  25.0 - 28.9  29.0 - 32.9
Tally      ||||        ||||| |      ||||| ||||| ||  ||||| |||||  ||||         ||||
Frequency  4           6            12              10           4            4
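The table of Example 3.5 can be checked with a short Python sketch that tallies the 40 study-hour values of Example 3.4 into classes starting at 9.0 with interval 4.0. Working in tenths of an hour as integers is a choice made here to avoid floating-point rounding at class limits. The variable names are illustrative.

```python
# Hours studied per week by the 40 students of Example 3.4.
hours = [
    15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 17.5, 20.8, 13.5, 20.7,
    17.4, 18.6, 12.9, 20.3, 23.7, 21.4, 18.3, 29.8, 17.1, 18.9,
    10.3, 26.1, 15.7, 24.0, 17.8, 32.8, 23.2, 24.5, 27.1, 16.6,
    9.2, 16.5, 30.8, 29.6, 24.6, 12.5, 21.6, 28.4, 27.9, 22.4,
]

start, width, k = 9.0, 4.0, 6  # starting value, class interval i, number of classes

# Convert to tenths of an hour (u = 0.1) so class membership uses exact integers.
start_t, width_t = round(start * 10), round(width * 10)

frequencies = [0] * k
for h in hours:
    frequencies[(round(h * 10) - start_t) // width_t] += 1

running_total = 0
for j, f in enumerate(frequencies):
    lcl = (start_t + j * width_t) / 10
    ucl = (start_t + (j + 1) * width_t - 1) / 10  # UCL = next LCL minus u
    running_total += f
    print(f"{lcl:4.1f} - {ucl:4.1f}  f = {f:2d}  cf = {running_total:2d}")
```

The last cumulative frequency equals the sample size (40), which is the check described in step 9 of the procedure.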
Example 3.6. Consider the data below
2370 1970 1540 1830 1500 2300 1750
1740 1860 1290 2030 2370 2140 1830
1030 2610 1570 2400 1780 3280 2320
920 1650 3080 2960 2460 1250 2160
Organize the data into a grouped frequency distribution.
Solution. n = 28, we pick the smallest k such that $2^k > 28 \Rightarrow k = 5$. If 920 is the starting (minimum)
value,
Range (R) = 3280 − 920 = 2360
Therefore the class interval (i) is given by
$$\frac{R}{k} = \frac{2360}{5} = 472$$
i = Round UP (472) = 480 to the nearest u = 10
Frequency 4 9 5 7 2 1
Example 3.7. Consider the figures recorded in the table below which gives the weight of Oranges measured
to the nearest gram.
There are several different graphical displays for describing frequency distribution data. Some of the commonly
used displays are bar charts, histograms, frequency polygons, cumulative frequency curves (ogives),
pie charts, and pictographs.
5 275
6 91534718
7 49247
8 482
9 3
where the stems are the tens digits of the scores and the leaves are the ones digits.
The disadvantage of the stem-and-leaf plots is that data must be grouped according to place value. What if
one wants to use different groupings? In this case histograms, to be discussed below, are more suited.
If you are comparing two sets of data, you can use a back-to-back stem-and-leaf plot where the leaves
of the two sets are listed on either side of the stem as shown in the table below.
96 0 57
87641 1
887655322221 2 2567889
9964432 3 123444556789
9651 4 23567899
where the stems represent the tens digits of science test scores and the leaves represent the ones digits.
Example 4.2. Suppose the members of your class scored the following percentages in a statistics test:
32 56 45 78 77 59 65 54 54 39
45 44 52 47 50 52 51 40 69 72
36 57 55 47 33 39 66 61 48 45
53 57 56 55 71 63 62 65 58 55
Construct a stem and leaf diagram.
3 23699
4 04555778
5 0122344555667789
6 1235569
7 1278
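A stem and leaf diagram for two-digit scores can also be built programmatically. The sketch below is one possible approach in Python, applied to the 40 scores of Example 4.2. The function name `stem_and_leaf` is illustrative, and stems without any leaves are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves} with tens digits as stems and sorted ones digits as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}


scores = [
    32, 56, 45, 78, 77, 59, 65, 54, 54, 39,
    45, 44, 52, 47, 50, 52, 51, 40, 69, 72,
    36, 57, 55, 47, 33, 39, 66, 61, 48, 45,
    53, 57, 56, 55, 71, 63, 62, 65, 58, 55,
]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, leaves)
```

A quick check on any stem and leaf diagram is that the total number of leaves equals the number of observations, here 40.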
Example 4.3. The number of stories in two selected samples of tall buildings in Nairobi and Mombasa is
shown below. Construct a back-to-back stem and leaf plot, and compare the distributions.
Nairobi: 55, 70, 44, 36, 36, 40, 63, 40, 44, 34, 38,60, 47, 52, 32, 32, 50, 53, 32, 28, 31, 52, 32, 34, 32, 30,
26, 29.
Mombasa: 61, 40, 38, 32, 30, 58, 40, 40, 25, 30, 50, 38, 36, 54, 40, 36, 30, 30, 53, 39, 36, 34, 33, 39, 32.
Solution
Notice that the stem and leaf display is a visual representation of the data. It is easy to see that there are more
marks in the fifties than in the seventies.
The Histogram
A “histogram” is a diagram which is directly related to a grouped frequency distribution table and consists
of a collection of rectangles whose height represents the class frequency (to some suitable scale) and whose
breadth represents the class width.
A histogram is a bar graph on which the bars are adjacent to each other with no space between them.
In a histogram, each of the classes in the frequency distribution is represented by a vertical bar whose height
is the class frequency of the interval. The horizontal endpoints of each vertical bar correspond to the class
endpoints.
To construct a histogram, arrange the data in equal intervals. Represent the frequencies along the verti-
cal axis and the scores along the horizontal axis. The true limits of any interval extend one half unit beyond
the endpoints established for the interval and are represented in this manner on the horizontal axis. For
example, the true limits of the interval 76-80 are 75.5 and 80.5. To get the proper perspective, the vertical
axis should be approximately three-fourths as long as the horizontal axis.
Histograms are close relatives of bar plots. The main difference is that in histograms we are interested
in the distribution of data. In other words, we wish to know if there is regularity in the number of obser-
vations that fall within a category. This means that how the data are binned takes on an additional importance.
For grouped data the height of each rectangle is the frequency density (h) of a class, given by
$$h = \frac{f}{i}$$
where f is the class frequency and i is the class interval. The values marked on the x-axis of a histogram
are either the class midpoints (placed at the middle of each bar) or the class boundaries (placed at the
edges of the bars).
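As a small illustration of the height h = f / i, the sketch below computes the bar heights for the grouped table of Example 3.5 (class interval 4.0) and prints a rough text histogram. The text rendering is only a stand-in for a properly drawn histogram, and the variable names are illustrative.

```python
# Lower class limits and frequencies from the table in Example 3.5.
lower_limits = [9.0, 13.0, 17.0, 21.0, 25.0, 29.0]
frequencies = [4, 6, 12, 10, 4, 4]
i = 4.0  # class interval

# Height of each rectangle: h = f / i.
heights = [f / i for f in frequencies]

for lcl, f, h in zip(lower_limits, frequencies, heights):
    print(f"{lcl:4.1f} to under {lcl + i:4.1f} | {'#' * f:<12} f = {f:2d}, h = {h:.2f}")
```

With equal class intervals the heights are proportional to the frequencies, so the shape is the same either way. Dividing by i matters when the classes have unequal widths.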
[Figure: Histogram]
Practice Problems
Question 4.1. Illustrate the following set of measurements on a histogram, superimpose a frequency polygon,
plot an ogive and a stem and leaf diagram:
72 82 56 73 87 89 72 86 88 76
86 69 84 85 62 97 70 78 84 93
70 60 91 76 83 94 65 72 92 81
98 78 88 76 96 89 90 83 74 80
Question 4.2. The height of 100 maize plants was measured, to the nearest cm, one month after planting.
Height: 1-20 21-40 41-60 61 - 80
Number of Plants: 12 28 54 6
Construct the corresponding histogram, superimpose a frequency polygon, plot an ogive and a stem and leaf
diagram.
Question 4.3. Construct a histogram and superimpose a frequency polygon for the following scores earned
by a group of high school students on a Statistical Aptitude Examination.
Score Number of students
400-449 20
450-499 35
500-549 50
550-599 50
600-649 40
650-699 20
700-749 10
Question 4.4. The weights of 40 football players are as follows:
210 181 192 164 170 186 205 194
178 161 175 195 172 188 196 182
206 188 165 202 178 163 190 198
187 198 174 172 183 208 185 162
203 172 196 184 185 176 197 184
b. Make a histogram for the given data and superimpose a frequency polygon.
Question 4.5. The following table shows some test scores from a statistics class.
65 91 85 76 85 87 79 93
82 75 100 70 88 78 83 59
87 69 89 54 74 89 83 80
94 67 77 92 82 70 94 84
96 98 46 70 90 96 88 72
Question 4.6. Suppose a sample of 38 female university students were asked their weights in pounds. This
was actually done, with the following results:
130 108 135 120 97 110
130 112 123 117 170 124
120 133 87 130 160 128
110 135 115 127 102 130
89 135 87 135 115 110
105 130 115 100 125 120
120 120
Question 4.7. The table below shows the response times of calls for police service measured in minutes.
34 10 4 3 9 18 4
3 14 8 15 19 24 9
36 5 7 13 17 22 27
3 6 11 16 21 26 31
32 38 40 30 47 53 14
6 12 18 23 28 33
3 4 62 24 35 54
15 6 13 19 3 4
4 20 5 4 5 5
10 25 7 7 42 44
Construct a frequency distribution and the corresponding histogram.
Question 4.8. A nutritionist is interested in knowing the percent of calories from fat which students consume
on a daily basis. To study this, the nutritionist randomly selects 25 students and evaluates the percent of
calories from fat consumed in a typical day. The results of the study are as follows
24% 18% 33% 25% 30%
42% 40% 33% 39% 40%
45% 35% 45% 25% 27%
23% 32% 33% 47% 23%
27% 32% 30% 28% 36%
Construct a frequency distribution and the corresponding histogram.
Frequency Polygon
It is another method of representing a frequency distribution on a graph. Frequency polygons are more suitable
than histograms whenever two or more frequency distributions are to be compared.
Frequency polygon of a grouped or continuous frequency distribution is a straight line graph. The frequen-
cies of the classes are plotted against the mid-values of the corresponding classes. The points so obtained are
joined by straight lines (segments) to obtain the frequency polygon. It can also be obtained by connecting
midpoints of the top of the rectangles in a Histogram. The gaps at both ends are extended to the next lower
and the next upper class mark (imaginary classes with frequency zero). For grouped data, a straight line
graph is drawn with class frequency plotted against class mark (midpoint).
Example 4.7. The following data show the number of accidents sustained by 313 drivers of a public utility
company over a period of 5 years. Draw the frequency polygon.
No. of accidents: 0 1 2 3 4 5 6 7 8 9 10 11
No. of drivers: 80 44 68 41 25 20 13 7 5 4 3 2
Example 4.8. Construct an Ogive to represent the data shown below:
Solution
Upper Boundaries: 99.5 104.5 109.5 114.5 119.5 124.5 129.5 134.5
CF: 0 2 10 28 41 48 49 50
Example 4.9. Draw a Histogram and Frequency polygon from the following distribution giving marks of
50 students in statistics.
Marks: 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89
No.: 0 2 3 7 13 13 9 2 1
Note that, here we will first draw histogram and then the mid-points of the top of bars are joined by line
segments to get the frequency polygon.
Remark: Note that frequency polygon can be drawn even without converting the given distribution into
classes. The frequencies are plotted against the corresponding mid-points (given) and joined by line seg-
ment.
Frequency Curve
A frequency curve is similar to a frequency polygon; the only difference is that the points are joined by a
freehand curve instead of line segments.
Let us study the following examples to understand the concept of frequency curve.
Solution. Here we will take ages on horizontal axis and number of students on vertical axis. We will plot
the given frequencies against mid-points of the given class interval and then join these points by free hand
curve. Extremities (first and the last point plotted) are joined to the mid-points of the neighboring class
intervals.
Cumulative Frequency Polygon
A line graph of cumulative frequency plotted against the upper class boundaries (UCBs) is called an Ogive
or cumulative frequency curve. The Ogive curve is very useful in estimating the median and the measures
of location as we will see later.
Example 4.11. The data below give the marks secured by 70 students at a certain examination:
(b) Use the ogive curve to estimate the percentage of students getting less than 45.
Solution. We will plot the cumulative frequencies on the vertical axis against the upper class boundaries of
the corresponding classes on the horizontal axis. We will then join the points by a smooth freehand curve to
get an ogive.
To estimate the number of students getting marks less than 45, draw a perpendicular to the X-axis
(representing marks) at X = 45, meeting the ogive at point P. From P draw a perpendicular PM to the Y-axis
(representing number of students). Then, from the graph, OM = 26.8 ≈ 27 is the number of candidates
scoring less than 45. Hence, the percentage of students getting less than 45 marks is given by
\[ \frac{27}{70} \times 100 = 38.57\% \]
(Here total students = 70).
Bar Graphs
Bar Graphs, similar to histograms, are often useful in conveying information about categorical data where
the horizontal scale represents some non-numerical attribute. In a bar graph, the bars are non overlapping
rectangles of equal width and they are equally spaced. The bars can be vertical or horizontal. The length of
a bar represents the quantity we wish to compare.
Example 4.12. The areas of the various continents of the world (in millions of square miles) are as follows:
11.7 for Africa; 10.4 for Asia; 1.9 for Europe; 9.4 for North America; 3.3 for Oceania; 6.9 for South America;
7.9 for the Soviet Union. Draw a bar chart representing the above data in which the bars are horizontal.
A double bar graph is similar to a regular bar graph, but gives 2 pieces of information for each item on the
vertical axis, rather than just 1.
Practice Problems
Question 4.9. Given are several gasoline vehicles and their fuel consumption averages.
Buick 27 mpg
BMW 28 mpg
Honda Civic 35 mpg
Geo 46 mpg
Neon 38 mpg
Land Rover 16 mpg
Note that
244.8 + 82.8 + 21.6 + 10.8 = 360.
The pie chart is given in Figure
Example 4.15. A sample of 250 students were asked to indicate their favourite TV station and their responses
were as follows; KBC - 52, CITIZEN - 28, KTN - 63, STV - 15 and NTV - 92 viewers. Draw a pie chart
representing this information.
Practice Problems
Question 4.11. The table below shows the ingredients used to make a sausage and mushroom pizza.
Ingredient %
Sausage 7.5
Cheese 25
Crust 50
Tomato Sauce 12.5
Mushroom 5
Plot a pie chart for the data.
Question 4.12. A newly qualified teacher was given the following information about the regional origins of
the pupils in a class.
Region No. of pupils
Central 12
Rift Valley 7
Coast 2
Western 3
Nyanza 6
TOTAL 30
Plot a pie chart representing the data.
Question 4.13. The following table represents a survey of people’s favorite ice cream flavor.
Flavor Percentage of people
Vanilla 21.0%
Chocolate 33.0%
Strawberry 12.0%
Raspberry 4.0%
Peach 7.0%
Neapolitan 17.0%
Others 6.0%
Plot a pie chart to represent the data.
Question 4.14. In Kenya, approximately 45% of the population has blood type O; 40% type A; 11% type
B; and 4% type AB. Illustrate this distribution of blood types with a pie chart.
Pictograph
Another type of chart which has been used widely is the pictograph. In a pictograph, a symbol or icon is
used to represent a quantity of items. A pictograph needs a title to describe what is being presented and
how the data are classified as well as the time period and the source of the data. It is also called pictogram.
Example of a pictograph is given in Figure...
Practice Problems
Question 4.15. Make a pictograph to represent the data in the following table. Use ♣ to represent 10 glasses
of lemonade.
Day Frequency
Monday 15
Tuesday 20
Wednesday 30
Thursday 5
Friday 10
Scatter plots
A relationship between two sets of data is sometimes determined by using a scatterplot. Let’s consider the
question of whether studying longer for a test will lead to better scores. A collection of data is given below.
Study Hours 3 5 2 6 7 1 2 7 1 7
Score 80 90 75 80 90 50 65 85 40 100
Based on these data, a scatterplot has been prepared and is given in Figure... (Remember, when making a
scatterplot, do NOT connect the dots.)
The data displayed on the graph resemble a line rising from left to right. Since the slope of the line is
positive, there is a positive correlation between the two sets of data. This means that, according to this set of
data, the longer I study, the better my exam score will be.
If the slope of the line had been negative (falling from left to right), a negative correlation would exist.
Under a negative correlation, the longer I study, the worse my exam score would be.
If the points on the graph are scattered in such a way that they do not approximate a line (they do not appear
to rise or fall), there is no correlation between the sets of data. No correlation means that the data do not
show whether studying longer has any effect on my exam score. (This will be done later in correlation and
regression analysis.)
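As a rough numerical check of the direction of the relationship, the sign of Pearson's correlation coefficient can be computed for the study-hours data above. This is only a sketch: the coefficient itself is treated later in the course, and the function name `pearson_r` is an illustrative choice, not something defined in these notes.

```python
from math import sqrt

hours = [3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores = [80, 90, 75, 80, 90, 50, 65, 85, 40, 100]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(round(r, 2))  # a positive value: scores tend to rise with study hours
```

A value of r close to +1 corresponds to points lying near a rising line, a value close to −1 to a falling line, and a value near 0 to no linear pattern.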
In almost every set of statistical data, most of the values tend to concentrate at the centre; this is referred
to as “central tendency”. The typical values which measure the central tendency are called measures of
central tendency or measures of location. Measures of central tendency are commonly known as
“averages”. They are also known as first order measures. Averages always lie between the lowest and the
highest observation.
The purpose of computing an average value for a set of observations is to obtain a single value which
is representative of all the items and which the mind can grasp simply and quickly. The single value is the
point or location around which the individual items cluster.
There are a number of different quantities which can be used to estimate the central point of a sample.
The measures of central tendency discussed in this lecture include: the arithmetic mean (or simply the mean),
the weighted arithmetic mean, the geometric mean, the harmonic mean, the median (together with quartiles,
deciles and percentiles), and the mode. Of these, the arithmetic mean, geometric mean and harmonic mean
are called mathematical averages; the median and mode are called positional averages.
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \]
where
X̄ = sample mean
Xi = the ith data value in the sample
Σ = the sum of
n = number of data values in the sample
Example 5.1. Find the arithmetic mean for the following data representing marks in six subjects at the
university examination of a student. The marks are 74, 89, 93, 68, 85 and 76.
Solution. Here n = 6, and \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, so
\[ \bar{X} = \frac{74 + 89 + 93 + 68 + 85 + 76}{6} = \frac{485}{6} = 80.83 \]
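The calculation in Example 5.1 can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `arithmetic_mean` is illustrative.

```python
marks = [74, 89, 93, 68, 85, 76]

def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

mean_marks = arithmetic_mean(marks)
print(round(mean_marks, 2))  # 80.83
```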
Example 5.2. Compute the sample mean for the values,
9, 3, 4, 2, 1, 5, 8, 4, 7, and 3.
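Solution. n = 10, X1 = 9, X2 = 3, X3 = 4, · · · , X10 = 3. As an illustration of the summation notation,
\[ \sum_{i=1}^{5} X_i = 9 + 3 + 4 + 2 + 1 = 19 \]
\[ \sum_{i=3}^{5} X_i = 4 + 2 + 1 = 7 \]
and the sample mean is
\[ \bar{X} = \frac{1}{10}\sum_{i=1}^{10} X_i = \frac{46}{10} = 4.6 \]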
Example 5.3. A sample of five executives received the following amounts of bonus last year: 14,000, 15,000,
17,000, 16,000, and y. Find the value of y if the average bonus for these five executives is 15,400.
Solution. Since these values represent a sample of size 5, the sample mean is
\[ \bar{Y} = \frac{1}{n}\sum_{i=1}^{5} Y_i = \frac{14{,}000 + 15{,}000 + 17{,}000 + 16{,}000 + y}{5} = 15{,}400 \]
\[ \Rightarrow 62{,}000 + y = 15{,}400 \times 5 = 77{,}000 \]
\[ y = 77{,}000 - 62{,}000 = 15{,}000 \]
Example 5.4. Find the mean of the numbers 5, 2, 3 , 7, and 3.
Solution. The mean is given as
\[ \bar{X} = \frac{5 + 2 + 3 + 7 + 3}{5} = \frac{20}{5} = 4 \]
Question 5.1. Find arithmetic mean for the following data; 425 , 408 , 441 , 435 , 418.
For grouped data the mid-point of each group would normally be used in the frequency distribution to
determine the mean.
Steps:
(ii) Multiply these mid-points by the respective frequency of each class interval and obtain the total
Σfi xi.
(iii) Divide the total obtained in step (ii) by the total frequency Σfi.
Example 5.6. Find the arithmetic mean for the following data representing marks of 60 students.
Marks: 10-19 20-29 30-39 40-49 50-59 60-69 70-79
No. of Students: 8 15 13 10 7 4 3
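The steps for the mean of grouped data can be sketched in Python for the marks distribution of Example 5.6. The class mid-points are taken as the average of the stated class limits, which is an assumption about how the mid-points are meant to be formed here.

```python
# Class limits and frequencies from Example 5.6
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [8, 15, 13, 10, 7, 4, 3]

# Mid-point of each class (assumed: average of the two class limits)
midpoints = [(lo + hi) / 2 for lo, hi in classes]

total_f = sum(freqs)                                       # sum of f_i
total_fx = sum(f * x for f, x in zip(freqs, midpoints))    # sum of f_i * x_i
grouped_mean = total_fx / total_f

print(total_f, total_fx, round(grouped_mean, 2))  # 60 2240.0 37.33
```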
The first step is to choose one among the xi’s as the assumed mean, and denote it by ‘a’. Also, to further
reduce our calculation work, we may take ‘a’ to be that xi which lies in the centre of x1, x2, · · · , xn.
The next step is to find the difference di between each of the xi’s and ‘a’, that is, the deviation of each
xi from a, i.e.,
\[ d_i = x_i - a \]
The third step is to find the product of di with the corresponding fi, and take the sum of all the fi di’s. So,
the mean of the deviations is
\[ \bar{d} = \frac{\sum f_i d_i}{\sum f_i} \]
Now, let us find the relation between d̄ and x̄. Since in obtaining di we subtracted ‘a’ from each xi,
in order to get the mean x̄ we need to add a to d̄. This can be expressed mathematically as:
\[ \bar{d} = \frac{\sum f_i d_i}{\sum f_i} = \frac{\sum f_i (x_i - a)}{\sum f_i} = \frac{\sum f_i x_i}{\sum f_i} - a\,\frac{\sum f_i}{\sum f_i} = \bar{x} - a \]
So,
\[ \bar{x} = a + \bar{d} \]
If the classes are of equal width the work of calculating the mean is made easy by change of origin and scale.
The assumed mean method gives the mean as:
\[ \bar{x} = a + \frac{\sum f_i d_i}{\sum f_i} \]
Example 5.8. Solve the above example on wages using the assumed mean method.
Activity 5.1. From the Table ,... find the mean by taking each of xi (i.e., ... and so on) as a. What do you
observe? You will find that the mean determined in each case is the same, i.e., 27.85. (Why?)
So, we can conclude that the value of the mean obtained does not depend on the choice of ‘a’. If, after
subtracting the assumed mean a from each of the observations, the values are still large, one could divide the
deviations by a constant value c so that
\[ u_i = \frac{x_i - a}{c} \]
where a is the assumed mean and c is the class size.
Observe that in Table ..., the values in Column 4 are all multiples of 15. So, if we divide the values in the
entire Column 4 by 15, we would get smaller numbers to multiply with fi. (Here, 15 is the class size of each
class interval.)
Now, we calculate ui as above and continue as before (i.e., find fi ui and then Σfi ui). Taking c = 5, let
us construct Table...
X f u = (X − 35)/5 fu
15 2 −4 −8
20 22 −3 −66
25 19 −2 −38
30 14 −1 −14
35 3 0 0
40 4 1 4
45 6 2 12
50 1 3 3
55 1 4 4
Σf = 72 Σf u = −103
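Let
\[ \bar{u} = \frac{\sum f_i u_i}{\sum f_i} \]
Here, again, let us find the relationship between ū and x̄. We have
\[ u_i = \frac{x_i - a}{c} \]
Therefore,
\[ \bar{u} = \frac{\sum f_i \frac{x_i - a}{c}}{\sum f_i} = \frac{1}{c}\left[\frac{\sum f_i x_i - a\sum f_i}{\sum f_i}\right] = \frac{1}{c}\left[\frac{\sum f_i x_i}{\sum f_i} - a\,\frac{\sum f_i}{\sum f_i}\right] = \frac{1}{c}\,[\bar{x} - a] \]
\[ c\bar{u} = \bar{x} - a \]
So,
\[ \bar{x} = a + c\bar{u} = a + c\,\frac{\sum f_i u_i}{\sum f_i} \]
For the table above, with a = 35 and c = 5,
\[ \bar{x} = a + c\,\frac{\sum f_i u_i}{\sum f_i} = 35 + 5\left(\frac{-103}{72}\right) = 27.85 \]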
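The step-deviation calculation can be checked against the direct method with a short Python sketch, using the X and f columns of the table above with a = 35 and c = 5. The variable names are illustrative.

```python
xs = [15, 20, 25, 30, 35, 40, 45, 50, 55]
fs = [2, 22, 19, 14, 3, 4, 6, 1, 1]
a, c = 35, 5  # assumed mean and class size, as in the table

# u_i = (x_i - a) / c; integer division is exact because every x - a is a multiple of c
us = [(x - a) // c for x in xs]

sum_f = sum(fs)
sum_fu = sum(f * u for f, u in zip(fs, us))

mean_step = a + c * sum_fu / sum_f                          # step-deviation method
mean_direct = sum(f * x for f, x in zip(fs, xs)) / sum_f    # direct method

print(sum_f, sum_fu, round(mean_step, 2), round(mean_direct, 2))  # 72 -103 27.85 27.85
```

Both methods give the same value, which is the point of the derivation: the change of origin and scale only simplifies the arithmetic.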
We note that:
• the step-deviation method will be convenient to apply if all the di ’s have a common factor.
• The assumed mean method and step-deviation method are just simplified forms of the direct method.
• The formula x̄ = a + cū still holds if a and c are not as given above, but are any non-zero numbers
such that ui = (xi − a)/c.
Let us apply these methods in another example.
Example 5.9.
Remark: The result obtained by all the three methods is the same. So the choice of method to be used
depends on the numerical values of xi and fi . If xi and fi are sufficiently small, then the direct method is
an appropriate choice. If xi and fi are numerically large numbers, then we can go for the assumed mean
method or step-deviation method. If the class sizes are unequal, and xi are large numerically, we can still
apply the step-deviation method by taking c to be a suitable divisor of all the di ’s.
If x1, x2, · · · , xn are the n values of the variable X with the corresponding weights w1, w2, · · · , wn, then
the weighted mean is given by
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
The weights are normally assigned as measures of the importance of a subject to the issue under considera-
tion.
Example 5.10. Calculate the weighted mean for the following data.
X 28 25 20 32 40
w 3 6 4 5 8
Solution. The data are represented in the table below:
xi wi wi xi
28 3 84
25 6 150
20 4 80
32 5 160
40 8 320
Totals: 26 794
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{794}{26} = 30.54 \text{ units} \]
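The weighted mean of Example 5.10 can be reproduced as follows. This is a minimal sketch; the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

x = [28, 25, 20, 32, 40]
w = [3, 6, 4, 5, 8]

xw = weighted_mean(x, w)
print(round(xw, 2))  # 30.54
```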
Example 5.11. Consider the table below with marks obtained by two students, James (marks X) and Jane
(marks Y). The subjects are to be used in determining who joins an Engineering course whose requirement
is a mean of 58% in the four subjects.
For both students, the simple mean is 225/4 = 56.25, implying that they are both unqualified. But how much
of history is required in the course?
Subject % marks (X) % marks (Y ) Weight (w) wX wY
Mathematics 25 70 3.6 90.0 252.0
English 87 45 2.3 200.1 103.5
History 83 35 1.5 124.5 52.5
Physics 30 75 2.6 78.0 195.0
Totals 225 225 10 492.6 603.0
Solution. If the subjects are given weights depending on their usefulness to the programme (column 4), the
weighted means are as follows.
For James,
\[ \bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} = \frac{492.6}{10} = 49.26 \]
which clearly falls below the required mean.
For Jane,
\[ \bar{Y}_w = \frac{\sum_{i=1}^{n} w_i Y_i}{\sum_{i=1}^{n} w_i} = \frac{603}{10} = 60.3 \]
which is clearly above the required mean. Jane therefore qualifies for admission into the programme.
Example 5.12. A tycoon has three house girls whom he pays Ksh.2,000 per month each, two watchmen who
receive Ksh.2,500 per month each, and some gardeners whom he pays Ksh.3,500 each. If he pays out an
average of Ksh.2,850 per month to these people, find the number of gardeners.
Solution. The weighted mean is
\[ \bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} \]
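One way to finish Example 5.12 is to take the weights to be the numbers of workers in each group and solve the weighted-mean equation for the unknown number of gardeners g. The sketch below does this with exact fractions; treating head counts as the weights is an assumption about how the solution is meant to proceed.

```python
from fractions import Fraction

# (number of workers, monthly pay in Ksh) for the groups whose size is known
known_groups = [(3, 2000), (2, 2500)]
gardener_pay = 3500
target_mean = 2850

known_count = sum(n for n, _ in known_groups)
known_total = sum(n * pay for n, pay in known_groups)

# (known_total + g * gardener_pay) / (known_count + g) = target_mean, solved for g
gardeners = Fraction(target_mean * known_count - known_total,
                     gardener_pay - target_mean)
print(gardeners)  # 5
```

With g = 5 the total payout is 11,000 + 17,500 = 28,500 for 10 workers, which gives the stated average of Ksh.2,850.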
Question 5.6. The Geometric Mean of three numbers is 8. Two of the numbers are 4 and 32. What is the
third number?
Question 5.7. Find the Geometric Mean of the values 10, 5, 15, 8, 12.
(For Grouped Data) If we have a series of positive values x1, x2, x3, · · · , xn which are repeated
f1, f2, f3, · · · , fn times respectively, then the geometric mean, denoted by G, is given as:
\[ \text{G.M. of } X = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}} \]
where n = f1 + f2 + f3 + · · · + fn.
X 13 14 15 16 17
f 2 5 13 7 3
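Solution. Here n = Σf = 30.
Using the formula of geometric mean for grouped data, the geometric mean in this case becomes:
\[ \text{G.M. of } X = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}} = \sqrt[30]{(13)^2 \cdot (14)^5 \cdot (15)^{13} \cdot (16)^7 \cdot (17)^3} \]
\[ = \sqrt[30]{2.33292 \times 10^{35}} = \left(2.33292 \times 10^{35}\right)^{1/30} = 15.0984 \approx 15.10 \]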
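The same geometric mean can be computed through logarithms, which avoids forming the very large product. The sketch below uses base-10 logarithms and an illustrative function name.

```python
from math import log10

def geometric_mean_grouped(values, freqs):
    """Geometric mean of positive values, each repeated freq times."""
    n = sum(freqs)
    log_g = sum(f * log10(x) for x, f in zip(values, freqs)) / n
    return 10 ** log_g  # antilog

g = geometric_mean_grouped([13, 14, 15, 16, 17], [2, 5, 13, 7, 3])
print(round(g, 2))  # 15.1
```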
The method explained above for the calculation of the geometric mean is useful when the number of
values in the given data is small and an electronic calculator is available. When a set of
data contains a large number of values, we need an alternative way of computing the geometric mean. The
alternative way of computing the geometric mean, using logarithms, is as follows:
Example 5.15. Find the Geometric Mean of the values 10, 5, 15, 8, 12
x log x
10 1.0000
5 0.6990
15 1.1761
8 0.9031
12 1.0792
Total Σ log x = 4.8573
\[ \text{G.M. of } X = \text{Antilog}\left(\frac{\sum \log x}{n}\right) = \text{Antilog}\left(\frac{4.8573}{5}\right) = \text{Antilog}(0.9715) = 9.36 \]
X 15 20 25 30 35 40 45 50
f 2 22 29 24 7 8 6 2
Solution. The geometric mean is given by
\[ G = \left[x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right]^{1/n} \quad \text{where } n = \sum f \]
\[ \log G = \frac{1}{n}\sum f_i \log x_i = \frac{1}{n}\left[f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n\right] \]
\[ = \frac{1}{100}\left[2\log 15 + 22\log 20 + 29\log 25 + 24\log 30 + 7\log 35 + 8\log 40 + 6\log 45 + 2\log 50\right] \]
\[ = \frac{1}{100}[143.908] = 1.43908 \]
\[ G = 27.48 \]
Example 5.17. Find the Geometric Mean for the following distribution of students marks:
Question 5.10. Calculate the Geometric Mean of the first eight natural numbers.
Example 5.20. Calculate the harmonic mean for the data given below:
Marks 30 − 39 40 − 49 50 − 59 60 − 69 70 − 79 80 − 89 90 − 99
f 3 4 12 21 33 26 8
Example 5.21. Calculate the Harmonic mean of the data in the example above.
x 1/x frequency f f · (1/x)
15 0.066 2 0.13
20 0.050 22 1.10
25 0.040 29 1.16
30 0.033 24 0.80
35 0.029 7 0.20
40 0.025 8 0.20
45 0.022 6 0.13
50 0.020 2 0.04
Σf = 100 Σfi/xi = 3.76
The harmonic mean is given by
\[ H = \frac{\sum f_i}{\sum \frac{f_i}{x_i}} = \frac{100}{3.76} = 26.60 \]
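The harmonic mean for this table can also be computed without rounding the intermediate reciprocals. In the sketch below (function name illustrative) the unrounded sum Σf/x is about 3.767, so the result is about 26.55, slightly below the 26.60 obtained from the rounded table entries.

```python
def harmonic_mean_grouped(values, freqs):
    """Total frequency divided by the sum of f/x (every x must be non-zero)."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

x = [15, 20, 25, 30, 35, 40, 45, 50]
f = [2, 22, 29, 24, 7, 8, 6, 2]

h = harmonic_mean_grouped(x, f)
print(round(h, 2))
```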
(iv). Harmonic mean does not exist if any of the values of the data is zero.
(v). It is suitable when the observations are rates, e.g., speed in km/h.
Let x̄1 and x̄2 be the arithmetic means of data set 1 and data set 2 respectively, then
\[ \bar{x}_1 = \frac{78 + 66 + 43 + 56 + 76 + 26 + 57 + 42}{8} = \frac{444}{8} = 55.5 \]
\[ \bar{x}_2 = \frac{65 + 52 + 42 + 53 + 53}{5} = \frac{265}{5} = 53 \]
Next, is the mean of the combined sets of data the average of the two means? Let us check;
Therefore the mean of the combined sets of data is NOT (x̄1 + x̄2)/2 in general; however, the two agree when n1 = n2.
In general, if there are k data sets with sizes n1, n2, · · · , nk and means x̄1, x̄2, · · · , x̄k, then the combined
mean x̄ for all of the data sets is
\[ \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} \]
For two groups this reads
\[ \bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} \]
The above formula can be generalized for more than two groups. If n1, n2, · · · , nk are the sizes of k groups
with means X̄1, X̄2, · · · , X̄k respectively, then the mean X̄c of the combined group is given by
\[ \bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k} \]
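The check suggested above ("is the combined mean the average of the two means?") can be carried out numerically for the two data sets used earlier. This is a small sketch with illustrative variable names.

```python
set1 = [78, 66, 43, 56, 76, 26, 57, 42]
set2 = [65, 52, 42, 53, 53]

n1, n2 = len(set1), len(set2)
mean1, mean2 = sum(set1) / n1, sum(set2) / n2

combined = (n1 * mean1 + n2 * mean2) / (n1 + n2)  # combined-mean formula
pooled = sum(set1 + set2) / (n1 + n2)             # mean of all 13 values directly
naive = (mean1 + mean2) / 2                       # simple average of the two means

print(round(combined, 2), round(pooled, 2), naive)  # 54.54 54.54 54.25
```

The combined-mean formula reproduces the mean of the pooled data, while the simple average of the two means does not, because the groups have different sizes.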
Solution. Given;
Group I Group II
n1 = 80 n2 = 70
X̄1 = 1500 X̄2 = 200
5.7 Mode
The mode is the value which occurs the greatest number of times in the data, that is, the most frequently
occurring value in a set of observations. When each value occurs the same number of times in the data,
there is no mode. If the data have only one mode the distribution is said to be unimodal. If two values share
the highest frequency the distribution is said to be bimodal, and observations with more than two modes are
referred to as multi-modal.
For example, given 2, 3, 4, 5, 4, the mode is 4, because there are more fours than any other number.
Note that the mode does not have important mathematical properties for future use. Also, the mode is not
always a helpful measure of location, because there can be more than one mode or even no mode.
When the class widths (sizes) are constant, say c, frequencies are proportional to frequency densities, so
\[ \frac{L_1}{L_2} = \frac{\Delta_1}{\Delta_2} = \frac{d_1}{d_2} \qquad (1) \]
where ∆1 is the difference between the frequency of the modal class and the frequency of the class preceding
the modal class, and ∆2 is the difference between the frequency of the modal class and the frequency of the
class following the modal class.
But also L1 + L2 = c and, from (1), L2 = ∆2 L1/∆1, so
\[ L_1 + \frac{\Delta_2 L_1}{\Delta_1} = c \]
\[ \Delta_1 L_1 + \Delta_2 L_1 = c\Delta_1 \]
\[ (\Delta_1 + \Delta_2) L_1 = c\Delta_1 \]
\[ L_1 = \frac{c\Delta_1}{\Delta_1 + \Delta_2} \]
The mode is given by
\[ M_0 = L_{cb} + L_1 = L_{cb} + \frac{\Delta_1 c}{\Delta_1 + \Delta_2} \]
\[ \text{Mode} = L_{cb} + \frac{(f_{m_0} - f_1)c}{(f_{m_0} - f_1) + (f_{m_0} - f_2)} = L_{cb} + \frac{(f_{m_0} - f_1)c}{2f_{m_0} - f_1 - f_2} \]
where;
Using the same diagram the formula may be modified to cater for the case when the classes do not have
constant width. In this case the mode is given by;
\[ M_0 = L_{cb} + L_1 = L_{cb} + \frac{d_1 \times c_{m_0}}{d_1 + d_2} \]
where;
Example 5.26. Calculate the mode for the following frequency distribution of marks obtained by 50 students
in Statistics.
Marks 6 − 10 11 − 15 16 − 20 21 − 25 26 − 30 31 − 35 36 − 40 41 − 45 46 − 50
Frequency 5 6 15 10 5 4 2 2 1
Solution. The modal class is 16 − 20. Thus, Lcb = 15.5, f1 = 6, f2 = 10, fm0 = 15 and c = 5.
The mode is then given by
\[ M_0 = L_{cb} + \frac{(f_{m_0} - f_1) \times c}{2f_{m_0} - f_1 - f_2} = 15.5 + \frac{(15 - 6)5}{30 - 6 - 10} = 15.5 + 3.214 = 18.71 \]
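The equal-width mode formula translates directly into code. The following sketch (function and parameter names are illustrative) reproduces the result of Example 5.26.

```python
def grouped_mode(lcb, f_modal, f_prev, f_next, width):
    """Mode of grouped data with equal class widths."""
    d1 = f_modal - f_prev   # excess of the modal class over the preceding class
    d2 = f_modal - f_next   # excess of the modal class over the following class
    return lcb + d1 * width / (d1 + d2)

mode = grouped_mode(lcb=15.5, f_modal=15, f_prev=6, f_next=10, width=5)
print(round(mode, 2))  # 18.71
```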
Example 5.27. Calculate the mode for the following frequency distribution scores for 80 students.
Scores Frequency Class interval Frequency density
5 − 20 8 16 0.50
21 − 40 12 20 0.60
41 − 55 18 15 1.20
56 − 87 40 32 1.25
88 − 95 2 8 0.25
Σf = 80
The modal class is 56 − 87, Lcb = 55.5, d1 = (1.25 − 1.20) = 0.05, d2 = (1.25 − 0.25) = 1.00 and cm0 = 32.
The mode is given by
\[ M_0 = L_{cb} + L_1 = L_{cb} + \frac{d_1 \times c_{m_0}}{d_1 + d_2} = 55.5 + \frac{0.05 \times 32}{0.05 + 1.00} = 55.5 + 1.52 = 57.02 \]
5.8 Median
By definition, the median is the middle value of the arrayed data. When the data are arranged in order, the
median is the middle value if the number of values is odd, and the mean of the two middle values if the
number of values is even. The median divides the arrayed set of data into two equal parts: the number of
values greater than the median equals the number of values smaller than the median, so that 50% of the
observations lie below the median and 50% lie above it. The median is called a positional
average. It is denoted by X̃, read as X-tilde, and is also sometimes denoted by M.
The first step is to arrange the data in ascending or descending order of magnitude.
Example 5.29. Find the median for the following set of data:
Since there is an even number of data, the average of the middle two numbers (i.e., 69 and 73) is the median
(142/2 = 71). Note that in general, location of the median is = (n + 1)/2 where n = total number of items.
Generally, the median provides a better measure of location than the mean when there are some extremely
large or small observations (i.e., when the data are skewed to the right or to the left).
n = 8 (even number)
Example 5.31. Find the median for the following data sets.
(i) 3, 5, 2, 7, 8
(ii) 3, 5, 2, 7, 8, 11
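Solution:
(i) Arranged in ascending order: 2, 3, 5, 7, 8. The median is 5.
(ii) Arranged in ascending order: 2, 3, 5, 7, 8, 11. The median is
\[ \frac{5 + 7}{2} = \frac{12}{2} = 6 \]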
Note: The results of median will not be affected by arranging in ascending or descending order.
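The rule for ungrouped data (middle value for odd n, mean of the two middle values for even n) can be sketched as a short function and applied to the two data sets of Example 5.31.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 5, 2, 7, 8]))      # 5
print(median([3, 5, 2, 7, 8, 11]))  # 6.0
```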
Steps:
(1) Arrange the data in ascending or descending order of magnitude with respective frequencies.
(4) Find the value in the cumulative frequency (c.f.) column that is either equal to or just greater than
Σfi/2 and determine the value of the variable corresponding to it.
\[ \text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value} \]
Example 5.32. Consider the following frequency distribution. Calculate the median.
X 1 2 3 4 5 6 7
f 2 10 15 20 25 18 10
Solution. First we compute the cumulative frequency of the data.
Observation Frequency Cumulative
X f frequency (C.F)
1 2 2
2 10 12
3 15 27
4 20 47
5 25 72
6 18 90
7 10 100
Total 100
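\[ \text{Median} = \text{size of } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ term} = \left(\frac{100+1}{2}\right)^{\text{th}} \text{ term} = 50.5^{\text{th}} \text{ term} \]
The median is 5 because both the 50th and the 51st items correspond to X = 5.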
Steps:
1. To locate the median class, divide the total frequency n by 2, since the median is the (n/2)th
value of the variable when the data are arranged in ascending order.
2. Up to the lcb of the interval containing the median we have, say, cumL items.
3. If we assume that the values are evenly distributed in the interval containing the median, then to reach
the median from the lcb we add
\[ \frac{\left(\frac{n}{2} - cum_L\right)c}{f_{me}} \]
Example 5.33. Calculate median for the following data.
Group 20 − 24 25 − 29 30 − 34 35 − 39 40 − 44 45 − 49
Frequency 2 6 10 14 8 3
Solution. The class boundaries and cumulative frequency are given in the table below:
\[ \text{Median} = lcb + \frac{\left(\frac{n}{2} - cum_L\right)c}{f_m} = 34.5 + \frac{\left(\frac{50}{2} - 17\right)5}{14} = 34.5 + \frac{20}{7} = 37.357 \]
Now, that you have studied about all the three measures of central tendency, let us discuss which measure
would be best suited for a particular requirement.
The mean is the most frequently used measure of central tendency because it takes into account all the
observations, and lies between the extremes, i.e., the largest and the smallest observations of the entire data.
It also enables us to compare two or more distributions. For example, by comparing the average (mean)
results of students of different schools of a particular examination, we can conclude which school has a
better performance.
However, extreme values in the data affect the mean. For example, the mean of classes having frequencies
more or less the same is a good representative of the data. But, if one class has frequency, say 2, and the five
others have frequency 20, 25, 20, 21, 18, then the mean will certainly not reflect the way the data behaves.
So, in such cases, the mean is not a good representative of the data.
In problems where individual observations are not important, and we wish to find out a ‘typical’ obser-
vation, the median is more appropriate, e.g., finding the typical productivity rate of workers, average wage
in a country, etc. These are situations where extreme values may be there. So, rather than the mean, we take
the median as a better measure of central tendency.
In situations which require establishing the most frequent value or most popular item, the mode is the
best choice, e.g., to find the most popular Television programme being watched, the consumer item in great-
est demand, the colour of the vehicle used by most of the people, etc.
Remarks:
2. The median of grouped data with unequal class sizes can also be calculated. However, it is not dis-
cussed here.
Question 5.16. If the median of the distribution given below is 28.5, find the values of x and y.
Class Interval 0 − 9 10 − 19 20 − 29 30 − 39 40 − 49 50 − 59
Frequency 5 x 20 15 y 5
6 Partition Values
If the values of the variate are arranged in ascending or descending order of magnitude, then, as we have seen
above, the median is that value of the variate which divides the total frequency into two equal parts. Similarly,
the given series can be divided into four, ten or a hundred equal parts. The values of the variate dividing it into
four equal parts are called quartiles, into ten equal parts deciles, and into a hundred equal parts
percentiles.
6.1 Quartiles
The quartiles are those values which divide the set of observations into four equal parts. There are three
quartiles, called the first quartile, second quartile and third quartile. The first quartile, also called the lower
quartile and denoted by Q1, is the value that lies midway between the smallest value and the median. The
median is the second quartile Q2. The third quartile, also called the upper quartile and denoted by Q3, is the
value that lies midway between the median and the largest value. The lower quartile Q1 is a point which has
25% of the observations below it and 75% above it. The upper quartile Q3 is a point with 75% of the
observations below it and 25% above it.
Generally, the interpolation formula for the n.dth value, (n, whole part and d, the decimal part) is;
n.dth value = nth value + 0.d × [(n + 1)th value − nth value]
Example 6.2. The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040, 1080, 1200,
1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, and 1885. Find the quartile
deviation and coefficient of quartile deviation.
Solution. After arranging the observations in ascending order, we get 1040, 1080, 1120, 1200, 1240, 1320,
1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880, 1885, 1960.
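\[ Q_1 = \text{Value of } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item} = \text{Value of } \left(\frac{20+1}{4}\right)^{\text{th}} \text{ item} = \text{Value of } (5.25)^{\text{th}} \text{ item} \]
= 5th value + 0.25 (6th value − 5th value) = 1240 + 0.25(1320 − 1240)
Q1 = 1240 + 20 = 1260
\[ Q_3 = \text{Value of } \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ item} = \text{Value of } \left(\frac{3(20+1)}{4}\right)^{\text{th}} \text{ item} = \text{Value of } (15.75)^{\text{th}} \text{ item} \]
= 15th value + 0.75 (16th value − 15th value) = 1750 + 0.75(1755 − 1750)
Q3 = 1750 + 3.75 = 1753.75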
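The interpolation rule and the two quartiles of Example 6.2 can be checked with the sketch below. It also evaluates the quartile deviation and its coefficient using the usual definitions (Q3 − Q1)/2 and (Q3 − Q1)/(Q3 + Q1); these definitions are not stated in this section, so they are assumptions here, and the helper name `item_value` is illustrative.

```python
def item_value(sorted_values, position):
    """Value of the position-th item (1-based), interpolating between neighbours."""
    whole = int(position)          # n, the whole part
    frac = position - whole        # 0.d, the decimal part
    lower = sorted_values[whole - 1]
    if frac == 0:
        return lower
    upper = sorted_values[whole]
    return lower + frac * (upper - lower)

wheat = sorted([1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
                1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885])
n = len(wheat)

q1 = item_value(wheat, (n + 1) / 4)      # 5.25th item
q3 = item_value(wheat, 3 * (n + 1) / 4)  # 15.75th item

# Assumed standard definitions (not given in this section)
quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(q1, q3, quartile_deviation, round(coefficient, 4))  # 1260.0 1753.75 246.875 0.1638
```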
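\[ Q_1 = \text{Value of } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item}, \qquad n = \sum f \]
\[ Q_2 = \text{Value of } 2\left(\frac{n+1}{4}\right)^{\text{th}} \text{ item} = \text{Value of } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item} = \text{Median} \]
\[ Q_3 = \text{Value of } 3\left(\frac{n+1}{4}\right)^{\text{th}} \text{ item} \]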
Example 6.3. Calculate the quartile deviation and coefficient of quartile deviation from the data given below:
where:
When i = 2 we get
\[ \text{Median} = lcb + \frac{\left(\frac{n}{2} - cum_L\right)c}{f_m} \]
Where;
Example 6.4. Calculate the three quartiles for the following frequency distribution of marks obtained by 50
students.
Marks Frequency Cumulative
Frequency
6 − 10 5 5
11 − 15 6 11
16 − 20 15 26
21 − 25 10 36
26 − 30 5 41
31 − 35 4 45
36 − 40 2 47
41 − 45 2 49
46 − 50 1 50
Σf = 50
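\[ Q_i = lcb + \frac{\left(i \cdot \frac{n}{4} - cum_L\right)c}{f_{Q_i}} \quad \text{for } i = 1, 2, 3 \]
The first quartile Q1 is the (50/4)th = 12.5th observation from the smallest. This value is found in the
16 − 20 class interval, so that
\[ Q_1 = 15.5 + \frac{(12.5 - 11)5}{15} = 16.0 \]
The median is the (50/2)th or 25th observation. This occurs in class 16 − 20, so that
\[ Q_2 = 15.5 + \frac{(25 - 11)5}{15} = 20.17 \]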
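The grouped-data quartile formula of Example 6.4 can be sketched as a single interpolation routine applied at positions n/4, 2n/4 and 3n/4. The lower class boundaries are taken as 0.5 below each lower class limit, and the function name `grouped_quantile` is illustrative.

```python
# Lower class boundaries and frequencies for classes 6-10, 11-15, ..., 46-50
lower_bounds = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5, 40.5, 45.5]
freqs = [5, 6, 15, 10, 5, 4, 2, 2, 1]
width = 5

def grouped_quantile(position):
    """Interpolated value of the position-th observation in the grouped table."""
    cum = 0  # cumulative frequency below the current class (cumL)
    for lcb, f in zip(lower_bounds, freqs):
        if cum + f >= position:
            return lcb + (position - cum) * width / f
        cum += f
    raise ValueError("position exceeds total frequency")

n = sum(freqs)
q1, q2, q3 = (grouped_quantile(i * n / 4) for i in (1, 2, 3))
print(q1, round(q2, 2), q3)  # 16.0 20.17 27.0
```

The same routine gives deciles and percentiles if the position is changed to i·n/10 or i·n/100.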
6.2 Deciles
The deciles are the partition values which divide the set of observations into ten equal parts. There are nine
deciles, namely D1, D2, D3, · · · , D9. The first decile D1 is a point which has 10% of the observations
below it.
6.3 Percentiles
The percentiles are the points which divide the set of observations into one hundred equal parts. These points
are denoted by P1, P2, P3, · · · , P99, and are called the first, second, third, · · · , ninety-ninth percentiles. The
percentiles are calculated for a very large number of observations, like workers in factories and the population
in counties or countries. The percentiles are usually calculated for grouped data. The first percentile, denoted
by P1, is calculated as
\[ P_1 = \text{Value of } \left(\frac{n+1}{100}\right)^{\text{th}} \text{ item} \]
We find the group in which the ((n + 1)/100)th item lies and then P1 is interpolated from the formula
\[ P_i = lcb + \frac{c}{f_{P_i}}\left(\frac{i\,n}{100} - cum_L\right) \quad \text{for } i = 1, 2, 3, \cdots, 99 \]
The 15th value lies in class 18 − 22 because the previous cumulative frequency is 13 < 15 and the following one
is 25 > 15. Then, LCB = 17.5, k = 15, pcf = 13, f = 12 and i = 22.5 − 17.5 = 5.
\[ 15^{\text{th}} \text{ value} = 17.5 + \frac{15 - 13}{12} \times 5 = 17.5 + 0.83 = 18.33 \]
Earlier, we found that the median is located at position (n + 1)/2 for ungrouped data. But with grouped data
we normally drop the 1 in (n + 1) to get n/2 as the position of the median value. Here the median is the
(40/2)th = 20th value.
\[ 20^{\text{th}} \text{ value} = 17.5 + \frac{20 - 13}{12} \times 5 = 17.5 + 2.92 = 20.42 \]
What is the 85th percentile?
\[ P_{85} = \left(85 \cdot \frac{n}{100}\right)^{\text{th}} \text{ value} = 34^{\text{th}} \text{ value} = 22.5 + \frac{34 - 25}{9} \times 5 = 22.5 + 5 = 27.5 \]
Example 6.7. The table below shows the frequency of weekly withdrawals of money from a certain Bank.
Use the data to answer the questions that follow.
Amount Withdrawn Frequency
1,000 - 4,999 10
5,000 - 8,999 14
9,000 - 12,999 20
13,000 - 16,999 16
17,000 - 20,999 12
21,000 - 24,999 8
(c) The position of the withdrawal sheet reading Ksh10,000.00 if the sheets are arranged in ascending
order.
Solution. To choose the assumed mean, we note that the two middle classes are 9,000 - 12,999 and 13,000
- 16,999 with frequencies 20 and 16 respectively. We choose the assumed mean A = 10,999.5 as the midpoint
of the one with the higher frequency. The class interval c = 12,999.5 − 8,999.5 = 4,000 is a good
choice of scaling constant. Then
Class-interval x f x − A d = (x − A)/c d² fd fd² cf
1,000 - 4,999 2,999.5 10 -8,000 -2 4 -20 40 10
5,000 - 8,999 6,999.5 14 -4,000 -1 1 -14 14 24
9,000 - 12,999 10,999.5 20 0 0 0 0 0 44
13,000 - 16,999 14,999.5 16 4,000 1 1 16 16 60
17,000 - 20,999 18,999.5 12 8,000 2 4 24 48 72
21,000 - 24,999 22,999.5 8 12,000 3 9 24 72 80
80 30 190
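The totals in the table can be checked, and turned into a mean and standard deviation, with a short Python sketch. The parts of the question asking for these are not shown in this section, so treating them as the target is an assumption; the sketch uses the coding formulas mean = A + c·Σfd/Σf and s = c·√(Σfd²/Σf − (Σfd/Σf)²).

```python
from math import sqrt

# Midpoints and frequencies from the table for Example 6.7
midpoints = [2999.5, 6999.5, 10999.5, 14999.5, 18999.5, 22999.5]
freqs = [10, 14, 20, 16, 12, 8]
A, c = 10999.5, 4000  # assumed mean and class width (scaling constant)

# Coded deviations d = (x - A) / c
d = [(x - A) / c for x in midpoints]

n = sum(freqs)
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))

# Undo the coding to get the estimates in shillings
mean = A + c * sum_fd / n
sd = c * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

print(n, sum_fd, sum_fd2)          # column totals
print(round(mean, 2), round(sd, 2))
```

Because midpoints stand in for the actual withdrawals, both figures are estimates.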
6.6 Boxplots
The summary information contained in the quartiles is highlighted in a graphic display called a boxplot.
The center half of the data, extending from the first to the third quartile, is represented by a rectangle. The
median is identified by a bar within this box. A line extends from the third quartile to the maximum and
another line extends from the first quartile to the minimum. (For large data sets the lines may only extend to
the 95th and 5th percentiles).
1. A constant value (k, say) added to or subtracted from each of the values in the data set translates the
measure by the same constant. That is, new m = old m + k. For example, the data set 2, 3, 4, 5, and 6 has
mean x̄ = 4. If we add k = 20 to every value to obtain 22, 23, 24, 25, and 26, then the mean becomes x̄ = 24.
2. If each value in the set is multiplied by a constant k, the new measure of location is given by k × m.
For the data 2, 3, 4, 5, and 6, whose median is 4, multiplying each value by k = 2 gives the new median
m = 2 × 4 = 8, and dividing each value by k = 2 gives m = 4/2 = 2.
7 Measures of Dispersion
7.1 Introduction
The measures of central tendency help to locate the center of the distribution, but they do not reveal how
the observations are spread out on either side of the center. Although the measures of central tendency pro-
vide useful information about the data, they depend largely on the extent to which the data are dispersed.
The degree to which numerical data tend to spread about the average value is called the variation or disper-
sion of the data. The data sets may have common means, medians and modes and identical frequencies in
the modal class, yet they may differ widely in their spread of values about the measures of central tendencies.
Consider the four data sets and their measures of central tendency.
Data Sets   A    B    C     D
            8    8    8     4
            8    8    6     12
            8    6    7     8
            8    10   9     16
            8    8    10    0
Mean        8    8    8     8
Mode        8    8    none  none
Median      8    8    8     8
From the given measures, it is not possible to differentiate the four sets in the absence of the raw data. Other
measures are required to make the comparison. The measures of dispersion or spread which is the degree
of scatter or variation of the variable about the central value can be considered. There are various measures
of dispersion which include; Range, Inter-Quartile Range, Quartile deviations, Mean Absolute Deviation,
Variance and Standard Deviation.
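To see how measures of dispersion separate the four data sets, the following Python sketch computes the range and the standard deviation (with divisor n) alongside the mean and median. It uses only the standard library; the dictionary layout is just one convenient choice.

```python
from statistics import mean, median, pstdev

# The four data sets from the table above, read column by column
data_sets = {
    "A": [8, 8, 8, 8, 8],
    "B": [8, 8, 6, 10, 8],
    "C": [8, 6, 7, 9, 10],
    "D": [4, 12, 8, 16, 0],
}

summary = {}
for name, values in data_sets.items():
    summary[name] = {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": pstdev(values),  # population SD, i.e. divisor n
    }

for name, stats in summary.items():
    print(name, stats)
```

All four sets share mean 8 and median 8, but their ranges and standard deviations differ, which is exactly the information the measures of central tendency cannot give.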
7.2 Range
The range is the simplest of all the measures of dispersion and is defined as the difference between the largest
and smallest values of a given data set. Range is denoted by R and is given by:
$$R = x_{\max} - x_{\min}$$
where $x_{\max}$ is the largest and $x_{\min}$ is the smallest observation in the sample (data set).
The major disadvantage of the range is that it does not include all of the observations. Only the two most
extreme values are included and these two numbers may be untypical observations. For example, given that
the ages for a sample of 8 students at CSC are: 24, 18, 22, 19, 25, 20, 23, and 21, the range for this data set
is:
25 − 18 = 7.
Note: In case of grouped frequency distribution, range will be the difference between UCB of the highest
class and LCB of the lowest class.
Find (a) mean, (b) IQR (c) Quartile deviation and (d) mean absolute deviation
Solution.
(a). Mean
$$\bar{x} = \frac{\sum x_i}{n} = \frac{3 + 6 + 9 + \cdots + 6 + 5}{12} = 7.5$$
(b). Arranging the observations in ascending order:
1, 3, 3, 5, 6, 6, 7, 9, 10, 12 , 13, 15
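$$\text{IQR} = Q_3 - Q_1$$
$$Q_1 = \left(\frac{13}{4}\right)^{\text{th}} \text{ value} = 3.25^{\text{th}} \text{ value} = 3^{\text{rd}} + 0.25\,(4^{\text{th}} - 3^{\text{rd}})$$
$$Q_1 = 3 + 0.25(5 - 3) = 3 + 0.5 = 3.5$$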
(d).
$$MAD = \frac{|3 - 7.5| + |6 - 7.5| + \cdots + |5 - 7.5|}{12} = \frac{43}{12} = 3.5833 \approx 3.58$$
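The whole example can be reproduced with a short Python sketch. It assumes the (n + 1)/4 position rule used above for the quartiles and takes the quartile deviation to be half the interquartile range; the helper name `value_at_position` is hypothetical.

```python
# Observations from the example, in ascending order
data = sorted([1, 3, 3, 5, 6, 6, 7, 9, 10, 12, 13, 15])
n = len(data)


def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = int(position)
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    frac = position - lower
    return sorted_values[lower - 1] + frac * (sorted_values[lower] - sorted_values[lower - 1])


mean = sum(data) / n
q1 = value_at_position(data, (n + 1) / 4)        # 3.25th value
q3 = value_at_position(data, 3 * (n + 1) / 4)    # 9.75th value
iqr = q3 - q1
quartile_deviation = iqr / 2
mad = sum(abs(x - mean) for x in data) / n       # mean absolute deviation

print(mean, q1, q3, iqr, quartile_deviation, round(mad, 2))
```

Other software may use a different quartile convention and return slightly different values of Q1 and Q3 for the same data.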
where $x_1, x_2, \ldots, x_n$ are the set of n observations. This is referred to as the variance of the data set; some
books denote it by $\sigma^2$.
Note: The variance formula given above is a good (nearly unbiased) estimate of the population variance only
if the sample is large (n ≥ 30). If this is not the case, the formula becomes
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Throughout these notes, n is used instead of n − 1 for the variance calculation (assume that the sample is a
large sample or that the variance is not used for inference).
Example 7.2. Consider the set of values 3, 8, and 4, whose mean is 5. Now, (3 − 5) + (8 − 5) + (4 − 5) =
−2 + 3 − 1 = 0. In other words, $\sum (x_i - \bar{x}) = 0$. This is generally the case for any data set.
Note: For grouped frequency distribution, the use of x from the midpoint values instead of actual observa-
tions leads to just estimates (not accurate answers).
$$s^2 = \frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2$$
Example 7.3. The masses (x), in kilograms, of 30 bridging students were recorded on the first day
as they reported for the course, and the following results were calculated: $\sum x = 1530$ and $\sum x^2 = 80{,}604$.
(b). Find the mean and the standard deviation if afterwards the weighing machine was discovered to
be under-weighing them by 2 kg.
(c). On the second day two students weighing 48 kg and 56 kg were absent. Find the mean and the standard
deviation of the weights of those who were present.
(b). The new mean = 51 + 2 = 53 kg and new standard deviation s = 9.26 (remains the same).
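A short Python sketch of the calculations in Example 7.3 follows. It works from the summary totals with divisor n, as elsewhere in these notes. For part (c) it assumes that 48 kg and 56 kg are on the same scale as the recorded totals, which the question does not state explicitly; the helper name `mean_sd` is hypothetical.

```python
from math import sqrt


def mean_sd(n, sum_x, sum_x2):
    """Mean and standard deviation (divisor n) from summary totals."""
    mean = sum_x / n
    return mean, sqrt(sum_x2 / n - mean ** 2)


# Recorded totals for the 30 students
mean_a, sd_a = mean_sd(30, 1530, 80604)

# (b) Adding a constant 2 kg shifts the mean but leaves the SD unchanged
mean_b, sd_b = mean_a + 2, sd_a

# (c) Remove the two absent students from the totals (assumed same scale)
absent = [48, 56]
mean_c, sd_c = mean_sd(
    30 - len(absent),
    1530 - sum(absent),
    80604 - sum(w ** 2 for w in absent),
)

print(round(mean_a, 2), round(sd_a, 2))
print(round(mean_b, 2), round(sd_b, 2))
print(round(mean_c, 2), round(sd_c, 2))
```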
Example 7.5. A quality control supervisor has taken a sample of 16 bolts from the output of a thread-cutting
machine and tested their tensile strengths. The results, in tons of force required for breakage, are as follows:
2.20 1.95 2.15 2.08 1.85 1.92
2.23 2.19 1.98 2.07 2.24 2.31
1.96 2.30 2.27 1.89
If the grouped frequency distribution has equal widths then the change of origin and scale (the so called
Assumed mean or Coding method) can simplify the calculations for the standard deviation to
Example 7.6. Calculate the standard deviation for the wages given in example 1 above;
7.9 Variance
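It is defined to be the square of the standard deviation and is thus denoted by $s^2$. For computational purposes
the two formulae for the standard deviation are given as:
$$S = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2} \quad \text{for ungrouped data}$$
$$S = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \left(\frac{\sum f_i x_i}{\sum f_i}\right)^2} \quad \text{for grouped data}$$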
Example 7.7. Calculate the standard deviation of the wages in example 1 above;
$x_i$   $x_i^2$   $f_i$   $f_i x_i$   $f_i x_i^2$
15 225 2 30 450
20 400 22 440 8800
25 625 19 475 11,875
30 900 14 420 12,600
35 1225 3 105 3,675
40 1600 4 160 6,400
45 2025 6 270 12,150
50 2500 1 50 2,500
55 3025 1 55 3,025
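The column totals and the resulting standard deviation can be obtained with the grouped-data formula. The Python sketch below uses only the $x_i$ and $f_i$ values from the table; the totals it prints are computed here and are not quoted from the notes.

```python
from math import sqrt

# Wage values and frequencies from the table for Example 7.7
x = [15, 20, 25, 30, 35, 40, 45, 50, 55]
f = [2, 22, 19, 14, 3, 4, 6, 1, 1]

sum_f = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, x))

mean = sum_fx / sum_f
# S = sqrt( sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2 )
sd = sqrt(sum_fx2 / sum_f - mean ** 2)

print(sum_f, sum_fx, sum_fx2)
print(round(mean, 2), round(sd, 2))
```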
Generally the dispersion of observations remains the same if a constant is added to each of the data items.
where d1 and d2 are the differences of the means x̄1 and x̄2 , from the combined mean x̄ respectively.
Example 7.8. Find the combined standard deviation of two series A and B.
Series A Series B
Mean 50 40
Standard deviation 5 6
No. of items 100 150
Solution. Given $\bar{x}_1 = 50$ and $\bar{x}_2 = 40$, $s_1^2 = 25$ and $s_2^2 = 36$, $n_1 = 100$ and $n_2 = 150$.
$$\text{Combined mean } \bar{x} = \frac{100 \times 50 + 150 \times 40}{100 + 150} = 44.$$
$d_1 = \bar{x}_1 - \bar{x} = 50 - 44 = 6$ and $d_2 = \bar{x}_2 - \bar{x} = 40 - 44 = -4$.
$$\text{Combined variance} = \frac{100(25 + 36) + 150(36 + 16)}{100 + 150} = 55.6$$
Therefore, combined SD $= \sqrt{55.6} = 7.46$.
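The steps of this solution can be collected into one small Python function. This is an illustrative sketch; the function name `combined_mean_sd` is hypothetical, and the formula is the one applied in the solution above: combined variance = [n₁(s₁² + d₁²) + n₂(s₂² + d₂²)] / (n₁ + n₂).

```python
from math import sqrt


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two series."""
    mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean, mean2 - mean  # deviations from the combined mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return mean, sqrt(variance)


# Series A and B from Example 7.8
combined_mean, combined_sd = combined_mean_sd(100, 50, 5, 150, 40, 6)
print(combined_mean, round(combined_sd, 2))
```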
Some of the relative measures of dispersion (or coefficient of dispersion) which are commonly used include;
Quartile coefficient of dispersion, Coefficient of mean dispersion and Coefficient of variation or coefficient
of dispersion.
used, i.e., the coefficients of variation for different distributions are compared, and the distribution with the
largest coefficient of variation value has the greatest relative variation.
The coefficient of variation is the ratio between the standard deviation of a sample and its mean, i.e., it
reflects the variation in a distribution relative to the mean:
$$C.V = \frac{\sigma}{\bar{X}}$$
The coefficient of variation is usually expressed as a percentage:
$$C.V = \frac{\sigma}{\bar{X}} \times 100\%$$
It allows us to compare the dispersions of two different distributions if their means are positive. The coefficient
of variation is calculated for each distribution and the values are compared: the distribution with the
larger coefficient of variation has the greater relative dispersion.
For example, Mark teaches two sections of statistics. He gives each section a different test covering the
same material. The mean score on the test for the day section is 27, with a standard deviation of 3.4. The
mean score for the night section is 94, with a standard deviation of 8.0. Which section has the greater
variation or dispersion of scores?
Direct comparison of the two standard deviations suggests that the night section has the greater variation.
But comparing the coefficients of variation shows quite a different result:
$$C.V.(\text{day}) = \frac{3.4}{27} \times 100 = 12.6\% \quad \text{and} \quad C.V.(\text{night}) = \frac{8}{94} \times 100 = 8.5\%$$
Thus, based on the size of the coefficient of variation, Mark finds that the night section test results have a
smaller variation relative to their mean than do the day section test results.
Example 7.9. One distribution has x̄ = 140 and σ = 28.28, and another has x̄ = 150 and σ = 24. Which of the
two has the greater dispersion?
$$C.V_1 = \frac{28.28}{140} \times 100 = 20.2\%$$
$$C.V_2 = \frac{24}{150} \times 100 = 16\%$$
The first distribution has a higher dispersion.
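The comparison in Example 7.9 can be written as a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the check on the sign of the mean reflects the condition stated above that the means must be positive.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage; requires a positive mean."""
    if mean <= 0:
        raise ValueError("the coefficient of variation needs a positive mean")
    return sd / mean * 100


cv1 = coefficient_of_variation(28.28, 140)
cv2 = coefficient_of_variation(24, 150)

# The distribution with the larger C.V has the greater relative dispersion
more_dispersed = "first" if cv1 > cv2 else "second"
print(round(cv1, 1), round(cv2, 1), more_dispersed)
```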
Question 7.1. If the mean of the values 14, y, 17, 16, and y is y + 0.4, find the value y.
Question 7.2. Find the (a). mean, (b). median, (c). 3rd Quartile, (d). 8th Decile, (e). 85th Percentile, and
(f). mode of the data below.
11 12 3 14 7 18 9 23 5 18 4 10
(a) Add 10 to each value of the given data and repeat (a). to (f).
(b) Subtract 5 from each of the given values and repeat (a). to (f).
Question 7.3. For 3 sample data sets n1 = 10, x̄1 = 15.4, n2 = 15, x̄2 = 6.2 and n3 = 12, x̄3 = 3.8. Find
the combined mean.
Question 7.4. Find the variance and standard deviation of the following values
15 23 19 16 18 23 14 12
(a) Complete the column for the mid-interval values and the other columns for the table below.
(b) Calculate an estimate of the mean mark and the standard deviation.
(c) Explain why the mean and standard deviation are estimates and not the exact value.
Question 7.5. The data in the table below represents the weight (in kg) of 50 sacks of potatoes leaving a
farm shop.
10.4 11.2 9.3 11.3 10.0 9.9 8.7 9.2 10.6 10.7
10.0 10.5 9.6 10.8 11.3 10.2 9.4 11.6 8.8 10.6
9.3 8.5 10.3 8.9 11.0 10.6 10.9 9.6 10.1 12.8
11.3 10.4 10.0 9.7 10.2 10.0 9.5 10.3 10.6 10.0
9.6 8.2 11.5 9.5 10.6 8.1 9.9 10.4 9.7 10.2
(a) Organize the data into a grouped frequency distribution starting with class
(b) Estimate the sample mean from the grouped frequency table.
(d) Find the modal class and the mode of the data.
8 Statistical Moments
Moments can be defined as the arithmetic mean of various powers of deviations taken from the mean of a
distribution. For any frequency distribution, the rth moment about any point A is defined as the arithmetic
mean of rth powers of deviations from the point A. Thus, by using moments, we can measure the central
tendency of a series, dispersion or variability, skewness and the peakedness of the curve. The mean and the
variance provide information on the location and variability (spread, dispersion) of a set of numbers, and
by doing so, provide some information on the appearance of the distribution (for example, as shown by the
histogram) of the numbers. The mean and variance are the first two statistical moments, and the third and
fourth moments also provide information on the shape of the distribution.
8.2.1 Moments about mean (or Central Moments) for ungrouped data
Let $x_1, x_2, \ldots, x_n$ be the n values of the variable X. Then the r-th moment about the mean (arithmetic mean)
$\bar{x}$ is denoted by $m_r$ and is defined by
$$m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r, \quad r = 0, 1, 2, \ldots$$
When r = 0,
$$m_0 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^0 = 1$$
The first central moment equals zero:
$$m_1 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
The second central moment is the variance, which gives information on the spread or scale of the distribution
of numbers:
$$m_2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
The third moment is used to define the skewness of a distribution:
$$m_3 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3$$
Example 8.2. Find the first four moments about the mean for the set of numbers 2, 3, 4, 5, and 6.
Solution. The first four moments about the mean are given by
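The definition $m_r = \frac{1}{n}\sum (x_i - \bar{x})^r$ can be applied directly to the numbers in Example 8.2. The Python sketch below is one way to do this; the function name `central_moment` is hypothetical.

```python
def central_moment(values, r):
    """r-th moment about the mean, with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


data = [2, 3, 4, 5, 6]
moments = [central_moment(data, r) for r in range(1, 5)]
print(moments)  # m1, m2, m3, m4
```

For this data the mean is 4 and the deviations are −2, −1, 0, 1, 2, so the odd moments vanish by symmetry, $m_2 = 10/5 = 2$ and $m_4 = 34/5 = 6.8$.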
Question 8.1. The first four moments of the distribution about the mean of a variable are 2, 20, 40 and 50.
Find the central moment.
8.2.2 Moments about mean (or Central Moments) for discrete frequency distribution
For a discrete frequency distribution in which the value $x_i$ occurs with frequency $f_i$, the r-th moment $m_r$ about the mean $\bar{x}$ is defined by
$$m_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r, \quad r = 0, 1, 2, \ldots$$
where $N = \sum_{i=1}^{n} f_i$.
In the case of a frequency distribution, the first four moments will be:
First moment: $m_1 = \frac{1}{N}\sum f_i (x_i - \bar{x})$
Second moment: $m_2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2$
Third moment: $m_3 = \frac{1}{N}\sum f_i (x_i - \bar{x})^3$
Fourth moment: $m_4 = \frac{1}{N}\sum f_i (x_i - \bar{x})^4$
If the mean is a fractional value, then it becomes a difficult task to work out the moments. In such cases,
we can calculate moments about a working origin and then change it into moments about the actual mean.
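The r-th moment about the mean (or central moment) is
$$m_r = \frac{\sum f (x - \bar{x})^r}{\sum f}, \quad r = 1, 2, 3, \ldots$$
When r = 1, writing $n = \sum f$,
$$m_1 = \frac{\sum f (x - \bar{x})}{n} = \frac{\sum f x}{n} - \frac{\bar{x}\sum f}{n} = \bar{x} - \bar{x}\,\frac{n}{n} = \bar{x} - \bar{x} = 0,$$
i.e., $m_1 = 0$.
Next, when r = 2,
$$m_2 = \frac{\sum f (x - \bar{x})^2}{n} = s^2 \quad \text{(variance)}$$
The third, fourth, etc., moments can be obtained from the formulae for $m_r$ and $m'_r$.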
Example 8.3. Calculate the first four central moments from the following grouped frequency distribution
table.
Class: 0 − 9 10 − 19 20 − 29 30 − 39 40 − 49
Frequency: 1 3 5 7 4
Solution. Let A = 24.5 and h = 10. To facilitate the calculation, let
$$u = \frac{x - A}{h} = \frac{x - 24.5}{10}$$
Class      Frequency   Mid-point   u     fu    fu²    fu³    fu⁴
0 − 9      1           4.5         −2
10 − 19    3           14.5        −1
20 − 29    5           24.5        0
30 − 39    7           34.5        1
40 − 49    4           44.5        2
Total      Σf = 20                       Σfu   Σfu²   Σfu³   Σfu⁴
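The blank columns of the table can be filled in and converted into central moments with a short Python sketch. It is an illustration under stated assumptions rather than the worked solution from the notes: it computes the moments of u about zero, converts them to central moments with the standard relations between raw and central moments, and rescales by powers of h.

```python
# Midpoints and frequencies from Example 8.3
midpoints = [4.5, 14.5, 24.5, 34.5, 44.5]
freqs = [1, 3, 5, 7, 4]
A, h = 24.5, 10  # working origin and class width

u = [(x - A) / h for x in midpoints]
N = sum(freqs)

# Moments of u about zero: sum(f u^r) / N for r = 1..4
m1p, m2p, m3p, m4p = (
    sum(f * ui ** r for f, ui in zip(freqs, u)) / N for r in range(1, 5)
)

# Central moments of x: convert, then multiply by h^r to undo the scaling
m1 = 0.0
m2 = h ** 2 * (m2p - m1p ** 2)
m3 = h ** 3 * (m3p - 3 * m2p * m1p + 2 * m1p ** 3)
m4 = h ** 4 * (m4p - 4 * m3p * m1p + 6 * m2p * m1p ** 2 - 3 * m1p ** 4)

mean = A + h * m1p
print(mean, m1, m2, m3, m4)
```

Working in u keeps every intermediate number small; the same central moments result from applying the definition directly to the midpoints with mean 29.5.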
8.4 Relations between central moments and raw moments (upto 4-th order)
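The relationship between these moments is given by
$$m_1 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
$$\begin{aligned}
m_2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2\right) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x}\,\frac{1}{n}\sum_{i=1}^{n} x_i + \bar{x}^2 \\
&= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \\
&= m'_2 - (m'_1)^2
\end{aligned}$$
$$\begin{aligned}
m_3 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i^3 - 3\bar{x}x_i^2 + 3\bar{x}^2 x_i - \bar{x}^3\right) \\
&= \frac{1}{n}\sum_{i=1}^{n} x_i^3 - 3\bar{x}\,\frac{1}{n}\sum_{i=1}^{n} x_i^2 + 3\bar{x}^2\,\frac{1}{n}\sum_{i=1}^{n} x_i - \bar{x}^3 \\
&= m'_3 - 3m'_2 m'_1 + 2(m'_1)^3
\end{aligned}$$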
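Moments about origin ($m'_r$)                    Moments about mean ($m_r$)
$m'_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$          $m_1 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
$m'_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$          $m_2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = m'_2 - (m'_1)^2$
$m'_3 = \frac{1}{n}\sum_{i=1}^{n} x_i^3$          $m_3 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3$
$m'_4 = \frac{1}{n}\sum_{i=1}^{n} x_i^4$          $m_4 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4$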
Question 8.3. Show that $m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4$.
In general, the relationship between these moments is given by
$$m_r = m'_r - {}^{r}C_1\, m'_1 m'_{r-1} + {}^{r}C_2\, (m'_1)^2 m'_{r-2} + \cdots + (-1)^r (m'_1)^r.$$
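The general relation can be checked numerically. The Python sketch below compares central moments computed from raw moments through the binomial expansion with central moments computed directly from the definition. The data set 2, 3, 4, 5, 6 is reused from Example 8.2 purely for illustration, and the function names are hypothetical.

```python
from math import comb


def raw_moment(values, r):
    """r-th moment about the origin, m'_r (with m'_0 = 1)."""
    return sum(x ** r for x in values) / len(values)


def central_from_raw(values, r):
    """m_r = sum over k of (-1)^k * C(r, k) * (m'_1)^k * m'_{r-k}."""
    m1 = raw_moment(values, 1)
    return sum(
        (-1) ** k * comb(r, k) * m1 ** k * raw_moment(values, r - k)
        for k in range(r + 1)
    )


def central_direct(values, r):
    """r-th moment about the mean, straight from the definition."""
    mean = raw_moment(values, 1)
    return sum((x - mean) ** r for x in values) / len(values)


data = [2, 3, 4, 5, 6]
for r in range(1, 5):
    print(r, central_from_raw(data, r), central_direct(data, r))
```

The last two terms of the expansion (k = r − 1 and k = r) both involve $(m'_1)^r$, which is why the tabulated formulas end in $2(m'_1)^3$ and $-3(m'_1)^4$.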
2. What do you mean by dispersion? What are the different measures of dispersion?
3. Why is the standard deviation the most widely used measure of dispersion? Explain.
6. For a distribution, the first four moments about zero are 1, 7, 38 and 155 respectively. (i) Compute the
moment coefficients of skewness and kurtosis. (ii) Is the distribution mesokurtic? Give reason.
7. The first four moments of a distribution about the value 4 are 1,4, 10 and 45. Obtain various char-
acteristics of the distribution on the basis of the information given. Comment upon the nature of the
distribution.
8. Define kurtosis. If β1 = 1 and β2 = 4 and variance = 9, find the values of β3 and β4 and comment
upon the nature of the distribution.
In both these distributions the value of mean and standard deviation is the same (X̄ = 15, σ = 5). But it
does not imply that the distributions are alike in nature. The distribution on the left-hand side is symmetrical
one whereas the distribution on the right-hand side is asymmetrical or skewed. Measures of skewness help
us to distinguish between different types of distributions.
Measures of skewness tell us the direction and the extent of skewness. In symmetrical distribution the mean,
median and mode are identical. The more the mean moves away from the mode, the larger the asymmetry
or skewness. A distribution is said to be "skewed" when the mean and the median fall at different points in
the distribution, and the balance (or centre of gravity) is shifted to one side or the other, to the left or to the right.
The above definitions show that the term ‘skewness’ refers to “lack of symmetry” i.e., when a distribu-
tion is not symmetrical (or is asymmetrical) it is called a skewed distribution.
Lack of symmetry is called Skewness. If a distribution is not symmetrical then it is called skewed
distribution. So, mean, median and mode are different in values and one tail becomes longer than other. The
skewness may be positive or negative.
The concept of skewness will be clear from the following three diagrams showing a symmetrical distribution,
a positively skewed distribution and a negatively skewed distribution.
1. Symmetrical Distribution. In a symmetrical distribution the values of mean, median and mode
coincide. The spread of the frequencies is the same on both sides of the centre point of the curve.
3. Positively Skewed Distribution. In the positively skewed distribution the value of the mean is the largest
and that of the mode the smallest; the median lies in between the two. In the positively skewed distribution
the frequencies are spread out over a greater range of values on the high-value end of the curve (the
right-hand side) than they are on the low-value end.
If the frequency curve has longer tail to right the distribution is known as positively skewed distribution
and Mean > Median > Mode.
4. Negatively Skewed Distribution. In a negatively skewed distribution the value of the mode is the largest
and that of the mean the smallest; the median lies in between the two. In the negatively skewed distribution the
position is reversed, i.e. the excess tail is on the left-hand side.
If the frequency curve has longer tail to left the distribution is known as negatively skewed distribution
and Mean < Median < Mode.
It should be noted that in moderately symmetrical distributions the interval between the mean and the median
is approximately one-third of the interval between the mean and the mode. It is this relationship, which
provides a means of measuring the degree of skewness.
2. When the data are plotted on a graph they do not give the normal bell-shaped form, i.e. when cut along
a vertical line through the centre the two halves are not equal.
3. The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
5. Frequencies are not equally distributed at points of equal deviation from the mode.
The direction of skewness is determined by ascertaining whether the mean is greater than the mode or
less than the mode. If it is greater than the mode, then skewness is positive. But when the mean is less than
the mode, it is negative. The difference between the mean and mode indicates the extent of departure from
symmetry. It is measured in standard deviation units, which provide a measure independent of the unit of
measurement. It may be recalled that this observation was made in the preceding chapter while discussing
standard deviation. The value of coefficient of skewness is zero, when the distribution is symmetrical. Nor-
mally, this coefficient of skewness lies between ±1. If the mean is greater than the mode, then the coefficient
of skewness will be positive, otherwise negative.
Example 9.1. Given the following data, calculate Karl Pearson's coefficient of skewness.
$$\sum X = 452, \quad \sum X^2 = 24{,}270, \quad \text{Mode} = 43.7, \quad \text{and } n = 10.$$
Applying the values of mean, mode and standard deviation in the above formula,
$$SK_p = \frac{45.2 - 43.7}{19.59} = 0.08$$
This shows that there is a positive skewness though the extent of skewness is marginal.
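The intermediate values 45.2 and 19.59 used in Example 9.1 come from the summary totals. A short Python sketch of the full calculation is given below, assuming the coefficient is (mean − mode) / standard deviation as applied in the example and that the standard deviation uses divisor n.

```python
from math import sqrt

# Summary data from Example 9.1
n, sum_x, sum_x2, mode = 10, 452, 24270, 43.7

mean = sum_x / n                       # 45.2
sd = sqrt(sum_x2 / n - mean ** 2)      # about 19.59
sk_p = (mean - mode) / sd              # Karl Pearson's coefficient

print(round(mean, 2), round(sd, 2), round(sk_p, 2))
```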
Question 9.1. From the following data, calculate the measure of skewness using the mean, median and
standard deviation:
X 10 − 19 20 − 29 30 − 39 40 − 49 50 − 59 60 − 69 70 − 79
f 18 30 40 55 38 20 16
Now, we have the values of both the upper and the lower quartiles.
$$\text{Coefficient of Quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{26.4 - 16.4}{26.4 + 16.4} = \frac{10}{42.8} \approx 0.234.$$
A frequency distribution is said to be positively or right skewed if the longer tail is towards the positive x
direction, so that the peak of the frequency curve lies to the left of the center. If on the other hand, the peak
of the curve lies to the right of the center so that the longer tail of the frequency curve is towards the left or
negative x direction the distribution is said to be negatively or left skewed.
$$SK_p = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
The quartile-based (Bowley's) coefficient of skewness is
$$\frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
Note: For a Normal distribution (perfectly symmetrical bell shaped curve) the measure of Skewness a3 is
zero. Any symmetric data should have a skewness near zero. Negative values for the skewness indicate data
that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left,
we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is
long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in
reliability studies, failure times cannot be negative.
9.7 Kurtosis
Kurtosis is another measure of the shape of a frequency curve. It is a Greek word, which means bulginess.
While skewness signifies the extent of asymmetry, kurtosis measures the degree of peakedness of a fre-
quency distribution. Karl Pearson classified curves into three types on the basis of the shape of their peaks.
These are mesokurtic, leptokurtic and platykurtic. These three types of curves are shown in figure below:
It will be seen from Fig. 3.2 that the mesokurtic curve is neither too flat nor too peaked.
In fact, this is the frequency curve of a normal distribution. A leptokurtic curve is more peaked than the
normal curve. In contrast, a platykurtic curve is relatively flat. The coefficient of kurtosis as given by Karl
Pearson is $\beta_2 = \mu_4 / \mu_2^2$. In the case of a normal distribution, that is, a mesokurtic curve, the value of $\beta_2$ is 3.
If $\beta_2 > 3$, the curve is called a leptokurtic curve and is more peaked than the normal curve. Again, when
$\beta_2 < 3$, the curve is called a platykurtic curve and is less peaked than the normal curve. The measure of
kurtosis is very helpful in the selection of an appropriate average. For example, for a normal distribution, the mean is
most appropriate; for a leptokurtic distribution, the median is most appropriate; and for a platykurtic distribution,
the quartile range is most appropriate.
tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. This is a
measure of “peakedness” of a distribution.
a4 < 3 platykurtic
a4 > 3 leptokurtic
a4 ≈ 3 mesokurtic (medium peak)
The kurtosis for a standard normal distribution is three. For this reason, some sources use the following
definition of kurtosis (often referred to as “excess kurtosis”):
$$\text{Kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{y})^4}{(N - 1)s^4} - 3$$
This definition is used so that the standard normal distribution has a kurtosis of zero. In addition, with the
second definition positive kurtosis indicates a “peaked” distribution and negative kurtosis indicates a “flat”
distribution.
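A small Python sketch of the moment-based shape measures follows. It assumes the usual definitions $a_3 = m_3 / m_2^{3/2}$ and $a_4 = m_4 / m_2^2$ built from central moments with divisor n, which differs slightly from the (N − 1) form in the formula above. The tolerance used to call a value "approximately 3" and the sample data are illustrative choices only.

```python
def central_moment(values, r):
    """r-th moment about the mean, with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def shape_coefficients(values):
    """Moment coefficients of skewness (a3) and kurtosis (a4)."""
    m2 = central_moment(values, 2)
    a3 = central_moment(values, 3) / m2 ** 1.5
    a4 = central_moment(values, 4) / m2 ** 2
    return a3, a4


def classify_kurtosis(a4, tol=0.1):
    """Label a4 against the normal value 3; tol is an arbitrary illustration."""
    if abs(a4 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if a4 > 3 else "platykurtic"


sample = [2, 3, 4, 5, 6]  # illustrative data
a3, a4 = shape_coefficients(sample)
excess = a4 - 3           # excess kurtosis
label = classify_kurtosis(a4)
print(a3, a4, excess, label)
```

For this symmetric sample $a_3 = 0$ and $a_4 = 6.8 / 4 = 1.7$, so the excess kurtosis is negative and the data are classed as platykurtic.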
10 Correlations Analysis
Objective: The overall objective of this lesson is to give you an understanding of bivariate linear correlation,
thereby enabling you to understand the importance as well as the limitations of correlation analysis.
10.1 Introduction
Statistical methods of measures of central tendency, dispersion, skewness and kurtosis are helpful for the
purpose of comparison and analysis of distributions involving only one variable i.e. univariate distributions.
However, describing the relationship between two or more variables, is another important part of statistics.
There are situations where data appears as pairs of figures relating to two variables. A correlation prob-
lem considers the joint variation of two measurements neither of which is restricted by the experimenter.
The regression problem discussed in this lecture considers the frequency distribution of one variable (called
the dependent variable) when another (independent variable) is held fixed at each of several levels.
Examples of correlation problems are found in the study of the relationship between IQ and aggregate per-
centage of marks obtained by a person in the SSC examination, blood pressure and metabolism or the relation
between height and weight of individuals. In these examples both variables are observed as they naturally
occur, since neither variable is fixed at predetermined levels.
Examples of regression problems can be found in the study of the yields of crops grown with different
amount of fertilizer, the length of life of certain animals exposed to different levels of radiation, and so on.
In these problems the variation in one measurement is studied for particular levels of the other variable se-
lected by the experimenter.
In many research situations, the key to decision making lies in understanding the relationships between
two or more variables. For example, in an effort to predict the behavior of the bond market, a broker might
find it useful to know whether the interest rate of bonds is related to the prime interest rate. While studying
the effect of advertising on sales, an account executive may find it useful to know whether there is a strong
relationship between advertising costs and sales volumes for a company.
The statistical methods of correlation (discussed in the present lesson) and regression (to be discussed
in the next lesson) are helpful in understanding the relationship between two or more variables which may be
related in some way, like interest rate of bonds and prime interest rate; advertising expenditure and sales;
income and consumption; crop yield and fertilizer used; height and weight; profits and sales; hours of study
and student marks; and so on.
In all these cases involving two or more variables, we may be interested in seeing:
• if so, what form the relationship between the two variables takes;
• how we can make use of that relationship for predictive purposes, that is, forecasting; and
When we collect data on two of such characteristics it is called bivariate data. It is generally denoted by
(X, Y ) where X and Y are the variables representing the values of the characteristics. Therefore, correlation
analysis gives the idea about the nature and extent of relationship between two variables in the bivariate data.
“The correlation between variables is a measure of the nature and degree of association between
the variables”.
As a measure of the degree of relatedness of two variables, correlation is widely used in exploratory research
when the objective is to locate variables that might be related in some way to the variable of interest.
Correlation measures the degree of linear relation between the variables. The existence of correlation
between variables does not necessarily mean that one is the cause of the change in the other. It should be noted
that correlation analysis merely helps in determining the degree of association between two variables,
but it does not tell us anything about the cause and effect relationship. While interpreting the correlation
coefficient, it is necessary to see whether there is any cause and effect relationship between the variables under
study. If there is no such relationship, the observed correlation is meaningless.
11 Types of correlation
Correlation can be classified in several ways. The important ways of classifying correlation are:
Positive and Negative Correlation: If both the variables move in the same direction, we say that there is a
positive correlation, i.e., if one variable increases, the other variable also increases on an average or if one
variable decreases, the other variable also decreases on an average. On the other hand, if the variables are
varying in opposite direction, we say that it is a case of negative correlation; e.g., movements of demand and
supply.
Linear and Non-linear (Curvilinear) Correlation: If the change in one variable is accompanied by change
in another variable in a constant ratio, it is a case of linear correlation. Observe the following data:
X : 10 20 30 40 50
Y : 25 50 75 100 125
The ratio of change in the above example is the same. It is, thus, a case of linear correlation. If we plot these
variables on graph paper, all the points will fall on the same straight line.
On the other hand, if the amount of change in one variable does not follow a constant ratio with the change
in another variable, it is a case of non-linear or curvilinear correlation. If a couple of figures in either series
X or series Y are changed, it would give a non-linear correlation.
Simple, Partial and Multiple Correlation: The distinction amongst these three types of correlation de-
pends upon the number of variables involved in a study. If only two variables are involved in a study, then the
correlation is said to be simple correlation. When three or more variables are involved in a study, then it is a
problem of either partial or multiple correlation. In multiple correlation, three or more variables are studied
simultaneously. But in partial correlation we consider only two variables influencing each other while the
effect of other variable(s) is held constant.
Suppose we have a problem comprising three variables X, Y and Z. X is the number of hours studied,
Y is I.Q. and Z is the number of marks obtained in the examination. In a multiple correlation, we will study
the relationship between the marks obtained (Z) and the two variables, number of hours studied (X) and
I.Q. (Y ). In contrast, when we study the relationship between X and Z, keeping an average I.Q. (Y ) as
constant, it is said to be a study involving partial correlation.
The correlation coefficient is merely a mathematical relationship and has nothing to do with a cause and effect
relation. It only reveals co-variation between two variables. When there is no cause-and-effect
relationship in a bivariate series but one interprets the relationship as causal, such a correlation is called spurious
or nonsense correlation.
The commonly used methods for studying linear relationship between two variables involve both graphic
and algebraic methods. Some of the widely used methods include:
1. Scatter Diagram
2. Correlation Graph
A scatter diagram gives two very useful types of information. First, we can observe patterns between vari-
ables that indicate whether the variables are related. Secondly, if the variables are related we can get an idea
of what kind of relationship (linear or non-linear) would describe the relationship.
Correlation examines the first question of determining whether an association exists between the two vari-
ables, and if it does, to what extent. Regression examines the second question of establishing an appropriate
relation between the variables.
• if the plotted points are very close to each other, it indicates high degree of correlation. If the plotted
points are away from each other, it indicates low degree of correlation.
• if the points on the diagram reveal any trend (either upward or downward), the variables are said to be
correlated and if no trend is revealed, the variables are uncorrelated.
• if there is an upward trend rising from lower left hand corner and going upward to the upper right
hand corner, the correlation is positive since this reveals that the values of the two variables move in
the same direction. If, on the other hand, the points depict a downward trend from the upper left hand
corner to the lower right hand corner, the correlation is negative since in this case the values of the
two variables move in the opposite directions.
• in particular, if all the points lie on a straight line starting from the left bottom and going up towards
the right top, the correlation is perfect and positive, and if all the points lie on a straight line starting
from left top and coming down to right bottom, the correlation is perfect and negative.
Example 12.1. Given the following data on sales (in thousand units) and expenses (in thousand shillings)
of a firm for 10 months:
Month:     J   F   M   A   M   J   J   A   S   O
Sales:     50  50  55  60  62  65  68  60  60  50
Expenses:  11  13  14  16  16  15  15  14  13  13
(b) Do you think that there is a correlation between sales and expenses of the firm? Is it positive or
negative? Is it high or low?
Karl Pearson’s measure, known as the Pearsonian correlation coefficient between two variables X and Y, usually denoted by r(X, Y), rxy or simply r, is a numerical measure of the linear relationship between them. It is defined as the ratio of the covariance between X and Y to the product of the standard deviations of X and Y.
Mathematically,
\[ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_x \sigma_y} \tag{2} \]
where (X1, Y1), (X2, Y2), ···, (Xn, Yn) are n pairs of observations of the variables X and Y in a bivariate distribution, Cov(X, Y) is the covariance between X and Y, σx is the standard deviation of X and σy is the standard deviation of Y.
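Also
\[ \operatorname{Cov}(X, Y) = \frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y}) = \frac{1}{n}\sum XY - \bar{X}\bar{Y} \tag{3} \]
\[ \sigma_x = \sqrt{\frac{1}{n}\sum (X - \bar{X})^2} = \sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} \tag{4} \]
\[ \sigma_y = \sqrt{\frac{1}{n}\sum (Y - \bar{Y})^2} = \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2} \tag{5} \]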
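Thus by substituting Eqs. (3) to (5) in Eq. (2), we can write the Pearsonian correlation coefficient as
\[
\begin{aligned}
r_{xy} &= \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\frac{\sum (X - \bar{X})^2}{n}}\,\sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}} \\
&= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} \\
&= \frac{\frac{\sum XY}{n} - \bar{X}\bar{Y}}{\sqrt{\left(\frac{\sum X^2}{n} - \bar{X}^2\right)\left(\frac{\sum Y^2}{n} - \bar{Y}^2\right)}}
\end{aligned}
\]
or
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]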
Pearson’s correlation coefficient is also called the product moment correlation coefficient.
1. The Pearsonian correlation coefficient cannot exceed 1 numerically. In other words, it lies between −1
and +1:
−1 ≤ r ≤ +1
Remarks:
(a) This property provides us a check on our calculations. If in any problem, the obtained value of
r lies outside the limits ±1, this implies that there is some mistake in our calculations.
(b) The sign of r indicates the nature of the correlation. A positive value of r indicates positive correlation, whereas a negative value indicates negative correlation. r = 0 indicates the absence of linear correlation.
2. The correlation coefficient is independent of change of origin and scale. Mathematically, if the given variables X and Y are transformed to new variables U and V by a change of origin and scale, i.e.
\[ U = \frac{X - A}{h} \quad\text{and}\quad V = \frac{Y - B}{k} \]
where A, B, h and k are constants and h > 0, k > 0, then the correlation coefficient between X and
Y is the same as the correlation coefficient between U and V, i.e.
\[ r_{xy} = r_{uv} \]
3. Two independent variables are uncorrelated but the converse is not true. If X and Y are independent
variables then
rxy = 0
However, the converse of the theorem is not true i.e., uncorrelated variables need not necessarily be
independent.
4. The Pearsonian coefficient of correlation is the geometric mean of the two regression coefficients, i.e.
\[ r_{xy} = \pm\sqrt{b_{xy} \cdot b_{yx}} \]
The signs of both the regression coefficients are the same, and so the value of r will also have the same
sign.
5. The square of Pearsonian correlation coefficient is known as the coefficient of determination. Coef-
ficient of determination, which measures the percentage variation in the dependent variable that is
accounted for by the independent variable, is a much better and useful measure for interpreting the
value of r.
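Properties 2, 4 and 5 can be checked numerically. The sketch below is only an illustration: it reuses the sales and expenses figures of Example 12.1, the constants A = 50, h = 5, B = 10, k = 2 are arbitrary choices, and the helper names `pearson_r` and `slope` are hypothetical. It also assumes the usual definition of a regression coefficient as covariance divided by the variance of the explanatory variable.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r as covariance over the product of standard deviations (Eq. 2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def slope(xs, ys):
    """Regression coefficient of ys on xs, assumed to be Cov(x, y) / Var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    return cov / var_x

sales = [50, 50, 55, 60, 62, 65, 68, 60, 60, 50]
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]
r = pearson_r(sales, expenses)

# Property 2: a change of origin and scale leaves r unchanged.
u = [(x - 50) / 5 for x in sales]
v = [(y - 10) / 2 for y in expenses]
r_uv = pearson_r(u, v)

# Properties 4 and 5: the product of the two regression coefficients equals r squared.
b_yx = slope(sales, expenses)   # expenses regressed on sales
b_xy = slope(expenses, sales)   # sales regressed on expenses
print(round(r, 4), round(r_uv, 4), round(b_yx * b_xy, 4), round(r * r, 4))
```

The first two printed values should agree, as should the last two.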
The Pearsonian correlation coefficient between the ranks X and Y is called the rank correlation coefficient
between the characteristics A and B for the group of individuals.
Spearman’s rank correlation coefficient, usually denoted by rs, R or ρ (rho), is given by the equation
\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \tag{6} \]
where d is the difference between the pair of ranks of the same individual in the two characteristics and n
is the number of pairs.
Remarks: The value of rs always lies between −1 and +1. The positive value of rs indicates positive
correlation (association) in the rank allocation. Whereas, the negative value of rs indicates the negative
correlation (association) in the rank allocation.
The calculation of rank correlation will be illustrated under three situations.
2. The ranks are not given. They have to be worked out from the data.
Example 12.3. The data below give the ranks assigned by two judges to 8 participants. Calculate the
coefficient of rank correlation.
Participant   Rank by Judge A   Rank by Judge B   Rank difference squared, d²
1             5                 4                 (5 − 4)² = 1
2             6                 8                 4
3             7                 1                 36
4             1                 7                 36
5             8                 5                 9
6             2                 6                 16
7             3                 2                 1
8             4                 3                 1
n = 8                                             Σd² = 104
Solution. Spearman’s rank correlation coefficient is given by
\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 104}{8(8^2 - 1)} = 1 - \frac{624}{504} \approx -0.24 \]
The value of the correlation coefficient is about −0.24. This indicates that there is a negative association in the rank allocation by the two judges A and B.
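The calculation in Example 12.3 can be sketched in a few lines of Python. The function name `spearman_from_ranks` is a hypothetical helper, and the sketch assumes the ranks contain no ties, so that formula (6) applies directly.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's r_s from two lists of untied ranks, using formula (6)."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks given by judges A and B to the 8 participants of Example 12.3
judge_a = [5, 6, 7, 1, 8, 2, 3, 4]
judge_b = [4, 8, 1, 7, 5, 6, 2, 3]

sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_a, judge_b))
r_s = spearman_from_ranks(judge_a, judge_b)
print(sum_d2, round(r_s, 3))  # 104 -0.238
```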
Example 12.4. Five persons are assessed by three judges in a beauty contest. We have to find out which
pair of judges has the nearest approach to common perception of beauty.
Competitors
Judge 1 2 3 4 5
A 1 2 3 4 5
B 2 4 1 5 3
C 1 3 5 2 4
There are 3 pairs of judges necessitating calculation of rank correlation thrice. Formula (6) will be used.
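As a rough sketch of the comparison in Example 12.4, the three pairwise coefficients can be computed in a loop. The dictionary layout and the names `ranks`, `results` and `closest` are illustrative assumptions; formula (6) is used because no ranks are tied.

```python
from itertools import combinations

# Ranks given by each judge to competitors 1 to 5 (Example 12.4)
ranks = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 1, 5, 3],
    "C": [1, 3, 5, 2, 4],
}

def spearman_from_ranks(r1, r2):
    """Formula (6) for untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# One coefficient per pair of judges: (A, B), (A, C), (B, C)
results = {
    pair: spearman_from_ranks(ranks[pair[0]], ranks[pair[1]])
    for pair in combinations(ranks, 2)
}
closest = max(results, key=results.get)  # pair with the largest r_s

for pair, value in results.items():
    print(pair, round(value, 2))
print("closest pair:", closest)
```

The pair with the largest coefficient is the one whose rankings agree most closely.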
Spearman’s rank correlation can also be used even if we are dealing with variables, which are measured
quantitatively, i.e. when the actual data but not the ranks relating to two variables are given. In such a case
we shall have to convert the data into ranks. The highest (or the smallest) observation is given the rank 1.
The next highest (or the next lowest) observation is given rank 2 and so on. It is immaterial in which way
(descending or ascending) the ranks are assigned. However, the same approach should be followed for all
the variables under consideration.
Example 12.5. We are given the percentage of marks, secured by 5 students in Economics and Statistics.
Then the ranking has to be worked out and the rank correlation is to be calculated.
Student   Marks in Statistics (X)   Marks in Economics (Y)
A 85 60
B 60 48
C 55 49
D 65 50
E 75 55
Once the ranking is complete formula (6) is used to calculate rank correlation.
Example 12.6. Calculate the rank coefficient of correlation from the following data:
X : 75 88 95 70 60 80 81 50
Y : 120 134 150 115 110 140 142 100
Solution. The table below gives the calculations for coefficient of rank correlation.
X     Rank RX   Y     Rank RY   d = RX − RY   d²
75    5         120   5         0             0
88    2         134   4         −2            4
95    1         150   1         0             0
70    6         115   6         0             0
60    7         110   7         0             0
80    4         140   3         1             1
81    3         142   2         1             1
50    8         100   8         0             0
                                              Σd² = 6
\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 6}{8(8^2 - 1)} = 1 - \frac{36}{504} = 0.93 \]
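When only the raw values are given, the ranking step can be automated. The following is a minimal sketch for Example 12.6, assuming no repeated values and ranking in descending order (rank 1 for the largest value); `descending_ranks` is a hypothetical helper name.

```python
def descending_ranks(values):
    """Rank 1 for the largest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

x = [75, 88, 95, 70, 60, 80, 81, 50]
y = [120, 134, 150, 115, 110, 140, 142, 100]

rx, ry = descending_ranks(x), descending_ranks(y)
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rx, ry, sum_d2, round(r_s, 2))
```

Ranking both series in ascending order instead would give the same coefficient, since every rank difference only changes sign.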
Question 12.2. The data below give the marks awarded by two examiners to a set of 10 students in an aptitude
test. Calculate Spearman’s rank correlation coefficient.
A 85 56 45 65 96 52 80 75 78 60
B 80 60 50 62 90 55 75 68 77 53
Case 3: When the ranks are repeated
If two or more data have the same value, then they are said to be “tied”, and each of their ranks may be
set equal to the mean of the ranks of the positions they occupy in the ordered data set.
For attributes, a tie occurs when two or more individuals are placed together in a classification with respect to an attribute; for variable data, a tie occurs when more than one item has the same value in either or both of the series. In such cases Spearman’s formula for calculating the rank correlation coefficient breaks down, since the variables X [the ranks of individuals in characteristic A (1st series)] and
Y [the ranks of individuals in characteristic B (2nd series)] do not take the values from 1 to n.
In this case common ranks are assigned to the repeated items. These common ranks are the arithmetic
mean of the ranks, which these items would have got if they were different from each other and the next
item will get the rank next to the rank used in computing the common rank. For example, suppose an item
is repeated at rank 4. Then the common rank to be assigned to each item is (4 + 5)/2, i.e., 4.5 which is the
average of 4 and 5, the ranks which these observations would have assumed if they were different. The next
item will be assigned the rank 6. If an item is repeated thrice at rank 7, then the common rank to be assigned
to each value will be (7 + 8 + 9)/3, i.e., 8 which is the arithmetic mean of 7, 8 and 9 viz., the ranks these
observations would have got if they were different from each other. The next rank to be assigned will be 10.
If only a small proportion of the ranks are tied, this technique may be applied together with Eq.(6). If a
large proportion of ranks are tied, it is advisable to apply an adjustment or a correction factor to Eq. (6) as
explained below:
Note: In this example the tied ranks are fractional, e.g. 4.5, which is not a meaningful rank in itself. Therefore, in the calculation of ρ we add a correction factor (C.F.) to Σd², calculated as
follows.
Value   Repeated frequency m   m(m² − 1)
35      2                      2(2² − 1) = 6
28      2                      6
26      2                      6
Total                          Σm(m² − 1) = 18
Now,
\[ \text{C.F.} = \frac{\sum m(m^2 - 1)}{12} = \frac{18}{12} = 1.5 \]
Therefore
\[ \sum d^2 + \frac{\sum m(m^2 - 1)}{12} = 117.5 + 1.5 = 119 \]
We use this value in the calculation of ρ. Spearman’s rank correlation coefficient is now given by
\[ r_s = 1 - \frac{6\left[\sum d^2 + \frac{\sum m(m^2 - 1)}{12}\right]}{n(n^2 - 1)} \]
that is,
\[ r_s = 1 - \frac{6\left[\sum d^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{n(n^2 - 1)} \tag{7} \]
where m1, m2, ··· are the numbers of repetitions of ranks and (m1³ − m1)/12, (m2³ − m2)/12, ··· are their corresponding correction
factors.
X has the value 35 at both the 4th and 5th ranks. Hence both are given the average rank, i.e. (4 + 5)/2 = 4.5.
X    Y    Rank of X = R′   Rank of Y = R″   Deviation d = R′ − R″   d²
25   55   6                2                4                       16
45   80   1                1                0                       0
35   30   4.5              8                −3.5                    12.25
40   35   3                7                −4                      16
15   40   8                5                3                       9
19   42   7                4                3                       9
35   36   4.5              6                −1.5                    2.25
42   48   2                3                −1                      1
                                                                    Σd² = 65.5
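The necessary correction is thus
\[ \frac{m^3 - m}{12} = \frac{2^3 - 2}{12} = \frac{1}{2} \]
Using the equation
\[ r_s = 1 - \frac{6\left[\sum d^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{n(n^2 - 1)} \]
with Σd² = 65.5, a single tie with m1 = 2 and n = 8, we obtain
\[ r_s = 1 - \frac{6(65.5 + 0.5)}{8(8^2 - 1)} = 1 - \frac{396}{504} \approx 0.21 \]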
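The tie handling described above can be sketched as follows for the eight (X, Y) pairs tabulated in this example. The helper names `average_ranks` and `tie_correction` are hypothetical, and the sketch assumes descending ranks with a correction term of (m³ − m)/12 for every group of m tied values, as in Eq. (7).

```python
def average_ranks(values):
    """Descending ranks; tied values share the mean of the positions they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first position occupied by v
        count = order.count(v)       # how many times v occurs
        ranks.append(first + (count - 1) / 2)
    return ranks

def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    counts = [values.count(v) for v in set(values)]
    return sum((m ** 3 - m) / 12 for m in counts if m > 1)

x = [25, 45, 35, 40, 15, 19, 35, 42]
y = [55, 80, 30, 35, 40, 42, 36, 48]

rx, ry = average_ranks(x), average_ranks(y)
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
cf = tie_correction(x) + tie_correction(y)
r_s = 1 - 6 * (sum_d2 + cf) / (n * (n ** 2 - 1))
print(rx, sum_d2, cf, round(r_s, 3))
```

Only X contains a tie here (the value 35 occurs twice), so the total correction is 0.5.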
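library(Hmisc)  # rcorr() is provided by the Hmisc package
## No ties
x <- c(1,2,3,4,5,6,7)  # assumed: the definition of x is not shown in the original
y <- c(1,3,6,2,7,4,5)
rcorr(x, y, type = "spearman")
## Ties
x2 <- c(1,2,3,4,5,6,7)
y2 <- c(1,3,6,2,7,4,6)
rcorr(x2, y2, type = "spearman")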
2. Since Spearman’s rank correlation coefficient, rs , is nothing but Karl Pearson’s correlation coefficient,
r, between the ranks, it can be interpreted in the same way as the Karl Pearson’s correlation coefficient.
3. Karl Pearson’s correlation coefficient assumes that the parent population from which the sample observations are drawn is normal. If this assumption is violated then we need a measure which is distribution
free (or non-parametric). Spearman’s ρ is such a distribution free measure, since no strict assumptions
are made about the form of the population from which the sample observations are drawn.
4. Spearman’s formula is easy to understand and apply as compared to Karl Pearson’s formula. The
values obtained by the two formulae, viz Pearsonian r and Spearman’s rho are generally different.
The difference arises due to the fact that when ranking is used instead of full set of observations, there
is always some loss of information. Unless many ties exist, the coefficient of rank correlation should
be only slightly lower than the Pearsonian coefficient.
5. Spearman’s formula is the only formula to be used for finding correlation coefficient if we are dealing
with qualitative characteristics, which cannot be measured quantitatively but can be arranged serially.
It can also be used where actual data are given. In case of extreme observations, Spearman’s formula
is preferred to Pearson’s formula.
6. Spearman’s formula has its limitations also. It is not practicable in the case of bivariate frequency
distribution. For n > 30, this formula should not be used unless the ranks are given.
2. Another mistake that occurs frequently is on account of misinterpretation of the coefficient of corre-
lation. Suppose in one case r = 0.7, it will be wrong to interpret that correlation explains 70 percent
of the total variation in Y . The error can be seen easily when we calculate the coefficient of determi-
nation. Here, the coefficient of determination r2 will be 0.49. This means that only 49 percent of the
total variation in Y is explained. Similarly, the coefficient of determination is misinterpreted if it is
also used to indicate causal relationship, that is, the percentage of the change in one variable is due to
the change in another variable.
3. Another mistake in the interpretation of the coefficient of correlation occurs when one concludes a
positive or negative relationship even though the two variables are actually unrelated. For example,
the age of students and their score in the examination have no relation with each other. The two
variables may show similar movements but there does not seem to be a common link between them.
• Direction of the relationship: If r is positive, y and x are directly related, i.e., when x increases, y
will tend to increase. If r is negative, y and x are inversely related, i.e., when x increases, y will tend
to decrease.
• Strength of the relationship: The larger the absolute value of r, the stronger the linear relationship
between y and x. If r = −1 or r = +1, the regression line will actually include all of the data points
and the line will be a perfect fit.
Calculating the Coefficient of Correlation for a set of data involves combining the same terms that appear in
the table above. The formula for r can be expressed as follows:
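\[ r = \frac{n\left(\sum x_i y_i\right) - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2} \cdot \sqrt{n\left(\sum y_i^2\right) - \left(\sum y_i\right)^2}} \]
where
r = coefficient of correlation, −1 ≤ r ≤ +1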
Example 12.9. A production manager has compared the dexterity-test scores of five assembly-line employees with their hourly productivity. The data are in the table below.
Employee   x = score on dexterity test   y = units produced in one hour
A 12 55
B 14 63
C 17 67
D 16 70
E 11 51
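Solution. Referring to the dexterity-test example above, the coefficient of correlation between productivity
(y) and dexterity-test score (x) can be computed as
\[
\begin{aligned}
r &= \frac{n\left(\sum x_i y_i\right) - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2} \cdot \sqrt{n\left(\sum y_i^2\right) - \left(\sum y_i\right)^2}} \\
&= \frac{5(4362) - (70)(306)}{\sqrt{5(1006) - (70)^2} \cdot \sqrt{5(18{,}984) - (306)^2}} \\
&= \frac{390}{11.40175 \times 35.8329} \\
&= 0.9546
\end{aligned}
\]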
The coefficient of correlation (r = 0.9546) is positive, reflecting that productivity (y) is directly related to
dexterity-test score (x). In other words, persons scoring higher on the dexterity test tend to record higher
levels of productivity. This is also reflected in the positive slope of the regression line, ŷ = 19.2 + 3.0x.
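The computational formula used above translates directly into code. This is only a sketch on the five dexterity-test observations; the function name `pearson_r` and the variable names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from raw sums (the computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

score = [12, 14, 17, 16, 11]   # x: dexterity-test score
units = [55, 63, 67, 70, 51]   # y: units produced in one hour

r = pearson_r(score, units)
print(round(r, 4))  # 0.9546
```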
Coefficient of correlation is a unit free measure of degree of linear relationship between two or more vari-
ables. It is denoted by r. The square of correlation coefficient i.e., r2 is called the coefficient of determi-
nation.
The coefficient of determination measures the proportion of variation in one variable that can be accounted for
in terms of variation in the other(s). For instance, if r = 0.90 then r² = 0.81, which implies that 81% of the
variation in one variable can be attributed to variation in the other. The correlation coefficient measures association and no
more: the fact that one variable is highly correlated with another does not imply that one variable is
dependent on the other. No causation is implied.
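The most common coefficient of correlation is the Pearsonian coefficient of correlation, given by
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \]
which for computational convenience is also given as
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]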
Example 12.10. The following data refer to examination marks versus hours of study per week for a sample
of 8 candidates who sat statistics examinations in 2000:
Exam mark (Y):       64 61 84 70 88 92 72 71
Hours of study (X):  20 16 34 23 27 32 18 22
(a) Calculate the Pearson’s product moment coefficient of correlation.
(b) Calculate the coefficient of determination between examination marks and hours of study.
x     y     xy       x²     y²
20    64    1280     400    4096
16    61    976      256    3721
34    84    2856     1156   7056
23    70    1610     529    4900
27    88    2376     729    7744
32    92    2944     1024   8464
18    72    1296     324    5184
22    71    1562     484    5041
192   602   14,900   4902   46,206
n = 8, Σx = 192, Σy = 602, Σxy = 14,900, Σx² = 4902, Σy² = 46,206
Hence
\[ r = \frac{(8)(14{,}900) - (192)(602)}{\sqrt{\left[(8)(4902) - (192)^2\right]\left[(8)(46{,}206) - (602)^2\right]}} = \frac{3616}{4127.70} = 0.88 \]
Therefore r = 0.88, and the coefficient of determination is r² ≈ 0.77.
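As a cross-check, the deviation form and the computational form of r can be evaluated side by side on the tabulated values of Example 12.10 (the table rows are used, so the last mark is taken as 71). The variable names in this sketch are illustrative.

```python
from math import sqrt

hours = [20, 16, 34, 23, 27, 32, 18, 22]   # x
marks = [64, 61, 84, 70, 88, 92, 72, 71]   # y, as in the calculation table
n = len(hours)

# Deviation form: sums of products and squares about the means
x_bar, y_bar = sum(hours) / n, sum(marks) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, marks))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in marks)
r_dev = s_xy / sqrt(s_xx * s_yy)

# Computational form: raw sums only
num = n * sum(x * y for x, y in zip(hours, marks)) - sum(hours) * sum(marks)
den = sqrt((n * sum(x * x for x in hours) - sum(hours) ** 2)
           * (n * sum(y * y for y in marks) - sum(marks) ** 2))
r_comp = num / den

r_squared = r_dev ** 2   # coefficient of determination
print(round(r_dev, 2), round(r_comp, 2), round(r_squared, 2))
```

Both forms should give the same r, and squaring it answers part (b).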
12.3.4 Properties of r
(i). r lies between −1 and 1, i.e., −1 ≤ r ≤ 1.
Question 12.6.
The ratings below are based on collision claim experience and theft frequency for 12 makes of small, two-door cars. Higher numbers reflect higher claims and more frequent thefts, respectively.
Collision Theft Collision Theft
103 103 106 97
97 113 139 425
105 81 110 82
115 68 96 81
127 90 84 59
104 79 105 167
(i) Determine the least-squares regression line for predicting the rate of collision claims on the basis of
theft frequency rating.
(iii) If a new model were to have a theft rating of 110, what would be the predicted rating for collision
claims?
Question 12.7. The following data describes fuel consumption and flying hours for turboprop general avia-
tion aircraft from 1992 through 1997. Fuel consumption is in millions of gallons, flying times is in millions
of hours.
Year
(i) Determine the least-squares regression line for predicting fuel consumption on the basis of flying time.
(iii) If there were 2.0 million flying hours during a given year, what would be the prediction for the amount
of fuel consumed?
13 Regression Analysis
Objectives: The overall objective of this lesson is to give you an understanding of linear regression, thereby
enabling you to understand the importance and also the limitations of regression analysis.
13.1 Introduction
Regression Analysis is a very powerful tool in the field of statistical analysis in predicting the value of one
variable, given the value of another variable, when those variables are related to each other.
Regression analysis is a statistical tool used in prediction of value of unknown variable from known
variable.
Regression analysis is the mathematical process of using observations to find the line of best fit through
the data in order to make estimates and predictions about the behaviour of the variables. This line of best fit
may be linear (straight) or curvilinear, following some mathematical formula. Linear regression analysis determines the
nature of the linear relationship between two interval- or ratio-scale variables.
In this lecture we will focus only on simple regression: linear regression involving only two variables,
a dependent variable and an independent variable. Regression analysis for studying more than two variables
at a time is known as multiple regression.
Deciding which variable is the dependent one and which is the independent one sometimes causes problems. If we are unsure, here are some
points that might be of use:
• if we have control over one of the variables then that is the independent variable. For example, a manufacturer
can decide how much to spend on advertising and expect his sales to depend upon how much
he spends
• if there is any lapse of time between the two variables being measured, then the latter must depend
upon the former; it cannot be the other way round
• if we want to predict the values of one variable from our knowledge of the other variable, the variable
to be predicted must be dependent on the known one
• The regression analysis is used to estimate the values within the range for which it is valid.
• The relationship between the dependent and independent variables remains the same till the regression
equation is calculated.
• The dependent variable takes any random value but the values of the independent variables are fixed.
• In regression, we have only one dependent variable in our estimating equation. However, we can use
more than one independent variable.
Regression analysis, in the general sense, means the estimation or prediction of the unknown value of one variable from the known value of the other variable. It is one of the most important statistical tools and is
extensively used in almost all sciences: natural, social and physical. It is especially used in business and
economics to study the relationship between two or more variables that are related causally and for the estimation of demand and supply graphs, cost functions, production and consumption functions and so on.
Regression analysis was explained by M. M. Blair as follows: “Regression analysis is a mathematical mea-
sure of the average relationship between two or more variables in terms of the original units of the data.”
We are interested in the nature of relationship between two or more variables. It is a usual practice to
observe the actual series of data. The observed series is then plotted on a diagram which is called a scatter
plot or scatter diagram.
(a) Σ(Y − Yc) = 0 and
(b) Σ(Y − Yc)² = minimum
The task of bringing out linear relationship consists of developing methods of fitting a straight line, or a
regression line as is often called, to the data on two variables. The line of Regression is the graphical or
relationship representation of the best estimate of one variable for any given value of the other variable. The
nomenclature of the line depends on the independent and dependent variables.
For two variables X and Y , there are always two lines of regression: (a) Regression line of X on Y :
gives the best estimate for the value of X for any specific given values of Y . (b) Regression line of Y on X:
gives the best estimate for the value of Y for any specific given values of X.
If X and Y are two variables of which relationship is to be indicated, a line that gives best estimate of
Y for any value of X, it is called Regression line of Y on X. If the dependent variable changes to X, then
best estimate of X by any value of Y is called Regression line of X on Y .
• The two regression coefficients cannot both be numerically greater than 1; e.g., −1.25 and −1.32 is not
possible.
• The product of the regression coefficients bxy and byx cannot exceed 1, i.e., bxy × byx ≤ 1, since the product equals r². Here 0.35 ×
2.61 = 0.91 < 1. (Check this always!)
With the use of differential calculus, S is minimized for the a and b which satisfy the least squares normal
equations:
\[ \sum y = na + b\sum x \tag{8} \]
\[ \sum xy = a\sum x + b\sum x^2 \tag{9} \]
The coefficient b in the equation y = a + bx is called the regression coefficient of y on x. The regression
coefficient b measures the linear relationship between the two variables x and y.
In geometrical terms b is a rate of change of y with respect to x. i.e., the slope of the line y = a + bx.
The regression line can be used for interpolation and extrapolation.
Example 13.2. The following data gives the observations on weekly income and expenditure for food for
five households.
(ii). Determine the least squares regression line of expenditure on weekly income.
(iii). Using the equation in (ii), estimate the expenditure on food for someone having a weekly income of
380.
Solution. We need to fit a line y = a + bx where a and b are to be determined from the data using the least
squares method.
x      y      xy       x²
240    200    48,000   57,600
270    220    59,400   72,900
300    240    72,000   90,000
330    245    80,850   108,900
360    250    90,000   129,600
Σx = 1500   Σy = 1155   Σxy = 350,250   Σx² = 459,000
Using the formula
\[ \hat{b} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{(5)(350{,}250) - (1500)(1155)}{(5)(459{,}000) - (1500)^2} = \frac{18{,}750}{45{,}000} \approx 0.42 \]
\[ \hat{a} = \frac{1}{n}\left[\sum y - \hat{b}\sum x\right] = \frac{1}{5}\left[1155 - (0.42)(1500)\right] = 105 \]
Hence the fitted line is
y = 105 + 0.42x
For x = 380,
y = 105 + (0.42)(380) = 264.6
Someone with a weekly income of 380 is therefore expected to spend about 264.6 on food.
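The least squares calculation of Example 13.2 can be sketched as below; the variable names are illustrative. Note that the code keeps the unrounded slope 18,750/45,000 ≈ 0.4167, so its intercept (106) and prediction (about 264.3) differ slightly from the hand calculation, which rounds the slope to 0.42 before computing the intercept.

```python
income = [240, 270, 300, 330, 360]   # x: weekly income
food = [200, 220, 240, 245, 250]     # y: weekly expenditure on food
n = len(income)

sum_x, sum_y = sum(income), sum(food)
sum_xy = sum(x * y for x, y in zip(income, food))
sum_x2 = sum(x * x for x in income)

# Slope and intercept from the least squares normal equations (8) and (9)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

prediction = a + b * 380   # estimated food expenditure at income 380
print(round(b, 4), round(a, 2), round(prediction, 2))
```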
Regression analysis involving two variables as discussed above is known as simple regression. If more
than one independent variable is involved we talk of multiple regression. In particular, if the regression is
believed to be linear, i.e., of the form
\[ y = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \]
the least squares normal equations give a set of simultaneous equations for estimating
a, b1, b2, ···, bk. For two independent variables, i.e., three variables in all, the equation is y = a + b1x1 + b2x2
and the least squares normal equations are given by
\[ \sum y = an + b_1\sum x_1 + b_2\sum x_2 \tag{10} \]
\[ \sum x_1 y = a\sum x_1 + b_1\sum x_1^2 + b_2\sum x_1 x_2 \tag{11} \]
\[ \sum x_2 y = a\sum x_2 + b_1\sum x_1 x_2 + b_2\sum x_2^2 \tag{12} \]
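A minimal sketch of solving the three normal equations for two independent variables is given below. The data are hypothetical, constructed so that y = 1 + 2x1 + 3x2 holds exactly, and the function names `det3` and `fit_two_predictors` are illustrative. Cramer's rule is used here only because it keeps the example self-contained; it is one of several ways to solve the system.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_two_predictors(x1, x2, y):
    """Solve the normal equations for y = a + b1*x1 + b2*x2 by Cramer's rule."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    lhs = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    d = det3(lhs)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]   # replace one column by the right-hand side
        coeffs.append(Fraction(det3(m), d))
    return coeffs   # [a, b1, b2]

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no error term)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(a, b1, b2)
```

Because the hypothetical data lie exactly on a plane, the fitted coefficients recover the generating values.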
The simple linear regression model is a linear equation having a y-intercept and a slope, with estimates
of these population parameters based on sample data and determined by standard formulas. The estimated
regression equation is
\[ \hat{y} = b_0 + b_1 x \]
where
ŷ = the estimated value of the dependent variable (y) for a given value of x.
b0 = the y-intercept; this is the value of y where the line intersects the y-axis, i.e. where x = 0.
b1 = the slope; this is the change in ŷ for a one-unit increase in x.
The cap (ˆ·) over the y indicates that it is an estimate of the (unknown) “true” value of y. The equation is
completely described by the y-intercept (b0 ) and slope (b1 ), which are sample estimates of their population
counterparts, β0 and β1 , respectively. An infinite number of possible equations can be fitted to a given
scatter diagram, and each equation will have a unique combination of values for b0 and b1 . However, only
one equation will be the “best fit” as defined by the least squares criterion we are going to use.
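\[ \text{Slope:}\quad b_1 = \frac{\left(\sum x_i y_i\right) - n\bar{x}\bar{y}}{\left(\sum x_i^2\right) - n\bar{x}^2} \]
\[ y\text{-intercept:}\quad b_0 = \bar{y} - b_1\bar{x} \]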
where n = number of data points. With the slope determined, we take advantage of the fact that the
least-squares regression equation passes through the point (x̄, ȳ). The equation for finding the y-intercept
(b0 = ȳ − b1 x̄) is just a rearrangement of ȳ = b0 + b1 x̄.
Example:
A production manager has compared the dexterity-test scores of five assembly-line employees with their
hourly productivity. The data are in the table below.
Employee   x = score on dexterity test   y = units produced in one hour
A 12 55
B 14 63
C 17 67
D 16 70
E 11 51
Solution:
The calculations necessary for determining the slope and y -intercept of the regression equation are shown
below.
Data and preliminary calculations:
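Employee   x = score on dexterity test   y = units produced in one hour   xi yi   xi²    yi²
A          12                            55                               660     144    3025
B          14                            63                               882     196    3969
C          17                            67                               1139    289    4489
D          16                            70                               1120    256    4900
E          11                            51                               561     121    2601
Totals     Σxi = 70                      Σyi = 306                        Σxi yi = 4362   Σxi² = 1006   Σyi² = 18984
\[ \bar{x} = \frac{70}{5} = 14.0, \qquad \bar{y} = \frac{306}{5} = 61.2 \]
Calculations for slope and y-intercept of the least-squares regression line:
\[ b_1 = \frac{4362 - 5(14.0)(61.2)}{1006 - 5(14.0)^2} = \frac{78}{26} = 3.0, \qquad b_0 = 61.2 - 3.0(14.0) = 19.2 \]
so the least-squares regression line is ŷ = 19.2 + 3.0x.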
For example, if a job applicant were to score x = 15 on the manual dexterity test, we would predict that this
person would be capable of producing 64.2 units per hour on the assembly line. This estimated productivity
is calculated as ŷ = 19.2 + 3.0(15) = 64.2.
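The slope and intercept formulas based on the means can be sketched in code for the dexterity-test data. The variable names are illustrative, and x = 15 is the applicant score used in the text.

```python
score = [12, 14, 17, 16, 11]   # x: dexterity-test score
units = [55, 63, 67, 70, 51]   # y: units produced in one hour
n = len(score)

x_bar = sum(score) / n
y_bar = sum(units) / n

# Slope and intercept of the least-squares line y_hat = b0 + b1 * x
b1 = (sum(x * y for x, y in zip(score, units)) - n * x_bar * y_bar) / (
    sum(x * x for x in score) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar

y_hat_15 = b0 + b1 * 15   # predicted productivity for a score of 15
print(round(b0, 1), round(b1, 1), round(y_hat_15, 1))
```

Because b0 is defined as ȳ − b1x̄, the fitted line necessarily passes through the point (x̄, ȳ).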
Exercise:
1. For a sample of 8 employees, a personnel director has collected the following data on ownership of
company stock versus years with the firm.
x = years y = shares
6 300
12 408
14 560
6 252
9 288
13 650
15 630
9 522
(a) Determine the least-squares regression line and interpret the slope.
(b) For an employee who has been with the firm 10 years, what is the predicted number of shares of
stock owned?
2. The following data represent x = boat sales and y = boat trailer sales from 1995 through 2000.
Year Boat sales Boat Trailer sales
(Thousands) (Thousands)
1995 649 207
1996 619 194
1997 596 181
1998 576 174
1999 585 168
2000 574 159
(a) Determine the least squares regression line and interpret its slope.
(b) Estimate, for a year during which 500,000 boats are sold, the number of boat trailers that would
be sold.
(c) What reasons might explain why the number of boat trailers sold per year is less than the number
of boats sold per year?
Question 13.1. Scores made by students in a statistics class in the mid-term and final examination are given
in the table below. Develop a regression equation which may be used to predict final examination scores
from the mid-term score.
Student: 1 2 3 4 5 6 7 8 9 10
Mid-term: 98 66 100 96 88 45 76 60 74 82
Final: 90 74 98 88 80 62 78 74 86 80
We want to predict the final exam scores from the mid-term scores. So we let y be the final exam score and
x be the mid-term exam score.
Calculate;
ii). Determine the regression line of social class index to amount of money spent and use the equation to
estimate the social class index of a customer spending 23,000?, 65,000?, 15,000?.
Question: Scores made by students in a statistics class in the mid-term and final examination are given here.
Student 1 2 3 4 5 6 7 8 9 10
Mid Term (X) 98 66 100 96 88 45 76 60 74 82
Final (Y ) 90 74 98 88 80 62 78 74 86 80
i). Plot the data on a scatter diagram and calculate Correlation coefficient.
ii). Determine the regression line of final examination score on mid-term score.
iii). Using the equation in ii), estimate the final score of a student having a mid-term score of 50, and of 70.
14 Probability
The word probability denotes the chance or likelihood of the occurrence of an event. The theory of proba-
bility deals with laws governing the chances of occurrence of phenomena, which are unpredictable in nature.
To understand the concept of probability and learn the methods of calculating the probabilities, we should
first understand some basic terms and concepts related to probability.
Experiment: An experiment is any activity where we do not know for certain what will happen but we
will observe what happens. For example:
• we will ask someone whether or not they have used certain products.
Sample Space: The set of all possible outcomes in a random experiment is called a sample space. It
is denoted by S. The outcomes listed in the sample space are called the sample points or sample elements.
The sample space may be finite, countably infinite or uncountably infinite. The number of sample points in the
sample space may be denoted by n(S).
Examples: Some examples of the experiments and their sample spaces are as follows:
• In throwing a die, the sample space S = {1, 2, 3, 4, 5, 6}, n(S) = 6 and typical sample elements are
2 and 3.
• The subset B = {T T T } for the sample space of throwing three coins is the event of getting three tails.
The number of sample points in an event E is denoted by n(E).
Simple (Elementary) Events: An event consisting of single sample point is called a simple event. The
six events in S of throwing a die are;
{1} , {2} , {3} , {4} , {5} , {6} .
In a simultaneous toss of two coins, the event {HH} of getting both heads is a simple event where S =
{HH, HT, T H, T T }.
Null Event: It is the event containing no sample point in it. It is the impossible happening and is de-
noted by ∅, e.g in the experiment of throwing a cubic die, where the sample space is S = {1, 2, 3, 4, 5, 6}.
We define the event
Sure Events: An event which is sure to occur is called a sure event. In throwing a die, the event of getting a number from 1 to 6 is a sure event.
Equally likely Events: A number of events are said to be equally likely if none of them can be
expected to occur in preference to the others, e.g. in tossing a fair coin, the two possible outcomes head and
tail are equally likely, i.e. we have no reason to expect that heads will appear more often than tails or vice
versa.
Independent Events: If the occurrence of one event does not change the probability that another event
will occur, we say that the events are independent.
Dependent Events: If the occurrence of one event does change the probability that another event will occur,
we say that the events are dependent.
Exhaustive Events: Events are said to be exhaustive when they include all possible outcomes of a ran-
dom experiment. In tossing a coin, exhaustive events are two, i.e., H or T, in rolling a die, there are six
exhaustive cases, since any of the six numbers may appear on top.
Mutually Exclusive Events: Two or more events are mutually exclusive or disjoint events if the events
cannot occur simultaneously, in other words, the occurrence of one of the events prevents the occurrence of
others. If A and B are any two events defined on a sample space S and A ∩ B = ∅, i.e. there are no common
sample points between them, then the events A and B are said to be mutually exclusive.
e.g. in tossing the coin the appearance of head and tail are mutually exclusive events as they cannot occur
simultaneously.
Mutually Exclusive and Exhaustive Events: The two events A and B are said to be mutually exclusive
and exhaustive if they are disjoint and their union is S.
For example, in throwing a die with S = {1, 2, 3, 4, 5, 6}, if the events A and B have the sample points
A = {1, 2, 3}, and B = {4, 5, 6}, then A ∩ B = ∅, and A ∪ B = S
Hence A and B are mutually exclusive and exhaustive.
Every event associated with a random experiment is assigned a weight or measure of the chance of it occur-
ring called its probability.
Example 14.1. Two unbiased dice are thrown. Find the probability that
(i). the sum of the numbers shown is 6
(ii). the numbers shown are equal
(iii). the difference of the numbers shown is 1
(iv). the first die shows 6
(v). the total of the numbers shown is greater than 8
Solution. The two dice can be thrown in 6 × 6 = 36 ways.
The sum shown for each pair (first die in the rows, second die in the columns) is:
+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12
Hence n(S) = 36 where S is the sample space.
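(i). Let E1 be the event of getting a sum of 6. So
E1 = {(1, 5), (5, 1), (2, 4), (4, 2), (3, 3)}
(ii). Let E2 be the event that the numbers shown are equal. So
E2 = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}
(iii). Let E3 be the event that the difference of the numbers shown is 1. So
E3 = {(1, 2), (2, 1), (3, 2), (2, 3), (4, 3), (3, 4), (4, 5), (5, 4), (5, 6), (6, 5)}
(iv). Let E4 be the event that the first die shows 6. So
E4 = {(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}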
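The events of Example 14.1 can also be counted by brute force. The following is a minimal Python sketch, assuming the classical definition P(E) = n(E)/n(S) for equally likely outcomes; the helper name prob and the variable names are illustrative and not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die), so n(S) = 36.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability n(E)/n(S); `event` is a predicate on a pair."""
    favourable = [s for s in S if event(s)]
    return Fraction(len(favourable), len(S))

p_sum_6 = prob(lambda s: s[0] + s[1] == 6)        # part (i)
p_equal = prob(lambda s: s[0] == s[1])            # part (ii)
p_diff_1 = prob(lambda s: abs(s[0] - s[1]) == 1)  # part (iii)
p_first_6 = prob(lambda s: s[0] == 6)             # part (iv)
p_total_gt_8 = prob(lambda s: s[0] + s[1] > 8)    # part (v)

print(p_sum_6, p_equal, p_diff_1, p_first_6, p_total_gt_8)
```

Exact fractions are used so that the results can be compared directly with hand calculations such as 5/36 for part (i).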
6. For three events A, B and C applying the above rule, twice, we obtain
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (B ∩ C) − P (A ∩ C) + P (A ∩ B ∩ C)
Theorem 1: P (∅) = 0
Proof.
A∪∅=A ⇒ P (A ∪ ∅) = P (A) (13)
But A ∩ ∅ = ∅. Hence by rule 3, we have
P (A ∪ ∅) = P (A) + P (∅) (14)
From (13) and (14) we have;
P (A) = P (A) + P (∅) ⇒ P (∅) = 0
Theorem 2: P(Ac) = 1 − P(A)
Proof.
A ∪ Ac = S
also
A ∩ Ac = ∅
P (A ∪ Ac ) = P (S) = 1 by rule 2.
⇒ P(A) + P(Ac) = 1 by rule 3.
P (Ac ) = 1 − P (A).
Theorem 3: If A ⊂ B, then P (A) ≤ P (B).
Proof. Ac ∩ B = B\A (B\A denotes the set of elements of B that are not in A).
If A ⊂ B, then B = A ∪ (B\A).
⇒ P (B) = P (A) + P (B\A) by rule 3.
P (B) ≥ P (A) given that P (B\A) ≥ 0
Theorem 4: P(A\B) = P(A) − P(A ∩ B).
Proof. The events A\B and A ∩ B are disjoint, and
(A\B) ∪ (A ∩ B) = A
P[(A\B) ∪ (A ∩ B)] = P(A)
P(A\B) + P(A ∩ B) = P(A) by rule 3.
Theorem 5: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof. From theorem 4 above;
P(A\B) = P(A) − P(A ∩ B).
The events A\B and B are disjoint, and
(A\B) ∪ B = A ∪ B
Hence, by rule 3;
P[(A\B) ∪ B] = P(A ∪ B)
P(A\B) + P(B) = P(A ∪ B)
P(A) − P(A ∩ B) + P(B) = P(A ∪ B)
This is the theorem of total probability or the generalized addition theorem.
1. P (∅) = 0
2. P (Ac ) = 1 − P (A)
3. If A ⊂ B, then P (A) ≤ P (B)
4. P (A\B) = P (A) − P (A ∩ B)
5. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Example 14.3. Given events A and B such that P(A) = 1/3, P(B) = 1/4 and P(A ∩ B) = 1/6. Find
(i) P(Ac)
(ii) P(Bc ∩ A)
(iii) P(A ∪ B)
(iv) P(Ac ∩ Bc)
Solution.
(i) P(Ac)
P(Ac) = 1 − P(A) = 1 − 1/3, i.e., P(Ac) = 2/3
(ii) P(Bc ∩ A)
A = (A\B) ∪ (A ∩ B)
= (A ∩ Bc) ∪ (A ∩ B)
P(A) = P[(A ∩ Bc) ∪ (A ∩ B)]
1/3 = P(A ∩ Bc) + 1/6
Thus,
P(A ∩ Bc) = 1/3 − 1/6 = 1/6
(iii) P(A ∪ B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
= 1/3 + 1/4 − 1/6 = 5/12
(iv) P(Ac ∩ Bc)
(Ac ∩ Bc) = (A ∪ B)c
P(Ac ∩ Bc) = P[(A ∪ B)c] = 1 − P(A ∪ B)
= 1 − 5/12 = 7/12.
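The four parts of Example 14.3 use only the complement rule, Theorem 4 and the addition theorem, so they are easy to check mechanically. A minimal Python sketch with exact fractions follows; the variable names are illustrative.

```python
from fractions import Fraction

# Given data of Example 14.3.
p_a = Fraction(1, 3)
p_b = Fraction(1, 4)
p_a_and_b = Fraction(1, 6)

p_not_a = 1 - p_a                 # (i)   complement rule
p_a_not_b = p_a - p_a_and_b       # (ii)  P(A\B) = P(A) - P(A ∩ B)
p_a_or_b = p_a + p_b - p_a_and_b  # (iii) addition theorem
p_neither = 1 - p_a_or_b          # (iv)  De Morgan, then complement rule

print(p_not_a, p_a_not_b, p_a_or_b, p_neither)
```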
Example 14.4. Given P (A) = 0.6, P (B) = 0.5 and P (A ∩ B) = 0.4. Find P (A ∪ B); P (A|B); P (B|A).
Solution.
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= 0.6 + 0.5 − 0.4
= 1.1 − 0.4 = 0.7
P(A|B) = P(A ∩ B)/P(B) = 0.4/0.5 = 4/5
P(B|A) = P(A ∩ B)/P(A) = 0.4/0.6 = 2/3
Example 14.5. For two events A and B; P(A) = 2/5, P(B′) = 1/3, P(A ∪ B) = 5/6. Find P(A ∩ B); P(A|B);
P(only A); P(only one).
Solution.
P(B) = 1 − P(B′) = 2/3
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
5/6 = 2/5 + 2/3 − P(A ∩ B)
P(A ∩ B) = 2/5 + 2/3 − 5/6 = 7/30
P (A ∩ B) = P (B) · P (A|B).
P (A ∩ B) = P (A) · P (B|A).
Events A and B are said to be independent if whether or not event B has occurred gives us no information
on whether event A has occurred. This can be expressed algebraically as follows:
Given that:
P(A|B) = P(A ∩ B)/P(B)
then if A and B are independent, P(A|B) = P(A), and so
P(A ∩ B) = P(A) · P(B)
This is a special case of the multiplication rule when events A and B are independent.
Example 14.6. A friend tosses a coin three times. You accidentally notice that the first time the coin shows
Head. What is the chance that the friend observes 2 Heads?
Solution. From the experiment, the sample space is given by
{HHH, HHT, HTT, HTH}
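The reduced sample space of Example 14.6 can be produced by enumeration. The sketch below is only an illustration and rests on one assumption that the notes do not state explicitly: "observes 2 Heads" is read as exactly two heads in the three tosses.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses of a fair coin.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# Condition: the first toss is known to be a head.
given = [o for o in outcomes if o[0] == "H"]

# Assumption: "2 Heads" means exactly two heads in total.
favourable = [o for o in given if o.count("H") == 2]

p_two_heads = Fraction(len(favourable), len(given))
print(given, favourable, p_two_heads)
```

Under the alternative reading "at least two heads", the outcome HHH would also be counted as favourable.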
P (A ∩ B) = P (A) × P (B)
= 0.8 × 0.7
= 0.56
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= 0.8 + 0.7 − 0.56
= 0.94
P (A ∩ B) = 0
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= 0.8 + 0.7 − 0
= 1.5. This value is not a valid probability, since every probability must satisfy 0 ≤ P ≤ 1.
Also
n(B) = (600 × 30)/100 + (400 × 5)/100 = 200
and
n(A ∩ B) = (600 × 30)/100 = 180.
Thus, the required probability is given by
P(A|B) = n(A ∩ B)/n(B) = 180/200 = 0.9.
Example 14.10. If A and B are two events such that P(A) = 2/3, P(Ā ∩ B) = 1/6 and P(A ∩ B) = 1/3.
Find P (B), P (A ∪ B), P (A|B), P (B|A), P (Ac ∪ B), P (Ac ∩ B c ) and P (B c ).
Also examine whether the events A and B are (a) Equally likely (b) Exhaustive (c) Mutually exclusive
and (d) Independent.
Solution. The probabilities of various events are obtained as follows:
P(B) = P(Ā ∩ B) + P(A ∩ B) = 1/6 + 1/3 = 1/2.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 2/3 + 1/2 − 1/3 = 5/6
P(A|B) = P(A ∩ B)/P(B) = 1/3 × 2/1 = 2/3
P(B|A) = P(A ∩ B)/P(A) = 1/3 × 3/2 = 1/2
P(Ā ∪ B) = P(Ā) + P(B) − P(Ā ∩ B) = 1/3 + 1/2 − 1/6 = 2/3.
P(Ā ∩ B̄) = 1 − P(A ∪ B) = 1 − 5/6 = 1/6
P(B̄) = 1 − P(B) = 1 − 1/2 = 1/2.
(a) Since P(A) ≠ P(B), A and B are not equally likely events.
(b) Since P(A ∪ B) ≠ 1, A and B are not exhaustive events.
(c) Since P(A ∩ B) ≠ 0, A and B are not mutually exclusive.
(d) Since P(A)P(B) = P(A ∩ B), A and B are independent events.
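The four checks (a) to (d) of Example 14.10 are simple comparisons between probabilities. A minimal Python sketch, assuming the same given values and using illustrative variable names:

```python
from fractions import Fraction

# Given data of Example 14.10.
p_a = Fraction(2, 3)
p_not_a_and_b = Fraction(1, 6)
p_a_and_b = Fraction(1, 3)

p_b = p_not_a_and_b + p_a_and_b   # B split into the parts outside and inside A
p_a_or_b = p_a + p_b - p_a_and_b  # addition theorem

checks = {
    "equally likely": p_a == p_b,
    "exhaustive": p_a_or_b == 1,
    "mutually exclusive": p_a_and_b == 0,
    "independent": p_a * p_b == p_a_and_b,
}
print(p_b, p_a_or_b, checks)
```

Exact fractions matter here: with floating-point numbers the equality test for independence could fail because of rounding.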
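Example 14.11. The probability that an electric bulb will last for 150 days or more is 0.7 and that it will last at
the most 160 days is 0.8. Find the probability that it will last between 150 and 160 days.
Solution. Let A be the event that the bulb lasts 150 days or more and B the event that it lasts at most 160 days.
We are given P(A) = 0.7 and P(B) = 0.8. Every lifetime is either at least 150 days or at most 160 days, so
A ∪ B = S and P(A ∪ B) = 1. We have to find P(A ∩ B).
This probability is given by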
P (A ∩ B) = P (A) + P (B) − P (A ∪ B) = 0.7 + 0.8 − 1.0 = 0.5.
Question 14.2. Explain the meaning of conditional probability. State and prove the multiplication rule of
probability of two events when
(a) they are not independent
(b) they are independent
Question 14.3. If P(A) = 1/3, P(B) = 1/2, P(A|B) = 1/6, find P(B|A) and P(B|Ā).
Question 14.4. In a group of 80 students, 30 are taking mathematics, 20 are taking chemistry, and 10 are
taking mathematics and chemistry. What is the probability that a randomly chosen student is taking either
mathematics or chemistry?
Let A and B be two events. The probability that event B will occur, given that event A has occurred, is
known; this is a “forward-looking” probability in the sense that event A occurred before event B. Suppose
instead that you were asked to find the “backward-looking” probability that event A has occurred, given that
event B has occurred. In other words, you are asked to find
P (A|B).
Bayes’ Theorem gives a way to find this conditional probability. The conditional
probability that an event A has occurred, given that event B has occurred, is
P(A|B) = [P(A) · P(B|A)] / [P(A) · P(B|A) + P(A′) · P(B|A′)]
Example 14.12. Two machines, A and B produce identical beads. Machine A has probability 0.1 of pro-
ducing a defective bead each time, whereas machine B has probability 0.4 of producing a defective bead.
Each machine produces one bead. One of these beads is selected at random, tested and found to be defective.
What is the probability that it was produced by machine B?
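Solution. The probabilities can be represented on a tree diagram. Let A be the event that the bead was
produced by machine A, B be the event that the bead was produced by machine B, and D be the event that the
selected bead is defective. Each machine produces one bead, so P(A) = P(B) = 0.5 and
P(D) = (0.5 × 0.1) + (0.5 × 0.4) = 0.25. Then,
P(B|D) = P(B ∩ D)/P(D) = [P(B) · P(D|B)]/P(D)
= (0.5 × 0.4)/0.25
= 0.8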
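The two-event form of Bayes' Theorem translates directly into a small function. The sketch below applies it to the data of Example 14.12; the function name bayes_two and its parameter names are illustrative only.

```python
from fractions import Fraction

def bayes_two(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|A') (two-event Bayes)."""
    numerator = p_a * p_b_given_a
    denominator = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / denominator

# Example 14.12: here "A" in the formula is "bead came from machine B"
# and "B" in the formula is "bead is defective".
p_machine_b_given_defective = bayes_two(
    Fraction(1, 2),   # P(machine B): one bead from each machine
    Fraction(4, 10),  # P(defective | machine B)
    Fraction(1, 10),  # P(defective | machine A)
)
print(p_machine_b_given_defective, float(p_machine_b_given_defective))
```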
Example 14.13. Analysis of questionnaire completed by holiday makers showed that 0.75 classified their
holiday as good at Malindi. The probability of hot weather in the resort is 0.6. If the probability of regarding
the holiday as good given hot weather is 0.9, what is the probability that there was hot weather if a holiday
maker considers his holiday good?
Solution. Let
H = Hot weather
G = Good holiday
Then P(G) = 0.75, P(H) = 0.6 and P(G|H) = 0.9.
Recall
P(G|H) = P(G ∩ H)/P(H) and P(H|G) = P(G ∩ H)/P(G)
so that
P(G|H) · P(H) = P(H|G) · P(G)
P(H|G) = [P(G|H) · P(H)]/P(G)
= (0.9 × 0.6)/0.75
= 0.72
Example 14.14. Three machines A, B, and C produce respectively 60%, 30% and 10% of the total number
of items of a factory. The percentages of defective output of these machines are respectively 2%, 3%, and
4%. An item is selected at random from the product and is found to be defective. Find the probability that
the item was produced by machine C.
Solution. Let A1 , B1 , C1 be the events that an item drawn at random is produced by machines A, B, C
respectively and let X be the event that the product drawn is defective.
Then
P(A1) = 0.60, P(B1) = 0.30, P(C1) = 0.10
P(X|A1) = 0.02, P(X|B1) = 0.03, P(X|C1) = 0.04
We seek P(C1|X) = probability that an item is produced by machine C given that the item is defective.
By Bayes’ Theorem
P(C1|X) = [P(C1) · P(X|C1)] / [P(A1)P(X|A1) + P(B1)P(X|B1) + P(C1)P(X|C1)]
= (0.10)(0.04) / [(0.60)(0.02) + (0.30)(0.03) + (0.10)(0.04)]
= 0.004/0.025
= 0.16
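The calculation in Example 14.14 extends to any number of machines. The following Python sketch, with the hypothetical helper name posterior, computes all three posterior probabilities at once from the priors and the defect rates; for machine C it evaluates 0.004/0.025 = 0.16.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem for a partition: returns P(E_i | A) for every i.

    priors[i] = P(E_i), likelihoods[i] = P(A | E_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(E_i ∩ A)
    total = sum(joint)                                    # P(A)
    return [j / total for j in joint]

# Example 14.14: production shares and defect rates of machines A, B, C.
shares = [Fraction("0.60"), Fraction("0.30"), Fraction("0.10")]
defect_rates = [Fraction("0.02"), Fraction("0.03"), Fraction("0.04")]

post = posterior(shares, defect_rates)
print([float(p) for p in post])  # posteriors for machines A, B and C
```

The posteriors always sum to 1, which gives a quick sanity check on a hand calculation.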
Example 14.15. Suppose there are 3 urns containing 2 white and 3 black balls, 3 white and 2 black, and 4
white and 1 black balls respectively. There is equal probability of each urn being chosen. One ball is drawn
from an urn chosen at random. What is the probability that a white ball is drawn?
Solution. Let Ai be the event that the ith urn is chosen, i = 1, 2, 3, with P(Ai) = 1/3, and B the event that a white
ball is drawn.
Then
P(B) = P(A1) · P(B|A1) + P(A2) · P(B|A2) + P(A3) · P(B|A3)
Here;
P(A1) = P(A2) = P(A3) = 1/3 (each urn being equally likely to be chosen)
P(B|A1) = 2/5, P(B|A2) = 3/5, P(B|A3) = 4/5
P(B) = 1/3 × (2/5 + 3/5 + 4/5)
= 3/5
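The partition rule used in Example 14.15 is a weighted sum of conditional probabilities. A minimal Python sketch follows, assuming the urn contents stated in the example; the function name total_probability is illustrative.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """P(B) = sum over i of P(A_i) * P(B | A_i) for a partition A_1..A_n."""
    return sum(p * c for p, c in zip(priors, conditionals))

# Example 14.15: (white, black) counts in the three urns.
urns = [(2, 3), (3, 2), (4, 1)]
priors = [Fraction(1, 3)] * len(urns)  # each urn equally likely
p_white_given_urn = [Fraction(w, w + b) for w, b in urns]

p_white = total_probability(priors, p_white_given_urn)
print(p_white)
```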
Bayes’ Theorem can be generalized to include any number of mutually exclusive events whose union is the
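entire sample space. For instance, suppose in Figure C.16 that the events E1, E2, · · · , En are mutually exclusive and that
their union is the entire sample space S. Then the conditional probability that the event Ei has occurred, given
that an event A has occurred, is given by the general form of Bayes’ formula stated below.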
Reverend Thomas Bayes (1702-1761), introduced his theorem on probability which is concerned with a
method for estimating the probability of causes which are responsible for the outcome of an observed effect.
Bayes’ theorem makes use of conditional probability formula where the condition can be described in
terms of the additional information which would result in the revised probability of the outcome of an event.
then, by substituting for P(Ei ∩ A) and P(A) in Eqn (15), the result is:
P(Ei|A) = P(Ei)P(A|Ei) / Σ_{j=1}^{n} P(Ej)P(A|Ej), i = 1, 2, 3, · · · , n
Essentially Bayes’ formula allows us to “turn around” conditional probabilities, i.e., calculate P(Ei|A) given
only information about P(A|Ei).
The values P(Ei) are known as prior probabilities, the event A is some event which is known to have
occurred and the conditional probability P(Ei|A) is known as the posterior probability.
Bayes’ theorem is frequently used in the analysis of decisions using decision trees where information is
given in the form of conditional probabilities and the reverse of these probabilities must be found.
Example 14.16. In a test, an examinee (student) either guesses or copies or knows the answer to a multiple
choice question with four choices, only one answer being correct. The probability that he makes a guess is 1/3
and the probability that he copies the answer is 1/6. The probability that his answer is correct given that he
copies it is 1/8. Find the probability that he knew the answer to the question given that he correctly answers it.
Solution. Let A be the event of answering by guess work, B be the event of answering by copying, C be the
event of answering by knowing and D be the event of answering correctly.
P(A) = 1/3; P(B) = 1/6
As the question is answered by either guessing or copying or knowing
P(C) = 1 − P(A) − P(B) = 1 − 1/3 − 1/6 = 1/2
We have to find the probability that he knew when he answered correctly, i.e. P(C|D).
A guess among four choices is correct with probability P(D|A) = 1/4, P(D|B) = 1/8 is given, and P(D|C) = 1.
By Bayes’ theorem,
P(C|D) = [P(C) · P(D|C)] / [P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C)]
= (1/2)(1) / [(1/3)(1/4) + (1/6)(1/8) + (1/2)(1)]
= 24/29
Tree diagrams can be a useful device for keeping track of conditional probabilities when using multipli-
cation and partition rules. The idea is to draw a tree where each path represents a sequence of events.
On any given branch of the tree we write the conditional probability of that event given all the events on
branches leading to it. The probability at any node of the tree is obtained by multiplying the probabilities
on the branches leading to the node, and equals the probability of the intersection of the events leading to it.
Constructing the Tree for a Sequential Process: Probability trees may be shown growing from left to
right or from top to bottom. The root of the tree corresponds to the starting point of the process. Line
segments called branches connect the root to nodes representing the different outcomes that are possible at
the first stage of the process. Each of those stage 1 nodes is connected to nodes representing the possible
outcomes at the next stage, and the process continues until all stages are completed.
Consider an example where you have three (3) red balls, two (2) green balls and one (1) white ball in
an urn. One ball is chosen randomly from the urn. If a red or white ball is chosen, a fair coin is flipped
once. If the ball is green, the coin is flipped twice. We can construct a tree to enumerate the outcomes (the
elements of the sample space) for this random process. In the tree, the letters R, G and W represent the colors
red, green and white, and the letters H and T represent heads and tails, respectively.
Probabilities of Compound Events: Since different paths in a probability tree represent mutually exclu-
sive events, we can add their probabilities without worrying about any overlap to find the probability of a
compound event. (This is a consequence of the Additive Law of Probability.)
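The urn-and-coin tree described above can be enumerated in a few lines. The sketch below is illustrative only and assumes the urn holds 3 red, 2 green and 1 white ball (6 balls in total); each path probability is the product of the probabilities on its branches, and a compound event is handled by adding the probabilities of the paths that belong to it.

```python
from fractions import Fraction
from itertools import product

# Stage 1: colour of the ball (assumed urn: 3 red, 2 green, 1 white).
ball_prob = {"R": Fraction(3, 6), "G": Fraction(2, 6), "W": Fraction(1, 6)}
# Stage 2: number of fair-coin flips that follows each colour.
flips = {"R": 1, "G": 2, "W": 1}

paths = {}
for colour, p_colour in ball_prob.items():
    for coins in product("HT", repeat=flips[colour]):
        # Multiply along the branches: colour first, then each coin flip.
        paths[colour + "".join(coins)] = p_colour * Fraction(1, 2) ** flips[colour]

# Compound event: at least one head appears (paths are mutually exclusive).
p_at_least_one_head = sum(p for path, p in paths.items() if "H" in path)

print(paths)
print(sum(paths.values()), p_at_least_one_head)
```

The path probabilities add up to 1, as they must for a complete tree.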
3. Drawing with Conditional Replacement: Suppose that the first bead is replaced only if
it is blue. Then the tree diagram will be similar to the one before, but the probabilities will change only when a red
bead is picked as shown below.
Question 14.5. A tourist decides between two plays, called "Good" (G) and "Bad" (B). The probability
of the tourist choosing Good is P (G) = 10%. A tourist choosing Good likes it (L) with 70% probability
(P (L | G) = .7) while a tourist choosing Bad dislikes it with 80% probability (P (D | B) = 0.8).
b. Calculate P (L), the probability that the tourist liked the play he or she saw.
c. If the tourist liked the play he or she chose, what is the probability that he or she chose Good?
The use of tree diagrams may become tedious when the tree grows beyond four stages. We can make use of
the binomial formula which will be discussed in the next two lectures.
Exercise Problems
1. The probability that the judge selected to try a criminal case will arrive at the appropriate verdict is
0.95. That is, given a guilty defendant on trial, the probability is 0.95 that the judge will find him
guilty and conversely, given an innocent man on trial, the probability is 0.95 that the judge will find
him innocent. Suppose that the local police is quite diligent in its duties and that 99% of the people
brought before the court are actually guilty.
Find:
2. A firm has four plants scattered around the city producing the same item. Plant A produces 30% of
total production, plant B produces 25%, plant C produces 35%, and plant D produces 10%. The firm
has a single warehouse in the city for storing finished products from all the plants. From the past
performance records on the proportion of defectives, it has been found that 5%, 10%, 15%, and 20%
of the items produced at A, B, C, and D respectively are defective. Before the shipment of an item to
a dealer, one unit is selected at random and found to be defective. What is the probability that it was
produced by plant C?
3. Three girls, Aileen, Barbara, and Cathy, pack biscuits in a factory. From the batch allotted to them
Aileen packs 55%, Barbara 30%, and Cathy 15%. The probability that Aileen breaks some biscuits
in a packet is 0.7, and the respective probabilities for Barbara and Cathy are 0.2 and 0.1. What is the
probability that a packet with broken biscuits found by the checker was packed by Aileen?
4. When a person needs a mini-cab, it is hired from one of three firms, X, Y, and Z. Of the hirings
40% are from X, 50% are from Y and 10% are from Z. For cabs hired from X, 9% arrive late; the
corresponding percentages for cabs hired from firms Y and Z are 6% and 20% respectively.
Calculate the probability that the next cab hired,
5. In a bolt factory machines A, B, and C manufacture respectively 25%, 30%, and 45% of the bolts.
The respective percentage defective bolts for machines A, B, and C are 2,1, and 0.5. A bolt is drawn
at random from the production of this factory and is found to be defective. What is the probability that
it was manufactured by machine C?
6. In the town of Corruptaville in the country of Burania 30% of the drivers are learners, 50% of the
drivers are licensed but incompetent and bribed the examiner to make them pass. The remaining 20%
of drivers are licensed, competent and did not bribe the examiner. 10% of the learners, 80% of the
incompetent drivers and 1% of the competent drivers drive carelessly. The probability that any careless
driver has an accident is 70% and the probability that any careful driver has an accident is 20%. The
police have been called to check up on an accident involving a driver and a pedestrian. What is the
probability, correct to three significant figures, that
7. In a study made to find the relationship between IQ of a person and his academic achievements the
following results were obtained.
8. A factory manufacturing memory card chips has three machines A, B, and C in operation. Machine
A produces 50% of the cards and is known to have a rate of 3% defectives. Machine B produces 30%
of the memory cards with the rate of 4% defectives and machine C produces 20% of the cards with the
rate of 5% defectives.
An item selected at random from the memory cards manufactured by the company was found to be
defective, what is the probability that it’s defective? and what machine produced the card?
9. A box contains 12 light bulbs of which 5 are defective. All the bulbs look alike and have equal
probability of being chosen. Three bulbs are picked up at random. What is the probability that at
least 2 are defective ?
10. If you take a bus to work in the morning there is a 20% chance you’ll arrive late. When you go by
bicycle there is a 10% chance you’ll be late. 70% of the time you go by bike, and 30% by bus. Given
that you arrive late, what is the probability you took the bus?
11. At a police spot check, 10% of cars stopped have defective headlights and a faulty muffler. 15%
have defective headlights and a muffler which is satisfactory. If a car which is stopped has defective
headlights, what is the probability that the muffler is also faulty?
12. In a large population, people are one of 3 genetic types A, B and C: 30% are type A, 60% type B and
10% type C. The probability a person carries another gene making them susceptible for a disease is
.05 for A, .04 for B and .02 for C. If ten unrelated persons are selected, what is the probability at least
one is susceptible for the disease?
13. Let A and B be events defined on the same sample space, with P (A) = 0.3, P (B) = 0.4 and
P (A|B) = 0.5. Given that event B does not occur, what is the probability of event A?
14. Events A and B are independent with P (A) = .3 and P (B) = .2. Find P (A ∪ B).
15. Students A, B and C each independently answer a question on a test. The probability of getting the
correct answer is .9 for A, .7 for B and .4 for C. If 2 of them get the correct answer, what is the
probability C was the one with the wrong answer?
16. E and F are two events such that P(E) = 0.60, P(E or F) = 0.90 and P(E and F) = 0.50. Find P(F).
17. The probability that a randomly chosen adult resident of Kisumu city owns a boat is 0.16. The prob-
ability that a randomly chosen adult rents an apartment is 0.30. The probability that the adult owns a
boat given he/she rents an apartment is 0.20. Find the probability that a randomly chosen adult rents
an apartment and owns a boat
18. Assume that the probability is 95% that the jury selected to try a criminal case will arrive at the
correct verdict whether innocent or guilty. Further, suppose that the local police force is quite diligent
in performing its function, that 99% of the people brought to trial are in fact guilty. Given that a jury
finds a defendant innocent, what is the probability that he is in fact innocent? Draw a tree diagram.
19. Medical researchers know that the probability of getting lung cancer if a person smokes is 0.34. The
probability that a nonsmoker gets lung cancer is 0.03. It is also known that 11% of the population
smokes. What is the probability that a person with lung cancer was a smoker?
20. A sandwich is made with only one type of bread, one type of meat, and one type of cheese. There are
3 types of bread: white, wheat, or rye; 2 types of meat: turkey or roast beef; and 2 types of cheese:
American or Swiss. Draw a tree diagram to show the number of sandwich choices.
21. A drawer contains 4 red socks, 3 white socks, and 3 blue socks. Without looking, you select a sock at
random, replace it, and select a second sock at random. What is the probability that the first sock is
blue and the second sock is red?
22. Two urns each contain green balls and blue balls. Urn I contains 4 green balls and 6 blue balls, and
Urn II contains 6 green balls and 2 blue balls. A ball is drawn at random from each urn. What is the
probability that both balls are blue?
23. A bag contains 6 purple marbles and 7 white marbles. Two marbles are drawn at random. One marble
is drawn and not replaced. Then a second marble is drawn. What is the probability that the first marble
is white and the second one is purple?
24. Radar detection. If an aircraft is present in a certain area, a radar correctly registers its presence with
probability 0.99. If it is not present, the radar falsely registers an aircraft presence with probability
0.10. We assume that an aircraft is present with probability 0.05. What is the probability of false alarm
(a false indication of aircraft presence), and the probability of missed detection (nothing registers, even
though an aircraft is present)?
25. A company has installed a new computer system and some employees are having difficulty logging
on to the system. They have been given training and the problems which arose during training were
recorded and their probabilities calculated as follows:
26.
If we define a variable X as the number of heads observed when a fair coin is tossed three times, then X
takes values 0, 1, 2, 3, where
Hence to each sample point in S we have assigned a real number, which is uniquely determined by the sample
point. The variable X is called the random variable defined on the sample space S.
We can also find the probabilities of values 0, 1, 2, 3 of the random variable X as follows
(ii) Σ_{all x} P(x) = 1.
Note: Here, the uppercase X is used for the random variable and lowercase x is used to denote (represent)
a realization of X. Probabilities can be easily obtained from the probability distribution table as follows:
Probability of getting two or more heads
P(X > 1) = P(X = 2) + P(X = 3) = 3/8 + 1/8 = 1/2
Example 15.1. A discrete random variable X has the following probability distribution.
X -2 -1 0 1 2
P (X): k 0.2 2k 2k 0.1
Find k and also find the expected value of the random variable X.
Solution. Since X is a random variable with given P(X), it must satisfy the conditions of a probability
distribution.
Σ P(X) = 1 ⇒ 5k + 0.3 = 1 ⇒ k = 0.7/5 = 0.14.
Now we can calculate the expected value by the formula E(X) = Σ X P(X).
X P (x) xP (x)
-2 0.14 -0.28
-1 0.2 -0.2
0 0.28 0
1 0.28 0.28
2 0.1 0.2
Total 1 0
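The two steps of Example 15.1 (solve for k, then compute E(X)) can be sketched in Python. Each probability is written as constant + coefficient × k; this encoding and the variable names are assumptions made for the illustration.

```python
from fractions import Fraction

xs = [-2, -1, 0, 1, 2]
# P(X = x) = const + coeff * k, read off the table of Example 15.1.
const = [Fraction(0), Fraction("0.2"), Fraction(0), Fraction(0), Fraction("0.1")]
coeff = [1, 0, 2, 2, 0]

# The probabilities must sum to 1: sum(const) + k * sum(coeff) = 1.
k = (1 - sum(const)) / sum(coeff)

probs = [c + a * k for c, a in zip(const, coeff)]
expected = sum(x * p for x, p in zip(xs, probs))

print(float(k), [float(p) for p in probs], expected)
```

The distribution is symmetric enough about 0 that the positive and negative products cancel, which is why the mean comes out as 0.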
Example 15.2. A random variable follows the probability distribution given below;
X 0 1 2 3 4
P (X) 0.12 0.23 k 0.20 0.10
Obtain the value of k, and hence compute the expected value of X.
Solution. k = 0.35, E(X) = 1.93 and Var(X) ≈ 1.31
Example 15.3. A coin is such that the tail is thrice as likely as the head. A game is played such that you earn
5 points for a head and lose 2 points for a tail after every toss. Let X be the total score after 4 consecutive
tosses. Find the probability distribution of X and the expected number of points.
Solution. Let H be the event of observing a head and let X be the points earned, then P(H) = 0.25, and
P(T) = 0.75, n = 4 and using the binomial formula, we have
E(X) = Σ_{all x} x · p(x)
= 4 × 1/16 + 8 × 5/16 + 16 × 5/16 + 12 × 1/16 + 24 × 1/4
= 232/16
= 14.5
Question 15.1. A discrete random variable X takes the following values with the corresponding probabili-
ties;
x −3 −1 0 1 2 3
P (x) 0.1 0.2 0.1 0.2 0.15 0.25
Compute the following probabilities; (a). P (X = −1) (b). P (X = −2)
(c). P (X ≤ 0) (d). P (X is negative) (e). E(X)
(c). the probability that you will score more than the expected value.
15.3 Variance
The variance of a probability distribution of a discrete random variable provides a numerical measure of the
spread and is given by the sum of the products of the squared deviations between the mean and all individual
values of the random variable, taken one at a time and their respective probabilities.
Thus the variance, denoted by σ², is given by the formula:
σ² = Var(X) = E[(X − E(X))²]
and the standard deviation is the square root of the variance.
From our example above of the three fair coins being tossed once, we can calculate the value of the variance,
as follows, knowing that the mean of the distribution is 1.5.
No. of Heads (X) | P(X) | µ | X − µ | (X − µ)² | (X − µ)² P(X)
0 1/8 1.5 −1.5 2.25 0.28
1 3/8 1.5 −0.5 0.25 0.09
2 3/8 1.5 0.5 0.25 0.09
3 1/8 1.5 1.5 2.25 0.28
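The last column of the table is what gets summed to obtain the variance. A minimal Python sketch for the three-coin distribution follows; the function name mean_and_variance is illustrative. Working with exact fractions avoids the rounding seen in the two-decimal table entries.

```python
from fractions import Fraction
from math import sqrt

def mean_and_variance(dist):
    """dist maps each value x to P(X = x); returns (E(X), Var(X))."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Number of heads when three fair coins are tossed.
heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu, var = mean_and_variance(heads)
sd = sqrt(var)  # standard deviation is the square root of the variance
print(mu, var, round(sd, 3))
```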
Question 15.5. Determine the mean, variance, and standard deviation of the following discrete probability
distribution.
x 0 1 2 3 4
P (x) 0.10 0.30 q 0.20 0.10
Question 15.6. A random variable X has the following probability distribution:
X: -2 -1 0 1 2 3
P (x): 0.1 k 0.2 2k 0.3 k
Find the value of k. Find the expected value and variance of X.
Question 15.7. A random variable X has the following probability distribution:
X: 0 1 2 3 4 5
P (x): 0.1 0.1 0.2 k 0.2 0.1
Find the value of k. Find the expected value and variance of X.
Question 15.8. An unbiased coin is tossed four times. Find the expected value and variance of the random
variable defined as number of Heads.
Example 15.6. The number of telephone calls received in an office between 9.00 A.M - 10.00 A.M has the
probability distribution as shown in the table below:
The Probability distribution of the number of telephone calls.
No. of calls Probability P (X)
0 0.05
1 0.20
2 0.25
3 0.20
4 0.10
5 0.15
6 0.05
(a). Verify that it is a probability function.
(b). Find the probability that there will be 3 or more calls.
(c). Find the probability that there will be even number of calls.
Solution. Clearly,
(a).
(i). 0 ≤ P(xi) ≤ 1 for every i
(ii). Σ_{i=1}^{n} P(xi) = 0.05 + 0.20 + 0.25 + 0.20 + 0.10 + 0.15 + 0.05 = 1
(b).
P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)
= 0.20 + 0.10 + 0.15 + 0.05
= 0.50
(c).
P(X = 0 or 2 or 4 or 6) = P(X = 0) + P(X = 2) + P(X = 4) + P(X = 6)
= 0.05 + 0.25 + 0.10 + 0.05
= 0.45
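The three parts of Example 15.6 can be checked with a short Python sketch. The dictionary name pmf and the other variable names are illustrative; decimal strings are converted to exact fractions so that the sum-to-one test is not affected by floating-point rounding.

```python
from fractions import Fraction

# Probability distribution of the number of telephone calls (Example 15.6).
table = {0: "0.05", 1: "0.20", 2: "0.25", 3: "0.20", 4: "0.10", 5: "0.15", 6: "0.05"}
pmf = {x: Fraction(p) for x, p in table.items()}

# (a) Each probability lies in [0, 1] and the probabilities sum to 1.
is_valid = all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

# (b) Three or more calls.
p_three_or_more = sum(p for x, p in pmf.items() if x >= 3)

# (c) An even number of calls (0, 2, 4 or 6).
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)

print(is_valid, float(p_three_or_more), float(p_even))
```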
Each value of the random variable is multiplied by the probability of occurrence of this value and then all
these products are summed up.
It is also common in statistical literature to refer to the mean as Mathematical Expectation or the Expected
value of the random variable X.
Example 15.7. Assume that we have three fair coins and we toss them simultaneously. The possible numbers
of heads that can appear as a result of the random experiment are given in the following table:
Outcomes No. of Heads Probability
TTT 0 1/8
HTT 1 1/8
TTH 1 1/8
THT 1 1/8
THH 2 1/8
HHT 2 1/8
HTH 2 1/8
HHH 3 1/8
The table can be summarized as to the number of heads occurring in the entire experiment and their respective
probabilities as follows:
Number of Heads (X) P (X) X · P (X)
0 1/8 0
1 3/8 3/8
2 3/8 6/8
3 1/8 3/8
Total 1.0 12/8
The expected value (mean) for the number of heads in this experiment is
E(X) = Σ_{i=1}^{n} xi P(xi)
= 0 + 3/8 + 6/8 + 3/8
= 12/8 = 1.5
This means that on an average, 1.5 heads can be expected to appear as a result of every random experiment
of tossing three fair coins at any one time.
Example 15.8. In the telephone calls problem above, find the mean number of telephone calls between 9 and 10 am.
X | P(X) | X · P(X)
0 0.05 0
1 0.20 0.2
2 0.25 0.5
3 0.20 0.6
4 0.10 0.4
5 0.15 0.75
6 0.05 0.30
Total 1.00 2.75
µ = Σ_{i=0}^{6} xi P(xi) = 2.75
Example 15.9. Suppose the hourly earnings X of a self-employed landscape gardener are given by the
following probability function.
Hourly Earning X: 0 6 12 16
P(X): 0.3 0.2 0.3 0.2
Find the gardener’s mean hourly earnings.
Solution. The Mean is given as:
(a) Giving an example in each case differentiate the following terms as used in statistics:-
(b) Applicants for an assembly job are required to take a test of manual dexterity. The times in seconds
taken to complete the task for 19 applicants were as follows:-
63,229,165,77,49,74,67,59,66,102,81,72,59,74,61,82,48,70,86
An outlier is defined as a value outside the range x̄ ± 2s where s is the standard deviation and x̄ the
mean of the data.
(c) The following table gives the one-way commuting distance (in nearest kms) of 30 working women in
an Insurance company.
13 47 10 3 16 7
25 8 21 19 12 45
1 8 4 6 2 14
13 7 34 13 41 28
50 14 26 10 24 36
(d) In an entrance examination in Mathematics and Statistics, of the 120 students who appeared for the
examination, 65 passed in Mathematics, 75 passed in Statistics and 35 passed in both the tests. A student is
selected at random. What is the probability that the student has
(a) In an experiment, a bottle of milk was brought from a cooler into a room whose temperature
is 25◦ C. Its temperature y ◦ C was recorded at time t minutes after it was brought in for 11 different
values of t. The results are summarized as follows
Σt = 44, Σt² = 180.4, Σty = 824.5, Σy = 205
(b) The table below shows the marks obtained by six students in two examinations
student A B C D E F
English 38 62 56 42 59 48
Maths 64 84 84 60 73 89
(i) Calculate the Spearman’s rank correlation coefficient and comment on the value. [6 marks]
(ii) The maths papers were remarked and one of the students awarded five more marks. Given that
the other marks and the rank correlation coefficient were unchanged, state with reason which
student received the extra marks. [2 marks]
(iii) Under what conditions would you expect the Spearman’s rank correlation coefficient to be equal
to the product-moment correlation coefficient. [1 mark]
(a) (i) Explain in words the meaning of the following symbol P (A|B) where A and B are two events.
[1 mark]
(ii) State the relationship between A and B if P (A|B) = 0
and P (A|B) = P (A). [2 marks]
(b) When a car owner needs his car to be serviced he calls one of the three garages A, B or C. Of all his
calls, 30% are to garage A, 10% to garage B and 60% to garage C.
The percentage of occasions when the garage called can take the car on that particular day are 20%
for A, 6% for B and 9% for C.
(c) A bag contains five identical balls two of which are green while the others are red. The balls are
successively drawn without replacement until both green balls are obtained.
Let X denote the number of draws required to obtain both green balls. Obtain the
(a) A discrete random variable X takes only the values 0, 1, 2, 3, 4, 5. The probability distribution of X
is as follows
P (X = 0) = P (X = 1) = P (X = 2) = p
P (X = 3) = P (X = 4) = P (X = 5) = q
P (X ≥ 2) = 3P (X < 2)
Determine the
(b) For a given data set, the regression lines of y on x and x on y are y + 0.219x = 20.8 and 0.785y + x = 16.2
respectively. Find
(c) From two samples x and y, the following statistics were obtained.
$\sum_{i=1}^{9} x_i = 39, \quad \sum_{i=1}^{9} x_i^2 = 237, \quad \sum_{i=1}^{7} y_i = 27, \quad \sum_{i=1}^{7} y_i^2 = 131$
(i) Determine the mean and the variance of the combined (pooled) sample. [4 marks]
(ii) Suppose 5 is added to each of the $x_i$ and $y_i$. Find the new pooled mean and variance. [2 marks]
Estimate the
(b) Compute the first four moments about the point 57 using the coding method and hence investigate the
skewness and peakedness of this distribution.
[15 marks]
(b) Applicants for an assembly job are required to take a test of manual dexterity. The times in seconds
taken to complete the task for 19 applicants were as follows:-
63,229,165,77,49,74,67,59,66,102,81,72,59,74,61,82,48,70,86
(c) A bag contains five identical balls each bearing one of the numbers 1,2,3,4 and 5. A ball is picked at
random from the bag its number noted and then replaced. This was done 50 times and the following
results obtained.
Number 1 2 3 4 5
Frequency x 11 y 8 9
If the mean of the values is 2.7, find the standard deviation of the values. [5 marks]
(d) A company manufactures T.V. sets. The probability that a set from this company fails during the first
month of its use is 0.02. Of those that do not fail during the first month, the probability of failure in the
next five months is 0.01. Of those that do not fail during the first six months, the probability of failure
by the end of the first year is 0.001. The company replaces, free of charge, any set that fails during its
warranty period. If 2,000 sets are sold, how many will have to be replaced if the warranty period is
(a) Gross mean weekly earnings (y in Ksh. per week) for a sample of male clerical workers of varying
ages (x, in complete years) in a large company are as follows:
(i) Plot a scatter diagram of these data and comment on their suitability for simple linear regression
analysis. [3 marks]
(ii) Obtain the coefficient of correlation between X and Y . [4 marks]
(iii) Write down the models for
(a) Simple linear regression of y on x.
(b) Simple linear regression of x on y.
Define your notation and explain clearly which model is better suited to fit the variables and data
as defined in the table above. [5 marks]
(iv) (a) Fit the simple linear regression model of y on x to the data above, find the equation of the
fitted regression line, draw this line on your scatter diagram, and use the equation to estimate
the mean weekly earnings at age 50.
[7 marks]
(b) Your line manager asks you to use your model to estimate the mean weekly earnings at age
70. How would you answer him? [1 mark]
(a) A computer program generates random questions in arithmetic that have to be answered within a fixed
time. The probability of answering the first question correctly is 0.8. Whenever a question is answered
correctly, the next question generated is more difficult and the probability of a correct answer being
given is reduced by 0.1. Whenever a question is answered wrongly, the next question is of the same
standard and the probability of answering it correctly is not changed.
(i) Draw a tree diagram to show this information for the first two generated questions.
[2 marks]
(ii) Find the probability that the second question was answered wrongly.
[2 marks]
(iii) By extending the tree diagram find the probability that the second question is answered correctly
given that the second question is answered correctly.
[4 marks]
(b) Two events A and B are independent. Given that P (A) = 0.4 and
P (A ∪ B) = 0.7, find P (B). [2 marks]
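One way to work part (b) is to combine the addition rule with the independence condition P(A ∩ B) = P(A)P(B). The lines below are a sketch of that reasoning, not a marking scheme.

```latex
\begin{align*}
P(A \cup B) &= P(A) + P(B) - P(A)\,P(B) \\
0.7 &= 0.4 + P(B)\,(1 - 0.4) \\
P(B) &= \frac{0.3}{0.6} = 0.5
\end{align*}
```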
(c) Two events C and D are such that P (D|C) = 0.2 and P (C|D) = 0.25. Given that P (C ∪ D) = 0.2
find the probability that both events occur. [4 marks]
(d) A discrete random variable X takes the values 0,1,2 and 3 only.
Given that P (X ≤ 2) = 0.9, P (X ≤ 1) = 0.5 and E(X) = 1.4, find the 2nd moment about the origin
of X. [5 marks]
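A possible route through part (d): recover each probability from the cumulative values, use E(X) to pin down P(X = 1), and then form the second moment about the origin, E(X²). This is an illustrative working only.

```latex
\begin{align*}
P(X = 3) &= 1 - P(X \le 2) = 0.1, \qquad P(X = 2) = P(X \le 2) - P(X \le 1) = 0.4 \\
E(X) &= P(X = 1) + 2(0.4) + 3(0.1) = 1.4 \;\Rightarrow\; P(X = 1) = 0.3, \quad P(X = 0) = 0.2 \\
E(X^2) &= 1^2(0.3) + 2^2(0.4) + 3^2(0.1) = 2.8
\end{align*}
```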
QUESTION FOUR (20 Marks)
(a) A random variable X has the following probability distribution.
$$P(X = x) = \begin{cases} kx^2, & x = 1, 2, 3 \\ k(7 - x)^2, & x = 4, 5, 6 \\ 0, & \text{otherwise} \end{cases}$$
(b) A, B, C and D toss a fair coin in turn, starting with A, and the first to throw a head wins. The game
can continue indefinitely until a head is thrown. However, D objects, as the others have their first turn
before him.
Compare the probability that A wins with the probability that D wins.
[8 marks]
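The comparison in part (b) can be worked with a geometric series, assuming the order A, B, C, D repeats each round. A complete round with no head has probability (1/2)⁴ = 1/16, so each player's winning probability is a geometric sum over rounds.

```latex
\begin{align*}
P(A \text{ wins}) &= \sum_{k=0}^{\infty} \frac{1}{2}\left(\frac{1}{16}\right)^{k}
  = \frac{1/2}{1 - 1/16} = \frac{8}{15} \\
P(D \text{ wins}) &= \sum_{k=0}^{\infty} \left(\frac{1}{2}\right)^{4}\left(\frac{1}{16}\right)^{k}
  = \frac{1/16}{1 - 1/16} = \frac{1}{15}
\end{align*}
```

Under this assumption A is eight times as likely to win as D, which supports D's objection.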
QUESTION FIVE (20 Marks)
(a) A doctor inquired from 10 of his patients the number of years they had smoked. For each patient he
gave a grade between 0 and 100 of the extent of lung damage. The following table shows the results
No of years 15 22 25 28 31 33 36 39 42 48
Grade 30 50 55 30 57 35 60 72 70 75
Calculate the Spearman’s rank correlation coefficient between the number of years of smoking and
the extent of lung damage.
Comment on the figure you obtain. [6 marks]
(b) A sample of 51 people were asked to record the distance they had travelled by car in a given week.
The distances to the nearest kilometer are shown below
67 76 85 42 93 48 93 46 52 72
77 53 41 48 86 78 56 80 70 70
66 62 54 85 60 58 43 58 74 44
52 74 52 82 78 47 66 50 67 87
78 86 94 63 72 63 44 47 57 68
81
(i) Construct a suitable stem and leaf diagram to represent these data.
Comment on the shape of the distribution. [4 marks]
(ii) Starting with the interval 40-49, construct a frequency distribution table for the data. [2 marks]
(ii) Investigate the skewness and peakedness of this distribution. [8 marks]
(b) Distinguish between qualitative and quantitative variables. Determine which of the following variables
are qualitative and which are quantitative. Classify the quantitative variables further into discrete
quantitative and continuous quantitative variables.
(i) Organize the data in a table, using 100 − 119 as the smallest interval. (3 marks)
(ii) Construct a frequency histogram based on the grouped data. (4 marks)
(iii) Calculate the mean and the standard deviation of the data. (5 marks)
(iv) In what interval is the median for these grouped data? Calculate the median of the data. (3
marks)
(d) Given two events A and B such that P (A|B) = 0.8, P (A) = 0.5, and P (B) = 0.25.
Determine
(ii) The data below gives sales in millions for a certain company in Kenya from 2007 to 2015:
Year: 2007 2008 2009 2010 2011 2012 2013 2014 2015
Sales: 23 15 17 22 25 29 25 30 29
Obtain smoothed values using 4-point moving averages.
(6 marks)
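The smoothing in part (ii) can be sketched in Python. The example assumes that "4-point moving averages" means the simple averages of four consecutive years, followed by centring (averaging adjacent pairs) so that each smoothed value lines up with a year. The helper names are hypothetical.

```python
def moving_average(values, window):
    """Simple moving averages over consecutive runs of `window` observations."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def centred_moving_average(values, window):
    """For an even window, average adjacent moving averages to centre them on a period."""
    ma = moving_average(values, window)
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]


years = list(range(2007, 2016))
sales = [23, 15, 17, 22, 25, 29, 25, 30, 29]

ma4 = moving_average(sales, 4)
cma4 = centred_moving_average(sales, 4)

# The centred values align with the years 2009 to 2013.
for year, value in zip(years[2:-2], cma4):
    print(year, value)
```

Two years are lost at each end of the series because a centred 4-point average needs two observations on either side.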
(c) (i) If the Fisher’s price index is 109.91 and the Paasche’s price index is 110.6, calculate the Laspeyre’s
index number. (2 marks)
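Part (i) follows from the definition of Fisher's index as the geometric mean of the Laspeyres index L and the Paasche index P. A short working, rounded to two decimal places:

```latex
\begin{align*}
F = \sqrt{L \times P} \;\Rightarrow\; L = \frac{F^{2}}{P}
  = \frac{109.91^{2}}{110.6} = \frac{12080.2081}{110.6} \approx 109.22
\end{align*}
```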
(ii) Define an index number. Explain two areas where index numbers are applied.
(2 marks)
(d) The table below gives the monthly cost in Ksh of some living necessities in a certain town for two
time periods. Each necessity has been given a weight as a measure of its importance to living.
Taking 2007 as the base period, calculate the cost of living index and interpret it.
(6 marks)
65 76 36 48 49 48 84 55 79 51
43 21 78 35 37 61 40 45 68 33
88 45 50 53 60 34 56 67 57 42
59 62 62 65 76 55 76 61 70 73
35 41 60 74 52 82 63 58 32 26
(i) Starting with a class 20 − 29, group this data into a frequency distribution, and plot its ogive.(6
marks)
(ii) Using a suitable assumed mean, calculate the mean and standard deviation of this data. (4
marks)
3 5 1 13 6 10 8
11 12 17 23 X 0
Calculate
(b) The following data have been collected relating to returns which would have been earned from an
investment of an equal sum of money in the shares of E.T. Plc. and a group of market shares.
Year: 1 2 3 4 5 6 7 8 9 10
E.T. Plc shares (Y ): 7.8 11.0 15.2 23.1 29.7 37.4 44.6 52.8 60.2 63.9
Mkt shares (X): 11.1 12.3 18.5 25.4 28.7 33.8 37.7 39.6 44.7 45.5
(c) Recent unit prices in hundreds of shillings of various fruits and vegetables (items) in Nairobi and Nakuru
were as follows:
Item: A B C D E F G
Nairobi (X): 14 16 16 9 8 28 35
Nakuru (Y ): 9 18 20 15 6 26 38
(a) Find the probabilities that a random variable having a Standard Normal distribution will take on a
value.
(b) Three machines A, B, and C produce respectively 60%, 30% and 10% of the total number of items
of a factory. The percentages of defective output of these machines are respectively 2%, 3%, and 4%.
An item is selected at random from the product and is found to be defective. Find the probability that
the item was produced by machine C.
(4 marks)
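Part (b) is a direct application of Bayes' Theorem. The sketch below treats each machine's share of output as a prior probability and its defect rate as a likelihood; the function name `posterior` is a hypothetical helper introduced for illustration.

```python
def posterior(priors, likelihoods):
    """Bayes' Theorem: P(source | evidence) for each source, given priors and P(evidence | source)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())  # total probability of the evidence
    return {k: v / total for k, v in joint.items()}


share = {"A": 0.60, "B": 0.30, "C": 0.10}        # proportion of output per machine
defect_rate = {"A": 0.02, "B": 0.03, "C": 0.04}  # P(defective | machine)

post = posterior(share, defect_rate)
print(round(post["C"], 4))
```

The total probability of a defective item is 0.012 + 0.009 + 0.004 = 0.025, so the posterior probability for machine C is 0.004/0.025. The same function applies to the other supplier and plant questions in these papers once their figures are substituted.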
(c) A sample of 40 electric batteries gives a mean lifespan of 600 hours with a standard deviation of 20 hours.
Another sample of 50 electric batteries gives a mean lifespan of 520 hours with a standard deviation of
30 hours. If these two samples were combined and used in a given project simultaneously, determine
the mean of the combined (larger) sample and hence determine the combined or pooled standard
deviation.
(6 marks)
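The combined mean and standard deviation in part (c) can be sketched as follows. The code assumes the quoted standard deviations use divisor n and applies the combined-variance formula σ² = [n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²)]/(n₁ + n₂), where dᵢ is the difference between each sample mean and the combined mean. The function name `combine` is hypothetical.

```python
from math import sqrt


def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and standard deviation of two samples merged into one (divisor-n convention assumed)."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    var = (n1 * (sd1 ** 2 + (mean1 - mean) ** 2)
           + n2 * (sd2 ** 2 + (mean2 - mean) ** 2)) / n
    return mean, sqrt(var)


combined_mean, combined_sd = combine(40, 600, 20, 50, 520, 30)
print(round(combined_mean, 2), round(combined_sd, 2))
```

The combined standard deviation is much larger than either sample's own because the two sample means are 80 hours apart.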
(a) Giving an example in each case differentiate the following terms as used in statistics:-
(b) The following data are the temperatures of effluent at discharge from a sewage treatment facility on
consecutive days
43 47 51 48 52 50 46 49 45 52 46 51 44 49 46 51 49 45 44 50 48 50 49 50
Construct a box plot of the data and use it to comment on the skewness of this
distribution. [5 marks]
(c) The mode of the following incomplete distribution of weights of 160 students is 56.
Assuming that the weights are linearly distributed in each group estimate the
(d) Among 1,000 applicants for admission to M.A. economics course in a University, 600 were economics
graduates and 400 were non-economics graduates; 30% of economics graduate applicants and 5% of
non-economics graduate applicants obtained admission. If an applicant selected at random is found
to have been given admission, what is the probability that he or she is an economics
graduate? [3 marks]
(a) The following table presents sample data relating the number of study hours spent by students outside
of class during a three-week period for a course in statistics and their scores in an examination given
at the end of that period.
Sampled student 1 2 3 4 5 6 7 8
Study hours ( x) 20 61 34 23 27 23 18 22
Examination grade (y) 64 61 84 70 88 92 72 77
(b) Two variables X and Y are such that the regression equation y on x
is 4x − 5y + 33 = 0 and that of x on y is 20x − 9y = 107. Obtain the
(a) Three computer viruses arrived as an e-mail attachment. Virus A damages the system with probability
0.4. Independently of it, virus B damages the system with probability 0.5. Independently of A and B,
virus C damages the system with probability 0.2. Use a tree diagram to obtain the possible outcomes
and hence determine the probability that the system gets damaged? [5 marks]
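The tree diagram in part (a) has eight branches, one for each combination of the three viruses damaging or not damaging the system. The sketch below enumerates them, assuming the system counts as damaged when at least one virus damages it.

```python
from itertools import product

p_damage = {"A": 0.4, "B": 0.5, "C": 0.2}

# Each outcome is a tuple of booleans (A damages, B damages, C damages).
outcomes = {}
for combo in product([True, False], repeat=3):
    prob = 1.0
    for p, hit in zip(p_damage.values(), combo):
        prob *= p if hit else 1 - p  # independence lets the branch probabilities multiply
    outcomes[combo] = prob

# Assumption: the system is damaged if any one of the viruses damages it.
p_system_damaged = sum(p for combo, p in outcomes.items() if any(combo))
print(round(p_system_damaged, 4))
```

The same value follows from the complement: 1 − (0.6)(0.5)(0.8) = 0.76.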
(b) A computer assembling company receives 24% of parts from supplier X, 36% of parts from supplier
Y, and the remaining 40% of parts from supplier Z. Five percent of parts supplied by X, ten percent
of parts supplied by Y, and six percent of parts supplied by Z are defective. If an assembled computer
has a defective part in it, what is the probability that this part was received from supplier Z? [4 marks]
(c) A computer system has two components. Define the following events
x 1 2 3 4
P (X = x) $c$ $c^2$ $c^2 + c$ $3c^2 + 2c$
(b) The heights and the corresponding weights of a group of 9 randomly selected students were measured
and the following results obtained.
Height (m) X 1.6 1.7 1.6 1.4 1.6 1.7 1.4 1.3 1.2
Weight (kg) Y 70 75 65 55 60 76 55 50 49
Marks 10 - 19 20 - 29 30 - 39 40 - 49 50 - 59 60 - 69
frequency 4 6 9 5 4 2
(b). The marks obtained by 30 students in a mathematics test marked out of 20 are as shown below
Marks 8 9 10 11 12 14 15 17 18 20
No. of Students 2 3 4 4 5 3 3 3 2 1
Compute
(c). The first of the two groups has 100 items with mean 45 and variance 49. If the combined group has
250 items with mean 51 and variance 130, find the mean and standard deviation of the second group.
[5 marks]
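Part (c) can be worked backwards from the combined-group formulas. The sketch assumes the variances use divisor n, so that the combined variance satisfies Nσ² = n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²), with dᵢ the difference between each group mean and the combined mean.

```latex
\begin{align*}
n_2 &= 250 - 100 = 150, \qquad
\bar{x}_2 = \frac{250(51) - 100(45)}{150} = \frac{8250}{150} = 55 \\
250(130) &= 100\left[49 + (45 - 51)^2\right] + 150\left[\sigma_2^2 + (55 - 51)^2\right] \\
32500 &= 8500 + 150\,\sigma_2^2 + 2400 \;\Rightarrow\; \sigma_2^2 = 144, \quad \sigma_2 = 12
\end{align*}
```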
(d). The following data refer to examination marks versus hours of study per week for a sample of eight
candidates who sat the Business Statistics examination in 2010.
Exam Mark (Y ) 64 61 84 70 88 92 72 72
Hours of Study (X) 20 16 34 23 27 32 18 22
Calculate
(e). If 50% of the families subscribe to the morning newspaper, 65% of the families subscribe to the
afternoon newspaper and 85% of the families subscribe to at least one of the two newspapers, what
proportion of the families subscribe to both newspapers? [3 marks]
(ii) What is the addition rule of probability and for what type of events is it valid? [2 marks]
QUESTION TWO (20 marks)
(b). If the Fisher’s price index is 109.91 and the Paasche’s price index is 110.6, calculate the Laspeyre’s
index number. [2 marks]
(c). (i) Define the term “time series” and give an example. [2 marks]
(ii) What is the aim of time series analysis? [2 marks]
(iii) By giving the relevant examples, briefly explain the components of a time series. [6 marks]
(d). The number of new stereo systems sold by an electrical store each quarter for four years is shown
below.
(a). The following table gives the marks obtained by first year students in a Marketing examination.
Calculate
(ii) KY accounting firm has noticed that of the companies it audits, 85% show no inventory short-
ages, 10% show small inventory shortages and 5% show large inventory shortages. KY firm has
devised a new accounting test for which it believes the following probabilities hold:
P (company will pass test | no shortages) = 0.90
P (company will pass test | small shortages) = 0.50
P (company will pass test | large shortages) = 0.20
Determine the probability that a company being audited which fails this test has large or small inventory
shortages. [8 marks]
(a). Statistics is a means of collection of numerical facts on data. What are the major advantages of the
sampling method over the census? [4 marks]
(b). The random variable X has a probability distribution shown in the table below
X 0 10 20 30
P (X = x) 0.1 p 0.45 q
Given that the mean of X is 16.5, find the value of p and q. [4 marks]
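Part (b) gives two linear equations in p and q: the probabilities must sum to 1, and the mean must equal 16.5. One way to solve them:

```latex
\begin{align*}
0.1 + p + 0.45 + q &= 1 \;\Rightarrow\; p + q = 0.45 \\
0(0.1) + 10p + 20(0.45) + 30q &= 16.5 \;\Rightarrow\; p + 3q = 0.75 \\
2q &= 0.30 \;\Rightarrow\; q = 0.15, \quad p = 0.30
\end{align*}
```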
(c). The following data shows two groups of casual workers, their number and average wages they are paid.
(i) Starting with a class 20 − 29, group the data into a frequency distribution, represent the data in a
histogram and plot its ogive. [8 marks]
(iii) Using a suitable assumed mean, calculate the mean; hence, or otherwise, calculate the mode, the median
and the standard deviation of the data above. [11 marks]
KUCT
Town Campus
Introduction to Business Statistics
WKD class ASSIGNMENT I: 60 marks
Instructions: Answer all Questions.
1. The ratings below are based on collision claim experience and theft frequency for 12 makes of small,
two-door cars. Higher numbers reflect higher claims and more frequent thefts, respectively.
2. A firm has four plants scattered around the city producing the same item. Plant A produces 30% of
total production, plant B produces 25%, plant C produces 35%, and plant D produces 10%. The firm
has a single warehouse in the city for storing finished products from all the plants. From the past
performance records on the proportion of defectives, it has been found that 5%, 10%, 15%, and 20%
of the items produced at A, B, C, and D respectively are defective. Before the shipment of an item to
a dealer, one unit is selected at random and found to be defective. What is the probability that it was
produced by plant C?
3. The masses in grams of some fruits are given in the table below
363.7 346.4 377.4 341.7 359.8 361.2 385.7 363.5 354.2 375.3
372.2 364.3 373.3 379.4 351.4 369.5 385.5 365.5 385.5 369.5
(a). Starting with 340 group the data into classes of interval 10 and draw an Ogive. From the obtained
frequency table calculate
(b). Using a suitable assumed mean, calculate the mean and the standard deviation.
(c). Using coding method, calculate the mean and the standard deviation.
(d). The coefficient of variation.
(e). Draw a histogram and determine the mode.
4. The table below gives the monthly costs of some living necessities in two towns A and B. Each
necessity has been given a weight as a measure of its importance to basic living. Christine has just been
enrolled in a college near the two towns and is contemplating residing in one of the two towns.
Taking A as the base town, calculate the cost of living index and advise Christine accordingly.
5. The table below gives the probability distribution of a discrete random variable X given that P (X <
13) = 0.75. Find the value of k and q, hence calculate E(X).
x 4 8 12 15 20
P (X = x) k 0.25 0.3 q 0.1
(a) The masses of 90 eggs to the nearest gram were recorded as follows. Assuming that the masses are linearly
distributed in each group and that 60% of the eggs have masses below 66.5 g, estimate the
(b) An inspector working for a manufacturing company has a 99% chance of correctly identifying defective
items and a 0.5% chance of incorrectly classifying a good item as defective. The company has
evidence that its line produces 0.9% of nonconforming items. With the aid of a tree diagram,
(i) What is the probability that an item picked at random for inspection
is defective? [2 marks]
(ii) If an item selected at random is classified as defective, what is the probability that it is indeed
good? [5 marks]
(c) The marks obtained by eight students in maths and programming are as shown above.
Calculate to 4 d.p. the Spearman’s rank correlation coefficient and
maths 67 24 85 51 39 97 81 70
programming 70 59 71 38 55 62 80 76
sample 1 2 3 4 5 6 7 8 9
x 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
y 49 60 66 62 72 64 89 90 96
(b) It is discovered that one of the samples of the film was damaged and gave incorrect results. State which
sample this could be. [1 mark]
(c) Ignoring the sample that was damaged, calculate to 2 d.p. the product moment correlation coefficient.
[8 marks]
(d) State with reason whether it is sensible to conclude that x and y are linearly related. [2 marks]
(f) Use the regression line to estimate the contrast index corresponding to the damaged piece of film. [2
marks]
(g) State with reason whether it is sensible to estimate the contrast index when the amount of the chemical
applied is zero. [1 mark]
(c) A group of 140 college students study maths. Each student takes algebra only, or statistics only, or
both algebra and statistics. If a student takes statistics, the probability that he takes algebra is 1/3, while
the probability that he takes statistics given that he takes algebra is 1/5. Find the number of students
taking
(a) A game consists of tossing three unbiased coins simultaneously. The total score is calculated by
awarding three points for each head and one point for each tail. Let the random variable X represent
the total score.
(b) A discrete random variable X takes only the values 0, 1, 2, 3, 4, 5. The probability distribution of X
is as follows
P (X = 0) = P (X = 1) = P (X = 2) = p
P (X = 3) = P (X = 4) = P (X = 5) = q
P (X ≥ 2) = 3P (X < 2)
Determine
(c) The marks obtained in a test by a class of 120 students were as follows
By taking moments about the point 54.5, investigate the skewness and peakedness of the distribution of
the marks. [13 marks]
(a). Most undergraduate business students will not go on to become actual practitioners of statistical re-
search and analysis. Considering this fact, why should such individuals bother to become familiar
with business statistics? [2 marks]
(b). Given two events A and B such that P (A|B) = 0.4, P (A) = 0.06, P (B) = 0.10. Find
(c). In a game organized by Prof. Makoha, the score for the game is a random variable X which takes a
set of values {0, 1, 2, 3, 4} according to the probability distribution below:
x 0 1 2 3 4
P (x) 0.1 a b 0.2 0.1
Given the expectation (mean) of x is 1.8 determine the values of a and b and hence the variance of x.
[6 marks]
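A possible working for part (c): the probabilities sum to 1 and the mean is 1.8, which fixes a and b; the variance then follows from E(X²) − [E(X)]².

```latex
\begin{align*}
0.1 + a + b + 0.2 + 0.1 &= 1 \;\Rightarrow\; a + b = 0.6 \\
a + 2b + 3(0.2) + 4(0.1) &= 1.8 \;\Rightarrow\; a + 2b = 0.8 \;\Rightarrow\; b = 0.2, \quad a = 0.4 \\
E(X^2) &= 1(0.4) + 4(0.2) + 9(0.2) + 16(0.1) = 4.6 \\
\operatorname{Var}(X) &= 4.6 - 1.8^2 = 1.36
\end{align*}
```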
(d). Given below is an incomplete frequency distribution of masses (in kg) of 100 students in a college.
The classes are of equal width.
(e) Distinguish between the moment coefficient of skewness and moment coefficient of kurtosis. [2
marks]
(a). Two data sets are such that data set A, collected on a random variable X, has $\sum_{i=1}^{30} x_i = 240$ and $\sum_{i=1}^{30} x_i^2 = 2520$,
and data set B of ten observations collected on a random variable Y has a mean of 5 and variance of
6. Find:
(i). Find the mean and the variance of data set A [3 marks]
(ii). The mean and variance of combined data sets [5 marks]
(iii). Write down the answers to (i) and (ii) if each of original values is multiplied by 2 [2 marks]
(b). A battery manufacturer was interested in predicting the annual maintenance cost of the battery
manufacturing machines based upon the age of the machine. A sample of ten machines revealed the
following ages and maintenance costs during the previous year.
Age(years) 9 4 2 8 4 5 1 3 6 8
cost 40 12 8 27 15 17 5 10 25 31
(a). The following table gives the weights in kilograms of a certain product from some farmers. Using an
assumed mean of A = 655;
(b). An aircraft emergency locator transmitter (ELT) is a device designed to transmit a signal in the case of
a crash. The Ultimate Manufacturing company makes 80% of the ELTs, the Bryant Company makes
15% of them, and the Charterair Company makes the other 5%. The ELTs made by Ultimate have a
4% rate of defects, the Bryant ELTs have a 6% rate of defects, and the Charterair ELTs have a 9% rate
of defects (which helps to explain why Charterair has the lowest market share).
An ELT is randomly selected from the general population of all ELTs then tested and is found to be
defective
(iii) If a randomly selected ELT is defective, find the probability that it was made by the Charterair
manufacturing company. [3 marks]
QUESTION FOUR(20 marks)
(a). The table below gives a probability distribution of a discrete random variable X. Given that P (X <
130) = 0.62, find the value of a and b hence calculate the standard deviation of (X).
x 40 80 120 150 200
P (X = x) a 0.22 0.25 b 0.15
[6 marks]
(b). The following is information on the rail distance to destination and transportation times for ten
shipments by a spare parts supplier.
Customer: A B C D E F G H I J
Distance (X): 270 290 350 480 490 730 780 850 920 1010
Time(Days) (Y): 5 7 6 11 8 11 12 8 15 12
(i) Fit a regression line to the data. [6 marks]
(ii) Compare the regression line above with that obtained by regressing transportation time on the
rail distance. [2 marks]
(iii) Calculate Pearson’s coefficient of correlation. [4 marks]
(iv) Calculate the coefficient of determination. [2 marks]
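The regression and correlation calculations for the shipment data can be sketched in Python. The example assumes the least-squares line of time (y) on distance (x), which is only one of the two regression lines the question refers to. The function name `fit_line` is a hypothetical helper.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


distance = [270, 290, 350, 480, 490, 730, 780, 850, 920, 1010]
days = [5, 7, 6, 11, 8, 11, 12, 8, 15, 12]

intercept, slope, r = fit_line(distance, days)
r_squared = r ** 2  # coefficient of determination
print(round(intercept, 4), round(slope, 5), round(r, 4), round(r_squared, 4))
```

For the line of distance on time, swap the two arguments. The correlation coefficient is the same either way, but the fitted line is not.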
(b). State four reasons why statisticians would prefer to use sample data instead of population data. [4
marks]
(c). Find the mean, the mean absolute deviation and the 35th percentile of the following data.
[7 marks]
(d). A fair coin is tossed and a fair die is thrown at the same time. Find the probability of getting a head
and a three (3) at the same time. [2 marks]
(e). The table below gives the probability distribution of a discrete random variable X given that P (X <
13) = 0.75. Find the value of k and q, hence calculate E(X).
x 4 8 12 15 20
P (X = x) k 0.25 0.3 q 0.1
[5 marks]
(f). State the four main component movements of a time series. [4 marks]
(g). For a certain data set of 20 values, $\sum x = 154$ and $\sum x^2 = 2045$. Find the mean and standard
deviation of the data set after dropping a value of 19 from the data set. [4 marks]
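Part (g) can be worked by adjusting the two sums for the dropped value and recomputing with n = 19. The sketch assumes the divisor-n form of the variance; values are rounded.

```latex
\begin{align*}
\sum x' &= 154 - 19 = 135, \qquad \sum x'^2 = 2045 - 19^2 = 1684, \qquad n' = 19 \\
\bar{x}' &= \frac{135}{19} \approx 7.11 \\
\sigma'^2 &= \frac{1684}{19} - \left(\frac{135}{19}\right)^2 \approx 88.63 - 50.48 = 38.15,
\qquad \sigma' \approx 6.18
\end{align*}
```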
(a). Twenty staff members in a construction company were surveyed to find out what their weekly wages
were in euros. The results were as follows:
49.05 73.10 72.75 58.65 63.00 42.75 53.25 60.00 61.35 49.80
54.90 63.75 51.75 66.00 59.25 51.30 53.55 49.20 38.10 58.95
(iii). Draw a properly labeled histogram and a frequency polygon on the same graph for the data above.
[4 marks]
(b). The table below shows the distribution of the weights of 100 students in a university.
(a). In a certain company 65% of the workers can speak English, 75% can speak Kiswahili, while 15%
can neither speak English nor Kiswahili. An employee is randomly picked from this group. Find the
probability that the person speaks;
(b). Marion can take a course either in computers or in chemistry. If she takes the computer course, then
she will score an A with probability 0.5; if she takes the chemistry course, then she will score an A
grade with probability 1/3. Marion decides to base her decision on the flip of a coin.
(i). What is the probability that she will score an A in chemistry? [6 marks]
(ii). Given that Marion scores an A, what is the probability that she took the computer course. [6
marks]
(a). The table below gives the monthly costs of some living necessities in two towns A and B. Each
necessity has been given a weight as a measure of its importance to basic living. Christine has just been
enrolled in a college near the two towns and is contemplating residing in one of the two towns.
Taking A as the base town, calculate the cost of living index and advise Christine accordingly. [10
marks]
(b). The data below gives sales in millions for a certain company in Kenya from 2000-2008.
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008
Sales 2.3 1.5 1.7 2.2 2.5 2.9 2.5 3.0 2.9
(c). Two data sets are such that the first has 150 items with a mean of 55. If the combined set has 250 items
with a mean of 51, calculate the mean of the second set. [2 marks]
(c). Find the number of classes and the class intervals for data with the following characteristic.
Maximum value 467, minimum value 323, sample size 50. [2 marks]
(b). A random variable X has probability distribution shown in the table below;
x 10 15 20 24
P (x) a 0.24 2a 0.28
Find
(c). A bag contains some rotten eggs and 40 good ones. If the probability of randomly picking a rotten egg
from it is 1/5, find the number of rotten eggs. [4 marks]
INSTRUCTIONS:
Answer Question ONE (COMPULSORY) and any other TWO Questions.
QUESTION ONE (30 marks) (COMPULSORY)
(a). Most undergraduate business students will not go on to become actual practitioners of statistical re-
search and analysis. Considering this fact, why should such individuals bother to become familiar
with business statistics? [3 marks]
(b). In a study of the daily production of a company for over 50 days, the following data was obtained:
65 76 36 48 49 48 84 55 79 51
43 21 78 35 37 61 40 45 68 33
88 45 50 53 60 34 56 67 57 42
59 62 62 65 76 55 76 61 70 73
35 41 60 74 52 82 63 58 32 26
(i). Starting with a class 20 − 29, group this data into a frequency distribution, and plot its ogive.[6
marks]
(ii). Using a suitable assumed mean, calculate the mean and standard deviation of this data. [5
marks]
3 5 1 13 6 10 8
11 12 17 23 X 0
Determine
(d). Given two events A and B such that P (A|B) = 0.8, P (A) = 0.5, and P (B) = 0.25.
Determine
(b). The weather in Nyeri over 18 months was observed and the mean temperatures in degrees Celsius
reported as shown below:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2007 27 25 22 20 17 15 13 16 19 18 20 23
2008 24 23 20 20 18 12
(c). A sample of 40 electric batteries gives a mean lifespan of 600 hours with a standard deviation of 20 hours.
Another sample of 50 electric batteries gives a mean lifespan of 520 hours with a standard deviation of
30 hours. If these two samples were combined and used in a given project simultaneously, determine
the mean of the combined (larger) sample and hence determine the combined or pooled standard
deviation. [6 marks]
(a). The following data have been collected relating to returns which would have been earned from an
investment of an equal sum of money in the shares of E.T. Plc. and a group of market shares.
Year: 1 2 3 4 5 6 7 8 9 10
E.T. Plc shares (Y ): 7.8 11.0 15.2 23.1 29.7 37.4 44.6 52.8 60.2 63.9
Mkt shares (X): 11.1 12.3 18.5 25.4 28.7 33.8 37.7 39.6 44.7 45.5
(b). Recent unit prices in hundreds of shillings of various fruits and vegetables (items) in Nairobi and
Nakuru were as follows:
Item: A B C D E F G
Nairobi (X): 14 16 16 9 8 28 35
Nakuru (Y ): 9 18 20 15 6 26 38
(a). The weights in grams of 25 computer parts are given in the frequency distribution
table below.
Weights 160.0 − 169.9 170.0 − 179.9 180.0 − 189.9 190.0 − 199.9 200.0 − 209.9
No. of parts 3 X 9 6 3
Calculate
(i). The missing frequency and hence find the mode. [4 marks]
(ii). Using the assumed mean A = 184.95. Find the mean and the standard deviation of the weights
using the Coding method. [6 marks]
(iii). Calculate the third quartile of the weight. [4 marks]
(b). At a supermarket 60% of the customers pay using the credit card. Find the probability that in a ran-
domly selected set of 10 customers.
(a). Find the probabilities that a random variable having a Standard Normal distribution will take on a
value.
(b). Three machines A, B, and C produce respectively 60%, 30% and 10% of the total number of items
of a factory. The percentages of defective output of these machines are respectively 2%, 3%, and 4%.
An item is selected at random from the product and is found to be defective. Find the probability that
the item was produced by machine C. [8 marks]
X 0 1 2 3
P (X = x) p 2q p+q q
• The following data was observed and it is required to establish if there exists a relationship between
the two.
X 15 24 25 30 35 40 45 65 70 75
Y 60 45 50 35 42 46 28 20 22 15
INSTRUCTIONS:
Answer Question ONE (COMPULSORY) and any other TWO Questions.
QUESTION ONE (30 marks) (COMPULSORY)
(b). State four reasons why statisticians would prefer to use sample data instead of population data. [4
marks]
(c). Find the mean, the mean absolute deviation and the 35th percentile of the following data.
[7 marks]
(d). A fair coin is tossed and a fair die is thrown at the same time. Find the probability of getting a head
and a three (3) at the same time. [2 marks]
(e). The table below gives the probability distribution of a discrete random variable X given that P (X <
13) = 0.75. Find the value of k and q, hence calculate E(X).
x 4 8 12 15 20
P (X = x) k 0.25 0.3 q 0.1
[5 marks]
(f). State the four main component movements of a time series. [4 marks]
(g). For a certain data set of 20 values, $\sum x = 154$ and $\sum x^2 = 2045$. Find the mean and standard
deviation of the data set after dropping a value of 19 from the data set. [4 marks]
(a). Twenty staff members in a construction company were surveyed to find out what their weekly wages
were in euros. The results were as follows:
49.05 73.10 72.75 58.65 63.00 42.75 53.25 60.00 61.35 49.80
54.90 63.75 51.75 66.00 59.25 51.30 53.55 49.20 38.10 58.95
(ii). Determine the appropriate class interval and present this data in a frequency distribution table
starting with the value of 38.00. [5 marks]
(iii). Draw a properly labeled histogram and a frequency polygon on the same graph for the data above.
[4 marks]
(b). The table below shows the distribution of the weights of 100 students in a university.
(a). In a certain company 65% of the workers can speak English, 75% can speak Kiswahili, while 15%
can neither speak English nor Kiswahili. An employee is randomly picked from this group. Find the
probability that the person speaks;
(b). Marion can take a course either in computers or in chemistry. If she takes the computer course, then
she will score an A with probability 0.5; if she takes the chemistry course, then she will score an A
grade with probability 1/3. Marion decides to base her decision on the flip of a coin.
(i). What is the probability that she will score an A in chemistry? [6 marks]
(ii). Given that Marion scores an A, what is the probability that she took the computer course. [6
marks]
(a). The table below gives the monthly costs of some living necessities in two towns A and B. Each ne-
cessity has been given a weight as a measure of its importance to basic living. Christine has just been
enrolled in a college near the two towns and is contemplating residing in one of the two towns.
Taking A as the base town, calculate the cost of living index and advise Christine accordingly. [10
marks]
(b). The data below gives sales in millions for a certain company in Kenya from 2000-2008.
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008
Sales 2.3 1.5 1.7 2.2 2.5 2.9 2.5 3.0 2.9
x 10 15 20 24
P (x) a 0.24 2a 0.28
Find
(i). the value of a. [3 marks]
(ii). the expected value of X [3 marks]
(c). A bag contains some rotten eggs and 40 good ones. If the probability of randomly picking a rotten egg
from it is 1/5, find the number of rotten eggs. [4 marks]
Definition 16.1. A random variable X has a Bernoulli distribution, and is referred to as a Bernoulli random
variable, if and only if its probability distribution is given by
\[ f(x; p) = p^x (1 - p)^{1 - x} \quad \text{for } x = 0, 1. \]
In connection with the Bernoulli distribution, a success may be getting heads with a balanced coin, it may
be catching pneumonia, it may be passing (or failing) an examination and it may be losing a race.
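Note: The Bernoulli distribution is a special case of the Binomial distribution (the case n = 1). The mean and variance of the
Bernoulli distribution are given as
\[ E(X) = p \quad \text{and} \quad \mathrm{Var}(X) = \sigma^2 = p(1 - p). \]
Question 16.1. Show that E(X) = p and Var(X) = p(1 − p) for a random variable X which has a Bernoulli
distribution.
\begin{align*}
E(X) &= \sum_{\text{all } x} x\,P(x) = \sum_{x=0}^{1} x\,p^x (1 - p)^{1 - x} \\
     &= 0 \cdot p^0 (1 - p)^1 + 1 \cdot p^1 (1 - p)^{1 - 1} \\
     &= 0 + p(1 - p)^0 = p
\end{align*}
\begin{align*}
\mathrm{Var}(X) &= \sum_{\text{all } x} (x - \mu)^2 P(x) = E(X^2) - [E(X)]^2 \\
     &= \left[ 0^2 (1 - p) + 1^2 \cdot p \right] - p^2 \\
     &= p - p^2 = p(1 - p).
\end{align*}
The moment generating function of a Bernoulli distribution is given by
\begin{align*}
M_X(t) = E(e^{tX}) &= \sum_{x=0}^{1} e^{tx} p^x (1 - p)^{1 - x} \\
     &= (1 - p) + e^t p \\
     &= 1 - p + e^t p \\
     &= p(e^t - 1) + 1
\end{align*}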
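Example 16.1. A carton contains 4 good eggs and 6 bad eggs. If an egg is selected at random, then the
random variable
\[ X = \begin{cases} 0 & \text{if the egg is bad} \\ 1 & \text{if the egg is good} \end{cases} \]
has a Bernoulli distribution with
\[ P(\text{good egg}) = \frac{4}{10} = \frac{2}{5}, \qquad P(X = 1) = (2/5)^1 \cdot (1 - 2/5)^0 = 2/5. \]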
(iii) the probability of “success,” denoted by p, 0 < p < 1, is the same on every trial.
If the above conditions are satisfied, then the number of successes X in n trials is said to follow a binomial distribution. If a Bernoulli experiment
is performed repeatedly, we obtain a sequence of Bernoulli trials. In a finite sequence of Bernoulli trials,
we are usually interested in the number of successes. Since there are only two possible outcomes at each
trial, a sample of n Bernoulli trials contains $2^n$ possible outcomes. As before, X = 0 or 1 for each trial. The
probability of any particular sequence of outcomes in n trials is obtained as the product of the probabilities
of the n outcomes resulting at each trial.
Definition 16.2. A random variable X has a binomial distribution, and is referred to as a binomial random
variable, if and only if its probability distribution is given by
\[ f(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n \]
where
n - number of trials
p - probability of success
1 − p - probability of failure
x - value of the random variable (the number of successes in n trials)
\[ \binom{n}{x} = \frac{n!}{x!(n - x)!} \]
Thus, the number of successes in n trials is a random variable having a binomial distribution with parameters
n and p. The name ‘binomial distribution’ is derived from the fact that the values of b(x; n, p) for x =
0, 1, 2, · · · , n are successive terms of the binomial expansion $[(1 - p) + p]^n$; this shows that the sum of the
probabilities equals 1, as it should.
Example 16.2. Each of the following situations represent binomial experiments. (Are you satisfied with
the Bernoulli assumptions in each instance?)
(a) Suppose we flip a fair coin 10 times and let Y denote the number of tails in 10 flips. Here, Y ∼ b(n =
10; p = 0.5).
(b) In an agricultural experiment, forty percent of all plots respond to a certain treatment. I have four plots
of land to be treated. If Y is the number of plots that respond to the treatment, then Y ∼ b(n = 4; p =
0.4).
(c) In rural Kenya, the prevalence rate for HIV is estimated to be around 8 percent. Let Y denote the
number of HIV infected in a sample of 740 individuals. Here, Y ∼ b(n = 740; p = 0.08).
(d) It is known that screws produced by a certain company do not meet specifications (i.e., are defective)
with probability 0.001. Let Y denote the number of defectives in a package of 40. Then, Y ∼ b(n =
40; p = 0.001).
(e) Toss a fair coin 100 times and let X be the number of heads. Then X ∼ B(100, 0.5).
(f) A certain kind of lizard lays 8 eggs, each of which will hatch independently with probability 0.7. Let
X denote the number of eggs which hatch. Then X ∼ B(8, 0.7).
(g) What is the probability that at least 9 out of a group of 10 people who have been infected by a serious
disease will survive, if the survival probability for the disease is 70%?
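Part (g) can be answered by summing the binomial probabilities for 9 and 10 survivors. The sketch below is only an illustration: the helper name `binomial_pmf` is an assumption of this example, not notation from the notes, and it simply evaluates the formula of the binomial definition.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p), using the formula C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Part (g): at least 9 survivors among 10 patients, survival probability 0.7.
n, p = 10, 0.7
p_at_least_9 = sum(binomial_pmf(x, n, p) for x in (9, 10))
print(round(p_at_least_9, 4))  # 0.1493
```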
This is the probability of having x successes in a series of n independent trials when the probability of
success in any one of the trials is p. If X is a random variable with this probability distribution,
\begin{align*}
E(X) &= \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n - x} \\
     &= \sum_{x=0}^{n} x \, \frac{n!}{x!(n - x)!} \, p^x (1 - p)^{n - x} \\
     &= \sum_{x=1}^{n} \frac{n!}{(x - 1)!(n - x)!} \, p^x (1 - p)^{n - x}
\end{align*}
since the x = 0 term vanishes. Let y = x − 1 and m = n − 1. Substituting x = y + 1 and n = m + 1 into the
last sum (and using the fact that the limits x = 1 and x = n correspond to y = 0 and y = m, respectively)
\begin{align*}
E(X) &= \sum_{y=0}^{m} \frac{(m + 1)!}{y!(m - y)!} \, p^{y + 1} (1 - p)^{m - y} \\
     &= (m + 1)p \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, p^y (1 - p)^{m - y} \\
     &= np \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, p^y (1 - p)^{m - y}
\end{align*}
Setting a = p and b = 1 − p, the binomial theorem gives
\[ \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, p^y (1 - p)^{m - y} = \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, a^y b^{m - y} = (a + b)^m = (p + 1 - p)^m = 1 \]
so that
\[ E(X) = np \]
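Let us make use of the fact that $E(X^2) = E[X(X - 1)] + E(X)$ and first evaluate E[X(X − 1)]. We proceed as before,
but this time using y = x − 2 and m = n − 2.
\begin{align*}
E[X(X - 1)] &= \sum_{x=0}^{n} x(x - 1) \binom{n}{x} p^x (1 - p)^{n - x} \\
     &= \sum_{x=0}^{n} x(x - 1) \, \frac{n!}{x!(n - x)!} \, p^x (1 - p)^{n - x} \\
     &= \sum_{x=2}^{n} \frac{n!}{(x - 2)!(n - x)!} \, p^x (1 - p)^{n - x} \\
     &= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!(n - x)!} \, p^{x - 2} (1 - p)^{n - x} \\
     &= n(n - 1)p^2 \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, p^y (1 - p)^{m - y} \\
     &= n(n - 1)p^2 \, (p + (1 - p))^m \\
     &= n(n - 1)p^2
\end{align*}
So the variance of X is
\begin{align*}
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 = E[X(X - 1)] + E(X) - [E(X)]^2 \\
     &= n(n - 1)p^2 + np - n^2 p^2 \\
     &= np - np^2 = np(1 - p).
\end{align*}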
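MGF FOR THE BINOMIAL DISTRIBUTION: Suppose that Y ∼ b(n; p). Then the mgf of Y is
given by
\begin{align*}
M_Y(t) = E(e^{tY}) &= \sum_{y=0}^{n} e^{ty} \binom{n}{y} p^y (1 - p)^{n - y} \\
     &= \sum_{y=0}^{n} \binom{n}{y} (pe^t)^y (1 - p)^{n - y} = (q + pe^t)^n,
\end{align*}
where q = 1 − p (on each trial, P(X = 1) = p and P(X = 0) = 1 − p).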
b(x; n, θ) = b(n − x, n, 1 − θ)
2. Each trial has only two possible, mutually exclusive, outcomes which are termed as a ’success’ or a
’failure’, or “good” or “bad”, “Head” or “Tail” etc.
3. The probability of a success, denoted by p, is known and remains constant from trial to trial. The
probability of a failure, denoted by q, is equal to 1 − p, such that p + q = 1.
4. Different trials are independent, i.e., outcome of any trial or sequence of trials has no effect on the
outcome of the subsequent trials.
The sequence of trials under the above assumptions is also termed as Bernoulli Trials.
If the above conditions are satisfied then X is said to follow a Binomial distribution.
Definition 16.3. A random variable X has a Binomial distribution, and is referred to as a Binomial random
variable, if and only if its probability distribution is given by
\[ f(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n \]
where
n - number of trials
p - probability of success
1 − p - probability of failure
x - value of the random variable (the number of successes in n trials)
\[ \binom{n}{x} = \frac{n!}{x!(n - x)!} \]
Thus, the number of successes in n trials is a random variable having a Binomial distribution with
parameters n and p. The name ‘Binomial distribution’ is derived from the fact that the values of b(x; n, p)
for x = 0, 1, 2, · · · , n are successive terms of the Binomial expansion $[(1 - p) + p]^n$; this shows that the
sum of the probabilities equals 1, as it should.
This distribution is known as the binomial distribution with index n and probability p. We write this as
X ∼ Bin(n, p).
Also
\[ SD(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{5}{9}} = 0.7454 \]
Example 16.3. Find the probability of getting five heads and seven tails in 12 flips of a balanced coin.
Solution. Substituting x = 5, n = 12, and θ = 0.5 into the formula for the binomial distribution,
\[ f(5; 12, 0.5) = \binom{12}{5} (0.5)^5 (1 - 0.5)^{12 - 5} = 792 \, (0.5)^{12} \approx 0.19 \]
Example 16.4. Find the probability that seven of ten persons will recover from a tropical disease if we can
assume independence and the probability is 0.80 that any one of them will recover from the disease.
Solution. Substituting x = 7, n = 10, and θ = 0.80 into the formula for the binomial distribution,
\[ f(7; 10, 0.80) = \binom{10}{7} (0.80)^7 (1 - 0.80)^{10 - 7} = 120 \, (0.80)^7 (0.20)^3 \approx 0.20 \]
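The substitutions in Examples 16.3 and 16.4 can be reproduced numerically. This is a minimal sketch that assumes nothing beyond the binomial formula; the function and variable names are illustrative only.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

five_heads = binomial_pmf(5, 12, 0.5)      # Example 16.3
seven_recover = binomial_pmf(7, 10, 0.80)  # Example 16.4
print(comb(12, 5), comb(10, 7))            # 792 120
print(round(five_heads, 4), round(seven_recover, 4))  # 0.1934 0.2013
```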
Example 16.5. At a supermarket 60% of the customers pay using the credit card. Find the probability that
in a randomly selected set of 10 customers.
Example 16.6. Five independent trials of an experiment are carried out; the probability of a successful
outcome is p and that of a failure is q. Write out the probability distribution of X, where X is the number
of successful outcomes in five trials. Comment on your answer.
Example 16.7. The random variable X is defined binomially B(7, 0.2). Find to 3 d.p.
(a). P (X = 3) (b). P (1 < X ≤ 4) (c). P (X > 1)
(d). P (X ≤ 3) (e). P (X ≥ 3)
Example 16.8. A risky operation used for patients with no hope for survival has a survival rate of 80%.
Find the probability that exactly 4 of the next 5 patients operated on will survive. Ans: 0.4096
Example 16.9. A quiz, has 6 multiple choice questions, each with 3 alternatives. Find the probability of
getting five or more correct. Ans: 0.0178
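The quoted answers to Examples 16.8 and 16.9 can be checked with a short script. The sketch assumes that each quiz question is answered by pure guessing, so that p = 1/3; the helper names are hypothetical.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) obtained by summing the probabilities for k, k + 1, ..., n."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))

exactly_four_survive = binomial_pmf(4, 5, 0.8)          # Example 16.8
five_or_more_correct = binomial_upper_tail(5, 6, 1 / 3)  # Example 16.9
print(round(exactly_four_survive, 4), round(five_or_more_correct, 4))  # 0.4096 0.0178
```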
Example 16.10. A box contains a large number of pens. The probability that a pen is faulty is 0.1. How
many pens would you need to select to be more than 95% certain of picking at least one faulty pen.
Example 16.11. Show that E(X) = np and Var(X) = npq if X ∼ B(n, p).
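A numerical check (not a proof) of these two identities sums x·P(x) and x²·P(x) over the whole distribution. The sketch below uses B(8, 0.7), the lizard-egg case mentioned earlier, as an arbitrary test case; the function name is an assumption of this example.

```python
from math import comb

def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Return (mean, variance) computed directly from the probability distribution."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))
    second_moment = sum(x * x * f for x, f in enumerate(pmf))
    return mean, second_moment - mean**2

n, p = 8, 0.7
mean, variance = binomial_moments(n, p)
# Compare the direct sums with the closed forms np and np(1 - p).
print(round(mean, 6), round(n * p, 6))
print(round(variance, 6), round(n * p * (1 - p), 6))
```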
Example 16.12. The probability that it will be a fine day is 0.4.
Example 16.13. A biased coin is tossed four times and the number of heads noted. The experiment was
repeated 500 times in all. The results are summarized below:
Number of heads: 0 1 2 3 4
Frequency: 12 50 151 200 87
1. From the data, estimate the probability of obtaining a head when the coin is tossed.
2. Using a binomial distribution with the same mean, calculate the theoretical frequencies of 0, 1, 2, 3, 4 heads.
Example 16.14. The probability of a student being awarded a distinction in mathematics is 0.05. In a
randomly selected group of 50 students, what is the most likely number of students awarded a distinction?
E(X) = λ, Var(X) = λ.
(i) Events occur singly and at random in a given interval of time or space.
A random variable X has a Poisson distribution if the above conditions are satisfied.
The distribution is useful in describing the number of events that will occur in a specific period of time or
a specific area or volume. For example the number of accidents per month at a busy intersection (junction)
has a Poisson distribution.
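Example 16.15. On average the school photocopier breaks down 8 times during the school week (Monday-Friday).
Assuming that the number of breakdowns can be modelled by a Poisson distribution, find the
probability that it breaks down;
2. Once on a Monday
3. 8 times in a fortnight
Solution. Let λ = 8 be the mean number of breakdowns per school week. The Poisson probability function is
\[ f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!} \]
(a). With λ = 8,
\[ P(X = 5) = \frac{e^{-8} \, 8^5}{5!} \approx 0.0916 \]
(b). A single day is one fifth of the school week, so λ = 8/5 = 1.6 for a Monday and
\[ P(X = 1) = \frac{e^{-1.6} \, (1.6)^1}{1!} \approx 0.3230 \]
(c). λ = 8 per week ⇒ λ = 16 in two weeks
\[ P(X = 8) = \frac{e^{-16} \, 16^8}{8!} \approx 0.0120 \]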
Example 16.16. The average number of trucks arriving on any one day at a truck depot in a certain city is
known to be 12. What is the probability that on a given day fewer than nine trucks will arrive at the depot?
Solution. Let X be the number of trucks arriving on a given day, so that λ = 12. Then
\[ P(X < 9) = \sum_{x=0}^{8} p(x; 12) = 0.1550 \]
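The cumulative probability in this solution is normally read from Poisson tables, but it can also be summed term by term. The sketch below takes λ = 12, the value used in the solution; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Fewer than nine trucks means X = 0, 1, ..., 8.
p_fewer_than_9 = sum(poisson_pmf(x, 12) for x in range(9))
print(f"{p_fewer_than_9:.4f}")  # 0.1550
```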
Example 16.17. A maximum security prison reports that the number of escape attempts by prisoners per
month has approximately a Poisson distribution with mean equal to 1.5. Find
1. The probability of exactly 3 escape attempts during the next month.
2. The probability of at least one escape attempt during the next month.
Solution.
1. \[ P(X = 3) = f(3) = \frac{1.5^3 \, e^{-1.5}}{3!} = 0.1255 \]
2. \[ P(X \ge 1) = 1 - f(0) = 1 - \frac{1.5^0 \, e^{-1.5}}{0!} = 0.7769 \]
1. X ∼ P (λ) with standard deviation 1.5. Find the P (X ≥ 3).
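X ∼ B(500, 0.007)
E(X) = np = 500(0.007) = 3.5 = λ
(a). \[ P(X = 3) = \frac{e^{-\lambda} \lambda^3}{3!} = \frac{e^{-3.5} (3.5)^3}{3!} \approx 0.2158 \]
(b). \begin{align*}
P(X \ge 2) &= P(X = 2) + P(X = 3) + \cdots \\
     &= 1 - [P(X = 0) + P(X = 1)] \\
     &= 1 - e^{-3.5}(1 + 3.5) \approx 0.8641
\end{align*}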
If we define a variable X as the number of heads observed when a fair coin is tossed three times, then X
takes values 0, 1, 2, 3, where
Hence to each sample point in S we have assigned a unique real number.
The variable X is called a random variable defined on the sample space S.
We can also find the probabilities of values 0, 1, 2, 3 of the random variable X as follows
Note: Here, the uppercase X is used for the random variable and lowercase x is used to denote (represent)
a realization of X. Probabilities can be easily obtained from the probability distribution table as follows:
Probability of getting two or more heads
\[ P(X > 1) = P(X = 2) + P(X = 3) = \frac{3}{8} + \frac{1}{8} = \frac{1}{2} \]
Probability of getting at least one head
\[ P(X > 0) = 1 - P(X = 0) = 1 - \frac{1}{8} = \frac{7}{8} \]
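The distribution of X for three tosses of a fair coin can be built by listing the sample space and counting heads. The following sketch uses exact fractions; all names in it are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
sample_space = list(product("HT", repeat=3))

# X maps each outcome to its number of heads; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

p_two_or_more = distribution[2] + distribution[3]
p_at_least_one = 1 - distribution[0]
print(distribution)
print(p_two_or_more, p_at_least_one)  # 1/2 7/8
```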
A random variable X defined on the sample space S may take finitely or infinitely many values. If it takes
only countable values (counts rather than measurements with decimals), it is called a discrete random variable. A discrete
random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, · · · .
Examples of discrete random variables include the number of students in a class, the number of teams in a
football tournament, the number of public holidays in a year, the number of guests in attendance at a party, etc. On
the other hand, some variables such as height, weight and income can take any value in a given range and
are called continuous random variables.
X P (x) xP (x)
-2 0.14 -0.28
-1 0.2 -0.2
0 0.28 0
1 0.28 0.28
2 0.1 0.2
Total 1 0
Example 17.2. A random variable follows the probability distribution given below;
X 0 1 2 3 4
P (X) 0.12 0.23 k 0.20 0.10
Obtain the value of k, and hence compute the expected value of X.
k = 0.35, E(X) = 1.93 and Var(X) = E(X²) − [E(X)]² = 5.03 − 1.93² ≈ 1.3051
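The three quantities in Example 17.2 follow from the table alone: k makes the probabilities sum to 1, and the mean and variance are weighted sums. A minimal sketch, with illustrative variable names, is:

```python
# Known probabilities from the table; the entry for x = 2 is the unknown k.
known = {0: 0.12, 1: 0.23, 3: 0.20, 4: 0.10}

k = 1 - sum(known.values())          # probabilities must sum to 1
probabilities = {**known, 2: k}

mean = sum(x * p for x, p in probabilities.items())
second_moment = sum(x**2 * p for x, p in probabilities.items())
variance = second_moment - mean**2   # Var(X) = E(X^2) - [E(X)]^2

print(round(k, 2), round(mean, 2), round(variance, 4))
```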
Example 17.3. A coin is such that the tail is thrice as likely as the head. A game is played such that you earn
5 points for a head and lose 2 points for a tail after every toss. Let X be the total score after 4 consecutive
tosses. Find the probability distribution of X and the expected number of points.
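Solution. Let H be the event of observing a head and let N be the number of heads in the n = 4 tosses. Then P(H) = 0.25 and
P(T) = 0.75, so N ∼ B(4, 0.25). Each head earns 5 points and each tail loses 2 points, so the total score is
X = 5N − 2(4 − N) = 7N − 8. Using the binomial formula, we have
x −8 −1 6 13 20
P (X = x) 81/256 108/256 54/256 12/256 1/256
\begin{align*}
E(X) &= \sum_{\text{all } x} x \cdot p(x) \\
     &= \frac{(-8)(81) + (-1)(108) + (6)(54) + (13)(12) + (20)(1)}{256} \\
     &= \frac{-256}{256} \\
     &= -1
\end{align*}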
Question 17.1. A discrete random variable X takes the following values with the corresponding probabili-
ties;
x −3 −1 0 1 2 3
P (x) 0.1 0.2 0.1 0.2 0.15 0.25
Compute the following probabilities; (a). P (X = −1) (b). P (X = −2)
(c). P (X ≤ 0) (d). P (X is negative) (e). E(X)
Question 17.2. A random variable X has the following probability distribution;
x 10 20 30 40
P (X = x) a 2a 4a 3a
Find the value of a hence the expected value of X.
Question 17.3. The table below gives the probability distribution function of a random variable X.
x 0 1 2 3
P (X = x) p 2q p+q q
Given that the mean of X is 1.375, find the values of p and q.
Question 17.4. A game is played as follows: throw a fair four-sided die and score four times the number
that faces down unless it is a four. If it is a four, you toss a fair coin whose sides are assigned 0 for tail and 2
for head, and score 5 times the coin's score. Let X be a random variable denoting the score for each player.
Represent this information on a tree diagram showing the values of X and the corresponding probabilities, and
hence find;
(c). the probability that you will score more than the expected value.
17.3 Variance
The variance of a probability distribution of a discrete random variable provides a numerical measure of the
spread. It is the sum, over all values of the random variable, of the squared deviation of each value from the
mean multiplied by the probability of that value.
Thus variance is given by the formula:
\[ \sigma^2 = \mathrm{Var}(X) = E\left[(X - E(X))^2\right] = \sum_{\text{all } x} (x - \mu)^2 P(x) \]
and the standard deviation is the square root of the variance.
From our example above of the three fair coins being tossed once, we can calculate the value of the variance,
as follows, knowing that the mean of the distribution is 1.5.
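The calculation can be sketched in a few lines, taking the distribution of the number of heads for three fair coins (1/8, 3/8, 3/8, 1/8) as given. The variable names are illustrative; exact fractions are used so the result is not affected by rounding.

```python
from fractions import Fraction
from math import sqrt

# Number of heads in three tosses of a fair coin and its probabilities.
distribution = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in distribution.items())
# Variance as the probability-weighted sum of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 4))  # 3/2 3/4 0.866
```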
Question 17.5. Determine the mean, variance, and standard deviation of the following discrete probability
distribution.
x 0 1 2 3 4
P (x) 0.10 0.30 q 0.20 0.10
Question 17.6. A random variable X has the following probability distribution:
X: -2 -1 0 1 2 3
P (x): 0.1 k 0.2 2k 0.3 k
Find the value of k. Find the expected value and variance of X.
Question 17.7. A random variable X has the following probability distribution:
X: 0 1 2 3 4 5
P (x): 0.1 0.1 0.2 k 0.2 0.1
Find the value of k. Find the expected value and variance of X.
Question 17.8. An unbiased coin is tossed four times. Find the expected value and variance of the random
variable defined as number of Heads.
Example 17.6. The number of telephone calls received in an office between 9.00 A.M - 10.00 A.M has the
probability distribution as shown in the table below:
The Probability distribution of the number of telephone calls.
No. of calls Probability P (X)
0 0.05
1 0.20
2 0.25
3 0.20
4 0.10
5 0.15
6 0.05
(a). Verify that it is a probability function.
(b). Find the probability that there will be 3 or more calls.
(c). Find the probability that there will be even number of calls.
Solution.
(a). The function is a probability function because
(i). $0 \le P(x_i) \le 1$ for every value $x_i$, and
(ii). $\sum_{i=1}^{n} P(x_i) = 0.05 + 0.20 + 0.25 + 0.20 + 0.10 + 0.15 + 0.05 = 1$
(b).
P (X ≥ 3) = P (X = 3) + P (X = 4) + P (X = 5) + P (X = 6)
= 0.20 + 0.10 + 0.15 + 0.05
= 0.50
(c).
P (X = 0 or 2 or 4 or 6) = P (X = 0) + P (X = 2) + P (X = 4) + P (X = 6)
= 0.05 + 0.25 + 0.10 + 0.05
= 0.45
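The three parts of Example 17.6 can be re-computed directly from the table of probabilities. The sketch below assumes only the values listed in the example; the variable names are illustrative.

```python
# P(X = x) for x = 0, 1, ..., 6 telephone calls, as listed in the table.
probabilities = [0.05, 0.20, 0.25, 0.20, 0.10, 0.15, 0.05]

# (a) A probability function needs every value in [0, 1] and a total of 1.
is_valid = all(0 <= p <= 1 for p in probabilities) and abs(sum(probabilities) - 1) < 1e-9

# (b) Three or more calls: x = 3, 4, 5, 6.
p_three_or_more = sum(probabilities[3:])

# (c) An even number of calls: x = 0, 2, 4, 6.
p_even = sum(probabilities[0::2])

print(is_valid, round(p_three_or_more, 2), round(p_even, 2))
```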
Each value of the random variable is multiplied by the probability of occurrence of this value and then all
these products are summed up.
It is also common in statistical literature to refer to the mean as Mathematical Expectation or the Expected
value of the random variable X.
Example 17.7. Assume that we have three fair coins and we toss them simultaneously. The possible numbers
of heads that can appear as a result of the random experiment are given in the following table:
Outcomes No. of Heads Probability
TTT 0 1/8
HTT 1 1/8
TTH 1 1/8
THT 1 1/8
THH 2 1/8
HHT 2 1/8
HTH 2 1/8
HHH 3 1/8
The table can be summarized as to the number of heads occurring in the entire experiment and their respective
probabilities as follows:
Number of Heads (X) P (X) X · P (X)
0 1/8 0
1 3/8 3/8
2 3/8 6/8
3 1/8 3/8
1.0 12/8
The expected value (mean) for the number of heads in this experiment is
\begin{align*}
E(X) &= \sum_{i=1}^{n} x_i P(x_i), \quad i = 1, 2, \ldots, n \\
     &= 0 + \frac{3}{8} + \frac{6}{8} + \frac{3}{8} \\
     &= \frac{12}{8} = 1.5
\end{align*}
This means that, on average, 1.5 heads can be expected to appear per repetition of the random experiment
of tossing three fair coins at one time.
Example 17.8. In the telephone calls problem above, find the mean number of telephone calls between 9 and 10 am.
X P (X) X · P (X)
0 0.05 0
1 0.20 0.2
2 0.25 0.5
3 0.20 0.6
4 0.10 0.4
5 0.15 0.75
6 0.05 0.30
Total 1.00 2.75
\[ \mu = \sum_{i=0}^{6} x_i P(x_i) = 2.75 \]
Example 17.9. Suppose the hourly earnings X of a self-employed landscape gardener are given by the
following probability function.
Hourly Earning X: 0 6 12 16
P (X) : 0.3 0.2 0.3 0.2
Find the gardener’s mean hourly earnings.
Solution. The mean is given as:
\[ E(X) = \sum x P(x) = 0(0.3) + 6(0.2) + 12(0.3) + 16(0.2) = 8 \]
• Checking items from a production line: success = not defective, failure = defective.
P (X = 1) = p
P (X = 0) = 1 − p
If an experiment has two possible outcomes, ‘success’ and ‘failure’ and their probabilities are respectively
p and 1 − p, then the number of successes, 0 or 1 has a Bernoulli distribution.
Definition 17.1. A random variable X has a Bernoulli distribution, and is referred to as a Bernoulli random
variable, if and only if its probability distribution is given by
\[ f(x; p) = p^x (1 - p)^{1 - x} \quad \text{for } x = 0, 1. \]
In connection with the Bernoulli distribution, a success may be getting heads with a balanced coin, it may
be catching pneumonia, it may be passing (or failing) an examination and it may be losing a race.
Note: The Bernoulli distribution is a special case of the Binomial distribution (the case n = 1). The mean and variance of the
Bernoulli distribution are given as
\[ E(X) = p \quad \text{and} \quad \mathrm{Var}(X) = \sigma^2 = p(1 - p) \]
Question 17.9. Show that E(X) = p and Var(X) = p(1 − p) for a random variable X which has a Bernoulli
distribution.
\begin{align*}
E(X) &= \sum_{\text{all } x} x\,P(x) = \sum_{x=0}^{1} x\,p^x (1 - p)^{1 - x} \\
     &= 0 \cdot p^0 (1 - p)^1 + 1 \cdot p^1 (1 - p)^{1 - 1} \\
     &= 0 + p(1 - p)^0 = p
\end{align*}
\begin{align*}
\mathrm{Var}(X) &= \sum_{\text{all } x} (x - \mu)^2 P(x) = E(X^2) - [E(X)]^2 \\
     &= \left[ 0^2 (1 - p) + 1^2 \cdot p \right] - p^2 \\
     &= p - p^2 = p(1 - p).
\end{align*}
The moment generating function of a Bernoulli distribution is given by
\begin{align*}
M_X(t) = E(e^{tX}) &= \sum_{x=0}^{1} e^{tx} p^x (1 - p)^{1 - x} \\
     &= e^{0} \cdot p^0 (1 - p)^1 + e^{t} \cdot p^1 (1 - p)^0 \\
     &= (1 - p) + e^t p \\
     &= 1 - p + e^t p \\
     &= p(e^t - 1) + 1
\end{align*}
Example 17.10. A carton contains 4 good eggs and 6 bad eggs. If an egg is selected at random, then the
random variable
\[ X = \begin{cases} 0 & \text{if the egg is bad} \\ 1 & \text{if the egg is good} \end{cases} \]
has a Bernoulli distribution with
\[ P(\text{good egg}) = \frac{4}{10} = \frac{2}{5}, \qquad P(X = 1) = (2/5)^1 \cdot (1 - 2/5)^0 = 2/5 \]
E(X) = 2/5, Var(X) = p(1 − p) = (2/5)(1 − 2/5) = 6/25.
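The mean and variance in this egg example can be confirmed by summing over the two values of X. The sketch below uses exact fractions and follows the Bernoulli formula from the definition; the function name `bernoulli_pmf` is an assumption of this example.

```python
from fractions import Fraction

def bernoulli_pmf(x: int, p: Fraction) -> Fraction:
    """f(x; p) = p^x (1 - p)^(1 - x) for x = 0 or 1."""
    return p**x * (1 - p) ** (1 - x)

p = Fraction(4, 10)  # 4 good eggs out of 10

mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(mean, variance)  # 2/5 6/25
```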
BERNOULLI TRIALS: Many experiments consist of a sequence of trials, where
(i) each trial results in a “success” or a “failure,”
(ii) the trials are independent, and
(iii) the probability of “success,” denoted by p, 0 < p < 1, is the same on every trial.
If the above conditions are satisfied, then the number of successes X in n trials is said to follow a binomial distribution. If a Bernoulli experiment
is performed repeatedly, we obtain a sequence of Bernoulli trials. In a finite sequence of Bernoulli trials,
we are usually interested in the number of successes. Since there are only two possible outcomes at each
trial, a sample of n Bernoulli trials contains $2^n$ possible outcomes. As before, X = 0 or 1 for each trial. The
probability of any particular sequence of outcomes in n trials is obtained as the product of the probabilities
of the n outcomes resulting at each trial.
Definition 17.2. A random variable X has a binomial distribution, and is referred to as a binomial random
variable, if and only if its probability distribution is given by
\[ f(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n \]
where
n - number of trials
p - probability of success
1 − p - probability of failure
x - value of the random variable (the number of successes in n trials)
\[ \binom{n}{x} = \frac{n!}{x!(n - x)!} \]
Thus, the number of successes in n trials is a random variable having a binomial distribution with parameters
n and p. The name ‘binomial distribution’ is derived from the fact that the values of b(x; n, p) for x =
0, 1, 2, · · · , n are successive terms of the binomial expansion $[(1 - p) + p]^n$; this shows that the sum of the
probabilities equals 1, as it should.
Example 17.11. Each of the following situations represent binomial experiments. (Are you satisfied with
the Bernoulli assumptions in each instance?)
(a) Suppose we flip a fair coin 10 times and let Y denote the number of tails in 10 flips. Here, Y ∼ b(n =
10; p = 0.5).
(b) In an agricultural experiment, forty percent of all plots respond to a certain treatment. I have four plots
of land to be treated. If Y is the number of plots that respond to the treatment, then Y ∼ b(n = 4; p =
0.4).
(c) In rural Kenya, the prevalence rate for HIV is estimated to be around 8 percent. Let Y denote the
number of HIV infected in a sample of 740 individuals. Here, Y ∼ b(n = 740; p = 0.08).
(d) It is known that screws produced by a certain company do not meet specifications (i.e., are defective)
with probability 0.001. Let Y denote the number of defectives in a package of 40. Then, Y ∼ b(n =
40; p = 0.001).
(e) Toss a fair coin 100 times and let X be the number of heads. Then X ∼ B(100, 0.5).
(f) A certain kind of lizard lays 8 eggs, each of which will hatch independently with probability 0.7. Let
X denote the number of eggs which hatch. Then X ∼ B(8, 0.7).
(g) What is the probability that at least 9 out of a group of 10 people who have been infected by a serious
disease will survive, if the survival probability for the disease is 70%?
This is the probability of having x successes in a series of n independent trials when the probability of
success in any one of the trials is p. If X is a random variable with this probability distribution,
\begin{align*}
E(X) &= \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n - x} \\
     &= \sum_{x=0}^{n} x \, \frac{n!}{x!(n - x)!} \, p^x (1 - p)^{n - x} \\
     &= \sum_{x=1}^{n} \frac{n!}{(x - 1)!(n - x)!} \, p^x (1 - p)^{n - x}
\end{align*}
since the x = 0 term vanishes. Let y = x − 1 and m = n − 1. Substituting x = y + 1 and n = m + 1 into the
last sum (and using the fact that the limits x = 1 and x = n correspond to y = 0 and y = m, respectively)
\begin{align*}
E(X) &= \sum_{y=0}^{m} \frac{(m + 1)!}{y!(m - y)!} \, p^{y + 1} (1 - p)^{m - y} \\
     &= (m + 1)p \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, p^y (1 - p)^{m - y} \\
     &= np \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, p^y (1 - p)^{m - y}
\end{align*}
The remaining sum is the binomial expansion of $[p + (1 - p)]^m = 1$, so that
\[ E(X) = np \]
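Let us make use of the fact that $E(X^2) = E[X(X - 1)] + E(X)$ and first evaluate E[X(X − 1)]. We proceed as before,
but this time using y = x − 2 and m = n − 2.
\begin{align*}
E[X(X - 1)] &= \sum_{x=0}^{n} x(x - 1) \binom{n}{x} p^x (1 - p)^{n - x} \\
     &= \sum_{x=0}^{n} x(x - 1) \, \frac{n!}{x!(n - x)!} \, p^x (1 - p)^{n - x} \\
     &= \sum_{x=2}^{n} \frac{n!}{(x - 2)!(n - x)!} \, p^x (1 - p)^{n - x} \\
     &= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!(n - x)!} \, p^{x - 2} (1 - p)^{n - x} \\
     &= n(n - 1)p^2 \sum_{y=0}^{m} \frac{m!}{y!(m - y)!} \, p^y (1 - p)^{m - y} \\
     &= n(n - 1)p^2 \, (p + (1 - p))^m \\
     &= n(n - 1)p^2
\end{align*}
So the variance of X is
\begin{align*}
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 = E[X(X - 1)] + E(X) - [E(X)]^2 \\
     &= n(n - 1)p^2 + np - n^2 p^2 \\
     &= np - np^2 = np(1 - p).
\end{align*}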
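MGF FOR THE BINOMIAL DISTRIBUTION: Suppose that Y ∼ b(n; p). Then the mgf of Y is
given by
\begin{align*}
M_Y(t) = E(e^{tY}) &= \sum_{y=0}^{n} e^{ty} \binom{n}{y} p^y (1 - p)^{n - y} \\
     &= \sum_{y=0}^{n} \binom{n}{y} (pe^t)^y (1 - p)^{n - y} = (q + pe^t)^n,
\end{align*}
where q = 1 − p (on each trial, P(X = 1) = p and P(X = 0) = 1 − p).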
b(x; n, θ) = b(n − x, n, 1 − θ)
2. Each trial has only two possible, mutually exclusive, outcomes which are termed as a ’success’ or a
’failure’, or “good” or “bad”, “Head” or “Tail” etc.
3. The probability of a success, denoted by p, is known and remains constant from trial to trial. The
probability of a failure, denoted by q, is equal to 1 − p, such that p + q = 1.
4. Different trials are independent, i.e., outcome of any trial or sequence of trials has no effect on the
outcome of the subsequent trials.
The sequence of trials under the above assumptions is also termed as Bernoulli Trials.
If the above conditions are satisfied then X is said to follow a Binomial distribution.
Definition 17.3. A random variable X has a Binomial distribution, and is referred to as a Binomial random
variable, if and only if its probability distribution is given by
\[ f(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n \]
where
n - number of trials
p - probability of success
1 − p - probability of failure
x - value of the random variable (the number of successes in n trials)
\[ \binom{n}{x} = \frac{n!}{x!(n - x)!} \]
Thus, the number of successes in n trials is a random variable having a Binomial distribution with
parameters n and p. The name ‘Binomial distribution’ is derived from the fact that the values of b(x; n, p)
for x = 0, 1, 2, · · · , n are successive terms of the Binomial expansion $[(1 - p) + p]^n$; this shows that the
sum of the probabilities equals 1, as it should.
This distribution is known as the binomial distribution with index n and probability p. We write this as
X ∼ Bin(n, p).
Example 17.15. Five independent trials of an experiment are carried out; the probability of a successful
outcome is p and that of a failure is q. Write out the probability distribution of X, where X is the number
of successful outcomes in five trials. Comment on your answer.
Example 17.16. The random variable X is defined binomially B(7, 0.2). Find to 3 d.p.
(a). P (X = 3) (b). P (1 < X ≤ 4) (c). P (X > 1)
(d). P (X ≤ 3) (e). P (X ≥ 3)
Example 17.17. A risky operation used for patients with no hope for survival has a survival rate of 80%.
Find the probability that exactly 4 of the next 5 patients operated on will survive. Ans: 0.4096
Example 17.18. A quiz, has 6 multiple choice questions, each with 3 alternatives. Find the probability of
getting five or more correct. Ans: 0.0178
Example 17.19. A box contains a large number of pens. The probability that a pen is faulty is 0.1. How
many pens would you need to select to be more than 95% certain of picking at least one faulty pen.
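One way to approach Example 17.19 is to note that P(at least one faulty pen among n) = 1 − 0.9^n and to search for the smallest n that pushes this above 0.95. The sketch below assumes that pens are faulty independently of one another, as the binomial model requires; the names are illustrative.

```python
def smallest_sample_size(p_faulty: float, certainty: float) -> int:
    """Smallest n with P(at least one faulty pen in n) = 1 - (1 - p)^n > certainty."""
    n = 1
    while 1 - (1 - p_faulty) ** n <= certainty:
        n += 1
    return n

n_pens = smallest_sample_size(0.1, 0.95)
print(n_pens)  # 29
```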
Example 17.20. Show that E(X) = np and Var(X) = npq if X ∼ B(n, p).
Example 17.21. The probability that it will be a fine day is 0.4.
Example 17.22. A biased coin is tossed four times and the number of heads noted. The experiment was
repeated 500 times in all. The results are summarized below:
Number of heads: 0 1 2 3 4
Frequency: 12 50 151 200 87
1. From the data, estimate the probability of obtaining a head when the coin is tossed.
2. Using a binomial distribution with the same mean, calculate the theoretical frequencies of 0, 1, 2, 3, 4 heads.
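A possible route through Example 17.22 is to equate the sample mean number of heads to np with n = 4, and then scale the fitted binomial probabilities by the 500 repetitions. The sketch below follows that assumption (fitting by matching the mean); the variable names are illustrative.

```python
from math import comb

heads = [0, 1, 2, 3, 4]
observed = [12, 50, 151, 200, 87]
repetitions = sum(observed)  # 500 repetitions of the four-toss experiment
n_tosses = 4

# 1. Sample mean number of heads per experiment, then p = mean / n.
sample_mean = sum(x * f for x, f in zip(heads, observed)) / repetitions
p_hat = sample_mean / n_tosses

# 2. Theoretical frequencies from B(4, p_hat), scaled to 500 repetitions.
expected = [
    repetitions * comb(n_tosses, x) * p_hat**x * (1 - p_hat) ** (n_tosses - x)
    for x in heads
]

print(sample_mean, p_hat)
print([round(e, 2) for e in expected])
```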
Example 17.23. The probability of a student being awarded a distinction in mathematics is 0.05. In a
randomly selected group of 50 students, what is the most likely number of students awarded a distinction?
E(X) = λ, Var(X) = λ.
(i) Events occur singly and at random in a given interval of time or space.
A random variable X has a Poisson distribution if the above conditions are satisfied.
The distribution is useful in describing the number of events that will occur in a specific period of time or
a specific area or volume. For example the number of accidents per month at a busy intersection (junction)
has a Poisson distribution.
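Example 17.24. On average the school photocopier breaks down 8 times during the school week (Monday-Friday).
Assuming that the number of breakdowns can be modelled by a Poisson distribution, find the
probability that it breaks down;
2. Once on a Monday
3. 8 times in a fortnight
Solution. Let λ = 8 be the mean number of breakdowns per school week. The Poisson probability function is
\[ f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!} \]
(a). With λ = 8,
\[ P(X = 5) = \frac{e^{-8} \, 8^5}{5!} \approx 0.0916 \]
(b). A single day is one fifth of the school week, so λ = 8/5 = 1.6 for a Monday and
\[ P(X = 1) = \frac{e^{-1.6} \, (1.6)^1}{1!} \approx 0.3230 \]
(c). λ = 8 per week ⇒ λ = 16 in two weeks
\[ P(X = 8) = \frac{e^{-16} \, 16^8}{8!} \approx 0.0120 \]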
Example 17.25. The average number of trucks arriving on any one day at a truck depot in a certain city is
known to be 12. What is the probability that on a given day fewer than nine trucks will arrive at the depot?
Solution. Let X be the number of trucks arriving on a given day, so that λ = 12. Then
\[ P(X < 9) = \sum_{x=0}^{8} p(x; 12) = 0.1550 \]
Example 17.26. A maximum security prison reports that the number of escape attempts by prisoners per
month has approximately a Poisson distribution with mean equal to 1.5. Find
1. The probability of exactly 3 escape attempts during the next month.
2. The probability of at least one escape attempt during the next month.
Solution.
1. \[ P(X = 3) = f(3) = \frac{1.5^3 \, e^{-1.5}}{3!} = 0.1255 \]
2. \[ P(X \ge 1) = 1 - f(0) = 1 - \frac{1.5^0 \, e^{-1.5}}{0!} = 0.7769 \]
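Both answers in this solution can be reproduced from the Poisson probability function with λ = 1.5. A minimal sketch, with an illustrative helper name, is:

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """f(x) = e^(-lam) lam^x / x! for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1.5  # mean number of escape attempts per month
p_exactly_three = poisson_pmf(3, lam)
p_at_least_one = 1 - poisson_pmf(0, lam)  # complement of "no attempts"
print(round(p_exactly_three, 4), round(p_at_least_one, 4))  # 0.1255 0.7769
```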
1. X ∼ P (λ) with standard deviation 1.5. Find the P (X ≥ 3).
2. Show that if X ∼ P (λ), then $M(t) = e^{\lambda(e^t - 1)}$.
(a). \[ P(X = 3) = \frac{e^{-\lambda} \lambda^3}{3!} \]
(b). \begin{align*}
P(X \ge 2) &= P(X = 2) + P(X = 3) + \cdots \\
     &= 1 - [P(X = 0) + P(X = 1)] \\
     &= 1 - e^{-\lambda}(1 + \lambda)
\end{align*}