Sta 101 Note PDF

This document provides lecture notes for an introductory statistics course at the University of Abuja. It outlines the course aims, objectives, contents including topics like data collection, frequency distributions, probability, and correlation. The notes define statistics and discuss primary methods of data collection and classification of data. The importance of statistics across many fields is also highlighted.

Lecture Notes for

STA101
Introductory Statistics
(2 UNITS)
Department of
Statistics
University of Abuja
These lecture notes are not for sale.

Course Aim and Objectives

This course introduces the statistics required of 100-level (100L) students.


On successful completion of the course, you are expected to have
adequate knowledge of the course contents listed below:

Course Contents:

· Definitions and classifications of Statistics

· Data, methods of data collection, Classification and Tabulation

· Frequency distributions (grouped and ungrouped)

· Diagrammatic and Graphic Representations of data

· Averages (Measures of Central Tendency), Measures of Dispersion, Skewness and Kurtosis

· Theory of Probability (Basic definitions, addition and multiplication theorems, Bayes's Rule)

· Random Variable and Mathematical Expectation

· Probability Distributions (Binomial, Poisson, and Normal Distributions)

· Relationship between Variables (Simple Correlation and Regression Analysis)

Texts

This course has detailed lecture notes, so it should not be necessary to buy a book for it. However, you may consult any elementary book on Statistics.

Lecture notes:

The course attempts to convey a large amount of information in a short space of time. Some of the material is of a technical nature and may not be covered explicitly in the lectures and classes. You are expected to read the lecture notes thoroughly. The syllabus is defined by the contents of the lecture notes. Additional reading is suggested at the end of each week’s classes. Despite my best efforts, there will be mistakes in the notes. If you spot something that looks wrong, please let me know. Solutions to some problems have been provided, while some have been intentionally omitted. All problems will be solved together in the lecture rooms.

Definitions of Statistics

Statistics has been defined differently by different writers from time to time, so much so that scholarly articles have collected together hundreds of definitions, emphasizing precisely the meaning, scope and limitations of the subject. The reasons for such a variety of definitions may be broadly classified as follows:

i. The field of utility of statistics has been increasing steadily, and thus different people defined it differently according to the development of the subject. In the old days statistics was regarded as the “science of statecraft”, but today it embraces almost every sphere of natural and human activity. Accordingly, the old definitions, which were confined to a very limited and narrow field of enquiry, were replaced by new definitions which are more exhaustive and elaborate in approach.

ii. The word statistics has been used to convey different meanings in the singular and plural senses. When used in the plural, statistics means a numerical set of data; when used in the singular, it means the science of statistical methods, embodying the theory and techniques used for collecting, analyzing and drawing inferences from numerical data.

We give below some selected definitions of statistics.

1. “Statistics are the classified facts representing the conditions of the people in a state, specially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.” - Webster

2. “Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.” - Bowley

3. “By statistics we mean quantitative data affected to a marked extent by multiplicity of causes.” - Yule and Kendall

4. “Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other.” - Prof. Horace Secrist

Classification of Statistics

It has become accepted in today’s world that in order to learn about something, you must first collect data. Statistics is the art of learning from data. It is concerned with the collection of data, their subsequent description, and their analysis, which often leads to the drawing of conclusions. Suppose, for instance, that an experiment compares two teaching methods by recording the scores of two groups of students. At the end of the experiment the data should be described: the scores of the two groups should be presented. In addition, summary measures such as the average score of the members of each group should be presented. This part of statistics, concerned with the description and summarization of data, is called descriptive statistics. After the experiment is completed and the data are described and summarized, we hope to be able to draw a conclusion about which teaching method is superior. This part of statistics, concerned with the drawing of conclusions, is called inferential statistics.

Importance of Statistics

To a very striking degree our culture has become a statistical culture. Even a person who may never have heard of an index number is affected by those index numbers which describe the cost of living. It is impossible to understand psychology, sociology, economics, finance or the natural and physical sciences without some general idea of the meaning of an average, of variation, of concomitance, of sampling, and of how to interpret charts and tables. Statistics is important in all areas of human endeavour. For example, we have statistics in planning, in the state, in mathematics, in physics, in chemistry, in biology, in economics, in industry, in insurance, in astronomy, in psychology, in education, in war and in medical science, among others. The list of areas where statistics is important is endless.

Collection of Data

Statistics are sets of numerical data. In fact, only numerical data constitute statistics. This means that the phenomenon under study must be capable of quantitative measurement. Thus, the raw material of statistics always originates from the operation of counting (enumeration) or measurement. For any statistical enquiry, whether in business, economics or the sciences, the basic problem is to collect facts and figures relating to the particular phenomenon under study. The items on which the measurements are taken are called statistical units. On the face of it, it might appear that the collection of data is the first step in any statistical investigation. But in a scientifically prepared (efficient and well-planned) statistical enquiry, the collection of data is by no means the first step. Before we embark upon the collection of data for any given statistical enquiry, it is imperative to examine carefully the following points, which may be termed the preliminaries to data collection: the objectives and scope of the enquiry, the statistical units to be used, the sources of information (data), the method of data collection, the degree of accuracy aimed at in the final result, and the type of enquiry. Good data, moreover, possess the following characteristics: they should be unambiguous, specific, uniform, stable and appropriate.

Methods of Data Collection

Primary and Secondary Data

The data which are originally collected by an investigator or agency for the first time for any statistical investigation, and used by them in the statistical analysis, are termed primary data. On the other hand, the data (published or unpublished) which have already been collected and processed by some agency or person, and are taken over from there and used by any other agency or person for their statistical work, are termed secondary data as far as the second agency is concerned. The second agency, if and when it publishes and files such data, becomes the secondary source to anyone who later uses these data. In other words, a secondary source is an agency which publishes or releases for use by others data which were not originally collected and processed by it. It may be observed that the distinction between primary and secondary data is a matter of degree or relativity only. The same set of data may be secondary in the hands of one and primary in the hands of others. In general, the data are primary to the source who collects and processes them for the first time and are secondary for all sources who later use such data.

The methods commonly used for the collection of primary data are as follows:

· Direct personal investigation

· Indirect oral interviews

· Information received through local agencies

· Mailed questionnaire method

· Schedules sent through enumerators

The chief sources of secondary data may be broadly classified into the following two groups: published sources and unpublished sources.

Classification and Tabulation

Classification: it is of interest to give below the following definitions of classification:

i. Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts. - Secrist

ii. A classification is a scheme for breaking a category into a set of parts, called classes, according to some precisely defined differing characteristics possessed by all the elements of the category. - A. M. Tuttle

Thus classification impresses upon the arrangement of the data into different classes, which are to be determined depending upon the nature, objectives and scope of the enquiry. For instance, the students registered in the University of Abuja may be classified on the basis of any of the following criteria:

· Sex

· Age

· The state to which they belong

· Religion

· Different faculties

· Heights or weights, and so on

The functions of classification may be briefly summarized as follows:

i. It condenses the data

ii. It facilitates comparisons

iii. It helps to study the relationships between groups of data

iv. It facilitates the statistical treatment of the data

The basis, or criterion, with respect to (w.r.t.) which the data are classified depends primarily on the objectives and the purpose of the enquiry. Generally, the data can be classified on the following four bases:

i. Geographical i.e., Area-wise or regional

ii. Chronological i.e., w.r.t occurrence of time

iii. Qualitative i.e., w.r.t some character or attribute

iv. Quantitative i.e., w.r.t numerical values or magnitudes

Frequency Distribution

A frequency table is a table that displays all the observed values of a variable under study and shows how many times each value occurs. The distribution of the total number of observations among the various categories or classes of the variable is called a frequency distribution.

The organization of the data pertaining to a quantitative phenomenon involves the following four stages:

i. The set or series of individual observations - unorganized (raw) or organized (arrayed) data

ii. Discrete or ungrouped frequency distribution

iii. Continuous frequency distribution

iv. Grouped frequency distribution

We shall explain the various stages by means of a numerical illustration.

Example: the table below is a frequency table distributing 100 female students in a class according to their marital status (nominal data).

S/N   Marital Status   Frequency (number in each class)
1.    Married          28
2.    Divorced         11
3.    Separated        17
4.    Single           44

Relative frequency

The relative frequency of a class is its frequency divided by the total number of observations. This is very useful when we need to compare two data sets of different sizes.
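As a minimal sketch of this idea, the relative frequencies for the marital status table above can be computed as follows. The variable names are illustrative only and are not part of the notes.

```python
# Relative frequency = class frequency / total frequency.
# Data: the marital status table of 100 female students given above.
marital_status = {"Married": 28, "Divorced": 11, "Separated": 17, "Single": 44}

total = sum(marital_status.values())  # total number of observations

relative_frequency = {
    status: count / total for status, count in marital_status.items()
}

for status, rf in relative_frequency.items():
    print(f"{status:<10} {rf:.2f}")
```

Because relative frequencies always add up to 1, two tables built from samples of different sizes can be compared row by row.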

Grouped Frequency Distribution

Frequently one has a collection of data whose tabulation would result in a lengthy frequency table. To summarize such data sets and make them more comprehensible, we often collapse the data into fewer classes by grouping them. A number of rules of thumb have been proposed for calculating the proper number of classes. However, an elegant, though approximate, formula is the one given by Prof. Sturges, known as Sturges's rule, according to which

$$K = 1 + 3.322 \log_{10} N$$

where K is the number of class intervals (classes) and N is the total frequency, i.e., the total number of observations in the data. The value obtained is rounded to the next higher integer.

Accordingly, Sturges's formula very ingeniously restricts the number of classes to between 4 and 20, which is a fairly reasonable number from a practical point of view. The rule, however, fails if the number of observations is very large or very small.

One also has to decide on the width of the class intervals (or size of the class intervals). In general, one should aim at classes of equal width. The width of each class interval, w, is obtained by $w = R/K$, where R is the range and K is the number of class intervals. Another rule of thumb for determining the width of the class interval is that it should not be greater than …th of the estimated population standard deviation.

Example: form a grouped frequency distribution from the following data by the inclusive method, using Sturges's rule.

10, 17, 15, 11, 16, 19, 24, 29, 18, 25, 26, 32, 14, 22, 17, 20, 23, 27, 30, 12, 15,
18, 24, 36, 18, 15, 21, 28, 33, 38, 34, 13, 10, 16, 20, 22, 29, 19, 23, 31.
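One possible way to work this example is sketched below. It assumes integer data, rounds K up as stated above, and widens the class width to the next whole number so that the inclusive classes cover both the smallest and the largest value; these rounding choices are assumptions, and the classes formed in the lecture room may differ. With whole-number widths the number of classes actually needed can come out slightly different from K.

```python
import math

data = [10, 17, 15, 11, 16, 19, 24, 29, 18, 25, 26, 32, 14, 22, 17, 20, 23,
        27, 30, 12, 15, 18, 24, 36, 18, 15, 21, 28, 33, 38, 34, 13, 10, 16,
        20, 22, 29, 19, 23, 31]

n = len(data)
k = math.ceil(1 + 3.322 * math.log10(n))   # Sturges's rule, rounded up
data_range = max(data) - min(data)         # R = largest - smallest

# Assumption: integer data, so an inclusive class lower..upper holds
# (upper - lower + 1) distinct values; hence the "+ 1" before dividing.
width = math.ceil((data_range + 1) / k)

# Build inclusive classes starting from the smallest observation.
classes = []
lower = min(data)
while lower <= max(data):
    upper = lower + width - 1
    frequency = sum(lower <= x <= upper for x in data)
    classes.append((lower, upper, frequency))
    lower = upper + 1

print(f"N = {n}, K = {k}, R = {data_range}, class width = {width}")
for lower, upper, frequency in classes:
    print(f"{lower}-{upper}: {frequency}")
```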

Example: a college management wanted to give scholarships to students securing 60 and above marks in the following manner:

Marks    Monthly scholarship
60-65    2,500
65-70    3,000
70-75    3,500
75-80    4,000
80-85    4,500

The marks of the eligible students were: 74, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60, 67, 74, 66, 64, 79, 73, 75, 76, 69, 68, 78, 67. Calculate the monthly scholarship paid to the students.
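A short sketch of the calculation follows. It assumes the classes are read by the exclusive method (a mark equal to an upper limit, such as 65, falls in the next class), which is an interpretation of the table rather than something the notes state.

```python
marks = [74, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60, 67, 74, 66, 64,
         79, 73, 75, 76, 69, 68, 78, 67]

# (lower limit, upper limit, monthly scholarship per student)
# Assumption: exclusive classes, i.e. lower <= mark < upper.
scholarship_table = [
    (60, 65, 2500),
    (65, 70, 3000),
    (70, 75, 3500),
    (75, 80, 4000),
    (80, 85, 4500),
]

total_scholarship = 0
for lower, upper, amount in scholarship_table:
    count = sum(lower <= m < upper for m in marks)  # class frequency
    total_scholarship += count * amount
    print(f"{lower}-{upper}: {count} students x {amount} = {count * amount}")

print("Total monthly scholarship:", total_scholarship)
```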

Cumulative Frequency Distribution:

A frequency distribution simply tells us how frequently a particular value of the variable (class) occurs. However, if we want to know the total number of observations with a value “less than” or “more than” a particular value of the variable, the frequency table fails to furnish the information as such. This information can be obtained very conveniently from the cumulative frequency distribution.

Examples:

‘Less than’ cumulative frequency (c.f.) of the marks of 70 students:

Marks    f    Less than c.f.
30-35    5    5
35-40    10   5+10=15
40-45    15   15+15=30
45-50    30   30+30=60
50-55    5    60+5=65
55-60    5    65+5=70

‘Less than’ c.f. distribution:

Marks           F
Less than 30    0
Less than 35    5
Less than 40    15
Less than 45    30
Less than 50    60
Less than 55    65
Less than 60    70

‘More than’ cumulative frequency (c.f.):

Marks    f    More than c.f.
30-35    5    65+5=70
35-40    10   55+10=65
40-45    15   40+15=55
45-50    30   10+30=40
50-55    5    5+5=10
55-60    5    5

‘More than’ c.f. distribution:

Marks           F
More than 30    70
More than 35    65
More than 40    55
More than 45    40
More than 50    10
More than 55    5
More than 60    0
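The running totals in these tables can be reproduced with a short sketch such as the one below; the variable names are illustrative only.

```python
from itertools import accumulate

classes = ["30-35", "35-40", "40-45", "45-50", "50-55", "55-60"]
frequencies = [5, 10, 15, 30, 5, 5]

# 'Less than' c.f.: running total from the first class downwards.
less_than_cf = list(accumulate(frequencies))

# 'More than' c.f.: running total from the last class upwards.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for label, f, lt, mt in zip(classes, frequencies, less_than_cf, more_than_cf):
    print(f"{label}  f={f:<3} less than c.f.={lt:<3} more than c.f.={mt}")
```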
Tabulation

By tabulation we mean the systematic presentation of the information contained in the data, in rows and columns, in accordance with some salient features or characteristics. Rows are horizontal arrangements and columns are vertical arrangements. In the words of A. M. Tuttle: “A statistical table is the logical listing of related quantitative data in rows and columns with sufficient explanatory and qualifying words, phrases and statements in the form of titles, headings and notes to make clear the full meaning of data and their origin.” Professor Bowley, in his manual of statistics, refers to tabulation as “the intermediate process between the accumulation of data in whatever form they are obtained, and the final reasoned account of the result shown by the statistics.”

The various parts of a table include: the table number, title, head notes or
prefatory notes, captions and stubs, body of the table, foot-note, and
source note.

Diagrammatic and Graphic Representation

Another important, convincing, appealing and easily understood method of presenting statistical data is the use of diagrams and graphs. They are nothing but geometrical figures such as points, lines, bars, squares, circles, cubes, etc. We shall consider the most commonly used ones, which include: line graphs, bar charts, pie charts, histograms, frequency polygons and ogives.

1. Line Diagram: This is the simplest of all the diagrams. It consists in drawing vertical lines, the length of each line being equal to the frequency. Line diagrams facilitate comparisons.

Example: The following data show the number of accidents sustained by 314 drivers of a public utility company over a period of five years.

Number of accidents:   0   1   2   3   4   5   6   7   8   9   10   11
Number of drivers:     82  44  68  41  25  20  13  7   5   4   3    2

Represent the data by a line diagram.

2. Bar charts: Bar diagrams are one of the easiest and most commonly used devices for presenting most business and economic data. They are especially satisfactory for categorical data or series. The height (or length) of each bar indicates the size of the figure represented; that is, the length of a single bar is proportional to the magnitude of the part of the data it represents. The widths of the bars are the same. The following are the various types of bar diagram in common use:

I. Simple bar chart

II. Sub-divided or component bar chart

III. Percentage bar chart

IV. Multiple bar diagram

V. Deviation or bilateral bar diagram

Example: use a simple bar chart to illustrate the number of workers employed in the factories tabulated below.

Factory            A     B     C     D
No. of Employees   120   300   250   150

3. Angular or Pie Diagram: just as sub-divided and percentage bars or rectangles are used to represent the total magnitude and its various components, the circle (representing the total) may be divided into various sections or segments, viz., sectors, each representing a certain proportion or percentage of the various component parts to the total. Such a sub-divided circle diagram is known as an angular or pie chart, named so because the various segments resemble slices cut from a pie. The degrees represented by the various component parts of a given magnitude can be obtained directly, without computing their percentage of the total value, as follows:

$$\text{Degrees of any component part} = \frac{\text{component value}}{\text{total value}} \times 360^\circ$$

Example: the following table shows the area, in millions of sq. km, of the oceans of the world:

Ocean       Area (million sq. km)
Pacific     70.8
Atlantic    41.2
Indian      28.5
Antarctic   7.6
Arctic      4.8

Draw a pie diagram to represent the data.
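Before drawing, the angle of each sector has to be worked out. The sketch below applies the rule "component value divided by total value, times 360" to the ocean data; the drawing itself is left to pencil and protractor, and the variable names are illustrative.

```python
# Area of the oceans in million sq. km (from the table above).
oceans = {
    "Pacific": 70.8,
    "Atlantic": 41.2,
    "Indian": 28.5,
    "Antarctic": 7.6,
    "Arctic": 4.8,
}

total_area = sum(oceans.values())

# Degrees of a component = (component value / total value) * 360
angles = {name: area / total_area * 360 for name, area in oceans.items()}

for name, angle in angles.items():
    print(f"{name:<10} {angle:6.1f} degrees")
```

A useful check is that the angles add up to 360 degrees.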

GRAPHIC REPRESENTATION OF DATA.

The difference between diagrams and graphs is that diagrams are useful for the visual presentation of categorical and geographical data, while data relating to time series and frequency distributions are best represented through graphs. Diagrams are primarily used for comparative studies and cannot be used to study the relationship (not necessarily functional) between the variables under study. This is done through graphs. The most commonly used graphs for charting a frequency distribution for the general understanding of the details of the data are:

i. Histogram

ii. Frequency polygon

iii. Frequency curve

iv. Ogive or Cumulative frequency Curve

Histogram: It is one of the most popular and commonly used devices for charting a continuous frequency distribution. It consists in erecting a series of adjacent vertical rectangles on the sections of the horizontal axis (x-axis), with the bases (sections) equal to the width of the corresponding class intervals and the heights so taken that the areas of the rectangles are equal to the frequencies of the corresponding classes. The values are taken along the x-axis and the frequencies along the y-axis. This, however, involves two cases: case (i) histogram with equal classes, case (ii) histogram with unequal classes.

Example: represent the following distribution of marks of 100 students in the examination by a histogram.

Marks obtained No. of students (c.f)

Less than 10 4

Less than 20 6

Less than 30 24

Less than 40 46

Less than 50 67

Less than 60 86

Less than 70 96

Less than 80 99

Less than 90 100
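Since the table gives cumulative frequencies, the class frequencies must be recovered before the histogram can be drawn. A minimal sketch, assuming the first class is 0-10 and all classes have width 10:

```python
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
cumulative = [4, 6, 24, 46, 67, 86, 96, 99, 100]   # 'less than' c.f.

# Each class frequency is the difference between successive c.f. values.
frequencies = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

# Assumption: the first class starts at 0 and every class has width 10.
for upper, f in zip(upper_limits, frequencies):
    print(f"{upper - 10}-{upper}: {f}")
```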

Example: represent the following data by means of a histogram.

Weekly wages     10-15   15-20   20-25   25-30   30-40   40-60   60-80
No. of workers   7       19      27      15      12      12      8
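Here the classes are of unequal width, so the heights of the rectangles must be adjusted to keep the areas proportional to the frequencies. The sketch below uses one common convention, which is an assumption here: each frequency is divided by the ratio of its class width to the smallest class width.

```python
# (lower limit, upper limit, frequency) for the weekly wages data above.
classes = [
    (10, 15, 7),
    (15, 20, 19),
    (20, 25, 27),
    (25, 30, 15),
    (30, 40, 12),
    (40, 60, 12),
    (60, 80, 8),
]

# Smallest class width, used as the unit of width (assumed convention).
min_width = min(upper - lower for lower, upper, _ in classes)

# Height = frequency / (class width / smallest width), so that
# height x width stays proportional to the class frequency.
heights = [f / ((upper - lower) / min_width) for lower, upper, f in classes]

for (lower, upper, f), h in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {f}, height {h}")
```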

Frequency Polygon: the frequency polygon is another device for the graphic presentation of a frequency distribution (continuous, grouped or discrete).

In the case of a discrete frequency distribution, the frequency polygon is obtained by plotting the frequencies on the vertical axis (y-axis) against the corresponding values of the variable on the horizontal axis (x-axis) and joining the points so obtained by straight lines.

Example: the following data show the number of accidents sustained by 314 drivers of a public utility company over a period of 5 years.

No. of accidents   0   1   2   3   4   5   6   7   8   9   10   11
No. of drivers     82  44  68  41  25  20  13  7   5   4   3    2

Draw the frequency polygon.

In the case of a grouped or continuous frequency distribution, the frequency polygon may be drawn in two ways: case (1) from the histogram, case (2) without constructing the histogram.

Example: The following table gives the frequency distribution of the weekly wages (in ’00 N) of 100 workers in a factory.

Weekly wages (’00 N)   20-24   25-29   30-34   35-39   40-44   45-49   50-54   55-59   60-64   Total
No. of workers         4       5       12      23      31      10      8       5       2       100

Draw the histogram and frequency polygon of the distribution.

Frequency Curve. A frequency curve is a smooth freehand curve drawn through the vertices of a frequency polygon. The object of smoothing the frequency polygon is to eliminate, as far as possible, the random or erratic fluctuations that might be present in the data. The area enclosed by the frequency curve is the same as that of the histogram or frequency polygon, but its shape is smooth and without sharp edges. The frequency curve may be regarded as a limiting form of the frequency polygon as the number of observations (total frequency) becomes very large and the class intervals are made smaller and smaller.

Example: Draw a frequency curve for the following distribution.

Age (years)       17-19   19-21   21-23   23-25   25-27   27-29   29-31
No. of students   7       13      24      30      22      15      6

However, frequency curves are of different types. Some of the important curves which, in general, describe most of the data observed in practice are:

i. Curves of symmetrical distribution. In a symmetrical distribution, the class frequencies first rise steadily, reach a maximum and then diminish in the same manner. The most commonly and widely used symmetrical curve in statistics is the normal frequency curve.

[Figure: Normal Probability Curve, symmetrical about X = mean]

ii. Moderately Asymmetrical (Skewed) Frequency Curves.

A frequency curve is said to be skewed (asymmetrical) if it is not symmetrical. Such curves are stretched more to one side than to the other. If the curve is stretched more to the right (i.e., it has a longer tail towards the right), it is said to be positively skewed, and if it is stretched more to the left (i.e., it has a longer tail towards the left), it is said to be negatively skewed.

iii. Extremely Asymmetrical or J-Shaped Curve.
Distributions in which the value of the variable corresponding to the maximum frequency lies at one end of the range give rise to highly skewed curves. When plotted, they give a J-shaped or inverted J-shaped curve, and accordingly such curves are called J-shaped curves.

[Figure: J-Shaped Curve and Inverted J-Shaped Curve]

iv. U-Curve.
A frequency distribution in which the maximum frequency occurs at the extremes (i.e., both ends) of the range, and the frequency keeps falling symmetrically (about the middle), the minimum frequency being attained at the centre, gives rise to a U-shaped curve.

[Figure: U-shaped Curve]

v. Mixed Curves
Sometimes, though very rarely, we come across certain distributions in which the maximum frequency is attained at two or more points in an irregular manner. Such curves are obtained in a distribution where, as the value of the variable increases, the frequencies increase and decrease, then again increase and decrease, twice or thrice as shown in the diagram, or even more than that.

[Figure: Bi-modal and Tri-modal Curves, frequency plotted against the variable]

OGIVE: The ogive is a graphic presentation of the cumulative frequency (C.F) distribution of a continuous variable. It consists in plotting the C.F (along the y-axis) against the class boundaries (along the x-axis). Since there are two types of cumulative frequency distribution, viz., ‘less than’ C.F and ‘more than’ C.F, we have accordingly two types of ogives, viz.,

i. ‘Less Than’ Ogive,

ii. ‘More Than’ Ogive

‘Less Than’ Ogive: this consists in plotting the ‘less than’ cumulative frequencies against the upper class boundaries of the respective classes. The points so obtained are joined by a smooth freehand curve to give the ‘Less Than’ Ogive. Obviously, the ‘Less Than’ Ogive is an increasing curve, sloping upwards from left to right, and has the shape of an elongated S.

‘More Than’ Ogive: similarly, in the ‘More Than’ Ogive, the ‘more than’ cumulative frequencies are plotted against the lower class boundaries of the respective classes. The points so obtained are joined by a smooth freehand curve to give the ‘More Than’ Ogive. The ‘More Than’ Ogive is a decreasing curve, sloping downwards from left to right, and has the shape of an elongated S upside down.

Remarks: we may draw both the ‘Less Than’ Ogive and the ‘More Than’ Ogive on the same graph. If we do so, they intersect at a point. The foot of the perpendicular from their point of intersection on the x-axis gives the value of the median.

Example

The table below gives the marks obtained by 70 candidates in the STA 101 examination.

Marks    No. of Candidates (f)   Less than C.F   More than C.F
0-10     2                       2               70
10-20    3                       2+3 = 5         70-2 = 68
20-30    6                       5+6 = 11        68-3 = 65
30-40    11                      11+11 = 22      65-6 = 59
40-50    12                      22+12 = 34      59-11 = 48
50-60    15                      34+15 = 49      48-12 = 36
60-70    10                      49+10 = 59      36-15 = 21
70-80    7                       59+7 = 66       21-10 = 11
80-90    4                       66+4 = 70       11-7 = 4

Plot both the ‘less than’ and ‘more than’ ogives for the data.
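The remark that the two ogives intersect above the median can be checked numerically. The sketch below is only an approximation of the graphical method: it joins the plotted points by straight lines rather than a smooth freehand curve (an assumption), and finds where the two lines cross.

```python
from itertools import accumulate

boundaries = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [2, 3, 6, 11, 12, 15, 10, 7, 4]
n = sum(frequencies)

# Cumulative frequencies evaluated at every class boundary.
less_than = [0] + list(accumulate(frequencies))   # observations below boundary
more_than = [n - c for c in less_than]            # observations at or above it

# Find the segment where (less_than - more_than) changes sign and
# interpolate linearly to locate the crossing point on the x-axis.
median = None
for i in range(len(boundaries) - 1):
    d0 = less_than[i] - more_than[i]
    d1 = less_than[i + 1] - more_than[i + 1]
    if d0 <= 0 <= d1 and d0 != d1:
        t = -d0 / (d1 - d0)
        median = boundaries[i] + t * (boundaries[i + 1] - boundaries[i])
        break

print(f"Ogives intersect at x = {median:.2f}")
```

The crossing point falls in the 50-60 class, which is where the cumulative frequency first reaches half of the 70 candidates.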

AVERAGES

One of the important objectives of statistical analysis is to determine various numerical measures which describe the inherent characteristics of a frequency distribution. The first of such measures is the average. Averages are measures which condense a huge, unwieldy set of numerical data into a single numerical value which is representative of the entire distribution.

Averages are the typical values around which the other items of the distribution congregate. They are values which lie between the two extreme observations (i.e., the smallest and the largest observations) of the distribution and give us an idea about the concentration of the values in the central part of the distribution. Accordingly, they are also sometimes referred to as measures of central tendency. Averages are very useful:

i) For describing the distribution in a concise manner

ii) For the comparative study of different distributions

iii) For computing various other statistical measures such as dispersion, skewness, kurtosis and various other basic characteristics of a mass of data.

Averages are also sometimes referred to as measures of location, since they enable us to locate the position or place of the distribution in question.

“Statistical analysis seeks to develop concise summary figures which describe a large body of quantitative data. One of the most widely used sets of summary figures is known as measures of location, which are often referred to as averages, measures of central tendency or central location. The purpose of computing an average value for a set of observations is to obtain a single value which is representative of all the items and which the mind can grasp simply and quickly. The single value is the point or location around which the individual items cluster.” - Lawrence J. Kaplan

The following are the five measures of central tendency, or measures of location, which are commonly used in practice:

(i) Arithmetic mean or simply mean

(ii) Geometric mean

(iii) Harmonic mean

(iv) Median

(v) Mode

We shall discuss them in detail one by one.

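Arithmetic mean (A.M.)

The arithmetic mean of a given set of observations is their sum divided by the number of observations. For example, the A.M. of 5, 8, 10, 15, 24 and 28 is

$$\frac{5 + 8 + 10 + 15 + 24 + 28}{6} = \frac{90}{6} = 15$$

In general, if $X_1, X_2, \dots, X_n$ are the given n observations, then their arithmetic mean, usually denoted by $\bar{X}$, is given by:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$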

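Example: the following table gives the daily income of ten operators in a machine tool factory. Find the mean.

Name of operator   A    B    C    D    E    F    G    H    I    J
Income             12   15   18   20   25   30   22   35   37   26

Solution

If income is represented by x, then

$$\bar{x} = \frac{\sum x}{n} = \frac{12 + 15 + 18 + 20 + 25 + 30 + 22 + 35 + 37 + 26}{10} = \frac{240}{10} = 24$$

In the case of a discrete frequency distribution:

x    $x_1$   $x_2$   ...   $x_n$
f    $f_1$   $f_2$   ...   $f_n$

the arithmetic mean is given by:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N}, \quad \text{where } N = \sum f$$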

Illustration: The following is the frequency distribution of the number of telephone calls received in 245 successive one-minute intervals at an exchange:

No. of calls           0    1    2    3    4    5    6    7
One-minute intervals   14   21   25   43   51   40   39   12

Obtain the mean number of calls per minute.

Solution

Let the variable x denote the number of calls received per minute at the exchange.

No. of calls (x)   0    1    2    3     4     5     6     7     Total
Frequency (f)      14   21   25   43    51    40    39    12    ∑f = 245
fx                 0    21   50   129   204   200   234   84    ∑fx = 922

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{922}{245} \approx 3.76$$

that is, about 3.8 calls per minute.
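The column totals in this solution can be checked with a short sketch; the variable names are illustrative.

```python
calls = [0, 1, 2, 3, 4, 5, 6, 7]                 # x: calls per minute
intervals = [14, 21, 25, 43, 51, 40, 39, 12]     # f: number of intervals

total_f = sum(intervals)                                  # sum of f
total_fx = sum(x * f for x, f in zip(calls, intervals))   # sum of f*x

mean_calls = total_fx / total_f
print(f"sum f = {total_f}, sum fx = {total_fx}, mean = {mean_calls:.2f}")
```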

ARITHMETIC MEAN FOR GROUPED DATA

In calculating the mean of a grouped data set, it is customary to assume that all values falling in a particular class interval are located at the class mark, or midpoint, of that interval.

To calculate the mean of such a data set, we multiply each class mark by the corresponding class frequency, sum these products over all the class intervals, and divide the result by the total frequency. Thus, the mean is calculated as

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}, \quad i = 1, 2, \dots, k$$

where $x_i$ represents the class mark and $f_i$ the frequency of the i-th class.

Illustration: The data below show the frequency distribution of the lifetimes of 720 television tubes:

Lifetime in hours   0-49   50-99   100-149   150-199   200-249   250-299   300-349   350-399
No. of tubes        5      16      29        128       206       314       17        5

Obtain the arithmetic mean.

Solution:

S/No    Class Interval   Class Mark (xi)   fi          fi xi
1       0 – 49           24.5              5           122.5
2       50 – 99          74.5              16          1192.0
3       100 – 149        124.5             29          3610.5
4       150 – 199        174.5             128         22336.0
5       200 – 249        224.5             206         46247.0
6       250 – 299        274.5             314         86193.0
7       300 – 349        324.5             17          5516.5
8       350 – 399        374.5             5           1872.5
Total                                      ∑fi = 720   ∑fi xi = 167090.0

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{167090}{720} \approx 232.07 \text{ hours}$$
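The same working can be sketched in a few lines, taking each class mark as the midpoint of the class limits, as described above; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for the television tube lifetimes.
classes = [
    (0, 49, 5), (50, 99, 16), (100, 149, 29), (150, 199, 128),
    (200, 249, 206), (250, 299, 314), (300, 349, 17), (350, 399, 5),
]

total_f = 0
total_fx = 0.0
for lower, upper, f in classes:
    class_mark = (lower + upper) / 2      # midpoint of the class
    total_f += f
    total_fx += f * class_mark

mean_lifetime = total_fx / total_f
print(f"sum f = {total_f}, sum fx = {total_fx}, mean = {mean_lifetime:.2f} hours")
```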

Mathematical Properties of Arithmetic Mean

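The arithmetic mean possesses some very interesting and important mathematical properties, as given below:

Property 1: The algebraic sum of the deviations of the given set of observations from their arithmetic mean is zero.

Mathematically, $\sum (x - \bar{x}) = 0$

Or, for a frequency distribution: $\sum f(x - \bar{x}) = 0$

Proof:

$$\sum f(x - \bar{x}) = \sum (fx - f\bar{x}) = \sum fx - \sum f\bar{x}$$

$$= \sum fx - \bar{x}\sum f \quad (\because \bar{x} \text{ is a constant})$$

$$= \sum fx - N\bar{x} \quad (\because \sum f = N)$$

But $\bar{x} = \frac{\sum fx}{N}$, so $\sum fx = N\bar{x}$

$$\therefore \sum f(x - \bar{x}) = N\bar{x} - N\bar{x} = 0$$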

Property 2: The sum of the squares of the deviations of the given set of observations is minimum when taken from the arithmetic mean, compared with the sum of squares of deviations from any other value.

Mathematically, for a given frequency distribution, the sum

$$S = \sum f(x - A)^2$$

which represents the sum of the squares of the deviations of the given observations from any arbitrary value A, is minimum when $A = \bar{x}$. In other words, if

$S_1$ = sum of squared deviations from the mean = $\sum (x - \bar{x})^2$, and

$S$ = sum of squared deviations from any arbitrary point A = $\sum (x - A)^2$, $A \neq \bar{x}$,

then $S_1$ is always less than $S$, i.e., $S_1 < S$.

Property 3: The product of the arithmetic mean and the number of values on which the mean is based is equal to the sum of all the given values. That is,

$N\bar{x} = \sum fx$

Property 4: Mean of the combined series

The mean of the sums (or differences) of corresponding observations in two series, the number of observations being equal in the two, is equal to the sum (or difference) of the means of the two series. If $n_1$ and $n_2$ are the sizes and $\bar{x}_1$ and $\bar{x}_2$ are the respective means of two groups, then the mean of the combined group of size $n_1 + n_2$ is given by:

$\bar{x} = \bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$

In general, if $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the arithmetic means of k groups with $n_1, n_2, \ldots, n_k$ observations respectively, then

$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i\bar{x}_i}{\sum n_i}$
Illustration: The mean of marks in statistics of 100 students in a class was
72. The mean of marks of boys was 75, while their number was 70. Find
out the mean marks of girls in the class.

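Solution: In the usual notation we are given:

$n_1 = 70$, $\bar{x}_1 = 75$; $n_1 + n_2 = 100$, $\bar{x} = 72$. Hence $n_2 = 100 - 70 = 30$, and we want $\bar{x}_2$.

$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$, so $72 = \frac{70 \times 75 + 30\bar{x}_2}{100}$

$\bar{x}_2 = \frac{7200 - 5250}{30} = \frac{1950}{30} = 65$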

Hence the mean of marks of girls in the class is 65.
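The combined-mean property can be checked numerically. The sketch below is illustrative only; `combined_mean` is a helper name of our own choosing.

```python
def combined_mean(sizes, means):
    """Mean of the pooled group, given each subgroup's size and mean."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Recover the girls' mean from the class mean (72), boys' mean (75) and sizes.
n_boys, n_girls = 70, 30
girls_mean = (100 * 72 - n_boys * 75) / n_girls

print(girls_mean)                                          # 65.0
print(combined_mean([n_boys, n_girls], [75, girls_mean]))  # 72.0
```

Feeding the recovered value back into `combined_mean` returns the class mean of 72, which confirms the answer.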

Step Deviation Method for Computing Arithmetic Mean

It may be pointed out that the formula $\bar{x} = \frac{\sum fx}{N}$ can be used conveniently if the values of x and/or f are small. However, if the values of x and/or f are large, the calculation of the mean by this formula is quite tedious and time consuming. In such a case the calculation can be reduced to a great extent by using the step deviation (or assumed mean) method, which consists in taking the deviations (differences) of the given observations from an arbitrary value A.

Let $d = x - A$; then $fd = f(x - A) = fx - Af$

Taking the sum over the various values of x, we get

$\sum fd = \sum fx - A\sum f$

$\sum fd = \sum fx - AN$ (since $\sum f = N$)

Dividing both sides by N, we get $\frac{\sum fd}{N} = \bar{x} - A$, i.e.

$\bar{x} = A + \frac{\sum fd}{N}$

In case of a grouped or continuous frequency distribution with class intervals of equal magnitude, the calculations are further simplified by taking:

$d = \frac{x - A}{h}$, where x is the mid-value of the class and h is the common magnitude of the class intervals.

From $d = \frac{x - A}{h}$, we get $hd = x - A$

Multiplying both sides by f, we get $hfd = f(x - A) = fx - fA$

Summing both sides over the values of x, we get:

$h\sum fd = \sum fx - A\sum f = \sum fx - NA$

Dividing both sides by N, we get

$\bar{x} = A + \frac{h\sum fd}{N}$

Illustration: Calculate the mean for the following frequency distribution:

Marks             0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70
No. of students      6        5         8        15         7         6         3

(i) By the direct formula (ii) By the step deviation method

Solution

S/N   Marks      Mid-value (x)    f      fx     d = (x − 35)/10    fd
1      0 – 10         5           6      30          -3           -18
2     10 – 20        15           5      75          -2           -10
3     20 – 30        25           8     200          -1            -8
4     30 – 40        35          15     525           0             0
5     40 – 50        45           7     315           1             7
6     50 – 60        55           6     330           2            12
7     60 – 70        65           3     195           3             9
Total                          ∑f = 50  ∑fx = 1670               ∑fd = -8

(i) $\bar{x} = \frac{\sum fx}{N} = \frac{1670}{50} = 33.4$ marks

(ii) A = 35, h = 10

$\bar{x} = A + \frac{h\sum fd}{N} = 35 + \frac{10 \times (-8)}{50} = 35 - 1.6 = 33.4$ marks
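The two methods in this illustration can be compared side by side. The following is a minimal Python sketch under the same choices as the solution (A = 35, h = 10); the variable names are our own.

```python
# Class mid-values and frequencies from the marks illustration.
mid_values = [5, 15, 25, 35, 45, 55, 65]
freq = [6, 5, 8, 15, 7, 6, 3]
N = sum(freq)

# (i) Direct formula: mean = sum(f*x) / N
direct_mean = sum(f * x for f, x in zip(freq, mid_values)) / N

# (ii) Step deviation: d = (x - A) / h, mean = A + h * sum(f*d) / N
A, h = 35, 10
d = [(x - A) // h for x in mid_values]   # exact integers here: -3 .. 3
sum_fd = sum(f * di for f, di in zip(freq, d))
step_mean = A + h * sum_fd / N

print(sum_fd, direct_mean, step_mean)
```

Both routes give the same mean; the step deviation method only changes the size of the numbers handled along the way.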

Geometric Mean

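The geometric mean, usually abbreviated as G.M., of a set of n observations is the nth root of their product. Thus if $x_1, x_2, \ldots, x_n$ are the given n observations, then their G.M. is given by

$\text{G.M.} = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$

For example, the G.M. of 4, 8, 16 is

$\text{G.M.} = \sqrt[3]{4 \times 8 \times 16} = \sqrt[3]{512} = 8$

But if n, the number of observations, is large, then the computation of the nth root is very tedious. In such a case the calculations are facilitated by making use of logarithms. Taking logarithms of both sides,

$\log \text{G.M.} = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n}\sum \log x$

Taking antilogs of both sides,

$\text{G.M.} = \text{Antilog}\left[\frac{1}{n}\sum \log x\right]$

In case of a frequency distribution,

$\text{G.M.} = \text{Antilog}\left[\frac{1}{N}\sum f \log x\right]$, where $N = \sum f$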

Example: Find the G.M. of 2, 4, 8, 12, 16, 24

Solution

x        2        4        8        12       16       24       Total
log x    0.3010   0.6021   0.9031   1.0792   1.2041   1.3802   5.4697

$\text{G.M.} = \text{Antilog}\left[\frac{5.4697}{6}\right] = \text{Antilog}(0.9116) \approx 8.16$

Example: Find the G.M. for the following distribution.

Marks             0 – 10   10 – 20   20 – 30   30 – 40   40 – 50
No. of students      5        7        15        25         8

Solution

Marks      Mid-point (x)    f      log x     f log x
 0 – 10         5           5      0.6990     3.4950
10 – 20        15           7      1.1761     8.2327
20 – 30        25          15      1.3979    20.9685
30 – 40        35          25      1.5441    38.6025
40 – 50        45           8      1.6532    13.2256
Total                    ∑f = 60           ∑f log x = 84.5243

$\text{G.M.} = \text{Antilog}\left[\frac{84.5243}{60}\right] = \text{Antilog}(1.4087) \approx 25.63$ marks
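The logarithm route to the geometric mean translates directly into code. The sketch below is illustrative; `geometric_mean` is our own helper, and it uses base-10 logarithms to mirror the log tables used in the notes. Values read from four-figure log tables can differ from the computed ones in the last digit.

```python
import math

def geometric_mean(values, freqs=None):
    """G.M. = antilog( sum(f * log10 x) / sum(f) ); freqs default to 1."""
    if freqs is None:
        freqs = [1] * len(values)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / sum(freqs))

gm_simple = geometric_mean([4, 8, 16])
gm_six = geometric_mean([2, 4, 8, 12, 16, 24])
gm_marks = geometric_mean([5, 15, 25, 35, 45], [5, 7, 15, 25, 8])

print(round(gm_simple, 2), round(gm_six, 2), round(gm_marks, 2))
```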

HARMONIC MEAN

Another important mean is the harmonic mean, which is used for averaging rates. If $X_1, X_2, \ldots, X_n$ is a given set of n observations, then their harmonic mean (H.M.), or simply H, is given by:

$H = \frac{n}{\sum (1/X)}$

In other words, the H.M. is the reciprocal of the arithmetic mean of the reciprocals of the given observations. In case of a frequency distribution, we have:

$\frac{1}{H} = \frac{1}{N}\sum \frac{f}{X}$, i.e. $H = \frac{N}{\sum (f/X)}$, where $N = \sum f$

Example: The following table gives the weights of 31 persons in a sample enquiry. Calculate the mean weight using the harmonic mean.

Weight (lbs)      130   135   140   145   146   148   149   150   157
No. of persons      3     4     6     6     3     5     2     1     1

Solution

Weight in lbs (x)     f       f/x
130                   3      0.0231
135                   4      0.0296
140                   6      0.0429
145                   6      0.0414
146                   3      0.0205
148                   5      0.0338
149                   2      0.0134
150                   1      0.0067
157                   1      0.0064
Total              ∑f = 31   ∑(f/x) = 0.2178

$\text{H.M.} = \frac{N}{\sum (f/x)} = \frac{31}{0.2178} \approx 142.33$ lbs

Example: A cyclist pedals from his house to his college at a speed of 10 km/h and back from the college to his house at 15 km/h. Find the average speed.

Solution

Let the distance from the house to the college be x km. In going from house to college, the distance (x km) is covered in $\frac{x}{10}$ hours, while in coming from college to house, the distance is covered in $\frac{x}{15}$ hours. Thus a total distance of 2x km is covered in $\left(\frac{x}{10} + \frac{x}{15}\right)$ hours.

Hence, average speed $= \frac{2x}{\frac{x}{10} + \frac{x}{15}} = \frac{2}{\frac{1}{10} + \frac{1}{15}}$, which is the harmonic mean of the two speeds:

H = 12 km/h
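Both harmonic-mean examples can be reproduced with one small function. This is an illustrative sketch; `harmonic_mean` is our own helper, and the weights use the frequencies of the solution table (which total 31). Because the code keeps the reciprocals unrounded, its result for the weights differs slightly from a hand calculation based on four-decimal values.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f / x); freqs default to 1 for ungrouped data."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Cyclist: equal distances covered at 10 and 15 km/h.
avg_speed = harmonic_mean([10, 15])

# Weights example, frequencies as in the solution table (total 31).
weights = [130, 135, 140, 145, 146, 148, 149, 150, 157]
persons = [3, 4, 6, 6, 3, 5, 2, 1, 1]
hm_weight = harmonic_mean(weights, persons)

print(round(avg_speed, 2), round(hm_weight, 2))
```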

MEDIAN

In the words of L.R. Connor:

“The median is that value of the variable which divides the group into two equal parts, one part comprising all the values greater and the other, all the values less than median”. Thus the median of a distribution may be defined as that value of the variable which exceeds and is exceeded by the same number of observations, i.e. it is the value such that the number of observations above it is equal to the number of observations below it. Thus the median is a positional average, i.e. its value depends on the position occupied by a value in the frequency distribution.

Calculation of Median

Case (I): Ungrouped Data: If the number of observations is odd, then the median is the middle value after the observations have been arranged in ascending or descending order of magnitude. For example, the median of the 5 observations 35, 12, 40, 8, 60, i.e. 8, 12, 35, 40, 60 when arranged in order, is 35.

In case of an even number of observations, the median is obtained as the arithmetic mean of the two middle observations after they are arranged in ascending or descending order of magnitude.

For example, for 8, 12, 35, 40, 50, 60:

The median $= \frac{35 + 40}{2} = 37.5$

Case (II): Frequency Distribution (Discrete type): In case of a frequency distribution where the variable takes the values $X_1, X_2, \ldots, X_n$ with respective frequencies $f_1, f_2, \ldots, f_n$ and $\sum f = N$, the total frequency, the median is the size of the $\frac{N + 1}{2}$-th item or observation. In this case the use of the cumulative frequency distribution facilitates the calculations: the median is the value of X whose (less than) cumulative frequency is just greater than $\frac{N}{2}$.

Example: Eight coins were tossed together and the number of heads (x) resulting was noted. The operation was repeated 256 times, and the frequency distribution of the number of heads is given below:

No. of heads (x)   0   1    2    3    4    5    6   7   8
Frequency (f)      1   9   26   59   72   52   29   7   1

Calculate the median.

Solution

x      f     Less than c.f.
0      1          1
1      9         10
2     26         36
3     59         95
4     72        167
5     52        219
6     29        248
7      7        255
8      1        256
    ∑f = 256

Here N = ∑f = 256 and $\frac{N}{2} = \frac{256}{2} = 128$. The c.f. just greater than 128 is 167, and the value of x corresponding to 167 is 4. Hence, the median number of heads is 4.
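The cumulative-frequency search used in this solution is easy to automate. A minimal sketch, with variable names of our own choosing:

```python
from itertools import accumulate

heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freq = [1, 9, 26, 59, 72, 52, 29, 7, 1]

cum_freq = list(accumulate(freq))   # less-than cumulative frequencies
N = cum_freq[-1]

# Median: first value of x whose c.f. is just greater than N/2.
median_heads = next(x for x, cf in zip(heads, cum_freq) if cf > N / 2)

print(cum_freq)          # [1, 10, 36, 95, 167, 219, 248, 255, 256]
print(N, median_heads)   # 256 4
```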

Case (III): Continuous Frequency Distribution

The value of the median is now obtained by using the interpolation formula:

$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$, where

l is the lower limit of the median class, h is the magnitude or width of the median class, $N = \sum f$ is the total frequency, f is the frequency of the median class, and c is the cumulative frequency of the class preceding the median class.

The interpolation formula is based on the following assumptions:

(i) The distribution of the variable under consideration is continuous, with exclusive type classes without any gaps.

(ii) There is an orderly and even distribution of observations within each class.

However, if the data are given as a grouped frequency distribution where the classes are not continuous, then it must be converted into a continuous frequency distribution before applying the formula. This adjustment will affect only the value of l.

Example: The following table shows the frequency distribution of the weight in grams of mangoes of a given variety. Calculate the median.

Weight in grams   410 – 419   420 – 429   430 – 439   440 – 449   450 – 459   460 – 469   470 – 479
No. of mangoes        14          20          42          54          45          18           7

Solution

Class boundaries     f     Less than c.f.
409.5 – 419.5       14         14
419.5 – 429.5       20         34
429.5 – 439.5       42         76
439.5 – 449.5       54        130
449.5 – 459.5       45        175
459.5 – 469.5       18        193
469.5 – 479.5        7        200

$\frac{N}{2} = \frac{200}{2} = 100$. The c.f. just greater than 100 is 130. Hence the corresponding class 439.5 – 449.5 is the median class.

$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$

$= 439.5 + \frac{10}{54}(100 - 76)$

= 443.94 grams
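The interpolation formula for the median can be written as a short function. This is a sketch under the formula's own assumptions (continuous, equal-width classes); `grouped_median` is a name we introduce for illustration.

```python
from itertools import accumulate

def grouped_median(lower_bounds, width, freqs):
    """Median = l + (h / f) * (N/2 - c) for equal-width continuous classes."""
    cum = list(accumulate(freqs))
    half = cum[-1] / 2
    # Median class: first class whose c.f. is just greater than N/2.
    k = next(i for i, cf in enumerate(cum) if cf > half)
    c = cum[k - 1] if k > 0 else 0      # c.f. of the preceding class
    return lower_bounds[k] + width / freqs[k] * (half - c)

bounds = [409.5, 419.5, 429.5, 439.5, 449.5, 459.5, 469.5]
mangoes = [14, 20, 42, 54, 45, 18, 7]
median_weight = grouped_median(bounds, 10, mangoes)

print(round(median_weight, 2))   # 443.94
```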

We can also obtain the median by plotting the ogive of the distribution: the median is the value below which N/2 of the items lie.

MODE

Mode is the value which occurs most frequently in a set of observations and around which the other items of the set cluster densely. In other words, mode is the value of a series which is predominant in it. In the words of Croxton and Cowden, “The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of values”. According to A.M. Tuttle, ‘Mode is the value which has the greatest frequency density in its immediate neighborhood’.

Illustration: The wheat yields in a particular region over the past 12 years (in millions of tons) are: 1.5, 1.3, 1.2, 1.0, 1.3, 1.4, 1.6, 1.7, 1.5, 1.3, 1.2 and 1.4. The value 1.3 occurs three times, more often than any other value, so the mode is 1.3 (million tons).

In case of a frequency distribution, mode is the value of the variable corresponding to the maximum frequency. This method can be applied with ease and simplicity if the distribution is ‘unimodal’. For example, in the following distribution:

X    1    2    3    4    5    6    7    8    9
F    3    1   18   25   40   30   22   10    6

The maximum frequency is 40 and therefore the corresponding value 5 gives the value of the mode. In case of a frequency curve, the mode corresponds to the peak of the curve.

[Figure: frequency curve, with frequency on the vertical axis and x on the horizontal axis; the mode is the value of x under the peak.]

In the case of a continuous distribution, the class corresponding to the maximum frequency is called the modal class, and the value of the mode is obtained by the interpolation formula:

$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$, where

l is the lower limit of the modal class,

$f_1$ is the frequency of the modal class,

$f_0$ is the frequency of the class preceding the modal class,

$f_2$ is the frequency of the class succeeding the modal class,

h is the magnitude of the modal class.

Assumptions:

(i) The frequency distribution must be continuous, with exclusive type classes without any gaps.

(ii) The class intervals must be uniform throughout.
Example: Find the value of the mode from the data given below:

Weight (in kg)    93 – 97   98 – 102   103 – 107   108 – 112   113 – 117   118 – 122   123 – 127   128 – 132
No. of students      3         5          12          17          14           6           3           1

Solution

Class boundaries      f
 92.5 –  97.5         3
 97.5 – 102.5         5
102.5 – 107.5        12
107.5 – 112.5        17
112.5 – 117.5        14
117.5 – 122.5         6
122.5 – 127.5         3
127.5 – 132.5         1

Here the maximum frequency is 17. The corresponding class 107.5 – 112.5 is the modal class.

$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 107.5 + \frac{5(17 - 12)}{2 \times 17 - 12 - 14}$

$= 107.5 + \frac{25}{8} = 110.625$ kg
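The mode interpolation can be sketched in the same way as the median. `grouped_mode` is a hypothetical helper of our own; it assumes a single modal class and equal class widths, as the formula itself requires.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for equal-width classes."""
    k = freqs.index(max(freqs))                       # modal class (first maximum)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                 # class before the modal class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0    # class after the modal class
    return lower_bounds[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)

bounds = [92.5, 97.5, 102.5, 107.5, 112.5, 117.5, 122.5, 127.5]
students = [3, 5, 12, 17, 14, 6, 3, 1]
mode_weight = grouped_mode(bounds, 5, students)

print(mode_weight)   # 110.625
```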

EMPIRICAL RELATION BETWEEN MEAN (M), MEDIAN (MD), MODE (MO)

In case of a symmetrical distribution, the mean, median and mode coincide, i.e. mean = median = mode. However, for a moderately asymmetrical (non-symmetrical or skewed) distribution, the mean and mode usually lie at the two ends, the median lies in between them, and they obey the following important empirical relationship given by Prof. Karl Pearson:

Mode = mean – 3 (mean – median)

mean – mode = 3 (mean – median)

mean – median = (1/3) (mean – mode)

The above relation can be exhibited diagrammatically as follows:

[Figure: moderately skewed frequency curve marking the mode MO (under the peak of the curve), the median MD (divides the area in halves) and the mean M (centre of gravity).]
QUARTILES, DECILES AND PERCENTILES

We have defined the median as the value of the item which is located at the centre of the array. We can define other measures which are located at other specified points.

Quartiles: The values which divide the given data into four equal parts are known as quartiles. Obviously there will be three such points Q1, Q2 and Q3, such that Q1 ≤ Q2 ≤ Q3, termed the three quartiles. Q1, known as the lower or first quartile, is the value which has 25% of the items of the distribution below it and consequently 75% of the items greater than it. Incidentally Q2, the second quartile, coincides with the median and has an equal number of observations above it and below it. Q3, known as the upper or third quartile, has 75% of the observations below it and consequently 25% of the observations above it. The working principle for computing the quartiles is basically the same as that for computing the median.

To compute Q1, find $\frac{N}{4}$, where N = ∑f, and see the (less than) c.f. just greater than $\frac{N}{4}$; the corresponding value of x gives the value of Q1. In case of a continuous frequency distribution, the corresponding class contains Q1, and the value of Q1 is obtained by the interpolation formula:

$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - c\right)$, where all the symbols have their usual meaning.

Similarly, to compute Q3, see the (less than) c.f. just greater than $\frac{3N}{4}$, and for a continuous distribution,

$Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - c\right)$

Deciles

Deciles are the values which divide the series into ten equal parts. Obviously there are nine deciles, D1, D2, D3, ..., D9 (say), such that D1 ≤ D2 ≤ D3 ≤ ... ≤ D9. Incidentally D5 coincides with the median. The method of computing the deciles Di (i = 1, 2, 3, ..., 9) is the same as discussed for Q1 and Q3. To compute the ith decile Di (i = 1, 2, 3, ..., 9), see the c.f. just greater than $\frac{iN}{10}$. The corresponding value of X is Di. In case of a continuous frequency distribution, the corresponding class contains Di and its value is obtained by the interpolation formula:

$D_i = l + \frac{h}{f}\left(\frac{iN}{10} - c\right)$

Percentiles:

Percentiles are the values which divide the series into 100 equal parts. Obviously, there are 99 percentiles, P1, P2, P3, ..., P99, such that P1 ≤ P2 ≤ P3 ≤ ... ≤ P99. The ith percentile Pi (i = 1, 2, 3, ..., 99) is the value of X corresponding to the c.f. just greater than $\frac{iN}{100}$. In case of a continuous frequency distribution, the corresponding class contains Pi and its value is obtained by the interpolation formula:

$P_i = l + \frac{h}{f}\left(\frac{iN}{100} - c\right)$

In particular, we shall have:

P25 = Q1, P50 = D5 = Q2, P75 = Q3, D9 = P90, D1 = P10, D2 = P20, D3 = P30

The various partition values (quartiles, deciles and percentiles) can be easily located graphically with the help of the ogive.

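Example: The following data give the distribution of marks of 100 students. Obtain the values of the quartiles, the 6th decile and the 70th percentile.

Class            f     Less than c.f.   Class boundaries
Less than 10     5          5           Below 9.5
10 – 19          8         13            9.5 – 19.5
20 – 29          7         20           19.5 – 29.5
30 – 39         12         32           29.5 – 39.5
40 – 49         28         60           39.5 – 49.5
50 – 59         20         80           49.5 – 59.5
60 – 69         10         90           59.5 – 69.5
70 – 79         10        100           69.5 – 79.5
             ∑f = 100

Quartiles: For Q1, $\frac{N}{4} = \frac{100}{4} = 25$

The c.f. just greater than 25 is 32. Hence the corresponding class 29.5 – 39.5 is the Q1 class.

$Q_1 = 29.5 + \frac{10}{12}(25 - 20) = 33.67$

For Q3, $\frac{3N}{4} = 75$. The c.f. just greater than 75 is 80. Hence the class 49.5 – 59.5 is the Q3 class.

$Q_3 = 49.5 + \frac{10}{20}(75 - 60) = 57.0$

6th Decile: $\frac{6N}{10} = 60$. The c.f. just greater than 60 is 80, so the D6 class is 49.5 – 59.5.

$D_6 = 49.5 + \frac{10}{20}(60 - 60) = 49.5$

70th Percentile: $\frac{70N}{100} = 70$. The c.f. just greater than 70 is 80, so the P70 class is 49.5 – 59.5.

$P_{70} = 49.5 + \frac{10}{20}(70 - 60) = 54.5$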
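Quartiles, deciles and percentiles all use the same interpolation, differing only in the target position iN/4, iN/10 or iN/100. The sketch below captures that with one hypothetical helper, `partition_value(…, i, parts)`; the lower boundary of the open first class is set to -0.5 purely as an assumption so that the list has a value there (it is not used in any of the results shown).

```python
from itertools import accumulate

def partition_value(lower_bounds, width, freqs, i, parts):
    """i-th of `parts` partition values: l + (h/f) * (i*N/parts - c)."""
    cum = list(accumulate(freqs))
    target = i * cum[-1] / parts
    # Class whose c.f. is just greater than the target position.
    k = next(idx for idx, cf in enumerate(cum) if cf > target)
    c = cum[k - 1] if k > 0 else 0
    return lower_bounds[k] + width / freqs[k] * (target - c)

# Marks example; -0.5 for the open first class is an assumption.
bounds = [-0.5, 9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freq = [5, 8, 7, 12, 28, 20, 10, 10]

q1 = partition_value(bounds, 10, freq, 1, 4)
q3 = partition_value(bounds, 10, freq, 3, 4)
d6 = partition_value(bounds, 10, freq, 6, 10)
p70 = partition_value(bounds, 10, freq, 70, 100)

print(round(q1, 2), round(q3, 2), round(d6, 2), round(p70, 2))
```

Calling `partition_value(bounds, 10, freq, 2, 4)` gives the second quartile, i.e. the median of the same table.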

DISPERSION

Averages, or the measures of central tendency, give us an idea of the concentration of the observations about the central part of the distribution. In spite of their great utility in statistical analysis, they have their own limitations. If we are given only the average of a series of observations, we cannot form a complete idea about the distribution, since there may exist a number of distributions whose averages are the same but which differ widely from each other in a number of ways. Thus, the measures of central tendency must be supported and supplemented by some other measures. One such measure is ‘dispersion’. The literal meaning of dispersion is “scatteredness”. We study dispersion to have an idea of the homogeneity (compactness) or heterogeneity (scatter) of the distribution. Dispersion is the measure of the variation of the items.

The various measures of dispersion are:

(i) Range (ii) Quartile deviation or semi-interquartile range (iii) Mean deviation (iv) Standard deviation (v) Lorenz curve.

The first two measures, range and quartile deviation, are termed positional measures since they depend upon the values of the variable at particular positions of the distribution. The last measure, the Lorenz curve, is a graphical method of studying variability.

1. Range: Range is the difference between the greatest (maximum) and the smallest (minimum) observation of the distribution. Thus

Range = Xmax – Xmin

In case of a grouped frequency distribution (for discrete values) or a continuous frequency distribution, range is defined as the difference between the upper limit of the highest class and the lower limit of the lowest class.

Coefficient of Range (relative measure of range) $= \frac{X_{max} - X_{min}}{X_{max} + X_{min}}$

Illustration: Calculate the range and the coefficient of range of A’s monthly earnings for a year.

Month               1     2     3     4     5     6     7     8     9    10    11    12
Earning (N1000)   139   150   151   151   157   158   160   161   162   162   173   175

Solution

L = 175000, S = 139000

Range = L – S = 175000 – 139000 = 36000

Coefficient of range $= \frac{L - S}{L + S} = \frac{36000}{314000} \approx 0.115$

Illustration: The following table gives the age distribution of a group of 50 individuals.

Age (in years)    16 – 20   21 – 25   26 – 30   31 – 35
No. of persons      10        15        17         8

Calculate the range and the coefficient of range.

Solution

Convert into continuous classes. The first class will then become 15.5 – 20.5 and the last class will become 30.5 – 35.5.

L = 35.5, S = 15.5

Range = L – S = 35.5 – 15.5 = 20 years

Coefficient of range $= \frac{L - S}{L + S} = \frac{20}{51} \approx 0.392$
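Both range illustrations follow the same two formulas, which the sketch below wraps in a helper of our own naming, `range_and_coefficient`. Earnings are entered in thousands of naira, which leaves the coefficient unchanged because it is a pure ratio.

```python
def range_and_coefficient(largest, smallest):
    """Return (range, coefficient of range) = (L - S, (L - S) / (L + S))."""
    spread = largest - smallest
    return spread, spread / (largest + smallest)

# A's monthly earnings, in thousands of naira.
earnings = [139, 150, 151, 151, 157, 158, 160, 161, 162, 162, 173, 175]
r_earn, c_earn = range_and_coefficient(max(earnings), min(earnings))

# Age distribution, using the class boundaries 15.5 and 35.5.
r_age, c_age = range_and_coefficient(35.5, 15.5)

print(r_earn, round(c_earn, 3))   # 36 0.115
print(r_age, round(c_age, 3))     # 20.0 0.392
```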

2. Quartile Deviation or Semi Inter-Quartile Range

It is a measure of dispersion based on the upper quartile Q3 and the lower quartile Q1.

Inter-quartile range = Q3 – Q1

Quartile Deviation (Q.D.) $= \frac{Q_3 - Q_1}{2}$

Q.D. as defined above is only an absolute measure of dispersion. For comparative studies of the variability of two distributions we need a relative measure, which is known as the coefficient of Quartile Deviation and is given by:

Coefficient of Q.D. $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$

Percentile Range

This is a measure of dispersion based on the difference between certain percentiles. If Pi is the ith percentile and Pj is the jth percentile, then the so-called i-j percentile range is given by:

i-j percentile Range = Pj – Pi (i < j).

Thus the i-j semi-percentile Range is given by:

(Pj – Pi)/2, (i < j)

The commonly used percentile range is the one which corresponds to the 10th and 90th percentiles. Thus,

10 – 90 percentile Range = P90 – P10 and

10 – 90 semi-percentile Range = (P90 – P10)/2.

The above measures are absolute measures only. The relative measure of variability based on percentiles is given by:

Coefficient of 10 – 90 percentile Range $= \frac{P_{90} - P_{10}}{P_{90} + P_{10}}$

3. Mean Deviation or Average Deviation

As already pointed out, the two measures of dispersion discussed so far, range and Q.D., are not based on all the observations. Also, they do not exhibit any scatter of the observations from an average and thus completely ignore the composition of the series. Average deviation overcomes both these drawbacks.

According to Clark and Schkade: “Average deviation is the average amount of scatter of the items in a distribution from either the mean or the median, ignoring the signs of the deviations. The average that is taken of the scatter is an arithmetic mean, which accounts for the fact that this measure is often called the mean deviation”.

If X1, X2, ..., Xn are n given observations, then the mean deviation (M.D.) about an average A, say, is given by:

$\text{M.D.} = \frac{1}{n}\sum |X - A| = \frac{1}{n}\sum |d|$

where $|d| = |X - A|$ is the absolute value of the deviation d = X – A.

Steps:

(i) Calculate the average A of the distribution by the usual method.

(ii) Take the deviation d = X – A of each observation from the average A.

(iii) Ignore the negative signs of the deviations, taking all the deviations to be positive, to obtain the absolute deviations $|d| = |X - A|$.

(iv) Obtain the sum of the absolute deviations obtained in step (iii)

(v) Divide the total obtained in step (iv) by n, the number of observation.

The result gives the value of the mean deviation about the average A. In case of a frequency distribution, or a grouped or continuous frequency distribution, the mean deviation about an average A is given by:

$\text{M.D.} = \frac{1}{N}\sum f|X - A|$, $N = \sum f$, where X is the value of the variable or the mid-value of the class interval.

Relative Measures of Mean Deviation. The measures of mean deviation as defined above are absolute measures depending on the units of measurement. The relative measure of dispersion, called the coefficient of mean deviation, is given by:

Coefficient of M.D. $= \frac{\text{Mean Deviation}}{\text{Average about which it is calculated}}$

Coefficient of M.D. about mean $= \frac{\text{M.D.}}{\text{Mean}}$

and coefficient of M.D. about median $= \frac{\text{M.D.}}{\text{Median}}$

The coefficients of mean deviation defined above are pure numbers independent of the units of measurement and are useful for comparing the variability of different distributions.

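Example: Calculate the mean deviation from the following data, giving the marks obtained by 11 students in a class test: 14, 15, 23, 20, 10, 30, 19, 18, 16, 25, 12

Solution

Arranged in ascending order, the marks are 10, 12, 14, 15, 16, 18, 19, 20, 23, 25, 30, so that n = 11, the median is the 6th value, 18, and the mean is 202/11 ≈ 18.36.

M.D. about the median $= \frac{1}{n}\sum |X - 18| = \frac{50}{11} \approx 4.55$ marks

M.D. about the mean $= \frac{1}{n}\sum |X - 18.36| \approx \frac{50.36}{11} \approx 4.58$ marks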

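Example: Calculate the mean deviation from the median of the following distribution.

Class Interval   50 – 100   100 – 150   150 – 200   200 – 250   250 – 300   300 – 350
f                    7          18          25          31          15           4

Also calculate the coefficient of mean deviation from the median.

Solution

Class Interval    f     Less than c.f.   Mid-value (X)   |X – Md|   f|X – Md|
 50 – 100         7           7               75            125         875
100 – 150        18          25              125             75        1350
150 – 200        25          50              175             25         625
200 – 250        31          81              225             25         775
250 – 300        15          96              275             75        1125
300 – 350         4         100              325            125         500
                                                          ∑f|X – Md| = 5250

Here N = ∑f = 100 and N/2 = 50. The c.f. just greater than 50 is 81, so 200 – 250 is the median class.

$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 200 + \frac{50}{31}(50 - 50) = 200$

M.D. about median $= \frac{1}{N}\sum f|X - Md| = \frac{5250}{100} = 52.5$

Coefficient of M.D. $= \frac{\text{M.D.}}{\text{Median}} = \frac{52.5}{200} = 0.2625$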
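The whole calculation, from locating the median class to the coefficient, can be reproduced as follows. This is an illustrative sketch with our own variable names; it assumes equal class widths of 50.

```python
from itertools import accumulate

lower = [50, 100, 150, 200, 250, 300]      # lower class limits
freq = [7, 18, 25, 31, 15, 4]
h = 50                                     # common class width
mids = [l + h / 2 for l in lower]          # class mid-values 75, 125, ...

cum = list(accumulate(freq))
N = cum[-1]
k = next(i for i, cf in enumerate(cum) if cf > N / 2)   # median class index
c = cum[k - 1] if k > 0 else 0
median = lower[k] + h / freq[k] * (N / 2 - c)

# Mean deviation about the median and its coefficient.
mean_dev = sum(f * abs(x - median) for f, x in zip(freq, mids)) / N
coefficient = mean_dev / median

print(median, mean_dev, coefficient)
```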

4. Standard Deviation

Standard deviation, usually denoted by the letter σ (small sigma) of the Greek alphabet, was first suggested by Karl Pearson as a measure of dispersion in 1893. It is defined as the positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean. Thus if X1, X2, ..., Xn is a set of n observations, then its standard deviation is given by:

$\sigma = \sqrt{\frac{1}{n}\sum (X - \bar{X})^2}$, where

$\bar{X} = \frac{1}{n}\sum X$ is the arithmetic mean of the given values.

Steps:

(i) Compute the arithmetic mean $\bar{X}$.

(ii) Compute the deviation $(X - \bar{X})$ of each observation from the arithmetic mean, i.e. obtain $X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$.

(iii) Square each of the deviations obtained in step (ii), i.e. compute $(X_1 - \bar{X})^2, (X_2 - \bar{X})^2, \ldots, (X_n - \bar{X})^2$.

(iv) Find the sum of the squared deviations in step (iii) and divide by n:

$\frac{1}{n}\sum (X - \bar{X})^2 = \frac{1}{n}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right]$

(v) Take the positive square root of the value obtained in step (iv).

(vi) The resulting value gives the standard deviation of the distribution.

In case of a frequency distribution, the standard deviation is given by:

$\sigma = \sqrt{\frac{1}{N}\sum f(X - \bar{X})^2}$, N = ∑f, where X is the value of the variable or the mid-value of the class (in case of a grouped or continuous frequency distribution) and f is the corresponding frequency of the value X.

Thus the value of σ will be greater if the values of X are scattered widely away from the mean. A small value of σ implies that the distribution is homogeneous and a large value of σ implies that it is heterogeneous.

Variance and Mean Square Deviation

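According to William I. Greenwald, the variance is the mean of the squared deviations about the mean of a series. Thus, variance is the square of the standard deviation and is denoted by $\sigma^2$. For a frequency distribution, the variance is given by:

$\sigma^2 = \frac{1}{N}\sum f(X - \bar{X})^2$

The mean square deviation, usually denoted by $S^2$, is defined as

$S^2 = \frac{1}{N}\sum f(X - A)^2$, where A is any arbitrary number.

The square root of the mean square deviation is called the root mean square deviation and is given by:

$S = \sqrt{\frac{1}{N}\sum f(X - A)^2}$

Relation between $\sigma^2$ and $S^2$. We have

$S^2 = \frac{1}{N}\sum f(X - A)^2$

$= \frac{1}{N}\sum f(X - \bar{X} + \bar{X} - A)^2$

$= \frac{1}{N}\sum f\left[(X - \bar{X})^2 + (\bar{X} - A)^2 + 2(\bar{X} - A)(X - \bar{X})\right]$

$= \frac{1}{N}\sum f(X - \bar{X})^2 + (\bar{X} - A)^2\,\frac{1}{N}\sum f + 2(\bar{X} - A)\,\frac{1}{N}\sum f(X - \bar{X})$,

$(\bar{X} - A)$, being constant, is taken outside the summation sign. Since $\sum f = N$ and $\sum f(X - \bar{X}) = 0$,

$S^2 = \sigma^2 + (\bar{X} - A)^2$

so $S^2 = \sigma^2 + d^2$, where $d = \bar{X} - A$.

$(\bar{X} - A)^2$, being the square of a real quantity, is always non-negative.

Thus $S^2 = \sigma^2$ + (a non-negative quantity), i.e.

$S^2 \geq \sigma^2$

In other words, the mean square deviation is not less than the variance, or the root mean square deviation is not less than the standard deviation.

$S^2 = \sigma^2$ if and only if $(\bar{X} - A)^2 = 0$, i.e. $\bar{X} = A$.

Thus, $S^2$ will be least when $\bar{X} = A$. Hence, the mean square deviation, or equivalently the root mean square deviation, is least when the deviations are taken from the arithmetic mean, and the variance (standard deviation) is the minimum value of the mean square deviation (root mean square deviation).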

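Different Formulae:

· $\sigma_x^2 = \frac{1}{N}\sum f(X - \bar{X})^2$, ∑f = N

· $\sigma_x^2 = \frac{1}{N}\sum fX^2 - \bar{X}^2 = \frac{1}{N}\sum fX^2 - \left(\frac{1}{N}\sum fX\right)^2$

If d = X – A, where A is an arbitrary constant, then

· $\sigma_x^2 = \sigma_d^2 = \frac{1}{N}\sum fd^2 - \left(\frac{1}{N}\sum fd\right)^2$

If we change the origin and scale in X, i.e. if we take $d = \frac{X - A}{h}$, h > 0, then

· $\sigma_x^2 = h^2\sigma_d^2 = h^2\left[\frac{1}{N}\sum fd^2 - \left(\frac{1}{N}\sum fd\right)^2\right]$

· Coefficient of standard deviation $= \frac{\sigma}{\bar{X}}$, coefficient of variation $= \frac{\sigma}{\bar{X}} \times 100$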
Example: Calculate the standard deviation of the following observations on a certain variable:

240.12, 240.13, 240.15, 240.12, 240.17,

240.15, 240.17, 240.16, 240.22, 240.21.

Solution

X          X – X̄      (X – X̄)²
240.12     -0.04      0.0016
240.13     -0.03      0.0009
240.15     -0.01      0.0001
240.12     -0.04      0.0016
240.17      0.01      0.0001
240.15     -0.01      0.0001
240.17      0.01      0.0001
240.16      0.00      0
240.22      0.06      0.0036
240.21      0.05      0.0025
∑X = 2401.60   ∑(X – X̄) = 0   ∑(X – X̄)² = 0.0106

$\bar{X} = \frac{2401.60}{10} = 240.16$, variance $= \frac{0.0106}{10} = 0.00106$, $\sigma = \sqrt{0.00106} = 0.03256$
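The steps for the standard deviation map directly onto code. The sketch below uses the divisor n, as in the definition given in these notes; the variable names are our own.

```python
import math

obs = [240.12, 240.13, 240.15, 240.12, 240.17,
       240.15, 240.17, 240.16, 240.22, 240.21]

n = len(obs)
mean = sum(obs) / n
# Divisor n (not n - 1), matching the definition used in these notes.
variance = sum((x - mean) ** 2 for x in obs) / n
sd = math.sqrt(variance)

print(round(mean, 2), round(variance, 5), round(sd, 5))
```

Python's standard library offers the same population formulas as `statistics.pvariance` and `statistics.pstdev`.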

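Example: Calculate the mean and standard deviation from the following:

Value   90 – 99   80 – 89   70 – 79   60 – 69   50 – 59   40 – 49   30 – 39
f          2        12        22        20        14         4         1

Solution

Class      Mid-value (X)    f     d = (X – 64.5)/10     fd      fd²
90 – 99        94.5         2            3               6      18
80 – 89        84.5        12            2              24      48
70 – 79        74.5        22            1              22      22
60 – 69        64.5        20            0               0       0
50 – 59        54.5        14           -1             -14      14
40 – 49        44.5         4           -2              -8      16
30 – 39        34.5         1           -3              -3       9
Total                    ∑f = 75                    ∑fd = 27   ∑fd² = 127

$\bar{X} = A + \frac{h\sum fd}{N} = 64.5 + \frac{10 \times 27}{75} = 68.1$

$\sigma = h\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = 10\sqrt{\frac{127}{75} - \left(\frac{27}{75}\right)^2} = 12.505$

Coefficient of variation $= \frac{\sigma}{\bar{X}} \times 100 = \frac{12.505}{68.1} \times 100 = 18.36\%$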
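The step deviation formulas for the mean and standard deviation can be verified on this table. A minimal sketch, assuming A = 64.5 and h = 10 as in the solution; the variable names are ours.

```python
import math

mids = [94.5, 84.5, 74.5, 64.5, 54.5, 44.5, 34.5]
freq = [2, 12, 22, 20, 14, 4, 1]
A, h = 64.5, 10                      # assumed mean and class width

d = [round((x - A) / h) for x in mids]          # step deviations 3, 2, ..., -3
N = sum(freq)
sum_fd = sum(f * di for f, di in zip(freq, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freq, d))

mean = A + h * sum_fd / N
sd = h * math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2)
cv = sd / mean * 100                 # coefficient of variation, in percent

print(N, sum_fd, sum_fd2)
print(round(mean, 1), round(sd, 3), round(cv, 2))
```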

Skewness and Kurtosis

It has been pointed out that we need statistical measures which will reveal clearly the salient features of a frequency distribution. The measures of central tendency tell us about the concentration of the observations about the middle of the distribution, and the measures of dispersion give us an idea about the spread or scatter of the observations about some measure of central tendency. We may come across frequency distributions which differ widely in their nature and composition and yet have the same central tendency and dispersion, while giving histograms which differ very widely in shape and size.

Thus the measures of central tendency and dispersion are inadequate to characterize a distribution completely, and they must be supported and supplemented by two more measures: ‘skewness’ and ‘kurtosis’.

Skewness helps us to study the shape, i.e. symmetry or asymmetry, of the distribution, while kurtosis refers to the flatness or peakedness of the curve which can be drawn with the help of the given data.
Skewness

The literal meaning of skewness is ‘lack of symmetry’. We study skewness to have an idea about the shape of the curve which we can draw with the help of the given frequency distribution. It helps us to determine the nature and extent of the concentration of the observations towards the higher or lower values of the variable. A distribution is said to be skewed if:

(i) The frequency curve of the distribution is not a symmetric bell-shaped curve but is stretched more to one side than to the other.

(ii) The values of the mean, median and mode fall at different points, i.e. they do not coincide.

(iii) Quartiles Q1 and Q3 are not equidistant from the median, i.e. Q3 – Md ≠ Md – Q1.

(iv) The corresponding pairs of deciles and percentiles are not equidistant from the median, i.e.

$D_5 - D_{5-i} \neq D_{5+i} - D_5$ (i = 1, 2, 3, 4)

$P_{50} - P_{50-i} \neq P_{50+i} - P_{50}$ (i = 1, 2, ..., 49)

(v) The sum of the positive deviations from the median is not equal to the sum of the negative deviations from the median.

The following coefficients of skewness are commonly used:

1. Karl Pearson’s coefficient of skewness. This is given by the formula:

Skewness $= \frac{\text{Mean} - \text{Mode}}{\sigma}$

But quite often the mode is ill-defined and is thus quite difficult to locate. In such a situation, we use the following empirical relationship between the mean (M), median (Md) and mode (Mo) for a moderately asymmetrical (skewed) distribution:

Mo = 3Md – 2M

so that Skewness $= \frac{M - (3Md - 2M)}{\sigma} = \frac{3(M - Md)}{\sigma}$

2. Bowley’s coefficient of skewness. Prof. A. L. Bowley’s coefficient of skewness is based on the quartiles and is given by:

Skewness $= \frac{(Q_3 - Md) - (Md - Q_1)}{(Q_3 - Md) + (Md - Q_1)} = \frac{Q_3 + Q_1 - 2Md}{Q_3 - Q_1}$

This is also known as the quartile coefficient of skewness and is especially useful in situations where quartiles and the median are used.

3. Kelly’s measure of skewness. The drawback of Bowley’s coefficient of skewness (that it ignores the 50% of the data towards the extremes) can be partially removed by taking two deciles or percentiles equidistant from the median value. This refinement was suggested by Kelly. Kelly’s percentile (or decile) measure of skewness is given by:

Skewness = (P90 – P50) – (P50 – P10) = P90 + P10 – 2P50

But P50 = D5, P90 = D9 and P10 = D1. Hence

Skewness = (D9 – D5) – (D5 – D1) = D9 + D1 – 2D5

Kelly’s coefficient of skewness is given by:

$S_k(\text{Kelly}) = \frac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}} = \frac{D_9 + D_1 - 2D_5}{D_9 - D_1}$

4. Coefficient of skewness based on moments. This coefficient is based on the 2nd and 3rd moments about the mean.
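The first three coefficients are simple ratios and can be sketched as small functions. The function names below are our own. The Bowley example reuses the quartiles of the 100-student marks table, Q1 = 33.67 and Q3 = 57.0; the median 45.93 is our own interpolation from the same table and is not stated in the notes.

```python
def pearson_skewness(mean, median, sd):
    """Karl Pearson's coefficient in its median form: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

def bowley_skewness(q1, median, q3):
    """Quartile coefficient: (Q3 + Q1 - 2 * Md) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def kelly_skewness(p10, p50, p90):
    """Percentile coefficient: (P90 + P10 - 2 * P50) / (P90 - P10)."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)

# Quartiles of the marks table, rounded to two decimals.
sk_bowley = bowley_skewness(33.67, 45.93, 57.0)
print(round(sk_bowley, 3))
```

A value close to zero, as here, indicates that the quartiles are almost equidistant from the median.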

KURTOSIS

So far we have studied three measures, namely central tendency, dispersion and skewness, to describe the characteristics of a frequency distribution. However, even if we know all these three measures, we are not in a position to characterize a distribution completely. The following diagram will clarify the point.

[Figure: frequency curves A (lepto-kurtic), B (meso-kurtic) and C (platy-kurtic).]

Kurtosis is concerned with the flatness or peakedness of the frequency curve. A curve of type B, which is neither flat nor peaked, is known as the normal curve, and the shape of its hump (middle part) is accepted as the standard one. Curves with humps of the form of the normal curve are said to have normal kurtosis and are termed meso-kurtic. Curves of type A, which are more peaked than the normal curve, are known as lepto-kurtic and are said to have positive (excess) kurtosis. On the other hand, curves of type C, which are flatter than the normal curve, are called platy-kurtic and are said to have negative (excess) kurtosis.
THEORY OF PROBABILITY

If an experiment is performed repeatedly under essentially homogeneous and similar conditions, the result, or what is termed the outcome, may be classified as follows:

(a) It is unique or certain.

(b) It is not definite but may be one of the various possibilities, depending on the experiment.

Phenomena under category (a), where the results can be predicted with certainty, are known as deterministic or predictable phenomena. In a deterministic phenomenon, the conditions under which an experiment is performed uniquely determine the outcome of the experiment.

Phenomena under category (b), where the results cannot be predicted with certainty, are known as unpredictable or probabilistic phenomena. Such phenomena involve uncertainty or chance.

A numerical measure of uncertainty is provided by a very important branch of statistics called the “Theory of Probability”. Statistics is the science of decision making with calculated risk in the face of uncertainty.

Therefore, the theory of probability has as its central feature the concept of a repeatable random experiment, the outcome of which is uncertain.

Basic Definitions.

Before we define probability as a concept, it is necessary to review the definitions of some probability terms that shall be employed in our discussions.

(1) Trial and Event. Performing a random experiment is called a trial,
and an outcome or a combination of outcomes is termed an event.
For example, if a coin is tossed repeatedly, the result is not unique.
We may get either of the two faces, Head (H) or Tail (T). Thus tossing
a coin is a random experiment or trial, and getting a head or a tail
is an event.

(2) Random Experiment. An experiment is called a random experiment if,
when conducted repeatedly under essentially homogeneous
conditions, the result is not unique but may be any one of the various
possible outcomes.

(3) Outcome. An outcome is a possible result of a trial or experiment. For
example, in a toss of two coins, the outcome could be any one of HH,
HT, TH, TT. The possible outcomes in a throw of a die are 1, 2, 3, 4, 5
and 6.

(4) Exhaustive Cases. The total number of possible outcomes of a random
experiment is called the exhaustive number of cases for the experiment.
Thus, in a toss of a single coin, we can get Head (H) or Tail (T). Hence,
the exhaustive number of cases is 2, viz., H and T.

(5) Favourable Cases. The outcomes of a random experiment which entail
(or result in) the happening of an event are termed the cases
favourable to the event. For example, in a toss of two coins, the
number of cases favourable to the event ‘exactly one head’ is 2, viz.,
HT and TH, and the number favourable to getting two heads is 1, viz., HH.

(6) Mutually Exclusive Events. Two or more events are said to be
mutually exclusive if the happening of any one of them excludes the
happening of all the others in the same experiment. For example, in a
toss of a coin, the events ‘head’ and ‘tail’ are mutually exclusive
because if head comes up we cannot get tail, and if tail comes up we
cannot get head. Similarly, in the throw of a die, the six faces
numbered 1, 2, 3, 4, 5 and 6 are mutually exclusive. Thus, events are
said to be mutually exclusive if no two or more of them can happen
simultaneously.

(7) Equally Likely Cases. The outcomes are said to be equally likely or
equally probable if none of them is expected to occur in preference
to the others. Thus, in tossing a coin or throwing a die, all the
outcomes, H and T or the faces 1, 2, 3, 4, 5, 6, are equally likely if
the coin or die is unbiased.

(8) Independent Events. Events are said to be independent of each other
if the happening of any one of them is not affected by, and does not
affect, the happening of any of the others. For example, in throwing
a die repeatedly, the event of getting ‘5’ in the first throw is
independent of getting ‘5’ in the second, third or subsequent throws.

(9) Sample Space (S). This is the collection of all possible outcomes of
an experiment. It is a set of a finite or countably infinite number of
elementary outcomes and is usually represented by S. For example, the
sample space when a die is thrown twice is
S = {(1,1), (1,2), (1,3), . . . , (6,6)}
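The terms defined above can be illustrated with a short Python sketch. It is only an illustration, not part of the original notes: the variable names are made up here, and the coins and die are assumed to be unbiased so that every listed outcome is an equally likely case.

```python
from itertools import product

# Sample space for a toss of two coins: every ordered pair of faces.
two_coins = ["".join(faces) for faces in product("HT", repeat=2)]

# Sample space for two throws of a die: ordered pairs (first, second).
two_dice = list(product(range(1, 7), repeat=2))

# Cases favourable to the event "exactly one head" in a toss of two coins.
exactly_one_head = [outcome for outcome in two_coins if outcome.count("H") == 1]

print(two_coins)         # exhaustive cases for two coins
print(len(two_dice))     # exhaustive number of cases for two throws of a die
print(exactly_one_head)  # favourable cases
```

Listing the sample space explicitly like this is practical only for small experiments, but it makes the counting of exhaustive and favourable cases easy to check.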

The probability associated with an event is a measure of the belief that
the event will occur. There are three conceptual approaches to the
definition of probability:

i) Classical approach

ii) Empirical approach

iii) Axiomatic approach

CLASSICAL APPROACH

Definition: If a random experiment results in N exhaustive, mutually
exclusive and equally likely outcomes (cases), out of which M are
favourable to the happening of an event A, then the probability of
occurrence of A, usually denoted by P(A), is given by:

P(A) = M/N = (number of favourable cases) / (exhaustive number of cases)

Remarks.
(1) The number of cases favourable to the complementary event Ā, i.e.,
the non-happening of event A, is (N – M), and hence, by definition,
the probability of non-occurrence of A is given by:

P(Ā) = (N – M)/N = 1 – M/N = 1 – P(A)

(2) Since M ≥ 0 and N > 0, P(A) ≥ 0; since M ≤ N, P(A) ≤ 1. Hence
0 ≤ P(A) ≤ 1 for any event A.

If P(A) = 0, then A is called an impossible or null event. If
P(A) = 1, then A is called a certain event.

Limitations: The classical definition of probability has its shortcomings
and fails in the following situations:
(i) If N, the exhaustive number of outcomes of the random experiment,
is infinite.

(ii) If the various outcomes of the random experiment are not equally
likely.

(iii) If the actual value of N is not known.

EMPIRICAL APPROACH

Definition: If an experiment is performed repeatedly under essentially
homogeneous and identical conditions, then the limiting value of the ratio
of the number of times the event occurs to the number of trials, as the
number of trials becomes indefinitely large, is called the probability of
happening of the event, it being assumed that the limit is finite and unique.

Suppose that an event A occurs m times in N repetitions of a random
experiment. Then the ratio m/N gives the relative frequency of the event A,
and it will not vary appreciably from one trial to another. In the limiting
case, when N becomes sufficiently large, it more or less settles to a
number which is called the probability of A. Symbolically,

P(A) = lim (N → ∞) m/N

Limitations: The empirical probability P(A) defined above can never be
obtained exactly in practice; we can only attempt a close estimate of
P(A) by making N sufficiently large. The following are the limitations of
the empirical approach:

(i) The experimental conditions may not remain essentially
homogeneous and identical over a large number of repetitions of the
experiment.

(ii) The relative frequency m/N may not attain a unique value,
however large N may be.
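The way the relative frequency settles down as N grows can be seen in a small simulation. This is a sketch only: it assumes a fair coin simulated with Python's pseudo-random generator, and the function name and the fixed seed are choices made here for illustration.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Return m/N for the event 'head' in num_trials simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    heads = sum(1 for _ in range(num_trials) if rng.random() < 0.5)
    return heads / num_trials

# The ratio m/N is erratic for small N and moves close to 0.5 for large N.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

A simulation can only suggest the limiting value; as noted in the limitations above, no finite number of trials yields P(A) exactly.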

Example 1. An unbiased die is thrown at random. Find the probability that
the number on it is (i) 5, (ii) greater than 4, (iii) even.

Solution.
Since the die can fall with any one of the faces 1, 2, 3, 4, 5 and 6, the
exhaustive number of cases is 6.

(i) The number of cases favourable to the event of getting ‘5’ is 1:

required probability = 1/6

(ii) The number of cases favourable to the event of getting a number
greater than 4 is 2 (5 and 6): required probability = 2/6 = 1/3

(iii) The favourable cases for getting an even number are 2, 4 and 6:
required probability = 3/6 = 1/2

Example 2. A coin is tossed three times. What is the probability of getting
(i) exactly 1 head, (ii) exactly 2 heads, (iii) at least 2 heads?

Solution

Let H and T represent head and tail respectively. Using a tree diagram to
generate the sample space, we have S = {HHH, HTH, HHT, THH, TTH, HTT,
THT, TTT}, so the exhaustive number of cases is 8.

(i) Favourable cases: {HTT, THT, TTH}; P(1 head) = 3/8

(ii) Favourable cases: {HHT, THH, HTH}; P(2 heads) = 3/8

(iii) P(at least 2 heads) = P(2 heads) + P(3 heads) = 3/8 + 1/8 = 1/2
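Examples 1 and 2 can be checked by counting favourable and exhaustive cases directly. The sketch below is an illustration only; the helper classical_probability is a name introduced here, and it assumes all outcomes in the sample space are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, is_favourable):
    """P(A) = M / N, counting favourable cases among equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if is_favourable(o)]
    return Fraction(len(favourable), len(outcomes))

# Example 1: one throw of an unbiased die.
die = range(1, 7)
p_five = classical_probability(die, lambda face: face == 5)
p_greater_than_4 = classical_probability(die, lambda face: face > 4)
p_even = classical_probability(die, lambda face: face % 2 == 0)

# Example 2: three tosses of an unbiased coin.
tosses = ["".join(t) for t in product("HT", repeat=3)]
p_one_head = classical_probability(tosses, lambda t: t.count("H") == 1)
p_two_heads = classical_probability(tosses, lambda t: t.count("H") == 2)
p_at_least_two = classical_probability(tosses, lambda t: t.count("H") >= 2)

print(p_five, p_greater_than_4, p_even)         # 1/6 1/3 1/2
print(p_one_head, p_two_heads, p_at_least_two)  # 3/8 3/8 1/2
```

Using Fraction keeps the answers as exact ratios M/N rather than rounded decimals.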

ADDITION THEOREM OF PROBABILITY

Theorem 1. The probability of occurrence of at least one of the two events
A and B is given by: P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

Theorem 2. The probability of happening of any one of two mutually
exclusive events is equal to the sum of their individual probabilities, i.e.,

P(A ∪ B) = P(A) + P(B), since P(A ∩ B) = 0

Multiplication Theorem of Probability

Theorem 3. The probability of the simultaneous happening of two events A
and B is given by:

P(A ∩ B) = P(A) . P(B/A); P(A) ≠ 0, or

P(B ∩ A) = P(B) . P(A/B); P(B) ≠ 0

where P(B/A) is the conditional probability of B happening given that A
has happened, and P(A/B) is the conditional probability of A given that
B has happened.

For independent events, P(A ∩ B) = P(A) . P(B).

In general, if the events A1, A2, A3, - - -, An are mutually independent, then

P(A1 ∩ A2 ∩ A3 ∩ - - - ∩ An) = P(A1) . P(A2) . P(A3) . - - - . P(An)

i.e., the probability of the simultaneous happening of n independent events
is equal to the product of the probabilities of their individual happenings.

Note: P(A/B) = P(A ∩ B) / P(B), P(B) ≠ 0

P(B/A) = P(A ∩ B) / P(A), P(A) ≠ 0

Example 3. A bag contains 8 white and 3 red balls. If two balls are drawn at
random without replacement, find the probability that
(i) Both are white

(ii) Both are red

(iii) One is of each colour

Solution.

Total number of balls in the bag = 8 + 3 = 11

(i) P(1st ball is white) = 8/11

P(2nd ball is white, given the 1st was white) = 7/10

P(both balls are white) = 8/11 x 7/10 = 56/110 = 28/55

(ii) P(1st ball is red) = 3/11

P(2nd ball is red, given the 1st was red) = 2/10

P(both balls are red) = 3/11 x 2/10 = 6/110 = 3/55

(iii) P(one is of each colour)

= P(1st is white and 2nd is red) + P(1st is red and 2nd is white)

= 8/11 x 3/10 + 3/11 x 8/10 = 48/110 = 24/55
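As a cross-check on Example 3, the three probabilities can also be obtained by brute-force enumeration of every ordered pair of draws. This is an illustrative sketch (the helper name probability is made up here) and it assumes each ordered pair of distinct balls is equally likely, which is what drawing at random without replacement means.

```python
from fractions import Fraction
from itertools import permutations

# 8 white (W) and 3 red (R) balls; two are drawn in order, without replacement.
bag = ["W"] * 8 + ["R"] * 3
draws = list(permutations(range(len(bag)), 2))  # 11 x 10 = 110 ordered draws

def probability(event):
    """Fraction of the equally likely ordered draws for which event holds."""
    favourable = sum(1 for i, j in draws if event(bag[i], bag[j]))
    return Fraction(favourable, len(draws))

both_white = probability(lambda first, second: first == "W" and second == "W")
both_red = probability(lambda first, second: first == "R" and second == "R")
one_each = probability(lambda first, second: first != second)

print(both_white, both_red, one_each)  # 28/55 3/55 24/55
```

The three events are mutually exclusive and exhaustive, so their probabilities add up to 1.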

The Axiomatic Approach

Let S be the sample space of an experiment. The probability of an event A
in S is defined as a real-valued function P(A) which satisfies the
following properties.

Property 1: 0 ≤ P(A) ≤ 1 for every event A; in particular, P(A) ≥ 0
(Axiom of non-negativity)

Property 2: P(S) = 1 (Axiom of certainty)

Property 3: If A1, A2, - - -, An is any finite or infinite sequence of
disjoint events of S, then

P(A1 ∪ A2 ∪ - - - ∪ An) = P(A1) + P(A2) + - - - + P(An), or

P(∪ Ai) = Σ P(Ai) (Axiom of additivity)

Example. Let A and B be two events of an experiment, and suppose
P(A) = 0.4, P(A ∪ B) = 0.7 and P(B) = p.

(i) For what choice of p are A and B mutually exclusive?

(ii) For what choice of p are A and B independent?

Solution.
(i) We have P(A ∪ B) = P(A) + P(B) – P(A ∩ B), so

P(A ∩ B) = P(A) + P(B) – P(A ∪ B) = 0.4 + p – 0.7 = p – 0.3

If A and B are mutually exclusive, then

P(A ∩ B) = 0, so p – 0.3 = 0, i.e., p = 0.3

(ii) A and B are independent if and only if

P(A ∩ B) = P(A) . P(B)

p – 0.3 = (0.4) x p, so (1 – 0.4) p = 0.3

0.6 p = 0.3, i.e., p = 0.3/0.6 = 0.5

Theorems
(1) P(Ā) = 1 – P(A)

(2) (i) P(Ā ∩ B) = P(B) – P(A ∩ B)

(ii) P(A ∩ B̄) = P(A) – P(A ∩ B)

(3) If events A and B are independent, then the events

(i) A and B̄ are independent

(ii) Ā and B are independent

(iii) Ā and B̄ are independent

Example. Probability that a man will be alive 25 years hence is 0.3 and the
probability that his wife will be alive 25 years hence is 0.4. Find the
probability that 25 years hence

(i) Both will be alive

(ii) Only the man will be alive

(iii) Only the woman will be alive

(iv) None will be alive

(v) At least one of them will be alive

Solution

Let A be the event that the man will be alive 25 years hence, B the event
that the woman will be alive 25 years hence. P (A) = 0.3 and P (B) = 0.4

(i) P(A ∩ B) = P(A) . P(B) = 0.3 x 0.4 = 0.12 (assuming A and B are
independent)

(ii) P(A ∩ B̄) = P(A) – P(A ∩ B) = 0.30 – 0.12 = 0.18

(iii) P(Ā ∩ B) = P(B) – P(A ∩ B) = 0.40 – 0.12 = 0.28

(iv) P(Ā ∩ B̄) = P(Ā) . P(B̄) = (1 – 0.3)(1 – 0.4) = 0.42

(v) P(A ∪ B) = 1 – P(none will be alive) = 1 – 0.42 = 0.58

or

P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = P(A) + P(B) – P(A) . P(B)

= 0.3 + 0.4 – 0.3 x 0.4 = 0.70 – 0.12 = 0.58
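The five answers can be reproduced in a few lines of Python. This is only a sketch with variable names chosen here; like the solution above, it assumes that the survival of the man and of his wife are independent events.

```python
p_man, p_wife = 0.3, 0.4  # P(A) and P(B); independence of A and B is assumed

both = p_man * p_wife              # P(A and B)
only_man = p_man - both            # P(A and not B)
only_wife = p_wife - both          # P(not A and B)
none = (1 - p_man) * (1 - p_wife)  # P(not A and not B)
at_least_one = 1 - none            # P(A or B)

# The four disjoint cases exhaust the sample space, so they must sum to 1.
total = both + only_man + only_wife + none

for name, value in [("both", both), ("only man", only_man),
                    ("only wife", only_wife), ("none", none),
                    ("at least one", at_least_one), ("total", total)]:
    print(f"{name}: {value:.2f}")
```

Checking that the four disjoint cases add up to 1 is a quick way to catch arithmetic slips in problems of this kind.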

BAYES’ RULE (INVERSE PROBABILITY)

One of the important applications of conditional probability is in the
computation of unknown probabilities on the basis of the information
supplied by the experiment or past records. For example, suppose an
event has occurred through one of various mutually disjoint events or
reasons. Then the conditional probability that it has occurred due to a
particular event or reason is called its inverse or posterior probability.
These probabilities are computed by Bayes’ rule.

Theorem. If an event A can only occur in conjunction with one of the
mutually exclusive and exhaustive events E1, E2, - - -, En, and if A actually
happens, then the probability that it was preceded by the particular event
Ei (i = 1, 2, - - -, n) is

P(Ei/A) = P(Ei) . P(A/Ei) / [P(E1) . P(A/E1) + P(E2) . P(A/E2) + - - - + P(En) . P(A/En)]

Remark. The probabilities P(E1), P(E2), - - -, P(En), which are already given
or known before conducting the experiment, are termed a priori (prior)
probabilities. The conditional probabilities P(E1/A), P(E2/A), - - -, P(En/A),
which are computed after conducting the experiment, i.e., after the
occurrence of A, are termed a posteriori (posterior) probabilities.

Example. In a bolt factory, machines A, B and C manufacture respectively
25%, 35% and 40% of the total output. Of their output, 5, 4 and 2 percent
respectively are known to be defective bolts. A bolt is drawn at random
from the product and is found to be defective. What are the probabilities
that it was manufactured by

(i) Machine A, (ii) Machine B or C

Solution

Let E1, E2 and E3 denote the events that the bolt selected at random is
manufactured by machines A, B and C respectively, and let E denote the
event that it is defective. Then we have

Ei                           E1       E2       E3       Total

P(Ei)                        0.25     0.35     0.40

P(E/Ei)                      0.05     0.04     0.02

P(E ∩ Ei) = P(Ei) x P(E/Ei)  0.0125   0.0140   0.0080   P(E) = 0.0345

(i) Hence, the probability that a defective bolt chosen at random was
manufactured by machine A is given by Bayes’ rule as:

P(E1/E) = P(E1) . P(E/E1) / P(E) = 0.0125/0.0345 = 0.36

(ii) Similarly,

P(E2/E) = 0.0140/0.0345 = 0.41

P(E3/E) = 0.0080/0.0345 = 0.23

Hence, the probability that a defective bolt chosen at random was
manufactured by machine B or C is:

P(E2/E) + P(E3/E) = 0.41 + 0.23 = 0.64

Or

Required probability = 1 – P(E1/E) = 1 – 0.36 = 0.64

Or using a tree diagram:

[Tree diagram: the three branches E1, E2 and E3 each lead to the defective
outcome E.]

Events     Probability
E1 ∩ E     0.25 x 0.05 = 0.0125
E2 ∩ E     0.35 x 0.04 = 0.0140
E3 ∩ E     0.40 x 0.02 = 0.0080

Total = 0.0345

From the above diagram, the probability that a defective bolt was
manufactured by machine A is

P(E1/E) = 0.0125/0.0345 = 0.36

and similarly

P(E2/E) = 0.0140/0.0345 = 0.41

P(E3/E) = 0.0080/0.0345 = 0.23
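The bolt-factory calculation follows the same pattern for any number of machines, so it can be written as a small function. The sketch below is an illustration; the function name posterior and its argument names are introduced here and are not part of the notes.

```python
def posterior(priors, likelihoods):
    """Return P(Ei/E) for each i, given the priors P(Ei) and the values P(E/Ei)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(E and Ei)
    total = sum(joint)                                     # P(E)
    return [j / total for j in joint]

priors = [0.25, 0.35, 0.40]       # P(E1), P(E2), P(E3): share of total output
likelihoods = [0.05, 0.04, 0.02]  # P(E/Ei): proportion defective per machine

post = posterior(priors, likelihoods)
print([round(p, 2) for p in post])  # posterior probabilities for A, B, C
print(round(post[1] + post[2], 2))  # machine B or C
```

The denominator P(E) is the same for every machine, which is why the posterior probabilities always add up to 1.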

Random Variable

Intuitively, by a random variable (r.v.) we mean a real number X associated
with the outcomes of a random experiment. It can take any one of the
various possible values, each with a definite probability. For example, in a
throw of a die, if X denotes the number obtained, then X is a random
variable which can take any one of the values 1, 2, 3, 4, 5 or 6, each with
equal probability 1/6. Similarly, in a toss of a coin, if X denotes the
number of heads, then X is a random variable which can take either of the
two values 0 (no head) or 1 (head), each with equal probability 1/2.

As a further example, consider the variable X, which is the number of heads
obtained when three (3) coins are tossed simultaneously. Then X is a
random variable which can take any one of the values 0, 1, 2, 3.

Outcome       HHH   HTH   THH   HHT   HTT   THT   TTH   TTT

Values of X   3     2     2     2     1     1     1     0

Thus, rigorously speaking, a random variable may be defined as a real-valued
function on the sample space, taking values on the real line R = (–∞, ∞).
In other words, a random variable is a function which takes real values
that are determined by the outcomes of the random experiment. Random
variables are denoted by capital letters X, Y, Z, etc.
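The idea of a random variable as a function on the sample space can be made concrete for the three-coin experiment. This is an illustrative sketch that assumes unbiased coins; the function name X simply mirrors the notation used in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three coins tossed together: 8 equally likely outcomes.
sample_space = ["".join(t) for t in product("HT", repeat=3)]

def X(outcome):
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")

# Count how many outcomes map to each value of X, then divide by 8.
counts = Counter(X(outcome) for outcome in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

The resulting pairs (x, p) form the probability distribution of X: the values 0, 1, 2, 3 occur with probabilities 1/8, 3/8, 3/8 and 1/8.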

Discrete and Continuous Random Variable.

If the random variable X assumes only a finite or countably infinite set of
values, it is known as a discrete random variable, for example, the marks
obtained by students in a test. On the other hand, if the random variable X
can assume an uncountably infinite set of values, it is said to be a
continuous random variable, e.g., the age, height or weight of students in
a class.

Mathematical Expectation

If X is a random variable which can assume any one of the values x1, x2, - - -, xn
with the respective probabilities p1, p2, - - -, pn, then the mathematical
expectation of X, usually called the expected value of X and denoted by
E(X), is defined as:

E(X) = p1 x1 + p2 x2 + - - - + pn xn = Σ pi xi (for a discrete random variable)

where Σ pi = p1 + p2 + - - - + pn = 1

More precisely, if X is a random variable with probability distribution
(x, p(x)), then E(X) = Σ x . p(x).

Physical Interpretation of E(X)

Let us consider the following frequency distribution of the random variable
X:

X    x1   x2   x3   - - -   xi   - - -   xn

f    f1   f2   f3   - - -   fi   - - -   fn

Then the mean of the distribution is given by:

x̄ = (1/N) Σ fi xi = (f1/N) x1 + (f2/N) x2 + - - - + (fn/N) xn - - - (*)

where N = Σ fi is the total frequency. We observe that, out of a total of
N cases, fi cases are favourable to xi,

so P(X = xi) = fi/N = pi

Substituting in (*), we get:

x̄ = p1 x1 + p2 x2 + - - - + pn xn

= E(X) (by definition of E(X))

Hence, the mathematical expectation of a random variable is nothing but its
arithmetic mean.

Theorem on Expectation

Theorem 1. E (C) = C, where C is a constant

Theorem 2. E (cX) = c E (X), where c is a constant

Theorem 3. E (aX + b) = a E (X) + b, where a and b are constants

Theorem 4. E (X + Y) = E (X) + E (Y)

Theorem 5. E(XY) = E(X) . E(Y), if X and Y are independent random variables

Remark. It should be borne in mind that the multiplication theorem of
expectation is stated for independent random variables, while no such
condition on the variables is required for the addition theorem of
expectation.

Variance of X in Terms of Expectation

Variance of X (Var(X)) is defined as:

Var(X) = E[X – E(X)]²

= E(X²) – [E(X)]² = Σ x² p(x) – [Σ x p(x)]²

Theorems:

Theorem 1. Var(X ± c) = Var(X), where c is a constant

Theorem 2. Var(aX) = a² . Var(X), where a is a constant

Theorem 3. Var(c) = 0, where c is a constant

Example. What is the expected number of heads appearing when a fair
coin is tossed three times?

Solution. Let X denote the number of heads obtained in a random toss of 3
coins. Then the probability distribution of X is

X       0     1     2     3

P(X)    1/8   3/8   3/8   1/8

E(X) = Σ x . p(x) = (1/8)(0 + 1 x 3 + 2 x 3 + 3 x 1) = 12/8 = 1.5

Example. A contractor spends N30,000 to prepare for a bid on a
construction project which, after deducting manufacturing expenses and
the cost of bidding, will yield a profit of N250,000 if the bid is won. If the
chance of winning the bid is ten percent, compute his expected profit and
state the likely decision on whether to bid or not to bid.

Solution
Expenses = N30,000; this will be his loss if the bid is not won.

Profit if the bid is won = N250,000

P(winning the bid) = 10% = 0.10

P(losing the bid) = 1 – 0.10 = 0.90

Expected profit = 250,000 x 0.10 + (–30,000) x 0.90 = 25,000 – 27,000 = –N2,000

Since the contractor’s expected profit is negative, he should not bid for the
contract.
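Both expectation examples apply the same rule E(X) = Σ x . p(x), which can be sketched as a short Python function. The function name expectation and the check that the probabilities sum to 1 are additions made here for illustration.

```python
def expectation(values, probabilities):
    """E(X) = sum of x * p(x) for a discrete random variable."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))

# Number of heads in three tosses of a fair coin.
heads = expectation([0, 1, 2, 3], [1/8, 3/8, 3/8, 1/8])

# Contractor: profit of 250,000 with probability 0.10, loss of 30,000 otherwise.
bid = expectation([250_000, -30_000], [0.10, 0.90])

print(heads)       # 1.5
print(round(bid))  # -2000
```

A negative expected value, as in the contractor's case, means that the venture loses money on average over many repetitions.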

Example. A random variable X has the following probability distribution

Xi 4 5 6 8

Pi 0.1 0.3 0.4 0.2

Find the var (X)

Solution

Var(X) = E[X – E(X)]²

= E(X²) – [E(X)]²

= Σ PX² – (Σ PX)²

X       P     PX    PX²

4       0.1   0.4   1.6

5       0.3   1.5   7.5

6       0.4   2.4   14.4

8       0.2   1.6   12.8

Total   1     5.9   36.3

Var(X) = 36.3 – (5.9)² = 36.3 – 34.81 = 1.49
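The table-based calculation of the variance can be reproduced directly from the formula Var(X) = E(X²) – [E(X)]². The sketch below is illustrative only, and the function name mean_and_variance is chosen here.

```python
def mean_and_variance(values, probabilities):
    """Return (E(X), Var(X)) using Var(X) = E(X^2) - [E(X)]^2."""
    mean = sum(p * x for x, p in zip(values, probabilities))
    mean_of_squares = sum(p * x * x for x, p in zip(values, probabilities))
    return mean, mean_of_squares - mean ** 2

mean, variance = mean_and_variance([4, 5, 6, 8], [0.1, 0.3, 0.4, 0.2])
print(round(mean, 2), round(variance, 2))  # 5.9 1.49
```

The two sums inside the function correspond to the column totals Σ PX and Σ PX² of the table.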

Theoretical Probability Distributions

A scientific way of drawing inferences about the population characteristics
is through the study of theoretical distributions. In the population, the
values of the variable may be distributed according to some definite
probability law which can be expressed mathematically, and the
corresponding probability distribution is known as a theoretical probability
distribution. Such probability laws may be based on ‘a priori’
considerations or ‘a posteriori’ inferences. These distributions are based
on expectations on the basis of previous experience. Theoretical
distributions also enable us to fit a mathematical model or a function of
the form Y = P(X) to the given data.

Probability Distribution of a Discrete Random Variable

Let us consider a discrete random variable X which can take the possible
values x1, x2, x3, - - -, xn. With each value of the variable, we associate a
number,

pi = P(X = xi); i = 1, 2, - - -, n

which is known as the probability of xi and satisfies the following
conditions:

(i) pi = P(X = xi) ≥ 0, i.e., the pi’s are all non-negative, and

(ii) Σ pi = p1 + p2 + - - - + pn = 1, i.e., the total probability is one.

More specifically, let X be a discrete random variable and define
p(x) = P(X = x) such that p(x) ≥ 0 and Σ p(x) = 1. The function
pi = P(X = xi), or p(x), is called the probability function, or more
precisely the probability mass function (pmf), of the random variable X,
and the set of all possible ordered pairs [x, p(x)] is called the
probability distribution of the random variable X. It is usually
represented in tabular form.

Probability Distribution of a Continuous Random Variable

Unlike a discrete probability distribution, a continuous probability
distribution cannot be presented in tabular form. It is given either as a
formula or as a graph. Let X be a continuous random variable taking
values on the interval [a, b]. A function p(x) is said to be the probability
density function (pdf) of the continuous random variable X if it satisfies
the following properties:

(i) p(x) ≥ 0 for all x in the interval [a, b]

(ii) For two distinct numbers c and d in the interval [a, b] with c < d,
P(c ≤ X ≤ d) = area under the probability curve between the ordinates
(vertical lines) at X = c and X = d

(iii) The total area under the probability curve is 1,

i.e., P(a ≤ X ≤ b) = 1

[Figure: the probability curve p(x), with the area between the ordinates at
X = c and X = d shaded; this area is P(c ≤ X ≤ d).]

We shall study the following univariate probability distributions;

(i) Binomial probability distribution

(ii) Poisson probability distribution

(iii) Normal probability distribution

The first two distributions are discrete probability distributions and


the third is a continuous probability distribution.

BINOMIAL PROBABILITY DISTRIBUTION

The binomial distribution can be applied under the following conditions:

(i) The random experiment is performed repeatedly a finite and fixed
number of times. In other words, n, the number of trials, is finite and
fixed.

(ii) The outcome of each trial of the random experiment results in a
dichotomous classification of events, that is, success (with
probability p) or failure (with probability q = 1 – p).

(iii) All the trials are independent.

(iv) The probability of success (happening of an event) in any trial is p
and is constant for each trial.

PROBABILITY FUNCTION OF BINOMIAL DISTRIBUTION

If X denotes the number of successes in a sequence of n independent
Bernoulli trials with probability of success p on each trial, then X is a
binomial random variable and its pmf is given by

P(X = x) = b(x; n, p) = C(n, x) p^x q^(n–x), x = 0, 1, 2, - - -, n

and 0 otherwise, where q = 1 – p and C(n, x) = n! / [x! (n – x)!].

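Theorem. If X has a binomial distribution, then the mean E(X) = np and the
variance V(X) = npq.

Proof:

By definition,

E(X) = Σ x . f(x)

For the binomial distribution:

E(X) = Σ (x = 0 to n) x C(n, x) p^x q^(n–x)

= Σ (x = 1 to n) x . n! / [x! (n – x)!] p^x q^(n–x)

= np Σ (x = 1 to n) (n – 1)! / [(x – 1)! (n – x)!] p^(x–1) q^(n–x)

= np Σ (k = 0 to n – 1) C(n – 1, k) p^k q^(n–1–k), put k = x – 1

= np (p + q)^(n–1)

E(X) = np, since p + q = 1

By definition, V(X) = E(X²) – [E(X)]²

But E(X²) = E[X(X – 1)] + E(X), and

E[X(X – 1)] = Σ (x = 2 to n) x(x – 1) C(n, x) p^x q^(n–x)

= n(n – 1) p² Σ (x = 2 to n) C(n – 2, x – 2) p^(x–2) q^(n–x)

Put k = x – 2:

E[X(X – 1)] = n(n – 1) p² Σ (k = 0 to n – 2) C(n – 2, k) p^k q^(n–2–k) = n(n – 1) p²

So, E(X²) = n(n – 1) p² + np

Hence, V(X) = n(n – 1) p² + np – (np)² = np – np² = np(1 – p) = npq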
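The binomial pmf and the two results of the theorem can be checked numerically. The sketch below is illustrative only: binomial_pmf is a name introduced here, math.comb supplies the binomial coefficient C(n, x), and the values n = 10 and p = 0.3 are arbitrary choices that do not come from the notes.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variate; 0 outside x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values, not taken from the notes
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                # should be 1
mean = sum(x * px for x, px in enumerate(pmf))  # should equal np
variance = sum(x * x * px for x, px in enumerate(pmf)) - mean ** 2  # npq

print(round(total, 6), round(mean, 6), round(variance, 6))
```

Here np = 3 and npq = 2.1, so the printed mean and variance should agree with the theorem up to rounding error.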

Example: Ten unbiased coins are tossed simultaneously. Find the
probability of obtaining

(i) Exactly 6 heads

(ii) At least 8 heads

(iii) No head

(iv) At least one head

(v) Not more than three heads

(vi) At least 4 heads

Solution: If p denotes the probability of a head, then p = q = 1/2. Here
n = 10. If the random variable X denotes the number of heads, then by the
binomial probability law, the probability of x heads is given by

p(x) = P(X = x) = C(10, x) (1/2)^x (1/2)^(10–x) = C(10, x) / 1024, x = 0, 1, - - -, 10

(i) Required probability = p(6) = 210/1024 = 0.2051

(ii) P(X ≥ 8) = p(8) + p(9) + p(10) = (45 + 10 + 1)/1024 = 0.0547

(iii) P(X = 0) = p(0) = 1/1024 = 0.0010

(iv) P(at least one head) = 1 – P(no head) = 1 – p(0) = 0.9990

(v) P(X ≤ 3) = p(0) + p(1) + p(2) + p(3) = (1 + 10 + 45 + 120)/1024 = 0.1719

(vi) P(X ≥ 4) = p(4) + p(5) + - - - + p(10)

= 1 – P(X ≤ 3)

= 1 – [p(0) + p(1) + p(2) + p(3)] = 0.8281
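The probabilities asked for in the ten-coin example can be verified with a few lines of Python. This is a sketch for checking the arithmetic; the helper binomial_pmf and the dictionary labels are names chosen here.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variate with parameters n and p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5
P = [binomial_pmf(x, n, p) for x in range(n + 1)]  # P[x] = P(X = x)

results = {
    "exactly 6 heads": P[6],
    "at least 8 heads": sum(P[8:]),
    "no head": P[0],
    "at least one head": 1 - P[0],
    "not more than 3 heads": sum(P[:4]),
    "at least 4 heads": 1 - sum(P[:4]),
}
for label, value in results.items():
    print(f"{label}: {value:.4f}")
```

Complementary events such as "at least one head" and "no head" are computed as one minus the other, which avoids summing long lists of terms.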

Example: Protein-caloric malnutrition occurs in 10% of all persons in a
certain community. In a randomly selected sample of five persons,
calculate the probability that

(a) Not less than 4 persons have protein-caloric malnutrition

(b) At most 1 person has protein-caloric malnutrition

(c) Between 1 and 4 persons, exclusive, have protein-caloric
malnutrition

Solution: If X is the random variable denoting the number of persons with
protein-caloric malnutrition, then X has a binomial distribution with
parameters n = 5 and

p = 10% = 0.10, q = 1 – 0.10 = 0.90, so P(X = x) = C(5, x) (0.10)^x (0.90)^(5–x)

(a) P(X ≥ 4) = P(X = 4) + P(X = 5) = 0.00045 + 0.00001 = 0.0005

(b) P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.59049 + 0.32805 = 0.9185

(c) P(1 < X < 4) = P(X = 2) + P(X = 3) = 0.0729 + 0.0081 = 0.0810

POISSON DISTRIBUTION (AS A LIMITING CASE OF BINOMIAL DISTRIBUTION)

The Poisson distribution may be obtained as a limiting case of the binomial
probability distribution under the following conditions:

(i) n, the number of trials, is indefinitely large, i.e., n → ∞.

(ii) p, the constant probability of success for each trial, is indefinitely
small, i.e., p → 0.

(iii) np = λ (say) is finite.

Under the above three conditions, the binomial probability mass function
(pmf) tends to the probability function of the Poisson distribution given
below:

P(X = x) = e^(–λ) λ^x / x!, x = 0, 1, 2, - - -

where X is the number of successes (occurrences of the event), λ = np,
and e = 2.71828 (the base of the system of natural logarithms).
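The limiting behaviour can be observed numerically by holding np = λ fixed while n grows. The sketch below is an illustration only: the function names are introduced here, and the values λ = 2 and x = 3 are arbitrary choices that do not come from the notes.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variate with parameters n and p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson variate."""
    return exp(-lam) * lam ** x / factorial(x)

lam, x = 2.0, 3  # illustrative values, not taken from the notes
for n in (10, 100, 1000, 10_000):
    p = lam / n  # keep np = lam fixed while n grows and p shrinks
    print(n, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
```

As n increases, the binomial column should move steadily closer to the fixed Poisson value in the last column.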

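Theorem. If X has a Poisson distribution with parameter λ, then E(X) = λ
and V(X) = λ.

Proof:

By definition (X has a Poisson distribution),

E(X) = Σ (x = 0 to ∞) x e^(–λ) λ^x / x! = λ e^(–λ) Σ (x = 1 to ∞) λ^(x–1) / (x – 1)!

Put k = x – 1:

E(X) = λ e^(–λ) Σ (k = 0 to ∞) λ^k / k! = λ e^(–λ) e^λ = λ

By definition, V(X) = E(X²) – [E(X)]²

But E(X²) = E[X(X – 1)] + E(X)

So

E[X(X – 1)] = Σ (x = 0 to ∞) x(x – 1) e^(–λ) λ^x / x! = λ² e^(–λ) Σ (x = 2 to ∞) λ^(x–2) / (x – 2)!

Put k = x – 2:

E[X(X – 1)] = λ² e^(–λ) Σ (k = 0 to ∞) λ^k / k! = λ²

E(X²) = λ² + λ, and hence V(X) = λ² + λ – λ² = λ

Hence, the proof.

In the Poisson distribution, the mean and the variance are equal.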

Practical situations where Poisson distribution can be Applied

1. The number of telephone calls arriving at a telephone switchboard in
unit time (say, per minute)

2. The number of customers arriving at a supermarket, say, per hour

3. The number of defects per unit of manufactured product

4. The number of radioactive disintegrations of a radioactive element
per unit of time

5. The number of bacteria per unit

6. The number of defective items, say, pins, blades, etc., in a packet
manufactured by a good concern

7. The number of suicides reported on a particular day, or the number of
casualties (persons dying) due to a rare cause such as heart attack,
cancer or snake bite in any part.

8. The number of accidents taking place per day on a busy road

9. The number of typographical errors per page in typed material, or
the number of printing mistakes per page in a book.

Example: Between the hours of 2 pm and 4 pm, the average number of phone
calls per minute coming into the switchboard of a company is 2.35. Find
the probability that during a particular minute there will be at most two
phone calls.

Solution: If the random variable X denotes the number of telephone calls
per minute, then X will follow the Poisson distribution with parameter
λ = 2.35 and pmf

P(X = x) = e^(–2.35) (2.35)^x / x!, x = 0, 1, 2, - - -

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = e^(–2.35) (1 + 2.35 + 2.7613) = 0.5828

Example: If 5% of the electric bulbs manufactured by a company are
defective, find the probability that in a sample of 100 bulbs

(i) None is defective

(ii) 5 bulbs will be defective

Solution

Here, n = 100, p = 5% = 0.05

Since p is small and n is large, we may approximate the given distribution
by the Poisson distribution with

λ = np = 100 x 0.05 = 5

Let the random variable X denote the number of defective bulbs in a
sample of 100. Then P(X = x) = e^(–5) 5^x / x!

(i) P(X = 0) = e^(–5) = 0.0067

(ii) P(X = 5) = e^(–5) 5^5 / 5! = 0.1755
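The two Poisson examples can be checked with the pmf written as a small function. This is an illustrative sketch; poisson_pmf and the variable names are chosen here and are not part of the notes.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson variate."""
    return exp(-lam) * lam ** x / factorial(x)

# Switchboard: lam = 2.35 calls per minute, at most two calls in a minute.
at_most_two = sum(poisson_pmf(x, 2.35) for x in range(3))

# Bulbs: n = 100 and p = 0.05, so lam = np = 5.
none_defective = poisson_pmf(0, 5)
five_defective = poisson_pmf(5, 5)

print(round(at_most_two, 4), round(none_defective, 4), round(five_defective, 4))
```

In the bulb example the Poisson values are approximations to the exact binomial probabilities with n = 100 and p = 0.05.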

NORMAL DISTRIBUTION

The distributions discussed so far, viz., the binomial distribution and the
Poisson distribution, are discrete probability distributions, since the
variables under study are discrete random variables. We now turn to
continuous probability distributions, which arise when the underlying
variable is a continuous one.

The normal probability distribution, commonly called the normal
distribution, is one of the most important continuous theoretical
distributions in statistics. Much of the data encountered in applied
statistics conforms approximately to this distribution. The normal
distribution is also known as the Gaussian distribution (Gaussian Law
of Errors) after Carl Friedrich Gauss (1777 – 1855), who used this
distribution to describe the theory of accidental errors of measurement
involved in the calculation of the orbits of heavenly bodies. Today, the
normal probability model is one of the most important probability models
in statistical analysis.

EQUATION OF NORMAL PROBABILITY CURVE

If X is a continuous random variable following the normal probability
distribution with mean µ and standard deviation σ, then its probability
density function (pdf) is given by

f(x) = [1 / (σ √(2π))] e^(–(x – µ)² / (2σ²)), –∞ < x < ∞

Or

f(x) = [1 / (σ √(2π))] exp[–(1/2) ((x – µ)/σ)²], –∞ < x < ∞

where π and e are the constants given by

π = 3.14159 and e = 2.71828 (the base of the system of natural logarithms)

The mean µ and standard deviation σ are the parameters of the normal
distribution.

STANDARD NORMAL DISTRIBUTION

If X is a random variable following the normal distribution with mean µ and
standard deviation σ, then the random variable Z defined as

Z = (X – µ) / σ

is called the standard normal variate (S.N.V.). We have:

E(Z) = [E(X) – µ] / σ = 0 and Var(Z) = Var(X) / σ² = 1

Therefore, the standard normal variate has mean 0 and standard deviation
1. Hence, the probability density function (pdf) of the S.N.V. Z is given by:

φ(z) = [1 / √(2π)] e^(–z²/2), –∞ < z < ∞

This gives the height (ordinate) of the standard normal curve at the point z.
A standard normal variate Z is denoted by Z ~ N(0, 1).

RELATION BETWEEN BINOMIAL AND NORMAL DISTRIBUTIONS

The normal distribution is a limiting case of the binomial probability
distribution under the following conditions:

(i) n, the number of trials, is indefinitely large, i.e., n → ∞

(ii) Neither p nor q is very small

For a binomial variate X with parameters n and p,

E(X) = np and Var(X) = npq

and the standardized binomial variate Z = (X – np) / √(npq) tends to the
distribution of the standard normal variate. Thus, the normal
approximation to the binomial distribution is better for increasing values
of n and is exact in the limiting case as n → ∞.

RELATION BETWEEN POISSON AND NORMAL DISTRIBUTION

If X is a random variable following the Poisson distribution with
parameter λ, then

E(X) = λ and Var(X) = λ

Thus the standardized Poisson variate becomes:

Z = (X – λ) / √λ

Thus, the normal distribution may also be regarded as a limiting case of
the Poisson distribution as the parameter λ → ∞.

PROPERTIES OF NORMAL DISTRIBUTION

The normal distribution possesses many properties; some of them are
listed below:

1. The graph of p(x) is the famous bell-shaped curve. The top of the
bell is directly above the mean (µ).

[Figure: bell-shaped normal curve p(x) with its peak at x = µ.]

2. The curve is symmetrical about the line X = µ (Z = 0), i.e., it has the
same shape on either side of the line x = µ (or z = 0).

3. Since the distribution is symmetrical, mean = median = mode = µ.

4. The total area under the normal probability curve is 1.

5. The normal distribution is unimodal, the mode occurring at x = µ.

AREAS UNDER STANDARD NORMAL PROBABILITY CURVE

[Figure: standard normal curve with the area P(0 < Z < z1) between z = 0
and z = z1 shaded.]

For z1 > 0,

P(0 < Z < z1) = area under the standard normal curve between Z = 0 and Z = z1

= ∫ (from 0 to z1) φ(z) dz, where φ(z) = [1 / √(2π)] e^(–z²/2)

The value of this definite integral can be obtained to any degree of
accuracy by numerical approximation procedures, and it has been
tabulated for different values of z1 at intervals of 0.01. These areas
(probabilities) are given in the Table provided.

In terms of probabilities, the areas under the standard normal curve are
given by:

P (z ≥ a) = P (z > a) = Area to the right of the (vertical) line at z = a

P (z ≤ a) = P (z < a) = Area to the left of the line at z = a

P (a ≤ z ≤ b) = P(a < z < b) = Area between the line at z = a and z = b

The values of z to the left of z = 0 are negative and those to the right of
z = 0 are positive.
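When the printed table is not at hand, the same areas can be computed from the error function in Python's standard library. This is a sketch, not the method used in the notes: it relies on the standard identity P(Z ≤ z) = 0.5 [1 + erf(z/√2)], and the function names phi_cdf and area_0_to are introduced here.

```python
from math import erf, sqrt

def phi_cdf(z):
    """P(Z <= z) for the standard normal variate, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_0_to(z1):
    """P(0 < Z < z1), the quantity tabulated in a standard normal table."""
    return phi_cdf(z1) - 0.5

print(round(area_0_to(1.0), 4))    # area between z = 0 and z = 1
print(round(1 - phi_cdf(1.2), 4))  # area to the right of z = 1.2
print(round(phi_cdf(-0.2), 4))     # area to the left of z = -0.2
```

Because the curve is symmetrical about z = 0, area_0_to(-z1) is simply the negative of area_0_to(z1), which mirrors the "by symmetry" steps used with the table.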

COMPUTATION OF AREA TO THE RIGHT OF THE ORDINATE AT x=a

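[Figure: normal curve with the area P(X > a) = P(Z > z1) to the right of the
ordinate at X = a (Z = z1) shaded; the mean is at X = µ (Z = 0).]

Case (i), a > µ, i.e., a is to the right of the mean ordinate (see fig. above)

When X = a, Z = (a – µ)/σ = z1 (say), where z1 > 0

P(X > a) = P(Z > z1) = 0.5 – P(0 < Z < z1)

and the probability P(0 < Z < z1) can be read from the table provided.

Case (ii), a < µ, i.e., a is to the left of the mean ordinate

[Figure: normal curve with the ordinate at X = a (Z = –z1) to the left of the
mean at X = µ (Z = 0).]

Since a < µ, the value of Z corresponding to X = a will be negative.
When X = a, Z = (a – µ)/σ = –z1 (say), where z1 > 0

P(X > a) = P(Z > –z1)

= 0.5 + P(–z1 < Z < 0)

= 0.5 + P(0 < Z < z1) (By Symmetry)

and P(0 < Z < z1) can be read from the normal table provided.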

COMPUTATION OF THE AREA TO THE LEFT OF THE ORDINATE AT x = b

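[Figure: two normal curves showing the area to the left of the ordinate at
X = b; in the first, b lies to the right of the mean X = µ (Z = z1 > 0), and
in the second, b lies to the left of the mean (Z = –z1).]

Case (i), b > µ, i.e., b is to the right of the ordinate at X = µ (fig. above)

When X = b, Z = (b – µ)/σ = z1 (say), where z1 > 0

P(X < b) = P(Z < z1)

= 0.5 + P(0 < Z < z1)

Case (ii), b < µ, i.e., b is to the left of the ordinate at X = µ (fig. above)

When X = b, Z = (b – µ)/σ = –z1 (say), where z1 > 0

P(X < b) = P(Z < –z1) = P(Z > z1) (By symmetry)

= 0.5 – P(0 < Z < z1)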

Example. Suppose the waist measurements W of 800 girls are normally
distributed with mean 66 cm and standard deviation 5 cm. Find the
number N of girls with waists

(i) Between 65 and 70 cm

(ii) Greater than or equal to 72 cm

Solution

We are given W ~ N(µ, σ²), where µ = 66 cm and σ = 5 cm, so Z = (W – 66)/5.

For W = 65, Z = –0.2; for W = 70, Z = 0.8; for W = 72, Z = 1.2.

(i) P(65 ≤ W ≤ 70) = P(–0.2 ≤ Z ≤ 0.8)

= P(–0.2 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.8)

= P(0 ≤ Z ≤ 0.2) + P(0 ≤ Z ≤ 0.8) (By Symmetry)

= 0.0793 + 0.2881 = 0.3674 (From Normal table)

Hence, in a group of 800 girls, the expected number of girls with waists
between 65 cm and 70 cm is: 800 x 0.3674 = 293.92 ≈ 294

(ii) The probability that a girl has a waist greater than or equal to 72 cm
is given by

P(W ≥ 72) = P(Z ≥ 1.2) = 0.5 – P(0 ≤ Z ≤ 1.2)

= 0.5 – 0.3849 = 0.1151

Hence, in a group of 800 girls, the expected number of girls with waists
greater than or equal to 72 cm is: 800 x 0.1151 = 92.08 ≈ 92
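The waist-measurement example can be checked without a table by computing the normal probabilities from the error function. This is a sketch under that substitution (the notes themselves read the areas from a normal table); normal_cdf and the variable names are introduced here.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variate with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma, girls = 66, 5, 800

p_between = normal_cdf(70, mu, sigma) - normal_cdf(65, mu, sigma)
p_at_least_72 = 1 - normal_cdf(72, mu, sigma)

print(round(p_between, 4), round(girls * p_between))
print(round(p_at_least_72, 4), round(girls * p_at_least_72))
```

Small differences in the fourth decimal place between computed and tabulated values are to be expected, since the table entries are themselves rounded.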

Example. Assume the mean height of soldiers to be 68.22 inches with a

99
variance of 10.8 inches. How many soldiers in a regiment of 1000 would
you expect to be

(i) over six feet tall, and (ii) below 5.5 feet? Assume heights to be normally
distributed

Solution

Let the variable X denote the height (in inches) of the soldiers. Then we are
given:

µ = 68.22 and σ² = 10.8, so that σ = √10.8 ≈ 3.286

(i) A soldier will be over 6 feet tall if X is greater than 72 (because X is
height in inches).

When X = 72, Z = (72 – 68.22)/3.286 = 1.15

The probability that a soldier is over 6 feet (= 72 inches) tall is given by:

P (X > 72) = P (Z > 1.15) = 0.5 – P (0 ≤ Z ≤ 1.15) = 0.5 – 0.3749 = 0.1251

Hence, in a regiment of 1000 soldiers, the expected number of soldiers over 6
feet tall is: 1000 x 0.1251 = 125.1 ≈ 125

(ii) The probability that a soldier is below 5.5’ = 66” is given by:

P (X < 66) = P (Z < (66 – 68.22)/3.286)

= P (Z < -0.6756) = P (Z > 0.6756) (By Symmetry)

= 0.5 – P (0 < Z < 0.6756)

= 0.5 – 0.2501 = 0.2499 (approx.)

Hence, the expected number of soldiers below 5.5 feet in a regiment of 1000
soldiers is: 1000 x 0.2499 = 249.9 ≈ 250
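The soldier example can be checked in the same way. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist. Because it does not round Z to two decimal places, the probabilities differ slightly from the table values, but the expected counts are the same.

```python
from statistics import NormalDist

variance = 10.8                                          # given in the example
heights = NormalDist(mu=68.22, sigma=variance ** 0.5)    # sigma is the square root of the variance
regiment = 1000

p_over_6ft = 1.0 - heights.cdf(72)     # 6 feet = 72 inches
p_below_5ft6 = heights.cdf(66)         # 5.5 feet = 66 inches

# Expected numbers of soldiers, rounded to the nearest whole soldier
print(round(regiment * p_over_6ft), round(regiment * p_below_5ft6))
```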

RELATIONSHIP BETWEEN VARIABLES

CORRELATION ANALYSIS: Correlation analysis is the statistical tool that is
used to describe the degree to which one variable is linearly related to
another. It is, therefore, directed towards measuring the degree of
association between the variables. All measures of correlation are defined in
such a fashion that a measure of zero signifies no correlation at all and a
measure of one (in absolute value) signifies perfect correlation; the sign is
positive for a direct relationship and negative for an inverse relationship.
Correlation analysis is a descriptive statistic which only indicates how
closely associated the variables are. The variables may be designated as X
and Y, or P and Q, etc., without any element of dependency or independency.
The correlation coefficient may be interpreted as follows:

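Value of r             Interpretation
-1                     perfect negative correlation
between -1 and -0.5    strong negative correlation
-0.5                   moderate negative correlation
between -0.5 and 0     weak negative correlation
0                      no correlation
between 0 and 0.5      weak positive correlation
0.5                    moderate positive correlation
between 0.5 and +1     strong positive correlation
+1                     perfect positive correlation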

METHODS OF STUDYING CORRELATION

We shall confine our discussion to the methods of ascertaining only linear
relationship between two variables (series). The commonly used methods
for studying the correlation between two variables are:

i. Scatter diagram method.

ii. Karl Pearson’s coefficient of correlation (covariance method).

iii. Spearman’s Rank coefficient of correlation.

1) SCATTER DIAGRAM METHOD

Scatter diagram is one of the simplest ways of diagrammatic
representation of a bivariate distribution and provides us one of the
simplest tools of ascertaining the correlation between two variables.
The following diagrams of the scattered data depict different forms of
correlation.

Example. Following are the heights and weights of 10 students.

Height (in inches) (X)   62  72  68  58  65  70  66  63  60  72

Weight (in kgs) (Y)      50  65  63  50  54  60  61  55  54  65

Draw a scatter diagram and indicate whether the correlation is
positive or negative.
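The heights and weights of the 10 students can be plotted by hand on graph paper. As a rough aid, the sketch below prints a character-grid scatter diagram and reports the sign of the sum of cross-products of deviations, which has the same sign as the correlation. The function names text_scatter and covariance_sign are illustrative only and are not part of these notes.

```python
heights = [62, 72, 68, 58, 65, 70, 66, 63, 60, 72]   # X, in inches
weights = [50, 65, 63, 50, 54, 60, 61, 55, 54, 65]   # Y, in kgs

def text_scatter(xs, ys):
    """Return a character-grid scatter diagram: one column per unit of x, one row per unit of y."""
    points = set(zip(xs, ys))
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):          # top row holds the largest y
        cells = "".join("*" if (x, y) in points else "."
                        for x in range(min(xs), max(xs) + 1))
        rows.append(f"{y:>3} | {cells}")
    return "\n".join(rows)

def covariance_sign(xs, ys):
    """Sign of the sum of (x - mean_x)(y - mean_y), i.e. the direction of the correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return "positive" if s > 0 else "negative" if s < 0 else "none"

print(text_scatter(heights, weights))
print(covariance_sign(heights, weights))
```

The points rise from the lower left to the upper right of the grid, which is the pattern of a positive correlation.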

2) KARL PEARSON’S COEFFICIENT OF CORRELATION (rxy)

A mathematical method for measuring the intensity or the
magnitude of linear relationship between two variable series was
suggested by Karl Pearson (1857-1936) and is by far the most widely
used method in practice. We give below without proof the formula
for rxy:

rxy = Cov(X, Y) / (σx σy) = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² Σ(Y – Ȳ)²]

Or

rxy = [nΣXY – (ΣX)(ΣY)] / √{[nΣX² – (ΣX)²] [nΣY² – (ΣY)²]}

Karl Pearson’s correlation coefficient is also known as the product
moment correlation coefficient.

Example. Calculate Karl Pearson’s coefficient of correlation between
expenditure on advertising and sales from the data given below.

Advertising expenses (X)   39  65  62  90  82  75  25  98  36  78

Sales (Y)                  47  53  58  86  62  68  60  91  51  84
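The arithmetic for this example can be organised as in the sketch below, which uses the deviations-from-the-mean form of Karl Pearson's coefficient, r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² Σ(Y – Ȳ)²]. The function name pearson_r is illustrative and is not part of these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

advertising = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]   # X
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]         # Y

r = pearson_r(advertising, sales)
print(round(r, 4))   # roughly 0.78
```

Here X̄ = 65 and Ȳ = 66, Σ(X – X̄)(Y – Ȳ) = 2704, Σ(X – X̄)² = 5398 and Σ(Y – Ȳ)² = 2224, so r is about 0.78, a fairly strong positive correlation between advertising expenses and sales.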

RANK CORRELATION METHOD

Sometimes we come across statistical series in which the variables under
consideration are not capable of quantitative measurement but can be
arranged in serial order. This happens when we are dealing with qualitative
characteristics (attributes) such as honesty, beauty, character, morality,
etc., which cannot be measured quantitatively but can be arranged serially.
In such situations Karl Pearson’s coefficient of correlation cannot be used
as such.

Charles Edward Spearman developed a formula in 1904 which consists in
obtaining the correlation coefficient between the ranks of n individuals in
the two attributes under study. We give here without proof Spearman’s
rank correlation coefficient formula:

ρ = 1 – 6Σdi² / [n(n² – 1)]

where di is the difference between the pair of ranks of the same individual in
the two characteristics and n is the number of pairs.

Example. A psychologist wanted to compare two methods, A and B, of
teaching. He selected a random sample of 22 students and grouped them
into 11 pairs so that the students in a pair had approximately equal scores
on an intelligence test. In each pair one student was taught by method A and
the other by method B, and both were examined after the course. The marks
obtained by them are tabulated below.

Pair:   1   2   3   4   5   6   7   8   9  10  11

A:     24  29  19  14  30  19  27  30  20  28  11

B:     37  35  16  26  23  27  19  20  16  11  21

Find the rank correlation coefficient.
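A sketch of the computation follows, using the formula ρ = 1 – 6Σd² / [n(n² – 1)]. The data contain tied marks (19 and 30 under method A, 16 under method B), so the sketch assumes each tied value receives the average of the ranks it would occupy. That convention, and the helper names average_ranks and spearman_rho, are assumptions made for illustration. Textbooks that add a correction term for ties obtain a slightly different value.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1           # first position occupied by v
        count = ordered.count(v)               # number of items tied at v
        rank_of[v] = first + (count - 1) / 2   # mean of positions first .. first+count-1
    return [rank_of[v] for v in values]

def spearman_rho(xs, ys):
    """Rank correlation from 1 - 6*sum(d^2) / (n(n^2 - 1)), without a tie correction."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

method_a = [24, 29, 19, 14, 30, 19, 27, 30, 20, 28, 11]
method_b = [37, 35, 16, 26, 23, 27, 19, 20, 16, 11, 21]

rho = spearman_rho(method_a, method_b)
print(round(rho, 4))   # close to zero
```

With these ranks Σd² = 225 and n = 11, so ρ = 1 – 1350/1320, which is slightly negative and close to zero: there is practically no association between the marks obtained under the two methods.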

LINEAR REGRESSION ANALYSIS

In the words of M.M. Blair, “Regression analysis is a mathematical measure
of the average relationship between two or more variables in terms of the
original units of the data.” Regression analysis is the process by which we
obtain a functional relationship between the variables under consideration.
It is a statistical tool which helps to predict one variable from another
variable or variables on the basis of the assumed nature of the relationship
between the variables. In regression analysis there are two types of
variables: the variable whose value is influenced or is to be predicted is
called the dependent variable, and the variable which influences the values
or is used for prediction is called the independent variable. The independent
variable is also known as the regressor, predictor or explanatory variable,
while the dependent variable is also known as the regressed or explained
variable.

SIMPLE LINEAR REGRESSION MODEL

This has two variables X and Y (say). It is the most commonly used
regression line; it is a straight line whose equation is
Yi = B0 + B1Xi + ei, i = 1, 2, …, n, where Yi and Xi are respectively the
dependent and independent variables, B0 and B1 are the parameters
(constants), and ei is the error term, which is assumed to be an
independently and normally distributed random variable with mean 0 and
constant variance σ². This equation, with appropriate values of B0 and B1,
may be used to forecast the value of the dependent variable Y given a value
of X.

LINES OF REGRESSION

Line of regression is the line which gives the best estimate of one variable
for any given value of the other variable. In the case of two variables X and
Y (say), we shall have two lines of regression: one of Y on X and the other
of X on Y.

Line of regression of Y on X is the line which gives the best estimate for the
value of Y for any specified value of X. The equation is given below:

Y = a + bX. The normal equations are given below without proof:

ΣY = na + bΣX and ΣXY = aΣX + bΣX²

where n is the number of pairs of observations.

Similarly, line of regression of X on Y is the line which gives the best
estimate for the value of X for any specified value of Y.

The equation is: X = A + BY. The two normal equations are given as

ΣX = nA + BΣY and ΣXY = AΣY + BΣY², where n is the number of pairs of
observations.
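To illustrate how the line of regression of Y on X is obtained in practice, the sketch below solves the usual least-squares normal equations, ΣY = na + bΣX and ΣXY = aΣX + bΣX², for a and b in Y = a + bX. It reuses the heights and weights of the 10 students from the scatter diagram example purely as sample data. The function name fit_y_on_x is hypothetical and is not part of these notes.

```python
def fit_y_on_x(xs, ys):
    """Solve the two normal equations for a and b in Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a between the two equations gives b;
    # a then follows from the first normal equation.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

heights = [62, 72, 68, 58, 65, 70, 66, 63, 60, 72]   # X, in inches
weights = [50, 65, 63, 50, 54, 60, 61, 55, 54, 65]   # Y, in kgs

a, b = fit_y_on_x(heights, weights)
print(round(a, 3), round(b, 4))
```

For these data ΣX = 656, ΣY = 577, ΣX² = 43250 and ΣXY = 38085, giving b of about 1.08 and a of about -13.17. Swapping the roles of the two lists gives the line of regression of X on Y.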

COEFFICIENT OF REGRESSION

Let us consider the line of regression of Y on X, Y = a + bX. The coefficient
‘b’, which is the slope of the line of regression of Y on X, is called the
coefficient of regression of Y on X. It represents the increment in the value
of the dependent variable Y for a unit change in the value of the independent
variable X. In other words, it represents the rate of change of Y with
respect to X. This is denoted by byx, where

byx = Cov(X, Y) / σx² = r σy / σx

Similarly, in the regression equation of X on Y, X = A + BY, the coefficient
B represents the change in the value of the dependent variable X for a unit
change in the value of the independent variable Y and is called the
coefficient of regression of X on Y. This is also denoted by bxy, where

bxy = Cov(X, Y) / σy² = r σx / σy

Note: r² = byx · bxy

Both byx and bxy must have the same sign. If they had opposite signs, then r²
would be negative, which implies that r (the correlation coefficient) would
be imaginary, contradicting the fact that r is a real quantity lying between
-1 and +1. Hence byx and bxy must have the same sign. The sign of the
correlation coefficient is the same as that of the regression coefficients:
if the regression coefficients are positive, r is positive, and if they are
negative, r is also negative.
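The relationship r² = byx · bxy between the correlation coefficient and the two regression coefficients can be verified numerically. The sketch below does so for the advertising and sales data used in the Karl Pearson example. The variable names are illustrative only.

```python
advertising = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]   # X
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]         # Y

n = len(advertising)
mean_x, mean_y = sum(advertising) / n, sum(sales) / n

# Sums of squares and cross-products of deviations about the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
sxx = sum((x - mean_x) ** 2 for x in advertising)
syy = sum((y - mean_y) ** 2 for y in sales)

b_yx = sxy / sxx                     # regression coefficient of Y on X
b_xy = sxy / syy                     # regression coefficient of X on Y
r_squared = sxy ** 2 / (sxx * syy)   # square of the correlation coefficient

print(round(b_yx, 4), round(b_xy, 4), round(b_yx * b_xy, 4), round(r_squared, 4))
```

Both regression coefficients share the sign of Σ(X – X̄)(Y – Ȳ), which is why they can never have opposite signs and why r takes that same sign.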

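For numerical computations of the equations of the lines of regression of Y
on X and X on Y, the following formulae are also very convenient to use.

Equation for line of regression of Y on X is

Y – Ȳ = byx (X – X̄)

in which

byx = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)², X̄ = ΣX/n, Ȳ = ΣY/n

Equation for line of regression of X on Y is

X – X̄ = bxy (Y – Ȳ)

in which

bxy = Σ(X – X̄)(Y – Ȳ) / Σ(Y – Ȳ)²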

Example. From the following data, obtain the two regression equations.

Sales:      91  97  108  121  67  124  51  73  111  57

Purchase:   71  75   69   97  70   91  39  61   80  47
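A sketch of the computation for this example is given below. It uses the deviation form of the regression line, with slope Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)² and a line passing through the point of means. It assumes the ninth sales figure reads 111, and the function name regression_line is illustrative rather than part of these notes.

```python
def regression_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope   # the line passes through the means

sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]     # ninth value assumed to be 111
purchases = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]

a_ps, b_ps = regression_line(sales, purchases)   # purchases on sales
a_sp, b_sp = regression_line(purchases, sales)   # sales on purchases

print(f"Purchase = {a_ps:.2f} + {b_ps:.4f} x Sales")
print(f"Sales = {a_sp:.2f} + {b_sp:.4f} x Purchase")
```

With mean sales 90 and mean purchases 70, the sums are Σ(X – X̄)(Y – Ȳ) = 3900, Σ(X – X̄)² = 6360 and Σ(Y – Ȳ)² = 2868, so the two regression coefficients are about 0.6132 and 1.3598.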

Example. A student computed the regression equation of nine
observations as 8-x, the other results are . The student later discovered an
error in one pair of observations, i.e., (8, 5), and decided to remove it.
Find the regression equation of the remaining observations.
