Sta 101 Note PDF
STA101
Introductory Statistics
(2 UNITS)
Department of
Statistics
University of Abuja
These lecture notes are not for sale.
Course Aim and Objectives
COURSE Contents:
Texts
books in Statistics.
Lecture notes:
Definitions of Statistics
i. The field of application of statistics has been widening steadily, and
different people have therefore defined it differently as the subject
developed. In the old days statistics was regarded as the “science of
statecraft”, but today it embraces almost every sphere of natural and
human activity. Accordingly, the old definitions, which were confined
to a very limited and narrow field of enquiry, were replaced by
new definitions which are more exhaustive and elaborate in
approach.
ii. The word statistics is used to convey different meanings in the
singular and the plural sense. When used in the plural, statistics means
a numerical set of data; when used in the singular, it
means the science of statistical methods, embodying the theory and
techniques used for collecting, analyzing and drawing inferences from
numerical data.
Classification of Statistics
teaching methods should be presented. In addition, summary measures
such as the average scores of members of each of the groups should be
presented. This part of statistics, concerned with the description and
summarization of data, is called descriptive statistics. After the preceding
experiment is completed and the data are described and summarized, we
hope to be able to draw a conclusion about which teaching method is
superior. This part of statistics, concerned with the drawing of conclusions,
is called inferential statistics.
Importance of Statistics
To a very striking degree our culture has become a statistical culture. Even
a person who may never have heard of an index number is affected by
those index numbers, which describe the cost of living. It is impossible to
understand psychology, sociology, economics, finance or natural and
physical sciences without some general idea of the meaning of an
average, of variation, of concomitance, of sampling, of how to interpret
charts and tables. Statistics is important in all areas of human endeavour.
For example, we have statistics in planning, in the state, in
mathematics, in physics, in chemistry, in biology, in
economics, in industry, in insurance, in astronomy, in psychology, in
education, in war and in medical science, among others; the list of areas where
statistics is important is endless.
Collection of Data
Statistics are sets of numerical data; in fact, only numerical data constitute
statistics. This means that the phenomenon under study must be capable
of quantitative measurement. Thus, the raw material of statistics always
originates from the operation of counting (enumeration) or measurement.
For any statistical enquiry, whether in business, economics or the
sciences, the basic problem is to collect facts and figures relating to the
particular phenomenon under study. The items on which the
measurements are taken are called statistical units. On the face of it, it
might appear that the collection of data is the first step in any statistical
investigation. But in a scientifically prepared (efficient and well-planned)
statistical enquiry, the collection of data is by no means the first step.
Before we embark upon the collection of data for any given statistical
enquiry, it is imperative to examine carefully the following points, which
may be termed the preliminaries to data collection: the objectives and scope of
the enquiry, the statistical units to be used, the sources of information (data),
the method of data collection, the degree of accuracy aimed at in the final result,
and the type of enquiry. Good data, however, should possess the following
characteristics: they should be unambiguous, specific,
uniform, stable and appropriate.
and are secondary for all sources who later use such data. The methods
commonly used for the collection of primary data are as follows: direct
personal investigation; indirect oral interviews; information received
through local agencies; the mailed questionnaire method; and schedules sent
through enumerators. The chief sources of secondary data may be broadly
classified into the following two groups: published sources and
unpublished sources.
· Sex
· Age
· Religion
· Different faculties
The basis or the criteria with respect to (w.r.t) which the data are classified
primarily depends on the objectives and the purpose of the enquiry.
Generally, the data can be classified on the following four bases;
Frequency Distribution
ii. Discrete or Ungrouped distribution
1. Married 28
2. Divorced 11
3. Separated 17
4. Single 44
Relative frequency
This is very useful when we need to compare two data sets of different
sizes.
result in a lengthy frequency table. To summarize such data sets and make
them more comprehensible in a frequency table, we often collapse the
data into fewer classes by grouping them. A number of rules of thumb
have been proposed for calculating the proper number of classes.
However, an elegant, though approximate, formula is the one given
by Prof. Sturges, known as Sturges's rule, according to which
$K = 1 + 3.322 \log_{10} N$,
where K is the number of classes and N is the total number of observations.
One also has to decide on the width of the class intervals (or size of class
intervals). In general, one should aim at classes of equal width. The width
of each class interval, w, is obtained by $w = R/K$, where R is the range and K is the
number of class intervals. Another rule of the thumb for determining the
rule of the class interval should not be greater than th of the estimated
population standard deviation.
10, 17, 15, 11, 16, 19, 24, 29, 18, 25, 26, 32, 14, 22, 17, 20, 23, 27, 30, 12, 15,
18, 24, 36, 18, 15, 21, 28, 33, 38, 34, 13, 10, 16, 20, 22, 29, 19, 23, 31.
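Sturges's rule and the class-width formula w = R/K can be checked on the 40 observations listed above. The following is a minimal sketch; the function names are illustrative only, and rounding K to the nearest whole number is an assumption rather than part of the rule.

```python
import math

def sturges_classes(n):
    """Approximate number of classes: K = 1 + 3.322 * log10(N)."""
    return 1 + 3.322 * math.log10(n)

def class_width(data, k):
    """Class width w = R / K, where R is the range of the data."""
    return (max(data) - min(data)) / k

# The 40 observations quoted in the notes.
data = [10, 17, 15, 11, 16, 19, 24, 29, 18, 25, 26, 32, 14, 22, 17, 20,
        23, 27, 30, 12, 15, 18, 24, 36, 18, 15, 21, 28, 33, 38, 34, 13,
        10, 16, 20, 22, 29, 19, 23, 31]

k = round(sturges_classes(len(data)))  # rounding is an assumption
w = class_width(data, k)
print(k, round(w, 2))
```

In practice the width is usually rounded up to a convenient figure so that the classes cover the whole range.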
83, 72, 81, 64, 71, 63, 61, 60, 67, 74, 66, 64, 79, 73, 75, 76, 69, 68, 78, 67.
Calculate the monthly scholarship paid to the students.
Marks     Monthly scholarship
60-65     2,500
65-70     3,000
70-75     3,500
75-80     4,000
80-85     4,500
Examples:
‘Less than’ cumulative frequency
Marks     f     Less than c.f.
30-35     5     5
35-40     10    5+10=15
40-45     15    15+15=30
45-50     30    30+30=60
50-55     5     60+5=65
55-60     5     65+5=70

Marks            Less than c.f.
Less than 30     0
Less than 35     5
Less than 40     15
Less than 45     30
Less than 50     60
Less than 55     65
Less than 60     70

‘More than’ cumulative frequency
Marks     f     More than c.f.
30-35     5     65+5=70
35-40     10    55+10=65
40-45     15    40+15=55
45-50     30    10+30=40
50-55     5     5+5=10
55-60     5     5

Marks            More than c.f.
More than 30     70
More than 35     65
More than 40     55
More than 45     40
More than 50     10
More than 55     5
More than 60     0
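The running totals in a cumulative frequency table can be reproduced mechanically. The sketch below uses the class frequencies 5, 10, 15, 30, 5, 5 for the marks classes 30-35 to 55-60; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies of the classes 30-35, 35-40, ..., 55-60.
freqs = [5, 10, 15, 30, 5, 5]

# 'Less than' c.f.: running total from the first class downwards.
less_than = list(accumulate(freqs))

# 'More than' c.f.: running total from the last class upwards.
more_than = list(accumulate(reversed(freqs)))[::-1]

print(less_than)
print(more_than)
```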
Tabulation
The various parts of a table include: the table number, title, head notes or
prefatory notes, captions and stubs, body of the table, foot-note, and
source note.
facilitate comparisons.
Number of accidents:
0 1 2 3 4 5 6 7 8 9 10 11
Number of drivers:
82 44 68 41 25 20 13 7 5 4 3 2
2. Bar charts: Bar diagrams are among the easiest and most
commonly used devices for presenting business and
economic data. They are especially satisfactory for categorical data
or series. The height (or length) of each bar indicates the size of the
figure represented; that is, the length of a single bar is proportional to
the magnitude of the part of the data it represents. The widths of the bars are the
same. The following are the various types of bar diagram in common
use:
Example: use a simple bar chart to illustrate the number of
workers employed in the factories tabulated below.
Factory A B C D
Draw a pie diagram to represent the data.
Pacific      70.8
Atlantic     41.2
Indian       28.5
Antarctic    7.6
Arctic       4.8
The difference between diagrams and graphs is this: diagrams are
useful for the visual presentation of categorical and geographical data, while
data relating to time series and frequency distributions are best
represented through graphs. Diagrams are primarily used for comparative
studies and cannot be used to study the relationship (not necessarily
functional) between the variables under study. This is done through graphs.
The most commonly used graphs for charting a frequency distribution for
the general understanding of the details of the data are:
i. Histogram
Histogram: This is one of the most popular and commonly used devices for
charting a continuous frequency distribution. It consists in erecting a series
of adjacent vertical rectangles on sections of the horizontal axis
(x-axis), with the bases (sections) equal to the widths of the corresponding
class intervals and the heights so taken that the areas of the rectangles
are equal to the frequencies of the corresponding classes. The values of the variable are
taken along the x-axis and the frequencies along the y-axis. This, however,
involves two cases: case (i) histogram with equal classes, case (ii)
histogram with unequal classes.
Less than 10 4
Less than 20 6
Less than 30 24
Less than 40 46
Less than 50 67
Less than 60 86
Less than 70 96
Less than 80 99
Example: represent the following data by means of a histogram
No. of workers      7   19  27  15  12  12  8

No. of accidents    0   1   2   3   4   5   6   7   8   9   10  11
No. of drivers      80  44  68  41  25  20  13  7   5   4   3   2
Example. The following table gives the frequency distribution of the
weekly wages (in ’00 N) of 100 workers in a factory.
Weekly wages (’00 N)   20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64  Total
No. of workers         4      5      12     23     31     10     8      5      2      100
No. of students     7   13  24  30  22  15  6
i. Curves of symmetrical distribution. In a symmetrical distribution, the
class frequencies first rise steadily, reach a maximum and then
diminish in the same identical manner. The most commonly and
widely used symmetrical curve in statistics is the normal frequency
curve.
Normal Probability Curve
X = mean
iii. Extremely Asymmetrical or J-Shaped Curve.
The distributions in which the value of the variable corresponding to the
maximum frequency is at one end of the range give rise to highly skewed
curves. When plotted, they give a J-shaped or inverted J-shaped curve, and
accordingly such curves are called J-shaped curves.
iv. U-Curve.
The frequency distributions in which the maximum frequency occurs at the
extremes (i.e., both ends) of the range and the frequency keeps on falling
symmetrically (about the middle), the minimum frequency being attained
at the centre, give rise to a U-shaped curve.
[Figure: U-shaped curve]
v. Mixed Curves
Sometimes, though very rarely, we come across certain distributions in
which maximum frequency is attained at two or more points in an irregular
manner. Such curves are obtained in a distribution where as the value of
the variable increase, the frequencies increase and decrease, then again
increase and decrease twice or thrice as shown in the diagram or even
more than that.
[Figure: bi-modal and tri-modal curves, with frequency plotted against the variable]
ii. ‘More Than’ Ogive
‘Less Than’ Ogive: this consists in plotting the ‘less than’ cumulative
frequencies against the upper class boundaries of the respective classes.
The points so obtained are joined by a smooth freehand curve to give the
‘Less Than’ Ogive. Obviously, the ‘Less Than’ Ogive is an increasing curve,
sloping upwards from left to right, and has the shape of an elongated S.
‘More Than’ Ogive: similarly, in the ‘More Than’ Ogive, the ‘more than’
cumulative frequencies are plotted against the lower class boundaries of
the respective classes. The points so obtained are joined by a smooth
freehand curve to give the ‘More Than’ Ogive. The ‘More Than’ Ogive is a
decreasing curve, sloping downwards from left to right, and has the
shape of an elongated S turned upside down.
Remarks: we may draw both the ‘Less Than’ Ogive and the ‘More Than’ Ogive
on the same graph. If we do so, they intersect at a point. The foot of the
perpendicular from their point of intersection on the x-axis gives the value
of the median.
Example
The table below gives the marks obtained by 70 candidates in the STA 101
examination.
0-10 2 2 70
30-40 11 11+11 =22 65-6 = 59
Plot both the ‘less than’ and ‘more than’ ogives for the data.
AVERAGES
Averages are the typical values around which other items of the
distribution congregate. They are the values which lie between the two
extreme observation, (i.e., the smallest and the largest observations), of
the distribution and give us an idea about the concentration of the values
in the central part of the distribution. Accordingly they are also sometimes
referred to as the measures of central tendency. Averages are very much
useful:
around cluster” _______ Lawrence J. Kaplan
(iv) Median
(v) Mode
In general, if X1, X2, ..., Xn are the given n observations, then their arithmetic
mean, usually denoted by $\bar{X}$, is given by:
$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Example: the following table gives the daily income of ten operators in a
machine tool factory. Find the mean.
Name of operator    A   B   C   D   E   F   G   H   I   J
Income              12  15  18  20  25  30  22  35  37  26
Solution
If income is represented by X, then
$\bar{X} = \frac{\sum X}{n} = \frac{12+15+18+20+25+30+22+35+37+26}{10} = \frac{240}{10} = 24$
No. of calls            0   1   2   3   4   5   6   7
One-minute intervals    14  21  25  43  51  40  39  12
Solution
Let the variable x denote the number of calls received per minute at the
exchange.
No. of calls (x)    0   1   2   3    4    5    6    7     Total
Frequency (f)       14  21  25  43   51   40   39   12    ∑f = 245
fx                  0   21  50  129  204  200  234  84    ∑fx = 922
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{922}{245} \approx 3.76$, i.e. about 3.8 calls per minute
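The mean of a discrete frequency distribution is the sum of the products fx divided by the total frequency. A minimal sketch using the telephone-call data above (the function name is illustrative):

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f*x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)

calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]

sum_fx = sum(f * x for x, f in zip(calls, freqs))
mean_calls = frequency_mean(calls, freqs)
print(sum(freqs), sum_fx, round(mean_calls, 2))
```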
To calculate the mean of such a data set, we multiply each class mark by
the corresponding class frequency, sum these products over all the
class intervals, and divide the result by the total frequency. Thus, the
mean is calculated as $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, i = 1, 2, ..., k, where $x_i$ represents the class mark
Solution:
1 0 – 49 24.5 5 122.5
2 50 – 99 74.5 16 1192.0
4 150 – 199 174.5 128 22336.0
∑fi = 720     ∑fixi = 167090.0
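Mathematically, $\sum f(x - \bar{x}) = 0$.
$\sum f(x - \bar{x}) = \sum fx - \bar{x}\sum f$   (since $\bar{x}$ is a constant)
$= \sum fx - N\bar{x}$   (since $\sum f = N$)
But $\bar{x} = \frac{\sum fx}{N}$, so $\sum fx = N\bar{x}$.
Therefore $\sum f(x - \bar{x}) = N\bar{x} - N\bar{x} = 0$.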
S = ∑ f (x – A)2
= fx
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$, also written $\bar{x}_{12}$
Illustration: The mean of marks in statistics of 100 students in a class was
72. The mean of marks of boys was 75, while their number was 70. Find
out the mean marks of girls in the class.
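Here n = 100, $\bar{x}$ = 72, n1 = 70, $\bar{x}_1$ = 75 and n2 = 100 − 70 = 30.
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$, so $72 = \frac{70 \times 75 + 30\,\bar{x}_2}{100}$
$\bar{x}_2 = \frac{7200 - 5250}{30} = \frac{1950}{30} = 65$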
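The illustration above rearranges the combined-mean relation to recover the mean of the second group. A short sketch of that rearrangement, with a hypothetical helper name:

```python
def missing_group_mean(n, mean, n1, mean1):
    """Solve n*mean = n1*mean1 + n2*mean2 for mean2, where n2 = n - n1."""
    n2 = n - n1
    return (n * mean - n1 * mean1) / n2

# 100 students with mean 72; 70 boys with mean 75.
girls_mean = missing_group_mean(n=100, mean=72, n1=70, mean1=75)
print(girls_mean)
```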
It may be pointed out that the formula $\bar{x} = \frac{\sum fx}{N}$ can be used conveniently if the
values of x and/or f are small. However, if the values of x and/or f are large,
the calculation of the mean by this formula is quite tedious and time consuming. In such
a case the calculation can be reduced to a great extent by using the step
deviation (or assumed mean) method, which consists in taking the
deviations (differences) of the given observations from an arbitrary value A.
$\bar{x} = A + \frac{h\sum fd}{N}$, where $d = \frac{X - A}{h}$ and h is the common class width
From $d = \frac{X - A}{h}$, we get hd = (X – A)
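Marks              0 – 10  10 – 20  20 – 30  30 – 40  40 – 50  50 – 60  60 – 70
No. of students    6       5        8        15       7        6        3
Solution
S/N   Marks      x    f    fx    d = (x − 35)/10   fd
1     0 – 10     5    6    30    -3                -18
2     10 – 20    15   5    75    -2                -10
3     20 – 30    25   8    200   -1                -8
4     30 – 40    35   15   525   0                 0
5     40 – 50    45   7    315   1                 7
6     50 – 60    55   6    330   2                 12
7     60 – 70    65   3    195   3                 9
Total: ∑f = 50, ∑fx = 1670, ∑fd = -8
(i) $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1670}{50} = 33.4$
(ii) A = 35, h = 10
$\bar{x} = A + \frac{h\sum fd}{N} = 35 + \frac{10 \times (-8)}{50} = 35 - 1.6 = 33.4$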
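Both routes to the mean in this example can be compared in a few lines. This is a sketch only; the function names are illustrative, and A = 35, h = 10 are the values used in the solution.

```python
def direct_mean(mids, freqs):
    """Direct method: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)

def step_deviation_mean(mids, freqs, a, h):
    """Step deviation method: A + h * sum(f*d) / N, with d = (x - A) / h."""
    n = sum(freqs)
    sum_fd = sum(f * (x - a) / h for x, f in zip(mids, freqs))
    return a + h * sum_fd / n

mids = [5, 15, 25, 35, 45, 55, 65]   # class marks of 0-10, ..., 60-70
freqs = [6, 5, 8, 15, 7, 6, 3]

mean_direct = direct_mean(mids, freqs)
mean_step = step_deviation_mean(mids, freqs, a=35, h=10)
print(mean_direct, mean_step)
```

The two results agree whatever value is chosen for A; a value near the middle of the table simply keeps the deviations small.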
Geometric Mean
G.M = = 8
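$\log \text{G.M.} = \frac{1}{n}\sum \log X$
Taking antilog of both sides
$\text{G.M.} = \text{Antilog}\left[\frac{1}{n}\sum \log X\right]$
For a frequency distribution, $\text{G.M.} = \text{Antilog}\left[\frac{1}{N}\sum f \log X\right]$, where $N = \sum f$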
Solution
X 2 4 8 12 16 24 Total
G.M. = Antilog
No. of students    5   7   15  25  8
Solution
Marks    x    f    log x     f log x
0-10     5    5    0.6990    3.4950
10-20    15   7    1.1761    8.2327
20-30    25   15   1.3979    20.9685
∑f = 60     ∑ f log x = 84.5243
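The logarithmic calculation of the geometric mean can be sketched as follows. The first three class marks (5, 15, 25) appear in the table above; the last two (35 and 45, for classes 30-40 and 40-50) are an assumption inferred from the five frequencies and the total ∑ f log x = 84.5243.

```python
import math

def geometric_mean(values, freqs):
    """G.M. = antilog( sum(f * log10(x)) / N ) for a frequency distribution."""
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)

mids = [5, 15, 25, 35, 45]    # 35 and 45 are assumed class marks
freqs = [5, 7, 15, 25, 8]

log_total = sum(f * math.log10(x) for x, f in zip(mids, freqs))
gm = geometric_mean(mids, freqs)
print(round(log_total, 4), round(gm, 2))
```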
HARMONIC MEAN
Another important mean is the harmonic mean which is used for averaging
the rates. If X1, X2, - - -, Xn is a given set of n observations, then their
harmonic mean (H.M) or simply H is given by:
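$H = \frac{1}{\frac{1}{n}\sum \frac{1}{X}} = \frac{n}{\sum (1/X)}$
For a frequency distribution, $H = \frac{N}{\sum (f/X)}$, where $N = \sum f$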
Weights (lbs)     130  135  140  145  146  148  149  150  157
No. of persons    3    4    6    6    3    5    2    1    1
Solution
X      f    f/X
130    3    0.0231
135    4    0.0296
140    6    0.0429
145    6    0.0414
146    3    0.0205
148    5    0.0338
149    2    0.0134
150    1    0.0067
157    1    0.0064
∑f = 31     ∑(f/X) = 0.2178
H.M. = $\frac{N}{\sum (f/X)} = \frac{31}{0.2178} \approx 142.33$ lbs
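The harmonic mean of the weights can be recomputed directly from the solution table. This is a small sketch; it uses the frequencies as they appear in the worked solution (∑f = 31).

```python
def harmonic_mean(values, freqs):
    """H.M. = N / sum(f / x) for a frequency distribution."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

weights = [130, 135, 140, 145, 146, 148, 149, 150, 157]
persons = [3, 4, 6, 6, 3, 5, 2, 1, 1]   # frequencies used in the solution table

hm = harmonic_mean(weights, persons)
print(round(hm, 2))
```

The result differs slightly in the second decimal place from a hand calculation because the table rounds each f/X to four decimals.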
Solution
Let the distance from the house to the college be xkms. In going from
house to college, the distance (x kms) is covered in hours, while in coming
from college to house, the distance is covered in hours. Thus a total
distance of 2x kms is covered in hours.
H = 12 km h-1
MEDIAN
“The median is that value of the variable which divides the group into two
equal parts, one part comprising all the values greater and the other, all the
values less than median”. Thus the median of a distribution may be defined as
that value of the variable which exceeds and is exceeded by the same
number of observations, i.e., it is the value such that the number of
observations above it is equal to the number of observations below it.
Thus the median is a positional average, i.e., its value depends on the
position occupied by a value in the frequency distribution.
Calculation of Median
Case (1): Ungrouped Data: if the number of observations is odd, then the
median is the middle value after the observations have been arranged in
ascending or descending order of magnitude. For example, the median of the
5 observations 35, 12, 40, 8, 60, i.e., 8, 12, 35, 40, 60, is 35.
If the number of observations is even, the median is taken as the arithmetic
mean of the two middle values.
For example, for 8, 12, 35, 40, 50, 60,
the median = $\frac{35 + 40}{2} = 37.5$
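The odd and even cases for ungrouped data can be written as one short function. This is a minimal sketch with an illustrative name:

```python
def ungrouped_median(data):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = ungrouped_median([35, 12, 40, 8, 60])
even_case = ungrouped_median([8, 12, 35, 40, 50, 60])
print(odd_case, even_case)
```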
Example: eight coins were tossed together and the number of heads (x)
resulting was noted. The operation was repeated 256 times and the
frequency distribution of the number of heads is given below:
No. of heads (x)    0   1   2   3   4   5   6   7   8
Frequency (f)       1   9   26  59  72  52  29  7   1
Solution
x    f     Less than c.f.
0    1     1
1    9     10
2    26    36
3    59    95
4    72    167
5    52    219
6    29    248
7    7     255
8    1     256
∑f = 256
Here N = ∑f = 256 and $\frac{N}{2} = \frac{256}{2} = 128$. The c.f. just greater than 128 is 167 and the value
of x corresponding to 167 is 4. Hence, the median number of heads is 4.
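For a discrete frequency distribution the median is located through the ‘less than’ cumulative frequencies. A sketch of the search used in the coin example (the function name is illustrative):

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Return the value whose 'less than' c.f. is the first to exceed N/2."""
    half = sum(freqs) / 2
    for value, cf in zip(values, accumulate(freqs)):
        if cf > half:
            return value

heads = list(range(9))                       # 0, 1, ..., 8 heads
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]

cum_freqs = list(accumulate(freqs))
median_heads = discrete_median(heads, freqs)
print(cum_freqs, median_heads)
```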
Median $= l + \frac{h}{f}\left(\frac{N}{2} - c\right)$, where
l is the lower limit of the median class, h is the magnitude or width of the
median class, $N = \sum f$ is the total frequency, f is the frequency of the median
class, and c is the cumulative frequency of the class preceding the median
class.
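No. of mangoes    14  20  42  54  45  18  7
Solution
Class boundaries    f     Less than c.f.
409.5 – 419.5       14    14
419.5 – 429.5       20    34
429.5 – 439.5       42    76
$\frac{N}{2} = \frac{200}{2} = 100$. The c.f. just greater than 100 is 130. Hence the corresponding
class 439.5 – 449.5 is the median class.
Median $= l + \frac{h}{f}\left(\frac{N}{2} - c\right)$
$= 439.5 + \frac{10}{54}(100 - 76)$
$= 439.5 + 4.44 = 443.94$ grams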
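The interpolation formula for the median can be sketched as below. The seven frequencies are those given in the example; the class boundaries are assumed to run in steps of 10 from 409.5, which matches the three classes shown and the stated median class 439.5 – 449.5.

```python
from itertools import accumulate

def grouped_median(lower_bounds, width, freqs):
    """Median = l + (h / f) * (N/2 - c) for a continuous distribution."""
    half = sum(freqs) / 2
    preceding_cf = 0
    for lower, f, cf in zip(lower_bounds, freqs, accumulate(freqs)):
        if cf > half:                      # first class whose c.f. exceeds N/2
            return lower + (width / f) * (half - preceding_cf)
        preceding_cf = cf

lower_bounds = [409.5 + 10 * i for i in range(7)]   # assumed boundaries
freqs = [14, 20, 42, 54, 45, 18, 7]

median_weight = grouped_median(lower_bounds, 10, freqs)
print(round(median_weight, 2))
```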
We can also get the median by plotting the ogive of the distribution; the
median is the value below which $\frac{N}{2}$ of the items lie.
MODE
Illustration: The wheat yield in a particular region over the past 12 years (in
millions of tons) are: 1.5, 1.3, 1.2, 1.0, 1.3, 1.4, 1.6, 1.7, 1.5, 1.3, 1.2 and 1.4.
The mode is 1.3 (million tons).
X 1 2 3 4 5 6 7 8 9
F 3 1 18 25 40 30 22 10 6
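[Figure: frequency curve with the mode marked on the x-axis]
Mode $= l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$, where
l is the lower limit of the modal class, h is its width, $f_1$ is the frequency of the modal
class, and $f_0$ and $f_2$ are the frequencies of the classes preceding and succeeding it.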
Assumptions:
Example: find the value of the mode from the data given below:
Weight (in kg)     93-97  98-102  103-107  108-112  113-117  118-122  123-127  128-132
No. of students    3      5       12       17       14       6        3        1
Solution
Class boundaries f
92.5 – 97.5 3
97.5 – 102.5 5
102.5 – 107.5 12
107.5 – 112.5 17
112.5 – 117.5 14
117.5 – 122.5 6
122.5 – 127.5 3
127.5 – 132.5 1
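The maximum frequency is 17, so the modal class is 107.5 – 112.5, with l = 107.5, h = 5,
$f_1$ = 17, $f_0$ = 12 and $f_2$ = 14.
Mode $= l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 107.5 + \frac{5(17 - 12)}{34 - 12 - 14} = 107.5 + 3.125 = 110.625$ kg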
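The modal-class calculation for the weights example can be sketched as follows. The formula l + h(f1 − f0)/(2f1 − f0 − f2) is the usual interpolation formula for the mode and is assumed here; the function name is illustrative.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for the modal class."""
    i = freqs.index(max(freqs))                       # modal class position
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0    # succeeding class
    return lower_bounds[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)

lower_bounds = [92.5 + 5 * i for i in range(8)]   # 92.5, 97.5, ..., 127.5
freqs = [3, 5, 12, 17, 14, 6, 3, 1]

mode_weight = grouped_mode(lower_bounds, 5, freqs)
print(mode_weight)
```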
[Figure: positions of the mode (MO), median (MD) and mean (M) on a frequency curve; the mean is the centre of gravity]
QUARTILES, DECILES AND PERCENTILES
We have defined the median as the value of items which is located at the
centre of the array, we can define other measures which are located at
other specified points.
Quartiles: The values which divide the given data into four equal parts are
known as quartiles. Obviously there will be three such points Q1, Q2 and Q3
such that Q1 ≤ Q2 ≤ Q3, termed as the three quartiles. Q1, known as the lower
or first quartile is the value which has 25% of the items of the distribution
below it and consequently 75% of the items are greater than it. Incidentally
Q2, the second quartile, coincide with the median and has an equal number
of observations above it and below it. Q3, known as the upper or third
quartile, has 75% of the observations below it and consequently 25% of the
observations above it. The working principle for computing the quartiles is
basically the same as that of computing the median.
To compute Q1, find $\frac{N}{4}$, where N = ∑f, and see the (less than) c.f. just greater
than $\frac{N}{4}$; the corresponding value of x gives the value of Q1. In the case of a
continuous frequency distribution, the corresponding class contains Q1,
and the value of Q1 is obtained by the interpolation formula:
$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - c\right)$
Similarly, to compute Q3, see the (less than) c.f. just greater than $\frac{3N}{4}$; for a
continuous distribution, $Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - c\right)$
Deciles
Deciles are the values which divide the series into ten equal parts.
Obviously there are Nine deciles, D1, D2, D3, - - -, D9 (Say), such that D1 ≤ D2 ≤
D3 ≤ - - - ≤ D9. Incidentally D5 coincides with the median. The method of
computing the deciles Di (i = 1, 2, 3, - - -, 9) is the same as discussed for Q1
and Q3. To compute the ith decile Di (i = 1, 2, 3, - - -, 9), see the c.f. just
greater than $\frac{iN}{10}$. The corresponding value of X is Di. In the case of a continuous
frequency distribution, the corresponding class contains Di and its value is
obtained by the interpolation formula:
$D_i = l + \frac{h}{f}\left(\frac{iN}{10} - c\right)$
Percentiles:
Percentiles are the values which divide the series into 100 equal parts.
Obviously, there are 99 percentiles, P1, P2, P3, - - -, P99 such that P1 ≤ P2 ≤ P3 ≤
- - -≤ P99. The ith percentile Pi (i = 1, 2, 3, - - -, 99) is the value of X
corresponding to the c.f. just greater than $\frac{iN}{100}$. In the case of a continuous frequency
distribution, the corresponding class contains Pi and its value is obtained
by the interpolation formula:
$P_i = l + \frac{h}{f}\left(\frac{iN}{100} - c\right)$
D3 = P30
The various partition values quartiles, deciles and percentiles can be easily
located graphically with the help of Ogive.
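Class      f     c.f.    Class boundaries
10 – 19    8     13      9.5 - 19.5
20 – 29    7     20      19.5 - 29.5
30 – 39    12    32      29.5 - 39.5
40 – 49    28    60      39.5 - 49.5
50 – 59    20    80      49.5 - 59.5
60 – 69    10    90      59.5 - 69.5
∑f = 100
Quartiles: for Q1, $\frac{N}{4} = \frac{100}{4} = 25$
The c.f. just greater than 25 is 32. Hence the corresponding class is 29.5 –
39.5
$Q_1 = 29.5 + \frac{10}{12}(25 - 20) = 33.67$
For Q3, $\frac{3N}{4} = 75$. The c.f. just greater than 75 is 80. Hence, the class 49.5 – 59.5 is the Q3
class
$Q_3 = 49.5 + \frac{10}{20}(75 - 60) = 57$
6th decile: $\frac{6N}{10} = 60$. The c.f. just greater than 60 is 80.
$D_6 = 49.5 + \frac{10}{20}(60 - 60) = 49.5$
70th percentile: $\frac{70N}{100} = 70$. The c.f. just greater than 70 is 80.
$P_{70} = 49.5 + \frac{10}{20}(70 - 60) = 54.5$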
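Quartiles, deciles and percentiles all use the same interpolation step, so one function can cover them. In the sketch below the first class (0 – 9, f = 5) and the last class (70 – 79, f = 10) are assumptions inferred from the cumulative frequencies 13 and 90 and the total of 100; they do not appear in the table as printed.

```python
from itertools import accumulate

def partition_value(lower_bounds, width, freqs, i, parts):
    """Value below which i/parts of the items lie: l + (h/f) * (i*N/parts - c)."""
    target = i * sum(freqs) / parts
    preceding_cf = 0
    for lower, f, cf in zip(lower_bounds, freqs, accumulate(freqs)):
        if cf > target:                    # first class whose c.f. exceeds target
            return lower + (width / f) * (target - preceding_cf)
        preceding_cf = cf

lower_bounds = [-0.5 + 10 * k for k in range(8)]   # -0.5, 9.5, ..., 69.5
freqs = [5, 8, 7, 12, 28, 20, 10, 10]               # first and last are assumed

q1 = partition_value(lower_bounds, 10, freqs, 1, 4)
q3 = partition_value(lower_bounds, 10, freqs, 3, 4)
d6 = partition_value(lower_bounds, 10, freqs, 6, 10)
p70 = partition_value(lower_bounds, 10, freqs, 70, 100)
print(round(q1, 2), q3, d6, p70)
```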
DISPERSION
Averages, or the measures of central tendency, give us an idea of the
concentration of the observations about the central part of the
distribution. In spite of their great utility in statistical analysis, they have
their own limitations. If we are given only the average of a series of
observations, we cannot form a complete idea about the distribution, since
there may exist a number of distributions whose averages are the same but
which differ widely from each other in a number of ways. Thus, the
measures of central tendency must be supported and supplemented by
some other measures. One such measure is dispersion. The literal meaning of
dispersion is “scatteredness”. We study dispersion to have an idea of the
homogeneity (compactness) or heterogeneity (scatter) of the distribution.
Dispersion is the measure of the variation of the items.
The first two measures, range and quartile deviation, are termed positional
measures since they depend upon the values of the variable at particular
positions of the distribution. The last measure, the Lorenz curve, is a graphical
method of studying variability.
Coefficient of Range (Relative measure of range) = $\frac{L - S}{L + S}$, where L and S are the largest and smallest observations
Month              1    2    3    4    5    6    7    8    9    10   11   12
Earning (N1000)    139  150  151  151  157  158  160  161  162  162  173  175
Solution
L = 175000, S = 139000
Range = L − S = 175000 − 139000 = 36000
Coefficient of range = $\frac{L - S}{L + S} = \frac{36000}{314000} \approx 0.115$
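The range and its coefficient for the monthly earnings can be computed in a few lines. This is a sketch with illustrative names; the earnings are kept in thousands of naira as in the table.

```python
def value_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)

def coefficient_of_range(data):
    """Relative measure: (L - S) / (L + S)."""
    return (max(data) - min(data)) / (max(data) + min(data))

earnings = [139, 150, 151, 151, 157, 158, 160, 161, 162, 162, 173, 175]

rng = value_range(earnings)
coeff = coefficient_of_range(earnings)
print(rng, round(coeff, 3))
```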
Age (in years)    16 – 20  21 – 25  26 – 30  31 – 35
No. of persons    10       15       17       8
Solution
Convert into continuous classes. The first class will then become 15.5
– 20.5 and the last class will become 30.5 – 35.5
L = 35.5, S = 15.5
Range = L − S = 35.5 − 15.5 = 20
Coefficient of range = $\frac{L - S}{L + S} = \frac{20}{51} \approx 0.392$
Inter-quartile range = Q3 – Q1
Coefficient of Q.D = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
Percentile Range
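(Pj – Pi)/2, (i < j)
The commonly used percentile range is the one which corresponds to the
10th and 90th percentiles. Thus, the 10 – 90 percentile range is $P_{90} - P_{10}$ and the
semi 10 – 90 percentile range is $\frac{P_{90} - P_{10}}{2}$.
The above measures are absolute measures only. The relative measure of
variability based on percentiles is given by:
Coefficient of 10 – 90 percentile range = $\frac{P_{90} - P_{10}}{P_{90} + P_{10}}$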
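If X1, X2, - - -, Xn are n given observations, then the mean deviation (M.D.)
about an average A, say, is given by:
M.D. $= \frac{1}{n}\sum |X - A| = \frac{1}{n}\sum |d|$
where $|d| = |X - A|$ is the absolute value of the deviation d = X − A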
Steps:
(iii) Ignore the negative signs of the deviations, taking all the deviations to be
positive, to obtain the absolute deviations $|d| = |X - A|$.
(iv) Obtain the sum of the absolute deviations obtained in step (iii).
(v) Divide the total obtained in step (iv) by n, the number of observations.
The result gives the value of the mean deviation about the average A. In the
case of a frequency distribution, or a grouped or continuous frequency
distribution, the mean deviation about an average A is given by:
M.D. $= \frac{1}{N}\sum f|X - A|$, where $N = \sum f$
Coefficient of M.D. = $\frac{\text{M.D.}}{\text{average about which it is calculated}}$
Example: Calculate the mean deviation from the following data giving the
marks obtained by 11 students in a class test: 14, 15, 23, 20, 10, 30, 19, 18,
16, 25, 12
Solution
M.D. =
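The steps for the mean deviation can be traced on the 11 marks above. The notes do not show which average the deviations are taken about, so this sketch computes the mean deviation about both the median and the mean; the function name is illustrative.

```python
from statistics import mean, median

def mean_deviation(data, about):
    """M.D. = (1/n) * sum(|x - A|), where A is the chosen average."""
    return sum(abs(x - about) for x in data) / len(data)

marks = [14, 15, 23, 20, 10, 30, 19, 18, 16, 25, 12]

md_about_median = mean_deviation(marks, median(marks))
md_about_mean = mean_deviation(marks, mean(marks))
print(median(marks), round(md_about_median, 3), round(md_about_mean, 3))
```

The mean deviation is smallest when taken about the median, which the two printed values illustrate.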
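f    7   18  25  31  15  4
Solution
c.f.    x      |x − Md|    f|x − Md|
7       75     125         875
25      125    75          1350
50      175    25          625
81      225    25          775
96      275    75          1125
∑ f|x − Md| = 5250
Here N = ∑f = 100
Median $= l + \frac{h}{f}\left(\frac{N}{2} - c\right)$
M.D. about median $= \frac{\sum f|x - Md|}{N} = \frac{5250}{100} = 52.5$
Coefficient of M.D. = $\frac{\text{M.D.}}{\text{Median}}$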
4. Standard Deviation
Standard deviation, usually denoted by the letter σ (small sigma) of the Greek
alphabet, was first suggested by Karl Pearson as a measure of dispersion
in 1893. It is defined as the positive square root of the arithmetic mean of
the squares of the deviations of the given observations from their arithmetic
mean. Thus if X1, X2, - - -, Xn is a set of n observations, then its standard
deviation is given by:
$\sigma = \sqrt{\frac{1}{n}\sum (X - \bar{X})^2}$, where $\bar{X} = \frac{1}{n}\sum X$
Steps:
(iii) Square each of the deviations obtained in step (ii), i.e., compute
$(X_1 - \bar{X})^2, (X_2 - \bar{X})^2, \ldots, (X_n - \bar{X})^2$.
(iv) Find the sum of the squared deviations in step (iii) and divide by n,
giving $\frac{1}{n}\sum (X - \bar{X})^2$.
(v) Take the positive square root of the value obtained in step (iv).
(vi) The resulting value gives the standard deviation of the distribution.
Thus the value of σ will be greater if the values of X are scattered widely
away from the mean. A small value of σ implies that the distribution
is homogeneous, and a large value of σ implies that it is heterogeneous.
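The steps for the standard deviation can be followed literally in code. As an illustration only, the sketch reuses the 11 class-test marks quoted earlier in these notes and compares the result with the standard library's population standard deviation.

```python
import math
import statistics

def standard_deviation(data):
    """Positive square root of the mean of squared deviations from the mean."""
    n = len(data)
    x_bar = sum(data) / n                              # arithmetic mean
    squared = [(x - x_bar) ** 2 for x in data]         # squared deviations
    return math.sqrt(sum(squared) / n)                 # divide by n, take root

marks = [14, 15, 23, 20, 10, 30, 19, 18, 16, 25, 12]

sigma = standard_deviation(marks)
print(round(sigma, 3), round(statistics.pstdev(marks), 3))
```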
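The square root of the mean square deviation is called the root mean square
deviation and is given by: $S = \sqrt{\frac{1}{N}\sum f(X - A)^2}$, where A is any arbitrary value.
$S^2 = \frac{1}{N}\sum f(X - A)^2$
$= \frac{1}{N}\sum f(X - \bar{X} + \bar{X} - A)^2$
$= \frac{1}{N}\sum f(X - \bar{X})^2 + (\bar{X} - A)^2 + 2(\bar{X} - A)\cdot\frac{1}{N}\sum f(X - \bar{X})$
$= \sigma^2 + (\bar{X} - A)^2$, since $\sum f(X - \bar{X}) = 0$ and $\sum f = N$
so, $S^2 = \sigma^2 + d^2$, where $d = \bar{X} - A$
$S^2 \geq \sigma^2$
In other words, the mean square deviation is not less than the variance, or the
root mean square deviation is not less than the standard deviation.
$S^2 = \sigma^2$ iff $(\bar{X} - A)^2 = 0$
so, $\bar{X} = A$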
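Different Formulae:
· $\sigma_x^2 = \frac{1}{N}\sum f(X - \bar{X})^2$, ∑f = N
· $\sigma_x^2 = \frac{1}{N}\sum fX^2 - \bar{X}^2 = \frac{1}{N}\sum fX^2 - \left(\frac{1}{N}\sum fX\right)^2$
· $\sigma_x^2 = \sigma_d^2 = \frac{1}{N}\sum fd^2 - \left(\frac{1}{N}\sum fd\right)^2$, where d = X − A
· If $d = \frac{X - A}{h}$; h > 0, then $\sigma_x^2 = h^2\sigma_d^2 = h^2\left[\frac{1}{N}\sum fd^2 - \left(\frac{1}{N}\sum fd\right)^2\right]$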
Example, calculate the standard deviation of the frequency observation on
a certain variable:
solution
X         $X - \bar{X}$    $(X - \bar{X})^2$
240.16    0.00             0
∑X = 2401.60     $\sum (X - \bar{X})^2 = 0.0106$
Example: Calculate the mean and standard deviation from the following:
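Value    90 – 99  80 – 89  70 – 79  60 – 69  50 – 59  40 – 49  30 – 39
f        2        12       22       20       14       4        1
Solution
Class      x       f     d = (x − 64.5)/10    fd     fd²
90 – 99    94.5    2     3                    6      18
80 – 89    84.5    12    2                    24     48
70 – 79    74.5    22    1                    22     22
60 – 69    64.5    20    0                    0      0
50 – 59    54.5    14    -1                   -14    14
40 – 49    44.5    4     -2                   -8     16
30 – 39    34.5    1     -3                   -3     9
∑f = 75     ∑fd = 27     ∑fd² = 127
$\bar{x} = A + \frac{h\sum fd}{N} = 64.5 + \frac{10 \times 27}{75} = 68.1$
$\sigma = h\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = 10\sqrt{\frac{127}{75} - \left(\frac{27}{75}\right)^2} = 12.505$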
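The step-deviation calculation of the mean and standard deviation in this example can be checked as follows. The assumed mean A = 64.5 and width h = 10 are the values implied by the d column; the function name is illustrative.

```python
import math

def step_deviation_stats(mids, freqs, a, h):
    """Mean and s.d. via d = (x - A) / h, using sum(fd) and sum(fd^2)."""
    n = sum(freqs)
    d = [(x - a) / h for x in mids]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = a + h * sum_fd / n
    sd = h * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd, sum_fd, sum_fd2

mids = [94.5, 84.5, 74.5, 64.5, 54.5, 44.5, 34.5]
freqs = [2, 12, 22, 20, 14, 4, 1]

mean_value, sd_value, sum_fd, sum_fd2 = step_deviation_stats(mids, freqs, 64.5, 10)
print(round(mean_value, 1), round(sd_value, 3))
```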
It has been pointed out that we need statistical measures which will reveal
clearly the salient features of a frequency distribution. The measures of
central tendency tell us about the concentration of the observations
about the middle of the distribution, and the measures of dispersion give
us an idea about the spread or scatter of the observations about some
measure of central tendency. We may come across frequency
distributions which differ widely in their nature and composition and yet
have the same central tendency and dispersion, while giving
histograms which differ very widely in shape and size.
Skewness
(ii) The values of mean, median and mode fall at different point, i.e., they
do not coincide.
$D_5 - D_{5-i} \neq D_{5+i} - D_5$ (i = 1, 2, 3, 4)
(v) The sum of the positive deviations from the median is not equal to
the sum of the negative deviation from the median.
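Skewness = $\frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{M - M_O}{\sigma}$
But quite often, the mode is ill-defined and is thus quite difficult to locate.
In such a situation, we use the following empirical relationship between
the mean, median and mode for a moderately asymmetrical (skewed)
distribution.
MO = 3Md – 2M
so that skewness = $\frac{M - (3M_d - 2M)}{\sigma} = \frac{3(M - M_d)}{\sigma}$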
Skewness =
Sk (Kelly) = $\frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$
KURTOSIS
A – Leptokurtic
B – Mesokurtic
C – Platykurtic
THEORY OF PROBABILITY
(b) It is not definite but may be one of the various possibilities, depending
on the experiment.
A phenomenon under category (a), where the result can be predicted with
certainty, is known as a deterministic or predictable phenomenon. In a
deterministic phenomenon, the conditions under which an experiment is
performed uniquely determine the outcome of the experiment.
Phenomena under category (b), where the results cannot be predicted with certainty, are
known as unpredictable or probabilistic phenomena. Such phenomena
involve uncertainty or chance.
Therefore, the theory of probability has as its central feature the concept
of a repeatable random experiment, the outcome of which is uncertain.
Basic Definitions.
Before we define probability as a concept, it is necessary to review the
definition of some probability terms that shall be employed in our
discussions.
mutually exclusive if the happening of any one of them excludes the
happening of all others in the same experiment. For example, in toss
of a coin, the event ‘head’ and ‘tail’ are mutually exclusive because if
head comes, we can’t get tail and if tail comes we can’t get head.
Similarly, in the throw of a die, the six faces numbered 1, 2, 3, 4, 5 and
6 are mutually exclusive. Thus, events are said to be mutually
exclusive if no two or more of them can happen simultaneously.
(7) Equally likely cases: The outcomes are said to be equally likely or
equally probable if none of them is expected to occur in preference
to the others. Thus, in the tossing of a coin or a die, all the outcomes, H, T or
the faces 1, 2, 3, 4, 5, 6, are equally likely if the coin or die is
unbiased.
i) Classical approach
iii) Axiomatic approach
CLASSICAL APPROACH
Remarks.
(1) Obviously, the number of cases favourable to the complementary
event, i.e., the non-happening of event A, is (N − M), and hence by
definition, the probability of non-occurrence of A is given by:
$P(\bar{A}) = \frac{N - M}{N} = 1 - \frac{M}{N} = 1 - P(A)$
For any event A, if P(A) = 0, then A is called an impossible or null event. If
P(A) = 1, then A is called a certain event.
(ii)If the various outcomes of the random experiment are not equally
likely.
EMPIRICAL APPROACH
(ii) The relative frequency, may not attain a unique value, no matter
however large N may be.
Solution.
Since the dice can fall with any one of the faces 1, 2, 3, 4, 5, and 6, the
exhaustive number of cases is 6.
(i) The number of cases favourable to the event of getting ‘5’ is 1, so the
required probability = $\frac{1}{6}$
Example 2. A coin is tossed three times. What is the probability of getting (i)
1 head, (ii) 2 heads, (iii) at least 2 heads?
Solution
Let H and T represent head and tail respectively. Using a tree diagram to
generate the sample space, we have S = {HHH, HTH, HHT, THH, TTH, HTT,
THT, TTT}
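The sample space for three tosses can be generated and counted directly, under the classical assumption that all eight outcomes are equally likely. In this sketch ‘1 head’ and ‘2 heads’ are read as exactly one and exactly two heads, which is an interpretation of the question.

```python
from fractions import Fraction
from itertools import product

# All 2**3 equally likely outcomes, e.g. ('H', 'T', 'H').
sample_space = list(product("HT", repeat=3))

def probability(event):
    """Classical probability: favourable cases / exhaustive cases."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

p_one_head = probability(lambda o: o.count("H") == 1)
p_two_heads = probability(lambda o: o.count("H") == 2)
p_at_least_two = probability(lambda o: o.count("H") >= 2)
print(p_one_head, p_two_heads, p_at_least_two)
```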
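In general, the events A1, A2, A3, - - -, An are independent if and only if
$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2)\cdots P(A_n)$, with the same product rule holding for every sub-collection of the events.
Note: $P(A/B) = \frac{P(A \cap B)}{P(B)}$, P(B) > 0
$P(B/A) = \frac{P(A \cap B)}{P(A)}$, P(A) > 0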
Example 3. A bag contains 8 white and 3 red balls. If two balls are drawn at
random without replacement, find the probability that
(i) Both are white
Solution.
= P(1st is white and 2nd is red) or P(1st is red and 2nd is white)
$= \frac{8}{11}\times\frac{3}{10} + \frac{3}{11}\times\frac{8}{10} = \frac{48}{110} = \frac{24}{55}$
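Drawing without replacement changes the composition of the bag after the first draw, so the second factor in each product is a conditional probability. A sketch with exact fractions for the bag of 8 white and 3 red balls:

```python
from fractions import Fraction

white, red = 8, 3
total = white + red

# Both white: P(1st white) * P(2nd white | 1st white).
p_both_white = Fraction(white, total) * Fraction(white - 1, total - 1)

# One of each colour: white then red, or red then white.
p_one_each = (Fraction(white, total) * Fraction(red, total - 1)
              + Fraction(red, total) * Fraction(white, total - 1))

print(p_both_white, p_one_each)
```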
Property 2: P(S) = 1 (Axiom of certainty)
P(A or B) = $P(A \cup B) = P(A) + P(B)$ for mutually exclusive events A and B (Axiom of additivity)
Solution.
(i) We have P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
P(A ∩ B) = P(A) + P(B) – P(A ∪ B) = 0.4 + p – 0.7 = p – 0.3
Theorems
(1) P (Ā) = 1 – P (A)
Example. Probability that a man will be alive 25 years hence is 0.3 and the
probability that his wife will be alive 25 years hence is 0.4. Find the
probability that 25 years hence
(v) At least one of them will be alive
Solution
Let A be the event that the man will be alive 25 years hence, and B the event
that his wife will be alive 25 years hence. P(A) = 0.3 and P(B) = 0.4
(i) P(A ∩ B) = P(A) · P(B) = 0.3 × 0.4 = 0.12 (A and B are independent)
or
(i = 1, 2, - - -, n)
$P(E_i/A) = \frac{P(E_i)\,P(A/E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A/E_j)}$
Remark. The probabilities P(E1), P(E2), - - -, P(En), which are already given
or known before conducting an experiment, are termed a priori
probabilities. The conditional probabilities P(E1/A), P(E2/A), - - -, P(En/A),
which are computed after conducting the experiment, i.e., after the occurrence of A, are
termed a posteriori probabilities.
Solution
Let E1, E2 and E3 denote respectively the events that the bolt selected at
random is manufactured by the machines A, B and C respectively and let E
denote the event that it is defective. Then we have
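Ei          E1       E2       E3       Total
P (Ei)      0.25     0.35     0.40     1.00
P (E/Ei)    0.05     0.04     0.02
P (E∩Ei)    0.0125   0.0140   0.0080   0.0345
P (E1/E) = P (E∩E1) / P (E) = 0.0125 / 0.0345 = 0.3623
(ii) Similarly
P (E2/E) = 0.0140 / 0.0345 = 0.4058
P (E3/E) = 0.0080 / 0.0345 = 0.2319
Or, from the tree diagram:
Events      Probability
E1 ∩ E      0.25 x 0.05 = 0.0125
E2 ∩ E      0.35 x 0.04 = 0.0140
E3 ∩ E      0.40 x 0.02 = 0.0080
Total = 0.0345
which gives the same values: P (E1/E) = 0.3623, P (E2/E) = 0.4058 and P (E3/E) = 0.2319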
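The Bayes' theorem calculation for the bolts can be checked with a short Python sketch. The prior and defect probabilities are the factors appearing in the products 0.25 x 0.05, 0.35 x 0.04 and 0.40 x 0.02 of the solution; the dictionary names are our own.

```python
priors = {"E1": 0.25, "E2": 0.35, "E3": 0.40}        # P(Ei): share of output per machine
defect_rates = {"E1": 0.05, "E2": 0.04, "E3": 0.02}  # P(E/Ei): defective rate per machine

# Joint probabilities P(E and Ei) = P(Ei) * P(E/Ei)
joint = {k: priors[k] * defect_rates[k] for k in priors}

# Total probability of a defective bolt, P(E)
p_defective = sum(joint.values())

# Posterior probabilities P(Ei/E) by Bayes' theorem
posteriors = {k: joint[k] / p_defective for k in joint}
for name, value in posteriors.items():
    print(name, round(value, 4))
```

Because the three machines account for all bolts, the posterior probabilities add up to 1.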
Random Variable
Values of X: 3 2 1 2 1 1 0
class.
Mathematical Expectation
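If X is a random variable which can assume any one of the values x1, x2, ..., xn
with the respective probabilities p1, p2, ..., pn, then the mathematical
expectation of X, usually called the expected value of X and denoted by E (X), is
defined as:
$E(X) = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n = \sum_{i=1}^{n} p_i x_i$
where $\sum p_i = p_1 + p_2 + \cdots + p_n = 1$
For the frequency distribution
X: x1 x2 x3 ... xi ... xn
F: f1 f2 f3 ... fi ... fn
the mean is
$\bar{x} = \dfrac{\sum f_i x_i}{N} = \dfrac{f_1}{N} x_1 + \dfrac{f_2}{N} x_2 + \cdots + \dfrac{f_n}{N} x_n$ ... (*)
where $N = \sum f_i$, so $P(X = x_i) = \dfrac{f_i}{N} = p_i$
Substituting in (*) we get:
$\bar{x} = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n = E(X)$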
Theorem on Expectation
Theorems:
Theorem 2. Var (aX) = a² Var (X), where a is a constant
X 0 1 2 3
P (X)
Solution
Expenses = N30,000; this will be his loss if the bid is not won.
P (winning the bid) = 10% = 0.10
Since the contractor’s expected profit is negative, he should not bid for the
contract.
Xi 4 5 6 8
Solution
Var (X) = E (X²) – (E (X))²
= ∑PX² – (∑PX)²
X       P     PX    PX²
4       0.1   0.4   1.6
5       0.3   1.5   7.5
6       0.4   2.4   14.4
8       0.2   1.6   12.8
Total   1     5.9   36.3
Var (X) = 36.3 – (5.9)² = 36.3 – 34.81 = 1.49
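The table calculation above can be reproduced in a few lines of Python. This is only a sketch: the function names `expectation` and `variance` are our own, and the values and probabilities are those listed in the table.

```python
values = [4, 5, 6, 8]
probs = [0.1, 0.3, 0.4, 0.2]

def expectation(xs, ps):
    """E(X) = sum of p * x over all values of X."""
    return sum(p * x for x, p in zip(xs, ps))

def variance(xs, ps):
    """Var(X) = E(X^2) - (E(X))^2."""
    mean = expectation(xs, ps)
    mean_of_squares = expectation([x * x for x in xs], ps)
    return mean_of_squares - mean ** 2

mean_x = expectation(values, probs)
var_x = variance(values, probs)
print(round(mean_x, 2), round(var_x, 2))
```

Checking that the probabilities sum to 1 before using such a table is a good habit, since a missing entry would silently distort both results.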
Probability Distribution of a Discrete Random Variable
Let us consider a discrete random variable X which can take the possible
values x1, x2, x3, ..., xn. With each value of the variable, we associate a
number,
pi = P (X = xi); i = 1, 2, ..., n
Probability Distribution of a Continuous Random Variable
i.e., P (a ≤ X ≤ b) = 1
$P(c \le X \le d) = \int_c^d p(x)\,dx$
[Figure: curve of p (x) against x; the area under the curve between X = c and X = d represents P (c ≤ X ≤ d)]
(i) Binomial probability distribution
$P(X = x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, 2, ..., n
and 0 otherwise
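Theorem. If X has a Binomial Distribution, then the mean E (X) = np and the variance V (X) = npq.
Proof:
By definition
$E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x}$
$= \sum_{x=1}^{n} x \, \dfrac{n!}{x!\,(n-x)!} \, p^x q^{n-x}$
$= \sum_{x=1}^{n} \dfrac{n\,(n-1)!}{(x-1)!\,(n-x)!} \, p \, p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{n-x}$
$= np \sum_{k=0}^{n-1} \binom{n-1}{k} p^{k} q^{n-1-k}$ (put k = x – 1)
$= np\,(p+q)^{n-1}$
E (X) = np, since p + q = 1.
Similarly,
$E[X(X-1)] = \sum_{x=2}^{n} x(x-1) \binom{n}{x} p^x q^{n-x} = n(n-1)p^2 \sum_{x=2}^{n} \binom{n-2}{x-2} p^{x-2} q^{n-x}$
Put k = x – 2
$E[X(X-1)] = n(n-1)p^2 \sum_{k=0}^{n-2} \binom{n-2}{k} p^{k} q^{n-2-k} = n(n-1)p^2$
So,
$E(X^2) = E[X(X-1)] + E(X) = n(n-1)p^2 + np$
Hence,
$V(X) = E(X^2) - [E(X)]^2 = n(n-1)p^2 + np - n^2p^2 = np - np^2 = np(1-p) = npq$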
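The theorem can be checked numerically by summing over the whole distribution. The sketch below uses n = 10 and p = 0.5, which are illustrative values assumed here (ten tosses of a fair coin); the function name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x), and 0 outside 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5  # assumed illustrative values
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the definition
mean = sum(x * px for x, px in enumerate(pmf))
var = sum(x * x * px for x, px in enumerate(pmf)) - mean ** 2

print(round(mean, 4), round(var, 4))      # compare with np and npq
print(round(binomial_pmf(6, n, p), 4))    # probability of exactly 6 successes
```

With these assumed values np = 5 and npq = 2.5, so the direct sums should agree with the theorem up to rounding error.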
(iii) No head
(i) Required probability = p (6) = 0.2051
= 1 – P (x ≤ 3)
(c) P (1 < X < 4) = P (X = 2) + P (X = 3) = 0.0810
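Under the above three conditions, the Binomial probability mass function
(pmf) tends to the probability function of the Poisson distribution given
below:
$P(X = x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$, x = 0, 1, 2, ...
where λ = np is the mean of the distribution.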
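Proof:
By definition (X has a Poisson distribution with parameter λ)
$E(X) = \sum_{x=0}^{\infty} x \, \dfrac{e^{-\lambda}\lambda^{x}}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \dfrac{\lambda^{x-1}}{(x-1)!}$
Put k = x – 1
$E(X) = \lambda e^{-\lambda} \sum_{k=0}^{\infty} \dfrac{\lambda^{k}}{k!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$
By definition
$Var(X) = E(X^2) - [E(X)]^2$, where $E(X^2) = E[X(X-1)] + E(X)$
So
$E[X(X-1)] = \sum_{x=0}^{\infty} x(x-1) \, \dfrac{e^{-\lambda}\lambda^{x}}{x!} = \lambda^2 e^{-\lambda} \sum_{x=2}^{\infty} \dfrac{\lambda^{x-2}}{(x-2)!}$
Put k = x – 2
$E[X(X-1)] = \lambda^2 e^{-\lambda} \sum_{k=0}^{\infty} \dfrac{\lambda^{k}}{k!} = \lambda^2$
so $E(X^2) = \lambda^2 + \lambda$, and hence
$Var(X) = \lambda^2 + \lambda - \lambda^2 = \lambda$
In the Poisson distribution the mean and the variance are equal.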
2. The number of customers arriving at the super market: say per hour
3. The number of defects per unit of manufactured products
Example: Between the hours 2pm and 4pm the average number of phone
calls per minute coming into the switchboard of a company is 2.35. Find
the probability that during a particular minute there will be at most two
phone calls.
P (X ≤ 2) = P (X = 0) + P (X = 1) + P (X = 2) = 0.5828
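The three Poisson terms can be added up with a short Python sketch. It assumes, as the example does, that the number of calls in a minute follows a Poisson distribution with mean 2.35; the function name `poisson_pmf` is our own.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """P(X = x) = e^(-mean) * mean^x / x! for x = 0, 1, 2, ..."""
    return exp(-mean) * mean ** x / factorial(x)

mean_calls = 2.35  # average number of calls per minute

# At most two calls: add the probabilities of 0, 1 and 2 calls
p_at_most_two = sum(poisson_pmf(x, mean_calls) for x in range(3))
print(round(p_at_most_two, 3))
```

The result is approximately 0.583; small differences in the fourth decimal place can arise when intermediate values are rounded by hand.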
(i) None is defective
Solution
Mean = np = 100 x 0.05 = 5
(ii)
NORMAL DISTRIBUTION
of Errors) after Karl Friedrich Gauss (1777 – 1855) who used this
distribution to describe the theory of accidental errors of measurements
involved in the calculation of orbits of heavenly bodies. Today, the Normal
probability model is one of the most important probability models in
statistical analysis.
Or
The Mean and standard deviation are the parameters of the Normal
distribution.
Therefore, the standard normal variate has mean 0 and standard deviation
1. Hence, the probability density function (pdf) of the S.N.V. Z is given by:
$\phi(z) = \dfrac{1}{\sqrt{2\pi}} \, e^{-z^2/2}$, $-\infty < z < \infty$
This gives the height (ordinate) of the standard normal curve at the point Z. A
standard normal variable Z is denoted by Z ~ N (0, 1)
RELATION BETWEEN POISSON AND NORMAL DISTRIBUTION
1. The graph of P (X) is the famous bell-shaped curve, as shown below.
The top of the bell is directly above the mean (µ).
[Figure: bell-shaped curve of P (X) against x, with its peak at x = µ]
2. The curve is symmetrical about the line X = µ (Z = 0), i.e., it has the
same shape on either side of the line X = µ (or Z = 0).
3. Since the distribution is symmetrical, mean = median = mode = µ
[Figure: standard normal curve with the area between Z = 0 and Z = z1 marked]
P (0 < Z < z1) = Area under standard normal curve between Z = 0 and Z = z1
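Where a printed normal table is not at hand, the area P (0 < Z < z1) can be computed from the error function in Python's standard library. This is a sketch only; the function names are our own, and the relation used is the standard identity P (Z ≤ z) = 0.5 (1 + erf (z / √2)).

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """P(Z <= z) for the standard normal variate, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def area_zero_to(z1):
    """P(0 < Z < z1) for z1 >= 0: the quantity tabulated in normal tables."""
    return std_normal_cdf(z1) - 0.5

# Print a few table entries for comparison with a printed table
for z1 in (0.5, 1.0, 1.2, 1.96):
    print(z1, round(area_zero_to(z1), 4))
```

By symmetry, the area between Z = –z1 and Z = 0 is the same as the area between Z = 0 and Z = z1.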
In terms of probabilities, the areas under the standard normal curve are
given by:
The values of z to the left of z = 0 are negative and to the right of z = 0 are
positive.
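[Figure: normal curve with the area to the right of X = a (Z = z1) representing P (X > a) = P (Z > z1); the mean ordinate is at X = µ (Z = 0)]
Case (i), a > µ, i.e., a is to the right of the mean ordinate (see fig. above)
When X = a, $Z = \dfrac{a - \mu}{\sigma} = z_1$ (say)
$P(X > a) = P(Z > z_1) = 0.5 - P(0 < Z < z_1)$
and the probability P (0 < Z < z1) can be read from the table provided.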
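Case (ii), a < µ, i.e., a is to the left of the mean ordinate (fig. below)
[Figure: normal curve with X = a (Z = –z1) to the left of X = µ (Z = 0)]
When X = a, $Z = \dfrac{a - \mu}{\sigma} = -z_1$ (say)
$P(X > a) = P(Z > -z_1) = 0.5 + P(0 < Z < z_1)$
and P (0 < Z < z1) can be read from the normal table provided.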
Case (i), b > µ, i.e., b is to the right of the ordinate at X = µ (fig. above)
When X = b,
(By Symmetry)
=0.5-P(0<Z<z1)
number N of girls with waists
Solution
For
Hence in a group of 800 girls, the expected number of girls with waists
between 65 cms and 70 cms is: 800 x 0.3674 = 293.92 ≈ 294
(ii) The probability that a girl has a waist greater than or equal to 72 cms is
given by
Hence, in a group of 800 girls, the expected number of girls with waists
greater than or equal to 72 cms is: 800 x 0.1151 = 92.08 ≈ 92
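The standardisation steps in this kind of problem can be sketched in Python. The mean of 66 cms and standard deviation of 5 cms used below are assumed for illustration only; they are not stated in the text above, but they lead to probabilities consistent with the figures 0.3674 and 0.1151 quoted in the solution. The function name `normal_cdf` is our own.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), by standardising to Z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

MU, SIGMA = 66.0, 5.0  # assumed for illustration; not given in the text above
N_GIRLS = 800

# (i) waist between 65 and 70 cms
p_between = normal_cdf(70, MU, SIGMA) - normal_cdf(65, MU, SIGMA)

# (ii) waist greater than or equal to 72 cms
p_at_least_72 = 1.0 - normal_cdf(72, MU, SIGMA)

print(round(p_between, 4), round(N_GIRLS * p_between))
print(round(p_at_least_72, 4), round(N_GIRLS * p_at_least_72))
```

The expected number in each case is the group size multiplied by the corresponding probability, rounded to the nearest whole girl.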
variance of 10.8 square inches. How many soldiers in a regiment of 1000 would
you expect to be
(i) over six feet tall, and (ii) below 5.5 feet? Assume heights to be normally
distributed.
Solution
Let the variable x denote the height (in inches) of the soldiers. Then we are
given:
(i) A soldier will be over 6 feet tall if X is greater than 72, (because x is
height in inches)
When
(ii) The probability that a soldier is below 5.5’ = 66” is given by:
P (x < 66) = P
RELATIONSHIP BETWEEN VARIABLES
i. Scatter diagram method.
Examples. Following are the heights and weights of 10 students.
suggested by Karl Pearson (1857-1936) and is by far the most widely
used method in practice. We give below without proof the formula
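for rxy
$r_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$
Or
$r_{xy} = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$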
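Karl Pearson's coefficient can be computed directly from deviations about the means. The sketch below uses the standard deviation form of the formula, r = ∑(x – x̄)(y – ȳ) / √(∑(x – x̄)² ∑(y – ȳ)²); the function name `pearson_r` and the small data set are hypothetical and serve only as an illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

The value of r always lies between –1 and +1; a series that is an exact increasing linear function of the other gives r = 1.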
Sales (Y): 47 53 58 86 62 68 60 91 51 84
Sometimes we come across statistical series in which the variables under
consideration are not capable of quantitative measurement but can be
arranged in serial order. This happens when we are dealing with qualitative
characteristics (attributes) such as honesty, beauty, character, morality,
etc, which cannot be measured quantitatively but can be arranged serially.
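In such situations Karl Pearson’s coefficient of correlation cannot be used
as such. Instead, Spearman’s rank correlation coefficient is used:
$\rho = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$
where di is the difference between the pair of ranks of the same individual in
the two characteristics and n is the number of pairs.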
Pair: 1 2 3 4 5 6 7 8 9 10 11
A: 24 29 19 14 30 19 27 30 20 28 11
B: 37 35 16 26 23 27 19 20 16 11 21
Find the rank correlation coefficient.
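A sketch of the rank correlation calculation for the eleven pairs above is given below. Both series contain repeated values (19 and 30 in A, 16 in B), so the sketch gives tied values the average of the ranks they would occupy and adds m(m² – 1)/12 to ∑d² for each group of m tied values. This tie adjustment is a common textbook convention and is an assumption here, not something stated in the text above; the function names are our own.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def rank_correlation(a, b):
    n = len(a)
    ra, rb = average_ranks(a), average_ranks(b)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(ra, rb))
    adjusted = sum_d2 + tie_correction(a) + tie_correction(b)
    return 1 - 6 * adjusted / (n * (n * n - 1))

A = [24, 29, 19, 14, 30, 19, 27, 30, 20, 28, 11]
B = [37, 35, 16, 26, 23, 27, 19, 20, 16, 11, 21]
rho = rank_correlation(A, B)
print(round(rho, 4))
```

Ranking from the smallest or from the largest value gives the same differences in absolute value, so the direction of ranking does not affect the result as long as both series are ranked the same way.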
This has two variables X and Y (say). It is the most commonly used
regression line: a straight line whose equation is $Y_i = B_0 + B_1 X_i + e_i$, i = 1, 2, …, n, where $Y_i$ and
$X_i$ are respectively the dependent and independent variables, B0 and B1 are the
parameters (constants), and $e_i$ is the error term, which is assumed to be an
independently and normally distributed random variable with mean 0 and
constant variance $\sigma^2$. This equation, with appropriate values of B0 and B1, may be
used to forecast the values of the dependent variable given a value of X.
LINES OF REGRESSION
A line of regression is the line which gives the best estimate of one variable
for any given value of the other variable. In the case of two variables X and Y
(say), we shall have two lines of regression: one of Y on X and the other of X
on Y.
The line of regression of Y on X is the line which gives the best estimate for the
value of Y for any specified value of X. The equation is given below:
Where
And , where
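The two lines of regression can be fitted by least squares as sketched below. The estimates used, slope = ∑(x – x̄)(y – ȳ) / ∑(x – x̄)² and intercept = ȳ – slope · x̄, are the usual least-squares formulas and are assumed here; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(x, y)  # line of regression of Y on X
a_xy, b_xy = fit_line(y, x)  # line of regression of X on Y (roles swapped)

print(f"Y = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X = {a_xy:.2f} + {b_xy:.2f} Y")
```

Swapping the roles of the two series gives the second line, which is in general a different line from the first unless the points lie exactly on a straight line.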
COEFFICIENT OF REGRESSION
the change in the value of the dependent variable X for a unit change in the
value of the independent variable Y and is called the coefficient of regression
of X on Y. This is also denoted by bxy, where $b_{xy} = r \dfrac{\sigma_x}{\sigma_y}$
Note: $r^2 = b_{yx} \cdot b_{xy}$
Both byx and bxy must have the same sign. If they had opposite signs, then r²
would become negative. This implies that r (the correlation coefficient) would
be imaginary, which is a contradiction to the fact that r is a real quantity
lying between –1 and +1. Hence, byx and bxy must have the same sign. The sign of
the correlation coefficient is the same as that of the regression coefficients. If the regression
coefficients are positive, r is positive, and if negative, r is also negative.
which
,,
which
Example. From the following data, obtain the two regression equations.
Purchase: 71 75 69 97 70 91 39 61 80 47
8 – x, the other results are . The student later discovered an error in one pair
of the observations, i.e., (8, 5), and decided to remove it. Find the regression
equation of the remaining observations.