Unit 2.
2.1 Introduction:
Descriptive statistics is used to summarize or present data, either numerically or graphically. It uses various kinds of summary measures, such as the mean, variance, correlation coefficient, regression coefficient, skewness and kurtosis.
Series:
(i) Individual series (e.g. 10, 20, 30, 40, 40).
Frequency 2 4 6 4 2
Frequency 2 4 6 4 2
Frequency Distribution :
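As a small illustration of how an individual series is grouped into a frequency distribution, the sketch below counts how often each value occurs in the series 10, 20, 30, 40, 40 from (i). The function name `to_frequency_distribution` is hypothetical; it only relies on Python's standard `collections.Counter`.

```python
from collections import Counter

# Individual series: every observation is listed separately.
individual_series = [10, 20, 30, 40, 40]


def to_frequency_distribution(values):
    """Group an individual series into a discrete frequency distribution.

    Returns (value, frequency) pairs sorted by value.
    """
    counts = Counter(values)
    return sorted(counts.items())


for value, frequency in to_frequency_distribution(individual_series):
    print(value, frequency)
```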
2.3 Diagrammatic and graphical presentation of data:-
Example : 1
Present the following data by a simple bar diagram.
Cities                   A    B    C    D    E    F    G
Population (millions)    5    7   10   15   13   16   14
Solution:- The following figure represents the population in millions of different cities by means of
simple bar diagram.
A simple bar diagram represents the magnitude of a single factor according to time periods, places, items,
etc. But, when the magnitude of the factor is given with its sub-factors, each bar is further sub-divided into
components in proportion to the magnitude of the sub-factors. Such a diagram is known as sub-divided
bar diagram.
Note: When negative values of a variable are to be presented, a sub-divided bar diagram is appropriate.
Example : 2
Draw a sub-divided bar diagram of the following data.
Country-wise tourists in various cities of Nepal
                     Cities
Country    Kathmandu   Pokhara   Palpa   Illam
USA           192        182      172     162
Canada         90         80       70      65
Germany        55         54       50      45
Solution :
[Figure: Sub-divided bar diagram; horizontal axis: cities (Kathmandu, Pokhara, Palpa, Illam); vertical axis: No. of tourists (0 to 400); segments: USA, Canada, Germany]
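To draw each sub-divided bar, one needs the bottom and top of every segment, i.e. the running (cumulative) totals within each city. A minimal sketch using the Example 2 data, assuming the components are stacked in the order USA, Canada, Germany from the bottom (the helper `stacked_segments` is hypothetical):

```python
# Example 2 data: tourists by country (components) and city (bars).
cities = ["Kathmandu", "Pokhara", "Palpa", "Illam"]
tourists = {
    "USA": [192, 182, 172, 162],
    "Canada": [90, 80, 70, 65],
    "Germany": [55, 54, 50, 45],
}


def stacked_segments(components, n_bars):
    """Return, for each bar, a list of (component, bottom, top) segments.

    Components are stacked from the bottom in the order they appear
    in `components`, so each bar's height equals its column total.
    """
    bars = []
    for i in range(n_bars):
        bottom = 0
        segments = []
        for name, values in components.items():
            top = bottom + values[i]
            segments.append((name, bottom, top))
            bottom = top
        bars.append(segments)
    return bars


for city, segments in zip(cities, stacked_segments(tourists, len(cities))):
    print(city, segments, "bar height =", segments[-1][2])
```

The bar heights come out as 337, 316, 292 and 272, which is why a vertical scale up to 400 is enough.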
Example : 3
Represent the following data of expenditure of two families by a suitable diagram.
[Figure: expenditure of Family A and Family B; the vertical axis extends to -200]
Percentage Bar Diagram :
A sub-divided bar diagram drawn on a percentage basis is known as a percentage bar diagram. It is used for comparing the relative changes in the data. In this diagram, the total of each bar is taken as 100 and the value of each component is expressed as a percentage of that total. So every bar has the same height, i.e. 100, and the segments of each bar have heights equal to their respective percentages.
Example : 4
Draw the percentage bar diagram of the data on the number of students in different colleges in different programs.
                  College
Program     A      B      C      D
BBS        450    390    295    360
BA         285    195    190    200
B Sc       160    150    140    130
[Figure: Percentage bar diagram; horizontal axis: College A, College B, College C, College D; vertical axis: Percentage of no. of students (0% to 100%); segments: BBS, BA, B Sc]
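The segment heights of a percentage bar diagram are each component divided by its bar total, times 100. A minimal sketch for the Example 4 data (the function name `to_percentages` is hypothetical):

```python
# Example 4: number of students by program (BBS, BA, B Sc) in each college.
programs = ["BBS", "BA", "B Sc"]
students = {
    "College A": [450, 285, 160],
    "College B": [390, 195, 150],
    "College C": [295, 190, 140],
    "College D": [360, 200, 130],
}


def to_percentages(values):
    """Express each component as a percentage of the bar total (2 d.p.)."""
    total = sum(values)
    return [round(100 * v / total, 2) for v in values]


for college, counts in students.items():
    print(college, dict(zip(programs, to_percentages(counts))))
```

For College A the total is 895, giving 50.28% BBS, 31.84% BA and 17.88% B Sc. Because of rounding, the three percentages of a bar may add up to 99.99 or 100.01 instead of exactly 100.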
Draw the multiple bar diagram for import and export of a company given below.
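When two pie diagrams are to be compared, their radii are taken in the ratio of the square roots of the totals, i.e. $r_1 : r_2 = \sqrt{T_1} : \sqrt{T_2}$, where $T_1$ and $T_2$ are the two totals.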
Pie diagrams are compared on the basis of the areas of the circles and of their sectors, which are difficult to judge visually with precision. Generally, sub-divided or percentage bar diagrams are preferred to pie diagrams for studying changes in the total and in the component parts. Moreover, pie diagrams are more difficult to construct and compare than bars. If there are more than 6 component parts, a pie chart is not preferred.
Example : 6
Solution:
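Let the total expenditure Rs 800 = 360°, so an item of expenditure E is shown by a sector with angle $\frac{E}{800} \times 360^\circ$ at the center.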
Example : 7
Construct a pie diagram for the following data of weekly expenditure of two families from Kathmandu.
                    Family A                         Family B
Items            Expenditure  Angle at Center    Expenditure  Angle at Center
Food                 250         118.42°             300         108°
Clothing             150          71.05°             275          99°
Housing              100          47.37°             125          45°
Fuel                  50          23.68°              75          27°
Education            100          47.37°             100          36°
Entertainment         50          23.68°              60          21.6°
Miscellaneous         60          28.42°              65          23.4°
Total                760         360°               1000         360°
√Total            27.568                          31.623
Radii                  1                           1.147
[Figure: Pie-chart showing expenditure of two families from Kathmandu; one pie each for Family A and Family B; sectors: Food, Clothing, Housing, Fuel, Education, Entertainment, Miscellaneous]
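The angles and radii in the Example 7 table follow from two rules: angle = (item / total) × 360°, and radii proportional to the square roots of the totals, with the smaller pie given radius 1. A minimal sketch that reproduces the table (the helper names are hypothetical):

```python
import math

# Example 7: weekly expenditure of two families.
expenditure = {
    "Family A": {"Food": 250, "Clothing": 150, "Housing": 100, "Fuel": 50,
                 "Education": 100, "Entertainment": 50, "Miscellaneous": 60},
    "Family B": {"Food": 300, "Clothing": 275, "Housing": 125, "Fuel": 75,
                 "Education": 100, "Entertainment": 60, "Miscellaneous": 65},
}


def sector_angles(items):
    """Angle at the center of each sector: value / total * 360 degrees."""
    total = sum(items.values())
    return {name: value / total * 360 for name, value in items.items()}


def relative_radii(totals):
    """Radii proportional to sqrt(total), scaled so the smallest is 1."""
    roots = [math.sqrt(t) for t in totals]
    smallest = min(roots)
    return [r / smallest for r in roots]


totals = [sum(items.values()) for items in expenditure.values()]   # [760, 1000]
for family, items in expenditure.items():
    print(family, {k: round(v, 2) for k, v in sector_angles(items).items()})
print("sqrt of totals:", [round(math.sqrt(t), 3) for t in totals])
print("radii:", [round(r, 3) for r in relative_radii(totals)])
```

Scaling the areas by the square roots keeps the area of each pie proportional to its total, which is what makes the two pies comparable.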
(i) Histogram
(ii) Frequency Polygon
(iii) Frequency Curve
(iv) Ogive
Histogram:
A common graphical presentation of quantitative data is a histogram. It is used to describe numerical data
that have been grouped into frequency, relative frequency or percentage distributions. A histogram is
constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency
or percentage frequency on the vertical axis. The frequency, relative frequency or percentage frequency
of each class is shown by drawing a rectangle whose base is the class interval on the horizontal axis (X)
and whose height is the corresponding frequency, relative frequency or percentage frequency.
For unequal class intervals, the heights are proportional to the frequency density, which is the ratio of the frequency to the corresponding class size:
Frequency density = $\frac{\text{Frequency}}{\text{Class size}}$
Example : 8
[Figure: Histogram; horizontal axis (X): Class, 0 to 40 in steps of 5; vertical axis: 0 to 20]
Example : 9
Frequency 8 18 25 15 12 12
Solution :
This is a case of unequal class sizes. Therefore, we first calculate the frequency density of each class as
Frequency density of a class = $\frac{\text{Frequency of the class}}{\text{Class size}}$
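The frequency densities for Example 9 can be computed as below. This is a sketch under an assumption: the classes are taken to be 10-15, 15-20, 20-25, 25-30, 30-40 and 40-60, the limits marked on the wages (Rs) axis of the histogram, paired in order with the frequencies 8, 18, 25, 15, 12, 12. The function name `frequency_densities` is hypothetical.

```python
# Assumed class limits (Rs), read off the histogram's wage axis.
boundaries = [10, 15, 20, 25, 30, 40, 60]
frequencies = [8, 18, 25, 15, 12, 12]


def frequency_densities(bounds, freqs):
    """Frequency density = frequency / class width, one value per class."""
    return [f / (upper - lower)
            for f, lower, upper in zip(freqs, bounds, bounds[1:])]


densities = frequency_densities(boundaries, frequencies)
for lower, upper, d in zip(boundaries, boundaries[1:], densities):
    print(f"{lower}-{upper}: {d}")
```

The last class has the same frequency (12) as the class 30-40 but twice the width, so its rectangle is half as tall (0.6 against 1.2).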
[Figure: Histogram; horizontal axis: Wages (Rs), marked at 10, 15, 20, 25, 30, 40, 60]
Frequency Polygon :
It is another method of graphic representation of a frequency distribution. When the distribution is discrete, we obtain the frequency polygon by plotting the values of the variable on the X-axis and the corresponding frequencies on the Y-axis, and joining the points by straight lines.
If the frequency distribution is continuous, the frequency polygon is drawn by joining, in order, the mid-points of the tops of adjacent bars of the histogram. Since a polygon is a closed geometrical figure, the lines from the first and last bars are extended to the x-axis. A frequency polygon can be drawn with or without first drawing a histogram.
The main purpose of a frequency polygon is to locate the mode and to depict the nature of the distribution.
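For a continuous distribution with equal class widths, the vertices of the polygon are (mid-point, frequency) pairs. The sketch below closes the polygon on the x-axis at the mid-points of one imaginary zero-frequency class before the first class and one after the last, a common convention assumed here. It uses the marks distribution given in Example 13 (classes 0-10 to 70-80), and the function name is hypothetical.

```python
def frequency_polygon(bounds, freqs):
    """Vertices (mid-point, frequency) of a frequency polygon.

    Assumes equal class widths; one extra class of zero frequency is
    added at each end so the polygon closes on the x-axis.
    """
    width = bounds[1] - bounds[0]
    mids = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
    points = list(zip(mids, freqs))
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]


# Marks (class limits) and number of students from Example 13.
bounds = [0, 10, 20, 30, 40, 50, 60, 70, 80]
freqs = [4, 10, 16, 22, 20, 18, 8, 2]
print(frequency_polygon(bounds, freqs))
```

The closing point at -5 is a drawing convention only; it does not mean that negative marks are possible.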
Example : 10
Frequency Curve :
A frequency curve is obtained by joining the vertices of the frequency polygon with a smooth curve. The frequency polygon is smoothed in such a way that the area enclosed by the frequency curve is the same as that of the frequency polygon and the histogram.
Example : 11
[Figure: Frequency curve; horizontal axis: Marks (0 to 120); vertical axis: Frequency (0 to 160)]
So, by inspection of the histogram, the mode of the distribution is 48.
Example : 12
Draw ogives from the following data and hence locate the value of the median.
Wage (Rs)    0-20   20-40   40-60   60-80   80-100
Frequency     10     30      60      40      10
Solution:
First of all, prepare the less than and more than cumulative frequency tables as,
Less than method                       More than method
Wage (Rs)        Cumulative freq.      Wage (Rs)       Cumulative freq.
Less than 20          10               More than 0          150
Less than 40          40               More than 20         140
Less than 60         100               More than 40         110
Less than 80         140               More than 60          50
Less than 100        150               More than 80          10
[Figure: Less than and more than ogives; horizontal axis: Wage (Rs) (0 to 100); vertical axis: Cumulative frequency; the median is marked on the wage axis]
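The graphical median can be checked numerically. Because the more than cumulative frequency at any boundary equals N minus the less than one, the two ogives cross where each equals N/2 = 75. Reading that point by straight-line interpolation inside the median class (40-60) gives the same value as the formula $L + \frac{N/2 - cf}{f} \times h$. A sketch, assuming the wage classes 0-20, 20-40, 40-60, 60-80 and 80-100 implied by the cumulative table; the helper names are hypothetical.

```python
from itertools import accumulate

# Example 12: wage class limits (Rs) and frequencies.
bounds = [0, 20, 40, 60, 80, 100]
freqs = [10, 30, 60, 40, 10]


def less_than_cf(freqs):
    """Number of observations below each class boundary."""
    return [0] + list(accumulate(freqs))


def more_than_cf(freqs):
    """Number of observations at or above each class boundary."""
    n = sum(freqs)
    return [n - c for c in less_than_cf(freqs)]


def ogive_median(bounds, freqs):
    """Wage at which the less than ogive reaches N/2.

    The more than ogive equals N - (less than ogive), so this is also
    where the two ogives intersect. Interpolating inside the median
    class gives L + (N/2 - cf) / f * h.
    """
    lt = less_than_cf(freqs)
    half = sum(freqs) / 2
    for i, f in enumerate(freqs):
        if lt[i + 1] >= half and f > 0:
            h = bounds[i + 1] - bounds[i]
            return bounds[i] + (half - lt[i]) / f * h
    raise ValueError("empty distribution")


print(less_than_cf(freqs))                     # [0, 10, 40, 100, 140, 150]
print(more_than_cf(freqs))                     # [150, 140, 110, 50, 10, 0]
print(round(ogive_median(bounds, freqs), 2))   # 51.67
```

So the median wage read from the ogives should be close to Rs 51.67.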
Example : 13
The following data give the frequency distribution of marks secured by 100 students.
Marks 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50 50 - 60 60 - 70 70 - 80
Students 4 10 16 22 20 18 8 2
Draw the "less than" Ogive to estimate the number of students getting marks 45 or less.
Solution :
Less than cumulative frequency table
Marks Number of students
less than 10 4
less than 20 14
less than 30 30
less than 40 52
less than 50 72
less than 60 90
less than 70 98
less than 80 100
[Figure: Less than ogive; horizontal axis: Marks (10 to 80); vertical axis: No. of students (0 to 120)]
From the less than ogive, we find that the number of students who secure marks less than or equal to 45 is 62.
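Reading a value off the less than ogive amounts to linear interpolation between the two plotted points on either side. A minimal sketch for Example 13, assuming the ogive starts at (0, 0), the lower limit of the first class (the function name `ogive_value` is hypothetical):

```python
def ogive_value(limits, cum_freqs, x):
    """Linearly interpolate a less than ogive at x.

    limits[k] is a class limit and cum_freqs[k] is the number of
    observations less than it.
    """
    for k in range(len(limits) - 1):
        x0, x1 = limits[k], limits[k + 1]
        if x0 <= x <= x1:
            y0, y1 = cum_freqs[k], cum_freqs[k + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the ogive")


# Example 13: "less than" cumulative frequencies of marks.
limits = [0, 10, 20, 30, 40, 50, 60, 70, 80]
cum_students = [0, 4, 14, 30, 52, 72, 90, 98, 100]
print(ogive_value(limits, cum_students, 45))   # 62.0
```

Since 45 lies halfway between 40 (52 students) and 50 (72 students), the estimate is 52 + 10 = 62, matching the graphical reading.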
Introduction:
After the collection of data, the next step is to analyze it. Since huge and unwieldy masses of data are confusing and difficult to remember, we need a single value representing them. Averages are measures that condense a huge mass of data into a single value representing the whole data. Averages are the typical values around which most of the data tend to cluster. They lie between the two extreme observations of the data and give us an idea about the concentration of values in the central part of the distribution. Such a single value is known as a measure of central tendency. The objectives of a measure of central tendency are:
(i) To facilitate comparison.
(ii) To present the salient features of a mass of complex data.
(iii) To know about the universe (population) from a sample.
(iv) To trace mathematical relation.
(v) To help in decision-making.
Requisites of a good average:
The measure of central tendency is designed to measure the central value around which most of the data
tend to concentrate. The following are the measures of central tendency or measures of location:
(a) Mean:
    (i) Arithmetic mean
    (ii) Geometric mean
    (iii) Harmonic mean
(b) Median
(c) Mode
Arithmetic mean:
The arithmetic mean, or simply the “mean”, of a set of observations is the sum of all the observations divided by the number of observations. The arithmetic mean is also known as the arithmetic average. It is of three types:
(i) Simple arithmetic mean.
(ii) Weighted arithmetic mean.
(iii) Combined arithmetic mean.
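Let $X_1, X_2, \ldots, X_n$ be $n$ observations. Then the simple arithmetic mean is
$\bar{X} = \frac{\sum X}{n}$
Again, let $f_1, f_2, \ldots, f_n$ be the corresponding frequencies of $X_1, X_2, \ldots, X_n$. Then the arithmetic mean is
$\bar{X} = \frac{\sum fX}{N}$, where $N = \sum f$.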
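The two formulas translate directly into code. A minimal sketch with hypothetical function names, applied to the individual series 10, 20, 30, 40, 40 and to a discrete series with X = 1, 2, 3, 4, 5 and f = 1, 3, 5, 3, 1:

```python
def simple_mean(xs):
    """Simple arithmetic mean: sum of observations / number of observations."""
    return sum(xs) / len(xs)


def frequency_mean(xs, fs):
    """Arithmetic mean of a frequency distribution: sum(f * X) / N, N = sum(f)."""
    n = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / n


print(simple_mean([10, 20, 30, 40, 40]))                  # 28.0
print(frequency_mean([1, 2, 3, 4, 5], [1, 3, 5, 3, 1]))   # 3.0
```

In the second call, $\sum fX = 39$ and $N = 13$, so $\bar{X} = 3$.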
Weighted mean:-
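Let $X_1, X_2, \ldots, X_n$ be $n$ observations of a variable $X$, and let $W_1, W_2, \ldots, W_n$ be their corresponding weights. Then the weighted mean, denoted by $\bar{X}_W$, is
$\bar{X}_W = \frac{\sum WX}{\sum W} = \frac{W_1X_1 + W_2X_2 + \cdots + W_nX_n}{W_1 + W_2 + \cdots + W_n}$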
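A minimal sketch of the weighted mean $\sum WX / \sum W$. The values and weights in the example call are hypothetical, chosen only for illustration, and so is the function name.

```python
def weighted_mean(xs, ws):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    total_weight = sum(ws)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(xs, ws)) / total_weight


# Hypothetical values and weights, for illustration only.
print(weighted_mean([50, 60, 80], [1, 2, 2]))   # 66.0
```

With the frequencies used as weights, this gives the same result as the frequency formula $\sum fX / N$. That formula is therefore a special case of the weighted mean.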
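Q. The mean of 100 observations was found to be 40. Later on, it was found that an observation 140 was wrongly entered instead of 41. Find the correct mean.
Solⁿ:- Given, n = 100, wrong $\bar{X}$ = 40, wrong observation = 140, correct observation = 41, correct $\bar{X}$ = ?
Wrong $\sum X = n\bar{X} = 100 \times 40 = 4000$
Correct $\sum X = 4000 - 140 + 41 = 3901$
$\therefore$ Correct $\bar{X} = \frac{3901}{100} = 39.01$ Ans.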
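Solution: - The arithmetic mean is regarded as the best measure of central tendency because:
(1) The algebraic sum of the deviations of the observations taken from the arithmetic mean is always zero, i.e. $\sum (X - \bar{X}) = 0$.
X:          1    2    3    4    5
X - X̄:     -2   -1    0    1    2
Where n = 5, $\sum X = 15$, $\bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3$, and $\sum (X - \bar{X}) = 0$.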
(2) The algebraic sum of the squares of the deviations of the observations taken from the arithmetic mean is least, i.e. $\sum (X - \bar{X})^2$ is least.
X     X - X̄     (X - X̄)²
1      -2          4
2      -1          1
3       0          0
4       1          1
5       2          4
                $\sum (X - \bar{X})^2 = 10$
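Both properties can be checked numerically on the same data X = 1, 2, 3, 4, 5. The sketch below computes $\sum (X - A)$ and $\sum (X - A)^2$ for a trial value A. Trying only A = 1, ..., 5 is a spot check, not a proof, and the helper names are hypothetical.

```python
xs = [1, 2, 3, 4, 5]
mean = sum(xs) / len(xs)          # 3.0


def sum_of_deviations(values, a):
    """Algebraic sum of deviations from a: sum(X - a)."""
    return sum(x - a for x in values)


def sum_of_squared_deviations(values, a):
    """Sum of squared deviations from a: sum((X - a) ** 2)."""
    return sum((x - a) ** 2 for x in values)


print(sum_of_deviations(xs, mean))                # 0.0
for a in range(1, 6):
    print(a, sum_of_squared_deviations(xs, a))    # smallest (10) at a = 3
```

The squared sums are 30, 15, 10, 15 and 30, so the minimum occurs at the mean, 3.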
Combined mean : -
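If $\bar{X}_1$ is the arithmetic mean of the first group consisting of $N_1$ values and $\bar{X}_2$ is the arithmetic mean of the second group consisting of $N_2$ values, then the A.M. ($\bar{X}_{12}$) of the whole group is given by
$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$ …(1)
Similarly, if $\bar{X}_3$ is the arithmetic mean of a third group consisting of $N_3$ values, then the A.M. ($\bar{X}_{123}$) of the whole group is given by
$\bar{X}_{123} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$
$\therefore \bar{X}_{12} = 66.4\%$ Ans.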
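The combined mean formula works for any number of groups: multiply each group mean by its size, add, and divide by the total size. A minimal sketch (the function name is hypothetical). It is checked by splitting 1, 2, 3, 4, 5 into groups whose combined mean must equal the overall mean, 3.

```python
def combined_mean(groups):
    """Combined arithmetic mean from (N_i, mean_i) pairs:
    sum(N_i * mean_i) / sum(N_i)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Check: split 1, 2, 3, 4, 5 into [1, 2, 3] (mean 2) and [4, 5] (mean 4.5).
print(combined_mean([(3, 2.0), (2, 4.5)]))             # 3.0
# Three groups: [1, 2] (mean 1.5), [3] (mean 3) and [4, 5] (mean 4.5).
print(combined_mean([(2, 1.5), (1, 3.0), (2, 4.5)]))   # 3.0
```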
Median:
Median is a measure of central tendency that divides the entire arranged data set into two equal parts, so it is also called a positional average. It is denoted by Md. It is preferred for highly skewed data. Moreover, it can be used for qualitative data.
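Series A    Series B
   10          10
   20          20
   30          30
   40          40
   50          90
In both series the median is 30, although the largest value changes from 50 to 90 (the mean changes from 30 to 38).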
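For a discrete series, Md = Value of $\left(\frac{N+1}{2}\right)$th item, where $N = \sum f$. For example,
X     f     cf
1     1      1
2     3      4
3     5      9
4     3     12
5     1     13
     N = 13
$\therefore$ Md = Value of $\left(\frac{13+1}{2}\right)$th item = Value of 7th item = 3.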
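A minimal sketch of the $\left(\frac{N+1}{2}\right)$th-item rule for a discrete series: walk down the cumulative frequencies until they reach the required position. When $\frac{N+1}{2}$ is fractional (N even), this sketch averages the two neighbouring items, which is an assumption matching the usual practice for an even number of observations. The function names are hypothetical.

```python
import math


def item_value(xs, fs, k):
    """Value of the k-th item (1-based) in the arranged series."""
    cf = 0
    for x, f in zip(xs, fs):
        cf += f
        if cf >= k:
            return x
    raise IndexError(k)


def median_discrete(xs, fs):
    """Median = value of the ((N + 1) / 2)-th item, N = sum(f)."""
    n = sum(fs)
    position = (n + 1) / 2
    lower, upper = math.floor(position), math.ceil(position)
    return (item_value(xs, fs, lower) + item_value(xs, fs, upper)) / 2


print(median_discrete([1, 2, 3, 4, 5], [1, 3, 5, 3, 1]))   # 3.0
print(median_discrete([10, 20, 30, 40, 50], [1] * 5))       # 30.0 (Series A)
print(median_discrete([10, 20, 30, 40, 90], [1] * 5))       # 30.0 (Series B)
```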
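For a continuous series, $Md = L + \frac{\frac{N}{2} - cf}{f} \times h$, where L is the lower limit of the median class, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and h is the class size.
Solution: -
Since the given distribution consists of inclusive classes, we need to change it into exclusive classes.
L = 199.5, f = 6, cf = 6, h = 100.
$\therefore Md = L + \frac{\frac{N}{2} - cf}{f} \times h = 199.5 + \frac{9 - 6}{6} \times 100 = 199.5 + \frac{300}{6} = 199.5 + 50 = 249.5$ Ans.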
Merits of Median:-
1. Median is rigidly defined.
2. It is simple to understand and easy to calculate.
3. Median is not affected by extreme observations.
4. Median can be computed even for open-end classes.
5. Median can sometimes be located by inspection.
6. Median can be obtained graphically.
7. Median is the only average that can be used when dealing with qualitative characteristics such as intelligence, beauty etc.
Demerits of Median:-
(i) Arrangement of data according to magnitude is necessary.
(ii) Median is not based on all observations.
(iii) For ungrouped data with an even number of observations, the median cannot be determined exactly; it is taken as the mean of the two middle values.
(iv) Median is not suitable for further mathematical treatment.
(v) For a small sample, the median is affected by sampling fluctuations.
Mode:-
Mode is the value that occurs the maximum number of times. It is denoted by Mo. It is the most common item in the distribution. A distribution with only one mode is called a unimodal distribution, a distribution with two modal values is called a bimodal distribution, and a distribution with more than two modal values is called a multimodal distribution. In such a situation the mode is said to be ill-defined, and the following empirical relationship is used to get a single value of the mode:
Mode = 3 Median - 2 Mean.
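A minimal sketch of finding the mode(s) of ungrouped data, classifying the distribution as unimodal, bimodal or multimodal, and applying the empirical relationship Mode = 3 Median - 2 Mean. The function names and the bimodal series are hypothetical and used only for illustration. The relationship gives an approximate value only.

```python
import statistics
from collections import Counter


def modes(values):
    """All values that occur the maximum number of times.

    Returns [] when every value occurs only once (no mode).
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def modality(values):
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")


def empirical_mode(median, mean):
    """Empirical relationship: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


print(modes([10, 20, 30, 40, 40]), modality([10, 20, 30, 40, 40]))  # [40] unimodal

# Hypothetical bimodal series: 1 and 4 each occur twice.
data = [1, 1, 2, 4, 4, 12]
print(modes(data), modality(data))                                   # [1, 4] bimodal
estimate = empirical_mode(statistics.median(data), statistics.mean(data))
print(estimate)                                                      # 1.0
```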
Arithmetic average:-
In the study of social, economic or commercial problems such as production, income, prices, imports, exports etc., the arithmetic average is used. It is the most commonly used average. But it should not be used in the following cases:
(i) When the distribution is highly skewed.
(ii) When the distribution has open-end classes.
(iii) When an average rate of growth is required.
(iv) When there are very large and very small items in the series.
Weighted arithmetic average:-
When proper weights are to be given for different items of the series, weighted arithmetic average is
used.
Geometric average:-
Geometric average is widely used in averaging ratios and percentages and in computing average rates of increase or decrease. It is also advantageously used in the construction of index numbers.
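To see why the geometric average suits rates of increase, consider growth factors (1 + rate) that multiply year after year. The sketch below uses hypothetical yearly growth rates of 10%, 21% and 33.1%. Their geometric mean factor is 1.21, a steady 21% per year, and three years at 21% reproduce the same overall growth. The arithmetic mean of the rates (about 21.37%) overstates it. The function name is hypothetical.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical yearly growth rates of 10%, 21% and 33.1%.
rates = [0.10, 0.21, 0.331]
factors = [1 + r for r in rates]          # growth factors 1.10, 1.21, 1.331
average_factor = geometric_mean(factors)
print(round((average_factor - 1) * 100, 2))          # 21.0 (% per year)
print(round(sum(rates) / len(rates) * 100, 2))       # 21.37, overstates growth
```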
Harmonic average:-
Harmonic average is used in computing averages of rates and ratios where the time factor is variable.
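A typical case is average speed over equal distances: the time spent on each part varies, so the harmonic mean, not the arithmetic mean, gives the correct average. A minimal sketch with a hypothetical trip at 40 km/h and 60 km/h over equal distances (the function name is hypothetical):

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical trip: equal distances driven at 40 km/h and at 60 km/h.
speeds = [40, 60]
print(harmonic_mean(speeds))   # about 48 km/h

# Direct check with 120 km per leg: 240 km in 3 + 2 = 5 hours.
print(240 / (120 / 40 + 120 / 60))   # 48.0
```

The arithmetic mean, 50 km/h, would overstate the average speed because more time is spent at the lower speed.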
Median:-
Median is especially applicable to cases relating to qualitative phenomena such as intelligence, beauty etc., which cannot be measured quantitatively. It is also useful in the case of open-end classes.
Mode:-
Mode is particularly used in business. When a shopkeeper wants to stock the goods he sells, he looks to the modal size of the goods.