Unit 2.

Uploaded by

Avi Hamal

UNIT: 2, DESCRIPTIVE STATISTICS

2.1 Introduction:

Descriptive statistics is used to summarize or present data, either numerically or graphically.

Numerical descriptors of data include the following:

• One-way or two-way frequency tables

• Various kinds of summary measures, such as the mean, variance, correlation coefficient, regression coefficient, skewness, kurtosis and so on

2.2 Frequency distribution:-

Series:
(i) Individual series (e.g. 10, 20, 30, 40, 40)

(ii) Continuous series

Inclusive (x)    0-9    10-19   20-29   30-39   40-49
Frequency        2      4       6       4       2

Exclusive (x)    0-10   10-20   20-30   30-40   40-50
Frequency        2      4       6       4       2

An example of an open-end class:-

Class Interval   Below 10   10-20   20-30   30-40   Above 40
Frequency        2          4       6       4       2

Frequency Distribution :

(i) Discrete frequency distribution.

X    10   11   12   13   4
Y    2    4    6    4    1

(ii) Continuous frequency distribution or grouped frequency distribution.

Class Interval   0-10   10-20   20-30   30-40
Frequency        10     12      13      12

2.3 Diagrammatic and graphical presentation of data:-

2.3.1 Simple, sub-divided, percentage and multiple bar-diagrams, pie diagram.


There are various diagrammatic devices by which statistical data can be presented.
The following are the most common types of diagrams:
(i). Simple bar diagram
(ii). Sub divided bar diagram
(iii). Percentage bar diagram
(iv). Multiple bar diagram
(v). Pie-chart (Pie diagram)

Simple Bar Diagram :


It is a diagram used to represent only one variable. It is used for comparative study of two or more values
of a single variable. The different bars are drawn for the different values of the single variable on the
same baseline. The height of the bar is used to represent the value of variable and the width of the bar is
used to make the diagram attractive and understandable. The different bars should be of the same width.

Example : 1
Present the following data by a simple bar diagram.

Cities                  A    B    C    D    E    F    G
Population (Millions)   5    7    10   15   13   16   14

Solution:- The following figure represents the population in millions of different cities by means of
simple bar diagram.

[Figure: Simple bar diagram. X-axis: Cities (A to G); Y-axis: Population (millions)]

Sub-Divided Bar Diagram :

A simple bar diagram represents the magnitude of a single factor according to time periods, places, items,
etc. But, when the magnitude of the factor is given with its sub-factors, each bar is further sub-divided into
components in proportion to the magnitude of the sub-factors. Such a diagram is known as sub-divided
bar diagram.

Note: When the negative value of a variable is to be presented, a sub-divided bar diagram is appropriate.

Example : 2
Draw a sub-divided bar diagram of the following data.
Country-wise tourists in various cities of Nepal

Country    Kathmandu   Pokhara   Palpa   Illam
USA        192         182       172     162
Canada     90          80        70      65
Germany    55          54        50      45
Solution :

[Figure: Sub-divided bar diagram. X-axis: Cities (Kathmandu, Pokhara, Palpa, Illam); Y-axis: No. of tourists (0 to 400); segments: USA, Canada, Germany]

Example : 3
Represent the following data of expenditure of two families by a suitable diagram.

Items of expenditure   Family A (Income Rs 800)   Family B (Income Rs 500)
Food                   300                        190
Clothing               175                        100
Education              70                         160
Miscellaneous          210                        180
Saving / Deficit       +45                        -130

Solution :
Since a pie chart cannot show negative values such as a deficit, the above data are better presented by a sub-divided bar diagram, in which the components can be compared and the deficit is drawn below the baseline.

[Figure: Sub-divided bar diagram. X-axis: Family A, Family B; Y-axis: Expenditure (-200 to 1000); segments: Food, Clothing, Education, Miscellaneous, Saving / Deficit]
Percentage Bar Diagram :
A sub-divided bar diagram drawn on a percentage basis is known as a percentage bar diagram. It is used for comparing relative changes in the data. In this diagram, the total value of each characteristic is taken as 100 and the value of each component is expressed as a percentage of that total. So the height of each bar is the same, i.e. 100, and the segments of a bar have heights equal to their respective percentages.

Example : 4
Draw the percentage bar diagram of data of the number of students in different colleges in different
programs.

Program   College A   College B   College C   College D
BBS       450         390         295         360
BA        285         195         190         200
B Sc      160         150         140         130

Solution : Percentage of total number of students

Program   College A   College B   College C   College D
BBS       50.3%       53.1%       47.2%       52.2%
BA        31.8%       26.5%       30.4%       29.0%
B Sc      17.9%       20.4%       22.4%       18.8%
Total     100%        100%        100%        100%
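
The conversion of counts to percentages of the column total can be checked with a short script. This is a minimal sketch in Python; the dictionary layout and the function name `percentage_shares` are illustrative choices, not part of the notes.

```python
# Number of students by program in each college (data from Example 4).
students = {
    "A": {"BBS": 450, "BA": 285, "B Sc": 160},
    "B": {"BBS": 390, "BA": 195, "B Sc": 150},
    "C": {"BBS": 295, "BA": 190, "B Sc": 140},
    "D": {"BBS": 360, "BA": 200, "B Sc": 130},
}


def percentage_shares(counts):
    """Express each component as a percentage of the total of its bar."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


percentages = {college: percentage_shares(counts)
               for college, counts in students.items()}

for college, shares in percentages.items():
    row = ", ".join(f"{name} {share:.1f}%" for name, share in shares.items())
    print(f"College {college}: {row}")
```

Each bar then has the same height (100), so only the segment heights differ between colleges.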
[Figure: Percentage bar diagram. X-axis: College A, College B, College C, College D; Y-axis: Percentage of No. of students (0% to 100%); segments: BBS, BA, B Sc]

Multiple Bar Diagram :


A multiple bar diagram is a one-dimensional bar diagram in which two or more sets of interrelated data are represented. Adjoining bars are drawn, one for each factor, with heights in proportion to the values of the factors and in the same order for each period or place. Each bar of a group is given a different pattern or colour to make the bars easily distinguishable, and the same patterns are retained in every group. A constant distance is maintained between the groups of bars drawn for the different periods or places.
Example : 5

Draw the multiple bar diagram for import and export of a company given below.

Year   Import (‘000 Rs)   Export (‘000 Rs)
1999   122                142
2000   102                123
2001   92                 116
2002   81                 99
2003   75                 96
2004   74                 73

The following diagram shows the import and export of a company in different years by means of a
multiple bar diagram.

[Figure: Multiple bar diagram. X-axis: Year (1999 to 2004); Y-axis: Import and export (0 to 160); bars: Import, Export]

Pie – Diagram or Pie Chart:


A pie-chart is a circular diagram which is usually used for depicting the components of a single factor. It is an angular, two-dimensional diagram in which the areas of the sectors of a circle represent the items of statistical data. The circle is divided into sectors in proportion to the size of the components. The sectors are shown by different patterns or colours to make them easy to tell apart. In this diagram we compare the areas of the different components. The main limitation of a pie chart is that negative data cannot be depicted.
To calculate the angle of each component, we first equate the total of the observations to 360°, i.e. the angle at the center of the circle. Then the corresponding angle for any given value = $\frac{360^\circ}{\text{Total}} \times$ Given value. For a single pie chart, the radius may be any value which makes the chart attractive. But when two pie charts are drawn together, the radii of the circles must be in proportion to the square roots of the total observations, so that the areas of the circles are proportional to the totals. Therefore,

$r_1 : r_2 = \sqrt{T_1} : \sqrt{T_2}$, where $T_1$ and $T_2$ are the totals of the two sets of observations.

Comparison of pie diagrams has to be made on the basis of the areas of the circles and of their sectors, which are difficult to judge visually with precision. Generally, sub-divided or percentage bar diagrams are preferred to pie diagrams for studying changes in the total and in the component parts. Moreover, pie diagrams are more difficult to construct and to compare than bars. If the number of component parts is more than 6, a pie chart is not recommended.
Example : 6

Draw a pie–chart for the following data of expenditure.

Items       Expenditure (Rs)
Food        300
Rent        200
Clothing    100
Education   75
Lighting    65
Others      60

Solution:
Let total expenditure Rs 800 = 360°

Then angle for Food = $\frac{360^\circ}{800} \times 300 = 135^\circ$, and so on.

Items       Expenditure (Rs)   Angle at center (degrees)
Food        300                135
Rent        200                90
Clothing    100                45
Education   75                 33.75
Lighting    65                 29.25
Others      60                 27
Total       800                360
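
The angle calculation used in this solution can be sketched in a few lines of Python. The function name `pie_angles` is an illustrative assumption; the rule it applies is the one stated above, angle = 360° divided by the total, times the given value.

```python
def pie_angles(values):
    """Map each item to its central angle in degrees (whole circle = 360)."""
    total = sum(values.values())
    return {item: 360 * amount / total for item, amount in values.items()}


# Expenditure in Rs (data from Example 6).
expenditure = {"Food": 300, "Rent": 200, "Clothing": 100,
               "Education": 75, "Lighting": 65, "Others": 60}

angles = pie_angles(expenditure)
for item, angle in angles.items():
    print(f"{item}: {angle:.2f} degrees")
```

A quick check that the angles add up to 360 catches most arithmetic slips before the chart is drawn.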

[Figure: Pie chart showing expenditure (Rs.). Sectors: Food, Rent, Clothing, Education, Lighting, Others]

Example : 7
Construct a pie-diagram for the following data of weekly expenditure of two families from Kathmandu.

Expenditure on   Family A   Family B
Food             250        300
Clothing         150        275
Housing          100        125
Fuel             50         75
Education        100        100
Entertainment    50         60
Miscellaneous    60         65
Total            760        1000
Solution :
For Family ‘A’, taking Rs 760 = 360°, Rs 1 = $\frac{360^\circ}{760}$,

So, angle at center for food = $\frac{360^\circ}{760} \times 250 = 118.42^\circ$

In a similar manner, the other angles are calculated.

Similarly for Family ‘B’, taking Rs 1000 = 360°, Rs 1 = $\frac{360^\circ}{1000}$,

Therefore, angle at center for food = $\frac{360^\circ}{1000} \times 300 = 108^\circ$

In a similar manner, the other angles are calculated.

Calculation table for pie-diagram:

                       Family A                        Family B
Items                  Expenditure   Angle at center   Expenditure   Angle at center
Food                   250           118.42°           300           108°
Clothing               150           71.05°            275           99°
Housing                100           47.37°            125           45°
Fuel                   50            23.68°            75            27°
Education              100           47.37°            100           36°
Entertainment          50            23.68°            60            21.6°
Miscellaneous          60            28.42°            65            23.4°
Total                  760           360°              1000          360°
Square root of total   √760 = 27.568                   √1000 = 31.623
Ratio of radii         1                               1.147
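
The radii of the two circles follow from the square-root rule. The sketch below assumes, as in this example, that the radius for Family A is chosen freely (2 cm) and the second radius is scaled from it; the helper name `scaled_radius` is hypothetical.

```python
import math


def scaled_radius(base_radius, base_total, other_total):
    """Radius of a second pie so that circle areas are proportional to the totals."""
    return base_radius * math.sqrt(other_total / base_total)


total_a, total_b = 760, 1000   # weekly totals of Family A and Family B (Rs)
r1 = 2.0                       # cm, radius chosen for Family A
r2 = scaled_radius(r1, total_a, total_b)

print(f"r1 : r2 = 1 : {r2 / r1:.3f}")
print(f"r2 = {r2:.2f} cm")
```

Scaling the radius by the square root, rather than by the totals themselves, is what keeps the areas (which grow with the square of the radius) in the ratio 760 : 1000.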

[Figure: Pie-charts showing expenditure of two families from Kathmandu. Family A: radius r1 = 2 cm; Family B: radius r2 = 2.29 cm. Sectors: Food, Clothing, Housing, Fuel, Education, Entertainment, Miscellaneous]


2.3.2 Graphic presentation of data:


Graphical representation can be advantageously employed to bring out clearly the statistical nature of
frequency distribution which may be discrete or continuous.
The most common graphical presentations of the data of frequency distribution or frequency graphs are

(i) Histogram
(ii) Frequency Polygon
(iii) Frequency Curve
(iv) Ogive

Histogram:
A common graphical presentation of quantitative data is a histogram. It is used to describe numerical data that have been grouped into frequency, relative frequency or percentage distributions. A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency or percentage frequency on the vertical axis. Each class is shown by a rectangle whose base is the class interval on the horizontal axis (X) and whose height is the corresponding frequency, relative frequency or percentage frequency.
For unequal class intervals, the heights are proportional to the frequency density, which is the ratio of the frequency to the corresponding class size.
• Frequency density = $\frac{\text{Frequency of the class}}{\text{Class size}}$

Example : 8

Draw a histogram from the following data:


Class       0-5   5-10   10-15   15-20   20-25   25-30   30-35
Frequency   8     16     30      35      40      24      4

[Figure: Histogram. X-axis: Class (0 to 40); Y-axis: Frequency (0 to 40)]

Example :9

Draw a histogram from the following data.


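Wage (Rs)    10-15   15-20   20-25   25-30   30-40   40-60
Frequency    8       18      25      15      12      12

Solution :
It is a case of unequal class sizes. Therefore, we first calculate the frequency density of each class as

Frequency density of a class = $\frac{\text{Frequency of the class}}{\text{Class size}}$

The table below lists the adjusted frequency, i.e. the frequency density multiplied by the smallest class size (5), so that the classes of size 5 keep their original frequencies.

Wage      Frequency   Frequency density (Adjusted frequency)
10 - 15   8           8
15 - 20   18          18
20 - 25   25          25
25 - 30   15          15
30 - 40   12          12/2 = 6
40 - 60   12          12/4 = 3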
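
The adjustment for unequal class sizes can be sketched as follows. This is an illustrative Python snippet, assuming that the adjusted frequency in the table is the frequency density (frequency divided by class size) rescaled by the smallest class size; the variable names are not from the notes.

```python
# Wage classes (lower, upper) and frequencies from Example 9.
classes = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 40), (40, 60)]
frequencies = [8, 18, 25, 15, 12, 12]

widths = [upper - lower for lower, upper in classes]
smallest = min(widths)

# Frequency density: frequency per unit of class size.
density = [f / w for f, w in zip(frequencies, widths)]

# Adjusted frequency: density rescaled by the smallest class size,
# so that classes of the smallest size keep their original frequency.
adjusted = [d * smallest for d in density]

for (lower, upper), f, a in zip(classes, frequencies, adjusted):
    print(f"{lower}-{upper}: frequency {f}, adjusted frequency {a:g}")
```

Either column can be used for the bar heights, since the two differ only by a constant factor and give histograms of the same shape.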
[Figure: Histogram. X-axis: Wages (Rs), class limits 10, 15, 20, 25, 30, 40, 60; Y-axis: Frequency]

Frequency Polygon :

It is another method of graphic representation of a frequency distribution. When the distribution is discrete, we obtain the frequency polygon by plotting the values of the variable on the X-axis against the corresponding frequencies on the Y-axis and joining the points by straight lines.
If the frequency distribution is continuous, the frequency polygon is drawn by joining, in order, the mid-points of the tops of the adjacent bars of a histogram. Since a polygon is a closed geometrical figure, the lines from the first and last bars are extended down to the X-axis. A frequency polygon can be drawn with or without the help of a histogram.
The main purpose of a frequency polygon is to depict the nature of the distribution and to locate the mode.

Example : 10

Draw the histogram and frequency polygon of the following data.

Marks       10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency   9       18      27      36      32      29      17

Frequency Curve :
A frequency curve is obtained by drawing a smooth freehand curve through the vertices of the frequency polygon. The polygon is smoothed in such a way that the area enclosed by the frequency curve is the same as that of the frequency polygon and the histogram.

Example : 11

Draw a histogram and frequency curve of the following data.

Marks       0-20   20-40   40-60   60-80   80-100   100-120
Frequency   50     100     150     90      60       50

Also locate the value of the mode of data.

[Figure: Frequency curve drawn over the histogram. X-axis: Marks (0 to 120); Y-axis: Frequency (0 to 160); the mode is marked on the X-axis]

So, by inspection of the histogram, the mode of the distribution is approximately 48.

Cumulative Frequency Curve(ogive):


It is a graph of the variate values plotted against their corresponding cumulative frequencies in a frequency distribution. Variate values are plotted on the X-axis and cumulative frequencies on the Y-axis. Its shape resembles an elongated S. An ogive is prepared for either a more than type or a less than type distribution. It is used to find the median and other partition values.

Less than ogive:


To prepare a less than type ogive, the less than cumulative frequencies are first obtained by serially adding the frequencies from top to bottom. Then, taking the upper limit of each class interval along the X-axis and the respective cumulative frequency along the Y-axis, we plot the points on graph paper. The points so obtained are joined by a smooth freehand curve. A less than ogive is an increasing curve sloping upward from left to right and has the shape of an elongated S.

More than ogive:


To prepare a more than type ogive, the more than cumulative frequencies are first obtained by serially adding the frequencies from bottom to top. Then, taking the lower limit of each class interval along the X-axis and the respective cumulative frequency along the Y-axis, we plot the points on graph paper. The points so obtained are joined by a smooth freehand curve. A more than ogive is a decreasing curve sloping downward from left to right and has the shape of an elongated S.

Example : 12
Draw Ogive from the following data and hence locate the value of median.

Wage (Rs)   0 - 20   20 - 40   40 - 60   60 - 80   80 - 100
Frequency   10       30        60        40        10

Solution:
First of all, prepare the less than and more than cumulative frequencies as follows:

Less than method                        More than method
Wage (Rs)       Cumulative frequency    Wage (Rs)      Cumulative frequency
Less than 20    10                      More than 0    150
Less than 40    40                      More than 20   140
Less than 60    100                     More than 40   110
Less than 80    140                     More than 60   50
Less than 100   150                     More than 80   10
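
The two cumulative columns can be generated mechanically from the frequencies. The following is a small Python sketch using `itertools.accumulate` from the standard library; the list names are illustrative.

```python
from itertools import accumulate

# Wage classes and frequencies from Example 12.
lower_limits = [0, 20, 40, 60, 80]
upper_limits = [20, 40, 60, 80, 100]
frequencies = [10, 30, 60, 40, 10]

# Less than type: add the frequencies serially from top to bottom.
less_than = list(accumulate(frequencies))

# More than type: add the frequencies serially from bottom to top.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for upper, cf in zip(upper_limits, less_than):
    print(f"Less than {upper}: {cf}")
for lower, cf in zip(lower_limits, more_than):
    print(f"More than {lower}: {cf}")
```

The less than values are plotted against the upper limits and the more than values against the lower limits, as described above.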

[Figure: Ogives showing the wage distribution. X-axis: Wage (Rs) (0 to 100); Y-axis: Cumulative frequency (0 to 160); the median is marked on the X-axis]

From the graph, the median of the distribution is estimated as Rs 52.

Example : 13
Following data gives the frequency distribution of marks secured by 100 students.

Marks      0 - 10   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60   60 - 70   70 - 80
Students   4        10        16        22        20        18        8         2

Draw the "less than" Ogive to estimate the number of students getting marks 45 or less.
Solution :
Less than cumulative frequency table

Marks          Number of students
less than 10   4
less than 20   14
less than 30   30
less than 40   52
less than 50   72
less than 60   90
less than 70   98
less than 80   100

[Figure: Less than ogive. X-axis: Marks (10 to 80); Y-axis: No. of students (0 to 120)]


From the graph of Ogives we can find that the number of students who secure marks less than or equal to
45 is 62.
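
Reading a value off a less than ogive can be imitated numerically. The sketch below assumes the plotted points are joined by straight line segments (a freehand curve may differ slightly) and that the curve starts from zero at the lower limit of the first class; the function name `read_less_than_ogive` is hypothetical.

```python
def read_less_than_ogive(x, upper_limits, cumulative, start=0):
    """Cumulative frequency at x on a less than ogive with straight segments."""
    if x <= start:
        return 0.0
    prev_x, prev_cf = start, 0
    for limit, cf in zip(upper_limits, cumulative):
        if x <= limit:
            # Linear interpolation between the two neighbouring plotted points.
            return prev_cf + (x - prev_x) * (cf - prev_cf) / (limit - prev_x)
        prev_x, prev_cf = limit, cf
    return float(cumulative[-1])


# Upper class limits and less than cumulative frequencies from Example 13.
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
cumulative = [4, 14, 30, 52, 72, 90, 98, 100]

students_45 = read_less_than_ogive(45, upper_limits, cumulative)
print(students_45)
```

Under this straight-segment assumption the reading at 45 marks lies halfway between the plotted points for 40 and 50 marks.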

2.4 Measure of central tendency :(Average)

Introduction:

After the collection of data, the next step is to analyze it. Since huge and unwieldy masses of data are confusing and difficult to remember, we need a single value to represent them. Averages are the measures which condense a huge mass of data into a single value representing the whole data. They are the typical values around which most of the data tend to cluster. They lie between the two extreme observations of the data and give us an idea about the concentration of the values in the central part of the distribution. Such a single value is known as a measure of central tendency. The objectives of measuring central tendency are:
(i) To facilitate comparison.
(ii) To present the salient features of a mass of complex data.
(iii) To know about the universe from a sample.
(iv) To trace mathematical relations.
(v) To help in decision-making.
Requisites of a good average:

To be an ideal average, the following characteristics should be satisfied.

(i) It should be rigidly defined and its value should be definite.
(ii) It should be simple to understand.
(iii) It should be based on all the observations.
(iv) It should be easy to calculate.
(v) It should be suitable for further mathematical treatment.
(vi) It should be least affected by fluctuation of sampling.
(vii) It should not be affected by extreme observations.
Types of Average:

The measure of central tendency is designed to measure the central value around which most of the data
tend to concentrate. The following are the measures of central tendency or measures of location:
(a) Mean.
Arithmetic mean.
Geometric mean
Harmonic mean.
(b) Median.
(c) Mode.

Arithmetic mean:
The arithmetic mean, or simply the “mean”, of a set of observations is the sum of all the observations divided by the number of observations. The arithmetic mean is also known as the arithmetic average. It is divided into three types.
• Simple arithmetic mean.
• Weighted arithmetic mean.
• Combined arithmetic mean.

Simple arithmetic mean:-


Let $X_1, X_2, \dots, X_n$ be n observations of a variable X. Then the simple arithmetic mean, denoted by $\bar{X}$, is defined as

$\bar{X} = \frac{\sum X}{n}$ (For individual series)

Again,
Let $f_1, f_2, \dots, f_n$ be the corresponding frequencies of $X_1, X_2, \dots, X_n$. Then the simple arithmetic mean is defined as

$\bar{X} = \frac{\sum fX}{N}$, where $N = \sum f$ (For frequency distribution)

Weighted mean:-
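Let $X_1, X_2, \dots, X_n$ be n observations of a variable X. Let $W_1, W_2, \dots, W_n$ be their corresponding weights. Then the weighted mean, denoted by $\bar{X}_W$, is

$\bar{X}_W = \frac{W_1X_1 + W_2X_2 + \dots + W_nX_n}{W_1 + W_2 + \dots + W_n} = \frac{\sum WX}{\sum W}$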
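
The three forms of the mean share one pattern: a sum of products divided by a sum of frequencies or weights. The Python sketch below illustrates this; the function names are illustrative, the first two data sets are taken from these notes, and the marks and weights in the last call are hypothetical values chosen only for demonstration.

```python
def simple_mean(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Individual series from section 2.2.
print(simple_mean([10, 20, 30, 40, 40]))

# Frequency distribution: the frequencies play the role of the weights.
x = [1, 2, 3, 4, 5]
f = [1, 3, 5, 3, 1]
print(weighted_mean(x, f))

# Hypothetical marks with hypothetical weights 1, 2 and 3.
print(weighted_mean([60, 70, 80], [1, 2, 3]))
```

Treating the mean of a frequency distribution as a weighted mean with the frequencies as weights avoids writing the same formula twice.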

Correction of wrong observation :

Q. The mean of 100 observations was found to be 40. Later on it was found that an observation 41 had been wrongly entered as 140. Find the correct mean.
Soln:- Given, n = 100, wrong $\bar{X}$ = 40, wrong observation = 140, correct observation = 41, correct $\bar{X}$ = ?

We have $\bar{X} = \frac{\sum X}{n}$, so wrong $\sum X = n \times$ wrong $\bar{X} = 100 \times 40 = 4000$.

Then, correct $\sum X = 4000 - 140 + 41 = 3901$,

Again, correct $\bar{X} = \frac{\text{correct } \sum X}{n} = \frac{3901}{100} = 39.01$ Ans.
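
The correction can be written as a small function: rebuild the total from the wrong mean, swap the wrong entry for the right one, and divide again. This is a minimal sketch; `corrected_mean` and its parameter names are illustrative.

```python
def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Recompute the mean after replacing one wrongly recorded observation."""
    wrong_total = n * wrong_mean                          # total implied by the wrong mean
    correct_total = wrong_total - wrong_value + correct_value
    return correct_total / n


# 140 was recorded where 41 should have been (values from the question above).
print(corrected_mean(n=100, wrong_mean=40, wrong_value=140, correct_value=41))
```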

Q. Why is the arithmetic mean the best measure of central tendency?

Solution: - The arithmetic mean is the best measure of central tendency for the following reasons:

(1) The algebraic sum of the deviations of the observations from the arithmetic mean is always zero, i.e.

$\sum (X - \bar{X}) = 0$.

X               1    2    3    4    5
$X - \bar{X}$   -2   -1   0    1    2

Where n = 5, $\bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3$.

(2) The sum of the squares of the deviations of the observations from the arithmetic mean is least, i.e.

$\sum (X - \bar{X})^2$ is least.

X    $X - \bar{X}$   $(X - \bar{X})^2$
1    -2              4
2    -1              1
3    0               0
4    1               1
5    2               4

(3) It is based on all the observations.
(4) It is least affected by sampling fluctuation.
(5) It is a rigidly defined average.
(6) It is used for further statistical treatment.
Note:- Points (1) and (2) are the properties of the arithmetic mean; points (1) to (6) are the reasons.
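
Both properties can be verified numerically for the series 1, 2, 3, 4, 5. The Python sketch below is illustrative only; the alternative origins 2, 2.5, 3.5 and 4 are arbitrary values chosen to compare against the mean.

```python
values = [1, 2, 3, 4, 5]
mean = sum(values) / len(values)


def sum_of_deviations(data, origin):
    """Algebraic sum of the deviations taken from the given origin."""
    return sum(x - origin for x in data)


def sum_of_squared_deviations(data, origin):
    """Sum of the squared deviations taken from the given origin."""
    return sum((x - origin) ** 2 for x in data)


# Property 1: deviations from the mean cancel out.
print(sum_of_deviations(values, mean))

# Property 2: squared deviations are smallest when taken from the mean.
print(sum_of_squared_deviations(values, mean))
for origin in (2, 2.5, 3.5, 4):
    print(origin, sum_of_squared_deviations(values, origin))
```

Any origin other than the mean gives a larger sum of squares, which is what "least" means in property (2).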

Combined mean : -

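If $\bar{X}_1$ is the arithmetic mean of the first group consisting of $N_1$ values and $\bar{X}_2$ is the arithmetic mean of the second group consisting of $N_2$ values, the A.M. ($\bar{X}_{12}$) of the whole group is given by,

$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$    (1)

Similarly, if $\bar{X}_3$ is the arithmetic mean of the third group consisting of $N_3$ values, then the A.M. ($\bar{X}_{123}$) of the whole group is given by,

$\bar{X}_{123} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$    (2)

Example :
Student No.   Entrance, $\bar{X}_1$ ($N_1$ = 80%)   I.Sc., $\bar{X}_2$ ($N_2$ = 20%)
1             70%                                  52%
2             60%                                  80%
3             75%                                  60%

For student 1, $\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{80 \times 70 + 20 \times 52}{80 + 20} = \frac{5600 + 1040}{100} = \frac{6640}{100} = 66.4\%$ Ans.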
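
The combined mean of any number of groups can be computed with one helper. This is a sketch under the assumption, taken from the example, that the Entrance and I.Sc. scores carry sizes (weights) of 80 and 20; the function name `combined_mean` is illustrative.

```python
def combined_mean(means, sizes):
    """Mean of the pooled groups: sum of (size x mean) over the sum of sizes."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)


sizes = [80, 20]                              # N1 and N2 from the example
scores = {1: [70, 52], 2: [60, 80], 3: [75, 60]}  # student -> [Entrance, I.Sc.]

for student, means in scores.items():
    print(f"Student {student}: {combined_mean(means, sizes):.1f}%")
```

The same function handles three or more groups, since it simply extends the numerator and the denominator by one term per group.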

Merits of arithmetic mean.

(i) It is rigidly defined.
(ii) It is based on all observations.
(iii) It is simple to understand and easy to calculate.
(iv) It is suitable for further mathematical treatment.
(v) It is least affected by fluctuation of sampling.
Demerits of arithmetic mean.

(i) It is very much affected by extreme observations.
(ii) It cannot be computed accurately in the case of open-end classes.
(iii) It sometimes gives fallacious conclusions.
(iv) It cannot be used if we are dealing with qualitative characteristics which cannot be measured quantitatively.

Median:

Median is a measure of central tendency that divides the entire arranged data set into two equal parts. So it is also called a positional average. It is denoted by Md. It is preferred for highly skewed data. Moreover, it can be used for qualitative data. For example, series A and B below differ only in their largest value, yet both have the median 30.

A    B
10   10
20   20
30   30
40   40
50   90

How to calculate median?

(i) For individual series:- Md = Value of $\left(\frac{n+1}{2}\right)$th item.

For example, for the series 1, 2, 3, 4, 5, 6 we have n = 6.

∴ Md = Value of $\left(\frac{n+1}{2}\right)$th item = Value of $\left(\frac{6+1}{2}\right)$th item = Value of 3.5th item = $\frac{\text{3rd item} + \text{4th item}}{2} = \frac{3 + 4}{2}$ = 3.5

(ii) For discrete series:- Md = Value of $\left(\frac{N+1}{2}\right)$th item, where $N = \sum f$.

For example,

X    F    cf
1    1    1
2    3    4
3    5    9
4    3    12
5    1    13
     N = 13

∴ Md = Value of $\left(\frac{13+1}{2}\right)$th item = Value of 7th item = 3.

(iii) For continuous series:- Median class = Size of $\left(\frac{N}{2}\right)$th item.

∴ Md = $L + \frac{\frac{N}{2} - cf}{F} \times h$

Where, L = Lower limit of median class, F = Frequency corresponding to median class,
cf = Cumulative frequency preceding median class, h = Class size.

Note : - (i) Inclusive- Exclusive, (ii) Unequal class-Equal class.
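
The rules for individual and discrete series can be sketched in Python as follows. The function names are illustrative. For a discrete series this sketch returns the first value whose cumulative frequency reaches the (N+1)/2-th position, which is a simplification when that position falls exactly between two different values.

```python
import math


def median_individual(values):
    """Value of the ((n + 1) / 2)-th item of the sorted series."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2              # 1-based position, may end in .5
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


def median_discrete(values, frequencies):
    """First value (values in ascending order) whose cf reaches (N + 1) / 2."""
    position = (sum(frequencies) + 1) / 2
    cumulative = 0
    for x, f in zip(values, frequencies):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("empty frequency distribution")


print(median_individual([1, 2, 3, 4, 5, 6]))
print(median_individual([10, 20, 30, 40, 50]), median_individual([10, 20, 30, 40, 90]))
print(median_discrete([1, 2, 3, 4, 5], [1, 3, 5, 3, 1]))
```

The second line of output compares series A and B from the table above, which differ only in their largest value.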

Q. Calculate the median of the following data.

Class interval   0-99   100-199   200-299   300-399   400-499
Frequency        2      4         6         4         2

Solution: -
Since the given distribution consists of inclusive classes, we need to change it into exclusive classes.

From the first two classes, 0-99 and 100-199, D = 100 - 99 = 1, so the correction factor = $\frac{D}{2} = \frac{1}{2}$ = 0.5. It is subtracted from each lower limit and added to each upper limit.

Class interval   Frequency   Cumulative frequency
-0.5-99.5        2           2
99.5-199.5       4           6
199.5-299.5      6           12
299.5-399.5      4           16
399.5-499.5      2           18
                 N = 18

Median class = Size of $\left(\frac{N}{2}\right)$th item = Size of $\left(\frac{18}{2}\right)$th item = Size of 9th item = 199.5-299.5

L = 199.5, F = 6, cf = 6, h = 100.
∴ Md = $L + \frac{\frac{N}{2} - cf}{F} \times h = 199.5 + \frac{9 - 6}{6} \times 100 = 199.5 + \frac{3}{6} \times 100 = 199.5 + \frac{300}{6} = 199.5 + 50$ = 249.5 Ans.
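
The whole procedure for a continuous series, including the conversion from inclusive to exclusive classes, can be sketched as below. This is an illustrative Python version; it assumes the gap between consecutive inclusive classes is the same throughout, and the function names are not from the notes.

```python
def to_exclusive(classes):
    """Convert inclusive limits to exclusive boundaries using half the class gap."""
    gap = classes[1][0] - classes[0][1]            # e.g. 100 - 99 = 1
    half = gap / 2                                 # correction factor
    return [(lower - half, upper + half) for lower, upper in classes]


def median_continuous(boundaries, frequencies):
    """Md = L + (N/2 - cf) / F * h for the class containing the (N/2)-th item."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("empty frequency distribution")
    target = total / 2
    cumulative = 0                                 # cf of the classes before the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f


inclusive = [(0, 99), (100, 199), (200, 299), (300, 399), (400, 499)]
frequencies = [2, 4, 6, 4, 2]

boundaries = to_exclusive(inclusive)
print(boundaries[2])
print(median_continuous(boundaries, frequencies))
```

Here the median class is picked as the first class whose cumulative frequency reaches N/2, matching the inspection of the cumulative frequency column in the worked solution.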

Merits of Median:-
1. Median is rigidly defined.
2. It is simple to understand and easy to calculate.
3. Median is not affected by extreme observations.
4. Median can be computed even for open-end classes.
5. Median can sometimes be located by inspection.
6. Median can be obtained graphically.
7. Median is the only average that can be used while dealing with qualitative characteristics such as intelligence, beauty etc.

Demerits of Median:-
(i) Arrangement of data according to magnitude is necessary.
(ii) Median is not based on all observations.
(iii) For ungrouped data, if the number of observations is even, the median cannot be determined exactly.
(iv) Median is not suitable for further mathematical treatment.
(v) For a small size sample, median is affected by fluctuation of sampling.

Mode:-

Mode is the value that occurs the maximum number of times, i.e. the most common item in the distribution. It is denoted by Mode. A distribution having only one mode is called a unimodal distribution, a distribution having two modal values is called a bimodal distribution, and a distribution having more than two modal values is called a multimodal distribution. When a distribution has more than one modal value, the mode is said to be ill defined. In such a situation we use the following empirical relationship to get a single value of the mode: Mode = 3 Median - 2 Mean.

Q.How to calculate Mode?


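(i) For individual series:- By inspection (the value that repeats the maximum number of times).

(ii) For discrete series:- By inspection for a regular distribution (the value with the highest frequency).

(iii) For continuous series:-

The modal class is found by inspection, and then Mode = $L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$,

where L = lower limit of the modal class, $f_1$ = frequency of the modal class, $f_0$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class following the modal class, and h = class size.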
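
The calculation for a continuous series can be sketched in Python. The fraction in the formula is assumed here to be the usual grouped-data expression (f1 - f0) / (2 f1 - f0 - f2), where f1 is the frequency of the modal class and f0 and f2 are the frequencies of the classes before and after it; the function names are illustrative. The data are those of Example 11, and the mean and median passed to `empirical_mode` are hypothetical values.

```python
def mode_continuous(classes, frequencies):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])  # modal class by inspection
    lower, upper = classes[i]
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


def empirical_mode(median, mean):
    """Empirical relationship: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100), (100, 120)]
frequencies = [50, 100, 150, 90, 60, 50]

print(round(mode_continuous(classes, frequencies), 2))
print(empirical_mode(median=30, mean=28))
```

The computed value for Example 11 is close to the graphical reading taken from the histogram, which is only as precise as the drawing allows.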


Merits of Mode:-

1) It is easy to calculate and simple to understand.


2) Mode is not at all affected by extreme observations.
3) It can be obtained even in case of open end classes.
4) It can be obtained by inspection or by graph.
Demerits of Mode:-
i. Mode is not rigidly defined.
ii. It is not based on all the observations.
iii. Mode is not suitable for further mathematical treatment.
iv. Mode is affected to a greater extent by fluctuation of sampling.
Choice of an average:-
We have discussed various types of averages and their relative merits and demerits. From this discussion, it can be concluded that no single average is suitable for all circumstances. The choice of the average depends upon,
• The nature and availability of the data.
• The nature of the variable involved.
• The purpose of the enquiry.
The situations in which the different averages are chosen are briefly discussed below.

Arithmetic average:-
In the study of social, economic or commercial problems such as production, income, prices, imports, exports etc., the arithmetic average is used. It is the most widely used average. But it should not be used in the following cases:
• When the distribution is highly skewed.
• When the distribution has open-end classes.
• When the average rate of growth is required.
• When there are very large and very small items in the series.
Weighted arithmetic average:-
When proper weights are to be given for different items of the series, weighted arithmetic average is
used.
Geometric average:-
Geometric average is widely used in averaging ratios and percentages and in computing average rates of increase or decrease. It is also advantageously used in the construction of index numbers.
Harmonic average:-
Harmonic average is used in computing the averages relating to the rates and ratios where time factor is
the variable.
Median:-
Median is specially applicable to cases relating to the qualitative phenomena such as intelligence, beauty
etc. which cannot be measured quantitatively. It is also useful in case of open end classes.
Mode:-
Mode is particularly used in business. Whenever a shopkeeper wants to stock the goods he sells, he
always looks to the modal size of the goods.
